Georgia Institute of Technology
Professor Heck is head of Georgia Tech's AI Virtual Assistant (AVA) Lab. He joined Georgia Tech in 2021 after 30+ years in Silicon Valley with leading AI research labs and companies (Stanford Research Institute, Nuance, Yahoo!, Microsoft, Google, and Samsung). The AVA Lab's research spans next-generation Virtual Digital Assistants, Conversational AI with Large Language Models (LLMs), LLM Response Generation, and Natural Language and Speech Processing, with the long-term goal of creating an autonomous digital human that can freely communicate with humans in open, mixed-reality domains.

The AVA Lab revisits assumptions about every aspect of modern AVAs: human-computer interaction design, single- vs. multimodal interactions, situated interactions over screens and mixed reality (AR/VR), conversations ranging from task-oriented dialogue to open-domain chit-chat and combinations of both, explicit to implicit (commonsense) knowledge-driven conversations, and higher-level inference and reasoning.

Recent years have seen significant advances in conversational systems, particularly with the advent of attention-based language models pre-trained on large datasets of unlabeled natural language text. While the breadth of these models has led to fluid and coherent dialogues over a broad range of topics, they can make mistakes when high precision is required. High precision is needed not only when specialized skills are involved (legal/medical/tax advice, computations, etc.), but also to avoid seemingly trivial errors in commonsense and other relevant 'in-the-moment' context. Much of this context centers on, and should be derived from, the user's perspective. The AVA Lab's work in this area leverages this user-centric context ("build it for one") and the user's specific situation (right place, right time). Professor Heck's work leverages these forms of situated knowledge to create much more accurate, trustworthy, explainable, and computationally efficient conversational systems. The situated context includes Conversational Content, where the AI and human converse over a shared document, map, etc.; Conversational Vision, over a shared image or video; Conversational Knowledge, over shared knowledge (world, personal, commonsense); and Conversational Expression, interpreting and emoting facial and body language between the AI and human.