Multimodal Dialogue State Tracking

Hung Le, Nancy F. Chen, Steven C. H. Hoi
North American Chapter of the Association for Computational Linguistics
Designed for tracking user goals in dialogues, a dialogue state tracker is an essential component in a dialogue system. However, research on dialogue state tracking has largely been limited to unimodality, in which slots and slot values are limited by knowledge domains (e.g., a restaurant domain with slots for restaurant name and price range) and are defined by a specific database schema. In this paper, we propose to extend the definition of dialogue state tracking to multimodality. Specifically… 

“Do you follow me?”: A Survey of Recent Approaches in Dialogue State Tracking

This survey argues that some critical aspects of dialogue systems, such as generalizability, are still underexplored, and it proposes several research avenues to motivate future studies.

Multimodal Context Carryover

This work presents a novel yet pragmatic approach to expand an existing dialogue-based context carryover system in a voice assistant with state-of-the-art multi-modal components to facilitate quick delivery of visual modality support with minimum changes.



Scalable multi-domain dialogue state tracking

A novel framework for state tracking is introduced that is independent of the slot value set and represents the dialogue state as a distribution over a set of values of interest (the candidate set) derived from the dialogue history or knowledge, which addresses the problem of slot-scalability.

Multimodal Dialogue State Tracking By QA Approach with Data Augmentation

This paper interprets the AVSD task from an open-domain Question Answering (QA) point of view, proposes a multimodal open-domain QA system to deal with the problem, and introduces a new data augmentation approach specifically under the QA assumption.

Global-Locally Self-Attentive Encoder for Dialogue State Tracking

This paper proposes the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules and shows that this significantly improves tracking of rare states.

Non-Autoregressive Dialog State Tracking

A novel framework, Non-Autoregressive Dialog State Tracking (NADST), is proposed; it can factor in potential dependencies among domains and slots to optimize models towards better prediction of dialogue states as a complete set rather than as separate slots.

UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues

This paper presents "UniConv", a novel unified neural architecture for end-to-end conversational systems in multi-domain task-oriented dialogues. It is designed to jointly train a Bi-level State Tracker, which tracks dialogue states by learning signals at both the slot and domain levels independently, and a Joint Dialogue Act and Response Generator, which incorporates information from various input components and models dialogue acts and target responses simultaneously.

Visual Dialogue State Tracking for Question Generation

This paper proposes a visual dialogue state tracking (VDST) based method for question generation that significantly outperforms existing methods and achieves new state-of-the-art performance on the GuessWhat?! dataset.

Neural Belief Tracker: Data-Driven Dialogue State Tracking

This work proposes a novel Neural Belief Tracking (NBT) framework which overcomes past limitations, matching the performance of state-of-the-art models which rely on hand-crafted semantic lexicons and outperforming them when such lexicons are not provided.

Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems

A Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating transfer when predicting (domain, slot, value) triplets not encountered during training.

The Second Dialog State Tracking Challenge

The results suggest that while large improvements on a competitive baseline are possible, trackers are still prone to degradation in mismatched conditions, and ensemble learning demonstrates that the most accurate tracking can be achieved by combining multiple trackers.

Knowledge-Aware Graph-Enhanced GPT-2 for Dialogue State Tracking

A novel hybrid architecture is presented that augments GPT-2 with representations derived from Graph Attention Networks in a way that allows causal, sequential prediction of slot values and captures inter-slot relationships and dependencies across domains that can otherwise be lost in sequential prediction.