Corpus ID: 215814397

DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator

H. Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, K. Jung
Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response to a question given a scene, video, audio, and the history of previous turns in the dialog. Existing systems for this task employ transformer or recurrent neural network-based architectures within the encoder-decoder framework. Although these techniques perform well on this task, they have significant limitations: the model easily overfits, merely memorizing grammatical patterns; the model…
5 Citations
Structured Co-reference Graph Attention for Video-grounded Dialogue
  • Junyeong Kim, Sunjae Yoon, Dahyun Kim, C. Yoo
  • Computer Science
  • ArXiv
  • 2021
  • Highly Influenced
Look Before you Speak: Visually Contextualized Utterances
  • 1
References
Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
  • 32
  • Highly Influential
Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog
  • 7
  • Highly Influential
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
  • 30
  • Highly Influential
Learning Question-Guided Video Representation for Multi-Turn Video Question Answering
  • 3
Attention Is All You Need
  • 17,099
  • Highly Influential
Visual Dialog
  • 353
Sequence to Sequence Learning with Neural Networks
  • 11,947
Audio Visual Scene-Aware Dialog
  • 35
End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features
  • Chiori Hori, H. AlAmri, +10 authors Devi Parikh
  • Computer Science, Engineering
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
  • 57
  • Highly Influential
A Simple Baseline for Audio-Visual Scene-Aware Dialog
  • 11