Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

  • V. Mitra, Sue Booker, E. Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget J. Cheng, Ermine A. Teves, A. Mehta, Devang Naik
  • Published in INTERSPEECH 2019
  • Computer Science, Engineering
  • Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the user's query. Detecting the intent of a query from a short, isolated utterance is a difficult task. Intent cannot always be obtained from speech-recognized transcriptions. A transcription-driven approach can interpret what has been said but fails to acknowledge how it…
    2 Citations

    Detecting Emotion Primitives from Speech and Their Use in Discerning Categorical Emotions
    • Vasudha Kowtha, V. Mitra, +5 authors Devang Naik
    • Computer Science, Engineering
    • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2020
    Attentive Modality Hopping Mechanism for Speech Emotion Recognition


    Investigating Utterance Level Representations for Detecting Intent from Acoustics
    Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings
    Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies
    Zero-shot learning of intent embeddings for expansion by convolutional deep structured semantic models
    Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
    Predicting Arousal and Valence from Waveforms and Spectrograms Using Deep Neural Networks
    Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing
    The SRI AVEC-2014 Evaluation System
    Unveiling the Acoustic Properties that Describe the Valence Dimension