SpeechYOLO: Detection and Localization of Speech Objects

@inproceedings{Segal2019SpeechYOLODA,
  title={SpeechYOLO: Detection and Localization of Speech Objects},
  author={Y. Segal and T. Fuchs and Joseph Keshet},
  booktitle={INTERSPEECH},
  year={2019}
}
  • Y. Segal, T. Fuchs, Joseph Keshet
  • Published in INTERSPEECH 2019
  • Engineering, Computer Science, Mathematics
  • In this paper, we propose to apply object detection methods from the vision domain on the speech recognition domain, by treating audio fragments as objects. More specifically, we present SpeechYOLO, which is inspired by the YOLO algorithm for object detection in images. The goal of SpeechYOLO is to localize boundaries of utterances within the input signal, and to correctly classify them. Our system is composed of a convolutional neural network, with a simple least-mean-squares loss function. We…
