Clotho: An Audio Captioning Dataset

@article{Drossos2019ClothoAA,
  title={Clotho: An Audio Captioning Dataset},
  author={Konstantinos Drossos and Samuel Lipping and Tuomas Virtanen},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.09387}
}
  • Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen
  • Published 2019
  • Computer Science, Engineering
  • ArXiv
  • Audio captioning is the novel task of general audio content description using free text. It is an intermodal translation task (not speech-to-text), where a system accepts as an input an audio signal and outputs the textual description (i.e. the caption) of that signal. In this paper we present Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds duration and 24 905 captions of eight to 20 words length, and a baseline method to provide initial results…

    References

    Publications referenced by this paper.

    Audio Caption: Listen and Tell

    • Mengyue Wu, Heinrich Dinkel, Kai Yu
    • Computer Science, Engineering
    • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2019

    Automated audio captioning with recurrent neural networks

    Audio Set: An ontology and human-labeled dataset for audio events

    Deep Visual-Semantic Alignments for Generating Image Descriptions

    Freesound technical demo

    Adam: A Method for Stochastic Optimization
