Improving Target Sound Extraction with Timestamp Information
@inproceedings{Wang2022ImprovingTS,
  title     = {Improving Target Sound Extraction with Timestamp Information},
  author    = {Helin Wang and Dongchao Yang and Chao Weng and Jianwei Yu and Yuexian Zou},
  booktitle = {Interspeech},
  year      = {2022}
}
Target sound extraction (TSE) aims to extract the part corresponding to a target sound event class from an audio mixture containing multiple sound events. Previous works mainly focus on the problems of weakly labelled data, joint learning, and new classes; however, none considers the onset and offset times of the target sound event, which have been emphasized in auditory scene analysis. In this paper, we study how to utilize such timestamp information to help extract the target sound via a target…
33 References
Detect What You Want: Target Sound Detection
- 2022
Computer Science
DCASE
A novel target sound detection network (TSDNet) is presented, which consists of two main parts: a conditional network that generates a sound-discriminative conditional embedding vector representing the target sound, and a detection network that takes both the mixture audio and the conditional embedding vector as inputs and produces the detection result for the target sound.
Few-shot learning of new sound classes for target sound extraction
- 2021
Physics
Interspeech
This work proposes combining 1-hot- and enrollment-based target sound extraction, allowing optimal performance for seen AE classes and simple extension to new classes, and proposes adapting the embedding vectors obtained from a few enrollment audio samples to further improve performance on new classes.
Source Separation with Weakly Labelled Data: an Approach to Computational Auditory Scene Analysis
- 2020
Computer Science
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This work proposes a source separation framework trained with weakly labelled data that can separate 527 kinds of sound classes from AudioSet within a single system.
Environmental Sound Classification with Parallel Temporal-Spectral Attention
- 2020
Computer Science, Environmental Science
INTERSPEECH
A novel parallel temporal-spectral attention mechanism is proposed for CNNs to learn discriminative sound representations, enhancing temporal and spectral features by capturing the importance of different time frames and frequency bands.
One-Shot Conditional Audio Filtering of Arbitrary Sounds
- 2021
Physics
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We consider the problem of separating a particular sound source from a single-channel mixture, based on only a short sample of the target source (from the same recording). Using SoundFilter, a…
Audio Query-based Music Source Separation
- 2019
Computer Science
ISMIR
A network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals is proposed.
What Affects the Performance of Convolutional Neural Networks for Audio Event Classification
- 2019
Computer Science
2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)
This paper designs convolutional neural networks for audio event classification (called FPNet); on the environmental sound dataset ESC-50, FPNet-1D and FPNet-2D achieve classification accuracies of 73.90% and 85.10% respectively, improving significantly compared to previous methods.
Learning to Separate Sounds from Weakly Labeled Scenes
- 2020
Computer Science
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This work proposes objective functions and network architectures that enable training a source separation system with weak labels, and benchmarks performance using synthetic mixtures of overlapping sound events recorded in urban environments.
Recurrent neural networks for polyphonic sound event detection in real life recordings
- 2016
Computer Science
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single…
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
- 2020
Computer Science
IEEE/ACM Transactions on Audio, Speech, and Language Processing
This paper proposes pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset, and investigates the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks.