MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

Chandan K. A. Reddy, Vishak Gopa, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner
With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo. Accurate real-time detection of music events can help improve the user experience. In this paper, we present MusicNet, a compact neural model for detecting background music in the real-time communications pipeline. In video meetings, music frequently co-occurs with speech and background noise, making accurate classification challenging. We…

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

This paper proposes pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset, and investigates the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks.

Audio Set: An ontology and human-labeled dataset for audio events

This paper describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that aims to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.

Interspeech 2021 Deep Noise Suppression Challenge

In this version of the Deep Noise Suppression challenge, the training and test datasets were expanded to accommodate fullband scenarios and challenging test conditions, and DNSMOS, a reliable non-intrusive objective speech quality metric for wideband audio, was made available for participants to use during their development phase.

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments, selected from the Google AudioSet dataset.

ICASSP 2022 Deep Noise Suppression Challenge

This challenge open-sourced datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries.

Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast

  • S. Venkatesh, D. Moffat, E. Miranda
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
The data synthesis procedure is demonstrated to be a highly effective technique for generating large datasets to train deep neural networks for audio segmentation, and it outperformed state-of-the-art algorithms for music-speech detection.

Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation

The joint task of music detection and music relative loudness estimation has two characteristics, temporality and hierarchy, that can help in obtaining a solution. Hierarchical Regulated Iterative Networks (HRIN) are proposed in two variants, termed HRIN-r and HRIN-cr, which are based on recurrent and convolutional recurrent modules, respectively.

Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations

In this manuscript, an extensive analysis comparing different magnitude representations of the raw audio is presented, and the generalization of the proposed methods to datasets both seen and not seen during training is studied and reported.

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

A Multi-Task Learning (MTL) framework is proposed for learning from weakly labelled audio data; it encompasses the traditional MIL setup and outperforms existing benchmark models over all SNRs, specifically 22.3 % over benchmark models on 0, 10 and 20 dB SNR respectively.

Development of an Non-Speech Audio Event Detection System

The system can recognize thirteen types of sound events, such as a baby crying or a person screaming or asking for help. It has an extremely low false positive rate and can be used for automated continuous monitoring.