Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
TLDR
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data, and proposes methods for learning representations with this model that can be used effectively to solve the target task.
Audio Event Detection using Weakly Labeled Data
TLDR
It is shown that audio event detection using weak labels can be formulated as a multiple instance learning (MIL) problem, and two frameworks for solving it are proposed: one based on support vector machines and the other on neural networks.
Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording
TLDR
This work on Task 1 (Acoustic Scene Classification) and Task 3 (Sound Event Detection in Real Life Recordings) uses low-level and high-level features, classifier optimization, and other heuristics specific to each task.
Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks
TLDR
This paper deals with improving speech quality in office environments, where multiple stationary and non-stationary noises can be present in speech simultaneously, and proposes several strategies based on deep neural networks (DNNs) for speech enhancement in these scenarios.
Ego4D: Around the World in 3,000 Hours of Egocentric Video
TLDR
Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community and presents a host of new benchmark challenges centered around understanding the first-person visual experience in the past, present, and future.
A Closer Look at Weak Label Learning for Audio Events
TLDR
This work describes a CNN-based approach for weakly supervised learning of audio events, identifies important characteristics that naturally arise in weakly supervised learning of sound events, and shows how these aspects of weak labels affect the generalization of models.
Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data
TLDR
This work presents a robust and efficient deep convolutional neural network (CNN) based framework for learning audio event recognizers from weakly labeled data. The framework can train on and analyze recordings of variable length efficiently, and it outperforms a network trained with strongly labeled web data by a considerable margin.
Informedia@TrecVID 2014: MED and MER
TLDR
On the MED task, the CMU team achieved leading performance in the Semantic Query, 000Ex, 010Ex, and 100Ex settings. For the MER task, the system uses a subset of the features and detection results from the MED system, from which the recounting is then generated.
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data
TLDR
This work argues that sound recognition is inherently a multimodal audiovisual task, in that sounds are easier to differentiate using both the audio and visual modalities than using either one alone.
Audio event detection from acoustic unit occurrence patterns
TLDR
This paper develops a technique for detecting signature audio events based on identifying patterns of occurrence of automatically learned atomic units of sound, which it calls Acoustic Unit Descriptors (AUDs).
...
...