Publications
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels and investigates varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on this audio classification task and that larger training and label sets help up to a point.
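As a rough illustration of the general approach (not the paper's implementation), the Python sketch below applies an image-style CNN to fixed-size log-mel spectrogram patches for multi-label audio tagging; the input shape, layer sizes, and label-vocabulary size are placeholder assumptions.

    # Minimal sketch: image-style CNN over log-mel patches for multi-label tagging.
    import torch
    import torch.nn as nn

    class AudioCNN(nn.Module):
        def __init__(self, num_labels=527):      # hypothetical label-vocabulary size
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),          # pool over time and frequency
            )
            self.classifier = nn.Linear(128, num_labels)

        def forward(self, x):                     # x: (batch, 1, frames, mel_bins)
            h = self.features(x).flatten(1)
            return self.classifier(h)             # logits; pair with BCEWithLogitsLoss

    # Example: a batch of 96-frame x 64-bin log-mel patches (illustrative sizes).
    logits = AudioCNN()(torch.randn(8, 1, 96, 64))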
Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection
TLDR
This paper presents the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), which has been publicly released to facilitate algorithm development and comparison, and introduces a state-of-the-art, jointly trained audio-visual model for real-time active speaker detection and compares several variants.
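The sketch below is only a schematic of a jointly trained audio-visual model of this kind, not the released baseline: an audio encoder over log-mel features and a visual encoder over face crops are fused and scored as speaking / not speaking per face-track segment. All shapes and layer sizes are assumptions.

    # Schematic audio-visual active speaker scorer (illustrative only).
    import torch
    import torch.nn as nn

    class AVActiveSpeaker(nn.Module):
        def __init__(self):
            super().__init__()
            self.audio = nn.Sequential(           # input: (batch, 1, frames, mel_bins)
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.video = nn.Sequential(           # input: (batch, 3, height, width) face crop
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.head = nn.Linear(32, 1)          # fused embedding -> speaking logit

        def forward(self, mel, face):
            a = self.audio(mel).flatten(1)
            v = self.video(face).flatten(1)
            return self.head(torch.cat([a, v], dim=1))

    # Example with made-up patch sizes.
    score = AVActiveSpeaker()(torch.randn(4, 1, 20, 64), torch.randn(4, 3, 96, 96))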
Non-negative matrix factorization based compensation of music for automatic speech recognition
TLDR
Non-negative matrix factorization based speech enhancement for robust automatic recognition of mixtures of speech and music is proposed and shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.
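The following sketch illustrates one common form of such semi-supervised NMF separation, under stated assumptions rather than the paper's exact algorithm: music bases W_m are learned beforehand from music-only spectrograms and held fixed, speech bases and activations are estimated on the mixture with KL-divergence multiplicative updates, and speech is recovered with a soft mask. separate_speech is a hypothetical helper name.

    import numpy as np

    def separate_speech(V, W_m, n_speech_bases=32, n_iter=200, eps=1e-9):
        # V: magnitude spectrogram of the speech+music mixture, shape (freq, time)
        # W_m: music basis vectors learned beforehand from music-only audio (held fixed)
        F, T = V.shape
        rng = np.random.default_rng(0)
        W_s = rng.random((F, n_speech_bases)) + eps              # speech bases, estimated here
        H = rng.random((W_m.shape[1] + n_speech_bases, T)) + eps
        for _ in range(n_iter):
            W = np.hstack([W_m, W_s])
            # Multiplicative updates for the KL-divergence NMF objective V ~ W H.
            H *= (W.T @ (V / (W @ H + eps))) / (W.T @ np.ones_like(V) + eps)
            update = ((V / (W @ H + eps)) @ H.T) / (np.ones_like(V) @ H.T + eps)
            W_s *= update[:, W_m.shape[1]:]                      # only speech bases adapt
        W = np.hstack([W_m, W_s])
        speech = W_s @ H[W_m.shape[1]:, :]
        music = W_m @ H[:W_m.shape[1], :]
        return V * speech / (speech + music + eps)               # Wiener-style masked speech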
Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification
TLDR
This paper uses audio from multi-class, YouTube-quality multimedia data to converge on a set of sound units such that each audio file is represented as a sequence of these units, and learns category language models over these unit sequences.
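A minimal sketch of the category-language-model idea, assuming clips have already been decoded into integer unit sequences: each category gets a smoothed bigram model over units, and a clip is assigned to the category whose model scores it highest. Function names and the smoothing scheme are illustrative choices, not the paper's.

    import numpy as np

    def train_bigram_lm(sequences, n_units, alpha=1.0):
        # Add-alpha smoothed bigram counts over acoustic-unit indices.
        counts = np.full((n_units, n_units), alpha)
        for seq in sequences:
            for a, b in zip(seq[:-1], seq[1:]):
                counts[a, b] += 1
        return np.log(counts / counts.sum(axis=1, keepdims=True))

    def classify(seq, lms):
        # Pick the category whose bigram LM gives the unit sequence the highest log-likelihood.
        scores = {c: sum(lm[a, b] for a, b in zip(seq[:-1], seq[1:])) for c, lm in lms.items()}
        return max(scores, key=scores.get)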
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
TLDR
A new dataset of densely labeled speech activity in YouTube videos, to be released publicly, is described, with the goal of creating a shared, available dataset for speech activity detection.
Engaging Collaborative Learners with Helping Agents
TLDR
A classroom study finds that dialogue-based support is more effective in this collaborative context when invitations for help, in the form of pointer hints, are offered automatically but dialogue agents are provided only when the invitation is explicitly accepted.
Audio event detection from acoustic unit occurrence patterns
TLDR
This paper develops a technique for detecting signature audio events based on identifying patterns of occurrence of automatically learned atomic units of sound, called Acoustic Unit Descriptors (AUDs).
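The sketch below conveys the occurrence-pattern idea in simplified form, with k-means standing in for the paper's learned acoustic units: each clip is mapped to a histogram of unit occurrences, and a classifier detects the event from that pattern. The frame features, helper names, and classifier choice are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    def clip_histogram(frames, kmeans):
        # Map a clip's frame features (n_frames, n_dims) to a normalized unit histogram.
        units = kmeans.predict(frames)
        counts = np.bincount(units, minlength=kmeans.n_clusters)
        return counts / max(counts.sum(), 1)

    def train_detector(train_frames, train_labels, n_units=64):
        # train_frames: list of (n_frames, n_dims) arrays; train_labels: 0/1 event tags.
        kmeans = KMeans(n_clusters=n_units, n_init=10).fit(np.vstack(train_frames))
        X = np.array([clip_histogram(f, kmeans) for f in train_frames])
        clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
        return kmeans, clf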
Motivation and collaborative behavior: an exploratory analysis
TLDR
An exploratory analysis of data from a collaborative learning study, viewed from the standpoint of the motivation types of students and their partners, suggests that a student's own motivation orientation may color their perception of the exchange of help in the collaboration.
Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers
TLDR
This work presents a system that associates faces with voices in a video by fusing information from the audio and visual signals and aggregating statistics across the video.
It's Not Easy Being Green: Supporting Collaborative "Green Design" Learning
TLDR
A study is presented comparing alternative forms of collaborative learning support in the midst of a collaborative design task in which students negotiate between increasing power and increasing environmental friendliness.
...
...