Principles of computerized tomographic imaging
Computerized Tomographic Imaging provides a tutorial overview of tomographic imaging, covering the mathematical principles and theory and their application to problems in medical imaging and other fields.
  • 4,250
  • 367
CNN architectures for large-scale audio classification
We use various CNN architectures to classify soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.
  • 845
  • 119
Construction and evaluation of a robust multifeature speech/music discriminator
We report on the construction of a real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input.
  • 960
  • 88
Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG.
How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This...
  • 406
  • 65
Video Rewrite: driving visual speech with audio
Video Rewrite uses existing footage to create automatically new video of a person mouthing words that she did not speak in the original footage.
  • 684
  • 47
Content-Based Music Information Retrieval: Current Directions and Future Challenges
The steep rise in music downloading over CD sales has created a major shift in the music industry away from physical media formats and towards online products and services.
  • 612
  • 43
An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank
This report describes an implementation of a cochlear model proposed by Roy Patterson [Patterson1992].
  • 508
  • 31
Limitations of Imaging with First-Order Diffraction Tomography
In this paper, the results of computer simulations used to determine the domains of applicability of the first-order Born and Rytov approximations in diffraction tomography for cross-sectional (or...
  • 561
  • 21
Locality-Sensitive Hashing for Finding Nearest Neighbors [Lecture Notes]
This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to quickly find similar entries in large databases.
  • 257
  • 19
FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks
FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker, projecting them onto a single axis.
  • 154
  • 17