• Publications
SphereFace: Deep Hypersphere Embedding for Face Recognition
TLDR
This paper proposes the angular softmax (A-Softmax) loss, which enables convolutional neural networks (CNNs) to learn angularly discriminative features for the deep face recognition (FR) problem under the open-set protocol.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
TLDR
This paper presents the setup of the DCASE2017 challenge tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
A vector Taylor series approach for environment-independent speech recognition
TLDR
This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.
Sphinx-4: a flexible open source framework for speech recognition
TLDR
Sphinx-4 is a flexible, modular, and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems and to provide researchers with a "research-ready" system.
Beyond Gaussian Pyramid: Multi-skip Feature Stacking for action recognition
TLDR
This work proposes a novel feature-enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks features extracted using a family of differential filters parameterized with multiple time skips and encodes shift-invariance into the frequency space. It also proves that MIFS enhances the learnability of differential-based features exponentially.
Speech denoising using nonnegative matrix factorization with priors
TLDR
A technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models is presented and improvements in speech quality across a range of interfering noise types are shown.
Missing-feature approaches in speech recognition
  • B. Raj, R. Stern
  • Computer Science
    IEEE Signal Processing Magazine
  • 26 September 2005
TLDR
Results confirm the effectiveness of all types of missing feature approaches discussed in ameliorating the effects of both stationary and transient noise, as well as the particular effectiveness of both soft masks and fragment decoding.
Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures
TLDR
A sparse latent variable model that can learn sounds based on their distribution of time/frequency energy is presented; it can be used to extract known types of sounds from mixtures in two scenarios.
Greedy sparsity-constrained optimization
TLDR
This paper presents a greedy algorithm, dubbed Gradient Support Pursuit (GraSP), for sparsity-constrained optimization, and quantifiable guarantees are provided for GraSP when cost functions have the “Stable Hessian Property”.