• Publications
Short-time Gaussianization for robust speaker verification
TLDR
In this paper, a novel approach for robust speaker verification, namely short-time Gaussianization, is proposed.
  • 139
  • 10
Compression of acoustic features for speech recognition in network environments
TLDR
In this paper, we describe a new compression algorithm for encoding acoustic features used in typical speech recognition systems.
  • 71
  • 8
The awe and mystery of t-norm
TLDR
We show that, under certain local assumptions, the T-norm performs a Gaussianization of the individual true and impostor score populations, and we further derive conditions for clockwise and counter-clockwise DET rotations caused by this transform.
  • 41
  • 3
Compensation of utterance length for speaker verification
TLDR
The effect of utterance length on the estimation of the likelihood of a speaker has seen only brief treatment in past work.
  • 20
  • 2
Pseudo Pitch Synchronous Analysis of Speech With Applications to Speaker Recognition
TLDR
This paper introduces Pseudo Pitch Synchronous (PPS) signal processing procedures that attempt to align each individual frame to its natural cycle and avoid truncation of pitch cycles while still using constant frame size and frame offset, in an effort to address the above problems.
  • 38
  • 1
Audio-visual speaker recognition using time-varying stream reliability prediction
TLDR
We examine a time-varying, context-dependent information fusion methodology for multi-stream authentication based on audio and video data collected simultaneously during a user's interaction with a system. The methodology outperforms the use of video or audio data alone, as well as data streams fused via concatenation.
  • 26
  • 1
Information fusion and decision cascading for audio-visual speaker recognition based on time-varying stream reliability prediction
TLDR
We examine techniques for multi-modal biometric information fusion for verification and identification of speakers, where the reliability of each data stream, either audio or video, is modeled with parameters that are time-varying and depend on the context created by its local behavior.
  • 34
  • 1
Transcription of new speaking styles - Voicemail
TLDR
Spontaneous speech occurring in day-to-day life can broadly be classified into two categories: (i) where the speaker does not receive any external feedback to direct his/her speech, and (ii) where the speaker receives external feedback from another person/machine/audience. Examples of the former category are radio broadcast news, voicemail, etc.
  • 9
  • 1
Audio-visual speech synchronization detection using a bimodal linear prediction model
TLDR
We propose a time-evolution model for AV features and derive an analytical approach to capture the notion of synchronization between them, with geometric visual features outperforming the image transform ones.
  • 11
  • 1
Hierarchical feature-based translation for scalable natural language understanding
TLDR
This paper addresses scalability issues in natural language understanding, and describes a method for performing the translation of a user input into a formal command in a hierarchical manner.
  • 9
  • 1