Learn More
A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is most commonly used. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative, likelihood ratio based VAD that trains speech and(More)
I4U is a joint entry of nine research Institutes and Universities across 4 continents to NIST SRE 2012. It started with a brief discussion during the Odyssey 2012 workshop in Singapore. An online discussion group was soon set up, providing a discussion platform for different issues surrounding NIST SRE'12. Noisy test segments, uneven multi-session training,(More)
The i-vector representation and PLDA classifier have shown state-of-the-art performance for speaker recognition systems. The availability of more than one enrollment utterance for a speaker allows a variety of configurations which can be used to enhance robustness to noise. The well-known technique of multicondition training can be utilized at different(More)
a r t i c l e i n f o a b s t r a c t The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization, from a practical viewpoint. We derive expressions for the(More)
Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument to use only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored due to additional signal(More)
In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each(More)
Total variability modeling, based on i-vector extraction of converting a variable-length sequence of feature vectors into a fixed-length i-vector, is currently an adopted parametriza-tion technique for state of-the-art speaker verification systems. However, when the number of the feature vectors is low, uncertainty in the i-vector representation as a point(More)
Human judgment is the final authority in forensic speaker recognition, but the use of modern speaker verification systems with accurate algorithms to perform the task under various circumstances has a huge potential to help the expert. The ultimate goal is to improve the accuracy of automatic systems when challenging data is provided and find a methodology(More)
In recent years, there have been significant advances in the field of speaker recognition that has resulted in very robust recognition systems. The primary focus of many recent developments have shifted to the problem of recognizing speakers in adverse conditions, e.g in the presence of noise/reverberation. In this paper, we present the UMD-JHU speaker(More)
In this work, the feature based on the group delay function from all-pole models (APGD) is proposed for pitched musical instrument recognition. Conventionally, the spectrum-related features take into account merely the magnitude information, whereas the phase is often overlooked due to the complications related to its interpretation. However, there is often(More)