Learn More
Reliable automatic detection of speech/non-speech activity in degraded, noisy audio signals is a fundamental and challenging task in robust signal processing. As various speech technology applications rely on the accuracy of a Voice Activity Detection (VAD) system for their effectiveness and robustness, the problem has gained considerable research interest(More)
Empathy measures the capacity of the therapist to experience the same cognitive and emotional dispositions as the patient, and is a key quality factor in counseling. In this work we build computational models to infer the empathy of therapist using prosodic cues. We extract pitch, energy, jitter, shimmer and utterance duration from the speech signal, and(More)
Missing feature theory (MFT) has demonstrated great potential for improving the noise robustness in speech recognition. MFT was mostly applied in the log-spectral domain since this is also the representation in which the masks have a simple formulation. However, with diagonally structured covariance matrices in the log-spectral domain, recognition(More)
Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another(More)
We present a self-learning algorithm using a bottom-up based approach to automatically discover, acquire and recognize the words of a language. First, an unsupervised technique using non-negative matrix factorization (NMF) discovers phone-sized time–frequency patches into which speech can be decomposed. The input matrix for the NMF is constructed for static(More)
This paper presents a simplified and supervised i-vector modeling framework that is applied in the task of robust and efficient speaker verification (SRE). First, by concatenating the mean supervector and the i-vector factor loading matrix with respectively the label vector and the linear classifier matrix, the traditional i-vectors are then extended to(More)
Missing data theory (MDT) has been applied to handle the problem of noise-robust speech recognition. Conventional MDT-systems require acoustic models that are expressed in the log-spectral rather than in the cepstral domain, which leads to a loss in accuracy. Therefore, we have already introduced a MDT-technique that can be applied in any feature domain(More)
In this paper, we propose robust features for the problem of voice activity detection (VAD). In particular, we extend the long term signal variability (LTSV) feature to accommodate multiple spectral bands. The motivation of the multi-band approach stems from the non-uniform frequency scale of speech phonemes and noise characteristics. Our analysis shows(More)
Automatically evaluating pronunciation quality of non-native speech has seen tremendous success in both research and commercial settings, with applications in L2 learning. In this paper, submitted for the INTERSPEECH 2015 Degree of Nativeness Sub-Challenge, this problem is posed under a challenging cross-corpora setting using speech data drawn from multiple(More)
The application of Missing Data Theory (MDT) has shown to improve the robustness of automatic speech recognition (ASR) systems. A crucial part in a MDT-based recognizer is the computation of the reliability masks from noisy data. To estimate accurate masks in environments with unknown, non-stationary noise statistics only weak assumptions can be made about(More)