Maarten Van Segbroeck

Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another ...
In this paper we describe improvements to the IBM speech activity detection (SAD) system for the third phase of the DARPA RATS program. The progress during this final phase comes from jointly training convolutional and regular deep neural networks with rich time-frequency representations of speech. With these additions, the phase 3 system reduces the equal ...
Reliable automatic detection of speech/non-speech activity in degraded, noisy audio signals is a fundamental and challenging task in robust signal processing. As various speech technology applications rely on the accuracy of a Voice Activity Detection (VAD) system for their effectiveness and robustness, the problem has gained considerable research interest ...
We present a self-learning algorithm that uses a bottom-up approach to automatically discover, acquire and recognize the words of a language. First, an unsupervised technique using non-negative matrix factorization (NMF) discovers phone-sized time–frequency patches into which speech can be decomposed. The input matrix for the NMF is constructed for static ...
Missing data theory (MDT) has been applied to handle the problem of noise-robust speech recognition. Conventional MDT systems require acoustic models that are expressed in the log-spectral rather than in the cepstral domain, which leads to a loss in accuracy. Therefore, we have already introduced an MDT technique that can be applied in any feature domain ...
We address the challenge of interpreting spoken input in a conversational dialogue system with an approach that aims to exploit the close relationship between the tasks of speech recognition and language understanding through joint modeling of these two tasks. Instead of using a standard pipeline approach where the output of a speech recognizer is the input ...
In this paper, we address the problem of Language Identification (LID) on short-duration segments. Current state-of-the-art LID systems typically employ total variability i-Vector modeling to obtain fixed-length representations of utterances. However, when the utterances are short, only a small amount of data is available, and the estimated i-Vector ...
Missing feature theory (MFT) has demonstrated great potential for improving noise robustness in speech recognition. MFT has mostly been applied in the log-spectral domain, since this is also the representation in which the masks have a simple formulation. However, with diagonally structured covariance matrices in the log-spectral domain, recognition ...
Empathy measures the capacity of the therapist to experience the same cognitive and emotional dispositions as the patient, and is a key quality factor in counseling. In this work we build computational models to infer the empathy of the therapist using prosodic cues. We extract pitch, energy, jitter, shimmer and utterance duration from the speech signal, and ...