Detection of filled pauses is a challenging research problem with several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to improve the performance of automatic speech recognition systems, or to predict the mental state of the speaker. This paper presents an algorithm for filled pause detection that is based ...
A popular approach to keyword search in speech files is phone lattice search. Recently, minimum edit distance (MED) has been used as a measure of similarity between strings, in place of simple string matching, when searching the phone lattice for the keyword. In this paper, we propose a variation of the MED, where the substitution penalties are ...
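For illustration, the general idea of an edit distance with non-uniform substitution penalties can be sketched as below. This is a minimal sketch, not the paper's method: the abstract is truncated before it says how the penalties are chosen, so the function names (`weighted_edit_distance`, `toy_sub_cost`) and the small table of "confusable" phone pairs are hypothetical.

```python
from typing import Callable, Sequence


def weighted_edit_distance(
    ref: Sequence[str],
    hyp: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Minimum edit distance between two phone sequences, using a
    caller-supplied substitution penalty instead of a fixed cost of 1."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = cheapest way to turn ref[:i] into hyp[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * del_cost
    for j in range(1, m + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,  # delete ref[i-1]
                dp[i][j - 1] + ins_cost,  # insert hyp[j-1]
                dp[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),
            )
    return dp[n][m]


# Hypothetical penalty table: an assumption for this sketch only.
_CONFUSABLE = {frozenset(("p", "b")), frozenset(("s", "z"))}


def toy_sub_cost(a: str, b: str) -> float:
    """Zero for a match, a reduced penalty for an assumed-confusable pair."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in _CONFUSABLE else 1.0
```

With this toy table, `weighted_edit_distance(["p", "ae", "t"], ["b", "ae", "t"], toy_sub_cost)` charges less than the same comparison against `["m", "ae", "t"]`, so a keyword match can tolerate likely recognizer confusions more readily than unlikely ones.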
We present a semi-supervised algorithm for rescoring the output of a speech keyword search (KWS) system. Conventional loss functions such as squared-error and logistic loss are not suitable for optimizing the commonly used KWS term-weighted value (TWV) performance metric. We derive a novel concave modified logistic log-likelihood function which lower-bounds ...
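For readers unfamiliar with the metric, the sketch below computes TWV as it is commonly defined in the NIST spoken term detection evaluations: one minus the per-term average of the miss probability plus a weighted false-alarm probability. The abstract does not spell out the formula, so the default weight `beta=999.9`, the one-trial-per-second convention, and the function name `term_weighted_value` are assumptions taken from that standard definition, not from the paper.

```python
from typing import Iterable, Tuple


def term_weighted_value(
    term_stats: Iterable[Tuple[int, int, int]],
    speech_seconds: float,
    beta: float = 999.9,
) -> float:
    """TWV at a fixed decision threshold.

    term_stats holds one (n_true, n_correct, n_false_alarm) triple per
    keyword. Keywords with no true occurrences are skipped, because
    their miss probability is undefined.
    """
    penalties = []
    for n_true, n_correct, n_false_alarm in term_stats:
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials: assumed one trial per second of speech,
        # minus the true occurrences of the keyword.
        p_fa = n_false_alarm / (speech_seconds - n_true)
        penalties.append(p_miss + beta * p_fa)
    if not penalties:
        raise ValueError("no keyword with at least one true occurrence")
    return 1.0 - sum(penalties) / len(penalties)
```

Because the per-term penalty depends on hard detection counts, TWV is not a smooth function of the system's scores, which is consistent with the abstract's point that standard losses do not optimize it directly.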
We present an analysis of several publicly available automatic speech recognizers (ASRs) in terms of their suitability for use in different types of dialogue systems. We focus in particular on cloud-based ASRs that have recently become available to the community. We include features of ASR systems and desiderata and requirements for different dialogue ...
Researchers have shown that fusion of categorical labels from multiple experts (humans or machine classifiers) improves the accuracy and generalizability of the overall classification system. Simple plurality is a popular technique for performing this fusion, but it gives equal importance to labels from all experts, who may not be equally reliable or ...
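The contrast between plurality fusion and a reliability-aware alternative can be sketched as follows. This is only an illustration of the problem the abstract describes, not the fusion method the paper proposes (the abstract is truncated before describing it); the function names and the per-expert weights are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def plurality_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; every expert counts equally.
    Ties go to the label that appears first in the input."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight, where each weight
    is an assumed reliability score for the corresponding expert."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

For example, with labels `["a", "b", "b"]`, plurality picks `"b"`, while weights of `[0.9, 0.3, 0.3]` let the single more reliable expert prevail and the weighted vote picks `"a"`.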
In this paper, we present a systems approach for channel modeling of an Automatic Speech Recognition (ASR) system. This can have implications for improving speech recognition components, such as through discriminative language modeling. We simulate the ASR corruption using a phrase-based machine translation system trained between the reference phoneme and ...
Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to ...
Practical supervised learning scenarios involving subjectively evaluated data have multiple evaluators, each giving their own noisy version of the hidden ground truth. Majority logic combination of labels assumes equally skilled evaluators and is generally suboptimal. Previously proposed models have assumed data-independent evaluator behavior. This paper ...
Speech and spoken language cues offer a valuable means to measure and model human behavior. Computational models of speech behavior have the potential to support health care through assistive technologies, informed intervention, and efficient long-term monitoring. The Interspeech 2013 Autism Sub-Challenge addresses two developmental disorders that manifest ...
Diversity or complementarity of automatic speech recognition (ASR) systems is crucial for achieving a reduction in word error rate (WER) upon fusion using the ROVER algorithm. We present a theoretical proof explaining this often-observed link between ASR system diversity and ROVER performance. This is in contrast to many previous works that have only ...