Learn More
Detection of filled pauses is a challenging research problem which has several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to improve the performance of automatic speech recognition systems or to predict the mental state of the speaker. This paper presents an algorithm for filled pause detection that is based(More)
A popular approach for keyword search in speech files is the phone lattice search. Recently minimum edit distance (MED) has been used as a measure of similarity between strings rather than using simple string matching while searching the phone lattice for the keyword. In this paper, we propose a variation of the MED, where the substitution penalties are(More)
Professional manual transcription of speech is an expensive and time consuming process. This paper focuses on the problem of combining noisy transcriptions frommultiple non-expert transcribers, where the quality of work from each worker varies. Computing transcriber reliability is a difficult task in the absence of gold standard reference transcripts. Three(More)
We present a semi-supervised algorithm for rescoring the output of a speech keyword search (KWS) system. Conventional loss functions such as squared-error and logistic loss are not suitable for optimizing the commonly-used KWS term-weighted value (TWV) performance metric. We derive a novel concave modified logistic log-likelihood function which lower-bounds(More)
Researchers have shown that fusion of categorical labels from multiple experts - humans or machine classifiers - improves the accuracy and generalizability of the overall classification system. Simple plurality is a popular technique for performing this fusion, but it gives equal importance to labels from all experts, who may not be equally reliable or(More)
We present an analysis of several publicly available automatic speech recogniz-ers (ASRs) in terms of their suitability for use in different types of dialogue systems. We focus in particular on cloud based ASRs that recently have become available to the community. We include features of ASR systems and desiderata and requirements for different dialogue(More)
Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to(More)
In this paper, we present a systems approach for channel mod-eling of an Automatic Speech Recognition (ASR) system. This can have implications in improving speech recognition components , such as through discriminative language modeling. We simulate the ASR corruption using a phrase-based machine translation system trained between the reference phoneme and(More)
Practical supervised learning scenarios involving subjectively evaluated data have multiple evaluators, each giving their noisy version of the hidden ground truth. Majority logic combination of labels assumes equally skilled evaluators, and is generally suboptimal. Previously proposed models have assumed data independent evaluator behavior. This paper(More)
This paper examines the impact of multilingual (ML) acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data.(More)