Detection of filled pauses is a challenging research problem which has several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to improve the performance of automatic speech recognition systems or to predict the mental state of the speaker. This paper presents an algorithm for filled pause detection that is based...
A popular approach for keyword search in speech files is the phone lattice search. Recently, minimum edit distance (MED) has been used as a measure of similarity between strings, rather than simple string matching, while searching the phone lattice for the keyword. In this paper, we propose a variation of the MED, where the substitution penalties are...
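A minimal sketch of the general idea in the abstract above: a minimum edit distance over phone sequences in which the substitution penalty is supplied by a function instead of being fixed. The abstract is truncated before it states how the penalties are set, so the `CONFUSABLE` pairs and the 0.5 penalty below are hypothetical placeholders, not the paper's scheme.

```python
from typing import Callable, Sequence

def weighted_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Dynamic-programming edit distance with a pluggable substitution cost."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into the empty sequence costs one deletion per symbol.
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    # Building a prefix from the empty sequence costs one insertion per symbol.
    for j in range(1, cols):
        dist[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,
                dist[i][j - 1] + ins_cost,
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]

def uniform_cost(a: str, b: str) -> float:
    """Standard MED: every mismatch costs 1."""
    return 0.0 if a == b else 1.0

# Hypothetical phone pairs treated as easily confused (assumption for illustration).
CONFUSABLE = {frozenset(("p", "b")), frozenset(("t", "d")), frozenset(("s", "z"))}

def confusion_aware_cost(a: str, b: str) -> float:
    """Cheaper substitution for confusable phones (the 0.5 value is an assumption)."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0

keyword = ["k", "ae", "t"]
hypothesis = ["k", "ae", "d"]
print(weighted_edit_distance(keyword, hypothesis, uniform_cost))
print(weighted_edit_distance(keyword, hypothesis, confusion_aware_cost))
```

With uniform penalties the two phone strings are one full substitution apart. With the hypothetical confusion-aware penalties the same mismatch costs less, so a lattice path containing a likely recognition confusion ranks closer to the keyword.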
Professional manual transcription of speech is an expensive and time-consuming process. This paper focuses on the problem of combining noisy transcriptions from multiple non-expert transcribers, where the quality of work from each worker varies. Computing transcriber reliability is a difficult task in the absence of gold standard reference transcripts. Three...
Non-verbal speech cues serve multiple functions in human interaction, such as maintaining the conversational flow as well as expressing emotions, personality, and interpersonal attitude. In particular, non-verbal vocalizations such as laughter are associated with affective expressions, while vocal fillers are used to hold the floor during a conversation. The...
One of the most difficult speech recognition tasks is accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major speech recognition improvements on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most...
We present a semi-supervised algorithm for rescoring the output of a speech keyword search (KWS) system. Conventional loss functions such as squared-error and logistic loss are not suitable for optimizing the commonly used KWS term-weighted value (TWV) performance metric. We derive a novel concave modified logistic log-likelihood function which lower-bounds...
Researchers have shown that fusion of categorical labels from multiple experts - humans or machine classifiers - improves the accuracy and generalizability of the overall classification system. Simple plurality is a popular technique for performing this fusion, but it gives equal importance to labels from all experts, who may not be equally reliable or...
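To illustrate the limitation of simple plurality described in the abstract above, here is a small sketch that contrasts plurality voting with a reliability-weighted vote. The weighted variant is a generic illustration only, not the method proposed in the paper (the abstract is truncated before describing it), and the labels and reliability weights are made-up values.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence

def plurality_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; every expert counts equally.

    Ties go to the label that appeared first in the input.
    """
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight across experts."""
    scores: dict = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Three experts label the same item (hypothetical data).
labels = ["laughter", "filler", "filler"]
# Hypothetical reliabilities: the first expert is trusted far more than the others.
reliabilities = [0.9, 0.3, 0.3]

print(plurality_vote(labels))
print(weighted_vote(labels, reliabilities))
```

Plurality picks the label given by two of the three experts. The weighted vote lets the single, more reliable expert outweigh them. In practice the weights would have to be estimated, which is the hard part when no reference labels exist.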
We present an analysis of several publicly available automatic speech recognizers (ASRs) in terms of their suitability for use in different types of dialogue systems. We focus in particular on cloud-based ASRs that recently have become available to the community. We include features of ASR systems and desiderata and requirements for different dialogue...
Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to...
This paper examines the impact of multilingual (ML) acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low-resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data...