Martin Graciarena

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard …
To date, studies of deceptive speech have largely been confined to descriptive accounts and observations from subjects, researchers, or practitioners, with few empirical studies of the specific lexical or acoustic/prosodic features that may characterize deceptive speech. We present results from a study seeking to distinguish deceptive from non-deceptive …
Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variability compared to automated …
We report on machine learning experiments to distinguish deceptive from nondeceptive speech in the Columbia-SRI-Colorado (CSC) corpus. Specifically, we propose a system combination approach using different models and features for deception detection. Scores from an SVM system based on prosodic/lexical features are combined with scores from a Gaussian …
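The score-level combination described above can be sketched as follows. This is a minimal illustration, not the authors' exact method: the fusion weight, threshold, and function names are assumptions, and the abstract does not specify how the SVM and GMM scores were calibrated before fusion.

```python
# Hypothetical sketch of score-level system combination for deception
# detection: a per-utterance score from an SVM over prosodic/lexical
# features is linearly fused with a score from a GMM-based system.
# Weight and threshold values are illustrative assumptions.

def fuse_scores(svm_score: float, gmm_score: float, weight: float = 0.6) -> float:
    """Weighted linear fusion of two per-utterance classifier scores."""
    return weight * svm_score + (1.0 - weight) * gmm_score

def classify(svm_score: float, gmm_score: float, threshold: float = 0.0) -> str:
    """Label an utterance by thresholding the fused score."""
    fused = fuse_scores(svm_score, gmm_score)
    return "deceptive" if fused > threshold else "non-deceptive"
```

In practice the fusion weight would be tuned on held-out data, and the two systems' scores would typically be normalized to a common scale first.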
The CALO Meeting Assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech …
We introduce a new database for evaluation of speaker recognition systems. This database involves types of variability already seen in NIST speaker recognition evaluations (SREs), like language, channel, speech style, and vocal effort, and new types not yet available on any standard database, like severe noise and reverberation. The database is created using …
Deep Neural Network (DNN)-based acoustic models have shown significant improvement over their Gaussian Mixture Model (GMM) counterparts in the last few years. While several studies exist that evaluate the performance of GMM systems under noisy and channel-degraded conditions, noise robustness studies on DNN systems have been far fewer. In this work we …
We describe the ICSI-SRI-UW team's entry in the Spring 2004 NIST Meeting Recognition Evaluation. The system was derived from SRI's 5xRT Conversational Telephone Speech (CTS) recognizer by adapting CTS acoustic and language models to the Meeting domain, adding noise reduction and delay-sum array processing for far-field recognition, and postprocessing for …
We describe the large-vocabulary automatic speech recognition system developed for Modern Standard Arabic by the SRI/Nightingale team, and used for the 2007 GALE evaluation as part of the speech translation system. We show how system performance is affected by different development choices, ranging from text processing and lexicon to decoding system …
The goal of this work was to explore modeling techniques to improve bird species classification from audio samples. We first developed an unsupervised approach to obtain approximate note models from acoustic features. From these note models we created a bird species recognition system by leveraging a phone n-gram statistical model developed for speaker …
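The phone n-gram idea above can be illustrated with a toy sketch: each recording is decoded into a sequence of discrete note labels, and a per-species bigram model over those labels scores new sequences. The function names, add-alpha smoothing, and bigram order are assumptions for illustration, not the paper's exact model.

```python
# Toy per-species bigram model over discrete note labels, in the spirit of
# phone n-gram modeling. Smoothing scheme and interfaces are illustrative
# assumptions, not the authors' implementation.
import math
from collections import Counter

def train_bigram(sequences):
    """Count note bigrams and context unigrams over training sequences."""
    bigrams, unigrams = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def log_likelihood(seq, model, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram log-likelihood of a note sequence."""
    bigrams, unigrams = model
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        p = (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab_size)
        score += math.log(p)
    return score
```

A recording would then be classified by scoring its note sequence against each species' model and picking the highest log-likelihood.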