Tasuku Oonishi

When using Weighted Finite State Transducers (WFSTs) in speech recognition, on-the-fly composition approaches have been proposed as a method of reducing memory consumption and increasing flexibility during decoding. We have recently implemented several fast on-the-fly techniques, namely dead-end state avoidance, dynamic pushing, and state sharing, in our …
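A minimal sketch of the core mechanism behind on-the-fly composition, assuming epsilon-free transducers over the tropical semiring; the `WFST` and `LazyComposition` classes and their API are illustrative stand-ins, not the decoder's actual implementation:

```python
from collections import defaultdict

class WFST:
    """Minimal weighted transducer: arcs[state] = [(ilabel, olabel, weight, next_state)]."""
    def __init__(self):
        self.arcs = defaultdict(list)

    def add_arc(self, state, ilabel, olabel, weight, next_state):
        self.arcs[state].append((ilabel, olabel, weight, next_state))

class LazyComposition:
    """On-the-fly composition of two WFSTs.

    A composed state is a pair (state_a, state_b); its outgoing arcs are
    built only when the decoder first visits it, so the full composed
    search network is never materialised in memory.
    """
    def __init__(self, a, b, start=(0, 0)):
        self.a, self.b = a, b
        self.start = start
        self._cache = {}

    def arcs(self, pair):
        if pair not in self._cache:
            sa, sb = pair
            self._cache[pair] = [
                (il_a, ol_b, w_a + w_b, (na, nb))   # tropical semiring: weights add
                for (il_a, ol_a, w_a, na) in self.a.arcs[sa]
                for (il_b, ol_b, w_b, nb) in self.b.arcs[sb]
                if ol_a == il_b                     # A's output must feed B's input
            ]
        return self._cache[pair]

# Toy usage: compose a one-arc lexicon-like machine with a grammar-like one.
a, b = WFST(), WFST()
a.add_arc(0, "a", "x", 0.5, 1)
b.add_arc(0, "x", "X", 1.0, 1)
c = LazyComposition(a, b)
print(c.arcs(c.start))  # [('a', 'X', 1.5, (1, 1))]
```

Note that lazily expanded pair states can still turn out to be dead ends from which no final state is reachable; avoiding such states is one of the speed-up techniques mentioned above.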
In this paper we present evaluations of the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a fast, scalable, flexible decoder to operate on weighted finite state transducer (WFST) search spaces. Even though the development of the decoder is still in its infancy, we have already implemented a …
In this paper we present a fast method for computing acoustic likelihoods that makes use of a Graphics Processing Unit (GPU). After enabling GPU acceleration, the main processor runtime dedicated to acoustic scoring tasks is reduced from the largest consumer to just a few percent, even when using mixture models with a large number of Gaussian components.
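A sketch of the batched acoustic-likelihood computation that benefits from GPU offload, shown here on the CPU with NumPy: recasting the per-frame, per-component diagonal-covariance Gaussian score as dense matrix products is what lets a GPU evaluate the whole frame-by-component score table at once. The function and its parameters are illustrative assumptions, not the decoder's actual interface:

```python
import numpy as np

def gmm_frame_loglikes(frames, weights, means, variances):
    """Log-likelihood of each frame under a diagonal-covariance GMM.

    frames:    (T, D) feature vectors
    weights:   (M,)   mixture weights
    means:     (M, D) component means
    variances: (M, D) diagonal covariances
    """
    inv_var = 1.0 / variances                                  # (M, D)
    # Per-component constant: log w - 0.5 * (D log 2pi + sum log var + sum mu^2/var)
    const = (np.log(weights)
             - 0.5 * (means.shape[1] * np.log(2 * np.pi)
                      + np.log(variances).sum(axis=1)
                      + (means ** 2 * inv_var).sum(axis=1)))   # (M,)
    # Expand -(x - mu)^2 / (2 var) into dense matrix products so the whole
    # T x M score table is computed with GEMMs -- the formulation that
    # makes GPU offload pay off.
    ll = (const
          + frames @ (means * inv_var).T                       # (T, M)
          - 0.5 * (frames ** 2) @ inv_var.T)
    # Log-sum-exp over mixture components, stabilised by the row maximum.
    m = ll.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(ll - m).sum(axis=1, keepdims=True))).ravel()

# Toy usage: 100 frames of 39-dim features scored against a 32-component GMM.
rng = np.random.default_rng(0)
T, D, M = 100, 39, 32
scores = gmm_frame_loglikes(rng.normal(size=(T, D)),
                            np.full(M, 1.0 / M),
                            rng.normal(size=(M, D)),
                            rng.uniform(0.5, 2.0, size=(M, D)))
print(scores.shape)  # (100,)
```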
In large vocabulary continuous speech recognition (LVCSR), the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations, moving this burden off the main processor. In this paper we …
This paper describes our system for the "NEWS 2009 Machine Transliteration Shared Task" (NEWS 2009). We only participated in the standard run, which is a direct orthographical mapping (DOP) between two languages without using any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOP machine transliteration, in …
In the Weighted Finite State Transducer (WFST) framework for speech recognition, we can reduce memory usage and increase flexibility by using on-the-fly composition, which generates the search network dynamically during decoding. Methods have also been proposed for optimizing WFSTs in on-the-fly composition; however, these operations place restrictions on …
Text corpus size is an important issue when building a language model (LM), in particular where insufficient training and evaluation data are available. In this paper we continue our work on creating a speech recognition system with an LM that is trained on a small amount of text in the target language. In order to get better performance we use a large …
In a speech recognition system, a Voice Activity Detector (VAD) is a crucial component, not only for maintaining accuracy but also for reducing computational cost. Front-end approaches that drop non-speech frames typically attempt to detect speech frames by utilizing speech/non-speech classification information such as the zero crossing rate or …
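A minimal frame-level sketch along the lines of the front-end approach described above, using short-time energy together with the zero crossing rate; the frame sizes, thresholds, and decision rule are assumptions for illustration only, and real systems tune such parameters or learn a classifier:

```python
import numpy as np

def simple_vad(samples, frame_len=400, hop=160,
               energy_thresh=1e-3, zcr_thresh=0.25):
    """Flag each frame as speech (True) or non-speech (False).

    Uses two classic front-end cues: short-time energy and the
    zero crossing rate (ZCR). Thresholds here are illustrative.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # ZCR: fraction of adjacent sample pairs whose sign differs.
        zcr = float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))
        # Toy rule: loud, low-ZCR frames are kept as speech.
        flags.append(energy > energy_thresh and zcr < zcr_thresh)
    return np.array(flags)

# Toy usage: 1 s at 16 kHz -- a tone (speech-like) followed by faint noise.
t = np.arange(16000) / 16000.0
signal = np.concatenate([0.5 * np.sin(2 * np.pi * 200 * t[:8000]),
                         0.001 * np.random.randn(8000)])
print(simple_vad(signal).astype(int))  # 1s for the tone, 0s for the noise
```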