Large Vocabulary Speech Recognition on Parallel Architectures
Clock speeds of modern processors have remained roughly constant in recent years, while integration capacity continues to follow Moore's law; to remain scalable, applications must therefore be parallelized. In addition to the main CPU, almost every computer is equipped with a Graphics Processing Unit (GPU), which is in essence a specialized parallel processor. This paper explores how the performance of large vocabulary speech recognition systems can be enhanced by using the A* algorithm, which parallelizes better than the Viterbi algorithm, together with a GPU for the acoustic computations. First experiments with a “unigram approximation” heuristic resulted in approximately 8.7 times fewer states being explored than with our classical Viterbi decoder. The multi-threaded implementation of the A* decoder, combined with GPU acoustic computation, achieved a speed-up factor of 5.2 over its sequential counterpart and a 5% absolute accuracy improvement over the sequential Viterbi search at real time.
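To illustrate the search strategy the abstract refers to, here is a minimal, generic A* best-first search sketch. It is not the paper's decoder: the graph, costs, and heuristic values below are hypothetical stand-ins for a decoding lattice and the "unigram approximation" heuristic; the point is only that an admissible heuristic lets A* expand fewer states than an exhaustive Viterbi-style sweep.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Best-first search expanding the state with the lowest f = g + h.

    g is the exact cost so far; h is a lower-bound estimate of the
    remaining cost. A tighter h prunes more of the search space, which
    is the effect reported for the unigram-approximation heuristic.
    """
    # Frontier entries: (f, g, state, path-so-far).
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        for nxt, cost in neighbors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + heuristic(nxt), new_g, nxt, path + [nxt]),
                )
    return None  # goal unreachable

# Toy graph standing in for a decoding lattice (hypothetical data).
graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 1), ("g", 5)],
    "b": [("g", 1)],
    "g": [],
}
h = {"s": 2, "a": 2, "b": 1, "g": 0}  # admissible lower bounds

cost, path = a_star("s", "g", lambda x: graph[x], lambda x: h[x])
# cost == 3, path == ["s", "a", "b", "g"]
```

In a real decoder the "states" would be (HMM state, time frame) pairs, edge costs would come from acoustic and language model scores, and the heuristic from a cheaper unigram language model; the skeleton above only conveys the control flow.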