• Corpus ID: 14177520

Sphinx-4: a flexible open source framework for speech recognition

@inproceedings{Walker2004Sphinx4AF,
  title={Sphinx-4: a flexible open source framework for speech recognition},
  author={William Walker and Paul Lamere and Philip Kwok and Bhiksha Raj and Rita Singh and Evandro B. Gouv{\^e}a and Peter Wolf and Joseph Woelfel},
  year={2004}
}
Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx-4 also includes several implementations of both simple and…

Sautrela: a highly modular open source speech recognition framework

TLDR
The aim of Sautrela is to unify in a single framework almost all the tasks related to pattern recognition such as signal processing, model training and decoding, which ensures its portability to a large variety of computer platforms.

A review of speech recognition with Sphinx engine in language detection

TLDR
The Sphinx approach is applied to combine the advantages of a sequential modeling structure with pattern classification in speech recognition, to assist the next phase of the research, which focuses on building an Arabic-language speech recognizer using the Sphinx4 engine.

Efficient adaptations of the SphinxTrain procedure for building a robust ASR system in Slovak

  • J. Kacur, J. Vojtko
  • Computer Science
    2008 15th International Conference on Systems, Signals and Image Processing
  • 2008
TLDR
The suggested and realized modifications to the classical SphinxTrain procedure for a given database and the Slovak language brought improved overall results as well.

RASR - The RWTH Aachen University Open Source Speech Recognition Toolkit

RASR is the open source version of the well-proven speech recognition toolkit developed and used at RWTH Aachen University. The current version of the package includes state of the art speech

The Kaldi Speech Recognition Toolkit

TLDR
The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.

The Bavieca open-source speech recognition toolkit

  • Daniel Bolaños
  • Computer Science
    2012 IEEE Spoken Language Technology Workshop (SLT)
  • 2012
TLDR
The design of Bavieca is described, an open-source speech recognition toolkit intended for speech research and system development that presents a simple and modular design with an emphasis on scalability and reusability.

The training of Slovak speech recognition system based on Sphinx 4 for GSM networks

TLDR
The training process of HMM models that are designed to be used in ASR systems employed in GSM networks using the facility of the SphinxTrain system adjusted for the structure of MOBILDAT database and the Slovak language is presented.

Isolated Swahili Words Recognition using Sphinx4

TLDR
This paper proposes an approach to building a Swahili speech recognizer using Sphinx4 to demonstrate its adaptability to recognition of spoken Swahili words, and examines the Swahili language structure and sound synthesis processes.

The AhoSR Automatic Speech Recognition System

TLDR
The basic architecture and the most relevant aspects of the AhoSR speech recognition system are introduced, along with the results of several experiments that validate the system for use in different tasks: phonetic, grammar-based and LM-based recognition.

Automatic Urdu Speech Recognition using Hidden Markov Model

TLDR
Experimental results suggest that better recognition accuracy has been achieved with this approach, as compared to the previous results reported on this corpus of Urdu.
...

References

SHOWING 1-10 OF 38 REFERENCES

Design of the CMU sphinx-4 decoder

The decoder of the sphinx-4 speech recognition system incorporates several new design strategies which have not been used earlier in conventional decoders of HMM-based large vocabulary speech

An overview of the SPHINX speech recognition system

TLDR
SPHINX is a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition, based on discrete hidden Markov models with LPC- (linear-predictive-coding) derived parameters.

From Sphinx-II to Whisper — Making Speech Recognition Usable

TLDR
This chapter reviews Sphinx-II, a large-vocabulary speaker-independent continuous speech recognition system developed at Carnegie Mellon University, and reviews Whisper, a system developed here at Microsoft Corporation, focusing on recognition accuracy, efficiency and usability issues.

A public domain speech-to-text system

TLDR
The core components of an available state-of-the-art Speech-to-Text system are presented: an acoustic processor which converts the speech signal into a sequence of feature vectors; a training module which estimates the parameters for a Hidden Markov Model; a linguistic processor which predicts the next word given a sequence of previously recognized words; and a search engine which finds the most probable word sequence given a set of feature vectors.

The SPHINX-II speech recognition system: an overview

TLDR
The SPHINX-II speech recognition system is reviewed and recent efforts on improved speech recognition are summarized.

Audio-visual continuous speech recognition using a coupled hidden Markov model

TLDR
The experimental results show that the current system, tested on the XM2VTS database, reduces the error rate of the audio-only speech recognition system at an SNR of 0 dB by over 55%.

The HARPY speech recognition system

TLDR
The HARPY system is the result of an attempt to understand the relative importance of various design choices of two earlier speech recognition systems developed at Carnegie-Mellon University, in which knowledge is represented as a finite state transition network but without the a-priori transition probabilities.

The 1996 Hub-4 Sphinx-3 System

TLDR
The model structure, acoustic modeling, language modeling, lexical modeling, and system structure are summarized and the experimental results obtained with this system on the most recent DARPA evaluation are discussed.

The DARPA 1000-word resource management database for continuous speech recognition

A database of continuous read speech has been designed and recorded within the DARPA strategic computing speech recognition program. The data is intended for use in designing and evaluating

The DRAGON system--An overview

This paper briefly describes the major features of the DRAGON speech understanding system. DRAGON makes systematic use of a general abstract model to represent each of the knowledge sources necessary