Learn More
—We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance is observed when the training data are sorted by their relevance. We introduce hash-based implementation of a maximum entropy model, that can be trained as a part of the neural network model. This(More)
We present results obtained with several advanced language modeling techniques, including class based model, cache model, maximum entropy model, structured language model, random forest language model and several types of neural network based language models. We show results obtained after combining all these models by using linear interpolation. We(More)
—We present a freely available open-source toolkit for training recurrent neural network based language models. It can be easily used to improve existing speech recognition and machine translation systems. Also, it can be used as a baseline for future research of advanced language modeling techniques. In the paper, we discuss optimal parameter selection and(More)
Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive search-space of sentence-hypotheses. Instead, an N-best list of hypotheses is created using tractable n-gram models, and rescored using the long-span models. It is shown in this paper(More)
Applications of Deep Belief Nets (DBN) to various problems have been the subject of a number of recent studies ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy(More)
This paper investigates the use of deep belief networks (DBN) for semantic tagging, a sequence classification task, in spoken language understanding (SLU). We evaluate the performance of the DBN based sequence tagger on the well-studied ATIS task and compare our technique to conditional random fields (CRF), a state-of-the-art classifier for sequence(More)
In this paper, we present strategies to incorporate long context information directly during the first pass decoding and also for the second pass lattice re-scoring in speech recognition systems. Long-span language models that capture complex syntactic and/or semantic information are seldom used in the first pass of large vocabulary continuous speech(More)
In this paper, we explore the model combination problem for rescoring Automatic Speech Recognition (ASR) hypotheses. We use minimum Empirical Bayes Risk for the optimization criterion and Deterministic Annealing techniques to search through the non-convex parameter space. Our experiments on the DARPA WSJ task using several different language models showed(More)
Models for statistical spoken language understanding (SLU) systems are conventionally trained using supervised discriminative training methods. In many cases, however, labeled data necessary for these supervised techniques is not readily available necessitating a laborious data collection and annotation effort. This often results into data sets that are not(More)