ACL Lifetime Achievement Award: The Dawn of Statistical ASR and MT

@article{Jelinek2009ACLLA,
  title={ACL Lifetime Achievement Award: The Dawn of Statistical ASR and MT},
  author={Frederick Jelinek},
  journal={Computational Linguistics},
  year={2009},
  volume={35},
  pages={483--494}
}
  • F. Jelinek
  • Published 1 December 2009
  • Education
  • Computational Linguistics
I am very grateful for the award you have bestowed on me. To understand your generosity I have to assume that you are honoring the leadership of three innovative groups that I headed in the last 47 years: at Cornell, IBM, and now at Johns Hopkins. You know my co-workers in the last two teams. The Cornell group was in Information Theory and included Toby Berger, Terrence Fine, and Neil J. A. Sloane (earlier my Ph.D. student), all of whom earned their own laurels. I was told that I should give an… 

Linguistics: The Garden and the Bush

  • J. Bresnan
  • Linguistics
    Computational Linguistics
  • 2016
TLDR
The title of the talk describes two fields of linguistics, which differ in their approaches to data and analysis and in their fundamental concepts.

Fred Jelinek

  • M. Liberman
  • Computer Science
    Computational Linguistics
  • 2010
Frederick Jelinek died, peacefully and unexpectedly, on 14 September 2010. Over a distinguished career of nearly fifty years, Fred made important contributions in areas ranging from coding theory and…

Syntax-based language models for statistical machine translation

TLDR
This dissertation attempts to improve the fluency of machine translation output by explicitly incorporating models of target-language structure into machine translation systems, proposing a decoding framework that decouples the structure of the source sentence from that of the target sentence.

Multi-word tokenization for natural language processing

TLDR
The central idea presented in this thesis is multi-word tokenization (MWT), i.e. MWU-aware tokenization, as a preprocessing step for NLP systems, with the aim of driving research towards NLP applications that understand unrestricted natural language.
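
As a minimal sketch of the MWT idea, the snippet below greedily merges known multi-word units (MWUs) into single tokens before downstream processing. The toy lexicon and names are illustrative assumptions, not taken from the thesis.

```python
# Minimal sketch of MWU-aware tokenization: greedily merge known
# multi-word units (MWUs) into single tokens before downstream NLP.
# The tiny lexicon here is illustrative, not from the thesis.

MWU_LEXICON = {
    ("new", "york", "city"),
    ("kick", "the", "bucket"),
    ("in", "spite", "of"),
}
MAX_MWU_LEN = max(len(m) for m in MWU_LEXICON)

def mwt_tokenize(text: str) -> list[str]:
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Prefer the longest known MWU starting at position i.
        for span in range(min(MAX_MWU_LEN, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + span])
            if candidate in MWU_LEXICON:
                tokens.append("_".join(candidate))  # one token per MWU
                i += span
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(mwt_tokenize("He lives in New York City in spite of the cost"))
# ['he', 'lives', 'in', 'new_york_city', 'in_spite_of', 'the', 'cost']
```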

Phonetics of Endangered Languages

…astounding array of languages, 6,909, by the count of the Ethnologue (Lewis, 2009). Most of these use an acoustic signal as the main element in signal transmission, though vision affects speech even…

The Causal Nature of Modeling with Big Data

TLDR
Modeling with big data is shown to lack a pronounced hierarchical, nested structure, and the significance of the transition to such "horizontal" modeling is underlined by the concurrent emergence of novel inductive methodology in statistics, such as non-parametric statistics.

Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition

TLDR
A novel neural network language model structure, the succeeding-word RNNLM (su-RNNLM), is proposed, which is more efficient to train than bi-directional models and can be applied to lattice rescoring.

Future word contexts in neural network language models

TLDR
A novel neural network structure, succeeding-word RNNLMs (su-RNNLMs), is proposed, in which a feedforward unit models a finite number of succeeding (future) words; it can be trained much more efficiently and used for lattice rescoring.
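
The two entries above describe the same model. As a rough illustration of the stated design (a recurrent state summarizes the history while a feedforward unit summarizes a fixed window of k succeeding words), here is a minimal numpy forward-pass sketch; all shapes, names, and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Rough forward-pass sketch of a succeeding-word RNNLM (su-RNNLM):
# a recurrent state summarizes the history, while a feedforward unit
# summarizes a fixed window of k future words. Illustrative only.

rng = np.random.default_rng(0)
V, d, h, k = 50, 16, 32, 3            # vocab, embedding, hidden, future window

E   = rng.normal(0, 0.1, (V, d))      # word embeddings
W_h = rng.normal(0, 0.1, (h, h))      # recurrent weights
W_x = rng.normal(0, 0.1, (h, d))      # input weights
W_f = rng.normal(0, 0.1, (h, k * d))  # feedforward unit over future words
W_o = rng.normal(0, 0.1, (V, 2 * h))  # output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def su_rnnlm_step(state, w_t, future_ids):
    """Return new state and P(w_{t+1} | history, k succeeding words)."""
    state = np.tanh(W_h @ state + W_x @ E[w_t])       # history RNN step
    fut = np.tanh(W_f @ np.concatenate([E[w] for w in future_ids]))
    return state, softmax(W_o @ np.concatenate([state, fut]))

state = np.zeros(h)
state, probs = su_rnnlm_step(state, w_t=4, future_ids=[7, 9, 2])
print(probs.shape, round(probs.sum(), 6))             # (50,) 1.0
```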

Large-scale semi-supervised learning for natural language processing

TLDR
This dissertation proposes effective, efficient, versatile methodologies for extracting useful information from very large (potentially web-scale) volumes of unlabeled data and combining such information with standard supervised machine learning for NLP, and proposes a general approach for integrating information from multiple, overlapping sequences of context for lexical disambiguation problems.
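
A hypothetical sketch in the spirit of this line of work: score each candidate for a confusable-word slot by how often overlapping context windows occur with it in a large unlabeled corpus. The count table and lookup function below are stand-ins for a web-scale count source, not a real API or the thesis's actual feature set.

```python
# Hypothetical sketch of count-based, semi-supervised lexical
# disambiguation: pick the candidate whose overlapping context windows
# occur most often in a large unlabeled corpus. TOY_COUNTS and
# get_ngram_count are stand-ins, not a real API.

TOY_COUNTS = {
    ("an", "effect"): 800, ("an", "affect"): 15,
    ("effect", "on"): 900, ("affect", "on"): 40,
}

def get_ngram_count(ngram):
    return TOY_COUNTS.get(ngram, 1)   # add-one floor for unseen n-grams

def disambiguate(left, candidates, right):
    def score(c):
        # Combine evidence from overlapping context windows
        # (here just the left and right bigrams).
        return get_ngram_count((left, c)) * get_ngram_count((c, right))
    return max(candidates, key=score)

print(disambiguate("an", ["effect", "affect"], "on"))   # 'effect'
```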

References

Showing 1-10 of 15 references

The Computational Analysis of English: A Corpus-Based Approach

…specifically on the difficulty of recognizing learning versus language difficulties, that is, how to identify a nonnative-speaking child's need for special education services. They propose a model…

A Maximum Entropy Approach to Adaptive Statistical Language Modeling

TLDR
An adaptive language model based on the principle of Maximum Entropy was trained on the Wall Street Journal corpus, and showed 32%–39% perplexity reduction over the baseline, illustrating the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.

A maximum entropy approach to adaptive statistical language modelling

TLDR
An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources, and shows the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.
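
Both maximum entropy entries above refer to the same exponential model family. For orientation, in standard generic notation rather than the papers' own:

```latex
% Conditional maximum entropy language model: P(w | h) is an
% exponential model over feature functions f_i (n-grams, triggers,
% long-distance cues), with weights \lambda_i fit so that model
% feature expectations match empirical ones.
P_\Lambda(w \mid h) = \frac{1}{Z_\Lambda(h)} \exp\Big(\sum_i \lambda_i f_i(h, w)\Big),
\qquad
Z_\Lambda(h) = \sum_{w'} \exp\Big(\sum_i \lambda_i f_i(h, w')\Big)
```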

Design of a linguistic statistical decoder for the recognition of continuous speech

TLDR
This paper describes the overall structure of a linguistic statistical decoder (LSD) for the recognition of continuous speech, and describes a phonetic matching algorithm that computes the similarity between phonetic strings using the performance characteristics of the acoustic processor (AP).
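
The paper's matching algorithm is not reproduced here; as a generic illustration of the kind of computation involved, the sketch below scores two phone strings by weighted edit distance. The flat costs are placeholders where the LSD would use costs derived from the acoustic processor's confusion statistics.

```python
# Weighted edit-distance sketch of phonetic string matching: the
# distance between two phone strings under insertion, deletion, and
# substitution costs. The flat costs here are placeholders.

def phonetic_distance(a, b, sub_cost=1.0, indel_cost=1.0):
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * indel_cost
    for j in range(1, n + 1):
        D[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + indel_cost,      # delete a[i-1]
                D[i][j - 1] + indel_cost,      # insert b[j-1]
                D[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub_cost),
            )
    return D[m][n]

# Two ARPAbet-like transcriptions of "tomato" (illustrative):
print(phonetic_distance("T AH M EY T OW".split(), "T AH M AA T OW".split()))  # 1.0
```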

Estimation of probabilities from sparse data for the language model component of a speech recognizer

  • S. Katz
  • Computer Science
    IEEE Trans. Acoust. Speech Signal Process.
  • 1987
TLDR
The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.
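
As a rough illustration of the back-off structure, here is a minimal bigram sketch. Real Katz smoothing takes its discounts from Good-Turing estimates via the nonlinear recursion mentioned above; the single constant discount below is a simplifying stand-in.

```python
from collections import Counter

# Back-off bigram sketch in the spirit of Katz (1987): discount seen
# bigrams and redistribute the freed mass over unseen successors in
# proportion to unigram probability. The constant discount D stands in
# for Katz's Good-Turing discount ratios.

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)
D = 0.5  # stand-in discount

def p_unigram(w):
    return unigrams[w] / N

def p_backoff(w, v):
    """Estimate P(w | v) with discounting and back-off."""
    if bigrams[(v, w)] > 0:
        return (bigrams[(v, w)] - D) / unigrams[v]
    seen = {u for (x, u) in bigrams if x == v}        # seen successors of v
    alpha = D * len(seen) / unigrams[v]               # freed probability mass
    denom = sum(p_unigram(u) for u in unigrams if u not in seen)
    return alpha * p_unigram(w) / denom

print(p_backoff("sat", "cat"))   # seen bigram: (1 - 0.5) / 2 = 0.25
print(p_backoff("ran", "the"))   # unseen: backed-off unigram estimate
```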

The DRAGON system--An overview

This paper briefly describes the major features of the DRAGON speech understanding system. DRAGON makes systematic use of a general abstract model to represent each of the knowledge sources necessary…

An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model

TLDR
A modified version of the algorithm is described, which includes the full (forward) decoder, cross-word acoustic models, and longer-span language models; it has been demonstrated to have a low probability of search error and to be very efficient.
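
A toy sketch of the stack (A*) search idea, which also underlies the sequential decoding references below: partial hypotheses sit on a priority queue ordered by accumulated score plus an optimistic look-ahead, and with an admissible look-ahead the first complete hypothesis popped is optimal. All scores are made-up stand-ins, not a real recognizer.

```python
import heapq

# Toy stack (A*) decoder: partial hypotheses live on a priority queue
# ordered by accumulated log score plus an optimistic look-ahead.
# SCORE stands in for acoustic + language model log-probabilities.

VOCAB = ["yes", "no"]
T = 3  # utterance length in words

SCORE = [{"yes": -0.5, "no": -2.0},   # made-up log scores per position
         {"yes": -1.5, "no": -0.7},
         {"yes": -0.9, "no": -1.1}]
BEST = [max(s.values()) for s in SCORE]

def lookahead(t):
    # Admissible (optimistic) estimate of the best possible completion.
    return sum(BEST[t:])

def stack_decode():
    heap = [(-lookahead(0), 0.0, ())]         # (priority, score, hypothesis)
    while heap:
        _, score, hyp = heapq.heappop(heap)
        t = len(hyp)
        if t == T:                            # first complete pop is optimal
            return hyp, score
        for w in VOCAB:
            s = score + SCORE[t][w]
            heapq.heappush(heap, (-(s + lookahead(t + 1)), s, hyp + (w,)))

print(stack_decode())   # best path ('yes', 'no', 'yes'), score ~ -2.1
```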

Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition

TLDR
A model for channels in which an input sequence can produce output sequences of varying length is described and a stack decoding algorithm for decoding on such channels is presented.

Fast sequential decoding algorithm using a stack

In this paper a new sequential decoding algorithm is introduced that uses stack storage at the receiver. It is much simpler to describe and analyze than the Fano algorithm, and is about six times…

Error bounds for convolutional codes and an asymptotically optimum decoding algorithm

  • A. Viterbi
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1967
TLDR
The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R_0, and whose performance bears certain similarities to that of sequential decoding algorithms.
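
The algorithm itself is standard dynamic programming over the state trellis, keeping for each state the single best path into it. A minimal sketch for a discrete HMM, with toy illustrative parameters:

```python
import numpy as np

# Minimal Viterbi decoder for a discrete HMM: dynamic programming over
# the trellis, keeping for each state the single best path into it.
# All toy parameters are illustrative.

A = np.array([[0.7, 0.3],       # transition probabilities A[i, j] = P(j | i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],       # emission probabilities B[s, o] = P(o | s)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])       # initial state distribution
obs = [0, 1, 1, 0]              # observation sequence

def viterbi(obs, A, B, pi):
    T, S = len(obs), len(pi)
    delta = np.zeros((T, S))             # best score of a path ending in s at t
    back = np.zeros((T, S), dtype=int)   # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] * A[:, s]
            back[t, s] = scores.argmax()
            delta[t, s] = scores.max() * B[s, obs[t]]
    path = [int(delta[-1].argmax())]     # trace the best sequence backwards
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

print(viterbi(obs, A, B, pi))   # ([0, 1, 1, 0], ~0.0187)
```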