Automating the Construction of Internet Portals with Machine Learning
Domain-specific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows...
  • 396
  • 44
Learning Hidden Markov Model Structure for Information Extraction
Statistical machine learning techniques, while well proven in fields such as speech recognition, are just beginning to be applied to the information extraction domain. We explore the use of hidden...
  • 451
  • 26
Scalable backoff language models
When a trigram backoff language model is created from a large body of text, trigrams and bigrams that occur few times in the training text are often excluded from the model in order to decrease the...
  • 102
  • 11
A Machine Learning Approach to Building Domain-Specific Search Engines
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are...
  • 211
  • 10
Building Domain-Specific Search Engines with Machine Learning Techniques
Domain-specific search engines are growing in popularity because they offer increased accuracy and extra functionality not possible with the general, Web-wide search engines. For example, ...
  • 165
  • 9
Using story topics for language model adaptation
The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, ...
  • 121
  • 9
The 1997 CMU Sphinx-3 English Broadcast News Transcription System
This paper describes the 1997 Hub-4 Broadcast News Sphinx-3 speech recognition system. This year’s system includes full-bandwidth acoustic models trained on Broadcast News and Wall Street Journal...
  • 72
  • 8
The 1996 Hub-4 Sphinx-3 System
This paper describes the CMU Sphinx-3 system, and the configuration we used for the 1996 DARPA (Hub-4) evaluation. The model structure, acoustic modeling, language modeling, lexical modeling, and...
  • 95
  • 2
Experiments in Spoken Document Retrieval at CMU
Information retrieval experiment on a corpus of 70 hours of broadcast news (CMU = Carnegie Mellon University).
  • 37
  • 2
This paper describes the experiments performed as part of the TREC-97 Spoken Document Retrieval Track. The task was to pick the correct document from 35 hours of recognized speech documents, based on...
  • 27
  • 2