Bleu: a Method for Automatic Evaluation of Machine Translation
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
IBM's Statistical Question Answering System
The authors describe the IBM Statistical Question Answering system for TREC-9 in detail, examine several examples and errors, and report results at the 250-byte and 50-byte levels for the overall system and for each subcomponent.
Automatic recognition of spontaneous speech for access to multilingual oral history archives
- W. Byrne, D. Doermann, Wei-Jing Zhu
- Computer Science, IEEE Transactions on Speech and Audio Processing
- 21 June 2004
Results are presented from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators.
Unsupervised and supervised clustering for topic tracking
Important differences between two styles of document clustering in the context of Topic Detection and Tracking are investigated in both the design and the evaluation of TDT systems.
Segmentation and Detection at IBM
This work investigates the importance of merging microclusters together, and proposes a merging strategy which improves the performance of IBM’s story segmentation models.
Question Answering Using Maximum-Entropy Components
A statistical question answering system developed for TREC-9 is presented, which applies maximum entropy classification to question/answer type prediction and named entity marking, and a new method of analyzing system performance via a transition matrix is shown.
Quantifying the utility of parallel corpora
It is found that the performance of the English-Chinese cross-language IR system as trained on the three parallel corpora can be related by a simple measure, namely the out-of-vocabulary rate of query words.
English-Chinese Information Retrieval at IBM
Abstract: We describe TREC-9 experiments with an IR system that incorporates statistical machine translation trained on sentence-aligned parallel corpora for both query translation…
Statistical methods for topic segmentation
This paper presents an algorithm for topic segmentation which uses a combination of machine learning, statistical natural language processing, and information retrieval techniques and presents the results on the widely used TDT2 and TDT3 corpora provided by NIST.
Topic styles in IR and TDT: effect on system behavior
This work compares the behavior of a topic tracking system using relevance judgements from TDT with that of the same system using relevant information from the SDR, in order to investigate the influence of differences in document relevance judgement on the behavior of the tracking system.