Evaluating Natural Language Processing Systems: An Analysis and Review

@article{SparckJones1995EvaluatingNL,
  title={Evaluating Natural Language Processing Systems: An Analysis and Review},
  author={Karen Sparck Jones and Julia Galliers},
  journal={Evaluating Natural Language Processing Systems},
  year={1995}
}
From the Publisher: This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems. It addresses the whole area of NLP system evaluation, including aims and scope, problems and methodology. The authors provide a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations; they relate systems to their environments and develop a framework for proper evaluation. The discussion of principles is… 

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

TLDR
An up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised is given, highlighting a number of recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence.

Evaluation of NLG: some analogies and differences with machine translation and reference resolution

TLDR
An explanation of the difficulty of evaluating NLG systems, based on a typology of natural language processing (NLP) systems, is offered; from this typology some suggestions for NLG evaluation are drawn, and NLG evaluation is compared to MT evaluation.

Automatic summarising: a review and discussion of the state of the art

TLDR
Automatic summarisation research has made valuable progress in the last decade, with some practically useful approaches, better evaluation, and more understanding of the task, but evaluation needs to be taken significantly further so as to engage with the purposes for which summaries are intended and the contexts in which they are used.

A Semantic QA-Based Approach for Text Summarization Evaluation

TLDR
This paper presents preliminary results on pinpointing content differences between two text passages: one passage is treated as a small knowledge base, and a large number of questions are asked to exhaustively identify all of its content points.
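
To make the QA-based idea concrete, below is a minimal sketch, assuming a toy answer function based on word overlap; the function and example data are illustrative inventions, not the paper's actual model.

# Minimal sketch of QA-based content comparison between two passages.
# The "QA model" here is a toy: it answers a question by returning the
# passage sentence with the highest word overlap with the question.
# A real system would substitute a trained QA model.

def toy_answer(question: str, passage: str) -> str:
    """Return the passage sentence that best overlaps the question's words."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

def content_overlap(reference: str, candidate: str, questions: list) -> float:
    """Fraction of questions whose answers agree across the two passages.
    Exact-match comparison is crude; real work compares answer content."""
    agree = sum(toy_answer(q, reference) == toy_answer(q, candidate)
                for q in questions)
    return agree / len(questions)

reference = "The plant opened in 1990. It employs 200 people."
candidate = "The plant opened in 1990. Two hundred people work there."
print(content_overlap(reference, candidate, ["When did the plant open?"]))  # 1.0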

Text summarisation in progress: a literature review

TLDR
This paper presents an extensive literature review of the research field of Text Summarisation (TS) based on Human Language Technologies, explaining existing methodologies and systems as well as new research on the automatic evaluation of summary quality.

Anniversary article: Then and now: 25 years of progress in natural language engineering

TLDR
The move towards n-grams or skip-grams and/or chunking with part-of-speech tagging, and away from whole-sentence parsing, is noted, as is the increasing dominance of statistical methods (SM) and machine learning (ML).
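
For readers unfamiliar with the feature types the article contrasts with whole-sentence parsing, a minimal illustrative sketch of n-gram and skip-gram extraction (not taken from the article) follows.

# Contiguous n-grams and skip-bigrams over a token sequence.
def ngrams(tokens, n):
    """Contiguous n-grams, e.g. bigrams for n=2."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skip_bigrams(tokens, max_skip):
    """Ordered token pairs separated by at most max_skip intervening tokens."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + 2 + max_skip, len(tokens)))]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))        # [('the', 'cat'), ('cat', 'sat'), ...]
print(skip_bigrams(tokens, 1))  # adjacent pairs plus one-skip pairs like ('the', 'sat')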

Evaluation of Text Summarization in a Cross-lingual Information Retrieval Framework

TLDR
A toolkit for the evaluation of single-document and multi-document summarization, and for the evaluation of summarization within a cross-lingual information retrieval framework, is developed; the measurement of relevance correlation is introduced and systematically examined.
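
As a hedged illustration of the relevance correlation idea, the sketch below scores full documents and their summaries against the same query with a stand-in term-overlap retrieval function, then correlates the two score lists; the scoring function and example data are invented, not the toolkit's.

import math

def retrieval_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms present in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q)

def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

query = "aircraft maintenance schedule"
docs = ["aircraft maintenance is scheduled weekly at the hangar",
        "the weather today is sunny and warm",
        "maintenance crews inspect each aircraft daily"]
summaries = ["weekly aircraft maintenance schedule",
             "sunny weather",
             "daily aircraft inspections by maintenance crews"]

full = [retrieval_score(query, d) for d in docs]
summ = [retrieval_score(query, s) for s in summaries]
print(pearson(full, summ))  # high value = summaries preserve relevance ranking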

Automatic summarising: The state of the art

Automation of summarization evaluation methods and their application to the summarization process

TLDR
This work describes the development of a fully automated evaluation method and of an automatic summarization system that draws on the conceptual idea of the Pyramid evaluation scheme and on the techniques developed for the proposed evaluation system.
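
For context, the Pyramid scheme scores a summary by the total weight of the Summarization Content Units (SCUs) it expresses, relative to an optimally informative summary expressing the same number of SCUs; a minimal sketch with invented SCU labels and weights follows.

def pyramid_score(candidate_scus: set, scu_weights: dict) -> float:
    """Observed SCU weight divided by the best weight achievable with the
    same number of SCUs. Weights = number of reference summaries that
    contain each SCU; SCU identification is assumed already done."""
    observed = sum(scu_weights.get(scu, 0) for scu in candidate_scus)
    optimal = sum(sorted(scu_weights.values(), reverse=True)[:len(candidate_scus)])
    return observed / optimal if optimal else 0.0

weights = {"plant opened 1990": 4, "employs 200 people": 3, "located in Ohio": 1}
print(pyramid_score({"plant opened 1990", "located in Ohio"}, weights))  # 5/7 ≈ 0.71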

Natural Language Processing: A Historical Review

This paper reviews natural language processing (NLP) from the late 1940s to the present, seeking to identify its successive trends as these reflect concerns with different problems or the pursuit of
...

References

Evaluation of natural language processing systems: Issues and approaches

This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems and a report on a specific approach developed by the authors and…

An Evaluation Methodology for Natural Language Processing Systems

Abstract: The Neal-Montgomery NLP Evaluation Methodology was developed under the 'Benchmark Investigation/Identification' project as a means of determining the linguistic competence of Natural Language Processing systems…

A Practical Methodology for the Evaluation of Spoken Language Systems

TLDR
These evaluations are probably the only NL evaluations other than the series of Message Understanding Conferences to have been developed and used by a group of researchers at different sites, although several excellent workshops have been held to study some of these problems.

Evaluating natural language processing systems

TLDR
Designing customized methods for testing various NLP systems may be costly, so post hoc justification is needed.

Evaluation of evaluation in information retrieval

TLDR
A critical and historical analysis of evaluations of IR systems and processes is presented, and issues related to the systems under evaluation, as well as evaluation criteria, measures, measuring instruments, and methodologies, are examined.
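
The measures such meta-evaluations examine typically build on set-based precision and recall; a minimal single-query sketch with invented document IDs is given below.

def precision_recall(retrieved: set, relevant: set):
    """Set-based precision and recall for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall({"d1", "d2", "d3"}, {"d2", "d3", "d7"})
print(p, r)  # 0.667 precision, 0.667 recall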

An English language question answering system for a large relational database

TLDR
By typing requests in English, casual users will be able to obtain explicit answers from a large relational database of aircraft flight and maintenance data using a system called PLANES, which uses a number of augmented transition networks to match phrases with a specific meaning.
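
Below is a toy transition-network matcher in the spirit of the phrase-matching ATNs described; real ATNs add registers, arc tests, and recursive subnetworks, and the lexicon and states here are invented.

# Toy word-category transition network; "END" is the accepting state.
LEXICON = {"flights": "NOUN", "from": "PREP", "boston": "CITY",
           "to": "PREP", "denver": "CITY"}

NETWORK = {                      # state -> [(category, next_state), ...]
    "S":  [("NOUN", "Q1")],
    "Q1": [("PREP", "Q2")],
    "Q2": [("CITY", "Q3")],
    "Q3": [("PREP", "Q4")],
    "Q4": [("CITY", "END")],
}

def matches(tokens):
    state = "S"
    for tok in tokens:
        cat = LEXICON.get(tok.lower())
        nxt = next((s for c, s in NETWORK.get(state, []) if c == cat), None)
        if nxt is None:
            return False
        state = nxt
    return state == "END"

print(matches("flights from boston to denver".split()))  # True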

The Hub and Spoke Paradigm for CSR Evaluation

TLDR
The new paradigm used in the most recent ARPA-sponsored Continuous Speech Recognition (CSR) evaluation is introduced and then the important features of the test design are discussed.
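
CSR evaluations of this kind score systems primarily by word error rate (WER); below is a standard dynamic-programming sketch of WER, not the ARPA scoring software itself.

def wer(ref, hyp):
    """Word error rate: edit distance between word lists, over reference length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("show me the flights".split(), "show the flight".split()))  # 0.5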

Natural language interfaces to databases - an introduction

TLDR
An introduction to natural language interfaces to databases (NLIDBs) is presented, along with some less explored areas of NLIDB research, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs.
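
A toy pattern-to-SQL front end in the spirit of an NLIDB is sketched below; the patterns, table, and column names are invented, and real NLIDBs use much richer parsing and semantic interpretation.

import re

# question pattern -> SQL template (illustrative schema: flights(origin, dest))
PATTERNS = [
    (re.compile(r"how many flights to (\w+)", re.I),
     "SELECT COUNT(*) FROM flights WHERE dest = '{0}'"),
    (re.compile(r"list flights from (\w+)", re.I),
     "SELECT * FROM flights WHERE origin = '{0}'"),
]

def to_sql(question):
    for pattern, template in PATTERNS:
        m = pattern.search(question)
        if m:
            return template.format(*m.groups())
    return None  # unhandled question

print(to_sql("How many flights to Denver?"))
# SELECT COUNT(*) FROM flights WHERE dest = 'Denver'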

Overview of the third message understanding evaluation and conference

The Naval Ocean Systems Center (NOSC) has conducted the third in a series of evaluations of English text analysis systems. These evaluations are intended to advance our understanding of the merits of…

Expanding the Scope of the ATIS Task: The ATIS-3 Corpus

TLDR
The migration of the ATIS task to a richer relational database and the development of the ATIS-3 corpus are described, including breakdowns of data by type (e.g. context-independent, context-dependent, and unevaluable) and variations in the data collected at different sites.
...