Evaluation of natural language processing systems: Issues and approaches

  • Giovanni Guida, Giancarlo Mauri
  • Proceedings of the IEEE
This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems and a report on a specific approach developed by the authors and experimented on a sample test case. More precisely, it first presents a brief survey of the major works in the area of NLP systems evaluation. Then, after introducing the notion of the life cycle of an NLP system, it focuses on the concept of performance evaluation and analyzes the scope and the major problems… 

Evaluating Natural Language Processing Systems: An Analysis and Review

This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems, and provides a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations.

A diagnostic tool for German syntax

An ongoing effort to construct a catalogue of syntactic data exemplifying the major syntactic patterns of German to support the diagnosis of errors in the syntactic components of natural language processing (NLP) systems is described.

Evaluating natural language processing systems

Designing customized methods for testing various NLP systems may be costly, so post hoc justification is needed.

Natural Language Sourcebook.

The Sourcebook is a compilation of 197 processing problems addressed or handled by intelligent computer systems, classified into a scheme with an artificial intelligence bent and cross-referenced to companion schemes, one with a linguistic and one with a cognitive psychological perspective on the type of issues reflected in the problems.

Why Human Translators Still Sleep in Peace? (Four Engineering and Linguistic Gaps in NLP)

This paper is a brief dissertation on four engineering and linguistic issues believed critical for a more striking success of NLP: extensive acquisition of the semantic lexicon, formal performance evaluation methods to evaluate systems, development of shell systems for rapid prototyping and customization, and finally a more linguistically motivated approach to word categorization.

Evaluating Natural Language Systems: A Sourcebook Approach

Progress is reported in development of evaluation methodologies for natural language systems with a common classification of the problems in natural language understanding.

Issues in Performance Evaluation of Mathematical Notation Recognition Systems

Issues that are discussed cover the reported performance evaluation experiments, the code availability, the nature of the mathematical notation, the extent of the coverage of mathematical recognition systems, and the quantification of performance evaluation results.

New meaning for NLP: the trials and tribulations of natural language processing with GPT-3 in ophthalmology

An overview of NLP models is provided, with a focus on GPT-3, as well as discussion of applications specific to ophthalmology, and the limitations of GPT-3 and the challenges with its integration into routine ophthalmic care are outlined.

Evaluation: An assessment

An editorial introduction to this Special Issue of Machine Translation dedicated to Evaluation is provided, the rationale for the Issue is described, the various contributions of the papers in this issue are outlined, and the main current approaches are given.

Constructing natural language interface applications to operating systems

The presented linguistic stratification analysis has been employed in the design of a user interface management system for developing natural language interfaces to operating systems and is demonstrated through the development of a natural language interface for the Unix operating system.



Designing and automating the quality assessment of a knowledge-based system: The initial Automated academic advisor experience

The automated academic advisor (AAA), a large practical artificial intelligence system currently under development, is introduced. Two parsers are described which were designed for use with the AAA.

Understanding Natural Language Through Parallel Processing of Syntactic and Semantic Knowledge: An Application to Data Base Query

The core of the PARNAX system is constituted by the analyzer that includes parallel processing of syntactic and semantic knowledge, and it is argued that this feature allowed the system to reach a good linguistic coverage, still ensuring an acceptable degree of efficiency.

Computing Machinery and Intelligence

  • A. Turing
  • Philosophy
    The Philosophy of Artificial Intelligence
  • 1950
The question, “Can machines think?” is considered, and the question is replaced by another, which is closely related to it and is expressed in relatively unambiguous words.

Software Reliability Analysis Models

  • M. Ohba
  • Engineering
    IBM J. Res. Dev.
  • 1984
Improvements to conventional software reliability analysis models by making the assumptions on which they are based more realistic are discussed, including the delayed S-shaped growth model, the inflection S-shaped model, and the hyperexponential model.

Software Reliability Analysis

A case study is presented of the analysis of failure data from a Space Shuttle software project to predict the number of failures likely during a mission, and the subsequent verification of these predictions.

Experience with ROBOT in 12 Commercial, Natural Language Data Base Query Applications

The unexpected linguistic and semantic difficulties encountered in the 12 commercial applications to which ROBOT has been applied during the last year and a half are discussed.

Experience with the Evaluation of Natural Language Question Answerers

Two measurements, conceptual and linguistic completeness, are defined and discussed in this paper, and it is demonstrated that the conceptual coverage of natural language systems should be extended to better satisfy the needs and expectations of users.

A formal basis for performance evaluation of natural language understanding systems " Comput

  • Linguistics