
Automatic Evaluation of Question Answering System based on BE Method

Akiko Yamamoto and Jun-ichi Fukumoto
In this paper, we describe an automatic evaluation method for question answering in natural language. The method is based on Basic Elements (BEs), originally proposed by Hovy et al. for the automatic evaluation of document summaries. We applied the BE method to the evaluation of question answering by comparing the BEs of a system answer with the BEs of the correct answers. Experiments using the QAC4 test set showed that the BE method has some correlation with human evaluation.
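The comparison described in the abstract can be sketched as a recall-style overlap between BE sets. This is a minimal, hypothetical illustration only: it assumes BE triples (head, modifier, relation) have already been extracted by a parser, and it uses exact triple matching, whereas the actual method may use looser matching.

```python
# Hypothetical sketch of BE-based answer scoring, NOT the authors' exact method.
# BE extraction requires a syntactic parser; here the BE triples are given as input.

def be_score(system_bes, reference_bes):
    """Recall-style overlap: fraction of reference BEs found in the system answer."""
    if not reference_bes:
        return 0.0
    matched = set(system_bes) & set(reference_bes)
    return len(matched) / len(reference_bes)

# Toy BE triples of the form (head, modifier, relation):
sys_bes = {("won", "Japan", "subj"), ("won", "cup", "obj")}
ref_bes = {("won", "Japan", "subj"), ("won", "cup", "obj"), ("held", "2006", "time")}
print(be_score(sys_bes, ref_bes))  # 2 of the 3 reference BEs are matched
```

A score computed this way can then be correlated against human judgments across questions, which is the kind of validation the QAC4 experiments report.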

An Overview of the 4th Question Answering Challenge (QAC-4) at NTCIR Workshop 6
The evaluation results showed that some participant systems could locate the areas containing correct answers but tended to fail to extract the correct answer areas themselves, owing to complex question types and the difficulty of determining the correct answer scope.
Automated Summarization Evaluation with Basic Elements
This paper describes a framework in which summary evaluation measures can be instantiated and compared, and implements a specific evaluation method using very small units of content, called Basic Elements, that addresses some of the shortcomings of n-grams.
Evaluating DUC 2005 using Basic Elements
It is shown that this method correlates better with human judgments than any other automated procedure to date, and overcomes the subjectivity/variability problems of manual methods that require humans to preprocess summaries to be evaluated.
Question Answering Challenge for Five Ranked Answers and List Answers - Overview of NTCIR4 QAC2 Subtask 1 and 2
An overview of the question answering evaluation task Question Answering Challenge 2 (QAC2), carried out at the NTCIR Workshop 3 in October 2002 to develop an evaluation method for question answering systems and information resources for evaluation.
DRAFT Overview of the TREC 2003 Question Answering Track
The TREC 2003 question answering track contained two tasks, the passages task and the main task; the evaluation metric was the number of snippets that contained a correct answer; the reliability of the evaluation used for these tasks was examined.
Evaluating Content Selection in Summarization: The Pyramid Method
It is argued that the method presented is reliable, predictive, and diagnostic, and thus improves considerably on the human evaluation method currently used in the Document Understanding Conference.
An Overview of NTCIR-5 QAC3
The IAD task in QAC3 is based on QAC2 Subtask 3 with several improvements, including elaboration of the scope of questions and answers and introduction of multi-grade evaluation and the concept of a correct answer set.
ROUGE: A Package for Automatic Evaluation of Summaries
Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with their evaluations.
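Of these measures, ROUGE-N is the simplest and illustrates the n-gram shortcomings that Basic Elements were designed to address. A minimal sketch of ROUGE-N recall (overlapping n-grams divided by the number of reference n-grams) follows; the function name and tokenized inputs are illustrative, not the package's API.

```python
# Illustrative sketch of ROUGE-N recall, not the official ROUGE package interface.
from collections import Counter

def rouge_n(candidate_tokens, reference_tokens, n=2):
    """ROUGE-N recall: clipped overlapping n-grams / total reference n-grams."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate_tokens, n), ngrams(reference_tokens, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's count by its count in the candidate (multiset overlap).
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / total

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_n(cand, ref, n=1))  # 5 of the 6 reference unigrams are matched
```

Because such scores reward any surface n-gram overlap regardless of syntactic role, BE-style head-modifier-relation units were proposed as a more content-sensitive alternative.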