Corpus ID: 62761140

Automatic Evaluation of Question Answering System based on BE Method

@inproceedings{Yamamoto2008AutomaticEO,
  title={Automatic Evaluation of Question Answering System based on BE Method},
  author={Akiko Yamamoto and Jun-ichi Fukumoto},
  year={2008}
}
In this paper, we describe an automatic evaluation method for question answering in natural language. The method is based on BEs (Basic Elements), originally proposed by Hovy et al. for the automatic evaluation of document summaries. We applied the BE method to the evaluation of question answering by comparing the BEs of a system answer with the BEs of the correct answers. Experiments using the QAC4 test set showed that the BE method correlates to some degree with human evaluation.
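
To make the comparison concrete, below is a minimal sketch of BE-based answer scoring, not the authors' implementation: it assumes BEs have already been extracted as (head, modifier, relation) triples by some parser, and the toy triples in the example are invented for illustration. A system answer is scored against each correct answer by BE overlap, and the best match is taken, since a question may have several acceptable answers.

```python
# Minimal sketch of BE-based answer scoring (assumptions: BEs arrive as
# (head, modifier, relation) triples from an upstream parser; the toy
# triples below are hand-made illustrations, not QAC4 data).
from collections import Counter

def be_score(system_bes, reference_bes):
    """F1 between the multisets of BE triples of a system answer
    and one correct answer."""
    if not system_bes or not reference_bes:
        return 0.0
    sys_counts = Counter(system_bes)
    ref_counts = Counter(reference_bes)
    # Counter intersection keeps the minimum count per triple,
    # so the sum is the number of matching BEs.
    overlap = sum((sys_counts & ref_counts).values())
    precision = overlap / len(system_bes)
    recall = overlap / len(reference_bes)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_be_score(system_bes, all_reference_bes):
    """Score against the best-matching correct answer."""
    return max(be_score(system_bes, ref) for ref in all_reference_bes)

# Toy example with invented triples.
system = [("won", "Obama", "subj"), ("won", "election", "obj")]
references = [
    [("won", "Obama", "subj"),
     ("won", "election", "obj"),
     ("election", "2008", "mod")],
]
print(best_be_score(system, references))  # 0.8
```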

References

An Overview of the 4th Question Answering Challenge (QAC-4) at NTCIR Workshop 6
TLDR
The evaluation results showed that some participant systems could locate the regions containing correct answer content but tended to fail to extract the correct answer spans, owing to complex question types and the difficulty of correct answer scope extraction.
Automated Summarization Evaluation with Basic Elements
TLDR
This paper describes a framework in which summary evaluation measures can be instantiated and compared, and implements a specific evaluation method using very small units of content, called Basic Elements, that address some of the shortcomings of n-grams.
Evaluating DUC 2005 using Basic Elements
TLDR
It is shown that this method correlates better with human judgments than any other automated procedure to date, and overcomes the subjectivity/variability problems of manual methods that require humans to preprocess the summaries to be evaluated.
Question Answering Challenge for Five Ranked Answers and List Answers - Overview of NTCIR4 QAC2 Subtask 1 and 2
TLDR
An overview of Question Answering Challenge 2 (QAC2), an evaluation of question answering carried out at the NTCIR-4 Workshop to develop an evaluation method for question answering systems and information resources for evaluation.
DRAFT Overview of the TREC 2003 Question Answering Track
TLDR
The TREC 2003 question answering track contained two tasks, the passages task and the main task; for the passages task, the evaluation metric was the number of snippets that contained a correct answer, and the reliability of the evaluation used for these tasks is examined.
Evaluating Content Selection in Summarization: The Pyramid Method
TLDR
It is argued that the method presented is reliable, predictive, and diagnostic, and thus improves considerably on the human evaluation method currently used in the Document Understanding Conference.
An Overview of NTCIR-5 QAC3
TLDR
The IAD task in QAC3 is based on QAC2 Subtask 3 with several improvements, including elaboration of the scope of questions and answers and the introduction of multi-grade evaluation and the concept of a correct answer set.
ROUGE: A Package for Automatic Evaluation of Summaries
TLDR
Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, together with their evaluations.
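
For comparison with the BE approach above, here is a minimal sketch of ROUGE-N recall under the assumption of simple whitespace tokenization; the actual ROUGE package adds stemming and stopword options and also implements ROUGE-L, ROUGE-W, and ROUGE-S.

```python
# Minimal sketch of ROUGE-N recall (assumption: plain whitespace
# tokenization; this is an illustration, not the official package).
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n=2):
    """Fraction of reference n-grams also found in the candidate."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    if not ref:
        return 0.0
    return sum((cand & ref).values()) / sum(ref.values())

print(rouge_n_recall("the cat sat on the mat",
                     "the cat lay on the mat"))  # 0.6
```
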
Development of a T-DMB Monitoring System
With the start of T-DMB service, diverse multimedia broadcasting services became available in the high-speed mobile environment. In addition to high-quality digital radio (audio) and television …
Adaptive MPEG-4 Video Streaming Over IP Networks
TLDR
A system named VSS (Video Streaming Simulation) is proposed for comprehensive evaluation of delivered video quality using traffic traces in an RTP/UDP/IP network simulation environment.