Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on …
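For illustration, here is a minimal Python sketch of the unigram co-occurrence idea the abstract refers to, in the spirit of what later became ROUGE-1. The function name and whitespace tokenization are ours, not the paper's:

```python
from collections import Counter

def unigram_cooccurrence_recall(candidate, reference):
    """Recall-oriented unigram overlap between a candidate summary
    and a reference summary, with clipped counts (ROUGE-1 style)."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Count each unigram at most as often as it appears in the reference.
    overlap = sum(min(count, cand_counts[token])
                  for token, count in ref_counts.items())
    total_ref = sum(ref_counts.values())
    return overlap / total_ref if total_ref else 0.0

print(unigram_cooccurrence_recall(
    "the cat sat on the mat",
    "the cat lay on the mat"))  # 5 of 6 reference unigrams matched
```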
Comparisons of automatic evaluation metrics for machine translation are usually conducted at the corpus level using correlation statistics such as Pearson's product-moment correlation coefficient or Spearman's rank-order correlation coefficient between human scores and automatic scores. However, such comparisons rely on human judgments of translation qualities …
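Such corpus-level comparisons can be reproduced with standard statistics libraries. A small sketch using scipy.stats, with hypothetical human and metric scores for five systems:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical corpus-level scores for five MT systems.
human_scores = [3.2, 2.8, 4.1, 3.9, 2.5]
metric_scores = [0.31, 0.27, 0.42, 0.38, 0.24]

r, _ = pearsonr(human_scores, metric_scores)     # linear agreement
rho, _ = spearmanr(human_scores, metric_scores)  # rank agreement
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```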
In order to produce a good summary, one has to identify the most relevant portions of a given text. We describe in this paper a method for automatically training topic signatures (sets of related words, with associated weights, organized around head topics) and illustrate with signatures we created with 6,194 TREC collection texts over 4 selected topics. We …
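The abstract is cut off before spelling out how signature terms are weighted; the sketch below uses Dunning's log-likelihood ratio, a statistic commonly used for exactly this kind of topic-versus-background term weighting. All names are illustrative, and a non-empty background corpus is assumed:

```python
import math
from collections import Counter

def _log_l(k, n, p):
    """Log-likelihood of k successes in n Bernoulli(p) trials
    (degenerate p treated as a limit where the terms vanish)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def topic_signature(topic_docs, background_docs, top_k=10):
    """Rank terms by Dunning's log-likelihood ratio: evidence that a
    term is more frequent in the topic corpus than in the background."""
    topic = Counter(w for doc in topic_docs for w in doc.lower().split())
    bg = Counter(w for doc in background_docs for w in doc.lower().split())
    n1, n2 = sum(topic.values()), sum(bg.values())  # assumes n2 > 0
    scores = {}
    for term, k1 in topic.items():
        k2 = bg.get(term, 0)
        pooled = (k1 + k2) / (n1 + n2)  # term rate under the null hypothesis
        scores[term] = 2.0 * (_log_l(k1, n1, k1 / n1)
                              + _log_l(k2, n2, k2 / n2)
                              - _log_l(k1, n1, pooled)
                              - _log_l(k2, n2, pooled))
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```

The highest-scoring terms, paired with their scores as weights, form the signature for the head topic.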
In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on the longest common subsequence between a candidate translation and a set of reference translations. Longest common subsequence takes into account sentence-level structure similarity naturally and identifies the longest co-occurring …
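The core computation is the classic LCS dynamic program over token sequences; a minimal sketch (the example sentences are illustrative):

```python
def lcs_length(candidate, reference):
    """Length of the longest common subsequence of two token lists
    (classic dynamic programming, O(len(candidate) * len(reference)))."""
    m, n = len(candidate), len(reference)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if candidate[i - 1] == reference[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

cand = "the the police killed gunman".split()
ref = "the police killed the gunman".split()
print(lcs_length(cand, ref))  # 4: a common in-order (possibly gapped) subsequence
```

Because a subsequence need not be contiguous, the measure rewards in-order word matches across the whole sentence rather than only adjacent n-grams.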
Product reviews posted at online shopping sites vary greatly in quality. This paper addresses the problem of detecting low-quality product reviews. Three types of biases in the existing evaluation standard of product reviews are discovered. To assess the quality of product reviews, a set of specifications for judging the quality of reviews is first defined. …
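The abstract frames the task as supervised classification but is truncated before naming a learner; purely to illustrate the task setup, here is a generic scikit-learn pipeline over hypothetical toy data, where labels would come from annotators applying the paper's quality specifications:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled reviews: 1 = low quality, 0 = acceptable.
reviews = ["great!!!",
           "Battery lasts two days; camera struggles in low light.",
           "bad",
           "Shipping was fast and the fit matches the size chart."]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reviews, labels)
print(clf.predict(["Battery died after a week."]))  # predicted quality label
```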
Online forums contain a huge amount of valuable user-generated content. In this paper we address the problem of extracting question-answer pairs from forums. Question-answer pairs extracted from forums can be used to help question answering services (e.g., Yahoo! Answers), among other applications. We propose a sequential-pattern-based classification method …
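The truncated abstract does not detail the pattern representation; the sketch below assumes token-level sequential patterns matched as gapped, in-order subsequences and turned into binary features for a downstream classifier. The example patterns are hypothetical:

```python
def matches_pattern(tokens, pattern):
    """True if `pattern` occurs in `tokens` as an ordered, possibly
    gapped subsequence, the usual matching rule for sequential patterns."""
    it = iter(tokens)
    return all(item in it for item in pattern)

# Hypothetical mined patterns suggestive of questions in forum posts.
question_patterns = [("anyone", "know"), ("how", "do", "i"), ("what", "is")]

def pattern_features(sentence):
    tokens = sentence.lower().split()
    return [int(matches_pattern(tokens, p)) for p in question_patterns]

print(pattern_features("Does anyone know how I can reset this?"))
# [1, 0, 0] -> binary features fed to a downstream classifier
```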
This paper presents an overview of the Opinion Analysis Pilot Task held from 2006 to 2007 at the Sixth NTCIR Workshop. We created test collections for 32, 30, and 28 topics (11,907, 15,279, and 8,379 sentences) in Chinese, Japanese, and English. Using these test collections, we conducted an opinion extraction subtask. The subtask was defined from four perspectives: …