Paraphrase Extraction from Parallel News Corpora

Abstract

Different expressions of the same statement are said to be paraphrases of each other. An example is the phrases ‘solved’ and ‘found a solution to’ in ‘Alice solved the problem’ and ‘Alice found a solution to the problem’. Paraphrase extraction is the task of finding and grouping such paraphrases in free text. Finding equivalent paraphrases and structures can benefit a number of NLP applications, such as Question Answering, Machine Translation, and Multi-text Summarization. In Question Answering, for example, alternative questions can be generated from alternative paraphrases. We attack the problem in three steps: first grouping news articles that describe the same event, then collecting sentence pairs from these articles that are semantically close to each other, and finally extracting paraphrases from these sentence pairs to learn paraphrase structures. The precision of finding two equivalent documents was 0.56 on average under a strict matching criterion and 0.70 under a flexible one. We tried 9 different evaluation techniques for sentence-level matching. Although the exact word match count approach had better precision than the n-gram precision count approaches, the paraphrase extraction phase shows that the latter yield higher-quality sentence pairs for paraphrase extraction. Our system extracts paraphrases with 0.66 precision when only equivalent document pairs are used as the test set.
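
To make the two families of sentence-level measures named in the abstract concrete, here is a minimal sketch in Python. The abstract does not give the exact formulas, so the definitions below are assumptions: `word_match_score` is one plausible reading of an exact word match count (normalized by the first sentence's vocabulary), and `ngram_precision` is a clipped n-gram precision in the style of BLEU. The function names, the whitespace tokenizer, and the normalization are all hypothetical and not taken from the paper.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return sentence.lower().split()


def word_match_score(a, b):
    """Fraction of distinct words in `a` that also occur in `b` (assumed definition)."""
    words_a, words_b = set(tokenize(a)), set(tokenize(b))
    if not words_a:
        return 0.0
    return len(words_a & words_b) / len(words_a)


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of `candidate` against `reference` (assumed definition)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(tokenize(candidate))
    ref = ngrams(tokenize(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it appears in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


if __name__ == "__main__":
    s1 = "Alice solved the problem"
    s2 = "Alice found a solution to the problem"
    print(word_match_score(s1, s2))
    print(ngram_precision(s1, s2, n=2))
```

On the example pair from the abstract, the word-level score is high because three of the four words are shared, while the bigram score is lower because only one bigram, ‘the problem’, survives the rewording. This only illustrates how the two kinds of measure can rank the same pair differently; it does not reproduce the paper's evaluation.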

Cite this paper

@inproceedings{Mizrahi2006ParaphraseEF, title={Paraphrase Extraction from Parallel News Corpora}, author={Bengi Mizrahi and Aylin K{\"{u}}ntay and Engin Erzin and Deniz Y{\"{u}}ret and Burak G{\"{o}}rkemli and Tayfun Elmas and Tuğba {\"{O}}zbilgin and Mehmet Ali and Başak Mutlum and Z{\"{u}}lk{\"{u}}f Genç and Ozan S{\"{o}}nmez and Utkan {\"{O}}ğmen}, year={2006} }