Mining Wikipedia Revision Histories for Improving Sentence Compression

Elif Yamangil and Rani Nelken
A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this…
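The abstract describes harvesting compression and expansion pairs from consecutive Wikipedia revisions. The sketch below illustrates only the core filtering idea and is not the authors' pipeline. It assumes that sentences have already been aligned across two revisions, that whitespace tokenization is adequate, and that a "compression" means a deletion-only shortening (the shorter sentence is a word-level subsequence of the longer one). The helpers `is_subsequence`, `classify_pair`, and `mine_pairs` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple


def is_subsequence(short: List[str], long: List[str]) -> bool:
    """Return True if `short` can be obtained from `long` by deleting tokens only."""
    it = iter(long)
    # `tok in it` advances the shared iterator until tok is found,
    # so the relative order of tokens is preserved.
    return all(tok in it for tok in short)


def classify_pair(old: str, new: str) -> Optional[str]:
    """Hypothetical labeler for an aligned (old revision, new revision) sentence pair.

    Returns "compression" if the new sentence is a strictly shorter,
    deletion-only version of the old one, "expansion" if the reverse
    holds, and None otherwise. Assumes whitespace tokenization.
    """
    old_toks, new_toks = old.split(), new.split()
    if len(new_toks) < len(old_toks) and is_subsequence(new_toks, old_toks):
        return "compression"
    if len(old_toks) < len(new_toks) and is_subsequence(old_toks, new_toks):
        return "expansion"
    return None


def mine_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collect (long, short) training pairs from aligned revision pairs.

    An expansion is simply a compression read backwards, so it is
    reversed and kept as well.
    """
    out: List[Tuple[str, str]] = []
    for old, new in pairs:
        label = classify_pair(old, new)
        if label == "compression":
            out.append((old, new))
        elif label == "expansion":
            out.append((new, old))
    return out


if __name__ == "__main__":
    revisions = [
        ("The quick brown fox jumped over the lazy dog .", "The fox jumped over the dog ."),
        ("A cat sat .", "A black cat sat on the mat ."),
        ("Hello world", "Goodbye world"),
    ]
    for long_s, short_s in mine_pairs(revisions):
        print(f"{long_s!r} -> {short_s!r}")
```

Treating expansions as reversed compressions is what lets both edit directions contribute training data. A realistic miner would also need sentence segmentation, cross-revision alignment, and filters against vandalism or trivial edits, all of which this sketch omits.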
This paper has 51 citations.