Phrase Detection in the Wikipedia

Miro Lehtonen and Antoine Doucet
The Wikipedia XML collection turned out to be rich in marked-up phrases as we carried out our INEX 2007 experiments. Assuming that a phrase occurs at the inline level of the markup, we were able to identify over 18 million phrase occurrences, most of which were either the anchor text of a hyperlink or a passage of text with added emphasis. As our IR system, EXTIRP, indexed the documents, the detected inline-level elements were duplicated in the markup with two direct consequences: 1) The…
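The inline-level assumption can be illustrated with a minimal sketch. It is not the EXTIRP implementation, which the abstract does not describe. It assumes that an "inline-level" element is any child of an element with mixed content (text interleaved with child elements). The function names, the sample document, and the tag names `emph3` and `collectionlink` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def has_text(s):
    """True if s contains any non-whitespace characters."""
    return s is not None and s.strip() != ""


def inline_phrases(xml_string):
    """Return (tag, text) pairs for elements found at the inline level.

    Assumption: an element is inline-level when its parent has mixed
    content, i.e. the parent holds text directly alongside child elements.
    """
    root = ET.fromstring(xml_string)
    phrases = []
    for parent in root.iter():
        children = list(parent)
        if not children:
            continue
        # Mixed content: text before the first child or after any child.
        mixed = has_text(parent.text) or any(has_text(c.tail) for c in children)
        if not mixed:
            continue
        for child in children:
            text = "".join(child.itertext()).strip()
            if text:
                phrases.append((child.tag, text))
    return phrases


# Hypothetical sample document; the tag names are assumptions.
SAMPLE = (
    "<article><body>"
    "<p>The <emph3>suffix stripping</emph3> algorithm is used in "
    "<collectionlink>information retrieval</collectionlink> systems.</p>"
    "<section><title>History</title><p>Plain text only.</p></section>"
    "</body></article>"
)

if __name__ == "__main__":
    for tag, phrase in inline_phrases(SAMPLE):
        print(tag, "|", phrase)
```

In this sketch the `title` and the second `p` are not reported, because their parent contains only elements and no text of its own. A real collection would likely need extra filtering, for example by tag name or phrase length.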

