Learning a concept-based document similarity measure

Anna-Lan Huang, David N. Milne, Eibe Frank, Ian H. Witten

Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning…
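The abstract contrasts word-overlap similarity with a measure that also compares documents at the concept level and learns how to weight the two from human judgments. It does not say where concepts come from, which features are used, or which learner is applied, so the following is only a minimal sketch under stated assumptions, not the authors' method. Lexical similarity is taken as cosine over word counts. Semantic similarity is cosine over concepts drawn from a hypothetical hand-written word-to-concept map (`CONCEPTS`). The combination is a single weight `alpha`, chosen by grid search to minimize squared error against human scores; that grid search stands in for whatever machine-learning model the paper actually uses.

```python
import math
from collections import Counter

# HYPOTHETICAL word -> concept map, for illustration only.
# The abstract does not say where concepts come from.
CONCEPTS = {
    "car": "Automobile", "automobile": "Automobile",
    "doctor": "Physician", "physician": "Physician",
    "engine": "Engine", "hospital": "Hospital",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_sim(d1: str, d2: str) -> float:
    """Surface overlap: cosine over lower-cased whitespace tokens."""
    return cosine(Counter(d1.lower().split()), Counter(d2.lower().split()))


def concept_sim(d1: str, d2: str) -> float:
    """Semantic overlap: cosine over mapped concepts; unmapped words are ignored."""
    def concepts(doc: str) -> Counter:
        return Counter(CONCEPTS[w] for w in doc.lower().split() if w in CONCEPTS)
    return cosine(concepts(d1), concepts(d2))


def combined_sim(d1: str, d2: str, alpha: float) -> float:
    """Weighted mix: alpha * lexical + (1 - alpha) * concept."""
    return alpha * lexical_sim(d1, d2) + (1 - alpha) * concept_sim(d1, d2)


def learn_alpha(pairs, human_scores, steps: int = 100) -> float:
    """Grid-search alpha in [0, 1] to minimize squared error against human scores."""
    best_alpha, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        err = sum((combined_sim(a, b, alpha) - s) ** 2
                  for (a, b), s in zip(pairs, human_scores))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha


if __name__ == "__main__":
    # Toy "human judgments" on a 0..1 scale, invented for this example.
    pairs = [
        ("the car engine", "the automobile engine"),
        ("the doctor", "the physician"),
        ("the car engine", "the hospital"),
    ]
    scores = [0.9, 0.9, 0.1]
    print(learn_alpha(pairs, scores))  # prints 0.24 on this toy data
```

On these invented judgments the learned weight leans toward the concept score, because a pair such as "doctor"/"physician" shares no content words and word overlap alone underrates it. A real system would replace the toy map with a proper concept source and the one-parameter grid search with a trained model over richer features.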
This paper has 53 citations.

14 Figures & Tables
