Learning a concept-based document similarity measure

@article{Huang2012LearningAC,
  title={Learning a concept-based document similarity measure},
  author={Anna-Lan Huang and David N. Milne and Eibe Frank and Ian H. Witten},
  journal={JASIST},
  year={2012},
  volume={63},
  pages={1593--1608}
}
Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning…

Semantic Scholar estimates that this publication has 53 citations based on the available data.
