Learning a concept-based document similarity measure

@article{Huang2012LearningAC,
  title={Learning a concept-based document similarity measure},
  author={Anna-Lan Huang and David N. Milne and Eibe Frank and Ian H. Witten},
  journal={JASIST},
  year={2012},
  volume={63},
  pages={1593--1608}
}
Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning…
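The abstract contrasts surface word overlap with concept-level similarity, and says the two are combined using weights learned from human judgments. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. `CONCEPTS` is a hypothetical toy dictionary standing in for a real concept annotator, cosine similarity is assumed as the comparison function at both levels, and ordinary least squares (the hypothetical `fit_weights`) is swapped in for whatever learner the paper actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy word-to-concept dictionary; a stand-in for a real concept
# annotator. Not taken from the paper.
CONCEPTS = {
    "car": "Automobile", "automobile": "Automobile", "vehicle": "Automobile",
    "doctor": "Physician", "physician": "Physician",
    "hospital": "Hospital", "clinic": "Hospital",
}


def tokens(text):
    """Lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexical_sim(d1, d2):
    """Surface overlap: cosine over raw word counts."""
    return cosine(Counter(tokens(d1)), Counter(tokens(d2)))


def concept_sim(d1, d2):
    """Semantic overlap: cosine over the concepts that the words map to."""
    def concepts(d):
        return Counter(CONCEPTS[t] for t in tokens(d) if t in CONCEPTS)
    return cosine(concepts(d1), concepts(d2))


def fit_weights(pairs, ratings):
    """Least-squares weights (w_lex, w_con), no intercept, fit to human ratings.

    Solves the 2x2 normal equations by Cramer's rule. This is an assumed
    stand-in for the paper's machine-learning step.
    """
    feats = [(lexical_sim(a, b), concept_sim(a, b)) for a, b in pairs]
    s11 = sum(x * x for x, _ in feats)
    s12 = sum(x * y for x, y in feats)
    s22 = sum(y * y for _, y in feats)
    t1 = sum(x * r for (x, _), r in zip(feats, ratings))
    t2 = sum(y * r for (_, y), r in zip(feats, ratings))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; need more varied training pairs")
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def combined_sim(d1, d2, weights):
    """Weighted sum of the lexical and concept-level scores."""
    w_lex, w_con = weights
    return w_lex * lexical_sim(d1, d2) + w_con * concept_sim(d1, d2)
```

For example, "The car is fast" and "The automobile is quick" share only "the" and "is", so `lexical_sim` gives 0.5, while both map to the `Automobile` concept, so `concept_sim` gives 1.0. A linear combination keeps the learned weights easy to inspect; with real human ratings in place of toy data, the fitted weights indicate how much each level contributes.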

14 Figures & Tables


Statistics

Citations per Year (chart; years 2016 to 2018)

Citation Velocity: 10

Averaging 10 citations per year over the last 3 years.
