Scraping the ACM Digital Library

Authors: Donna Bergmark, Paradee Phempoonpanich, Shumin Zhao
Venue: SIGIR Forum
As part of a larger project to automatically add reference links to the online scholarly literature, an attempt was made to analyze PDF documents, with the ACM Digital Library serving as the corpus for the experiments. Using current PDF and HTML analysis tools, reference-linking information was extracted automatically with roughly 80% accuracy.