ParsCit: an Open-source CRF Reference String Parsing Package


We describe ParsCit, a freely available, open-source implementation of a reference string parsing package. At the core of ParsCit is a trained conditional random field (CRF) model used to label the token sequences in the reference string. A heuristic model wraps this core with added functionality to identify reference strings in a plain text file and to retrieve the citation contexts. The package comes with utilities to run it as a web service or as a standalone utility. We evaluate ParsCit on three distinct reference string datasets and show that it compares well with previously published work.
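To make the idea of labeling a token sequence concrete, the following is a minimal sketch of Viterbi decoding over a linear-chain model, the kind of inference a CRF tagger performs. It is not ParsCit's code: the three-label set, the feature tests, and every numeric score are hypothetical and hand-set for illustration, whereas a trained CRF learns its weights from annotated reference strings.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label set; a real reference parser uses a richer one.
LABELS: List[str] = ["author", "date", "title"]

# Hypothetical transition scores (previous label, current label).
# Pairs not listed here receive DEFAULT_TRANSITION.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("author", "author"): 1.0,
    ("date", "date"): 1.0,
    ("title", "title"): 1.0,
    ("author", "date"): 0.5,
    ("date", "title"): 0.5,
}
DEFAULT_TRANSITION = -1.0


def emission_score(token: str, label: str) -> float:
    """Score a (token, label) pair with hand-set, illustrative features."""
    core = token.strip("().,")
    is_year = len(core) == 4 and core.isdigit()
    is_initial = len(token) == 2 and token[0].isupper() and token[1] == "."
    if label == "date":
        return 3.0 if is_year else -3.0
    if label == "author":
        return (2.0 if is_initial else 0.0) + (1.0 if token.endswith(",") else 0.0)
    return 0.5  # "title" acts as a weak default label


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    # Each column maps label -> (best score ending in label, previous label).
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {lab: (emission_score(tokens[0], lab), None) for lab in LABELS}
    ]
    for token in tokens[1:]:
        column: Dict[str, Tuple[float, Optional[str]]] = {}
        for lab in LABELS:
            prev_lab, prev_score = max(
                (
                    (p, best[-1][p][0] + TRANSITIONS.get((p, lab), DEFAULT_TRANSITION))
                    for p in LABELS
                ),
                key=lambda pair: pair[1],
            )
            column[lab] = (prev_score + emission_score(token, lab), prev_lab)
        best.append(column)

    # Backtrack from the best final label to recover the full path.
    label = max(LABELS, key=lambda lab: best[-1][lab][0])
    path = [label]
    for column in reversed(best[1:]):
        label = column[label][1]
        path.append(label)
    path.reverse()
    return path


if __name__ == "__main__":
    tokens = "Councill, I. (2008) ParsCit parser".split()
    for token, label in zip(tokens, viterbi(tokens)):
        print(f"{token}\t{label}")
```

With these illustrative scores, the toy input is decoded as two author tokens, one date token, and two title tokens. The transition scores are what make this sequence labeling rather than per-token classification: a token with weak evidence of its own tends to inherit a label compatible with its neighbors.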

Cite this paper

@inproceedings{Councill2008ParsCitAO,
  title={ParsCit: an Open-source CRF Reference String Parsing Package},
  author={Isaac G. Councill and C. Lee Giles and Min-Yen Kan},
  booktitle={LREC},
  year={2008}
}