ParsCit: an Open-source CRF Reference String Parsing Package


We describe ParsCit, a freely available, open-source implementation of a reference string parsing package. At the core of ParsCit is a trained conditional random field (CRF) model used to label the token sequences in the reference string. A heuristic model wraps this core with added functionality to identify reference strings in a plain text file and to retrieve the citation contexts. The package comes with utilities to run it as a web service or as a standalone utility. We evaluate ParsCit on three distinct reference string datasets and show that it compares well with previously published work.
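To make the idea of labeling a reference string's token sequence concrete, the following is a minimal Python sketch of the data a sequence labeler works with. It is not ParsCit's code: the feature names, the label set (`author`, `date`, `title`, `booktitle`), and the helper functions are assumptions chosen for illustration, and the labels are assigned by hand in place of the output of a trained CRF.

```python
import re

def tokenize(reference):
    """Split a reference string on whitespace, leaving punctuation attached."""
    return reference.split()

def token_features(tokens, i):
    """Illustrative per-token features (hypothetical, not ParsCit's feature set)."""
    tok = tokens[i]
    core = tok.strip(".,;:()[]\"'")  # token without surrounding punctuation
    return {
        "lower": core.lower(),
        "is_capitalized": core[:1].isupper(),
        "is_all_digits": core.isdigit(),
        "looks_like_year": bool(re.fullmatch(r"(19|20)\d{2}", core)),
        "ends_with_period": tok.endswith("."),
        "relative_position": round(i / max(len(tokens) - 1, 1), 2),
    }

def group_fields(tokens, labels):
    """Merge consecutive tokens that share a label into (label, text) fields."""
    fields = []
    for tok, lab in zip(tokens, labels):
        if fields and fields[-1][0] == lab:
            fields[-1] = (lab, fields[-1][1] + " " + tok)
        else:
            fields.append((lab, tok))
    return fields

reference = (
    "Isaac G. Councill, C. Lee Giles, Min-Yen Kan. 2008. "
    "ParsCit: an Open-source CRF Reference String Parsing Package. In LREC."
)
tokens = tokenize(reference)

# Hand-assigned labels standing in for what a trained CRF would predict.
labels = ["author"] * 8 + ["date"] + ["title"] * 8 + ["booktitle"] * 2

features = [token_features(tokens, i) for i in range(len(tokens))]
fields = group_fields(tokens, labels)

for label, text in fields:
    print(f"{label}: {text}")
```

In a real system the per-token feature dictionaries would be passed to a trained sequence model, which predicts one label per token; grouping adjacent tokens with the same label then yields the parsed fields.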

Cite this paper

@inproceedings{Councill2008ParsCitAO,
  title={ParsCit: an Open-source CRF Reference String Parsing Package},
  author={Isaac G. Councill and C. Lee Giles and Min-Yen Kan},
  booktitle={LREC},
  year={2008}
}