Corpus ID: 141747

ParsCit: an Open-source CRF Reference String Parsing Package

@inproceedings{Councill2008ParsCitAO,
  title={ParsCit: an Open-source CRF Reference String Parsing Package},
  author={Isaac G. Councill and C. Lee Giles and Min-Yen Kan},
  booktitle={LREC},
  year={2008}
}
We describe ParsCit, a freely available, open-source implementation of a reference string parsing package. [...] A heuristic model wraps this core with added functionality to identify reference strings from a plain text file and to retrieve the citation contexts. The package comes with utilities to run it as a web service or as a standalone utility. We compare ParsCit on three distinct reference string datasets and show that it compares well with other previously published work.
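As context for the CRF core described in the abstract, the following is a minimal, self-contained sketch of CRF-based token labelling of reference strings. It uses the third-party sklearn-crfsuite package rather than ParsCit's own model, and the features, labels, and toy data are illustrative assumptions, not ParsCit's actual feature templates.

```python
# Minimal sketch (not ParsCit's code): label each token of a reference
# string with a bibliographic field using a linear-chain CRF.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple surface features for one token in a reference string."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "is_punct": not tok.isalnum(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Two toy training references, tokenized and labelled with bibliographic fields.
train_tokens = [
    ["Councill", ",", "I", ".", "2008", ".", "ParsCit", "."],
    ["Giles", ",", "C", ".", "1998", ".", "CiteSeer", "."],
]
train_labels = [
    ["author", "author", "author", "author", "date", "date", "title", "title"],
    ["author", "author", "author", "author", "date", "date", "title", "title"],
]

X = [[token_features(toks, i) for i in range(len(toks))] for toks in train_tokens]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, train_labels)

test = ["Kan", ",", "M", ".", "2008", ".", "LREC", "."]
print(crf.predict_single([token_features(test, i) for i in range(len(test))]))
```

In ParsCit itself the CRF is only the core; the heuristic wrapper that locates reference strings in a plain text file and retrieves citation contexts sits outside this kind of model.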
Evaluation and Comparison of Open Source Bibliographic Reference Parsers: A Business Use Case
TLDR
This study applies, evaluates, and compares ten reference parsing tools in a specific business use case, and confirms that tuning the models to task-specific data results in an increase in quality.
Empirical Evaluation of CRF-Based Bibliography Extraction from Reference Strings
TLDR
An empirical evaluation of a CRF-based bibliography parser for research-paper reference strings, which uses a conditional random field to estimate the correct bibliographic label, such as an author's name or a title, for each token in a reference string.
GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing
TLDR
GIANT is a large dataset of 991,411,100 XML-labeled reference strings that can be used to train machine learning models, particularly deep learning models, for citation parsing; the authors hypothesise that the dataset will significantly improve the accuracy of citation parsing.
ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers
TLDR
ParsRec, a meta-learning-based recommender system that recommends the potentially most effective parser for a given reference string, is proposed and evaluated on 105k references from chemistry.
Error Detection of CRF-Based Bibliography Extraction from Reference Strings
TLDR
An empirical evaluation of the proposed parsing, in terms of its accuracy and how easily its errors can be detected, showed that the proposed measures reasonably indicated parsing errors and could be used to improve the quality of extracted bibliographies at a moderate manual post-editing cost.
Neural ParsCit: a deep learning-based reference string parser
We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network [...]
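A minimal sketch of the kind of LSTM tagger this line of work describes, written in PyTorch; this is a simplified stand-in, not Neural ParsCit's actual architecture, and the vocabulary size, label set, and dimensions are illustrative assumptions.

```python
# Minimal sketch: a bidirectional LSTM that tags each token of a
# reference string with a bibliographic field.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden_dim=128, num_labels=13):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        return self.out(h)                         # per-token label scores

model = BiLSTMTagger()
scores = model(torch.randint(0, 5000, (1, 8)))     # one 8-token reference
print(scores.argmax(dim=-1))                       # predicted label ids per token
```

Each token id is embedded, run through the bidirectional LSTM, and projected to per-field scores; the per-position argmax is a greedy stand-in for the structured decoding that a CRF output layer would provide.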
Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers
TLDR
This study applies, evaluates, and compares ten reference parsing tools in a specific business use case, and confirms that tuning the models to task-specific data results in an increase in quality.
Evaluating Reference String Extraction Using Line-Based Conditional Random Fields: A Case Study with German Language Publications
TLDR
This work proposes a classification model that considers every line in a publication as a potential part of a reference string by applying line-based conditional random fields, rather than constructing the graphical model from the individual words, dependencies, and patterns that are typical in reference sections.
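A minimal sketch of the line-based idea just described: every line of a document becomes one element of a sequence with simple layout features, and a CRF labels each line as reference material or not. This again uses sklearn-crfsuite for illustration; the features, labels, and toy document are assumptions, not the paper's implementation.

```python
# Minimal sketch: label whole lines (not tokens) as reference / other
# with a linear-chain CRF over the sequence of lines.
import re
import sklearn_crfsuite

def line_features(lines, i):
    line = lines[i]
    return {
        "starts_with_bracket_num": bool(re.match(r"\[\d+\]", line)),
        "contains_year": bool(re.search(r"\b(19|20)\d{2}\b", line)),
        "ends_with_period": line.rstrip().endswith("."),
        "length_bucket": min(len(line) // 20, 5),
        "prev_starts_with_bracket_num": bool(re.match(r"\[\d+\]", lines[i - 1])) if i > 0 else False,
    }

doc_lines = [
    "4. Conclusion",
    "We presented a reference parser.",
    "References",
    "[1] I. Councill, C. L. Giles, M.-Y. Kan. ParsCit. LREC 2008.",
    "[2] P. Lopez. GROBID. ECDL 2009.",
]
labels = [["other", "other", "other", "reference", "reference"]]

X = [[line_features(doc_lines, i) for i in range(len(doc_lines))]]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict_single(X[0]))
```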
Reference String Extraction Using Line-Based Conditional Random Fields
TLDR
A classification model is proposed that considers every line in a publication as a potential part of a reference string by applying line-based conditional random fields, rather than constructing the graphical model from the individual words, dependencies, and patterns that are typical in reference sections.

References

Showing 1-10 of 19 references
Multiple Alignment of Citation Sentences with Conditional Random Fields and Posterior Decoding
TLDR
This paper addresses the problem of multiple citation concept alignment by combining and modifying the CRF-based pairwise word alignment system of Blunsom & Cohn (2006) and the posterior-decoding-based multiple sequence alignment algorithm of Schwartz & Pachter (2007).
Citation Parsing Using Maximum Entropy and Repairs
TLDR
This thesis presents ParsCit, a system which parses citations from a publication or article and labels parts of citations with their corresponding field names, and emphasizes the efficiency of the system's set of repairs, which is absent in citation parsers seen to date.
Extracting Citation Metadata from Online Publication Lists Using BLAST
TLDR
This work presents a new methodology based on a protein sequence alignment tool, and develops a template-generating system to transform known semi-structured citation strings into protein sequences, which are saved as templates in a database.
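A minimal sketch of the encoding idea described in this entry: citation tokens are mapped onto a small alphabet (standing in for amino-acid letters) so that off-the-shelf sequence-alignment tools can align an unseen citation against stored templates. The token classes and letter assignments here are illustrative assumptions, not the paper's actual scheme, and the BLAST alignment step itself is omitted.

```python
# Minimal sketch: encode a citation string as a pseudo-protein sequence,
# one letter per token class, so alignment tools can compare citations.
import re

# Hypothetical mapping from token class to a single letter.
TOKEN_CLASS_TO_LETTER = {
    "year": "Y",         # four-digit year, e.g. 2008
    "number": "N",       # other numerals (pages, volume)
    "capitalized": "C",  # capitalized word (possible author/venue token)
    "lower": "L",        # lowercase word
    "punct": "P",        # punctuation
}

def token_class(tok: str) -> str:
    if re.fullmatch(r"(19|20)\d{2}", tok):
        return "year"
    if re.fullmatch(r"\d+", tok):
        return "number"
    if re.fullmatch(r"[^\w\s]+", tok):
        return "punct"
    return "capitalized" if tok[0].isupper() else "lower"

def encode_citation(citation: str) -> str:
    """Turn a citation string into a sequence of token-class letters."""
    tokens = re.findall(r"\w+|[^\w\s]", citation)
    return "".join(TOKEN_CLASS_TO_LETTER[token_class(t)] for t in tokens)

print(encode_citation("Councill, I., Giles, C. L., Kan, M. 2008. ParsCit."))
```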
Automatic classification of citation function
TLDR
This work shows that the annotation scheme for citation function is reliable, and presents a supervised machine learning framework to automatically classify citation function using both shallow and linguistically inspired features, finding a strong relationship between citation function and sentiment classification.
FLUX-CIM: flexible unsupervised extraction of citation metadata
TLDR
A knowledge-base approach to help extract the correct components of citations in any given format; the approach is unsupervised, in the sense that it does not rely on a learning method that requires a training phase.
Evidence-Based Information Extraction for High Accuracy Citation and Author Name Identification
TLDR
This paper presents techniques for high-accuracy extraction of citations and references from academic papers by collecting multiple sources of evidence about entities from documents, and by integrating citation extraction, reference segmentation, and citation-reference matching.
CiteSeer: an automatic citation indexing system
TLDR
CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases that are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations.
Accurate Information Extraction from Research Papers using Conditional Random Fields
TLDR
New state-of-the-art performance is achieved on a standard benchmark data set, reducing error in average F1 by 36% and word error rate by 78% in comparison with the previous best SVM results.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
TLDR
This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
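For reference, the linear-chain conditional random field at the heart of this and the preceding papers defines the probability of a label sequence y (e.g., bibliographic fields) given an observation sequence x (the tokens) in the standard form below; the notation is the usual one, not a quotation from the paper.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

The weights \lambda_k are what the iterative parameter estimation algorithms mentioned above fit; the feature functions f_k correspond to the token- and line-level features sketched earlier.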
Learning Hidden Markov Model Structure for Information Extraction
TLDR
It is demonstrated that a manually constructed model containing multiple states per extraction field outperforms a model with one state per field, and that the use of distantly labeled data to set model parameters provides a significant improvement in extraction accuracy.
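For contrast with the conditional model above, the hidden Markov models used for extraction here are generative: they model the joint probability of the token sequence x and the state (field) sequence y in the standard form below (standard notation, added for context).

```latex
p(x, y) = \prod_{t=1}^{T} p\big(y_t \mid y_{t-1}\big)\, p\big(x_t \mid y_t\big)
```

Allowing several states per extraction field, as the paper does, enriches the transition structure p(y_t \mid y_{t-1}) without changing this factorization.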