Corpus ID: 4667931

Template mining for the extraction of citation from digital documents

@inproceedings{Foo2001TemplateMF,
  title={Template mining for the extraction of citation from digital documents},
  author={Schubert Foo and Gobinda G. Chowdhury and Ding Ying},
  year={2001}
}
Information Extraction (IE) is the activity of automatically extracting pre-specified sorts of information from short, natural language texts. It may be seen as the activity of populating a structured information source (or database) from an unstructured, or free text, information source. This structured database can then be used for a number of purposes: for creating a citation database, for report generation, for decision making in business, for using data-mining or…
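The idea of filling a structured record from free text with a template can be illustrated with a minimal sketch. This is not the authors' method: the pattern `CITATION_TEMPLATE`, the function `extract_citation`, the assumed citation layout ("Authors (Year). Title. Venue."), and the sample citation are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical template for ONE assumed citation layout:
#   "Authors (Year). Title. Venue."
# Real citation styles vary widely; a practical system would need many templates.
CITATION_TEMPLATE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<venue>.+?)\.?$"
)


def extract_citation(text):
    """Return a dict of template slots, or None if the template does not fit."""
    match = CITATION_TEMPLATE.match(text.strip())
    return match.groupdict() if match else None


# Invented sample citation, used only to exercise the template.
record = extract_citation(
    "Doe, J. and Roe, R. (1999). A sample title about templates. "
    "Journal of Examples."
)
print(record)
```

Each named group plays the role of a template slot, so a successful match yields one row of a structured citation database, while text that does not fit the template is rejected rather than guessed at.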
Rule based Autonomous Citation Mining with TIERL
TLDR
A novel rule-based autonomous citation mining technique is proposed that is able to overcome limitations of current leading citation indexes such as ISI Web of Knowledge, Citeseer and Google Scholar and significantly enhances the correct discovery of citations.
Evidence-Based Information Extraction for High Accuracy Citation and Author Name Identification
TLDR
This paper presents techniques for high accuracy extraction of citations and references from academic papers by collecting multiple sources of evidence about entities from documents, and integrating citation extraction, reference segmentation, and citation-reference matching.
Locating and parsing bibliographic references in HTML medical articles
  • Jie Zou, D. Le, G. Thoma
  • Computer Science, Medicine
  • International Journal on Document Analysis and Recognition (IJDAR)
  • 2009
TLDR
This paper describes a two-step process using statistical machine learning algorithms to first locate the references in HTML medical articles and then parse them, which achieves near-perfect precision and recall rates.
A knowledge-based approach to citation extraction
TLDR
This paper proposes a knowledge-based approach to literature mining and focuses on reference metadata extraction methods for scholarly publications, adopting an ontological knowledge representation framework called INFOMAP to automatically extract the reference metadata.
An Integrated Architecture for Processing Business Documents in Turkish
TLDR
This paper covers the first research activity in the field of automatic processing of business documents in Turkish and proposes a rule-based approach built on an extraction ontology, which increases portability because it requires only domain concepts, in contrast to information extraction systems that rely on large sets of linguistic patterns.
A structural SVM approach for reference parsing
TLDR
A comparison of these two methods with CRF using the same set of binary features shows that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.
Bilingual PRESRI - Integration of Multiple Research Paper Databases
TLDR
PRESRI develops a system that makes it possible to understand the relationships between papers intuitively based on citation information, and proposes a method for extracting bibliographic information from Postscript and PDF files based on an SVM.
A Structural SVM Approach for Reference Parsing
TLDR
Two types of contextual features are used to compare structural SVM with conventional SVM, and both methods achieve above 98% token classification accuracy and above 95% overall chunk-level accuracy for reference parsing.
Automatic Citation Metadata Extraction Using Hidden Markov Models
  • Zhen Ni, Hong Xu
  • Computer Science
  • 2009 First International Conference on Information Science and Engineering
  • 2009
TLDR
This paper describes a method for citation metadata extraction using hidden Markov models that use unlabeled data (plain texts from which the authors want to extract metadata) as training data and has good performance in precision and recall.
Introducing structure management in automatic reference resolution: An XML-based approach
TLDR
This article presents a methodology, and an application case, to automatically extract and solve references to fragments of structured documents, and takes advantage of XML markup to locate the position within the structure in which the references are found.

References

Template Mining for Information Extraction from Digital Documents
TLDR
This article briefly reviews template mining research and shows how templates are used in Web search engines (such as Alta Vista) and in meta-search engines (such as Ask Jeeves) for helping end-users generate natural language search expressions.
Information Extraction: Beyond Document Retrieval
In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities,…
Automatic extraction of citations from the text of English-language patents - an example of template mining
TLDR
Methods for automatically isolating and extracting bibliographic references from the full texts of patents are described and evaluated, and a template mining approach has been developed to relieve patent examiners of the chore of doing this manually.
Automatic Extraction of Facts from Press Releases to Generate News Stories
TLDR
JASPER is a fact extraction system recently developed and deployed by Carnegie Group for Reuters Ltd, which uses a template-driven approach, partial understanding techniques, and heuristic procedures to extract certain key pieces of information from a limited range of text.
FIES: financial information extraction system
TLDR
This paper presents the design and implementation of the FIES system, and describes the evaluation which was carried out, the results obtained and the limitations of the system.
Information extraction
TLDR
A relatively new development, information extraction (IE), is the subject of this article and can transform the raw material, refining and reducing it to a germ of the original text.
Financial information extraction using pre-defined and user-definable templates in the LOLITA system
TLDR
After describing LOLITA as a general purpose base NLP system, the paper addresses the issue of how information extraction is performed within the system and how the user-definable template interface has been designed.
Information Extraction
  • M. Pazienza
  • Computer Science
  • Lecture Notes in Computer Science
  • 2002
TLDR
This paper discusses attempts to derive templates directly from corpora and to derive knowledge structures and lexicons directly from corpora, including discussion of the recent LE project ECRAN which attempted to tune existing lexicons to new corpora.
Distilling information from text: the EDS TemplateFiller system
TLDR
A system is described which digests large volumes of text, filtering out irrelevant articles and distilling the remainder into templates that represent information from the articles in simple slot/filler pairs, taking advantage of simple string matching techniques to improve the effectiveness of more complex sentence‐level semantic processes.
Distilling Information from Text: The EDS TemplateFiller System
TLDR
A system is described which digests large volumes of text, filtering out irrelevant articles and distilling the remainder into templates that represent information from the articles in simple slot/filler pairs, taking advantage of simple string matching techniques to improve the effectiveness of more complex sentence-level semantic processes.