CiteSeer: an automatic citation indexing system

@inproceedings{Giles1998CiteSeerAA,
  title={CiteSeer: an automatic citation indexing system},
  author={C. Lee Giles and Kurt D. Bollacker and Steve Lawrence},
  booktitle={DL '98},
  year={1998}
}
We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. PostScript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g. the ISI citation indexes), including: literature retrieval by following…
Exploring Automatic Citation Classification
TLDR
A new citation scheme that is easier to work with than most, a document acquisition and citation annotation tool that helps with the development of annotated citation corpora, and some experiments with automating citation classification are presented.
Autonomous citation matching
TLDR
This work presents machine learning techniques that identify variant forms of citations to the same paper, and identifies the algorithms that perform best and are sufficiently accurate for unassisted use in an autonomous citation indexing system.
CAD: an algorithm for citation-anchors detection in research papers
TLDR
The paper proposes a rule-based algorithm, CAD, for identifying citation-anchors and their in-text citation frequency, and shows that CAD improved the F-score by 44% and 37% on the J.UCS and CiteSeer datasets, respectively, over the contemporary technique.
Extracting Citation Metadata from Online Publication Lists Using BLAST
TLDR
This work presents a new methodology based on a protein sequence alignment tool, and develops a template-generating system to transform known semi-structured citation strings into protein sequences, which are saved as templates in a database.
Lessons Learned: The Complexity of Accurate Identification of in-Text Citations
TLDR
Accurate identification of in-text citations will help information retrieval systems, digital libraries, and citation indexes; the work also highlights the problems (mathematical ambiguities, wrong allotments, commonality in content, and string variation) in identifying in-text citations from scientific documents.
An annotation scheme for citation function
We study the interplay of the discourse structure of a scientific argument with formal citations. One subproblem of this is to classify academic citations in scientific articles according to their…
Pattern Analysis of Citation-Anchors in Citing Documents for Accurate Identification of In-Text Citations
TLDR
A taxonomy and a workable system are proposed; the system uses a set of heuristics built from a detailed study and is applied to an unseen, diversified data set taken from the Journal of Universal Computer Science and CiteSeer.
Rule based Autonomous Citation Mining with TIERL
TLDR
A novel rule-based autonomous citation mining technique is proposed that is able to overcome limitations of current leading citation indexes such as ISI Web of Knowledge, CiteSeer and Google Scholar and significantly enhances the correct discovery of citations.
Are Your Citations Clean? New Scenarios and Challenges in Maintaining Digital Libraries
In many scientific-publication digital libraries (DLs) such as CiteSeer, arXiv e-Print, DBLP, or Google Scholar, “citations” play an important role. (The term “citation” refers to the collection of…
Effects of Unpopular Citation Fields in Citation Matching Performance
TLDR
It is proposed that there is always a best combination of citation record fields that helps increase citation matching performance, and that this holds regardless of which research framework one adopts, such as Machine Learning methods or Information Retrieval algorithms.

References

AUTOMATIC INDEXING USING BIBLIOGRAPHIC CITATIONS
TLDR
It is shown that the use of bibliographic citations in addition to the normal keyword‐type indicators produces improved retrieval performance, and that in some circumstances, citations are more effective for retrieval purposes than other more conventional terms and concepts.
Comparative citation rankings of authors in monographic and journal literature: a study of sociology
TLDR
The study examined the scholarly literature of sociology and found that the relative rankings of authors who were highly cited in the monographic literature did not change in the journal literature of the same period, suggesting that there may be two distinct populations of highly cited authors.
Evidence of complex citer motivations
Twenty scholars were interviewed about their citation motives in recently published articles. Their 437 citations were scaled along 1 or more of the following 7 citer motives: currency, negative…
Cited Documents as Concept Symbols
TLDR
An interpretation of citation practice in scientific literature is offered which regards citation of a document as an act of symbol usage around the footnote number, and a high degree of uniformity is revealed in the association of specific concepts with specific documents.
On-the-fly Hyperlink Creation for Page Images
TLDR
Using the World-Wide Web, a system for creating hypertext links on the fly in a library composed of bitmapped images of paper documents and text derived from those images by optical-character recognition is described.
Term-Weighting Approaches in Automatic Text Retrieval
TLDR
This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
An algorithm for suffix stripping
TLDR
An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL and performs slightly better than a much more elaborate system with which it has been compared.
On the Specification of Term Values in Automatic Indexing
TLDR
It is shown that the standard theories for the specification of term values (or weights) are not adequate, and new techniques are introduced for the assignment of weights to index terms, based on the characteristics of individual document collections.
Citation indexing: its theory and application in science
Citation indexing: its theory and application in science, technology, and humanities…
Data structures and algorithms for nearest neighbor search in general metric spaces
TLDR
The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.