A Comparison of On-Line Computer Science Citation Databases

This paper examines the differences and similarities between the two on-line computer science citation databases DBLP and CiteSeer. The database entries in DBLP are inserted manually while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer's autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation…
Network-based statistical comparison of citation topology of bibliographic databases
This work compares the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org to reveal statistically significant inconsistencies between some of the databases with respect to individual statistics.
The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency.
What is the best database for computer science journal articles?
It is found that WoS, INSPEC and Scopus provided better quality indexing and better bibliographic records in terms of accuracy, control and granularity of information, when compared to GS and DBLP.
Tsallis q-exponential describes the distribution of scientific citations—a new characterization of the impact
The analysis of the experimental data shows that, within a nonextensive thermostatistical formalism, the Tsallis q-exponential distribution N(c) satisfactorily describes Institute of Scientific Information citations.
Error correction of reference indexing system including multimedia journals
A reference index system and database, ‘Science Citation Index Processing System: SCIPS’, has been developed for reference indexing and for evaluating impact factors and immediacy indices; the experimental results show the validity of the presented system.
An Analysis of the Evolving Coverage of Computer Science Sub-fields in the DBLP Digital Library
It is shown that the DBLP project started with a narrow focus on two sub-fields and that additional themes have been added in recent years; a model is provided which explains the differences in coverage.
Characterising Web Site Link Structure
Analysis of 18 exhaustively crawled web sites showed that the internal link structures of the web sites are significantly different when measured with first- and second-order topological properties, i.e. properties based on the connectivity of an individual node or a pair of nodes.
Record Matching in Digital Library Metadata: Using evidence from external sources to create more accurate matching systems
The de-duplication task takes a list of metadata records as input and returns the list with duplicate records removed, and here, the problem and its solutions are examined.
WikiCSSH: Extracting and Evaluating Computer Science Subject Headings from Wikipedia
A human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, and then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the initial tree, resulted in WikiCSSH, a large-scale, hierarchically-organized vocabulary for the domain of computer science (CS).
Unsupervised Metadata Extraction in Scientific Digital Libraries Using A-Priori Domain-Specific Knowledge
This paper proposes and presents a novel approach focusing on the improvement in the metadata extraction quality without involving external information sources, but relying on the information present in the document itself and in its corresponding context.


How popular is your paper? An empirical study of the citation distribution
Numerical data for the distribution of citations are examined for: (i) papers published in 1981 in journals which are catalogued by the Institute for Scientific Information (783,339 papers)
The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives
The most time-consuming task for the maintainers of DBLP may be viewed as a special instance of the authority control problem: how to normalize different spellings of person names.
Unfamiliar citations breed mistakes
The tendency to make errors when typing strings of familiar and unfamiliar names is examined, showing that typing errors occur at a greater rate in long strings of non-English names, and suggesting that undercitation could be an artefact of miscitation rather than true undercitation, which casts doubt on the objectivity of citation analysis.
Free online availability substantially increases a paper's impact
The results are dramatic, showing a clear correlation between the number of times an article is cited and the probability that the article is online, in computer science.
Browsing and visualizing digital bibliographic data
An overview of some important research issues within the field of bibliographical information retrieval and visualization within the DBLP (Digital Bibliography & Library Project) Computer Science Bibliography is given.
A comparative study of citations from papers by Korean scientists and their journal attributes
The analysis of variance indicated a significant difference between journal citations in Korean sources and publisher type in the chemistry field, and a correlation between several journal attributes and citations in the physics and computer science fields was indicated by the study.
Statistics of citation networks
The out-degree distribution of citation networks is investigated. Statistical data of the number of papers cited within a paper (out-degree) for different journals in the period 1991-1999 is
Read Before You Cite!
The application of statistical analysis to misprints in scientific citations can give an insight into the process of scientific writing as well as explain empirical studies of misprint distributions in citations.
Are citations of scientific papers a case of nonextensivity?
The distribution N(x) of citations of scientific papers has recently been illustrated (on ISI and PRE data sets) and analyzed by Redner (Eur. Phys. J. B 4, 131 (1998)). To fit the data, a
Citation networks in high energy physics
The citation network constituted by the SPIRES database is investigated empirically and a consideration of citation distribution by subfield shows that the citation patterns of high energy physics form a remarkably homogeneous network.