Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?

@article{Labb2012DuplicateAF,
  title={Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?},
  author={Cyril Labb{\'e} and Dominique Labb{\'e}},
  journal={Scientometrics},
  year={2012},
  volume={94},
  pages={379--396}
}
Two kinds of bibliographic tools are used to retrieve scientific publications and make them available online. For one kind, access is free, as they store information made publicly available online. For the other kind, access fees are required, as they are compiled from information provided by the major publishers of scientific literature. The former can easily be interfered with, but it is generally assumed that the latter guarantee the integrity of the data they sell. Unfortunately, duplicate and…
On the Use of Similarity Search to Detect Fake Scientific Papers
TLDR
An investigation into the use of similarity search for detecting fake scientific papers is described, comparing several methods for signature construction and similarity scoring; a pseudo-relevance feedback technique that can improve the effectiveness of these methods is also described.
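The summary does not say which signatures or scores the paper compares. As a minimal sketch of the general idea, assuming one common choice (word 3-gram "shingles" as the signature and Jaccard overlap as the similarity score), a query document can be ranked against a corpus as follows. The names `shingles`, `jaccard`, and `search` are hypothetical, and the pseudo-relevance feedback step is not shown.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Signature (assumed): the set of lowercase word k-grams ("shingles") in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Similarity score |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank every corpus document by the similarity of its signature to the query's."""
    q = shingles(query, k)
    scores = [(doc_id, jaccard(q, shingles(text, k))) for doc_id, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "generated": "we propose a novel methodology for the study of lambda calculus",
        "unrelated": "the weather today is sunny and warm",
    }
    query = "we propose a novel methodology for the analysis of lambda calculus"
    print(search(query, corpus))
    # [('generated', 0.5), ('unrelated', 0.0)]
```

Scores fall in [0, 1]; a threshold for flagging a paper as suspicious would have to be tuned on labeled data and is not given here.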
Detection of computer generated papers in scientific literature
TLDR
This paper presents the main characteristics of texts generated by PCFG and Markov chains, and shows that quantitative tools are effective to characterize the originality (or banality) of authors' language.
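To illustrate what "texts generated by Markov chains" means, here is a minimal sketch of an order-1 word Markov chain generator: each next word is drawn only from the words observed to follow the current word in a training text. The names `train` and `generate` are hypothetical; this is a generic illustration, not the generator studied in the paper.

```python
import random
from collections import defaultdict


def train(text: str) -> dict[str, list[str]]:
    """Map each word to every word observed immediately after it (order-1 chain)."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> list[str]:
    """Walk the chain from `start`, emitting at most `length` words."""
    output = [start]
    while len(output) < length:
        followers = chain.get(output[-1])
        if not followers:  # dead end: the last word was never followed by anything
            break
        # Repeated followers make frequent transitions proportionally more likely.
        output.append(rng.choice(followers))
    return output


if __name__ == "__main__":
    corpus = "the model is fast and the model is small"
    print(" ".join(generate(train(corpus), "the", 8, random.Random(0))))
```

Every local transition in the output occurred in the training text, which is why such text can look fluent word by word while carrying no meaning overall.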
Referees Often Miss Obvious Errors in Computer and Electronic Publications
TLDR
A scientometric study of 350 articles finds that at least 85.4% of the articles are incongruous, and that incorrect informational cascades ruin the literature's signal-to-noise ratio even for uncomplicated cases.
Misconduct is extensive and damaging. So-called science is prevalent. Articles resulting from so-called science are often cited in other publications. This can have damaging consequences for society.
Google Scholar as a Data Source for Research Assessment
TLDR
This chapter lays the foundations for the use of Google Scholar (GS) as a supplementary source (and, in some disciplines, arguably the best alternative) for scientific evaluation, and argues that GS presents a broader view of the academic world because it has brought to light a large number of sources that were not previously visible.
Web indicators for research evaluation. Part 1: Citations and links to academic articles from the Web
TLDR
Research about Google Scholar and Google Patents, both of which can be used as sources of impact indicators for academic articles, is reviewed, along with methods to extract types of links and citations from the web as a whole.
Google Scholar as a data source for research assessment
TLDR
It is concluded that Google Scholar presents a broader view of the academic world because it has brought to light a great amount of sources that were not previously visible.
Comparing the topological properties of real and artificially generated scientific manuscripts
TLDR
This paper devises a methodology to distinguish real manuscripts from those generated with SCIgen, an automatic paper generator, and shows, as a proof of principle, that network features can be used to identify scientific gibberish papers.
Your Paper has been Accepted, Rejected, or Whatever: Automatic Generation of Scientific Paper Reviews
TLDR
This work investigates the feasibility of a tool that automatically generates fake reviews for a given scientific paper, and concludes that such a tool could nevertheless play a role in several questionable scenarios and magnify the scale of scholarly fraud.
The Demise of Single-Authored Publications in Computer Science: A Citation Network Analysis
TLDR
The overall decaying trend of single-author publications is qualitatively consistent with those observed in other scientific disciplines, though the diminution is taking place several decades later than in the natural sciences.

References

Showing 1-10 of 38 references
Citation Analysis: A Comparison of Google Scholar, Scopus, and Web of Science
TLDR
A case study compares citations found in Scopus and Google Scholar with those found in Web of Science for items published by two full-time Library and Information Science faculty members, and gives a brief overview of a prototype system called CiteSearch, which analyzes combined data from multiple citation databases to produce citation-based quality evaluation measures.
Testing the Calculation of a Realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster
TLDR
The related features of Google Scholar, Scopus, and Web of Science (WoS) are discussed, and it is demonstrated in the latter how a much more realistic and fair h-index can be computed for F. W. Lancaster than the one produced automatically.
Academic Search Engine Spam and Google Scholar's Resilience Against it
TLDR
The results show that academic search engine spam is indeed possible, and with little effort; whether academic search engine spam could become a serious threat to Web-based academic search engines is also discussed.
The pros and cons of computing the h-index using Google Scholar
  • P. Jacsó
  • Computer Science
  • Online Inf. Rev.
  • 2008
TLDR
The paper shows that effective corroboration of the h‐index and its two component indicators can be done only on persons and journals with which a researcher is intimately familiar, and that corroborative tests must be done in every database for important research.
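For reference, an author (or journal) has h-index h when h of their publications have each been cited at least h times. A minimal sketch of that computation from per-paper citation counts follows; the function name `h_index` and the sample counts are hypothetical. Because the counts themselves differ between Google Scholar, Scopus, and Web of Science, running the same function on each database's counts can give different values, which is why corroboration in every database matters.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; the paper at position `rank`
    # supports h = rank only if it has at least `rank` citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    # Hypothetical per-paper citation counts for one author in two databases.
    print(h_index([10, 8, 5, 4, 3]))      # 4
    print(h_index([12, 9, 7, 6, 6, 2]))   # 5
```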
Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses
TLDR
The content coverage and practical utility of PubMed, Scopus, Web of Science, and Google Scholar are compared; PubMed remains an optimal tool in biomedical electronic research.
Algorithmic Detection of Computer Generated Text
TLDR
The results show that taking the formatting and contextual clues offered by online groups, message boards and social news sites into account may be of central importance when selecting features with which to identify such unwanted postings.
Oracle, where shall I submit my papers?
TLDR
A group of MIT students pulled a prank on WMSCI: using software that generates bogus research papers from a context-free grammar, they submitted two such papers to the conference and, to their surprise, one of the gibberish papers was accepted without any reviews.
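SCIgen produces its text by randomly expanding the rules of a context-free grammar. The sketch below shows the same mechanism on a tiny, hypothetical grammar (`GRAMMAR`, `expand`, and `generate_sentence` are illustrative names, and the rules are invented here, not taken from SCIgen): each nonterminal is replaced by one of its productions, chosen at random, until only terminal words remain.

```python
import random

# Hypothetical toy grammar (not SCIgen's): nonterminals map to lists of
# productions; any symbol that is not a key is a terminal word.
GRAMMAR: dict[str, list[list[str]]] = {
    "SENTENCE": [["We", "VERB", "that", "NOUN_PHRASE", "is", "ADJ", "."]],
    "VERB": [["argue"], ["show"], ["demonstrate"]],
    "NOUN_PHRASE": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "ADJ": [["scalable"], ["Bayesian"], ["ubiquitous"]],
    "NOUN": [["algorithm"], ["methodology"], ["framework"]],
}


def expand(symbol: str, rng: random.Random) -> list[str]:
    """Recursively replace a nonterminal with a randomly chosen production."""
    if symbol not in GRAMMAR:
        return [symbol]
    words: list[str] = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(expand(part, rng))
    return words


def generate_sentence(rng: random.Random) -> str:
    words = expand("SENTENCE", rng)
    # The last terminal is "."; attach it without a preceding space.
    return " ".join(words[:-1]) + words[-1]


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(generate_sentence(rng))
```

Every output is grammatical by construction, yet none of it is grounded in any actual result, which is what made such papers plausible enough to slip past a conference that did not review them.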
Using Compression to Identify Classes of Inauthentic Texts
TLDR
This paper employs universal lossless source coding algorithms to generate features in a high-dimensional space and then applies support vector machines to discriminate between authentic and inauthentic texts, supporting the conjecture that there exists a relationship between meaning and compressibility.
The similarity metric
TLDR
A new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, is proposed; it is shown to be a metric and is called the similarity metric.
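Kolmogorov complexity cannot be computed, so in practice the normalized information distance is approximated by substituting the output length of a real compressor. One widely used form of this approximation is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length in bytes. A minimal sketch follows; the use of Python's `zlib` as the compressor is an assumption, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of zlib's output at the highest compression level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"lorem ipsum dolor sit amet consectetur adipiscing elit " * 20
    print(f"{ncd(a, a):.2f}")  # close to 0: identical inputs
    print(f"{ncd(a, b):.2f}")  # close to 1: unrelated inputs
```

Values near 0 indicate near-identical inputs and values near 1 unrelated ones; because real compressors are imperfect, results can fall slightly outside [0, 1].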