Determining crucial factors for the popularity of scientific articles

Robert Jankowski and Julian Sienkiewicz
Using a set of over 70,000 records from the journal PLOS One, described by 37 lexical, sentiment and bibliographic variables, we perform an analysis backed by machine learning methods to predict the popularity class of scientific papers, defined by the number of times they have been viewed. Our study shows correlations among the features and recovers a threshold for the number of views that gives the best prediction results in terms of the Matthews correlation coefficient. Moreover, by creating a…
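
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how the Matthews correlation coefficient is computed for a binary "popular / not popular" split. The function names, the convention of returning 0.0 when the denominator vanishes, and the toy view counts are illustrative assumptions, not taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0

def mcc_from_labels(actual, predicted):
    """Count confusion-matrix cells for two boolean sequences, then compute MCC."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    return matthews_corrcoef(tp, tn, fp, fn)

# Hypothetical view counts and a hypothetical threshold defining the "popular" class.
views = [120, 4500, 300, 9800]
threshold = 1000
actual = [v >= threshold for v in views]

perfect = mcc_from_labels(actual, actual)
inverted = mcc_from_labels(actual, [not a for a in actual])
```

The coefficient ranges from -1 (every prediction wrong) to 1 (every prediction right), and unlike plain accuracy it stays informative when the two classes are of very different sizes, which is likely to happen as the view threshold moves.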

Impact of lexical and sentiment factors on the popularity of scientific papers
It is found that, in most journals, short titles correlate positively with citations only for the most cited papers, whereas for typical papers, the correlation is usually negative.
The effect of characteristics of title on citation rates of articles
It is suggested that some features of a paper, such as the type of title and the use of keywords that differ from the words in the title, can help to predict citation counts.
The incidence and role of negative citations in science
A methodology to characterize “negative” citations was elaborated, finding that negative citations concerned higher-quality papers, were focused on a study’s findings rather than theories or methods, and originated from scholars who were closer to the authors of the focal paper in terms of discipline and social distance.
The advantage of short paper titles
Investigating whether any of this variance can be explained by a simple metric of one aspect of the paper's presentation, the length of its title, provides evidence that journals which publish papers with shorter titles receive more citations per paper.
A Principal Component Analysis of 39 Scientific Impact Measures
The results indicate that the notion of scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others.
Articles with short titles describing the results are cited more often
Some features of article titles can help predict the number of article views and citation counts and could be used by authors, reviewers, and editors to maximize the impact of articles in the scientific community.
Support Vector Machines
This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.
Categorical and Geographical Separation in Science
We study scientific collaboration at the level of universities. The scope of this study is to answer two fundamental questions: (i) can one indicate a category (i.e., a scientific discipline) that
Random Forests
Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Usage metrics versus altmetrics: confusing terminology?
In what follows it will be argued why a distinction should be made between the two terms ‘usage metrics’ and ‘altmetrics’.