Regression for citation data: An evaluation of different methods

@article{Thelwall2014RegressionFC,
  title={Regression for citation data: An evaluation of different methods},
  author={Mike A Thelwall and Paul Wilson},
  journal={ArXiv},
  year={2014},
  volume={abs/1510.08877}
}
Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have…
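The abstract is cut off before it names a preferred strategy, so the following is only a minimal sketch of one common approach to regression on lognormal-like counts: ordinary least squares on ln(1 + citations). The simulated data, the binary `collaboration` factor, and the helper `fit_log_ols` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_log_ols(citations, factors):
    """Ordinary least squares of ln(1 + citations) on the given factors.

    Returns the coefficient vector [intercept, slope_1, ...].
    """
    # Adding 1 before taking logs keeps uncited (zero-count) papers in the data.
    y = np.log1p(np.asarray(citations, dtype=float))
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simulated (assumed) data: counts from a discretised lognormal whose
# log-scale mean is shifted by a hypothetical binary factor.
rng = np.random.default_rng(0)
n = 5000
collaboration = rng.integers(0, 2, size=n)
latent = 1.0 + 0.5 * collaboration + rng.normal(0.0, 1.0, size=n)
citations = np.floor(np.exp(latent)).astype(int)

coef = fit_log_ols(citations, collaboration)
print("intercept, slope:", coef)
```

The fitted slope estimates the shift in mean ln(1 + citations) associated with the factor. Because of the discretisation and the offset of 1, it will generally not equal the shift used on the latent log scale in the simulation.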
Stopped Sum Models for Citation Data
TLDR
This article assesses stopped sum models for citation data and compares them with two previously used models, the discretised lognormal and negative binomial distributions, using the Akaike Information Criterion.
Citation count distributions for large monodisciplinary journals
  • M. Thelwall
  • Computer Science, Mathematics
  • J. Informetrics
  • 2016
TLDR
Fitting statistical distributions to 50 large subject-specific journals, in the belief that individual journals can be purer than subject categories and may therefore give clearer findings, suggests that the discretised lognormal is the more appropriate distribution for modelling pure citation data.
Stopped sum models and proposed variants for citation data
TLDR
Based upon data from 20 Scopus categories, some of the stopped sum variant models had lower AIC values than the discretised lognormal models, which were otherwise the best (with respect to AIC).
The h index for research assessment: Simple and popular, but shown by mathematical analysis to be inconsistent and misleading
TLDR
In synthetic series, the number of citations and the mean number of citations are much better indicators of research performance than h and h/N, and it is discussed that this conclusion can be extended to real citation series.
Are the discretised lognormal and hooked power law distributions plausible for citation data?
  • M. Thelwall
  • Mathematics, Computer Science
  • J. Informetrics
  • 2016
TLDR
This article investigates the plausibility of the discretised lognormal and hooked power law distributions for modelling the full range of citation counts, with an offset of 1.0, and finds that both distributions fail a Kolmogorov–Smirnov goodness of fit test.
More precise methods for national research citation impact comparisons
TLDR
Two new methods to identify national differences in average citation impact are introduced, one based on linear modelling for normalised data and the other using the geometric mean, which has the advantage of distinguishing between national contributions to internationally collaborative articles.
The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression
  • M. Thelwall
  • Mathematics, Computer Science
  • J. Informetrics
  • 2016
TLDR
Comparisons of the discretised lognormal and the hooked power law with citation data are reported, adding 1 to citation counts in order to include zeros.
Double rank analysis for research assessment
TLDR
The double rank analysis is developed, in which publications that have a low number of citations are also included, in order to achieve the same purpose without restrictions.
Does quality and content matter for citedness? A comparison with para-textual factors and over time
TLDR
It is found that the JIF has a larger influence on the citation impact of a publication than the quality (measured by judgments of peers).
The application of citation count regression to identify important papers in the literature on non-audit fees
Purpose: This paper aims to show that when conducting a literature review, important papers can be identified by regressing citation counts on prior publications’…

References

SHOWING 1-10 OF 81 REFERENCES
Universality of performance indicators based on citation and reference counts
TLDR
This work demonstrates that comparisons can be made between publications from different disciplines and publication dates, regardless of their citation count and without expensive access to the whole world-wide citation graph.
Multiple regression analysis of a patent’s citation frequency and quantitative characteristics: the case of Japanese patents
TLDR
The multiple regression analyses demonstrate that the number of classifications of cited patents contributes more to the regression than do other factors, which implies that, if confounding between factors is taken into account, it is the diversity of classifications assigned to backward citations that more largely influences the number of forward citations.
Distributions for cited articles from individual subjects and years
TLDR
The results show that the power law is not a suitable model for collections of articles from a single subject and year, even for the purpose of estimating the slope of the tail of the citation data, and only the hooked power law and discrete lognormal distributions should be considered for subject-and-year-based citation analysis in future.
Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal
TLDR
A large-scale empirical analysis of journals from every field in Thomson Reuters' Web of Science database suggests that the discrete lognormal distribution is a globally accurate model for the distribution of “eventual impact” of scientific papers published in a single-discipline journal in a single year that is removed sufficiently from the present date.
International coauthorship and citation impact: A bibliometric study of six LIS journals, 1980-2008
TLDR
This researcher proposes geographies of invisible colleagues and a geographic scope effect to further investigate the relationships between author geographic affiliation and citation impact.
How to calculate the practical significance of citation impact differences? An empirical example from evaluative institutional bibliometrics using adjusted predictions and marginal effects
TLDR
This study will explain what adjusted predictions and marginal effects are and how useful they are for institutional evaluative bibliometrics, and focus particularly on Average Adjusted Predictions (AAPs), Average Marginal Effects (AMEs), Adjusted Predictions at Representative Values (APRVs) and Marginal Effects at Representative Values (MERVs).
On determinants of citation scores: a case study in chemical engineering
TLDR
Using multiple regression analysis, it is found that the factor ‘top‐author,’ i.e., the ‘personal variation’ contributes the largest number of citations.
Understanding journal usage: A statistical analysis of citation and use
TLDR
The regression results indicated that print journal use was a significant predictor of local journal citations prior to the adoption of online journals, and that publisher-provided and locally recorded online journal use measures were also significant predictors of local citations.
Modeling nonuniversal citation distributions: the role of scientific journals
TLDR
A model for citation networks via an intrinsic nodal weight function and an intuitive ageing mechanism is developed that addresses the intrinsic heterogeneity of a paper determined by the impact factor of the journal publishing it.
How well developed are altmetrics? A cross-disciplinary analysis of the presence of ‘alternative metrics’ in scientific publications
TLDR
The main result of the study is that the altmetrics source that provides the most metrics is Mendeley, with metrics on readerships for 62.6% of all the publications studied; other sources only provide marginal information.