On the citation lifecycle of papers with delayed recognition

Abstract

Delayed recognition is a concept applied to articles that receive very few to no citations for a certain period of time following publication, before becoming actively cited. To determine whether such a time spent in relative obscurity had an effect on subsequent citation patterns, we selected articles that received no citations before the passage of ten full years since publication, investigated the subsequent yearly citations received over a period of 37 years and compared them with the citations received by a group of papers without such a latency period. Our study finds that papers with delayed recognition do not exhibit the typical early peak followed by a slow decline in citations, but that the vast majority enter decline immediately after their first – and often only – citation. Middling papers’ citations remain stable over their lifetime, whereas the more highly cited papers, some of which fall into the “sleeping beauty” subtype, show non-stop growth in citations received. Finally, papers published in different disciplines exhibit similar behavior and do not differ significantly.

Introduction

Scientific papers are as all things – not all are equal. If a majority of them are noticed by the scientific community (Wallace, Larivière and Gingras, 2009) and integrated into their discipline's body of scientific knowledge soon after publication, of which being cited in other papers is generally considered an indication, there are those that remain in limbo for a more or less lengthy period of time before being cited – papers experiencing delayed recognition. These are papers that receive no or very few citations in the years following their publication, and only later start being cited. Among these are found the so-called “sleeping beauties”, papers that, once “awakened” (usually by the first citing paper, known then as their “prince” (van Raan, 2004)), accumulate a considerable number of citations.
Typologies have been proposed to give an overview of the different behaviors of these awakened beauties (Braun, Glänzel and Schubert, 2010), as well as models for the different phases of their citation curves (Li, 2014). The reasons for such differences are likely manifold, and may vary across disciplines. Cole (1970) found that the content of a paper had more importance than the fame of its author in determining the amount of time required before pickup by the scientific community. Ohba and Nakao (2012) found that sleeping beauties in ophthalmology tended to be papers describing new diseases or new treatments, suggesting that these topics, which can be expected to be further probed or tested before being fully integrated into the discipline’s standard corpus, were the cause of the slow growth of citations, whereas Costas, van Leeuwen and van Raan (2013) found a correlation between delayed recognition and publication in journals with lower impact factors. Heinze et al. (2012) studied more technical questions of scientific growth, considering the cases of Buckminsterfullerene and scanning tunneling microscopy, showing that easy and reliable access to new matter and new instrumentation could also have an effect on the citation of papers on such topics. Van Dalen and Henkens (2005) investigated tell-tale signals of later citation activity in the field of demography. One of these was whether the state of uncitedness negatively impacted a paper’s potential citation at a future time (“negative duration dependence”), and they concluded that it was far from the “death sentence” common wisdom considered it to be. A study by Li et al. (2014), however, found that the length of sleep (i.e., uncitedness or very low citation activity) did correlate with a lower probability of later awakening.
Similarly, in a larger study on delayed recognition focusing only on highly cited papers, Glänzel, Schlemmer and Thijs (2003) found that delayed reception did not simply “shift” the citation process in time, and that belated citation activity came with higher risk of uncitedness. Levitt and Thelwall (2008) studied late citation to determine indicators that might predict the presence of (future) highly cited papers.

But what of the late bloomers that do not enjoy this – occasionally startling – success? Little seems to have been written specifically on the fate of the poorer cousins in the delayed recognition family. If the sleeping beauties and other Snow Whites are the princesses of this world, what of the shepherdesses, seamstresses and other common folk? In other words, how does late recognition affect the lifecycle of scientific papers? Is there a shift in time, a simple translation of the typical left-skewed distribution peaking 2 to 3 years after publication followed by a slow decrease, or do papers with late recognition exhibit a different citation curve once they become cited? To what degree does the ultimate success of an article – in terms of its lifetime citations – affect the accumulation of citations, if it does at all? Does the behavior of sleeping beauties and “common” delayed-recognition papers mirror that of classics and “normal” papers? Finally, it is also known that citation practices vary between disciplines (Finardi, 2013; Larivière et al., 2006), so what of this disciplinary effect? Do the sleepers behave the same in medicine, in physics or in the social sciences?

Methods

Sleepers

The study was performed with citation data from the Web of Science, including the Science Citation Index Expanded, the Social Science Citation Index and the Arts and Humanities Citation Index. The data was acquired in November 2013.
The sleepers group was populated with papers published between 1963 and 1975 (inclusive) that received their first citation(s) only once 10 full years (or more) had elapsed since their publication (i.e., first cited in 1974 or later for papers published in 1963, in 1975 or later for those published in 1964, etc.). The number of yearly citations received for the period starting with [year of publication + 11] up until 2013 was gathered for each paper, as well as the discipline assigned by the SCIE. The citation data for 2013 was eliminated, as the year was not yet over at the time of the study.

We used a relative time frame for the study, meaning that instead of the calendar years themselves, we used the amount of time elapsed since publication, expressed as t + x, where t is the year of publication and x an integer representing a number of (complete) years elapsed. As we were focusing on the effect over time, this method allows for ignoring the effect of individual years or of one-off events that may have affected the production of scientific papers, which is beyond the scope of our inquiry. For example, this means that the year 2012 corresponds to t + 37 for the papers published in 1975, and to t + 49 for those of 1963, etc. The study was limited to the upper boundary of t + 37, as it was the last year for which a complete citation window was available for all publication years. Starting with t + 38, when the data of 1975-published papers ceases (as 1975 + 38 = 2013, the incomplete year removed from the dataset), each increment loses a publication year, making the calculations increasingly less meaningful and less comparable as the populations dwindle. As we felt that the citation window available was sufficient to afford the desired overview, the t + 37 upper limit was deemed acceptable.

Reference Group

The reference group is also drawn from WoS data.
It comprises papers published between 1963 and 1975 (inclusive), but with no restriction concerning the date of the first citation(s) – except that the sleepers described above were removed. The citation window for the reference group was structured the same way as for the sleepers, i.e., using relative time (t + x), but the window itself goes from t + 0 (publication year) to t + 37, as the reference group papers are allowed to receive citations immediately upon publication. Citations received before publication were ignored to simplify data treatment; as they were very few in number, their effect was assumed to be negligible.

Presentation of the Results

We observed the evolution of the citations received by the sleepers over time and compared it to that of the reference group along three dimensions: time elapsed since publication; number of lifetime citations; and discipline. Disciplinary clusters were made to see if the natural, social and medical sciences behaved differently. The clusters are medicine (made up of papers from the disciplines of: biomedical research, clinical medicine, health, psychology), science (papers from: biology, chemistry, earth & space, mathematics, physics), arts/humanities/social sciences (papers from: arts, humanities, social sciences) and applied disciplines (papers from: engineering and technology, professional fields). The few papers of “unknown” discipline were ignored for this.

Data Overview

Table 1 presents the distribution of the papers, grouped by total number of citations received. It shows that most sleepers obtain very low numbers of citations, with more than 92% of them receiving between 1 and 5 citations. For the reference group, only slightly more than a third of the papers receive between 1 and 5 citations. On the other hand, while less than 0.1% of sleepers obtain 51 or more citations, this number of citations is obtained by more than 11% of papers of the reference group.
On the whole, we observe that sleepers are, in general, papers with very low citation rates. Appendices 1 and 2 provide the distribution of papers and of lifetime citations by discipline (Appendix 1) and for disciplinary clusters (Appendix 2). These tables show that, globally, disciplines with lower citation density have a higher proportion of sleepers.

Table 1. Distribution of papers by total citations

                      Sleepers               Reference group
Lifetime citations    Papers     %           Papers       %
1 to 5                117,862    92.60%      848,891      34.98%
6 to 10               6,754      5.31%       418,883      17.26%
11 to 15              1,557      1.22%       259,039      10.67%
16 to 20              509        0.40%       178,053      7.34%
21 to 50              520        0.41%       447,767      18.45%
51+                   82         0.06%       274,031      11.29%
Total                 127,284    100.00%     2,426,664    100.00%

Figure 1 presents the distribution of lifetime citations of sleepers and of papers in the reference group. In both cases, papers with few citations are clearly in the majority, but the progression is less direct for the reference group. The total number of papers in both groups is quite different (127,284 vs. 2,426,664), as is the total number of citations, which is, of course, much larger in the reference group (299,420 vs. 57,450,854).

Figure 1. Log scale distribution of papers vs. lifetime citations received. Reference group (diamonds) and sleepers (squares)

Results

Effects over time

Figure 2 below shows the yearly percentage of lifetime citations accumulated following publication year. Each line represents a specific year of publication, save for the dotted one, which is the global average. In both cases, the average is an excellent simplified representation: the standard deviation is very small (varying, in percentage points, between 0.12 and 0.54 for the sleepers, 0.04 and 0.25 for the reference group). It shows that both groups behave very similarly. The reference group, being much larger, produces more homogeneous results, but the trends are quite clear among the sleepers.
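
The relative time frame from the Methods and the yearly percentage plotted in Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`relative_year`, `is_sleeper`, `cohort_shares`) and the input format, a list of (publication year, citing year) pairs, are assumptions made for illustration. So is the choice to use only citations inside the t + 0 to t + 37 window as the denominator.

```python
from collections import defaultdict

LATENCY_YEARS = 10  # full uncited years required of a sleeper (from the text)
MAX_OFFSET = 37     # last relative year with a complete window (from the text)


def relative_year(pub_year, cite_year):
    """Return x in t + x, the number of years elapsed since publication year t."""
    return cite_year - pub_year


def is_sleeper(pub_year, cite_years):
    """True if the paper is cited at least once and never before t + 11."""
    offsets = [relative_year(pub_year, year) for year in cite_years]
    return bool(offsets) and min(offsets) > LATENCY_YEARS


def cohort_shares(citations):
    """Compute, per publication year, the percentage of citations received at each t + x.

    `citations` is a hypothetical input: an iterable of (pub_year, cite_year)
    pairs, one pair per citation received. Citations outside the t + 0 to
    t + 37 window are dropped (assumed denominator: in-window citations only).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for pub_year, cite_year in citations:
        x = relative_year(pub_year, cite_year)
        if 0 <= x <= MAX_OFFSET:
            counts[pub_year][x] += 1
    shares = {}
    for pub_year, by_offset in counts.items():
        total = sum(by_offset.values())
        shares[pub_year] = {
            x: 100.0 * n / total for x, n in sorted(by_offset.items())
        }
    return shares
```

With these definitions, `relative_year(1975, 2012)` gives 37 and `relative_year(1963, 2012)` gives 49, matching the example given in the Methods. A 2013 citation to a 1975 paper falls at t + 38 and is therefore dropped by the window.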
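
The citation classes and percentages of Table 1 can be reproduced from the paper counts with a short Python sketch. The counts are taken from Table 1. The function names (`citation_class`, `percentages`) and the use of `bisect` for binning are illustrative assumptions, not a description of how the table was actually produced.

```python
from bisect import bisect_left

# Upper bounds of the closed citation classes in Table 1; the last class is open-ended.
UPPER_BOUNDS = [5, 10, 15, 20, 50]
LABELS = ["1 to 5", "6 to 10", "11 to 15", "16 to 20", "21 to 50", "51+"]

# Paper counts per class, copied from Table 1.
SLEEPERS = [117_862, 6_754, 1_557, 509, 520, 82]
REFERENCE = [848_891, 418_883, 259_039, 178_053, 447_767, 274_031]


def citation_class(lifetime_citations):
    """Return the Table 1 class label for a paper with at least one citation."""
    if lifetime_citations < 1:
        raise ValueError("Table 1 starts at one lifetime citation")
    return LABELS[bisect_left(UPPER_BOUNDS, lifetime_citations)]


def percentages(counts):
    """Return each class count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts)
    return [round(100.0 * count / total, 2) for count in counts]


if __name__ == "__main__":
    for label, s, r in zip(LABELS, percentages(SLEEPERS), percentages(REFERENCE)):
        print(f"{label:<10} sleepers {s:6.2f}%   reference {r:6.2f}%")
```

The class counts sum to the totals reported in the table (127,284 sleepers and 2,426,664 reference papers), and a paper with exactly 51 lifetime citations falls in the open-ended "51+" class.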

DOI: 10.1016/j.joi.2014.08.002

Cite this paper

@article{Lachance2014OnTC, title={On the citation lifecycle of papers with delayed recognition}, author={Christian Lachance and Vincent Larivi{\`e}re}, journal={J. Informetrics}, year={2014}, volume={8}, pages={863-872} }