RESEARCH ARTICLE

Heavy-tailed distribution of the number of
papers within scientific journals

Robin Delabays1 and Melvyn Tyloo2

1Center for Control, Dynamical Systems and Computation, UC Santa Barbara, Santa Barbara, CA 93106 USA
2Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA

Keywords: cumulative advantage, heavy-tail, preferential attachment, publications, scholarly
journals

ABSTRACT

Scholarly publications represent at least two benefits for the study of the scientific community
as a social group. First, they attest to some form of relation between scientists (collaborations,
mentoring, heritage, …), useful to determine and analyze social subgroups. Second, most of
them are recorded in large databases, easily accessible and including a lot of pertinent
information, easing the quantitative and qualitative study of the scientific community.
Understanding the underlying dynamics driving the creation of knowledge in general, and of
scientific publication in particular, can contribute to maintaining a high level of research, by
identifying good and bad practices in science. In this article, we aim to advance this
understanding by a statistical analysis of publication within peer-reviewed journals. Namely,
we show that the distribution of the number of papers published by an author in a given
journal is heavy-tailed, but has a lighter tail than a power law. Interestingly, we demonstrate
(both analytically and numerically) that such distributions match the result of a modified
preferential attachment process, where, on top of a Barabási-Albert process, we take the finite
career span of scientists into account.


1. INTRODUCTION

One of the core mechanisms in the practice of science is the self-examination of a field of
research. The validation of a scientific result is always collective, in the sense that it has been
scrutinized, criticized, and (hopefully) validated by a sufficient number of peers. Furthermore,
any scientific result is permanently subject to new evaluation and might be replaced by more
accurate work. At the level of a community, scientists are thus accustomed to criticizing the work
of colleagues and to having their own work criticized in return. It is then not surprising that some
scientists started to study (and thus somehow critically assess) the scientific community itself
(Price, 1963).

The quantitative study of the scientific community, sometimes referred to as Science of Sci-
ence (Fortunato, Bergstrom et al., 2018; Narin, 1976; Price, 1976; van Raan, 2019), is a key step
to unravel the underlying behaviors of its composing agents (authors, journals, institutions, etc.).
Pioneered by the early works of Lotka (1926), the science of science gained a lot of momentum
in the second half of the 20th century, with the creation of the first databases of scientific pub-
lications (Garfield, 1955; Merton, 1968; Price, 1965). More recently, the scientometric inves-
tigations have been significantly eased by the emergence of large online databases of scientific
publications (Web of Science, PubMed, arXiv, …) and the ever-increasing computation power

An open access journal

Citation: Delabays, R., & Tyloo, M. (2022). Heavy-tailed distribution of the number of papers
within scientific journals. Quantitative Science Studies, 3(3), 776–792.
https://doi.org/10.1162/qss_a_00201

DOI: https://doi.org/10.1162/qss_a_00201

Peer Review: https://publons.com/publon/10.1162/qss_a_00201

Received: 18 February 2022
Accepted: 21 June 2022

Corresponding Author:
Robin Delabays
robindelabays@ucsb.edu

Handling Editor:
Ludo Waltman

Copyright: © 2022 Robin Delabays and
Melvyn Tyloo. Published under a
Creative Commons Attribution 4.0
International (CC BY 4.0) license.

The MIT Press

Distribution of the number of papers within scientific journals

of modern computers. These improvements have allowed the analysis of scientometric indicators
on a larger scale (Frandsen & Nicolaisen, 2017; Wang & Waltman, 2016) and with finer
resolution in terms of publication units (considering single articles instead of whole journals;
e.g., Waltman & van Eck, 2012) and time (Newman, 2001; Egghe & Rousseau, 2000). For a clear
historical overview of scientometrics, we refer to van Raan (2019).

The science of science has the potential to help maintain the quality of research, and is
thus a good use of public funding. There is nowadays an increasing number of scientific
papers (Bornmann & Mutz, 2015; Price, 1965), combined with the ubiquitous presence of
predatory journals, which publish the papers they receive, charging publication fees, but
without performing the fundamental editorial work that guarantees the papers’ quality (e.g., quality
and pertinence check, referee process; Bohannon, 2013; Sorokowski, Kulczycki et al., 2017).
In such a context, distinguishing bad practices from honest work in scientific publishing
becomes more and more challenging. Understanding the underlying dynamics of scientific
publication will be instrumental in this endeavor.

The fight against predatory publishing has benefited from the effort of many dedicated cit-
izens, whose initiatives have shown their efficacy (Butler, 2013; Grudniewicz, Moher et al.,
2019), as well as their limits (Beall, 2017). With regard to the proliferation of predatory jour-
nals, the task of identifying all of them unequivocally is overwhelming. In such a context, the
ability to perform a preliminary data-based sanity check of a given journal would allow
resources to be focused on the more problematic venues. However, such an approach requires
an accurate understanding of the quantitative and qualitative characteristics of scientific jour-
nals, which is still scarce.

The quality of a scientist’s work is commonly quantified by two different, but related,
measures, namely, their number of papers and the number of citations thereof (summarized in the
h-index [Hirsch, 2005; Siudem, Żogała-Siudem et al., 2020]). The vast majority of investigations
about the scientific publication process are focused on the citation side. These analyses
mostly aim to describe how the citation network impacts the number of citations a given paper
is (and therefore its authors are) likely to receive. In particular, evidence suggests that citations
follow a cumulative advantage or preferential attachment process, where the more citations a
scientist has, the more likely they are to get new citations (Price, 1976). This process leads to a
power law (PL) distribution of citations (Eom & Fortunato, 2011; Waltman, van Eck, & van
Raan, 2012) or other heavy-tailed distributions (Thelwall, 2016). Indeed, preferential attach-
ment has been proven to lead to heavy-tailed distributions (Krapivsky, Redner, & Leyvraz,
2000), with some refinements to account for the lifetime of a paper (Parolo, Pan et al., 2015).

As early as 1926, Lotka showed that, in the field of chemistry, the number of scientists having
published N papers is proportional to $N^{-2}$ (Lotka, 1926). In other words, he showed that
the distribution of the number of papers published by scientists follows a PL. Later on, the same
analysis was extended to other fields of science (e.g., Barrios, Borrego et al., 2008; Gupta &
Karisiddappa, 1996; Huber & Wagner-Döbler, 2001a, 2001b; Newby, Greenberg, & Jones,
2003; Pal, 2015; Sutter & Kocher, 2001; Wagner-Döbler & Berg, 1999) and refined to more
elaborate distributions, such as the power law with cutoff (PLwC) (Kretschmer & Rousseau,
2001; Saam & Reiter, 1999; Smolinsky, 2017) or the stretched exponential distribution
(Laherrère & Sornette, 1998). Despite this early start, the number of papers published by a
scientist has been less investigated than the number of citations that a paper or a scientist gets.

With the objective of refining these past analyses, in this article we focus on the distribution
of the number of papers published by scientists within a given peer-reviewed journal. The dis-
tribution of the number of papers is both easily accessible (through any scientific publication

Quantitative Science Studies


Figure 1. Left and center: Histograms of the number of papers n published in Physical Review Letters (PRL) and Physical Review D (PRD)
among the authors who published in these journals. For each value of n, the height of the bar gives the proportion of authors who published n
articles in the corresponding journal. Best distribution fits (see Section 2.1) are displayed for an exponential distribution (gray dotted), a power
law (dashed black), a power law with cutoff (dash-dotted black), and a Yule-Simon distribution (dotted black). The arrows indicate significant
peaks in the number of authors, corresponding to the ATLAS and CMS experiments at CERN. Right: Two-dimensional, color-coded
histogram of the number of authors with respect to the number of papers published in PRL (horizontal axis) and PRD (vertical axis).

database) and informative. Indeed, various characteristics of the publication dynamics within a
journal can be extracted from the aforementioned distribution. We illustrate this claim in the
striking examples of Physical Review Letters and Physical Review D, shown in Figure 1, where
the analysis of the distribution emphasizes an underlying preferential attachment dynamics;
the finiteness of scientific careers; and the presence of (very) large groups of scientists in the
related fields of physics (see the caption of Figure 1 for a detailed discussion).

As interestingly pointed out by Sekara, Deville et al. (2018), publishing in a peer-reviewed
journal (especially in high-impact ones) is more likely if one author of the manuscript has
already published in the same journal. Such a process can be interpreted as preferential attach-
ment, and an expected outcome of such an observation is a high representation of a few
authors in a given journal (Krapivsky et al., 2000). Furthermore, a scientist whose field of
research is well aligned with a journal topic is likely to publish a large proportion of their work
in this journal, leading again to high representation of a few specialized authors in a given
journal.

The heavy-tailedness of the distribution of the number of papers is striking in the histograms
(see Figures 1 and 2). Indeed, the tail of the histogram is stronger than the best exponential fit
to the data (gray dotted line). However, as we show below, the famous PL is not a good fit to
the data either, and the actual distribution lies somewhere between an exponential and a PL.
In addition to our analysis of the distribution, we propose an adaptation of the preferential
attachment law that models the evolution of the number of papers of a set of authors within
a journal.

2. EMPIRICAL AND FITTED DISTRIBUTIONS

We consider an arbitrary selection of 14 peer-reviewed journals (Table 1), whose data are
available in the Web of Science database (WoS, www.webofscience.com). The selected journals
vary in age (from a few decades to more than a century) but are not too young, so that
sufficiently many papers are available, and all of them are still being published.
Although the choice of journals is arbitrary and limited, we tried to cover a diversity of
disciplines of the natural sciences and various time spans. The limited sample of journals does not
allow us to claim any universality in our results, but we argue that it demonstrates the
pertinence of our approach in the quantitative analysis of the scientific publication process.


Figure 2. Histograms of the number of papers n published in the six journals indicated in the insets, among the authors who published in
these journals (see Table 1 for legends). As in Figure 1, for each value of n, the height of the bar gives the proportion of authors who published n
articles in the corresponding journal. The gray dotted line is an exponential fit of the data, emphasizing that the distribution is heavy-tailed. We
also show the best fit (MLE), discussed in Section 2.1, for a power law distribution (dashed black), power law with cutoff (dash-dotted black),
and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximum number of papers if the distribution was
the fitted power law (see Section 4). The same plots for the other journals are available in Figure 1 and in Figure A.1.

Table 1. Labels, names, and number of authors in the journals considered. In parentheses is given the reduction year (discussed in Section 4)
and the number of authors up to this year. One (resp. two) asterisk(s) indicate the journals where authors with one (resp. two) paper(s) are
discarded.

Label   Journal name (reduction year)                                          # authors (reduction)
NAT     Nature* (1950)                                                         63,791 (3,374)
PNA     Proceedings of the National Academy of Sciences of the USA** (1950)    55,849 (2,495)
SCI     Science* (1940)                                                        48,928 (4,788)
LAN     The Lancet* (1910)                                                     33,416 (3,015)
NEM     New England Journal of Medicine* (1950)                                27,078 (3,842)
PLC     Plant Cell (2000)                                                      20,649 (4,712)
ACS     Journal of the American Chemical Society* (1930)                       82,223 (5,301)
TAC     IEEE Transactions on Automatic Control (2000)                          8,911 (3,603)
ENE     Energy (2005)                                                          28,920 (4,491)
CHA     Chaos                                                                  7,409
SIA     SIAM Journal on Applied Mathematics                                    6,106
AMA     Annals of Mathematics                                                  3,679
PRD     Physical Review D                                                      64,922
PRL     Physical Review Letters*                                               90,993


We denote by $\mathcal{J}$ = {NAT, PNA, …, PRL} the set of journals considered (see Table 1 for the list
of labels). Within each journal $J \in \mathcal{J}$, we index authors by an integer $i = 1, \dots, A_J^{\rm tot}$,
$A_J^{\rm tot}$ being the number of authors who published in journal $J$. Then, for each author
$i = 1, \dots, A_J^{\rm tot}$, we count the number $n_i^J$ of papers published by author $i$ in journal $J$ up to
year 2017 in the whole WoS database (meaning from year 1900 or the year of the journal’s creation,
whichever is the later). This process yields the set of data $D_J = \{ n_i^J : i = 1, \dots, A_J^{\rm tot} \}$,
which is a set of $A_J^{\rm tot}$ integer numbers. We restrict our investigation to papers labeled as
“Article” in the WoS database, to focus on peer-reviewed papers.

From the data set $D_J$ we can compute the number and proportion of authors who published
$n$ papers,

$$A_J(n) = \#\{ i : n_i^J = n \}, \qquad a_J(n) = A_J(n) / A_J^{\rm tot}, \tag{1}$$

and by definition, $\sum_n a_J(n) = 1$. The proportion $a_J$ is represented on logarithmic scales in
Figures 1, 2, and A.1, each panel corresponding to a different journal.
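
As an illustration of Eq. 1, the following minimal sketch computes the number and proportion of authors with n papers from a list of per-author paper counts. The function name and the toy data are hypothetical and are not taken from the WoS data set used in this article.

```python
from collections import Counter


def empirical_distribution(paper_counts):
    """Return (A, a): A[n] is the number of authors with n papers, a[n] the proportion.

    `paper_counts` plays the role of the data set D_J: one integer per author.
    """
    total_authors = len(paper_counts)              # A_J^tot
    A = dict(sorted(Counter(paper_counts).items()))
    a = {n: count / total_authors for n, count in A.items()}
    return A, a


# Hypothetical toy data set: ten authors and their number of papers in one journal.
toy_counts = [1, 1, 1, 2, 2, 3, 5, 1, 1, 8]
A, a = empirical_distribution(toy_counts)
print(A)
print(a)
```

Plotting `a` against `n` on logarithmic axes would give a histogram of the kind shown in the figures.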

Remark. Note that we did not take into account the fact that the different papers are co-
signed by multiple authors. Consequently, different papers have different “weights” in the data
set. We are mostly interested in the number of papers from the point of view of the authors; it is
then adequate to count, for each author, the number of papers they signed, independently of
the number of coauthors. Refining the analysis and taking into account the number of coau-
thors on each paper would be the purpose of future work.

Note also that we do not take into account papers published anonymously, which represent
a large number of papers, in medical journals in particular.

Finally, for some journals, the number of authors is too large to be downloaded from the
WoS database. As a consequence, authors who have published only one or two papers in
these journals have to be removed from the data (e.g., NAT, PNA, or SCI, indicated by asterisks
in Table 1).

2.1. Distribution Fitting

Because of the apparent heavy-tailedness of the distribution, it is tempting to fit a PL. However,
as pointed out by Clauset, Shalizi, and Newman (2009), such fitting should be done with care
in order to avoid spurious conclusions (Broido & Clauset, 2019). We therefore fit three heavy-
tailed distributions and assess the goodness-of-fit of our fitting following Clauset et al. (2009),
which is encoded in a p-value. Numerical results are summarized in Table 2.

For each empirical distribution of the number of papers published by an author i in journal
J, we fit an exponential distribution (gray dotted lines in Figures 1 and 2) to emphasize their
heavy-tailed behavior. The three heavy-tailed distributions that we fit are


• A PL distribution (black dashed lines in the figures),

$$P_{\rm pl}(n_i^J = n; \alpha) = C_\alpha n^{-\alpha}, \tag{2}$$

with $\alpha > 1$ and $C_\alpha \in \mathbb{R}$ normalizing the distribution;

• A PLwC (black dash-dotted lines in the figures),

$$P_{\rm plc}(n_i^J = n; \beta, \gamma) = C_{\beta,\gamma} n^{-\beta} e^{-\gamma n}, \tag{3}$$

with $\beta > 1$, $\gamma > 0$, and normalizing constant $C_{\beta,\gamma} \in \mathbb{R}$; and

• A Yule-Simon distribution (black dotted lines in the figures),

$$P_{\rm ys}(n_i^J = n; \rho) = C_\rho (\rho - 1) B(n, \rho), \tag{4}$$

with $\rho > 0$, where $C_\rho \in \mathbb{R}$ is the normalizing constant and $B(x, y)$ is the Euler beta
function.

Table 2. Fitted parameters and p-value of the goodness-of-fit for power law (PL), power law with cutoff (PLwC), and Yule-Simon (Y-S)
distributions. No set of data is well fitted by a PL distribution. However, the PLwC seems to be a good fit for three journals (SCI, PLC, CHA),
and the Yule-Simon distribution seems to correctly fit the distribution of NEM and SIA. For the other journals, none of the distributions seem to
fit the data appropriately.

Journals (rows): NAT, PNA, SCI, LAN, NEM, PLC, ACS, TAC, ENE, CHA, SIA, AMA, PRD, PRL
PL, α: 2.58, 2.53, 2.68, 2.47, 2.76, 2.30, 2.11, 2.08, 2.36, 2.47, 2.49, 2.26, 1.49, 1.73
PL, p (%): 0.0, 0.0
PLwC, β: 2.11, 2.30, 2.09, 2.36, 1.92, 1.95, 1.84, 2.12, 2.28, 2.20, 1.72, 1.24, 1.52
PLwC, γ: 0.07, 0.02, 0.06, 0.05, 0.07, 0.10, 0.01, 0.04, 0.06, 0.05, 0.08, 0.14, 0.005
PLwC, p (%): 0.0, 0.0, 16.64, 0.18, 0.2, 13.42, 0.0, 0.12, 80.84, 2.24, 0.18, 0.02, 0.12
Y-S, ρ: 3.10, 2.83, 3.28, 2.90, 3.43, 3.01, 2.32, 2.51, 3.15, 3.43, 3.49, 2.95, 1.55, 1.80
Y-S, p (%): 0.0, 0.0, 0.02, 0.0, 8.82, 0.92, 0.0, 0.02, 0.0, 9.06, 0.0
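
The three candidate distributions of Eqs. 2 to 4 can be evaluated numerically as in the sketch below. It is only an illustration under stated assumptions: the support is truncated to {1, …, n_max} so that the normalizing constants reduce to finite sums, and the function names and parameter values are hypothetical rather than taken from the authors' code.

```python
import math


def beta_function(x, y):
    """Euler beta function B(x, y), computed through log-gamma for stability."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def normalize(weights):
    """Divide by the sum, which plays the role of the normalizing constant C."""
    total = sum(weights)
    return [w / total for w in weights]


def power_law_pmf(alpha, n_max):
    """Eq. 2 on the truncated support {1, ..., n_max} (truncation is an assumption)."""
    return normalize([n ** -alpha for n in range(1, n_max + 1)])


def power_law_cutoff_pmf(beta_exp, gamma, n_max):
    """Eq. 3 on the truncated support."""
    return normalize([n ** -beta_exp * math.exp(-gamma * n) for n in range(1, n_max + 1)])


def yule_simon_pmf(rho, n_max):
    """Eq. 4 on the truncated support."""
    return normalize([(rho - 1) * beta_function(n, rho) for n in range(1, n_max + 1)])


# Hypothetical parameter values, chosen only to compare the tails.
pl = power_law_pmf(2.5, 1000)
plc = power_law_cutoff_pmf(2.5, 0.05, 1000)
ys = yule_simon_pmf(3.0, 1000)
print(pl[99] / pl[0], plc[99] / plc[0], ys[99] / ys[0])
```

With equal exponents, the exponential factor makes the tail of the PLwC lighter than that of the PL, which is the qualitative behavior discussed in the text.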

We perform the distribution fitting by optimizing the parameters α, β, γ, and ρ with a Max-
imum Likelihood Estimator (Clauset et al., 2009). The curves of the fitted distributions are plot-
ted in Figures 1, 2, and A.1, and the fitted parameters are given in Table 2. Other distributions
(such as log-normal, Lévy, Weibull) were tested and discarded because they were far from
matching the data.
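
A minimal sketch of maximum likelihood fitting for the PL of Eq. 2 is given below. It assumes a support truncated to {1, …, N_MAX} and uses a golden-section search, which is adequate here because the log-likelihood of a power law is concave in α. The search method, the function names, and the synthetic sample are illustrative assumptions, not the procedure actually used for Table 2.

```python
import math
import random

N_MAX = 2000  # assumed truncation of the support {1, ..., N_MAX}


def fit_power_law(data, lo=1.01, hi=6.0, tol=1e-4, n_max=N_MAX):
    """Maximum likelihood estimate of alpha for P(n) = C * n**(-alpha), 1 <= n <= n_max."""
    sum_log = sum(math.log(n) for n in data)
    size = len(data)

    def log_likelihood(alpha):
        z = sum(n ** -alpha for n in range(1, n_max + 1))  # 1 / C_alpha
        return -alpha * sum_log - size * math.log(z)

    # Golden-section search for the maximum of a concave function on [lo, hi].
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = log_likelihood(c), log_likelihood(d)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_likelihood(c)
        else:  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_likelihood(d)
    return (a + b) / 2


# Synthetic check: draw a sample with a known exponent and recover it.
rng = random.Random(0)
weights = [n ** -2.5 for n in range(1, N_MAX + 1)]
sample = rng.choices(range(1, N_MAX + 1), weights=weights, k=5000)
alpha_hat = fit_power_law(sample)
print(alpha_hat)
```

The same pattern extends to the PLwC and Yule-Simon distributions by replacing the log-likelihood and searching over the corresponding parameters.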

2.2. Goodness of Fit

To evaluate the goodness of our fits, we again follow Clauset et al. (2009), to which we refer
for an in-depth discussion of heavy-tailed distribution fitting. The whole goodness-of-fit esti-
mation is summarized in Figure 3.

Let us denote by $\theta_J$ the parameters of the distribution $P(X; \theta)$ (e.g., $\theta_J = \alpha$ for the PL
distribution), fitted to the data set $D_J$. We generate 5,000 sets of synthetic data $\tilde{D}_i$, $i = 1, \dots, 5{,}000$,
each of them composed of $A_J^{\rm tot} = |D_J|$ integer numbers, drawn randomly from the probability


Figure 3. Scheme of the goodness-of-fit computation. For a given journal $J$, the data set $D_J$ is fitted
with a distribution whose parameters are $\theta_J$, and we compute the Kolmogorov-Smirnov (KS)
distance between its empirical and theoretical cumulative distribution functions (TCDFs). Then, based
on the parameters $\theta_J$, we generate 5,000 synthetic data sets $\tilde{D}_i$ for $i = 1, \dots, 5{,}000$, on which we
repeat the same process. Finally, the p-value is the proportion of synthetic data sets whose empirical
and TCDF are further from each other (in the KS sense) than for the original data set $D_J$.

distribution $P_J = P(X; \theta_J)$. For each of these synthetic data sets $\tilde{D}_i$, we perform again an MLE to fit
the same distribution $P(X; \theta)$, yielding parameters $\tilde{\theta}_i$ and the distribution $P_i = P(X; \tilde{\theta}_i)$.

The goodness-of-fit then relies on how well $F^e$, the empirical cumulative distribution function
(ECDF) for a given set of data, matches $F^t$, the theoretical cumulative distribution function
(TCDF) of its fitted distribution. We define

$$F_i^e(k) = \frac{\#\{ n \in \tilde{D}_i : n \le k \}}{\#\tilde{D}_i}, \qquad F_i^t(k) = P(n \le k; \tilde{\theta}_i), \tag{5}$$

and $F_J^e$ and $F_J^t$ are defined similarly with the data set $D_J$.

The p-value of the goodness-of-fit is then given by

$$p = \frac{\#\{ i : d_{\rm KS}(F_i^e, F_i^t) > d_{\rm KS}(F_J^e, F_J^t) \}}{5000}, \tag{6}$$

where the Kolmogorov-Smirnov distance between two cumulative distribution functions $F_1$
and $F_2$ is defined as the maximum difference between them:

$$d_{\rm KS}(F_1, F_2) = \max_k |F_1(k) - F_2(k)|. \tag{7}$$
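
The procedure of Eqs. 5 to 7 can be sketched as follows for the PL case. This is a simplified illustration and not the authors' implementation: the support is truncated to {1, …, N_MAX}, the fit uses a golden-section search, and the default number of synthetic data sets is 100 instead of 5,000 to keep the run time short. All function names are hypothetical.

```python
import math
import random
from itertools import accumulate

N_MAX = 200  # assumed truncation of the support {1, ..., N_MAX}


def pl_pmf(alpha, n_max=N_MAX):
    """Power law pmf C * n**(-alpha) on the truncated support."""
    weights = [n ** -alpha for n in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def fit_alpha(data, lo=1.01, hi=6.0, tol=1e-3, n_max=N_MAX):
    """MLE of alpha by golden-section search on the (concave) log-likelihood."""
    sum_log, size = sum(math.log(n) for n in data), len(data)

    def log_lik(alpha):
        z = sum(n ** -alpha for n in range(1, n_max + 1))
        return -alpha * sum_log - size * math.log(z)

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = log_lik(c), log_lik(d)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_lik(c)
        else:  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_lik(d)
    return (a + b) / 2


def ks_distance(data, alpha, n_max=N_MAX):
    """Eq. 7: maximum gap between the ECDF of `data` and the TCDF of the fit."""
    counts = [0] * (n_max + 1)
    for n in data:
        counts[n] += 1
    ecdf = accumulate(count / len(data) for count in counts[1:])
    tcdf = accumulate(pl_pmf(alpha, n_max))
    return max(abs(e - t) for e, t in zip(ecdf, tcdf))


def gof_p_value(data, n_synthetic=100, seed=0):
    """Eq. 6: share of synthetic sets further from their own fit than the data."""
    rng = random.Random(seed)
    alpha_hat = fit_alpha(data)
    d_data = ks_distance(data, alpha_hat)
    cum = list(accumulate(pl_pmf(alpha_hat)))
    further = 0
    for _ in range(n_synthetic):
        synthetic = rng.choices(range(1, N_MAX + 1), cum_weights=cum, k=len(data))
        if ks_distance(synthetic, fit_alpha(synthetic)) > d_data:
            further += 1
    return further / n_synthetic


# A data set that is clearly not a power law: every author has exactly 3 papers.
print(gof_p_value([3] * 300, n_synthetic=50))
```

Each synthetic data set is refitted before its KS distance is computed, as in the scheme of Figure 3; skipping this refit would bias the p-value upward.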

Namely, p is the proportion of synthetic data sets that are further from the theoretical distribu-
tion (in the Kolmogorov-Smirnov sense) than the analyzed data set. The fit is rejected if p < 5%, and considered as good otherwise (see Clauset et al. (2009) for more details). This goodness-of-fit estimation is performed for each journal J 2 J and each distribution listed above (PL, PLwC, and Yule-Simon). The results are presented in Table 2 and the resulting distributions together with the data are shown in Figures 1, 2, and A.1. As can be seen in Figures 1, 2, and A.1, the PL distribution is a poor fit for all data, its p- value being zero for all journals. Indeed, for most of the journals, the tail of the data set is lighter than the tail of its PL fit (black dashed lines). For three journals (SCI, PLC, CHA), the p-value of the PLwC is larger than 5% and it seems to be a rather good fit, and for two others (NEM and SIA), the Yule-Simon distribution cannot be excluded. Quantitative Science Studies 782 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u q s s / a r t i c e - p d l f / / / / 3 3 7 7 6 2 0 5 7 7 9 1 q s s _ a _ 0 0 2 0 1 p d . / f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Distribution of the number of papers within scientific journals 3. GENERAL DYNAMICS We argue that the heavy-tailedness observed in the previous section is likely to be a conse- quence of a preferential attachment or cumulative advantage process. Many social processes are ruled by so-called preferential attachment (Jeong, Néda, & Barabási, 2003), also called cumulative advantage. Scientific coauthorship (Barabási, Jeong et al., 2002), citations (Eom & Fortunato, 2011; Price, 1976), and performance of scientific institutions (van Raan, 2007) are apparently no exception to the rule. For instance, according to Eom and Fortunato (2011), the probability that a paper will get a new citation at time t is proportional to the number of citations this paper already has at time t. 
Such processes naturally lead to PLs in the relations between characteristics of the sys- tems of interest. For instance, Katz (1999) showed that the number of citations a scientific community gets is a PL of the number of publications in this community, with positive expo- nent (≈ 1.27). More recently, Bettencourt, Lobo et al. (2010) illustrate that the Gross Metro- politan Product of a city is a PL of its population, with positive exponent (≈ 1.126). In a similar spirit, Barabási and Albert (1999) showed that the empirical probability that a web page is targeted by k other pages follows a PL with negative exponent (≈ −2.1). It is reasonable to expect that the evolution of the number of papers published by an author in a given journal is described by a similar preferential attachment process. We support the hypothesis of a preferential attachment or cumulative advantage process by two distinct but similar analysis of publication data. Remark. Notice that even though we refer to the two analyses below as preferential attach- ment and cumulative advantage, respectively, these two denominations fundamentally refer to the same general process (Perc, 2014). The main reason for us to use these two denominations is to distinguish the two analyses. Furthermore, the line of reasoning underlying each of our analysis is inspired by the definition of the corresponding notion (“preferential attachment” or “cumulative advantage”). l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u q s s / a r t i c e - p d l f / / / / 3 3 7 7 6 2 0 5 7 7 9 1 q s s _ a _ 0 0 2 0 1 p d . / 3.1. 
Preferential Attachment Heuristically, our first argument is that if an author published a lot of papers in a journal, it means (a) that they write a lot of papers and (b) that their research topic is well aligned with the scope of the journal (for specialized journals), or that the scientific impact of this author’s research matches the standards of the journal (for interdisciplinary journals). Assumptions (a) and (b) together imply that this author is likely to publish again in this journal. We refer to this process as preferential attachment. The above heuristic can be made more rigorous. For a given journal and for k, t 2 ℤ≥0, we define f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 (cid:129) S(k, t): the set of all authors who have published k papers on December 31 of year t − 1; (cid:129) Ak(t) = #S(k, t): the number of authors in the set S(k, t); (cid:129) Nk(t): the number of papers published during year t by all the authors in the set S(k, t); and (cid:129) ρk(t) = Nk(t)/Ak(t) 2 ℝ: the average number of papers published during year t, by the authors in the set S(k, t). In Figure 4, we plot the values of ρk(t) with respect to the number of papers k for years t 2 {1999, …, 2008} for SCI, LAN, and PRL (each point corresponds to one year t and one number of papers k). For each of the three journals, these values have a linear correlation coefficient Quantitative Science Studies 783 Distribution of the number of papers within scientific journals Figure 4. Average number of papers published within year t 2 {1999, …, 2008}, for authors in the set S(k, t), as a function of k, for SCI, LAN, and PRL. Each point corresponds to one of the years in {1999, …, 2008} (hence multiple points for the same value of k). The Pearson correlation coefficients of the point clouds are respectively rSCI ≈ 0.714, rLAN ≈ 0.707, and rPRL ≈ 0.763, all larger than 0.7, suggesting a relation close to linear. For SCI (resp. LAN and PRL), 14 points (resp. 
12 points and two points) are left out of the frame, for sake of readability. larger than 0.7, supporting a fairly good linear dependence, ρk tð Þ∼k: (8) Note that, for each year considered, we do not take into account authors who did not publish, because the majority of those are not active anymore. The empirical probability that a new paper is signed by an author with k papers is then close to being proportional to k. Krapivsky et al. (2000) rigorously proved that, if the relation in Eq. 8 was exactly proportional, then after a long enough time, the distribution of the num- ber of papers over the set of authors would be a PL with exponent α ≤ −2. The fact that the relation 8 is not exactly proportional but close to it probably explains that the observed dis- tributions have tails that are heavy, but lighter than the PL, as suggested in Figures 1 and 2. 3.2. Cumulative Advantage The concept of cumulative advantage, which is directly related to preferential attachment, has been derived from the seminal work of Merton (1968, 1988) and Price (1976), and the follow- up by Katz (1999). Cumulative advantage emphasizes that an initial advantage leads to a dis- proportionate advantage in the future. For instance, it has been shown that, if author i has twice as many publications as author j, then they are likely to get more than twice as many citations (Katz, 1999). In the context of interest for this article, cumulative advantage translates as follows. Assume that author i and author j have respectively ni(t0) and nj(t0) papers in a journal at time t0, with a ratio ηij(t0) = ni(t0)/nj(t0) > 1. Then cumulative advantage means that, at a later time t1 > t0, the
ratio ηij(t1) ≥ ηij(t0), implying that author i gains a disproportional advantage over time. Math-
ematically speaking, cumulative advantage implies the following equivalences:

$$n_i(t_0) \ge n_j(t_0) \iff \frac{n_i(t_0)}{n_j(t_0)} \le \frac{n_i(t_1)}{n_j(t_1)} \iff \frac{n_i(t_1)}{n_i(t_0)} \ge \frac{n_j(t_1)}{n_j(t_0)} \iff \xi_i(t_0, t_1) \ge \xi_j(t_0, t_1), \tag{9}$$

where we defined ξi(t, s) = ni(s)/ni(t), and where equalities hold if the relation in Eq. 8 is exact.

To support the presence of a cumulative advantage in the publication within the journals
SCI, LAN, and PRL, we computed ξi(1999, 2008) for each author who published between
1999 and 2008. The statistics of ξi are shown in Figure 5 as a function of the initial number
of papers ni(1999). Even though the data are not perfectly conclusive, we clearly observe an
increasing trend of ξi as a function of ni, suggesting that the relation of Eq. 9 may be satisfied.
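
The two analyses of Section 3, namely the averages ρk(t) of Section 3.1 and the ratios ξi(t0, t1) of Section 3.2, can be sketched as follows. The data layout (one dictionary per author mapping a year to the number of papers published that year in the journal) and the toy records are hypothetical. The restriction of ξi to authors with at least one paper at t0 is an assumption made here to avoid a division by zero.

```python
import math
from collections import defaultdict


def cumulative_count(yearly, year):
    """Number of papers published up to and including `year`."""
    return sum(count for y, count in yearly.items() if y <= year)


def rho(records, t):
    """rho_k(t) = N_k(t) / A_k(t), keeping only authors who published in year t."""
    papers, authors = defaultdict(int), defaultdict(int)
    for yearly in records.values():
        published = yearly.get(t, 0)
        if published == 0:
            continue  # authors who did not publish in year t are left out, as in the text
        k = cumulative_count(yearly, t - 1)  # papers on December 31 of year t - 1
        papers[k] += published
        authors[k] += 1
    return {k: papers[k] / authors[k] for k in authors}


def xi(records, t0, t1):
    """xi_i(t0, t1) = n_i(t1) / n_i(t0) for authors with n_i(t0) > 0."""
    ratios = {}
    for author, yearly in records.items():
        n0 = cumulative_count(yearly, t0)
        if n0 > 0:
            ratios[author] = cumulative_count(yearly, t1) / n0
    return ratios


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy records: author -> {year: papers published that year}.
records = {
    "a": {1997: 1, 1999: 1},
    "b": {1995: 2, 1998: 2, 1999: 3},
    "c": {1999: 1},
    "d": {1990: 1},
}
rho_1999 = rho(records, 1999)
xi_98_99 = xi(records, 1998, 1999)
print(rho_1999, xi_98_99)
```

Applying `pearson` to the pairs (k, ρk(t)) collected over several years gives a correlation coefficient of the kind reported in Figure 4.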


Figure 5. Statistics of the ratio ξi between the number of papers in 1999 and in 2008 as a function of the number ni of papers in 1999, in the
three journals SCI (left), LAN (center), and PRL (right). For each value of ni(1999), there are multiple authors with this number of papers in 1999.
Among these authors, the dots show the median value of ξi, the bar covers the second and third quartiles, and the crosses are the maximal and
minimal values. Despite no exact increase of the values, there is an increasing trend of ξi with respect to ni, supporting the presence of a
cumulative advantage process.

This observation supports (at least partly) a cumulative advantage process, and hence the
presence of a PL.

The increasing trends in Figure 5 even suggest a superlinear cumulative advantage
(Krapivsky & Krioukov, 2008; Zhou, Wang et al., 2007).
Indeed, as mentioned above,
if the relation Eq. 8 was exact, ξi(t0, t1) would be constant with respect to ni(t0). In such a case,
the heavy-tailed distribution observed in Figures 1, 2, and A.1 would be the transient state of the
distribution discussed by Krapivsky and Krioukov (2008). A more in-depth analysis of the pos-
sibility of a superlinear cumulative advantage could be done, following the calibration
approach proposed by Zadorozhnyi and Yudin (2015), but goes beyond the purpose of this
article and will be treated in future work.

4. KEY PLAYERS


The general distribution of the number of papers per author is quite clear in our analysis: It
seems to be somewhere between an exponential distribution and a PL. The PL having the
heaviest tail of the three distributions considered (PL, PLwC, and Yule-Simon), we use it to
estimate an upper bound on the number of papers published by an author for each journal.
Assuming that the data are well described by the PL distribution in Eq. 2, one can compute the number of authors with n papers in journal J, $A_n \approx A_J^{\rm tot} C_\alpha n^{-\alpha}$. Setting this number to $A_n = 1$, the maximum number of papers is given by $n_{\max} \approx (A_J^{\rm tot} C_\alpha)^{1/\alpha}$, determining a theoretical upper bound on the number of papers published by an author for each journal, shown as the vertical dashed lines in Figures 1, 2, and A.1.
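As a purely illustrative sketch, this bound can be evaluated numerically. The code below assumes that the PL of Eq. 2 is a discrete power law supported on n ≥ 1, so that the normalization constant is Cα = 1/ζ(α); if Eq. 2 uses a different support or normalization, `c_alpha` must be adapted. The function names and the example values (10,000 authors, α = 2) are hypothetical and are not taken from the data.

```python
import math


def zeta(alpha, terms=10_000):
    """Approximate the Riemann zeta function for alpha > 1.

    Uses a partial sum plus the leading Euler-Maclaurin tail terms.
    """
    if alpha <= 1:
        raise ValueError("alpha must be larger than 1")
    partial = sum(k ** -alpha for k in range(1, terms))
    tail = terms ** (1 - alpha) / (alpha - 1) + 0.5 * terms ** -alpha
    return partial + tail


def n_max(total_authors, alpha):
    """Upper bound n_max = (A_tot * C_alpha) ** (1 / alpha).

    Assumption: C_alpha = 1 / zeta(alpha) (discrete PL on n >= 1).
    """
    c_alpha = 1.0 / zeta(alpha)
    return (total_authors * c_alpha) ** (1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical journal with 10,000 authors and exponent alpha = 2:
    # the bound is 100 * sqrt(6) / pi, i.e. roughly 78 papers.
    print(n_max(10_000, 2.0))
```

Because the exponent enters as 1/α, a heavier tail (smaller α) raises the bound quickly for the same number of authors.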

In some journals (see, e.g., PNA, CHA, SIA, and AMA in Figure 2, and NEM and ACS in Figure A.1), it appears that some authors, whom we refer to as key players, publish significantly more papers in a journal than the PL would predict. Note that we checked that these key players are not artifacts due to multiple authors having the same name, which would count as the same person.

To make the data of different journals more comparable, we restricted our investigation to the early years between 1900 (earliest possible in WoS) and the year in parentheses in the second column of Table 1 for the first nine journals in the table. This yields a number of authors comparable to the three following journals in Table 1 (CHA, SIA, and AMA). The reduced number of authors is given in parentheses in the third column of Table 1. The resulting distributions are depicted in Figure 6 and in Figure A.2, and the fitted parameters are detailed in Table 3. It appears from Figures 6 and A.2 that for such a reduced number of authors, the overshoot of some authors is more systematic, suggesting that in the early years of scientific journals, there are usually a few very prolific authors publishing in them at a rather high rate.

Figure 6. Histograms of the number of papers n published in the six journals indicated in the insets, among the authors who published in these journals (see Table 1 for legends). Data are restricted to the years between 1900 (earliest possible in WoS) and the years indicated in the insets. The number of authors covered is given in parentheses in the third column of Table 1. As in Figures 1 and 2, for each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. We show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution was the fitted power law (see Section 4). We observe that some authors almost systematically exceed this theoretical maximum. The same plot for other journals is available in Figure A.2.

Considering the fitting results in Table 3, we observe better agreement than for the full data sets. This probably indicates that the reduced samples are not large enough to accurately fit heavy-tailed distributions, which require large samples. The fact that NAT and PNA are well fitted by two distributions also indicates that the reduced data sets are not large enough to be conclusive.

Table 3. Fitted parameters and p-values of the goodness-of-fit for power law (PL), power law with cutoff (PLwC), and Yule-Simon (Y-S) distributions, for the nine journals with reduced time span. We see that the only data that are well-approximated by the PL are for NAT when reduced to the first 3,374 entries of WoS. The PLwC, however, seems to be a good fit for the reduced data of six journals (NAT, PNA, SCI, LAN, TAC, and ENE). ENE is particularly well-fitted by the PLwC. Finally, the Yule-Simon distribution seems to correctly fit the distribution of PNA, PLC, and ACS. For the other journals, none of the distributions seem to fit the data appropriately. Note that the reduced data of NAT and PNA are correctly fitted by two distributions, indicating that the amount of data is probably not sufficient for a good fit.

Journal   PL α    PLwC exponent   PLwC p (%)   Y-S p (%)
NAT       2.32    2.23            6.0          0.0
PNA       2.10    1.96            15.0         6.3
SCI       2.44    2.13            72.0         4.7
LAN       2.25    1.81            30.2         2.5
NEM       2.27    2.06            4.4          0.0
PLC       2.59    2.12            0.3          54.7
ACS       2.06    1.89            0.1          64.0
TAC       2.32    2.06            23.7         0.1
ENE       2.69    2.50            94.5         0.0

Columns recovered only in part (values in source order; row assignment not recoverable):
PL p (%): 29.4, 0.1, 0.0, 0.9, 0.0, 0.8
PLwC γ: 0.016, 0.02, 0.09, 0.11, 0.04, 0.16, 0.02, 0.06
Y-S parameter: 2.98, 2.55, 3.37, 2.91, 3.82, 2.46, 3.04, 4.06

5. MODELING

We observe in Figures 1, 2, and A.1 that for old journals where a lot of papers are published,
the tail of the histogram has a rather fast decay after a heavy-tailed regime (this is particularly
striking in PRL and PRD, Figure 1). We explain this observation by the fact that the number of
publications of a given author depends on two parameters: their publication rate and the
length of their career. Both these quantities are bounded in practice, and even if it is possible
to publish a very large number of papers in a given journal, there is a practical limit to this
number. We hypothesize that the decay in the histograms of long-lived journals comes from the finiteness of publication rates and career lengths.

To support our hypothesis, we propose a model to generate data sets that mimic the distributions observed above. As discussed, this model is built on two main dynamics. Fundamentally, it is a preferential attachment process, where the likelihood that a researcher is in the author list of a new paper is proportional to the number of papers this researcher already has in this journal. In addition, it is refined with a limited career span: after some time, the likelihood that a researcher publishes a new paper decreases, reaching zero when they retire.

The model is based on five parameters:

• Ny ∈ ℤ≥0: The number of years (i.e., number of iterations) over which the model is run;
• Np ∈ ℤ≥0: The number of papers that are published every year in the synthetic journal;
• ρ0 ∈ [0, 1]: The proportion of papers that are authored by new researchers who have not yet published in the synthetic journal; and
• Tmin, Tmax ∈ ℤ≥0: The likelihood that an author publishes a new paper decreases linearly after their Tmin-th year of activity, until reaching zero at their Tmax-th year of activity. We illustrate this likelihood in Figure 7.
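The career-dependent likelihood controlled by Tmin and Tmax can be written as a small helper. This is a sketch under stated assumptions: the function name `career_weight` is hypothetical, the value is a relative weight (not yet normalized over authors), and keeping the weight at zero beyond Tmax is one reading of "until reaching zero".

```python
def career_weight(age, t_min, t_max):
    """Relative likelihood of publishing at a given academic age.

    Equal to 1 up to t_min, then decreasing linearly to 0 at t_max.
    Assumption: the weight stays at 0 for ages beyond t_max.
    """
    if t_max <= t_min:
        raise ValueError("t_max must be larger than t_min")
    return max(0.0, min(1.0, (t_max - age) / (t_max - t_min)))
```

With Tmin = 20 and Tmax = 60, an author in their 40th year of activity has half the weight of an author in their first 20 years, and an author past their 60th year has weight zero.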

The model is arbitrarily initialized with some number of authors, each with a few papers in the synthetic journal, gathered in the data set D(0) = {n1(0), n2(0), …, nA(0)(0)}. Then for each year t ∈ {1, …, Ny} where the model is run, Np papers are attributed randomly either to new authors (i.e., who have not yet published) with probability ρ0, or to an existing author with probability 1 − ρ0. If a paper is attributed to an existing author, the probability that it is attributed to author i is:

• proportional to ni(t), the number of papers published by i at year t; and
• linearly decreasing for Ti(t) ∈ [Tmin, Tmax], where Ti(t) is the “academic age” of i, which is the number of iterations between t and the first publication year of i.

Figure 7. Left: Scheme of the iterative process generating the synthetic distribution of the number of publications per author in a journal. Right: Illustration of the probability that a new paper is attributed to author i, knowing that they have already published in the past.


Figure 8. Histograms of the outcome of our synthetic data generator for different values of the journal life span Ny. Fixed parameters are Np = 1,000, ρ0 = 0.5, Tmin = 20, Tmax = 60. There is a clear similarity between the shapes of these synthetic distributions and those of the actual data.

Table 4. Fitted parameters and p-values of the goodness-of-fit for power law (PL), power law with cutoff (PLwC), and Yule-Simon (Y-S) distributions on the synthetic histograms of Figure 8. None of the goodness-of-fit tests are conclusive, but the values of the fitted parameters are very similar to what is observed in actual data.

            PLwC exponent   PLwC γ
Ny = 50     1.94            0.013
Ny = 100    2.03            0.01
Ny = 150    2.01            0.02

Columns recovered only in part (values in source order; row assignment not recoverable):
PL α: 2.05, 2.12
PL p (%): 0.0, 0.0
PLwC p (%): 0.0, 0.0
Y-S parameter: 2.44, 2.58
Y-S p (%): 0.2, 0.0, 0.06

Mathematically, knowing that the new paper is attributed to an existing author, the probability that it is attributed to author i at year t is given by

$$P(i) = Z(t)\, n_i(t)\, \min\left\{1, \frac{T_{\max} - T_i(t)}{T_{\max} - T_{\min}}\right\}, \qquad (10)$$

where Z(t) is the appropriate normalizing factor. The actual implementation of this model is available online (Delabays, 2022).
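For illustration only, the process described above and the attribution probability of Eq. 10 can be sketched in a few lines. This is not the published ADGenerator code, and several choices below are assumptions made for the sketch: each paper has a single author, the weights ni(t) are frozen at the start of each year, the journal starts with `n_initial` authors holding one paper each, and the career factor is clipped at zero for authors older than Tmax. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def simulate_journal(n_years, papers_per_year, rho0, t_min, t_max,
                     n_initial=10, seed=None):
    """Return the list of paper counts n_i after n_years iterations."""
    if t_max <= t_min:
        raise ValueError("t_max must be larger than t_min")
    rng = random.Random(seed)
    papers = [1] * n_initial      # n_i: number of papers of author i
    first_year = [0] * n_initial  # first publication year of author i

    for year in range(1, n_years + 1):
        # Unnormalized Eq. 10: n_i(t) times the career factor, which is
        # clipped at 0 for retired authors (assumption of this sketch).
        weights = [
            n * max(0.0, min(1.0, (t_max - (year - y0)) / (t_max - t_min)))
            for n, y0 in zip(papers, first_year)
        ]
        # Each paper goes to a new author with probability rho0.
        n_new = sum(rng.random() < rho0 for _ in range(papers_per_year))
        n_old = papers_per_year - n_new
        if sum(weights) == 0:
            # No active author left: all papers go to new authors.
            n_new, n_old = papers_per_year, 0
        if n_old:
            chosen = rng.choices(range(len(papers)), weights=weights, k=n_old)
            for i in chosen:
                papers[i] += 1
        papers.extend([1] * n_new)
        first_year.extend([year] * n_new)
    return papers


if __name__ == "__main__":
    # Parameter values quoted in the caption of Figure 8, with Ny = 50.
    counts = simulate_journal(50, 1000, 0.5, 20, 60, seed=0)
    histogram = Counter(counts)  # number of papers -> number of authors
    print(sorted(histogram.items())[:5])
```

`random.choices` performs the normalization by Z(t) internally, so the weights do not need to sum to one. Updating the weights after every paper instead of once per year would be an equally plausible reading of the model.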

Histograms of the outcome of this model are illustrated in Figure 8 and the fitted parameters are in Table 4. We observe a clear similarity between the histograms for synthetic and real data. Namely, for a short lifetime (Ny = 50), some authors beat the PL and exceed the number of papers that would be expected, as is observed in Figure 2 for CHA, SIA, and AMA. For a longer lifetime (Ny = 150), the tail of the distribution decays and loses its heaviness, similar to PRL and PRD in Figure 1.

These observations support the hypothesis that the two main ingredients in the description of the evolution of authorship within journals are preferential attachment and the finiteness of careers.

6. DISCUSSION

The main observation of our article is the heavy-tailed shape of the distribution of papers, which we explain by a preferential attachment or cumulative advantage process. Heavy-tailedness in distributions related to scientific publications, especially in citation or collaboration networks, has widely been documented (Eom & Fortunato, 2011; Price, 1976). We showed that heavy-tailedness is preserved when restricting the analysis to a single journal.

Interestingly, our analysis suggests that the distribution does not follow a PL, but has a slightly lighter tail. Although we have not been able to unequivocally identify a canonical distribution, we showed that a PLwC or a Yule-Simon distribution seems to be a better fit to the data than the PL.

We argue that the observed heavy-tailedness of the distribution follows from a preferential
attachment process through three pieces of evidence. First, we showed that the probability that
an author gets a new paper in a given journal at time t is approximately proportional to the
number of papers they already have in the very same journal. According to Krapivsky et al.
(2000), exact proportionality would lead to a PL. Therefore, it is likely that an approximate
proportionality leads to a heavy-tailed distribution.

Second, we emphasized an approximate cumulative advantage process, which also leads to PL behaviors. Although what we refer to as preferential attachment and cumulative advantage are closely related, they are two distinct underlying mechanisms explaining the heavy-tailedness of the distributions.

Finally, we provided a mathematical model for generating synthetic data on the number of papers in a given journal, where preferential attachment plays a crucial role. The similarity between the obtained distributions and the observed distributions also supports the claim that the heavy tails are driven by preferential attachment.

Even though there seems to be a pattern in the data analyzed in this article, standard distributions (e.g., PLwC, Yule-Simon) do not perfectly fit the data. More advanced fitting techniques could identify a common distribution for all journals, provided that one exists. A more refined explanation of the approximate preferential attachment taking place in scientific publishing could identify with more certainty the source of the distributions observed in this article. Even though preferential attachment has been emphasized in the past, the underlying reasons for this bias are intricate. Disentangling the impact of scientific factors (quality and novelty of the research) and more social ones (rank and reputation of the authors) in the publication process will be a key step towards a fair evaluation of scientists and their work.

AUTHOR CONTRIBUTIONS

Robin Delabays: Conceptualization, Data curation, Formal analysis, Investigation, Methodol-
ogy, Software, Validation, Visualization, Writing—original draft, Writing—review & editing.
Melvyn Tyloo: Conceptualization, Methodology, Writing—review & editing.

FUNDING INFORMATION

Both authors were partly supported by the Swiss National Science Foundation under grant
number 200020_182050. RD was supported by the Swiss National Science Foundation under
grant number P400P2_194359.

COMPETING INTERESTS

The authors have no competing interests.

DATA AVAILABILITY

The data were extracted from www.webofscience.com and cannot be shared openly. The
code for synthetic data generation is available online (Delabays, 2022).


REFERENCES

Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512. https://doi.org/10.1126/science.286.5439.509, PubMed: 10521342

Barabási, A.-L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A, 311, 590–614. https://doi.org/10.1016/S0378-4371(02)00736-7

Barrios, M., Borrego, A., Vilaginés, A., Ollé, C., & Somoza, M. (2008). A bibliometric study of psychological research on tourism. Scientometrics, 77, 453–467. https://doi.org/10.1007/s11192-007-1952-0

Beall, J. (2017). What I learned from predatory publishers. Biochemia Medica, 27, 273–278. https://doi.org/10.11613/BM.2017.029, PubMed: 28694718

Bettencourt, L. M. A., Lobo, J., Strumsky, D., & West, G. B. (2010). Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities. PLOS ONE, 5, e13541. https://doi.org/10.1371/journal.pone.0013541, PubMed: 21085659

Bohannon, J. (2013). Who’s afraid of peer review? Science, 342, 60–65. https://doi.org/10.1126/science.2013.342.6154.342_60, PubMed: 24092725

Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66, 2215–2222. https://doi.org/10.1002/asi.23329

Broido, A. D., & Clauset, A. (2019). Scale-free networks are rare. Nature Communications, 10, 1–10. https://doi.org/10.1038/s41467-019-08746-5, PubMed: 30833554

Butler, D. (2013). Investigating journals: The dark side of publishing. Nature, 495, 433–435. https://doi.org/10.1038/495433a, PubMed: 23538810

Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51, 661–703. https://doi.org/10.1137/070710111

Delabays, R. (2022). ADGenerator: Authors Distribution Generator (v1.0). Zenodo. https://zenodo.org/record/6030303

Egghe, L., & Rousseau, R. (2000). The influence of publication delays on the observed aging distribution of scientific literature. Journal of the American Society for Information Science and Technology, 51, 158–165. https://doi.org/10.1002/(SICI)1097-4571(2000)51:2<158::AID-ASI7>3.0.CO;2-X

Eom, Y.-H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLOS ONE, 6, e24926. https://doi.org/10.1371/journal.pone.0024926, PubMed: 21966387

Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., … Barabási, A.-L. (2018). Science of science. Science, 359, eaao0185. https://doi.org/10.1126/science.aao0185, PubMed: 29496846

Frandsen, T. F., & Nicolaisen, J. (2017). Citation behavior: A large-scale test of the persuasion by name-dropping hypothesis. Journal of the Association for Information Science and Technology, 68, 1278–1284. https://doi.org/10.1002/asi.23746

Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122, 108–111. https://doi.org/10.1126/science.122.3159.108, PubMed: 14385826

Grudniewicz, A., Moher, D., Cobey, K. D., Bryson, G. L., Cukier, S., … Lalu, M. M. (2019). Predatory journals: No definition, no defence. Nature, 576, 210–212. https://doi.org/10.1038/d41586-019-03759-y, PubMed: 31827288

Gupta, B. M., & Karisiddappa, C. R. (1996). Author productivity patterns in theoretical population genetics (1900–1980). Scientometrics, 36, 19–41. https://doi.org/10.1007/BF02126643

Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA, 102, 16569–16572. https://doi.org/10.1073/pnas.0507655102, PubMed: 16275915

Huber, J. C., & Wagner-Döbler, R. (2001a). Scientific production: A statistical analysis of authors in mathematical logic. Scientometrics, 50, 323–337. https://doi.org/10.1023/A:1010581925357

Huber, J. C., & Wagner-Döbler, R. (2001b). Scientific production: A statistical analysis of authors in physics, 1800–1900. Scientometrics, 50, 437–453. https://doi.org/10.1023/A:1010558714879

Jeong, H., Néda, Z., & Barabási, A.-L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61, 567–572. https://doi.org/10.1209/epl/i2003-00166-9

Katz, J. S. (1999). The self-similar science system. Research Policy, 28, 501–517. https://doi.org/10.1016/S0048-7333(99)00010-4

Krapivsky, P., & Krioukov, D. (2008). Scale-free networks as preasymptotic regimes of superlinear preferential attachment. Physical Review E, 78, 026114. https://doi.org/10.1103/PhysRevE.78.026114, PubMed: 18850904

Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of growing random networks. Physical Review Letters, 85, 4629–4632. https://doi.org/10.1103/PhysRevLett.85.4629, PubMed: 11082613

Kretschmer, H., & Rousseau, R. (2001). Author inflation leads to a breakdown of Lotka’s law. Journal of the American Society for Information Science and Technology, 52, 610–614. https://doi.org/10.1002/asi.1118

Laherrère, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: “Fat tails” with characteristic scales. European Physical Journal B, 2, 525–539. https://doi.org/10.1007/s100510050276

Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of Washington Academy of Sciences, 16, 317–323.

Merton, R. K. (1968). The Matthew effect in science: The reward and communication systems of science are considered. Science, 159, 56–63. https://doi.org/10.1126/science.159.3810.56, PubMed: 5634379

Merton, R. K. (1988). The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property. Isis, 79, 606–623. https://doi.org/10.1086/354848

Narin, F. (1976). Evaluative bibliometrics: The use of publication and citation analysis in the evaluation of scientific activity. Washington, DC: National Science Foundation.

Newby, G. B., Greenberg, J., & Jones, P. (2003). Open source software development and Lotka’s law: Bibliometric patterns in programming. Journal of the American Society for Information Science and Technology, 54, 169–178. https://doi.org/10.1002/asi.10177

Newman, M. E. J. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the USA, 98, 404–409. https://doi.org/10.1073/pnas.98.2.404, PubMed: 11149952

Pal, J. K. (2015). Scientometric dimensions of cryptographic research. Scientometrics, 105, 179–202. https://doi.org/10.1007/s11192-015-1661-z

Parolo, P., Pan, R. K., Ghosh, R., Huberman, B. A., Kaski, K., & Fortunato, S. (2015). Attention decay in science. Journal of Informetrics, 9, 734–745. https://doi.org/10.1016/j.joi.2015.07.006


Perc, M. (2014). The Matthew effect in empirical data. Journal of the Royal Society Interface, 11, 20140378. https://doi.org/10.1098/rsif.2014.0378, PubMed: 24990288

Price, D. de Solla. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science and Technology, 27, 292–306. https://doi.org/10.1002/asi.4630270505

Price, D. J. de Solla. (1963). Little science, big science. Columbia University Press. https://doi.org/10.7312/pric91844

Price, D. J. de Solla. (1965). Networks of scientific papers. Science, 149, 510–515. https://doi.org/10.1126/science.149.3683.510, PubMed: 14325149

Saam, N. J., & Reiter, L. (1999). Lotka’s law reconsidered: The evolution of publication and citation distributions in scientific fields. Scientometrics, 44, 135–155. https://doi.org/10.1007/BF02457376

Sekara, V., Deville, P., Ahnert, S. E., Barabási, A.-L., Sinatra, R., & Lehmann, S. (2018). The chaperone effect in scientific publishing. Proceedings of the National Academy of Sciences of the USA, 115, 12603–12607. https://doi.org/10.1073/pnas.1800471115, PubMed: 30530676

Siudem, G., Żogała-Siudem, B., Cena, A., & Gagolewski, M. (2020). Three dimensions of scientific impact. Proceedings of the National Academy of Sciences of the USA, 117, 13896–13900. https://doi.org/10.1073/pnas.2001064117, PubMed: 32513724

Smolinsky, L. (2017). Discrete power law with exponential cutoff and Lotka’s law. Journal of the Association for Information Science and Technology, 68, 1792–1795. https://doi.org/10.1002/asi.23763

Sorokowski, P., Kulczycki, E., Sorokowska, A., & Pisanski, K. (2017). Predatory journals recruit fake editor. Nature, 543, 481–483. https://doi.org/10.1038/543481a, PubMed: 28332542

Sutter, M., & Kocher, M. G. (2001). Power laws of research output. Evidence for journals of economics. Scientometrics, 51, 405–414. https://doi.org/10.1023/A:1012757802706

Thelwall, M. (2016). The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression. Journal of Informetrics, 10, 336–346. https://doi.org/10.1016/j.joi.2015.12.007

van Raan, A. F. J. (2007). Bibliometric statistical properties of the 100 largest European research universities: Prevalent scaling rules in the science system. Journal of the American Society for Information Science and Technology, 59, 461–475. https://doi.org/10.1002/asi.20761

van Raan, A. F. J. (2019). Measuring science: Basic principles and application of advanced bibliometrics. In W. Glänzel, H. F. Moed, U. Schmoch, & M. Thelwall (Eds.), Springer handbook of science and technology indicators (pp. 237–280). Cham: Springer. https://doi.org/10.1007/978-3-030-02511-3_10

Wagner-Döbler, R., & Berg, J. (1999). Physics 1800–1900: A quantitative outline. Scientometrics, 46, 213–285. https://doi.org/10.1007/BF02464778

Waltman, L., & van Eck, N. J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63, 2378–2392. https://doi.org/10.1002/asi.22748

Waltman, L., van Eck, N. J., & van Raan, A. F. J. (2012). Universality of citation distributions revisited. Journal of the American Society for Information Science and Technology, 63, 72–77. https://doi.org/10.1002/asi.21671

Wang, Q., & Waltman, L. (2016). Large-scale analysis of the accuracy of the journal classification systems of Web of Science and Scopus. Journal of Informetrics, 10, 347–364. https://doi.org/10.1016/j.joi.2016.02.003

Zadorozhnyi, V. N., & Yudin, E. B. (2015). Growing network: Models following nonlinear preferential attachment rule. Physica A, 428, 111–132. https://doi.org/10.1016/j.physa.2015.01.052

Zhou, T., Wang, B.-H., Jin, Y.-D., He, D.-R., Zhang, P.-P., … Liu, J.-G. (2007). Modelling collaboration networks based on nonlinear preferential attachment. International Journal of Modern Physics C, 18, 297–314. https://doi.org/10.1142/S0129183107010437


APPENDIX


Figure A.1. Histograms of the number of papers n published in the six journals indicated in the insets, among the authors who published in these
journals (see Table 1 for legends). As in Figures 1 and 2, for each value of n, the height of the bar gives the proportion of authors who published n
articles in the corresponding journal. The gray dotted line is the exponential fit of the data, emphasizing that the distribution is heavy-tailed. We
show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted
black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution was the fitted power law.


Figure A.2. Histograms of the number of papers n published in the six journals indicated in the insets, among the authors who published in these journals (see Table 1 for legends). Data are restricted to the years between 1900 (earliest possible in WoS) and the years indicated in the insets. The number of authors covered is given in parentheses in the third column of Table 1. As in Figures 1 and 2, for each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. We show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution was the fitted power law. We observe that some authors almost systematically exceed this theoretical maximum.
