RESEARCH ARTICLE
Are disruption index indicators convergently valid?
The comparison of several indicator variants
with assessments by peers
Lutz Bornmann1, Sitaram Devarakonda2, Alexander Tekles1,3, and George Chacko2
1Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society,
Hofgartenstr. 8, 80539 Munich, Germany
2NET ESolutions Corporation, 8180 Greensboro Dr, McLean, VA 22102, USA
3Ludwig-Maximilians-Universität Munich, Department of Sociology, Konradstr. 6, 80801 Munich, Germany
Keywords: bibliometrics, dependence index, disruption index, novelty
ABSTRACT
Recently, Wu, Wang, and Evans (2019) proposed a new family of indicators, which measure
whether a scientific publication is disruptive to a field or tradition of research. Such disruptive
influences are characterized by citations to a focal paper, but not to its cited references. In this
study, we are interested in the question of convergent validity. We used external criteria of
newness to examine convergent validity: In the postpublication peer review system of
F1000Prime, experts assess whether the reported research fulfills these criteria (e.g.,
reports new findings). This study is based on 120,179 papers from F1000Prime published
between 2000 and 2016. In the first part of the study we discuss the indicators. Based on the
insights from the discussion, we propose alternate variants of disruption indicators. In the
second part, we investigate the convergent validity of the indicators and the (possibly) improved
variants. Although the results of a factor analysis show that the different variants measure similar
aspects, the results of regression analyses reveal that one variant (DI5) performs slightly
better than the others.
1. Introduction
Citation analyses often focus on counting the number of citations to a focal paper (FP). To
assess the academic impact of the FP, its citation count is compared with the citation count
of a similar paper (SP) that has been published in the same research field and year. If the FP
receives significantly more citations than the SP, its impact is noteworthy: The FP seems to be
more useful or interesting for other researchers than the SP. However, simply counting and
comparing citations does not reveal what the reasons for the impact of publications might
be. As the overviews by Bornmann and Daniel (2008) and Tahamtan and Bornmann (2019)
show, various reasons exist why publications are (highly) cited. Especially for research
evaluation purposes, it is very interesting to know whether certain publications have impact
because they report novel or revolutionary results. These are the results from which science
and society mostly profit.
In this paper, we focus on a new type of indicator family measuring the impact of publications
by examining not only the number of citations received but also the references cited in
publications.
an open access journal

Citation: Bornmann, L., Devarakonda, S., Tekles, A., & Chacko, G. (2020). Are disruption index indicators convergently valid? The comparison of several indicator variants with assessments by peers. Quantitative Science Studies, 1(3), 1242–1259. https://doi.org/10.1162/qss_a_00068

DOI: https://doi.org/10.1162/qss_a_00068

Received: 20 November 2019
Accepted: 01 May 2020

Corresponding Author: Lutz Bornmann
bornmann@gv.mpg.de

Handling Editor: Vincent Larivière

Copyright: © 2020 Lutz Bornmann, Sitaram Devarakonda, Alexander Tekles, and George Chacko. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press
Recently, Funk and Owen-Smith (2017) proposed a new family of indicators that
measure the disruptive potential of patents. Wu et al. (2019) transferred the conceptual idea to
publication data by measuring whether an FP disrupts a field or tradition of research (see also a
similar proposal in Bu, Waltman, & Huang, 2019). Azoulay (2019) describes the so-called
disruption index proposed by Wu et al. (2019) as follows: “When the papers that cite a given
article also reference a substantial proportion of that article’s references, then the article can be
seen as consolidating its scientific domain. When the converse is true—that is, when future
citations to the article do not also acknowledge the article’s own intellectual forebears—the
article can be seen as disrupting its domain” (p. 331).
We are interested in the question of whether disruption indicators are convergently valid
with assessments by peers. The current study has two parts: In the first part, we discuss the
indicators introduced by Wu et al. (2019) and Bu et al. (2019) and identify possible limitations.
Based on the insights from the discussion, we propose alternate variants of disruption indica-
tors. In the second part, we investigate the convergent validity of the indicators proposed by
Wu et al. (2019) and Bu et al. (2019) and the (possibly) improved variants. We used an exter-
nal criterion of newness, which is available at the paper level for a large paper set: tags (e.g.,
“new finding”) assigned to papers by peers expressing newness.
Convergent validity asks “to what extent does a bibliometric exercise exhibit externally
convergent and discriminant qualities? In other words, does the indicator satisfy the condition
that it is positively associated with the construct that it is supposed to be measuring? The cri-
teria for convergent validity would not be satisfied in a bibliometric experiment that found little
or no correlation between, 说, peer review grades and citation measures” (Rowlands, 2018).
The analyses are intended to identify the indicator (variant) that is more strongly related to
assessments by peers (concerning newness) than other indicators.
2.
INDICATORS MEASURING DISRUPTION
The new family of indicators measuring disruption has been developed based on the previous
introduction of another indicator family measuring novelty. Research on the novelty indicator
family is based on the view of research as a “problem solving process involving various com-
binatorial aspects so that novelty comes from making unusual combinations of preexisting
components” (Wang, Lee, & Walsh, 2018, p. 1074). Uzzi, Mukherjee, et al. (2013) analyzed
cited references and investigated whether referenced journal pairs in papers are atypical or
not. Papers with many atypical journal pairs were denoted as papers with high novelty
potential. The authors argue that highly cited papers are not only highly novel but also very
conventionally oriented. In a related study, Boyack and Klavans (2014) reported strong disci-
plinary and journal effects in inferring novelty.
In recent years, Lee, Walsh, and Wang (2015) proposed an adapted version of the novelty
measure introduced by Uzzi et al. (2013); Wang, Veugelers, and Stephan (2017) and Stephan,
Veugelers, and Wang (2017) introduced a novelty measure focusing on publications with
great potential of being novel by identifying new journal pairs (instead of atypical pairs). A
different approach is used by Boudreau, Guinan, et al. (2016) and Carayol, Lahatte, and
Llopis (2017), who used unusual combinations of keywords for measuring novelty. Other stud-
ies in the area of measuring novelty have been published by Foster, Rzhetsky, and Evans
(2015), Mairesse and Pezzoni (2018), Bradley, Devarakonda, et al. (2020), and Wagner,
Whetsell, and Mukherjee (2019), each with a different focus. According to the conclusion by
Wang et al. (2018), “prior work suggests that coding for rare combinations of prior knowledge
in the publication produces a useful a priori measure of the novelty of a scientific publication”
(p. 1074).
Novelty indicators have been developed against the backdrop of the desire to identify and
measure creativity. How is creativity defined? According to Hemlin, Allwood, et al. (2013),
“creativity is held to involve the production of high-quality, original, and elegant solutions
to complex, 小说, ill-defined, or poorly structured problems” (p. 10). Puccio, Mance, 和
Zacko-Smith (2013) claim that “many of today’s creativity scholars now define creativity as
the ability to produce original ideas that serve some value or solve some problem” (p. 291).
The connection between the indicators measuring novelty and creativity of research is made
by that stream of research viewing creativity “as an evolutionary search process across a com-
binatorial space and sees creativity as the novel recombination of elements” (Lee et al., 2015,
p. 685). For Estes and Ward (2002) “creative ideas are often the result of attempting to deter-
mine how two otherwise separate concepts may be understood together” (p. 149), whereby
the concepts may refer to different research traditions or disciplines. Similar statements on the
roots of creativity can be found in the literature from other authors, as the overview by Wagner
et al. (2019) shows. Bibliometric novelty indicators try to capture the combinatorial dynamic
of papers (and thus the creative potential of papers; see Tahamtan & Bornmann, 2018) by
investigating lists of cited references or keywords for new or unexpected combinations
(Wagner et al., 2019).
In a recent study, Bornmann, Tekles, et al. (2019) investigated two novelty indicators and
tested whether they exhibit convergent validity. They used a design similar to that of this study
and found that only one indicator is convergently valid.
In this context of measuring creativity, not only the development of indicators measuring
novelty but also the introduction of indicators identifying disruptive research have occurred.
These indicators are interested in exceptional research that turns knowledge formation in a
field around. The family of disruption indicators proposed especially by Wu et al. (2019)
and Bu et al. (2019) seizes on the concept of Kuhn (1962), who differentiated between phases
of normal science and scientific revolutions. Normal science is characterized by paradigmatic
thinking, which is rooted in traditions and consensus orientation; scientific revolutions follow
divergent thinking and openness (Foster et al., 2015). Whereas normal science means linear
accumulation of research results in a field (Petrovich, 2018), scientific revolutions are dramatic
changes with an overthrow of established thinking (Casadevall & Fang, 2016). Preconditions
for scientific revolutions are creative knowledge claims that disrupt linear accumulation pro-
cesses in field-specific research (Kuukkanen, 2007).
Bu et al. (2019) see the development of disruption indicators in the context of a multidi-
mensional perspective on citation impact. In contrast to simple citation counting under the
umbrella of a one-dimensional perspective, the multidimensional perspective considers
breadth and depth through the cited references of an FP and the cited references of its citing
文件 (see also Marx & Bornmann, 2016). In contrast to the family of novelty indicators,
which are based exclusively on cited references, disruption indicators combine the cited ref-
erences of citing papers with the cited references data of FPs. The disruptiveness of an FP is
measured based on the extent to which the cited references of the papers citing the FP also
refer to the cited references of the FP. According to this idea, many citing papers not referring
to the FP’s cited references indicate disruptiveness. In this case, the FP is the basis for new
work that does not depend on the context of the FP (i.e., the FP gives rise to new research).
Disruptiveness was first described by Wu et al. (2019) and Funk and Owen-Smith (2017)
and presented as a weighted index DI1 (see Figure 1).
Figure 1. Different roles of papers in citation networks for calculating disruption indicators and
formulae for different disruption indicators (see Wu & Yan, 2019).
DI1 is calculated for an FP by dividing the difference between the number of publications that
cite the FP without citing any of its cited references (Ni) and the number of publications that
cite both the FP and at least one of its cited references (Nj^1) by the sum of Ni, Nj^1, and Nk
(the number of publications that cite at least one of the FP’s cited references without citing the
FP itself). Simply put, this is the ratio

$$\mathrm{DI}_1 = \frac{N_i - N_j^1}{N_i + N_j^1 + N_k}.$$

High positive values of this indicator should be able to point to disruptive research; high
negative values should reflect developmental research (i.e., new research that continues
previous research lines).
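To make the quantities concrete, the following is a minimal sketch of how DI1 could be computed from raw citation data. The accessors citers() and refs() are hypothetical placeholders (e.g., lookups against a bibliometric database), not part of any published implementation.

```python
# Minimal sketch of the DI1 computation (hypothetical citers/refs accessors):
# citers(p) returns the set of papers citing p; refs(p) the set of papers p cites.

def di1(fp, citers, refs):
    fp_refs = refs(fp)                  # the FP's cited references
    fp_citers = citers(fp)              # papers citing the FP
    # Papers citing at least one of the FP's cited references
    ref_citers = set().union(*(citers(r) for r in fp_refs))
    n_i = sum(1 for p in fp_citers if not (refs(p) & fp_refs))  # cite FP only
    n_j = sum(1 for p in fp_citers if refs(p) & fp_refs)        # cite FP and refs
    n_k = len(ref_citers - fp_citers)                           # cite refs only
    # Papers in this study have at least 10 citations, so the denominator
    # is nonzero; a production implementation would guard this case.
    return (n_i - n_j) / (n_i + n_j + n_k)                      # in [-1, 1]
```

A paper whose citing papers ignore its cited references (n_j = 0) and whose cited references attract few other citers (small n_k) approaches 1; a paper whose citing papers routinely co-cite its references is pushed toward −1.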
DI1 corresponds to a certain notion of disruptiveness, according to which only few papers
are disruptive (while most papers are not disruptive), and a paper needs to have a large citation
impact to score high on DI1. DI1 detects only a few papers as disruptive due to the term Nk,
which is often very large compared to the other terms in the formula (Wu & Yan, 2019). A
large Nk produces disruption values of small magnitude, as Nk only occurs in the denominator
of the formula. Thus, the disruption index is very similar for many papers, and only a few
papers get high disruption values. However, Funk and Owen-Smith (2017), who originally
defined the formula for the disruption index, designed the indicator to measure disruptiveness
on a continuous scale from −1 to 1: “the measure is continuous, able to capture degrees of
consolidation and destabilization” (p. 793). This raises the question of whether different
nuances of disruptions can be adequately captured by DI1, or if the term Nk is too dominant
for this purpose.
Including Nk in the formula can also be questioned with regard to its function for assessing
the disruptiveness of a paper. The basic idea that all disruption indicators share is to distinguish
between citing papers of an FP indicating disruptiveness and citing papers indicating
consolidation. Nk does not refer to this distinction. Instead, it captures the citation impact of an FP com-
pared to other papers in a similar context (all papers citing the same papers as the FP). Assuming a
notion of disruptiveness that aims at detecting papers that have a large-scale disruptive effect,
this idea seems reasonable. However, this form of considering citation impact can be
problematic, as it strongly depends on the citation behavior of the FP, and small changes in the FP’s
cited references can have a large effect on the disruption score.
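A hypothetical numerical example illustrates this sensitivity: For a paper with Ni = 20, Nj^1 = 10, and Nk = 70,

$$\mathrm{DI}_1 = \frac{20 - 10}{20 + 10 + 70} = 0.10.$$

If the FP had cited one additional highly cited paper with 400 citing papers that do not cite the FP, Nk would grow to 470 and DI1 would shrink to 10/500 = 0.02, although nothing about the reception of the FP itself has changed.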
A more general issue regarding the function of Nk is whether citation impact should be
considered at all when measuring disruptiveness, which depends on the underlying notion
of disruptiveness. Bu et al. (2019) suggest separating disruptiveness (depth of citation impact
in their terminology) and citation impact in terms of number of forward citations. This perspec-
tive assumes that disruptiveness is a quality of papers that can also be observed on a small
impact level. 从这个角度来看, Nk should not be included in the formula for measuring
disruptiveness, especially as it is the dominant factor in most cases. 最后, an alterna-
tive indicator would be simply to drop the term Nk, which corresponds to DInok
根据
the formula in Figure 1. This variant of the disruption indicator has been proposed by Wu and
, a very similar approach for calculating a paper’s disruption has been
严 (2019). With Ni
1
NiþN1
j
proposed by Bu et al. (2019). This indicator can be defined as a function of DInok
, 这样
differences between papers just change by the factor 0.5, so that both variants allow identical
结论. In our analyses, we will consider DInok
values as the original disruption index DI1.
because it has the same range of output
1
1
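The claim that both variants allow identical conclusions can be seen from a short rearrangement (our notation):

$$\mathrm{DI}_1^{nok} = \frac{N_i - N_j^1}{N_i + N_j^1} = \frac{2N_i - (N_i + N_j^1)}{N_i + N_j^1} = 2 \cdot \frac{N_i}{N_i + N_j^1} - 1$$

The Bu et al. (2019) indicator Ni/(Ni + Nj^1) is thus an affine transformation of DI1^nok; a difference between two papers on the former is exactly half of the corresponding difference on the latter, so rankings and comparisons are unaffected.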
In contrast to the aforementioned variants of indicators measuring disruptiveness, Bu et al.
(2019) also proposed an indicator that considers how many cited references of the FP are cited
by an FP’s citing paper. This approach takes into account how strongly the FP’s citing papers
rely on the cited references of the FP, instead of just considering if this relationship exists (在
the form of at least one citation of a cited reference of the FP). The corresponding indicator
proposed by Bu et al. (2019) (denoted as DeIn in Figure 1) is defined as the average number of
cited references of the FP that its citing papers cite. In contrast to the other indicators men-
tioned earlier, DeIn is supposed to decrease with the disruptiveness of a paper, 因为它
measures the dependency of the paper on earlier work (as opposed to disruptiveness).
Another difference to the other indicators is that the range of DeIn has no upper bound, 是-
cause the average number of citation links from a paper citing the FP to the FP’s cited refer-
ences is not limited. This makes it more difficult to compare the results of DeIn and the other
指标.
By considering only those citing papers of the FP that cite at least l (l > 1) of the FP’s cited
references, it becomes possible to follow the idea of taking into account how strongly the
FP’s citing papers rely on the cited references of the FP, while also obtaining values that are
more comparable to those of the other indicators. The probability that a citing paper of the FP
cites a highly cited reference of the FP is higher than it is for a less frequently cited reference
of the FP. Therefore, the fact that a paper cites both the FP and at least one of its cited
references is not equally indicative of a developmental FP in all cases. Considering only those
of the FP’s citing papers that also cite at least a certain number of the FP’s cited references
mitigates this problem, because it focuses on the citing papers that most reliably indicate a
developmental FP.
This is formalized in the formulae in Figure 1, where the subscripts of DIl and DIl^nok
correspond to the threshold for the number of the FP’s cited references that a citing paper must
cite to be considered. With a threshold of l = 1 (i.e., without any restriction on the number of
the FP’s cited references that the FP’s citing papers must cite), the indicator is identical to the
indicator originally proposed by Wu et al. (2019). To analyze how well these different
strategies are able to measure the disruptiveness of a paper, we compare the following indicators
in our analyses: DI1, DI5, DI1^nok, DI5^nok, and DeIn. The subscript in four of the variants
indicates the minimum number of cited references that must be cited along with the FP; the
superscript “no k” in two variants indicates that Nk is excluded from the calculation.
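As a complement to the DI1 sketch above, the following hedged sketch computes the thresholded variants and DeIn; the treatment of citing papers with an overlap between 1 and l − 1 cited references (counted in neither Ni nor Nj^l) follows our reading of the formulae in Figure 1.

```python
# Sketch of the variant computations, reusing the hypothetical
# citers()/refs() accessors from the DI1 sketch above.

def disruption_variants(fp, citers, refs, l=5):
    fp_refs = refs(fp)
    fp_citers = citers(fp)
    # For each citing paper: how many of the FP's cited references it cites
    overlap = {p: len(refs(p) & fp_refs) for p in fp_citers}
    n_i = sum(1 for p in fp_citers if overlap[p] == 0)
    n_j_l = sum(1 for p in fp_citers if overlap[p] >= l)   # l = 5 gives DI5
    ref_citers = set().union(*(citers(r) for r in fp_refs))
    n_k = len(ref_citers - fp_citers)
    di_l = (n_i - n_j_l) / (n_i + n_j_l + n_k)
    di_l_nok = (n_i - n_j_l) / (n_i + n_j_l)               # Nk dropped
    # DeIn: average number of the FP's cited references cited by its
    # citing papers (no upper bound; decreases with disruptiveness)
    dein = sum(overlap.values()) / len(fp_citers)
    return di_l, di_l_nok, dein
```

With l = 1, di_l reproduces DI1, so a single routine covers DI1, DI5, DI1^nok, and DI5^nok.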
3. Methods
3.1. F1000Prime
F1000Prime is a database including important papers from biological and medical research
(see https://f1000.com/prime/home). The database is based on a postpublication peer review
系统: Peer-nominated faculty members (FMs) select the best papers in their specialties and
assess these papers for inclusion in the F1000Prime database. FMs write brief reviews explaining
the importance of papers and rate them as “good” (1 star), “very good” (2 stars), or “exceptional”
(3 stars). Many papers in the database are assessed by more than one FM. To rank the papers in
the F1000Prime database, the individual scores are summed up to a total score for each paper.
FMs also assign the following tags to the papers, if appropriate:1
– Confirmation: article validates published data or hypotheses
– Controversial: article challenges established dogma
– Good for teaching: key article in a field and/or is well written
– Hypothesis: article presents an interesting hypothesis
– Negative/null results: article has null or negative findings
– New finding: article presents original data, models or hypotheses
– Novel drug target: article suggests new targets for drug discovery
– Refutation: article disproves published data or hypotheses
– Technical advance: article introduces a new practical/theoretical technique, or novel use
of an existing technique
The tags in bold (hypothesis, new finding, novel drug target, and technical advance) reflect
aspects of novelty in research. As disruptive research should include elements of novelty, we
expect that these tags are positively related to the disruption indicator scores. For example,
we assume that a paper receiving many “new finding” tags from FMs will have a higher
disruption index score than a paper receiving only a few tags (or none at all). The tags not
printed in bold are not related to newness (e.g., confirmation of published
hypotheses), so that the expectations for these tags are zero or negative correlations with disrup-
tion index scores. In terms of measures that are likely to be inversely correlated with disruptive-
ness, the one that seems most plausible is the “confirmation” tag. The tag “controversial” is
printed in italics. It is not clear whether the tag is able to reflect novelty or not. FMs further assign
the tags “clinical trial,” “systematic review/meta-analysis,” and “review/commentary” to papers
that are not relevant for this study (and are thus not used).
We interpret the empirical results in Section 4.3 against the backdrop of the above
assumptions. In the interpretation of the results, however, it should be considered that the allocation
of tags by the FMs involves subjective decisions associated with (more or less) different
meanings. In other words, the tag data are affected by noise (uncertainties) covering the signal
(clear-cut judgments). Another point that should be considered in the interpretation of the
empirical results is the fact that the above assumptions can be formulated in another way. For
example, we anticipate that papers that are “good for teaching” would be inversely correlated
with disruptiveness. The opposite could be true as well: Papers that introduce new topics,
perspectives, and ways of thinking (papers that shift the conversation) would be most useful for
teaching. Many factors play a role in the interpretation of the “good for teaching” tag: How
complex is the paper assessed by the FMs? Is it a landmark paper published decades ago or a
recent research front paper? Is the paper intended for the teaching of bachelor, masters, or
doctoral students?
1 The definitions of the tags are adopted from https://f1000.com/prime/about/whatis/how
Many other studies have already used data from the F1000Prime database for correlating
them with metrics. Most of these studies are interested in the relationship between quantitative
(metrics-based) and qualitative (human-based) assessments of research. The analysis of Anon
(2005) shows that “papers from high-profile journals tended to be rated more highly by the
faculty; there was a tight correlation (R2 = .93) between average score and the 2003 impact
factor of the journal” (see also Jennings, 2006). Bornmann and Leydesdorff (2013) correlated
several bibliometric indicators and F1000Prime recommendations. They found that the
“percentile in subject area achieves the highest correlation with F1000 ratings” (p. 286).
Waltman and Costas (2014) report “a clear correlation between F1000 recommendations
and citations. 然而, the correlation is relatively weak” (p. 433). Similar results were
published by Mohammadi and Thelwall (2013). Bornmann (2015) investigated the convergent
validity of F1000Prime assessments. He found that “the proportion of highly cited papers
among those selected by the FMs is significantly higher than expected. Furthermore, better
recommendation scores are also associated with higher performing papers” (p. 2415). The most
recent study by Du, Tang, and Wu (2016) shows that “(a) nonprimary research or evidence-
based research are more highly cited but not highly recommended, while (b) translational
research or transformative research are more highly recommended but have fewer citations”
(p. 3008).
3.2. Data Set Used and Variables
The study is based on a data set from F1000Prime including 207,542 assessments of papers.
These assessments refer to 157,020 papers (excluding papers with duplicate DOIs, missing
DOIs, missing expert assessments, etc.). The bibliometric data for these papers are from an
in-house database (Korobskiy, Davey, et al., 2019), which utilizes Scopus data (Elsevier
Inc.). To increase the validity of the indicators included in this study, we considered only
papers with at least 10 cited references and at least 10 citations. Furthermore, we included only
papers from 2000 to 2016 to have reliable data (some publications are from 1970 or earlier)
and a citation window for the papers of at least 3 years (from publication until the end of
2018). The reduced paper set consists of 120,179 papers published between 2000 and 2016
(see Table 1).
We included several variables in the empirical part of this study: the disruption index
proposed by Wu et al. (2019) (DI1) and the dependence indicator proposed by Bu et al. (2019)
(DeIn). The alternative disruption indicators described in Section 2 were DI5, DI1^nok, and
DI5^nok. For the comparison with the indicators reflecting disruption, we included the sum
(ReSc.sum) and the average (ReSc.avg) of reviewer scores (i.e., scores from FMs). Besides the
qualitative assessments of research, quantitative citation impact scores are also considered:
the number of citations until the end of 2018 (Citations) and percentile impact scores (Percentiles).
As publication and citation cultures differ between fields, it is standard in bibliometrics
to field- and time-normalize citation counts (Hicks, Wouters, et al., 2015). Percentiles are
field- and time-normalized citation impact scores (Bornmann, Leydesdorff, & Mutz, 2013) that
range between 0 and 100 (higher scores reflect more citation impact). For the calculation of
percentiles, the papers published in a certain subject category and publication year are ranked in
decreasing order. Then the formula (i − 0.5)/n × 100 (Hazen, 1914) is used to calculate
percentiles (i is the rank of a paper and n the number of papers in the subject category). Impact
percentiles of papers published in different fields can be directly compared (despite possibly
differing publication and citation cultures).
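As an illustration, a minimal sketch of the percentile computation; the ranking direction is chosen so that higher percentiles reflect more citation impact, matching the description above, and ties are handled naively.

```python
import numpy as np

def hazen_percentiles(citations):
    """Hazen (1914) percentiles for papers from one subject category and year."""
    citations = np.asarray(citations, dtype=float)
    n = len(citations)
    ranks = citations.argsort().argsort() + 1  # rank 1 = least cited
    return (ranks - 0.5) / n * 100             # (i - 0.5) / n * 100

# Five papers from one field and year:
print(hazen_percentiles([3, 10, 12, 50, 200]))  # [10. 30. 50. 70. 90.]
```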
Table 1. Number and percentage of papers included in the study

Publication year    Number of papers    Percentage of papers
2000                     196                  0.16
2001                   1,530                  1.27
2002                   3,229                  2.69
2003                   3,717                  3.09
2004                   5,185                  4.31
2005                   6,711                  5.58
2006                   8,765                  7.29
2007                   8,824                  7.34
2008                  10,046                  8.36
2009                  10,368                  8.63
2010                  11,074                  9.21
2011                  10,934                  9.10
2012                  10,536                  8.77
2013                   9,903                  8.24
2014                   7,261                  6.04
2015                   6,121                  5.09
2016                   5,779                  4.81
Total                120,179                100.00
Table 2 shows the key figures for the citation impact scores, reviewer scores, and variants
measuring disruption. As the percentiles reveal, the paper set especially includes papers with
considerable citation impact. Table 3 lists the papers that received the maximum scores in
Table 2. The maximum DI1 value of 0.677 was reached by the paper entitled
“Cancer statistics, 2010” published by Jemal, Siegel, et al. (2010). This publication is one of
an annual series published on incidence, mortality, and survival rates for cancer, and its high
score may be an artifact of the DI1 formula, because it is likely that the report is cited much
more often than its cited references. In fact, this publication may make the case for the DI5
formulation. Similarly, the 2013 edition of the cancer statistics report was found to have the
maximum percentile. The maximum number of citations was seen for the well-known review
article by Hanahan and Weinberg (2011), “Hallmarks of cancer: The next generation.”
3.3. Statistics Applied
The statistical analyses in this study have three steps:
1. We investigated the correlations between citation impact scores, reviewer scores, and
the scores of the indicators measuring disruption. None of the variables is normally
distributed, and all are affected by outliers.
Table 2. Key figures of the included variables (n = 120,179)

Variable        Mean      Median    Standard deviation    Minimum     Maximum
DI1            −0.007     −0.004          0.013            −0.322       0.677
DI5             0.089     −0.007          0.278            −0.800       1.000
DI1^nok        −0.521     −0.579          0.294            −0.998       0.975
DI5^nok        −0.008     −0.053          0.545            −0.990       1.000
DeIn            3.327      2.970          1.871             0.013      43.059
ReSc.sum        2.028      2.000          1.808             1.000      55.000
ReSc.avg        1.486      1.000          0.586             1.000       3.000
Citations     149.848     73.000        298.467            10.000   20446.000
Percentiles    87.246     91.947         13.248            23.659     100.000
To tackle this problem, we logarithmized the scores using the formula log(X + 1). This
logarithmic transformation approximates the distributions to normal distributions.2 As
perfectly normally distributed variables cannot be achieved with the transformation,
Spearman rank correlations have been calculated (instead of Pearson correlations). We
interpret the correlation coefficients against the backdrop of the guidelines proposed by
Cohen (1988) and Kraemer, Morgan, et al. (2003): small effect = 0.1, medium effect = 0.3,
large effect = 0.5, and very large effect = 0.7.
2. We performed an exploratory factor analysis (FA) to analyze the variables. FA is a
statistical method for data reduction (Gaskin & Happell, 2014); it is an exploratory
technique to identify latent dimensions in the data and to investigate how the variables are
related to the dimensions (Baldwin, 2019). We expected three dimensions, because we
have variables with citation impact scores, reviewer scores, and indicator scores
measuring disruption. As the (logarithmized) variables do not perfectly follow the normal
distribution, we performed the FA using the robust covariance matrix following
Verardi and McCathie (2012). Thus, the results of the FA are based not on the variables
but on a covariance matrix. The robust covariance matrix has been transformed into a
correlation matrix (StataCorp, 2017), which has been analyzed by the principal
component factor method (the communalities are assumed to be 1). We interpreted the
factor loadings for the orthogonal varimax rotation; the factor loadings have been adjusted
“by dividing each of them by the communality of the correspondence variable. This
adjustment is known as the Kaiser normalization” (Afifi, May, & Clark, 2012, p. 392).
In the interpretation of the results, we focused on factor loadings with values greater
than 0.5.
3. We investigated the relationship between the dimensions (identified in the FA) and the
F1000Prime tags (as proxies for newness or not). We expected a close relationship
between the dimension reflecting disruption and the tags reflecting newness. The tags are
count variables including the sum of the tag assignments from F1000Prime FMs for
single papers.
2 We additionally performed the statistical analyses with scores that are not logarithmized and received very
similar results.
Table 3. Example papers with maximum scores

Variable      Paper (authors, publication year, and title)
DI1           Jemal et al. (2010). Cancer statistics, 2010
DI5           Mohan and Shellard (2014). Providing family planning services to remote communities in areas of high biodiversity through a Population-Health-Environment programme in Madagascar
DI1^nok       Kourtis, Nikoletopoulou, and Tavernarakis (2012). Small heat-shock proteins protect from heat-stroke-associated neurodegeneration
DI5^nok       Frank (2009). The common patterns of nature
DeIn          Kincaid, Murata, et al. (2016). Specialized proteasome subunits have an essential role in the thymic selection of CD8+ T cells
ReSc.sum      Lolle, Victor, et al. (2005). Genome-wide non-Mendelian inheritance of extra-genomic information in Arabidopsis
ReSc.avg      McEniery, Yasmin, et al. (2005). Normal vascular aging: Differential effects on wave reflection and aortic pulse wave velocity: The Anglo-Cardiff Collaborative Trial (ACCT)
Citations     Hanahan and Weinberg (2011). Hallmarks of cancer: The next generation
Percentiles   Siegel, Naishadham, and Jemal (2013). Cancer statistics, 2013
To calculate the relationship between dimensions and tags, we performed
robust Poisson regressions (Hilbe, 2014; Long & Freese, 2014). The Poisson
model is recommended for cases with count data as the dependent variable.
Robust methods are recommended when the distributional assumptions for the model
are not completely met (Hilbe, 2014). Because we are interested in identifying
indicators for measuring disruption that might perform better than the other variants, we tested
the correlation between each variant and the tag assignments using several robust
Poisson regressions. Citations, disruptiveness, and tag assignments are dependent on
time (Bornmann & Tekles, 2019). Therefore, we included the number of years between
2018 and the publication year as exposure time in the models (Long & Freese, 2014,
pp. 504–506); a sketch of one such model is given below.
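The following is a minimal sketch of one such model using statsmodels; the data frame and column names are hypothetical placeholders, not the study’s actual variables.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical example data: counts of one tag (e.g., "new finding"),
# the three FA dimension scores, and the publication year.
df = pd.DataFrame({
    "new_finding": [0, 2, 1, 0, 3],
    "disruption":  [0.1, 1.2, -0.3, 0.0, 2.1],
    "citations":   [-0.5, 0.8, 0.2, -1.1, 1.4],
    "reviewers":   [0.3, -0.2, 1.0, 0.1, 0.6],
    "pub_year":    [2005, 2010, 2012, 2008, 2015],
})

X = sm.add_constant(df[["disruption", "citations", "reviewers"]])
exposure = 2018 - df["pub_year"]   # years at risk of being tagged and cited

model = sm.GLM(df["new_finding"], X,
               family=sm.families.Poisson(), exposure=exposure)
result = model.fit(cov_type="HC1")  # robust (sandwich) standard errors
print(result.summary())
```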
4. Results
4.1. Correlations Between Citation Impact Scores, Reviewer Scores, and Variants
Measuring Disruption
Figure 2 shows the matrix including the coefficients of the correlations between reviewer
scores, citation impact indicators, and variants measuring disruption. DI1 is correlated at a
medium level with the other indicators measuring disruption, whereby these indicators corre-
late among themselves at a very high level. Very high positive correlations are visible between
citations and percentiles and between the average and sum of reviewer scores.
The correlation between DI1 and citation impact (citations and percentiles) is at least at the
medium level, but it is negative (r = −.46, r = −.37). Thus, the original DI1 seems to measure
another dimension than citation impact. This result is in agreement with results reported by
Wu et al. (2019, Figure 2a). However, for the other indicators measuring disruption, the
correlations with citation impact shrink to small positive coefficients (negative in the case of DeIn).
Figure 2. Spearman rank correlations based on logarithmized variables [log(y + 1)]. The following
abbreviations are used: different indicators measuring disruption (DI1, DI5, DI1^nok, DI5^nok, DeIn),
the sum (ReSc.sum), and the average (ReSc.avg) of reviewer scores.
4.2. Factor Analysis to Identify Latent Dimensions
We calculated an FA including reviewer scores, citation impact indicators, and variants
measuring disruption to investigate the underlying dimensions (latent variables). Most of the
results shown in Table 4 agree with expectations: We found three dimensions, which we
labeled as disruption (Factor 1), citations (Factor 2), and reviewers (Factor 3). However, contrary
to what was expected, DI1 loads negatively on the citation dimension, revealing that (a) high
DI1 scores are related to low citation impact scores (see above) and (b) all other indicators
measuring disruption are independent of DI1. Thus, the other indicators (or at least one of them)
seem to be promising developments compared to the originally proposed indicator DI1.
4.3. Relationship Between Tag Mentions and FA Dimensions
Using Poisson regression models including the tags, we calculated correlations between the
tags and the three FA dimensions (disruption, citations, and reviewers). We are especially in-
terested in the correlation between the tags (measuring newness of research or not) and the
disruption dimension from the FA. We also included the citation impact and reviewer dimen-
sions in the analyses to see the corresponding results for comparison. In the analyses, we
considered the FA scores for the three dimensions (predicted values from the FA, which are
uncorrelated by definition) as independent variables and the various tags (the sum of the tag
assignments) as dependent variables in nine Poisson regressions (one regression model for each tag).
The results of the regression analyses are shown in Table 5. We do not focus on the
statistical significance of the results, because they are more or less meaningless against the
backdrop of the high case numbers.
Table 4. Rotated factor loadings from a factor analysis using logarithmized variables [log(y + 1)]

Variable      Factor 1    Factor 2    Factor 3    Uniqueness
DI1             0.24       −0.69        0.05         0.46
DI5             0.90       −0.07        0.00         0.19
DI1^nok         0.90       −0.10        0.02         0.17
DI5^nok         0.97       −0.03        0.01         0.05
DeIn           −0.91       −0.01        0.01         0.17
Citations       0.05        0.91        0.04         0.16
Percentiles     0.04        0.84        0.12         0.29
ReSc.sum        0.00        0.05        1.00         0.00
ReSc.avg        0.00        0.05        1.00         0.00

Notes: Three eigenvalues > 1. The following abbreviations are used: different indicators measuring disruption
(DI1, DI5, DI1^nok, DI5^nok, and DeIn), the sum (ReSc.sum), and the average (ReSc.avg) of reviewer scores.
The most important information in the table is the signs of the coefficients and the
percentage change coefficients. The percentage change coefficients are counterparts to odds
ratios in regression models; they measure the percentage change in the dependent variable if
the independent variable (FA score) increases by one standard deviation (Deschacht & Engels,
2014; Hilbe, 2014; Long & Freese, 2014). The percentage change coefficient for the model
based on the “technical advance” tag and the disruption dimension can be interpreted as
follows: For a standard deviation increase in the scores for disruption, a paper’s expected
number of technical advance tags increases by 10.93%, holding the other variables in the
regression analysis constant. This increase is as expected and substantial. However, the
results for the other tags expressing newness have negative signs and are against expectations.
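Formally, following Long and Freese (2014), the percentage change in the expected count for an increase of one standard deviation s_x in a predictor x with coefficient β is

$$\%\Delta = 100\,\left[\exp(\beta \cdot s_x) - 1\right].$$

For the technical advance model, the tabulated 10.93% thus corresponds to β · s_x = ln(1.1093) ≈ 0.104.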
The percentage change coefficients for the citation dimension are significantly higher than
for the disruption dimension (especially for the new finding tag) and positive. This result is
against our expectations, because the disruption variants should measure newness in a better
way than citations. 然而, one should consider in the interpretation of the results that DI1
correlates negatively with the citation indicators. 因此, the dimension also measures disrup-
主动性 (as originally proposed), whatever the case may be. If we interpret the results for
the dimension against this backdrop, at least the results for the tags not representing newness
seem to accord with our expectations. The results for the reviewer dimension are similar to the
citations dimension results. The consistent positive coefficients for the citations and reviewers
dimensions in Table 5 might result from the fact that the tags are from the same FMs as the
recommendations, and the FMs probably use citations to find relevant papers for reading,
assessing, and including in the F1000Prime database.
Table 6 reports the results from some additional regression analyses. Because we are inter-
ested in not only correlations between dimensions (reflecting disruptiveness) and tags (the sum
of the tag assignments) but also in correlations between the various variants measuring
disruption and tags, we calculated 45 additional regression models. We are interested in
the question of which variant measuring disruption reflects newness better than other variants:
Are the different variants differently or similarly related to newness, as expressed by the tags?
Table 6 only shows percentage change coefficients (see above) from the regression models
(because of the great number of models).
Table 5. Results of nine Poisson regression analyses (n = 120,179 papers). The models have been adjusted for exposure time (different publication years): How long was the time that the papers have been at risk of being tagged and cited (number of years between publication and counting of citations or tags, respectively)?

Tag                       Disruption                   Citations                    Reviewers                    Constant
                          Coeff.           % change    Coeff.           % change    Coeff.           % change
Tags expressing newness (expecting positive signs)
Hypothesis                −.06*** (−10.22)   −6.11     .20*** (36.91)     28.62     .32*** (33.24)     41.28     −3.96*** (−36.32)
New finding               −.09*** (−41.77)   −9.92     .23*** (111.06)    34.02     .27*** (76.41)     33.48     −3.35*** (−83.59)
Novel drug target         −.01 (−1.45)       −1.66     .27*** (27.33)     39.91     .34*** (22.96)     43.93     −5.87*** (−34.41)
Technical advance          .09*** (13.67)    10.93     .22*** (32.56)     32.22     .25*** (24.34)     30.74     −4.58*** (−39.50)
Tags not expressing newness (expecting negative signs)
Confirmation              −.03*** (−6.41)    −3.85     .14*** (28.93)     19.69     .13*** (21.86)     14.30     −4.56*** (−68.84)
Good for teaching         −.06*** (−6.73)    −7.08     .14*** (14.71)     19.00     .38*** (19.75)     49.61     −4.06*** (−21.12)
Negative/Null results     −.14** (−3.25)    −14.72     .17*** (5.09)      23.23     .34*** (10.02)     44.25     −7.86*** (−19.94)
Refutation                −.19*** (−8.14)   −19.00     .23*** (12.05)     32.88     .28*** (12.64)     34.47     −7.71*** (−28.50)
Tag without expectations
Controversial             −.04*** (−4.54)    −4.78     .20*** (24.85)     28.11     .27*** (27.92)     33.54     −5.30*** (−47.46)

Notes: t statistics in parentheses; * p < .05, ** p < .01, *** p < .001. Percentage change coefficients in bold are as expected; otherwise the results are not as expected (or, for the tag “Controversial,” there are no clear expectations).
Table 6. Results (percentage change coefficients) of 45 Poisson regressions with tags as dependent variables and different variants measuring
disruption as independent variables each. The models have been adjusted for exposure time (different publication years): How long have the
papers been at risk of being tagged and cited?
Tag                        DI1       DI5      DI1^nok    DI5^nok     DeIn
Tags expressing newness (expecting positive signs)
Hypothesis                 4.32      3.01      4.66       2.75       0.25
New finding               −2.71     −0.62     −2.13      −2.13       1.92
Novel drug target          6.89      6.74     14.85      15.19      −7.91
Technical advance          6.72     20.65     18.34      19.80     −18.11
Tags not expressing newness (expecting negative signs)
Confirmation              −5.08     −0.41      0.08       1.45      −1.88
Good for teaching         12.24      0.37      8.82       3.73       6.41
Negative/Null results      3.45    −11.46    −13.35      −5.20      −2.10
Refutation                −9.28     −9.59    −14.83     −10.37       8.06
Tag without expectations
Controversial             −0.33      4.42      1.46       1.46      −3.62

Note: The following abbreviations are used: different indicators measuring disruption (DI1, DI5, DI1^nok, DI5^nok, DeIn). Percentage change coefficients in bold are as expected; otherwise the results are not as expected (or, for the tag “Controversial,” there are no clear expectations).
In other words, percentage changes in expected
counts (of the tag) for a standard deviation increase in the variant measuring disruption are
listed. For example, a standard deviation change in DI1 on average increases a paper’s expected
number of technical advance tags by 6.72%. This result agrees with expectation, because the
technical advance tag reflects newness. It seems that DI5 best reflects the assessments by FMs;
the lowest number of results in agreement is visible for DeIn.
5. DISCUSSION
For many years, scientometrics research has focused on improving the field normalization of
citation counts or developing improved variants of the h-index. However, this research is
rooted in a relatively one-dimensional way of measuring impact. With the introduction of the
new family of disruption indicators, the one-dimensional method of impact measurement may
now give way to multidimensional approaches. Disruption indicators consider not only times-
cited information but also cited references data (of FPs and citing papers). High indicator
values should be able to point to published research disrupting traditional research lines.
Disruptive papers catch the attention of citing authors (at the expense of the attention devoted
to previous research); disruptive research enters unknown territory, which is scarcely consis-
tent with known territories handled in previous papers (and cited by disruptive papers). Thus,
the citing authors exclusively focus on the disruptive papers (by citing them) and do not ref-
erence previous papers cited in the disruptive papers.
Starting from the basic approach of comparing cited references of citing papers with cited
references of FPs, different variants of measuring disruptiveness have been proposed recently.
An overview of many possible variants can be found in Wu and Yan (2019). In this study, we
included some variants that sounded reasonable and/or followed different approaches. For ex-
ample, DeIn proposed by Bu et al. (2019) is based on the number of citation links from an FP’s
citing papers to the FP’s cited references (without considering Nk). We were interested in the
convergent validity of these new indicators following the basic analyses approach by
Bornmann et al. (2019). The convergent validity can be tested by using an external criterion
measuring a similar dimension. Although we did not have an external criterion at hand mea-
suring disruptiveness specifically, we used tags from the F1000Prime database reflecting new-
ness. FMs assess papers using certain tags and some tags reflect newness. We assumed that
disruptive research is assessed as new. Based on the F1000Prime data, we investigated whether
the tags assigned by the FMs to the papers correspond with indicator values measuring
disruptiveness.
In the first step of the statistical analyses, we calculated an FA to inspect whether the
various indicators measuring disruptiveness load on a single “disruptiveness” dimension. As
the results reveal, this was partly the case: All variants of DI1 (the original disruption index
proposed by Wu et al., 2019) loaded on one dimension, the “disruptiveness” dimension.
The original disruption index itself, however, loaded (negatively) on a dimension that reflects
citation impact. These results might be interpreted as follows: The proposed disruption index
variants measure the same construct, which might be interpreted as “disruptiveness.” DI1 is
related to citation impact, whereby negative values (the developmental index manifestation
of this indicator; see Section 2) correspond to high citation impact levels. As all variants of DI1
loaded on the same factor in the FA, the results do not show which variant should be preferred
(if any). Thus, we considered a second step of analyses in this study.
In this step, we tested the correlation between each variant (including the original) and the
external “newness” criterion. The results showed that DI5 best reflects the FMs’ assessments
(corresponding with our expectations more frequently than the other indicators); the lowest
number of results that demonstrated an agreement between tag and indicator scores is visible
for DeIn. The difference between the variants is not very large; however, the results can be
used to guide the choice of a variant if disruptiveness is measured in a scientometric study.
Although the authors of the paper introducing DI1 (Wu et al., 2019) performed analyses to
validate the index (e.g., by calculating the indicator for Nobel-Prize-winning papers), they
did not report on evaluating possible variants of the original, which might perform better.
We noted that while a single publication was the most highly disruptive for DI1 (0.6774)
and DI1^nok (0.9747), 703 and 3,816 publications, respectively, scored the maximum
disruptiveness value of 1.0 for the variants DI5 and DI5^nok. We also reviewed examples of the most
highly disruptive publications as measured by all four variants and observed that instances of
an annual Cancer Statistics report published by the American Cancer Society received
maximal disruptiveness scores for all four variants, presumably because this report is highly cited in
each year of its publication without its references being cited. A publication from the journal
Global Environmental Change (https://doi.org/10.1016/j.gloenvcha.2008.10.009) was also
noteworthy and may reflect the focus on climate change.
All empirical research has limitations. This study is no exception. We assumed in this study
that novelty is necessarily a (or the) defining feature of disruptiveness. There are plenty of ex-
isting bibliometric measures of novelty (e.g., new keyword or cited references combinations;
see Bornmann et al., 2019). Although novelty may be necessary for disruptiveness, it is not
necessarily sufficient to make something disruptive. We cannot completely exclude the pos-
sibility that many nondisruptive discoveries are novel. “Normal science” (Kuhn, 1962)
discoveries do not necessarily lack novelty; they make novel contributions (e.g., hypotheses,
findings, or technical advances) in a way that builds on and enhances prior work (e.g., within
the paradigm). A second limitation of the study might be that the F1000Prime data are affected
by possible biases. We know from the extensive literature on (journal and grant) peer review
processes that indications for possible biases in the assessments by peers exist (Bornmann,
2011). The third limitation refers to the F1000Prime tags that we used in this study. None of
them directly captures disruptiveness (as we acknowledge above). Most of the tags capture
novelty, which is related, but conceptually different from disruptiveness, and for which there
are already existing metrics (see Section 2). Because disruption indicators propose measuring
disruption (and not novelty), we cannot directly make claims about whether disruption indicators
measure what they propose to measure.
It would be interesting to follow up in future studies that use mixed-methods approaches to
evaluate the properties of Ni, Nj, and Nk variants more systematically against additional gold
standard data sets. The F1000 data set is certain to feature its own bias (e.g., it is restricted to
biomedicine and includes disproportionately many high-impact papers) and the variants we
describe may exhibit different properties when evaluated against multiple data sets.
ACKNOWLEDGMENTS
We would like to thank Tom Des Forges and Ros Dignon from F1000 for providing us with the
F1000Prime data set. We thank two anonymous reviewers for very helpful suggestions to
improve a previous version of the paper.
AUTHOR CONTRIBUTIONS
Lutz Bornmann: Conceptualization, Data curation, Formal analysis, Investigation,
Methodology, Writing—original draft. Sitaram Devarakonda: Data curation, Formal analysis,
Investigation, Methodology, Writing—review & editing. Alexander Tekles: Conceptualization,
Formal analysis, Investigation, Writing—review & editing. George Chacko: Conceptualization,
Formal analysis, Investigation, Methodology, Writing—review & editing.
COMPETING INTERESTS
The authors do not have any competing interests. Citation data used in this paper relied on
Scopus (Elsevier Inc.) as implemented in the ERNIE project (Korobskiy et al., 2019), which is a
collaboration between NET ESolutions Corporation and Elsevier Inc. The content of this pub-
lication is solely the responsibility of the authors and does not necessarily represent the official
views of the National Institutes of Health, NET ESolutions Corporation, or Elsevier Inc.
FUNDING INFORMATION
Research and development reported in this publication was partially supported by funds from
the National Institute on Drug Abuse, National Institutes of Health, US Department of Health
and Human Services, under Contract No HHSN271201800040C (N44DA-18-1216).
DATA AVAILABILITY
Access to the Scopus bibliographic data requires a license from Elsevier; we cannot therefore
make the data used in this study publicly available.
REFERENCES
Afifi, A., May, S., & Clark, V. A. (2012). Practical multivariate anal-
ysis (5th ed.). Boca Raton, FL: CRC Press.
Anon. (2005). Revolutionizing peer review? Nature Neuroscience,
8(4), 397–397.
Azoulay, P. (2019). Small research teams “disrupt” science more
radically than large ones. Nature, 566, 330–332.
Baldwin, S. (2019). Psychological statistics and psychometrics using
Stata. College Station, TX: Stata Press.
Bornmann, L. (2011). Scientific peer review. Annual Review of
Information Science and Technology, 45, 199–245.
Bornmann, L. (2015). Inter-rater reliability and convergent validity
of F1000Prime peer review. Journal of the Association for
Information Science and Technology, 66(12), 2415–2426.
Bornmann, L., & Daniel, H.-D. (2008). What do citation counts
measure? A review of studies on citing behavior. Journal of
Documentation, 64(1), 45–80. https://doi.org/10.1108/
00220410810844150
Bornmann, L., & Leydesdorff, L. (2013). The validation of (ad-
vanced) bibliometric indicators through peer assessments: A
comparative study using data from InCites and F1000. Journal
of Informetrics, 7(2), 286–291. https://doi.org/10.1016/j.joi.
2012.12.003
Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of
percentiles and percentile rank classes in the analysis of biblio-
metric data: Opportunities and limits. Journal of Informetrics, 7(1),
158–165. https://doi.org/10.1016/j.joi.2012.10.001
Bornmann, L., & Tekles, A. (2019). Disruption index depends on
length of citation window. El profesional de la información, 28(2),
e280207. https://doi.org/10.3145/epi.2019.mar.07
Bornmann, L., Tekles, A., Zhang, H. H., & Ye, F. Y. (2019). Do we
measure novelty when we analyze unusual combinations of
cited references? A validation study of bibliometric novelty indi-
cators based on F1000Prime data. Journal of Informetrics, 13(4),
100979.
Boudreau, K. J., Guinan, E. C., Lakhani, K. R., & Riedl, C. (2016).
Looking across and looking beyond the knowledge frontier:
Intellectual distance, novelty, and resource allocation in science.
Management Science, 62(10), 2765–2783. https://doi.org/10.1287/
mnsc.2015.2285
Boyack, K., & Klavans, R. (2014). Atypical combinations are con-
founded by disciplinary effects. In E. Noyons (Ed.), Proceedings
of the Science and Technology Indicators Conference 2014
Leiden: “Context Counts: Pathways to Master Big and Little
Data” (pp. 64–70). Leiden: Universiteit Leiden.
Bradley, J., Devarakonda, S., Davey, A., Korobskiy, D., Liu, S.,
Lakhdar-Hamina, D., … Chacko, G. (2020). Co-citations in
context: Disciplinary heterogeneity is relevant. Quantitative
Science Studies, 1(1), 264–276. https://doi.org/10.1162/
qss_a_00007
Bu, Y., Waltman, L., & Huang, Y. (2019). A multidimensional perspective on the citation impact of scientific publications. Retrieved February 6, 2019, from https://arxiv.org/abs/1901.09663
Carayol, N., Lahatte, A., & Llopis, O. (2017). Novelty in science. In Proceedings of the Science, Technology, & Innovation Indicators Conference "Open indicators: Innovation, participation and actor-based STI indicators." Paris, France.
Casadevall, A., & Fang, F. C. (2016). Revolutionary science. mBio,
7(2), e00158. https://doi.org/10.1128/mBio.00158-16
Cohen, J. (1988). Statistical power analysis for the behavioral sci-
ences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Deschacht, N., & Engels, T. E. (2014). Limited dependent variable
models and probabilistic prediction in informetrics. In Y. Ding, R.
Rousseau & D. Wolfram (Eds.), Measuring scholarly impact
(pp. 193–214). Berlin: Springer.
Du, J., Tang, X., & Wu, Y. (2016). The effects of research level and
article type on the differences between citation metrics and
F1000 recommendations. Journal of the Association for
Information Science and Technology, 67(12), 3008–3021.
Estes, Z., & Ward, T. B. (2002). The emergence of novel attributes in
concept modification. Creativity Research Journal, 14(2), 149–156.
https://doi.org/10.1207/S15326934crj1402_2
Foster, J. G., Rzhetsky, A., & Evans, J. A. (2015). Tradition and
innovation in scientists’ research strategies. American Sociolog-
ical Review, 80(5), 875–908. https://doi.org/10.1177/
0003122415601618
Frank, S. A. (2009). The common patterns of nature. Journal of
Evolutionary Biology, 22(8), 1563–1585. https://doi.org/10.1111/
j.1420-9101.2009.01775.x
Funk, R. J., & Owen-Smith, J. (2017). A dynamic network measure
of technological change. Management Science, 63(3), 791–817.
https://doi.org/10.1287/mnsc.2015.2366
Gaskin, C. J., & Happell, B. (2014). On exploratory factor analysis:
A review of recent evidence, an assessment of current practice,
and recommendations for future use. International Journal of
Nursing Studies, 51(3), 511–521. https://doi.org/10.1016/j.
ijnurstu.2013.10.005
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of cancer: The next generation. Cell, 144(5), 646–674. https://doi.org/10.1016/j.cell.2011.02.013
Hazen, A. (1914). Storage to be provided in impounding reservoirs for municipal water supply. Transactions of the American Society of Civil Engineers, 77, 1539–1640.
Hemlin, S., Allwood, C. M., Martin, B., & Mumford, M. D. (2013).
Introduction: Why is leadership important for creativity in science,
technology, and innovation. In S. Hemlin, C. M. Allwood, B.
Martin & M. D. Mumford (Eds.), Creativity and leadership in
science, technology, and innovation (pp. 1–26). New York, NY:
Taylor & Francis.
Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I.
(2015). Bibliometrics: The Leiden Manifesto for research metrics.
Nature, 520(7548), 429–431.
Hilbe, J. M. (2014). Modelling count data. New York, NY:
Cambridge University Press.
Jemal, A., Siegel, R., Xu, J., & Ward, E. (2010). Cancer statistics,
2010. CA: A Cancer Journal for Clinicians, 60(5), 277–300.
https://doi.org/10.3322/caac.20073
Jennings, C. G. (2006). Quality and value: The true purpose of peer review. What you can't measure, you can't manage: The need for quantitative indicators in peer review. Retrieved July 6, 2006, from http://www.nature.com/nature/peerreview/debate/nature05032.html
Kincaid, E. Z., Murata, S., Tanaka, K., & Rock, K. L. (2016).
Specialized proteasome subunits have an essential role in the
thymic selection of CD8+ T cells. Nature Immunology, 17(8),
938–945. https://doi.org/10.1038/ni.3480
Korobskiy, D., Davey, A., Liu, S., Devarakonda, S., & Chacko, G. (2019). Enhanced Research Network Informatics (ERNIE) (GitHub repository). Retrieved November 11, 2019, from https://github.com/NETESOLUTIONS/ERNIE
Kourtis, N., Nikoletopoulou, V., & Tavernarakis, N. (2012). Small
heat-shock proteins protect from heat-stroke-associated
neurodegeneration. Nature, 490(7419), 213–218. https://doi.org/
10.1038/nature11417
Kraemer, H. C., Morgan, G. A., Leech, N. L., Gliner, J. A., Vaske, J. J.,
& Harmon, R. J. (2003). Measures of clinical significance.
Journal of the American Academy of Child and Adolescent
Psychiatry, 42(12), 1524–1529. https://doi.org/10.1097/01.
chi.0000091507.46853.d1
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago,
IL: University of Chicago Press.
Kuukkanen, J.-M. (2007). Kuhn, the correspondence theory of truth and coherentist epistemology. Studies in History and Philosophy of Science Part A, 38(3), 555–566.
Lee, Y.-N., Walsh, J. P., & Wang, J. (2015). Creativity in scientific
teams: Unpacking novelty and impact. Research Policy, 44(3),
684–697. https://doi.org/10.1016/j.respol.2014.10.007
Lolle, S. J., Victor, J. L., Young, J. M., & Pruitt, R. E. (2005).
Genome-wide non-Mendelian inheritance of extra-genomic in-
formation in Arabidopsis. Nature, 434(7032), 505–509. https://
doi.org/10.1038/nature03380
Long, J. S., & Freese, J. (2014). Regression models for categorical
dependent variables using Stata (3rd ed.). College Station, TX:
Stata Press.
Mairesse, J., & Pezzoni, M. (2018). Novelty in science: The impact of
French physicists’ novel articles. In P. Wouters (Ed.), Proceedings
of the Science and Technology Indicators Conference 2018
Leiden, “Science, technology and innovation indicators in transi-
tion” (pp. 212–220). Leiden: University of Leiden.
Marx, W., & Bornmann, L. (2016). Change of perspective: Bibliometrics from the point of view of cited references – a literature overview on approaches to the evaluation of cited references in bibliometrics. Scientometrics, 109(2), 1397–1415. https://doi.org/10.1007/s11192-016-2111-2
McEniery, C. M., Yasmin, Hall, I. R., Qasem, A., Wilkinson, I. B., & Cockcroft, J. R. (2005). Normal vascular aging: Differential effects on wave reflection and aortic pulse wave velocity: The Anglo-Cardiff Collaborative Trial (ACCT). Journal of the American College of Cardiology, 46(9), 1753–1760. https://doi.org/10.1016/j.jacc.2005.07.037
Mohammadi, E., & Thelwall, M. (2013). Assessing non-standard ar-
ticle impact using F1000 labels. Scientometrics, 97(2), 383–395.
https://doi.org/10.1007/s11192-013-0993-9
Mohan, V., & Shellard, T. (2014). Providing family planning ser-
vices to remote communities in areas of high biodiversity
through a population-health-environment programme in
Madagascar. Reproductive Health Matters, 22(43), 93–103.
https://doi.org/10.1016/S0968-8080(14)43766-2
Petrovich, E. (2018). Accumulation of knowledge in para-scientific
areas: The case of analytic philosophy. Scientometrics, 116(2),
1123–1151. https://doi.org/10.1007/s11192-018-2796-5
Puccio, G. J., Mance, M., & Zacko-Smith, J. (2013). Creative lead-
ership. Its meaning and value for science, technology, and inno-
vation. In S. Hemlin, C. M. Allwood, B. Martin & M. D. Mumford
(Eds.), Creativity and leadership in science, technology, and inno-
vation (pp. 287–315). New York, NY: Taylor & Francis.
Rowlands, I. (2018). What are we measuring? Refocusing on some
fundamentals in the age of desktop bibliometrics. FEMS
Microbiology Letters, 365(8). https://doi.org/10.1093/femsle/
fny059
Siegel, R., Naishadham, D., & Jemal, A. (2013). Cancer statistics,
2013. CA: A Cancer Journal for Clinicians, 63(1), 11–30. https://
doi.org/10.3322/caac.21166
StataCorp. (2017). Stata statistical software: Release 15. College Station, TX: Stata Corporation.
Stephan, P., Veugelers, R., & Wang, J. (2017). Blinkered by biblio-
metrics. Nature, 544(7651), 411–412.
Tahamtan, I., & Bornmann, L. (2018). Creativity in science and
the link to cited references: Is the creative potential of papers
reflected in their cited references? Journal of Informetrics, 12(3),
906–930. https://doi.org/10.1016/j.joi.2018.07.005
Tahamtan, I., & Bornmann, L. (2019). What do citation counts mea-
sure? An updated review of studies on citations in scientific
documents published between 2006 and 2018. Scientometrics,
121(3), 1635–1684. https://doi.org/10.1007/s11192-019-03243-4
Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical
combinations and scientific impact. Science, 342(6157), 468–472.
https://doi.org/10.1126/science.1240474
Verardi, V., & McCathie, A. (2012). The S-estimator of multivariate
location and scatter in Stata. Stata Journal, 12(2), 299–307.
Wagner, C. S., Whetsell, T. A., & Mukherjee, S. (2019). International
research collaboration: Novelty, conventionality, and atypicality
in knowledge recombination. Research Policy, 48(5), 1260–1270.
https://doi.org/10.1016/j.respol.2019.01.002
Waltman, L., & Costas, R. (2014). F1000 recommendations as a
new data source for research evaluation: A comparison with
citations. Journal of the Association for Information Science and
Technology, 65(3), 433–445.
Wang, J., Lee, Y.-N., & Walsh, J. P. (2018). Funding model and cre-
ativity in science: Competitive versus block funding and status
contingency effects. Research Policy, 47(6), 1070–1083. https://
doi.org/10.1016/j.respol.2018.03.014
Wang, J., Veugelers, R., & Stephan, P. E. (2017). Bias against nov-
elty in science: A cautionary tale for users of bibliometric indica-
tors. Research Policy, 46(8), 1416–1436.
Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and
small teams disrupt science and technology. Nature, 566, 378–382.
https://doi.org/10.1038/s41586-019-0941-9
Wu, Q., & Yan, Z. (2019). Solo citations, duet citations, and prelude citations: New measures of the disruption of academic papers. Retrieved May 15, 2019, from https://arxiv.org/abs/1905.03461
Wu, S., & Wu, Q. (2019). A confusing definition of disruption. Retrieved May 15, 2019, from https://osf.io/preprints/socarxiv/d3wpk/