RESEARCH ARTICLE

Quantitative science studies should be framed
with middle-range theories and concepts
from the social sciences

an open access journal

1Institute of Sociology, University of Wuppertal, Germany
2Interdisciplinary Center for Science and Technology Studies (IZWT), University of Wuppertal, Germany

Thomas Heinze1

and Arlette Jappe2

Citation: Heinze, T., & Jappe, A. (2020).
Quantitative science studies should be
framed with middle-range theories and
concepts from the social sciences.
Quantitative Science Studies, 1(3),
983–992. https://doi.org/10.1162/
qss_a_00059

DOI:
https://doi.org/10.1162/qss_a_00059

Corresponding Author:
Thomas Heinze
theinze@uni-wuppertal.de

Handling Editors:
Loet Leydesdorff, Ismael Rafols,
and Staša Milojević

Keywords: bibliometrics, Italy, middle-range theory, Netherlands, professions, research assessment

ABSTRACT

This paper argues that quantitative science studies should frame their data and analyses
with middle-range sociological theories and concepts. We illustrate this argument
with reference to the “sociology of professions,” a middle-range theoretical framework
developed by Chicago sociologist Andrew Abbott. Using this framework, we counter
the claim that the use of bibliometric indicators in research assessment is pervasive in
all advanced economies. Rather, our comparison between the Netherlands and Italy
reveals major differences in the national design of bibliometric research assessment:
The Netherlands follows a model of bibliometric professionalism, whereas Italy follows
a centralized bureaucratic model that co-opts academic elites. We conclude that
applying the sociology of professions framework to a broader set of countries would be
worthwhile, allowing the emerging bibliometric profession to be charted in a comprehensive,
and preferably quantitative, fashion. We also briefly discuss other sociological middle-range
concepts that could potentially guide empirical analyses in quantitative science studies.

1. INTRODUCTION

The argument of this paper is that quantitative science studies should more frequently frame
their data and analyses with middle-range sociological theories and concepts (Merton, 1968)
in order to advance our understanding of institutional configurations of national research sys-
tems and their changes. We illustrate this argument with reference to the theoretical frame-
work “sociology of professions” (Abbott, 1988, 1991), which we apply to a comparison of
research evaluation frameworks in the Netherlands and Italy, two countries that contrast in
how knowledge for research assessment is both produced and used.

We argue that, as a new and emerging profession, evaluative bibliometrics has successfully
established a subordinate jurisdiction in the Netherlands, but that similar professionalization
cannot be observed in Italy. Our comparison of these two countries suggests that the institu-
tionalization of bibliometrics via expert organizations in the context of decentralized decision
making among universities, such as in the Netherlands, generates trust and learning, contrib-
uting to increased scientific performance (so-called improvement use of research evaluation,
cf. Molas-Gallart, 2012). In contrast, if research assessments are institutionalized via centralized
government bureaucracies with little or no involvement of bibliometric expertise, such as
in Italy, they generate little trust and learning, and tend to become an instrument of administrative
control (so-called controlling use of research evaluation, cf. Molas-Gallart, 2012).

Copyright: © 2020 Thomas Heinze and
Arlette Jappe. Published under a
Creative Commons Attribution 4.0
International (CC BY 4.0) license.

The MIT Press

Our starting point is the reportedly increased use of bibliometric indicators in research
assessment (Hicks, Wouters, et al., 2015; Wilsdon et al., 2015). Such metrics are commonly
based on both publication and citation data extracted from large multidisciplinary citation da-
tabases, most importantly the Web of Science ( WoS) and Scopus. The simplest and most com-
mon metrics include the Journal Impact Factor and the Hirsch Index (Moed, 2017; Todeschini
& Baccini, 2016). An influential narrative as to why such metrics have proliferated is increased
accountability pressures in the governance of public services, including research and higher
教育, reflecting a global trend towards an audit society in which public activities come
under ever-increasing scrutiny (e.g., “governance by numbers” or the “metric tide”) (Espeland &
Stevens, 1998; Miller, 2001; Porter, 1995; Power, 1997; Rottenburg, Merry, et al., 2015). With
regard to research assessments, these metrics have attracted considerable criticism, not only in
methodological terms (Adler, Ewing, & Taylor, 2009; Cagan, 2013) but also with respect to
potential negative side effects, such as the suppression of interdisciplinary research or the
suboptimal allocation of research funding (van Eck, Waltman, et al., 2013; Wilsdon et al., 2015).
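The two metrics named above are simple enough to state precisely. As an illustration, the Hirsch Index is the largest h such that an author has h papers with at least h citations each; a minimal sketch (citation counts invented for illustration):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for one author's seven papers
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```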

Some believe that research assessment metrics are used pervasively in all advanced
economies, regardless of the national institutional context in which the research is carried out. The
contestation of such metrics among scientific stakeholders is said to be indicative of their
pervasiveness. However, as we show here, such a view receives little support in light of empirical
evidence about how national research systems have institutionalized the professional use of
such metrics. Our country comparison clearly shows differences in the design of the Dutch
and Italian research evaluation frameworks, and a sociology of professions framework contrib-
utes to analyzing these differences (Abbott, 1988, 1991). The Netherlands and Italy differ con-
siderably in their institutional setup, providing very different contexts for the professional
activities of bibliometric experts.

First, we conclude that it would be worthwhile to apply the middle-range sociology of
professions framework to a broader set of countries. In doing so, the strength of the emerging
bibliometric profession could be charted in a comprehensive, and preferably quantitative-
descriptive, manner. Second, we point out that insights into the emerging bibliometric profession
should be combined with other important institutional factors, such as organizational auton-
omy in higher education systems. We suggest that cross-national performance comparisons
should make a greater effort to include such theoretically framed explanatory variables in
multivariate models. Third, we argue that many other sociological middle-range theories and
concepts have potential for guiding empirical analyses in quantitative science studies. We briefly
discuss one such concept, Hollingsworth’s (2004, 2006, pp. 425–426) “weakly versus strongly
regulated institutional environments.”

2. THE SOCIOLOGY OF PROFESSIONS FRAMEWORK

The work of professionals in modern societies has been described as the application of abstract
knowledge to complex individual cases. The application of such knowledge includes diagnosis,
inference, and treatment, and is typically carried out in particular workplaces, such as
hospitals or professional service firms. Abbott (1988, pp. 35–58) identified three social arenas in
which professionals must establish and defend their jurisdictional claims: the legal system, the
public sphere, and the workplace. From a system perspective, professional groups compete for
recognition of their expertise and seek to establish exclusive domains of competence (“juris-
dictions”). Abbott argued that historical case studies should be conducted to better understand

the variety of jurisdictional settlements in modern societies. One such case study is the field of
evaluative bibliometrics, in which two main types of clients are interested in bibliometric
assessment services: organizations conducting research and funders of research (Jappe, Pithan,
& Heinze, 2018). Such organizations would be interested in quantitative assessment techniques
because of the rapid growth of science (Bornmann & Mutz, 2015). As the knowledge base in most
areas of science grows faster globally than the financial resources of any individual organization,
these organizations routinely face problems with both resource allocation and staff recruitment.
Bibliometric expertise is provided mostly by individual scholars, contract research organizations
specializing in assessment services, or database providers offering software for ready-made biblio-
metric tools, as well as more customized assessment services.

A recent study investigated bibliometric experts in Europe (Jappe, 2019). Based on a
comprehensive collection of evaluation reports, the study shows that expert organizations, such as
the Dutch Centre for Science and Technology Studies (CWTS) in Leiden, are able to set
technical standards with respect to data quality by investing in in-house databases with improved
WoS data. Bibliometric indicators based on field averages were the most frequently used. Importantly, the
study found that bibliometric research assessment occurred most often in the Netherlands,
the Nordic countries, and Italy, confirming studies focusing on performance-based funding
systems (Aagaard, Bloch, & Schneider, 2015; Hicks, 2012; Sandstrom & Van den Besselaar,
2018). However, how successful have bibliometric experts been at establishing what Abbott
(1988, pp. 35–58) calls “professional jurisdictions”?
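The field-average indicators mentioned above normalize a paper’s citation count by the world average for its field and publication year. A minimal sketch of that normalization logic (field baselines and papers are invented; this is not CWTS’s actual implementation):

```python
# Invented world baselines: average citations per paper, by (field, publication year)
FIELD_BASELINE = {("chemistry", 2015): 12.0, ("sociology", 2015): 4.0}

def normalized_citation_score(papers):
    """Mean of citations divided by the field-year world average (MNCS-style)."""
    ratios = [cites / FIELD_BASELINE[(field, year)] for field, year, cites in papers]
    return sum(ratios) / len(ratios)

# A unit with one chemistry paper (24 citations) and one sociology paper (4 citations)
unit = [("chemistry", 2015, 24), ("sociology", 2015, 4)]
print(normalized_citation_score(unit))  # -> 1.5 (50% above world average)
```

Field normalization of this kind is what makes citation counts comparable across high-citation fields such as chemistry and low-citation fields such as sociology.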

3. RESEARCH EVALUATION FRAMEWORKS IN THE NETHERLANDS AND ITALY

The Netherlands has a tradition of decentralized research evaluation (Van Der Meulen, 2010;
van Steen & Eijffinger, 1998). The Dutch evaluation framework is based on the principles of
university autonomy, leadership at the level of universities and faculties, and accountability for
research quality. In contrast, Italy has developed a highly centralized research evaluation
exercise. The Italian evaluation framework is based on financial rewards for publication and
citation performance, provides national rankings of university departments, and contributes to
an institutional environment that leaves little room for university autonomy (Capano, 2018).

3.1. The Dutch Research Evaluation Framework

In the Netherlands, the Standard Evaluation Protocol (SEP) regulates institutional evaluation
(i.e., evaluation of research units at universities), including university medical centers, and
at the institutes affiliated with the Netherlands Organization for Scientific Research (NWO)
and the Royal Netherlands Academy of Arts and Sciences (KNAW). Three consecutive periods
of the SEP have been implemented thus far, from 2003–2009, 2009–2015, and 2015–2021
(VSNU, KNAW, & NWO, 2003, 2009, 2014). The responsibility for formulating the protocol
lies with the KNAW, NWO, and the Association of Universities in the Netherlands ( VSNU).
The legal basis is the Higher Education and Research Act ( WHW), which requires regular as-
sessment of the quality of activities at universities and public research institutions.

Research evaluation under the SEP is decentralized, in that evaluations are commissioned
by the boards of individual research organizations. No national agency is tasked to bring to-
gether the information from different institutions or exercise central oversight over the evalu-
ation process (van Drooge, Jong, 等人。, 2013). The aim of the SEP is to provide common
guidelines for the evaluation and improvement of research and research policy based on
expert assessments. The protocol requires that all research be evaluated once every 6 years. An
internal midterm evaluation 3 years later serves to monitor measures taken in response to the

external evaluation. The external evaluation of scientific research applies to two levels: the
research institute as a whole and its research programs. Three main tasks of the research
institute and its research programs are assessed: the production of results relevant to the scientific
community, the production of results relevant to society, and the training of PhD students. Four
main criteria were considered in the assessments conducted thus far: quality, productivity,
societal relevance, and vitality and feasibility. The goals of the SEP are to improve the quality of
research and to provide accountability for the use of public money towards the research
organization’s board, funding bodies, government, and society at large (van Drooge et al., 2013).
During the most recent period, the productivity criterion has been abandoned in favor of
greater emphasis on societal relevance (Petersohn & Heinze, 2018).

A precursor to the SEP was the VSNU protocol, which was developed in the early 1990s by
VSNU in consultation with the NWO and KNAW. In contrast to the current protocol, the
VSNU protocol was designed as a national disciplinary evaluation across research
organizations. In most disciplines, research quality was assessed by a combination of peer review and
bibliometric data. In response to criticism, this assessment framework was overhauled in
1999–2000 and the national comparison of academic disciplines was abandoned in favor
of greater freedom for universities to choose the format in which they wanted to conduct their
research quality assessment while maintaining a common procedural framework.
Responsibility for commissioning evaluations was moved from the disciplinary chambers to the
executive boards of research organizations (Petersohn & Heinze, 2018).

3.2. The Italian Research Evaluation Framework

In Italy, the Evaluation of Research Quality ( VQR) is a national evaluation exercise implement-
ed by the National Agency for the Evaluation of Universities and Research Organizations
(ANVUR). The evaluation is mandatory for all public and private universities, as well as 12
national research organizations funded by the Ministry of Education, Universities, and
Research (MIUR), involving all researchers with fixed-term or permanent contracts. The current
legal basis is Law no. 232/2016, which requires that the VQR be carried out every 5 years
on the basis of a ministerial decree. The VQR has been completed for the periods 2004–2010
(VQR I) and 2011–2014 (VQR II), and will be continued in 5-year periods, with the next covering
2015–2019 (VQR III) (ANVUR, 2013, 2017). The periods refer to the publication years of
research output assessed in the respective cycle.

The objective of the VQR is to promote improvement in the research quality of the assessed
institutions and to allocate the merit-based share of university base funding. Performance-based
funding is implemented as an additional incentive for institutions to produce high-quality
research. Law 98/2013 dictates that the share of annual base funding (Fondo di
Finanziamento Ordinario) distributed to these organizations as a premium (i.e., in large part
dependent on their VQR results) will increase annually up to a level of 30%, reaching 23% in
2018. The assessment builds on the administrative division of the Italian university system into
14 disciplinary areas and produces national rankings for university departments within each
area based on several composite indicators of research quality. Evaluations are based on the
submission of a fixed number of research products per employed researcher. For example,
VQR I required three products per university researcher, and six products for researchers work-
ing in a nonuniversity setting without teaching obligations. The university or research institute
collects the submitted products and selects the final set submitted to ANVUR. These outputs
are then assigned to one of the 14 (VQR I) or 16 (VQR II) groups of evaluation experts (GEVs).
Research quality is judged in terms of scientific relevance, originality and innovation, and

internationalization. The GEVs are assigned to rate each individual product on a five-point
scale, relying on bibliometric information, peer review, or a combination of the two. VQR
I involved nearly 185,000 research products for 61,822 researchers (Ancaiani, Anfossi, et al.,
2015). ANVUR then computes a set of composite quality indicators for each department,
university, and research institute (Ancaiani et al., 2015).
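The aggregation step described above can be sketched: each product rating on the five-point scale maps to a numeric weight, and a department’s composite indicator averages those weights over its submitted products. The rating labels, weights, and data below are illustrative only, not ANVUR’s official values:

```python
# Illustrative weights for the five rating classes (not ANVUR's official values)
WEIGHTS = {"excellent": 1.0, "good": 0.7, "fair": 0.4, "limited": 0.1, "poor": 0.0}

def composite_indicator(ratings):
    """Average weighted score over the products a department submitted."""
    return sum(WEIGHTS[r] for r in ratings) / len(ratings)

departments = {
    "A": ["excellent", "good", "good", "fair"],
    "B": ["good", "fair", "limited", "poor"],
}
# National ranking: departments ordered by descending composite score
ranking = sorted(departments, key=lambda d: composite_indicator(departments[d]), reverse=True)
print(ranking)  # -> ['A', 'B']
```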

A precursor to the VQR was the Triennial Research Evaluation (VTR), which was performed
in 2004–2006 with reference to the period 2001–2003. The VTR was inspired by
the Research Assessment Exercise in the United Kingdom (RAE); it was an expert review orga-
nized into 20 panels to assess the quality of submissions from researchers in all Italian univer-
sities and research organizations (Geuna & Piolatto, 2016). The VTR reassessed approximately
14% of the research produced by the Italian academic system during the respective period,
relying exclusively on peer review (Abramo, D’Angelo, & Caprasecca, 2009). Its results
affected university funding to a very limited extent.

4. PROFESSIONALIZATION OF BIBLIOMETRIC EXPERTISE IN THE NETHERLANDS
AND ITALY

The differences in the design of the Dutch and Italian research evaluation frameworks are re-
lated to the question of how bibliometric expertise has been institutionalized in each country.
From a theoretical point of view, research assessments can be understood as an intrusion upon
reputational control that operates within intellectual fields (Whitley, 2000, 2007). However,
research assessments do not replace reputational control, but constitute an additional institutional
layer of work control. The sociological question is how this new institutional layer operates.

The Dutch system follows a model of bibliometric professionalism. In the Netherlands,
there is a client relationship between universities and research institutes, with a legally
enforced demand for regular performance evaluation on one side and primarily one contract
research organization, the CWTS, on the other, providing bibliometric assessment as a
professional service. In a study on the jurisdiction of bibliometrics, Petersohn and Heinze (2018)
investigated the history of the CWTS, which developed as an expert organization in the con-
text of Dutch science and higher education policies beginning in the 1970s. Even though
bibliometrics are not used by all Dutch research organizations or for all disciplines, and some
potential clients are satisfied with internet-based “ready-made indicators,” the current SEP
sustains a continuous demand for bibliometric assessment. In the 2000s, the CWTS became an
established provider, exporting assessment services to clients across several European coun-
tries and more widely influencing methodological practices within the field of bibliometric
experts (Jappe, 2019).

Regarding professional autonomy, there are two important points to consider in the Dutch
system. First, the additional layer of control via the SEP is introduced at the level of the
research organization, that is, in Whitley’s (2000) terms, at the level of the employing organization
rather than the intellectual field or discipline. Thus, the purpose of research evaluation is to
inform the university or institute leadership about the organization’s strengths and weaknesses
regarding research performance. It is the organization’s board that determines its information
needs and commissions evaluations. In this way, the role of the employing organization is
strengthened vis-à-vis scientific elites from the intellectual field, as the organizational leader-
ship obtains relatively objective information on performance that can be understood and used
by nonexperts in their respective fields. 然而, this enhancement of work control seems to
depend on a high level of acceptance of the bibliometric services by Dutch academic elites as
professionally sound and nonpartisan information gathering. As Petersohn and Heinze (2018)

Quantitative Science Studies

987

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

/

e
d

q
s
s
/
A
r
t

C
e

p
d

F
/

/

/

/

1
3
9
8
3
1
8
6
9
8
4
6
q
s
s
_
A
_
0
0
0
5
9
p
d

/

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Quantitative science studies with middle range theories

emphasized, professional bibliometricians in the Netherlands have claimed a jurisdiction that
is subordinate to peer review.

This leads to the second point, which is related to the professional autonomy of biblio-
metrics as a field of experts. In the Dutch system, science and higher education policies have
helped create and sustain an academic community of bibliometricians in addition to the
expert organization (i.e., the CWTS). The development of bibliometric methods is left to these
bibliometric professionals. Neither state agencies nor the universities, as employing
organizations, claim expertise in bibliometric methodology. On the other hand, for a professional
organization to gain social acceptance of its claims of competence, the CWTS is obliged to
interact closely with its clients in order to determine the best way to serve their information
needs. Thus, the professional model of bibliometric assessment in the Netherlands strengthens
the leadership of employing organizations and supports the development of a subordinate pro-
fessional jurisdiction of bibliometrics with a certain degree of scientific autonomy. The model
of bibliometric professionalism seems to have contributed to the comparatively broad accep-
tance of quantitative performance evaluation in the Dutch scientific community.

In stark contrast, the Italian system follows a centralized bureaucratic model that co-opts
academic elites. In Italy, bibliometric assessment is part of a central state program implemented
in 14 disciplinary divisions of public research employment. Reputational control of academic
work is taken into account insofar as evaluation is carried out by members of a committee
representing disciplinary macroareas. It is the responsibility of these evaluation committees
to determine the details of the bibliometric methodology and evaluation criteria appropriate
for their disciplinary area, whereas ANVUR, as the central agency, specifies a common
methodological approach to be followed by all disciplines (Anfossi, Ciolfi, et al., 2016). In this
way, the Italian state cooperates with elites from intellectual fields in order to produce the
information required for performance comparisons within and across fields. VQR I comprised
14 committees with 450 professorial experts; VQR II comprised 16 committees with 436
professorial experts.

When comparing the two systems, two points are notable regarding the development of a
new professional field. First, in the Italian system, an additional layer of work control was
introduced at the level of a state agency that determines faculty rankings and at the MIUR, which
is responsible for the subsequent budget allocation. Arguably, the role of organizational
leadership at universities and research institutes is not strengthened, but circumscribed, by this
centralized evaluation program. All public research organizations are assessed against the
same performance criteria and left with the same choices to improve performance.
Administrative, macrodisciplinary divisions are prescribed as the unitary reference for
organizational research performance. Furthermore, although the VQR provides rectors and deans
with aggregated information concerning the national ranking positions of their university
departments, access to individual performance data is limited to the respective scientists. Thus,
the VQR is not designed to inform leadership about the strengths of individuals and groups
within their organization. This could be seen as limiting the usefulness of the VQR from a
leadership perspective. Moreover, the lack of transparency could give rise to concerns about
the fairness of individual performance assessments. There seems to be no provision for ex-post
validity checks or bottom-up complaints on the part of the research organization or at the in-
dividual level. This underlines the top-down, central planning logic of the VQR exercise.

The second point relates to the professionalism of bibliometrics. Italy has expert organiza-
tions with bibliometric competence, such as the Laboratory for Studies in Research Evaluation
(REV lab) at the Institute for System Analysis and Computer Science of the Italian Research

Council (IASI-CNR) in Rome. However, the design and implementation of the national
evaluation exercise has remained outside their purview. For example, REV lab took a very critical
stance with regard to the VTR and VQR I. Abramo et al. (2009) criticized the fact that many
Italian research organizations failed to select their best research products for submission to the
VTR, as judged by an ex-post bibliometric comparison of submitted and nonsubmitted
products. As a result, the validity of the VTR was seriously questioned, as their conclusion
suggested: “the overall result of the evaluation exercise is in part distorted by an ineffective initial
selection, hampering the capacities of the evaluation to present the true level of scientific quality
of the institutions” (Abramo et al., 2009, p. 212).

The VQR has been criticized by the same authors for evaluating institutional performance
on the basis of a partial product sample that does not represent the total institutional produc-
tivity and covers different fields to different degrees (Abramo & D’Angelo, 2015). The combi-
nation of citation count and journal impact developed by ANVUR was also criticized as being
methodologically flawed (Abramo & D’Angelo, 2016). This criticism is further substantiated by
the fact that the methodological design developed by ANVUR for bibliometric product ratings
clearly deviates from the more common approaches in bibliometric evaluation practice in Europe
(Jappe, 2019). Reportedly, the VTR/VQR evaluation framework was introduced by the state
against strong resistance from Italian university professors (Geuna & Piolatto, 2016). As shown
by Bonaccorsi (2018), the involved experts have exerted great effort to build acceptance of
quantitative performance assessment among scientific communities outside the natural and
engineering fields.

In sum, the centralized model of bibliometric assessment in Italy severely limits university
autonomy by directly linking centralized, state-organized performance assessment and
base funding allocation. Although the autonomy of reputational organizations is respected in
the sense that intellectual elites are co-opted into groups of evaluating experts, evaluative
bibliometricians are not involved as independent experts. In contrast to the situation in the
Netherlands, the current Italian research evaluation framework has not led to the development
of a professional jurisdiction of bibliometrics.

5. FUTURE RESEARCH AGENDA

What could be fruitful avenues for future research? First, we think it would be worthwhile to
apply the analytical framework of the sociology of professions to a broader set of countries.
Similar to the Netherlands–Italy comparison, such analyses could ascertain the extent to which
professional jurisdictions have been established in countries where publication- or citation-based
metrics are regularly used in institutional evaluation, including Australia, Denmark,
Belgium (Flanders), Finland, Norway, Poland, Slovakia, and Sweden. These findings could
then be contrasted with countries that do not operate such regular bibliometric assessments
at the institutional level, including France, Germany, Spain, and the United Kingdom. Based
on current knowledge (Aagaard et al., 2015; Hicks, 2012; Kulczycki, 2017; Molas-Gallart,
2012; Petersohn, 2016), it is reasonable to assume that countries performing regular bibliometric
assessments with the help of recognized expert organizations have developed similar
jurisdictions (i.e., subordinate to peer review) as in the Netherlands. Possible examples would
be the Center for Research & Development Monitoring (ECOOM) in Flanders (Belgium) or the
Nordic Institute for Studies in Innovation, Research, and Education (NIFU) in Norway. In
countries without such regular bibliometric assessments, we would expect co-optation of scientific
elites into state-sponsored evaluation agencies. Possible examples would be the National
Commission for the Evaluation of Research Activity (CNEAI) in Spain and the Science

Council (WR) in Germany. Ultimately, these analyses could chart the institutional strength of
the emerging bibliometric profession in Europe, and globally, in a comprehensive and
preferably quantitative-descriptive manner.

Second, these theoretically framed insights on the emerging bibliometric profession could
be juxtaposed with other institutional dimensions that are important for the scientific perfor-
mance of national research systems. One such dimension, according to Hollingsworth (2006),
is the autonomy of universities to recruit senior academic staff and decide on their promotion,
salaries, and dismissal. In this respect, the autonomy scoreboard provided by the European
University Association (Pruvot & Estermann, 2017) shows that Dutch universities have higher
“staffing autonomy” scores (73 out of 100) than Italian universities (44). Moreover, the most recent
Science & Engineering Indicators report (NSB, 2019) indicates that the Netherlands had a higher and
increasing share of S&E publications in the top 1% of most-cited articles in the Scopus data-
base between 1996 and 2014 than Italy. Does that mean that a country’s impact in science is
related to the institutional strength of its bibliometric profession and the staffing autonomy of
its universities? We are far from making such a bold claim, because there seems to be no linear
relationship between these two points, as exemplified by the United Kingdom (weak
bibliometric profession but strong university autonomy). Rather, we suggest that cross-national/
regional performance comparisons (Bonaccorsi, Cicero, et al., 2017; Cimini, Zaccaria, &
Gabrielli, 2016; Leydesdorff, Wagner, & Bornmann, 2014) should make a greater effort to
include both explanatory variables in their multivariate models, to ascertain whether they are
competing or complementary. We would like to reiterate that such variables need to be
anchored in middle-range social scientific frameworks; otherwise it will be difficult to build
cumulative knowledge. Such variables could be included at various levels of measurement
depending on availability and/or data quality: nominal level (dummy variables), ordinal level
(categorical/rank variables), or interval/ratio levels (count variables).
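The measurement levels listed above can be made concrete. A sketch of how one country record might be encoded as predictors for such a multivariate model (the profession dummy and the expert-organization count are hypothetical illustrations; the autonomy scores are those cited above):

```python
# Hypothetical country records: (name, established bibliometric profession?,
# EUA staffing-autonomy score, number of recognized expert organizations)
countries = [
    ("Netherlands", True, 73, 1),
    ("Italy", False, 44, 1),
]

def design_row(record):
    """Encode one country as model predictors at different measurement levels."""
    _, profession, autonomy, n_orgs = record
    return [
        1.0,                           # intercept
        1.0 if profession else 0.0,    # nominal: dummy for an established profession
        autonomy / 100.0,              # interval: autonomy score rescaled to [0, 1]
        float(n_orgs),                 # ratio: count of expert organizations
    ]

X = [design_row(r) for r in countries]
print(X)  # -> [[1.0, 1.0, 0.73, 1.0], [1.0, 0.0, 0.44, 1.0]]
```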

Third, although we have discussed the emerging bibliometric profession with reference to
the sociology of professions framework thus far, there are clearly other suitable middle-range
“candidate theories/concepts.” One with considerable potential for further quantitative science
studies is Hollingsworth’s (2004, 2006, pp. 425–426) concept of “weakly versus strongly
regulated institutional environments.” Based on extensive interviews with prize-winning scientists in
biomedicine, Hollingsworth argues that universities and research organizations with high
numbers of scientific breakthroughs are often found in weakly regulated institutional environ-
评论, whereas strong control constrains the capabilities of research organizations to achieve
breakthroughs. 进一步来说, in weakly regulated institutional environments, 研究
organizations have considerable decision-making authority on whether a particular research
field will be established and maintained within their boundaries, on the level of funding for
particular research fields within the organization, and on the training and recruitment rules for
their own scientific staff. Weakly regulated institutional environments, and thus considerable
organizational autonomy in national research systems, exist in the United States and the
United Kingdom, whereas Germany and France serve as examples of strongly regulated
environments. In the latter two countries, control over universities and public research
organizations has been exercised to a large extent by state ministries. Consequently, decisions
are typically made at the state level, leaving little room for universities and research
institutes to maneuver.
Hollingsworth’s (2004, 2006, pp. 425–426) theoretical perspective has not yet been tested
empirically with a larger sample of countries, a larger set of research fields, or different
measures of scientific breakthroughs. Returning to the abovementioned autonomy scoreboard
(Pruvot & Estermann, 2017), both the United Kingdom and France receive scores in strong
support of Hollingsworth’s claims, and Germany is only partially covered. However, we are far from
suggesting that the autonomy scoreboard data are perfect; recent studies illustrate serious
cross-national measurement problems (Aksnes, Sivertsen, et al., 2017; Sandstrom & Van den
Besselaar, 2018). Hence, scholars of quantitative science studies should invest time and
resources in developing large-scale, longitudinal data sets with variables anchored in middle-range
theories such as Hollingsworth’s. Successful examples show that such efforts can bear fruit:
James March’s (1991) concept of “exploration versus exploitation” has ignited a whole stream of
mostly quantitative-empirical studies that have produced cumulative social scientific knowledge
(for an overview, see Gibson & Birkinshaw, 2004; Raisch & Birkinshaw, 2008).

AUTHOR CONTRIBUTIONS

Conceptualization: AJ, TH; Investigation and Case Comparison: AJ; Writing: AJ, TH.

COMPETING INTERESTS

The authors have no competing interests.

FUNDING INFORMATION

Federal Ministry of Education and Research (BMBF), Germany, grant 01PY13013A. The funders had no role
in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

DATA AVAILABILITY

Not applicable.

REFERENCES

Aagaard, K., Bloch, C., & Schneider, J. W. (2015). Impacts of
performance-based research funding systems: The case of the
Norwegian Publication Indicator. Research Evaluation, 24(2),
106–117.

Abbott, A. (1988). The system of professions: An essay on the division of expert labor. Chicago: University of Chicago Press.

Abbott, A. (1991). The future of professions: Occupation and expertise in the age of organisation. Research in the Sociology of Organisations, 8, 17–42.

Abramo, G., & D’Angelo, C. A. (2015). The VQR, Italy’s second national research assessment: Methodological failures and ranking distortions. Journal of the Association for Information Science and Technology, 66(11), 2202–2214.

Abramo, G., & D’Angelo, C. A. (2016). Refrain from adopting the combination of citation and journal metrics to grade publications, as used in the Italian national research assessment exercise (VQR 2011–2014). Scientometrics, 109, 2053–2065.

Abramo, G., D’Angelo, C. A., & Caprasecca, A. (2009). Allocative efficiency in public research funding: Can bibliometrics help? Research Policy, 38, 206–215.

Adler, R., Ewing, J., & Taylor, P. (2009). Citation statistics: A report from the International Mathematical Union (IMU) in cooperation with the International Council of Industrial and Applied Mathematics (ICIAM) and the Institute of Mathematical Statistics (IMS). Statistical Science, 24(1), 1–14.

Aksnes, D. W., Sivertsen, G., van Leeuwen, T. N., & Wendt, K. K.
(2017). Measuring the productivity of national R&D systems:
Challenges in cross-national comparisons of R&D input and
publication output indicators. Science and Public Policy, 44(2),
246–258.

Ancaiani, A., Anfossi, A. F., Barbara, A., Benedetto, S., Blasi, B., …
Sileoni, S. (2015). Evaluating scientific research in Italy: The 2004–10
research evaluation exercise. Research Evaluation, 24(3), 242–255.
Anfossi, A., Ciolfi, A., Costa, F., Parisi, G., & Benedetto, S. (2016).
Large-scale assessment of research outputs through a weighted
combination of bibliometric indicators. Scientometrics, 107,
671–683.

ANVUR. (2013). Valutazione della qualità della ricerca 2004–2010
( VQR 2004–2010). Rapporto finale ANVUR Parte Prima:
Statistiche e risultati di compendio. In Agenzia Nazionale di
Valutazione del sistema Universitario e della Ricerca ANVUR.
ANVUR. (2017). Valutazione della qualità della ricerca 2011–2014
( VQR 2011–2014). Rapporto finale ANVUR Parte Prima:
Statistiche e risultati di compendio. In Agenzia Nazionale di
Valutazione del sistema Universitario e della Ricerca ANVUR.
Bonaccorsi, A. (2018). The evaluation of research in social sciences and humanities: Lessons from the Italian experience. Heidelberg: Springer.

Bonaccorsi, A。, Cicero, T。, Haddawy, P。, & Hassan, S.-U. (2017).
Explaining the transatlantic gap in research excellence.
Scientometrics, 110, 217–241.

Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66(11), 2215–2222.

Cagan, R. (2013). The San Francisco Declaration on Research Assessment. Disease Models & Mechanisms, 6, Editorial.

Capano, G. (2018). Policy design spaces in reforming governance in higher education: The dynamics in Italy and the Netherlands. Higher Education, 75, 675–694.

Cimini, G。, Zaccaria, A。, & Gabrielli, A. (2016). Investigating the
interplay between fundamentals of national research systems:
Performance, investments and international collaborations.
Journal of Informetrics, 10(1), 200–211.

Espeland, W. N., & Stevens, M. L. (1998). Commensuration as a social process. Annual Review of Sociology, 24, 313–343.

Geuna, A., & Piolatto, M. (2016). Research assessment in the UK and Italy: Costly and difficult, but probably worth it (at least for a while). Research Policy, 45, 260–271.

Gibson, C. B., & Birkinshaw, J. (2004). The antecedents, consequences, and mediating role of organizational ambidexterity. Academy of Management Journal, 47(2), 209–226.

Hicks, D. (2012). Performance-based university research funding systems. Research Policy, 41, 251–261.

Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature, 520, 429–431.

Hollingsworth, J. R. (2004). Institutionalizing excellence in biomedical research: The case of the Rockefeller University. In D. H. Stapleton (Ed.), Creating a tradition of biomedical research: Contributions to the history of the Rockefeller University (pp. 17–63). New York, NY: Rockefeller University Press.

Hollingsworth, J. R. (2006). A path-dependent perspective on institutional and organizational factors shaping major scientific discoveries. In J. T. Hage & M. Meeus (Eds.), Innovation, science, and institutional change (pp. 423–442). Oxford: Oxford University Press.

Jappe, A. (2019). Professional standards in bibliometric research assessment? Results from a content analysis of evaluation studies in Europe. In Proceedings of the 17th International Conference on Scientometrics & Informetrics (ISSI), September 2–5, 2019 (pp. 1612–1623). Rome.

Jappe, A., Pithan, D., & Heinze, T. (2018). Does bibliometric research confer legitimacy to research assessment practice? A sociological study of reputational control, 1972–2016. PLOS ONE, 13(6), e0199031.

Kulczycki, E. (2017). Assessing publications through a bibliometric indicator: The case of comprehensive evaluation of scientific units in Poland. Research Evaluation, 26(1), 41–52.

Leydesdorff, L., Wagner, C. S., & Bornmann, L. (2014). The European Union, China, and the United States in the top-1% and top-10% layers of most-frequently cited publications: Competition and collaborations. Journal of Informetrics, 8, 606–617.

March, J. G. (1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71–87.

Merton, R. K. (1968). On sociological theories of the middle range. In R. K. Merton (Ed.), Social theory and social structure (pp. 39–72). Glencoe: Free Press.

Miller, P. (2001). Governing by numbers: Why calculative practices matter. Social Research, 68(2), 379–396.

Moed, H. (2017). Applied evaluative informetrics. Heidelberg: Springer.

Molas-Gallart, J. (2012). Research governance and the role of evaluation: A comparative study. American Journal of Evaluation, 33(4), 583–598.

NSB. (2019). Science & engineering indicators 2018. Arlington, VA: National Science Board.

Petersohn, S. (2016). Professional competencies and jurisdictional
claims in evaluative bibliometrics: The educational mandate of
academic librarians. Education for Information, 32(2), 165–193.

Petersohn, S., & Heinze, T. (2018). Professionalization of bibliometric research assessment: Insights from the history of the Leiden Centre for Science and Technology Studies (CWTS). Science and Public Policy, 45, 565–578.

Porter, T. M. (1995). Trust in numbers: The pursuit of objectivity in science and public life. Princeton, NJ: Princeton University Press.

Power, M. (1997). The audit society: Rituals of verification. Oxford: Oxford University Press.

Pruvot, E. B., & Estermann, T. (2017). University autonomy in Europe III: The scorecard 2017. Brussels: European University Association.

Raisch, S., & Birkinshaw, J. (2008). Organizational ambidexterity: Antecedents, outcomes, and moderators. Journal of Management, 34(3), 375–409.

Rottenburg, R., Merry, S. E., Park, S.-J., & Mugler, J. (2015). The world of indicators: The making of governmental knowledge through quantification. Cambridge: Cambridge University Press.
Sandstrom, U., & Van den Besselaar, P. (2018). Funding, evaluation, and the performance of national research systems. Journal of Informetrics, 12(1), 365–384.

Todeschini, R., & Baccini, A. (2016). Handbook of bibliometric indicators: Quantitative tools for studying and evaluating research. Weinheim: Wiley.

Van Der Meulen, B. J. R. (2010). The Netherlands. In D. Simon, A. Knie, S. Hornbostel, & K. Zimmermann (Eds.), Handbuch Wissenschaftspolitik (pp. 514–528). Wiesbaden: VS Verlag für Sozialwissenschaften.

van Drooge, L., Jong, S., Faber, M., & Westerheijden, D. F. (2013). Twenty years of research evaluation (Facts & Figures). The Hague: Rathenau Institute.

van Eck, N. J., Waltman, L., Van Raan, A. F. J., Klautz, R. J. M., & Peul, W. C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLOS ONE, 8(4), e62395.

van Steen, J., & Eijffinger, M. (1998). Evaluation practices of scientific research in the Netherlands. Research Evaluation, 7(2), 113–122.
VSNU, KNAW, & NWO. (2003). Standard Evaluation Protocol 2003–2009: Protocol for research assessments in the Netherlands. Utrecht: Association of Universities in the Netherlands, the Netherlands Organisation for Scientific Research, and the Royal Netherlands Academy of Arts and Sciences.

VSNU, KNAW, & NWO. (2009). Standard Evaluation Protocol 2009–2015: Protocol for research assessments in the Netherlands. Available at: www.knaw.nl/sep. Association of Universities in the Netherlands, the Netherlands Organisation for Scientific Research, and the Royal Netherlands Academy of Arts and Sciences.

VSNU, KNAW, & NWO. (2014). Standard Evaluation Protocol 2015–2021: Protocol for research assessments in the Netherlands. Association of Universities in the Netherlands, the Netherlands Organisation for Scientific Research, and the Royal Netherlands Academy of Arts and Sciences.

Whitley, R. (2000). The intellectual and social organization of the sciences (2nd ed.). Oxford: Oxford University Press.

Whitley, R. (2007). Changing governance of the public sciences. In R. Whitley & J. Gläser (Eds.), The changing governance of the sciences (pp. 3–27). Dordrecht: Springer.

Wilsdon, J., Allen, L., Belfiore, E., Campbell, P., Curry, S., … Johnson, B. (2015). The metric tide: Report of the independent review of the role of metrics in research assessment and management. Bristol: Higher Education Funding Council for England.
