Article

Scopus as a curated, high-quality bibliometric data
source for academic research in quantitative
science studies

Jeroen Baas1,3, Michiel Schotten1, Andrew Plume1,3, Grégoire Côté2,3, and Reza Karimi1

1Elsevier B.V., Radarweg 29, Amsterdam, The Netherlands
2Science-Metrix Inc., Elsevier, 1335 Mont-Royal Ave E, Montréal, QC, Canada
3International Center for the Study of Research, Elsevier, Radarweg 29, Amsterdam, The Netherlands

Keywords: abstract and citation database, author profile, bibliographic database, bibliometrics,
citation linking, Content Selection and Advisory Board, CSAB, data cleaning, data clustering, data
curation, data linking, ICSR, institution profile, International Center for the Study of Research,
network visualization, ORCID, quality assurance, research assessment, researcher mobility, science
policy evaluation, scientometrics, university ranking

ABSTRACT

Scopus is among the largest curated abstract and citation databases, with a wide global and
regional coverage of scientific journals, conference proceedings, and books, while ensuring
only the highest quality data are indexed through rigorous content selection and re-evaluation
by an independent Content Selection and Advisory Board. Additionally, extensive quality
assurance processes continuously monitor and improve all data elements in Scopus. Besides
enriched metadata records of scientific articles, Scopus offers comprehensive author and
institution profiles, obtained from advanced profiling algorithms and manual curation, ensuring
high precision and recall. The trustworthiness of Scopus has led to its use as a bibliometric data
source for large-scale analyses in research assessments, research landscape studies, science policy
evaluations, and university rankings. Scopus data have been offered for free for selected studies by
the academic research community, such as through application programming interfaces, which
has led to many publications employing Scopus data to investigate topics such as researcher
mobility, network visualizations, and spatial bibliometrics. In June 2019, the International Center
for the Study of Research was launched, with an advisory board consisting of bibliometricians,
aiming to work with the scientometric research community and offering a virtual laboratory where
researchers will be able to utilize Scopus data.

An open access journal

Citation: Baas, J., Schotten, M., Plume, A., Côté, G., & Karimi, R. (2020). Scopus as a curated,
high-quality bibliometric data source for academic research in quantitative science studies.
Quantitative Science Studies, 1(1), 377–386. https://doi.org/10.1162/qss_a_00019

DOI: https://doi.org/10.1162/qss_a_00019

Received: 04 July 2019
Accepted: 26 October 2019

Corresponding Author: Jeroen Baas, j.baas@elsevier.com

Handling Editors: Ludo Waltman and Vincent Larivière

Copyright: © 2020 Jeroen Baas, Michiel Schotten, Andrew Plume, Grégoire Côté, and Reza Karimi.
Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press

1. SCOPUS AS A BIBLIOMETRIC DATA SOURCE

In 2004, Elsevier launched Scopus as a new search and discovery tool (Schotten, el Aisati,
Meester, Steiginga, & Ross, 2017). Scopus is an abstract and citation database consisting of
peer-reviewed scientific content. At its launch, it contained about 27 million publication
records (1966–2004). Since then, the content of the database has grown to over 76 million
records at the time of writing, covering publications from 1788–2019, making it among the
largest curated bibliographic abstract and citation databases today. Approximately 3 million
new items are being added every year. The content in Scopus is sourced from over 39,100
serial titles (with the most recently published content indexed from over 24,500 titles),
120,000 conferences, and 206,000 books from over 5,000 different publishers worldwide.
Scopus is a curated database, which means that content is selected for inclusion in the
database through a rigorous process: Serial content (i.e., journals, conference proceedings, and
book series) submitted for possible inclusion in Scopus by editors and publishers is reviewed
and selected, based on criteria of scientific quality and rigor. This selection process is carried
out by an external Content Selection and Advisory Board (CSAB) of editorially independent
scientists, each of whom is a subject matter expert in their respective field. This ensures that
only high-quality curated content is indexed in the database and affirms the trustworthiness of
Scopus. Additionally, Scopus being a curated database also refers to the rigorous capturing
procedures, publisher agreements, and technical infrastructure in place to source the publications
directly from the publishers themselves, ensuring comprehensive, complete, and accurate
coverage of the serial content once it has been selected by the CSAB.

Scopus indexes many different elements of scientific publications, obtained from external
publishers, such as publication title, abstract, keywords, author names and linked affiliations,
references, and drug terms (Berkvens, 2012). Scopus contains publications from scientific
publishers all over the world. Elsevier, the owner of Scopus, is also a scientific publisher.
About 9.9% of the serial titles (i.e., journals and book series) in Scopus are published by Elsevier
(this amounts to an article share in Scopus of 17.4% between 2012 and 2018); the other 90.1% of
serial titles (and 82.6% of articles, respectively) are produced by an extensive list of global
publishers (Figure 1a). Furthermore, subject coverage of the serial titles in Scopus is quite balanced
among the four main subject categories (Figure 1b). Scopus also includes non-English content,
as long as an English title and abstract, as well as references in Roman script, are available.

In addition to the fields that are provided in the source data for Scopus, Elsevier further
enriches the content using a variety of enhancements: Citations provided in the full text are
structured and clustered together and, where they reference content that is already indexed in
Scopus, linked to the cited Scopus records. This allows users to view the citation count
(how many times an article was cited). At the time of writing, the precision for citation linking
in Scopus is measured at 99.9% and the recall is 98.3%. This means that in general, references
that should be linked to Scopus records are linked in 98.3% of cases; and among all reference
links, 99.9% are linked to the correct record.
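The two figures above can be illustrated with a small worked example. The sketch below is not Elsevier's evaluation code; the function name, the record identifiers, and the toy data are hypothetical. It also assumes that a link only counts toward recall when it points to the correct record, which the text does not state explicitly.

```python
from typing import Dict, Optional, Tuple


def link_precision_recall(
    gold: Dict[str, Optional[str]],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (precision, recall) for reference-to-record links.

    gold maps each reference ID to the record it should link to, or None
    when the cited item is not indexed. predicted maps reference IDs to the
    record chosen by the linker, or None when no link was made.
    """
    # All links the linker actually made.
    links_made = sum(1 for rec in predicted.values() if rec is not None)
    # All references that should have been linked according to the gold set.
    should_link = sum(1 for rec in gold.values() if rec is not None)
    # Links that point to the record named in the gold set.
    correct = sum(
        1
        for ref, rec in predicted.items()
        if rec is not None and gold.get(ref) == rec
    )
    precision = correct / links_made if links_made else 0.0
    recall = correct / should_link if should_link else 0.0
    return precision, recall


# Hypothetical toy data: ref3 cites an item that is not indexed,
# and the linker missed ref4.
gold = {"ref1": "rec-A", "ref2": "rec-B", "ref3": None, "ref4": "rec-C"}
predicted = {"ref1": "rec-A", "ref2": "rec-B", "ref3": None, "ref4": None}
precision, recall = link_precision_recall(gold, predicted)
```

In this toy case both links that were made are correct (precision 2/2), while one of the three references that should have been linked was missed (recall 2/3).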

Figure 1. Distribution of publishers (a) and of the four main subject categories (b) of the serial titles indexed in Scopus, rounded to whole
percentages.

Authorships in the Scopus database are clustered into publication histories called Scopus
(author) profiles. Author profiles are generated using a combination of algorithms and manual
curation. Elsevier uses a “gold set” of roughly 12,000 randomly selected authors for quality assessment.
This set is updated and expanded annually and is independent of sets used for tuning or training of
algorithms. The end-to-end accuracy is measured continuously by several metrics. Moreover,
regular spot checks are run on aspects of author profiles, such as canonical names or affiliations.
In general, accuracy metrics are averaged over authors to better represent the typical experience
of users. Publications in author profiles have an average precision of 98.1% and an average recall
of 94.4%. Both precision and recall are measured based on the best matching Scopus profile for a
given “gold set” author. The best match is determined based on the Scopus profile containing the
largest number of publications for that author (Figure 2). This overlap number divided by the total
number of publications by the “gold set” author defines recall (i.e., ratio of publications captured).
The same overlap number divided by the total number of publications in the Scopus profile
defines precision (i.e., ratio of publications correctly assigned to the author).
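The best-match calculation described above can be sketched in a few lines. This is an illustrative reading of the procedure, not the production evaluation; the function names, profile identifiers, and publication identifiers are hypothetical, and ties between equally good profiles are broken arbitrarily here.

```python
from typing import Dict, List, Set, Tuple


def best_match_scores(
    gold_pubs: Set[str], profiles: Dict[str, Set[str]]
) -> Tuple[str, float, float]:
    """Return (profile_id, precision, recall) for one gold-set author."""
    # Best match: the profile sharing the most publications with the gold list.
    best_id = max(profiles, key=lambda pid: len(profiles[pid] & gold_pubs))
    overlap = len(profiles[best_id] & gold_pubs)
    profile_size = len(profiles[best_id])
    # Precision: share of the profile's publications that belong to the author.
    precision = overlap / profile_size if profile_size else 0.0
    # Recall: share of the author's publications captured by the profile.
    recall = overlap / len(gold_pubs) if gold_pubs else 0.0
    return best_id, precision, recall


def average_over_authors(scores: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average (precision, recall) pairs so that every author weighs equally."""
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)


# Hypothetical gold-set author with four publications, split over two profiles.
gold_author_a = {"p1", "p2", "p3", "p4"}
candidate_profiles = {
    "profile-1": {"p1", "p2", "p3", "x9"},  # x9 belongs to someone else
    "profile-2": {"p4"},
}
best_id, precision, recall = best_match_scores(gold_author_a, candidate_profiles)
```

Here profile-1 is the best match with an overlap of three publications, giving a precision of 3/4 (one foreign publication in the profile) and a recall of 3/4 (one publication stranded in profile-2).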

A substantial number of author profiles in Scopus have been curated. Curation can be
initiated through a variety of sources. A well-established process is through Open Researcher and
Contributor ID (ORCID), an open, non-proprietary registry of unique, persistent author
identification codes (What is ORCID?, 2018). ORCID is managed by a non-profit organization of
the same name, established in 2012. Researchers can export, import, or link their publications
and curated metadata between Scopus and ORCID. Similarly, researchers can use a feature on
Scopus.com called the “Author Feedback Wizard” (AFW) to improve their author profile.
Finally, yet another process that results in improved, curated Scopus author profiles is a
commercial service offered by Elsevier to subscribers of its “Pure” university administration
product, called Profile Refinement Service. Pure customers can opt to have their profiles refined
upon request or refined every 4 months. Elsevier uses the same service to proactively refine
author profiles regardless of Pure subscription whenever needed.

Figure 2. Schematic depicting how precision and recall are calculated for Scopus author profiles.
“Author A” and “Author B” represent manually curated “gold set” lists of publications by these
authors.

All above efforts combined have led to approximately 1.8 million Scopus author profiles that
have been manually enhanced (Scopus index, July 2019). This total has been verified using a
“manual curated” flag in the XML data of Scopus author profile records. Nevertheless, we must
emphasize that Scopus creates author profiles that are being actively updated by algorithms for all
authorships (of over 76 million publications) covered. Availability of author profiles throughout the
corpus enables author level analytics and benchmarking across the database beyond subsample
estimations. Moreover, Scopus author profiles are designed to be a complete publication history
and so will not remove or hide select publications for personal gain (i.e., negatively impacting
publications, such as retraction notices, errata, or suspiciously funded work). In fact, feedback that
Elsevier receives is algorithmically and manually reviewed before changing any existing profile.
An important aspect of the Scopus database is the high coverage and availability of first
names, even for relatively old records: from 25% of authorships in 1970–1974 and 52%
between 1995–1999 to 82% of authorships in 2015–2019 (Figure 1). This feature strengthens the
disambiguation of authors and allows, for example, gender-based longitudinal analyses that
leverage first names (Elsevier, 2017; Lerchenmueller & Sorenson, 2018). Another relevant
aspect in the analytical context is the availability of author-affiliation links in publications
throughout the database historically. This enables studies dealing with the mobility of
researchers, by analyzing the author affiliations and how they change over time.
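As a rough illustration of how author-affiliation links support mobility analysis, the sketch below derives country-to-country moves from one author's dated affiliations. It is a deliberately simplified assumption of how such a study might start: the `Authorship` record, its fields, and the sample data are hypothetical, and real analyses have to deal with multiple simultaneous affiliations and short visits.

```python
from typing import List, NamedTuple, Tuple


class Authorship(NamedTuple):
    """One publication by an author: year and country of the affiliation."""

    year: int
    country: str


def country_moves(authorships: List[Authorship]) -> List[Tuple[int, str, str]]:
    """Return (year, from_country, to_country) for each change of country."""
    moves = []
    # Order the publication history chronologically before comparing neighbours.
    ordered = sorted(authorships, key=lambda a: a.year)
    for previous, current in zip(ordered, ordered[1:]):
        if current.country != previous.country:
            moves.append((current.year, previous.country, current.country))
    return moves


# Hypothetical publication history of a single author profile.
history = [
    Authorship(2012, "NL"),
    Authorship(2010, "NL"),
    Authorship(2015, "CA"),
    Authorship(2018, "NL"),
]
moves = country_moves(history)
```

For this history the function reports a move from NL to CA first observed in 2015 and a return to NL first observed in 2018.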

An important enrichment in the Scopus database is that of institution profiles, allowing
different name variants and hierarchies of institutions to be curated in a similar fashion as
authors, thereby allowing automated organizing of information where needed (via an advanced,
proprietary, and highly accurate institutional profiling algorithm) and manual modification and
instruction, where possible. In addition, the full text of articles sourced by Scopus is processed
using natural language processing to identify potential references to funding acknowledgements.

To maintain Scopus as a high-quality data source and push the boundaries of quality
forward, Scopus introduced internal review processes to constantly monitor preidentified areas of
quality focus, such as processing, profile quality, and completeness and accuracy of source
data. This allows the content team to identify early trends in the data and to monitor progress
on key initiatives to increase quality. For example, under this program, digital object identifier
(DOI) completion rates went up from 87.8% at the start of the program to 99.8% in December
2018. The completeness was measured across a gold record set in which each record should have a
DOI. Other main focus areas where significant improvements have been made over the past
few years, and where continuous investments are being made, are completeness of indexed
publication records for the serial titles covered (by weekly comparisons against the CrossRef
database), the removal of duplicate MedLine and Article in Press records, the correctness and
completeness of citation links (by measuring against a gold set), of the author and institution
profiles and publication record metadata (such as document type classifications, publication
year, article numbers, country codes, and funding information), as well as improving the
timeliness and currency of newly indexed content. These quality review processes employ machine
learning approaches, supplemented with manual validation, and concern legacy content (i.e.,
content already indexed in Scopus) as well as continuously improving and fine-tuning the
capturing procedures for newly indexed content.

In addition to expanding and enriching the content, as well as improving the timeliness of
the database, a curated database such as Scopus requires re-evaluation of the appropriateness
of new and already indexed titles on an ongoing basis. This is needed to exclude poor-quality
journals and “predatory” journals and publishers, a relatively recent phenomenon that is a
threat to the integrity of science, as well as an increasing challenge to all research publishing
stakeholders: authors, editors, researchers, research institutions, funding bodies, research
assessment bodies, and governments. To ensure that only the most reliable scientific articles and
content are available in Scopus to each of these stakeholders and that the quality of the
existing content is maintained, a rigorous process of continuous monitoring and re-evaluation
has been put in place. This means that titles that have been selected for inclusion in Scopus may
be discontinued and no longer indexed going forward. There are three different identification
techniques applied, using (a) external feedback (i.e., formally raised concerns about publication
standards), (b) heuristics (metrics) to flag underperforming journals, and (c) a machine learning
approach to flag outlier behavior, each of which leads to titles being tagged for re-evaluation. The
ultimate decision to (de)select content lies with the external and independent Scopus CSAB (for
full details, please see Elsevier, 2019b; Holland, Brimblecombe, Meester, & Steiginga, 2019). For
publications that the CSAB determines no longer meet the quality standards for inclusion
in Scopus, indexing of new content is discontinued, but content already indexed remains in
Scopus, to ensure the integrity of the scientific record as well as the stability and consistency of
research trend analytics.

2. SUPPORT FOR LARGE-SCALE ANALYSES

Since its official launch in 2004, Scopus has been used globally in many large-scale analyses.
There are three types of large studies where Scopus data are used in a central role.

The first group of large-scale analyses deals with national research assessments. The first
national assessment supported by Scopus was the Excellence in Research for Australia (ERA) of the
Australian Research Council (ARC) in 2010, later repeated in 2012 and 2015. The first edition of
the Research Excellence Framework (REF 2014) national assessment in the UK also used Scopus
data. The REF 2014 was held by the Higher Education Funding Council for England (HEFCE) and
the funding bodies for Scotland, Wales, and Northern Ireland. Up to four research outputs per
active researcher were submitted by the UK’s Higher Education Institutions (HEIs) and matched
to the corresponding records in Scopus for 11 out of 36 “units of assessment” (UoAs; i.e., broad
subject areas). For each of these 11 UoAs, Scopus citation counts of the submitted outputs were
compared with that UoA’s citation benchmarks (also obtained from Scopus as contextual data)
and used as an additional assessment criterion by the REF’s expert panels, besides peer review of
the scientific content of the outputs. Prior to the announcement of the REF 2014 assessment
results by HEFCE, the REF Results Analysis Tool provided by Elsevier allowed the UK’s HEIs
to compare and evaluate their own REF performance across several benchmarks and metrics.
More examples of national assessments where Scopus data were used include the 2013–2014
assessment in Portugal held by the FCT (“Fundação para a Ciência e a Tecnologia”), the ASN
(“Abilitazione Scientifica Nazionale”) national accreditation rounds (2012–2013, 2016–2018,
and 2018–2020) and the VQR (“Valutazione della Qualità della Ricerca”) national assessment in
Italy (2012–2013, 2016–2017), and the National University Corporation Evaluation (NUCE)
national assessment in Japan held in 2016–2017 and to be held in 2020 by the National
Institution for Academic Degrees and Quality Enhancement of Higher Education (NIAD-QE).

A second type of analysis supported by Scopus data is government science policy evaluations. The
UK Department for Business, Innovation and Skills (BIS) commissioned a report that entailed a
comparative study of the UK’s international research base, in 2011 (Elsevier, 2011) and 2013 (Elsevier,
2013). In 2016, another refresh of this report was issued by the newly renamed Department for
Business, Energy, & Industrial Strategy (BEIS) (Elsevier, 2016). More examples of the use of Scopus
data for research policy reports include those of the European Research Council (ERC) and other large
government bodies, often dealing with program evaluations and research landscape analyses. Many
of these evaluations include different data sources of various types (macroeconomic data, for
instance), as well as deep qualitative evaluations in which Elsevier works in consortia.

As a further example of this second type of analysis, Scopus data have also been used in the
production of bibliometric indicators for the US National Science Foundation’s Science and
Engineering Indicators (SEI), for the 2016 (National Science Board, 2016) and 2018 (National
Science Board, 2018) editions, and will be used in future editions of the SEI reports up to 2022.
Scopus data are also the source of bibliometric indicators for the European Research Area in
the context of the 2010–2014 study “Analysis and Regular Update of Bibliometric Indicators
for the European Commission” (Science-Metrix, 2014) and have recently been selected for the
continuation and improvement of this study, now called “Provision and Analysis of Key
Indicators in Research and Innovation,” for the three coming years.

The third type of analysis is that of university rankings. University rankings are often
composed of combinations of evaluations of which only part draws on a bibliometric resource. Different
ranking bodies use varying mixes of subjective and objective data sources to provide ranked lists
of universities. These rankings and the media attention they draw provide a platform for
academia to engage with the public. Elsevier provides publication output, citation, and
international collaboration data from Scopus for each university to organizations in the field of
university rankings. These include the World University Rankings and its various derived
Regional, Global Subject, Young University, and World Reputation Rankings, as well as the
Wall Street Journal/THE College Rankings, which are all issued by Times Higher Education
(THE, since 2014); and the World University Rankings and its various derived rankings issued
by Quacquarelli Symonds (QS, since 2015); as well as various other regional and
subject-specific rankings, such as the Best Chinese Universities Ranking issued by ShanghaiRanking
Consultancy (since 2015), Perspektywy in Poland, Maclean’s in Canada, the National
Institutional Ranking Framework (NIRF) in India, the Financial Times Global MBA Ranking, and
the Frankfurter Allgemeine Zeitung Economists Ranking in Germany.

The large-scale analyses supported by Scopus are reports in the government and
commercial space. In contrast to peer-reviewed scientific studies, where the focus is on access to the
data and open reproducibility of results, reports in the evaluation space focus on
accountability. Key aspects in those engagements are therefore about how data providers deal with quality
concerns, quality assurance, and risk mitigation. For example, what is the process and timeline
in case content is identified as missing?

3. DATA AVAILABILITY FOR RESEARCH

In addition to the aim of supporting the academic community with robust results using reliable data
in the analyses mentioned, providing access to raw data is an essential component in securing
advancement of the bibliometric field. Until 2014, Elsevier supported bibliometricians with
data using the Elsevier Bibliometric Research Program (EBRP). The program provided
precompiled data sets to researchers, after a scientific board reviewed and approved a submitted
proposal. Its aim was to enable external research groups or individual researchers in the field
of bibliometrics and quantitative research assessment to carry out strategic research using
Elsevier data and to present the outcomes in peer-reviewed journal papers and at international
conferences.

Since 2014, application programming interfaces (APIs) have taken over the role of
providing access to raw data, allowing free use for scientific purposes, such as the
text-and-data-mining resources (Elsevier, 2019c) and Scopus APIs for academic research purposes (Elsevier,
2019a). Use of the APIs does not require a Scopus subscription (footnote 1); without a subscription, users
will have limited access to basic metadata for most citation records, as well as to basic search
functionality. Full access to Scopus APIs is only granted to subscribers of Scopus.
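To make the API route more concrete, the sketch below prepares (but does not send) a search request. It is a minimal illustration under stated assumptions: the endpoint URL, the `X-ELS-APIKey` header, and the `DOI(...)` query syntax reflect our reading of the public developer documentation and should be checked against https://dev.elsevier.com before use; the helper name and the placeholder API key are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the Scopus Search API; verify on the developer portal.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_search_request(query: str, api_key: str) -> Request:
    """Prepare a GET request for a Scopus search without sending it."""
    params = urlencode({"query": query})
    return Request(
        f"{SCOPUS_SEARCH_URL}?{params}",
        headers={
            "X-ELS-APIKey": api_key,  # key obtained from the developer portal
            "Accept": "application/json",
        },
    )


# Look up this article by its DOI; "YOUR-API-KEY" is a placeholder.
request = build_search_request("DOI(10.1162/qss_a_00019)", api_key="YOUR-API-KEY")
```

Passing the prepared request to `urllib.request.urlopen` would perform the call; which fields are returned depends on whether the caller has a Scopus subscription, as described above.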

1 https://dev.elsevier.com/sc_apis.html.

In addition, Scopus data have been available in bulk for research groups. Research groups
working with bulk Scopus data include CWTS (CWTS, 2019), SciMago (Scimago Lab, 2019),
DZHW (DZHW, 2019), SciTech Strategies (SciTech Strategies, 2018), and others through
tailored agreements that have been established between these groups and Elsevier.

The mission of the International Center for the Study of Research (ICSR) is to advance
research evaluation in all fields of knowledge production. To foster this development, the ICSR
provides access to a working environment where new ideas and hypotheses can be tested
against high-quality, large data sets. This platform, offering a virtual laboratory, will allow
researchers to collaboratively develop indicators and methodologies. Elsevier is providing
computational access to Scopus data for research purposes on this platform, free of charge. This
will also enhance the reproducibility of scientometric studies, by enabling other researchers to
verify published research findings using the same data set and methodologies with shared
code. Researchers can use the environment for such academic, noncommercial purposes.
Access will be organized by the ICSR, which will review submitted proposals for use of the lab
and actively reach out to researchers to work collaboratively on specific research
questions. The platform allows researchers to create and extract aggregate derivatives that
can be published as part of their work, under the condition that the source of the data is
acknowledged. At the moment of writing, this platform is not yet publicly available and will be
announced through the ICSR website (International Center for the Study of Research, 2019).

4. EXAMPLES OF STUDIES USING SCOPUS DATA

Scopus data have been used as a source for many different types of bibliometric studies. The
different quality properties of Scopus described above support different types of analyses.

For example, there are studies on mobility, using Scopus’ unique historic author-affiliation
records, such as by Caroline Wagner and Koen Jonkers on international collaboration, mobility,
and openness (Wagner & Jonkers, 2017), funding and collaboration (Leydesdorff, Bornmann, &
Wagner, 2019), and author (Pina, Barac, Buljan, Grimaldo, & Marušic, 2019) and institutional
(李, 2012) collaboration networks. Another example of author-mobility analyses can be found
in a bibliometric study to measure knowledge transfer (Aman, 2018). The mobility analysis using
Scopus author profiles also informs the research policy of governments, such as through the
European Commission’s Joint Research Center (JRC) report on the rise of China as an industrial
and innovation powerhouse (Preziosi et al., 2019).

Furthermore, Scopus’ availability of author first names historically, combined with author
profiling, enables studies using author gender assignments: for example, “The gender gap in
early-career transitions in the life sciences” (Lerchenmueller & Sorenson, 2018) and “Gender
differences in research areas, methods and topics: Can people and thing orientations explain
the results?” (Thelwall, Bailey, Tobin, & Bradshaw, 2019). In addition, Scopus author profiles
have been used to study the recent phenomenon of hyperprolific authorships (Ioannidis,
Klavans, & Boyack, 2018) and for an author database of highly cited researchers (Ioannidis,
Baas, Klavans, & Boyack, 2019; Van Noorden & Singh Chawla, 2019).

There are also examples of studies using the full Scopus database to build new algorithms:
Richard Klavans and Kevin Boyack developed algorithms on top of the database, resulting in
Topics of Prominence (Klavans & Boyack, 2017), which are now prominently displayed in
Elsevier’s SciVal research performance product (which uses Scopus data as one of its data sources).

In the more traditional sense of bibliometric analysis, there are many studies available
around citation analysis and correlations, such as on the influence of highly cited articles
on indicators (Thelwall, 2019; Thelwall & Fairclough, 2015), on correlation between citations
and Mendeley readership (Maflahi & Thelwall, 2016; Thelwall & Wilson, 2016), on journal
usage (Schloegl & Gorraiz, 2010), and studies revisiting bibliometric laws (Thelwall & Wilson,
2014). Scopus data were also used to analyze initiatives in open science, particularly open
access (Solomon, Laakso, & Björk, 2013), citizen science (Follett & Strezov, 2015), and new
tools in the scientific space, such as ResearchGate (Thelwall & Kousha, 2017). They have been
used to evaluate the fate of rejected manuscripts (Bornmann et al., 2009), to investigate
potential citation manipulation by reviewers (Baas & Fennell, 2019; Singh Chawla, 2019), and to
study the development of multidisciplinarity (Levitt & Thelwall, 2008). Currently, Scopus data
are used for bibliometric analysis to inform the EU Open Science Monitor (The Lisbon
Council, CWTS, & Esade, 2018).

Another common form of analysis performed using Scopus data concerns network
visualization and spatial bibliometrics (Bornmann & De Moya Anegón, 2019; Bornmann & Waltman,
2011; Leydesdorff & Persson, 2010; Mutz, Bornmann, de Moya Anegón, & Stefaner, 2014)
as well as research building new visualization techniques (Leydesdorff, 2010; Mischo &
Schlembach, 2018).

5. WHAT CAN WE LEARN FROM SCOPUS DATA, TOGETHER?

In the preceding sections, we have outlined some of the concrete ways in which Scopus data
have been used for large-scale evaluative studies (typically at the national or institutional
level) and for exploratory work leading to a plethora of papers on aspects as diverse as topic
detection, researcher mobility, and data visualization techniques. But the potential of Scopus
to uncover and understand the fundamental forces that drive human knowledge creation
through the research endeavor may be limited only by our capability to ask the right questions.
How do career paths form and change for individual researchers through space and time? Can
we follow people as they develop from “apprentice” to “master” and understand the drift in
their topical focus, collaborative patterns, geolocation, and research impact (by citation-based
indicators or other means) through careers that may be either very short or very long? Can we
identify the conditions that, near the beginning of a research career, predict a long and
successful contribution to the knowledge front? And those conditions that foreshadow an early
exit from the world of (academic) research? Going beyond Scopus, can we use standardized
researcher identifiers, such as ORCID, connected to nonresearch online personas, such as
LinkedIn, to pinpoint the exit of trained researchers from publication-centric roles (mostly
within or adjacent to academia) into careers in organizations in the commercial or charitable
sectors? What is the influence of gender, nationality, and early-career mentoring on these
outcomes, and how much remains unexplained? Is a career in research more likely the result of
persistence or of good fortune, and what does this mean for the development of better and
fairer evaluative structures in research? Finally, what are the implications of the answers to
these questions for all the actors in research, from educators to public policy experts and from
university career advisors to researchers themselves?

This single example shows that those who create and those who use Scopus suffer no lack
of imagination to ask challenging questions, and Scopus itself offers a firm base on which to
begin seeking answers. The remaining piece of the puzzle is a collective one: How can the
bibliometric research community and the creators of Scopus best come together to address
these challenges? In June 2019, the ICSR (International Center for the Study of
Research, 2019) was launched, with a wide-ranging brief and the support of an advisory
board, including experts in research policy, research evaluation, and bibliometrics, to be a
place where a dialogue can happen and research of great interest and importance can be
pursued together. Elsevier, Scopus, and the ICSR do not see themselves as apart from the
world of research but as part of it, and this spirit will inform our work for many years to come.

ACKNOWLEDGMENTS

The authors wish to express their gratitude to Elsevier colleague Roy Boverhof, who provided
the design of the charts in Figure 1.

COMPETING INTERESTS

The authors of this paper are Elsevier employees. Elsevier runs Scopus, which is the database
discussed in this article.

REFERENCES

Aman, V. (2018). A new bibliometric approach to measure knowl-
edge transfer of internationally mobile scientists. Scientometrics,
117(1), 227–247. https://doi.org/10.1007/s11192-018-2864-x

Baas, J., & Fennell, C. (2019). When peer reviewers go rogue—
estimated prevalence of citation manipulation by reviewers based
on the citation patterns of 69,000 reviewers. ISSI 2019, September
2–5, 2019, Rome, Italy. https://www.issi2019.org/. Retrieved from
https://ssrn.com/abstract=3339568

Berkvens, P. (2012). Scopus custom data documentation. Retrieved
from https://p.widencdn.net/mrbekb/Scopus_Custom_Data_
Documentation_Version9

Bornmann, L., & De Moya Anegón, F. (2019). Hot and cold spots in
the US research: A spatial analysis of bibliometric data on the
institutional level. Journal of Information Science, 45(1), 84–91.
https://doi.org/10.1177/0165551518782829

Bornmann, L., & Waltman, L. (2011). The detection of “hot regions”
in the geography of science—A visualization approach
by using density maps. Journal of Informetrics, 5(4), 547–553.
https://doi.org/10.1016/j.joi.2011.04.006

Bornmann, L., Marx, W., Schier, H., Rahm, E., Thor, A., & Daniel,
H.-D. (2009). Convergent validity of bibliometric Google Scholar
data in the field of chemistry—Citation counts for papers that were
accepted by Angewandte Chemie International Edition or rejected
but published elsewhere, using Google Scholar, Science Citation
Index, Scopus, and Chemical Abstracts. Journal of Informetrics,
3(1), 27–35. https://doi.org/10.1016/j.joi.2008.11.001

CWTS. (2019). CWTS Journal Indicators. Retrieved from
http://www.journalindicators.com/

DZHW. (2019). Competence centre for bibliometrics. Retrieved
from https://www.dzhw.eu/en/forschung/projekt?pr_id=484

Elsevier. (2011). International comparative performance of the UK
research base—2011. Retrieved from https://www.elsevier.com/
research-intelligence/resource-library/international-comparative-
performance-of-the-uk-research-base-2011

Elsevier. (2013). International comparative performance of the UK
research base—2013. Retrieved from https://www.elsevier.com/
research-intelligence/research-initiatives/BIS2013

Elsevier. (2016). International comparative performance of the UK
research base—2016. Retrieved from https://www.elsevier.com/
research-intelligence/research-initiatives/beis2016

Elsevier. (2017). Gender in the global research landscape.
Amsterdam: Elsevier. Retrieved from https://www.elsevier.com/
research-intelligence/resource-library/ty/gender-in-the-global-
research-landscape

Elsevier. (2019a). Academic research. Retrieved from
https://dev.elsevier.com/academic_research_scopus.html

Elsevier. (2019b). Content—how Scopus works. Retrieved from
https://www.elsevier.com/solutions/scopus/how-scopus-works/content

Elsevier. (2019c). Text and data mining policy. Retrieved from
https://www.elsevier.com/about/policies/text-and-data-mining

Follett, R., & Strezov, V. (2015). An analysis of citizen science
based research: Usage and publication patterns. PLoS ONE.
https://doi.org/10.1371/journal.pone.0143687

Holland, K., Brimblecombe, P., Meester, W. J., & Steiginga, S.
(2019). The importance of high-quality content: Curation and
re-evaluation in Scopus. Elsevier. Retrieved from https://www.
elsevier.com/research-intelligence/resource-library/scopus-high-
quality-content

International Center for the Study of Research. (2019). Retrieved
from https://www.elsevier.com/icsr

Ioannidis, J. P., Klavans, R., & Boyack, K. B. (2018). Thousands of
scientists publish a paper every five days. Nature, 561, 167–169.
https://doi.org/10.1038/d41586-018-06185-8

Ioannidis, J., Baas, J., Klavans, R., & Boyack, K. (2019). A standardized
citation metrics author database annotated for scientific
field. PLOS Biology, 17(8), e3000384. https://doi.org/10.1371/
journal.pbio.3000384

Klavans, R., & Boyack, K. W. (2017). Research portfolio analysis
and topic prominence. Journal of Informetrics. https://doi.org/
10.1016/j.joi.2017.10.002

Lee, D. S. (2012). Collaboration network patterns and research
performance: The case of Korean public research institutions.
Scientometrics, 91(3), 925–942. https://doi.org/10.1007/s11192-
011-0602-8

Lerchenmueller, M. J., & Sorenson, O. (2018). The gender gap in
early career transitions in the life sciences. Research Policy, 47(6),
1007–1017. https://doi.org/10.1016/j.respol.2018.02.009

Levitt, J. M., & Thelwall, M. (2008). Is multidisciplinary research more
highly cited? A macrolevel study. Journal of the American Society for
Information Science and Technology, 59(12), 1973–1984.
https://doi.org/10.1002/asi.20914

Leydesdorff, L. (2010). Journal maps on the basis of Scopus data: A
comparison with the Journal Citation Reports of the ISI. Journal of
the American Society for Information Science and Technology,
61(2), 352–369. https://doi.org/10.1002/asi.21250

Leydesdorff, L., & Persson, O. (2010). Mapping the geography of
science: Distribution patterns and networks of relations among
cities and institutes. Journal of the American Society for Information
Science and Technology, 61(8), 1622–1634.
https://doi.org/10.1002/asi.21347

Leydesdorff, L., Bornmann, L., & Wagner, C. S. (2019). The relative
influences of government funding and international collaboration
on citation impact. Journal of the Association for Information
Science and Technology, 70(2). https://doi.org/10.1002/asi.24109

Maflahi, N., & Thelwall, M. (2016). When are readership counts as
useful as citation counts? Scopus versus Mendeley for LIS journals.
Journal of the Association for Information Science and
Technology, 67(1), 191–199. https://doi.org/10.1002/asi.23369

Mischo, W. H., & Schlembach, M. C. (2018). A system for generating
research impact visualizations over medical research
groups. Journal of Electronic Resources in Medical Libraries, 15(2),
96–107. https://doi.org/10.1080/15424065.2018.1507773

Mutz, R., Bornmann, L., de Moya Anegón, F., & Stefaner, M. (2014).
Ranking and mapping of universities and research-focused insti-
tutions worldwide based on highly-cited papers: A visualisation
of results from multi-level models. Online Information Review,
43–58. https://doi.org/10.1108/OIR-12-2012-0214

National Science Board. (2016). Science and engineering indicators
2016. National Science Foundation. Retrieved from https://
www.nsf.gov/statistics/2016/nsb20161/

National Science Board. (2018). Science and engineering indicators
2018. National Science Foundation. Retrieved from https://
www.nsf.gov/statistics/2018/nsb20181/

Pina, D. G., Barac, L., Buljan, I., Grimaldo, F., & Marušic, A. (2019).
Effects of seniority, gender and geography on the bibliometric out-
put and collaboration networks of European Research Council
(ERC) grant recipients. PLoS ONE, 14(2), e0212286. https://doi.org/
10.1371/journal.pone.0212286

Preziosi, N., Fako, P., Hristov, H., Jonkers, K., Goenaga, X., Alves
Dias, P., … Hristov, H. (2019). China—Challenges and prospects
from an industrial and innovation powerhouse. JRC.
https://doi.org/10.2760/445820

Schloegl, C., & Gorraiz, J. (2010). Comparison of citation and usage
indicators: The case of oncology journals. Scientometrics, 82(3),
567–580. https://doi.org/10.1007/s11192-010-0172-1

Schotten, M., el Aisati, M., Meester, W. J., Steiginga, S., & Ross, C. A.
(2017). A brief history of Scopus: The world’s largest abstract and
citation database of scientific literature. In F. J. Cantú-Ortiz (Ed.),
Research Analytics. Boosting University Productivity and Competitiveness
through Scientometrics (pp. 31–58). Boca Raton, FL: Taylor
& Francis Group. http://dx.doi.org/10.1201/9781315155890-3

Science-Metrix. (2014). Analysis of bibliometric indicators for
European policies. European Commission. Retrieved from
https://ec.europa.eu/research/innovation-union/pdf/bibliometric_indicators_for_european_policies.pdf

Scimago Lab. (2019). SJR. Retrieved from https://www.scimagojr.com/

SciTech Strategies. (2018). SciTech Strategies. Retrieved from

Home

Singh Chawla, D. (2019). Elsevier investigates hundreds of peer reviewers
for manipulating citations. Nature, 573, 174.
https://doi.org/10.1038/d41586-019-02639-9

Solomon, D. J., Laakso, M., & Björk, B.-C. (2013). A longitudinal
comparison of citation rates and growth among open access
journals. Journal of Informetrics, 7(3), 642–650. https://doi.org/
10.1016/j.joi.2013.03.008

The Lisbon Council, CWTS, & Esade. (2018). OPEN science monitor—
draft methodological note. Retrieved from https://ec.europa.eu/
info/sites/info/files/open_science_monitor_methodological_note_
v2.pdf

Thelwall, M. (2019). The influence of highly cited papers on field
normalised indicators. Scientometrics, 118(2), 519–537. https://
doi.org/10.1007/s11192-018-03001-y

Thelwall, M., Bailey, C., Tobin, C., & Bradshaw, N.-A. (2019). Gender
differences in research areas, methods and topics: Can people and
thing orientations explain the results? Journal of Informetrics, 13(1),
149–168. https://doi.org/10.1016/j.joi.2018.12.002

Thelwall, M., & Fairclough, R. (2015). Geometric journal impact factors
correcting for individual highly cited articles. Journal of
Informetrics, 9(2), 263–272. https://doi.org/10.1016/j.joi.2015.
02.004

Thelwall, M., & Kousha, K. (2017). ResearchGate articles: Age, discipline,
audience size, and impact. Journal of the Association for
Information Science and Technology, 68(2), 468–479. https://
doi.org/10.1002/asi.23675

Thelwall, M., & Wilson, P. (2014). Distributions for cited articles
from individual subjects and years. Journal of Informetrics, 8(4),
824–839. https://doi.org/10.1016/j.joi.2014.08.001

Thelwall, M., & Wilson, P. (2016). Mendeley readership altmetrics for
medical articles: An analysis of 45 fields. Journal of the Association
for Information Science and Technology, 67(8), 1962–1972. https://
doi.org/10.1002/asi.23501

Van Noorden, R., & Singh Chawla, D. (2019). Hundreds of extreme
self-citing scientists revealed in new database. Nature, 572(7771),
578–579. https://doi.org/10.1038/d41586-019-02479-7

Wagner, C., & Jonkers, K. (2017). Open countries have strong science.
Nature, 550, 32–33. https://doi.org/10.1038/550032a

What is ORCID? (2018). Retrieved from
https://support.orcid.org/hc/en-us/articles/360006897674
