RESEARCH ARTICLE

Crossref as a bibliographic discovery tool
in the arts and humanities

Ángel Borrego1, Jordi Ardanuy1, and Llorenç Arguimbau2

1Departament de Biblioteconomia, Documentació i Comunicació Audiovisual & Centre de Recerca en Informació,
Comunicació i Cultura (CRICC), Universitat de Barcelona, Barcelona, Spain
2ContextI+D, Cerdanyola del Vallès, Spain

Keywords: arts, citation indexes, Crossref, humanities, scholarly communication

ABSTRACT

Crossref is an official digital object identifier registration agency launched in 2000 as a joint
effort between publishers to allow persistent cross-publisher citation linking in online
academic journals. Our study explores the coverage of Crossref for tracking literature in the
arts and humanities, which usually has a national or regional focus and targets domestic
audiences. An analysis of the coverage of ERIH PLUS journals shows that Crossref indexes
more sources than Scopus and includes additional journals from Eastern and Southern Europe
and the Global South. Crossref limitations arise when analyzing the amount of metadata
deposited by publishers. Just two-thirds of the journals deposit abstracts and ORCIDs and
around a third deposit affiliations. The level of metadata completion for individual articles is
lower, with major differences depending on the language of the document. Just half of the
journals actually deposit references. As a result, Scopus retrieves more citations than Crossref,
except for publications in German and French. Crossref represents a promising bibliographic
discovery tool in the arts and humanities but is in need of improvement regarding the level of
metadata completion.

1. INTRODUCTION

Several bibliographic data sources have appeared in recent years, thereby diversifying the set
of tools available for searching for academic literature. In contrast to traditional bibliographic
databases provided by commercial companies such as Scopus (Baas, Schotten et al., 2020)
and Web of Science (Birkle, Pendlebury et al., 2020), some of these bibliographic information
providers offer metadata available openly to the public. These metadata are license-free
because metadata are facts: They cannot be owned, and therefore they have no license.
One of the most important open metadata infrastructure systems in this information landscape
is Crossref, an official digital object identifier (DOI) registration agency (Hendricks, Tkaczyk
et al., 2020).

an open access journal

Citation: Borrego, Á., Ardanuy, J., & Arguimbau, L. (2023). Crossref as a bibliographic discovery tool in the arts
and humanities. Quantitative Science Studies, 4(1), 91–104. https://doi.org/10.1162/qss_a_00240

DOI: https://doi.org/10.1162/qss_a_00240

Peer Review: https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00240

Received: 5 April 2022
Accepted: 27 December 2022

Corresponding Author: Ángel Borrego, borrego@ub.edu

Handling Editor: Ludo Waltman

Copyright: © 2023 Ángel Borrego, Jordi Ardanuy, and Llorenç Arguimbau. Published under a Creative Commons
Attribution 4.0 International (CC BY 4.0) license.

The MIT Press

Crossref1 is a not-for-profit association that provides most persistent identifiers assigned to
academic publications and publishes the metadata associated with these publications. It was
launched in 2000 as a collaborative effort by publishers to enable persistent cross-publisher
citation linking between academic journals. DOIs are used to uniquely identify digital objects
(articles, data sets, monographs, reports, etc.). A DOI takes the form of a character string
divided into two parts, a prefix and a suffix, separated by a slash (e.g., 10.1000/173). The
DOI remains fixed over the lifetime of the document and is tied to its metadata, including
the URL, thus providing access to the document. Referring to an online document by its
DOI supposedly provides a more stable link than simply using its URL. However, for this to
happen, publishers must update metadata in the event of a change in URL so that the DOI
links to the new URL.

1 https://www.crossref.org.
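
The prefix/suffix structure of a DOI described above can be illustrated with a minimal Python sketch. The function name `split_doi` and the list of resolver prefixes it strips are assumptions made for this example; they are not part of the study's code, which was written in R.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    # Accept DOIs written as resolver URLs, e.g. https://doi.org/10.1000/173.
    for resolver in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(resolver):
            doi = doi[len(resolver):]
            break
    # Only the first slash separates prefix and suffix; the suffix may
    # itself contain further slashes.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


print(split_doi("10.1000/173"))
print(split_doi("https://doi.org/10.1162/qss_a_00240"))
```

Splitting at the first slash only is a deliberate choice here, so that a suffix containing slashes is kept intact.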

The open availability of document metadata through Crossref underlies the proliferation of
new academic information services (Martín-Martín, 2021): academic search engines such as
Dimensions and Lens; reference management software such as Zotero; and services to identify
open access versions of academic publications such as Unpaywall.

Multidisciplinary bibliographic databases such as Scopus and Web of Science have tradi-
tionally been criticized for their limitations in terms of tracking research in the social sciences
and humanities (Mongeon & Paul-Hus, 2016). Research in these fields frequently has a
national or regional focus and targets domestic audiences. As a result, a considerable number
of academic publications in these fields are published in national or regional journals outside
the coverage of Scopus and Web of Science (Nederhof, 2006). Several studies have identified
an overrepresentation of English language journals and English-speaking countries and an
underrepresentation of documents from the arts, humanities, and social sciences in both
Web of Science and Scopus, although the latter has much wider coverage (Mongeon &
Paul-Hus, 2016; Vera-Baceta, Thelwall, & Kousha, 2019). A quick search shows that 93.1%
of articles, reviews, and proceedings indexed in Scopus in 2020 were in English, whereas the
figure for Web of Science was 96.5%. Google Scholar provides better coverage than Scopus
and Web of Science but is limited in terms of usage, thereby reducing its usefulness for large-
scale citation analyses (Martín-Martín, Orduna-Malea et al., 2018).

Our study aims to explore the coverage of Crossref for tracking literature in the arts and
humanities by analyzing its coverage of journals in ERIH PLUS2, an index containing biblio-
graphic information on academic journals in the social sciences and humanities. ERIH stands
for “European Reference Index for the Humanities.” However, in 2014, the list was renamed
ERIH PLUS, to indicate that it had been extended to include social science disciplines as well.
Because it is hard to draw a precise line between humanities and social sciences, we have
considered all journals listed in ERIH PLUS. The inclusion of social science journals in the list
should be borne in mind when analyzing the results. Crossref coverage was also compared
with that of Scopus, with a special focus on geographical differences in the indexing of journals
by both sources. In addition, the amount of metadata present in Crossref records was
measured for a sample of articles in the arts and humanities published in 2020 in eight
languages. Finally, the number of citations to this sample of articles retrieved by Crossref and
Scopus was compared.

2. BACKGROUND

2.1. What Is Crossref?

Crossref was created as a neutral party among publishers to enable the exchange of links
between article reference lists through DOIs. It was envisioned as a digital archive of journals,
accessible free of charge and with the added value of reference linking (Crossref, 2009, p. 8).

2 https://kanalregister.hkdir.no/publiseringskanaler/erihplus/.

The metadata deposited by publishers for bibliographic works includes the reference lists.
Crossref uses these references to create links between works that cite each other. The number
of citations each work receives is visible to anyone through Crossref public APIs. In addition,
Crossref members who deposit references can retrieve the full list of citing works (not just the
count), and can display them on their website3. Currently, journal content represents the
largest subset of Crossref content, given that it accounted for 73% of the 106 million records
registered in 2019 (Hendricks et al., 2020).
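
As a minimal sketch of how the public citation count can be read, the Python snippet below builds the request URL for a single work in the Crossref REST API and extracts the `is-referenced-by-count` field from a response. The helper names and the sample response are hypothetical and for illustration only; no network request is made, and the study itself used the R package rcrossref.

```python
import json
from urllib.parse import quote

API_ROOT = "https://api.crossref.org/works/"


def work_url(doi):
    """Build the REST API URL for one work, percent-encoding unusual characters."""
    return API_ROOT + quote(doi, safe="/")


def citation_count(response):
    """Read the DOI-to-DOI citation count from a parsed /works/{doi} response."""
    return response["message"].get("is-referenced-by-count", 0)


# Hypothetical, heavily truncated response used only to show the parsing step.
sample_response = json.loads(
    '{"status": "ok", "message": {"DOI": "10.1000/173", "is-referenced-by-count": 12}}'
)

print(work_url("10.1162/qss_a_00240"))
print(citation_count(sample_response))
```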

Crossref asks members to deposit as much rich metadata as possible, including the list of
references. Until recently, members could choose whether their references were “closed”
(only used for the “cited-by” service, but not distributed through any public interface), “lim-
ited” (organizations that signed an agreement for a subscription-based service could access
these references) or “open” (available to everyone through open APIs) (Hendricks et al.,
2020, p. 425). However, this “reference distribution preference” was removed and, since 3
June 2022, all references in Crossref are treated as open metadata4.

The Initiative for Open Citations (I4OC)5 is an advocacy group that campaigns to encour-
age publishers to make references of their academic publications openly available. Based on
these data, the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI)6 has been
developed (Heibi, Peroni, & Shotton, 2019; Peroni & Shotton, 2020).

2.2. How Does Crossref Compare to Other Bibliographic Data Sources?

One of the first systematic studies to compare Crossref to other bibliographic databases
was conducted by Harzing (2019), who concluded that it might serve as a good alternative
to Scopus and Web of Science, although Google Scholar and Microsoft Academic7 were
the most comprehensive free sources of bibliographic information. In a subsequent
study, Chudlarský and Dvořák (2020) studied whether Crossref could replace Web of
Science for research evaluation purposes using the Czech Technical University in Prague as
a case study. They observed that just 53.7% of Web of Science citation links were present
in COCI.

Martín-Martín, Thelwall et al. (2021) compared the coverage of more than three million
citations to a sample of highly cited documents in six data sources. They concluded that Goo-
gle Scholar was the most comprehensive source, whereas COCI was the smallest, given that it
retrieved just 28% of all citations. However, an update in September 2021 showed that COCI
coverage had increased to cover up to 53% of citations (Martín-Martín, 2021).

Van Eck and Waltman (2021) focused on the amount of metadata provided by Crossref to
measure the availability of six elements in Crossref: reference lists, abstracts, ORCIDs, author
affiliations, funding information, and license information. They observed that coverage had
improved with respect to previous measurements, although there were significant differences
in the submission of metadata among publishers. A subsequent study by Visser, Van Eck, and
Waltman (2021) compared Crossref with four multidisciplinary bibliographic data sources:
Dimensions, Microsoft Academic, Scopus, and Web of Science. In terms of size, Crossref
covered 35 million documents published in the period 2008–2017, which was substantially
more than Scopus and Web of Science. However, in terms of references, 58% of the citation
links in Scopus could not be retrieved from Crossref. As mentioned above, some publishers
deposited documents in Crossref without references, whereas others did not make them openly
available.

3 https://www.crossref.org/services/cited-by/.
4 https://www.crossref.org/blog/amendments-to-membership-terms-to-open-reference-distribution-and-include-uk-jurisdiction/.
5 https://i4oc.org.
6 https://opencitations.net/index/coci.
7 Microsoft Academic was discontinued at the end of 2021: https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-to-expand-horizons-with-community-driven-approach/.

In the health sciences, Liang, Mao et al. (2021) investigated the coverage and citation qual-
ity of five freely available data sources for 30 million PubMed documents. Dimensions was the
most comprehensive data source, given that it provided references for 62.4% of the documents,
whereas COCI covered 34.7%.

Beyond comparative studies on the coverage of different data sources, the Ministry of Edu-
cation and Science of Ukraine launched the Open Ukrainian Citation Index (OUCI)8, a search
engine and citation database that comprises citations from all publishers that use Crossref’s
“cited-by” service (Cheberkus & Nazarovets, 2019). Based on this tool, Mryglod, Nazarovets,
and Kozmenko (2021) conducted a disciplinary analysis of Ukrainian economic research
based on Crossref data.

3. OBJECTIVES

This article aims to explore the coverage of academic publications in the arts and humanities
in Crossref for tracking the literature in these fields, with a special focus on geographical and
linguistic coverage. The study is underpinned by the following research questions:

1. To what extent are ERIH PLUS journals covered by Crossref?
2. How does the coverage of Crossref compare with that of Scopus for ERIH PLUS journals?
3. Are there any geographical differences in the coverage of both sources?
4. To what extent are the metadata of individual articles deposited in Crossref?
5. How does the number of citations received by articles in Crossref compare to the number
of citations received in Scopus?

4. METHODS

4.1. Journal-Level Comparison

A possible approach for comparing the coverage of several bibliographic databases would be
to record all the journals indexed by each source in a single list. It would then be possible to
measure the extent to which each source covers the whole set of journals. This approach was
not feasible in our study, as Crossref does not support subject searching9 and it was therefore
not possible to identify the arts and humanities journals indexed. Instead, we used ERIH PLUS
as the initial source of journals and measured the coverage of Crossref and Scopus against
this list.

ERIH PLUS is an index that holds bibliographic information on academic journals in the
social sciences and humanities. Journals submitted for inclusion in ERIH PLUS are evaluated
based on several criteria related to editorial quality, authorship, transparency, etc. At the time
of data collection, February 2022, ERIH PLUS listed 10,213 journals.

8 https://ouci.dntb.gov.ua/en/.
9 https://community.crossref.org/t/retrieve-subjects-and-subject-from-journals-and-works/2403.

The set of journals listed in ERIH PLUS was compared with the journals indexed in Scopus
considering print and online ISSNs10. Similarly, all ISSNs were searched in Crossref through its
public API using the R package rcrossref11.
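
The matching logic described above, in which a journal counts as covered when either its print or its online ISSN is found in the other source, can be sketched in Python as follows. The journal titles, the ISSNs, and the helper names are hypothetical placeholders; the authors' actual implementation is the R code referenced in Section 4.3.

```python
def normalize_issn(issn):
    """Strip hyphens and whitespace so '1234-567x' and '1234567X' compare equal."""
    return issn.replace("-", "").strip().upper()


def is_covered(journal_issns, indexed_issns):
    """A journal counts as covered if any of its ISSNs appears in the index."""
    return any(normalize_issn(i) in indexed_issns for i in journal_issns if i)


# Hypothetical reference list: each journal has a print and/or an online ISSN.
erih_plus = [
    {"title": "Journal A", "issns": ["1111-1111", "2222-2222"]},
    {"title": "Journal B", "issns": ["3333-3333", None]},
    {"title": "Journal C", "issns": [None, "4444-444X"]},
]
# Hypothetical set of ISSNs found in the source being evaluated.
indexed = {normalize_issn(i) for i in ["2222-2222", "4444-444x", "9999-9999"]}

covered = [j["title"] for j in erih_plus if is_covered(j["issns"], indexed)]
share = round(100 * len(covered) / len(erih_plus))
print(covered, f"{share}%")
```

Matching on either ISSN avoids undercounting journals that one source lists under the print ISSN and the other under the online ISSN.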

To identify any geographical differences in the coverage of journals by both sources, we
classified the journals’ countries of publication provided by ERIH PLUS according to the
geographical regions used by the United Nations Statistics Division in its publications and
databases. This division compiles and disseminates global statistical information, develops
standards and norms for statistical activities, and supports countries’ efforts to strengthen their
national statistical systems12.

4.2. Article-Level Comparison

To determine the extent to which metadata (abstracts, ORCIDs, affiliations, funders, licenses,
and references) were present in individual records, we built a sample of articles in the arts and
humanities published in 2020. As Crossref does not support subject searching, we retrieved all
journal content (mainly articles, but also reviews, editorials, letters, etc.) from Scopus with a
DOI classified in the arts and humanities published in English in 2020 that had received three
or more citations at the time of data collection in February 2022 (n = 17,054), and all journal
content with a DOI classified in the arts and humanities in the seven languages with output of
at least 1,000 documents in 2020: Spanish (n = 7,330), Russian (n = 4,696), French (n =
3,330), Italian (n = 1,864), Portuguese (n = 1,760), German (n = 1,583), and Polish (n =
1,127). The query used to retrieve the records from Scopus was as follows:

SUBJAREA(arts) AND DOI(10.*) AND (LIMIT-TO(SRCTYPE,"j")) AND
(LIMIT-TO(PUBYEAR,2020))

We searched all DOIs in Crossref through its public API using the R package rcrossref. In
addition to the analysis of the metadata deposited, we compared the number of citations
received by each article according to both sources, Scopus and Crossref. In the case of
Crossref, we considered DOI-to-DOI citations, the ones recorded by the source. In the case of
Scopus, we considered all citations, including the ones received from documents without a DOI.
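
A minimal Python sketch of the article-level check is given below: for one work record returned by the Crossref REST API, it reports which of the metadata elements studied here are present. The function name and the sample record are hypothetical; the field names (`abstract`, `author`, `ORCID`, `affiliation`, `funder`, `license`, `reference`, `is-referenced-by-count`) are those used in Crossref work records. This is an illustration, not the authors' R code.

```python
def metadata_flags(work):
    """Summarize which metadata elements are present in one Crossref work record."""
    authors = work.get("author", [])
    return {
        "abstract": bool(work.get("abstract")),
        "funders": bool(work.get("funder")),
        "licenses": bool(work.get("license")),
        "references": bool(work.get("reference")),
        "authors": len(authors),
        # Author-level elements are counted per author, without deduplication.
        "orcids": sum(1 for a in authors if a.get("ORCID")),
        "affiliations": sum(1 for a in authors if a.get("affiliation")),
        "citations": work.get("is-referenced-by-count", 0),
    }


# Hypothetical record: an abstract and a license, but no funders or references.
sample_work = {
    "DOI": "10.1000/173",
    "abstract": "<jats:p>Hypothetical abstract.</jats:p>",
    "author": [
        {
            "family": "Doe",
            "ORCID": "https://orcid.org/0000-0000-0000-0000",
            "affiliation": [{"name": "Example University"}],
        },
        {"family": "Roe", "affiliation": []},
    ],
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    "is-referenced-by-count": 3,
}

print(metadata_flags(sample_work))
```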

4.3. Source Code and Data Availability

The R code to retrieve data from Crossref and to reproduce the analysis is available at https://
github.com/angbor09/crossref_humanities/.

5. RESULTS

5.1. Journal Coverage

Scopus indexed 49% of the journals listed in ERIH PLUS, while Crossref indexed 80%
(Table 1). There were, however, major differences in the coverage of journals by both sources
depending on the world region in which journals are published. Thus, Scopus presented wide
coverage of journals published in North America (86% of the journals listed in ERIH PLUS),
Oceania (86%), Northern Europe (85%), and Western Europe (74%). These percentages were
lower for journals published in Southern Europe (42%), Africa (41%), Eastern Europe (24%),
Asia (21%), and Latin America and the Caribbean (12%).

10 https://www.scopus.com/sources.uri.
11 https://github.com/ropensci/rcrossref.
12 https://unstats.un.org/UNSDWebsite/; https://unstats.un.org/unsd/methodology/m49/.

Table 1. ERIH PLUS journals covered by Scopus and Crossref by world region

World region                       ERIH PLUS   Journals  % journals     Journals   % journals
                                    journals  in Scopus   in Scopus  in Crossref  in Crossref
Africa                                    41         17          41           23           56
Asia                                     357         74          21          284           80
Europe (East)                          2,280        547          24        1,666           73
Europe (North)                         1,590      1,351          85        1,489           94
Europe (South)                         1,843        778          42        1,210           66
Europe (West)                          1,265        938          74        1,086           86
Latin America and the Caribbean        1,412        173          12        1,078           76
North America                          1,231      1,064          86        1,154           94
Oceania                                   51         44          86           45           88
Not available                            143         40          28          113           79
Total                                 10,213      5,026          49        8,148           80

Crossref presented better coverage of journals published worldwide. Like Scopus, it cov-
ered ERIH PLUS journals published in North America (94%), Northern Europe (94%), Oceania
(88%), and Western Europe (86%). Coverage was also wide for Asia (80%), Latin America and
the Caribbean (76%), and Eastern Europe (73%). The regions with lowest coverage were
Southern Europe (66%) and Africa (56%), although in both cases the coverage was higher than
that provided by Scopus.
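
As a worked check of the coverage figures, the short Python sketch below recomputes the regional and overall percentages from the journal counts reported in Table 1. The variable and function names are chosen for this example only.

```python
# (ERIH PLUS journals, journals in Scopus, journals in Crossref), from Table 1.
table1 = {
    "Africa": (41, 17, 23),
    "Asia": (357, 74, 284),
    "Europe (East)": (2280, 547, 1666),
    "Europe (North)": (1590, 1351, 1489),
    "Europe (South)": (1843, 778, 1210),
    "Europe (West)": (1265, 938, 1086),
    "Latin America and the Caribbean": (1412, 173, 1078),
    "North America": (1231, 1064, 1154),
    "Oceania": (51, 44, 45),
    "Not available": (143, 40, 113),
}


def pct(part, whole):
    """Share of `part` in `whole`, as a whole-number percentage."""
    return round(100 * part / whole)


# Column totals across all regions.
erih_total, scopus_total, crossref_total = (sum(col) for col in zip(*table1.values()))

for region, (erih, scopus, crossref) in table1.items():
    print(f"{region}: Scopus {pct(scopus, erih)}%, Crossref {pct(crossref, erih)}%")
print(f"Total: Scopus {pct(scopus_total, erih_total)}%, "
      f"Crossref {pct(crossref_total, erih_total)}%")
```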

When detailing the metadata deposited by journal publishers, Crossref makes a distinction
between “backfile” records (i.e., those with a publication date older than 2 years) and
“current” records (i.e., those published within the last 2 years)13. Table 2 details the metadata
deposited by publishers of journals listed in ERIH PLUS for “current” records (i.e., for
articles published in the past 2 years). When searching for an ISSN, Crossref returns a set of
information for the journal, including logical fields (“True” or “False”) indicating whether the
journal deposits abstracts, ORCIDs, etc. The value is “True” as long as at least one article has an
abstract (or an ORCID, etc.). For example, the finding that 64% of journals deposit abstracts
means that 64% of the journals had deposited at least one abstract.
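
The difference between a journal-level logical field and article-level completeness can be made concrete with a small Python sketch. The journal below is hypothetical, and the two helper functions are illustrative rather than part of any Crossref interface.

```python
def journal_flag(has_field):
    """Journal-level logical field: True as soon as one article carries the element."""
    return any(has_field)


def article_share(has_field):
    """Article-level completeness: percentage of articles carrying the element."""
    return round(100 * sum(has_field) / len(has_field)) if has_field else 0


# Hypothetical journal with 50 recent articles, only one of which has an abstract.
abstracts_present = [True] + [False] * 49

print(journal_flag(abstracts_present))
print(article_share(abstracts_present))
```

A journal of this kind would be counted among those that deposit abstracts even though almost none of its articles has one, which is why the article-level analysis in Section 5.2 is needed.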

The amount and type of metadata deposited in Crossref varied greatly depending on the
world region in which the journal was published. Thus, journals published in Latin America
and the Caribbean (86%), Southern Europe (83%), and Eastern Europe (75%) were most likely
to deposit abstracts for their articles. Publishers in Northern Europe were most likely to deposit
ORCIDs (78%) and affiliations (67%), whereas publishers in Latin America and the Caribbean
tended to deposit ORCIDs (77%) but not affiliations (11%). Information on research
funding was most frequently deposited by journals published in Northern Europe (62%) and,
to a lesser extent, in North America (51%). Publishers in Northern Europe (84%) usually deposited
information on articles’ licenses, whereas this information was provided to a lesser extent
for journals published in other world regions.

13 https://github.com/CrossRef/rest-api-doc/issues/47.

Table 2. Metadata deposited in Crossref by publishers of journals listed in ERIH PLUS by world region

World region                       Abstracts  ORCIDs  Affiliations  Funders  Licenses  Open references  References
                                         (%)     (%)           (%)      (%)       (%)              (%)         (%)
Africa                                    43      57            30       17        43               74          43
Asia                                      30      26             8        4        17               55          54
Europe (East)                             75      58            24        3        38               59          34
Europe (North)                            49      78            67       62        84               89          79
Europe (South)                            83      67            12        8        44               54          30
Europe (West)                             57      53            39       35        56               84          61
Latin America and the Caribbean           86      77            11        3        66               46          18
North America                             46      59            48       51        59               76          66
Oceania                                   47      71            56       49        62               80          64
Not available                             69      67            26       18        50               68          47
Total                                     64      64            33       26        56               68          49

Until June 2022, Crossref members could set reference distribution to open, limited, or
closed. However, this setting was not linked to the actual submission of references. Most journals
(68%), especially those in Northern Europe (89%) and Western Europe (84%), had used
the default setting of open. However, just half of the journals (49%) actually registered references,
whether open or not, for articles published in the past 2 years.

5.2. Article Metadata

The fact that a publisher has deposited metadata for articles published within the past 2 years
does not mean that it has done so systematically for all its records. Therefore, to determine the
extent to which publishers actually deposit metadata in Crossref, we built a sample of articles.
Given the importance of domestic journals in the dissemination of academic publications in
the arts and humanities and the inequalities observed in the coverage of journals in different
world regions, we analyzed the presence of metadata for articles in English that had received
three or more citations at the time of data collection and for all articles in the seven languages
with output of more than 1,000 articles in Scopus in 2020 (Table 3).

Most of the arts and humanities articles indexed by Scopus in 2020 were also present in
Crossref, with coverage ranging from 86% for articles in Polish to 99% for articles in English,
which was the most frequent language in the sample. The only major exception was for arti-
cles in Italian, with just a quarter (27%) of the articles indexed in Scopus present in Crossref.

There were major differences in the amount and type of metadata deposited depending on
the language of the document. Thus, most articles in Portuguese (81%), Spanish (71%), and
Polish (68%) had an abstract, whereas this percentage dropped to 31% for articles in
English. By contrast, 88% of the articles in English included references, the next highest rate
being in Portuguese (45%). Thirty-five percent of the articles in English included funding
information, but in other languages this information appeared only very rarely.

Table 3. Article metadata deposited by publishers in Crossref

Language     Articles      Retrieved          %  Abstracts    % with  Funders   % with  Licenses   % with  References      % with
            in Scopus  from Crossref  retrieved             abstract           funders            license              references
English        17,054         16,953         99      5,280        31    5,939       35    11,714       69      14,947          88
Spanish         7,330          6,698         91      4,735        71       76        1     2,844       42       1,503          22
Russian         4,696          4,430         94      1,944        44       41        1       397        9         286           9
French          3,330          3,031         91        498        16       26        1       580       19         928          21
Italian         1,864            510         27        255        50        1        0       114       22         103          20
Portuguese      1,760          1,697         96      1,369        81        8        0     1,274       75         653          45
German          1,583          1,453         92        649        45       68        5       244       17         238          14
Polish          1,127            965         86        651        67        2        0       329       34          69           7

The presence or absence of metadata does not necessarily reflect the commitment of publishers
to provide information. Some fields may not be applicable to certain articles. This is
the case, for example, with editorials or letters that lack abstracts or articles that do not
acknowledge any source of funding. Therefore, the figures in Table 3 cannot be assessed against
the supposed ideal of 100% completion, although articles and reviews accounted for 96% of
the documents in the sample. It is difficult to make comparisons with Scopus given its export
limits. However, for a sample of the two thousand most cited articles in each language (n =
15,123), Scopus provided abstracts in 83% of the records and funding information in just
8%. As with Crossref, funding information in Scopus was mostly available for articles in
English.

To determine the extent to which authors’ ORCIDs and affiliations were deposited, we
retrieved all authors from the sample of articles. We did not remove duplicates, but considered
the presence of this information in the metadata of each article published by any given author.
Table 4 shows significant differences in the presence of this information according to the language
of the document. Thus, authors’ ORCIDs were mostly present for outputs in Portuguese,
Polish and, to a much lesser extent, Spanish. By contrast, affiliations were present in German
and English publications, although in neither case did this information reach half of the
authors. Scopus included affiliation metadata for 83% of the records in a sample of the most
cited articles in each language (n = 15,123).

Table 4. ORCID and affiliation metadata deposited by publishers in Crossref

Language    Authors  ORCIDs  % ORCIDs  Affiliations  % affiliations
English      51,835  13,458        26        21,969              42
Spanish       9,786   3,651        37           732               7
Russian       9,057   1,362        15           808               9
French        4,706     187         4           301               6
Italian         618      44         7           133              22
Portuguese    2,602   1,801        69           916              35
German        1,705      37         2           735              43
Polish        1,065     666        63            30               3

5.3. Number of Citations

Finally, we compared the number of citations received by each article according to both
sources, Scopus and Crossref (Table 5). To make the comparison meaningful, we restricted
the analysis to articles present in both sources.

Most of the articles in the sample were in English and all of them had received at least
three citations at the time of data collection. For outputs in this language, Scopus presented a
minor advantage over Crossref, given that it retrieved 6% more citations (Figure 1). For outputs
in other languages, there was no clear pattern. Crossref retrieved more citations than
Scopus for documents in German (+14%) and French (+2%), whereas Scopus retrieved more
citations for the remaining languages. In Russian (+98%) and Spanish (+78%), Scopus had
nearly double the number of citations retrieved by Crossref. In the remaining languages,
the number of outputs and citations was very small, thus limiting the meaningfulness of
the results.

Table 5. Citations to articles in Scopus and Crossref

Language    Articles  Citations in Scopus  Citations in Crossref
English       16,953              124,208                117,286
Spanish        6,698                1,708                    959
Russian        4,430                1,150                    582
French         3,031                1,145                  1,169
Italian          510                   69                     32
Portuguese     1,697                  260                    205
German         1,453                  296                    337
Polish           965                  103                     76
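
The relative differences quoted in this section can be reproduced from the citation counts in Table 5. The Python sketch below expresses the advantage of the source with more citations as a percentage of the other source's count; the function name `advantage` is chosen for this example only.

```python
# (citations in Scopus, citations in Crossref), from Table 5.
table5 = {
    "English": (124208, 117286),
    "Spanish": (1708, 959),
    "Russian": (1150, 582),
    "French": (1145, 1169),
    "German": (296, 337),
}


def advantage(scopus, crossref):
    """Return the source with more citations and its percentage lead over the other."""
    leader = "Scopus" if scopus >= crossref else "Crossref"
    high, low = max(scopus, crossref), min(scopus, crossref)
    return leader, round(100 * (high - low) / low)


for language, (scopus, crossref) in table5.items():
    leader, lead = advantage(scopus, crossref)
    print(f"{language}: {leader} +{lead}%")
```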

We compared the number of citations received by each output according to both sources,
although we did not analyze overlaps in citations in the two databases. However, Figure 2
shows a high level of association between the number of citations received by each output in
both sources for articles in English and French. The relationship was much weaker for articles
in other languages.

Figure 1. Citations by language of the publication.

Figure 2. Citations received by each output according to Scopus and Crossref by language of the publication.

6. DISCUSSION AND CONCLUSIONS

The results of our study illustrate the advantages and limitations of Crossref as a source of
bibliographic information in the arts and humanities. Crossref is an open resource built on the
information deposited by publishers. It indexes a larger share of ERIH PLUS journals than
Scopus. The additional journals covered are published mainly in Eastern and Southern Europe
and the so-called Global South (i.e., Africa, Asia, and Latin America)14. These results are
consistent with those of previous studies (Mongeon & Paul-Hus, 2016; Vera-Baceta et al., 2019)
that have revealed an overrepresentation of English language journals in Scopus, and are
noteworthy given that research in the arts and humanities frequently has a national or regional
focus and is published in domestic journals.

When searching for individual articles, the overwhelming majority of those indexed in
Scopus were also available in Crossref. The only major exception was articles in Italian, which
presented a very low level of coverage in Crossref. A series of online searches suggest that
Italian scholarly journals may be registering their DOIs not with Crossref but with another
DOI registration agency, namely mEDRA, “a brand of ediSer, the service company of the
Italian Publishers Association”15.

The limitations of Crossref became evident when we analyzed the amount of metadata
actually deposited by publishers. Less than two-thirds of the journals were found to deposit
abstracts, and those that did deposit this information did not do so systematically for all arti-
cles. Slightly more than half deposited license information, which is relevant to measure com-
pliance with open access mandates and open access availability.

The situation was similar regarding author information. Around two-thirds of the journals
deposited ORCIDs and a third deposited affiliations. However, the level of metadata
completion for individual articles was much lower, with major differences depending on the language
of the document.

The inclusion of reference lists in records is important to improve retrieval options and for
citation analysis. Our results suggest that most publishers were willing to share this information
and make the reference lists in their journals openly available, although they could opt to
make them “closed” or “limited.” Nevertheless, this situation has changed recently, because
new Crossref policies oblige publishers to make their references open. However, only half the
journals actually deposit lists of cited references in their articles.
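
The metadata elements discussed above (abstracts, licenses, ORCIDs, affiliations, and reference lists) can be tallied across a set of records. The sketch below assumes each record is a dictionary shaped like a work returned by the Crossref REST API, using the field names `abstract`, `license`, `author` (with `ORCID` and `affiliation`), and `reference`; the `completeness` helper and the sample records are hypothetical, and this is not necessarily how the figures in the study were computed.

```python
from typing import Any, Callable

Record = dict[str, Any]

# Each check reports whether one metadata element is present in a record.
CHECKS: dict[str, Callable[[Record], bool]] = {
    "abstract": lambda r: bool(r.get("abstract")),
    "license": lambda r: bool(r.get("license")),
    "orcid": lambda r: any(a.get("ORCID") for a in r.get("author", [])),
    "affiliation": lambda r: any(a.get("affiliation") for a in r.get("author", [])),
    "references": lambda r: bool(r.get("reference")),
}


def completeness(records: list[Record]) -> dict[str, float]:
    """Share of records (0 to 1) in which each metadata element is present."""
    if not records:
        return {name: 0.0 for name in CHECKS}
    return {
        name: sum(check(r) for r in records) / len(records)
        for name, check in CHECKS.items()
    }


# Hypothetical records (not data from the study).
sample = [
    {
        "abstract": "<jats:p>...</jats:p>",
        "author": [{"ORCID": "https://orcid.org/0000-0000-0000-0000", "affiliation": []}],
    },
    {
        "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
        "reference": [{"DOI": "10.1000/example"}],
    },
]
print(completeness(sample))
```

Grouping the records by journal or by language before calling `completeness` would give the kind of breakdown described in this section.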

Although it could be surmised that the significant presence of journals published outside
the Anglosphere16 in Crossref could increase the amount of citation data for outputs in non-
English languages, our results suggest that this is not the case. Except for outputs in German
and French, Scopus retrieves more citations than Crossref, possibly because most publishers
do not deposit reference lists. When interpreting this information, it should be borne in mind
that Crossref only considers DOI to DOI citations, whereas Scopus also includes references
received from documents without a DOI.

In summary, Crossref represents a promising source of information but is in need of
improvement as a bibliographic discovery tool in the arts and humanities. The number of journals
and articles indexed is vast and includes a large share of journals published outside North
America and Europe. However, the amount of metadata deposited by publishers remains
limited. Further research could examine the motivations behind publishers’ behavior in order
to make Crossref a more comprehensive, accurate, and up-to-date source of information.

14 https://en.wikipedia.org/wiki/Global_North_and_Global_South.
15 https://www.medra.org.
16 https://en.wikipedia.org/wiki/Anglosphere.

ACKNOWLEDGMENTS

We would like to thank the reviewers for their comments and suggestions, which helped to
improve the manuscript, and Crossref staff for replying to our questions.

AUTHOR CONTRIBUTIONS

Ángel Borrego: Conceptualization, Data curation, Formal analysis, Funding acquisition,
Investigation, Methodology, Software, Visualization, Writing: original draft, Writing: review &
editing. Jordi Ardanuy: Funding acquisition, Writing: review & editing. Llorenç Arguimbau:
Funding acquisition, Writing: review & editing.

COMPETING INTERESTS

The authors have no competing interests.

FUNDING INFORMATION

This work was supported by the grant PGC2018-096586-B-100 «Redes de colaboración cien-
tífica en ciencias sociales y humanidades en Europa: análisis de la participación en proyectos
y de la coautoría» funded by MCIN/AEI/10.13039/501100011033 and “ERDF A way of making
Europe”, by the European Union.

DATA AVAILABILITY

The data and code used in this study are available at
https://github.com/angbor09/crossref_humanities/.

REFERENCES

Baas, J., Schotten, M., Plume, A., Côté, G., & Karimi, R. (2020).
Scopus as a curated, high-quality bibliometric data source for
academic research in quantitative science studies. Quantitative
Science Studies, 1(1), 377–386. https://doi.org/10.1162/qss_a_00019

Birkle, C., Pendlebury, D. A., Schnell, J., & Adams, J. (2020). Web
of Science as a data source for research on scientific and scholarly
activity. Quantitative Science Studies, 1(1), 363–376.
https://doi.org/10.1162/qss_a_00018

Cheberkus, D., & Nazarovets, S. (2019). Ukrainian open index
maps local citations. Nature, 575(7784), 596–596.
https://doi.org/10.1038/d41586-019-03662-6, PubMed: 31772366

Chudlarský, T., & Dvořák, J. (2020). Can Crossref citations replace
Web of Science for research evaluation? The share of open
citations. Journal of Data and Information Science, 5(4), 35–42.
https://doi.org/10.2478/jdis-2020-0037

Crossref. (2009). The formation of CrossRef: A short history.
https://www.doi.org/topics/CrossRef10Years.pdf

Harzing, A.-W. (2019). Two new kids on the block: How do
Crossref and Dimensions compare with Google Scholar,
Microsoft Academic, Scopus and the Web of Science?
Scientometrics, 120(1), 341–349.
https://doi.org/10.1007/s11192-019-03114-y

Heibi, I., Peroni, S., & Shotton, D. (2019). Software review: COCI,
the OpenCitations Index of Crossref open DOI-to-DOI citations.
Scientometrics, 121(2), 1213–1228.
https://doi.org/10.1007/s11192-019-03217-6

Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref:
The sustainable source of community-owned scholarly metadata.
Quantitative Science Studies, 1(1), 414–427.
https://doi.org/10.1162/qss_a_00022

Liang, Z., Mao, J., Lu, K., & Li, G. (2021). Finding citations for
PubMed: A large-scale comparison between five freely available
bibliographic data sources. Scientometrics, 126(12), 9519–9542.
https://doi.org/10.1007/s11192-021-04191-8, PubMed: 34720252

Martín-Martín, A. (2021). La cobertura de los índices de citas abier-
tos se acerca a la de Web of Science y Scopus. Anuario ThinkEPI,
15. https://doi.org/10.3145/thinkepi.2021.e15e04

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., & López-Cózar,
E. D. (2018). Google Scholar, Web of Science, and Scopus: A
systematic comparison of citations in 252 subject categories.
Journal of Informetrics, 12(4), 1160–1177.
https://doi.org/10.1016/j.joi.2018.09.002

Martín-Martín, A., Thelwall, M., Orduna-Malea, E., & Delgado
López-Cózar, E. (2021). Google Scholar, Microsoft Academic,
Scopus, Dimensions, Web of Science, and OpenCitations’ COCI:
A multidisciplinary comparison of coverage via citations.
Scientometrics, 126(1), 871–906.
https://doi.org/10.1007/s11192-020-03690-4, PubMed: 32981987

Mongeon, P., & Paul-Hus, A. (2016). The journal coverage of Web
of Science and Scopus: A comparative analysis. Scientometrics,
106(1), 213–228. https://doi.org/10.1007/s11192-015-1765-5

Mryglod, O., Nazarovets, S., & Kozmenko, S. (2021). Universal
and specific features of Ukrainian economic research:
Publication analysis based on Crossref data. Scientometrics, 126(9),
8187–8203. https://doi.org/10.1007/s11192-021-04079-7

Nederhof, A. J. (2006). Bibliometric monitoring of research
performance in the social sciences and the humanities: A review.
Scientometrics, 66(1), 81–100.
https://doi.org/10.1007/s11192-006-0007-2

Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure
organization for open scholarship. Quantitative Science Studies,
1(1), 428–444. https://doi.org/10.1162/qss_a_00023

Van Eck, N. J., & Waltman, L. (2021). Crossref as a source of open
bibliographic metadata. In Proceedings of the 18th International
Conference of the International Society for Scientometrics and
Informetrics (pp. 1169–1174).
https://www.issi-society.org/proceedings/issi_2021/Proceedings%20ISSI%202021.pdf#page=1201

Vera-Baceta, M.-A., Thelwall, M., & Kousha, K. (2019). Web of
Science and Scopus language coverage. Scientometrics, 121(3),
1803–1813. https://doi.org/10.1007/s11192-019-03264-z

Visser, M., Van Eck, N. J., & Waltman, L. (2021). Large-scale
comparison of bibliographic data sources: Scopus, Web of
Science, Dimensions, Crossref, and Microsoft Academic.
Quantitative Science Studies, 2(1), 20–41.
https://doi.org/10.1162/qss_a_00112
