RESEARCH ARTICLE
Funding COVID-19 research: Insights from an
exploratory analysis using open
data infrastructures
an open access journal
Alexis-Michel Mugabushaka¹, Nees Jan van Eck², and Ludo Waltman²
¹European Research Council, Brussels, Belgium
²Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, Netherlands
Citation: Mugabushaka, A.-M., Van Eck, N. J., & Waltman, L. (2022). Funding COVID-19 research:
Insights from an exploratory analysis using open data infrastructures. Quantitative Science
Studies, 3(3), 560–582. https://doi.org/10.1162/qss_a_00212
DOI: https://doi.org/10.1162/qss_a_00212
Peer Review: https://publons.com/publon/10.1162/qss_a_00212
Received: 7 March 2022
Accepted: 29 July 2022
Corresponding Author: Alexis-Michel Mugabushaka, Alexis-Michel.MUGABUSHAKA@ec.europa.eu
Handling Editor: Vincent Larivière
Copyright: © 2022 Alexis-Michel Mugabushaka, Nees Jan van Eck, and Ludo Waltman. Published
under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The MIT Press
Keywords: COVID-19, Crossref, funding data, open metadata, Scopus, Web of Science
ABSTRACT
To analyze the outcomes of the funding they provide, it is essential for funding agencies to be
able to trace the publications resulting from their funding. We study the open availability of
funding data in Crossref, focusing on funding data for publications that report research related
to COVID-19. We also present a comparison with the funding data available in two proprietary
bibliometric databases: Scopus and Web of Science. Our analysis reveals limited coverage
of funding data in Crossref. It also shows problems related to the quality of funding data,
especially in Scopus. We offer recommendations for improving the open availability of funding
data in Crossref.
1. INTRODUCTION
The ongoing coronavirus disease 2019 (COVID-19) pandemic has caused a major public crisis. It has
become a major cause of death and has overwhelmed healthcare systems in many countries.
According to the World Health Organization (WHO) COVID-19 dashboard, as of January
2022, there have been over 340 million confirmed cases and 5.5 million deaths worldwide.
The measures taken to contain its spread have also caused unprecedented disruptions of eco-
nomic and social life around the world.
Researchers have been among the “first responding” professions dealing with the pandemic
and its consequences. They advise public authorities on the best measures to control the
pandemic, study the course of the disease, develop clinical guidelines and medical protocols, and,
very importantly, develop vaccines (some of them in record time) as well as therapies.
Around the world, several research teams have redirected their research efforts to help fight
the pandemic (Hao, 2020; Kwon, 2020; Viglione, 2020). Research funding bodies have mul-
tiplied initiatives to support research related to the pandemic. In addition to measures allowing
grant management flexibility (Stoye, 2020), several organizations have launched fast-track
research funding programs specifically targeted at various aspects of the crisis. For example,
in the United States, the National Institutes of Health (NIH) launched several initiatives to
tackle the pandemic by using existing funding mechanisms or establishing new, dedicated
programs. They are bundled in the NIH-Wide Strategic Plan for COVID-19 Research. The
National Science Foundation (NSF) activated its Rapid Response Research (RAPID) mechanism,
used for research funding in unanticipated events. In Europe, the European Union
launched a COVID-19 emergency call for proposals in January 2020 and published subse-
quent calls throughout the year. The Innovative Medicines Initiative (IMI) launched a fast-track
call for proposals to speed up the development of new drugs and diagnostics to halt the global
outbreak of COVID-19. The “national activities” section of the European Research Area (ERA)
corona platform lists several other initiatives launched at the beginning of the pandemic.¹ They
include the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), which
set up a COVID-19-focused funding program, and the Swedish Research Council, which,
among other initiatives, teamed up with the National Natural Science Foundation of China
(NSFC) to support collaborative projects on coronavirus.
The Organisation for Economic Co-operation and Development’s COVID-19 WATCH,
which monitors research policy responses to the COVID-19 crisis, estimates the combined
value of public research funding in those measures to be about US$2.6 billion, and US$3.8
billion if other sources (charities, industry) are also considered.
This has led to a massive expansion of COVID-19 research. Scientific publishers have
adapted their editorial processes to allow fast dissemination of new results (Hurst & Greaves,
2021), posing to researchers and the public a particular “challenge of discerning signal amidst
noises,” as the editors of one journal put it (Bleck, Buchman et al., 2020).
The resulting unprecedented increase in research papers on a single topic, by some accounts
over 100,000 in 2020 alone, accounting for about 4% of total research output (Else, 2020), has
also triggered a large body of metaresearch on COVID-19 research. One strand of this research
seeks to tame what has been termed a “paper tsunami” (Brainard, 2020). It uses advanced machine
learning techniques for information extraction, misinformation detection, question answering, etc.
(Shorten, Khoshgoftaar, & Furht, 2021). Another line of work uses scientometric techniques to
develop an understanding of the output of COVID-19 publications. This line of research, for
instance, studies the role of countries, institutions, journals, and authors (Mohadab, Bouikhalene,
& Safi, 2020; Tao, Zhou et al., 2020), specific fields and techniques (Aristovnik, Ravšelj, & Umek,
2020; Hossain, Sarwar et al., 2020), gender (Andersen, Nielsen et al., 2020), research areas
(Colavizza, Costas et al., 2021), and researchers (Ioannidis, Salholz-Hillel et al., 2021).
One aspect that remains underexplored is how this research has been, and is being,
funded. One of the notable exceptions is the recent work by Cross, Rho et al. (2021) entitled
“Who funded the research behind the Oxford-AstraZeneca COVID-19 vaccine?” In this work,
the authors analyzed funding information of about 100 peer-reviewed articles relevant to the
Chimpanzee adenovirus-vectored vaccine (ChAdOx) on which the Oxford-AstraZeneca vac-
cine is based. The authors found that this research was almost entirely supported by public
funding. The European Commission, the Wellcome Trust, and the Coalition for Epidemic Pre-
paredness Innovations (CEPI) were the biggest funders of ChAdOx research and development.
The study also highlights the lack of transparency in reporting of funding, which “hinders the
discourse surrounding public and private contributions towards R&D and the cost of R&D.”
Another study in this context is the analysis of how NIH funding has contributed to research
used in COVID-19 vaccine development (Kiszewski, Cleary et al., 2021). The authors focused
on 10 technologies employed in candidate vaccines (as of July 2020), identified from WHO
documents, and on research on five viruses with epidemic potential. They then estimated the
NIH funding to those areas by linking relevant publications (identified by searching in PubMed
via MeSH terms) to grants using acknowledgments. The authors concluded that NIH funding
has significantly contributed to advances in vaccine technologies, which helped the rapid
development of COVID-19 vaccines. However, they also noted that NIH funding for vaccine
research for specific pandemic threats has been inconsistent and called for sustained public
sector funding for better preparedness against future pandemics.
¹ https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/covid-19?tabId=5.
In this paper, we expand the scope from a single technology or single funding body to
research on COVID-19 in general. Getting an accurate picture of COVID-19 research funding
is important for a number of reasons:
• Insights into the various funding mechanisms and modalities used and how (relatively)
successful they have been may inform the organization of funding in case of future
emergencies.
• Given the societal interest and policy implications of the outcomes of COVID-19
research, it is important to understand how these outcomes relate to the interests of
sponsors. Most publication ethics guidelines require researchers to state the role of funders
in reported research. Although this is mostly applied in medical journals, extending
it to research on other aspects of the pandemic can bring the transparency needed to
assess the credibility of scientific findings.
• Most, if not all, public research organizations funding COVID-19 research have accountability
obligations. They must report to public authorities on the results of their funding activities.
Studying the funding patterns of this research can help funders understand not only the
results of their activities but also how these results relate to research funded by others and, by
putting it in a wider perspective, their relative weight in funding COVID-19 research.
• The concerns over fair vaccine access and the resulting debates on patent waivers for
COVID-19 vaccines could be better informed by reliable data on the public investments
that enabled the vaccine development.
In this paper, we explore the funding of COVID-19 research. The main objective is to find
out which funding organizations have contributed to COVID-19 research reported in the
scholarly literature. Specifically, we seek
• to explore the extent to which funding data can be found in openly available databases,
in particular Crossref;
• to identify the main funding organizations that supported COVID-19 research; and
• to compare the findings based on openly available databases with those based on proprietary
databases.
Another study of COVID-19 research funding was carried out by Shueb, Gul et al. (2022).
Unlike our work reported in the present paper, the study by Shueb et al. covers only research
published in the first months of the pandemic and does not make use of openly available
funding data.
This paper is organized as follows. In Section 2, we discuss the data used in our analyses.
We report our results based on openly available data in Section 3 and present a comparison
with results based on proprietary data in Section 4. We summarize our findings and draw
conclusions in Section 5.
2. DATA
We use the funding acknowledged by publications as a proxy for funding of the underlying
research (Álvarez-Bornstein & Montesi, 2020). This requires combining data on COVID-19
related publications and data on the funding sources of these publications. In this section, we
discuss the data we combined to link publications to funding as well as the data resulting from
this linking.
2.1. CORD-19 Data
We use the COVID-19 Open Research Dataset (CORD-19), a data set of COVID-19 research
articles (both metadata and full text) released by Semantic Scholar in partnership with other
organizations. CORD-19 defines itself as “a comprehensive collection of publications and
preprints on COVID-19 and related historical coronaviruses such as SARS and MERS” (Wang,
Lo et al., 2020).
CORD-19 combines data from different sources, which mainly follow the same search
approach. It is updated regularly by adding new records and deleting erroneous or retracted
entries. While CORD-19 is the most widely used COVID-19 literature data set, it is not without
limitations. The search approach used by CORD-19 has the advantage of conceptual clarity
(i.e., papers included say something about the last three outbreaks caused by coronaviruses or
about coronaviruses more generally), but this advantage is also its inherent limitation:
keyword-based search may lead to the inclusion of papers that only cursorily mention a
coronavirus outbreak (false positives) or miss relevant papers that use other terms (false negatives).
Other limitations, acknowledged by the CORD-19 team, include the restriction of the data to
scholarly publications, including preprints, leaving aside “other types of documents that could
be important, such as technical reports, white papers, informational publications by governmen-
tal bodies” (Wang et al., 2020) as well as the focus on English language publications.
Some research has critically inspected the CORD-19 data set with respect to coverage and has
suggested possible improvements. Kanakia, Wang et al. (2020) explored how citation links can be
used to understand and mitigate possible bias in CORD-19. Colavizza et al. (2021) also studied the
coverage of CORD-19, using a version of the data set from July 2020. Within the data set, they
identified a subset called “CORD-19 strict,” for which the CORD-19 query matches the title and
abstract of a publication, disregarding the full text. They found that this subset of CORD-19 almost
perfectly matches the results retrieved from the Web of Science (WoS) database, indicating that
CORD-19 “provides an almost complete coverage of research on COVID-19 and coronaviruses.”
However, the fact that this subset is small suggests that CORD-19 does not only cover COVID-19
research, leading Colavizza et al. to caution users to be aware that CORD-19 may include “a large
number of publications whose relevance for COVID-19 and coronaviruses research needs a more
careful assessment, and some of which may be of limited relevance.” The uncertainties in the
scope of CORD-19 and inevitable errors due to its data collection approach are a limitation of
our analysis that should be kept in mind when interpreting our results.
In the rest of this paper, we refer to publications in the CORD-19 data set as COVID-19
publications. This should be understood as publications that are in a broad sense related to
COVID-19, including publications that appeared before the start of the COVID-19 pandemic
and that deal with coronavirus research more broadly.
We used the CORD-19 data set released by the Allen Institute on 15 February 2021, in the
version enriched with publication identifiers from Microsoft Academic Graph (MAG).
2.2. Crossref Data
Linking publications to funding sources is far from straightforward. As discussed elsewhere
(Mugabushaka, 2020), several approaches can be used, each with their advantages and
limitations. Our primary focus is on funding data provided by Crossref, although we also
perform a comparison with funding data from proprietary databases.
2.2.1. Crossref funding data
Crossref is a not-for-profit organization that provides an open infrastructure used by many
stakeholders in the scholarly communication system. Its members include publishers, univer-
sities, preprint services, and funding organizations. Its primary function is to enable parties
globally to update and exchange metadata about the scholarly record, identified through
Digital Object Identifiers (DOIs) and made open for all.
Crossref encourages its members to deposit metadata going beyond standard bibliographic
information. Those rich metadata may also include funding data. According to the guidance
Crossref gives to its members, funding data can be obtained from authors when they submit
a manuscript or extracted from the acknowledgment or funding information section of a
manuscript. As part of its data curation, Crossref can also add missing data, for example by inferring
missing funder identifiers from funder names. The share of Crossref records with funding data
has steadily increased to reach about 25% in 2019 (Hendricks, Tkaczyk et al., 2020, Figure 3).
Crossref makes data on funding together with other publication metadata openly available.
We use Crossref’s XML Metadata Plus Snapshot. The snapshot was downloaded on March 5,
2021. We consider only the 110,851,607 records classified as journal article, book content,
conference paper, or preprint.²
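To make the notion of funding data in a publication record more concrete, the following minimal sketch reads funder entries from a single record. It assumes a JSON-like record with a `funder` list whose entries carry `name`, `DOI`, and `award` fields, which is how the Crossref REST API exposes this information; the analysis in this paper uses the XML snapshot instead, and the sample record, the function name `extract_funders`, and the award number are invented for illustration.

```python
from typing import Any


def extract_funders(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the funder entries of one publication record in a uniform shape."""
    funders = []
    for entry in record.get("funder", []):
        funders.append(
            {
                "name": entry.get("name"),
                # The identifier can be missing when only a funder name was deposited.
                "funder_doi": entry.get("DOI"),
                "awards": list(entry.get("award", [])),
            }
        )
    return funders


# Hypothetical record, invented for illustration only.
sample_record = {
    "DOI": "10.5555/example-article",
    "funder": [
        {
            "name": "National Institutes of Health",
            "DOI": "10.13039/100000002",
            "award": ["X00-000000"],
        },
        {"name": "Example Foundation"},  # name only, no identifier
    ],
}

funders = extract_funders(sample_record)
has_funding_data = bool(funders)
with_identifier = [f for f in funders if f["funder_doi"]]
```

The distinction between `has_funding_data` and `with_identifier` mirrors the difference between publications with any funding data and publications for which a Funder Registry identifier is also available.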
2.2.2. Crossref Funder Registry
Funding data in Crossref are powered by a taxonomy of funders maintained in the Crossref
Funder Registry. The Funder Registry was started by Elsevier in 2012 and was donated to
Crossref. The curation of the registry is supported by Elsevier, which reviews it every 4–6
weeks to add new funding entities as well as update or correct existing ones.
The Funder Registry assigns a unique identifier, a DOI, to each funder. The registry is orga-
nized in a hierarchy in which individual entries are linked to parent and child entries.
In Crossref, funding data may refer to any hierarchical level in the Funder Registry. To give
two examples:
• In the case of the NIH, funding data may refer to a specific institute such as the National
Institute of Allergy and Infectious Diseases (DOI: 10.13039/100000060), the NIH as a
whole (DOI: 10.13039/100000002), or the US Department of Health and Human Services
(DOI: 10.13039/100000016).
• In the case of the European Union, funding data may refer to the Marie Skłodowska-Curie
program (DOI: 10.13039/100010665), the European Research Council (DOI:
10.13039/100011199), the H2020 program (DOI: 10.13039/501100007601), or the
European Commission (DOI: 10.13039/501100000780).
Funding data captured by publishers and submitted to Crossref reflects the different
acknowledgment practices of authors. This can lead to an inconsistent picture of the contributions
of funders of COVID-19 research. For a more accurate picture, there is a need to group
funders based on the hierarchy of funding organizations in the Funder Registry. Because this
hierarchy follows the legal structure of funding bodies, it can, in some cases, make comparisons
difficult. For example, public funding bodies in Canada have the Government of Canada
at the highest level in the hierarchy of the Funder Registry, and in the United States government
departments usually constitute the highest level. We created a mapping of each entry in
the Funder Registry to the corresponding top-level entity in the Funder Registry hierarchy (van
Eck & Mugabushaka, 2021). The mapping is based on version 1.34 of the Funder Registry,
which has 27,741 entries at lower levels, grouped into 22,369 funders at the highest level.
While not perfect, this approach has the advantage of transparency and simplicity. Creating
an alternative mapping would not only require detailed knowledge of funding structures
worldwide but also require subjective choices that might bias our analysis.
² In late 2019, funding organizations started registering grant metadata in Crossref, which is then linked to
publications. In this paper, we focus on funding data submitted to Crossref by publishers.
Figure 1. Funding data for COVID-19 publications in Crossref.
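The mapping of Funder Registry entries to their top-level entity can be illustrated with a small sketch. It assumes the hierarchy is available as a dictionary of child-to-parent links (`parent_of`, a hypothetical structure, not the format of the published mapping) and that the three identifiers from the NIH example are directly linked as shown; the function simply follows parent links until it reaches an entry without a parent.

```python
def top_level_funder(funder_id: str, parent_of: dict[str, str]) -> str:
    """Follow parent links until an entry without a parent is reached."""
    seen = {funder_id}
    current = funder_id
    while current in parent_of:
        current = parent_of[current]
        if current in seen:  # guard against accidental cycles in the input
            raise ValueError(f"cycle in hierarchy at {current}")
        seen.add(current)
    return current


# Assumed child -> parent links, based on the NIH example in the text.
parent_of = {
    "10.13039/100000060": "10.13039/100000002",  # NIAID -> NIH
    "10.13039/100000002": "10.13039/100000016",  # NIH -> Dept. of Health and Human Services
}

funder_ids = ["10.13039/100000060", "10.13039/100000002", "10.13039/100000016"]
mapping = {fid: top_level_funder(fid, parent_of) for fid in funder_ids}
```

Under these assumptions, all three identifiers resolve to the same top-level entry, so acknowledgments at any level of the hierarchy are counted for one funder.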
2.3. Linking the Data Sets
Figure 1 illustrates how the data sets were linked. We used DOIs to link CORD-19 publications
to publications in Crossref. The CORD-19 version we used includes 484,064 records, of which
474,691 are unique publication records (i.e., unique CORD-19 identifiers). Of these records,
260,636 (or about 55%) have a DOI in CORD-19. After eliminating duplicates, we ended up
with 259,652 unique DOIs, of which 255,378 were found in Crossref. Our analysis is based on
these 255,378 records in Crossref.
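The linking steps (collect DOIs, remove duplicates, match against Crossref) can be sketched as follows. This is a minimal illustration on invented DOIs; the normalization rules (lowercasing and stripping resolver prefixes) are an assumption and are not described in the paper, although DOIs are case-insensitive by design.

```python
def normalize_doi(raw: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes."""
    doi = raw.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def link_by_doi(cord19_dois, crossref_dois):
    """Return (unique CORD-19 DOIs, subset of them found in Crossref)."""
    unique = {normalize_doi(d) for d in cord19_dois if d and d.strip()}
    known = {normalize_doi(d) for d in crossref_dois}
    return unique, unique & known


# Invented DOIs for illustration only.
cord19_dois = ["10.5555/ABC", "https://doi.org/10.5555/abc", "10.5555/def", "", "10.5555/ghi"]
crossref_dois = ["10.5555/abc", "10.5555/ghi", "10.5555/zzz"]

unique_dois, matched_dois = link_by_doi(cord19_dois, crossref_dois)
```

In this toy input, two spellings of the same DOI collapse into one entry and the empty value is ignored, which corresponds to the duplicate elimination and the restriction to records with a DOI described above.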
The lack of DOIs for a substantial share of the publications in CORD-19, including about
half of the publications in 2020 and 2021, is another important limitation of our analysis. It
means that the results reported in the subsequent sections offer only a partial picture of the
funding of COVID-19 research.
Of the CORD-19 publications linked to Crossref, 44,820 have funding data. For 36,008
publications, we also have an identifier of a funding organization included in the Crossref
Funder Registry. Our analysis of COVID-19 funding is based on these 36,008 publications.
The relatively low share of publications with funding data in Crossref is another limitation of
our analysis. An important implication of this limitation is the need to pay attention to the way
in which funding data is collected, and which measures can be taken by various stakeholders
to increase the availability of funding data in open data infrastructures. We will share some
reflections and suggestions in the concluding section.
3. ANALYSIS OF OPEN FUNDING DATA IN CROSSREF
The data at hand, effectively a relatively limited subset of COVID-19 research papers due to
the data limitations described above, offers an incomplete picture of COVID-19 funding. As
partial as the results are, however, they can give indications of funding patterns, notably for the
most prolific funders and how they interact in the network of “cofunding.” The results also
provide insight into the type of funding bodies supporting COVID-19 research.
Figure 2. Percentage of COVID-19 publications with a DOI that have funding data in Crossref.
3.1. Availability of Funding Data
As noted above, we focus on publications indexed in Crossref, as our aim is to use open funding
data. Funding data are available in Crossref for 44,820 COVID-19 publications. This
accounts for 17% of the COVID-19 publications for which a DOI is available.
As Figure 2 shows, the availability of funding data in Crossref has increased over time, reaching
almost 40% in 2019. This is in line with earlier analyses showing that the amount of funding
data submitted by publishers to Crossref has steadily increased (Hendricks et al., 2020; Van Eck
& Waltman, 2021). Before 2020, the share of COVID-19 publications with funding data is
higher than the overall share of publications with funding data. The sharp decline in 2020 in
the share of publications with funding data may seem puzzling. It could be that at the beginning
of the COVID-19 pandemic many researchers immediately started to work on COVID-19-related
research projects, without first applying for funding. This impression is confirmed when
looking at other databases. Both WoS and Scopus also show a significant decrease in the share
of publications with funding data, reinforcing the idea that a relatively large share of all
COVID-19 research in 2020 did not receive funding from a funding agency.
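The yearly shares discussed above are simple ratios of publications with funding data to all publications with a DOI in a given year. A minimal sketch of this calculation, on invented records rather than the actual data, might look as follows; the function name and the input format are assumptions.

```python
from collections import defaultdict


def funding_share_by_year(records):
    """records: iterable of (publication_year, has_funding_data) pairs.

    Returns the percentage of publications with funding data per year.
    """
    totals = defaultdict(int)
    funded = defaultdict(int)
    for year, has_funding in records:
        totals[year] += 1
        if has_funding:
            funded[year] += 1
    return {year: 100.0 * funded[year] / totals[year] for year in sorted(totals)}


# Invented toy records, not taken from the data set.
toy_records = [
    (2019, True), (2019, False),
    (2020, True), (2020, False), (2020, False), (2020, False),
]
shares = funding_share_by_year(toy_records)
```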
3.2. Top Funders
The 36,008 publications linked to a funder identifier acknowledge 5,386 distinct funders at the
top level of the Crossref Funder Registry hierarchy. This indicates that, based on open funding
data in Crossref, close to one in four funding bodies at the top level in the Funder Registry have
supported COVID-19 research. The number of papers per funder varies significantly. The
median is two papers. The top 10 funders account for over half of all papers.
Table 1 shows the 30 funders with the largest number of COVID-19 publications. Based
on open funding data in Crossref, the top funders are the US Department of Health and Human
Services, with 7,081 publications, and the National Natural Science Foundation of China, with
5,318 publications. They account for 20% and 15%, respectively, of the papers for which
funding data is available. They are followed by the European Commission in third place with
1,313 publications and by the Ministry of Science and Technology of the People’s Republic of
China and UK Research and Innovation, with 1,128 and 1,111 publications, respectively.
Other funders in the top 10 are the Government of Canada, the U.S. National Science
Foundation, the Deutsche Forschungsgemeinschaft, the Japanese Ministry of Education,
Culture, Sports, Science and Technology, and the UK National Institute for Health Research.
Table 1. Top 30 funders of COVID-19 publications (based on Crossref data)
Country | Funder | Number of publications
USA | U.S. Department of Health and Human Services | 7,081
CHN | National Natural Science Foundation of China | 5,318
EU | European Commission | 1,313
CHN | Ministry of Science and Technology of the People’s Republic of China | 1,128
GBR | UK Research and Innovation | 1,111
CAN | Government of Canada | 970
USA | National Science Foundation | 942
DEU | Deutsche Forschungsgemeinschaft | 730
JPN | Ministry of Education, Culture, Sports, Science and Technology | 691
GBR | National Institute for Health Research | 655
BRA | Ministério da Ciência, Tecnologia e Inovação | 560
KOR | National Research Foundation of Korea | 536
USA | US Department of Defense | 531
AUS | Department of Health, Australian Government | 498
USA | Bill and Melinda Gates Foundation | 473
CHN | Ministry of Education of the People’s Republic of China | 466
BRA | Coordenação de Aperfeiçoamento de Pessoal de Nível Superior | 450
ESP | Ministerio de Economía y Competitividad | 418
CHN | Ministry of Finance | 343
GBR | Wellcome Trust | 340
CHN | China Postdoctoral Science Foundation | 298
BRA | Fundação de Amparo à Pesquisa do Estado de São Paulo | 273
FRA | Agence Nationale de la Recherche | 271
CHE | Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung | 265
TWN | Ministry of Science and Technology, Taiwan | 261
JPN | Japan Agency for Medical Research and Development | 256
USA | US Department of Agriculture | 251
IND | Department of Science and Technology, Ministry of Science and Technology, India | 236
USA | Foundation for the National Institutes of Health | 229
DEU | Bundesministerium für Bildung und Forschung | 223
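Publication counts per top-level funder, as reported in Table 1, could be obtained along the following lines. This is a sketch on invented identifiers: `pub_funders` (publication to acknowledged funder identifiers) and `top_level` (funder identifier to top-level identifier) are hypothetical inputs, and counting each publication at most once per top-level funder is an assumption about the counting rule.

```python
from collections import Counter
from statistics import median


def publications_per_funder(pub_funders, top_level):
    """Count each publication at most once per top-level funder."""
    counts = Counter()
    for funder_ids in pub_funders.values():
        # Funders without a parent in the mapping are their own top level.
        tops = {top_level.get(fid, fid) for fid in funder_ids}
        counts.update(tops)
    return counts


# Invented toy data: A1 and A2 are sub-units of funder A.
pub_funders = {"p1": ["A1", "A2"], "p2": ["A1", "B"], "p3": ["C"]}
top_level = {"A1": "A", "A2": "A"}

counts = publications_per_funder(pub_funders, top_level)
top_funder = counts.most_common(1)[0]
median_papers = median(counts.values())
```

In the toy data, publication `p1` acknowledges two sub-units of the same funder but contributes only one publication to that funder's count.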
3.3. Type of Funder
For each funding body, the Crossref Funder Registry includes information about the type of
organization. This information is organized in two dimensions. On the one hand, a distinction
is made between private and public organizations. On the other hand, for each of these categories,
a further distinction is made between different organizational forms.
Table 2 shows that over two-thirds of the publications (28,186) acknowledge funding bodies
classified as “government” while the rest (13,900) acknowledge funding from private entities.
The classification of funding organizations in the Funder Registry follows the legal status of
these organizations in different countries. Given their particularities, this can lead to results
that are difficult to compare. For example, one of the major public funding bodies in Germany,
the DFG, is classified in the Funder Registry as a private organization under “trusts, charities,
foundations,” which is indeed its legal form. In a way, however, it is comparable to other funders
classified as “government,” such as the NIH and NSF in the United States, as it receives
its funding from public authorities (at both the federal and local levels).
The classification of organizations in the Funder Registry allows us to identify other non-
public players active in funding COVID-19 research. Among those with more than 100 pub-
lications, we find philanthropic organizations such as the Wellcome Trust and the Bill &
Melinda Gates Foundation and pharmaceutical companies such as Pfizer, Sanofi, and Novartis
(see “Sec_3_2” in the Supplementary material; Mugabushaka, Van Eck, & Waltman, 2022).
3.4. Cofunding Rate and Cofunding Network
As shown in Figure 3, one-third of the COVID-19 papers with funding data are linked to more
than one funding body. In these cases, the research team behind the reported work may have
multiple lines of funding that contributed to the reported results, or the authors may belong to
multiple research teams with different sources of funding.
Table 2. Number of COVID-19 publications by type of funding organization (based on Crossref data)
Type of funding organization | Public | Private | All
National government | 23,959 | | 23,959
Trusts, charities, foundations (both public and private) | 10 | 6,355 | 6,365
Local government | 6,309 | | 6,309
Universities (academic only) | 843 | 3,732 | 4,575
Other nonprofit organizations | 18 | 3,073 | 3,091
For-profit companies (industry) | 577 | 1,204 | 1,781
Associations and societies (private and public) | 55 | 987 | 1,042
International organizations | 6 | 742 | 748
Research institutes and centers | 59 | 274 | 333
Total | 31,836 | 16,367 | 48,203
Distinct total | 28,186 | 13,900 | 42,086
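A plausible reading of the difference between the “Total” and “Distinct total” rows in Table 2 is that a publication acknowledging funders of several types is counted once under each type, whereas the distinct total counts it only once. The table does not state this explicitly, so the sketch below, which uses invented data and a hypothetical input format, should be read as an illustration of that interpretation only.

```python
def count_by_type(pub_funder_types):
    """pub_funder_types: {publication id: set of funder type labels}.

    Returns per-type counts, their sum, and the number of distinct publications.
    """
    per_type = {}
    for types in pub_funder_types.values():
        for label in types:
            per_type[label] = per_type.get(label, 0) + 1
    total = sum(per_type.values())  # a publication is counted once per type
    distinct_total = sum(1 for types in pub_funder_types.values() if types)
    return per_type, total, distinct_total


# Invented toy data: p1 acknowledges funders of two different types.
toy = {
    "p1": {"National government", "Local government"},
    "p2": {"National government"},
    "p3": {"Universities (academic only)"},
}
per_type, total, distinct_total = count_by_type(toy)
```

Here the per-type counts sum to four, while only three distinct publications are involved.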
Figure 3. Number of COVID-19 publications by number of funders (based on Crossref data).
Through papers acknowledging multiple sources of funding, funding bodies are effectively
engaged in a cofunding collaboration network. Looking at the funders with the largest number
of publications (see “Sec_3_4b” in the Supplementary material; Mugabushaka et al., 2022), we
see that the share of publications in which a funding agency is acknowledged together with
other funders is, on average, relatively high, but with some variation across funders. For some
of the large funders, the share of publications cofunded with other funders is around 50%.
Examples are the National Natural Science Foundation of China (48%), the Government of
Canada (55%), the Deutsche Forschungsgemeinschaft (55%), and the US Department of
Health and Human Services (58%). For other large funders, such as UK Research and Innova-
tion (68%) and the European Commission (72%), more than two-thirds of the publications are
cofunded with other funders.
Figure 4 presents a visualization of a cofunding network for COVID-19 publications. The
visualization was created using the VOSviewer software (Van Eck & Waltman, 2010). The network includes 384 funders that each have at least 30 cofunding links with other funders in the
network. The visualization can be explored interactively at https://tinyurl.com/z27f97ek.
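A cofunding network of this kind can be derived by counting how often two funders are acknowledged in the same publication. The sketch below is only an illustration: the function names, the toy data, and in particular the way the threshold is applied (summed link weight per funder, applied once without re-pruning) are assumptions of this example and may differ from the procedure used to create Figure 4.

```python
from collections import Counter
from itertools import combinations


def cofunding_links(pub_funders):
    """Count, for each unordered funder pair, the publications acknowledging both."""
    links = Counter()
    for funders in pub_funders.values():
        for pair in combinations(sorted(funders), 2):
            links[pair] += 1
    return links


def funders_with_min_links(links, min_links):
    """Keep funders whose summed link weight reaches min_links (single pass, no re-pruning)."""
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return {funder for funder, s in strength.items() if s >= min_links}


# Toy data (invented identifiers, not real publications)
toy = {
    "doi:a": {"NSFC", "European Commission"},
    "doi:b": {"NSFC", "European Commission", "UKRI"},
    "doi:c": {"UKRI"},
}
links = cofunding_links(toy)
kept = funders_with_min_links(links, min_links=3)
```

The resulting pair counts form a weighted edge list, which is the kind of input a network visualization tool needs.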
4. COMPARISON WITH PROPRIETARY FUNDING DATA
To assess the comprehensiveness of open funding data in Crossref, we performed a comparison
with funding data in two proprietary bibliometric databases: Scopus (Baas, Schotten et al., 2020)
and WoS (Birkle, Pendlebury et al., 2020). Following the approach taken for Crossref, we use
the subset of the CORD-19 data set with DOIs to query Scopus and WoS and retrieve funding data.
The comparison with funding data in proprietary databases aims to provide insight into the com-
prehensiveness of open funding data in Crossref. Comparing the funding data made available by
different proprietary databases is not the main purpose of our analysis. For earlier analyses of
funding data available in Scopus and WoS, we refer to Álvarez-Bornstein, Morillo, and Bordons
(2017), Grassano, Rotolo et al. (2017), Kokol and Blažun Vošner (2018), Liu (2020), Liu, Tang,
and Hu (2020), Paul-Hus, Desrochers, and Costas (2016), and Tang, Hu, and Liu (2017).
To have a meaningful comparison, we focus on funding data obtained from publishers and
made available in bibliometric databases. We do not consider funding data collected from
funding agencies. Considering data obtained from funding agencies would obscure the comparison between Crossref and proprietary bibliometric databases because our analysis for
Crossref takes into account only data obtained from publishers. WoS nowadays also includes
funding data obtained from funding agencies, such as data from NIH Reporter, but we do not
use this data. We also do not consider funding data from the Dimensions database (Herzog,
Hook, & Konkiel, 2020), as this database does not make a distinction between funding data
obtained from publishers and from funding agencies.

Figure 4. Co-funding network for COVID-19 publications (based on Crossref data).
The Scopus and WoS data were retrieved from the in-house database system of the Centre for
Science and Technology Studies (CWTS) at Leiden University. For both databases, we used data
from April 2021. The following WoS citation indexes were used: Science Citation Index
Expanded (SCIE), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index
(AHCI), and Conference Proceedings Citation Index (CPCI). We did not use the Emerging
Sources Citation Index, because this citation index is not included in the WoS license of CWTS.
Scopus uses the same funder registry as Crossref, making it relatively easy to compare the
funding data available in Crossref and Scopus. WoS takes a different approach and uses its
own funder registry. This registry assigns a unified name to each funder. Due to the different
approach taken by WoS, there is no easy way to compare Crossref and WoS in terms of the
availability of funding data at the level of individual funders. We therefore compare Crossref
and WoS only in terms of whether publications do or do not report funding, without taking into
account the funder that provided the funding.

Figure 5. Funding data for COVID-19 publications in Scopus and WoS.
In this section, we first analyze the availability of funding data in Scopus and WoS. We then
look at the top funders and finally we explore the differences between Scopus, WoS, and
Crossref.
4.1. Availability of Funding Data
Bibliometric databases have different scopes due to differences in their inclusion criteria.
Visser, Van Eck, and Waltman (2021) recently reported that overall WoS covers fewer publications than Scopus, even though there are some publications, for instance meeting abstracts
and book reviews, that are covered by WoS and not by Scopus. For DOIs in CORD-19,
Figure 5 shows that Scopus indexes more publications than WoS (187,518 vs. 171,130).
However, looking at publications with funding data, Scopus has lower coverage than WoS
(61,168 vs. 73,444 publications). On the other hand, Scopus has higher coverage than WoS
(52,747 vs. 46,070 publications) if we consider only publications that include funder identifiers or unified funder names.
Both in Scopus and in WoS, the availability of funding data is higher than in Crossref, where
funding data are available for 44,820 publications, of which 36,008 include funder identifiers
(see Figure 1).
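The distinction between publications with funding data and publications with funder identifiers can be made concrete with a small sketch. It assumes that a publication record follows the JSON layout returned by the Crossref REST API, in which funders appear, to our understanding, in an optional `funder` list whose entries carry a `name` and, when a Funder Registry identifier is available, a `DOI`. The function name, the labels, and the example records are hypothetical.

```python
def classify_funding(work):
    """Classify a Crossref-style work record by the funding metadata it carries."""
    funders = work.get("funder") or []
    if not funders:
        return "no funding data"
    if any(f.get("DOI") for f in funders):
        return "funding data with funder identifier"
    return "funding data without funder identifier"


# Hand-written records that imitate the assumed shape of Crossref metadata.
# All DOIs and funder names below are invented placeholders.
records = [
    {"DOI": "10.1234/example.1",
     "funder": [{"name": "Example Funder", "DOI": "10.13039/000000000000"}]},
    {"DOI": "10.1234/example.2",
     "funder": [{"name": "Another Funder"}]},
    {"DOI": "10.1234/example.3"},
]
labels = [classify_funding(r) for r in records]
```

Counting the labels over all retrieved records would yield figures of the kind reported above (publications with funding data, and the subset with funder identifiers).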
4.2. Top Funders
Scopus uses the Crossref Funder Registry described in Section 2.2.2, while WoS has its own
registry of funders. On account of this, we do not make a direct comparison of the number of
publications per funder in Scopus and WoS. Instead, we present separate statistics for each of
the two databases.
For Scopus and WoS, Table 3 shows the top funders in terms of the number of COVID-19
publications. For Scopus, we look at funders at the highest level of the Funder Registry
hierarchy, following the approach discussed in Section 2.2.2. For WoS, we use unified funder
names.

Table 3. Top 30 funders of COVID-19 publications (based on Scopus and WoS data)

| Scopus funder | # pub. | WoS funder | # pub. |
| --- | --- | --- | --- |
| U.S. Department of Health and Human Services | 13,559 | United States Department of Health & Human Services | 11,857 |
| National Natural Science Foundation of China | 5,938 | National Institutes of Health (NIH) – USA | 11,341 |
| European Commission | 3,373 | National Natural Science Foundation of China (NSFC) | 7,946 |
| Ministry of Science and Technology of the People’s Republic of China | 2,380 | European Commission | 4,875 |
| UK Research and Innovation | 2,269 | NIH National Institute of Allergy & Infectious Diseases (NIAID) | 1,832 |
| National Science Foundation (US) | 1,587 | National Science Foundation (NSF) | 1,464 |
| Ministry of Education, Culture, Sports, Science and Technology (Japan) | 1,585 | German Research Foundation (DFG) | 1,424 |
| Government of Canada | 1,454 | Canadian Institutes of Health Research (CIHR) | 1,305 |
| Deutsche Forschungsgemeinschaft | 1,146 | Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT) | 1,284 |
| National Institute for Health Research (UK) | 1,118 | Medical Research Council UK (MRC) | 1,254 |
| Wellcome Trust | 1,022 | UK Research & Innovation (UKRI) | 1,221 |
| Ministry of Education of the People’s Republic of China | 972 | Wellcome Trust | 1,201 |
| Pfizer | 866 | National Council for Scientific and Technological Development (CNPq) | 1,022 |
| Ministério da Ciência, Tecnologia e Inovação (Brazil) | 819 | National Health and Medical Research Council of Australia | 1,012 |
| U.S. Department of Defense | 798 | Japan Society for the Promotion of Science | 901 |
| Department of Health, Australian Government (Australia) | 748 | CAPES (Brazil) | 740 |
| National Research Foundation of Korea | 697 | Fundamental Research Funds for the Central Universities | 739 |
| Ministerio de Economía y Competitividad (Spain) | 658 | National Institute for Health Research (NIHR) | 736 |
| Ministry of Finance (China) | 633 | NIH National Cancer Institute (NCI) | 722 |
| Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Brazil) | 609 | NIH National Heart Lung & Blood Institute (NHLBI) | 685 |
| Merck | 582 | French National Research Agency (ANR) | 668 |
| Bill and Melinda Gates Foundation | 580 | Natural Sciences and Engineering Research Council of Canada (NSERC) | 641 |
| AstraZeneca | 577 | United States Department of Defense | 622 |
| Novartis | 576 | Bill & Melinda Gates Foundation | 620 |
| Roche | 567 | European Research Council (ERC) | 578 |
| Covidien | 551 | National Basic Research Program of China | 560 |
| Auris Health | 549 | CGIAR | 555 |
| Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung | 532 | NIH National Institute of General Medical Sciences (NIGMS) | 552 |
| Chinese Academy of Sciences | 505 | Swiss National Science Foundation (SNSF) | 538 |
| Medtronic | 483 | Ministry of Science and Technology, Taiwan | 533 |
Table 3 provides three interesting insights:
• First, the table shows that, by and large, the top funders are the same in Scopus and
WoS and match those in the open data obtained from Crossref. In fact, the first three
funders are the same across the three databases: the U.S. Department of Health &
Human Services, followed by the National Natural Science Foundation of China
(NSFC) and the European Commission. Other top funders listed in Table 1 based on
Crossref are also visible among the top funders listed in Table 3 based on the proprietary
databases.
• A second observation is the difficulty of making comparisons between databases that
use different registries of funders. WoS harmonizes funder names, but unlike Scopus
and Crossref, it does not enable funders to be aggregated into higher level entities.
While one may intuitively infer that the different institutes of the NIH belong to the
same higher level entity, it requires considerable knowledge of the funding landscape
to know that the European Research Council, which is listed as a separate organization
in the case of WoS, is part of the European Commission in the case of Scopus and
Crossref. Thus, when comparing funding data from different databases, it is essential
to pay close attention to the way in which relations between funding entities are
handled.
• A third observation relates to pharmaceutical companies, which feature prominently on
the Scopus list but not on the Crossref and WoS lists.
We explore the differences between the three databases in more detail in the next
section.
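Returning to the second observation, the aggregation of funders into higher level entities can be sketched as a walk along parent links in a funder hierarchy. The example below is a minimal illustration only: the parent links are hand-written from the relations mentioned in the text rather than extracted from the Funder Registry, and the function names and toy publications are hypothetical.

```python
def top_level(funder, parent_of):
    """Follow parent links until a funder without a parent is reached."""
    seen = set()
    while funder in parent_of and funder not in seen:
        seen.add(funder)  # guards against accidental cycles in the mapping
        funder = parent_of[funder]
    return funder


def count_by_top_level(pub_funders, parent_of):
    """Count publications per top-level funder, counting each publication once per entity."""
    counts = {}
    for funders in pub_funders.values():
        for entity in {top_level(f, parent_of) for f in funders}:
            counts[entity] = counts.get(entity, 0) + 1
    return counts


# Illustrative parent links (not an extract of the actual registry)
parent_of = {
    "European Research Council": "European Commission",
    "National Institutes of Health": "U.S. Department of Health and Human Services",
    "National Cancer Institute": "National Institutes of Health",
}
# Toy data (invented identifiers, not real publications)
toy = {
    "doi:a": {"European Research Council", "European Commission"},
    "doi:b": {"National Cancer Institute"},
    "doi:c": {"National Institutes of Health", "European Research Council"},
}
counts = count_by_top_level(toy, parent_of)
```

Because each publication is counted once per top-level entity, a paper acknowledging both the European Research Council and the European Commission contributes a single publication to the European Commission.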
4.3. Differences Between the Databases
In the previous sections, we analyzed differences at an aggregate level in funding data
obtained from three different databases: Crossref, Scopus, and WoS. In this section, we present
an analysis at a more detailed level, focusing on the extent to which—at the level of individual
publications—funding data obtained from these three databases differs or overlaps.
4.3.1. Intersections and differences
Figure 6 shows the overlap and the differences between the three databases in terms of publications for which funding data are available. The relatively small overlap is remarkable.
There are 95,292 publications with funding data in at least one of the three databases. Only
23,950 of these publications have funding data in all three databases, an overlap of 25%. The
number of publications with funding data in only one of the databases is largest for WoS
(16,155). It is somewhat smaller for Scopus (12,738) and smallest for Crossref (6,209).
The differences shown in Figure 6 are partly due to differences in the publications indexed
in the three databases. As indicated in Figure 5, of the DOIs in CORD-19, only 72% can be
linked to publications indexed in Scopus and only 66% to publications indexed in WoS. In
contrast, 98% of the DOIs can be linked to publications in Crossref, as shown in Figure 1.
In Figure 7, we therefore restrict the analysis to the 141,291 publications indexed in all
three databases. Of these publications, 72,402 have funding data in at least one of the databases and 23,950 have funding data in all three databases, resulting in an overlap of 33%. As
in Figure 6, the number of publications that have funding data in one database but not in the
others is largest for WoS (11,714). It is somewhat smaller for Scopus (8,457) and smallest for
Crossref (729).

Figure 6. Overlap of Crossref, Scopus, and WoS in terms of COVID-19 publications with funding
data (considering all publications indexed in at least one of the three databases).
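The overlap percentages in this section are the number of publications with funding data in all three databases divided by the number with funding data in at least one of them. The following sketch shows this ratio on toy sets of identifiers and then on the counts reported in the text. The function name and the toy identifiers are hypothetical.

```python
def overlap_share(*doi_sets):
    """Share of publications present in all sets among those present in at least one."""
    union = set().union(*doi_sets)
    common = set.intersection(*map(set, doi_sets))
    return len(common) / len(union) if union else 0.0


# Toy sets of publications with funding data (invented identifiers)
crossref = {"d1", "d2", "d3"}
scopus = {"d2", "d3", "d4"}
wos = {"d3", "d4", "d5", "d6"}
toy_share = overlap_share(crossref, scopus, wos)  # 1 of 6 publications

# The same ratio with the counts reported in the text
all_publications = 23_950 / 95_292  # all publications (Figure 6), about 25%
indexed_in_all = 23_950 / 72_402    # indexed in all three databases (Figure 7), about 33%
```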
4.3.2. Accuracy of funding data
We now analyze the accuracy of funding data for individual publications by comparing fund-
ing data obtained from the different databases with funding information found in the full text of
publications. The comparison is based on a stratified random sample of 120 publications. For
each of the three databases considered, we drew a random sample of 40 publications that
have funding data only in that database and not in the other two databases. In addition, given
the notable presence of pharmaceutical companies among the funders of COVID-19 publications in Scopus, we also analyze a random sample of 25 publications that have at least one
private entity as funder in Scopus (i.e., an entity classified in the Funder Registry as “private”
and “for-profit company/industry”). The samples are available in the Supplementary material
(Mugabushaka et al., 2022).

Figure 7. Overlap of Crossref, Scopus, and WoS in terms of COVID-19 publications with funding
data (considering only publications indexed in all three databases).
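A stratified sample of the kind described above, with 40 publications per database, can be drawn as follows. This is a sketch under stated assumptions: the function name, the fixed seed, and the toy strata are hypothetical and do not reproduce the actual samples, which are available in the Supplementary material.

```python
import random


def stratified_sample(strata, size, seed=0):
    """Draw `size` publications without replacement from each stratum."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is repeatable
    return {name: rng.sample(sorted(dois), size) for name, dois in strata.items()}


# Toy strata: publications with funding data in exactly one database (invented identifiers)
strata = {
    "Crossref only": {f"cr{i}" for i in range(100)},
    "Scopus only": {f"sc{i}" for i in range(100)},
    "WoS only": {f"wos{i}" for i in range(100)},
}
sample = stratified_sample(strata, size=40)
```

Sorting each stratum before sampling makes the draw independent of set iteration order, so the same seed always yields the same sample.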
Our findings can be summarized as follows:
• In the Crossref sample, we found the funding data for 27 of the 40 publications to
correspond to the text in either the acknowledgment or the funding information section of a publication. In the following we refer to this as “correct” funding data. The
correct entries include two instances in which the funding information is in fact a
statement that there was no specific funding supporting the work (e.g., “This work
received no specific grant from a funding agency”). Although the sample is too small
to generalize to Crossref as a whole, this type of funding statement may be an important one that deserves further analysis. Databases that provide funding data may consider including it in their taxonomies. Of the 13 publications with incorrect funding
data, four were apparently due to an error of the extraction algorithm, which for
example mistook the affiliation of the authors for a funding body. In one case, the
funding information was partially correct: One funding organization listed in the
acknowledgment was missed but the others were correctly identified. In the other eight cases,
the funding information could not be located anywhere in the full text of the
publication.
• In the Scopus sample, for 15 of the 40 publications, the funding data corresponds to
the funding statement found in the full text. Twenty-five cases were found to be
errors, most probably of the algorithm for extracting funding information. The most
common error was the algorithm incorrectly identifying the section of a publication
that includes a funding statement. Sometimes a conflict-of-interest section was incorrectly interpreted as a funding statement. In other cases, the acknowledgment section
was interpreted as a funding statement, when the publication in fact included a separate funding information section. Other errors included, for example, mistaking a
natural person thanked by the authors for a funding body, or interpreting the affiliation of a researcher mentioned in the acknowledgment section as a funding body. In
four cases, the funding information could not be found anywhere in the full text of
the publication.³
• In the WoS sample, we found that in 37 of the 40 cases the funding data corresponds to
the funding statement in the full text. In two of these cases, we noted that the funding
information provided in the full text was ambiguous. The funding information section
stated that there was no funding to report, but the acknowledgment section mentioned
a funding body that provided financial support. In the three cases in which no relevant
funding statement could be found in the full text, there was an error, most probably
caused by the extraction algorithm mistaking a conflict-of-interest section for a funding
statement (see also Grassano et al., 2017; Lewison & Sullivan, 2015).
• In the sample of publications with Scopus funding data that includes a private entity, five
of the 25 publications indeed contained a funding statement mentioning the private
entity. In the other 20 cases, the funding data was incorrect: The private entity was
mentioned in the conflict-of-interest section, but not as a funder of the research. In Scopus, this problem occurred in all 20 publications. It also occurred in four publications in
WoS and in one publication in Crossref.

3 Scopus informed us that its funding data should be seen as “work in progress” as many incremental improvements are still planned. Currently Scopus focuses on optimizing the data for the top 300
funders.

Figure 8. Percentage of COVID-19 publications with funding data, breakdown by publisher and
database (considering only publications indexed in all three databases).
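Because the samples are small, the shares of correct funding data reported above come with considerable uncertainty. As an illustration that goes beyond the analysis in this paper, the sketch below computes approximate 95% Wilson score intervals for the reported counts. The choice of the Wilson interval and the function name are assumptions of this example.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Publications whose funding data matched the full text, as reported above
samples = {
    "Crossref": (27, 40),
    "Scopus": (15, 40),
    "WoS": (37, 40),
    "Scopus, private funder": (5, 25),
}
for name, (correct, n) in samples.items():
    low, high = wilson_interval(correct, n)
    print(f"{name}: {correct / n:.1%} correct (interval {low:.1%} to {high:.1%})")
```

With samples of 25 to 40 publications the intervals are wide, which is consistent with the caution expressed above about generalizing to a database as a whole.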
4.3.3. Differences by publisher
We now turn to differences by publisher in the coverage of funding data. Figure 8
shows, for each of the databases considered, the share of publications with funding data.
We restricted the analysis to publications from 2020 and 2021 and show only publishers with
500 or more COVID-19 publications. Statistics for other publishers are available in the Supplementary material (Mugabushaka et al., 2022).
Figure 8 shows some interesting differences between Scopus and WoS, but these are not as
striking as the differences between the two proprietary databases and Crossref.
Two publishers, Oxford University Press and American Chemical Society, do an excellent
job of submitting funding data to Crossref. For these publishers, the number of publications
with funding data in Crossref is almost as large as in WoS and, in the case of Oxford University
Press, substantially larger than in Scopus.
For many other publishers, the number of publications with funding data in Crossref is
considerably below the corresponding number in WoS, and in most cases also below the cor-
responding number in Scopus. These publishers seem to have gaps in the funding data they
submit to Crossref, or they may have started submitting funding data only recently.
Figure 8 also reveals three publishers that do not submit funding data to Crossref at all:
American Medical Association, Cambridge University Press, and JMIR.
There is a need to better understand why some publishers include funding information in
the metadata they provide to Crossref while others do not. A possible explanation is that
awareness among publishers of the importance of submitting funding data to Crossref still
needs to grow, and some publishers may also need more time to implement the submission
of funding data throughout all their workflows.
5. SUMMARY AND OUTLOOK
The COVID-19 pandemic has turned the world upside down. As the surge of cases threatens to
overwhelm healthcare systems and the death toll increases around the world, the effects of
containment measures reverberate through economies and societies, sending shock waves
that many fear will also be felt in years to come. Researchers have been among the first
professions contributing to tackling the pandemic, and research funding organizations have
adapted their programs or developed new ones to support them.
As a result, there has been an explosion of scientific papers on all aspects of the pandemic, by
some accounts making up 4% of the 2020 scientific output indexed by major bibliometric databases. This has also led to scientometric analyses trying to uncover patterns and trends in this
vast literature. In this paper we have looked into one aspect that has received little attention so
far: the funding of COVID-19 research. It is important to understand how past and current fund-
ing have contributed to tackling the pandemic. Such an understanding can not only inform the
design of adequate mechanisms for future emergencies but can also help funders meet their
accountability obligations. As the pandemic has shown, scientific evidence that generates
sound knowledge on which solutions and public policies are built has to compete with well-
organized disinformation campaigns. The disclosure of funding sources can also enhance the
transparency of the research process and increase the public’s confidence in scientific findings.
The main objective of this paper was to explore the extent to which openly available data
sets (Crossref funding data) can help in the study of funding of COVID-19 research (operation-
alized by the CORD-19 data set). We also aimed to make a comparison with the availability of
funding data in proprietary bibliometric databases (Scopus and WoS).
We found that only 17% of the CORD-19 publications with DOIs have funding data in
Crossref. This rate was higher for the proprietary databases: 24% for Scopus and 28% for
WoS. Considering only publications indexed by a database, we found that 33% of the
CORD-19 publications indexed in Scopus have funding data. The corresponding share for
WoS is 43%.
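The shares for publications indexed in each database follow directly from the counts reported in Section 4.1. The short calculation below reproduces them. The helper name `share` is hypothetical.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts reported in Section 4.1 (Figure 5)
indexed = {"Scopus": 187_518, "WoS": 171_130}
with_funding = {"Scopus": 61_168, "WoS": 73_444}

coverage = {db: share(with_funding[db], indexed[db]) for db in indexed}
# coverage == {"Scopus": 33, "WoS": 43}
```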
In terms of the main funders of COVID-19 research, the three databases paint a broadly
similar picture. The three funders with most publications in CORD-19 are the U.S. Department
of Health and Human Services (mainly the NIH), the National Natural Science Foundation of
China, and the European Commission. There are some differences in lower ranks.
By comparing publications with funding data in the three databases, we found a relatively
low overlap. Considering only publications indexed in all three databases, only 33% of the
publications with funding data in at least one database have funding data in all three data-
bases. For the two proprietary databases, the overlap is 64%.
We also assessed, based on small samples, the accuracy of the funding data present in one
database but not in the other two. Our analysis shows that most funding data exclusive to WoS
(i.e., data available neither in Crossref nor in Scopus) matched with funding information in the
full text of publications. The share of publications for which we could not confirm the correct-
ness of the funding data based on the full text of the publication was also relatively low in
Crossref, but it was quite high in Scopus.
After observing that the list of top funders from Scopus includes more pharmaceutical com-
panies than the lists from Crossref and WoS, we also checked manually a sample of 25 pub-
lications which had a company among the funding organizations. We found that in most cases
this did not correspond to the funding information included in the paper. Rather, it seems to be
the result of an error made by the algorithm used to extract and structure funding data. In most
cases, the algorithm incorrectly treats the conflict of interest or disclosure section of a paper as
a funding statement. As a result, pharmaceutical companies are often presented as funders of
the research presented in a paper, while in fact they are funders or collaborators in other activities of the authors.⁴
The main observation from this study is the limited coverage of funding data in open data
infrastructures. Using Crossref funding data alone allows us to paint only a partial picture of
who is funding COVID-19 research. In comparison with proprietary databases, the share of
CORD-19 publications (with DOI) that have funding data in Crossref is about seven percentage points lower than in Scopus and about 11 percentage points lower than in WoS. The
limited coverage of funding data in Crossref can be explained by differences in the metadata
deposited by publishers to Crossref. Although for some publishers we have nearly full
coverage of funding data, for others the coverage is relatively low, and there are also
publishers that do not deposit funding data at all. We also observed that in proprietary data-
bases the coverage by publisher rarely exceeds 75%. Publications without funding data may
present research for which the authors did not receive any funding. However, it is also possible
that the authors did receive funding and did report this in their publication, but that the
funding information was not processed properly due to algorithmic mistakes.
A second observation from this study is the uncertain quality of funding data. As already
mentioned, in a small random sample that we analyzed, we found that most funding data
related to pharmaceutical companies is based on algorithmic errors, mainly because extrac-
tion algorithms confuse conflict of interest statements with funding statements. This issue
affects all databases considered but seems more severe in Scopus.
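The confusion between conflict-of-interest statements and funding statements can be illustrated with a small sketch of section-aware matching. This is not the algorithm used by any of the databases, whose extraction methods are not known to us. The heading pattern, the function names, and the example paper are hypothetical.

```python
import re

# Headings treated as funding statements (an assumption of this sketch)
FUNDING_HEADINGS = re.compile(r"^(funding|funding information|acknowledge?ments?)$", re.I)


def funding_sections(sections):
    """Return the text of sections whose heading suggests a funding statement.

    `sections` is a list of (heading, text) pairs. Headings such as
    "Conflict of interest" or "Disclosure" are deliberately not matched.
    """
    return [text for heading, text in sections if FUNDING_HEADINGS.match(heading.strip())]


def mentions_funder(sections, funder):
    """True if `funder` is named in a funding or acknowledgment section."""
    return any(funder.lower() in text.lower() for text in funding_sections(sections))


# Invented example paper: a company appears only in the conflict-of-interest section
paper = [
    ("Funding", "This work was supported by the Example Research Council."),
    ("Conflict of interest", "One author has received speaker fees from Example Pharma."),
]
```

In this example, `mentions_funder(paper, "Example Pharma")` is false because the company is named only in the conflict-of-interest section, whereas an approach that scans the whole text for organization names would report it as a funder.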
In the following we offer some reflections on how the availability of funding data can be
improved.
Authors provide funding information in their papers to comply with the requirements of
publishers and funders. On the one hand, publishers’ ethical guidelines increasingly require
disclosure of funding sources. In the case of research with commercial or political interests at
stake, this transparency helps readers assess the extent to which the credibility of the findings
may be related to possible conflicts of interests of the authors. This is standard practice in many
medical journals. On the other hand, almost all major research funders require—often as part
of their grant terms and conditions—that grant holders explicitly acknowledge the funder’s
support in publications to which the funding has contributed.
By collecting and making available funding data provided in funding statements in publications, publishers provide an important service to various stakeholders, including the following:
• to readers by providing transparency on the sponsor of a study;
• to authors by helping them comply with funders’ requirements;
• to funders by allowing them to easily identify the results of their funding; and
• to scientometricians by enabling them to study the effectiveness and impact of funding
practices.
However, as the results presented in this paper show, funding data in open infrastructures is
lacking in terms of coverage and to some extent also quality.
4 Scopus informed us that it currently focuses on optimizing its funding data for the top 300 funders. While this
may be good enough for global comparisons, our findings show that it may give an inaccurate picture in
more specific analyses.
Publishers should be encouraged to sustain and intensify their efforts to submit funding data
to Crossref. One improvement that could be considered is to extend the data that can be pro-
vided to Crossref to also include the raw funding information text. This would allow publishers
who cannot commit resources to extract structured funding data from papers to participate in
this effort, overcoming the current situation in which publishers need to choose between pro-
viding structured funding data to Crossref and providing no funding data at all. Another advan-
tage would also be that having the funding statements as provided in papers could help
improve the quality of funding data. As better algorithms become available, they can be
applied, also retrospectively, to the available funding statements to turn these statements into
high-quality structured funding data.
Funders also have an important role in improving both the availability and the quality of
funding data in open data infrastructures. In particular, funders should support the efforts of
open scholarly infrastructures to create persistent identifiers. The Funder Registry used by
Crossref and Scopus and the deposition of funding data to Crossref should be seen as part
of broader efforts to improve the availability of high-quality open funding data, building on
past efforts—such as the guidelines of the UK Research Information Network (RIN, 2008),
and continuing to evolve. Crossref recently started an initiative to assign DOIs to research
grants.⁵ Funders should take up this opportunity and offer guidance to grant holders on
how persistent identifiers for grants should be used in funding statements in publications.
As scientific results increasingly impact our daily life, researchers, funders, publishers,
research organizations, and society at large share an interest in safeguarding the trust and credibility that scholarly communication enjoys. Working together on realizing high-quality open
funding data is an essential step in this endeavor.
ACKNOWLEDGMENTS
We are grateful to Clara Calero, Dan Gibson, and Jeroen van Honk (CWTS, Leiden University),
Ginny Hendricks (Crossref), M’hamed El Aisati (Elsevier Scopus), and Gali Halevi (Clarivate
WoS) for their feedback on an earlier draft of this paper. We also thank two anonymous
reviewers for their helpful comments and suggestions.
AUTHOR CONTRIBUTIONS
Alexis-Michel Mugabushaka: Conceptualization; Data curation; Formal analysis; Investigation;
Methodology; Visualization; Writing – original draft. Nees Jan van Eck: Conceptualization;
Data curation; Formal analysis; Investigation; Methodology; Visualization; Writing – review
& editing. Ludo Waltman: Conceptualization; Visualization; Writing – review & editing.
COMPETING INTERESTS
From 2018 to 2020, Alexis-Michel Mugabushaka was a member of Crossref’s Funders
Advisory Group on persistent identifiers for grants. In addition, the work was performed while
he was on secondment to the Directorate-General for Research and Innovation (DG RTD) of
the European Commission (EC). The views expressed in this paper are the authors’. They do
not reflect official positions of the EC or the European Research Council.
FUNDING INFORMATION
The authors did not receive any funding for the research reported in this paper.
5 https://www.crossref.org/blog/request-for-feedback-on-grant-identifier-metadata/.
DATA AVAILABILITY
The version of the CORD-19 data set used in this paper was released by Microsoft Academic
on February 22, 2021 and corresponds to the release from February 15, 2021 by the Allen
Institute. It is accessible at
https://magcord19.blob.core.windows.net/mapping/2021-02-22-CORD-19-MappedTo-2021-02-15-MAG-Backfill.csv.
The mapping of funding organizations to the corresponding top-level entities (see Section
2.2.2) is available in Zenodo (Van Eck & Mugabushaka, 2021):
https://doi.org/10.5281/zenodo.5562841.
The Supplementary material is also available in Zenodo (Mugabushaka et al., 2022):
https://doi.org/10.5281/zenodo.6805409.
REFERENCES
Álvarez-Bornstein, B., & Montesi, M. (2020). Funding acknowledgements
in scientific publications: A literature review. Research Evaluation,
29(4), 469–488. https://doi.org/10.1093/reseval/rvaa038
Álvarez-Bornstein, B., Morillo, F., & Bordons, M. (2017). Funding
acknowledgments in the Web of Science: Completeness and
accuracy of collected data. Scientometrics, 112(3), 1793–1812.
https://doi.org/10.1007/s11192-017-2453-4
Andersen, J. P., Nielsen, M. W., Simone, N. L., Lewiss, R. E., & Jagsi,
R. (2020). COVID-19 medical papers have fewer women first
authors than expected. eLife, 9, e58807. https://doi.org/10.7554
/eLife.58807, PubMed: 32538780
Aristovnik, A., Ravšelj, D., & Umek, L. (2020). A bibliometric analysis
of COVID-19 across science and social science research landscape.
Sustainability, 12(21), 9132. https://doi.org/10.3390/su12219132
Baas, J., Schotten, M., Plume, A., Côté, G., & Karimi, R. (2020).
Scopus as a curated, high-quality bibliometric data source for
academic research in quantitative science studies. Quantitative
Science Studies, 1(1), 377–386. https://doi.org/10.1162/qss_a
_00019
Birkle, C., Pendlebury, D. A., Schnell, J., & Adams, J. (2020). Web
of Science as a data source for research on scientific and scholarly
activity. Quantitative Science Studies, 1(1), 363–376.
https://doi.org/10.1162/qss_a_00018
Bleck, T. P., Buchman, T. G., Dellinger, R. P., Deutschman, C. S.,
Marshall, J. C., … Zimmerman, J. J. (2020). Pandemic-related
submissions: The challenge of discerning signal amidst noise.
Critical Care Medicine, 48(8), 1099–1102.
https://doi.org/10.1097/CCM.0000000000004477, PubMed: 32697478
Brainard, J. (2020). New tools aim to tame pandemic paper
tsunami. Science, 368, 924–925. https://doi.org/10.1126
/science.368.6494.924, PubMed: 32467369
Colavizza, G., Costas, R., Traag, V. A., Van Eck, N. J., van Leeuwen,
T., & Waltman, L. (2021). A scientometric overview of CORD-19.
PLOS ONE, 16(1), e0244839. https://doi.org/10.1371/journal
.pone.0244839, PubMed: 33411846
Cross, S., Rho, Y., Reddy, H., Pepperrell, T., Rodgers, F., … Keestra,
S. (2021). Who funded the research behind the Oxford-
AstraZeneca COVID-19 vaccine? BMJ Global Health, 6,
e007321. https://doi.org/10.1136/bmjgh-2021-007321,
PubMed: 34937701
Else, H. (2020). How a torrent of COVID science changed research
publishing—In seven charts. Nature, 588, 553.
https://doi.org/10.1038/d41586-020-03564-y, PubMed: 33328621
Grassano, N., Rotolo, D., Hutton, J., Lang, F., & Hopkins, M. M.
(2017). Funding data from publication acknowledgments: Coverage,
uses, and limitations. Journal of the Association for Information
Science and Technology, 68(4), 999–1017.
https://doi.org/10.1002/asi.23737
Hao, K. (2020). The scientists and technologists who dropped
everything to fight covid-19. MIT Technology Review, April 15.
https://www.technologyreview.com/2020/04/15/999478
/scientists-engineers-volunteer-fight-covid-19-pandemic/
Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref:
The sustainable source of community-owned scholarly metadata.
Quantitative Science Studies, 1(1), 414–427.
https://doi.org/10.1162/qss_a_00022
Herzog, C., Hook, D., & Konkiel, S. (2020). Dimensions: Bringing down
barriers between scientometricians and data. Quantitative Science
Studies, 1(1), 387–395. https://doi.org/10.1162/qss_a_00020
Hossain, M., Sarwar, S., McKyer, E. L., & Ma, P. (2020). Applications
of artificial intelligence technologies in COVID-19 research:
A bibliometric study. Preprints, 2020, 2020060161.
https://doi.org/10.20944/preprints202006.0161.v1
Hurst, P., & Greaves, S. (2021). COVID-19 Rapid Review
cross-publisher initiative: What we have learned and what we
are going to do next. Learned Publishing, 34(3), 450–453.
https://doi.org/10.1002/leap.1375, PubMed: 34230774
Ioannidis, J. P. A., Salholz-Hillel, M., Boyack, K. W., & Baas, J.
(2021). The rapid, massive growth of COVID-19 authors in the
scientific literature. Royal Society Open Science, 8(9), 210389.
https://doi.org/10.1098/rsos.210389, PubMed: 34527271
Kanakia, A., Wang, K., Dong, Y., Xie, B., Lo, K., … Wu, C. H.
(2020). Mitigating biases in CORD-19 for analyzing COVID-19
literature. Frontiers in Research Metrics and Analytics, 5, 596624.
https://doi.org/10.3389/frma.2020.596624, PubMed: 33870059
Kiszewski, A. E., Cleary, E. G., Jackson, M. J., & Ledley, F. D. (2021).
NIH funding for vaccine readiness before the COVID-19 pandemic.
Vaccine, 39(17), 2458–2466.
https://doi.org/10.1016/j.vaccine.2021.03.022, PubMed: 33781600
Kokol, P., & Blažun Vošner, H. (2018). Discrepancies among
Scopus, Web of Science, and PubMed coverage of funding information
in medical journal articles. Journal of the Medical Library
Association, 106(1), 81–86. https://doi.org/10.5195/jmla.2018
.181, PubMed: 29339937
Kwon, D. (2020). Scientists around the globe pivot their research to
SARS-CoV-2. The Scientist, April 6. https://www.the-scientist
.com/news-opinion/scientists-around-the-globe-pivot-their
-research-to-sars-cov-2-67385
Lewison, G., & Sullivan, R. (2015). Conflicts of interest statements
on biomedical papers. Scientometrics, 102(3), 2151–2159.
https://doi.org/10.1007/s11192-014-1507-0
Liu, W. (2020). Accuracy of funding information in Scopus: A comparative
case study. Scientometrics, 124(1), 803–811.
https://doi.org/10.1007/s11192-020-03458-w
Liu, W., Tang, L., & Hu, G. (2020). Funding information in Web
of Science: An updated overview. Scientometrics, 122(3),
1509–1524. https://doi.org/10.1007/s11192-020-03362-3
Mohadab, M. E., Bouikhalene, B., & Safi, S. (2020). Bibliometric
method for mapping the state of the art of scientific production
in Covid-19. Chaos, Solitons, and Fractals, 139, 110052. https://
doi.org/10.1016/j.chaos.2020.110052, PubMed: 32834606
Mugabushaka, A.-M. (2020). Linking publications to funding at
project level: A curated dataset of publications reported by FP7
projects. arXiv, arXiv:2011.07880. https://doi.org/10.48550/arXiv
.2011.07880
Mugabushaka, A.-M., Van Eck, N. J., & Waltman, L. (2022). Funding
Covid-19 research: Insights from an exploratory analysis using open
data infrastructures—Supplementary material [Data set]. Zenodo.
https://doi.org/10.5281/zenodo.6112762
Paul-Hus, A., Desrochers, N., & Costas, R. (2016). Characterization,
description, and considerations for the use of funding
acknowledgement data in Web of Science. Scientometrics,
108(1), 167–182. https://doi.org/10.1007/s11192-016-1953-y
RIN. (2008). Acknowledgement of funders in scholarly journal articles:
Guidance for UK research funders, authors and publishers. Retrieved
from https://www.ukri.org/wp-content/uploads/2020/10/RIN-251020-FundersAcknowledgementInScholarlyjournalArticles.pdf
Shorten, C., Khoshgoftaar, T. M., & Furht, B. (2021). Deep Learning
applications for COVID-19. Journal of Big Data, 8(1), 18. https://
doi.org/10.1186/s40537-020-00392-9, PubMed: 33457181
Shueb, S., Gul, S., Nisa, N. T., Shabir, T., Rehman, S. U., & Hussain,
A. (2022). Measuring the funding landscape of COVID-19 research.
Library Hi Tech, 40(2), 421–436.
https://doi.org/10.1108/LHT-04-2021-0136
Stoye, E. (2020). How research funders are tackling coronavirus disruption.
Nature, April 17. https://doi.org/10.1038/d41586-020-01120-2
Tang, L., Hu, G., & Liu, W. (2017). Funding acknowledgment analysis:
Queries and caveats. Journal of the Association for Information
Science and Technology, 68(3), 790–794.
https://doi.org/10.1002/asi.23713
Tao, Z., Zhou, S., Yao, R., Wen, K., Da, W., … Tao, L. (2020).
COVID-19 will stimulate a new coronavirus research break-
through: A 20-year bibliometric analysis. Annals of Translational
Medicine, 8(8), 528. https://doi.org/10.21037/atm.2020.04.26,
PubMed: 32411751
Van Eck, N. J., & Mugabushaka, A.-M. (2021). Crossref Funder
Registry—Mapping to top-level funding organisations [Data set].
Zenodo. https://doi.org/10.5281/zenodo.5562842
Van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer,
a computer program for bibliometric mapping. Scientometrics,
84(2), 523–538. https://doi.org/10.1007/s11192-009-0146-3,
PubMed: 20585380
Van Eck, N. J., & Waltman, L. (2021). Crossref as a source of open
bibliographic metadata. In Proceedings of the 18th International
Conference of the International Society for Scientometrics and
Informetrics (pp. 1169–1174).
Viglione, G. A. (2020). Tens of thousands of scientists are redeploying
to fight coronavirus. Nature, March 27.
https://doi.org/10.1038/d41586-020-00905-9, PubMed: 32221508
Visser, M., Van Eck, N. J., & Waltman, L. (2021). Large-scale comparison
of bibliographic data sources: Scopus, Web of Science, Dimensions,
Crossref, and Microsoft Academic. Quantitative Science Studies,
2(1), 20–41. https://doi.org/10.1162/qss_a_00112
Wang, L. L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., … Kohlmeier,
S. (2020). CORD-19: The COVID-19 Open Research Dataset. arXiv,
arXiv:2004.10706. https://doi.org/10.48550/arXiv.2004.10706