RESEARCH ARTICLE
COVID-19 research in Wikipedia
Giovanni Colavizza
University of Amsterdam, the Netherlands
Keywords: bibliometrics, CORD-19, coronavirus, COVID 19, scientometrics, Wikipedia
an open access journal
Citation: Colavizza, G. (2020). COVID-19 research in Wikipedia. Quantitative Science Studies, 1(4), 1349–1380.
https://doi.org/10.1162/qss_a_00080
DOI: https://doi.org/10.1162/qss_a_00080
Received: 14 May 2020
Accepted: 12 July 2020
Corresponding Author: Giovanni Colavizza, g.colavizza@uva.nl
Handling Editor: Staša Milojević
Copyright: © 2020 Giovanni Colavizza. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The MIT Press
ABSTRACT
Wikipedia is one of the main sources of free knowledge on the Web. During the first few months
of the pandemic, over 5,200 new Wikipedia pages on COVID-19 were created, accumulating
over 400 million page views by mid-June 2020.1 At the same time, an unprecedented number
of scientific articles on COVID-19 and the ongoing pandemic have been published online.
Wikipedia’s content is based on reliable sources, such as scientific literature. Given its public
function, it is crucial for Wikipedia to rely on representative and reliable scientific results,
especially in a time of crisis. We assess the coverage of COVID-19-related research in
Wikipedia via citations to a corpus of over 160,000 articles. We find that Wikipedia editors are
integrating new research at a fast pace, and have cited close to 2% of the COVID-19 literature
under consideration. While doing so, they are able to provide a representative coverage of
COVID-19-related research. We show that all the main topics discussed in this literature are
proportionally represented in Wikipedia, after accounting for article-level effects. We further
use regression analyses to model citations from Wikipedia and show that Wikipedia editors
on average rely on literature that is highly cited, widely shared on social media, and
peer-reviewed.
1. INTRODUCTION
Alongside the primary health crisis, the COVID-19 pandemic has been recognized as an information
crisis, or an “infodemic” (Cinelli, Quattrociocchi, et al., 2020; Ioannidis, 2020; Xie, Lui, et al.,
2020). Widespread misinformation (Swire-Thompson & Lazer, 2020) and low levels of health
literacy (Paakkari & Okan, 2020) are two of the main issues. In an effort to deal with them, the
World Health Organization maintains a list of relevant research updated daily (Zarocostas,
2020), as well as a portal to provide information to the public (World Health Organization,
2020a); the European Commission does similarly (European Commission, 2020), as do many other
countries and organizations. The need to convey accurate, reliable, and understandable medical
information online has never been so pressing.
Wikipedia plays a fundamental role as a public source of information on the Web, striving to
provide “neutral” and unbiased content (Mesgari, Okoli, et al., 2015). Wikipedia is particularly
important for accessing trusted medical information (Smith, 2020; Swire-Thompson & Lazer,
2020). Fortunately, Wikipedia biomedical articles have repeatedly been found to be highly visible
and of high quality (Adams, Montgomery, et al., 2020; Maggio, Steinberg, et al., 2020).
Wikipedia’s verifiability policy mandates that readers can check the sources of information
contained in Wikipedia, and that reliable sources should be secondary and published.2 These
guidelines are particularly strict with respect to biomedical content, where the preferred sources
are, in order, systematic reviews, reviews, books, and other scientific literature.3
1 https://wikimediafoundation.org/covid19/data (accessed July 4, 2020).
2 https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources (accessed May 10, 2020).
Figure 1. Number of months elapsed from publication to the first Wikipedia citation (mean and
median binned by year) of COVID-19-related research. In 2020, the average number of months from
(official) publication to the first citation in Wikipedia has dropped to zero, likely due to the effect of early
releases by some journals. As this figure shows censored data, it should only be taken as illustrative of
the fact that Wikipedia editors are citing very recent or even unpublished research.
The COVID-19 pandemic has put Wikipedia under stress, with a large amount of new, often
non-peer-reviewed research being published in parallel with a surge in interest in information
related to the pandemic (Wikimedia Foundation, 2020). The response of Wikipedia’s editor
community has been rapid: Since March 17, 2020, all COVID-19-related Wikipedia pages have been
put under indefinite sanctions, entailing restricted edit access, to allow for better vetting of their
contents.4 In parallel, a COVID-19 WikiProject has been established and a content creation
campaign is ongoing (Jung, Geng, et al., 2020; Wikimedia Foundation, 2020).5 While this effort is
commendable, it also raises questions about the capacity of editors to find, select, and integrate
scientific information on COVID-19 at such a rapid pace while keeping quality high. As an
illustration of the speed at which events are happening, in Figure 1 we show the average time in number
of months from publication to a first citation in Wikipedia for a large set of COVID-19-related
articles (see Data and Methods). In 2020, this time has dropped to zero: Articles on COVID-19 are
frequently cited in Wikipedia immediately after (or even before) their official publication date,
based on early access versions of articles.
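The delay plotted in Figure 1 can be computed with a few lines of code. The sketch below is illustrative only: it assumes that, for each article, the publication date and the date of its first Wikipedia citation are already known, and both the record layout and the calendar-month arithmetic are our assumptions rather than the author's pipeline. As in Figure 1, only articles that have already been cited enter the computation, so the statistic is censored.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end; negative if end falls in an earlier month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def delay_by_publication_year(records):
    """Mean and median months to the first Wikipedia citation, binned by publication year.

    records: iterable of (publication_date, first_wikipedia_citation_date) pairs
    (a hypothetical input layout). Uncited articles are simply absent, which is
    why the resulting statistic is censored.
    """
    delays = defaultdict(list)
    for published, first_cited in records:
        delays[published.year].append(months_between(published, first_cited))
    return {year: (mean(d), median(d)) for year, d in sorted(delays.items())}


example = [
    (date(2019, 1, 15), date(2019, 7, 1)),   # cited 6 months later
    (date(2019, 3, 1), date(2019, 5, 1)),    # cited 2 months later
    (date(2020, 3, 10), date(2020, 3, 2)),   # cited days before the official date
    (date(2020, 4, 1), date(2020, 4, 20)),
]
print(delay_by_publication_year(example))
```

Because the month arithmetic ignores days, a citation added earlier in the same month as the official publication counts as zero months, while one added in an earlier month would count as negative.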
In this work, we pose the following general question: Is Wikipedia relying on a representa-
tive and reliable sample of COVID-19-related research? We break this question down into the
following two research questions:
1. RQ1: Is the literature cited in Wikipedia representative of the broader topics discussed
in COVID-19-related research?
2. RQ2: Is Wikipedia citing COVID-19-related research during the pandemic following the
same inclusion criteria adopted before and in general?
3 https://en.wikipedia.org/wiki/Wikipedia:Identifying_reliable_sources_(medicine) (accessed May 10, 2020).
4 https://en.wikipedia.org/wiki/Wikipedia:General_sanctions (accessed May 10, 2020).
5 https://en.wikipedia.org/wiki/Wikipedia:WikiProject_COVID-19 (accessed May 10, 2020).
Quantitative Science Studies
1350
We approach the first question by clustering COVID-19-related publications using text and
citation data and comparing Wikipedia’s coverage of different clusters before and during the
pandemic. The second question is instead approached using regression analysis. In particular,
we model whether an article is cited in Wikipedia or not, and how many citations it receives
from Wikipedia. We then again compare results for articles cited before and during the pandemic.
Our main finding is that Wikipedia contents rely on representative and high-impact
COVID-19-related research. (RQ1) During the past few months, Wikipedia editors have successfully integrated
COVID-19 and coronavirus research, keeping pace with the rapid growth of related literature by
including a representative sample of each of the topics it contains. (RQ2) The inclusion criteria used
by Wikipedia editors to integrate COVID-19-related research during the pandemic are consistent
with those used before, and appear reasonable in terms of source reliability. Specifically, editors
prefer articles from specialized journals or mega journals over preprints, and focus on highly cited
and/or highly socially visible literature. Altmetrics such as Twitter shares, mentions in news and
blogs, and the number of Mendeley readers complement citation counts from the scientific
literature as an indicator of impact positively correlated with citations from Wikipedia. After controlling
for these article-level impact indicators and for publication venue, time, and size effects, there is no
indication that the topic of research matters with respect to receiving citations from Wikipedia. This
indicates that Wikipedia is currently neither over- nor underrelying on any specific
COVID-19-related scientific topic.
2. RELATED WORK
Wikipedia articles are created, improved, and maintained by the efforts of the community of
volunteer editors (Chen & Roth, 2012; Priedhorsky, Chen, et al., 2007), and they are used in a
variety of ways by a wide user base (Lemmerich, Sáez-Trumper, et al., 2019; Piccardi, Redi,
et al., 2020; Singer, Lemmerich, et al., 2017). The information Wikipedia contains is generally
considered to be of high quality and up to date (Adams et al., 2020; Geiger & Halfaker, 2013;
Keegan, Gergle, & Contractor, 2011; Kumar, West, & Leskovec, 2016; Piscopo & Simperl,
2019; Priedhorsky et al., 2007; Smith, 2020), notwithstanding room for improvement and the
need for constant knowledge maintenance (Chen & Roth, 2012; Forte, Andalibi, et al., 2018;
Lewoniewski, Węcel, & Abramovich, 2017).
Following Wikipedia’s editorial guidelines, the community of editors creates content often re-
lying on scientific and scholarly literature (Arroyo-Machado, Torres-Salinas, et al., 2020; Halfaker,
Mansurov, et al., 2018; Nielsen, Mietchen, & Willighagen, 2017), and therefore Wikipedia can be
considered a mainstream gateway to scientific information (Heilman, Kemmann, et al., 2011;
Laurent & Vickers, 2009; Lewoniewski et al., 2017; Maggio, Willinsky, et al., 2019; Piccardi
et al., 2020; Shafee, Masukume, et al., 2017). Unfortunately, few studies have considered the rep-
resentativeness and reliability of Wikipedia’s scientific sources. The evidence on what scientific and
scholarly literature is cited in Wikipedia is slim. Early studies point to a relatively low overall
coverage, indicating that between 1% and 5% of all published journal articles are cited in Wikipedia
(Priem, Piwowar, & Hemminger, 2012; Shuai, Jiang, et al., 2013; Zahedi, Costas, & Wouters,
2014). Previous studies have shown that the subset of scientific literature cited from Wikipedia is
more likely on average to be published in popular, high-impact-factor journals, and to be available
via open access (Arroyo-Machado et al., 2020; Nielsen, 2007; Teplitskiy, Lu, & Duede, 2017).
Wikipedia is particularly relevant as a means to access medical information online (Heilman
et al., 2011; Laurent & Vickers, 2009; Smith, 2020; Swire-Thompson & Lazer, 2020).
Wikipedia’s medical content is of very high quality on average (Adams et al., 2020) and is primarily
written by a core group of medical professionals who are part of the nonprofit Wikipedia Medicine
(Shafee et al., 2017). Articles that are part of WikiProject Medicine “are longer, possess a greater
density of external links, and are visited more often than other articles on Wikipedia” (Maggio et al.,
2020). Perhaps not surprisingly, the fields of research that receive most citations from Wikipedia are
“Medicine (32.58%)” and “Biochemistry, Genetics and Molecular Biology (31.5%)” (Arroyo-Machado
et al., 2020); Wikipedia’s medical pages also contain more citations to scientific literature
than the average Wikipedia page (Maggio et al., 2019). Scope for improvement remains: For
example, the readability of medical content in Wikipedia remains difficult for nonexperts
(Brezar & Heilman, 2019). Given the high quality and high visibility of Wikipedia’s medical content,
our work is concerned with understanding whether the Wikipedia editor community has been
able to maintain the same standards for COVID-19-related research.
3. DATA AND METHODS
3.1. COVID-19-Related Research
COVID-19-related research is not trivial to delimit (Colavizza, Costas, et al., 2020). Our approach
is to consider two public and regularly updated lists of publications:
• The Dimensions COVID-19 Publications list (Dimensions, 2020).
• The COVID-19 Open Research Dataset (CORD-19): a collection of COVID-19 and
coronavirus-related research, including publications from PubMed Central, Medline, arXiv,
bioRxiv, and medRxiv (Wang, Lo, et al., 2020). CORD-19 also includes publications from the
World Health Organization COVID-19 Database (World Health Organization, 2020b).
Publications from these three lists are merged, and duplicates removed using publication iden-
tifiers, including DOI, PMID, PMCID, and Dimensions ID. Publications without at least one iden-
tifier among these are discarded. As of July 1, 2020, the resulting list of publications contains
160,656 entries with a valid identifier, of which 72,795 were released in 2020, as can be seen
from Figure 2. Research on coronaviruses, and therefore the accumulation of this corpus over time,
has been clearly influenced by the SARS (2003+), MERS (2012+), and COVID-19 outbreaks. We
use this list of publications to represent COVID-19 and coronavirus research in what follows.
More details are given in the online repositories.
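The merging and deduplication step described above can be illustrated with a union-find structure: any two records that share a DOI, PMID, PMCID, or Dimensions ID end up in the same group, even when they are linked only through a third record. This is a minimal sketch under our own assumptions (records as dictionaries with the hypothetical keys `doi`, `pmid`, `pmcid`, and `dimensions_id`; DOIs compared case-insensitively; the first value seen wins when a group carries conflicting identifiers). The actual procedure is documented in the online repositories.

```python
ID_FIELDS = ("doi", "pmid", "pmcid", "dimensions_id")  # hypothetical field names


def merge_publication_lists(*lists):
    """Merge publication records from several lists, deduplicating on shared identifiers."""
    # Discard records without at least one identifier.
    records = [r for lst in lists for r in lst if any(r.get(f) for f in ID_FIELDS)]
    parent = list(range(len(records)))

    def find(i):
        # Find the group root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, normalized value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for field in ID_FIELDS:
            value = rec.get(field)
            if not value:
                continue
            key = (field, value.lower() if field == "doi" else value)  # DOIs are case-insensitive
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])  # union the two groups
            else:
                first_seen[key] = idx

    groups = {}
    for idx, rec in enumerate(records):
        groups.setdefault(find(idx), []).append(rec)

    merged = []
    for group in groups.values():
        combined = {}
        for rec in group:
            for field in ID_FIELDS:
                if rec.get(field) and field not in combined:
                    combined[field] = rec[field]  # keep the first value seen
        merged.append(combined)
    return merged
```

Union-find is used here so that chains of partial matches (a DOI shared with one record, a PMID shared with another) collapse into a single publication without comparing every pair of records.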
3.2. Auxiliary Data Sources
In order to study Wikipedia’s coverage of this list of COVID-19-related publications, we use data
from Altmetric (Ortega, 2018; Robinson-García, Torres-Salinas, et al., 2014). Altmetric provides
Wikipedia citation data relying on known identifiers,6 so citations that lack such an identifier are not
captured. Despite this limitation, Altmetric data have been previously used to map Wikipedia’s use of
scientific articles (Arroyo-Machado et al., 2020; Torres-Salinas, Romero-Frías, & Arroyo-Machado, 2019;
Zahedi et al., 2014), especially because citations from Wikipedia are considered a possible measure of
impact (Kousha & Thelwall, 2017; Sugimoto, Work, et al., 2017). Publications from the full list above
are queried using the Altmetric API by DOI or PMID. In this way, 101,662 publications could be
retrieved. After merging duplicates by summing their Altmetric indicators, we have a final set of
94,600 distinct COVID-19-related publications with an Altmetric entry.
Figure 2. COVID-19-related literature over time, binned by publication year. (a) Overall; (b) Since 2000 (included).
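The duplicate-merging step for Altmetric entries can be sketched as follows. We assume that each retrieved entry has already been mapped to a publication key (for instance, the group produced when deduplicating identifiers) and carries a dictionary of counts; the indicator names are hypothetical. Entries mapped to the same publication have their indicators summed, and an article then counts as cited in Wikipedia if its summed Wikipedia count is positive.

```python
from collections import Counter


def merge_altmetric_entries(entries):
    """Sum the indicators of Altmetric entries that map to the same publication.

    entries: iterable of (publication_key, indicators) pairs, where indicators is a
    dict of counts such as {"wikipedia": 1, "twitter": 10} (illustrative names).
    """
    merged = {}
    for pub_key, indicators in entries:
        merged.setdefault(pub_key, Counter()).update(indicators)  # adds counts key by key
    return {key: dict(counts) for key, counts in merged.items()}


def cited_in_wikipedia(merged, field="wikipedia"):
    """Publication keys with at least one Wikipedia citation after merging."""
    return {key for key, counts in merged.items() if counts.get(field, 0) > 0}
```

Summing assumes the entries are distinct Altmetric records; if the very same record can be returned by both the DOI and the PMID query, dropping repeated Altmetric identifiers before summing would avoid double counting.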
In addition, we use data from Dimensions (Herzog, Hook, & Konkiel, 2020; Martín-Martín,
Thelwall, & López-Cózar, 2020) in order to get citation counts for COVID-19-related publications.
The Dimensions API is also queried by DOI and PMID, resulting in 141,783 matches. All auxiliary
data sources were queried on July 1, 2020.
3.3. Methods
We approach our two research questions with the following methods:
1. RQ1: To assess whether the literature cited in Wikipedia is representative of the broader
topics discussed in COVID-19-related research, we first cluster COVID-19 literature using
text and citation data. Clusters of related literature allow us to identify broad distributions
over topics within our COVID-19 corpus. We then assess to what extent the literature cited
from Wikipedia follows the same distribution over topics of the entire corpus.
2. RQ2: To ascertain the inclusion criteria of Wikipedia editors, we use regression analysis to
model whether an article is cited from Wikipedia or not (logistic regression) and the number
of Wikipedia citations it receives (linear regression).
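To make the logistic part of RQ2 concrete, the sketch below fits a logistic regression by batch gradient descent in plain Python. The predictors (log-transformed citation and tweet counts) and the toy data are hypothetical; the actual model specification, with its controls for venue, time, and size, is described in the regression section and is not reproduced here. In practice a statistics library would replace the hand-written optimizer.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(cited) = sigmoid(b + w . x) by minimizing the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one article: (p - y) * x.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical articles: (scientific citations, tweets, cited_in_wikipedia)
articles = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (20, 6, 1), (35, 11, 1), (55, 19, 1)]
X = [[math.log1p(c), math.log1p(t)] for c, t, _ in articles]
y = [label for _, _, label in articles]
w, b = fit_logistic(X, y)
```

The second model in RQ2, a linear regression on the number of Wikipedia citations, could be sketched the same way by swapping the log-loss for a squared-error loss.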
In this section, we detail the experimental choices made for clustering analysis using publication
text and citation data. Details of the regression analyses are given in the corresponding section.
Text-based clustering of publications was performed in two ways: topic modeling and k-means
relying on SPECTER embeddings. Both methods made use of the titles and abstracts of available
publications by concatenating them into a single string. We detected 152,247 articles in English
out of 160,656 total articles (8,409 fewer than the total). Of these, 33,301 have no abstract, so for
them we used only the title; results did not change significantly when excluding articles without an
abstract. Before performing topic modeling, we applied a preprocessing pipeline using scispaCy’s
en_core_sci_md model (Neumann, King, et al., 2019) to convert each document into a bag of
words representation, which includes the following steps: entity detection and inclusion in the bag-
of-words for entities strictly longer than one token; lemmatization; removal of isolated punctuation,
stop words, and tokens consisting of a single character; and inclusion of frequent bigrams.
SPECTER embeddings were instead retrieved from the API without any preprocessing7.
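For illustration, the bag-of-words construction can be approximated without scispaCy. The sketch below keeps lowercasing, the removal of isolated punctuation, stop words, and single-character tokens, and the addition of frequent bigrams, but it omits entity detection and lemmatization, which require the en_core_sci_md model. The tiny stop-word list and the bigram count threshold are hypothetical choices. Each document is the title followed by the abstract, falling back to the title alone when no abstract is available.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical; a real pipeline uses a full list).
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9-]*")  # isolated punctuation never matches


def document_text(title, abstract=None):
    """Concatenate title and abstract into one string (title only if there is no abstract)."""
    return f"{title} {abstract}" if abstract else title


def tokenize(text):
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]


def bags_of_words(documents, min_bigram_count=2):
    """Unigram counts plus bigrams occurring at least min_bigram_count times in the corpus."""
    token_lists = [tokenize(doc) for doc in documents]
    bigram_counts = Counter(pair for toks in token_lists for pair in zip(toks, toks[1:]))
    frequent = {pair for pair, count in bigram_counts.items() if count >= min_bigram_count}
    bags = []
    for toks in token_lists:
        bag = Counter(toks)
        bag.update("_".join(pair) for pair in zip(toks, toks[1:]) if pair in frequent)
        bags.append(bag)
    return bags
```

Frequent bigrams are added alongside, not instead of, their unigrams, so a phrase like "spike protein" contributes both the single words and the joined token.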
6 The identifiers considered by Altmetric in order to establish a citation from Wikipedia to an article currently
include DOI, URI from a domain white list, PMID, PMCID, and arXiv ID.
https://help.altmetric.com/support/solutions/articles/6000060980-how-does-altmetric-track-mentions-on-wikipedia
(accessed April 27, 2020).
7 https://github.com/allenai/paper-embedding-public-apis (accessed April 25, 2020).
Topic modeling is a family of methods to learn statistical patterns of keywords frequently
occurring together in the same documents. Formally, a topic is defined as a probability distribution
over a vocabulary. Multiple topics can be learned from a corpus of documents and then used to
cluster it (Blei, 2012). Topic models are useful because they require no annotated data, and they
also provide a way to explore a corpus of documents. As such, they have been previously
used for bibliometric analysis (Leydesdorff & Nerghes, 2017; Yau, Porter, et al., 2014). We trained
and compared topic models using Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003),
Correlated Topic Models (CTM; Blei & Lafferty, 2007), Hierarchical Dirichlet Process (HDP;
Teh, Jordan, et al., 2006) and a range of topics between five and 50. We found similar results in
terms of topic contents and their Wikipedia coverage (see Results) across models and over multiple
runs, and a reasonable value of the number of topics to be between 15 and 25 from a topic
coherence analysis (Mimno, Wallach, et al., 2011). Therefore, in what follows we discuss an LDA
model with 15 topics.8 The top words for each topic of this model are given in the Appendix, while
topic intensities over time are plotted as a heat map in Figure A2. SPECTER is a novel method to
generate document-level embeddings of scientific documents based on a transformer language
model and the network of citations (Cohan, Feldman, et al., 2020). SPECTER does not require
citation information at inference time, and performs well without any further training on a variety of
tasks. We embed every paper and cluster them using k-means with k = 20. The number of clusters
was established using the elbow and silhouette methods; different values of k could well be
chosen, so we again decided to pick the smallest reasonable value of k.
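The choice of k can be illustrated on toy data. The sketch below runs Lloyd's k-means and scores each candidate k by its inertia (the quantity inspected with the elbow method) and its mean silhouette coefficient. The two-dimensional points stand in for the SPECTER embeddings, and the deterministic farthest-first seeding is our substitution for the randomized initialization commonly used. Unlike the procedure described above, which picked the smallest reasonable k, this sketch simply returns the k with the highest silhouette.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (our substitution): start at the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move each centroid to the mean of its members (keep it if the cluster is empty).
            updated.append([sum(dim) / len(members) for dim in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the k with the highest silhouette, plus (inertia, silhouette) for every k."""
    results = {}
    for k in candidates:
        labels, inertia = kmeans(points, k)
        results[k] = (inertia, silhouette(points, labels))
    return max(results, key=lambda k: results[k][1]), results
```

Inertia always decreases as k grows, which is why it is read for an "elbow" rather than maximized, while the silhouette can peak at an intermediate k.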
We then turned our attention to citation network clustering. We constructed a bibliographic
coupling citation network (Kessler, 1963) based on all publications with references provided by
Dimensions; these amount to 118,214. Edges were weighted using fractional counting
(Perianes-Rodriguez, Waltman, & van Eck, 2016), hence dividing the number of references in
common between any two publications by the length of the union of their reference lists (thus,
the maximum possible weight is 1.0). We used only the giant weakly connected component,
which amounts to 114,829 nodes (3,385 fewer than the total) and 70,091,752 edges with a
median weight of 0.0217. We clustered the citation network using the Leiden algorithm
(Traag, Waltman, & van Eck, 2019) with a resolution parameter of 0.05 and the Constant
Potts Model (CPM) quality function (Traag, Van Dooren, & Nesterov, 2011). With this configu-
ration, we found that the largest 43 clusters account for half the nodes in the network, and the
largest cluster is composed of 15,749 nodes.
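The edge weighting and the quality function named above can be written down directly. The sketch below builds bibliographic coupling edges with the normalization described in the text (shared references divided by the size of the union of the two reference lists), keeps the largest connected component (the coupling network is undirected, so weak connectivity reduces to ordinary connectivity), and evaluates the Constant Potts Model quality of a given partition, H = Σ_c [e_c − γ · n_c(n_c − 1)/2], where e_c is the total weight of edges inside cluster c, n_c its number of nodes, and γ the resolution parameter (0.05 in the text). It does not implement the Leiden search itself, which in practice would be delegated to a library such as leidenalg; the toy reference lists in the example are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coupling_edges(references):
    """references: {publication_id: set of cited ids} -> {(a, b): weight}."""
    citing = defaultdict(set)  # inverted index: cited id -> publications citing it
    for pub, refs in references.items():
        for ref in refs:
            citing[ref].add(pub)
    pairs = set()
    for pubs in citing.values():
        pairs.update(combinations(sorted(pubs), 2))  # only pairs sharing a reference
    return {
        (a, b): len(references[a] & references[b]) / len(references[a] | references[b])
        for a, b in pairs
    }


def giant_component(nodes, edges):
    """Largest connected component of the undirected coupling network."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, largest = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = {start}, [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    stack.append(neighbor)
        if len(component) > len(largest):
            largest = component
    return largest


def cpm_quality(edges, membership, resolution=0.05):
    """Constant Potts Model quality: sum over clusters of e_c - resolution * n_c * (n_c - 1) / 2."""
    internal = defaultdict(float)
    for (a, b), weight in edges.items():
        if membership[a] == membership[b]:
            internal[membership[a]] += weight
    sizes = Counter(membership.values())
    return sum(internal[c] - resolution * n * (n - 1) / 2 for c, n in sizes.items())
```

The inverted index avoids comparing all pairs of publications: only pairs that share at least one reference can receive a nonzero coupling weight.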
These three methods differ in which data they use and how, and thus provide for comple-
mentary results. While topic models focus on word co-occurrences and are easier to interpret,
bibliographic coupling networks rely on the explicit citation links among publications. Finally,
SPECTER combines both kinds of data and modern deep learning techniques.
4. RESULTS
Intense editorial work was carried out over the early weeks of 2020 to include scientific information
on COVID-19 and coronaviruses in Wikipedia (Jung, Geng, et al., 2020). From Figure 3(a), we can
appreciate the surge in new citations to COVID-19 research added in Wikipedia. Importantly, these
citations were added not only to cope with the growing amount of new literature but also to fill gaps
by including literature published before 2020, as shown in Figure 3(b). Overall, 1.9% of
COVID-19-related articles are cited at least once in Wikipedia. Yet this number is uneven across
languages and over time. Articles in English have a 2.0% chance of being cited in Wikipedia, while
articles in other languages have only a 0.24% chance. To be sure, the whole corpus is English
dominated, as discussed above. This might be an artifact of the coverage of the data sources, as well
as of the way the corpus was assembled. The coverage of articles over time is instead given in
Figure 4, starting from 2003, when the first surge of publications happened due to SARS. We can
appreciate that the coverage seems to be uneven, and less pronounced for the past few years
(2017–2020), yet this needs to be considered in view of the high growth of publications in 2020.
Hence, while 2020 is a relatively low-coverage year (1.2%), it is already the year with the most
publications cited in Wikipedia in absolute number (Figure 3(b)).
8 We used gensim’s implementation for LDA (Řehůřek & Sojka, 2010) and tomotopy for CTM and HDP,
https://bab2min.github.io/tomotopy (version 0.7.0). The reader can find more results and the code to
replicate all experiments in the accompanying repository.
Citation distributions are skewed in Wikipedia, as they are in science more generally. Some
articles receive a high number of citations in Wikipedia and some Wikipedia articles make a high
number of citations to COVID-19-related literature. Table A1 lists the top 20 Wikipedia articles by
number of citations of COVID-19-related research. These articles, largely in English, primarily
focus on the recent pandemic and coronaviruses/viruses from a virology perspective, as already
highlighted in a study by the Wikimedia Foundation (Jung et al., 2020). Table A2 instead reports
the top 20 journal articles cited from Wikipedia. These follow a similar pattern: Articles
published before 2020 focus on virology and include a high proportion of review articles.
Articles published in 2020, instead, focus on the ongoing pandemic, its origins, and its
epidemiological and public health aspects. As we see next, this strongly aligns with the general
trends of COVID-19-related research over time.
Figure 3. Timing of new citations from Wikipedia, and publication years of the articles they refer to. See Figure A1 for the full timeline.
(a) Number of citations in Wikipedia to COVID-19 literature, per month from January 2018. (b) Publication year of COVID-19 articles cited
from Wikipedia, distinguishing between citations added before 2020 and in 2020.
Figure 4. Fraction of COVID-19-related articles cited from Wikipedia per year, from 2003.
In order to discuss research trends in our COVID-19-related corpus at a coarser level of
granularity, we grouped the 15 topics from the LDA topic model into five general topics and labeled
them as follows:
• Coronaviruses: topics 5, 8; this general topic includes research explicitly on coronaviruses
(COVID-19, SARS, MERS) from a variety of perspectives (virology, epidemiology, intensive
care, historical unfolding of outbreaks).
• Epidemics: topics 9, 11, 12; research on epidemiology, including modeling the transmission
and spread of pathogens.
• Public health: topics 0, 1, 10; research on global health issues and healthcare.
• Molecular biology and immunology: topics 2, 4, 6; research on the genetics and biology
of viruses, vaccines, drugs, and therapies.
• Clinical medicine: topics 3, 7, 13, 14; research on intensive care, hospitalization, and
clinical trials.
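Given the mapping above, a general topic intensity can be computed from the document-topic proportions of the LDA model. We assume here that a document's intensity for a general topic is the sum of its proportions over the member topics; the relative and absolute yearly intensities discussed next then correspond to the yearly mean and yearly sum of these values. The input layout (a year and a list of 15 topic proportions per document) is hypothetical.

```python
from collections import defaultdict

# Grouping of the 15 LDA topics into five general topics, as listed above.
GENERAL_TOPICS = {
    "Coronaviruses": [5, 8],
    "Epidemics": [9, 11, 12],
    "Public health": [0, 1, 10],
    "Molecular biology and immunology": [2, 4, 6],
    "Clinical medicine": [3, 7, 13, 14],
}


def general_topic_intensity(doc_topics):
    """doc_topics: list of 15 topic proportions for one document (assumed to sum to 1)."""
    return {name: sum(doc_topics[t] for t in topics) for name, topics in GENERAL_TOPICS.items()}


def yearly_intensity(docs):
    """docs: iterable of (year, doc_topics) pairs (hypothetical layout).

    Returns (relative, absolute): per year, the mean and the sum of the
    general topic intensities of that year's documents.
    """
    per_year = defaultdict(list)
    for year, doc_topics in docs:
        per_year[year].append(general_topic_intensity(doc_topics))
    relative, absolute = {}, {}
    for year, rows in sorted(per_year.items()):
        absolute[year] = {name: sum(row[name] for row in rows) for name in GENERAL_TOPICS}
        relative[year] = {name: total / len(rows) for name, total in absolute[year].items()}
    return relative, absolute
```

Because the absolute intensity is a sum, it grows with the number of publications in a year, which is why it can be read as an approximate article count per general topic.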
The grouping is informed by agglomerative clustering based on the Jensen-Shannon distance
between topic-word distributions (Figure A5). To be sure, the labeling is a simplification of the
actual publication contents. It is also worth considering that topics overlap substantially. The
COVID-19 research corpus is dominated by literature on coronaviruses, public health, and
epidemics, largely due to 2020 publications. COVID-19-related research did not accumulate
uniformly over time. We plot the relative (yearly mean, Figure A3a) and absolute (yearly sum,
Figure A3b) general topic intensity. From these plots, we confirm the periodization of
COVID-19-related research as connected to known outbreaks. Outbreaks generate a shift in the attention
of the research community, which is apparent when we consider the relative general topic
intensity over time in Figure A3(a). The 2003 SARS outbreak generated a shift associated with a rise
in publications on coronaviruses and on the management of epidemic outbreaks (public health,
epidemiology). A similar shift is again happening, at a much larger scale, during the current
COVID-19 pandemic. When we consider the absolute general topic intensity, which can be
interpreted as the number of articles on a given topic (Figure A3b), we can appreciate how
scientists are mostly focusing on topics related to public health, epidemics, and coronaviruses
(COVID-19) during these first months of the current pandemic.
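The agglomerative grouping of topics mentioned above can be sketched as follows. The Jensen-Shannon distance is the square root of the Jensen-Shannon divergence; with base-2 logarithms it lies between 0 and 1. The text does not state the linkage criterion, so average linkage is our assumption, as are the toy four-word topic-word distributions in the example.

```python
import math


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # clamp tiny negative rounding errors


def agglomerate(distributions, n_groups):
    """Average-linkage (our assumption) agglomerative clustering of topic-word distributions."""
    n = len(distributions)
    dist = {(i, j): js_distance(distributions[i], distributions[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_groups:
        x, y = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair (x < y)
        del clusters[y]
    return clusters
```

Applied to the 15 topic-word distributions of the LDA model, stopping at five groups would yield candidate general topics; the final labels above still involve human judgment.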
4.1. RQ1: Wikipedia Coverage of COVID-19-Related Research
We address here our first research question: Is the literature cited in Wikipedia representative of
the broader topics discussed in COVID-19-related research? We start by comparing the general
topic coverage of articles cited from Wikipedia with those that are not. In Figure 5, three plots
are provided: the general topic intensity of articles published before 2020 (Figure 5a), in 2020
(Figure 5b), and overall (Figure 5c). The general topic intensity is averaged and 95% confidence
intervals are provided. From Figure 5(c) we can see that Wikipedia seems to cover
COVID-19-related research well. The general topics on immunology, molecular biology, and epidemics
seem slightly overrepresented, whereas clinical medicine and public health are slightly
underrepresented. A comparison between publications from 2020 and from before highlights further
trends. In particular, in 2020, Wikipedia editors have focused more on recent literature on
coronaviruses, thus directly related to COVID-19 and the current pandemic, and proportionally less on
literature on public health, which is also dominating 2020 publications. The traditional slight
overrepresentation of immunology and molecular biology literature persists. Detailed Kruskal–Wallis
H test statistics for significant differences (Kruskal & Wallis, 1952) and Cohen’s d for their effect
sizes (Cohen, 1988) are provided in the Appendix (Figure A6 and Tables A3–A5). While the dis-
tributions are significantly different for most general topics and periodizations, the effect sizes are
often small. The coverage of COVID-19-related literature from Wikipedia appears therefore to be
reasonably balanced from this first analysis, and to remain so in 2020. The topical differences we
found, especially around coronaviruses and the current COVID-19 outbreak, might in part be
explained by the criterion of notability, which led to the creation or expansion of Wikipedia
articles on the ongoing pandemic9.
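For readers who want to reproduce this kind of comparison, the sketch below computes Cohen's d with a pooled standard deviation and the Kruskal-Wallis H statistic from average ranks. It omits the correction for ties that packages such as SciPy apply, so with tied values its H will differ slightly from theirs. In the analysis above, the two groups would be the general topic intensities of cited and uncited articles for one general topic and period; the example values are invented.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])  # ranks of this group's members
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

Reporting both numbers mirrors the point made in the text: with large samples H can be significant even when d shows that the difference in means is small.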
A complementary way to address the same research question is to investigate Wikipedia’s cov-
erage of publication clusters. We consider here both SPECTER k-means clusters and bibliographic
network clusters. While we use all 20 SPECTER clusters, we limit ourselves to the top n network
clusters that are necessary in order to cover at least 50% of the nodes in the network. In this way,
we consider 41 clusters for the citation network, all of size above 300. In Figure 6 we plot the
percentage of articles cited in Wikipedia per cluster, and the clusters’ size in number of publica-
tions they contain. There is no apparent size effect in either of the two clustering solutions.
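The selection of network clusters and the per-cluster coverage plotted in Figure 6 can be sketched as follows: clusters are taken in decreasing order of size until they cover at least half of the nodes, and for each selected cluster we compute its size and the share of its articles cited from Wikipedia. Cluster labels and citation flags in the example are toy inputs; with real data the labels would come from the clustering and the flags from the Altmetric data.

```python
from collections import Counter


def top_clusters_covering(labels, share=0.5):
    """Largest clusters, in decreasing size, until they cover at least `share` of all nodes."""
    total = len(labels)
    selected, covered = [], 0
    for cluster, size in Counter(labels).most_common():
        if covered >= share * total:
            break
        selected.append(cluster)
        covered += size
    return selected


def cited_share_per_cluster(labels, cited, clusters):
    """Map each selected cluster to (size, fraction of its articles cited in Wikipedia)."""
    sizes = Counter(labels)
    hits = Counter(label for label, is_cited in zip(labels, cited) if is_cited)
    return {c: (sizes[c], hits[c] / sizes[c]) for c in clusters}
```

Plotting the fraction against the size for each selected cluster gives the kind of view shown in Figure 6, where a size effect would appear as a trend across clusters.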
When we characterize clusters using general topic intensities, some clear patterns emerge.
Starting with SPECTER k-means clusters, the most cited clusters are numbers 6 and 8 (main
macrotopic: molecular biology) and 5 (main macrotopics: coronaviruses and public health, particularly
focusing on COVID-19 characteristics, detection, and treatment). The least cited clusters include
number 18 (containing preprints) and 13 (focused on the social sciences, and especially
economics, such as from SSRN journals). Considering citation network clusters, the largest but not most
cited are numbers 0 (containing 2020 research on COVID-19) and 1 (with publications on
molecular biology and immunology). The other clusters are smaller and hence more specialized. The
reader can explore all clusters using the accompanying repository.
We have seen so far that Wikipedia relies on a reasonably representative sample of COVID-19-
related literature when assessed using topic models. During 2020, the main effort of editors has
9 https://en.wikipedia.org/wiki/ Wikipedia:Notability (accessed May 10, 2020).
Quantitative Science Studies
1357
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
/
e
D
tu
q
S
S
/
UN
R
T
io
C
e
–
P
D
l
F
/
/
/
/
1
4
1
3
4
9
1
8
7
0
9
8
5
q
S
S
_
UN
_
0
0
0
8
0
P
D
.
/
F
B
sì
G
tu
e
S
T
T
o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
COVID-19 research in Wikipedia
Figure 5. Average general topic intensity of COVID-19-related publications cited in Wikipedia (green) or not (blue). 95% bootstrapped confidence intervals are given. See Figure A6 and Tables A3–A5 for significance tests and effect sizes. (A) Published before 2020. Note: this plot also considers as cited in Wikipedia those publications published before 2020 and cited for the first time in 2020. (B) Published in 2020. (C) All publications.
Figure 6. Proportion of articles cited from Wikipedia (y-axis) per cluster size (number of articles in the cluster, x-axis). (A) SPECTER k-means (all). (B) Bibliographic coupling (top 41).
focused on catching up with abundant new research (and some backlog) on the ongoing pandemic and, to a lesser extent, on public health and epidemiology literature. When assessing coverage using different clustering methods, we do not find a size effect by which larger clusters are proportionally more cited from Wikipedia. Yet we also find that, in particular with citation network clusters, smaller clusters can be either highly or rarely cited from Wikipedia on average. Lastly, we find an underrepresentation of preprint and social science research. Despite this overall encouraging result, differences in coverage persist. In the next section, we further assess whether these differences can be explained away by considering article-level measures of impact.
4.2. RQ2: Predictors of Citations from Wikipedia
In this section, we address our second research question: Is Wikipedia citing COVID-19-related
research during the pandemic following the same criteria adopted before and in general? We use
regression analysis in two forms: a logistic regression to model whether a paper is cited in Wikipedia or not, and a linear regression to model the number of citations a paper receives from Wikipedia. While the former model captures the suitability of an article to provide encyclopedic evidence, the latter captures its relevance to multiple Wikipedia articles.
4.2.1. Dependent variables
Wikipedia citation counts for each article are taken from Altmetric. If this count is 1 or more, an article is considered as cited in Wikipedia. We consider citation counts from Altmetric at the time of the data collection for this study. We focus on the articles with a match from Dimensions, and consider an article to have zero citations in Wikipedia if it is not found in the Altmetric database.
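As a minimal sketch of the two dependent variables, assume a hypothetical list of Dimensions-matched DOIs and a hypothetical dictionary of Altmetric Wikipedia citation counts keyed by DOI (the actual data layout is not described here):

```python
def dependent_variables(matched_dois, wikipedia_counts):
    """Map each Dimensions-matched DOI to (n_cit_w, in_wikipedia).

    `wikipedia_counts` holds Altmetric Wikipedia citation counts; a DOI that
    is missing from it is treated as having zero Wikipedia citations.
    """
    rows = {}
    for doi in matched_dois:
        n_cit_w = wikipedia_counts.get(doi, 0)
        rows[doi] = (n_cit_w, int(n_cit_w >= 1))
    return rows


# Hypothetical DOIs: one cited three times, one listed with zero, one absent.
print(dependent_variables(["10.1/a", "10.1/b", "10.1/c"],
                          {"10.1/a": 3, "10.1/b": 0}))
```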
4.2.2. Independent variables
We focus our study on three groups of independent variables at the article level capturing impact,
topic, and timing respectively. Previous studies have shown how literature cited in Wikipedia
tends to be published in prestigious journals and available via open access (Arroyo-Machado
et al., 2020; Nielsen, 2007; Teplitskiy et al., 2017). We are interested in assessing some of these
known patterns for COVID-19-related research, complementing them by considering citation
counts and the topics discussed in the literature, and eventually understanding whether there
has been any change in 2020.
Article-level variables include citation counts from Dimensions and a variety of altmetric
indicators (Robinson-García et al., 2014), which have been found to correlate with later citation
impact of COVID-19 research (Kousha & Thelwall, 2020). Altmetrics include the number of
Mendeley readers, Twitter interactions (unique users), Facebook shares, mentions in news and
blog posts (summed due to their high correlation), mentions in policy documents, and the expert
ratio in user engagement10. We also include the top 20 publication venues by number of articles in
the corpus using dummy coding, taking as reference level a generic category “other,” which
includes articles from all other venues. It is worth clarifying that article-level variables were also
calculated at the time of the data collection for this study. This might seem counterintuitive,
especially for the classification task, as one might prefer to calculate variables at the time when
an article was first cited in Wikipedia. We argue that this is not the case, because Wikipedia can
always be edited and citations removed as easily as added. As a consequence, a citation in
Wikipedia (or its absence) is a continued rather than a discrete action, justifying calculating all
counts at the same time for all articles in the corpus.
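The expert ratio defined in footnote 10, (r + e + p)/(r + e + p + m), can be computed from the four Altmetric engagement counts as below. Returning 0 when an article has no engagement at all is an assumption made here, in line with the choice of setting missing values to zero described in Section 4.2.3.

```python
def expert_ratio(researchers, experts, practitioners, public):
    """Share of engagements not coming from members of the public.

    Implements (r + e + p) / (r + e + p + m) with r = researchers,
    e = experts, p = practitioners, and m = members of the public.
    """
    specialised = researchers + experts + practitioners
    total = specialised + public
    # Assumption: an article without any engagement gets 0, as for missing values.
    return specialised / total if total > 0 else 0.0


print(expert_ratio(2, 1, 1, 4))
```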
Topic-level variables capture the topics discussed in the articles, as well as their relative importance in terms of size (size effects). They include the macrotopic intensities for each article, the size of the SPECTER cluster an article belongs to, and the size of its bibliographic coupling network cluster. The latter is recorded for the 41 largest clusters, each with more than 300 articles, and set to zero for articles belonging to other clusters; in this way, the variable accounts for both size and thresholding effects. Cluster identities for both SPECTER and citation network clusters were also tested, but did not contribute significantly to the models. Several other measures were considered, such as the semantic centrality of an article to its cluster centroid (SPECTER k-means) and network centralities, but because these all strongly correlate with size indicators, they were discarded to avoid multicollinearity.
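The thresholded network cluster size variable can be sketched as follows, assuming a hypothetical mapping from article identifiers to bibliographic coupling cluster labels. The defaults mirror the 41 clusters with more than 300 articles mentioned above.

```python
from collections import Counter


def network_cluster_size(cluster_of, n_largest=41, min_size=300):
    """Size of each article's cluster if that cluster is among the
    `n_largest` clusters and has more than `min_size` articles; else 0."""
    sizes = Counter(cluster_of.values())
    kept = {c for c, s in sizes.most_common(n_largest) if s > min_size}
    return {article: (sizes[c] if c in kept else 0)
            for article, c in cluster_of.items()}


# Toy example with small thresholds so that the cut-off is visible.
toy = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
print(network_cluster_size(toy, n_largest=2, min_size=2))
```

Because the regression then applies ln(X + 1), articles outside the retained clusters end up with a value of ln(1) = 0.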
Lastly, we include the year of publication using dummy coding and 2020 as the reference level.
Several other variables were tested. The proposed selection removes highly correlated variables
while preserving the information required by the research question. The Pearson’s correlations for
the selected transformed variables are shown in Figure A4. More details, along with a full profiling
of variables, are provided in the accompanying repository.
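Dummy coding with a reference level, as used for the publication year (reference 2020) and for the venues (top 20 venues, with all others pooled into the reference level "other"), can be sketched in plain Python. The function names are hypothetical.

```python
from collections import Counter


def pool_venues(venues, top_k=20, other="other"):
    """Keep the `top_k` most frequent venues and pool the rest into `other`."""
    top = {v for v, _ in Counter(venues).most_common(top_k)}
    return [v if v in top else other for v in venues]


def dummy_code(values, reference, prefix):
    """One 0/1 indicator per level, leaving out the reference level."""
    levels = sorted(set(values) - {reference}, key=str)
    return {f"{prefix}_{level}": [int(v == level) for v in values]
            for level in levels}


years = [2020, 2019, 2020, 2015]
print(dummy_code(years, reference=2020, prefix="year"))
venues = ["Nature", "Nature", "BMJ", "Cell", "medRxiv", "medRxiv", "medRxiv"]
print(dummy_code(pool_venues(venues, top_k=2), reference="other", prefix="venue"))
```

Leaving out the reference level avoids perfect collinearity with the intercept, so each coefficient reads as a contrast with 2020 or with "other".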
4.2.3. Model
We consider two models: a logistic model on being cited in Wikipedia (1) or not (0) and an ordi-
nary least squares (OLS) model on citation counts from Wikipedia. Both models use the same
set of independent variables and transformations described in Table 1.
All count variables are transformed by adding one and taking the natural logarithm, while the remaining variables are either indicators or range between 0 and 1 (such as general topic intensities, beginning with a tm_ prefix; for example, tm_ph is “public health”). OLS models including a log transform and the addition of 1 for count variables, such as citation counts, have been found to perform well in practice when compared to more involved alternatives (Thelwall, 2016; Thelwall & Wilson, 2014). Furthermore, all missing values were set to zero, except for the publication year,
10 Calculated using Altmetric data, which distinguishes among the number of researchers (r), experts (e), practitioners (p), and members of the public (m) engaging with an article. The expert ratio is defined as $\frac{r + e + p}{r + e + p + m}$.
Table 1. Regression variables, their description, typology, and transformations. ln(X + 1) means one is added to the value and then the natural logarithm is taken.

Variable               Description                                                  Type              Transformation
in_wikipedia           Whether an article is cited from Wikipedia (1) or not (0)    Indicator         -
n_cit_w                Number of citations from Wikipedia                           Numeric           ln(X + 1)
publication_year       Publication year of the article                              Categorical       -
times_cited            Number of citations (Dimensions)                             Numeric           ln(X + 1)
counts_mendeley        Number of Mendeley readers (Altmetric)                       Numeric           ln(X + 1)
counts_policy          Number of mentions in policy documents (Altmetric)           Numeric           ln(X + 1)
counts_twitter_unique  Number of engagements with unique Twitter users (Altmetric)  Numeric           ln(X + 1)
counts_blogs_news      Number of mentions in news and blogs (Altmetric)             Numeric           ln(X + 1)
counts_facebook        Number of mentions in Facebook (Altmetric)                   Numeric           ln(X + 1)
expert_ratio           Ratio of engagements with experts (Altmetric)                Numeric (0 to 1)  -
top_journal            Journal                                                      Categorical       -
tm_coronaviruses       Topic intensity: Coronaviruses                               Numeric (0 to 1)  -
tm_epidemics           Topic intensity: Epidemics                                   Numeric (0 to 1)  -
tm_ph                  Topic intensity: Public health                               Numeric (0 to 1)  -
tm_mbi                 Topic intensity: Molecular biology and immunology            Numeric (0 to 1)  -
tm_clinical_medicine   Topic intensity: Clinical medicine                           Numeric (0 to 1)  -
spectre_cluster_size   Size of SPECTER cluster the article belongs to               Numeric           ln(X + 1)
network_cluster_size   Size of bib. coupling cluster the article belongs to         Numeric           ln(X + 1)
venue (journal), and general topic intensities as removing rows with missing values yielded
comparable results.
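A minimal sketch of this preprocessing, using the variable names of Table 1. It assumes each article is a dictionary of raw values with None for missing entries. It reads the rule above as follows: count variables get ln(X + 1) (via `math.log1p`); missing values become 0 except for the publication year, venue, and topic intensities; and rows that still contain a missing value are dropped before fitting, as reported for the regression tables.

```python
import math

COUNT_VARIABLES = {
    "n_cit_w", "times_cited", "counts_mendeley", "counts_policy",
    "counts_twitter_unique", "counts_blogs_news", "counts_facebook",
    "spectre_cluster_size", "network_cluster_size",
}
# Variables whose missing values are not replaced by zero.
KEEP_MISSING = {
    "publication_year", "top_journal", "tm_coronaviruses", "tm_epidemics",
    "tm_ph", "tm_mbi", "tm_clinical_medicine",
}


def prepare_row(row):
    """Fill missing values with 0 where allowed and log-transform counts."""
    out = {}
    for name, value in row.items():
        if value is None and name not in KEEP_MISSING:
            value = 0
        if name in COUNT_VARIABLES and value is not None:
            value = math.log1p(value)  # ln(X + 1)
        out[name] = value
    return out


def complete_rows(rows):
    """Drop rows that still contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


raw = [
    {"times_cited": 9, "counts_policy": None, "tm_ph": 0.2, "expert_ratio": 0.5},
    {"times_cited": 0, "counts_policy": 1, "tm_ph": None, "expert_ratio": 0.0},
]
print(complete_rows([prepare_row(r) for r in raw]))
```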
4.2.4. Discussion
We discuss results for three models: two logistic regression models, one on articles published and first cited up to and including 2020 and one on articles published and first cited up to and including 2019, and an OLS model. The 2019 model only considers articles published in 2019 or earlier and first cited from Wikipedia in 2019 or earlier, or articles never cited from Wikipedia; it discards articles published in 2020 or first cited from Wikipedia in 2020, irrespective of their publication year. The OLS model predicts (the log of) citation counts using all data up to and including 2020. We do not discuss a 2019 OLS model because it would require Wikipedia citation counts calculated at the end of 2019, which were not available to us. Regression tables for these three models are provided in the Appendix, while Figure A7 shows the distribution of some variables, distinguishing between articles cited in Wikipedia or not. Logistic regression tables report marginal effects, while the OLS table reports the usual coefficients. The actual number of data points used to fit each model, after removing those that contained any null value, is given in the regression tables.
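The sample restriction of the 2019 logistic model can be written as a simple filter. This is a sketch under the assumption that, for each article, the publication year and the year of its first citation from Wikipedia (None if never cited) are known; the function names are hypothetical.

```python
def in_2019_sample(pub_year, first_cited_year):
    """True if the article is kept in the 2019 logistic model.

    Kept: published in 2019 or earlier and either never cited from Wikipedia
    (first_cited_year is None) or first cited in 2019 or earlier.
    """
    if pub_year is None or pub_year > 2019:
        return False
    return first_cited_year is None or first_cited_year <= 2019


def label_2019(first_cited_year):
    """Dependent variable of the 2019 model for an article in the sample."""
    return int(first_cited_year is not None)


articles = [(2018, None), (2018, 2019), (2018, 2020), (2020, None), (2015, 2012)]
sample = [a for a in articles if in_2019_sample(*a)]
print(sample, [label_2019(f) for _, f in sample])
```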
Considering the logistic models first, we observe some significant effects11. First of all, the year of publication is mostly negatively correlated with being cited from Wikipedia, compared with the reference category 2020. This seems largely due to publication size effects, as the fraction of 2020 articles cited from Wikipedia is quite low (see Figure 4). The 2019 model indeed shows positive correlations for all years when compared to the reference category 2019, and 2019 is the year with the lowest coverage since 2000. Secondly, some of the most popular venues are positively correlated with citations in Wikipedia when compared to an “other” category (which includes all venues except the top 20). In the 2020 model, these venues include mega journals (Nature, Science) and specialized journals (The Lancet, BMJ). Negative correlations occur for preprint servers (medRxiv and bioRxiv in particular).
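To make the interpretation of marginal effects in footnote 11 concrete, the sketch below computes them for a logistic model with hypothetical coefficients: the mean discrete change in predicted probability when a binary variable switches from 0 to 1, and the mean instantaneous rate of change, β·p(1 − p), for a continuous variable. Averaging over observations (average marginal effects) is an assumption here; marginal effects can also be evaluated at the mean of the covariates.

```python
import math


def predict_proba(coefs, intercept, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    z = intercept + sum(coefs[k] * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def ame_binary(coefs, intercept, rows, var):
    """Average discrete change in P(y = 1) when `var` goes from 0 to 1."""
    changes = [
        predict_proba(coefs, intercept, {**x, var: 1})
        - predict_proba(coefs, intercept, {**x, var: 0})
        for x in rows
    ]
    return sum(changes) / len(changes)


def ame_continuous(coefs, intercept, rows, var):
    """Average instantaneous rate of change dP/dx_var = b_var * p * (1 - p)."""
    slopes = []
    for x in rows:
        p = predict_proba(coefs, intercept, x)
        slopes.append(coefs[var] * p * (1.0 - p))
    return sum(slopes) / len(slopes)


# Hypothetical coefficients and a single observation.
coefs = {"top_venue": math.log(3), "log_cites": 2.0}
rows = [{"top_venue": 0, "log_cites": 0.0}]
print(ame_binary(coefs, 0.0, rows, "top_venue"))      # about 0.25
print(ame_continuous(coefs, 0.0, rows, "log_cites"))  # 0.5
```

With these numbers, switching the binary variable from 0 to 1 raises the predicted probability from 0.5 to 0.75, a marginal effect of about 0.25, that is, 25 percentage points.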
When we consider indicators of impact, we see a significant positive effect for citation counts, Mendeley readers, Twitter, and news and blog mentions; we see instead no effect for policy document mentions and Facebook engagements. This is consistent in the 2019 model, except for Facebook having a positive effect and Twitter a lack of correlation. This result, on the one hand, highlights the importance of academic indicators of impact such as citations, and on the other hand suggests the possible complementarity of altmetrics in this respect. As certain altmetrics can accumulate more rapidly than citations (Fang & Costas, 2020), they could complement them effectively when needed (Kousha & Thelwall, 2020). Furthermore, the expert ratio in altmetrics engagement is negatively correlated with being cited from Wikipedia in 2020. This might be due to the high altmetrics engagement with COVID-19 research in 2020, but it could also hint at the possibility that social media impact need not be driven by experts in order to be correlated with scientific impact. We further see that cluster size effects are either uncorrelated or only very marginally correlated with being cited in Wikipedia.
Lastly, general topic intensities are never correlated with being cited in Wikipedia in either model, underlining that Wikipedia appears to be proportionally representing all COVID-19-related research and that residual topical differences in coverage are due to article-level effects.
The 2020 OLS model largely confirms these results, except that mentions in policy documents and Facebook engagements become positively correlated with the number of citations from Wikipedia. It is important to underline that, for all these results, there is no attempt to establish causality. For example, the positive correlation between the number of Wikipedia articles citing a scientific article and the number of policy documents mentioning it might be due to policy document editors using Wikipedia, Wikipedia editors using policy documents, both, or neither. The fact is, more simply, that some articles are picked up by both.
5. CONCLUSION
The results of this study provide some reassuring evidence. It appears that Wikipedia’s editors are
well able to keep track of COVID-19-related research. Of 141,783 articles in our corpus, 3,083
(∼2%) are cited in Wikipedia: a share comparable to what was found in previous studies.
Wikipedia editors are relying on scientific results representative of the several topics included
11 Marginal effect coefficients should be interpreted as follows. For binary discrete variables (0/1), they represent the discrete rate of change in the probability of the outcome, everything else kept fixed; therefore, a change from 0 to 1 with a significant coefficient of 0.01 entails an increase of 1 percentage point in the probability of the outcome. For categorical variables with more than two outcomes, they represent the difference in the predicted probabilities of any one category relative to the reference category. For continuous variables, they represent the instantaneous rate of change. It might be the case that this can also be interpreted linearly (e.g., a significant change of 1 in the variable entails a change proportional to the marginal effect coefficient in the probability of the outcome). Yet, this rests on the assumption that the relationship between independent and dependent variables is linear, irrespective of the orders of magnitude under consideration. This might not be the case in practice.
in a large corpus of COVID-19-related research. They have been effectively able to cope with
new, rapidly growing literature. The minor discrepancies in coverage that persist, with slightly
more Wikipedia-cited articles on topics such as molecular biology and immunology and slightly
fewer on clinical medicine and public health, are fully explained away by article-level effects.
Wikipedia editors rely on impactful and visible research, as evidenced by largely positive cita-
tion and altmetrics correlations. Importantly, Wikipedia editors also appear to be following the
same inclusion standards in 2020 as before: generally, they rely on specialized and highly cited results from reputable journals, avoiding, for example, preprints.
The main limitation of this study is that it is purely observational, and thus does not explain why
some articles are cited in Wikipedia or not. While in order to assess the coverage of COVID-19-
related research from Wikipedia this is of secondary importance, it remains relevant when attempting
to predict and explain it. A second limitation is that this study is based on citations from Wikipedia to
scientific publications, and no Wikipedia content analysis is performed. Citations of scientific litera-
ture, while informative, do not completely address the interrelated questions of Wikipedia’s knowl-
edge representativeness and reliability. Therefore, some directions for future work include comparing
Wikipedia coverage with expert COVID-19 review articles, as well as studying Wikipedia edit and
discussion history in order to assess editor motivations. Another interesting direction for future work is
the assessment of all Wikipedia citations of any source from COVID-19 Wikipedia pages, because
here we only focused on the fraction directed at COVID-19-related scientific articles. Lastly, future
work can address the engagement of Wikipedia users with cited COVID-19-related sources.
Wikipedia is a fundamental source of free knowledge, open to all. The capacity of its editor
community to respond quickly to a crisis and provide high-quality content is, therefore, critical.
Our results here are encouraging in this respect.
COMPETING INTERESTS
The author has no competing interests.
FUNDING INFORMATION
This research was not funded.
ACKNOWLEDGMENTS
Digital Science kindly provided access to Altmetric and Dimensions data.
DATA AND CODE AVAILABILITY
All the analyses can be replicated using code and following the instructions given in the accompanying repository: https://github.com/Giovanni1085/covid-19_wikipedia. The preparation of the data follows the steps detailed in this repository instead: https://github.com/CWTSLeiden/cwts_covid (Colavizza et al., 2020). Analyses based on Altmetric and Dimensions data require access to these services.
REFERENCES
Adams, C. E., Montgomery, A. A., Aburrow, T., Bloomfield, S., Briley, P. M., … Xia, J. (2020). Adding evidence of the effects of treatments into relevant Wikipedia pages: A randomised trial. BMJ Open, 10(2), e033655. DOI: https://doi.org/10.1136/bmjopen-2019-033655, PMID: 32086355, PMCID: PMC7045027
Arroyo-Machado, W., Torres-Salinas, D., Herrera-Viedma, E., & Romero-Frías, E. (2020). Science through Wikipedia: A novel representation of open knowledge through co-citation networks. PLOS ONE, 15(2), e0228713. DOI: https://doi.org/10.1371/journal.pone.0228713, PMID: 32040488, PMCID: PMC7010282
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84. DOI: https://doi.org/10.1145/2133806.2133826
Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. Annals of Applied Statistics, 1(1), 17–35. DOI: https://doi.org/10.1214/07-AOAS114
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Brezar, A., & Heilman, J. (2019). Readability of English Wikipedia’s health information over time. WikiJournal of Medicine, 6(1), 7. DOI: https://doi.org/10.15347/wjm/2019.007
Chen, C.-C., & Roth, C. (2012). {{Citation needed}}: The dynamics of referencing in Wikipedia. In Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration. New York, NY: ACM Press. DOI: https://doi.org/10.1145/2462932.2462943
Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., … Scala, A. (2020). The COVID-19 social media infodemic. Scientific Reports, 10, 16598. DOI: https://doi.org/10.1038/s41598-020-73510-5, PMID: 33024152, PMCID: PMC7538912
Cohan, A., Feldman, S., Beltagy, I., Downey, D., & Weld, D. S. (2020). SPECTER: Document-level representation learning using citation-informed transformers. arXiv, 2004.07180. http://arxiv.org/abs/2004.07180. DOI: https://doi.org/10.18653/v1/2020.acl-main.207
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. London: Routledge. DOI: https://doi.org/10.1002/bs.3830330104
Colavizza, G., Costas, R., Traag, V. A., van Eck, N. J., van Leeuwen, T., & Waltman, L. (2020). A scientometric overview of CORD-19. bioRxiv, 2020.04.20.046144. DOI: https://doi.org/10.1101/2020.04.20.046144
Dimensions. (2020). Dimensions COVID-19 publications, datasets and clinical trials. https://dimensions.figshare.com/articles/Dimensions_COVID-19_publications_datasets_and_clinical_trials/11961063
European Commission. (2020). Fighting disinformation: EU actions to tackle COVID-19 disinformation. https://www.consilium.europa.eu/en/policies/coronavirus/fighting-disinformation/
Fang, Z., & Costas, R. (2020). Studying the accumulation velocity of altmetric data tracked by Altmetric.com. Scientometrics, 123, 1077–1101. DOI: https://doi.org/10.1007/s11192-020-03405-9
Forte, A., Andalibi, N., Gorichanaz, T., Kim, M. C., Park, T., & Halfaker, A. (2018). Information fortification: An online citation behavior. In Proceedings of the 2018 ACM Conference on Supporting Groupwork – GROUP ’18 (pp. 83–92). New York, NY: ACM Press. DOI: https://doi.org/10.1145/3148330.3148347
Geiger, R. S., & Halfaker, A. (2013). When the levee breaks: Without bots, what happens to Wikipedia’s quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (pp. 1–6). New York, NY: ACM Press. DOI: https://doi.org/10.1145/2491055.2491061
Halfaker, A., Mansurov, B., Redi, M., & Taraborelli, D. (2018). Citations with identifiers in Wikipedia. https://figshare.com/articles/Citations_with_identifiers_in_Wikipedia/1299540/1
Heilman, J. M., Kemmann, E., Bonert, M., Chatterjee, A., Ragar, B., … Laurent, M. R. (2011). Wikipedia: A key tool for global public health promotion. Journal of Medical Internet Research, 13(1), e14. DOI: https://doi.org/10.2196/jmir.1589, PMID: 21282098, PMCID: PMC3221335
Herzog, C., Hook, D., & Konkiel, S. (2020). Dimensions: Bringing down barriers between scientometricians and data. Quantitative Science Studies, 1(1), 387–395. DOI: https://doi.org/10.1162/qss_a_00020
Ioannidis, J. P. (2020). Coronavirus disease 2019: The harms of exaggerated information and non-evidence-based measures. European Journal of Clinical Investigation, 50(4), e13222. DOI: https://doi.org/10.1111/eci.13222, PMID: 32191341, PMCID: PMC7163529
Jung, C., Geng, S., Cha, M., Hong, I., & Sáez-Trumper, D. (2020). Open data and COVID-19: Wikipedia as an informational resource during the pandemic. https://medium.com/@diegosaeztrumper/open-data-and-covid-19-wikipedia-as-an-informational-resource-during-the-pandemic-dcca6a
Keegan, B., Gergle, D., & Contractor, N. (2011). Hot off the wiki: Dynamics, practices, and structures in Wikipedia’s coverage of the Tōhoku catastrophes. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym ’11. New York, NY: ACM Press. DOI: https://doi.org/10.1145/2038558.2038577
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14(1), 10–25. DOI: https://doi.org/10.1002/asi.5090140103
Kousha, K., & Thelwall, M. (2017). Are Wikipedia citations important evidence of the impact of scholarly articles and books? Journal of the Association for Information Science and Technology, 68(3), 762–779. DOI: https://doi.org/10.1002/asi.23694
Kousha, K., & Thelwall, M. (2020). COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts. Quantitative Science Studies, 1(3), 1068–1091. DOI: https://doi.org/10.1162/qss_a_00066
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621. DOI: https://doi.org/10.1080/01621459.1952.10483441
Kumar, S., West, R., & Leskovec, J. (2016). Disinformation on the Web: Impact, characteristics, and detection of Wikipedia hoaxes. In Proceedings of the 25th International Conference on World Wide Web (pp. 591–602). New York, NY: ACM Press. DOI: https://doi.org/10.1145/2872427.2883085
Laurent, M. R., & Vickers, T. J. (2009). Seeking health information online: Does Wikipedia matter? Journal of the American Medical Informatics Association, 16(4), 471–479. DOI: https://doi.org/10.1197/jamia.M3059, PMID: 19390105, PMCID: PMC2705249
Lemmerich, F., Sáez-Trumper, D., West, R., & Zia, L. (2019). Why the world reads Wikipedia: Beyond English speakers. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (pp. 618–626). New York, NY: ACM Press. DOI: https://doi.org/10.1145/3289600.3291021
Lewoniewski, W., Węcel, K., & Abramowicz, W. (2017). Analysis of references across Wikipedia languages. In R. Damaševičius & V. Mikašytė (Eds.), Information and Software Technologies, Vol. 756, pp. 561–573. Cham: Springer. DOI: https://doi.org/10.1007/978-3-319-67642-5_47
Leydesdorff, L., & Nerghes, A. (2017). Co-word maps and topic modeling: A comparison using small and medium-sized corpora (N < 1,000). Journal of the Association for Information Science and Technology, 68(4), 1024–1035. DOI: https://doi.org/10.1002/asi.23740
Maggio, L. A., Willinsky, J. M., Steinberg, R. M., Mietchen, D., Wass, J. L., & Dong, T. (2019). Wikipedia as a gateway to biomedical research: The relative distribution and use of citations in the English Wikipedia. PLOS ONE, 12(12), e0190046. DOI: https://doi.org/10.1371/journal.pone.0190046, PMID: 29267345, PMCID: PMC5739466
Maggio, L. A., Steinberg, R. M., Piccardi, T., & Willinsky, J. M. (2020). Reader engagement with medical content on Wikipedia. eLife, 9, e52426. DOI: https://doi.org/10.7554/eLife.52426, PMID: 32142406, PMCID: PMC7089765
Martín-Martín, A., Thelwall, M., & López-Cózar, E. D. (2020).
Google Scholar, Microsoft Academic, Scopus, Dimensions,
Web of Science, and OpenCitations’ COCI: A multidisciplinary
comparison of coverage via citations. arXiv, 2004.14329. https://
arxiv.org/abs/2004.14329. DOI: https://doi.org/10.1007/s11192
-020-03792-z, PMID: 32981987, PMCID: PMC7505221
Mesgari, M., Okoli, C., Mehdi, M., Nielsen, F. A., & Lanamäki, A.
(2015). “The sum of all human knowledge”: A systematic review
of scholarly research on the content of Wikipedia. Journal of
the Association for Information Science and Technology, 66(2),
219–245. DOI: https://doi.org/10.1002/asi.23172
Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A.
(2011). Optimizing semantic coherence in topic models. In
Proceedings of the 2011 Conference on Empirical Methods in
Natural Language Processing (pp. 262–272). New York, NY: ACM.
Neumann, M., King, D., Beltagy, I., & Ammar, W. (2019).
ScispaCy: Fast and robust models for biomedical natural language
processing. arXiv, 1902.07669. https://arxiv.org/pdf/1902.07669.
DOI: https://doi.org/10.18653/v1/W19-5034
Nielsen, F. A. (2007). Scientific citations in Wikipedia. First Monday,
12(8). https://firstmonday.org/ojs/index.php/fm/article/view
/1997/1872. DOI: https://doi.org/10.5210/fm.v12i8.1997
Nielsen, F. A., Mietchen, D., & Willighagen, E. (2017). Scholia,
Scientometrics and Wikidata. In E. Blomqvist, K. Hose, H. Paulheim,
A. Ławrynowicz, F. Ciravegna, & O. Hartig (Eds.) The Semantic Web:
ESWC 2017 Satellite Events, Vol. 10577, pp. 237–259. Cham:
Springer. DOI: https://doi.org/10.1007/978-3-319-70407-4_36
Ortega, J. L. (2018). Reliability and accuracy of altmetric providers:
A comparison among Altmetric.com, PlumX and Crossref Event
Data. Scientometrics, 116(3), 2123–2138. DOI: https://doi.org
/10.1007/s11192-018-2838-z
Paakkari, L., & Okan, O. (2020). COVID-19: Health literacy is an un-
derestimated problem. The Lancet Public Health, 5(5), e249–e250.
DOI: https://doi.org/10.1016/S2468-2667(20)30086-4
Perianes-Rodriguez, A., Waltman, L., & van Eck, N. J. (2016).
Constructing bibliometric networks: A comparison between full
and fractional counting. Journal of Informetrics, 10(4), 1178–1195.
DOI: https://doi.org/10.1016/j.joi.2016.10.006
Piccardi, T., Redi, M., Colavizza, G., & West, R. (2020). Quantifying
engagement with citations on Wikipedia. In Proceedings of The
Web Conference 2020 (pp. 2365–2376). New York, NY: ACM.
DOI: https://doi.org/10.1145/3366423.3380300
Piscopo, A., & Simperl, E. (2019). What we talk about when we
talk about Wiki-data quality: A literature survey. In Proceedings
of the 15th International Symposium on Open Collaboration.
New York, NY: ACM Press. DOI: https://doi.org/10.1145/3306446
.3340822
Priedhorsky, R., Chen, J., Lam, S. T. K., Panciera, K., Terveen, L., &
Riedl, J. (2007). Creating, destroying, and restoring value in
Wikipedia. In Proceedings of the 2007 international ACM conference
on Conference on Supporting Group Work. New York, NY: ACM
Press. DOI: https://doi.org/10.1145/1316624.1316663
Priem, J., Piwowar, H. A., & Hemminger, B. M. (2012). Altmetrics
in the wild: Using social media to explore scholarly impact. arXiv,
1203.4745v1. https://arxiv.org/abs/1203.4745v1
R(cid:1)hu˚ r(cid:1)ek, R., & Sojka, P. (2010). Software framework for topic modelling
with large corpora. In Proceedings of the LREC 2010 Workshop on
New Challenges for NLP Frameworks (pp. 45–50). http://is.muni
.cz/publication/884893/en
Robinson-García, N., Torres-Salinas, D., Zahedi, Z., & Costas, R.
(2014). New data, new possibilities: Exploring the insides of
Altmetric.com. El Profesional de la Informacion, 23(4), 359–366.
DOI: https://doi.org/10.3145/epi.2014.jul.03
Shafee, T., Masukume, G., Kipersztok, L., Das, D., Häggström, M.,
& Heilman, J. (2017). Evolution of Wikipedia’s medical content:
Past, present and future. Journal of Epidemiology and Community
Health, 71, 1122–1129. DOI: https://doi.org/10.1136/jech-2016
-208601, PMID: 28847845, PMCID: PMC5847101
Shuai, X., Jiang, Z., Liu, X., & Bollen, J. (2013). A comparative study
of academic and Wikipedia ranking. In Proceedings of the 13th
ACM/IEEE-CS Joint Conference on Digital Libraries – JCDL ’13.
New York, NY: ACM Press. DOI: https://doi.org/10.1145/2467696
.2467746
Singer, P., Lemmerich, F., West, R., Zia, L., Wulczyn, E., Strohmaier,
M., & Leskovec, J. (2017). Why we read Wikipedia. In
Proceedings of the 26th International Conference on World
Wide Web (pp. 1591–1600). New York, NY: ACM Press. DOI:
https://doi.org/10.1145/3038912.3052716
Smith, D. A. (2020). Situating Wikipedia as a health information
resource in various contexts: A scoping review. PLOS ONE, 15(2),
e0228786. DOI: https://doi.org/10.1371/journal.pone.0228786,
PMID: 32069322, PMCID: PMC7028268
Sugimoto, C. R., Work, S., Larivière, V., & Haustein, S. (2017).
Scholarly use of social media and altmetrics: A review of the liter-
ature. Journal of the Association for Information Science and
Technology, 68(9), 2037–2062. DOI: https://doi.org/10.1002
/asi.23833
Swire-Thompson, B., & Lazer, D.
(2020). Public health and
online misinformation: Challenges and recommendations.
Annual Review of Public Health, 41(1), 433–451. DOI: https://
doi.org/10.1146/annurev-publhealth-040119-094127, PMID:
31874069
Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006).
Hierarchical Dirichlet processes. Journal of the American
Statistical Association, 101(476), 1566–1581. DOI: https://doi.org/10.1198/016214506000000302
Teplitskiy, M., Lu, G., & Duede, E. (2017). Amplifying the impact of
open access: Wikipedia and the diffusion of science. Journal of
the Association for Information Science and Technology, 68(9),
2116–2127. DOI: https://doi.org/10.1002/asi.23687
Thelwall, M. (2016). The discretised lognormal and hooked power
law distributions for complete citation data: Best options for modelling
and regression. Journal of Informetrics, 10(2), 336–346. DOI:
https://doi.org/10.1016/j.joi.2015.12.007
Thelwall, M., & Wilson, P. (2014). Regression for citation data:
An evaluation of different methods. Journal of Informetrics, 8(4),
963–971. DOI: https://doi.org/10.1016/j.joi.2014.09.011
Torres-Salinas, D., Romero-Frías, E., & Arroyo-Machado, W.
(2019). Mapping the backbone of the Humanities through the eyes
of Wikipedia. Journal of Informetrics, 13(3), 793–803. DOI: https://doi.org/10.1016/j.joi.2019.07.002
Traag, V. A., Van Dooren, P., & Nesterov, Y. (2011). Narrow scope
for resolution-limit-free community detection. Physical Review E,
84(1), 016114. DOI: https://doi.org/10.1103/PhysRevE.84.016114,
PMID: 21867264
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to
Leiden: Guaranteeing well-connected communities. Scientific
Reports, 9(1), 5233. DOI: https://doi.org/10.1038/s41598-019-41695-z, PMID: 30914743, PMCID: PMC6435756
Wang, L. L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Eide,
D., … Kohlmeier, S. (2020). CORD-19: The Covid-19 Open
Research Dataset. arXiv, 2004.10706. http://arxiv.org/abs/2004.10706
Wikimedia Foundation. (2020). Responding to COVID-19. How we
can help in this time of uncertainty. https://wikimediafoundation.org/covid19
Quantitative Science Studies
1365
COVID-19 research in Wikipedia
World Health Organization. (2020a). EPI-WIN: WHO information network for epidemics. https://www.who.int/teams/risk-communication
World Health Organization. (2020b). WHO COVID-19 Database.
https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov
Xie, B., He, D., Mercer, T., Wang, Y., Wu, D., … Lee, M. K. (2020).
Global health crises are also information crises: A call to action.
Journal of the Association for Information Science and Technology,
1–5. DOI: https://doi.org/10.1002/asi.24357, PMID: 32427189,
PMCID: PMC7228248
Yau, C.-K., Porter, A., Newman, N., & Suominen, A. (2014). Clustering
scientific documents with topic modeling. Scientometrics, 100(3),
767–786. DOI: https://doi.org/10.1007/s11192-014-1321-8
Zahedi, Z., Costas, R., & Wouters, P. (2014). How well developed
are altmetrics? A cross-disciplinary analysis of the presence of
‘alternative metrics’ in scientific publications. Scientometrics,
101(2), 1491–1513. DOI: https://doi.org/10.1007/s11192-014-1264-0
Zarocostas, J. (2020). How to fight an infodemic. Lancet, 395(10225).
DOI: https://doi.org/10.1016/S0140-6736(20)30461-X
APPENDIX
A.1. TOPICS
Refer to Figures A2 and A3 for topic intensities over time. See Figure A5 for the topic clustering.
The topic label is given next to the topic number, for reference.
• Topic #0, Public health: “method”, “system”, “use”, “drug”, “application”, “approach”,
“image”, “design”, “test”, “develop”, “technology”, “provide”, “technique”, “new”,
“tool”, “potential”, “base”, “device”, “allow”, “result”.
• Topic #1, Public health: “health”, “pandemic”, “covid-19”, “COVID-19”, “public”,
“country”, “outbreak”, “social”, “care”, “covid-19_pandemic”, “measure”, “policy”,
“people”, “public_health”, “Health”, “impact”, “response”, “risk”, “medical”, “need”.
• Topic #2, Molecular biology and immunology: “cell”, “infection”, “response”, “mouse”,
“immune”, “expression”, “lung”, “induce”, “disease”, “cat”, “role”, “tissue”, “system”,
“increase”, “level”, “receptor”, “study”, “gene”, “cytokine”, “human”.
• Topic #3, Clinical medicine: “group”, “patient”, “day”, “study”, “year”, “result”, “rate”,
“age”, “method”, “compare”, “conclusion”, “total”, “time”, “period”, “mean”,
“respectively”, “high”, “month”, “significantly”.
• Topic #4, Molecular biology and immunology: “protein”, “virus”, “cell”, “rna”, “viral”,
“coronavirus”, “activity”, “replication”, “gene”, “antiviral”, “study”, “human”,
“membrane”, “domain”, “binding”, “structure”, “sequence”, “target”, “infection”, “inhibitor”.
• Topic #5, Coronaviruses: “respiratory”, “infection”, “acute”, “virus”, “syndrome”,
“SARS”, “severe”, “respiratory_syndrome”, “severe_acute”, “influenza”, “child”, “case”,
“patient”, “viral”, “acute respiratory syndrome”, “cause”, “coronavirus”, “clinical”,
“sars”, “pneumonia”.
• Topic #6, Molecular biology and immunology: “virus”, “antibody”, “strain”, “sample”,
“detect”, “sequence”, “assay”, “isolate”, “coronavirus”, “detection”, “test”, “gene”, “calf”,
“result”, “serum”, “positive”, “analysis”, “study”, “bovine”, “ibv”.
• Topic #7, Clinical medicine: “patient”, “surgery”, “laparoscopic”, “surgical”, “procedure”,
“cancer”, “complication”, “perform”, “technique”, “undergo”, “postoperative”, “case”,
“tumor”, “result”, “method”, “repair”, “time”, “patient undergo”, “resection”, “hernia”.
• Topic #8, Coronaviruses: “covid-19”, “COVID-19”, “sars-cov-2”, “coronavirus”, “case”,
“disease”, “patient”, “2019”, “2020”, “infection”, “severe”, “clinical”, “China”, “novel”,
“confirm”, “coronavirus_disease”, “report”, “symptom”, “novel_coronavirus”, “Wuhan”.
• Topic #9, Epidemics: “model”, “datum”, “number”, “analysis”, “epidemic”, “case”,
“time”, “network”, “study”, “different”, “result”, “rate”, “dynamic”, “base”, “paper”,
“estimate”, “propose”, “population”, “spread”, “individual”.
• Topic #10, Public health: “study”, “review”, “trial”, “include”, “clinical”, “treatment”,
“search”, “evidence”, “literature”, “result”, “datum”, “intervention”, “quality”, “report”,
“systematic”, “use”, “outcome”, “method”, “research”, “article”.
• Topic #11, Epidemics: “disease”, “vaccine”, “infectious”, “human”, “review”, “virus”,
“new”, “infectious_disease”, “emerge”, “development”, “animal”, “infection”, “pathogen”,
“recent”, “potential”, “cause”, “vaccination”, “infectious diseases”, “outbreak”, “include”.
• Topic #12, Epidemics: “risk”, “factor”, “associate”, “associated with”, “mortality”,
“high”, “analysis”, “increase”, “study”, “95_ci”, “risk_factor”, “death”, “age”, “patient”,
“rate”, “ratio”, “outcome”, “regression”.
• Topic #13, Clinical medicine: “effect”, “increase”, “group”, “study”, “level”,
“concentration”, “control”, “blood”, “change”, “pressure”, “result”, “high”, “low”, “decrease”,
“compare”, “measure”, “temperature”, “significantly”, “weight”, “reduce”.
• Topic #14, Clinical medicine: “patient”, “treatment”, “clinical”, “acute”, “lung”,
“therapy”, “chest”, “aneurysm”, “outcome”, “treat”, “ventilation”, “care”, “case”,
“artery”, “stroke”, “failure”, “lesion”, “pulmonary”, “diagnosis”.
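The fifteen topics above carry five general labels, and the general-topic analyses (Tables A3 to A5, and presumably the regression variables tm_coronaviruses, tm_epidemics, tm_ph, tm_mbi and tm_clinical_medicine) work at that coarser level. A minimal Python sketch of the roll-up follows. It assumes that a general topic's intensity is the sum of its member topics' intensities, which the appendix does not state; the mapping is copied from the list above, and the function name is hypothetical.

```python
# General-topic label of each topic, copied from the list in A.1.
TOPIC_TO_GENERAL = {
    0: "Public health", 1: "Public health", 10: "Public health",
    2: "Molecular biology and immunology",
    4: "Molecular biology and immunology",
    6: "Molecular biology and immunology",
    3: "Clinical medicine", 7: "Clinical medicine",
    13: "Clinical medicine", 14: "Clinical medicine",
    5: "Coronaviruses", 8: "Coronaviruses",
    9: "Epidemics", 11: "Epidemics", 12: "Epidemics",
}


def general_topic_intensities(doc_topics):
    """Roll a document's 15 topic intensities up into the 5 general topics.

    Assumption (not stated in the appendix): a general topic's intensity
    is the sum of the intensities of its member topics.
    """
    if len(doc_topics) != len(TOPIC_TO_GENERAL):
        raise ValueError("expected one intensity per topic (15 values)")
    totals = {label: 0.0 for label in set(TOPIC_TO_GENERAL.values())}
    for topic, weight in enumerate(doc_topics):
        totals[TOPIC_TO_GENERAL[topic]] += weight
    return totals


# Usage: a document spread evenly over all 15 topics; each general topic
# receives (number of member topics) / 15.
example = general_topic_intensities([1 / 15] * 15)
```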
A.2. TABLES
Table A1. Top 20 citing Wikipedia articles

# citations | Wikipedia id | Wikipedia article title | Lang
62 | 201983 | Coronavirus | en
59 | 62786585 | Severe acute respiratory syndrome coronavirus 2 | en
53 | 62750956 | 2019–20 Wuhan coronavirus outbreak | en
49 | 63030231 | Coronavirus disease 2019 | en
39 | 63676463 | 2019–20 coronavirus pandemic | en
36 | 63430824 | COVID-19 drug repurposing research | en
33 | 63895130 | Paediatric multisystem inflammatory syndrome | en
32 | 211547 | Severe acute respiratory syndrome-related coro… | en
31 | 63319438 | COVID-19 vaccine | en
30 | 63435931 | COVID-19 drug development | en
34 | 39532251 | Middle East respiratory syndrome | en
28 | 19572217 | Influenza | en
25 | 63204759 | COVID-19 testing | en
23 | 2717089 | Angiotensin-converting enzyme 2 | en
24 | 22693252 | Feline coronavirus | en
22 | 64144585 | Management of COVID-19 | en
22 | 4354646 | Emergent virus | en
21 | 10849236 | Antibody-dependent enhancement | en
20 | 196741 | Severe acute respiratory syndrome | en
20 | 64144627 | Prognosis of COVID-19 | en
Table A2. Top 20 cited journal articles. The first column gives the number of distinct citing Wikipedia articles, while the last column gives the number of citations to these articles from the scientific literature (data from Dimensions).

# citations | DOI | Title | Publication year | Journal | Times cited
67 | 10.3390/info11050263 | Modeling Popularity and Reliability of Sources … | 2020 | Information | 1
18 | 10.1007/s00705-012-1299-6 | Ratification vote on taxonomic proposals to … | 2012 | Archives of Virology | 210
15 | 10.1007/978-1-4939-2438-7_1 | Coronaviruses: An Overview of Their Replication … | 2015 | Methods in Molecular Biology | 395
15 | 10.3390/v2081803 | Coronavirus Genomics and Bioinformatics Analysis | 2010 | Viruses | 143
13 | 10.1016/s0140-6736(20)30183-5 | Clinical features of patients infected with 20 … | 2020 | The Lancet | 5508
12 | 10.3390/su12104295 | An Integrated Planning Framework for Sustainability … | 2020 | Sustainability | 1
12 | 10.1002/med.20081 | The regulation of HIV-1 transcription … | 2006 | Medicinal Research Reviews | 93
11 | 10.1056/nejmoa2001316 | Early Transmission Dynamics in Wuhan, China, … | 2020 | New England Journal of Medicine | 2399
11 | 10.3390/v11020174 | Global Epidemiology of Bat Coronaviruses | 2019 | Viruses | 39
11 | 10.1056/nejmoa2001191 | First Case of 2019 Novel Coronavirus … | 2020 | New England Journal of Medicine | 1127
10 | 10.1083/jcb.148.5.931 | Pex19 Binds Multiple Peroxisomal Membrane Proteins, Is Predominantly Cytoplasmic … | 2000 | Journal of Cell Biology | NaN
10 | 10.1038/s41586-020-2012-7 | A pneumonia outbreak associated with a new coronavirus … | 2020 | Nature | 2115
9 | 10.1016/s2215-0366(19)30401-8 | Cannabinoids for the treatment of mental disorders … | 2019 | The Lancet Psychiatry | 37
9 | 10.1128/jvi.06540-11 | Discovery of seven novel Mammalian and avian … | 2012 | Journal of Virology | 453
9 | 10.1038/s41591-020-0820-9 | The proximal origin of SARS-CoV-2 | 2020 | Nature Medicine | 411
8 | 10.1001/jama.2016.17324 | Prevalence of Depression, Depressive Symptoms, … | 2016 | JAMA | 398
8 | 10.1016/s0140-6736(20)31180-6 | Hydroxychloroquine or chloroquine with or with … | 2020 | The Lancet | 84
8 | 10.1016/j.pnpbp.2006.01.008 | Human brain evolution and the “Neuroevolutionary … | 2006 | Progress in Neuro-Psychopharmacology & Biological Psychiatry | 63
8 | 10.1016/s0140-6736(20)30567-5 | How will country-based mitigation measures … | 2020 | The Lancet | 303
8 | 10.1086/511159 | Infectious Diseases Society of America/America … | 2007 | Clinical Infectious Diseases | 3967
Table A3. Test statistics for general topic intensities of articles cited in Wikipedia or not, limited to articles published before 2020. In W: cited in Wikipedia; Not in W: not cited in Wikipedia; KWH: Kruskal–Wallis H test.

General topic | In W mean | In W SD | Not in W mean | Not in W SD | KWH | p-value (KWH) | Cohen’s d
Coronaviruses | 0.092 | 0.156 | 0.094 | 0.165 | 0.06 | 0.807 | 0.015
Epidemics | 0.225 | 0.197 | 0.161 | 0.186 | 357.84 | 0.0 | 0.341
Public health | 0.195 | 0.222 | 0.172 | 0.215 | 35.158 | 0.0 | 0.107
Molecular biology and immunology | 0.349 | 0.313 | 0.294 | 0.324 | 102.917 | 0.0 | 0.17
Clinical medicine | 0.129 | 0.198 | 0.266 | 0.307 | 475.092 | 0.0 | 0.45
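Tables A3 to A5 compare two groups, In W and Not in W, with the Kruskal–Wallis H test. The minimal pure-Python sketch below computes H with average ranks for ties and the standard tie correction. With two groups, H is referred to a chi-square distribution with one degree of freedom, whose survival function reduces to erfc(sqrt(H/2)); for H = 0.06 this gives p ≈ 0.806, consistent with the 0.807 in the Coronaviruses row of Table A3 once the rounding of H is allowed for. The function names are hypothetical, and in practice a library routine such as scipy.stats.kruskal would be the usual choice.

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        # pooled[i:j] is a run of tied values.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    # Tie correction (undefined if every value is identical).
    return h / (1 - tie_term / (n ** 3 - n))


def p_value_two_groups(h):
    """Chi-square survival function with 1 degree of freedom (two groups only)."""
    return math.erfc(math.sqrt(h / 2))


h = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
print(round(h, 4), round(p_value_two_groups(h), 4))
```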
Table A4. Test statistics for general topic intensities of articles cited in Wikipedia or not, limited to articles published in 2020. In W: cited in Wikipedia; Not in W: not cited in Wikipedia; KWH: Kruskal–Wallis H test.

General topic | In W mean | In W SD | Not in W mean | Not in W SD | KWH | p-value (KWH) | Cohen’s d
Coronaviruses | 0.239 | 0.193 | 0.157 | 0.183 | 171.205 | 0.0 | 0.448
Epidemics | 0.171 | 0.166 | 0.182 | 0.19 | 0.041 | 0.84 | 0.055
Public health | 0.312 | 0.271 | 0.387 | 0.286 | 46.889 | 0.0 | 0.261
Molecular biology and immunology | 0.132 | 0.199 | 0.103 | 0.185 | 15.078 | 0.0 | 0.159
Clinical medicine | 0.121 | 0.165 | 0.147 | 0.187 | 19.711 | 0.0 | 0.139
Table A5. Test statistics for general topic intensities of articles cited in Wikipedia or not; all publications. In W: cited in Wikipedia; Not in W: not cited in Wikipedia; KWH: Kruskal–Wallis H test.

General topic | In W mean | In W SD | Not in W mean | Not in W SD | KWH | p-value (KWH) | Cohen’s d
Coronaviruses | 0.127 | 0.177 | 0.122 | 0.176 | 0.173 | 0.678 | 0.025
Epidemics | 0.212 | 0.191 | 0.17 | 0.188 | 224.428 | 0.0 | 0.221
Public health | 0.223 | 0.24 | 0.268 | 0.271 | 53.051 | 0.0 | 0.167
Molecular biology and immunology | 0.297 | 0.305 | 0.209 | 0.287 | 343.365 | 0.0 | 0.309
Clinical medicine | 0.127 | 0.191 | 0.213 | 0.267 | 331.354 | 0.0 | 0.323
A.3. FIGURES
Figure A1. Timing of new citations from Wikipedia, and publication years of the articles they refer
to. (a) Number of citations from Wikipedia to COVID-19 literature, per year, overall. (b) Publication
year of the articles cited from Wikipedia, overall.
Figure A2. Heatmap of topic intensities over time.
Figure A3. General topic intensities over time. (a) Average aggregate; this can be interpreted as the
average topic intensity. (b) Cumulative aggregate; this can be interpreted as the number of papers
per topic.
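The two panels of Figure A3 differ only in how per-paper intensities are pooled within a time slot. Assuming each paper has a vector of general-topic intensities that sums to one, averaging within a slot gives panel (a) and summing gives panel (b); because each paper contributes a total of one, the sums across topics add up to the number of papers in the slot, which is why (b) reads as a number of papers per topic. The sketch below bins by publication year and reads "cumulative" as a within-year sum; the binning, that reading, and the function name are all assumptions.

```python
from collections import defaultdict


def aggregate_topics_by_year(papers):
    """Per-year average and summed general-topic intensities.

    papers: iterable of (year, {general_topic: intensity}) pairs.
    Returns (average, total), both keyed by year and then by topic.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, intensities in papers:
        counts[year] += 1
        for topic, value in intensities.items():
            sums[year][topic] += value
    total = {year: dict(by_topic) for year, by_topic in sums.items()}
    average = {
        year: {topic: value / counts[year] for topic, value in by_topic.items()}
        for year, by_topic in total.items()
    }
    return average, total


# Hypothetical papers: two from 2020, one from 2019.
average, total = aggregate_topics_by_year([
    (2020, {"Coronaviruses": 1.0, "Epidemics": 0.0}),
    (2020, {"Coronaviruses": 0.5, "Epidemics": 0.5}),
    (2019, {"Coronaviruses": 0.2, "Epidemics": 0.8}),
])
# total[2020]   -> {'Coronaviruses': 1.5, 'Epidemics': 0.5}
# average[2020] -> {'Coronaviruses': 0.75, 'Epidemics': 0.25}
```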
Figure A4. Heatmap of Pearson correlations between regression variables, after transformations.
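The caption does not say which transformations were applied before correlating. A minimal sketch, assuming a log(1 + x) transform of the count variables (an assumption here, though a common way to tame skewed citation and altmetric counts), computes a Pearson correlation matrix in pure Python. The helper names and the sample counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns, log_counts=True):
    """Pairwise Pearson correlations between named columns.

    Assumption: counts are transformed with log(1 + x) before correlating.
    """
    data = {
        name: [math.log1p(v) for v in values] if log_counts else list(values)
        for name, values in columns.items()
    }
    names = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in names for b in names}


# Hypothetical counts for three articles.
matrix = correlation_matrix({
    "times_cited": [0, 10, 100],
    "counts_mendeley": [1, 12, 150],
})
```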
Figure A5. Agglomerative clustering dendrogram over topics, based on Jensen–Shannon distances.
Considering a cut at 1.1, the left-most cluster (topics 3, 12, 13) focuses on viral epidemics and clinical
medicine; next is a cluster on COVID-19 and its treatment in intensive care (topics 7, 8, 14); next is a
cluster on COVID-19, public health, epidemics, and immunology (topics 0, 1, 9, 10, 11); lastly, on the
right, is a cluster on molecular biology and immunology/vaccines (topics 2, 4, 5, 6).
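A minimal sketch of the two ingredients of Figure A5 follows: the Jensen–Shannon distance between two topic-word distributions (base 2, so it lies between 0 and 1), and a naive agglomerative clustering that stops merging once the closest pair of clusters is farther apart than a threshold. Average linkage is an assumption, as the caption does not name the linkage criterion. It matters for the threshold: with average linkage on base-2 Jensen–Shannon distances no merge height exceeds 1, so the figure's cut at 1.1 points to a different linkage criterion, and the threshold below is purely illustrative. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def average_linkage(dists, n, threshold):
    """Naive agglomerative clustering with average linkage.

    dists maps (i, j) with i < j to a distance; merging stops when the
    closest pair of clusters is farther apart than threshold.
    """
    clusters = [[i] for i in range(n)]

    def link(a, b):
        total = sum(dists[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        (ia, ib), best = min(
            (((x, y), link(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical word distributions of three topics over a 3-word vocabulary.
topics = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]
dists = {(i, j): js_distance(topics[i], topics[j])
         for i, j in combinations(range(len(topics)), 2)}
print(average_linkage(dists, len(topics), threshold=0.5))  # [[0, 1], [2]]
```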
Figure A6. Cohen’s d effect sizes for general topic intensity differences between articles cited
in Wikipedia and articles not cited. Articles published before 2020, in 2020, and overall are considered. See
Tables A3, A4, and A5. Effect sizes are considered very small when below 0.2, small when below
0.5, and medium when below 0.8.
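Figure A6 grades each Cohen's d against fixed thresholds. A minimal sketch, assuming the common pooled-standard-deviation form of d (the appendix does not give the formula), follows. The thresholds are those of the caption; the label "large" for |d| of at least 0.8 is Cohen's usual convention rather than something the caption states. Function names are hypothetical.

```python
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5


def effect_size_label(d):
    """Thresholds from the Figure A6 caption; 'large' is Cohen's convention."""
    size = abs(d)
    if size < 0.2:
        return "very small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Usage with the Coronaviruses effect size reported in Table A4.
print(effect_size_label(0.448))  # small
```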
Figure A7. Some variables used in the regression analyses. The plots distinguish variable values for
articles cited from Wikipedia (green) or not (blue). (a) Citations (Dimensions). (b) Mendeley readers.
(c) Mentions in blogs and news. (d) Twitter (unique) user interactions. (e) SPECTER cluster size.
(f) General topic coronaviruses.
A.4. REGRESSION TABLES
Model: Logistic regression 2020
Method: Marginal effects (Logistic regression)
No. Observations: 130,864
Pseudo R-squ.: 0.2790

variable                                                dx/dy coef   std err         z   P > |z|    [0.025    0.975]
C(publication_year, Treatment(2020))[T.2000.0]             −0.0035     0.008    −0.457     0.648    −0.019     0.012
C(publication_year, Treatment(2020))[T.2001.0]             −0.0164     0.010    −1.644     0.100    −0.036     0.003
C(publication_year, Treatment(2020))[T.2002.0]             −0.0018     0.006    −0.317     0.751    −0.013     0.009
C(publication_year, Treatment(2020))[T.2003.0]             −0.0030     0.004    −0.763     0.445    −0.011     0.005
C(publication_year, Treatment(2020))[T.2004.0]              0.0004     0.003     0.127     0.899    −0.005     0.006
C(publication_year, Treatment(2020))[T.2005.0]              0.0050     0.003     1.755     0.079    −0.001     0.011
C(publication_year, Treatment(2020))[T.2006.0]              0.0036     0.003     1.303     0.192    −0.002     0.009
C(publication_year, Treatment(2020))[T.2007.0]              0.0023     0.003     0.792     0.428    −0.003     0.008
C(publication_year, Treatment(2020))[T.2008.0]              0.0034     0.002     1.357     0.175    −0.002     0.008
C(publication_year, Treatment(2020))[T.2009.0]             −0.0010     0.003    −0.388     0.698    −0.006     0.004
C(publication_year, Treatment(2020))[T.2010.0]              0.0010     0.002     0.392     0.695    −0.004     0.006
C(publication_year, Treatment(2020))[T.2011.0]             −0.0031     0.002    −1.260     0.208    −0.008     0.002
C(publication_year, Treatment(2020))[T.2012.0]             −0.0032     0.002    −1.375     0.169    −0.008     0.001
C(publication_year, Treatment(2020))[T.2013.0]             −0.0080     0.002    −3.458     0.001    −0.012    −0.003
C(publication_year, Treatment(2020))[T.2014.0]             −0.0085     0.002    −3.842     0.000    −0.013    −0.004
C(publication_year, Treatment(2020))[T.2015.0]             −0.0123     0.002    −5.528     0.000    −0.017    −0.008
C(publication_year, Treatment(2020))[T.2016.0]             −0.0119     0.002    −5.321     0.000    −0.016    −0.008
C(publication_year, Treatment(2020))[T.2017.0]             −0.0116     0.002    −5.191     0.000    −0.016    −0.007
C(publication_year, Treatment(2020))[T.2018.0]             −0.0112     0.002    −5.001     0.000    −0.016    −0.007
C(publication_year, Treatment(2020))[T.2019.0]             −0.0067     0.002    −2.919     0.004    −0.011    −0.002
C(top_j, Treatment('OTHER'))[T.Arch_Virol]                  0.0031     0.006     0.528     0.598    −0.008     0.015
C(top_j, Treatment('OTHER'))[T.ChemRxiv]                   −0.0010     0.011    −0.089     0.929    −0.023     0.021
C(top_j, Treatment('OTHER'))[T.Emerg_Infect_Dis]           −0.0001     0.003    −0.051     0.960    −0.006     0.005
C(top_j, Treatment('OTHER'))[T.JAMA]                       −0.0222     0.004    −5.200     0.000    −0.031    −0.014
C(top_j, Treatment('OTHER'))[T.JMIR_Preprints]             −0.2486    10.693    −0.023     0.981   −21.207    20.710
C(top_j, Treatment('OTHER'))[T.Journal_of_virology]        −0.0013     0.003    −0.422     0.673    −0.007     0.005
C(top_j, Treatment('OTHER'))[T.Nature]                      0.0105     0.003     3.614     0.000     0.005     0.016
C(top_j, Treatment('OTHER'))[T.PLoS_One]                   −0.0034     0.003    −1.255     0.210    −0.009     0.002
C(top_j, Treatment('OTHER'))[T.Research_Square]            −0.3905   235.195    −0.002     0.999  −461.365   460.584
C(top_j, Treatment('OTHER'))[T.SSRN_Electronic_Journal]    −0.0173     0.008    −2.208     0.027    −0.033    −0.002
C(top_j, Treatment('OTHER'))[T.Sci_Rep]                    −0.0071     0.005    −1.375     0.169    −0.017     0.003
C(top_j, Treatment('OTHER'))[T.Science]                     0.0083     0.004     2.202     0.028     0.001     0.016
C(top_j, Treatment('OTHER'))[T.Surgical_endoscopy]         −0.0003     0.007    −0.039     0.969    −0.014     0.013
C(top_j, Treatment('OTHER'))[T.The_BMJ]                     0.0126     0.004     3.267     0.001     0.005     0.020
C(top_j, Treatment('OTHER'))[T.The_Lancet]                  0.0129     0.003     4.645     0.000     0.007     0.018
C(top_j, Treatment('OTHER'))[T.Vaccine]                    −0.0118     0.006    −1.909     0.056    −0.024     0.000
C(top_j, Treatment('OTHER'))[T.Virology]                   −0.0013     0.004    −0.308     0.758    −0.010     0.007
C(top_j, Treatment('OTHER'))[T.Viruses]                     0.0091     0.003     2.639     0.008     0.002     0.016
C(top_j, Treatment('OTHER'))[T.bioRxiv]                    −0.0073     0.004    −1.824     0.068    −0.015     0.001
C(top_j, Treatment('OTHER'))[T.medRxiv]                    −0.0489     0.008    −6.260     0.000    −0.064    −0.034
times_cited                                                 0.0078     0.000    19.909     0.000     0.007     0.009
counts_mendeley                                             0.0066     0.000    20.967     0.000     0.006     0.007
counts_policy                                              −0.0013     0.001    −1.319     0.187    −0.003     0.001
counts_twitter_unique                                       0.0050     0.000    13.822     0.000     0.004     0.006
counts_blogs_news                                           0.0051     0.000    11.343     0.000     0.004     0.006
counts_facebook                                             0.0008     0.001     1.115     0.265    −0.001     0.002
expert_ratio                                               −0.0067     0.002    −3.679     0.000    −0.010    −0.003
tm_coronaviruses                                            0.0138     0.018     0.760     0.447    −0.022     0.050
tm_epidemics                                                0.0241     0.018     1.343     0.179    −0.011     0.059
tm_ph                                                       0.0197     0.018     1.086     0.278    −0.016     0.055
tm_mbi                                                      0.0219     0.018     1.226     0.220    −0.013     0.057
tm_clinical_medicine                                       −0.0064     0.018    −0.362     0.717    −0.041     0.028
spectre_cluster_size                                        0.0022     0.001     1.483     0.138    −0.001     0.005
network_cluster_size                                     −2.14e−05     0.000    −0.132     0.895    −0.000     0.000
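The logistic tables report marginal effects (dx/dy) rather than raw coefficients. Their variable names, such as C(publication_year, Treatment(2020))[T.2019.0], follow the patsy formula conventions used by statsmodels, whose marginal-effects routine averages per-observation derivatives by default; that the paper used this default is an assumption. For a logit model, the derivative of P(y = 1) with respect to a regressor x_j is beta_j p (1 - p), so the average marginal effect is beta_j times the sample mean of p (1 - p). The pure-Python sketch below computes exactly that; the coefficients and rows are hypothetical.

```python
import math


def average_marginal_effects(coefs, rows):
    """Average marginal effects (dy/dx) of a fitted logit model.

    coefs: {"Intercept": b0, name: b_j, ...}; rows: list of {name: x_j}.
    Each effect is b_j * mean(p * (1 - p)), with p the predicted probability.
    """
    slopes = {name: b for name, b in coefs.items() if name != "Intercept"}

    def probability(row):
        eta = coefs["Intercept"] + sum(b * row[name] for name, b in slopes.items())
        return 1.0 / (1.0 + math.exp(-eta))

    scale = sum(p * (1.0 - p) for p in map(probability, rows)) / len(rows)
    return {name: b * scale for name, b in slopes.items()}


# Hypothetical two-regressor model and three observations.
coefs = {"Intercept": -3.0, "times_cited": 0.4, "expert_ratio": -0.5}
rows = [
    {"times_cited": 2.0, "expert_ratio": 0.1},
    {"times_cited": 5.0, "expert_ratio": 0.3},
    {"times_cited": 0.0, "expert_ratio": 0.0},
]
print(average_marginal_effects(coefs, rows))
```

Because every effect shares the same scale factor, the ratio between two average marginal effects equals the ratio of the underlying logit coefficients.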
Model: Logistic regression 2019
Method: Marginal effects (Logistic regression)
No. Observations: 69,444
Pseudo R-squ.: 0.2670

variable                                                dx/dy coef   std err         z   P > |z|    [0.025    0.975]
C(publication_year, Treatment(2019))[T.2000.0]              0.0326     0.010     3.389     0.001     0.014     0.051
C(publication_year, Treatment(2019))[T.2001.0]              0.0190     0.012     1.615     0.106    −0.004     0.042
C(publication_year, Treatment(2019))[T.2002.0]              0.0359     0.008     4.728     0.000     0.021     0.051
C(publication_year, Treatment(2019))[T.2003.0]              0.0322     0.006     5.487     0.000     0.021     0.044
C(publication_year, Treatment(2019))[T.2004.0]              0.0335     0.005     6.812     0.000     0.024     0.043
C(publication_year, Treatment(2019))[T.2005.0]              0.0415     0.005     8.825     0.000     0.032     0.051
C(publication_year, Treatment(2019))[T.2006.0]              0.0365     0.005     7.797     0.000     0.027     0.046
C(publication_year, Treatment(2019))[T.2007.0]              0.0371     0.005     7.913     0.000     0.028     0.046
C(publication_year, Treatment(2019))[T.2008.0]              0.0356     0.004     8.032     0.000     0.027     0.044
C(publication_year, Treatment(2019))[T.2009.0]              0.0297     0.004     6.776     0.000     0.021     0.038
C(publication_year, Treatment(2019))[T.2010.0]              0.0290     0.004     6.644     0.000     0.020     0.038
C(publication_year, Treatment(2019))[T.2011.0]              0.0247     0.004     5.730     0.000     0.016     0.033
C(publication_year, Treatment(2019))[T.2012.0]              0.0262     0.004     6.337     0.000     0.018     0.034
C(publication_year, Treatment(2019))[T.2013.0]              0.0201     0.004     4.874     0.000     0.012     0.028
C(publication_year, Treatment(2019))[T.2014.0]              0.0195     0.004     4.813     0.000     0.012     0.027
C(publication_year, Treatment(2019))[T.2015.0]              0.0129     0.004     3.189     0.001     0.005     0.021
C(publication_year, Treatment(2019))[T.2016.0]              0.0100     0.004     2.418     0.016     0.002     0.018
C(publication_year, Treatment(2019))[T.2017.0]              0.0110     0.004     2.668     0.008     0.003     0.019
C(publication_year, Treatment(2019))[T.2018.0]              0.0085     0.004     2.005     0.045     0.000     0.017
C(top_j, Treatment('OTHER'))[T.Arch_Virol]                  0.0043     0.008     0.549     0.583    −0.011     0.020
C(top_j, Treatment('OTHER'))[T.Emerg_Infect_Dis]            0.0022     0.004     0.574     0.566    −0.005     0.010
C(top_j, Treatment('OTHER'))[T.JAMA]                        0.0067     0.007     1.029     0.303    −0.006     0.020
C(top_j, Treatment('OTHER'))[T.Journal_of_virology]        −0.0006     0.004    −0.153     0.879    −0.008     0.007
C(top_j, Treatment('OTHER'))[T.Nature]                      0.0148     0.005     2.861     0.004     0.005     0.025
C(top_j, Treatment('OTHER'))[T.PLoS_One]                   −0.0070     0.004    −1.926     0.054    −0.014     0.000
C(top_j, Treatment('OTHER'))[T.Sci_Rep]                     0.0002     0.007     0.027     0.978    −0.013     0.013
C(top_j, Treatment('OTHER'))[T.Science]                    −0.0008     0.009    −0.096     0.923    −0.018     0.016
C(top_j, Treatment('OTHER'))[T.Surgical_endoscopy]          0.0037     0.008     0.477     0.633    −0.011     0.019
C(top_j, Treatment('OTHER'))[T.The_Lancet]                  0.0109     0.005     2.136     0.033     0.001     0.021
C(top_j, Treatment('OTHER'))[T.Vaccine]                    −0.0220     0.010    −2.291     0.022    −0.041    −0.003
C(top_j, Treatment('OTHER'))[T.Virology]                   −0.0048     0.006    −0.826     0.409    −0.016     0.007
C(top_j, Treatment('OTHER'))[T.Viruses]                     0.0078     0.005     1.562     0.118    −0.002     0.018
C(top_j, Treatment('OTHER'))[T.bioRxiv]                    −0.9986  4.64e+08 −2.15e−09     1.000  −9.1e+08   9.1e+08
times_cited                                                 0.0027     0.001     4.136     0.000     0.001     0.004
counts_mendeley                                             0.0164     0.001    22.952     0.000     0.015     0.018
counts_policy                                              −0.0020     0.001    −1.377     0.168    −0.005     0.001
counts_twitter_unique                                       0.0006     0.001     0.915     0.360    −0.001     0.002
counts_blogs_news                                           0.0051     0.001     6.915     0.000     0.004     0.007
counts_facebook                                             0.0044     0.001     3.950     0.000     0.002     0.007
expert_ratio                                               −0.0010     0.002    −0.408     0.683    −0.006     0.004
tm_coronaviruses                                           −0.0251     0.034    −0.729     0.466    −0.093     0.042
tm_epidemics                                               −0.0172     0.034    −0.505     0.613    −0.084     0.049
tm_ph                                                      −0.0118     0.034    −0.346     0.729    −0.079     0.055
tm_mbi                                                     −0.0091     0.034    −0.267     0.789    −0.076     0.057
tm_clinical_medicine                                       −0.0356     0.034    −1.056     0.291    −0.102     0.031
spectre_cluster_size                                        0.0041     0.002     1.947     0.052 −2.79e−05     0.008
network_cluster_size                                       −0.0009     0.000    −3.371     0.001    −0.001    −0.000
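The regression tables treat publication year and top journal as categorical variables coded against a reference level: 2020 (2019 in the second logistic model) for years and 'OTHER' for journals. Each [T.level] row is therefore an effect relative to that baseline rather than an absolute effect. The sketch below reproduces that treatment (dummy) coding and column naming in pure Python; it mimics, but is not, patsy's implementation, and treatment_dummies is a hypothetical name.

```python
def treatment_dummies(values, reference, prefix):
    """Treatment (dummy) coding against a reference level.

    Returns column names in the '<prefix>[T.<level>]' style of the
    regression tables, plus one 0/1 row per observation. The reference
    level gets no column: it is absorbed by the intercept.
    """
    levels = sorted(set(values) - {reference})
    names = [f"{prefix}[T.{level}]" for level in levels]
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return names, rows


names, rows = treatment_dummies(
    [2019.0, 2018.0, 2000.0],
    reference=2019.0,
    prefix="C(publication_year, Treatment(2019))",
)
# names -> ['C(publication_year, Treatment(2019))[T.2000.0]',
#           'C(publication_year, Treatment(2019))[T.2018.0]']
# rows  -> [[0, 0], [0, 1], [1, 0]]
```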
Model: OLS regression 2020
Method: OLS
No. Observations: 130,809
R-squ.: 0.115

variable                                                      coef   std err         t   P > |t|    [0.025    0.975]
Intercept                                                  −0.0056     0.017    −0.326     0.745    −0.039     0.028
C(publication_year, Treatment(2020))[T.2000.0]             −0.0201     0.009    −2.312     0.021    −0.037    −0.003
C(publication_year, Treatment(2020))[T.2001.0]             −0.0262     0.008    −3.206     0.001    −0.042    −0.010
C(publication_year, Treatment(2020))[T.2002.0]             −0.0105     0.005    −2.078     0.038    −0.020    −0.001
C(publication_year, Treatment(2020))[T.2003.0]             −0.0117     0.004    −3.003     0.003    −0.019    −0.004
C(publication_year, Treatment(2020))[T.2004.0]             −0.0058     0.003    −1.775     0.076    −0.012     0.001
C(publication_year, Treatment(2020))[T.2005.0]             −0.0017     0.003    −0.532     0.594    −0.008     0.005
C(publication_year, Treatment(2020))[T.2006.0]             −0.0043     0.003    −1.361     0.173    −0.010     0.002
C(publication_year, Treatment(2020))[T.2007.0]             −0.0041     0.003    −1.296     0.195    −0.010     0.002
C(publication_year, Treatment(2020))[T.2008.0]             −0.0044     0.003    −1.655     0.098    −0.010     0.001
C(publication_year, Treatment(2020))[T.2009.0]             −0.0112     0.002    −4.486     0.000    −0.016    −0.006
C(publication_year, Treatment(2020))[T.2010.0]             −0.0110     0.003    −4.380     0.000    −0.016    −0.006
C(publication_year, Treatment(2020))[T.2011.0]             −0.0155     0.002    −6.441     0.000    −0.020    −0.011
C(publication_year, Treatment(2020))[T.2012.0]             −0.0134     0.002    −5.809     0.000    −0.018    −0.009
C(publication_year, Treatment(2020))[T.2013.0]             −0.0220     0.002   −10.165     0.000    −0.026    −0.018
C(publication_year, Treatment(2020))[T.2014.0]             −0.0252     0.002   −12.212     0.000    −0.029    −0.021
C(publication_year, Treatment(2020))[T.2015.0]             −0.0287     0.002   −14.416     0.000    −0.033    −0.025
C(publication_year, Treatment(2020))[T.2016.0]             −0.0279     0.002   −14.181     0.000    −0.032    −0.024
C(publication_year, Treatment(2020))[T.2017.0]             −0.0262     0.002   −13.631     0.000    −0.030    −0.022
C(publication_year, Treatment(2020))[T.2018.0]             −0.0228     0.002   −11.885     0.000    −0.027    −0.019
C(publication_year, Treatment(2020))[T.2019.0]             −0.0124     0.002    −6.694     0.000    −0.016    −0.009
C(top_j, Treatment('OTHER'))[T.Arch_Virol]                  0.0082     0.006     1.266     0.206    −0.004     0.021
C(top_j, Treatment('OTHER'))[T.ChemRxiv]                    0.0018     0.005     0.347     0.729    −0.009     0.012
C(top_j, Treatment('OTHER'))[T.Emerg_Infect_Dis]           −0.0087     0.005    −1.808     0.071    −0.018     0.001
C(top_j, Treatment('OTHER'))[T.JAMA]                       −0.0297     0.006    −4.730     0.000    −0.042    −0.017
C(top_j, Treatment('OTHER'))[T.JMIR_Preprints]              0.0040     0.006     0.708     0.479    −0.007     0.015
C(top_j, Treatment('OTHER'))[T.Journal_of_virology]         0.0050     0.005     1.067     0.286    −0.004     0.014
C(top_j, Treatment('OTHER'))[T.Nature]                      0.0403     0.005     7.789     0.000     0.030     0.050
C(top_j, Treatment('OTHER'))[T.PLoS_One]                   −0.0151     0.003    −4.651     0.000    −0.021    −0.009
C(top_j, Treatment('OTHER'))[T.Research_Square]             0.0031     0.003     0.983     0.326    −0.003     0.009
C(top_j, Treatment('OTHER'))[T.SSRN_Electronic_Journal]    −0.0009     0.003    −0.340     0.734    −0.006     0.004
C(top_j, Treatment('OTHER'))[T.Sci_Rep]                    −0.0106     0.005    −2.079     0.038    −0.021    −0.001
C(top_j, Treatment('OTHER'))[T.Science]                     0.0236     0.005     4.481     0.000     0.013     0.034
C(top_j, Treatment('OTHER'))[T.Surgical_endoscopy]         −0.0055     0.004    −1.249     0.212    −0.014     0.003
C(top_j, Treatment('OTHER'))[T.The_BMJ]                     0.0019     0.005     0.375     0.707    −0.008     0.012
C(top_j, Treatment('OTHER'))[T.The_Lancet]                  0.0825     0.006    13.939     0.000     0.071     0.094
C(top_j, Treatment('OTHER'))[T.Vaccine]                    −0.0183     0.006    −3.204     0.001    −0.029    −0.007
C(top_j, Treatment('OTHER'))[T.Virology]                   −0.0051     0.005    −0.945     0.345    −0.016     0.006
C(top_j, Treatment('OTHER'))[T.Viruses]                     0.0235     0.005     4.710     0.000     0.014     0.033
C(top_j, Treatment('OTHER'))[T.bioRxiv]                    −0.0053     0.003    −1.602     0.109    −0.012     0.001
C(top_j, Treatment('OTHER'))[T.medRxiv]                    −0.0183     0.002    −8.482     0.000    −0.022    −0.014
times_cited                                                 0.0099     0.000    25.497     0.000     0.009     0.011
counts_mendeley                                             0.0096     0.000    31.320     0.000     0.009     0.010
counts_policy                                               0.0612     0.002    29.233     0.000     0.057     0.065
counts_twitter_unique                                       0.0006     0.000     1.504     0.133    −0.000     0.001
counts_blogs_news                                           0.0298     0.001    42.103     0.000     0.028     0.031
counts_facebook                                             0.0232     0.001    19.815     0.000     0.021     0.026
expert_ratio                                               −0.0145     0.002    −8.933     0.000    −0.018    −0.011
tm_coronaviruses                                           −0.0055     0.014    −0.399     0.690    −0.032     0.021
tm_epidemics                                                0.0111     0.013     0.833     0.405    −0.015     0.037
tm_ph                                                      −0.0025     0.014    −0.183     0.855    −0.029     0.024
tm_mbi                                                      0.0048     0.013     0.362     0.717    −0.021     0.031
tm_clinical_medicine                                       −0.0145     0.013    −1.089     0.276    −0.041     0.012
spectre_cluster_size                                        0.0006     0.001     0.428     0.669    −0.002     0.003
network_cluster_size                                       −0.0005     0.000    −3.816     0.000    −0.001    −0.000
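The final table is an ordinary least squares fit with an R-squared of 0.115. The sketch below shows how OLS coefficients and R-squared arise: it solves the normal equations (X^T X) b = X^T y by Gauss–Jordan elimination with partial pivoting and computes R^2 = 1 - SS_res / SS_tot. It is illustrative only: the data are hypothetical, ols_fit is not the paper's code, and in practice a library OLS routine (for example statsmodels, whose summary layout the table resembles) would be used.

```python
def ols_fit(X, y):
    """OLS coefficients via the normal equations, plus R-squared.

    X: list of rows, each starting with 1.0 for the intercept; y: responses.
    Returns (coefficients, r_squared).
    """
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for other in range(k):
            if other != col:
                factor = a[other][col] / a[col][col]
                a[other] = [v - factor * w for v, w in zip(a[other], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data lying exactly on y = 1 + 2x: beta ≈ [1.0, 2.0], R^2 ≈ 1.0.
beta, r2 = ols_fit([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]],
                   [1.0, 3.0, 5.0, 7.0])
```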