RESEARCH ARTICLE

Wikinformetrics: Construction and description of
an open Wikipedia knowledge graph data set
for informetric purposes

Wenceslao Arroyo-Machado1, Daniel Torres-Salinas1, and Rodrigo Costas2,3

1Department of Information and Communication Sciences, University of Granada, Granada, Spain
2Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, The Netherlands
3DSI-NRF Centre of Excellence in Scientometrics and Science, Technology and Innovation Policy,
Stellenbosch University, Stellenbosch, South Africa

Keywords: altmetrics, data, informetrics, knowledge graph, metrics, Wikipedia

ABSTRACT

Wikipedia is one of the most visited websites in the world and is also a frequent subject of
scientific research. However, the analytical possibilities of Wikipedia information have not yet
been analyzed considering at the same time both a large volume of pages and attributes. The
main objective of this work is to offer a methodological framework and an open knowledge
graph for the informetric large-scale study of Wikipedia. Features of Wikipedia pages are
compared with those of scientific publications to highlight the (dis)similarities between the two
types of documents. Based on this comparison, different analytical possibilities that Wikipedia
and its various data sources offer are explored, ultimately offering a set of metrics meant
to study Wikipedia from different analytical dimensions. In parallel, a complete dedicated
data set of the English Wikipedia was built (and shared) following a relational model. Finally,
a descriptive case study is carried out on the English Wikipedia data set to illustrate the
analytical potential of the knowledge graph and its metrics.

1. INTRODUCTION

On January 15, 2001, Wikipedia was born under the umbrella of Nupedia, an encyclopedia
project that was based on a peer review system. Due to the lack of agility in publishing articles,
Wikipedia was created as a feeder project, as its objective was to make the creation of new
articles easier before they were reviewed (History of Wikipedia, 2021). Wikipedia combined in
a single project different elements that were new on the web and that made possible for the
first time a universal encyclopedia (Reagle, 2009). It was successful enough to make Nupedia
disappear in 2 years, experiencing steady growth. Since then, Wikipedia has become one
of the most visited websites in the world (https://www.semrush.com/website/top/, accessed
August 4, 2022), with 328 different editions, 285 of them having more than 1,000 articles
(https://meta.wikimedia.org/wiki/List_of_Wikipedias, accessed August 4, 2022). Although this
is the most successful project of the Wikimedia Foundation, there are also other well-known
knowledge projects using wikis as a basis (e.g., the Wiktionary dictionary or the Wikidata
knowledge base).

An open access journal

Citation: Arroyo-Machado, W., Torres-Salinas, D., & Costas, R. (2022). Wikinformetrics:
Construction and description of an open Wikipedia knowledge graph data set for informetric
purposes. Quantitative Science Studies, 3(4), 931–952. https://doi.org/10.1162/qss_a_00226

DOI: https://doi.org/10.1162/qss_a_00226

Supporting Information: https://doi.org/10.1162/qss_a_00226

Received: 10 August 2022
Accepted: 28 October 2022

Corresponding Author: Wenceslao Arroyo-Machado, wences@ugr.es

Handling Editor: Vincent Larivière

Copyright: © 2022 Wenceslao Arroyo-Machado, Daniel Torres-Salinas, and Rodrigo Costas.
Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press

Wikipedia has been a disruptive innovation, finding in its open nature and decentralized
knowledge development one of its key elements (Olleros, 2008). Not only can everyone access
its contents free of charge, but they can also participate in its construction, in a fully transparent
process. This social construction of knowledge can be seen in the differences found among
language editions of the same Wikipedia pages (Hara & Doney, 2015). Wikipedia contents are
also the result of consensus among editors or Wikipedians. This consensus is built in open
discussions in the Wikipedia talk pages (Maki, Yoder et al., 2017; Yasseri, Sumi et al., 2012), open to
anyone and capturing transnational debates around Wikipedia contents (Kopf, 2020). Some of
these talks and debates have sometimes transcended Wikipedia itself (O’Neil, 2017).

As an online encyclopedia, Wikipedia is not exempt from problems. The reliability of its
content has been much debated, as it is based on contributions from anonymous individuals
(Olleros, 2008). The quality of Wikipedia pages’ content has been studied numerous times from
different perspectives, especially with regard to medical content pages, pointing out limitations,
such as occasional incomplete or imprecise information (Adams, Montgomery et al., 2020;
Candelario, Vazquez et al., 2017; Weiner, Horbacewicz et al., 2019). The importance of
integrating Wikipedia into academia, both in its use and in its development, has been highlighted
(Jemielniak, 2019). Social and cultural inequalities have also been pointed out, such as racial
and gender gaps in its biographies (Adams, Brückner, & Naslund, 2019; Tripodi, 2021).

Wikipedia is not free of bots and vandalism, although they do not constitute a serious threat
to its contents and reliability, and Wikipedia’s policy does not allow detrimental use of
bots or automated accounts. Most of the bots on Wikipedia are publicly identified
(https://en.wikipedia.org/wiki/Special:ListUsers/bot), and they contribute to improving the
content and structure of Wikipedia articles (Arroyo-Machado, Torres-Salinas et al., 2020;
Zheng, Albano et al., 2019). Bots also help to control and reduce problems of vandalism
and trolls, as they eliminate harmful edits of articles before human editors do. There
is also no shortage of proposals for methods based on machine learning to prevent this type of
harmful activity (Martinez-Rico, Martinez-Romo, & Araujo, 2019).

In spite of all of these issues, the general idea is that Wikipedia is a transparent and reliable
source of encyclopedic information (Lageard & Paternotte, 2021), with value of its own to be
the subject of scientific research.

1.1. Wikipedia as Source for Informetric Research

Wikipedia has been researched from different scientific perspectives. One of them is
informetrics, quantitatively studying the contents and activity generated on Wikipedia. Thus,
Wikipedia has been studied from the points of view of scientometrics, bibliometrics, and
webometrics, which are discussed in detail below.

Bibliographic references made in Wikipedia have been studied, particularly since the
emergence of the notion of “altmetrics” (Priem, Taraborelli et al., 2010), which considered citations
on Wikipedia to scientific literature as part of its realm [1]. Wikipedia citations are one of the
most popular sources covered in altmetric aggregators (Ortega, 2020; Zahedi & Costas,
2018) such as Altmetric.com, PlumX, or Crossref Event Data. In addition to altmetric data
providers, there are also several other open data sources providing extensive metadata on
Wikipedia citations (Singh, West, & Colavizza, 2020; Zagorova, Ulloa et al., 2022). Moreover,
other proposals, such as Scholia, enable the exploration of bibliographic data at different
levels through Wikidata (Nielsen, Mietchen, & Willighagen, 2017). Table 1 presents a summary of
previous studies on Wikipedia bibliographic references.

[1] Wikipedia references had already been studied for years before the birth of altmetrics, such as in the citation
analysis by Nielsen (2007) or, in a more qualitative way, that of Mühlhauser and Oser (2008).

Quantitative Science Studies

Table 1. Main studies on the bibliographic references included in Wikipedia pages

| Reference | Type | Application | Data | Methodological approach | Language | Topic analyzed |
|---|---|---|---|---|---|---|
| Mühlhauser and Oser (2008) | Qualitative | Content and quality analysis | | Check list | German | Health care |
| Candelario et al. (2017) | Qualitative | Content and quality analysis | 33 pages | Scoring system | English | Medication |
| Kaffee and Elsahar (2021) | Qualitative | Analyze the editors’ citation process | | Survey and interviews | Multilingual | Multidisciplinary |
| Nielsen (2007) | Quantitative | Analyze citation patterns | 30,368 citations | Descriptive statistics | English | Multidisciplinary |
| Kousha and Thelwall (2017) | Quantitative | Evaluate the impact of references | 36,191 citations | Descriptive statistics | Multilingual | Multidisciplinary |
| Lewoniewski, Węcel, and Abramowicz (2017) | Quantitative | Reference coverage across languages | 6.8 million pages; 41 million citations | Descriptive statistics | Multilingual | Multidisciplinary |
| Maggio, Willinsky et al. (2017) | Quantitative | Analyze citation patterns | 229,857 pages; 1,049,025 citations | Descriptive statistics | English | Medicine |
| Pooladian and Borrego (2017) | Quantitative | Evaluate the impact of references | 982 citations | Descriptive analysis | Multilingual | Multidisciplinary |
| Jemielniak, Masukume, and Wilamowski (2019) | Quantitative | Rank journals by citations | 11,325 pages; 137,889 citations | Citation analysis | English | Medicine |
| Torres-Salinas, Romero-Frías, and Arroyo-Machado (2019) | Quantitative | Mapping of knowledge structure | 25,555 pages; 41,655 citations | Cocitation analysis | English | Arts & Humanities |
| Arroyo-Machado et al. (2020) | Quantitative | Mapping of knowledge structure | 193,802 pages; 847,512 citations | Cocitation analysis | English | Multidisciplinary |
| Colavizza (2020) | Quantitative | Publications coverage | 3,083 ref. pub. | Topic modeling and regression analysis | English | COVID-19 |
| Nicholson, Uppala et al. (2021) | Quantitative | Reviewing citation quality | 1,923,575 pages; 824,298 ref. pub. | Classification modeling | English | Multidisciplinary |
| Singh et al. (2020) | Quantitative | Data set creation | 4 million citations | Text mining | English | Multidisciplinary |
| Zagorova et al. (2022) | Quantitative | Data set creation | 6,073,708 pages; 55 million citations | Text mining | English | Multidisciplinary |




Kaffee and Elsahar (2021) explored the flow that Wikipedians follow to include references
in Wikipedia articles. Kousha and Thelwall (2017), and Pooladian and Borrego (2017)
described the problems of Wikipedia citations in performance evaluation. Nicholson et al.
(2021) studied the quality of cited references in Wikipedia. Lewoniewski et al. (2017) showed
that the different language editions of the same Wikipedia page tended to cite common
sources, with the largest overlap between English and German and some differences
depending on the topics. Colavizza (2020) studied the coverage of the scientific literature on
COVID-19 on Wikipedia, showing that although there was only a small percentage of
scientific literature on COVID-19 in Wikipedia, it was sufficiently representative of its various
topics. Arroyo-Machado et al. (2020) and Torres-Salinas et al. (2019) mapped Wikipedia
cocitation patterns, showing fundamental differences in the use of scientific literature in
Wikipedia compared to the academic realm. Bould, Hladkowicz et al. (2014), Li, Thelwall,
and Mohammadi (2021), and Tomaszewski and MacDonald (2016) studied academic
citations in scientific publications to Wikipedia articles, showing that scientific publications also
cite Wikipedia content, as well as other digital encyclopedias, especially in
areas such as chemistry, physics, or mathematics.

Wikipedia has also been the subject of webometric studies. For example, “Wikiometrics”
were proposed as a rating system to rank universities or journals based on the features of their
Wikipedia pages, also finding positive correlations with existing academic rankings (Katz &
Rokach, 2017). The estimation of the importance of Wikipedia pages based on the PageRank
algorithm was also studied, correlating positively with other page-view-based rankings
(Thalhammer & Rettinger, 2016). Miquel-Ribé and Laniado (2018) showed that the different
language editions of Wikipedia pages reflect cultural differences, as the contents cover local
topics corresponding to different linguistic regions. Other studies focused on metrics about the
attention generated around Wikipedia articles (e.g., likes or page view counts), showing how
they reflect current topics of interest at a particular time/region (Dzogang, Lansdall-Welfare, &
Cristianini, 2016; Mittermeier, Roll et al., 2019; Mittermeier, Correia et al., 2021; Roll,
Mittermeier et al., 2016; Vilain, Larrieu et al., 2017), and even demonstrating the potential
of Wikipedia pages to monitor the spread of diseases (Generous, Fairchild et al., 2014).

There are also numerous studies around Wikipedia’s informetric features. Wilkinson and
Huberman (2007) found a correlation between the quality of Wikipedia articles and their
number of edits. The relationship between the length of Wikipedia articles and their quality has
been highlighted by Blumenstock (2008). Beyond quality, relationships between Wikipedia
metrics have also been explored. Previous studies found positive correlations between views
and the number of edits and editors (Mittermeier et al., 2021), and weak correlations between
the length of Wikipedia pages and the length of their talk pages (Yasseri et al., 2012). Zhang,
Ren, and Kraut (2018) suggested the value of using metrics at specific moments of the life
cycle; for example, the first 3 months of an article’s life were not the period in which the
number of editors was most strongly related to its future quality.

Although, as shown above, there is abundant scientific literature on Wikipedia and its
informetric applications, most previous studies tended to focus on either limited sets of metrics
(e.g., Nicholson et al. (2021), who focused on the level of quality of scientific
publications referenced in Wikipedia articles), or limited data sets (e.g., Mittermeier et al. (2021), who
studied a large set of features in a data set of Wikipedia pages of 10,099 bird species across
251 language editions). Thus, a large-scale study of Wikipedia, covering both a large volume of
pages and attributes, is still missing in the literature. Arguably, a potential reason for this lack
of large-scale studies on Wikipedia is the lack of a conceptual framework that highlights both
the large-scale data available from Wikipedia and the multiple informetric metrics that
Wikipedia offers. Such absence has hindered the development of broader research perspectives,
especially regarding the relationship of Wikipedia with science, where a contextualization
of the relationships between the two is still needed.

In this study, we propose such a framework by means of developing an informetric-inspired
knowledge graph, with the aim of enabling similar analytical approaches to those developed
in scientometric research. Such a knowledge graph could work as a complement to other
Wikipedia knowledge graphs such as Wikidata (https://www.wikidata.org/) or DBpedia
(https://www.dbpedia.org/). Wikidata and DBpedia provide exhaustive Wikipedia knowledge graphs
but they are more focused on content and semantic relationships, transforming Wikipedia
pages into entities (e.g., people, places, music bands) and establishing different
computer-understandable relationships between them. Our proposed knowledge graph aims at
characterizing the attention and usage of Wikipedia pages using a relational model and incorporating
activity metadata that are not present in the semantic graphs of Wikidata and DBpedia,
capturing the attention and social engagement, such as views or edits, as well as the presence
of scientific literature in Wikipedia pages.
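
As a rough illustration of what a relational model of this kind could look like, the following Python sketch builds a toy in-memory SQLite database and derives page-level metrics (views, edits, editors, and references) with SQL. The table names, column names, and sample rows are hypothetical and chosen only for this example; they are not the schema or contents of the data set described in this paper.

```python
import sqlite3

# Hypothetical, simplified schema: all table and column names are
# illustrative assumptions, not the layout of the published data set.
SCHEMA = """
CREATE TABLE page (
    page_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    namespace INTEGER NOT NULL
);
CREATE TABLE page_views (
    page_id INTEGER REFERENCES page(page_id),
    month   TEXT NOT NULL,
    views   INTEGER NOT NULL
);
CREATE TABLE revision (
    rev_id  INTEGER PRIMARY KEY,
    page_id INTEGER REFERENCES page(page_id),
    editor  TEXT NOT NULL
);
CREATE TABLE page_reference (
    page_id INTEGER REFERENCES page(page_id),
    doi     TEXT NOT NULL
);
"""


def build_toy_graph():
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, "Bibliometrics", 0), (2, "Altmetrics", 0)],
    )
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?, ?)",
        [(1, "2021-07", 120), (1, "2021-08", 80), (2, "2021-07", 50)],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [(10, 1, "alice"), (11, 1, "bob"), (12, 1, "alice"), (13, 2, "carol")],
    )
    conn.executemany(
        "INSERT INTO page_reference VALUES (?, ?)",
        [(1, "10.1000/a"), (1, "10.1000/b")],
    )
    return conn


def page_metrics(conn):
    """Return (title, views, edits, editors, references) for every page."""
    query = """
    SELECT p.title,
           (SELECT COALESCE(SUM(v.views), 0) FROM page_views v
             WHERE v.page_id = p.page_id),
           (SELECT COUNT(*) FROM revision r WHERE r.page_id = p.page_id),
           (SELECT COUNT(DISTINCT r.editor) FROM revision r
             WHERE r.page_id = p.page_id),
           (SELECT COUNT(*) FROM page_reference c WHERE c.page_id = p.page_id)
    FROM page p
    ORDER BY p.page_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in page_metrics(build_toy_graph()):
        print(row)
```

Keeping each kind of activity in its own table, linked to pages through the page identifier, is one way to let attention metrics and reference counts be aggregated independently and then joined per page.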

The paper is structured as follows. First, we describe our main objectives and our alignment
with recent developments in the field of altmetrics. Second, we describe the informetric
features of Wikipedia pages and their similarities with scientific publications, together with the
existing data sources for data collection. Several informetric-inspired metrics (Wikinformetrics)
are proposed for Wikipedia. Third, a Wikipedia knowledge graph, based on the combination
of different Wikipedia data sources, is constructed and presented. Fourth, the data set is
explored in a descriptive way to show the analytical possibilities of the knowledge graph
and the proposed metrics. Finally, we conclude by discussing our findings and proposing
future research avenues.

1.2. Objectives

The main objective of this work is to explore the research value of Wikipedia from an
informetric perspective, ultimately providing a complete Wikipedia knowledge graph. More
specifically, three different objectives are targeted:

1. Theoretical objective: To establish a framework for Wikipedia analytics, by exploring
the informetric features of Wikipedia pages (composition, categories, sources, data
collection, etc.) and proposing a set of informetric-inspired metrics (Wikinformetrics)
for their quantitative study. This objective will help us to map the analytical possibilities
of Wikipedia as a scientific object.

2. Instrumental objective: To create a large open Wikipedia knowledge graph. Once we
are familiar with the main features of Wikipedia, we will construct a dedicated
knowledge graph focused on the English-language edition of Wikipedia with the main
information and data relationships coming from combining different data sources.

3. Applied objective: To conduct a descriptive quantitative study of Wikipedia metrics
based on the knowledge graph data set, and to explore the proposed metrics and the
different types of attention they capture.

This work and its objectives align with novel developments on social media metrics
(Díaz-Faes, Bowman, & Costas, 2019; Wouters, Zahedi, & Costas, 2019), contributing to the
exploration of different science-society interactions that can be captured on Wikipedia (Costas, de
Rijcke, & Marres, 2020). Our ambition is to frame Wikipedia as a data source with multiple
informetric research possibilities. In addition, a dedicated data set of the English edition of
Wikipedia is constructed for informetric purposes and is freely available at Zenodo
(https://doi.org/10.5281/zenodo.6346899). R and Python were used together for its elaboration, with
scripts available on GitHub (https://doi.org/10.5281/zenodo.6959428). Many of the results
presented here are novel, as to the best of our knowledge there is no previous literature that
has explored the same large set of Wikipedia features and with the same large-scale
perspective as in this study. This work is intended to be useful for a wide range of researchers, such as
librarians, informetricians, sociologists, and data scientists.

2. WIKIPEDIA FROM AN INFORMETRIC PERSPECTIVE

2.1. Analogy Between Wikipedia Pages and Scientific Publications

In Wikipedia, the key components are the individual pages. Wikipedia pages are not only used
for the publication of encyclopedia articles but also for numerous other typologies of pages, such
as categories, users, and talk pages, as well as relationships among them. The different types of
pages are given by a pre-established namespace (a type of page with special features
identifiable through a prefix included in the title). Wikipedia currently has 12 namespaces in use
(article, user, Wikipedia, file, mediawiki, template, help, category, portal, draft, timedtext,
and module), each with an associated “talk namespace” (or “talk page”) in which discussions
are held around the contents and edits of the page, and two virtual namespaces (special and
media).
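
To make the namespace structure concrete, the sketch below derives the namespace of a page from the prefix in its title. The numeric identifiers are the standard MediaWiki namespace IDs used by the English Wikipedia (each talk namespace is the subject namespace ID plus one); the function names and the exact, case-sensitive prefix matching are simplifying assumptions made for illustration.

```python
# The 12 subject namespaces in use and their MediaWiki numeric IDs.
SUBJECT_NAMESPACES = {
    "": 0,  # the article (main) namespace has no prefix
    "User": 2,
    "Wikipedia": 4,
    "File": 6,
    "MediaWiki": 8,
    "Template": 10,
    "Help": 12,
    "Category": 14,
    "Portal": 100,
    "Draft": 118,
    "TimedText": 710,
    "Module": 828,
}

# The two virtual namespaces have negative IDs and no talk namespace.
VIRTUAL_NAMESPACES = {"Special": -1, "Media": -2}


def build_prefix_table():
    """Map every title prefix (subject, talk, virtual) to its namespace ID."""
    table = dict(VIRTUAL_NAMESPACES)
    for prefix, ns_id in SUBJECT_NAMESPACES.items():
        table[prefix] = ns_id
        talk_prefix = "Talk" if prefix == "" else f"{prefix} talk"
        table[talk_prefix] = ns_id + 1  # talk namespace = subject ID + 1
    return table


PREFIXES = build_prefix_table()


def namespace_of(title):
    """Return the namespace ID implied by a page title.

    Titles without a recognized prefix (including article titles that
    merely contain a colon) fall back to the article namespace, 0.
    """
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix]
    return 0


if __name__ == "__main__":
    for example in ("Granada", "Talk:Granada", "Category:Bibliometrics"):
        print(example, namespace_of(example))
```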

There are several features of Wikipedia pages, in particular namespace article pages, for
which it is possible to establish an equivalence with those of a scientific publication. First, they
have a title and an associated page identifier (Wikipedia page ID). They may have one or more
authors, it being possible to identify the first person who created it, when, and those who
have made a greater contribution or whose edits have been reverted. The contents may
include multimedia files, links to external resources, and bibliographic references, among
others. There are also internal links that enable Wikipedia pages to connect to each other, just
like citations among scientific publications. Finally, Wikipedia pages can be classified with
categories according to their contents to carry out their thematic classification, similar to the keywords
and classifications applied to scientific publications. Most of these elements can be seen as
metadata to be treated in the study of Wikipedia pages. However, there are several differences
between Wikipedia pages and scientific publications that cannot be ignored (Table 2). The
most important is that Wikipedia pages are a living resource and not static documents. The
access and editing of the contents also differ between Wikipedia pages and scientific
publications because Wikipedia pages do not focus on a specific audience (e.g., scientific
publications mostly focus on academic audiences), and anyone can take an active part in editing
them. It should also be noted that some pages may be temporarily limited or protected from
editing (Hill & Shaw, 2015).

The living nature of Wikipedia pages puts them at the center of a complex system
(Ladyman, Lambert, & Wiesner, 2013), whose main elements are represented in Figure 1.
Many of the elements of the pages are static or unalterable, such as the creation date or page
ID, while others are in constant evolution, especially the contents themselves. This makes it
difficult to study certain elements in Wikipedia (Détienne, Baker et al., 2016), as Wikipedia
content is volatile and authorship and contribution roles can be diluted in contrast to the
higher stability of scientific publications. In addition, the same page, especially encyclopedic
articles, may have parallel versions in different language editions of Wikipedia, which may
vary in content. This scenario becomes even more complex when taking into account that not
only human users are involved in the development of Wikipedia pages but also bots, thus
making the interactions that can occur more complex to analyze (Tsvetkova,
García-Gavilanes et al., 2017).

Table 2. Comparison of features between Wikipedia pages and scientific publications

| Wikipedia element | Description | Wikipedia page | Scientific publication |
|---|---|---|---|
| State | Document state condition | Living | Static |
| ID | Document identification number | Page ID | DOI, ISBN, URI … |
| Name | Title of the document | Title | Title |
| Type | Document typologies | Namespace (12 + 12 types) | Paper, proceeding, letter … |
| Creation | Date from which it is available | First edition date | Publication date |
| Authorship | Responsible for the work | Wikipedians | Authors |
| Content | Type of content | Structured text | Structured text |
| Language | Language of the resource | Edition dependent | Document dependent |
| Discussion | Comments on the contents | Talk | Peer review |
| Description | Work summary | Short description | Abstract |
| Tags | Terms describing the content | Categories | Keywords |
| Media | Audiovisual resources includable | Images, audios, and videos | Images, audios, and videos |
| Internal links | Links to the related resources | Internal links | Citations |
| Format | Standardized structure and content | Manual of style* | Format guidelines |
| Bibliography | References of cited resources | References | References |
| Access | Access model | Open | Closed/Open |
| Audience | Document target audience | General | Specialized |

* The English Wikipedia has its own manual of style: https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style.

Figure 1. Diagram of the main elements involved in creating and editing Wikipedia articles.

2.2. Categorization

Wikipedia pages are not thematically organized according to a controlled language-based
classification, such as Britannica’s subject organization system. Instead, Wikipedia pages have
a category system that works like a folksonomy (Minguillón, Lerga et al., 2017). Wikipedians
are free to tag each page under one or more existing categories or to create new ones.
Numerous studies have examined these categories, for instance by studying their semantic domain
(Aghaebrahimian, Stauder, & Ustaszewski, 2020; Heist & Paulheim, 2019). However, the main
problem of this folksonomy is the large number of individual categories and their unstructured
(i.e., without a clear hierarchical system) relations at different levels, introducing a lot of noise
and making it difficult to have a general thematic view of Wikipedia (Boldi & Monti, 2016;
Kittur, Chi, & Suh, 2009). In addition, there are also hidden categories, related to the
maintenance or management of the page.

Besides the categories, Wikipedia has other options for accessing and browsing its contents
by topics (https://en.wikipedia.org/wiki/Wikipedia:Contents). On the one hand, it offers
different curated content lists (e.g., the “list of articles every Wikipedia should have” or the list of
“vital articles”). There are other lists that offer collections of articles that respond to the same
topic, and even “lists of lists.” Similarly, there are “portals,” which imitate the classic web
portals and are organized in sections that group the main contents of a topic, not only the articles
(e.g., the “Science” portal or the “History of science” subportal). WikiProjects, communities of
Wikipedians aimed at improving Wikipedia content on a specific topic and which have their
own page from which they coordinate their activities, can also work as a classification
approach due to their thematic orientation (e.g., “Anthropology” or “The Beatles”). There
are also third-party classification systems, such as the “Library of Congress Classification” or
the “Universal Decimal Classification.” Finally, external to Wikipedia, but within the
Wikimedia ecosystem, there are other types of classification solutions, such as Wikidata
taxonomies (https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy) or ORES
(https://www.mediawiki.org/wiki/ORES), that can be used to identify Wikipedia topics using machine
learning techniques. The main limitation of all of the above is that there is no central
classification system that covers all Wikipedia pages and that is at the same time concise and
easy to manage, particularly in terms of the number of subjects and the hierarchical
relationships among them. The lack of such a central classification in Wikipedia is a major hindrance
for the large-scale epistemic study of Wikipedia.

2.3. Content Control

Each Wikipedia page has a discussion space called a “talk page,” where Wikipedians discuss
with other Wikipedians. Talk pages aim at improving the quality and reliability of the articles.
Discussions in talk pages are public (Ferschke, Gurevych, & Chebotar, 2012), resembling the
model of open peer review of scientific publications (Black, 2008), and representing a form of
public review in contrast to the traditional academic blind peer review system (Cummings,
2020). Wikipedia also includes formal peer review approaches in which Wikipedians request
assistance from experts on given topics (https://en.wikipedia.org/wiki/Wikipedia:Peer_review).
Despite discrepancies and differences about what open peer review means and the different
models proposed (Ross-Hellauer, 2017), the three basic principles (open identities, reports, and
participation) are clearly recognizable in Wikipedia (Table S2 in the Supplementary material).
Wikipedians are both authors and reviewers of content and their reports are available as
comments on the talk pages, all of which are always open and identifiable. Interestingly,
Wikipedia-inspired reviewing approaches have even been proposed for scholarly publishing, such as
postpublication correction systems and readers’ comments (Xiao & Askin, 2014).

Table 3. General quality grading scheme of WikiProject articles

| Class | Description | Assignment | Badge |
|---|---|---|---|
| Featured article | The best possible content on Wikipedia, no need for improvement | Review | Yes |
| Featured list | The best possible list on Wikipedia, no need for improvement | Review | Yes |
| A | Fully addresses the subject and requires only minor improvements | Review | |
| Good article | It satisfies Wikipedia’s main criteria and is close to a professional article | Review | Yes |
| B | The content is almost complete and has no major problems | Free | |
| C | The content is considerable, but has significant problems | Free | |
| Start | It includes significant content, but is still in development | Free | |
| Stub | The content is very short and requires substantial work | Free | |
| List | Content displayed in a list linking to Wikipedia articles on a specific topic | Free | |

Wikipedia also includes a quality control system for the content of the different articles that
comes from WikiProjects. It is grounded on an evaluation system to classify pages in higher or
lower levels of content quality, with standard grades that are listed on the respective talk page.
Although there is a general scheme (Table 3), it is possible that some WikiProjects do not
include all grades or that there may be differences in their application. Similarly, the pages
are also classified according to their importance within the topic (Top, High, Mid, and
Low). Wikipedians can set any level of quality and importance on a given page, and can also
modify them. When there are disagreements among Wikipedians about the quality level
of a page, this leads to a discussion and a search for consensus around the quality level of
the page. However, at the highest levels of quality (Featured Articles and Good Articles) the
assignment requires a stricter review process, including the presentation of a candidacy and an
evaluation by independent Wikipedians according to pre-established criteria. These two levels
also have their own badges on the article page.

2.4. Sources

A fundamental aspect of Wikipedia lies in the system of links that connects its pages
to one another, making Wikipedia unique in this sense with regard to other encyclopedic
systems (Reagle & Koerner, 2020). These internal links have been studied, showing both the
semantic relationships they can establish and other potential utilities (Consonni, Laniado, &
Montresor, 2019; Presutti, Consoli et al., 2014), as well as the possibility of calculating
network indicators such as PageRank based on them (Thalhammer & Rettinger, 2016). There are,
however, important issues to consider when working with links between Wikipedia pages:

1. The links may be redirects; that is, old page versions that automatically redirect to the
new versions when accessed.


2. There are lists of links to other Wikipedia pages. Most of the lists include pages that are
conceptually related to each other and share a clear subject matter. However, there are
specific lists such as disambiguation pages, which are aimed at reducing the ambiguity
of some terms (e.g., “citation” or “granada”), and therefore the links in these lists are not
necessarily thematically related.
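The two issues above can be illustrated with a minimal sketch that cleans a raw link table before any network indicator is computed. It assumes a toy redirect map and a toy set of disambiguation pages; the names `redirects`, `disambiguation`, `resolve`, and `clean_links` are hypothetical and do not correspond to files or functions of the shared data set.

```python
from collections import Counter

# Hypothetical toy data: redirect title -> title of the page it points to.
redirects = {
    "Granada (city)": "Granada",
    "Citation index (old)": "Citation index",
}
# Hypothetical set of disambiguation pages, whose links are not thematic.
disambiguation = {"Granada (disambiguation)"}

# (source page, target page) pairs as they might appear in a raw link table.
raw_links = [
    ("Alhambra", "Granada (city)"),
    ("Andalusia", "Granada"),
    ("Granada (disambiguation)", "Granada"),
    ("Bibliometrics", "Citation index (old)"),
]


def resolve(title, redirects, max_hops=5):
    """Follow redirects until a non-redirect title is reached (guarding against loops)."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def clean_links(raw_links, redirects, disambiguation):
    """Resolve redirect targets and drop links that involve disambiguation pages."""
    cleaned = set()
    for source, target in raw_links:
        if source in disambiguation:
            continue
        target = resolve(target, redirects)
        if target in disambiguation or source == target:
            continue
        cleaned.add((source, target))
    return cleaned


links = clean_links(raw_links, redirects, disambiguation)
inlinks = Counter(target for _, target in links)  # incoming links per page
```

Whether disambiguation pages should be dropped or kept depends on the research question; the sketch only shows where such a decision would enter the processing.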

Another fundamental source for Wikipedia is its bibliographic references. Wikipedia recommends
the use of bibliographic references to support its contents, and this is an essential
requirement for a page to achieve the best quality status (Featured article). These references
are the same as those made in scientific publications, in both cases serving as support for an
idea. However, it is necessary to consider that citations in Wikipedia and citations in scientific
publications are governed by different norms and dynamics. Figure 2 schematizes the main
differences between references in scientific publications and references in Wikipedia.

Other relevant particularities of Wikipedia references include:

• Unlike scientific publications, in which the identity of the citers (i.e., those including the
references in the scientific publication) is clear and invariable, in Wikipedia this is more
complex (given the live nature of Wikipedia articles) and not always possible to establish. However,
there are some methodological proposals for this purpose (Zagorova et al., 2022).


Figure 2. Differences between traditional citations and Wikipedia mentions of scientific publications.


• Wikipedia citation counts can be distorted by the translation of articles into different
languages, because references are easily transferred across the language
versions of the same article, which alters the meaning and value of the
counts. This limitation does not occur in scientific publications, as only one language
version of a given publication is usually considered in the counting of citations.

• There are certain Wikipedia pages that function as large bibliographic indexes, bringing
together the most relevant literature on a specific topic (e.g., research annuals or
bibliographies).

• There are also templates (special Wikipedia pages that are embedded within other pages
to facilitate the repetition of information), which are sometimes used to generate
pre-established lists of references that are quickly inserted and replicated into numerous
strongly related Wikipedia pages. This happened, for example, with the listing
of lunar crater references
(https://en.wikipedia.org/wiki/Wikipedia:Templates_for_discussion/Log/2014_June_8#Template:Lunar_crater_references).

2.5. Data Gathering

There are numerous data sources, and the choice of one or the other depends mostly on the
type and volume of data required. In some cases, there are even multiple ways of accessing the
same data. These have been summarized in Table 4 and are described in detail in Section S3 in
the Supplementary material. In fact, Wikimedia has a Research community
(https://meta.wikimedia.org/wiki/Research) that gathers different resources to help and guide
those who want to access the data of the Wikimedia projects and that lists the different projects
related to it.

The two main sources are dumps and APIs. One of the main problems when working with
Wikipedia data dumps is their size, especially when dealing with the more complete editions
(e.g., the metadata of the revisions of the English Wikipedia pages as of June 2022 consists of
27 files of more than 2 GB each), so accessing a subset of data requires a lot of time and
effort. In the case of using Wikipedia APIs, metadata can be accessed on demand, but the
retrieval process is very laborious, especially when large volumes of data are required. Other
sources are characterized by offering already preprocessed data, such as the total number of
edits or page views, which can be consulted from XTools.
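As an illustration of why dump size is mainly a processing problem, the following minimal sketch streams pages from a MediaWiki XML export one at a time instead of loading the whole file. It is not the pipeline used to build the data set: the embedded sample is a simplified, hypothetical excerpt, and `local` and `iter_pages` are illustrative helper names.

```python
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical excerpt in the shape of a MediaWiki XML export.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Granada</title>
    <ns>0</ns>
    <id>1</id>
    <revision><id>10</id><text>City in Spain.</text></revision>
  </page>
  <page>
    <title>Granada (city)</title>
    <ns>0</ns>
    <id>2</id>
    <redirect title="Granada" />
    <revision><id>11</id><text>#REDIRECT [[Granada]]</text></revision>
  </page>
  <page>
    <title>Talk:Granada</title>
    <ns>1</ns>
    <id>3</id>
    <revision><id>12</id><text>Discussion.</text></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the XML namespace from a tag, e.g. '{...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield one small dict per <page> element while the parse advances."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        # Direct children only: title, ns, id, optional redirect, revision.
        fields = {local(child.tag): child for child in elem}
        yield {
            "id": int(fields["id"].text),
            "title": fields["title"].text,
            "ns": int(fields["ns"].text),
            "is_redirect": "redirect" in fields,
        }
        elem.clear()  # discard the children and text of the page just processed


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
# Articles are pages in namespace 0 that are not redirects.
articles = [p for p in pages if p["ns"] == 0 and not p["is_redirect"]]
```

For a real compressed dump, the same generator could in principle be fed a file object opened with `bz2.open(path, "rb")`; processing time would still be dominated by the volume of data.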

In this paper, we extracted and developed a full Wikipedia knowledge graph with the
ambition of facilitating future studies of the English Wikipedia, reducing the time and effort that
researchers may need in collecting and connecting all the different data sources.

2.6. Wikinformetrics

Finally, there are multiple metrics that can be extracted from the sources presented before and
that enable the informetric study of Wikipedia pages. Based on previous studies and the above
exploration of the informetric characteristics of Wikipedia, several metrics have been selected
(Table 5). Each of them is of interest for measuring a particular dimension of the pages. For
example, the number of views can be seen as a measure of the impact and outreach of a
particular page, and while the numbers of edits and editors reflect the volume of activity, the
numbers of talks and talkers are representative of the discussions that take place around these
pages. These are not the only metrics that can be obtained from Wikipedia, but they can be
considered to capture some of the most important analytical aspects of Wikipedia pages (e.g.,


Table 4. Summary of Wikipedia data sources by format, update frequency, data quantity, type, and challenges

| Source | Content | Access | Format | Update frequency | Data quantity* | Type** | Main challenge*** |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wikimedia Dumps | Metadata, page content, and relationships | Offline | XML, SQL | Once/twice a month | Big data | General | Data processing |
| MediaWiki and Wikimedia APIs | Metadata, page content, relationships, and statistics | Online | JSON, WDDX, XML, YAML, PHP | Real time | Small data | General | Data recovery |
| Wiki Replicas | Metadata, page content, and relationships | Online | SQL | Near-real time | Small data | General | Data recovery |
| Event Streams | Real-time logs | Online | SSE, JSON | Real time | | Specific | Data recovery |
| Analytics dumps | Statistics on page views and activity | Offline | TSV | Monthly | Big data | Specific | Data processing |
| WikiStats | Statistics on page views, content, and activity | Online | JSON/CSV | Monthly | Small data | Specific | Data recovery |
| DBpedia | Contents and semantic relationships | Both | RDF/XML, Turtle, N-Triples, SPARQL endpoint | Live/monthly | | General | Data recovery |
| XTools | Statistics on page views, content, and activity | Online | JSON | Real time | Small data | Specific | Data recovery |
| Repositories | Dedicated Wikipedia data sets | Offline | | | | | |
| Altmetric aggregators | Wikipedia references to publications | Online | CSV/JSON | Daily | | Specific | Data processing |

* Volume of data to be retrieved and processed.
** Data from Wikipedia are included to address different problems or are of a specific nature.
*** Task that will require more effort when using the data source.



Table 5. Description of the metrics obtained for Wikipedia articles by analytical dimension

| Metric | Analytical dimension | Description |
| --- | --- | --- |
| Editors | Activity | Number of unique editors that have edited a Wikipedia article |
| Edits | Activity | Total number of edits made to a Wikipedia article |
| Linked | Connectivity | Number of Wikipedia articles that link to the article |
| Links | Connectivity | Number of internal links from a Wikipedia article to other articles |
| Age | Description | Years that have passed from the creation of the page to the date of data collection |
| Length | Description | Length of the page in bytes |
| Talkers | Discussion | Number of unique editors that have edited a Wikipedia article’s talk page |
| Talks | Discussion | Total number of edits made to the talk page of a Wikipedia article |
| Views | Outreach | Number of daily views of a Wikipedia page |
| References | Support | Number of elements listed in the references |
| Pub. referenced | Support | Number of publications referenced |
| URLs | Support | Number of external links included in a Wikipedia article |

contributions, content development, links and interactions, and impact), being also easy to
interpret in an informetric framework.
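As a minimal sketch of how the activity and discussion metrics could be derived from revision metadata, the example below counts edits, editors, talks, and talkers per page. The record layout (page title, talk-page flag, editor name) and the function name `activity_metrics` are assumptions made for illustration and are not the schema of the shared data set.

```python
from collections import defaultdict

# Hypothetical revision records: (page title, is_talk_page, editor name).
revisions = [
    ("Climate change", False, "Ana"),
    ("Climate change", False, "Ben"),
    ("Climate change", False, "Ana"),
    ("Climate change", True, "Ben"),
    ("Climate change", True, "Cho"),
    ("Alhambra", False, "Ana"),
]


def activity_metrics(revisions):
    """Count edits/editors on articles and talks/talkers on their talk pages."""
    edits = defaultdict(int)
    editors = defaultdict(set)
    talks = defaultdict(int)
    talkers = defaultdict(set)
    for title, is_talk, editor in revisions:
        if is_talk:
            talks[title] += 1
            talkers[title].add(editor)
        else:
            edits[title] += 1
            editors[title].add(editor)
    titles = set(edits) | set(talks)
    return {
        t: {
            "edits": edits[t],
            "editors": len(editors[t]),  # unique editors of the article
            "talks": talks[t],
            "talkers": len(talkers[t]),  # unique editors of the talk page
        }
        for t in titles
    }


metrics = activity_metrics(revisions)
```

Counting unique contributors separately from total contributions is what distinguishes editors from edits (and talkers from talks) in Table 5.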

3. WIKIPEDIA KNOWLEDGE GRAPH

Using the different data sources described above, a knowledge graph of the English edition of
Wikipedia has been constructed for informetric purposes and freely shared on Zenodo
(https://doi.org/10.5281/zenodo.6346899). The English edition of Wikipedia has been chosen
because it is the largest one and has an international scope. For its construction, data from
Wikimedia and analytics dumps were used, as well as data shared in repositories, specifically
the data set of Singh et al. (2020), in which they share references made in Wikipedia articles.
The data included in this data set cover all English Wikipedia activity until July 2021, except for
page views, which are from April 1, 2021 to June 30, 2021, and bibliographic reference data,
which run until May 2020. R and Python have been used together, with the scripts available on GitHub
(https://doi.org/10.5281/zenodo.6959428). The construction of this data set is described in
Section S1 in the Supplementary material. The resulting data set consists of nine files
connected to each other by a relational structure summarized in Figure 3.
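To give a sense of how a relational structure of this kind can be queried, the following sketch joins a page table with a reference table in an in-memory SQLite database. The table names, column names, and DOI-like values are hypothetical placeholders and do not reproduce the actual file layout shown in Figure 3; counting distinct identifiers per page is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables standing in for two of the files of a relational data set.
CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE page_reference (page_id INTEGER, doi TEXT);
INSERT INTO page VALUES (1, 'Alzheimer''s disease'), (2, 'Alhambra'), (3, 'Main Page');
INSERT INTO page_reference VALUES
    (1, '10.1000/a'), (1, '10.1000/b'), (1, '10.1000/a'), (2, '10.1000/c');
""")

# LEFT JOIN keeps pages without references; they get a count of zero.
rows = conn.execute("""
    SELECT p.title, COUNT(DISTINCT r.doi) AS pub_referenced
    FROM page AS p
    LEFT JOIN page_reference AS r ON r.page_id = p.page_id
    GROUP BY p.page_id
    ORDER BY pub_referenced DESC, p.title
""").fetchall()
conn.close()
```

The same join pattern extends to other relationships in a relational model, such as pages and categories or pages and external links.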

This knowledge graph offers numerous possibilities for the informetric study of Wikipedia,
making it possible to study new relationships (and interactions) between science and this
social medium (e.g., the attention on Wikipedia to academic topics, the presence of scientific
literature on popular Wikipedia pages, or the use of scientific literature in Wikipedia pages
with large discussions in their Talk pages). This is the case of the work of Arroyo-Machado,
Díaz-Faes, and Costas (2022), who found a positive relationship between the research
performance of universities and their social attention on Wikipedia, using data from this
data set.

Although the generation of new versions of the knowledge graph cannot be guaranteed by
the authors of this paper, the way in which its creation is detailed and the shared scripts ensure


Figure 3. Diagram of files and relationships of the Wikipedia knowledge graph data set.

that new versions can be generated. This is also of importance for the generation of new
knowledge graphs in other language editions of Wikipedia, as the data used as a basis are also
available in other languages. The only limitation in this respect is in the reference data, as they
come from a specific data set (Singh et al., 2020). However, those responsible have also shared
the tools used to obtain the references and there are other alternatives such as Zagorova et al.
(2022) or altmetric data aggregators.

4. CASE STUDY: INFORMETRIC ANALYSIS OF THE ENGLISH WIKIPEDIA

As a case study, the knowledge graph of the English Wikipedia is used to calculate and study
the proposed metrics in a broad manner. The analysis was performed in Python and the code is
available at GitHub (https://doi.org/10.5281/zenodo.6958972).

4.1. Wikipedia Metrics and Articles’ Content

There are 53,710,529 pages in the English Wikipedia, considering all namespaces as well as
pages that are redirects; however, this number is reduced to 6,328,134 pages when the
focus is on articles that are not redirects. These represent just 11.79% of the overall English
Wikipedia. The metrics proposed in Figure 4 have been obtained for all of them.

Figure 4 shows the descriptive statistics of the main variables, differentiating between
total Wikipedia articles and those classified based on their quality; 5,522,676 articles
(87.27% of the total) are associated with a WikiProject and with some quality level. Articles
holding more than one quality level have been counted in each of them. It is noticeable that in all
metrics, Featured articles have the highest values. The case of class B articles is noteworthy,
as they not only show few differences with respect to the Good and A-Class articles, being


Figure 4. Average of Wikipedia article metrics differentiating by the quality assigned from a project.

also greater in number of articles than both, but in aspects such as views they are positioned
above them.

There are important differences in the number of referenced publications, going from an
average of 14.27 publications in Featured articles to 8.52 in A and 5.84 in Good articles, while
the Start and Stub articles cite on average less than one publication. This reflects compliance
with English Wikipedia’s criteria for establishing the quality level of articles. The general
criteria do not make explicit the need for a greater number of references to increase the level of
quality, among other things, but they do require an increase in “reliable sources,” so that citations to
publications can serve as a proxy for this. Likewise, it also corroborates previous findings of a
relationship between the level of quality and the number of edits (Wilkinson & Huberman,
2007), and the length of articles (Blumenstock, 2008).

Most Wikipedia pages are not of recent creation (Figure 5A), with a median age of 11 years. In
some of the metrics, such as edits and talks, extreme outliers are found. This can be seen in the
fact that their average values (102 and 9.19, respectively) lie above both the median and the
third quartile. This situation is much more pronounced in the case of views, with an average of
3,346.59. In addition, the number of referenced elements has a median of 1 and an average
of 4.6. When comparing the links with the linked ones, we find that Wikipedia pages link more
than they are linked, because the median for the former is 36 and for the latter 15.
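The diagnostic used above (a mean that exceeds the third quartile signals extreme outliers) can be sketched as follows. The edit counts are hypothetical toy values, not figures from the data set.

```python
import statistics

# Hypothetical edit counts for nine pages; the last page is an extreme outlier.
edits = [3, 5, 8, 12, 15, 21, 30, 44, 2500]

mean = statistics.mean(edits)
# quantiles(..., n=4) returns the three quartile cut points Q1, Q2 (median), Q3.
q1, median, q3 = statistics.quantiles(edits, n=4)

# A mean above the third quartile indicates a heavily right-skewed distribution.
skewed = mean > q3
```

In such distributions the median and quartiles describe the typical page better than the mean, which is why Figure 5A reports boxplots with the mean marked separately.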

The correlations between these variables are all positive (Figure 5B). The strongest
correlation is between talkers and talks (rs = 0.97), followed by another analogous relationship such
as that between editors and edits (rs = 0.94). When considering pairs of metrics of different
nature, the strongest correlation is between edits and views (rs = 0.74), followed by that of


Figure 5. A: Boxplots of the main metrics for Wikipedia articles excluding outliers from the figures and marking the mean with a cross
symbol. B: Spearman’s rho correlations between the main metrics for English Wikipedia articles.

editors and views (rs = 0.72), which suggests a relationship between the popularity of Wikipedia
pages in terms of visits and their number of edits. Interestingly, a lower correlation was
found between views and both talks and talkers (rs = 0.48), suggesting that discussions around
Wikipedia pages are not necessarily related to higher numbers of views. Another moderate
correlation can be found between the length of an article and its views (rs = 0.6), which
may indicate that the longer the article, the more attention it receives or that the more attention
it receives, the more it grows in length. There are other moderate correlations, such as between
the length and the number of references (rs = 0.56) and URLs (rs = 0.65), but these are to be
expected, as the elements directly influence each other. The number of referenced
publications is the most weakly correlated metric; there is, for example, a weak correlation
between this and views (rs = 0.24) or talks (rs = 0.2). Our results confirm the same type of
relationships reported in previous research (Mittermeier et al., 2021), albeit this time
considering the entire population of English-language Wikipedia articles.
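For readers less familiar with the coefficient reported above, Spearman’s rho can be sketched as the Pearson correlation computed on ranks, with tied values receiving the average of their ranks. The implementation below uses only the standard library and hypothetical toy values; it is an illustration of the statistic, not the code used in the case study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for five pages.
edits = [10, 250, 40, 40, 3000]
views = [120, 9000, 800, 300, 50000]
rho = spearman(edits, views)
```

Because it works on ranks, the coefficient is insensitive to the extreme outliers that characterize metrics such as views and edits, which makes it a natural choice for these data.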

4.2. Different Types of Attention Captured on Wikipedia

The results of this analysis can also be accessed interactively and in greater detail via the R
Shiny app: https://wenceslao-arroyo-machado.shinyapps.io/wikinformetrics/.

A review of Wikipedia’s top-ranked pages based on different metrics reveals its potential to
capture content that responds to different types of attention (Table S4 in the Supplementary
material). The page views make it possible to identify those topics that capture the most
attention of society in a given period (page views are limited to a period of 3 months in our data
set). Thus, in our data set the pages of Prince Philip, Duke of Edinburgh (10,860,553 views) and
Elizabeth II (9,900,275), or Mare of Easttown (5,995,513) rank among the most visited in the
English-language Wikipedia. Also, five of the 20 most viewed pages are series or movies
released in the period analyzed, which also highlights that content related to entertainment


occupies a relevant position in Wikipedia. Sports also receive many views and reflect current
events, as evidenced by the UEFA Euro 2020 page (12,100,455 views), the second most
viewed, just after the Main Page (554,030,839). There is a clear presence of articles that
respond to general interests, such as the Bible (11,048,609) or Cleopatra (9,516,340) pages.
This may indicate that some topics raise general interest and may not be time related.

The number of talks of Wikipedia articles is often used in conjunction with other variables
in the construction of models for controversy detection (Jang, Foley et al., 2016). This suggests
that this metric may be useful for detecting such controversial content in a simple way. Among
the 20 pages with the highest number of talks, those of political figures, religious topics, and
scientific controversies stand out. The intense discussion that takes place in some of them, such as
Donald Trump (62,944), and the vandalism and presence of trolls, as in Gamergate controversy
(27,185), have caused the editing of these pages to be restricted. In fact, there are some articles
clearly related to controversial or sensitive issues, such as Climate change (40,837) and
Homeopathy (25,898). In this regard, Wikipedia itself offers a page with a curated list of controversial
articles (https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues), and 13 of the
20 pages were listed there as of 4 July 2021.

Finally, based on the volume of referenced publications, that is, all materials with an
associated identifier (DOI, ISBN, arXiv ID, etc.), it is also possible to identify the Wikipedia pages
that cite more scientific publications. However, in this case there are many research annuals
and bibliographic pages present among the top 20 articles, such as 2018 in paleontology with
569 referenced publications. These lists have been eliminated to select the top 20 articles with
encyclopedic content. In these articles there is a clear presence of scientific content, especially
in medicine, such as Feminizing hormone therapy (329) and Alzheimer’s disease (277).
However, there are also articles related to history, such as History of Lisbon (313) or World War II
(264). This may suggest that the metric of the number of publications cited can be used as a
proxy to identify Wikipedia articles that are more scholarly oriented.
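The selection step described above (ranking by publications referenced after discarding list-like pages) can be sketched as follows. The counts are those quoted in the text, but the title pattern `LIST_LIKE` is a hypothetical heuristic introduced only for illustration; the text does not specify how the lists were identified.

```python
import re

# (title, publications referenced) pairs quoted in the text above.
pages = [
    ("2018 in paleontology", 569),
    ("Feminizing hormone therapy", 329),
    ("History of Lisbon", 313),
    ("Alzheimer's disease", 277),
    ("World War II", 264),
]

# Hypothetical heuristic for research annuals and bibliographic lists.
LIST_LIKE = re.compile(r"^(\d{4} in |Bibliography of |List of )")


def top_encyclopedic(pages, n):
    """Return the n most-referencing pages after dropping list-like titles."""
    kept = [(title, count) for title, count in pages if not LIST_LIKE.match(title)]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:n]


top = top_encyclopedic(pages, 3)
```

A title-based filter of this kind is only a rough approximation; a manual review of the candidates would still be advisable before interpreting the ranking.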

5. DISCUSSION

In this study we describe how Wikipedia is a complex system, involving numerous actors and
elements, and whose rules and governance depend on the community itself (Jemielniak,
2012). It is not only one of the first and clearest examples of Web 2.0 but also one of the
few that remains among the most visited websites and has not deviated from its initial
objective. Far from that, over the years it has gained the acceptance and trust of many of those who
initially looked at it with skepticism.

We describe many similarities between scientific publications and Wikipedia pages. Both
have different typologies of documents, structured content, evaluation of content, and use of
links and bibliographic references. There are also notable differences. While scientific
publications may have limited access and a more specialized audience, Wikipedia’s content and
scope are more open and targeted to more general audiences. The live nature of Wikipedia is
probably its main distinctive feature when compared to scientific publications. This must be
considered when conducting informetric research on Wikipedia. To help in this endeavor, we
propose an informetric-inspired conceptual framework with different metrics that pay
attention to the different analytical dimensions of Wikipedia, such as article characteristics,
outreach, or citations to scientific publications. Some of these metrics have been already
explored in the literature, such as page views (Mittermeier et al., 2019, 2021), but never in
a comprehensive conceptual framework. The informetric-inspired conceptual framework
presented here is expected to be useful for any Wikipedia study involving informetric,

scientometric, bibliometric, or webometric perspectives. Similarly, different Wikipedia data
sources have been identified and described; their differences in coverage, volume,
access, or data processing are crucial aspects for their selection.

Alongside the conceptual analytical framework proposed, a knowledge graph of the English
edition of Wikipedia has been built and shared openly
(https://doi.org/10.5281/zenodo.6346899). The data are gathered under a comprehensive data set that follows a relational
model and can be used by anyone interested in the study of this encyclopedia from an
informetric point of view. It combines different data sources that allow users to
characterize any Wikipedia page and also to establish relationships between pages and
other entities (e.g., between two articles, an article and a category, or an article and a linked
website or a scientific publication referenced in it). Together with the metadata and relations
of Wikipedia pages, the data of their bibliographic references are also incorporated, which
come from the data set shared by Singh et al. (2020). It is precisely in Wikipedia’s
bibliographic reference data where greater efforts are needed so that they can be efficiently accessed
through its official sources, such as dumps or the API.

The case study provides a descriptive overview of Wikipedia articles in its English edition,
suggesting valuable analytical possibilities and highlighting the relationships and
usefulness of the metrics described. The low correlations among most
of the metrics suggest that the analytical dimensions measured through them are rather
distinct. The potential analytical usefulness of some of the metrics has been highlighted. For
example, the number of Wikipedia page views can be seen as a metric of social attention; the
number of talks of Wikipedia pages can be seen as a proxy of controversial topics; and the
number of scientific references in Wikipedia pages can help identify scholarly-related content.
The use of the quality levels derived from WikiProjects has proved to be useful, showing clear
differences between the different levels, and has also provided an overview of the Wikipedia
articles.

Finally, it is important to also mention some of the limitations of this work. First, not all
possible Wikipedia metrics and their relationships have been explored (e.g., the relationship
between pages and users, the number of users who follow the pages (the so-called
watchers), or the number of editions in other languages of a given article). The use of large
amounts of data and some specific sources also leads to a loss of consistency. For example, the
Wikipedia dump process takes several days without blocking the edits during that time, so the
dumps are not really a snapshot. This loss of consistency also occurs when using different sources,
especially when combining 2021 Wikipedia data with references from a third-party data set
published in 2020. The knowledge graph and the case study are based on the English
Wikipedia; however, future research should study whether the same relationships found in this
study also hold for other languages as well as the existing relationships between language
editions.

ACKNOWLEDGMENTS

We thank Mercedes and María for their intellectual advice in the early stages.

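AUTHOR CONTRIBUTIONS

Wenceslao Arroyo-Machado: Data curation, Formal analysis, Investigation, Software,
Visualization, Writing – original draft. Daniel Torres-Salinas: Funding acquisition, Resources,
Validation, Writing – review & editing. Rodrigo Costas: Conceptualization, Methodology, Project
administration, Supervision, Writing – review & editing.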


COMPETING INTERESTS

The authors have no competing interests.

FUNDING INFORMATION

This work was funded by the Spanish Ministry of Science and Innovation with grant number
PID2019-109127RB-I00/SRA/10.13039/501100011033. Wenceslao Arroyo-Machado
received an FPU Grant (FPU18/05835) from the Spanish Ministry of Universities. Daniel
Torres-Salinas received support under the Reincorporation Programme for Young Researchers
of the University of Granada. Rodrigo Costas is partially funded by the South African DSI-NRF
Centre of Excellence in Scientometrics and Science, Technology and Innovation Policy
(SciSTIP).

DATA AVAILABILITY

The Wikipedia knowledge graph data set is available in Zenodo (Arroyo-Machado et al.,
2022).

The source code for constructing the Wikipedia knowledge graph data set is available in
Zenodo (Arroyo-Machado, 2022a).

The case study code is available in Zenodo (Arroyo-Machado, 2022b).

REFERENCES

Adams, C. E., Montgomery, A. A., Aburrow, T., Bloomfield, S.,
Briley, P. M., … Xia, J. (2020). Adding evidence of the effects of
treatments into relevant Wikipedia pages: A randomised trial.
BMJ Open, 10(2), e033655. https://doi.org/10.1136/bmjopen-2019-033655,
PubMed: 32086355

Adams, J., Brückner, H., & Naslund, C. (2019). Who counts as a
notable sociologist on Wikipedia? Gender, race, and the “Professor
Test.” Socius, 5, 2378023118823946.
https://doi.org/10.1177/2378023118823946

Aghaebrahimian, A., Stauder, A., & Ustaszewski, M. (2020). Testing
the validity of Wikipedia categories for subject matter labelling of
open-domain corpus data. Journal of Information Science, 48(5),
686–700. https://doi.org/10.1177/0165551520977438

Arroyo-Machado, W. (2022a). Wences91/wikipedia_knowledge_graph
[Source code]. https://doi.org/10.5281/zenodo.6959428

Arroyo-Machado, W. (2022b). Wences91/wikinformetrics [Source
code]. https://doi.org/10.5281/zenodo.6958972

Arroyo-Machado, W., Díaz-Faes, A. A., & Costas, R. (2022). New
insights on social media metrics: Examining the relationship
between universities’ academic reputation and Wikipedia attention.
26th International Conference on Science, Technology and
Innovation Indicators (STI 2022), Granada, Spain.
https://doi.org/10.5281/zenodo.6962442

Arroyo-Machado, W., Torres-Salinas, D., & Costas, R. (2022).
Wikipedia knowledge graph dataset [Data set].
https://doi.org/10.5281/zenodo.6346899

Arroyo-Machado, W., Torres-Salinas, D., Herrera-Viedma, E., &
Romero-Frías, E. (2020). Science through Wikipedia: A novel
representation of open knowledge through co-citation networks.
PLOS ONE, 15(2), e0228713.
https://doi.org/10.1371/journal.pone.0228713, PubMed: 32040488

Black, E. W. (2008). Wikipedia and academic peer review. Online
Information Review, 32(1), 73–88.
https://doi.org/10.1108/14684520810865994

Blumenstock, J. E. (2008). Size matters: Word count as a measure of
quality on Wikipedia. In Proceedings of the 17th International
Conference on World Wide Web (pp. 1095–1096).
https://doi.org/10.1145/1367497.1367673

Boldi, P., & Monti, C. (2016). Cleansing Wikipedia categories using
centrality. In Proceedings of the 25th International Conference
Companion on World Wide Web (pp. 969–974).
https://doi.org/10.1145/2872518.2891111

Bould, M. D., Hladkowicz, E. S., Pigford, A.-A. E., Ufholz, L.-A.,
Postonogova, T., … Boet, S. (2014). References that anyone can
edit: Review of Wikipedia citations in peer reviewed health
science literature. BMJ: British Medical Journal, 348, g1585.
https://doi.org/10.1136/bmj.g1585, PubMed: 24603564

Candelario, D. M., Vazquez, V., Jackson, W., & Reilly, T. (2017).
Completeness, accuracy, and readability of Wikipedia as a
reference for patient medication information. Journal of the American
Pharmacists Association: JAPhA, 57(2), 197–200.
https://doi.org/10.1016/j.japh.2016.12.063, PubMed: 28139458

Colavizza, G. (2020). COVID-19 research in Wikipedia. Quantitative
Science Studies, 1(4), 1349–1380.
https://doi.org/10.1162/qss_a_00080

Consonni, C., Laniado, D., & Montresor, A. (2019). WikiLinkGraphs:
A complete, longitudinal and multi-language dataset of
the Wikipedia link networks. In Proceedings of the 13th International
AAAI Conference on Web and Social Media (pp. 598–607).
https://doi.org/10.1609/icwsm.v13i01.3257

Costas, R., de Rijcke, S., & Marres, N. (2020). “Heterogeneous
couplings”: Operationalizing network perspectives to study
science-society interactions through social media metrics. Journal
of the Association for Information Science and Technology,
72(5), 595–610. https://doi.org/10.1002/asi.24427

Cummings, R. E. (2020). Writing knowledge: Wikipedia, public
review, and peer review. Studies in Higher Education, 45(5),
950–962. https://doi.org/10.1080/03075079.2020.1749791


Détienne, F。, 贝克, M。, Fréard, D ., Barcellini, F。, Denis, A。, &
Quignard, 中号. (2016). The descent of Pluto: Interactive dynamics,
specialisation and reciprocity of roles in a Wikipedia debate.
International Journal of Human-Computer Studies, 86, 11–31.
https://doi.org/10.1016/j.ijhcs.2015.09.002

Díaz-Faes, A. A。, Bowman, 时间. D ., & Costas, 右. (2019). Towards a
second generation of “social media metrics”: Characterizing
Twitter communities of attention around science. PLOS ONE,
14(5), e0216408. https://doi.org/10.1371/journal.pone
.0216408, 考研: 31116783

Dzogang, F., Lansdall-Welfare, T., & Cristianini, N. (2016). Seasonal
fluctuations in collective mood revealed by Wikipedia
searches and Twitter posts. In 2016 IEEE 16th International Conference
on Data Mining Workshops (ICDMW) (pp. 931–937).
https://doi.org/10.1109/ICDMW.2016.0136

Ferschke, O., Gurevych, I., & Chebotar, Y. (2012). Behind the
article: Recognizing dialog acts in Wikipedia talk pages. In Proceedings
of the 13th Conference of the European Chapter of the
Association for Computational Linguistics (pp. 777–786).

Generous, N., Fairchild, G., Deshpande, A., Del Valle, S. Y., &
Priedhorsky, R. (2014). Global disease monitoring and forecasting
with Wikipedia. PLOS Computational Biology, 10(11),
e1003892. https://doi.org/10.1371/journal.pcbi.1003892,
PMID: 25392913

Hara, N., & Doney, J. (2015). Social construction of knowledge in
Wikipedia. First Monday, 20(6).
https://doi.org/10.5210/fm.v20i6.5869

Heist, N., & Paulheim, H. (2019). Uncovering the semantics of
Wikipedia categories. In C. Ghidini, O. Hartig, M. Maleshkova,
V. Svátek, I. Cruz, A. Hogan, J. Song, M. Lefrançois, & F. Gandon
(Eds.), The Semantic Web – ISWC 2019 (pp. 219–236). Springer
International Publishing.
https://doi.org/10.1007/978-3-030-30793-6_13

Hill, B. M., & Shaw, A. (2015). Page protection: Another missing
dimension of Wikipedia research. In Proceedings of the 11th
International Symposium on Open Collaboration.
https://doi.org/10.1145/2788993.2789846

History of Wikipedia. (2021). Wikipedia. 28 May.
https://en.wikipedia.org/wiki/History_of_Wikipedia

Jang, M., Foley, J., Dori-Hacohen, S., & Allan, J. (2016). Probabilistic
approaches to controversy detection. In Proceedings of the
25th ACM International on Conference on Information and
Knowledge Management (pp. 2069–2072).
https://doi.org/10.1145/2983323.2983911

Jemielniak, D. (2012). Wikipedia: An effective anarchy. Baltimore,
MD: Society for Applied Anthropology.

Jemielniak, D. (2019). Wikipedia: Why is the common knowledge
resource still neglected by academics? GigaScience, 8(12),
giz139. https://doi.org/10.1093/gigascience/giz139,
PMID: 31794014

Jemielniak, D., Masukume, G., & Wilamowski, M. (2019). The
most influential medical journals according to Wikipedia: Quantitative
analysis. Journal of Medical Internet Research, 21(1),
e11429. https://doi.org/10.2196/11429, PMID: 30664451

Kaffee, L.-A., & Elsahar, H. (2021). References in Wikipedia: The
editors’ perspective. In Companion Proceedings of the Web Conference
2021 (pp. 535–538).
https://doi.org/10.1145/3442442.3452337

Katz, G。, & Rokach, L. (2017). Wikiometrics: A Wikipedia based
ranking system. World Wide Web, 20(6), 1153–1177. https://
doi.org/10.1007/s11280-016-0427-8

Kittur, A., Chi, E. H., & Suh, B. (2009). What’s in Wikipedia? Mapping
topics and conflict using socially annotated category
structure. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (pp. 1509–1512).
https://doi.org/10.1145/1518701.1518930

Kopf, S. (2020). Participation and deliberative discourse on social
media—Wikipedia talk pages as transnational public spheres?
Critical Discourse Studies, 19(2), 196–211. https://doi.org/10
.1080/17405904.2020.1822896

Kousha, K., & Thelwall, M. (2017). Are Wikipedia citations important
evidence of the impact of scholarly articles and books? Journal
of the Association for Information Science and Technology,
68(3), 762–779. https://doi.org/10.1002/asi.23694

Ladyman, J., Lambert, J., & Wiesner, K. (2013). What is a complex
system? European Journal for Philosophy of Science, 3(1), 33–67.
https://doi.org/10.1007/s13194-012-0056-8

Lageard, V., & Paternotte, C. (2021). Trolls, bans and reverts: Simulating
Wikipedia. Synthese, 198(1), 451–470.
https://doi.org/10.1007/s11229-018-02029-0

Lewoniewski, W., Węcel, K., & Abramowicz, W. (2017). Analysis
of references across Wikipedia languages. In R. Damaševičius &
V. Mikašytė (Eds.), Information and Software Technologies
(pp. 561–573). Springer International Publishing.
https://doi.org/10.1007/978-3-319-67642-5_47

Li, X., Thelwall, M., & Mohammadi, E. (2021). How are encyclopedias
cited in academic research? Wikipedia, Britannica, Baidu
Baike, and Scholarpedia. Profesional de La Información, 30(5).
https://doi.org/10.3145/epi.2021.sep.08

Maggio, L. A., Willinsky, J. M., Steinberg, R. M., Mietchen, D.,
Wass, J. L., & Dong, T. (2017). Wikipedia as a gateway to biomedical
research: The relative distribution and use of citations
in the English Wikipedia. PLOS ONE, 12(12), e0190046.
https://doi.org/10.1371/journal.pone.0190046,
PMID: 29267345

Maki, K., Yoder, M., Jo, Y., & Rosé, C. (2017). Roles and success in
Wikipedia talk pages: Identifying latent patterns of behavior. In
Proceedings of the Eighth International Joint Conference on Natural
Language Processing (Volume 1: Long Papers) (pp. 1026–1035).
https://aclanthology.org/I17-1103

Martinez-Rico, J. R., Martinez-Romo, J., & Araujo, L. (2019). Can
deep learning techniques improve classification performance of
vandalism detection in Wikipedia? Engineering Applications of
Artificial Intelligence, 78, 248–259.
https://doi.org/10.1016/j.engappai.2018.11.012

Minguillón, J., Lerga, M., Aibar, E., Lladós-Masllorens, J., &
Meseguer-Artola, A. (2017). Semi-automatic generation of a corpus
of Wikipedia articles on science and technology. Profesional
de La Información, 26(5), 995–1005.
https://doi.org/10.3145/epi.2017.sep.20

Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia culture gap: Quantifying
content imbalances across 40 language editions. Frontiers in
Physics, 6, 54. https://doi.org/10.3389/fphy.2018.00054

Mittermeier, J. C., Correia, R., Grenyer, R., Toivonen, T., & Roll, U.
(2021). Using Wikipedia to measure public interest in biodiversity
and conservation. Conservation Biology, 35(2), 412–423.
https://doi.org/10.1111/cobi.13702, PMID: 33749051

Mittermeier, J. C., Roll, U., Matthews, T. J., & Grenyer, R. (2019). A
season for all things: Phenological imprints in Wikipedia usage
and their relevance to conservation. PLOS Biology, 17(3),
e3000146. https://doi.org/10.1371/journal.pbio.3000146,
PMID: 30835729

Mühlhauser, I., & Oser, F. (2008). Does WIKIPEDIA provide evidence
based health care information? A content analysis. Shared
Decision-Making in Health Care, 102(7), e1–e7.
https://doi.org/10.1016/j.zefq.2008.06.020

Nicholson, J. M., Uppala, A., Sieber, M., Grabitz, P., Mordaunt, M.,
& Rife, S. C. (2021). Measuring the quality of scientific references
in Wikipedia: An analysis of more than 115M citations to over
800 000 scientific articles. The FEBS Journal, 288(14), 4242–4248.
https://doi.org/10.1111/febs.15608, PMID: 33089957

Nielsen, F. A. (2007). Scientific citations in Wikipedia. First
Monday, 12(8). https://doi.org/10.5210/fm.v12i8.1997

Nielsen, F. Å., Mietchen, D., & Willighagen, E. (2017). Scholia,
scientometrics and Wikidata. In E. Blomqvist, K. Hose, H.
Paulheim, A. Ławrynowicz, F. Ciravegna, & O. Hartig (Eds.),
The Semantic Web: ESWC 2017 Satellite Events (pp. 237–259).
Springer International Publishing.
https://doi.org/10.1007/978-3-319-70407-4_36

Olleros, F. X. (2008). Learning to trust the crowd: Some lessons
from Wikipedia. In 2008 International MCETECH Conference
on E-Technologies (Mcetech 2008) (pp. 212–216).
https://doi.org/10.1109/MCETECH.2008.17

O’Neil, T. (2017). Wikipedia erases record of accomplished scientist
—‘Censored’ for his intelligent design position. PJ Media. https://
pjmedia.com/faith/tyler-o-neil/2017/11/21/wikipedia-erases
-record-of-accomplished-scientist-censored-for-his-intelligent
-design-position-n101002

Ortega, J.-L. (2020). Altmetrics data providers: A meta-analysis
review of the coverage of metrics and publication. Profesional
de La Información, 29(1). https://doi.org/10.3145/epi.2020.ene.07

Pooladian, A., & Borrego, Á. (2017). Methodological issues in measuring
citations in Wikipedia: A case study in library and information
science. Scientometrics, 113(1), 455–464.
https://doi.org/10.1007/s11192-017-2474-z

Presutti, V., Consoli, S., Nuzzolese, A. G., Recupero, D. R.,
Gangemi, A., … Zargayouna, H. (2014). Uncovering the semantics
of Wikipedia pagelinks. In K. Janowicz, S. Schlobach, P.
Lambrix, & E. Hyvönen (Eds.), Knowledge engineering and
knowledge management (pp. 413–428). Springer International
Publishing. https://doi.org/10.1007/978-3-319-13704-9_32

Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Altmetrics:
A manifesto. Altmetrics. https://altmetrics.org/manifesto/

Reagle, J. (2009). Wikipedia: The happy accident. Interactions,
16(3), 42–45. https://doi.org/10.1145/1516016.1516026

Reagle, J., & Koerner, J. (Eds.). (2020). Wikipedia @ 20: Stories of an
incomplete revolution. MIT Press.
https://doi.org/10.7551/mitpress/12366.001.0001

Roll, U., Mittermeier, J. C., Diaz, G. I., Novosolov, M., Feldman, A.,
… Grenyer, R. (2016). Using Wikipedia page views to explore the
cultural importance of global reptiles. Biological Conservation,
204, 42–50. https://doi.org/10.1016/j.biocon.2016.03.037

Ross-Hellauer, T. (2017). What is open peer review? A systematic
review. F1000Research, 6, 588.
https://doi.org/10.12688/f1000research.11369.2, PMID: 28580134

Singh, H., West, R., & Colavizza, G. (2020). Wikipedia citations: A
comprehensive data set of citations with identifiers extracted
from English Wikipedia. Quantitative Science Studies, 2(1),
1–19. https://doi.org/10.1162/qss_a_00105

Thalhammer, A., & Rettinger, A. (2016). PageRank on Wikipedia:
Towards general importance scores for entities. In H. Sack, G.
Rizzo, N. Steinmetz, D. Mladenić, S. Auer, & C. Lange (Eds.),
The Semantic Web (pp. 227–240). Springer International Publishing.
https://doi.org/10.1007/978-3-319-47602-5_41

Tomaszewski, R., & MacDonald, K. I. (2016). A study of citations to
Wikipedia in scholarly publications. Science & Technology
Libraries, 35(3), 246–261.
https://doi.org/10.1080/0194262X.2016.1206052

Torres-Salinas, D., Romero-Frías, E., & Arroyo-Machado, W.
(2019). Mapping the backbone of the humanities through the
eyes of Wikipedia. Journal of Informetrics, 13(3), 793–803.
https://doi.org/10.1016/j.joi.2019.07.002

Tripodi, F. (2021). Ms. Categorized: Gender, notability, and inequality
on Wikipedia. New Media & Society, 14614448211023772.
https://doi.org/10.1177/14614448211023772

Tsvetkova, M., García-Gavilanes, R., Floridi, L., & Yasseri, T.
(2017). Even good bots fight: The case of Wikipedia. PLOS
ONE, 12(2), e0171774.
https://doi.org/10.1371/journal.pone.0171774, PMID: 28231323

Vilain, P., Larrieu, S., Cossin, S., Caserio-Schönemann, C., & Filleul,
L. (2017). Wikipedia: A tool to monitor seasonal diseases trends?
Online Journal of Public Health Informatics, 9(1).
https://doi.org/10.5210/ojphi.v9i1.7630

Weiner, S. S., Horbacewicz, J., Rasberry, L., & Bensinger-Brody, Y.
(2019). Improving the quality of consumer health information on
Wikipedia: Case series. Journal of Medical Internet Research, 21(3),
e12450. https://doi.org/10.2196/12450, PMID: 30882357

Wilkinson, D. M., & Huberman, B. A. (2007). Assessing the value of
cooperation in Wikipedia. First Monday, 12(4).
https://doi.org/10.5210/fm.v12i4.1763

Wouters, P., Zahedi, Z., & Costas, R. (2019). Social media metrics
for new research evaluation. In W. Glänzel, H. F. Moed, U.
Schmoch, & M. Thelwall (Eds.), Springer handbook of science
and technology indicators (pp. 687–713). Springer International
Publishing. https://doi.org/10.1007/978-3-030-02511-3_26

Xiao, L., & Askin, N. (2014). Academic opinions of Wikipedia and
Open Access publishing. Online Information Review, 38(3),
332–347. https://doi.org/10.1108/OIR-04-2013-0062

Yasseri, T., Sumi, R., Rung, A., Kornai, A., & Kertész, J. (2012).
Dynamics of conflicts in Wikipedia. PLOS ONE, 7(6), e38869.
https://doi.org/10.1371/journal.pone.0038869,
PMID: 22745683

Zagorova, O., Ulloa, R., Weller, K., & Flöck, F. (2022). “I updated
the <ref>”: The evolution of references in the English Wikipedia
and the implications for altmetrics. Quantitative Science Studies,
3(1), 147–173. https://doi.org/10.1162/qss_a_00171

Zahedi, Z., & Costas, R. (2018). General discussion of data quality
challenges in social media metrics: Extensive comparison of four
major altmetric data aggregators. PLOS ONE, 13(5), e0197326.
https://doi.org/10.1371/journal.pone.0197326, PMID: 29772003

Zhang, H., Ren, Y., & Kraut, R. E. (2018). Mining and predicting
temporal patterns in the quality evolution of Wikipedia articles.
Academy of Management Proceedings, 2018(1), 13746.
https://doi.org/10.5465/AMBPP.2018.13746abstract

Zheng, L., Albano, C. M., Vora, N. M., Mai, F., & Nickerson, J. V.
(2019). The roles bots play in Wikipedia. In Proceedings of the
ACM Conference on Human-Computer Interactions, 3(CSCW),
1–20. https://doi.org/10.1145/3359317
