RESEARCH ARTICLE
Wikinformetrics: Construction and description of
an open Wikipedia knowledge graph data set
for informetric purposes
Wenceslao Arroyo-Machado1, Daniel Torres-Salinas1, and Rodrigo Costas2,3
1Department of Information and Communication Sciences, University of Granada, Granada, Spain
2Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, The Netherlands
3DSI-NRF Centre of Excellence in Scientometrics and Science, Technology and Innovation Policy,
Stellenbosch University, Stellenbosch, South Africa
Keywords: altmetrics, data, informetrics, knowledge graph, metrics, Wikipedia
ABSTRACT
Wikipedia is one of the most visited websites in the world and is also a frequent subject of
scientific research. However, the analytical possibilities of Wikipedia information have not yet
been analyzed considering at the same time both a large volume of pages and attributes. The
main objective of this work is to offer a methodological framework and an open knowledge
graph for the informetric large-scale study of Wikipedia. Features of Wikipedia pages are
compared with those of scientific publications to highlight the (dis)similarities between the two
types of documents. Based on this comparison, different analytical possibilities that Wikipedia
and its various data sources offer are explored, ultimately offering a set of metrics meant
to study Wikipedia from different analytical dimensions. In parallel, a complete dedicated
data set of the English Wikipedia was built (and shared) following a relational model. Finally,
a descriptive case study is carried out on the English Wikipedia data set to illustrate the
analytical potential of the knowledge graph and its metrics.
1. INTRODUCTION
On January 15, 2001, Wikipedia was born under the umbrella of Nupedia, an encyclopedia
project that was based on a peer review system. Due to the lack of agility in publishing articles,
Wikipedia was created as a feeder project, as its objective was to make the creation of new
articles easier before they were reviewed (History of Wikipedia, 2021). Wikipedia combined in
a single project different elements that were new on the web and that made possible for the
first time a universal encyclopedia (Reagle, 2009). It was successful enough to make Nupedia
disappear in 2 years, experiencing steady growth. Since then, Wikipedia has become one
of the most visited websites in the world (https://www.semrush.com/website/top/, accessed
August 4, 2022), with 328 different editions, 285 of them having more than 1,000 articles
(https://meta.wikimedia.org/wiki/List_of_Wikipedias, accessed August 4, 2022). Although this
is the most successful project of the Wikimedia Foundation, there are also other well-known
knowledge projects using wikis as a basis (e.g., the Wiktionary dictionary or the Wikidata
knowledge base).
An open access journal
Citation: Arroyo-Machado, W., Torres-Salinas, D., & Costas, R. (2022). Wikinformetrics: Construction and description of an open Wikipedia knowledge graph data set for informetric purposes. Quantitative Science Studies, 3(4), 931–952. https://doi.org/10.1162/qss_a_00226
DOI: https://doi.org/10.1162/qss_a_00226
Supporting Information: https://doi.org/10.1162/qss_a_00226
Received: 10 August 2022
Accepted: 28 October 2022
Corresponding Author: Wenceslao Arroyo-Machado, wences@ugr.es
Handling Editor: Vincent Larivière
Copyright: © 2022 Wenceslao Arroyo-Machado, Daniel Torres-Salinas, and Rodrigo Costas. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The MIT Press

Wikipedia has been a disruptive innovation, finding in its open nature and decentralized
knowledge development one of its key elements (Olleros, 2008). Not only can everyone access
its contents free of charge, but they can also participate in its construction, in a fully transparent
process. This social construction of knowledge can be seen in the differences found among
language editions of the same Wikipedia pages (Hara & Doney, 2015). Wikipedia contents are
also the result of consensus among editors or Wikipedians. This consensus is built in open
discussions in the Wikipedia talk pages (Maki, Yoder et al., 2017; Yasseri, Sumi et al., 2012), open to
anyone and capturing transnational debates around Wikipedia contents (Kopf, 2020). Some of
these talks and debates have sometimes transcended Wikipedia itself (O’Neil, 2017).
As an online encyclopedia, Wikipedia is not exempt from problems. The reliability of its
content has been much debated, as it is based on contributions from anonymous individuals
(Olleros, 2008). The quality of Wikipedia pages’ content has been studied numerous times from
different perspectives, especially with regard to medical content pages, pointing out limitations,
such as occasional incomplete or imprecise information (Adams, Montgomery et al., 2020;
Candelario, Vazquez et al., 2017; Weiner, Horbacewicz et al., 2019). The importance of
integrating Wikipedia into academia, both in its use and in its development, has been highlighted
(Jemielniak, 2019). Social and cultural inequalities have also been pointed out, such as racial
and gender gaps in its biographies (Adams, Brückner, & Naslund, 2019; Tripodi, 2021).
Wikipedia is not free of bots and vandalism, although they do not constitute a serious threat
to its contents and reliability and Wikipedia’s policy does not allow detrimental use of the
activity of bots or automated accounts. Most of the bots on Wikipedia are publicly identified
(https://en.wikipedia.org/wiki/Special:ListUsers/bot), and they contribute to improving the
content and structure of Wikipedia articles (Arroyo-Machado, Torres-Salinas et al., 2020;
Zheng, Albano et al., 2019). Bots also help to control and reduce problems of vandalism
and trolls, as they eliminate their harmful edits of articles in advance of human editors. There
is also no shortage of proposals for methods based on machine learning to prevent this type of
harmful activity (Martinez-Rico, Martinez-Romo, & Araujo, 2019).
In spite of all of these issues, the general idea is that Wikipedia is a transparent and reliable
source of encyclopedic information (Lageard & Paternotte, 2021), with value of its own to be
the subject of scientific research.
1.1. Wikipedia as Source for Informetric Research
Wikipedia has been researched from different scientific perspectives. One of them is
informetrics, quantitatively studying the contents and activity generated on Wikipedia. Thus,
Wikipedia has been studied from the points of view of scientometrics, bibliometrics, and
webometrics, which are discussed in detail below.
Bibliographic references made in Wikipedia have been studied, particularly since the emer-
gence of the notion of “altmetrics” (Priem, Taraborelli et al., 2010), which considered citations
on Wikipedia to scientific literature as part of its realm1. Wikipedia citations are one of the
most popular sources covered in altmetric aggregators (Ortega, 2020; Zahedi & Costas,
2018) such as Altmetric.com, PlumX, or Crossref Event Data. In addition to altmetric data
providers, there are also several other open data sources providing extensive metadata on
Wikipedia citations (Singh, West, & Colavizza, 2020; Zagorova, Ulloa et al., 2022). Moreover,
other proposals, such as Scholia, enable the exploration of bibliographic data at different
levels through Wikidata (Nielsen, Mietchen, & Willighagen, 2017). Table 1 summarizes
previous studies on Wikipedia bibliographic references.
1 Wikipedia references had already been studied for years before the birth of altmetrics, such as in the citation
analysis by Nielsen (2007) or, in a more qualitative way, that of Mühlhauser and Oser (2008).
Quantitative Science Studies
Table 1. Main studies on the bibliographic references included in Wikipedia pages
| Type | Application | Data | Methodological approach | Language edition | Topic analyzed | Reference |
|---|---|---|---|---|---|---|
| Qualitative | Content and quality analysis | – | Check list | German | Health care | Mühlhauser and Oser (2008) |
| | Content and quality analysis | 33 pages | Scoring system | English | Medication | Candelario et al. (2017) |
| | Analyze the editors’ citation process | – | Survey and interviews | Multilingual | Multidisciplinary | Kaffee and Elsahar (2021) |
| Quantitative | Analyze citation patterns | 30,368 citations | Descriptive statistics | English | Multidisciplinary | Nielsen (2007) |
| | Evaluate the impact of references | 36,191 citations | Descriptive statistics | Multilingual | Multidisciplinary | Kousha and Thelwall (2017) |
| | Reference coverage across languages | 6.8 million pages; 41 million citations | Descriptive statistics | Multilingual | Multidisciplinary | Lewoniewski et al. (2017) |
| | Analyze citation patterns | 229,857 pages; 1,049,025 citations | Descriptive statistics | English | Medicine | Maggio et al. (2017) |
| | Evaluate the impact of references | 982 citations | Descriptive analysis | Multilingual | Multidisciplinary | Pooladian and Borrego (2017) |
| | Rank journals by citations | 11,325 pages; 137,889 citations | Citation analysis | English | Medicine | Jemielniak et al. (2019) |
| | Mapping of knowledge structure | 25,555 pages; 41,655 citations | Cocitation analysis | English | Arts & Humanities | Torres-Salinas et al. (2019) |
| | Mapping of knowledge structure | 193,802 pages; 847,512 citations | Cocitation analysis | English | Multidisciplinary | Arroyo-Machado et al. (2020) |
| | Publications coverage | 3,083 ref. pub. | Topic modeling and regression analysis | English | COVID-19 | Colavizza (2020) |
| | Reviewing citation quality | 1,923,575 pages; 824,298 ref. pub. | Classification modeling | English | Multidisciplinary | Nicholson et al. (2021) |
| | Data set creation | 4 million citations | Text mining | English | Multidisciplinary | Singh et al. (2020) |
| | Data set creation | 6,073,708 pages; 55 million citations | Text mining | English | Multidisciplinary | Zagorova et al. (2022) |
Kaffee and Elsahar (2021) explored the flow that Wikipedians follow to include references
in Wikipedia articles. Kousha and Thelwall (2017), and Pooladian and Borrego (2017)
described the problems of Wikipedia citations in performance evaluation. Nicholson et al.
(2021) studied the quality of cited references in Wikipedia. Lewoniewski et al. (2017) showed
that the different language editions of the same Wikipedia page tended to cite common
sources, with the largest overlap between English and German and some differences depending
on the topics. Colavizza (2020) studied the coverage of the scientific literature on
COVID-19 on Wikipedia, showing that although there was only a small percentage of scientific
literature on COVID-19 in Wikipedia, it was sufficiently representative of its various
topics. Arroyo-Machado et al. (2020) and Torres-Salinas et al. (2019) mapped Wikipedia
cocitation patterns, showing fundamental differences in the use of scientific literature in
Wikipedia compared to the academic realm. Bould, Hladkowicz et al. (2014), Li, Thelwall,
and Mohammadi (2021), and Tomaszewski and MacDonald (2016) studied academic citations
in scientific publications to Wikipedia articles, showing that scientific publications also
use Wikipedia content in their citations, as well as other digital encyclopedias, especially in
areas such as chemistry, physics, or mathematics.
Wikipedia has also been the subject of webometric studies. For example, “Wikiometrics”
were proposed as a rating system to rank universities or journals based on the features of their
Wikipedia pages, also finding positive correlations with existing academic rankings (Katz &
Rokach, 2017). The estimation of the importance of Wikipedia pages based on the PageRank
algorithm was also studied, correlating positively with other page-view-based rankings
(Thalhammer & Rettinger, 2016). Miquel-Ribé and Laniado (2018) showed that the different
language editions of Wikipedia pages reflect cultural differences, as the contents cover local
topics corresponding to different linguistic regions. Other studies focused on metrics about the
attention generated around Wikipedia articles (e.g., likes or page view counts), showing how
they reflect current topics of interest at a particular time/region (Dzogang, Lansdall-Welfare, &
Cristianini, 2016; Mittermeier, Roll et al., 2019; Mittermeier, Correia et al., 2021; Roll,
Mittermeier et al., 2016; Vilain, Larrieu et al., 2017), and even demonstrating the potential
of Wikipedia pages to monitor the spread of diseases (Generous, Fairchild et al., 2014).
There are also numerous studies around Wikipedia’s informetric features. Wilkinson and
Huberman (2007) found a correlation between the quality of Wikipedia articles and their num-
ber of edits. The relationship between the length of Wikipedia articles and their quality has
been highlighted by Blumenstock (2008). Beyond quality, relationships between Wikipedia
metrics have also been explored. Previous studies found positive correlations between views
and the number of edits and editors (Mittermeier et al., 2021), and weak correlations between
the length of Wikipedia pages and the length of their talk pages (Yasseri et al., 2012). Zhang,
Ren, and Kraut (2018) suggested the value of using metrics in specific moments of the life
cycles, for example the number of editors in the first 3 months of an article’s life was not
when it was most strongly related to its future quality.
Although, as shown above, there is abundant scientific literature on Wikipedia and its
informetric applications, most previous studies tended to focus on either limited sets of metrics
(e.g., Nicholson et al. (2021), who were focused on the level of quality of scientific publications
referenced in Wikipedia articles), or limited data sets (e.g., Mittermeier et al. (2021), who
studied a large set of features in a data set of Wikipedia pages of 10,099 bird species across
251 language editions). Therefore, a large-scale study of Wikipedia, covering both a large volume of
pages and attributes, is still missing in the literature. Arguably, a potential reason for this lack
of large-scale studies on Wikipedia is the lack of a conceptual framework that highlights both
the large-scale data available from Wikipedia and the multiple informetric metrics that Wiki-
pedia offers. Such absence has hindered the development of broader research perspectives,
especially regarding the relationship of Wikipedia with science, where a contextualization
of the relationships between the two is still needed.
In this study, we propose such a framework by means of developing an informetric-inspired
knowledge graph, with the aim of enabling similar analytical approaches to those developed
in scientometric research. Such a knowledge graph could work as a complement of other
Wikipedia knowledge graphs such as Wikidata (https://www.wikidata.org/) or DBpedia (https://
www.dbpedia.org/). Wikidata and DBpedia provide exhaustive Wikipedia knowledge graphs
but they are more focused on content and semantic relationships, transforming Wikipedia
pages into entities (e.g., people, places, music bands) and establishing different
computer-understandable relationships between them. Our proposed knowledge graph aims at
characterizing the attention and usage of Wikipedia pages using a relational model and incorporating
activity metadata that are not present in the semantic graphs of Wikidata and DBpedia,
capturing the attention and social engagement, such as views or edits, as well as the presence
of scientific literature in Wikipedia pages.
The paper is structured as follows: First, we describe our main objectives and our alignment
with recent developments in the field of altmetrics. Second, we describe the informetric features
of Wikipedia pages and their similarities with scientific publications, together with the
existing data sources for data collection. Several informetric-inspired metrics (Wikinformetrics)
are proposed for Wikipedia. Third, a Wikipedia knowledge graph, based on the combination
of different Wikipedia data sources, is constructed and presented. Fourth, the data set is
explored in a descriptive way to show the analytical possibilities of the knowledge graph
and the proposed metrics. Finally, we conclude by discussing our findings and proposing
future research avenues.
1.2. Objectives
The main objective of this work is to explore the research value of Wikipedia from an informetric
perspective, ultimately providing a complete Wikipedia knowledge graph. More
specifically, three different objectives are targeted:
1. Theoretical objective: To establish a framework for Wikipedia analytics, by exploring
the informetric features of Wikipedia pages (works, categories, sources, data
collection, etc.) and proposing a set of informetric-inspired metrics (Wikinformetrics)
for their quantitative study. This objective will help us to map the analytical possibilities
of Wikipedia as a scientific object.
2. Instrumental objective: To create a large open Wikipedia knowledge graph. Once we
are familiar with the main features of Wikipedia, we will construct a dedicated knowledge
graph focused on the English-language edition of Wikipedia with the main information
and data relationships coming from combining different data sources.
3. Applied objective: To conduct a descriptive quantitative study of Wikipedia metrics
based on the knowledge graph data set, and to explore the proposed metrics and the
different types of attention they capture.
This work and its objectives align with novel developments on social media metrics
(Díaz-Faes, Bowman, & Costas, 2019; Wouters, Zahedi, & Costas, 2019), contributing to the exploration
of different science-society interactions that can be captured on Wikipedia (Costas, de
Rijcke, & Marres, 2020). Our ambition is to frame Wikipedia as a data source with multiple
informetric research possibilities. In addition, a dedicated data set of the English edition of
Wikipedia is constructed for informetric purposes and is freely available at Zenodo
(https://doi.org/10.5281/zenodo.6346899). R and Python were used together for its elaboration, with
scripts available on GitHub (https://doi.org/10.5281/zenodo.6959428). Many of the results
presented here are novel, as to the best of our knowledge there is no previous literature that
has explored the same large set of Wikipedia features and with the same large-scale perspective
as in this study. This work is intended to be useful for a wide range of researchers, such as
librarians, informetricians, sociologists, and data scientists.
2. WIKIPEDIA FROM AN INFORMETRIC PERSPECTIVE
2.1. Analogy Between Wikipedia Pages and Scientific Publications
In Wikipedia, the key components are the individual pages. Wikipedia pages are not only used
for the publication of encyclopedia articles but also other numerous typologies of pages, such
as categories, users, and talk pages, as well as relationships among them. The different types of
pages are given by a pre-established namespace (a type of page with special features identifiable
through a prefix included in the title). Wikipedia currently has 12 namespaces in use
(article, user, wikipedia, file, mediawiki, template, help, category, portal, draft, timedtext,
and module), each with an associated “talk namespace” (or “talk page”) in which discussions
are held around the contents and edits of the page, and two virtual namespaces (special and
media).
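The namespace structure described above can be sketched as a small lookup. This is only an illustration: the numeric identifiers follow the standard MediaWiki numbering used by the English Wikipedia, which the text itself does not list, and the helper names (`talk_namespace`, `describe_namespace`) are hypothetical.

```python
# Subject namespaces named in the text, keyed by MediaWiki namespace number.
# The numbers are an assumption taken from MediaWiki's usual numbering.
SUBJECT_NAMESPACES = {
    0: "article",
    2: "user",
    4: "wikipedia",
    6: "file",
    8: "mediawiki",
    10: "template",
    12: "help",
    14: "category",
    100: "portal",
    118: "draft",
    710: "timedtext",
    828: "module",
}

# Virtual namespaces have negative numbers and no talk counterpart.
VIRTUAL_NAMESPACES = {-1: "special", -2: "media"}


def talk_namespace(ns_id: int) -> int:
    """Return the talk namespace paired with a subject namespace (subject + 1)."""
    if ns_id not in SUBJECT_NAMESPACES:
        raise ValueError(f"{ns_id} is not a subject namespace")
    return ns_id + 1


def describe_namespace(ns_id: int) -> str:
    """Give a readable label for any subject, talk, or virtual namespace."""
    if ns_id in VIRTUAL_NAMESPACES:
        return f"{VIRTUAL_NAMESPACES[ns_id]} (virtual)"
    if ns_id in SUBJECT_NAMESPACES:
        return SUBJECT_NAMESPACES[ns_id]
    if ns_id - 1 in SUBJECT_NAMESPACES:
        return f"{SUBJECT_NAMESPACES[ns_id - 1]} talk"
    raise ValueError(f"unknown namespace {ns_id}")
```

Under this numbering, the "12 + 12" typology of Table 2 corresponds to the 12 subject namespaces plus their 12 talk namespaces.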
There are several features of Wikipedia pages, in particular namespace article pages, for
which it is possible to establish an equivalence with those of a scientific publication. First, they
have a title and an associated page identifier (Wikipedia page ID). They may have one or more
authors, it being possible to identify the first person who created it, when, and those who
have made a greater contribution or whose edition has been revoked. The contents may
include multimedia files, links to external resources, and bibliographic references, among
others. There are also internal links that enable Wikipedia pages to connect to each other, just
like citations among scientific publications. Finally, Wikipedia pages can be classified with
categories according to their contents to carry out their thematic classification, similar to the keywords
and classifications applied to scientific publications. Most of these elements can be seen as
metadata to be treated in the study of Wikipedia pages. However, there are several differences
between Wikipedia pages and scientific publications that cannot be ignored (Table 2). The
most important is that Wikipedia pages are a living resource and not static documents. The
access and editing of the contents also differ between Wikipedia pages and scientific publications
because Wikipedia pages do not focus on a specific audience (e.g., scientific publications
mostly focus on academic audiences), but anyone can take an active part in editing
them. It should be also noted that some pages may be temporarily limited or protected for
editing (Hill & Shaw, 2015).
The living nature of Wikipedia pages puts them at the center of a complex system
(Ladyman, Lambert, & Wiesner, 2013), whose main elements are represented in Figure 1.
Many of the elements of the pages are static or unalterable, such as the creation date or page
ID, while others are in constant evolution, especially the contents themselves. This makes it
difficult to study certain elements in Wikipedia (Détienne, Baker et al., 2016), as Wikipedia
content is volatile and authorship and contribution roles can be diluted in contrast to the
higher stability of scientific publications. Furthermore, the same page, especially encyclopedic
articles, may have parallel versions in different language editions of Wikipedia, which may
vary in content. This scenario becomes even more complex when taking into account that not
only human users are involved in the development of Wikipedia pages but also bots, thus
making the interactions that can occur more complex to analyze (Tsvetkova, García-Gavilanes
et al., 2017).

Table 2. Comparison of features between Wikipedia pages and scientific publications
| Feature | Wikipedia element description | Wikipedia page | Scientific publication |
|---|---|---|---|
| State | Document state condition | Living | Static |
| ID | Document identification number | Page ID | DOI, ISBN, URI … |
| Name | Title of the document | Title | Title |
| Type | Document typologies | Namespace (12 + 12 types) | Paper, proceeding, letter … |
| Creation | Date from which it is available | First edition date | Publication date |
| Authorship | Responsible for the work | Wikipedians | Authors |
| Content | Type of content | Structured text | Structured text |
| Language | Language of the resource | Edition dependent | Document dependent |
| Discussion | Comments on the contents | Talk | Peer review |
| Description | Work summary | Short description | Abstract |
| Tags | Terms describing the content | Categories | Keywords |
| Media | Audiovisual resources includable | Images, audios, and videos | Images, audios, and videos |
| Internal links | Links to the related resources | Internal links | Citations |
| Format | Standardized structure and content | Manual of style* | Format guidelines |
| Bibliography | References of cited resources | References | References |
| Access | Access model | Open | Closed/Open |
| Audience | Document target audience | General | Specialized |
* The English Wikipedia has its own manual of style https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style.

Figure 1. Diagram of the main elements involved in creating and editing Wikipedia articles.
2.2. Categorization
Wikipedia pages are not thematically organized according to a controlled language-based
classification, such as Britannica’s subject organization system. Instead, Wikipedia pages have
a category system that works like a folksonomy (Minguillón, Lerga et al., 2017). Wikipedians
are free to tag each page under one or more existing categories or to create new ones.
Numerous studies have approached them, such as by studying their semantic domain
(Aghaebrahimian, Stauder, & Ustaszewski, 2020; Heist & Paulheim, 2019). However, the main
problem of this folksonomy is the large number of individual categories and their unstructured
(i.e., without a clear hierarchical system) relations at different levels, introducing a lot of noise
and making it difficult to have a general thematic view of Wikipedia (Boldi & Monti, 2016;
Kittur, Chi, & Suh, 2009). In addition, there are also hidden categories, related to the maintenance
or management of the page.
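To make the difficulty with unstructured category relations concrete, the following minimal sketch walks upward from a page's categories through a toy parent-category mapping. Everything here is an assumption for illustration: the category names and the `parents` mapping are invented, and the depth limit and revisit guard are just one possible way to keep a traversal bounded when no clean hierarchy is guaranteed.

```python
from collections import deque


def ancestor_categories(start, parents, hidden=frozenset(), max_depth=3):
    """Breadth-first walk from a page's categories to their parent categories.

    start     -- categories tagged on the page
    parents   -- dict mapping a category to its parent categories
    hidden    -- maintenance categories to skip entirely
    max_depth -- how many levels to climb (level 1 = the page's own categories)

    Returns a dict {category: level at which it was first reached}.
    """
    found = {}
    queue = deque((cat, 1) for cat in start if cat not in hidden)
    while queue:
        cat, depth = queue.popleft()
        if cat in found:  # already reached by a shorter path, or part of a loop
            continue
        found[cat] = depth
        if depth < max_depth:
            for parent in parents.get(cat, ()):
                if parent not in hidden and parent not in found:
                    queue.append((parent, depth + 1))
    return found


# Hypothetical toy data; note the loop between the last two categories.
toy_parents = {
    "Bibliometrics": ["Library science", "Metrics"],
    "Library science": ["Information science"],
    "Information science": ["Library science"],
}
toy_hidden = frozenset({"Articles with short description"})
toy_page_categories = ["Bibliometrics", "Articles with short description"]

levels = ancestor_categories(toy_page_categories, toy_parents, toy_hidden)
```

Even in this tiny example, the set of topics attached to a page grows with each level climbed, which hints at why a concise thematic view is hard to obtain from categories alone.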
Besides the categories, Wikipedia has other options for accessing and browsing its contents
by topics (https://en.wikipedia.org/wiki/Wikipedia:Contents). On the one hand, it offers different
curated content lists (e.g., the “list of articles every Wikipedia should have” or the list of
“vital articles”). There are other lists that offer collections of articles that respond to the same
topic, and even “lists of lists.” Similarly, there are “portals,” which imitate the classic web portals
and are organized in sections that group the main contents of a topic, not only the articles
(e.g., the “Science” portal or the “History of science” subportal). WikiProjects, communities of
Wikipedians aimed at improving Wikipedia content on a specific topic and which have their
own page from which they coordinate their activities, can also work as a classification
approach due to their thematic orientation (e.g., “Anthropology” or “The Beatles”). There
are also third-party classification systems, such as the “Library of Congress Classification” or
the “Universal Decimal Classification.” Finally, external to Wikipedia, but within the Wikimedia
ecosystem, there are other types of classification solutions, such as Wikidata taxonomies
(https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy) or ORES
(https://www.mediawiki.org/wiki/ORES), that can be used to identify Wikipedia topics using machine
learning techniques. The main limitation with all of the above is that there is no central clas-
sification system that covers all Wikipedia pages, and that at the same time it is concise and
easy to manage, particularly in terms of the number of subjects and the hierarchical relation-
ships among them. The lack of such central classification in Wikipedia is a major hindrance
for the large-scale epistemic study of Wikipedia.
2.3. Content Control
Each Wikipedia page has a discussion space called “talk pages,” where Wikipedians discuss
with other Wikipedians. Talk pages aim at improving the quality and reliability of the articles.
Discussions in talk pages are public (Ferschke, Gurevych, & Chebotar, 2012), resembling the
model of open peer review of scientific publications (Black, 2008), and representing a form of
public review in contrast to the traditional academic blind peer review system (Cummings,
2020). Wikipedia also includes formal peer review approaches in which Wikipedians request
assistance from experts on given topics (https://en.wikipedia.org/wiki/Wikipedia:Peer_review).
Despite discrepancies and differences about what open peer review means and the different
models proposed (Ross-Hellauer, 2017), the three basic principles (open identities, reports, and
participation) are clearly recognizable in Wikipedia (Table S2 in the Supplementary material).
Wikipedians are both authors and reviewers of content and their reports are available as comments
on the talk pages, all of which are always open and identifiable. Interestingly, Wikipedia-inspired
reviewing approaches have even been proposed for scholarly publishing, such as a
postpublication correction system and readers’ comments (Xiao & Askin, 2014).

Table 3. General quality grading scheme of WikiProject articles
| Class | Description | Assignment | Badge |
|---|---|---|---|
| Featured article | The best possible content on Wikipedia, no need for improvement | Review | Yes |
| Featured list | The best possible list on Wikipedia, no need for improvement | Review | Yes |
| A | Fully addresses the subject and requires only minor improvements | Review | No |
| Good article | It satisfies Wikipedia’s main criteria and is close to a professional article | Review | Yes |
| B | The content is almost complete and has no major problems | Free | No |
| C | The content is considerable, but has significant problems | Free | No |
| Start | It includes significant content, but is still in development | Free | No |
| Stub | The content is very short and requires substantial work | Free | No |
| List | Content displayed in a list linking to Wikipedia articles on a specific topic | Free | No |
Wikipedia also includes a quality control system of the content of the different articles that
comes from WikiProjects. It is grounded on an evaluation system to classify pages in higher or
lower levels of content quality, with standard grades that are listed on the respective talk page.
Although there is a general scheme (Table 3), it is possible that some WikiProjects do not
include all grades or that there may be differences in their application. Similarly, the pages
are also classified according to their importance within the topic (Top, High, Mid, and
Low). Wikipedians can set any level of quality and importance on a given page, also
modifying them. When there are disagreements among Wikipedians about the quality level
of a page, this leads to a discussion and a search for consensus around the quality level of
the page. However, at the highest levels of quality (Featured Articles and Good Articles) the
assignment requires a stricter review process, including the presentation of a candidacy and an
evaluation by independent Wikipedians according to pre-established criteria. These two levels
also have their own badges on the article page.
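For quantitative work, the grades of the general scheme in Table 3 can be treated as an ordinal variable. The sketch below is one hedged way to do that: it assumes the order in which Table 3 lists the article classes reflects decreasing quality, leaves the list classes out of the ordinal scale, and uses hypothetical helper names (`quality_score`, `highest_grade`).

```python
# Article classes from lowest to highest, following the order of Table 3.
# Treating this order as an ordinal scale is an assumption of this sketch.
ARTICLE_SCALE = ["Stub", "Start", "C", "B", "Good article", "A", "Featured article"]

# Classes whose assignment goes through a review, and classes shown with a badge,
# as given in the Assignment and Badge columns of Table 3.
REVIEWED = {"Featured article", "Featured list", "A", "Good article"}
BADGED = {"Featured article", "Featured list", "Good article"}


def quality_score(grade: str) -> int:
    """Map an article class to an integer rank (0 = Stub)."""
    return ARTICLE_SCALE.index(grade)  # raises ValueError for list classes


def highest_grade(grades):
    """Pick the highest class among several grades given to the same page."""
    return max(grades, key=quality_score)
```

For instance, `highest_grade(["C", "Good article", "Start"])` returns `"Good article"`, which under Table 3 is a reviewed class that also carries a badge.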
2.4. Sources
A fundamental aspect of Wikipedia lies in the system of links that allows its pages to be con-
nected among them, making Wikipedia unique in this sense with regard to other encyclopedic
systems (Reagle & Koerner, 2020). These internal links have been studied, showing both the
semantic relationships they can establish and other potential utilities (Consonni, Laniado, &
Montresor, 2019; Presutti, Consoli et al., 2014), as well as the possibility of calculating network
indicators such as PageRank based on them (Thalhammer & Rettinger, 2016). There are,
however, important issues to consider when working with Wikipedia page links:
1. The links may be redirects; that is, old page versions that automatically redirect to the
new versions when accessing them.
2. There are lists of links to other Wikipedia pages. Most of the lists include pages that are
conceptually related to each other and share a clear subject matter. However, there are
specific lists such as disambiguation pages, which are aimed at reducing the ambiguity
of some terms (e.g., “citation” or “granada”), and therefore the links in these lists are not
necessarily thematically related.
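Both issues can be handled before a link network is built. The following is a minimal sketch in Python, assuming that links are available as (source, target) title pairs and that redirects and disambiguation pages have already been identified. The function name `clean_links`, the input structures, and the redirect title "Cite analysis" are hypothetical illustrations and are not taken from the data set.

```python
def clean_links(links, redirects, disambiguation):
    """Resolve redirect targets and drop links that touch disambiguation pages.

    links: iterable of (source_title, target_title) pairs.
    redirects: dict mapping a redirect title to the title it points to.
    disambiguation: set of titles of disambiguation pages.
    """
    cleaned = set()
    for source, target in links:
        # Follow redirect chains, guarding against accidental cycles.
        seen = set()
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        # Links from or to disambiguation pages are not thematic, so skip them.
        if source in disambiguation or target in disambiguation:
            continue
        # Ignore self-links that may appear once redirects are resolved.
        if source != target:
            cleaned.add((source, target))
    return sorted(cleaned)


# Toy input; "Cite analysis" is a made-up redirect title.
links = [
    ("Altmetrics", "Citation analysis"),
    ("Altmetrics", "Cite analysis"),
    ("Citation (disambiguation)", "Citation analysis"),
    ("Altmetrics", "Granada (disambiguation)"),
]
redirects = {"Cite analysis": "Citation analysis"}
disambiguation = {"Citation (disambiguation)", "Granada (disambiguation)"}

print(clean_links(links, redirects, disambiguation))
```

Resolving redirects first means that two links pointing to the old and new titles of the same page collapse into a single edge, which matters for any indicator computed on the network.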
Another fundamental source for Wikipedia is its bibliographic references. Wikipedia rec-
ommends the use of bibliographic references to support its contents and it is an essential
requirement for a page to achieve the best quality status (Featured article). These references
are the same as those made in scientific publications, in both cases serving as a support for an
idea. However, it is necessary to consider that citations in Wikipedia and citations in scientific
publications are governed by different norms and dynamics. Figure 2 schematizes the main
differences between references in scientific publications and references in Wikipedia.
Other relevant particularities of Wikipedia references include:
• Unlike scientific publications, in which the identity of the citers (i.e., those including the
references in the scientific publication) is clear and invariable, in Wikipedia this is more
complex (given the live nature of Wikipedia articles) and not always possible to establish. However,
there are some methodological proposals for this purpose (Zagorova et al., 2022).
Figure 2. Differences between traditional citations and Wikipedia mentions of scientific publications.
• Wikipedia citation counts can be distorted by the translations of articles into different
languages, because references can easily be transferred across the different language
versions of the same article, thus distorting the meaning and value of Wikipedia
citation counts. This limitation does not occur in scientific publications, as only one
language version of a given publication is usually considered in the counting of citations.
• There are certain Wikipedia pages that function as large bibliographic indexes, bringing
together the most relevant literature on a specific topic (e.g., research annuals or
bibliographies).
• There are also templates (special Wikipedia pages that are embedded within other pages
to facilitate the repetition of information), which are sometimes used to generate
pre-established lists of references that are quickly inserted and replicated into numerous
Wikipedia pages that are strongly related. This happened, for example, with the listing
of lunar crater references (https://en.wikipedia.org/wiki/Wikipedia:Templates_for_discussion/Log/2014_June_8#Template:Lunar_crater_references).
2.5. Data Gathering
There are numerous data sources, and the choice of one or the other depends mostly on the
type and volume of data required. In some cases, there are even multiple ways of accessing the
same data. These have been summarized in Table 4, and are described in detail in Section S3 of
the Supplementary material. In fact, Wikimedia has a Research community
(https://meta.wikimedia.org/wiki/Research) that gathers different resources to help and guide
those who want to access the data of the Wikimedia projects and that lists the different projects
related to it.
The two main sources are dumps and APIs. One of the main problems when working with
Wikipedia data dumps is their size, especially when dealing with the more complete editions
(e.g., the metadata of the revisions of the English Wikipedia pages as of June 2022 consists of
27 files of more than 2 GB each), so accessing a subset of data requires a lot of time and
effort. When using Wikipedia APIs, metadata can be accessed on demand, but the
retrieval process is very laborious, especially when large volumes of data are required. Other
sources are characterized by offering already preprocessed data, such as the total number of
edits or page views, which can be consulted from XTools.
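To illustrate on-demand access, the sketch below builds a request URL for the Wikimedia Pageviews REST API (per-article endpoint) and sums the views in a response. It is only an illustration and not the retrieval procedure used for the data set. The helper names `pageviews_url` and `total_views` are hypothetical, the sample payload contains made-up numbers, and no network request is made here.

```python
from urllib.parse import quote

# Base path of the per-article endpoint of the Wikimedia Pageviews REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia",
                  access="all-access", agent="all-agents", granularity="daily"):
    """Build the request URL for one article and a YYYYMMDD date range."""
    # Titles use underscores instead of spaces and must be URL-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"


def total_views(payload):
    """Sum the 'views' field over the 'items' list of a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


url = pageviews_url("Elizabeth II", "20210401", "20210630")
print(url)

# Hand-written payload in the shape returned by the endpoint (made-up numbers).
sample_payload = {"items": [{"timestamp": "2021040100", "views": 10},
                            {"timestamp": "2021040200", "views": 20}]}
print(total_views(sample_payload))
```

One request is needed per article and date range, which is why this route becomes laborious for millions of pages and why dumps are preferable at that scale.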
In this paper, we extracted and developed a full Wikipedia knowledge graph with the ambition
of facilitating future studies of the English Wikipedia, reducing the time and effort that
researchers may need in collecting and connecting all the different data sources.
2.6. Wikinformetrics
Finally, there are multiple metrics that can be extracted from the sources presented before and
that enable the informetric study of Wikipedia pages. Based on previous studies and the above
exploration of the informetric characteristics of Wikipedia, several metrics have been selected
(Table 5). Each of them is of interest for measuring a particular dimension of the pages. For
example, the number of views can be seen as a measure of the impact and outreach of a
particular page, and while the numbers of edits and editors reflect the volume of activity, the
numbers of talks and talkers are representative of the discussions that take place around these
pages. These are not the only metrics that can be obtained from Wikipedia, but they can be
considered to capture some of the most important analytical aspects of Wikipedia pages (e.g.,
Table 4. Summary of Wikipedia data sources by format, update frequency, data quantity, type, and challenges

Source | Content | Access | Format | Update frequency | Data quantity* | Type** | Main challenge***
--- | --- | --- | --- | --- | --- | --- | ---
Wikimedia Dumps | Metadata, page content, and relationships | Offline | XML, SQL | Once/twice a month | Big data | General | Data processing
MediaWiki and Wikimedia APIs | Metadata, page content, relationships, and statistics | Online | JSON, WDDX, XML, YAML, PHP | Real time | Small data | General | Data recovery
Wiki Replicas | Metadata, page content, and relationships | Online | SQL | Near-real time | Small data | General | Data recovery
Event Streams | Real-time logs | Online | SSE, JSON | Real time | – | Specific | Data recovery
Analytics dumps | Statistics on page views and activity | Offline | TSV | Monthly | Big data | Specific | Data processing
WikiStats | Statistics on page views, content, and activity | Online | JSON/CSV | Monthly | Small data | Specific | Data recovery
DBpedia | Contents and semantic relationships | Both | RDF/XML, Turtle, N-Triples, SPARQL endpoint | Live/monthly | – | General | Data recovery
XTools | Statistics on page views, content, and activity | Online | JSON | Real time | Small data | Specific | Data recovery
Repositories | Dedicated Wikipedia data sets | Offline | – | – | – | – | –
Altmetric aggregators | Wikipedia references to publications | Online | CSV/JSON | Daily | – | Specific | Data processing

* Volume of data to be retrieved and processed.
** Data from Wikipedia are included to address different problems or are of a specific nature.
*** Task that will require more effort when using the data source.
Table 5. Description of the metrics obtained for Wikipedia articles by analytical dimension

Metric | Analytical dimension | Description
--- | --- | ---
Editors | Activity | Number of unique editors that have edited a Wikipedia article
Edits | Activity | Total number of edits that a Wikipedia article has
Linked | Connectivity | Number of Wikipedia articles in which the article is linked
Links | Connectivity | Number of internal links from a Wikipedia article to others
Age | Description | Years that have passed from the creation of the page to the date of data collection
Length | Description | Length in bytes of the page
Talkers | Discussion | Number of unique editors that have edited a Wikipedia article’s talk page
Talks | Discussion | Total number of edits that the talk page of a Wikipedia article has
Views | Outreach | Number of daily views of a Wikipedia page
References | Support | Number of elements listed in the references
Pub. referenced | Support | Number of publications referenced
URLs | Support | Number of external links that a Wikipedia article includes
contributions, content development, links and interactions, and impact), being also easy to
interpret in an informetric framework.
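As an illustration of how the activity and discussion metrics relate to the underlying revision data, the following minimal sketch derives edits, editors, talks, and talkers from a list of revision records. The record layout (article title, a flag marking talk-page revisions, and an editor name) and the function name `activity_metrics` are assumptions made for this example and do not reflect the structure of the dumps or of the shared scripts.

```python
from collections import defaultdict


def activity_metrics(revisions):
    """Count edits/editors and talks/talkers per article.

    revisions: iterable of (article_title, is_talk, editor) tuples, where
    is_talk is True when the revision belongs to the article's talk page.
    """
    counts = defaultdict(lambda: {"edits": 0, "talks": 0})
    people = defaultdict(lambda: {"editors": set(), "talkers": set()})
    for title, is_talk, editor in revisions:
        if is_talk:
            counts[title]["talks"] += 1
            people[title]["talkers"].add(editor)
        else:
            counts[title]["edits"] += 1
            people[title]["editors"].add(editor)
    # Unique contributors are the sizes of the collected sets.
    return {
        title: {
            "edits": counts[title]["edits"],
            "editors": len(people[title]["editors"]),
            "talks": counts[title]["talks"],
            "talkers": len(people[title]["talkers"]),
        }
        for title in counts
    }


# Toy revision history with made-up user names.
revisions = [
    ("Homeopathy", False, "UserA"),
    ("Homeopathy", False, "UserB"),
    ("Homeopathy", False, "UserA"),
    ("Homeopathy", True, "UserC"),
    ("Homeopathy", True, "UserC"),
]
print(activity_metrics(revisions))
```

Counting totals and unique contributors separately is what distinguishes edits from editors and talks from talkers in Table 5.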
3. WIKIPEDIA KNOWLEDGE GRAPH
Using the different data sources described above, a knowledge graph of the English edition of
Wikipedia has been constructed for informetric purposes and freely shared on Zenodo (https://
doi.org/10.5281/zenodo.6346899). The English edition of Wikipedia has been chosen
because it is the largest one and has an international scope. For its construction, data from
Wikimedia and analytics dumps were used, as well as data shared in repositories, specifically
the data set of Singh et al. (2020), in which they share references made in Wikipedia articles.
The data included in this data set cover all English Wikipedia activity until July 2021, except for
page views, which are from April 1, 2021 to June 30, 2021, and bibliographic reference data,
until May 2020. R and Python have been used together, with the scripts available on GitHub
(https://doi.org/10.5281/zenodo.6959428). The construction of this data set is described in
Section S1 in the Supplementary material. The resulting data set consists of nine files con-
nected to each other by a relational structure summarized in Figure 3.
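Because the files follow a relational model, metrics can be obtained by joining them on shared identifiers. The sketch below shows the idea with two small in-memory CSV tables. The column names (`page_id`, `title`, `doi`), the placeholder DOIs, and the function name `publications_per_page` are hypothetical; the actual file names and fields are those documented with the data set and may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical page table and page-to-publication table (placeholder DOIs).
PAGES_CSV = """page_id,title
1,Alzheimer's disease
2,World War II
3,Main Page
"""
REFERENCES_CSV = """page_id,doi
1,10.1000/example.1
1,10.1000/example.2
2,10.1000/example.3
1,10.1000/example.1
"""


def publications_per_page(pages_file, references_file):
    """Join pages with their references and count distinct publications."""
    titles = {row["page_id"]: row["title"] for row in csv.DictReader(pages_file)}
    # A set removes repeated citations of the same publication on a page.
    unique_pairs = {(row["page_id"], row["doi"])
                    for row in csv.DictReader(references_file)}
    counts = Counter(page_id for page_id, _ in unique_pairs)
    # Pages without references are kept with a count of zero.
    return {title: counts.get(page_id, 0) for page_id, title in titles.items()}


result = publications_per_page(io.StringIO(PAGES_CSV), io.StringIO(REFERENCES_CSV))
print(result)
```

Whether repeated citations of the same publication should count once or several times is a design choice; this sketch counts distinct publications per page.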
This knowledge graph offers numerous possibilities for the informetric study of Wikipedia,
making it possible to study new relationships (and interactions) between science and this
social medium (e.g., the attention on Wikipedia to academic topics, the presence of scientific
literature on popular Wikipedia pages, or the use of scientific literature in Wikipedia pages
with large discussions in their Talk pages). This is the case of the work of Arroyo-Machado,
Díaz-Faes, and Costas (2022), who found a positive relationship between the research
performance of universities and their social attention on Wikipedia, using data from this
data set.
Although the generation of new versions of the knowledge graph cannot be guaranteed by
the authors of this paper, the way in which its creation is detailed and the shared scripts ensure
Figure 3. Diagram of files and relationships of the Wikipedia knowledge graph data set.
that new versions can be generated. This is also of importance for the generation of new
knowledge graphs in other language editions of Wikipedia, as the data used as a basis are also
available in other languages. The only limitation in this respect is in the reference data, as they
come from a specific data set (Singh et al., 2020). However, those responsible have also shared
the tools used to obtain the references and there are other alternatives such as Zagorova et al.
(2022) or altmetric data aggregators.
4. CASE STUDY: INFORMETRIC ANALYSIS OF THE ENGLISH WIKIPEDIA
As a case study, the knowledge graph of the English Wikipedia is used to calculate and study
the proposed metrics in a broad manner. The analysis was performed in Python and the code is
available at GitHub (https://doi.org/10.5281/zenodo.6958972).
4.1. Wikipedia Metrics and Articles’ Content
There are 53,710,529 pages in the English Wikipedia, considering all namespaces as well as
pages that are redirects; however, this number is reduced to 6,328,134 pages when the
focus is on articles that are not redirects. These represent just 11.79% of the overall English
Wikipedia. The metrics proposed in Figure 4 have been obtained for all of them.
Figure 4 shows the descriptive statistics of the main variables, differentiating between
total Wikipedia articles and those classified based on their quality; 5,522,676 articles
(87.27% of the total) are associated with a WikiProject and with some quality level. Articles
with different quality levels have been considered in all of them. It is noticeable that in all
metrics, Featured articles have the highest values. The case of class B articles is noteworthy,
as they not only show few differences with respect to the Good and A-Class articles, being
Figure 4. Average of Wikipedia article metrics differentiating by the quality assigned from a project.
also greater in number of articles than both, but in aspects such as views they are positioned
above them.
There are important differences in the number of referenced publications, going from an
average of 14.27 publications in Featured articles to 8.52 in A and 5.84 in Good articles, while
the Start and Stub articles cite on average less than one publication. This reflects compliance
with English Wikipedia’s criteria for establishing the quality level of articles. The general
criteria do not make explicit the need for a greater number of references to increase the level of
quality, among other aspects, but they do require an increase in “reliable sources,” so that citations to
publications can serve as a proxy for this. Likewise, it also corroborates previous findings of a
relationship between the level of quality and the number of edits (Wilkinson & Huberman,
2007), and the length of articles (Blumenstock, 2008).
Most Wikipedia pages are not of recent creation (Figure 5A), with a median of 11 years. In
some of the metrics, such as edits and talks, extreme outliers are found. This can be seen in the
fact that their average values are 102 and 9.19, respectively, above the median and third
quartile values. This situation is much more pronounced in the case of views, with an average of
3,346.59. In addition, the number of referenced elements has a median of 1 and an average
of 4.6. When comparing the links with the linked ones, we find that Wikipedia pages link more
than they are linked, because the median for the former is 36 and for the latter 15.
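The diagnostic used above, a mean that exceeds the third quartile, is a simple signal of a heavily right-skewed distribution. The sketch below reproduces the effect on a toy list of edit counts using Python's standard `statistics` module; the numbers are invented for illustration and are not taken from the data set, and `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(values):
    """Return the mean, median, and third quartile of a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": statistics.mean(values), "median": median, "q3": q3}


# Nine ordinary articles and one extreme outlier (made-up counts).
edits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
summary = summarize(edits)
print(summary)

# A single extreme page is enough to push the mean above the third quartile.
print(summary["mean"] > summary["q3"])
```

This is also why medians and rank-based statistics are more informative than means for metrics such as edits, talks, and views.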
The correlations between these variables are all positive (Figure 5B). The strongest correla-
tion is between talkers and talks (rs = 0.97), followed by another analogous relationship such
as that between editors and edits (rs = 0.94). When considering pairs of metrics of different
nature, the strongest correlation is between edits and views (rs = 0.74), followed by that of
Figure 5. A: Boxplots of the main metrics for Wikipedia articles excluding outliers from the figures and marking the mean with a cross
symbol. B: Spearman’s rho correlations between the main metrics for English Wikipedia articles.
editors and views (rs = 0.72), which suggests a relationship between the popularity of Wikipedia
pages in terms of visits and their number of edits. Interestingly, a lower correlation was
found between views and both talks and talkers (rs = 0.48), suggesting that discussions around
Wikipedia pages are not necessarily related to higher numbers of views. Another moderate
correlation can be found between the length of an article and its views (rs = 0.6), which
may indicate that the larger the article, the more attention it receives or that the more attention
it receives, the more it grows in length. There are other moderate correlations, such as between
the length and the number of references (rs = 0.56) and URLs (rs = 0.65), but which are to be
expected as the two elements directly interfere with each other. The number of referenced
publications is the metric most weakly correlated, there being for example a weak correlation
between this and views (rs = 0.24) or talks (rs = 0.2). Our results confirm the same type of
relationships reported in previous research (Mittermeier et al., 2021), albeit this time consid-
ering the entire population of English language Wikipedia articles.
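For readers less familiar with the statistic, Spearman's rho (rs) is the Pearson correlation computed on ranks, which makes it robust to the extreme outliers present in metrics such as views and edits. The following is a minimal pure-Python sketch, assuming average ranks for ties; it is not the code used in the case study, and the function names and toy values are illustrative only.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation between the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy values: the relationship is monotonic but far from linear.
edits = [10, 20, 30, 40, 50]
views = [1, 4, 9, 16, 1000]
print(round(spearman_rho(edits, views), 6))
```

Because only the ordering matters, the outlier in the toy `views` list does not affect the coefficient, whereas it would dominate a Pearson correlation on the raw values.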
4.2. Different Types of Attention Captured on Wikipedia
The results of this analysis can also be accessed interactively and in greater detail via the R
Shiny app: https://wenceslao-arroyo-machado.shinyapps.io/wikinformetrics/.
A review of Wikipedia’s main pages based on different metrics reveals its potential to capture
content that responds to different types of attention (Table S4 in the Supplementary
material). The page views make it possible to identify those topics that capture the most
attention of society in a given period (page views are limited to a period of 3 months in our data
set). Thus, in our data set the pages of Prince Philip, Duke of Edinburgh (10,860,553 views) and
Elizabeth II (9,900,275), or Mare of Easttown (5,995,513) rank among the most visited in the
English-language Wikipedia. Also, five of the 20 most viewed pages are series or movies
released in the period analyzed, which also highlights that content related to entertainment
occupies a relevant position in Wikipedia. Sports also receive many views and reflect current
events, as evidenced by the UEFA Euro 2020 page (12,100,455 views), the second most
viewed, just after the Main Page (554,030,839). There is a clear presence of articles that
respond to general interests, such as the Bible (11,048,609) or Cleopatra (9,516,340) pages.
This may indicate that some topics raise general interest and may not be time related.
The number of talks of Wikipedia articles is often used in conjunction with other variables
in the construction of models for controversy detection (Jang, Foley et al., 2016). This suggests
that this metric may be useful for detecting such controversial content in a simple way. Among
the 20 pages with the highest number of talks, those of political figures, religious topics, and
scientific controversies stand out. The intense discussion that takes place in some of them, such as
Donald Trump (62,944), and the vandalism and presence of trolls, as in Gamergate controversy
(27,185), have caused the editing of these pages to be restricted. In fact, there are some articles
clearly related to controversial or sensitive issues, such as Climate change (40,837) and
Homeopathy (25,898). In this regard, Wikipedia itself offers a page with a curated list of controversial
articles (https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues), and 13 of the
20 pages were listed there as of 4 July 2021.
Finally, based on the volume of referenced publications, that is, all materials with an
associated identifier (DOI, ISBN, arXiv ID, etc.), it is also possible to identify the Wikipedia pages
that cite more scientific publications. However, in this case there are many research annuals
and bibliographic pages present among the top 20 articles, such as 2018 in paleontology with
569 referenced publications. These lists have been eliminated to select the top 20 articles with
encyclopedic content. In these articles there is a clear presence of scientific content, especially
in medicine, such as Feminizing hormone therapy (329) and Alzheimer’s disease (277). However,
there are also articles related to history, such as History of Lisbon (313) or World War II
(264). This may suggest that the metric of the number of publications cited can be used as a
proxy to identify Wikipedia articles that are more scholarly oriented.
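The ranking step described above can be sketched as follows. The counts are those reported in the text, but the title patterns used to detect list-like pages (research annuals such as "2018 in paleontology", bibliographies, and lists) are an assumption made for this example; the text does not state how such pages were identified, and they may well have been removed manually. The names `LIST_LIKE` and `top_encyclopedic` are hypothetical.

```python
import re

# Assumed heuristic for list-like pages; not the authors' documented criterion.
LIST_LIKE = re.compile(r"^(\d{4} in |Bibliography of |List of )")


def top_encyclopedic(pub_counts, k):
    """Rank pages by referenced publications, skipping list-like pages."""
    ranked = sorted(pub_counts.items(), key=lambda item: item[1], reverse=True)
    return [(title, n) for title, n in ranked if not LIST_LIKE.match(title)][:k]


# Referenced-publication counts as reported in the text.
pub_counts = {
    "2018 in paleontology": 569,
    "Feminizing hormone therapy": 329,
    "History of Lisbon": 313,
    "Alzheimer's disease": 277,
    "World War II": 264,
}
print(top_encyclopedic(pub_counts, 3))
```

A title-based filter is only a rough first pass; any pattern of this kind would need manual checking before being used in an analysis.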
5. DISCUSSION
In this study we describe how Wikipedia is a complex system, involving numerous actors and
elements, and whose rules and governance depend on the community itself (Jemielniak,
2012). It is not only one of the first and clearest examples of Web 2.0 but also one of the
few that remains among the most visited websites and has not deviated from its initial
objective. Far from that, over the years it has gained the acceptance and trust of many of those who
initially looked at it with skepticism.
We describe many similarities between scientific publications and Wikipedia pages. Both
have different typologies of documents, structured content, evaluation of content, and use of
links and bibliographic references. There are also notable differences. While scientific
publications may have limited access and a more specialized audience, Wikipedia’s content and
scope are more open and targeted to more general audiences. The live nature of Wikipedia is
probably its main distinctive feature when compared to scientific publications. This must be
considered when conducting informetric research on Wikipedia. To help in this endeavor, we
propose an informetric-inspired conceptual framework, with different metrics that pay
attention to the different analytical dimensions of Wikipedia, such as article characteristics,
outreach, or citations to scientific publications. Some of these metrics have been already
explored in the literature, such as page views (Mittermeier et al., 2019, 2021), but never in
a comprehensive conceptual framework. The informetric-inspired conceptual framework
presented here is expected to be useful for any Wikipedia study involving informetric,
scientometric, bibliometric, or webometric perspectives. Similarly, different Wikipedia data
sources have been identified and described, finding in their differences in coverage, volume,
access, or data processing crucial aspects for their selection.
Alongside the conceptual analytical framework proposed, a knowledge graph of the English
edition of Wikipedia has been built and shared openly (https://doi.org/10.5281/zenodo
.6346899). The data are gathered under a comprehensive data set that follows a relational
model and can be used by anyone interested in the study of this encyclopedia from an infor-
metric point of view. It combines different data sources that allow users on the one hand to
characterize any Wikipedia page, while also allowing them to establish relationships between
each other (e.g., between two articles, an article and a category, or an article and a linked
website or a scientific publication referenced in it). Together with the metadata and relations
of Wikipedia pages, the data of their bibliographic references are also incorporated, which
come from the data set shared by Singh et al. (2020). It is precisely in Wikipedia’s biblio-
graphic reference data where greater efforts are needed so that they can be efficiently accessed
through its official sources, such as dumps or the API.
The case study provides a descriptive overview of Wikipedia articles in its English edition,
suggesting interesting valuable analytical possibilities and highlighting the relationships and
usefulness of the metrics described. Our results suggest that the low correlations among most
of the metrics point to the fact that the analytical dimensions measured through them are rather
distinct. The potential analytical usefulness of some of the metrics has been highlighted. For
example, the number of Wikipedia page views can be seen as a metric of social attention; the
number of talks of Wikipedia pages can be seen as a proxy of controversial topics; and the
number of scientific references in Wikipedia pages can help identify scholarly-related content.
The use of the quality levels derived from WikiProjects has proved to be useful, showing clear
differences between the different levels, but has also provided an overview of the Wikipedia
articles.
Finally, it is important to also mention some of the limitations of this work. First, not all
possible Wikipedia metrics and their relationships have been explored (e.g., the relationship
between pages and users, the number of users who follow the pages (the so-called
watchers), or the number of editions in other languages of a given article). The use of large
amounts of data and some specific sources leads to a loss of consistency. For example, the
Wikipedia dump process takes several days without blocking the edits during that time, so the dumps
are not really a snapshot. This loss of consistency also occurs when using different sources,
especially when combining 2021 Wikipedia data with references from a third-party data set
published in 2020. The knowledge graph and the case study are based on the English
Wikipedia; however, future research should study whether the same relationships found in this
study also hold for other languages as well as the existing relationships between language
editions.
ACKNOWLEDGMENTS
We thank Mercedes and María for their intellectual advice in the early stages.
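AUTHOR CONTRIBUTIONS
Wenceslao Arroyo-Machado: Data curation, Formal analysis, Investigation, Software,
Visualization, Writing – original draft. Daniel Torres-Salinas: Funding acquisition, Resources,
Validation, Writing – review & editing. Rodrigo Costas: Conceptualization, Methodology, Project
administration, Supervision, Writing – review & editing.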
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
This work was funded by the Spanish Ministry of Science and Innovation with grant number
PID2019-109127RB-I00/SRA/10.13039/501100011033. Wenceslao Arroyo-Machado
received an FPU Grant (FPU18/05835) from the Spanish Ministry of Universities. Daniel
Torres-Salinas received support under the Reincorporation Programme for Young Researchers
of the University of Granada. Rodrigo Costas is partially funded by the South African DSI-NRF
Centre of Excellence in Scientometrics and Science, Technology and Innovation Policy
(SciSTIP).
DATA AVAILABILITY
The Wikipedia knowledge graph data set is available in Zenodo (Arroyo-Machado et al.,
2022).
The source code for constructing the Wikipedia knowledge graph data set is available in
Zenodo (Arroyo-Machado, 2022a).
The case study code is available in Zenodo (Arroyo-Machado, 2022b).
REFERENCES
Adams, C. E., Montgomery, A. A., Aburrow, T., Bloomfield, S.,
Briley, P. M., … Xia, J. (2020). Adding evidence of the effects of
treatments into relevant Wikipedia pages: A randomised trial.
BMJ Open, 10(2), e033655. https://doi.org/10.1136/bmjopen-2019-033655,
PubMed: 32086355
Adams, J., Brückner, H., & Naslund, C. (2019). Who counts as a
notable sociologist on Wikipedia? Gender, race, and the “Professor
Test.” Socius, 5, 2378023118823946.
https://doi.org/10.1177/2378023118823946
Aghaebrahimian, A., Stauder, A., & Ustaszewski, M. (2020). Testing
the validity of Wikipedia categories for subject matter labelling of
open-domain corpus data. Journal of Information Science, 48(5),
686–700. https://doi.org/10.1177/0165551520977438
Arroyo-Machado, W. (2022a). Wences91/wikipedia_knowledge_graph
[Source code]. https://doi.org/10.5281/zenodo.6959428
Arroyo-Machado, W. (2022b). Wences91/wikinformetrics [Source
code]. https://doi.org/10.5281/zenodo.6958972
Arroyo-Machado, W., Díaz-Faes, A. A., & Costas, R. (2022). New
insights on social media metrics: Examining the relationship
between universities’ academic reputation and Wikipedia attention.
26th International Conference on Science, Technology and
Innovation Indicators (STI 2022), Granada, Spain.
https://doi.org/10.5281/zenodo.6962442
Arroyo-Machado, W., Torres-Salinas, D., & Costas, R. (2022).
Wikipedia knowledge graph dataset [Data set].
https://doi.org/10.5281/zenodo.6346899
Arroyo-Machado, W., Torres-Salinas, D., Herrera-Viedma, E., &
Romero-Frías, E. (2020). Science through Wikipedia: A novel
representation of open knowledge through co-citation networks.
PLOS ONE, 15(2), e0228713.
https://doi.org/10.1371/journal.pone.0228713, PubMed: 32040488
Black, E. W. (2008). Wikipedia and academic peer review. Online
Information Review, 32(1), 73–88.
https://doi.org/10.1108/14684520810865994
Blumenstock, J. E. (2008). Size matters: Word count as a measure of
quality on Wikipedia. In Proceedings of the 17th International
Conference on World Wide Web (pp. 1095–1096).
https://doi.org/10.1145/1367497.1367673
Boldi, P., & Monti, C. (2016). Cleansing Wikipedia categories using
centrality. In Proceedings of the 25th International Conference
Companion on World Wide Web (pp. 969–974).
https://doi.org/10.1145/2872518.2891111
Bould, M. D., Hladkowicz, E. S., Pigford, A.-A. E., Ufholz, L.-A.,
Postonogova, T., … Boet, S. (2014). References that anyone can
edit: Review of Wikipedia citations in peer reviewed health
science literature. BMJ: British Medical Journal, 348, g1585.
https://doi.org/10.1136/bmj.g1585, PubMed: 24603564
Candelario, D. M., Vazquez, V., Jackson, W., & Reilly, T. (2017).
Completeness, accuracy, and readability of Wikipedia as a
reference for patient medication information. Journal of the American
Pharmacists Association: JAPhA, 57(2), 197–200.
https://doi.org/10.1016/j.japh.2016.12.063, PubMed: 28139458
Colavizza, G. (2020). COVID-19 research in Wikipedia. Quantitative
Science Studies, 1(4), 1349–1380.
https://doi.org/10.1162/qss_a_00080
Consonni, C., Laniado, D., & Montresor, A. (2019). WikiLinkGraphs:
A complete, longitudinal and multi-language dataset of
the Wikipedia link networks. In Proceedings of the 13th International
AAAI Conference on Web and Social Media (pp. 598–607).
https://doi.org/10.1609/icwsm.v13i01.3257
Costas, R., de Rijcke, S., & Marres, N. (2020). “Heterogeneous
couplings”: Operationalizing network perspectives to study
science-society interactions through social media metrics. Journal
of the Association for Information Science and Technology,
72(5), 595–610. https://doi.org/10.1002/asi.24427
Cummings, R. E. (2020). Writing knowledge: Wikipedia, public
review, and peer review. Studies in Higher Education, 45(5),
950–962. https://doi.org/10.1080/03075079.2020.1749791
Détienne, F。, 贝克, M。, Fréard, D ., Barcellini, F。, Denis, A。, &
Quignard, 中号. (2016). The descent of Pluto: Interactive dynamics,
specialisation and reciprocity of roles in a Wikipedia debate.
International Journal of Human-Computer Studies, 86, 11–31.
https://doi.org/10.1016/j.ijhcs.2015.09.002
Díaz-Faes, A. A。, Bowman, 时间. D ., & Costas, 右. (2019). Towards a
second generation of “social media metrics”: Characterizing
Twitter communities of attention around science. PLOS ONE,
14(5), e0216408. https://doi.org/10.1371/journal.pone
.0216408, 考研: 31116783
Dzogang, F。, Lansdall-Welfare, T。, & Cristianini, 氮. (2016). Sea-
sonal fluctuations in collective mood revealed by Wikipedia
searches and Twitter posts. 在 2016 IEEE 16th International Con-
ference on Data Mining Workshops (ICDMW ) (PP. 931–937).
https://doi.org/10.1109/ICDMW.2016.0136
Ferschke, 奥。, 古列维奇, 我。, & Chebotar, 是. (2012). Behind the
文章: Recognizing dialog acts in Wikipedia talk pages. In Pro-
ceedings of the 13th Conference of the European Chapter of the
计算语言学协会 (PP. 777–786).
Generous, N., Fairchild, G., Deshpande, A., Del Valle, S. Y., &
Priedhorsky, R. (2014). Global disease monitoring and forecasting
with Wikipedia. PLOS Computational Biology, 10(11),
e1003892. https://doi.org/10.1371/journal.pcbi.1003892,
PubMed: 25392913
Hara, N., & Doney, J. (2015). Social construction of knowledge in
Wikipedia. First Monday, 20(6).
https://doi.org/10.5210/fm.v20i6.5869
Heist, N., & Paulheim, H. (2019). Uncovering the semantics of
Wikipedia categories. In C. Ghidini, O. Hartig, M. Maleshkova,
V. Svátek, I. Cruz, A. Hogan, J. Song, M. Lefrançois, & F. Gandon
(Eds.), The Semantic Web – ISWC 2019 (pp. 219–236). Springer
International Publishing.
https://doi.org/10.1007/978-3-030-30793-6_13
Hill, B. M., & Shaw, A. (2015). Page protection: Another missing
dimension of Wikipedia research. In Proceedings of the 11th
International Symposium on Open Collaboration.
https://doi.org/10.1145/2788993.2789846
History of Wikipedia. (2021). Wikipedia. 28 May.
https://en.wikipedia.org/wiki/History_of_Wikipedia
Jang, M., Foley, J., Dori-Hacohen, S., & Allan, J. (2016).
Probabilistic approaches to controversy detection. In Proceedings
of the 25th ACM International on Conference on Information and
Knowledge Management (pp. 2069–2072).
https://doi.org/10.1145/2983323.2983911
Jemielniak, D. (2012). Wikipedia: An effective anarchy. Baltimore,
MD: Society for Applied Anthropology.
Jemielniak, D. (2019). Wikipedia: Why is the common knowledge
resource still neglected by academics? GigaScience, 8(12),
giz139. https://doi.org/10.1093/gigascience/giz139,
PubMed: 31794014
Jemielniak, D., Masukume, G., & Wilamowski, M. (2019). The
most influential medical journals according to Wikipedia:
Quantitative analysis. Journal of Medical Internet Research, 21(1),
e11429. https://doi.org/10.2196/11429, PubMed: 30664451
Kaffee, L.-A., & Elsahar, H. (2021). References in Wikipedia: The
editors’ perspective. In Companion Proceedings of the Web
Conference 2021 (pp. 535–538).
https://doi.org/10.1145/3442442.3452337
Katz, G., & Rokach, L. (2017). Wikiometrics: A Wikipedia based
ranking system. World Wide Web, 20(6), 1153–1177.
https://doi.org/10.1007/s11280-016-0427-8
Kittur, A., Chi, E. H., & Suh, B. (2009). What’s in Wikipedia?
Mapping topics and conflict using socially annotated category
structure. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (pp. 1509–1512).
https://doi.org/10.1145/1518701.1518930
Kopf, S. (2020). Participation and deliberative discourse on social
media—Wikipedia talk pages as transnational public spheres?
Critical Discourse Studies, 19(2), 196–211.
https://doi.org/10.1080/17405904.2020.1822896
Kousha, K., & Thelwall, M. (2017). Are Wikipedia citations
important evidence of the impact of scholarly articles and books?
Journal of the Association for Information Science and Technology,
68(3), 762–779. https://doi.org/10.1002/asi.23694
Ladyman, J., Lambert, J., & Wiesner, K. (2013). What is a complex
system? European Journal for Philosophy of Science, 3(1), 33–67.
https://doi.org/10.1007/s13194-012-0056-8
Lageard, V., & Paternotte, C. (2021). Trolls, bans and reverts:
Simulating Wikipedia. Synthese, 198(1), 451–470.
https://doi.org/10.1007/s11229-018-02029-0
Lewoniewski, W., Węcel, K., & Abramowicz, W. (2017). Analysis
of references across Wikipedia languages. In R. Damaševičius &
V. Mikašytė (Eds.), Information and Software Technologies
(pp. 561–573). Springer International Publishing.
https://doi.org/10.1007/978-3-319-67642-5_47
Li, X., Thelwall, M., & Mohammadi, E. (2021). How are
encyclopedias cited in academic research? Wikipedia, Britannica,
Baidu Baike, and Scholarpedia. Profesional de La Información, 30(5).
https://doi.org/10.3145/epi.2021.sep.08
Maggio, L. A., Willinsky, J. M., Steinberg, R. M., Mietchen, D.,
Wass, J. L., & Dong, T. (2017). Wikipedia as a gateway to
biomedical research: The relative distribution and use of citations
in the English Wikipedia. PLOS ONE, 12(12), e0190046.
https://doi.org/10.1371/journal.pone.0190046, PubMed: 29267345
Maki, K., Yoder, M., Jo, Y., & Rosé, C. (2017). Roles and success in
Wikipedia talk pages: Identifying latent patterns of behavior. In
Proceedings of the Eighth International Joint Conference on Natural
Language Processing (Volume 1: Long Papers) (pp. 1026–1035).
https://aclanthology.org/I17-1103
Martinez-Rico, J. R., Martinez-Romo, J., & Araujo, L. (2019). Can
deep learning techniques improve classification performance of
vandalism detection in Wikipedia? Engineering Applications of
Artificial Intelligence, 78, 248–259.
https://doi.org/10.1016/j.engappai.2018.11.012
Minguillón, J., Lerga, M., Aibar, E., Lladós-Masllorens, J., &
Meseguer-Artola, A. (2017). Semi-automatic generation of a corpus
of Wikipedia articles on science and technology. Profesional
de La Información, 26(5), 995–1005.
https://doi.org/10.3145/epi.2017.sep.20
Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia culture gap:
Quantifying content imbalances across 40 language editions. Frontiers
in Physics, 6, 54. https://doi.org/10.3389/fphy.2018.00054
Mittermeier, J. C., Correia, R., Grenyer, R., Toivonen, T., & Roll, U.
(2021). Using Wikipedia to measure public interest in biodiversity
and conservation. Conservation Biology, 35(2), 412–423.
https://doi.org/10.1111/cobi.13702, PubMed: 33749051
Mittermeier, J. C., Roll, U., Matthews, T. J., & Grenyer, R. (2019). A
season for all things: Phenological imprints in Wikipedia usage
and their relevance to conservation. PLOS Biology, 17(3),
e3000146. https://doi.org/10.1371/journal.pbio.3000146,
PubMed: 30835729
Mühlhauser, I., & Oser, F. (2008). Does WIKIPEDIA provide
evidence based health care information? A content analysis. Shared
Decision-Making in Health Care, 102(7), e1–e7.
https://doi.org/10.1016/j.zefq.2008.06.020
Nicholson, J. M., Uppala, A., Sieber, M., Grabitz, P., Mordaunt, M.,
& Rife, S. C. (2021). Measuring the quality of scientific references
in Wikipedia: An analysis of more than 115M citations to over
800 000 scientific articles. The FEBS Journal, 288(14), 4242–4248.
https://doi.org/10.1111/febs.15608, PubMed: 33089957
Nielsen, F. A. (2007). Scientific citations in Wikipedia. First
Monday, 12(8). https://doi.org/10.5210/fm.v12i8.1997
Nielsen, F. Å., Mietchen, D., & Willighagen, E. (2017). Scholia,
scientometrics and Wikidata. In E. Blomqvist, K. Hose, H.
Paulheim, A. Ławrynowicz, F. Ciravegna, & O. Hartig (Eds.),
The Semantic Web: ESWC 2017 Satellite Events (pp. 237–259).
Springer International Publishing.
https://doi.org/10.1007/978-3-319-70407-4_36
Olleros, F. X. (2008). Learning to trust the crowd: Some lessons
from Wikipedia. In 2008 International MCETECH Conference
on E-Technologies (Mcetech 2008) (pp. 212–216).
https://doi.org/10.1109/MCETECH.2008.17
O’Neil, T. (2017). Wikipedia erases record of accomplished scientist
—‘Censored’ for his intelligent design position. PJ Media. https://
pjmedia.com/faith/tyler-o-neil/2017/11/21/wikipedia-erases
-record-of-accomplished-scientist-censored-for-his-intelligent
-design-position-n101002
Ortega, J.-L. (2020). Altmetrics data providers: A meta-analysis
review of the coverage of metrics and publication. Profesional
de La Información, 29(1). https://doi.org/10.3145/epi.2020.ene.07
Pooladian, A., & Borrego, Á. (2017). Methodological issues in
measuring citations in Wikipedia: A case study in library and
information science. Scientometrics, 113(1), 455–464.
https://doi.org/10.1007/s11192-017-2474-z
Presutti, V., Consoli, S., Nuzzolese, A. G., Recupero, D. R.,
Gangemi, A., … Zargayouna, H. (2014). Uncovering the semantics
of Wikipedia pagelinks. In K. Janowicz, S. Schlobach, P.
Lambrix, & E. Hyvönen (Eds.), Knowledge engineering and
knowledge management (pp. 413–428). Springer International
Publishing. https://doi.org/10.1007/978-3-319-13704-9_32
Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Altmetrics:
A manifesto. Altmetrics. https://altmetrics.org/manifesto/
Reagle, J. (2009). Wikipedia: The happy accident. Interactions,
16(3), 42–45. https://doi.org/10.1145/1516016.1516026
Reagle, J., & Koerner, J. (Eds.). (2020). Wikipedia @ 20: Stories of an
incomplete revolution. MIT Press.
https://doi.org/10.7551/mitpress/12366.001.0001
Roll, U., Mittermeier, J. C., Diaz, G. I., Novosolov, M., Feldman, A.,
… Grenyer, R. (2016). Using Wikipedia page views to explore the
cultural importance of global reptiles. Biological Conservation,
204, 42–50. https://doi.org/10.1016/j.biocon.2016.03.037
Ross-Hellauer, T. (2017). What is open peer review? A systematic
review. F1000Research, 6, 588.
https://doi.org/10.12688/f1000research.11369.2, PubMed: 28580134
Singh, H., West, R., & Colavizza, G. (2020). Wikipedia citations: A
comprehensive data set of citations with identifiers extracted
from English Wikipedia. Quantitative Science Studies, 2(1),
1–19. https://doi.org/10.1162/qss_a_00105
Thalhammer, A., & Rettinger, A. (2016). PageRank on Wikipedia:
Towards general importance scores for entities. In H. Sack, G.
Rizzo, N. Steinmetz, D. Mladenić, S. Auer, & C. Lange (Eds.),
The Semantic Web (pp. 227–240). Springer International
Publishing. https://doi.org/10.1007/978-3-319-47602-5_41
Tomaszewski, R., & MacDonald, K. I. (2016). A study of citations to
Wikipedia in scholarly publications. Science & Technology
Libraries, 35(3), 246–261.
https://doi.org/10.1080/0194262X.2016.1206052
Torres-Salinas, D., Romero-Frías, E., & Arroyo-Machado, W.
(2019). Mapping the backbone of the humanities through the
eyes of Wikipedia. Journal of Informetrics, 13(3), 793–803.
https://doi.org/10.1016/j.joi.2019.07.002
Tripodi, F. (2021). Ms. Categorized: Gender, notability, and inequality
on Wikipedia. New Media & Society, 14614448211023772.
https://doi.org/10.1177/14614448211023772
Tsvetkova, M., García-Gavilanes, R., Floridi, L., & Yasseri, T.
(2017). Even good bots fight: The case of Wikipedia. PLOS
ONE, 12(2), e0171774.
https://doi.org/10.1371/journal.pone.0171774, PubMed: 28231323
Vilain, P., Larrieu, S., Cossin, S., Caserio-Schönemann, C., & Filleul,
L. (2017). Wikipedia: A tool to monitor seasonal diseases trends?
Online Journal of Public Health Informatics, 9(1).
https://doi.org/10.5210/ojphi.v9i1.7630
Weiner, S. S., Horbacewicz, J., Rasberry, L., & Bensinger-Brody, Y.
(2019). Improving the quality of consumer health information on
Wikipedia: Case series. Journal of Medical Internet Research, 21(3),
e12450. https://doi.org/10.2196/12450, PubMed: 30882357
Wilkinson, D. M., & Huberman, B. A. (2007). Assessing the value of
cooperation in Wikipedia. First Monday, 12(4).
https://doi.org/10.5210/fm.v12i4.1763
Wouters, P., Zahedi, Z., & Costas, R. (2019). Social media metrics
for new research evaluation. In W. Glänzel, H. F. Moed, U.
Schmoch, & M. Thelwall (Eds.), Springer handbook of science
and technology indicators (pp. 687–713). Springer International
Publishing. https://doi.org/10.1007/978-3-030-02511-3_26
Xiao, L., & Askin, N. (2014). Academic opinions of Wikipedia and
Open Access publishing. Online Information Review, 38(3),
332–347. https://doi.org/10.1108/OIR-04-2013-0062
Yasseri, T., Sumi, R., Rung, A., Kornai, A., & Kertész, J. (2012).
Dynamics of conflicts in Wikipedia. PLOS ONE, 7(6), e38869.
https://doi.org/10.1371/journal.pone.0038869, PubMed: 22745683
Zagorova, O., Ulloa, R., Weller, K., & Flöck, F. (2022). “I updated
the <ref>”: The evolution of references in the English Wikipedia
and the implications for altmetrics. Quantitative Science Studies,
3(1), 147–173. https://doi.org/10.1162/qss_a_00171
Zahedi, Z., & Costas, R. (2018). General discussion of data quality
challenges in social media metrics: Extensive comparison of four
major altmetric data aggregators. PLOS ONE, 13(5), e0197326.
https://doi.org/10.1371/journal.pone.0197326, PubMed: 29772003
Zhang, H., Ren, Y., & Kraut, R. E. (2018). Mining and predicting
temporal patterns in the quality evolution of Wikipedia articles.
Academy of Management Proceedings, 2018(1), 13746.
https://doi.org/10.5465/AMBPP.2018.13746abstract
Zheng, L., Albano, C. M., Vora, N. M., Mai, F., & Nickerson, J. V.
(2019). The roles bots play in Wikipedia. In Proceedings of the
ACM Conference on Human-Computer Interactions, 3(CSCW),
1–20. https://doi.org/10.1145/3359317