RESEARCH ARTICLE
A quantitative and qualitative open citation
analysis of retracted articles in the humanities
Ivan Heibi1,2
and Silvio Peroni1,2
1Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies,
University of Bologna, Bologna, Italy
2Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies,
University of Bologna, Bologna, Italy
Keywords: citation analysis, humanities, retraction, topic modeling
ABSTRACT
In this article, we show and discuss the results of a quantitative and qualitative analysis of open
citations of retracted publications in the humanities domain. Our study was conducted by
selecting retracted papers in the humanities domain and marking their main characteristics
(e.g., retraction reason). Then, we gathered the citing entities and annotated their basic
metadata (e.g., title, venue, subject) and the characteristics of their in-text citations (e.g., intent,
sentiment). Using these data, we performed a quantitative and qualitative study of retractions
in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing
entities’ abstracts and the in-text citation contexts. As part of our main findings, we noticed that
there was no drop in the overall number of citations after the year of retraction, with few
entities that have either mentioned the retraction or expressed a negative sentiment toward the
cited publication. Also, on several occasions, we noticed a higher concern/awareness
by citing entities belonging to the health sciences domain about citing a retracted publication,
compared with the humanities and social science domains. Philosophy, arts, and history are
the humanities areas that showed higher concern toward the retraction.
1. INTRODUCTION
Retraction is a way to correct the literature and alert readers to erroneous materials in the pub-
lished literature. A retraction should be formally accompanied by a retraction notice—a doc-
ument that justifies such a retraction. Reasons for retraction include plagiarism, peer review
manipulation, and unethical research (Barbour, Kleinert et al., 2009).
Several works in the past have studied and uncovered important aspects regarding this phe-
nomenon, such as the reasons for retraction (Casadevall, Steen, & Fang, 2014; Corbyn, 2012),
the temporal characteristics of the retracted articles (Bar-Ilan & Halevi, 2018), their authors’
countries of origin (Ataie-Ashtiani, 2018), and the impact factor of the journals publishing
them (Campos-Varela, Villaverde-Castañeda, & Ruano-Raviña, 2020; Fang & Casadevall,
2011). Other works have analyzed authors with a higher number of retractions (Brainard,
2018), and the scientific impact, technological impact, funding impact, and Altmetric impact
in retractions (Feng, Yuan, & Yang, 2020). Other studies focused on retraction in the medical
and biomedical domain (Campos-Varela, Villaverde-Castañeda, & Ruano-Raviña, 2020;
Gaudino, Robinson et al., 2021; Gasparyan, Ayvazyan et al., 2014).
an open access journal
Citation: Heibi, I., & Peroni, S. (2022).
A quantitative and qualitative open
citation analysis of retracted articles in
the humanities. Quantitative Science
Studies, 3(4), 953–975.
https://doi.org/10.1162/qss_a_00222
DOI:
https://doi.org/10.1162/qss_a_00222
Peer Review:
https://publons.com/publon/10.1162/qss_a_00222
Received: 22 November 2021
Accepted: 10 October 2022
Corresponding Author:
Ivan Heibi
ivan.heibi2@unibo.it
Handling Editor:
Ludo Waltman
Copyright: © 2022 Ivan Heibi and Silvio
Peroni. Published under a Creative
Commons Attribution 4.0 International
(CC BY 4.0) license.
The MIT Press
Open citation analysis of retracted articles in the humanities
Scientometricians have also proposed several works on retraction based on quantitative
data. For example, several works (Azoulay, Bonatti, & Krieger, 2017; Lu, Jin et al., 2013;
Mongeon & Larivière, 2016; Shuai, Rollins et al., 2017) focused on showing how a single
retraction could trigger citation losses through an author’s prior body of work. Bordignon
(2020) investigated the different impacts that negative citations in articles and comments
posted on postpublication peer review platforms have on the correction of science, whereas
Dinh, Sarol et al. (2019) applied descriptive statistics and ego-network methods to examine
4,871 retracted articles and their citations before and after retraction. Other authors focused
on the analysis of the citations made before the retraction (Bolland, Grey, & Avenell, 2021) and
on a specific reason for retraction, such as misconduct (Candal-Pedreira, Ruano-Ravina et al.,
2020). The studies that considered only one retraction case usually also examined the in-text
citations and the related citation context in the articles citing retracted publications
(Bornemann-Cimenti, Szilagyi, & Sandner-Kiesling, 2016; Luwel, Van Eck, & van Leeuwen,
2019; Schneider, Ye et al., 2020; van der Vet & Nijveen, 2016).
Although citation analysis concerning retraction has been done several times in Science,
Technology, Engineering, and Mathematics (STEM) disciplines, less attention has been given to
the humanities domain. One of the rare analyses done in the humanities domain was recently
presented by Halevi (2020), who considered two examples of retracted articles and showed
their continuous postretraction citations.
Our study seeks to expand the work concerning the analysis of citations of retracted
publications in the humanities domain. By combining quantitative analysis, through the
quantification of citations and their related characteristics/metadata, with qualitative analysis,
through a subjective examination of aspects related to the quality of the citations (e.g., the
reason for a citation based on the examination/interpretation of its in-text citation context), we
aim to understand this phenomenon in the humanities, which has gained little attention in the
past literature. In particular, the research questions (RQ1–RQ3) we aim to address are:
• RQ1: How did scholarly research cite retracted humanities publications before and after
their retraction?
• RQ2: Did all the humanities areas behave similarly concerning the retraction phenomenon?
• RQ3: What were the main differences in citing retracted publications between STEM
disciplines and the humanities?
In this paper, we use a methodology developed to gather, characterize, and analyze
incoming citations of retracted publications (Heibi & Peroni, 2022a), adapted for the case of the
humanities¹. The citation analysis is based on collections of open citations (i.e., data are
structured, separate, open, identifiable, and available) (Peroni & Shotton, 2018, 2020).
2. DATA GATHERING
The workflow followed to gather and analyze the data in this study is based on the
methodology introduced in Heibi and Peroni (2022a), briefly summarized in Figure 1. The first
two phases of the methodology are dedicated to the collection and characterization of the
entities that have cited the retracted publications. The third phase is focused on analyzing the
information annotated in the first two phases to summarize quantitatively the data collected. The
fourth and final phase applies a topic modeling analysis (Barde & Bainwad, 2017) on the
1 We have not described the methodology adopted in full here due to space constraints.
Quantitative Science Studies
Figure 1. A summarizing schema representing the methodology in its four phases: identifying, retrieving, and characterizing the citing
entities; extracting and labeling additional features based on the citing entities’ contents; building a descriptive statistical summary; and running a
topic modeling analysis.
textual information (extracted from the full text of the citing entities) and builds a set of
dynamic visualizations to enable an overview and investigation of the generated topics. The
data gathering of our study is detailed in the following sections.
2.1. Retraction in the Humanities
First, we wanted to have a descriptive statistical overview of the retractions in the humanities as a
function of crucial features (e.g., reasons for retraction) to help us define the set of retractions to
use as input in the next phases. Thus, we queried the Retraction Watch database
(https://retractiondatabase.org; Collier, 2011), searching for all the retracted publications labeled as
humanities (marked with “HUM” in the database). Hence, the humanities domain considered
in this work is based on the subject classification used by Retraction Watch (i.e., the subjects
under the macro category “(HUM) Humanities”). Then we classified the results as a function
of three parameters: the year of the retraction, the subject area of the retracted publications
(architecture, arts, etc.), and the reason(s) for the retraction. We collected an overall number
of 474 publications; the earliest retraction occurred in 2002, and the last year of retraction we
obtained was 2020.
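The three-way classification described above can be sketched with simple counters. This is a minimal sketch, assuming each retraction has already been parsed into a record with a retraction year, a list of subjects, and a list of reasons. The field names `year`, `subjects`, and `reasons`, the function `summarize_retractions`, and the sample records are hypothetical; they do not reflect the actual Retraction Watch export format.

```python
from collections import Counter


def summarize_retractions(records):
    """Count retractions by year, subject area, and reason.

    A record may carry several subjects and reasons, so the subject and
    reason totals can exceed the number of records.
    """
    by_year = Counter(r["year"] for r in records)
    by_subject = Counter(s for r in records for s in r["subjects"])
    by_reason = Counter(x for r in records for x in r["reasons"])
    return by_year, by_subject, by_reason


# Hypothetical records, for illustration only.
records = [
    {"year": 2010, "subjects": ["arts"], "reasons": ["plagiarism"]},
    {"year": 2010, "subjects": ["history", "arts"], "reasons": ["duplication"]},
    {"year": 2018, "subjects": ["history"], "reasons": ["plagiarism", "error"]},
]
by_year, by_subject, by_reason = summarize_retractions(records)
```

Because one publication can have several subjects or reasons, percentages derived from the subject and reason counters would need to state whether they are relative to publications or to labels.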
As shown in Figure 2, we noticed an increasing trend throughout the years, with some
exceptions. In particular, we observed that the highest number of retractions per year was
119 in 2010, probably due to an investigation and a massive retraction of several articles
belonging to one author, Joachim Boldt (Brainard, 2018). When looking at the subject areas,
we noticed that most of the retractions are related to arts and history, and plagiarism motives²
were by far the most representative ones, confirming the observation in Halevi (2020). Most of
the retracted publications (88%) are of article type (i.e., labeled in Retraction Watch as either
“Conference Abstract/Paper,” “Research Article,” or “Review Articles”). Book chapters/
References represent 8% of the total, and the rest are “Commentary/Editorials” (1%) and other
residual types (3%, e.g., letters, case reports, articles in press).
2.2. Retracted Publications Set and their Citations
As the focus of our study is on the analysis of citations of fully retracted publications, we
excluded all the retracted publications collected in the previous step that did not receive at
2 A complete list of reasons accompanied by a description is provided by Retraction Watch at https://
retractionwatch.com/retraction-watch-database-user-guide/retraction-watch-database-user-guide-appendix
-b-reasons/.
Figure 2. Retractions in the humanities domain with respect to three different features: the year of retraction (line chart), the subject areas of
the retracted publications (ring chart), the type of the retracted publication (large horizontal bar), and the reasons for retraction (horizontal bar
chart). Based on the data retrieved from the Retraction Watch database in June 2021.
least one citation according to two open citation databases: Microsoft Academic Graph (MAG,
https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/) (Wang, Shen
et al., 2020) and OpenCitations’ (2020) COCI (https://opencitations.net/index/coci) (Heibi,
Peroni, & Shotton, 2019). MAG is a knowledge graph that contains scientific publication
records, citations, authors, institutions, journals, conferences, and fields of study. It also
provides a free REST API service to search, filter, and retrieve its data. COCI is a citation index
that contains details of all the DOI-to-DOI citation links retrieved by processing the open
bibliographic references available in Crossref (Hendricks, Tkaczyk et al., 2020), and it can be
queried using open and free REST APIs. We decided not to use proprietary, nonopen
databases because we aimed to make our workflow and results as reproducible as possible.
After querying COCI and MAG³, we found that 85 retracted items (out of 474) had at least
one citation (2,054 citations). We manually checked the data set for possible mistakes
introduced by the collections. Indeed, some of the citing entities identified in MAG did not
include a bibliographic reference to any of the retracted publications; in other cases, the
retracted publication under consideration was not cited in the content of the citing entity
(although present in its reference list), or the citing entity’s type did not refer to a scholarly
publication (e.g., bibliography, retraction notice, presentation, data repository). There was also
one article retracted for duplication, “The Nature of Creativity” by Sternberg (2006), that
received 1,050 citations. This retracted article contains a substantial amount of content
published by the same author in several of his previous works, and it was the fourth retracted
article by the same author, who
3 We used their REST APIs in June 2021 to retrieve citation information.
Figure 3. A Venn diagram (bubble chart) to plot the number of entities gathered from MAG (Microsoft Academic Graph) and COCI (Open-
Citations Index of Crossref open DOI-to-DOI citations) which have cited the retracted publications, along with their distribution according to
the hum_affinity score of the retracted publication they cite (pie chart).
used to cite himself at a high rate while not doing enough to encourage diversity in psychology
research. We decided to exclude it from our study to reduce bias in the results. Following these
considerations, the final number of retracted publications considered was 84, involving a total
number of 935 unique citing entities. As shown in the bubble chart in Figure 3, most of the
citing entities (i.e., 891) were included in MAG; 388 were included in COCI, and the two
sources shared 344 entities.
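The counts in Figure 3 are consistent with a plain set union of the two sources (891 + 388 − 344 = 935). A minimal sketch of such a merge is given here, assuming each source yields one identifier per citing entity. The function name `merge_citing_entities` and the synthetic identifiers are hypothetical, and the manual checks described in the text are not reproduced.

```python
def merge_citing_entities(mag_ids, coci_ids):
    """Merge citing entities from two sources and report their overlap."""
    mag, coci = set(mag_ids), set(coci_ids)
    return {
        "mag": len(mag),
        "coci": len(coci),
        "shared": len(mag & coci),
        "unique_total": len(mag | coci),
    }


# Synthetic identifiers sized to match the counts reported in the text:
# 891 entities in MAG, 388 in COCI, 344 present in both.
mag_ids = [f"entity-{i}" for i in range(891)]
coci_ids = [f"entity-{i}" for i in range(891 - 344, 891 - 344 + 388)]
summary = merge_citing_entities(mag_ids, coci_ids)
```

In practice, the hard part is deciding when two records from different sources describe the same entity (for example, when a DOI is missing in one of them); this sketch assumes that matching has already been done.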
Although the retracted items identified so far were all in the humanities domain according to
the categories specified in Retraction Watch, an item might have other nonhumanities subjects
associated with it. Sometimes, these nonhumanities subjects might be more representative of the
content of the retracted document and, thus, they might generate an unwanted bias for the rest of
the analysis. For example, consider the retracted article “The good, the bad, and the ugly: Should
we completely banish human albumin from our intensive care units?” (Boldt, 2000). In
Retraction Watch, the subjects associated with it were medicine and journalism. Yet, when we
checked the full text of the article, we noticed that arguments related to journalism are very few
and, as such, the article should not be considered as belonging to humanities research.
To avoid considering these peculiar publications in our analysis, we devised a mechanism
to help us evaluate the affinity of each retracted item to the humanities domain. We assigned to
each retracted item in the list (84) an initial score of 1, named hum_affinity; this value ranges
from 0 (i.e., very low) to 5 (i.e., very high). The final value of hum_affinity for each retracted item
is calculated as follows:
1. We assigned to each retracted item additional subject categories obtained by searching
the venue where it was published in external databases: we used the Scimago classification
(https://www.scimagojr.com/) for journals and the Library of Congress Classification
(LCC, https://www.loc.gov/catdir/cpso/lcco/) for books/book chapters.
2. If both the Retraction Watch subjects and those gathered in step (1) included at least one
subject identifying a discipline in the humanities, we added 1 to hum_affinity of that item.
3. If all the Retraction Watch subjects are part of the humanities domain, we added
another 1 to hum_affinity of that item.
4. If the title of the retracted item has a clear affinity to the humanities (e.g., “The origins of
probabilism in late scholastic moral thought”), we added another 1 to hum_affinity of
that item.
5. Finally, we provided a subjective score of −1, 0, or 1 based on the abstract of the item.
For example, we assigned 1 to the abstract of the retracted article of Mößner (2011):
“… This paper aims at a more thorough comparison between Ludwik Fleck’s concept of
thought style and Thomas Kuhn’s concept of paradigm. Although some philosophers
suggest that these two concepts ….”
The pie chart in Figure 3 shows how we classified the retracted publications and those
citing them according to their hum_affinity score. To narrow our analysis and reduce bias, we
decided to consider only the retracted publications (and their corresponding citing entities)
having a medium or high hum_affinity score (i.e., ≥ 2). Twelve retracted publications were
excluded from the analysis (i.e., hum_affinity < 2) along with their 257 citations. A list
of the excluded retracted publications is available at the Zenodo repository (Heibi & Peroni,
2021b). At the end of this phase, the final number of retracted items we considered was 72,
with 678 citing entities.
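The hum_affinity scoring steps can be sketched as a small function. This is a minimal sketch under stated assumptions: the set `HUMANITIES` of humanities subject labels, the function and parameter names, and the two sample items are hypothetical, and the title and abstract judgments (steps 4 and 5) are passed in as manual inputs because the text describes them as subjective.

```python
# Hypothetical set of subject labels treated as humanities disciplines.
HUMANITIES = {"architecture", "arts", "history", "journalism",
              "philosophy", "religion"}
KEEP_THRESHOLD = 2  # items with hum_affinity >= 2 are kept for the analysis


def hum_affinity(rw_subjects, venue_subjects, title_is_humanities,
                 abstract_score):
    """Compute the hum_affinity score (0 to 5) of one retracted item."""
    if abstract_score not in (-1, 0, 1):
        raise ValueError("abstract_score must be -1, 0, or 1")
    score = 1  # initial score
    rw_flags = [s in HUMANITIES for s in rw_subjects]
    venue_flags = [s in HUMANITIES for s in venue_subjects]
    # Step 2: both subject lists contain at least one humanities subject.
    if any(rw_flags) and any(venue_flags):
        score += 1
    # Step 3: every Retraction Watch subject is a humanities subject.
    if rw_flags and all(rw_flags):
        score += 1
    # Step 4: manual judgment on the title.
    if title_is_humanities:
        score += 1
    # Step 5: manual judgment on the abstract.
    return score + abstract_score


# Hypothetical items, for illustration only.
high = hum_affinity(["philosophy"], ["philosophy", "history"], True, 1)
low = hum_affinity(["medicine", "journalism"], ["medicine"], False, -1)
kept = [score >= KEEP_THRESHOLD for score in (high, low)]
```

With these inputs the first item reaches the maximum of 5 and the second the minimum of 0, which matches the range stated in the text.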
2.3. Annotating the Citation Characteristics
Once the 72 retracted items and their related 678 citing entities were collected, we
wanted to characterize such citing entities with respect to their basic metadata and full-text
content.
2.3.1. Gathering citing entities metadata
For each citing entity, we retrieved basic metadata via the REST APIs of COCI and MAG:
DOI (if any), year of publication, title, venue ID (ISSN/ISBN), and venue title. Then, using the
Retraction Watch database, we annotated whether the citing entity was fully retracted as well.
We also classified the citing entities into areas of study and specific subjects, following the
Scimago Journal Classification (https://www.scimagojr.com/), which uses 27 main subject
areas (medicine, social sciences, etc.) and 313 subject categories (psychiatry, anatomy,
etc.). We searched for the titles and IDs (ISSN/ISBN) of the venues of publication of all the
citing entities and classified them into specific subject areas and subject categories. For
books/book chapters, we used the ISBNDB service (https://isbndb.com/) to look up the related
Library of Congress Classification (LCC, https://www.loc.gov/catdir/cpso/lcco/), and then we
mapped the LCC categories into a corresponding Scimago subject area using an established
set of rules detailed in Heibi and Peroni (2022a).
2.3.2. Extracting textual content features
We extracted the abstract of each citing entity and all its in-text citations of the retracted
publications in our set, marking the reference pointers to them (i.e., the in-line textual devices,
e.g., “[3],” used to refer to bibliographic references), the section where they appear, and their
citation context⁴. The citation context is based on the sentence that contains the in-text
reference (i.e., the anchor sentence), plus the preceding and following sentences⁵. The definition
of this citation context is based on the study of Ritchie, Robertson, and Teufel (2008). We
annotated the first-level sections containing the in-text citation with their type using the categories
4 If we could not access the full text of a citing entity (e.g., due to paywall restrictions), the corresponding
entity was still considered in our data set. However, we did not use it for the qualitative postanalysis
described in Sections 3.2.2 and 3.3. Details about the number of entities for which we could not retrieve
the full text are given in Section 3.2.1.
5 Exceptions to this rule (e.g., when the anchor sentence is the last one of a paragraph) are discussed in Heibi
and Peroni (2022a).
“introduction,” “method,” “abstract,” “results,” “conclusions,” “background,” and
“discussion” listed in Suppe (1998) if such section rhetoric was clear by looking at its title;
otherwise, we used three other residual categories: “first section,” “middle section,” and
“final section,” depending on their position in the citing entity.
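The citation context window and the section labeling can be illustrated with a short sketch. It assumes a naive sentence splitter and keyword matching on section titles; the function names, the `SECTION_STEMS` table, and the sample paragraph are hypothetical. The paper's annotation relied on manual inspection and on special cases at paragraph boundaries that are documented elsewhere and not reproduced here.

```python
import re

# Keyword stems mapped to the rhetorical categories named in the text.
SECTION_STEMS = {
    "introduction": "introduction",
    "method": "method",
    "abstract": "abstract",
    "result": "results",
    "conclusion": "conclusions",
    "background": "background",
    "discussion": "discussion",
}


def split_sentences(text):
    """Naive splitter: break after '.', '?', or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def citation_contexts(paragraph, pointer):
    """Return the anchor sentence plus its neighbors for each anchor found."""
    sentences = split_sentences(paragraph)
    contexts = []
    for i, sentence in enumerate(sentences):
        if pointer in sentence:
            # Clamp the window at the paragraph boundaries.
            window = sentences[max(i - 1, 0): i + 2]
            contexts.append(" ".join(window))
    return contexts


def section_type(title, position, total):
    """Label a first-level section by its title, else by its position."""
    lowered = title.lower()
    for stem, category in SECTION_STEMS.items():
        if stem in lowered:
            return category
    if position == 0:
        return "first section"
    if position == total - 1:
        return "final section"
    return "middle section"


paragraph = ("Prior work is mixed. This claim was first made in [3]. "
             "Later studies disagree. We follow a new approach.")
contexts = citation_contexts(paragraph, "[3]")
```

A title such as "Results and Discussion" would be labeled with the first matching stem in the table, which is one of several places where a manual check would still be needed.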
Then, we manually annotated each in-text citation with three main features: the citation
sentiment conveyed by the citation context, whether the citation context mentioned the retrac-
tion of the cited entity, and the citation intent. The annotation of the citation sentiment is
inspired by the classification proposed in Bar-Ilan and Halevi (2017), and we marked each
in-text citation with one of the following values:
• positive, when the retracted publication was cited as sharing valid conclusions, and its
findings could also have been used in the citing entity;
• negative, if the citing entity cited the retracted publication and addressed its findings as
inappropriate and/or invalid; and
• neutral, when the author of the citing entity referred to the retracted publication without
including any judgment or opinion regarding its validity.
Then, we annotated each citing entity with yes/no, depending on whether any in-text citation
context we gathered from it explicitly mentioned the fact that the cited entity was retracted.
Finally, we annotated the intent of each in-text citation. The citation intent (or citation function)
is defined as the authors’ reason for citing a specific publication (e.g., the citing entity uses a
method defined in the cited entity). To label such citation functions, we used those specified
in the Citation Typing Ontology (CiTO, https://purl.org/spar/cito) (Peroni & Shotton, 2012), an
ontology for the characterization of factual and rhetorical bibliographic citations. We used the
decision model developed and adopted in Heibi and Peroni (2021a) to decide which citation
function to select to label an in-text citation. Figure 4 shows part of the decision model; it
presents the case when the intent of the citation is “Reviewing and eventually giving an opinion
on the cited entity” and the citation function is part of one of the following groups: “Consistent
with,” “Inconsistent with,” or “Talking about.”
We do not introduce the full details of the labeling process due to space constraints; the
complete diagram of the decision model is available in Heibi (2022), and an extensive intro-
duction and explanation can be found in Heibi and Peroni (2022a).
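The three annotations attached to each in-text citation, and the yes/no flag derived for each citing entity, can be represented with a small data structure. This is a minimal sketch: the class `InTextCitation`, its field names, and the sample values are hypothetical. The intent strings use CiTO property names only as examples, and the decision model that selects them is not reproduced.

```python
from dataclasses import dataclass

SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class InTextCitation:
    citing_id: str             # identifier of the citing entity
    sentiment: str             # "positive", "negative", or "neutral"
    mentions_retraction: bool  # does the context mention the retraction?
    intent: str                # a CiTO citation function

    def __post_init__(self):
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment}")


def entities_mentioning_retraction(citations):
    """Flag each citing entity with yes/no.

    An entity is flagged "yes" if at least one of its in-text citation
    contexts explicitly mentions the retraction.
    """
    flags = {}
    for c in citations:
        flags[c.citing_id] = flags.get(c.citing_id, False) or c.mentions_retraction
    return {entity: "yes" if flag else "no" for entity, flag in flags.items()}


# Hypothetical annotations, for illustration only.
citations = [
    InTextCitation("A", "neutral", False, "cito:discusses"),
    InTextCitation("A", "negative", True, "cito:critiques"),
    InTextCitation("B", "positive", False, "cito:usesMethodIn"),
]
entity_flags = entities_mentioning_retraction(citations)
```

Keeping the annotation at the level of the in-text citation and deriving the entity-level flag from it avoids storing the same judgment twice.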
3. RESULTS AND ANALYSIS
We have produced an annotated data set containing 678 citing entities and 1,020 in-text
citations of 72 retracted publications. We have published a dedicated web page
(https://ivanhb.github.io/ret-analysis-hum-results/) embedding visualizations that enable the readers
to view and interact with the results, also available in Heibi and Peroni (2021b).
In the following sections, we introduce some important concepts adopted in the description
and organization of our results. Then we show the results of quantitative and qualitative
analyses of all the data we collected.
3.1. Data Organization
We defined three periods to distribute the citations of retracted publications:
• Period P-Pre: from the year of publication of the retracted work to the year before its full
retraction (the year of the retraction is not part of this period).
• Period P-Ret: the year of the full retraction.
• Period P-Post: from the year after the full retraction to the year of the last citation
received by the retracted publication, according to the citation data we gathered.
Figure 4. Part of the decision model for the selection of a CiTO (Citation Typing Ontology) citation
function for annotating the citation intent of an examined in-text citation based on its citation
context. The first large row contains one of the three macro categories (“Reviewing …”); each macro
category has a set of subcategories such that each subcategory refers to a set of citation functions.
The first row defines what citation functions are suitable for it through the help of a guiding sentence
that needs to be completed according to the chosen subcategory and citation function.
Each citing entity falls under one of the above three periods. The two periods P-Pre and
P-Post were split into fifths, labeled “[−1.00, −0.61],” “[−0.60, −0.21],” “[−0.20, 0.20],”
“[0.21, 0.60],” and “[0.61, 1.00].” When the citing entity is part of either P-Pre or P-Post, then
it is also part of a specific fifth, which identifies how close or far that entity is to or from the
events that define the period.
The division into fifths helped us define a uniform time span to locate the citing entities
independently of the year of retraction of the work they cite and the publication years of
the citing and cited entities⁶. For instance, if an entity A published in 2011 had cited a
retracted publication R published in 2002, fully retracted in 2012, then A is part of the last
fifth (i.e., “[0.61, 1.00]”) of P-Pre. This means that A has cited R in the last fifth, immediately
before the formal retraction of R.
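The assignment of a citing entity to a period and to a fifth can be sketched as follows. The exact calculation is documented in Heibi and Peroni (2022a); the linear rescaling of years onto the interval from −1 to 1 used here is only an assumption that happens to reproduce the worked example above, and the function names are hypothetical.

```python
FIFTHS = [
    "[−1.00, −0.61]",
    "[−0.60, −0.21]",
    "[−0.20, 0.20]",
    "[0.21, 0.60]",
    "[0.61, 1.00]",
]
UPPER_BOUNDS = [-0.61, -0.21, 0.20, 0.60, 1.00]


def period_of(citing_year, retraction_year):
    """Return the period of a citing entity relative to the retraction year."""
    if citing_year < retraction_year:
        return "P-Pre"
    if citing_year == retraction_year:
        return "P-Ret"
    return "P-Post"


def fifth_of(citing_year, first_year, last_year):
    """Locate a year inside a period and return the label of its fifth.

    Assumption: years are rescaled linearly so that first_year maps to -1
    and last_year maps to 1; a single-year period maps to 0.
    """
    if not first_year <= citing_year <= last_year:
        raise ValueError("citing_year is outside the period")
    if first_year == last_year:
        position = 0.0
    else:
        span = last_year - first_year
        position = -1 + 2 * (citing_year - first_year) / span
    position = round(position, 2)
    for label, upper in zip(FIFTHS, UPPER_BOUNDS):
        if position <= upper:
            return label


# Worked example from the text: R published in 2002 and retracted in 2012,
# so P-Pre spans 2002 to 2011; entity A was published in 2011.
period = period_of(2011, 2012)
fifth = fifth_of(2011, 2002, 2011)
```

For P-Post, the same function would be called with the year after the retraction as `first_year` and the year of the last citation received as `last_year`.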
3.2. Descriptive Statistics
We have classified the distribution of the citing entities in the three periods (i.e., P-Pre, P-Ret,
and P-Post) as a function of the humanities disciplines used in Retraction Watch, as shown in
Figure 5. Religion was the discipline that received the highest number of citations (375), and
history had the highest number of retracted items (20).
6 A detailed explanation regarding the calculation of the periods is discussed in Heibi and Peroni (2022a).
Figure 5. The number of citing entities in P-Pre (before the year of retraction), P-Ret (in the year of
retraction), and P-Post (after the year of retraction) for each humanities discipline assigned
to the retracted publications in Retraction Watch.
In Figure 6 we have classified the entities citing a retracted publication in each discipline
according to their subject areas. Arts and humanities and Social sciences (AH&SS) were highly
represented in both the P-Pre and P-Post periods of almost all the retracted publications’
disciplines. However, we noticed some exceptions to this rule in P-Pre in Journalism (10%
of citing entities were AH&SS publications), P-Post in Arts (13% AH&SS publications), and
P-Pre and P-Post of Architecture (no AH&SS publications in either period).
Because we expected, as also highlighted in previous studies (e.g., Ngah & Goi, 1997), that
a good part of the citations of humanities publications come from AH&SS publications, we
Figure 6. The subject areas distribution of the citing entities of the retracted publications in P-Pre (before the year of retraction) and P-Post
(after the year of retraction) for each different humanities discipline as specified in Retraction Watch. The number of citing entities is mentioned
between brackets.
decided to look more deeply into the obtained results before moving on to the next stage. As
shown in Figure 5, we noticed that Journalism has a completely different behavior compared
to the other disciplines. Indeed, the citing entities in Journalism cited three retracted
publications: two with a hum_affinity of 3, and one with a hum_affinity of 2. The latter article was
“Personality, stress and disease: Description and validation of a new inventory”
(Grossarth-Maticek & Eysenck, 1990). This article has 130 citations (almost 95% of all the citations in
Journalism). Retraction Watch has labeled this article with two additional subject areas,
Public Health and Safety and Sociology; therefore, Journalism represents its only humanities
subject. A further investigation of the full text of the paper revealed that this article is
highly related to health sciences, and Journalism has a marginal (almost absent) relevance in it.
Considering these facts, we felt that this article could introduce a significant bias in
our analysis. Therefore, to limit its impact on the results, we decided to exclude it from our
analysis.
As a further check, we investigated all the retracted publications of the humanities
disciplines in Figure 6 that received less than 20% of their citations from Arts and humanities
publications in either P-Pre or P-Post. Arts and Architecture are the two disciplines falling in this
category. After a manual check, we detected the article “A systematic review on
postimplementation evaluation models of enterprise architecture artefacts” (Nikpay, Ahmad et al.,
2020), classified under Architecture; yet while reading its full text, we found little evidence
supporting the proposed labeling, as it was a computer science study. Therefore, we decided to
also exclude this article from our analysis.
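The 20% screening step can be expressed as a simple filter over per-discipline shares. This is a minimal sketch: the function name `flag_low_share` and the data layout are hypothetical. The Arts P-Post share (13%) and the Architecture shares (0%) come from the text, while the remaining values are placeholders chosen only to make the example run.

```python
def flag_low_share(shares, threshold=0.20):
    """Return the disciplines whose share is below the threshold in
    P-Pre or P-Post, sorted alphabetically."""
    flagged = []
    for discipline, periods in shares.items():
        values = [periods[p] for p in ("P-Pre", "P-Post") if p in periods]
        if any(value < threshold for value in values):
            flagged.append(discipline)
    return sorted(flagged)


# Share of citing entities from Arts and humanities publications.
shares = {
    "Arts": {"P-Pre": 0.45, "P-Post": 0.13},         # P-Pre is a placeholder
    "Architecture": {"P-Pre": 0.0, "P-Post": 0.0},
    "History": {"P-Pre": 0.70, "P-Post": 0.65},      # placeholders
}
to_review = flag_low_share(shares)
```

The filter only selects candidates; as in the text, the decision to exclude a publication still rests on a manual reading of its full text.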
After this data refinement, our final data were reduced to 546 citing entities and 786 in-text
citations of 70 retracted publications. Considering the final data and the classification of the
retracted publications based on their humanities discipline, we investigated another aspect: In
Figure 7 we have plotted the total number of citations gained by each humanities discipline as
a function of the number of years passed after the date of retraction. This trend is compared to
the average time of retraction for each humanities discipline. From Figure 7, we noticed that
Figure 7. The total number of citations gained by the retracted publications, grouped according to
their humanities discipline (represented by different colors), as a function of the number of years
passed after their date of retraction. The vertical dotted lines represent the average time of retraction
of each humanities discipline. The gray line sums up all the humanities disciplines together.
on average, disciplines such as religion and philosophy reached their citation peak in the year
before their retraction, whereas the opposite holds for history, arts, and architecture.
To infer other interesting statistics regarding the obtained results, we treated the citing enti-
ties and the in-text citations they contain as two different classes, and we present descriptive
statistics of these two classes in the following subsections.
3.2.1. Citing entities
We examined the distribution of the entities citing retracted publications as a function of
two features: the periods (i.e., P-Pre, P-Ret, and P-Post), further classified into entities that
mentioned the retraction, entities that did not, and entities whose full text we could not
access; and their subject areas. The results are shown in Figure 8.
The number of citing entities before the retraction (192, period P-Pre) was lower than the
number of citing entities after the retraction (260, period P-Post). Across P-Pre and P-Ret, we
noticed a continuous increase in the overall number of citing entities, which suddenly
Figure 8. A descriptive statistical summary of the distribution of the citing entities to retracted publications in the three periods (P-Pre, P-Ret,
and P-Post; i.e., before/during/after the year of retraction), also considering their subject areas. The bar charts on top highlight the citing entities
that either did or did not mention the retraction and those for which we could not retrieve the full text.
started decreasing after the first fifth of P-Post, yet the numbers remained in line with those
observed in the third and fourth fifths of P-Pre. The last fifth of P-Post is an exception to the
declining trend, with an unexpectedly high peak. This peak arose because 27 retracted
items received only one citation in P-Post and, in these cases, that citation was necessarily
the last citation received, which marks the final border of P-Post.
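The effect described above can be illustrated with a minimal sketch. The period labels follow the definition used in the figures (before/during/after the year of retraction); the way a period is cut into fifths is an assumption made here for illustration (equal bins over the span between the first and last year of the period), and the function names and the example years are hypothetical.

```python
def period(citing_year: int, retraction_year: int) -> str:
    """Label a citation by its position relative to the year of retraction."""
    if citing_year < retraction_year:
        return "P-Pre"
    if citing_year == retraction_year:
        return "P-Ret"
    return "P-Post"


def fifth(citing_year: int, start_year: int, end_year: int) -> int:
    """Return the fifth (1..5) of the span [start_year, end_year] a year falls in.

    Assumption: the span is normalized to [0, 1] and cut into five equal bins;
    a span of a single year is treated as lying on its final border (fifth 5).
    """
    if not start_year <= citing_year <= end_year:
        raise ValueError("citing_year lies outside the period")
    if start_year == end_year:
        return 5
    position = (citing_year - start_year) / (end_year - start_year)
    return min(5, int(position * 5) + 1)


# Hypothetical item retracted in 2012 whose only later citation dates from 2017:
# P-Post runs from 2013 to the last citation received, so that citation
# necessarily sits on the final border of the period.
retraction_year = 2012
post_citations = [2017]
start, end = retraction_year + 1, max(post_citations)
print([(period(y, retraction_year), fifth(y, start, end)) for y in post_citations])
```

Under these assumptions, any item with a single citation in P-Post contributes to the last fifth, which is consistent with the peak reported for that bin.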
The full text of 8.42% of the citing entities was not accessible. Among those whose full text
we retrieved, only a small share mentioned the retraction of the cited entity: 2.25% of the
total number of citing entities in P-Ret and P-Post.
Looking at their subject areas, we noticed that the citing entities spread into a
larger number of subject areas (i.e., an additional nine) in P-Post compared with P-Pre, where
the residual category Others contained 16% of the citing entities. The Arts and humanities
subject area had a similar percentage throughout all three periods (22.94%, 18.42%, and
18.14%) and, together with Social sciences, was one of the two most represented subject
areas in P-Ret and P-Post. We also noticed a marked drop in Psychology, from 15.41% in
P-Pre to 4.42% in P-Post.
3.2.2. In-text citations
We focused on the distribution of the in-text citations as a function of three features: the
periods (i.e., P-Pre, P-Ret, and P-Post); the citation intent; and the section containing the in-
text citation. The results of the three distributions have been further classified according to the
in-text citation sentiment (i.e., negative/neutral/positive), as shown in Figure 9.
The overall trend in the number of in-text citations during the three periods was close to the
one we observed for the citing entities (shown in the previous section), although the
differences between P-Pre and P-Post were even more marked. As explained in the previous
section, the peak in the last fifth of P-Post was due to the retracted items receiving only one
citation in P-Post. Although the overall percentage of negative citations was low, negative
citations were more frequent in P-Pre (4.5%). Generally, most in-text citations were tagged as
neutral, and very few were positive (0.75%).
The citation intents “obtains background from” and “cites for information” were the two
most dominant ones in the three periods, and they represented 31.29% and 22.64% of the
total number of in-text citations, respectively. The share of the citation intent “cites for
information” increased from 17.8% in P-Pre to 27.20% in P-Post.
Considering the citation sections, we can clearly see that the in-text citations were mostly
located in the “Introduction” section in all three periods. The share of in-text citations in the
“Introduction” section decreased considerably, from 30.15% in P-Pre to 22.13% in P-Post.
In contrast, the in-text citations contained in the “Discussion” section showed an increasing
trend, from 6.87% in P-Pre to 15.20% in P-Post.
3.3. Topic Models of Citing Entities’ Abstracts and their Citation Contexts
Topic modeling is a statistical modeling approach for automatically discovering the
topics (each represented as a set of words) that occur in a collection of documents. We applied
it to our data to understand how the topics evolved over time and whether this evolution
depended, in some way, on the retraction of the publications considered.
A standard workflow for building a topic model is based on three main steps: tokenization,
vectorization, and topic model creation. The topic model we built is based on the Latent
Figure 9. A descriptive statistical summary for the distribution of the in-text citations contained in the citing entities to the retracted publi-
cations in the three periods (P-Pre, P-Ret, and P-Post, i.e., before/during/after the year of retraction), according to their intent, and section. The
sentiment of the in-text citations is also highlighted.
Dirichlet Allocation (LDA) model (Jelodar, Wang et al., 2019). In the tokenization step, we
converted the text into a list of words by removing punctuation, unnecessary characters,
and stop words, and we lemmatized and stemmed the extracted tokens. In the second
step, we vectorized the generated tokens using a Bag-of-Words (BoW)
model (Brownlee, 2019), which we considered appropriate for our study given
our direct experience in previous work (Heibi & Peroni, 2021a) and the suggestions by
Bengfort, Bilbro, and Ojeda (2018) on the same issue. Finally, to build the LDA topic model,
we determined in advance the number of topics to retrieve from the examined corpus
using a popular method based on the topic coherence score, as suggested by
Schmiedel, Müller, and vom Brocke (2019); this score measures the degree of
semantic similarity between the high-scoring words in a topic.
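The first two steps of this workflow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the MITAO implementation: the stop-word list is a tiny stand-in, lemmatization and stemming are left out because they require an external library, and the function names and example sentences are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a full English list.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to", "was", "were"}


def tokenize(text: str, extra_stop_words: frozenset = frozenset()) -> list[str]:
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    blocked = STOP_WORDS | extra_stop_words
    return [w for w in words if w not in blocked]


def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one term-count vector per document."""
    vocabulary = sorted({token for doc in docs for token in doc})
    index = {token: i for i, token in enumerate(vocabulary)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for token, count in Counter(doc).items():
            row[index[token]] = count
        vectors.append(row)
    return vocabulary, vectors


docs = [
    "The retraction of the article was discussed.",
    "Results: the article cites a retracted article.",
]
# "results" plays the role of an ad hoc stop word added for the study.
tokens = [tokenize(d, frozenset({"results"})) for d in docs]
vocabulary, vectors = bag_of_words(tokens)
print(vocabulary)  # ['article', 'cites', 'discussed', 'retracted', 'retraction']
print(vectors)     # [[1, 0, 1, 0, 1], [2, 1, 0, 1, 0]]
```

The resulting count vectors are the kind of input an LDA implementation expects; word order is discarded, which is the defining trait of the BoW representation.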
We built and executed two LDA topic models: one using the abstracts of the entities citing
the retracted publications (with 16 topics), named TM-Abs, and another using the citation
contexts that contained the in-text reference pointers to retracted publications (with 20
topics), named TM-Cits. To create the topic models, we used MITAO (Ferri, Heibi et al.,
2020) (https://github.com/catarsi/mitao), a visual interface for creating customizable visual
workflows for text analysis. With MITAO, we generated two visualizations: Latent
Dirichlet Allocation Visualization (LDAvis) (Sievert & Shirley, 2014) for an overview of the
topic modeling results, and Metadata-Based Topic Modeling Visualization (MTMvis) for a
dynamic and interactive visualization of the topics based on customizable metadata.
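The numbers of topics (16 and 20) were chosen through a coherence score. The text does not state which coherence measure was used, so the sketch below swaps in the UMass measure purely because it can be computed from document co-occurrence counts alone; the function names, the toy documents, and the candidate topics are hypothetical, and in practice the candidates would come from LDA models trained with different numbers of topics.

```python
import math


def umass_coherence(top_words: list[str], docs: list[set[str]]) -> float:
    """UMass coherence of one topic: the sum, over ordered pairs of its top
    words, of log((D(w_m, w_l) + 1) / D(w_l)), where D counts documents."""
    def doc_freq(*words: str) -> int:
        return sum(1 for doc in docs if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denominator = doc_freq(top_words[l])
            if denominator == 0:
                continue  # the word never occurs in the corpus; skip the pair
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / denominator)
    return score


def best_topic_count(candidates: dict[int, list[list[str]]], docs: list[set[str]]) -> int:
    """Return the number of topics whose topics have the highest mean coherence."""
    def mean_coherence(topics: list[list[str]]) -> float:
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))


# Toy corpus (each document reduced to its set of tokens).
docs = [
    {"war", "army"},
    {"war", "army", "history"},
    {"music", "therapy"},
    {"music", "therapy", "war"},
]
# Hypothetical top words per topic for two candidate models.
candidates = {
    2: [["war", "army"], ["music", "therapy"]],
    3: [["war", "music"], ["army", "therapy"], ["history", "music"]],
}
print(best_topic_count(candidates, docs))  # 2
```

Topics whose top words tend to appear in the same documents score higher, so the candidate with two topics wins in this toy case.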
3.3.1. Citing entities abstracts
The total number of available abstracts in our data set was 509. We extended the list of
MITAO’s default English stop words (“the”, “is”, etc.) with ad hoc stop words devised for
our study, such as “method,” “results,” and “conclusions,” which represent the typical words
that might be part of a structured abstract.
Figure 10 shows the topic distribution represented in the two-dimensional space of LDAvis.
Using the LDAvis interface, we set the parameter λ to 0.3 to determine the weight given to the
Figure 10. The 16 topics of TM-Abs (LDA topic modeling on the abstracts of the citing entities).
The visualization is taken from LDAvis, and it shows the topic distribution in a two-dimensional
space.
probability of a term under a specific topic relative to its lift (Sievert & Shirley, 2014), and
retrieved the 30 most relevant terms of each topic. We interpreted and titled each topic by
analyzing its related terms; we omit these terms here due to space constraints, but they are
available in Heibi and Peroni (2021b). Topic 6 (“Leadership organization,
and management”) was the dominant topic. The topics were distributed in four main clusters,
as shown in Figure 10:
• one composed of topics 2 (“Sociopolitical issues related to leadership”) and 6, concerning
issues related to leadership, work organization, and management from a sociopolitical
point of view;
• a large one composed of topics 1 (“Sociopolitical issues possibly related to Vietnam”), 4
(“History of the Jewish culture”), 5 (“Music and psychological diseases”), 11 (“Family and
religion”), etc., which covers several subjects from different domains close to social
sciences, political sciences, and psychology; and
• another two clusters composed of one topic each: topic 16 (“Geography and climatic
issues”) and topic 3 (“Colonial history”).
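The relevance ranking used to retrieve the 30 terms of each topic follows the definition by Sievert and Shirley (2014): λ weights the log probability of a term under a topic against the log of its lift. A minimal sketch is given here; the function names and the probabilities are hypothetical and serve only to show how a low λ such as 0.3 favors topic-specific terms.

```python
import math


def relevance(p_term_given_topic: float, p_term: float, lam: float = 0.3) -> float:
    """Relevance of a term for a topic:
    lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w)),
    where the ratio p(w|t) / p(w) is the lift of the term."""
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)


def top_terms(topic_probs: dict[str, float], corpus_probs: dict[str, float],
              lam: float = 0.3, n: int = 30) -> list[str]:
    """Rank the terms of one topic by relevance and keep the first n."""
    ranked = sorted(
        topic_probs,
        key=lambda w: relevance(topic_probs[w], corpus_probs[w], lam),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical values: "study" is frequent everywhere, while "leadership"
# and "management" are rarer overall but concentrated in this topic.
topic_probs = {"study": 0.10, "leadership": 0.06, "management": 0.04}
corpus_probs = {"study": 0.09, "leadership": 0.01, "management": 0.008}

print(top_terms(topic_probs, corpus_probs, lam=0.3, n=2))  # ['leadership', 'management']
print(top_terms(topic_probs, corpus_probs, lam=1.0, n=2))  # ['study', 'leadership']
```

With λ = 1 the ranking reduces to plain term probability, so generic words dominate; lowering λ moves distinctive words to the top, which makes the topics easier to label.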
Figure 11 shows the chart generated using MTMvis. We plotted the topic distribution as a
function of the three periods. At first glance, topics 6 and 16 increased their
share across the three periods. On the other hand, topics 1 and 11 decreased their
percentage throughout the three periods.
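One plausible way to obtain a chart of this kind is to sum the document-topic weights produced by the topic model within each period and normalize them to percentages. The sketch below assumes this aggregation for illustration only; it is not taken from the MTMvis source, and the function name and the weights are hypothetical.

```python
def topic_share_by_period(doc_topics: list[list[float]],
                          periods: list[str]) -> dict[str, list[float]]:
    """Sum the document-topic weights of the documents in each period and
    express the result as percentages that add up to 100 per period."""
    sums: dict[str, list[float]] = {}
    for weights, period in zip(doc_topics, periods):
        acc = sums.setdefault(period, [0.0] * len(weights))
        for i, weight in enumerate(weights):
            acc[i] += weight
    shares = {}
    for period, acc in sums.items():
        total = sum(acc)
        shares[period] = [100 * value / total for value in acc]
    return shares


# Hypothetical document-topic weights for four abstracts and two topics.
doc_topics = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
periods = ["P-Pre", "P-Pre", "P-Post", "P-Post"]

for period, share in topic_share_by_period(doc_topics, periods).items():
    print(period, [round(s, 1) for s in share])
```

In this toy case the first topic loses share between P-Pre and P-Post while the second gains it, which is the kind of shift read off the chart for the topics named above.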
3.3.2. In-text citation contexts
The total number of in-text citation contexts in our data set, used as input to produce the sec-
ond topic model, was 786. As we did with the abstracts, we have defined and used a list of ad
Figure 11. The MTMvis chart created over the 16 topics of TM-Abs (LDA topic modeling on the abstracts of the citing entities). The topics are
plotted as a function of the three periods (represented on the x-axis).
Figure 12. The 20 topics of TM-Cits (LDA topic modeling on the in-text citation contexts). The
visualization is taken from LDAvis and shows the topic distribution in a two-dimensional space.
hoc stop words, which included all the given and family names of the authors of the cited
publications.
Figure 12 shows the topics represented in the two-dimensional space of LDAvis. As we
did for the abstracts’ topic modeling, we set λ to 0.3 and interpreted each topic by
analyzing its 30 most relevant terms (Heibi & Peroni, 2021a, 2021b). In this case, we noticed that
the topics overlap less and are spread more widely across the axes of the
visualization. Topic 12 (“Leadership organization, and management”) was the most represented
(11.7%) and lay far from the other topics. The bottom right part of the
graphic, with topics 2 (“Countries in conflict”), 15 (“War and terrorism”), 17 (“War and
history”), 18 (“History of Europe”), and 20 (“War and army conflicts”), is mostly close to
history studies, especially discussions of armed conflicts. The top part of the graphic contains
several single-topic clusters, such as topics 5 (“Gender social issues”) and 9 (“Geography and
climatic issues”).
Figure 13 shows the chart generated using MTMvis, where we plotted the topic distribution
as a function of the three periods. We noticed a continuous decrease in topics 7 (“Family and
religion”) and 18 across the three periods. Topic 3 (“Drugs/alcohol and psychological
diseases”) dropped sharply immediately after P-Ret. On the other hand, we noticed an
Figure 13. The MTMvis chart created over the 20 topics of TM-Cits (LDA topic modeling on the in-text citation contexts). The topics are
plotted as a function of the three periods (represented on the x-axis).
increase in topics 5, 9, and 11 (“Music and psychological diseases”), although topic 11
had a higher percentage in P-Ret than in P-Post.
4. DISCUSSION AND CONCLUSION
In this section, we address separately each of our research questions RQ1–RQ3 presented in
Section 1. We conclude the section by discussing the limits of our work and by sketching out
some future work that might help us overcome these issues.
4.1. Answering RQ1: Citing Retracted Publications in the Humanities
It seems that, on average, retracted publications in the humanities did not experience a drop in
citations after their retraction (Figure 8), and only 2.25% of the citing entities (five Arts and
humanities publications and three related to health sciences subject areas, e.g., medicine,
psychology, nursing) mentioned the retraction in the citation context. In addition, we noticed that
the negative perception of a retracted work, although limited in the data we have, occurred
before its retraction if the cited entity had a low affinity to the humanities domain. The fact that
we reported few negative citations in P-Post is consistent with other studies (Bordignon, 2020;
Luwel et al., 2019; Schneider et al., 2020).
Citing entities that mentioned the retraction usually discussed the cited entity rather than
obtaining background material or generic information from it (Figure 14). Most of the in-text
citations marked as discusses occurred in the Discussion section (as shown in Figure 15), and
from TM-Cits we noticed the emergence of topic 6 (“The retraction phenomenon”) in Discussion
sections only in P-Post. In other words, the retraction was not mentioned in the Discussion
section before the retraction, and the retraction event may have triggered more
discussion among the citing entities.
From the distribution of the subject areas of the citing entities over the three periods
(Figure 8), we noticed that Social sciences and Arts and humanities had almost the same
Figure 14. The distribution of topic 6 (“The retraction phenomenon”) of TM-Cits (LDA topic modeling on the in-text citation contexts) over
the three periods for the four citation intents that have been used the most.
percentages in the P-Ret and P-Post periods, which were lower than their percentages in P-Pre,
suggesting that the retraction event did have an impact on these subject areas. However, the
share of other subject areas, such as psychology, decreased in P-Ret and even more in P-Post,
which may indicate a higher concern in these subject areas about citing retracted publications.
This is supported by the TM-Abs topic distribution for the citing entities
assigned to psychology (Figure 16), which shows a clear decrease in the topics related to health
sciences, such as topics 10 and 11, whereas others, such as topics 6 and 9 (close to sociohistorical
discussions with no relation to health sciences), increased their presence in P-Ret and P-Post. In
other words, not only did the overall number of citing entities from the health sciences domain
decrease after the retraction, but their subject areas also moved from the health sciences domain
to subjects that are closer to the Social sciences and Arts and humanities domains.
4.2. Answering RQ2: Citation Behaviors in the Humanities
As shown in Figure 6, Religion and History had a very similar distribution pattern. In both, the
citing entities belonging to Social sciences decreased markedly in P-Post, and for that
Figure 15. The distribution of the main (positional sections are not included, e.g., first section) in-
text citation sections over the three periods. The percentages of in-text citations having a corre-
sponding annotated main section for each period (i.e., P-Pre, P-Ret, and P-Post) are respectively
50.76%, 56.68%, and 61.86%.
Figure 16. A filtered MTMvis to show the distribution of the topics of TM-Abs (LDA topic modeling
on the abstracts of the citing entities) as a function of the three periods. The visualization is built
considering only the documents (i.e., abstracts) that have Psychology as subject areas.
period, the TM-Cits of these entities does not include topic 3 (“Drugs/alcohol and psychological
diseases”) for Religion and topic 7 (“Family and religion”) for History. We can speculate that
the share of Social sciences studies decreased significantly because of a higher concern toward
sensitive social subjects such as healthcare, family, and religion.
Arts had the highest number of citations in P-Post, although we reported a marked drop
in the Arts and humanities citing entities, in favor of subject areas such as Medicine, Nursing,
and Engineering (Figure 6). For Philosophy, on the other hand, the situation was completely
different: Citing entities labeled as Arts and humanities increased considerably in P-Post at the
expense of citing entities from Psychology. For the Arts discipline, topic 11 (“Music and
psychological diseases”) of TM-Cits is the reason for the positive trend in P-Post. In other words,
arts (and especially music) were discussed in relation to psychological and medical
diseases.
In Figure 17, we show the distribution of topic 6 (“The retraction phenomenon”) as a
function of the three periods for the four humanities disciplines with the highest
Figure 17. The distribution of topic 6 (“The retraction phenomenon”) of TM-Cits (LDA topic
modeling on the in-text citation contexts) over the three periods for the humanities disciplines Reli-
gion, History, Arts, and Philosophy.
number of citing entities. Topic 6 increased considerably in P-Post in Philosophy and remained
steady in Religion, whereas History and Arts peaked in P-Ret and had a lower, yet relatively high,
percentage in P-Post. These results might suggest that the entities that cite retracted
publications in Philosophy, Arts, and History (which, according to the topic modeling
analysis, produced topics close to STEM disciplines) were those showing major concerns toward
the retraction, in the case of History and Arts starting from the year of the retraction.
Considering these hypotheses, we can interpret the fact that History and Arts reached their
peak of citations after their year of retraction (Figure 7) as a sign of awareness/acknowledgment
of the retraction rather than unaware use of the retracted publications, at least
for part of these citations.
4.3. Answering RQ3: Comparing STEM and the Humanities
Our findings showed that the retraction of humanities publications did not have a negative
impact on the citation trend (Figure 8). According to prior studies, the opposite trend was
observed in other disciplines, such as biomedicine (Dinh et al., 2019) and psychology (Yang & Qi,
2020). However, studies such as Heibi and Peroni (2021a) and Schneider et al. (2020)
also observed that, in the health sciences domain, a single or
a few popular retraction cases were characterized by an increase in citations after the
retraction. This might suggest that the discipline of the retracted publication is not the only
central factor to consider when predicting the citation trend after the retraction. Other factors
might play a crucial role, such as the popularity of and media attention to the retraction case,
as discussed by Mott, Fairhurst, and Torgerson (2019) and Bar-Ilan and
Halevi (2017).
Bar-Ilan and Halevi (2018) analyzed the citations of 995 retracted publications
and found the same growing trend in citations in the postretraction period. However, they
did not analyze the retractions by separate disciplines. As such, we
might consider their results as representing a general trend of retracted publications,
which confirms the general observations we derived from our data. In addition, considering
the results we obtained for the specific humanities disciplines, it seems that the
potential threats and damage from retracted materials were perceived more seriously
by others (i.e., citing entities) when the retracted publications were linked to a sensitive
area of study and to the STEM domain. This final observation highlights the different behaviors
that might occur when a retracted publication is more closely related to STEM.
4.4. Limitations and Future Developments
Our study has some limitations that may have introduced biases. First, compared
with other fields of study, bibliographic metadata in the humanities have limited coverage
in well-known citation databases (Hammarfelt, 2016). This limited coverage constrains
citation analysis in the humanities domain (Archambault & Larivière, 2010). In this
regard, a coverage analysis and comparison of the citations in the humanities domain in COCI
and MAG might be highly valuable. Other data sources, such as OpenAlex (Priem, Piwowar,
& Orr, 2022), a free and open catalog of the world’s scholarly papers, researchers, journals,
and institutions, could be considered. Pragmatically, as far as our study is concerned, we
undoubtedly collected fewer citing entities than those that had in fact cited the retracted
publications. In addition, we considered only open citation data; therefore, the citation
coverage could improve significantly with the addition of nonopen citation data. The availability
of a larger amount of data could have strengthened and improved the quality of our results.
The selection of the retracted publications was another crucial issue, because we faced two
major problems: some inconsistencies in the data provided by Retraction Watch and the
presence of retracted publications labeled as humanities that, on close analysis, actually belonged
to a different discipline. The first descriptive statistical results, our manual check, and the
definition of the humanities affinity score helped us limit the biases arising from these two
issues. However, we could improve our approach by using additional services such as Elsevier’s
ScienceDirect (as done in Bar-Ilan and Halevi, 2018) and by increasing the threshold of
the humanities affinity level to exclude borderline cases.
Citation analysis concerning retraction in the humanities domain has
rarely been discussed in the past; therefore, the discussion of our results included a
comparison with similar works that considered different domains or retraction cases. Such works
did not address the humanities domain or were based either on a single retraction case or on a
limited set of them. Works that considered other domains did not include most of the features that
we analyzed in this work (e.g., the citation intent), which made the comparison with them
difficult. We hope that this study, and others to come in this field, will enable comparisons
and improve the understanding of the retraction phenomenon in the humanities domain.
ACKNOWLEDGMENTS
We would like to thank the editor and the reviewers for taking the time and effort necessary
to review the paper. We sincerely appreciate all valuable suggestions, which helped us to
improve the quality of the paper.
AUTHOR CONTRIBUTIONS
Ivan Heibi: Data curation, Formal analysis, Investigation, Methodology, Software,
Visualization, Writing – Original draft, Writing – Review & editing. Silvio Peroni:
Conceptualization, Project administration, Supervision, Validation, Writing – Review & editing.
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
This work has been partially funded by the European Union’s Horizon 2020 research and
innovation program under grant agreement No 101017452 (OpenAIRE-Nexus).
DATA AVAILABILITY
The data produced in this work (i.e., inputs, annotations, and results) are published and
available on Zenodo (Heibi & Peroni, 2022b).
REFERENCES
Archambault, É., & Larivière, V. (2010). The limits of bibliometrics
for the analysis of the social sciences and humanities literature.
https://ost.openum.ca/files/sites/132/2017/06/WSSR_ArchambaultLariviere.pdf
Ataie-Ashtiani, B. (2018). World map of scientific misconduct. Sci-
ence and Engineering Ethics, 24(5), 1653–1656. https://doi.org
/10.1007/s11948-017-9939-6, PubMed: 28653166
Azoulay, P., Bonatti, A., & Krieger, J. L. (2017). The career
effects of scandal: Evidence from scientific retractions. Research
Policy, 46(9), 1552–1569. https://doi.org/10.1016/j.respol.2017
.07.003
Barbour, V., Kleinert, S., Wager, E., & Yentis, S. (2009). Guidelines
for retracting articles. Committee on Publication Ethics. https://
doi.org/10.24318/cope.2019.1.4
Barde, B. V., & Bainwad, A. M. (2017, June). An overview of
topic modeling methods and tools. In 2017 International Con-
ference on Intelligent Computing and Control Systems (ICICCS)
(pp. 745–750). IEEE. https://doi.org/10.1109/ICCONS.2017.8250563
Bar-Ilan, J., & Halevi, G. (2017). Post retraction citations in context:
A case study. Scientometrics, 113(1), 547–565. https://doi.org/10
.1007/s11192-017-2242-0, PubMed: 29056790
Bar-Ilan, J., & Halevi, G. (2018). Temporal characteristics of
retracted articles. Scientometrics, 116(3), 1771–1783. https://doi
.org/10.1007/s11192-018-2802-y
Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis
with Python: Enabling language-aware data products with
machine learning. O’Reilly Media, Inc.
Boldt, J. (2000). The good, the bad, and the ugly: Should we
completely banish human albumin from our intensive care units?
Anesthesia & Analgesia, 91(4), 887–895. https://doi.org/10.1097
/00000539-200010000-00022, PubMed: 11004043
Bolland, M. J., Grey, A., & Avenell, A. (2021). Citation of retracted
publications: A challenging problem. Accountability in Research,
29(1), 18–25. https://doi.org/10.1080/08989621.2021.1886933,
PubMed: 33557605
Bordignon, F. (2020). Self-correction of science: A comparative
study of negative citations and post-publication peer review.
Scientometrics, 124(2), 1225–1239. https://doi.org/10.1007
/s11192-020-03536-z
Bornemann-Cimenti, H., Szilagyi, I. S., & Sandner-Kiesling, A.
(2016). Perpetuation of retracted publications using the example
of the Scott S. Reuben case: Incidences, reasons and possible
improvements. Science and Engineering Ethics, 22(4), 1063–1072.
https://doi.org/10.1007/s11948-015-9680-y, PubMed: 26150092
Brainard, J. (2018). What a massive database of retracted papers
reveals about science publishing’s “death penalty.” Science, 25
October. https://doi.org/10.1126/science.aav8384
Brownlee, J. (2019). A gentle introduction to the Bag-of-Words
model. https://machinelearningmastery.com/gentle-introduction
-bag-words-model/
Campos-Varela, I., Villaverde-Castañeda, R., & Ruano-Raviña, A.
(2020). Retraction of publications: A study of biomedical journals
retracting publications based on impact factor and journal cate-
gory. Gaceta Sanitaria, 34(5), 430–434. https://doi.org/10.1016/j
.gaceta.2019.05.008, PubMed: 31530483
Candal-Pedreira, C., Ruano-Ravina, A., Fernández, E., Ramos, J.,
Campos-Varela, I., & Pérez-Ríos, M. (2020). Does retraction after
misconduct have an impact on citations? A pre–post study. BMJ
Global Health, 5(11), e003719. https://doi.org/10.1136/bmjgh
-2020-003719, PubMed: 33187964
Casadevall, A., Steen, R. G., & Fang, F. C. (2014). Sources of error
in the retracted scientific literature. The FASEB Journal, 28(9),
3847–3855. https://doi.org/10.1096/fj.14-256735, PubMed:
24928194
Chuang, J., Manning, C. D., & Heer, J. (2012). Termite: Visualization
techniques for assessing textual topic models. In Proceedings of
the International Working Conference on Advanced Visual
Interfaces (pp. 74–77). https://doi.org/10.1145/2254556.2254572
Collier, R. (2011). Shedding light on retractions. Canadian Medical
Association Journal, 183(7), E385–E386. https://doi.org/10.1503
/cmaj.109-3827, PubMed: 21444620
Corbyn, Z. (2012). Misconduct is the main cause of life-sciences
retractions. Nature, 490, 21. https://doi.org/10.1038/490021a,
PubMed: 23038445
Dinh, L., Sarol, J., Cheng, Y., Hsiao, T., Parulian, N., & Schneider, J.
(2019). Systematic examination of pre- and post-retraction
citations. Proceedings of the Association for Information Science
and Technology, 56(1), 390–394. https://doi.org/10.1002/pra2.35
Fang, F. C., & Casadevall, A. (2011). Retracted science and the
retraction index. Infection and Immunity, 79(10), 3855–3859.
https://doi.org/10.1128/IAI.05661-11, PubMed: 21825063
Feng, L., Yuan, J., & Yang, L. (2020). An observation framework for
retracted publications in multiple dimensions. Scientometrics,
125(2), 1445–1457. https://doi.org/10.1007/s11192-020-03702-3
Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A user
friendly and modular software for topic modelling. PuntOorg
International Journal, 5(2), 135–149. https://doi.org/10.19245
/25.05.pij.5.2.3
Gasparyan, A. Y., Ayvazyan, L., Akazhanov, N. A., & Kitas, G. D.
(2014). Self-correction in biomedical publications and the scien-
tific impact. Croatian Medical Journal, 55(1), 61–72. https://doi
.org/10.3325/cmj.2014.55.61, PubMed: 24577829
Gaudino, M., Robinson, N. B., Audisio, K., Rahouma, M.,
Benedetto, U., … Fremes, S. E. (2021). Trends and characteristics
of retracted articles in the biomedical literature, 1971 to 2020.
JAMA Internal Medicine, 181(8), 1118–1121. https://doi.org/10
.1001/jamainternmed.2021.1807, PubMed: 33970185
Grossarth-Maticek, R., & Eysenck, H. J. (1990). Personality, stress
and disease: Description and validation of a new inventory. Psy-
chological Reports, 66(2), 355–373. https://doi.org/10.2466/pr0
.1990.66.2.355, PubMed: 2349321
Halevi, G. (2020). Why articles in arts and humanities are being
retracted? Publishing Research Quarterly, 36(1), 55–62. https://
doi.org/10.1007/s12109-019-09699-9
Hammarfelt, B. (2016). Beyond coverage: Toward a bibliometrics
for the humanities. In M. Ochsner, S. E. Hug, & H.-D. Daniel
(Eds.), Research assessment in the humanities (pp. 115–131).
Springer International Publishing. https://doi.org/10.1007/978-3
-319-29016-4_10
Heibi, I. (2022). A guiding diagram for the selection of a CiTO cita-
tion function for a given in-text citation. Zenodo. https://doi.org
/10.5281/zenodo.7147985
Heibi, I., & Peroni, S. (2021a). A qualitative and quantitative anal-
ysis of open citations to retracted articles: The Wakefield 1998
et al.’s case. Scientometrics, 126(10), 8433–8470. https://doi
.org/10.1007/s11192-021-04097-5, PubMed: 34376878
Heibi, I., & Peroni, S. (2021b). Inputs and results of “A quantitative
and qualitative citation analysis to retracted articles in the human-
ities domain” [Data set]. Zenodo. https://doi.org/10.5281/zenodo
.5639371
Heibi, I., & Peroni, S. (2022a). A protocol to gather, characterize
and analyze incoming citations of retracted articles. PLOS
ONE, 17(7), e0270872. https://doi.org/10.1371/journal.pone
.0270872, PubMed: 35853087
Heibi, I., Peroni, S., & Shotton, D. (2019). Software review: COCI,
the OpenCitations Index of Crossref open DOI-to-DOI citations.
Scientometrics, 121(2), 1213–1228. https://doi.org/10.1007
/s11192-019-03217-6
Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref:
The sustainable source of community-owned scholarly metadata.
Quantitative Science Studies, 1(1), 414–427. https://doi.org/10
.1162/qss_a_00022
Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., … Zhao, L. (2019).
Latent Dirichlet allocation (LDA) and topic modeling: Models,
applications, a survey. Multimedia Tools and Applications, 78(11),
15169–15211. https://doi.org/10.1007/s11042-018-6894-4
Lu, S. F., Jin, G. Z., Uzzi, B., & Jones, B. (2013). The retraction
penalty: Evidence from the Web of Science. Scientific Reports,
3(1), 3146. https://doi.org/10.1038/srep03146, PubMed:
24192909
Luwel, M., van Eck, N. J., & van Leeuwen, T. N. (2019). The
Schön case: Analyzing in-text citations to papers before and
after retraction [Preprint]. SocArXiv. https://doi.org/10.31235
/osf.io/c6mvs
Mongeon, P., & Larivière, V. (2016). Costly collaborations: The
impact of scientific fraud on co-authors’ careers. Journal of the
Association for Information Science and Technology, 67(3),
535–542. https://doi.org/10.1002/asi.23421
Mott, A., Fairhurst, C., & Torgerson, D. (2019). Assessing the impact
of retraction on the citation of randomized controlled trial
reports: An interrupted time-series analysis. Journal of Health
Services Research & Policy, 24(1), 44–51. https://doi.org/10
.1177/1355819618797965, PubMed: 30249142
Mößner, N. (2011). RETRACTED: Thought styles and paradigms: A
comparative study of Ludwik Fleck and Thomas S. Kuhn. Studies
in History and Philosophy of Science Part A, 42(3), 416–425.
https://doi.org/10.1016/j.shpsa.2011.02.001
Ngah, Z. A., & Goi, S. S. (1997). Characteristics of citations used by
humanities researchers. Malaysian Journal of Library & Informa-
tion Science, 2(2), 19–36.
Nikpay, F., Ahmad, R., Rouhani, B. D., & Shamshirband, S. (2020).
RETRACTED ARTICLE: A systematic review on post-implementation
evaluation models of enterprise architecture
artefacts. Information Systems Frontiers, 22(3), 789. https://doi
.org/10.1007/s10796-016-9716-0 (Retraction published 2016,
https://doi.org/10.1007/s10796-016-9716-0)
OpenCitations. (2020). COCI CSV dataset of all the citation data
(p. 18077041949 Bytes) [Data set]. figshare. https://doi.org/10
.6084/M9.FIGSHARE.6741422.V6
Peroni, S., & Shotton, D. (2012). FaBiO and CiTO: Ontologies for
describing bibliographic resources and citations. Journal of Web
Semantics, 17, 33–43. https://doi.org/10.1016/j.websem.2012.08
.001
Peroni, S., & Shotton, D. (2018). Open Citation: Definition. https://
doi.org/10.6084/M9.FIGSHARE.6683855
Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure
organization for open scholarship. Quantitative Science Studies,
1(1), 428–444. https://doi.org/10.1162/qss_a_00023
Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open
index of scholarly works, authors, venues, institutions, and con-
cepts (arXiv:2205.01833). arXiv. https://doi.org/10.48550/arXiv
.2205.01833
Ritchie, A., Robertson, S., & Teufel, S. (2008). Comparing citation
contexts for information retrieval. In Proceedings of the 17th
ACM Conference on Information and Knowledge Management
(pp. 213–222). https://doi.org/10.1145/1458082.1458113
Schmiedel, T., Müller, O., & vom Brocke, J. (2019). Topic modeling
as a strategy of inquiry in organizational research: A tutorial with
an application example on organizational culture. Organiza-
tional Research Methods, 22(4), 941–968. https://doi.org/10
.1177/1094428118773858
Schneider, J., Ye, D., Hill, A. M., & Whitehorn, A. S. (2020).
Continued post-retraction citation of a fraudulent clinical trial
report, 11 years after it was retracted for falsifying data. Sciento-
metrics, 125(3), 2877–2913. https://doi.org/10.1007/s11192-020
-03631-1
Shuai, X., Rollins, J., Moulinier, I., Custis, T., Edmunds, M., &
Schilder, F. (2017). A multidimensional investigation of the
effects of publication retraction on scholarly impact. Journal of
the Association for Information Science and Technology, 68(9),
2225–2236. https://doi.org/10.1002/asi.23826
Sievert, C., & Shirley, K. E. (2014). LDAvis: A method for visualizing
and interpreting topics. https://doi.org/10.13140/2.1.1394.3043
Sternberg, R. J. (2006). RETRACTED ARTICLE: The nature of crea-
tivity. Creativity Research Journal, 18(1), 87–98. https://doi.org/10
.1207/s15326934crj1801_10 (Retraction published 2019,
https://doi.org/10.1080/10400419.2019.1647690)
Suppe, F. (1998). The structure of a scientific paper. Philosophy of
Science, 65(3), 381–405. https://doi.org/10.1086/392651
van der Vet, P. E., & Nijveen, H. (2016). Propagation of errors in
citation networks: A study involving the entire citation network
of a widely cited paper published in, and later retracted from,
the journal Nature. Research Integrity and Peer Review, 1, 3.
https://doi.org/10.1186/s41073-016-0008-5, PubMed:
29451542
Wang, K., Shen, Z., Huang, C., Wu, C.-H., Dong, Y., & Kanakia, A.
(2020). Microsoft Academic Graph: When experts are not
enough. Quantitative Science Studies, 1(1), 396–413. https://doi
.org/10.1162/qss_a_00021
Yang, S., & Qi, F. (2020). How do retractions influence the citations
of retracted articles? In E. Ishita, N. L. S. Pang, & L. Zhou
(Eds.), Digital libraries at times of massive societal transition
(pp. 139–148). Springer International Publishing. https://doi
.org/10.1007/978-3-030-64452-9_12