ARTÍCULO DE INVESTIGACIÓN
h-Index manipulation by undoing merges*
René van Bevern1
, Christian Komusiewicz2
Manuel Sorge4
, and Toby Walsh3
, Hendrik Molter3
, Rolf Niedermeier3
,
un acceso abierto
diario
1Department of Mechanics and Mathematics, Novosibirsk State University, Novosibirsk, Russian Federation
2Fachbereich Mathematik und Informatik, Philipps-Universität Marburg, Marburg, Alemania
3Algorithmics and Computational Complexity, Fakultät IV, TU Berlin, Alemania
4Institute of Informatics, University of Warsaw, Poland
Citación: van Bevern, r., Komusiewicz,
C., Molter, h., Niedermeier, r., Sorge,
METRO., & Walsh, t. (2020). h-Index
manipulation by undoing merges.
Estudios de ciencias cuantitativas, 1(4),
1529–1552. https://doi.org/10.1162
/qss_a_00093
DOI:
https://doi.org/10.1162/qss_a_00093
Recibió: 11 Noviembre 2019
Aceptado: 11 Octubre 2020
Autor correspondiente:
Hendrik Molter
h.molter@tu-berlin.de
Editor de manejo:
Juego Waltman
Derechos de autor: © 2020 René van Bevern,
Christian Komusiewicz, Hendrik Molter,
Rolf Niedermeier, Manuel Sorge, y
Toby Walsh. Published under a
Creative Commons Attribution 4.0
Internacional (CC POR 4.0) licencia.
Palabras clave: article splitting, citation graph, experimental algorithmics, Google Scholar profiles,
NP-hard problems, parameterized complexity
ABSTRACTO
The h-index is an important bibliographic measure used to assess the performance of researchers.
Dutiful researchers merge different versions of their articles in their Google Scholar profile even
though this can decrease their h-index. In this article, we study the manipulation of the h-index by
undoing such merges. In contrast to manipulation by merging articles, such manipulation is harder
to detect. We present numerous results on computational complexity (from linear-time algorithms
to parameterized computational hardness results) and empirically indicate that at least small
improvements of the h-index by splitting merged articles are unfortunately easily achievable.
1.
INTRODUCCIÓN
We suppose that an author has a publication profile, for example in Google Scholar, that consists
of single articles and aims to increase her or his h-index1 by merging articles. This will result in a
new article with a potentially higher number of citations. The merging option is provided by
Google Scholar to identify different versions of the same article, for example a journal version
and its archived version.
Our main points of reference are three publications dealing with the manipulation of the
h-index, particularly motivated by Google Scholar author profile manipulation (de Keijzer & Apt,
2013; Pavlou & Elkind, 2016; van Bevern, Komusiewicz, et al., 2016b). En efecto, we will closely
follow the notation and concepts introduced by van Bevern et al. (2016b) and we refer to this work
for discussion of related work concerning strategic self-citations to manipulate the h-index
(Bartneck & Kokkelmans, 2011; Delgado López-Cózar, Robinson-García, & Torres-Salinas,
* An extended abstract of this article appeared in the proceedings of the 22nd European Conference on
Artificial Intelligence (ECAI ’16; Komusiewicz, van Bevern, et al., 2016a). This full version contains addi-
tional, corrected experimental results, and strengthened hardness results (Teorema 5). The following errors
in the previously performed computational experiments were corrected: (a) The algorithm (Ramsey) for gen-
erating initially merged articles was previously not described accurately. The description is now more
accurate and we consider additional algorithms to avoid bias in the generated instances. (b) Two authors
from the ai10-2011 and ai10-2013 data sets with incomplete data have been used in the computational
experimentos; these authors are now omitted. (C) There were several technical errors in the code relating
to the treatment of article and cluster identifiers of the crawled articles. This led to inconsistent instances
and thus erroneous possible h-index increases. All of these errors have been corrected.
1 The h-index of a researcher is the maximum number h such that he or she has at least h articles each cited at
La prensa del MIT
least h times (Hirsch, 2005).
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
2014; Vinkler, 2013), other citation indices (Egghe, 2006; Pavlou & Elkind, 2016; Woeginger,
2008), and manipulation in general (Faliszewski & Procaccia, 2010; Faliszewski, Hemaspaandra,
& Hemaspaandra, 2010; Oravec, 2017). The main difference between this work and previous
publications is that they focus on merging articles for increasing the h-index (Bodlaender & camioneta
Kreveld, 2015; de Keijzer & Apt, 2013; Pavlou & Elkind, 2016; van Bevern et al., 2016b) or other
indices, such as g-index and the i10-index (Pavlou & Elkind, 2016), while we focus on splitting.
In the case of splitting, we assume that, most of the time, an author will maintain a correct profile
in which all necessary merges are performed. Some of these merges may decrease the h-index. Para
instancia, this can be the case when the two most cited papers are the conference and archived
version of the same article. A very realistic scenario is that at certain times, for example when being
evaluated by their dean2, authors may temporarily undo some of these merges to artificially
increase their h-index. A further point that distinguishes manipulation by splitting from manipula-
tion by merging is that for merging it is easier to detect whether someone cheats too much. This can
be done by looking at the titles of merged articles (van Bevern et al., 2016b). A diferencia de, it is much
harder to prove that someone is manipulating by splitting; the manipulator can always claim to be
too busy or that he or she does not know how to operate the profile.
The main theoretical conclusion from our work is that h-index manipulation by splitting
merged articles3 is typically computationally easier than manipulation by merging. Por eso,
undoing all merges and then merging from scratch might be computationally intractable in some
casos, mientras, in contrast, computing an optimal splitting is computationally feasible. El único
good news in terms of problem complexity (y, in a way, a recommendation) is that, if one were
to use the citation measure “fusionCite” as defined by van Bevern et al. (2016b), then manipulation
is computationally much harder than for the “unionCite” measure used by Google Scholar. En el
practical part of our work, we experimented with data from Google Scholar profiles (van Bevern
et al., 2016b).
1.1. Models for Splitting Articles
We consider the publication profile of an author and denote the articles in this profile by W (cid:1) V,
where V is the set of all articles. Following previous work (van Bevern et al., 2016b), we call these
articles atomic. Merging articles yields a partition P of W in which each part P 2 P with |PAG| ≥ 2 es un
merged article.
Given a partition P of W, the aim of splitting merged articles is to find a refined partition R of P
with a larger h-index, where the h-index of a partition P is the largest number h such that there are
at least h parts P 2 P whose number μ(PAG) of citations is at least h. Herein, we have multiple
possibilities of defining the number μ(PAG) of citations of an article in P (van Bevern et al., 2016b).
The first one, sumCite(PAG), was introduced by de Keijzer and Apt (2013), and is simply the sum of the
citations of each atomic article in P. Después, van Bevern et al. (2016b) introduced the cita-
tion measures unionCite (used by Google Scholar), where we take the cardinality of the union of
the citing atomic articles, and fusionCite, where we additionally remove self-citations of merged
articles as well as duplicate citations between merged articles. In generic definitions, we denote
these measures by μ (ver figura 1 for an illustration and Section 2 for the formal definitions). Nota
2 Lesk (2015) pointed out that the h-index is the modern equivalent of the old saying “Deans can’t read, ellos
can only count.” He also remarked that the idea of “least publishable units” by dividing one’s reports into
multiple (corto) papers has been around since the 1970s.
3 Google Scholar allows authors to group different versions of an article. We call the resulting grouping a
merged article. Google Scholar author profiles typically contain many merged articles (p.ej., an arXiv version
with a conference version and a journal version).
Estudios de ciencias cuantitativas
1530
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
Cifra 1. Vertices represent articles, arrows represent citations, and numbers are citation counts.
The articles on a gray background in (a) have been merged in (b)–(d), and citation counts are given
according to the measures sumCite, unionCite, and fusionCite, respectivamente. The arrows represent
the citations counted by the corresponding measure.
eso, to compute these citation measures, we need a citation graph: a directed graph whose
vertices represent articles and in which an arc from a vertex u to a vertex v means that article u
cites article v.
En este trabajo, we introduce three different operations that may be used for undoing merges
in a merged article a:
Atomizing: splitting a into all its atomic articles,
Extracting: splitting off a single atomic article from a, y
Dividing: splitting a into two parts arbitrarily.
See Figure 2 for an illustration of the three splitting operations. Note that the atomizing, extrayendo,
and dividing operations are successively strictly more powerful in the sense that successively
larger h-indices can be achieved. Google Scholar offers the extraction operation. Multiple appli-
cations of the extraction operation can, sin embargo, simulate atomizing and, together with merging,
also dividing.
The three splitting operations lead to three problem variants, each taking as input a citation
graph D = (V, A), a set W (cid:1) V of articles belonging to the author, a partition P of W that defines
already-merged articles, and a nonnegative integer h denoting the h-index to achieve. For μ 2
{sumCite, unionCite, fusionCite}, we define the following problems.
ATOMIZING(μ)
Question: Is there a partition R of W such that
1.
2.
for each R 2 R either |R| = 1 or there is a P 2 P such that R = P,
the h-index of R with respect to μ is at least h?
Cifra 2. Vertices represent articles, arrows represent citations, and numbers are citation counts.
The articles on a gray background have been merged in the initial profile (a) and correspond to
remaining merged articles after applying one operation in (C) y (d). Cada (merged) article has
the same citation count, regardless of the used measure: sumCite, unionCite, and fusionCite.
Estudios de ciencias cuantitativas
1531
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
EXTRACTING(μ)
Question: Is there a partition R of W such that
1.
2.
3.
for each R 2 R there is a P 2 P such that R (cid:1) PAG,
for each P 2 P we have |{R 2 R | R (cid:3) P and |R| > 1}| ≤ 1,
the h-index of R with respect to μ is at least h?
DIVIDING(μ)
Question: Is there a partition R of W such that
1.
2.
for each R 2 R there is a P 2 P such that R (cid:1) PAG,
the h-index of R with respect to μ is at least h?
1.2. Conservative Splitting
We study for each of the problem variants an additional upper bound on the number of merged
articles that are split. We call these variants conservative: If an insincere author would like to
manipulate his or her profile temporarily, then he or she might prefer a manipulation that can
be easily undone. To formally define CONSERVATIVE ATOMIZING, CONSERVATIVE EXTRACTING, y
CONSERVATIVE DIVIDING, we add the following restriction to the partition R: “the number |PAG \ R|
of changed articles is at most k.”
A further motivation for the conservative variants is that, in a Google Scholar profile, un
author can click on a merged article and tick a box for each atomic article that he or she wants
to extract. As Google Scholar uses the unionCite measure (van Bevern et al., 2016b),
Conservative Extracting(unionCite) thus corresponds closely to manipulating the Google
Scholar h-index via few of the splitting operations available to the user.
1.3. Cautious Splitting
For each splitting operation, we also study an upper bound k on the number of split operations.
Following our previous work (van Bevern et al., 2016a), we call this variant cautious. In the case
of atomizing, conservativity and caution coincide, because exactly one operation is performed per
changed article. De este modo, we obtain two cautious problem variants: CAUTIOUS EXTRACTING and CAUTIOUS
DIVIDING. For both we add the following restriction to the partition R: “the number |R| − |PAG| of ex-
tractions (or divisions, respectivamente) is at most k.” In both variants we consider k to be part of the input.
1.4. Our Results
We investigate the parameterized computational complexity of our problem variants with respect to
the parameters “the h-index h to achieve,” and in the conservative case “the number k of modified
merged articles,” and in the cautious case “the number k of splitting operations.” To put it briefly, el
goal is to exploit potentially small parameter values (eso es, special properties of the input instances)
to gain efficient algorithms for problems that are in general computationally hard. In our context, el
choice of the parameter h is motivated by the scenario that young researchers may have an incentive
to increase their h-index and, because they are young, the h-index h to achieve is not very large. El
conservative and cautious scenario tries to capture that the manipulation can easily be undone or is
hard to detect, respectivamente. Por eso, it is well motivated that the parameter k shall be small. Nuestro
teórico (computational complexity classification) results are summarized in Table 1 (ver
Sección 2 for further definitions). The measures sumCite and unionCite behave in basically the same
Estudios de ciencias cuantitativas
1532
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
Mesa 1.
definitions). For all FPT and W[1]-hardness results we also show NP-hardness
Time complexity of manipulating the h-index by splitting operations (mira la sección 2 para
Problema
Atomizing
sumCite / unionCite
Lineal (Teorema 1)
Conservative A.
Lineal (Teorema 1)
fusionCite
†
FPT
(Theorems 5 y 6)
?
W.[1]-h
(Teorema 7)
Extracting
Lineal (Teorema 2)
Conservative E.
Lineal (Teorema 2)
Cautious E.
Lineal (Teorema 2)
Dividing
Conservative D.
Cautious D.
†
FPT
(Teorema 3)
†,‡
FPT
(Teorema 3)
W.[1]-h},(cid:4) (Teorema 4)
NP-h(cid:4) (Teorema 5)
?
?
W.[1]-h
W.[1]-h
(Corolario 1)
(Corolario 1)
NP-h(cid:4) (Proposition 1)
?
?
W.[1]-h
W.[1]-h
(Corolario 1)
(Corolario 1)
†
wrt. parameter h, the h-index to achieve.
} wrt. parameter k, the number of operations.
?
wrt. parameter h + k + s, where s is the largest number of articles merged into one.
‡
NP-hard even if k = 1 (Proposition 1).
(cid:4) Parameterized complexity wrt. h open.
way. En particular, in the case of atomizing and extracting, manipulation is doable in linear time,
while fusionCite mostly leads to (parameterized) intractability; eso es, to high worst-case computa-
tional complexity. Además, the dividing operation (the most general one) seems to lead to com-
putationally much harder problems than atomizing and extracting.
We performed computational experiments with real-world data (van Bevern et al., 2016b) y
the mentioned linear-time algorithms, in particular for the case directly relevant to Google
Scholar; eso es, using the extraction operation and the unionCite measure. Our general findings
are that increases of the h-index by one or two typically are easily achievable with few operations.
The good news is that dramatic manipulation opportunities due to splitting are rare. They cannot
be excluded, sin embargo, and they could be easily executed when relying on standard operations
and measures (as used in Google Scholar). Working with fusionCite instead of the other two could
substantially hamper manipulation.
2. PRELIMINARIES
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
A lo largo de este trabajo, we use n := |V| for the number of input articles and m := |A| for the overall
number of arcs in the input citation graph D = (V, A). By W (cid:1) V we denote the articles in the author
profile that we are manipulating. Let degin(v) denote the indegree of an article v in a citation graph
re = (V, A); eso es, v’s number of citations. Además, let N in
D(v) := {tu | (tu, v) 2 A} denote the set of
D − W(v) := {tu | (tu, v) 2 A ^ u =2 W} be the set of articles outside W that cite v.
articles that cite v and N in
For each part P 2 PAG, the following three measures for the number μ(PAG) of citations of P have been
introducido (van Bevern et al., 2016b). They are illustrated in Figure 1. The measure
X
sumCite Pð Þ :¼
degin vð Þ
Estudios de ciencias cuantitativas
1533
v2P
h-Index manipulation by undoing merges
defines the number of citations of a merged article P as the sum of the citations of the atomic articles
it contains. This measure was proposed by de Keijzer and Apt (2013). A diferencia de, the measure
unionCite Pð Þ :¼
(cid:2)
(cid:2)
(cid:2)
(cid:2)
(cid:2)
N in
D vð Þ
(cid:2)
(cid:2)
(cid:2)
(cid:2)
(cid:2)
[
v2P
defines the number of citations of a merged article P as the number of distinct atomic articles citing
at least one atomic article in P. Google Scholar uses the unionCite measure (van Bevern et al.,
2016b). The measure
fusionCite Pð Þ :¼
(cid:2)
(cid:2)
(cid:2)
(cid:2)
(cid:2)
[
v 2 PAG
N in
D − W vð Þ
(cid:2)
(cid:2)
(cid:2)
(cid:2)
(cid:2) þ
(cid:3)
X
P0 2 P n Pf g
1 if 9v 2 P09w 2 PAG : v; wð
0 de lo contrario
Þ 2 A;
is perhaps the most natural one: At most one citation of a part P 0 2 P to a part P 2 P is counted; eso
es, we additionally remove duplicate citations between merged articles and self-citations of
merged articles.
Our theoretical analysis is in the framework of parameterized complexity (Cygan, Fomin, et al.,
2015; Downey & Fellows, 2013; Flum & Grohe, 2006; Niedermeier, 2006). Eso es, for those prob-
lems that are NP-hard, we study the influence of a parameter, an integer associated with the input,
on the computational complexity. For a problem P, we seek to decide P using a fixed-parameter
algoritmo, an algorithm with running time f (pag) · |I|oh(1), where I is the input and f (pag) is a computable
function depending only on the parameter p. If such an algorithm exists, then P is fixed-parameter
tractable (FPT) with respect to p. W.[1]-hard parameterized problems presumably do not admit FPT
algoritmos. Por ejemplo, the problem of finding an order-k clique in an undirected graph is known
to be W[1]-hard for the parameter k. W.[1]-hardness of a problem P parameterized by p can be
shown via a parameterized reduction from a known W[1]-hard problem Q parameterized by q.
Eso es, a reduction that runs in f (q) · nO(1) time on input of size n with parameter q and produces
instances that satisfy p ≤ f (q) for some function f.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
3. SUMCITE AND UNIONCITE
En esta sección, we study the sumCite and unionCite measures. We provide linear-time algorithms
for atomizing and extracting and analyze the parameterized complexity of dividing with respect to
the number k of splits and the h-index h to achieve. In our results for sumCite and unionCite, nosotros
Algoritmo 1: Atomizing
Input: A citation graph D = (V, A), a set W (cid:1) V of articles, a partition P of W, a nonnegative
integer h and a measure μ.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
Output: A partition R of W.
1 R ;
2 foreach P 2 P do
3 A Atomize(PAG)
4
5
if 9A 2 A: μ(A) ≥ h then R R [ A
else R R [ {PAG}
6 return R
Estudios de ciencias cuantitativas
1534
h-Index manipulation by undoing merges
often tacitly use the observation that local changes to the merged articles do not influence the
citations of other merged articles.
3.1. Manipulation by Atomizing
Recall that the atomizing operation splits a merged article into singletons and that, for the at-
omizing operation, the notions of conservative (touching few articles) and cautious (haciendo
few split operations) manipulation coincide and are thus both captured by CONSERVATIVE
ATOMIZING. Both ATOMIZING and CONSERVATIVE ATOMIZING are solvable in linear time. Intuitivamente,
it suffices to find the merged articles that, when atomized, increase the number of articles with
at least h citations the most. This leads to Algorithms 1 y 2 for ATOMIZING and CONSERVATIVE
ATOMIZING. Herein, the Atomize() operation takes a set S as input and returns {{s} | s 2 S}. El
algorithms yield the following theorem.
Teorema 1. ATOMIZING(μ) and CONSERVATIVE ATOMIZING(μ) are solvable in linear time for μ 2
{sumCite, unionCite}.
Prueba. We first consider ATOMIZING(μ). Let R be a partition created from a partition P by
atomizing a part P* 2 PAG. Observe that for all P 2 P and R 2 R we have that P = R implies
μ(PAG) = μ(R), for μ 2 {sumCite, unionCite}. Intuitivamente, this means that atomizing a single part
P* 2 P does not alter the μ-value of any other part of the partition.
Algoritmo 1 computes a partition R that has a maximal number of parts R with μ(R) ≥ h that can
be created by applying atomizing operations to P: It applies the atomizing operation to each part
PAG 2 P if there is at least one singleton A in the atomization of P with μ(A) ≥ h. By the above
Algoritmo 2:
Conservative Atomizing
Input: A citation graph D = (V, A), a set W (cid:1) V of articles, a partition P of W, nonnegative
integers h and k, and a measure μ.
Output: A partition R of W.
1 R P
2
3
4
5
6
7
8
9
10
11
12
foreach P 2 P do
ℓ
PAG 0
A Atomize(PAG)
P ℓ
ℓ
if μ(PAG) ≥ h then ℓ
PAG + |{A 2 A | μ(A) ≥ h}|
P ℓ
– 1
PAG
for i 1 to k do
P* arg maxP2P{ℓ
if ℓ
P* > 0 entonces
PAG}
A Atomize(P*)
R (R \ {P*}) [ A
ℓ
P* −1
13 return R
Estudios de ciencias cuantitativas
1535
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
observación, this cannot decrease the total number of parts in the partition that have a μ-value of at
least h. Además, we have that for all R 2 R, we cannot potentially increase the number of parts
with μ-value at least h by atomizing R. De este modo, we get the maximal number of parts R with μ(R) ≥ h
that can be created by applying atomizing operations to P.
Obviamente, if R has at least h parts R with μ(R) ≥ h, we face a yes-instance. En cambio, if the
input is a yes-instance, then there is a number of atomizing operations that can be applied to P
such that the resulting partition R has at least h parts R with μ(R) ≥ h and the algorithm finds
such a partition R. Finalmente, it is easy to see that the algorithm runs in linear time.
The pseudocode for solving CONSERVATIVE ATOMIZING(μ) is given in Algorithm 2. Primero, in Lines
2–6, for each part P, Algoritmo 2 records how many singletons A with μ(A) ≥ h are created when
atomizing P. Entonces, in Lines 7–12, it repeatedly atomizes the part yielding the most such singletons.
This procedure creates the maximum number of parts that have a μ-value of at least h, because the
μ-value cannot be increased by exchanging one of these atomizing operations by another.
Obviamente, if R has at least h parts R with μ(R) ≥ h, then we face a yes-instance. En cambio, si
the input is a yes-instance, then there are k atomizing operations that can be applied to P to yield
an h-index of at least h. Because Algorithm 2 takes successively those operations that yield the
most new parts with h citations, the resulting partition R has at least h parts R with μ(R) ≥ h. Es
□
not hard to verify that the algorithm has linear running time.
3.2. Manipulation by Extracting
Recall that the extracting operation removes a single article from a merged article. All variants
of the extraction problem are solvable in linear time. Intuitivamente, in the cautious case, it suffices
to find k extracting operations that each increase the number of articles with h citations. En el
conservative case, we determine for each merged article a set of extraction operations that
increases the number of articles with h citations the most. Then we use the extraction opera-
tions for those k merged articles that yield the k largest increases in the number of articles with
h citations. This leads to Algorithms 3–5 for EXTRACTING, CAUTIOUS EXTRACTING, and CONSERVATIVE
EXTRACTING, respectivamente, which yield the following theorem.
Algoritmo 3:
Extracting
Input: A citation graph D = (V, A), a set W (cid:1) V of articles, a partition P of W, a nonnegative
integer h and a measure μ.
Output: A partition R of W.
1 R ;
2 foreach P 2 P do
3
4
5
6
7
foreach v 2 P do
if μ({v}) ≥ h then
R R [ {{v}}
P P \ {v}
if P 6¼ ; then R R [ {PAG}
8 return R
Estudios de ciencias cuantitativas
1536
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
Algoritmo 4: Cautious Extracting
Input: A citation graph D = (V, A), a set W (cid:1) V of articles, a partition P of W, nonnegative
integers h and k, and a measure μ.
Output: A partition R of W.
1 R ;
2 foreach P 2 P do
3
4
5
6
7
8
foreach v 2 P do
if k > 0 and μ({v}) ≥ h and μ(PAG \ {v}) ≥ h then
R R [ {{v}}
P P \ {v}
k k – 1
if P 6¼ ; then R R [ {PAG}
9 return R
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
Teorema 2. EXTRACTING(μ), CONSERVATIVE EXTRACTING(μ), and CAUTIOUS EXTRACTING(μ) are solv-
able in linear time for μ 2 {sumCite, unionCite}.
Prueba. We first consider EXTRACTING(μ). Let R be a partition produced from P by extracting an
article from a part P* 2 PAG. Recall that this does not alter the μ-value of any other part (es decir., para
all P 2 P and R 2 R, we have that P = R implies μ(PAG) = μ(R) for μ 2 {sumCite, unionCite}).
Algoritmo 5:
Conservative Extracting
Input: A citation graph D = (V, A), a set W (cid:1) V of articles, a partition P of W, nonnegative
integers h and k, and a measure μ.
Output: A partition R of W.
foreach P 2 P do
ℓ
PAG 0
RP ;
foreach v 2 P do
if μ({v}) ≥ h and μ(PAG \ {v}) ≥ h then
RP RP [ {{v}}
P P \ {v}
ℓ
P ℓ
PAG + 1
if P = ; then RP RP [ {PAG}
1
2
3
4
5
6
7
8
9
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
10 P* the k elements of P 2 P with largest ℓP-values
S
11 R
P2P* RP [ (PAG \ P*)
12 return R
Estudios de ciencias cuantitativas
1537
h-Index manipulation by undoing merges
Consider Algorithm 3. It is easy to see that the algorithm only performs extracting operations
and that the running time is linear. So we have to argue that whenever there is a partition R
that can be produced by extracting operations from P such that the h-index is at least h, entonces
the algorithm finds a solution.
We show this by arguing that the algorithm produces the maximum number of articles with at
least h citations possible. Extracting an article that has strictly less than h citations cannot produce
an h-index of at least h unless we already have an h-index of at least h, because the number of
articles with h or more citations does not increase. Extracting an article with h or more citations
cannot decrease the number of articles with h or more citations. Por eso, if there are no articles with
at least h citations that we can extract, we cannot create more articles with h or more citations.
Por lo tanto, we have produced the maximum number of articles with h or more citations when the
algorithm stops.
The pseudocode for solving CAUTIOUS EXTRACTING(μ) is given in Algorithm 4. We perform up to k
extracting operations (Line 6). Each of them increases the number of articles that have h or more
citations by one. As Algorithm 4 checks each atomic article in each merged article, it finds k
extraction operations that increase the number of articles with h or more citations if they exist.
De este modo, it produces the maximum possible number of articles that have h or more citations and that
can be created by k extracting operations.
To achieve linear running time, we need to compute μ(PAG \ {v}) efficiently in Line 4. This can be
done by representing articles as integers and using an n-element array A that stores throughout
the loop in Line 3, for each article w 2 Nin
D[PAG], the number A[w] of articles in P that are cited by
w. Using this array, one can compute μ(PAG \ {v}) en O(degin(v)) time in Line 4, amounting to overall
linear time. The time needed to maintain array A is also linear: We initialize it once in the begin-
ning with all zeros. Entonces, before entering the loop in Line 3, we can in O(|Nin
D(PAG)|) total time store
for each article v 2 Nin
D[PAG], the number A[w] of articles in P that are cited by w. To update the array
within the loop in Line 3, we need O(degin(v)) time if Line 6 aplica. In total, this is linear time.
Finalmente, the pseudocode for solving CONSERVATIVE EXTRACTING(μ) is given in Algorithm 5. For each
merged article P 2 PAG, Algoritmo 5 computes a set RP and the number ℓ
P of additional articles v
with μ(v) ≥ h that can be created by extracting. Then it chooses a set P* of k merged articles P 2 PAG
with maximum ℓP and, from each P 2 P*, extracts the articles in RP.
This procedure creates the maximum number of articles that have a μ-value of at least h while
only performing extraction operations on at most k merges.
Obviamente, if the solution R has at least h parts R with μ(R) ≥ h, then we face a yes-instance.
En cambio, if the input is a yes-instance, then there are k merged articles that we can apply
extraction operations to, such that the resulting partition R has at least h parts R with μ(R) ≥ h.
Because the algorithm produces the maximal number of parts R with μ(R) ≥ h, it achieves an
h-index of at least h.
The linear running time follows by implementing the check in Line 5 en O(degin(v)) time as
described for Algorithm 4 and by using counting sort to find the k parts to extract from in Line 10. □
3.3. Manipulation by Dividing
Recall that the dividing operation splits a merged article into two arbitrary parts. First we con-
sider the basic and conservative cases and show that they are FPT when parameterized by the
h-index h. Then we show that the cautious variant is W[1]-hard when parameterized by k.
DIVIDING(μ) is closely related to H-INDEX MANIPULATION(μ) (van Bevern et al., 2016b; de Keijzer
Estudios de ciencias cuantitativas
1538
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
& Apt, 2013) cual es, given a citation graph D = (V, A), a subset of articles W (cid:1) V, y un
nonnegative integer h, to decide whether there is a partition P of W such that P has h-index h
with respect to μ. de Keijzer and Apt (2013) showed that H-INDEX MANIPULATION(sumCite) is NP-
hard, even if merges are unconstrained. The NP-hardness of H-INDEX MANIPULATION for μ 2
{unionCite, fusionCite} follows. We can reduce H-INDEX MANIPULATION to CONSERVATIVE
DIVIDING by defining the partition P = {W.}; hence we get the following.
Proposition 1. DIVIDING and CONSERVATIVE DIVIDING are NP-hard for μ 2 {sumCite, unionCite,
fusionCite}.
As to computational tractability, DIVIDING and CONSERVATIVE DIVIDING are FPT when parameterized
by h—the h-index to achieve.
Teorema 3. DIVIDING and CONSERVATIVE DIVIDING( μ) can be solved in 2O(h4 log h) · nO(1) tiempo,
where h is the h-index to achieve and μ 2 {sumCite, unionCite}.
Prueba. The pseudocode is given in Algorithm 6. Herein, Merge(D, W., h, μ) decides H-INDEX
MANIPULATION(μ); eso es, it returns true if there is a partition Q of W such that has h-index h
and false otherwise. It follows from van Bevern et al. (2016b, Teorema 7) that Merge can be
carried out in 2O(h4 log h) · nO(1) tiempo.
Algoritmo 6 first finds, using Merge, the maximum number ℓ
P of (merged) articles with at least
h citations that we can create in each part P 2 PAG. Para esto, we first prepare an instance (D 0, W. 0, h, μ)
of H-INDEX MANIPULATION(μ) in Lines 2 y 3. In the resulting instance, we ask whether there is a
partition of P with h-index h. If this is the case, then we set ℓP to h. De lo contrario, we add one artificial
article with h citations to W 0 in Line 9. Intuitivamente, this causes Merge to check whether there is a
partition of P into h − 1 (more generally, one less than in the current iteration) merged articles with
h citations each in the next iteration. We iterate this process until Merge returns true, or we find
that there is not even one merged article contained in P with h citations. Claramente, this process
Algoritmo 6:
Conservative Dividing
Input: A citation graph D = (V, A), a set W (cid:1) V of articles, a partition P of W, nonnegative
integers h and k, and a measure μ.
Output: true if k dividing operations can be applied to P to yield h-index h and false
de lo contrario.
foreach P 2 P do
D0 The graph obtained from D by removing all citations (tu, v) such that
v =2 P and adding h + 1 articles r1, …, rh+1
W0 P, ℓ
PAG 0
for i 0 to h do
if Merge(D0, W0, h, μ) entonces
P h – i
ℓ
Break
else
Add ri to W0 and add each citation (rj, ri), j 2 {1, …, h + 1} \ {i} to D0
1
2
3
4
5
6
7
8
9
10 return 9P0 (cid:1) P s.t. |P0| ≤ k and (cid:2)
P2P0 ℓ
PAG
≥ h
Estudios de ciencias cuantitativas
1539
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
correctly computes ℓP. De este modo, the algorithm is correct. The running time is clearly dominated by the
calls to Merge. As Merge runs in 2O(h4 log h) · nO(1) tiempo (van Bevern et al., 2016b, Teorema 7), el
□
running time bound follows.
We note that Merge can be modified so that it outputs the desired partition. Por eso, podemos
modify Algorithm 6 to output the actual solution. Además, for k = n, Algoritmo 6 solves
the nonconservative variant, which is therefore also fixed-parameter tractable parameterized
by h.
A diferencia de, for the cautious variant we show W[1]-hardness when parameterized by k, el
number of allowed operations.
Teorema 4. CAUTIOUS DIVIDING(μ) is NP-hard and W[1]-hard when parameterized by k for μ
2 {sumCite, unionCite, fusionCite}, even if the citation graph is acyclic.
Prueba. We reduce from the UNARY BIN PACKING problem: given a set S of n items with integer sizes
si, i 2 {1, …, norte}, ℓ bins and a maximum bin capacity B, can we distribute all items into the ℓ bins?
Herein, all sizes are encoded in unary. UNARY BIN PACKING parameterized by ℓ is W[1]-hard (Jansen,
Kratsch, et al., 2013).
Given an instance (S, ℓ, B) of UNARY BIN PACKING, we produce an instance (D, W., PAG, h, ℓ − 1)
of CAUTIOUS DIVIDING(sumCite). Let s* = (cid:2)i si be the sum of all item sizes. We assume that B < s*
and ℓ · B ≥ s* as, otherwise, the problem is trivial, because all items fit into one bin or they
collectively cannot fit into all bins, respectively. Furthermore, we assume that ℓ < B because,
otherwise, the instance size is upper bounded by a function of ℓ and, hence, is trivially FPT
with respect to ℓ. We construct the instance of CAUTIOUS DIVIDING(sumCite) in polynomial time
as follows.
(cid:129) Add s* articles x1, …, xs* to D. These are only used to increase the citation count of other
articles.
(cid:129) Add one article ai to D and W for each si.
(cid:129) For each article ai, add citations (xj, ai) for all 1 ≤ j ≤ si to G. Note that, after adding these
citations, each article ai has citation count si.
(cid:129) Add Δ := ℓ · B – s* articles u1, …, uΔ to D and W.
(cid:129) For each article ui with i 2 {1, …, Δ}, add a citation (x1, ui) to D. Note that each article ui
has citation count 1.
(cid:129) Add B – ℓ articles h1, …, hB−ℓ to D and W.
(cid:129) For each article hi with i 2 {1, …, B − ℓ}, add citations (xj, hi) for all 1 ≤ j ≤ B to D. Note
that each article hi has citation count B.
(cid:129) Add P* = {a1, …, an, u1, …, uΔ} to P, for each article hi with i 2 {1, …, B − ℓ}, add {hi} to
P, and set h = B.
Now we show that (S, ℓ, B) is a yes-instance if and only if (D, W, P, h, ℓ − 1) is a yes-instance.
()) Assume that (S, ℓ, B) is a yes-instance and let S1, …, Sℓ be a partition of S such that items
in Si are placed in bin i. Now we split P* into ℓ parts R1, …, Rℓ in the following way. Note that
i = Δ. Recall that
for each Si, we have that (cid:2)
i for some (cid:2)
sj2Si sj = B − (cid:2)
there are Δ articles u1, …, uΔ in P*. Let (cid:2)
0 = 0 and
if (cid:2)
i = 0. We set Ri = {aj | sj 2 Si} [ Ui. Then for each Ri, we have that
sumCite(Ri) = sumCite({aj | sj 2 Si}) + sumCite(Ui), which simplifies to sumCite(Ri) = (cid:2)sj2Si sj +
(cid:2)
i = B. For each i, 1 ≤ i ≤ n, we have sumCite({hi}) = B. Hence, R = {R1, …, Rℓ, {h1}, …, {hB−ℓ}}
has h-index B.
3 and that each clause contains three literals
over mutually distinct variables. Given a formula F with variables x1, …, xn and clauses c1, …,
cm such that n + m > 3, we produce an instance (D, W., PAG, metro + norte) of ATOMIZING(fusionCite) o
EXTRACTING(fusionCite) in polynomial time as follows. The construction is illustrated in Figure 3.
For each variable xi of F, add to D and W sets X F
i and X T
i
i := {XT
i;1, XF
to P. Let h := m + norte. For each variable xi, agregar
variable articles. Add X F
i;3} and X T
i := {XF
i;2, XF
i;1, XT
i;2, XT
i;3} de
i;1 to XT
i;1 to XF
1. h − 2 citations from (newly introduced) distinct atomic articles to XT
2. citations from XF
3. citations from XT
i;2 to XF
i;3 y
i;2 to XT
i;3.
Próximo, for each clause cj of F, add a clause article Cj with h − 4 incoming citations to D, to W,
and add {Cj} to P. Finalmente, if a positive literal xi occurs in a clause cj, then add citations (XT
i; ℓ, Cj)
to D for ℓ 2 {2, 3}. If a negative literal ¬xi occurs in a clause cj, then add citations (XF
i;ℓ, Cj) to D
i;2 and from XT
i;2 and from XF
i;1 and XF
i;1,
Cifra 3.
clause cj.
Illustration of the construction in the proof of Theorem 5 for a literal ¬xi contained in a
Estudios de ciencias cuantitativas
1541
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
for ℓ 2 {2, 3}. This concludes the construction. Observe that D is acyclic, as all citations go
from variable articles to clause articles or to variable articles with a higher index. It remains to
show that F is satisfiable if and only if (D, W., PAG, h) is a yes-instance.
i 2 R and we put X T
i gets two citations from {XT
()) If F is satisfiable, then a solution R for (D, W., PAG, h) looks as follows: for each i 2 {1, …,
norte}, if xi is true, then we put XF
i 2 R otherwise. All other articles of D are
added to R as singletons. We count the citations that every part of R gets from other parts of R.
i; ℓ} for ℓ 2 {1, 2} and the h − 2 initially added
If xi is true, then X F
citas. Además, for the clause cj containing the literal xi, {Cj} gets two citations from {XT
i; ℓ}
for ℓ 2 {2, 3}, at least two citations from variable articles for two other literals it contains, y
the h − 4 initially added citations. Symmetrically, if xi is false, entonces {X T
i } gets h citations and so
does every {Cj} for each clause cj containing the literal ¬xi. As every clause is satisfied and
every variable is either true or false, it follows that each of the m clause articles gets h citations
y eso, for each of the n variables xi, either X F
i gets h citations. It follows that h = m + norte
parts of R get at least h citations and thus, that R has h-index at least h.
i or X T
(() Let R be a solution for (D, W., PAG, metro + norte). We first show that, for each variable xi, nosotros
have either X T
i 2 R or X F
i 2 R. Para tal fin, it is important to note two facts:
1. For each variable xi, X T
one with h − 2 incoming arcs. De este modo, no subset of X T
for X F
i .
Si, for some variable xi, the part X T
2.
i contains two atomic articles with one incoming arc in D and
i can get h citations. The same holds
i 2 R gets h citations, then X F
i
=2 R and vice versa.
i , X F
De este modo, as there are at most m clause articles and R contains h = m + n parts with h citations, R
contains exactly one of the parts X T
i of each variable xi. It follows that, in R, all singleton clause
articles have to receive h citations. Each such article gets at most h − 4 initially added citations and
citations from at most three sets X T
i or X F
i for some variable xi. De este modo, for each clause cj, there is a
=2 R, respectivamente. It follows that setting each
literal xi in cj or a literal ¬xi in cj such that X T
i
□
xi to true if and only if X T
i
=2 R gives a satisfying truth assignment to the variables of F.
=2 R or X F
i
This NP-hardness result motivates the search for fixed-parameter tractability.
Teorema 6. ATOMIZING(fusionCite) can be solved in O(4h2
(norte + metro)) tiempo, where h is the h-index
to achieve.
Prueba. We use the following procedure to solve an instance (D, W., PAG , h) of ATOMIZING
(fusionCite).
Let P≥h be the set of merged articles P 2 P with fusionCite (PAG) ≥ h. Si |P≥h| ≥ h, then we face a
yes-instance and output “yes.” We can determine whether this is the case in linear time because
we can compute fusionCite (PAG) in linear time for all P 2 PAG. Below we assume that |P≥h| < h.
First, we atomize all P 2 P that cannot have h or more citations; that is, for which, even if
we atomize all merged articles except for P, we have fusionCite(P) < h. Formally, we atomize P
if (cid:2)
D−P(v)| < h. Let P0 be the partition obtained from P after these atomizing operations;
note that P0 can be computed in linear time.
v2P |Nin
The basic idea is now to look at all remaining merged articles that receive at least h citations
from atomic articles; they form the set P
4 See http://gitlab.com/rvb/split-index.
Estudios de ciencias cuantitativas
1547
h-Index manipulation by undoing merges
GreedyMin. GreedyMax and GreedyMin are surprisingly close. Sin embargo, the differences be-
tween the methods in general are rather small, indicating that the property of having potential is
inherent to the profile rather than the method for generating initial merges. As GreedyMax is one of
the most straightforward of the four, we will focus only on GreedyMax below.
At first glance, we could expect that the number of authors with potential would decrease
monotonically with increasing compatibility threshold: Tenga en cuenta que, for increasing compatibility
threshold the edge sets in compatibility graphs are decreasing in the subset order. Hence each
maximal clique in the compatibility graph can only increase in size. Sin embargo, because we employ
heuristics to find the set of initial merges (in the case of Ramsey, GreedyMax, and GreedyMin) y
because there may be multiple choices for a maximum-size clique (for Maximum), diferente
possible partitionings into initial merges may result. This can lead to the fact that the authors with
potential do not decrease monotonically with increasing compatibility threshold.
Además, with the same initial merges it can happen that an increase in the h-index value
through unmerging with respect to sumCite is possible and no increase is possible with respect to
unionCite and vice versa. The first may happen, Por ejemplo, if two articles v, w are merged such
that sumCite({v, w}) is above but unionCite({v, w}) is below the h-index threshold. The second may
happen if the h-index of the merged profile is lower for unionCite compared to that for sumCite.
Entonces, unmerging articles may yield atomic articles that are still above the h-index threshold for
unionCite but not for sumCite. As can be seen from Figure 5, both options occur in our data set.
The fraction of authors with potential differs clearly between the three data sets. Los autores
in ai10-2011 have already accumulated so many citations that almost all have potential for
each threshold up to 0.6. Mientras tanto, the authors with potential in ai10-2013 continually drop
for increasing threshold and this drop is even more pronounced for ijcai-2013. This may reflect
the three levels of seniority represented by the data sets.
There is no clear difference between the achievable h-indices when comparing fusionCite
with unionCite and sumCite: While there are generally more authors with potential for each
threshold for fusionCite in the ai10-2011 data set, there are fewer authors with potential for the
ai10-2013 data set, and a similar number of authors with potential for the ijcai-2013 data set.
Focusing on the most relevant threshold, 0.4, and the unionCite measure, which is used by
Google Scholar (van Bevern et al., 2016a), we see that all authors (100%) in ai10-2011 could
potentially increase their h-indices by unmerging, four authors (57%) could do so in ai10-
2013, and seven (31%) in ijcai-2013. We next focus only on these authors with potential
and gauge to that extent manipulation is possible.
5.5. Extent and Cost of Possible Manipulation
Cifra 6 shows the largest achievable h-index increases for the authors with potential in the
three data sets: De nuevo, the lower edge of a box is the first quartile, the upper edge the third
quartile, and the thick bar is the median; the remaining data points are shown by dots.
In the majority of cases, drastic increases can only be achieved when the compatibility
threshold is lower than 0.4. Generally, the increases achieved for the fusionCite measure are
slightly lower than for the other two, but the median is at most smaller by one. debido a la
heuristic nature of our algorithms for fusionCite, we cannot exclude the possibility that the
largest possible increases for fusionCite are comparable to the other two measures. In the most
relevant regime of unionCite and compatibility threshold t = 0.4, the median h-index increases
son 4 for the ai10-2011 authors, 1 for the ai10-2013 authors, y 2 for the ijcai-2013 authors.
Notablemente, there is an outlier in ijcai-2013 who can achieve an increase of 5.
Estudios de ciencias cuantitativas
1548
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
Cifra 6. h-index increases for each compatibility threshold for authors with potential (note that
these authors may be different for different threshold values). The increases are largest-possible for
sumCite and unionCite.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
Cifra 7. h-index increases versus number of changed articles or allowed operations for authors
with potential and compatibility threshold 0.4. The increases are largest-possible for sumCite and
unionCite.
Estudios de ciencias cuantitativas
1549
h-Index manipulation by undoing merges
Cifra 7 shows the h-index increases that can be achieved by changing a certain number of
artículos (in the rows containing the conservative problem variants) or with a certain number of
operaciones (in the row containing the cautious problem variant) for compatibility threshold 0.4.
For the majority of ai10-2013 and ijcai-2013 authors we can see that, if manipulation is
posible, then the maximum h-index increase can be reached already by manipulating at most
two articles and performing at most two unmerges. The more senior authors in the ai10-2011
data set can still gain increased h-indices by manipulating four articles and performing four
unmerges. For the outlier in ijcai-2013 with an h-index increase of 5, we see that there is one
merged article that contains many atomic articles with citations above her or his unmanipulated
h-index: With respect to an increasing number of operations, we see a continuously increasing
h-index for CAUTIOUS EXTRACTING compared to a constant high increase for CONSERVATIVE ATOMIZING.
Summarizing, our findings indicate that realistic profiles from academically young authors
cannot in the majority of cases be manipulated by unmerging articles. If they can, then in most
cases the achievable increase in h-index is at most two. Además, our findings indicate that
the increase can be obtained by tampering with a small number of merged articles (at most two
in the majority of cases).
6. CONCLUSIÓN
En resumen, our theoretical results suggest that using fusionCite as a citation measure for
merged articles makes manipulation by undoing merges harder. From a practical point of
vista, our experimental results indicate that author profiles with surprisingly large h-index
may be worth inspecting concerning potential manipulation.
Regarding theory, we leave three main open questions concerning the computational com-
plexity of EXTRACTING(fusionCite), the parameterized complexity of DIVIDING(fusionCite), and the pa-
rameterized complexity of CAUTIOUS DIVIDING (sumCite / unionCite) with respect to h (ver tabla 1),
as the most immediate challenges for future work. También, finding hardness reductions that produce
more realistic instances would be desirable. From the experimental side, evaluating the potentially
possible h-index increase by splitting on real merged profiles would be interesting as well as com-
putational experiments using fusionCite as a measure. Además, it makes sense to consider the
manipulation of the h-index also in context with the simultaneous manipulation of other indices
(p.ej., Google’s i10-index; see also Pavlou and Elkind [2016]) and to look for Pareto-optimal
soluciones. We suspect that our algorithms easily adapt to other indices. Además, it is natural
to consider combining merging and splitting in manipulation of author profiles.
EXPRESIONES DE GRATITUD
The authors thank Clemens Hoffmann and Kolja Stahl for their excellent support in implementing
our algorithms and performing and analyzing the computational experiments.
Manuel Sorge’s work was carried out while affiliated with Technische Universität Berlin, Berlina,
Alemania; Ben-Gurion University of the Negev, Beer-Sheva, Israel; and University of Warsaw,
Warsaw, Poland.
CONTRIBUCIONES DE AUTOR
René van Bevern: Conceptualización, Formal Analysis, Escritura: borrador original, Escritura: revisión
& edición, Visualización. Christian Komusiewicz: Conceptualización, Formal Analysis, Writing—
original draft, Escritura: revisión & edición, Curación de datos, Software. Hendrik Molter:
Conceptualización, Formal Analysis, Escritura: borrador original, Escritura: revisión & edición,
Estudios de ciencias cuantitativas
1550
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
Software. Rolf Niedermeier: Conceptualización, Formal Analysis, Escritura: borrador original, Writing—
revisar & edición. Manuel Sorge: Conceptualización, Formal Analysis, Escritura: borrador original,
Escritura: revisión & edición, Curación de datos, Visualización, Software. Toby Walsh: Conceptualización,
Formal Analysis, Escritura: borrador original, Escritura: revisión & edición.
CONFLICTO DE INTERESES
Los autores no tienen intereses en competencia.
INFORMACIÓN DE FINANCIACIÓN
We acknowledge support by the Open Access Publication Fund of TU Berlin. René van Bevern
was supported by Mathematical Center in Akademgorodok, agreement No. 075-15-2019-1675
with the Ministry of Science and Higher Education of the Russian Federation. Hendrik Molter was
supported by the DFG, projects DAPA (NI 369/12) and MATE (NI 369/17). Manuel Sorge was
supported by the DFG, project DAPA (NI 369/12), by the People Programme (Marie Curie
Actions) of the European Union’s Seventh Framework Programme (FP7/2007-2013) under REA
grant agreement number 631163.11, the Israel Science Foundation (grant no. 551145/14), y
the European Research Council (ERC) under the European Union’s Horizon 2020 research and
innovation programme under grant agreement number 714704. Toby Walsh was supported by
the European Research Council (ERC) under the European Union’s Horizon 2020 research and
innovation programme under grant agreement number 670077.
DISPONIBILIDAD DE DATOS
The data sets used in this paper were originally collected by van Bevern et al. (2016b) y son
available under http://gitlab.com/rvb/split-index.
REFERENCIAS
Bartneck, C., & Kokkelmans, S. (2011). Detecting h-index manipula-
tion through self-citation analysis. cienciometria, 87(1), 85–98.
DOI: https://doi.org/10.1007/s11192-010-0306-5, PMID:
21472020, PMCID: PMC3043246
Bodlaender, h. l., & Kreveld, METRO. camioneta. (2015). Google Scholar
makes it hard–the complexity of organizing one’s publications.
Information Processing Letters, 115(12), 965–968. DOI: https://
doi.org/10.1016/j.ipl.2015.07.003
Boppana, r., & Halldórsson, METRO. METRO. (1992). Approximating maximum
independent sets by excluding subgraphs. BIT Numerical Mathe-
matemáticas, 32(2), 180–196. DOI: https://doi.org/10.1007/BF01994876
Cygan, METRO., Fomin, F. v., Kowalik, l., Lokshtanov, D., Marx, D.,
Pilipczuk, METRO., Pilipczuk, METRO., & Saurabh, S. (2015). Parameterized
Algorithms. Heidelberg: Saltador. DOI: https://doi.org/10.1007
/978-3-319-21275-3
de Keijzer, B., & Apt, k. R. (2013). The h-index can be easily ma-
nipulated. Bulletin of the EATCS, 110, 79–85.
Delgado López-Cózar, MI., Robinson-García, NORTE., & Torres-Salinas, D.
(2014). The Google Scholar experiment: How to index false papers
and manipulate bibliometric indicators. Journal of the Association
for Information Science and Technology, 65(3), 446–454. DOI:
https://doi.org/10.1002/asi.23056
Downey, R. GRAMO., & Fellows, METRO. R. (2013). Fundamentals of
Parameterized Complexity. Heidelberg: Saltador. DOI: https://
doi.org/10.1007/978-1-4471-5559-1
Egghe, l. (2006). Theory and practise of the g-index. cienciometria,
69(1), 131–152. DOI: https://doi.org/10.1007/s11192-006-0144-7
Faliszewski, PAG., & Procaccia, A. D. (2010). AI’s war on manipula-
ción: Are we winning? AI Magazine, 31(4), 53–64. DOI: https://
doi.org/10.1609/aimag.v31i4.2314
Faliszewski, PAG., Hemaspaandra, MI., & Hemaspaandra, l. A. (2010).
Using complexity to protect elections. Communications of the ACM,
53(11), 74–82. DOI: https://doi.org/10.1145/1839676.1839696
Flum, J., & Grohe, METRO. (2006). Parameterized Complexity Theory.
Heidelberg: Saltador.
Hirsch, j. mi. (2005). An index to quantify an individual’s scientific
research output. procedimientos de la Academia Nacional de Ciencias
of the United States of America, 102(46), 16569–16572. DOI:
https://doi.org/10.1073/pnas.0507655102, PMID: 16275915,
PMCID: PMC1283832
Jansen, K., Kratsch, S., Marx, D., & Schlotter, I. (2013). Bin packing
with fixed number of bins revisited. Journal of Computer and
System Sciences, 79(1), 39–49. DOI: https://doi.org/10.1016/j
.jcss.2012.04.004
Jukna, S. (2001). Extremal Combinatorics – with Applications in
Computer Science. Texts in Theoretical Computer Science.
Heidelberg: Saltador. DOI: https://doi.org/10.1007/978-3-662
-04650-0
Lesk, METRO. (2015). How many scientific papers are not original?
Proceedings of the National Academy of Sciences of the United
States of America, 112(1), 6–7. DOI: https://doi.org/10.1073
/pnas.1422282112, PMID: 25538304, PMCID: PMC4291619
Niedermeier, R. (2006). Invitation to Fixed-Parameter Algorithms.
Oxford Lecture Series in Mathematics and Its Applications.
Estudios de ciencias cuantitativas
1551
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
.
/
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
h-Index manipulation by undoing merges
Oxford: prensa de la Universidad de Oxford. DOI: https://doi.org/10.1093
/acprof:oso/9780198566076.001.0001
Oravec, j. A. (2017). The manipulation of scholarly rating and mea-
surement systems: Constructing excellence in an era of academic
stardom. Teaching in Higher Education, 22(4), 423–436. DOI:
https://doi.org/10.1080/13562517.2017.1301909
Pavlou, C., & Elkind, mi. (2016). Manipulating citation indices in a social
contexto. In Proceedings of the 15th International Conference on
Autonomous Agents and Multiagent Systems (AAMAS ’16)
(páginas. 32–40).
van Bevern, r., Komusiewicz, C., Molter, h., Niedermeier, r.,
Sorge, METRO., & Walsh, t. (2016a). h-Index manipulation by undoing
merges. In Proceedings of the 22nd European Conference on
Artificial Intelligence (ECAI ’16), Fronteras de la inteligencia artificial
y aplicaciones, (páginas. 895–903).
van Bevern, r., Komusiewicz, C., Niedermeier, r., Sorge, METRO., &
Walsh, t. (2016b). h-Index manipulation by merging articles:
Modelos, theory, and experiments. Artificial Intelligence, 240,
19–35. DOI: https://doi.org/10.1016/j.artint.2016.08.001
Vinkler, PAG. (2013). Would it be possible to increase the Hirsch-index,
π-index or CDS-index by increasing the number of publications or
citations only by unity? Journal of Informetrics, 7(1), 72–83. DOI:
https://doi.org/10.1016/j.joi.2012.08.001
Woeginger, GRAMO. j. (2008). An axiomatic analysis of Egghe’s g-index.
Journal of Informetrics, 2(4), 364–368. DOI: https://doi.org
/10.1016/j.joi.2008.05.002
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
/
mi
d
tu
q
s
s
/
a
r
t
i
C
mi
–
pag
d
yo
F
/
/
/
/
1
4
1
5
2
9
1
8
7
1
0
3
8
q
s
s
_
a
_
0
0
0
9
3
pag
d
/
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
Estudios de ciencias cuantitativas
1552