RESEARCH ARTICLE
Improving overlay maps of science:
Combining overview and detail
Peter Sjögårde1,2
1Health Informatics Centre, Department of Learning, Informatics, Management and Ethics,
Karolinska Institutet, Stockholm, Sweden
2University Library, Karolinska Institutet, Stockholm, Sweden
Keywords: algorithmic subject classification, citation networks, publication-level classification,
PubMed, science mapping
ABSTRACT
Overlay maps of science are global base maps over which subsets of publications can be
projected. Such maps can be used to monitor, explore, and study research through its
publication output. Most maps of science, including overlay maps, are flat in the sense that
they visualize research fields at one single level. Such maps generally fail to provide both
overview and detail about the research being analyzed. The aim of this study is to improve
overlay maps of science to provide both features in a single visualization. I created a map
based on a hierarchical classification of publications, including broad disciplines for overview
and more granular levels to incorporate detailed information. The classification was obtained
by clustering articles in a citation network of about 17 million publication records in PubMed
from 1995 onwards. The map emphasizes the hierarchical structure of the classification by
visualizing both disciplines and the underlying specialties. To show how the visualization
methodology can help to obtain both an overview of research and detailed information about
its topical structure, I studied two cases: coronavirus/Covid-19 research and the university
alliance called Stockholm Trio.
1. INTRODUCTION
To be able to support and manage research activities, there is a need to monitor and study
research; for example, to coordinate research, for follow-up investment, or to strengthen col-
laboration in targeted areas. It is relatively easy to keep track of the research activities of small
research units. In contrast, large research units, such as whole universities, may have hundreds
or thousands of employees, producing thousands of research publications each year within a
broad spectrum of topics. Keeping track of the competences, research areas, and collabora-
tions is a challenging task for such organizations.
Research publications are one important output of research activity. Publications can be
studied to monitor research activity and gain insight into aspects such as collaboration pat-
terns, specialization, strong research areas, trends, and development. Overlay maps of science
have been proposed to “offer an intuitive way of visualizing the position of organizations or
topics in a fixed map” (Rafols, Porter, & Leydesdorff, 2010). Overlay maps are base maps over
which subsets of publications or filters can be projected, for example to study the position of
organizations or topics, or to highlight properties such as citation impact, open access pub-
lishing or clinical research (Kay, Newman et al., 2014).
An open access journal
Citation: Sjögårde, P. (2022). Improving
overlay maps of science: Combining
overview and detail. Quantitative
Science Studies, 3(4), 1097–1118.
https://doi.org/10.1162/qss_a_00216
DOI:
https://doi.org/10.1162/qss_a_00216
Peer Review:
https://publons.com/publon/10.1162/qss_a_00216
Received: 15 October 2021
Accepted: 17 September 2022
Corresponding Author:
Peter Sjögårde
peter.sjogarde@ki.se
Handling Editor:
Vincent Larivière
Copyright: © 2022 Peter Sjögårde.
Published under a Creative Commons
Attribution 4.0 International (CC BY 4.0)
license.
The MIT Press
To this point, most overlay maps have been flat in the sense that they visualize research
fields at one single level, commonly the levels of research disciplines or specialties. Such maps
generally fail to provide both overview and detail about the research being studied. The aim of
this study is to improve overlay maps of science to provide these two features in one single,
interactive map, having a focus on the biomedical sciences. The maps created enable users to
explore multiple levels of a hierarchical classification in a single interactive visualization.
2. BACKGROUND
Visualizations of science have been around for a long time (for overviews, see Börner, Chen, &
Boyack, 2005; Petrovich, 2020; van Eck & Waltman, 2014; Zitt, Lelu et al., 2019). Early work
focused on maps restricted to one or a few research areas. Different aspects of areas have been
visualized and studied using a variety of entities and relations: for example, copublishing
between researchers or organizations, co-occurrence of keywords, and citation relations
between publications or journals.
Since the end of the 1990s, maps that cover large parts of the science system (at least within
the natural sciences and biomedicine) have been created. Initial maps of science were based
on journals and made it possible to get unprecedented overviews of the science system (e.g.,
Bassecoulard & Zitt, 1999; Boyack, Klavans, & Börner, 2005; Leydesdorff, 2004, 2006; Moya-
Anegón, Vargas-Quesada et al., 2004).
Rafols et al. (2010) showed how comprehensive maps of science can be used as base maps,
over which overlays can be projected. The idea of such a map is to fix the positions of the nodes
in the map, representing for example research fields, so that an overlay projected onto the map
can be easily compared with the base map, as well as with other projections. For instance, con-
sider the publication outputs, A and B, of two universities. An overlay map is created by project-
ing A onto a base map. The size of the nodes is scaled in relation to the distribution of A over
research fields. Another map is created based on B, using the same procedure. We can now
compare the subject orientation of the two universities by exploring the two maps and comparing
node sizes. If we color the nodes of the maps based on some variable, we can analyze different
aspects of A and B, such as the amount of open access publishing, citation impact or degree of
international collaboration in different research fields. Compared to maps restricted to particular
areas, overlay maps provide context and points of reference, for example by offering the possi-
bility to spot areas in which A and B do not have any research.
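The comparison of A and B described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the two publication sets, and the helper names `overlay_shares` and `empty_fields` are hypothetical and not part of the original study. The base map is represented by a fixed list of fields, so both overlays are computed over the same nodes.

```python
from collections import Counter


def overlay_shares(fields_of_publications, base_fields):
    """Return each base-map field's share of a publication set.

    fields_of_publications: one field label per publication in the set.
    base_fields: the fixed fields (nodes) of the base map.
    """
    counts = Counter(fields_of_publications)
    total = sum(counts[f] for f in base_fields)
    return {f: (counts[f] / total if total else 0.0) for f in base_fields}


def empty_fields(shares):
    """Fields of the base map in which the set has no publications."""
    return [f for f, share in shares.items() if share == 0]


# Hypothetical base map and publication sets (illustrative values only).
base_fields = ["cardiology", "oncology", "nursing"]
set_a = ["cardiology", "cardiology", "oncology", "cardiology"]
set_b = ["nursing", "oncology"]

shares_a = overlay_shares(set_a, base_fields)  # node sizes for overlay A
shares_b = overlay_shares(set_b, base_fields)  # node sizes for overlay B
```

Because both dictionaries share the same keys, node sizes can be compared field by field, and `empty_fields` lists the nodes that would stay blank in each overlay.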
Since overlay maps were introduced in scientometrics, they have been used in many appli-
cations. Kay et al. (2014) used overlay maps to visualize patents by companies; Tang and
Shapira (2011) analyzed the growth of U.S. and China copublications in nanotechnology;
Klaine, Koelmans et al. (2012) positioned environmental, health, and safety of nanomaterials
in relation to general nanotechnology; Leydesdorff, Moya-Anegón, and Guerrero-Bote (2015)
created an overlay map based on journal relations using Scopus data and exemplified how the
map can be used to explore the publication output of authors, organizational units, or other
publication sets; and Rotolo, Rafols et al. (2017) used three case studies to demonstrate the use
of overlay maps for strategic intelligence.
Most scientometric studies using overlay maps have visualized maps at one single level and
one single entity (e.g., journals or keywords). An exception is the early work by Small (1999)
visualizing a hierarchical structure of a set of about 37 thousand documents. The study by
Rotolo et al. (2017) includes what they refer to as “cognitive” base maps of research publica-
tions at different granularity levels. These maps are based on the Web of Science categories at
the broadest level, journals at the meso-level, and MeSH at the most granular level. However,
the different granularity levels are presented in different maps and hierarchical relations
between levels are not shown.
To provide the possibility to navigate from broad to narrow levels in one single map, I base
the map presented in this paper on a hierarchical classification obtained by clustering articles
in a citation network.
Publication-level classification at a global level (covering a complete multidisciplinary data
source) obtained by clustering articles in a citation network was first implemented by Waltman
and van Eck (2012). Compared to classification at the journal level, publication-level classifi-
cations can be made more granular. A hierarchical structure can be obtained by merging clus-
ters at lower levels into broader clusters. Publication-level classifications have been used to
create overlay maps (RoRI Institute, Waltman et al., 2019). However, the applications are few
and lack hierarchical structure, other than node coloring by major research areas.
The validity of the clustering solutions created by clustering articles in citation networks has
been contested (Held, Laudel, & Gläser, 2021). There is no ground truth classification and
different methodological choices result in different, sometimes equally valid, representations
of research delineation (Glänzel & Schubert, 2003; Gläser, Glänzel, & Scharnhorst, 2017;
Klavans & Boyack, 2017; Mai, 2011; Sjögårde & Ahlgren, 2018; Smiraglia & van den Heuvel,
2013; Velden, Boyack et al., 2017; Waltman & van Eck, 2012). Nevertheless, the results have
been compared to a wide range of baselines and many different applications have been
evaluated and compared (Ahlgren, Chen et al., 2020; Boyack, 2017; Boyack, Newman
et al., 2011; Boyack & Klavans, 2010, 2018; Donner, 2021; Haunschild, Schier et al.,
2018; Sjögårde & Ahlgren, 2018, 2020; Šubelj, van Eck, & Waltman, 2016; Waltman, Boyack
et al., 2020).
Including citations that are external to the analyzed data set improves the accuracy of a
clustering solution (Ahlgren et al., 2020; Boyack, 2017; Donner, 2021; Klavans & Boyack,
2011). This is an advantage of a global approach, compared to a local one in which a clus-
tering is based on a restricted set of publications. Nonetheless, a local approach may be pref-
erable in some applications because it can emphasize the local, within field, context of
research. However, local maps are difficult to compare to other maps. The rationale of overlay
maps is to make comparisons between maps. A global approach is therefore usually more
useful for the purpose of comparison.
3. DATA AND METHOD
To create a visualization of biomedical research literature that incorporates both overview and
detail, I based the visualization on a hierarchical publication-level classification. This classi-
fication was obtained by clustering articles in a (direct) citation network of PubMed records.
Currently, PubMed indexes over 1 million publications yearly, covering a wide range of bio-
medical research disciplines. I therefore refer to the classification created as “global,” even
though it does not have a comprehensive coverage of all fields of science.
A similar classification was recently created by Boyack, Smith, and Klavans (2020). This
classification differs mainly by the choice of similarity measure between publications. Boyack
et al. based their classification on a combination of direct citations and textual similarity. By
complementing direct citations with textual similarity, they were able to include publications
that would otherwise have no relations. Thus, the relation between publications in their
approach is a mixture of fundamentally different similarity measures, which makes the interpretation
of the classification more difficult. Since the model of Boyack et al. was published, more
citations have been made openly available and it is now possible to create citation-based clas-
sifications with a more comprehensive coverage. I therefore base my classification strictly on
direct citations.
Clustering of publications creates one of several possible representations of the biomedical
sciences. The clusters created are internally connected by citations. This way they are self-
organized by the use of formal communication. The advantages of this methodology are that
it does not rely on predefined categories, the cost is low, high granularity as well as a hierar-
chical structure can be obtained, and the assignment of individual publications can be made
without subjective choices. Nevertheless, subjectiveness is still present in, for example, the
choice of publication–publication relation and parameter values, in particular the value of
the resolution parameter. The value of the resolution parameter used in this study was guided
by previous work (Sjögårde & Ahlgren, 2018, 2020); nonetheless, arbitrariness in this choice is
still unavoidable. Another disadvantage is the creation of disjoint clusters where each publi-
cation is assigned to exactly one cluster. Forcing publications into one cluster is practical and
facilitates interpretation but also means that information is lost. Naturally, publications can
address multiple concepts and be multidisciplinary in nature. Classification obtained by
clustering does not represent such characteristics in itself. However, to enable analyses of
transverse structures, the classification can be complemented with other sources, such as cita-
tion relations or the Medical Subject Headings (MeSH).
A comprehensive discussion about which methods (such as the choice of publication–
publication relation or choice of clustering algorithm) to prefer when clustering publications
is out of scope of this paper. The focus of this paper is on how to improve the interpretability of
overlay maps of science by providing possibilities to navigate between overview and details in
a map, and by taking advantage of the hierarchical structure of a classification. I leave the
discussion about how best to obtain classifications to future work. Nevertheless, the visuali-
zation method presented in this paper may help to evaluate classifications by making them
easier to navigate and interpret. In this way the paper may also contribute to the understanding
of what kind of clusters are created by the use of clustering in citation networks. The visual-
ization methodology may be applied to any hierarchical classification and can also be delim-
ited to fewer levels in the classification.
In this paper I present two examples of maps incorporating possibilities to navigate the hier-
archical structure of a classification. These maps are publicly available and are limited to the
presented cases and the base map of science. It is out of scope of this study to provide a web
tool that can be used to explore other overlays.
Figure 1 illustrates the process used to obtain the classification from the citation network
and to create a base map from the classification. The base map incorporates features to
emphasize the hierarchical structure of the classification. In Section 3.1 I describe the data
and process to obtain the classification, and in Section 3.2 I describe the process to create
the base map from the classification.
3.1. Obtaining the Classification
I used PubMed data to create a classification of publications in four levels based on citation
relations from the NIH Open Citation Collection (Hutchins, Baker et al., 2019; iCite, Hutchins,
& Santangelo, 2019). I used the bibliometric system at Karolinska Institutet for the analysis. The
system contains PubMed data from 1995 onwards. Data were extracted in February 2022
(version 28 of the NIH Open Citation Collection) and were restricted to the publication types
“article” and “review”: about 18.6 million publications with about 462 million direct citation
relations. In the remainder of this paper, I use the term publication to refer to both articles
and reviews.
Figure 1. Illustration of the process used to create and visualize the classification.
Except for some modifications, I obtained the classification using the methodology put for-
ward in Waltman and van Eck (2012). In accordance with this methodology, direct citation
relations were used to create a network. Citation relations were normalized in relation to each
publication’s total number of citation relations. The Leiden algorithm (Traag, Waltman, & van
Eck, 2019) was used to obtain a partitioning of the publications1. To get clusters of substantial
size, I restricted the cluster size to a minimum of 50 publications by reclassifying publications
in clusters below this minimum size using the method provided in the software. The resolution
parameter was calibrated to obtain clusters of about the same size as in Sjögårde and Ahlgren
(2018) for the corresponding publication years, resulting in 63,575 clusters. Thereby, clusters
approximately correspond to research topics, and I refer to clusters at this level as such.
1 The process took about 10 h 15 min to run 100 iterations. A total of 256 Gbyte RAM was allocated for the
process. The value of the quality function (CPM) was 0.408. The resolution parameter was set to 0.00010.
Version 1.1.0 of the software was downloaded from https://github.com/CWTSLeiden/networkanalysis
(November 20, 2020).
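The normalization step mentioned above can be illustrated with a minimal Python sketch. It shows one possible reading of "normalized in relation to each publication's total number of citation relations": every relation of a publication gets the weight 1 divided by that publication's number of relations. The exact formula is the one given in Waltman and van Eck (2012) and may differ in detail. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def normalize_citation_relations(citations):
    """Weight each citation relation by the publication's number of relations.

    citations: iterable of (citing_id, cited_id) pairs.
    Direction is ignored (assumption): a pair counts as one relation
    for both publications involved.
    Returns {(i, j): weight} with weight = 1 / (number of relations of i).
    """
    neighbours = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:  # ignore self-citations of a record to itself
            neighbours[citing].add(cited)
            neighbours[cited].add(citing)
    return {
        (i, j): 1.0 / len(related)
        for i, related in neighbours.items()
        for j in related
    }


# Toy network (illustrative values only).
toy_citations = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
weights = normalize_citation_relations(toy_citations)
```

Under this reading the weights of each publication's relations sum to 1, so highly connected publications do not dominate the clustering simply because they have many relations.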
Topics were clustered into larger groups based on their summed relatedness and normaliz-
ing for cluster sizes (Eq. 4 in Boyack et al., 2020). I calibrated the resolution parameter to
obtain clusters of about the same size for corresponding publication years as obtained in
Sjögårde (2020). Thereby, clusters approximately correspond to research specialties. A mini-
mum threshold of at least 500 publications was used at this level, which resulted in 1,602
specialties2.
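The minimum-size rule at the aggregated levels (merging each cluster below the threshold into the above-threshold cluster to which it is most strongly related) can be sketched as follows. This is a simplified single-pass illustration, not the procedure in the software used for the study. The function name, the data layout, and the example values are assumptions.

```python
def merge_small_clusters(sizes, relatedness, min_size):
    """Assign every cluster below min_size to its most related large cluster.

    sizes: {cluster_id: number_of_publications}
    relatedness: {(cluster_a, cluster_b): strength}, stored once per
        unordered pair, in either order (assumption).
    Returns {cluster_id: target_cluster_id}. Clusters at or above the
    threshold map to themselves. A small cluster without any relation
    to a large cluster is left unchanged.
    """
    large = sorted(c for c, n in sizes.items() if n >= min_size)

    def strength(a, b):
        return relatedness.get((a, b), 0.0) + relatedness.get((b, a), 0.0)

    assignment = {}
    for cluster, n in sizes.items():
        if n >= min_size:
            assignment[cluster] = cluster
            continue
        candidates = [c for c in large if strength(cluster, c) > 0]
        if candidates:
            assignment[cluster] = max(candidates, key=lambda c: strength(cluster, c))
        else:
            assignment[cluster] = cluster
    return assignment


# Illustrative values only: s3 falls below the threshold of 500.
sizes = {"s1": 1200, "s2": 800, "s3": 120}
relatedness = {("s3", "s1"): 0.2, ("s2", "s3"): 0.5}
merged = merge_small_clusters(sizes, relatedness, min_size=500)
```

In the toy example `s3` is merged into `s2`, because its relatedness to `s2` (0.5) exceeds its relatedness to `s1` (0.2).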
Specialties were clustered into larger clusters. My intention was to create clusters of approx-
imately the size of other broad classifications, such as Web of Science journal categories
(about 250 clusters), Science Metrix journal classification (180 clusters at the “subfields” level)
and Scopus Subject areas (about 330 clusters). Because the classification only includes
biomedicine, I aimed for a smaller number of clusters. I tested several different values of the
resolution parameter. Too low values merged specialties with seemingly weak relatedness into
coarse clusters, while too high values resulted in many specialties being unmerged. I finally
chose a solution with 131 clusters, after restricting cluster sizes to at least 100,000
publications. I refer to this level as research disciplines. These clusters represent fields of science held
together by citations. Thus, only the formal communication through citations has been taken
into account. These higher level clusters do not necessarily coincide with disciplines as social
and organizational structures (Hammarfelt, 2019) but may help to shed light on the organiza-
tion of science through its communication.
The disciplines were grouped into 22 broader research areas. It is particularly difficult to
obtain good labels at this level, because terms extracted from bibliographic fields tend to be
too narrow. For this reason, the research areas are not displayed in the visualization. The
research areas are used to color sibling disciplines.
To create labels, I used the procedure proposed in Sjögårde, Ahlgren, and Waltman (2021).
Noun phrases were extracted from article titles, MeSH, journal titles, and author addresses. A
noun phrase was operationalized as a sequence of adjectives and nouns, ending with a noun
(van Eck, Waltman et al., 2010a). A Java program was written for this purpose and the Stanford
Core NLP software was used for data mining (Manning, Surdeanu et al., 2014), in particular
the lemmatizer and the Part-of-Speech tagger (Toutanova, Klein et al., 2003; Toutanova &
Manning, 2000)3. The relevance of terms to clusters was calculated using term frequency to
specificity ratio (TFS; Sjögårde et al., 2021). TFS balances term frequency and term specificity
to obtain terms that are both frequent in a cluster and specific to the cluster. For each cluster,
the three terms with the highest TFS value were concatenated into a label. Seven more terms
are listed when clicking on a node in the visualization. I used article titles and MeSH to create
labels at the topic level (α = 0.33 was used for the TFS calculation). I used article titles, MeSH,
and journal titles at the specialty level (α = 0.5) and journal titles and author addresses at the
discipline level (α = 0.67).
2 At aggregated level, reclassification was performed by merging clusters below the threshold with the cluster
above the threshold having the strongest relational strength.
3 Stanford CoreNLP is available at https://stanfordnlp.github.io/CoreNLP/.
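The operationalization of a noun phrase as a sequence of adjectives and nouns ending with a noun can be illustrated with a short Python sketch. The study itself used a Java program with Stanford CoreNLP. The sketch below instead assumes that tokens have already been part-of-speech tagged with Penn Treebank tags (JJ* for adjectives, NN* for nouns), skips lemmatization, and extracts only maximal runs, which is a simplifying assumption. The function name and the example sentence are hypothetical.

```python
def noun_phrases(tagged_tokens):
    """Extract maximal adjective/noun runs that end with a noun.

    tagged_tokens: list of (word, tag) pairs with Penn Treebank tags.
    """
    phrases = []
    run = []

    def flush():
        # Drop trailing adjectives so that the phrase ends with a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            phrases.append(" ".join(word for word, _ in run))
        run.clear()

    for word, tag in tagged_tokens:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((word.lower(), tag))
        else:
            flush()
    flush()
    return phrases


# Hand-tagged example title (illustrative only).
tagged = [
    ("Sentinel", "NN"), ("lymph", "NN"), ("node", "NN"), ("biopsy", "NN"),
    ("in", "IN"), ("cutaneous", "JJ"), ("melanoma", "NN"),
    ("is", "VBZ"), ("effective", "JJ"),
]
terms = noun_phrases(tagged)
```

The trailing adjective "effective" is discarded because a run consisting only of adjectives does not end with a noun.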
3.2. Creating the Base Map
As a basis for the map, I created a network of specialties. In the following step I contracted
each subnetwork of sibling specialties and positioned the parent discipline on top of this sub-
network. A list of topics was created for each specialty. This list is displayed when clicking a
node. In the following I describe the steps in detail.
3.2.1. Specialty level
I created a list of specialties with the attributes shown in Table 1. For the purpose of illustration,
the table presents values for an example specialty. The size attribute was calculated as the
square root of the number of publications. This makes the area of each node proportional
to the number of publications. If a cluster contained no more than 500 publications, a hyper-
link was created to the underlying publications (“Get list in PubMed”). If a cluster contained
501–5,000 publications a separate hyperlink was created for each batch of 500 publications
(1–500, 501–1,000, etc.). If more than 5,000 publications were in a cluster, no hyperlinks were
provided and instead the text “Too many publ.” was shown. This was done because of restric-
tions on hyperlink length. The underlying topics were listed in the column “Children.” Labels
and numbers of publications for the topics were concatenated into a list. The interactive map
contains hyperlinks to the underlying publications for each of the topics (with the same restric-
tion to 5,000 publications). Specialty nodes were colored according to their cluster at the top
level (research areas).
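The size attribute and the batching of hyperlinks described above can be sketched in Python as follows. The function names are hypothetical, and the sketch only produces the batch ranges, not the PubMed hyperlinks themselves.

```python
import math


def node_size(n_publications):
    """Square root of the publication count, so node area scales with count."""
    return math.sqrt(n_publications)


def hyperlink_batches(n_publications, batch_size=500, max_publications=5000):
    """Return (first, last) ranges for the publication hyperlinks.

    Returns None when the cluster exceeds max_publications; in that case
    the map shows the text "Too many publ." instead of hyperlinks.
    """
    if n_publications > max_publications:
        return None
    return [
        (start, min(start + batch_size - 1, n_publications))
        for start in range(1, n_publications + 1, batch_size)
    ]


# The example specialty in Table 1 has 3,707 publications.
batches = hyperlink_batches(3707)
size = node_size(3707)
```

For 3,707 publications the sketch yields eight ranges, from 1–500 to 3501–3707, which agrees with the hyperlink row of Table 1. The square root is about 60.9; after the halving of specialty node sizes described in Section 3.2.2 this gives the size value 30.4 listed in the table.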
Table 1. Attributes for an example specialty. Hyperlinks omitted

Attribute           Value
id                  l2.229
label               skin neoplasm; cancer; nevus
size                30.4
color               rgba(116,200,0,1)
Additional terms    nevus; malignant melanoma; surgical oncology; sentinel lymph node biopsy; cutaneous melanoma; raf; oncogene proteins b
Level               Specialty
Parent              dermatology; melanoma; skin
# Publ.             3707
Get list in PubMed  1–500, 501–1000, 1001–1500, 1501–2000, 2001–2500, 2501–3000, 3001–3500, 3501–3707
Children            • neoadjuvant therapy; placebo; ipilimumab – Get list in PubMed: 243 publ.
                    • artificial intelligence; deep learning; convolutional neural network – Get list in PubMed: 229 publ.
                    • sulfonamides; protein kinase inhibitor; vemurafenib – Get list in PubMed: 226 publ.
                    • dermoscopy; computer; image interpretation – Get list in PubMed: 174 publ.
                    • sunburn; sunscreening agents; health behavior – Get list in PubMed: 149 publ.
x                   −214.7
y                   −590.5
The ForceAtlas algorithm (Jacomy, Venturini et al., 2014) was used to create a layout based
on the normalized direct citation value between the specialties (the same relatedness value
used for clustering)4. The algorithm resembles a physical system in which nodes repel each
other and edges attract the nodes. The magnitude of the attraction is relative to the weight of
the edges. For each specialty, edges were restricted to the 20 having the highest relatedness
values. This was done to improve efficiency.
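The restriction of edges to the 20 strongest per specialty can be sketched as below. The text does not state how an edge is treated when it is among the strongest for only one of its two endpoints; the sketch assumes that such an edge is kept. The function name and the toy edges are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_edges(edges, k=20):
    """Keep an edge if it is among the k strongest of at least one endpoint.

    edges: iterable of (source, target, weight), one entry per unordered pair.
    Returns a sorted list of the kept (source, target, weight) tuples.
    """
    by_node = defaultdict(list)
    for source, target, weight in edges:
        by_node[source].append((weight, source, target))
        by_node[target].append((weight, source, target))

    kept = set()
    for node_edges in by_node.values():
        kept.update(heapq.nlargest(k, node_edges))
    return sorted((s, t, w) for w, s, t in kept)


# Toy network with k = 1 to make the effect visible (illustrative only).
toy_edges = [("a", "b", 0.9), ("a", "c", 0.5), ("a", "d", 0.1), ("b", "c", 0.4)]
pruned = top_k_edges(toy_edges, k=1)
```

In the toy network the edge between `b` and `c` is dropped, because it is not the strongest edge of either node, while the weak edge between `a` and `d` survives as the only edge of `d`.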
3.2.2. Discipline level
For each group of sibling specialties, the node of the parent discipline was placed on top of the
group. The means of the x- and y-coordinates were used to position the node of the parent
discipline. Let V = {v1, …, vn} be a set of sibling specialties, with the parent p. The x-coordinate
of p is given by
$$x(p) = \frac{\sum_{i \in \{1, \ldots, n\}} x(v_i)}{n}, \tag{1}$$
where x(vi) is the x-coordinate of vi. The y-coordinate of p was calculated equivalently.
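As a small worked illustration of the positioning rule, the following Python sketch places a parent discipline at the mean of its sibling specialties' coordinates. The function name and the coordinates are hypothetical.

```python
def parent_position(sibling_coords):
    """Mean x- and y-coordinate of a group of sibling specialties.

    sibling_coords: non-empty list of (x, y) tuples, one per specialty.
    """
    n = len(sibling_coords)
    if n == 0:
        raise ValueError("a discipline needs at least one specialty")
    mean_x = sum(x for x, _ in sibling_coords) / n
    mean_y = sum(y for _, y in sibling_coords) / n
    return mean_x, mean_y


# Three hypothetical sibling specialties.
siblings = [(0.0, 0.0), (4.0, 2.0), (2.0, 4.0)]
parent_xy = parent_position(siblings)
```

For the three example points the parent is placed at (2.0, 2.0).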
The same attributes were calculated for disciplines as for specialties. Underlying specialties
were listed in the attribute “Children” in the case of disciplines. As for specialties,
only the 20 relations with the highest relational strength were kept.
The sizes of the specialty nodes were rescaled by dividing by 2. This was done for better
readability of the visualization. The discipline nodes were made partly transparent in order not
to hide the underlying specialties.
3.2.3. Adjusting specialties subnetworks
After the layout had been created and the coordinates of the disciplines had been calculated,
each group of sibling specialties was contracted. This feature emphasizes the hierarchical
structure of the classification and facilitates the interpretation of the network. The adjusted
x-coordinate of the specialty vi is given by
$$x_a(v_i) = x(p) + \bigl(x(v_i) - x(p)\bigr) \times \alpha, \tag{2}$$
where α is a parameter ranging from 0 to 1. If α = 1 no adjustments are made, and the network
keeps the original layout based on the relations between specialties. If α = 0 each subnetwork
of sibling specialties is contracted to its midpoint (i.e., the coordinates of the parent discipline).
In the presented network each subnetwork was contracted to a fifth of its original size (i.e.,
α = 1/5).
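The contraction can be illustrated with a short Python sketch that moves each specialty toward its parent discipline by the factor α. It assumes that the y-coordinate is adjusted in the same way as the x-coordinate. The function names and the coordinates are hypothetical.

```python
def contract(coordinate, parent_coordinate, alpha):
    """Adjusted coordinate: parent + (original - parent) * alpha."""
    return parent_coordinate + (coordinate - parent_coordinate) * alpha


def contract_siblings(siblings, parent_xy, alpha=0.2):
    """Contract a group of sibling specialties toward their parent.

    siblings: {specialty_id: (x, y)}
    parent_xy: (x, y) of the parent discipline.
    alpha: 1 keeps the original layout, 0 collapses all siblings onto the parent.
    """
    px, py = parent_xy
    return {
        key: (contract(x, px, alpha), contract(y, py, alpha))
        for key, (x, y) in siblings.items()
    }


# One hypothetical specialty and its parent, contracted to a fifth (alpha = 1/5).
layout = {"v1": (10.0, 0.0)}
adjusted = contract_siblings(layout, parent_xy=(5.0, 5.0), alpha=0.2)
```

With the parent at (5, 5), the specialty at (10, 0) moves to approximately (6, 4): its distance to the parent shrinks to a fifth while its direction from the parent is preserved.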
The chosen visualization approach emphasizes the delineation of publications obtained by
the clustering algorithm. Furthermore, it emphasizes the hierarchical structure obtained by
clustering lower level clusters into higher level clusters. The benefit of this approach is that
a more easily interpretable overview can be provided. However, the approach comes with
a cost. At the level of specialties, the layout contracts siblings and thereby underemphasizes
relations between different areas.
4 An R-function (with base code in C) was created by my colleague Robert Juhasz for this task. The function is
equivalent to the ForceAtlas layout in Gephi. The following parameter values were used: number of itera-
tions = 10,000, inertia = 0.1, repulsion strength = 5,000, attraction strength = 5, max displacement = 5,
freeze balance = true, freeze strength = 80, freeze inertia = 0.2, gravity = 1, outbound attraction distribution =
false, adjust sizes = false, speed = 1, cooling = 1.
A map without adjustment (α = 1) may represent the relations between a node and all its related
nodes somewhat better. However, no network visualization layout solves this problem entirely
because every layout is limited to a two- (or three-) dimensional Euclidean space. Given the many
and diverse relations that each cluster has, it is only possible for the layout algorithm to find a
best possible solution in such a space. This solution must emphasize some relations (by posi-
tioning the nodes) at the expense of others.
I visualized the networks using the sigma.js package created with the “SigmaExporter”
plugin for the visualization software Gephi5. R was used to create networks and other files
necessary for the visualization (a json network file6, a json configuration file7, and an html
file). The package includes the possibility to search for nodes. Note that this search feature
is restricted to searching in node labels to identify nodes. It cannot be used to restrict the
map to a set of publications or nodes.
The file size of the full base map is large because the hyperlinks contain a great deal of data. To
decrease loading time, I restricted the available online version of the map to the publication
years 2020–February 2022, showing about 2.7 million publications. Below I refer to this map
as the base map. As a result of this restriction, the map shows the current (or most recent) state
of the biomedical literature.
4. RESULTS
In this section I demonstrate how the base map provides both overview and detail by visual-
izing the hierarchical structure of the classification which it has been built upon. I then present
two cases that show how the map can be used to enrich the study of research activities:
coronavirus/Covid-19 research and its historical roots, and the subject orientation of
Stockholm’s three largest universities, part of the “Stockholm Trio” university alliance. Table 2 lists
the URLs for all maps presented in this section.
4.1. The Base Map
Figure 2 is a screenshot of the interactive base map that is available online.
Disciplines and their underlying specialties have been colored by clusters at the level above
disciplines. The map shows clusters oriented from biophysics and biochemistry at the bottom
right to social, psychological, and healthcare aspects of medicine at the bottom left.
The bottom left side includes health profession-related research: nursing, psychology,
medical informatics, and public health.
At the top left side, we find disciplines with a clinical focus, including, for example, neu-
rosurgery, gastroenterology, dentistry, pathology, obstetrics, cardiology diseases, and a variety
of cancers and treatment thereof.
The disciplines at the top middle of the map are focused on cell and molecular medicine,
including research on human proteins, transcription factors, immunology, stem cells, DNA
and RNA, etc. Several clinical areas are strongly connected to the cell and molecular
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
5 The SigmaExporter was developed through the InteractiveVis project at the Oxford Internet Institute, Uni-
versity of Oxford. The Java code of the exporter is available under a GPLv3 License. https://gephi.org/plugins
/#/plugin/sigmaexporter (March 5, 2020).
6 https://petersjogarde.github.io/papers/ hiervis/base/data.json.
7 https://petersjogarde.github.io/papers/ hiervis/base/config.json.
Quantitative Science Studies
1105
Improving overlay maps of science
Table 2. URLs to the base map and overlays presented in the results section
Map                                             URL
Base map                                        https://petersjogarde.github.io/papers/hiervis/base/index.html
Covid-19 publications                           https://petersjogarde.github.io/papers/hiervis/covid_v2/pubs/index.html
Publications cited from Covid-19 publications   https://petersjogarde.github.io/papers/hiervis/covid_v2/cited/index.html
KTH Royal Institute of Technology               https://petersjogarde.github.io/papers/hiervis/sthlm_trio/kth/index.html
Stockholm University                            https://petersjogarde.github.io/papers/hiervis/sthlm_trio/sthlm_univ/index.html
Karolinska Institutet                           https://petersjogarde.github.io/papers/hiervis/sthlm_trio/ki/index.html
disciplines and are positioned between the top middle and the top right, including, for
example, some transplantation, oncology, and rheumatology.
A group of life science disciplines are located to the right of the cell and molecular disci-
plines, including biology, microbiology, biochemistry, biotechnology, environmental sciences,
and environmental engineering. Note that PubMed primarily covers biomedicine and life
sciences and does not have full coverage of, for example, environmental sciences.
It is not easy to compare the base map with other maps of science, such as the RoRI map of
funding landscape (RoRI Institute et al., 2019) and the PubMed model by Boyack et al. (2020),
because the maps display clusters at different levels of aggregation and these other maps do
not provide much possibility for overview. Nonetheless, it is clear that the base map presented
Figure 2. A base map of biomedical science based on about 2.7 million articles from 2020–February 2022.
Figure 3. The discipline “dermatology; melanoma; skin” and its child nodes.
in this paper and the two mentioned maps all position clinical research at one end of the map
and more basic research at the other end of the map. All three maps have areas oriented
towards natural sciences, biophysics, and biochemistry at the basic end of the map. They also
have cell and molecular science positioned close to these areas as well as an area of infectious
diseases. Areas of research oriented towards healthcare and health professions are positioned
furthest from the technical side in all maps. The maps seem to be rather similar at this overall
level.
The zooming feature is displayed in Figure 3, showing specialties in “dermatology; mela-
noma; skin.” The figure reveals major specialties addressing skin cancer, psoriasis, allergy,
acne, and hair loss. By looking at the node sizes we can estimate the relative size of these
fields. Skin cancer and psoriasis are the two largest nodes. Some nodes are about half the size
of these large nodes, for example hair loss (“alopecia; hair; hair follicle”) and allergy (“atopic
dermatitis; pruritus; dog diseases”), while others are very small, for example pemphigus (“pem-
phigus; bullous pemphigoid; pemphigus vulgaris”).
Figure 4 shows that the specialty “skin absorption; pharmaceutic; drug delivery systems” is
clustered together with other skin-related specialties but has strong relations with
pharmaceutical specialties located at the other end of the map, in particular “pharmaceutic;
pharmaceutical science; excipient.”
Clicking on a specialty gives the user further information. This feature is exemplified in
Figure 5, in which the information panel for the skin cancer specialty (“skin neoplasm; cancer;
nevus”) is displayed. The information panel reveals subtopics addressing treatment, the use of
artificial intelligence to detect skin cancers, medication, imaging and behavior, and risk fac-
tors. Hyperlinks make it possible to retrieve the publications underlying each topic in PubMed.
4.2. Coronavirus
To create a map of research related to the coronavirus pandemic that started in late 2019, I
used the search query in Table 3. The query has been designed by the library at Karolinska
Figure 4. Relation between “skin absorption; pharmaceutic; drug delivery systems” and pharmaceutical specialties.
Institutet to get publications both about the disease (COVID-19) and the virus causing the dis-
ease (SARS-CoV-2).
The search query resulted in 145,089 articles from 2019 until February 2022. The COVID-19/
SARS-CoV-2 map (Figure 6)8 shows that most research related to the pandemic fuses into one
8 https://petersjogarde.github.io/papers/hiervis/covid/pubs/index.html.
Figure 5. The specialty “skin neoplasm; cancer; nevus.” The information panel shows information about the specialty. The image has been
cropped to show the five largest topics in the specialty; the interactive map shows all topics.
Table 3. Search query for covid-19/SARS-CoV2 research
Covid*[tw] OR nCov[tw] OR 2019 ncov[tw] OR novel coronavirus[tw] OR novel corona virus[tw] OR "Covid-19"[All Fields] OR "Covid-2019"[All Fields] OR "severe acute respiratory syndrome coronavirus 2"[Supplementary Concept] OR "severe acute respiratory syndrome coronavirus 2"[All Fields] OR "2019-nCoV"[All Fields] OR "SARS-CoV-2"[All Fields] OR "2019nCoV"[All Fields] OR (("Wuhan"[All Fields] AND ("coronavirus"[MeSH Terms] OR "corona virus"[All Fields] OR "coronavirus"[All Fields])) AND (2019/12[PDAT] OR 2020[PDAT] OR 2021[PDAT]))
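A query such as the one in Table 3 could be submitted programmatically, for example through the public NCBI E-utilities ESearch endpoint. The source does not state how the query was executed, so the following is only a minimal sketch under that assumption. It builds the request URL without sending it; the function name `build_esearch_url` and the shortened query fragment are illustrative and not part of the study.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term: str, retmax: int = 0) -> str:
    """Return an ESearch URL for a PubMed query; no request is sent."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"

# A shortened fragment of the Table 3 query, used for illustration only.
fragment = '"SARS-CoV-2"[All Fields] OR "2019-nCoV"[All Fields]'
url = build_esearch_url(fragment)

# Decoding the URL again recovers the original query string.
decoded = parse_qs(urlsplit(url).query)
```

With `retmax` set to 0, a response would carry only the hit count, which is enough to compare against the number of articles reported for the query.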
discipline (“covid; cov; sars”). Nonetheless, almost half of the publications retrieved from the
search query are distributed over a wide range of other specialties, including specialties in
psychiatry and mental health (“psychiatry; health; mental disorders”), nursing and other health
profession related research (“nursing; education; surgery”), remote healthcare (“telemedicine;
radiology; internet”), immunology (“lymphocyte; immunology; hiv”), and infectious diseases
(“infectious disease; microbiology; staphylococcal infection”).
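The distribution described above amounts to projecting a set of publications onto the classification and counting them per discipline and specialty. A minimal sketch of that projection follows, assuming the classification is available as a mapping from PMID to a (discipline, specialty) pair. The function name, the PMIDs, and the specialty labels are invented for illustration; only the discipline labels are taken from the text.

```python
from collections import Counter

def overlay_counts(pmids, classification):
    """Count overlay publications per discipline and per specialty.

    classification maps PMID -> (discipline label, specialty label).
    PMIDs missing from the classification are skipped.
    """
    by_discipline, by_specialty = Counter(), Counter()
    for pmid in pmids:
        if pmid not in classification:
            continue
        discipline, specialty = classification[pmid]
        by_discipline[discipline] += 1
        by_specialty[(discipline, specialty)] += 1
    return by_discipline, by_specialty

# Toy data: PMIDs and specialty labels are invented for illustration.
classification = {
    1: ("covid; cov; sars", "specialty A"),
    2: ("covid; cov; sars", "specialty A"),
    3: ("covid; cov; sars", "specialty B"),
    4: ("psychiatry; health; mental disorders", "specialty C"),
    5: ("telemedicine; radiology; internet", "specialty D"),
}
disciplines, specialties = overlay_counts([1, 2, 3, 4, 5, 99], classification)
total = sum(disciplines.values())
# Share of the overlay that falls outside the main coronavirus discipline.
share_outside = 1 - disciplines["covid; cov; sars"] / total
```

The per-specialty counts would drive node sizes in the overlay, while the per-discipline counts give the overview figures quoted in the text.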
Zooming into the largest coronavirus node (Figure 7) reveals some major topics addressing
imaging of the lungs (“tomography; x ray; chest ct”), pregnancy (“infectious pregnancy
complication; pregnancy; pregnant woman”), thromboembolism (“venous thromboembolism;
pulmonary embolism; anticoagulant”), characteristics of the virus (“variant; coronavirus spike
glycoprotein; concern”), and effects on the neurological system (“nervous system diseases;
guillain; barre syndrome”).
By creating a map of research cited by the coronavirus research (Figure 8)9, we get a picture
of the research upon which the coronavirus research has been built. Node sizes are relative to
the total number of publications cited by the set of coronavirus research publications and the
9 https://petersjogarde.github.io/papers/hiervis/covid/cited/index.html.
Figure 6. Covid-19/SARS-CoV-2 research.
Figure 7. Topics in coronavirus cluster (“covid 19; coronavirus infections; viral pneumonia”).
Figure 8. Publications cited by covid-19/SARS-CoV-2 research, published before the pandemic. Node sizes reflect the number of cited pub-
lications and node colors the average number of citations per cited publication.
colors have been set by the clusters’ average number of citations from this set. The map shows
that coronavirus research is based on a wide range of areas, for example:
1. knowledge from previous coronavirus epidemics (topics addressing SARS and MERS in
the specialty “coronavirus infection; covid 19; viral pneumonia”);
2. research on mRNA vaccines (the topic “messenger rna; mrna; mrna vaccine” in “dna;
transfection; gene transfer technique”);
3. research on drug targeting (“drug repositioning; drug target interaction; drug target
interaction prediction” in “chemical information; modeling; drug design”); and
4. protein structure modeling (the topic “sars; cov; sars cov 2” in the specialty “hla; allele;
tissue antigen”).
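The node attributes used in the cited-publications map (size from the number of cited publications, color from the average number of citations per cited publication) can be illustrated with a small aggregation. This is a sketch under the assumption that each cited publication is available as a (cluster, citation count) record; the function name and the toy records are invented and are not data from the study.

```python
from collections import defaultdict

def cited_node_attributes(cited):
    """Aggregate cited publications into per-cluster node attributes.

    cited: iterable of (cluster label, citations received from the overlay set),
    one tuple per cited publication.
    Returns {cluster: (size, color_value)}, where size is the number of cited
    publications and color_value the mean citations per cited publication.
    """
    counts = defaultdict(int)
    citations = defaultdict(int)
    for cluster, n_citations in cited:
        counts[cluster] += 1
        citations[cluster] += n_citations
    return {c: (counts[c], citations[c] / counts[c]) for c in counts}

# Invented toy records, not data from the study.
records = [("cluster X", 10), ("cluster X", 2), ("cluster Y", 30)]
nodes = cited_node_attributes(records)
```

In this toy case the smaller cluster has the higher color value, which is the kind of contrast that separating size from color is meant to expose.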
4.3. Stockholm Trio
Stockholm Trio is a university alliance in Stockholm including the city’s three large universi-
ties: KTH Royal Institute of Technology (KTH), Karolinska Institutet (KI), and Stockholm Uni-
versity. The universities have fundamentally different subject orientations: KTH is a one-faculty
technical university, Karolinska Institutet is a one-faculty medical university, and Stockholm
University has several faculties within the humanities, social sciences, and natural sciences.
Mapping of the biomedical research at the three universities may help management to find
areas of potential collaboration.
Table 4. Search queries used to identify publications by the universities in Stockholm Trio. “%” was used for truncation and “_” to match any character
University              Search queries
KTH                     '%KTH%Sweden%'
                        '%royal%inst%tech%sweden%'
                        '%kungliga tekniska%sweden%'
Karolinska Institutet   '%karolinska%sweden%'
                        '%university hosp%huddinge%sweden%'
                        '%university hosp%solna%sweden%'
                        '%danderyd hosp%sweden%'
                        '%s_dersjuk%sweden%'
                        '%stockholm county council%sweden%'
Stockholm University    '%stockholm univ%sweden%'
                        '%university of stockholm%sweden%'
To analyze the publication output of the three universities, I created one map for each uni-
versity. The maps were delimited to PubMed (i.e., the biomedical area) and to 2019–2021. The
publications authored by researchers at the three universities of the Stockholm Trio were iden-
tified using search queries. Table 4 shows the search queries used to identify publications by
researchers at the three universities in KI’s internal version of PubMed. A minor part of the
publications by each university was not captured by these rather few and simple search
queries. However, the coverage is sufficient for the purpose of illustration.
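The wildcard semantics stated for Table 4 (“%” for truncation, “_” for any single character) resemble SQL LIKE patterns. The sketch below mimics them with regular expressions to show how an affiliation string would be tested against a set of patterns. It assumes case-insensitive matching against the full affiliation string, which the source does not specify, and the affiliation strings are made-up examples.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern: '%' -> any string, '_' -> any one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def matches_any(affiliation: str, patterns) -> bool:
    """True if the whole affiliation string matches at least one pattern."""
    return any(like_to_regex(p).fullmatch(affiliation) for p in patterns)

# Two of the Table 4 patterns; the affiliation strings are made-up examples.
KI_PATTERNS = ["%karolinska%sweden%", "%s_dersjuk%sweden%"]
hit = matches_any("S\u00f6dersjukhuset, Stockholm, Sweden", KI_PATTERNS)
miss = matches_any("Karolinska Institutet, Stockholm", KI_PATTERNS)
```

The second string fails because every pattern requires the country name, which illustrates one way in which a minor part of a university's publications can escape such simple queries.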
Figure 9 displays snapshots of the maps for KTH (A), Stockholm University (B) and Karo-
linska Institutet (C) for the publication years 2019–2021.
The KTH map (Figure 9A) shows about 2,400 biomedical publications, which is a minor
part of KTH’s total publication output. It shows a clear focus on the technical areas to the right
of the map, including materials science, biophysics, and biochemistry (disciplines “pharmacy;
nanoparticle; engineering,” “chemistry; biochemistry; chemical engineering,” and “technol-
ogy; physics; engineering”). There are also a relatively large number of publications in
environmental engineering (“technology; environmental engineering; environment”) and bio-
technology (“biotechnology; technology; biochemistry”). At the top middle part of the map we
find KTH publications in disciplines focusing on cell and molecular medicine (“microrna; dna;
biochemistry” and “lymphocyte; immunology; hiv”). There are also publications in some clin-
ical areas (e.g., “cardiology; heart; heart failure,” “neurosurgery; radiology; stroke,” “orthopae-
dic surgery; orthopaedics; arthroplasty,” and “radiation oncology; urology; radiology”). At the
bottom left side KTH has some publications in psychiatry, mental health, and brain research
(“psychiatry; neurology; pharmacology,” “brain; cognition; attention,” and “psychiatry;
health; mental disorders”). There are also some publications in the covid-19 cluster (“covid;
cov; sars”).
The KTH map reveals a clear focus on methodology, expressed by terms such as “waste-
water surveillance” in the covid-19 cluster, “magnetic resonance imaging” in the neurosurgery
discipline, and “tomography” in “radiation oncology; urology; radiology”.
Figure 9. Biomedical publications by KTH (A), Stockholm University (B) and Karolinska Institutet (C). Publication years 2019–2021.
The map for Stockholm University (Figure 9B) shows about 4,400 biomedical publications,
which is a small proportion of Stockholm University’s total publication output. This map
reveals a focus towards natural sciences, biophysics, and biochemistry, but also a large pro-
portion of publications in psychology. There is a stronger focus on biology and ecology in this
map than in the KTH map. Some disciplines are of about the same size in the Stockholm Uni-
versity map and the KTH map. However, zooming in to the specialties reveals differences. For
example, in “technology; environmental engineering; environment” Stockholm University has
a focus on monitoring the environmental and toxicological effects from pollutants and emis-
sions and KTH has a focus on bioengineering (expressed by terms such as “waste disposal,”
“filtering,” “arsenite binding,” and “deionization”). Like KTH, Stockholm University
has only a few publications on the clinical side of the map (top left).
The KI map (Figure 9C) is fundamentally different from the Stockholm University and KTH
maps. It shows about 21,600 biomedical publications, which cover a high proportion of KI’s
total publication output. KI has a wide range of research in most areas of the map, not least
on the clinical side. However, there is a smaller proportion of publications on the right side of
the map, including, for example, medical aspects of nanotechnology, toxicology, and micros-
copy. Similar to Stockholm University, KI has many publications in psychology and psychiatry.
5. DISCUSSION
I have shown how a publication-level classification, including both coarse and granular levels,
can be used to create overlay maps of science that provide both overview and detail. To exem-
plify the use of such maps I have demonstrated potential utilization by revealing the topical
structure of coronavirus/COVID-19 research and differences and similarities in research orien-
tation at three universities.
The visualizations created by the methodology put forward in this paper enable the navi-
gation of millions of articles, from broad levels down to individual articles. No existing soft-
ware supports such navigation. Potentially, any set of PubMed publications can be projected
onto the map (currently not supported by the web tool): for example, the publication output of
an organization or a journal or within a particular research field. The contents of the disci-
plines and specialties displayed in the visualization can be explored down to narrow topics
and individual articles. Thereby, analysts and other users can get a deeper and richer under-
standing of the data displayed in overlay maps. Potentially, the visualization technique can be
used in other applications, for example to display local maps or to visualize search results in
information retrieval systems.
There are some limitations of the visualization methodology. Some of these limitations
relate to the creation of the classification and some to the visualization methodology itself.
Several researchers have acknowledged that different choices of methods, parameter values,
relational measures, and clustering algorithms result in diverse representations of science,
sometimes equally valid (for examples, see Gläser et al., 2017; Waltman et al., 2020). My
choices of citation relation, clustering algorithm, parameter values, and labeling approach have
been guided by both empirical support and practical considerations. My approach, based on
direct citations, has the advantages of being efficient (having fewer relations than bibliographic
coupling and cocitations) and including relations to and from a large proportion of the publi-
cations in the data source, given that a comprehensive data source and a large time span are
used. The direct citation approach has performed well in quantitative evaluations using large
corpuses (Boyack & Klavans, 2020; Klavans & Boyack, 2017). The resolution parameter values
have also been guided by previous research (Sjögårde & Ahlgren, 2018, 2020). Nonetheless,
other representations may be equally valid and express other aspects of the research land-
scape. For example, cocitations may be a better choice if one wants to examine the historical
development of a research field, and bibliographic coupling might be preferable to display
related publications in an information retrieval setting.
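The efficiency argument for direct citations can be made concrete by counting how many publication pairs each relation type links in a small network. The sketch below uses an invented toy network with one long reference list and one highly cited publication, the two situations in which cocitation and bibliographic coupling multiply relations. The function name and data are illustrative; whether direct citation yields fewer relations in practice depends on the network at hand.

```python
from itertools import combinations

def relation_counts(references):
    """Count distinct publication pairs linked by each citation relation.

    references maps each citing publication to the set of publications it cites.
    """
    # Direct citation: one pair per citing/cited combination.
    direct = {frozenset((a, b)) for a, refs in references.items() for b in refs}
    # Bibliographic coupling: two citing publications share at least one reference.
    coupling = {
        frozenset((a, b))
        for a, b in combinations(references, 2)
        if references[a] & references[b]
    }
    # Cocitation: two publications appear together in some reference list.
    cocitation = {
        frozenset(pair)
        for refs in references.values()
        for pair in combinations(sorted(refs), 2)
    }
    return len(direct), len(coupling), len(cocitation)

# Invented toy network: "r" has a long reference list, and "a" is cited by
# six further papers that cite nothing else.
toy = {"r": set("abcdef")}
toy.update({f"p{i}": {"a"} for i in range(1, 7)})
n_direct, n_coupling, n_cocitation = relation_counts(toy)
```

Here 12 direct citation pairs stand against 21 bibliographic coupling pairs and 15 cocitation pairs, because the indirect relations grow quadratically with the number of citing publications and with reference list length.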
Labeling of the obtained clusters is a challenging task. Even though the methodology used
works reasonably well, the subject orientation of clusters is sometimes hard to interpret using
the cluster labels. Occasionally, other information needs to be considered by a user to under-
stand the subject orientation of a cluster and to distinguish it from other clusters; for example,
additional key terms, sibling cluster labels, parent and child cluster labels, and publication
records in PubMed. Providing interactivity facilitates such interpretation. However, it
remains unclear to what extent interpretation is a problem for users. Further work
evaluating the interpretability of classifications from a user perspective is therefore needed.
The visualizations that I have presented include clusters visualized as nodes at two levels. It
is possible to visualize more levels, but at the risk of making the visualization more cluttered
and harder to interpret. Functionality hiding nodes at granular levels when zooming out and
showing nodes when zooming in might be an option to be able to include more levels in the
visualization. However, including nodes at additional levels does not necessarily help users to
read and interpret the visualization. Users might, for example, prefer reading lists at more gran-
ular levels. Therefore, user studies are needed to develop user-friendly features and to make
interactive overlay maps of science easier to interpret.
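The zoom-dependent functionality suggested above could be driven by a simple rule that maps the current zoom factor to the hierarchy levels that should be drawn. The following is a hypothetical sketch, not a feature of the current web tool; the function name and the threshold values are assumptions chosen for illustration.

```python
def visible_levels(zoom: float, thresholds: dict) -> list:
    """Return the hierarchy levels to draw at a given zoom factor.

    thresholds maps a level name to the minimum zoom at which it appears.
    """
    return [level for level, min_zoom in thresholds.items() if zoom >= min_zoom]

# Hypothetical thresholds; level names follow the classification in this paper.
THRESHOLDS = {"discipline": 0.0, "specialty": 1.0, "topic": 4.0}
overview = visible_levels(0.5, THRESHOLDS)
detail = visible_levels(5.0, THRESHOLDS)
```

Zoomed out, only disciplines would be drawn, and the more granular levels would appear as the user zooms in. Suitable thresholds would have to come from the user studies called for above.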
The intention of this study has not been to evaluate normalization methods or layout algo-
rithms. There might be better options to create layouts, in particular at the discipline level,
regarding both normalization of citation relations and layout algorithm.
I have emphasized the clusters by contracting the subnetwork of sibling specialties. This
procedure puts sibling specialties in proximity and improves readability. However, it may dis-
tort relations outside the cluster. Stressing the hierarchical structure of the classification hides
transverse relations in this structure, such as relations between specialties with different
parents. Complementing the visualization with other information may be a viable option to
make such relations visible and to enrich analyses. The purpose of a study must guide the
application of the map. For example, the map can be restricted to concepts expressed by
MeSH and transverse relations can be highlighted using the citation relations obtained when
constructing the classification. Using specialties at the top level and topics at a lower level
might be a better solution for smaller sets of data.
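To illustrate the trade-off of contraction, the sketch below scales the positions of sibling specialties towards their common centroid. This is an assumption made for illustration only and not necessarily the procedure used to build the maps; the function name, the `factor` parameter, and the coordinates are hypothetical.

```python
def contract_siblings(positions, factor=0.5):
    """Scale sibling node positions towards their common centroid.

    positions: {node: (x, y)} for the specialties under one discipline.
    factor: 1.0 keeps the layout unchanged, 0.0 collapses nodes onto the centroid.
    """
    n = len(positions)
    cx = sum(x for x, _ in positions.values()) / n
    cy = sum(y for _, y in positions.values()) / n
    return {
        node: (cx + factor * (x - cx), cy + factor * (y - cy))
        for node, (x, y) in positions.items()
    }

# Invented coordinates for three sibling specialties.
siblings = {"s1": (0.0, 0.0), "s2": (4.0, 0.0), "s3": (2.0, 6.0)}
contracted = contract_siblings(siblings, factor=0.5)
```

Relative positions among siblings are preserved, but distances to nodes outside the cluster change, which is the distortion of outside relations noted above.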
The choice of the ForceAtlas layout algorithm was guided by my experience with visual-
izing a wide range of bibliometric networks (e.g., coauthor networks, MeSH-networks, coau-
thoring organization networks, and article citation networks) in bibliometric practice at
Swedish universities. ForceAtlas is implemented in the visualization software Gephi10, which
makes it possible to try out different parameter values to facilitate the readability of a particular
network and more generally to learn how to use parameter values for different kinds of networks.
In my experience, the ForceAtlas algorithm creates visualizations that are interpretable and
make sense, and I have received positive feedback from users on visualizations created using
this layout. Alternatives to ForceAtlas are, for example, the OpenOrd layout algorithm used by
Boyack et al. (2020), the VOS layout algorithm which is implemented in the VOSviewer soft-
ware (van Eck, Waltman, et al., 2010b; van Eck & Waltman, 2010), the Fruchterman–Reingold
10 https://gephi.org/.
layout algorithm (Fruchterman & Reingold, 1991) and the Kamada–Kawai layout algorithm
(Kamada & Kawai, 1989).
The maps created in this study have been restricted to the biomedical sciences. During
recent years the amount and proportion of available bibliographic metadata has increased
substantially. Future work may be extended to other research fields.
There are several technical issues related to the visualization tool used. For example, the
current version of the visualization package does not support smartphones and tablets; iden-
tification of areas of interest to a user could be facilitated by filters and improved search fea-
tures; loading the visualization files is rather slow; and hyperlinks are provided in batches if a
cluster includes more than 500 publications. The intention has not been to provide a perfect
visualization tool but rather to show how interactive visualizations of hierarchical classifica-
tions can provide users with enriched possibilities to explore the scientific literature. I have
demonstrated that it is possible to provide maps of science that can give the user an overview
of millions of publications and details down to individual publications. Such maps may con-
stitute a valuable tool for researchers studying science, improve the transparency of cluster-
based citation normalization, support research management and policy making, and constitute
a tool for researchers to explore research of relevance to them.
ACKNOWLEDGMENTS
I would like to thank Ludo Waltman and two anonymous reviewers for their constructive feed-
back on an earlier version of this paper.
COMPETING INTERESTS
The author has no competing interests.
FUNDING INFORMATION
Peter Sjögårde was funded by the Foundation for Promotion and Development of Research at
Karolinska Institutet.
DATA AVAILABILITY
All maps and underlying data and configuration files are available online:
• Classification and labels:
https://doi.org/10.6084/m9.figshare.c.5610971.v3
• Base map:
https://petersjogarde.github.io/papers/hiervis/base/index.html
• Base map files:
https://github.com/petersjogarde/petersjogarde.github.io/tree/main/papers/hiervis/base
• Covid-19/SARS-CoV-2 maps:
Publications: https://petersjogarde.github.io/papers/hiervis/covid_v2/pubs/index.html
Cited publications: https://petersjogarde.github.io/papers/hiervis/covid_v2/cited/index.html
• Covid-19/SARS-CoV-2 files:
https://github.com/petersjogarde/petersjogarde.github.io/tree/main/papers/hiervis/covid_v2
• Stockholm trio maps:
KTH: https://petersjogarde.github.io/papers/hiervis/sthlm_trio/kth/index.html
Stockholm University: https://petersjogarde.github.io/papers/hiervis/sthlm_trio/sthlm_univ/index.html
KI: https://petersjogarde.github.io/papers/hiervis/sthlm_trio/ki/index.html
• Stockholm trio files:
https://github.com/petersjogarde/petersjogarde.github.io/tree/main/papers/hiervis/sthlm_trio
REFERENCES
Ahlgren, P., Chen, Y., Colliander, C., & van Eck, N. J. (2020).
Enhancing direct citations: A comparison of relatedness mea-
sures for community detection in a large set of PubMed publica-
tions. Quantitative Science Studies, 1(2), 714–729. https://doi.org
/10.1162/qss_a_00027
Bassecoulard, E., & Zitt, M. (1999). Indicators in a research insti-
tute: A multi-level classification of scientific journals. Sciento-
metrics, 44(3), 323–345. https://doi.org/10.1007/BF02458483
Börner, K., Chen, C., & Boyack, K. W. (2005). Visualizing knowledge
domains. Annual Review of Information Science and Technology,
37(1), 179–255. https://doi.org/10.1002/aris.1440370106
Boyack, K. W. (2017). Investigating the effect of global data on
topic detection. Scientometrics, 111(2), 999–1015. https://doi
.org/10.1007/s11192-017-2297-y
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, biblio-
graphic coupling, and direct citation: Which citation approach
represents the research front most accurately? Journal of the
American Society for Information Science and Technology, 61(12),
2389–2404. https://doi.org/10.1002/asi.21419
Boyack, K. W., & Klavans, R. (2018). Accurately identifying topics
using text: Mapping PubMed. In STI 2018 Conference Proceedings
(pp. 107–115). https://openaccess.leidenuniv.nl/handle/1887/65319
Boyack, K. W., & Klavans, R. (2020). A comparison of large-scale
science models based on textual, direct citation and hybrid relat-
edness. Quantitative Science Studies, 1(4), 1570–1585. https://
doi.org/10.1162/qss_a_00085
Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the back-
bone of science. Scientometrics, 64(3), 351–374. https://doi.org
/10.1007/s11192-005-0255-6
Boyack, K. W., Newman, D., Duhon, R. J., Klavans, R., Patek, M., …
Börner, K. (2011). Clustering more than two million biomedical
publications: Comparing the accuracies of nine text-based simi-
larity approaches. PLOS ONE, 6(3), e18029. https://doi.org/10
.1371/journal.pone.0018029, PubMed: 21437291
Boyack, K. W., Smith, C., & Klavans, R. (2020). A detailed open access
model of the PubMed literature. Scientific Data, 7(1), 408. https://doi
.org/10.1038/s41597-020-00749-y, PubMed: 33219227
Donner, P. (2021). Validation of the Astro dataset clustering solu-
tions with external data. Scientometrics, 126(2), 1619–1645.
https://doi.org/10.1007/s11192-020-03780-3
Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph drawing by
force-directed placement. Journal of Software: Practice and Experi-
ence, 21(11), 1129–1164. https://doi.org/10.1002/spe.4380211102
Glänzel, W., & Schubert, A. (2003). A new classification scheme of
science fields and subfields designed for scientometric evalua-
tion purposes. Scientometrics, 56(3), 357–367. https://doi.org
/10.1023/A:1022378804087
Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Same data—
Different results? Towards a comparative approach to the identi-
fication of thematic structures in science. Scientometrics, 111(2),
981–998. https://doi.org/10.1007/s11192-017-2296-z
Hammarfelt, B. (2019). Discipline. In ISKO encyclopedia of knowl-
edge organization. https://www.isko.org/cyclo/discipline
Haunschild, R., Schier, H., Marx, W., & Bornmann, L. (2018). Algo-
rithmically generated subject categories based on citation rela-
tions: An empirical micro study using papers on overall water
splitting. Journal of Informetrics, 12(2), 436–447. https://doi.org
/10.1016/j.joi.2018.03.004
Held, M., Laudel, G., & Gläser, J. (2021). Challenges to the validity
of topic reconstruction. Scientometrics, 126, 4511–4536. https://
doi.org/10.1007/s11192-021-03920-3
Hutchins, B. I., Baker, K. L., Davis, M. T., Diwersy, M. A., Haque,
E., … Santangelo, G. M. (2019). The NIH Open Citation Collec-
tion: A public access, broad coverage resource. PLOS Biology,
17(10), e3000385. https://doi.org/10.1371/journal.pbio
.3000385, PubMed: 31600197
iCite, Hutchins, B. I., & Santangelo, G. (2019). ICite Database Snap-
shots (NIH Open Citation Collection). The NIH Figshare Archive.
Collection. https://doi.org/10.35092/yhjc.c.4586573
Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014).
ForceAtlas2, a continuous graph layout algorithm for handy net-
work visualization designed for the Gephi software. PLOS ONE,
9(6), e98679. https://doi.org/10.1371/journal.pone.0098679,
PubMed: 24914678
Kamada, T., & Kawai, S. (1989). An algorithm for drawing general
undirected graphs. Information Processing Letters, 31(1), 7–15.
https://doi.org/10.1016/0020-0190(89)90102-6
Kay, L., Newman, N., Youtie, J., Porter, A. L., & Rafols, I. (2014).
Patent overlay mapping: Visualizing technological distance. Jour-
nal of the Association for Information Science and Technology,
65(12), 2432–2443. https://doi.org/10.1002/asi.23146
Klaine, S. J., Koelmans, A. A., Horne, N., Carley, S., Handy, R. D., …
von der Kammer, F. (2012). Paradigms to assess the environmen-
tal impact of manufactured nanomaterials. Environmental Toxi-
cology and Chemistry, 31(1), 3–14. https://doi.org/10.1002/etc
.733, PubMed: 22162122
Klavans, R., & Boyack, K. W. (2011). Using global mapping to
create more accurate document-level maps of research fields.
Journal of the American Society for Information Science and
Technology, 62(1), 1–18. https://doi.org/10.1002/asi.21444
Klavans, R., & Boyack, K. W. (2017). Which type of citation analysis
generates the most accurate taxonomy of scientific and technical
knowledge? Journal of the Association for Information Science and
Technology, 68(4), 984–998. https://doi.org/10.1002/asi.23734
Leydesdorff, L. (2004). Clusters and maps of science journals based
on bi-connected graphs in Journal Citation Reports. Journal of
Documentation, 60(4), 371–427. https://doi.org/10.1108
/00220410410548144
Leydesdorff, L. (2006). Can scientific journals be classified in terms
of aggregated journal-journal citation relations using the Journal
Citation Reports? Journal of the American Society for Information
Science and Technology, 57(5), 601–613. https://doi.org/10.1002
/asi.20322
Leydesdorff, L., Moya-Anegón, F. d., & Guerrero-Bote, V. P. (2015).
Journal maps, interactive overlays, and the measurement of inter-
disciplinarity on the basis of Scopus data (1996–2012). Journal of
the Association for Information Science and Technology, 66(5),
1001–1016. https://doi.org/10.1002/asi.23243
Mai, J. (2011). The modernity of classification. Journal of Documenta-
tion, 67(4), 710–730. https://doi.org/10.1108/00220411111145061
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., &
McClosky, D. (2014). The Stanford CoreNLP natural language
processing toolkit. In Proceedings of 52nd Annual Meeting of
the Association for Computational Linguistics: System Demon-
strations (pp. 55–60). https://doi.org/10.3115/v1/P14-5010
Moya-Anegón, F., Vargas-Quesada, B., Herrero-Solana, V.,
Chinchilla-Rodríguez, Z., Corera-Álvarez, E., & Munoz-Fernández,
F. J. (2004). A new technique for building maps of large scientific
domains based on the cocitation of classes and categories.
Scientometrics, 61(1), 129–145.
https://doi.org/10.1023/B:SCIE.0000037368.31217.34
Petrovich, E. (2020). Science mapping. In Encyclopedia of knowl-
edge organization. https://www.isko.org/cyclo/science_mapping
Rafols, I., Porter, A. L., & Leydesdorff, L. (2010). Science overlay
maps: A new tool for research policy and library management.
Journal of the American Society for Information Science and
Technology, 61(9), 1871–1887. https://doi.org/10.1002/asi.21368
RoRI Institute, Waltman, L., Rafols, I., van Eck, N. J., & Yegros
Yegros, A. (2019). Supporting priority setting in science using
research funding landscapes (Report No. 1; RoRI Working Paper).
Research on Research Institute.
https://doi.org/10.6084/m9.figshare.9917825.v1
Rotolo, D., Rafols, I., Hopkins, M. M., & Leydesdorff, L. (2017).
Strategic intelligence on emerging technologies: Scientometric
overlay mapping. Journal of the Association for Information
Science and Technology, 68(1), 214–233.
https://doi.org/10.1002/asi.23631
Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically
constructed publication-level classifications of research publica-
tions: Identification of topics. Journal of Informetrics, 12(1),
133–152. https://doi.org/10.1016/j.joi.2017.12.006
Sjögårde, P., & Ahlgren, P. (2020). Granularity of algorithmically
constructed publication-level classifications of research publica-
tions: Identification of specialties. Quantitative Science Studies,
1(1), 207–238. https://doi.org/10.1162/qss_a_00004
Sjögårde, P., Ahlgren, P., & Waltman, L. (2021). Algorithmic label-
ing in hierarchical classifications of publications: Evaluation of
bibliographic fields and term weighting approaches. Journal of
the Association for Information Science and Technology, 72(7),
853–869. https://doi.org/10.1002/asi.24452
Small, H. (1999). Visualizing science by citation mapping. Journal
of the American Society for Information Science, 50(9), 799–813.
https://doi.org/10.1002/(SICI)1097-4571(1999)50:9<799::AID-ASI9>3.0.CO;2-G
Smiraglia, R. P., & van den Heuvel, C. (2013). Classifications and
concepts: Towards an elementary theory of knowledge interaction.
Journal of Documentation, 69(3), 360–383.
https://doi.org/10.1108/JD-07-2012-0092
Šubelj, L., van Eck, N. J., & Waltman, L. (2016). Clustering scientific
publications based on citation relations: A systematic comparison
of different methods. PLOS ONE, 11(4), e0154404.
https://doi.org/10.1371/journal.pone.0154404, PubMed: 27124610
Tang, L., & Shapira, P. (2011). China–US scientific collaboration in
nanotechnology: Patterns and dynamics. Scientometrics, 88,
1–16. https://doi.org/10.1007/s11192-011-0376-z
Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003).
Feature-rich part-of-speech tagging with a cyclic dependency
network. In Proceedings of the 2003 Conference of the North
American Chapter of the Association for Computational Linguistics
on Human Language Technology (pp. 173–180).
https://doi.org/10.3115/1073445.1073478
Toutanova, K., & Manning, C. D. (2000). Enriching the knowledge
sources used in a maximum entropy part-of-speech tagger. In
Proceedings of the 2000 Joint SIGDAT Conference on Empirical
Methods in Natural Language Processing and Very Large Corpora
(pp. 63–70). https://doi.org/10.3115/1117794.1117802
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to
Leiden: Guaranteeing well-connected communities. Scientific
Reports, 9(1), 5233. https://doi.org/10.1038/s41598-019-41695-z,
PubMed: 30914743
van Eck, N. J., & Waltman, L. (2014). Visualizing bibliometric networks.
In Y. Ding, R. Rousseau, & D. Wolfram (Eds.), Measuring scholarly
impact: Methods and practice (pp. 285–320). Springer International
Publishing. https://doi.org/10.1007/978-3-319-10377-8_13
van Eck, N. J., Waltman, L., Noyons, E. C. M., & Buter, R. K. (2010a).
Automatic term identification for bibliometric mapping.
Scientometrics, 82(3), 581–596.
https://doi.org/10.1007/s11192-010-0173-0, PubMed: 20234767
van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer,
a computer program for bibliometric mapping. Scientometrics,
84(2), 523–538. https://doi.org/10.1007/s11192-009-0146-3,
PubMed: 20585380
van Eck, N. J., Waltman, L., Dekker, R., & van den Berg, J. (2010b).
A comparison of two techniques for bibliometric mapping:
Multidimensional scaling and VOS. Journal of the American
Society for Information Science and Technology, 61(12),
2405–2416. https://doi.org/10.1002/asi.21421
Velden, T., Boyack, K. W., Gläser, J., Koopman, R., Scharnhorst, A.,
& Wang, S. (2017). Comparison of topic extraction approaches
and their results. Scientometrics, 111(2), 1169–1221.
https://doi.org/10.1007/s11192-017-2306-1
Waltman, L., Boyack, K. W., Colavizza, G., & van Eck, N. J. (2020).
A principled methodology for comparing relatedness measures
for clustering publications. Quantitative Science Studies, 1(2),
691–713. https://doi.org/10.1162/qss_a_00035
Waltman, L., & van Eck, N. J. (2012). A new methodology for con-
structing a publication-level classification system of science.
Journal of the American Society for Information Science and
Technology, 63(12), 2378–2392. https://doi.org/10.1002/asi.22748
Zitt, M., Lelu, A., Cadot, M., & Cabanac, G. (2019). Bibliometric
delineation of scientific fields. In Springer handbook of science
and technology indicators (pp. 25–68).
https://doi.org/10.1007/978-3-030-02511-3_2