RESEARCH ARTICLE
Practical method to reclassify Web
of Science articles into unique subject
categories and broad disciplines
Staša Milojević
an open access journal
Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering,
Indiana University, Bloomington
Citation: Milojević, S. (2020). Practical
method to reclassify Web of Science
articles into unique subject categories
and broad disciplines. Quantitative
Science Studies, 1(1), 183–206. https://
doi.org/10.1162/qss_a_00014
DOI:
https://doi.org/10.1162/qss_a_00014
Received: 17 July 2019
Accepted: 03 December 2019
Corresponding Author:
Staša Milojević
smilojev@indiana.edu
Handling Editor:
Ludo Waltman
Keywords: classification
ABSTRACT
Classification of bibliographic items into subjects and disciplines in large databases is essential
for many quantitative science studies. The Web of Science classification of journals into
approximately 250 subject categories, which has served as a basis for many studies, is
known to have some fundamental problems and several practical limitations that may
affect the results from such studies. Here we present an easily reproducible method to
perform reclassification of the Web of Science into existing subject categories and into 14
broad areas. Our reclassification is at the level of articles, so it preserves disciplinary
differences that may exist among individual articles published in the same journal.
Reclassification also eliminates ambiguous (multiple) categories that are found for 50% of
items and assigns a discipline/field category to all articles that come from broad-coverage
journals such as Nature and Science. The correctness of the assigned subject categories
is evaluated manually and is found to be ∼95%.
1. INTRODUCTION
The problem of the classification of science has attracted the attention of philosophers and
scientists alike for centuries (Dolby, 1979). The practice of classification is usually understood
as a process of arranging things “in groups which are distinct from each other, and are separated
by clearly determined lines of demarcation” (Durkheim & Mauss, 1963, p. 4). However,
nature, and therefore science, with all its complexity, does not conform to any particular categorization
or hierarchical structuring (Bryant, 2000), and there is no singular or perfect classification
(Glänzel & Schubert, 2003). Despite inherent limitations, classifications are of
practical use to organize and study knowledge. Many classification schemes of science and
scientific literature have been proposed, with different levels of granularity and/or hierarchy.
Different schemes have different levels of complexity and sophistication, and criteria can be
constructed to compare and evaluate them (Rafols & Leydesdorff, 2009).
The classification of scientific literature has been pursued within quantitative science studies
since at least the 1970s (e.g., Carpenter & Narin, 1973; Narin, Carpenter, & Berlt, 1972;
Small & Griffith, 1974; Small & Koenig, 1977). A number of studies frame this research as
discipline/field delineation or delimitation (Gläser, Glänzel, & Scharnhorst, 2017; Gómez,
Bordons, Fernandez, & Méndez, 1996; López-Illescas, Noyons, Visser, De Moya-Anegón, &
Moed, 2009; Zitt, 2015). The search for adequate solutions to classification has intensified in
recent years, often motivated by finding appropriate reference sets for citation normalization
needed for evaluation studies (Bornmann, 2014; Glänzel & Schubert, 2003; Haunschild, Schier,
Marx, & Bornmann, 2018; Leydesdorff & Bornmann, 2016).
Copyright: © 2020 Staša Milojević. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The MIT Press
Recent classification efforts have most commonly been divided into journal-focused and
paper (article)-focused solutions. The most prevalent and widely used classification of literature
into disciplines is via journals, based on a simplistic assumption that a discipline can be
defined through journal subject categories (Carpenter & Narin, 1973; Narin, 1976; Narin,
Pinski, & Gee, 1976). Such an approach is not surprising: journals often serve as anchors for
individual research communities, and new journals may signify the formation of disciplines.
On a more practical note, the Web of Science (WoS) Journal Citation Reports subject categories
are “one of the few classification systems available, spanning all disciplines” (Rinia, van
Leeuwen, Bruins, van Vuren, & Van Raan, 2001, p. 296), and are easy to implement because they are
available for items in one of the most widely used bibliographic databases, WoS. WoS classifies
all of the journals it indexes into approximately 250 groups called subject categories.
Each journal is classified into at least one and up to six subject categories. The classification uses a number
of heuristics, and a rather general description of it is provided by Pudovkin and Garfield
(2002). WoS classification is not explicitly hierarchical, even though some subject categories
can be considered as part of other, broader ones. In addition, WoS contains categories that
are explicitly broad (labeled as multidisciplinary) in order to describe the content of journals
that publish across one broad area or across the entire field of science.
Over the years, a number of other journal-centered classifications have been developed.
Most of them are hierarchical. For example, Scopus, another major bibliographic database,
uses the All Science Journal Classification (ASJC). The National Science Foundation (NSF) uses a
two-level system in which journals are classified into 14 broad fields and 144 lower level
fields known as CHI, after Narin and Carpenter’s company, Computer Horizons, Inc., which
developed it in the 1970s (Archambault, Beauchesne, & Caruso, 2011). Science-Metrix uses a
three-level classification that classifies journals into exclusive categories using both algorithmic
methods and expert judgment (Archambault et al., 2011). Glänzel and Schubert (2003)
developed the KU Leuven ECOOM journal classification. Gómez-Núñez, Vargas-Quesada, de
Moya-Anegón, and Glänzel (2011) used reference analysis to reclassify the SCImago Journal
and Country Rank (SJR) journals into 27 areas and 308 subject categories. Some classifications
used a hybrid method combining text and citations to cluster journals (Janssens, Zhang,
De Moor, & Glänzel, 2009). Chen (2008) used WoS as a starting point for developing a
classification using an affinity propagation method on a journal-to-journal citation network. The
University of California San Diego (UCSD) classification was developed in mapping of
science efforts (Börner et al., 2012).
Journal-level classification suffers from a number of problems, many of which have been
pointed out previously. For example, Klavans and Boyack (2017) found journal-based taxonomies
of science to be less accurate than topic-based ones and therefore argued against
their use. Similar findings were reported in a recent study that directly compared
journal- and article-level classifications (Shu et al., 2019), reporting that journal-level classifications
have the potential to misclassify almost half of the papers. The issues with accuracy
might be tied to the increase both in the number of journals that publish papers from multiple
research areas and in the number of papers published in those journals, making journal-level
classifications problematic (Gómez et al., 1996; Wang & Waltman, 2016). Although journal-level
classifications underperform compared to article-level classifications in microlevel analyses,
they might still be useful for (nonevaluative) macrolevel analysis (Leydesdorff & Rafols,
2009; Rafols & Leydesdorff, 2009).
The use of journals as an appropriate level for classification has been problematized even
for journals with a unique, nonmultidisciplinary classification in WoS, given that a journal may
publish articles from different disciplines and would not be the right unit to capture interdisciplinary
activities (Abramo, D’Angelo, & Zhang, 2018; Klavans & Boyack, 2010). Boyack and
Klavans (2011, p. 123) suggest that “few journals are truly disciplinary.” In their study of research
specialties, Small and Griffith (1974) found journals to be too broad a unit of analysis
and called for the use of publications instead. The mounting body of research pointing to the
drawbacks of journal-based classifications has prompted the development of article-level classifications.
These efforts are usually accompanied by the development of new classification
schemes, and are often called algorithmic classifications, due to the clustering techniques used
to come up with classes and categories (Ding, Ahlgren, Yang, & Yue, 2018). Klavans and Boyack
(2010) pioneered these classifications at large scale using cocitation techniques (bibliographic
coupling of references and keywords) at the paper level to develop the SciTech
Strategies (STS) schema consisting of 554 topics, and an alternative method based on cocitation
analysis of highly cited references to identify over 84,000 paradigms. Further advances in these
techniques were made by Waltman and van Eck (2012), who used direct citations with the minimum
number of publications per cluster and a resolution parameter to come up with a three-level
classification. Their work has been further advanced by creating a number of algorithmic
classifications at different levels of granularity (Ruiz-Castillo & Waltman, 2015) and by searching for
the optimal resolution parameter for the level of topics (Sjögårde & Ahlgren, 2018). In addition,
because these methods are based on clustering algorithms, and it has long been argued that the
resulting classifications are not algorithm-neutral (Leydesdorff, 1987), some studies addressed
how different algorithms affect resulting classifications (Šubelj, van Eck, & Waltman, 2016).
Overall, the article-based classifications have been praised for being able to classify papers regardless
of the type of journals they were published in and for placing each publication into a single
class/category. One of the drawbacks of the paper-level classification is the problem of naming
the classes/categories (Perianes-Rodriguez & Ruiz-Castillo, 2017), making these classifications
problematic for macrolevel analysis (Ding et al., 2018).
The usefulness of classification schemes for science studies and research evaluation is not
determined only by their quality, but also by the availability of a classification of scientific literature
at all levels of analysis (from micro to macro), flexibility for different purposes, and the
simplicity of interpretation and reproduction. Although it is clear that journal-level classifications
in general, and WoS journal-level classification in particular, have a number of shortcomings,
they are still widely used, primarily because of their wide availability and the
familiarity of audiences with WoS subject categories. An article-level classification that would
still use the familiar WoS subject categories would be a welcome and practical solution to
some of the problems of journal-level classification, but no such classification currently exists.
The purpose of this work is to fill this gap by presenting a flexible, simple, and easily reproducible
method to reclassify WoS items using existing WoS categories, but at the article level.
Such a classification is particularly useful for “descriptive bibliometrics” (Borgman & Furner,
2002) or “science of science” (Fortunato et al., 2018) research, especially when
comparison across all fields and over long time periods is needed.
Beyond operating at the journal level, WoS classification has two additional practical problems
that will be addressed in the proposed reclassification. One problem is related
to different levels of specialization of journals (Glänzel, Schubert, & Czerwon, 1999). The
scope of journals ranges from highly specialized ones, via those that cover a whole range
of subfields within a field or a discipline (e.g., general journals in physics or chemistry), to
journals covering multiple disciplines or fields (Narin, 1976). In WoS subject categories,
journals that cover entire large disciplines (broader than typical subject categories) are classified
as “multidisciplinary” (e.g., “Physics, multidisciplinary” includes journals whose individual
articles actually belong to specific subject categories, such as “Physics, nuclear”;
“Optics”; and “Thermodynamics”). In addition, there are journals such as Nature, Science,
and PNAS that cover many disciplines and are classified in WoS as “Multidisciplinary
Sciences.” Such journals rarely carry truly multidisciplinary articles but rather articles from a
large number of disciplines (Katz & Hicks, 1995; Waltman & van Eck, 2012). Altogether, 10%
of WoS items belong to nine explicit “multidisciplinary” categories. Without the means to establish
their true subject category, these articles are often excluded from the analyses of disciplinary
practices, thus removing what are often articles with high impact (Fang, 2015). As a
solution to this problem, a number of researchers have suggested reclassification of individual
articles in such journals, especially in the subject category “Multidisciplinary Sciences.” Many of
the proposed solutions are based on the references of the articles (e.g., Glänzel & Schubert,
2003; Glänzel, Schubert, & Czerwon, 1999; Glänzel, Schubert, Schoepflin, & Czerwon,
1999; López-Illescas et al., 2009). A more recent solution to this problem utilized both citing
and cited publications as a basis for reclassification (Ding et al., 2018). Our article-level reclassification
of WoS classifies articles from such multidisciplinary journals into other more specific
WoS subject categories.
The second problem of WoS classification is the lack of exclusivity (Bornmann, 2014;
Herranz & Ruiz-Castillo, 2012a, b). Namely, many journals in WoS (containing, by our estimate,
40% of all items in WoS) are assigned more than one subject category. This is in agreement
with other studies, such as Herranz and Ruiz-Castillo (2012a), who reported that 42% of 3.6 million
articles published in 1998–2002 were assigned to more than one category, and Wang and
Waltman (2016), who reported that almost 60% of journals in WoS are assigned a single category.
Multiple subject categories lead to ambiguities when it comes to the analysis. Should such articles
be counted in each category, artificially increasing their weight in the overall analysis? Should they
be counted fractionally, thus decreasing their weight within a single category? How should they be treated
when a nonoverlapping delineation is desired, as is often the case? Most journals are assigned multiple
categories because they cover more than one subject, even though articles in them usually
deal predominantly with one subject. Less often the articles, and not just the journal, are indeed
positioned at the intersection of several subjects, and multiple subjects may be appropriate. In
such cases we may still wish to assign a primary single category to arrive at a nonoverlapping
delineation of scientific literature. As in the case of “multidisciplinary” categories, references
have been proposed for the classification of journals (and articles) with multiple WoS categories
into unique categories (e.g., Glänzel & Schubert, 2003; Glänzel, Schubert, & Czerwon, 1999;
Narin, 1976; Narin et al., 1976). Our article-level reclassification will assign the most prevalent
subject category as the single category for each article and remove the ambiguity. Information
regarding potential multidisciplinarity at the level of the article will nevertheless be retained if required
for the analysis.
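The counting dilemma raised above can be made concrete with a minimal sketch. The function name `count_by_category` and the two toy articles are hypothetical, introduced only for illustration; they are not part of WoS or of the method proposed in this paper.

```python
from collections import Counter


def count_by_category(items, fractional=False):
    """Tally articles per subject category.

    items: iterable of category lists, one list per article (hypothetical input).
    fractional=False counts an article once in every category it carries;
    fractional=True splits one unit of weight evenly across its categories.
    """
    totals = Counter()
    for categories in items:
        if not categories:
            continue  # an article without any category contributes nothing
        weight = 1.0 / len(categories) if fractional else 1.0
        for category in categories:
            totals[category] += weight
    return dict(totals)


# Two toy articles: one with a single category, one with two.
articles = [["Optics"], ["Optics", "Physics, nuclear"]]
whole = count_by_category(articles)
split = count_by_category(articles, fractional=True)
```

With whole counting the two articles add up to three category counts, whereas fractional counting preserves the total of two but gives the second article only half weight in each of its categories. Assigning a single category per article avoids both distortions.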
Finally, many of the large-scale studies, especially those that are comparative in nature,
require a smaller number of broader classes. To achieve this goal, we additionally categorize
articles into 14 broad areas, based on the NSF WebCASPAR classification (Javitz et al., 2010).
2. PROPOSED APPROACH
In this paper we propose a reference-based (re)classification system that can easily be applied
at various levels of granularity. The approach is relatively straightforward and allows for easy
reproducibility. Also, by using existing WoS subject categories as units of classification, the
approach obviates the need to develop an independent scheme for defining and naming the
classes/categories.
Following previous efforts, our approach is to use each item’s references to infer the topic of
a bibliographic item. However, given the problems identified above, we initially use only references
that were published in journals that have a single subject category that is not “multidisciplinary”
(i.e., references that were not published in multidisciplinary or general disciplinary journals). Such
an approach appears appropriate given that previous studies have found WoS subject categories
to be a fairly precise description of subjects of individual articles published in journals described
with one or two subject categories (Glänzel, Schubert, & Czerwon, 1999; Glänzel
et al., 1999) and that central journals within particular disciplines “exhibit little cross citing”
(Narin, 1976, p. 194). For the purposes of this paper, we refer to such items as classifier references
or classifiers. The tallying of the subject categories of classifier references allows us to
determine the unique WoS subject category of items that originally had multiple categories or
were placed in multidisciplinary categories. However, what is novel in our approach is that
the method is applied to reclassify all items that contain classifier references, whether they had
a unique original (journal-based) classification or not, in order to obtain a consistent, comprehensive
classification at the level of individual items (i.e., articles). Also, unlike a number of
other approaches, this one does not apply a particular threshold that an item should meet in
order to be classified into a particular category (e.g., Fang, 2015; Glänzel, Schubert, &
Czerwon, 1999; Gómez-Núñez et al., 2011; López-Illescas et al., 2009), giving every item a
definitive category.
The proposed approach allows both for the classification into exclusive classes (where each
article is placed into a single class) and, if needed for particular research questions, for the construction
of a detailed vector description of the disciplinary composition of articles (and consequently,
of journals, authors, etc.), which will be described in a future work.
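Because the vector description is deferred to future work, its exact form is not given here. As a purely illustrative sketch, one plausible form is the share of an item's classifier references falling in each subject category; the function name `composition_vector` and the normalization to shares are assumptions, not the author's definition.

```python
from collections import Counter


def composition_vector(classifier_categories):
    """Return each category's share of an item's classifier references.

    classifier_categories: one category label per classifier reference
    (hypothetical input; normalizing to shares is an assumption).
    """
    counts = Counter(classifier_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Toy item citing three Optics classifiers and one Economics classifier.
vector = composition_vector(["Optics", "Optics", "Economics", "Optics"])
# The exclusive class is then simply the largest component of the vector.
exclusive_class = max(vector, key=vector.get)
```

Under this assumed form, the exclusive class and the vector come from the same tally, so retaining the vector costs nothing extra when the single category is computed.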
In the remainder of the paper we describe the data, methodology, and evaluation of the
proposed approach using WoS. The approach itself is rather general, and a similar methodology
can be used either to reclassify articles in WoS using a different starting classification of
core journals or to classify articles in other databases that use journal-level classifications.
We present the results of the classification of individual items both at the level of subject categories
and at an aggregated level of broad research areas. New classifications are evaluated
using an automated method and validated using blind manual classification.
3. DATA AND METHODOLOGY
3.1. Initial Reclassification
For (re)classification we use the full WoS Core Collection database, containing items published
from 1900 through the end of 2017. The database contains 69 million items (bibliographic
entries), of which 55 million have at least one reference recorded in the database.
WoS items belong to different document types: articles, proceedings papers, editorials, letters,
reviews, etc. We perform the classification on (and using) all document types but carry
out the evaluation and validation on document types article and proceedings paper, the items
containing original research and most often used in analyses. There are 45 million items of
these two types in WoS with at least one reference, and we refer to them collectively as just
the “articles.” The edition of WoS used in this work uses 252 subject categories. Classification
was extracted from the SQL table subjects using the subject category collection referred to
by field ascatype as the “traditional” classification. Categories are listed in Table A.1 in the
Appendix.
For higher level classification, we place each of the 252 subject categories into 14 broad areas.
Names of broad areas are taken from NSF WebCASPAR Broad Field (Javitz et al., 2010), except
that we include their “Other life sciences” within “Medical sciences.” The mapping between
WoS subject categories and our broad areas, given in Table A.1, follows the Javitz et al. (2010)
mapping between the ipIQ Fine Field category (formerly CHI category) and WebCASPAR
Broad Field whenever there is an ipIQ category that clearly matches a WoS category. In other
instances (half of all WoS categories) the broad category is determined by the author.
WoS attempts to match each item’s references to other items in WoS. Only items that
have matched references can be reclassified using the proposed method. In addition,
to allow initial classification using our method, the references need to be classifiers (i.e., items
whose original classification is unique and not multidisciplinary). Forty-one million items
contain classifier references and can therefore be classified into subject categories, of which
36 million are articles, representing 79% of all articles with references. We will outline later in
this section how this percentage can be further increased using an iterative approach.
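The definition of a classifier can be sketched in a few lines. The function names, the `original` dictionary, and the item identifiers below are hypothetical, and recognizing multidisciplinary categories by their name is an assumption made for illustration; the paper identifies them from the nine explicit WoS categories.

```python
def is_multidisciplinary(category):
    # Assumption: multidisciplinary categories are recognizable by name.
    return "multidisciplinary" in category.lower()


def is_classifier(original_categories):
    """True if an item has exactly one original category that is not multidisciplinary."""
    return (len(original_categories) == 1
            and not is_multidisciplinary(original_categories[0]))


def classifier_references(matched_reference_ids, original):
    """Keep only the matched references that qualify as classifiers.

    original: dict mapping item id -> list of journal-based categories.
    References absent from the dict are treated as unmatched.
    """
    return [ref for ref in matched_reference_ids
            if is_classifier(original.get(ref, []))]


original = {
    "r1": ["Optics"],
    "r2": ["Physics, Multidisciplinary"],
    "r3": ["Optics", "Thermodynamics"],
    "r4": ["Multidisciplinary Sciences"],
}
usable = classifier_references(["r1", "r2", "r3", "r4", "r5"], original)
```

In this toy case only `r1` survives: `r2` and `r4` are multidisciplinary, `r3` has two categories, and `r5` has no matched record.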
Classification into broader areas is possible for a larger number of items (44 million of any
type, and 38 million articles), because classifiers can include items classified as multidisciplinary
as long as they can be placed in some broad area (e.g., category “Physics, multidisciplinary”
can be used, but “Multidisciplinary Sciences” cannot). The fraction of articles (containing
references) that can be classified, as a function of publication year, is shown in Figure 1. The
fractions are above 90% in recent years and have been relatively high since the 1950s. The rising
trend is likely a combination of several factors: more complete efforts on the part of database
administrators to match the references in recent publications, journal articles becoming “the
central medium for the dissemination and exchange of scientific ideas” (Bowker, 2005, p. 126),
and the overall increase in the number of references per paper over time (Milojević, 2012; Price,
1963; van Raan, 2000), all of which increase the chances of an article containing classifier
references. The items that remain without new classification are rarely full-fledged research
papers but most often items such as book reviews or short conference proceedings.
Figure 1. Percentage of all articles containing references that can be reclassified into subject categories
or broad areas as a function of article publication year. Numbers are based on initial reclassification.
An iterative pass will increase the percentage of articles classified into subject categories
by 5%.
Figure 2. An algorithm (pseudocode) describing the reclassification procedure.
For classification at the subject category level, 20 million items serve as classifiers. An algorithm
for the entire classification procedure is given in Figure 2. Classification at the level of
subject categories proceeds as follows. For each classifiable item we go through all of its classifier
references and produce a ranked list of their subject categories. A subject category that is
the most frequent is adopted as a new (reclassified) subject category. Most often the distribution
of categories is dominated by the most frequent subject category (the article is predominantly
unidisciplinary). Occasionally, the tallying results in a tie between the two most frequent
categories (13% of cases). We attempt to break the ties by adding to the tally the original subject
category (or categories, if there were multiple). This can be done if the original subject
category is nonmultidisciplinary. In this way, 52% of the ties can be broken. Otherwise, we
adopt as the final classification the category with the larger number of articles.
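The tallying and tie-breaking steps described above can be sketched as follows. This is one reading of the prose, not a transcription of the pseudocode in Figure 2: the names `leaders` and `assign_category`, the `category_sizes` lookup, and the name-based test for multidisciplinary categories are all assumptions.

```python
from collections import Counter


def leaders(tally):
    """Return all categories that share the highest count."""
    top = max(tally.values())
    return [category for category, n in tally.items() if n == top]


def assign_category(classifier_categories, original_categories, category_sizes):
    """Pick one subject category for an item from its classifier references.

    classifier_categories: categories of the item's classifier references.
    original_categories: the item's journal-based categories.
    category_sizes: dict category -> number of articles (last-resort tie-break).
    """
    tally = Counter(classifier_categories)
    if not tally:
        return None  # no classifier references: the item stays unclassified
    tied = leaders(tally)
    if len(tied) > 1:
        # First tie-break: add the original, nonmultidisciplinary categories.
        for category in original_categories:
            if "multidisciplinary" not in category.lower():
                tally[category] += 1
        tied = leaders(tally)
    if len(tied) > 1:
        # Second tie-break: prefer the category with more articles.
        return max(tied, key=lambda c: category_sizes.get(c, 0))
    return tied[0]


sizes = {"Optics": 10, "Economics": 50}  # toy category sizes
clear = assign_category(["Optics", "Optics", "Economics"], ["Economics"], sizes)
broken = assign_category(["Optics", "Economics"], ["Optics"], sizes)
by_size = assign_category(["Optics", "Economics"], ["Multidisciplinary Sciences"], sizes)
```

The three toy calls exercise the three outcomes in turn: a clear majority, a tie broken by the original category, and a tie that falls through to the more populous category because the original category is multidisciplinary.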
The granularity of reclassified subject categories, defined as the number of items divided by
the sum of the items in each category squared (Waltman, Boyack, Colavizza, & van Eck,
2019), is 1.5 × 10⁻⁶, compared to 2.3 × 10⁻⁶ for the original classification (i.e., it is relatively
similar). The number of categories of different sizes (i.e., total number of reclassified items) is
presented in Figure 3. Categories span a wide range of sizes.
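The granularity measure defined above is simple to compute from category sizes alone. The sketch below uses a hypothetical function name and toy sizes; for k equal categories of n items each, the measure reduces to 1/n, so concentrating items in a few large categories lowers the value.

```python
def granularity(category_sizes):
    """Number of items divided by the sum of squared category sizes."""
    total = sum(category_sizes)
    sum_of_squares = sum(n * n for n in category_sizes)
    if sum_of_squares == 0:
        raise ValueError("at least one category must contain items")
    return total / sum_of_squares


equal = granularity([100, 100, 100, 100])  # four categories of 100 items each
skewed = granularity([370, 10, 10, 10])    # same 400 items, one dominant category
```

Here `equal` is 400 / 40,000 and `skewed` is 400 / 137,200, which shows that the same number of items and categories yields a lower granularity when the size distribution is uneven.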
Figure 3. Size distribution of WoS subject categories after initial reclassification.
Classification at the level of broad areas proceeds in the same way, except that the ranked
list is made of classifiers’ broad areas. For classification into broad areas, the number of classifiers
is 50% larger than in the case of subject categories (30 million), because individual subject
categories of items that have multiple subject categories most often belong to the same
broad area, and such items are therefore eligible to serve as classifiers. For the classification of
items into broad areas, ties happen in 4% of all cases, and can be resolved by including the
original broad area in the ranked list in 69% of those cases. Otherwise, we take the more
populous category as the final one.
Overall, the classification is not sensitive to the extent of the classifier set. We performed a
test in which we based the classification on only half of all available classifiers. The resulting
broad categories agree with the ones obtained with the full classifier set in 94% of cases.
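A robustness test of this kind can be sketched as below. The text does not state how the half of the classifier set was chosen or over which items agreement was measured, so random sampling and comparison over items classified in both runs are assumptions, as are the function names and the toy labels.

```python
import random


def half_sample(classifier_ids, seed=0):
    """Draw a random half of the classifier set (seed fixed for repeatability)."""
    ids = sorted(classifier_ids)
    return set(random.Random(seed).sample(ids, len(ids) // 2))


def agreement_rate(full_run, half_run):
    """Share of items classified in both runs that receive the same category."""
    shared = [item for item in full_run if item in half_run]
    if not shared:
        return 0.0
    same = sum(1 for item in shared if full_run[item] == half_run[item])
    return same / len(shared)


# Toy results: item "d" loses all its classifiers in the half run.
full_run = {"a": "Physics", "b": "Chemistry", "c": "Physics", "d": "Mathematics"}
half_run = {"a": "Physics", "b": "Chemistry", "c": "Chemistry"}
rate = agreement_rate(full_run, half_run)
subset = half_sample(range(10))
```

In the toy data two of the three shared items keep their category, giving an agreement of two thirds; the reported 94% would correspond to the same ratio computed over the full data set.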
The exact counts pertaining to the data set and initial reclassification are provided in
Tables 1 and 2.
3.2. Iterative Reclassification
Once the reclassification has been carried out, it is possible and often recommended to carry
out the process of reclassification iteratively. In iterative reclassification, the tallying of subject
categories of references and the determination of which references can serve as classifiers are
based on the reclassified subject categories (or broad areas, for the high-level classification).
The process can be repeated multiple times, but here we limit ourselves to one iterative pass
and to the quality and extensiveness of this second reclassification compared to the first. The
iterative pass is procedurally similar to the original one, and the needed modifications are laid
out in Figure 2. After the iterative pass 9% of items acquire a different broad-area classification,
and 20% of items acquire a different subject category.
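The iterative pass can be sketched as a second tally that reads reference categories from the previous pass rather than from the journal-based classification. This is a simplified illustration under stated assumptions: only references that received a category in the previous pass are tallied, tie-breaking is omitted, and the names `iterative_pass` and `changed_fraction` and the toy data are hypothetical. The modifications actually used are those laid out in Figure 2.

```python
from collections import Counter


def iterative_pass(references, previous):
    """Reclassify items from the categories their references received previously.

    references: dict item id -> list of cited item ids (hypothetical structure).
    previous: dict item id -> category assigned in the preceding pass.
    Tie-breaking is omitted; on a tie the first-seen category is returned.
    """
    updated = {}
    for item, cited in references.items():
        tally = Counter(previous[ref] for ref in cited if ref in previous)
        if tally:
            updated[item] = tally.most_common(1)[0][0]
    return updated


def changed_fraction(previous, updated):
    """Share of items classified in both passes whose category changed."""
    shared = [item for item in updated if item in previous]
    if not shared:
        return 0.0
    return sum(previous[i] != updated[i] for i in shared) / len(shared)


references = {"a": ["x", "y", "z"], "b": ["a", "x", "z"], "c": []}
first = {"x": "Optics", "y": "Optics", "z": "Economics",
         "a": "Optics", "b": "Economics"}
second = iterative_pass(references, first)
moved = changed_fraction(first, second)
```

In the toy data item `b` cites `a`, which could not count in the first pass but now carries a category, so `b` moves from Economics to Optics. The 9% and 20% figures quoted above are the analogue of `moved` on the full data set.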
Table 1. Number of items from the Web of Science used in (re)classification

                             All types     Articles + conference proceedings
All items                    69,326,147    49,775,351
with references              54,581,163    45,219,572
multidisciplinary             5,585,211     4,640,854
multidisciplinary science     1,317,033     1,071,437

There are two principal reasons for carrying out the iterative pass: an increase in the number
of items that can be classified, and, potentially, an increased accuracy of new categories.
In the original pass only items that had classifier references could be classified, which, as we
have shown, represents 79% of all articles, and around 90% of recent articles. Items that
only had references with multiple original categories and/or multidisciplinary categories could
not be classified. However, after the first reclassification, most of these references will receive
a unique, nonmultidisciplinary classification and can now serve as classifiers. The numbers of
items and articles that can be classified in the iterative pass are presented in Table 3.
Comparing these numbers to those in Table 2 we see a relatively significant increase in the
number of items or articles that get classified into subject categories (∼8%) and a more modest
increase of items/articles classified into broad areas (∼2%).

Table 2. Number of classified items of different types after initial reclassification. Percentage in parentheses is with respect to all such items
with references

                             Subject category classification                          Broad area classification
                             All types           Articles + conference proceedings    All types           Articles + conference proceedings
Classifier items             20,286,801                                               29,853,395
Classified items             41,132,197 (75%)    35,940,588 (79%)                     43,847,374 (80%)    38,118,382 (84%)
multidisciplinary            3,719,208 (67%)     2,599,373 (56%)
multidisciplinary science    896,169 (68%)       740,592 (69%)                        909,543 (69%)       792,875 (74%)
The increase of completeness using the iterative pass is especially significant in the cases
where the majority of the journals in some discipline originally had multiple WoS categories
and were therefore precluded from serving as classifier references. Although such cases are not
common in general, one of them happens to include core journals in quantitative studies of
science. Specifically, Journal of Informetrics (JoI), Scientometrics, and Journal of the Association
for Information Science and Technology (JASIST) are all listed with two WoS subject categories:
“Computer Science, Interdisciplinary Applications” and “Information Science & Library
Science,” which means that they cannot serve as classifiers, at least not in the initial pass. For
example, out of 840 items published in JoI, 663 can be classified in the first pass (79%), a lower
fraction than on average. Interestingly, of the classified items, 41% received the classification of
“Information Science & Library Science,” whereas essentially none were classified as “Computer
Science, Interdisciplinary Applications.” This shows that the reclassification successfully rejected
this obviously inappropriate categorization. In the iterative pass, however, the number of classified
articles increased substantially, to 796 (95% of total). In addition, 52% have now received
the classification of “Information Science & Library Science,” the most of any category.
Other frequent categories included “Economics” (9%), “History and Philosophy of Science”
(8%), and “Sociology” (6%).
Table 3. Number of classified items of different types after the second (iterative) reclassification

| | Subject category classification: all types | Subject category classification: articles + conference proceedings | Broad area classification: all types | Broad area classification: articles + conference proceedings |
| --- | --- | --- | --- | --- |
| Classifier items | 36,104,403 | | 38,504,614 | |
| Classified items | 44,349,678 (81%) | 38,450,585 (85%) | 44,936,331 (82%) | 38,918,386 (86%) |
| multidisciplinary | 4,317,080 (77%) | 2,931,707 (63%) | | |
| multidisciplinary science | 1,011,770 (77%) | 804,203 (75%) | 968,783 (74%) | 822,849 (77%) |
To conclude, extending the classification to include the iterative pass increases the number of classified items (especially at the subject category level), which in certain cases can be quite significant.
4. VALIDATION AND EVALUATION
The validation and evaluation of the approach and of the final reclassification are performed
using three tests, each serving a separate purpose:
1. Automatic internal test against the original WoS classification, in order to validate the
methodology.
2. Manual tests in order to evaluate the accuracy of reclassification in comparison to the
original WoS classification.
3. Manual external test in order to evaluate the overall reliability of the resulting classification.
4.1. Validation
To validate the methodology and hone the approach, we have performed an automatic test by calculating the percentage of articles whose original and new classifications agree. This test can only be performed on items whose original classification was unique and nonmultidisciplinary. This test is internal, because we do not evaluate the accuracy of the original WoS classification using any external knowledge. We do not expect the test to produce 100% agreement. First, the reclassification is at the level of articles, whose topics may differ to some extent from those of their journals. Second, the subject categories are rarely entirely mutually exclusive, so a reclassified category may be related to, but not exactly the same as, the original one. The value of this test is in the relative assessment. When evaluating, for example, two article-level classification schemes, the one that has a higher level of agreement with the reference classification, however imperfect (in this case the original classification), should be considered more accurate internally. For the reclassification at the level of subject categories we find the overall agreement to be 66% after the initial reclassification and 58% after the iterative pass. In comparison, an alternative classification scheme that we devised but ultimately did not adopt, which uses the similarity of titles to perform reclassification, had an agreement of <50%. For this alternative method we calculated TF-IDF (“term frequency-inverse document frequency”) values between each article title to be reclassified and each of the classifier articles (articles that have a unique nonmultidisciplinary WoS category). In this case, IDF actually represents inverse title word frequency, which was first determined from the entire data set, and TF-IDF is the sum of all IDFs of the words that overlap. For an article to be classified we adopt the category of the classifier article with the greatest TF-IDF value.
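The title-based alternative can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact IDF formula or tokenization, so the sketch assumes IDF = log(N / number of titles containing the word) and simple lowercase whitespace splitting. The function names and the toy titles are hypothetical.

```python
import math
from collections import Counter


def tokenize(title):
    """Lowercase whitespace tokenization (an assumption; the paper does not specify it)."""
    return set(title.lower().split())


def build_idf(titles):
    """Inverse title word frequency over the whole data set, assumed as log(N / df)."""
    n = len(titles)
    df = Counter(word for title in titles for word in tokenize(title))
    return {word: math.log(n / count) for word, count in df.items()}


def classify_by_title(title, classifiers, idf):
    """Return the category of the classifier title with the largest summed IDF overlap.

    classifiers: list of (title, category) for items with a unique category.
    Returns None when no classifier title shares a scored word with `title`.
    """
    words = tokenize(title)
    best_score, best_category = 0.0, None
    for classifier_title, category in classifiers:
        overlap = words & tokenize(classifier_title)
        score = sum(idf.get(word, 0.0) for word in overlap)
        if score > best_score:
            best_score, best_category = score, category
    return best_category


# Hypothetical toy data, for illustration only.
classifiers = [
    ("Galaxy formation in the early universe", "Astronomy & Astrophysics"),
    ("Protein folding in the cell", "Cell Biology"),
    ("Citation analysis of the journal literature",
     "Information Science & Library Science"),
]
target = "Star formation in dwarf galaxy clusters"
idf = build_idf([t for t, _ in classifiers] + [target])
result = classify_by_title(target, classifiers, idf)
```

Common words such as "in" contribute little to the score because they occur in many titles, whereas rarer shared words such as "galaxy" dominate the match.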
The level of agreement varies from one subject category to another. It is highest for astronomy and astrophysics (97%). The number of articles in different categories varies widely, with the largest category being 2,000 times larger than the smallest (see Figure 3). We find that the agreement is correlated with the size of the subject category, with larger categories having a higher level of agreement. This is probably because some of the smaller categories can also be considered subcategories of larger ones, so many of the articles get reclassified into these larger categories. The opposite (an item that was originally in a larger category being reclassified into a smaller one) is less likely simply because there are fewer classifiers that belong to smaller categories. Furthermore, small categories may represent more recent disciplines, which would naturally cite works from the disciplines from which they emerged. As we will see shortly, this lower level of agreement for smaller categories does not imply that the new
category is incorrect—it may simply be placing individual items in a related, equally correct,
subject category or may reflect a high degree of interdisciplinarity of an article.
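The automatic agreement test amounts to a simple tally, sketched below in Python. The function name, the data layout (dictionaries keyed by item identifier), and the toy values are assumptions for illustration; only items whose original classification was unique and nonmultidisciplinary would be passed in as `original`.

```python
from collections import defaultdict


def agreement_by_category(original, new):
    """Compare original and new categories for items that have both.

    original, new: dict item_id -> category.
    Returns (overall agreement, dict original category -> agreement fraction).
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for item, original_category in original.items():
        if item not in new:
            continue  # item was not reclassified, so it cannot be compared
        totals[original_category] += 1
        if new[item] == original_category:
            matches[original_category] += 1
    per_category = {c: matches[c] / totals[c] for c in totals}
    compared = sum(totals.values())
    overall = sum(matches.values()) / compared if compared else 0.0
    return overall, per_category


# Toy data: category A is larger than category B.
original = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
new = {1: "A", 2: "A", 3: "B", 4: "A", 5: "B"}
overall, per_category = agreement_by_category(original, new)
```

Grouping by the original category makes it possible to relate the agreement to category size, as discussed above.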
We perform a similar automatic validation for broad-area classification and find an overall agreement of 85% after the initial reclassification and 82% after the iterative pass. Agreement in different areas is now more similar, ranging from 60% for agricultural sciences (which tends to be highly interdisciplinary) to 93% for astronomy and astrophysics (which has a low degree of interdisciplinarity), and the level of agreement is not correlated with the size of the area.
4.2. Manual Evaluation of Accuracy
The internal validation in itself does not allow us to evaluate the quality of the reclassification with respect to the original classification. We assess this by manual evaluation, performed by the author, in the following way. For 142 randomly selected articles whose original classification was unique and nonmultidisciplinary, we output the original and new subject categories. The order in which the two categories are written out is randomly reversed in 50% of cases. The evaluator does not know a priori which category is original and which is new; this information is saved separately and is used only after the evaluation has been performed. The evaluator’s task is to select the subject category that better describes the article based on its title (and abstract, if necessary), but ignoring the name of the journal, so as not to bias the assessment, because the journal topic was the basis for the original classification. If both categories are estimated to be equally appropriate, this is also indicated. After the initial reclassification, 91 out of 142 articles had the same new and old category (64%; in agreement with the full sample). For 25 articles, the old and new categories were equally good (most often because one category can be considered a part of another). Of the remaining 26 articles, the original classification was considered better in 15 cases and the new one in 11 cases. Of the 15 cases in which the original classification was considered better, the new one was still essentially correct in 13. Altogether, the initial reclassification is nearly as good as the original one (i.e., we have not introduced spurious results in the process of reclassification). The differences between the original and new classifications revealed by automated validation can be attributed to articles’ interdisciplinarity (such that both categories are correct) and to the somewhat stratified, nonexclusive nature of WoS subject categories (again making both categories correct).
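The blinding step of this protocol can be sketched in Python as follows. The sketch is illustrative only: it assumes an independent coin flip per article (so the reversal rate is 50% in expectation rather than exactly), and the function names, the `seed` parameter, and the toy pairs are hypothetical.

```python
import random


def make_blind_sheet(pairs, seed=0):
    """Build an evaluation sheet with the two categories in random order.

    pairs: list of (item_id, original_category, new_category).
    Returns (sheet, key): sheet rows are (item_id, first, second); key records,
    separately from the sheet, which position holds the original category.
    """
    rng = random.Random(seed)
    sheet, key = [], {}
    for item_id, original, new in pairs:
        swap = rng.random() < 0.5  # assumption: independent coin flip per item
        first, second = (new, original) if swap else (original, new)
        sheet.append((item_id, first, second))
        key[item_id] = "second" if swap else "first"
    return sheet, key


def tally(judgments, key):
    """Unblind the evaluator's choices ('first', 'second', or 'equal')."""
    counts = {"original": 0, "new": 0, "equal": 0}
    for item_id, choice in judgments.items():
        if choice == "equal":
            counts["equal"] += 1
        elif choice == key[item_id]:
            counts["original"] += 1
        else:
            counts["new"] += 1
    return counts


# Hypothetical toy pairs, for illustration only.
pairs = [
    ("a1", "Ecology", "Zoology"),
    ("a2", "Optics", "Physics, Applied"),
    ("a3", "Economics", "Business, Finance"),
]
sheet, key = make_blind_sheet(pairs)
```

Keeping `key` apart from `sheet` mirrors the requirement that the evaluator learns which category was original only after all judgments have been recorded.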
Manual evaluation is also carried out for the same 142 articles for their broad-area classifications. The areas agree for 124 articles (87%; in agreement with the full sample) and are considered equally good in four cases. Of the remaining 14 articles, the original classification is considered better in only three cases, and the new area is considered more accurate in the remaining 11 cases (i.e., the new classification is overall somewhat better).
4.3. Manual Evaluation of Reliability
The overall reliability of the new classification is what is ultimately of most interest. We test it
based on an external assessment, which looks at all items irrespective of how the items were
originally classified (i.e., it includes items that originally had ambiguous classification or where
the classification was effectively missing because the item was published in a multidisciplinary
journal). The test is performed by the author by evaluating the correctness of subject categories
and broad areas of 100 randomly selected items, based on their titles and abstracts. We find
92% of subject categories and 95% of broad areas to be correct after the initial reclassification.
The accuracy increased to 95% for subject categories and 97% for broad areas after the
iterative pass. Note that although the error rate is relatively small across the entire data set, it need not be uniform across disciplines or journals, so it is advisable to perform similar manual tests for subsets of a data set that one wishes to study.
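When repeating such a manual test on a subset, it may help to attach an uncertainty to the observed accuracy, because a sample of about 100 items constrains the error rate only approximately. The sketch below uses the Wilson score interval, which is our choice for illustration and was not part of the evaluation reported here; the function name is hypothetical.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95% coverage)."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# 95 correct subject categories out of 100 sampled items.
low, high = wilson_interval(95, 100)
```

For 95 correct items out of 100, this gives an interval of roughly 0.89 to 0.98, which suggests that subset-level tests with similar sample sizes should be read as approximate.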
5. EXTENSION OF THE METHOD USING CITATION DATA
It is in principle possible to adapt our method to use not only the references as the basis for reclassification but also the citations. Citations, at least in the initial reclassification, would also have to come from sources that have a unique, nonmultidisciplinary WoS category. The use of citations may allow some items to be classified that otherwise did not have classifier references. We carry out such reclassification at broad-area level and find that the number of classified items increases from 43,847,374 (63% of all possible items, regardless of whether they had references or not) to 47,593,363 (69%). The increase exceeds that from the iterative pass (44,936,331 or 65%). The fraction is still short of 100% because most of the items that lack references also lack citations (most of them are not really citable items). One possible drawback of using citations is the disproportionality of information available for different items. Unlike references, the number of which tends to be normally distributed, the citations follow a power-law distribution, with most articles having few citations and few having thousands. Furthermore, citations constantly change, making the proposed procedure essentially nonreproducible.
There are 6% of articles with no linked references or citations. These are mostly items more
than half a century old. For these items, one could apply the TF-IDF method that we discussed
in Section 4, which has 100% completeness.
6. DISCUSSION AND CONCLUSION
This paper proposes a method of classification that is based on references and applies it to classifying WoS articles, both at the field and broad research area levels. Although some of the proposed clustering-based methods may lead to a better delineation, especially for citation normalization, the proposed method has a number of advantages: it is easily replicated and utilizes widely used WoS subject categories and NSF broad subject areas, does not require extensive computational resources (∼40 million articles can be classified on a personal computer within several hours), and avoids the problem of naming classes/categories (something that article-level classifications have struggled with but are making progress on due to more sophisticated natural language processing approaches and including a wider range of fields of bibliographic records). The major purpose for this classification is devising a flexible and simple way of classifying all of the WoS literature for the purposes of “descriptive bibliometrics” or “science of science” studies. The classification has not been designed for the purposes of research evaluation, and if used in that context, may be outperformed by approaches that identify more focused comparison sets, as in Colliander and Ahlgren (2019), for example.
The major limitations of the proposed method are tied to its usage of WoS subject categories as a starting point and references as a major source of data. Because it uses WoS subject categories as seeds, the proposed classification will inherit some of the known problems of this classification, primarily having to do with erroneous lumping of unconnected journals into a single category. This limitation can potentially be alleviated by the iterative procedure. Furthermore, because the method is based on references, it can be applied only to the items that have references. This should not be a problem with most contemporary original research but may prove problematic for other types of contributions and for older items. At the same time, relying on references rather than citations, as in some other studies, has some advantages,
since more articles have cited other works than are cited themselves. This should lead to a
higher recall than citation-based classifications have. An approach that combines references
and citations is also possible and was described.
Overall, we find the error rate of the resulting classification to be relatively low (<5%), making it a reasonably reliable basis for a wide range of studies. However, the accuracy may be higher or lower for specific research areas, so, as with any classification, users should exercise caution and validate the classification for the sample of interest. Also, as we have pointed out, especially at the level of 252 subject categories, it is often the case that more than one category is essentially correct, so it is advisable to consider all potentially relevant categories when the recall of a sample is important. This is less of an issue for broad areas.
ACKNOWLEDGMENTS
This work uses Web of Science data by Clarivate Analytics provided by the Indiana University
Network Science Institute and the Cyberinfrastructure for Network Science Center at Indiana
University.
AUTHOR CONTRIBUTIONS
Staša Milojević: conceptualization, data curation, formal analysis, methodology, writing.
COMPETING INTERESTS
No competing interests to declare.
FUNDING INFORMATION
This material is partially based upon work supported by the Air Force Office of Scientific
Research under award number FA9550-19-1-0391.
DATA AVAILABILITY
The data used in this paper is proprietary and cannot be posted in a repository.
REFERENCES
Abramo, G., D’Angelo, C. A., & Zhang, L. (2018). A comparison of
two approaches for measuring interdisciplinary research output:
The disciplinary diversity of authors vs the disciplinary diversity
of the reference list. Journal of Informetrics, 12(4), 1182–1193.
Archambault, É., Beauchesne, O. H., & Caruso, J. (2011). Towards
a multilingual, comprehensive and open scientific journal ontology.
Paper presented at the 13th International Conference of the
International Society for Scientometrics and Informetrics,
Durban, South Africa.
Borgman, C. L., & Furner, J. (2002). Scholarly communication and
bibliometrics. In B. Cronin (Ed.), Annual Review of Information
Science and Technology (Vol. 36, pp. 3–72). Medford, NJ:
Information Today.
Börner, K., Klavans, R., Patek, M., Zoss, A. M., Biberstine, J. R., Light, R.
P., …, Boyack, K. W. (2012). Design and update of a classification
system: The UCSD map of science. PLOS One, 7(7), e39464.
Bornmann, L. (2014). Assigning publications to multiple subject
categories for bibliometric analysis: An empirical case study
based on percentiles. Journal of Documentation, 70(1), 52–61.
Bowker, G. C. (2005). Memory practices in the sciences. Cambridge,
MA: MIT Press.
Boyack, K. W., & Klavans, R. (2011). Multiple dimensions of journal
specificity: Why journals can’t be assigned to disciplines. Paper
presented at the 13th Conference of the International Society
for Scientometrics and Informetrics, Durban, South Africa.
Bryant, R. (2000). Discovery and decision: Exploring the metaphysics
and epistemology of scientific classification. London: Associated
University Presses.
Carpenter, M. P., & Narin, F. (1973). Clustering of scientific journals.
Journal of the American Society for Information Science,
24(6), 425–436.
Chen, C. M. (2008). Classification of scientific networks using aggregated
journal-journal citation relations in the Journal Citation
Reports. Journal of the American Society for Information Science
and Technology, 59(14), 2296–2304.
Colliander, C., & Ahlgren, P. (2019). Comparison of publication level
approaches to ex post citation normalization. Scientometrics,
120(1), 283–300.
Ding, J., Ahlgren, P., Yang, L., & Yue, T. (2018). Disciplinary structures
in Nature, Science and PNAS: Journal and country levels.
Scientometrics, 116(3), 1817–1852.
Dolby, R. G. A. (1979). Classification of the sciences: The nineteenth
century tradition. In R. F. Ellen & D. Reason (Eds.),
Classifications in Their Social Context (pp. 167–193). London:
Academic Press.
Durkheim, E., & Mauss, M. (1963). Primitive classification. Chicago:
University of Chicago Press.
Fang, H. (2015). Classifying research articles in multidisciplinary science
journals into subject categories. Knowledge Organization,
42(3), 139–153.
Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D.,
Milojević, S., …, Barabási, A.-L. (2018). Science of science.
Science, 359(6379), eaao0185.
Glänzel, W., & Schubert, A. (2003). A new classification scheme of
science fields and subfields designed for scientometric evaluation
purposes. Scientometrics, 56(3), 357–367.
Glänzel, W., Schubert, A., & Czerwon, H. J. (1999). An item-by-item subject
classification of papers published in multidisciplinary and general
journals using reference analysis. Scientometrics, 44(3), 427–439.
Glänzel, W., Schubert, A., Schoepflin, U., & Czerwon, H. J. (1999).
An item-by-item subject classification of papers published in
journals covered by the SSCI database using reference analysis.
Scientometrics, 46(3), 431–441.
Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Same data—different
results? Towards a comparative approach to the identification of
thematic structures in science. Scientometrics, 111(2), 981–998.
Gómez-Núñez, A. J., Vargas-Quesada, B., de Moya-Anegón, F., &
Glänzel, W. (2011). Improving SCImago Journal & Country Rank (SJR)
subject classification through reference analysis. Scientometrics,
89(3), 741–758.
Gómez, I., Bordons, M., Fernandez, M., & Méndez, A. (1996). Coping
with the problem of subject classification diversity. Scientometrics,
35(2), 223–235.
Haunschild, R., Schier, H., Marx, W., & Bornmann, L. (2018).
Algorithmically generated subject categories based on citation
relations: An empirical micro study using papers on overall water
splitting. Journal of Informetrics, 12(2), 436–447.
Herranz, N., & Ruiz-Castillo, J. (2012a). Multiplicative and fractional
strategies when journals are assigned to several subfields.
Journal of the American Society for Information Science and
Technology, 63(11), 2195–2205.
Herranz, N., & Ruiz-Castillo, J. (2012b). Sub-field normalization in
the multiplicative case: High- and low-impact citation indicators.
Research Evaluation, 21(2), 113–125.
Janssens, F., Zhang, L., De Moor, B., & Glänzel, W. (2009). Hybrid
clustering for validation and improvement of subject-classification
schemes. Information Processing & Management, 45(6), 683–702.
Javitz, H., Grimes, T., Hill, D., Rapoport, A., Bell, R., Fecso, R., &
Lehming, R. (2010). U.S. Academic Scientific Publishing. Working
paper SRS 11-201. Arlington, VA: National Science Foundation,
Division of Science Resources Statistics.
Katz, J. S., & Hicks, D. (1995). The classification of interdisciplinary
journals: A new approach. Paper presented
at the Proceedings of the Fifth International Conference of the
International Society for Scientometrics and Informetrics, Rosary
College, River Forest, IL.
Klavans, R., & Boyack, K. W. (2010). Toward an objective, reliable and
accurate method for measuring research leadership. Scientometrics,
82(3), 539–553.
Klavans, R., & Boyack, K. W. (2017). Which type of citation analysis
generates the most accurate taxonomy of scientific and technical
knowledge? Journal of the Association for Information Science
and Technology, 68(4), 984–998.
Leydesdorff, L. (1987). Various methods for the mapping of science.
Scientometrics, 11(5–6), 295–324.
Leydesdorff, L., & Bornmann, L. (2016). The operationalization of
“fields” as WoS subject categories (WCs) in evaluative bibliometrics:
The cases of “library and information science” and
“science & technology studies.” Journal of the Association for
Information Science and Technology, 67(3), 707–714.
Leydesdorff, L., & Rafols, I. (2009). A global map of science based
on the ISI subject categories. Journal of the American Society for
Information Science and Technology, 60(2), 348–362.
López-Illescas, C., Noyons, E. C., Visser, M. S., De Moya-Anegón,
F., & Moed, H. F. (2009). Expansion of scientific journal categories
using reference analysis: How can it be done and does it
make a difference? Scientometrics, 79(3), 473–490.
Milojević, S. (2012). How are academic age, productivity and
collaboration related to citing behavior of researchers? PLOS
One, 7(11), e49176.
Narin, F. (1976). Evaluative bibliometrics: The use of publication
and citation analysis in the evaluation of scientific activity.
Cherry Hill, NJ: Computer Horizons.
Narin, F., Carpenter, M., & Berlt, N. C. (1972). Interrelationships of
scientific journals. Journal of the American Society for Information
Science, 23(5), 323–331.
Narin, F., Pinski, G., & Gee, H. H. (1976). Structure of the biomedical
literature. Journal of the American Society for Information
Science, 27(1), 25–45.
Perianes-Rodriguez, A., & Ruiz-Castillo, J. (2017). A comparison of
the Web of Science and publication-level classification systems
of science. Journal of Informetrics, 11(1), 32–45.
Price, D. J. d. S. (1963). Little science, big science. New York:
Columbia University Press.
Pudovkin, A. I., & Garfield, E. (2002). Algorithmic procedure for finding
semantically related journals. Journal of the American Society
for Information Science and Technology, 53(13), 1113–1119.
Rafols, I., & Leydesdorff, L. (2009). Content-based and algorithmic
classifications of journals: Perspectives on the dynamics of scientific
communication and indexer effects. Journal of the American Society
for Information Science and Technology, 60(9), 1823–1835.
Rinia, E. J., van Leeuwen, T. N., Bruins, E. E. W., van Vuren, H. G.,
& Van Raan, A. F. J. (2001). Citation delay in interdisciplinary
knowledge exchange. Scientometrics, 51(1), 293–309.
Ruiz-Castillo, J., & Waltman, L. (2015). Field-normalized citation
impact indicators using algorithmically constructed classification
systems of science. Journal of Informetrics, 9(1), 102–117.
Shu, F., Julien, C.-A., Zhang, L., Qiu, J., Zhang, J., & Larivière, V.
(2019). Comparing journal and paper level classifications of science.
Journal of Informetrics, 13(1), 202–225.
Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically
constructed publication-level classifications of research publications:
Identification of topics. Journal of Informetrics,
12(1), 133–152.
Small, H., & Griffith, B. C. (1974). The structure of scientific literatures I:
Identifying and graphing specialties. Science Studies, 4(1), 17–40.
Small, H., & Koenig, M. E. D. (1977). Journal clustering using a bibliographic
coupling method. Information Processing & Management,
13(5), 277–288.
Šubelj, L., van Eck, N. J., & Waltman, L. (2016). Clustering scientific
publications based on citation relations: A systematic comparison
of different methods. PLOS One, 11(4), e0154404.
van Raan, A. F. J. (2000). On growth, ageing, and fractal differentiation
of science. Scientometrics, 47(2), 347–362.
Waltman, L., Boyack, K. W., Colavizza, G., & Van Eck, N. J.
(2019). A principled methodology for comparing relatedness
measures for clustering publications. arXiv:1901.06815.
Waltman, L., & van Eck, N. J. (2012). A new methodology for constructing
a publication-level classification system of science.
Journal of the American Society for Information Science and
Technology, 63(12), 2378–2392.
Wang, Q., & Waltman, L. (2016). Large-scale analysis of the accuracy
of the journal classification systems of Web of Science and
Scopus. Journal of Informetrics, 10(2), 347–364.
Zitt, M. (2015). Meso-level retrieval: IR-bibliometrics interplay and
hybrid citation-words methods in scientific fields delineation.
Scientometrics, 102(3), 2223–2245.
APPENDIX
Table A1. The list of WoS subject categories and corresponding broad areas

| WoS subject category | Broad area |
| --- | --- |
| Agriculture, Dairy & Animal Science | Agricultural sciences |
| Agriculture, Multidisciplinary | Agricultural sciences |
| Agronomy | Agricultural sciences |
| Fisheries | Agricultural sciences |
| Food Science & Technology | Agricultural sciences |
| Forestry | Agricultural sciences |
| Green & Sustainable Science & Technology | Agricultural sciences |
| Horticulture | Agricultural sciences |
| Astronomy & Astrophysics | Astronomy |
| Anatomy & Morphology | Biological sciences |
| Biochemical Research Methods | Biological sciences |
| Biochemistry & Molecular Biology | Biological sciences |
| Biodiversity Conservation | Biological sciences |
| Biology | Biological sciences |
| Biophysics | Biological sciences |
| Biotechnology & Applied Microbiology | Biological sciences |
| Cell & Tissue Engineering | Biological sciences |
| Cell Biology | Biological sciences |
| Developmental Biology | Biological sciences |
| Ecology | Biological sciences |
| Entomology | Biological sciences |
| Evolutionary Biology | Biological sciences |
| Genetics & Heredity | Biological sciences |
| Microbiology | Biological sciences |
| Mycology | Biological sciences |
| Nutrition & Dietetics | Biological sciences |
| Ornithology | Biological sciences |
| Paleontology | Biological sciences |
| Parasitology | Biological sciences |
| Physiology | Biological sciences |
| Plant Sciences | Biological sciences |
| Reproductive Biology | Biological sciences |
| Virology | Biological sciences |
| Zoology | Biological sciences |
| Chemistry, Analytical | Chemistry |
| Chemistry, Applied | Chemistry |
| Chemistry, Inorganic & Nuclear | Chemistry |
| Chemistry, Medicinal | Chemistry |
| Chemistry, Multidisciplinary | Chemistry |
| Chemistry, Organic | Chemistry |
| Chemistry, Physical | Chemistry |
| Crystallography | Chemistry |
| Electrochemistry | Chemistry |
| Polymer Science | Chemistry |
| Spectroscopy | Chemistry |
| Computer Science, Artificial Intelligence | Computer sciences |
| Computer Science, Cybernetics | Computer sciences |
| Computer Science, Hardware & Architecture | Computer sciences |
| Computer Science, Information Systems | Computer sciences |
| Computer Science, Interdisciplinary Applications | Computer sciences |
| Computer Science, Software Engineering | Computer sciences |
| Computer Science, Theory & Methods | Computer sciences |
| Medical Informatics | Computer sciences |
| Agricultural Engineering | Engineering |
| Automation & Control Systems | Engineering |
| Construction & Building Technology | Engineering |
| Energy & Fuels | Engineering |
| Engineering, Aerospace | Engineering |
| Engineering, Biomedical | Engineering |
| Engineering, Chemical | Engineering |
| Engineering, Civil | Engineering |
| Engineering, Electrical & Electronic | Engineering |
| Engineering, Environmental | Engineering |
| Engineering, Geological | Engineering |
| Engineering, Industrial | Engineering |
| Engineering, Manufacturing | Engineering |
| Engineering, Marine | Engineering |
| Engineering, Mechanical | Engineering |
| Engineering, Multidisciplinary | Engineering |
| Engineering, Ocean | Engineering |
| Engineering, Petroleum | Engineering |
| Imaging Science & Photographic Technology | Engineering |
| Instruments & Instrumentation | Engineering |
| Materials Science, Biomaterials | Engineering |
| Materials Science, Ceramics | Engineering |
| Materials Science, Characterization & Testing | Engineering |
| Materials Science, Coatings & Films | Engineering |
| Materials Science, Composites | Engineering |
| Materials Science, Multidisciplinary | Engineering |
| Materials Science, Paper & Wood | Engineering |
| Materials Science, Textiles | Engineering |
| Mathematical & Computational Biology | Engineering |
| Medical Laboratory Technology | Engineering |
| Metallurgy & Metallurgical Engineering | Engineering |
| Mining & Mineral Processing | Engineering |
| Nanoscience & Nanotechnology | Engineering |
| Neuroimaging | Engineering |
| Nuclear Science & Technology | Engineering |
| Operations Research & Management Science | Engineering |
| Remote Sensing | Engineering |
| Robotics | Engineering |
| Telecommunications | Engineering |
| Transportation | Engineering |
| Transportation Science & Technology | Engineering |
| Environmental Sciences | Geosciences |
| Environmental Studies | Geosciences |
| Geochemistry & Geophysics | Geosciences |
| Geography, Physical | Geosciences |
| Geology | Geosciences |
| Geosciences, Multidisciplinary | Geosciences |
| Limnology | Geosciences |
| Marine & Freshwater Biology | Geosciences |
| Meteorology & Atmospheric Sciences | Geosciences |
| Mineralogy | Geosciences |
| Oceanography | Geosciences |
| Soil Science | Geosciences |
| Water Resources | Geosciences |
| Archaeology | Humanities |
| Architecture | Humanities |
| Art | Humanities |
| Asian Studies | Humanities |
| Classics | Humanities |
| Cultural Studies | Humanities |
| Dance | Humanities |
| Ethics | Humanities |
| Ethnic Studies | Humanities |
| Film, Radio, Television | Humanities |
| Folklore | Humanities |
| History | Humanities |
| History & Philosophy Of Science | Humanities |
| History Of Social Sciences | Humanities |
| Humanities, Multidisciplinary | Humanities |
| Language & Linguistics | Humanities |
Table A1. (continued)
WoS subject category | Broad area
Literary Reviews | Humanities
Literary Theory & Criticism | Humanities
Literature | Humanities
Literature, African, Australian, Canadian | Humanities
Literature, American | Humanities
Literature, British Isles | Humanities
Literature, German, Dutch, Scandinavian | Humanities
Literature, Romance | Humanities
Literature, Slavic | Humanities
Logic | Humanities
Medical Ethics | Humanities
Medieval & Renaissance Studies | Humanities
Music | Humanities
Philosophy | Humanities
Poetry | Humanities
Religion | Humanities
Theater | Humanities
Women’s Studies | Humanities
Mathematics | Mathematical sciences
Mathematics, Applied | Mathematical sciences
Mathematics, Interdisciplinary Applications | Mathematical sciences
Statistics & Probability | Mathematical sciences
Allergy | Medical sciences
Andrology | Medical sciences
Anesthesiology | Medical sciences
Audiology & Speech-Language Pathology | Medical sciences
Cardiac & Cardiovascular Systems | Medical sciences
Clinical Neurology | Medical sciences
Critical Care Medicine | Medical sciences
Dentistry, Oral Surgery & Medicine | Medical sciences
Dermatology | Medical sciences
Table A1. (continued)
WoS subject category | Broad area
Emergency Medicine | Medical sciences
Endocrinology & Metabolism | Medical sciences
Gastroenterology & Hepatology | Medical sciences
Geriatrics & Gerontology | Medical sciences
Health Policy & Services | Medical sciences
Hematology | Medical sciences
Immunology | Medical sciences
Infectious Diseases | Medical sciences
Integrative & Complementary Medicine | Medical sciences
Medicine, General & Internal | Medical sciences
Medicine, Research & Experimental | Medical sciences
Microscopy | Medical sciences
Neurosciences | Medical sciences
Nursing | Medical sciences
Obstetrics & Gynecology | Medical sciences
Oncology | Medical sciences
Ophthalmology | Medical sciences
Orthopedics | Medical sciences
Otorhinolaryngology | Medical sciences
Pathology | Medical sciences
Pediatrics | Medical sciences
Peripheral Vascular Disease | Medical sciences
Pharmacology & Pharmacy | Medical sciences
Psychiatry | Medical sciences
Public, Environmental & Occupational Health | Medical sciences
Radiology, Nuclear Medicine & Medical Imaging | Medical sciences
Rehabilitation | Medical sciences
Respiratory System | Medical sciences
Rheumatology | Medical sciences
Sport Sciences | Medical sciences
Substance Abuse | Medical sciences
Table A1. (continued)
WoS subject category | Broad area
Surgery | Medical sciences
Toxicology | Medical sciences
Transplantation | Medical sciences
Tropical Medicine | Medical sciences
Urology & Nephrology | Medical sciences
Veterinary Sciences | Medical sciences
Acoustics | Physics
Mechanics | Physics
Optics | Physics
Physics, Applied | Physics
Physics, Atomic, Molecular & Chemical | Physics
Physics, Condensed Matter | Physics
Physics, Fluids & Plasmas | Physics
Physics, Mathematical | Physics
Physics, Multidisciplinary | Physics
Physics, Nuclear | Physics
Physics, Particles & Fields | Physics
Thermodynamics | Physics
Business | Professional fields
Business, Finance | Professional fields
Communication | Professional fields
Education & Educational Research | Professional fields
Education, Scientific Disciplines | Professional fields
Education, Special | Professional fields
Ergonomics | Professional fields
Family Studies | Professional fields
Health Care Sciences & Services | Professional fields
Hospitality, Leisure, Sport & Tourism | Professional fields
Industrial Relations & Labor | Professional fields
Information Science & Library Science | Professional fields
Law | Professional fields
Table A1. (continued)
WoS subject category | Broad area
Management | Professional fields
Medicine, Legal | Professional fields
Primary Health Care | Professional fields
Social Work | Professional fields
Behavioral Sciences | Psychology
Psychology | Psychology
Psychology, Applied | Psychology
Psychology, Biological | Psychology
Psychology, Clinical | Psychology
Psychology, Developmental | Psychology
Psychology, Educational | Psychology
Psychology, Experimental | Psychology
Psychology, Mathematical | Psychology
Psychology, Multidisciplinary | Psychology
Psychology, Psychoanalysis | Psychology
Psychology, Social | Psychology
Agricultural Economics & Policy | Social sciences
Anthropology | Social sciences
Area Studies | Social sciences
Criminology & Penology | Social sciences
Demography | Social sciences
Economics | Social sciences
Geography | Social sciences
Gerontology | Social sciences
International Relations | Social sciences
Linguistics | Social sciences
Planning & Development | Social sciences
Political Science | Social sciences
Public Administration | Social sciences
Social Issues | Social sciences
Social Sciences, Biomedical | Social sciences
Table A1. (continued)
WoS subject category | Broad area
Social Sciences, Interdisciplinary | Social sciences
Social Sciences, Mathematical Methods | Social sciences
Sociology | Social sciences
Urban Studies | Social sciences
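
Table A1 is in effect a lookup from each WoS subject category to one broad area. The following is a minimal sketch of how such a lookup might be encoded, assuming Python; the names BROAD_AREA and broad_area are hypothetical and do not come from the article, and only a few rows of the table are reproduced for illustration.

```python
from typing import Optional

# Illustrative subset of Table A1 rows (hypothetical encoding; the full
# table contains many more subject categories than are listed here).
BROAD_AREA = {
    "Robotics": "Engineering",
    "Oceanography": "Geosciences",
    "Philosophy": "Humanities",
    "Statistics & Probability": "Mathematical sciences",
    "Oncology": "Medical sciences",
    "Optics": "Physics",
    "Law": "Professional fields",
    "Psychology, Social": "Psychology",
    "Sociology": "Social sciences",
}


def broad_area(category: str) -> Optional[str]:
    """Return the broad area for a WoS subject category, or None if unlisted."""
    # Strip surrounding whitespace so padded input still matches a key.
    return BROAD_AREA.get(category.strip())
```

Because each subject category maps to exactly one broad area, a plain dictionary is sufficient; a category missing from the dictionary returns None rather than raising an error.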