RESEARCH ARTICLE
Citation metrics covary with researchers’
assessments of the quality of their works
Dag W. Aksnes, Fredrik Niclas Piro, and Lone Wanderås Fossum
Nordic Institute for Studies in Innovation, Research, and Education (NIFU), Oslo, Norway
an open access journal
Keywords: bibliometric indicators, citations, metrics, peer review, research quality, scientific
importance
Citation: Aksnes, D. W., Piro, F. N., &
Fossum, L. W. (2023). Citation metrics
covary with researchers’ assessments
of the quality of their works. Quantitative
Science Studies, 4(1), 105–126.
https://doi.org/10.1162/qss_a_00241
DOI:
https://doi.org/10.1162/qss_a_00241
Peer Review:
https://www.webofscience.com/api
/gateway/wos/peer-review/10.1162
/qss_a_00241
Received: 15 September 2022
Accepted: 28 December 2022
Corresponding Author:
Dag W. Aksnes
dag.w.aksnes@nifu.no
Handling Editor:
Ludo Waltman
Copyright: © 2023 Dag W. Aksnes,
Fredrik Niclas Piro, and Lone Wanderås
Fossum. Published under a Creative
Commons Attribution 4.0 International
(CC BY 4.0) license.
The MIT Press
ABSTRACT
For a long time, citation counts have been used to measure scientific impact or quality. Do such
measures align with researchers’ assessments of the quality of their work? In this study, we
address this issue by decomposing the research quality concept into constituent parts and
analyzing their correspondence with citation measures. The focus is on individual publications,
their citation counts, and how the publications are rated by the authors themselves along quality
dimensions. Overall, the study shows a statistically significant relationship for all dimensions
analyzed: solidity, novelty/originality, scientific importance, and societal impact. The highest
correlation is found for scientific importance. This correlation is not very strong, but we find distinct
gradients when publications are grouped by quality scores. This means that the higher the
researchers rate their work, the more they are cited. The results suggest that citation metrics
have low reliability as indicators at the level of individual articles, but at aggregated levels, the
validity is higher, at least according to how authors perceive quality.
1. INTRODUCTION
Citation data are widely used in the context of research evaluation and performance assess-
ments ( Wilsdon, Allen et al., 2015). How often a publication is cited in the research literature
is seen as a sign of its valuation as a scientific contribution. The citation counts of individual
publications constitute the basis for measuring performance at various levels in the research
system, such as individual authors, research groups, departments, and institutions. The idea
that citation numbers can be used as a proxy for research quality dates back a long time (Cole
& Cole, 1971). Today, citations are still claimed to reflect research quality (Caon, Trapp, &
Baldock, 2020), although most bibliometric professionals would probably adhere to the view
that citations reflect scientific impact rather than quality.
A recent literature review examined the relationship between research quality and citation
indicators (Aksnes, Langfeldt, & Wouters, 2019). The point of departure is the multidimen-
sional character of the research quality concept, where plausibility/reliability, originality,
scientific value, and societal value are seen as key characteristics. In Polanyi’s original elab-
orations (1962), the merit of a scientific contribution relates to the three first dimensions, but
societal value has been added by other scholars (Lamont, 2009; Weinberg, 1963). These key
distinctions of research quality reappear in many later empirical studies (Langfeldt, Nedeva
et al., 2020). Plausibility/reliability may refer to the solidity of empirical evidence, the sound-
ness of the results, and their reliability; originality to providing new knowledge and innovative
research; scientific value to the contribution to research progress and importance for other
research; and societal value to the usefulness for society. Although a multitude of other notions
and aspects of research quality have been suggested, these can generally be regarded as spe-
cific cases of the four dimensions (Langfeldt et al., 2020).
The review by Aksnes et al. (2019) argues that citations, to some extent, indicate scientific
impact and relevance, but there is little evidence of citations reflecting other key dimensions of
research quality described above. The latter conclusion was based on an examination of the
literature, showing that studies addressing the issue empirically are lacking.
The lack of previous studies addressing this issue is the motivation for the current paper. The
aim is to provide further knowledge on the extent to which citations reflect the various dimen-
sions of research quality. The focus is on individual publications, their citation counts, and
how the publications are rated by the authors themselves along quality dimensions. Specifi-
cally, the following research questions are addressed:
• To what extent do citation metrics of publications correspond with the authors’ self-
assessments of research quality dimensions: novelty/originality, solidity, scientific impor-
tance, and societal impact?
• As a subordinate issue: To what extent does the relationship differ by type of research
contribution (theoretical, empirical, methodological, and reviews) and by research field?
The latter questions have been added because previous research has shown that citation
patterns differ across types of contributions. Review articles are particularly known to be, on
average, more frequently cited than ordinary articles (Mendoza, 2021; Miranda & Garcia-
Carpintero, 2018). Moreover, some of the world’s most highly cited publications are method
papers (Aksnes et al., 2019; Small, 2018). Less is known about the citation scores of other
types of contributions. However, a study by Aksnes (2006) showed relatively small differences
in citations to theoretical, methodological, and empirical contributions. The field dimension is
also important. Not only do citation patterns differ significantly across fields, but there are also
large variations in the coverage of the scientific and scholarly literature in bibliometric data-
bases (Aksnes & Sivertsen, 2019; Marx & Bornmann, 2015). This limitation particularly affects
the humanities field as well as many social sciences disciplines, presumably affecting the
validity of citation measures in performance analyses in these disciplines. Accordingly, it
has been recommended that citation analyses be applied with caution in these areas (Moed,
2005; Ochsner, Hug, & Galleron, 2017). Therefore, our analysis will specifically address how
correspondence differs across fields.
Considering the frequent use of citations and other publication-based metrics for research
evaluation purposes, hiring, and funding (Langfeldt, Reymert, & Aksnes, 2021), the research
questions of our study are important and should be paid attention to. Many studies have
addressed similar research questions, comparing citation measures with external benchmarks
(e.g., peer reviews). The results are typically interpreted within a validation framework, mean-
ing that if citation indicators can legitimately be used as performance measures, there should be
a certain congruity with peer assessments. For example, Harzing (2018) analyzed the British
national research evaluation (Research Excellence Framework [REF]) and compared universities’
REF scores with the number of citations, reporting a very high correlation (0.97). However, the
degree of correspondence identified differs significantly across individual studies and gener-
ally tends to be moderate and far from perfect (Aksnes et al., 2019; Wilsdon et al., 2015).
In an examination of the lack of consistency in previous REF-based comparative assess-
ments, Traag and Waltman (2019) emphasized that the results will differ according to the level
of aggregation studied, from individual publications to aggregated levels, such as institutions.
In this study, we focus on the lowest level of aggregation: individual publications. At this
level, previous studies seem to have found rather low correspondence. One of the most com-
prehensive studies is the one carried out for the Metric Tide report ( Wilsdon et al., 2015) on
how REF 2014 quality scores correlated with metrics (Higher Education Funding Council for
England (HEFCE), 2015). Here, the REF quality scores were based on peer assessments of the
originality, significance, and rigor of the publications. A variety of indicators were examined,
but none obtained a higher correlation coefficient (Spearman’s) than 0.34. Similarly, a recent
study of bibliometric indicators and peer reviews of the Italian research assessment showed
weak correspondence at the level of individual articles (Baccini, Barabesi, & De Nicolao,
2020), concluding that metrics should not replace peer reviews at the level of individual arti-
cles. This was based on a combined index in which the number of citations and journal impact
factors were used. Other examples include a study analyzing articles that were singled out in
Mathematical Reviews as being especially important (Smolinsky, Sage et al., 2021). Of these,
17% were highly cited (among the top 1% cited papers). Articles that have been recom-
mended as important in biomedicine by the Faculty of 1,000 (a publication peer-review service
for biological and medical research) have also been shown to correlate with citations, but only
weakly ( Waltman & Costas, 2014). Borchardt, Moran et al. (2018) analyzed chemistry articles
and found that peer assessments of importance and significance differed considerably from
citation-based measurements. Older studies of individual articles with similar findings include
Aksnes (2006) and Patterson and Harris (2009), but there are also studies that have concluded
differently. Ioannidis, Boyack et al. (2014) found that papers ranked by elite scientists as their
best were also among the most highly cited. Similarly, an examination of award-winning
papers in economics showed that these papers had a significantly higher number of citations
than ordinary papers (Coupé, 2013).
The large variety in the observed degree of correspondence in previous comparative studies
may not be surprising. Not only is research quality a multidimensional concept, but peer eval-
uations also often include assessments of factors besides quality. Thus, the foundation for sim-
ple comparative assessments may be weak or lacking. Moreover, many citation indicators
exist, and the results may depend on the type of indicator selected. Finally, peer assessments
are uncertain and fallible (Aksnes, 2006; Traag & Waltman, 2019).
Against this background, we believe there is a need for more studies that address the topic
in a simpler and more transparent manner. In our view, a problem or limitation with many
previous studies is that the multidimensional character of research quality is not taken into
account. This paper expands the perspective by decomposing the concept, making it evident
which research performance or quality dimensions are compared with citation metrics.
In the study, we rely on authors’ self-assessments of their papers’ quality dimensions, which
is an approach also adopted in several previous studies (Aksnes, 2006; Case & Higgins, 2000;
Dirk, 1999; Ioannidis et al., 2014; Porter, Chubin, & Xiao-Yin, 1988; Shibayama & Wang,
2020). Still, there are pros and cons to such a methodology. A main advantage is that the
authors have thorough knowledge of the content of the publication, the research reported,
and the field. However, their views may be regarded as more subjective than those of their
colleagues. For example, one might expect certain psychological mechanisms at play, such as the
Dunning–Kruger effect (Kruger & Dunning, 1999), which states that people overestimate their
own abilities but where the effect is reversed for highly skilled individuals. At the same time,
there are also limitations to alternative approaches that rely on peer assessments. As noted
above, these are fallible, and the agreement between different reviewers has been shown to
be very low (Lee, Sugimoto et al., 2013), meaning that there is no objective yardstick to which
citations can be validated as indicators.
2. DATA AND METHODS
2.1. Study Design and Questionnaire
Citation distributions are skewed at the level of individual authors (Seglen, 1992). A large num-
ber of published articles are cited little or not at all. Rather than selecting a random set of
publications and authors, which would be dominated by less-cited publications, we designed
a method by which contributions from a wide citation spectrum would be well represented.
Specifically, this means that we divided the publications into three categories:
• Highly cited publications (within the top 10 percentile rank)
• Publications with an intermediate number of citations (within the top 10–50 percentile rank)
• Less-cited publications (within the bottom 50–100 percentile rank)
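The stratification above, together with the authorship criterion described next, can be sketched as follows. This is a hypothetical illustration, not the authors' actual selection pipeline; the function names are invented, and percentile ranks follow the convention that 0 marks the most cited end of the scale.

```python
# Hypothetical sketch of the three-way citation stratification described above.
# A percentile rank of 0 corresponds to the most highly cited publications.
def citation_stratum(percentile: float) -> str:
    """Map a publication's citation percentile rank to its sampling category."""
    if percentile < 10:
        return "highly cited"      # top 10 percentile rank
    elif percentile < 50:
        return "intermediate"      # top 10-50 percentile rank
    else:
        return "less cited"        # bottom 50-100 percentile rank

def qualifies(percentiles) -> bool:
    """An author qualified for the survey only with at least one (first- or
    last-authored) paper in each of the three strata."""
    return {citation_stratum(p) for p in percentiles} == {
        "highly cited", "intermediate", "less cited"
    }
```

Under this sketch, an author with papers at the 3rd, 25th, and 80th percentile ranks would qualify, while one with only less-cited papers would not.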
We then preselected individuals who had published at least one article in each of the
groups during the period analyzed and where they appeared as either the first or last author.
The latter criterion was added because we would like to include publications in which the
authors had contributed in key roles. Usually, this is indicated by first authorship (main con-
tributor) or last authorship (principal investigator), although this rule does not always hold, as
alphabetical author listing is common in some fields ( Waltman, 2012).
Furthermore, the study was limited to scientists affiliated with institutions in Norway. One
might ask whether this specific national delimitation has relevance when interpreting the
results. On the one hand, the attributes of research quality analyzed are thought to be universal
in the way that they transcend specific field or national delimitations. On the other hand, these
attributes might still be given different content in different contexts (Langfeldt et al., 2020).
Researchers’ perceptions of quality may also be influenced by national research evaluation
systems (Bianco, Gras, & Sutz, 2016). Still, we do not think national peculiarities should be
given much emphasis when interpreting the results. A large majority of the publications also
have coauthors from other countries and do not therefore represent “domestic” Norwegian
research.
What should be emphasized is that the investigation is not based on a random selection of
individuals. The survey is biased in favor of scientists who have published highly cited papers,
have key roles in research, and are reasonably productive. In practice, this means that the
survey is dominated by experienced scientists, often in full-professor positions.
A questionnaire was designed in which the authors were asked questions about three of
their publications, one randomly selected from each of the citation categories described
above. The respondents were not informed about this strategy, as their responses should not
be influenced by knowing the citation-based selection procedure. We therefore simply asked
them to rate three of their papers that had been randomly selected.
The questions were identical for all papers. Specific questions were included for each of
the different quality dimensions, in addition to a general question on the type of contribution.
We also included a question on groundbreaking research, as this has been claimed to be
associated with highly cited publications (Savov, Jatowt, & Nielek, 2020). An overview is pro-
vided in Table 1.
As can be seen from Table 1, the various research quality dimensions were operationalized
only to some extent in the survey, leaving some room for the respondents’ own interpretations.
The operationalization is based on an examination of the relevant literature on research
Table 1. Overview of questions and answer alternatives

Question: Please characterize the main contribution(s) from these articles.
Answer alternatives: Theoretical – Empirical – Methodological – Review – Cannot say/Not relevant

Question: How do you regard the novelty/originality of the research reported in these articles (e.g., in terms of topic addressed, research question, methodology, and results)?
Answer alternatives: 1 (low) – 2 – 3 – 4 – 5 (high) – Cannot say/Not relevant

Question: How do you regard the solidity of the research reported in these articles (i.e., validity and certainty/reliability of the methods and results reported)?
Answer alternatives: 1 (low) – 2 – 3 – 4 – 5 (high) – Cannot say/Not relevant

Question: How do you regard the scientific importance of the research reported in these articles (e.g., in terms of new discoveries/findings, theoretical developments, and new analytic techniques)?
Answer alternatives: 1 (low) – 2 – 3 – 4 – 5 (high) – Cannot say/Not relevant

Question: As far as you know, has the research/results presented in these articles had any societal impact (i.e., effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment, or quality of life, beyond academia)?
Answer alternatives: 1 (low) – 2 – 3 – 4 – 5 (high) – Cannot say/Not relevant

Question: Would you consider your article as ‘groundbreaking research’?
Answer alternatives: Yes – No – Don’t know
quality and research evaluations; specifically, the one on societal impact relies on the defini-
tion applied in REF (Savov et al., 2020).
2.2. Bibliometric Data, Indicators and Analyses
The study relies on two bibliometric databases. The first is the Norwegian publication data-
base, Cristin, which contains complete data on the scientific and scholarly publication output
of Norwegian researchers (Sivertsen, 2018). From this database, the publication outputs of
individual researchers can easily be identified. The second is the Web of Science ( WoS) data-
base, which has been used to retrieve citation data. The two databases were coupled through a
joint identity key. We applied a local version of WoS maintained by the Norwegian Agency for
Shared Services in Education and Research. Thus, the study is limited to articles that have been
indexed in WoS.
We identified the publication output of all Norwegian researchers (covering higher educa-
tion institutions, hospitals, and independent research institutes) for the period 2015–2018. We
did not include more recent publications, as we required a citation window of at least three
years. Moreover, we did not include older publications, as the memory of the respondents may
be more limited when going back in time.
Only publication items classified as regular articles in WoS were included. We excluded
review articles (because the survey focused on the characteristics of original or primary
research) as well as minor items, such as editorials and letters. However, we preserved
the review category in the questionnaire because the WoS item classification system is known
to be inaccurate (Donner, 2017).
Two types of citation indicators were used. First, we used the normalized citation index
(MNCS), where the citation numbers of each publication are normalized by subject field, arti-
cle type, and year ( Waltman & van Eck, 2013), thus allowing publications from different fields
and years to be compared on equal grounds. Second, we used the citation percentile, ranging
from 0 to 100%. This is an indicator showing the articles’ position within the citation distribu-
tion of their field ( Waltman & Schreiber, 2013), also taking into account their publication year
and article type. Thereby, we cover the two most commonly applied indicators in citation
studies in which the percentile-based indicator seems to be increasingly preferred due to its
mathematical properties and insensitivity to outliers (Hicks, Wouters et al., 2015; Wilsdon
et al., 2015).
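The logic of the two indicators can be illustrated with a simplified sketch. This is an assumption-laden toy version: the reference sets are illustrative, and the naive percentile rule below ignores the fractional treatment of ties used in the published indicator (Waltman & Schreiber, 2013).

```python
from statistics import mean

def mncs(citations: int, reference_citations: list) -> float:
    """Mean-normalized citation score: the paper's citation count divided by
    the mean citation count of its reference set, i.e., publications of the
    same subject field, article type, and publication year. A value of 1.0
    equals the average of the reference set."""
    return citations / mean(reference_citations)

def citation_percentile(citations: int, reference_citations: list) -> float:
    """Naive percentile rank (0-100): the share of the reference set cited at
    least as often as the paper, so lower values mean more highly cited."""
    at_least_as_cited = sum(1 for c in reference_citations if c >= citations)
    return 100.0 * at_least_as_cited / len(reference_citations)
```

For instance, a paper with 9 citations in a reference set averaging 4.5 citations has an MNCS of 2.0, while its percentile rank places it near the most cited end of the distribution.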
The survey results are combined with data on the percentile category of the publications
through descriptive bivariate analyses. In this way, we used a binning approach. Although this
implies the loss of some information, it simplifies the analyses and the visual interpretation of
the results. The most common approach in previous similar studies is correlation analysis (see,
for example, Aksnes, 2006; Borchardt et al., 2018; Smolinsky et al., 2021). In this study, we
carried out supplementary analyses using Pearson’s correlation coefficient in the case of the
percentile citation indicator and Spearman’s rank correlation coefficient in the case of the
MNCS. Spearman’s test was used in the latter case due to the lack of normally distributed data.
Thus, the strength of the relationship between a ranking derived from the MNCS indicator and
author ratings is analyzed.
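The two coefficients can be sketched with standard-library code on illustrative data (not the survey scores themselves). The sketch makes the methodological point explicit: Spearman's rho is simply Pearson's r applied to rank-transformed values, which is why it is robust to the skewed, non-normal MNCS distribution.

```python
def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank transform (1-based), assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

On monotone but non-linear data, such as skewed MNCS values of [0.1, 0.5, 2.0, 12.0] against ratings [2, 3, 4, 5], Spearman's rho is exactly 1.0 while Pearson's r is lower, illustrating why the rank-based test suits the MNCS.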
2.3. The Survey
The survey was distributed in January 2022 using SurveyXact software. Questionnaires were
sent out to a sample of 1,250 researchers based on the criteria described above. The response
rate was 47%, and the final sample consisted of 592 individuals, each with three publications
included. The study, therefore, encompasses assessments of almost 1,800 publications.
Table 2 shows how the respondents are distributed by scientific domain and gender. In
total, 180 women (30.4%) and 412 men (69.6%) participated in the survey. Medicine and
Health is the largest domain (43.4%), and there were few participants from the Humanities
(4.6%). This is due to the publication patterns of the Humanities, where only a small part of
the publication output is indexed in WoS (Aksnes & Sivertsen, 2019). We acknowledge that the
number of respondents from Humanities is low. The results in this domain should accordingly
be treated with caution (especially as many researchers do not publish in WoS-indexed jour-
nals, but rather in national-language journals and in books). The field classification applied in
the study relies on the system of the Cristin database (Sivertsen, 2018), where each publication
is assigned field categories and several researchers have publications in two (and even three)
domains. In Table 2, they are listed according to the majority principle.
Data on the birth years of the researchers (not shown in tables) show that the average ages
of the male and female respondents are 56.8 years and 54.7 years, respectively. Thus, we are
dealing with an experienced group of researchers, which is also evident from the figures on the
Table 2. Distribution of respondents by scientific domain and gender

Scientific field                  | Women | Men | Total | % Women | % Men | % Total
Humanities                        |    10 |  17 |    27 |     5.6 |   4.1 |     4.6
Medicine and Health               |    92 | 165 |   257 |    51.1 |  40.0 |    43.4
Natural Sciences and Technology   |    39 | 159 |   198 |    21.7 |  38.6 |    33.4
Social Sciences                   |    39 |  71 |   110 |    21.7 |  17.2 |    18.6
Total                             |   180 | 412 |   592 |     100 |   100 |     100
total publication output of the respondents during the period, which is 13 publications per
year on average. Here, women have a substantially lower number than men: 10 versus 14
publications. The skewed distribution of men and women across fields (both vertically and
horizontally in Table 2) is quite similar to that seen in the total Norwegian population of
researchers in senior positions (e.g., Nygaard, Aksnes, & Piro, 2022).
To assess possible response biases, we compared the field, gender, and age distributions
of the respondents with those for the original sample. The response rate was 44% for women
and 49% for men. The respondents were slightly older than the nonrespondents (average age
+3 years). At the level of domain, the response rate was lowest in the two largest fields
(Medicine and Health, 44%, and Natural Sciences and Technology, 46%), somewhat higher
in Social Sciences (55%) and much higher in Humanities (71%). Hence, the differences in
response rates across domains have led to a more balanced representation by reducing some
of the original size differences. We consider the response bias across the two other variables
as minor, not representing a methodological problem.
3. RESULTS
3.1. Type of Contribution
Table 3 shows the distribution of the publication types (self-reported) across three citation rank
categories: top 10 percentile (highly cited publications), 10–50 percentile, and the 50–100 per-
centile (less-cited/uncited publications). For almost half of the publications (47%), the main
contribution was assessed to be empirical. One-quarter of the papers were assessed as fore-
most contributing theoretically, and 20% contributed methodologically. As noted above,
articles classified as reviews in WoS were excluded from the sample. However, 7% of
the papers were claimed by the authors to foremost have “review” as the main contribution.
At the level of domain, theoretical contributions appear more frequently in the Social Sciences
and Humanities, while empirical contributions dominate in the Medicine and Health sciences.
The distribution of articles across citation rank categories did not differ much (Table 3).
Thus, the results do not suggest that certain types of contributions tend to be more highly cited
than others. Even the review contributions are distributed quite evenly across citation catego-
ries. This might seem surprising, as review papers have generally been shown to be more
highly cited (Miranda & Garcia-Carpintero, 2018). However, the data set was preselected
not to include review articles, thus preventing any conclusions on this matter.
Table 3. Distribution of publications by type of contribution and citation rank categories (%)*

Type of contribution | 0–10% | 10–50% | 50–100% | Total
Theoretical          |  26.2 |   25.8 |    25.5 |  25.9
Empirical            |  46.0 |   48.3 |    47.3 |  47.2
Methodological       |  19.9 |   20.0 |    20.7 |  20.2
Review               |   7.9 |    5.9 |     6.4 |   6.8
Total                |   100 |    100 |     100 |   100
N                    |   592 |    592 |     592 | 1,776
* The respondents were allowed to select more than one type of contribution; therefore, double counts occur.
Missing values and ‘don’t know’ replies are excluded from the calculations.
Overall, the results imply that the contribution type variable is of little interest to include in
forthcoming analyses, which would otherwise have been the case if a contribution type was
over- or underrepresented in particular citation rank categories.
Below, we turn to the various components of the research quality concept.
3.2. Solidity
The respondents were asked to assess the solidity of the research reported in their publications
from 1 to 5. The lowest score options (1–2) were used to a very small extent; this also holds
true for the other survey questions. Overall, the respondents assessed solidity as moderate or
high. The distribution by citation rank categories is shown in Figure 1.
Across all citation intervals, the large majority of the publications were assessed to have
high solidity (scores 4 and 5). Even the less-cited publications were usually considered to have
high solidity. However, the distribution of solidity assessments was not equal across the cita-
tion groups. A larger proportion of the most cited papers (top 10%) obtained the highest score (5)
(54%), compared with 40% for the papers in the 50–100 percentile category. We also observe
the opposite pattern for papers with the lowest scores (1–3), amounting to 7% and 21%, respec-
tively. In sum, Figure 1 gives the impression that perceptions of solidity increase with citation
scores.
To analyze the correspondence at field levels, we applied a simplified approach calculating
the average author score across citation rank categories (Figure 2). Except for the Humanities,
we observe that the scores increased according to citation rank categories.
3.3. Novelty/Originality
On the question concerning the novelty/originality of the research, only 5.4% of the articles
were rated with the lowest scores (1 and 2) (Figure 3). Approximately 25% obtained an inter-
mediate score of 3, while the remaining articles were classified in the two highest categories
(4–5). The pattern is quite similar to that observed for solidity, but the respondents were some-
what more modest when assessing novelty/originality. For example, although 46.3% of the
articles were rated 5 on solidity, the corresponding figure for novelty/originality is 29.8%.
Figure 1. Distribution of publications by solidity score and citation rank categories (%).
Figure 2. Average solidity score by citation percentile categories and scientific domain.
The distribution by citation rank categories shows that in the top 10 percentile, 43% of the
articles got the highest score. The corresponding figure for the articles in the 50–100 percentile
is 19%. Similarly, we see an opposite pattern for articles in the 50–100 percentile category,
where more of the publications got low/intermediate scores (1–3). Thus, we see a tendency for
the ratings to correspond with the citation rank categories of the publications.
Figure 4 shows the mean and total scores on novelty/originality across fields. In all fields,
articles in the top 10 percentile are ranked highest. Similarly, articles in the 50–100 percentile
category clearly have lower scores than those in the 10–50 percentile category, with the
exception of Medicine.
3.4. Scientific Importance
The third dimension related to scientific quality is scientific importance. The largest number of
articles were rated 4 (39%), with equal shares rated 3 or 5 (27%) (Figure 5). The distribution is
very similar to the one previously shown for novelty/originality. The interpretation of this is that
the researchers acknowledge that although the work itself is of high solidity, it may not have
been equally novel/original or scientifically important.
Figure 3. Distribution of publications by novelty/originality score and citation rank categories (%).
Figure 4. Average novelty/originality score by citation percentile categories and scientific domain.
The distribution by citation rank categories is also very similar to the one we observed for
novelty/originality. A much higher proportion of the top 10 percentile articles obtained the
highest score (45%) compared with articles in the 50–100 percentile category (14%).
Figure 6 shows the mean and total scores for scientific importance across fields. We do not
observe large differences here. In all fields, the patterns are quite similar.
In sum, the differences in scores for the three quality dimensions indicate that the
researchers have been able to differentiate between the dimensions (i.e., they have not
automatically scored each paper with the same rating on all dimensions). This was confirmed with,
first, a correlation analysis, revealing correlations (Pearson's r, two-tailed, sig. .000) of .350
between novelty/originality and solidity, .651 between novelty/originality and scientific
importance, and .408 between solidity and scientific importance. Second, we calculated the
percentages of scores that differed between pairs of quality dimensions. In 54.5% of the papers,
the researchers rated novelty/originality and solidity differently. For novelty/originality and
scientific importance, different scores were given for 41.5% of the papers, and for solidity and
scientific importance, for 56.9% of the papers.
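A consistency check of this kind takes only a few lines. The sketch below uses simulated 1–5 ratings, since the survey responses are not public, and all variable names are our own:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 1-5 ratings of the same 500 papers on two quality dimensions;
# the second dimension tracks the first with some noise, mimicking related
# but distinct assessments.
novelty = rng.integers(2, 6, size=500)
solidity = np.clip(novelty + rng.integers(-1, 2, size=500), 1, 5)

r, p = stats.pearsonr(novelty, solidity)        # Pearson's r, two-tailed p-value
pct_diff = 100 * np.mean(novelty != solidity)   # share of papers rated differently

print(f"r = {r:.3f}, p = {p:.3g}, differing scores = {pct_diff:.1f}%")
```

The same two quantities, a pairwise correlation and the share of papers scored differently, are what the paragraph above reports for the real survey data.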
Figure 5. Distribution of publications by scientific importance score and citation rank categories (%).
Figure 6. Average scientific importance score by citation percentile categories and scientific domain.
3.5. Groundbreaking Research
The researchers were also asked whether their publications represented “groundbreaking
research.” In total, more than one-quarter of the papers were perceived as groundbreaking
(Figure 7). In the Humanities, this proportion was as high as 39.5%. In the Health Sciences, the
researchers reported that their papers were groundbreaking substantially less often (16.3%). In
the other domains, the percentage ranged from 23.8% (Medicine) to 31.4% (Technology).
The respondents’ assessments corresponded well with the citation rank categories. A
much higher proportion of the top 10 percentile articles were considered groundbreaking
compared with the two other categories. Still, 15% of the articles in the lowest 50–100 percentile
category were considered groundbreaking. However, not all highly cited publications were
considered groundbreaking. In all fields except Medicine, there was a distinct pattern of
proportions increasing from the 50–100, to the 10–50, to the top 10 percentile; in Medicine, the
share of groundbreaking research was twice as high in the top 10 percentile as in the other
percentile categories.
Figure 7. Percentage of publications considered to be groundbreaking research by citation percentile categories and scientific domain.
At the level of individuals (excluding those who did not answer the question for all three of
their papers), 22 respondents (3.8%) claimed that all three of their papers were groundbreak-
En g, mientras 256 respondents (44.7%) claimed that none of their papers were groundbreaking.
Ninety-one respondents (15.9%) claimed two of their papers were groundbreaking and one
was not. One groundbreaking paper and two nongroundbreaking papers were reported by
204 respondents (35.6%).
3.6. Societal Impact
Societal impact is the last dimension of the research quality concept. Here, the response
alternatives were simply “yes,” “no,” and “don’t know.” A total of 24.9% of the papers were
claimed to have had societal impact (Figure 8). In this calculation, the “don’t know” publications
are also included in the denominator: it is fully possible to know that your research has had
societal impact, but not to know with certainty that it has not (the researcher may simply be
unaware of it), so distinguishing between “no” and “don’t know” is of limited importance.
There are notable differences across fields in the extent to which the respondents consider
their research to have societal impact. This proportion is highest in Medicine (31%) and the
Social Sciences (29.8%) and lowest in the Humanities (11.5%) and Health Sciences (12.7%).
Moreover, in this dimension, the respondents’ answers correspond with the citation rank
categories in the previously observed manner: The highest proportion is for the top 10 percentile
group (33.9%) and lowest for the 50–100 percentile group (17.1%). This pattern is consistent
across all fields.
3.7. Further Analyses
We have so far presented results using a binning approach consisting of three citation
percentile categories corresponding to the criteria applied in the selection of publications. However,
some information is lost by this procedure, and we will now present analyses using the exact
citation scores. In addition to the percentile-based indicator, we will analyze the MNCS.
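For reference, the two indicators can be sketched as follows. This is a simplified illustration of the standard definitions (full implementations normalize against field- and year-specific reference sets, which are elided here), and the function names are our own:

```python
import numpy as np

def mncs(citations, reference_means):
    """Mean normalized citation score: each paper's citation count divided by
    the mean citation count of its field and publication year, averaged over
    the paper set (simplified: the caller supplies the reference means)."""
    return float(np.mean(np.asarray(citations) / np.asarray(reference_means)))

def percentile_value(citations, reference_counts):
    """Percentile of one paper within its reference set: the share of papers
    cited more often. Low values mean highly cited (a value below 10 places
    the paper in the top 10 percentile, as in this study)."""
    ref = np.asarray(reference_counts)
    return float(100 * np.mean(ref > citations))

# A paper cited 30 times in a field-year averaging 10 citations scores 3.0:
print(mncs([30], [10]))                           # 3.0
print(percentile_value(30, [5, 8, 10, 12, 40]))   # 20.0 (one of five cited more)
```

The opposite orientations of the two indicators (high MNCS is good, low percentile is good) explain the opposite signs of the correlations reported below.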
Figure 8. Percentage of publications considered to have had a societal impact across citation rank categories and scientific domains.
Figure 9. Percentile citation values and assessments of research quality dimensions. Response
alternative 1 (lowest) is not shown due to a very small number of observations.
Figure 9 shows how the respondents’ assessments of the quality dimensions vary according to
response alternatives and percentile values. For example, the average percentile value for
articles rated with high (5) novelty was 26, compared with 50 for articles rated with the lowest
score (2). For all quality dimensions, the results correspond with the patterns identified above,
in which there is a distinct difference in values by response alternatives.
We then carried out a similar analysis using the other citation indicator, the MNCS (Figure 10).
A corresponding pattern is found, but with a larger difference in citation values across the
scores (1–5). This is due to the distributional character of the MNCS indicator and the presence
of outliers.
Further insights concerning the relationships are obtained by carrying out correlation
analyses. Both MNCS and percentile values show moderate to weak correlations with the
three quality dimensions. However, the associations are statistically significant, as shown in
Figure 10. Mean normalized citation score (MNCS) and assessments of research quality dimensions. Response alternative 1 (lowest) is not shown due to a very small number of observations.
Table 4. Correlation analysis of MNCS/citation percentiles and research quality measures

                         MNCS                                Percentile values
                         Spearman    Sig. (2-tailed)         Pearson    Sig. (2-tailed)    N
Solidity                 .177*       .000                    −.181*     .000               1,691
Novelty/originality      .280*       .000                    −.266*     .000               1,732
Scientific importance    .370*       .000                    −.351*     .000               1,707

* Correlation is significant at the 0.01 level (2-tailed).
The different signs reflect that a low percentile value corresponds with high citation counts and vice versa.
Table 4. There are only minor differences across the two types of citation indicators. The
strongest correlation is between citation measures and scientific importance.
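The use of Spearman for the MNCS and Pearson for percentile values in Table 4 reflects the skewness of the MNCS distribution. A minimal sketch with simulated data (ours, for illustration only) shows how both coefficients are obtained:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

ratings = rng.integers(1, 6, size=300)  # 1-5 quality scores
# Skewed, outlier-prone citation values loosely tied to the ratings,
# mimicking the distributional character of MNCS:
mncs_values = np.exp(0.3 * ratings + rng.normal(0, 1, size=300))

rho, p_rho = stats.spearmanr(ratings, mncs_values)  # rank-based, robust to outliers
r, p_r = stats.pearsonr(ratings, mncs_values)       # linear, outlier-sensitive

print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3g}), Pearson r = {r:.2f}")
```

Because Spearman operates on ranks, a handful of extremely highly cited papers cannot dominate the coefficient, which is why it is the natural choice for the MNCS.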
A more differentiated picture emerges when we analyze correlations separately by field.
Table 5 shows the results for the citation percentile indicator. Very similar patterns were
Table 5. Correlation analysis (Pearson’s r) of citation percentiles and research quality measures across scientific fields

                     Solidity                    Novelty/originality         Scientific importance
Field                r        Sig.     N         r        Sig.     N         r        Sig.     N
Health Sciences      −.194**  .002     257       −.318**  .000     268       −.338**  .000     262
Humanities           −.085    .438     85        −.219*   .036     92        −.341**  .001     86
Medicine             −.203**  .000     428       −.111*   .020     435       −.290**  .000     428
Natural Sciences     −.182**  .000     424       −.278**  .000     426       −.378**  .000     426
Social Sciences      −.215**  .000     293       −.418**  .000     307       −.407**  .000     302
Technology           −.173*   .013     204       −.311**  .000     204       −.379**  .000     203

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
obtained for the MNCS indicator (not shown). Contrary to our expectations, the differences
across fields were rather small. The rather weak correlation regarding solidity (Table 4) is also
observed for most fields, but on the other quality dimensions, the correlations range from very
small for novelty/originality (Medicine: −.11) to moderate (Health Sciences: −.32; Social
Sciences: −.42). For scientific importance, all correlations were in the range of −.29 to −.41.
In Table 6, we show the mean and median values of MNCS for each research quality dimension
and score. The median values are clearly lower than the averages, demonstrating a skewed
distribution in which the mean values are strongly influenced by a relatively small group of very
highly cited articles. In almost all fields, there is a distinct gradient: the citation scores
increase with the self-assessments of the papers. For scientific importance, the gradient is
congruent with the author ratings in all fields except Technology, where papers with a score of 3 have
lower MNCS than those with a score of 2. For the other dimensions, there are also very few
exceptions to the overall pattern of citation scores increasing with the respondents’ scores.
To test for the impact of the skewed data distribution in responses to the quality dimensions
(with a dominance of respondents answering at the higher end of the scale), we ran
Table 6. Mean and median values of MNCS across fields and research quality dimensions*

                                          Score 2 (low)    Score 3          Score 4          Score 5 (highest)
Indicator               Field             Median   Avg.    Median   Avg.    Median   Avg.    Median   Avg.
Novelty/originality     Health Sciences   0.48     0.66    0.73     1.24    1.75     2.17    2.63     4.91
                        Humanities        –        –       1.42     2.30    1.13     1.91    3.22     4.12
                        Medicine          1.10     1.45    0.84     1.75    1.27     1.65    1.16     2.44
                        Natural Sciences  0.52     1.11    0.60     1.38    0.82     1.40    1.99     2.74
                        Social Sciences   0.41     0.68    0.46     1.21    1.34     1.99    2.94     3.26
                        Technology        0.60     0.94    0.57     1.24    1.08     1.62    2.27     2.21
Solidity                Health Sciences   0.44     0.35    0.77     1.45    1.41     2.00    1.58     3.86
                        Humanities        –        –       1.48     2.69    0.99     2.80    1.62     2.31
                        Medicine          0.86     1.48    0.70     1.07    0.89     1.95    1.56     2.35
                        Natural Sciences  1.13     1.08    0.48     0.74    0.89     1.80    1.28     2.00
                        Social Sciences   –        –       0.50     1.09    1.35     1.94    1.83     2.68
                        Technology        –        –       0.72     1.41    0.92     1.58    1.58     1.89
Scientific importance   Health Sciences   0.44     0.97    0.84     1.41    1.63     2.39    2.62     5.08
                        Humanities        0.33     1.38    0.35     1.49    1.73     2.96    2.59     3.50
                        Medicine          0.51     0.73    0.84     1.28    0.95     1.83    2.26     3.00
                        Natural Sciences  0.39     0.97    0.55     0.96    1.06     1.64    2.49     3.06
                        Social Sciences   0.39     0.58    0.48     1.23    1.20     1.96    2.97     3.55
                        Technology        0.74     1.10    0.42     0.87    1.08     1.64    2.31     2.42

* Cells with fewer than 5 papers are not shown in the table. Cannot say/not relevant responses and missing values are also excluded.
Table 7. Independent-samples Kruskal–Wallis* test of MNCS and research quality dimensions

Samples compared   Quality dimension       Test statistic   Std. error   Std. test statistic   Sig.
2–1                Novelty                 46.8             160.5        .33                   .770
2–1                Solidity                −274.3           297.2        −.92                  .356
2–1                Scientific importance   −24.2            128.8        −.22                  .851
2–3                Novelty                 −88.3            60.0         −1.51                 .141
2–3                Solidity                −293.3           283.8        −1.03                 .301
2–3                Scientific importance   −126.4           121.8        −1.04                 .299
2–4                Novelty                 −259.0           58.1         −4.54                 .000
2–4                Solidity                −476.0           282.5        −1.77                 .092
2–4                Scientific importance   −340.6           121.0        −2.81                 .005
2–5                Novelty                 −433.0           59.1         −7.32                 .000
2–5                Solidity                −565.0           282.4        −2.00                 .045
2–5                Scientific importance   −568.2           121.7        −4.77                 .000
1–3                Novelty                 −41.5            152.7        −.33                  .786
1–3                Solidity                −19.0            99.6         −.22                  .849
1–3                Scientific importance   −102.2           53.2         −1.92                 .055
1–4                Novelty                 −212.2           152.0        −1.43                 .163
1–4                Solidity                −201.7           95.9         −2.10                 .035
1–4                Scientific importance   −316.4           51.5         −6.14                 .000
1–5                Novelty                 −386.2           152.4        −2.54                 .011
1–5                Solidity                −290.6           95.6         −3.04                 .002
1–5                Scientific importance   −544.0           53.1         −10.20                .000
3–4                Novelty                 −170.7           30.8         −5.54                 .000
3–4                Solidity                −182.7           38.2         −4.88                 .000
3–4                Scientific importance   −214.2           30.0         −7.15                 .000
3–5                Novelty                 −344.7           32.8         −10.50                .000
3–5                Solidity                −271.6           37.5         −7.33                 .000
3–5                Scientific importance   −441.9           32.6         −13.60                .000
4–5                Novelty                 −174.1           29.0         −6.00                 .000
4–5                Solidity                −88.9            25.8         −3.44                 .001
4–5                Scientific importance   −227.7           29.8         −7.63                 .000

* Each row tests the null hypothesis that the Sample 1 and Sample 2 distributions are the same.
nonparametric tests to determine statistically significant differences in MNCS between the
scores on the quality dimensions. In all three dimensions, the Kruskal–Wallis H test rejected
the null hypothesis (that the distribution of MNCS is the same across categories of quality
dimensions, sig. .000), with MNCS values significantly different (and higher) between
comparisons of groups scoring 3 and higher (Table 7), but with some nonsignificant results in
comparisons involving scores 1 and 2, where the number of responses is very low.
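The omnibus test used here is available in scipy; a minimal sketch on simulated MNCS-like groups (hypothetical data, not the study’s) looks like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical MNCS values for papers scored 3, 4, and 5 on a quality
# dimension: log-normal (skewed) with rising location parameters.
groups = [np.exp(rng.normal(mu, 0.8, size=150)) for mu in (0.0, 0.3, 0.6)]

# Kruskal-Wallis H: do all groups share the same MNCS distribution?
h, p = stats.kruskal(*groups)
print(f"H = {h:.1f}, p = {p:.3g}")

# Pairwise follow-up (e.g., scores 3 vs. 5) with a Mann-Whitney U test:
u, p_35 = stats.mannwhitneyu(groups[0], groups[2])
```

Being rank-based, both tests sidestep the outlier sensitivity that makes mean-based comparisons of MNCS unstable.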
4. DISCUSSION
The main purpose of this study was to assess how citation indicators align with researchers’
subjective assessments of research quality dimensions. In addition, the study has provided
knowledge on how researchers evaluate their own research. We discuss some of the findings
related to this below.
Generally, researchers rate their research publications highly. More than one-quarter of the
papers are perceived by the authors as groundbreaking research. Almost none of the publications
are rated with the lowest quality score (1), few are rated with the second lowest score (2),
and a large majority are rated with the two highest scores (4 and 5). This holds true across all
quality dimensions assessed in the survey.
The pattern is particularly evident for solidity, while the researchers are slightly more
restrained when assessing the novelty/originality and scientific importance of their research.
This is perhaps not surprising, as studies that lack solidity would not be accepted for publication
in reputable journals. Moreover, studies with little novelty might be filtered out through
the peer review process. Thus, by being accepted for publication in journals, the studies have
already undergone a selection process that might explain the patterns.
The respondents of our survey are all experienced researchers, typically with long careers
within the academic system; they have also contributed to highly cited publications.
It is reasonable to assume that they would be reluctant to contribute to studies they consider
to have little novelty or scientific importance, or to lack solidity. This is an additional factor that
might explain why so few of the articles are ranked with low scores.
At the same time, researchers may be too positive about their research. The study shows
how scientists think about their research, and in this way, there is prima facie reliability.
Sin embargo, an open question is to what extent their assessments would be shared by external
reviewers.
Despite these two limitations, the judgments of the researchers provide an interesting and
suitable reference point for addressing the validity of citations as performance measures. First,
the spread of responses is still large enough to allow comparative analyses. Second, using
author views is one approach in the range of studies applying different reference points, all
with various strengths and limitations.
4.1. Possible Biases in the Respondents’ Assessments
Citation metrics are available through a large number of products and services. Many
researchers know their own metrics and which publications are highly cited. In Norway, citation
analyses have also been provided in broad peer-based national evaluations of research
fields. To what extent the respondents’ answers are affected by prior knowledge, we
simply do not know. We find it important to note that while some researchers’ answers may
be consciously or unconsciously guided by their knowledge of citation numbers, for others
they may not, and some may even “disagree” with the citation scores (cf. Aksnes & Rip,
2009). One might also think that the perceived prestige of the publication channels has some
influence, for example that a Nature paper automatically receives a high rating. Still, as we do not
find strong correlations between citation metrics and author ratings, we do not think this is a
major problem, and prima facie we do not see any reason why the respondents would
rate their papers systematically in line with known citation numbers rather than providing a
sincere assessment.
Moreover, the results may be biased due to lack of memory. Some researchers responded
that the selected papers were not among their most relevant ones and that they contributed in
peripheral roles only (making it difficult to “remember”). In particular, people who have been
involved in a large number of papers in recent years may have limited memories of individual
contributions. Furthermore, one might ask whether people rate their more recent papers higher.
However, when testing this issue, we did not find any difference at all on the various quality
dimensions (i.e., the average scores for papers were identical across all four years).
4.2. Correlations Are Not Strong, But Significant for All Quality Dimensions
Overall, this study has disclosed notable covariations between citation indicators and author
ratings of quality dimensions. Highly cited publications tend to obtain substantially higher
ratings than less-cited publications and vice versa. This holds for all quality dimensions but
is weaker for solidity than for the others. The latter finding reflects that most publications
(85%) were assessed to have high solidity (scores 4 and 5) in the first place.
The agreement between citation indicators and author assessments has been analyzed
using different approaches. In the first approach, based on binning, we looked at three main
citation categories. Here, we observed a distinct pattern in which highly cited publications are
generally seen as having higher research quality than other publications. This holds true
across the different dimensions analyzed. By contrast, publications that have been moderately
or little cited are ranked lower, but rarely as of inferior quality. In this way, quality rankings
of publications based on rough citation categories seem to have a certain justification, at least
at aggregated levels.
In the other part of the analysis, exact citation figures were used as the basis. These analyses
support the main findings of the first part. Articles that have been ranked as high quality are, on
average, more cited. Comparing the averages in this way reveals distinct differences between
the groups.
The results provide some support for the hypothesis derived from Aksnes et al. (2019) that
citation rates reflect scientific impact and relevance (to some extent) but not the other quality
dimensions. There is a significant correlation for all the quality dimensions analyzed. The
correlation is strongest for scientific importance, which holds true for both citation indicators
analyzed: the percentile (−0.35) and the MNCS indicator (0.37). In particular, highly cited
papers are considered to have high scientific importance. For novelty/originality, the correlation
is somewhat weaker (−0.27 and 0.28, respectively), and it is even weaker for solidity
(−0.18 and 0.18, respectively).
It is not surprising that the agreement is the poorest for solidity, considering that solidity per
se is not the reason why a publication is cited in subsequent research (Aksnes et al., 2019).
Rather, it may be a necessary but not sufficient criterion for a publication to be considered
worth citing, at least according to a normative model of citation behavior (see, for example,
Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2019). Similar reasoning can be provided
for novelty/originality. A publication may report research that is original in approach, but if
the results do not make interesting contributions to current knowledge, the publication may
not be cited.
Although the association is statistically significant for all quality dimensions and both
indicators analyzed, it is not strong. As a rule of thumb, correlation coefficients below 0.35 are
considered weak (Taylor, 1990). In this study, only one case (scientific importance) has a higher
value (just barely): 0.35/0.37. This corresponds to a coefficient of determination (r²) of
approximately 0.13, meaning that 13% of the variance in the indicator can be “explained” by
author ratings. Still, the characteristics of the response distribution must be taken into
consideration in the interpretation. Generally, the value of the correlation coefficient will be larger when
there is more variability among the observations than when there is less variability (Goodwin &
Leech, 2006). As noted, a large majority of the papers were rated with scores 3–5, reducing
the variability. Moreover, certain range restrictions were applied in the initial identification
of the researchers. This means that the identified correlations should be interpreted as stronger
than the raw coefficients might suggest.
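The conversion from correlation to explained variance is direct:

```latex
r^2 = 0.37^2 \approx 0.137 \quad\text{and}\quad r^2 = (-0.35)^2 \approx 0.123,
```

that is, roughly 13% of the variance in the citation indicator is shared with the author ratings.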
The results show that at the level of individual publications, citation metrics quite often do
not correspond with author assessments. The correspondence is highest for highly cited
publications. Many of the less-cited publications are considered to be of good quality and obtain
high scores on the quality dimensions. This supports the conclusion that citations are unreliable
at the level of individual articles but have stronger reliability at aggregated levels. A
similar point is made by Traag and Waltman (2019), who claim that when analyzed at the
level of articles, the strength of the correlation is much weaker than when addressed for
aggregated units.
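This aggregation effect can be illustrated with a small simulation of our own (not Traag and Waltman’s analysis): per-article noise washes out when articles are averaged within units:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_units, per_unit = 100, 50

# Latent "quality" of each unit, repeated for each of its articles:
quality = np.repeat(rng.normal(0, 1, n_units), per_unit)
# Article-level citations: quality plus large article-level noise:
citations = quality + rng.normal(0, 3, quality.size)

r_article, _ = stats.pearsonr(quality, citations)

# Averaging within units shrinks the noise roughly by sqrt(per_unit):
q_unit = quality.reshape(n_units, per_unit).mean(axis=1)
c_unit = citations.reshape(n_units, per_unit).mean(axis=1)
r_unit, _ = stats.pearsonr(q_unit, c_unit)

print(f"article-level r = {r_article:.2f}, unit-level r = {r_unit:.2f}")
```

The same underlying signal yields a weak article-level correlation and a much stronger unit-level one, which is the core of the aggregation argument.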
How do our results compare with previous studies comparing peer ratings and citation
indicators for individual publications? A relevant example is the study of how the REF 2014 quality
scores correlated with metrics (HEFCE, 2015), reporting a Spearman’s rank correlation
coefficient of −0.29 for the percentile indicator and 0.28 for an MNCS-like indicator (field-weighted
citation impact). These results are lower than the results of our study on scientific importance
but higher than those obtained for the other quality dimensions. The study by Baccini et al.
(2020) also showed an overall degree of agreement consistent with the results of the HEFCE
report. We also find other studies addressing this at the level of individual papers, reporting
correlation coefficients around 0.2 (Borchardt et al., 2018; Patterson & Harris, 2009).
The conclusions of the HEFCE report have, however, been criticized by Traag and Waltman
(2019). They noted that previous studies showed much higher agreement between metrics and
peer reviews in the REF. In their view, the issue should be addressed at an aggregated level
(institutional) rather than at the level of individual papers. The reason for this is that the goal of
the REF is not to assess individual publications but to assess institutional research. Using an
alternative approach, Traag and Waltman (2019) were able to show much stronger correlations
than those found in the HEFCE report. The fact that citations are not accurate at the
individual publication level does not preclude satisfactory accuracy at higher levels.
4.3. The Results Are Consistent Across Fields
As a subordinate issue to the present study, we aimed to address field differences in the
correspondence between self-perceived quality dimensions and citation scores. Here, we did not
observe large differences across domains. In comparison, the HEFCE (2015) study found large
variations across fields, with the strongest correlation for Clinical Medicine (Spearman’s rank
correlation of 0.64 for the FWCI) and less than 0.2 in several Social Science disciplines and the
Humanities (SSH). As noted in the introduction, a common view is that citation indicators have
less reliability in SSH. Our study of author perceptions does not support the notion that SSH
should be more problematic in this respect, although the problem of limited coverage will be
more severe. It should be noted that some of the questions (e.g., concerning solidity) primarily
refer to the empirical sciences. It is therefore an open question how they are interpreted within
the Humanities.
5. CONCLUSIONS
This study has shown that the citation metrics of publications covary with the authors’ own
assessments of their quality dimensions. The association is statistically significant, although not
strong. At aggregated levels, there is a distinct pattern in which rankings decline with declining
citation metrics. Generally, the highest accuracy is obtained for highly cited publications,
which are usually considered to have high research quality attributes. In terms of policy
implications, this means that citations are not reliable indicators at the level of individual articles,
while at aggregated levels, the validity is higher, at least according to how authors perceive
quality. Hence, it is important to take the level of aggregation into account when using
citations as performance measures. Despite statistically significant covariations for all quality
dimensions analyzed, the association is strongest for scientific importance.
ACKNOWLEDGMENTS
We are thankful to the R-QUEST team for their input and comments on the questionnaire and a
previous draft of the paper. We would also like to thank three anonymous reviewers for their
valuable comments on earlier drafts of the manuscript. Last but not least, we would like to
thank the many researchers who took the time to fill in the questionnaire.
AUTHOR CONTRIBUTIONS
Dag W. Aksnes: Conceptualization, Data curation, Investigation, Methodology, Project
administration, Supervision, Validation, Writing – original draft, Writing – review & editing. Lone
Wanderås Fossum: Data curation, Investigation, Software, Writing – review & editing. Fredrik
Niclas Piro: Conceptualization, Formal analysis, Investigation, Methodology, Visualization,
Writing – original draft, Writing – review & editing.
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
This research was funded by the Norges Forskningsråd under grant number 256223 (the
R-QUEST Centre).
DATA AVAILABILITY
Data are not available. The participants of this study did not give written consent for their data
to be shared publicly. Bibliographic record data cannot be released due to copyright/license
restrictions.
REFERENCES
Aksnes, D. W. (2006). Citation rates and perceptions of scientific contribution. Journal of the American Society for Information Science and Technology, 57(2), 169–185. https://doi.org/10.1002/asi.20262
Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. SAGE Open, 9(1), 1–17. https://doi.org/10.1177/2158244019829575
Aksnes, D. W., & Rip, A. (2009). Researchers’ perceptions of citations. Research Policy, 38(6), 895–905. https://doi.org/10.1016/j.respol.2009.02.001
Aksnes, D. W., & Sivertsen, G. (2019). A criteria-based assessment of the coverage of Scopus and Web of Science. Journal of Data and Information Science, 4(1), 1–21. https://doi.org/10.2478/jdis-2019-0001
Baccini, A., Barabesi, L., & De Nicolao, G. (2020). On the agreement between bibliometrics and peer review: Evidence from the Italian research assessment exercises. PLOS ONE, 15(11), e0242520. https://doi.org/10.1371/journal.pone.0242520, PubMed: 33206715
Bianco, M., Gras, N., & Sutz, J. (2016). Academic evaluation: Universal instrument? Tool for development? Minerva, 54(4), 399–421. https://doi.org/10.1007/s11024-016-9306-9
Borchardt, R., Moran, C., Cantrill, S., Chemjobber, Oh, S. A., & Hartings, M. R. (2018). Perception of the importance of chemistry research papers and comparison to citation rates. PLOS ONE, 13(3), e0194903. https://doi.org/10.1371/journal.pone.0194903, PubMed: 29590216
Bornmann, L., & Daniel, H. D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80. https://doi.org/10.1108/00220410810844150
Caon, M., Trapp, J., & Baldock, C. (2020). Citations are a good way to determine the quality of research. Physical and Engineering Sciences in Medicine, 43(4), 1145–1148. https://doi.org/10.1007/s13246-020-00941-9, PubMed: 33165822
Case, D. O., & Higgins, G. M. (2000). How can we investigate citation behavior? A study of reasons for citing literature in communication. Journal of the American Society for Information Science, 51(7), 635–645. https://doi.org/10.1002/(SICI)1097-4571(2000)51:7<635::AID-ASI6>3.0.CO;2-H
Cole, J., & Cole, S. (1971). Measuring the quality of sociological research: Problems in the use of the Science Citation Index. American Sociologist, 6, 23–29.
Coupe, T. (2013). Peer review versus citations—An analysis of best paper prizes. Research Policy, 42(1), 295–301. https://doi.org/10.1016/j.respol.2012.05.004
Dirk, L. (1999). A measure of originality: The elements of science. Social Studies of Science, 29(5), 765–776. https://doi.org/10.1177/030631299029005004
Donner, P. (2017). Document type assignment accuracy in the journal citation index data of Web of Science. Scientometrics, 113(1), 219–236. https://doi.org/10.1007/s11192-017-2483-y
Goodwin, L. D., & Leech, N. L. (2006). Understanding correlation: Factors that affect the size of r. Journal of Experimental Education, 74(3), 249–266. https://doi.org/10.3200/JEXE.74.3.249-266
Harzing, A.-W. (2018). Running the REF on a rainy Sunday afternoon: Can we exchange peer review for metrics? In R. Costas, T. Franssen, & A. Yegros-Yegros (Eds.), Proceedings of the 23rd International Conference on Science and Technology Indicators (pp. 339–345). Centre for Science and Technology Studies (CWTS), Leiden.
HEFCE. (2015). The metric tide: Correlation analysis of REF2014 scores and metrics (Supplementary report II to the independent review of the role of metrics in research assessment and management). https://responsiblemetrics.org/the-metric-tide/
Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429–431. https://doi.org/10.1038/520429a, PubMed: 25903611
Ioannidis, J. P. A., Boyack, K. W., Small, H., Sorensen, A. A., & Klavans, R. (2014). Bibliometrics: Is your most cited work your best? Nature, 514(7524), 561–562. https://doi.org/10.1038/514561a, PubMed: 25355346
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. https://doi.org/10.1037/0022-3514.77.6.1121, PubMed: 10626367
Lamont, M. (2009). How professors think: Inside the curious world of academic judgment. Cambridge, MA: Harvard University Press. https://doi.org/10.4159/9780674054158
Langfeldt, L., Nedeva, M., Sorlin, S., & Thomas, D. A. (2020). Co-existing notions of research quality: A framework to study context-specific understandings of good research. Minerva, 58(1), 115–137. https://doi.org/10.1007/s11024-019-09385-2
Langfeldt, L., Reymert, I., & Aksnes, D. W. (2021). The role of metrics in peer assessments. Research Evaluation, 30(1), 112–126. https://doi.org/10.1093/reseval/rvaa032
Lee, C. J., Sugimoto, C. R., Zhang, G., & Cronin, B. (2013). Bias in peer review. Journal of the American Society for Information Science and Technology, 64(1), 2–17. https://doi.org/10.1002/asi.22784
Marx, W., & Bornmann, L. (2015). On the causes of subject-specific citation rates in Web of Science. Scientometrics, 102(2), 1823–1827. https://doi.org/10.1007/s11192-014-1499-9
Mendoza, M. (2021). Differences in citation patterns across areas, article types and age groups of researchers. Publications, 9(4), 47. https://doi.org/10.3390/publications9040047
Miranda, R., & Garcia-Carpintero, E. (2018). Overcitation and overrepresentation of review papers in the most cited papers. Journal of Informetrics, 12(4), 1015–1030. https://doi.org/10.1016/j.joi.2018.08.006
Moed, H. F. (2005). Citation analysis in research evaluation. Berlin: Springer.
Nygaard, L. P., Aksnes, D. W., & Piro, F. N. (2022). Identifying gender disparities in research performance: The importance of comparing apples with apples. Higher Education, 84, 1127–1142. https://doi.org/10.1007/s10734-022-00820-0
Ochsner, M., Hug, S., & Galleron, I. (2017). The future of research assessment in the humanities: Bottom-up assessment procedures. Palgrave Communications, 3, 17020. https://doi.org/10.1057/palcomms.2017.20
Patterson, M. S., & Harris, S. (2009). The relationship between reviewers’ quality-scores and number of citations for papers published in the journal Physics in Medicine and Biology from 2003–2005. Scientometrics, 80(2), 343–349. https://doi.org/10.1007/s11192-008-2064-1
Polanyi, M. (1962). The republic of science: Its political and economic theory. Minerva, 1, 54–73. https://doi.org/10.1007/BF01101453
Porter, A. L., Chubin, D. E., & Xiao-Yin, J. (1988). Citations and scientific progress: Comparing bibliometric measures with scientist judgments. Scientometrics, 13(3–4), 103–124. https://doi.org/10.1007/BF02017178
Savov, P., Jatowt, A., & Nielek, R. (2020). Identifying breakthrough scientific papers. Information Processing & Management, 57(2), 102168. https://doi.org/10.1016/j.ipm.2019.102168
Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43(9), 628–638. https://doi.org/10.1002/(SICI)1097-4571(199210)43:9<628::AID-ASI5>3.0.CO;2-0
Shibayama, S., & Wang, J. (2020). Measuring originality in science. Scientometrics, 122(1), 409–427. https://doi.org/10.1007/s11192-019-03263-0
Sivertsen, G. (2018). The Norwegian model in Norway. Journal of Data and Information Science, 3(4), 3–19. https://doi.org/10.2478/jdis-2018-0017
Small, H. (2018). Characterizing highly cited method and non-method papers using citation contexts: The role of uncertainty. Journal of Informetrics, 12(2), 461–480. https://doi.org/10.1016/j.joi.2018.03.007
Smolinsky, L., Sage, D. S., Lercher, A. J., & Cao, A. (2021). Citations versus expert opinions: Citation analysis of featured reviews of the American Mathematical Society. Scientometrics, 126(5), 3853–3870. https://doi.org/10.1007/s11192-021-03894-2
Tahamtan, I., & Bornmann, L. (2019). What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018. Scientometrics, 121(3), 1635–1684. https://doi.org/10.1007/s11192-019-03243-4
Taylor, R. (1990). Interpretation of the correlation coefficient: A basic review. Journal of Diagnostic Medical Sonography, 6(1), 35–39. https://doi.org/10.1177/875647939000600106
Traag, V. A., & Waltman, L. (2019). Systematic analysis of agreement between metrics and peer review in the UK REF. Palgrave Communications, 5, 29. https://doi.org/10.1057/s41599-019-0233-x
Waltman, L. (2012). An empirical analysis of the use of alphabetical authorship in scientific publishing. Journal of Informetrics, 6(4), 700–711. https://doi.org/10.1016/j.joi.2012.07.008
Waltman, L., & Costas, R. (2014). F1000 Recommendations as a potential new data source for research evaluation: A comparison with citations. Journal of the Association for Information Science and Technology, 65(3), 433–445. https://doi.org/10.1002/asi.23040
Waltman, L., & Schreiber, M. (2013). On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2), 372–379. https://doi.org/10.1002/asi.22775
Waltman, L., & van Eck, N. J. (2013). A systematic empirical comparison of different approaches for normalizing citation impact indicators. Journal of Informetrics, 7(4), 833–849. https://doi.org/10.1016/j.joi.2013.08.002
Weinberg, A. M. (1963). Criteria for scientific choice. Minerva, 1(2), 159–171. https://doi.org/10.1007/BF01096248
Wilsdon, J., Allen, L., Belfiore, E., Campbell, P., Curry, S., … Johnson, B. (2015). The metric tide: Report of the independent review of the role of metrics in research assessment and management. HEFCE. https://responsiblemetrics.org/the-metric-tide/. https://doi.org/10.4135/9781473978782