COMMENTARY

Gender issues in fundamental physics: Strumia’s
bibliometric analysis fails to account
for key confounders and confuses
correlation with causation

An open access journal

Philip Ball1, T. Benjamin Britton2, Erin Hengel3, Philip Moriarty4, Rachel A. Oliver5,
Gina Rippon6, Angela Saini7, and Jessica Wade8

1Freelance Writer, London
2Department of Materials, Imperial College London, London, SW7 2AZ
3Department of Economics, University of Liverpool Management School, Liverpool, L69 7ZH
4School of Physics & Astronomy, University of Nottingham, Nottingham NG7 2RD, UK
5Department of Materials Science, University of Cambridge, 27 Charles Babbage Road, Cambridge CB3 0FS
6Professor Emeritus of Cognitive NeuroImaging, Brain Centre, Aston University, Birmingham B4 7ET
7Science Journalist, London
8Department of Physics, Imperial College London South Kensington Campus, London SW7 2AZ

1. INTRODUCTION

Alessandro Strumia recently published a survey of gender differences in publications and citations
in high-energy physics (HEP). In addition to providing full access to the data, code, and
methodology, Strumia (2021) systematically describes and accounts for gender differences in
HEP citation networks. His analysis points both to ongoing difficulties in attracting women to
HEP and an encouraging—though slow—trend in improvement.

Unfortunately, however, the time and effort that Strumia (2021) devoted to collating and
quantifying the data are not matched by a similar rigor in interpreting the results. To support
his conclusions, he selectively cites available literature and fails to adequately adjust for a range
of confounding factors. For example, his analyses do not consider how unobserved factors (for
example, a tendency to overcite well-known authors) drive a wedge between quality and
citations and correlate with author gender. He also fails to take into account many structural
and nonstructural factors—including, but not limited to, direct discrimination and the expecta-
tions that women form (and actions they take) in response to it—that undoubtedly lead to gender
differences in productivity.

We therefore believe that a number of Strumia’s conclusions are not supported by his
analysis. Indeed, we reanalyze a subsample of solo-authored papers from his data, adjusting
for year and journal of publication, authors’ research age and their lifetime “fame.” Our reanal-
ysis suggests that female-authored papers are actually cited more than male-authored papers.
This finding is inconsistent with the “greater male variability” hypothesis that Strumia (2021)
proposes to explain many of his results.

In the conclusion to his paper, Strumia states that “… dealing with complex systems, any
simple interpretation can easily be incomplete …”. We agree entirely. Strumia’s simple—and,
more importantly, simplistic—analysis and interpretation are far from complete.

Citation: Ball, P., Britton, T. B., Hengel,
E., Moriarty, P., Oliver, R. A., Rippon,
G., Saini, A., & Wade, J. (2021). Gender
issues in fundamental physics:
Strumia’s bibliometric analysis fails to
account for key confounders and
confuses correlation with causation.
Quantitative Science Studies, 2(1),
263–272. https://doi.org/10.1162
/qss_a_00117

DOI:
https://doi.org/10.1162/qss_a_00117

Corresponding Authors:
Erin Hengel
erin.hengel@liverpool.ac.uk

Philip Moriarty
philip.moriarty@nottingham.ac.uk

Copyright: © 2021 Philip Ball, T.
Benjamin Britton, Erin Hengel, Philip
Moriarty, Rachel A. Oliver, Gina
Rippon, Angela Saini, and Jessica
Wade. Published under a Creative
Commons Attribution 4.0 International
(CC BY 4.0) license.

The MIT Press

Gender issues in fundamental physics: A critique of Strumia’s analysis

2. BIASED LITERATURE REVIEW
Strumia (2021) notes that there is a “vast literature” dealing with gender differences in STEM
subjects. Scientific analyses of gender differences should represent this literature in an even-
handed and unbiased manner; as Del Giudice, Puts et al. (2019) highlight, “An honest, sophis-
ticated public debate on sex differences demands a broad perspective with an appreciation for
nuance and full engagement with all sides of the question.” That appreciation for nuance and full
engagement is not present in Strumia (2021). For example:

• In the introduction to his paper, Strumia asserts that “No significant biases have been
found in examined real grant evaluations [Ceci et al., 2014; Ley & Hamilton, 2008;
Marsh, Jayasinghe, & Bond, 2011; Mutz, Bornmann, & Daniel, 2012] and referee reports
of journals [Borsuk, Aarssen et al., 2009; Ceci et al., 2014; Edwards, Schroeder, &
Dugdale, 2018].” Yet a large body of literature, which he fails to cite, reaches the opposite
conclusion. (See, for example, Burns, Straus et al., 2019; Card, DellaVigna et al.,
2020; Dworkin, Linn et al., 2020; Fox & Paine, 2019; Hengel, 2017; Helmer, Schottdorf
et al., 2017; Royal Society of Chemistry, 2019; Steinberg, Skae, & Sampson, 2018;
Witteman, Hendricks et al., 2019 and references therein.) Strumia (2021)’s lack of
balance in citing relevant work is misleading and arguably disingenuous. An objective
analysis of gender differences should aim to be neither.

• Strumia (2021) also notes that theoretical modeling of citations is “affected by questionable
systematic issues.” We assume the use of “questionable” here is meant to capture
relationships that are difficult to identify and quantify (e.g., due to a lack of suitable controls).
We agree entirely that all efforts to study citations must be mindful of these limitations.
Again, however, Strumia selectively cites just one paper that “[tries] to correct
for some social factors,” namely Caplar, Tacchella, and Birrer (2017). Other studies analyzing
more restricted samples have come to the opposite conclusion (e.g., Card et al.,
2020; Hengel & Moon, 2020).

We do not, of course, expect Strumia (2021) to review the entire research on gender differences
in STEM. We believe, however, that a fairer representation of the literature is warranted,
especially considering the contentious nature of the topic.

3. CONFOUNDERS AND STATISTICAL REANALYSIS

In addition to selectively citing the literature, Strumia (2021) fails to consider the potential impact
of a broad range of confounding factors on gender differences in citations and the publication
process. To help highlight the problems this introduces, one of the authors (EH) reanalyzed a
subset of Strumia’s data.

The reanalyzed data contain 5,599 solo-authored articles (5,386 authored by a man and 213
authored by a woman) published in five high-profile physics journals from 2010 to 2016 (inclusive).
The selection criteria were designed to address the influence of key confounders in
Strumia’s data. First, we restricted the sample to solo-authored articles to account for the fact that
male physicists are more likely to be senior authors on papers involving much larger research
teams1. Second, restricting the data to articles published in a small set of well-known journals

1 In contrast, Strumia (2021), Hengel (2017), and Hengel and Moon (2020) assume that each coauthor on a
paper contributes equally to it. This relationship, however, is unlikely to be linear when there are a very large
number of coauthors, as there often are in fields such as high-energy experimental physics.

Quantitative Science Studies

made it easier to confirm the quality of gender assignment in Strumia’s data2. By including
journal-year fixed effects, we also better account for differing citation patterns between fields3.
Third, younger articles have had less time to accrue citations and older articles are disproportionately
written by male authors. For that reason, we also restrict our analysis to newer articles (i.e., those
published between 2010 and 2016) as well as controlling for journal-year fixed effects.
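
The role of the journal-year fixed effects can be illustrated with a minimal sketch. It is not the replication code: the function names `raw_gap` and `within_gap`, the tuple layout, and the toy numbers are all hypothetical, and the other controls (year of first publication, lifetime publications) are left out. The sketch demeans citations and the female dummy within each journal-year cell before comparing them, which is what absorbing journal-year fixed effects amounts to when there is a single regressor.

```python
from collections import defaultdict


def raw_gap(rows):
    """Unadjusted female-minus-male difference in mean citations."""
    female = [y for _, _, f, y in rows if f == 1]
    male = [y for _, _, f, y in rows if f == 0]
    return sum(female) / len(female) - sum(male) / len(male)


def within_gap(rows):
    """Slope on the female dummy after absorbing journal-year fixed effects.

    rows: iterable of (journal, year, female, y) tuples, where `female`
    is 0 or 1 and `y` is the (possibly transformed) citation count.
    """
    cells = defaultdict(list)
    for journal, year, female, y in rows:
        cells[(journal, year)].append((female, y))

    sxy = sxx = 0.0
    for cell in cells.values():
        mean_f = sum(f for f, _ in cell) / len(cell)
        mean_y = sum(y for _, y in cell) / len(cell)
        for f, y in cell:
            # Deviations from the cell mean remove anything constant
            # within a journal-year, such as field or article age.
            sxy += (f - mean_f) * (y - mean_y)
            sxx += (f - mean_f) ** 2
    if sxx == 0:
        raise ValueError("no within-cell variation in the female dummy")
    return sxy / sxx


# Hypothetical toy data, NOT taken from Strumia (2021).
toy = [
    ("PRD", 2010, 0, 2.0), ("PRD", 2010, 0, 3.0), ("PRD", 2010, 1, 3.0),
    ("JHEP", 2012, 0, 5.0), ("JHEP", 2012, 1, 6.0), ("JHEP", 2012, 1, 5.0),
]
unadjusted = raw_gap(toy)
adjusted = within_gap(toy)
print(round(unadjusted, 3), round(adjusted, 3))
```

In this invented example the unadjusted gap is larger than the within-cell gap, because the hypothetical female-authored papers sit mostly in the higher-citation journal-year cell. Real data could move in either direction.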

An additional difference between Strumia’s study and our own is that we analyze data at the
article level instead of aggregating citations over authors. This allows us to better address the
following issues discussed in Hengel and Moon (2020):

1. Male authors disproportionately cluster at the very top and very bottom of the citation dis-
tribution, but raw citation counts are truncated from below at zero and unbounded from
above. This generates a nonlinear mapping from quality onto citations that depends on
the former’s variance. When used as a proxy for quality, average citations for male-authored
papers will, as a result, generally place too much weight on high-citation papers and not
enough weight on low-citation papers compared to the average for female-authored papers.
To deal with this issue, we transform raw citation counts with the inverse hyperbolic sine
function (asinh). We stress, however, that our results do not meaningfully change if we use
raw citation counts as the dependent variable instead.

2. Unobserved (or uncontrolled for) confounders (e.g., winning a prestigious award) boost
citations conditional on quality and disproportionately correlate with articles and authors
located in the distribution’s right tail. A related concern is that the citations that a paper
accumulates are not fixed in time. As a result, they could be influenced by the future success
or failure of a paper’s authors (i.e., even among nonsuperstar physicists, a stronger
publishing record later on probably drives citations to earlier work, all else being equal).
For evidence see, for example, Bjarnason and Sigfusdottir (2002). Both factors potentially
correlate with gender: for example, women produce fewer papers than men and are
proportionately less likely to win the Nobel Prize.
To deal with these issues, we control for the year in which an author was first published and
her total number of lifetime publications4. Our results should therefore be interpreted as
gender differences in citations between authors who began their careers around the same
time and had accumulated similar lifetime “fame” at the time citations were collected.
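
As a rough illustration of the transformation described in point 1 (a sketch, not the authors' code), the inverse hyperbolic sine is defined at zero, so uncited papers need no special handling, and for larger counts it behaves like log(2x), which compresses the right tail. The function name `asinh_citations` is hypothetical.

```python
import math


def asinh_citations(count):
    """Inverse hyperbolic sine of a raw citation count."""
    return math.asinh(count)


# Zero citations map to zero, so uncited papers are kept as they are.
uncited = asinh_citations(0)

# For larger counts the transform tracks log(2 * count) closely.
for count in (10, 100, 1000):
    difference = asinh_citations(count) - math.log(2 * count)
    print(count, round(asinh_citations(count), 4), round(difference, 6))
```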

Assuming citations are not perfectly explained by these variables5, gender differences present
in Strumia’s complete, unadjusted data should also be present in the conditioned, restricted

2 To verify gender, we manually searched (via Google) for all the previously identified women and the subset of
men with no more than one citation. Gender was identified based on pronouns or inferred from photos.
Authors classified as female about whom we found no information—or the information we did find was
ambiguous about gender—are omitted from our analysis (13 observations). As per the main text, we note that
we would rarely identify individuals as nonbinary from this gender analysis. From this analysis, we found that
21–26% of people classified as women are more reasonably classified as men. Moreover, their solo-authored
articles tended to receive a disproportionately low number of citations.

3 Strumia (2021), in contrast, adjusts for field by normalizing a paper’s citation count by the length of its own
reference list, which roughly correlates with field (Strumia, 2021; Eq. 1). (We obtain similar results using his
normalized indicator of citations.)

4 To account for age, Strumia (2021) weights each author by one-half times the inverse of the proportion of
authors of the same gender who first published in the same year he or she did (Eq. 2). For example, if 300
authors, two of whom were female, first published in 1995, then each female author would be weighted
by 75, whereas each male author would be weighted by roughly 0.5.
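
The arithmetic in this footnote can be checked with a short sketch. It follows the verbal description above (one-half times the inverse of the same-gender share of a first-publication cohort) rather than Strumia's Eq. 2 itself, and the function name `cohort_weight` is hypothetical.

```python
def cohort_weight(n_same_gender, n_cohort):
    """Half the inverse of the same-gender share of a first-publication cohort."""
    share = n_same_gender / n_cohort
    return 0.5 / share


# Cohort from the footnote: 300 authors, two of whom are female.
female_weight = cohort_weight(2, 300)
male_weight = cohort_weight(298, 300)

# Under this scheme each gender carries half of the cohort's total weight.
female_total = 2 * female_weight
male_total = 298 * male_weight
print(female_weight, round(male_weight, 3), female_total, round(male_total, 6))
```

One consequence of this weighting, visible in the totals, is that the two female authors jointly count for as much as the 298 male authors, so a single misclassified or atypical author in a small group can move the weighted results substantially.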

5 Effectively, the distribution of citations cannot collapse to a degenerate distribution after conditioning on these
variables. As evidence against this possibility, it does not appear that citations are homogenous, conditional
on, for example, journal.

Figure 1. Distribution of citations for solo-authored papers. Note. Graphs display histograms of asinh transformed (left) and raw (right) citations
for solo-authored papers by men (blue) and women (pink) published between 2010 and 2016 (inclusive) in Physical Review D,
Astrophysical Journal, Journal of High Energy Physics, Physical Review Letters, and Physics Letters B. Citations have been residualized with
respect to year-journal fixed effects, fixed effects for each author’s year of first publication, and total lifetime number of publications. Data from
Strumia (2021).

data if male physicists are, in fact, biologically more productive or produce higher quality
work than women6. In other words, if male talent is more variable, then male physicists’ work
will be (on average) higher “quality” (as defined by citation count), all else being equal. As a
result, a paper by a famous female physicist published in 2010 in Physical Review D should be
(on average) cited less than a paper by a similarly famous male physicist that was also pub-
lished in Physical Review D in 2010.

This is not what we observe. Our evidence suggests that female-authored papers receive
about 12 log points more citations than male-authored papers, conditional on covariates. This
figure is weakly statistically significant.
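
For readers unfamiliar with the unit, a gap of 12 log points corresponds to a coefficient of 0.12 on a log-like outcome. Assuming the asinh transform is close enough to a logarithm for this purpose (a reasonable approximation only for papers with more than a handful of citations), the implied proportional difference can be sketched as follows. The variable names are illustrative.

```python
import math

log_points = 12                    # gap reported in the text
coefficient = log_points / 100     # 0.12 on the asinh (log-like) scale

# Exact proportional difference implied by a log difference of 0.12.
percent_difference = (math.exp(coefficient) - 1) * 100
print(round(percent_difference, 2))
```

Under that approximation, the gap corresponds to roughly 13% more citations, slightly above the 12% that a naive reading of the log points would suggest.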

Figure 1 shows the distribution of citations among male vs. female-authored papers in this
sample for both transformed and nontransformed citations. Note that the distribution of citations
to male-authored papers closely overlaps with the distribution of citations to female-authored
papers across the entire range of the distribution.

Estimated gender differences in citations at the mean for each of the five journals are shown in
Figure 2. They consistently suggest either no statistically significant gender gap in citations or a
citation gap that favors women.

We are aware that automatic and binary classification here can be problematic (Keyes, 2018),
especially for nonbinary and transgender physicists. We have followed Strumia (2021)’s use of a
binary classification scheme, as we wish to highlight issues with the original analysis and we do
not have access to demographic information provided by the authors within the present work.
But we recognize the shortcomings of a reductive analysis to an apparent gender binary in such
bibliometric analyses and discussions (Strauss, Borges et al., 2020; Rasmussen, Maier et al.,

6 Assuming male and female talent is normally distributed with identical means, then gender differences in variability
are equivalent to gender differences in (conditional) means. Presumably, all physicists are drawn
from the top half of each distribution; thus, greater variability in men implies that average male talent is higher
than average female talent, conditional on being a physicist.
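
The step from variability to conditional means can be made explicit under the footnote's own assumptions, plus one simplifying assumption made here purely for illustration: that "top half" means truncation exactly at the common mean $\mu$. The symbols $\sigma_m$ and $\sigma_f$ (male and female standard deviations) are notation introduced for this sketch.

```latex
\[
X \sim \mathcal{N}(\mu, \sigma^2)
\quad\Longrightarrow\quad
\mathbb{E}[X \mid X > \mu] = \mu + \sigma\sqrt{\tfrac{2}{\pi}},
\]
\[
\mathbb{E}[X_m \mid X_m > \mu] - \mathbb{E}[X_f \mid X_f > \mu]
  = (\sigma_m - \sigma_f)\sqrt{\tfrac{2}{\pi}} > 0
  \quad \text{if } \sigma_m > \sigma_f .
\]
```

The first line is the mean of a half-normal distribution shifted by $\mu$; the second shows that, with equal means, any difference in conditional means comes entirely from the difference in standard deviations.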

Figure 2. Gender differences in citations across journals. Note. The first five figures display conditional mean gender differences in citations
(asinh); a positive value indicates female-authored papers are cited more than male-authored papers, conditional on included controls. They
are estimated by regressing citations (asinh) on a dummy variable equal to 1 if the author was female in the subsamples of solo-authored articles
published between 2010 and 2016 (inclusive) in Journal of High Energy Physics (JHEP), Physical Review Letters (Phys.Rev.Lett), Astrophysical
Journal (Astrophys. J.) and Physical Review D (Phys.Rev.D). The final figure is the estimated gender difference in the pooled data. Lines
represent 95% confidence intervals from standard errors clustered on an author’s year of first publication. All estimates control for fixed effects
for year or year-journal interactions, year of first publication, and total lifetime number of publications. Data from Strumia (2021).

2019) and urge journals to collect demographic data regarding authorship more routinely and
to promote the use of self-identity.

4. THE HIGHER MALE VARIABILITY HYPOTHESIS
Despite the admission in Strumia (2021) that the simple interpretation laid out therein “can easily
be incomplete,” the data are nonetheless explained in the context of the highly contentious
higher male variability (HMV) hypothesis and a biological basis of difference, together with
gendered differences in interests. Once again, there is a lack of appropriate representation
and citation of the relevant extensive literature base.

The HMV hypothesis is widely contested and debated; both Gray, Lyth et al. (2019) and
Stevens and Haidt (2017) provide systematic and even-handed discussions. Note, in particular,
the geographical variation highlighted by Gray et al. (2019) in relation to the HMV hypothesis,
which counters the claims that any observed gender differences are biological in origin:

… we find that there is significant heterogeneity between countries, and that much of this can
be quantified using variables applicable across these assessments (such as test, year, male-
female effect size, mean country size, and Global Gender Gap Indicators).

Geographical and temporal heterogeneity are consistently observed in a variety of measures of
gender disparity in STEM (see, for example, Breda, Jouini, & Napp, 2018; Kane & Mertz, 2012;
Nollenberger, Rodríguez-Planas, & Sevilla, 2016). Counterintuitively, however, the so-called
“gender equality paradox” put forward by Stoet and Geary (2018), and cited on a number of
occasions in Strumia (2021), is the claim that countries with a higher level of gender equality
tend to have less gender balance in STEM fields. We note that Stoet and Geary’s arguments
have been undermined significantly by the many deficiencies in their data analysis highlighted
by Richardson, Reiches et al. (2020) (including those that have necessitated the publication of
a corrected version of Stoet and Geary [2018]).

Strumia’s undue and unwarranted confidence in the HMV interpretation—given his admis-
sion that the data are influenced by “questionable systematic issues”—is such that the abstract
closes with the claim that the “quantitative shape” of the data can be “fitted by higher male
variability.” As highlighted by Hyde (2014),

… even if there is slightly greater male variability for some cognitive measures, this finding
is simply a description of the phenomenon. It does not address the causes of greater male
variability, which could be due to biological factors, sociocultural factors, or both.

The sociocultural factors to which Hyde refers are exceptionally difficult to model (and correct
for) yet may play an integral role, as discussed by Kalender, Marshman et al. (2019) in their
study of gendered patterns in the construction of physics identity.

Strumia (2021) also does not provide any direct evidence for the causal link he suggests
between the HMV hypothesis, biological determinism, and citation rates. Instead, there is
nothing more than an inference based on drawing a comparison with what are termed “psycho-
metric observations.” This is an entirely unjustified conflation of correlation and causation and
has no place in a rigorous interpretation of the data. Only properly controlled studies allow for a
robust distinction between correlation and causation—a fundamental tenet of all statistical
analysis. Strumia (2021) admits that those controlled studies are simply not possible.

Similarly, the homogeneity of the people/things dimension (Spelke, 2005; Thelwall, Bailey
et al., 2019) is very much overstated in Strumia (2021) in support of the argument that a lack of
interest underpins the level of participation by women in HEP. Thelwall et al. (2019), whose
work is cited by Strumia in the context of the people/things metric, devote a great deal of time
in the conclusion of their paper highlighting the deficiencies in their approach:

Thus, the people/things dimensions can only provide a partial explanation for gender differences
in topic choices across the full spectrum of academia because there are many important
exceptions … Given that the current research has not attempted to assess any cause and
effect relationships, deviations from the people/thing dimensions could also be due to other
factors within academia that deflect people from pursuing their interests, such as editorial,
departmental or funding policies.

None of this important yet difficult-to-quantify nuance is captured by the discussion in Strumia
(2021).

Another confounding factor is vocal criticism of women within academia by individuals such
as Strumia, who may well be contributing to the problem of women feeling unwelcome in
physics. As Halpern, Benbow et al. (2007), whose paper is cited in Strumia (2021), put it,

We conclude that early experience, biological factors, educational policy, and cultural con-
text affect the number of women and men who pursue advanced study in science and math
and that these effects add and interact in complex ways. There are no single or simple answers
to the complex questions about sex differences in science and mathematics.

The issues outlined above demonstrate that extreme care must be taken when arguing for causal
relationships among variables in a system where confounders are exceptionally difficult to deal
with (or, indeed, to identify in the first place). A full, appropriately controlled statistical analysis of
the cumulative influence of bias—both explicit and implicit, including differing levels of scrutiny
in the publication process (Hengel, 2017), discrimination, harassment, bullying, parental and

domestic responsibilities, access to research funding, and variations in teaching, committee, and
pastoral care workloads on the work of women in HEP (and hence on their outputs and citation
rates)—would be needed to interpret Strumia’s observations in a scientifically robust and
objective manner.

5. BIBLIOMETRICS AS A PROXY FOR SCIENTIFIC QUALITY
There is yet a broader issue surrounding Strumia’s methodology: The analysis implicitly assumes
not only a direct link between citation rates (normalized as per Eq. 1 of Strumia [2021]) and scientific
quality but, as noted above, it infers (with no direct quantitative evidence) a further link to
the HMV hypothesis. Moreover, it is assumed that all authors contribute equally to a paper. Strumia
recognizes, to some extent, that this assumption of equal contributions is problematic—“… there
is no warranty that each author contributed to each paper”—but his justification is very far from
compelling:

Despite this, the data show that the total fractionally counted bibliometric output of collab-
orations scales, on average, as their number of authors [Rossi, Strumia, & Torre (2019)],
suggesting that large collaborations form when scientifically needed and that gift authorship
does not play a large role.

A lack of gift authorship (i.e., the inclusion of an author who contributed little or nothing at all)
is a far cry from the extreme assumption of equal contributions across the entire authorship of a
paper. (This is another justification for the restriction of our analysis in the previous section to
solo-authored publications.)

In dismissing the role of sociological factors, Strumia (2021) also subjectively (and rather in-
consistently) appeals to the reader’s perception of the prestige of top authors: “A physicist might
read their names and conclude that no sociological confounder can wash away most of them.”
MacRoberts and MacRoberts (2010) describe the process of biased citing, which involves a
number of considerations including the “halo effect” (exemplified by the preceding quote from
Strumia [2021]), in-house citations, obliteration, and the Matthew effect. Each of these effects,
and others (Cowley, 2015) can lead to disproportionate citations of the primary literature. There
is also very good evidence that seminal papers can often be cited but not actually read (Ball,
2002; Simkin & Roychowdhury, 2003).

More broadly, the use of citations as a measure of scientific quality is highly questionable and
has been the subject of significant debate for decades (see Leydesdorff, Bornmann et al. (2016)
for a review). We note that Strumia highlights that “traditional metrics (such as citation counts,
h-index, paper counts) now fail to provide reasonable proxies for scientific merit in fundamental
physics.” His solution—the introduction of what he terms an “individual citation” metric, Eq. 1
of Strumia (2021)—is not a compelling strategy to isolate scientific quality from citation
numbers. As Aksnes, Langfeldt, and Wouters (2019) highlight,

Research quality is a multidimensional concept, where plausibility/soundness, originality,
scientific value, and societal value commonly are perceived as key characteristics … cita-
tions reflect aspects related to scientific impact and relevance, although with important lim-
itations … there is no evidence that citations reflect other key dimensions of research quality

Strumia’s reasoning regarding the efficacy of citation metrics as a measure of scientific quality is
also notably circular: “… bibliometric indicators, that can be used as reliable proxies for scien-
tific merit, being significantly correlated to human evaluations such as scientific prizes.” The

attempt to draw a distinction here between bibliometric indicators and “human evaluations
such as scientific prizes” is telling. Bibliometrics are not a wholly objective quantitative proxy
for scientific merit—they are just as much a human evaluation as is a scientific prize. The award
of scientific prizes very often involves detailed consideration of citation rates, and the prestige
that stems from a scientific prize will in turn drive more citations for a particular scientist. It is
naïve to imagine that bibliometrics are an independent measure of scientific merit whose
usage is somehow justified by their correlation with the award of “human evaluations such
as scientific prizes.”

6. CONCLUSIONS

The analysis presented in Strumia (2021) suffers from a number of deficiencies that severely
undermine the inferences drawn and the conclusions reached therein. In particular, the contri-
butions of a wide variety of potential confounding factors are not only unaddressed but are dis-
missed with no justification. This unwarranted dismissal, coupled with a fundamental confusion
of correlation and causation when drawing inferences, is especially problematic in a study that
claims to provide an unbiased assessment of the role of gender in citation patterns. Instead, the
analysis throughout is very far from neutral or disinterested; the data are interpreted within an
ideologically motivated context—as is clear from both the paper itself and previous work by the
same author, namely Strumia (2018) and Strumia (2019).

In that broader context, Strumia’s analysis is substantively problematic. The views he es-
pouses have had an impact not only across the physics and STEM communities but have been
widely reported across the international media (including Conradi, 2019; Giuffrida and Busby,
2018; Nicholson, 2018; Young, 2019). Physics as a discipline is broadly acknowledged to strug-
gle with both the recruitment and retention of women (Eddy & Brownell, 2016; Kalender et al.,
2019; Porter & Ivie, 2019). The widespread dissemination of derogatory and unsubstantiated
views about women’s ability in this sphere is not going to overcome this problem. It will be
off-putting to talented women who might be considering a career in the sciences and contributes
to the hostile environment endured by many women (and other minorities) currently working in
physics. There is an increasing body of research that suggests that an individual’s performance in
an academic context can be harmed by an awareness that others’ perception of their work might
be distorted by stereotypes (see, for example, Casad, Hale, & Wachs, 2017; Kalender et al.,
2019; Shapiro & Williams, 2012). The associated cultural implications can result in minoritized
individuals not contributing or disengaging from their academic communities.

Ultimately, the work presented in Strumia (2021) is not merely a flawed, biased, and ideologically
motivated analysis. It is also likely to be actively harmful to the progress of women in
physics, to the detriment not only of many individuals but of our entire community.

ACKNOWLEDGMENTS

We thank Beck Strauss for helpful discussions regarding issues associated with misclassification
of gender.

FUNDING INFORMATION

TBB acknowledges funding of his Research Fellowship from the Royal Academy of Engineering.
PM similarly acknowledges the Engineering and Physical Sciences Research Council (EPSRC)
for the award of an established career fellowship (EP/T033568/1).

DATA AVAILABILITY

Data and replication files for all analyses presented in this paper are available on GitHub (https://
github.com/erinhengel/strumia-qss).

REFERENCES

Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, cita-
tion indicators, and research quality: An overview of basic con-
cepts and theories. SAGE Open, 9(1), 1–17. DOI: https://doi.org
/10.1177/2158244019829575

Ball, P. (2002). Paper trail reveals references go unread by citing
authors. Nature, 420(6916), 594. DOI: https://doi.org/10.1038
/420594a, PMID: 12478249

Bjarnason, T., & Sigfusdottir, I. D. (2002). Nordic impact: Article
productivity and citation patterns in sixteen Nordic sociology
departments. Acta Sociologica, 45(4), 253–267. DOI: https://
doi.org/10.1177/000169930204500401

Breda, T., Jouini, E. & Napp, C. (2018). Societal inequalities amplify
gender gaps in math. Science, 359(6381), 1219–1220. DOI:
https://doi.org/10.1126/science.aar2307, PMID: 29590066

Burns, K. E. A., Straus, S. E., Liu, K., Rizvi, L., & Guyatt, G. (2019).
Gender differences in grant and personnel award funding rates at
the Canadian Institutes of Health Research based on research
content area: A retrospective analysis. PLOS Medicine, 16(10),
e1002935. DOI: https://doi.org/10.1371/journal.pmed.1002935,
PMID: 31613898, PMCID: PMC6793847

Caplar, N., Tacchella, S., & Birrer, S. (2017). Quantitative evaluation
of gender bias in astronomical publications from citation counts.
Nature Astronomy, 1, 0141. DOI: https://doi.org/10.1038
/s41550-017-0141

Card, D., DellaVigna, S., Funk, P., & Iriberri, N. (2020). Are referees
and editors in economics gender neutral? Quarterly Journal of
Economics, 135(1), 269–327. DOI: https://doi.org/10.1093/qje
/qjz035

Casad, B. J., Hale, P., & Wachs, F. L. (2017). Stereotype threat
among girls: Differences by gender identity and math education
context. Psychology of Women Quarterly, 41(4), 513–529. DOI:
https://doi.org/10.1177/0361684317711412

Conradi, P. (2019). Alessandro Strumia: The data doesn’t lie—
women don’t like physics. The Times, March 24. https://www
.thetimes.co.uk/article/alessandro-strumia-the-data-doesnt-lie
-women-dont-like-physics-jl0bpfd9t (accessed March 18, 2020).

Cowley, S. J. (2015). How peer-review constrains cognition: On the
frontline in the knowledge sector. Frontiers in Psychology, 6,
1706. DOI: https://doi.org/10.3389/fpsyg.2015.01706, PMID:
26579064, PMCID: PMC4630500

Del Giudice, M., Puts, D. A., Geary, D. C., & Schmitt, D. P. (2019).
Sex differences in brain and behavior: Eight counterpoints.
Psychology Today, April 8. https://www.psychologytoday.com
/intl/blog/sexual-personalities/201904/sex-differences-in-brain
-and-behavior-eight-counterpoints?amp (accessed March 18,
2020).

Dworkin, J. D., Linn, K. A., Teich, E. G., Zurn, P., Shinohara, R. T.,
& Bassett, D. S. (2020). The extent and drivers of gender imbal-
ance in neuroscience reference lists. Nature Neuroscience, 23,
918–926. DOI: https://doi.org/10.1038/s41593-020-0658-y,
PMID: 32561883

Eddy, S. L., & Brownell, S. E. (2016). Beneath the numbers: A re-
view of gender disparities in undergraduate education across sci-
ence, technology, engineering, and math disciplines. Physical
Review Physics Education Research, 12(2), 020106. DOI:
https://doi.org/10.1103/PhysRevPhysEducRes.12.020106

Fox, C. W., & Paine, C. E. T. (2019). Gender differences in peer
review outcomes and manuscript impact at six journals of ecology
and evolution. Ecology and Evolution, 9(6), 3599–3619. DOI:
https://doi.org/10.1002/ece3.4993, PMID: 30962913, PMCID:
PMC6434606

Giuffrida, A., & Busby, M. (2018). ‘Physics was built by men’: Cern
suspends scientist over remarks. The Guardian, October 1. http://
www.theguardian.com/science/2018/oct/01/physics-was-built
-by-men-cern-scientist-alessandro-strumia-remark-sparks-fury
(accessed March 18, 2020).

Gray, H., Lyth, A., McKenna, C., Stothard, S., Tymms, P., &
Copping, L. (2019). Sex differences in variability across nations in
reading, mathematics and science: A meta-analytic extension of
Baye and Monseur (2016). Large-Scale Assessments in Education,
7(1), 2. DOI: https://doi.org/10.1186/s40536-019-0070-9

Halpern, D. F., Benbow, C. P., Geary, D. C., Gur, R. C., Hyde, J. S.,
& Gernsbacher, M. A. (2007). The science of sex differences in
science and mathematics. Psychological Science in the Public
Interest, 8(1), 1–51. DOI: https://doi.org/10.1111/j.1529
-1006.2007.00032.x, PMID: 25530726, PMCID: PMC4270278

Helmer, M., Schottdorf, M., Neef, A., & Battaglia, D. (2017). Gender
bias in scholarly peer review. eLife, 6, e21718. DOI: https://doi
.org/10.7554/eLife.21718, PMID: 28322725, PMCID:
PMC5360442

Hengel, E. (2017). Publishing while female. Are women held to
higher standards? Evidence from peer review. Cambridge
Working Papers in Economics 1753. Faculty of Economics,
University of Cambridge. http://www.erinhengel.com/research
/publishing_female.pdf.

Hengel, E., & Moon, E. (2020). Gender and quality at top economics
journals. Working Paper in Economics 202001. University of
Liverpool Management School. https://erinhengel.github.io
/Gender-Quality/quality.pdf.

Hyde, J. S. (2014). Gender similarities and differences. Annual
Review of Psychology, 65, 373–398. DOI: https://doi.org/10
.1146/annurev-psych-010213-115057, PMID: 23808917

Kalender, Z. Y., Marshman, E., Schunn, C. D., Nokes-Malach, T. J., &
Singh, C. (2019). Gendered patterns in the construction of physics
identity from motivational factors. Physical Review Physics
Education Research, 15(2), 020119. DOI: https://doi.org/10
.1103/PhysRevPhysEducRes.15.020119

Kane, J. M., & Mertz, J. E. (2012). Debunking myths about gender
and mathematics performance. Notices of the AMS, 59(1), 10–21.
DOI: https://doi.org/10.1090/noti790

Keyes, O. (2018). The misgendering machines: Trans/HCI implica-
tions of automatic gender recognition. Proceedings of the ACM
on Human-Computer Interaction, Vol. 2, No. CSCW, Article 88.
DOI: https://doi.org/10.1145/3274357

Leydesdorff, L., Bornmann, L., Comins, J. A., & Milojević, S. (2016).
Citations: Indicators of quality? The impact fallacy. Frontiers in
Research Metrics and Analytics, 1, 787. DOI: https://doi.org/10
.3389/frma.2016.00001

MacRoberts, M. H., & MacRoberts, B. R. (2010). Problems of cita-
tion analysis: A study of uncited and seldom-cited influences.
Journal of the American Society for Information Science and
Technology, 61(1), 1–12. DOI: https://doi.org/10.1002/asi.21228

Nicholson, L. (2018). Physics does have a women problem—and
it’s not rocket science to work out why. Daily Telegraph,
October 1. https://www.telegraph.co.uk/women/work/physics
-does-have-women-problem-not-rocket-science-work/ (accessed
March 18, 2020).

Nollenberger, N., Rodríguez-Planas, N., & Sevilla, A. (2016). The
math gender gap: The role of culture. American Economic
Review, 106(5), 257–261. DOI: https://doi.org/10.1257/aer
.p20161121

Porter, A. M., & Ivie, R. (2019). Women in physics and astronomy,
2019. American Institute of Physics. https://www.aip.org/sites
/default/files/statistics/women/Women%20in%20Physics%20
and%20Astronomy%202019.1.pdf.

Rasmussen, K. C., Maier, E., Strauss, B. E., Durbin, M., Riesbeck, L.,
… Erena, A. (2019). The nonbinary fraction: Looking towards the
future of gender equity in astronomy. Bulletin of the AAS, 51(7).

Richardson, S. S., Reiches, M. W., Bruch, J., Boulicault, M., Noll,
N. E., & Shattuck-Heidorn, H. (2020). Is there a gender-equality
paradox in science, technology, engineering, and math (STEM)?
Commentary on the study by Stoet and Geary (2018).
Psychological Science, 31(3), 338–341. DOI: https://doi.org/10
.1177/0956797619872762, PMID: 32043923

Royal Society of Chemistry. (2019). Is publishing in the chemical
sciences gender biased? Royal Society of Chemistry. https://www
.rsc.org/globalassets/04-campaigning-outreach/campaigning
/gender-bias/gender-bias-report-final.pdf.

Shapiro, J. R., & Williams, A. M. (2012). The role of stereotype threats
in undermining girls’ and women’s performance and interest in
STEM fields. Sex Roles, 66(3–4), 175–183. DOI: https://doi.org
/10.1007/s11199-011-0051-0

Simkin, M. V., & Roychowdhury, V. P. (2003). Read before you cite!
Complex Systems, 14(3), 269–274.

Spelke, E. S. (2005). Sex differences in intrinsic aptitude for math-
ematics and science? A critical review. American Psychologist,
60(9), 950–958. DOI: https://doi.org/10.1037/0003-066X
.60.9.950, PMID: 16366817

Steinberg, J. J., Skae, C., & Sampson, B. (2018). Gender gap, dis-
parity, and inequality in peer review. The Lancet, 391(10140),
2602–2603. DOI: https://doi.org/10.1016/S0140-6736(18)
31141-3

Stevens, S., & Haidt, J. (2017). The Google memo: What does the
research say about gender differences? Heterodox: The Blog,
August 10. https://heterodoxacademy.org/the-google-memo
-what-does-the-research-say-about-gender-differences/ (accessed
March 18, 2020).

Stoet, G., & Geary, D. C. (2018). The gender-equality paradox in
science, technology, engineering, and mathematics education.
Psychological Science, 29(4), 581–593. DOI: https://doi.org
/10.1177/0956797617741719, PMID: 29442575

Strauss, B. E., Borges, S. R., Faridani, T., Grier, J. A., Kiihne, A., …
Zamloot, V. (2020). Nonbinary systems: Looking towards the
future of gender equity in planetary science. arXiv, arXiv:2009
.08247.

Strumia, A. (2018). Experimental test of a new global discrete sym-
metry. https://alessandrostrumiahome.files.wordpress.com/2019
/03/strumiagenderslidescern.pdf.

Strumia, A. (2019). Is there a gender bias in physics? https://
alessandrostrumiahome.files.wordpress.com/2019/10
/genderslidesleiden19.pdf.

Strumia, A. (2021). Gender issues in fundamental physics: A biblio-
metric analysis. Quantitative Science Studies (this issue).

Thelwall, M., Bailey, C., Tobin, C., & Bradshaw, N.-A. (2019).
Gender differences in research areas, methods and topics: Can
people and thing orientations explain the results? Journal of
Informetrics, 13(1), 149–169. DOI: https://doi.org/10.1016/j.joi
.2018.12.002

Witteman, H. O., Hendricks, M., Straus, S., & Tannenbaum, C.
(2019). Are gender gaps due to evaluations of the applicant or
the science? A natural experiment at a national funding agency.
The Lancet, 393(10171), 531–540. DOI: https://doi.org/10.1016
/S0140-6736(18)32611-4

Young, C. (2019). Alessandro Strumia: Another politically-correct
witch-hunt, or a more complicated story? Quillette, April 22.
https://quillette.com/2019/04/22/alessandro-strumia-another
-politically-correct-witch-hunt-or-a-more-complicated-story/
(accessed March 18, 2020).