On the Relationships Between the Grammatical Genders of Inanimate
Nouns and Their Co-Occurring Adjectives and Verbs

Adina Williams∗,1  Ryan Cotterell∗,2,3  Lawrence Wolf-Sonkin4  Damián Blasi5  Hanna Wallach6

1Facebook AI Research  2ETH Zürich  3University of Cambridge
4Johns Hopkins University  5Universität Zürich  6Microsoft Research

adinawilliams@fb.com  ryan.cotterell@inf.ethz.ch  lawrencews@jhu.edu
damian.blasi@uzh.ch  wallach@microsoft.com

Abstract

We use large-scale corpora in six different
gendered languages, along with tools from
NLP and information theory, to test whether
there is a relationship between the grammatical
genders of inanimate nouns and the adjectives
used to describe those nouns. For all six
languages, we find that there is a statistically
significant relationship. We also find that
there are statistically significant relationships
between the grammatical genders of inanimate
nouns and the verbs that take those nouns
as direct objects, as indirect objects, and as
subjects. We defer deeper investigation of
these relationships for future work.

1

Introduction

In many languages, nouns possess grammatical
genders. When a noun refers to an animate
object, its grammatical gender typically reflects
the biological sex or gender identity of that
object (Zubin and Köpcke, 1986; Corbett, 1991;
Kramer, 2014). For example, in German, the word
for a boss is grammatically feminine when it
refers to a woman, but grammatically masculine
when it refers to a man (Chefin and Chef,
respectively). But inanimate nouns (i.e., nouns that
refer to inanimate objects) also possess grammatical
genders. Any German speaker will tell you
that the word for a bridge, Brücke, is grammatically
feminine, even though bridges have neither
biological sexes nor gender identities. Historically,
the grammatical genders of inanimate nouns
have been considered more idiosyncratic and

∗Equal contribution.


less meaningful than the grammatical genders
of animate nouns (Brugmann, 1889; Bloomfield,
1933; Fox, 1990; Aikhenvald, 2000). However,
some cognitive scientists have reopened this dis-
cussion by using laboratory experiments to test
whether speakers of gendered languages reveal
gender stereotypes (Sera et al., 1994)—for exam-
ple, and most famously, when choosing adjectives
to describe inanimate nouns (Boroditsky et al.,
2003).

Although laboratory experiments are highly
informative, they typically involve small sample
sizes. In this paper, we therefore use large-scale
corpora and tools from NLP and information
theory to test whether there is a relationship
between the grammatical genders of inanimate
nouns and the adjectives used to describe those
nouns. Specifically, we calculate the mutual infor-
mation (MI)—a measure of the mutual statisti-
cal dependence between two random variables—
between the grammatical genders of inanimate
nouns and the adjectives that describe them (i.e.,
share a dependency arc labeled AMOD) using large-
scale corpora in six different gendered languages
(specifically, German, Italian, Polish, Portuguese,
Russian, and Spanish). For all six languages, we
find that the MI is statistically significant, meaning
that there is a relationship.

We also test whether there are relationships
between the grammatical genders of inanimate
nouns and the verbs that take those nouns as direct
objects, as indirect objects, and as subjects. For all
six languages, we find that there are statistically
significant relationships for the verbs that take
those nouns as direct objects and as subjects. For
five of the six languages, we also find that there
is a statistically significant relationship for the
verbs that take those nouns as indirect objects, but

Transactions of the Association for Computational Linguistics, vol. 9, pp. 139–159, 2021. https://doi.org/10.1162/tacl_a_00355
Action Editor: Sebastian Padó. Submission batch: 3/2020; Revision batch: 7/2020; Published 3/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.


because of the small number of noun–verb pairs
involved, we caution against reading too much
into this finding.

To contextualize our findings, we test whether
there are statistically significant
relationships
between the grammatical genders of inanimate
nouns and the cases and numbers of these nouns.
A priori, we do not expect to find statistically
significant relationships, so these tests can be
viewed as a baseline of sorts. As expected, for
each of the six languages, there are no statistically
significant relationships.

To provide further context, we also repeat all
tests for animate nouns (a ‘‘skyline’’ of sorts),
finding that for all six languages there
is a statistically significant relationship between
the grammatical genders of animate nouns and
the adjectives used to describe those nouns. We
also find that there are statistically significant
relationships between the grammatical genders
of animate nouns and the verbs that take those
nouns as direct objects, as indirect objects, and
as subjects. All of these relationships have effect
sizes (operationalized as normalized MI values)
that are larger than the effect sizes for inanimate
nouns.

We emphasize that the practical significance
and implications of our findings require deeper
investigation. Most
importantly, we do not
investigate the characteristics of the relationships
that we find. This means that we do not know
whether these relationships are characterized by
gender stereotypes, as argued by some cognitive
scientists. We also do not engage with the ways
that historical and sociopolitical factors affect the
grammatical genders possessed by either animate
or inanimate nouns (Fodor, 1959; Ibrahim, 2014).

2 Background

2.1 Grammatical Gender

Languages lie along a continuum with respect
to whether nouns possess grammatical genders.
Languages with no grammatical genders, like
Turkish, lie on one end of this continuum, while
languages with tens of gender-like classes, like
Swahili (Corbett, 1991), lie on the other. In this
paper, we focus on six different gendered languages
for which large-scale corpora are readily
available: German, Italian, Polish, Portuguese,
Russian, and Spanish, all languages of Indo-European
descent. Three of these languages
(Italian, Portuguese, and Spanish) have two
grammatical genders (masculine and feminine),
while the other three (German, Polish, and Russian)
have three grammatical genders (masculine,
feminine, and neuter).

All six languages exhibit gender agreement,
meaning that words are marked with morpholog-
ical suffixes that reflect the grammatical genders
of their surrounding nouns (Corbett, 2006). For
example, consider the following translations of
the sentence, ‘‘The delicate fork is on the cold
ground.’’

(1) Die zierliche Gabel steht auf dem kalten Boden.
    the.F.SG.NOM delicate.F.SG.NOM fork.F.SG.NOM stands on the.M.SG.DAT cold.M.SG.DAT ground.M.SG.DAT
    The delicate fork is on the cold ground.

(2) El tenedor delicado está en el suelo frío.
    the.M.SG fork.M.SG delicate.M.SG is on the.M.SG ground.M.SG cold.M.SG
    The delicate fork is on the cold ground.

Because the German word for a fork, Gabel, is
grammatically feminine, the German translation
uses the feminine determiner, die. Had Gabel been
masculine, the German translation would have
used the masculine determiner, der. Similarly,
because the Spanish word for a fork, tenedor, is
grammatically masculine, the Spanish translation
uses the masculine determiner, el,
instead of
the feminine determiner, la. As we explain in
Section 3, we lemmatize each corpus to ensure
that our tests do not simply reflect the presence of
gender agreement.

2.2 Grammatical Gender & Meaning

Although some scholars have described the
grammatical genders possessed by inanimate
nouns as ‘‘creative’’ and meaningful (Grimm,
1890; Wheeler, 1899), many scholars have
considered them to be idiosyncratic (Brugmann,
1889; Bloomfield, 1933) or arbitrary (Maratsos,
1979, p. 317). In an overview of this work,
Dye et al. (2017) wrote, ‘‘As often as not, the
languages of the world assign [inanimate] objects
into seemingly arbitrary [classes]… William of

Ockham considered gender to be a meaningless,
unnecessary aspect of language.’’ Bloomfield
(1933) shared this viewpoint, stating that ‘‘[t]here
seems to be no practical criterion by which the
gender of a noun in German, French, or Latin
[can] be determined.’’ Indeed, adult language
learners often have particular difficulty mastering
the grammatical genders of inanimate nouns
(Franceschina, 2005, Ch. 4; DeKeyser, 2005;
Montrul et al., 2008), which suggests that their
meanings are not straightforward.

Even if the grammatical genders possessed by
inanimate nouns are meaningless, ample evidence
suggests that gender-related information may af-
fect cognitive processes (Sera et al., 1994; Cubelli
et al., 2005, 2011; Kurinski and Sera, 2011;
Boutonnet et al., 2012; Saalbach et al., 2012).
Typologists and formal linguists have argued that
grammatical genders are an important feature for
morphosyntactic processes (Corbett, 1991, 2006;
Harbour et al., 2008; Harbour, 2011; Kramer,
2014, 2015), while some cognitive scientists
have shown that grammatical genders can be a
perceptual cue—for example, human brain res-
ponses exhibit sensitivity to gender mismatches
in several different
languages (Osterhout and
Mobley, 1995; Hagoort and Brown, 1999;
Vigliocco et al., 2002; Wicha et al., 2003, 2004;
Barber et al., 2004; Barber and Carreiras, 2005;
Bañón et al., 2012; Caffarra et al., 2015), and
the grammatical genders of determiners and
adjectives can prime nouns (Bates et al., 1996;
Akhutina et al., 1999; Friederici and Jacobsen,
1999). However, the precise nature of the
relationship between grammatical gender and
meaning remains an open research question.

In particular, the grammatical genders possessed
by inanimate nouns might affect the ways
that speakers of gendered languages conceptualize
the objects referred to by those nouns (Jakobson,
1959; Clarke et al., 1981; Ervin-Tripp, 1962;
Konishi, 1993; Sera et al., 1994, 2002; Vigliocco
et al., 2005; Bassetti, 2007), although we note
that this viewpoint is somewhat contentious
(Hofstätter, 1963; Bender et al., 2011; McWhorter,
2014). Neo-Whorfian cognitive scientists hold
a particularly strong variant of this viewpoint,
arguing that the grammatical genders possessed
by inanimate nouns prompt speakers of gendered
languages to rely on gender stereotypes when
choosing adjectives to describe those nouns
(Boroditsky and Schmidt, 2000; Boroditsky et al.,
2002; Phillips and Boroditsky, 2003; Boroditsky,
2003; Boroditsky et al., 2003; Semenuks et al.,
2017). Most famously, Boroditsky et al. (2003)
claim to have conducted a laboratory experiment
showing that speakers of German choose
stereotypically feminine adjectives to describe,
for example, bridges, while speakers of Spanish
choose stereotypically masculine adjectives,
reflecting the fact that in German, the word
for a bridge, Brücke, is grammatically feminine,
while in Spanish, the word for a bridge, puente,
is grammatically masculine. Boroditsky et al.
(2003) took these findings to be a relatively strong
confirmation of the existence of a stereotype
effect—that is, that speakers of gendered lan-
guages reveal gender stereotypes when choosing
adjectives to describe inanimate nouns. That said,
the experiment has not gone unchallenged. Indeed,
Mickan et al. (2014) reported two unsuccessful
replication attempts.

2.3 Laboratory Experiments vs. Corpora

Traditionally, studies of grammatical gender and
meaning have relied on laboratory experiments.
This is for two reasons: 1) laboratory experiments
can be tightly controlled, and 2) they enable
scholars to measure speakers’ immediate, real-
time speech production. However,
they also
typically involve small sample sizes and, in many
cases, somewhat artificial settings. In contrast,
large-scale corpora of written text enable scholars
to measure even relatively weak correlations via
writers’ text production in natural, albeit
less
tightly controlled, settings. They also facilitate
the discovery of correlations that hold across
languages with disparate histories, cultural con-
texts, and even gender systems. As a result, large-
scale corpora have proven useful for studying a
wide variety of language-related phenomena (e.g.,
Featherston and Sternefeld, 2007; Kennedy, 2014;
Blasi et al., 2019).

In this paper, we assume that a writer’s choice
of words in written text is as informative as a
speaker’s choice of words in a laboratory expe-
riment, despite the obvious differences between
these settings. Consequently, we use large-scale
corpora and tools from NLP and information
theory, enabling us to test for the presence
of even relatively weak relationships involving
the grammatical genders of
inanimate nouns
across multiple different gendered languages. We


Figure 1: Dependency tree for the sentence, ‘‘Yo quiero cruzar un puente robusto.’’

therefore argue that our findings complement,
rather than supersede, laboratory experiments.

2.4 Related Work

Our paper is not the first to use large-scale corpora
and tools from NLP to investigate gender and
language. Many scholars have studied the ways
that societal norms and stereotypes, including
gender norms and stereotypes, can be reflected in
representations of distributional semantics derived
from large-scale corpora, such as word embeddings
(Bolukbasi et al., 2016; Caliskan et al.,
2017; Garg et al., 2018; Zhao et al., 2018).
More recently, Williams et al. (2019) found
that the grammatical genders of inanimate nouns
in eighteen different languages were correlated
with their lexical semantics. Dye et al. (2017)
used tools from information theory to reject
the idea that the grammatical genders of nouns
separate those nouns into coherent categories,
arguing instead that grammatical genders are only
meaningful in that they systematically facilitate
communication efficiency by reducing nominal
entropy. Also relevant to our paper is the work
of Kann (2019), who proposed a computational
approach to testing whether there is a relationship
between the grammatical genders of inanimate
nouns and the words that co-occur with those
nouns, operationalized via word embeddings.
However, in contrast to our findings, they found no
evidence for the presence of such a relationship.
Finally, many scholars have proposed a variety
of computational techniques for mitigating gender
norms and stereotypes in a wide range of language-based
applications (Dev and Phillips, 2019; Dinan
et al., 2019; Ethayarajh et al., 2019; Hall Maudslay
et al., 2019; Stanovsky et al., 2019; Tan and Celis,
2019; Zhou et al., 2019; Zmigrod et al., 2019).

3 Data Preparation

We use the May 2018 dump of Wikipedia to
create a corpus for each of the six different
gendered languages (i.e., German, Italian, Polish,
Portuguese, Russian, and Spanish). Although
Wikipedia is not the most representative data
source, this choice yields language-specific
corpora that are roughly parallel; that is, they
refer to the same objects, but are not direct
translations of each other (which could lead to
artificial word choices). We use UDPipe to tokenize
each corpus (Straka et al., 2016).

We dependency parse the corpus for each
language using a language-specific dependency
parser (Andor et al., 2016; Alberti et al., 2017),
trained using Universal Dependencies treebanks
(Nivre et al., 2017). An example dependency
tree is shown in Figure 1. We then extract all
noun–adjective pairs (dependency arcs labeled
AMOD) and noun–verb pairs from each of the
six corpora; for verbs, we extract three types of
pairs, reflecting the fact that nouns can be direct
objects (dependency arcs labeled DOBJ), indirect
objects (dependency arcs labeled IOBJ), or subjects
(dependency arcs labeled NSUBJ) of verbs. We
discard all pairs that contain a noun that is not
present in WordNet (Princeton University, 2010).
We label the remaining nouns as ‘‘animate’’ or
‘‘inanimate’’ according to WordNet.
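
The extraction step described above can be illustrated with a minimal sketch. It assumes that each parsed sentence is available as a list of token records carrying a lemma, a part-of-speech tag, morphological features, a head index, and a dependency label; the `Token` record, the `extract_pairs` helper, and the hand-written parse of the sentence from Figure 1 are hypothetical illustrations rather than the pipeline used in this paper. The WordNet lookup and the animacy labels are omitted, and the noun lemma is kept only so that such a lookup could be applied afterwards.

```python
from collections import namedtuple

# One parsed token: 1-based position, surface form, lemma, part of speech,
# morphological features, position of the syntactic head (0 = root), and
# dependency label. This record layout is an assumption made for the sketch.
Token = namedtuple("Token", "idx form lemma upos feats head deprel")

# Dependency labels named in the text, mapped to the kind of pair they yield.
PAIR_KINDS = {"amod": "ADJ", "dobj": "VERB-DOBJ",
              "iobj": "VERB-IOBJ", "nsubj": "VERB-SUBJ"}


def extract_pairs(sentence):
    """Return (kind, noun_lemma, noun_gender, other_lemma) tuples for one sentence."""
    by_idx = {tok.idx: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        kind = PAIR_KINDS.get(tok.deprel.lower())
        head = by_idx.get(tok.head)
        if kind is None or head is None:
            continue
        if kind == "ADJ":
            # amod arc: the adjective depends on the noun that it modifies.
            noun, other, other_pos = head, tok, "ADJ"
        else:
            # dobj / iobj / nsubj arc: the noun depends on the verb.
            noun, other, other_pos = tok, head, "VERB"
        gender = noun.feats.get("Gender")
        if noun.upos == "NOUN" and other.upos == other_pos and gender:
            pairs.append((kind, noun.lemma, gender, other.lemma))
    return pairs


# Hand-written parse of the sentence from Figure 1 (an assumption, not parser output).
SENTENCE = [
    Token(1, "Yo", "yo", "PRON", {}, 2, "nsubj"),
    Token(2, "quiero", "querer", "VERB", {}, 0, "root"),
    Token(3, "cruzar", "cruzar", "VERB", {}, 2, "xcomp"),
    Token(4, "un", "uno", "DET", {"Gender": "Masc"}, 5, "det"),
    Token(5, "puente", "puente", "NOUN", {"Gender": "Masc"}, 3, "dobj"),
    Token(6, "robusto", "robusto", "ADJ", {"Gender": "Masc"}, 5, "amod"),
    Token(7, ".", ".", "PUNCT", {}, 2, "punct"),
]

PAIRS = extract_pairs(SENTENCE)
print(PAIRS)
```

In this toy parse, the pronoun subject is skipped because it is not a noun, so only the direct-object pair and the adjective pair for puente are returned.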

Next, we lemmatize all words (i.e., nouns,
adjectives, and verbs). Each word is factored into
a set of lexical features consisting of a lemma,
or canonical morphological form, and a bundle
of three morphological features corresponding
to the grammatical gender, number, and case of
that word. For example, the German word for a
fork, Gabel, is grammatically feminine, singular,
and genitive. For nouns, we discard the lemmas
themselves and retain only the morphological

features; for adjectives and verbs, we retain the
lemmas and discard the morphological features.

For adjectives and verbs, lemmatizing is
especially important because it ensures that our
tests do not simply reflect the presence of
gender agreement, as we describe in Section 2.1.
However, this means that if the lemmatizer fails,
then our tests may simply reflect gender agreement
despite our best efforts. To guard against this, we
use a state-of-the-art lemmatizer (Müller et al.,
2015), trained for each language using Universal
Dependencies treebanks (Nivre et al., 2017). We
expect that when the lemmatizer fails, the resulting
lemmata will be low frequency. We try to exclude
lemmatization failures from our calculations by
discarding low-frequency lemmata. For each
language, we rank the adjective lemmata by
their token counts and retain only the highest-
ranked lemmata (in rank order) that account for
90% of the adjective tokens; we then discard
all noun–adjective pairs that do not contain one
of these lemmata. We repeat the same process
for verbs.
Finally, to ensure that our tests reflect the
most salient relationships, we also discard
low-frequency inanimate nouns and, separately,
low-frequency animate nouns using the same
process. We provide counts of the remaining
noun–adjective and noun–verb pairs in Table 3
(for inanimate nouns) and Table 4 (for animate
nouns).
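
The frequency-based filter described in this section (rank lemmata by token count and keep the highest-ranked ones that account for 90% of the tokens) can be sketched as follows. The function name `frequent_lemmata`, the handling of ties, and the toy counts are assumptions made for illustration; the paper does not specify how ties at the threshold are resolved.

```python
from collections import Counter


def frequent_lemmata(lemma_tokens, coverage=0.90):
    """Keep the highest-ranked lemmata that together cover `coverage` of all tokens."""
    counts = Counter(lemma_tokens)
    total = sum(counts.values())
    kept, covered = set(), 0
    # most_common() yields lemmata in decreasing order of token count.
    for lemma, count in counts.most_common():
        if covered / total >= coverage:
            break
        kept.add(lemma)
        covered += count
    return kept


# Toy adjective tokens: "grande" covers 60%, "nuevo" 30%, "robusto" 10%.
ADJECTIVE_TOKENS = ["grande"] * 6 + ["nuevo"] * 3 + ["robusto"] * 1
KEPT = frequent_lemmata(ADJECTIVE_TOKENS)
print(sorted(KEPT))
```

Pairs whose adjective (or verb, or noun) lemma falls outside the retained set would then be discarded before any statistics are computed.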

4 Methodology

For each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$, we
define $\mathcal{V}^{\ell}_{\mathrm{ADJ}}$ to be the set of adjective lemmata
represented in the noun–adjective pairs retained
for that language, as described above. We similarly
define $\mathcal{V}^{\ell}_{\mathrm{VERB}}$ to be the set of verb lemmata
represented in the noun–verb pairs retained for
that language, as described above. We then
define $\mathcal{V}^{\ell}_{\mathrm{VERB\text{-}DOBJ}} \subset \mathcal{V}^{\ell}_{\mathrm{VERB}}$,
$\mathcal{V}^{\ell}_{\mathrm{VERB\text{-}IOBJ}} \subset \mathcal{V}^{\ell}_{\mathrm{VERB}}$, and
$\mathcal{V}^{\ell}_{\mathrm{VERB\text{-}SUBJ}} \subset \mathcal{V}^{\ell}_{\mathrm{VERB}}$ to be the sets of verbs that take the
nouns as direct objects, as indirect objects, and as
subjects, respectively. We also define $\mathcal{G}^{\ell}$ to be the
set of grammatical genders for that language (e.g.,
$\mathcal{G}^{\text{es}} = \{\mathrm{MSC}, \mathrm{FEM}\}$), $\mathcal{C}^{\ell}$ to be the set of cases (e.g.,
$\mathcal{C}^{\text{de}} = \{\mathrm{NOM}, \mathrm{ACC}, \mathrm{GEN}, \mathrm{DAT}\}$), and $\mathcal{N}^{\ell}$ to be the set
of numbers (e.g., $\mathcal{N}^{\text{pt}} = \{\mathrm{PL}, \mathrm{SG}\}$). Finally, we
define fourteen random variables: $A^{\ell}_i$ and $A^{\ell}_a$ are
$\mathcal{V}^{\ell}_{\mathrm{ADJ}}$-valued random variables, $D^{\ell}_i$ and $D^{\ell}_a$ are
$\mathcal{V}^{\ell}_{\mathrm{VERB\text{-}DOBJ}}$-valued random variables, $I^{\ell}_i$ and $I^{\ell}_a$
are $\mathcal{V}^{\ell}_{\mathrm{VERB\text{-}IOBJ}}$-valued random variables, $S^{\ell}_i$ and
$S^{\ell}_a$ are $\mathcal{V}^{\ell}_{\mathrm{VERB\text{-}SUBJ}}$-valued random variables, $G^{\ell}_i$ and
$G^{\ell}_a$ are $\mathcal{G}^{\ell}$-valued random variables, $C^{\ell}_i$ and $C^{\ell}_a$ are
$\mathcal{C}^{\ell}$-valued random variables, and $N^{\ell}_i$ and $N^{\ell}_a$ are
$\mathcal{N}^{\ell}$-valued random variables. The subscripts ‘‘i’’
and ‘‘a’’ denote inanimate and animate nouns,
respectively.

To test whether there is a relationship between
the grammatical genders of inanimate nouns and
the adjectives used to describe those nouns for
language $\ell$, we calculate the MI (mutual
information), a measure of the mutual statistical
dependence between two random variables,
between $G^{\ell}_i$ and $A^{\ell}_i$:

$$\mathrm{MI}(G^{\ell}_i; A^{\ell}_i) = \sum_{g \in \mathcal{G}^{\ell}} \sum_{a \in \mathcal{V}^{\ell}_{\mathrm{ADJ}}} P_i(g, a) \log_2 \frac{P_i(g, a)}{P_i(g)\, P_i(a)}, \tag{1}$$

where all probabilities are calculated with respect
to inanimate nouns only. If $G^{\ell}_i$ and $A^{\ell}_i$ are
independent (i.e., there is no relationship between
them) then $\mathrm{MI}(G^{\ell}_i; A^{\ell}_i) = 0$; if $G^{\ell}_i$ and $A^{\ell}_i$
are maximally dependent then $\mathrm{MI}(G^{\ell}_i; A^{\ell}_i) =
\min\{H(G^{\ell}_i), H(A^{\ell}_i)\}$, where $H(G^{\ell}_i)$ is the entropy
of $G^{\ell}_i$ and $H(A^{\ell}_i)$ is the entropy of $A^{\ell}_i$. For
simplicity, we use plug-in estimates for all
probabilities (i.e., empirical probabilities), deferring
the use of more sophisticated estimators for
future work. We note that $\mathrm{MI}(G^{\ell}_i; A^{\ell}_i)$ can be
calculated in $O\left(|\mathcal{G}^{\ell}| \cdot |\mathcal{V}^{\ell}_{\mathrm{ADJ}}|\right)$ time; however, $|\mathcal{G}^{\ell}|$
is negligible (i.e., two or three) so the main cost
is $|\mathcal{V}^{\ell}_{\mathrm{ADJ}}|$.
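
As an illustration of Eq. (1), a plug-in estimate of the MI can be sketched in a few lines, assuming the retained pairs are available as (gender, adjective lemma) tokens. The function name `plugin_mi` and the toy pairs are hypothetical; only observed (gender, adjective) cells are summed, because cells with zero count contribute nothing to the sum.

```python
import math
from collections import Counter


def plugin_mi(pairs):
    """Plug-in estimate (in bits) of the MI between the two coordinates of `pairs`."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    g_counts = Counter(g for g, _ in pairs)
    a_counts = Counter(a for _, a in pairs)
    mi = 0.0
    for (g, a), c in joint.items():
        # P(g, a) / (P(g) P(a)) simplifies to c * n / (count(g) * count(a)).
        mi += (c / n) * math.log2(c * n / (g_counts[g] * a_counts[a]))
    return mi


# Gender and adjective are independent here, so the MI is zero.
INDEPENDENT = [("FEM", "x"), ("FEM", "y"), ("MSC", "x"), ("MSC", "y")]
# Each adjective occurs with one gender only, so the MI equals H(G) = 1 bit.
DEPENDENT = [("FEM", "x"), ("MSC", "y")] * 2

MI_INDEPENDENT = plugin_mi(INDEPENDENT)
MI_DEPENDENT = plugin_mi(DEPENDENT)
print(MI_INDEPENDENT, MI_DEPENDENT)
```

The two toy inputs correspond to the two extremes discussed above: independence and maximal dependence.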

To test for statistical significance, we perform a
permutation test. Specifically, we permute the
grammatical genders of the inanimate nouns
10,000 times and, for each permutation, recalculate
the MI between $G^{\ell}_i$ and $A^{\ell}_i$ using the
permuted genders. We obtain a p-value by
calculating the proportion of permutations that
have a higher MI than the MI obtained using the
non-permuted genders; if the p-value is less than
0.05, then we treat the relationship between $G^{\ell}_i$
and $A^{\ell}_i$ as statistically significant.
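
The permutation test can be sketched as follows. This is a minimal illustration under one stated assumption: genders are shuffled across noun types, so that every token of a given noun receives the same permuted gender. The text does not spell out this detail, and the names `permutation_test`, `plugin_mi`, `GENDERS`, and `TOKENS`, as well as the placeholder adjectives, are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_mi(pairs):
    """Plug-in MI (in bits) between the two coordinates of (x, y) tokens."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (x_counts[x] * y_counts[y]))
               for (x, y), c in joint.items())


def permutation_test(noun_gender, noun_adj_tokens, n_permutations=10_000, seed=0):
    """Return (observed MI, p-value) for the gender-adjective relationship."""
    rng = random.Random(seed)
    nouns = sorted(noun_gender)
    genders = [noun_gender[noun] for noun in nouns]
    observed = plugin_mi((noun_gender[noun], adj) for noun, adj in noun_adj_tokens)
    higher = 0
    for _ in range(n_permutations):
        # Assumption: shuffle genders across noun types, not across tokens.
        shuffled = genders[:]
        rng.shuffle(shuffled)
        permuted = dict(zip(nouns, shuffled))
        mi = plugin_mi((permuted[noun], adj) for noun, adj in noun_adj_tokens)
        if mi > observed:
            higher += 1
    # Proportion of permutations whose MI exceeds the observed MI.
    return observed, higher / n_permutations


# Toy data: both feminine nouns take adjective "x", both masculine nouns take "y".
GENDERS = {"gabel": "FEM", "bruecke": "FEM", "loeffel": "MSC", "boden": "MSC"}
TOKENS = [("gabel", "x"), ("bruecke", "x"), ("loeffel", "y"), ("boden", "y")]

OBSERVED, P_VALUE = permutation_test(GENDERS, TOKENS, n_permutations=1_000)
print(OBSERVED, P_VALUE)
```

In this toy example the observed MI already equals its maximum possible value, so no permutation can exceed it; with real corpora the p-value would instead reflect how often shuffled genders produce an MI above the observed one.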

Because the maximum possible MI between
any pair of random variables depends on the
entropies of those variables, MI values are not
comparable across pairs of random variables. We
therefore also calculate the normalized MI (NMI)


between $G^{\ell}_i$ and $A^{\ell}_i$ by normalizing $\mathrm{MI}(G^{\ell}_i; A^{\ell}_i)$
to lie between zero and one. The most obvious
choice of normalizer is the maximum possible
MI, that is, $\min\{H(G^{\ell}_i), H(A^{\ell}_i)\}$; however, various
other normalizers have been proposed, each
of which has different advantages and disadvantages
(Gates et al., 2019). We therefore calculate
six different variants of $\mathrm{NMI}(G^{\ell}_i; A^{\ell}_i)$ using the
following normalizers:

\begin{align}
&\min\{H(G^{\ell}_i), H(A^{\ell}_i)\} \tag{2} \\
&\sqrt{H(G^{\ell}_i)\, H(A^{\ell}_i)} \tag{3} \\
&\frac{H(G^{\ell}_i) + H(A^{\ell}_i)}{2} \tag{4} \\
&\max\{H(G^{\ell}_i), H(A^{\ell}_i)\} \tag{5} \\
&\max\{\log |\mathcal{G}^{\ell}|, \log |\mathcal{V}^{\ell}_{\mathrm{ADJ}}|\} \tag{6} \\
&\log M^{\ell}_i, \tag{7}
\end{align}

where $M^{\ell}_i$ is the number of non-unique (inanimate)
noun–adjective pairs retained for that language.

                                   de        it        pl        pt        ru        es
$\mathrm{MI}(G^{\ell}_i; A^{\ell}_i)$   0.0310    0.0400    0.0520    0.0664    0.0225    0.0500
$\mathrm{MI}(G^{\ell}_i; D^{\ell}_i)$   0.0290    0.0129    0.0440    0.0090    0.0109    0.0232
$\mathrm{MI}(G^{\ell}_i; I^{\ell}_i)$   0.0743    0.0230    0.0640    0.0184    0.0514    0.6973
$\mathrm{MI}(G^{\ell}_i; S^{\ell}_i)$   0.0276    0.0090    0.0270    0.0090    0.0226    0.0274
$\mathrm{MI}(G^{\ell}_i; C^{\ell}_i)$  < 0.001     N/A     < 0.001     N/A     < 0.001     N/A
$\mathrm{MI}(G^{\ell}_i; N^{\ell}_i)$  < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001

Table 1: The mutual information (MI) between the grammatical genders of inanimate nouns and a) the
adjectives used to describe those nouns (top row), b) the verbs that take those nouns as direct objects, as
indirect objects, and as subjects (rows 2–4, respectively), and c) the cases and numbers of those nouns
(rows 5 and 6, respectively) for six different gendered languages. Statistical significance (i.e., a p-value
less than 0.05) is indicated using bold. MI values are not comparable across pairs of random variables.

To test whether there are relationships between
the grammatical genders of inanimate nouns and
the verbs that take those nouns as direct objects,
as indirect objects, and as subjects, we calculate
$\mathrm{MI}(G^{\ell}_i; D^{\ell}_i)$, $\mathrm{MI}(G^{\ell}_i; I^{\ell}_i)$, and $\mathrm{MI}(G^{\ell}_i; S^{\ell}_i)$. Again,
all probabilities are calculated with respect to
inanimate nouns only, and we perform
permutation tests to test for statistical significance.
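
The six normalizers in Eq. (2) through Eq. (7) can be sketched together. This is a minimal illustration with three assumptions: base-2 logarithms are used throughout (Eq. (6) and Eq. (7) do not state a base), the cardinalities in Eq. (6) are taken from the distinct values observed in the pairs, and the MI is obtained from entropies via MI = H(G) + H(A) - H(G, A), which equals the plug-in estimate of Eq. (1). The function names and toy pairs are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def nmi_variants(pairs):
    """Return the six NMI variants for (gender, lemma) tokens, keyed by equation."""
    pairs = list(pairs)
    genders = [g for g, _ in pairs]
    lemmata = [a for _, a in pairs]
    h_g, h_a = entropy(genders), entropy(lemmata)
    mi = h_g + h_a - entropy(pairs)  # identical to the plug-in MI of Eq. (1)
    normalizers = {
        "eq2": min(h_g, h_a),
        "eq3": math.sqrt(h_g * h_a),
        "eq4": (h_g + h_a) / 2,
        "eq5": max(h_g, h_a),
        # Assumption: set sizes are read off the observed data, logs are base 2.
        "eq6": max(math.log2(len(set(genders))), math.log2(len(set(lemmata)))),
        "eq7": math.log2(len(pairs)),
    }
    return {name: mi / z for name, z in normalizers.items()}


# Maximally dependent toy data: MI = H(G) = H(A) = 1 bit, and M = 4 pairs.
DEPENDENT = [("FEM", "x"), ("MSC", "y")] * 2
NMI = nmi_variants(DEPENDENT)
print(NMI)
```

The sketch divides by each normalizer directly, so it presupposes at least two genders, two lemmata, and two pairs; degenerate inputs with zero entropy would need separate handling.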
We also calculate six NMI variants for each of the
three pairs of random variables, using normalizers
that are analogous to those in Eq. (2) through
Eq. (7).

As a baseline, we test whether there are relationships
between the grammatical genders of
inanimate nouns and the cases and numbers of
those nouns; that is, we calculate $\mathrm{MI}(G^{\ell}_i; C^{\ell}_i)$
and $\mathrm{MI}(G^{\ell}_i; N^{\ell}_i)$ using probabilities that are
calculated with respect to inanimate nouns only.
Again, we perform permutation tests (but we
do not expect that there will be statistically
significant relationships), and we calculate six
NMI variants for each pair of random variables
using normalizers that are analogous to those in
Eq. (2) through Eq. (7).

Finally, we calculate $\mathrm{MI}(G^{\ell}_a; A^{\ell}_a)$, $\mathrm{MI}(G^{\ell}_a; D^{\ell}_a)$,
$\mathrm{MI}(G^{\ell}_a; I^{\ell}_a)$, $\mathrm{MI}(G^{\ell}_a; S^{\ell}_a)$, $\mathrm{MI}(G^{\ell}_a; C^{\ell}_a)$,
and $\mathrm{MI}(G^{\ell}_a; N^{\ell}_a)$ using probabilities calculated
with respect to animate nouns only. The first four
of these are intended to serve as a ‘‘skyline,’’
while the last two are intended to serve as a sanity
check (i.e., we expect them to be close to zero,
as with inanimate nouns). Again, we perform
permutation tests to test for statistical significance,
and we calculate six NMI variants for each pair of
random variables.

5 Results

In the first row of Table 1, we provide the
MI between $G^{\ell}_i$ and $A^{\ell}_i$ for each language $\ell \in
\{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. For all six languages,
$\mathrm{MI}(G^{\ell}_i; A^{\ell}_i)$ is statistically significant (i.e., p <
0.05), meaning that there is a relationship between
the grammatical genders of inanimate nouns and
the adjectives used to describe those nouns.
Rows 2–4 of Table 1 contain $\mathrm{MI}(G^{\ell}_i; D^{\ell}_i)$,
$\mathrm{MI}(G^{\ell}_i; I^{\ell}_i)$, and $\mathrm{MI}(G^{\ell}_i; S^{\ell}_i)$ for each language.
For all six languages, $\mathrm{MI}(G^{\ell}_i; D^{\ell}_i)$ and
$\mathrm{MI}(G^{\ell}_i; S^{\ell}_i)$ are statistically significant (i.e.,
p < 0.05). For five of the six languages,
$\mathrm{MI}(G^{\ell}_i; I^{\ell}_i)$ is statistically significant, but because
of the small number of noun–verb pairs involved,
we caution against reading too much into this
finding. We note that direct objects are closest to
verbs in analyses of constituent structures,
followed by subjects and then indirect objects
(Chomsky, 1957; Adger, 2003). Finally, the last
two rows of Table 1 contain $\mathrm{MI}(G^{\ell}_i; C^{\ell}_i)$ and
$\mathrm{MI}(G^{\ell}_i; N^{\ell}_i)$, respectively, for each language. We
do not find any statistically significant relationships
for either case or number.

Figure 2: The normalized mutual information (NMI) between the grammatical genders of inanimate nouns
and a) the adjectives used to describe those nouns and b) the verbs that take those nouns as direct objects
and as subjects for six different gendered languages. Each subplot contains six variants of
$\mathrm{NMI}(G^{\ell}_i; A^{\ell}_i)$, $\mathrm{NMI}(G^{\ell}_i; D^{\ell}_i)$, and $\mathrm{NMI}(G^{\ell}_i; S^{\ell}_i)$ (one per normalizer) for a single
language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
To facilitate comparisons, each subplot in
Figure 2 contains six variants of $\mathrm{NMI}(G^{\ell}_i; A^{\ell}_i)$,
$\mathrm{NMI}(G^{\ell}_i; D^{\ell}_i)$, and $\mathrm{NMI}(G^{\ell}_i; S^{\ell}_i)$, calculated using
normalizers that are analogous to those in
Eq. (2) through Eq. (7), for a single
language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. (We omit
$\mathrm{NMI}(G^{\ell}_i; I^{\ell}_i)$ from each plot because of
the small number of noun–verb pairs involved.)
For $\ell \in \{\text{it}, \text{pl}, \text{pt}, \text{es}\}$, $\mathrm{NMI}(G^{\ell}_i; A^{\ell}_i)$ is larger
than $\mathrm{NMI}(G^{\ell}_i; D^{\ell}_i)$ and $\mathrm{NMI}(G^{\ell}_i; S^{\ell}_i)$, regardless of
the normalizer. For $\ell \in \{\text{it}, \text{pl}\}$, $\mathrm{NMI}(G^{\ell}_i; S^{\ell}_i)$
is larger than $\mathrm{NMI}(G^{\ell}_i; D^{\ell}_i)$; $\mathrm{NMI}(G^{\text{pt}}_i; D^{\text{pt}}_i)$
is larger than $\mathrm{NMI}(G^{\text{pt}}_i; S^{\text{pt}}_i)$; and $\mathrm{NMI}(G^{\text{es}}_i; D^{\text{es}}_i)$ and
$\mathrm{NMI}(G^{\text{es}}_i; S^{\text{es}}_i)$ are roughly comparable, again
regardless of the normalizer. Meanwhile,
$\mathrm{NMI}(G^{\text{de}}_i; A^{\text{de}}_i)$ is larger than $\mathrm{NMI}(G^{\text{de}}_i; D^{\text{de}}_i)$
and $\mathrm{NMI}(G^{\text{de}}_i; S^{\text{de}}_i)$ for the normalizer in
Eq. (2), while $\mathrm{NMI}(G^{\text{de}}_i; A^{\text{de}}_i)$, $\mathrm{NMI}(G^{\text{de}}_i; D^{\text{de}}_i)$,
and $\mathrm{NMI}(G^{\text{de}}_i; S^{\text{de}}_i)$ are all roughly comparable for
the other five normalizers. Finally, $\mathrm{NMI}(G^{\text{ru}}_i; A^{\text{ru}}_i)$
and $\mathrm{NMI}(G^{\text{ru}}_i; S^{\text{ru}}_i)$ are roughly comparable and
larger than $\mathrm{NMI}(G^{\text{ru}}_i; D^{\text{ru}}_i)$, regardless of the
normalizer.

                                   de        it        pl        pt        ru        es
$\mathrm{MI}(G^{\ell}_a; A^{\ell}_a)$   0.0928    0.1111    0.0845    0.0933    0.0621    0.1316
$\mathrm{MI}(G^{\ell}_a; D^{\ell}_a)$   0.0410    0.0091    0.0664    0.0320    0.0273    0.0543
$\mathrm{MI}(G^{\ell}_a; I^{\ell}_a)$   0.0737    0.0358    0.0600    0.0687    0.0439    0.0543
$\mathrm{MI}(G^{\ell}_a; S^{\ell}_a)$   0.0343    0.0192    0.0303    0.0252    0.0258    0.0543
$\mathrm{MI}(G^{\ell}_a; C^{\ell}_a)$  < 0.001     N/A     < 0.001     N/A     < 0.001     N/A
$\mathrm{MI}(G^{\ell}_a; N^{\ell}_a)$  < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001

Table 2: The mutual information (MI) between the grammatical genders of animate nouns and a) the
adjectives used to describe those nouns (top row), b) the verbs that take those nouns as direct objects, as
indirect objects, and as subjects (rows 2–4, respectively), and c) the cases and numbers of those nouns
(rows 5 and 6, respectively) for six different gendered languages. Statistical significance (i.e., a p-value
less than 0.05) is indicated using bold. MI values are not comparable across pairs of random variables.
In other words, the relationship between the
grammatical genders of inanimate nouns and
the adjectives used to describe those nouns is
generally stronger than, but sometimes roughly
comparable to, the relationships between the
grammatical genders of inanimate nouns and
the verbs that take those nouns as direct objects
and as subjects. However, the relative strengths
of the relationships between the grammatical
genders of inanimate nouns and the verbs that
take those nouns as direct objects and as subjects
vary depending on the language.

In Table 2, we provide $\mathrm{MI}(G^{\ell}_a; A^{\ell}_a)$,
$\mathrm{MI}(G^{\ell}_a; D^{\ell}_a)$, $\mathrm{MI}(G^{\ell}_a; I^{\ell}_a)$, $\mathrm{MI}(G^{\ell}_a; S^{\ell}_a)$,
$\mathrm{MI}(G^{\ell}_a; C^{\ell}_a)$, and $\mathrm{MI}(G^{\ell}_a; N^{\ell}_a)$ for each language
$\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. As with inanimate
nouns, we find that there is a statistically significant
relationship between the grammatical genders of
animate nouns and the adjectives used to describe
those nouns. We also find that there are statistically
significant relationships between the grammatical
genders of animate nouns and the verbs that take
those nouns as direct objects, as indirect objects,
and as subjects. Again, the relationship for the
verbs that take those nouns as indirect objects
involves a small number of noun–verb pairs. As
expected, we do not find any statistically significant
relationships for either case or number.

Figure 3 is analogous to Figure 2, in that each
subplot contains six variants of $\mathrm{NMI}(G^{\ell}_a; A^{\ell}_a)$,
$\mathrm{NMI}(G^{\ell}_a; D^{\ell}_a)$, and $\mathrm{NMI}(G^{\ell}_a; S^{\ell}_a)$, calculated using
normalizers that are analogous to those in
Eq. (2) through Eq. (7), for a single language
$\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
(As with inanimate nouns, we omit NMI(G(cid:2) a) from each plot because of the small number of noun–verb involved.) For (cid:2) ∈ {de, it, pl, pt, es}, pairs i, D(cid:2) i) is larger than NMI(G(cid:2) i, A(cid:2) NMI(G(cid:2) i ) and NMI(G(cid:2) i, S(cid:2) i ), the normalizer. For (cid:2) ∈ {it, pl}, NMI(G(cid:2) i ) is larger than i, D(cid:2) NMI(G(cid:2) i, D(cid:2) i ) is larger than NMI(G(cid:2) i , Des i ); and NMI(Ges i ) and i , Ses NMI(Ges i ) are roughly comparable—again, the normalizer. Meanwhile, regardless of all , Dru , Aru NMI(Gru i ) i which is larger than NMI(Gru i ) for the i (3), while normalizers in Eq. NMI(Gru i ) are roughly i , Sru comparable and larger than NMI(Gru i ) for i the other five normalizers. Finally, each subplot i) and NMI(G(cid:2) in Figure 4 contains a, A(cid:2) NMI(G(cid:2) a), calculated using a single normalizer, for each for each language (cid:2) ∈ {de, it, pl, pt, ru, es}. Each subplot in Figure 5 analogously contains NMI(G(cid:2) i, D(cid:2) i ) and a, D(cid:2) NMI(G(cid:2) a), while each subplot in Figure 6 contains NMI(G(cid:2) i ) and NMI(G(cid:2) a). The NMI values for animate nouns are generally larger i ) is larger than NMI(Gru i (2) and Eq. i , Dru i ) and NMI(Gru i, A(cid:2) a, S(cid:2) i, S(cid:2) , Aru , Sru l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 3 5 5 1 9 2 4 1 4 2 / / t l a c _ a _ 0 0 3 5 5 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 3 5 5 1 9 2 4 1 4 2 / / t l a c _ a _ 0 0 3 5 5 p d . Figure 3: The normalized mutual information (NMI) between the grammatical genders of animate nouns and a) the adjectives used to describe those nouns and b) the verbs that take those nouns as direct objects and as subjects for six different gendered languages. 
Each subplot contains six variants of NMI(G(cid:2) a), and a)—one per normalizer—for a single language (cid:2) ∈ {de, it, pl, pt, ru, es}. NMI(G(cid:2) a), NMI(G(cid:2) a, D(cid:2) a, A(cid:2) a, S(cid:2) f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 147 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 3 5 5 1 9 2 4 1 4 2 / / t l a c _ a _ 0 0 3 5 5 p d . Figure 4: The normalized mutual information (NMI) between the grammatical genders of a) inanimate and b) animate nouns and the adjectives used to describe those nouns. Each subplot contains NMI(G(cid:2) i ) and NMI(G(cid:2) a), calculated using a single normalizer, for each language (cid:2) ∈ {de, it, pl, pt, ru, es}. a, A(cid:2) i , A(cid:2) f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 148 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 3 5 5 1 9 2 4 1 4 2 / / t l a c _ a _ 0 0 3 5 5 p d . Figure 5: The normalized mutual information (NMI) between the grammatical genders of a) inanimate and b) animate nouns and the verbs that take those nouns as direct objects. Each subplot contains NMI(G(cid:2) i ) and NMI(G(cid:2) a), calculated using a single normalizer, for each language (cid:2) ∈ {de, it, pl, pt, ru, es}. a, D(cid:2) i , D(cid:2) f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 149 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 3 5 5 1 9 2 4 1 4 2 / / t l a c _ a _ 0 0 3 5 5 p d . Figure 6: The normalized mutual information (NMI) between the grammatical genders of a) inanimate and b) animate nouns and the verbs that take those nouns as subjects. Each subplot contains NMI(G(cid:2) i ) and a), calculated using a single normalizer, for each language (cid:2) ∈ {de, it, pl, pt, ru, es}. 
NMI(G(cid:2) a, S(cid:2) i , S(cid:2) f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 150 than the NMI values for inanimate nouns. The i , Apl only exception is Polish, where NMI(Gpl i ) is larger than NMI(Gpl a ), regardless of the normalizer. a , Apl 6 Discussion We find evidence for the presence of a statistically significant relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns for six different gendered languages (specifically, German, Italian, Polish, Portuguese, Russian, and Spanish). We also find evidence for the presence of statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. However, we caution against reading too much into the relationship for the verbs that take those nouns as indirect objects because of the small number of noun–verb pairs involved. The effect sizes (operationalized as NMI values) for all of these relationships are smaller than the effect sizes for animate nouns. As expected, we do not find any statistically significant relationships for either case or number. We emphasize that our findings complement, rather than supersede, laboratory experiments, such as that of Boroditsky et al. (2003). We use large-scale corpora and tools from NLP and information theory to test for the presence of even relatively weak relationships across multi- ple different gendered languages—and, indeed, the relationships that we find have effect sizes (operationalized as NMI values) that are small. In contrast, laboratory experiments typically focus on much stronger relationships by tightly con- trolling experimental conditions and measuring speakers’ immediate, real-time speech produc- tion. Moreover, although we find statistically significant relationships, we do not investigate the characteristics of these relationships. 
This means that we do not know whether they are characterized by gender stereotypes, as argued by some cognitive scientists, including Boroditsky et al. (2003). We also do not know whether the relationships that we find are causal in nature. Because MI is symmetric, our findings say nothing about whether the grammatical genders of inanimate nouns cause writers to choose particular adjectives or verbs. We defer deeper investigation of this for future work.

We note that each of our tests can be viewed as a comparison of the similarity of two clusterings of a set of items: specifically, a "clustering" of nouns into grammatical genders and a "clustering" of the same nouns into, for example, adjective lemmata. Although (normalized) MI is a standard measure for comparing clusterings, it is not without limitations (see, e.g., Newman et al. [2020] for an overview). For future work, we therefore recommend replicating our tests using other information-theoretic measures for comparing clusterings.

Acknowledgments

We thank Lera Boroditsky, Hagen Blix, Eleanor Chodroff, Andrei Cimpian, Zach Davis, Jason Eisner, Richard Futrell, Todd Gureckis, Katharina Kann, Peter Klecha, Zhiwei Li, Ethan Ludwin-Peery, Alec Marantz, Arya McCarthy, John McWhorter, Sabrina J. Mielke, Elizabeth Salesky, Arturs Semenuks, and Colin Wilson for discussions at various points related to the ideas in this paper. Katharina Kann approves this acknowledgment.

A Appendix A: Counts

Counts of the noun–adjective and noun–verb pairs for all six gendered languages are in Table 3 (for inanimate nouns) and Table 4 (for animate nouns).
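As a rough illustration of what the token and type counts in these tables measure, the following Python sketch tallies them for a toy list of noun–adjective pairs and then computes a simple plug-in estimate of the MI between gender and adjective from the same pairs. The toy data, the function names `pair_counts` and `plug_in_mi`, and the choice of an uncorrected plug-in estimator are assumptions made purely for illustration; they are not taken from our pipeline.

```python
from collections import Counter
from math import log


def pair_counts(pairs):
    """Token and type counts for a list of (noun, adjective) pairs."""
    pairs = list(pairs)
    return {
        "pair_tokens": len(pairs),                     # every occurrence
        "pair_types": len(set(pairs)),                 # distinct pairs
        "noun_types": len({n for n, _ in pairs}),      # distinct nouns
        "adj_types": len({a for _, a in pairs}),       # distinct adjectives
    }


def plug_in_mi(xs, ys):
    """Plug-in (maximum-likelihood) mutual information in nats.

    Assumption: no bias correction is applied, so this is only a sketch
    of the quantity, not a recommended estimator for sparse data.
    """
    xs, ys = list(xs), list(ys)
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical toy data: four pair tokens over two nouns.
pairs = [
    ("Brücke", "schön"),
    ("Brücke", "schön"),
    ("Brücke", "alt"),
    ("Tisch", "alt"),
]
gender = {"Brücke": "fem", "Tisch": "masc"}  # hypothetical lexicon

counts = pair_counts(pairs)
mi = plug_in_mi([gender[n] for n, _ in pairs], [a for _, a in pairs])
print(counts)
print(round(mi, 4))
```

Each row label in the tables corresponds to one entry of the dictionary returned by `pair_counts`; tokens count every occurrence of a pair, while types count distinct pairs, nouns, or adjectives.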
                                    de        it        pl        pt        ru        es
# noun–adj. tokens             6443907   6246856  11631913    640558  32900200   3605439
# noun–adj. types               770952    666656    640107    638774   1633963    368795
# noun types                     10712      6410      5533      5672      9327      6157
# adj. types                      4129      3607      4080      3431     11028      1907
# noun–verb (subj.) tokens     3191030   1432354   2179396   1871941   6007063   1534211
# noun–verb (subj.) types       445536    292949    297996    337262    864480    376888
# noun (subj.) types             10741      6318      5522      5780      9129      7470
# verb types                       707       702       874       758      1803       875
# noun–verb (dobj.) tokens     3440922   2855037   3964828   4850012   6738606   2859135
# noun–verb (dobj.) types       427441    393246    236849    541347    713703    576835
# noun (dobj.) types             10504      6407      4359      5896      8998     11567
# verb types                       805       806       708       738      1539      9746
# noun–verb (iobj.) tokens      163935        71     54138     95009   1570273     56038
# noun–verb (iobj.) types        50133        53     18214     39738    300703     24830
# noun (iobj.) types              5520        59      2258      3757      8150      3574
# verb types                       386        68       417       357      1816       464
# noun–case tokens            14681293       N/A  15300621       N/A  51641929       N/A
# noun–case types              2252632       N/A   1465314       N/A   5028075       N/A
# noun types                     11989       N/A      5839       N/A      9692       N/A
# case types                         4         0         7         0         6         0
# noun–number tokens          14681293  11588448  15300621  14631732  51641929   5672790
# noun–number types            2252632   1748927   1465314   2042626   5028075   1034307
# noun types                     11989      7014      5839      6256      9692      1593
# number types                       2         2         2         2         2         2

Table 3: Counts of the inanimate noun–adjective and noun–verb pairs for all six gendered languages.

                                    de        it        pl        pt        ru        es
# noun–adj. tokens              662760    818300   1137209    712101   3225932    387025
# noun–adj. types                99332     92424     97847     90865    264117     50173
# noun types                      1998      1078       954      1006      2098      1320
# adj. types                      3587      3507      3836      3176      9833      1828
# noun–verb (subj.) tokens      637801    399747    526894    456349   1516740    310569
# noun–verb (subj.) types       113308     77551     89819     89959    253150     93586
# noun (subj.) types              2056      1066       969      1013      2020      1477
# verb types                       707       702       874       758      1799       874
# noun–verb (dobj.) tokens      321400    388187    456824    527259    494534    850234
# noun–verb (dobj.) types        60760     55574     76348     92220    118818     85235
# noun (dobj.) types              1901      1025       867      1028      1912      1023
# verb types                       804       805       724       737      1535       745
# noun–verb (iobj.) tokens       51359         7     43187     23139    518540     23955
# noun–verb (iobj.) types        17804         6      8440    110185     11353      9586
# noun (iobj.) types              1149         6       628       773      1858       947
# verb types                       378         6       411       340      1769       456
# noun–case tokens             1926614       N/A   1907688       N/A   6357089       N/A
# noun–case types               390672       N/A    299511       N/A    987420       N/A
# noun types                      2292       N/A      1024       N/A      2194       N/A
# case types                         4         0         7         0         6         0
# noun–number tokens           1926614   1801285   1907688   1931315   6357089    786177
# noun–number types             390672    306968    299511    356352    987420    200785
# noun types                      2292      1135      1024      1072      2194      1593
# number types                       2         2         2         2         2         2

Table 4: Counts of the animate noun–adjective and noun–verb pairs for all six gendered languages.

References

David Adger. 2003. Core Syntax: A Minimalist Approach, volume 33. Oxford University Press, Oxford.

Alexandra Y. Aikhenvald. 2000. Classifiers: A Typology of Noun Categorization Devices. Oxford University Press.

Tatiana Akhutina, Andrei Kurgansky, Maria Polinsky, and Elizabeth Bates. 1999. Processing of grammatical gender in a three-gender system: Experimental evidence from Russian. Journal of Psycholinguistic Research, 28(6):695–713. DOI: https://doi.org/10.1023/A:1023225129058, PMID: 10510865

Chris Alberti, Daniel Andor, Ivan Bogatyy, Michael Collins, Dan Gillick, Lingpeng Kong, Terry Koo, Ji Ma, Mark Omernick, Slav Petrov, Chayut Thanapirom, Zora Tung, and David Weiss. 2017. SyntaxNet models for the CoNLL 2017 shared task. arXiv preprint arXiv:1703.04929.

Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally normalized transition-based neural networks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2442–2452. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P16-1231

José Alemán Bañón, Robert Fiorentino, and Alison Gabriele. 2012. The processing of number and gender agreement in Spanish: An event-related potential investigation of the effects of structural distance. Brain Research, 1456:49–63. DOI: https://doi.org/10.1016/j.brainres.2012.03.057, PMID: 22520436

Horacio Barber and Manuel Carreiras. 2005. Grammatical gender and number agreement in Spanish: An ERP comparison. Journal of Cognitive Neuroscience, 17(1):137–153. DOI: https://doi.org/10.1162/0898929052880101, PMID: 15701245

Horacio Barber, Elena Salillas, and Manuel Carreiras. 2004. Gender or genders agreement. On-line Study of Sentence Comprehension, pages 309–328.

Benedetta Bassetti. 2007. Bilingualism and thought: Grammatical gender and concepts of objects in Italian–German bilingual children. International Journal of Bilingualism, 11(3):251–273. DOI: https://doi.org/10.1177/13670069070110030101

Elizabeth Bates, Antonella Devescovi, Arturo Hernandez, and Luigi Pizzamiglio. 1996. Gender priming in Italian. Perception & Psychophysics, 58(7):992–1004. DOI: https://doi.org/10.3758/BF03206827, PMID: 8920836

Andrea Bender, Sieghard Beller, and Karl Christoph Klauer. 2011. Grammatical gender in German: A case for linguistic relativity? The Quarterly Journal of Experimental Psychology, 64(9):1821–1835. DOI: https://doi.org/10.1080/17470218.2011.582128, PMID: 21740112

Damian Blasi, Ryan Cotterell, Lawrence Wolf-Sonkin, Sabine Stoll, Balthasar Bickel, and Marco Baroni. 2019. On the distribution of deep clausal embeddings: A large cross-linguistic study.
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3938–3943. DOI: https://doi.org/10.18653/v1/P19-1384

Leonard Bloomfield. 1933. Language. London: Allen & Unwin.

Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357.

Lera Boroditsky. 2003. Linguistic Relativity. Encyclopedia of Cognitive Science.

Lera Boroditsky and Lauren A. Schmidt. 2000. Sex, Syntax, and Semantics. In Proceedings of the Annual Meeting of the Cognitive Science Society.

Lera Boroditsky, Lauren A. Schmidt, and Webb Phillips. 2002. Can quirks of grammar affect the way you think? Spanish and German speakers' ideas about the genders of objects. https://escholarship.org/uc/item/31t455gf.

Lera Boroditsky, Lauren A. Schmidt, and Webb Phillips. 2003. Sex, Syntax, and Semantics. Language in Mind: Advances in the Study of Language and Thought, pages 61–79.

Greville G. Corbett. 2006. Agreement. Cambridge University Press.

Roberto Cubelli, Lorella Lotto, Daniela Paolieri, Massimo Girelli, and Remo Job. 2005. Grammatical gender is selected in bare noun production: Evidence from the picture–word interference paradigm. Journal of Memory and Language, 53(1):42–59. DOI: https://doi.org/10.1016/j.jml.2005.02.007

Bastien Boutonnet, Panos Athanasopoulos, and Guillaume Thierry. 2012. Unconscious effects of grammatical gender during object categorisation. Brain Research, 1479:72–79. DOI: https://doi.org/10.1016/j.brainres.2012.08.044, PMID: 22960201

Roberto Cubelli, Daniela Paolieri, Lorella Lotto, and Remo Job. 2011.
The effect of grammatical gender on object categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(2):449. DOI: https://doi.org/10.1037/a0021965, PMID: 21261427

Karl Brugmann. 1889. Das Nominalgeschlecht in den indogermanischen Sprachen. Internationale Zeitschrift für allgemeine Sprachwissenschaft, 4.

Robert M. DeKeyser. 2005. What makes learning second-language grammar difficult? A review of issues. Language Learning, 55(S1):1–25. DOI: https://doi.org/10.1111/j.0023-8333.2005.00294.x

Sendy Caffarra, Anna Siyanova-Chanturia, Francesca Pesciarelli, Francesco Vespignani, and Cristina Cacciari. 2015. Is the noun ending a cue to grammatical gender processing? An ERP study on sentences in Italian. Psychophysiology, 52(8):1019–1030. DOI: https://doi.org/10.1111/psyp.12429, PMID: 25817315

Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186. DOI: https://doi.org/10.1126/science.aal4230, PMID: 28408601

Noam Chomsky. 1957. Syntactic Structures. The Hague/Paris: Mouton.

Mark A. Clarke, Ann Losoff, Margaret Dickenson McCracken, and JoAnn Still. 1981. Gender perception in Arabic and English. Language Learning, 31(1):159–169. DOI: https://doi.org/10.1111/j.1467-1770.1981.tb01377.x

Greville G. Corbett. 1991. Gender. Cambridge University Press. DOI: https://doi.org/10.1017/CBO9781139166119

Sunipa Dev and Jeff Phillips. 2019. Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 879–887.

Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, and Jason Weston. 2019. Queens are powerful too: Mitigating gender bias in dialogue generation. arXiv preprint arXiv:1911.03842. DOI: https://doi.org/10.18653/v1/2020.emnlp-main.656

Melody Dye, Petar Milin, Richard Futrell, and Michael Ramscar. 2017.
A functional theory of gender paradigms. In Perspectives on Morphological Organization, pages 212–239. Brill. DOI: https://doi.org/10.1163/9789004342934_011

Susan Ervin-Tripp. 1962. The connotations of gender. Word, 18:249–261.

Kawin Ethayarajh, David Duvenaud, and Graeme Hirst. 2019. Understanding undesirable word embedding associations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1696–1705, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1166

Sam Featherston and Wolfgang Sternefeld. 2007. Roots: Linguistics in Search of its Evidential Base, volume 96. Walter de Gruyter. DOI: https://doi.org/10.1515/9783110198621

Istvan Fodor. 1959. The origin of grammatical gender. Lingua, 8:186–214. DOI: https://doi.org/10.1016/0024-3841(59)90020-8

Anthony Fox. 1990. The Structure of German. Oxford University Press.

Florencia Franceschina. 2005. Fossilized Second Language Grammars: The Acquisition of Grammatical Gender, volume 38. John Benjamins Publishing. DOI: https://doi.org/10.1075/lald.38

Angela D. Friederici and Thomas Jacobsen. 1999. Processing grammatical gender during language comprehension. Journal of Psycholinguistic Research, 28(5):467–484. DOI: https://doi.org/10.1023/A:1023243708702, https://doi.org/10.1023/A:1023264209610

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644. DOI: https://doi.org/10.1073/pnas.1720347115, PMID: 29615513, PMCID: PMC5910851

Alexander J. Gates, Ian B. Wood, William P. Hetrick, and Yong-Yeol Ahn. 2019.
Element-centric clustering comparison unifies overlaps and hierarchy. Scientific Reports, 9(8574). DOI: https://doi.org/10.1038/s41598-019-44892-y, PMID: 31189888, PMCID: PMC6561975

Jacob Grimm. 1890. Deutsche Grammatik. C. Bertelsmann.

Peter Hagoort and Colin M. Brown. 1999. Gender electrified: ERP evidence on the syntactic nature of gender processing. Journal of Psycholinguistic Research, 28(6):715–728. DOI: https://doi.org/10.1023/A:1023277213129, PMID: 10510866

Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It's all in the name: Mitigating gender bias with name-based counterfactual data substitution. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5267–5275, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1530

Daniel Harbour. 2011. Valence and atomic number. Linguistic Inquiry, 42(4):561–594. DOI: https://doi.org/10.1162/LING_a_00061

Daniel Harbour, David Adger, and Susana Béjar. 2008. Phi Theory: Phi-Features Across Modules and Interfaces, volume 16. Oxford University Press.

Peter R. Hofstätter. 1963. Über sprachliche Bestimmungsleistungen: Das Problem des grammatikalischen Geschlechts von Sonne und Mond. Zeitschrift für experimentelle und angewandte Psychologie.

Muhammad Hasan Ibrahim. 2014. Grammatical Gender: Its Origin and Development, volume 166. Walter de Gruyter.

Roman Jakobson. 1959. On linguistic aspects of translation. On Translation, 3:30–39. DOI: https://doi.org/10.4159/harvard.9780674731615.c18

Katharina Kann. 2019. Grammatical gender, neo-Whorfianism, and word embeddings: A data-driven approach to linguistic relativity. arXiv preprint arXiv:1910.09729.

Graeme Kennedy. 2014. An Introduction to Corpus Linguistics. Routledge. DOI: https://doi.org/10.4324/9781315843674

Toshi Konishi. 1993. The semantics of grammatical gender: A cross-cultural study. Journal of Psycholinguistic Research, 22(5):519–534. DOI: https://doi.org/10.1007/BF01068252, PMID: 8246207

Ruth Kramer. 2014. Gender in Amharic: A morphosyntactic approach to natural and grammatical gender. Language Sciences, 43:102–115. DOI: https://doi.org/10.1016/j.langsci.2013.10.004

Ruth Kramer. 2015. The Morphosyntax of Gender, volume 58. Oxford University Press. DOI: https://doi.org/10.1093/acprof:oso/9780199679935.001.0001

Elena Kurinski and Maria D. Sera. 2011. Does learning Spanish grammatical gender change English-speaking adults' categorization of inanimate objects? Bilingualism: Language and Cognition, 14(2):203–220. DOI: https://doi.org/10.1017/S1366728910000179

Michael Maratsos. 1979. How to get from words to sentences. In Doris Aaronson and Robert W. Rieber, editors, Psycholinguistic Research: Implications and Applications. Psychology Press, Taylor & Francis Group, London and New York.

John H. McWhorter. 2014. The Language Hoax: Why the World Looks the Same in Any Language. Oxford University Press.

Anne Mickan, Maren Schiefke, and Anatol Stefanowitsch. 2014. Key is a llave is a Schlüssel: A failure to replicate an experiment from Boroditsky et al. 2003. Yearbook of the German Cognitive Linguistics Association, 2(1):39. DOI: https://doi.org/10.1515/gcla-2014-0004

Silvina Montrul, Rebecca Foote, and Silvia Perpiñán. 2008. Gender agreement in adult second language learners and Spanish heritage speakers: The effects of age and context of acquisition. Language Learning, 58(3):503–553. DOI: https://doi.org/10.1111/j.1467-9922.2008.00449.x

Thomas Müller, Ryan Cotterell, Alexander Fraser, and Hinrich Schütze. 2015. Joint lemmatization and morphological tagging with lemming.
In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2268–2274. DOI: https://doi.org/10.18653/v1/D15-1272, PMID: 25768671

Mark E. J. Newman, George T. Cantwell, and Jean-Gabriel Young. 2020. Improved mutual information measure for classification and community detection. Physical Review E. DOI: https://doi.org/10.1103/PhysRevE.101.042304, PMID: 32422767

Joakim Nivre, Željko Agić, Lars Ahrenberg, Maria Jesus Aranzabe, Masayuki Asahara, Aitziber Atutxa, Miguel Ballesteros, John Bauer, Kepa Bengoetxea, Riyaz Ahmad Bhat, Eckhard Bick, Cristina Bosco, Gosse Bouma, Sam Bowman, Marie Candito, Gülşen Cebiroğlu Eryiğit, Giuseppe G. A. Celano, Fabricio Chalub, Jinho Choi, Çağrı Çöltekin, Miriam Connor, Elizabeth Davidson, Marie-Catherine de Marneffe, Valeria de Paiva, Arantza Diaz de Ilarraza, and Kaja Dobrovoljc. 2017. Universal dependencies 2.0.

Lee Osterhout and Linda A. Mobley. 1995. Event-related brain potentials elicited by failure to agree. Journal of Memory and Language, 34(6):739–773. DOI: https://doi.org/10.1006/jmla.1995.1033

Webb Phillips and Lera Boroditsky. 2003. Can quirks of grammar affect the way you think? Grammatical gender and object concepts. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 25.

Princeton University. 2010. About WordNet. https://wordnet.princeton.edu/

Henrik Saalbach, Mutsumi Imai, and Lennart Schalk. 2012. Grammatical gender and inferences about biological properties in German-speaking children. Cognitive Science, 36(7):1251–1267. DOI: https://doi.org/10.1111/j.1551-6709.2012.01251.x, PMID: 22578067

Arturs Semenuks, Webb Phillips, Ioana Dalca, Cora Kim, and Lera Boroditsky. 2017. Effects of grammatical gender on object description. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society (CogSci 2017).

Maria D. Sera, Christian A. H. Berge, and Javier del Castillo Pintado. 1994.
Grammatical and conceptual forces in the attribution of gender by English and Spanish speakers. Cognitive Development, 9(3):261–292. DOI: https://doi.org/10.1016/0885-2014(94)90007-8

Maria D. Sera, Chryle Elieff, James Forbes, Melissa Clark Burch, Wanda Rodríguez, and Diane Poulin Dubois. 2002. When language affects cognition and when it does not: An analysis of grammatical gender and classification. Journal of Experimental Psychology: General, 131(3):377. DOI: https://doi.org/10.1037/0096-3445.131.3.377

Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer. 2019. Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1164

Milan Straka, Jan Hajič, and Jana Straková. 2016. UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4290–4297, Portorož, Slovenia. European Language Resources Association (ELRA). http://ufal.mff.cuni.cz/udpipe

Yi Chern Tan and L. Elisa Celis. 2019. Assessing social and intersectional biases in contextualized word representations. In Advances in Neural Information Processing Systems, pages 13209–13220.

Gabriella Vigliocco, Marcus Lauer, Markus F. Damian, and Willem J. M. Levelt. 2002. Semantic and syntactic forces in noun phrase production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(1):46. DOI: https://doi.org/10.1037/0278-7393.28.1.46

Gabriella Vigliocco, David P. Vinson, Federica Paganelli, and Katharina Dworzynski. 2005. Grammatical gender effects on cognition: Implications for language learning and language use. Journal of Experimental Psychology: General, 134(4):501. DOI: https://doi.org/10.1037/0096-3445.134.4.501, PMID: 16316288

Benj Ide Wheeler. 1899. The origin of grammatical gender. The Journal of Germanic Philology, 2(4):528–545.

Nicole Y. Y. Wicha, Elizabeth A. Bates, Eva M. Moreno, and Marta Kutas. 2003. Potato not Pope: Human brain potentials to gender expectation and agreement in Spanish spoken sentences. Neuroscience Letters, 346(3):165–168. DOI: https://doi.org/10.1016/S0304-3940(03)00599-8

Nicole Y. Y. Wicha, Eva M. Moreno, and Marta Kutas. 2004. Anticipating words and their gender: An event-related brain potential study of semantic integration, gender expectancy, and gender agreement in Spanish sentence reading. Journal of Cognitive Neuroscience, 16(7):1272–1288. DOI: https://doi.org/10.1162/0898929041920487, PMID: 15453979, PMCID: PMC3380438

Adina Williams, Damian Blasi, Lawrence Wolf-Sonkin, Hanna Wallach, and Ryan Cotterell. 2019. Quantifying the semantic core of gender systems. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5734–5739, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1577

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. 2018. Learning gender-neutral word embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4847–4853, Brussels, Belgium. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D18-1521

Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang. 2019.
Examining gender bias in languages with grammatical gender. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5276–5284, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1531, PMID: 31191883, PMCID: PMC6540912

Ran Zmigrod, Sebastian J. Mielke, Hanna Wallach, and Ryan Cotterell. 2019. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1161

David Zubin and Klaus-Michael Köpcke. 1986. Gender and folk taxonomy: The indexical relation between grammatical and lexical categorization. Noun Classes and Categorization, pages 139–180. DOI: https://doi.org/10.1075/tsl.7.12zub