Gender Bias in Machine Translation


Beatrice Savoldi1,2, Marco Gaido1,2, Luisa Bentivogli2, Matteo Negri2, Marco Turchi2
1University of Trento, Italy
2Fondazione Bruno Kessler, Italy
{bsavoldi,mgaido,bentivo,negri,turchi}@fbk.eu

Abstract

Machine translation (MT) technology has facilitated our daily tasks by providing accessible shortcuts for gathering, processing, and communicating information. However, it can suffer from biases that harm users and society at large. As a relatively new field of inquiry, studies of gender bias in MT still lack cohesion. This advocates for a unified framework to ease future research. To this end, we: i) critically review current conceptualizations of bias in light of theoretical insights from related disciplines, ii) summarize previous analyses aimed at assessing gender bias in MT, iii) discuss the mitigating strategies proposed so far, and iv) point toward potential directions for future work.

1 Introduction

Interest in understanding, assessing, and mitigating gender bias is steadily growing within the natural language processing (NLP) community, with recent studies showing how gender disparities affect language technologies. At times, for instance, coreference resolution systems fail to recognize women doctors (Zhao et al., 2017; Rudinger et al., 2018), image captioning models do not detect women sitting next to a computer (Hendricks et al., 2018), and automatic speech recognition works better with male voices (Tatman, 2017). Despite a prior disregard for such phenomena within research agendas (Cislak et al., 2018), it is now widely recognized that NLP tools encode and reflect controversial social asymmetries for many seemingly neutral tasks, machine translation (MT) included. Admittedly, the problem is not new (Frank et al., 2004). A few years ago, Schiebinger (2014) criticized the phenomenon of ''masculine default'' in MT after running one of her interviews through a commercial translation system. In spite of several feminine mentions in the text, she was repeatedly referred to by masculine pronouns. Gender-related concerns have also been voiced by online MT users, who noticed how commercial systems entrench social gender expectations, for example, translating engineers as masculine and nurses as feminine (Olson, 2018).

With language technologies entering widespread use and being deployed at a massive scale, their societal impact has raised concern both within (Hovy and Spruit, 2016; Bender et al., 2021) and outside (Dastin, 2018) the scientific community. To take stock of the situation, Sun et al. (2019) reviewed NLP studies on the topic. However, their survey is based on monolingual applications, whose underlying assumptions and solutions may not be directly applicable to languages other than English (Zhou et al., 2019; Zhao et al., 2020; Takeshita et al., 2020) and cross-lingual settings. Moreover, MT is a multifaceted task, which requires resolving multiple gender-related subtasks at the same time (e.g., coreference resolution, named entity recognition). Hence, depending on the languages involved and the factors accounted for, gender bias has been conceptualized differently across studies. To date, gender bias in MT has been tackled by means of a narrow, problem-solving oriented approach. While technical countermeasures are needed, failing to adopt a wider perspective and engage with related literature outside of NLP can be detrimental to the advancement of the field (Blodgett et al., 2020).

In this paper, we intend to put such literature to use for the study of gender bias in MT. We go beyond surveys restricted to monolingual NLP (Sun et al., 2019) or that are more limited in scope (Costa-jussà, 2019; Monti, 2020), and present the first comprehensive review of gender bias in MT. In particular, we 1) offer a unified framework that introduces the concepts, sources, and effects of bias in MT, clarified in light of

Transactions of the Association for Computational Linguistics, vol. 9, pp. 845–874, 2021. https://doi.org/10.1162/tacl_a_00401
Action Editor: Emily M. Bender. Submission batch: 3/2021; Revision batch: 4/2021; Published 8/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

relevant notions on the relation between gender and different languages; and 2) critically discuss the state of the research by identifying blind spots and key challenges.

2 Bias Statement

Bias is a fraught term with partially overlapping, or even competing, definitions (Campolo et al., 2017). In cognitive science, bias refers to the possible outcome of heuristics, that is, mental shortcuts that can be critical to support prompt reactions (Tversky and Kahneman, 1973, 1974). AI research borrowed from such a tradition (Rich and Gureckis, 2019; Rahwan et al., 2019) and conceived bias as the divergence from an ideal or expected value (Glymour and Herington, 2019; Shah et al., 2020), which can occur if models rely on spurious cues and unintended shortcut strategies to predict outputs (Schuster et al., 2019; McCoy et al., 2019; Geirhos et al., 2020). Since this can lead to systematic errors and/or adverse social effects, bias investigation is not only a scientific and technical endeavor but also an ethical one, given the growing societal role of NLP applications (Bender and Friedman, 2018). As Blodgett et al. (2020) recently called out, and as has been endorsed in other venues (Hardmeier et al., 2021), analyzing bias is an inherently normative process that requires identifying what is deemed as harmful behavior, how, and to whom. Here, we stress a human-centered, sociolinguistically motivated framing of bias. By drawing on the definition by Friedman and Nissenbaum (1996), we consider as biased an MT model that systematically and unfairly discriminates against certain individuals or groups in favor of others. We identify bias per a specific model's behaviors, which are assessed by envisaging their potential risks when the model is deployed (Bender et al., 2021) and the harms that could ensue (Crawford, 2017), with people in focus (Bender, 2019). Since MT systems are used daily by millions of individuals, they can impact a wide array of people in different ways.

As a guide, we rely on Crawford (2017), who defines two main categories of harms produced by a biased system: i) Representational harms (R) (i.e., detraction from the representation of social groups and their identity, which, in turn, affects attitudes and beliefs); and ii) Allocational harms (A) (i.e., a system allocates or withholds opportunities or resources to certain groups). Considering the so-far reported real-world instances of gender bias (Schiebinger, 2014; Olson, 2018) and those addressed in the MT literature reviewed in this paper, (R) can be further distinguished into under-representation and stereotyping.

Under-representation refers to the reduction of the visibility of certain social groups through language by i) producing a disproportionately low representation of women (e.g., most feminine entities in a text are misrepresented as male in translation); or ii) not recognizing the existence of non-binary individuals (e.g., when a system does not account for gender neutral forms). For such cases, the misrepresentation occurs in the language employed to talk ''about'' such groups.1 Also, this harm can imply the reduced visibility of the language used ''by'' speakers of such groups by iii) failing to reflect their identity and communicative repertoires. In these cases, an MT flattens their communication and produces an output that indexes unwanted gender identities and social meanings (e.g., women and non-binary speakers are not referred to by their preferred linguistic expressions of gender).

Stereotyping regards the propagation of negative generalizations of a social group, for example, belittling feminine representation to less prestigious occupations (teacher (Feminine) vs. lecturer (Masculine)), or in association with attractiveness judgments (pretty lecturer (Feminine)).

Such behaviors are harmful as they can directly affect the self-esteem of members of the target group (Bourguignon et al., 2015). Furthermore, they can propagate to indirect stakeholders. For instance, if a system fosters the visibility of the way of speaking of the dominant group, MT users can presume that such a language represents the most appropriate or prestigious variant2—at the expense of other groups and communicative repertoires. These harms can aggregate, and the ubiquitous embedding of MT in Web applications provides us with paradigmatic examples of how the two types of (R) can interplay. For example, if women or non-binary3 scientists are the subjects of a query, automatically translated pages run the
1See also the classifications by Dinan et al. (2020).
2For an analogy on how technology shaped the perception
of feminine voices as shrill and immature, see Tallon (2019).
3Throughout the paper, we use non-binary as an umbrella
term for referring to all gender identities between or outside
the masculine/feminine binary categories.



risk of referring to them via masculine-inflected job qualifications. Such misrepresentations can lead readers to the experience of feelings of identity invalidation (Zimman et al., 2017). Also, users may not be aware of being exposed to MT mistakes due to the deceptively fluent output of a system (Martindale and Carpuat, 2018). In the long run, stereotypical assumptions and prejudices (e.g., only men are qualified for high-level positions) will be reinforced (Levesque, 2011; Régner et al., 2019).

Regarding (A), MT services are consumed by the general public and can thus be regarded as resources in their own right. Hence, (R) can directly imply (A) as a performance disparity across users in the quality of service, namely, the overall efficiency of the service. Accordingly, a woman attempting to translate her biography by relying on an MT system requires additional energy and time to revise incorrect masculine references. If such disparities are not accounted for, the MT field runs the risk of producing systems that prevent certain groups from fully benefiting from such technological resources.

In the following, we operationalize such categories to map studies on gender bias to their motivations and societal implications (Tables 1 and 2).

3 Understanding Bias

To confront bias in MT, it is vital to reach out to other disciplines that foregrounded how the socio-cultural notions of gender interact with language(s), translation, and implicit biases. Only then can we discuss the multiple factors that concur to encode and amplify gender inequalities in language technology. Note that, except for Saunders et al. (2020), current studies on gender bias in MT have assumed an (often implicit) binary vision of gender. As such, our discussion is largely forced into this classification. Although we also describe bimodal feminine/masculine linguistic forms and social categories, we emphasize that gender encompasses multiple biosocial elements not to be conflated with sex (Risman, 2018; Fausto-Sterling, 2019), and that some individuals do not experience gender, at all or in binary terms (Glen and Hurrell, 2012).

3.1 Gender and Language

The relation between language and gender is not straightforward. First, the linguistic structures used to refer to the extra-linguistic reality of gender vary across languages (§3.1.1). Moreover, how gender is assigned and perceived in our verbal practices depends on contextual factors as well as assumptions about social roles, traits, and attributes (§3.1.2). Finally, language is conceived as a tool for articulating and constructing personal identities (§3.1.3).

3.1.1 Linguistic Encoding of Gender
Drawing on linguistic work (Corbett, 1991; Craig, 1994; Comrie, 1999; Hellinger and Bußmann, 2001, 2002, 2003; Corbett, 2013; Gygax et al., 2019) we describe the linguistic forms (lexical, pronominal, grammatical) that bear a relation with the extra-linguistic reality of gender. Following Stahlberg et al. (2007), we identify three language groups:

Genderless languages (e.g., Finnish, Turkish). In such languages, the gender-specific repertoire is at its minimum, only expressed for basic lexical pairs, usually kinship or address terms (e.g., in Finnish sisko/sister vs. veli/brother).

Notional gender languages4 (e.g., Danish, English). On top of lexical gender (mom/dad), such languages display a system of pronominal gender (she/he, her/him). English also hosts some marked derivative nouns (actor/actress) and compounds (chairman/chairwoman).

Grammatical gender languages (e.g., Arabic, Spanish). In these languages, each noun pertains to a class such as masculine, feminine, and neuter (if present). Although for most inanimate objects gender assignment is only formal,5 for human referents masculine/feminine markings are assigned on a semantic basis. Grammatical gender is defined by a system of morphosyntactic agreement, where several parts of speech beside nouns (e.g., verbs, determiners, adjectives) carry gender inflections.

In light of this, the English sentence ''He/She is a good friend'' has no overt expression of gender in a genderless language like Turkish (''O iyi bir arkadaş''), whereas Spanish spreads several masculine or feminine markings (''El/la es un/a buen/a amigo/a'').

4Also referred to as natural gender languages. Following McConnell-Ginet (2013), we prefer notional to avoid terminological overlapping with ''natural'', i.e., biological/anatomical sexual categories. For a wider discussion on the topic, see Nevalainen and Raumolin-Brunberg (1993); Curzan (2003).

5E.g., ''moon'' is masculine in German, feminine in French, and neuter in Greek.



Although general, such macro-categories allow us to highlight typological differences across languages. These are crucial to frame gender issues in both human and machine translation. Also, they exhibit to what extent speakers of each group are led to think and communicate via binary distinctions,6 as well as underline the relative complexity in carving out a space for lexical innovations that encode non-binary gender (Hord, 2016; Conrod, 2020). In this sense, while English is bringing the singular they into common use and developing neo-pronouns (Bradley et al., 2019), for grammatical gender languages like Spanish neutrality requires the development of neo-morphemes (''Elle es une buene amigue'').

3.1.2 Social Gender Connotations

To understand gender bias, we have to grasp not only the structure of different languages, but also how linguistic expressions are connoted, deployed, and perceived (Hellinger and Motschenbacher, 2015). In grammatical gender languages, feminine forms are often subject to a so-called semantic derogation (Schulz, 1975), for example, in French, couturier (fashion designer) vs. couturière (seamstress). English is no exception (e.g., governor/governess).

Moreover, bias can lurk underneath seemingly neutral forms. Such is the case of epicene (i.e., gender neutral) nouns where gender is not grammatically marked. Here, gender assignment is linked to (typically binary) social gender, that is, ''the socially imposed dichotomy of masculine and feminine role and character traits'' (Kramarae and Treichler, 1985). As an illustration, Danish speakers tend to pronominalize dommer (judge) with han (he) when referring to the whole occupational category (Gomard, 1995; Nissen, 2002).
Social gender assignment varies across time and space (Lyons, 1977; Romaine, 1999; Cameron, 2003) and regards stereotypical assumptions about what is typical or appropriate for men and women. Such assumptions impact our perceptions (Hamilton, 1988; Gygax et al., 2008; Kreiner et al., 2008) and influence our behavior (e.g., leading individuals to identify with and fulfill stereotypical expectations; Wolter and Hannover, 2016; Sczesny et al., 2018) and verbal communication (e.g., women are often misquoted in the academic community; Krawczyk, 2017).

6Outside of the Western paradigm, there are cultures whose languages traditionally encode gender outside of the binary (Epple, 1998; Murray, 2003; Hall and O'Donovan, 2014).

Translation studies highlight how social gender assignment influences translation choices (Jakobson, 1959; Chamberlain, 1988; Comrie, 1999; Di Sabato and Perri, 2020). Primarily, the problem arises from typological differences across languages and their gender systems. Nonetheless, socio-cultural factors also influence how translators deal with such differences. Consider the character of the cook in Daphne du Maurier's Rebecca, whose gender is never explicitly stated in the whole book. In the lack of any available information, translators of five grammatical gender languages represented the character as either a man or a woman (Wandruszka, 1969; Nissen, 2002). Although extreme, this case can illustrate the situation of uncertainty faced by MT: the mapping of one-to-many forms in gender prediction. But, as discussed in §4.1, mistranslations occur when contextual gender information is available as well.

3.1.3 Gender and Language Use

Language use varies between demographic groups and reflects their backgrounds, personalities, and social identities (Labov, 1972; Trudgill, 2000; Pennebaker and Stone, 2003). In this light, the study of gender and language variation has received much attention in socio- and corpus linguistics (Holmes and Meyerhoff, 2003; Eckert and McConnell-Ginet, 2013). Research conducted in speech and text analysis highlighted several gender differences, which are exhibited at the phonological and lexical-syntactic level. For example, women rely more on hedging strategies (''it seems that''), purpose clauses (''in order to''), first-person pronouns, and prosodic exclamations (Mulac et al., 2001; Mondorf, 2002; Brownlow et al., 2003). Although some correspondences between gender and linguistic features hold across cultures and languages (Smith, 2003; Johannsen et al., 2015), it should be kept in mind that they are far from universal7 and should not be intended in a stereotyped and oversimplified

7It has been largely debated whether gender-related differences are inherently biological or cultural and social products (Mulac et al., 2001). Nowadays, the idea that they depend on biological reasons is largely rejected (Hyde, 2005) in favor of a socio-cultural or performative perspective (Butler, 1990).



manner (Bergvall et al., 1996; Nguyen et al., 2016; Koolen and van Cranenburgh, 2017).

Drawing on gender-related features proved useful for building demographically informed NLP tools (Garimella et al., 2019) and personalized MT models (Mirkin et al., 2015; Bawden et al., 2016; Rabinovich et al., 2017). However, using personal gender as a variable requires a prior understanding of which categories may be salient, and a critical reflection on how gender is intended and ascribed (Larson, 2017). Otherwise, if we assume that the only relevant categories are ''male'' and ''female'', our models will inevitably fulfill such a reductionist expectation (Bamman et al., 2014).

3.2 Gender Bias in MT

To date, an overview of how several factors may contribute to gender bias in MT does not exist. We identify and clarify concurring problematic causes, accounting for the context in which systems are developed and used (§2). To this aim, we rely on the three overarching categories of bias described by Friedman and Nissenbaum (1996), which foreground different sources that can lead to machine bias. These are: pre-existing bias—rooted in our institutions, practices, and attitudes (§3.2.1); technical bias—due to technical constraints and decisions (§3.2.2); and emergent bias—arising from the interaction between systems and users (§3.2.3). We consider such categories as placed along a continuum, rather than being discrete.

3.2.1 Pre-existing Bias
MT models are known to reflect gender disparities present in the data. However, reflections on such generally invoked disparities are often overlooked. Treating data as an abstract, monolithic entity (Gitelman, 2013)—or relying on ''overly broad/overloaded terms like training data bias''8 (Suresh and Guttag, 2019)—does not encourage reasoning on the many factors of which data are the product: first and foremost, the historical, socio-cultural context in which they are generated.

A starting point to tackle these issues is the Europarl corpus (Koehn, 2005), where only 30% of sentences are uttered by women (Vanmassenhove et al., 2018). Such an imbalance is a direct window into the glass ceiling that

8See Johnson (2020a) and Samar (2020) for a discussion on how such a narrative can be counterproductive for tackling bias.

has hampered women's access to parliamentary positions. This case exemplifies how data might be ''tainted with historical bias'', mirroring an ''unequal ground truth'' (Hacker, 2018). Other gender variables are harder to spot and quantify.

Empirical linguistics research pointed out that subtle gender asymmetries are rooted in languages' use and structure. For instance, an important aspect regards how women are referred to. Femaleness is often explicitly invoked when there is no textual need to do so, even in languages that do not require overt gender marking. A case in point regards Turkish, which differentiates cocuk (child) and kiz cocugu (female child) (Braun, 2000). Similarly, in a corpus search, Romaine (2001) found 155 explicit female markings for doctor (female, woman, or lady doctor), compared with only 14 male doctor. Feminist language critique provided extensive analysis of such a phenomenon by highlighting how referents in discourse are considered men by default unless explicitly stated (Silveira, 1980; Hamilton, 1991). Finally, prescriptive top–down guidelines limit the linguistic visibility of gender diversity; for example, the Real Academia de la Lengua Española recently discarded the official use of non-binary innovations and claimed the functionality of masculine generics (Mundo, 2018; López et al., 2020).

By stressing such issues, we are not condoning the reproduction of pre-existing bias in MT. Rather, the above-mentioned concerns are the starting point to account for when dealing with gender bias.

3.2.2 Technical Bias

Technical bias comprises aspects related to data creation, model design, and training and testing procedures. If present in training and testing samples, asymmetries in the semantics of language use and gender distribution are respectively learned by MT systems and rewarded in their evaluation. However, as just discussed, biased representations are not merely quantitative, but also qualitative. Therefore, straightforward procedures (e.g., balancing the number of speakers in existing datasets) do not ensure a fairer representation of gender in MT outputs. Since datasets are a crucial source of bias, it is also crucial to advocate for careful data curation (Mehrabi et al., 2019; Paullada et al., 2020; Hanna et al., 2021; Bender et al., 2021), guided by pragmatically



and socially informed analyses (Hitti et al., 2019; Sap et al., 2020; Devinney et al., 2020) and annotation practices (Gaido et al., 2020).

Overall, while data can mirror gender inequalities and offer adverse shortcut learning opportunities, it is ''quite clear that data alone rarely constrain a model sufficiently'' (Geirhos et al., 2020), nor does data alone explain the fact that models over-amplify (Shah et al., 2020) such inequalities in their outputs. Focusing on models' components, Costa-jussà et al. (2020b) demonstrate that architectural choices in multilingual MT impact the systems' behavior: Shared encoder-decoders retain less gender information in the source embeddings and less diversion in the attention than language-specific encoder-decoders (Escolano et al., 2021), thus disfavoring the generation of feminine forms. While discussing the loss and decay of certain words in translation, Vanmassenhove et al. (2019, 2021) attest to the existence of an algorithmic bias that leads under-represented forms in the training data (as may be the case for feminine references) to further decrease in the MT output. Specifically, Roberts et al. (2020) prove that beam search (unlike sampling) is skewed toward the generation of more frequent (masculine) pronouns, as it leads models to an extreme operating point that exhibits zero variability. Hence, efforts towards understanding and mitigating gender bias should also account for the model and its algorithmic implications. To date, this remains largely unexplored.

3.2.3 Emergent Bias

Emergent bias may arise when a system is used in a different context than the one it was designed for—for example, when it is applied to another demographic group. From car crash dummies to clinical trials, we have evidence of how not accounting for gender differences leads to the creation of male-grounded products with dire consequences (Liu and Dipietro Mager, 2016; Criado-Perez, 2019), such as higher death and injury risks in vehicle crashes and less effective medical treatments for women. Similarly, unbeknownst to their creators, MT systems that are not intentionally envisioned for a diverse range of users will not generalize for the feminine segment of the population. Hence, in the interaction with an MT system, a woman will likely be misgendered or not have her linguistic style preserved (Hovy et al., 2020). Other conditions of user/system mismatch may be the result of changing societal knowledge and values. A case in point regards Google Translate's historical decision to adjust its system for instances of gender ambiguity. Since its launch twenty years ago, Google had provided only one translation for single-word gender-ambiguous queries (e.g., professor translated in Italian with the masculine professore). In a community increasingly conscious of the power of language to hardwire stereotypical beliefs and women's invisibility (Lindqvist et al., 2019; Beukeboom and Burgers, 2019), the bias exhibited by the system was confronted with a new sensitivity. The service's decision to provide a double feminine/masculine output (professor→professoressa|professore) (Kuczmarski, 2018) stems from current demands for gender-inclusive resolutions. For the recognition of non-binary groups (Richards et al., 2016), we invite studies on how such modeling could be integrated with neutral strategies (§6).

4 Assessing Bias

First accounts on gender bias in MT date back to
Frank et al. (2004). Their manual analysis pointed
out how English-German MT suffers from a dearth
of linguistic competence, as it shows severe diffi-
culties in recovering syntactic and semantic infor-
mation to correctly produce gender agreement.

Similar inquiries were conducted on other target grammatical gender languages for several commercial MT systems (Abu-Ayyash, 2017; Monti, 2017; Rescigno et al., 2020). While these studies focused on contrastive phenomena, Schiebinger (2014)9 went beyond linguistic insights, calling for a deeper understanding of gender bias. Her article on Google Translate's ''masculine default'' behavior emphasized how such a phenomenon is related to the larger issue of gender inequality, also perpetuated by socio-technical artifacts (Selbst et al., 2019). All in all, these qualitative analyses demonstrated that gender problems encompass all three MT paradigms (neural, statistical, and rule-based), preparing the ground for quantitative work.

To attest the existence and scale of gender bias
across several languages, dedicated benchmarks,
evaluations, and experiments have been designed.

9See also Schiebinger’s project Gendered Innovations:
http://genderedinnovations.stanford.edu
/case-studies/nlp.html.



Study                        | Benchmark                                     | Gender | Harms
(Prates et al., 2018)        | Synthetic, U.S. Bureau of Labor Statistics    | B      | R: under-rep, stereotyping
(Cho et al., 2019)           | Synthetic equity evaluation corpus (EEC)      | B      | R: under-rep, stereotyping
(Gonen and Webster, 2020)    | BERT-based perturbations on natural sentences | B      | R: under-rep, stereotyping
(Stanovsky et al., 2019)     | WinoMT                                        | B      | R: under-rep, stereotyping
(Vanmassenhove et al., 2018) | Europarl (generic)                            | B      | A: quality
(Hovy et al., 2020)          | Trustpilot (reviews with gender and age)      | B      | R: under-rep

Table 1: For each Study, the Table shows on which Benchmark gender bias is assessed, and how Gender is intended (here only in binary (B) terms). Finally, we indicate which (R)epresentational—under-representation and stereotyping—or (A)llocational Harm—as reduced quality of service—is addressed in the study.

We first discuss large scale analyses aimed at assessing gender bias in MT, grouped according to two main conceptualizations: i) works focusing on the weight of prejudices and stereotypes in MT (§4.1); and ii) studies assessing whether gender is properly preserved in translation (§4.2). In accordance with the human-centered approach embraced in this survey, in Table 1 we map each work to the harms (see §2) ensuing from the biased behaviors they assess. Finally, we review existing benchmarks for comparing MT performance across genders (§4.3).

4.1 MT and Gender Stereotypes

In MT, we record prior studies concerned with pronoun translation and coreference resolution across typologically different languages, accounting for both animate and inanimate referents (Hardmeier and Federico, 2010; Le Nagard and Koehn, 2010; Guillou, 2012). For the specific analysis of gender bias, instead, such tasks are exclusively studied in relation to human entities.

Prates et al. (2018) and Cho et al. (2019) design a similar setting to assess gender bias. Prates et al. (2018) investigate pronoun translation from 12 genderless languages into English. Retrieving ∼1,000 job positions from the U.S. Bureau of Labor Statistics, they build simple constructions like the Hungarian ''ő egy mérnök'' (''he/she is an engineer''). Following the same template, Cho et al. (2019) extend the analysis to Korean-English, including both occupations and sentiment words (e.g., kind). As their samples are ambiguous by design, the observed predictions of he/she pronouns should be random, yet they show a strong masculine skew.10

10Cho et al. (2019) highlight that a higher frequency of feminine references in the MT output does not necessarily imply a bias reduction. Rather, it may reflect gender stereotypes, as for hairdresser, which is skewed toward feminine. This observation points to the tension between frequency count, suitable for testing under-representation, and qualitative-oriented analysis on bias conceptualized in terms of stereotyping.
To further analyze the under-representation of she pronouns, Prates et al. (2018) focus on 22 macro-categories of occupation areas and compare the proportion of pronoun predictions against the real-world proportion of men and women employed in such sectors. In this way, they find that MT not only yields a masculine default, but also underestimates feminine frequency at a greater rate than occupation data alone suggest. Such an analysis starts by acknowledging pre-existing bias (see §3.2.1)—for example, low rates of women in STEM—to attest the existence of machine bias, and defines it as the exacerbation of actual gender disparities.
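To make this probing protocol concrete, the minimal sketch below builds ambiguous constructions of this kind, passes them to a placeholder translate() function standing in for any genderless-to-English MT system, and counts the pronouns produced. The occupation list and the translate() interface are illustrative assumptions, not the actual setup used by Prates et al. (2018) or Cho et al. (2019).

```python
import re
from collections import Counter

def translate(sentence: str, src: str = "hu", tgt: str = "en") -> str:
    """Placeholder for any MT system (a commercial API or a local model)."""
    raise NotImplementedError

# Illustrative subset; Prates et al. (2018) retrieve ~1,000 U.S. BLS job titles.
occupations_hu = ["mérnök", "ápoló", "tanár", "orvos"]

counts = Counter()
for job in occupations_hu:
    # "ő egy <job>" ~ "he/she is a <job>": Hungarian pronouns carry no gender.
    output = translate(f"ő egy {job}").lower()
    if re.search(r"\bhe\b", output):
        counts["masculine"] += 1
    elif re.search(r"\bshe\b", output):
        counts["feminine"] += 1
    else:
        counts["other"] += 1

# With truly ambiguous inputs a balanced split would be expected;
# a strong skew toward "masculine" signals the default documented above.
print(counts)
```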

Going beyond word lists and simple synthetic constructions, Gonen and Webster (2020) inspect the translation into Russian, Spanish, German, and French of natural yet ambiguous English sentences. Their analysis of the ratio and type of generated masculine/feminine job titles consistently exhibits social asymmetries for target grammatical gender languages (e.g., lecturer masculine vs. teacher feminine). Finally, Stanovsky et al. (2019) assess that MT is skewed to the point of actually ignoring explicit feminine gender information in source English sentences. For instance, MT systems yield a wrong masculine translation of the job title baker, although it is referred to by the pronoun she. Aside from overlooking overt gender mentions, the model's reliance on unintended (and irrelevant) cues for gender assignment is further confirmed by the



fact that adding a socially connoted (but formally epicene) adjective (the pretty baker) pushes models toward feminine inflections in translation.

We observe that the propagation of stereotypes is a widely researched form of gender asymmetries in MT, one that so far has been largely narrowed down to occupational stereotyping. After all, occupational stereotyping has been studied by different disciplines (Greenwald et al., 1998), attested across cultures (Lewis and Lupyan, 2020), and it can be easily detected in MT across multiple language directions with consistent results. Current research should not neglect other stereotyping dynamics, as in the case of Stanovsky et al. (2019) and Cho et al. (2019), who include associations to physical characteristics or psychological traits. Also, the intrinsically contextual nature of societal expectations advocates for the study of culture-specific dimensions of bias. Finally, we signal the BERT-based perturbation method by Webster et al. (2019), which identifies other bias-susceptible nouns that tend to be assigned to a specific gender (e.g., fighter as masculine). As Blodgett (2021) underscores, however, ''the existence of these undesirable correlations is not sufficient to identify them as normatively undesirable''. It should thus be investigated whether such statistical preferences can cause harms (e.g., by checking if they map to existing harmful associations or quality of service disparities).

4.2 MT and Gender Preservation

Vanmassenhove et al. (2018) and Hovy et al. (2020) investigate whether speakers' gender11 is properly reflected in MT. This line of research is preceded by findings on gender personalization of SMT (Mirkin et al., 2015; Bawden et al., 2016; Rabinovich et al., 2017), which claim that gender ''signals'' are weakened in translation.

Hovy et al. (2020) conjecture the existence of age and gender stylistic bias due to models' under-exposure to the writings of women and younger segments of the population. To test this hypothesis, they automatically translate a corpus of online reviews with available metadata about users (Hovy et al., 2015). Then, they compare such demographic information with the prediction of age and gender classifiers run on the

11Note that these studies distinguish speakers into female/male. As discussed in §3.1.3, we invite a reflection on the appropriateness and use of such categories.

MT output. Results indicate that different commercial MT models systematically make authors ''sound'' older and male. Their study thus concerns the under-representation of the language used ''by'' certain speakers and how it is perceived (Blodgett, 2021). However, the authors do not inspect which linguistic choices MT overproduces, nor which stylistic features may characterize different socio-demographic groups.

Still starting from the assumption that demographic factors influence language use, Vanmassenhove et al. (2018) probe MT's ability to preserve speakers' gender when translating from English into ten languages. To this aim, they develop gender-informed MT models (see §5.1) whose outputs are compared with those obtained by their baseline counterparts. Tested on a set for spoken language translation (Koehn, 2005), their enhanced models show consistent gains in terms of overall quality when translating into grammatical gender languages, where speakers' references are often marked. For instance, the French translation of ''I'm happy'' is either ''Je suis heureuse'' or ''Je suis heureux'' for a female/male speaker, respectively. Through a focused cross-gender analysis (carried out by splitting their English-French test set into 1st person male vs. female data) they assess that the largest margin of improvement for their gender-informed approach concerns sentences uttered by women, since the results of their baseline disclose a quality of service disparity in favor of male speakers. As well as morphological agreement, they also attribute such improvement to the fact that their enhanced model produces gendered preferences in other word choices. For example, it opts for think rather than believe, which is in concordance with corpus studies claiming a tendency for women to use less assertive speech (Newman et al., 2008). Note that the authors rely on manual analysis to ascribe performance differences to gender-related features. In fact, global evaluations on generic test sets alone are inadequate to pointedly measure gender bias.

4.3 Existing Benchmarks

MT outputs are typically evaluated against reference translations employing standard metrics such as BLEU (Papineni et al., 2002) or TER (Snover et al., 2006). This procedure poses two challenges. First, these metrics provide coarse-grained scores for translation quality, as they treat



all errors equally and are rather insensitive to specific linguistic phenomena (Sennrich, 2017). Second, generic test sets containing the same gender imbalance present in the training data can reward biased predictions. Here, we describe the publicly available MT Gender Bias Evaluation Testsets (GBETs) (Sun et al., 2019), that is, benchmarks designed to probe gender bias by isolating the impact of gender from other factors that may affect systems' performance. Note that different benchmarks and metrics respond to different conceptualizations of bias (Barocas et al., 2019). Common to them all in MT, however, is that biased behaviors are formalized by using some variants of averaged performance12 disparities across gender groups, comparing the accuracy of gender predictions on an equal number of masculine, feminine, and neutral references.

Escudé Font and Costa-jussà (2019) developed the bilingual English-Spanish Occupations test set. It consists of 1,000 sentences equally distributed across genders. The phrasal structure envisioned for their sentences is ''I've known {her|him|<proper noun>} for a long time, my friend works as {a|an} <occupation>''. The evaluation focuses on the translation of the noun friend into Spanish (amigo/a). Since gender information is present in the source context and sentences are the same for both masculine/feminine participants, an MT system exhibits gender bias if it disregards relevant context and cannot provide the correct translation of friend at the same rate across genders.

Stanovsky et al. (2019) created WinoMT by concatenating two existing English GBETs for coreference resolution (Rudinger et al., 2018; Zhao et al., 2018a). The corpus consists of 3,888 Winogradesque sentences presenting two human entities defined by their role and a subsequent pronoun that needs to be correctly resolved to one of the entities (e.g., ''The lawyer yelled at the hairdresser because he did a bad job''). For each sentence, there are two variants with either he or she pronouns, so as to cast the referred annotated entity (hairdresser) into a proto- or anti-stereotypical gender role. By translating WinoMT into grammatical gender languages, one can thus measure systems' ability to resolve the anaphoric relation and pick the correct feminine/masculine inflection for the occupational noun. On top of quantifying under-representation as the difference between the total amount of translated feminine and masculine references, the subdivision of the corpus into proto- and anti-stereotypical sets also allows verifying whether MT predictions correlate with occupational stereotyping.

12This is a value-laden option (Birhane et al., 2020), and not the only possible one (Mitchell et al., 2020). For a broader discussion on measurement and bias we refer the reader also to Jacobs (2021); Jacobs et al. (2020).
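To illustrate how this kind of benchmark can be operationalized, here is a minimal sketch assuming that, for each item, the gender realized in the translated occupation has already been extracted (e.g., with a morphological analyzer) and that each item carries a gold gender and a stereotypicality flag. The data structure, field names, and gap definitions are our illustrative assumptions, not the official WinoMT evaluation scripts.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    gold: str            # gold gender of the annotated entity: "m" or "f"
    predicted: str       # gender actually realized in the MT output
    stereotypical: bool  # True if the gold gender matches the occupational stereotype

def accuracy(items: List[Item]) -> float:
    return sum(i.gold == i.predicted for i in items) / len(items)

def report(items: List[Item]) -> None:
    masc = [i for i in items if i.gold == "m"]
    fem = [i for i in items if i.gold == "f"]
    pro = [i for i in items if i.stereotypical]
    anti = [i for i in items if not i.stereotypical]
    print(f"overall gender accuracy: {accuracy(items):.3f}")
    # Under-representation signal: masculine vs. feminine performance gap.
    print(f"masculine - feminine accuracy: {accuracy(masc) - accuracy(fem):.3f}")
    # Stereotyping signal: pro- vs. anti-stereotypical performance gap.
    print(f"pro - anti accuracy: {accuracy(pro) - accuracy(anti):.3f}")

# Toy usage with fabricated predictions, for illustration only.
report([
    Item("f", "m", stereotypical=False),
    Item("m", "m", stereotypical=True),
    Item("f", "f", stereotypical=True),
    Item("m", "m", stereotypical=False),
])
```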

Finally, Saunders et al. (2020) enriched the original version of WinoMT in two different ways. First, they included a third gender-neutral case based on the singular they pronoun, thus paving the way to account for non-binary referents. Second, they labeled the entity in the sentence which is not coreferent with the pronoun (the lawyer). The latter annotation is used to verify the shortcomings of some mitigating approaches, as discussed in §5.

The above-mentioned corpora are known as challenge sets, consisting of sentences created ad hoc for diagnostic purposes. As such, they can be used to quantify bias related to stereotyping and under-representation in a controlled environment. However, since they consist of a limited variety of synthetic gender-related phenomena, they hardly address the variety of challenges posed by real-world language and are relatively easy to overfit. As recognized by Rudinger et al. (2018), ''they may demonstrate the presence of gender bias in a system, but not prove its absence''.

The Arabic Parallel Gender Corpus (Habash et al., 2019) includes an English-Arabic test set13 retrieved from OpenSubtitles natural language data (Lison and Tiedemann, 2016). Each of the 2,448 sentences in the set exhibits a first person singular reference to the speaker (e.g., ''I'm rich''). Among them, ∼200 English sentences require gender agreement to be assigned in translation. These were translated into Arabic in both gender forms, obtaining a quantitatively and qualitatively equal amount of sentence pairs with annotated masculine/feminine references. This natural corpus thus allows for cross-gender evaluations on MT production of correct speaker's gender agreement.

MuST-SHE (Bentivogli et al., 2020) is a natural benchmark for three language pairs (English-French/Italian/Spanish). Built on TED talks data (Cattoni et al., 2021), for each language pair it comprises ∼1,000 (audio, transcript, translation) triplets, thus allowing evaluation for both MT and speech translation (ST).

13Overall, the corpus comprises over 12,000 annotated sentences and 200,000 synthetic sentences.



Approach                        | Authors                       | Benchmark                        | Gender | Harms
Gender tagging (sentence-level) | Vanmassenhove et al.          | Europarl (generic)               | B      | R: under-rep, A: quality
                                | Elaraby et al.                | Open subtitles (generic)         | B      | R: under-rep, A: quality
Gender tagging (word-level)     | Saunders et al.               | expanded WinoMT                  | nb     | R: under-rep, stereotyping
                                | Stafanovičs et al.            | WinoMT                           | B      | R: under-rep, stereotyping
Adding context                  | Basta et al.                  | WinoMT                           | B      | R: under-rep, stereotyping
Word-embeddings                 | Escudé Font and Costa-jussà   | Occupation test set              | B      | R: under-rep
Fine-tuning                     | Costa-jussà and de Jorge      | WinoMT                           | B      | R: under-rep, stereotyping
Black-box injection             | Moryossef et al.              | Open subtitles (selected sample) | B      | R: under-rep, A: quality
Lattice-rescoring               | Saunders and Byrne            | WinoMT                           | B      | R: under-rep, stereotyping
Re-inflection                   | Habash et al.; Alhafni et al. | Arabic Parallel Gender Corpus    | B      | R: under-rep, A: quality

Table 2: For each Approach and related Authors, the Table shows on which Benchmark it is tested, and if Gender is intended in binary terms (B), or including non-binary (nb) identities. Finally, we indicate which (R)epresentational—under-representation and stereotyping—or (A)llocational Harm—as reduced quality of service—the approach attempts to mitigate.

Its samples are balanced between masculine and feminine phenomena, and incorporate two types of constructions: i) sentences referring to the speaker (e.g., ''I was born in Mumbai''), and ii) sentences that present contextual information to disambiguate gender (e.g., ''My mum was born in Mumbai''). Since every gender-marked word in the target language is annotated in the corpus, MuST-SHE grants the advantage of complementing BLEU- and accuracy-based evaluations on gender translation for a great variety of phenomena.

Unlike challenge sets, natural corpora quantify whether MT yields reduced feminine representation in authentic conditions and whether the quality of service varies across speakers of different genders. However, as they treat all gender-marked words equally, it is not possible to identify whether the model is propagating stereotypical representations.

All in all, we stress that each test set and metric is only a proxy for framing a phenomenon or an ability (e.g., anaphora resolution), and an approximation of what we truly intend to gauge. Thus, as we discuss in §6, advances in MT should account for the observation of gender bias in real-world conditions to avoid a situation in which achieving high scores on a mathematically formalized estimation could lead to a false sense of security. Still, benchmarks remain valuable tools to monitor models' behavior. As such, we remark that evaluation procedures ought to cover both models' general performance and gender-related issues. This is crucial to establish the capabilities and limits of mitigating strategies.

5 Mitigating Bias

To attenuate gender bias in MT, different strategies dealing with input data, learning algorithms, and model outputs have been proposed. As attested by Birhane et al. (2020), since advancements are oftentimes exclusively reported in terms of values internal to the machine learning field (e.g., efficiency, performance), it is not clear how such strategies are meeting societal needs by reducing MT-related harms. In order to conciliate technical perspectives with the intended social purpose, in Table 2 we map each mitigating approach to the harms (see §2) they are meant to alleviate, as well as to the benchmark their effectiveness is evaluated against. Complementarily, we hereby describe each approach by means of two categories: model debiasing (§5.1) and debiasing through external components (§5.2).

5.1 Model Debiasing

This line of work focuses on mitigating gender bias
through architectural changes of general-purpose
MT models or via dedicated training procedures.
Gender Tagging. To improve the generation of speaker's referential markings, Vanmassenhove et al. (2018) prepend a gender tag (M or F) to each source sentence, both at training and inference time. As their model is able to leverage this additional information, the approach proves useful to handle morphological agreement when translating from English into French. However, this solution requires additional metadata regarding the speakers' gender that might not always be feasible to obtain. Automatic annotation of speakers' gender (e.g., based on first names) is not advisable,



as it runs the risk of introducing additional bias by making unlicensed assumptions about one's identity.
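As a minimal sketch of what sentence-level tagging amounts to in practice (the tag format, data layout, and helper names below are illustrative assumptions, not the authors' actual pipeline), training sources can be prefixed with a gender pseudo-token as follows:

```python
from typing import Iterable, Iterator, Tuple

def add_gender_tag(source: str, speaker_gender: str) -> str:
    """Prepend a pseudo-token encoding the speaker's gender to the source sentence."""
    tag = {"female": "<F>", "male": "<M>"}[speaker_gender]
    return f"{tag} {source}"

def tag_corpus(triplets: Iterable[Tuple[str, str, str]]) -> Iterator[Tuple[str, str]]:
    """triplets: (source, target, speaker_gender) tuples with available metadata."""
    for src, tgt, gender in triplets:
        yield add_gender_tag(src, gender), tgt

# The same tag must be supplied at inference time, e.g.:
print(add_gender_tag("I'm happy", "female"))  # -> "<F> I'm happy"
```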

Elaraby et al. (2018) bypass this risk by defining a comprehensive set of cross-lingual gender agreement rules based on POS tagging. In this way, they identify speakers' and listeners' gender references in an English-Arabic parallel corpus, which is consequently labeled and used for training. The idea, originally developed for spoken language translation in a two-way conversational setting, can be adapted for other languages and scenarios by creating new dedicated rules. However, in realistic deployment conditions where reference translations are not available, gender information still has to be externally supplied as metadata at inference time.

Stafanoviˇcs et al. (2020) and Saunders et al.
(2020) explore the use of word-level gender tags.
While Stafanoviˇcs et al. (2020) just report a gender
translation improvement, Saunders et al. (2020)
rely on the expanded version of WinoMT to iden-
tify a problem concerning gender tagging: It intro-
duces noise if applied to sentences with references
to multiple participants, as it pushes their transla-
tion toward the same gender. Saunders et al. (2020)
also include a first non-binary exploration of neu-
tral translation by exploiting an artificial dataset,
where neutral tags are added and gendered inflec-
tions are replaced by placeholders. The results
are inconclusive, however, most likely due to the
small size and synthetic nature of their dataset.

Adding Context. Without further information needed for training or inference, Basta et al. (2020) adopt a generic approach and concatenate each sentence with its preceding one. By providing more context, they attest a slight improvement in gender translations requiring anaphoric coreference to be solved in English-Spanish. This finding motivates exploration at the document level, but it should be validated with manual (Castilho et al., 2020) and interpretability analyses, since the added context can be beneficial for gender-unrelated reasons, such as acting as a regularization factor (Kim et al., 2019).

Debiased Word Embeddings. The two above-mentioned mitigations share the same intent: supply the model with additional gender knowledge. Instead, Escudé Font and Costa-jussà (2019) leverage pre-trained word embeddings, which are debiased by using the hard-debiasing method proposed by Bolukbasi et al. (2016) or the GN-GloVe algorithm (Zhao et al., 2018b). These methods respectively remove gender associations or isolate them from the representations of English gender-neutral words. Escudé Font and Costa-jussà (2019) employ such embeddings on the decoder side, the encoder side, and both sides of an English-Spanish model. The best results are obtained by leveraging GN-GloVe embeddings on both encoder and decoder sides, increasing BLEU scores and gender accuracy. The authors generically apply debiasing methods developed for English also to their target language. However, with Spanish being a grammatical gender language, other language-specific approaches should be considered to preserve the quality of the original embeddings (Zhou et al., 2019; Zhao et al., 2020). We also stress that it is debatable whether depriving systems of some knowledge and diminishing their perceptions is the right path toward fairer language models (Dwork et al., 2012; Caliskan et al., 2017; Gonen and Goldberg, 2019; Nissim and van der Goot, 2020). Also, Goldfarb-Tarrant et al. (2020) find that there is no reliable correlation between intrinsic evaluations of bias in word-embeddings and cascaded effects on MT models' biased behavior.

Balanced Fine-tuning. Costa-jussà and de Jorge (2020) rely on Gebiotoolkit (Costa-jussà et al., 2020c) to build gender-balanced datasets (i.e., featuring an equal amount of masculine/feminine references) based on Wikipedia biographies. By fine-tuning their models on such natural and more even data, the generation of feminine forms is overall improved. However, the approach is not as effective for gender translation on the anti-stereotypical WinoMT set. As discussed in §3.2.2, they employ a straightforward method that aims to increase the number of Wikipedia pages covering women in their training data. However, such a coverage increase does not mitigate stereotyping harms, as it does not account for the qualitatively different ways in which men and women are portrayed (Wagner et al., 2015).

5.2 Debiasing through External Components

Instead of directly debiasing the MT model, these mitigating strategies intervene in the inference phase with external dedicated components. Such approaches do not imply retraining, but introduce



the additional cost of maintaining separate modules and handling their integration with the MT model.

Black-box Injection. Moryossef et al. (2019) attempt to control the production of feminine references to the speaker and numeral inflections (plural or singular) for the listener(s) in an English-Hebrew spoken language setting. For this purpose, they rely on a short construction, such as ''she said to them'', which is prepended to the source sentence and then removed from the MT output. Their approach is simple, it can handle two types of information (gender and number) for multiple entities (speaker and listener), and it improves systems' ability to generate feminine target forms. However, as in the case of Vanmassenhove et al. (2018) and Elaraby et al. (2018), it requires metadata about speakers and listeners.
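A minimal sketch of this black-box strategy could look as follows; the prefix wording, the translate() stub, and the naive prefix-stripping heuristic are simplifications for illustration, not the exact procedure of Moryossef et al. (2019).

```python
def translate(sentence: str) -> str:
    """Placeholder for a black-box English-Hebrew MT system."""
    raise NotImplementedError

# Short constructions conveying the speaker's gender (and listeners' number);
# only the speaker-gender dimension is sketched here.
PREFIXES = {"female": "she said to them:", "male": "he said to them:"}

def gender_aware_translate(source: str, speaker_gender: str) -> str:
    output = translate(f"{PREFIXES[speaker_gender]} {source}")
    # Naively drop the translated prefix up to the colon (a simplification).
    return output.split(":", 1)[-1].strip()
```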

Lattice Re-scoring. Saunders and Byrne (2020) propose to post-process the MT output with a lattice re-scoring module. This module exploits a transducer to create a lattice by mapping gender-marked words in the MT output to all their possible inflectional variants. Developed for German, Spanish, and Hebrew, all the sentences corresponding to the paths in the lattice are re-scored with another model, which has been gender-debiased but at the cost of lower generic translation quality. Then, the sentence with the highest probability is picked as the final output. When tested on WinoMT, such an approach leads to an increase in the accuracy of gender forms selection. Note that the gender-debiased system is created by fine-tuning the model on an ad hoc built tiny set containing a balanced number of masculine/feminine forms. Such a practice, also known as counterfactual data augmentation (Lu et al., 2020), requires one to create identical pairs of sentences differing only in terms of gender references. In practice, Saunders and Byrne (2020) compile English sentences following this schema: ''The <profession> finished <his|her> work''. Then, the sentences are automatically translated and manually checked. In this way, they obtain a gender-balanced parallel corpus. Hence, to implement their method for other language pairs, the generation of new data is necessary. For the fine-tuning set, the effort required is limited, as the goal is to alleviate stereotypes by focusing on a pre-defined occupational lexicon. However, data augmentation is very demanding for complex sentences that represent a rich variety of gender agreement phenomena14 such as those occurring in natural language scenarios.
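As an illustration of this schema, the snippet below generates the kind of English sentence pairs differing only in gender that such fine-tuning relies on; the profession list is a placeholder, and in the actual approach the sentences are then machine-translated and manually checked to obtain the parallel side.

```python
PROFESSIONS = ["doctor", "engineer", "nurse", "cleaner"]  # illustrative subset

def counterfactual_pairs(professions):
    """Yield English sentence pairs that differ only in the gendered pronoun."""
    for job in professions:
        yield (f"The {job} finished his work.",
               f"The {job} finished her work.")

for masc, fem in counterfactual_pairs(PROFESSIONS):
    print(masc, "|", fem)
```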

Gender Re-inflection. Habash et al. (2019) and Alhafni et al. (2020) confront the problem of speaker's gender agreement in Arabic with a post-processing component that re-inflects first person references into masculine/feminine forms. In Alhafni et al. (2020), the preferred gender of the speaker and the translated Arabic sentence are fed to the component, which re-inflects the sentence in the desired form. In Habash et al. (2019) the component can be: i) a two-step system that first identifies the gender of first person references in an MT output, and then re-inflects them in the opposite form; or ii) a single-step system that always produces both forms from an MT output. Their method does not necessarily require speakers' gender information: If metadata are supplied, the MT output is re-inflected accordingly; otherwise, both feminine/masculine inflections are offered (leaving to the user the choice of the appropriate one). The implementation of the re-inflection component was made possible by the Arabic Parallel Gender Corpus (see §4.3), which demanded an expensive work of manual data creation. However, such a corpus grants research on English-Arabic the benefits of a wealth of gender-informed natural language data that have been curated to avoid hetero-centrist interpretations and preconceptions (e.g., proper names and speakers of sentences like ''that's my wife'' are flagged as gender-ambiguous). Along the same line, Google Translate also delivers two outputs for short gender-ambiguous queries (Johnson, 2020b). Among languages with grammatical gender, the service is currently available only for English-Spanish.

In light of the above, we remark that there is no conclusive state-of-the-art method for mitigating bias. The discussed interventions in MT tend to respond to specific aspects of the problem with modular solutions, but if and how they can be integrated within the same MT system remains unexplored. As we have discussed throughout the survey, the umbrella term ''gender bias'' refers to a wide array of undesirable phenomena. Thus, it is unlikely that a one-size-fits-all solution will be

able to tackle problems that differ from one another, as they depend on, for
instance, how bias is conceptualized, the language combinations, and the kinds
of corpora used. Thus, we believe that generalization and scalability should not
be the only criteria against which mitigating strategies are valued. In turn, we
should make room for openly context-aware interventions. Finally, gender bias in
MT is a socio-technical problem. We thus highlight that engineering
interventions alone are not a panacea (Chang, 2019) and should be integrated
with long-term multidisciplinary commitment and practices (D'Ignazio and Klein,
2020; Gebru, 2020) necessary to address bias in our community, hence in its
artifacts, too.

6 Conclusion and Key Challenges

Studies confronting gender bias in MT are rapidly
新兴的; in this paper we presented them within
a unified framework to critically overview cur-
rent conceptualizations and approaches to the
problem. Since gender bias is a multifaceted and
interdisciplinary issue, in our discussion we inte-
grated knowledge from related disciplines, which
can be instrumental to guide future research and
make it thrive. We conclude by suggesting several
directions that can help this field move forward.

Model De-biasing. Neural networks rely on easy-to-learn shortcuts or ''cheap
tricks'' (Levesque, 2014), as picking up on spurious correlations offered by
training data can be easier for machines than learning to actually solve a
specific task. What is ''easy to learn'' for a model depends on the inductive
bias (Sinz et al., 2019; Geirhos et al., 2020) resulting from architectural
choices, training data, and learning rules. We think that explainability
techniques (Belinkov et al., 2020) represent a useful tool to identify spurious
cues (features) exploited by the model during inference. Discerning them can
provide the research community with guidance on how to improve models'
generalization by working on data, architecture, loss functions, and
optimization. For instance, data responsible for spurious features (e.g.,
stereotypical correlations) might be recognized and their weight at training
time might be lowered (Karimi Mahabadi et al., 2020). Furthermore,
state-of-the-art architectural choices and algorithms in MT have mostly been
studied in terms of overall translation quality, without specific analyses
regarding gender translation. For instance, current systems segment text into
subword units with statistical methods that can break the morphological
structure of words, thus losing relevant semantic and syntactic information in
morphologically rich languages (Niehues et al., 2016; Ataman et al., 2017).
Several languages show complex feminine forms, typically derivative and created
by adding a suffix to the masculine form, such as Lehrer/Lehrerin (de) or
studente/studentessa (it). It would be relevant to investigate whether, compared
to other segmentation techniques, statistical approaches disadvantage (rarer and
more complex) feminine forms. The MT community should not overlook focused
hypotheses of such kind, as they can deepen our comprehension of the gender bias
conundrum.
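
As a toy illustration of this hypothesis, the sketch below applies a greedy
longest-match segmentation against a fixed subword vocabulary in which the
frequent masculine bases survive as single units while the rarer feminine
derivatives are split into several pieces. The vocabulary is an assumption
chosen only to make the asymmetry visible; it is not the output of an actual
statistical segmentation model trained on corpus frequencies.

# Toy longest-match segmenter over an assumed subword vocabulary; real
# systems (e.g., BPE) learn such vocabularies from corpus statistics.
VOCAB = {"lehrer", "studente", "in", "ss", "a"}

def greedy_segment(word: str, vocab=VOCAB):
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                               # fall back to a single character
            pieces.append(word[i])
            i += 1
    return pieces

for word in ["lehrer", "lehrerin", "studente", "studentessa"]:
    print(word, "->", greedy_segment(word))

Under this assumed vocabulary, lehrer and studente remain whole, whereas
lehrerin and studentessa are broken into two and three pieces respectively,
mimicking how rarer derivative forms can end up more heavily fragmented.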

Non-textual Modalities. Gender bias for non-textual automatic translations
(e.g., audiovisual) has been largely neglected. In this sense, ST represents a
small niche (Costa-jussà et al., 2020a). For the translation of speaker-related
gender phenomena, Bentivogli et al. (2020) prove that direct ST systems exploit
the speaker's vocal characteristics as a gender cue to improve feminine
translation. However, as addressed by Gaido et al. (2020), relying on physical
gender cues (e.g., pitch) for such a task implies reductionist gender
classifications (Zimman, 2020), making systems potentially harmful for a diverse
range of users. Similarly, although image-guided translation has been claimed
useful for gender translation since it relies on visual inputs for
disambiguation (Frank et al., 2018; Ive et al., 2019), it could bend toward
stereotypical assumptions about appearance. Further research should explore such
directions to identify potential challenges and risks, by drawing on bias in
image captioning (van Miltenburg, 2019) and consolidated studies from the fields
of automatic gender recognition and human–computer interaction (HCI) (Hamidi
et al., 2018; Keyes, 2018; May, 2019).

Beyond Dichotomies. Besides a few notable exceptions for English NLP tasks
(Manzini et al., 2019; Cao and Daumé III, 2020; Sun et al., 2021) and one in MT
(Saunders et al., 2020), the discussion around gender bias has been reduced to
the binary masculine/feminine dichotomy. Although research in this direction is
currently hampered by the absence of data, we invite considering inclusive
solutions and exploring nuanced dimensions of gender. Starting from language
practices, Indirect Non-binary Language (INL) overcomes
gender specifications (e.g., using service, humankind rather than
waiter/waitress or mankind).15 Although more challenging, INL can be achieved
also for grammatical gender languages (Motschenbacher, 2014; Lindqvist et al.,
2019), and it is endorsed for official EU documents (Papadimoulis, 2018).
Accordingly, MT models could be brought to avoid binary forms and move toward
gender-unspecified solutions; for instance, adversarial networks including a
discriminator that classifies the speaker's linguistic expression of gender
(masculine or feminine) could be employed to ''neutralize'' speaker-related
forms (Li et al., 2018; Delobelle et al., 2020). In turn, Direct Non-binary
Language (DNL) aims at increasing the visibility of non-binary individuals via
neologisms and neomorphemes (Bradley et al., 2019; Papadopoulos, 2019; Knisely,
2020). With DNL starting to circulate (Shroy, 2016; Santiago, 2018; López,
2019), the community is presented with the opportunity to promote the creation
of inclusive data.

15 INL suggestions have also been recently implemented within Microsoft text
editors (Langston, 2020).
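
The sketch below gives one possible shape, in PyTorch, for the adversarial idea
mentioned above: a gradient-reversal layer lets a discriminator learn to predict
the speaker's linguistic expression of gender from encoder states while the
encoder is simultaneously pushed toward representations from which that
information cannot be recovered. The architecture, dimensions, and training
recipe are illustrative assumptions, not the actual setups of Li et al. (2018)
or Delobelle et al. (2020).

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class GenderDiscriminator(nn.Module):
    """Predicts masculine/feminine speaker expression from pooled encoder states."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 2)
        )

    def forward(self, encoder_states: torch.Tensor, lambd: float = 1.0):
        pooled = encoder_states.mean(dim=1)            # (batch, hidden_dim)
        reversed_feats = GradReverse.apply(pooled, lambd)
        return self.classifier(reversed_feats)

# Hypothetical training step: add the discriminator loss to the usual
# translation loss so the encoder is trained to make gender unrecoverable.
# total_loss = translation_loss + ce_loss(discriminator(encoder_states), gender_labels)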

Finally, as already highlighted in legal and social science theory,
discrimination can arise from the intersection of multiple identity categories
(e.g., race and gender) (Crenshaw, 1989), which are not additive and cannot
always be detected in isolation (Schlesinger et al., 2017). Following the MT
work by Hovy et al. (2020), as well as other intersectional analyses from NLP
(Herbelot et al., 2012; Jiang and Fellbaum, 2020) and AI-related fields
(Buolamwini and Gebru, 2018), future studies may account for the interaction of
gender attributes with other sociodemographic classes.

Human-in-the-Loop. Research on gender bias in MT is still restricted to lab
tests. As such, unlike other studies that rely on participatory design (Turner
et al., 2015; Cercas Curry et al., 2020; Liebling et al., 2020), the advancement
of the field is not measured with people's experience in focus or in relation to
specific deployment contexts. However, these are fundamental considerations to
guide the field forward and, as HCI studies show (Vorvoreanu et al., 2019), to
propel the creation of gender-inclusive technology. In particular,
representational harms are intrinsically difficult to estimate, and available
benchmarks only provide a rough idea of their extent. This is an argument in
favor of focused studies16 on their individual or aggregate effects in everyday
life. Also, we invite the whole development process to be paired with bias-aware
research methodology (Havens et al., 2020) and HCI approaches (Stumpf et al.,
2020), which can help to operationalize sensitive attributes like gender (Keyes
et al., 2021). Finally, MT is not only built for people, but also by people.
Thus, it is vital to reflect on the implicit biases and backgrounds of the
people involved in MT pipelines at all stages and how they could be reflected in
the model. This means starting from bottom-level countermeasures, engaging with
translators (De Marco and Toto, 2019; Lessinger, 2020) and annotators (Waseem,
2016; Geva et al., 2019), and considering everyone's subjective positionality
and, crucially, also the lack of diversity within technology teams (Schluter,
2018; Waseem et al., 2020).

16 To the best of our knowledge, the Gender-Inclusive Language Models Survey is
the first project of this kind that includes MT. At the time of writing it is
available at:
https://docs.google.com/forms/d/e/1FAIpQLSfKenp4RKtDhKA0WLqPflGSBV2VdBA9h3F8MwqRex4kiCf9Q/viewform.

Acknowledgments

We would like to thank the anonymous reviewers
and the TACL Action Editors. Their insight-
ful comments helped us improve on the current
version of the paper.

参考

Emad A. S. Abu-Ayyash. 2017. Errors and
non-errors in english-arabic machine transla-
tion of gender-bound constructs in technical
文本. Procedia Computer Science, 117:73–80.
https://doi.org/10.1016/j.procs
.2017.10.095

Bashar Alhafni, Nizar Habash, and Houda
Bouamor. 2020. Gender-aware reinflection
using linguistically enhanced neural models.
In Proceedings of the Second Workshop on
Gender Bias in Natural Language Processing,
pages 139–150, 在线的. Association for Com-
putational Linguistics.

Duygu Ataman, Matteo Negri, Marco Turchi,
and Marcello Federico. 2017. Linguistically

16据我们所知, the Gender-Inclusive
Language Models Survey is the first project of this kind
that includes MT. At time of writing it is available at:
https://docs.google.com/forms/d/e/1FAIpQL
SfKenp4RKtDhKA0WLqPflGSBV2VdBA9h3F8MwqRex
4kiCf9Q/viewform.

858

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

motivated vocabulary reduction for neural
machine translation from Turkish to English.
The Prague Bulletin of Mathematical Linguis-
抽动症, 108(1)331–342. https://doi.org/10
.1515/pralin-2017-0031

David Bamman, Jacob Eisenstein, and Tyler
Schnoebelen. 2014. Gender identity and lexical
variation in social media. Journal of Socio-
语言学, 18(2):135–160. https://土井
.org/10.1111/josl.12080

Solon Barocas, Moritz Hardt,

and Arvind
Narayanan. 2019. Fairness and Machine
http://万维网
学习.
.fairmlbook:org.

fairmlbook.org.

Christine Basta, Marta R. Costa-juss`a, and Jos´e
A. 右. Fonollosa. 2020. Towards mitigating
gender bias in a decoder-based neural machine
在-
translation model by adding contextual
形成. In Proceedings of the The Fourth
Widening Natural Language Processing Work-
店铺, pages 99–102, Seattle, 美国. 协会
for Computational Linguistics. https://
doi.org/10.18653/v1/2020.winlp-1
.25

Rachel Bawden, Guillaume Wisniewski, 和
H´el`ene Maynard. 2016. Investigating gender
adaptation for speech translation. In Proceed-
ings of the 23`eme Conf´erence sur le Traitement
Automatique des Langues Naturelles, 体积 2,
pages 490–497, 巴黎, FR.

Yonatan Belinkov, Nadir Durrani, Fahim
Dalvi, Hassan Sajjad, and James Glass.
2020. 在
representational
语言学的
power of neural machine translation mod-
这. 计算语言学, 46(1):1–52.
https://doi.org/10.1162/coli a
00367

Emily M. Bender. 2019. A typology of ethical risks
in language technology with an eye towards
where transparent documentation might help. 在
CRAASH. The future of Artificial Intelligence:
语言, 伦理, 技术. 剑桥,
英国.

Emily M. Bender and Batya Friedman. 2018. 数据
语言处理:
statements for natural
Toward mitigating system bias and enabling
better science. Transactions of the Association

859

for Computational Linguistics, 6:587–604.
https://doi.org/10.1162/tacl
00041

Emily M. Bender, Timnit Gebru, Angelina
McMillan-Major, and Shmargaret Shmitchell.
2021. On the dangers of stochastic parrots:
Can language models be

Proceedings of the Conference on Fairness,
Accountability, and Transparency (FAccT ’21),
pages 610–623, 在线的. ACM. https://
doi.org/10.1145/3442188.3445922

too big?

Luisa Bentivogli, Beatrice Savoldi, Matteo Negri,
Mattia A. Di Gangi, Roldano Cattoni, 和
Marco Turchi. 2020. Gender in danger? Eval-
uating speech translation technology on the
MuST-SHE Corpus. In Proceedings of the 58th
Annual Meeting of the Association for Compu-
tational Linguistics, pages 6923–6933, 在线的.
计算语言学协会.
https://doi.org/10.18653/v1/2020
.acl-main.619

Victoria L. Bergvall, Janet M. Bing, and Alice F.
Freed. 1996. Rethinking Language and Gender
研究: Theory and Practice. 伦敦, 英国.
Addison Wesley Longman.

Camiel J. Beukeboom and Christian Burgers.
2019. How stereotypes are shared through
语言: A Review and Introduction of the
and Stereotypes Com-
Social Categories
框架. Review of
munication (SCSC)
Communication Research, 7:1–37.

Abeba Birhane, Pratyusha Kalluri, Dallas Card,
William Agnew, Ravit Dotan, and Michelle
Bao. 2020. The underlying values of machine
learning research. In Resistance AI Workshop
@ NeurIPS, 在线的.

Su Lin Blodgett. 2021. Sociolinguistically Driven
Just Natural Language

为了
Approaches
加工. Doctoral Dissertation.

Su Lin Blodgett, Solon Barocas, Hal Daum´e III,
and Hanna Wallach. 2020. 语言 (techno-
logy) is power: A critical survey of ‘‘bias’’
in NLP. In Proceedings of the 58th Annual
the Association for Computa-
Meeting of
tional Linguistics, pages 5454–5476, 在线的.
计算语言学协会.

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou,
Venkatesh Saligrama, and Adam T. Kalai. 2016.
Man is to computer programmer as woman is to
homemaker? Debiasing word embeddings. 在
Proceedings of the 30th Conference on Neural
Information Processing Systems (NIPS 2016),
体积 29, pages 4349–4357, 巴塞罗那, ES.
柯伦联合公司, Inc.

David Bourguignon, Vincent Y. Yzerbyt, Catia P.
Teixeira, and Ginette Herman. 2015. 什么时候
does it hurt? Intergroup permeability mode-
rates the link between discrimination and
self-esteem. European Journal of Social
心理学, 45(1):3–9. https://doi.org
/10.1002/ejsp.2083

Evan D. Bradley, Julia Salkind, Ally Moore,
and Sofi Teitsort. 2019. Singular ‘they’ and
novel pronouns: gender-neutral, nonbinary,
the Linguistic So-
或两者? 会议记录
ciety of America, 4(1):36–1. https://土井
.org/10.3765/plsa.v4i1.4542

Friederike Braun. 2000. Geschlecht im T¨urkischen:
Untersuchungen zum sprachlichen Umgang mit
einer sozialen Kategorie, Turcologica Series.
Otto Harrassowitz Verlag, Wiesbaden, DE.

Sheila Brownlow,

朱莉A. Rosamond, 和
Jennifer A. 派克. 2003. Gender-linked lin-
guistic behavior in television interviews. 性别
Roles, 49(3–4):121–132. https://doi.org
/10.1023/A:1024404812972

商业的

Intersectional

Joy Buolamwini and Timnit Gebru. 2018. 性别
accuracy disparities
shades:


classification.
性别
Proceedings of the 1st Conference on Fairness,
Accountability and Transparency, 体积 81
of Proceedings of Machine Learning Research,
pages 77–91, 纽约, 美国. PMLR.

Judith Butler. 1990. Gender Trouble: Feminism
and the Subversion of Identity. 劳特利奇,
纽约, 美国.

Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan. 2017. Semantics derived automat-
ically from language corpora contain human-
356(6334):183–186.
喜欢
https://doi.org/10.1126/science
.aal4230, 考研: 28408601

科学,

biases.

Deborah Cameron. 2003. 性别


language change. Annual Review of Applied
语言学, 23:187–201. https://doi.org
/10.1017/S0267190503000266

问题

Alex Campolo, Madelyn R. Sanfilippo, Meredith
Whittaker, and Kate Crawford. 2017. AI Now
报告 2017. 纽约: AI Now Institute.

Yang T. Cao and Hal Daum´e III. 2020. 走向
gender-inclusive coreference resolution.

Proceedings of the 58th Annual Meeting of
the Association for Computational Linguistics,
pages 4568–4595, 在线的. 协会
计算语言学.

Sheila Castilho, Maja Popovi´c, and Andy Way.
2020. On context span needed for machine
translation evaluation. 在诉讼程序中
12th Language Resources and Evaluation
会议, pages 3735–3742, Marseille, FR.
European Language Resources Association.

Roldano Cattoni, Mattia A. Di Gangi, Luisa
Bentivogli, Matteo Negri, and Marco Turchi.
2021. MuST-C: A multilingual corpus for end-
to-end speech translation. Computer Speech
& 语言, 66:101155. https://土井
.org/10.1016/j.csl.2020.101155

Amanda Cercas Curry, Judy Robertson, 和
Verena Rieser. 2020. Conversational assistants
and gender stereotypes: Public perceptions
and desiderata for voice personas. In Pro-
ceedings of
the Second Workshop on Gen-
der Bias in Natural Language Processing,
pages 72–78, 在线的. Association for Com-
putational Linguistics.

Lori Chamberlain. 1988. 性别


metaphorics of translation. Signs: 杂志
Women in Culture and Society, 13(3):454–472.
https://doi.org/10.1086/494428

Kai-Wei Chang. 2019. Bias and fairness in natural
语言处理. Tutorial at
这 2019
Conference on Empirical Methods in Natural
语言处理 (EMNLP).

Won Ik Cho, Ji Won Kim, Seok Min Kim,
and Nam Soo Kim. 2019. On measuring
gender bias in translation of gender-neutral
pronouns. In Proceedings of the First Work-
shop on Gender Bias in Natural Language

860

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

加工, pages 173–181, Florence,
它.
计算语言学协会.
https://doi.org/10.18653/v1/W19
-3824

In Proceedings of the 12th Language Resources
and Evaluation Conference, pages 4081–4088,
Marseille, FR. European Language Resources
协会.

Aleksandra Cislak, Magdalena Formanowicz, 和
Tamar Saguy. 2018. Bias against research on
gender bias. Scientometrics, 115(1):189–200.

Bernard Comrie. 1999. Grammatical gender sys-
特姆斯: A linguist’s assessment. Journal of Psy-
cholinguistic Research, 28:457–466. https://
doi.org/10.1023/A:1023212225540

Kirby Conrod. 2020. Pronouns and gender in
语言. The Oxford Handbook of Language
and Sexuality. https://doi.org/10.1093
/oxfordhb/9780190212926.013.63

Greville G. Corbett. 1991. 性别. 剑桥
Textbooks in Linguistics. 剑桥大学-
城市出版社, 剑桥, 英国.

Greville G. Corbett. 2013. The Expression of

性别. De Gruyter Mouton, 柏林, DE.

Marta R. Costa-juss`a. 2019. An analysis of gen-
der bias studies in natural language process-
英. Nature Machine Intelligence, 1:495–496.
https://doi.org/10.1038/s42256
-019-0105-5

Marta R. Costa-juss`a, Christine Basta, 和
Gerard I. G´allego. 2020A. Evaluating gender
bias in speech translation. arXiv 预印本
arXiv:2010.14465.

Marta R. Costa-juss`a and Adri`a de Jorge. 2020.
Fine-tuning neural machine translation on
gender-balanced datasets. 在诉讼程序中
Second Workshop on Gender Bias in Natural
语言处理, pages 26–34, 在线的.
计算语言学协会.

Marta R. Costa-juss`a, Carlos Escolano, Christine
Basta, Javier Ferrando, Roser Batlle, 和
Ksenia Kharitonova. 2020乙. Gender bias
in multilingual neural machine translation:
architecture matters. arXiv 预印本

arXiv:2012.13176.

Marta R. Costa-juss`a, Pau Li Lin,


Cristina Espa˜na-Bonet. 2020C. GeBioToolkit:
Automatic
gender-balanced
multilingual corpus of Wikipedia biographies.

extraction

Colette G. Craig. 1994, Classifier languages.
In Ronald E. 亚瑟 & 詹姆斯·M. 是.
辛普森, 编辑, The Encyclopedia of Language
and Linguistics, 体积 2, pages 565–569.
Pergamon Press, 牛津, 英国.

Kate Crawford. 2017. The trouble with bias. 在
Conference on Neural Information Processing
系统 (NIPS) – Keynote, Long Beach, 美国.

Kimberl´e Crenshaw. 1989. Demarginalizing the
intersection of race and sex: A Black feminist
critique of antidiscrimination doctrine, feminist
theory and antiracist politics. 大学
Chicago Legal Forum, 1989:139–167.

Caroline Criado-Perez. 2019. Invisible Women:
Exposing Data Bias in a World Designed for
男士. Chatto & Windus, 伦敦, 英国.

Anne Curzan. 2003. Gender Shifts in the His-
tory of English. 剑桥大学出版社,
剑桥, 英国. https://doi.org/10
.1017/CBO9780511486913

Jeffrey Dastin. 2018. Amazon scraps secret AI
recruiting tool that showed bias against women.
https://www.reuters.com/article
/us-amazon-com-jobs-automation
-insight-idUSKCN1MK08G. Accessed:
2021-02-25.

Marcella De Marco and Piero Toto. 2019. 介绍-
归纳法: The potential of gender training in the
translation classroom. In Gender Approaches in
the Translation Classroom: Training the Doers,
pages 1–7, 帕尔格雷夫·麦克米伦, 占婆, CH.
https://doi.org/10.1007/978-3-030
-04390-2 1

Pieter Delobelle, Paul Temple, Gilles Perrouin,
Benoˆıt Fr´enay, Patrick Heymans, and Bettina
Berendt. 2020. Ethical adversaries: Towards
mitigating unfairness with adversarial machine
学习. In Informal Proceedings of the Bias
and Fairness in AI Workshop at ECML-PKDD
(BIAS 2020). BIAS 2020.

Hannah Devinney,


Henrik Bj¨orklund. 2020. Semi-supervised topic

Jenny Bj¨orklund,

861

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

modeling for gender bias discovery in English
and Swedish. In Proceedings of the Second
Workshop on Gender Bias in Natural Language
加工, pages 79–92, 在线的. 协会
for Computational Linguistics.

Bruna Di Sabato and Antonio Perri. 2020.
Grammatical gender and translation: A cross-
linguistic overview. In Luise von Flotow and
Hala Kamal, 编辑, The Routledge Hand-
book of Translation, Feminism and Gender.
劳特利奇, 纽约, 美国. https://土井
.org/10.4324/9781315158938-32

Catherine D’Ignazio and Lauren F. 克莱因. 2020.

Data Feminism. 与新闻界, 伦敦, 英国.

Emily Dinan, Angela Fan, Ledell Wu, Jason
Weston, Douwe Kiela, and Adina Williams.
2020. Multi-dimensional gender bias classifi-
阳离子. 在诉讼程序中 2020 会议
on Empirical Methods in Natural Language
加工 (EMNLP), pages 314–331, 在线的.
计算语言学协会.
https://doi.org/10.18653/v1/2020
.emnlp-main.23

Cynthia Dwork, Moritz Hardt, Toniann Pitassi,
Omer Reingold, and Richard Zemel. 2012.
Fairness through awareness. 在诉讼程序中
the 3rd Innovations in Theoretical Computer
Science Conference, ITCS ’12, pages 214–226,
纽约, 美国. Association for Computing
Machinery. https://doi.org/10.1145
/2090236.2090255

Penelope Eckert and Sally McConnell-Ginet.
2013. Language and Gender. 剑桥
大学出版社, 剑桥, 英国. https://
doi.org/10.1017/CBO9781139245883

Mostafa Elaraby, Ahmed Y. Tawfik, Mahmoud
Khaled, Hany Hassan, and Aly Osama. 2018.
Gender aware spoken language translation
applied to English-Arabic. 在诉讼程序中
the 2nd International Conference on Natural
Language and Speech Processing (ICNLSP),
pages 1–6, Algiers, DZ. https://doi.org
/10.1109/ICNLSP.2018.8374387

Carolyn Epple. 1998. Coming to terms with
Navajo n´adleeh´ı: A critique of berdache,
‘‘gay’’, ‘‘alternate gender’’, and ‘‘two-spirit’’.
American Ethnologist, 25(2):267–290.

Carlos Escolano, Marta R. Costa-juss`a, Jos´e A. 右.
Fonollosa, and Mikel Artetxe. 2021. 多-
lingual machine translation: Closing the gap
between shared and language-specific encoder-
decoders. In Proceedings of the 16th conference
of the European Chapter of the Association
for Computational Linguistics (EACL). 在线的.
https://doi.org/10.1525/ae.1998
.25.2.267

Joel Escud´e Font and Marta R. Costa-juss`a.
2019. Equalizing gender bias in neural machine
translation with word embeddings techniques.
在诉讼程序中
the First Workshop on
Gender Bias in Natural Language Processing,
pages 147–154, Florence, 它. 协会
计算语言学. https://土井
.org/10.18653/v1/W19-3821

Anne Fausto-Sterling. 2019. Gender/sex, 性的
orientation, and identity are in the body: 如何
did they get there? The Journal of Sex Research,
56(4–5):529–555.

Anke Frank, Christiane Hoffmann, and Maria
in machine

Strobel. 2004. 性别
问题
翻译. University of Bremen.

Stella Frank, Desmond Elliott, and Lucia Specia.
2018. Assessing multilingual multimodal image
description: Studies of native speaker pref-
erences and translator choices. Natural Lan-
guage Engineering, 24(3):393–413. https://
doi.org/10.1017/S1351324918000074

Batya Friedman and Helen Nissenbaum. 1996.
Bias in computer systems. ACM Transactions
on Information Systems (TOIS), 14(3):330–347.
https://doi.org/10.1145/230538
.230561

Marco Gaido, Beatrice Savoldi, Luisa Bentivogli,
Matteo Negri, and Marco Turchi. 2020. Breed-
ing gender-aware direct speech translation
系统. In Proceedings of the 28th Interna-
tional Conference on Computational Linguis-
抽动症, pages 3951–3964, 在线的. 国际的
Committee on Computational Linguistics.

Aparna Garimella, Carmen Banea, Dirk Hovy,
and Rada Mihalcea. 2019. Women’s syntactic
resilience and men’s grammatical luck: 性别-
bias in part-of-speech tagging and dependency
解析. In Proceedings of the 57th Annual

862

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Meeting of the Association for Computational
语言学, pages 3493–3498, Florence, 它.
计算语言学协会.
https://doi.org/10.18653/v1/P19
-1339

Timnit Gebru. 2020. Race and gender. In Markus
D. Dubber, Frank Pasquale, and Sunit Das,
编辑, The Oxford Handbook of Ethics of AI.
Oxford Handbook Online. https://doi.org
/10.1093/oxfordhb/9780190067397
.013.16

Robert Geirhos, J¨orn-Henrik Jacobsen, Claudio
Michaelis, Richard Zemel, Wieland Brendel,
Matthias Bethge, and Felix A. Wichmann. 2020.
Shortcut
learning in deep neural networks.
Nature Machine Intelligence, 2(11):665–673.
https://doi.org/10.1038/s42256
-020-00257-z

Mor Geva, Yoav Goldberg, and Jonathan Berant.
2019. Are we modeling the task or

annotator? An investigation of annotator bias
in natural language understanding datasets. 在
诉讼程序 2019 Conference on Empir-
ical Methods in Natural Language Processing
and the 9th International Joint Conference
on Natural Language Processing (EMNLP-
IJCNLP), pages 1161–1166, 香港, CN.
计算语言学协会.
https://doi.org/10.18653/v1/D19
-1107

Lisa Gitelman. 2013. Raw Data is an Oxymoron.

与新闻界.

Fiona Glen and Karen Hurrell. 2012. 测量
gender identity. https://www.equalityhuman
rights.com/sites/default/files
/technical note final.pdf. Accessed:
2021-02-25.

Bruce Glymour and Jonathan Herington. 2019.
Measuring the biases that matter: The ethical
and casual foundations for measures of fairness
in algorithms. In Proceedings of the Confer-
ence on Fairness, Accountability, and Trans-
parency, FAT* ’19, pages 269–278, 纽约,
美国. Association for Computing Machinery.
https://doi.org/10.1145/3287560
.3287573

Seraphina Goldfarb-Tarrant, Rebecca Marchant,
Ricardo Mu˜noz Sanchez, Mugdha Pandya, 和

Adam Lopez. 2020. Intrinsic bias metrics do not
correlate with application bias. arXiv 预印本
arXiv:2012.15859.

Kirsten Gomard. 1995. 这 (和)equal treatment
of women in language: A comparative study of
Danish, 英语, and German. Working Papers
on Language, Gender and Sexism, 5(1):5–25.

Hila Gonen and Yoav Goldberg. 2019. Lipstick on
a pig: Debiasing methods cover up systematic
gender biases in word embeddings but do not
remove them. 在诉讼程序中
这 2019
Conference of the North American Chapter
of the Association for Computational Linguis-
抽动症: 人类语言技术, 体积
1 (Long and Short Papers), pages 609–614,
明尼阿波利斯, Minnesota, 美国. 协会
计算语言学.

Hila Gonen and Kellie Webster. 2020. Automat-
ically identifying gender issues in machine
translation using perturbations. In Findings of
the Association for Computational Linguistics:
EMNLP 2020, pages 1991–1995, 在线的.
计算语言学协会.
https://doi.org/10.18653/v1/2020
.findings-emnlp.180

Anthony G. Greenwald, Debbie E. McGhee,
and Jordan L. K. 施瓦茨. 1998. 测量
individual differences in implicit cognition:
The implicit association test. Journal of Per-
sonality and Social Psychology, 74(6):1464.
https://doi.org/10.1037/0022-3514
.74.6.1464, 考研: 9654756

Liane Guillou. 2012. Improving pronoun trans-
lation for statistical machine translation. 在
Proceedings of the Student Research Work-
shop at the 13th Conference of the European
Chapter of the Association for Computational
语言学, 第 1–10 页, 阿维尼翁, FR. Asso-
ciation for Computational Linguistics.

Pascal M. Gygax, Daniel Elmiger, Sandrine
Zufferey, Alan Garnham, Sabine Sczesny, Lisa
von Stockhausen, Friederike Braun, and Jane
Oakhill. 2019. A language index of grammatical
gender dimensions to study the impact of gram-
matical gender on the way we perceive women
and men. 心理学前沿, 10:1604.
https://doi.org/10.3389/fpsyg.2019
.01604, 考研: 31379661

863

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Pascal M. Gygax, Ute Gabriel, Oriane Sarrasin,
Jane Oakhill, and Alan Garnham. 2008. Gen-
erically intended, but specifically interpreted:
When beauticians, musicians and mechanics
are all men. Language and Cognitive Processes,
23:464–485. https://doi.org/10.1080
/01690960701702035

Nizar Habash, Houda Bouamor, and Christine
钟. 2019. Automatic gender identification
and reinflection in Arabic. 在诉讼程序中
First Workshop on Gender Bias in Natural Lan-
guage Processing, pages 155–165, Florence,
它. 计算语言学协会.
https://doi.org/10.18653/v1/W19
-3822

Philipp Hacker. 2018. Teaching fairness


artificial
智力: Existing and novel
strategies against algorithmic discrimination
under EU law. Common market law review,
55(4):1143–1185.

Kira Hall and Veronica O’Donovan. 2014. Shift-
ing gender positions among Hindi-speaking
hijras. Rethinking language and gender re-
搜索: Theory and practice, pages 228–266.

Foad Hamidi, Morgan K. Scheuerman, 和
识别-
Stacy M. Branham. 2018. 性别
nition or gender
reductionism? The social
implications of embedded gender recognition
这 2018 CHI
系统. 在诉讼程序中
Conference on Human Factors in Computing
系统, CHI ’18, pages 1–13, 纽约,
美国. Association for Computing Machinery.
https://doi.org/10.1145/3173574
.3173582

Mykol C. 汉密尔顿. 1988. Using masculine gener-
集成电路: Does generic he increase male bias in the
user’s imagery? Sex roles, 19(11–12):785–799.
https://doi.org/10.1007/BF00288993

Mykol C. 汉密尔顿. 1991. Masculine bias in the
attribution of personhood: People = male, male
= people. Psychology of Women Quarterly,
15(3):393–402.

Alex Hanna, Andrew Smart, Ben Hutchinson,
Christina Greer, Emily Denton, 玛格丽特
and Parker
米切尔, Oddur Kjartansson,
巴恩斯. 2021. Towards accountability for
machine learning datasets. 在诉讼程序中

864

the Conference on Fairness, Accountability,
and Transparency (FAccT ’21), pages 560–575,
在线的. ACM.

Christian Hardmeier, Marta R. Costa-juss`a,
Kellie Webster, Will Radford, and Su Lin
Blodgett. 2021. How to write a bias statement:
Recommendations for submissions to the work-
shop on gender bias in NLP. arXiv 预印本
arXiv:2104.03026.

Christian Hardmeier and Marcello Federico.
2010. Modelling pronominal anaphora in sta-
tistical machine translation. 在诉讼程序中
the seventh International Workshop on Spoken
Language Translation (IWSLT), pages 283–289,
巴黎, FR.

Lucy Havens, Melissa Terras, Benjamin Bach,
and Beatrice Alex. 2020. Situated data,
situated systems: A methodology to engage
语言
with power
processing research. 在诉讼程序中

Second Workshop on Gender Bias in Natural
语言处理, pages 107–124, 在线的.
计算语言学协会.

in natural

关系

Marlis Hellinger and Hadumond Bußman. 2001.
Gender across Languages: The Linguistic
Representation of Women and Men, 体积 1.
John Benjamins Publishing, 阿姆斯特丹, NL.
https://doi.org/10.1075/impact.9
.05hel

Marlis Hellinger and Hadumond Bußman. 2002.
Gender across Languages: The Linguistic
Representation of Women and Men, 体积 2.
John Benjamins Publishing, 阿姆斯特丹, NL.
https://doi.org/10.1075/impact.10
.05hel

Marlis Hellinger and Hadumond Bußman. 2003.
Gender across Languages: The Linguistic
Representation of Women and Men, 体积 3.
John Benjamins Publishing, 阿姆斯特丹, NL.
https://doi.org/10.1075/impact
.05hel

Marlis Hellinger and Heiko Motschenbacher.
2015. Gender Across Languages: The Lin-
guistic Representation of Women and Men,
体积 4. John Benjamins, 阿姆斯特丹, NL.
https://doi.org/10.1075/impact
.36.01hel

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Lisa A. Hendricks, Kaylee Burns, Kate Saenko,
Trevor Darrell, and Anna Rohrbach. 2018.
Women also snowboard: Overcoming bias in

captioning model.
European Conference on Computer Vision
(ECCV), pages 740–755. 慕尼黑, DE. https://
doi.org/10.1007/978-3-030-01219-9 47

在诉讼程序中

Aur´elie Herbelot, Eva von Redecker, and Johanna
M¨uller. 2012. Distributional
techniques for
在诉讼程序中
philosophical enquiry.
the 6th Workshop on Language Technology
for Cultural Heritage, 社会科学, 和
人文学科, pages 45–54, 阿维尼翁, FR.
计算语言学协会.

Yasmeen Hitti, Eunbee Jang, Ines Moreno, 和
Carolyne Pelletier. 2019. Proposed taxonomy
for gender bias in text: A filtering methodology
为了

the gender generalization subtype.
the First Workshop on
会议记录
Gender Bias in Natural Language Processing,
pages 8–17, Florence,
它. 协会
计算语言学. https://土井
.org/10.18653/v1/W19-3802

Janet Holmes and Miriam Meyerhoff. 2003.
The Handbook of Language and Gender.
Blackwell Publishing Ltd, Malden, 美国.
https://doi.org/10.1002/9780470
756942

Levi C. 右. Hord. 2016. Bucking the linguistic
binary: Gender neutral language in English,
Swedish, 法语, and German. Western Papers
linguistiques de
/ Cahiers
in Linguistics
Western, 3(1):4.

Dirk Hovy, Federico Bianchi, and Tommaso
Fornaciari. 2020. ‘‘You sound just like your
father’’: Commercial machine translation sys-
tems include stylistic biases. In Proceedings
of the 58th Annual Meeting of the Association
for Computational Linguistics, pages 1686–1690,
在线的. Association for Computational Lin-
语言学. https://doi.org/10.18653
/v1/2020.acl-main.154

pages 452–461, 日内瓦, CH.
国际的
World Wide Web Conferences Steering
委员会. https://doi.org/10.1145
/2736277.2741141

Dirk Hovy and Shannon L. Spruit. 2016. 这
social impact of natural language processing.
In Proceedings of the 54th Annual Meeting
of the Association for Computational Linguis-
抽动症 (体积 2: Short Papers), pages 591–598,
柏林, DE. Association for Computational Lin-
语言学. https://doi.org/10.18653
/v1/P16-2096

Janet S. Hyde. 2005. The gender similarities
假设. American Psychologist, 60(6):
581–592. https://doi.org/10.1037/0003
-066X.60.6.581, 考研: 16173891

Julia Ive, Pranava Madhyastha, and Lucia
Specia. 2019. Distilling translations with visual
意识. In Proceedings of the 57th Annual
Meeting of the Association for Computational
语言学, pages 6525–6538, Florence, 它.
计算语言学协会.
https://doi.org/10.18653/v1/P19
-1653

Abigail Z.

Jacobs. 2021. Measurement and
公平. 在诉讼程序中
这 2021 ACM
Conference on Fairness, Accountability, 和
Transparency, FAccT ’21, pages 375–385,
纽约, 美国. Association for Computing
Machinery. https://doi.org/10.1145
/3442188.3445901

Abigail Z. Jacobs, Su Lin Blodgett, Solon
Barocas, Hal Daum´e III, and Hanna Wallach.
2020. The meaning and measurement of bias:
语言处理.
Lessons from natural
在诉讼程序中
这 2020 会议
Fairness, Accountability, and Transparency,
FAT* ’20, 页 706, 纽约, 美国. Associa-
tion for Computing Machinery. https://
doi.org/10.1145/3351095.3375671

Dirk Hovy, Anders

Johannsen, and Anders
索加德. 2015. User review sites as a resource
for large-scale sociolinguistic studies. In Pro-
ceedings of
the 24th International Confer-
ence on World Wide Web, WWW ’15,

Roman Jakobson. 1959. On Linguistic Aspects of
Translation. In Reuben A. Brower, 编辑, 在
翻译, pages 232–239. 剑桥, 美国.
哈佛大学出版社. https://土井
.org/10.4159/harvard.9780674731615.c18

865

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

May Jiang and Christiane Fellbaum. 2020. 国际米兰-
dependencies of gender and race in contextua-
lized word embeddings. 在诉讼程序中
Second Workshop on Gender Bias in Natural
语言处理, pages 17–25, 在线的.
计算语言学协会.

Anders

Johannsen, Dirk Hovy, and Anders
索加德. 2015. Cross-lingual syntactic varia-
tion over age and gender. In Proceedings
of the Nineteenth Conference on Computational
Natural Language Learning, pages 103–112,
北京, CN. https://doi.org/10.18653
/v1/K15-1011

Kari Johnson. 2020A. AI weekly: A deep learning
teachable moment on AI bias.
pioneer’s
https://venturebeat.com/2020/06
/26/ai-weekly-a-deep-learning
-pioneers-teachable-moment-on-ai-bias/.
Accessed: 2021-02-25.

Melvin Johnson. 2020乙. A scalable approach
to reducing gender bias in Google Translate.
https://ai.googleblog.com/2020/40
/04/a-scalable-approach-to-reducing
-gender.html. Accessed: 2021-02-25.

Rabeeh Karimi Mahabadi, Yonatan Belinkov,
and James Henderson. 2020. End-to-end bias
mitigation by modelling biases in corpora.
In Proceedings of the 58th Annual Meeting
of the Association for Computational Linguis-
抽动症, pages 8706–8716, 在线的. 协会
计算语言学. https://土井
.org/10.18653/v1/2020.acl-main.769

Os Keyes. 2018. The misgendering machines:
Trans/HCI implications of automatic gender re-
认识. Proceedings of the ACM on Human-
Computer Interaction, 2(CSCW). https://
doi.org/10.1145/3274357

Os Keyes, Chandler May, and Annabelle Carrell.
2021. You keep using that word: Ways of
thinking about gender in computing research.
Proceedings of the ACM on Human-Computer
Interaction, 5(CSCW). https://doi.org
/10.1145/3449113

Yunsu Kim, Duc Thanh Tran, and Hermann
Ney. 2019. When and why is document-level
context useful in neural machine translation?
In Proceedings of the Fourth Workshop on

Discourse in Machine Translation (DiscoMT
2019), pages 24–34, 香港, CN.
计算语言学协会.
https://doi.org/10.18653/v1/D19
-6503

Kris Aric Knisely. 2020. Le franc¸ais non-
binaire: Linguistic forms used by non-binary
speakers of French. Foreign Language Annals,
53(4):850–876. https://doi.org/10.1111
/flan.12500

Philipp Koehn. 2005. Europarl: A parallel
corpus for statistical machine translation. 在
Proceedings of the tenth Machine Translation
Summit, pages 79–86, Phuket, TH. AAMT.

Corina Koolen and Andreas van Cranenburgh.
2017. These are not the stereotypes you are
looking for: Bias and fairness in authorial

gender attribution.
First ACL Workshop on Ethics in Natural
语言处理, pages 12–22, Valencia,
ES. 计算语言学协会.
https://doi.org/10.18653/v1/W17
-1602

在诉讼程序中

Cheris Kramarae and Paula A. Treichler. 1985.
A feminist dictionary. Pandora Press, 伦敦,
英国.

Michał Krawczyk. 2017. Are all

研究人员
male? Gender misattributions in citations. Sci-
entometrics, 110(3):1397–1402. https://
doi.org/10.1007/s11192-016-2192-y,
考研: 28255187

Hamutal Kreiner, Patrick Sturt, and Simon
Garrod. 2008. Processing definitional and
stereotypical gender in reference resolution:
Evidence from eye-movements. 杂志
记忆与语言, 58:239–261. https://
doi.org/10.1016/j.jml.2007.09.003

James Kuczmarski. 2018. Reducing gender bias
in Google Translate. https://www.blog
.google/products/translate/reducing
-gender-bias-google-translate/.
Accessed: 2021-02-25.

William Labov. 1972. Sociolinguistic Patterns. 4.

University of Pennsylvania Press.

Jennifer Langston. 2020. New AI tools help
writers be more clear, concise and inclusive in

866

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Office and across the Web. https://博客
.microsoft.com/ai/microsoft-365
-ai-tools/. Accessed: 2021-02-25.

Brian Larson. 2017. Gender as a variable in
natural-language processing: Ethical consid-
erations. 在诉讼程序中
the First ACL
Workshop on Ethics in Natural Language Pro-
cessing, pages 1–11, Valencia, ES. Associa-
tion for Computational Linguistics. https://
doi.org/10.18653/v1/W17-1601

Ronan Le Nagard and Philipp Koehn. 2010.
Aiding pronoun translation with co-reference
resolution. In Proceedings of the Joint Fifth
Workshop on Statistical Machine Translation
and MetricsMATR, pages 252–261, Uppsala,
SE. 计算语言学协会.

Enora Lessinger. 2020. Le pr´esident est une
femme: The challenges of translating gender
in UN texts. In Luise von Flotow and Hala
Kamal, 编辑, The Routledge Handbook of
Translation, Feminism and Gender. 纽约,
美国. 劳特利奇. https://doi.org/10
.4324/9781315158938-33

Hector J. Levesque. 2014. On our best beha-
智力, 212(1):27–35.
viour. Artificial
https://doi.org/10.1016/j.artint
.2014.03.007

translation AI. 在诉讼程序中 2020 CHI
Conference on Human Factors in Computing
系统, CHI ’20, pages 1–13, 纽约,
美国. Association for Computing Machinery.
https://doi.org/10.1145/3313831
.3376261

Anna Lindqvist, Emma A. Renstr¨om, 和
Marie Gustafsson Send´en. 2019. 减少
a male bias in language? Establishing the
efficiency of three different gender-fair lan-
guage strategies. Sex Roles, 81(1–2):109–117.
https://doi.org/10.1007/s11199
-018-0974-9

Pierre Lison

J¨org Tiedemann.

2016.
OpenSubtitles2016: Extracting large parallel
corpora from movie and TV Subtitles. 在
Proceedings of the Tenth International Con-
ference on Language Resources and Evalua-
的 (LREC’16), pages 923–929, Portoroˇz, 和.
European Language Resources Association
(ELRA).

Katherine A. Liu and Natalie A. Dipietro Mager.
2016. Women’s involvement in clinical trials:
Historical perspective and future implications.
Pharmacy Practice, 14(1):708. https://
doi.org/10.18549/PharmPract.2016
.01.708, 考研: 27011778

Roger J. 右. Levesque. 2011. Sex Roles and Gen-
der Roles. 施普林格, 纽约, 美国. https://
doi.org/10.1007/978-1-4419-1695-2
602

´Artemis L´opez. 2019. T´u, yo, elle y el lenguaje
no binario. http://www.lalinternadel
traductor.org/n19/traducir-lenguaje
-no-binario.html. Accessed: 2021-02-25.

Molly Lewis and Gary Lupyan. 2020. 性别
stereotypes are reflected in the distributional
structure of 25 语言. Nature Human
行为, 4(10):1021–1028.

Yitong Li, Timothy Baldwin, and Trevor Cohn.
2018. Towards robust and privacy-preserving

text representations. 在诉讼程序中
56th Annual Meeting of
the Association
for Computational Linguistics (体积 2:
Short Papers), pages 25–30, 墨尔本, AU.
计算语言学协会.

Daniel

J. Liebling, Michal Lahav, Abigail
Jess Holbrook,
埃文斯, Aaron Donsbach,
Boris Smus,
and Lindsey Boran. 2020.
Unmet needs and opportunities for mobile

´Artemis L´opez, Susana Rodr´ıguez Barcia, 和
Mar´ıa del Carmen Cabeza Pereiro. 2020.
Visibilizar o interpretar: Respuesta al Informe
de la Real Academia Española sobre el lenguaje
inclusivo y cuestiones conexas. http://万维网
.ngenespanol.com/el-mundo/la-rae
-rechaza-nuevamente-el-lenguaje
-inclusivo/. Accessed: 2021-02-25.

Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam
Amancharla, and Anupam Datta. 2020. Gen-
der bias in neural natural language processing.
In Logic, 语言, 和安全, 体积
12300 计算机科学讲义,
pages 189–202, 施普林格. https://土井
.org/10.1007/978-3-030-62077-6 14

867

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

John Lyons. 1977. 语义学, 体积 2,
剑桥大学出版社, Cambrdige, 英国.

Thomas Manzini, Lim Yao Chong, Alan W.
黑色的, and Yulia Tsvetkov. 2019. Black is to
criminal as caucasian is to police: Detecting and
removing multiclass bias in word embeddings.
在诉讼程序中 2019 Conference of the
North American Chapter of the Association for
计算语言学: Human Language
Technologies, 体积 1 (Long and Short
文件), pages 615–621, 明尼阿波利斯, 美国.
计算语言学协会.
https://doi.org/10.18653/v1/N19
-1062

Marianna Martindale and Marine Carpuat. 2018.
Fluency over adequacy: A pilot study in mea-
suring user trust in imperfect MT. In Proceed-
ings of the 13th Conference of the Association
for Machine Translation in the Americas (卷-
梅 1: Research Track), pages 13–25, 波士顿,
美国. Association for Machine Translation in
the Americas.

Chandler May. 2019. Deconstructing gender
prediction in NLP. In Conference on Neu-
ral Information Processing Systems (NIPS) –
Keynote. Vancouver, CA.

Sally McConnell-Ginet. 2013. Gender and its re-
lation to sex: The myth of ‘natural’ gender.
In Greville G. Corbett, 编辑, The Expression
of Gender, pages 3–38. De Gruyter Mouton,
柏林, DE.

Tom McCoy, Ellie Pavlick, and Tal Linzen.
2019. Right for the wrong reasons: Diagnos-
ing syntactic heuristics in natural
语言
inference. In Proceedings of the 57th Annual
Meeting of the Association for Computational
语言学, pages 3428–3448, Florence, 它.
计算语言学协会.
https://doi.org/10.18653/v1/P19
-1334

Ninareh Mehrabi, Fred Morstatter, Nripsuta
Saxena, Kristina Lerman, and Aram Galstyan.
2019. A survey on bias and fairness in machine
学习.

Shachar Mirkin, Scott Nowson, Caroline Brun,
and Julien Perez. 2015. Motivating personality-
aware machine translation. 在诉讼程序中

这 2015 实证方法会议
自然语言处理, pages 1102–1108,
里斯本, PT. Association for Computational
语言学. https://doi.org/10.18653
/v1/D15-1130

Margaret Mitchell, Dylan Baker, Nyalleng
Moorosi, Emily Denton, Ben Hutchinson, Alex
Hanna, Timnit Gebru, and Jamie Morgenstern.
2020. Diversity and inclusion metrics in sub-
set selection. In Proceedings of the AAAI/ACM
Conference on AI, 伦理, 与社会, AIES
’20, pages 117–123, 纽约, 美国. Asso-
ciation for Computing Machinery.

Britta Mondorf. 2002. Gender differences in
English syntax. Journal of English Linguistics,
30:158–180. https://doi.org/10.1177
/007242030002005

Johanna Monti. 2017. Questioni di genere in
femminile. Scritti
in onore di Cristina Vallini,

traduzione automatica. 铝
linguistici
139:411–431.

Johanna Monti. 2020. Gender issues in machine
翻译: An unsolved problem? In Luise
von Flotow and Hala Kamal, 编辑, 这
Routledge Handbook of Translation, Feminism
and Gender, pages 457–468, 劳特利奇.
https://doi.org/10.4324/9781315
158938-39

injection.

Amit Moryossef, Roee Aharoni, and Yoav
Goldberg. 2019. Filling gender & number gaps
in neural machine translation with black-box

语境
First Workshop on Gender Bias in Natural
语言处理, pages 49–54, Florence,
它. 计算语言学协会.
https://doi.org/10.18653/v1/W19
-3807

在诉讼程序中

Heiko Motschenbacher. 2014. Grammatical gen-
der as a challenge for language policy: 这
(im)possibility of non-heteronormative lan-
guage use in German versus English. 兰-
guage policy, 13(3):243–261. https://土井
.org/10.1007/s10993-013-9300-0

Anthony Mulac, 詹姆斯·J. Bradac, and Pamela

Gibbons. 2001. Empirical support
gender-as-culture hypothesis. Human Commu-
nication Research, 27:121–152. https://

为了

868

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

doi.org/10.1111/j.1468-2958.2001
.tb00778.x

El Mundo. 2018. La RAE rechaza nueva-
mente el lenguaje inclusivo. https://万维网
.ngenespanol.com/el-mundo/la-rae
-rechaza-nuevamente-el-lenguaje
-inclusivo/. Accessed: 2021-02-25.

赋予生命. 乙. 穆雷. 2003. Who is Takat¯apui?
语言, sexuality and identity in
Anthropologica,
233–244. https://doi.org/10

M¯aori
Aotearoa/New
页面
.2307/25606143

西兰岛.

Terttu Nevalainen

and Helena Raumolin-
Brunberg. 1993. Its strength and the beauty
其中: The standardization of the third person
neuter possessive in early modern English.
In Dieter Stein and Ingrid Tieken-Boon van
Ostade, 编辑, Towards a Standard English,
pages 171–216. De Gruyter, 柏林, DE.

Matthew L. 纽曼, Carla J. Groom, Lori D.
Handelman, and James W. Pennebaker. 2008.
in language use: 一个
Gender differences
analysis of 14,000 text samples. Discourse Pro-
过程, 45(3):211–236. https://doi.org
/10.1080/01638530802073712

Dong Nguyen, A. Seza Do˘gru¨oz, Carolyn P.
Ros´e, and Franciska de Jong. 2016. Compu-
tational sociolinguistics: 一项调查. Computa-
tional linguistics, 42(3):537–593. https://
doi.org/10.1162/COLI 00258

Jan Niehues, Eunah Cho, Thanh-Le Ha, 和
Alex Waibel. 2016. Pre-translation for neu-
ral machine translation. 在诉讼程序中
科林 2016, the 26th International Con-
ference on Computational Linguistics: 科技-
nical Papers, pages 1828–1836, 大阪, JP.
The COLING 2016 Organizing Committee.

Uwe Kjær Nissen. 2002. Aspects of translating

性别. Linguistik Online, 11(2).

Malvina Nissim and Rob van der Goot. 2020.
Fair is better than sensational: Man is to doctor
as woman is to doctor. Computational Lin-
语言学, 46(2):487–497. https://doi.org
/10.1162/coli a 00379

Parmy Olson. 2018. The algorithm that helped
google translate become sexist. https://万维网

869

.forbes.com/sites/parmyolson/2018
/02/15/the-algorithm-that-helped
-google-translate-become-sexist
/?sh=d675b9c7daa2. Accessed: 2021-02-25.

Dimitrios Papadimoulis. 2018. Gender-Neutral
Language in the European Parliament. Euro-
pean Parliament 2018.

Benjamin Papadopoulos. 2019. Morphological
Gender Innovations in Spanish of Genderqueer
Speakers.

Kishore Papineni, Salim Roukos, Todd Ward,
and Wei-Jing Zhu. 2002. 蓝线: A method for
automatic evaluation of machine translation.
In Proceedings of the 40th Annual Meeting

the Association for Computational Lin-
语言学, pages 311–318, 费城, 美国.
计算语言学协会.
https://doi.org/10.3115/1073083
.1073135

Amandalynne Paullada,

Inioluwa D. Raji,
Emily M. Bender, Emily Denton, and Alex
Hanna. 2020. Data and its (迪斯)contents: A
survey of dataset development and use in
machine learning research. In NeurIPS 2020
作坊: ML Retrospectives, Surveys &
Meta-analyses (ML-RSA). Vitual.

James Pennebaker and Lori Stone. 2003. Words
of wisdom: Language use over the life span.
Journal of Personality and Social Psychol-
奥吉, 85:291–301. https://doi.org/10
.1037/0022-3514.85.2.291, 考研:
12916571

Marcelo O. 右. Prates, Pedro H. C. Avelar, 和
Lu´ıs C. Lamb. 2018. Assessing gender bias in
machine translation: A case study with Google
Translate. Neural Computing and Applications,
pages 1–19.

Ella Rabinovich, Raj N. Patel, Shachar Mirkin,
Lucia Specia, and Shuly Wintner. 2017.
Personalized machine translation: 保存
original author traits. 在诉讼程序中
15th Conference of the European Chapter of
the Association for Computational Linguistics:
体积 1, Long Papers, pages 1074–1084,
Valencia, ES. Association for Computational
语言学.

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Iyad Rahwan, Manuel Cebrian, Nick Obradovich,
Jean-Franc¸ois Bonnefon,
Josh Bongard,
Cynthia Breazeal, Jacob W. Crandall, 尼古拉斯
A. Christakis, Iain D. Couzin, Matthew O.
Jackson, 等人. 2019. Machine behaviour.
自然, 568(7753):477–486. https://土井
.org/10.1038/s41586-019-1138-y,
考研: 31019318

Isabelle R´egner, Catherine Thinus-Blanc, Agn`es
Netter, Toni Schmader, and Pascal Huguet.
2019. Committees with implicit biases pro-
mote fewer women when they do not believe
gender bias exists. Nature Human Behaviour,
3(11):1171–1179. https://doi.org/10
.1038/s41562-019-0686-3,
考研:
31451735

Argentina A. Rescigno, Johanna Monti, Andy
方式, and Eva Vanmassenhove. 2020. A
case study of natural gender phenomena in
翻译: A Comparison of Google Translate,
Bing Microsoft Translator and DeepL for
English to Italian, French and Spanish. 在
Proceedings of the Workshop on the Impact
of Machine Translation (iMpacT 2020),
pages 62–90, 在线的. Association for Machine
Translation in the Americas.

Alexander S. Rich and Todd M. Gureckis.
2019. Lessons for artificial intelligence from
the study of natural stupidity. Nature Ma-
chine Intelligence, 1(4):174–180. https://
doi.org/10.1038/s42256-019-0038-z

Christina Richards, Walter P. Bouman, 莱顿
Seal, Meg J. Barker, Timo O. Nieder, 和
Guy TSjoen. 2016. Non-binary or genderqueer
genders. International Review of Psychiatry,
28(1):95–102. https://doi.org/10.3109
/09540261.2015.1106446,
考研:
26753630

Barbara J. Risman. 2018. Gender as a Social
Structure. In Barbara Risman, Carissa Froyum,
and William J. 士嘉堡, 编辑, Handbook
the Sociology of Gender, pages 19–43,

https://doi.org/10.1007
施普林格.
/978-3-319-76333-0

Nicholas Roberts, Davis Liang, Graham Neubig,
and Zachary C. Lipton. 2020. Decoding and
diversity in machine translation. In Proceed-
ings of the Resistance AI Workshop at 34th

Conference on Neural Information Processing
系统 (神经信息处理系统 2020). Vancouver, CA.

Suzanne Romaine. 1999. Communicating Gender.
劳伦斯·埃尔鲍姆, 莫瓦, 美国. https://
doi.org/10.4324/9781410603852

Suzanne Romaine. 2001. A corpus-based view of
gender in British and American English. Gen-
der across Languages, 1:153–175. https://
doi.org/10.1075/impact.9.12rom

Rachel Rudinger,

Jason Naradowsky, Brian
Leonard, and Benjamin Van Durme. 2018.
Gender bias in coreference resolution.

诉讼程序 2018 Conference of the
North American Chapter of the Association
for Computational Linguistics: Human Lan-
guage Technologies, 体积 2 (Short Papers),
pages 8–14, New Orleans, Louisiana. Associa-
tion for Computational Linguistics.

Ram Samar. 2020. Machines Are Indifferent, 我们
Are Not: Yann LeCuns Tweet Sparks ML bias
debate. https://analyticsindiamag
.com/yann-lecun-machine-learning
-bias-debate/. Accessed: 2021-02-25.

Kalinowsky Santiago. 2018. Todos/Todas/Todes.
Interview with Megan Figueroa, 主持人; Carrie
Gillon, 主持人. In The Vocal Fries [Podcast].
Vancouver, CA.

Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan
Jurafsky, 诺亚A. 史密斯, and Yejin Choi.
2020. Social bias frames: Reasoning about
social and power implications of language. 在
Proceedings of the 58th Annual Meeting of
the Association for Computational Linguistics,
pages 5477–5490, 在线的. 协会
计算语言学.

Danielle Saunders

在诉讼程序中

and Bill Byrne. 2020.
Reducing gender bias
in neural machine
translation as a domain adaptation prob-
the 58th Annual
莱姆.
Meeting of
the Association for Computa-
tional Linguistics, pages 7724–7736, 在线的.
计算语言学协会.
https://doi.org/10.18653/v1/2020
.acl-main.690

Danielle Saunders, Rosie Sallis, and Bill Byrne.
2020. Neural machine translation doesn’t

870

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
t

A
C

/

A
r
t

C
e

p
d

F
/

d


/

.

1
0
1
1
6
2

/
t

A
C
_
A
_
0
0
4
0
1
1
9
5
7
7
0
5

/

/
t

A
C
_
A
_
0
0
4
0
1
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

在诉讼程序中

translate gender coreference right unless you
the Second
make it.
Workshop on Gender Bias in Natural Language
加工, pages 35–43, 在线的. 协会
for Computational Linguistics.

Londa Schiebinger. 2014. Scientific research
must take gender into account. 自然, 507(9).
https://doi.org/10.1038/507009a,
考研: 24598604

Ari Schlesinger, 瓦. Keith Edwards,


Rebecca E. Grinter. 2017. Intersectional HCI:
Engaging identity through gender, 种族, 和
班级. 在诉讼程序中 2017 CHI Confer-
ence on Human Factors in Computing Systems,
’17, pages 5412–5427, 纽约,
CHI
美国. Association for Computing Machinery.
https://doi.org/10.1145/3025453
.3025766

Natalie Schluter. 2018. The Glass Ceiling in
自然语言处理. 在诉讼程序中 2018 会议
on Empirical Methods in Natural Language
加工, pages 2793–2798, 布鲁塞尔, 是.
计算语言学协会.
https://doi.org/10.18653/v1/D18
-1301

Muriel R. Schulz. 1975. The semantic derogation
of woman. In Barrie Thorne and Nancy Henley,
编辑, Sex and Language. Difference and
Dominance, pages 64–75, Newbury House,
Rowley, 美国.

Tal Schuster, Darsh Shah, Yun Jie Serene
杨, Daniel Roberto Filizzola Ortiz, Enrico
Santus, and Regina Barzilay. 2019. Towards
debiasing fact verification models. In Pro-
ceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing
and the 9th International Joint Conference
on Natural Language Processing (EMNLP-
IJCNLP), pages 3419–3425, 香港, CN.
计算语言学协会.
https://doi.org/10.18653/v1/D19
-1341

Sabine Sczesny, Christa Nater, and Alice H.
Eagly. 2018. Agency and communion: 他们的
implications for gender stereotypes and gender
身份, Agency and Communion in Social
心理学, pages 103–116. Taylor and Francis.

https://doi.org/10.4324/9780203
703663-9

Andrew D. Selbst, Danah Boyd, Sorelle A.
Friedler, Suresh Venkatasubramanian, 和
Janet Vertesi. 2019. Fairness and abstraction
in sociotechnical systems. 在诉讼程序中
the Conference on Fairness, Accountability,
and Transparency, FAT* ’19, pages 59–68,
纽约, 美国. Association for Computing
Machinery. https://doi.org/10.1145
/3287560.3287598

Rico Sennrich. 2017. How grammatical


character-level neural machine translation?
Assessing MT quality with contrastive trans-
lation pairs. In Proceedings of the 15th Confer-
ence of the European Chapter of the Association
for Computational Linguistics: 体积 2,
Short Papers, pages 376–382, Valencia, ES.
计算语言学协会.
https://doi.org/10.18653/v1/E17
-2060

Deven S. Shah, Hansen A. 施瓦茨, and Dirk
蓝色的. 2020. Predictive biases in natural lan-
guage processing models: A conceptual frame-
work and overview. In Proceedings of the 58th
Annual Meeting of the Association for Compu-
tational Linguistics, pages 5248–5264, 在线的.
计算语言学协会.
https://doi.org/10.18653/v1/2020
.acl-main.468

Alyx J. Shroy. 2016. Innovations in gender-neutral
法语: Language practices of nonbinary
French speakers on Twitter. Ms., 大学
加利福尼亚州, 戴维斯.

Jeanette Silveira. 1980. Generic masculine words
and thinking. Women’s Studies International
季刊, 3(2–3):165–178. https://土井
.org/10.1016/S0148-0685(80)92113-2

Fabian H. Sinz, Xaq Pitkow, Jacob Reimer,
Matthias Bethge, and Andreas S. Tolias.
2019. Engineering a less artificial intelligence. Neuron, 103(6):967–979. https://doi.org/10.1016/j.neuron.2019.08.034, PMID: 31557461

Janet Smith. 2003. Gendered structures in Japa-
nese. Gender across Languages, 3:201–227.
https://doi.org/10.1075/impact.11
.12shi

Matthew Snover, Bonnie Dorr, Richard Schwartz,
Linnea Micciulla, and John Makhoul. 2006.
A study of translation edit rate with targeted
human annotation. In Proceedings of the 7th
Conference of
the Association for Machine
Translation in the Americas, pages 223–231,
Cambridge, USA. The Association for Machine
Translation in the Americas.

Artūrs Stafanovičs, Mārcis Pinnis, and Toms Bergmanis. 2020. Mitigating gender bias in machine translation with target gender annotations. In Proceedings of the Fifth Conference on Machine Translation, pages 629–638, Online. Association for Computational Linguistics.

Dagmar Stahlberg, Friederike Braun, Lisa Irmen,
and Sabine Sczesny. 2007. Representation of the sexes in language. Social Communication,
pages 163–187.

Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer. 2019. Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684, Florence, IT. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1164

Simone Stumpf, Anicia Peters, Shaowen Bardzell,
Margaret Burnett, Daniela Busse,
Jessica
Cauchard, and Elizabeth Churchill. 2020.
Gender-inclusive HCI research and design: A
conceptual review. Foundations and Trends
in Human–Computer Interaction, 13(1):1–69.
https://doi.org/10.1561/1100000056

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. Mitigating gender bias in natural language processing: Literature review. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1630–1640, Florence, IT. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1159

Tony Sun, Kellie Webster, Apu Shah, William Y. Wang, and Melvin Johnson. 2021. They, them, theirs: Rewriting with gender-neutral English. arXiv preprint arXiv:2102.06788.

Harini Suresh and John V. Guttag. 2019. A
framework for understanding unintended con-
sequences of machine learning. arXiv 预印本
arXiv:1901.10002.

Masashi Takeshita, Yuki Katsumata, Rafal Rzepka, and Kenji Araki. 2020. Can existing methods debias languages other than English? First attempt to analyze and mitigate Japanese word embeddings. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 44–55, Online. Association for Computational Linguistics.

Tina Tallon. 2019. A century of ‘‘shrill’’: How bias in technology has hurt women’s voices. The New Yorker.

Rachael Tatman. 2017. Gender and dialect bias in YouTube’s automatic captions. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, pages 53–59, Valencia, ES. Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-1606

Peter Trudgill. 2000. Sociolinguistics: An Introduction to Language and Society. Penguin Books, London, UK.

Anne M. Turner, Megumu K. Brownstein, Kate Cole, Hilary Karasz, and Katrin Kirchhoff. 2015. Modeling workflow to design machine translation applications for public health practice. Journal of Biomedical Informatics, 53:136–146. https://doi.org/10.1016/j.jbi.2014.10.005, PMID: 25445922

Amos Tversky and Daniel Kahneman. 1973.
Availability: A heuristic for judging frequency
and probability. Cognitive Psychology, 5(2):207–232. https://doi.org/10.1016/0010-0285(73)90033-9

Amos Tversky and Daniel Kahneman. 1974.
Judgment under uncertainty: Heuristics and biases. Science, 185(4157):1124–1131. https://doi.org/10.1126/science.185.4157.1124, PMID: 17835457

Emiel van Miltenburg. 2019. Pragmatic Factors
in (Automatic) Image Description. Ph.D. thesis, Vrije Universiteit, Amsterdam, NL.


Eva Vanmassenhove, Christian Hardmeier, and Andy Way. 2018. Getting Gender Right in Neural Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3003–3008, Brussels, BE. Association for Computational Linguistics.

Eva Vanmassenhove, Dimitar Shterionov, 和
Matthew Gwilliam. 2021. Machine transla-
tionese: Effects of algorithmic bias on linguistic
complexity in machine translation. In Proceed-
ings of the 16th Conference of the European
Chapter of the Association for Computational
语言学: Main Volume, pages 2203–2213.

Eva Vanmassenhove, Dimitar Shterionov, 和
Andy Way. 2019. Lost in translation: Loss and
decay of linguistic richness in machine trans-
lation. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 222–232, Dublin, IE. European Asso-
ciation for Machine Translation.

Mihaela Vorvoreanu, Lingyi Zhang, Yun-Han
Huang, Claudia Hilderbrand, Zoe Steine-Hanson, and Margaret Burnett. 2019. From gender biases to gender-inclusive design: An empirical investigation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, pages 1–14, New York, USA. Association for Computing
Machinery. https://doi.org/10.1145
/3290605.3300283

Claudia Wagner, David Garcia, Mohsen Jadidi,
and Markus Strohmaier. 2015. It’s a man’s
Wikipedia? Assessing gender inequality in an online encyclopedia. In Proceedings of the International AAAI Conference on Web and Social Media, volume 9.

Mario Wandruszka. 1969. Sprachen: Vergleichbar und Unvergleichlich. R. Piper & Co. Verlag, Munich, DE.

Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138–142, Austin, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-5618

Zeerak Waseem, Smarika Lulz, Joachim Bingel,
and Isabelle Augenstein. 2020. Disembodied
machine learning: On the illusion of objectivity
in NLP. OpenReview Preprint.

Kellie Webster, Marta R. Costa-jussà, Christian Hardmeier, and Will Radford. 2019. Gendered ambiguous pronoun (GAP) shared task at the gender bias in NLP workshop 2019. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 1–7, Florence, IT. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-3801

Ilka B. Wolter and Bettina Hannover. 2016.
Gender role self-concept at school start and
its impact on academic self-concept and per-
formance in mathematics and reading. Euro-
pean Journal of Developmental Psychology,
13(6):681–703. https://doi.org/10.1080
/17405629.2016.1175343

Jieyu Zhao, Subhabrata Mukherjee, Saghar
Hosseini, Kai-Wei Chang,
and Ahmed
Hassan Awadallah. 2020. Gender bias in
multilingual embeddings and cross-lingual
transfer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2896–2907, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.260

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente
Ordonez, and Kai-Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2979–2989, Copenhagen, DK. Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1323

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20, New Orleans, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2003

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei
王, and Kai-Wei Chang. 2018乙. 学习
gender-neutral word embeddings. In Proceed-
ings of
这 2018 Conference on Empirical
Methods in Natural Language Processing,
pages 4847–4853, 布鲁塞尔, 是. 协会
for Computational Linguistics. https://
doi.org/10.18653/v1/D18-1521

Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang. 2019. Examining gender bias in languages with grammatical gender. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5276–5284, Hong Kong, CN. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1531

Lal Zimman. 2020. Transgender language, transgender moment: Toward a trans linguistics. In Kira Hall and Rusty Barrett, editors, The Oxford Handbook of Language and Sexuality, Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190212926.013.45

Lal Zimman, Evan Hazenberg, and Miriam
Meyerhoff. 2017. Trans people's linguistic self-determination and the dialogic nature of identity. Representing trans: Linguistic, legal and everyday perspectives, pages 226–248.

Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach,
and Ryan Cotterell. 2019. Counterfactual data
augmentation for mitigating gender stereotypes
in languages with rich morphology. In Pro-
ceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, IT. Association for Computational Linguistics. https://
doi.org/10.18653/v1/P19-1161
