Analysis Methods in Neural Language Processing: A Survey
Yonatan Belinkov1,2 and James Glass1
1MIT Computer Science and Artificial Intelligence Laboratory
2Harvard School of Engineering and Applied Sciences
Cambridge, MA, USA
{belinkov, glass}@mit.edu
Abstract
The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.
1 Introduction
The rise of deep learning has transformed the field of natural language processing (NLP) in recent years. Models based on neural networks have obtained impressive improvements in various tasks, including language modeling (Mikolov et al., 2010; Jozefowicz et al., 2016), syntactic parsing (Kiperwasser and Goldberg, 2016), machine translation (MT) (Bahdanau et al., 2014; Sutskever et al., 2014), and many other tasks; see Goldberg (2017) for example success stories.
This progress has been accompanied by a myriad of new neural network architectures. In many cases, traditional feature-rich systems are being replaced by end-to-end neural networks that aim to map input text to some output prediction. As end-to-end systems are gaining prevalence, one may point to two trends. First, some push back against the abandonment of linguistic knowledge and call for incorporating it inside the networks in different ways.1 Others strive to better understand how NLP models work. This theme of analyzing neural networks has connections to the broader work on interpretability in machine learning, along with specific characteristics of the NLP field.
Why should we analyze our neural NLP models? To some extent, this question falls into the larger question of interpretability in machine learning, which has been the subject of much debate in recent years.2 Arguments in favor of interpretability in machine learning usually mention goals like accountability, trust, fairness, safety, and reliability (Doshi-Velez and Kim, 2017; Lipton, 2016). Arguments against interpretability typically stress performance as the most important desideratum. All these arguments naturally apply to machine learning applications in NLP.
In the context of NLP, this question needs to be understood in light of earlier NLP work, often referred to as feature-rich or feature-engineered systems. In some of these systems, features are more easily understood by humans—they can be morphological properties, lexical classes, syntactic categories, semantic relations, etc. In theory, one could observe the importance assigned by statistical NLP models to such features in order to gain a better understanding of the model.3 In
1See, for instance, Noah Smith’s invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)
2See, for instance, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)
3Nevertheless, one could question how feasible such an analysis is; consider, for example, interpreting support vectors in high-dimensional support vector machines (SVMs).
Transactions of the Association for Computational Linguistics, vol. 7, pp. 49–72, 2019. Action Editor: Marco Baroni.
Submission batch: 10/2018; Revision batch: 12/2018; Published 3/2019.
© 2019 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
contrast, it is more difficult to understand what happens in an end-to-end neural network model that takes input (say, word embeddings) and generates an output (say, a sentence classification). Much of the analysis work thus aims to understand how linguistic concepts that were common as features in NLP systems are captured in neural networks.
As the analysis of neural networks for language
is becoming more and more prevalent, neural
networks in various NLP tasks are being analyzed;
different network architectures and components
are being compared, and a variety of new anal-
ysis methods are being developed. This survey
aims to review and summarize this body of work,
highlight current trends, and point to existing
lacunae. It organizes the literature into several
themes. Section 2 reviews work that targets a fundamental question: What kind of linguistic information is captured in neural networks? We also point to limitations in current methods for answering this question. Section 3 discusses visualization methods, and emphasizes the difficulty in evaluating visualization work. In Section 4, we discuss the compilation of challenge sets, or test suites, for fine-grained evaluation, a methodology that has old roots in NLP. Section 5 deals with the generation and use of adversarial examples to probe weaknesses of neural networks. We point to unique characteristics of dealing with text as a discrete input and how different studies handle them. Section 6 summarizes work on explaining model predictions, an important goal of interpretability research. This is a relatively underexplored area, and we call for more work in this direction. Section 7 mentions a few other methods that do not fall neatly into one of the above themes. In the conclusion, we summarize the main gaps and potential research directions for the field.
The paper is accompanied by online supplementary materials that contain detailed references for studies corresponding to Sections 2, 4, and 5 (Tables SM1, SM2, and SM3, respectively), available at https://boknilev.github.io/nlp-analysis-methods.
Before proceeding, we briefly mention some
earlier work of a similar spirit.
A Historical Note Reviewing the vast literature on neural networks for language is beyond our scope.4 However, we mention here a few representative studies that focused on analyzing such networks in order to illustrate how recent trends have roots that go back to before the recent deep learning revival.
Rumelhart and McClelland (1986) built a feedforward neural network for learning the English past tense and analyzed its performance on a variety of examples and conditions. They were especially concerned with the performance over the course of training, as their goal was to model the past form acquisition in children. They also analyzed a scaled-down version having eight input units and eight output units, which allowed them to describe it exhaustively and examine how certain rules manifest in network weights.
In his seminal work on recurrent neural networks (RNNs), Elman trained networks on synthetic sentences in a language prediction task (Elman, 1989, 1990, 1991). Through extensive analyses, he showed how networks discover the notion of a word when predicting characters; capture syntactic structures like number agreement; and acquire word representations that reflect lexical and syntactic categories. Similar analyses were later applied to other networks and tasks (Harris, 1990; Niklasson and Linåker, 2000; Pollack, 1990; Frank et al., 2013).
While Elman’s work was limited in some ways, such as evaluating generalization or various linguistic phenomena—as Elman himself recognized (Elman, 1989)—it introduced methods that are still relevant today: from visualizing network activations in time, through clustering words by hidden state activations, to projecting representations to dimensions that emerge as capturing properties like sentence number or verb valency. The sections on visualization (Section 3) and identifying linguistic information (Section 2) contain many examples for these kinds of analysis.
2 What Linguistic Information Is
Captured in Neural Networks?
Neural network models in NLP are typically trained in an end-to-end manner on input–output pairs, without explicitly encoding linguistic
4For example, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.
features. Thus, a primary question is the following: What linguistic information is captured in neural networks? When examining answers to this question, it is convenient to consider three dimensions: which methods are used for conducting the analysis, what kind of linguistic information is sought, and which objects in the neural network are being investigated. Table SM1 (in the supplementary materials) categorizes relevant analysis work according to these criteria. In the next subsections, we discuss trends in analysis work along these lines, followed by a discussion of limitations of current approaches.
2.1 Methods
The most common approach for associating neural network components with linguistic properties is to predict such properties from activations of the neural network. Typically, in this approach a neural network model is trained on some task (say, MT) and its weights are frozen. Then, the trained model is used for generating feature representations for another task by running it on a corpus with linguistic annotations and recording the representations (say, hidden state activations). Another classifier is then used for predicting the property of interest (say, part-of-speech [POS] tags). The performance of this classifier is used for evaluating the quality of the generated representations, and by proxy that of the original model. This kind of approach has been used in numerous papers in recent years; see Table SM1 for references.5 It is referred to by various names, including ‘‘auxiliary prediction tasks’’ (Adi et al., 2017b), ‘‘diagnostic classifiers’’ (Veldhoen et al., 2016), and ‘‘probing tasks’’ (Conneau et al., 2018).
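As an illustration, the full probing pipeline can be sketched in a few lines of Python. Everything below is a toy stand-in: the ‘‘frozen encoder’’ is a stub rather than a real trained model, and the data, dimensionality, and hyperparameters are invented for the example.

```python
import math
import random

random.seed(0)

# Stand-in for a frozen, pre-trained encoder. In a real analysis these would
# be recorded hidden states; here, dimension 0 weakly encodes the POS label.
def frozen_encoder(tag):
    base = 1.0 if tag == "NOUN" else -1.0
    return [base + random.gauss(0, 0.5), random.gauss(0, 1.0)]

tags = ["NOUN", "VERB"] * 100
states = [frozen_encoder(t) for t in tags]
labels = [1 if t == "NOUN" else 0 for t in tags]

# Train a logistic-regression probe on the frozen activations (plain SGD).
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(50):
    for x, y in zip(states, labels):
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        g = p - y
        w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
        b -= lr * g

# The probe's accuracy serves as a proxy for how much POS information
# the representations expose.
preds = [1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0 for x in states]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

In practice the probe is trained on held-out annotated data and a stronger toolkit is used, but the division of labor is exactly this: the analyzed model stays frozen, and only the probe is optimized.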
As an example of this approach, let us walk through an application to analyzing syntax in neural machine translation (NMT) by Shi et al. (2016b). In this work, two NMT models were trained on standard parallel data—English→French and English→German. The trained models (specifically, the encoders) were run on an annotated corpus and their hidden states were used for training a logistic regression classifier that predicts different syntactic properties. The authors concluded that the NMT encoders learn
5A similar method has been used to analyze hierarchical
structure in neural networks trained on arithmetic expressions
(Veldhoen et al., 2016; Hupkes et al., 2018).
significant syntactic information at both word level and sentence level. They also compared representations at different encoding layers and found that ‘‘local features are somehow preserved in the lower layer whereas more global, abstract information tends to be stored in the upper layer.’’ These results demonstrate the kind of insights that the classification analysis may lead to, especially when comparing different models or model components.
Other methods for finding correspondences between parts of the neural network and certain properties include counting how often attention weights agree with a linguistic property like anaphora resolution (Voita et al., 2018) or directly computing correlations between neural network activations and some property; for example, correlating RNN state activations with depth in a syntactic tree (Qian et al., 2016a) or with Mel-frequency cepstral coefficient (MFCC) acoustic features (Wu and King, 2016). Such correspondence may also be computed indirectly. For instance, Alishahi et al. (2017) defined an ABX discrimination task to evaluate how a neural model of speech (grounded in vision) encoded phonology. Given phoneme representations from different layers in their model, and three phonemes, A, B, and X, they compared whether the model representation for X is closer to A or B. This discrimination task enabled them to draw conclusions about which layers encode phonology better, observing that lower layers generally encode more phonological information.
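The ABX logic itself is straightforward to express in code. In this toy sketch the phoneme representations and the distance function are invented for illustration and are not taken from Alishahi et al. (2017):

```python
def euclidean(u, v):
    # Simple Euclidean distance between two representation vectors.
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) ** 0.5

def abx_accuracy(triples, distance=euclidean):
    """Fraction of (A, B, X) triples in which X lies closer to A,
    where A and X share a category and B does not."""
    hits = sum(distance(x, a) < distance(x, b) for a, b, x in triples)
    return hits / len(triples)

# Hypothetical layer representations for three phoneme triples; in each
# triple, A and X belong to the same phoneme category.
triples = [
    ((1.0, 0.1), (-1.0, 0.0), (0.9, 0.2)),
    ((0.8, -0.2), (-0.9, 0.1), (1.1, 0.0)),
    ((-1.0, 0.3), (1.0, -0.1), (-0.8, 0.2)),
]
acc = abx_accuracy(triples)
```

Running this per layer and comparing the resulting accuracies is what allows conclusions of the form ‘‘lower layers encode more phonological information.’’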
2.2 Linguistic Phenomena
Different kinds of linguistic information have been analyzed, ranging from basic properties like sentence length, word position, word presence, or simple word order, to morphological, syntactic, and semantic information. Phonetic/phonemic information, speaker information, and style and accent information have been studied in neural network models for speech, or in joint audio-visual models. See Table SM1 for references.
While it is difficult to synthesize a holistic picture from this diverse body of work, it appears that neural networks are able to learn a substantial amount of information on various linguistic phenomena. These models are especially successful at capturing frequent properties, while some rare properties are more difficult to learn.
Linzen et al. (2016), for instance, found that long short-term memory (LSTM) language models are able to capture subject–verb agreement in many common cases, while direct supervision is required for solving harder cases.
Another theme that emerges in several studies is the hierarchical nature of the learned representations. We have already mentioned such findings regarding NMT (Shi et al., 2016b) and a visually grounded speech model (Alishahi et al., 2017). Hierarchical representations of syntax were also reported to emerge in other RNN models (Blevins et al., 2018).
Finally, a couple of papers discovered that models trained with latent trees perform better on natural language inference (NLI) (Williams et al., 2018; Maillard and Clark, 2018) than ones trained with linguistically annotated trees. Moreover, the trees in these models do not resemble syntactic trees corresponding to known linguistic theories, which casts doubts on the importance of syntax-learning in the underlying neural network.6
2.3 Neural Network Components
In terms of the object of study, various neural network components were investigated, including word embeddings, RNN hidden states or gate activations, sentence embeddings, and attention weights in sequence-to-sequence (seq2seq) models. Generally less work has analyzed convolutional neural networks in NLP, but see Jacovi et al. (2018) for a recent exception. In speech processing, researchers have analyzed layers in deep neural networks for speech recognition and different speaker embeddings. Some analysis has also been devoted to joint language–vision or audio–vision models, or to similarities between word embeddings and convolutional image representations. Table SM1 provides detailed references.
2.4 Limitations
The classification approach may find that a certain amount of linguistic information is captured in the neural network. However, this does not necessarily mean that the information is used by the network. For example, Vanmassenhove et al. (2017)
6Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).
investigated aspect in NMT (and in phrase-based statistical MT). They trained a classifier on NMT sentence encoding vectors and found that they can accurately predict tense about 90% of the time. However, when evaluating the output translations, they found them to have the correct tense only 79% of the time. They interpreted this result to mean that ‘‘part of the aspectual information is lost during decoding.’’ Relatedly, Cífka and Bojar (2018) compared the performance of various NMT models in terms of translation quality (BLEU) and representation quality (classification tasks). They found a negative correlation between the two, suggesting that high-quality systems may not be learning certain sentence meanings. In contrast, Artetxe et al. (2018) showed that word embeddings contain divergent linguistic information, which can be uncovered by applying a linear transformation on the learned embeddings. Their results suggest an alternative explanation, showing that ‘‘embedding models are able to encode divergent linguistic information but have limits on how this information is surfaced.’’
From a methodological point of view, most of the relevant analysis work is concerned with correlation: How correlated are neural network components with linguistic properties? What may be lacking is a measure of causation: How does the encoding of linguistic properties affect the system output? Giulianelli et al. (2018) make some headway on this question. They predicted number agreement from RNN hidden states and gates at different time steps. They then intervened in how the model processes the sentence by changing a hidden activation based on the difference between the prediction and the correct label. This improved agreement prediction accuracy, and the effect persisted over the course of the sentence, indicating that this information has an effect on the model. However, they did not report the effect on overall model quality, for example by measuring perplexity. Methods from causal inference may shed new light on some of these questions.
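The spirit of such an intervention can be sketched as follows. The probe weights and the hidden state below are hypothetical, and the update rule is a simplified stand-in for the authors’ actual procedure: the hidden state is nudged along the probe’s weights in the direction that reduces the probe’s error on the gold label.

```python
import math

def probe_prob(hidden, w, b):
    """Probability the (logistic-regression) probe assigns to the
    positive class, e.g., plural agreement."""
    return 1 / (1 + math.exp(-(sum(wi * hi for wi, hi in zip(w, hidden)) + b)))

def intervene(hidden, w, b, gold, lr=0.5):
    """Shift the hidden state toward the gold label along the probe weights."""
    error = gold - probe_prob(hidden, w, b)
    return [h + lr * error * wi for h, wi in zip(hidden, w)]

probe_w, probe_b = [2.0, -1.0], 0.0      # hypothetical trained probe
h = [-0.2, 0.4]                          # hidden state on which the probe errs
p_before = probe_prob(h, probe_w, probe_b)
h_fixed = intervene(h, probe_w, probe_b, gold=1)
p_after = probe_prob(h_fixed, probe_w, probe_b)
# p_after > p_before: the intervention makes the correct label more likely
```

In a real experiment the modified state is fed back into the network, and the downstream behavior (here, agreement accuracy) is measured to test whether the encoded information is causally used.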
Finally, the predictor for the auxiliary task is usually a simple classifier, such as logistic regression. A few studies compared different classifiers and found that deeper classifiers lead to overall better results, but do not alter the respective trends when comparing different models or components (Qian et al., 2016b; Belinkov, 2018). Interestingly, Conneau et al. (2018) found that tasks requiring more nuanced linguistic knowledge
Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.
(e.g., tree depth, coordination inversion) gain the most from using a deeper classifier. However, this
approach is usually taken for granted; given its prevalence, it appears that better theoretical or empirical foundations are needed.
3 Visualization
Visualization is a valuable tool for analyzing neural networks in the language domain and beyond. Early work visualized hidden unit activations in RNNs trained on an artificial language modeling task, and observed how they correspond to certain grammatical relations such as agreement (Elman, 1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al., 2015; Kádár et al., 2017; Qian et al., 2016a; Liu et al., 2018) and speech (Wu and King, 2016; Nagamine et al., 2015; Wang et al., 2017b). Figure 1 shows an example visualization of a neuron that captures position of words in a sentence. The heatmap uses blue and red colors for negative and positive activation values, respectively, enabling the user to quickly grasp the function of this neuron.
The attention mechanism that originated in work on NMT (Bahdanau et al., 2014) also lends itself to a natural visualization. The alignments obtained via different attention mechanisms have produced visualizations ranging from tasks like NLI (Rocktäschel et al., 2016; Yin et al., 2016), summarization (Rush et al., 2015), MT post-editing (Jauregi Unanue et al., 2018), and morphological inflection (Aharoni and Goldberg, 2017) to matching users on social media (Tay et al., 2018). Figure 2 reproduces a visualization of attention alignments from the original work by Bahdanau et al. Here grayscale values correspond to the weight of the attention between words in an English source sentence (columns) and its French translation (rows). As Bahdanau et al. explain, this visualization demonstrates that the NMT model learned a soft alignment between source and target words. Some aspects of word order may also be
Figure 2: A visualization of attention weights, showing soft alignment between source and target sentences in an NMT model. Reproduced from Bahdanau et al. (2014), with permission.
noticed, as in the reordering of noun and adjective
when translating the phrase ‘‘European Economic
Area.’’
Another line of work computes various saliency measures to attribute predictions to input features. The important or salient features can then be visualized in selected examples (Li et al., 2016a; Aubakirova and Bansal, 2016; Sundararajan et al., 2017; Arras et al., 2017a,b; Ding et al., 2017; Murdoch et al., 2018; Mudrakarta et al., 2018; Montavon et al., 2018; Godin et al., 2018). Saliency can also be computed with respect to intermediate values, rather than input features (Ghaeini et al., 2018).7
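Gradient-based saliency can be approximated without an autodiff framework by finite differences, as the following sketch shows. The ‘‘model’’ here is a hypothetical linear scorer (so the saliency scores simply recover its coefficients); real saliency methods differ in how they aggregate and sign these gradients.

```python
def model(x):
    # Toy stand-in for a sentiment scorer: only the first two features matter.
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]

def saliency(f, x, eps=1e-4):
    """Absolute central-difference gradient of f at x: one score per feature."""
    scores = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        scores.append(abs(f(xp) - f(xm)) / (2 * eps))
    return scores

s = saliency(model, [0.5, 0.1, 0.9])
# s is approximately [3.0, 2.0, 0.0]: feature 0 is most salient, feature 2 irrelevant
```

Visualizing such scores over the input tokens of a selected example (e.g., as a heatmap) is the common presentation in the papers cited above.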
An instructive visualization technique is to cluster neural network activations and compare them to some linguistic property. Early work clustered RNN activations, showing that they organize in lexical categories (Elman, 1989, 1990). Similar techniques have been followed by others. Recent examples include clustering of sentence embeddings in an RNN encoder trained in a multitask learning scenario (Brunner et al., 2017), and phoneme clusters in a joint audio-visual RNN model (Alishahi et al., 2017).
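A minimal sketch of this cluster-and-compare analysis follows; the activation vectors, centroids, and gold categories are all invented for illustration, and a real study would obtain the centroids from a clustering algorithm such as k-means.

```python
def nearest(v, centroids):
    # Index of the centroid closest to vector v (squared Euclidean distance).
    dist = lambda u, w: sum((a - b) ** 2 for a, b in zip(u, w))
    return min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))

# Hypothetical hidden-state activations for four words.
activations = {
    "cat": (0.9, 0.1), "dog": (0.8, 0.2),   # noun-like region
    "run": (0.1, 0.9), "eat": (0.2, 0.8),   # verb-like region
}
centroids = [(0.85, 0.15), (0.15, 0.85)]    # e.g., output of k-means

clusters = {w: nearest(v, centroids) for w, v in activations.items()}

# Compare cluster membership against gold lexical categories.
gold = {"cat": 0, "dog": 0, "run": 1, "eat": 1}
purity = sum(clusters[w] == gold[w] for w in gold) / len(gold)
```

High purity between induced clusters and lexical categories is the kind of evidence Elman-style analyses report.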
A few online tools for visualizing neural net-
works have recently become available. LSTMVis
7In general, many of the visualization methods are adapted from the vision domain, where they have been extremely popular; see Zhang and Zhu (2018) for a survey.
(Strobelt et al., 2018b) visualizes RNN activations, focusing on tracing hidden state dynamics.8 Seq2Seq-Vis (Strobelt et al., 2018a) visualizes different modules in attention-based seq2seq models, with the goal of examining model decisions and testing alternative decisions. Another tool focused on comparing attention alignments was proposed by Rikters (2018). It also provides translation confidence scores based on the distribution of attention weights. NeuroX (Dalvi et al., 2019b) is a tool for finding and analyzing individual neurons, focusing on machine translation.
Evaluation As in much work on interpretability, evaluating visualization quality is difficult and often limited to qualitative examples. A few notable exceptions report human evaluations of visualization quality. Singh et al. (2018) showed human raters hierarchical clusterings of input words generated by two interpretation methods, and asked them to evaluate which method is more accurate, or in which method they trust more. Others reported human evaluations for attention visualization in conversation modeling (Freeman et al., 2018) and medical code prediction tasks (Mullenbach et al., 2018).
The availability of open-source tools of the sort
described above will hopefully encourage users
to utilize visualization in their regular research
and development cycle. However, it remains to be
seen how useful visualizations turn out to be.
4 Challenge Sets
The majority of benchmark datasets in NLP are
drawn from text corpora, reflecting a natural
frequency distribution of language phenomena.
While useful in practice for evaluating system
performance in the average case, such datasets
may fail to capture a wide range of phenomena.
An alternative evaluation framework consists of
challenge sets, also known as test suites, which have been used in NLP for a long time (Lehmann et al., 1996), especially for evaluating MT systems (King and Falkedal, 1990; Isahara, 1995; Koh et al., 2001). Lehmann et al. (1996) noted several key properties of test suites: systematicity, inclusion of negative data, control over data,
8RNNVis (Ming et al., 2017) is a similar tool, but its online demo does not seem to be available at the time of writing.
and exhaustivity. They contrasted such datasets
with test corpora, ‘‘whose main advantage is
that they reflect naturally occurring data.’’ This
idea underlines much of the work on challenge
sets and is echoed in more recent work (Wang et al., 2018a). For example, Cooper et al. (1996) constructed a semantic test suite that targets phenomena as diverse as quantifiers, plurals, anaphora, ellipsis, adjectival properties, and so on.
After a hiatus of a couple of decades,9 challenge sets have recently gained renewed popularity in the NLP community. In this section, we include datasets used for evaluating neural network models that diverge from the common average-case evaluation. Many of them share some of the properties noted by Lehmann et al. (1996), although negative examples (ill-formed data) are typically less utilized. The challenge datasets can be categorized along the following criteria: the task they seek to evaluate, the linguistic phenomena they aim to study, the language(s) they target, their size, their method of construction, and how performance is evaluated.10 Table SM2 (in the supplementary materials) categorizes many recent challenge sets along these criteria. Below we discuss common trends along these lines.
4.1 Tasks
By far, the most targeted tasks in challenge sets are NLI and MT. This can partly be explained by the popularity of these tasks and the prevalence of neural models proposed for solving them. Perhaps more importantly, tasks like NLI and MT arguably require inferences at various linguistic levels, making the challenge set evaluation especially attractive. Still, other high-level tasks like reading comprehension or question answering have not received as much attention, and may also benefit from the careful construction of challenge sets.
A significant body of work aims to evaluate
the quality of embedding models by correlating
the similarity they induce on word or sentence
pairs with human similarity judgments. Datasets
containing such similarity scores are often used
9One could speculate that their decrease in popularity
can be attributed to the rise of large-scale quantitative eval-
uation of statistical NLP systems.
10Another typology of evaluation protocols was put forth
by Burlot and Yvon (2017). Their criteria are partially
overlapping with ours, although they did not provide a
comprehensive categorization like the one compiled here.
to evaluate word embeddings (Finkelstein et al., 2002; Bruni et al., 2012; Hill et al., 2015, inter alia) or sentence embeddings; see the many shared tasks on semantic textual similarity in SemEval (Cer et al., 2017, and previous editions). Many of these datasets evaluate similarity at a coarse-grained level, but some provide a more fine-grained evaluation of similarity or relatedness. For example, some datasets are dedicated for specific word classes such as verbs (Gerz et al., 2016) or rare words (Luong et al., 2013), or for evaluating compositional knowledge in sentence embeddings (Marelli et al., 2014). Multilingual and cross-lingual versions have also
been collected (Leviant and Reichart, 2015; Cer et al., 2017). Although these datasets are widely
used, this kind of evaluation has been criticized
for its subjectivity and questionable correlation
with downstream performance (Faruqui et al.,
2016).
4.2 Linguistic Phenomena
One of the primary goals of challenge sets is to evaluate models on their ability to handle specific linguistic phenomena. While earlier studies emphasized exhaustivity (Cooper et al., 1996; Lehmann et al., 1996), recent ones tend to focus on a few properties of interest. For example, Sennrich (2017) introduced a challenge set for MT evaluation focusing on five properties: subject–verb agreement, noun phrase agreement, verb–particle constructions, polarity, and transliteration. Slightly more elaborated is an MT challenge set for morphology, including 14 morphological properties (Burlot and Yvon,
2017). See Table SM2 for references to datasets
targeting other phenomena.
Other challenge sets cover a more diverse range of linguistic properties, in the spirit of some of the earlier work. For instance, extending the categories in Cooper et al. (1996), the GLUE analysis set for NLI covers more than 30 phenomena in four coarse categories (lexical semantics, predicate–argument structure, logic, and knowledge). In MT evaluation, Burchardt et al. (2017) reported results using a large test suite covering 120 phenomena, partly based on Lehmann et al. (1996).11 Isabelle et al. (2017)
11Their dataset does not seem to be available yet, but more
details are promised to appear in a future publication.
and Isabelle and Kuhn (2018) prepared challenge sets for MT evaluation covering fine-grained phenomena at morpho-syntactic, syntactic, and lexical levels.
In general, datasets that are constructed programmatically tend to cover less fine-grained linguistic properties, while manually constructed datasets represent more diverse phenomena.
4.3 Languages
As is unfortunately usual in much NLP work, especially neural NLP, the vast majority of challenge sets are in English. This situation is slightly better in MT evaluation, where naturally all datasets feature other languages (see Table SM2). A notable exception is the work by Gulordava et al. (2018), who constructed examples for evaluating number agreement in language modeling in English, Russian, Hebrew, and Italian. Clearly, there is room for more challenge sets in non-English languages. However, perhaps more pressing is the need for large-scale non-English datasets (besides MT) to develop neural models for popular NLP tasks.
4.4 Scale
The size of proposed challenge sets varies greatly (Table SM2). As expected, datasets constructed by hand are smaller, with typical sizes in the hundreds. Automatically built datasets are much larger, ranging from several thousands to close to a hundred thousand (Sennrich, 2017), or even more than one million examples (Linzen et al., 2016). In the latter case, the authors argue that such a large test set is needed for obtaining a sufficient representation of rare cases. A few manually constructed datasets contain a fairly large number of examples, up to 10 thousand (Burchardt et al., 2017).
4.5 Construction Method
Challenge sets are usually created either programmatically or manually, by handcrafting specific examples. Often, semi-automatic methods are used to compile an initial list of examples that is manually verified by annotators. The specific method also affects the kind of language use and how natural or artificial/synthetic the examples are. We describe here some trends in dataset construction methods in the hope that they may be useful for researchers contemplating new datasets.
Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sanchez et al. (2018) and Glockner et al. (2018) extracted examples from SNLI (Bowman et al., 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al. (2016), on the other hand, extracted examples of subject–verb agreement from raw texts using heuristics, resulting in a large-scale dataset. Gulordava et al. (2018) extended this to other agreement phenomena, but they relied on syntactic information available in treebanks, resulting in a smaller dataset.
Several challenge sets utilize existing test suites, either as a direct source of examples (Burchardt et al., 2017) or for searching for similar naturally occurring examples (Wang et al., 2018a).12
Sennrich (2017) introduced a method for eval-
uating NMT systems via contrastive translation
对, where the system is asked to estimate
the probability of two candidate translations that
are designed to reflect specific linguistic prop-
erties. Sennrich generated such pairs program-
matically by applying simple heuristics, 例如
changing gender and number to induce agreement
错误, resulting in a large-scale challenge set
of close to 100 thousand examples. This frame-
work was extended to evaluate other properties,
but often requiring more sophisticated genera-
tion methods like using morphological analyzers/
generators (Burlot and Yvon, 2017) or more man-
ual involvement in generation (Bawden et al.,
2018) or verification (Rios Gonzales et al., 2017).
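The contrastive-pair protocol can be sketched in a few lines. The sentence scores below are invented stand-ins for an NMT model's log-probabilities; only the evaluation logic follows the description above.

```python
# Sketch of contrastive-pair evaluation in the style of Sennrich (2017):
# the model is credited with a pair when it assigns a higher probability
# to the reference translation than to a minimally perturbed variant.

def contrastive_accuracy(pairs, score):
    """pairs: (reference, contrastive) tuples; score: sentence -> log-prob."""
    correct = sum(score(ref) > score(con) for ref, con in pairs)
    return correct / len(pairs)

# Invented log-probabilities standing in for a trained NMT model.
toy_scores = {
    "the dogs bark": -1.2, "the dogs barks": -3.4,   # agreement error caught
    "she has left": -0.9, "she have left": -0.8,     # model fooled here
}

pairs = [("the dogs bark", "the dogs barks"),
         ("she has left", "she have left")]
acc = contrastive_accuracy(pairs, toy_scores.get)    # 0.5 on this toy data
```

A system's contrastive accuracy is then simply the fraction of pairs on which it prefers the reference over the perturbed variant.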
Finally, a few studies define templates that capture certain linguistic properties and instantiate them with word lists (Dasgupta et al., 2018; Rudinger et al., 2018; Zhao et al., 2018a).
Template-based generation has the advantage of
providing more control, for example for obtaining
a specific vocabulary distribution, but this comes
at the expense of how natural the examples are.
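As a minimal sketch of the template approach (the template and word lists are invented for illustration), each slot is filled from a controlled word list, which is what gives the fine-grained control over the vocabulary distribution mentioned above:

```python
from itertools import product

# Every combination of slot fillers yields one challenge example.
template = "The {noun} {verb} the {object}."
word_lists = {
    "noun": ["doctor", "lawyer"],
    "verb": ["sees", "calls"],
    "object": ["patient", "client"],
}

def instantiate(template, word_lists):
    slots = list(word_lists)
    return [template.format(**dict(zip(slots, values)))
            for values in product(*(word_lists[s] for s in slots))]

examples = instantiate(template, word_lists)
# 2 * 2 * 2 = 8 examples, e.g., "The doctor sees the patient."
```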
4.6 Evaluation
Systems are typically evaluated by their per-
formance on the challenge set examples, either with the same metric used for evaluating the system in the first place, or via a proxy, as in the
12Wang et al. (2018a) also verified that their examples do not contain annotation artifacts, a potential problem noted in recent studies (Gururangan et al., 2018; Poliak et al., 2018b).
contrastive pairs evaluation of Sennrich (2017).
Automatic evaluation metrics are cheap to obtain
and can be calculated on a large scale. However, they may miss certain aspects. Thus a few studies report human evaluation on their challenge sets, such as in MT (Isabelle et al., 2017; Burchardt et al., 2017).
We note here also that judging the quality of a
model by its performance on a challenge set can
be tricky. Some authors emphasize their wish
to test systems on extreme or difficult cases,
‘‘beyond normal operational capacity’’ (Naik et al., 2018). However, whether one should expect
systems to perform well on specially chosen cases
(as opposed to the average case) may depend
on one’s goals. To put results in perspective,
one may compare model performance to human
performance on the same task (Gulordava et al.,
2018).
5 Adversarial Examples
Understanding a model also requires an under-
standing of its failures. Despite their success
in many tasks, machine learning systems can
also be very sensitive to malicious attacks or
adversarial examples (Szegedy et al., 2014;
Goodfellow et al., 2015). In the vision domain,
small changes to the input image can lead to
misclassification, even if such changes are in-
distinguishable by humans.
The basic setup in work on adversarial examples
can be described as follows.13 Given a neural network model f and an input example x, we seek to generate an adversarial example x′ that will have a minimal distance from x, while being assigned a different label by f:

min_{x′} ||x − x′||   s.t.   f(x) = l, f(x′) = l′, l ≠ l′
In the vision domain, x can be the input image pixels, resulting in a fairly intuitive interpretation of this optimization problem: measuring the distance ||x − x′|| is straightforward, and finding x′ can be done by computing gradients with respect to the input, since all quantities are continuous.
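To make the continuous case concrete, here is a single gradient-sign (FGSM-style; Goodfellow et al., 2015) step against a toy logistic classifier; the weights, input, and step size are invented for illustration:

```python
import math

w, b = [2.0, -3.0], 0.5           # fixed toy model parameters

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

def fgsm_step(x, y, eps):
    # For logistic loss, d(loss)/d(x_i) has the sign of (p - y) * w_i,
    # so stepping along that sign increases the loss fastest.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [xi + eps * math.copysign(1.0, (p - y) * wi)
            for xi, wi in zip(x, w)]

x = [0.5, 0.1]                    # predicted label: 1
x_adv = fgsm_step(x, y=1, eps=0.7)
# the label flips under a perturbation bounded by eps in each coordinate
```

It is exactly this gradient step that has no direct analogue when x is a sequence of discrete word IDs.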
In the text domain, the input is discrete (for example, a sequence of words), which poses two problems. First, it is not clear how to measure
13The notation here follows Yuan et al. (2017).
the distance between the original and adversarial examples, x and x′, which are two discrete objects (say, two words or sentences). Second, minimizing
this distance cannot be easily formulated as an
optimization problem, as this requires computing
gradients with respect to a discrete input.
In the following, we review methods for
handling these difficulties according to several
criteria: the adversary’s knowledge, the specificity
of the attack, the linguistic unit being modified,
and the task on which the attacked model was
trained.14 Table SM3 (in the supplementary ma-
terials) categorizes work on adversarial examples
in NLP according to these criteria.
5.1 Adversary’s Knowledge
Adversarial examples can be generated using
access to model parameters, also known as white-box attacks, or without such access, with black-box attacks (Papernot et al., 2016a, 2017; Narodytska and Kasiviswanathan, 2017; Liu et al., 2017).
White-box attacks are difficult to adapt to the
text world as they typically require computing
gradients with respect to the input, which would
be discrete in the text case. One option is to
compute gradients with respect to the input word
embeddings, and perturb the embeddings. Since this may result in a vector that does not correspond to any word, one could search for the closest word embedding in a given dictionary (Papernot et al., 2016b); Cheng et al. (2018) extended this idea to seq2seq models. Others computed gradients with respect to input word embeddings to identify and rank words to be modified (Samanta and Mehta, 2017; Liang et al., 2018). Ebrahimi et al. (2018b)
developed an alternative method by representing
text edit operations in vector space (例如, a binary
vector specifying which characters in a word
would be changed) and approximating the change
in loss with the derivative along this vector.
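The perturb-then-snap workaround can be sketched as follows: move a word's embedding along a gradient direction, then map the result back to the nearest word in the vocabulary. The three-word vocabulary, its embeddings, and the gradient are invented here; in a real attack the gradient would come from the model's loss rather than being hard-coded.

```python
# Invented 2-d embeddings for a toy vocabulary.
emb = {
    "good":  [0.9, 0.8],
    "great": [0.95, 0.85],
    "bad":   [-0.9, -0.8],
}

def nearest_word(vec):
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(emb, key=lambda w: sq_dist(emb[w], vec))

def perturb_and_snap(word, grad, eps):
    # Step the embedding along the gradient, then snap to a real word.
    v = [vi + eps * gi for vi, gi in zip(emb[word], grad)]
    return nearest_word(v)

# A gradient pointing away from the "positive" region of the space
# pushes "good" toward the embedding of "bad".
adv_word = perturb_and_snap("good", grad=[-1.0, -1.0], eps=1.5)
```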
Given the difficulty in generating white-box
adversarial examples for text, much research has
been devoted to black-box examples. Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so
14These criteria are partly taken from Yuan et al. (2017),
where a more elaborate taxonomy is laid out. At present, though, the work on adversarial examples in NLP is more limited than in computer vision, so our criteria will suffice.
on (Sakaguchi et al., 2017; Heigold et al., 2018;
Belinkov and Bisk, 2018). Gao et al. (2018)
defined scoring functions to identify tokens to
modify. Their functions do not require access to
model internals, but they do require the model
prediction score. After identifying the important
tokens, they modify characters with common edit operations.
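A character-level black-box perturbation of this kind can be as simple as swapping adjacent characters, as in this sketch in the spirit of Belinkov and Bisk (2018); the choice to leave the first and last characters intact is an illustrative constraint here, not a requirement from the cited work:

```python
import random

def swap_adjacent(word, i):
    # Swap the characters at positions i and i+1 (a common typo pattern).
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def noisy(word, rng):
    if len(word) < 4:                      # leave short words intact
        return word
    i = rng.randrange(1, len(word) - 2)    # keep first/last characters
    return swap_adjacent(word, i)

rng = random.Random(0)
perturbed = noisy("translation", rng)      # e.g., "trnaslation"
```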
Zhao et al. (2018c) used generative adversarial networks (GANs) (Goodfellow et al., 2014) to minimize the distance between latent representations of input and adversarial examples, and performed perturbations in latent space. Since the latent representations do not need to come from the attacked model, this is a black-box attack.
Finally, Alzantot et al. (2018) developed an
interesting population-based genetic algorithm
for crafting adversarial examples for text clas-
sification by maintaining a population of mod-
ifications of the original sentence and evaluating
fitness of modifications at each generation. They
do not require access to model parameters, but do
use prediction scores. A similar idea was proposed
by Kuleshov et al. (2018).
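The population-based search can be sketched as follows; the synonym table and fitness function are invented stand-ins for the word-substitution candidates and the model prediction scores used in the actual work:

```python
import random

synonyms = {"good": ["great", "fine"], "movie": ["film", "picture"]}

def mutate(sentence, rng):
    # Substitute one randomly chosen word with a synonym, if any exists.
    words = sentence.split()
    i = rng.randrange(len(words))
    if words[i] in synonyms:
        words[i] = rng.choice(synonyms[words[i]])
    return " ".join(words)

def attack(sentence, fitness, rng, pop_size=8, generations=5):
    # Keep the fittest half each generation and refill with mutations.
    population = [mutate(sentence, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

# Invented fitness: the "attack" succeeds once "good" has been replaced.
fitness = lambda s: 0.0 if "good" in s.split() else 1.0
best = attack("good movie", fitness, random.Random(1))
```

Only prediction scores (here, `fitness`) are queried, which is what makes the approach black-box apart from those scores.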
5.2 Attack Specificity
Adversarial attacks can be classified into targeted vs. non-targeted attacks (Yuan et al., 2017). A targeted attack specifies a specific false class, l′, while a nontargeted attack cares only that the predicted class is wrong, l′ ≠ l. Targeted attacks
are more difficult to generate, as they typically
require knowledge of model parameters; that is,
they are white-box attacks. This might explain
why the majority of adversarial examples in NLP
are nontargeted (see Table SM3). A few targeted
attacks include Liang et al. (2018), which specified
a desired class to fool a text classifier, and Chen
et al. (2018a), which specified words or captions to generate in an image captioning model. Others targeted specific words to omit, replace, or include when attacking seq2seq models (Cheng et al., 2018; Ebrahimi et al., 2018a).
Methods for generating targeted attacks in
NLP could possibly take more inspiration from
adversarial attacks in other fields. For example, in attacking malware detection systems, several studies developed targeted attacks in a black-box scenario (Yuan et al., 2017). A black-box targeted attack for MT was proposed by Zhao et al. (2018c), who used GANs to search for
attacks on Google’s MT system after mapping
sentences into continuous space with adversarially
regularized autoencoders (Zhao et al., 2018b).
5.3 Linguistic Unit
Most of the work on adversarial text examples
involves modifications at the character- 和/或
word-level; see Table SM3 for specific references.
Other transformations include adding sentences
or text chunks (Jia and Liang, 2017) or gener-
ating paraphrases with desired syntactic structures
(Iyyer et al., 2018). In image captioning, Chen et al. (2018a) modified pixels in the input image
to generate targeted attacks on the caption text.
5.4 Tasks
In general, most work on adversarial examples in NLP concentrates on relatively high-level language understanding tasks, such as text classification (including sentiment analysis) and
reading comprehension, while work on text gen-
eration focuses mainly on MT. See Table SM3
for references. There is relatively little work on
adversarial examples for more low-level language
processing tasks, although one can mention
morphological tagging (Heigold et al., 2018) 和
spelling correction (Sakaguchi et al., 2017).
5.5 Coherence and Perturbation
Measurement
In adversarial image examples, it is quite straightforward to measure the perturbation, either by measuring distance in pixel space, say ||x − x′|| under some norm, or with alternative
measures that are better correlated with human
perception (Rozsa et al., 2016). It is also visually
compelling to present an adversarial image with
imperceptible difference from its source image.
In the text domain, measuring distance is not as straightforward, and even small changes to the text may be perceptible by humans. Thus, evaluation
of attacks is fairly tricky. Some studies imposed
constraints on adversarial examples to have a
small number of edit operations (Gao et al., 2018).
Others ensured syntactic or semantic coherence in
different ways, such as filtering replacements by
word similarity or sentence similarity (Alzantot et al., 2018; Kuleshov et al., 2018), or by using synonyms and other word lists (Samanta and Mehta, 2017; Yang et al., 2018). Some reported whether a human can classify the adversarial example correctly (Yang et al., 2018), but this does not indicate how perceptible
the changes are. More informative human stud-
ies evaluate grammaticality or similarity of the
adversarial examples to the original ones (Zhao et al., 2018c; Alzantot et al., 2018). Given the
inherent difficulty in generating imperceptible
changes in text, more such evaluations are needed.
6 Explaining Predictions
Explaining specific predictions is recognized as a desideratum in interpretability work (Lipton,
2016), argued to increase the accountability of
machine learning systems (Doshi-Velez et al.,
2017). However, explaining why a deep, highly
non-linear neural network makes a certain pre-
diction is not trivial. One solution is to ask the
model to generate explanations along with its
primary prediction (Zaidan et al., 2007; Zhang et al., 2016),15 but this approach requires manual
annotations of explanations, which may be hard
to collect.
An alternative approach is to use parts of the
input as explanations. 例如, Lei et al.
(2016) defined a generator that learns a distri-
bution over text fragments as candidate ratio-
nales for justifying predictions, evaluated on
sentiment analysis. Alvarez-Melis and Jaakkola
(2017) discovered input–output associations in
a sequence-to-sequence learning scenario, 经过
perturbing the input and finding the most relevant
associations. Gupta and Schütze (2018) inspected
how information is accumulated in RNNs towards
a prediction, and associated peaks in prediction
scores with important input segments. As these
methods use input segments to explain predictions,
they do not shed much light on the internal
computations that take place in the network.
At present, despite the recognized importance of interpretability, our ability to explain pre-
dictions of neural networks in NLP is still limited.
7 Other Methods
We briefly mention here several analysis methods
that do not fall neatly into the previous sections.
A number of studies evaluated the effect
of erasing or masking certain neural network
成分, such as word embedding dimensions,
hidden units, or even full words (Li et al., 2016b;
15Other work considered learning textual-visual expla-
nations from multimodal annotations (Park et al., 2018).
Feng et al., 2018; Khandelwal et al., 2018;
Bau et al., 2018). For example, Li et al. (2016b) erased specific dimensions in word
embeddings or hidden states and computed the
change in probability assigned to different labels.
Their experiments revealed interesting differences
between word embedding models, where in some
models information is more focused in individual dimensions. They also found that information is
more distributed in hidden layers than in the input
layer, and erased entire words to find important
words in a sentiment analysis task.
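Erasure analysis of this kind reduces to re-scoring the input with one component zeroed out; here is a sketch with an invented linear scorer standing in for the network:

```python
def score(x, w):
    # Stand-in for the model's score of its predicted label.
    return sum(wi * xi for wi, xi in zip(w, x))

def erasure_importance(x, w):
    # Importance of dimension i = drop in score when x[i] is zeroed.
    base = score(x, w)
    importances = []
    for i in range(len(x)):
        erased = list(x)
        erased[i] = 0.0
        importances.append(base - score(erased, w))
    return importances

x = [1.0, 2.0, 0.5]
w = [0.1, 1.0, -2.0]
imp = erasure_importance(x, w)   # approximately [0.1, 2.0, -1.0]
# dimension 1 has the largest absolute effect on the score
```

The same loop applies unchanged whether the erased components are embedding dimensions, hidden units, or whole input words.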
Several studies conducted behavioral experi-
ments to interpret word embeddings by defining
intrusion tasks, where humans need to identify
an intruder word, chosen based on difference
in word embedding dimensions (Murphy et al.,
2012; Fyshe et al., 2015; Faruqui et al., 2015).16
In this kind of work, a word embedding model
may be deemed more interpretable if humans are
better able to identify the intruding words. Since
the evaluation is costly for high-dimensional rep-
resentations, alternative automatic metrics were
considered (Park et al., 2017; Senel et al., 2018).
A long tradition in work on neural networks
is to evaluate and analyze their ability to learn
different formal languages (Das et al., 1992; Casey, 1996; Gers and Schmidhuber, 2001; Bodén and Wiles, 2002; Chalup and Blair, 2003). This trend continues today, with research into modern architectures and what formal languages they can learn (Weiss et al., 2018; Bernardy, 2018; Suzgun et al., 2019), or the formal properties they possess (Chen et al., 2018b).
8 Conclusion
Analyzing neural networks has become a hot topic
in NLP research. This survey attempted to review
and summarize as much of the current research as possible, while organizing it along several
prominent themes. We have emphasized aspects
in analysis that are specific to language—namely,
what linguistic information is captured in neural
networks, which phenomena they are successful at capturing, and where they fail. Many of the analysis methods are general techniques from the larger machine learning community, such as
16The methodology follows earlier work on evaluating the
interpretability of probabilistic topic models with intrusion
tasks (Chang et al., 2009).
visualization via saliency measures or evaluation
by adversarial examples. But even those some-
times require non-trivial adaptations to work with
text input. Some methods are more specific to
the field, but may prove useful in other domains.
Challenge sets or test suites are such a case.
Throughout this survey, we have identified
several limitations or gaps in current analysis
工作:
• The use of auxiliary classification tasks
for identifying which linguistic properties
neural networks capture has become standard
practice (Section 2), while lacking both a theoretical foundation and a better empirical consideration of the link between the auxiliary tasks and the original task.
• Evaluation of analysis work is often limited
or qualitative, especially in visualization
techniques (Section 3). Newer forms of eval-
uation are needed for determining the suc-
cess of different methods.
• Relatively little work has been done on
explaining predictions of neural network
models, apart from providing visualizations (Section 6). With the increasing public
demand for explaining algorithmic choices
in machine learning systems (Doshi-Velez
and Kim, 2017; Doshi-Velez et al., 2017),
there is pressing need for progress in this
direction.
• Much of the analysis work is focused on the
English language, especially in constructing
challenge sets for various tasks (Section 4),
with the exception of MT due to its inherent
multilingual character. Developing resources
and evaluating methods on other languages
is important as the field grows and matures.
• More challenge sets for evaluating other tasks
besides NLI and MT are needed.
Finally, as with any survey in a rapidly evolv-
ing field, this paper is likely to omit relevant
recent work by the time of publication. While we
intend to continue updating the online appendix
with newer publications, we hope that our sum-
marization of prominent analysis work and its
categorization into several themes will be a useful
guide for scholars interested in analyzing and
understanding neural networks for NLP.
Acknowledgments
We would like to thank the anonymous review-
ers and the action editor for their very helpful
comments. This work was supported by the
Qatar Computing Research Institute. Y.B. is also
supported by the Harvard Mind, Brain, Behavior
Initiative.
References
Yossi Adi, Einat Kermany, Yonatan Belinkov,
Ofer Lavi, and Yoav Goldberg. 2017a. Anal-
ysis of sentence embedding models using
prediction tasks in natural language processing.
IBM Journal of Research and Development,
61(4):3–9.
Yossi Adi, Einat Kermany, Yonatan Belinkov,
Ofer Lavi, and Yoav Goldberg. 2017b. Fine-
Grained Analysis of Sentence Embeddings
Using Auxiliary Prediction Tasks. In Interna-
tional Conference on Learning Representations
(ICLR).
Roee Aharoni and Yoav Goldberg. 2017. Morphological Inflection Generation with Hard Monotonic Attention. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2004–2015. Association for Computational Linguistics.
Wasi Uddin Ahmad, Xueying Bai, Zhechao
Huang, Chao Jiang, Nanyun Peng, and Kai-Wei Chang. 2018. Multi-task Learning for Universal
Sentence Embeddings: A Thorough Evaluation
using Transfer and Auxiliary Tasks. arXiv
preprint arXiv:1804.07911v2.
Afra Alishahi, Marie Barking, and Grzegorz
Chrupała. 2017. Encoding of phonology in a
recurrent neural model of grounded speech. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 368–378. Association for Computational Linguistics.
Moustafa Alzantot, Yash Sharma, Ahmed
Elgohary, Bo-Jhang Ho, Mani Srivastava, and
Kai-Wei Chang. 2018. Generating Natural
Language Adversarial Examples. In Proceed-
ings of the 2018 Conference on Empirical
Methods in Natural Language Processing,
pages 2890–2896. Association for Computa-
tional Linguistics.
Leila Arras, Franziska Horn, Grégoire Montavon,
Klaus-Robert Müller, and Wojciech Samek.
2017A. ‘‘What is relevant in a text document?’’:
An interpretable machine learning approach.
PLOS ONE, 12(8):1–23.
Leila Arras, Grégoire Montavon, Klaus-Robert
Müller, and Wojciech Samek. 2017b. Explaining Recurrent Neural Network Predictions in Sentiment Analysis. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 159–168. Association for Computational Linguistics.
Mikel Artetxe, Gorka Labaka, Inigo Lopez-Gazpio, and Eneko Agirre. 2018. Uncovering Divergent Linguistic Information in Word Embeddings with Lessons for Intrinsic and Extrinsic Evaluation. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 282–291. Association for Computational Linguistics.
Malika Aubakirova and Mohit Bansal. 2016. Interpreting Neural Networks to Improve Politeness Comprehension. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2035–2041. Association for Computational Linguistics.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua
Bengio. 2014. Neural Machine Translation by
Jointly Learning to Align and Translate. arXiv
preprint arXiv:1409.0473v7.
David Alvarez-Melis and Tommi Jaakkola. 2017. A causal framework for explaining the predictions of black-box sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 412–421. Association for Computational Linguistics.
Anthony Bau, Yonatan Belinkov, Hassan Sajjad,
Nadir Durrani, Fahim Dalvi, and James Glass.
2018. Identifying and Controlling Important
Neurons in Neural Machine Translation. arXiv
preprint arXiv:1811.01157v1.
Rachel Bawden, Rico Sennrich, Alexandra Birch,
and Barry Haddow. 2018. Evaluating Discourse
Phenomena in Neural Machine Translation. 在
诉讼程序 2018 Conference of the
North American Chapter of the Association
for Computational Linguistics: Human Lan-
guage Technologies, 体积 1 (Long Papers),
pages 1304–1313. Association for Computa-
tional Linguistics.
Yonatan Belinkov. 2018. On Internal Language
Representations in Deep Learning: An Analy-
sis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology.
Yonatan Belinkov and Yonatan Bisk. 2018. Syn-
thetic and Natural Noise Both Break Neural
Machine Translation. In International Confer-
ence on Learning Representations (ICLR).
Yonatan Belinkov, Nadir Durrani, Fahim Dalvi,
Hassan Sajjad, and James Glass. 2017a.
What do Neural Machine Translation Models
Learn about Morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 861–872. Association for Computational Linguistics.
Yonatan Belinkov and James Glass. 2017. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2441–2451. Curran Associates, Inc.
Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad,
Nadir Durrani, Fahim Dalvi, and James Glass.
2017b. Evaluating Layers of Representation in
Neural Machine Translation on Part-of-Speech
and Semantic Tagging Tasks. In Proceedings
of the Eighth International Joint Conference
on Natural Language Processing (Volume 1: Long Papers), pages 1–10. Asian Federation of Natural Language Processing.
Jean-Philippe Bernardy. 2018. Can Recurrent
Neural Networks Learn Nested Recursion?
LiLT (Linguistic Issues in Language Technology), 16(1).
Arianna Bisazza and Clara Tump. 2018. The Lazy
Encoder: A Fine-Grained Analysis of the Role
of Morphology in Neural Machine Translation.
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2871–2876. Association for Computational Linguistics.
Terra Blevins, Omer Levy, and Luke Zettlemoyer.
2018. Deep RNNs Encode Soft Hierarchi-
cal Syntax. In Proceedings of the 56th Annual
Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 14–19. Association for Computational Linguistics.
Mikael Bod´en and Janet Wiles. 2002. On learning
context-free and context-sensitive languages.
IEEE Transactions on Neural Networks, 13(2):
491–493.
Samuel R. Bowman, Gabor Angeli, Christopher
Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642. Association for Computational Linguistics.
Elia Bruni, Gemma Boleda, Marco Baroni,
and Nam Khanh Tran. 2012. Distributional
Semantics in Technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 136–145. Association for Computational Linguistics.
Gino Brunner, Yuyi Wang, Roger Wattenhofer,
and Michael Weigelt. 2017. Natural Language
Multitasking: Analyzing and Improving Syn-
tactic Saliency of Hidden Representations. The
31st Annual Conference on Neural Information
加工 (NIPS)—Workshop on Learning
Disentangled Features: From Perception to Control.
Aljoscha Burchardt, Vivien Macketanz,
Jon
Dehdari, Georg Heigold, Jan-Thorsten Peter,
and Philip Williams. 2017. A Linguistic
Evaluation of Rule-Based, Phrase-Based, 和
Neural MT Engines. The Prague Bulletin of
Mathematical Linguistics, 108(1):159–170.
Franck Burlot and François Yvon. 2017.
Evaluating the morphological competence of
Machine Translation Systems. In Proceedings
of the Second Conference on Machine Trans-
lation, pages 43–55. Association for Compu-
tational Linguistics.
Mike Casey. 1996. The Dynamics of Discrete-
Time Computation, with Application to Re-
current Neural Networks and Finite State
Machine Extraction. Neural Computation,
8(6):1135–1178.
Daniel Cer, Mona Diab, Eneko Agirre, Inigo
Lopez-Gazpio,
and Lucia Specia. 2017.
SemEval-2017 Task 1: Semantic Textual Sim-
ilarity Multilingual and Crosslingual Focused
评估. In Proceedings of the 11th Inter-
national Workshop on Semantic Evaluation
(SemEval-2017), pages 1–14. Association for Computational Linguistics.
Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour,
and Emmanuel Dupoux. 2017. Learning weakly
supervised multimodal phoneme embeddings.
In Interspeech 2017.
Stephan K. Chalup and Alan D. Blair. 2003.
Incremental Training of First Order Recurrent
Neural Networks to Predict a Context-Sensitive
语言. Neural Networks, 16(7):955–972.
Jonathan Chang, Sean Gerrish, Chong Wang,
Jordan L. Boyd-graber, and David M. Blei. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 288–296. Curran Associates, Inc.
Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng
Yi, and Cho-Jui Hsieh. 2018a. Attacking visual
language grounding with adversarial examples:
A case study on neural image captioning. In
Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 2587–2597.
Association for Computational Linguistics.
Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu,
and Xuanjing Huang. 2015. Sentence Modeling
with Gated Recursive Neural Network. In Proc-
eedings of the 2015 Conference on Empirical
Methods in Natural Language Processing,
pages 793–798. Association for Computational
语言学.
Yining Chen, Sorcha Gilroy, Andreas Maletti,
Jonathan May, and Kevin Knight. 2018b. Recurrent Neural Networks as Weighted Language Recognizers. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2261–2271. Association for Computational Linguistics.
Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu
Chen, and Cho-Jui Hsieh. 2018. Seq2Sick:
Evaluating the Robustness of Sequence-to-
Sequence Models with Adversarial Examples.
arXiv preprint arXiv:1803.01128v1.
Grzegorz Chrupała, Lieke Gelderloos, and Afra
Alishahi. 2017. Representations of language in
a model of visually grounded speech signal.
In Proceedings of the 55th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 613–622.
Association for Computational Linguistics.
Ondřej Cífka and Ondřej Bojar. 2018. Are BLEU
and Meaning Representation in Opposition? In
Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1362–1371.
Association for Computational Linguistics.
Alexis Conneau, Germ´an Kruszewski, Guillaume
Lample, Lo¨ıc Barrault, and Marco Baroni.
2018. What you can cram into a single
$&!#* 向量: Probing sentence embeddings
for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136. Association for Computational Linguistics.
Robin Cooper, Dick Crouch, Jan van Eijck, Chris
Fox, Josef van Genabith, Jan Jaspars, Hans
Kamp, David Milward, Manfred Pinkal,
Massimo Poesio, Steve Pulman, Ted Briscoe,
Holger Maier, and Karsten Konrad. 1996. Using the framework. Technical report, The FraCaS Consortium.
Fahim Dalvi, Nadir Durrani, Hassan Sajjad,
Yonatan Belinkov, D. Anthony Bau, and James
Glass. 2019a, January. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models. In Proceedings
of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI).
Fahim Dalvi, Nadir Durrani, Hassan Sajjad,
Yonatan Belinkov, and Stephan Vogel. 2017.
Understanding and Improving Morphological
Learning in the Neural Machine Transla-
tion Decoder. In Proceedings of the Eighth
International Joint Conference on Natural Lan-
guage Processing (Volume 1: Long Papers), pages 142–151. Asian Federation of Natural Language Processing.
Fahim Dalvi, Avery Nortonsmith, D. Anthony
Bau, Yonatan Belinkov, Hassan Sajjad, Nadir
Durrani, and James Glass. 2019b, January. NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks. In Proceedings of
the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI): Demonstrations Track.
Sreerupa Das, C. Lee Giles, and Guo-Zheng
Sun. 1992. Learning Context-Free Grammars:
Capabilities and Limitations of a Recurrent
Neural Network with an External Stack
Memory. In Proceedings of The Fourteenth
Annual Conference of Cognitive Science
Society. Indiana University, page 14.
Ishita Dasgupta, Demi Guo, Andreas Stuhlm¨uller,
Samuel J. Gershman, and Noah D. Goodman.
2018. Evaluating Compositionality in Sen-
tence Embeddings. arXiv preprint arXiv:1802.04302v2.
Dhanush Dharmaretnam and Alona Fyshe.
2018. The Emergence of Semantics in Neural Network Representations of Visual Information. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 776–780. Association for Computational Linguistics.
Yanzhuo Ding, Yang Liu, Huanbo Luan,
and Maosong Sun. 2017. Visualizing and
Understanding Neural Machine Translation. In Proceedings of the 55th Annual Meeting of
the Association for Computational Linguistics
(体积 1: Long Papers), pages 1150–1159.
Association for Computational Linguistics.
Finale Doshi-Velez and Been Kim. 2017. Towards a Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608v2.
Finale Doshi-Velez, Mason Kortz, Ryan
Budish, Chris Bavitz, Sam Gershman, David O’Brien, Stuart Shieber, James Waldo, David Weinberger, and Alexandra Wood. 2017.
Accountability of AI Under the Law: The Role of Explanation. Privacy Law Scholars Conference.
Jennifer Drexler and James Glass. 2017. Analysis
of Audio-Visual Features for Unsupervised
Speech Recognition. In International Workshop
on Grounding Language Understanding.
Javid Ebrahimi, Daniel Lowd, and Dejing
Dou. 2018a. On Adversarial Examples for
Character-Level Neural Machine Translation.
In Proceedings of the 27th International Conference on Computational Linguistics,
pages 653–663. Association for Computational
Linguistics.
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018b. HotFlip: White-Box
Adversarial Examples for Text Classification.
In Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(体积 2: Short Papers), pages 31–36.
Association for Computational Linguistics.
Ali Elkahky, Kellie Webster, Daniel Andor,
and Emily Pitler. 2018. A Challenge Set
and Methods for Noun-Verb Ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2562–2572. Association for Computational Linguistics.
Zied Elloumi, Laurent Besacier, Olivier Galibert,
and Benjamin Lecouteux. 2018. Analyzing
Learned Representations of a Deep ASR
Performance Prediction Model. In Proceedings
of the 2018 EMNLP Workshop BlackboxNLP:
Analyzing and Interpreting Neural Networks
for NLP, pages 9–15. Association for Computational Linguistics.
Jeffrey L. Elman. 1989. Representation and
Structure in Connectionist Models. University of California, San Diego, Center for Research in Language.
Jeffrey L. Elman. 1990. Finding Structure in
Time. Cognitive Science, 14(2):179–211.
Jeffrey L. Elman. 1991. Distributed represen-
tations, simple recurrent networks, and gram-
matical structure. Machine Learning, 7(2–3):
195–225.
Allyson Ettinger, Ahmed Elgohary, and Philip Resnik. 2016. Probing for semantic evidence of composition by means of simple classification tasks. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 134–139. Association for Computational Linguistics.
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP.
Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. 2015. Sparse Overcomplete Word Vector Representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1491–1500. Association for Computational Linguistics.
Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of Neural Models Make Interpretations Difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728. Association for Computational Linguistics.
Lev Finkelstein, Evgeniy Gabrilovich, Yossi
Matias, Ehud Rivlin, Zach Solan, Gadi
Wolfman, and Eytan Ruppin. 2002. Placing
Search in Context: The Concept Revisited.
ACM Transactions on Information Systems,
20(1):116–131.
Robert Frank, Donald Mathis, and William
Badecker. 2013. The Acquisition of Anaphora
by Simple Recurrent Networks. Language
Acquisition, 20(3):181–227.
Cynthia Freeman, Jonathan Merriman, Abhinav
Aggarwal, Ian Beaver, and Abdullah Mueen.
2018. Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis. arXiv preprint arXiv:1808.02113v1.
Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian Murphy, and Tom M. Mitchell. 2015. A Compositional and Interpretable Semantic Space. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 32–41. Association for Computational Linguistics.
David Gaddy, Mitchell Stern, and Dan Klein. 2018. What's Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 999–1010. Association for Computational Linguistics.
J. Ganesh, Manish Gupta, and Vasudeva Varma. 2017. Interpretation of Semantic Tweet Representations. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM '17, pages 95–102, New York, NY, USA. ACM.
Ji Gao, Jack Lanchantin, Mary Lou Soffa, and
Yanjun Qi. 2018. Black-box Generation of
Adversarial Text Sequences to Evade Deep
Learning Classifiers. arXiv 预印本 arXiv:
1801.04354v5.
Lieke Gelderloos and Grzegorz Chrupała. 2016.
From phonemes to images: Levels of repre-
sentation in a recurrent neural model of visually-
grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1309–1319, Osaka, Japan. The COLING 2016 Organizing Committee.
Felix A. Gers and Jürgen Schmidhuber. 2001.
LSTM Recurrent Networks Learn Simple
Context-Free and Context-Sensitive Languages.
IEEE Transactions on Neural Networks, 12(6):
1333–1340.
Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2173–2182. Association for Computational Linguistics.
Hamidreza Ghader and Christof Monz. 2017.
What does Attention in Neural Machine
Translation Pay Attention to? In Proceedings
of the Eighth International Joint Conference
on Natural Language Processing (Volume 1:
Long Papers), pages 30–39. Asian Federation
of Natural Language Processing.
Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli.
2018. Interpreting Recurrent and Attention-
Based Neural Models: A Case Study on Nat-
ural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957. Association for Computational Linguistics.
Mario Giulianelli, Jack Harding, Florian Mohnert,
Dieuwke Hupkes, and Willem Zuidema. 2018.
Under the Hood: Using Diagnostic Classifiers to Investigate and Improve How Language Models Track Agreement Information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248. Association for Computational Linguistics.
Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655. Association for Computational Linguistics.
Fréderic Godin, Kris Demuynck, Joni Dambre,
Wesley De Neve, and Thomas Demeester.
2018. Explaining Character-Aware Neural Net-
works for Word-Level Prediction: Do They Dis-
cover Linguistic Rules? In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3275–3284. Association for Computational Linguistics.
Yoav Goldberg. 2017. Neural Network Methods for Natural Language Processing, volume 10 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi
Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio.
2014. Generative Adversarial Nets. In Advances
in Neural Information Processing Systems,
pages 2672–2680.
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing
Adversarial Examples. In International Con-
ference on Learning Representations (ICLR).
Kristina Gulordava, Piotr Bojanowski, Edouard
Grave, Tal Linzen, and Marco Baroni. 2018.
Colorless Green Recurrent Networks Dream
Hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205. Association for Computational Linguistics.
Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 12–21. Association for Computational Linguistics.
Pankaj Gupta and Hinrich Schütze. 2018. LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 154–164. Association for Computational Linguistics.
Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers),
pages 107–112. Association for Computational
Linguistics.
Catherine L. Harris. 1990. Connectionism and Cognitive Linguistics. Connection Science,
2(1–2):7–33.
David Harwath and James Glass. 2017. Learning Word-Like Units from Joint Audio-Visual Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–517. Association for Computational Linguistics.
Georg Heigold, Günter Neumann, and Josef
van Genabith. 2018. How Robust Are
Character-Based Word Embeddings in Tagging
and MT Against Wrod Scramlbing or Randdm
Nouse? In Proceedings of the 13th Conference
of The Association for Machine Translation
in the Americas (Volume 1: Research Track),
pages 68–79.
Felix Hill, Roi Reichart, and Anna Korhonen.
2015. SimLex-999: Evaluating Semantic
Models with (Genuine) Similarity Estimation.
Computational Linguistics, 41(4):665–695.
Dieuwke Hupkes, Sara Veldhoen, and Willem
Zuidema. 2018. Visualisation and ‘‘diagnostic
classifiers’’ reveal how recurrent and recursive
neural networks process hierarchical structure.
Journal of Artificial Intelligence Research,
61:907–926.
Pierre Isabelle, Colin Cherry, and George Foster.
2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2486–2496. Association for Computational Linguistics.
Pierre Isabelle and Roland Kuhn. 2018. A Challenge Set for French→English Machine Translation. arXiv preprint arXiv:1806.02725v2.
Hitoshi Isahara. 1995. JEIDA's test-sets for
quality evaluation of MT systems—technical
evaluation from the developer’s point of view.
In Proceedings of MT Summit V.
Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885. Association for Computational Linguistics.
Alon Jacovi, Oren Sar Shalom, and Yoav
Goldberg. 2018. Understanding Convolutional
Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and
Massimo Piccardi. 2018. A Shared Atten-
tion Mechanism for Interpretation of Neural
Automatic Post-Editing Systems. In Proceed-
ings of the 2nd Workshop on Neural Machine
Translation and Generation, pages 11–17.
Association for Computational Linguistics.
Robin Jia and Percy Liang. 2017. Adversarial
examples for evaluating reading comprehension
systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031. Association for Computational Linguistics.
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster,
Noam Shazeer, and Yonghui Wu. 2016.
Exploring the Limits of Language Modeling.
arXiv preprint arXiv:1602.02410v2.
Ákos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of Linguistic Form and Function in Recurrent Neural Networks. Computational Linguistics, 43(4):761–780.
Andrej Karpathy, Justin Johnson, and Fei-Fei Li.
2015. Visualizing and Understanding Recurrent
Networks. arXiv preprint arXiv:1506.02078v2.
Urvashi Khandelwal, He He, Peng Qi, and Dan
Jurafsky. 2018. Sharp Nearby, Fuzzy Far Away:
How Neural Language Models Use Context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294. Association for Computational Linguistics.
Margaret King and Kirsten Falkedal. 1990.
Using Test Suites in Evaluation of Machine
Translation Systems. In COLING 1990 Volume 2: Papers Presented to the 13th International Conference on Computational Linguistics.
Eliyahu Kiperwasser and Yoav Goldberg. 2016.
Simple and Accurate Dependency Parsing
Using Bidirectional LSTM Feature Represen-
tations. Transactions of the Association for Computational Linguistics, 4:313–327.
Sungryong Koh, Jinee Maeng, Ji-Young Lee,
Young-Sook Chae, and Key-Sun Choi. 2001. A
test suite for evaluation of English-to-Korean
machine translation systems. In MT Summit
Conference.
Arne Köhn. 2015. What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2067–2073, Lisbon, Portugal. Association for Computational Linguistics.
Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial Examples for Natural Language Classification Problems.
Brenden Lake and Marco Baroni. 2018. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2873–2882, Stockholmsmässan, Stockholm, Sweden. PMLR.
Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Herve Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP—Test Suites for Natural Language Processing. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.
Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117. Association for Computational Linguistics.
Ira Leviant and Roi Reichart. 2015. Separated by
an Un-Common Language: Towards Judgment
Language Informed Vector Space Modeling.
arXiv preprint arXiv:1508.00106v5.
Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691. Association for Computational Linguistics.
Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding Neural Networks through Representation Erasure. arXiv preprint arXiv:1612.08220v3.
Bin Liang, Hongcheng Li, Miaoqiang Su,
Pan Bian, Xirong Li, and Wenchang Shi.
2018. Deep Text Classification Can Be
Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4208–4215. International Joint Conferences on Artificial Intelligence Organization.
Tal Linzen, Emmanuel Dupoux, and Yoav
Goldberg. 2016. Assessing the Ability of
LSTMs to Learn Syntax-Sensitive Depen-
dencies. Transactions of the Association for Computational Linguistics, 4:521–535.
Zachary C. Lipton. 2016. The Mythos of Model
Interpretability. In ICML Workshop on Human
Interpretability of Machine Learning.
Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith. 2018. LSTMs Exploit Linguistic Attributes of Data. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 180–186. Association for Computational Linguistics.
Yanpei Liu, Xinyun Chen, Chang Liu, and
Dawn Song. 2017. Delving into Transferable
Adversarial Examples and Black-Box Attacks.
In International Conference on Learning
Representations (ICLR).
Thang Luong, Richard Socher, and Christopher
Manning. 2013. Better Word Representations
with Recursive Neural Networks for Mor-
phology. In Proceedings of the Seventeenth
Conference on Computational Natural Language Learning, pages 104–113. Association for Computational Linguistics.
Jean Maillard and Stephen Clark. 2018. Latent
Tree Learning with Differentiable Parsers:
Shift-Reduce Parsing and Chart Parsing. In
Proceedings of the Workshop on the Relevance
of Linguistic Structure in Neural Architectures
for NLP, pages 13–18. Association for Com-
putational Linguistics.
Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8. Association for Computational Linguistics.
R. Thomas McCoy, Robert Frank, and Tal Linzen. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.
Pramod Kaushik Mudrakarta, Ankur Taly,
Mukund Sundararajan, and Kedar Dhamdhere.
2018. Did the Model Understand the Question?
In Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.
James Mullenbach, Sarah Wiegreffe, Jon Duke,
Jimeng Sun, and Jacob Eisenstein. 2018.
Explainable Prediction of Medical Codes from
Clinical Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111. Association for Computational Linguistics.
W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs. In International Conference on Learning Representations.
Brian Murphy, Partha Talukdar, and Tom Mitchell. 2012. Learning Effective and Interpretable Semantic Models Using Non-Negative Sparse Embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.
Risto Miikkulainen and Michael G. Dyer. 1991.
Natural Language Processing with Modular PDP Networks and Distributed Lexicon. Cognitive Science, 15(3):343–399.
Tasha Nagamine, Michael L. Seltzer, and Nima
Mesgarani. 2015. Exploring How Deep Neural
Networks Form Phonemic Categories. In Interspeech 2015.
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding Hidden Memories of Recurrent Neural Networks. In IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).
Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting
and understanding deep neural networks.
Digital Signal Processing, 73:1–15.
Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2016. On the Role of Non-
linear Transformations in Deep Neural Net-
work Acoustic Models. In Interspeech 2016,
pages 803–807.
Aakanksha Naik, Abhilasha Ravichander,
Norman Sadeh, Carolyn Rose, and Graham
Neubig. 2018. Stress Test Evaluation for
Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353. Association for Computational Linguistics.
Nina Narodytska and Shiva Kasiviswanathan. 2017.
Simple Black-Box Adversarial Attacks on Deep
Neural Networks. In 2017 IEEE Conference
on Computer Vision and Pattern Recognition
Workshops (CVPRW), pages 1310–1318.
Lars Niklasson and Fredrik Linåker. 2000. Distributed representations for extended syntactic transformation. Connection Science, 12(3–4):299–314.
Tong Niu and Mohit Bansal. 2018. Adversarial
Over-Sensitivity and Over-Stability Strategies
for Dialogue Models. In Proceedings of the
22nd Conference on Computational Natural
Language Learning, pages 486–496. Associa-
tion for Computational Linguistics.
Nicolas Papernot, Patrick McDaniel, and Ian
Goodfellow. 2016. Transferability in Machine
学习: From Phenomena to Black-Box
Attacks Using Adversarial Samples. arXiv
preprint arXiv:1605.07277v1.
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 506–519, New York, NY, USA. ACM.
Nicolas Papernot, Patrick McDaniel, Ananthram
Swami, and Richard Harang. 2016. Crafting
Adversarial Input Sequences for Recurrent
Neural Networks. In Military Communications
会议, MILCOM 2016, pages 49–54.
IEEE.
Dong Huk Park, Lisa Anne Hendricks, Zeynep
Akata, Anna Rohrbach, Bernt Schiele, Trevor
Darrell, and Marcus Rohrbach. 2018. Multimodal Explanations: Justifying Decisions and
Pointing to the Evidence. In The IEEE Con-
ference on Computer Vision and Pattern
Recognition (CVPR).
Sungjoon Park, JinYeong Bak, and Alice Oh.
2017. Rotated Word Vector Representations
and Their Interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411. Association for Computational Linguistics.
Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting Contextual Word Embeddings: Architecture and Representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509. Association for Computational Linguistics.
Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81. Association for Computational Linguistics.
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191. Association for Computational Linguistics.
Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.
Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016a. Analyzing Linguistic Knowledge in Sequential Model of Sentence. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 826–835, Austin, Texas. Association for Computational Linguistics.
Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016b. Investigating Language Universal and Specific Properties in Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1478–1488, Berlin, Germany. Association for Computational Linguistics.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically Equivalent Adversarial Rules for Debugging NLP Models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865. Association for Computational Linguistics.
Matīss Rikters. 2018. Debugging Neural Machine Translations. arXiv preprint arXiv:1808.02733v1.
Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19. Association for Computational Linguistics.
Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil
Blunsom. 2016. Reasoning about Entailment
with Neural Attention. In International Con-
ference on Learning Representations (ICLR).
Andras Rozsa, Ethan M. Rudd, and Terrance E.
Boult. 2016. Adversarial Diversity and Hard
Positive Generation. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition Workshops, pages 25–32.
Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14. Association for Computational Linguistics.
D. E. Rumelhart and J. L. McClelland. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 2, chapter On Learning the Past Tenses of English Verbs, pages 216–271. MIT Press, Cambridge, MA, USA.
Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389. Association for Computational Linguistics.
Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 3281–3287. AAAI Press.
Suranjana Samanta and Sameep Mehta. 2017.
Towards Crafting Text Adversarial Samples.
arXiv preprint arXiv:1707.02812v1.
Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel.
2018. Behavior Analysis of NLI Models:
Uncovering the Influence of Three Factors
on Robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985. Association for Computational Linguistics.
Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and
Yuji Matsumoto. 2018. Interpretable Adversar-
ial Perturbation in Input Embedding Space for
Text. In Proceedings of the Twenty-Seventh
International Joint Conference on Artifi-
cial Intelligence, IJCAI-18, pages 4323–4330.
International Joint Conferences on Artificial
Intelligence Organization.
Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. 2018. Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Rico Sennrich. 2017. How Grammatical Is Character-Level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 376–382. Association for Computational Linguistics.
Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning
Jiang, and Jian Sun. 2018. Learning Visually-
Grounded Semantics from Contrastive Adver-
sarial Samples. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3715–3727. Association for Computational Linguistics.
Xing Shi, Kevin Knight, and Deniz Yuret. 2016a. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282. Association for Computational Linguistics.
Xing Shi, Inkit Padhi, and Kevin Knight. 2016b. Does String-Based Neural MT Learn Source
Syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1526–1534, Austin, Texas. Association for Computational Linguistics.
Chandan Singh, W. James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337v1.
Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018a. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. arXiv preprint arXiv:1804.09299v1.
Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018b. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676.
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia. PMLR.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.
2014. Sequence to Sequence Learning with
Neural Networks. In Advances in neural infor-
mation processing systems, pages 3104–3112.
Mirac Suzgun, Yonatan Belinkov, and Stuart M.
Shieber. 2019. On Evaluating the Generaliza-
tion of LSTM Models in Formal Languages.
In Proceedings of the Society for Computation
in Linguistics (SCiL).
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
Gongbo Tang, Rico Sennrich, and Joakim Nivre.
2018. An Analysis of Attention Mechanisms:
The Case of Word Sense Disambiguation in
Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 26–35. Association for Computational Linguistics.
Yi Tay, Anh Tuan Luu, and Siu Cheung Hui.
2018. CoupleNet: Paying Attention to Couples
with Coupled Attention for Relationship Rec-
ommendation. In Proceedings of the Twelfth
International AAAI Conference on Web and
Social Media (ICWSM).
Ke Tran, Arianna Bisazza, and Christof Monz.
2018. The Importance of Being Recurrent for
Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4731–4736. Association for Computational Linguistics.
Eva Vanmassenhove, Jinhua Du, and Andy Way.
2017. Investigating ‘‘Aspect’’ in NMT and
SMT: Translating the English Simple Past and
Present Perfect. Computational Linguistics in
the Netherlands Journal, 7:109–128.
Sara Veldhoen, Dieuwke Hupkes, and Willem
Zuidema. 2016. Diagnostic Classifiers: Reveal-
ing How Neural Networks Process Hierarchical
Structure. In CEUR Workshop Proceedings.
Elena Voita, Pavel Serdyukov, Rico Sennrich,
and Ivan Titov. 2018. Context-Aware Neural
Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274. Association for Computational Linguistics.
Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. 2016. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation. arXiv preprint arXiv:1606.04217v1.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461v1.
Shuai Wang, Yanmin Qian, and Kai Yu. 2017a.
What Does the Speaker Embedding Encode?
In Interspeech 2017, pages 1497–1501.
Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018b. A Tree-Based Decoder for Neural Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.
Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi Lee. 2017b. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries. In Interspeech 2017.
Gail Weiss, Yoav Goldberg, and Eran Yahav.
2018. On the Practical Computational Power of Finite Precision RNNs for Language Recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745. Association for Computational Linguistics.
Adina Williams, Andrew Drozdov, and Samuel R.
Bowman. 2018. Do latent tree learning models
identify meaningful structure in sentences?
Transactions of the Association for Compu-
tational Linguistics, 6:253–267.
Zhizheng Wu and Simon King. 2016. Inves-
tigating gated recurrent networks for speech
synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5140–5144. IEEE.
Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-
Ling Wang, and Michael I. Jordan. 2018.
Greedy Attack and Gumbel Attack: Generating
Adversarial Examples for Discrete Data. arXiv
preprint arXiv:1805.12316v1.
Wenpeng Yin, Hinrich Schütze, Bing Xiang, and
Bowen Zhou. 2016. ABCNN: Attention-Based
Convolutional Neural Network for Modeling
Sentence Pairs. Transactions of the Association
for Computational Linguistics, 4:259–272.
Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using ‘‘Annotator Rationales'' to Improve Machine Learning for Text Categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267. Association for Computational Linguistics.
Quanshi Zhang and Song-chun Zhu. 2018. Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.
Ye Zhang, Iain Marshall, and Byron C. Wallace. 2016. Rationale-Augmented Convolutional Neural Networks for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 795–804. Association for Computational Linguistics.
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20. Association for Computational Linguistics.
Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander Rush, and Yann LeCun. 2018b. Adversarially Regularized Autoencoders. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5902–5911, Stockholmsmässan, Stockholm, Sweden. PMLR.
Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107v3.
Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018c. Generating Natural Adversarial Examples. In International Conference on Learning Representations.