Target-Guided Structured Attention Network for
Target-Dependent Sentiment Analysis
Ji Zhang Chengyao Chen Pengfei Liu Chao He Cane Wing-Ki Leung
Wisers AI Lab, Wisers Information Limited, HKSAR, China
{jasonzhang, stacychen, chaohe, caneleung}@wisers.com, ppfliu@gmail.com
Abstract
Target-dependent sentiment analysis (TDSA)
aims to classify the sentiment of a text towards
a given target. The major challenge of this
task lies in modeling the semantic relatedness
between a target and its context sentence. This paper proposes a novel Target-Guided Structured Attention Network (TG-SAN), which
captures target-related contexts for TDSA in a
fine-to-coarse manner. Given a target and its
context sentence, the proposed TG-SAN first
identifies multiple semantic segments from the
sentence using a target-guided structured atten-
tion mechanism. It then fuses the extracted
segments based on their relatedness with the
target for sentiment classification. We present
comprehensive comparative experiments on
three benchmarks with three major findings.
First, TG-SAN outperforms the state-of-the-art by up to 1.61% and 3.58% in terms of accuracy and Macro-F1, respectively. Second, it shows a strong advantage in determining the sentiment of a target when the context sentence contains multiple semantic segments. Finally, visualization results show that the attention scores produced by TG-SAN are highly interpretable.
1 Introduction
Target-dependent sentiment analysis (TDSA) is an actively studied research topic with the aim to determine the sentiment polarity of a text towards a specific target. For example, given the sentence ''the food is so good and so popular that waiting can really be a nightmare'', the target-dependent sentiments of food and waiting are positive and negative, respectively.
The major challenge of TDSA lies in modeling
the semantic relatedness between the target and its
context sentence (Tang et al., 2016a; Chen et al., 2017). Most recent progress in this area benefits
from the attention mechanism, which captures
the relevance between the target and every other
word in the sentence. Based on such word-level
correlations, several models have already been
proposed for constructing target-related sentence
representations for sentiment prediction (Wang et al., 2016; Tang et al., 2016b; Liu and Zhang, 2017; Yang et al., 2017; Ma et al., 2017).
One important underlying assumption in exist-
ing attention-based models is that words can be
used as independent semantic units for model-
ing the context sentence when performing TDSA.
This assumption neglects the fact that a sentence
is oftentimes composed of multiple semantic seg-
ments, where each segment may contain multiple words expressing a certain meaning or sentiment collectively. Moreover, different semantic segments may even contribute differently to the sentiment of a certain target. Figure 1 shows an example of a restaurant review, which contains two salient semantic segments (highlighted in blue). Intuitively, a TDSA model should be able to identify both segments and determine that the second one is more relevant to the writer's sentiment towards the target [waiting]. Existing methods, however, would only attend to important words (highlighted in red) such as ''good'', ''popular'', ''really'', and ''nightmare'' individually under the aforementioned assumption.

Figure 1: A motivating example, where darker shades denote higher contributions to the sentiment of the target [waiting]. (a) A TDSA model should be able to identify two salient segments from the sentence, and that the second one is more important for determining the target's sentiment. (b) Existing attention-based models would attend to important words individually and fail to determine their relatedness with the target.

We hypothesize that the ability to uncover mul-
tiple semantic segments and their relatedness
with the target from a context sentence will be
beneficial for TDSA. In this light, we propose a
fine-to-coarse TDSA framework, namely, the Target-
Guided Structured Attention Network (TG-SAN)
in this paper. The core components of TG-SAN include a Structured Context Extraction Unit (SCU) and a Context Fusion Unit (CFU). As opposed to using word-level attention, the SCU utilizes a target-guided structured attention mechanism to encode multiple semantic segments of a sentence as a structured embedding matrix, where each vector in the matrix can be viewed as one target-related context. The CFU then fuses the extracted contexts based on their relatedness with the target to construct the ultimate context representation of the target for sentiment classification.

Our contributions are summarized as follows:

(1) We propose to uncover multiple semantic segments and their relatedness with the target in a sentence for TDSA.

(2) We devise a novel TG-SAN, which uses a fine-to-coarse framework to produce the context representation of the target. TG-SAN utilizes a target-guided structured attention mechanism to encode a sentence as an r-row matrix, where each vector can be viewed as one target-related context. The matrix is further fused into a single context vector by leveraging the contexts' relatedness with the target for sentiment classification.

(3) We empirically demonstrate that TG-SAN outperforms a variety of baselines and the state-of-the-art on three benchmarks, and that it is effective in handling sentences composed of multiple semantic segments. We also present visualization results to reveal the superior explanatory power of the proposed model.
2 Related Work
Given a target and its context sentence, the major
challenge of TDSA lies in identifying target-
related contexts in the sentence for determining
the target’s sentiment. Early work adopted rule-
based methods or statistical methods to solve this
problem (Ding et al., 2008; Zhao et al., 2010;
Jiang et al., 2011). These methods relied either on
handcrafted features, 规则, or sentiment lexicons,
all of which required massive manual efforts.
最近几年, neural networks have achieved
great success in various fields for their strong rep-
resentation capability. They have also been proven
effective in modeling the relatedness between the
target and its contexts. Recursive neural networks
were first used by Dong et al. (2014) and Nguyen
and Shirai (2015) for TDSA. Specifically, the target was first converted into the root node of a parsing tree, and then its contexts were composed based on syntactic relations in the tree. As such
approaches rely strongly on dependency parsing,
they fall short when analyzing nonstandard texts
such as comments and tweets, which are com-
monly used for sentiment analysis.
Another line of work applied recurrent neural
networks (RNNs) and their extensions to TDSA for their natural way of encoding sentences in a sequential fashion. For example, Tang et al. (2016a)
utilized two RNNs to individually capture the
left and the right contexts of the target, 进而
combined the two contexts for sentiment predic-
tion. Zhang et al. (2016) elaborated on this idea by using a gate to leverage the contributions of the two contexts for sentiment prediction. However,
such RNN-based methods place more emphasis on
the words near the target while ignoring the distant
那些, regardless of whether they are target-related.
Recently, attention mechanisms have become
widely used for modeling the relatedness between
every context word and the target for TDSA
(Wang et al., 2016; Yang et al., 2017; Liu and Zhang, 2017; Ma et al., 2017). For example, Yang et al. (2017) assigned attention scores to each con-
text word according to their relevance to the tar-
get, and combined all context words with their
attention scores to constitute the context represen-
tation of the target for sentiment classification.
The aforementioned attention-based methods
used a single attention layer to capture target-
related contexts. One drawback of this has been
recently examined by Chen et al. (2017) and Li
et al. (2018). They argued that using one layer of attention to attend to all context words may introduce noise and degrade classification accuracy. To
alleviate this problem, Chen et al. (2017) proposed
refining the attended words in an iterative manner,
whereas Li et al. (2018) used a convolutional
neural network to extract n-gram features whose
contributions were decided by their relative posi-
tions to the target in the context sentence.
To the best of our knowledge, no existing study
has explicitly considered uncovering a sentence’s
semantic segments and learning their contribu-
tions to a target’s sentiment. We address this prob-
lem with a novel target-guided structured attention
network in this work.
3 Approach
We first mathematically formulate the TDSA
problem addressed in this paper, and then describe
the proposed TG-SAN. 数字 2 depicts the
architecture of TG-SAN.

Figure 2: Graphical illustration of TG-SAN. The Memory Builder (Section 3.2) takes a sequence of dense word vectors X = {x1, . . . , xi, . . . , xL} as input, and obtains the contextualized word representations H = {h1, . . . , hi, . . . , hL} via a Bi-LSTM. H is then split into a context memory Mc and a target memory Mt based on the positions of the target. The SCU (Section 3.3) applies a self-attentive operation on the target memory to obtain a structured target representation Rt, which is used to guide the extraction of r target-related segments Rc from the context memory through a structured attention mechanism. The CFU (Section 3.4) generates the target vector rt through a self-attentive operation on Rt, and then learns the contribution of each context to obtain the ultimate context vector rc. Finally, the Output Layer (Section 3.5) composes the context vector and the target vector for predicting the target's sentiment.

3.1 Problem Formulation
A sentence is a sequence of words S = {w1, . . . , wi, . . . , wL}, where wi is the one-hot representation of a word and L is the length of the sequence. Given a target, the positions of its mentions in S are denoted by T = {i_j^1, . . . , i_j^l}_{j=1}^{m}, where l is the number of word tokens in the target and m is the number of times the target appears in S. Lt = l ∗ m
is therefore the total number of word tokens of
the target in the sentence. Note that by allowing
m ≥ 1, our problem formulation explicitly
models the situation where the target has multiple
mentions in a sentence, whereas existing attention-
based TDSA models only addressed a single
mention situation (m = 1).
Given a context sentence S and a target’s
mentions indexed by T , our task is to predict
the sentiment polarity y ∈ O of the target, where O = {−1, 0, 1} denotes negative, neutral, and positive sentiments, respectively.
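
As a concrete illustration (an invented example, not taken from the experimental datasets): for a sentence of length L = 12 in which the two-word target ''battery life'' (l = 2) is mentioned twice (m = 2), T contains the position indices of both mentions, Lt = l ∗ m = 4, and the remaining Lc = L − Lt = 8 tokens constitute the context.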
3.2 Memory Builder
The Memory Builder constructs the target memory
and the context memory from the input sentence as
follows. A lookup table E ∈ R^{de×|V|} is first built
to represent the semantics of each word by word
vectors, where de is the dimension of the word
vectors and |V | is the vocabulary size. The one-
hot representation of the word sequence S is then
converted into a sequence of dense word vectors
X = {x1, . . . , xi, . . . , xL}, where xi = Ewi.
A Bi-LSTM layer is placed on top of the
word vectors to obtain their contextualized word
representations. The output of this Bi-LSTM layer is a sequence H = {h1, . . . , hi, . . . , hL}, where each hidden state hi ∈ R^{2dh} is built by concatenating the outputs of two LSTMs, →hi and ←hi:

    →hi = →LSTM(xi, →h_{i−1}) ∈ R^{dh}    (1)
    ←hi = ←LSTM(xi, ←h_{i−1}) ∈ R^{dh}    (2)
    hi = [→hi; ←hi] ∈ R^{2dh}    (3)

where dh denotes the dimension of each hidden state.
The sequence H ∈ R^{L×2dh} is further split into a target memory Mt and a context memory Mc according to the positions of target mentions T . Mt ∈ R^{Lt×2dh} consists of the representations of the target words, while Mc ∈ R^{Lc×2dh} consists of those of the context words, where Lc = L − Lt.
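
To make this construction concrete, the following PyTorch-style sketch (our own minimal illustration, not the authors' released code) builds the two memories; the class and argument names, and the use of a boolean mask to encode the target positions T , are assumptions:

    import torch
    import torch.nn as nn

    class MemoryBuilder(nn.Module):
        def __init__(self, vocab_size, de=300, dh=150):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, de)   # lookup table E (GloVe initialization omitted here)
            self.bilstm = nn.LSTM(de, dh, bidirectional=True, batch_first=True)

        def forward(self, word_ids, target_mask):
            # word_ids: (L,) token indices of the sentence S
            # target_mask: (L,) boolean tensor, True at the positions of target mentions (the set T)
            x = self.embedding(word_ids).unsqueeze(0)       # X, shape (1, L, de)
            h, _ = self.bilstm(x)                           # H, shape (1, L, 2*dh)
            h = h.squeeze(0)
            m_t = h[target_mask]                            # target memory Mt, shape (Lt, 2*dh)
            m_c = h[~target_mask]                           # context memory Mc, shape (Lc, 2*dh)
            return m_t, m_c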
3.3 Structured Context Extraction
Unit (SCU)
Given the target memory and the context memory,
the next step is to extract the target-related segments, which may appear in different parts of the context sentence. Recently, Lin et al. (2017) proposed a structured self-attention mechanism, which represents a sentence as multiple semantic segments, and applied such a mechanism successfully to document-level sentiment analysis. In TDSA, however, not all semantic segments are related to the target. We therefore build on the idea of Lin et al. (2017) to devise an SCU, which is able to capture target-related segments as the contexts for determining the target's sentiment.
Structured target representation. The target memory Mt is converted into a structured representation using the self-attentive operation (Lin et al., 2017) as follows:

    At = softmax(W2_t tanh(W1_t Mt^T))    (4)
    Rt = At Mt    (5)

where At ∈ R^{r×Lt} is a weight matrix and Rt ∈ R^{r×2dh} is the embedding matrix representing the target. W1_t and W2_t are two parameters of the self-attentive layer. r is a hyper-parameter referring to the number of rows in the target matrix. In other words, r represents the number of structured representations transformed from the target memory Mt.
Following Lin et al. (2017), a penalization term P is used in the loss function to encourage the diversity of the rows captured in Rt:

    P = ||At At^T − I||_F^2    (6)
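
A minimal tensor-level sketch of Equations (4)–(6) is given below. It is illustrative only: the intermediate attention width da and the function names are our assumptions, and parameters are passed in explicitly rather than held in a module:

    import torch
    import torch.nn.functional as F

    def structured_target(m_t, W1_t, W2_t):
        # m_t: (Lt, 2*dh) target memory; W1_t: (da, 2*dh); W2_t: (r, da)
        a_t = F.softmax(W2_t @ torch.tanh(W1_t @ m_t.T), dim=-1)   # At, Eq. (4), shape (r, Lt)
        r_t = a_t @ m_t                                            # Rt, Eq. (5), shape (r, 2*dh)
        return a_t, r_t

    def penalization(a_t):
        # Frobenius-norm penalty of Eq. (6), encouraging diverse rows in Rt
        eye = torch.eye(a_t.size(0), device=a_t.device)
        return torch.norm(a_t @ a_t.T - eye, p='fro') ** 2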
Target-guided context extraction. Given the target matrix Rt, target-related semantic segments are uncovered from the context memory Mc as follows. A matrix Ãc ∈ R^{r×Lc} is first built to capture the relatedness between the target matrix and the context memory using a bilinear attention operation. It is then used to build a context matrix R̃c ∈ R^{r×2dh}, where each row in the matrix can be viewed as a target-related semantic segment:

    Ãc = softmax(Rt Wc Mc^T)    (7)
    R̃c = Ãc Mc    (8)

where Wc is the parameter of the bilinear attention operation.
A feed-forward network is further placed on top of the context matrix R̃c to produce its transformed representation R̂c. A residual connection (He et al., 2016) is then used to compose both matrices to obtain the final structured context representation Rc ∈ R^{r×2dh}:

    R̂c = ReLU(R̃c W1_s + b1_s) W2_s + b2_s    (9)
    Rc = LayerNorm(R̃c + R̂c)    (10)

where W1_s, W2_s, b1_s, and b2_s are learnable parameters of the feed-forward network. The layer normalization (Ba et al., 2016) used in Equation (10) helps to prevent gradient vanishing and exploding.
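
The target-guided extraction of Equations (7)–(10) can be sketched in the same style; the inner width of the feed-forward layer (d_ff) and the exact parameter shapes are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def extract_contexts(r_t, m_c, W_c, W1_s, b1_s, W2_s, b2_s):
        # r_t: (r, 2*dh) structured target Rt; m_c: (Lc, 2*dh) context memory Mc
        # W_c: (2*dh, 2*dh); W1_s: (2*dh, d_ff); W2_s: (d_ff, 2*dh)
        a_c = F.softmax(r_t @ W_c @ m_c.T, dim=-1)                     # Eq. (7), shape (r, Lc)
        r_c_tilde = a_c @ m_c                                          # Eq. (8), shape (r, 2*dh)
        r_c_hat = F.relu(r_c_tilde @ W1_s + b1_s) @ W2_s + b2_s        # Eq. (9)
        r_c = F.layer_norm(r_c_tilde + r_c_hat, r_c_tilde.shape[-1:])  # Eq. (10)
        return a_c, r_c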
3.4 Context Fusion Unit (CFU)
The CFU learns the contributions of the different extracted contexts to the target's sentiment, and
produces the ultimate context vector of the target.
Specifically, a self-attentive operation is utilized to fuse the target matrix Rt into a target vector rt:

    at = softmax(w2_m tanh(W1_m Rt^T))    (11)
    rt = at Rt    (12)

where w2_m and W1_m are learnable parameters.
Given the target vector rt, the contribution of each context is then learned to produce the ultimate context vector rc ∈ R^{2dh}:

    rc = Σ_{i=1}^{r} αi Rc[i]    (13)
    αi = exp(βi) / Σ_{j=1}^{r} exp(βj)    (14)
    βi = Rc[i] U rt^T    (15)

where U is a weight matrix, Rc[i] ∈ R^{2dh} represents the i-th target-related context, and αi denotes its normalized contribution score.
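
A corresponding sketch of the fusion step in Equations (11)–(15) follows, again with parameter shapes chosen by us to be consistent with the dimensions stated above:

    import torch
    import torch.nn.functional as F

    def fuse_contexts(R_t, R_c, w2_m, W1_m, U):
        # R_t, R_c: (r, 2*dh); w2_m: (da,); W1_m: (da, 2*dh); U: (2*dh, 2*dh)
        a_t = F.softmax(w2_m @ torch.tanh(W1_m @ R_t.T), dim=-1)   # Eq. (11), shape (r,)
        r_t = a_t @ R_t                                            # target vector rt, Eq. (12)
        beta = R_c @ U @ r_t                                       # Eq. (15), shape (r,)
        alpha = F.softmax(beta, dim=-1)                            # Eq. (14)
        r_c = alpha @ R_c                                          # ultimate context vector rc, Eq. (13)
        return r_t, r_c, alpha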
3.5 Output Layer and Model Training
Consider the examples (a) ''It takes a long time to boot up'', and (b) ''The battery life is long''. Although both targets (in italics) have similar contexts, their sentiment orientations are totally different. It is therefore necessary to consider
the target itself along with its contexts to predict
its sentiment.
In the output layer, the context vector rc and the
target vector rt are concatenated, and transformed
via a non-linear function. The transformed vector
is further used in conjunction with rc to build the
final feature vector rct:

    rct = rc + f(Wf [rc; rt])    (16)
where f (·) denotes a non-linear activation
function, and the ReLU function is adopted in this
paper. A softmax layer is then applied to convert
the feature vector into a probability distribution:
    q(y|rct) = softmax(Wq rct + bq)    (17)
where Wq ∈ R^{|O|×2dh} and bq ∈ R^{|O|} are parameters of the softmax layer.
For a number of D training instances, the cross-entropy loss with an L2 regularization term is adopted as the loss function:

    L = − Σ_{i=1}^{D} yi log(qi) + λ1 Σ_i Pi + (λ2/2) ||θ||_2^2    (18)
where yi is the true sentiment label, qi is the
predicted probability of the true label, θ is the
set of parameters of TG-SAN, λ1 and λ2 are reg-
ularization coefficients, and Pi is the penaliza-
tion term for the i-th training instance (see Equation (6)).
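
For completeness, Equations (16)–(18) can be sketched for a single training instance as follows (batching and the accumulation of the per-instance penalty Pi over the dataset are omitted; parameter shapes and the function name are assumptions):

    import torch
    import torch.nn.functional as F

    def output_and_loss(r_c, r_t, W_f, W_q, b_q, y_true, penalty, params, lam1=0.1, lam2=1e-6):
        # r_c, r_t: (2*dh,); W_f: (2*dh, 4*dh); W_q: (num_classes, 2*dh); y_true: scalar LongTensor
        r_ct = r_c + F.relu(W_f @ torch.cat([r_c, r_t]))            # Eq. (16)
        logits = W_q @ r_ct + b_q                                   # Eq. (17); softmax is applied inside the loss
        ce = F.cross_entropy(logits.unsqueeze(0), y_true.view(1))   # cross-entropy term of Eq. (18)
        l2 = sum((p ** 2).sum() for p in params)                    # L2 regularization over parameters θ
        return ce + lam1 * penalty + (lam2 / 2) * l2                # Eq. (18)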
4 Experiments
4.1 Experimental Setup
Datasets
We evaluate the proposed TG-SAN on three public
benchmark datasets, namely, Tweet, Laptop, and Restaurant. The Tweet dataset contains tweets collected from Twitter (Dong et al., 2014). The Laptop and Restaurant datasets are from the SemEval 2014 challenge (Pontiki et al., 2014), containing customer reviews on laptops and restaurants, respectively. We discarded data in-
stances labeled as ‘‘Conflict’’ in the Laptop and
Restaurant datasets following previous studies.
Table 1 summarizes the statistics of the datasets.

                 Tweet               Laptop              Restaurant
                 training  testing   training  testing   training  testing
    # Positive   1561      173       979       340       2158      728
    # Negative   1560      173       858       128       800       194
    # Neutral    3127      346       454       171       631       196

Table 1: Statistics of the experimental datasets.

We use classification accuracy and macro-F1
as evaluation metrics in all experiments.
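
Both metrics can be computed with standard tooling; a minimal sketch, assuming scikit-learn is available, is:

    from sklearn.metrics import accuracy_score, f1_score

    def evaluate(y_true, y_pred):
        # y_true, y_pred: lists of sentiment labels in {-1, 0, 1}
        return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average='macro')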
Compared Models
To demonstrate the ability of the proposed model,
we compare it with three baseline approaches, four
attention-based models, and the state-of-the-art.
SVM (Kiritchenko et al., 2014): This was a top-
performing system in SemEval 2014. It utilized
various types of handcrafted features to build a
SVM classifier.
AdaRNN (Dong et al., 2014): This utilized
a recursive neural network based on dependency
tree structure to iteratively compose target-related
contexts from a sentence for sentiment classification.
TD-LSTM (Tang et al., 2016a): This employed two LSTMs to separately model the left and the right contexts of a given target, and concatenated their last hidden states to predict the target's sentiment.
ATAE-LSTM (Wang et al., 2016): This used an LSTM layer to model a sentence, and used an at-
tention layer to produce a weighted representation
of the sentence with respect to a given target.
IAN (Ma et al., 2017): This used two LSTMs to
separately model the sequence of target words and
that of context words in a sentence. It then applied
an interactive attention mechanism to capture the
relatedness between the target and its context for
sentiment classification.
MemNet (Tang et al., 2016b): This applied
multiple hops of attention on the word embeddings
of the context sentence, and treated the output of
the last hop as the final representation of the target.
RAM (Chen et al., 2017): This proposed a
recurrent neural attention mechanism to iteratively
refine the context representation, and took the
combination of all constructed contexts as the
final representation for sentiment classification.
TNet (Li et al., 2018): It is the state-of-the-
art in target-dependent sentiment analysis. It first
transformed words considering their positions
with respect to the target, and used a convolutional
neural network to extract n-gram features from the
context sentence for sentiment classification. Note
that the published results of TNet were based
on the authors’ implementation with a bug in
data preprocessing.1 We fixed the identified bug,
retrained the TNet model with the parameters
suggested in the work of Li et al. (2018), and
reported the revised results in this paper for
empirical comparison.

1 https://github.com/lixin4ever/TNet/issues/4.
Experimental Settings
As no standard validation set is available for the
benchmark datasets, we randomly held out 20%
of the training set as the validation set for tun-
ing the hyper-parameters of TG-SAN. Settings
producing the highest validation accuracy are
listed in Table 2, and are adopted in the subsequent
experiments unless otherwise specified.

    Parameter                               Value
    Word embedding dimension de             300
    LSTM hidden dimension dh                150
    Dropout rate                            0.5
    No. of structured representations r     2
    Penalization term coefficient λ1        0.1
    Regularization term coefficient λ2      10^-6
    Batch size                              64

Table 2: Hyper-parameter settings of TG-SAN.

We initialized the embedding layer of TG-
SAN with the pre-trained 300-dimensional GloVe
vectors (Pennington et al., 2014), and fixed the
word vectors during the training process. 这
recurrent weight matrices were initialized with
random orthogonal matrices. All other weight ma-
trices were initialized by randomly sampling from
the uniform distribution U(−0.01, 0.01). All bias
vectors were initialized to zero. RMSProp was
used for network training by setting the learning
rate as 0.001 and the decay rate as 0.9. Dropout
(Srivastava et al., 2014) and early stopping were
adopted to alleviate overfitting. Dropout was
applied on the inputs of the Bi-LSTM layer and the
output layer with the same dropout rate shown in
桌子 2.
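
The initialization and optimization choices above can be expressed as the following sketch; the parameter-name checks (e.g., 'embedding', 'bilstm') are assumptions about how the modules might be named and are not taken from the paper:

    import torch
    import torch.nn as nn

    def configure_training(model):
        for name, p in model.named_parameters():
            if 'embedding' in name:
                p.requires_grad = False            # pre-trained GloVe vectors are kept fixed
            elif 'bilstm.weight' in name:
                nn.init.orthogonal_(p)             # recurrent weights: random orthogonal matrices
            elif 'bias' in name:
                nn.init.zeros_(p)                  # all bias vectors set to zero
            elif p.dim() > 1:
                nn.init.uniform_(p, -0.01, 0.01)   # other weights: U(-0.01, 0.01)
        # RMSProp with learning rate 0.001 and decay rate 0.9
        return torch.optim.RMSprop([p for p in model.parameters() if p.requires_grad],
                                   lr=0.001, alpha=0.9)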
4.2 Main Results
We report the experimental results of TG-SAN
(r = 2) and the compared models in Table 3.

                            Tweet                Laptop               Restaurant
    Models                  Accuracy  Macro-F1   Accuracy  Macro-F1   Accuracy  Macro-F1
    Baselines
      SVM (2014)            0.6340♯   0.6330♯    0.7049∗   -          0.8016∗   -
      AdaRNN (2014)         0.6630∗   0.6590∗    -         -          -         -
      TD-LSTM (2016a)       0.6662♯   0.6401♯    0.7183♯   0.6843♯    0.7800♯   0.6673♯
    Attention-based
      ATAE-LSTM (2016)      -         -          0.6870∗   -          0.7720∗   -
      IAN (2017)            -         -          0.7210∗   -          0.7860∗   -
      MemNet (2016b)        0.6850♯   0.6691♯    0.7033♯   0.6409♯    0.7816♯   0.6583♯
      RAM (2017)            0.6936∗   0.6730∗    0.7449∗   0.7135∗    0.8023∗   0.7080∗
    State-of-the-art
      TNet (2018)           0.7327    0.7132     0.7465    0.6985     0.8005    0.6901
    Proposed Model
      TG-SAN                0.7471    0.7365     0.7527    0.7118     0.8166    0.7259
    Ablations
      w/o CFU               0.7312    0.7141     0.7465    0.7042     0.8095    0.7189
      w/o SCU & CFU         0.7153    0.6975     0.7058    0.6559     0.8023    0.6960
      w/o TG                0.7269    0.7093     0.7324    0.6923     0.8131    0.6986

Table 3: Comparison of Accuracy and Macro-F1 among different models. Results marked with ♯ are adopted from Chen et al. (2017), and those with ∗ are adopted from the original papers. Performance improvements of the proposed TG-SAN model over the state-of-the-art, TNet (Li et al., 2018), are statistically significant at p < 0.01.

In summary, TG-SAN outperforms all compared
models on the Tweet and the Restaurant datasets.
On the Laptop dataset, it also achieves the best
accuracy among all models, and macro-F1 com-
parable to the best-performing model, RAM (Chen et al., 2017). Such results demonstrate the efficacy
of the proposed TG-SAN. We also observe that
the attention-based models perform better than the
baseline models in general. This is not surprising,
as different context words can be of different im-
portance to the sentiment of a target, a phenom-
enon that can be naturally captured by the attention
机制.
TNet and RAM are the most competitive among
all compared models, attributed to their efforts on
alleviating the noise produced by using a single
layer of attention, as already shown in previous
学习. 然而, we observe that their prediction
abilities vary across datasets: RAM performs
better than TNet on Laptop and Restaurant, 和
vice versa on Tweet. In contrast, TG-SAN pro-
duces satisfactory performance consistently on
all datasets, demonstrating the capability of the
proposed fine-to-coarse attention framework in
capturing the semantic relatedness between the
target and the context sentence for TDSA.
To conclude, we validated the efficacy of
TG-SAN through comparative experiments. The
advantage of TG-SAN over existing methods con-
firms our hypothesis that semantic segments are
the basic units for understanding target-dependent
sentiments. It also shows that such segments can be
effectively captured by the proposed target-guided
structured attention mechanism.
4.3 Ablation Studies
Three ablation models are designed to reveal the
effectiveness of each component in TG-SAN.
w/o CFU: This ablation model uses the SCU
to capture target-related segments in a sentence,
and averages all context vectors to constitute the
vector rc in Equation (13) without distinguishing
their different contributions.
w/o SCU & CFU: In this ablation model, the
combination of SCU and CFU is replaced by a
simple attention layer. Specifically, the target is
represented as the averaged vector of the target
memory. It is then utilized to attend the most
relevant words in the context sentence to build
the context vector. In the output layer, the context
vector and the target vector are both composed for
sentiment prediction.
w/o TG: In this ablation model, the guidance of
the target in the SCU is removed to explore the
effect of the target on context extraction. Hence,
the SCU is reduced to the one proposed by Lin et al.
(2017), which extracts semantic segments from the
sentence using the self-attentive mechanism.
Table 3 reports the results of the three ablation
models. We observe that performance degrades
when the attention layer capturing the contrib-
utions of contexts is removed in w/o CFU. This
indicates that some contexts are indeed more im-
portant than the others in deciding the sentiment
of a target, and the difference is well captured
by CFU. Results also show that the use of SCU
is crucial. Comparing w/o CFU and w/o SCU &
CFU, the macro-F1 of the latter drops drastically
by 1.66%, 4.83%, and 2.29% on Tweet, Laptop,
and Restaurant respectively. Furthermore, results
worsened when the target’s guidance is replaced
with the self-attentive mechanism as in w/o TG.
This indicates that not all semantic segments
appearing in the sentence are related to the target,
and it is necessary to extract the related ones for
TDSA.
4.4 Effects of r
One important hyper-parameter in TG-SAN is r,
which refers to the number of structured represen-
tations extracted from the context sentence. We
vary the value of r from 1 to 5 to investigate its
effects on the TDSA task in this experiment. It
178
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
3
0
8
1
9
2
3
1
5
8
/
/
t
l
a
c
_
a
_
0
0
3
0
8
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
is worth noting that the attention mechanism of
the model degenerates into simple attention when
setting r as 1. Table 4 reports the results.

          Tweet                Laptop               Restaurant
    r     Accuracy  Macro-F1   Accuracy  Macro-F1   Accuracy  Macro-F1
    1     0.7399    0.7261     0.7512    0.6998     0.8131    0.7167
    2     0.7471    0.7365     0.7527    0.7118     0.8166    0.7259
    3     0.7355    0.7210     0.7496    0.7063     0.8184    0.7348
    4     0.7399    0.7236     0.7433    0.7028     0.8220    0.7447
    5     0.7327    0.7182     0.7433    0.6972     0.8184    0.7407

Table 4: Effects of r, the number of structured representations extracted from the context sentence. Results show that capturing multiple contexts (r > 1) is beneficial for TDSA.

TG-SAN performs best when r = 2 on the Tweet and Laptop datasets, and r = 4 on the Restaurant dataset. In general, we conclude that
the best setting of r is always greater than 1.
This demonstrates that multiple contexts are in-
deed beneficial for predicting target-dependent
sentiments, which are well captured by the struc-
tured attention mechanism. We also observe that
when r > 1, model performance may decrease as
r increases. The reason might be that a growing r
increases the complexity of the model, making it
more difficult to train and less generalizable.
4.5 Studies on Multi-segment Sentences
To better understand the advantage of structured
attention in TDSA, we further examine a specific
group of instances containing multiple semantic
segments. Specifically, each instance considered
in this experiment either contains multiple dif-
ferent targets, or multiple mentions of the same
target. We identified in total 38, 382, and 825 such
instances from the Tweet, Laptop, and Restaurant
datasets, respectively. It is worth noting that multi-segment instances are particularly common in Laptop and Restaurant, accounting for 59.78% and 73.79% of all instances, respectively.
In this experiment, we compare TG-SAN with
two models relying on a simple attention mecha-
nism. One is its degenerated version with r = 1,
and the other is a baseline model (w/o SCU &
CFU). Table 5 reports the comparative results.

                         Tweet                Laptop               Restaurant
    Models               Accuracy  Macro-F1   Accuracy  Macro-F1   Accuracy  Macro-F1
    w/o SCU & CFU        0.6316    0.5250     0.6937    0.6415     0.8097    0.6995
    TG-SAN (r = 1)       0.6842    0.5667     0.7487    0.6946     0.8230    0.7213
    TG-SAN               0.7368    0.6850     0.7513    0.7114     0.8291    0.7366

Table 5: Results on multi-segment sentences, where each sentence contains multiple targets or multiple mentions of the same target. TG-SAN outperforms its degenerated version and the baseline model, showing the advantage of the proposed structured attention mechanism in uncovering multiple target-related contexts.

We observe that TG-SAN outperforms the other
two models on all datasets. This demonstrates that
the structured attention mechanism provides a
richer context representation ability to identify the
target-related contexts more effectively, which is
in line with our motivation.
4.6 Case Studies
We demonstrate through case studies that TG-
SAN produces not only superior classification
performances, but also highly interpretable results.
Figure 3 presents test instances covering three different situations: (1) multiple targets, multiple segments; (2) single target, multiple segments; and
(3) single target, single segment. For each instance,
we plot a heat map to visualize the attention results
produced by TG-SAN and a baseline model (w/o
SCU & CFU) for comparison. Note that the atten-
tion score of each word in TG-SAN is produced
by the product of the context weights α ∈ R^r (see
Equation (14)) and the word contributions of each context Ãc ∈ R^{r×Lc} (see Equation (7)), denoted by α^T Ãc.

Figure 3: Visualization results (best viewed in color). Targets are shown in square brackets. Positive and negative sentiments are highlighted in red and green, respectively. In the visualized attention results, the darker the shading of a word, the higher the attention weight it receives from the corresponding model. In general, TG-SAN demonstrates stronger interpretability than the baseline model. It effectively uncovers all sentiment-related contexts in each case, and identifies the most important ones with respect to a specific target. In contrast, the contexts captured by the baseline model are incomplete and inaccurate, as can be seen from the attention results it generates for ''waiting'' in sentence (1) and ''google'' in sentence (2).

Visualization results show that TG-SAN has a
strong ability in uncovering semantic segments
in a sentence. It can also effectively identify the
relatedness between a segment and a certain target.
For example, sentence (1) contains two segments expressing opposite sentiments towards the targets ''food'' and ''waiting''. TG-SAN identifies both segments, and places more emphasis on the segment ''so good'' (respectively, ''nightmare'') when predicting the sentiment of ''food'' (respectively, ''waiting''). In contrast, although the baseline model identifies all sentiment-related words, it fails to accurately determine the relatedness between each word and the target. Consequently, it produces a wrong sentiment prediction for ''waiting''. Similar observations can be made from sentence (2). In this sentence, TG-SAN explicitly
captures two target-related segments, whereas the
baseline model identifies only one. In case (3),
we observe that even when a context sentence
contains only one target-related segment, TG-
SAN still produces a reasonable explanation for
its prediction.

5 Conclusions and Future Work
In this paper, we develop a novel Target-Guided Structured Attention Network (TG-SAN)
for target-dependent sentiment analysis (TDSA).
As opposed to the simple word-level attention
mechanism used by existing models, TG-SAN
uses a fine-to-coarse attention framework to un-
cover multiple target-related contexts and then
fuse them based on their relatedness with the tar-
get for sentiment classification. The effectiveness
of TG-SAN is validated through comprehensive
experiments on three public benchmark datasets.
It also demonstrates superior ability in handling
multi-segment sentences, which contain multiple
targets or multiple mentions of the same target.
Moreover, the attention results it produces are highly interpretable, as shown by the visualization results.
As future work, we may extend this study in two directions. First, the SCU is currently utilized once to extract target-related contexts from a sentence, but extending such a fine-to-coarse framework through iterative use of multiple SCUs is also feasible from the model perspective. Second, we would like to explore the effectiveness of our model in other tasks where semantic relatedness plays an important role as in TDSA, such as
the answer sentence selection task for question-
answering.
Acknowledgments
We would like to thank all reviewers and the
action editor for their constructive suggestions
and comments. This work was supported in part
by the Enterprise Support Scheme (ESS) of the Hong Kong Innovation and Technology Fund (No. B/E022/18). Any opinions, findings, conclusions,
or recommendations expressed in this paper do not
reflect the views of the Government of the Hong
Kong Special Administrative Region, the Inno-
vation and Technology Commission, or the ESS
Assessment Panel.
References
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey
E. Hinton. 2016. Layer normalization. arXiv
preprint arXiv:1607.06450v1.
Peng Chen, Zhongqian Sun, Lidong Bing, and Wei Yang. 2017. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pages 452–461.
Xiaowen Ding, Bing Liu, and Philip S. Yu. 2008. A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 231–240. ACM.
Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang,
Ming Zhou, and Ke Xu. 2014. Adaptive re-
cursive neural network for target-dependent
twitter sentiment classification. In Proceedings
of the 52nd Annual Meeting of the Association
for Computational Linguistics (Volume 2: Short Papers), pages 49–54.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for
image recognition. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition, pages 770–778.
Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, and
Tiejun Zhao. 2011. Target-dependent twitter
sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 151–160. Association for Computational Linguistics.
Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry,
and Saif Mohammad. 2014. NRC-canada-2014:
Detecting aspects and sentiment in customer
reviews. In Proceedings of the 8th International
Workshop on Semantic Evaluation (SemEval
2014), pages 437–442.
Xin Li, Lidong Bing, Wai Lam, and Bei Shi. 2018.
Transformation networks for target-oriented
sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 946–956.
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos
Santos, Mo Yu, Bing Xiang, Bowen Zhou,
and Yoshua Bengio. 2017. A structured self-
attentive sentence embedding. In International Conference on Learning Representations 2017.
Jiangming Liu and Yue Zhang. 2017. Attention
modeling for targeted sentiment. In Proceed-
ings of the 15th Conference of the European
Chapter of the Association for Computational
Linguistics: Volume 2, Short Papers, pages 572–577.
Dehong Ma, Sujian Li, Xiaodong Zhang, and Houfeng Wang. 2017. Interactive attention networks for aspect-level sentiment classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 4068–4074.
Thien Hai Nguyen and Kiyoaki Shirai. 2015.
PhraseRNN: Phrase recursive neural net-
work for aspect-based sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 2509–2514.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, pages 1532–1543.
Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation, SemEval@COLING 2014, Dublin, Ireland, August 23-24, 2014, pages 27–35.
Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.
Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. 2016a. Effective LSTMs for target-dependent sentiment classification. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3298–3307.
Duyu Tang, Bing Qin, and Ting Liu. 2016b. Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, pages 214–224.
Yequan Wang, Minlie Huang, Xiaoyan Zhu, and
Li Zhao. 2016. Attention-based LSTM for
aspect-level sentiment classification. In Pro-
ceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing,
EMNLP 2016, pages 606–615.
Min Yang, Wenting Tu, Jingxuan Wang, Fei Xu, and Xiaojun Chen. 2017. Attention based LSTM for target dependent sentiment classification. In Thirty-First AAAI Conference on Artificial Intelligence, pages 5013–5014.
Meishan Zhang, Yue Zhang, and Duy-Tin Vo.
2016. Gated neural networks for targeted sen-
timent analysis. In Thirtieth AAAI Conference
on Artificial Intelligence, pages 3087–3093.
Xin Zhao, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, pages 56–65.