Categorical Metadata Representation for Customized Text Classification
Jihyeok Kim*1 Reinald Kim Amplayo*2
Kyungjae Lee1
Sua Sung1 Minji Seo1
Seung-won Hwang1
(* equal contribution)
1Yonsei University
zizi1532@yonsei.ac.kr
2University of Edinburgh
reinald.kim@ed.ac.uk
{lkj0509,dormouse,ggatalminji,seungwonh}@yonsei.ac.kr
Abstract
The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail categories). To this end, we propose using basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various data sets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly.
1 Introduction
Text classification is the backbone of most NLP tasks: review classification in sentiment analysis (Pang et al., 2002), paper classification in scientific data discovery (Sebastiani, 2002), and question classification in question answering (Li and Roth, 2002), to name a few. While prior methods require intensive feature engineering, recent methods enjoy automatic extraction of features from text using neural-based models (Socher et al., 2011) by encoding texts into low-dimensional dense feature vectors.
This paper discusses customized text classification, generalized from personalized text classification (Baruzzo et al., 2009), where we customize classifiers based on possibly multiple different known categorical metadata information (e.g., user/product information for sentiment classification) instead of just the user information. As shown in Figure 1, in addition to the text, a customizable text classifier is given a list of categories specific to the text to predict its class. Existing works applied metadata information to improve the performance of a model, such as user and product information (Tang et al., 2015) in sentiment classification, and author (Rosen-Zvi et al., 2004) and publication (Joorabchi and Mahdi, 2011) information in paper classification.
Towards our goal, we are inspired by the advancement in neural-based models, incorporating categorical information "as is" and injecting it on various parts of the model such as in the word embeddings (Tang et al., 2015), attention mechanism (Chen et al., 2016; Amplayo et al., 2018a), and memory networks (Dou, 2017). Indeed, these methods theoretically make use of combined features from both textual and categorical features, which make them more powerful than disconnected features. However, metadata is generated for human understanding, and thus we claim that these categories need to be carefully represented for machine use to improve the performance of the text classifier effectively.
Figure 1: A high-level framework of models for the Customized Text Classification Task that inputs a text with n tokens (e.g., a review) and m categories (e.g., user, product) and outputs a class (e.g., positive/negative). Example tasks are shown on the left of the figure.
First, we empirically invalidate the results from previous studies by showing, in our experiments on multiple data sets, that popular methods using metadata categories "as is" perform worse than a simple concatenation of textual and categorical feature vectors. We argue that this is because of the difficulties of the model in learning optimized dense vector representations of the categorical features to be used by the classification model. The reasons are two-fold: (a) categorical features do not have direct context and thus rely solely on classification labels when training the feature vectors, and (b) there is categorical information that is sparse and thus cannot effectively learn optimal feature vectors.
Second, we suggest an alternative representation, using low-dimensional basis vectors to mitigate the optimization problems of categorical feature vectors. Basis vectors have nice properties that can solve the issues presented here because they (a) transform multiple categories into useful combinations, which serve as mutual context to all categories, and (b) intelligently initialize vectors, especially of sparse categorical information, to a suboptimal location from which they can be efficiently trained further. Moreover, our method reduces the number of trainable parameters and thus is flexible for any kind and any number of available categories.
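To give a rough feel for the idea before its formal definition, here is a minimal PyTorch sketch of one way such a basis representation could work: each category label is a softmax-weighted mixture (coefficients γ, as in Figure 4) over a small shared pool of basis vectors, instead of owning a free embedding. The class name, `num_bases`, and the exact mixing scheme are our illustrative assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn as nn

class BasisEmbedding(nn.Module):
    """Hypothetical sketch: represent each category label as a
    softmax-weighted combination of a small shared pool of basis
    vectors, instead of giving every label its own free embedding."""
    def __init__(self, num_labels, num_bases, dim):
        super().__init__()
        # one coefficient vector per label (num_labels x num_bases params)
        self.coeff = nn.Embedding(num_labels, num_bases)
        # the shared basis pool (num_bases x dim params)
        self.bases = nn.Parameter(torch.randn(num_bases, dim))

    def forward(self, label_ids):
        gamma = torch.softmax(self.coeff(label_ids), dim=-1)  # mixing weights
        return gamma @ self.bases  # (batch, dim) category vectors
```

Under this sketch, 10,000 users with 256-dimensional plain embeddings would cost 2.56M parameters, while 16 bases cost 10,000 × 16 + 16 × 256 ≈ 164K, which matches the claim that the parameter count drops dramatically when the number of categorical features is large.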
We experiment on multiple classification tasks with different properties and kinds of categories available. Our experiments show that while customization methods using categorical information "as is" do not perform as well as the naive concatenation method, applying our proposed basis-customization method makes them much more effective than the naive method. Our method also enables the use of categorical metadata to customize other parts of the model, such as the encoder weights, that were previously unexplored due to their high space complexity and weak performance. We show that this unexplored use of customization outperforms popular and conventional methods such as the attention mechanism when our proposed basis-customization method is used.
2 Preliminaries
2.1 Problem: Customized Text Classification
The original text classification task is defined as follows: Given a text W = {w1, w2, ..., wn}, we are tasked to train a mapping function f(W) to predict a correct class y ∈ {y1, y2, ..., yp} among the p classes. The customized text classification task makes use of the categorical metadata information attached to the text to customize the mapping function. In this paper, we define categorical metadata as non-continuous information that describes the text.1 An example task is review sentiment classification with user and product information as categorical metadata.
Formally, given a text t = {W, C}, where W = {w1, w2, ..., wn}, C = {c1, c2, ..., cm}, wx is the xth of the n tokens in the text, and cz is the category label of the text on the zth of the m available categories, the goal of customized text classification is to optimize a function fC(W) to predict a label y, where fC(W) is the classifier dependent on C. In our example task, W is the review text, and we have m = 2 categories, where c1 and c2 are the user and product information.
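As a concrete illustration, one instance of the task could be represented as follows; the container and field names are purely our own, chosen for exposition.

```python
from dataclasses import dataclass

@dataclass
class CustomizedTextInstance:
    # illustrative container for one (W, C, y) training example
    tokens: list[str]           # W = {w1, ..., wn}, e.g., the review text
    categories: dict[str, str]  # C = {c1, ..., cm}, e.g., user/product ids
    label: str                  # y, the class to predict

example = CustomizedTextInstance(
    tokens="the food is very sweet".split(),
    categories={"user": "u_1532", "product": "p_0042"},  # hypothetical ids
    label="positive",
)
```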
This is an interesting problem because of the vast opportunities it provides. First, we are motivated to use categorical metadata because existing work has shown that non-textual additional information, such as POS tags (Go et al., 2009) and latent topics (Zhao et al., 2017), can be used as strong supplementary supervision to improve the performance of text classification. Second, while previously used additional information is found to be helpful, it is either domain-dependent or very noisy (Amplayo et al., 2018b). On the other hand, categorical metadata are usually factual and valid information that are either inherent (e.g., user/product information) or human-labeled (e.g., research area). Lastly, the customized text classification task generalizes the personalization problem (Baruzzo et al., 2009), where instead of personalizing based on single user information, we customize based on
possibly multiple categories, which may or may not include user information. This consequently creates an opportunity to develop customizable virtual assistants (Papacharissi, 2002).

1We limit our scope to texts with categorical metadata information (product reviews, news articles, tweets, etc.), which covers most of the texts on the Web. Texts without metadata can use predicted categorical information, e.g., topics from a topic model, which are commonly used (Zhao et al., 2017; Chou et al., 2017). However, because the prediction may be incorrect, performance gains cannot be guaranteed. We leave the investigation of this area to future work.
2.2 Base Classifier: BiLSTM
We use a Bidirectional Long Short-Term Memory (BiLSTM) network (Hochreiter and Schmidhuber, 1997) as our base text classifier, as it is proven to work well on classifying text sequences (Zhou et al., 2016). Although the methods described here apply to other effective classifiers as well, such as convolutional neural networks (CNNs) (Kim, 2014) and hierarchical models (Yang et al., 2016), we limit our experiments to BiLSTM to cover more important findings.
Our BiLSTM classifier starts by encoding the word embeddings using a forward and a backward LSTM. The resulting pairs of vectors are concatenated to get the final encoded word vectors, as shown here:
wi ∈ W                              (1)
→hi = LSTMf(wi, →hi−1)              (2)
←hi = LSTMb(wi, ←hi+1)              (3)
hi = [→hi; ←hi]                     (4)
Next, we pool the encoded word vectors hi into a text vector d using an attention mechanism (Bahdanau et al., 2015; Luong et al., 2015), which calculates importance scores using a latent context vector x for all words, normalizes the scores using softmax, and uses them to do a weighted sum on the encoded word vectors, as shown:

ei = x⊤hi                           (5)
ai = exp(ei) / Σj exp(ej)           (6)
d = Σi hi ∗ ai                      (7)
Finally, we use a logistic regression classifier to classify labels using a learned weight matrix W(c) and bias vector b(c):

y′ = W(c)d + b(c)                   (8)

We can then train our classifier using any gradient descent algorithm by minimizing the negative log likelihood of the log softmax of the predicted labels y′ with respect to the actual labels y.
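A compact PyTorch sketch of this base classifier (Equations 1–8) might look as follows; the hyperparameters and names are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Sketch of the base classifier: BiLSTM encoder, attention pooling
    with a latent context vector x, and a logistic regression layer."""
    def __init__(self, vocab_size, emb_dim, hid_dim, num_classes):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True,
                              batch_first=True)           # Eqs. 2-4
        self.x = nn.Parameter(torch.randn(2 * hid_dim))   # context vector x
        self.out = nn.Linear(2 * hid_dim, num_classes)    # Eq. 8

    def forward(self, word_ids):
        h, _ = self.bilstm(self.emb(word_ids))       # (batch, n, 2*hid)
        e = h @ self.x                                # Eq. 5: importance scores
        a = torch.softmax(e, dim=-1)                  # Eq. 6
        d = (h * a.unsqueeze(-1)).sum(dim=1)          # Eq. 7: text vector d
        return self.out(d)                            # Eq. 8: class logits

# Training minimizes the negative log likelihood, e.g., with
# nn.CrossEntropyLoss() and any gradient descent optimizer.
```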
2.3 Baseline 1: Concatenated BiLSTM
To incorporate the categories into the classifier, a simple and naive method is to concatenate the categorical features with the text vector d. To do this, we create embedding spaces for the different categories and get the category vectors c1, c2, ..., cm based on the category labels of text d. We then use the concatenated vector as features for the logistic regression classifier:

y′ = W(c)[d; c1; c2; ...; cm] + b(c)    (9)
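Building on the base classifier sketch above, the concatenation baseline (Equation 9) only changes the final layer; again this is a hedged illustration, not the authors' exact code.

```python
class ConcatBiLSTM(BiLSTMClassifier):
    """Sketch of the concatenation baseline (Eq. 9): append category
    embeddings c_1..c_m to the text vector d before the final layer."""
    def __init__(self, vocab_size, emb_dim, hid_dim, num_classes,
                 category_sizes, cat_dim):
        super().__init__(vocab_size, emb_dim, hid_dim, num_classes)
        # one embedding table per metadata category (e.g., user, product)
        self.cats = nn.ModuleList([nn.Embedding(n, cat_dim)
                                   for n in category_sizes])
        self.out = nn.Linear(2 * hid_dim + len(category_sizes) * cat_dim,
                             num_classes)

    def forward(self, word_ids, cat_ids):
        h, _ = self.bilstm(self.emb(word_ids))
        a = torch.softmax(h @ self.x, dim=-1)
        d = (h * a.unsqueeze(-1)).sum(dim=1)          # text vector d
        c = [emb(cat_ids[:, z]) for z, emb in enumerate(self.cats)]
        return self.out(torch.cat([d] + c, dim=-1))   # Eq. 9
```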
2.4 Baseline 2: Customized BiLSTM
Although the Concatenated BiLSTM easily makes use of the categories as additional features for the classifier, it is not able to leverage the possible low-level dependencies between textual and categorical features.
There are different levels of dependencies between texts and categories. For example, when predicting the sentiment of a review "The food is very sweet," given the user who wrote the review, the classifier should give a positive label if the user likes sweet foods and a negative label otherwise. In this case, the dependency between the review and the user is on the higher level, where we look at relationships between the full text and the categories. Another example is when predicting the acceptance of a research paper given that the research area is NLP: the classifier should focus more on NLP words (e.g., language, text) rather than less-related words (e.g., biology, chemistry). In this case, the dependency between the research paper and the research area is on the lower level, where we look at relationships between segments of text and the categories.
We present five levels of Customized BiLSTM, which differ in the location where we inject the categorical features, listed here from the highest to the lowest level of dependencies between text and categories. The main idea is to impose category-specific weights, instead of a single weight, at each level of the model (a code sketch illustrating several of these levels follows the list):
1. Customize on the bias vector: At this level of customization, we look at the general biases the categories have towards the problem. As a concrete example, when classifying the type of message a politician wrote, he/she can be biased towards writing personal messages rather than policy messages. Instead of using a single bias vector b(c) in the logistic regression classifier (Equation 8), we use additional multiple bias vectors, one for each category, as shown below. In fact, this is in spirit essentially equivalent to Concatenated BiLSTM (Equation 9), where the derivation is:

y′ = Wd d + bc1 + ... + bcm + b(c)
   = Wd d + Wc1 c1 + ... + Wcm cm + b(c)
   = W(c)[d; c1; c2; ...; cm] + b(c)
2. Customize on the linear transformation: At this level of customization, we look at the text-level semantic biases the categories have. As a concrete example, in the sentiment classification task, the review "The food is very sweet" can have a negative sentiment if the user who wrote the review does not like sweets. Instead of using a single weight matrix W(c) in the logistic regression classifier (Equation 8), we use different weight matrices for each category:

y′ = W(c)c1 d + W(c)c2 d + ... + W(c)cm d + b(c)
3. Customize on the attention pooling: At this level of customization, we look at the word importance biases the categories have. A concrete example is, when classifying a research paper, NLP words should be focused on more when the research area is NLP. Instead of using a single context vector x when calculating the attention scores e (Equation 5), we use different context vectors for each category:

ei = x⊤c1 hi + x⊤c2 hi + ... + x⊤cm hi
a = softmax(e)
d = Σi hi ∗ ai
4. Customize on the encoder weights: At this level of customization, we look at the word contextualization biases the categories need. A concrete example is, given the text "deep learning for political message classification", when encoding the word classification, the BiLSTM should retain the semantics of the words political message more and forget the semantics of other words more when the research area is about politics. Instead of
using a single set of input, forget, output, and memory cell weights for each LSTM (Equations 2 and 3), we use multiple sets of the weights, one for each category:

[gt; it; ft; ot] = [tanh; σ; σ; σ] ( Σk W(e)ck [wt; ht−1] + b )
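To make these customization levels concrete, here is a hedged PyTorch sketch, continuing from the base classifier sketch above, that implements the bias, linear-transformation, and attention customizations for a single metadata category; extending to m categories sums the corresponding per-category terms, and all names are illustrative rather than the authors' released implementation.

```python
class CustomizedBiLSTM(BiLSTMClassifier):
    """Sketch of customization levels 1-3 for one metadata category with
    `n_labels` labels (e.g., users). For m categories, sum the terms."""
    def __init__(self, vocab_size, emb_dim, hid_dim, num_classes,
                 n_labels, level):
        super().__init__(vocab_size, emb_dim, hid_dim, num_classes)
        H, self.level, self.C = 2 * hid_dim, level, num_classes
        if level == "bias":         # y' = W_d d + b_c + b
            self.table = nn.Embedding(n_labels, num_classes)
        elif level == "linear":     # y' = W_c d + b
            self.table = nn.Embedding(n_labels, num_classes * H)
        elif level == "attention":  # e_i = x_c^T h_i
            self.table = nn.Embedding(n_labels, H)

    def forward(self, word_ids, label_ids):
        h, _ = self.bilstm(self.emb(word_ids))       # (batch, n, H)
        t = self.table(label_ids)                    # per-label parameters
        if self.level == "attention":
            e = torch.einsum('bnh,bh->bn', h, t)     # customized scores
        else:
            e = h @ self.x                           # shared context vector
        a = torch.softmax(e, dim=-1)
        d = (h * a.unsqueeze(-1)).sum(dim=1)         # text vector d
        if self.level == "bias":
            return self.out(d) + t                   # add category bias
        if self.level == "linear":
            W_c = t.view(-1, self.C, d.size(-1))     # per-label weight matrix
            return torch.einsum('bch,bh->bc', W_c, d) + self.out.bias
        return self.out(d)
```

The encoder-weight level would analogously look up the LSTM gate weights W(e)ck per label, costing roughly n_labels × 4H × (E + H) parameters per category in this naive form; this space blow-up is what the basis decomposition is meant to relieve.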
[Table 5 fragments] Example text (Political Media): "to those first responders and military personnel working to ensure our safety who are unable to be with their families this holiday season. we are all thank you for your service and dedication." Category labels include cs.CR (Cryptography and Security) with class Reject (AAPR), and Political Bias Classification labels Neutral / Partisan with classes Personal / Support (Political Media).

Table 5: Example texts from the AAPR data set (upper) and Political Media data set (lower) with a variable category label (research field and political bias) that changes the classification label.
Figure 4: t-SNE visualization of the category vectors of Customized BiLSTM (first row) and Basis-Customized BiLSTM (middle row), and the γ coefficients of the latter model (last row), when the epoch is equal to 1, 2, 4, and when training has finished (left to right).
We finally examine the performance of our models when data contain cold-start entities (i.e., users/products may have zero or very few reviews) using the Sparse80 subset of the Yelp 2013 data set provided in Amplayo et al. (2018a). We compare our models with three competing models: NSC (Chen et al., 2016), which uses a hierarchical LSTM encoder coupled with customization on the attention mechanism, BiLSTM+CSAA (Amplayo et al., 2018a), which uses a BiLSTM encoder with customization on a CSAA mechanism, and HCSC (Amplayo et al., 2018a), which is a combination of a CNN and the BiLSTM encoder with customization on CSAA.
Figure 5: Accuracy per user/product review frequency on the Yelp 2013 data set. The review frequency value f represents the frequencies in the range [f, f + 10), except when f = 100, where it represents the frequencies in the range [f, inf).
Models                          Accuracy
NSC                             51.1
BiLSTM+CSAA                     52.7
HCSC                            53.8
BiLSTM+encoder-basis-cust       50.4
BiLSTM+linear-basis-cust        50.8
BiLSTM+bias-basis-cust          51.9
BiLSTM+word-basis-cust          51.9
BiLSTM+attention-basis-cust     53.1

Table 6: Performance comparison of competing models in the Yelp 2013 Sparse80 data set.
Results are reported in Table 6, which provides us two observations. First, the BiLSTM model customized on the linear transformation matrix, which performs the best on the original Yelp 2013 data set (see Table 3), obtains a very sharp decrease in performance. We posit that this is because basis customization is not able to handle zero-shot cold-start entities, which are amplified in the Yelp 2013 Sparse80 data set. We leave extensions of basis for zero-shot or cold-start settings, studied actively in the machine learning (Wang et al., 2019) and recommendation (Sun et al., 2012) domains, respectively, to future work. Inspired by CSAA (Amplayo et al., 2018a), which uses similar review texts for inferring the cold-start user (or product), we expect to similarly infer meta context based on similar meta context, which may mitigate the zero-shot cold-start problem. Second, despite having no zero-shot learning capabilities, Basis-Customized BiLSTM on the attention mechanism performs competitively with HCSC and performs better than BiLSTM+CSAA, which is Customized BiLSTM on the attention mechanism with cold-start awareness.
6 Conclusion
We presented a new study on customized text classification, a task where we are given, aside from the text, its categorical metadata information, to predict the label of the text, customized by the categories available. The issue at hand is that this categorical metadata information is hardly understandable and thus difficult to use by neural machines. This, therefore, makes neural-based models hard to train and optimize to find a proper categorical metadata representation. This issue is very critical, in such a way that a simple concatenation of this categorical information provides better performance than existing popular neural-based methods. We propose solving this problem by using basis vectors to customize parts of a classification model such as the attention mechanism and the weight matrices in the hidden layers. Our results show that customizing the weights using the basis vectors boosts the performance of a basic BiLSTM model, and also effectively outperforms the simple yet robust concatenation methods. We share the code and data sets used in our experiments here: https://github.com/zizi1532/BasisCustomize.
Acknowledgments
This work was supported by Microsoft Research Asia and an IITP/MSIT research grant (No. 2017-0-01779).
References
Reinald Kim Amplayo, Jihyeok Kim, Sua Sung, and Seung-won Hwang. 2018a. Cold-start aware user and product attention for sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2535–2544. Association for Computational Linguistics.
Reinald Kim Amplayo, Kyungjae Lee, Jinyoung Yang, and Seung-won Hwang. 2018b. Translations as additional contexts for sentence classification. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, pages 3955–3961.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, ICLR'15.
Andrea Baruzzo, Antonina Dattolo, Nirmala
Pudota, and Carlo Tasso. 2009. A general
framework for personalized text classification
and annotation. In Proceedings of the Workshop
on Adaptation and Personalization for Web 2.0,
AP WEB 2.0@UMAP.
Huimin Chen, Maosong Sun, Cunchao Tu, Yankai Lin, and Zhiyuan Liu. 2016. Neural sentiment classification with user and product attention. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1650–1659. Association for Computational Linguistics.
Po-Hao Chou, Richard Tzong-Han Tsai, and Jane Yung-jen Hsu. 2017. Context-aware sentiment propagation using LDA topic modeling on Chinese ConceptNet. Soft Computing, 21(11):2911–2921.
Zi-Yi Dou. 2017. Capturing user and product information for document level sentiment analysis with deep memory network. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 521–526. Association for Computational Linguistics.
Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(12).
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
Arash Joorabchi and Abdulhussain E. Mahdi. 2011. An unsupervised approach to automatic classification of scientific literature utilizing bibliographic metadata. Journal of Information Science, 37(5):499–514.
Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.
Xuan Nhat Lam, Thuc Vu, Trong Duc Le, and Anh Duc Duong. 2008. Addressing cold-start problem in recommendation systems. In Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC 2008, pages 208–211.
Fei-Fei Li, Robert Fergus, and Pietro Perona.
2006. One-shot learning of object categories.
IEEE Transactions on Pattern Analysis and
Machine Intelligence, 28(4):594–611.
Xin Li and Dan Roth. 2002. Learning question classifiers. In COLING 2002: The 19th International Conference on Computational Linguistics.
Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. 2015. Hierarchical recurrent neural network for document modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 899–907. Association for Computational Linguistics.
Yunfei Long, Mingyu Ma, Qin Lu, Rong Xiang, and Chu-Ren Huang. 2018. Dual memory network model for biased product review classification. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 140–148. Association for Computational Linguistics.
Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. Association for Computational Linguistics.
Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang, and Xu Sun. 2017. Cascading multiway attentions for document-level sentiment classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 634–643. Asian Federation of Natural Language Processing.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).
Zizi Papacharissi. 2002. The presentation of self in virtual life: Characteristics of personal home pages. Journalism & Mass Communication Quarterly, 79(3):643–660.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543. Association for Computational Linguistics.
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237. Association for Computational Linguistics.
Sachin Ravi and Hugo Larochelle. 2016. Optimization as a model for few-shot learning. In Proceedings of the 4th International Conference on Learning Representations, ICLR'16.
Michal Rosen-Zvi, Thomas L. Griffiths, Mark Steyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In UAI '04, Proceedings of the 20th Conference in Uncertainty in Artificial Intelligence, pages 487–494.
Fabrizio Sebastiani. 2002. Machine learning in
automated text categorization. ACM Computing
Surveys, 34(1):1–47.
Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 151–161. Association for Computational Linguistics.
Dong-ting Sun, Tao He, and Fu-hai Zhang. 2012.
Survey of cold-start problem in collaborative
filtering recommender system. Computer and
Modernization, 5:59–63.
Duyu Tang, Bing Qin, and Ting Liu. 2015. Learning semantic representations of users and products for document level sentiment classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1014–1023. Association for Computational Linguistics.
Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. 2019. A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):13:1–13:37.
Pengcheng Yang, Xu Sun, Wei Li, and Shuming Ma. 2018. Automatic academic paper rating based on modularized hierarchical convolutional neural network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 496–502. Association for Computational Linguistics.
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016.
Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489. Association for Computational Linguistics.
Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. CoRR, abs/1212.5701.
Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3485–3495.
Rui Zhao and Kezhi Mao. 2017. Topic-aware deep compositional models for sentence classification. IEEE/ACM Transactions on Audio, Speech and Language Processing, 25(2):248–260.
Pengcheng Zhu and Yujiu Yang. 2017. Parallel multi-feature attention on neural sentiment classification. In Proceedings of the Eighth International Symposium on Information and Communication Technology, pages 181–188.