RESEARCH PAPER
Bi-GRU Relation Extraction Model Based on Keywords
Attention
Yuanyuan Zhang1†, Yu Chen2, Shengkang Yu1, Xiaoqin Gu1, Mengqiong Song1, Yu Peng1, Jianxia Chen2
& Qi Liu2
1Technical Training Center of State Grid Hubei Electric Power Co., Ltd. Wuhan 430070, China
2Hubei University of Technology, School of Computer Science, Wuhan 430068, China
Keywords: Relation extraction; Bi-GRU; CRF keywords attention; Hidden similarity
Citation: Zhang, Y.Y., et al.: Bi-GRU Relation Extraction Model Based on Keywords Attention. Data Intelligence 4(3), 552-572
(2022). DOI: 10.1162/dint_a_00147
Received: Oct. 11, 2021; Revised: Jan. 15, 2022; Accepted: Feb. 10, 2022
ABSTRACT
Relation extraction plays an important role in natural language processing: it predicts semantic relationships between entities in a sentence. Currently, most models use natural language processing tools to capture high-level features, together with an attention mechanism that mitigates the adverse effects of noise in sentences on the prediction results. However, in the task of relation classification, these attention mechanisms do not take full advantage of the semantic information of keywords that carry relational expressions in the sentences. Therefore, we propose a novel relation extraction model based on an attention mechanism with keywords, named Relation Extraction Based on Keywords Attention (REKA). In particular, the proposed model uses a bi-directional GRU (Bi-GRU) to reduce computation, obtain the representation of sentences, and extract prior knowledge of the entity pair without any NLP tools. Besides calculating the entity-pair similarity, the keywords attention in the REKA model also uses a linear-chain conditional random field (CRF) that combines entity-pair features, similarity features between entity-pair features, and its hidden vectors to obtain the attention weights from the marginal distribution of each word. Experiments demonstrate that the proposed approach can exploit keywords that incorporate relational expression semantics in sentences without the assistance of any high-level features and achieves better performance than traditional methods.
† Corresponding author: Yuanyuan Zhang (Email: 16823650@qq.com; ORCID: 0000-0002-5353-2989).
© 2022 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
1. INTRODUCTION
Abundant data are generated and shared on the Web every day, so the relational facts about subjects (entities) in a text are often used to represent the text information and to capture associations among those data. Generally, triples are used to represent entities and their relations, and they often indicate unambiguous facts about entities. For example, a triple (e1, r, e2) denotes that entity e1 has a relation r with another entity e2. Knowledge graphs (KG) such as FreeBase [1] and DBpedia [2] are real examples of such representations in triple form.
Relation extraction (RE) is a sub-task of natural language processing (NLP) that discovers relations between entity pairs in given unstructured text data. Previous work on relation extraction from text depends heavily on kernel and feature methods [3]. Recent studies use data-driven Deep Neural Network (DNN) methods to free RE from the conventional NLP approaches, since these DNN-based methods [4–6] learn features automatically instead of relying on manually designed features based on various NLP toolkits. Most of them surpass the traditional methods and achieve excellent results for the RE task. Among them, DNN-based supervised and distant supervision methods are the most popular and reliable solutions for RE, but each has its own characteristics. Supervised methods perform better in a specific domain, while distant supervision methods perform better in generic domains. Therefore, it is difficult to say which of the two kinds of methods is the best. In line with the focus of this paper, the following part introduces the DNN-based supervised methods in detail.
According to the structure of the DNN, DNN-based supervised RE is usually classified into types such as CNN [6–10], RNN [5, 11, 12], or mixed structures. In addition, some RNN variants have been developed for RE systems, such as the Long Short-Term Memory network (LSTM) [13–15] and the Gated Recurrent Unit (GRU) [16]. Each kind of DNN has its own characteristics and advantages in dealing with various language tasks. For example, owing to their parallel processing ability, CNNs are good at capturing local and structural information but rarely capture global features and time sequence information. In contrast, RNNs, LSTMs, and GRUs, which are suitable for modeling sequences and problem transformation, can alleviate these problems that CNNs cannot overcome.
However, these structural RNN-based methods have a common drawback: many external artificial features are introduced without an effective feature filter mechanism [17]. Therefore, semantic-oriented approaches are used to improve the ability of semantic representation by capturing the internal associations of text with attention mechanisms. To alleviate the influence of word-level noise within sentences, many efforts have been devoted to getting rid of irrelevant words [18–21], in particular the recent state-of-the-art attention-based methods such as [19, 22, 23].
Although attention mechanisms can alleviate the inner-sentence noise by calculating a weight for each word independently, some information is better extracted from runs of continuous words such as phrases. Yu et al. [24] propose an attention mechanism based on conditional random fields (CRF), which incorporates such keyword information into the neural relation extractor.
Compared with other strong feature-based classifiers and all baseline neural models, the CRF mechanism
is important for this model to construct a better attention weight.
Based on the above analysis, we propose a novel relation extraction model based on an attention mechanism with keywords, named Relation Extraction Based on Keywords Attention (REKA), which incorporates an attention mechanism based on relation-indicating keywords that are similar to the segments in [24]. Different from the model in [24], our model uses a bi-directional GRU (Bi-GRU) to reduce computation without any NLP tools. In particular, the CRF attention mechanism includes two components: entity pair attention and segment attention.
The proposed entity pair attention adds additional weight to the entity part of each sample so that it plays a more decisive role during encoding. The proposed segment attention assumes that each sentence has a corresponding binary sequence of states and that each state variable in the sequence corresponds to a word in the sentence. This binary state variable indicates whether the corresponding word is irrelevant (0) or relevant (1) to the relation extraction task. Inspired by [24], we use a linear-chain CRF incorporating segment attention to obtain the marginal distribution of each state variable as an attention weight.
To summarize, the contributions of the proposed REKA model are as follows:
• We propose a novel Bi-GRU model based on an attention mechanism with keywords to handle relation extraction.
• Both entity pair similarity features and segment features are incorporated in the proposed attention mechanism with keywords.
• The model achieves state-of-the-art performance without the assistance of any other NLP tools.
• The model is more interpretable than the original Bi-GRU model.
2. RELATED WORK
2.1 RNN-Based Relation Extraction Models
Recently, relation extraction research has focused on extracting relational features with neural networks [25–27]. Zhang et al. [28] claimed that RNN-based relation extraction models perform better than CNN-based models, since CNNs can only obtain local features while RNNs are good at learning long-distance dependencies between entities. Afterward, LSTM [15] was proposed, which uses a gate mechanism to alleviate the gradient vanishing and explosion problems of RNN models. Based on this, Xu et al. [5] propose a model that applies an LSTM along the shortest dependency path (SDP) between entities, named the SDP-LSTM model, in which four types of information (word vectors, POS tags, grammatical relations, and WordNet hypernyms) supply external information. To address the difficulty that shallow architectures have in representing the latent space at different network levels, Xu et al. [29] obtain abstract features along the two sub-paths of the SDP.
Since dependency trees are directed graphs, it is necessary to identify whether the relation implies the reverse direction or whether the first entity is related to the second entity. Therefore, the SDP is divided into two sub-paths, each directed from an entity towards the ancestor node. However, one-directional LSTM models lack a representation of the complete sequential information. Thus, the bidirectional LSTM model (BiLSTM) is used by Zhang et al. [30] to obtain the sentence-level representation with several lexical features. The experimental results demonstrate that word embeddings alone as input features are enough to achieve excellent results. However, the SDP can filter the input text but cannot extract features. To address this issue, the attention mechanism is introduced for BiLSTM-based RE [31].
2.2 Attention Mechanisms for Relation Extraction
Since useful information can appear anywhere in a sentence, some researchers have recently presented attention-based models that can obtain the important semantic information in a sentence. Zhou et al. [31] propose an attention mechanism in BiLSTM, which automatically obtains the important features from the raw text alone. Similar to the work of Zhou et al. [31], Xiao et al. [32] propose a two-level BiLSTM architecture based on a two-level attention mechanism to extract a high-level representation of the raw sentence.
Although the attention mechanism is used to capture the important features extracted by the model, [31] just presents a random weight without considering prior knowledge. Therefore, EAtt-BiGRU, proposed by Qin et al. [33], leverages the entity pair as prior knowledge to form the attention weight. Different from the work of Zhou et al. [31], EAtt-BiGRU applies a bi-directional GRU (Bi-GRU) to reduce computation and capture the representation of sentences, and adopts a GRU to extract prior knowledge of entity pairs. Zhang et al. [34] propose a Bi-GRU model based on another attention mechanism that uses the SDP as prior knowledge, extracting sentence-level features and attention weights. Nguyen et al. [35] propose a special attention mechanism and introduce dependency analysis that takes into account the interconnections between potential features.
Since the BERT model was proposed and achieved excellent performance on various NLP tasks, more and more studies have started to use the BERT model in search matching tasks and have achieved very good results. In the latest study on pre-trained models, Wei et al. [36] achieved high metric scores using BERT. Although the BERT model has excellent encoding ability and can fully capture the semantic information of the context in the sentence, it still has problems such as high training costs and long prediction time.
Our model is inspired by Lee et al. [22]. However, different from the previous works, which can only obtain word-level or sentence-level attention and rarely obtain the degree of correlation between entities and other related words, our model uses Bi-GRU instead of BiLSTM to reduce computation. Meanwhile, inspired by the attention model designed by Yu et al. [24] for relation extraction, which is capable of learning phrase-like features and capturing reasonably related segments as relational expressions based on
the CRF, we propose a novel attention mechanism that combines the entity pair attention with the segment attention via the CRF.
Although the above methods provide a solid foundation for research on supervised RE, they still have limitations. For example, the insufficient training corpus hinders the further development of supervised RE. Therefore, Mintz et al. [37] propose a distant supervision approach that relies strongly on an assumption in the selection of training examples. Distant supervision methods have also achieved excellent results for RE [38–40]. However, they also have drawbacks; for example, the noise in the data sets is obvious. Therefore, it is difficult to demonstrate which of the two kinds of methods is currently the best. Thus, we study only the supervised methods in this paper.
3. METHODOLOGY
The proposed REKA model consists of four components, the structure of which is shown in Figure 1, and the role of each layer is as follows:
• The input layer contains word vector information and position information.
• The self-attention layer processes the word vectors to obtain word representations.
• The Bi-GRU layer obtains contextual information about each word in a sentence.
• The keyword-attention layer extracts the key information in the sentence and passes it to the final classification layer.
Figure 1. The systematic architecture of the REKA model.
3.1 Input Layer
The input layer of the REKA model is designed to transform the original input sentence into an embedding vector containing various feature information, where the input sentence is denoted by {w1, w2, …, wn} and $\{p_1^{e_j}, p_2^{e_j}, \ldots, p_n^{e_j}\}$ is a vector of the relative position features of every word with respect to the entity $e_j$, $j \in \{1, 2\}$, of the entity pair.
To further enhance the model's ability to capture the semantic information in sentences, pre-trained Embeddings from Language Models (ELMo) [43] word embeddings are used in this paper. ELMo offers a better solution for words with multiple meanings, unlike the previous work of word2vec by Mikolov et al. [41] and GloVe by Pennington et al. [42], in which each word corresponds to one fixed vector.
ELMo is a pre-trained model: a sentence or a paragraph is fed into it, and it infers the word vector corresponding to each word based on the context. One obvious benefit of ELMo is that words with multiple meanings can be understood in the context of the preceding and following words.
After the word embedding process, each element of {x1, x2, …, xn} is a dw-dimensional vector, and the sequence is input into the next layer together with the position feature vectors.
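As a minimal sketch of the relative position features described above, the Python snippet below maps each token to its offset from one entity and shifts that offset into a non-negative row index of a position embedding table. The function name `relative_positions`, the clipping rule, and the value `max_len = 10` are illustrative assumptions and are not taken from the paper; the example sentence is the one the paper uses in Figure 4.

```python
def relative_positions(n_tokens, entity_index, max_len):
    """Return one embedding-row index per token.

    The raw feature is the signed offset (token position - entity position).
    It is clipped to [-(max_len - 1), max_len - 1] and shifted by max_len - 1,
    so the result indexes a table with 2 * max_len - 1 rows.
    """
    indices = []
    for i in range(n_tokens):
        offset = i - entity_index
        offset = max(-(max_len - 1), min(max_len - 1, offset))
        indices.append(offset + max_len - 1)
    return indices


tokens = ["The", "boy", "ran", "into", "the", "school", "cafeteria"]
e1, e2 = 1, 6   # token positions of the entities "boy" and "cafeteria"
max_len = 10    # hypothetical maximum sentence length

p_e1 = relative_positions(len(tokens), e1, max_len)
p_e2 = relative_positions(len(tokens), e2, max_len)
print(p_e1)  # [8, 9, 10, 11, 12, 13, 14]
print(p_e2)  # [3, 4, 5, 6, 7, 8, 9]
```

Each index would then select one row of a position embedding matrix, giving every word two position vectors, one per entity.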
3.2 Multi-Head Attention Layer
Although this paper uses non-fixed word vectors in the input layer, we apply the Multi-Head Attention (MHA) mechanism to the output vectors of the input layer to help the model further understand the deep semantic information in the sentences and to address the problem of long-term dependencies. MHA is a special kind of self-attention mechanism [17, 19], in which a symmetric similarity matrix of the sequence can be constructed from the sequence of word vectors produced by the input layer.
As shown in Figure 2, given a query Q, a key K, and a value V, the multi-head attention module executes the attention r times. The calculation process uses the following equations (1–3):
Figure 2. A sample of Multi-Head Attention [17].
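$$\mathrm{MultiHead}(Q, K, V) = W^{M}\,\mathrm{Concat}[\mathrm{head}_1; \ldots; \mathrm{head}_r] \tag{1}$$
$$\text{where } \mathrm{head}_i = \mathrm{Attention}(W_i^{Q} Q,\; W_i^{K} K,\; W_i^{V} V) \tag{2}$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_w}}\right) V \tag{3}$$
Here, $W^{M} \in \mathbb{R}^{d_w \times d_w}$ is the trainable parameter applied after the scaled dot-product attention outputs of the heads are calculated and connected in series, and $W_i^{Q} \in \mathbb{R}^{d_w \times d_w/r}$, $W_i^{K} \in \mathbb{R}^{d_w \times d_w/r}$, $W_i^{V} \in \mathbb{R}^{d_w \times d_w/r}$ are the trainable query, key, and value parameters of the i-th head, respectively [17].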
The inputs Q, K, and V are all equal to the word embedding sequence {x1, x2, …, xn} in the multi-head attention [17]. The output of the MHA self-attention is a sequence of features that carry information about the context of the input sentence.
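The multi-head self-attention computation in Equations (1–3) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names (`matmul`, `attention`, `multi_head`, `rand_matrix`), the random weights, and the sizes (5 tokens, width 8, 4 heads) are hypothetical, and the code stores token vectors as rows, so projections are written as `x W` rather than `W x`. The scores are scaled by the square root of dw, following Equation (3).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, d_w):
    """softmax(Q K^T / sqrt(d_w)) V, with the scaling written in Eq. (3)."""
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_w) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(x, heads, w_m, d_w):
    """Self-attention with Q = K = V = x: run each head, concatenate, project."""
    outs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv), d_w)
            for wq, wk, wv in heads]
    concat = [[val for o in outs for val in o[i]] for i in range(len(x))]
    return matmul(concat, w_m)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d_w, r = 5, 8, 4  # hypothetical sizes
x = rand_matrix(n, d_w, rng)
heads = [tuple(rand_matrix(d_w, d_w // r, rng) for _ in range(3)) for _ in range(r)]
w_m = rand_matrix(d_w, d_w, rng)
m = multi_head(x, heads, w_m, d_w)
print(len(m), len(m[0]))  # 5 8
```

The output keeps one dw-dimensional vector per input word, which is what the next layer consumes.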
3.3 Bi-GRU Network
The Bi-GRU network layer is used to obtain semantic information in sentences from the output sequence of the MHA self-attention layer. As shown in Figure 3, the GRU simplifies the LSTM by retaining only two gate operations, an update gate and a reset gate; its units therefore have fewer parameters and converge faster than LSTM units.
Figure 3. The GRU unit.
For simplicity, the GRU unit's processing of $m_t$ is written in this paper as GRU($m_t$). Therefore, the equations (4–6) for calculating the contextualized word representation are as follows:
$$\overrightarrow{h_t} = \overrightarrow{\mathrm{GRU}}(m_t) \tag{4}$$
$$\overleftarrow{h_t} = \overleftarrow{\mathrm{GRU}}(m_t) \tag{5}$$
$$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}] \tag{6}$$
The input M resulting from the MHA self-attention layer is fed into the Bi-GRU network step by step. To use both past and future feature information at a given time step, we concatenate the hidden state $\overrightarrow{h_t} \in \mathbb{R}^{d_h}$ of the forward GRU network with the hidden state $\overleftarrow{h_t} \in \mathbb{R}^{d_h}$ of the backward GRU network at each step.
Here, dh denotes the dimension of the hidden state of a GRU unit, {h1, h2, …, hn} denotes the hidden state vectors of the words, and the arrow represents the direction of the GRU unit.
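The bidirectional pass in Equations (4–6) can be sketched as below. The paper does not spell out the gate equations, so the cell uses one common GRU formulation (update gate z, reset gate r, tanh candidate) as an assumption; the parameter names, the Gaussian initialisation, and the sizes are also hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(w, u, b, x, h):
    """Compute W x + U h + b for one gate (w: d_h x d_in, u: d_h x d_h)."""
    return [sum(wi * xi for wi, xi in zip(w_row, x))
            + sum(ui * hi for ui, hi in zip(u_row, h)) + bi
            for w_row, u_row, bi in zip(w, u, b)]


def gru_step(p, x, h):
    """One GRU step: update gate z, reset gate r, candidate state."""
    z = [sigmoid(a) for a in affine(p["Wz"], p["Uz"], p["bz"], x, h)]
    r = [sigmoid(a) for a in affine(p["Wr"], p["Ur"], p["br"], x, h)]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in affine(p["Wh"], p["Uh"], p["bh"], x, reset_h)]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p, xs, d_h):
    """Run a GRU over the sequence xs, starting from a zero hidden state."""
    h, states = [0.0] * d_h, []
    for x in xs:
        h = gru_step(p, x, h)
        states.append(h)
    return states


def bi_gru(p_fwd, p_bwd, xs, d_h):
    """Eqs. (4)-(6): forward pass, backward pass, then per-step concatenation."""
    fwd = run_gru(p_fwd, xs, d_h)
    bwd = run_gru(p_bwd, xs[::-1], d_h)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def init_params(d_in, d_h, rng):
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for gate in "zrh":
        p["W" + gate] = mat(d_h, d_in)
        p["U" + gate] = mat(d_h, d_h)
        p["b" + gate] = [0.0] * d_h
    return p


rng = random.Random(0)
d_in, d_h, n = 4, 3, 6  # hypothetical sizes
m_seq = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(n)]
p_fwd, p_bwd = init_params(d_in, d_h, rng), init_params(d_in, d_h, rng)
h_seq = bi_gru(p_fwd, p_bwd, m_seq, d_h)
print(len(h_seq), len(h_seq[0]))  # 6 6
```

Each output vector has 2·dh entries: the first dh come from the forward GRU and depend only on earlier words, while the last dh come from the backward GRU and depend only on later words.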
3.4 Keywords Attention based on CRF
Although attention mechanisms have achieved state-of-the-art results in a variety of NLP tasks, most of them do not fully exploit the keyword information in the sentences. Keywords usually refer to words that are important for solving relation extraction tasks, so the performance of the models would improve if information about these keywords could be exploited.
The goal of the attention mechanism with keywords proposed in this paper is to assign more reasonable weights to the hidden layer vectors, where the attention weights are a set of scalars used in a linear combination. A more reasonable weight assignment means that the model pays more attention to the more important words in the sentence than to the other words, and all the weights in this attention mechanism with keywords take values between 0 and 1.
However, the traditional attention mechanisms and the proposed model calculate the weights differently. In particular, the proposed model defines a state variable z for each word in the sentence: z = 0 means that the corresponding word is irrelevant to the relation classification of this sentence, and z = 1 means that it is relevant. Thus, each sentence input to the model has a corresponding sequence of z. From the above description, the expected value N of the hidden states, weighted by the probability that each corresponding word is selected, is calculated as in Equation (7):
$$N = \sum_{i=1}^{n} p(z_i = 1 \mid H)\, h_i \tag{7}$$
In order to calculate p(zi = 1|H), the CRF is introduced here to calculate the sequence of weights for the hidden sequence H = {h1, h2, …, hn}, where H represents the input sequence and hi represents the hidden output of the GRU layer for the i-th word in the sentence. The CRF models the transition probabilities between adjacent states when computing the conditional probability of a state sequence.
The linear-chain CRF defines the conditional probability p(z|H) given H with the following definitions (8–9):
$$p(z \mid H) = \frac{1}{Z(H)} \prod_{c \in C} \psi(z_c, H) \tag{8}$$
$$Z(H) = \sum_{z' \in \mathcal{Z}} \prod_{c \in C} \psi(z'_c, H) \tag{9}$$
Here, $\mathcal{Z}$ is the set of state sequences, Z(H) denotes the normalization constant, $z_c$ is the subset of z given by an individual clique c, and $\psi(z_c, H)$ is the potential function of this clique. It is defined by the following equation (10):
$$\prod_{c \in C} \psi(z_c, H) = \prod_{i=1}^{n} \psi_1(z_i, H) \prod_{i=1}^{n-1} \psi_2(z_i, z_{i+1}) \tag{10}$$
For feature extraction, the feature extractor uses two types of feature functions: the vertex feature function $\psi_1(z_i, H)$ and the edge feature function $\psi_2(z_i, z_{i+1})$. $\psi_1$ represents the mapping from the GRU output h to the state variable z, and $\psi_2$ models the transition between two state variables at adjacent time steps. Their definitions are given in the following equations (11–13), respectively:
$$\psi_1(z_i, H) = \exp\left(W_H F_1 + W_E F_2 + b\right) \tag{11}$$
$$F_1 = [h_i; p_i^{e_1}; p_i^{e_2}]; \quad F_2 = [h^{e_1}; t_1; h^{e_2}; t_2] \tag{12}$$
$$\psi_2(z_i, z_{i+1}) = \exp\left(W^{t}_{z_i, z_{i+1}}\right) \tag{13}$$
Here, WH and WE are trainable parameters, and b is a trainable bias term. They calculate the contextual information as a feature score for each state variable, which takes advantage of the entity position features $p_i^{e_1}, p_i^{e_2}$ as well as the keyword feature embedding vectors (the entity pair hidden similarity features t1, t2 and the entity pair features $h^{e_1}, h^{e_2}$).
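To make the role of the marginal distribution concrete, the sketch below runs the forward-backward algorithm on a binary linear-chain CRF and then forms the expected hidden state N of Equation (7). It is an illustration under assumptions: the potentials are passed in as precomputed log-scores (`unary[i][z]` stands in for log ψ1 and `trans[a][b]` for log ψ2), and all function names and numbers are hypothetical rather than taken from the paper.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_marginals(unary, trans):
    """Return p(z_i = 1 | H) for every position of a binary linear-chain CRF.

    unary[i][z] is the log vertex potential of state z at position i;
    trans[a][b] is the log edge potential for the transition a -> b.
    """
    n = len(unary)
    # Forward pass: alpha[i][b] sums all prefixes that end in state b.
    alpha = [list(unary[0])]
    for i in range(1, n):
        alpha.append([unary[i][b]
                      + logsumexp([alpha[i - 1][a] + trans[a][b] for a in (0, 1)])
                      for b in (0, 1)])
    # Backward pass: beta[i][a] sums all suffixes that follow state a.
    beta = [[0.0, 0.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [logsumexp([trans[a][b] + unary[i + 1][b] + beta[i + 1][b]
                              for b in (0, 1)])
                   for a in (0, 1)]
    log_z = logsumexp(alpha[-1])  # log of the normalization constant Z(H)
    return [math.exp(alpha[i][1] + beta[i][1] - log_z) for i in range(n)]


def expected_hidden(marginals, hidden):
    """Eq. (7): N = sum_i p(z_i = 1 | H) * h_i."""
    dim = len(hidden[0])
    return [sum(p * h[k] for p, h in zip(marginals, hidden)) for k in range(dim)]


unary = [[0.0, 1.0], [0.5, -0.5], [0.0, 0.0]]   # hypothetical log psi_1 scores
trans = [[0.0, 0.0], [0.0, 0.0]]                # hypothetical log psi_2 scores
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical Bi-GRU outputs
weights = crf_marginals(unary, trans)
n_vec = expected_hidden(weights, hidden)
print([round(w, 3) for w in weights])  # [0.731, 0.269, 0.5]
```

With all transition scores equal, each marginal reduces to a sigmoid of the difference between the two vertex scores; non-uniform transition scores let neighbouring words influence each other, which is how contiguous segments can receive high weights together.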
For the hidden vectors output for the words by the Bi-GRU layer, the CRF keyword attention mechanism performs soft selection by assigning higher weights to the words in the sentence that are more relevant to the classification. The processing of a sentence by the CRF keywords attention mechanism is shown in Figure 4. The CRF keyword attention in the figure assigns different weights to each word of the example sentence “The boy ran into the school cafeteria”. In addition to the two entity words “boy” and “cafeteria”, “into” was also assigned a higher weight than the other words in the sentence, because it is the word associated with the relation classification.
Figure 4. CRF keywords attention mechanism architecture shown with an example sentence “The boy ran into the school cafeteria”.
Entity position feature: The attention mechanism with keywords proposed in this paper not only uses word embedding features but also incorporates position embedding features.
In order to represent contextual information as well as the relative position features of the entities $p_i^{e_1}, p_i^{e_2}$, this paper concatenates them with the output of the corresponding hidden layer hi, as shown by F1 in Equation (12). There is a definition such as [formula illegible in source].
Position vectors are similar to word embeddings in that a relative position scalar is transformed into a feature embedding vector by a lookup in the embedding matrix $W^{pos} \in \mathbb{R}^{(2L-1) \times d_p}$, where L is the maximum sentence length and dp is the dimension of the position vector.
Entity hidden similarity features: In this paper, entity hidden similarity features are extracted as entity features to replace the traditional entity feature extraction method, thus avoiding the use of traditional NLP tools. The calculation process is defined in Equations (14–15).
$$a_i^{j} = \frac{\exp\left((h^{e_j})^{\top} c_i\right)}{\sum_{k=1}^{K} \exp\left((h^{e_j})^{\top} c_k\right)} \tag{14}$$
$$t_j = \sum_{i=1}^{K} a_i^{j} c_i, \quad j \in \{1, 2\} \tag{15}$$
In this paper, entities are categorized according to their similarity to their hidden vectors. $C \in \mathbb{R}^{2d_h \times K}$ denotes the latent vectors constructed to represent the classes of similar entities, where K is a hyperparameter representing the number of classes into which entities are classified by their hidden similarity.
The j-th entity hidden similarity feature $t_j$ is calculated by weighting the class vectors c by their similarity to the hidden layer output $h^{e_j}$ of the j-th entity.
Entity features are constructed by concatenating the hidden states corresponding to the entity positions and the latent type representations of the entity pair, shown as F2 in Equation (12).
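Equations (14–15) amount to a softmax over K latent class vectors followed by a weighted sum. The sketch below illustrates this for one entity; the function name `latent_type`, the class vectors, and the entity hidden state are hypothetical values chosen only to make the behaviour easy to follow.

```python
import math


def latent_type(h_e, classes):
    """Return (a, t) for one entity.

    h_e:     hidden state of the entity (length 2 * d_h).
    classes: list of K latent class vectors c_i of the same length.
    a:       similarity weights over the K classes, as in Eq. (14).
    t:       similarity-weighted sum of the class vectors, as in Eq. (15).
    """
    scores = [sum(x * y for x, y in zip(h_e, c_i)) for c_i in classes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    a = [e / total for e in exps]
    t = [sum(a_i * c_i[k] for a_i, c_i in zip(a, classes))
         for k in range(len(h_e))]
    return a, t


classes = [[1.0, 0.0], [0.0, 1.0]]   # K = 2 hypothetical class vectors
h_e1 = [0.0, 0.0]                    # hypothetical entity hidden state
a, t1 = latent_type(h_e1, classes)
print(a, t1)  # [0.5, 0.5] [0.5, 0.5]
```

An entity whose hidden state aligns strongly with one class vector receives a latent type close to that vector, so entities with similar hidden states end up with similar type features without any external tagger.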
3.5 Classification Layer
To compute the output probability distribution p over the relation labels, a softmax layer is added after the keyword attention layer, as shown in Equation (16).
$$p(y \mid N) = \mathrm{softmax}(W_y N + b_y) \tag{16}$$
Here, |R| is the number of relation categories, $b_y \in \mathbb{R}^{|R|}$ is a bias term, and $W_y$ maps the expected value of the hidden state N to the feature scores of the relation labels.
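A minimal sketch of the classification step in Equation (16) is shown below. The weight matrix, bias, and input vector are hypothetical toy values with three relation labels; only the softmax-over-affine-scores structure comes from the text.

```python
import math


def classify(n_vec, w_y, b_y):
    """Return softmax(W_y N + b_y) as a list of label probabilities."""
    logits = [sum(w * x for w, x in zip(row, n_vec)) + b
              for row, b in zip(w_y, b_y)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


n_vec = [1.0, 2.0]                           # hypothetical expected hidden state N
w_y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # hypothetical |R| x dim weights
b_y = [0.0, 0.0, 0.0]                        # hypothetical bias
probs = classify(n_vec, w_y, b_y)
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(predicted)  # 1
```

The predicted relation is the label with the highest probability; during training, the probability assigned to the gold label feeds the loss described in the next section.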
3.6 Training
The proposed model with keywords attention is trained by minimizing the cross-entropy loss of relation extraction. This loss function is defined in Equation (17).
$$L' = -\sum_{i=1}^{|D|} \log p\left(y^{(i)} \mid S^{(i)}, \theta\right) \tag{17}$$
where |D| is the size of the training dataset and $(S^{(i)}, y^{(i)})$ is the i-th sample in the dataset. The AdaDelta optimizer is used in this paper to minimize the loss with respect to the parameters θ.
To prevent overfitting, L2 regularisation is added to the loss function, where λ1 and λ2 are the hyperparameters of the regularisation. The second regularizer attempts to compel the model to attend to a small number of significant words and to return a sparse weight distribution. The resulting objective function L is shown in Equation (18).
$$L = L' + \lambda_1 \lVert \theta \rVert_2^2 + \lambda_2 \sum_{i=1}^{n} p(z_i = 1 \mid H) \tag{18}$$
4. EXPERIMENTS
4.1 Dataset and Metric
We evaluate the model on the SemEval-2010 Task 8 dataset, a benchmark dataset that is widely used in the field of relation extraction. The dataset has 19 relationship types: nine directed relationships, each counted in both directions, and an undirected Other class, as shown in Table 1.
Table 1. Types of relationships in the dataset and their percentages.
Type                 Number (Training)   Number (Testing)   Rate % (Training)   Rate % (Testing)
Other                1410                454                17.63               16.71
Cause-Effect         1003                328                12.54               12.07
Component-Whole      941                 312                11.76               11.48
Entity-Destination   845                 292                10.56               10.75
Product-Producer     717                 261                8.96                9.61
Entity-Origin        716                 258                8.95                9.50
Member-Collection    690                 233                8.63                8.58
Message-Topic        634                 231                7.92                8.50
Content-Container    540                 192                6.75                7.07
Instrument-Agency    504                 156                6.30                5.74
The dataset includes 10,717 sentences, of which 8,000 samples were used for training and the other 2,717 samples for testing. The evaluation metric used here is the macro-averaged F1 score, which is the official evaluation metric of the dataset.
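For readers unfamiliar with the metric, the sketch below computes a macro-averaged F1 score over relation labels while leaving out the Other class. It is a simplified illustration, not the official SemEval-2010 Task 8 scorer, which additionally takes the direction of relations into account; the function name and the toy labels are hypothetical.

```python
def macro_f1(gold, pred, exclude=("Other",)):
    """Average the per-class F1 over all gold classes not listed in exclude."""
    labels = sorted(set(gold) - set(exclude))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


gold = ["Cause-Effect", "Cause-Effect", "Message-Topic", "Other"]
pred = ["Cause-Effect", "Message-Topic", "Message-Topic", "Other"]
score = macro_f1(gold, pred)
print(round(score, 4))  # 0.6667
```

Because every class contributes equally to the average, rare relation types such as Instrument-Agency weigh as much as frequent ones.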
4.2 Implementation Details
In this paper, a publicly available pre-trained ELMo model is used to initialize the word embeddings in the REKA model, and the other weights in the model are initialized randomly from a zero-mean Gaussian distribution. The relevant hyperparameters are shown in Table 2. Grid search was used to select the regularisation coefficients λ1 and λ2 from the range 0 to 0.2.
Table 2. Hyperparameters of our model.
Hyper-parameter   Description                              Value
dropout rate      Keyword attention layer                  0.5
                  Bi-GRU layer                             0.6
                  Word embedding layer                     0.8
                  Multi-head attention layer               0.8
λ1, λ2            Regularization coefficient               [0, 0.2]
r                 Number of heads                          4
batch size        Size of mini-batch                       50
r1                Initial learning rate                    4
dr                Decay rate of learning                   0.5
da                Size of attention layer                  50
dh                Size of hidden layer                     512
K                 Number of the similar entities' classes  4
dp                Size of position embeddings              50
4.3 Comparison Models
The proposed REKA model is compared with the following benchmark models.
(1) SVM. The SVM [44] is a non-neural model that achieved top results in the SemEval-2010 task, but it uses many handcrafted and computationally intensive features such as WordNet, PropBank, FrameNet, etc.
(2) MV-RNN. The MV-RNN [45] is an SDP-based model; the SDP is a semantic structural feature of sentences. Models with SDP can iterate along the shortest dependency path between entities.
(3) CNN. The CNN [4] is an end-to-end model on the SemEval-2010 task, meaning that the output is produced directly from the raw input. This model builds a convolutional neural network to learn sentence-level feature vectors.
(4) BiLSTM. The BiLSTM [30] obtains sentence-level representations on the SemEval-2010 task with bidirectional long short-term memory networks. It is the classic RNN-based relation extraction model.
(5) DepNN. The DepNN [46] employs an RNN to model subtrees and a CNN to capture features on the shortest path in sentences.
(6) FCM. The FCM [45] decomposes each sentence into sub-structures, then extracts their features separately, and finally merges them in the classification layer.
(7) SDP-LSTM. The SDP-LSTM [5] employs long short-term memory (LSTM) to capture features along the shortest dependency path (SDP). The model is a convolutional neural network for classification by ranking and uses a loss function with pairwise rank.
(8) Purely self-attention [47]. Only a self-attentive coding layer is used, combined with a position-aware encoder, for relation classification.
(9) CASREL BERT [36]. CASREL BERT presents a cascade binary tagging framework (CASREL) and implements a new tagging framework that achieves some performance improvements.
(10) Entity-Aware BERT [48]. The method builds on BERT with structured predictions and an entity-aware self-attentive layer, achieving excellent performance on the SemEval-2010 Task 8 dataset.
4.4 Experimental Results
To evaluate the proposed model further, we chose the RNN-based models from the above models for comparison. The Precision-Recall (PR) curves and complexity analysis of the models are shown in Figure 5.
Figure 5. Precision-Recall curves for different proportions of the dataset (1%, 20%, and 100%, respectively),
compared with RNN methods.
The comparison between the REKA model and other models is shown in Table 3, and the average
precisions (AP) of REKA compared with RNN methods are shown in Table 4.
Table 3. Comparison of results on the SemEval-2010 Task 8 test dataset.
Model                         Additional Features (a)    F1
SVM [44]                      POS, WN, etc.              82.3
MV-RNN [45]                   POS, NER, WN               82.4
CNN [4]                       PE, WN                     82.7
BiLSTM [30]                   None                       82.7
                              + PF, POS, etc.            84.3
DepNN [46]                    DEP                        83.6
FCM [45]                      SDP, NER                   83.0
SDP-LSTM [5]                  SDP                        83.7
Purely Self-Attention [47]    PE                         83.8
CASREL BERT                   PE                         87.5
Entity-Aware BERT [48]        PE                         88.8
REKA Model                    PE                         84.8
Note: a. WN, DEP, SDP, and PE denote WordNet, dependency features, shortest dependency path, and position embeddings, respectively.
Table 4. Average precision score for our model and compared methods (micro-averaged over all classes).
Test data used (a)    BiLSTM    SDP-LSTM    REKA
1%                    0.26      0.47        0.55
20%                   0.60      0.68        0.76
100%                  0.73      0.70        0.81
Note: a. The first column shows how much of the testing data has been used. Performance is on the SemEval-2010 task dataset.
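To make the metric in Table 4 concrete, the sketch below shows one common way to compute a micro-averaged average precision: the one-vs-rest labels and scores of all classes are flattened into a single ranked list, and AP is the sum of precision values at each rank where recall increases. This is an illustration under stated assumptions, not the authors' evaluation script: the function names and the three-class toy scores are hypothetical, and tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over predictions ranked by score.

    labels: 0/1 relevance flags; scores: confidence values (higher is better).
    Assumption: ties between scores are broken by input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            ap += true_pos / rank  # precision at the rank where recall rises
    return ap / total_pos


def micro_average_precision(gold, score_rows, num_classes):
    """Flatten one-vs-rest decisions over all classes, then compute a single AP."""
    flat_labels, flat_scores = [], []
    for gold_class, row in zip(gold, score_rows):
        for c in range(num_classes):
            flat_labels.append(1 if c == gold_class else 0)
            flat_scores.append(row[c])
    return average_precision(flat_labels, flat_scores)


# Toy example: three sentences, three relation classes (made-up scores).
GOLD = [0, 1, 2]
SCORES = [
    [0.70, 0.20, 0.10],
    [0.15, 0.60, 0.25],
    [0.50, 0.32, 0.18],  # the gold class 2 is ranked last for this sentence
]
micro_ap = micro_average_precision(GOLD, SCORES, num_classes=3)
print(round(micro_ap, 4))
```

Micro-averaging weights every (sentence, class) decision equally, so frequent relation classes dominate the score.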
The experimental results show that the proposed REKA model outperforms the conventional models while
using fewer features, but scores lower than Entity-Aware BERT and CASREL BERT. However, the pre-trained
BERT model file is so large that training takes longer and places higher demands on hardware.
As shown in Table 5, we conducted ablation experiments on the development dataset to explore
the contribution of the various components of the keywords-aware attention mechanism to the experimental
results. We gradually stripped individual components from the original model. The F1-score decreased
by 0.2 when the position embedding component was removed. Multi-head attention (MHA), pre-trained
ELMo word embeddings, and entity hidden similarity features contribute 0.5, 1.2, and 0.8 F1 points
to the model, respectively. In particular, the keywords-aware attention accounts for a 2.3-point
improvement in F1. Therefore, the experimental results demonstrate that these components
contribute to the model in a complementary way rather than working individually, and the combination
of all components achieves an F1 score of 84.6.
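The per-component contributions quoted above follow from the cumulative Dev F1 values reported in Table 5: each contribution is the difference between two consecutive rows. The short sketch below reproduces that arithmetic. The variable and function names are illustrative only, and the numbers are copied from Table 5.

```python
# Dev F1 after cumulatively removing each component (values from Table 5).
ABLATION = [
    ("Our model", 84.6),
    ("Position embedding", 84.4),
    ("Multi-head attention", 83.9),
    ("Pre-trained ELMo word embeddings", 82.7),
    ("Entity hidden similarity features", 81.9),
    ("Keyword-aware attention", 79.6),
]


def incremental_drops(rows):
    """F1 lost when each component is removed on top of the previous removals."""
    drops = {}
    for (_, prev_f1), (name, f1) in zip(rows, rows[1:]):
        drops[name] = round(prev_f1 - f1, 1)  # round away float noise
    return drops


drops = incremental_drops(ABLATION)
total_drop = round(ABLATION[0][1] - ABLATION[-1][1], 1)

for name, drop in drops.items():
    print(f"{name}: -{drop}")
print(f"Total: -{total_drop}")
```

Because the removals are cumulative, each figure measures a component's contribution given that the components listed before it have already been removed, not its contribution in isolation.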
Table 5. The effect of components on the F1-score of the model.
Model                                   Dev F1
Our model                               84.6
– Position embedding                    84.4
– Multi-head attention                  83.9
– Pre-trained ELMo word embeddings      82.7
– Entity hidden similarity features     81.9
– Keyword-aware attention               79.6
5. Conclusion
In this paper, we propose a novel Bi-GRU network model based on an attention mechanism with
keywords for the task of RE on the SemEval-2010 task dataset. This model adequately extracts the features
available in the dataset through the keyword attention mechanism and achieves an F1 score of 84.8
without the use of other NLP tools. In the CRF keyword attention mechanism, we used the similarity
between the hidden-layer output vectors of the entity words, together with the relative position
feature vectors between the entity words, to calculate the marginal distribution for each word, which is
chosen as the attention weight. In future research, we will study attention mechanisms that can
better extract key information from sentences, and we plan to apply them to identifying
relationships among several entities.
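As a rough illustration of how a per-word marginal distribution can be read off a linear-chain CRF and used as an attention weight, the sketch below runs the forward-backward algorithm in log space. It makes several assumptions that go beyond the text: each word has a binary latent state (0 = not a keyword, 1 = keyword), and the unary and transition scores are made-up numbers standing in for the entity-pair, similarity, and hidden-vector features of the model. It is not the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_marginals(unary, transition):
    """Per-position state marginals of a linear-chain CRF.

    unary[t][s]: score of state s at position t.
    transition[a][b]: score of moving from state a to state b.
    Returns marginals[t][s] = p(z_t = s | sentence).
    """
    n, k = len(unary), len(transition)
    alpha = [[0.0] * k for _ in range(n)]  # forward log-messages
    beta = [[0.0] * k for _ in range(n)]   # backward log-messages (last row stays log 1 = 0)
    alpha[0] = list(unary[0])
    for t in range(1, n):
        for s in range(k):
            alpha[t][s] = unary[t][s] + logsumexp(
                [alpha[t - 1][p] + transition[p][s] for p in range(k)])
    for t in range(n - 2, -1, -1):
        for s in range(k):
            beta[t][s] = logsumexp(
                [transition[s][q] + unary[t + 1][q] + beta[t + 1][q]
                 for q in range(k)])
    log_z = logsumexp(alpha[n - 1])        # log partition function
    return [[math.exp(alpha[t][s] + beta[t][s] - log_z) for s in range(k)]
            for t in range(n)]


# Hypothetical scores for a four-word sentence; state 1 means "keyword".
UNARY = [[0.0, -1.0], [0.0, 2.0], [0.0, 0.5], [0.0, -2.0]]
TRANSITION = [[0.5, 0.0], [0.0, 0.5]]  # mild preference for staying in the same state

marginals = crf_marginals(UNARY, TRANSITION)
attention = [row[1] for row in marginals]  # p(word is a keyword) as its weight
print([round(a, 3) for a in attention])
```

Using the marginal of the keyword state as the weight lets a word's attention depend on its neighbours through the transition scores, rather than on its own score alone.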
Acknowledgements
This work is supported by the Science and Technology Project of State Grid Hubei Electric Power Co., Ltd.
(149).2020
Author Contributions
Yuanyuan Zhang (E-mail: 16823650@qq.com, ORCID: 0000-0002-5353-2989) participated in the
design of the proposed model and the writing of the manuscript.
Yu Chen (E-mail: 1148848330@qq.com, ORCID: 0000-0001-7316-3570) participated in the coding,
the experiments and analysis, and the writing of the manuscript.
Shengkang Yu (E-mail: 12052033@qq.com, ORCID: 0000-0001-6374-3395) participated in part
of the experiments and analysis.
Xiaoqin Gu (E-mail: 1564785699@qq.com, ORCID: 0000-0001-6308-8474) participated in part
of the experiments and analysis.
Mengqiong Song (E-mail: 297365728@qq.com, ORCID: 0000-0002-2816-5670) participated in
part of the experiments and analysis.
Yu Peng (E-mail: 1039079148@qq.com, ORCID: 0000-0002-5353-2989) participated in the revision
of the manuscript.
Jianxia Chen (E-mail: 1607447166@qq.com, ORCID: 0000-0001-6662-1895) participated in the
model design, problem analysis, and the writing and revision of the manuscript.
Qi Liu (E-mail: 260129443@qq.com, ORCID: 0000-0003-1066-898X) participated in the writing and
revision of the manuscript.
References
[1] Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J.: Freebase: a collaboratively created graph
database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international
conference on Management of data, pp. 1247–1250 (2008, June)
[2] Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z.: Dbpedia: A nucleus for a web of open
data. In: The semantic web, pp. 722–735 (2007). Springer, Berlin, Heidelberg
[3] Pawar, S., Palshikar, G.K., & Bhattacharyya, P.: Relation extraction: A survey. arXiv preprint arXiv:1712.05191
(2017)
[4] Zeng, D., Liu, K., Lai, S., Zhou, G., & Zhao, J.: Relation classification via convolutional deep neural network.
In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: Technical
papers, pp. 2335–2344 (2014, August)
[5] Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., & Jin, Z.: Classifying relations via long short term memory networks
along shortest dependency paths. In: Proceedings of the 2015 conference on empirical methods in natural
language processing, pp. 1785–1794 (2015, September)
[6] Liu, C., Sun, W., Chao, W., & Che, W.: Convolution neural network for relation extraction. In: International
conference on advanced data mining and applications, pp. 231–242 (2013, December). Springer, Berlin,
Heidelberg
[7] Nguyen, T.H., & Grishman, R.: Relation extraction: Perspective from convolutional neural networks. In:
Proceedings of the 1st workshop on vector space modeling for natural language processing, pp. 39–48
(2015, June)
[8] Santos, C.N.D., Xiang, B., & Zhou, B.: Classifying relations by ranking with convolutional neural networks.
arXiv preprint arXiv:1504.06580 (2015)
[9] Chen, Y.: Convolutional neural network for sentence classification (Master’s thesis, University of Waterloo)
(2015)
[10] Kalchbrenner, N., Grefenstette, E., & Blunsom, P.: A convolutional neural network for modelling sentences.
arXiv preprint arXiv:1404.2188 (2014)
[11] Elman, J.L.: Distributed representations, simple recurrent networks, and grammatical structure. Machine
Learning 7(2), 195–225 (1991)
[12] Zhang, D., & Wang, D.: Relation classification via recurrent neural network. arXiv preprint arXiv:1508.01006
(2015)
[13] Zhang, S., Zheng, D., Hu, X., & Yang, M.: Bidirectional long short-term memory networks for relation
classification. In: Proceedings of the 29th Pacific Asia conference on language, information and
computation, pp. 73–78 (2015, October)
[14] Sundermeyer, M., Schlüter, R., & Ney, H.: LSTM neural networks for language modeling. In: Thirteenth
annual conference of the international speech communication association (2012)
[15] Hochreiter, S., & Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
[16] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y.: Empirical evaluation of gated recurrent neural networks on
sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
[17] Wang, H., Qin, K., Zakari, R.Y., Lu, G., & Yin, J.: Deep neural network-based relation extraction: An
overview. Neural Computing and Applications, 1–21 (2022)
[18] Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., & Jin, Z.: Classifying relations via long short term memory
networks along shortest dependency paths. In: Proceedings of the 2015 conference on empirical methods
in natural language processing, pp. 1785–1794 (2015, September)
[19] Zhang, Y., Zhong, V., Chen, D., Angeli, G., & Manning, C.D.: Position-aware attention and supervised data
improve slot filling. In: Conference on Empirical Methods in Natural Language Processing (2017)
[20] Zhang, Y., Qi, P., & Manning, C.D.: Graph convolution over pruned dependency trees improves relation
extraction. arXiv preprint arXiv:1809.10185 (2018)
[21] Liu, T., Zhang, X., Zhou, W., & Jia, W.: Neural relation extraction via inner-sentence noise reduction and
transfer learning. arXiv preprint arXiv:1808.06738 (2018)
[22] Lee, J., Seo, S., & Choi, Y.S.: Semantic relation classification via bidirectional lstm networks with entity-aware
attention using latent entity typing. Symmetry 11(6), 785 (2019)
[23] Wang, H., Qin, K., Lu, G., Luo, G., & Liu, G.: Direction-sensitive relation extraction using bi-sdp attention
model. Knowledge-Based Systems 198, 105928 (2020)
[24] Yu, B., Zhang, Z., Liu, T., Wang, B., Li, S., & Li, Q.: Beyond Word Attention: Using Segment Attention in
Neural Relation Extraction. In: IJCAI, pp. 5401–5407 (2019, August)
[25] Aydar, M., Bozal, O., & Ozbay, F.: Neural relation extraction: a survey. arXiv e-prints, arXiv-2007 (2020)
[26] Socher, R., Huval, B., Manning, C.D., & Ng, A.Y.: Semantic compositionality through recursive matrix-vector
spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing
and computational natural language learning, pp. 1201–1211 (2012, July)
[27] Zeng, D., Liu, K., Chen, Y., & Zhao, J.: Distant supervision for relation extraction via piecewise convolutional
neural networks. In: Proceedings of the 2015 conference on empirical methods in natural language
processing, pp. 1753–1762 (2015, September)
[28] Zhang, D., & Wang, D.: Relation classification via recurrent neural network. arXiv preprint arXiv:1508.01006
(2015)
[29] Xu, Y., Jia, R., Mou, L., Li, G., Chen, Y., Lu, Y., & Jin, Z.: Improved relation classification by deep recurrent
neural networks with data augmentation. arXiv preprint arXiv:1601.03651 (2016)
[30] Zhang, S., Zheng, D., Hu, X., & Yang, M.: Bidirectional long short-term memory networks for relation
classification. In: Proceedings of the 29th Pacific Asia conference on language, information and
computation, pp. 73–78 (2015, October)
[31] Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H., & Xu, B.: Attention-based bidirectional long short-term
memory networks for relation classification. In: Proceedings of the 54th annual meeting of the association
for computational linguistics (Volume 2: Short papers), pp. 207–212 (2016, August)
[32] Xiao, M., & Liu, C.: Semantic relation classification via hierarchical recurrent neural network with attention.
In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics:
Technical Papers, pp. 1254–1263 (2016, December)
[33] Qin, P., Xu, W., & Guo, J.: Designing an adaptive attention mechanism for relation classification. In: 2017
International Joint Conference on Neural Networks (IJCNN), pp. 4356–4362 (2017, May). IEEE
[34] Zhang, C., Cui, C., Gao, S., Nie, X., Xu, W., Yang, L., … & Yin, Y.: Multi-gram CNN-based self-attention model
for relation classification. IEEE Access 7, 5343–5357 (2018)
[35] Zhang, C., Cui, C., Gao, S., Nie, X., Xu, W., Yang, L., … & Yin, Y.: Multi-gram CNN-based self-attention model
for relation classification. IEEE Access 7, 5343–5357 (2018)
[36] Wei, Z., Su, J., Wang, Y., Tian, Y., & Chang, Y.: A novel cascade binary tagging framework for relational triple
extraction. arXiv preprint arXiv:1909.03227 (2019)
[37] Mintz, M., Bills, S., Snow, R., & Jurafsky, D.: Distant supervision for relation extraction without labeled data.
In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint
Conference on Natural Language Processing of the AFNLP, pp. 1003–1011 (2009, August)
[38] He, Y., Li, Z., Yang, Q., Chen, Z., Liu, A., Zhao, L., & Zhou, X.: End-to-end relation extraction based on
bootstrapped multi-level distant supervision. World Wide Web 23(5), 2933–2956 (2020)
[39] Han, X., Liu, Z., & Sun, M.: Neural knowledge acquisition via mutual attention between knowledge graph
and text. In: Proceedings of the AAAI Conference on Artificial Intelligence 32(1) (2018, April)
[40] Wang, G., Zhang, W., Wang, R., Zhou, Y., Chen, X., Zhang, W., … & Chen, H.: Label-free distant supervision
for relation extraction via knowledge graph embedding. In: Proceedings of the 2018 conference on empirical
methods in natural language processing, pp. 2246–2255 (2018)
[41] Mikolov, T., Chen, K., Corrado, G., & Dean, J.: Efficient estimation of word representations in vector space.
arXiv preprint arXiv:1301.3781 (2013)
[42] Pennington, J., Socher, R., & Manning, C.D.: Glove: Global vectors for word representation. In: Proceedings
of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543
(2014, October)
[43] Sarzynska-Wawer, J., Wawer, A., Pawlak, A., Szymanowska, J., Stefaniak, I., Jarkiewicz, M., & Okruszek, L.:
Detecting formal thought disorder by deep contextualized word representations. Psychiatry Research 304,
114135 (2021)
[44] Rink, B., & Harabagiu, S.: Utd: Classifying semantic relations by combining lexical and semantic resources.
In: Proceedings of the 5th international workshop on semantic evaluation, pp. 256–259 (2010, July)
[45] Socher, R., Huval, B., Manning, C.D., & Ng, A.Y.: Semantic compositionality through recursive matrix-vector
spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing
and computational natural language learning, pp. 1201–1211 (2012, July)
[46] Liu, Y., Wei, F., Li, S., Ji, H., Zhou, M., & Wang, H.: A dependency-based neural network for relation
classification. arXiv preprint arXiv:1507.04646 (2015)
[47] Bilan, I., & Roth, B.: Position-aware self-attention with relative positional encodings for slot filling. arXiv
preprint arXiv:1807.03052 (2018)
[48] Wang, H., Tan, M., Yu, M., Chang, S., Wang, D., Xu, K., … & Potdar, S.: Extracting multiple-relations in
one-pass with pre-trained transformers. arXiv preprint arXiv:1902.01030 (2019)
Author Biography
Yuanyuan Zhang (1979–), male, received his Ph.D. from Wuhan University and is an
associate professor and senior engineer at the Technical Training Center of State
Grid Hubei Electric Power Co., Ltd. Research direction: intelligent substation
technology and intelligent power grid operation and inspection technology.
E-mail: 16823650@qq.com, ORCID: 0000-0002-5353-2989
Yu Chen (1996–), male, is a graduate student at Hubei University of Technology.
Research direction: artificial intelligence and natural language processing. E-mail: 1148848330@qq.com,
ORCID: 0000-0001-7316-3570
Shengkang Yu (1993–), male, received his master's degree from Huazhong University
of Science and Technology and is a lecturer and intermediate engineer at the Technical
Training Center of State Grid Hubei Electric Power Co., Ltd. Research direction:
fault diagnosis of electrical equipment. E-mail: 120520338@qq.com, ORCID:
0000-0001-6374-3395
Xiaoqin Gu (1973–), female, received her master's degree from Hubei University and is a
lecturer at the Technical Training Center of State Grid Hubei Electric Power Co.,
Ltd. Research direction: power grid operation technology. E-mail:
1564785699@qq.com, ORCID: 0000-0001-6308-8474
Mengqiong Song (1991–), female, received her master's degree from Wuhan University and is an
intermediate engineer at the Technical Training Center of State Grid Hubei
Electric Power Co., Ltd. Research direction: power grid operation technology.
E-mail: 297365728@qq.com, ORCID: 0000-0002-2816-5670
Yu Peng, female, graduated from Wuhan University. Research direction: grid
power electronics. E-mail: 1039079148@qq.com, ORCID: 0000-0002-5353-2989
Jianxia Chen is an associate professor in School of Computer Science at
Hubei University of Technology. She obtained her MS at Huazhong University
of Science & Technology in China. She has worked as a research fellow on
the CCF in China and ACM in USA. Her particular research interests are in
knowledge graph and recommendation systems.
ORCID: 0000-0001-6662-1895
Qi Liu, female, is a graduate student at Hubei University of Technology. Research
direction: artificial intelligence and natural language processing. E-mail: 260129443@qq.com, ORCID:
0000-0003-1066-898X