RESEARCH PAPER - Recherche en IA spécialisée au MIT

RESEARCH PAPER

Relational Topology-based Heterogeneous Network
Embedding for Predicting Drug-Target Interactions

Linlin Zhang1, Chunping Ouyang1,2†, Fuyu Hu1†, Yongbin Liu1,2, Zheng Gao3

1School of Computer, University of South China, Hengyang,Hunan, 421001, Chine

2Hunan Medical Big Data International Sci.&Technologie, Innovation Cooperation Base

3Department of Information and Library Science, Indiana University Bloomington Woodlawn Avenue, IN 47408, Bloomington, America

Mots clés: Link prediction; Heterogeneous information network; Drug-target interaction; Network embedding;

Feature representation

Citation: Zhang, L.L., Ouyang, C.P., Hu, F.Y., et coll.: Relational Topology-based Heterogeneous Network Embedding for Predicting

Drug-Target Interactions. Data Intelligence 5(2), 475-493 (2023). est ce que je: 10.1162/dint_a_00149

Reçu: Dec. 20, 2021; Revised: Apr. 15, 2022; Accepté: May 19, 2022

ABSTRAIT

Predicting interactions between drugs and target proteins has become an essential task in the drug
discovery process. Although the method of validation via wet-lab experiments has become available,
experimental methods for drug-target interaction (DTI) identification remain either time consuming or heavily
dependent on domain expertise. Donc, various computational models have been proposed to predict
possible interactions between drugs and target proteins. Cependant, most prediction methods do not consider
the topological structures characteristics of the relationship. In this paper, we propose a relational topology-
based heterogeneous network embedding method to predict drug-target interactions, abbreviated as RTHNE_
DTI. We first construct a heterogeneous information network based on the interaction between different
types of nodes, to enhance the ability of association discovery by fully considering the topology of the
réseau. Then drug and target protein nodes can be represented by the other types of nodes. According to
the different topological structure of the relationship between the nodes, we divide the relationship in the
heterogeneous network into two categories and model them separately. Extensive experiments on the real-
world drug datasets, RTHNE_DTI produces high efficiency and outperforms other state-of-the-art methods.
RTHNE_DTI can be further used to predict the interaction between unknown interaction drug-target pairs.

Corresponding author: Chunping Ouyang (E-mail: ouyangcp@126.com; ORCID: 0000-0002-2154-0079); Fuyu Hu

†
(E-mail: fyhu1124@163.com; ORCID: 0000-0002-1004-6913).

© 2022 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0)
Licence.

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

1. INTRODUCTION

The prediction of drug-target interactions (DTIs) is the process of uncovering potential drug-target
interactions that are currently undiscovered through information on the structural characteristics of drugs
and targets, as well as known drug-target relationships and drug-drug relationships. Prediction of DTIs is
the key to developing new drugs. It plays an important role in the study of drug toxicity and side effects
and the treatment of diseases. Cependant, traditional methods based on large-scale biological experiments
usually take several years and are often very expensive [1]. Donc, the prediction of DTIs has received
great attention and computer-aided drug discovery has become a trend. In recent years, with the rapid
development of computer technology and the accumulation of large amounts of medical data, computational
solutions have shown feasible and reasonable in discovering new drug-target interactions (DTIs) in the age
of big data, methods such as machine learning and data mining have been widely used to solve various
complex problems in the field of biomedicine [2, 3, 4]. Actuellement, there are three types of prediction
approaches in computational-aided drug discovery, namely Attribute-based calculation methods [5], deep
learning-based [6, 7], and network-based [8, 9] méthodes.

In the past years, several approaches have focused on prediction methods based on similarity calculation
using the attributes of the drug and the target protein. The prediction method of similarity calculation usually
depends on the attributes of the drug and the target protein, tel que [5, 10]. Nevertheless, most of these
methods use the chemical structure and protein sequence of the drug. In public data sets, it is often difficult
to obtain the protein sequence and chemical information of many polymers. Deep learning has advanced
computational modeling of DTI by offering an increased feature extraction power in drug and protein data.
Such as [6, 7] are the prediction models of DTIs by deep learning. Cependant, these deep learning methods
do not consider the network topology information of drugs and target proteins, and the prediction results
are often unsatisfactory. In recent years, network-based approaches have demonstrated great advantages
compared to similarity-based methods. Especially heterogeneous network-based methods achieved good
results in DTI prediction by considering a wide variety of topological information and the complex
interaction relationship of heterogeneous data. Zeng et al. [9] constructed a heterogeneous network that
covers the network profile and attributes information between drugs, target proteins, and diseases. Un
arbitrary-order approximate embedded deep forest method is used to predict DTI. The relationship between
different nodes in a heterogeneous graph is different.

Dans ce document, we study the interaction relationship network between drugs and target proteins, et
classify all relationships in the heterogeneous network into two categories and model them separately
according to the characteristics of the topology of the relationships between nodes. This results in a topology-
based RTHNE_DTI heterogeneous network embedding method that can be used to predict unknown drug-
target protein interactions. Spécifiquement, our main contributions are as follows:

•

476

We proposed a heterogeneous network representation learning method named “RTHNE_DTI” to
predict DTI, it learns the distributed representation of nodes by embedding heterogeneous network
into low-dimensional spaces, which use the network topology information fully. On the other hand,
we apply the method of heterogeneous network representation learning to drug-target interaction
prediction, which achieves a more rapid and effective use of medical data, thereby significantly
improving prediction accuracy.

Data Intelligence

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

•

The traditional heterogeneous network representation learning method uses a uniform model to deal with
all relationships. Cependant, different node-node relationship represents different drug characteristics.
In hence, we divide the relationship in a heterogeneous network into two types: Affiliation relationship
and Peer relationship, and we design different models to represent them, which can better capture
the rich feature information between nodes.
En général, the prediction of drug-target interaction is carried out on the labeled network (Some drug
target relationship pairs with known interactions were added to the training set). Cependant, our model
can also achieve good prediction results on the unlabeled network. This solves the problem of
insufficient drug labeling data and low prediction accuracy.
We conduct different experiments using real drug data set and compare with other predictive models,
and the results show that RTHNE_DTI has the best predictive performance.

2. RELATED WORK

Computational methods in DTIs prediction have gained more attention because carrying out a biochemical
experiment on a large scale is costly and time-consuming. The early computational methods mainly focused
on similarity calculation between the attributes of the drug and the target protein. Then deep learning
methods for computational prediction of DTI have become more popular in recent years. The rapid
development of deep learning provides an effective way to predict DTI, especially for large-scale data
prediction tasks. Mayr et al. [11] compared several deep learning methods with other machine learning
and target prediction methods on large-scale drug discovery datasets and concluded that the deep learning
method has the best prediction performance. Lee et al. [12] predicted DTIs through convolutional neural
réseaux (CNNs) on original protein sequences. In a study called DeepDTA, Ozturk et al. [6] proposed a
deep-learning- based model to predict the binding affinity between drugs and targets, CNNs were mainly
used to model protein sequences and compound 1D representations. These methods are generally dependent
on drug and target attributes.

In recent years, heterogeneous network representation learning has become a hot topic of current
research and has good performance in link prediction [13]. Although heterogeneous network representation
learning methods have been widely used for link prediction in social networks with good results. La plupart
previous studies on networks have been based on homogeneous networks. Spécifiquement, the nodes in the
network are of the same type. With the development of network representation learning, in order to model
the heterogeneity of networks, some have tried heterogeneous network representation learning. Par exemple,
Shang et al. [14] proposed a framework, ESim, which uses random wandering based on matching paths to
generate sequences of nodes to optimize the similarity between multiple points. Fu et al. [15] proposed a
heterogeneous information network representation, HIN2vec, which differs from many previous works
based on skip gram language models in that the core of HIN2V ec is a neural network model that learns
the representation of nodes and relations (meta-paths) in the network. Han et al. [16] proposed an aspect-
level collaborative filtering model based on neural networks. In their model, they extract similarity matrices
of different aspect levels of nodes through different meta-paths and feed these matrices into a deep neural
network designed to learn aspect-level potential factors. These methods are commonly used in social
réseaux, scholar networks, etc..

Data Intelligence

477

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

Donc, prediction methods based on network topology are also widely used in DTI relationship
prediction. There are various networks in practice, such as social networks [17], citation networks [18], et
biological information networks [19]. And some interesting research works on network analysis have
attracted increasing attention. Particularly, link prediction is one of the hot spot tasks of network analysis.
Actuellement, most network-based DTI prediction is based on machine learning [8]. Wang et al. transformed
new DTI prediction problems into a two-layer graphical model named the restricted Boltzmann machine
(RBM). Wan et al. [20] developed a new nonlinear end-to-end learning model, called NeoDTI, lequel
integrates different heterogeneous information of drugs and targets, and learned the representation of drugs
and targets to predict DTIs. cependant, note that these approaches have the disadvantage of treating all node
relationships in the heterogeneous network equally and they may not work when chemical pathways and
protein interactions are unknown. Tableau 1 shows the categorization results.

Donc, we apply a heterogeneous network embedding method to predict DTIs, called RTHNE_DTI.
the biggest advantage of this method is that it can fully take into account the characteristics of different
node relationships in the network and is modeled for that feature.

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Tableau 1. An overview and comparison of related reviews.

Attribute based calculation
méthodes

Network topology-based
méthodes

Deep learning-based
méthodes

Oui. Yamanishi [5]
X. Zheng [10]
H. Öztürk [6]
Y.-B. Wang [7]
UN. Mayr [11]
je. Lee [12]
X. Chen [8]
X. Zeng [9]
F. Wan [20]

√
√
√
√
√
√

3. PROBLEM DEFINITION

√

√
√
√
√

√

√
√
√

In this section, we introduce some basic definitions of heterogeneous network embedding to predict

DTIs.

Definition 1: Heterogeneous Network (HN).
A Heterogeneous Network is defined as a Graph G = (V, E, UN, Ø, oui), where V represents the set of nodes,
E ⊆ V × V represents the set of edges. Ø and y are the type mapping functions of nodes and edges,
respectivement, where Ø = V → N and y: E → R. Here N and R are the type sets of nodes and edges,
respectivement. A = N ∪ R, and while |N| + |R.| > 2, the network is called a heterogeneous network; otherwise
it is a homogeneous network.

478

Data Intelligence

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

Definition 2: Meta-path.
In a heterogeneous network, the meta-path P is a sequence of node types n1, n2, …, nm and edge types

r1, r2, …, rm−1, in the form of:

r
1

r
m

−

= → …→
n
2

n
m

n
1

(1)

Definition 3: Heterogeneous Network Embedding.
Given a heterogeneous network G, the heterogeneous network embedding learns a low-dimensional
d for each vertex v ∈ V by a mapping function f: V → Rd, in which d << |V| is the dimension vector Ev ∈ R of the representation space. 4. THE PROPOSED APPROACH 4.1 Overall Framework As shown in Figure 1, (a) is a heterogeneous network constructed by five types of nodes (drug, target protein, disease, side-effect, action). In this network, there are not only simple relationships, such as D-D but also compound relationships, such as D-P-Di. In (b), we divide all relationships into two categories according to the relationship topology and model them separately. Finally, we apply the model to different scenarios to verify the performance of our model. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u d n / i t / l a r t i c e - p d f / / / / 5 2 4 7 5 2 0 8 9 7 7 2 d n _ a _ 0 0 1 4 9 p d . t / i f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Figure 1. Structure of the RTHNE_DTI model. (a) A heterogeneous network of ﬁ ve classes of nodes (drug, target protein, disease, side effect and action) interacting with each other was constructed. (b) All relationships are classiﬁ ed into PR relationships and AR relationships. Then G(x) and F(x) are calculated respectively according to the characteristics of the two relationships during the training process, and the training weights are updated continuously update to obtain the node embedding results. (c) The embedding results are decoded to predict DTI. Data Intelligence 479 Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target Interactions 4.2 Affiliation Relationship and Peer Relationship In studying the data sets associated with the prediction of drug and target protein interactions, we found that not all relationship pairs had an equal number of nodes of the two types of connections, and some relationship pairs had a significantly different number of nodes of the two types of connections, as shown in figure 2. Our study of DrugBank found that the types of action of proteins are very few, only 47, but the variety of proteins is wide. Hence, their relationship network looks like an action-centered network spreading outward. As shown in Figure 2 (a). However, most of the relationships in the drug data set are like drugs and proteins. The two types of nodes do not differ greatly in number, so they form a well-balanced network. As shown in Figure 2 (b). l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u d n / i t / l a r t i c e - p d f / / / / 5 2 4 7 5 2 0 8 9 7 7 2 d n _ a _ 0 0 1 4 9 p d t . / i f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Figure 2. Different relationship topology. Node relationships in heterogeneous networks are classiﬁ ed into two categories based on the difference topology of nodes in the heterogeneous network. (a) Afﬁ liation relationship with unbalanced distribution of nodes in network relationships; (b) peer relationship with balanced distribution of nodes in network relationships. To fully use the topology characteristics of heterogeneous networks (HNs), we study the topological features of different relationships in a heterogeneous network. In the network, the degree of a node can well reflect the topological structure characteristics of the network [21]. In general, the degree of a node refers to the number of edges associated with the node. In order to explore the difference between the topological structures of different relationships in HN, we used the degree-based measure D(e) for calculation: ( ) D e = 480 max ( dn dn , u v ( min d d , n u ) ) n v (2) Data Intelligence Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target Interactions vnd are the average Where nu, nv represent the node type of nodes u, v in a relation tuple (u, v, e), degrees for nu and nv, respectively. It is worth mentioning that D(e) ≥ 1. Here, a greater D(e) value indicates that the topology of the two types of connected nodes is not identical, where one side is biased to the other? That is, nodes with a high D(e) value show a stronger affiliation relationship (AR) between them, and nodes connected by this relationship have more similar attributes [22]. However, for smaller D(e) values, it shows that the topology of the two types of connected nodes is peer, which we named the peer relationship (PR). und and In order to ensure the accuracy of the results, we compare these relationships from the perspective of sparseness, annotated as S(e), so as to discover the differences in the network structure of different relationships. We define S(e) as follows: S e ( ) = N r × N n v N n u (3) In the above formula, Nr is the number of edges in type r. In addition, vnN are the number of nodes of types nu and nv, respectively. It should be emphasized that in this way, these relationships can also be consistently divided into two categories, PR and AR. unN and Table 2. Statistical analysis of dataset. Nodes Drug Target Disease Side-effect Action Num of Nodes Relations 708 1512 5603 4192 47 D-D D-P P-P D-Di D-S P-D P-A Num of Relations 10036 1923 7360 199214 80164 1596745 2295 Avg.Du Avg.Dv D(e) S(e) Relationship type 14.18 2.72 4.87 281.38 113.23 1056.05 1.52 14.18 1.27 4.87 35.55 19.12 284.98 48.83 1.00 2.14 1.00 7.91 5.92 3.71 32.17 0.02002 0.00180 0.00322 0.05022 0.02701 0.18848 0.03229 PR PR PR PR PR PR AR We conducted a comprehensive analysis of the obtained data according to the above indicators, as shown in Table 2. 4.3 Different models for PR and AR To respect their different characteristics, we need to design different model treatments for them separately. Here, for two nodes connected by a PR relationship, there is a strong interactive relationship, and their topology structure is very similar. The nodes themselves contain rich structural information between two nodes, so we model the PR as a transition between nodes in a low-dimensional vector space. In addition, for relation type AR, Euclidean distance is used as the calculation to measure the proximity of interacting nodes in low-dimensional space. It should be noted that the calculation methods we use for the two relationships are very consistent mathematically [23]. We use the Euclidean distance method for the AR mainly for the following reasons. First, the nodes connected by this relationship share the same Data Intelligence 481 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u d n / i t / l a r t i c e - p d f / / / / 5 2 4 7 5 2 0 8 9 7 7 2 d n _ a _ 0 0 1 4 9 p d . t / i f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target Interactions attributes [24], so the nodes connected by the AR can be directly approached in the vector space, which is consistent with the Euclidean distance optimization [25]. Second, the purpose of the heterogeneous network representation is to preserve the structural characteristics of the high-dimensional network. The Euclidean method satisfies the condition of triangular inequality [26], which ensures that the first-order and second-order similarities of the nodes remain unchanged. Translation-based distance for peer relations. Through the study of Table 2, we found that in the heterogeneous drug network constructed, most of the relationships are peer-to-peer. Specifically, a drug acts on multiple diseases, and a disease can also be treated by multiple drugs. And the number of drug nodes and disease nodes differs very little. Peer relationships show powerful interactions between nodes with peer-to-peer structure. For the calculation of the score function of PR, we first give a PR-type relationship tuple (a, r, b), where r ∈ RPR has a weight of wa,b. Then for the embedding of nodes a and b, we define them as Pa and Pb respectively. In addition, we annotate the embedding of relation r as Qr. The final definition is as follows: ( f a b w P Q P a b a b , + − = ) , r (4) For the relationship tuples (a, r, b) ∈ TPR whose relationship is PR in the heterogeneous network, the margin-based loss function [23] is defined as follows: = L PR ∑ ∑ ∑ ) ) ∈ ∈ r PR a r b T a r b , , , , ( ( ′ ′ PR ∈ ′ T PR max ⎡ ⎣ 0, c + ) ( − ′ f a b f , ( ), a b ⎤ ⎦ (5) In the above formula, TPR represents the positive sample set in the PR triplet, and T’PR is the negative sample set. c > 0 represents a margin hyperparameter.

Euclidean distance for affiliation relations. Only the target protein and its action type belong to the
AR relationship in the heterogeneous network we constructed. Spécifiquement, the types of protein nodes and
action nodes vary greatly in number. The nodes with this relationship can be directly approached in the
vector space, so we use Euclidean distance to calculate the proximity between two nodes. Given a set of
triples (m, je, n) with relationship type AR, where i ∈ Ri represents the action relationship between nodes m
and n. Its weight is defined as wm,n and the form is as follows:

m ng
(

)

w P
m n m
,

−

P.
n

2
2

(6)

Similar to the above formula, Pm and Pn are the embedding of nodes m and n, respectivement. g(m, n) est
to calculate the distance between m and n in a low-dimensional space. To ensure that the nodes connected
by the AR relationship are closer, we minimize g(m, n) as much as possible, therefore we define the margin-
based loss function as:

L
AR

∑ ∑
)
(
∈
∈
r AR m i n T m i n

∑
)

, ,

(

′

∈
′
T
AR

maximum

⎡
⎣

(
g m n

)

− ′
g
(

),
m n

⎤
⎦

(7)

As before, TAR and T’AR are the positive and negative examples in the AR relationship, respectivement.

482

Data Intelligence

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

4.4 Conjunctive Model

We have divided the node relationships of network into two categories based on the node distribution
differences in the heterogeneous networks. Ici, relations with unbalanced node distribution are Affiliation
Relations (AR) and relations with balanced node distribution are Peer Relations (PR). We first initialize all
the node embedding. Then for the node pairs of PR relations, we calculate the loss based on translation
model, and for the node pairs of AR relations, we calculate the loss based on Euclidean Distance. Enfin,
we fuse the two losses and continuously modify the embedding according to the final loss value. Spécifiquement,
we make the model more complete by fusing the loss of the two models to jointly update the weight of
node embedding. The pseudo code is shown in Algorithm 1.

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

5. EXPERIMENTS AND ANALYSIS

5.1 Datasets and Experimental Setup

In this paper, the data set we used to construct the heterogeneous network includes the node type set
V = {drug, cible, diseases, side-effects, action}, the relationship type set R = {drug-drug, drug-target, drug-
diseases, drug-side-effects, target-target, target-diseases, target-action}. The data sources we used are as
follows:

DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed
drug data with comprehensive drug-target interaction, target-action information, and drug-drug interaction
information. We use the DrugBank version 3.0 and DrugBank version 5.1.6. [27].

Data Intelligence

483

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

HPDR (Human Protein Reference Database) contains manually curated scientific information
pertaining to the biology of most human proteins and the data of protein interactions extracted from the
HPRD database Release 9 [28].

CTD (Comparative Toxicogenomic Database) is a public website and research tool that provides four
types of core data: chemical-gene interactions, chemical-disease associations, gene-disease associations,
and chemical-phenotype associations. The drug-disease association and protein-disease association used
in this paper were extracted from CTD [29].

SIDER database contains information about marketed drugs and their adverse reaction records. Dans ce

papier, the drug-side-effects interactions were extracted from SIDER database Version 2 [30].

We obtained data from the above four sources, and after data preprocessing, we finally got 708 drugs,
1, 512 target proteins, 5603 diseases, 47 actions, et 4, 192 side effects. Some descriptive statistics of the
dataset are shown in Table 2.

RTHNE_DTI has three parameters: embedding dimension d, the margin c, and a, we set c = 1, et
a = 0.01. To study the influence of different dimensions on our model, we explored parameter d. Comme indiqué
in figure 3, we can see that when the dimension is 300, the predicted AUC val ue is the highest. So we set
d=100 in the experiment.

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Chiffre 3. Parameter Analysis. Show the dimension value is when the best prediction result achieved.

484

Data Intelligence

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

In this paper, we conducted experiments under four tasks. In order to verify that dividing the relationship
into AR and PR can effectively improve the prediction performance, we conducted two experiments on
(1) the prediction performance of labeled networks based on PR relations only, et (2) the prediction
performance of labeled networks based on all relations. En particulier, it is not possible to predict DTI based
on AR only, because only the relationship between target and action is AR relationship. To verify the
performance of our model on unlabeled networks, we conducted a third experiment, (3) the prediction
performance of unlabeled networks based on all relations. To verify the robustness of our model, nous
performed a fourth experiment, (4) prediction performance based on the other datasets.

About evaluation metrics, we use AUC and AUPR to evaluate the performances of prediction.

5.2 Baseline Methods

DT-Hybrid [31] is a recommended method relying on network-based inference, which is based on

domain knowledge, including drug and target similarity.

BLMNI [32] improves the traditional BLM method and can be used to deal with the new drug and target

candidate problems, and it is called neighbor-based interaction-profile inferring.

HNM [33] combined with the drug target information, the intensity between the drug-disease pair is

calculated by the iterative algorithm on the heterogeneous graph.

MSCMF [10] uses multiple drug similarity matrices and multiple target similarity matrices to project

drugs and targets into a common low-dimensional feature space to predict DTI.

NetLapRLS [34] is a semi-supervised learning method—Laplacian regularized least square (LapRLS),
which use Laplacian Regular Least Squares (LapRLS) to simultaneously use a small amount of available
labeled data and a large amount of unlabeled data to obtain maximum generalization ability from the
chemical structure and genome sequence.

DTINet [35] is a network integration approach that integrates heterogeneous information of drug-target

heterogeneous networks.

RHINE [13] is a heterogeneous information network (HIN) embedding method which using the structural

characteristics of heterogeneous relations.

NeoDTI [20] integrates diverse information from heterogeneous networks and use graph neural network

to learn the representation of drugs and targets automatically.

EEG-DTI [36] propose an “end-to-end” learning framework based on heterogeneous graphical

convolutional networks to learn low-dimensional feature representations of drugs and targets.

Data Intelligence

485

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

5.3 Task 1: Predictive Performance of Labeled Network Based on PR Relationships Only

After analyzing the dataset, as shown in Table 2, we found that only the type of target-action relationship
is AR type, while most of the relationships in the drug-target heterogeneous network are PR type. And since
our task is to predict the interaction relationship between drug and target protein, here we temporarily
disregard the target-action relationship of AR type and only use the relationship of PR data, {drug-drug,
drug-target, drug-disease, drug-side effect, target-target, target-disease} and compare the performance of
our model with the other DTI prediction models.

During the experiment, we used 10% of the drug-target relationship and all other PR relationships as
the training set, and the remaining 90% of the drug-target relationships was held out as the test set.
According to the difference between positive and negative examples, we conducted two different
experiments, the first one in which the ratio between positive and negative samples was set to 1:10, le
other in which all unknown drug-target interacting pairs were considered as negative samples. Since the
EEG-DTI model must consider all negative sample relationship pairs, experiments with AUC (1:10) are not
supported.

The comparison results between our model and other models are shown in Table 3. The AUC scores
obtained by our model in two different scenario prediction experiments are 94.3% et 95.8%, lequel
exceeds the method NeoDTI by 3% et 2% respectivement. Compared to NeoDTI, the embedding dimension
of our method is 300, and NeoDTI is 1, 024.

What needs to be explained here is that in the NeoDTI experiment, in addition to the data mentioned
au-dessus de, the similarity information of the drug structure and the similarity information of the protein sequence
are also used. En outre, NeoDTI is very time consuming and its running time is about 100 times that
of our method.

Tableau 3. Performance evaluation of different models based on PR relations.

Method

AUC (1:10)

AUC (tous)

MSCMF*
DT-Hybrid*
BLMNI*
HNM*
NetLapRLS*
DTINet*
NeoDTI*
EEG-DTI*
RTHNE_DTI*

0.831
0.842
0.855
0.891
0.905
0.919
0.941
–
0.958

0.849
0.833
0.850
0.890
0.895
0.909
0.913
0.831
0.943

5.4 Task 2: Predictive Performance of Labeled Network Based on All Relationships

In this task, we consider the target-action AR-type relationship, modeled individually for the characteristics
of this relationship type, and incorporated it into the model of the PR-type relationship in task1. Nous

486

Data Intelligence

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

compare it with more advanced approaches. As before, we still use the 10% drug-target relationship and
all other relationships as the training set, and the remaining data is used as the test set. For a fair comparison,
we set the embedding dimension d = 100, because the two models run the most efficiently when the
dimensionality is low, and all unknown pairs were targeted as the negative samples for all methods in this
experiment. The results are shown in Table 4.

It can be seen from the results that our model is superior to the other two methods. Ici, the NeoDTI
method also utilizes the similarity information between the drug and the target protein, but the AUC value
of our model is still 9% higher than it. Compared with EEG-DTI, our model fully considers the difference
between AR relationship and PR relationship, and the AUC value is 11% higher than it. Compared with
the RHINE method, our method considers more heterogeneous relationships. In terms of AUPR metrics,
our model also far exceeds the RHINE model and is on par with the NeoDTI model. Ainsi, our results have
better performance in this task. En outre, in this experiment, the AUC value of our method is 96.93%,
which is about 3% higher than the result using only the PR relationship. It proves that the AR type relationship
is also very important to improve the prediction ability.

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Tableau 4. Performance evaluation of different models based on all relations.

Method

NeoDTI*
EEG-DTI*
RHINE
RTHNE_DTI*

AUC

0.883
0.854
0.923
0.958

AUPR

0.288
0.600
0.145
0.264

5.5 Task3: Predictive Performance of Unlabeled Network Based on All Relationships

In the existing DTI prediction methods, the drug and target pairs with known relationships are added to
the training set to train the model. Cependant, we assume whether it is possible to not add the relationship
for prediction in the training set and only use others. In order to verify this conjecture, we conducted an
experiment on task 3.

For task 3, We use all drug-target relationship pairs as test sets. The ratio of positive and negative samples
est 1:10. Through experiments, our model’s AUC and AUPR scores are 92.11% et 63.69%, respectivement.
Donc, we can use their external relationships to predict when we have no clue whether there is an
interaction between a drug and a target.

As shown in Figure 4, because we removed the target-drug interaction relationship in the task 3 experiment,
the AUC value of our model RTHNE_DTI in task 3 (without label) is lower than that in task 2 (with label),
but still much higher than the results of the NeoDTI model and RHINE model in task 2 (with label).

Data Intelligence

487

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Chiffre 4. Performance evaluation of unlabeled network.

In addition, in terms of AUPR metrics, the results of our model RTHNE_DTI in task 3 (without labels)

are the best, much higher than the results of all models in task 2 (with labels).

5.6 Predictive performance based on other datasets

DBLP is an integrated database system of computer English literature with the author as the core of the

research results in the computer field. The details of the DBLP dataset are shown in Table 5.

Tableau 5. Statistical analysis of DBLP dataset.

Nodes

Term(T)
Paper(P.)
Author(UN)
Conference(C)

Num of
Nodes

8811
14376
14475
20

Relations

P-C
A-PC
P-P
D-Di
P-A

Num of
Relations

14376
24495
41794
88683
260605

Avg.tu

Avg.tv

D(e)

S(e)

Relation-
ship type

1.0
2.9
2.8
6.2
18.0

718.8
2089.7
2.9
10.7
29.6

718.8
720.6
1.0
1.7
1.6

0.05
0.08
0.0002
0.007
0.002

AR
AR
PR
PR
PR

From Table 5, we can see that the DBLP dataset contains more PR relations. We respectively predict the
two relationship pairs author-author (A-A) and author-conference (A-C) in this experiment. The result is
shown in Table 6.

488

Data Intelligence

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

Tableau 6. Performance evaluation of different datasets.

Dataset

DBLP(A-A)
DBLP(A-C)
Drug-target dataset (D-T)

AUC

0.924
0.906
0.969

The above experiments demonstrate that our method not only has good performance on drug networks,
but also can achieve good results on scholar networks, and AUC value of our model do not fluctuate much
on different datasets. It shows that our model has good robustness.

6. CONCLUSION AND FUTURE WORK

Accurately predicting the interaction between drugs and targets is important for drug research and
development. Dans ce document, we apply the method of heterogeneous network representation learning to
predict drug-target interactions. We build a heterogeneous network by the rich external relationships
between drugs and target proteins and learn about drug and protein representations through neighboring
nodes. We use data intelligence methods to divide the relationships into two categories: Affiliation relations
and Peer relations, based on the different topologies of the relationships in the heterogeneous network, et
model them separately. By doing this, our model can better capture the topological and semantic information
of drug network in the same time, thus taking shorter time and achieving better results. En outre, le
RTHNE_DTI model plays an important role in the real world. Par exemple, we used RTHNE_DTI to discover
a novel interaction between the drug Acemetacin and the target protein PTGS1, which has been proved to
be correct in Drugbank database. It has proven to provide a powerful and useful tool for the drug discovery
and drug repositioning process. In the future, we will consider the rich domain knowledge of drugs and
proteins based on heterogeneous networks to further enhance the predictive effect of RTHNE_DTI and
validate some of the predictions by wet lab experiments.

ACKNOWLEDGEMENTS

This research was funded by the National Natural Science Foundation of China, numéro de subvention 61402220,
the key program of Scientific Research Fund of Hunan Provincial Education Department, grant number
19A439, the Project supported by the Natural Science Foundation of Hunan Province, Chine, numéro de subvention
2020JJ4525 and grant number 2022JJ30495.

AUTHOR CONTRIBUTION STATEMENT

Ll. Z (zhanglinlin@stu.usc.edu.cn, ORCID: 0000-0002-5035-3571) was jointly responsible for the
investigation of the study, organization of data, conception, analyse, and validation of methods, preparation of
ressources, and preparation of the original manuscript. Cp. Ouyang (ouyangcp@126.com, ORCID: 0000-
0002-2154-0079) was jointly responsible for the investigation of the study, organization of the data, le

Data Intelligence

489

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

conceptualization of the methods, and review and editing of the manuscript, as well as supervision and
project management. Fy. H (fyhu1124@163.com, ORCID: 0000-0002-1004-6913) contributed to the
investigation of the study, the conceptualization of the methods, analyse, validation, and data curation.
Yb. L participated in the model optimization and experimental design and analyzed the results. Z.G
(gao27@indiana.edu) provided important feedback and edited the manuscript.

RÉFÉRENCES

[1] Kapetanovic, JE.: Computer-aided drug discovery and development (caddd): in silico-chemico-biological

approche. Chemico-Biological Interactions 171(2), 165–176 (2008)

[2] Pathak, J., Kiefer, R.C., Chute, C.G.: Mining drug-drug interaction patterns from linked data: A case study for
warfarin, clopidogrel, and simvastatin. Dans: 2013 IEEE International Conference on Bioinformatics and
Biomedicine, pp. 23–30 (2013)

[3] Ding, Y., Tang, J., Guo, F.: Identification of drug-target interactions via multiple information integration.

Information Sciences 418, 546–560 (2017)

[4] D’Souza, S., Prema, K., Balaji, S.: Machine learning models for drug–target interactions: current knowledge

and future directions. Drug Discovery Today 25(4),748–756 (2020)

[5] Yamanishi, Y., Araki, M., Gutteridge, UN., Honda, W., Kanehisa, M.: Prediction of drug–target interaction

net-works from the integration of chemical and genomic spaces. Bioinformatics 24(13), i232–i240, (2008)

[6] Öztürk, H., Özgür, UN., Ozkirimli, E.: Deepdta: deep drug–target binding affinity prediction. Bioinformatics

34(17), i821–i829 (2018)

[7] Wang, Y.-B., You, Z.-H., Lequel, S., Faire, H.-C., Chen, Z.-H., Zheng, K.: A deep learning-based method for
drug-target interaction prediction based on long short-term memory neural network. BMC Medical Informatics
and Decision Making 20(2), 1–9 (2020)

[8] Chen, X., Yan, C.C., Zhang, X., Zhang, X., Dai, F., Yin, J., Zhang, Y.: Drug–target interaction prediction:
databases, web servers and computational models. Briefings in Bioinformatics 17(4), 696–712 (2016)
[9] Zeng, X., Zhu, S., Hou, Y., Zhang, P., Li, L., Li, J., Huang, L.F., Lewis, S.J., Nussinov, R., Cheng F.: Réseau-
based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest.
Bioinformatics 36(9), 2805–2812 (2020)

[10] Zheng, X., Ding, H., Mamitsuka, H., Zhu, S.: Collaborative matrix factorization with multiple similarities for
predicting drug-target interactions. Dans: Proceedings of the 19th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 1025–1033 (2013)

[11] Mayr, UN., Klambauer, G., Unterthiner, T., Steijaert, M., Wegner, J.K., Ceulemans, H., Clevert, D.-A., Hochreiter,
S.: Large-scale comparison of machine learning methods for drug target prediction on chembl. Chemical
Science 9(24), 5441–5451(2018)

[12] Lee, JE., Keum, J., Nam, H.: Deepconv-dti: Prediction of drug-target interactions via deep learning with

convolution on protein sequences. Biologie computationnelle PLoS 15(6), e1007129, (2019)

[13] Lu, Y., Shi, C., Hu, L., Liu, Z.: Relation structure-aware heterogeneous information network embedding. Dans:

Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4456–4463 (2019)

[14] Shang, J., Qu, M., Liu, J., Kaplan, L.M., Han, J., Peng, J.: Meta-path guided embedding for similarity search

in large-scale heterogeneous information networks (2016)

[15] Fu, T.-Y., Lee, W.-C., Lei, Z.: Hin2vec: Explore meta-paths in heterogeneous information networks for
representation learning. Dans: Actes du 2017 ACM on Conference on Information and Knowledge
Management, pp. 1797–1806 (2017)

490

Data Intelligence

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

[16] Han, X., Shi, C., Wang, S., Philip, S.Y., Song, L.: Aspect-level deep collaborative filtering via heterogeneous

information networks. In IJCAI, pp. 3393–3399 (2018)

[17] Muller, E., Peres, R.: The effect of social networks structure on innovation performance: A review and

directions for research. International Journal of Research in Marketing 36(1), 3–19 (2019)

[18] Bu, Y., Huang, Y., Lu, W.: Loops in publication citation networks. Journal of Information Science 46(6),

837–848 (2020)

[19] Jin, S., Zeng, X., Xia, F., Huang, W., Liu, X.: Application of deep learning methods in biological networks.

Briefings in Bioinformatics 22(2), 1902–1917 (2020)

[20] Wan, F., Hong, L., Xiao, UN., Jiang, T., Zeng, J.: Neodti: neural integration of neighbor information from a
heterogeneous network for discovering new drug–target interactions. Bioinformatics 35(1), 104–111 (2019)

[21] Wasserman, S., Faust, K., et coll.: Social network analysis: Methods and Applications 8, (1994)
[22] Faust, K.: Centrality in affiliation networks. Social Networks 19(2), 157–191 (1997)
[23] Bordes, UN., Usunier, N., Garcia-Duran, UN., Weston, J., Yakhnenko, O.: Translating embeddings for modeling
multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795 (2013)
[24] Lequel, J., Leskovec, J.: Community-affiliation graph model for overlapping network community detection.

Dans 2012 IEEE 12th International Conference on Data Mining, pp. 1170–1175 (2012)

[25] Danielsson, P.-E.: Euclidean distance mapping. Computer Graphics and Image Processing 14(3), 227–248

(1980)

[26] Hsieh, C.-K., Lequel, L., Cui, Y., Lin, T.-Y., Belongie, S., Estrin, D.: Collaborative metric learning. In Proceedings

of the 26th international conference on world wide web, pp. 193–201 (2017)

[27] Wishart, D.S., Feunang, Y.D., Guo, A.C., Lo, E.J., Marcu, UN., Grant, J.R., Sajed, T., Johnson, D., Li, C., Sayeeda,
Z.: Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Research 46(D1),
D1074–D1082 (2018)

[28] Keshava Prasad, T., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., Telikicherla, D.,
Raju, B.R., Shafreen, UN. Venugopal: Human protein reference database—2009 update. Nucleic Acids
Research 37(suppl 1), D767–D772 (2009)

[29] Davis, A.P., Murphy, C.G., Johnson, R., Lay, J.M., Lennon-Hopkins, K., Saraceni-Richards, C., Sciaky, D.,
King, B.L., Rosenstein, M.C., Wiegers T.C.: The comparative toxicogenomics database: update 2013.
Nucleicacids Research 41(D1), D1104–D1114 (2013)

[30] Kuhn, M., Campillos, M., Letunic, JE., Jensen, L.J., Bork, P.: A side effect resource to capture phenotypic effects

of drugs. Molecular Systems Biology 6(1), 343 (2010)

[31] Alaimo, S., Pulvirenti, UN., Giugno, R., Ferro, UN.: Drug–target interaction prediction through domain-tuned

network-based inference. Bioinformatics 29(16), 2004–2008 (2013)

[32] Mei, J.-P., Kwoh, C.-K., Lequel, P., Li, X.-L., Zheng, J.: Drug–target interaction prediction by learning from local

information and neighbors. Bioinformatics 29(2), 238–245 (2013)

[33] Wang, W., Lequel, S., Zhang, X., Li, J.: Drug repositioning by integrating target information through a

heterogeneous network model. Bioinformatics 30(20), 2923–2930 (2014)

[34] Xia, Z., Wu, L.-Y., Zhou, X., Wong, S.T.: Semi-supervised drug-protein interaction prediction from

heterogeneous biological spaces. In BMC Systems Biology, p. S6 (2010)

[35] Luo, Y., Zhao, X., Zhou, J., Lequel, J., Zhang, Y., Kuang, W., Peng, J., Chen, L., Zeng, J.: A network integration
approach for drug-target interaction prediction and computational drug repositioning from heterogeneous
information. Communications naturelles 8(1), 1–13 (2017)

[36] J.G., Peng, J., Wang, Y.: An end-to-end heterogeneous graph representation learning-based framework for

drug–target interaction prediction. Briefings in Bioinformatics 22(5), 1–9 (2021)

Data Intelligence

491

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

AUTHOR BIOGRAPHY

Linlin Zhang received her B.S. degree in Software Engineering from the
School of Computer Science, University of South China, Chine, dans 2020. Elle
is pursuing her M.S. degree with a specialization in software engineering
at the University of South China. Her research interests include network
representation learning and drug discovery.
ORCID: 0000-0002-5035-3571

Chunping Ouyang received a Ph.D. degree from the University of Science
& Technology Beijing, Chine, dans 2011. Depuis 2017 à 2018, she was a visiting
scholar at Indiana University Bloomington. She is a professor of computer
science at the University of South China and supervisor of postgraduate. Son
main research interests are semantic web technology, knowledge graphs,
natural language processing, and domain data analysis.
ORCID: 0000-0002-2154-0079

Fuyu Hu received her master’s degree in computer technology from the
School of Computer Science, South China University, Chine, dans 2021. Son
research interests include network representation learning and drug discovery.
She is working on a related project at the Bank of Communications in China.
ORCID: 0000-0002-1004-6913

492

Data Intelligence

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

:
/
/

d
je
r
e
c
t
.

je
t
.

e
d
toi
d
n

t
/

un
r
t
je
c
e
–
p
d

F
/

5
2
4
7
5
2
0
8
9
7
7
2
d
n
_
un
_
0
0
1
4
9
p
d

b
oui
g
toi
e
s
t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Relational Topology-based Heterogeneous Network Embedding for Predicting Drug-Target
Interactions

Yongbin Liu received a Ph.D. degree from the University of Science &
Technology Beijing, Chine, dans 2013. Depuis 2013 à 2015, he was a post-doc
research fellow at Tsinghua University. He is an associate professor at the
University of South China. His research interests include natural language
processing and knowledge engineering.
ORCID: 0000-0002-3369-3101

Zheng Gao is an Applied Scientist at Amazon Alexa AI. He received his
Ph.D. degree in Information Science and minor in Computer Science from
Indiana University Bloomington, advised by Prof. Xiaozhong Liu in 2020. Son
research interests are primarily in the area of Graph Mining and Natural
Language Processing (NLP). Particularly, he is applying deep learning
techniques on the interdisciplinary field therein them to solve Community
Detection, Information Retrieval and Recommendation related tasks. He is
currently working as an applied scientist at Amazon Alexa AI and build NLU
models to handle customer utterances.
ORCID: 0000-0001-7549-033X