Modeling Content and Context with Deep Relational Learning
Maria Leonor Pacheco and Dan Goldwasser
Department of Computer Science
Purdue University
West Lafayette, IN 47907
{pachecog, dgoldwas}@purdue.edu
Abstract
Building models for realistic natural language tasks requires dealing with long texts and accounting for complicated structural dependencies. Neural-symbolic representations have emerged as a way to combine the reasoning capabilities of symbolic methods with the expressiveness of neural networks. However, most of the existing frameworks for combining neural and symbolic representations have been designed for classic relational learning tasks that work over a universe of symbolic entities and relations. In this paper, we present DRAIL, an open-source declarative framework for specifying deep relational models, designed to support a variety of NLP scenarios. Our framework supports easy integration with expressive language encoders, and provides an interface to study the interactions between representation, inference, and learning.
1 Introduction
Understanding natural language interactions in realistic settings requires models that can deal with noisy textual inputs, reason about the dependencies between different textual elements, and leverage the dependencies between textual content and the context from which it emerges. Work in linguistics and anthropology has defined context as a frame that surrounds a focal communicative event and provides resources for its interpretation (Gumperz, 1992; Duranti and Goodwin, 1992).
As a motivating example, consider the interactions in the debate network described in Figure 1. Given a debate claim (t1) and two consecutive posts debating it (p1, p2), we define a textual inference task: determining whether a pair of text elements hold the same stance in the debate (denoted using the relation Agree(X, Y)). This task is similar to other textual inference tasks (Bowman et al., 2015) that have been successfully approached using complex neural representations (Peters et al., 2018; Devlin et al., 2019). In addition, we can leverage the dependencies between these decisions. For example, assuming that one post agrees with the debate claim (Agree(t1, p2)) and the other one does not (¬Agree(t1, p1)), the disagreement between the two posts can be inferred: ¬Agree(t1, p1) ∧ Agree(t1, p2) → ¬Agree(p1, p2). Finally, we consider the social context of the text. The disagreement between the posts can reflect a difference in the perspectives their authors hold on the issue. This information might not be directly observed, but it can be inferred using the authors' social interactions and behavior, given the principle of social homophily (McPherson et al., 2001), stating that people with strong social ties are likely to hold similar views, and authors' perspectives can be captured by representing their social interactions. Exploiting this information requires models that can align the social representation with the linguistic one.
Motivated by these challenges, we introduce DRAIL1, a Deep Relational Learning framework, which uses a combined neuro-symbolic representation for modeling the interaction between multiple decisions in relational domains. Similar to other neuro-symbolic approaches (Mao et al., 2019; Cohen et al., 2020), our goal is to exploit the complementary strengths of the two modeling paradigms. Symbolic representations, used by logic-based systems and by probabilistic graphical models (Richardson and Domingos, 2006; Bach et al., 2017), are interpretable, and allow domain experts to directly inject knowledge and constrain the learning problem. Neural models capture dependencies using the network architecture and are better equipped to deal with noisy data, such as text. However, they are often difficult to interpret and constrain according to domain knowledge.
1https://gitlab.com/purdueNlp/DRaiL/.
Transactions of the Association for Computational Linguistics, vol. 9, pp. 100–119, 2021. https://doi.org/10.1162/tacl_a_00357
Action Editor: Hoifung Poon. Submission batch: 6/2020; Revision batch: 10/2020; Published 3/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Figure 1: Example debate.

Our main design goal in DRAIL is to provide a generalized tool, specifically designed for NLP tasks. Existing approaches designed for classic relational learning tasks (Cohen et al., 2020), such as knowledge graph completion, are not equipped to deal with complex linguistic input, whereas others are designed for very specific NLP settings such as word-based quantitative reasoning problems (Manhaeve et al., 2018) or aligning images with text (Mao et al., 2019). We discuss the differences between DRAIL and these approaches in Section 2. The examples in this paper focus on modeling various argumentation mining tasks and their social and political context, but the same principles can be applied to a wide array of NLP tasks with different contextualizing information, such as images that appear next to the text, or prosody when analyzing transcribed speech, to name a few examples.

DRAIL uses a declarative language for defining deep relational models. Similar to other declarative languages (Richardson and Domingos, 2006; Bach et al., 2017), it allows users to inject their knowledge by specifying dependencies between decisions using first-order logic rules, which are later compiled into a factor graph with neural potentials. In addition to probabilistic inference, DRAIL also models dependencies using a distributed knowledge representation, denoted RELNETS, which provides a shared representation space for entities and their relations, trained using a relational multi-task learning approach. This provides a mechanism for explaining symbols, and for aligning representations from different modalities. Following our running example, ideological standpoints, such as Liberal or Conservative, are discrete entities embedded in the same space as textual entities and social entities. These entities are initially associated with users; however, using RELNETS this information will propagate to texts reflecting these ideologies, by exploiting the relations that bridge social and linguistic information (see Figure 1).

To demonstrate DRAIL's modeling approach, we introduce the task of open-domain stance prediction with social context, which combines social network analysis and textual inference over complex opinionated texts, as shown in Figure 1. We complement our evaluation of DRAIL with two additional tasks: issue-specific stance prediction, where we identify the views expressed in debate forums with respect to a set of fixed issues (Walker et al., 2012), and argumentation mining (Stab and Gurevych, 2017), a document-level discourse analysis task.
2 Related Work

In this section, we survey several lines of work dealing with symbolic, neural, and hybrid representations for relational learning.

2.1 Languages for Graphical Models

Several high-level languages for specifying graphical models have been suggested. BLOG (Milch et al., 2005) and CHURCH (Goodman et al., 2008) were suggested for generative models. For discriminative models, we have Markov Logic Networks (MLNs) (Richardson and Domingos, 2006) and Probabilistic Soft Logic (PSL) (Bach et al., 2017). Both PSL and MLNs combine logic and probabilistic graphical models in a single representation, where each formula is associated with a weight, and the probability distribution over possible assignments is derived from the weights of the formulas that are satisfied by such assignments. Like DRAIL, PSL uses formulas in clausal form (specifically, collections of Horn clauses). The main difference between DRAIL and these languages is that, in addition to graphical models, it uses distributed knowledge representations to represent dependencies. Other discriminative methods include FACTORIE (McCallum et al., 2009), an imperative language to define factor graphs; Constrained Conditional Models (CCMs) (Rizzolo and Roth, 2010; Kordjamshidi et al., 2015), an interface to enhance linear classifiers with declarative constraints; and ProPPR (Wang et al., 2013), a probabilistic logic for large databases that approximates local groundings using a variant of personalized PageRank.
2.2 Node Embedding and Graph Neural Nets

A recent alternative to graphical models is to use neural nets to represent and learn over relational data, represented as a graph. Similar to DRAIL's RELNETS, the learned node representation can be trained by several different prediction tasks. However, unlike DRAIL, these methods do not use probabilistic inference to ensure consistency. Node embedding approaches (Perozzi et al., 2014; Tang et al., 2015; Pan et al., 2016; Grover and Leskovec, 2016; Tu et al., 2017) learn a feature representation for nodes capturing graph adjacency information, such that the similarity in the embedding space of any two nodes is proportional to their graph distance and overlap in neighboring nodes. Some frameworks (Pan et al., 2016; Xiao et al., 2017; Tu et al., 2017) allow nodes to have textual properties, which provide an initial feature representation when learning to represent the graph relations. When dealing with multi-relational data, such as knowledge graphs, both the nodes and the edge types are embedded (Bordes et al., 2013; Wang et al., 2014; Trouillon et al., 2016; Sun et al., 2019). Finally, these methods learn to represent nodes and relations based on pair-wise node relations, without representing the broader graph context in which they appear. Graph neural nets (Kipf and Welling, 2017; Hamilton et al., 2017; Veličković et al., 2017) create contextualized node representations by recursively aggregating neighboring nodes.
2.3 Hybrid Neural-Symbolic Approaches

Several recent systems explore ways to combine neural and symbolic representations in a unified manner. We group them into five categories.

Lifted rules to specify compositional nets. These systems use an end-to-end approach and learn relational dependencies in a latent space. Lifted Relational Neural Networks (LRNNs) (Sourek et al., 2018) and RelNNs (Kazemi and Poole, 2018) are two examples. These systems map observed ground atoms, facts, and rules to specific neurons in a network and define composition functions directly over them. While they provide for a modular abstraction of the relational input, they assume all inputs are symbolic and do not leverage expressive encoders.

Differentiable inference. These systems identify classes of logical queries that can be compiled into differentiable functions in a neural network infrastructure. In this space we have Logic Tensor Networks (LTNs) (Donadello et al., 2017) and TensorLog (Cohen et al., 2020). Symbols are represented as row vectors in a parameter matrix. The focus is on implementing reasoning using a series of numeric functions.

Rule induction from data. These systems are designed for inducing rules from symbolic knowledge bases, which is not in the scope of our framework. In this space we find Neural Theorem Provers (NTPs) (Rocktäschel and Riedel, 2017), Neural Logic Programming (Yang et al., 2017), DRUM (Sadeghian et al., 2019), and Neural Logic Machines (NLMs) (Dong et al., 2019). NTPs use a declarative interface to specify rules that add inductive bias and perform soft proofs. The other approaches work directly over the database.

Deep classifiers and probabilistic inference. These systems propose ways to integrate probabilistic inference and neural networks for diverse learning scenarios. DeepProbLog (Manhaeve et al., 2018) extends the probabilistic logic programming language ProbLog to handle neural predicates. They are able to learn probabilities for atomic expressions using neural networks. The input data consists of a combination of feature vectors for the neural predicates, together with other probabilistic facts and clauses in the logic program. Targets are only given at the output side of the probabilistic reasoner, allowing them to learn each example with respect to a single query. On the other hand, Deep Probabilistic Logic (DPL) (Wang and Poon, 2018) combines neural networks with probabilistic logic for indirect supervision. They learn classifiers using neural networks and use probabilistic logic to introduce distant supervision and labeling functions. Each rule is regarded as a latent variable, and logic defines a joint probability distribution over all labeling decisions. Then, the rule weights and the network parameters are learned jointly using variational EM. In contrast, DRAIL focuses on learning multiple interdependent decisions from data, handling and requiring supervision for all unknown atoms in a given example. Finally, Deep Logic Models (DLMs) (Marra et al., 2019) learn a set of parameters to encode atoms in a probabilistic logic program. Similarly to Donadello et al. (2017)
and Cohen et al. (2020), they use differentiable inference, allowing the model to be trained end-to-end. Like DRAIL, DLMs can work with diverse neural architectures and backpropagate back to the base classifiers. The main difference between DLMs and DRAIL is that DRAIL ensures representation consistency of entities and relations across all learning tasks by employing RELNETS.

Table 1: Comparing systems. Rows: MLN, FACTORIE, CCM, PSL, LRNNs, RelNNs, LTNs, TensorLog, NTPs, Neural LP, DRUM, NLMs, DeepProbLog, DPL, DLMs, DRAIL. Columns: symbolic features (prob./logic inference, symbolic inputs, declarative), neural features (raw inputs, rule induction, symbol embedding, end-to-end learning, backprop. to encoders, architecture agnostic, multi-task learning), and open source.
Deep structured models. More generally, deep structured prediction approaches have been successfully applied to various NLP tasks such as named entity recognition and dependency parsing (Chen and Manning, 2014; Weiss et al., 2015; Ma and Hovy, 2016; Lample et al., 2016; Kiperwasser and Goldberg, 2016; Malaviya et al., 2018). When the need arises to go beyond the sentence level, some works combine the output scores of independently trained classifiers using inference (Beltagy et al., 2014; ?; Liu et al., 2016; Subramanian et al., 2017; Ning et al., 2018), whereas others implement joint learning for their specific domains (Niculae et al., 2017; Han et al., 2019). Our main differentiating factor is that we provide a general interface that leverages first-order logic clauses to specify factor graphs and express constraints.

To summarize these differences, we outline a feature matrix in Table 1. Given our focus on NLP tasks, we require a neural-symbolic system that (1) allows us to integrate state-of-the-art text encoders and NLP tools, (2) supports structured prediction across long texts, (3) lets us combine several modalities and their representations (e.g., social and textual information), and (4) results in an explainable model where domain constraints can be easily introduced.
3 The DRAIL Framework

DRAIL was designed to support complex NLP tasks. Problems can be broken down into domain-specific atomic components (which could be words, sentences, paragraphs, or full documents, depending on the task), and the dependencies between them, their properties, and contextualizing information about them can be explicitly modeled. In DRAIL, dependencies can be modeled over the predicted output variables (similar to other probabilistic graphical models), as well as over the neural representation of the atoms and their relationships in a shared embedding space. This section explains the framework in detail. We begin with a high-level overview of DRAIL and the process of moving from a declarative definition to a predictive model.
A DRAIL task is defined by specifying a finite set of entities and relations. Entities are either discrete symbols (e.g., POS tags, ideologies, specific issue stances) or attributed elements with complex internal information (e.g., documents, users). Decisions are defined using rule templates, formatted as Horn clauses: tLH ⇒ tRH, where tLH (the body) is a conjunction of observed and predicted relations, and tRH (the head) is the output relation to be learned. Consider the debate prediction task in Figure 1: it consists of several sub-tasks, involving textual inference (Agree(t1, t2)), social relations (VoteFor(u, v)), and their combination (Agree(u, t)). We illustrate how to specify the task as a DRAIL program in Figure 2 (left), by defining a subset of rule templates to predict these relations.
103
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
t
我
A
C
_
A
_
0
0
3
5
7
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
7
S
e
p
e
米
乙
e
r
2
0
2
3
Figure 2: General overview of DRAIL.

Each rule template is associated with a neural architecture and a feature function, mapping the initial observations to an input vector for each neural net. We use a shared relational embedding space, denoted RELNETS, to represent entities and relations over them. As described in Figure 2 ("RelNets Layer"), each entity and relation type is associated with an encoder, trained jointly across all prediction rules. This is a form of relational multi-task learning, as the same entities and relations are reused in multiple rules and their representation is updated accordingly. Each rule defines a neural net, learned over the relations defined in its body. These nets take a composition of the vectors generated by the relation encoders as input (Figure 2, "Rule Layer"). DRAIL is architecture-agnostic, and neural modules for entities, relations, and rules can be specified using PyTorch (code snippets can be observed in Appendix C). Our experiments show that we can use different architectures for representing text and users, as well as for embedding discrete entities.

The relations in the Horn clauses can correspond to hidden or observed information, and a specific input is defined by the instantiations, or groundings, of these elements. The collection of all rule groundings results in a factor graph representing our global decision, taking into account the consistency and dependencies between the rules. This way, the final assignments can be obtained by running an inference procedure. For example, the dependency between the views of users on the debate topic (Agree(u, t)) and the agreement between them (VoteFor(u, v)) is modeled as a factor graph in Figure 2 ("Structured Inference Layer").

We formalize the DRAIL language in Section 3.1. Then, in Sections 3.2, 3.3, and 4, we describe the neural components and learning procedures.
3.1 Modeling Language

We begin our description of DRAIL by defining the templating language, consisting of entities, relations, and rules, and explaining how these elements are instantiated given relevant data.

Entities are named symbolic or attributed elements. An example of a symbolic entity is a political ideology (e.g., Liberal or Conservative). An example of an attributed entity is a user with age, gender, and other profile information, or a document associated with textual content. In DRAIL, entities can appear either as constants, written as strings in double or single quotes (e.g., "user1"), or as variables, which are identifiers, substituted with constants when grounded. Variables are written using unquoted upper-case strings (e.g., X, X1). Both constants and variables are typed.

Relations are defined between entities and their properties, or other entities. Relations are defined using a unique identifier, a named predicate, and a list of typed arguments. Atoms consist of a predicate name and a sequence of entities, consistent with the type and arity of the relation's argument list. If the atom's arguments are all constants, it is referred to as a ground atom. For example, Agree("user1", "user2") is a ground atom representing whether "user1" and "user2" are in agreement. When atoms are not grounded (e.g., Agree(X, Y)), they serve as placeholders for all the possible groundings that can be obtained by replacing the variables with constants. Relations can either be closed (i.e., all of their atoms are observed) or open, when some of the atoms can be unobserved. In DRAIL, we use a question mark (?) to denote unobserved relations. These relations are the units that we reason over.

To help make these concepts concrete, consider the following example analyzing stances in a debate, as introduced in Figure 1. First, we define the entities: User = {"u1", "u2"}, Claim = {"t1"}, Post = {"p1", "p2"}. Users are entities associated with demographic attributes and preferences. Claims are assertions over which users debate. Posts are textual arguments that users write to explain their position with respect to the claim. We create these associations by
104
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
t
我
A
C
_
A
_
0
0
3
5
7
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
7
S
e
p
e
米
乙
e
r
2
0
2
3
defining a set of relations, capturing authorship Author(User, Post), votes between users VoteFor(User, User)?, and the position users, and their posts, take with respect to the debate claim: Agree(Claim, User)?, Agree(Claim, Post)?. The authorship relation is the only closed one; for example, the atom: O = {Author("u1", "p1")}.
Rules are functions that map literals (atoms or their negation) to other literals. Rules in DRAIL are defined using templates formatted as Horn clauses: tLH ⇒ tRH, where tLH (the body) is a conjunction of literals, and tRH (the head) is the output literal to be predicted, and can only be an instance of an open relation. Horn clauses allow us to describe structural dependencies as a collection of "if-then" rules, which can be easily interpreted. For example, Agree(X, C) ∧ VoteFor(Y, X) ⇒ Agree(Y, C) expresses the dependency between votes and users holding similar stances on a specific claim. We note that rules can be rewritten in disjunctive form by converting the logical implication into a disjunction between the negation of the body and the head. For example, the rule above can be rewritten as ¬Agree(X, C) ∨ ¬VoteFor(Y, X) ∨ Agree(Y, C).

The DRAIL program consists of a set of rules, which can be weighted (i.e., soft constraints) or unweighted (i.e., hard constraints). Each weighted rule template defines a learning problem, used to score assignments to the head of the rule. Because the body may contain open atoms, each rule represents a factor function expressing dependencies between the open atoms in the body and the head. Unweighted rules, or constraints, shape the space of feasible assignments to open atoms, and represent background knowledge about the domain.
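The disjunctive rewrite described above is mechanical. The following is a minimal Python sketch of it, where literals are represented as (predicate string, negated flag) pairs; this representation is purely illustrative and is not DRAIL's internal one.

```python
# Rewrite the Horn rule  l1 & l2 & ... & ln => head  into its disjunctive
# form  ~l1 v ~l2 v ... v ~ln v head, as described above.
def to_disjunction(body, head):
    return [(pred, not negated) for pred, negated in body] + [head]

body = [("Agree(X, C)", False), ("VoteFor(Y, X)", False)]
head = ("Agree(Y, C)", False)
print(to_disjunction(body, head))
# -> [('Agree(X, C)', True), ('VoteFor(Y, X)', True), ('Agree(Y, C)', False)]
```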
Given the set of grounded atoms O, rules can be grounded by substituting their variables with constants, such that the grounded atoms correspond to elements in O. This process results in a set of grounded rules, each corresponding to a potential function or to a constraint. Together they define a factor graph. Then, DRAIL finds the optimally scored assignment for the open atoms by performing MAP inference. To formalize this process, we first make the observation that rule groundings can be written as linear inequalities, directly corresponding to their disjunctive form, as follows:

\sum_{i \in I_r^+} y_i + \sum_{i \in I_r^-} (1 - y_i) \geq 1    (1)

where I_r^+ (I_r^-) corresponds to the set of open atoms appearing in the rule that are not negated (respectively, negated). Now, MAP inference can be defined as a linear program. Each rule grounding r, generated from template t(r), with input features x_r and open atoms y_r, defines the potential

\psi_r(x_r, y_r) = \min\Big( \sum_{i \in I_r^+} y_i + \sum_{i \in I_r^-} (1 - y_i),\; 1 \Big)    (2)

added to the linear program with a weight w_r. Unweighted rule groundings are defined as

c(x_c, y_c) = 1 - \sum_{i \in I_c^+} y_i - \sum_{i \in I_c^-} (1 - y_i)    (3)

with c(x_c, y_c) \leq 0 added as a constraint to the linear program. This way, the MAP problem can be defined over the set of all potentials \Psi and the set of all constraints C as

\arg\max_{y \in \{0,1\}^n} P(y \mid x) \equiv \arg\max_{y \in \{0,1\}^n} \sum_{\psi_r \in \Psi} w_r \, \psi_r(x_r, y_r) \quad \text{such that } c(x_c, y_c) \leq 0,\; \forall c \in C
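To make the optimization concrete, here is a minimal sketch of the MAP program above using the gurobipy API (Gurobi is one of DRAIL's solver backends, as described below). The data layout, with rule groundings given as index sets over open atoms, is illustrative only, and nonnegative weights are assumed so that the auxiliary variable realizes the min in Eq. (2) at the optimum.

```python
# MAP inference as an integer linear program, following Eqs. (1)-(3).
import gurobipy as gp
from gurobipy import GRB

def map_inference(n_atoms, potentials, constraints):
    """potentials: (w_r, I_plus, I_minus) triples; constraints: (I_plus, I_minus)."""
    m = gp.Model("drail-map")
    y = m.addVars(n_atoms, vtype=GRB.BINARY, name="y")
    obj = gp.LinExpr()
    for w, pos, neg in potentials:
        # psi_r = min(sum_{i in I+} y_i + sum_{i in I-} (1 - y_i), 1).
        # With w >= 0 and maximization, psi is pushed up to this bound.
        psi = m.addVar(lb=0.0, ub=1.0, name="psi")
        m.addConstr(psi <= gp.quicksum(y[i] for i in pos)
                         + gp.quicksum(1 - y[i] for i in neg))
        obj += w * psi
    for pos, neg in constraints:
        # Hard constraint c(x_c, y_c) <= 0, i.e., the disjunction must hold.
        m.addConstr(gp.quicksum(y[i] for i in pos)
                    + gp.quicksum(1 - y[i] for i in neg) >= 1)
    m.setObjective(obj, GRB.MAXIMIZE)
    m.optimize()
    return [int(round(y[i].X)) for i in range(n_atoms)]
```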
In addition to logical constraints, we also support arithmetic constraints that can be written in the form of linear combinations of atoms with an inequality or an equality. For example, we can enforce the mutual exclusivity of liberal and conservative ideologies for any user X by writing: Ideology(X, "Con") + Ideology(X, "Lib") = 1. We borrow some additional syntax from PSL to make arithmetic rules easier to use. Bach et al. (2017) define a summation atom as an atom that takes terms and/or sum variables as arguments. A summation atom represents the summation of ground atoms that can be obtained by substituting individual variables and summing over all possible constants for the sum variables. For example, we could rewrite the above ideology constraint as Ideology(X, +I) = 1, where Ideology(X, +I) represents the summation of all atoms with predicate Ideology that share variable X.
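In the linear program, such an arithmetic rule simply becomes one equality per grounding of the individual variables. As a sketch, continuing the gurobipy example above (with a hypothetical dictionary ideology_atoms mapping each user to the binary variables of its Ideology atoms):

```python
# Ideology(X, +I) = 1: exactly one ideology symbol per user.
# ideology_atoms is an assumed, illustrative data structure.
for user, atoms in ideology_atoms.items():
    m.addConstr(gp.quicksum(atoms) == 1, name="ideology_" + user)
```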
DRAIL uses two solvers, Gurobi (Gurobi Optimization, 2015) and AD3 (Martins et al., 2015), for exact and approximate inference, respectively.
To ground DRAIL programs in data, we create
an in-memory database consisting of all relations
expressed in the program. Observations associated with each relation are provided in column-separated text files. DRAIL's compiler instantiates the program by automatically querying the database and grounding the formatted rules and constraints.
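The grounding step itself amounts to a relational join over the observed tuples. The following is a rough, self-contained sketch of the idea; the tuple database and atom representation are illustrative, and DRAIL's compiler performs this automatically from the declared program.

```python
# Ground a rule body by joining its atoms on shared variables.
from itertools import product

db = {
    "Author":  {("u1", "p1"), ("u2", "p2")},
    "VoteFor": {("u3", "u1")},
}

def ground(body, db):
    """body: list of (predicate, variables) atoms; returns the variable
    bindings consistent with every observed atom in the body."""
    groundings = []
    for rows in product(*(db[pred] for pred, _ in body)):
        binding, ok = {}, True
        for (_, variables), row in zip(body, rows):
            for var, const in zip(variables, row):
                if binding.setdefault(var, const) != const:
                    ok = False  # same variable bound to two constants
        if ok:
            groundings.append(binding)
    return groundings

# Groundings for the body of: Author(U, P) & VoteFor(V, U) => ...
print(ground([("Author", ("U", "P")), ("VoteFor", ("V", "U"))], db))
# -> [{'U': 'u1', 'P': 'p1', 'V': 'u3'}]
```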
3.2 Neural Components

Let r be a rule grounding generated from template t, where t is tied to a neural scoring function Φt and a set of parameters θt (Rule Layer in Figure 2). In the previous section, we defined the MAP problem for all potentials ψr(x, y) ∈ Ψ in a DRAIL program, where each potential has a weight wr. Consider the following scoring function:

w_r = \Phi_t(x_r, y_r; \theta_t) = \Phi_t(x_{rel_0}, \ldots, x_{rel_{n-1}}; \theta_t)    (4)

Notice that all potentials generated by the same template share parameters. We define each scoring function Φt over the set of atoms on the left-hand side of the rule template. Let t = rel_0 ∧ rel_1 ∧ … ∧ rel_{n−1} ⇒ rel_n be a rule template. Each atom rel_i is composed of a relation type, its arguments, and feature vectors for them, as shown in Figure 2, "Input Layer".

Given that a DRAIL program is composed of many competing rules over the same problem, we want to be able to share information between the different decision functions. For this purpose, we introduce RELNETS.
3.3 RELNETS

A DRAIL program often uses the same entities and relations in multiple different rules. The symbolic aspect of DRAIL allows us to constrain the values of open relations, and force consistency across all their occurrences. The neural aspect, as defined in Eq. 4, associates a neural architecture with each rule template, which can be viewed as a way to embed the output relation.

We want to exploit the fact that there are repeating occurrences of entities and relations across different rules. Given that each rule defines a learning problem, sharing parameters allows us to shape the representations using complementary learning objectives. This form of relational multi-task learning is illustrated in Figure 2, "RelNets Layer".

We formalize this idea by introducing relation-specific and entity-specific encoders and their parameters (φ_rel; θ_rel) and (φ_ent; θ_ent), which are reused in all rules. As an example, let's write the formulation for the rules outlined in Figure 2, where each relation and entity encoder is defined over the set of relevant features:

w_{r_0} = \Phi_{t_0}(\phi_{debates}(\phi_{user}, \phi_{text}))
w_{r_1} = \Phi_{t_1}(\phi_{agree}(\phi_{user}, \phi_{text}), \phi_{votefor}(\phi_{user}, \phi_{user}))

Note that entity and relation encoders can be arbitrarily complex, depending on the application. For example, when dealing with text, we could use BiLSTMs or a BERT encoder.
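A minimal PyTorch sketch of this parameter sharing is shown below: the same entity encoders feed every relation encoder and rule scorer, so gradients from all rule templates update them. All dimensions, feature sizes, and module choices are illustrative, not the exact architectures used in the paper.

```python
import torch
import torch.nn as nn

class RelNets(nn.Module):
    def __init__(self, d_user=32, d_text=64, d_rel=64,
                 n_user_feats=10, n_text_feats=300):
        super().__init__()
        # Entity encoders, shared across all rule templates.
        self.phi_user = nn.Sequential(nn.Linear(n_user_feats, d_user), nn.ReLU())
        self.phi_text = nn.Sequential(nn.Linear(n_text_feats, d_text), nn.ReLU())
        # Relation encoders over composed entity representations.
        self.phi_agree = nn.Sequential(nn.Linear(d_user + d_text, d_rel), nn.ReLU())
        self.phi_votefor = nn.Sequential(nn.Linear(2 * d_user, d_rel), nn.ReLU())
        # One scoring head per rule template (Eq. 4); 2 = head assignments.
        self.rule_r1 = nn.Linear(2 * d_rel, 2)

    def score_r1(self, user_x, text_x, voter_x):
        agree = self.phi_agree(torch.cat(
            [self.phi_user(user_x), self.phi_text(text_x)], dim=-1))
        vote = self.phi_votefor(torch.cat(
            [self.phi_user(voter_x), self.phi_user(user_x)], dim=-1))
        return self.rule_r1(torch.cat([agree, vote], dim=-1))
```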
Our goal when using RELNETS is to learn entity representations that capture properties unique to their types (e.g., users, issues), as well as relational patterns that contextualize entities, allowing them to generalize better. We make the distinction between raw (or attributed) entities and symbolic entities. Raw entities are associated with rich, yet unstructured, information and attributes, such as text or user profiles. On the other hand, symbolic entities are well-defined concepts, and are not associated with additional information, such as political ideologies (e.g., liberal) and issues (e.g., gun control). With this consideration, we identify two types of representation learning objectives:

Embed Symbol / Explain Data: Aligns the embedding of symbolic entities and raw entities, grounding the symbol in the raw data, and using the symbol embedding to explain properties of previously unseen raw-entity instances. For example, aligning ideologies and text to (1) obtain an ideology embedding that is closest to statements made by people with that ideology, or (2) interpret text by providing a symbolic label for it.

Translate / Correlate: Aligns the representations of pairs of symbolic or raw entities. For example, aligning user representations with text, to move between social and textual information, as shown in Figure 1, "Social-Linguistic Relations", or capturing the correlation between symbolic judgements like agreement and matching ideologies.
4 Learning
The scoring function used for comparing output
assignments can be learned locally for each
rule separately, or globally, by considering the
dependencies between rules.
Global Learning The global approach uses inference to ensure that the parameters for all weighted rule templates are consistent across all decisions. Let Ψ be a factor graph with potentials {ψr} ∈ Ψ over the set of all possible structures Y. Let θ = {θt} be a set of parameter vectors, and Φt(xr, yr; θt) be the scoring function defined for potential ψr(xr, yr). Here ŷ ∈ Y corresponds to the current prediction resulting from the MAP inference procedure and y ∈ Y corresponds to the gold structure. We support two ways to learn θ:

(1) The structured hinge loss

\max\Big(0,\; \max_{\hat{y} \in Y}\big(\Delta(\hat{y}, y) + \sum_{\psi_r \in \Psi} \Phi_t(x_r, \hat{y}_r; \theta_t)\big) - \sum_{\psi_r \in \Psi} \Phi_t(x_r, y_r; \theta_t)\Big)    (5)

(2) The general CRF loss

-\log p(y \mid x) = -\log \frac{1}{Z(x)} \prod_{\psi_r \in \Psi} \exp\big\{\Phi_t(x_r, y_r; \theta_t)\big\} = -\sum_{\psi_r \in \Psi} \Phi_t(x_r, y_r; \theta_t) + \log Z(x)    (6)

where Z(x) is a global normalization term computed over the set of all valid structures Y:

Z(x) = \sum_{y' \in Y} \prod_{\psi_r \in \Psi} \exp\big\{\Phi_t(x_r, y'_r; \theta_t)\big\}

When inference is intractable, approximate inference (e.g., AD3) can be used to obtain ŷ. To approximate the global normalization term Z(x) in the general CRF case, we follow Zhou et al. (2015) and Andor et al. (2016) and keep a pool βk of k high-quality feasible solutions during inference. This way, we can sum over the solutions in the pool to approximate the partition function: \sum_{y' \in \beta_k} \prod_{\psi_r \in \Psi} \exp\{\Phi_t(x_r, y'_r; \theta_t)\}. In this paper, we use the structured hinge loss for most experiments, and include a discussion of the approximated CRF loss in Section 5.7.
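A condensed sketch of one global training step with the structured hinge loss is given below. It simplifies each potential to a score vector over its head assignments, and map_solve stands in for the (loss-augmented) ILP from Section 3.1; all names are illustrative.

```python
import torch

def global_step(potentials, map_solve, optimizer):
    """potentials: (rule_net, features, gold_head) triples for the factors
    of one example; rule_net(features) returns scores over head values."""
    scores = [net(feats) for net, feats, _ in potentials]   # Phi_t per factor
    y_gold = [gold for _, _, gold in potentials]
    # Loss-augmented MAP: the Hamming cost decomposes over atoms, so it
    # can be folded into the linear objective.
    y_hat = map_solve([s.detach() for s in scores], augment=y_gold)

    gold_score = sum(s[g] for s, g in zip(scores, y_gold))
    hat_score = sum(s[h] for s, h in zip(scores, y_hat))
    hamming = sum(float(g != h) for g, h in zip(y_gold, y_hat))

    loss = torch.clamp(hamming + hat_score - gold_score, min=0.0)  # Eq. (5)
    optimizer.zero_grad()
    loss.backward()   # gradients reach every rule, relation, entity encoder
    optimizer.step()
    return loss.item()
```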
Joint Inference The parameters for each weighted rule template are optimized independently. Following Andor et al. (2016), we show that joint inference serves as a way to greedily approximate the CRF loss, where we replace the normalization term in Eq. (6) with a greedy approximation over local normalizations:

-\log \prod_{\psi_r \in \Psi} \frac{1}{Z_L(x_r)} \exp\big\{\Phi_t(x_r, y_r; \theta_t)\big\} = -\sum_{\psi_r \in \Psi} \Phi_t(x_r, y_r; \theta_t) + \sum_{\psi_r \in \Psi} \log Z_L(x_r)    (7)
where Z_L(x_r) is computed over all the valid assignments y'_r for each factor ψr:

Z_L(x_r) = \sum_{y'_r} \exp\big\{\Phi_t(x_r, y'_r; \theta_t)\big\}

We refer to models that use this approach as JOINTINF.
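Since each factor is normalized independently under Eq. (7), training a JOINTINF model reduces to a standard softmax cross-entropy per grounded rule, with joint inference used only at prediction time; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def local_loss(potentials):
    """potentials: (scores, gold) pairs; scores is a 1-D tensor holding
    Phi_t(x_r, y'_r) over the valid assignments y'_r of one factor."""
    return sum(F.cross_entropy(scores.unsqueeze(0), torch.tensor([gold]))
               for scores, gold in potentials)
```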
5 Experimental Evaluation

We compare DRAIL to representative models from each category covered in Section 2. Our goal is to examine how different types of approaches capture dependencies, and what their limitations are when dealing with language interactions. These baselines are described in Section 5.1. We also evaluate different strategies for using DRAIL in Section 5.2.

We focus on three tasks: open debate stance prediction (Sec. 5.3), issue-specific stance prediction (Sec. 5.4), and argumentation mining (Sec. 5.5). Details regarding the hyper-parameters used for all tasks can be found in Appendix B.
5.1 Baselines

End-to-end Neural Nets: We test all approaches against neural nets trained locally on each task, without explicitly modeling dependencies. In this space, we consider two variants: INDNETS, where each component of the problem is represented using an independent neural network, and E2E, where the features for the different components are concatenated at the input and fed to a single neural network.

Relational Embedding Methods: Introduced in Section 2.2, these methods embed nodes and edge types for relational data. They are typically designed to represent symbolic entities and relations. However, because our entities can be defined by raw textual content and other features, we define the relational objectives over our encoders. This adaptation has proven successful for domains dealing with rich textual information (Lee and Goldwasser, 2019). We test three relational knowledge objectives: TransE (Bordes et al., 2013), ComplEx (Trouillon et al., 2016), and RotatE (Sun et al., 2019). Limitations: (1) These approaches cannot constrain the space using domain knowledge, and (2) they cannot deal with relations involving more than two entities, limiting their applicability to higher order factors.

Probabilistic Logics: We compare to PSL (Bach et al., 2017), a purely symbolic probabilistic logic, and TensorLog (Cohen et al., 2020), a neuro-symbolic one. In both cases, we instantiate the program using the weights learned with our base encoders. Limitations: These approaches do not provide a way to update the parameters of the base classifiers.
5.2 Modeling Strategies

Local vs. Global Learning: The trade-off between local and global learning has been explored for graphical models (MEMM vs. CRF), and for deep structured prediction (Chen and Manning, 2014; Andor et al., 2016; Han et al., 2019). Although local learning is faster, the learned scoring functions might not be consistent with the correct global prediction. Following Han et al. (2019), we initialize the parameters using local models.

RELNETS: We will show the advantage of having relational representations that are shared across different decisions, in contrast to having independent parameters for each rule. Note that in all cases, we will use the global learning objective to train RELNETS.

Modularity: Decomposing decisions into relevant modules has been shown to simplify the learning process and lead to better generalization (Zhang and Goldwasser, 2019). We will contrast the performance of modular and end-to-end models to represent text and user information when predicting stances.

Representation Learning and Interpretability: We will do a qualitative analysis to show how we are able to embed symbols and explain data by moving between symbolic and sub-symbolic representations, as outlined in Section 3.3.
5.3 Open Domain Stance Prediction

Traditionally, stance prediction tasks have focused on predicting stances on a specific topic, such as abortion. Predicting stances for a different topic, such as gun control, would require learning a new model from scratch. In this task, we would like to leverage the fact that stances in different domains are correlated. Instead of using a pre-defined set of debate topics (i.e., symbolic entities), we define the prediction task over claims, expressed in text, specific to each debate. Concretely, each debate will have a different claim (i.e., a different value for C in the relation Claim(T, C), where T corresponds to a debate thread). We refer to these settings as Open-Domain and write down the task in Figure 3. In addition to the textual stance prediction problem (r0), where P corresponds to a post, we represent users (U) and define a user-level stance prediction problem (r1). We assume that additional users read the posts and vote for content that supports their opinions, resulting in another prediction problem (r2, r3). Then, we define representation learning tasks, which align symbolic (ideology, denoted as I) and raw (users and text) entities (r4-r7). Finally, we write down all dependencies and constrain the final prediction (c0-c7).

Figure 3: DRAIL program for open-domain stance prediction. T: Thread, C: Claim, P: Post, U: User, V: Voter, I: Ideology; A and B can be any of {Claim, Post, User}.
Dataset: We collected a set of 7,555 debates from debate.org, containing a total of 42,245 posts across 10 broader political issues. For a given issue, the debate topics are nuanced and vary according to the debate question expressed in text (e.g., Should semi-automatic guns be banned, Conceal handgun laws reduce violent crime). Debates have at least two posts, containing up to 25 sentences each. In addition to debates and posts, we collected the user profiles of all users participating in the debates, as well as all users
that cast votes for the debate participants. Profiles consist of attributes (e.g., gender, ideology). User data is considerably sparse. We create two evaluation scenarios, random and hard. In the random split, debates are randomly divided into ten folds of equal size. In the hard split, debates are separated by political issue. This results in a harder prediction problem, as the test data will not share topically related debates with the training data. We perform 10-fold cross validation and report accuracy.

Table 2: General results for open domain stance prediction (left) and variations of the model (right). P: Post, U: User, V: Voter.

General results (left):
                              Random            Hard
Model                       U     P     V     U     P     V
Local        INDNETS      63.9  61.3  54.4  62.2  53.0  51.3
             E2E          66.3  71.2  54.4  63.4  68.1  51.3
Reln. Emb.   TransE       58.5  54.1  52.6  57.2  53.1  51.2
             ComplEx      61.0  63.3  58.1  57.3  55.0  55.4
             RotatE       59.6  58.3  54.2  57.9  54.6  51.0
Prob. Logic  PSL          78.7  77.5  55.4  72.6  71.8  52.6
             TensorLog    72.7  71.9  56.2  70.0  67.4  55.8
DRaiL        E2E +Inf     80.2  79.2  54.4  76.9  75.5  51.3
             JOINTINF     80.7  79.5  55.6  75.2  74.0  52.5
             GLOBAL       81.0  79.5  55.8  75.3  74.0  53.0
             RELNETS      81.9  80.4  57.0  78.0  77.2  53.7

Variations of the model (right):
                              Random            Hard
Model                       U     P     V     U     P     V
Local        INDNETS      63.9  61.3  54.4  62.2  53.0  51.3
             E2E          66.3  71.2  54.4  63.4  68.1  51.3
AC           JOINTINF     73.6  71.8   –    69.0  67.2   –
             GLOBAL       73.6  72.0   –    69.0  67.2   –
             RELNETS      73.8  72.0   –    71.7  69.5   –
AC-DC        JOINTINF     80.7  79.5   –    75.6  74.4   –
             GLOBAL       81.4  79.9   –    75.8  74.6   –
             RELNETS      81.8  80.1   –    77.8  76.4   –
AC-DC-SC     JOINTINF     80.7  79.5  55.6  75.2  74.0  52.5
             GLOBAL       81.0  79.5  55.8  75.3  74.0  53.0
             RELNETS      81.9  80.4  57.0  78.0  77.2  53.7
Entity and Relation Encoders: We represent
posts and titles using a pre-trained BERT-small2
encoder (Turc et al., 2019), a compact version of
the language model proposed by Devlin et al.
2019. For users, we use feed-forward computa-
tions with ReLU activations over the profile fea-
tures and a pre-trained node embedding (Grover
and Leskovec, 2016) over the friendship graph.
All relation and rule encoders are represented
as feed-forward networks with one hidden layer,
ReLU activations and a softmax on top. 注意
all of these modules are updated during learning.
Table 2 (left) shows results for all the models described in Section 5.1. In E2E models, post and user information is collapsed into a single module (rule), whereas in INDNETS, JOINTINF, GLOBAL, and RELNETS they are modeled separately. All other baselines use the same underlying modular encoders. We can appreciate the advantage of relational embeddings in contrast to INDNETS for user and voter stances, particularly in the case of ComplEx and RotatE. We can attribute this to the fact that all objectives are trained jointly and entity encoders are shared. However, approaches that explicitly model inference, like PSL, TensorLog, and DRAIL, outperform relational embeddings and end-to-end neural networks. This is because they enforce domain constraints.

We explain the difference between the performance of DRAIL and the other probabilistic logics by: (1) the fact that we use exact inference instead of approximate inference, (2) PSL learns to weight the rules without giving priority to a particular task, whereas the JOINTINF model works directly over the local outputs, and most importantly, (3) our GLOBAL and RELNETS models backpropagate to the base classifiers and fine-tune parameters using a structured objective.

In Table 2 (right) we show different versions of the DRAIL program, obtained by adding or removing certain constraints. AC models only enforce author consistency, AC-DC models enforce both author consistency and disagreement between respondents, and finally, AC-DC-SC models introduce social information by considering voting behavior. We get better performance when we model more contextualizing information for the RELNETS case. This is particularly helpful in the Hard case, where contextualizing information, combined with shared representations, helps the model generalize to previously unobserved topics. With respect to the modeling strategies listed in Section 5.2, we can observe: (1) the advantage of using a global learning objective, (2) the advantage of using RELNETS to share information, and (3) the advantage of breaking down the decision into modules, instead of learning an end-to-end model.

2 We found negligible difference in performance between BERT and BERT-small for this task, while obtaining a considerable boost in speed.
然后, we perform a qualitative evaluation to
illustrate our ability to move between symbolic
109
Table 3: Representation learning objectives: Explain Data (top) and Embed Symbol (bottom). Note that ideology labels were learned from user profiles, and do not necessarily represent the official stances of political parties.

Explain Data (issue: Guns):
Debate Statement                                                      Con  Libt  Mod  Libl  Pro
No gun laws should be passed restricting the right to bear arms       .98  .08   .01  .01   .01
Gun control is an ineffective comfort tactic used by the
  government to fool the American people                              .00  .60   .03  .02   .65
Gun control is good for society                                       .00  .06   .99  .15   .06
In the US handguns ought to be banned                                 .03  .14   .01  .93   .01
The USA should ban most guns and confiscate them                      .22  .02   .00  .00   .00

Embed Symbol (issue: LGBT), statements close to each ideology in the embedding space:
Libl: gay marriage ought be legalized; gay marriage should be legalized; same-sex marriage should be federally legal
Con: Leviticus 18:22 and 20:13 prove the anti gay marriage position; gay marriage is not bad; homosexuality is not a sin nor taboo
Table 3 (Top) takes a set of statements and explains them by looking at the symbols associated with them and their scores. For learning to map debate statements to ideological symbols, we rely on the partial supervision provided by the users that self-identify with a political ideology and disclose it on their public profiles. Note that we do not incorporate any explicit expertise in political science to learn to represent ideological information. We chose statements with the highest score for each of the ideologies. We can see that, in the context of guns, statements that have to do with some form of gun control have higher scores for the center-to-left spectrum of ideological symbols (moderate, liberal, progressive), whereas statements that mention gun rights and the ineffectiveness of gun control policies have higher scores for the conservative and libertarian symbols.

To complement this evaluation, in Table 3 (Bottom) we embed ideologies and find three example statements that are close in the embedding space. In the context of LGBT issues, we find that the statements closest to the liberal symbol are those that support the legalization of same-sex marriage, and frame it as a constitutional issue. On the other hand, the statements closest to the conservative symbol frame homosexuality and same-sex marriage as a moral or religious issue, and we find statements both supporting and opposing same-sex marriage. This experiment shows that our model is easy to interpret, and provides an explanation for the decision made.
Finally, we evaluate our learned model over entities that have not been observed during training. To do this, we extract statements made by three prominent politicians from ontheissues.org. Then, we try to explain the politicians by looking at their predicted ideology. Results for this evaluation can be seen in Figure 4. The left part of Figure 4 shows the proportion of statements that were identified for each ideology: left (liberal or progressive), moderate, and right (conservative). We find that we are able to recover the relative positions in the political spectrum for the evaluated politicians: Bernie Sanders, Joe Biden, and Donald Trump. We find that Sanders is the most left-leaning, followed by Biden. In contrast, Donald Trump stands mostly on the right. We also include some examples of the classified statements. We show that we are able to identify cases in which the statement does not necessarily align with the known ideology for each politician.

Table 4: General results for issue-specific stance and agreement prediction (Macro F1). AB: Abortion, E: Evolution, GM: Gay Marriage, GC: Gun Control; S: stance, A: agreement.

                              AB          E           GM          GC
Model                       S     A     S     A     S     A     S     A
Local        INDNETS      66.0  61.7  58.2  59.7  62.6  60.6  59.5  61.0
Reln. Embed. TransE       62.5  62.9  53.5  65.1  58.7  69.3  55.3  65.0
             ComplEx      66.6  73.4  60.7  72.2  66.6  72.8  60.0  70.7
             RotatE       66.6  72.3  59.2  71.3  67.0  74.2  59.4  69.9
Prob. Logic  PSL          81.6  74.4  69.0  64.9  83.3  74.2  71.9  71.7
             TensorLog    77.3  61.3  68.2  51.3  80.4  65.2  68.3  55.6
DRAIL        JOINTINF     82.8  74.6  64.8  63.2  84.5  73.4  70.4  66.3
             GLOBAL       88.6  84.7  72.8  72.2  90.3  81.8  76.8  72.2
             RELNETS      89.0  83.5  80.5  76.4  89.3  82.1  80.3  73.4
5.4 Issue-Specific Stance Prediction

Given a debate thread on a specific issue (e.g., abortion), the task is to predict the stance with respect to the issue for each one of the debate posts (Walker et al., 2012). Each thread forms a tree structure, where users participate and respond to each other's posts. We treat the task as a collective classification problem, and model the agreement between posts and their replies, as well as the consistency between posts written by the same author.
数字 4: Statements made by politicians classified using our model trained on debate.org.
The DRAIL program for this task can be observed in Appendix A.

Dataset: We use the 4Forums dataset from the Internet Argument Corpus (Walker et al., 2012), consisting of a total of 1,230 debates and 24,658 posts on abortion, evolution, gay marriage, and gun control. We use the same splits as Li et al. (2018) and perform 5-fold cross validation.
Entity and Relation Encoders: We represented posts using pre-trained BERT encoders (Devlin et al., 2019) and do not generate features for authors. As in the previous task, we model all relations and rules using feed-forward networks with one hidden layer and ReLU activations. Note that we fine-tune all parameters during training.

In Table 4 we can observe the general results for this task. We report macro F1 for post stance and agreement between posts for all issues. As in the previous task, we find that ComplEx and RotatE relational embeddings outperform INDNETS, and probabilistic logics outperform methods that do not perform constrained inference. PSL outperforms JOINTINF for the evolution and gun control debates, which are the two issues with less training data, whereas JOINTINF outperforms PSL for debates on abortion and gay marriage. This could indicate that re-weighting rules may be advantageous for the cases with less supervision. Finally, we see the advantage of using a global learning objective and augmenting it with shared representations. Table 5 compares our model with previously published results.
Table 5: Previous work on issue-specific stance prediction (stance acc.).

Model                                    A     E     GM    GC    Avg
BERT (Devlin et al., 2019)              67.0  62.4  67.4  64.6  65.4
PSL (Sridhar et al., 2015b)             77.0  80.3  80.5  69.1  76.7
Struct. Rep. (Li et al., 2018)          86.5  82.2  87.6  83.1  84.9
DRAIL RELNETS                           89.2  82.4  90.1  83.1  86.2

5.5 Argument Mining

The goal of this task is to identify argumentative structures in essays. Each argumentative structure corresponds to a tree in a document. Nodes are predefined spans of text and can be labeled either as claims, major claims, or premises, and edges correspond to support/attack relations between nodes. Domain knowledge is injected by constraining sources to be premises and targets to be either premises or major claims, as well as by enforcing tree structures. We model nodes, links, and second order relations: grandparent (a → b → c) and co-parent (a → b ← c) (Niculae et al., 2017). In addition, we consider link labels, denoted stances. The DRAIL program for this task can be observed in Appendix A.
Dataset: We used the UKP dataset (Stab and Gurevych, 2017), consisting of 402 documents, with a total of 6,100 propositions and 3,800 links (17% of pairs). We use the splits used by Niculae et al. (2017), and report macro F1 for components and positive F1 for relations.

Entity and Relation Encoders: To represent the component and the essay, we used a BiLSTM over the words, initialized with GloVe embeddings (Pennington et al., 2014), concatenated with a feature vector following Niculae et al. (2017). For representing the relation, we use a feed-forward computation over the components, as well as the relation features used in Niculae et al. (2017).
We can observe the general results for this task in Table 6. Given that this task relies on constructing the tree from scratch, we find that all methods that do not include declarative constraints (INDNETS and relational embeddings) suffer when trying to predict links correctly. For this task, we did not apply TensorLog, given that we could not find a way to express tree constraints using its syntax. Again, we see the advantage of using global learning, as well as sharing information between rules using RELNETS.
Table 6: General results for argument mining.

Model                     Node  Link  Avg   Stance  Avg
Local        INDNETS      70.7  52.8  61.7  63.4    62.3
Reln. Embed. TransE       65.7  23.7  44.7  44.6    44.7
             ComplEx      69.1  15.7  42.4  53.5    46.1
             RotatE       67.2  20.7  44.0  46.7    44.9
Prob. Logic  PSL          76.5  56.4  66.5  64.7    65.9
DRAIL        JOINTINF     78.6  59.5  69.1  62.9    67.0
             GLOBAL       83.1  61.2  72.2  69.2    71.2
             RELNETS      82.9  63.7  73.3  68.4    71.7

Table 7: Previous work on argument mining.

Model                                        Node  Link  Avg
Human upper bound                            86.8  75.5  81.2
ILP Joint (Stab and Gurevych, 2017)          82.6  58.5  70.6
Struct RNN strict (Niculae et al., 2017)     79.3  50.1  64.7
Struct SVM full (Niculae et al., 2017)       77.6  60.1  68.9
Joint PointerNet (Potash et al., 2017)       84.9  60.8  72.9
Kuribayashi et al. (2019)                    85.7  67.8  76.8
DRAIL RELNETS                                82.9  63.7  73.3
Table 7 shows the performance of our model against previously published results. While we are able to outperform models that use the same underlying encoders and features, recent work by Kuribayashi et al. (2019) further improved performance by exploiting contextualized word embeddings that look at the whole document, and by making a distinction between argumentative markers and argumentative components. We did not find a significant improvement by incorporating their ELMo-LSTM encoders into our framework,3 nor by replacing our BiLSTM encoders with BERT. We leave the exploration of an effective way to leverage contextualized embeddings for this task for future work.
5.6 Run-time Analysis

In this section, we perform a run-time analysis of all the probabilistic logic systems tested. All experiments were run on a 12-core 3.2 GHz Intel i7 CPU machine with 63 GB RAM and an NVIDIA GeForce GTX 1080 Ti 11 GB GDDR5X GPU.

3 We did not experiment with their normalization methods, extended BoW features, nor the AC/AM distinction.
Figure 5: Average overall training time (per fold).
Figure 5 shows the overall training time (per fold) in seconds for each of the evaluated tasks. Note that the figure is presented in logarithmic scale. We find that DRAIL is generally more computationally expensive than both TensorLog and PSL. This is expected, given that DRAIL backpropagates to the base classifiers at each epoch, while the other frameworks just take the local predictions as priors. However, when using a large number of arithmetic constraints (e.g., Argument Mining), we find that PSL takes a really long time to train. We found no significant difference when using ILP or AD3. We presume that this is due to the fact that our graphs are small and that Gurobi is a highly optimized commercial software.

Finally, we find that when using encoders with a large number of parameters (e.g., BERT) in tasks with small graphs, the difference in training time between training local and global models is minimal. In these cases, back-propagation is considerably more expensive than inference, and global models converge in fewer epochs. For Argument Mining, local models are at least twice as fast. BiLSTMs are considerably faster than BERT, and inference is more expensive for this task.
5.7 Analysis of Loss Functions

In this section, we perform an evaluation of the CRF loss for issue-specific stance prediction. Note that one drawback of the CRF loss (Eq. 6) is that we need to accumulate the gradient for the approximated partition function. When using entity encoders with a large number of parameters (e.g., BERT), the amount of memory needed for a single instance increases, and we were unable to fit the full models in our GPU. For the purpose of these tests, we froze the BERT parameters after local training
and updated only the relation and rule parameters. To obtain the solution pool, we use Gurobi's pool search mode to find β high-quality solutions. This also increases the cost of search at inference time.
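For reference, a minimal sketch of collecting such a pool with the gurobipy solution-pool parameters (the surrounding model construction is assumed to follow the MAP sketch from Section 3.1):

```python
import gurobipy as gp
from gurobipy import GRB

def solution_pool(model, y_vars, k=20):
    """Collect up to k high-quality feasible solutions from Gurobi's pool."""
    model.setParam(GRB.Param.PoolSearchMode, 2)  # systematically find k best
    model.setParam(GRB.Param.PoolSolutions, k)
    model.optimize()
    pool = []
    for i in range(model.SolCount):
        model.setParam(GRB.Param.SolutionNumber, i)
        pool.append([int(round(v.Xn)) for v in y_vars])  # i-th pool solution
    return pool
```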
Development set results for the debates on abortion can be observed in Table 8. Although increasing the size of the solution pool leads to better performance, it comes at a higher computational cost.

Table 8: Stance prediction (abortion) dev results for different training objectives.

Model          Stance  Agree   Avg    Secs p/epoch
Hinge loss     82.74   78.54   80.64  132
CRF (β = 5)    83.09   81.03   82.06  345
CRF (β = 20)   84.10   82.16   83.13  482
CRF (β = 50)   84.19   81.80   83.00  720
6 Conclusion

In this paper, we motivate the need for a declarative neural-symbolic approach that can be applied to NLP tasks involving long texts and contextualizing information. We introduce a general framework to support this, and demonstrate its flexibility by modeling problems with diverse relations and rich representations, obtaining models that are easy to interpret and expand. The code for DRAIL and the application examples in this paper have been released to the community, to help promote this modeling approach for other applications.
Acknowledgments

We would like to acknowledge current and former members of the PurdueNLP lab, particularly Xiao Zhang, Chang Li, Ibrahim Dalal, I-Ta Lee, Ayush Jain, Rajkumar Pujari, and Shamik Roy for their help and insightful discussions in the early stages of this project. We also thank the reviewers and action editor for their constructive feedback. This project was partially funded by the NSF, grant CNS-1814105.
参考
Daniel Andor, Chris Alberti, David Weiss,
Aliaksei Severyn, Alessandro Presta, Kuzman
Ganchev, Slav Petrov, and Michael Collins.
2016. Globally normalized transition-based
在诉讼程序中
这
神经网络.
54th Annual Meeting of the Association for
计算语言学 (体积 1: 长的
文件), pages 2442–2452, 柏林, 德国.
计算语言学协会.
DOI: https://doi.org/10.18653/v1
/P16-1231
Stephen H. Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. 2017. Hinge-loss Markov random fields and probabilistic soft logic. Journal of Machine Learning Research (JMLR), 18:1–67.
Islam Beltagy, Katrin Erk, and Raymond Mooney. 2014. Probabilistic soft logic for semantic textual similarity. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1210–1219, Baltimore, Maryland. Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/P14-1114
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2787–2795. Curran Associates, Inc.
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D15-1075
Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, Doha, Qatar. Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/D14-1082
William W. Cohen, Fan Yang, and Kathryn Mazaitis. 2020. TensorLog: A probabilistic
database implemented using deep-learning
infrastructure. Journal of Artificial Intelligence Research, 67:285–325. DOI: https://doi.org/10.1613/jair.1.11944
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Ivan Donadello, Luciano Serafini, and Artur
d’Avila Garcez. 2017. Logic tensor networks
for semantic image interpretation. In Proceed-
ings of the Twenty-Sixth International Joint
Conference on Artificial Intelligence, IJCAI-
17, pages 1596–1602. DOI: https://doi.org/10.24963/ijcai.2017/221
Honghua Dong, Jiayuan Mao, Tian Lin, Chong
Wang, Lihong Li, and Denny Zhou. 2019. Neural logic machines. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
Alessandro Duranti and Charles Goodwin. 1992. Rethinking context: An introduction. In Alessandro Duranti and Charles Goodwin, editors, Rethinking Context: Language as an Interactive Phenomenon, chapter 1, pages 1–42. Cambridge University Press, Cambridge.
Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. 2008. Church: A language for generative models. In UAI 2008, Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence, Helsinki, Finland, July 9-12, 2008, pages 220–229.
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 855–864. DOI: https://doi.org/10.1145/2939672.2939754, PMID: 27853626, PMCID: PMC5108654
John J. Gumperz. 1992. Contextualization revisited. In P. Auer and A. Di Luzio, editors, The Contextualization of Language, pages 39–54. DOI: https://doi.org/10.1075/pbns.22.04gum
Gurobi Optimization Inc. 2015. Gurobi optimizer
reference manual.
Will Hamilton, Zhitao Ying, and Jure Leskovec.
2017. Inductive representation learning on large
graphs. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 1024–1034. Curran Associates, Inc.
Rujun Han, Qiang Ning, and Nanyun Peng.
2019. Joint event and temporal relation extrac-
tion with shared representations and structured
prediction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 434–444, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1041
Seyed Mehran Kazemi and David Poole. 2018. RelNN: A deep neural model for relational learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 6367–6375. AAAI Press.
Eliyahu Kiperwasser and Yoav Goldberg. 2016.
Simple and accurate dependency parsing using
bidirectional LSTM feature representations.
Transactions of the Association for Computational Linguistics, 4:313–327. DOI: https://doi.org/10.1162/tacl_a_00101
Thomas N. Kipf and Max Welling. 2017.
Semi-supervised classification with graph
convolutional networks. In 5th International
Conference on Learning Representations, ICLR
2017, Toulon, 法国, 四月 24-26, 2017,
Conference Track Proceedings.
Parisa Kordjamshidi, Dan Roth, and Hao Wu.
2015. Saul: Towards declarative learning based programming. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 1844–1851.
Tatsuki Kuribayashi, Hiroki Ouchi, Naoya
Inoue, Paul Reisert, Toshinori Miyoshi, Jun
Suzuki, and Kentaro Inui. 2019. An empirical
study of span representations in argumenta-
tion structure parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4691–4698, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1464
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270, San Diego, California. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N16-1030
I-Ta Lee and Dan Goldwasser. 2019. Multi-relational script learning for discourse relations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4214–4226, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1413
Chang Li, Aldo Porco, and Dan Goldwasser. 2018. Structured representation learning for online debate stance prediction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3728–3739, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Shulin Liu, Yubo Chen, Shizhu He, Kang Liu, and Jun Zhao. 2016. Leveraging FrameNet to improve automatic event detection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2134–2143, Berlin, Germany. Association for Computational Linguistics.
Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074, Berlin, Germany. Association for Computational Linguistics.
Chaitanya Malaviya, Matthew R. Gormley,
and Graham Neubig. 2018. Neural factor graph models for cross-lingual morphological tagging. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2653–2663, Melbourne, Australia. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P18-1247
Robin Manhaeve, Sebastijan Dumancic, Angelika
Kimmig, Thomas Demeester, and Luc De
Raedt. 2018. DeepProbLog: Neural probabilistic logic programming. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 3749–3759. Curran Associates, Inc.
Jiayuan Mao, Chuang Gan, Pushmeet Kohli,
Joshua B. Tenenbaum, and Jiajun Wu. 2019.
The neuro-symbolic concept learner: Interpret-
ing scenes, 字, and sentences from natural
supervision. In 7th International Conference on Learning Representations, ICLR 2019.
Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, and Marco Gori. 2019. Integrating learning and reasoning with deep logic models. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Würzburg, Germany, September 16-20, 2019, Proceedings, Part II, pages 517–532. DOI: https://doi.org/10.1007/978-3-030-46147-8_31
André F. T. Martins, Mário A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, and
Eric P. Xing. 2015. AD3: Alternating directions dual decomposition for MAP inference in graphical models. Journal of Machine Learning Research, 16(16):495–545.
Andrew McCallum, Karl Schultz, and Sameer Singh. 2009. Factorie: Probabilistic programming via imperatively defined factor graphs. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1249–1257. Curran Associates, Inc.
Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444. DOI: https://doi.org/10.1146/annurev.soc.27.1.415
Brian Milch, Bhaskara Marthi, Stuart Russell,
David Sontag, Daniel L. Ong, and Andrey
Kolobov. 2005. BLOG: Probabilistic models with unknown objects. In (IJCAI '05) Nineteenth International Joint Conference on Artificial Intelligence.
Vlad Niculae, Joonsuk Park, and Claire Cardie.
2017. Argument mining with structured SVMs
and RNNs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 985–995, Vancouver, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P17-1091
Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth.
2018. Joint reasoning for temporal and causal
relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2278–2288, Melbourne, Australia. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P18-1212
Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. 2016. Tri-party deep network representation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pages 1895–1901.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/D14-1162
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 701–710, New York, NY, USA. ACM. DOI: https://doi.org/10.1145/2623330.2623732
Matthew Peters, Mark Neumann, Mohit Iyyer,
Matt Gardner, Christopher Clark, Kenton Lee,
and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N18-1202
Peter Potash, Alexey Romanov, and Anna
Rumshisky. 2017. Here’s my point: Joint
pointer architecture for argument mining. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1364–1373, Copenhagen, Denmark. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D17-1143
Matthew Richardson and Pedro M. Domingos.
2006. Markov logic networks. Machine Learning, 62(1–2):107–136. DOI: https://doi.org/10.1007/s10994-006-5833-1
Nick Rizzolo and Dan Roth. 2010. Learning based Java for rapid development of NLP systems. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end differentiable proving. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3788–3800. Curran Associates, Inc.
Ali Sadeghian, Mohammadreza Armandpour,
Patrick Ding, and Daisy Zhe Wang. 2019.
DRUM: End-to-end differentiable rule mining on knowledge graphs. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 15347–15357. Curran Associates, Inc.
Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezný, Steven Schockaert, and Ondrej Kuzelka. 2018. Lifted relational neural networks: Efficient learning of latent relational structure. Journal of Artificial Intelligence Research, 62:69–100. DOI: https://doi.org/10.1613/jair.1.11203
Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. 2015b. Joint models of disagreement and stance in online debate. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 116–125, Beijing, China. Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/P15-1012
Christian Stab and Iryna Gurevych. 2017. Parsing argumentation structures in persuasive essays. Computational Linguistics, 43(3):619–659. DOI: https://doi.org/10.1162/COLI_a_00295
Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin, and Julian Brooke. 2017. Joint sentence-document model for manifesto text analysis. In Proceedings of the Australasian Language Technology Association Workshop 2017, pages 25–33, Brisbane, Australia.
Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge graph embedding by relational rotation in complex space. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015, pages 1067–1077. DOI: https://doi.org/10.1145/2736277.2741093
Théo Trouillon, Johannes Welbl, Sebastian Riedel, Eric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2071–2080, New York, New York, USA. PMLR.
Cunchao Tu, Han Liu, Zhiyuan Liu, and Maosong Sun. 2017. CANE: Context-aware network embedding for relation modeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1722–1731, Vancouver, Canada. Association for Computational Linguistics.
Iulia Turc, Ming-Wei Chang, Kenton Lee,
and Kristina Toutanova. 2019. Well-read
students learn better: On the importance of
pre-training compact models. arXiv preprint arXiv:1908.08962v2.
Petar Veliˇckovi´c, Guillem Cucurull, Arantxa
Casanova, Adriana Romero, Pietro Lio,
and Yoshua Bengio. 2017. Graph attention
networks. arXiv preprint arXiv:1710.10903.
Marilyn Walker, Pranav Anand, Rob Abbott,
and Ricky Grant. 2012. Stance classification
using dialogic properties of persuasion. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 592–596, Montréal, Canada. Association for Computational Linguistics.
Hai Wang and Hoifung Poon. 2018. Deep prob-
abilistic logic: A unifying framework for
indirect supervision. In Proceedings of the 2018
Conference on Empirical Methods in Natu-
ral Language Processing, pages 1891–1902,
Brussels, Belgium. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D18-1215
William Yang Wang, Kathryn Mazaitis, and William W. Cohen. 2013. Programming with personalized PageRank: A locally groundable first-order probabilistic logic. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 2129–2138. ACM. DOI: https://doi.org/10.1145/2505515.2505573
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Québec City, Québec, Canada, pages 1112–1119.
David Weiss, Chris Alberti, Michael Collins, and Slav Petrov. 2015. Structured training for neural network transition-based parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 323–333, Beijing, China. Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/P15-1032, PMID: 26003913, PMCID: PMC4695984
Han Xiao, Minlie Huang, Lian Meng, and Xiaoyan Zhu. 2017. SSP: Semantic space projection for knowledge graph embedding with text descriptions. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 3104–3110.
Fan Yang, Zhilin Yang, and William W. Cohen. 2017. Differentiable learning of logical rules for knowledge base reasoning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2319–2328. Curran Associates, Inc.
Xiao Zhang and Dan Goldwasser. 2019. Sentiment tagging with partial labels using modular architectures. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 579–590, Florence, Italy. Association for Computational Linguistics.
Hao Zhou, Yue Zhang, Shujian Huang, and Jiajun Chen. 2015. A neural probabilistic structured-prediction model for transition-based dependency parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1213–1222, Beijing, China. Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/P15-1117, PMCID: PMC4674637
A DRaiL Programs

Figure 6: DRAIL programs.

C Code Snippets
We include code snippets to show how to load
data into DRAIL (Figure 7-a), as well as
how to define a neural architecture (Figure 7-b).
Neural architectures and feature functions can be
programmed by creating Python classes, and the
module and classes can be directly specified in
the DRAIL program (lines 13, 14, 24, and 29 in
Figure 7-a). A hypothetical sketch of such a class
is shown below.
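Since Figure 7 is not reproduced here, the following sketch is purely hypothetical: it illustrates the kind of PyTorch class one might write and then reference by module and class name from a DRaiL program. The class name, constructor arguments, and forward signature are our assumptions, not DRaiL's actual API.

```python
# Hypothetical sketch only: names below are illustrative assumptions,
# not DRaiL's API. It shows a small feed-forward scorer over
# concatenated entity encodings that a rule could reference.
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Scores a relation between two encoded entities."""

    def __init__(self, input_dim: int = 768, hidden_dim: int = 128,
                 num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, head_enc: torch.Tensor,
                tail_enc: torch.Tensor) -> torch.Tensor:
        # Concatenate the two entity encodings and score each label.
        return self.net(torch.cat([head_enc, tail_enc], dim=-1))
```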
Figure 7: Code snippets.
B Hyperparameters
For BERT, we use the default parameters.
Task                   Param          Search Space                         Selected Value
Open Domain (local)    Learning Rate  2e-6, 5e-6, 2e-5, 5e-5               2e-5
                       Batch size     32 (Max. Mem)                        32
                       Patience       1, 3, 5                              3
                       Optimizer      SGD, Adam, AdamW                     AdamW
                       Hidden Units   128, 512                             512
                       Non-linearity  -                                    ReLU
Open Domain (global)   Learning Rate  2e-6, 5e-6, 2e-5, 5e-5               2e-6
                       Batch size     -                                    Full instance
Stance Pred. (local)   Learning Rate  2e-6, 5e-6, 2e-5, 5e-5               5e-5
                       Patience       1, 3, 5                              3
                       Batch size     16 (Max. Mem)                        16
                       Optimizer      SGD, Adam, AdamW                     AdamW
Stance Pred. (global)  Learning Rate  2e-6, 5e-6, 2e-5, 5e-5               2e-6
                       Batch size     -                                    Full instance
Arg. Mining (local)    Learning Rate  1e-4, 5e-4, 5e-3, 1e-3, 5e-2, 1e-2   5e-2
                       Patience       5, 10, 20                            20
                       Batch size     16, 32, 64, 128                      64
                       Dropout        0.01, 0.05, 0.1                      0.05
                       Optimizer      SGD, Adam, AdamW                     SGD
                       Hidden Units   128, 512                             128
                       Non-linearity  -                                    ReLU
Arg. Mining (global)   Learning Rate  1e-4, 5e-4, 5e-3, 1e-3, 5e-2, 1e-2   1e-4
                       Patience       5, 10, 20                            10
                       Batch size     -                                    Full instance

Table 9: Hyper-parameter tuning.