Modeling Content and Context with Deep Relational Learning
Maria Leonor Pacheco and Dan Goldwasser
Department of Computer Science
Purdue University
West Lafayette, IN 47907
{pachecog, dgoldwas}@purdue.edu
Astratto
Building models for realistic natural language
tasks requires dealing with long texts and ac-
counting for complicated structural depen-
dencies. Neural-symbolic representations have
emerged as a way to combine the reasoning
capabilities of symbolic methods, with the
expressiveness of neural networks. Tuttavia,
most of the existing frameworks for combining
neural and symbolic representations have been
designed for classic relational learning tasks
that work over a universe of symbolic entities
and relations. in questo documento, we present DRAIL,
an open-source declarative framework for spe-
cifying deep relational models, designed to
support a variety of NLP scenarios. Our frame-
work supports easy integration with expressive
language encoders, and provides an interface to
study the interactions between representation,
inference and learning.
1 introduzione
Understanding natural language interactions in
realistic settings requires models that can deal with
noisy textual
the depen-
inputs, reason about
dencies between different textual elements, E
leverage the dependencies between textual content
and the context from which it emerges. Work in
linguistics and anthropology has defined context
as a frame that surrounds a focal communicative
event and provides resources for its interpretation
(Gumperz, 1992; Duranti and Goodwin, 1992).
As a motivating example, consider the interac-
tions in the debate network described in Figure 1.
Given a debate claim (t1), and two consecutive
posts debating it (p1, p2), we define a textual in-
ference task, determining whether a pair of text
elements hold the same stance in the debate
(denoted using the relation Agree(X, Y)). Questo
task is similar to other textual inference tasks
100
(Bowman et al., 2015) that have been successfully
approached using complex neural representations
(Peters et al., 2018; Devlin et al., 2019). In add-
ition, we can leverage the dependencies between
these decisions. Per esempio, assuming that one
post agrees with the debate claim (Agree(t1,
p2)), and the other one does not (¬Agree(t1, p1)),
the disagreement between the two posts can be
inferred: ¬Agree(t1, p1) ∧ Agree(t1, p2) → ¬
Agree(p1, p2). Finalmente, we consider the social
context of the text. The disagreement between the
posts can reflect a difference in the perspectives
their authors hold on the issue. This informa-
tion might not be directly observed, but it can be
inferred using the authors’ social interactions and
behavior, given the principle of social homophily
(McPherson et al., 2001), stating that people with
strong social ties are likely to hold similar views
and authors’ perspectives can be captured by rep-
resenting their social interactions. Exploiting this
information requires models that can align the
social representation with the linguistic one.
Motivated by these challenges, we introduce
DRAIL1, a Deep Relational Learning framework,
which uses a combined neuro-symbolic repre-
sentation for modeling the interaction between
multiple decisions in relational domains. Similar
to other neuro-symbolic approaches (Mao et al.,
2019; Cohen et al., 2020), our goal is to exploit
the complementary strengths of the two modeling
paradigms. Symbolic representations, used by
logic-based systems and by probabilistic graphical
models (Richardson and Domingos, 2006; Bach
et al., 2017), are interpretable, and allow domain
experts to directly inject knowledge and constrain
the learning problem. Neural models capture de-
pendencies using the network architecture and are
better equipped to deal with noisy data, ad esempio
testo. Tuttavia, they are often difficult to interpret
and constrain according to domain knowledge.
1https://gitlab.com/purdueNlp/DRaiL/.
Operazioni dell'Associazione per la Linguistica Computazionale, vol. 9, pag. 100–119, 2021. https://doi.org/10.1162/tacl a 00357
Redattore di azioni: Hoifung Poon. Lotto di invio: 6/2020; Lotto di revisione: 10/2020; Pubblicato 3/2021.
C(cid:13) 2021 Associazione per la Linguistica Computazionale. Distribuito sotto CC-BY 4.0 licenza.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
texts reflecting these ideologies, by exploiting
the relations that bridge social and linguistic
informazione (Guarda la figura 1).
To demonstrate DRAIL’s modeling approach,
we introduce the task of open-domain stance
prediction with social context, which combines
social network analysis and textual inference over
complex opinionated texts, as shown in Figure 1.
We complement our evaluation of DRAIL with two
additional tasks, issue-specific stance prediction,
where we identify the views expressed in debate
forums with respect to a set of fixed issues (Walker
et al., 2012), and argumentation mining (Stab
and Gurevych, 2017), a document-level discourse
analysis task.
2 Related Work
In this section, we survey several lines of work
dealing with symbolic, neural, and hybrid repre-
sentations for relational learning.
2.1 Languages for Graphical Models
Several high-level languages for specifying graph-
ical models have been suggested. BLOG (Milch
et al., 2005) and CHURCH (Goodman et al., 2008)
were suggested for generative models. For discri-
minative models, we have Markov Logic Net-
works (MLNs) (Richardson and Domingos, 2006)
and Probabilistic Soft Logic (PSL) (Bach et al.,
2017). Both PSL and MLNs combine logic and
probabilistic graphical models in a single repre-
sentation, where each formula is associated with
a weight, and the probability distribution over
possible assignments is derived from the weights
of the formulas that are satisfied by such assign-
menti. Like DRAIL, PSL uses formulas in clausal
form (specifically collections of horn clauses).
The main difference between DRAIL and these
languages is that, in addition to graphical models,
it uses distributed knowledge representations
to represent dependencies. Other discriminative
methods include FACTORIE (McCallum et al.,
2009), an imperative language to define factor
graphs, Constrained Conditional Models (CCMs)
(Rizzolo and Roth, 2010; Kordjamshidi et al.,
2015) an interface to enhance linear classifiers
with declarative constraints, and ProPPR (Wang
et al., 2013) a probabilistic logic for large data-
bases that approximates local groundings using a
variant of personalized PageRank.
Figura 1: Example debate.
Our main design goal in DRAIL is to provide
a generalized tool, specifically designed for NLP
compiti. Existing approaches designed for classic
relational learning tasks (Cohen et al., 2020),
such as knowledge graph completion, are not
equipped to deal with the complex linguistic input,
whereas others are designed for very specific NLP
settings such as word-based quantitative reason-
ing problems (Manhaeve et al., 2018) or aligning
images with text (Mao et al., 2019). We discuss the
differences between DRAIL and these approaches
in Section 2. The examples in this paper focus
on modelings various argumentation mining tasks
and their social and political context, but the same
principles can be applied to wide array of NLP
tasks with different contextualizing information,
such as images that appear next to the text, O
prosody when analyzing transcribed speech, A
name a few examples.
DRAIL uses a declarative language for de-
fining deep relational models. Similar to other
declarative languages (Richardson and Domingos,
it allows users to
2006; Bach et al., 2017),
inject their knowledge by specifying dependencies
between decisions using first-order logic rules,
which are later compiled into a factor graph
with neural potentials. In addition to probabilistic
inference, DRAIL also models dependencies using
a distributed knowledge representation, denoted
RELNETS, which provides a shared representation
space for entities and their relations, trained using
a relational multi-task learning approach. Questo
provides a mechanism for explaining symbols, E
aligning representations from different modali-
ties. Following our running example, ideological
standpoints, such as Liberal or Conservative,
are discrete entities embedded in the same space
as textual entities and social entities. These en-
tities are initially associated with users, Tuttavia
using RELNETS this information will propagate to
101
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
2.2 Node Embedding and Graph Neural Nets
A recent alternative to graphical models is to use
neural nets to represent and learn over relational
dati, represented as a graph. Similar to DRAIL’s
the learned node representation can
RELNETS,
be trained by several different prediction tasks.
Tuttavia, unlike DRAIL, these methods do not
use probabilistic inference to ensure consistency.
Node embeddings approaches (Perozzi et al.,
2014; Tang et al., 2015; Pan et al., 2016; Grover
and Leskovec, Grover and Leskovec, 2016;
Tu et al., 2017) learn a feature representation
for nodes capturing graph adjacency information,
such that the similarity in the embedding space of
any two nodes is proportional
to their graph
distance and overlap in neighboring nodes. Some
frameworks (Pan et al., 2016; Xiao et al., 2017;
Tu et al., 2017) allow nodes to have textual
properties, which provide an initial feature repre-
sentation when learning to represent the graph
relations. When dealing with multi-relational data,
such as knowledge graphs, both the nodes and
the edge types are embedded (Bordes et al., 2013;
Wang et al., 2014; Trouillon et al., 2016; Sole et al.,
2019). Finalmente, these methods learn to represent
nodes and relations based on pair-wise node
relations, without representing the broader graph
context in which they appear. Graph neural nets
(Kipf and Welling, 2017; Hamilton et al., 2017;
Veliˇckovi´c et al., 2017) create contextualized
node representations by recursively aggregating
neighboring nodes.
2.3 Hybrid Neural-Symbolic Approaches
Several recent systems explore ways to combine
neural and symbolic representations in a unified
modo. We group them into five categories.
Lifted rules to specify compositional nets.
These systems use an end-to-end approach and
learn relational dependencies in a latent space.
Lifted Relational Neural Networks (LRNNs)
(Sourek et al., 2018) and RelNNs (Kazemi and
Poole, 2018) are two examples. These systems
map observed ground atoms, facts, and rules to
specific neurons in a network and define compo-
sition functions directly over them. While they
provide for a modular abstraction of the relational
inputs, they assume all inputs are symbolic and do
not leverage expressive encoders.
Differentiable inference. These systems iden-
tify classes of logical queries that can be compiled
into differentiable functions in a neural network
infrastructure. In this space we have Tensor
Logic Networks (TLNs) (Donadello et al., 2017)
and TensorLog (Cohen et al., 2020). Symbols are
represented as row vectors in a parameter matrix.
The focus is on implementing reasoning using a
series of numeric functions.
Rule induction from data. These systems
are designed for inducing rules from symbolic
knowledge bases, which is not in the scope of our
framework. In this space we find Neural Theorem
Provers (NTPs) (Rockt¨aschel and Riedel, 2017),
Neural Logic Programming (Yang et al., 2017),
DRUM (Sadeghian et al., 2019) and Neural Logic
Machines (NLMs) (Dong et al., 2019). NTPs use
a declarative interface to specify rules that add
inductive bias and perform soft proofs. The other
approaches work directly over the database.
Deep classifiers and probabilistic inference.
These systems propose ways to integrate prob-
abilistic inference and neural networks for diverse
learning scenarios. DeepProbLog (Manhaeve et al.
20180 extends the probabilistic logic program-
ming language ProbLog to handle neural
predicates. They are able to learn probabilities for
atomic expressions using neural networks. IL
input data consists of a combination of feature
vectors for the neural predicates, together with
other probabilistic facts and clauses in the logic
program. Targets are only given at the output
side of the probabilistic reasoner, allowing them
to learn each example with respect to a single
query. D'altra parte, Deep Probabilistic
Logic (DPL) (Wang and Poon 2018) combines
neural networks with probabilistic logic for indi-
rect supervision. They learn classifiers using neu-
ral networks and use probabilistic logic to intro-
duce distant supervision and labeling functions.
Each rule is regarded as a latent variable, and the
logic defines a joint probability distribution over
all labeling decisions. Then, the rule weights and
the network parameters are learned jointly using
variational EM. In contrasto, DRAIL focuses on
learning multiple interdependent decisions from
dati, handling and requiring supervision for all
unknown atoms in a given example. Lastly, Deep
Logic Models (DLMs) (Marra et al., 2019) learn a
set of parameters to encode atoms in a probabilistic
logic program. Similarly to Donadello et al. (2017)
102
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Symbolic Features
Neural Features
Rule
Induction Symbols
Embed. End-to-end Backprop. to Architecture Multi-Task Open
Fonte
Encoders
Agnostic
Apprendimento
Neural
Symbolic Raw Decla- Prob/Logic
Inference
Inputs
Inputs
rative
System
MLN
FACTORIE
CCM
PSL
LRNNs
RelNNs
LTNs
TensorLog
NTPs
Neural LP
DRUM
NLMs
DeepProbLog
DPL
DLMs
DRAIL
Tavolo 1: Comparing systems.
and Cohen et al. (2020), they use differentiable
inference, allowing the model to be trained end-
to-end. Like DRAIL, DLMs can work with diverse
neural architectures and backpropagate back to
the base classifiers. The main difference between
DLMs and DRAIL is that DRAIL ensures repre-
sentation consistency of entities and relations
across all learning tasks by employing RELNETS.
Deep structured models. More generally,
deep structured prediction approaches have been
successfully applied to various NLP tasks such as
named entity recognition and dependency parsing
(Chen and Manning, 2014; Weiss et al., 2015; Mamma
and Hovy, 2016; Lample et al., 2016; Kiperwasser
and Goldberg, 2016; Malaviya et al., 2018).
When the need arises to go beyond sentence-
level, some works combine the output scores of
independently trained classifiers using inference
(Beltagy et al., 2014; ?; Liu et al., 2016;
Subramanian et al., 2017; Ning et al., 2018),
whereas others implement joint learning for their
specific domains (Niculae et al., 2017; Han et al.,
2019). Our main differentiating factor is that we
provide a general interface that leverages first
order logic clauses to specify factor graphs and
express constraints.
To summarize these differences, we outline a
feature matrix in Table 1. Given our focus in
NLP tasks, we require a neural-symbolic system
Quello (1) allows us to integrate state-of-the-art text
encoders and NLP tools, (2) supports structured
prediction across long texts, (3) lets us combine
several modalities and their representations (per esempio.,
social and textual information), E (4) results in
an explainable model where domain constraints
can be easily introduced.
3 The DRAIL Framework
DRAIL was designed for supporting complex
NLP tasks. Problems can be broken down into
domain-specific atomic components (which could
be words, sentences, paragraphs or full documents,
depending on the task), and dependencies be-
tween them, their properties and contextualizing
information about them can be explicitly modeled.
In DRAIL, dependencies can be modeled over
the predicted output variables (similar to other
probabilistic graphical models), as well as over
the neural representation of the atoms and their
relationships in a shared embedding space. Questo
section explains the framework in detail. We begin
with a high-level overview of DRAIL and the
process of moving from a declarative definition to
a predictive model.
A DRAIL task is defined by specifying a
finite set of entities and relations. Entities are
either discrete symbols (per esempio., POS tags, ideologies,
specific issue stances), or attributed elements with
complex internal information (per esempio., documents,
utenti). Decisions are defined using rule templates,
formatted as Horn clauses: tLH ⇒ tRH , Dove
tLH (body) is a conjunction of observed and
predicted relations, and tRH (head)
IL
produzione
IL
debate prediction task in Figure 1, it consists
of several sub-tasks, involving textual inference
(Agree(t1, t2)), social relations (VoteFor(tu, v)) E
their combination (Agree(tu, T)). We illustrate how
È
relation to be learned. Consider
103
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
we describe the neural components and learning
procedures.
3.1 Modeling Language
We begin our description of DRAIL by defining
the templating language, consisting of entities,
relations, and rules, and explaining how these
elements are instantiated given relevant data.
Entities are named symbolic or attributed
elements. An example of a symbolic entity is a
political ideology (per esempio., Liberal or Conservative).
An example of an attributed entity is a user with
age, genere, and other profile information, O
a document associated with textual content. In
DRAIL entities can appear either as constants,
written as strings in double or single quote
(per esempio., “user1”) or as variables, which are
identifiers,
substituted with constants when
grounded. Variables are written using unquoted
upper case strings (per esempio., X, X1). Both constants and
variables are typed.
Relations are defined between entities and their
properties, or other entities. Relations are defined
using a unique identifier, a named predicate,
and a list of typed arguments. Atoms consist
of a predicate name and a sequence of entities,
consistent with the type and arity of the relation’s
argument list. If the atom’s arguments are all
constants, it is referred to as a ground atom. For
esempio, Agree(“user1”, “user2”) is a ground
atom representing whether “user1” E “user2”
are in agreement. When atoms are not grounded
(per esempio., Agree(X, Y)) they serve as placeholders for
all the possible groundings that can be obtained by
replacing the variables with constants. Relations
can either be closed (cioè., all of their atoms are
observed) or open, when some of the atoms can
be unobserved. In DRAIL, we use a question mark
? to denote unobserved relations. These relations
are the units that we reason over.
To help make these concepts concrete, con-
sider the following example analyzing stances in a
debate, as introduced in Figure 1. Primo, we define
the entities. User = {“u1”, “u2”}, Claim = {“t1”}
Post ={“p1”, “p2”}. Users are entities associ-
ated with demographic attributes and prefer-
enze. Claims are assertions over which users
debate. Posts are textual arguments that users
write to explain their position with respect
to the claim. We create these associations by
Figura 2: General overview of DRAIL.
to specify the task as a DRAIL program in Figure 2
(left), by defining a subset of rule templates to
predict these relations.
Each rule template is associated with a neural
architecture and a feature function, mapping the
initial observations to an input vector for each
neural net. We use a shared relational embedding
spazio, denoted RELNETS, to represent entities and
relations over them. As described in Figure 2
(“RelNets Layer”), each entity and relation type
is associated with an encoder,
trained jointly
across all prediction rules. This is a form of re-
lational multi-task learning, as the same entities
and relations are reused in multiple rules and
their representation is updated accordingly. Each
rule defines a neural net, learned over the relations
defined on the body. They they take a composition
of the vectors generated by the relations encoders
as an input (Figura 2, “Rule Layer”). DRAIL
is architecture-agnostic, and neural modules for
relations and rules can be specified
entities,
using PyTorch (code snippets can be observed in
Appendix C). Our experiments show that we can
use different architectures for representing text,
utenti, as well as for embedding discrete entities.
The relations in the Horn clauses can correspond
to hidden or observed information, and a specific
input is defined by the instantiations—or ground-
ings—of these elements. The collection of all rule
groundings results in a factor graph representing
our global decision, taking into account the con-
sistency and dependencies between the rules. Questo
modo, the final assignments can be obtained by
running an inference procedure. Per esempio,
the dependency between the views of users on
the debate topic (Agree(tu, T)) and the agreement
between them (VoteFor(tu, v)), is modeled as a
factor graph in Figure 2 (“Structured Inference
Layer”)).
Noi
In
Sezione 3.1. Then, in Sections 3.2, 3.3, E 4,
the DRAIL language
formalize
104
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
defining a set of relations, capturing author-
ship Author(Utente, Post), votes between users
VoteFor(Utente, Utente)?, and the position users, E
their posts, take with respect to to the debate
claim. Agree(Claim, Utente)?, Agree(Claim, Post)?.
The authorship relation is the only closed one, for
esempio, the atom: O = {Author(“u1”, “p1”)}.
Rules are functions that map literals (atoms or
their negation) to other literals. Rules in DRAIL are
defined using templates formatted as Horn clauses:
tLH ⇒ tRH, where tLH (body) is a conjunction
of literals, and tRH (head) is the output literal
to be predicted, and can only be an instance of
open relations. Horn clauses allow us to describe
structural dependencies as a collection of “if-then”
rules, which can be easily interpreted. For exam-
ple, Agree(X, C) ∧ VoteFor(Y, X) ⇒ Agree(Y, C) ex-
presses the dependency between votes and users
holding similar stances on a specific claim. Noi
note that rules can be rewritten in disjunctive form
by converting the logical implication into a dis-
junction between the negation of the body and the
head. Per esempio, the rule above can be rewritten
as ¬Agree(X, C) ∨ ¬VoteFor(Y, X) ∨ Agree(Y, C).
The DRAIL program consists of a set of rules,
which can be weighted (cioè., soft constraints), O
unweighted (cioè., hard constraints). Each weighted
rule template defines a learning problem, used
to score assignments to the head of the rule.
Because the body may contain open atoms,
each rule represents a factor function expressing
dependencies between open atoms in the body
and head. Unweighted rules, or constraints, shape
the space of feasible assignments to open atoms,
and represent background knowledge about the
domain.
Given the set of grounded atoms O, rules can
be grounded by substituting their variables with
constants, such that the grounded atoms corres-
pond to elements in O. This process results in a set
of grounded rules, each corresponding to a poten-
tial function or to a constraint. Together they
define a factor graph. Then, DRAIL finds the
optimally scored assignments for open atoms by
performing MAP inference. To formalize this
processi, we first make the observation that rule
groundings can be written as linear inequalities,
directly corresponding to their disjunctive form,
come segue:
yi +
X
i∈I +
R
X
i∈I −
R
(1 − yi) ≥ 1
(1)
105
(I −
Where I +
R ) correspond to the set of open
R
atoms appearing in the rule that are not negated
(rispettivamente, negated). Now, MAP inference
can be defined as a linear program. Each rule
grounding r, generated from template t(R), con
input features xr and open atoms yr defines the
potential
ψr(xr, yr) = min
yi +
X
i∈I +
R
X
i∈I −
R
(1 − yi), 1
(2)
added to the linear program with a weight wr.
Unweighted rule groundings are defined as
C(xc, yc) = 1 −
yi −
X
i∈I +
C
X
i∈I −
C
(1 − yi)
(3)
with c(xc, yc) ≤ 0 added as a constraints to the
linear program. This way, the MAP problem can
be defined over the set of all potentials Ψ and the
set of all constraints C as
arg max
y∈ {0,1}N
P (sì|X) ≡ arg max
y∈ {0,1}n X
ψr,t∈Ψ
wr ψr(xr, yr)
such that c(xc, yc) ≤ 0; ∀c ∈ C
In addition to logical constraints, we also support
arithmetic constraints than can be written in the
form of linear combinations of atoms with an
inequality or an equality. Per esempio, we can
enforce the mutual exclusivity of liberal and
conservative ideologies for any user X by writing:
Ideology(X, “con”) + Ideology(X, “lib”) = 1
We borrow some additional syntax from PSL to
make arithmetic rules easier to use. Bach et al.
(2017) define a summation atom as an atom that
takes terms and/or sum variables as arguments.
A summation atom represents the summations of
ground atoms that can be obtained by substituting
individual variables and summing over all possible
constants for sum variables. Per esempio, we
could rewrite the above ideology constraint
as Ideology(X, +IO) = 1, where Ideology(X, +IO)
represents the summation of all atoms with
predicate Ideology that share variable X.
DRAIL uses two solvers, Gurobi
(Gurobi
Optimization, 2015) and AD3 (Martins et al.,
2015)
for exact and approximate inference,
rispettivamente.
To ground DRAIL programs in data, we create
an in-memory database consisting of all relations
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
expressed in the program. Observations associated
with each relation are provided in column sepa-
rated text files. DRAIL’s compiler instantiates the
program by automatically querying the database
and grounding the formatted rules and constraints.
3.2 Neural Components
Let r be a rule grounding generated from template
T, where t is tied to a neural scoring function
Φt and a set of parameters θt (Rule Layer in
Figura 2). In the previous section, we defined the
MAP problem for all potentials ψr(X, sì) ∈ Ψ in a
DRAIL program, where each potential has a weight
wr. Consider the following scoring function:
wr = Φt(xr, yr; θt) = Φt(xrel0, . . . , xreln−1; θt)
(4)
Notice that all potentials generated by the same
template share parameters. We define each scoring
function Φt over the set of atoms on the left
hand side of the rule template. Let t = rel0 ∧
rel1 ∧ . . . ∧ reln−1 ⇒ reln be a rule template.
Each atom reli is composed of a relation type, its
arguments and feature vectors for them, come mostrato
in Figure 2, “Input Layer”.
Given that a DRAIL program is composed of
many competing rules over the same problem, we
want to be able to share information between the
different decision functions. For this purpose, we
introduce RELNETS.
3.3 RELNETS
A DRAIL program often uses the same entities and
relations in multiple different rules. The symbolic
aspect of DRAIL allows us to constrain the values
of open relations, and force consistency across all
their occurrences. The neural aspect, as defined in
Eq. 4, associates a neural architecture with each
rule template, which can be viewed as a way to
embed the output relation.
Quello
the fact
We want
to exploit
there are
repeating occurrences of entities and relations
across different rules. Given that each rule defines
a learning problem, sharing parameters allows us
to shape the representations using complementary
learning objectives. This form of relational multi-
task learning is illustrated it in Figure 2, “RelNets
Layer”.
We formalize this idea by introducing relation-
specific and entity-specific encoders and their
parameters (φrel; θrel) E (φent; θent), which are
106
reused in all rules. As an example, let’s write
the formulation for the rules outlined in Figure 2,
where each relation and entity encoder is defined
over the set of relevant features.
wr0 = Φt0(φdebates(φuser, φtext))
wr1 = Φt1(φagree(φuser, φtext), φvotefor(φuser, φuser))
Note that entity and relation encoders can be
arbitrarily complex, depending on the application.
Per esempio, when dealing with text, we could
use BiLSTMs or a BERT encoder.
Our goal when using RELNETS is to learn entity
representations that capture properties unique to
their types (per esempio., utenti, issues), as well as relational
patterns that contextualize entities, allowing them
to generalize better. We make the distinction
between raw (or attributed) entities and symbolic
entities. Raw entities are associated with rich, yet
unstructured, information and attributes, ad esempio
text or user profiles. D'altra parte, symbolic
entities are well-defined concepts, and are not
associated with additional information, ad esempio
political ideologies (per esempio., liberal) and issues (per esempio.,
gun-control). With this consideration, we identify
two types of representation learning objectives:
Embed Symbol / Explain Data: Aligns the
embedding of symbolic entities and raw entities,
grounding the symbol in the raw data, and us-
ing the symbol embedding to explain properties
of previously unseen raw-entity instances. For
esempio, aligning ideologies and text to (1) obtain
an ideology embedding that
to the
statements made by people with that ideology,
O (2) interpret text by providing a symbolic label
for it.
is closest
Translate / Correlate: Aligns the represen-
tation of pairs of symbolic or raw entities. For
esempio, aligning user representations with text,
to move between social and textual information, COME
shown in Figure 1, “Social-Linguistic Relations”.
Or capturing the correlation between sym-
bolic judgements like agreement and matching
ideologies.
4 Apprendimento
The scoring function used for comparing output
assignments can be learned locally for each
rule separately, or globally, by considering the
dependencies between rules.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Global Learning The global approach uses
inference to ensure that the parameters for all
weighted rule templates are consistent across all
decisions. Let Ψ be a factor graph with potentials
{ψr} ∈ Ψ over the all possible structures Y .
Let θ = {θt} be a set of parameter vectors, E
Φt(xr, yr; θt) be the scoring function defined for
potential ψr(xr, yr). Here ˆy ∈ Y corresponds
to the current prediction resulting from the MAP
inference procedure and y ∈ Y corresponds to the
gold structure. We support two ways to learn θ:
(1) The structured hinge loss
max(0, max
ˆy∈Y
(∆(ˆy, sì) +
−
Φt(xr, ˆyr; θt))
Φt(xr, yr; θt)
X
ψr∈Ψ
X
ψr∈Ψ
(5)
(2) The general CRF loss
−log p(sì|X)= −log
1
Z(X) Y
ψr∈Ψ
esp
Φt(xr, yr; θt)
(cid:9)
(cid:8)
= −
X
ψr ∈Ψ
Φt(xr, yr; θt) + log Z(X)
(6)
Where Z(X) is a global normalization term
computed over the set of all valid structures Y .
Z(X) =
X
y’∈Y
Y
ψr∈Ψ
esp
Φt(xr, y′
(cid:8)
R; θt)
(cid:9)
When inference is intractable, approximate
inference (per esempio., AD3) can be used to obtain ˆy.
To approximate the global normalization term
Z(X) in the general CRF case, we follow Zhou
et al. (2015); Andor et al. (2016) and keep a pool
βk of k of high-quality feasible solutions during
inference. This way, we can sum over the solutions
in the pool to approximate the partition function
Φt(xr, y′
Py’∈βk Qψr∈Ψ exp
in questo documento, we use the structured hinge loss
for most experiments, and include a discussion on
the approximated CRF loss in Section 5.7.
R; θt)
.
(cid:9)
(cid:8)
Inference The parameters
Joint
each
weighted rule template are optimized indepen-
dently. Following Andor et al. (2016), we show
that joint inference serves as a way to greedily
approximate the CRF loss, where we replace the
for
107
normalization term in Eq. (6) with a greedy
approximation over local normalization as:
1
Qψr ∈Ψ ZL(xr) Y
Φt(xr, yr; θt) +
ψr ∈Ψ
−log
= −
X
ψr ∈Ψ
esp
Φt(xr, yr; θt)
(cid:9)
(cid:8)
X
ψr ∈Ψ
log ZL(xr)
(7)
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
where ZL(xr) is computed over all the valid
assignments y′
r for each factor ψr. We refer to
models that use this approach as JOINTINF.
ZL(xr) =
esp
(cid:8)
X
y′
R
Φt(xr, y′
R; θt)
(cid:9)
5 Experimental Evaluation
We compare DRAIL to representative models from
each category covered in Section 2. Our goal is to
examine how different types of approaches cap-
ture dependencies and what are their limitations
when dealing with language interactions. These
baselines are described in Section 5.1. We also
evaluate different strategies using DRAIL in
Sezione 5.2.
We focus on three tasks: open debate stance
prediction (Sez. 5.3), issue-specific stance pre-
diction (Sez. 5.4) and argumentation mining
(Sez. 5.5), details regarding the hyper-parameters
used for all tasks can be found in Appendix B.
5.1 Baselines
End-to-end Neural Nets: We test all approaches
against neural nets trained locally on each task,
without explicitly modeling dependencies. In this
spazio, we consider two variants: INDNETS, Dove
each component of the problem is represented
using an independent neural network, and E2E,
where the features for the different components
are concatenated at the input and fed to a single
neural network.
Relational Embedding Methods: Introduced
in Section 2.2,
these methods embed nodes
and edge types for relational data. They are
typically designed to represent symbolic entities
and relations. Tuttavia, because our entities can be
defined by raw textual content and other features,
we define the relational objectives over our
encoders. This adaptation has proven successful
for domains dealing with rich textual information
three
(Lee and Goldwasser, 2019). We test
relational knowledge objectives: TransE (Bordes
et al., 2013), ComplEx (Trouillon et al., 2016),
and RotatE (Sole et al., 2019). Limitations: (1)
These approaches cannot constrain the space using
domain knowledge, E (2) they cannot deal with
relations involving more than two entities, limiting
their applicability to higher order factors.
Probabilistic Logics: We compare to PSL
(Bach et al., 2017), a purely symbolic probabilistic
logic, and TensorLog (Cohen et al., 2020), UN
neuro-symbolic one. In entrambi i casi, we instantiate
the program using the weights learned with our
base encoders. Limitations: These approaches do
not provide a way to update the parameters of the
base classifiers.
5.2 Modeling Strategies
Local vs. Global Learning: The trade-off be-
tween local and global learning has been explored
for graphical models (MEMM vs. CRF), and for
deep structured prediction (Chen and Manning,
2014; Andor et al., 2016; Han et al., 2019).
Although local
the learned
scoring functions might not be consistent with
the correct global prediction. Following (Han
et al., 2019), we initialize the parameters using lo-
cal models.
learning is faster,
RELNETS: We will show the advantage of
having relational representations that are shared
across different decisions, in contrast to having
independent parameters for each rule. Note that
in all cases, we will use the global learning objec-
tive to train RELNETS.
Modularity: Decomposing decisions
into
relevant modules has been shown to simplify the
learning process and lead to better generalization
(Zhang and Goldwasser, 2019). We will contrast
the performance of modular and end-to-end mod-
els to represent text and user information when
predicting stances.
Representation Learning and Interpretabil-
ità: We will do a qualitative analysis to show how
we are able to embed symbols and explain data
by moving between symbolic and sub-symbolic
representations, as outlined in Section 3.3.
5.3 Open Domain Stance Prediction
Traditionally, stance prediction tasks have focused
on predicting stances on a specific topic, ad esempio
abortion. Predicting stances for a different topic,
such as gun control, would require learning a new
108
Figura 3: DRAIL Program for O.D. Stance Prediction.
T: Thread, C: Claim, P: Post, U: Utente, V: Voter, IO: Ideology, UN,B:
Can be any in {Claim, Post, Utente}
model from scratch. In this task, we would like to
leverage the fact that stances in different domains
are correlated. Instead of using a pre-defined set
of debate topics (cioè., symbolic entities) we define
the prediction task over claims, expressed in text,
specific to each debate. Concretely, each debate
will have a different claim (cioè., different value for
C in the relation Claim(T, C), where T corresponds
to a debate thread). We refer to these settings as
Open-Domain and write down the task in Figure 3.
In addition to the textual stance prediction problem
(r0), where P corresponds to a post, we represent
utenti (U) and define a user-level stance prediction
problem (r1). We assume that additional users read
the posts and vote for content that supports their
views, resulting in another prediction problem
(r2,r3). Then, we define representation learning
compiti, which align symbolic (ideology, defined as
IO) and raw (users and text) entities (r4-r7). Finalmente,
we write down all dependencies and constrain the
final prediction (c0-c7).
Dataset: We collected a set of 7,555 debates
from debate.org, containing a total of 42,245
posts across 10 broader political issues. For a
given issue, the debate topics are nuanced and
vary according to the debate question expressed in
testo (per esempio., Should semi-automatic guns be banned,
Conceal handgun laws reduce violent crime).
Debates have at least two posts, containing up
A 25 sentences each. In addition to debates and
posts, we collected the user profiles of all users
participating in the debates, as well as all users
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Model
Random
U
Hard
U
P
P
V
Local
INDNETS
E2E
TransE
Reln.
Emb. ComplEx
V
63.9 61.3 54.4 62.2 53.0 51.3
66.3 71.2 54.4 63.4 68.1 51.3
58.5 54.1 52.6 57.2 53.1 51.2
61.0 63.3 58.1 57.3 55.0 55.4
59.6 58.3 54.2 57.9 54.6 51.0
Prob.
78.7 77.5 55.4 72.6 71.8 52.6
Logic. TensorLog 72.7 71.9 56.2 70.0 67.4 55.8
80.2 79.2 54.4 76.9 75.5 51.3
80.7 79.5 55.6 75.2 74.0 52.5
81.0 79.5 55.8 75.3 74.0 53.0
81.9 80.4 57.0 78.0 77.2 53.7
E2E +Inf
JOINTINF
GLOBAL
RELNETS
RotatE
PSL
DRaiL
Local
AC
AC
DC
AC
DC
SC
Model
Random
U
Hard
U
P
P
V
V
INDNETS 63.9 61.3 54.4 62.2 53.0 51.3
66.3 71.2 54.4 63.4 68.1 51.3
E2E
JOINTINF 73.6 71.8
GLOBAL
73.6 72.0
RELNETS 73.8 72.0
JOINTINF 80.7 79.5
GLOBAL
81.4 79.9
RELNETS 81.8 80.1
JOINTINF 80.7 79.5 55.6 75.2 74.0 52.5
81.0 79.5 55.8 75.3 74.0 53.0
GLOBAL
RELNETS 81.9 80.4 57.0 78.0 77.2 53.7
69.0 67.2
69.0 67.2
71.7 69.5
75.6 74.4
75.8 74.6
77.8 76.4
–
–
–
–
–
–
–
–
–
–
–
–
Tavolo 2: General Results for Open Domain Stance Prediction (Left), Variations of the Model (Right). P:Post,
U:Utente, V:Voter
that cast votes for the debate participants. Profiles
ideology).
consist of attributes (per esempio., genere,
User data is considerably sparse. We create two
evaluation scenarios, random and hard. Nel
random split, debates are randomly divided into
ten folds of equal size. In the hard split, debates
are separated by political issue. This results in a
harder prediction problem, as the test data will not
share topically related debates with the training
dati. We perform 10-fold cross validation and
report accuracy.
Entity and Relation Encoders: We represent
posts and titles using a pre-trained BERT-small2
codificatore (Turc et al., 2019), a compact version of
the language model proposed by Devlin et al.
2019. For users, we use feed-forward computa-
tions with ReLU activations over the profile fea-
tures and a pre-trained node embedding (Grover
and Leskovec, 2016) over the friendship graph.
All relation and rule encoders are represented
as feed-forward networks with one hidden layer,
ReLU activations and a softmax on top. Note that
all of these modules are updated during learning.
Tavolo 2 (Left) shows results for all the models
described in Section 5.1. In E2E models, post and
user information is collapsed into a single mod-
ule (rule), whereas in INDNETS, JOINTINF, GLOBAL
and RELNETS they are modeled separately. Tutto
other baselines use the same underlying modular
encoders. We can appreciate the advantage of
relational embeddings in contrast to INDNETS for
user and voter stances, particularly in the case of
ComplEx and RotatE. We can attribute this to the
2We found negligible difference in performance between
BERT and BERT-small for this task, while obtaining a
considerable boost in speed.
fact that all objectives are trained jointly and entity
encoders are shared. Tuttavia, approaches that
explicitly model inference, like PSL, TensorLog,
and DRAIL outperform relational embeddings and
end-to-end neural networks. This is because they
enforce domain constraints.
We explain the difference between the per-
formance of DRAIL and the other probabilistic
logics by: (1) The fact that we use exact inference
instead of approximate inference, (2) PSL learns
to weight the rules without giving priority to a
particular task, whereas the JOINTINF model works
directly over the local outputs, and most impor-
tantly, (3) our GLOBAL and RELNETS models back-
propagate to the base classifiers and fine-tune
parameters using a structured objective.
In Table 2 (Right) we show different versions
of the DRAIL program, by adding or removing
certain constraints. AC models only enforce
author consistency, AC-DC models enforce both
author consistency and disagreement between
respondents, and finally, AC-DC-SC models in-
troduce social information by considering voting
behavior. We get better performance when we
model more contextualizing information for the
RELNETS case. This is particularly helpful in the
Hard case, where contextualizing information,
combined with shared representations, help the
model generalize to previously unobserved topics.
With respect to the modeling strategies listed in
Sezione 5.2, we can observe: (1) The advantage of
using a global learning objective, (2) the advantage
of using RELNETS to share information and (3) IL
advantage of breaking down the decision into
modules, instead of learning an end-to-end model.
Then, we perform a qualitative evaluation to
illustrate our ability to move between symbolic
109
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Issue Debate Statements
Guns
No gun laws should be passed restricting the right to bear arms
Gun control is an ineffective comfort tactic used by the government to fool the American people
Gun control is good for society
In the US handguns ought to be banned
The USA should ban most guns and confiscate them
Con Libt Mod Libl Pro
.98
.00
.00
.03
.22
.08
.60
.06
.14
.02
.01
.03
.99
.01
.00
.01
.02
.15
.93
.00
.01
.65
.06
.01
.00
Issue
Ideology Statements close in the embedding space
LGBT
Libl
Contro
gay marriage ought be legalized, gay marriage should be legalized, same-sex marriage should be federally legal
Leviticus 18:22 E 20:13 prove the anti gay marriage position, gay marriage is not bad, homosexuality is not a sin nor taboo
Tavolo 3: Representation Learning Objectives: Explain Data (Top) and Embed Symbol (Bottom).
Note that ideology labels were learned from user profiles, and do not necessarily represent the official stances of political
parties.
and raw information. Tavolo 3 (Top) takes a
set of statements and explains them by looking
at the symbols associated with them and their
score. For learning to map debate statements
to ideological symbols, we rely on the partial
supervision provided by the users that self-identify
with a political ideology and disclose it on their
public profiles. Note that we do not incorporate
any explicit expertise in political science to learn
to represent ideological information. We chose
statements with the highest score for each of the
ideologies. We can see that, nel contesto di
guns, statements that have to do with some form
of gun control have higher scores for the center-to-
left spectrum of ideological symbols (moderate,
liberal, progressive), whereas statements that
mention gun rights and the ineffectiveness of
gun control policies have higher scores for
conservative and libertarian symbols.
this evaluation,
in Table 3
(Bottom), we embed ideologies and find three
example statements that are close in the em-
bedding space. In the context of LGBT issues, we
find that statements closest to the liberal symbol
are those that support the legalization of same-
sex marriage, and frame it as a constitutional
issue. D'altra parte, the statements closest
to the conservative symbol, frame homosexuality
and same-sex marriage as a moral or religious
issue, and we find statements both supporting
and opposing same-sex marriage. This experiment
shows that our model is easy to interpret, E
provides an explanation for the decision made.
To complement
Finalmente, we evaluate our learned model over
entities that have not been observed during
training. To do this, we extract statements made by
three prominent politicians from ontheissues.org.
Then, we try to explain the politicians by looking
Questo
their predicted ideology. Results for
at
Model
AB
E
GM
GC
S
UN
S
UN
S
UN
S
UN
INDNETS
TransE
Local
Reln.
Embed. ComplEx
66.0 61.7 58.2 59.7 62.6 60.6 59.5 61.0
62.5 62.9 53.5 65.1 58.7 69.3 55.3 65.0
66.6 73.4 60.7 72.2 66.6 72.8 60.0 70.7
66.6 72.3 59.2 71.3 67.0 74.2 59.4 69.9
RotatE
PSL
81.6 74.4 69.0 64.9 83.3 74.2 71.9 71.7
TensorLog 77.3 61.3 68.2 51.3 80.4 65.2 68.3 55.6
82.8 74.6 64.8 63.2 84.5 73.4 70.4 66.3
JOINTINF
88.6 84.7 72.8 72.2 90.3 81.8 76.8 72.2
DRAIL GLOBAL
89.0 83.5 80.5 76.4 89.3 82.1 80.3 73.4
RELNETS
Prob.
Logic.
Tavolo 4: General results for issue-specific stance
and agreement prediction (Macro F1). AB: Abortion,
E: Evolution, GM: Gay Marriage, GC: Gun Control.
evaluation can be seen in Table 4. The left part of
Figura 4 shows the proportion of statements that
were identified for each ideology: left (liberal or
progressive), moderate and right (conservative).
We find that we are able to recover the relative
positions in the political spectrum for the evaluated
politicians: Bernie Sanders, Joe Biden, and Donald
Trump. We find that Sanders is the most left
leaning, followed by Biden. In contrasto, Donald
Trump stands mostly on the right. We also include
some examples of the classified statements. Noi
show that we are able to identify cases in which
the statement does not necessarily align with the
known ideology for each politician.
5.4 Issue-Specific Stance Prediction
Given a debate thread on a specific issue (per esempio.,
abortion), the task is to predict the stance with
respect to the issue for each one of the debate
posts (Walker et al., 2012). Each thread forms a
tree structure, where users participate and respond
to each other’s posts. We treat the task as a
collective classification problem, and model the
agreement between posts and their replies, anche
as the consistency between posts written by the
110
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Figura 4: Statements made by politicians classified using our model trained on debate.org.
same author. The DRAIL program for this task can
be observed in Appendix A.
Dataset: We use the 4Forums dataset from the
Internet Argument Corpus (Walker et al., 2012),
consisting of a total of 1,230 debates and 24,658
posts on abortion, evolution, gay marriage, E
gun control. We use the same splits as Li et al.
(2018) and perform 5-fold cross validation.
Entity and Relation Encoders: We repre-
sented posts using pre-trained BERT encoders
(Devlin et al., 2019) and do not generate features
for authors. As in the previous task, we model all
relations and rules using feed-forward networks
with one hidden layer and ReLU activations. Note
that we fine-tune all parameters during training.
In Table 4 we can observe the general results for
this task. We report macro F1 for post stance and
agreement between posts for all issues. As in the
previous task, we find that ComplEx and RotatE
relational embeddings outperform INDNETS, E
probabilistic logics outperform methods that do
not perform constrained inference. PSL out-
performs JOINTINF for evolution and gun control
debates, which are the two issues with less
training data, whereas JOINTINF outperforms PSL
for debates on abortion and gay marriage. Questo
could indicate that re-weighting rules may be
advantageous for the cases with less supervision.
Finalmente, we see the advantage of using a global
learning objective and augmenting it with shared
representations. Tavolo 5 compares our model with
previously published results.
5.5 Argument Mining
The goal of this task is to identify argumentative
structures in essays. Each argumentative structure
corresponds to a tree in a document. Nodes are
predefined spans of text and can be labeled
either as claims, major claims, or premises,
Model
UN
E
GM GC
Avg
BERT (Devlin et al., 2019)
PSL (Sridhar et al., 2015B)
Struct. Rep. (Li et al., 2018)
67.0
77.0
86.5
62.4
80.3
82.2
67.4
80.5
87.6
64.6
69.1
83.1
65.4
76.7
84.9
DRAIL RELNETS
89.2
82.4
90.1
83.1
86.2
Tavolo 5: Previous work on issue-specific stance
prediction (stance acc.).
and edges correspond to support/attack relations
between nodes. Domain knowledge is injected by
constraining sources to be premises and targets
to be either premises or major claims, anche
as enforcing tree structures. We model nodes,
links, and second order relations, grandparent
(a → b → c), and co-parent (a → b ← c)
(Niculae et al., 2017). Additionally, we consider
link labels, denoted stances. The DRAIL program
for this task can be observed in Appendix A.
Dataset: We used the UKP dataset (Stab and
Gurevych, 2017), consisting of 402 documents,
with a total of 6,100 propositions and 3,800 links
(17% of pairs). We use the splits used by Niculae
et al. (2017), and report macro F1 for components
and positive F1 for relations.
Entity and Relation Encoders: To represent
the component and the essay, we used a
BiLSTM over
initialized with
the words,
GloVe embeddings (Pennington et al., 2014),
concatenated with a feature vector following
Niculae et al. (2017). For representing the relation,
we use a feed-forward computation over the
components, as well as the relation features used
in Niculae et al. (2017).
We can observe the general results for this
task in Table 6. Given that this task relies on
constructing the tree from scratch, we find that all
methods that do not include declarative constraints
(INDNETS and relational embeddings) suffer when
111
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Local
Reln.
Embed.
DRAIL
Prob. Logic PSL
Model
Node Link Avg Stance Avg
INDNETS
70.7
52.8 61.7
63.4
62.3
TransE
65.7
ComplEx 69.1
67.2
RotatE
76.5
78.6
83.1
82.9
JOINTINF
GLOBAL
RELNETS
23.7 44.7
15.7 42.4
20.7 44.0
56.4 66.5
59.5 69.1
61.2 72.2
63.7 73.3
44.6
53.5
46.7
64.7
62.9
69.2
68.4
44.7
46.1
44.9
65.9
67.0
71.2
71.7
Tavolo 6: General results for argument mining.
Model
Node Link Avg
Human upper bound
ILP Joint (Stab and Gurevych, 2017)
Struct RNN strict (Niculae et al., 2017)
Struct SVM full (Niculae et al., 2017)
Joint PointerNet (Potash et al., 2017)
Kuribayashi et al. 2019
86.8
82.6
79.3
77.6
84.9
85.7
75.5 81.2
58.5 70.6
50.1 64.7
60.1 68.9
60.8 72.9
67.8 76.8
DRAIL RELNETS
82.9
63.7 73.3
Tavolo 7: Previous work on argument mining.
trying to predict links correctly. For this task, we
did not apply TensorLog, given that we couldn’t
find a way to express tree constraints using their
syntax. Once again, we see the advantage of using
global learning, as well as sharing information
between rules using RELNETS.
Tavolo 7 shows the performance of our model
against previously published results. While we
are able to outperform models that use the same
underlying encoders and features, recent work
by Kuribayashi et al. (2019) further improved
performance by exploiting contextualized word
embeddings that look at the whole document,
and making a distinction between argumentative
markers and argumentative components. Noi
did not find a significant improvement by in-
corporating their ELMo-LSTM encoders into
our framework,3 nor by replacing our BiLSTM
encoders with BERT. We leave the exploration
of an effective way to leverage contextualized
embeddings for this task for future work.
5.6 Run-time Analysis
In this section, we perform a run-time analysis
of all probabilistic logic systems tested. All ex-
periments were run on a 12 core 3.2Ghz Intel i7
CPU machine with 63GB RAM and an NVIDIA
GeForce GTX 1080 Ti 11GB GDDR5X GPU.
3We did not experiment with their normalization
approach, extended BoW features, nor AC/AM distinction.
Figura 5: Average overall training time (per fold).
Figura 5 shows the overall training time (per
fold) in seconds for each of the evaluated tasks.
Note that the figure is presented in logarithmic
scala. We find that DRAIL is generally more
computationally expensive than both TensorLog
and PSL. This is expected given that DRAIL back-
propagates to the base classifiers at each epoch,
while the other frameworks just take the local
predictions as priors. Tuttavia, when using a large
number of arithmetic constraints (per esempio., Argument
Mining), we find that PSL takes a really long time
to train. We found no significant difference when
using ILP or AD.3 We presume that this is due to
the fact that our graphs are small and that Gurobi
is a highly optimized commercial software.
Finalmente, we find that when using encoders with
a large number of parameters (per esempio., BERT) In
tasks with small graphs, the difference in training
time between training local and global models
is minimal. In these cases, back-propagation is
considerably more expensive than inference, E
global models converge in fewer epochs. For
Argument Mining, local models are at least twice
as fast. BiLSTMs are considerably faster than
BERT, and inference is more expensive for this
task.
5.7 Analysis of Loss Functions
In this section we perform an evaluation of the
CRF loss for issue-specific stance prediction.
Note that one drawback of the CRF loss (Eq. 6)
is that we need to accumulate the gradient for
the approximated partition function. When using
entity encoders with a lot of parameters (per esempio.,
BERT), the amount of memory needed for a single
instance increases. We were unable to fit the full
models in our GPU. For the purpose of these tests,
we froze the BERT parameters after local training
112
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Model
Stance Agree
Avg
Secs p/epoch
Hinge loss
CRF(β = 5)
CRF(β = 20)
CRF(β = 50)
82.74
83.09
84.10
84.19
78.54
81.03
82.16
81.80
80.64
82.06
83.13
83.00
132
345
482
720
Tavolo 8: Stance prediction (abortion) dev results
for different training objectives.
and updated only the relation and rule parameters.
To obtain the solution pool, we use Gurobi’s pool
search mode to find β high-quality solutions. Questo
also increases the cost of search at inference time.
Development set results for the debates on
abortion can be observed in Table 8. While
increasing the size of the solution pool leads
it comes at a higher
to better performance,
computational cost.
6 Conclusions
in questo documento, we motivate the need for a
declarative neural-symbolic approach that can be
applied to NLP tasks involving long texts and
contextualizing information. We introduce a gen-
eral framework to support this, and demonstrate
its flexibility by modeling problems with diverse
relations and rich representations, and obtain
models that are easy to interpret and expand.
The code for DRAIL and the application examples
in this paper have been released to the community,
to help promote this modeling approach for other
applications.
Ringraziamenti
We would like to acknowledge current and former
members of the PurdueNLP lab, particularly Xiao
Zhang, Chang Li, Ibrahim Dalal, I-Ta Lee, Ayush
Jain, Rajkumar Pujari, and Shamik Roy for their
help and insightful discussions in the early stages
of this project. We also thank the reviewers and
action editor for their constructive feedback. Questo
project was partially funded by the NSF, grant
CNS-1814105.
Riferimenti
Daniel Andor, Chris Alberti, David Weiss,
Aliaksei Severyn, Alessandro Presta, Kuzman
Ganchev, Slav Petrov, and Michael Collins.
2016. Globally normalized transition-based
Negli Atti di
IL
neural networks.
54esima Assemblea Annuale dell'Associazione per
Linguistica computazionale (Volume 1: Lungo
Carte), pages 2442–2452, Berlin, Germany.
Associazione per la Linguistica Computazionale.
DOI: https://doi.org/10.18653/v1
/P16-1231
Stephen H. Bach, Matthias Broecheler, Bert
Huang, and Lise Getoor. 2017. Hinge-loss
markov random fields and probabilistic soft
logic. Journal of Machine Learning Research
(JMLR), 181–67.
Islam Beltagy, Katrin Erk, and Raymond Mooney.
2014. Probabilistic soft
logic for semantic
textual similarity. In Proceedings of the 52nd
Annual Meeting of the Association for Compu-
linguistica nazionale (Volume 1: Documenti lunghi),
pages 1210–1219. Baltimore, Maryland. Asso-
ciation for Computational Linguistics. DOI:
https://doi.org/10.3115/v1/P14
-1114
Antoine Bordes, Nicolas Usunier, Alberto
Garcia-Duran, Jason Weston, and Oksana
Yakhnenko. 2013, Translating embeddings
for modeling multi-relational data, C. J. C.
Burges, l. Bottou, M. Welling, Z. Ghahramani,
and K. Q. Weinberger, editors, Advances in
Neural Information Processing Systems 26,
pages 2787–2795. Curran Associates, Inc.
Samuel R. Bowman, Gabor Angeli, Christopher
Potts, e Christopher D. Equipaggio. 2015. UN
large annotated corpus for learning natural
language inference. Negli Atti del 2015
Conference on Empirical Methods in Natural
Language Processing, pages 632–642. Lisbon,
for Computational
Portugal, Association
Linguistica. DOI: https://doi.org/10
.18653/v1/D15-1075
Danqi Chen and Christopher Manning. 2014.
A fast and accurate dependency parser using
neural networks. Negli Atti del 2014
Conference on Empirical Methods in Natural
Language Processing (EMNLP), pages 740–750.
Doha, Qatar. Associazione per il calcolo
Linguistica. DOI: https://doi.org/10
.3115/v1/D14-1082
William W. Cohen, Fan Yang, and Kathryn
Mazaitis. 2020. Tensorlog: A probabilistic
113
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
database implemented using deep-learning
infrastructure. Journal of Artificial Intelligence
Research, 67:285–325. DOI: https://doi
.org/10.1613/jair.1.11944
Jacob Devlin, Ming-Wei Chang, Kenton Lee, E
Kristina Toutanova. 2019. BERT: Pre-training
of deep bidirectional transformers for language
understanding. Negli Atti di
IL 2019
Conference of the North American Chapter of
the Association for Computational Linguistics:
Tecnologie del linguaggio umano, Volume 1
(Long and Short Papers), pages 4171–4186,
for
Minneapolis, Minnesota. Association
Linguistica computazionale.
Ivan Donadello, Luciano Serafini, and Artur
d’Avila Garcez. 2017. Logic tensor networks
for semantic image interpretation. In Procedi-
ings of the Twenty-Sixth International Joint
Conference on Artificial Intelligence, IJCAI-
17, pages 1596–1602. DOI: https://doi
.org/10.24963/ijcai.2017/221
Honghua Dong, Jiayuan Mao, Tian Lin, Chong
Wang, Lihong Li, and Denny Zhou. 2019.
Neural logic machines. In 7th International
Conference on Learning Representations, ICLR
2019, New Orleans, LA, USA, May 6-9, 2019.
Alessandro Duranti
and Charles Goodwin.
1992. Rethinking context: An introduction,
Alessandro Duranti and Charles Goodwin,
editors, Rethinking Context: Language as an
Interactive Phenomenon, chapter 1, pages 1–42,
Cambridge University Press, Cambridge.
Noah D. Goodman, Vikash K. Mansinghka,
Daniel M. Roy, Keith Bonawitz, and Joshua B.
Tenenbaum. 2008. Church: a language for gen-
erative models. In UAI 2008, Proceedings of
the 24th Conference in Uncertainty in Artificial
Intelligenza, Helsinki, Finland, Luglio 9-12, 2008,
pages 220–229.
Aditya Grover
Di
feature
Negli Atti
apprendimento
IL
and Jure Leskovec. 2016.
for
node2vec: Scalable
22nd
networks.
ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San
Francesco, CA, USA, agosto 13-17, 2016,
pages 855–864. DOI: https://doi.org
/10.1145/2939672.2939754, PMID:
27853626, PMCID: PMC5108654
John
revisited.
J. Gumperz.
In
editors,
1992. Contextualiza-
e A.
P. Auer
zione
The Contextualiza-
Luzio,
Di
of Language, pages 39–54. DOI:
zione
https://doi.org/10.1075/pbns.22
.04gum
Gurobi Optimization Inc. 2015. Gurobi optimizer
reference manual.
Will Hamilton, Zhitao Ying, and Jure Leskovec.
2017. Inductive representation learning on large
graphs. In I. Guyon, U. V. Luxburg, S. Bengio,
H. Wallach, R. Fergus, S. Vishwanathan, E
R. Garnett, editors, Advances in Neural Inform-
ation Processing Systems 30, Curran Asso-
ciates, Inc., pages 1024–1034.
Rujun Han, Qiang Ning, and Nanyun Peng.
2019. Joint event and temporal relation extrac-
tion with shared representations and structured
prediction. Negli Atti del 2019 Contro-
ference on Empirical Methods in Natural
Language Processing and the 9th Interna-
tional Joint Conference on Natural Language
in lavorazione (EMNLP-IJCNLP), pages 434–444,
Hong Kong, China, Association for Com-
Linguistica putazionale. DOI: https://doi
.org/10.18653/v1/D19-1041
Seyed Mehran Kazemi and David Poole. 2018.
ReLNN: A deep neural model for relational
apprendimento. In Proceedings of the Thirty-Second
AAAI Conference on Artificial Intelligence,
(AAAI-18), the 30th innovative Applications of
Artificial Intelligence (IAAI-18), and the 8th
AAAI Symposium on Educational Advances
in Artificial
Intelligenza (EAAI-18), Nuovo
Orleans, Louisiana, USA, Febbraio 2-7, 2018,
pages 6367–6375, AAAI Press.
Eliyahu Kiperwasser and Yoav Goldberg. 2016.
Simple and accurate dependency parsing using
bidirectional LSTM feature representations.
the Association for Com-
Transactions of
4:313–327. DOI:
putational
https://doi.org/10.1162/tacl a
00101
Linguistica,
Thomas N. Kipf and Max Welling. 2017.
Semi-supervised classification with graph
convolutional networks. In 5th International
Conference on Learning Representations, ICLR
114
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
2017, Toulon, France, April 24-26, 2017,
Conference Track Proceedings.
Parisa Kordjamshidi, Dan Roth, and Hao Wu.
2015. Saul: Towards declarative learning
IL
based programming. Negli Atti di
Twenty-Fourth International Joint Confer-
ence on Artificial Intelligence, IJCAI 2015,
Buenos Aires, Argentina, Luglio 25-31, 2015,
pages 1844–1851.
Tatsuki Kuribayashi, Hiroki Ouchi, Naoya
Inoue, Paul Reisert, Toshinori Miyoshi, Jun
Suzuki, and Kentaro Inui. 2019. An empirical
study of span representations in argumenta-
tion structure parsing. Negli Atti del
57esima Assemblea Annuale dell'Associazione per
Linguistica computazionale, pages 4691–4698,
Florence,
Italy. Associazione per il calcolo-
linguistica nazionale. DOI: https://doi.org
/10.18653/v1/P19-1464
Negli Atti di
Guillaume Lample, Miguel Ballesteros, Sandeep
Subramanian, Kazuya Kawakami, and Chris
Dyer. 2016. Neural architectures for named
IL
entity recognition.
2016 Conference of
the North American
Capitolo dell'Associazione per il calcolo
Linguistica: Tecnologie del linguaggio umano,
pagine
260–270, San Diego, California,
Associazione per la Linguistica Computazionale.
DOI: https://doi.org/10.18653/v1
/N16-1030
I-Ta Lee and Dan Goldwasser. 2019. Multi-
relational script learning for discourse relations.
In Proceedings of the 57th Annual Meeting
of the Association for Computational Linguis-
tic, pages 4214–4226, Florence, Italy. Asso-
ciation for Computational Linguistics. DOI:
https://doi.org/10.18653/v1/P19
-1413
Chang Li, Aldo Porco, and Dan Goldwasser.
2018. Structured representation learning for
online debate stance prediction. Negli Atti
the 27th International Conference on
Di
Linguistica computazionale, pages 3728–3739,
Santa Fe, New Mexico, USA. Associazione per
Linguistica computazionale.
improve automatic event detection. Nel professionista-
ceedings of the 54th Annual Meeting of the
Associazione per la Linguistica Computazionale
(Volume 1: Documenti lunghi), pages 2134–2143.
Berlin, Germany, Associazione per il calcolo-
linguistica nazionale.
Xuezhe Ma and Eduard Hovy. 2016. End-
to-end sequence labeling via bi-directional
IL
LSTM-CNNs-CRF.
54esima Assemblea Annuale dell'Associazione per
Linguistica computazionale (Volume 1: Lungo
Carte), pages 1064–1074, Berlin, Germany,
Associazione per la Linguistica Computazionale.
Negli Atti di
Chaitanya Malaviya, Matthew R. Gormley,
and Graham Neubig. 2018. Neural
factor
graph models for cross-lingual morphological
tagging. In Proceedings of the 56th Annual
Meeting of the Association for Computatio-
nal Linguistics (Volume 1: Documenti lunghi),
pages 2653–2663. Melbourne, Australia. Asso-
ciation for Computational Linguistics. DOI:
https://doi.org/10.18653/v1/P18
-1247
Robin Manhaeve, Sebastijan Dumancic, Angelika
Kimmig, Thomas Demeester, and Luc De
Raedt. 2018. Deepproblog: Neural probabilis-
tic logic programming.
In S. Bengio, H.
Wallach, H. Larochelle, K. Grauman, N. Cesa-
Bianchi, and R. Garnett, editors, Advances in
Neural Information Processing Systems 31,
pages 3749–3759, Curran Associates, Inc.
Jiayuan Mao, Chuang Gan, Pushmeet Kohli,
Joshua B. Tenenbaum, and Jiajun Wu. 2019.
The neuro-symbolic concept learner: Interpret-
ing scenes, parole, and sentences from natural
supervisione.
Giuseppe
Marra,
Francesco
Giannini,
Michelangelo Diligenti, and Marco Gori.
2019. Integrating learning and reasoning with
deep logic models. In Machine Learning and
Knowledge Discovery in Databases – European
Conferenza, ECML PKDD 2019, W¨urzburg,
Germany, settembre 16-20, 2019, Proceed-
ing, Part II, pages 517–532. DOI: https://
doi.org/10.1007/978-3-030-46147
-8 31
Shulin Liu, Yubo Chen, Shizhu He, Kang Liu,
and Jun Zhao. 2016. Leveraging FrameNet to
Andr´e F. T. Martins, M´ario A. T. Figueiredo,
Pedro M. Q. Aguiar, Noah A. Smith, E
115
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Eric P. Xing. 2015. Ad3: Alternating direc-
tions dual decomposition for map inference in
graphical models. Journal of Machine Learn-
ing Research, 16(16):495–545.
Andrew McCallum, Karl Schultz, and Sameer
Singh. 2009. Factorie: Probabilistic program-
ming via imperatively defined factor graphs.
Y. Bengio, D. Schuurmans, J. D. Lafferty,
C. K. IO. Williams, e A. Culotta, editors,
Advances
Information Process-
ing Systems 22, pages 1249–1257, Curran
Associates, Inc.
in Neural
Miller McPherson, Lynn Smith-Lovin,
E
James M. Cook. 2001. Birds of a feather: Homo-
phily in social networks. Annual Review of
Sociology, 27(1):415–444. DOI: https://
doi.org/10.1146/annurev.soc.27.1
.415
Brian Milch, Bhaskara Marthi, Stuart Russell,
David Sontag, Daniel L. Ong, and Andrey
Kolobov. 2005. Blog: Probabilistic models with
unknown objects. In (IJCAI ’05) Nineteenth
Conferenza congiunta internazionale sull'artificiale
Intelligenza.
Vlad Niculae, Joonsuk Park, and Claire Cardie.
2017. Argument mining with structured SVMs
and RNNs. In Proceedings of the 55th Annual
Meeting of
the Association for Computa-
linguistica nazionale (Volume 1: Documenti lunghi),
pages 985–995, Vancouver, Canada. Asso-
ciation for Computational Linguistics. DOI:
https://doi.org/10.18653/v1/P17
-1091
Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth.
2018. Joint reasoning for temporal and causal
relations. In Proceedings of the 56th Annual
Meeting of
the Association for Compu-
linguistica nazionale (Volume 1: Documenti lunghi),
2278–2288, Melbourne, Australia,
pagine
Associazione per la Linguistica Computazionale.
DOI: https://doi.org/10.18653/v1
/P18-1212
Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi
Zhang, and Yang Wang. 2016. Tri-party deep
network representation. Negli Atti del
Twenty-Fifth International Joint Conference on
Artificial Intelligence, IJCAI 2016, New York,
NY, USA, 9-15 Luglio 2016, pages 1895–1901.
116
Jeffrey
Socher,
Pennington, Richard
E
Cristoforo Manning. 2014. Glove: Globale
vettori per la rappresentazione delle parole. In Procedi-
ings di
IL 2014 Conferenza sull'Empirico
Metodi nell'elaborazione del linguaggio naturale
(EMNLP), pagine 1532–1543, Doha, Qatar.
Associazione per la Linguistica Computazionale.
DOI: https://doi.org/10.3115/v1
/D14-1162
Bryan Perozzi, Rami Al-Rfou, and Steven
Skiena. 2014. Deepwalk: Online learning of
social representations. Negli Atti del
20th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining,
KDD ’14, pages 701–710, New York, NY,
USA. ACM. DOI: https://doi.org/10
.1145/2623330.2623732
Matteo Peters, Marco Neumann, Mohit Iyyer,
Matt Gardner, Cristoforo Clark, Kenton Lee,
e Luke Zettlemoyer. 2018. Deep contextu-
alized word representations. Negli Atti di
IL 2018 Conferenza del Nord America
Capitolo dell'Associazione per il calcolo
Linguistica: Tecnologie del linguaggio umano,
Volume 1 (Documenti lunghi), pagine 2227–2237,
New Orleans, Louisiana, Association for Com-
Linguistica putazionale. DOI: https://doi
.org/10.18653/v1/N18-1202
Peter Potash, Alexey Romanov, and Anna
Rumshisky. 2017. Here’s my point: Joint
pointer architecture for argument mining.
IL 2017 Conferenza
Negli Atti di
in Natural Lan-
on Empirical Methods
guage Processing, pages 1364–1373, Copen-
hagen, Denmark. Associazione per il calcolo-
linguistica nazionale. DOI: https://doi.org
/10.18653/v1/D17-1143
Matthew Richardson and Pedro M. Domingos.
2006. Markov logic networks. Machine Learn-
ing, 62(1–2):107–136. DOI: https://doi
.org/10.1007/s10994-006-5833-1
Nick Rizzolo and Dan Roth. 2010. Apprendimento
based Java for rapid development of NLP
systems. In Proceedings of the Seventh Interna-
tional Conference on Language Resources
and Evaluation (LREC’10). Valletta, Malta,
European Language Resources Association
(ELRA).
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Tim Rockt¨aschel and Sebastian Riedel. 2017.
End-to-end differentiable proving. In I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach, R.
Fergus, S. Vishwanathan, and R. Garnett,
editors, Advances
Information
Processing Systems 30, pages 3788–3800.
Curran Associates, Inc.
in Neural
Ali Sadeghian, Mohammadreza Armandpour,
Patrick Ding, and Daisy Zhe Wang. 2019.
Drum: End-to-end differentiable rule mining
on knowledge graphs.
In H. Wallach, H.
Larochelle, UN. Beygelzimer, F. dAlch´e-Buc,
E. Fox, and R. Garnett, editors, Advances in
Neural Information Processing Systems 32,
pages 15347–15357. Curran Associates, Inc.
Gustav Sourek, Vojtech Aschenbrenner, Filip
Zelezn´y, Steven Schockaert,
and Ondrej
Kuzelka. 2018. Lifted relational neural net-
works: Efficient learning of latent relational
structures. Journal of Artificial Intelligence
Research, 62:69–100. DOI: https://doi
.org/10.1613/jair.1.11203
Dhanya Sridhar, James Foulds, Bert Huang, Lise
Getoor, and Marilyn Walker. 2015B. Joint
models of disagreement and stance in online
the 53rd Annual
debate. Negli Atti di
Riunione dell'Associazione per il Computazionale
Linguistica e 7° Giunto Internazionale
Conferenza sull'elaborazione del linguaggio naturale
(Volume 1: Documenti lunghi), pages 116–125,
Beijing, China, Associazione per il calcolo-
linguistica nazionale. DOI: https://doi.org
/10.3115/v1/P15-1012
Christian Stab and Iryna Gurevych, 2017.
Parsing argumentation structures
in per-
suasive essays. Linguistica computazionale,
43(3):619–659. DOI: https://doi.org
/10.1162/COLI a 00295
Shivashankar
Subramanian,
Trevor Cohn,
Timothy Baldwin, and Julian Brooke. 2017.
Joint sentence-document model for manifesto
text analysis. In Proceedings of the Australasian
Language Technology Association Workshop
2017, pages 25–33, Brisbane, Australia.
Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, E
Jian Tang. 2019. Rotate: Knowledge graph
embedding by relational rotation in complex
spazio. In 7th International Conference on
117
Learning Representations, ICLR 2019, Nuovo
Orleans, LA, USA, May 6-9, 2019.
Jian Tang, Meng Qu, Mingzhe Wang, Ming
Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE:
large-scale information network embedding. In
Proceedings of the 24th International Confer-
ence on World Wide Web, WWW 2015, Florence,
Italy, May 18-22, 2015, pages 1067–1077. DOI:
https://doi.org/10.1145/2736277
.2741093
Th´eo Trouillon,
Johannes Welbl, Sebastian
Riedel, Eric Gaussier, and Guillaume Bouchard.
2016. Complex embeddings for simple link
prediction. In Proceedings of The 33rd Inter-
national Conference on Machine Learning,
volume 48 of Proceedings of Machine Learning
Research, pages 2071–2080, New York, Nuovo
York, USA, PMLR.
Cunchao tu, Han Liu, Zhiyuan Liu, E
Maosong Sun. 2017. CANE: Context-aware
network embedding for relation modeling. In
Proceedings of the 55th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Documenti lunghi), pages 1722–1731,
Vancouver, Canada. Associazione per il calcolo-
linguistica nazionale.
Iulia Turc, Ming-Wei Chang, Kenton Lee,
and Kristina Toutanova. 2019. Well-read
students learn better: On the importance of
pre-training compact models. arXiv preprint
arXiv:1908.08962v2.
Petar Veliˇckovi´c, Guillem Cucurull, Arantxa
Casanova, Adriana Romero, Pietro Lio,
and Yoshua Bengio. 2017. Graph attention
networks. arXiv preprint arXiv:1710.10903.
Marilyn Walker, Pranav Anand, Rob Abbott,
and Ricky Grant. 2012. Stance classification
using dialogic properties of persuasion. In
Atti del 2012 Conference of the
North American Chapter of the Association for
Linguistica computazionale: Human Language
Technologies,
592–596, Montr´eal,
pagine
Canada, Association
for Computational
Linguistica.
Hai Wang and Hoifung Poon. 2018. Deep prob-
abilistic logic: A unifying framework for
indirect supervision. Negli Atti del 2018
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Conference on Empirical Methods in Natu-
ral Language Processing, pages 1891–1902,
Brussels, Belgium. Associazione per il calcolo-
linguistica nazionale. DOI: https://doi.org
/10.18653/v1/D18-1215
William Yang Wang, Kathryn Mazaitis, E
William W. Cohen. 2013. Programming with
personalized pagerank: A locally groundable
first-order probabilistic logic. In 22nd ACM
International Conference on Information and
Knowledge Management, CIKM’13, San Fran-
cisco, CA, USA, ottobre 27 – novembre 1, 2013,
pages 2129–2138. ACM. DOI: https://
doi.org/10.1145/2505515.2505573
Zhen Wang,
Jianwen Zhang,
Jianlin Feng,
and Zheng Chen. 2014. Knowledge graph
embedding by translating on hyperplanes.
the Twenty-Eighth AAAI
Negli Atti di
Intelligenza, Luglio
Conference on Artificial
27-31, 2014, Qu´ebec City, Qu´ebec, Canada,
pages 1112–1119.
David Weiss, Chris Alberti, Michael Collins,
and Slav Petrov. 2015. Structured training for
neural network transition-based parsing. In
Proceedings of the 53rd Annual Meeting of
the Association for Computational Linguistics
and the 7th International Joint Conference on
Elaborazione del linguaggio naturale (Volume 1: Lungo
Carte), pages 323–333, Beijing, China. Asso-
ciation for Computational Linguistics. DOI:
https://doi.org/10.3115/v1/P15
-1032,
PMCID:
PMC4695984
26003913,
PMID:
Han Xiao, Minlie Huang, Lian Meng, and Xiaoyan
Zhu. 2017. SSP: semantic space projection
for knowledge graph embedding with text
descriptions. In Proceedings of the Thirty-First
AAAI Conference on Artificial Intelligence,
Febbraio 4-9, 2017, San Francisco, California,
USA, pages 3104–3110.
Fan Yang, Zhilin Yang, and William W. Cohen.
2017. Differentiable learning of logical rules
for knowledge base reasoning. In I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, and R. Garnett,
editors, Advances
Information
Processing Systems 30, pages 2319–2328.
Curran Associates, Inc.
in Neural
Xiao Zhang and Dan Goldwasser. 2019. Sentiment
tagging with partial
labels using modular
architectures. In Proceedings of the 57th Annual
Riunione dell'Associazione per il Computazionale
Linguistica, pages 579–590, Florence, Italy.
Associazione per la Linguistica Computazionale.
the 53rd Annual Meeting of
Hao Zhou, Yue Zhang, Shujian Huang, E
Jiajun Chen. 2015. A neural probabilistic
for
structured-prediction model
transition-
Negli Atti
based dependency parsing.
the Asso-
Di
ciation for Computational Linguistics and
the 7th International Joint Conference on
Elaborazione del linguaggio naturale (Volume 1:
Documenti lunghi), pages 1213–1222, Beijing,
China, Association for Computational Linguis-
tic. DOI: https://doi.org/10.3115
/v1/P15-1117, PMCID: PMC4674637
118
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
A DRaiL Programs
C Code Snippets
We include code snippets to show how to load
data into DRAIL (Figure 7-a), as well as to
how to define a neural architecture (Figure 7-b).
Neural architectures and feature functions can be
programmed by creating Python classes, and the
module and classes can be directly specified in
the DRAIL program (lines 13, 14, 24, E 29 In
Figure 7-a).
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
3
5
7
1
9
2
4
2
1
2
/
/
T
l
UN
C
_
UN
_
0
0
3
5
7
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Figura 7: Code snippets.
Figura 6: DRAIL programs.
B Hyperparameters
For BERT, we use the default parameters.
Task
Param
Search Space
Selected Value
Open Domain
(Local)
Open Domain
(Globale)
Stance Pred.
(Local)
Stance Pred.
(Globale)
Arg. Mining
(Local)
Arg. Mining
(Globale)
Learning Rate
Batch size
Patience
Optimizer
Hidden Units
Non-linearity
Learning Rate
Batch size
Learning Rate
Patience
Batch size
Optimizer
Learning Rate
Batch size
Learning Rate
Patience
Batch size
Dropout
Optimizer
Hidden Units
Non-linearity
Learning Rate
Patience
Batch size
2e-6,5e-6,2e-5,5e-5
32 (Max. Mem)
1,3,5
SGD,Adam,AdamW
128,512
−
2e-6,5e-6,2e-5,5e-5
−
2e-6,5e-6,2e-5,5e-5
1,3,5
16 (Max. Mem)
SGD,Adam,AdamW
2e-6,5e-6,2e-5,5e-5
−
1e-4,5e-4,5e-3,1e-3,5e-2,1e-2
5,10,20
16,32,64,128
0.01,0.05,0.1
SGD,Adam,AdamW
128,512
−
1e-4,5e-4,5e-3,1e-3,5e-2,1e-2
5,10,20
−
2e-5
32
3
AdamW
512
ReLU
2e-6
Full instance
5e-5
3
16
AdamW
2e-6
Full instance
5e-2
20
64
0.05
SGD
128
ReLU
1e-4
10
Full instance
Tavolo 9: Hyper-parameter tuning.
119