Modeling Content and Context with Deep Relational Learning

Maria Leonor Pacheco and Dan Goldwasser

Department of Computer Science
Purdue University
West Lafayette, IN 47907
{pachecog, dgoldwas}@purdue.edu

Abstract

Building models for realistic natural language
tasks requires dealing with long texts and ac-
counting for complicated structural depen-
dencies. Neural-symbolic representations have
emerged as a way to combine the reasoning
capabilities of symbolic methods with the
expressiveness of neural networks. However,
most of the existing frameworks for combining
neural and symbolic representations have been
designed for classic relational learning tasks
that work over a universe of symbolic entities
and relations. In this paper, we present DRAIL,
an open-source declarative framework for spe-
cifying deep relational models, designed to
support a variety of NLP scenarios. Our frame-
work supports easy integration with expressive
language encoders, and provides an interface to
study the interactions between representation,
inference and learning.

1 Introduction

Understanding natural language interactions in
realistic settings requires models that can deal with
noisy textual inputs, reason about the
dependencies between different textual elements, and
leverage the dependencies between textual content
and the context from which it emerges. Work in
linguistics and anthropology has defined context
as a frame that surrounds a focal communicative
event and provides resources for its interpretation
(Gumperz, 1992; Duranti and Goodwin, 1992).

As a motivating example, consider the interac-
tions in the debate network described in Figure 1.
Given a debate claim (t1), and two consecutive
posts debating it (p1, p2), we define a textual in-
ference task, determining whether a pair of text
elements hold the same stance in the debate
(denoted using the relation Agree(X, Y)). This
task is similar to other textual inference tasks
(Bowman et al., 2015) that have been successfully
approached using complex neural representations
(Peters et al., 2018; Devlin et al., 2019). In add-
ition, we can leverage the dependencies between
these decisions. For example, assuming that one
post agrees with the debate claim (Agree(t1,
p2)), and the other one does not (¬Agree(t1, p1)),
the disagreement between the two posts can be
inferred: ¬Agree(t1, p1) ∧ Agree(t1, p2) → ¬
Agree(p1, p2). Finally, we consider the social
context of the text. The disagreement between the
posts can reflect a difference in the perspectives
their authors hold on the issue. This informa-
tion might not be directly observed, but it can be
inferred using the authors’ social interactions and
behavior, given the principle of social homophily
(McPherson et al., 2001), stating that people with
strong social ties are likely to hold similar views
and authors’ perspectives can be captured by rep-
resenting their social interactions. Exploiting this
information requires models that can align the
social representation with the linguistic one.

Motivated by these challenges, we introduce
DRAIL,¹ a Deep Relational Learning framework,
which uses a combined neuro-symbolic repre-
sentation for modeling the interaction between
multiple decisions in relational domains. Similar
to other neuro-symbolic approaches (Mao et al.,
2019; Cohen et al., 2020), our goal is to exploit
the complementary strengths of the two modeling
paradigms. Symbolic representations, used by
logic-based systems and by probabilistic graphical
models (Richardson and Domingos, 2006; Bach
et al., 2017), are interpretable, and allow domain
experts to directly inject knowledge and constrain
the learning problem. Neural models capture de-
pendencies using the network architecture and are
better equipped to deal with noisy data, such as
text. However, they are often difficult to interpret
and constrain according to domain knowledge.

¹ https://gitlab.com/purdueNlp/DRaiL/.

Transactions of the Association for Computational Linguistics, vol. 9, pp. 100–119, 2021. https://doi.org/10.1162/tacl_a_00357
Action Editor: Hoifung Poon. Submission batch: 6/2020; Revision batch: 10/2020; Published 3/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.


Our main design goal in DRAIL is to provide
a generalized tool, specifically designed for NLP
tasks. Existing approaches designed for classic
relational learning tasks (Cohen et al., 2020),
such as knowledge graph completion, are not
equipped to deal with complex linguistic input,
whereas others are designed for very specific NLP
settings such as word-based quantitative reasoning
problems (Manhaeve et al., 2018) or aligning
images with text (Mao et al., 2019). We discuss the
differences between DRAIL and these approaches
in Section 2. The examples in this paper focus
on modeling various argumentation mining tasks
and their social and political context, but the same
principles can be applied to a wide array of NLP
tasks with different contextualizing information,
such as images that appear next to the text, or
prosody when analyzing transcribed speech, to
name a few examples.

DRAIL uses a declarative language for defining
deep relational models. Similar to other
declarative languages (Richardson and Domingos,
2006; Bach et al., 2017), it allows users to
inject their knowledge by specifying dependencies
between decisions using first-order logic rules,
which are later compiled into a factor graph
with neural potentials. In addition to probabilistic
inference, DRAIL also models dependencies using
a distributed knowledge representation, denoted
RELNETS, which provides a shared representation
space for entities and their relations, trained using
a relational multi-task learning approach. This
provides a mechanism for explaining symbols, and
aligning representations from different modalities.
Following our running example, ideological
standpoints, such as Liberal or Conservative,
are discrete entities embedded in the same space
as textual entities and social entities. These
entities are initially associated with users; however,
using RELNETS this information will propagate to
texts reflecting these ideologies, by exploiting
the relations that bridge social and linguistic
information (see Figure 1).

Figure 1: Example debate.

To demonstrate DRAIL's modeling approach,
we introduce the task of open-domain stance
prediction with social context, which combines
social network analysis and textual inference over
complex opinionated texts, as shown in Figure 1.
We complement our evaluation of DRAIL with two
additional tasks: issue-specific stance prediction,
where we identify the views expressed in debate
forums with respect to a set of fixed issues (Walker
et al., 2012), and argumentation mining (Stab
and Gurevych, 2017), a document-level discourse
analysis task.

2 Related Work

In this section, we survey several lines of work
dealing with symbolic, neural, and hybrid
representations for relational learning.

2.1 Languages for Graphical Models

Several high-level languages for specifying
graphical models have been suggested. BLOG (Milch
et al., 2005) and CHURCH (Goodman et al., 2008)
were suggested for generative models. For
discriminative models, we have Markov Logic
Networks (MLNs) (Richardson and Domingos, 2006)
and Probabilistic Soft Logic (PSL) (Bach et al.,
2017). Both PSL and MLNs combine logic and
probabilistic graphical models in a single
representation, where each formula is associated with
a weight, and the probability distribution over
possible assignments is derived from the weights
of the formulas that are satisfied by such
assignments. Like DRAIL, PSL uses formulas in clausal
form (specifically, collections of Horn clauses).
The main difference between DRAIL and these
languages is that, in addition to graphical models,
it uses distributed knowledge representations
to represent dependencies. Other discriminative
methods include FACTORIE (McCallum et al.,
2009), an imperative language to define factor
graphs; Constrained Conditional Models (CCMs)
(Rizzolo and Roth, 2010; Kordjamshidi et al.,
2015), an interface to enhance linear classifiers
with declarative constraints; and ProPPR (Wang
et al., 2013), a probabilistic logic for large
databases that approximates local groundings using a
variant of personalized PageRank.


2.2 Node Embedding and Graph Neural Nets

A recent alternative to graphical models is to use
neural nets to represent and learn over relational
data, represented as a graph. Similar to DRAIL's
RELNETS, the learned node representation can
be trained by several different prediction tasks.
However, unlike DRAIL, these methods do not
use probabilistic inference to ensure consistency.
Node embedding approaches (Perozzi et al.,
2014; Tang et al., 2015; Pan et al., 2016; Grover
and Leskovec, 2016; Tu et al., 2017) learn a
feature representation for nodes capturing graph
adjacency information, such that the similarity in
the embedding space of any two nodes is
proportional to their graph distance and overlap in
neighboring nodes. Some frameworks (Pan et al.,
2016; Xiao et al., 2017; Tu et al., 2017) allow
nodes to have textual properties, which provide an
initial feature representation when learning to
represent the graph relations. When dealing with
multi-relational data, such as knowledge graphs,
both the nodes and the edge types are embedded
(Bordes et al., 2013; Wang et al., 2014; Trouillon
et al., 2016; Sun et al., 2019). Finally, these
methods learn to represent nodes and relations
based on pair-wise node relations, without
representing the broader graph context in which they
appear. Graph neural nets (Kipf and Welling, 2017;
Hamilton et al., 2017; Veličković et al., 2017)
create contextualized node representations by
recursively aggregating neighboring nodes.

2.3 Hybrid Neural-Symbolic Approaches

Several recent systems explore ways to combine
neural and symbolic representations in a unified
way. We group them into five categories.

Lifted rules to specify compositional nets.
These systems use an end-to-end approach and
learn relational dependencies in a latent space.
Lifted Relational Neural Networks (LRNNs)
(Sourek et al., 2018) and RelNNs (Kazemi and
Poole, 2018) are two examples. These systems
map observed ground atoms, facts, and rules to
specific neurons in a network and define compo-
sition functions directly over them. While they
provide for a modular abstraction of the relational
inputs, they assume all inputs are symbolic and do
not leverage expressive encoders.

Differentiable inference. These systems iden-
tify classes of logical queries that can be compiled
into differentiable functions in a neural network
infrastructure. In this space we have Logic
Tensor Networks (LTNs) (Donadello et al., 2017)
and TensorLog (Cohen et al., 2020). Symbols are
represented as row vectors in a parameter matrix.
The focus is on implementing reasoning using a
series of numeric functions.

Rule induction from data. These systems
are designed for inducing rules from symbolic
knowledge bases, which is not in the scope of our
framework. In this space we find Neural Theorem
Provers (NTPs) (Rocktäschel and Riedel, 2017),
Neural Logic Programming (Yang et al., 2017),
DRUM (Sadeghian et al., 2019) and Neural Logic
Machines (NLMs) (Dong et al., 2019). NTPs use
a declarative interface to specify rules that add
inductive bias and perform soft proofs. The other
approaches work directly over the database.

Deep classifiers and probabilistic inference.
These systems propose ways to integrate
probabilistic inference and neural networks for diverse
learning scenarios. DeepProbLog (Manhaeve et al.,
2018) extends the probabilistic logic programming
language ProbLog to handle neural
predicates. They are able to learn probabilities for
atomic expressions using neural networks. The
input data consists of a combination of feature
vectors for the neural predicates, together with
other probabilistic facts and clauses in the logic
program. Targets are only given at the output
side of the probabilistic reasoner, allowing them
to learn each example with respect to a single
query. On the other hand, Deep Probabilistic
Logic (DPL) (Wang and Poon, 2018) combines
neural networks with probabilistic logic for
indirect supervision. They learn classifiers using
neural networks and use probabilistic logic to
introduce distant supervision and labeling functions.
Each rule is regarded as a latent variable, and the
logic defines a joint probability distribution over
all labeling decisions. Then, the rule weights and
the network parameters are learned jointly using
variational EM. In contrast, DRAIL focuses on
learning multiple interdependent decisions from
data, handling and requiring supervision for all
unknown atoms in a given example. Lastly, Deep
Logic Models (DLMs) (Marra et al., 2019) learn a
set of parameters to encode atoms in a probabilistic
logic program. Similarly to Donadello et al. (2017)
and Cohen et al. (2020), they use differentiable
inference, allowing the model to be trained
end-to-end. Like DRAIL, DLMs can work with diverse
neural architectures and backpropagate back to
the base classifiers. The main difference between
DLMs and DRAIL is that DRAIL ensures
representation consistency of entities and relations
across all learning tasks by employing RELNETS.

Table 1: Comparing systems.
Column groups: Symbolic Features; Neural Features.
Columns: System, Symbolic Inputs, Raw Inputs, Declarative, Prob/Logic Inference, Rule Induction, Embed. Symbols, End-to-end Neural, Backprop. to Encoders, Architecture Agnostic, Multi-Task Learning, Open Source.
Systems: MLN, FACTORIE, CCM, PSL, LRNNs, RelNNs, LTNs, TensorLog, NTPs, Neural LP, DRUM, NLMs, DeepProbLog, DPL, DLMs, DRAIL.

Deep structured models. More generally,
deep structured prediction approaches have been
successfully applied to various NLP tasks such as
named entity recognition and dependency parsing
(Chen and Manning, 2014; Weiss et al., 2015; Ma
and Hovy, 2016; Lample et al., 2016; Kiperwasser
and Goldberg, 2016; Malaviya et al., 2018).
When the need arises to go beyond the sentence
level, some works combine the output scores of
independently trained classifiers using inference
(Beltagy et al., 2014; ?; Liu et al., 2016;
Subramanian et al., 2017; Ning et al., 2018),
whereas others implement joint learning for their
specific domains (Niculae et al., 2017; Han et al.,
2019). Our main differentiating factor is that we
provide a general interface that leverages
first-order logic clauses to specify factor graphs and
express constraints.

To summarize these differences, we outline a
feature matrix in Table 1. Given our focus on
NLP tasks, we require a neural-symbolic system
that (1) allows us to integrate state-of-the-art text
encoders and NLP tools, (2) supports structured
prediction across long texts, (3) lets us combine
several modalities and their representations (e.g.,
social and textual information), and (4) results in
an explainable model where domain constraints
can be easily introduced.

3 The DRAIL Framework

DRAIL was designed for supporting complex
NLP tasks. Problems can be broken down into
domain-specific atomic components (which could
be words, sentences, paragraphs or full documents,
depending on the task), and dependencies be-
tween them, their properties and contextualizing
information about them can be explicitly modeled.
In DRAIL, dependencies can be modeled over
the predicted output variables (similar to other
probabilistic graphical models), as well as over
the neural representation of the atoms and their
relationships in a shared embedding space. This
section explains the framework in detail. We begin
with a high-level overview of DRAIL and the
process of moving from a declarative definition to
a predictive model.

A DRAIL task is defined by specifying a
finite set of entities and relations. Entities are
either discrete symbols (e.g., POS tags, ideologies,
specific issue stances), or attributed elements with
complex internal information (e.g., documents,
users). Decisions are defined using rule templates,
formatted as Horn clauses: tLH ⇒ tRH, where
tLH (body) is a conjunction of observed and
predicted relations, and tRH (head) is the output
relation to be learned. Consider the
debate prediction task in Figure 1: it consists
of several sub-tasks, involving textual inference
(Agree(t1, t2)), social relations (VoteFor(u, v)) and
their combination (Agree(u, t)). We illustrate how

to specify the task as a DRAIL program in Figure 2
(left), by defining a subset of rule templates to
predict these relations.

Figure 2: General overview of DRAIL.

Each rule template is associated with a neural
architecture and a feature function, mapping the
initial observations to an input vector for each
neural net. We use a shared relational embedding
space, denoted RELNETS, to represent entities and
relations over them. As described in Figure 2
("RelNets Layer"), each entity and relation type
is associated with an encoder, trained jointly
across all prediction rules. This is a form of
relational multi-task learning, as the same entities
and relations are reused in multiple rules and
their representation is updated accordingly. Each
rule defines a neural net, learned over the relations
defined on the body. These nets take a composition
of the vectors generated by the relation encoders
as an input (Figure 2, "Rule Layer"). DRAIL
is architecture-agnostic, and neural modules for
entities, relations and rules can be specified
using PyTorch (code snippets can be observed in
Appendix C). Our experiments show that we can
use different architectures for representing text
and users, as well as for embedding discrete entities.

The relations in the Horn clauses can correspond
to hidden or observed information, and a specific
input is defined by the instantiations (or
groundings) of these elements. The collection of all rule
groundings results in a factor graph representing
our global decision, taking into account the
consistency and dependencies between the rules. This
way, the final assignments can be obtained by
running an inference procedure. For example,
the dependency between the views of users on
the debate topic (Agree(u, t)) and the agreement
between them (VoteFor(u, v)) is modeled as a
factor graph in Figure 2 ("Structured Inference
Layer").

We formalize the DRAIL language in
Section 3.1. Then, in Sections 3.2, 3.3, and 4,
we describe the neural components and learning
procedures.

3.1 Modeling Language

We begin our description of DRAIL by defining
the templating language, consisting of entities,
relations, and rules, and explaining how these
elements are instantiated given relevant data.

Entities are named symbolic or attributed
elements. An example of a symbolic entity is a
political ideology (e.g., Liberal or Conservative).
An example of an attributed entity is a user with
age, gender, and other profile information, or
a document associated with textual content. In
DRAIL, entities can appear either as constants,
written as strings in double or single quotes
(e.g., "user1"), or as variables, which are
identifiers substituted with constants when
grounded. Variables are written using unquoted
upper case strings (e.g., X, X1). Both constants and
variables are typed.

Relations are defined between entities and their
properties, or other entities. Relations are defined
using a unique identifier, a named predicate,
and a list of typed arguments. Atoms consist
of a predicate name and a sequence of entities,
consistent with the type and arity of the relation's
argument list. If the atom's arguments are all
constants, it is referred to as a ground atom. For
example, Agree("user1", "user2") is a ground
atom representing whether "user1" and "user2"
are in agreement. When atoms are not grounded
(e.g., Agree(X, Y)) they serve as placeholders for
all the possible groundings that can be obtained by
replacing the variables with constants. Relations
can either be closed (i.e., all of their atoms are
observed) or open, when some of the atoms can
be unobserved. In DRAIL, we use a question mark
? to denote unobserved relations. These relations
are the units that we reason over.

To help make these concepts concrete,
consider the following example analyzing stances in a
debate, as introduced in Figure 1. First, we define
the entities: User = {"u1", "u2"}, Claim = {"t1"},
Post = {"p1", "p2"}. Users are entities
associated with demographic attributes and
preferences. Claims are assertions over which users
debate. Posts are textual arguments that users
write to explain their position with respect
to the claim. We create these associations by


defining a set of relations, capturing authorship
Author(User, Post), votes between users
VoteFor(User, User)?, and the position users, and
their posts, take with respect to the debate
claim: Agree(Claim, User)?, Agree(Claim, Post)?.
The authorship relation is the only closed one, for
example, the atom: O = {Author("u1", "p1")}.
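
The entities, relations, and atoms of this example can be mirrored in a few lines of Python. This is only an illustrative sketch of the concepts, not DRAIL's actual syntax or API: the `Relation` and `Atom` classes, the `groundings` helper, and the convention that variables are upper-case strings are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relation:
    name: str
    arg_types: Tuple[str, ...]
    is_open: bool  # open relations (marked "?") may have unobserved atoms


@dataclass(frozen=True)
class Atom:
    relation: Relation
    args: Tuple[str, ...]

    def is_ground(self) -> bool:
        # Assumed convention: variables are upper-case identifiers (X, X1),
        # constants are anything else (u1, t1, p1).
        return not any(arg.isupper() for arg in self.args)


# Entity sets from the running example.
ENTITIES: Dict[str, List[str]] = {
    "User": ["u1", "u2"],
    "Claim": ["t1"],
    "Post": ["p1", "p2"],
}

AUTHOR = Relation("Author", ("User", "Post"), is_open=False)
VOTE_FOR = Relation("VoteFor", ("User", "User"), is_open=True)
AGREE_USER = Relation("Agree", ("Claim", "User"), is_open=True)
AGREE_POST = Relation("Agree", ("Claim", "Post"), is_open=True)


def groundings(relation: Relation, entities: Dict[str, List[str]]) -> List[Atom]:
    """Enumerate every ground atom obtained by typed substitution of constants."""
    domains = [entities[t] for t in relation.arg_types]
    return [Atom(relation, combo) for combo in product(*domains)]


# The observed set O contains the single closed atom from the text.
observed = {Atom(AUTHOR, ("u1", "p1"))}
```

Here `Atom(AGREE_POST, ("X", "Y"))` is a non-ground placeholder, while `groundings(AGREE_POST, ENTITIES)` lists the ground atoms it stands for.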

Rules are functions that map literals (atoms or
their negation) to other literals. Rules in DRAIL are
defined using templates formatted as Horn clauses:
tLH ⇒ tRH, where tLH (body) is a conjunction
of literals, and tRH (head) is the output literal
to be predicted, and can only be an instance of
open relations. Horn clauses allow us to describe
structural dependencies as a collection of "if-then"
rules, which can be easily interpreted. For
example, Agree(X, C) ∧ VoteFor(Y, X) ⇒ Agree(Y, C)
expresses the dependency between votes and users
holding similar stances on a specific claim. We
note that rules can be rewritten in disjunctive form
by converting the logical implication into a
disjunction between the negation of the body and the
head. For example, the rule above can be rewritten
as ¬Agree(X, C) ∨ ¬VoteFor(Y, X) ∨ Agree(Y, C).
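
The rewrite from implication to disjunction is mechanical, as the following sketch suggests. The `(atom, negated)` tuple encoding of literals and both function names are hypothetical choices for this illustration and are not part of DRAIL.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, bool]  # (atom name, is_negated)


def horn_to_clause(body: List[Literal], head: Literal) -> List[Literal]:
    """Rewrite body => head as (not b1) or ... or (not bn) or head."""
    return [(atom, not negated) for atom, negated in body] + [head]


def clause_holds(clause: List[Literal], assignment: Dict[str, bool]) -> bool:
    """A disjunction holds when at least one of its literals is true."""
    return any(assignment[atom] != negated for atom, negated in clause)


body = [("Agree(X,C)", False), ("VoteFor(Y,X)", False)]
head = ("Agree(Y,C)", False)
clause = horn_to_clause(body, head)
```

The resulting clause is violated only by the assignment that makes both body atoms true and the head false, which is exactly the case the original implication rules out.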

The DRAIL program consists of a set of rules,
which can be weighted (i.e., soft constraints), or
unweighted (i.e., hard constraints). Each weighted
rule template defines a learning problem, used
to score assignments to the head of the rule.
Because the body may contain open atoms,
each rule represents a factor function expressing
dependencies between open atoms in the body
and head. Unweighted rules, or constraints, shape
the space of feasible assignments to open atoms,
and represent background knowledge about the
domain.

Given the set of grounded atoms O, rules can
be grounded by substituting their variables with
constants, such that the grounded atoms corres-
pond to elements in O. This process results in a set
of grounded rules, each corresponding to a poten-
tial function or to a constraint. Together they
define a factor graph. Then, DRAIL finds the
optimally scored assignments for open atoms by
performing MAP inference. To formalize this
process, we first make the observation that rule
groundings can be written as linear inequalities,
directly corresponding to their disjunctive form,
as follows:

$$\sum_{i \in I_r^+} y_i + \sum_{i \in I_r^-} (1 - y_i) \geq 1 \tag{1}$$

where $I_r^+$ ($I_r^-$) corresponds to the set of open
atoms appearing in the rule that are not negated
(respectively, negated). Now, MAP inference
can be defined as a linear program. Each rule
grounding $r$, generated from template $t(r)$, with
input features $\mathbf{x}_r$ and open atoms $\mathbf{y}_r$ defines the
potential

$$\psi_r(\mathbf{x}_r, \mathbf{y}_r) = \min\Big\{ \sum_{i \in I_r^+} y_i + \sum_{i \in I_r^-} (1 - y_i),\ 1 \Big\} \tag{2}$$

added to the linear program with a weight $w_r$.
Unweighted rule groundings are defined as

$$c(\mathbf{x}_c, \mathbf{y}_c) = 1 - \sum_{i \in I_c^+} y_i - \sum_{i \in I_c^-} (1 - y_i) \tag{3}$$

with $c(\mathbf{x}_c, \mathbf{y}_c) \leq 0$ added as a constraint to the
linear program. This way, the MAP problem can
be defined over the set of all potentials $\Psi$ and the
set of all constraints $C$ as

$$\arg\max_{\mathbf{y} \in \{0,1\}^n} P(\mathbf{y} \mid \mathbf{x}) \equiv \arg\max_{\mathbf{y} \in \{0,1\}^n} \sum_{\psi_{r,t} \in \Psi} w_r \, \psi_r(\mathbf{x}_r, \mathbf{y}_r) \quad \text{such that } c(\mathbf{x}_c, \mathbf{y}_c) \leq 0;\ \forall c \in C$$
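
To make Eqs. (1)-(3) and the MAP objective concrete, the sketch below scores every binary assignment by brute force. This is an assumption made for readability: it replaces the linear-program solvers with exhaustive enumeration, which is only feasible for a handful of atoms. The function names, the `(I_plus, I_minus)` tuple encoding, and the toy weights are all hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

Rule = Tuple[Set[int], Set[int]]  # (I_plus, I_minus): indices of open atoms


def clause_value(rule: Rule, y: Sequence[int]) -> int:
    """Left-hand side of Eq. (1) for one rule grounding."""
    pos, neg = rule
    return sum(y[i] for i in pos) + sum(1 - y[i] for i in neg)


def potential(rule: Rule, y: Sequence[int]) -> int:
    """Eq. (2): 1 if the grounding is satisfied, 0 otherwise."""
    return min(clause_value(rule, y), 1)


def constraint(rule: Rule, y: Sequence[int]) -> int:
    """Eq. (3): the assignment is feasible when this value is <= 0."""
    return 1 - clause_value(rule, y)


def map_inference(
    n: int, weighted: List[Tuple[float, Rule]], constraints: List[Rule]
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustive search over {0,1}^n, standing in for an ILP solver."""
    best, best_score = None, float("-inf")
    for y in product((0, 1), repeat=n):
        if any(constraint(c, y) > 0 for c in constraints):
            continue  # violates a hard constraint
        score = sum(w * potential(r, y) for w, r in weighted)
        if score > best_score:
            best, best_score = y, score
    return best, best_score


# Atoms: y0 = Agree(u1, t1), y1 = VoteFor(u2, u1), y2 = Agree(u2, t1).
# Hard constraint: not y0 or not y1 or y2 (the rule from the running example).
transitivity: Rule = ({2}, {0, 1})
# Toy weights: local evidence favors y0 and y1, and mildly disfavors y2.
weighted = [(2.0, ({0}, set())), (2.0, ({1}, set())), (0.5, (set(), {2}))]

unconstrained = map_inference(3, weighted, [])
constrained = map_inference(3, weighted, [transitivity])
```

Without the constraint the best assignment sets y2 to 0; once the rule is enforced as a hard constraint, the solver flips y2 to 1 to stay consistent with the other two decisions.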

In addition to logical constraints, we also support
arithmetic constraints that can be written in the
form of linear combinations of atoms with an
inequality or an equality. For example, we can
enforce the mutual exclusivity of liberal and
conservative ideologies for any user X by writing:

Ideology(X, "con") + Ideology(X, "lib") = 1

We borrow some additional syntax from PSL to
make arithmetic rules easier to use. Bach et al.
(2017) define a summation atom as an atom that
takes terms and/or sum variables as arguments.
A summation atom represents the summation of
ground atoms that can be obtained by substituting
individual variables and summing over all possible
constants for sum variables. For example, we
could rewrite the above ideology constraint
as Ideology(X, +I) = 1, where Ideology(X, +I)
represents the summation of all atoms with
predicate Ideology that share variable X.

DRAIL uses two solvers, Gurobi (Gurobi
Optimization, 2015) and AD3 (Martins et al.,
2015), for exact and approximate inference,
respectively.
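
As a small illustration of how a summation atom unfolds, the sketch below expands Ideology(X, +I) for one user and checks the equality constraint against an assignment. The helper names and the string encoding of atoms are assumptions for this example, not PSL or DRAIL syntax.

```python
from typing import Dict, List


def expand_summation(predicate: str, constant: str, sum_domain: List[str]) -> List[str]:
    """List the ground atoms that a summation atom sums over for one constant."""
    return [f'{predicate}("{constant}", "{value}")' for value in sum_domain]


def equality_holds(atoms: List[str], assignment: Dict[str, int], total: int = 1) -> bool:
    """Check the arithmetic constraint sum(atoms) = total."""
    return sum(assignment[a] for a in atoms) == total


atoms = expand_summation("Ideology", "u1", ["con", "lib"])
```

An assignment that labels "u1" as both conservative and liberal, or as neither, fails `equality_holds`, which is the mutual exclusivity the constraint is meant to enforce.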

To ground DRAIL programs in data, we create
an in-memory database consisting of all relations
expressed in the program. Observations associated
with each relation are provided in column-separated
text files. DRAIL's compiler instantiates the
program by automatically querying the database
and grounding the formatted rules and constraints.
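
The grounding step can be pictured as substituting typed constants for variables and keeping only the substitutions whose closed-relation atoms are observed. The sketch below is a simplified stand-in for DRAIL's compiler, not its implementation: the in-memory "database" is a plain Python set, and the rule Author(U, P) ∧ Agree(C, P) ⇒ Agree(C, U) is a hypothetical template chosen for illustration.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

AtomT = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)


def substitute(atom: AtomT, theta: Dict[str, str]) -> AtomT:
    """Replace variables in an atom according to the substitution theta."""
    pred, args = atom
    return pred, tuple(theta.get(a, a) for a in args)


def ground_rule(
    domains: Dict[str, List[str]],
    body: List[AtomT],
    head: AtomT,
    observed: Set[AtomT],
    closed: Set[str],
) -> List[Tuple[List[AtomT], AtomT]]:
    """Enumerate substitutions; keep those whose closed atoms are observed."""
    names = list(domains)
    result = []
    for values in product(*(domains[v] for v in names)):
        theta = dict(zip(names, values))
        g_body = [substitute(a, theta) for a in body]
        if all(a in observed for a in g_body if a[0] in closed):
            result.append((g_body, substitute(head, theta)))
    return result


domains = {"U": ["u1", "u2"], "P": ["p1", "p2"], "C": ["t1"]}
body = [("Author", ("U", "P")), ("Agree", ("C", "P"))]
head = ("Agree", ("C", "U"))
observed = {("Author", ("u1", "p1")), ("Author", ("u2", "p2"))}

rules = ground_rule(domains, body, head, observed, closed={"Author"})
```

Of the four possible (user, post) pairings, only the two with an observed Author atom survive, so the factor graph contains one rule grounding per authored post.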

3.2 Neural Components

Let $r$ be a rule grounding generated from template
$t$, where $t$ is tied to a neural scoring function
$\Phi_t$ and a set of parameters $\theta_t$ (Rule Layer in
Figure 2). In the previous section, we defined the
MAP problem for all potentials $\psi_r(\mathbf{x}, \mathbf{y}) \in \Psi$ in a
DRAIL program, where each potential has a weight
$w_r$. Consider the following scoring function:

$$w_r = \Phi_t(\mathbf{x}_r, \mathbf{y}_r; \theta_t) = \Phi_t(\mathbf{x}_{rel_0}, \ldots, \mathbf{x}_{rel_{n-1}}; \theta_t) \tag{4}$$

Notice that all potentials generated by the same
template share parameters. We define each scoring
function $\Phi_t$ over the set of atoms on the
left-hand side of the rule template. Let $t = rel_0 \wedge
rel_1 \wedge \ldots \wedge rel_{n-1} \Rightarrow rel_n$ be a rule template.
Each atom $rel_i$ is composed of a relation type, its
arguments and feature vectors for them, as shown
in Figure 2, "Input Layer".
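
The parameter sharing in Eq. (4) can be seen in a toy scorer. As a deliberate simplification, the sketch below uses a single linear layer over the concatenated atom features in place of a neural network; `score`, `theta_t`, and the feature values are hypothetical.

```python
from typing import List, Sequence


def score(theta: Sequence[float], atom_features: List[List[float]]) -> float:
    """Phi_t: concatenate the body atoms' feature vectors, then apply theta."""
    x = [value for feats in atom_features for value in feats]
    return sum(t * v for t, v in zip(theta, x))


# One parameter vector per template, shared by all of its groundings.
theta_t = [0.5, -1.0, 2.0, 0.0]

w_r1 = score(theta_t, [[1.0, 2.0], [3.0, 4.0]])  # first grounding
w_r2 = score(theta_t, [[0.0, 1.0], [1.0, 0.0]])  # second grounding, same theta_t
```

Both groundings are scored with the same `theta_t`, so a gradient step driven by either one changes the weight assigned to the other.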

Given that a DRAIL program is composed of
many competing rules over the same problem, we
want to be able to share information between the
different decision functions. For this purpose, we
introduce RELNETS.

3.3 RELNETS

A DRAIL program often uses the same entities and
relations in multiple different rules. The symbolic
aspect of DRAIL allows us to constrain the values
of open relations, and force consistency across all
their occurrences. The neural aspect, as defined in
Eq. 4, associates a neural architecture with each
rule template, which can be viewed as a way to
embed the output relation.

We want to exploit the fact that there are
repeating occurrences of entities and relations
across different rules. Given that each rule defines
a learning problem, sharing parameters allows us
to shape the representations using complementary
learning objectives. This form of relational
multi-task learning is illustrated in Figure 2, "RelNets
Layer".

We formalize this idea by introducing
relation-specific and entity-specific encoders and their
parameters $(\phi_{rel}; \theta_{rel})$ and $(\phi_{ent}; \theta_{ent})$, which are
reused in all rules. As an example, let us write
the formulation for the rules outlined in Figure 2,
where each relation and entity encoder is defined
over the set of relevant features.

$$w_{r_0} = \Phi_{t_0}(\phi_{debates}(\phi_{user}, \phi_{text}))$$
$$w_{r_1} = \Phi_{t_1}(\phi_{agree}(\phi_{user}, \phi_{text}), \phi_{votefor}(\phi_{user}, \phi_{user}))$$

Note that entity and relation encoders can be
arbitrarily complex, depending on the application.
For example, when dealing with text, we could
use BiLSTMs or a BERT encoder.
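
The effect of reusing encoders across rules can be illustrated with scalar stand-ins. In the sketch below, each encoder is reduced to a single weight (an assumption made so the arithmetic is easy to follow; real encoders would be neural modules such as BiLSTMs or BERT), and the rule-level functions are taken to be the identity and a sum. All names mirror the symbols in the two equations but are otherwise hypothetical.

```python
class Encoder:
    """Toy stand-in for an encoder (phi; theta): a single scalar weight."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def __call__(self, *inputs: float) -> float:
        return self.weight * sum(inputs)


# One encoder per entity type and per relation type, created once and reused.
phi_user, phi_text = Encoder(1.0), Encoder(0.5)
phi_debates, phi_agree, phi_votefor = Encoder(2.0), Encoder(1.0), Encoder(-1.0)


def w_r0(user: float, text: float) -> float:
    # Phi_t0 is the identity in this sketch.
    return phi_debates(phi_user(user), phi_text(text))


def w_r1(user_a: float, text: float, user_b: float) -> float:
    # Phi_t1 sums its two relation embeddings in this sketch.
    agree = phi_agree(phi_user(user_a), phi_text(text))
    vote = phi_votefor(phi_user(user_a), phi_user(user_b))
    return agree + vote
```

Because `phi_user` is a single shared object, changing its weight (as a training update would) alters the scores produced by both rules, which is the multi-task coupling that RELNETS relies on.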

Our goal when using RELNETS is to learn entity
representations that capture properties unique to
their types (e.g., users, issues), as well as relational
patterns that contextualize entities, allowing them
to generalize better. We make the distinction
between raw (or attributed) entities and symbolic
entities. Raw entities are associated with rich, yet
unstructured, information and attributes, such as
text or user profiles. On the other hand, symbolic
entities are well-defined concepts, and are not
associated with additional information, such as
political ideologies (e.g., liberal) and issues (e.g.,
gun-control). With this consideration, we identify
two types of representation learning objectives:

Embed Symbol / Explain Data: Aligns the
embedding of symbolic entities and raw entities,
grounding the symbol in the raw data, and using
the symbol embedding to explain properties
of previously unseen raw-entity instances. For
example, aligning ideologies and text to (1) obtain
an ideology embedding that is closest to the
statements made by people with that ideology,
or (2) interpret text by providing a symbolic label
for it.

Translate / Correlate: Aligns the representation
of pairs of symbolic or raw entities. For
example, aligning user representations with text,
to move between social and textual information, as
shown in Figure 1, "Social-Linguistic Relations",
or capturing the correlation between symbolic
judgements like agreement and matching
ideologies.

4 Learning

The scoring function used for comparing output
assignments can be learned locally for each
rule separately, or globally, by considering the
dependencies between rules.


Global Learning The global approach uses
inference to ensure that the parameters for all
weighted rule templates are consistent across all
decisions. Let Ψ be a factor graph with potentials
{ψr} ∈ Ψ over all possible structures Y.
Let θ = {θt} be a set of parameter vectors, and
Φt(xr, yr; θt) be the scoring function defined for
potential ψr(xr, yr). Here ŷ ∈ Y corresponds
to the current prediction resulting from the MAP
inference procedure and y ∈ Y corresponds to the
gold structure. We support two ways to learn θ:

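(1) The structured hinge loss

$$\max\Big(0,\ \max_{\hat{y} \in Y} \Big(\Delta(\hat{y}, y) + \sum_{\psi_r \in \Psi} \Phi_t(x_r, \hat{y}_r; \theta_t)\Big) - \sum_{\psi_r \in \Psi} \Phi_t(x_r, y_r; \theta_t)\Big) \tag{5}$$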

(2) The general CRF loss

$$-\log p(y \mid X) = -\log\Big(\frac{1}{Z(X)} \prod_{\psi_r \in \Psi} \exp\{\Phi_t(x_r, y_r; \theta_t)\}\Big) = -\sum_{\psi_r \in \Psi} \Phi_t(x_r, y_r; \theta_t) + \log Z(X) \tag{6}$$

where Z(X) is a global normalization term
computed over the set of all valid structures Y:

$$Z(X) = \sum_{y' \in Y} \prod_{\psi_r \in \Psi} \exp\{\Phi_t(x_r, y'_r; \theta_t)\}$$
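As a concrete illustration of the two global objectives, the following minimal sketch evaluates the structured hinge loss and the CRF loss by brute-force enumeration on a toy factor graph. The factor tables, the use of Hamming distance as the structure distance Δ, and all function names are assumptions made for this example; in DRAIL the scores come from neural networks and the maximization is solved with MAP inference rather than enumeration.

```python
import itertools
import math

# Toy factor graph: two binary output variables and three factors.
# Each factor maps the values of the variables in its scope to a score Phi.
# All scores are made-up numbers used only for illustration.
FACTORS = [
    ((0,), {(0,): 0.2, (1,): 1.0}),                 # unary factor on y[0]
    ((1,), {(0,): 0.7, (1,): 0.1}),                 # unary factor on y[1]
    ((0, 1), {(0, 0): 0.5, (0, 1): 0.0,
              (1, 0): 0.0, (1, 1): 0.5}),           # pairwise factor
]
STRUCTURES = list(itertools.product([0, 1], repeat=2))  # the set Y


def total_score(y):
    """Sum over all factors of Phi(x_r, y_r) for structure y."""
    return sum(table[tuple(y[i] for i in scope)] for scope, table in FACTORS)


def hamming(y_hat, y):
    """Assumed Delta(y_hat, y): number of variables that differ."""
    return sum(a != b for a, b in zip(y_hat, y))


def structured_hinge(gold):
    """Loss-augmented max over Y minus the gold score, clipped at 0."""
    augmented = max(hamming(y, gold) + total_score(y) for y in STRUCTURES)
    return max(0.0, augmented - total_score(gold))


def crf_loss(gold):
    """Negative gold score plus log Z, with Z summed over all of Y."""
    log_z = math.log(sum(math.exp(total_score(y)) for y in STRUCTURES))
    return -total_score(gold) + log_z
```

Exponentiating the negated CRF loss gives a probability for each structure, and these probabilities sum to one over Y, which is a quick sanity check on any implementation of the global normalization term.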

When inference is intractable, approximate
inference (e.g., AD3) can be used to obtain ŷ.
To approximate the global normalization term
Z(X) in the general CRF case, we follow Zhou
et al. (2015) and Andor et al. (2016) and keep a pool
βk of k high-quality feasible solutions during
inference. This way, we can sum over the solutions
in the pool to approximate the partition function:

$$\sum_{y' \in \beta_k} \prod_{\psi_r \in \Psi} \exp\{\Phi_t(x_r, y'_r; \theta_t)\}$$

In this paper, we use the structured hinge loss
for most experiments, and include a discussion on
the approximated CRF loss in Section 5.7.
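The effect of the pool size can be seen in a small sketch. It assumes a toy output space whose global scores are listed explicitly (made-up numbers), and it builds the pool by sorting those scores, which is only possible because the space is tiny; in practice the pool would be collected during inference. The names `log_z_exact` and `log_z_pool` are hypothetical.

```python
import itertools
import math

# Made-up global scores for every structure in a small output space.
SCORES = dict(zip(
    itertools.product([0, 1], repeat=3),
    [0.1, 0.4, 2.0, 0.3, 1.5, 0.2, 2.5, 0.6],
))


def log_z_exact():
    """log Z summed over the full set of structures."""
    return math.log(sum(math.exp(s) for s in SCORES.values()))


def log_z_pool(k):
    """log of the partition function restricted to the k best structures."""
    pool = sorted(SCORES.values(), reverse=True)[:k]
    return math.log(sum(math.exp(s) for s in pool))
```

Because every term in the sum is positive, the pooled estimate never exceeds the exact value and approaches it as k grows, so a larger pool tightens the approximation at the price of more search.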

Joint Inference The parameters for each
weighted rule template are optimized independently.
Following Andor et al. (2016), we show
that joint inference serves as a way to greedily
approximate the CRF loss, where we replace the
normalization term in Eq. (6) with a greedy
approximation over local normalization as:

$$-\log\Big(\frac{1}{\prod_{\psi_r \in \Psi} Z_L(x_r)} \prod_{\psi_r \in \Psi} \exp\{\Phi_t(x_r, y_r; \theta_t)\}\Big) = -\sum_{\psi_r \in \Psi} \Phi_t(x_r, y_r; \theta_t) + \sum_{\psi_r \in \Psi} \log Z_L(x_r) \tag{7}$$


where ZL(xr) is computed over all the valid
assignments y′r for each factor ψr. We refer to
models that use this approach as JOINTINF.

$$Z_L(x_r) = \sum_{y'_r} \exp\{\Phi_t(x_r, y'_r; \theta_t)\}$$
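One way to read the locally normalized objective is as a sum of independent softmax cross-entropy terms, one per factor. The sketch below checks this equivalence numerically. The factor scores, label names, and function names are assumptions made for illustration and do not come from the DRAIL code base.

```python
import math

# Made-up local scores for three factors; each dict lists the score of
# every valid assignment of that factor.
FACTOR_SCORES = [
    {"pro": 1.2, "con": 0.3},
    {"pro": 0.4, "con": 0.9},
    {"agree": 0.8, "disagree": 0.1},
]
GOLD = ["pro", "con", "disagree"]


def local_log_z(scores):
    """log Z_L for one factor: log-sum-exp over its valid assignments."""
    return math.log(sum(math.exp(s) for s in scores.values()))


def locally_normalized_loss(factor_scores, gold):
    """Negative sum of gold scores plus the sum of local log-normalizers."""
    gold_score = sum(scores[y] for scores, y in zip(factor_scores, gold))
    return -gold_score + sum(local_log_z(s) for s in factor_scores)


def sum_of_softmax_nll(factor_scores, gold):
    """Equivalent view: one softmax negative log-likelihood per factor."""
    return sum(-(scores[y] - local_log_z(scores))
               for scores, y in zip(factor_scores, gold))
```

Since each local normalizer depends on a single factor, no search over joint structures is needed during training, which is why this variant is cheaper than the globally normalized loss.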

5 Experimental Evaluation

We compare DRAIL to representative models from
each category covered in Section 2. Our goal is to
examine how different types of approaches capture
dependencies and what their limitations are
when dealing with language interactions. These
baselines are described in Section 5.1. We also
evaluate different strategies using DRAIL in
Section 5.2.

We focus on three tasks: open debate stance
prediction (Sec. 5.3), issue-specific stance
prediction (Sec. 5.4) and argumentation mining
(Sec. 5.5). Details regarding the hyper-parameters
used for all tasks can be found in Appendix B.

5.1 Baselines

End-to-end Neural Nets: We test all approaches
against neural nets trained locally on each task,
without explicitly modeling dependencies. In this
space, we consider two variants: INDNETS, where
each component of the problem is represented
using an independent neural network, and E2E,
where the features for the different components
are concatenated at the input and fed to a single
neural network.

Relational Embedding Methods: Introduced
in Section 2.2, these methods embed nodes
and edge types for relational data. They are
typically designed to represent symbolic entities
and relations. However, because our entities can be
defined by raw textual content and other features,
we define the relational objectives over our
encoders. This adaptation has proven successful
for domains dealing with rich textual information
(Lee and Goldwasser, 2019). We test three
relational knowledge objectives: TransE (Bordes
et al., 2013), ComplEx (Trouillon et al., 2016),
and RotatE (Sun et al., 2019). Limitations: (1)
These approaches cannot constrain the space using
domain knowledge, and (2) they cannot deal with
relations involving more than two entities, limiting
their applicability to higher order factors.
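For readers unfamiliar with the three objectives, the sketch below writes down their scoring functions in their commonly cited form, applied to plain Python vectors. It is an illustration only: in the setting described above the head and tail vectors would be produced by the neural encoders, and the function names, the toy inputs, and the choice of norm in `rotate_score` (a sum of per-dimension moduli) are assumptions.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between h + r and t (real vectors)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def complex_score(h, r, t):
    """ComplEx: real part of sum_i h_i * r_i * conj(t_i) (complex vectors)."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, phases, t):
    """RotatE: the relation rotates each complex dimension by a phase.

    The score is the negative sum of |h_i * e^{i * phase_i} - t_i|.
    """
    return -sum(abs(hi * cmath.exp(1j * p) - ti)
                for hi, p, ti in zip(h, phases, t))
```

All three functions take exactly one head, one relation, and one tail, which makes the second limitation visible in code: there is no argument slot for a third entity.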

Probabilistic Logics: We compare to PSL
(Bach et al., 2017), a purely symbolic probabilistic
logic, and TensorLog (Cohen et al., 2020), a
neuro-symbolic one. In both cases, we instantiate
the program using the weights learned with our
base encoders. Limitations: These approaches do
not provide a way to update the parameters of the
base classifiers.

5.2 Modeling Strategies

Local vs. Global Learning: The trade-off between
local and global learning has been explored
for graphical models (MEMM vs. CRF), and for
deep structured prediction (Chen and Manning,
2014; Andor et al., 2016; Han et al., 2019).
Although local learning is faster, the learned
scoring functions might not be consistent with
the correct global prediction. Following Han
et al. (2019), we initialize the parameters using
local models.

RELNETS: We will show the advantage of
having relational representations that are shared
across different decisions, in contrast to having
independent parameters for each rule. Note that
in all cases, we will use the global learning
objective to train RELNETS.

Modularity: Decomposing decisions into
relevant modules has been shown to simplify the
learning process and lead to better generalization
(Zhang and Goldwasser, 2019). We will contrast
the performance of modular and end-to-end models
to represent text and user information when
predicting stances.

Representation Learning and Interpretability:
We will do a qualitative analysis to show how
we are able to embed symbols and explain data
by moving between symbolic and sub-symbolic
representations, as outlined in Section 3.3.

5.3 Open Domain Stance Prediction

Traditionally, stance prediction tasks have focused
on predicting stances on a specific topic, such as
abortion. Predicting stances for a different topic,
such as gun control, would require learning a new
model from scratch. In this task, we would like to
leverage the fact that stances in different domains
are correlated. Instead of using a pre-defined set
of debate topics (i.e., symbolic entities) we define
the prediction task over claims, expressed in text,
specific to each debate. Concretely, each debate
will have a different claim (i.e., different value for
C in the relation Claim(T, C), where T corresponds
to a debate thread). We refer to these settings as
Open-Domain and write down the task in Figure 3.
In addition to the textual stance prediction problem
(r0), where P corresponds to a post, we represent
users (U) and define a user-level stance prediction
problem (r1). We assume that additional users read
the posts and vote for content that supports their
views, resulting in another prediction problem
(r2, r3). Then, we define representation learning
tasks, which align symbolic (ideology, defined as
I) and raw (users and text) entities (r4-r7). Finally,
we write down all dependencies and constrain the
final prediction (c0-c7).

Figure 3: DRAIL Program for O.D. Stance Prediction.
T: Thread, C: Claim, P: Post, U: User, V: Voter, I: Ideology, A, B:
Can be any in {Claim, Post, User}

Dataset: We collected a set of 7,555 debates
from debate.org, containing a total of 42,245
posts across 10 broader political issues. For a
given issue, the debate topics are nuanced and
vary according to the debate question expressed in
text (e.g., Should semi-automatic guns be banned,
Conceal handgun laws reduce violent crime).
Debates have at least two posts, containing up
to 25 sentences each. In addition to debates and
posts, we collected the user profiles of all users
participating in the debates, as well as all users

(Left)
                              Random               Hard
Group        Model         P     U     V        P     U     V
Local        INDNETS      63.9  61.3  54.4     62.2  53.0  51.3
             E2E          66.3  71.2  54.4     63.4  68.1  51.3
Reln. Emb.   TransE       58.5  54.1  52.6     57.2  53.1  51.2
             ComplEx      61.0  63.3  58.1     57.3  55.0  55.4
             RotatE       59.6  58.3  54.2     57.9  54.6  51.0
Prob. Logic  PSL          78.7  77.5  55.4     72.6  71.8  52.6
             TensorLog    72.7  71.9  56.2     70.0  67.4  55.8
DRaiL        E2E +Inf     80.2  79.2  54.4     76.9  75.5  51.3
             JOINTINF     80.7  79.5  55.6     75.2  74.0  52.5
             GLOBAL       81.0  79.5  55.8     75.3  74.0  53.0
             RELNETS      81.9  80.4  57.0     78.0  77.2  53.7

(Right)
                              Random               Hard
Group        Model         P     U     V        P     U     V
Local        INDNETS      63.9  61.3  54.4     62.2  53.0  51.3
             E2E          66.3  71.2  54.4     63.4  68.1  51.3
AC           JOINTINF     73.6  71.8   -       69.0  67.2   -
             GLOBAL       73.6  72.0   -       69.0  67.2   -
             RELNETS      73.8  72.0   -       71.7  69.5   -
AC-DC        JOINTINF     80.7  79.5   -       75.6  74.4   -
             GLOBAL       81.4  79.9   -       75.8  74.6   -
             RELNETS      81.8  80.1   -       77.8  76.4   -
AC-DC-SC     JOINTINF     80.7  79.5  55.6     75.2  74.0  52.5
             GLOBAL       81.0  79.5  55.8     75.3  74.0  53.0
             RELNETS      81.9  80.4  57.0     78.0  77.2  53.7

Table 2: General Results for Open Domain Stance Prediction (Left), Variations of the Model (Right). P: Post,
U: User, V: Voter

that cast votes for the debate participants. Profiles
consist of attributes (e.g., gender, ideology).
User data is considerably sparse. We create two
evaluation scenarios, random and hard. In the
random split, debates are randomly divided into
ten folds of equal size. In the hard split, debates
are separated by political issue. This results in a
harder prediction problem, as the test data will not
share topically related debates with the training
data. We perform 10-fold cross validation and
report accuracy.
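The difference between the two evaluation scenarios can be sketched in a few lines. The code assumes each debate is a dictionary with a hypothetical `"issue"` field, that the random split deals shuffled debates round-robin into folds, and that the hard split uses one fold per political issue; these details are illustrative guesses rather than the exact procedure used to build the released splits.

```python
import random
from collections import defaultdict


def random_folds(debates, n_folds=10, seed=0):
    """Shuffle debates and deal them round-robin into n_folds folds."""
    shuffled = list(debates)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def hard_folds(debates):
    """Group debates by issue so that each fold holds exactly one issue."""
    by_issue = defaultdict(list)
    for debate in debates:
        by_issue[debate["issue"]].append(debate)
    return [by_issue[issue] for issue in sorted(by_issue)]


def cross_validation(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test
```

Under the hard split, no issue seen at test time appears in the training folds, which is what makes generalization to unobserved topics measurable.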

Entity and Relation Encoders: We represent
posts and titles using a pre-trained BERT-small²
encoder (Turc et al., 2019), a compact version of
the language model proposed by Devlin et al.
(2019). For users, we use feed-forward computations
with ReLU activations over the profile features
and a pre-trained node embedding (Grover
and Leskovec, 2016) over the friendship graph.
All relation and rule encoders are represented
as feed-forward networks with one hidden layer,
ReLU activations and a softmax on top. Note that
all of these modules are updated during learning.

Table 2 (Left) shows results for all the models
described in Section 5.1. In E2E models, post and
user information is collapsed into a single module
(rule), whereas in INDNETS, JOINTINF, GLOBAL
and RELNETS they are modeled separately. All
other baselines use the same underlying modular
encoders. We can appreciate the advantage of
relational embeddings in contrast to INDNETS for
user and voter stances, particularly in the case of
ComplEx and RotatE. We can attribute this to the
fact that all objectives are trained jointly and entity
encoders are shared. However, approaches that
explicitly model inference, like PSL, TensorLog,
and DRAIL outperform relational embeddings and
end-to-end neural networks. This is because they
enforce domain constraints.

² We found negligible difference in performance between
BERT and BERT-small for this task, while obtaining a
considerable boost in speed.

We explain the difference between the per-
formance of DRAIL and the other probabilistic
logics by: (1) The fact that we use exact inference
instead of approximate inference, (2) PSL learns
to weight the rules without giving priority to a
particular task, whereas the JOINTINF model works
directly over the local outputs, and most impor-
tantly, (3) our GLOBAL and RELNETS models back-
propagate to the base classifiers and fine-tune
parameters using a structured objective.

In Table 2 (Right) we show different versions
of the DRAIL program, by adding or removing
certain constraints. AC models only enforce
author consistency, AC-DC models enforce both
author consistency and disagreement between
respondents, and finally, AC-DC-SC models
introduce social information by considering voting
behavior. We get better performance when we
model more contextualizing information for the
RELNETS case. This is particularly helpful in the
Hard case, where contextualizing information,
combined with shared representations, helps the
model generalize to previously unobserved topics.
With respect to the modeling strategies listed in
Section 5.2, we can observe: (1) the advantage of
using a global learning objective, (2) the advantage
of using RELNETS to share information and (3) the
advantage of breaking down the decision into
modules, instead of learning an end-to-end model.
Then, we perform a qualitative evaluation to
illustrate our ability to move between symbolic

Issue: Guns
Con   Libt  Mod   Libl  Pro   Debate Statement
.98   .01   .00   .01   .00   No gun laws should be passed restricting the right to bear arms
.03   .65   .22   .02   .08   Gun control is an ineffective comfort tactic used by the government to fool the American people
.06   .06   .60   .15   .14   Gun control is good for society
.02   .01   .01   .93   .03   In the US handguns ought to be banned
.01   .00   .00   .00   .99   The USA should ban most guns and confiscate them

Issue: LGBT
Ideology  Statements close in the embedding space
Libl      gay marriage ought be legalized; gay marriage should be legalized; same-sex marriage should be federally legal
Con       Leviticus 18:22 and 20:13 prove the anti gay marriage position; gay marriage is not bad; homosexuality is not a sin nor taboo

Table 3: Representation Learning Objectives: Explain Data (Top) and Embed Symbol (Bottom).
Note that ideology labels were learned from user profiles, and do not necessarily represent the official stances of political
parties.

and raw information. Table 3 (Top) takes a
set of statements and explains them by looking
at the symbols associated with them and their
score. For learning to map debate statements
to ideological symbols, we rely on the partial
supervision provided by the users that self-identify
with a political ideology and disclose it on their
public profiles. Note that we do not incorporate
any explicit expertise in political science to learn
to represent ideological information. We chose
statements with the highest score for each of the
ideologies. We can see that, in the context of
guns, statements that have to do with some form
of gun control have higher scores for the center-to-left
spectrum of ideological symbols (moderate,
liberal, progressive), whereas statements that
mention gun rights and the ineffectiveness of
gun control policies have higher scores for
conservative and libertarian symbols.

To complement this evaluation, in Table 3
(Bottom), we embed ideologies and find three
example statements that are close in the embedding
space. In the context of LGBT issues, we
find that statements closest to the liberal symbol
are those that support the legalization of same-sex
marriage, and frame it as a constitutional
issue. On the other hand, the statements closest
to the conservative symbol frame homosexuality
and same-sex marriage as a moral or religious
issue, and we find statements both supporting
and opposing same-sex marriage. This experiment
shows that our model is easy to interpret, and
provides an explanation for the decision made.
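Retrieving the statements closest to a symbol amounts to a nearest-neighbor search in the shared embedding space. The sketch below shows one way to do this, assuming cosine similarity as the closeness measure and plain lists as embeddings; the text does not specify the measure, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def closest_statements(symbol_vec, statement_vecs, k=3):
    """Return the k statement keys most similar to the symbol embedding."""
    ranked = sorted(statement_vecs,
                    key=lambda s: cosine(symbol_vec, statement_vecs[s]),
                    reverse=True)
    return ranked[:k]
```

For instance, with `vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}`, calling `closest_statements([1.0, 0.0], vecs, k=2)` ranks `"a"` and `"b"` ahead of `"c"`.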

Finally, we evaluate our learned model over
entities that have not been observed during
training. To do this, we extract statements made by
three prominent politicians from ontheissues.org.
Then, we try to explain the politicians by looking
at their predicted ideology. Results for this
                              AB            E             GM            GC
Group         Model        S     A       S     A       S     A       S     A
Local         INDNETS     66.0  61.7    58.2  59.7    62.6  60.6    59.5  61.0
Reln. Embed.  TransE      62.5  62.9    53.5  65.1    58.7  69.3    55.3  65.0
              ComplEx     66.6  73.4    60.7  72.2    66.6  72.8    60.0  70.7
              RotatE      66.6  72.3    59.2  71.3    67.0  74.2    59.4  69.9
Prob. Logic   PSL         81.6  74.4    69.0  64.9    83.3  74.2    71.9  71.7
              TensorLog   77.3  61.3    68.2  51.3    80.4  65.2    68.3  55.6
DRAIL         JOINTINF    82.8  74.6    64.8  63.2    84.5  73.4    70.4  66.3
              GLOBAL      88.6  84.7    72.8  72.2    90.3  81.8    76.8  72.2
              RELNETS     89.0  83.5    80.5  76.4    89.3  82.1    80.3  73.4

Table 4: General results for issue-specific stance (S)
and agreement (A) prediction (Macro F1). AB: Abortion,
E: Evolution, GM: Gay Marriage, GC: Gun Control.

evaluation can be seen in Figure 4. The left part of
Figure 4 shows the proportion of statements that
were identified for each ideology: left (liberal or
progressive), moderate and right (conservative).
We find that we are able to recover the relative
positions in the political spectrum for the evaluated
politicians: Bernie Sanders, Joe Biden, and Donald
Trump. We find that Sanders is the most left
leaning, followed by Biden. In contrast, Donald
Trump stands mostly on the right. We also include
some examples of the classified statements. We
show that we are able to identify cases in which
the statement does not necessarily align with the
known ideology for each politician.

5.4 Issue-Specific Stance Prediction

Given a debate thread on a specific issue (e.g.,
abortion), the task is to predict the stance with
respect to the issue for each one of the debate
posts (Walker et al., 2012). Each thread forms a
tree structure, where users participate and respond
to each other’s posts. We treat the task as a
collective classification problem, and model the
agreement between posts and their replies, as well
as the consistency between posts written by the

Figure 4: Statements made by politicians classified using our model trained on debate.org.

same author. The DRAIL program for this task can
be observed in Appendix A.
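To make the collective classification setup concrete, the following sketch runs exhaustive MAP inference over a three-post thread. It is a simplified stand-in for the actual DRAIL program: agreement is derived directly from the two stances instead of being a separate constrained variable, author consistency is imposed as a hard constraint, and all scores, names, and the toy thread are made up.

```python
import itertools

# Toy thread: author of each post, and reply edges as (child, parent).
AUTHORS = ["ann", "bob", "ann"]
REPLIES = [(1, 0), (2, 1)]

# Made-up local scores. STANCE[i][s]: post i takes stance s (0=con, 1=pro).
# AGREE[edge][a]: the reply pair disagrees (a=0) or agrees (a=1).
STANCE = [{0: 0.2, 1: 0.9}, {0: 0.6, 1: 0.5}, {0: 0.9, 1: 0.4}]
AGREE = {(1, 0): {0: 0.8, 1: 0.1}, (2, 1): {0: 0.6, 1: 0.3}}


def is_consistent(stances):
    """Hard constraint: posts by the same author share one stance."""
    seen = {}
    return all(seen.setdefault(AUTHORS[i], s) == s
               for i, s in enumerate(stances))


def score(stances):
    """Local stance scores plus agreement scores implied by the stances."""
    total = sum(STANCE[i][s] for i, s in enumerate(stances))
    for child, parent in REPLIES:
        agree = int(stances[child] == stances[parent])
        total += AGREE[(child, parent)][agree]
    return total


def map_inference():
    """Search all feasible assignments and return the best-scoring one."""
    feasible = (y for y in itertools.product([0, 1], repeat=len(STANCE))
                if is_consistent(y))
    return max(feasible, key=score)
```

In this toy thread the unconstrained optimum assigns different stances to the two posts written by the same author; the consistency constraint removes that assignment and changes the prediction for the third post.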

Dataset: We use the 4Forums dataset from the
Internet Argument Corpus (Walker et al., 2012),
consisting of a total of 1,230 debates and 24,658
posts on abortion, evolution, gay marriage, and
gun control. We use the same splits as Li et al.
(2018) and perform 5-fold cross validation.

Entity and Relation Encoders: We represent
posts using pre-trained BERT encoders
(Devlin et al., 2019) and do not generate features
for authors. As in the previous task, we model all
relations and rules using feed-forward networks
with one hidden layer and ReLU activations. Note
that we fine-tune all parameters during training.

In Table 4 we can observe the general results for
this task. We report macro F1 for post stance and
agreement between posts for all issues. As in the
previous task, we find that ComplEx and RotatE
relational embeddings outperform INDNETS, and
probabilistic logics outperform methods that do
not perform constrained inference. PSL outperforms
JOINTINF for evolution and gun control
debates, which are the two issues with less
training data, whereas JOINTINF outperforms PSL
for debates on abortion and gay marriage. This
could indicate that re-weighting rules may be
advantageous for the cases with less supervision.
Finally, we see the advantage of using a global
learning objective and augmenting it with shared
representations. Table 5 compares our model with
previously published results.

5.5 Argument Mining

The goal of this task is to identify argumentative
structures in essays. Each argumentative structure
corresponds to a tree in a document. Nodes are
predefined spans of text and can be labeled
either as claims, major claims, or premises,

Model                             A      E      GM     GC     Avg
BERT (Devlin et al., 2019)       67.0   62.4   67.4   64.6   65.4
PSL (Sridhar et al., 2015b)      77.0   80.3   80.5   69.1   76.7
Struct. Rep. (Li et al., 2018)   86.5   82.2   87.6   83.1   84.9
DRAIL RELNETS                    89.2   82.4   90.1   83.1   86.2

Table 5: Previous work on issue-specific stance
prediction (stance acc.).

and edges correspond to support/attack relations
between nodes. Domain knowledge is injected by
constraining sources to be premises and targets
to be either premises or major claims, as well
as enforcing tree structures. We model nodes,
links, and second order relations, grandparent
(a → b → c), and co-parent (a → b ← c)
(Niculae et al., 2017). Additionally, we consider
link labels, denoted stances. The DRAIL program
for this task can be observed in Appendix A.
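The domain constraints described above can be read as a feasibility check on a candidate structure. The sketch below is one possible reading, not the DRAIL encoding: it interprets "tree structure" as each node having at most one outgoing link and the link graph being acyclic, and the label strings and function name are hypothetical.

```python
def satisfies_constraints(labels, links):
    """Check a candidate argument structure against the stated constraints.

    labels: list of node types ("premise", "claim", "major_claim").
    links:  list of (source, target) node-index pairs.
    """
    parent = {}
    for source, target in links:
        if labels[source] != "premise":
            return False                  # sources must be premises
        if labels[target] not in ("premise", "major_claim"):
            return False                  # allowed target types
        if source in parent:
            return False                  # at most one outgoing link
        parent[source] = target
    # Acyclicity: following links from any node must terminate.
    for node in parent:
        seen = set()
        while node in parent:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True
```

During inference such conditions are expressed as constraints over the output variables, so infeasible structures are never proposed instead of being filtered afterwards.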

Dataset: We used the UKP dataset (Stab and
Gurevych, 2017), consisting of 402 documents,
with a total of 6,100 propositions and 3,800 links
(17% of pairs). We use the splits used by Niculae
et al. (2017), and report macro F1 for components
and positive F1 for relations.
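The two metrics differ in which labels they average over. A minimal sketch, assuming gold and predicted labels are given as parallel lists and that macro F1 averages over the labels present in the gold data, is shown below; the function names are hypothetical.

```python
def f1_for_label(gold, pred, label):
    """F1 of one label treated as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over the labels present in gold."""
    labels = sorted(set(gold))
    return sum(f1_for_label(gold, pred, l) for l in labels) / len(labels)
```

Components would be scored with `macro_f1` over the three node types, while links would be scored with `f1_for_label(gold, pred, 1)`, since only 17% of candidate pairs are linked and the negative class would otherwise dominate.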

Entity and Relation Encoders: To represent
the component and the essay, we used a
BiLSTM over the words, initialized with
GloVe embeddings (Pennington et al., 2014),
concatenated with a feature vector following
Niculae et al. (2017). For representing the relation,
we use a feed-forward computation over the
components, as well as the relation features used
in Niculae et al. (2017).

We can observe the general results for this
task in Table 6. Given that this task relies on
constructing the tree from scratch, we find that all
methods that do not include declarative constraints
(INDNETS and relational embeddings) suffer when


Group         Model      Node   Link   Avg    Stance   Avg
Local         INDNETS    70.7   52.8   61.7   63.4     62.3
Reln. Embed.  TransE     65.7   23.7   44.7   44.6     44.7
              ComplEx    69.1   15.7   42.4   53.5     46.1
              RotatE     67.2   20.7   44.0   46.7     44.9
Prob. Logic   PSL        76.5   56.4   66.5   64.7     65.9
DRAIL         JOINTINF   78.6   59.5   69.1   62.9     67.0
              GLOBAL     83.1   61.2   72.2   69.2     71.2
              RELNETS    82.9   63.7   73.3   68.4     71.7

Table 6: General results for argument mining.

Model                                      Node   Link   Avg
Human upper bound                          86.8   75.5   81.2
ILP Joint (Stab and Gurevych, 2017)        82.6   58.5   70.6
Struct RNN strict (Niculae et al., 2017)   79.3   50.1   64.7
Struct SVM full (Niculae et al., 2017)     77.6   60.1   68.9
Joint PointerNet (Potash et al., 2017)     84.9   60.8   72.9
Kuribayashi et al. 2019                    85.7   67.8   76.8
DRAIL RELNETS                              82.9   63.7   73.3

Table 7: Previous work on argument mining.

trying to predict links correctly. For this task, we
did not apply TensorLog, given that we couldn’t
find a way to express tree constraints using their
syntax. Once again, we see the advantage of using
global learning, as well as sharing information
between rules using RELNETS.

Table 7 shows the performance of our model
against previously published results. While we
are able to outperform models that use the same
underlying encoders and features, recent work
by Kuribayashi et al. (2019) further improved
performance by exploiting contextualized word
embeddings that look at the whole document,
and making a distinction between argumentative
markers and argumentative components. We
did not find a significant improvement by
incorporating their ELMo-LSTM encoders into
our framework,³ nor by replacing our BiLSTM
encoders with BERT. We leave the exploration
of an effective way to leverage contextualized
embeddings for this task for future work.

5.6 Run-time Analysis

In this section, we perform a run-time analysis
of all probabilistic logic systems tested. All
experiments were run on a 12 core 3.2 GHz Intel i7
CPU machine with 63GB RAM and an NVIDIA
GeForce GTX 1080 Ti 11GB GDDR5X GPU.

³ We did not experiment with their normalization
approach, extended BoW features, nor AC/AM distinction.

Figure 5: Average overall training time (per fold).

Figure 5 shows the overall training time (per
fold) in seconds for each of the evaluated tasks.
Note that the figure is presented in logarithmic
scale. We find that DRAIL is generally more
computationally expensive than both TensorLog
and PSL. This is expected given that DRAIL
back-propagates to the base classifiers at each epoch,
while the other frameworks just take the local
predictions as priors. However, when using a large
number of arithmetic constraints (e.g., Argument
Mining), we find that PSL takes considerably longer
to train. We found no significant difference when
using ILP or AD3. We presume that this is due to
the fact that our graphs are small and that Gurobi
is a highly optimized commercial software.

Finally, we find that when using encoders with
a large number of parameters (e.g., BERT) in
tasks with small graphs, the difference in training
time between training local and global models
is minimal. In these cases, back-propagation is
considerably more expensive than inference, and
global models converge in fewer epochs. For
Argument Mining, local models are at least twice
as fast. BiLSTMs are considerably faster than
BERT, and inference is more expensive for this
task.

5.7 Analysis of Loss Functions

In this section we perform an evaluation of the
CRF loss for issue-specific stance prediction.
Note that one drawback of the CRF loss (Eq. 6)
is that we need to accumulate the gradient for
the approximated partition function. When using
entity encoders with a lot of parameters (e.g.,
BERT), the amount of memory needed for a single
instance increases. We were unable to fit the full
models in our GPU. For the purpose of these tests,
we froze the BERT parameters after local training


Model         Stance   Agree   Avg     Secs p/epoch
Hinge loss    82.74    78.54   80.64   132
CRF(β = 5)    83.09    81.03   82.06   345
CRF(β = 20)   84.10    82.16   83.13   482
CRF(β = 50)   84.19    81.80   83.00   720

Table 8: Stance prediction (abortion) dev results
for different training objectives.

and updated only the relation and rule parameters.
To obtain the solution pool, we use Gurobi’s pool
search mode to find β high-quality solutions. This
also increases the cost of search at inference time.
Development set results for the debates on
abortion can be observed in Table 8. While
increasing the size of the solution pool leads
to better performance, it comes at a higher
computational cost.

6 Conclusions

In this paper, we motivate the need for a
declarative neural-symbolic approach that can be
applied to NLP tasks involving long texts and
contextualizing information. We introduce a general
framework to support this, and demonstrate
its flexibility by modeling problems with diverse
relations and rich representations, and obtain
models that are easy to interpret and expand.
The code for DRAIL and the application examples
in this paper have been released to the community,
to help promote this modeling approach for other
applications.

Acknowledgments

We would like to acknowledge current and former
members of the PurdueNLP lab, particularly Xiao
Zhang, Chang Li, Ibrahim Dalal, I-Ta Lee, Ayush
Jain, Rajkumar Pujari, and Shamik Roy for their
help and insightful discussions in the early stages
of this project. We also thank the reviewers and
action editor for their constructive feedback. This
project was partially funded by the NSF, grant
CNS-1814105.

References

Daniel Andor, Chris Alberti, David Weiss,
Aliaksei Severyn, Alessandro Presta, Kuzman
Ganchev, Slav Petrov, and Michael Collins.
2016. Globally normalized transition-based
neural networks. In Proceedings of the
54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long
Papers), pages 2442–2452, Berlin, Germany.
Association for Computational Linguistics.
DOI: https://doi.org/10.18653/v1/P16-1231

Stephen H. Bach, Matthias Broecheler, Bert
Huang, and Lise Getoor. 2017. Hinge-loss
Markov random fields and probabilistic soft
logic. Journal of Machine Learning Research
(JMLR), 18:1–67.

Islam Beltagy, Katrin Erk, and Raymond Mooney.
2014. Probabilistic soft logic for semantic
textual similarity. In Proceedings of the 52nd
Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers),
pages 1210–1219. Baltimore, Maryland.
Association for Computational Linguistics. DOI:
https://doi.org/10.3115/v1/P14-1114

Antoine Bordes, Nicolas Usunier, Alberto
Garcia-Duran, Jason Weston, and Oksana
Yakhnenko. 2013. Translating embeddings
for modeling multi-relational data. In C. J. C.
Burges, L. Bottou, M. Welling, Z. Ghahramani,
and K. Q. Weinberger, editors, Advances in
Neural Information Processing Systems 26,
pages 2787–2795. Curran Associates, Inc.

Samuel R. Bowman, Gabor Angeli, Christopher
Potts, and Christopher D. Manning. 2015. A
large annotated corpus for learning natural
language inference. In Proceedings of the 2015
Conference on Empirical Methods in Natural
Language Processing, pages 632–642. Lisbon,
Portugal, Association for Computational
Linguistics. DOI: https://doi.org/10.18653/v1/D15-1075

Danqi Chen and Christopher Manning. 2014.
A fast and accurate dependency parser using
neural networks. In Proceedings of the 2014
Conference on Empirical Methods in Natural
Language Processing (EMNLP), pages 740–750.
Doha, Qatar. Association for Computational
Linguistics. DOI: https://doi.org/10.3115/v1/D14-1082

William W. Cohen, Fan Yang, and Kathryn
Mazaitis. 2020. TensorLog: A probabilistic

database implemented using deep-learning
infrastructure. Journal of Artificial Intelligence
Research, 67:285–325. DOI: https://doi.org/10.1613/jair.1.11944

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Kristina Toutanova. 2019. BERT: Pre-training
of deep bidirectional transformers for language
understanding. In Proceedings of the 2019
Conference of the North American Chapter of
the Association for Computational Linguistics:
Human Language Technologies, Volume 1
(Long and Short Papers), pages 4171–4186,
Minneapolis, Minnesota. Association for
Computational Linguistics.

Ivan Donadello, Luciano Serafini, and Artur
d’Avila Garcez. 2017. Logic tensor networks
for semantic image interpretation. In Proceedings
of the Twenty-Sixth International Joint
Conference on Artificial Intelligence, IJCAI-17,
pages 1596–1602. DOI: https://doi.org/10.24963/ijcai.2017/221

Honghua Dong, Jiayuan Mao, Tian Lin, Chong
Wang, Lihong Li, and Denny Zhou. 2019.
Neural logic machines. In 7th International
Conference on Learning Representations, ICLR
2019, New Orleans, LA, USA, May 6-9, 2019.

Alessandro Duranti and Charles Goodwin.
1992. Rethinking context: An introduction. In
Alessandro Duranti and Charles Goodwin,
editors, Rethinking Context: Language as an
Interactive Phenomenon, chapter 1, pages 1–42,
Cambridge University Press, Cambridge.

Noah D. Goodman, Vikash K. Mansinghka,
Daniel M. Roy, Keith Bonawitz, and Joshua B.
Tenenbaum. 2008. Church: a language for
generative models. In UAI 2008, Proceedings of
the 24th Conference in Uncertainty in Artificial
Intelligence, Helsinki, Finland, July 9-12, 2008,
pages 220–229.

Aditya Grover and Jure Leskovec. 2016.
node2vec: Scalable feature learning for
networks. In Proceedings of the 22nd
ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San
Francisco, CA, USA, August 13-17, 2016,
pages 855–864. DOI:
https://doi.org/10.1145/2939672.2939754, PMID:
27853626, PMCID: PMC5108654

John J. Gumperz. 1992. Contextualization
revisited. In P. Auer and A. Di Luzio,
editors, The Contextualization of Language,
pages 39–54. DOI:
https://doi.org/10.1075/pbns.22.04gum

Gurobi Optimization Inc. 2015. Gurobi optimizer
reference manual.

Will Hamilton, Zhitao Ying, and Jure Leskovec.
2017. Inductive representation learning on large
graphs. In I. Guyon, U. V. Luxburg, S. Bengio,
H. Wallach, R. Fergus, S. Vishwanathan, and
R. Garnett, editors, Advances in Neural
Information Processing Systems 30, Curran
Associates, Inc., pages 1024–1034.

Rujun Han, Qiang Ning, and Nanyun Peng.
2019. Joint event and temporal relation
extraction with shared representations and
structured prediction. In Proceedings of the 2019
Conference on Empirical Methods in Natural
Language Processing and the 9th International
Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), pages 434–444,
Hong Kong, China, Association for
Computational Linguistics. DOI:
https://doi.org/10.18653/v1/D19-1041

Seyed Mehran Kazemi and David Poole. 2018.
ReLNN: A deep neural model for relational
learning. In Proceedings of the Thirty-Second
AAAI Conference on Artificial Intelligence,
(AAAI-18), the 30th innovative Applications of
Artificial Intelligence (IAAI-18), and the 8th
AAAI Symposium on Educational Advances
in Artificial Intelligence (EAAI-18), New
Orleans, Louisiana, USA, February 2-7, 2018,
pages 6367–6375, AAAI Press.

Eliyahu Kiperwasser and Yoav Goldberg. 2016.
Simple and accurate dependency parsing using
bidirectional LSTM feature representations.
Transactions of the Association for
Computational Linguistics, 4:313–327. DOI:
https://doi.org/10.1162/tacl_a_00101

Thomas N. Kipf and Max Welling. 2017.
Semi-supervised classification with graph
convolutional networks. In 5th International
Conference on Learning Representations, ICLR
2017, Toulon, France, April 24-26, 2017,
Conference Track Proceedings.

Parisa Kordjamshidi, Dan Roth, and Hao Wu.
2015. Saul: Towards declarative learning
based programming. In Proceedings of the
Twenty-Fourth International Joint Conference
on Artificial Intelligence, IJCAI 2015,
Buenos Aires, Argentina, July 25-31, 2015,
pages 1844–1851.

Tatsuki Kuribayashi, Hiroki Ouchi, Naoya
Inoue, Paul Reisert, Toshinori Miyoshi, Jun
Suzuki, and Kentaro Inui. 2019. An empirical
study of span representations in argumentation
structure parsing. In Proceedings of the
57th Annual Meeting of the Association for
Computational Linguistics, pages 4691–4698,
Florence, Italy. Association for Computational
Linguistics. DOI:
https://doi.org/10.18653/v1/P19-1464

Guillaume Lample, Miguel Ballesteros, Sandeep
Subramanian, Kazuya Kawakami, and Chris
Dyer. 2016. Neural architectures for named
entity recognition. In Proceedings of the
2016 Conference of the North American
Chapter of the Association for Computational
Linguistics: Human Language Technologies,
pages 260–270, San Diego, California,
Association for Computational Linguistics.
DOI: https://doi.org/10.18653/v1/N16-1030

I-Ta Lee and Dan Goldwasser. 2019.
Multi-relational script learning for discourse
relations. In Proceedings of the 57th Annual
Meeting of the Association for Computational
Linguistics, pages 4214–4226, Florence, Italy.
Association for Computational Linguistics. DOI:
https://doi.org/10.18653/v1/P19-1413

Chang Li, Aldo Porco, and Dan Goldwasser.
2018. Structured representation learning for
online debate stance prediction. In Proceedings
of the 27th International Conference on
Computational Linguistics, pages 3728–3739,
Santa Fe, New Mexico, USA. Association for
Computational Linguistics.

Shulin Liu, Yubo Chen, Shizhu He, Kang Liu,
and Jun Zhao. 2016. Leveraging FrameNet to
improve automatic event detection. In
Proceedings of the 54th Annual Meeting of the
Association for Computational Linguistics
(Volume 1: Long Papers), pages 2134–2143.
Berlin, Germany, Association for Computational
Linguistics.

Xuezhe Ma and Eduard Hovy. 2016.
End-to-end sequence labeling via bi-directional
LSTM-CNNs-CRF. In Proceedings of the
54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long
Papers), pages 1064–1074, Berlin, Germany,
Association for Computational Linguistics.

Chaitanya Malaviya, Matthew R. Gormley,
and Graham Neubig. 2018. Neural factor
graph models for cross-lingual morphological
tagging. In Proceedings of the 56th Annual
Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers),
pages 2653–2663. Melbourne, Australia.
Association for Computational Linguistics. DOI:
https://doi.org/10.18653/v1/P18-1247

Robin Manhaeve, Sebastijan Dumancic, Angelika
Kimmig, Thomas Demeester, and Luc De
Raedt. 2018. Deepproblog: Neural probabilistic
logic programming. In S. Bengio, H. Wallach,
H. Larochelle, K. Grauman, N. Cesa-Bianchi,
and R. Garnett, editors, Advances in
Neural Information Processing Systems 31,
pages 3749–3759, Curran Associates, Inc.

Jiayuan Mao, Chuang Gan, Pushmeet Kohli,
Joshua B. Tenenbaum, and Jiajun Wu. 2019.
The neuro-symbolic concept learner:
Interpreting scenes, words, and sentences from
natural supervision.

Giuseppe Marra, Francesco Giannini,
Michelangelo Diligenti, and Marco Gori.
2019. Integrating learning and reasoning with
deep logic models. In Machine Learning and
Knowledge Discovery in Databases - European
Conference, ECML PKDD 2019, Würzburg,
Germany, September 16-20, 2019, Proceedings,
Part II, pages 517–532. DOI:
https://doi.org/10.1007/978-3-030-46147-8_31

André F. T. Martins, Mário A. T. Figueiredo,
Pedro M. Q. Aguiar, Noah A. Smith, and
Eric P. Xing. 2015. Ad3: Alternating directions
dual decomposition for map inference in
graphical models. Journal of Machine Learning
Research, 16(16):495–545.

Andrew McCallum, Karl Schultz, and Sameer
Singh. 2009. Factorie: Probabilistic programming
via imperatively defined factor graphs. In
Y. Bengio, D. Schuurmans, J. D. Lafferty,
C. K. I. Williams, and A. Culotta, editors,
Advances in Neural Information Processing
Systems 22, pages 1249–1257, Curran
Associates, Inc.

Miller McPherson, Lynn Smith-Lovin, and
James M. Cook. 2001. Birds of a feather:
Homophily in social networks. Annual Review of
Sociology, 27(1):415–444. DOI:
https://doi.org/10.1146/annurev.soc.27.1.415

Brian Milch, Bhaskara Marthi, Stuart Russell,
David Sontag, Daniel L. Ong, and Andrey
Kolobov. 2005. Blog: Probabilistic models with
unknown objects. In (IJCAI ’05) Nineteenth
International Joint Conference on Artificial
Intelligence.

Vlad Niculae, Joonsuk Park, and Claire Cardie.
2017. Argument mining with structured SVMs
and RNNs. In Proceedings of the 55th Annual
Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers),
pages 985–995, Vancouver, Canada.
Association for Computational Linguistics. DOI:
https://doi.org/10.18653/v1/P17-1091

Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth.
2018. Joint reasoning for temporal and causal
relations. In Proceedings of the 56th Annual
Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers),
pages 2278–2288, Melbourne, Australia,
Association for Computational Linguistics.
DOI: https://doi.org/10.18653/v1/P18-1212

Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi
Zhang, and Yang Wang. 2016. Tri-party deep
network representation. In Proceedings of the
Twenty-Fifth International Joint Conference on
Artificial Intelligence, IJCAI 2016, New York,
NY, USA, 9-15 July 2016, pages 1895–1901.

Jeffrey Pennington, Richard Socher, and
Christopher Manning. 2014. Glove: Global
vectors for word representation. In Proceedings
of the 2014 Conference on Empirical
Methods in Natural Language Processing
(EMNLP), pages 1532–1543, Doha, Qatar.
Association for Computational Linguistics.
DOI: https://doi.org/10.3115/v1/D14-1162

Bryan Perozzi, Rami Al-Rfou, and Steven
Skiena. 2014. Deepwalk: Online learning of
social representations. In Proceedings of the
20th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining,
KDD ’14, pages 701–710, New York, NY,
USA. ACM. DOI:
https://doi.org/10.1145/2623330.2623732

Matthew Peters, Mark Neumann, Mohit Iyyer,
Matt Gardner, Christopher Clark, Kenton Lee,
and Luke Zettlemoyer. 2018. Deep contextualized
word representations. In Proceedings of
the 2018 Conference of the North American
Chapter of the Association for Computational
Linguistics: Human Language Technologies,
Volume 1 (Long Papers), pages 2227–2237,
New Orleans, Louisiana, Association for
Computational Linguistics. DOI:
https://doi.org/10.18653/v1/N18-1202

Peter Potash, Alexey Romanov, and Anna
Rumshisky. 2017. Here’s my point: Joint
pointer architecture for argument mining.
In Proceedings of the 2017 Conference
on Empirical Methods in Natural Language
Processing, pages 1364–1373, Copenhagen,
Denmark. Association for Computational
Linguistics. DOI:
https://doi.org/10.18653/v1/D17-1143

Matthew Richardson and Pedro M. Domingos.
2006. Markov logic networks. Machine
Learning, 62(1–2):107–136. DOI:
https://doi.org/10.1007/s10994-006-5833-1

Nick Rizzolo and Dan Roth. 2010. Learning
based Java for rapid development of NLP
systems. In Proceedings of the Seventh
International Conference on Language Resources
and Evaluation (LREC’10). Valletta, Malta,
European Language Resources Association
(ELRA).

Tim Rocktäschel and Sebastian Riedel. 2017.
End-to-end differentiable proving. In I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach, R.
Fergus, S. Vishwanathan, and R. Garnett,
editors, Advances in Neural Information
Processing Systems 30, pages 3788–3800.
Curran Associates, Inc.

Ali Sadeghian, Mohammadreza Armandpour,
Patrick Ding, and Daisy Zhe Wang. 2019.
Drum: End-to-end differentiable rule mining
on knowledge graphs. In H. Wallach, H.
Larochelle, A. Beygelzimer, F. d'Alché-Buc,
E. Fox, and R. Garnett, editors, Advances in
Neural Information Processing Systems 32,
pages 15347–15357. Curran Associates, Inc.

Gustav Sourek, Vojtech Aschenbrenner, Filip
Zelezný, Steven Schockaert, and Ondrej
Kuzelka. 2018. Lifted relational neural
networks: Efficient learning of latent relational
structures. Journal of Artificial Intelligence
Research, 62:69–100. DOI:
https://doi.org/10.1613/jair.1.11203

Dhanya Sridhar, James Foulds, Bert Huang, Lise
Getoor, and Marilyn Walker. 2015b. Joint
models of disagreement and stance in online
debate. In Proceedings of the 53rd Annual
Meeting of the Association for Computational
Linguistics and the 7th International Joint
Conference on Natural Language Processing
(Volume 1: Long Papers), pages 116–125,
Beijing, China, Association for Computational
Linguistics. DOI:
https://doi.org/10.3115/v1/P15-1012

Christian Stab and Iryna Gurevych. 2017.
Parsing argumentation structures in persuasive
essays. Computational Linguistics,
43(3):619–659. DOI:
https://doi.org/10.1162/COLI_a_00295

Shivashankar Subramanian, Trevor Cohn,
Timothy Baldwin, and Julian Brooke. 2017.
Joint sentence-document model for manifesto
text analysis. In Proceedings of the Australasian
Language Technology Association Workshop
2017, pages 25–33, Brisbane, Australia.

Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and
Jian Tang. 2019. Rotate: Knowledge graph
embedding by relational rotation in complex
space. In 7th International Conference on
Learning Representations, ICLR 2019, New
Orleans, LA, USA, May 6-9, 2019.

Jian Tang, Meng Qu, Mingzhe Wang, Ming
Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE:
large-scale information network embedding. In
Proceedings of the 24th International Conference
on World Wide Web, WWW 2015, Florence,
Italy, May 18-22, 2015, pages 1067–1077. DOI:
https://doi.org/10.1145/2736277.2741093

Théo Trouillon, Johannes Welbl, Sebastian
Riedel, Eric Gaussier, and Guillaume Bouchard.
2016. Complex embeddings for simple link
prediction. In Proceedings of The 33rd
International Conference on Machine Learning,
volume 48 of Proceedings of Machine Learning
Research, pages 2071–2080, New York, New
York, USA, PMLR.

Cunchao Tu, Han Liu, Zhiyuan Liu, and
Maosong Sun. 2017. CANE: Context-aware
network embedding for relation modeling. In
Proceedings of the 55th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1722–1731,
Vancouver, Canada. Association for
Computational Linguistics.

Iulia Turc, Ming-Wei Chang, Kenton Lee,
and Kristina Toutanova. 2019. Well-read
students learn better: On the importance of
pre-training compact models. arXiv preprint
arXiv:1908.08962v2.

Petar Veličković, Guillem Cucurull, Arantxa
Casanova, Adriana Romero, Pietro Lio,
and Yoshua Bengio. 2017. Graph attention
networks. arXiv preprint arXiv:1710.10903.

Marilyn Walker, Pranav Anand, Rob Abbott,
and Ricky Grant. 2012. Stance classification
using dialogic properties of persuasion. In
Proceedings of the 2012 Conference of the
North American Chapter of the Association for
Computational Linguistics: Human Language
Technologies, pages 592–596, Montréal,
Canada, Association for Computational
Linguistics.

Hai Wang and Hoifung Poon. 2018. Deep
probabilistic logic: A unifying framework for
indirect supervision. In Proceedings of the 2018
Conference on Empirical Methods in Natural
Language Processing, pages 1891–1902,
Brussels, Belgium. Association for Computational
Linguistics. DOI:
https://doi.org/10.18653/v1/D18-1215

William Yang Wang, Kathryn Mazaitis, and
William W. Cohen. 2013. Programming with
personalized pagerank: A locally groundable
first-order probabilistic logic. In 22nd ACM
International Conference on Information and
Knowledge Management, CIKM’13, San
Francisco, CA, USA, October 27 - November 1, 2013,
pages 2129–2138. ACM. DOI:
https://doi.org/10.1145/2505515.2505573

Zhen Wang, Jianwen Zhang, Jianlin Feng,
and Zheng Chen. 2014. Knowledge graph
embedding by translating on hyperplanes.
In Proceedings of the Twenty-Eighth AAAI
Conference on Artificial Intelligence, July
27-31, 2014, Québec City, Québec, Canada,
pages 1112–1119.

David Weiss, Chris Alberti, Michael Collins,
and Slav Petrov. 2015. Structured training for
neural network transition-based parsing. In
Proceedings of the 53rd Annual Meeting of
the Association for Computational Linguistics
and the 7th International Joint Conference on
Natural Language Processing (Volume 1: Long
Papers), pages 323–333, Beijing, China.
Association for Computational Linguistics. DOI:
https://doi.org/10.3115/v1/P15-1032,
PMID: 26003913, PMCID: PMC4695984

Han Xiao, Minlie Huang, Lian Meng, and Xiaoyan
Zhu. 2017. SSP: semantic space projection
for knowledge graph embedding with text
descriptions. In Proceedings of the Thirty-First
AAAI Conference on Artificial Intelligence,
February 4-9, 2017, San Francisco, California,
USA, pages 3104–3110.

Fan Yang, Zhilin Yang, and William W. Cohen.
2017. Differentiable learning of logical rules
for knowledge base reasoning. In I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, and R. Garnett,
editors, Advances in Neural Information
Processing Systems 30, pages 2319–2328.
Curran Associates, Inc.

Xiao Zhang and Dan Goldwasser. 2019. Sentiment
tagging with partial labels using modular
architectures. In Proceedings of the 57th Annual
Meeting of the Association for Computational
Linguistics, pages 579–590, Florence, Italy.
Association for Computational Linguistics.

Hao Zhou, Yue Zhang, Shujian Huang, and
Jiajun Chen. 2015. A neural probabilistic
structured-prediction model for transition-based
dependency parsing. In Proceedings of
the 53rd Annual Meeting of the Association
for Computational Linguistics and
the 7th International Joint Conference on
Natural Language Processing (Volume 1:
Long Papers), pages 1213–1222, Beijing,
China, Association for Computational
Linguistics. DOI: https://doi.org/10.3115/v1/P15-1117,
PMCID: PMC4674637

A DRAIL Programs

Figure 6: DRAIL programs.

C Code Snippets

We include code snippets to show how to load
data into DRAIL (Figure 7-a), as well as
how to define a neural architecture (Figure 7-b).
Neural architectures and feature functions can be
programmed by creating Python classes, and the
module and classes can be directly specified in
the DRAIL program (lines 13, 14, 24, and 29 in
Figure 7-a).

Figure 7: Code snippets.
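
As a rough illustration of the idea in Appendix C (feature functions and architectures live in ordinary Python classes, and a program refers to them by module and class name), the sketch below defines a small feature class and resolves a class from a pair of strings. Everything in it is hypothetical: `DebateFeatures`, `bag_of_words`, and `load_class` are illustrative names, not DRAIL's actual interface, and the lookup through `importlib` is only an assumption about how such a reference could be resolved.

```python
import importlib


class DebateFeatures:
    """Hypothetical feature-function class (not DRAIL's real base class)."""

    def __init__(self, vocab):
        # Map each vocabulary word to a fixed vector position.
        self.index = {word: i for i, word in enumerate(vocab)}

    def bag_of_words(self, text):
        """Return a count vector over the vocabulary for one text."""
        vec = [0] * len(self.index)
        for token in text.lower().split():
            if token in self.index:
                vec[self.index[token]] += 1
        return vec


def load_class(module_name, class_name):
    """Resolve a class from module and class names given as strings."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    feats = DebateFeatures(["agree", "disagree", "claim"])
    print(feats.bag_of_words("I disagree with the claim and disagree again"))
    # Resolving a standard-library class by name, for demonstration only.
    print(load_class("collections", "Counter"))
```

Keeping the class reference as a pair of strings means the declarative program stays free of Python imports; the cost is that a misspelled name only fails when the class is looked up.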

B Hyperparameters

For BERT, we use the default parameters.

Task                   Param           Search Space                           Selected Value

Open Domain (Local)    Learning Rate   2e-6, 5e-6, 2e-5, 5e-5                 2e-5
                       Batch size      32 (max. mem)                          32
                       Patience        1, 3, 5                                3
                       Optimizer       SGD, Adam, AdamW                       AdamW
                       Hidden Units    128, 512                               512
                       Non-linearity   -                                      ReLU

Open Domain (Global)   Learning Rate   2e-6, 5e-6, 2e-5, 5e-5                 2e-6
                       Batch size      -                                      Full instance

Stance Pred. (Local)   Learning Rate   2e-6, 5e-6, 2e-5, 5e-5                 5e-5
                       Patience        1, 3, 5                                3
                       Batch size      16 (max. mem)                          16
                       Optimizer       SGD, Adam, AdamW                       AdamW

Stance Pred. (Global)  Learning Rate   2e-6, 5e-6, 2e-5, 5e-5                 2e-6
                       Batch size      -                                      Full instance

Arg. Mining (Local)    Learning Rate   1e-4, 5e-4, 5e-3, 1e-3, 5e-2, 1e-2     5e-2
                       Patience        5, 10, 20                              20
                       Batch size      16, 32, 64, 128                        64
                       Dropout         0.01, 0.05, 0.1                        0.05
                       Optimizer       SGD, Adam, AdamW                       SGD
                       Hidden Units    128, 512                               128
                       Non-linearity   -                                      ReLU

Arg. Mining (Global)   Learning Rate   1e-4, 5e-4, 5e-3, 1e-3, 5e-2, 1e-2     1e-4
                       Patience        5, 10, 20                              10
                       Batch size      -                                      Full instance

Table 9: Hyper-parameter tuning.
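
To make the size of one of these search spaces concrete, the sketch below enumerates the Arg. Mining (Local) row group of Table 9 as a Cartesian product. The appendix does not say whether the search was an exhaustive grid, so treating it as one is an assumption made only for illustration. The names `SEARCH_SPACE`, `SELECTED`, and `iter_configs` are hypothetical, and the non-linearity is left out because the table lists no alternatives for it.

```python
from itertools import product

# Search space for "Arg. Mining (Local)" as listed in Table 9.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 5e-4, 5e-3, 1e-3, 5e-2, 1e-2],
    "patience": [5, 10, 20],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.01, 0.05, 0.1],
    "optimizer": ["SGD", "Adam", "AdamW"],
    "hidden_units": [128, 512],
}

# Values reported in the "Selected Value" column for the same task.
SELECTED = {
    "learning_rate": 5e-2,
    "patience": 20,
    "batch_size": 64,
    "dropout": 0.05,
    "optimizer": "SGD",
    "hidden_units": 128,
}


def iter_configs(space):
    """Yield every combination of values as a {param: value} dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))         # 1296 combinations (6 * 3 * 4 * 3 * 3 * 2)
print(SELECTED in configs)  # True: the selected setting lies on the grid
```

A full grid over this space would need 1296 training runs, which is one reason a random or staged search is often used in practice instead.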
