Decomposing and Recomposing Event Structure
William Gantt
University of Rochester, USA
wgantt@cs.rochester.edu
Lelia Glass
Georgia Institute of Technology, USA
lelia.glass@modlangs.gatech.edu
Aaron Steven White
University of Rochester, USA
aaron.white@rochester.edu
Abstract
We present an event structure classification
empirically derived from inferential properties
annotated on sentence- and document-level
Universal Decompositional Semantics (UDS)
graphs. We induce this classification jointly
with semantic role, entity, and event-event re-
lation classifications using a document-level
generative model structured by these graphs.
To support this induction, we augment exist-
ing annotations found in the UDS1.0 dataset,
which covers the entirety of the English Web
Treebank, with an array of inferential prop-
erties capturing fine-grained aspects of the
temporal and aspectual structure of events.
The resulting dataset (available at decomp.io)
is the largest annotation of event structure and
(partial) event coreference to date.
1 Introduction
Natural language provides myriad ways of communicating
about complex events. For example,
one and the same event can be described at a
coarse grain, using a single clause (1), or at a finer
grain, using an entire document (2).
(1) The contractors built the house.
(2) They started by laying the house’s foun-
dation. They then framed the house before
installing the plumbing. After that […]
Further, descriptions of the same event at
different granularities can be interleaved within the
same document—for example, (2) might well di-
rectly follow (1) as an elaboration on the house-
building process.
Consequently, extracting knowledge about
complex events from text involves determining
the structure of the events being referred to: what
their parts are, how those parts are laid out in time,
who participates in them and how, and so forth.
Determining this structure requires an event classi-
fication whose elements are associated with event
structure representations. A number of such clas-
sifications and annotated corpora exist: FrameNet
(Baker et al., 1998), VerbNet (Kipper Schuler,
2005), PropBank (Palmer et al., 2005), Abstract
Meaning Representation (Banarescu et al., 2013),
and Universal Conceptual Cognitive Annotation
(Abend and Rappoport, 2013), among others.
Similar in spirit to this prior work, but dif-
ferent in method, our work aims to develop an
empirically derived event structure classification.
Where prior work takes a top–down approach—
hand-engineering an event classification before
deploying it for annotation—we take a bottom–up
approach—decomposing event structure into a
wide variety of theoretically informed, cross-
cutting semantic properties, annotating for those
properties, then recomposing an event classifica-
tion from them by induction. The properties on
which our categories rest target (i) the substructure
of an event (e.g., that the building described in
(1) consists of a sequence of subevents resulting
in the creation of some artifact); (ii) the superstructure
in which an event takes part (e.g., that
laying a house’s foundation is part of building a
house, alongside framing the house, installing the
plumbing, etc.); (iii) the relationship between an
event and its participants (e.g., that the contractors
in (1) build the house collectively through their
joint efforts); and (iv) properties of the event’s
participants (e.g., that the contractors in (1) are
animate while the house is not).
To derive our event structure classification, we
extend the Universal Decompositional Semantics
dataset (UDS; White et al., 2016, 2020). UDS anno-
tates for a subset of key event structure properties,
Transactions of the Association for Computational Linguistics, vol. 10, pp. 17–34, 2022. https://doi.org/10.1162/tacl_a_00445
Action Editor: Emily Bender. Submission batch: 4/2021; Revision batch: 7/2021; Published 1/2022.
© 2022 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
but a range of key properties remain to be captured.
After motivating the need for these additional
properties (§2), we develop annotation proto-
cols for them (§3). We validate our protocols
(§4) and use them to collect annotations for the
entire Universal Dependencies (Nivre et al., 2016)
English Web Treebank (§5; Bies et al., 2012),
resulting in the UDS-EventStructure dataset
(UDS-E). To derive an event structure classifica-
tion from UDS-E and existing UDS annotations,
we develop a document-level generative model
that jointly induces event, entity, semantic role,
and event-event relation types (§6). Finally, we
compare these types to those found in exist-
ing event structure classifications (§7). We make
UDS-E and our code available at decomp.io.
2 Background
Contemporary theoretical treatments of event
structure tend to take as their starting point
Vendler’s (1957) seminal four-way classification.
We briefly discuss this classification and elab-
orations thereon before turning to other event
structure classifications developed for annotating
corpora.1 We then contrast these with the fully
decompositional approach we take in this paper.
Theoretical Approaches Vendler categorizes
event descriptions into four classes: statives (3),
activities (4), achievements (5), and accomplishments
(6). As theoretical constructs, these classes
are used to explain both the distributional charac-
teristics of event descriptions as well as inferences
about how an event progresses over time.
(3) Jo was in the park.
stative = [+DUR, −DYN, −TEL]
(4) Jo ran around in the park.
activity = [+DUR, +DYN, −TEL]
(5) Jo arrived at the park.
achievement = [−DUR, +DYN, +TEL]
(6) Jo ran to the park.
accomplishment = [+DUR, +DYN, +TEL]
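As an illustrative aside, the feature bundles in (3) to (6) can be read as a lookup table from feature values to Vendler classes. The sketch below is a hypothetical Python encoding; the names VENDLER_CLASSES and vendler_class are introduced here for illustration and are not part of UDS or any released toolkit. Feature combinations that the four-way classification does not name simply map to None.

```python
from typing import Optional

# Keys are (durative, dynamic, telic) triples copied from examples (3) to (6).
# The table and function names are hypothetical, for illustration only.
VENDLER_CLASSES = {
    (True, False, False): "stative",
    (True, True, False): "activity",
    (False, True, True): "achievement",
    (True, True, True): "accomplishment",
}


def vendler_class(durative: bool, dynamic: bool, telic: bool) -> Optional[str]:
    """Return the Vendler class for a feature bundle, or None if unnamed."""
    return VENDLER_CLASSES.get((durative, dynamic, telic))
```

Only four of the eight possible feature combinations are named, which is one way to see why later work treats the three features, rather than the four classes, as the basic units.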
Work building on Vendler’s discovered that these
classes can be decomposed into the now well-
accepted component properties in (7)–(9) (Kenny,
1963; Lakoff, 1965; Verkuyl, 1972; Bennett and
Partee, 1978; Mourelatos, 1978; Dowty, 1979).
1The theoretical literature on event structure is truly vast.
See Truswell (2019) for a collection of overview articles.
(7) DUR(ATIVITY): whether the event happens at
an instant or extends over time
(8) DYN(AMICITY): whether the event involves
change, broadly construed
(9) TEL(ICITY): whether the event culminates in a
participant changing state or location, being
created or destroyed, and so forth.
Later work further expanded these properties and,
therefore, the possible classes. Expanding on DYN,
Taylor (1977) suggests a distinction between dy-
namic predicates that refer to events with dynamic
subevents (e.g., the individual strides in a running)
and ones that do not (e.g., the gliding in (10)) (see
also Bach, 1986; Smith, 2003).
(10) The pelican glided through the air.
Dynamic events with dynamic subevents can
be further distinguished based on whether the
subevents are similar (e.g., the strides in a running)
or dissimilar (e.g., the subevents in a
house-building) (Piñón, 1995). In the case where the
subevents are similar and a participant itself has
subparts (e.g., when the participant is a group),
there may be a bijection from participant subparts
to subevents. In (11), there is a smiling for each
child that makes up the composite smiling: smile
is distributive. In (12), the meeting presumably
has some structure, but there is no bijection from
members to subevents: meet is collective (see
Champollion, 2010, for a review).
(11) {The children, Jo and Bo} smiled.
(12) {The committee, Jo and Bo} met.
Expanding on TEL, Dowty (1991) argues for a
distinction among telics in which the culmination
comes about incrementally (13) or abruptly (14)
(see also Tenny, 1987; Krifka, 1989, 1992, 1998;
Levin and Hovav, 1991; Rappaport Hovav and
Levin, 1998, 2001; Croft, 2012).
(13) The gardener mowed the lawn.
(14) The climber summitted at 5pm.
This notion of incrementality is intimately tied
up with the notion of DUR(ATIVITY). For example,
Moens and Steedman (1988) point out that certain
event structures can be systematically transformed
into others: for example, whereas (14) describes
the summitting as something that happens at an
instant (and is thus abrupt), (15) describes it as a
process that culminates in having reached the top
of the mountain (see also Pustejovsky, 1995).
(15) The climber was summitting.
Such cases of aspectual coercion highlight the
importance of grammatical factors in determining
the structure of an event. More general contextual
factors are also at play when determining event
structure: I ran can describe a telic event (e.g.,
when it is known that I run the same distance
or to the same place every day) or an atelic
event (e.g., when the destination and/or distance
is irrelevant in context) (Dowty, 1979; Olsen,
1997). This context-sensitivity strongly suggests
that annotating event structure is not simply a
matter of building a type-level lexical resource
and projecting its labels onto text: Actual text
must be annotated.
Resources Early, broad-coverage lexical
resources, such as the Lexical Conceptual Structure
lexicon (LCS; Dorr, 1993), attempt to directly en-
code an elaboration of the core Vendler classes in
terms of a hand-engineered graph representation
proposed by Jackendoff (1990). VerbNet (Kipper
Schuler, 2005) further elaborates on LCS by build-
ing on the fine-grained syntax-based classification
of Levin (1993) and links her classes to LCS-like
representations. More recent versions of VerbNet
(v3.3+; Brown et al., 2018) update these repre-
sentations to ones based on the Dynamic Event
Model (Pustejovsky, 2013).
COLLIE-V, which expands the TRIPS lexicon
and ontology (Ferguson and Allen, 1998, et seq),
takes a similar tack of producing hand-engineered
event structures, combining this hand-engineering
with a procedure for bootstrapping event
structures (Allen et al., 2020). FrameNet also contains
hand-engineered event structures, though they are
significantly more fine-grained than those found
in LCS or VerbNet (Baker et al., 1998).
VerbNet, COLLIE-V, and FrameNet are not
directly annotated on text, though annotations
for at least VerbNet and FrameNet can be ob-
tained by using SemLink to project FrameNet
and VerbNet annotations onto PropBank anno-
tations (Palmer et al., 2005). PropBank frames
have been enriched in a variety of other ways.
One such enrichment can be found in Abstract
Meaning Representation (AMR; Banarescu et al.,
2013; Donatelli et al., 2018). Another can be found
in Richer Event Descriptions (RED; O’Gorman
et al., 2016), which annotates events and entities
for factuality (whether an event actually happened)
and genericity (whether an event/entity is a partic-
ular or generic) as well as annotating for causal,
temporal, sub-event, and co-reference relations
between events (see also Chklovski and Pantel,
2004; Hovy et al., 2013; Cybulska and Vossen,
2014).
Additional less fine-grained event classifica-
tions exist in TimeBank (Pustejovsky et al., 2006),
Universal Conceptual Cognitive Annotation
(UCCA; Abend and Rappoport, 2013), and the
Situation Entities dataset (SitEnt; Friedrich and
Palmer, 2014b; Friedrich et al., 2016). Of these,
the closest to capturing the standard Vendler
classification and decompositions thereof is
SitEnt. The original version of SitEnt annotates
only for a state-event distinction (alongside related,
non-event structural distinctions), but later
elaborations further annotate for telicity (Friedrich
and Gateva, 2017). Because of this close alignment
to the standard Vendler classes, we use
SitEnt annotations as part of validating our own
annotation protocol in §3.
Universal Decompositional Semantics In contrast
to the hand-engineered event structure
classifications discussed above, our aim is to derive
event structure representations directly from
semantic annotations. To do this, we extend the
existing annotations in the Universal Decompo-
sitional Semantics dataset (UDS; White et al.,
2016, 2020) with key annotations for the event
structural distinctions discussed above. Our aim
is not necessarily to reconstruct any previous
classification, though we do find in §6 that our
event type classification approximates Vendler’s
to some extent.
UDS is a semantic annotation framework and
dataset based on the principles that (i) the
semantics of words or phrases can be decomposed
into sets of simpler semantic properties
and (ii) these properties can be annotated by
asking straightforward questions intelligible to
non-experts. UDS comprises two layers of an-
notations on top of the Universal Dependencies
(UD) syntactic graphs in the English Web Treebank
(EWT): (i) predicate-argument graphs with
mappings into the syntactic graphs, derived using
the PredPatt tool (White et al., 2016; Zhang
et al., 2017); and (ii) crowd-sourced annotations
for properties of events (on the predicate nodes
of the predicate-argument graph), entities (on the
Figure 1: Example UDS semantics and syntax graphs with select properties (see Footnote 2 on the property
values). Bolded properties are ones we collect in this paper, and our new document-level graph is also shown
in purple.
argument nodes), and their relationship (on the
predicate-argument edges).
The UDS properties are organized into three
predicate subspaces with five properties in total:
• FACTUALITY (Rudinger et al., 2018)
factual: did the event happen?
• GENERICITY (Govindarajan et al., 2019)
kind: is the event generic?
hypothetical: is the event hypothetical?
dynamic: is the event dynamic or stative?
• TIME (Vashishtha et al., 2019)
duration: how long did/will the event last?
Two argument subspaces with four properties:
• GENERICITY (Govindarajan et al., 2019)
particular: is the entity a particular?
kind: is the entity a kind?
abstract: is the entity abstract or concrete?
• WORDSENSE (White et al., 2016)
Which coarse entity types (WordNet super-
sense) does the entity have?
And one predicate-argument subspace with 16
properties (see White et al., 2016, for full list):
• PROTOROLES (Reisinger et al., 2015)
instigation: did participant cause event?
change of state: did participant change state
during or as a consequence of event?
change of location: did participant change
location during event?
existed {before, during, after}: did participant
exist {before, during, after} the event?
Figure 1 shows an example UDS1.0 graph (White
et al., 2020) augmented with (i) a subset of
the properties we add in bold (see §3); and (ii)
document-level edges in purple (see §6).2
The UDS annotations and associated toolkit
have supported research in a variety of areas,
including syntactic and semantic parsing
(Stengel-Eskin et al., 2020, 2021), semantic role
labeling (Teichert et al., 2017) and induction
(White et al., 2017), event factuality prediction
(Rudinger et al., 2018), and temporal relation
extraction (Vashishtha et al., 2019), among others. For
our purposes, the annotations do cover some event
structural distinctions: for example, dynamicity,
specific cases of telicity (in the form of change
of state, change of location, and existed {before,
during, after}), and durativity. In this sense, UDS
provides an alternative, decompositional event
representation that distinguishes it from more
traditional categorical ones like SitEnt. However, the
existing annotations fail to capture a number of
the core distinctions above, a lacuna this work
aims to fill.
2Following White et al. (2020), the property values in
Figure 1 are derived from raw annotations using mixed effects
models (MEMs; Gelman and Hill, 2014), which enable one to
adjust for differences in how annotators approach a particular
annotation task (see also Gantt et al., 2020). In §6, we
similarly use MEMs in our event structure induction model,
allowing us to work directly with the raw annotations.
3 Annotation Protocol
We annotate for the core event structural distinctions
not currently covered by UDS, breaking
our annotation into three subprotocols. For all
questions, annotators report confidence in their
response to each question on a scale from 1 (not
at all confident) to 5 (totally confident).3
Event-subevent Annotators are presented with
a sentence containing a single highlighted predi-
cate followed by four questions about the internal
structure of the event it describes. Q1 asks whether
the event described by the highlighted predicate
has natural subparts. Q2 asks whether the event
has a natural endpoint.
The final questions depend on the response to
Q1. If an annotator responds that the highlighted
predicate refers to an event that has natural parts,
they are asked (i) whether the parts are similar to
one another and (ii) how long each part lasts on
average. If an annotator instead responds that the
event referred to does not have natural parts, they
are asked (i) whether the event is dynamic, and
(ii) how long the event lasts.
All questions are binary except those concern-
ing duration, for which answers are supplied as one
of twelve ordinal values (see Vashishtha et al.,
2019): effectively no time at all, fractions of a
second, seconds, minutes, hours, days, weeks,
months, years, decades, centuries, or effectively
forever. Together, these questions target the three
Vendler-inspired features (DYN, DUR, TEL), plus a
fourth dimension for subtypes of dynamic predi-
cates. In the context of UDS, these properties form
a predicate node subspace, alongside FACTUALITY,
GENERICITY, and TIME.
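To make the ordinal duration scale concrete, the following minimal sketch encodes the twelve response options as an ordered list, so that each label has a rank. The names DURATION_SCALE and duration_rank are hypothetical and chosen for illustration; the released annotations may encode these values differently.

```python
# The twelve ordinal duration labels listed above, from shortest to longest.
# The list and helper names are hypothetical, for illustration only.
DURATION_SCALE = [
    "effectively no time at all",
    "fractions of a second",
    "seconds",
    "minutes",
    "hours",
    "days",
    "weeks",
    "months",
    "years",
    "decades",
    "centuries",
    "effectively forever",
]


def duration_rank(label: str) -> int:
    """Return the 0-based ordinal rank of a duration label."""
    return DURATION_SCALE.index(label)
```

Treating the labels as ranks rather than unordered categories is what licenses the ordinal form of the agreement statistic for these questions.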
Event-event Annotators are presented with either
a single sentence or a pair of adjacent
sentences, with the two predicates of interest
highlighted in distinct colors. For a predicate pair
(p1, p2) describing an event pair (e1, e2), annota-
tors are asked whether e1 is a mereological part
of e2, and vice versa. Both questions are binary:
A positive response to both indicates that e1 and
e2 are the same event; and a positive response
to exactly one of the questions indicates proper
parthood. Prior versions of UDS do not contain
any predicate-predicate edge subspaces, so we add
3The annotation interfaces for all three subprotocols,
including instructions, are available at decomp.io.
document-level graphs to UDS (§6) to capture the
relation between adjacently described events.
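The way the two binary parthood responses combine into a single relation can be sketched as follows. This is an illustrative reading of the event-event subprotocol, not the authors' code: the function name and label strings are hypothetical, and the "disjoint" label for two negative responses follows the row labels used in Table 1.

```python
def mereological_relation(e1_part_of_e2: bool, e2_part_of_e1: bool) -> str:
    """Combine the two binary parthood answers into one relation label."""
    if e1_part_of_e2 and e2_part_of_e1:
        # Mutual parthood: the two descriptions pick out the same event.
        return "identical"
    if e1_part_of_e2:
        return "e1 is a proper part of e2"
    if e2_part_of_e1:
        return "e2 is a proper part of e1"
    # Neither event is part of the other.
    return "disjoint"
```

For instance, if (1) and the first sentence of (2) were presented as a pair, with building as e1 and laying as e2, one would expect the answers (False, True), that is, the laying is a proper part of the building.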
This subprotocol targets generalized event
coreference, identifying constituency in addition
to strict identity. It also augments the information
collected in the event-subevent protocol: insofar
as a proper subevent relation holds between e1 and
e2, we obtain additional fine-grained information
about the subevents of the containing event—for
example, an explicit description of at least one
subevent.
Event-entity The final subprotocol focuses on
the relation between the event described by a
predicate and its plural or conjoined arguments,
asking whether the predicate is distributive or
collective with respect to that argument. This
property accordingly forms a predicate-argument
subspace in UDS, similar to PROTOROLES.
4 Validation Experiments
We validate our annotation protocol (i) by assessing
interannotator agreement (IAA) among both
experts and crowd-sourced annotators for each
subprotocol on a small sample of items drawn
from existing annotated corpora (§4.1-4.2); and
(ii) by comparing annotations generated using our
protocol against existing annotations that cover
(a subset of) the phenomena that ours does and are
generated by highly trained annotators (§4.3).
4.1 Item Selection
For each of the three subprotocols, one of the
authors selected 100 sentences for inclusion in
the pilot for that subprotocol. This author did not
consult with the other authors on their selection,
so that annotation could be blind.
For the event-subevent subprotocol, the 100
sentences come from the portion of the MASC
corpus (Ide et al., 2008) that Friedrich et al.
(2016) annotate for eventivity (EVENT v. STATE)
and that Friedrich and Gateva (2017) annotate
for telicity (TELIC v. ATELIC). For the event-event
subprotocol, the 100 sentences come from the
portions of the Richer Event Descriptions cor-
pus (RED; O’Gorman et al., 2016) that are
annotated for event subpart relations. To our
knowledge, no existing annotations cover distributivity,
and so for our event-entity protocol,
we select 100 sentences (distinct from those used
for the event-subevent subprotocol) and com-
pute IAA, but do not compare against existing
annotations.
4.2 Interannotator Agreement
We compute two forms of IAA: (i) IAA among
expert annotators (the three authors); and (ii) IAA
between experts and crowd-sourced annotators.
In both cases, we use Krippendorff’s α as our
measure of (dis)agreement (Krippendorff, 2004).
For the binary responses, we use the nominal form
of α; for the ordinal responses, we use the ordinal.
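For readers unfamiliar with the statistic, the nominal form of Krippendorff's α can be computed from a coincidence matrix over the labels that annotators assign to each item. The sketch below is a from-scratch illustration of the standard nominal definition, not the authors' implementation; the function name and input layout (one list of labels per item, with missing responses simply omitted) are assumptions made here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha.

    units: list of lists; each inner list holds the labels that different
    annotators gave to one item (missing responses are omitted).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label has no pairable values
        # Every ordered pair of labels from distinct annotators counts 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable labels")

    # Observed and expected disagreement (up to a shared factor of 1/n).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected
```

An α of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate systematic disagreement. The ordinal form differs only in replacing the all-or-nothing disagreement between labels with a rank-based distance; it is not shown here.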
Expert Annotators For each subprotocol, the
three authors independently annotated the 100
sentences selected for that subprotocol.
Prior to analysis, we ridit score the confidence
ratings by annotator to normalize them for dif-
ferences in annotator scale use (see Govindarajan
et al., 2019 for discussion of ridit scoring confidence
ratings in a similar annotation protocol).
This method maps ordinal labels to (0, 1) on the
basis of the empirical CDF of each annotator’s re-
sponses—with values closer to 0 implying lower
confidence and those nearer 1 implying higher
confidence. For questions that are dynamically
revealed on the basis of the answer to the natural
parts question (i.e., part similarity, average part
duration, dynamicity, and situation duration) we
use the average of the ridit scored confidence for
natural parts and that question.
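The ridit transformation described above can be sketched in a few lines. The version below assumes the standard ridit definition, in which a rating's score is the proportion of the annotator's responses strictly below it plus half the proportion equal to it; the authors' exact implementation may differ, and the function names are hypothetical.

```python
from collections import Counter


def ridit_scores(ratings):
    """Map each distinct rating used by ONE annotator to a score in (0, 1).

    ratings: that annotator's confidence ratings (e.g., integers from 1 to 5).
    """
    counts = Counter(ratings)
    total = len(ratings)
    scores = {}
    below = 0
    for value in sorted(counts):
        # Share of responses strictly below, plus half the share that tie.
        scores[value] = (below + 0.5 * counts[value]) / total
        below += counts[value]
    return scores


def combined_confidence(natural_parts_ridit, dependent_ridit):
    """Average the two ridit scores for a dynamically revealed question."""
    return (natural_parts_ridit + dependent_ridit) / 2.0
```

Because the scores are computed separately for each annotator, an annotator who always answers 5 and one who always answers 3 both receive 0.5 throughout, which is the sense in which the method normalizes for differences in scale use.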
Figure 2 shows α when including only items
that the expert annotators rated with a particular
ridit scored confidence or higher. The agreement
for the event-event protocol (mereology) is given
in two forms: given that e1 temporally contains
e2, (i) directed: the agreement on whether e2 is a
subevent of e1; and (ii) undirected: the agreement
on whether e2 is a subevent of e1 and whether e1
is a subevent of e2.
The error bars are computed by a nonpara-
metric bootstrap over items. A threshold of 0.0
corresponds to computing α for all annotations,
regardless of confidence; a threshold of t > 0.0
corresponds to computing α only for annotations
associated with a ridit scored confidence of greater
than t. When this thresholding results in fewer than
1/3 of items having an annotation for at least two
annotators, α is not plotted. This situation occurs
only for questions that are revealed based on the
answer to a previous question.
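The thresholding procedure can be sketched as two small helpers: one that keeps only annotations whose ridit-scored confidence exceeds the threshold, and one that checks whether enough items remain pairable for α to be reported. The tuple layout and function names are assumptions for illustration, and the bootstrap over items used for the error bars is not shown.

```python
def filter_by_confidence(annotations, threshold):
    """Group responses by item, keeping those with confidence > threshold.

    annotations: iterable of (item_id, annotator_id, response, ridit_confidence).
    Ridit scores lie strictly between 0 and 1, so threshold 0.0 keeps everything.
    """
    kept = {}
    for item_id, _annotator_id, response, confidence in annotations:
        if confidence > threshold:
            kept.setdefault(item_id, []).append(response)
    return kept


def has_enough_coverage(kept, n_items):
    """True unless fewer than 1/3 of items retain two or more annotations."""
    pairable = sum(1 for responses in kept.values() if len(responses) >= 2)
    return 3 * pairable >= n_items
```

The lists returned by filter_by_confidence can be passed directly to an agreement statistic that accepts one list of labels per item.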
Figure 2: IAA among experts for each property, filtering
annotations with ridit-scored confidence ratings
below different thresholds. Confidence threshold 0.0
implies no filtering. Error bars show 95% confidence
intervals computed by a nonparametric bootstrap.
For natural parts, telicity, mereology, and
distributivity, agreement is high, even without
filtering any responses on the basis of confidence,
and that agreement improves with confidence.
For part similarity, average part duration, and
situation duration, we see more middling, but
still reasonable, agreement, though this agreement
does not reliably increase with confidence. The
fact that it does not increase may have to do with
interactions between confidence on the natural
parts question and its dependent questions that
we do not capture by taking the mean of these
two confidences.
Crowd-Sourced Annotators We recruit crowd-sourced
annotators in two stages. First, we select a
small set of items from the 100 we annotate in the
expert annotation that have high agreement among
experts to create a qualification task. Second,
based on performance in this qualification task,
we construct a pool of trusted annotators who are
allowed to participate in pilot annotations for each
of the three subprotocols.4
4During all validation stages as well as bulk annotation
(§5), we targeted an average hourly pay equivalent to that
for undergraduate research assistants performing corpus annotation
at the first and final author’s institution: $12.50 per hour.
A third-party estimate from TurkerView shows our actual average
hourly compensation when data collection was completed to be
$14.71 per hour.
Qualification For the qualification task, we
selected eight of the sentences collected from
MASC for the event-subevent subprotocol on
which expert agreement was very high and which
were diverse in the types of events described.
We then obtained event-subevent annotations for
these sentences from 400 workers on Amazon
Mechanical Turk (AMT), and selected the top
200 among them on the basis of their agreement
with expert responses on the same items. These
workers were then permitted to participate in the
pilot tasks.
Pilot We conducted one pilot for each subprotocol,
using the items described in §4.1. Sentences
were presented in lists of 10 per Human Intelligence
Task (HIT) on AMT for the event-event
and event-entity subprotocols and in lists of 5 per
HIT for event-subevent. We collected annotations
from 10 distinct workers for each sentence, and
workers were permitted to annotate up to the full
100 sentences in each pilot. Thus, all pilots were
guaranteed to include a minimum of 10 distinct
workers (all workers do all HITs), up to a maximum
of 100 for the subprotocols with 10 sentences
per HIT or 200 for the subprotocol with 5 per HIT
(each worker does one HIT). All top-200 workers
from the qualification were allowed to participate.
Figure 3 shows IAA between all pilot annotators
and experts for individual questions across the
three pilots. More specifically, it shows the distribution
of α scores by question for each annotator
when IAA is computed among the three experts
and that annotator only. Solid vertical lines show
the expert-only IAA and dashed vertical lines
show the 95% confidence interval.
Figure 3: Per-property histograms of alphas for IAA
between each crowd-sourced annotator and all experts.
Black lines show the experts-only alpha, with dashed
lines for the 95% CI (see Figure 2).
4.3 Protocol Comparison
To further validate the event-event and
event-subevent subprotocols, we evaluate how
well our pilot data predicts the corresponding
CONTAINS v. CONTAINS-SUBEVENT annotations from
RED in the former case, as well as the EVENT
v. STATE and TELIC v. ATELIC annotations from
SitEnt in the latter. In both cases, we used
the (ridit-scored) confidence-weighted average
response across annotators for a particular item
as features in a simple SVM classifier with linear
kernel. In a leave-one-out cross-validation on the
binary classification task for RED, we achieve
a micro-averaged F1 score of 0.79, exceeding
the reported human F1 agreement for both the
CONTAINS (0.640) and CONTAINS-SUBEVENT (0.258)
annotations reported by O’Gorman et al. (2016).
For SitEnt, we evaluate on a three-way
classification task for STATIVE, EVENTIVE-TELIC,
and EVENTIVE-ATELIC, achieving a micro-averaged
F1 of 0.68 using the same leave-one-out
cross-validation. As Friedrich and Palmer (2014a)
do not report interannotator agreement for this class
breakdown, we further compute Krippendorff’s
alpha from their raw annotations and again find
that agreement between our predicted annotations
and the gold ones (0.48) slightly exceeds the
interannotator agreement among humans (0.47).
These results suggest that our subprotocols capture
relevant event structural phenomena as well
as linguistically trained annotators can and that
they may serve as effective alternatives to existing
protocols while not requiring any linguistic
expertise.
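The features used in the protocol comparison (§4.3) are confidence-weighted averages of annotator responses for each item. A minimal sketch of that aggregation step is given below, assuming binary responses coded as 0 or 1 and ridit-scored confidences as weights; the function name and input layout are hypothetical, and the downstream linear-kernel SVM and leave-one-out cross-validation are not shown.

```python
def confidence_weighted_mean(responses):
    """Confidence-weighted average response for a single item.

    responses: list of (value, ridit_confidence) pairs, one per annotator,
    where value is a numeric response (e.g., 0 or 1 for a binary question).
    """
    total_weight = sum(confidence for _, confidence in responses)
    if total_weight == 0:
        raise ValueError("at least one response must have nonzero confidence")
    weighted_sum = sum(value * confidence for value, confidence in responses)
    return weighted_sum / total_weight
```

Under this scheme, a confident "yes" paired with an unconfident "no" yields a feature value close to 1, so the classifier sees graded evidence rather than a majority vote.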
Annotation               Count (%)       Example
Event-subevent
  Has natural parts      6,903 (23%)     The eighteen steps of the dance are done rhythmically
    Parts similar        4,498 (15%)     Israel resumed its policy of targeting militant leaders
    Parts dissimilar     2,158 (7%)      Fish are probably the easiest to take care of
    (Part duration)      (–)             (ordinal; not shown)
  No natural parts       23,069 (77%)    It had better nutritional value
    Dynamic              13,903 (48%)    I would like to informally get together with you
    Not dynamic          8,839 (29%)     I assume this is 12:30 Central Time?
    (Full duration)      (–)             (ordinal; not shown)
  Natural endpoint       6,031 (20%)     I will deliver it to you
  No natural endpoint    23,941 (80%)    If you know or work there could you enlighten me?
  total                  29,984          (all event descriptions)
Event-event
  P1, P2 identical       2,435 (6%)      All horses [. . . ] are happy1 & healthy2 when they arrive
  P1, P2 disjoint        30,247 (80%)    I am often stopped1 on the street and asked, ‘Who does your hair . . . I LOVE2 it’
  P1 ⊂ P2                1,832 (5%)      The office is shared with a foot doctor and it’s very sterile1 and medical feeling2, which I liked
  P2 ⊂ P1                3,029 (8%)      It is a very cruel death1 with bodies dismembered2
  total                  37,719          (pairs of event descriptions w/ temporal overlap)
Event-entity
  Distributive           4,812 (50%)     the pics turned out ok
  Collective             4,876 (50%)     we draw on our many faith traditions to arrive at a common conviction
  total                  9,710           (event descriptions with plural arguments)
Table 1: Descriptive statistics and examples from Train and Dev data. Each item was annotated
by a single annotator in Train and by three annotators in Dev; for Dev, this table reports the
majority opinion.
5 Corpus Annotation
We collect crowd-sourced annotations for the en-
tirety of UD-EWT. Predicate and argument spans
are obtained from the PredPatt predicate-argument
graphs for UD-EWT available in UDS1.0. The
total number of items annotated for each subpro-
tocol is presented in Table 1.
Event-subevent These annotations cover all
predicates headed by verbs (as identified by UD
POS tag), as well as copular constructions with
nominal and adjectival complements. In the for-
mer case, only the verb token is highlighted in the
task; in the latter, the highlighting spans from the
copula to the complement head.
Event-event Pairs for the event-event subpro-
tocol were drawn from the UDS-Time dataset,
which features pairs of verbal predicates, either
within the same sentence or in adjacent sen-
tences, each annotated with its start- and endpoint
relative to the other. We additionally included
predicate-argument pairs in cases where the
argument is annotated in UDS as having a WordNet
supersense of EVENT, STATE, or PROCESS. To
our knowledge, this represents the largest publicly
available (partial) event coreference dataset
to date.
Event-entity For the event-entity subprotocol,
we identify predicate-argument pairs in which the
argument is plural or conjoined. Plural arguments
are identified by the UD NUMBER attribute, and
conjoined ones by a conj dependency between
an argument head and another noun. We consider
only predicate-argument pairs with a UD
dependency of nsubj, nsubjpass, dobj, or
iobj.
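The selection rule for event-entity items can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `ArgHead` record, its field names, and the choice to count both NOUN and PROPN as "another noun" are assumptions made here for the example.

```python
from dataclasses import dataclass, field

# Dependency relations the text restricts attention to.
CORE_RELATIONS = {"nsubj", "nsubjpass", "dobj", "iobj"}
NOUN_TAGS = {"NOUN", "PROPN"}  # assumption: what counts as "another noun"


@dataclass
class ArgHead:
    """Hypothetical record for an argument head; not the UDS or PredPatt API."""
    form: str
    deprel: str                                    # UD relation to the predicate
    feats: dict = field(default_factory=dict)      # UD features, e.g. {"Number": "Plur"}
    conj_upos: list = field(default_factory=list)  # POS tags of its conj dependents


def is_plural(arg):
    return arg.feats.get("Number") == "Plur"


def is_conjoined(arg):
    return any(tag in NOUN_TAGS for tag in arg.conj_upos)


def select_event_entity_args(args):
    """Keep core arguments that are plural or conjoined."""
    return [a for a in args
            if a.deprel in CORE_RELATIONS and (is_plural(a) or is_conjoined(a))]


args = [
    ArgHead("pics", "nsubj", {"Number": "Plur"}),
    ArgHead("house", "dobj", {"Number": "Sing"}),
    ArgHead("cat", "nsubj", {"Number": "Sing"}, ["NOUN"]),
    ArgHead("tables", "nmod", {"Number": "Plur"}),
]
selected = [a.form for a in select_event_entity_args(args)]
```

In this toy input, the plural subject and the conjoined subject are kept, while the singular object and the plural non-core dependent are dropped.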
6 Event Structure Induction
Our goal in inducing event structural categories is
to learn representations of those categories on the
basis of annotated UDS graphs, augmented with
the new UDS-E annotations. We aim to learn four
sets of interdependent classifications grounded
in UDS properties: event types, entity types, semantic
role types, and event-event relation types.
These classifications are interdependent in that we
assume a generative model that incorporates both
sentence- and document-level structure.5
Document-level UDS Semantics edges in
UDS1.0 represent only sentence-internal semantic
relations. This constraint implies that annotations
for cross-sentential semantic relations, a significant
subset of our event-event annotations,
cannot be represented in the graph structure. To
remedy this, we extend UDS1.0 by adding document
edges that connect semantics nodes either
within a sentence or in two distinct sentences,
and we associate our event-event annotations
with their corresponding document edge (see
Figure 1). Because UDS1.0 does not have a
notion of document edge, it does not contain
Vashishtha et al.’s (2019) fine-grained temporal
relation annotations, which are highly relevant to
event-event relations. We additionally add those
attributes to their corresponding document edges.
Generative Model Algorithm 1 gives the generative
story for our event structure induction
model. We assume some number of types of events
$T_{\text{event}}$, roles $R_{\text{role}}$, entities $T_{\text{ent}}$, and relations $R_{\text{rel}}$.
Figure 4 shows the resulting factor graph for the
semantic graphs shown in Figure 1.
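The sentence-window bookkeeping in the generative story (the queues I and J of Algorithm 1) can be sketched as follows. This is only an illustration under stated assumptions: the input format (a list of sentences, each a list of event node ids) is hypothetical, and skipping the trivial pairing of a node with itself is a choice of this sketch rather than something the algorithm states.

```python
from collections import deque


def windowed_pairs(doc, window):
    """Enumerate candidate event-event pairs within a sliding sentence window.

    doc: list of sentences, each a list of event node ids (hypothetical format).
    window: maximum number of per-sentence queues kept at once (W).
    """
    outer = deque()          # queue I: one inner queue per recent sentence
    pairs = []
    for s, nodes in enumerate(doc):
        inner = []           # queue J for the current sentence
        outer.append(inner)
        if len(outer) > window:
            outer.popleft()  # forget the oldest sentence
        for v in nodes:
            inner.append((s, v))
            # flatten(I): every event node currently inside the window
            for other in [item for queue in outer for item in queue]:
                if other != (s, v):  # assumption: skip the self-pair
                    pairs.append(((s, v), other))
    return pairs


pairs = windowed_pairs([["a"], ["b"], ["c"]], window=2)
```

With a window of two sentences, each event is paired only with events in its own or the immediately preceding sentence, so "c" is never paired with "a".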
Algorithm 1: Generative story of the event structure induction model for a single document with sentence window $W$.

Initialize queue $I$;
for sentence $s \in S$ do
    Initialize queue $J$;
    Enqueue $J \to I$;
    if $\mathrm{length}(I) > W$ then
        Dequeue $I$;
    for predicate node $v \in \mathrm{predicates}(s)$ do
        Sample event type $t_{sv} \sim \mathrm{Cat}(\theta^{(\text{event})})$;
        for property $p \in P_{\text{event}}$ do
            for annotator $i \in A^{(\text{event})}_{svp}$ do
                Sample $x^{(\text{event})}_{svpi} \sim f^i_p(\mu^{(\text{event})}_{t_{sv}})$;
        Enqueue $\langle s, v \rangle \to J$;
        for argument node $v' \in \mathrm{arguments}(s, v)$ do
            Sample ent. type $t_{sv'} \sim \mathrm{Cat}(\theta^{(\text{entity})})$;
            for property $p \in P_{\text{ent}}$ do
                for annotator $i \in A^{(\text{ent})}_{sv'p}$ do
                    Sample $x^{(\text{ent})}_{sv'pi} \sim f^i_p(\mu^{(\text{ent})}_{t_{sv'}})$;
            if $v'$ is eventive then
                Enqueue $\langle s, v' \rangle \to J$;
            Sample role type $r_{svv'} \sim \mathrm{Cat}(\theta^{(\text{role})}_{t_{sv} t_{sv'}})$;
            for property $p \in P_{\text{role}}$ do
                for annotator $i \in A^{(\text{role})}_{svv'p}$ do
                    Sample $x^{(\text{role})}_{svv'pi} \sim f^i_p(\mu^{(\text{role})}_{r_{svv'}})$;
        for index pair $\langle s', v' \rangle \in \mathrm{flatten}(I)$ do
            Sample rel. type $q \sim \mathrm{Cat}(\theta^{(\text{rel})}_{t_{sv} t_{s'v'}})$;
            for property $p \in P_{\text{rel}}$ do
                for annotator $i \in A^{(\text{rel})}_{svs'v'p}$ do
                    Sample $x^{(\text{rel})}_{svs'v'pi} \sim f^i_p(\mu^{(\text{rel})}_{q})$;

Annotation Likelihoods The distribution $f^a_p$
on the annotations themselves is implemented as a
mixed model (Gelman and Hill, 2014) dependent
on property $p$ being annotated with annotator random
intercepts, where the random intercepts
for annotator $a$ are $\rho_a \sim \mathcal{N}(0, \Sigma_{\text{ann}})$ with unknown
$\Sigma_{\text{ann}}$. When $p$ receives binary annotations,
a simple logistic mixed model is assumed, where
$f^a_p = \mathrm{Bern}(\mathrm{logit}^{-1}(\mu_{i_p} + \rho_{a i_p}))$ and $i_p$ is the index
corresponding to property $p$ in the expected
annotation $\mu$. When $p$ receives nominal annotation,
$f^a_p = \mathrm{Cat}(\mathrm{softmax}(\mu_{i_p} + \rho_{a i_p}))$ and $i_p$ is
a set with cardinality of the number of nominal
categories. And when $p$ receives ordinal annotation,
we follow White et al. (2020) in using an
ordinal (linked logit) mixed effects model where
$\rho_a$ defines the cutpoints between response values
in the cumulative distribution function for annotator $a$:

$P(x_{a i_p} \leq j) = \mathrm{logit}^{-1}(\mu_{i_p} - \rho_{a i_p})$
$f^a_p(x_{a i_p} = j) = P(x_{a i_p} \leq j) - P(x_{a i_p} \leq j - 1)$
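The three likelihood families described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: function names are hypothetical, scalar inputs stand in for the indexed entries of μ and ρ, and the ordinal case simply follows the sign convention of the text, under which the cutpoints must decrease with the response value for the cumulative probabilities to be nondecreasing.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def bernoulli_loglik(x, mu, rho):
    """Binary property: Bern(logit^-1(mu + rho))."""
    p = inv_logit(mu + rho)
    return math.log(p if x == 1 else 1.0 - p)


def categorical_loglik(x, mus, rhos):
    """Nominal property: Cat(softmax(mu + rho)), one entry per category."""
    probs = softmax([m + r for m, r in zip(mus, rhos)])
    return math.log(probs[x])


def ordinal_probs(mu, cutpoints):
    """Ordinal property: P(x <= j) = logit^-1(mu - cutpoint_j).

    The top category gets cumulative probability 1; category probabilities
    are differences of adjacent cumulative probabilities.
    """
    cdf = [inv_logit(mu - c) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


probs = ordinal_probs(0.0, [1.0, 0.0, -1.0])  # four response values
```

An annotator with a positive intercept `rho` is modeled as more likely to answer "yes" to a binary property than the expected annotation `mu` alone would suggest.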
Conditional Properties For both our dataset
and UDS-Protoroles, certain annotations are conditioned
on others, owing to the fact that whether
some questions are asked at all depends upon
annotator responses to previous ones. Following
White et al. (2017), we model the likelihoods
for these properties using hurdle models (Agresti,
2014): For a given property, a Bernoulli distribution
determines whether the property applies; if
it does, the property value is determined using a
second distribution of the appropriate type.
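The two-stage structure of a hurdle likelihood can be sketched as below. The function and parameter names are hypothetical, and the second-stage likelihood is passed in as a callable so that any of the binary, nominal, or ordinal families could be plugged in.

```python
import math


def hurdle_loglik(applies, value, p_applies, value_loglik):
    """Log-likelihood of one conditional annotation under a hurdle model.

    applies: whether the annotator indicated that the property applies.
    value: the annotated value (ignored when the property does not apply).
    p_applies: Bernoulli probability that the property applies.
    value_loglik: callable giving the log-likelihood of `value` under the
        second-stage distribution (hypothetical interface).
    """
    if not applies:
        return math.log(1.0 - p_applies)
    return math.log(p_applies) + value_loglik(value)


# Toy second stage: a fair coin over two values.
ll_skip = hurdle_loglik(False, None, 0.25, lambda v: math.log(0.5))
ll_answer = hurdle_loglik(True, 1, 0.25, lambda v: math.log(0.5))
```

Only annotations that clear the hurdle contribute a second-stage term, which matches the fact that the follow-up question is never asked otherwise.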
5 See Ferraro and Van Durme (2016) for a related model
that uses FrameNet’s ontology, rather than inducing its own.
Temporal Relations Temporal relations annotations
from UDS-Time consist of 4-tuples
$(\overleftarrow{e_1}, \overrightarrow{e_1}, \overleftarrow{e_2}, \overrightarrow{e_2})$ of real values on the unit interval,
representing start- and endpoints of two
event-referring predicates or arguments, $e_1$ and
$e_2$. Each tuple is normalized such that the earlier
of $(\overleftarrow{e_1}, \overleftarrow{e_2})$ is always locked to the left end
of the scale (0) and the later of $(\overrightarrow{e_1}, \overrightarrow{e_2})$ to the
right end (1). The likelihood for these annotations
must consider the different possible orderings of
the two events. To do so, we first determine
whether $\overleftarrow{e_1}$ is locked, $\overleftarrow{e_2}$ is, or both are, according
to $\mathrm{Cat}(\mathrm{softmax}(\mu_{\text{lock}\leftarrow} + \rho_{a i_{\text{lock}\leftarrow}}))$. We do likewise
for $\overrightarrow{e_1}$ and $\overrightarrow{e_2}$, using a separate distribution
$\mathrm{Cat}(\mathrm{softmax}(\mu_{\text{lock}\rightarrow} + \rho_{a i_{\text{lock}\rightarrow}}))$. Finally, if the
start point from one event and the endpoint from
the other are free (i.e., not locked), we determine
their relative ordering using a third distribution
$\mathrm{Cat}(\mathrm{softmax}(\mu_{\text{lock}\leftrightarrow} + \rho_{a i_{\text{lock}\leftrightarrow}}))$.

Figure 4: The factor graph for the pair of sentences shown in Figure 1 based on the generative story given in
Algorithm 1. Each node or edge annotated in the semantics graphs becomes a variable node in the factor graph,
as indicated by the dotted lines. Only factors for the prior distributions over types are shown; the annotation
likelihood factors associated with each variable node are omitted for space.
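The locking described for the temporal tuples can be illustrated with a short sketch. It assumes the normalization is a linear (min-max) rescaling of raw start and end values, which the text does not spell out; the function names are hypothetical.

```python
def normalize(start1, end1, start2, end2):
    """Rescale so the earlier start is 0 and the later end is 1.

    Assumption: a linear min-max rescaling of raw slider values.
    """
    lo = min(start1, start2)
    hi = max(end1, end2)
    span = hi - lo
    if span <= 0:
        raise ValueError("events must have positive joint extent")
    return tuple((x - lo) / span for x in (start1, end1, start2, end2))


def locks(tup):
    """Report which start point sits at 0 and which endpoint sits at 1."""
    s1, e1, s2, e2 = tup
    left = "both" if s1 == 0 and s2 == 0 else ("e1" if s1 == 0 else "e2")
    right = "both" if e1 == 1 and e2 == 1 else ("e1" if e1 == 1 else "e2")
    return left, right


t = normalize(10, 20, 15, 30)
left_lock, right_lock = locks(t)
```

Here the start of e1 and the end of e2 are locked, so the free points are the end of e1 and the start of e2; their relative order is what the third categorical distribution in the text would model.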
Implementation We fit our model to the training
data using expectation-maximization. We use
loopy belief propagation to obtain the posteriors
over event, entity, role, and relation types in
the expectation step and the Adam optimizer to
estimate the parameters of the distributions associated
with each type in the maximization step.6
As a stopping criterion, we compute the evidence
that the model assigns to the development data,
stopping when this quantity begins to decrease.

To make use of the (ridit-scored) confidence
response $c_{a i_p} \in (0, 1)$ associated with each annotation
$x_{a i_p}$, we weight the log-likelihood of $x_{a i_p}$
by $c_{a i_p}$ when computing the evidence of the annotations.
This weighting encourages the model to
explain annotations that an annotator was highly
confident in, penalizing the model less if it assigns
low likelihood to a low confidence annotation.

6 The variable and factor nodes for the relation types can
introduce cycles into the factor graph for a document, which
is what necessitates the loopy variant of belief propagation.
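The confidence weighting can be sketched as follows. The ridit score used here is the standard definition (the share of ratings below a level plus half the share at that level); applying it to the pooled ratings of a single annotator is an assumption of this sketch, as are the function names.

```python
from collections import Counter


def ridit_scores(ratings):
    """Map each ordinal confidence level to its ridit score in (0, 1)."""
    n = len(ratings)
    counts = Counter(ratings)
    scores, below = {}, 0
    for level in sorted(counts):
        scores[level] = (below + 0.5 * counts[level]) / n
        below += counts[level]
    return scores


def weighted_loglik(logliks, confidences):
    """Sum of per-annotation log-likelihoods, each scaled by its confidence."""
    return sum(c * ll for ll, c in zip(logliks, confidences))


scores = ridit_scores([0, 0, 1, 2])            # one annotator's raw ratings
total = weighted_loglik([-1.0, -2.0], [0.5, 0.25])
```

A poorly explained annotation (log-likelihood of -2.0) costs the model only as much as a better explained one when its confidence weight is half as large.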
To select $|T_{\text{event}}|$, $|T_{\text{ent}}|$, $|R_{\text{role}}|$, and $|R_{\text{rel}}|$ for
Algorithm 1, we fit separate mixture models
for each classification (i.e., removing all factor
nodes) using the same likelihood functions $f^i_p$ as
in Algorithm 1. We then compute the evidence
that the simplified model assigns to the development
data given some number of types, choosing
the smallest number such that there is no reliable
increase in the evidence for any larger number.
To determine reliability, we compute 95% confidence
intervals using nonparametric bootstraps.
Importantly, this simplified model is only used
to select $|T_{\text{event}}|$, $|T_{\text{ent}}|$, $|R_{\text{role}}|$, and $|R_{\text{rel}}|$: all
analyses below are conducted on full model fits.
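One way to operationalize this selection rule is sketched below. It assumes that per-item development log-evidence is available for each candidate number of types and that reliability is judged with a paired percentile bootstrap over items; both are assumptions of this illustration, and all names are hypothetical.

```python
import random


def bootstrap_ci(diffs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-item differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def select_num_types(evidence_by_k):
    """Smallest k such that no larger k gives a reliable evidence increase.

    evidence_by_k: dict mapping a number of types to a list of per-item
    development log-evidence values (same item order for every k).
    """
    ks = sorted(evidence_by_k)
    for k in ks:
        reliable_gain = False
        for k2 in ks:
            if k2 <= k:
                continue
            diffs = [b - a for a, b in zip(evidence_by_k[k], evidence_by_k[k2])]
            lo, _ = bootstrap_ci(diffs)
            if lo > 0:        # the whole 95% interval lies above zero
                reliable_gain = True
                break
        if not reliable_gain:
            return k
    return ks[-1]


toy = {1: [-2.0] * 20, 2: [-1.0] * 20, 3: [-1.0] * 20}
chosen = select_num_types(toy)
```

In the toy data, moving from one to two types improves every item, while a third type adds nothing, so two types are selected.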
Types The selection procedure described above
yields $|T_{\text{event}}| = 4$, $|T_{\text{ent}}| = 8$, $|R_{\text{role}}| = 2$, and
$|R_{\text{rel}}| = 5$. To interpret these classes, we inspect
the property means $\mu_t$ associated with each type $t$
and give examples from UD-EWT for which the
posterior probability of that type is high.
Event Types While our goal was not necessarily
to reconstruct any particular classification
from the theoretical literature, the four event types
align fairly well with those proposed by Vendler
(1957): statives (16), activities (17), achievements
(18), and accomplishments (19). We label our
clusters based on these interpretations (Figure 5).
(16) I have finally found a mechanic I trust!!
(17) his agency is still reviewing the decision.
(18) A suit against [. . . ] Kristof was dismissed.
(19) a consortium [. . . ] established in 1997

Figure 5: Probability of binary properties from the
event-subevent protocol by event type. Cells marked
with ‘‘N/A’’ indicate that the property generally does
not apply for the corresponding type because of the
conditional dependence on natural parts.
One difference between Vendler’s classes and our
own is that our ‘‘activities’’ correspond primarily
to those without dynamic subevents, while our
‘‘accomplishments’’ encompass both his accom-
plishments and activities with dynamic subevents
(see discussion of Taylor, 1977 in §2).
Even if approximate, this alignment is surprising
given that Vendler’s classification was not
developed with actual language use in mind and
thus abstracts away from complexities that arise
when dealing with, for example, non-factual or
generic events. Nevertheless, there do arise cases
where a particular predicate has a wider distribution
across types than we might expect based
on prior work. For example, know is prototypically
stative; and while it does get classed that
way by our model, it also gets classed as an accomplishment
or achievement (though rarely an
activity), for example when it is used to talk
about coming to know something, as in (20).
(20) Please let me know how[. . . ]to proceed.
Entity Types Our entity types are: person/group
(21), concrete artifact (22), contentful
artifact (23), particular state/event (24), generic
state/event (25), time (26), kind of concrete objects
(27), and particular concrete objects (28).
(21) Have a real mechanic check[…]
(22) I have a [. . . ] cockatiel, and there are 2 eggs
in the bottom of the cage[. . . ]
(23) Please find attached a credit worksheet[. . . ]
(24) He didn’t take a dislike to the kids[…]
(25) They require a lot of attention [. . . ]
(26) Every move Google makes brings this
particular future closer.
(27) And what is their big / main meal of the day.
(28) Find him before he finds the dog food.

Figure 6: Probability of role type properties. These
include existing UDS protoroles properties, along with
the distributive property from the event-entity subprotocol.
We have labeled the role types with our
proto-agent/proto-patient interpretation given below.
Role Types The optimality of two role
types is consistent with Dowty’s (1991) proposal
that there are only two abstract role
prototypes, proto-agent and proto-patient, into
which individual thematic roles (i.e., those specific
to particular predicates) cluster. Further, the
means for the two role types we find very closely
track those predicted by Dowty (see Figure 6),
with clear proto-agents (29) and proto-patients
(30) (see also White et al., 2017).
(29) they don’t press their sandwiches.
(30) you don’t ever feel like you ate too much.
Relation Types The relation types we obtain
track closely with approaches that use sets of
underspecified temporal relations (Cassidy et al.,
2014; O’Gorman et al., 2016; Zhou et al., 2019,
2020; Wang et al., 2020): e1 starts before e2 (31),
e2 starts before e1 (32), e2 ends after e1 (33), e1
contains e2 (34), and e1 = e2 (35).
(31) [. . . ]the Spanish, Thai and other contingents
are already committed to leaving [. . . ]
(32) And I have to wonder: Did he forget that he
already has a memoir[. . . ]
(33) No, i am not kidding and no i don’t want it
b/c of the taco bell dog. i want it b/c it is
really small and cute.
(34) they offer cheap air tickets to their country
[. . . ] you may get excellent discount airfare,
which may even surprise you.
(35) the food is good, however the tables are so
close together that it feels very cramped.
Type Consistency To assess the impacts of
sentence/document-level structure (Algorithm 1)
and confidence weighting on the types we induce,
we investigate how the distributions over role and
event types shift when comparing models fit with
and without structure and confidence weighting.
Figure 7 visualizes these shifts as row-normalized
confusion matrices computed from the posteriors
across items derived from each model. It compares
(top) the simplified model used for model selection
(rows) against the full model without confidence
weighting (columns), and (bottom) the full model
without confidence weighting (rows) against the
one with (columns).7
First, we find that the interaction between types
afforded by incorporating the full graph structure
(top plot) produces small shifts in both the
event and role type distributions, suggesting that
the added structure may help chiefly in resolving
boundary cases, which is exactly what we
might hope additional model structure would do.
Second, weighting likelihoods by annotator confidence
(bottom) yields somewhat larger shifts
as well as more entropic posteriors (0.22 average
normalized entropy for events; 0.30 for roles) than
without weighting (0.02 for events; 0.22 for roles).
Higher entropy is expected (and to some extent,
desirable) here: Introducing a notion of confidence
should make the model less confident about items
that annotators were less confident about. Further,
among event types, the distribution of posterior
entropy across items is driven by a minority of
high uncertainty items, as evidenced by a very low
median normalized entropy for event types (0.02).
The opposite appears to be true among the role
types, for which the median is high (0.60). This
latter pattern is perhaps not surprising in light
of theoretical accounts of semantic roles, such
as Dowty’s: The entire point of such accounts
is that it is very difficult to determine sharp role
categories, suggesting the need for a more continuous
notion.

7 The distributional shifts for entity and relation types were
extremely small, and so we do not discuss them here.

Figure 7: Confusion matrices for event and role types.
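The entropy statistics reported above can be computed along the following lines. This sketch assumes that "normalized entropy" means Shannon entropy divided by the log of the number of types, so that values lie in [0, 1]; the text does not state the normalizer, and the toy posteriors are invented for illustration.

```python
import math
import statistics


def normalized_entropy(probs):
    """Shannon entropy of a posterior, divided by log(number of types)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


# Hypothetical posteriors over four event types for three items.
posteriors = [
    [1.0, 0.0, 0.0, 0.0],      # fully confident
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    [1.0, 0.0, 0.0, 0.0],
]
entropies = [normalized_entropy(p) for p in posteriors]
mean_entropy = statistics.mean(entropies)
median_entropy = statistics.median(entropies)
```

The toy data show the pattern described for event types: a single highly uncertain item raises the mean while the median stays at zero.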
7 Comparison to Existing Ontologies
To explore the relationship between our induced
classification and existing event and role ontolo-
gies, we ask how well our event, role, and entity
types map onto those found in PropBank and
VerbNet. Importantly, the goal here is not perfect
alignment between our types and PropBank and
VerbNet types, but rather to compare other classifications
that reflect top–down assumptions to the
one we derive bottom–up.
Implementation To carry out these comparisons,
we use the parameters of the posterior
distributions over event types $\theta^{(\text{ev})}_{p}$ for each predicate
$p$, over role types $\theta^{(\text{role})}_{pa}$ for each argument $a$
of each predicate $p$, and over entity types $\theta^{(\text{ent})}_{pa}$ for
each argument $a$ of each predicate $p$ as features in
an SVM with RBF kernel predicting the event and
role types found in PropBank and VerbNet. We
take this route, over direct comparison of types,
to account for the possibility that information encoded
in role or event types within VerbNet or
PropBank is distributed differently in our more
abstract classification. We tune L2 regularization
(λ ∈ {1, 0.5, 0.2, 0.1, 0.01, 0.001}) and bandwidth
(γ ∈ {1e-2, 1e-3, 1e-4, 1e-5}) using grid search,
selecting the best model based on performance
on the standard UD-EWT development set. All
metrics reflect UD-EWT test set performance.
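The hyperparameter search can be pictured with the sketch below. It enumerates the grid stated above and picks the pair with the best development score; the `dev_score` callable is a hypothetical stand-in for fitting an SVM and scoring it on the development set, and the kernel is written out in its common form exp(-γ‖x − y‖²), which is assumed to be the parameterization meant by "bandwidth".

```python
import itertools
import math

LAMBDAS = [1, 0.5, 0.2, 0.1, 0.01, 0.001]   # L2 regularization strengths
GAMMAS = [1e-2, 1e-3, 1e-4, 1e-5]           # RBF bandwidths


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def grid_search(dev_score):
    """Return the (lambda, gamma) pair with the highest development score.

    dev_score: callable (lam, gamma) -> float; hypothetical stand-in for
    training a model with those settings and evaluating it on Dev.
    """
    return max(itertools.product(LAMBDAS, GAMMAS),
               key=lambda pair: dev_score(*pair))


# Toy score that peaks at lambda = 0.1 and gamma = 1e-3.
best = grid_search(lambda lam, gam: -abs(lam - 0.1) - abs(gam - 1e-3))
```

The grid has 24 settings in total, so exhaustive search is cheap relative to model fitting.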
Label set   Role      P      R      F      Micro F
argnum      A0        0.58   0.63   0.60   0.67
            A1        0.72   0.78   0.75
functag     pag       0.57   0.59   0.58   0.62
            ppt       0.65   0.77   0.71
verbnet     agent     0.64   0.54   0.59   NA
            patient   0.20   0.14   0.16
            theme     0.55   0.58   0.57

Table 2: Test set results for all role types that are
labeled on at least 5% of the development data.
Role Type Comparison We first obtain a mapping
from UDS predicates and arguments to the
PropBank predicates and arguments annotated in
EWT. Each such argument in PropBank is annotated
with an argument number (A0-A4) as well
as a function tag (PAG = agent, PPT = patient,
etc.). We then compose this mapping with the
mapping given in the PropBank frame files from
PropBank rolesets to sets of VerbNet classes and
from PropBank roles to sets of VerbNet roles
(AGENT, PATIENT, THEME, etc.) to obtain a mapping
from UDS arguments to sets of VerbNet roles.
Because a particular argument maps to a set of
VerbNet roles, we treat predicting VerbNet roles
as a multi-label problem, fitting one SVM per
role. For each argument $a$ of predicate $p$, we use
as predictors $[\theta^{(\text{ev})}_{p}; \theta^{(\text{role})}_{pa}; \theta^{(\text{ent})}_{pa}; \theta^{(\text{role})}_{p\neg a}; \theta^{(\text{ent})}_{p\neg a}]$, with
$\theta^{(\text{role/ent})}_{p\neg a j} = [\max_{a' \neq a} \theta^{(\text{role/ent})}_{pa'j}, \mathrm{mean}_{a' \neq a} \theta^{(\text{role/ent})}_{pa'j}]$.
Table 2 gives the test set results for all role types
labeled on at least 5% of the development data. For
comparison, a majority guessing baseline obtains
micro F1s of 0.58 (argnum) and 0.53 (functag).8
Our roles tend to align well with agentive roles
(PAG, AGENT, and A0) and some non-agentive roles
(PPT, THEME, and A1), but they align less well with
other non-agentive roles (PATIENT). This result suggests
that our two-role classification aligns fairly
closely with the agentivity distinctions in PropBank
and VerbNet, as we would expect if our
roles in fact captured something like Dowty’s
coarse distinction among prototypical agents
and patients.
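The predictor vector for one argument can be assembled as in the sketch below. The dictionary-based input format and function names are hypothetical, the toy posteriors have two types each purely for brevity, and returning zeros when a predicate has no other arguments is an assumption, since the text does not say how that case is handled.

```python
def pool(vectors, dim):
    """Elementwise max and mean over a list of posterior vectors."""
    if not vectors:
        return [0.0] * (2 * dim)  # assumption: zeros when no other arguments
    maxes = [max(v[j] for v in vectors) for j in range(dim)]
    means = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    return maxes + means


def role_features(theta_ev, theta_role, theta_ent, arg):
    """Concatenate event, own-argument, and pooled other-argument posteriors.

    theta_role / theta_ent: dicts from argument id to posterior vector
    (hypothetical format).
    """
    others = [a for a in theta_role if a != arg]
    return (theta_ev + theta_role[arg] + theta_ent[arg]
            + pool([theta_role[a] for a in others], len(theta_role[arg]))
            + pool([theta_ent[a] for a in others], len(theta_ent[arg])))


theta_ev = [0.5, 0.5]
theta_role = {"a": [1.0, 0.0], "b": [0.25, 0.75], "c": [0.5, 0.5]}
theta_ent = {"a": [0.5, 0.5], "b": [1.0, 0.0], "c": [0.0, 1.0]}
feats = role_features(theta_ev, theta_role, theta_ent, "a")
```

Pooling with max and mean gives a fixed-length summary of the remaining arguments regardless of how many there are.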
8 A majority baseline for VerbNet roles always yields an
F1 of 0 in our multi-label setup, since no role is assigned to
more than half of arguments.

Event Type Comparison The PropBank roleset
and VerbNet class ontologies are extremely
fine-grained, with PropBank capturing specific
predicate senses and VerbNet capturing very
fine-grained syntactic behavior of a generally
small set of predicates. Since our event types
are intended to be more general than either, we
do not compare them directly to PropBank rolesets or
VerbNet classes.
Rather, we compare to the generative
lexicon-inspired variant of VerbNet’s semantics
layer (Brown et al., 2018). An example of
this layer for the predicate give-13.1 is
has_possession(e1, Ag, Th) & transfer(e2, Ag,
Th, Rec) & cause(e2, e3) & has_possession(e3,
Rec, Th). We predict only the abstract predicates
in this decomposition (e.g., transfer or cause),
treating the problem as multi-label and fitting
one SVM per predicate. For each predicate $p$,
we use as predictors $[\theta^{(\text{ev})}_{p}; \theta^{(\text{role})}_{p\cdot}; \theta^{(\text{ent})}_{p\cdot}]$, with
$\theta^{(\text{role/ent})}_{p\cdot j} = [\max_{a} \theta^{(\text{role/ent})}_{paj}, \mathrm{mean}_{a} \theta^{(\text{role/ent})}_{paj}]$.

Predicate        P      R      F
cause            0.51   0.95   0.66
do               0.30   0.25   0.27
has_possession   0.23   0.18   0.20
has_location     0.11   0.14   0.12
motion           0.09   0.10   0.09

Table 3: Test set results for the five most frequent
VerbNet predicates.

Table 3 gives the test set results for the five
most frequent predicates in the corpus. For comparison,
a majority guessing baseline would yield
the same F (0.66) as our model for CAUSE, but
since none of the other classes are assigned to
more than half of events, majority guessing for
those would yield an F of 0. This result suggests
that, while there may be some agreement between
our classification and VerbNet’s semantics layer,
the two representations are relatively distinct.
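The behavior of the majority guessing baseline in a one-classifier-per-label setup can be checked with a small sketch. The gold label vectors here are invented toy data, and the function names are hypothetical.

```python
def f1(gold, pred):
    """Binary F1 from parallel lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def majority_baseline_f1(gold):
    """Predict the majority value of this label for every item."""
    majority = 1 if sum(gold) > len(gold) / 2 else 0
    return f1(gold, [majority] * len(gold))


frequent = majority_baseline_f1([1, 1, 1, 0])  # label on most items
rare = majority_baseline_f1([1, 0, 0, 0])      # label on a minority of items
```

A label assigned to fewer than half of the items is never predicted by the baseline, so its recall and F are zero, whereas a label assigned to most items gets perfect recall and precision equal to its prevalence.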
8 Conclusion
We have presented an event structure classifica-
tion derived from inferential properties annotated
on sentence- and document-level semantic graphs.
We induced this classification jointly with seman-
tic role, entity, and event-event relation types
using a document-level generative model. Our
model identifies types that approximate theoretical
predictions: notably, four event types like
Vendler’s, as well as proto-agent and proto-patient
role types like Dowty’s. We hope this work
encourages greater interest in computational approaches
to event structural understanding while
also supporting work on adjacent problems in
NLU, such as temporal information extraction and
(partial) event coreference, for which we provide
the largest publicly available dataset to date.
Acknowledgments
We would like to thank Emily Bender, Dan Gildea,
and three anonymous reviewers for detailed com-
ments on this paper. We would also like to thank
members of the Formal and Computational Se-
mantics lab at the University of Rochester for
feedback on the annotation protocols. This work
was supported in part by the National Science
Foundation (BCS-2040820/2040831, Collaborative
Research: Computational Modeling of the
Internal Structure of Events) as well as by DARPA
AIDA and DARPA KAIROS. The views and con-
clusions contained in this work are those of the
authors and should not be interpreted as nec-
essarily representing the official policies, either
expressed or implied, or endorsements of DARPA
or the U.S. Government. The U.S. Government
is authorized to reproduce and distribute reprints
for governmental purposes notwithstanding any
copyright annotation therein.
References
Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 228–238, Sofia, Bulgaria. Association for Computational Linguistics.
Alan Agresti. 2014. Categorical Data Analysis, John Wiley & Sons.
James Allen, Hannah An, Ritwik Bose, Will de Beaumont, and Choh Man Teng. 2020. A broad-coverage deep semantic lexicon for verbs. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 3243–3251, Marseille, France. European Language Resources Association.
Emmon Bach. 1986. The algebra of events. Linguistics and Philosophy, 9(1):5–16.
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the 17th International Conference on Computational Linguistics, volume 1, pages 86–90. Association for Computational Linguistics.
Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria. Association for Computational Linguistics.
Michael Ruisdael Bennett and Barbara Hall Partee. 1978. Towards the Logic of Tense and Aspect in English. Indiana University Linguistics Club, Bloomington, IN.
Ann Bies, Justin Mott, Colin Warner, and Seth Kulick. 2012. English web treebank. Linguistic Data Consortium, Philadelphia, PA.
Susan Windisch Brown, James Pustejovsky, Annie Zaenen, and Martha Palmer. 2018. Integrating Generative Lexicon event structures into VerbNet. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard. 2014. An annotation framework for dense event ordering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 501–506, Baltimore, Maryland. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-2082
Lucas Champollion. 2010. Parts of a Whole: Distributivity as a Bridge between Aspect and Measurement. Ph.D. thesis, University of Pennsylvania, Philadelphia.
Timothy Chklovski and Patrick Pantel. 2004. Verbocean: Mining the web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 33–40.
William Croft. 2012. Verbs: Aspect and causal structure. Oxford University Press.
Agata Cybulska and Piek Vossen. 2014. Us-
ing a sledgehammer to crack a nut? Lexical
diversity and event coreference resolution.
In Proceedings of
the Ninth International
Conference on Language Resources and Evalu-
ation (LREC’14), pages 4545–4552, Reykjavik,
Iceland. European Language Resources Asso-
ciation (ELRA).
Lucia Donatelli, Michael Regan, William Croft,
and Nathan Schneider. 2018. Annotation of
tense and aspect semantics for sentential AMR.
In Proceedings of
the Joint Workshop on
Linguistic Annotation, Multiword Expressions
and Constructions
(LAW-MWE-CxG-2018),
pages 96–108.
Bonnie Jean Dorr. 1993. Machine Translation: A
View from the Lexicon. MIT Press.
David Dowty. 1979. Word Meaning and Mon-
tague Grammar: The Semantics of Verbs and
Times in Generative Semantics and in Mon-
tague’s PTQ, volume 7, Springer Science &
Business Media.
David Dowty. 1991. Thematic proto-roles and
argument selection. Language, 67(3):547–619.
https://doi.org/10.2307/415037,
https://doi.org/10.1353/lan.1991
.0021
George Ferguson and James F. Allen.
1998. TRIPS: An integrated intelligent
problem-solving assistant. In Proceedings of
the Fifteenth National/Tenth Conference on
Artificial Intelligence/Innovative Applications
of Artificial Intelligence, AAAI ’98/IAAI ’98,
pages 567–572, USA. American Association
for Artificial Intelligence.
Francis Ferraro and Benjamin Van Durme.
2016. A unified Bayesian model of scripts,
frames and language. In Proceedings of the
AAAI Conference on Artificial Intelligence,
volume 30.
Annemarie Friedrich and Damyana Gateva. 2017.
Classification of telicity using cross-linguistic
annotation projection. In Proceedings of the
2017 Conference on Empirical Methods in Nat-
ural Language Processing, pages 2559–2565,
Copenhagen, Denmark. Association for Com-
putational Linguistics. https://doi.org
/10.18653/v1/D17-1271
Annemarie Friedrich and Alexis Palmer. 2014a.
Automatic prediction of aspectual class of
verbs in context. In Proceedings of the 52nd
Annual Meeting of the Association for Computational
Linguistics (Volume 2: Short Papers),
pages 517–523, Baltimore, Maryland. Association
for Computational Linguistics.
https://doi.org/10.3115/v1/P14-2085
Annemarie Friedrich and Alexis Palmer. 2014b.
Situation entity annotation. In Proceedings of
LAW VIII – The 8th Linguistic Annotation
Workshop, pages 149–158, Dublin, Ireland.
Association for Computational Linguistics and
Dublin City University. https://doi.org
/10.3115/v1/W14-4921
Annemarie Friedrich, Alexis Palmer, and Manfred
Pinkal. 2016. Situation entity types: Automatic
classification of clause-level aspect. In
Proceedings of the 54th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1757–1768,
Berlin, Germany. Association for Computational
Linguistics. https://doi.org/10.18653/v1/P16-1166
William Gantt, Benjamin Kane, and Aaron Steven
White. 2020. Natural language inference with
mixed effects. In Proceedings of the Ninth
Joint Conference on Lexical and Computational
Semantics, pages 81–87, Barcelona,
Spain (Online). Association for Computational
Linguistics.
Andrew Gelman and Jennifer Hill. 2014.
Data Analysis Using Regression and Multi-
level/Hierarchical Models. Cambridge Univer-
sity Press, New York City.
Venkata Govindarajan, Benjamin Van Durme,
and Aaron Steven White. 2019. Decomposing
generalization: Models of generic, habitual,
and episodic statements. Transactions of the
Association for Computational Linguistics,
7:501–517. https://doi.org/10.1162/tacl_a_00285
Eduard Hovy, Teruko Mitamura, Felisa Verdejo,
Jun Araki, and Andrew Philpot. 2013. Events
are not simple: Identity, non-identity, and
quasi-identity. In Workshop on Events: Definition,
Detection, Coreference, and Representation,
pages 21–28.
Nancy Ide, Collin Baker, Christiane Fellbaum, Charles
Fillmore, and Rebecca Jane Passonneau. 2008.
MASC: The manually annotated sub-corpus
of American English. In 6th International
Conference on Language Resources and Evaluation,
LREC 2008, pages 2455–2460. European
Language Resources Association (ELRA).
Ray Jackendoff. 1990. Semantic Structures,
volume 18. MIT Press.
Anthony Kenny. 1963. Action, Emotion and Will,
Humanities Press, London.
Karin Kipper Schuler. 2005. VerbNet: A
Broad-coverage, Comprehensive Verb Lexicon.
Ph.D. thesis, University of Pennsylvania.
Manfred Krifka. 1989. Nominal reference, temporal
constitution and quantification in event semantics.
In Renate Bartsch, Johan van Benthem,
and Peter van Emde Boas, editors, Semantics
and Contextual Expressions, pages 75–115.
Foris, Dordrecht.
https://doi.org/10.1515/9783110877335-005
Manfred Krifka. 1992. Thematic relations as
links between nominal reference and temporal
constitution. Lexical Matters, 2953.
Manfred Krifka. 1998. The origins of
telic-
ville. In Susan Rothstein, editor, Events and
Grammar, Studies in Linguistics and Philos-
ophy, pages 197–235. Springer Netherlands,
Dordrecht. https://doi.org/10.1007
/978-94-011-3969-4_9
K. Krippendorff. 2004. Content Analysis: An
Introduction to Its Methodology. Sage.
George Lakoff. 1965. On the Nature of
Syntactic Irregularity. Ph.D. thesis, Massachusetts
Institute of Technology.
Beth Levin. 1993. English Verb Classes and
Alternations: A Preliminary Investigation.
University of Chicago Press, Chicago.
Beth Levin and Malka Rappaport Hovav. 1991.
Wiping the slate clean: A lexical seman-
tic exploration. Cognition, 41(1-3):123–151.
https://doi.org/10.1016/0010-0277(91)90034-2
Marc Moens and Mark Steedman. 1988. Temporal
ontology and temporal reference. Computa-
tional Linguistics, 14(2):15–28.
Joakim Nivre, Marie-Catherine de Marneffe, Filip
Ginter, Yoav Goldberg, Jan Hajič, Christopher
D. Manning, Ryan McDonald, Slav Petrov,
Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty,
and Daniel Zeman. 2016. Universal
Dependencies v1: A multilingual treebank
collection. In Proceedings of the Tenth International
Conference on Language Resources and
Evaluation (LREC’16), pages 1659–1666, Portorož,
Slovenia. European Language Resources
Association (ELRA).
Tim O’Gorman, Kristin Wright-Bettner, and
Martha Palmer. 2016. Richer event description:
Integrating event coreference with temporal,
causal and bridging annotation. In
Proceedings of the 2nd Workshop on Computing
News Storylines (CNS 2016), pages 47–56,
Austin, Texas. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/W16-5706
Mari Broman Olsen. 1997. A Semantic and
Pragmatic Model of Lexical and Grammatical
Aspect. Outstanding Dissertations in
Linguistics. Garland.
https://doi.org/10.1162/0891201053630264
Martha Palmer, Daniel Gildea, and Paul Kingsbury.
2005. The proposition bank: An annotated
corpus of semantic roles. Computational Lin-
guistics, 31(1):71–106.
Christopher Piñón. 1995. An Ontology for Event
Semantics. Ph.D. thesis, Stanford University,
Palo Alto.
James Pustejovsky. 1995. The Generative
Lexicon. MIT Press, Cambridge, MA.
James Pustejovsky. 2013. Dynamic event struc-
ture and habitat theory. In Proceedings of the
6th International Conference on Generative Ap-
proaches to the Lexicon (GL2013), pages 1–10,
Pisa, Italy. Association for Computational
Linguistics.
James Pustejovsky, Marc Verhagen, Roser Saurí,
Jessica Littman, Robert Gaizauskas, Graham
Katz, Inderjeet Mani, Robert Knippen, and
Andrea Setzer. 2006. TimeBank 1.2. Linguistic
Data Consortium, 40.
Alexander P. D. Mourelatos. 1978. Events,
processes, and states. Linguistics and
Philosophy, 2(3):415–434.
https://doi.org/10.1007/BF00149015
Malka Rappaport Hovav and Beth Levin. 1998.
Building verb meanings. The Projection of Ar-
guments: Lexical and Compositional Factors,
pages 97–134.
Malka Rappaport Hovav and Beth Levin. 2001. An
event structure account of English resultatives.
Language, 77(4):766–797.
Drew Reisinger, Rachel Rudinger, Francis
Ferraro, Craig Harman, Kyle Rawlins, and
Benjamin Van Durme. 2015. Semantic
proto-roles. Transactions of the Association for
Computational Linguistics, 3:475–488.
https://doi.org/10.1162/tacl_a_00152
Rachel Rudinger, Adam Teichert, Ryan Culkin,
Sheng Zhang, and Benjamin Van Durme.
2018. Neural-Davidsonian semantic proto-role
labeling. In Proceedings of the 2018
Conference on Empirical Methods in Natural
Language Processing, pages 944–955,
Brussels, Belgium. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/D18-1114
Carlota S. Smith. 2003. Modes of Discourse: The
Local Structure of Texts, volume 103.
Cambridge University Press.
https://doi.org/10.1017/CBO9780511615108
Elias Stengel-Eskin, Kenton Murray, Sheng
Zhang, Aaron Steven White, and Benjamin
Van Durme. 2021. Joint universal syntac-
tic and semantic parsing. arXiv preprint
arXiv:2104.05696
Elias Stengel-Eskin, Aaron Steven White, Sheng
Zhang, and Benjamin Van Durme. 2020.
Universal decompositional semantic parsing.
In Proceedings of the 58th Annual Meeting of
the Association for Computational Linguistics,
pages 8427–8439, Online. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/2020.acl-main.746
Barry Taylor. 1977. Tense and continuity.
Linguistics and Philosophy, 1(2):199–220.
Adam Teichert, Adam Poliak, Benjamin
Van Durme, and Matthew Gormley. 2017. Se-
mantic proto-role labeling. In Proceedings of
the AAAI Conference on Artificial Intelligence,
volume 31.
Carol Lee Tenny. 1987. Grammaticalizing Aspect
and Affectedness. Ph.D. thesis, Massachusetts
Institute of Technology.
Robert Truswell, editor. 2019. The Oxford
Handbook of Event Structure. Oxford University
Press, Oxford.
https://doi.org/10.1093/oxfordhb/9780199685318.001.0001
Siddharth Vashishtha, Benjamin Van Durme, and
Aaron Steven White. 2019. Fine-grained
temporal relation extraction. In Proceedings of the
57th Annual Meeting of the Association for
Computational Linguistics, pages 2906–2919,
Florence, Italy. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/P19-1280
Zeno Vendler. 1957. Verbs and times.
Philosophical Review, 66(2):143–160.
https://doi.org/10.2307/2182371
Henk J. Verkuyl. 1972. On The Compositional
Nature Of The Aspects, volume 15 of Foun-
dations of Language. D. Reidel Publishing
Company, Dordrecht.
https://doi.org/10.1007/978-94-017-2478-4
Haoyu Wang, Muhao Chen, Hongming Zhang,
and Dan Roth. 2020. Joint constrained learning
for event-event relation extraction. In Proceed-
ings of the 2020 Conference on Empirical Meth-
ods in Natural Language Processing (EMNLP),
pages 696–706, Online. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/2020.emnlp-main.51
Aaron Steven White, Kyle Rawlins, and Benjamin
Van Durme. 2017. The semantic proto-role
linking model. In Proceedings of the 15th
Conference of the European Chapter of the
Association for Computational Linguistics:
Volume 2, Short Papers, pages 92–98,
Valencia, Spain. Association for Computational
Linguistics.
Aaron Steven White, Drew Reisinger, Keisuke
Sakaguchi, Tim Vieira, Sheng Zhang, Rachel
Rudinger, Kyle Rawlins, and Benjamin
Van Durme. 2016. Universal decompositional
semantics on Universal Dependencies. In
Proceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing,
pages 1713–1723, Austin, Texas. Association
for Computational Linguistics.
Aaron Steven White, Elias Stengel-Eskin, Siddharth
Vashishtha, Venkata Subrahmanyan Govindarajan,
Dee Ann Reisinger, Tim Vieira, Keisuke
Sakaguchi, Sheng Zhang, Francis Ferraro,
Rachel Rudinger, Kyle Rawlins, and Benjamin
Van Durme. 2020. The universal
decompositional semantics dataset and decomp toolkit. In
Proceedings of the 12th Language Resources
and Evaluation Conference, pages 5698–5707,
Marseille, France. European Language Re-
sources Association.
Sheng Zhang, Rachel Rudinger, and Benjamin
Van Durme. 2017. An Evaluation of PredPatt
and Open IE via Stage 1 Semantic Role Label-
ing. In IWCS 2017—12th International Con-
ference on Computational Semantics—Short
papers.
Ben Zhou, Daniel Khashabi, Qiang Ning, and
Dan Roth. 2019. “Going on a vacation”
takes longer than “going for a walk”: A
study of temporal commonsense
understanding. In Proceedings of the 2019 Conference on
Empirical Methods in Natural Language
Processing and the 9th International Joint
Conference on Natural Language Processing
(EMNLP-IJCNLP), pages 3363–3369, Hong
Kong, China. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/D19-1332
Ben Zhou, Qiang Ning, Daniel Khashabi, and Dan
Roth. 2020. Temporal common sense acqui-
sition with minimal supervision. In
Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics,
pages 7579–7589, Online. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/2020.acl-main.678