Decomposing and Recomposing Event Structure
William Gantt
University of Rochester, USA
wgantt@cs.rochester.edu
Lelia Glass
Georgia Institute of Technology, USA
lelia.glass@modlangs.gatech.edu
Aaron Steven White
University of Rochester, USA
aaron.white@rochester.edu
Abstract
We present an event structure classification empirically derived from inferential properties annotated on sentence- and document-level Universal Decompositional Semantics (UDS) graphs. We induce this classification jointly with semantic role, entity, and event-event relation classifications using a document-level generative model structured by these graphs. To support this induction, we augment existing annotations found in the UDS1.0 dataset, which covers the entirety of the English Web Treebank, with an array of inferential properties capturing fine-grained aspects of the temporal and aspectual structure of events. The resulting dataset (available at decomp.io) is the largest annotation of event structure and (partial) event coreference to date.
1 Introduction
Natural language provides myriad ways of communicating about complex events. For example, one and the same event can be described at a coarse grain, using a single clause (1), or at a finer grain, using an entire document (2).
(1) The contractors built the house.
(2) They started by laying the house’s foundation. They then framed the house before installing the plumbing. After that […]
Further, descriptions of the same event at different granularities can be interleaved within the same document; for example, (2) might well directly follow (1) as an elaboration on the house-building process.
Ultimately, extracting knowledge about complex events from text involves determining the structure of the events being referred to: what their parts are, how those parts are laid out in time, who participates in them and how, etc. Determining this structure requires an event classification whose elements are associated with event structure representations. A number of such classifications and annotated corpora exist: FrameNet (Baker et al., 1998), VerbNet (Kipper Schuler, 2005), PropBank (Palmer et al., 2005), Abstract Meaning Representation (Banarescu et al., 2013), and Universal Conceptual Cognitive Annotation (Abend and Rappoport, 2013), among others.
Similar in spirit to this prior work, but different in method, our work aims to develop an empirically derived event structure classification. Where prior work takes a top–down approach, hand-engineering an event classification before deploying it for annotation, we take a bottom–up approach: we decompose event structure into a wide variety of theoretically informed, cross-cutting semantic properties, annotate for those properties, then recompose an event classification from them by induction. The properties on which our categories rest target (i) the substructure of an event (e.g., that the building described in (1) consists of a sequence of subevents resulting in the creation of some artifact); (ii) the superstructure in which an event takes part (e.g., that laying a house’s foundation is part of building a house, alongside framing the house, installing the plumbing, etc.); (iii) the relationship between an event and its participants (e.g., that the contractors in (1) built the house collectively through their joint efforts); and (iv) properties of the event’s participants (e.g., that the contractors in (1) are animate while the house is not).
To derive our event structure classification, we extend the Universal Decompositional Semantics dataset (UDS; White et al., 2016, 2020). UDS annotates for a subset of key event structure properties, but a range of key properties remain to be captured. After motivating the need for these additional properties (§2), we develop annotation protocols for them (§3). We validate our protocols (§4) and use them to collect annotations for the entire Universal Dependencies (Nivre et al., 2016) English Web Treebank (§5; Bies et al., 2012), resulting in the UDS-EventStructure dataset (UDS-E). To derive an event structure classification from UDS-E and existing UDS annotations, we develop a document-level generative model that jointly induces event, entity, semantic role, and event-event relation types (§6). Finally, we compare these types to those found in existing event structure classifications (§7). We make UDS-E and our code available at decomp.io.
Transactions of the Association for Computational Linguistics, vol. 10, pp. 17–34, 2022. https://doi.org/10.1162/tacl_a_00445
Action Editor: Emily Bender. Submission batch: 4/2021; Revision batch: 7/2021; Published 1/2022.
© 2022 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00445/1986604/tacl_a_00445.pdf by guest on 07 September 2023
2 Background
Contemporary theoretical
treatments of event
structure tend to take as their starting point
Vendler’s (1957) seminal four-way classification.
We briefly discuss this classification and elab-
orations thereon before turning to other event
structure classifications developed for annotating
corpora.1 We then contrast these with the fully
decompositional approach we take in this paper.
Theoretical Approaches Vendler categorizes event descriptions into four classes: statives (3), activities (4), achievements (5), and accomplishments (6). As theoretical constructs, these classes are used to explain both the distributional characteristics of event descriptions and inferences about how an event progresses over time.
(3) Jo was in the park.
stative = [+DUR, −DYN, −TEL]
(4) Jo ran around in the park.
activity = [+DUR, +DYN, −TEL]
(5) Jo arrived at the park.
achievement = [−DUR, +DYN, +TEL]
(6) Jo ran to the park.
accomplishment = [+DUR, +DYN, +TEL]
Work building on Vendler’s discovered that these
classes can be decomposed into the now well-
accepted component properties in (7)–(9) (Kenny,
1963; Lakoff, 1965; Verkuyl, 1972; Bennett and
Partee, 1978; Mourelatos, 1978; Dowty, 1979).
1The theoretical literature on event structure is truly vast.
See Truswell (2019) for a collection of overview articles.
(7) DUR(ATIVITY): whether the event happens at
an instant or extends over time
(8) DYN(AMICITY): whether the event involves change, broadly construed
(9) TEL(ICITY): whether the event culminates in a participant changing state or location, being created or destroyed, etc.
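The feature bundles in (3)–(6) can be read as a lookup table from the three binary features (7)–(9) to Vendler's classes. The Python sketch below is our own illustration, not part of the paper, and the names `Features`, `VENDLER`, and `vendler_class` are hypothetical. It makes one point concrete: three binary features allow eight combinations, but Vendler's classification names only four of them.

```python
from typing import NamedTuple


class Features(NamedTuple):
    """Binary aspectual features from (7)-(9): True means '+', False means '-'."""
    dur: bool  # DUR(ATIVITY): extends over time
    dyn: bool  # DYN(AMICITY): involves change
    tel: bool  # TEL(ICITY): culminates


# Vendler's four classes as feature bundles, copied from examples (3)-(6).
VENDLER = {
    Features(dur=True, dyn=False, tel=False): "stative",
    Features(dur=True, dyn=True, tel=False): "activity",
    Features(dur=False, dyn=True, tel=True): "achievement",
    Features(dur=True, dyn=True, tel=True): "accomplishment",
}


def vendler_class(dur: bool, dyn: bool, tel: bool) -> str:
    """Return the Vendler class for a feature bundle, or 'unclassified'
    for the four combinations that (3)-(6) do not list."""
    return VENDLER.get(Features(dur, dyn, tel), "unclassified")


if __name__ == "__main__":
    print(vendler_class(dur=True, dyn=True, tel=True))    # accomplishment
    print(vendler_class(dur=False, dyn=True, tel=False))  # unclassified
```

Returning "unclassified" rather than raising keeps the gaps visible; for instance, [−DUR, +DYN, −TEL] is not one of the four classes above.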
Later work further expanded these properties and, thus, the possible classes. Expanding on DYN, Taylor (1977) suggests a distinction between dynamic predicates that refer to events with dynamic subevents (e.g., the individual strides in a running) and ones that do not (e.g., the gliding in (10)) (see also Bach, 1986; Smith, 2003).
(10) The pelican glided through the air.
Dynamic events with dynamic subevents can be further distinguished based on whether the subevents are similar (e.g., the strides in a running) or dissimilar (e.g., the subevents in a house-building) (Piñón, 1995). In the case where the subevents are similar and a participant itself has subparts (e.g., when the participant is a group), there may be a bijection from participant subparts to subevents. In (11), there is a smiling for each child that makes up the composite smiling: smile is distributive. In (12), the meeting presumably has some structure, but there is no bijection from members to subevents: meet is collective (see Champollion, 2010, for a review).
(11) {The children, Jo and Bo} smiled.
(12) {The committee, Jo and Bo} met.
Expanding on TEL, Dowty (1991) argues for a
distinction among telics in which the culmination
comes about incrementally (13) or abruptly (14)
(see also Tenny, 1987; Krifka, 1989, 1992, 1998;
Levin and Hovav, 1991; Rappaport Hovav and
Levin, 1998, 2001; Croft, 2012).
(13) The gardener mowed the lawn.
(14) The climber summitted at 5pm.
This notion of incrementality is intimately tied up with the notion of DUR(ATIVITY). For example, Moens and Steedman (1988) point out that certain event structures can be systematically transformed into others: whereas (14) describes the summitting as something that happens at an instant (and is thus abrupt), (15) describes it as a process that culminates in having reached the top of the mountain (see also Pustejovsky, 1995).
(15) The climber was summitting.
Such cases of aspectual coercion highlight the importance of grammatical factors in determining the structure of an event. More general contextual factors are also at play when determining event structure: “I ran” can describe a telic event (e.g., when it is known that I run the same distance or to the same place every day) or an atelic event (e.g., when the destination and/or distance is irrelevant in context) (Dowty, 1979; Olsen, 1997). This context-sensitivity strongly suggests that annotating event structure is not simply a matter of building a type-level lexical resource and projecting its labels onto text: actual text must be annotated.
Resources Early, broad-coverage lexical resources, such as the Lexical Conceptual Structure lexicon (LCS; Dorr, 1993), attempt to directly encode an elaboration of the core Vendler classes in terms of a hand-engineered graph representation proposed by Jackendoff (1990). VerbNet (Kipper Schuler, 2005) further elaborates on LCS by building on the fine-grained syntax-based classification of Levin (1993) and links her classes to LCS-like representations. More recent versions of VerbNet (v3.3+; Brown et al., 2018) update these representations to ones based on the Dynamic Event Model (Pustejovsky, 2013).
COLLIE-V, which expands the TRIPS lexicon
and ontology (Ferguson and Allen, 1998, et seq),
takes a similar tack of producing hand-engineered
event structures, combining this hand-engineering
with a procedure for bootstrapping event structures
(Allen et al., 2020). FrameNet also contains
hand-engineered event structures, though they are
significantly more fine-grained than those found
in LCS or VerbNet (Baker et al., 1998).
VerbNet, COLLIE-V, and FrameNet are not
directly annotated on text, though annotations
for at least VerbNet and FrameNet can be ob-
tained by using SemLink to project FrameNet
and VerbNet annotations onto PropBank anno-
tations (Palmer et al., 2005). PropBank frames
have been enriched in a variety of other ways.
One such enrichment can be found in Abstract
Meaning Representation (AMR; Banarescu et al.,
2013; Donatelli et al., 2018). Another can be found
in Richer Event Descriptions (RED; O’Gorman et al., 2016), which annotates events and entities for factuality (whether an event actually happened) and genericity (whether an event/entity is a particular or generic) as well as annotating for causal, temporal, sub-event, and co-reference relations
between events (see also Chklovski and Pantel,
2004; Hovy et al., 2013; Cybulska and Vossen,
2014).
Additional, less fine-grained event classifications exist in TimeBank (Pustejovsky et al., 2006), Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013), and the Situation Entities dataset (SitEnt; Friedrich and Palmer, 2014b; Friedrich et al., 2016). Of these, the closest to capturing the standard Vendler classification and decompositions thereof is SitEnt. The original version of SitEnt annotates only for a state-event distinction (alongside related, non-event structural distinctions), but later elaborations further annotate for telicity (Friedrich and Gateva, 2017). Because of this close alignment to the standard Vendler classes, we use SitEnt annotations as part of validating our own annotation protocol in §4.
Universal Decompositional Semantics In contrast to the hand-engineered event structure classifications discussed above, our aim is to derive event structure representations directly from semantic annotations. To do this, we extend the existing annotations in the Universal Decompositional Semantics dataset (UDS; White et al., 2016, 2020) with key annotations for the event structural distinctions discussed above. Our aim is not necessarily to reconstruct any previous classification, though we do find in §6 that our event type classification approximates Vendler’s to some extent.
UDS is a semantic annotation framework and dataset based on the principles that (i) the semantics of words or phrases can be decomposed into sets of simpler semantic properties and (ii) these properties can be annotated by asking straightforward questions intelligible to non-experts. UDS comprises two layers of annotations on top of the Universal Dependencies (UD) syntactic graphs in the English Web Treebank (EWT): (i) predicate-argument graphs with mappings into the syntactic graphs, derived using the PredPatt tool (White et al., 2016; Zhang et al., 2017); and (ii) crowd-sourced annotations for properties of events (on the predicate nodes of the predicate-argument graph), entities (on the argument nodes), and their relationship (on the predicate-argument edges).
Figure 1: Example UDS semantics and syntax graphs with select properties (see Footnote 2 on the property values). Bolded properties are ones we collect in this paper, and our new document-level graph is also shown in purple.
The UDS properties are organized into three
predicate subspaces with five properties in total:
• FACTUALITY (Rudinger et al., 2018)
factual: did the event happen?
• GENERICITY (Govindarajan et al., 2019)
kind: is the event generic?
hypothetical: is the event hypothetical?
dynamic: is the event dynamic or stative?
• TIME (Vashishtha et al., 2019)
duration: how long did/will the event last?
Two argument subspaces with four properties:
• GENERICITY (Govindarajan et al., 2019)
particular: is the entity a particular?
kind: is the entity a kind?
abstract: is the entity abstract or concrete?
• WORDSENSE (White et al., 2016)
Which coarse entity types (WordNet supersenses) does the entity have?
And one predicate-argument subspace with 16 properties (see White et al., 2016, for full list):
• PROTOROLES (Reisinger et al., 2015)
instigation: did participant cause event?
change of state: did participant change state during or as a consequence of event?
change of location: did participant change location during event?
existed {before, during, after}: did participant exist {before, during, after} the event?
Figure 1 shows an example UDS1.0 graph (White et al., 2020) augmented with (i) a subset of the properties we add in bold (see §3); and (ii) document-level edges in purple (see §6).2
The UDS annotations and associated toolkit have supported research in a variety of areas, including syntactic and semantic parsing (Stengel-Eskin et al., 2020, 2021), semantic role labeling (Teichert et al., 2017) and induction (White et al., 2017), event factuality prediction (Rudinger et al., 2018), and temporal relation extraction (Vashishtha et al., 2019), among others. For our purposes, the annotations do cover some event structural distinctions: for example, dynamicity, specific cases of telicity (in the form of change of state, change of location, and existed {before, during, after}), and durativity. In this sense, UDS provides an alternative, decompositional event representation that distinguishes it from more traditional categorical ones like SitEnt. However, the existing annotations fail to capture a number of the core distinctions above, a lacuna this work aims to fill.
2Following White et al. (2020), the property values in Figure 1 are derived from raw annotations using mixed effects models (MEMs; Gelman and Hill, 2014), which enable one to adjust for differences in how annotators approach a particular annotation task (see also Gantt et al., 2020). In §6, we similarly use MEMs in our event structure induction model, allowing us to work directly with the raw annotations.
3 Annotation Protocol
We annotate for the core event structural distinctions not currently covered by UDS, breaking our annotation into three subprotocols. For all questions, annotators report confidence in their response to each question on a scale from 1 (not at all confident) to 5 (totally confident).3
Event-subevent Annotators are presented with
a sentence containing a single highlighted predi-
cate followed by four questions about the internal
structure of the event it describes. Q1 asks whether
the event described by the highlighted predicate
has natural subparts. Q2 asks whether the event
has a natural endpoint.
The final questions depend on the response to
Q1. If an annotator responds that the highlighted
predicate refers to an event that has natural parts,
they are asked (i) whether the parts are similar to one another and (ii) how long each part lasts on average. If an annotator instead responds that the event referred to does not have natural parts, they are asked (i) whether the event is dynamic, and (ii) how long the event lasts.
All questions are binary except those concerning duration, for which answers are supplied as one of twelve ordinal values (see Vashishtha et al., 2019): effectively no time at all, fractions of a second, seconds, minutes, hours, days, weeks, months, years, decades, centuries, or effectively forever. Together, these questions target the three Vendler-inspired features (DYN, DUR, TEL), plus a fourth dimension for subtypes of dynamic predicates. In the context of UDS, these properties form a predicate node subspace, alongside FACTUALITY, GENERICITY, and TIME.
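The dependency of the last two questions on Q1 amounts to a validation rule on each response. The Python sketch below is our illustration rather than the authors' code: the `SubeventResponse` schema and its field names are hypothetical, while the twelve duration labels and the branching follow the description above.

```python
from dataclasses import dataclass
from typing import Optional

# The twelve ordinal duration values listed above, from shortest to longest.
DURATIONS = (
    "effectively no time at all", "fractions of a second", "seconds",
    "minutes", "hours", "days", "weeks", "months", "years", "decades",
    "centuries", "effectively forever",
)


@dataclass
class SubeventResponse:
    """One annotator's answers for one highlighted predicate (hypothetical schema)."""
    natural_parts: bool                   # Q1
    natural_endpoint: bool                # Q2
    parts_similar: Optional[bool] = None  # asked only if natural_parts
    part_duration: Optional[str] = None   # asked only if natural_parts
    dynamic: Optional[bool] = None        # asked only if not natural_parts
    full_duration: Optional[str] = None   # asked only if not natural_parts

    def validate(self) -> None:
        """Raise ValueError if the answers do not follow the branching on Q1."""
        parts_qs = {"parts_similar": self.parts_similar, "part_duration": self.part_duration}
        whole_qs = {"dynamic": self.dynamic, "full_duration": self.full_duration}
        required, forbidden = (parts_qs, whole_qs) if self.natural_parts else (whole_qs, parts_qs)
        for name, value in required.items():
            if value is None:
                raise ValueError(f"{name} must be answered")
        for name, value in forbidden.items():
            if value is not None:
                raise ValueError(f"{name} must not be answered")
        for duration in (self.part_duration, self.full_duration):
            if duration is not None and duration not in DURATIONS:
                raise ValueError(f"unknown duration: {duration!r}")

    def duration_rank(self) -> int:
        """Ordinal rank (0-11) of whichever duration question was asked."""
        self.validate()
        duration = self.part_duration if self.natural_parts else self.full_duration
        return DURATIONS.index(duration)
```

For example, `SubeventResponse(natural_parts=False, natural_endpoint=True, dynamic=True, full_duration="minutes").duration_rank()` returns 3, while a response that answers the dynamicity question after reporting natural parts fails validation. Storing durations as ranks is what allows them to be compared with an ordinal agreement measure.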
Event-event Annotators are presented with either a single sentence or a pair of adjacent sentences, with the two predicates of interest
highlighted in distinct colors. For a predicate pair
(p1, p2) describing an event pair (e1, e2), annota-
tors are asked whether e1 is a mereological part
of e2, and vice versa. Both questions are binary:
A positive response to both indicates that e1 and
e2 are the same event; and a positive response
to exactly one of the questions indicates proper
parthood. Prior versions of UDS do not contain any predicate-predicate edge subspaces, so we add document-level graphs to UDS (§6) to capture the relation between adjacently described events.
3The annotation interfaces for all three subprotocols, including instructions, are available at decomp.io.
This subprotocol targets generalized event coreference, identifying constituency in addition to strict identity. It also augments the information collected in the event-subevent protocol: insofar as a proper subevent relation holds between e1 and e2, we obtain additional fine-grained information about the subevents of the containing event, for example, an explicit description of at least one subevent.
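Since each parthood question is binary, the two answers jointly select one of four relations, the same four that Table 1 reports for this subprotocol. A minimal Python sketch of that mapping follows; `Relation` and `mereology` are illustrative names, not part of any UDS release.

```python
from enum import Enum


class Relation(Enum):
    IDENTICAL = "e1 and e2 are the same event"
    E1_PART_OF_E2 = "e1 is a proper part of e2"
    E2_PART_OF_E1 = "e2 is a proper part of e1"
    NEITHER = "neither is part of the other"  # reported as 'disjoint' in Table 1


def mereology(e1_part_of_e2: bool, e2_part_of_e1: bool) -> Relation:
    """Combine the two binary parthood answers into a single relation."""
    if e1_part_of_e2 and e2_part_of_e1:
        return Relation.IDENTICAL  # mutual parthood means identity
    if e1_part_of_e2:
        return Relation.E1_PART_OF_E2
    if e2_part_of_e1:
        return Relation.E2_PART_OF_E1
    return Relation.NEITHER
```

Keeping the two raw answers and deriving the relation afterwards, rather than asking for the relation directly, preserves the option of analysing each direction of parthood separately.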
Event-entity The final subprotocol focuses on
the relation between the event described by a
predicate and its plural or conjoined arguments,
asking whether the predicate is distributive or
collective with respect to that argument. This
property accordingly forms a predicate-argument
subspace in UDS, similar to PROTOROLES.
4 Validation Experiments
We validate our annotation protocol (i) by assessing interannotator agreement (IAA) among both experts and crowd-sourced annotators for each subprotocol on a small sample of items drawn from existing annotated corpora (§4.1–4.2); and (ii) by comparing annotations generated using our protocol against existing annotations that cover (a subset of) the phenomena that ours does and are generated by highly trained annotators (§4.3).
4.1 Item Selection
For each of the three subprotocols, one of the
authors selected 100 sentences for inclusion in
the pilot for that subprotocol. This author did not
consult with the other authors on their selection,
so that annotation could be blind.
For the event-subevent subprotocol, the 100 sentences come from the portion of the MASC corpus (Ide et al., 2008) that Friedrich et al. (2016) annotate for eventivity (EVENT v. STATE) and that Friedrich and Gateva (2017) annotate for telicity (TELIC v. ATELIC). For the event-event subprotocol, the 100 sentences come from the portions of the Richer Event Descriptions corpus (RED; O’Gorman et al., 2016) that are annotated for event subpart relations. To our knowledge, no existing annotations cover distributivity, and so for our event-entity protocol, we select 100 sentences (distinct from those used
for the event-subevent subprotocol) and compute IAA, but do not compare against existing annotations.
4.2 Interannotator Agreement
We compute two forms of IAA: (i) IAA among expert annotators (the three authors); and (ii) IAA between experts and crowd-sourced annotators. In both cases, we use Krippendorff’s α as our measure of (dis)agreement (Krippendorff, 2004). For the binary responses, we use the nominal form of α; for the ordinal responses, we use the ordinal form.
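For the binary questions, nominal α can be computed from a coincidence matrix over pairable values: α = 1 − D_o/D_e, where D_o is the observed and D_e the expected disagreement. The dependency-free Python sketch below covers the nominal case only (the ordinal form used for the duration questions replaces the 0/1 distance with an ordinal metric and is not shown); the function name and input layout are our own assumptions.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def nominal_alpha(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Krippendorff's alpha with the nominal (0/1) distance.

    `units` has one row per item, listing the label each annotator gave it;
    None marks a missing annotation.
    """
    coincidences: Counter = Counter()
    for row in units:
        values = [v for v in row if v is not None]
        m = len(values)
        if m < 2:
            continue  # an item needs at least two labels to be pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    marginals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        marginals[a] += weight
    n = sum(marginals.values())
    # D_o = observed / n and D_e = expected / (n * (n - 1)),
    # so D_o / D_e = (n - 1) * observed / expected.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(marginals[a] * marginals[b]
                   for a in marginals for b in marginals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    print(nominal_alpha([[1, 1], [0, 0], [1, 1]]))  # 1.0
    print(nominal_alpha([[1, 0], [0, 1]]))          # -0.5
```

Perfect agreement gives 1.0, while two annotators who swap labels on both of two items fall below chance at −0.5; missing annotations simply drop out of the pairing.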
Expert Annotators For each subprotocol, the three authors independently annotated the 100 sentences selected for that subprotocol.
Prior to analysis, we ridit score the confidence ratings by annotator to normalize them for differences in annotator scale use (see Govindarajan et al., 2019, for discussion of ridit scoring confidence ratings in a similar annotation protocol). This method maps ordinal labels to (0, 1) on the basis of the empirical CDF of each annotator’s responses, with values closer to 0 implying lower confidence and those nearer 1 implying higher confidence. For questions that are dynamically revealed on the basis of the answer to the natural parts question (i.e., part similarity, average part duration, dynamicity, and situation duration), we use the average of the ridit scored confidence for natural parts and that question.
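Ridit scoring replaces each rating k with the share of that annotator's ratings below k plus half the share equal to k, which always lands strictly inside (0, 1). The Python sketch below assumes this standard ridit definition (the text above only specifies that the empirical CDF is used), and its function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def ridit_scores(ratings: List[int]) -> Dict[int, float]:
    """Ridit score for each confidence level one annotator used:
    P(rating < k) + 0.5 * P(rating == k) under that annotator's own ratings."""
    counts = Counter(ratings)
    total = len(ratings)
    scores: Dict[int, float] = {}
    below = 0
    for level in sorted(counts):
        scores[level] = (below + 0.5 * counts[level]) / total
        below += counts[level]
    return scores


def ridit_transform(ratings: List[int]) -> List[float]:
    """Replace each of one annotator's raw ratings with its ridit score."""
    table = ridit_scores(ratings)
    return [table[r] for r in ratings]


def dependent_confidence(natural_parts_conf: float, question_conf: float) -> float:
    """Confidence for a question revealed by the natural-parts answer:
    the mean of the two ridit-scored confidences."""
    return (natural_parts_conf + question_conf) / 2


if __name__ == "__main__":
    print(ridit_transform([1, 3, 3, 5]))  # [0.125, 0.5, 0.5, 0.875]
    print(ridit_transform([5, 5, 5, 5]))  # [0.5, 0.5, 0.5, 0.5]
```

An annotator who answers 5 to everything gets 0.5 everywhere, so their 5s count as no more confident than their own typical response, whereas a 5 from someone who uses the whole scale scores near 1.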
Figure 2 shows α when including only items that the expert annotators rated with a particular ridit scored confidence or higher. Agreement for the event-event protocol (mereology) is given in two forms: given that e1 temporally contains e2, (i) directed: the agreement on whether e2 is a subevent of e1; and (ii) undirected: agreement on whether e2 is a subevent of e1 and whether e1 is a subevent of e2.
The error bars are computed by a nonpara-
metric bootstrap over items. A threshold of 0.0
corresponds to computing α for all annotations,
regardless of confidence; a threshold of t > 0.0
corresponds to computing α only for annotations
associated with a ridit scored confidence of greater
than t. When this thresholding results in less than 1/3 of items having an annotation for at least two
annotators, α is not plotted. This situation occurs
only for questions that are revealed based on the
answer to a previous question.
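The thresholding rule and the bootstrap over items can be sketched together. This Python sketch rests on our own assumptions: the data layout is hypothetical, and simple pairwise percent agreement stands in for α only to keep the block short; `bootstrap_ci` accepts any statistic, so an α implementation could be passed in its place.

```python
import random
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical layout: item id -> [(label, ridit-scored confidence), ...],
# with one pair per annotator who labelled the item.
Annotations = Dict[str, List[Tuple[int, float]]]


def threshold(data: Annotations, t: float) -> Optional[Annotations]:
    """Keep annotations with confidence above t. Ridit scores are strictly
    positive, so t = 0.0 keeps everything. Return None (alpha not reported)
    when fewer than a third of items retain at least two annotations."""
    kept = {item: [(label, c) for label, c in anns if c > t]
            for item, anns in data.items()}
    usable = {item: anns for item, anns in kept.items() if len(anns) >= 2}
    if len(usable) < len(data) / 3:
        return None
    return usable


def percent_agreement(data: Annotations) -> float:
    """Stand-in statistic: fraction of annotator pairs, pooled over items, that agree."""
    agree = total = 0
    for anns in data.values():
        for (a, _), (b, _) in combinations(anns, 2):
            agree += a == b
            total += 1
    return agree / total


def bootstrap_ci(data: Annotations, statistic: Callable[[Annotations], float],
                 n_boot: int = 1000, level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile interval from resampling items (not annotations) with replacement."""
    rng = random.Random(seed)
    items = list(data)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(items) for _ in items]
        # Re-key each draw so that an item drawn twice contributes twice.
        resampled = {f"{i}:{item}": data[item] for i, item in enumerate(sample)}
        stats.append(statistic(resampled))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

Resampling whole items, rather than individual annotations, keeps each item's annotations together, which matches the bootstrap over items described above.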
Figure 2: IAA among experts for each property, filtering annotations with ridit-scored confidence ratings below different thresholds. Confidence threshold 0.0 implies no filtering. Error bars show 95% confidence intervals computed by a nonparametric bootstrap.
For natural parts, telicity, mereology, and distributivity, agreement is high, even without filtering any responses on the basis of confidence, and that agreement improves with confidence. For part similarity, average part duration, and situation duration, we see more middling, but still reasonable, agreement, though this agreement does not reliably increase with confidence. The fact that it does not increase may have to do with interactions between confidence on the natural parts question and its dependent questions that we do not capture by taking the mean of these two confidences.
Crowd-Sourced Annotators We recruit crowd-sourced annotators in two stages. First, we select a small set of items from the 100 we annotate in the expert annotation that have high agreement among experts to create a qualification task. Second, based on performance in this qualification task, we construct a pool of trusted annotators who are allowed to participate in pilot annotations for each of the three subprotocols.4
4During all validation stages as well as bulk annotation (§5), we targeted an average hourly pay equivalent to that for undergraduate research assistants performing corpus annotation at the first and final author’s institution: $12.50 per hour. A third-party estimate from TurkerView shows our actual average hourly compensation when data collection was completed to be $14.71 per hour.
Qualification For the qualification task, we selected eight of the sentences collected from MASC for the event-subevent subprotocol on which expert agreement was very high and which were diverse in the types of events described. We then obtained event-subevent annotations for these sentences from 400 workers on Amazon Mechanical Turk (AMT), and selected the top 200 among them on the basis of their agreement with expert responses on the same items. These workers were then permitted to participate in the pilot tasks.
Pilot We conducted one pilot for each subprotocol, using the items described in §4.1. Sentences were presented in lists of 10 per Human Intelligence Task (HIT) on AMT for the event-event and event-entity subprotocols and in lists of 5 per HIT for event-subevent. We collected annotations from 10 distinct workers for each sentence, and workers were permitted to annotate up to the full 100 sentences in each pilot. Thus, all pilots were guaranteed to include a minimum of 10 distinct workers (all workers do all HITs), up to a maximum of 100 for the subprotocols with 10 sentences per HIT or 200 for the subprotocol with 5 per HIT (each worker does one HIT). All top-200 workers from the qualification were allowed to participate.
Figure 3 shows IAA between all pilot annotators and experts for individual questions across the three pilots. More specifically, it shows the distribution of α scores by question for each annotator when IAA is computed among the three experts and that annotator only. Solid vertical lines show the expert-only IAA and dashed vertical lines show the 95% confidence interval.
Figure 3: Per-property histograms of alphas for IAA between each crowd-sourced annotator and all experts. Black lines show the experts-only alpha, with dashed lines for the 95% CI (see Figure 2).
4.3 Protocol Comparison
To further validate the event-event and event-subevent subprotocols, we evaluate how well our pilot data predicts the corresponding CONTAINS v. CONTAINS-SUBEVENT annotations from RED in the former case, as well as the EVENT v. STATE and TELIC v. ATELIC annotations from SitEnt in the latter. In both cases, we used the (ridit-scored) confidence-weighted average response across annotators for a particular item as features in a simple SVM classifier with linear kernel. In a leave-one-out cross-validation on the binary classification task for RED, we achieve a micro-averaged F1 score of 0.79, exceeding the reported human F1 agreement for both the CONTAINS (0.640) and CONTAINS-SUBEVENT (0.258) annotations reported by O’Gorman et al. (2016). For SitEnt, we evaluate on a three-way classification task for STATIVE, EVENTIVE-TELIC, and EVENTIVE-ATELIC, achieving a micro-averaged F1 of 0.68 using the same leave-one-out cross-validation. As Friedrich and Palmer (2014a) do not report interannotator agreement for this class breakdown, we further compute Krippendorff’s alpha from their raw annotations and again find that agreement between our predicted annotations and the gold ones (0.48) slightly exceeds the interannotator agreement among humans (0.47).
These results suggest that our subprotocols capture relevant event structural phenomena as well as linguistically trained annotators can and that they may serve as effective alternatives to existing protocols while not requiring any linguistic expertise.
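The protocol comparison trains a linear-kernel SVM under leave-one-out cross-validation and reports micro-averaged F1. The dependency-free Python sketch below is an illustration under our own assumptions, not the authors' setup: it fits a soft-margin linear SVM by full-batch subgradient descent on the primal hinge-loss objective, handles the three-way task by one-vs-rest, and uses hyperparameters (`lam`, `lr`, `epochs`) we picked for small features scaled to [0, 1]. An off-the-shelf solver such as scikit-learn's `SVC(kernel="linear")` would be the usual choice in practice.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias)


def dot(w: Vector, x: Vector) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_binary_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 2000) -> Model:
    """Soft-margin linear SVM for labels +1/-1, fit by full-batch subgradient
    descent on lam/2 * ||w||^2 + mean hinge loss (bias left unregularized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge term is active for this point
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X: List[Vector], labels: List[str]) -> Dict[str, Model]:
    """One binary SVM per class, trained against all other classes."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def predict(models: Dict[str, Model], x: Vector) -> str:
    """Pick the class whose classifier gives the highest decision value."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """Micro-averaged F1: pool TP, FP, and FN counts over all classes."""
    classes = set(gold) | set(pred)
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = sum(p == c and g != c for c in classes for g, p in zip(gold, pred))
    fn = sum(g == c and p != c for c in classes for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def loo_micro_f1(X: List[Vector], labels: List[str]) -> float:
    """Leave-one-out: train on every item but one, then predict the held-out item."""
    preds = []
    for i in range(len(X)):
        models = train_one_vs_rest(X[:i] + X[i + 1:], labels[:i] + labels[i + 1:])
        preds.append(predict(models, X[i]))
    return micro_f1(labels, preds)
```

With single-label predictions, every error is one false positive and one false negative, so micro-averaged F1 equals accuracy; the explicit counts are kept only to mirror the metric's definition.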
Subprotocol | Annotation | Count (%) | Example
Event–subevent | Has natural parts | 6,903 (23%) | The eighteen steps of the dance are done rhythmically
 |   Parts similar | 4,498 (15%) | Israel resumed its policy of targeting militant leaders
 |   Parts dissimilar | 2,158 (7%) | Fish are probably the easiest to take care of
 |   (Part duration) | (–) | (ordinal; not shown)
 | No natural parts | 23,069 (77%) | It had better nutritional value
 |   Dynamic | 13,903 (48%) | I would like to informally get together with you
 |   Not dynamic | 8,839 (29%) | I assume this is 12:30 Central Time?
 |   (Full duration) | (–) | (ordinal; not shown)
 | Natural endpoint | 6,031 (20%) | I will deliver it to you
 | No natural endpoint | 23,941 (80%) | If you know or work there could you enlighten me?
 | Total | 29,984 | (all event descriptions)
Event–event | P1, P2 identical | 2,435 (6%) | All horses [. . . ] are happy₁ & healthy₂ when they arrive
 | P1, P2 disjoint | 30,247 (80%) | I am often stopped₁ on the street and asked, ‘Who does your hair . . . I LOVE₂ it’
 | P1 ⊂ P2 | 1,832 (5%) | The office is shared with a foot doctor and it’s very sterile₁ and medical feeling₂, which I liked
 | P2 ⊂ P1 | 3,029 (8%) | It is a very cruel death₁ with bodies dismembered₂
 | Total | 37,719 | (pairs of event descriptions w/ temporal overlap)
Event–entity | Distributive | 4,812 (50%) | the pics turned out ok
 | Collective | 4,876 (50%) | we draw on our many faith traditions to arrive at a common conviction
 | Total | 9,710 | (event descriptions with plural arguments)
Table 1: Descriptive statistics and examples from Train and Dev data. Each item was annotated by a single annotator in Train and by three annotators in Dev, of which this table reports the majority opinion.
5 Corpus Annotation
We collect crowd-sourced annotations for the entirety of UD-EWT. Predicate and argument spans are obtained from the PredPatt predicate-argument graphs for UD-EWT available in UDS1.0. The total number of items annotated for each subprotocol is presented in Table 1.
Event-subevent These annotations cover all
predicates headed by verbs (as identified by UD
POS tag), as well as copular constructions with
nominal and adjectival complements. In the former case, only the verb token is highlighted in the task; in the latter, the highlighting spans from the copula to the complement head.
Event-event Pairs for the event-event subprotocol were drawn from the UDS-Time dataset, which features pairs of verbal predicates, either within the same sentence or in adjacent sentences, each annotated with its start- and endpoint relative to the other. We additionally included predicate-argument pairs in cases where the argument is annotated in UDS as having a WordNet supersense of EVENT, STATE, or PROCESS. To our knowledge, this represents the largest publicly available (partial) event coreference dataset to date.
Event-entity For the event-entity subprotocol,
we identify predicate-argument pairs in which the
argument is plural or conjoined. Plural arguments are identified by the UD NUMBER attribute, and conjoined ones by a conj dependency between an argument head and another noun. We consider only predicate-argument pairs with a UD dependency of nsubj, nsubjpass, dobj, or iobj.
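The selection criteria above can be written as a filter over UD tokens. The Python sketch below is a simplification under stated assumptions: it reads heads and relations directly off a hypothetical `Token` record rather than from the PredPatt argument spans the corpus actually uses, and it treats NOUN and PROPN tokens as the "nouns" in the conjunction test.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CORE_DEPRELS = {"nsubj", "nsubjpass", "dobj", "iobj"}
NOUN_UPOS = {"NOUN", "PROPN"}  # our reading of "another noun"


@dataclass
class Token:
    """Minimal stand-in for a UD token (hypothetical schema, not a UDS API)."""
    id: int
    upos: str
    head: int    # id of the syntactic head; 0 for the root
    deprel: str
    feats: Dict[str, str] = field(default_factory=dict)


def is_plural_or_conjoined(arg: Token, sentence: List[Token]) -> bool:
    """Plural via the UD Number feature, or conjoined via a `conj` dependent noun."""
    if arg.feats.get("Number") == "Plur":
        return True
    return any(t.head == arg.id and t.deprel == "conj" and t.upos in NOUN_UPOS
               for t in sentence)


def event_entity_candidates(sentence: List[Token]) -> List[Tuple[int, int]]:
    """(predicate head id, argument head id) pairs for the distributivity question."""
    return [(t.head, t.id) for t in sentence
            if t.deprel in CORE_DEPRELS and is_plural_or_conjoined(t, sentence)]
```

For “The children smiled”, where children is an nsubj with Number=Plur, the function returns one pair for smiled and children; for “Jo and Bo met” it returns the pair for Jo, whose conj dependent is Bo, and skips Bo itself because conj is not a core argument relation.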
6 Event Structure Induction
Our goal in inducing event structural categories is
to learn representations of those categories on the
basis of annotated UDS graphs, augmented with
the new UDS-E annotations. We aim to learn four
sets of interdependent classifications grounded
in UDS properties: event types, entity types, semantic role types, and event-event relation types.
These classifications are interdependent in that we
assume a generative model that incorporates both
sentence- and document-level structure.5
Document-level UDS Semantics Edges in
UDS1.0 represent only sentence-internal semantic
relations. This constraint implies that annotations
for cross-sentential semantic relations (a significant subset of our event-event annotations)
cannot be represented in the graph structure. To
remedy this, we extend UDS1.0 by adding document edges that connect semantics nodes either
within a sentence or in two distinct sentences,
and we associate our event-event annotations
with their corresponding document edge (see
Figure 1). Because UDS1.0 does not have a
notion of document edge, it does not contain
Vashishtha et al.’s (2019) fine-grained temporal
relation annotations, which are highly relevant to
event-event relations. We additionally add those
attributes to their corresponding document edges.
Generative Model Algorithm 1 gives the generative story for our event structure induction
model. We assume some number of types of events
Tevent, roles Rrole, entities Tent, and relations Rrel.
Figure 4 shows the resulting factor graph for the
semantic graphs shown in Figure 1.
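The sentence-window bookkeeping in Algorithm 1 determines which pairs of eventive nodes receive a relation type. The sketch below mirrors the queues I (recent sentences) and J (nodes of the current sentence) under two assumptions: the input format, a list of (predicate, eventive arguments) tuples per sentence, is hypothetical, and skipping the pairing of a node with itself is our choice rather than something Algorithm 1 states.

```python
from collections import deque


def candidate_relation_pairs(doc, window):
    """Enumerate (<s, v>, <s', v'>) pairs for which a relation type is sampled.

    doc: list of sentences; each sentence is a list of
         (predicate_id, [eventive_argument_ids]) tuples (hypothetical format).
    window: the sentence window W.
    """
    recent = deque()  # queue I: one inner queue per recent sentence
    pairs = []
    for s, sentence in enumerate(doc):
        current = []            # queue J for sentence s
        recent.append(current)  # Enqueue J -> I; J keeps filling afterwards
        if len(recent) > window:
            recent.popleft()    # Dequeue I: drop the oldest sentence
        for v, eventive_args in sentence:
            current.append((s, v))
            current.extend((s, a) for a in eventive_args)
            # flatten(I): every node enqueued so far within the window
            for other in [node for queue in recent for node in queue]:
                if other != (s, v):  # assumption: no self-pairs
                    pairs.append(((s, v), other))
    return pairs


doc = [[("a", [])], [("b", ["c"])], [("d", [])]]
pairs = candidate_relation_pairs(doc, window=2)
```

With W = 2, the predicate in the third sentence is paired with the nodes of the second sentence but never with the first, and the predicate "b" is paired with its own eventive argument "c".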
Annotation Likelihoods The distribution $f^a_p$
on the annotations themselves is implemented as a
mixed model (Gelman and Hill, 2014) dependent
on the property p being annotated, with annotator random intercepts R, where the random intercepts
for annotator a are $\rho_a \sim \mathcal{N}(0, \Sigma_{ann})$ with unknown $\Sigma_{ann}$. When p receives binary annotations,
a simple logistic mixed model is assumed, where
$f^a_p = \mathrm{Bern}(\mathrm{logit}^{-1}(\mu_{i_p} + \rho_{a i_p}))$ and $i_p$ is the index corresponding to property p in the expected
annotation μ. When p receives nominal annotations,
$f^a_p = \mathrm{Cat}(\mathrm{softmax}(\mu_{i_p} + \rho_{a i_p}))$ and $i_p$ is
a set with cardinality of the number of nominal
categories. And when p receives ordinal annotations, we follow White et al. (2020) in using an
ordinal (linked logit) mixed effects model where
$\rho_a$ defines the cutpoints between response values
in the cumulative distribution function for annotator a:

$$P(x_{aip} \le j) = \mathrm{logit}^{-1}(\mu_{i_p} - \rho_{a i_p j})$$
$$f^a_p(x_{aip} = j) = P(x_{aip} \le j) - P(x_{aip} \le j - 1)$$

Initialize queue I;
for sentence s ∈ S do
    Initialize queue J;
    Enqueue J → I;
    if length(I) > W then
        Dequeue I;
    for predicate node v ∈ predicates(s) do
        Sample event type t_sv ∼ Cat(θ^(event));
        for property p ∈ P_event do
            for annotator i ∈ A^(event)_svp do
                Sample x^(event)_svpi ∼ f^i_p(μ^(event)_{t_sv});
        Enqueue ⟨s, v⟩ → J;
        for argument node v′ ∈ arguments(s, v) do
            Sample ent. type t_sv′ ∼ Cat(θ^(ent));
            for property p ∈ P_ent do
                for annotator i ∈ A^(ent)_sv′p do
                    Sample x^(ent)_sv′pi ∼ f^i_p(μ^(ent)_{t_sv′});
            if v′ is eventive then
                Enqueue ⟨s, v′⟩ → J;
            Sample role type r_svv′ ∼ Cat(θ^(role)_{t_sv t_sv′});
            for property p ∈ P_role do
                for annotator i ∈ A^(role)_svv′p do
                    Sample x^(role)_svv′pi ∼ f^i_p(μ^(role)_{r_svv′});
        for index pair ⟨s′, v′⟩ ∈ flatten(I) do
            Sample rel. type q ∼ Cat(θ^(rel)_{t_sv t_s′v′});
            for property p ∈ P_rel do
                for annotator i ∈ A^(rel)_svs′v′p do
                    Sample x^(rel)_svs′v′pi ∼ f^i_p(μ^(rel)_q);
Algorithm 1: Generative story of event structure induction model for a single document with
sentence window W.
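The three likelihood families described above (binary, nominal, and ordinal) can be written out for a single annotator a and property p. This is a minimal sketch with made-up parameter values, not the authors' implementation; the ordinal case follows the sign convention in the ordinal equation above, which means the annotator's cutpoints must be nonincreasing in j for the cumulative probabilities to be nondecreasing.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def binary_lik(x, mu, rho):
    """Bern(x; logit^-1(mu + rho)) for x in {0, 1}; rho is the annotator intercept."""
    p = inv_logit(mu + rho)
    return p if x == 1 else 1.0 - p


def nominal_lik(x, mu, rho):
    """Cat(x; softmax(mu + rho)); mu and rho hold one entry per category."""
    return softmax([m + r for m, r in zip(mu, rho)])[x]


def ordinal_lik(j, mu, rho):
    """f(x = j) = P(x <= j) - P(x <= j - 1), with P(x <= k) = logit^-1(mu - rho[k]).

    rho holds the annotator's K - 1 cutpoints for a K-point scale. Under this
    sign convention rho must be nonincreasing in k so that P(x <= k) is
    nondecreasing in k.
    """
    K = len(rho) + 1

    def cdf(k):
        if k < 0:
            return 0.0
        if k >= K - 1:
            return 1.0
        return inv_logit(mu - rho[k])

    return cdf(j) - cdf(j - 1)


# Made-up parameters for one annotator and one property.
p_binary = binary_lik(1, mu=0.0, rho=0.0)  # 0.5
p_nominal = [nominal_lik(x, [0.2, -0.1, 0.4], [0.0, 0.3, -0.2]) for x in range(3)]
p_ordinal = [ordinal_lik(j, mu=0.3, rho=[1.5, 0.0, -1.5]) for j in range(4)]
```

Because the ordinal probabilities are differences of consecutive cumulative values, they telescope and sum to one over the scale.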
Conditional Properties For both our dataset
and UDS-Protoroles, certain annotations are conditioned on others, owing to the fact that whether
some questions are asked at all depends upon
annotator responses to previous ones. Following
White et al. (2017), we model the likelihoods
for these properties using hurdle models (Agresti,
2014): For a given property, a Bernoulli distribution determines whether the property applies; if
it does, the property value is determined using a
second distribution of the appropriate type.
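A hurdle likelihood for one conditional annotation can be sketched as follows, for instance for properties that only apply when an event has natural parts (the N/A cells in Figure 5). The second-stage distribution here is a hypothetical Bernoulli with fixed probability, chosen only so that the numbers are easy to check.

```python
import math
from typing import Callable


def hurdle_log_lik(applies: bool, value, p_applies: float,
                   value_log_lik: Callable[[object], float]) -> float:
    """Log-likelihood of one conditional annotation under a hurdle model.

    A Bernoulli(p_applies) decides whether the property applies at all; only if
    it does is the recorded value scored by the second, property-specific
    distribution (binary, nominal, or ordinal).
    """
    if not applies:
        return math.log1p(-p_applies)  # log(1 - p_applies)
    return math.log(p_applies) + value_log_lik(value)


def binary_value(v):
    # Hypothetical second-stage distribution: Bernoulli with P(v = 1) = 0.6.
    return math.log(0.6 if v == 1 else 0.4)


outcomes = [
    hurdle_log_lik(False, None, 0.8, binary_value),  # does not apply: 0.2
    hurdle_log_lik(True, 1, 0.8, binary_value),      # applies, value 1: 0.8 * 0.6
    hurdle_log_lik(True, 0, 0.8, binary_value),      # applies, value 0: 0.8 * 0.4
]
total_prob = sum(math.exp(ll) for ll in outcomes)    # 0.2 + 0.48 + 0.32 = 1.0
```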
5See Ferraro and Van Durme, 2016 for a related model
that uses FrameNet’s ontology, rather than inducing its own.
Temporal Relations Temporal relation annotations from UDS-Time consist of 4-tuples
(←e1, →e1, ←e2, →e2) of real values on the unit interval, representing the start- and endpoints of two
event-referring predicates or arguments, e1 and
e2. Each tuple is normalized such that the earlier of (←e1, ←e2) is always locked to the left end
of the scale (0) and the later of (→e1, →e2) to the
right end (1). The likelihood for these annotations
must consider the different possible orderings of
the two events. To do so, we first determine
whether ←e1 is locked, ←e2 is, or both are, according
to $\mathrm{Cat}(\mathrm{softmax}(\mu_{lock\leftarrow} + \rho_{a i_{lock\leftarrow}}))$. We do likewise for →e1 and →e2, using a separate distribution
$\mathrm{Cat}(\mathrm{softmax}(\mu_{lock\rightarrow} + \rho_{a i_{lock\rightarrow}}))$. Finally, if the
start point from one event and the endpoint from
the other are free (i.e., not locked), we determine
their relative ordering using a third distribution
$\mathrm{Cat}(\mathrm{softmax}(\mu_{lock\leftrightarrow} + \rho_{a i_{lock\leftrightarrow}}))$.

Figure 4: The factor graph for the pair of sentences shown in Figure 1 based on the generative story given in
Algorithm 1. Each node or edge annotated in the semantics graphs becomes a variable node in the factor graph,
as indicated by the dotted lines. Only factors for the prior distributions over types are shown; the annotation
likelihood factors associated with each variable node are omitted for space.
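The lock-based decomposition of a UDS-Time 4-tuple described in the Temporal Relations paragraph can be sketched as a deterministic mapping from a tuple to the outcomes of the three categorical distributions. This is an illustration under assumptions: the labels `"e1"`, `"e2"`, `"both"` and the three-way ordering labels for the free points are hypothetical names, and the text does not specify the ordering categories.

```python
TOL = 1e-9


def normalize(tup):
    """Rescale (start1, end1, start2, end2) so the earlier start is 0 and the later end is 1.

    Assumes the later end is strictly after the earlier start.
    """
    s1, e1, s2, e2 = tup
    lo, hi = min(s1, s2), max(e1, e2)
    return tuple((x - lo) / (hi - lo) for x in tup)


def which_locked(x1, x2, target):
    """'e1', 'e2', or 'both', depending on which point sits at `target`."""
    hit1, hit2 = abs(x1 - target) < TOL, abs(x2 - target) < TOL
    if hit1 and hit2:
        return "both"
    return "e1" if hit1 else "e2"


def decompose(tup):
    """Map a 4-tuple to (start lock, end lock, free-point ordering or None)."""
    s1, e1, s2, e2 = normalize(tup)
    start_lock = which_locked(s1, s2, 0.0)
    end_lock = which_locked(e1, e2, 1.0)
    if start_lock == "e1" and end_lock == "e2":
        free_start, free_end = s2, e1   # e2's start vs. e1's end
    elif start_lock == "e2" and end_lock == "e1":
        free_start, free_end = s1, e2   # e1's start vs. e2's end
    else:
        return start_lock, end_lock, None  # no free start/end pair across events
    # Assumed three-way ordering of the free start point relative to the free endpoint.
    if abs(free_start - free_end) < TOL:
        order = "equal"
    elif free_start < free_end:
        order = "start_before_end"
    else:
        order = "start_after_end"
    return start_lock, end_lock, order
```

For example, `decompose((2, 4, 6, 10))` normalizes to (0, 0.25, 0.5, 1) and yields `("e1", "e2", "start_after_end")`, meaning e1 ends before e2 starts, whereas a containment tuple such as `(0, 10, 2, 5)` locks both ends to e1 and needs no third distribution.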
Implementation We fit our model to the training data using expectation-maximization. We use
loopy belief propagation to obtain the posteriors over event, entity, role, and relation types in
the expectation step and the Adam optimizer to
estimate the parameters of the distributions associated with each type in the maximization step.6
As a stopping criterion, we compute the evidence
that the model assigns to the development data,
stopping when this quantity begins to decrease.

To make use of the (ridit-scored) confidence
response $c_{aip} \in (0, 1)$ associated with each annotation $x_{aip}$, we weight the log-likelihood of $x_{aip}$
by $c_{aip}$ when computing the evidence of the annotations. This weighting encourages the model to
explain annotations that an annotator was highly
confident in, penalizing the model less if it assigns
low likelihood to a low-confidence annotation.

6The variable and factor nodes for the relation types can
introduce cycles into the factor graph for a document, which
is what necessitates the loopy variant of belief propagation.
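Ridit scoring and confidence weighting can be sketched as follows. The ridit of a response c is the proportion of responses below c plus half the proportion equal to c, so every score falls strictly inside (0, 1). We assume, as the text does not say, that ridits are computed separately over each annotator's own confidence responses; the likelihood values in the example are made up.

```python
import math
from collections import Counter


def ridit_scores(responses):
    """Ridit-score one annotator's ordinal confidence responses.

    ridit(c) = P(C < c) + 0.5 * P(C = c), using this annotator's empirical
    distribution (assumption: per-annotator scoring).
    """
    counts = Counter(responses)
    n = len(responses)
    table, below = {}, 0
    for level in sorted(counts):
        table[level] = (below + 0.5 * counts[level]) / n
        below += counts[level]
    return [table[r] for r in responses]


def weighted_log_evidence(likelihoods, confidences):
    """Sum of log-likelihoods, each weighted by its confidence c in (0, 1)."""
    return sum(c * math.log(lik) for lik, c in zip(likelihoods, confidences))


conf = ridit_scores([1, 2, 2, 4])  # [0.125, 0.5, 0.5, 0.875]
ev = weighted_log_evidence([0.9, 0.2, 0.5, 0.5], conf)
```

The second annotation has low likelihood (0.2) but only a middling confidence weight, so it costs the model less than it would under unweighted evidence.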
To select |Tevent|, |Tent|, |Rrole|, and |Rrel| for
Algorithm 1, we fit separate mixture models
for each classification (i.e., removing all factor
nodes) using the same likelihood functions $f^i_p$ as
in Algorithm 1. We then compute the evidence
that the simplified model assigns to the development data given some number of types, choosing
the smallest number such that there is no reliable
increase in the evidence for any larger number.
To determine reliability, we compute 95% confidence intervals using nonparametric bootstraps.
Importantly, this simplified model is only used
to select |Tevent|, |Tent|, |Rrole|, and |Rrel|: all
analyses below are conducted on full model fits.
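The selection rule above (the smallest number of types such that no larger number gives a reliable increase in development evidence) can be sketched with a percentile bootstrap. The data layout is an assumption: a dict from each candidate number of types to per-item development log-evidence values in a shared item order, with bootstrap resampling over items. The evidence values below are synthetic.

```python
import random


def bootstrap_ci(diffs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-item evidence differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lower = means[int(alpha / 2 * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def select_num_types(evidence, **bootstrap_kwargs):
    """Smallest k such that no larger k gives a reliable evidence increase.

    evidence: dict mapping a candidate number of types k to a list of
              per-item development log-evidence values (same item order).
    """
    ks = sorted(evidence)
    for k in ks:
        reliable_gain = False
        for larger in (k2 for k2 in ks if k2 > k):
            diffs = [b - a for a, b in zip(evidence[k], evidence[larger])]
            lower, _ = bootstrap_ci(diffs, **bootstrap_kwargs)
            if lower > 0:  # the whole 95% interval lies above zero
                reliable_gain = True
                break
        if not reliable_gain:
            return k
    return ks[-1]


# Synthetic example: 2 types help on every item; 3 types only add noise.
base = [-2.0 - (i % 7) for i in range(50)]
evidence = {
    1: base,
    2: [x + 1.0 for x in base],
    3: [x + 1.0 + (0.01 if i % 2 == 0 else -0.01) for i, x in enumerate(base)],
}
chosen = select_num_types(evidence)  # 2
```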
Types The selection procedure described above
yields |Tevent| = 4, |Tent| = 8, |Rrole| = 2, and
|Rrel| = 5. To interpret these classes, we inspect
the property means μt associated with each type t
and give examples from UD-EWT for which the
posterior probability of that type is high.
Event Types While our goal was not neces-
sarily to reconstruct any particular classification
from the theoretical literature, the four event types
align fairly well with those proposed by Vendler
(1957): statives (16), activities (17), achievements
(18), and accomplishments (19). We label our
clusters based on these interpretations (Figure 5).
(16) I have finally found a mechanic I trust!!
(17) his agency is still reviewing the decision.
(18) A suit against [. . . ] Kristof was dismissed.
(19) a consortium [. . . ] established in 1997
Figure 5: Probability of binary properties from the
event-subevent protocol by event type. Cells marked
with ‘‘N/A’’ indicate that the property generally does
not apply for the corresponding type because of the
conditional dependence on natural parts.
One difference between Vendler’s classes and our
own is that our ‘‘activities’’ correspond primarily
to those without dynamic subevents, while our
‘‘accomplishments’’ encompass both his accom-
plishments and activities with dynamic subevents
(see discussion of Taylor, 1977 in §2).
Even if approximate, this alignment is surprising given that Vendler’s classification was not
developed with actual language use in mind and
thus abstracts away from complexities that arise
when dealing with, for example, non-factual or
generic events. Nonetheless, there do arise cases
where a particular predicate has a wider distribution across types than we might expect based
on prior work. For example, know is prototypically stative; and while it does get classed that
way by our model, it also gets classed as an accomplishment or achievement (though rarely an
activity), for example, when it is used to talk
about coming to know something, as in (20).
(20) Please let me know how[. . . ]to proceed.
Entity Types Our entity types are: person/group (21), concrete artifact (22), contentful
artifact (23), particular state/event (24), generic
state/event (25), time (26), kind of concrete objects
(27), and particular concrete objects (28).
(21) Have a real mechanic check[…]
(22) I have a [. . . ] cockatiel, and there are 2 eggs
in the bottom of the cage[. . . ]
(23) Please find attached a credit worksheet[. . . ]
Figure 6: Probability of role type properties. These
include existing UDS protoroles properties, along with
the distributive property from the event-entity subprotocol. We have labeled the role types with our
proto-agent/proto-patient interpretation given below.
(24) He didn’t take a dislike to the kids[…]
(25) They require a lot of attention [. . . ]
(26) Every move Google makes brings this
particular future closer.
(27) And what is their big / main meal of the day.
(28) Find him before he finds the dog food.
Role Types The optimality of two role
types is consistent with Dowty’s (1991) proposal that there are only two abstract
role prototypes (proto-agent and proto-patient) into
which individual thematic roles (i.e., those specific to particular predicates) cluster. Further, the
means for the two role types we find very closely
track those predicted by Dowty (see Figure 6),
with clear proto-agents (29) and proto-patients
(30) (see also White et al., 2017).
(29) they don’t press their sandwiches.
(30) you don’t ever feel like you ate too much.
Relation Types The relation types we obtain
closely track approaches that use sets of
underspecified temporal relations (Cassidy et al.,
2014; O’Gorman et al., 2016; Zhou et al., 2019,
2020; Wang et al., 2020): e1 starts before e2 (31),
e2 starts before e1 (32), e2 ends after e1 (33), e1
contains e2 (34), and e1 = e2 (35).
(31) [. . . ]the Spanish, Thai and other contingents
are already committed to leaving [. . . ]
(32) And I have to wonder: Did he forget that he
already has a memoir[. . . ]
(33) No, i am not kidding and no i don’t want it
b/c of the taco bell dog. i want it b/c it is
really small and cute.
(34) they offer cheap air tickets to their country
[. . . ] you may get excellent discount airfare,
which may even surprise you.
(35) the food is good, however the tables are so
close together that it feels very cramped.
Type Consistency To assess the impacts of
sentence/document-level structure (Algorithm 1)
and confidence weighting on the types we induce,
we investigate how the distributions over role and
event types shift when comparing models fit with
and without structure and confidence weighting.
Figure 7 visualizes these shifts as row-normalized
confusion matrices computed from the posteriors
across items derived from each model. It compares
(top) the simplified model used for model selection
(rows) against the full model without confidence
weighting (columns), and (bottom) the full model
without confidence weighting (rows) against the
one with (columns).7
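One way to build the row-normalized confusion matrices described above from two models' posteriors is to accumulate, for each item, the outer product of the two posterior vectors as soft counts and then normalize each row. This soft-count construction is an assumption; the text does not say whether posteriors are combined this way or first reduced to argmax assignments.

```python
def soft_confusion(post_rows, post_cols):
    """Row-normalized soft confusion matrix between two models' posteriors.

    post_rows[n][i]: posterior of type i for item n under the row model.
    post_cols[n][j]: posterior of type j for item n under the column model.
    Assumption: soft counts are the per-item products of the two posteriors.
    """
    k_rows, k_cols = len(post_rows[0]), len(post_cols[0])
    counts = [[0.0] * k_cols for _ in range(k_rows)]
    for pr, pc in zip(post_rows, post_cols):
        for i in range(k_rows):
            for j in range(k_cols):
                counts[i][j] += pr[i] * pc[j]
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total > 0 else 0.0 for c in row])
    return matrix


# Three items, two types: the models disagree only on the third item.
before = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
after = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
shift = soft_confusion(before, after)  # [[0.5, 0.5], [0.0, 1.0]]
```

A model pair with no shift would yield an identity matrix (up to a permutation of type labels); off-diagonal mass in a row shows where items of that type move.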
First, we find that the interaction between types
afforded by incorporating the full graph structure (top plot) produces small shifts in both the
event and role type distributions, suggesting that
the added structure may help chiefly in resolving boundary cases, which is exactly what we
might hope additional model structure would do.

Second, weighting likelihoods by annotator confidence (bottom) yields somewhat larger shifts
as well as more entropic posteriors (0.22 average
normalized entropy for events; 0.30 for roles) than
without weighting (0.02 for events; 0.22 for roles).
Higher entropy is expected (and to some extent,
desirable) here: Introducing a notion of confidence
should make the model less confident about items
that annotators were less confident about. Further,
among event types, the distribution of posterior
entropy across items is driven by a minority of
high-uncertainty items, as evidenced by a very low
median normalized entropy for event types (0.02).
The opposite appears to be true among the role
types, for which the median is high (0.60). This
latter pattern is perhaps not surprising in light
of theoretical accounts of semantic roles, such
as Dowty’s: The entire point of such accounts
is that it is very difficult to determine sharp role
categories, suggesting the need for a more continuous notion.

7The distributional shifts for entity and relation types were
extremely small, and so we do not discuss them here.

Figure 7: Confusion matrices for event and role types.
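The normalized entropy figures above divide the Shannon entropy of each item's type posterior by log K, so values range from 0 (all mass on one type) to 1 (uniform). The sketch below uses hypothetical posteriors to show how a minority of high-uncertainty items can raise the mean while leaving the median near zero, the pattern reported for event types.

```python
import math
from statistics import mean, median


def normalized_entropy(p):
    """Shannon entropy of a posterior over K types divided by log K (so in [0, 1])."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))


# Hypothetical posteriors: nine confident items and one maximally uncertain item.
posteriors = [[1.0, 0.0, 0.0, 0.0]] * 9 + [[0.25, 0.25, 0.25, 0.25]]
scores = [normalized_entropy(p) for p in posteriors]
summary = (mean(scores), median(scores))  # mean is about 0.1, median is 0.0
```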
7 Comparison to Existing Ontologies
To explore the relationship between our induced
classification and existing event and role ontologies, we ask how well our event, role, and entity
types map onto those found in PropBank and
VerbNet. Importantly, the goal here is not perfect
alignment between our types and PropBank and
VerbNet types, but rather to compare other classifications that reflect top–down assumptions to the
one we derive bottom–up.

Implementation To carry out these comparisons, we use the parameters of the posterior
distributions over event types $\theta^{(ev)}_p$ for each predicate p, over role types $\theta^{(role)}_{pa}$ for each argument a
of each predicate p, and over entity types $\theta^{(ent)}_{pa}$ for
each argument a of each predicate p as features in
an SVM with RBF kernel predicting the event and
role types found in PropBank and VerbNet. We
take this route, over direct comparison of types,
to account for the possibility that information encoded in role or event types within VerbNet or
PropBank is distributed differently in our more
abstract classification. We tune L2 regularization
(λ ∈ {1, 0.5, 0.2, 0.1, 0.01, 0.001}) and bandwidth
(γ ∈ {1e-2, 1e-3, 1e-4, 1e-5}) using grid search,
selecting the best model based on performance
on the standard UD-EWT development set. All
metrics reflect UD-EWT test set performance.
             Role      P     R     F     Micro F
argnum       A0        0.58  0.63  0.60  0.67
             A1        0.72  0.78  0.75
functag      pag       0.57  0.59  0.58  0.62
             ppt       0.65  0.77  0.71
verbnet      agent     0.64  0.54  0.59  NA
             patient   0.20  0.14  0.16
             theme     0.55  0.58  0.57
Table 2: Test set results for all role types that are
labeled on at least 5% of the development data.
Role Type Comparison We first obtain a mapping from UDS predicates and arguments to the
PropBank predicates and arguments annotated in
EWT. Each such argument in PropBank is annotated with an argument number (A0-A4) as well
as a function tag (PAG = agent, PPT = patient,
etc.). We then compose this mapping with the
mapping given in the PropBank frame files from
PropBank rolesets to sets of VerbNet classes and
from PropBank roles to sets of VerbNet roles
(AGENT, PATIENT, THEME, etc.) to obtain a mapping
from UDS arguments to sets of VerbNet roles.
Because a particular argument maps to a set of
VerbNet roles, we treat predicting VerbNet roles
as a multi-label problem, fitting one SVM per
role. For each argument a of predicate p, we use
as predictors $[\theta^{(ev)}_p; \theta^{(role)}_{pa}; \theta^{(ent)}_{pa}; \theta^{(role)}_{p\neg a}; \theta^{(ent)}_{p\neg a}]$, where
$\theta^{(role/ent)}_{p\neg a j} = [\max_{a' \neq a} \theta^{(role/ent)}_{pa'j}, \mathrm{mean}_{a' \neq a} \theta^{(role/ent)}_{pa'j}]$.

Table 2 gives the test set results for all role types
labeled on at least 5% of the development data. For
comparison, a majority guessing baseline obtains
micro F1s of 0.58 (argnum) and 0.53 (functag).8
Our roles tend to align well with agentive roles
(PAG, AGENT, and A0) and some non-agentive roles
(PPT, THEME, and A1), but they align less well with
other non-agentive roles (PATIENT). This result suggests that our two-role classification aligns fairly
closely with the agentivity distinctions in PropBank and VerbNet, as we would expect if our
roles in fact captured something like Dowty’s
coarse distinction among prototypical agents
and patients.
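The predictor vector for an argument a of predicate p can be sketched as the concatenation of the event posterior for p, the role and entity posteriors for a, and max- and mean-pooled role and entity posteriors over the other arguments of p. Three choices here are assumptions: posteriors are plain lists keyed by hypothetical argument ids, the pooled block lists all maxima before all means, and a predicate with a single argument gets zero-filled pooled features. The example uses two types per classification for brevity, not the induced 4 event, 2 role, and 8 entity types.

```python
def role_predictors(theta_ev, theta_role, theta_ent, a):
    """Predictor vector for argument `a` of one predicate p.

    theta_ev:   event-type posterior for p.
    theta_role: dict argument id -> role-type posterior.
    theta_ent:  dict argument id -> entity-type posterior.
    """
    others = [b for b in theta_role if b != a]

    def pool(theta):
        # Max and mean over the other arguments, per type j (all maxes, then all means).
        dims = range(len(theta[a]))
        if not others:
            # Assumption: zero-fill when p has no other arguments.
            return [0.0] * (2 * len(theta[a]))
        maxes = [max(theta[b][j] for b in others) for j in dims]
        means = [sum(theta[b][j] for b in others) / len(others) for j in dims]
        return maxes + means

    return (list(theta_ev) + list(theta_role[a]) + list(theta_ent[a])
            + pool(theta_role) + pool(theta_ent))


theta_ev = [0.75, 0.25]
theta_role = {"a1": [0.75, 0.25], "a2": [0.5, 0.5], "a3": [0.25, 0.75]}
theta_ent = {"a1": [1.0, 0.0], "a2": [0.5, 0.5], "a3": [0.0, 1.0]}
features = role_predictors(theta_ev, theta_role, theta_ent, "a1")
```

Pooling keeps the feature length fixed regardless of how many arguments a predicate has, which a kernel SVM requires.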
Event Type Comparison The PropBank roleset and VerbNet class ontologies are extremely
fine-grained, with PropBank capturing specific
predicate senses and VerbNet capturing very
fine-grained syntactic behavior of a generally
small set of predicates. Since our event types
are intended to be more general than either, we
do not compare them directly to PropBank rolesets or
VerbNet classes. Instead, we compare to the generative
lexicon-inspired variant of VerbNet’s semantics
layer (Brown et al., 2018). An example of this layer for
the predicate give-13.1 is
has_possession(e1, Ag, Th) & transfer(e2, Ag,
Th, Rec) & cause(e2, e3) & has_possession(e3,
Rec, Th). We predict only the abstract predicates
in this decomposition (e.g., transfer or cause),
treating the problem as multi-label and fitting
one SVM per predicate. For each predicate p,
we use as predictors $[\theta^{(ev)}_p; \theta^{(role)}_{p\cdot}; \theta^{(ent)}_{p\cdot}]$, where
$\theta^{(role/ent)}_{p\cdot j} = [\max_a \theta^{(role/ent)}_{paj}, \mathrm{mean}_a \theta^{(role/ent)}_{paj}]$.

8A majority baseline for VerbNet roles always yields an
F1 of 0 in our multi-label setup, since no role is assigned to
more than half of arguments.

Predicate         P     R     F
cause             0.51  0.95  0.66
do                0.30  0.25  0.27
has_possession    0.23  0.18  0.20
has_location      0.11  0.14  0.12
motion            0.09  0.10  0.09
Table 3: Test set results for the five most frequent
VerbNet predicates.

Table 3 gives the test set results for the five
most frequent predicates in the corpus. For comparison, a majority guessing baseline would yield
the same F (0.66) as our model for CAUSE, but
since none of the other classes are assigned to
more than half of events, majority guessing for
those would yield an F of 0. This result suggests
that, while there may be some agreement between
our classification and VerbNet’s semantics layer,
the two representations are relatively distinct.
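The majority-guessing baselines in footnote 8 and in the Table 3 discussion follow from a simple calculation, sketched below per label. If a label applies to a fraction p > 0.5 of items, predicting it everywhere gives precision p and recall 1, so F1 = 2p / (1 + p); otherwise the majority guess is "absent" everywhere, which produces no true positives and an F1 of 0. Treating an exact 50/50 split as a negative majority is our assumption.

```python
def majority_baseline_f1(labels):
    """F1 on the positive class when every item gets the majority label.

    labels: 0/1 indicators of whether one label (e.g., a VerbNet role) applies.
    Ties are treated as a negative majority (an assumption).
    """
    n, pos = len(labels), sum(labels)
    if 2 * pos <= n:
        return 0.0  # predicting "absent" everywhere yields no true positives
    precision, recall = pos / n, 1.0  # predicting "present" everywhere
    return 2 * precision * recall / (precision + recall)


f_common = majority_baseline_f1([1, 1, 1, 0])  # 2 * 0.75 / 1.75, about 0.857
f_rare = majority_baseline_f1([1, 0, 0, 0])    # 0.0
```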
8 Conclusion
We have presented an event structure classification derived from inferential properties annotated
on sentence- and document-level semantic graphs.
We induced this classification jointly with semantic role, entity, and event-event relation types
using a document-level generative model. Our
model identifies types that approximate theoretical predictions, notably four event types like
Vendler’s, as well as proto-agent and proto-patient
role types like Dowty’s. We hope this work
encourages greater interest in computational approaches to event structural understanding while
also supporting work on adjacent problems in
NLP, such as temporal information extraction and
(partial) event coreference, for which we provide
the largest publicly available dataset to date.
Acknowledgments
We would like to thank Emily Bender, Dan Gildea,
and three anonymous reviewers for detailed comments on this paper. We would also like to thank
members of the Formal and Computational Semantics lab at the University of Rochester for
feedback on the annotation protocols. This work
was supported in part by the National Science
Foundation (BCS-2040820/2040831, Collaborative Research: Computational Modeling of the
Internal Structure of Events) as well as by DARPA
AIDA and DARPA KAIROS. The views and conclusions contained in this work are those of the
authors and should not be interpreted as necessarily representing the official policies, either
expressed or implied, or endorsements of DARPA
or the U.S. Government. The U.S. Government
is authorized to reproduce and distribute reprints
for governmental purposes notwithstanding any
copyright annotation therein.
References
Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 228–238, Sofia, Bulgaria. Association for Computational Linguistics.
Alan Agresti. 2014. Categorical Data Analysis, John Wiley & Sons.
James Allen, Hannah An, Ritwik Bose, Will de Beaumont, and Choh Man Teng. 2020. A broad-coverage deep semantic lexicon for verbs. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 3243–3251, Marseille, France. European Language Resources Association.
Emmon Bach. 1986. The algebra of events. Linguistics and Philosophy, 9(1):5–16.
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, pages 86–90. Association for Computational Linguistics.
Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria. Association for Computational Linguistics.
Michael Ruisdael Bennett and Barbara Hall Partee. 1978. Towards the Logic of Tense and Aspect in English. Indiana University Linguistics Club, Bloomington, IN.
Ann Bies, Justin Mott, Colin Warner, and Seth Kulick. 2012. English web treebank. Linguistic Data Consortium, Philadelphia, PA.
Susan Windisch Brown, James Pustejovsky, Annie Zaenen, and Martha Palmer. 2018. Integrating Generative Lexicon event structures into VerbNet. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard. 2014. An annotation framework for dense event ordering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 501–506, Baltimore, Maryland. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-2082
Lucas Champollion. 2010. Parts of a Whole: Distributivity as a Bridge between Aspect and Measurement. Ph.D. thesis, University of Pennsylvania, Philadelphia.
Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 33–40.
William Croft. 2012. Verbs: Aspect and Causal Structure. Oxford University Press.
Agata Cybulska and Piek Vossen. 2014. Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 4545–4552, Reykjavik, Iceland. European Language Resources Association (ELRA).
Lucia Donatelli, Michael Regan, William Croft, and Nathan Schneider. 2018. Annotation of tense and aspect semantics for sentential AMR. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 96–108.
Bonnie Jean Dorr. 1993. Machine Translation: A View from the Lexicon. MIT Press.
David Dowty. 1979. Word Meaning and Montague Grammar: The Semantics of Verbs and Times in Generative Semantics and in Montague’s PTQ, volume 7, Springer Science & Business Media.
David Dowty. 1991. Thematic proto-roles and argument selection. Language, 67(3):547–619. https://doi.org/10.2307/415037, https://doi.org/10.1353/lan.1991.0021
George Ferguson and James F. Allen. 1998. TRIPS: An integrated intelligent problem-solving assistant. In Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, AAAI ’98/IAAI ’98, pages 567–572, USA. American Association for Artificial Intelligence.
Francis Ferraro and Benjamin Van Durme. 2016. A unified Bayesian model of scripts, frames and language. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30.
Annemarie Friedrich and Damyana Gateva. 2017. Classification of telicity using cross-linguistic annotation projection. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2559–2565, Copenhagen, Denmark. Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1271
Annemarie Friedrich and Alexis Palmer. 2014a. Automatic prediction of aspectual class of verbs in context. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 517–523, Baltimore, Maryland. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-2085
Annemarie Friedrich and Alexis Palmer. 2014b. Situation entity annotation. In Proceedings of LAW VIII – The 8th Linguistic Annotation Workshop, pages 149–158, Dublin, Ireland. Association for Computational Linguistics and Dublin City University. https://doi.org/10.3115/v1/W14-4921
Annemarie Friedrich, Alexis Palmer, and Manfred Pinkal. 2016. Situation entity types: Automatic classification of clause-level aspect. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1757–1768, Berlin, Germany. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1166
William Gantt, Benjamin Kane, and Aaron Steven White. 2020. Natural language inference with mixed effects. In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, pages 81–87, Barcelona, Spain (Online). Association for Computational Linguistics.
Andrew Gelman and Jennifer Hill. 2014. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York City.
Venkata Govindarajan, Benjamin Van Durme, and Aaron Steven White. 2019. Decomposing generalization: Models of generic, habitual, and episodic statements. Transactions of the Association for Computational Linguistics, 7:501–517. https://doi.org/10.1162/tacl_a_00285
Eduard Hovy, Teruko Mitamura, Felisa Verdejo, Jun Araki, and Andrew Philpot. 2013. Events are not simple: Identity, non-identity, and quasi-identity. In Workshop on Events: Definition, Detection, Coreference, and Representation, pages 21–28.
Nancy Ide, Collin Baker, Christiane Fellbaum, Charles Fillmore, and Rebecca Jane Passonneau. 2008. MASC: The manually annotated sub-corpus
of American English. In 6th International Conference on Language Resources and Evaluation, LREC 2008, pages 2455–2460. European Language Resources Association (ELRA).
Ray Jackendoff. 1990. Semantic Structures, volume 18. MIT Press.
Anthony Kenny. 1963. Action, Emotion and Will, Humanities Press, London.
Karin Kipper Schuler. 2005. VerbNet: A Broad-coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania.
Manfred Krifka. 1989. Nominal reference, temporal constitution and quantification in event semantics. In Renate Bartsch, Johan van Benthem, and Peter van Emde Boas, editors, Semantics and Contextual Expressions, pages 75–115. Foris, Dordrecht. https://doi.org/10.1515/9783110877335-005
Manfred Krifka. 1992. Thematic relations as links between nominal reference and temporal constitution. Lexical Matters, 2953.
Manfred Krifka. 1998. The origins of telicity. In Susan Rothstein, editor, Events and Grammar, Studies in Linguistics and Philosophy, pages 197–235. Springer Netherlands, Dordrecht. https://doi.org/10.1007/978-94-011-3969-4_9
K. Krippendorff. 2004. Content Analysis: An Introduction to Its Methodology. Sage.
George Lakoff. 1965. On the Nature of Syntactic Irregularity. Ph.D. thesis, Massachusetts Institute of Technology.
Beth Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago.
Beth Levin and Malka Rappaport Hovav. 1991. Wiping the slate clean: A lexical semantic exploration. Cognition, 41(1-3):123–151. https://doi.org/10.1016/0010-0277(91)90034-2
Marc Moens and Mark Steedman. 1988. Temporal ontology and temporal reference. Computational Linguistics, 14(2):15–28.
Alexander P. D. Mourelatos. 1978. Events, processes, and states. Linguistics and Philosophy, 2(3):415–434. https://doi.org/10.1007/BF00149015
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 1659–1666, Portorož, Slovenia. European Language Resources Association (ELRA).
Tim O’Gorman, Kristin Wright-Bettner, and Martha Palmer. 2016. Richer event description: Integrating event coreference with temporal, causal and bridging annotation. In Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016), pages 47–56, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-5706
Mari Broman Olsen. 1997. A Semantic and Pragmatic Model of Lexical and Grammatical Aspect. Outstanding Dissertations in Linguistics. Garland. https://doi.org/10.1162/0891201053630264
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.
Christopher Piñón. 1995. An Ontology for Event Semantics. Ph.D. thesis, Stanford University, Palo Alto.
James Pustejovsky. 1995. The Generative Lexicon. MIT Press, Cambridge, MA.
James Pustejovsky. 2013. Dynamic event structure and habitat theory. In Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013), pages 1–10, Pisa, Italy. Association for Computational Linguistics.
James Pustejovsky, Marc Verhagen, Roser Saurí, Jessica Littman, Robert Gaizauskas, Graham Katz, Inderjeet Mani, Robert Knippen, and Andrea Setzer. 2006. TimeBank 1.2. Linguistic Data Consortium, 40.
Malka Rappaport Hovav and Beth Levin. 1998. Building verb meanings. The Projection of Arguments: Lexical and Compositional Factors, pages 97–134.
Malka Rappaport Hovav and Beth Levin. 2001. 一个
event structure account of english resultatives.
语言, 77(4):766–797.
Drew Reisinger, Rachel Rudinger, Francis Ferraro, Craig Harman, Kyle Rawlins, and Benjamin Van Durme. 2015. Semantic proto-roles. Transactions of the Association for Computational Linguistics, 3:475–488. https://doi.org/10.1162/tacl 00152
Rachel Rudinger, Adam Teichert, Ryan Culkin, Sheng Zhang, and Benjamin Van Durme. 2018. Neural-Davidsonian semantic proto-role labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 944–955, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1114
Carlota S. Smith. 2003. Modes of Discourse: The Local Structure of Texts, volume 103. Cambridge University Press. https://doi.org/10.1017/CBO9780511615108
Elias Stengel-Eskin, Kenton Murray, Sheng Zhang, Aaron Steven White, and Benjamin Van Durme. 2021. Joint universal syntactic and semantic parsing. arXiv preprint arXiv:2104.05696.
Elias Stengel-Eskin, Aaron Steven White, Sheng Zhang, and Benjamin Van Durme. 2020. Universal decompositional semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8427–8439, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.746
Barry Taylor. 1977. Tense and continuity. Linguistics and Philosophy, 1(2):199–220.
Adam Teichert, Adam Poliak, Benjamin Van Durme, and Matthew Gormley. 2017. Semantic proto-role labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31.
Carol Lee Tenny. 1987. Grammaticalizing Aspect and Affectedness. Ph.D. thesis, Massachusetts Institute of Technology.
Robert Truswell, editor. 2019. The Oxford Handbook of Event Structure. Oxford University Press, Oxford. https://doi.org/10.1093/oxfordhb/9780199685318.001.0001
Siddharth Vashishtha, Benjamin Van Durme, and Aaron Steven White. 2019. Fine-grained temporal relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2906–2919, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1280
Zeno Vendler. 1957. Verbs and times. Philosophical Review, 66(2):143–160. https://doi.org/10.2307/2182371
Henk J. Verkuyl. 1972. On The Compositional Nature Of The Aspects, volume 15 of Foundations of Language. D. Reidel Publishing Company, Dordrecht. https://doi.org/10.1007/978-94-017-2478-4
Haoyu Wang, Muhao Chen, Hongming Zhang, and Dan Roth. 2020. Joint constrained learning for event-event relation extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 696–706, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.51
Aaron Steven White, Kyle Rawlins, and Benjamin Van Durme. 2017. The semantic proto-role linking model. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 92–98, Valencia, Spain. Association for Computational Linguistics.
Aaron Steven White, Drew Reisinger, Keisuke Sakaguchi, Tim Vieira, Sheng Zhang, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2016. Universal decompositional semantics on Universal Dependencies. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1713–1723, Austin, Texas. Association for Computational Linguistics.
Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Subrahmanyan Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2020. The universal decompositional semantics dataset and decomp toolkit. In
Proceedings of the 12th Language Resources and Evaluation Conference, pages 5698–5707, Marseille, France. European Language Resources Association.
Sheng Zhang, Rachel Rudinger, and Benjamin Van Durme. 2017. An Evaluation of PredPatt and Open IE via Stage 1 Semantic Role Labeling. In IWCS 2017—12th International Conference on Computational Semantics—Short Papers.
Ben Zhou, Daniel Khashabi, Qiang Ning, and Dan Roth. 2019. “Going on a vacation” takes longer than “going for a walk”: A study of temporal commonsense understanding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3363–3369, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1332
Ben Zhou, Qiang Ning, Daniel Khashabi, and Dan Roth. 2020. Temporal common sense acquisition with minimal supervision. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7579–7589, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.678