Frame-Semantic Parsing

Dipanjan Das∗
Google Inc.

Desai Chen∗∗
Massachusetts Institute of Technology

André F. T. Martins†
Priberam Labs
Instituto de Telecomunicações

Nathan Schneider‡
Carnegie Mellon University

Noah A. Smith§
Carnegie Mellon University
Frame semantics is a linguistic theory that has been instantiated for English in the FrameNet
lexicon. We solve the problem of frame-semantic parsing using a two-stage statistical model
that takes lexical targets (i.e., content words and phrases) in their sentential contexts and
predicts frame-semantic structures. Given a target in context, the first stage disambiguates it to a
semantic frame. This model uses latent variables and semi-supervised learning to improve frame
disambiguation for targets unseen at training time. The second stage finds the target’s locally
expressed semantic arguments. At inference time, a fast exact dual decomposition algorithm
collectively predicts all the arguments of a frame at once in order to respect declaratively stated
linguistic constraints, resulting in qualitatively better structures than naïve local predictors.
Both components are feature-based and discriminatively trained on a small set of annotated
frame-semantic parses. On the SemEval 2007 benchmark data set, the approach, along with a
heuristic identifier of frame-evoking targets, outperforms the prior state of the art by significant
margins. Additionally, we present experiments on the much larger FrameNet 1.5 data set. We
have released our frame-semantic parser as open-source software.

∗ Google Inc., New York, NY 10011. E-mail: dipanjand@google.com.
∗∗ Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge, MA 02139. E-mail: desaic@csail.mit.edu.
† Alameda D. Afonso Henriques, 41 – 2. Andar, 1000-123, Lisboa, Portugal. E-mail: atm@priberam.pt.
‡ School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
E-mail: nschneid@cs.cmu.edu.
§ School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
E-mail: nasmith@cs.cmu.edu.

Submission received: 4 May 2012; revised submission received: 10 November 2012; accepted for
publication: 22 December 2012.

doi:10.1162/COLI_a_00163

© 2014 Association for Computational Linguistics


1. Introduction

FrameNet (Fillmore, Johnson, and Petruck 2003) is a linguistic resource storing consider-
able information about lexical and predicate-argument semantics in English. Grounded
in the theory of frame semantics (Fillmore 1982), it suggests—but does not formally
define—a semantic representation that blends representations familiar from word-sense
disambiguation (Ide and Véronis 1998) and semantic role labeling (SRL; Gildea and
Jurafsky 2002). Given the limited size of available resources, accurately producing
richly structured frame-semantic structures with high coverage will require data-driven
techniques beyond simple supervised classification, such as latent variable modeling,
semi-supervised learning, and joint inference.

In this article, we present a computational and statistical model for frame-semantic
parsing, the problem of extracting from text semantic predicate-argument structures
such as those shown in Figure 1. We aim to predict a frame-semantic representation
with two statistical models rather than a collection of local classifiers, unlike earlier ap-
proaches (Baker, Ellsworth, and Erk 2007). We use a probabilistic framework that cleanly
integrates the FrameNet lexicon and limited available training data. The probabilistic
framework we adopt is highly amenable to future extension through new features, more
relaxed independence assumptions, and additional semi-supervised models.

Carefully constructed lexical resources and annotated data sets from FrameNet,
detailed in Section 3, form the basis of the frame structure prediction task. We de-
compose this task into three subproblems: target identification (Section 4), in which
frame-evoking predicates are marked in the sentence; frame identification (Section 5),
in which the evoked frame is selected for each predicate; and argument identification
(Section 6), in which arguments to each frame are identified and labeled with a role from
that frame. Experiments demonstrating favorable performance relative to the previous
state of the art on SemEval 2007 and FrameNet data sets are described in each section.
Some novel aspects of our approach include a latent-variable model (Section 5.2) and a
semi-supervised extension of the predicate lexicon (Section 5.5) to facilitate disambiguation
of words not in the FrameNet lexicon; a unified model for finding and labeling arguments

Figure 1
An example sentence from the annotations released as part of FrameNet 1.5, with three targets
marked in bold. Note that this annotation is partial, because not all potential targets have been
annotated with predicate-argument structures. Each target has its evoked semantic frame
marked above it, enclosed in a distinct shape or border style. For each frame, its semantic roles
are shown enclosed within the same shape or border style, and the spans fulfilling the roles are
connected to the latter using dotted lines. For example, manner evokes the CONDUCT frame, and
has the AGENT and MANNER roles fulfilled by Austria and most un-Viennese, respectively.


(Section 6) that diverges from prior work in semantic role labeling; and an exact dual
decomposition algorithm (Section 7) that collectively predicts all the arguments of a
frame together, thereby incorporating linguistic constraints in a principled fashion.

Our open-source parser, named SEMAFOR (Semantic Analyzer of Frame Represen-
tations),1 achieves the best published results to date on the SemEval 2007 frame-semantic
structure extraction task (Baker, Ellsworth, and Erk 2007). Herein, we also present
results on newly released data with FrameNet 1.5, the latest edition of the lexicon.
Some of the material presented in this article has appeared in previously published
conference papers: Das et al. (2010) presented the basic model, Das and Smith (2011)
described semi-supervised lexicon expansion, Das and Smith (2012) demonstrated a
sparse variant of lexicon expansion, and Das, Martins, and Smith (2012) presented the
dual decomposition algorithm for constrained joint argument identification. We present
here a synthesis of those results and several additional details:

1. The set of features used in the two statistical models for frame identification and
argument identification.

2. Details of a greedy beam search algorithm for argument identification that avoids
illegal argument overlap.

3. Error analysis pertaining to the dual decomposition argument identification algo-
rithm, in contrast with the beam search algorithm.

4. Results on full frame-semantic parsing using graph-based semi-supervised learn-
ing with sparsity-inducing penalties; this expands the small FrameNet predicate
lexicon, enabling us to handle unknown predicates.

Our primary contributions are the use of efficient structured prediction tech-
niques suited to shallow semantic parsing problems, novel methods in semi-supervised
learning that improve the lexical coverage of our parser, and making frame-semantic
structures a viable computational semantic representation usable in other language
technologies. To set the stage, we next consider related work in the automatic prediction
of predicate-argument semantic structures.

2. Related Work

In this section, we focus on previous scientific work relevant to the problem of
frame-semantic parsing. First, we briefly discuss work done on PropBank-style
semantic role labeling, following which we concentrate on the more relevant prob-
lem of frame-semantic structure extraction. Next, we review previous work that has
used semi-supervised learning for shallow semantic parsing. Finally, we discuss prior
work on joint structure prediction relevant to frame-semantic parsing.

2.1 Semantic Role Labeling

Since Gildea and Jurafsky (2002) pioneered statistical semantic role labeling, there
has been a great deal of computational work using predicate-argument structures
for semantics. The development of PropBank (Kingsbury and Palmer 2002), followed
by the CoNLL shared tasks on semantic role labeling (Carreras and Màrquez 2004,
2005), boosted research in this area. Figure 2(a) shows an annotation from PropBank.
PropBank annotations are closely tied to syntax, because the data set consists of the

1 See http://www.ark.cs.cmu.edu/SEMAFOR.


Figure 2
(a) A phrase-structure tree taken from the Penn Treebank and annotated with PropBank
predicate-argument structures. The verbs created and pushed serve as predicates in this
sentence. Dotted arrows connect each predicate to its semantic arguments (bracketed phrases).
(b) A partial depiction of frame-semantic structures for the same sentence. The words in bold
are targets, which instantiate a (lemmatized and part-of-speech–tagged) lexical unit and evoke
a semantic frame. Every frame annotation is shown enclosed in a distinct shape or border style,
and its argument labels are shown together on the same vertical tier below the sentence.
See text for explanation of abbreviations.

phrase-structure syntax trees from the Wall Street Journal section of the Penn Treebank
(Marcus, Marcinkiewicz, and Santorini 1993) annotated with predicate-argument
structures for verbs. In Figure 2(a), the syntax tree for the sentence is marked with
various semantic roles. The two main verbs in the sentence, created and pushed, are
the predicates. For the former, the constituent more than 1.2 million jobs serves as the
semantic role ARG1 and the constituent In that time serves as the role ARGM-TMP. Similarly,
for the latter verb, roles ARG1, ARG2, ARGM-DIR, and ARGM-TMP are shown in the figure.
PropBank defines core roles ARG0 through ARG5, which receive different interpretations
for different predicates. Additional modifier roles ARGM-* include ARGM-TMP (temporal)
and ARGM-DIR (directional), as shown in Figure 2(a). The PropBank representation
therefore has a small number of roles, and the training data set comprises some
40,000 sentences, thus making the semantic role labeling task an attractive one from the
perspective of machine learning.

There are many instances of influential work on semantic role labeling using
PropBank conventions. Pradhan et al. (2004) present a system that uses support vector
machines (SVMs) to identify the arguments in a syntax tree that can serve as semantic
roles, followed by classification of the identified arguments to role names via a collection
of binary SVMs. Punyakanok et al. (2004) describe a semantic role labeler that uses inte-
ger linear programming for inference and uses several global constraints to find the best


suited predicate-argument structures. Joint modeling for semantic role labeling with
discriminative log-linear models is presented by Toutanova, Haghighi, and Manning
(2005), where global features looking at all arguments of a particular verb together are
incorporated into a dynamic programming and reranking framework. The Computa-
tional Linguistics special issue on semantic role labeling (Màrquez et al. 2008) includes
other interesting papers on the topic, leveraging the PropBank conventions for labeling
shallow semantic structures. More recently, there have been initiatives to predict syntactic
dependencies as well as PropBank-style predicate-argument structures together using
one joint model (Surdeanu et al. 2008; Hajič et al. 2009).

Here, we focus on the related problem of frame-semantic parsing. Note from the
annotated semantic roles for the two verbs in the sentence of Figure 2(a) that it is
unclear what the core roles ARG1 or ARG2 represent linguistically. To better understand
the roles’ meaning for a given verb, one has to refer to a verb-specific file provided along
with the PropBank corpus. Although collapsing these verb-specific core roles into tags
ARG0–ARG5 leads to a small set of classes to be learned from a reasonably sized corpus,
analysis shows that the roles ARG2–ARG5 serve many different purposes for different
verbs. Yi, Loper, and Palmer (2007) point out that these four roles are highly overloaded
and inconsistent, and they mapped them to VerbNet (Schuler 2005) thematic roles to
get improvements on the SRL task. More recently, Bauer and Rambow (2011) presented
a method to improve the syntactic subcategorization patterns for FrameNet lexical
units using VerbNet. Instead of working with PropBank, we focus on shallow semantic
parsing of sentences in the paradigm of frame semantics (Fillmore 1982), to which we
turn next.

2.2 Frame-Semantic Parsing

The FrameNet lexicon (Fillmore, Johnson, and Petruck 2003) contains rich linguistic
information about lexical items and predicate-argument structures. A semantic frame
present in this lexicon includes a list of lexical units, which are associated words
and phrases that can potentially evoke it in a natural language utterance. Each frame
in the lexicon also enumerates several roles corresponding to facets of the scenario
represented by the frame. In a frame-analyzed sentence, predicates evoking frames
are known as targets, and a word or phrase filling a role is known as an argument.
Figure 2(b) shows frame-semantic annotations for the same sentence as in Figure 2(a).
(In the figure, for example, for the CARDINAL NUMBERS frame, “M” denotes the role Multiplier
and “E” denotes the role Entity.) Note that the verbs created and pushed evoke the frames
INTENTIONALLY CREATE and CAUSE CHANGE POSITION ON A SCALE, respectively. The correspond-
ing lexical units2 from the FrameNet lexicon, create.V and push.V, are also shown.
The PropBank analysis in Figure 2(a) also has annotations for these two verbs. Whereas
PropBank labels the roles of these verbs with its limited set of tags, the frame-
semantic parse labels each frame’s arguments with the frame-specific roles shown in the
figure, making it immediately clear what those arguments mean. For example, for the
INTENTIONALLY CREATE frame, more than 1.2 million jobs is the Created entity, and In that time is
the Time when the jobs were created. FrameNet also allows non-verbal words and phrases
to evoke semantic frames: in this sentence, million evokes the frame CARDINAL NUMBERS
and doubles as its Number argument, with 1.2 as Multiplier, jobs as the Entity being quantified,
and more than as the Precision of the quantity expression.

2 See Section 5.1 for a detailed description of lexical units.


[Figure 3 here: a diagram of five frames (EVENT; OBJECTIVE_INFLUENCE; TRANSITIVE_ACTION;
CAUSE_TO_MAKE_NOISE; MAKE_NOISE), each listing a subset of its roles (e.g., Event, Place,
Time; Influencing_entity, Influencing_situation, Dependent_entity; Agent, Cause, Patient,
Purpose; Sound_maker, Sound; Noisy_event, Sound_source) and example lexical units (e.g.,
event.n, happen.v, occur.v, take place.v; affect.v, effect.n, impact.n, impact.v; blare.v,
honk.v, play.v, ring.v, toot.v; cough.v, gobble.v, hiss.v, ring.v, yodel.v), connected by
Inheritance, Causative_of, and Excludes relations.]

Figure 3
Partial illustration of frames, roles, and lexical units related to the CAUSE TO MAKE NOISE frame,
from the FrameNet lexicon. Core roles are filled bars. Non-core roles (such as Place and Time) are
unfilled bars. No particular significance is ascribed to the ordering of a frame’s roles in its
lexicon entry (the selection and ordering of roles above is for illustrative convenience).
CAUSE TO MAKE NOISE defines a total of 14 roles, many of them not shown here.

Whereas PropBank contains verbal predicates and NomBank (Meyers et al. 2004) con-
tains nominal predicates, FrameNet includes these as well as adjectives, adverbs,
and prepositions among its lexical units. Finally, FrameNet frames organize predicates
according to semantic principles, both by allowing related terms to evoke a common
frame (e.g., push.V, raise.V, and growth.N for CAUSE CHANGE POSITION ON A SCALE) and by
defining frames and their roles within a hierarchy (see Figure 3). PropBank does not
explicitly encode relationships among predicates.

Most early work on frame-semantic parsing has made use of the exemplar sentences
in the FrameNet corpus (see Section 3.1), each of which is annotated for a single frame
and its arguments. Gildea and Jurafsky (2002) presented a discriminative model for
arguments given the frame; Thompson, Levy, and Manning (2003) used a generative
model for both the frame and its arguments. Fleischman, Kwon, and Hovy (2003) first
used maximum entropy models to find and label arguments given the frame. Shi and
Mihalcea (2004) developed a rule-based system to predict frames and their arguments
in text, and Erk and Padó (2006) introduced the Shalmaneser tool, which uses naive
Bayes classifiers to do the same. Other FrameNet SRL systems (Giuglea and Moschitti
2006, for example) have used SVMs. Most of this work was done on an older, smaller
version of FrameNet, containing around 300 frames and fewer than 500 unique semantic
roles. Unlike this body of work, we experimented with the larger SemEval 2007 shared
task data set, and also the newer FrameNet 1.5,3 which lists 877 frames and 1,068 role
types—thus handling many more labels, and resulting in richer frame-semantic parses.
Recent work in frame-semantic parsing—in which sentences may contain multiple
frames that need to be recognized along with their arguments—was undertaken
as SemEval 2007 Task 19 on frame-semantic structure extraction (Baker, Ellsworth,
and Erk 2007). This task leveraged FrameNet 1.3, and also released a small corpus

3 Available at http://framenet.icsi.berkeley.edu as of 19 January 2013.


containing a little more than 2,000 sentences with full text annotations. The LTH system
of Johansson and Nugues (2007), which we use as our baseline (Section 3.4), had the
best performance in the SemEval 2007 task in terms of full frame-semantic parsing.
Johansson and Nugues broke down the task as identifying targets that could evoke
frames in a sentence, identifying the correct semantic frame for a target, and finally
determining the arguments that fill the semantic roles of a frame. They used a series
of SVMs to classify the frames for a given target, associating unseen lexical items with
frames and identifying and classifying token spans as various semantic roles. Both
the full text annotation corpus and the FrameNet exemplar sentences were
used to train their models. Unlike Johansson and Nugues, we use only the full text
annotated sentences as training data, model the whole problem with only two statis-
tical models, and obtain significantly better overall parsing scores. We also model the
argument identification problem using a joint structure prediction model and use semi-
supervised learning to improve predicate coverage. We also present experiments on
the recently released FrameNet 1.5 data.

In other work based on FrameNet, Matsubayashi, Okazaki, and Tsujii (2009) in-
vestigated various uses of FrameNet’s taxonomic relations for learning generalizations
over roles; they trained a log-linear model on the SemEval 2007 data to evaluate features
for the subtask of argument identification. Another line of work has sought to extend
the coverage of FrameNet by exploiting VerbNet and WordNet (Shi and Mihalcea
2005; Giuglea and Moschitti 2006; Pennacchiotti et al. 2008) and by projecting entries
and annotations within and across languages (Boas 2002; Fung and Chen 2004; Padó
and Lapata 2005; Fürstenau and Lapata 2009b). Others have explored the application
of frame-semantic structures to tasks such as information extraction (Moschitti,
Morarescu, and Harabagiu 2003; Surdeanu et al. 2003), textual entailment (Burchardt
and Frank 2006; Burchardt et al. 2009), question answering (Narayanan and Harabagiu
2004; Shen and Lapata 2007), and paraphrase recognition (Padó and Erk 2005).

2.3 Semi-Supervised Methods

Although there has been a significant amount of work in supervised shallow semantic
parsing using both PropBank- and FrameNet-style representations, a few improve-
ments over vanilla supervised methods using unlabeled data are notable. Fürstenau and
Lapata (2009b) present a method of projecting predicate-argument structures from seed
examples to unlabeled sentences, using a linear program formulation to find
the best alignment explaining the projection. Next, the projected information as well
as the seeds are used to train statistical models for SRL. The authors ran experiments
using a set of randomly chosen verbs from the exemplar sentences of FrameNet and
found improvements over supervised methods. In an extension to this work, Fürstenau
and Lapata (2009a) present a method for finding examples for unseen verbs using a
graph alignment method; this method represents sentences and their syntactic analyses
as graphs, and graph alignment is used to project annotations from seed examples to
unlabeled sentences. This alignment problem is again modeled as a linear program.
Fürstenau and Lapata (2012) present a detailed expansion of the aforementioned
papers. Although this line of work presents a novel direction in the area of SRL, the
published approach does not yet deal with non-verbal predicates and does not evaluate
the presented methods on the full text annotations of the FrameNet releases.

Deschacht and Moens (2009) present a technique of incorporating additional infor-
mation from unlabeled data by using a latent words language model. Latent variables
are used to model the underlying representation of words, and parameters of this model


are estimated using standard unsupervised methods. Next, the latent information is
used as features for an SRL model. Improvements over supervised SRL techniques
are observed with the augmentation of these extra features. The authors also compare
their method with the aforementioned two methods of Fürstenau and Lapata (2009a,
2009b) and show relative improvements. Experiments are performed on the CoNLL
2008 shared task data set (Surdeanu et al. 2008), which follows the PropBank conven-
tions and only labels verbal and nominal predicates—in contrast to our work, which
includes most lexicosyntactic categories. A similar approach is presented by Weston,
Ratle, and Collobert (2008), who use neural embeddings of words, which are eventu-
ally used for SRL; improvements over state-of-the-art PropBank-style SRL systems are
observed.

Recently, there has been related work in unsupervised semantic role labeling (Lang
and Lapata 2010, 2011; Titov and Klementiev 2012) that attempts to induce semantic
roles automatically from unannotated data. This line of work may be useful in discov-
ering new semantic frames and roles, but here we stick to the concrete representation
provided in FrameNet, without seeking to expand its inventory of semantic types. We
present a new semi-supervised technique to expand the set of lexical items with the
potential semantic frames that they could evoke; we use a graph-based semi-supervised
learning framework to achieve this goal (Section 5.5).

2.4 Joint Inference and Shallow Semantic Parsing

Most high-performance SRL systems that use conventions from PropBank (Kingsbury
and Palmer 2002) and NomBank (Meyers et al. 2004) utilize joint inference for seman-
tic role labeling (Màrquez et al. 2008). To our knowledge, the separate line of work
investigating frame-semantic parsing has not previously dealt with joint inference. A
common trait in prior work, under both PropBank and FrameNet conventions, has been
the use of a two-stage model that identifies arguments first, then labels them, often
using dynamic programming or integer linear programs (ILPs); we treat both problems
together here.4

Recent work on natural language processing (NLP) problems has focused on ILP for-
mulations for complex structure prediction tasks like dependency parsing (Riedel and
Clark 2006; Martins, Smith, and Xing 2009; Martins et al. 2010) and sequence tagging (Roth
and Yih 2004), as well as PropBank SRL (Punyakanok et al. 2004). Whereas early work
in this area focused on declarative formulations tackled with off-the-shelf solvers, Rush
et al. (2010) proposed subgradient-based dual decomposition (also called Lagrangian
relaxation) as a way of exploiting the structure of the problem and existing combina-
torial algorithms. The method allows the combination of models that are individually
tractable, but not jointly tractable, by solving a relaxation of the original problem. Since
then, dual decomposition has been used to build more accurate models for dependency
parsing (Koo et al. 2010), combinatory categorial grammar supertagging and parsing
(Auli and Lopez 2011), and machine translation (Chang and Collins 2011; DeNero and
Macherey 2011; Rush and Collins 2011).

Recently, Martins et al. (2011b) showed that the success of subgradient-based dual
decomposition strongly relies on breaking down the original problem into a “good”

4 In prior work, there are exceptions where identification and classification of arguments have been treated
in one step; for more details, please refer to the systems participating in the CoNLL-2004 shared task on
semantic role labeling (Carreras and Màrquez 2004).


decomposition, that is, one with few overlapping components. This leaves out many
declaratively constrained problems, for which such a good decomposition is not readily
available. For those, Martins et al. proposed the Alternating Directions Dual Decom-
position (AD3) algorithm, which retains the modularity of previous methods, but can
handle thousands of small overlapping components. We adopt that algorithm, as it
perfectly suits the problem of argument identification, as we observe in Section 7.5 We
also contribute an exact branch-and-bound technique wrapped around AD3.

Before delving into the details of our modeling framework, we describe in detail the

structure of the FrameNet lexicon and the data sets used to train our models.

3. Resources and Task

We now consider the resources used for frame-semantic parsing—a lexicon and sentences
annotated with frame-semantic structures—along with evaluation strategies and previous baselines.

3.1 FrameNet Lexicon

The FrameNet lexicon is a taxonomy of manually identified general-purpose semantic
frames for English.6 Listed in the lexicon with each frame are a set of lemmas (with
parts of speech) that can denote the frame or some aspect of it—these are called lexical
units (LUs). In a sentence, word or phrase tokens that evoke a frame are known as
targets. The set of LUs listed for a frame in FrameNet may not be exhaustive; we may
see a target in new data that does not correspond to an LU for the frame it evokes.
Each frame definition also includes a set of frame elements, or roles, 相应的
to different aspects of the concept represented by the frame, such as participants,
props, and attributes. We use the term argument to refer to a sequence of word tokens
annotated as filling a frame role. Figure 1 shows an example sentence from the training
data with annotated targets, LUs, frames, and role-argument pairs. The FrameNet
lexicon also provides information about relations between frames and between roles
(e.g., INHERITANCE). Figure 3 shows a subset of the relations between five frames and
their roles.

Accompanying most frame definitions in the FrameNet lexicon is a set of lexico-
graphic exemplar sentences (primarily from the British National Corpus) annotated
for that frame. Typically chosen to illustrate variation in argument realization pat-
terns for the frame in question, these sentences only contain annotations for a single
frame.

In preliminary experiments, we found that using exemplar sentences directly to
train our models hurt performance as evaluated on the SemEval 2007 data, which formed
a benchmark for comparison with the previous state of the art. This was a noteworthy
observation, given that the number of exemplar sentences is an order of magnitude
larger than the number of sentences in the training data that we consider in our experiments
(Section 3.2). This is presumably because the exemplars are not representative as a
sample, do not have complete annotations, and are not from a domain similar to the

5 AD3 was previously referred to as “DD-ADMM,” in reference to the use of dual decomposition with the
alternating directions method of multipliers.

6 Like the SemEval 2007 participants, we used FrameNet 1.3 and also the newer version of the lexicon,
FrameNet 1.5 (http://framenet.icsi.berkeley.edu).


Table 1
Salient statistics of the data sets used in our experiments. There is a significant overlap between
the two data sets.

                                  SemEval 2007 data    FrameNet 1.5 release
                                  count                count

Exemplar sentences                139,439              154,607
Frame labels (types)              665                  877
Role labels (types)               720                  1,068
Sentences in training data        2,198                3,256
Targets in training data          11,195               19,582
Sentences in test data            120                  2,420
Targets in test data              1,059                4,458
Unseen targets in test data       210                  144

test data. Instead, we make use of these exemplars in the construction of features
(Section 5.2).

3.2 Data

In our experiments on frame-semantic parsing, we use two sets of data:

1. SemEval 2007 data: In benchmark experiments for comparison with the previous
state of the art, we use a data set that was released as part of the SemEval 2007
shared task on frame-semantic structure extraction (Baker, Ellsworth, and Erk 2007).
Full text annotations in this data set consisted of a few thousand sentences con-
taining multiple targets, each annotated with a frame and its arguments. The then-
current version of the lexicon (FrameNet 1.3) was used for the shared task as the
inventory of frames, roles, and lexical units (Figure 3 illustrates a small portion
of the lexicon). In addition to the frame hierarchy, FrameNet 1.3 also contained
139,439 exemplar sentences containing one target each. Statistics of the data used
for the SemEval 2007 shared task are given in the first column of Table 1. A total
of 665 frame types and 720 role types appear in the exemplars and the training
portion of the data. We adopted the same training and test split as the SemEval
2007 shared task; however, we removed four documents from the training set7 for
development. Table 2 shows some additional information about the SemEval data
set; the variety of lexicosyntactic categories of targets stands in marked contrast
with the PropBank-style SRL data and task.

2. FrameNet 1.5 release: A more recent version of the FrameNet lexicon was released
in 2010.8 We also test our statistical models (only frame identification and argu-
ment identification) on this data set to get an estimate of how much improvement
additional data can provide. Details of this data set are shown in the second col-
umn of Table 1. Of the 78 documents in this release with full text annotations, we
selected 55 (19,582 targets) for training and held out the remaining 23 (4,458 tar-
gets) for testing. There are fewer target annotations per sentence in the test set than

7 These were: StephanopoulousCrimes, Iran Biological, NorthKorea Introduction, and WMDNews 042106.
8 Released on 15 September 2010, and downloadable from http://framenet.icsi.berkeley.edu as of
13 February 2013. In our experiments, we used a version downloaded on 22 September 2010.


Table 2
Breakdown of targets and arguments in the SemEval 2007 training set in terms of part of speech.
The target POS is based on the LU annotation for the frame instance. For arguments, this reflects
the part of speech of the head word (estimated from an automatic dependency parse); the
percentage is out of all overt arguments.

                  TARGETS                              ARGUMENTS
                count      %                         count      %
Noun            5,155     52    Noun                 9,439     55
Verb            2,785     28    Preposition or
Adjective       1,411     14      complementizer     2,553     15
Preposition       296      3    Adjective            1,744     10
Adverb            103      1    Verb                 1,156      7
Number             63      1    Pronoun                736      4
Conjunction         8          Adverb                  373      2
Article             3          Other                 1,047      6
                9,824                               17,048

the training set.9 Das and Smith (2011, supplementary material) give the names
of the test documents for fair replication of our work. We also randomly selected
4,462 targets from the training data for development of the argument identification
model (Section 6.1).

Preprocessing. We preprocessed the sentences in our data set with a standard set of anno-
tations: POS tags from MXPOST (Ratnaparkhi 1996) and dependency parses from the
MST parser (McDonald, Crammer, and Pereira 2005); manual syntactic parses are not
available for most of the FrameNet-annotated documents. We used WordNet (Fellbaum
1998) for lemmatization. Our models treat these pieces of information as observations.
We also labeled each verb in the data as having ACTIVE or PASSIVE voice, using code
from the SRL system described by Johansson and Nugues (2008).

3.3 Task and Evaluation Methodology

Automatic annotations of frame-semantic structure can be broken into three parts:
(1) targets, the words or phrases that evoke frames; (2) the frame type, defined in the
lexicon, evoked by each target; and (3) the arguments, or spans of words that serve
to fill roles defined by each evoked frame. These correspond to the three subtasks
in our parser, each described and evaluated in turn: target identification (Section 4),
frame identification (Section 5, not unlike word-sense disambiguation), and argument
identification (Section 6, essentially the same as semantic role labeling).

The standard evaluation script from the SemEval 2007 shared task calculates pre-
cision, recall, and F1-measure for frames and arguments; it also provides a score that
gives partial credit for hypothesizing a frame related to the correct one. We present

9 For creating the splits, we first included the documents that had incomplete annotations as mentioned in
the initial FrameNet 1.5 data release in the test set; because we do not evaluate target identification for
this version of the data, the small number of targets per sentence does not matter. After these documents
were put into the test set, we randomly selected 55 remaining documents for training, and picked the
rest for additional testing. The final test set contains a total of 23 documents. When these documents
are annotated in their entirety, the test set will become larger and the training set will be unaltered.


precision, recall, and F1-measure microaveraged across the test documents, report labels-
only matching scores (spans must match exactly), and do not use named entity labels.10
More details can be found in the task description paper from SemEval 2007 (Baker,
Ellsworth, and Erk 2007). For our experiments, statistical significance is measured using
a reimplementation of Dan Bikel’s randomized parsing evaluation comparator, a strat-
ified shuffling test whose original implementation11 is accompanied by the following
description (quoted verbatim, with explanations of our use of the test given in square
brackets):

The null hypothesis is that the two models that produced the observed results are the
same, such that for each test instance [here, a set of predicate-argument structures for a
sentence], the two observed scores are equally likely. This null hypothesis is tested by
randomly shuffling individual sentences’ scores between the two models and then
re-computing the evaluation metrics [precision, recall, or F1 score in our case]. If the
difference in a particular metric after a shuffling is equal to or greater than the original
observed difference in that metric, then a counter for that metric is incremented. Ideally,
one would perform all 2^n shuffles, where n is the number of test cases (sentences), but
given that this is often prohibitively expensive, the default number of iterations is
10,000 [we use 10,000 independently sampled shuffles]. After all iterations, the
likelihood of incorrectly rejecting the null [hypothesis, i.e., the p-value] is simply
(nc + 1)/(nt + 1), where nc is the number of random differences greater than the
original observed difference, and nt is the total number of iterations.
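In code, the test amounts to repeated random exchanges of per-sentence counts followed by
recomputation of the metric. The following is a minimal sketch for the F1 case (assuming
per-sentence true-positive/predicted/gold counts for both systems; function and variable
names are ours, not from the original comparator):

    import random

    def f1(tp, pred, gold):
        """Micro F1 from aggregate true-positive, predicted, and gold counts."""
        p = tp / pred if pred else 0.0
        r = tp / gold if gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    def shuffle_test(scores_a, scores_b, iterations=10000, seed=0):
        """Stratified shuffling test for the F1 difference between two models.

        scores_a, scores_b: lists of (tp, pred, gold) tuples, one per test
        sentence, aligned so that index i refers to the same sentence.
        Returns the p-value estimate (nc + 1) / (nt + 1).
        """
        rng = random.Random(seed)
        agg = lambda scores: f1(*map(sum, zip(*scores)))
        observed = abs(agg(scores_a) - agg(scores_b))
        nc = 0
        for _ in range(iterations):
            a, b = [], []
            for sa, sb in zip(scores_a, scores_b):
                # Under the null hypothesis, each sentence's scores are
                # exchangeable between the two models.
                if rng.random() < 0.5:
                    sa, sb = sb, sa
                a.append(sa)
                b.append(sb)
            if abs(agg(a) - agg(b)) >= observed:
                nc += 1
        return (nc + 1) / (iterations + 1)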

3.4 Baseline

A strong baseline for frame-semantic parsing is the system presented by Johansson and
Nugues (2007, hereafter J&N’07), the best system in the SemEval 2007 shared task. That
system is based on a collection of SVMs. They used a set of rules for target identification,
which we describe in Appendix A. For frame identification, they used an SVM classifier
to disambiguate frames for known frame-evoking words. They used WordNet synsets
to extend the vocabulary of frame-evoking words to cover unknown words, and then
used a collection of separate SVM classifiers—one for each frame—to predict a single
evoked frame for each occurrence of a word in the extended set.

J&N’07 followed Xue and Palmer (2004) in dividing the argument identification
problem into two subtasks: first, they classified candidate spans as to whether they
were arguments or not; then they assigned roles to those that were identified as ar-
guments. Both phases used SVMs. Thus, their formulation of the problem involves
a multitude of independently trained classifiers that share no information—whereas
ours uses two log-linear models, each with a single set of parameters shared across all
contexts, to find a full frame-semantic parse.

We compare our models with J&N’07 using the benchmark data set from SemEval
2007. However, because we are not aware of any other work using the FrameNet 1.5 full
text annotations, we report our results on that data set without comparison to any other
system.

10 For microaveraging, we concatenated all sentences of the test documents and measured precision and
recall over the concatenation. Macroaveraging, on the other hand, would mean calculating these metrics
for each document, then averaging them. Microaveraging treats every frame or argument as a unit,
regardless of the length of the document in which it occurs.

11 See http://www.cis.upenn.edu/~dbikel/software.html#comparator.


4. Target Identification

Target identification is the problem of deciding which word tokens (or word token
sequences) evoke frames in a given sentence. In other semantic role labeling schemes
(e.g., PropBank), simple part-of-speech criteria typically distinguish targets from non-
targets. But in frame semantics, verbs, nouns, adjectives, and even prepositions can
evoke frames under certain conditions. One complication is that semantically impov-
erished support predicates (such as make in make a request) do not evoke frames in the
context of a frame-evoking, syntactically dependent noun (request). Furthermore, only
temporal, locative, and directional senses of prepositions evoke frames.12

Preliminary experiments using a statistical method for target identification gave
unsatisfactory results; instead, we followed J&N’07 in using a small set of rules to
identify targets. First, we created a master list of all the morphological variants of
targets that appear in the exemplar sentences and a given training set. For a sentence in
new data, we considered as candidate targets only those substrings that appear in this
master list. We also did not attempt to capture discontinuous frame targets: for example,
we treat there would have been as a single span even though the corresponding LU is
there be.V.13

Next, we pruned the candidate target set by applying a series of rules identical
to the ones described by Johansson and Nugues (2007, see Appendix A), with two
exceptions. First, they identified locative, temporal, and directional prepositions using
a dependency parser so as to retain them as valid LUs. In contrast, we pruned all types
of prepositions because we found them to hurt our performance on the development
set due to errors in syntactic parsing. In a second departure from their target extraction
rules, we did not remove the candidate targets that had been tagged as support verbs
for some other target. Note that we used a conservative white list that filters out targets
whose morphological variants were not seen either in the lexicon or the training data.14
Therefore, when this conservative process of automatic target identification is used, our
system loses the capability to predict frames for completely unseen LUs, despite the fact
that our powerful frame identification model (Section 5) can accurately label frames for
new LUs.15
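The procedure above can be summarized in a short sketch (a simplification: the white list is
a set of lowercased token tuples drawn from the exemplars and training data, and the final
filter stands in for the fuller pruning rules of Appendix A; all names are illustrative):

    def identify_targets(tokens, pos_tags, white_list, max_len=4):
        """Greedy longest-match candidate target identification.

        tokens: sentence tokens; pos_tags: their POS tags;
        white_list: set of target word sequences (tuples of lowercased
        tokens) observed in the exemplars or the training data.
        """
        candidates = []
        i = 0
        while i < len(tokens):
            # Prefer the longest known span starting at position i, so that
            # multiword targets such as "there would have been" stay whole.
            for j in range(min(len(tokens), i + max_len), i, -1):
                span = tuple(t.lower() for t in tokens[i:j])
                if span in white_list:
                    candidates.append((i, j))
                    i = j - 1
                    break
            i += 1
        # Prune all prepositions (parser errors make locative/temporal/
        # directional preposition detection unreliable); unlike J&N'07, we
        # keep targets tagged as support verbs for some other target.
        return [(i, j) for (i, j) in candidates
                if not (j - i == 1 and pos_tags[i] == "IN")]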

结果. 桌子 3 shows results on target identification tested on the SemEval 2007 测试
set; our system gains 3 F1 points over the baseline. This is statistically significant with
p < 0.01. Our results are also significant in terms of precision (p < 0.05) and recall
(p < 0.01). There are 85 distinct LUs for which the baseline fails to identify the correct
target while our system succeeds. A considerable proportion of these units have more than
one token (e.g., chemical and biological weapon.N, ballistic missile.N), which J&N’07 do not
model. The baseline also does not label variants of there be.V (e.g., there are and there has
been), which we correctly label as targets. Some examples of other single-token LUs that
the baseline fails to identify are names of months, LUs that belong to the ORIGIN frame
(e.g., iranian.A), and directions (e.g., north.A or north-south.A).16

Table 3
Target identification results for our system and the baseline on the SemEval’07 data set.
Scores in bold denote significant improvements over the baseline (p < 0.05).

TARGET IDENTIFICATION        P        R        F1
Our technique (§4)         89.92    70.79    79.21
Baseline: J&N’07           87.87    67.11    76.10

12 Note that there have been dedicated shared tasks to determine relationships between nominals
(Girju et al. 2007) and word-sense disambiguation of prepositions (Litkowski and Hargraves 2007),
but we do not build specific models for predicates of these categories.
13 There are 629 multiword LUs in the lexicon, and they correspond to 4.8% of the targets in the
training set; among them are screw up.V, shoot the breeze.V, and weapon of mass destruction.N. In the
SemEval 2007 training data, there are just 99 discontinuous multiword targets (1% of all targets).
14 This conservative approach violates theoretical linguistic assumptions about frame-evoking targets
as governed by frame semantics. It also goes against the spirit of using linguistic constraints to
improve the separate subtask of argument identification (see Section 7); however, due to varying
distributions of target annotations and high empirical error in identifying locative, temporal, and
directional prepositions, as well as support verbs, we resorted to this aggressive filtering heuristic
to avoid making too many target identification mistakes.
15 To predict frames and roles for new and unseen LUs, SEMAFOR provides the user with an option
to mark those LUs in the input.

5. Frame Identification

Given targets, our parser next identifies their frames, using a statistical model.

5.1 Lexical Units

FrameNet specifies a great deal of structural information both within and among frames.
For frame identification we make use of frame-evoking lexical units, the (lemmatized
and POS-tagged) words and phrases listed in the lexicon as referring to specific frames.
For example, listed with the BRAGGING frame are 10 LUs, including boast.N, boast.V,
boastful.A, brag.V, and braggart.N. Of course, due to polysemy and homonymy, the same
LU may be associated with multiple frames; for example, gobble.V is listed under both
the INGESTION and MAKE NOISE frames. We thus term gobble.V an ambiguous LU. All
targets in the exemplar sentences, our training data, and most in our test data, correspond
to known LUs. (See Section 5.4 for statistics of unknown LUs in the test sets.)
To incorporate frame-evoking expressions found in the training data but not the
lexicon—and to avoid the possibility of lemmatization errors—our frame identification
model will incorporate, via a latent variable, features based directly on exemplar and
training targets rather than LUs. Let L be the set of (unlemmatized and automatically
POS-tagged) targets found in the exemplar sentences of the lexicon and/or the sentences
in our training set. Let L_f ⊆ L be the subset of these targets annotated as evoking a
particular frame f.17 Let L^l and L^l_f denote the lemmatized versions of L and L_f,
respectively. Then, we write boasted.VBD ∈ L_BRAGGING and boast.VBD ∈ L^l_BRAGGING to
indicate that this inflected verb boasted and its lemma boast have been seen to evoke the
BRAGGING frame. Significantly, however, another target, such as toot your own horn, might
be used elsewhere to evoke this frame. We thus face the additional hurdle of predicting
frames for unknown words.

In producing full text annotations for the SemEval 2007 data set, annotators created
several domain-critical frames that were not already present in version 1.3 of the lexicon.
For our experiments we omit frames attested in neither the training data nor the exemplar
sentences from the lexicon.18 This leaves a total of 665 frames for the SemEval 2007 data
set and a total of 877 frames for the FrameNet 1.5 data set.

16 We do not evaluate the target identification module on the FrameNet 1.5 data set; we instead ran
controlled experiments on those data to measure performance of the statistical frame identification
and argument identification subtasks, assuming that the correct targets were given. Moreover, as
discussed in Section 3.2, the target annotations on the FrameNet 1.5 test set were fewer in number
in comparison to the training set, resulting in a mismatch of target distributions between train and
test settings.
17 For example, on average, there are 34 targets per frame in the SemEval 2007 data set; the average
frame ambiguity of each target in L is 1.17.

5.2 Model

For a given sentence x with frame-evoking targets t, let t_i denote the ith target (a word
sequence).19 Let t^l_i denote its lemma. We seek a list f = ⟨f_1, ..., f_m⟩ of frames, one
per target. In our model, the set of candidate frames for t_i is defined to include every
frame f such that t^l_i ∈ L^l_f—or, if t^l_i ∉ L^l, then every known frame (the latter
condition applies for 4.7% of the annotated targets in the SemEval 2007 development set).
In both cases, we let F_i be the set of candidate frames for the ith target in x. We denote
the entire set of frames in the lexicon as F.

To allow frame identification for targets whose lemmas were seen in neither the
exemplars nor the training data, our model includes an additional variable, ℓ_i. This
variable ranges over the seen targets in L_{f_i}, which can be thought of as prototypes for
the expression of the frame. Importantly, frames are predicted, but prototypes are summed
over via the latent variable.
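As a concrete illustration, the sets L_f, L^l_f, and F_i can be realized as plain mappings
(a sketch under assumed helpers; lemmatize and the triple format below are illustrative,
not taken from the released parser):

    from collections import defaultdict

    def build_lexicon(annotated_targets, lemmatize):
        """annotated_targets: (form, pos, frame) triples gathered from the
        exemplar sentences and the training data, e.g.,
        ("boasted", "VBD", "BRAGGING")."""
        L_f = defaultdict(set)    # frame -> seen (form, POS) targets: L_f
        Ll_f = defaultdict(set)   # frame -> lemmatized targets: L^l_f
        for form, pos, frame in annotated_targets:
            L_f[frame].add((form, pos))
            Ll_f[frame].add((lemmatize(form, pos), pos))
        return L_f, Ll_f

    def candidate_frames(lemma, pos, Ll_f, all_frames):
        """F_i: every frame whose lemmatized target set contains the
        target's lemma; for an entirely unseen lemma, back off to all
        known frames."""
        F_i = {f for f, seen in Ll_f.items() if (lemma, pos) in seen}
        return F_i if F_i else set(all_frames)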
The prediction rule requires a probabilistic model over frames for a target:

$$f_i \leftarrow \operatorname*{argmax}_{f \in F_i} \sum_{\ell \in L_f} p_\theta(f, \ell \mid t_i, x) \qquad (1)$$

We model the probability of a frame f and the prototype unit ℓ, given the target and the
sentence x, as:

$$p_\theta(f, \ell \mid t_i, x) = \frac{\exp\, \theta^\top g(f, \ell, t_i, x)}{\sum_{f' \in F_i} \sum_{\ell' \in L_{f'}} \exp\, \theta^\top g(f', \ell', t_i, x)} \qquad (2)$$

This is a conditional log-linear model: for f ∈ F_i and ℓ ∈ L_f, θ are the model weights,
and g is a vector-valued feature function. This discriminative formulation is very flexible,
allowing for a variety of (possibly overlapping) features; for example, a feature might
relate a frame type to a prototype, represent a lexical-semantic relationship between a
prototype and a target, or encode part of the syntax of the sentence.

Previous work has exploited WordNet for better coverage during frame identifica-
tion (Burchardt, Erk, and Frank 2005; Johansson and Nugues 2007, e.g., by expanding
the set of targets using synsets), and others have sought to extend the lexicon itself. We
differ in our use of a latent variable to incorporate lexical-semantic features in a
discriminative model, relating known lexical units to unknown words that may evoke
frames. Here we are able to take advantage of the large inventory of partially annotated
exemplar sentences. Note that this model makes an independence assumption: Each
frame is predicted independently of all others in the document. In this way the model is
similar to J&N’07. However, ours is a single conditional model that shares features and
weights across all targets, frames, and prototypes, whereas the approach of J&N’07
consists of many separately trained models.

18 Automatically predicting new frames is a challenge not yet attempted to our knowledge (including
here). Note that the scoring metric (Section 3.3) gives partial credit for related frames (e.g., a more
general frame from the lexicon).
19 Here each t_i is a word sequence ⟨w_u, ..., w_v⟩, 1 ≤ u ≤ v ≤ n, though in principle targets can be
noncontiguous.

Table 4
Features used for frame identification (Equation (2)). All also incorporate f, the frame being
scored. ℓ = ⟨w_ℓ, π_ℓ⟩ consists of the words and POS tags20 of a target seen in an exemplar or
training sentence as evoking f. The features marked ∗ were also used by Johansson and Nugues
(2007).

• the POS of the parent of the head word of t_i
•∗ the set of syntactic dependencies of the head word21 of t_i
•∗ if the head word of t_i is a verb, then the set of dependency labels of its children
• the dependency label on the edge connecting the head of t_i and its parent
• the sequence of words in the prototype, w_ℓ
• the lemmatized sequence of words in the prototype
• the lemmatized sequence of words in the prototype and their part-of-speech tags π_ℓ
• WordNet relation22 ρ holds between ℓ and t_i
• WordNet relation22 ρ holds between ℓ and t_i, and the prototype is ℓ
• WordNet relation22 ρ holds between ℓ and t_i, the POS tag sequence of ℓ is π_ℓ, and the
POS tag sequence of t_i is π_t

20 POS tags are found automatically during preprocessing.
21 If the target is not a subtree in the parse, we consider the words that have parents outside the span,
and apply three heuristic rules to select the head: (1) choose the first word if it is a verb; (2) choose
the last word if the first word is an adjective; (3) if the target contains the word of, and the first word
is a noun, we choose it. If none of these hold, choose the last word with an external parent to be the
head.
22 These are: IDENTICAL-WORD, SYNONYM, ANTONYM (including extended and indirect antonyms),
HYPERNYM, HYPONYM, DERIVED FORM, MORPHOLOGICAL VARIANT (e.g., plural form), VERB GROUP,
ENTAILMENT, ENTAILED-BY, SEE-ALSO, CAUSAL RELATION, and NO RELATION.
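To make Equations (1) and (2) concrete, here is a small sketch of the scoring and prediction
rule with sparse binary features (the feature function g is a stub standing in for the
features of Table 4; a production implementation would also work in log space with
log-sum-exp to avoid overflow):

    import math

    def dot_score(theta, feats):
        """theta^T g(.) for sparse binary features: feats is the set of
        feature names that fire; theta maps feature name -> weight."""
        return sum(theta.get(name, 0.0) for name in feats)

    def predict_frame(theta, g, target, sent, F_i, L_f):
        """Equation (1): argmax over candidate frames of the
        prototype-marginalized probability p_theta(f | t_i, x)."""
        unnorm = {}
        for f in F_i:
            # Marginalize the latent prototype (numerator of Equation (2)).
            unnorm[f] = sum(math.exp(dot_score(theta, g(f, proto, target, sent)))
                            for proto in L_f[f])
        Z = sum(unnorm.values())  # partition over all (frame, prototype) pairs
        return max(F_i, key=lambda f: unnorm[f] / Z)

Since the normalizer Z is shared across frames, dividing by it does not change the argmax;
it is kept here only to mirror Equation (2).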
Moreover, our model is unique in that it uses a latent variable to smooth over frames
for unknown or ambiguous LUs.

Frame identification features depend on the preprocessed sentence x, the prototype ℓ
and its WordNet lexical-semantic relationship with the target t_i, and of course the frame
f. Our model uses binary features, which are detailed in Table 4.

5.3 Parameter Estimation

Given a training data set (either the SemEval 2007 data set or the FrameNet 1.5 full text
annotations), which is of the form ⟨⟨x^(j), t^(j), f^(j), A^(j)⟩⟩_{j=1}^{N}, we
discriminatively train the frame identification model by maximizing the training data
log-likelihood:23

$$\max_{\theta} \sum_{j=1}^{N} \sum_{i=1}^{m_j} \log \sum_{\ell \in L_{f_i^{(j)}}} p_\theta\!\left(f_i^{(j)}, \ell \mid t_i^{(j)}, x^{(j)}\right) \qquad (3)$$

In Equation (3), m_j denotes the number of frames in the sentence indexed by j. Note that
the training problem is non-convex because of the summed-out prototype latent variable ℓ
for each frame. To calculate the objective function, we need to cope with a sum over
frames and prototypes for each target (see Equation (2)), often an expensive operation.
We locally optimize the function using a distributed implementation of L-BFGS.24 This is
the most expensive model that we train: With 100 parallelized CPUs using MapReduce
(Dean and Ghemawat 2008), training takes several hours.25 Decoding takes only a few
minutes on one CPU for the test set.

23 We found no benefit on either development data set from using an L2 regularizer (zero-mean
Gaussian prior).
24 We do not experiment with the initialization of model parameters during this non-convex
optimization process; all parameters are initialized to 0.0 before running the optimizer. However, in
future work, experiments can be conducted with different random initialization points to seek
non-local optima.
25 In later experiments, we used another implementation with 128 parallel cores in a multi-core MPI
setup (Gropp, Lusk, and Skjellum 1994), where training took several hours.

Table 5
Frame identification results on both the SemEval 2007 data set and the FrameNet 1.5 release.
Precision, recall, and F1 were evaluated under exact and partial frame matching; see Section 3.3.
Bold indicates best results on the SemEval 2007 data, which are also statistically significant with
respect to the baseline (p < 0.05).

FRAME IDENTIFICATION (§5.2)       exact matching          partial matching
                                  P      R      F1        P      R      F1
SemEval 2007 data
  gold targets                  60.21  60.21  60.21     74.21  74.21  74.21
  automatic targets (§4)        69.75  54.91  61.44     77.51  61.03  68.29
  J&N’07 targets                65.34  49.91  56.59     74.30  56.74  64.34
  Baseline: J&N’07              66.22  50.57  57.34     73.86  56.41  63.97
FrameNet 1.5 release
  gold targets                  82.97  82.97  82.97     90.51  90.51  90.51
    – unsupported features      80.30  80.30  80.30     88.91  88.91  88.91
    & – latent variable         75.54  75.54  75.54     85.92  85.92  85.92
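Returning to the estimation problem of Equation (3): in a single process, the marginal
log-likelihood and its gradient can be handed to an off-the-shelf optimizer. The sketch
below uses SciPy's L-BFGS rather than our distributed setup; the data-access conventions
(item.F_i, item.L_f, item.gold_frame) are assumptions for illustration:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(theta, data, g, num_feats):
        """Negative of the objective in Equation (3), with its gradient.

        data: training targets; each item carries a gold frame, a candidate
        frame set F_i, prototype sets L_f, and g(item, f, proto) returns
        the list of indices of the binary features that fire.
        """
        ll, grad = 0.0, np.zeros(num_feats)
        for item in data:
            feats = {(f, p): g(item, f, p)
                     for f in item.F_i for p in item.L_f[f]}
            logpot = {k: theta[idx].sum() for k, idx in feats.items()}
            m = max(logpot.values())            # for numerical stability
            Z = sum(np.exp(v - m) for v in logpot.values())
            gold = {k: v for k, v in logpot.items() if k[0] == item.gold_frame}
            Zg = sum(np.exp(v - m) for v in gold.values())
            ll += np.log(Zg) - np.log(Z)        # Eq. (2), prototypes marginalized
            for k, v in logpot.items():
                # expected feature counts: gold-frame posterior minus model posterior
                w = np.exp(v - m) * ((k in gold) / Zg - 1.0 / Z)
                grad[feats[k]] += w
        return -ll, -grad

    # Sketch of the optimizer call:
    # opt = minimize(neg_log_likelihood, np.zeros(num_feats),
    #                args=(data, g, num_feats), jac=True, method="L-BFGS-B")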
5.4 Supervised Results

SemEval 2007 Data. On the SemEval 2007 data set, we evaluate the performance of our
frame identification model given gold-standard targets and automatically identified
targets (Section 4); see Table 5. Together, our target and frame identification outperform
the baseline by 4 F1 points. To compare the frame identification stage in isolation with
that of J&N’07, we ran our frame identification model with the targets identified by their
system as input. With partial matching, our model achieves a relative improvement of
0.6% F1 over J&N’07, as shown in the third row of Table 5 (though this is not significant).
Note that for exact matching, the F1 score of the automatic targets setting is better than
that of the gold targets setting. This is due to the fact that there are many unseen
predicates in the test set, on which the frame identification model performs poorly; for
the automatic targets, however, which are mostly seen in the lexicon and training data,
the model achieves high precision, resulting in a better overall F1 score. Our frame
identification model thus performs on par with the previous state of the art for this task,
and offers several advantages over J&N’s formulation of the problem: It requires only a
single model, learns lexical-semantic features as part of that model rather than requiring
a preprocessing step to expand the vocabulary of frame-evoking words, and is
probabilistic, which can facilitate global reasoning.

In the SemEval 2007 data set, for gold-standard targets, 210 out of 1,059 lemmas were
not present in the white list that we used for target identification (see Section 4). Our
model correctly identifies the frames for 4 of these 210 lemmas. For 44 of these lemmas,
the evaluation script assigns a score of 0.5 or more, suggesting that our model predicts a
closely related frame. Finally, for 190 of the 210 lemmas, a positive score is assigned by
the evaluation script. This suggests that the hidden variable model helps in identifying
related (but rarely exact) frames for unseen targets, and explains why, under exact—but
not partial—frame matching, the F1 score using automatic targets is commensurate with
the score for oracle targets.26

For automatically identified targets, the F1 score falls because the model fails to
predict frames for unseen lemmas. However, our model outperforms J&N’07 by 4 F1
points. The partial frame matching F1 score of our model represents a significant
improvement over the baseline (p < 0.01). The precision and recall measures are
significant as well (p < 0.05 and p < 0.01, respectively). However, because targets
identified by J&N’07 and frames classified by our frame identification model resulted in
scores on par with the baseline, we note that the significant results follow from better
target identification.

26 J&N’07 did not report frame identification results for oracle targets; thus directly comparing the
frame identification models is difficult.
Note from the results that the automatic target identification model shows an increase in precision at the expense of recall. This is because the white list for target identification restricts the model to predicting frames only for known LUs. If we label the subset of the test set consisting of already seen LUs (seen only in the training set, excluding the exemplars) with their corresponding most frequent frame, we achieve an exact match accuracy between 52.9% and 91.2%, depending on the accuracy of the unseen LUs (these bounds assume, respectively, that they are all incorrectly labeled or all correctly labeled).

FrameNet 1.5 Release. The bottom three rows of Table 5 show results on the full text annotation test set of the FrameNet 1.5 release. Because the number of annotations nearly doubled, we see large improvements in frame identification accuracy. Note that we only evaluate with gold targets as input to frame identification. (As mentioned in Section 3.2, some documents in the test set have not been annotated for all targets, so evaluating automatic target identification would not be informative.) We found that 50.1% of the targets in the test set were ambiguous (i.e., associated with more than one frame either in FrameNet or in our training data). On these targets, the exact frame identification accuracy is 73.10% and the partial accuracy is 85.77%, which indicates that the frame identification model is robust to target ambiguity. On this data set, the most frequent frame baseline achieves an exact match accuracy between 74.0% and 88.1%, depending on the accuracy of the unseen LUs.

We conducted further experiments with ablation of the latent variable in our frame identification model. Recall that the decoding objective chooses a frame by marginalizing over a latent variable ℓ, whose values range over targets known to associate with the frame f being considered (see Equations (1) and (2)) in training. How much do the prototypes, captured by the latent variable, contribute to performance? Instead of treating ℓ as a marginalized latent variable, we can fix its value to the observed target. An immediate effect of this choice is a blow-up in the number of features: we must instantiate features (see Table 4) for all 4,194 unique targets observed in training. Because each of these features needs to be associated with all 877 frames in the partition function of Equation (2), the result is an 80-fold blow-up of the feature space (the latent variable model had 465,317 features). Such a model is not computationally feasible in our engineering framework, so we considered a model using only features observed to fire at some point in the training data (called "supported" features),27 resulting in only 72,058 supported features. In Table 5, we see a significant performance drop (in both exact and partial matching accuracy) with this latent variable–free model, compared both with our latent variable model with all features and with the variant restricted to supported features (of which there are 165,200).

26 J&N'07 did not report frame identification results for oracle targets; thus directly comparing the frame identification models is difficult.
This establishes that the latent variable in our frame identification model helps accuracy, and lets us use a moderately sized feature set that incorporates helpful unsupported features.

Finally, in our test set, we found that 144 out of the 4,458 annotated targets were unseen, and our full frame identification model labeled only 23.1% of the frames correctly for those unseen targets; in terms of partial match accuracy, the model achieved a score of 46.6%. This, along with the results on the SemEval 2007 unseen targets, shows that there is substantial room for improvement when unseen targets are presented to the system. We address this issue next.

5.5 Semi-Supervised Lexicon Expansion

We next address the poor performance of our frame identification model on targets that were unseen as LUs in FrameNet or as instances in training data, and briefly describe a technique for expanding the set of lexical units with the potential semantic frames that they can associate with. These experiments were carried out on the FrameNet 1.5 data only.

We use a semi-supervised learning (SSL) technique that builds a graph from labeled and unlabeled data. The widely used graph-based SSL framework—see Bengio, Delalleau, and Le Roux (2006) and Zhu (2008) for introductory material on this topic—has been shown to perform better than several other semi-supervised algorithms on benchmark data sets (Chapelle, Schölkopf, and Zien 2006, chapter 21). The method constructs a graph in which a small portion of the vertices correspond to labeled instances, while the rest are unlabeled. Pairs of vertices are connected by weighted edges denoting the similarity between the pair. Traditionally, Markov random walks (Szummer and Jaakkola 2001; Baluja et al. 2008) or optimization of a loss function based on smoothness properties of the graph (e.g., Corduneanu and Jaakkola 2003; Zhu, Ghahramani, and Lafferty 2003; Subramanya and Bilmes 2008) are used to propagate labels from the labeled vertices to the unlabeled ones. In our work, we are interested in multi-class generalizations of graph-propagation algorithms suitable for NLP applications, where each graph vertex can assume one or more of many possible labels (Subramanya and Bilmes 2008, 2009; Talukdar and Crammer 2009). For us, graph vertices correspond to natural language types (not tokens), and the undirected edges between them are weighted using a similarity metric. Recently, this set-up has been used to learn soft labels on natural language types (say, word n-grams or, in our case, syntactically disambiguated predicates) from seed data, resulting in large but noisy lexicons, which are used to constrain structured prediction models. Applications have ranged from domain adaptation of sequence models (Subramanya, Petrov, and Pereira 2010) to unsupervised learning of POS taggers by using bilingual graph-based projections (Das and Petrov 2011).

27 The use of unsupported features (i.e., those that can fire for an analysis in the partition function but are not observed to fire in the training data) has been observed to give performance improvements in NLP problems; see, for example, Sha and Pereira (2003) and Martins et al. (2010).
We describe our approach to graph construction, propagation for lexicon expansion, and the use of the result to impose constraints on frame identification.

5.5.1 Graph Construction. We construct a graph with lexical units as vertices. Thus, each vertex corresponds to a lemmatized word or phrase appended with a coarse POS tag. We use two resources for graph construction. First, we take all the words and phrases present in a dependency-based thesaurus constructed using syntactic co-occurrence statistics (Lin 1998), and aggregate words and phrases that share the same lemma and coarse POS tag. To construct this resource, Lin used a corpus containing 64 million words that was parsed with a fast dependency parser (Lin 1993, 1994), and syntactic contexts were used to find similar lexical items for a given word or phrase. Lin treated nouns, verbs, and adjectives/adverbs separately, so these form the three parts of the thesaurus. This resource gave us a list of possible LUs, much larger than the set of LUs present in the FrameNet data.

The second component of graph construction comes from FrameNet itself. We scanned the exemplar sentences in FrameNet 1.5 and the training section of the full text annotations, and gathered a distribution over frames for each LU appearing in the FrameNet data. For a pair of LUs, we measured the Euclidean distance between their frame distributions. This distance was then converted to a similarity score and interpolated with the similarity score from Lin's dependency thesaurus. We omit further details about the interpolation and refer the reader to Das and Smith (2011). For each LU, we create a vertex and link it to the K nearest neighbor LUs under the interpolated similarity metric. The resulting graph has 64,480 vertices, 9,263 of which are labeled seeds from FrameNet 1.5 and 55,217 of which are unlabeled. Each vertex has a possible set of labels corresponding to the 877 frames defined in the lexicon. Figure 4 shows an excerpt from the constructed graph.

Figure 4
Excerpt from our constructed graph over LUs. Green LUs are observed in the FrameNet 1.5 data; above/below them are shown the most frequently observed frames that these LUs associate with. The black LUs are unobserved, and graph propagation produces a distribution over the most likely frames that they could evoke as target instances.
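To make the construction concrete, here is a minimal Python sketch of the two ingredients assumed above: a distance-to-similarity conversion for frame distributions and a K-nearest-neighbor linking step. The interpolation weight alpha and all helper names are ours (the paper's actual interpolation is described in Das and Smith 2011), so this is a sketch under stated assumptions rather than the released system.

    import math

    def frame_similarity(p, q, frames):
        # Euclidean distance between two frame distributions, mapped to a similarity
        d = math.sqrt(sum((p.get(f, 0.0) - q.get(f, 0.0)) ** 2 for f in frames))
        return 1.0 / (1.0 + d)  # one simple, monotone distance-to-similarity map

    def interpolated_similarity(u, v, thesaurus_sim, frame_dists, frames, alpha=0.5):
        sim = alpha * thesaurus_sim.get((u, v), 0.0)
        if u in frame_dists and v in frame_dists:  # both LUs seen in FrameNet data
            sim += (1 - alpha) * frame_similarity(frame_dists[u], frame_dists[v], frames)
        return sim

    def knn_edges(vertices, sim, k):
        # link each LU vertex to its K nearest neighbors under the similarity metric
        return {u: sorted(((v, sim(u, v)) for v in vertices if v != u),
                          key=lambda e: e[1], reverse=True)[:k]
                for u in vertices}

    # toy usage with hypothetical LU vertices
    lus = ["discrepancy.n", "difference.n", "ring.v"]
    sim = lambda u, v: 1.0 if {u, v} == {"discrepancy.n", "difference.n"} else 0.1
    print(knn_edges(lus, sim, k=1))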
5.5.2 Propagation by Optimization. Once the graph is constructed, the 9,263 seed vertices with supervised frame distributions are used to propagate the semantic frame information via their nearest neighbors to all vertices. Here we discuss two graph-based SSL objective functions; Das and Smith (2012) compare several other graph-based SSL algorithms for this problem, and we refer the interested reader to that paper. Let V denote the set of all vertices in our graph, V̂ ⊂ V the set of seed vertices, and F the set of all frames. Let N(v) denote the set of neighbors of vertex v ∈ V. Let q = {q1, q2, . . . , q|V|} be the set of frame distributions, one per vertex. For each seed vertex v ∈ V̂, we have a supervised frame distribution q̂v. All edges in the graph are weighted according to the aforementioned interpolated similarity score, denoted wuv for the edge adjacent to vertices u and v. We find q by solving:

    \text{NGF-}\ell_2:\quad \arg\min_{q \,:\, q \ge 0,\; \|q_v\|_1 = 1\ \forall v \in V}\ \sum_{v \in \hat{V}} \|\hat{q}_v - q_v\|_2^2 \;+\; \mu \sum_{v \in V,\, u \in N(v)} w_{uv}\, \|q_v - q_u\|_2^2 \;+\; \nu \sum_{v \in V} \Big\|q_v - \tfrac{1}{|F|}\mathbf{1}\Big\|_2^2    (4)

We call the objective in Equation (4) NGF-ℓ2 because it uses normalized probability distributions at each vertex and is a Gaussian field; it also utilizes a uniform ℓ2 penalty—the third term in the objective function. This is a multiclass generalization of the quadratic cost criterion (Bengio, Delalleau, and Le Roux 2006), also used by Subramanya, Petrov, and Pereira (2010) and Das and Petrov (2011).

Our second graph objective function is as follows:

    \text{UJSF-}\ell_{1,2}:\quad \arg\min_{q \,:\, q \ge 0}\ \sum_{v \in \hat{V}} D_{\mathrm{JS}}(\hat{q}_v \,\|\, q_v) \;+\; \mu \sum_{v \in V,\, u \in N(v)} w_{uv}\, D_{\mathrm{JS}}(q_v \,\|\, q_u) \;+\; \nu \sum_{v \in V} \|q_v\|_1^2    (5)

We call it UJSF-ℓ1,2 because it uses unnormalized probability measures at each vertex and is a Jensen-Shannon field, utilizing pairwise Jensen-Shannon divergences (Lin 1991; Burbea and Rao 2006) and a sparse ℓ1,2 penalty (Kowalski and Torrésani 2009) as the third term. Das and Smith (2012) proposed the objective function in Equation (5). It seeks a sparse measure at each graph vertex, as we expect in a lexicon (i.e., few frames have nonzero probability for a given target). Both graph objectives can be optimized by iterative updates, whose details we omit in this article; more information about the motivation behind using the ℓ1,2 penalty in the UJSF-ℓ1,2 objective, the optimization procedure, and an empirical comparison of these and other objectives on another NLP task can be found in Das and Smith (2012).
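For intuition, the quadratic criterion in Equation (4) admits simple synchronous updates: setting the gradient with respect to qv to zero expresses qv as a weighted average of its seed distribution (if any), its neighbors, and the uniform distribution. Below is a minimal Jacobi-style sketch of one such update pass; it ignores the simplex constraint on qv for brevity (the actual updates handle it; see Das and Smith 2012), and all names are illustrative.

    def propagate_pass(q, seeds, edges, frames, mu, nu):
        # one synchronous update of every vertex's frame distribution
        uniform = 1.0 / len(frames)
        new_q = {}
        for v in q:
            is_seed = 1.0 if v in seeds else 0.0
            weight_sum = sum(w for _, w in edges.get(v, []))
            denom = is_seed + mu * weight_sum + nu
            new_q[v] = {}
            for f in frames:
                num = (is_seed * seeds.get(v, {}).get(f, 0.0)
                       + mu * sum(w * q[u].get(f, 0.0) for u, w in edges.get(v, []))
                       + nu * uniform)
                new_q[v][f] = num / denom
        return new_q

Iterating such passes to convergence propagates the seed distributions along high-similarity edges, while the ν term keeps vertices far from any seed close to uniform.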
5.5.3 Constraints for Frame Identification. Once a graph-based SSL objective function is minimized, we arrive at the optimal set of frame distributions q*, which we use to constrain our frame identification inference rule, expressed in Equation (1). In that rule, ti is the ith target in a sentence x, and fi is the corresponding evoked frame. We now add a constraint to that rule. Recall from Section 5.2 that for targets with known lemmatized forms, Fi was defined to be the set of frames that associate with the lemma of ti in the supervised data; for unknown lemmas, Fi was defined to be the set of all frames in the lexicon. If the LU corresponding to ti is present in the graph, let its vertex be vi. For such targets ti covered by the graph, we redefine Fi as:

    F_i = \{f : f \in M\text{-best frames under } q^{*}_{v_i}\}    (6)

For targets ti in test data whose LUs are not present in the graph (and hence not in the supervised data), Fi remains the set of all frames. Note that in this semi-supervised extension of our frame identification inference procedure, we introduced several hyperparameters, namely μ, ν, K (the number of nearest neighbors for each vertex included in the graph), and M (the number of highest scoring frames per vertex according to the induced frame distribution). We choose these hyperparameters using cross-validation, tuning the frame identification accuracy on unseen targets. (Different values of the first three hyperparameters were chosen for the different graph objectives, and we omit their values here for brevity; M turned out to be 2 for all models.)

Table 6
Exact and partial frame identification accuracy on the FrameNet 1.5 data set, given gold targets, along with the size of the lexicon (in terms of non-zero frame components in the truncated frame distributions) used for frame identification. The supervised model is compared with alternatives in Table 5. Bold indicates best results. UJSF-ℓ1,2 produces statistically significant results (p < 0.001) for all metrics with respect to the supervised baseline, both for the unseen LUs and for the whole test set. Although the NGF-ℓ2 and UJSF-ℓ1,2 models are statistically indistinguishable, it is noteworthy that the UJSF-ℓ1,2 objective produces a much smaller lexicon.

                      UNKNOWN TARGETS              ALL TARGETS                Graph
                      exact       partial          exact       partial        Lexicon
                      matching    matching         matching    matching       Size
    Supervised        23.08       46.62            82.97       90.51          –
    Self-training     18.88       42.67            82.27       90.02          –
    NGF-ℓ2            39.86       62.35            83.51       91.02          128,960
    UJSF-ℓ1,2         42.67       65.29            83.60       91.12          45,544

Table 6 shows frame identification accuracy under both exact and partial matching. Performance is shown on the portion of the test set containing unknown LUs, as well as on the whole test set. The final column gives the lexicon size in terms of the set of truncated frame distributions (filtered according to the top M frames in qv for a vertex v) for all the LUs in a graph. For comparison with a semi-supervised baseline, we consider a self-trained system. For this system, we used the supervised frame identification system to label 70,000 sentences from the English Gigaword corpus with frame-semantic parses. For finding targets in a raw sentence, we used a relaxed target identification scheme, in which we marked as potential frame-evoking units all targets seen in the lexicon and all other words that were not prepositions, particles, proper nouns, foreign words, or WH-words. We appended these automatic annotations to the training data, resulting in 711,401 frame annotations, more than 36 times the annotated data, and used them to train a frame identification model.28 This set-up is very similar to that of Bejan (2009), who used self-training to improve frame identification. In our setting, however, self-training hurts relative to the fully supervised approach (Table 6).

Note that for the unknown part of the test set, the graph-based objectives outperform both the supervised model and the self-training baseline by a margin of ∼20% absolute. The best model is UJSF-ℓ1,2, and its performance is significantly better than that of the supervised model (p < 0.01). It also produces a smaller lexicon (owing to the sparsity-inducing penalty) than NGF-ℓ2, requiring less memory during frame identification inference. The small footprint can be attributed to the removal of LUs for which all frame components were zero (qv = 0).

28 We ran self-training with smaller amounts of data, but found no significant difference from the results achieved with 711,401 frame annotations. As we observe in Table 6, in our case self-training performs worse than the supervised model, and we do not expect improvement with even more data.
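A minimal sketch of the truncation behind Equation (6), and of the lexicon-size accounting used in the last column of Table 6, with illustrative names:

    def constrain_frames(q_star_v, all_frames, M=2):
        # Equation (6): keep the M best frames under the propagated distribution;
        # LUs absent from the graph fall back to the full frame inventory.
        if q_star_v is None:
            return set(all_frames)
        return set(sorted(q_star_v, key=q_star_v.get, reverse=True)[:M])

    def lexicon_size(q_star, M=2):
        # count non-zero components among each vertex's top-M frames
        return sum(1 for q_v in q_star.values()
                   for f in sorted(q_v, key=q_v.get, reverse=True)[:M]
                   if q_v[f] > 0.0)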
The improvements of the graph-based objectives over the supervised and self-trained models are modest for the whole test set, but the best model still achieves statistically significant improvements over the supervised model (p < 0.01).

6. Argument Identification

Given a sentence x = ⟨x1, . . . , xn⟩, the set of targets t = ⟨t1, . . . , tm⟩, and a list of evoked frames f = ⟨f1, . . . , fm⟩ corresponding to each target, argument identification is the task of choosing which of each fi's roles are filled, and by which parts of x. This task is most similar to the problem of semantic role labeling, but uses a richer set of frame-specific labels than PropBank annotations.

6.1 Model

Let R_{fi} = {r1, . . . , r_{|R_{fi}|}} denote frame fi's roles (named frame element types) observed in an exemplar sentence and/or our training set. A subset of each frame's roles are marked as core roles; these roles are conceptually and/or syntactically necessary for any given use of the frame, though they need not be overt in every sentence involving the frame. These are roughly analogous to the core arguments ARG0–ARG5 in PropBank. Non-core roles—analogous to the various ARGM-* in PropBank—loosely correspond to syntactic adjuncts, and carry broadly applicable information such as the time, place, or purpose of an event. The lexicon imposes some additional structure on roles, including relations to other roles in the same or related frames, and semantic types with respect to a small ontology (marking, for instance, that the entity filling the protagonist role must be sentient for frames of cognition). Figure 3 illustrates some of the structural elements comprising the frame lexicon by considering the CAUSE TO MAKE NOISE frame.

We identify a set S of spans that are candidates for filling any role r ∈ R_{fi}. In principle, S could contain any subsequence of x, but in this work we only consider the set of contiguous spans that (a) contain a single word or (b) comprise a valid subtree of a word and all its descendants in the dependency parse produced by the MST parser. This covers approximately 80% of arguments in the development data for both data sets. The empty span, denoted ∅, is also included in S, since some roles are not explicitly filled; in the SemEval 2007 development data, the average number of roles an evoked frame defines is 6.7, but the average number of overt arguments is only 1.7.29 In training, if a labeled argument is not a subtree of the dependency parse, we add its span to S.30 Let A_i denote the mapping of roles in R_{fi} to spans in S.

29 In the annotated data, each core role that is not overt is marked with one of three types of null instantiation indicating how the role is conveyed implicitly. For instance, the imperative construction implicitly designates a role as filled by the addressee, and the corresponding filler is thus CNI (constructional null instantiation). In this work we do not distinguish between different types of null instantiation. The interested reader may refer to Chen et al. (2010), who handle the different types of null instantiation during argument identification.
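The span-candidate heuristic is easy to state in code. Below is a minimal sketch, assuming a dependency parse given as a list of head indices (0-based tokens, -1 for the root); spans are half-open (start, end) token intervals, and None stands for the empty span ∅. The names are illustrative, not from the released system.

    def candidate_spans(heads):
        n = len(heads)
        spans = {None}                               # the empty span: role not overt
        spans.update((i, i + 1) for i in range(n))   # (a) single words
        # (b) full subtrees: a word plus all its descendants, when contiguous
        children = [[] for _ in range(n)]
        for i, h in enumerate(heads):
            if h >= 0:
                children[h].append(i)
        def descendants(i):
            out = {i}
            for c in children[i]:
                out |= descendants(c)
            return out
        for i in range(n):
            d = sorted(descendants(i))
            if d == list(range(d[0], d[-1] + 1)):    # subtree covers a contiguous span
                spans.add((d[0], d[-1] + 1))
        return spans

    # toy usage: 4 tokens, word 1 is the root
    print(sorted(s for s in candidate_spans([1, -1, 1, 2]) if s))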
Our model makes a prediction for each A_i(rk) (for all roles rk ∈ R_{fi}) using:

    A_i(r_k) \leftarrow \arg\max_{s \in S}\ p_\psi(s \mid r_k, f_i, t_i, x)    (7)

We use a conditional log-linear model over spans for each role of each evoked frame:

    p_\psi(A_i(r_k) = s \mid f_i, t_i, x) \;=\; \frac{\exp\, \psi^\top h(s, r_k, f_i, t_i, x)}{\sum_{s' \in S} \exp\, \psi^\top h(s', r_k, f_i, t_i, x)}    (8)

Note that our model chooses the span for each role separately from the other roles, and ignores all frames except the one the role belongs to. Our model departs from the traditional SRL literature by modeling the argument identification problem in a single stage, rather than first classifying token spans as arguments and then labeling them. A constraint implicit in our formulation restricts each role to have at most one overt argument, which is consistent with 96.5% of the role instances in the SemEval 2007 training data and 96.4% of the role instances in the FrameNet 1.5 full text annotations.

Out of the overt argument spans in the training data, 12% are duplicates, having been used by some previous frame in the sentence (supposing some arbitrary ordering of frames). Our role-filling model, unlike a sentence-global argument detection-and-classification approach,31 permits this sort of argument sharing among frames. Word tokens belong to an average of 1.6 argument spans, including the quarter of words that do not belong to any argument. Appending together the local inference decisions from Equation (7) gives us the best mapping Â_t for target t.

Features for our log-linear model (Equation (8)) depend on the preprocessed sentence x; the target t; a role r of frame f; and a candidate argument span s ∈ S.32 For features using the head word of the target t or of a candidate argument span s, we use the heuristic described in footnote 21 for selecting the head of non-subtree spans. Table 7 lists the feature templates used in our model. Every feature template has a version that does not take into account the role being filled (so as to incorporate overall biases). The ◐ symbol indicates that the feature template also has a variant that is conjoined with r, the name of the role being filled; and ● indicates that the feature template additionally has a variant that is conjoined with both r and f, the name of the frame.33 The role-name-only variants provide for smoothing over frames for common types of roles such as Time and Place; see Matsubayashi, Okazaki, and Tsujii (2009) for a detailed analysis of the effects of using role features at varying levels of granularity. Certain features in our model rely on closed-class POS tags, which are defined to be all Penn Treebank tags except for CD and tags that start with V, N, J, or R. Finally, features that encode a count or a number are binned into the groups (−∞, −20], [−19, −10], [−9, −5], −4, −3, −2, −1, 0, 1, 2, 3, 4, [5, 9], [10, 19], [20, ∞).

30 Here is an example from the FrameNet 1.5 training data where this occurs. In the sentence As capital of Europe's most explosive economy, Dublin seems to be changing before your very eyes, the word economy evokes the ECONOMY frame, with the phrase most explosive filling the Descriptor role. In the dependency parse of the sentence, however, that phrase is not a subtree, because both of its words attach to the word economy. Future work may consider better heuristics for selecting potential arguments from dependency parses, so as to recover more gold arguments than the current approach does.
31 J&N'07, like us, identify arguments for each target.
32 In this section we use t, f, and r without subscripts because the features only consider a single role of a single target's frame.
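A minimal sketch of the per-role decision rule in Equations (7) and (8); the feature function below is a toy stand-in for h, and all names are illustrative.

    import math

    def role_posterior(weights, h, spans, role):
        # log-linear distribution over candidate spans for one role (Equation (8))
        scores = {s: sum(weights.get(k, 0.0) * v for k, v in h(s, role).items())
                  for s in spans}
        m = max(scores.values())
        z = sum(math.exp(v - m) for v in scores.values())
        return {s: math.exp(v - m) / z for s, v in scores.items()}

    def fill_role(weights, h, spans, role):
        post = role_posterior(weights, h, spans, role)   # Equation (7): argmax
        return max(post, key=post.get)

    # toy usage: two candidate spans plus the empty span (None)
    h = lambda s, r: {"bias": 1.0, "overt&role=" + r: 0.0 if s is None else 1.0}
    w = {"overt&role=Agent": 0.7}
    print(fill_role(w, h, [None, (0, 1), (1, 3)], "Agent"))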
Table 7
Features used for argument identification. Section 6.1 describes the meanings of the different circles attached to each feature.

Features with both null and non-null variants. These features come in two flavors: if the argument is null, then one version fires; if it is overt (non-null), then another version fires.
    ○ some word in t has lemma λ
    ◐ some word in t has lemma λ, and the sentence uses PASSIVE voice
    ◐ some word in t has lemma λ, and the sentence uses ACTIVE voice
    ◐ the head of t has subcategorization sequence τ = ⟨τ1, τ2, . . .⟩
    ◐ some syntactic dependent of the head of t has dependency type τ
    ○ the head of t has c syntactic dependents
    ○ some word in t has POS π
    ○ bias feature (always fires)

Span content features: apply to overt argument candidates.
    ● POS tag π occurs for some word in s
    ● the first word of s has POS π
    ● the last word of s has POS π
    ● the first word of s has lemma λ
    ● the head word of s has lemma λ
    ● the last word of s, w_{s|s|}, has lemma λ
    ● the head word of s has POS π
    ● the head word of s has syntactic dependency type τ
    ○ |s|, the number of words in the span
    ○ the first word of s, w_{s1}, and its POS tag π_{s1}, if π_{s1} is a closed-class POS
    ○ w_{s2} and its closed-class POS tag π_{s2}, provided that |s| ≥ 2
    ○ w_{s|s|} and its closed-class POS tag π_{s|s|}, provided that |s| ≥ 3
    ○ the syntactic dependency type τ_{s1} of the first word with respect to its head
    ○ τ_{s2}, provided that |s| ≥ 2
    ○ τ_{s|s|}, provided that |s| ≥ 3
    ◐ lemma λ is realized in some word in s
    ◐ lemma λ is realized in some word in s, with the voice denoted in the span (ACTIVE or PASSIVE)
    ◐ lemma λ is realized in some word in s, with the voice denoted in the span and s's position with respect to t (BEFORE, AFTER, or OVERLAPPING)

Syntactic features: apply to overt argument candidates.
    ● dependency path: sequence of labeled, directed edges from the head word of s to the head word of t
    ● length of the dependency path

Span context POS features: for overt candidates, up to 6 of these features will be active.
    ● a word with POS π occurs up to 3 words before the first word of s
    ● a word with POS π occurs up to 3 words after the last word of s

Ordering features: apply to overt argument candidates.
    ○ the position of s with respect to the span of t: BEFORE, AFTER, or OVERLAPPING (i.e., there is at least one word shared by s and t)
    ● target–argument crossing: there is at least one word shared by s and t, at least one word in s that is not in t, and at least one word in t that is not in s
    ● linear word distance between the nearest word of s and the nearest word of t, provided s and t do not overlap
    ● linear word distance between the middle word of s and the middle word of t, provided s and t do not overlap

33 That is, the ● symbol subsumes ◐, which in turn subsumes ○.
6.2 Parameter Estimation

We train the argument identification model by maximizing the regularized conditional log-likelihood:

    \max_\psi\ \sum_{j=1}^{N} \sum_{i=1}^{m_j} \sum_{k=1}^{|R_{f_i^{(j)}}|} \log p_\psi\big(A_i^{(j)}(r_k) \mid f_i^{(j)}, t_i^{(j)}, x^{(j)}\big) \;-\; C\,\|\psi\|_2^2    (9)

Here, N is the number of data points (sentences) in the training set, and mj is the number of frame annotations in sentence j. This objective function is concave. For experiments with the SemEval 2007 data, we trained the model using stochastic gradient ascent (Bottou 2004) with no Gaussian regularization (C = 0).34 Early stopping was done by tuning on the development set, and the best results were obtained with a batch size of 2 and 23 passes through the data. On the FrameNet 1.5 release, we trained this model using L-BFGS (Liu and Nocedal 1989), run for 1,000 iterations; C was tuned on the development data, and we obtained the best results with C = 1.0. We did not use stochastic gradient ascent for this data set because the number of training instances increased, and parallelizing L-BFGS on a multicore setup implementing MPI (Gropp, Lusk, and Skjellum 1994) gave faster training.

6.3 Decoding with Beam Search

Naive prediction of roles using Equation (7) may result in overlap among arguments filling different roles of a frame, because the argument identification model fills each role independently of the others. We want to enforce the constraint that two roles of a single frame cannot be filled by overlapping spans.35 Toutanova, Haghighi, and Manning (2005) presented a dynamic programming algorithm to prevent overlapping arguments for SRL; however, their approach took an orthogonal view of the argument identification stage, labeling phrase-structure tree constituents with semantic roles. That formulation admitted a dynamic programming approach; our formulation of finding the best argument span for each role does not. To eliminate illegal overlap, we adopt the beam search technique detailed in Algorithm 1. The algorithm produces a set of k-best hypotheses for a frame instance's full set of role–span pairs, but uses an approximation in order to avoid scoring an exponential number of hypotheses. After determining which roles are most likely not explicitly filled, it considers each of the other roles in turn: in each iteration, hypotheses incorporating a subset of roles are extended with high-scoring spans for the next role, always maintaining k alternatives. We set k = 10,000 as the beam width.36

34 This was the setting used by Das et al. (2010), and we kept it unchanged.
35 On rare occasions a frame annotation may include a secondary frame element layer, allowing arguments to be shared among multiple roles in the frame; see Ruppenhofer et al. (2006) for details. The evaluation for this task only considers the primary layer, which is guaranteed to have disjoint arguments.
36 We show the effect of varying beam widths in Table 9, where we present the performance of an exact algorithm for argument identification.
Algorithm 1 Joint decoding of frame fi's arguments via beam search. topk(S, pψ, rj) extracts the k most probable spans from S, under pψ, for role rj. extend(D^{0:(j−1)}, S′) extends each span vector in D^{0:(j−1)} with the most probable non-overlapping span from S′, resulting in the k best extensions overall.

Require: k > 0, R_{fi}, S, the distribution pψ from Equation (8) for each role rj
Ensure: Â_i, a high-scoring mapping of roles of fi to spans with no token overlap among the spans
 1: Calculate A_i according to Equation (7)
 2: ∀r ∈ R_{fi} such that A_i(r) = ∅, let Â_i(r) ← ∅
 3: R+_{fi} ← {r : r ∈ R_{fi}, A_i(r) ≠ ∅}
 4: n ← |R+_{fi}|
 5: Arbitrarily order R+_{fi} as {r1, r2, . . . , rn}
 6: Let D^{0:j} = ⟨D^{0:j}_1, . . . , D^{0:j}_k⟩ refer to the k-best list of vectors of compatible filler spans for roles r1 through rj
 7: Initialize D^{0:0} to be empty
 8: for j = 1 to n do
 9:   D^{0:j} ← extend(D^{0:(j−1)}, topk(S, pψ, rj))
10: end for
11: ∀j ∈ {1, . . . , n}, Â_i(rj) ← D^{0:n}_1[j]
12: return Â_i
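A minimal Python sketch of Algorithm 1, with two simplifications relative to the pseudocode: roles are processed in the order given rather than after separating out the predicted-null roles, and hypotheses are scored by summed log-probabilities. Spans are (start, end) pairs, None is the empty span, and all names are illustrative.

    def overlaps(s1, s2):
        return (s1 is not None and s2 is not None
                and s1[0] < s2[1] and s2[0] < s1[1])

    def beam_decode(roles, spans, logp, k):
        # logp[r][s]: log p_psi(s | r, f, t, x), as in Equation (8)
        beam = [((), 0.0)]  # k-best partial span vectors with accumulated scores
        for r in roles:
            top = sorted(spans, key=lambda s: logp[r][s], reverse=True)[:k]
            cand = [(vec + (s,), sc + logp[r][s])
                    for vec, sc in beam for s in top
                    if not any(overlaps(s, prev) for prev in vec)]
            beam = sorted(cand, key=lambda c: c[1], reverse=True)[:k]
        return dict(zip(roles, beam[0][0]))

    # toy usage: two roles competing for overlapping spans
    spans = [None, (0, 2), (1, 3)]
    logp = {"Entity_1": {None: -3.0, (0, 2): -0.2, (1, 3): -1.0},
            "Entity_2": {None: -1.5, (0, 2): -0.9, (1, 3): -0.4}}
    print(beam_decode(["Entity_1", "Entity_2"], spans, logp, k=4))

In the toy example the two roles' best spans overlap, so the decoder keeps the higher-scoring span and maps the other role to the empty span, exactly the behavior the overlap constraint is meant to enforce.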

6.4 结果

Performance of the argument identification model is presented in Table 8 for both data
sets under consideration; we analyze each in turn.

SemEval 2007 数据: For the SemEval data set, the table shows how performance
varies given different types of input: correct targets and correct frames, correct targets
but automatically identified frames, 最终, no oracle input (the full frame
parsing scenario). Rows 1–2 isolate the argument identification task from the frame
identification task. Given gold targets and frames, our argument identification model
(without beam search) gets an F1 score of 68.09%; when beam search is applied, 这
increases to 68.46%, with a noticeable increase in precision. Note that an estimated 19%
of correct arguments are excluded because they are neither single words nor complete
subtrees (参见章节 6.1) of the automatic dependency parses.37

Qualitatively, the problem of candidate span recall seems to be largely due to
syntactic parse errors.38 Although our performance is limited by errors made when using
the syntactic parse to determine candidate spans, it could still improve; this suggests
that the model has trouble discriminating between good and bad arguments, and that
additional feature engineering, or jointly decoding the arguments of a sentence's frames,
may be beneficial.

37 We found that using all constituents from the 10-best syntactic parses would improve oracle recall of

spans in the development set by just a couple of percentage points, at the computational cost of a larger
pool of candidate arguments per role.

38 注意, because of our labels-only evaluation scheme (部分 3.3), arguments missing a word or

containing an extra word receive no credit. 实际上, of the frame roles correctly predicted as having an
overt span, the correct span was predicted 66% 当时的, 尽管 10% of the time the predicted starting
and ending boundaries of the span were off by a total of one or two words.

Table 8
Argument identification results on both the SemEval 2007 data as well as the full text annotations of FrameNet 1.5. For decoding, beam and naive are also significant improvements relative to the baseline (p < 0.05). The best result on full frame-semantic parsing (target, frame, and argument identification) is achieved by the frame identification model that uses the UJSF-ℓ1,2 graph objective and automatic argument identification using beam search; this result is statistically significant over the supervised results shown in row 9 (p < 0.001). In terms of precision and F1 score measured with partial frame matching, the results with the UJSF-ℓ1,2 model are statistically significant over the NGF-ℓ2 model (p < 0.05); for recall with partial frame matching, and for all three metrics with exact frame matching, the results with the two graph objectives are statistically indistinguishable. Note that certain partial match results are missing because in those settings gold frames have been used for argument identification.

    ARGUMENT IDENTIFICATION                                                      exact matching          partial matching
         setting                              targets    frames                 decoding  P      R      F1      P      R      F1
    SemEval 2007 Data
     1   Argument identification              gold       gold                   naive     77.43  60.76  68.09   –      –      –
     2   Argument identification              gold       gold                   beam      78.71  60.57  68.46   –      –      –
     3   Parsing (oracle targets)             gold       supervised (§5.2)      beam      49.68  42.82  46.00   57.85  49.86  53.56
     4   Parsing (full)                       auto (§4)  supervised (§5.2)      beam      58.08  38.76  46.49   62.76  41.89  50.24
     5   Parsing (J&N'07 targets and frames)  J&N'07     J&N'07 (§3.4)          beam      56.26  36.63  44.37   60.98  39.70  48.09
     6   Baseline: J&N'07                     J&N'07     J&N'07 (§3.4)          N/A       51.59  35.44  42.01   56.01  38.48  45.62
    FrameNet 1.5 Release
     7   Argument identification (full)       gold       gold                   naive     82.00  76.36  79.08   –      –      –
     8   Argument identification (full)       gold       gold                   beam      83.83  76.28  79.88   –      –      –
     9   Parsing (oracle targets)             gold       supervised (§5.2)      beam      67.81  60.68  64.05   72.47  64.85  68.45
    10   Parsing (oracle targets)             gold       SSL (NGF-ℓ2, §5.5)     beam      68.22  61.04  64.43   72.87  65.20  68.82
    11   Parsing (oracle targets)             gold       SSL (UJSF-ℓ1,2, §5.5)  beam      68.33  61.14  64.54   72.98  65.30  68.93
Rows 3–4 show the effect of automatic supervised frame identification on overall frame-parsing performance: there is a 22% absolute decrease in F1 (18% when partial credit is given for related frames), suggesting that improved frame identification, or joint prediction of frames and arguments, is likely to have a sizeable impact on overall performance.

Rows 4–6 compare our full model (target, frame, and argument identification) with the baseline, showing a significant improvement of more than 4.4 F1 points for both exact and partial frame matching. As with frame identification, we compared the argument identification stage with that of J&N'07 in isolation, using the automatically identified targets and frames from the latter as input to our model. As shown in row 5, with partial frame matching this gave us an F1 score of 48.1% on the test set—significantly better (p < 0.05) than 45.6%, the full parsing result from J&N'07 (row 6 in Table 8). This indicates that our argument identification model—which uses a single discriminative model with a large number of features for role filling (rather than argument labeling)—is more accurate than the previous state of the art.

FrameNet 1.5 Release: Rows 7–11 show results on the newer data set, which is part of the FrameNet 1.5 release. As in the frame identification results of Table 5, we do not show results using predicted targets, since we only test the performance of the statistical models. First, we observe that for results with gold frames, the F1 score is 79.08% with naive decoding, which is significantly higher than the SemEval counterpart. This indicates that increased training data greatly improves performance on the task. We also observe that beam search improves precision by nearly 2% while eliminating overlapping arguments. When both model frames and model arguments are used, we get an F1 score of 68.45%, which is encouraging in comparison to the best results we achieved on the SemEval 2007 data set. Semi-supervised lexicon expansion for frame identification further improves parsing performance: we observe the best results when the UJSF-ℓ1,2 graph objective is used for frame identification, significantly outperforming the fully supervised model on parsing (p < 0.001) for all evaluation metrics. The improvements with SSL can be explained by noting that frame identification performance goes up when the graph objectives are used, which carries over to argument identification. Figure 5 shows an example where the graph-based model UJSF-ℓ1,2 corrects an error made by the fully supervised model for the unseen LU discrepancy.N, both for frame identification and for full frame-semantic parsing.

7. Collective Argument Identification with Constraints

The argument identification strategy described in the previous section does not capture some facets of semantic knowledge represented declaratively in FrameNet. In this section, we present an approach that exploits such knowledge in a principled, unified, and intuitive way. In prior NLP research using FrameNet, these interactions have been largely ignored, though they have the potential to improve the quality and consistency of semantic analysis. The beam search technique (Algorithm 1) handles one kind of constraint: avoiding argument overlaps.
It is, however, approximate, and it cannot handle other forms of constraints. Here we present an algorithm that exactly identifies the best full collection of arguments of a target given its semantic frame.

Figure 5
(a) Output of the supervised frame-semantic parsing model, with beam search for argument identification, given the target discrepancies; this output is incorrect. (b) Output using the constrained frame identification model that takes into account the graph-based frame distributions over unknown predicates; in this example the UJSF-ℓ1,2 graph objective is used, and the output matches the gold annotation. The LU discrepancy.N is unseen in the supervised FrameNet data.

Although we work within the conventions of FrameNet, our approach is generalizable to other SRL frameworks. We model argument identification as constrained optimization, where the constraints come from expert knowledge encoded in FrameNet. Following prior work on PropBank-style SRL that dealt with similar constrained problems (Punyakanok et al. 2004; Punyakanok, Roth, and Yih 2008, inter alia), we incorporate this declarative knowledge in an integer linear program (ILP). Because general-purpose ILP solvers are proprietary and do not fully exploit the structure of the problem, we turn to a class of optimization techniques called dual decomposition (Komodakis, Paragios, and Tziritas 2007; Rush et al. 2010; Martins et al. 2011a). We derive a modular, extensible, parallelizable approach in which semantic constraints map not just to declarative components in the algorithm, but also to procedural ones, in the form of "workers." Although dual decomposition algorithms only solve a relaxation of the original problem, we make our approach exact by wrapping the algorithm in a branch-and-bound search procedure.39

We experimentally find that our algorithm achieves accuracy comparable to the results presented in Table 8, while respecting all imposed linguistic constraints. In comparison with beam search, which violates many of these constraints, the presented exact decoder is slower, but it decodes nine times faster than CPLEX, a state-of-the-art, proprietary, general-purpose exact ILP solver.40

39 Open-source code in C++ implementing the AD3 algorithm can be found at http://www.ark.cs.cmu.edu/AD3.
40 See http://www-01.ibm.com/software/integration/optimization/cplex-optimizer.

7.1 Joint Inference

Here, we take a declarative approach to modeling argument identification using an ILP, and relate our formulation to prior work in shallow semantic parsing. We show how knowledge specified in a linguistic resource (FrameNet in our case) can be used to derive the constraints in our ILP. Finally, we draw connections between our specification and graphical models, a popular formalism in artificial intelligence and machine learning, and describe how the constraints can be treated as factors in a factor graph.
7.1.1 Declarative Specification. Let us simplify notation by considering a given target t, dropping its index in the sentence x; let the semantic frame it evokes be f. To evaluate argument identification in isolation, we assume that the semantic frame f is given, which is traditionally the case in controlled experiments used to evaluate SRL systems (Màrquez et al. 2008). Let the set of roles associated with the frame f be R_f. In the sentence x, the set of candidate spans of words that might fill each role is enumerated, usually following an overgenerating heuristic, as described in Section 6.1; as before, we call this set of spans S. This set also includes the null span ∅; connecting it to a role r ∈ R_f denotes that the role is not overt. Our approach assumes a scoring function that gives the strength of association between roles and candidate spans. For each role r ∈ R_f and span s ∈ S, this score is parameterized as

    c(r, s) = \psi^\top h(s, r, f, t, x),    (10)

where ψ are model weights and h is a feature function that looks at the target t, the evoked frame f, the sentence x and its syntactic analysis, along with r and s. This scoring function is identical in form to the numerator's exponent in the log-linear model described in Equation (8). The SRL literature provides many feature functions of this form and many ways to use machine learning to acquire ψ. Our method makes no assumption about the score except that it has the form in Equation (10).

We define a vector z of binary variables z_{r,s} ∈ {0, 1}, one for every role and span pair: z ∈ {0, 1}^d, where d = |R_f| × |S|. Setting z_{r,s} = 1 means that role r is filled by span s. Given the binary vector z, it is straightforward to recover the collection of arguments by checking which components z_{r,s} have an assignment of 1; we use this strategy to find arguments, as described in Section 7.3 (strategies 4 and 6). The joint argument identification task can be represented as the constrained optimization problem

    maximize         \sum_{r \in R_f} \sum_{s \in S} c(r, s) \, z_{r,s}
    with respect to  z \in \{0, 1\}^d
    such that        A z \le b    (11)

In the last line, A is a k × d matrix and b is a vector of length k; thus Az ≤ b is a set of k inequalities representing constraints imposed on the mapping between roles and spans. These are motivated on linguistic grounds and described next.41

Uniqueness. Each role r is filled by exactly one span in S (possibly the null span ∅). This constraint can be expressed as

    \forall r \in R_f, \quad \sum_{s \in S} z_{r,s} = 1    (12)

There are O(|R_f|) such constraints. Note that because S contains the null span ∅, non-overt roles are also captured by these constraints. Such a constraint is used extensively in the prior literature (Punyakanok, Roth, and Yih 2008, Section 3.4.1).

Overlap. SRL systems commonly constrain roles to be filled by non-overlapping spans. For example, Toutanova, Haghighi, and Manning (2005) used dynamic programming over a phrase-structure tree to prevent overlaps between arguments, and Punyakanok, Roth, and Yih (2008) used constraints in an ILP to respect this requirement.

41 Note that equality constraints a · z = b can be transformed into the double-sided inequalities a · z ≤ b and −a · z ≤ −b.
Inspired by the latter, we require that each sentence position of x be covered by at most one argument of t. We define

    G(i) = \{s \mid s \in S,\ s \text{ covers position } i \text{ in } x\}    (13)

We can then define our overlap constraints in terms of G, one for every sentence position i:

    \forall i \in \{1, \ldots, |x|\}, \quad \sum_{r \in R_f} \sum_{s \in G(i)} z_{r,s} \le 1    (14)

This gives us O(|x|) constraints. It is worth noting that this constraint aims to achieve the same effect as beam search (Section 6.3), which tries to avoid argument overlap greedily.

Pairwise "Exclusions." For many target classes, there are pairs of roles forbidden to appear together in the analysis of a single target token. Consider the following two sentences:

    (1) [A blackberry]_{Entity 1} resembles [a loganberry]_{Entity 2}.

    (2) [Most berries]_{Entities} resemble each other.

Consider the uninflected target resemble in both sentences, evoking the same meaning. In Example (1), two roles, which we call Entity 1 and Entity 2, describe two entities that are similar to each other. In the second sentence, a single phrase fulfills a third role, called Entities, that collectively denotes some objects that are similar. It is clear that the roles Entity 1 and Entities cannot be overt for the same target at once, because the latter already captures the function of the former; a similar argument holds for the Entity 2 and Entities roles. We call this phenomenon the "excludes" relationship. Let us define the set of pairs from R_f that have this relationship:

    \mathit{Excl}_f = \{(r_i, r_j) \mid r_i \text{ and } r_j \text{ exclude each other}\}
Using this set, we define the constraints

    \forall (r_i, r_j) \in \mathit{Excl}_f, \quad z_{r_i,\emptyset} + z_{r_j,\emptyset} \ge 1    (15)

If both roles are overt in a parse, this constraint is violated, contravening the "excludes" relationship specified between the pair of roles. If neither role, or only one of them, is overt, the constraint is satisfied. The total number of such constraints is O(|Excl_f|), the number of pairwise "excludes" relationships of a given frame.

Pairwise "Requirements." The sentence in Example (1) illustrates another kind of constraint: the target resemble cannot have only one of Entity 1 and Entity 2 overt in text. For example,

    (3) * [A blackberry]_{Entity 1} resembles.

Enforcing the joint overtness of two roles sharing this "requires" relationship is straightforward. We define the following set for a frame f:

    \mathit{Req}_f = \{(r_i, r_j) \mid r_i \text{ and } r_j \text{ require each other}\}

This leads to constraints of the form

    \forall (r_i, r_j) \in \mathit{Req}_f, \quad z_{r_i,\emptyset} - z_{r_j,\emptyset} = 0    (16)

If one role is overt (or absent), the other must be as well. A related constraint has been used previously in the SRL literature to enforce joint overtness relationships between core arguments and referential arguments (Punyakanok, Roth, and Yih 2008, Section 3.4.1), which are formally similar to our example.42

7.1.2 Integer Linear Program and Relaxation. Plugging the constraints in Equations (12), (14), (15), and (16) into the last line of Equation (11), we obtain the argument identification problem expressed as an ILP, since the indicator variables z are binary. Apart from the ILP formulation, we will also consider the following relaxation of Equation (11), which replaces the binary constraint z ∈ {0, 1}^d by the unit interval constraint z ∈ [0, 1]^d, yielding a linear program:

    maximize         \sum_{r \in R_f} \sum_{s \in S} c(r, s) \, z_{r,s}
    with respect to  z \in [0, 1]^d
    such that        A z \le b    (17)

There are several LP and ILP solvers available, and a great deal of effort has been spent by the optimization community on devising efficient generic solvers. An example is CPLEX, a state-of-the-art solver for mixed integer programming, which we use as a baseline to solve the ILP in Equation (11) as well as its LP relaxation in Equation (17). Like many of the best implementations, CPLEX is proprietary.

42 We noticed that, in the annotated data, the "requires" constraint is sometimes violated by the FrameNet annotators. This happens mostly when one of the required roles is absent from the sentence containing the target and is instead instantiated in an earlier sentence (Gerber and Chai 2010). We apply the hard constraint in Equation (16), though extending our algorithm to seek arguments outside the sentence is straightforward; for preliminary work extending SEMAFOR this way, see Chen et al. (2010).
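For concreteness, here is a minimal sketch of the LP relaxation in Equation (17) on a toy instance, using scipy's generic linprog routine as a stand-in for CPLEX. The roles, spans, and scores below are invented for illustration; the uniqueness constraints become equality rows and the per-position overlap constraints become inequality rows.

    import numpy as np
    from scipy.optimize import linprog

    roles = ["Entity_1", "Entity_2"]
    spans = [None, (0, 2), (1, 3)]          # token spans; None is the null span
    pairs = [(r, s) for r in roles for s in spans]
    idx = {p: k for k, p in enumerate(pairs)}

    c = np.zeros(len(pairs))                 # toy c(r, s) scores
    c[idx[("Entity_1", (0, 2))]] = 2.0
    c[idx[("Entity_2", (1, 3))]] = 1.5

    A_eq = np.zeros((len(roles), len(pairs)))  # uniqueness: each role fills one span
    for a, r in enumerate(roles):
        for s in spans:
            A_eq[a, idx[(r, s)]] = 1.0
    b_eq = np.ones(len(roles))

    A_ub = np.zeros((3, len(pairs)))         # overlap: each position covered <= once
    for i in range(3):
        for r in roles:
            for s in spans:
                if s is not None and s[0] <= i < s[1]:
                    A_ub[i, idx[(r, s)]] = 1.0
    b_ub = np.ones(3)

    # maximize c.z  ==  minimize -c.z, with z in [0, 1]^d
    res = linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    print(dict(zip(pairs, res.x.round(2))))  # integral here: Entity_2 maps to None

In this toy problem the two scored spans overlap at position 1, so the relaxed optimum assigns Entity_1 its span and sends Entity_2 to the null span, which also happens to be integral.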
7.1.3 Linguistic Constraints from FrameNet. Although enforcing these four sets of constraints is intuitive from a general linguistic perspective, we ground their use in definitive linguistic information present in the FrameNet lexicon. From the annotated data in the FrameNet 1.5 release, we found that only 3.6% of the time is a role instantiated multiple times by different spans in a sentence; this justifies the uniqueness constraint enforced by Equation (12). Use of such a constraint is also consistent with prior work in frame-semantic parsing (Johansson and Nugues 2007). Similarly, we found that in the annotations, no arguments overlap with each other for a given target; hence, the overlap constraints in Equation (14) are also justified.

Our third and fourth sets of constraints, presented in Equations (15) and (16), come from FrameNet, too. Examples (1) and (2) are instances where the target resemble evokes the SIMILARITY frame, which is defined in FrameNet as:

    Two or more distinct entities, which may be concrete or abstract objects or types, are characterized as being similar to each other. Depending on figure/ground relations, the entities may be expressed in two distinct frame elements and constituents, Entity 1 and Entity 2, or jointly as a single frame element and constituent, Entities.

For this frame, the lexicon lists several roles other than the three we have already observed, such as Dimension (the dimension along which the entities are similar) and Differentiating fact (a fact that reveals how the concerned entities are similar or different). Along with the roles, FrameNet also declares the "excludes" and "requires" relationships noted in our discussion in Section 7.1.1. The case of the SIMILARITY frame is not unique: in Figure 1, the frame COLLABORATION, evoked by the target partners, also has two roles, Partner 1 and Partner 2, that share the "requires" relationship. In fact, out of the 877 frames in FrameNet 1.5, 204 frames have at least one pair of roles for which the "excludes" relationship holds, and 54 list at least one pair of roles that share the "requires" relationship.

7.1.4 Constraints as Factors in a Graphical Model. The LP in Equation (17) can be represented as a maximum a posteriori inference problem in an undirected graphical model. In the factor graph, each component z_{r,s} of the vector z corresponds to a binary variable, and each instantiation of a constraint in Equations (12), (14), (15), and (16) corresponds to a factor. Smith and Eisner (2008) and Martins et al. (2010) used such a representation to impose constraints in a dependency parsing problem; the latter discussed the equivalence of linear programs and factor graphs for representing discrete optimization problems. All of our constraints take standard factor forms, which we can describe using the terminology of Smith and Eisner and of Martins et al.: the uniqueness constraint in Equation (12) corresponds to an XOR factor; the overlap constraint in Equation (14) corresponds to an ATMOSTONE factor; the constraints in Equation (15) enforcing the "excludes" relationship can be represented with an OR factor; and each "requires" constraint in Equation (16) is equivalent to an XORWITHOUTPUT factor.

In the following section, we describe how we arrive at solutions for the LP in Equation (17) using dual decomposition, and how we adapt it to efficiently recover the exact solution of the ILP (Equation (11)) without the need for an off-the-shelf ILP solver.

7.2 "Augmented" Dual Decomposition

Dual decomposition methods address complex optimization problems in the dual, by dividing them into simple worker problems (subproblems), which are repeatedly solved until a consensus is reached. The simplest technique relies on the subgradient algorithm (Komodakis, Paragios, and Tziritas 2007; Rush et al. 2010); as an alternative, Martins et al. (2011a, 2011b) proposed an augmented Lagrangian technique, which is more suitable when there are many small components—commonly the case in declaratively constrained problems like the one at hand. Here, we present a brief overview of the latter technique, called AD3.

Let us start by establishing some notation. Let m ∈ {1, . . . , M} index a factor, and denote by i(m) the vector of indices of variables linked to that factor. (Recall that each factor represents the instantiation of a constraint.) We introduce a new set of variables, u ∈ R^d, called the "witness" vector. We split the vector z into M overlapping pieces z_1, . . . , z_M, where each z_m ∈ [0, 1]^{|i(m)|}, and add M constraints z_m = u_{i(m)} to impose that all the pieces must agree with the witness (and therefore with each other). Each of the M constraints described in Section 7.1 can be encoded with its own matrix A_m and vector b_m (which jointly define A and b in Equation (17)). For convenience, we denote by c ∈ R^d the score vector, whose components are c(r, s) for each r ∈ R_f and s ∈ S (Equation (10)), and define the following scores for the mth subproblem:

    c_m(r, s) = \delta(r, s)^{-1}\, c(r, s), \quad \forall (r, s) \in i(m)

where δ(r, s) is the number of constraints that involve role r and span s. Note that c \cdot z = \sum_{m=1}^{M} c_m \cdot z_m.
7.2 “Augmented” Dual Decomposition

Dual decomposition methods address complex optimization problems in the dual, by dividing them into simple worker problems (subproblems), which are repeatedly solved until a consensus is reached. The simplest technique relies on the subgradient algorithm (Komodakis, Paragios, and Tziritas 2007; Rush et al. 2010); as an alternative, Martins et al. (2011a, 2011b) proposed an augmented Lagrangian technique, which is more suitable when there are many small components—commonly the case in declarative constrained problems, like the one at hand. Here, we present a brief overview of the latter, which is called AD3.

Let us start by establishing some notation. Let m ∈ {1, ..., M} index a factor, and denote by i(m) the vector of indices of variables linked to that factor. (Recall that each factor represents the instantiation of a constraint.) We introduce a new set of variables, u ∈ R^d, called the “witness” vector. We split the vector z into M overlapping pieces z_1, ..., z_M, where each z_m ∈ [0, 1]^{|i(m)|}, and add M constraints z_m = u_{i(m)} to impose that all the pieces must agree with the witness (and therefore with each other). Each of the M constraints described in Section 7.1 can be encoded with its own matrix A_m and vector b_m (which jointly define A and b in Equation (17)). For convenience, we denote by c ∈ R^d the score vector, whose components are c(r, s), for each r ∈ R_f and s ∈ S (Equation (10)), and define the following scores for the mth subproblem:

c_m(r, s) = δ(r, s)^{-1} c(r, s),   ∀(r, s) ∈ i(m),

where δ(r, s) is the number of constraints that involve role r and span s. Note that c · z = Σ_{m=1}^{M} c_m · z_m. We can rewrite the LP in Equation (17) according to this definition, in the following equivalent form:

maximize        Σ_{m=1}^{M} c_m · z_m
with respect to u ∈ R^d,  z_m ∈ [0, 1]^{|i(m)|}, ∀m
such that       A_m z_m ≤ b_m, ∀m
                z_m = u_{i(m)}, ∀m.                            (18)

We introduce Lagrange multipliers λ_m for the equality constraints in the last line. The AD3 algorithm is depicted as Algorithm 2. Like dual decomposition approaches, it repeatedly performs a broadcast operation (the z_m-updates, which can be done in parallel, one constraint per “worker”) and a gather operation (the u- and λ-updates). Each u-update can be seen as an averaged vote that takes each worker's results into consideration. Like in the subgradient method, the λ-updates can be regarded as price adjustments, which will affect the next round of z_m-updates. The only difference with respect to the subgradient method (Rush et al. 2010) is that each subproblem involved in a z_m-update also has a quadratic penalty that penalizes deviations from the previous average voting; it is this term that accelerates consensus and therefore convergence.
Algorithm 2 AD3 for Argument Identification

Require: role-span matching scores c := ⟨c(r, s)⟩_{r,s}, structural constraints ⟨A_m, b_m⟩_{m=1}^{M}, penalty ρ > 0
1:  initialize t ← 1
2:  initialize u^1 uniformly (i.e., u^1(r, s) = 0.5, ∀r, s)
3:  initialize each λ_m^1 = 0, ∀m ∈ {1, ..., M}
4:  repeat
5:     for each m = 1, ..., M do
6:        make a z_m-update by finding the best scoring analysis for the mth constraint, with penalties for deviating from the consensus u:

          z_m^{(t+1)} ← argmax_{A_m z_m ≤ b_m} (c_m + λ_m^t) · z_m − (ρ/2) ‖z_m − u_{i(m)}^t‖²      (19)

7:     end for
8:     make a u-update by updating the consensus solution, averaging z_1, ..., z_M:

          u^{(t+1)}(r, s) ← (1/δ(r, s)) Σ_{m : (r,s) ∈ i(m)} z_m^{(t+1)}(r, s)

9:     make a λ-update:

          λ_m^{(t+1)} ← λ_m^t − ρ (z_m^{(t+1)} − u_{i(m)}^{(t+1)}),   ∀m

10:    t ← t + 1
11: until convergence
Ensure: relaxed primal solution u* and dual solution λ*. If u* is integer, it will encode an assignment of spans to roles. Otherwise, it will provide an upper bound of the true optimum.
Martins et al. (2011b) also provide stopping criteria for the iterative updates using primal and dual residuals that measure convergence; we refer the reader to that paper for details.

A key attraction of this algorithm is that all the components of the declarative
specification remain intact in the procedural form. Each worker corresponds exactly
to one constraint in the ILP, which corresponds to one linguistic constraint. There is no
need to work out when, during the procedure, each constraint might have an effect, 作为
in beam search.

7.2.1 Solving the Subproblems. In a different application, Martins et al. (2011乙, 部分 4)
showed how to solve each zm-subproblem associated with the XOR, XORWITHOUTPUT
and OR factors in runtime O(|我(米)| 日志 |我(米)|). The only subproblem that remains is that
of the ATMOSTONE factor; a solution with the same runtime is given in Appendix B.

7.2.2 Exact Decoding. It is worth recalling that AD3, like other dual decomposition
algorithms, solves a relaxation of the actual problem. Although we have observed that
the relaxation is often tight (cf. Section 7.3), this is not always the case. Specifically, a
fractional solution may be obtained, which is not interpretable as an argument assignment,
and therefore it is desirable to have a strategy to recover the exact solution. Two observations
are noteworthy. First, the optimal value of the relaxed problem (Equation (17)) provides
an upper bound to the original problem (Equation (11)). This is because Equation (11)
has the additional integer constraint on the variables. In particular, any feasible dual
point provides an upper bound to the original problem’s optimal value. Second, during
execution of the AD3 algorithm, we always keep track of a sequence of feasible
dual points. Therefore, each iteration constructs tighter and tighter upper bounds.
With this machinery, we have all that is necessary for implementing a branch-and-
bound search that finds the exact solution of the ILP. The procedure works recursively
as follows:

1. Initialize L = −∞ (our best value so far).

2. Run Algorithm 2. If the solution u* is integer, return u* and set L to the objective value. If along the execution we obtain an upper bound less than L, then Algorithm 2 can be safely stopped and return “infeasible”—this is the bound part. Otherwise (if u* is fractional) go to step 3.

3. Find the “most fractional” component of u* (call it u*_j) and branch: constrain u_j = 0 and go to step 2, eventually obtaining an integer solution u*_0 or infeasibility; then constrain u_j = 1 and do the same, obtaining u*_1. Return the u* ∈ {u*_0, u*_1} that yields the largest objective value.

Although this procedure may have worst-case exponential runtime, we found it empir-
ically to rapidly obtain the exact solution in all test cases.
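A schematic of this recursion follows; it is an illustration under simplifying assumptions, not the actual implementation. Here run_ad3 is a hypothetical helper that reruns Algorithm 2 with the components of u listed in fixed clamped to 0 or 1, and the bound test against L during AD3's execution is elided for brevity.

# Schematic branch-and-bound around AD3 (illustrative only). `run_ad3` is a
# hypothetical helper: it solves the relaxation with the variables listed in
# `fixed` clamped to 0 or 1, returning (u, value), or None if infeasible or
# pruned against the incumbent bound.
import numpy as np

def branch_and_bound(run_ad3, fixed=None):
    fixed = fixed or {}
    result = run_ad3(fixed)
    if result is None:                       # bounded out or infeasible
        return None
    u, value = result
    dist = np.abs(u - 0.5)                   # 0.5 for integral coordinates,
    if np.all(dist > 0.5 - 1e-6):            # < 0.5 for fractional ones
        return u, value                      # integral: a feasible ILP solution
    j = int(np.argmin(dist))                 # branch on the most fractional u_j
    children = [branch_and_bound(run_ad3, {**fixed, j: b}) for b in (0.0, 1.0)]
    children = [ch for ch in children if ch is not None]
    return max(children, key=lambda ch: ch[1]) if children else None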

7.3 Results with Collective Argument Identification

We present experiments only on argument identification in this section, as our goal is
to exhibit the importance of incorporating the various linguistic constraints during our
inference procedure. We present results on the full text annotations of FrameNet 1.5, and
do not experiment on the SemEval 2007 benchmark, as we have already established our
constraint-agnostic models as state-of-the-art. The model weights ψ used in the scoring
function c were learned as in Section 6.1 (i.e., by training a logistic regression model to
maximize conditional log-likelihood). The AD3 parameter ρ was initialized to 0.1, and
we followed Martins et al. (2011b) in dynamically adjusting it to keep a balance between
the primal and dual residuals.

We compare the following algorithms to demonstrate the efficacy of our collective argument identification approach:43

1. Naive: This strategy selects the best span for each role r according to the score function c(r, s), independently of all other roles—the decoding rule formalized in Equation (7) of Section 6.1. It ignores all constraints except “uniqueness.”

2. Beam: This strategy employs greedy beam search to eliminate overlaps between predicted arguments, as described in Algorithm 1. Note that it does not try to respect the “excludes” and “requires” constraints between pairs of roles. The default size of the beam in Section 6.1 was a safe 10,000; this resulted in extremely slow decoding times. For time comparison, we tried beam sizes of 100 and 2 (the latter being the smallest size that achieves the same F1 score on the FrameNet 1.5 dev set).


3. CPLEX, LP: This uses CPLEX to solve the relaxed LP in Equation (17). To handle a fractional z, for each role r, we choose the best span s* = argmax_{s ∈ S} z_{r,s}, breaking ties arbitrarily.

4. CPLEX, exact: This tackles the actual ILP (Equation (11)) with CPLEX.

5. AD3, LP: The relaxed problem is solved using AD3. We choose a span for each role r as in strategy 3.

6. AD3, exact: This couples AD3 with branch-and-bound search to get the exact integer solution.

43 The first two strategies correspond to rows 7 and 9, respectively, of Table 8.

Table 9
Comparison of decoding strategies in Section 7.3 on the data set released with the FrameNet 1.5 release, given gold frames. We evaluate in terms of precision, recall, and F1 score on our test set containing 4,458 targets. We also compute the number of constraint violations each model makes: the three values are the numbers of overlapping arguments and violations of the “requires” and “excludes” constraints of Section 7.1. Finally, decoding time (without feature computation steps) on the whole test set is shown in the last column, averaged over five runs.

ARGUMENT IDENTIFICATION
Method           P       R       F1      Violations     Time (s)
naive            82.00   76.36   79.08   441  45  15    1.26 ± 0.01
beam = 2         83.68   76.22   79.78     0  49   0    2.74 ± 0.10
beam = 100       83.83   76.28   79.88     0  50   1    29.00 ± 0.25
beam = 10,000    83.83   76.28   79.88     0  50   1    440.67 ± 5.53
CPLEX, LP        83.80   76.16   79.80     0   1   0    32.67 ± 1.29
CPLEX, exact     83.78   76.17   79.79     0   0   0    43.12 ± 1.26
AD3, LP          83.77   76.17   79.79     2   2   0    4.17 ± 0.01
AD3, exact       83.78   76.17   79.79     0   0   0    4.78 ± 0.04

Table 9 shows the performance of these decoding strategies on the test set. We report
precision, recall, and F1 scores. As with experiments in previous sections, we use the
evaluation script from the SemEval 2007 shared task. Because these scores do not penalize
constraint violations, we also report the number of overlap, “excludes,” and “requires”
constraints that were violated in the test set. Finally, we tabulate each setting’s decoding
time in seconds on the whole test set, averaged over five runs.44 The naive model
is very fast but suffers degradation in precision and violates one constraint roughly
per nine targets. The decoding strategy of Section 6.1 used a default beam size of
10,000, which is extremely slow; a faster version with beam size 100 results in the same
precision and recall values, but is 15 times faster on our test set. Beam size 2 results
in slightly worse precision and recall values, but is even faster. All of these, however,
result in many constraint violations. Strategies involving CPLEX and AD3 perform
similarly to each other and to beam search on precision and recall, but eliminate most
or all of the constraint violations. With respect to precision and recall, exact AD3 and
beam search with a width of 10,000 were found to be statistically indistinguishable
(p > 0.01). The decoding strategy with beam size 2 is 11–16 times faster than the
CPLEX strategies, but is only twice as fast as AD3, and results in significantly more
constraint violations. The exact algorithms are slower than the LP versions, but compared
with CPLEX, AD3 is significantly faster and has a narrower gap between its exact and
LP versions. We found that the relaxation was tight 99.8% of the time on the test examples.

44 Experiments were conducted on a 64-bit machine with two 2.6-GHz dual-core CPUs (i.e., four processors in all) and a total of 8 GB of RAM. The workers in AD3 were not parallelized, whereas CPLEX automatically parallelized execution.

(a) Gold annotation. (b) Beam search output.

Figure 6
An example from the test set where (a) exhibits the gold annotation for a target that evokes
the COLLABORATION frame, with the Partners role filled by the span international. (b) shows
the prediction made by the beam search decoding scheme (beam = 10,000), where it marks
international with the Partner 1 role, violating the “requires” constraint; FrameNet notes that this
role should be present with the Partner 2 role. AD3 is conservative and predicts no role—it is
penalized by the evaluation script, but does not produce output that violates
linguistic constraints.

The example in Figure 1 is taken from our test set, and shows an instance where two
roles, Partner 1 and Partner 2, share the “requires” relationship; for this example, the beam
search decoder misses the Partner 2 role, which is a violation, while our AD3 decoder
identifies both arguments correctly. Note that beam search makes plenty of linguistic
violations. We found that beam search, when violating many “requires” constraints,
often finds one role in the pair, which increases its recall. AD3 is sometimes more
conservative in such cases, predicting neither role. Figure 6 shows such an example
where beam search finds one role (Partner 1) while AD3 is more conservative and predicts
no roles. Figure 7 shows another example contrasting the output of beam search and
AD3, where the former predicts two roles sharing an “excludes” relationship; AD3 does
not violate this constraint and tries to predict a more consistent argument set. Overall,
we found it interesting that imposing the constraints did not have much effect on
standard measures of accuracy.

Table 9 only shows results with gold frames. We ran the exact version of AD3 with
automatic frames as well. When the semi-supervised graph objective UJSF-ℓ1,2 is used
for frame identification, the performance with AD3 is only a bit worse in comparison
with beam search (row 11 of Table 8) when frame and argument identification are
evaluated together. We get a precision of 72.92, a recall of 65.22, and an F1 score of 68.86
(partial frame matching). Again, all linguistic constraints are respected, unlike beam
search.

8. Conclusion

We have presented an approach to rich frame-semantic parsing, based on a combination
of knowledge from FrameNet, two probabilistic models trained on full text annota-
tions released along with the FrameNet lexicon, and expedient heuristics. The frame
identification model uses latent variables in order to generalize to predicates unseen
in either the FrameNet lexicon or training data, and our results show that, quite often,
this model chooses a frame closely related to the gold-standard annotation. We also
presented an extension of this model that uses graph-based semi-supervised learning
to better generalize to new predicates; this achieves significant improvements over the
fully supervised approach. Our argument identification model, trained using maximum
conditional log-likelihood, unifies the traditionally separate steps of detecting and
labeling arguments. Our system achieves improvements over the previous state of the
art on the SemEval 2007 benchmark data set at each stage of processing and collectively.
We also report stronger results on the more recent, larger FrameNet 1.5 release.

(a) Gold annotation. (b) Beam search output. (c) AD3 output.

Figure 7
An example from the test set where (a) exhibits the gold annotation for a target that evokes
the DISCUSSION frame, with the Interlocutor 1 role filled by the span neighbors. (b) shows
the prediction made by the beam search decoding scheme (beam = 10,000), where it marks
The next morning his households and neighbors with the Interlocutors role, which violates
the “excludes” constraint with respect to the Interlocutor 2 role. In (c), AD3 marks the wrong
span as the Interlocutor 1 role, but it does not violate the constraint. Both beam and
AD3 inference miss the Topic role.

We applied the AD3 algorithm to collective prediction of a target’s arguments,
incorporating declarative linguistic knowledge as constraints. It outperforms the naive
local decoding scheme that is oblivious to the constraints. Moreover, it is significantly
faster than a decoder employing a state-of-the-art proprietary solver; it is only twice as
slow as beam search (our chosen decoding method for comparison with the state of
the art), which is inexact and does not respect all linguistic constraints. This method is
easily amenable to the inclusion of additional constraints.

From our results, we observed that in comparison to the SemEval 2007 data
set, frame-semantic parsing performance significantly increases when we use the
FrameNet 1.5 release; this suggests that the increase in the number of full text anno-
tations and the size of the FrameNet lexicon is beneficial. We believe that with more
annotations in the future (say, in the range of the number of PropBank annotations), our
frame-semantic parser can reach even better accuracy, making it more useful for NLP
applications that require semantic analysis.

There are several open problems to be addressed. Firstly, we could further im-
prove the coverage of the frame-semantic parser by improving our semi-supervised
learning approach; two possibilities are custom metric learning approaches (Dhillon,
Talukdar, and Crammer 2010) that suit the frame identification problem in graph-based
SSL, and sparse word representations (Turian, Ratinov, and Bengio 2010) as features
in frame identification. The argument identification model might also benefit from
semi-supervised learning. Further feature engineering and improved preprocessing,
including tokenization into lexical units, improved syntactic parsing, and the use of
external knowledge bases, is expected to improve the system’s accuracy. Finally, the
FrameNet lexicon does not contain exhaustive semantic knowledge. Automatic frame
and role induction is an exciting direction of future research that could further enhance
our methods of automatic frame-semantic parsing. The parser described in this article
is available for download at http://www.ark.cs.cmu.edu/SEMAFOR.

Appendix

A. Target Identification Heuristics from J&N’07

We describe here the filtering rules that Johansson and Nugues (2007) used for identify-
ing frame evoking targets in their SemEval 2007 shared task paper. They built a filtering
component based on heuristics that removed words that appear in certain contexts, and
kept the remaining ones.45 These are:

- have was retained only if it had an object,

- be was retained only if it was preceded by there,

- will was removed in its modal sense,

- of course and in particular were removed,


- the prepositions above, against, at, below, beside, by, in, on, over, and under were removed unless their head was marked as locative,

- after and before were removed unless their head was marked as temporal,

- into, to, and through were removed unless their head was marked as directional,

- as, for, so, and with were always removed,

- because the only sense of the word of was the frame PARTITIVE, it was removed unless it was preceded by only, member, one, most, many, some, few, part, majority, minority, proportion, half, third, quarter, all, or none, or it was followed by all, group, them, or us,

- all targets marked as support verbs for some other target were removed.

45 Although not explicitly mentioned in the paper, we believe that these rules were applied on a white list of potential targets seen in FrameNet and the SemEval 2007 training data.

Note that J&N’07 used a syntactic parser that provided dependency labels corresponding to locative, temporal, and directional arguments, which our syntactic parser of choice (the MST parser) does not provide.
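Purely as an illustration, a few of these rules could be rendered as a predicate over candidate target tokens, as sketched below; the token attributes (lemma, has_object, and so on) are hypothetical stand-ins for whatever the syntactic preprocessing supplies, and this is not the J&N’07 or SEMAFOR code.

# Illustrative rendering of a few of the J&N'07 filtering heuristics above.
# The token interface is hypothetical; a real implementation would read
# these attributes off the parser's output.
ALWAYS_REMOVE = {"as", "for", "so", "with"}
LOCATIVE_PREPS = {"above", "against", "at", "below", "beside",
                  "by", "in", "on", "over", "under"}

def keep_target(token):
    lemma = token.lemma
    if lemma in ALWAYS_REMOVE:
        return False
    if lemma == "have":                  # retained only with an object
        return token.has_object
    if lemma == "be":                    # retained only after expletive "there"
        return token.prev_word == "there"
    if lemma == "will":                  # removed in its modal sense
        return not token.is_modal
    if lemma in LOCATIVE_PREPS:          # removed unless the head is locative
        return token.head_is_locative
    return True                          # all other candidates survive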

B. Solving ATMOSTONE Subproblems in AD3

The ATMOSTONE subproblem can be transformed into that of projecting a point (a_1, ..., a_k) onto the set

S_m = { z_m ∈ [0, 1]^{|i(m)|} : Σ_{j=1}^{|i(m)|} z_{m,j} ≤ 1 }.

This projection can be computed as follows:

1. Clip each a_j into the interval [0, 1] (i.e., set a′_j = min{max{a_j, 0}, 1}). If the result satisfies Σ_{j=1}^{k} a′_j ≤ 1, then return (a′_1, ..., a′_k).

2. Otherwise, project (a_1, ..., a_k) onto the probability simplex:

{ z_m ∈ [0, 1]^{|i(m)|} : Σ_{j=1}^{|i(m)|} z_{m,j} = 1 }.

This is precisely the XOR subproblem and can be solved in time O(|i(m)| log |i(m)|).

The proof of this procedure’s correctness follows from the proof in Appendix B of Martins et al. (2011b).
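In code, the clip-then-project recipe might look as follows (our sketch, not the released implementation), with the simplex step done by the standard sort-based projection, giving the stated O(|i(m)| log |i(m)|) runtime:

# Sketch of the ATMOSTONE projection: clip to [0,1]; if the clipped point
# already sums to <= 1, done; otherwise fall back to the XOR subproblem,
# i.e., Euclidean projection onto the probability simplex (sort-based).
import numpy as np

def project_atmostone(a):
    clipped = np.clip(a, 0.0, 1.0)
    if clipped.sum() <= 1.0:
        return clipped
    return project_simplex(np.asarray(a, dtype=float))

def project_simplex(a):
    # Projection onto {z : z >= 0, sum(z) = 1} (Euclidean), O(k log k).
    srt = np.sort(a)[::-1]
    css = np.cumsum(srt) - 1.0
    k = np.arange(1, len(a) + 1)
    rho = np.max(np.where(srt - css / k > 0)[0]) + 1
    tau = css[rho - 1] / rho
    return np.maximum(a - tau, 0.0)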

Acknowledgments
We thank Collin Baker, Katrin Erk, Richard
Johansson, and Nils Reiter for software, data,
evaluation scripts, and methodological
details. We thank the reviewers of this and
the earlier papers, Alan Black, Ric Crabbe,
Michael Ellsworth, Rebecca Hwa, Dan Klein,
Russell Lee-Goldman, Slav Petrov, Dan Roth,
Josef Ruppenhofer, Amarnag Subramanya,
Partha Talukdar, and members of the ARK

group for helpful comments. This work was
supported by DARPA grant NBCH-1080004,
NSF grants IIS-0836431 and IIS-0915187,
Qatar National Research Foundation grant
NPRP 08-485-1-083, Google’s support of the
Worldly Knowledge Project at CMU,
computational resources provided by Yahoo,
and TeraGrid resources provided by the
Pittsburgh Supercomputing Center under
grant TG-DBS110003.


References
Auli, Michael and Adam Lopez. 2011. A comparison of loopy belief propagation and dual decomposition for integrated CCG supertagging and parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 470–480, Portland, OR.
Baker, Collin, Michael Ellsworth, and Katrin Erk. 2007. SemEval-2007 task 19: Frame semantic structure extraction. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pages 99–104, Prague.
Baluja, Shumeet, Rohan Seth, D. Sivakumar, Yushi Jing, Jay Yagnik, Shankar Kumar, Deepak Ravichandran, and Mohamed Aly. 2008. Video suggestion and discovery for YouTube: taking random walks through the view graph. In Proceedings of the 17th International Conference on the World Wide Web, pages 895–904, Beijing.
Bauer, Daniel and Owen Rambow. 2011. Increasing coverage of syntactic subcategorization patterns in FrameNet using VerbNet. In Proceedings of the 2011 IEEE Fifth International Conference on Semantic Computing, pages 181–184, Washington, DC.
Bejan, Cosmin A. 2009. Learning Event Structures From Text. Ph.D. thesis, The University of Texas at Dallas.
Bengio, Yoshua, Olivier Delalleau, and Nicolas Le Roux. 2006. Label propagation and quadratic criterion. In Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien, editors, Semi-Supervised Learning. MIT Press, Cambridge, MA, pages 193–216.
Boas, Hans C. 2002. Bilingual FrameNet dictionaries for machine translation. In Proceedings of the Third International Conference on Language Resources and Evaluation, pages 1,364–1,371, Las Palmas.
Bottou, Léon. 2004. Stochastic learning. In Olivier Bousquet and Ulrike von Luxburg, editors, Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176. Springer Verlag, Berlin, pages 146–168.
Burbea, Jacob and Calyampudi R. Rao. 2006. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 28(3):489–495.
Burchardt, Aljoscha, Katrin Erk, and Anette Frank. 2005. A WordNet detour to FrameNet. In Bernhard Fisseni, Hans-Christian Schmitz, Bernhard Schröder, and Petra Wagner, editors, Sprachtechnologie, mobile Kommunikation und linguistische Resourcen, volume 8 of Computer Studies in Language and Speech. Peter Lang, Frankfurt am Main, pages 408–421.
Burchardt, Aljoscha and Anette Frank. 2006. Approaching textual entailment with LFG and FrameNet frames. In Proceedings of the Second PASCAL RTE Challenge Workshop, pages 92–97, Venice.
Burchardt, Aljoscha, Marco Pennacchiotti, Stefan Thater, and Manfred Pinkal. 2009. Assessing the impact of frame semantics on textual entailment. Natural Language Engineering, 15(4):527–550.
Carreras, Xavier and Lluís Màrquez. 2004. Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of the Eighth Conference on Computational Natural Language Learning, pages 89–97, Boston, MA.
Carreras, Xavier and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 152–164, Ann Arbor, MI.
Chang, Yin-Wen and Michael Collins. 2011. Exact decoding of phrase-based translation models through Lagrangian relaxation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 26–37, Edinburgh.
Chapelle, Olivier, Bernhard Schölkopf, and Alexander Zien, editors. 2006. Semi-Supervised Learning. MIT Press, Cambridge, MA.
Chen, Desai, Nathan Schneider, Dipanjan Das, and Noah A. Smith. 2010. SEMAFOR: Frame argument resolution with log-linear models. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 264–267, Uppsala.
Corduneanu, Adrian and Tommi Jaakkola. 2003. On information regularization. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pages 151–158, Acapulco.
Das, Dipanjan, André F. T. Martins, and Noah A. Smith. 2012. An exact dual decomposition algorithm for shallow semantic parsing with constraints. In Proceedings of the First Joint Conference on Lexical and Computational Semantics, pages 209–217, Montréal.

Das, Dipanjan and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 600–609, Portland, OR.
Das, Dipanjan, Nathan Schneider, Desai Chen, and Noah A. Smith. 2010. Probabilistic frame-semantic parsing. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 948–956, Los Angeles, CA.
Das, Dipanjan and Noah A. Smith. 2011. Semi-supervised frame-semantic parsing for unknown predicates. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1,435–1,444, Portland, OR.
Das, Dipanjan and Noah A. Smith. 2012. Graph-based lexicon expansion with sparsity-inducing penalties. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 677–687, Montréal.
Dean, Jeffrey and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113.
DeNero, John and Klaus Macherey. 2011. Model-based aligner combination using dual decomposition. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 420–429, Portland, OR.
Deschacht, Koen and Marie-Francine Moens. 2009. Semi-supervised semantic role labeling using the Latent Words Language Model. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 21–29, Singapore.
Dhillon, Paramveer S., Partha Pratim Talukdar, and Koby Crammer. 2010. Learning better data representation using inference-driven metric learning. In Proceedings of the ACL 2010 Conference Short Papers, pages 377–381, Uppsala.
Erk, Katrin and Sebastian Padó. 2006. Shalmaneser—a toolchain for shallow semantic parsing. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, pages 527–532, Genoa.
Fellbaum, Christiane, editor. 1998. WordNet: an electronic lexical database. MIT Press, Cambridge, MA.
Fillmore, Charles J. 1982. Frame semantics. In Linguistics in the Morning Calm. Hanshin Publishing Co., Seoul, pages 111–137.
Fillmore, Charles J., Christopher R. Johnson, and Miriam R. L. Petruck. 2003. Background to FrameNet. International Journal of Lexicography, 16.3:235–250.
Fleischman, Michael, Namhee Kwon, and Eduard Hovy. 2003. Maximum entropy models for FrameNet classification. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 49–56, Sapporo.
Fung, Pascale and Benfeng Chen. 2004. BiFrameNet: Bilingual frame semantics resource construction by cross-lingual induction. In Proceedings of the 20th International Conference on Computational Linguistics, pages 931–937, Geneva.
Fürstenau, Hagen and Mirella Lapata. 2009a. Graph alignment for semi-supervised semantic role labeling. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 11–20, Singapore.
Fürstenau, Hagen and Mirella Lapata. 2009b. Semi-supervised semantic role labeling. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 220–228, Athens.
Fürstenau, Hagen and Mirella Lapata. 2012. Semi-supervised semantic role labeling via structural alignment. Computational Linguistics, 38(1):135–171.
Gerber, Matthew and Joyce Chai. 2010. Beyond NomBank: A study of implicit arguments for nominal predicates. In Proceedings of ACL, pages 1,583–1,592, Uppsala.
Gildea, Daniel and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.
Girju, Roxana, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. 2007. SemEval-2007 task 04: Classification of semantic relations between nominals. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pages 13–18, Prague.
Giuglea, Ana-Maria and Alessandro Moschitti. 2006. Shallow semantic parsing based on FrameNet, VerbNet and PropBank. In Proceedings of the 17th European Conference on Artificial Intelligence, pages 563–567, Amsterdam.

Gropp, W., E. Lusk, and A. Skjellum. 1994. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA.
Hajič, Jan, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 1–18, Boulder, CO.
Ide, Nancy and Jean Véronis. 1998. Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24(1):2–40.
Johansson, Richard and Pierre Nugues. 2007. LTH: Semantic structure extraction using nonprojective dependency trees. In Proceedings of the 4th International Workshop on Semantic Evaluations, pages 227–230, Prague.
Johansson, Richard and Pierre Nugues. 2008. Dependency-based semantic role labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 69–78, Honolulu, HI.
Kingsbury, Paul and Martha Palmer. 2002. From TreeBank to PropBank. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pages 1,989–1,993, Las Palmas.
Komodakis, Nikos, Nikos Paragios, and Georgios Tziritas. 2007. MRF optimization via dual decomposition: Message-passing revisited. In Eleventh International Conference on Computer Vision, pages 1–8, Rio de Janeiro.
Koo, Terry, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag. 2010. Dual decomposition for parsing with non-projective head automata. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1,288–1,298, Cambridge, MA.
Kowalski, Matthieu and Bruno Torrésani. 2009. Sparsity and persistence: Mixed norms provide simple signal models with dependent coefficients. Signal, Image and Video Processing, 3:251–264.
Lang, Joel and Mirella Lapata. 2010. Unsupervised induction of semantic roles. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 939–947, Los Angeles, CA.
Lang, Joel and Mirella Lapata. 2011. Unsupervised semantic role induction with graph partitioning. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1,320–1,331, Edinburgh.
Lin, Dekang. 1993. Principle-based parsing without overgeneration. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 112–120, Columbus, OH.
Lin, Dekang. 1994. Principar–an efficient, broad-coverage, principle-based parser. In Proceedings of the 15th Conference on Computational Linguistics, pages 482–488, Kyoto.
Lin, Dekang. 1998. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 768–774, Montreal.
Lin, Jianhua. 1991. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151.
Litkowski, Kenneth C. and Orin Hargraves. 2007. SemEval-2007 task 06: Word-sense disambiguation of prepositions. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 24–29, Prague.
Liu, Dong C. and Jorge Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(3):503–528.
Marcus, Mitchell P., Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: the Penn treebank. Computational Linguistics, 19(2):313–330, June.
Màrquez, Lluís, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. 2008. Semantic role labeling: an introduction to the special issue. Computational Linguistics, 34(2):145–159, June.
Martins, André F. T., Mario A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, and Eric P. Xing. 2011a. An augmented Lagrangian approach to constrained MAP inference. In Proceedings of the 28th International Conference on Machine Learning, pages 169–176, Bellevue, WA.
Martins, André F. T., Noah A. Smith, Pedro M. Q. Aguiar, and Mario A. T. Figueiredo. 2011b. Dual decomposition with many overlapping components. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 238–249, Edinburgh.

Martins, André F. T., Noah A. Smith, and Eric P. Xing. 2009. Concise integer linear programming formulations for dependency parsing. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 342–350, Suntec.
Martins, André F. T., Noah A. Smith, Eric P. Xing, Mario A. T. Figueiredo, and Pedro M. Q. Aguiar. 2010. Turbo parsers: Dependency parsing by approximate variational inference. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 34–44, Cambridge, MA.
Matsubayashi, Yuichiroh, Naoaki Okazaki, and Jun'ichi Tsujii. 2009. A comparative study on generalization of semantic roles in FrameNet. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 19–27, Suntec.
McDonald, Ryan, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 91–98, Ann Arbor, MI.
Meyers, Adam, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young, and Ralph Grishman. 2004. The NomBank project: An interim report. In Proceedings of the NAACL-HLT Workshop on Frontiers in Corpus Annotation, pages 24–31, Boston, MA.
Moschitti, Alessandro, Paul Morarescu, and Sanda M. Harabagiu. 2003. Open-domain information extraction via automatic semantic labeling. In Ingrid Russell and Susan M. Haller, editors, Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference, pages 397–401, St. Augustine, FL.
Narayanan, Srini and Sanda Harabagiu. 2004. Question answering based on semantic structures. In Proceedings of the 20th International Conference on Computational Linguistics, Geneva.

Padó, Sebastian and Katrin Erk. 2005. To cause or not to cause: cross-lingual semantic matching for paraphrase modelling. In Proceedings of the Cross-Language Knowledge Induction Workshop, Cluj-Napoca.
Padó, Sebastian and Mirella Lapata. 2005. Cross-linguistic projection of role-semantic information. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 859–866, Vancouver.
Pennacchiotti, Marco, Diego De Cao, Roberto Basili, Danilo Croce, and Michael Roth. 2008. Automatic induction of FrameNet lexical units. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 457–465, Honolulu, HI.
Pradhan, Sameer S., Wayne H. Ward, Kadri Hacioglu, James H. Martin, and Dan Jurafsky. 2004. Shallow semantic parsing using support vector machines. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 233–240, Boston, MA.
Punyakanok, Vasin, Dan Roth, and Wen-tau Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2):257–287.
Punyakanok, Vasin, Dan Roth, Wen-tau Yih, and Dav Zimak. 2004. Semantic role labeling via integer linear programming inference. In Proceedings of the 20th International Conference on Computational Linguistics, pages 1,346–1,352, Geneva.
Ratnaparkhi, Adwait. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pages 133–142, Copenhagen.
Riedel, Sebastian and James Clarke. 2006. Incremental integer linear programming for non-projective dependency parsing. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 129–137, Sydney.
Roth, Dan and Wen-tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. In Proceedings of the Eighth Conference on Computational Natural Language Learning, pages 1–8, Boston, MA.
Ruppenhofer, Josef, Michael Ellsworth, Miriam R. L. Petruck, Christopher R. Johnson, and Jan Scheffczyk. 2006. FrameNet II: extended theory and practice. International Computer Science Institute, Berkeley, CA.

Rush, Alexander M. and Michael Collins. 2011. Exact decoding of syntactic translation models through Lagrangian relaxation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 72–82, Portland, OR.
Rush, Alexander M., David Sontag, Michael Collins, and Tommi Jaakkola. 2010. On dual decomposition and linear programming relaxations for natural language processing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1–11, Cambridge, MA.
Schuler, Karin K. 2005. VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania.
Sha, Fei and Fernando Pereira. 2003. Shallow parsing with conditional random fields. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 134–141, Edmonton.
Shen, Dan and Mirella Lapata. 2007. Using semantic roles to improve question answering. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 12–21, Prague.
Shi, Lei and Rada Mihalcea. 2004. An algorithm for open text semantic parsing. In Proceedings of the Workshop on Robust Methods in Analysis of Natural Language Data, pages 59–67, Geneva.
Shi, Lei and Rada Mihalcea. 2005. Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing, pages 100–111, Mexico City.
Smith, David A. and Jason Eisner. 2008. Dependency parsing by belief propagation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 145–156, Honolulu, HI.
Subramanya, Amarnag and Jeff Bilmes. 2008. Soft-supervised learning for text classification. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1,090–1,099, Honolulu, HI.
Subramanya, Amarnag, Slav Petrov, and Fernando Pereira. 2010. Efficient graph-based semi-supervised learning of structured tagging models. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 167–176, Cambridge, MA.
Surdeanu, Mihai, Sanda Harabagiu, John Williams, and Paul Aarseth. 2003. Using predicate-argument structures for information extraction. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 8–15, Sapporo.
Surdeanu, Mihai, Richard Johansson, Adam Meyers, Lluís Màrquez, and Joakim Nivre. 2008. The CoNLL 2008 shared task on joint parsing of syntactic and semantic dependencies. In Proceedings of the Twelfth Conference on Computational Natural Language Learning, pages 159–177, Manchester.
Szummer, Martin and Tommi Jaakkola. 2001. Partially labeled classification with Markov random walks. In Advances in Neural Information Processing Systems 14, pages 945–952, Vancouver.
Talukdar, Partha Pratim and Koby Crammer. 2009. New regularized algorithms for transductive learning. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, pages 442–457, Bled.
Thompson, Cynthia A., Roger Levy, and Christopher D. Manning. 2003. A generative model for semantic role labeling. In Proceedings of the European Conference on Machine Learning, pages 397–408, Cavtat-Dubrovnik.
Titov, Ivan and Alexandre Klementiev. 2012. A Bayesian approach to unsupervised semantic role induction. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 12–22, Avignon.
Toutanova, Kristina, Aria Haghighi, and Christopher Manning. 2005. Joint learning improves semantic role labeling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 589–596, Ann Arbor, MI.
Turian, Joseph, Lev-Arie Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394, Uppsala.


Weston, Jason, Frédéric Ratle, and Ronan Collobert. 2008. Deep learning via semi-supervised embedding. In Proceedings of the 25th International Conference on Machine Learning, pages 1,168–1,175, Helsinki.
Xue, Nianwen and Martha Palmer. 2004. Calibrating features for semantic role labeling. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 88–94, Barcelona.
Yi, Szu-ting, Edward Loper, and Martha Palmer. 2007. Can semantic roles generalize across genres? In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 548–555, Rochester, NY.
Zhu, Xiaojin. 2008. Semi-supervised learning literature survey. Available at http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf. Last accessed July 2013.
Zhu, Xiaojin, Zoubin Ghahramani, and John Lafferty. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning, pages 912–919, Washington, DC.
