Integrating Selectional Constraints and

Integrating Selectional Constraints and
Subcategorization Frames in a
Dependency Parser


Seyed Abolghasem Mirroshandel
University of Guilan

Alexis Nasr∗∗
Universit´e Aix-Marseille

Statistical parsers are trained on treebanks that are composed of a few thousand sentences. 在
order to prevent data sparseness and computational complexity, such parsers make strong inde-
pendence hypotheses on the decisions that are made to build a syntactic tree. These independence
hypotheses yield a decomposition of the syntactic structures into small pieces, which in turn
prevent the parser from adequately modeling many lexico-syntactic phenomena like selectional
constraints and subcategorization frames. 此外, treebanks are several orders of magni-
tude too small to observe many lexico-syntactic regularities, such as selectional constraints and
subcategorization frames. 在本文中, we propose a solution to both problems: how to account
for patterns that exceed the size of the pieces that are modeled in the parser and how to obtain
subcategorization frames and selectional constraints from raw corpora and incorporate them
in the parsing process. The method proposed was evaluated on French and on English. 这
experiments on French showed a decrease of 41.6% of selectional constraint violations and a
decrease of 22% of erroneous subcategorization frame assignment. These figures are lower for
英语: 16.21% in the first case and 8.83% in the second.

1. 介绍

The fundamental problem we address in this article was formulated by Joshi (1985)
as the following question: How much context-sensitivity is required to provide reasonable
structural descriptions? His answer to this question was the extended domain of locality
principle of Tree Adjoining Grammars, which allows us to represent in a single structure
(an elementary tree1) a predicate and its arguments.

∗ Computer Engineering Department, 工程学院, University of Guilan, Rasht, 伊朗.

电子邮件: mirroshandel@guilan.ac.ir.

∗∗ Laboratoire d’Informatique Fondamentale, 法国国家科学研究中心 – Universit´e Aix-Marseille, 法国.

电子邮件: alexis.nasr@univ-amu.fr.

1 In the remainder of this article, we will refer to the elementary object manipulated by the parser as

因素, borrowing this term from graph-based parsing.

提交材料已收到: 10 九月 2014; 收到修订版: 3 八月 2015; 接受出版:
28 九月 2015.

土井:10.1162/大肠杆菌a 00242

© 2016 计算语言学协会

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

Context sensitivity and the correct size of domain of locality for a reasonable struc-
tural description has always been a major issue for syntactic formalisms and parsers.
Giving an overview of the different solutions that have been proposed to answer this
question is clearly beyond the scope of this article. We will limit our discussion to the
framework of graph-based dependency parsing (艾斯纳 1996; McDonald and Pereira
2006), in which our study takes place. The basic first-order model makes an extreme
choice with respect to the domain of locality by limiting it to one dependency (the score
of a tree is the sum of the scores of its dependencies). Second-order models slightly
extend the factor sizes to second-order patterns that include a dependency adjacent
to the target dependency. It has been shown experimentally that second-order models
perform better than first-order ones, which tends to prove that second-order factors do
capture important syntactic regularities. 最近, Koo and Collins (2010) proposed
extending factor size to three dependencies and showed that such models yield better
accuracy than second-order models on the same data. 然而, extending the order
of the models has a high computational cost because the number of factors to be
considered in a sentence grows exponentially with the order of the model. Using such
models in a Dynamic Programming framework, such as Eisner (1996), becomes quickly
intractable in terms of processing time and memory requirements. 此外, 经过
reliably estimating the scores of such factors we are quickly confronted with the problem
of data sparseness.

One can also note that high-order factors are not needed for all syntactic phenom-
ena. 实际上, most syntactic attachments can be accurately described with first- 和
second-order models. This is why first- and second-order models perform quite well:
They reach a labeled accuracy score of 85.36% 和 88.88% for first- and second-order
型号, 分别, on the French Treebank (Abeill´e, Cl´ement, and Toussenel 2003).
The same models trained on the Penn Treebank (马库斯, 马尔辛凯维奇, and Santorini
1993) 抵达 88.57% 和 91.76% labeled accuracy score, 分别. The extension of
the domain of locality can therefore be limited to some syntactic phenomena. 这是
the case in Tree Adjoining Grammars for which the extended domain of locality only
concerned some aspects of syntax, 即, subcategorization.

The solution we explore in this article, in order to take high-order factors into
account in a parser, is to decompose the parsing process into two sub-processes.
One of them is in charge of local syntactic phenomena and does not need a high-
order model and the other takes care of syntactic phenomena that are beyond the
scope of the first one. We will call the first one parsing and the second one patch-
英. The patching process is responsible for modeling two important aspects of
syntax, Selectional Constraints (SCs) and Subcategorization Frames (SFs), 哪个是
usually poorly modeled in parsers, as we will show in Sections 7 和 8. Parsing
and patching differ in two important aspects. 第一的, they rely on different search
算法. The first one is based on Dynamic Programming and the other one
uses Integer Linear Programming (ILP). 第二, they rely on different data sets. 这
first uses a standard treebank whereas the second uses much larger unannotated
语料库.

The reason for the first difference comes from the different nature of these two
流程. As already mentioned, high-order factors are needed to model some specific
syntactic phenomena and we do not want to systematically increase the order of the
model in order to take them into account. Using factors of different sizes in a Dynamic
Programming parser is not practical and leads to ad hoc adaptation of the parser. 这
is the reason why patching is based on ILP, which can accommodate more naturally for
constraints defined on structures of different sizes.

56

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

The second difference between parsing and patching is the data sets that were
used to train them. The parser is trained on a standard treebank whereas the patching
model is trained on a raw corpus that is several orders of magnitude larger. 这
reason for this difference comes from the fact that high-order factors used to model
SFs and SCs contain up to three lexical elements and cannot be accurately modeled
using available treebanks. It has been shown by Gildea (2001) and Bikel (2004) 那
bilexical dependencies of the Collins parser (柯林斯 1997) have a very limited impact
on the performances of the parser, although they were supposed to play a key role for
some important ambiguous syntactic attachments. The reason for this is that treebanks
are not large enough to correctly model bilexical dependencies. We show in Sections 7.2
和 8.2 that current treebanks are much too limited in size to accurately model SCs
and SFs.

Other solutions have been proposed in the literature in order to combine local and
global syntactic phenomena in a parser. Parse reranking is one of them. The idea is to
produce the n-best parses for a sentence and rerank them using higher order features,
such as in Collins and Koo (2005) and Charniak and Johnson (2005). This solution has
the advantage of drastically limiting the search space of the high-order search algorithm
to the k best parses. Although the parsing architecture we propose in this article does
use k-best parse lists, our solution allows us to combine dependencies that appear in
any of the parses of the list. We show in Section 6.3 that combining dependencies that
appear in any parse of a k-best list can yield parses that are more accurate than the best
parse of this k-best list. Our method is closer to forest rescoring (Huang and Chiang
2007) or forest reranking (黄 2008).

Another solution to combine local and global decisions is to represent the local
and the global constraints as a single ILP program. This solution is possible because
dependency parsing can be framed as an ILP program. Riedel and Clarke (2006)
propose a formulation of nonprojective dependency parsing as an ILP program. 这
程序, 然而, requires an exponential number of variables. 马丁斯, 史密斯, 和
Xing (2009) propose a more concise formulation that only requires a polynomial number
of variables and constraints. 马丁斯, Almeida, 和史密斯 (2013) propose extending the
model to third-order factors and using dual decomposition. All these approaches have
in common their ability to produce nonprojective structures (Lecerf 1961). Our work
departs from these approaches in that it uses a dynamic programming parser combined
with an ILP solver. 此外, we do not take nonprojective structures into account.
The structure of this article is as follows. 在部分 2 we describe the type of
parser used in this work. 部分 3 describes the patching process that detects, 在一个
given sentence, a set of optimal SFs and SCs, using ILP. The exact nature of SFs and
SCs is described in that section. 部分 4 proposes a way to combine the solutions
produced by the two methods using constrained parsing. 在部分 5, we justify the
use of ILP by showing that the patching process is actually NP-complete. Sections 6, 7,
和 8 constitute the experimental part of this work. 部分 6 describes the experimental
set-up, and Sections 7 和 8, 分别, give results obtained on French and English.
部分 9 concludes the article.

The material presented in this article has been partially published in Mirroshandel,
Nasr, and Roux (2012) and Mirroshandel, Nasr, and Sagot (2013). In Mirroshandel, Nasr,
and Roux (2012), a method for taking into account selectional constraints in a parser is
提出, and in Mirroshandel, Nasr, and Sagot (2013) a method of introducing sub-
categorization frames in the parser is described. This paper proposes a general frame-
work for dealing with these two problems, both separately and jointly, using Integer
Linear Programming, whereas the two previous techniques used ad hoc adaptations

57

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

and could not treat the two problems jointly. 此外, we extended the experimen-
tal part of our work to English, whereas previous work only concerned French.

2. Parsing

在这个部分, we briefly present the graph-based approach for projective dependency
解析. We then introduce an extension to this kind of parser, which we call con-
strained parsing, that forces the parser to include some specific patterns in its output.
然后, a simple confidence measure that indicates how confident the parser is about any
subpart of the solution is presented.

2.1 Graph-Based Parsing

Graph-based parsing (麦当劳, Crammer, and Pereira 2005a; K ¨ubler, 麦当劳, 和
Nivre 2009) defines a framework for parsing that does not make use of a generative
grammar. In such a framework, given a sentence S = w1 . . . wl, any dependency tree2
for S is a possible syntactic structure for S. The heart of a graph-based parser is the
scoring function, which assigns a score s(时间) to every tree T ∈ T (S), 其中T (S) denotes
the set of all dependency trees of sentence S. Such scores are usually the sum of the
scores of subparts of S. The parser returns the tree that maximizes such a score:

ˆT = arg max
T∈T (S)

(西德:88)

s(ψ)

ψ∈ψ(时间)

(1)

In Equation (1), ψ(时间) is the set of all the relevant subparts of tree T and s(ψ) 是个
score of subpart ψ. The scores of the subparts are computed using machine learning
algorithms on a treebank, such as McDonald, Crammer, 和佩雷拉 (2005乙).

As already mentioned in Section 1, several decompositions of the tree into sub-
parts are possible and yield models of increasing complexity. The most simple one
is the first-order model, which simply decomposes a tree into single dependencies
and assigns a score to a dependency, irrespective of its context. The more popular
model is the second-order model, which decomposes a tree into subparts made of two
dependencies.

Solving Equation (1) cannot be done naively for combinatorial reasons: The set T (S)
contains an exponential number of trees. 然而, it can be solved in polynomial time
using dynamic programming (艾斯纳 2000).

2.2 Constrained Parsing

The graph-based framework allows us to impose structural constraints on the parses
that are produced by the parser in a straightforward way, a feature that will prove to be
valuable in this work.

A sentence S = w1 . . . wl is a parser with a scoring function s and a dependency
d = (我, r, j), such that i is the position of the governor, j the position of the dependent,

2 A dependency tree for sentence S = w1 . . . wl and the dependency relation set R is a directed labeled tree

(V, A) such that V = {w1, . . . wn} and A ⊆ V × R × V.

58

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

and r the dependency label, 和 1 ≤ 我, j ≤ l. We can define a new scoring function s+
the following way:

d in

s+
d (我(西德:48), r(西德:48), j(西德:48)) =


−∞

if j(西德:48) = j 和
(我(西德:48) (西德:54)= i or r(西德:48) (西德:54)= r)

s(我(西德:48), r(西德:48), j(西德:48)) 否则

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

When running the parser on sentence S, with the scoring function s+

d , one can be
sure that dependency d will be part of the solution produced by the parser because the
score of all its competing dependencies (IE。, dependencies of the form (我(西德:48), r(西德:48), j(西德:48)) 和
different governors (我(西德:48) (西德:54)= i) or labels (r(西德:48) (西德:54)= r) and the same dependent (j(西德:48) = j)) are set
to −∞. 因此, the overall score of trees that have such competing dependencies
cannot be higher than the trees that have our desired dependency. 换句话说, 这
resulting parse tree will contain dependency d. This feature will be used in Section 4
in order to parse a sentence for which a certain number of dependencies are set in
advance.

Function s+

d can be extended in order to force the presence of any set of dependen-
cies in the output of the parser. Suppose D is a set of dependencies; we then define s+
D
the following way:

s+
D (我, r, j) =


s+
d (我, r, j) if ∃d ∈ D such that d = (·, ·, j)

s(我, r, j) 否则

Modifying the scoring function of the parser to force the inclusion or the ex-
clusion of some dependencies has already been used in the literature. It was used
by Koo and Collins (2010) to ignore dependencies whose probability was below a
certain threshold and by Rush and Petrov (2012) in their multi-pass, coarse-to-fine
方法.

2.3 Confidence Measure

The other extension of graph-based parsers that we use in this work is the computation
of confidence measures on the output of the parser. The confidence measure used is
based on the estimation of posterior probability of a subtree, computed on the k-best list
of parses of a sentence. It is very close to confidence measures used for speech recog-
尼尼申 (Wessel et al. 2001) and differs from confidence measures used for parsing, 这样的
as S´anchez-S´aez, S´anchez, and Bened´ı (2009) or Hwa (2004), which compute confidence
measures for complete parses of a sentence.

Given the k-best parses of a sentence and a subtree ∆ present in at least one
of the k-best parses, let C() be the number of occurrences of ∆ in the k-best
parse set. Then CM(), the confidence measure associated with ∆, is computed
作为:

CM() =

C()
k

59

计算语言学

体积 42, 数字 1

Mirroshandel and Nasr (2011) and Mirroshandel, Nasr, and Roux (2012) provide
more detail on the performance of this measure. Other confidence measures could be
用过的, such as the edge expectation described in McDonald and Satta (2007), or inside-
outside probabilities.

3. Patching

在这个部分, we introduce the process of patching a sentence. The general idea of
patching sentence S is to detect in S the occurrences of predefined configurations.
In section 3.1 we introduce Lexico-Syntactic Configurations, which correspond to the
patches. We describe in section 3.2 the process of selecting an optimal set of patches
on a sentence (finding an optimal solution is not trivial since patches have scores
and some patches are incompatible). Sections 3.3 和 3.4, 分别, describe two
instances of the patching problem, 即, subcategorization frame selection and se-
lectional constraint satisfaction. 部分 3.5 shows how both problems can be solved
一起.

3.1 Lexico-Syntactic Configuration

A Lexico-syntactic Configuration (LSC) is a pair (s, 时间) where s is a numerical value,
called the score of the LSC, and T is a dependency tree of arbitrary size. Every node of
the dependency tree has five features: a syntactic function, a part of speech, a lemma, A
fledged form, and an index, which represents a position in a sentence. The features of
every node can either be specified or left unspecified. Given an LSC i, s(我) will represent
the score of i and T(我) will represent its tree.

Index features are always unspecified in an LSC. What follows is an example of an
LSC that corresponds to the sentence Jean offre des fleurs `a N [Jean offers flowers to N]. 这
five features of the dependency tree nodes are represented between square brackets and
are separated by colons.

(0.028, [:V:offrir:offre:](

[SUJ:氮:让:让:],
[OBJ:氮:fleur:fleurs:],
[A-OBJ:磷:A:A:](
[OBJ:氮:::])))

An instantiated LSC (ILSC) is an LSC such that all its node features (尤其,
index features) are specified. We will note R(我) as the index of the root of ILSC i and L(我)
as the set of its leaf indices.

Given the sentence Jean offre des fleurs `a Marie and the LSC, the instantiation of the

LSC on the sentence gives the following ILSC:

(0.028, [:V:offrir:offre:2](

[SUJ:氮:让:让:1],
[OBJ:氮:fleur:fleurs:4],
[A-OBJ:磷:A:A:5](

[OBJ:氮:Marie:Marie:6])))

60

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

3.2 Selecting the Optimal Set of ILSCs

Given a sentence S, and a set L of LSCs, we define the set I of all possible instantiations
of elements of L on S. Once the set I has been defined, our aim is to select a subset ˆI
made of compatible3 ILSCs such that

ˆI = arg max

I⊆I

(西德:88)

C∈I

s(C)

(2)

As it was the case for Equation (1), 方程 (2) cannot be solved naively for
combinatorial reasons: The power-set of I contains an exponential number of elements.
Such a problem will be solved using ILP.

In order to solve Equation (2), the set I of all possible instantiations of the LSCs
of set L for a given sentence S must be built. Several methods can be used to compute
such a set, 如节中所述 6.3.

We will see in Sections 3.3 和 3.4 two instances of the patching problem. 在

部分 3.3, ILSCs will represent SFs, and in Section 3.4, they will represent SCs.

3.3 Subcategorization Frames

A subcategorization frame is a special kind of LSC. Its root is a predicate and its leaves
are the arguments of the predicate.

Here is an example of an SF for the French verb donner [给予]. It is a ditransitive
frame; it has both a direct object and an indirect one introduced by the preposition `a as
in Jean donne un livre `a Marie. [Jean gives a book to Marie].

(0.028, [:V:donner::] (

[SUJ:氮:::],
[OBJ:氮:::],
[A-OBJ:磷:A:A:] (
[OBJ:氮:::])))

The score of an SF T must reflect the tendency of a verb V (the root of the SF) 到
select the given SF. This score sSF is simply the conditional probability P(时间|V) estimated
data sets that will be described in Sections 7 和 8.

3.3.1 Selecting the Optimal Set of Subcategorization Frames. Given a sentence S of length
N and the set I of Instantiated Subcategorization Frames (ISFs) over S, the selection
of the optimal set ˆI ⊆ I is done by solving the general equation 2, using ILP, 和
constraints that are specific to SF selection. The exact formulation of the ILP program is
now described.

(西德:114)

Definition of the variables

αj
i = 1 if word number i is the predicate of ISF number j, and αj
否则.

i = 0

3 We will not say more at this point about the compatibility of two ILSCs. The exact definition of

incompatibility will be given in Sections 3.3 和 3.4.

61

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

(西德:114)

(西德:114)

βj
i = 1 if word number i is an argument of ISF number j, and βj
否则.
Variables αj
variable αj

i and βj
i are not defined for all values of i and j. A
i is defined only if word i can be the predicate of ISF j ∈ I.

i = 0

Definition of the constraints

a word is the predicate of at most one ISF:

∀i ∈ {1, . . . , 氮}

αj
i ≤ 1

(西德:88)

j∈I

a word cannot be an argument of more than one ISF:

∀i ∈ {1, . . . , 氮}

βj

i ≤ 1

(西德:88)

j∈I

for an ISF to be selected, both its predicate and all its arguments
(IE。, L(j)) must be selected:

∀j ∈ {1, . . . , |我|} |L(j)|αj

右(j) -

(西德:88)

l∈L(j)

βj
l = 0

Definition of the objective function

max

(西德:88)

j∈I

αj

右(j)sSF(j)

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Let us illustrate this process on the sentence Jean rend le livre qu’il a emprunt´e `a la
biblioth`eque. [Jean returns the book that he has borrowed from the library.] which contains an
ambiguous prepositional phrase attachment: the prepositional phrase `a la biblioth`eque
[from the library] can be attached either to the verb rendre [返回] or to the verb emprunter
[borrow] . The set I is composed of the four following ISFs:

1 (0.2,[:V:rend:rendre:2](
[SUJ:氮:让:让:1],
[OBJ:氮:livre:livre:4]))

2 (0.4,[:V:rend:rendre:2](
[SUJ:氮:让:让:1],
[OBJ:氮:livre:livre:4],
[A-OBJ:磷:A:A:9](

[OBJ:氮:biblio.:biblio.:11])))

3 (0.3,[:V:emprunte:emprunter:8] (

[SUJ:氮:il:il:6],
[OBJ:氮:qu’:que:5]))

4 (0.6,[:V:emprunte:emprunter:8] (

[SUJ:氮:il:il:6],
[OBJ:氮:qu’:que:5],
[A-OBJ:磷:A:A:9] (

[OBJ:氮:biblio.:biblio.:11])))

62

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

The actual ILP program is the following:

(西德:114)

(西德:114)

(西德:114)

Definition of the variables

2, α2
α1
2
α3
8, α4
8
β1
1, β2
1
β1
4, β2
4
β1
11
β3
5, β4
5
β3
6, β4
6
β4
11

verb rendre can select ISF 1 或者 2
verb emprunter can select ISF 3 或者 4
noun Jean can be the subject of ISF 1 或者 2
noun livre can be the object of ISF 1 或者 2
noun bibliotheque can be the indirect object of ISF 2
relative pronoun que can be the object of ISF 3 或者 4
pronoun il can be the subject of ISF 3 或者 4
noun bibliotheque can be the indirect object of ISF 4

Definition of the constraints

a word is the predicate of at most one ISF:

α1
2 + α2
8 + α4
α3

2 ≤ 1
8 ≤ 1

verb rendre cannot be the predicate of both ISF 1 和 2
verb emprunter cannot be the predicate of both ISF 3 和 4

a word cannot be an argument of more than one ISF:

1 + β2
β1
4 + β2
β1
5 + β4
β3

β3
6 + β4
11 + β4
β2

1 ≤ 1
4 ≤ 1
5 ≤ 1

Noun Jean cannot be the subject of both ISF 1 和 2
Noun livre cannot be the object of both ISF 1 和 2
Relative pronoun que cannot be the object of
both ISF 3 和 4
6 ≤ 1
Pronoun il cannot be the subject of both ISF 3 和 4
11 ≤ 1 Noun bibliotheque cannot be the object of both ISF 2 和 4

for an ISF to be selected, both its predicate and all its arguments
must be selected:
1 + β1
2 - (β1
1 + β2
2 - (β2

4) = 0
4 + β2

11) = 0

2α1
3α2

ISF 1 must have a subject and an object
ISF 2 must have a subject, an object and an
indirect object
ISF 3 must have a subject and an object
ISF 4 must have a subject, an object and an
indirect object

2α3
3α4

8 - (β3
8 - (β4

5 + β3
5 + β4

6) = 0
6 + β4

11) = 0

Definition of the objective function

max(0.2α1

2 + 0.4α2

2 + 0.3α3

8 + 0.6α4
8)

The problem admits eight solutions, represented in Figure 1. Each column
corresponds to a variable. Black dots (•) correspond to the value 1 and white dots (○)
to the value 0. The optimal solution is solution number 7 for which verb rendre selects
ISF 1 and verb emprunter selects ISF 4. In this solution, the prepositional phrase `a la
biblioth`eque is attached to the verb rendre.

3.4 Selectional Constraints

A selectional constraint (SC) is another kind of LSC. Its dependency tree is either made
of a single dependency or two dependencies in a (daughter, 母亲, 祖母)
configuration. An SC models the tendency of two words (the root and the leaf) to co-
occur in a specific syntactic configuration.

63

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

2 β2

8 β4

4 α2

5 β3

2 β1
α1

1 β1
1 β2
1 ◦ ◦ ◦ ◦ ◦ ◦
2 • • • ◦ ◦ ◦
3 ◦ ◦ ◦ • • •
4 ◦ ◦ ◦ ◦ ◦ ◦
5 ◦ ◦ ◦ ◦ ◦ ◦
6 • • • ◦ ◦ ◦
7 • • • ◦ ◦ ◦
8 ◦ ◦ ◦ • • •

4 β2
11 α3







8 β3
5 β4
6 α4
◦ ◦ ◦ ◦ ◦ ◦
◦ ◦ ◦ ◦ ◦ ◦
◦ ◦ ◦ ◦ ◦ ◦
• • • ◦ ◦ ◦
◦ ◦ ◦ • • •
• • • ◦ ◦ ◦
◦ ◦ ◦ • • •
• • • ◦ ◦ ◦

6 β4
11 分数
0.0

0.2

0.4

0.3

0.6

0.5

0.8

0.7

数字 1
Representation of the eight solutions to the ILP problem.

A set of four SC patterns, described here, have been defined. The first two describe
a subject and object selectional constraint, 分别. Patterns 3 和 4, 分别,
describe selectional constraints on an indirect object introduced by the preposition de
and `a :

([:V:::]([SUJ:氮:::]))
([:V:::]([OBJ:氮:::]))
([:V:::]([DE-OBJ:磷:的::]([OBJ:氮:::])))
([:V:::]([A-OBJ:磷:A::]([OBJ:氮:::])))

Three important formal features distinguish SFs and SCs:

(西德:114)

(西德:114)

(西德:114)

The domain of locality of SFs usually exceeds the domain of locality of SCs.

SCs model bilexical phenomena (the tendency of two words to co-occur
in a specific syntactic configuration) whereas SFs model unilexical
现象 (the tendency of one word to select a specific SF).

The number of different SC patterns is set a priori whereas the number of
SF patterns is data driven.

Note that SC patterns are described at a surface syntactic level and, in the case
of control and raising verbs, the SC pattern will be defined over the subject and the
control/raising verb and not the embedded verb—although the selectional constraint is
between the subject and the embedded verb. In the event of coordination of subject or
目的, only one of the coordinated elements will be extracted.

The score of an SC should reflect the tendency of the root lemma lr and the leaf
lemma ll to appear together in configuration C. It should be maximal if whenever lr oc-
curs as the root of configuration C, the leaf position is occupied by ll and, symmetrically,
if whenever ll occurs as the leaf of configuration C, the root position is occupied by lr.
A function that conforms to such a behavior is the following:

sSC(C, lr, ll) = 1
2

(西德:18) C(C, lr, ll)
C(C, lr, )

+

C(C, lr, ll)
C(C, , ll)

(西德:19)

where C(C, 我, ) (分别, C(C, , 我)) is the number of occurrences of configuration C
with lemma l as a root (分别, leaf) 和C(C, lr, ll) is the number of occurrences of
configuration C with lemma lr as a root and lemma ll as a leaf, 同时地.

64

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

This function takes its values between 0 (lr and ll never co-occur) 和 1 (lr and ll
always co-occur). It is close to pointwise mutual information (教堂和汉克斯 1990)
but takes its values between 0 和 1.

3.4.1 Selecting the Optimal Set of Selectional Constraints. Given a sentence S of length N
and the set I(西德:48) of Instantiated Selectional Constraints (ISCs) over S, the selection of the
optimal set ˆI(西德:48) ⊆ I(西德:48) is framed as the following ILP program.

(西德:114)

(西德:114)

(西德:114)

Definition of the variables

γj
i = 1 if word number i is the root of ISC number j, and γj
否则.
i if word number i is the leaf of ISC number j, and δj
δj
否则.

i = 0

i = 0

Definition of the constraints

a word cannot be the leaf of more than one ISC

∀i ∈ {1, . . . 氮}

δj
i ≤ 1

(西德:88)

j∈I(西德:48)

for an ISC j to be selected, both its root (IE。, 右(j)) and its leaf must
be selected

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

∀j ∈ {1, . . . , |我(西德:48)|} γj

右( j) − δj

d∈L(j) = 0

Definition of the objective function

max

(西德:88)

j∈I(西德:48)

γj
右(j)sSC(j)

Let us illustrate the selection process on the same sentence (Jean rend le livre qu’il a

emprunt´e `a la biblioth`eque.). A set of four ISCs is defined:

5 (0.2,[:V:rend:rendre:2]

([SUJ:氮:让:让:1]))

6 (0.2,[:V:rend:rendre:2]

([OBJ:氮:livre:livre:4]))

7 (0.4,[:V:rend:rendre:2]

([A-OBJ:磷:A:A:9]

([OBJ:氮:biblio.:biblio.:11])))

8 (0.6,[:V:emprunte:emprunter:8]

([A-OBJ:磷:A:A:9]

([OBJ:氮:biblio.:biblio.:11])))

The actual IL program is the following:

(西德:114)

Definition of the variables

2, γ7
2

2, γ6
γ5
γ8
8
δ5
1
δ6
4
11, δ8
δ7
11

verb rendre can be the root of ISC 5, 6, 和 7
verb emprunter can be the root of ISC 8
noun Jean can be the leaf of ISC 5
noun livre can be the leaf of ISC 6
noun biblioth`eque can be the leaf of ISC 7 或者 8

65

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

(西德:114)

(西德:114)

Definition of the constraints

a word cannot be the leaf of more than one ISC

δ5
1 ≤ 1
δ6
4 ≤ 1
11 + δ8
δ7

11 ≤ 1

noun Jean can only be the leaf of ISC 5
noun livre can only be the leaf of ISC 6
noun biblioth`eque cannot be the leaf of both ISC 7 和 8

for an ISC to be selected, both its root and its leaf must be selected
ISC 5 must have a root and a leaf
ISC 6 must have a root and a leaf
ISC 7 must have a root and a leaf
ISC 8 must have a root and a leaf

γ5
2 − δ5
2 − δ6
γ6
γ7
2 − δ7
8 − δ8
γ8

1 = 0
4 = 0
11 = 0
11 = 0

Definition of the objective function

max(0.2γ5

2 + 0.2γ6

2 + 0.4γ7

2 + 0.6γ8
8)

The problem admits twelve solutions, represented in Figure 2. The optimal solution is
solution number 12, in which the prepositional phrase `a la biblioth`eque is attached to the
verb emprunter.

3.5 Combining Subcategorization Frames and Selectional Constraints

SF selection and SC satisfaction can be combined together in a single ILP program that
combines the variables and constraints of the two problems and adds a new constraint
that takes care of the incompatibilities of SCs and SFs.

Given a sentence S of length N, the set I of ISFs and the set I(西德:48) of ISCs, the selection

of the optimal set ˆI(西德:48)(西德:48) ⊆ I ∪ I(西德:48) is the solution of the following program:

(西德:114)

(西德:114)

Definition of the variables
我, βj
αj
Definition of the constraints

我, γj

我, δj

All the constraints of ISF selection and ISC satisfaction.

2 δ7

1 γ6

11 γ8

2 δ5
γ5

8 δ8
4 γ7
2 δ6
1 ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
2 • • ◦ ◦ ◦ ◦ ◦ ◦
3 ◦ ◦ • • ◦ ◦ ◦ ◦
4 ◦ ◦ ◦ ◦ • • ◦ ◦
5 ◦ ◦ ◦ ◦ ◦ ◦ • •
6 • • • • ◦ ◦ ◦ ◦
7 • • ◦ ◦ • • ◦ ◦
8 • • ◦ ◦ ◦ ◦ • •
9 ◦ ◦ • • • • ◦ ◦
10 ◦ ◦ • • ◦ ◦ • •
11 • • • • • • ◦ ◦
12 • • • • ◦ ◦ • •

11 分数
0.0
0.2
0.2
0.4
0.6
0.4
0.6
0.8
0.6
0.8
0.8
1.0

数字 2
Representation of the 12 solutions to the ILP problem.

66

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

Incompatible ISFs and ISCs cannot be selected together.
An ISF j and an ISC j(西德:48) are not compatible if they share a common
leaf (我(西德:48)) but have different roots (我 (西德:54)= i(西德:48)(西德:48)). Such a constraint can be
modeled by the following inequality:

∀i, 我(西德:48), 我(西德:48)(西德:48), j, j(西德:48) s.t. 我 (西德:54)= i(西德:48)(西德:48), 2αj

我 + βj

我(西德:48) + 2γj(西德:48)

我(西德:48)(西德:48) + δj(西德:48)

我(西德:48) (西德:54)= 5

(西德:114)

Definition of the objective function

(西德:88)

max

δj
右(j)sSC(j) +

j∈I(西德:48)

(西德:88)

j∈I

αj

右(j)sSF(j)

The actual ILP program is too large and redundant with the two preceding ones to
be exposed fully here. Its set of variables and constraints is the union of the variables
and the constraints of the preceding ones, plus the two following ones:

2α2
2α4

2 + β2
8 + β4

11 + 2γ8
11 + 2γ7

8 + δ8
2 + δ7

11 (西德:54)= 5
11 (西德:54)= 5

ISF 2 and ISC 8 are incompatible
ISF 4 and ISC 7 are incompatible

Definition of the objective function:

max(0.2γ5

2 + 0.2γ6

2 + 0.4γ7

2 + 0.6γ8

8 + 0.2α1

2 + 0.4α2

2 + 0.3α3

8 + 0.6α4
8)

The problem admits 94 solutions. The optimal solution is made of ISFs 1 和 4, 和

ISCs 5, 6, 和 8.

4. Combining Parsing and Patching

Two processes have been described in Sections 2 和 3. In the first one, parsing is based
on dynamic programming and produces a complete parse of a sentence S whereas in
ˆI(西德:48)(西德:48) 制成
第二, patching is based on ILP and produces a partial parse of S: 集合
of ISCs and ISFs.

We would like now to produce a parse for sentence S that combines constraints
coming from parsing and patching. We have exposed in Section 1 the reasons why we
chose to keep these two processes separated and to combine their solutions to produce
a single parse. The whole process is composed of three steps:

1.

2.

3.

Sentence S is first parsed using a first-order parser and the set of k-best
parses is produced in order to generate the set I of ISFs and the set I(西德:48) 的
ISCs.

Patching is then performed and computes the optimal set
ISFs and ISCs.
A new scoring function s+
produces a parse tree ˆT that preserves the ISFs and ISCs computed in
Step 2.

ˆI(西德:48)(西德:48) is then computed and a second-order parser

ˆI(西德:48)(西德:48) ⊆ I ∪ I(西德:48) 的

67

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

This setting is strongly biased in favor of the patching process. Because of the nature
of the scoring function s+
ˆI(西德:48)(西德:48) , the ISFs and ISCs computed during the patching step are
kept in the final solution even if considered unlikely by the parser. 在实践中, 这
solution does not perform well.

In order to let the parser influence the solution of the patching process, the confi-
dence measure CM, defined in Section 2, is introduced in the objective function of the
ILP. Recall that CM() indicates how confident the parser is about sub-tree ∆. 这
quantity is added to the scores of SFs and SCs introduced in sections 3.3 和 3.4.

The new scoring functions ˆsSF and ˆsSC are defined as follows:

ˆsSF() = (1 − µ1)sSF() + µ1CM()

ˆsSC() = (1 − µ2)sSC() + µ2CM()

where the values of µ1 and µ2 are determined experimentally on a development set. 在
the definition of ˆsSF and ˆsSC, the first component (sSF and sSC) can be interpreted as a
lexical score and the second one (CM) as a syntactic score.

After patching is performed and the set

ˆI(西德:48)(西德:48) is computed, the scoring function of the
ˆI(西德:48)(西德:48) to appear in

second order parser is modified in order to force the ISFs and ISCs of
the parser’s solution.

As explained in Section 2.2, the modification of the scoring function modifies only
the scores of first-order factors (single dependencies). The modification amounts to
setting to −∞ the scores of the dependencies that compete with those dependencies
that occur in ˆI(西德:48)(西德:48). At this point, 集合
ˆI(西德:48)(西德:48) is just considered as a set of first-order
因素, whatever the ILSC they are a member of. The scores of second-order factors
are not modified. This solution works because the second-order parser uses both first-
and second-order factors. There is no need to modify the scoring function of the second-
(or higher) order factors since the first-order factors that they are composed of will be
discarded anyway, because of their modified scores.

Let us illustrate the process on the sentence Jean commande une glace `a la serveuse.
[Jean orders an ice-cream from the waitress.]. The parser output for this sentence
wrongly attaches the preposition `a to the noun glace whereas it should be attached to the
ˆI(西德:48)(西德:48) is made of the ISF (commander,
verb commander. Suppose that, after patching, 集合
SBJ:N OBJ:N AOBJ:氮): the verb commander takes an indirect object introduced by prepo-
sition `a, and the ISC (VaN, commander, serveuse): the noun serveuse is the indirect object of
ˆI(西德:48)(西德:48) is viewed as the set of dependencies { (commande,
the verb commander. At this point
SUJ, 让), (commande, OBJ, glace), (commande, AOBJ, `a), (`a, OBJ, serveuse)}. The score of
all competing dependencies, which are all dependencies that have either Jean, glace, `a, 或者
serveuse as a dependent, are set to −∞ and will be discarded from the parser solution.

5. Complexity of the Patching Problem

The two-stage architecture proposed in this paper—parsing using dynamic program-
明; then patching using ILP—suffers from a potential efficiency problem because
of the complexity of ILP, which is NP-complete (Karp 1972) in the general case. 我们的
system has therefore an exponential worst-case complexity. Two questions then arise:

68

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

Could patching be solved using a polynomial algorithm? How does a general ILP solver
perform, 在实践中, on our data?

We will answer these questions in two steps. 第一的, we will prove in this section that
patching is actually NP-complete. Building the optimal solution requires exponential
time in the worst case. 第二, we will show in Section 7, 那, due to the size of the
instances at hand, the processing time using a general ILP solver is reasonable.

In order to prove that patching is NP-complete, we will first show that some
instances of patching can be expressed as a Set Packing problem (SP), known to be
NP-complete (Karp 1972). The representation of a patching problem as a SP problem
will be called patching as a Set Packing problem (P-SP). We will then reduce SP to P-SP
by showing that any instance of SP can be represented as an instance of P-SP, 这将
prove the NP-completeness of P-SP and therefore the NP-completeness of patching.

The Set Packing decision problem is the following: Given a finite set S and a list U of
subsets of S, are there k subsets in U that are pairwise disjoint? The optimization version
of SP, the Maximum Set Packing problem (MSP), looks for the maximum number of
pairwise disjoint sets in U.

We will show that a subset of the instances of patching can be expressed as an MSP.

更确切地说, we restrict ourselves to instances such that:

(西德:114)

(西德:114)

the set I(西德:48) of Instantiated Constraint Selections is empty (we only deal with
a set I of ISFs) 和

the scores of all ISFs in I are equal.

The reason why we limit ourselves to a subset of patching is that our goal is to
reduce any SP problem to a patching problem. Reducing an SP problem to some special
cases of patching is enough to prove NP-completeness of patching. Patching in the
general case might be represented as an MSP problem but this is not our goal.

Recall that patching with ISF is subject to three constraints:

C1: a word cannot be the predicate of more than one ISF

C2: a word cannot be the argument of more than one ISF

C3: for an ISF to be selected, both its predicate and its arguments must be selected

Given a sentence S = w1, . . . , wn and a set I of ISFs on S, let us associate with S the
set S = {1, . . . , n}. As a first approximation, let us represent an ISF of I as the subset
of S composed of the indices of the arguments of the ISF. The list of ISF I is therefore
represented as a list U of subsets of S. Solving the MSP problem for (S, U ) produces the
optimal solution ˆU that satisfies C2 and partly C3. C2 is satisfied because the elements
of ˆU are pairwise disjoint. C3 is partly satisfied because all the arguments of an ISF are
selected altogether, but not the predicate.

The most straightforward way to take C1 into account is to add the index of an
ISF predicate in the subset corresponding to the ISF. 很遗憾, this solution is not
正确的. It will correctly constrain a word to be the predicate of at most one ISF but it will
incorrectly prevent a word from acting as a predicate of an ISF and as an argument of
another ISF. 为了解决这个问题, we will associate two integers to a word of
the sentence, one representing the word as a predicate and the other representing it as
an argument. Given a sentence of length n and the word of index i, i will represent
the word as an argument and i + n will represent the word as a predicate. An ISF

69

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

. . . wak as arguments is represented as
having word wp as a predicate and words wa1
the subset {a1, . . . ak, p + n}. The solution to the MSP problem using this representation
clearly verifies our three constraints.

Given a sentence S = w1, . . . , wn and a set of ISFs I, we can build the set S =
{1, . . . , 氮}, with N ≥ 2n4 and the list U such that any element of U corresponds to an
ISF of I, using the method described here. The solution to the MSP on input (S, U ) 是
equivalent to the solution of patching on input (S, 我 ). Such an MSP problem will be
called a P-MSP problem (patching as Maximum Set Packing). Recall that the decision
version of the problem is called P-SP (patching as Set Packing).

In order to prove the NP-completeness of patching, we will reduce any instance
of SP to an instance of P-SP. Given the set S = {1, . . . , n} and the list U = (U1, . . . , Um)
of subsets of S, let us build the set S (西德:48) = {1, . . . , n + 米} and the list U (西德:48) = (U (西德:48)
1, . . . , U (西德:48)
米)
i = Ui ∪ {我 + n}. 这对 (S (西德:48), U (西德:48)) is a valid instance of P-SP. The solution
such that U (西德:48)
of P-SP on input (S (西德:48), U (西德:48)) corresponds to the solution of SP on input (S, U ).5 如果有
was a polynomial algorithm to solve patching, there would be a polynomial algorithm
to solve SP. SP would therefore not be NP-complete, which is incorrect. Patching is
therefore NP-complete.

6. Experimental Set-up

We describe in this section and the two following ones the experimental part of this
工作. This section concerns language-independent aspects of the experimental set-up
and Sections 7 和 8 describe the experiments and results obtained, 分别, 在
French and English data.

The complete experiments involve two offline and three online processes. 这

offline processes are:

1.

2.

Training the parser on a treebank.

Extracting SFs and SCs from raw data and computing their scores. 这
result of this process is the set L.

The online processes are:

1.

2.

3.

Generating the set I of ISFs and ISCs on a sentence S, using L. 我们会打电话
this process candidate generation.
Patching sentence S with I, using ILP. The result of this process is the set ˆI.
Parsing S under the constraints defined by ˆI.

The remaining part of this section is organized as follows: 部分 6.1 gives some
details about the parser that we use; 部分 6.2 describes some aspects of the extraction
of SCs and SFs from raw data. 部分 6.3 focuses on the candidate generation process.

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

4 At this point, we can impose that N = 2n. We will need the condition N ≥ 2n later in order to prove the
reducibility of SP to P-SP. The condition N ≥ 2n is not harmful, if N > 2n, the range [2n + 1, . . . , 氮] 将要
simply not be used when representing a patching instance as a P-SP instance.

5 It is important to note that in the set U (西德:48)

any new constraint because this element only occurs in the set U (西德:48)
of another set U (西德:48)

k with k (西德:54)= i.

i = Ui ∪ {我 + n}, adding the new element i + n does not introduce
i and will not conflict with any element

70

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

We have decided not to devote specific sections to the processes of patching and
parsing under constraints. Patching has been described in full in Section 3. The ILP
solver we have used is SCIP (Achterberg 2009). Parsing under constraints has been
节中描述 2.2.

6.1 Parser

The parser used for our experiments is the second-order graph-based parser implemen-
tation of Bohnet (2010). We have extended the decoding part of the parser in order to
implement constrained parsing and k-best parses generation. The first-order extension
is based on algorithm 3 of Huang and Chiang (2005), and the second-order extension
relies on a non-optimal greedy search algorithm.

The parser uses the 101 feature templates, described in Table 4 of Bohnet (2010).
Each feature template describes the governor and dependent of a labeled dependency,
called the target dependency, as well as some features of its neighborhood. Every
feature is associated with a score, computed on a treebank. A parse tree score is defined
as the sum of the features it contains and the parser looks for the parse tree with the
highest score.

Feature templates are divided into six sets:

Standard Feature Templates are the standard first-order feature templates. They de-
scribe the governor and the dependent of the target dependency. The governor
and the dependent are described by combinations of their part of speech, 引理,
morphological features, and fledged form.

Linear Feature Templates describe the immediate linear neighbors of the governor
and/or the dependent of the target dependency. In such templates, governor,
dependent, and neighbors are represented by their part of speech.

Grandchild Feature Templates are the first type of second-order features. They de-
scribe the structural neighborhood of the target dependency. This neighborhood
is limited to a child of the dependent of the target dependency. Such templates
are suited to model linguistic phenomena such as prepositional attachments.

Linear Grandchild Feature Templates describe the immediate linear neighbors of the

governor, the dependent, and the grandchild of the target dependency.

Sibling Feature Templates are the second type of second-order feature templates. 他们
describe the structural neighborhood of the target dependency limited to a sibling
of the dependent of the target dependency. Such templates are suited, 例如,
to prevent the repetition of non-repeatable dependencies.

Linear Sibling Feature Templates describe the immediate linear neighbors of the

governor, and the two siblings.

When run in first-order mode, the parser only uses Standard and Linear features.

All features are used when the parser is run in second-order mode.

Four measures are used to evaluate the parser. The first two are standard whereas
the third and fourth have been specifically defined to monitor the effect of our model
on the performances of the parser:

Labeled Accuracy Score (LAS) of a tree T is the ratio of correct labeled dependencies

in T.

71

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

Unlabeled Accuracy Score (UAS) of a tree T is the ratio of correct unlabeled dependen-

cies in T.

Subcategorization Frame Accuracy Score (SFAS) of a tree T is the ratio of verbs in T
that have been assigned their correct subcategorization frame. 更确切地说,
given the reference parse R for sentence S and the output H of the parser for the
same sentence, all ISFs are extracted from R and H, yielding the two sets RSF and
HSF. SFAS is the recall of HSF with respect to RSF (

).

|RSF∩HSF|
|RSF|

Selectional Constraint Accuracy Score (SCAS) of a tree T is the ratio of correct occur-
rences of SC patterns in T. 更确切地说, given a gold standard tree R for sentence
S and the output H of the parser for the same sentence, all ISCs are extracted from
R and H, yielding the two sets RSC and HSC. SCAS is the recall of HSC with respect
to RSC.

6.2 Extracting SFs and SCs from Raw Data

The LSC extraction process takes as input raw data and produces a set L composed
of SCs and SFs. The process is straightforward: The corpora are first parsed, then LSC
templates are applied on the parses, using unification, in order to produce SCs and SFs
along with their number of occurrences. These numbers are used to compute selectional
constraints scores (sSC), as defined in Section 3.4, and subcategorization frame scores
(sSF) 如节中所述 3.3.

The extraction of lexical co-occurrences from raw data has been a very active direc-
tion of research for many years. Giving an exhaustive description of this body of work is
clearly beyond the aim of this article. The dominant approach for extracting lexical co-
occurrences, as described, 例如, in Volk (2001), Nakov and Hearst (2005), 坑
等人. (2010), and Zhou et al. (2011) directly model word co-occurrences on word strings.
Co-occurrences of pairs of words are first collected on raw corpora or n-grams. Based
on the counts produced, lexical affinity scores are computed. The detection of pairs of
word co-occurrences is generally very simple: It is either based on the direct adjacency
of the words in the string or their co-occurrence in a window of a few words. Bansal
and Klein (2011) and Nakov and Hearst (2005) rely on the same sort of techniques
but use more sophisticated patterns, based on simple paraphrase rules, for identifying
co-occurrences. Our work departs from these approaches by extracting co-occurences
in specific lexicosyntactic contexts that correspond to our SC patterns. Our approach
allows us to extract more fine-grained phenomena but, being based on automatically
parsed data, it is more error-prone.

Extracting SFs for verbs from raw data has also been an active direction of research
for a long time, dating back at least to the work of Brent (1991) and Manning (1993).
最近, Messiant et al. (2008) proposed such a system for French verbs. 这
method we use for extracting SF is not novel with respect to such work. Our aim was
not to devise new extraction techniques, but merely to evaluate the resource produced
by such techniques for statistical parsing.

6.3 Candidate Generation

The candidate generation process has as input a sentence S and a set L of SCs and SFs.
It produces the set I made of instantiations of elements of L on S. This set constitues
the input of the ILP solver that will produce the solution ˆI. The candidate generation

72

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

process is very important because it defines the search space of the patching process.
If the search space is not wide enough, it might fail to contain the correct solution. 如果
it is too wide, the IL programs generated will be too large to be solved in a reasonable
amount of time.

Several methods can be used in order to perform the candidate generation process.
One could compute I on the linear representation of S enriched with part of speech
tags and lemmas. Bechet and Nasr (2009), 例如, represent LSCs as finite-state
automata that are matched on S. This method has a tendency to over-generate and
proposes some very unlikely instantiations.

In order to constrain the instantiation process, some syntax can be used. 考试用-
普莱, LSCs can be matched on a set T of possible parses of S in order to produce I. 这
set T itself can be the k-best parses produced by a parser, as in parse reranking (柯林斯
and Koo 2005; Charniak and Johnson 2005). It can also be built using parse correction
技巧 (Hall and Nov´ak 2005; Attardi and Ciaramita 2007; Henestroza and Candito
2011). In the latter approach, a parse tree T of S is first built, using a parser, and parse cor-
rection rules are applied on it in order to build other parses that might correct errors of T.
The solution that we propose uses a parser to produce a list T = {T1, . . . , Tk} of k-
best parses, such as in parse reranking. But this list is merged into a set of dependencies
Dk that is the union of all the dependencies appearing in the k-best trees. The set Dk is
not a tree because a single word can have several governors. This set can be seen as a
superset of the k-best parses. We will call this method sub-parse combination or simply
combination when the context is not ambiguous.

The main difference between reranking and combination is that the latter can com-
bine dependencies that do not co-occur in a single tree of the k-best list. The search space
that it defines is therefore larger than the k-best list. The set Dk can be compared to the
pruned forest of Boullier, Nasr, and Sagot (2009). Once the set Dk is built, the LSCs of L
are matched on the dependencies of Dk using unification.

In order to compare the search spaces respectively defined by a k-best parse list and
the corresponding Dk set, we define two measures on the k-best parse list. The first one
(Oracle LAS) corresponds to the LAS of the best tree in the k-best list. The second one
(Recall LAS) is the part of labeled dependencies of the reference parse that appear in
Dk. Oracle LAS constitutes an upper bound for reranking and Recall LAS is an upper
bound for combination. The values of these two measures are represented in Figure 3

数字 3
Oracle and Recall LAS of the k-best trees for first-order (左边) and second-order (正确的) 解析,
computed on FTB TEST.

73

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

数字 4
Recall SFAC and SCAS of the k-best trees for first-order (左边) and second-order (正确的) 解析,
computed on FTB TEST.

for first-order parsing (左边) and second-order parsing (正确的) measured on the test set of
the French Treebank.

As one can see in Figure 3, the Recall LAS curve is, 当然, always above the
Oracle LAS curve. In the case of first-order parsing, the Oracle curve quickly reaches a
plateau. This is because first-order k-best parses are generally very close to one another,
because of the very local nature of the model. The parses in the k-best list tend to
be combinations of dependencies that appeared in previous parses. 的数量
new dependencies created tend to decrease quickly with k. For k = 100, the difference
between the Oracle and the Recall scores is almost equal to 5 点. This difference
clearly demonstrates the advantage of using Dk instead of the k-best list.

The results for second-order parsing are less spectacular. The Oracle curve does
not reach an asymptote, as it was the case for first-order parsing. New dependencies
are continually being created as k increases. The reason why we have considered first-
order models in this section is that generating a first-order k-best solution can be done
very efficiently; second-order k-best generation is more time-consuming. Performing
candidate generation with a first-order parser is therefore a potentially interesting
方法. We will see in the two following sections that first-order k-best generation gave
good results on French but did not perform well on English, where second-order k-best
generation had to be used.

Although Figure 3 gives an estimation of the LAS upper bounds of reranking and
combination, it does not tell us what can be expected in terms of SFAS and SCAS.
数字 4 answers this question by showing the evolution of Recall SFAS and Recall SCAS
on the same data set. As one can see, for k = 100, with first-order k-best generation,
almost 94% of the ISCs and 93% of the ISFs that appear in the reference can be recovered.
The results for second-order k-best are, 分别, equal to 97% 和 96%.

7. Experiments on French

We describe in this section the experiments performed on French and the results ob-
泰内德. We start, in Section 7.1, with the description of the data used to train the parser
and give an overview of the errors made by the parser. The extraction of SCs and
SFs from raw corpora is described in Section 7.2, which starts with the description
of the corpora that were used, then gives general statistics and coverage results for

74

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

桌子 1
Decomposition of the French Treebank into training, 发展, and test sets.

number of sentences

number of tokens

FTB TRAIN
FTB DEV
FTB TEST

9,881
1,239
1,235

278,083
36,508
36,340

the extracted data. The results of patching and parsing are given in Section 7.3 和
an analysis of the errors is presented in Section 7.4. 部分 7.5 analyzes the runtime
performances.

7.1 Parsing

The parser was trained on the French Treebank (Abeill´e, Cl´ement, and Toussenel 2003),
transformed into dependency trees by Candito et al. (2009). The size of the treebank and
its decomposition into train, 发展, and test sets are represented in Table 1.

The parser gave state-of-the-art results for French: the LAS and UAS are, 重新指定-
主动地, 88.88% 和 90.71%. The SCAS reaches 87.81% and the SFAS is 80.84%, 哪个
means that in 80.84% of the cases, a verb selects its correct subcategorization frame.
These results are for sentences with gold part-of-speech-tags and lemmas.

An analysis of the errors shows that the first cause of errors is prepositional attach-
评论, which account for 23.67% of the errors. A more detailed description of the errors
is given in Table 2. Every line of the table corresponds to one attachment type. 第一个
column describes the nature of the attachment. The second indicates the frequency of
this type of dependency in the corpus. The third column displays the accuracy for this
type of dependency (the number of dependencies of this type correctly predicted by the
parser divided by the total number of dependencies of this type). The fourth column
shows the contribution of the errors made on this type of dependency to the global
error rate. We have used this error analysis to define the following four SC patterns,
which we call SBJ, OBJ, VdeN, and VaN:

([:V:::]([SUJ:氮:::]))
([:V:::]([OBJ:氮:::]))

SBJ
OBJ
VdeN ([:V:::]([DE-OBJ:磷:的::]([OBJ:氮:::])))
VaN

([:V:::]([A-OBJ:磷:A::]([OBJ:氮:::])))

The last column of Table 2 indicates the SC pattern that corresponds to a given
type of error. One can see that all SC patterns are not described at the same level of
detail—some of them specify the label of the dependency, if considered important (到
distinguish subject and object, 例如), and others specify the lexical nature of the
dependent (for prepositions, 例如).

7.2 Extracting SFs and SCs

In order to automatically produce the set L of SCs and SFs, we gathered a raw corpus
that we have parsed using the parser described in Section 7.1. The corpus is actually a
collection of three corpora of slightly different genres. The first one (AFP) is a collection
of news reports of the French press agency Agence France Presse; 第二 (EST REP)
is a collection of newspaper articles from a local French newspaper: l’Est R´epublicain.

75

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

桌子 2
The seven most common error types made by the baseline parser. For each type of error, 我们
displayed its frequency, the parser accuracy, the impact on global errors, and the SC pattern that
corresponds to the error type.

依赖性

freq.

acc.

contrib.

SC pattern

N—mod →N
V—∗ → `a
V—suj → N
N—coord → CC
N—∗ → de
V—∗ → de
V—obj → N

1.50
0.88
3.43
0.77
3.70
0.66
2.74

72.23
69.11
93.03
69.78
92.07
74.68
90.43

2.91
2.53
2.53
2.05
2.05
1.62
1.60

VaN
SBJ

VdeN
OBJ

The third (WIKI) is a collection of articles from the French Wikipedia. The size of the
different corpora are detailed in Table 3.

The raw corpora were first tokenized, POS-tagged, and lemmatized with the
MACAON tool suite (Nasr et al. 2011), then parsed in order to get the most likely parse
for every sentence. The SFs and SCs were then extracted from the parses and their scores
computed. In the two following sections, we give some statistics on the SFs and SCs
produced.

7.2.1 Extracted SF. As described in Section 3.3, an SF frame is a special kind of LSC. 它是
root is a predicate and its leaves are the arguments of the predicate. Some linguistic
constraints are defined over SF. The first one is the category of the root, which must
be either a finite tense verb (V), an infinitive form (VINF), a past participle (VPP), 或者
a present participle (VPR). The second is the set of syntactic functions that introduce
arguments of the predicate; they can either be a subject (SUJ), a direct object (OBJ), or an
indirect object introduced by preposition `a (A-OBJ) or de (DE-OBJ).

SFs are defined at a level that is slightly more abstract than the surface syntactic
realization. Two abstractions are in action. The first one is the factoring of several parts
of speech. We have factored pronouns, common nouns, and proper nouns into a single
category N. We have not gathered verbs of different modes into one category because
the mode of the verb influences its syntactic behavior and hence its SF. 第二
abstraction is the absence of linear order between the arguments.

The number of verbal lemmas, number of SFs, and average number of SFs per verb
are shown in Table 4, column A0. As can be observed, the number of verbal lemmas and

桌子 3
Some statistics computed on the corpora used to collect Subcategorization Frames and
Selectional Constraints.

CORPUS Number of sentences Number of tokens

2,041,146
2,998,261
1,592,035
5,198,642

59,914,238
53,913,288
33,821,460
147,648,986

AFP
EST REP
WIKI
TOTAL

76

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

桌子 4
Number of verbal lemmas, SFs, and average number of SFs per verb for three levels of
thresholding (0, 5, 和 10).

number of verbal lemmas
number of different SFs
average number of SFs per verb

A0

23,915
12,122

A5

4,871
2,064

A10

3,923
1,355

14.26

16.16

13.45

SFs are unrealistic. This is because the data from which SFs were extracted are noisy:
they consist of automatically produced syntactic trees that contain errors. 那里有两个
main sources of errors in the parsed data: the pre-processing chain (tokenization, 部分-
of-speech tagging, and lemmatization), which can consider as a verb a word that is
不是; 和, 当然, parsing errors, which tend to create incorrect SFs. In order to fight
against noise, we have used a simple thresholding: We only collect SFs that occur more
than a threshold i. The results of the thresholding appear in columns A5 and A10, 在哪里
the subscript is the value of the threshold. It is a very crude technique that allows us
to eliminate many errors but also rare phenomena. 正如预期的那样, both the number of
verbs and SFs decrease sharply when increasing the value of the threshold. The average
number of SFs per verb is 14.26 without threshold and reaches 16.16 for a threshold
的 5. This phenomenon is mainly due to part of speech tagging errors on the raw
语料库: A large part of hapax are words mistakenly tagged as verbs. They disappear
with thresholding, and their disappearance tends to increase ambiguity.

We report in Table 5 the coverage of the extracted SFs on FTB DEV. The last column
represents the coverage of the SF extracted from FTB TRAIN. These figures constitute
an interesting reference point when evaluating the coverage of the SFs extracted from
raw data. We have distinguished lexical coverage (ratio of verbal lemmas of FTB DEV
present in the resource) and syntactic coverage (ratio of SFs of FTB DEV present in the
resource). Both measures were computed on types and tokens. 桌子 5 shows that lexical
coverage is very high, even on A10. It is interesting to note that syntactic coverage of the
automatically built resource is much higher than syntactic coverage of the same resource
extracted from FTB TRAIN. This clearly shows that many SFs of FTB DEV were not seen
in FTB TRAIN. This observation supports our hypothesis that treebanks are not large
enough to accurately model the possible SFs of a verb. Comparing coverage of T and
Ax is somehow unfair because, as we already mentioned, the automatic resources are
嘈杂, and have a tendency to overgenerate whereas T contains only correct SFs (最多
the corpus annotation errors). Accuracy results of Section 7.3 will give a more balanced

桌子 5
Lexical and syntactic coverage on FTB DEV of the extracted SFs. Coverage is computed for three
thresholding levels of the automatically generated resource (A0, A5, and A10) and FTB TRAIN (时间).

A0

A5

A10

时间

Lex. cov.
Synt. cov.

类型
99.52
95.78

代币
99.85
97.13

类型
98.56
91.08

代币
99.62
93.96

类型
98.08
88.84

代币
99.50
92.39

类型
89.56
62.24

代币
96.98
73.54

77

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

桌子 6
Number of occurrences of the four SC patterns (OBJ, SBJ, VdeN, and VaN) in raw corpora with
three levels of thresholding (0, 5, 和 10).

SC pat.

OBJ
SBJ
VdeN
VaN
TOTAL

A0

A5

A10

422,756
433,196
116,519
185,127
1,157,598

58,495
55,768
11,676
23,674
149,613

26,847
25,291
4,779
10,729
67,646

view by computing how many errors are actually corrected using the noisy extracted
resource.

7.2.2 Extracted SCs. The number of different SCs extracted for each of the four SC
configurations are shown in Table 6 for the three thresholds 0, 5, 和 10.

The coverage of SCs on FTB DEV is shown in Table 7. It is interesting to note that
SC coverage is much lower than SF coverage. This is due to the different nature of the
two phenomena: SC is a bilexical phenomena whereas SF is a monolexical one. 理想情况下,
a larger corpus should be used to extract SCs. One can also note that the coverage of
SCs extracted from FTB DEV is very low.

7.3 结果

The results obtained by our method are shown in Table 8, using the four measures
defined in Section 6.1. The accuracy scores of the baseline parser are presented in line 1.
Line 2 shows the results obtained when setting parameters µ1 and µ2 to 1. This setting
corresponds to the situation where SF and SC scores are not used. Patching, 在这个
案件, only combines ILSCs that have a good confidence measure. This experiment is
important in order to ensure that sSF and sSC do carry useful information. As one can
看, setting µ1 and µ2 to 1 slightly increases the accuracy of the parser, which means that
the accuracy of the parser can be increased just by imposing the final parse sub-parses
that have a good confidence measure.

The optimal values of µ1 and µ2 were computed on FTB DEV; they are both 0.65.
These values are used for the experiments corresponding to lines 3, 4, 和 5. Lines 3

桌子 7
Coverage for the four SC configurations (OBJ, SBJ, VdeN, VaN). Coverage is computed for three
thresholding levels of the automatically generated resource (A0, A5, and A10) and FTB TRAIN (时间).

SC pat.

A0

A5

A10

时间

OBJ
SBJ
VdeN
VaN
TOTAL

类型
68.31
66.26
46.28
62.93
65.17

代币
69.71
68.18
56.61
64.69
67.54

类型
58.50
52.87
34.93
48.89
53.10

代币
61.51
57.24
49.57
52.42
57.15

类型
53.24
47.82
31.66
42.68
47.92

代币
56.12
52.94
46.73
47.56
52.78

类型
17.94
23.11
11.57
20.06
19.96

代币
21.31
24.48
15.77
23.34
22.24

78

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

桌子 8
Four measures of accuracy on FTB TEST for four settings: Baseline second-order parser, patching
with confidence measure, patching with SF only, patching with SCs only, and patching with SFs
and SCs.

LAS

UAS

SCAS

SFAS

Baseline
CM only
SF Selection
SC Satisf.
组合

88.88
89.02
89.21
89.22
89.51

90.71
90.98
91.05
91.09
91.39

87.81
88.03
88.19
91.97
92.89

80.84
81.16
84.13
82.03
85.06

和 4 indicate the accuracy for SF selection and SC satisfaction alone, as described in
Sections 3.3 和 3.4. Line 5 provides the result when they are combined, as described in
部分 3.5.

The table shows that, when setting µ1 and µ2 to their optimal values, the three
methods increase both LAS and UAS with respect to the baseline. The best results are
obtained by the combined method. SCAS and SFAS allow us to gain a better under-
standing of what is happening. As could be expected, SF selection has a stronger impact
on SFAS whereas SC satisfaction has a stronger impact on SCAS. But both methods also
have a positive influence on the other measure. This is because SFs and SCs are not
independent phenomena and correcting one can also lead to correcting the other. 这
combined method yields the best results on our four measures.

7.4 误差分析

There are three main causes for SC and SF errors: insufficient coverage of L, narrowness
of the patching search space I, or inadequacy of the ILP objective function. Given a
sentence S, let us note R its correct parse and H the parse produced by our system.
Let us note ˆIR and ˆIH, 分别, the set of ILSCs of R and H. We define an error as
an ILSC x ∈ ˆIR, such that x /∈ ˆIH, and we note y the LSC corresponding to x (x is an
instantiation of y). The three error causes are described as follows:

NotInDB y /∈ L. This is the situation where an ILSC cannot be extracted from the k-
best list because its corresponding LSC does not appear in L. A possible cause for
such errors comes from the corpora we have used to extract LSCs. We have shown
in Section 7.2 that the coverage of our resources is not perfect and some ILSCs
that appear in the reference parse R might not appear in the resource. A solution
to such a problem would be to use larger corpora in order to increase coverage.
Another cause of this error is the processing chain that was used to process the
raw corpora. As already mentioned, any of the modules that compose this chain
can make errors and introduce noise in L.

NotInKB x /∈ I ∪ I(西德:48). This is the situation where x does not appear in any of the k-best
树. The problem is the limitation of the search space of the ILP. The solution
to this problem is either to increase the size of the search space or to use another
method to produce it, as proposed in Section 3.2. The simplest way to increase
the search space size is to increase the length of the k-best list. 然而, 增加

79

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

桌子 9
Raw number of errors, error decrease, and error distribution for each SC pattern, average of SC
图案, and SFs.

Type

OBJ
SBJ
VdeN
VaN
SC
SF

err.

72
101
59
132
364
807

decr. NotInDB NotInKB NotInSOL

37.5%
47.5%
27.1%
45.4%
41.6%
22.0%

0.55
0.41
0.42
0.24
0.38
0.22

0.44
0.30
0.70
0.33
0.42
0.24

0.22
0.47
0.21
0.59
0.41
0.54

the search space yields larger IL programs and therefore the resolution time of
the ILP.

NotInSOL x ∈ I ∪ I(西德:48) but x /∈ ˆIR. This is the situation where x is in the input of the
ILP program but is not part of the solution. There are two possible causes for this
situation. The first one comes from the scores sSF and sSC computed on the corpora.
A correct ILSC might have been assigned a lower score than an incorrect one. 这
situation, 反过来, can be due to different attachment preferences in the test and
the raw corpora used to produce L. The second cause is the confidence measure
used to introduce a syntactic dimension to the ILP objective function, as described
in Section 4. 在某些情况下, an ILSC might have a good lexical score but a bad
syntactic score, in which case it is ruled out.

桌子 9 details the distribution of the errors over these three categories. Column 1
describes the nature of the attachment. The four first rows correspond to specific SC
图案, 排 5 is the average of all SCs, and row 6 concerns SF selection. Column 2
indicates the number of errors made by the baseline parser. Column 3 indicates the
decrease of the number of errors and columns 4, 5, 和 6 give the distribution of the
errors over the three error types defined earlier.

As one can see from Table 9, our method decreases by 41.6% the number of SC
errors and by 22% the number of SF errors. The difference between these two figures
comes from the fact that SFs are more complex phenomena that SC patterns. SFs can be
composed of a large number of dependencies and an error in one of them leads to an
error in the SF selection.

The SC errors are almost equally distributed along the three error types: 在 38% 的
the cases, the correct SC is not L; 在 42% of the cases, the correct SC does not appear in
这 100 best parses; 并在 41% of the cases, the ILP fails to find the correct SC, 虽然
it is in the search space. There are three possible causes for the latter type of error: Recall
that the objective function of the ILP is based on the scores of ILSCs, which are them-
selves a combination of a lexical and a syntactic score: s() = (1 − µ)sL() + µsS().
The cause of the error could be the lexical score (sL), the syntactic score (sS), 或者
重量 (µ).

Most SF errors pertain to category NotInSOL: 在 54% of the cases, the correct SF is in
the input of the ILP but it is not part of the solution. This might be due to the definition
of the sSF score, which is a simple conditional probability.

80

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

数字 5
Decomposition of runtime on FTB TEST.

7.5 Runtime Analysis

数字 5 displays the runtime of our system with respect to the length of the sentences
parsed. Runtime has been decomposed in three parts: first-order 100-best generation,
ILP solving, and constrained second-order parsing. The runtime of the instantiation
process is not reported because it is negligible compared with the runtime of the other
流程.

数字 5 shows that ILP resolution is the most time-consuming part of the whole
过程. Although in practice the performances of the whole system are reasonable
(大约 330 words per second on average), we show that the patching problem
is NP-complete and we have no guarantee that the problem will be solved in polynomial
时间.

8. Experiments on English

A second series of experiments was performed on English. Although the setting used
is almost the same as for French, there are some notable differences between the two
series of experiments with respect to the data used:

(西德:114)

(西德:114)

(西德:114)

The treebank used for training the parser is significantly larger.

The definition of SC and SF patterns is different.

The raw corpus used to extract ISCs and ISFs is one order of magnitude
larger.

8.1 Parsing

The parser was trained and evaluated on the Penn Treebank (马库斯, 马尔辛凯维奇,
and Santorini 1993). We have used the standard decomposition of the Penn Treebank

81

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

桌子 10
Decomposition of the Penn Treebank into training, 发展, and test sets.

Number of sentences Number of tokens

PTB TRAIN (秒. 02–21)
PTB DEV
PTB TEST

(秒. 22)
(秒. 23)

39,279
1,334
2,398

958,167
33,368
57,676

into training, 发展, and test sets, with sizes as reported in Table 10. 主要的
difference with respect to French comes from the training set, 这是 2.44 times larger.
As could be expected, the parser trained and evaluated on the Penn Treebank yields
better results than the same parser trained on the French Treebank. The LAS is 91.76 和
UAS is 93.75, a relative increase of 3.24% 和 3.35% with respect to French.

The English SF and SC definitions differ from the French ones in the choice of the
preposition used. The nine most frequent prepositions occuring in the Penn Treebank
were taken into account ({的, 在, 到, 为了, 和, 在, 在, 从, 经过}) to define both SCs and SFs.
A set of 20 SC patterns are defined: {OBJ, SBJ, VofN, VinN, VtoN, VforN, VwithN, VonN,
VatN, VfromN, VbyN, NofN, NinN, NtoN, NforN, NwithN, NonN, NatN, NfromN,
NbyN}. SFs are defined as the dependents of a verb that are introduced by one of the
nine prepositions, plus the subject and the object.

The SCAS and SFAS are, 分别, 91.99 和 88.24: an increase of 4.76% 和
9.15% compared to French. These results show that the English parser makes fewer
errors on prepositional attachments. This situation may be due to the fact that prepo-
sitional attachments are more ambiguous in French (4.56 prepositions on average per
sentence in the FTB compared with 3.15 for the PTB).

桌子 11 details the error rate for each of the 20 SC patterns defined, sorted with
respect to their contribution to the total error rate. The total contribution of these errors
reaches 18.87%, which is an approximation of the upper limit of the improvement that
we can expect from our method. The mean accuracy for these patterns is almost equal
到 92%. Two situations should be distinguished, 然而: the subject and direct object
dependencies, 一方面, and the prepositional attachments, 在另一. 这

桌子 11
Contribution of the 20 SC patterns to the errors made by the parser. For each pattern, 我们
displayed its frequency, the parser accuracy, and the impact on global errors.

SC pat.

freq.

acc.

contrib.

SC pat.

freq.

acc.

contrib.

0.88
VinN
5.48
OBJ
7.71
SBJ
0.32
VforN
0.59
NinN
0.32
VonN
0.30
VtoN
0.35
NforN
0.16
NonN
VfromN 0.14

72.75
95.72
97.17
56.04
76.35
63.69
78.82
84.42
68.48
69.51

2.91
2.85
2.65
1.71
1.69
1.39
0.77
0.66
0.62
0.53

VatN
NofN
VwithN
VbyN
NtoN
NwithN
VofN
NatN
NfromN
NbyN
TOTAL

0.20
1.72
0.23
0.34
0.16
0.15
0.06
0.14
0.10
0.06
19.43

78.26
97.44
84.38
92.82
85.71
84.52
74.29
89.02
85.96
76.47
91.99

0.53
0.53
0.43
0.30
0.28
0.28
0.19
0.19
0.17
0.17
18.87

82

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

prepositional attachments (restricted to the nine most frequent prepositions) represent
6.24% of the dependencies of the test set but their contribution to the total error rate is
equal to 13.37% and the mean accuracy of these attachments is equal to 82.32%, 然而
subject and direct object dependencies correspond to 13.19% of the dependencies but
generate only 5.5% of the errors.

8.2 Extracting SFs and SCs

The ISFs and ISCs were extracted from the Google-parsed Gigaword (Napoles, Gormley,
and Van Durme 2012b, 2012A) (noted as the GIGA corpus in the remainder of the article),
which is a subset of the original English Gigaword, fifth edition (Parker et al. 2011),
parsed with the parser described in Huang, 哈珀, and Petrov (2010). The origin of the
corpora as well as their sizes are reported in Table 12. As one can see, the size of the
corpus is one order of magnitude larger than the French corpus we gathered to extract
ISFs and ISCs—more precisely, 这是 31 times larger.

The number of occurrences of each of the 20 SC patterns extracted are displayed in

桌子 13. A total of 33.6 million occurrences of SC patterns have been extracted.

桌子 14 shows some statistics on the number of SF extracted from the data. 这
number of different SFs extracted is, 一般, three times larger than in French.
This difference is mainly due to the definition of English SFs, which take into account
more prepositions. The mean number of SFs per verb is 15.39 without thresholding and
reaches 17.73 for a threshold of 10.

Tables 15 和 16 display coverage on Penn Treebank (PTB) DEV for SC and SF
extracted from the GIGA corpus, with three levels of thresholding, as well as the
coverage for SCs and SFs extracted from PTB TRAIN.

The coverage for SCs is 65.66% on tokens, for a threshold equal to 10, a relative
increase of 24% compared with French. This increase is the expected result of using
more data. The most striking figure of Table 15 is the coverage of the training corpus.
它是 22.24% for French and 54.55% for English. This increase cannot be explained
only by the larger size of the training corpus for English. It seems to indicate a higher
lexicosyntactic variety in the French Treebank than in the Penn Treebank, or a higher
variety between the development and training sets of the French Treebank compared
with the Penn Treebank.

桌子 12
Some statistics computed on the corpora used to collect Subcategorization Frames and
Selectional Constraints.

CORPUS

Number of sentences Number of tokens

Agence France-Presse, English Service
Associated Press Worldstream, English Service
Central News Agency of Taiwan, English Service
Los Angeles Times/Washington Post Newswire
Service
New York Times Newswire Service
Washington Post/Bloomberg Newswire Service
Xinhua News Agency, English Service
TOTAL

35,200,934
64,893,993
1,399,398
13,711,844

80,630,849
808,266
15,770,726
212,416,010

811,564,028
1,346,062,109
41,125,514
313,874,339

1,672,223,270
20,135,054
382,529,760
4,587,514,074

83

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

桌子 13
Number of occurrences of the 20 SC patterns in the GIGA corpus with three levels of
thresholding (0, 5, 和 10).

SC pat.

OBJ
SBJ
VofN
VinN
VtoN
VforN
VwithN
VonN
VatN
VfromN
VbyN
NofN
NinN
NtoN
NforN
NwithN
NonN
NatN
NfromN
NbyN
TOTAL

A0

A5

A10

5,517,917
5,150,122
488,475
1,866,766
1,201,947
1,223,887
1,591,669
1,148,819
640,019
816,035
1,200,835
4,655,268
2,028,525
784,074
1,691,557
1,066,320
956,125
516,497
743,002
347,791
33,635,650

1,531,695
1,312,531
79,799
471,104
2,750,660
263,518
302,467
260,671
143,864
164,210
252,166
1,048,782
404,431
128,504
296,692
138,280
170,875
93,686
114,957
56,020
7,509,318

933,764
768,133
37,128
270,713
153,676
142,346
151,949
142,715
80,390
85,407
131,468
579,842
212,265
63,935
144,884
62,051
85,611
47,476
54,695
27,363
4,175,811

桌子 14
Number of verbal lemmas, SFs, and average number of SFs per verb for three levels of
thresholding (0, 5, 和 10).

number of verbal lemmas
number of different SFs
average number of SFs per verb

A0

67, 186
31, 055

A5

9, 271
6, 394

A10

6, 305
3, 816

15.39

21.52

17.73

The syntactic coverage of the SF extracted from the GIGA corpus (the ratio of
对 [verbal lemma, SF] of PTB DEV present in the GIGA corpus) 是 77.56% when no
threshold is applied, although it was 97.13% for French. This difference is the result
of the more extended definition of SF in English. But the more surprising figure is the
coverage of the PTB TRAIN, 这是 82.7% (它是 73.54% for French). This difference
could be explained, as was the case for SC coverage, by a difference of lexicosyntactic
variety between the French Treebank and the Penn Treebank.

8.3 结果

In our experiments on French, the first step used a first-order k-best parser. 在里面
English experiment, as already noted, the accuracy of the second-order parser is higher
and the quality of the k-best parses using a first-order parser is not good enough to leave
enough room for improvement over a second-order parser. The situation is detailed in

84

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

桌子 15
Coverage for the 20 SC patterns on PTB DEV. Coverage is computed for three thresholding levels
of the automatically generated resource (A0, A5, and A10) and PTB TRAIN (时间).

SC pat.

A0

A5

A10

时间

类型
SC pat.
78.26
OBJ
79.18
SBJ
86.36
VofN
72.86
VinN
73.94
VtoN
66.96
VforN
VwithN 82.35
79.55
VonN
86.05
VatN
VfromN 68.09
59.05
VbyN
68.54
NofN
58.14
NinN
74.51
NtoN
63.56
NforN
NwithN 46.34
67.27
NonN
50.00
NatN
NfromN 48.72
52.17
NbyN
74.59
TOTAL

代币
71.86
82.66
86.96
73.94
74.67
67.2
83.02
78.26
87.23
68.75
59.63
68.07
59.12
76.79
64.17
47.62
66.07
51.22
44.19
52.17
74.8

类型
73.24
75.99
54.55
67.29
64.08
63.48
72.55
73.86
83.72
59.57
51.43
60.94
52.33
54.9
53.39
43.9
52.73
47.5
43.59
39.13
69.17

代币
66.13
78.25
56.52
68.66
64.67
64.00
73.58
72.83
85.11
60.42
52.29
60.53
53.59
58.93
54.17
45.24
51.79
48.78
39.53
39.13
69.01

类型
70.23
74.03
50
65.8
59.15
58.26
70.59
69.32
79.07
55.32
45.71
56.42
48.26
52.94
46.61
36.59
45.45
37.5
35.9
34.78
65.95

代币
62.07
76.34
52.17
67.25
60
59.2
71.7
68.48
80.85
56.25
45.87
56.14
49.17
57.14
47.5
38.1
44.64
39.02
32.56
34.78
65.66

类型
49.86
68.83
13.64
28.62
26.76
26.96
11.76
27.27
27.91
17.02
9.52
27.49
26.16
15.69
18.64
9.76
27.27
22.5
17.95
0.00
46.6

代币
58.43
75.97
17.39
32.04
28.67
31.20
13.21
30.43
29.79
16.67
11.01
28.77
28.73
19.64
19.17
11.90
28.57
21.95
25.58
0.00
54.55

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C

桌子 16
Lexical and syntactic coverage of the extracted SF on PTB DEV. Coverage is computed for
three thresholding levels of the automatically generated resource (A0, A5, and A10) 和
PTB TRAIN (时间).

A0

A5

A10

时间

Lex. cov.
Synt. cov.

类型
98.34
72.60

代币
99.37
77.56

类型
98.23
66.23

代币
99.35
72.87

类型
95.58
63.16

代币
98.77
70.97

类型
95.36
70.00

代币
98.95
82.70


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

桌子 17. The table shows that the Oracle accuracy of 100 best parses for a first-order
parser is often below the accuracy of the first-best parse for a second-order parser. 这
situation is slightly more favorable for k = 200 but is still dominated for some SCs by
the first-best second-order accuracy. This is the reason why we decided to use a second-
order parser in the candidate generation step. The rest of the process is unchanged.

The results obtained are shown in Table 18. The decrease of labeled error attach-
ments is 3.15% and the decrease of unlabeled error attachments is 4.16%. These results
are lower than what we observed for French, where the decrease is, 分别, 5.66%
和 7.32%. Selectional Constraint violations are decreased by 16.21% and erroneous
Subcategorization Frame assignments by 8.83%. They were equal to 41.6% 和 22%,
分别, for French. These results are disappointing—we expected that the larger

85

计算语言学

体积 42, 数字 1

桌子 17
Unlabeled accuracy for each of the 20 SC patterns using a second-order and first-order parser. 在
the second-order parser, the first-best and oracle scores for k = 100 are shown. In the first-order
parser, the first-best and the oracle score for k = 100 and k = 200 are shown.

SC pat.

100 oracle

first-best

100 oracle

o = 2

o = 1
200 oracle

first-best

OBJ
SBJ
VofN
VinN
VtoN
VforN
VwithN
VonN
VatN
VfromN
VbyN
NofN
NinN
NtoN
NforN
NwithN
NonN
NatN
NfromN
NbyN

98.13
98.93
100
96.59
98.24
93.41
99.22
94.41
98.26
98.78
98.97
99.49
96.11
98.90
97.49
96.43
95.65
100
96.49
100

95.72
97.17
74.29
72.75
78.82
56.04
84.38
63.69
78.26
69.51
92.82
97.44
76.35
85.71
84.42
84.52
68.48
89.02
85.96
76.47

96.23
96.39
85.71
61.32
76.47
54.95
85.16
60.34
57.39
71.95
89.23
98.57
81.14
94.51
91.96
90.48
72.83
90.24
91.23
91.18

96.91
96.71
88.57
63.73
76.47
60.99
87.5
60.89
59.13
73.17
89.74
98.88
82.34
95.6
93.47
94.05
73.91
91.46
91.23
97.06

93.3
93.96
77.14
56.91
74.12
46.7
82.03
56.42
53.91
69.51
88.21
97.03
70.66
81.32
82.41
77.38
64.13
85.37
75.44
82.35

桌子 18
Four measures of accuracy on PTB TEST for four settings: Baseline second-order parser, patching
with syntactic score only, patching with SFs only, patching with SCs only, and patching with SFs
and SCs.

LAS

UAS

SCAS

SFAS

Baseline
CM only
SF Selection
SC Satisf.
组合

91.76
91.80
91.97
92.02
92.05

93.75
93.79
93.93
94.01
94.05

91.99
92.06
92.96
93.11
93.29

88.24
88.27
89.20
89.14
89.28

size of the corpus used for extracting ISCs and ISFs would yield a larger decrease of SC
and SF errors.

8.4 误差分析

桌子 19 allows us to gain a better understanding of the causes of the errors. 第一个
cause of errors is the ILP objective function. 在 53% of the cases for SC errors and 59%
of the cases for SF errors, the ILP failed to find the correct SFs or SCs although it is
in the search space. The inadequacy of the objective function is difficult to interpret
because it has, 本身, three causes, as we mentioned in our experiments on French: 这
SC or SF scores, the confidence measure, or the weight µ. The second cause of errors

86

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

桌子 19
Raw number of errors, error decrease, and error distribution for each SC pattern, average of SC
图案, and SF.

Type

err.

decr.

NotInDB NotInKB NotInSOL

OBJ
SBJ
VofN
VinN
VtoN
VforN
VwithN
VonN
VatN
VfromN
VbyN
NofN
NinN
NtoN
NforN
NwithN
NonN
NatN
NfromN
NbyN
SC
SF

14.29%
133
4.03%
124
11.11%
9
30.15%
136
30.56%
36
26.25%
80
10.00%
20
18.46%
65
16.00%
25
8.00%
25
14 −14.29%
12.00%
25
17.72%
79
0.00%
13
6.45%
31
0.00%
13
24.14%
29
0.00%
9
0.00%
8
12.50%
8
16.21%
882
8.83%
1314

0.39
0.24
0.38
0.48
0.32
0.29
0.44
0.32
0.19
0.35
0.56
0.23
0.4
0.38
0.24
0.31
0.23
0.44
0.13
0.43
0.34
0.41

0.51
0.39
0.00
0.18
0.12
0.2
0.06
0.19
0.10
0.04
0.13
0.23
0.2
0.08
0.17
0.23
0.18
0.00
0.25
0.00
0.25
0.19

0.40
0.51
0.63
0.46
0.60
0.61
0.56
0.62
0.71
0.65
0.44
0.68
0.49
0.54
0.59
0.46
0.59
0.56
0.75
0.57
0.53
0.59

is coverage. Despite the size of the raw corpus used to extract SCs and SFs, the lack of
coverage remains an important source of errors. The quality of the search space of the
ILP is the third cause of errors. This observation shows that candidate generation with
second-order parsing yields better results than first-order parsing.

9. Conclusions and Discussion

We showed in this paper that erroneous Subcategorization Frame assignment and
Selectional Constraint violations can be partially corrected by using weighted lexicosyn-
tactic patterns, called patches, representing Subcategorization Frames and Selectional
Constraints that have been extracted from a large raw corpus. The experiments, 每-
formed on the French Treebank, allowed us to decrease by 41.6% Selectional Constraint
violations and by 22% erroneous Subcategorization Frame assignments. These figures
are lower for English: 16.21% in the first case and 8.83% in the second.

The method proposed is based on the decomposition of the parsing process into two
sub-processes. The first one, called patching, 标识, in a sentence, a optimal set of
patches that can be of arbitrary size. The second process is based on constrained graph-
based parsing; it takes as input an optimal set of patches and builds a complete parse of
the sentence that contains these patches.

The proposed architecture is one solution to the general problem of combining local
and global lexicosyntactic patterns in a parser. It has been applied to the problem of
selecting Subcategorization Frames and Selectional Constraints but it could be used to
take into account any type of lexicosyntactic linguistic constraints, such as semantic
frames or discourse structures.

87

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

The main limit of our method is the NP-complete nature of patching. 理论上,
solving an instance of the patching problem could require exponential time. 在实践中,
for our task, the resolution time is reasonable, but it could become an issue when dealing
with larger patches. Other approximate methods could be used to perform the patching
任务, this is the first direction in which we wish to continue this work.

The second direction that we want to explore concerns the first step of the patching
过程, 即, candidate generation. Patches being lexicosyntactic structures, the gen-
eration process needs to have access to the possible syntactic structures of the sentences.
The solution we have proposed uses a first- or second-order k-best list of parses from
which syntactic material will be used to propose possible patches. Another solution
would be to perform the patch generation directly on the parse forest.

The error analysis showed that coverage of the automatically extracted ISCs and
ISFs is an important cause of errors, despite the size of the corpus used to extract ISCs
and ISFs. Collecting larger corpora might not be the right solution to increase coverage.
The third direction we would like to explore is generalizing over the events that are
observed in the data, using distributional semantics.

致谢
The authors thank Yann Vaxes for his help
in proving the NP-Completeness of the
patching problem and Fiona Torbey,
Jean Torbey, and Balamurali Andiyakkal
Rajendran for proofreading.

参考
Abeill´e, Anne, Lionel Cl´ement, and Franc¸ois
Toussenel. 2003. Building a treebank for
法语. In Anne Abeill´e, 编辑, Treebanks.
Kluwer.

Achterberg, Tobias. 2009. SCIP: Solving

constraint integer programs. Mathematical
Programming Computation, 1(1):1–41.

Attardi, Giuseppe and Massimiliano

Ciaramita. 2007. Tree revision learning for
dependency parsing. In HLT-NAACL,
pages 388–395, 罗切斯特, 纽约.

Bansal, Mohit and Dan Klein. 2011.

Web-scale features for full-scale parsing. 在
Proceedings of the 49th Annual Meeting of the
计算语言学协会:
人类语言技术,
pages 693–702, Portland.

Bechet, Frederic and Alexis Nasr. 2009.

Robust dependency parsing for spoken
language understanding of spontaneous
speech. In Proceedings of Interspeech,
pages 1039–1042, Brighton, 英国.

Bikel, Daniel M. 2004. Intricacies of Collins’
parsing model. 计算语言学,
30(4):479–511.

博内特, 贝恩德. 2010. Top accuracy and fast

dependency parsing is not a contradiction.
In Proceedings of the 23rd International
Conference on Computational Linguistics
(科林 2010), pages 89–97, 北京.

88

Boullier, 皮埃尔, Alexis Nasr, and Benoˆıt

Sagot. 2009. Constructing parse forests that
include exactly the n-best PCFG trees. 在
11th Conference on Parsing Technologies,
pages 117–128, 巴黎.

Brent, 迈克尔. 1991. Automatic acquisition

of subcategorization frames from
untagged text. In Proceedings of the 29th
Annual Meeting of the Association for
计算语言学, pages 209–214,
伯克利, CA.

Candito, Marie, Benoˆıt Crabb´e, Pascal Denis,

and Franc¸ois Gu´erin. 2009. Analyse
syntaxique du franc¸ais: des constituants
aux d´ependances. 在诉讼程序中
Traitement Automatique des Langues
Naturelles, Senlis, 法国.

查尼亚克, Eugene and Mark Johnson. 2005.
Coarse-to-fine n-best parsing and MaxEnt
discriminative reranking. 在诉讼程序中
the 43rd Annual Meeting of the Association
for Computational Linguistics,
pages 173–180, Michigan.

教会, K. 瓦. 和P. Hanks. 1990. Word

association norms, mutual information,
和词典编纂. 计算型
语言学, 16(1):22–29.

柯林斯, 迈克尔. 1997. Three generative,

lexicalised models for statistical parsing.
In Proceedings of the 35th Annual Meeting
of the Association for Computational
Linguistics and Eighth Conference of the
European Chapter of the Association for
计算语言学, pages 16–23,
Madrid, 西班牙.

柯林斯, Michael and Terry Koo. 2005.

Discriminative reranking for natural
language parsing. 计算型
语言学, 31(1):25–70.

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

Mirroshandel and Nasr

Integrating Selection and Subcategorization in a Dependency Parser

艾斯纳, Jason. 2000. Bilexical grammars and
their cubic-time parsing algorithms. 在
Advances in Probabilistic and Other Parsing
Technologies, pages 29–61, 施普林格.

艾斯纳, Jason M. 1996. Three new

probabilistic models for dependency
解析: An exploration. 在诉讼程序中
the 16th Conference on Computational
Linguistics-Volume 1, pages 340–345,
哥本哈根, 丹麦.

Gildea, Daniel. 2001. Corpus variation and
parser performance. 在诉讼程序中
2001 实证方法会议
自然语言处理,
pages 167–202, Pittsburgh.

大厅, Keith and V´aclav Nov´ak. 2005.

Corrective modeling for non-projective
dependency parsing. 在诉讼程序中
Ninth International Workshop on Parsing
技术, pages 42–52, Vancouver,
加拿大.

Henestroza, Enrique Anguiano and Marie
Candito. 2011. Parse correction with
specialized models for difficult attachment
类型. In Proceedings of the Conference on
自然语言的经验方法
加工, pages 1222–1233, 爱丁堡,
苏格兰.

黄, L. 和D. 蒋. 2005. Better k-best

解析. In Proceedings of the Ninth
International Workshop on Parsing
技术, pages 53–64, Vancouver.
黄, 梁. 2008. Forest reranking:

Discriminative parsing with non-local
特征. In Proceedings of ACL-08: 赫勒特,
pages 586–594, Columbus.

黄, Liang and David Chiang. 2007.
Forest rescoring: Faster decoding
with integrated language models. 在
Proceedings of the 45th Annual Meeting
of the Association for Computational
语言学, pages 144–151, Prague.

黄, Zhongqiang, Mary Harper, and Slav
Petrov. 2010. Self-training with products of
latent variable grammars. 在诉讼程序中
这 2010 实证方法会议
自然语言处理, pages 12–22,
剑桥, 马萨诸塞州.

Hwa, Rebecca. 2004. Sample selection for

statistical parsing. 计算型
语言学, 30(3):253–276.

Joshi, Aravind Krishna. 1985. Tree Adjoining
Grammars: How Much Context-Sensitivity is
Required to Provide Reasonable Structural
Descriptions? Technical report.

Karp, 理查德·M. 1972. Reducibility among

Combinatorial Problems. 施普林格.
Koo, Terry and Michael Collins. 2010.

In Proceedings of the 48th Annual Meeting of
the Association for Computational Linguistics,
pages 1–11, Uppsala, 瑞典.

K ¨ubler, Sandra, Ryan McDonald, and Joakim

Nivre. 2009. Dependency parsing.
Synthesis Lectures on Human Language
Technologies, 1(1):1–127.

Lecerf, Yves. 1961. Une repr´esentation

alg´ebrique de la structure des phrases
dans diverses langues naturelles. Compte
Rendu de l’Acadad´emie des Sciences de Paris,
252:232–234.

曼宁, Christopher. 1993. Automatic

acquisition of a large subcategorization
dictionary from corpora. 在诉讼程序中
the 31st Annual Meeting of the Association for
计算语言学, pages 235–242,
Columbus.

马库斯, 中号. P。, 中号. A. 马尔辛凯维奇, 和
乙. 圣托里尼岛. 1993. Building a large
annotated corpus of English: The Penn
treebank. 计算语言学,
19(2):313–330.

马丁斯, Andr´e F. T。, Miguel Almeida, 和

诺亚A. 史密斯. 2013. Turning on
the turbo: Fast third-order non-projective
turbo parsers. 在诉讼程序中
51st Annual Meeting of the Association
for Computational Linguistics (体积 2:
Short Papers), pages 617–622, Sofia.
马丁斯, Andr´e F. T。, 诺亚A. 史密斯, 和

Eric P. Xing. 2009. Concise integer linear
programming formulations for
dependency parsing. 在诉讼程序中
Joint Conference of the 47th Annual Meeting
of the ACL and the 4th International Joint
Conference on Natural Language Processing of
the AFNLP, pages 342–350, Suntec.

麦当劳, R。, K. Crammer, and F. 佩雷拉.
2005A. Online large-margin training of
dependency parsers. 在诉讼程序中
43rd Annual Meeting of the Association for
计算语言学 (ACL’05),
pages 91–98, Michigan.

麦当劳, Ryan, Koby Crammer, 和
Fernando Pereira. 2005乙. 在线的
large-margin training of dependency
parsers. In Proceedings of the 43rd Annual
Meeting of the Association for Computational
语言学 (ACL’05), pages 91–98,
Michigan.

麦当劳, Ryan and Giorgio Satta. 2007.
On the complexity of non-projective
data-driven dependency parsing. 在
Proceedings of the 10th International
Conference on Parsing Technologies,
pages 121–132, Prague.

麦当劳, Ryan T. and Fernando C. 氮.

Efficient third-order dependency parsers.

佩雷拉. 2006. Online learning of

89

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

计算语言学

体积 42, 数字 1

approximate dependency parsing
算法. In EACL, pages 81–88,
Trento, 意大利.

Messiant, C。, A. 科尔霍宁, 时间. Poibeau, 等人.
2008. Lexschem: A large subcategorization
lexicon for French verbs. 在诉讼程序中
the Language Resources and Evaluation
会议, Marrakech, 摩洛哥.

Mirroshandel, S.A. 和一个. Nasr. 2011. Active
learning for dependency parsing using
partially annotated sentences. 在
Proceedings of the 12th International
Conference on Parsing Technologies,
pages 140–149, 都柏林.

Mirroshandel, Seyed Abolghasem, Alexis

Nasr, and Joseph Le Roux. 2012.
Semi-supervised dependency parsing
using lexical affinities. 在诉讼程序中
50th Annual Meeting of the Association
for Computational Linguistics (体积 1:
Long Papers), pages 777–785, Jeju Island,
韩国.

Mirroshandel, Seyed Abolghasem,

Alexis Nasr, and Benoit Sagot. 2013.
Enforcing subcategorization constraints
in a parser using sub-parses recombining.
在诉讼程序中 2013 Conference of
the North American Chapter of the Association
for Computational Linguistics: 人类
语言技术, pages 239–247,
亚特兰大.

Nakov, 磷. 和M. Hearst. 2005. 使用

Web as an implicit training set: 应用
to structural ambiguity resolution. 在
Proceedings of the Human Language
Technology Conference and the Conference on
自然语言的经验方法
加工, pages 835–842, Vancouver,
加拿大.

Napoles, 考特尼, Matthew Gormley, 和
Benjamin van Durme. 2012A. Annotated
English gigaword ldc2012t21. Linguistic
Data Consortium, 费城, PA.

Napoles, 考特尼, Matthew Gormley, 和
Benjamin van Durme. 2012乙. Annotated
gigaword. In Proceedings of the Joint
Workshop on Automatic Knowledge Base
Construction and Web-scale Knowledge
萃取, pages 95–100, 蒙特利尔,
加拿大.

Nasr, A。, F. B´echet, J. F. Rey, 乙. Favre, 和

Le Roux, J. 2011. MACAON: An NLP tool

suite for processing word lattices. 在
Proceedings of the ACL-HLT 2011 系统
Demonstrations, pages 86–91, Portland.
派克, 罗伯特, David Graff, Junbo Kong,
Ke Chen, and Kazuaki Maeda. 2011.
English gigaword fifth edition, ldc2011t07.
Linguistic Data Consortium, 费城,
PA.

坑, E., S. Bergsma, D. 林, and K. 教会.

2010. Using Web-scale N-grams to
improve base NP parsing performance. 在
Proceedings of the 23rd International
Conference on Computational Linguistics
(科林 2010), pages 886–894, 北京,
中国.

Riedel, Sebastian and James Clarke. 2006.

Incremental integer linear programming
for non-projective dependency parsing. 在
诉讼程序 2006 会议
自然语言的经验方法
加工, pages 129–137, 悉尼,
澳大利亚.

匆忙, Alexander M. and Slav Petrov. 2012.
Vine pruning for efficient multi-pass
dependency parsing. 在诉讼程序中
2012 Conference of the North American
Chapter of the Association for Computational
语言学: 人类语言技术,
pages 498–507, Jeju Island, 韩国.

S´anchez-S´aez, R。, J. A. S´anchez, 和 J. 中号.
Bened´ı. 2009. Statistical confidence
measures for probabilistic parsing. 在
Proceedings of the International Conference
RANLP-2009, pages 388–392, Borovets,
Bulgaria.

沃尔克, 中号. 2001. Exploiting the WWW as a

corpus to resolve PP attachment
ambiguities. In Corpus Linguistics,
pages 601–606.

韦塞尔, Frank, Ralf Schluter, Klaus Macherey,

and Hermann Ney. 2001. Confidence
measures for large vocabulary continuous
speech recognition. IEEE Transactions
on Speech and Audio Processing,
9(3):288–298.

周, G。, J. 赵, K. 刘, 和L. Cai. 2011.

Exploiting Web-derived selectional
preference to improve statistical
dependency parsing. 在诉讼程序中
49th Annual Meeting of the Association for
计算语言学: Human Language
Technologies, pages 1556–1565, Portland.

90

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

e
d

/
C



/

A
r
t

C
e

p
d

F
/

/

/

/

4
2
1
5
5
1
8
0
5
9
8
6
/
C


_
A
_
0
0
2
4
2
p
d

.

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3
下载pdf