Transition-Based Parsing for Deep
Dependency Structures

Xun Zhang∗
Peking University


Yantao Du
Peking University

Weiwei Sun
Peking University

Xiaojun Wan∗
Peking University

Derivations under different grammar formalisms allow extraction of various dependency structures. Particularly, bilexical deep dependency structures beyond surface tree representation can be derived from linguistic analysis grounded by CCG, LFG, and HPSG. Traditionally, these dependency structures are obtained as a by-product of grammar-guided parsers. In this article, we study the alternative data-driven, transition-based approach, which has achieved great success for tree parsing, to build general dependency graphs. We integrate existing tree parsing techniques and present two new transition systems that can generate arbitrary directed graphs in an incremental manner. Statistical parsers that are competitive in both accuracy and efficiency can be built upon these transition systems. Furthermore, the heterogeneous design of transition systems yields diversity of the corresponding parsing models and thus greatly benefits parser ensemble. Concerning the disambiguation problem, we introduce two new techniques, namely, transition combination and tree approximation, to improve parsing quality. Transition combination makes every action performed by a parser significantly change configurations. Therefore, more distinct features can be extracted for statistical disambiguation. With the same goal of extracting informative features, tree approximation induces tree backbones from dependency graphs and re-uses tree parsing techniques to produce tree-related features. We conduct experiments on CCG-grounded functor–argument analysis, LFG-grounded grammatical relation analysis, and HPSG-grounded semantic dependency analysis for English and Chinese. Experiments demonstrate that data-driven models with appropriate transition systems can produce high-quality deep dependency analysis, comparable to more complex grammar-driven models. Experiments also indicate the effectiveness of the heterogeneous design of transition systems for parser ensemble, transition combination, as well as tree approximation for statistical disambiguation.

∗ The authors are with the Institute of Computer Science and Technology, the MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China.
E-mail: {zhangxunah,duyantao,ws,wanxiaojun}@pku.edu.cn. Weiwei Sun is the corresponding author.

Submission received: 6 May 2015; revised submission received: 2 November 2015; accepted for publication: 18 January 2016.

doi:10.1162/COLI_a_00252

© 2016 Association for Computational Linguistics


1. Introduction

The derivations licensed by a grammar under deep grammar formalisms, for example, combinatory categorial grammar (CCG; Steedman 2000), lexical-functional grammar (LFG; Bresnan and Kaplan 1982), and head-driven phrase structure grammar (HPSG; Pollard and Sag 1994), are able to produce rich linguistic information encoded as bilexical dependencies. Under CCG, this is done by relating the lexical heads of functor categories and their arguments (Clark, Hockenmaier, and Steedman 2002). Under LFG, bilexical grammatical relations can be easily derived as the backbone of F-structures (Sun et al. 2014). Under HPSG, predicate–argument structures (Miyao, Ninomiya, and Tsujii 2004) or reductions of minimal recursion semantics (Ivanova et al. 2012) can be extracted from typed feature structures corresponding to whole sentences. Dependency analysis grounded in deep grammar formalisms usually goes beyond tree representations and is well-suited for producing meaning representations. Figure 1 is an example from CCGBank. The deep dependency graph conveniently represents more semantically motivated information than the surface tree. For example, it directly captures the Agent–Predicate relations between the word “people” and the conjuncts “fight,” “eat,” as well as “drink.”

Automatically building deep dependency structures is desirable for many practical NLP applications, for example, information extraction (Miyao et al. 2008) and question answering (Reddy, Lapata, and Steedman 2014). Traditionally, deep dependency graphs are generated as a by-product of grammar-guided parsers. The challenge is that a deep-grammar-guided parsing model usually cannot produce full coverage and the time complexity of the corresponding parsing algorithms is very high. Previous work on data-driven dependency parsing mainly focused on tree-shaped representations. Nevertheless, recent work has shown that a data-driven approach is also applicable to generating more general linguistic graphs. Sagae and Tsujii (2008) present an initial study on applying transition-based methods to generate HPSG-style predicate–argument structures, and have obtained competitive results. In addition, Titov et al. (2009) and Henderson et al. (2013) have shown that more general graphs than planar ones can be produced by augmenting existing transition systems.

This work follows early encouraging research and studies transition-based ap-
proaches to construct deep dependency graphs. The computational challenge to in-
cremental graph spanning is the existence of a large number of crossing arcs in deep

Figure 1
An example from CCGBank. The upper curves represent a deep dependency graph and the
bottom curves represent a traditional dependency tree.


dependency analysis. To tackle this problem, we integrate insightful ideas, especially the ones illustrated in Nivre (2009) and Gómez-Rodríguez and Nivre (2010), developed in the tree spanning scenario, and design two new transition systems, both of which are able to produce arbitrary directed graphs. In particular, we explore two techniques to localize transition actions to maximize the effect of a greedy search procedure. In this way, the corresponding parsers for generating linguistically motivated bilexical graphs can process sentences in close to linear time with respect to the number of input words. This efficiency advantage allows deep linguistic processing for very-large-scale text data.

For syntactic parsing, ensemble methods have been shown to be very helpful in boosting accuracy (Sagae and Lavie 2006; Zhang et al. 2009; McDonald and Nivre 2011). In particular, Surdeanu and Manning (2010) presented a nice comparative
study on various ensemble models for dependency tree parsing. They found that the
diversity of base parsers is more important than complex ensemble models for learning.
Motivated by this observation, the authors proposed a hybrid transition-based parser
that achieved state-of-the-art performance by combining complementary prediction
powers of different transition systems. One advantage of their architecture is the
linear-time decoding complexity, given that all base models run in linear-time. Another
concern of our work is about the model diversity obtained by the heterogeneous design
of transition systems for general graph spanning. Empirical evaluation indicates that
statistical parsers built upon our new transition systems as well as the existing best
transition system—namely, Titov et al. (2009)’s system (THMM, hereafter)—exhibit
complementary parsing strengths, which benefit system combination. In order to take
advantage of this model diversity, we propose a simple yet effective ensemble model
to build a better hybrid system.

We implement statistical parsers using the structured perceptron algorithm (Collins 2002) for transition classification and use a beam decoder for global inference. Concerning the disambiguation problem, we introduce two new techniques, namely, transition combination and tree approximation, to improve parsing quality. To increase system coverage, the ARC transitions designed by the THMM as well as our systems do not change the nodes in the stack or the buffer of a configuration: Only the nodes linked to the top of the stack or buffer are modified. Therefore, features derived from the configurations before and after an ARC transition are not distinct enough to train a good classifier. To deal with this problem, we propose the transition combination technique and three algorithms to derive oracles for modified transition systems. When we apply our models to semantics-oriented deep dependency structures, for example, CCG-grounded functor–argument analysis and HPSG-grounded reduced minimal recursion semantics (MRS; Copestake et al. 2005) analysis, we find that syntactic trees can provide very helpful features. In case the syntactic information is not available, we introduce a tree approximation technique to induce tree backbones from deep dependency graphs. Such tree backbones can be utilized to train a tree parser which provides pseudo tree features.
To evaluate transition-based models for deep dependency parsing, we conduct
experiments on CCG-grounded functor–argument analysis (Hockenmaier and Steedman
2007; Tse and Curran 2010), LFG-grounded grammatical relation analysis (Sun et al.
2014), and HPSG-grounded semantic dependency analysis (Miyao, Ninomiya, and Tsujii 2004; Ivanova et al. 2012) for English and Chinese. Empirical evaluation indicates
some non-obvious facts:

1. Data-driven models with appropriate transition systems and disambiguation techniques can produce high-quality deep dependency analysis, comparable to more complex grammar-driven models.


2. Parsers built upon heterogeneous transition systems and decoding orders have complementary prediction strengths, and the parsing quality can be significantly improved by system combination; compared to the best individual system, system combination gets an absolute labeled F-score improvement of 1.21 on average.

3. Transition combination significantly improves parsing accuracy on a wide range of conditions, resulting in an absolute labeled F-score improvement of 0.74 on average.

4. Pseudo trees contribute to semantic dependency parsing (SDP) as effectively as syntactic trees, and result in an absolute labeled F-score improvement of 1.27 on average.

We compare our parser with representative state-of-the-art parsers (Miyao and
Tsujii 2008; Auli and Lopez 2011b; Martins and Almeida 2014; Xu, Clark, and Zhang
2014; Du, Sun, and Wan 2015) with respect to different architectures. To evaluate the
impact of grammatical knowledge, we compare our parser with parsers guided by
treebank-induced HPSG and CCG grammars. Both our individual and ensemble parsers achieve accuracy equivalent to HPSG and CCG chart parsers (Miyao and Tsujii 2008; Auli and Lopez 2011b), and outperform a shift-reduce CCG parser (Xu, Clark, and Zhang 2014). It is worth noting that our parsers exclude all syntactic and grammatical information. In other words, strictly less information is used. This result demonstrates
the effectiveness of data-driven approaches to the deep linguistic processing prob-
lem. Compared to other types of data-driven parsers, our individual parser achieves
equivalent performance to and our hybrid parser obtains slightly better results than
factorization parsers based on dual decomposition (Martins and Almeida 2014; Du, Sun,
and Wan 2015). This result highlights the effectiveness of the lightweight, transition-
based approach.

Parsers based on the two new transition systems have been utilized as base com-
ponents for parser ensemble (Du et al. 2014) for SemEval 2014 Task 8 (Oepen et al.
2014). Our hybrid system obtained the best overall performance of the closed track of
this shared task. In this article, we re-implement all models, calibrate features more
carefully, and thus obtain improved accuracy. The idea to extract tree-shaped backbone
from a deep dependency graph has also been used to design other types of parsing
models in our early work (Du et al. 2014, 2015; Du, Sun, and Wan 2015). Nevertheless,
the idea to train a pseudo tree parser to serve a transition-based graph parser is new.

The implementation of our parser is available at http://www.icst.pku.edu.cn/lcwm/grass.

2. Transition Systems for Graph Spanning

2.1 Background Notations

A dependency graph G = (V, A) is a labeled directed graph, such that for sentence x = w1, . . . , wn the following holds:

1. V = {0, 1, 2, . . . , n},
2. A ⊆ V × R × V.


The vertex set V consists of n + 1 nodes, each of which is represented by a single integer. In particular, 0 represents a virtual root node w0, and all others correspond to words in x. The arc set A represents the labeled dependency relations of the particular analysis G. Specifically, an arc (i, r, j) ∈ A represents a dependency relation r from head wi to dependent wj. A dependency graph G is thus a set of labeled dependency relations between the root and the words of x. To simplify the description in this section, we mainly consider unlabeled parsing and assume the relation set R is a singleton. Or, taking it another way, we assume A ⊆ V × V. It is straightforward to adapt the discussions in this article for labeled parsing. To do so, we can parameterize transitions with possible dependency relations. For empirical evaluation as discussed in Section 5, we will test both labeled and unlabeled parsing models.

Following Nivre (2008), we define a transition system for dependency parsing as a quadruple S = (C, T, cs, Ct), where

1. C is a set of configurations, each of which contains a buffer β of (remaining) words and a set A of dependency arcs,
2. T is a set of transitions, each of which is a (partial) function t : C → C,
3. cs is an initialization function, mapping a sentence x to a configuration with β = [1, . . . , n],
4. Ct ⊆ C is a set of terminal configurations.

Given a sentence x = w1, . . . , wn and a graph G = (V, A) on it, if there is a sequence of transitions t1, . . . , tm and a sequence of configurations c0, . . . , cm such that c0 = cs(x), ti(ci−1) = ci (i = 1, . . . , m), cm ∈ Ct, and Acm = A, we say the sequence of transitions is an oracle sequence. We define ¯Aci = A − Aci for the arcs still to be built at ci. We denote a transition sequence as either t1,m or c0,m.

In a typical transition-based parsing process, the input words are put into a queue
and partially built structures are organized by a stack. A set of SHIFT/REDUCE actions
are performed sequentially to consume words from the queue and update the partial
parsing results organized by the stack. Our new systems designed for deep parsing
differ with respect to their information structures to define a configuration and the
behaviors of transition actions.
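Concretely, this quadruple can be rendered as the following minimal Python sketch. The class and function names are ours, chosen for illustration only, and the sketch abstracts over the stack structures that the individual systems below add to configurations.

    # A minimal sketch of the transition-system interface defined above.
    from dataclasses import dataclass, field
    from typing import Callable, List, Set, Tuple

    Arc = Tuple[int, str, int]  # (head, relation, dependent)

    @dataclass
    class Configuration:
        stack: List[int] = field(default_factory=list)
        buffer: List[int] = field(default_factory=list)
        arcs: Set[Arc] = field(default_factory=set)

    Transition = Callable[[Configuration], Configuration]

    def initialize(n: int) -> Configuration:
        # c_s maps a sentence of n words to the initial configuration.
        return Configuration(stack=[], buffer=list(range(1, n + 1)), arcs=set())

    def is_terminal(c: Configuration) -> bool:
        # C_t: here, configurations with an empty buffer are terminal.
        return not c.buffer

    def run(n: int, oracle: Callable[[Configuration], Transition]) -> Set[Arc]:
        # Apply oracle transitions until a terminal configuration is reached.
        c = initialize(n)
        while not is_terminal(c):
            c = oracle(c)(c)
        return c.arcs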

2.2 Naive Spanning and Locality

For every two nodes, a simple graph-spanning strategy is to check if they can be directly connected by an arc. Accordingly, a “naive” spanning algorithm can be implemented by exploring a left-to-right checking order, as introduced by Covington (2001) and modified by Nivre (2008).

PARSE(x = (w1, . . . , wn))
1  for j = 1..n
2      for k = j − 1..1
3          Link(j, k)

The operation Link chooses between 1) adding the arc (i, j) or (j, i) and 2) adding no arc at all. In this way, the algorithm builds a graph by incrementally trying to link every pair of words.
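As a runnable illustration, the naive algorithm can be sketched as follows in Python; link is a hypothetical classifier that decides among a left arc, a right arc, or no arc for a pair of words.

    def naive_parse(n, link):
        arcs = set()
        for j in range(1, n + 1):
            for k in range(j - 1, 0, -1):      # k = j − 1 .. 1
                decision = link(j, k)          # "left", "right", or None
                if decision == "left":
                    arcs.add((j, k))           # head j, dependent k
                elif decision == "right":
                    arcs.add((k, j))           # head k, dependent j
        return arcs

The doubly nested loop makes the quadratic cost explicit: every one of the n(n − 1)/2 unordered pairs is inspected exactly once.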


LEFT-ARC   (σ|i, j|β, A) ⇒ (σ|i, j|β, A ∪ {(j, i)})
RIGHT-ARC  (σ|i, j|β, A) ⇒ (σ|i, j|β, A ∪ {(i, j)})
SHIFT      (σ, j|β, A) ⇒ (σ|j, β, A)
POP        (σ|i, β, A) ⇒ (σ, β, A)
SWAP       (σ|i|j, β, A) ⇒ (σ|j, i|β, A)
SWAPT      (σ|i|j, β, A) ⇒ (σ|j|i, β, A)

Figure 2
Transitions of the online re-ordering approach.

The complexity of naive spanning is Θ(n2),1 because it does nothing to explore the topological properties of a linguistic structure. In other words, the naive graph-spanning idea does not fully take advantage of the greedy search of the transition-based parsing architecture. On the contrary, a well-designed transition system for (projective) tree parsing can decode in linear time by exploiting locality among subtrees. Take the arc-eager system presented in Nivre (2008), for example: Only the nodes at the top of the stack and the buffer are allowed to be linked. Such a limitation is the key to implementing a linear time decoder. In what follows, we introduce two ideas to localize a transition action, that is, to allow a transition to manipulate only the frontier items in the data structures of a configuration. By this means, we can decrease the number of possible transitions for each configuration and thus minimize the total decoding time.

2.3 System 1: Online Re-ordering

The online re-ordering approach that we explore is to provide the system with the ability to re-order the nodes during parsing in an online fashion. The key idea, as introduced
in Titov et al. (2009) and Nivre (2009), is to allow a SWAP transition that switches the
position of the two topmost nodes on the stack. By changing the linear order of words,
the system is able to build crossing arcs for graph spanning. We refer to this approach
as online re-ordering. We introduce a stack-based transition system with online re-
ordering for deep dependency parsing. The obtained oracle parser is complete with
respect to the class of all directed graphs without self-loop.

2.3.1 The System. We define a transition system SS = (C, T, cs, Ct), where a configuration c = (σ, β, A) ∈ C contains a stack σ of nodes, besides β and A. We set the initial configuration for a sentence x = w1, . . . , wn to be cs(x) = ([], [1, . . . , n], {}), and take Ct to be the set of all configurations of the form ct = (σ, [], A) (for any σ and any A). The transitions are shown in Figure 2 and explained as follows; a code sketch of these transitions appears after the footnote below.

• SHIFT (sh) removes the front from the buffer and pushes it onto the stack.
• LEFT/RIGHT-ARC (la/ra) updates a configuration by adding (j, i)/(i, j) to A, where i is the top of the stack and j is the front of the buffer.
• POP (pop) updates a configuration by popping the top of the stack.
• SWAP (sw) updates a configuration with stack σ|i|j by moving i back to the buffer.

1 We assume that at most one edge exists between two words. This is a reasonable assumption for a

linguistic representation.
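A minimal Python sketch of the five transitions, mutating a configuration with fields stack, buffer, and arcs (as in the sketch of Section 2.1) in place; it is an illustration under those assumptions, not the authors' implementation.

    def shift(c):                      # (σ, j|β) ⇒ (σ|j, β)
        c.stack.append(c.buffer.pop(0))

    def left_arc(c, r="dep"):          # adds (j, r, i); stack and buffer unchanged
        i, j = c.stack[-1], c.buffer[0]
        c.arcs.add((j, r, i))

    def right_arc(c, r="dep"):         # adds (i, r, j); stack and buffer unchanged
        i, j = c.stack[-1], c.buffer[0]
        c.arcs.add((i, r, j))

    def pop(c):                        # (σ|i, β) ⇒ (σ, β)
        c.stack.pop()

    def swap(c):                       # (σ|i|j, β) ⇒ (σ|j, i|β)
        j = c.stack.pop()
        i = c.stack.pop()
        c.stack.append(j)
        c.buffer.insert(0, i)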


A variation of transition SWAP is SWAPT, which updates the configuration by swapping i and j. However, the system with this variation is not complete with respect to directed graphs because the power of transition SWAPT is limited, and counterexamples of completeness can be found. For more theoretical discussion about this system (i.e., THMM), see Titov et al. (2009). We also denote Titov et al. (2009)’s system as ST.

2.3.2 Theoretical Analysis. The soundness of SS is trivial. To demonstrate the completeness of the system, we give a constructive proof that can derive oracle transitions for any arbitrary graph. To simplify the description, the labels attached to transitions are not considered. The idea is inspired by Titov et al. (2009). Given a sentence x = w1, . . . , wn and a graph G = (V, A) on it, we start with the initial configuration c0 = cs(x) and compute the oracle transitions step by step. On the i-th step, let p be the top of σci−1 and b be the front of βci−1; let L(j) be the ordered list of nodes connected to j in ¯Aci−1 for any node j ∈ σci−1; and let L(σci−1) = [L(j0), . . . , L(jl)] if σci−1 = [jl, . . . , j0].

The oracle transition for each configuration is derived as follows. If there is no arc linked to p in ¯Aci−1, then we set ti to pop; if there exists a ∈ ¯Aci−1 linking p and b, then we set ti to la or ra correspondingly. When only sh and sw are left, we see if there is any node q under the top of σci−1 such that L(q) precedes L(p) in the lexicographical order. If so, we set ti to sw; else we set ti to sh. An example of when to do sw is shown in Figure 3. Let ci = ti(ci−1); we continue to compute ti+1, until βci is empty.
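The sh/sw decision can be sketched compactly. Assuming a helper pending(j) that returns the ordered list L(j) of nodes still to be connected to j, the oracle swaps whenever some buried node's list lexicographically precedes that of the stack top.

    def choose_shift_or_swap(stack, pending):
        top = stack[-1]
        l_top = pending(top)
        for q in stack[:-1]:           # any node under the top of the stack
            if pending(q) < l_top:     # Python compares lists lexicographically
                return "sw"
        return "sh"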

Lemma 1
If ti is sh, then L(σci−1) = [L(j0), . . . , L(jl)] is completely ordered by the lexicographical order.

Proof
It cannot be the case that for some u > 0, L(ju) strictly precedes L(j0), otherwise ti should be sw. It also cannot be the case that for some u > v > 0, L(ju) strictly precedes L(jv), because when jv−1 is shifted onto the stack, L(jv) precedes L(ju), and none of the transitions change L(jv) and L(ju) afterwards. ∎

Lemma 2
For i = 0, . . . , m, there is no arc (j, k) ∈ ¯Aci such that j, k ∈ σci.

Proof
When j ∈ σci is shifted onto the stack by the w-th transition tw, there must be no arc (j, k) or (k, j) in ¯Acw such that k ∈ σcw. Otherwise, by induction every node in σcw−1 can only link to nodes in βcw−1, which implies that L(k) has one of the smallest lexicographical orders, and from Lemma 1 the top of σcw−1 must be linked to j. And then not sh, but la or ra, should be applied. ∎

Figure 3
σ, β, and ¯A of two configurations c1 and c2. In the left graphic, L(σc1) = [[6], [5], [5, 6], [7]]. Because [5, 6] and [5] precede [6], we apply two SWAPs and then two SHIFTs, obtaining the right graphic.


Theorem 1
t1, . . . , tm is an oracle sequence of transitions for G.

Proof
From Lemma 2, we can infer that ¯Acm = ∅, so it suffices to show that the sequence of transitions is always finite. We define a swap sequence to be a subsequence ti, . . . , tj such that ti and tj are sw, ti−1 and tj+1 are not sw, and a shift sequence similarly. It can be seen that a swap sequence is always followed by a shift sequence whose length is no less than that of the swap sequence, and if the two sequences are of the same length, the next transition cannot be sw. Let #(t) be the number of transitions of type t in the sequence; then #(la), #(ra), #(pop), and #(sh) − #(sw) are all finite. Therefore the number of swap sequences is finite, indicating that the transition sequence is finite. ∎

2.4 System 2: Two-Stack–Based System

A majority of transition systems organize partial parsing results with a stack. Classical
parsers, including arc-standard and arc-eager ones, add dependency arcs only between
nodes that are adjacent on the stack or the buffer. A natural idea to produce crossing
arcs is to temporarily move nodes that block non-adjacent nodes to an extra memory
module, like the two-stack–based system for two-planar graphs (Gómez-Rodríguez and Nivre 2010) and the list-based system (Nivre 2008). In this article, we design a
new transition system to handle crossing arcs by using two stacks. This system is also
complete with respect to the class of directed graphs without self-loop.

2.4.1 The System. We define the two-stack–based transition system S2S = (C, T, cs, Ct), where a configuration c = (σ, σ′, β, A) ∈ C contains a primary stack σ and a secondary stack σ′. We set cs(x) = ([], [], [1, . . . , n], {}) for the sentence x = w1, . . . , wn, and we take the set Ct to be the set of all configurations with empty buffers. The transition set T contains six types of transitions, as shown in Figure 4. We only explain MEM and RECALL:

• MEM (mem) pops the top element from the primary stack and pushes it onto the secondary stack.
• RECALL (rc) moves the top element of the secondary stack back to the primary stack.

LEFT-ARC   (σ|i, σ′, j|β, A) ⇒ (σ|i, σ′, j|β, A ∪ {(j, i)})
RIGHT-ARC  (σ|i, σ′, j|β, A) ⇒ (σ|i, σ′, j|β, A ∪ {(i, j)})
SHIFT      (σ, σ′, j|β, A) ⇒ (σ|j, σ′, β, A)
POP        (σ|i, σ′, β, A) ⇒ (σ, σ′, β, A)
MEM        (σ|i, σ′, β, A) ⇒ (σ, σ′|i, β, A)
RECALL     (σ, σ′|i, β, A) ⇒ (σ|i, σ′, β, A)

Figure 4
Transitions of the two-stack–based system.
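In code, the two new transitions are one-liners over a configuration extended with a secondary stack, called sigma2 in this hypothetical sketch:

    def mem(c):      # (σ|i, σ′, β) ⇒ (σ, σ′|i, β)
        c.sigma2.append(c.stack.pop())

    def recall(c):   # (σ, σ′|i, β) ⇒ (σ|i, σ′, β)
        c.stack.append(c.sigma2.pop())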


2.4.2 Theoretical Analysis. The soundness of this system is trivial, and the completeness is also straightforward after we give the construction of an oracle transition sequence for an arbitrary graph. The oracle is computed as follows on the i-th step: We do la, ra, and pop transitions just like in Section 2.3.2. After that, let b be the front of βci−1; we see if there is j ∈ σci−1 or j ∈ σ′ci−1 linked to b by an arc in ¯Aci−1. If j ∈ σci−1, then we do a sequence of mem to make j the top of σci−1; if j ∈ σ′ci−1, then we do a sequence of rc to make j the top of σci−1. When no node in σci−1 or σ′ci−1 is linked to b, we do sh.
Theorem 2
S2S is complete with respect to directed graphs without self-loop.

Proof
The completeness immediately follows the fact that the computed oracle sequence is
finite, and every time a node is shifted onto σ
ci to the
shifted node. (cid:2)

ci, no arc in ¯Aci links nodes in σ

2.4.3 Related Systems. G ´omez-Rodr´ıguez and Nivre (2010, 2013) introduced a two-stack–
based transition system for tree parsing. Their study is motivated by the observation
that the majority of dependency trees in various treebanks are actually planar or two-
planar graphs. Accordingly, their algorithm is specially designed to handle projective
trees and two-planar trees, but not all graphs. Because many more crossing arcs exist
in deep dependency structures and more sentences are assigned with neither planar
nor two-planar graphs, their strategy of utilizing two stacks is not suitable for the deep
dependency parsing problem. Different from their system, our new system maximizes
the utility of two memory modules and is able to handle any directed graphs.

The list-based systems, such as the basic one introduced by Nivre (2008) et le
extended one introduced by Choi and Palmer (2011), also use two memory modules.
The function of the secondary memory module of their systems and ours is very
different. In our design, only nodes involved in a subgraph that contains crossing arcs
may be put into the second stack. In the existing list-based systems, both lists are heavily
used, and nodes may be transferred between them many times. The function of the two
lists is to simulate one memory module that allows accessing any unit in it.

2.5 Extension
2.5.1 Graphs with Loops. It is easy to extend our system to generate arbitrary directed
graphs by adding a new transition:

(cid:2)

SELF-ARC adds an arc from the top element of the primary memory
module (p) to itself, but does not update any stack nor buffer.

Theorem 3
SS and S2S augmented with SELF-ARC are complete with respect to directed graphs.

2.5.2 Labeled Parsing and Supertagging. It is also straightforward to adapt the two transi-
tion systems for labeled dependency graph generation. To do so, we can parameterize
LEFT-ARC and RIGHT-ARC transitions with dependency relations. Par exemple, a pa-
rameterized transition LEFT-ARCr tells the system not only that there is an arc between
the frontier node of the stack and the frontier node of the buffer but also that this arc
holds a relation r. Some linguistic representations assign labels to nodes as well. When a

361

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi
/
c
o

je
je
/

je

un
r
t
je
c
e

p
d

F
/

/

/

/

4
2
3
3
5
3
1
8
0
6
9
1
0
/
c
o

je
je

_
un
_
0
0
2
5
2
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Computational Linguistics

Volume 42, Nombre 3

deep grammar is considered to license the representation, node labels are usually called “supertags.” To assign supertags to words, namely, nodes in a dependency graph, we can parameterize the SHIFT transition with tag labels.
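A sketch of this parameterization: the action inventory is simply expanded with one instance per dependency relation and, where supertags are wanted, one SHIFT instance per tag. The function below is an illustrative assumption, not part of the systems' formal definitions.

    def make_labeled_actions(relations, supertags):
        actions = ["POP", "MEM", "RECALL"]
        actions += ["LEFT-ARC(%s)" % r for r in relations]
        actions += ["RIGHT-ARC(%s)" % r for r in relations]
        actions += ["SHIFT(%s)" % t for t in supertags]   # SHIFT also assigns a supertag
        return actions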

3. Statistical Disambiguation

3.1 Transition Classification

A transition-based parser must decide which transition is appropriate given its parsing environment (i.e., configuration). As with many other data-driven dependency parsers, we use a global linear model for disambiguation. In other words, a discriminative classifier is utilized to approximate the oracle function for a transition system S that maps a configuration c to a transition t that is defined on c. More formally, a transition-based statistical parser tries to find the transition sequence c0,m that maximizes the following score:

    SCORE(c0,m) = Σ_{i=0}^{m−1} SCORE(ci, ti+1)    (1)

Following the state-of-the-art discriminative disambiguation technique for data-driven
parsing, we define the score function as a linear combination of features defined over a
configuration and a transition, as follows:

    SCORE(ci, ti+1) = θ⊤φ(ci, ti+1)    (2)

where φ defines a vector for each configuration–transition pair and θ is the weight vector
for linear combination.

Exact calculation of the maximization is extremely hard without any assumption on φ. Even with a proper φ for real-world parsing, exact decoding is still impractical for most practical feature designs. In this article, we follow the recent success of using beam search for approximate decoding. During parsing, the parser keeps track of multiple, yet a fixed number of, partial outputs to avoid making decisions too early. Training a parser in the discriminative setting corresponds to estimating θ associated with rich features. Previous research on dependency parsing shows that the structured perceptron (Collins 2002; Collins and Roark 2004) is one of the strongest learning algorithms. In all experiments, we use the averaged perceptron algorithm with early update to estimate parameters. The whole parser is very similar to the transition-based system introduced in Zhang and Clark (2008, 2011b).
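The decoder can be sketched as follows; score implements Equation (2), while legal, apply_t, and is_terminal are assumed helpers for the chosen transition system. During training, the gold transition sequence is passed in and the early-update criterion stops decoding as soon as the gold prefix leaves the beam.

    def beam_decode(c0, theta, legal, apply_t, score, is_terminal,
                    beam_size=16, gold=None):
        beam = [(0.0, c0, [])]                      # (score, configuration, history)
        while not all(is_terminal(c) for _, c, _ in beam):
            candidates = []
            for s, c, hist in beam:
                if is_terminal(c):
                    candidates.append((s, c, hist))
                    continue
                for t in legal(c):
                    candidates.append((s + score(c, t, theta),
                                       apply_t(c, t), hist + [t]))
            candidates.sort(key=lambda x: -x[0])
            beam = candidates[:beam_size]
            if gold is not None and not any(hist == gold[:len(hist)]
                                            for _, _, hist in beam):
                break                               # early update point
        return beam[0]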

3.2 Transition Combination

In THMM, SS, and S2S alike, the LEFT/RIGHT-ARC transitions modify neither the stack nor the buffer: Only new edges are added to the target graph. When automatic classifiers are utilized to approximate an oracle, a majority of the features for predicting an ARC transition overlap with the features for the successive transition. Empirically, this property significantly decreases the parsing accuracy. A key observation about a linguistically motivated bilexical graph is that there is usually at most one edge between any two words; therefore an ARC transition is not followed by another ARC. Consequently, any ARC together with its successive transition modifies a configuration substantially. To practically


LEFT-ARC          (σ|i, σ′, j|β, A) ⇒ (σ|i, σ′, j|β, A ∪ {(j, i)})
RIGHT-ARC         (σ|i, σ′, j|β, A) ⇒ (σ|i, σ′, j|β, A ∪ {(i, j)})
SHIFT             (σ, σ′, j|β, A) ⇒ (σ|j, σ′, β, A)
POP               (σ|i, σ′, β, A) ⇒ (σ, σ′, β, A)
MEM               (σ|i, σ′, β, A) ⇒ (σ, σ′|i, β, A)
RECALL            (σ, σ′|i, β, A) ⇒ (σ|i, σ′, β, A)
LEFT-ARC-SHIFT    (σ|i, σ′, j|β, A) ⇒ (σ|i|j, σ′, β, A ∪ {(j, i)})
LEFT-ARC-POP      (σ|i, σ′, j|β, A) ⇒ (σ, σ′, j|β, A ∪ {(j, i)})
LEFT-ARC-MEM      (σ|i, σ′, j|β, A) ⇒ (σ, σ′|i, j|β, A ∪ {(j, i)})
LEFT-ARC-RECALL   (σ|i, σ′|i′, j|β, A) ⇒ (σ|i|i′, σ′, j|β, A ∪ {(j, i)})
RIGHT-ARC-SHIFT   (σ|i, σ′, j|β, A) ⇒ (σ|i|j, σ′, β, A ∪ {(i, j)})
RIGHT-ARC-POP     (σ|i, σ′, j|β, A) ⇒ (σ, σ′, j|β, A ∪ {(i, j)})
RIGHT-ARC-MEM     (σ|i, σ′, j|β, A) ⇒ (σ, σ′|i, j|β, A ∪ {(i, j)})
RIGHT-ARC-RECALL  (σ|i, σ′|i′, j|β, A) ⇒ (σ|i|i′, σ′, j|β, A ∪ {(i, j)})

Figure 5
Original and combined transitions for the two-stack combined system. Two-cycle is not considered here.

improve the performance of a statistical parser, we combine every pair of two successive transitions starting with an ARC, and transform the two proposed transition systems into two modified ones. For example, in our two-stack–based system, after combining, we obtain the transitions presented in Figure 5.
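The transformation itself is mechanical. Given an oracle transition sequence, fusing each ARC with its successor can be sketched as follows (transition names here are illustrative strings):

    def combine_transitions(seq):
        # e.g. ["LEFT-ARC(arg1)", "SHIFT", ...] -> ["LEFT-ARC(arg1)-SHIFT", ...]
        out, i = [], 0
        while i < len(seq):
            if "ARC" in seq[i] and i + 1 < len(seq):
                out.append(seq[i] + "-" + seq[i + 1])  # fuse ARC with its successor
                i += 2
            else:
                out.append(seq[i])
                i += 1
        return out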

The number of edges between any two words can be at most two in real data. If there are two edges between two words wa and wb, they must be wa → wb and wb → wa. We call these two edges a two-cycle, and call this problem the two-cycle problem. In our combined transitions, a LEFT/RIGHT-ARC transition should appear before a non-ARC transition. In order to generate two edges between two words, we have two strategies:

A) Add a new type of transition to each system, which consists of a LEFT-ARC transition, a RIGHT-ARC transition, and any other non-ARC transition (e.g., LEFT-ARC-RIGHT-ARC-RECALL for S2S).

B) Use a non-directional ARC transition instead of LEFT/RIGHT-ARC. Here, an ARC transition may add one or two edges depending on its label. In detail, we propose two algorithms, namely, ENCODELABEL and DECODELABEL (see Algorithms 1 and 2), to deal with labels for the ARC transition.

Algorithm 1 Encode label
1: procedure ENCODELABEL(type, lLabel, rLabel)
2:     if type == LEFT then
3:         return "left" + lLabel
4:     else if type == RIGHT then
5:         return "right" + rLabel
6:     else
7:         return "both" + lLabel + "|" + rLabel
8:     end if
9: end procedure


Algorithm 2 Decode combined label; return a pair of left label and right label
1: procedure DECODELABEL(label)
2:     if label.startswith?("left") then
3:         return {label[4 :], nil}
4:     else if label.startswith?("right") then
5:         return {nil, label[5 :]}
6:     else
7:         return {label[4 : label.index('|')], label[(label.index('|') + 1) :]}
8:     end if
9: end procedure
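In Python, the two algorithms and their round-trip behavior can be sketched as follows; the offsets 4 and 5 skip the prefixes "left" and "right" exactly as in the pseudocode.

    def encode_label(kind, l_label, r_label):
        if kind == "LEFT":
            return "left" + l_label
        if kind == "RIGHT":
            return "right" + r_label
        return "both" + l_label + "|" + r_label     # two-cycle: both directions

    def decode_label(label):
        if label.startswith("left"):
            return label[4:], None
        if label.startswith("right"):
            return None, label[5:]
        sep = label.index("|")
        return label[4:sep], label[sep + 1:]

    # Round trip for a two-cycle label:
    assert decode_label(encode_label("BOTH", "arg1", "arg2")) == ("arg1", "arg2")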

In our experiments, strategy B performs better.
First, let us consider accuracy. Generally speaking, transition classification is harder if more target transitions are defined. Using strategy A, we would have to add additional transitions to handle the two-cycle condition. Based on our experiments, the performance decreases when using more transitions.

Considering efficiency, we can save time by only using labels that appear in the training data in strategy B. If we have a total of K possible labels in the training data, they will generate K² two-cycle types, but only k possible combinations of two-cycles appear in the training data (k ≪ K²). In strategy A, we must add K² transitions to deal with all possible two-cycle types, but most of them do not make sense. Using fewer two-cycle types helps us eliminate the invalid computation and save time effectively.

Using strategy B, we change the original edges’ labels and use the ARC(label)–non-ARC transition instead of LEFT/RIGHT-ARC(label)–non-ARC. An ARC(label)–non-ARC transition executes the ARC(label) transition first, and then executes the non-ARC transition. ARC(label) generates one or two edges depending on its label. Not only do we encode two-cycle labels, but also LEFT/RIGHT-ARC labels. In practice, we only use those labels that appear in the training data. Because labels that do not appear only contribute non-negative weights during training, we can eliminate them without any performance loss.

For each transition system and each dependency graph, we generate an oracle transition sequence, and train our model according to this oracle. The constructive proofs presented in Sections 2.3 and 2.4 define two kinds of oracles. However, they are not directly applicable when the transition combination strategy is utilized. The main challenge is the existence of cycles. In this article, we propose three algorithms to derive oracles for THMM, SS, and S2S, respectively. Algorithms 3 to 5 illustrate the key step of the procedure, which finds the next transition t given a configuration c and a gold graph Ggold = (Vx, Agold), for the three systems. When this key procedure, namely, the EXTRACTONEORACLE method, is well defined, the entire oracle sequence can be derived as follows:

EXTRACTORACLE(c0, Agold)
1  oracle = ∅
2  while t ← EXTRACTONEORACLE(c0, Agold, nil) do
3      oracle.push_back(t)
4      c0 ← t(c0)
5  end while


Algorithm 3 Oracle generation for the THMM system
1: procedure EXTRACTONEORACLE(c, Agold, label)
2:     if c = (σ|i, j|β, A) ∧ ¬∃k[k ≥ j ∧ ∃l[(i, l, k) ∈ Agold]] then
3:         if label = nil then
4:             return REDUCE
5:         else
6:             return ARC(label) ◦ REDUCE
7:         end if
8:     else if c = (σ|i, j|β, A) ∧ ∃l[(i, l, j) ∈ Agold] then
9:         Agold ← Agold − (i, l, j)
10:        return EXTRACTONEORACLE(c, Agold, label)
11:    else if c = (σ|i1|i0, j|β, A) ∧ ∃k0k1[k0 ≥ j ∧ ∃l0[(i0, l0, k0) ∈ Agold] ∧ ¬∃k0′[k0′ < k0 ∧ ∃l0′[(i0, l0′, k0′) ∈ Agold]] ∧ k1 ≥ j ∧ ∃l1[(i1, l1, k1) ∈ Agold] ∧ ¬∃k1′[k1′ < k1 ∧ ∃l1′[(i1, l1′, k1′) ∈ Agold]] ∧ k0 < k1] ∨ ¬∃k1[k1 ≥ j ∧ ∃l1[(i1, l1, k1) ∈ Agold]] then
12:        if label = nil then
13:            return SWAP
14:        else
15:            return ARC(label) ◦ SWAP
16:        end if
17:    end if
18:    if c = (σ, j|β, A) then
19:        if label = nil then
20:            return SHIFT
21:        else
22:            return ARC(label) ◦ SHIFT
23:        end if
24:    end if
25:    return nil
26: end procedure

We want to emphasize that, although the EXTRACTORACLE method initializes the parameter LABEL of EXTRACTONEORACLE as nil, if an ARC transition is predicted in the EXTRACTONEORACLE method, it will call EXTRACTONEORACLE recursively to return an ARC(label)–non-ARC transition and assign a value for that LABEL.

3.3 Feature Design

Developing features has been shown to be crucial to advancing the state-of-the-art in dependency parsing (Koo and Collins 2010; Zhang and Nivre 2011). To build accurate deep dependency parsers, we utilize a large set of features for transition classification. To conveniently define all features, we use the following notation. In a configuration with stack σ and buffer β, we denote the top two nodes in σ by σ0 and σ1, and the front of β by β0. In a configuration of the two-stack–based system with the second stack σ′, the top element of σ′ is denoted by σ′0. The left-most dependent of node n is denoted by n.lc, the right-most one by n.rc. The left-most parent of node n is denoted by n.lp, the right-most one by n.rp. We denote the word and POS-tag of node n by wn and pn, respectively. Our parser derives so-called path features from dependency trees. The path features collect POS tags, or the first letters of POS tags, along the tree between two nodes. Given two nodes n1 and n2, we denote the path feature as path(n1, n2) and the coarse-grained path feature as cpath(n1, n2). The syntactic head of a node n is denoted as n.h. We use the same feature templates for the online re-ordering and the two-stack–based systems, and they are slightly different from THMM. Figure 6 defines the basic feature template functions. All feature templates are described here.

Algorithm 4 Oracle generation for the online re-ordering system
1: procedure EXTRACTONEORACLE(c, Agold, label)
2:     if c = (σ|i, j|β, A) ∧ ¬∃k[k ≥ j ∧ ∃l[(i, l, k) ∈ Agold]] then
3:         if label = nil then
4:             return REDUCE
5:         else
6:             return ARC(label) ◦ REDUCE
7:         end if
8:     else if c = (σ|i, j|β, A) ∧ ∃l[(i, l, j) ∈ Agold] then
9:         Agold ← Agold − (i, l, j)
10:        return EXTRACTONEORACLE(c, Agold, label)
11:    else if c = (σ|i, j|β, A) ∧ ∃i′[i′ < i ∧ i′ ∈ σ ∧ ∃l′[(i′, l′, j) ∈ Agold]] then
12:        if label = nil then
13:            return SWAP
14:        else
15:            return ARC(label) ◦ SWAP
16:        end if
17:    end if
18:    if c = (σ, j|β, A) then
19:        if label = nil then
20:            return SHIFT
21:        else
22:            return ARC(label) ◦ SHIFT
23:        end if
24:    end if
25:    return nil
26: end procedure
The concrete template instantiations apply the functions of Figure 6 to the frontier nodes. For the THMM system, we use unigram templates funi over σ0 and σ1 and guni over β0; context templates fcontext over σ0 and β0; pair templates fpair and fpair−l over (σ0, β0), (σ0, β1), and (σ1, β0); triple templates ftri and ftri−l over combinations of σ0, σ1, β0, and β1 together with their left-most and right-most children and parents; quadruple templates fquar−l over the same items extended with second children; path templates fpath over (σ0, β0), (σ0, β1), and (σ1, β0); and character templates fchar over σ0 and β0. For the online re-ordering and two-stack–based systems, we use the same inventory, extended with pair, triple, and quadruple templates that additionally involve σ′0.
Algorithm 5 Oracle generation for the two-stack–based system
1: procedure EXTRACTONEORACLE(c, Agold, label)
2:     if c = (σ|i, σs, j|β, A) ∧ ¬∃k[k ≥ j ∧ ∃l[(i, l, k) ∈ Agold]] then
3:         if label = nil then
4:             return REDUCE
5:         else
6:             return ARC(label) ◦ REDUCE
7:         end if
8:     else if c = (σ|i, σs, j|β, A) ∧ ∃l[(i, l, j) ∈ Agold] then
9:         Agold ← Agold − (i, l, j)
10:        return EXTRACTONEORACLE(c, Agold, label)
11:    else if c = (σ|i, σs, j|β, A) ∧ ∃i′[i′ < i ∧ i′ ∈ σ ∧ ∃l′[(i′, l′, j) ∈ Agold]] then
12:        if label = nil then
13:            return MEM
14:        else
15:            return ARC(label) ◦ MEM
16:        end if
17:    else if c = (σ|i, σs|is, j|β, A) then
18:        if label = nil then
19:            return RECALL
20:        else
21:            return ARC(label) ◦ RECALL
22:        end if
23:    end if
24:    if c = (σ, σs, j|β, A) then
25:        if label = nil then
26:            return SHIFT
27:        else
28:            return ARC(label) ◦ SHIFT
29:        end if
30:    end if
31:    return nil
32: end procedure

Figure 6
Feature template functions funi, guni, fcontext, fpair, fpair−l, ftri, ftri−l, fquar−l, fpath, and fchar, built from words (w), POS tags (p), left/right children and parents (lc, rc, lp, rp), child sets, arc labels and directions, tree paths, and word-internal character n-grams.

4. Tree Approximation

Tree structures exhibit many computationally good properties, and parsing techniques for tree-structured representations are quite mature to some extent. When we consider semantics-oriented graphs, such as the representations for semantic role labeling (SRL; Surdeanu et al. 2008; Hajič et al. 2009), CCG-grounded functor–argument analysis (Clark, Hockenmaier, and Steedman 2002), HPSG-grounded predicate–argument analysis (Miyao, Ninomiya, and Tsujii 2004), and the reduction of MRS (Ivanova et al. 2012), syntactic trees can provide very useful features for semantic disambiguation (Punyakanok, Roth, and Yih 2008). Our parser also utilizes a path feature template (as defined in Section 3.3) to incorporate syntactic information for disambiguation.
In case syntactic tree information is not available, we introduce a tree approximation technique to induce tree backbones from deep dependency graphs. Such tree backbones can be utilized to train a tree parser which provides pseudo path features. In particular, we introduce an algorithm to associate every graph with a projective dependency tree, which we call weighted conversion. The tree reflects partial information about the corresponding graph. The key idea underlying this algorithm is to assign heuristic weights to all ordered pairs of words, and then find the tree with maximum weight. That means a tree frame of a given graph is automatically derived as an alternative for syntactic analysis. We assign weights to all the possible edges (i.e., all pairs of words) and then determine which edges are to be kept by finding the maximum spanning tree. More formally, given a set of nodes V, each possible edge (i, j), where i, j ∈ V, is assigned a heuristic weight ω(i, j). Among all trees (denoted as T) over V, the maximum spanning tree Tmax contains the maximum sum of edge weights:

    Tmax = argmax_{(V,AT)∈T} Σ_{(i,j)∈AT} ω(i, j)    (3)

We separate ω(i, j) into three parts, ω(i, j) = A(i, j) + B(i, j) + C(i, j), defined as follows; y(i, j) indicates whether the arc (i, j) exists in the given graph.

• A(i, j) = a · max{y(i, j), y(j, i)}: a is the weight for an edge that exists in the graph, ignoring direction.
• B(i, j) = b · y(i, j): b is the weight for a forward edge of the graph.
• C(i, j) = n − |i − j|: this term estimates the importance of an edge, where n is the length of the given sentence. For dependency parsing, we consider edges with short distance to be more important because those edges can be predicted more accurately in future parsing processes.
• a ≫ b ≫ n, or a > bn > n²: the converted tree should contain as many arcs of the original graph as possible, and the direction of the arcs should not be changed if possible. This relationship among a, b, and n guarantees it.
After all edges are weighted, we can use maximum spanning tree algorithms to obtain the converted tree. To obtain a projective tree, we choose Eisner’s algorithm. For any graph, we can call this algorithm and get a corresponding tree. However, the tree is informative only when the given graph is dense enough. Fortunately, this condition holds for semantic dependency parsing.
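Under these assumptions, weighted conversion reduces to filling a dense weight matrix and handing it to a projective MST decoder; in the sketch below, eisner stands for an assumed implementation of Eisner's algorithm and y for the adjacency indicator of the input graph.

    def conversion_weights(n, y, a=10**6, b=10**3):
        # a >> b >> n: keep graph arcs first, preserve their direction next,
        # and prefer short edges last.
        w = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            for j in range(1, n + 1):
                if i != j:
                    w[i][j] = (a * max(y(i, j), y(j, i))   # A(i, j)
                               + b * y(i, j)               # B(i, j)
                               + (n - abs(i - j)))         # C(i, j)
        return w

    # tree = eisner(conversion_weights(n, y))   # projective maximum spanning tree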


5. Empirical Evaluation

5.1 Set-up

We present empirical evaluation of different incremental graph spanning algorithms for
CCG-style functor–argument analysis, LFG-style grammatical relation analysis, and HPSG-
style semantic dependency analysis for English and Chinese. Linguistically speaking,
these types of syntacto-semantic dependencies directly encode information such as
coordination, extraction, raising, control, as well as many other long-range dependen-
cies. Experiments for a variety of formalisms and languages profile different aspects of
transition-based deep dependency parsing models.

Figure 7 visualizes cross-format annotations assigned to the English sentence: A similar technique is almost impossible to apply to other crops, such as cotton, soybeans, and rice. This running example illustrates a range of linguistic phenomena such as coordination, verbal chains, argument and modifier prepositional phrases, complex noun phrases, and the so-called tough construction. The first format is from the popular corpus PropBank, which is widely used by various SRL systems. We can clearly see that, compared with SRL, SDP uses dense graphs to represent much more syntacto-semantic information. This difference suggests that we should explore different algorithms for producing SRL and SDP graphs. Another thing worth noting is that, for the same phenomenon, annotation schemes may not agree with each other. Take the coordination construction, for example. For more details about the differences among the data sets, please refer to Ivanova et al. (2012).

For CCG analysis, we conduct experiments on English and Chinese CCGBank (Hockenmaier and Steedman 2007; Tse and Curran 2010). Following the previous experimental set-up for English CCG parsing, we use Sections 02–21 as training data, Section 00 as the development data, and Section 23 for testing. To conduct Chinese parsing experiments, we use data setting C of Tse and Curran (2012). For grammatical relation analysis, we conduct experiments on Chinese GRBank data (Sun et al. 2014). The selection of training, development, and test data also follows Sun et al.’s (2014) experiments. We also evaluate all parsing models using more HPSG-grounded semantics-oriented data, namely, DeepBank2 (Flickinger, Zhang, and Kordoni 2012) and EnjuBank (Miyao, Ninomiya, and Tsujii 2004). Different from Penn Treebank–converted corpora, DeepBank’s annotations are essentially based on the parsing results given by a large-scale, linguistically precise HPSG grammar, namely, the LinGO English Resource Grammar (ERG; Flickinger 2000), and are manually disambiguated. As part of the full HPSG sign, the ERG also makes available a logical-form representation of propositional semantics, in the framework of minimal recursion semantics (MRS; Copestake et al. 2005). Such semantic information is reduced into variable-free bilexical dependency graphs (Oepen and Lønning 2006; Ivanova et al. 2012). In summary, DeepBank gives the reduction of logical-form meaning representations with respect to MRS. EnjuBank (Miyao, Ninomiya, and Tsujii 2004) provides another corpus for semantic dependency parsing. This type of annotation is somewhat shallower than DeepBank, given that only basic predicate–argument structures are concerned. Different from DeepBank but similar to CCGBank and GRBank, EnjuBank is semi-automatically converted from Penn Treebank–style annotations with linguistic heuristics. To conduct HPSG experiments, we use Sections 00 to 19 as training data and Section 20 as development data to tune parameters. For final

2 http://moin.delph-in.net/DeepBank.


[Figure 7 shows four bilexical dependency analyses of the sentence “A similar technique is almost impossible to apply to other crops, such as cotton, soybeans and rice.”]

(a) Format 1: Propositional semantics, from PropBank.
(b) Format 2: MRS-derived dependencies, from DeepBank HPSG annotations.
(c) Format 3: Predicate–argument structures, from Enju HPSG annotation.
(d) Format 4: Functor–argument structures, from CCGBank.

Figure 7
Dependency representations in (a) PropBank, (b) DeepBank, (c) Enju HPSGBank, and (d) CCGBank formats.

evaluation, we use Sections 00 to 20 as training data and Section 21 as test data. The
DeepBank and EnjuBank data sets are from SemEval 2014 Task 8 (Oepen et al. 2014),
and the data splitting policy follows the shared task. Table 1 gives a summary of the
data sets for experiments.

Experiments for English CCG-grounded analysis were performed using automat-
ically assigned POS-tags that are generated by a symbol-refined generative HMM
tagger3 (SR-HMM; Huang, Harper, and Petrov 2010). Experiments for English HPSG-
grounded analysis used POS-tags provided by the shared task. For the experiments on
Chinese CCGBank and GRBank, we use gold-standard POS tags.

We use the averaged perceptron algorithm with early update to estimate param-
eters, and beam search for decoding. We set the beam size to 16 and the number
of iterations to 20 for all experiments. The measure for comparing two dependency
graphs is the precision and recall of dependencies, defined as ⟨wh, wd, l⟩ tuples, where wh is
the head, wd is the dependent, and l is the relation. Labeled precision/recall (LP/LR)
is the ratio of tuples correctly identified by the automatic generator, and unlabeled
precision/recall (UP/UR) is the ratio regardless of l. F-score is the harmonic mean of
precision and recall. These measures correspond to attachment scores (LAS/UAS) in
dependency tree parsing and are also used by the SemEval 2014 Task 8. The de facto standard

3 http://www.code.google.com/p/umd-featured-parser/.

Table 1
Data sets for experiments. Columns “Training” and “Test” present the number of sentences in
the training and test sets, respectively.

Language  Formalism  Data      Training  Test
English   CCG        CCGBank   39,604    2,407
English   HPSG       DeepBank  34,003    1,348
English   HPSG       EnjuBank  34,003    1,348
Chinese   CCG        CCGBank   22,339    2,813
Chinese   LFG        GRBank    22,277    2,557

to evaluate CCG parsers also considers supertags. Because no supertagging is performed
in our experiments, only the unlabeled precision/recall/F-score is comparable to the
results reported in other papers. The labeled performance reported here only
considers the labels assigned to dependency arcs that indicate the argument types. For
example, an arc label arg1 denotes that the dependent is the first argument of the head.
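These measures are straightforward to compute. The following sketch is our own illustration (the function and variable names are ours, not from any released tool); it shows how labeled and unlabeled precision, recall, and F-score are obtained from two sets of ⟨wh, wd, l⟩ tuples:

```python
def prf(gold, predicted):
    """Precision/recall/F-score over (head, dependent, label) triples.

    `gold` and `predicted` are sets of (wh, wd, l) tuples; dropping the
    label component yields the unlabeled scores (UP/UR/UF).
    """
    correct = len(gold & predicted)            # tuples correctly identified
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f

# Labeled vs. unlabeled scoring of one toy graph:
gold = {(2, 1, "arg1"), (2, 4, "arg2")}
pred = {(2, 1, "arg1"), (2, 4, "arg1")}
print(prf(gold, pred))                         # labeled: P = R = F = 0.5
print(prf({(h, d) for h, d, _ in gold},
          {(h, d) for h, d, _ in pred}))       # unlabeled: P = R = F = 1.0
```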

5.2 Parsing Efficiency

We evaluate the real running time of our final trained parser using realistic data. The test
sentences are collected from English Wikipedia and Chinese Gigaword (LDC2005T14).
First, we show the influence of beam size in Figure 8. In this experiment, the DeepBank-
trained models are used for testing. We can see that the parsers run in nearly linear time
regardless of the beam width in realistic situations. Second, we report the averaged
real running time of models trained on different data sets in Figure 9. Again, we can
see that the parser runs in close to linear time for a variety of linguistically motivated
representations. The results also suggest that our proposed transition-based parsers can
automatically learn the complexity of linguistically motivated dependency structures
from an annotated corpus. Note that although the study of formal grammars is central
to the deep parsing framework, it is only partially relevant for data-driven dependency parsing,

Figure 8
Real running time relative to beam size, tested using DeepBank-trained models. The left panel
shows the online re-ordering system and the right panel the two-stack system; both plot parsing
time (in milliseconds) against sentence length (in words) for beam sizes 2, 4, 8, and 16.

Figure 9
Real running time relative to models trained on different data sets (DeepBank, EnjuBank,
English CCGBank, Chinese CCGBank, and Chinese GRBank). The left panel shows the online
re-ordering system and the right panel the two-stack system; both plot parsing time
(in milliseconds) against sentence length (in words).

since our parsers rely on inductive inference from treebank data and only implicitly
use a grammar.

5.3 Importance of Transition Combination

Figure 10 and Table 2 summarize the labeled parsing results on all of the five data
sets. In this experiment, we distinguish parsing models with and without transition
combination. All models take only the surface word form and POS tag information and
do not derive features from any syntactic analysis. The importance of transition combi-
nation is highlighted by the comparative evaluation of parsers with and without this
mechanism. Significant improvements are observed over a wide range of conditions: Parsers
based on different transition systems for different languages and different formalisms
almost always benefit. This result suggests a necessary strategy for designing transition
systems for producing deep dependency graphs: Configurations should be essentially
modified by every transition.

Because of the importance of transition combination, all the following experiments

utilize the transition combination strategy.
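As a schematic illustration of this strategy (the transition names below are simplified placeholders, not the exact inventories defined for our systems in Section 3), transition combination replaces each ARC action by its fusions with the possible following transitions, so that every combined action essentially changes the configuration:

```python
from itertools import product

# Simplified placeholder inventories, for illustration only.
ARC_ACTIONS = ["LEFT-ARC", "RIGHT-ARC"]
OTHER_ACTIONS = ["SHIFT", "POP", "SWAP"]

# Standard system: an ARC action may leave the stack and buffer largely
# unchanged, so consecutive configurations can look identical to the
# feature extractor.
STANDARD = ARC_ACTIONS + OTHER_ACTIONS

# Transition combination: every ARC action is fused with the transition
# that follows it; each fused action essentially modifies the
# configuration, yielding more distinct features for disambiguation.
COMBINED = ["%s+%s" % (arc, nxt)
            for arc, nxt in product(ARC_ACTIONS, OTHER_ACTIONS)]
COMBINED += OTHER_ACTIONS  # non-ARC actions are kept as they are

print(STANDARD)  # 5 plain actions
print(COMBINED)  # 6 fused actions plus 3 plain ones
```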

5.4 Model Diversity and Parser Ensemble
5.4.1 Model Diversity. For model ensemble, besides the accuracy of each single model,
it is also essential that the models to be integrated be sufficiently different. We argue
that heterogeneous parsing models can be built by varying the underlying transition
systems. By reversing the sentence from right to left, we can build further model variants
with the same transition system. To evaluate the difference between two models A and
B, we define the following metric:

2|D_A ∩ D_B| / (|D_A| + |D_B|)

where D_X denotes the set of dependencies returned by model X on the held-out
sentences. Tables 3 and 4 show the model diversity evaluated on English and
Chinese data, respectively. We can see that parsing models built upon different
transition systems do vary. Even for one specific transition system, different processing
directions yield quite different parsing results.
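This metric is simply the Dice coefficient between the two dependency sets. A minimal sketch (our own illustration):

```python
def model_similarity(deps_a, deps_b):
    """Dice overlap 2|A n B| / (|A| + |B|) between the dependency sets
    returned by two models on the same held-out sentences; lower values
    indicate more diverse, and hence more complementary, models."""
    if not deps_a and not deps_b:
        return 1.0
    return 2 * len(deps_a & deps_b) / (len(deps_a) + len(deps_b))

# Two models that agree on two of their three arcs each:
a = {(2, 1, "arg1"), (2, 4, "arg2"), (4, 5, "arg1")}
b = {(2, 1, "arg1"), (2, 4, "arg2"), (4, 6, "arg1")}
print(model_similarity(a, b))  # 2 * 2 / (3 + 3) = 0.667
```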

Figure 10
Labeled parsing F-scores of different transition systems with and without transition
combination, on DM(en), PAS(en), CCG(en), CCG(cn), and GR(cn); panels show the THMM,
online re-ordering, and two-stack systems. “Standard” denotes the standard systems, which do
not combine an ARC transition with its following transition.

5.4.2 Parser Ensemble. Parser ensemble has been shown to be very effective in boosting the
performance of data-driven tree parsers (Nivre and McDonald 2008; Surdeanu and
Manning 2010; Sun and Wan 2013). Empirically, the two proposed systems together
with the existing THMM system exhibit complementary prediction powers, and their
combination yields superior accuracy. We present a simple yet effective voting strategy
for parser ensemble, sketched below. For each pair of words in each sentence, we count
the number of models that give positive predictions. If the number is greater than a
threshold (we set it to half the number of models in this work), we add this arc to the
final graph and label the arc with the most common label given by the models.
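The following sketch illustrates this voting scheme (the function and container layout are ours, not from the released system):

```python
from collections import Counter

def vote(model_graphs, threshold=None):
    """Combine dependency graphs from several base models by arc voting.

    `model_graphs` is a list of dicts, one per model, mapping a
    (head, dependent) pair to its label. An arc enters the final graph
    when more than half of the models predict it (the threshold used in
    this work); it receives the most common label among those models.
    """
    if threshold is None:
        threshold = len(model_graphs) / 2.0
    votes, labels = Counter(), {}
    for graph in model_graphs:
        for arc, label in graph.items():
            votes[arc] += 1
            labels.setdefault(arc, Counter())[label] += 1
    return {arc: labels[arc].most_common(1)[0][0]
            for arc, n in votes.items() if n > threshold}

# Three base models; arc (2, 4) is kept and labeled "arg2" by majority:
g1 = {(2, 1): "arg1", (2, 4): "arg2"}
g2 = {(2, 1): "arg1", (2, 4): "arg2"}
g3 = {(2, 4): "arg1", (4, 5): "arg1"}
print(vote([g1, g2, g3]))  # {(2, 1): 'arg1', (2, 4): 'arg2'}
```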

Table 5 presents the parsing accuracy of the combined model where six base models
are utilized for voting. We can see that a system ensemble is quite helpful. Given that
our graph parsers all run in expected linear time, the combined system also runs very
efficiently.

5.5 Impact of Syntactic Parsing
5.5.1 Effectiveness of Syntactic Features. Syntactic parsing, especially full parsing, has been
shown to be very important for boosting the performance of SRL, a well-studied shallow
semantic parsing task (Punyakanok, Roth, and Yih 2008). According to the comprehensive
evaluation presented in Punyakanok, Roth, and Yih (2008) and Zhuang and Zong

Table 2
Performance of different transition systems with and without transition combination on the test
set of the DeepBank/EnjuBank data, on the development set of the English and Chinese
CCGBank data, and on the development set of the Chinese GRBank data. S^std_x denotes the
standard system, which does not combine an ARC transition with its following transition.

English
          DeepBank                 EnjuBank                 CCGBank
          LP      LR      LF       LP      LR      LF       LP      LR      LF
S^std_T   82.71%  84.32%  83.51    86.89%  87.48%  87.19    85.88%  85.81%  85.85
S_T       85.00%  84.40%  84.70    88.66%  88.17%  88.42    87.00%  85.93%  86.46
S^std_S   82.60%  84.46%  83.52    86.72%  87.38%  87.06    85.60%  86.04%  85.82
S_S       84.63%  83.98%  84.30    88.58%  88.20%  88.39    86.63%  86.04%  86.33
S^std_2S  82.97%  84.65%  83.80    87.25%  87.75%  87.55    86.06%  86.29%  86.17
S_2S      85.01%  84.41%  84.71    88.80%  88.48%  88.64    86.77%  86.25%  86.51

Chinese
          CCGBank                  GRBank
          LP      LR      LF       LP      LR      LF
S^std_T   80.93%  80.75%  80.84    80.10%  77.95%  79.01
S_T       82.04%  81.16%  81.60    81.28%  78.40%  79.81
S^std_S   80.86%  81.60%  81.23    80.32%  78.96%  79.63
S_S       81.71%  81.48%  81.60    80.30%  79.46%  79.88
S^std_2S  80.81%  81.35%  81.08    80.58%  80.23%  80.41
S_2S      82.09%  81.81%  81.95    80.88%  80.18%  80.53

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi
/
c
o

je
je
/

je

un
r
t
je
c
e

p
d

F
/

/

/

/

4
2
3
3
5
3
1
8
0
6
9
1
0
/
c
o

je
je

_
un
_
0
0
2
5
2
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

(2010) (see Table 6), there is an essential gap between full and shallow parsing-based
SRL systems. If we consider a system that takes only word form and POS tags as input,
the performance gap will be larger.

When we consider semantics-oriented deep dependency structures, including
the representations for CCG-grounded functor–argument analysis (Clark, Hockenmaier,
and Steedman 2002), HPSG-grounded predicate–argument analysis (Miyao,
Ninomiya, and Tsujii 2004), and the reduction of MRS (Ivanova et al. 2012), syntactic
parses can also provide very useful features for disambiguation. To evaluate the impact
of syntactic tree parsing, we include more features, namely, path features, in our parsing
models. The detailed description of syntactic features is presented in Section 3.3. In
this work, we apply syntactic dependency parsers rather than phrase-structure parsers.
Figure 11 summarizes the impact of features derived from syntactic trees. We can clearly
see that syntactic features are effective in enhancing semantic dependency parsing. These
informative features lead to average absolute improvements of 1.14% and 1.03% for
English and Chinese CCG parsing. Compared with SRL, the improvement brought by
syntactic parsing is smaller. We think one main reason for this difference is the information
density of different types of graphs. SRL graphs usually annotate only verbal
predicates and their nominalizations, whereas the semantic graphs grounded by CCG and
HPSG target all words. In other words, SRL provides partial analysis and semantic
dependency parsing provides full analysis. Accordingly, SRL needs the structural information
generated by a syntactic parser much more than semantic dependency parsing does.


Table 3
Model diversity between different models on the test set of the DeepBank/EnjuBank data and
on the development set of the English CCGBank data. S^rev_x means processing a sentence with
system S_x but in the right-to-left word order.

DeepBank
          S_S     S_2S    S^rev_T  S^rev_S  S^rev_2S
S_T       0.9285  0.9285  0.8788   0.8796   0.8797
S_S       -       0.9385  0.8748   0.8776   0.8773
S_2S      -       -       0.8772   0.8802   0.8790
S^rev_T   -       -       -        0.9390   0.9364
S^rev_S   -       -       -        -        0.9413

EnjuBank
          S_S     S_2S    S^rev_T  S^rev_S  S^rev_2S
S_T       0.9504  0.9481  0.9045   0.9038   0.9043
S_S       -       0.9503  0.9046   0.9060   0.9055
S_2S      -       -       0.9066   0.9087   0.9076
S^rev_T   -       -       -        0.9562   0.9565
S^rev_S   -       -       -        -        0.9584

CCGBank
          S_S     S_2S    S^rev_T  S^rev_S  S^rev_2S
S_T       0.9547  0.9532  0.9155   0.9164   0.9182
S_S       -       0.9575  0.9166   0.9179   0.9187
S_2S      -       -       0.9200   0.9197   0.9205
S^rev_T   -       -       -        0.9586   0.9575
S^rev_S   -       -       -        -        0.9617

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi
/
c
o

je
je
/

je

un
r
t
je
c
e

p
d

F
/

/

/

/

4
2
3
3
5
3
1
8
0
6
9
1
0
/
c
o

je
je

_
un
_
0
0
2
5
2
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Table 4
Model diversity between different models on the development set of the Chinese
CCGBank/GRBank data.

CCGBank
          S_S     S_2S    S^rev_T  S^rev_S  S^rev_2S
S_T       0.9261  0.9262  0.8668   0.8614   0.8658
S_S       -       0.9314  0.8667   0.8593   0.8663
S_2S      -       -       0.8694   0.8624   0.8683
S^rev_T   -       -       -        0.9130   0.9107
S^rev_S   -       -       -        -        0.9230

GRBank
          S_S     S_2S    S^rev_T  S^rev_S  S^rev_2S
S_T       0.8918  0.8861  0.8398   0.8301   0.8328
S_S       -       0.9058  0.8455   0.8378   0.8414
S_2S      -       -       0.8460   0.8391   0.8441
S^rev_T   -       -       -        0.8969   0.8984
S^rev_S   -       -       -        -        0.9111


Table 5
Performance of base and combined models on the test set of the DeepBank/EnjuBank data, on
the development set of the English and Chinese CCGBank data, and on the development set of
the Chinese GRBank data. Note that the labeled results for CCG parsing do not consider
supertags.

English
DeepBank   UP      UR      UF       LP      LR      LF
S_T        87.03%  86.42%  86.72    85.00%  84.40%  84.70
S^rev_T    88.16%  88.16%  88.16    86.12%  86.11%  86.12
S_S        86.57%  85.91%  86.24    84.63%  83.98%  84.30
S^rev_S    88.53%  88.27%  88.40    86.63%  86.38%  86.51
S_2S       86.99%  86.38%  86.68    85.01%  84.41%  84.71
S^rev_2S   88.12%  88.11%  88.11    86.28%  86.26%  86.27
Combined   88.29%  90.27%  89.27    86.46%  88.40%  87.42

EnjuBank   UP      UR      UF       LP      LR      LF
S_T        89.98%  89.47%  89.72    88.66%  88.17%  88.42
S^rev_T    91.93%  91.92%  91.92    90.67%  90.66%  90.67
S_S        89.86%  89.48%  89.67    88.58%  88.20%  88.39
S^rev_S    92.12%  92.04%  92.08    90.86%  90.79%  90.82
S_2S       90.07%  89.75%  89.91    88.80%  88.48%  88.64
S^rev_2S   92.09%  92.01%  92.05    90.88%  90.80%  90.84
Combined   91.31%  93.62%  92.45    90.15%  92.43%  91.28

CCGBank    UP      UR      UF       LP      LR      LF
S_T        91.10%  89.98%  90.54    87.00%  85.93%  86.46
S^rev_T    91.23%  91.27%  91.25    87.25%  87.28%  87.27
S_S        90.80%  90.18%  90.49    86.63%  86.04%  86.33
S^rev_S    91.12%  91.31%  91.22    87.35%  87.53%  87.44
S_2S       90.85%  90.30%  90.58    86.77%  86.25%  86.51
S^rev_2S   91.57%  91.63%  91.60    87.83%  87.89%  87.86
Combined   91.43%  92.83%  92.13    87.76%  89.10%  88.42

Chinese
CCGBank    UP      UR      UF       LP      LR      LF
S_T        86.24%  85.31%  85.77    82.04%  81.16%  81.60
S^rev_T    85.20%  85.13%  85.16    80.97%  80.90%  80.94
S_S        85.86%  85.62%  85.74    81.71%  81.48%  81.60
S^rev_S    84.65%  85.90%  85.27    80.55%  81.74%  81.14
S_2S       86.17%  85.87%  86.02    82.09%  81.81%  81.95
S^rev_2S   85.14%  86.32%  85.73    81.05%  82.18%  81.61
Combined   86.63%  89.05%  87.82    82.83%  85.14%  83.97

GRBank     UP      UR      UF       LP      LR      LF
S_T        83.38%  80.43%  81.88    81.28%  78.40%  79.81
S^rev_T    85.03%  84.05%  84.54    82.93%  81.98%  82.45
S_S        82.35%  81.49%  81.92    80.30%  79.46%  79.88
S^rev_S    83.74%  85.04%  84.39    81.77%  83.05%  82.41
S_2S       82.93%  82.20%  82.56    80.88%  80.18%  80.53
S^rev_2S   83.84%  85.02%  84.43    81.80%  82.94%  82.37
Combined   86.05%  87.14%  86.59    84.06%  85.12%  84.59

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi
/
c
o

je
je
/

je

un
r
t
je
c
e

p
d

F
/

/

/

/

4
2
3
3
5
3
1
8
0
6
9
1
0
/
c
o

je
je

_
un
_
0
0
2
5
2
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

377


Table 6
Performance of English and Chinese SRL achieved by representative full and shallow
parsing-based systems. The results are copied from Punyakanok, Roth, and Yih (2008) and
Zhuang and Zong (2010).

                          Precision  Recall   F-score
English  Full parsing     77.09%     75.51%   76.29
         Shallow parsing  75.48%     67.13%   71.06
Chinese  Full parsing     79.17%     72.09%   75.47
         Shallow parsing  72.57%     67.02%   69.68

5.5.2 Comparison of Different Tree Parsers. There are two dominant data-driven
approaches to syntactic dependency tree parsing: transition-based (Yamada and
Matsumoto 2003; Nivre 2008) and graph-based (McDonald 2006; Torres Martins, Smith,
and Xing 2009). In terms of overall per-token prediction, the transition-based and
graph-based tree parsers achieve comparable performance (Suzuki et al. 2009; Weiss
et al. 2015). To evaluate the impact of the two tree parsing approaches on semantic
dependency parsing, we use two tree parsers to serve our graph parser. The first
one is our in-house implementation of the algorithm presented in Zhang and Nivre
(2011), and the second one is a second-order graph-based parser4 (Bohnet 2010). The
tree parsers are trained with the unlabeled tree annotations provided by the English
and Chinese CCGBank data. For both English and Chinese experiments, 5-fold cross
validation is performed to parse the training data to avoid overfitting. The accuracy
of tree parsers is shown in Table 7. Results presented in Figure 12 indicate that the
two parsers are also equivalently effective for producing semantic analysis. This
result is somewhat non-obvious given that the combination of a graph-based and a
transition-based parser usually gives significantly better parsing performance (Nivre
and McDonald 2008; Torres Martins et al. 2008).

5.6 Effectiveness of Tree Approximation

When syntactic information is not available, we propose a tree approximation
technique to induce tree backbones from deep dependency graphs. In particular, our
technique guarantees that the automatically derived trees are projective, which is a
necessary condition for a number of effective tree parsing algorithms. We can utilize these
pseudo trees as an alternative to syntactic analysis. To evaluate the effectiveness of tree
approximation, we compare the contribution to semantic dependency parsing of syn-
tactic trees and pseudo trees. In this experiment, we use a transition-based tree parser
to generate automatic analysis. Figure 13 presents the results. Generally speaking,
pseudo trees contribute to semantic dependency parsing equally well as syntactic trees.
Sometimes, they perform even better. There is a considerable drop when DeepBank data
are applied. We think the main reason is the density of DeepBank graphs. Because there
are fewer edges in the original graphs, it is harder to extract informative pseudo trees.
Consequently, the final graph parsing benefits less.
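The tree approximation algorithm itself is presented earlier in the article; the simplified sketch below is our own illustration of the general idea only. It keeps at most one head per word and falls back to the artificial root, and it omits the projectivization step that the full technique additionally guarantees:

```python
def approximate_tree(n_words, graph_arcs):
    """Induce a tree backbone from a dependency graph (simplified sketch).

    `graph_arcs` is a set of (head, dependent) pairs over words 1..n_words,
    with 0 as the artificial root. Each word keeps only its closest graph
    head; headless words, and one word per cycle, are attached to the root.
    Unlike the full technique, this sketch does not enforce projectivity.
    """
    heads = {}
    for d in range(1, n_words + 1):
        candidates = [h for h, dep in graph_arcs if dep == d]
        heads[d] = min(candidates, key=lambda h: abs(h - d)) if candidates else 0
    for d in range(1, n_words + 1):       # break any remaining cycles
        seen, h = set(), d
        while h != 0 and h not in seen:
            seen.add(h)
            h = heads[h]
        if h != 0:                        # the walk entered a cycle
            heads[h] = 0
    return heads

# A 5-word graph in which word 4 has two heads (2 and 5) and 4/5 form a cycle:
arcs = {(0, 2), (2, 1), (2, 4), (5, 4), (4, 5)}
print(approximate_tree(5, arcs))  # one head per word, cycle broken at the root
```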

4 http://www.code.google.com/p/mate-tools/.


Figure 11
Parsing accuracy (labeled F-score) with and without syntactic features, on DM(en), PAS(en),
CCG(en), and CCG(cn), for the THMM, online re-ordering, and two-stack systems in both
left-to-right and reverse word order. The syntactic trees for experiments on the DeepBank and
EnjuBank data sets are provided by the SemEval 2014 shared task, and they are automatically
generated by the Stanford Parser. The syntactic trees for experiments on the English and
Chinese CCG data sets are generated by our in-house implementation of the model introduced
in Zhang and Nivre (2011).

It is also possible to build a parser ensemble on pseudo-tree-enhanced models.
However, system combination here is not as effective as when integrating non-tree
models. Table 8 summarizes the detailed parsing accuracy. We can see that system
ensemble is still helpful, though the improvement is limited.


Table 7
Accuracy of preprocessing on the development data for CCG analysis. Tr and Gr, respectively,
denote transition-based and graph-based tree parsers.

          UAS (Tr)  UAS (Gr)
English   93.48%    93.47%
Chinese   80.97%    80.81%

5.7 Comparison with Other Parsers
5.7.1 Comparison with Grammar-Based Parsers. We compare our parser with several
representative treebank-guided, grammar-based parsers that achieve state-of-the-art
performance for CCG and HPSG analysis. The selected grammar-based parsers represent
two different architectures.

•  The first type of parser implements a shift-reduce parsing architecture and
   also uses beam search for practical decoding. In particular, we compare
   our parser with the state-of-the-art CCG parser introduced in Xu, Clark, and
   Zhang (2014).5 This parser extends a shift-reduce CFG parser (Zhang and
   Clark 2011a) with a dependency model.

Figure 12
Labeled F-scores with respect to different tree parsing techniques (transition-based vs.
graph-based). Results shown here are from experiments for English and Chinese CCG parsing,
in both left-to-right and reverse word order (CCG(en), CCG(en/rev), CCG(cn), CCG(cn/rev)),
for the THMM, online re-ordering, and two-stack systems.


Figure 13
Parsing accuracy based on syntactic and pseudo tree features, on DM(en), PAS(en), CCG(en),
and CCG(cn), for the THMM, online re-ordering, and two-stack systems in both left-to-right and
reverse word order. All trees are generated by our in-house implementation of the model
introduced in Zhang and Nivre (2011).

•  The second type of parser implements the chart parsing architecture with
   some refinements. For CCG analysis, we focus on the parser proposed by

5 The unlabeled parsing results are not reported in the original paper. The figures presented in Table 9 are
provided by Wenduan Xu.


Table 8
Performance of base and combined models on the test set of the DeepBank/EnjuBank data and
on the development set of the English and Chinese CCGBank data. Features extracted from
pseudo trees are utilized for disambiguation.

English
DeepBank   UP      UR      UF       LP      LR      LF
S_T        87.99%  87.64%  87.81    85.95%  85.61%  85.78
S^rev_T    88.94%  88.98%  88.96    87.07%  87.11%  87.09
S_S        87.76%  87.45%  87.60    85.83%  85.52%  85.67
S^rev_S    88.72%  88.65%  88.69    86.88%  86.82%  86.85
S_2S       87.92%  87.60%  87.76    86.03%  85.72%  85.87
S^rev_2S   89.04%  88.85%  88.95    87.15%  86.96%  87.05
Combined   88.54%  90.25%  89.39    86.65%  88.32%  87.48

EnjuBank   UP      UR      UF       LP      LR      LF
S_T        91.88%  91.45%  91.66    90.60%  90.17%  90.38
S^rev_T    92.82%  92.83%  92.83    91.61%  91.61%  91.61
S_S        91.76%  91.39%  91.58    90.50%  90.14%  90.32
S^rev_S    92.66%  92.65%  92.65    91.45%  91.44%  91.45
S_2S       91.85%  91.54%  91.70    90.63%  90.33%  90.48
S^rev_2S   92.92%  92.83%  92.87    91.77%  91.68%  91.73
Combined   92.47%  93.52%  92.99    91.34%  92.38%  91.86

CCGBank    UP      UR      UF       LP      LR      LF
S_T        92.15%  91.05%  91.60    88.20%  87.15%  87.67
S^rev_T    92.46%  92.27%  92.37    88.78%  88.61%  88.70
S_S        91.91%  91.18%  91.54    87.97%  87.28%  87.62
S^rev_S    92.34%  92.43%  92.39    88.67%  88.76%  88.72
S_2S       91.86%  91.13%  91.49    87.92%  87.22%  87.57
S^rev_2S   92.53%  92.41%  92.47    88.85%  88.73%  88.79
Combined   92.38%  93.20%  92.79    88.92%  89.71%  89.31

Chinese
CCGBank    UP      UR      UF       LP      LR      LF
S_T        87.44%  86.41%  86.93    83.44%  82.45%  82.94
S^rev_T    87.11%  87.04%  87.07    83.24%  83.17%  83.21
S_S        86.76%  86.52%  86.64    82.85%  82.63%  82.74
S^rev_S    86.46%  87.54%  87.00    82.69%  83.72%  83.20
S_2S       87.09%  86.91%  87.00    83.21%  83.03%  83.12
S^rev_2S   86.57%  87.69%  87.13    82.75%  83.82%  83.28
Combined   87.27%  89.00%  88.12    83.57%  85.23%  84.39

Auli and Lopez (2011b). The basic system architecture follows the
well-engineered C&C Parser,6 and additionally applies a number of
advanced machine learning and optimization techniques, including belief
propagation, dual decomposition (Auli and Lopez 2011a), and parameter
estimation with softmax-margin loss (Auli and Lopez 2011b), to enhance
the results. For HPSG analysis, we compare with the well-studied Enju

6 http://svn.ask.it.usyd.edu.au/trac/candc.


Table 9
Parsing results on test sets obtained by representative parsers. State-of-the-art results on these
data sets, as reported in Oepen et al. (2014), Martins and Almeida (2014), Xu, Clark, and Zhang
(2014), Auli and Lopez (2011b), Du, Sun, and Wan (2015), and Sun et al. (2014), are included.

DeepBank                                           LP      LR      LF
Our system: S^rev_2S                               86.28%  86.26%  86.27
Our system: Combined                               86.46%  88.40%  87.42
Our system: S^rev_2S + Pseudo Tree                 87.15%  86.96%  87.05
Our system: Combined + Pseudo Tree                 86.65%  88.32%  87.48
Factorization (Turbo) (Martins and Almeida 2014)   88.82%  87.35%  88.08

EnjuBank                                           LP      LR      LF
Our system: S^rev_2S                               90.88%  90.80%  90.84
Our system: Combined                               90.15%  92.43%  91.28
Our system: S^rev_2S + Pseudo Tree                 91.77%  91.68%  91.73
Our system: Combined + Pseudo Tree                 91.34%  92.38%  91.86
Chart parsing (Enju) (Oepen et al. 2014)           92.09%  92.02%  92.06
Factorization (Turbo) (Martins and Almeida 2014)   91.95%  89.92%  90.93

English CCGBank                                    UP      UR      UF
Our system: S^rev_2S                               91.84%  91.75%  91.80
Our system: Combined                               92.06%  93.14%  92.60
Our system: S^rev_2S + Pseudo Tree                 92.49%  92.30%  92.40
Our system: Combined + Pseudo Tree                 92.52%  93.13%  92.82
Shift-reduce (Xu, Clark, and Zhang 2014)           93.15%  91.06%  92.09
Chart parsing (Auli and Lopez 2011b)               93.08%  92.44%  92.76
Factorization (Du, Sun, and Wan 2015)              93.03%  92.03%  92.53

Chinese GRBank                                     LP      LR      LF
Our system: S^rev_2S                               82.28%  83.11%  82.69
Our system: Combined                               84.92%  85.28%  85.10
Transition-based (Sun et al. 2014)                 83.93%  79.82%  81.82

Chinese CCGBank                                    UP      UR      UF
Our system: S^rev_2S                               85.07%  86.02%  85.54
Our system: Combined                               86.35%  88.85%  87.58
Our system: S^rev_2S + Pseudo Tree                 86.65%  87.34%  86.99
Our system: Combined + Pseudo Tree                 87.14%  88.60%  87.86

Parser,7 which develops a number of advanced techniques for
discriminative deep parsing—for example, maximum entropy estimation
with feature forest (Miyao and Tsujii 2008) and efficient decoding with
supertagging and CFG-filtering (Matsuzaki, Miyao, and Tsujii 2007).

Table 9 shows the final results on the test data for each data set. The representative
shift-reduce parser for comparison utilizes a very similar learning and decoding
architecture to our system. Similar to our parser, Xu, Clark, and Zhang’s (2014) parser
incrementally processes a sentence and uses a beam decoder that performs an inexact

7 http://kmcs.nii.ac.jp/enju/?lang=en.


search. Xu, Clark, and Zhang’s parser sets the beam width to 128, whereas ours is 16. It also
uses the structured prediction algorithm for parameter estimation. The major difference
is that the shift-reduce CCG parser explicitly utilizes a core grammar to guide decoding,
whereas our parser excludes all such information. Actually, our models reported here
also exclude all syntactic information because no syntactic parse is used for feature
extraction. We can see that our individual system based on the two stack transition
system achieves equivalent performance to the CCG-driven parser. De plus, when this
individual system is augmented with tree approximation, the accuracy is significantly
improved. Note that the individual system with both settings does not rely on any
explicit syntactic information. This result on one hand indicates the effectiveness of
adapting syntactic parsing techniques for full semantic parsing, and on the other hand
suggests the possibility of using only semantically structural (not syntactically structural)
information to achieve high-accuracy semantic parsing.

Statistical parsers based on chart parsing are able to perform a more principled
search and therefore usually achieve better parsing accuracy than a normal shift-reduce
parser. We also compare our parsing models with two state-of-the-art chart parsers,
namely, the Enju Parser (Miyao and Tsujii 2008) and Auli and Lopez’s (2011b) parser.
Different from Xu, Clark, and Zhang’s (2014) shift-reduce parser and our models, Auli
and Lopez’s (2011b) parser does not guarantee to produce an analysis for arbitrary
sentences. Generally, the numerical performance evaluated on all sentences is lower than
the results obtained on only those sentences that can be parsed. Note that Auli and Lopez (2011b) only
reported results on sentences that are covered, whereas Oepen et al. (2014) reported
results on all sentences, which is achieved by Enju Parser. From Table 9, we can clearly
see that our graph-spanning models are very competitive. The best individual and com-
bined models outperform the Enju Parser and perform as well as Auli and Lopez’s
(2011b) parser. It is worth noting that strictly less information is used by our parsers.

5.7.2 Comparison with Other Data-Driven Parsers. We also compare our parser with re-
cently developed data-driven, factorization models (Martins and Almeida 2014; Du,
Sun, and Wan 2015). Unlike projective tree parsing, but similarly to non-projective tree
parsing, decoding for factorization models that incorporate even very basic second-order
sibling factors is NP-hard. See the proof presented in our earlier work (Du, Sun, and
Wan 2015) for details. To perform principled decoding, dual decomposition is used and
achieves good empirical results (Martins and Almeida 2014; Du, Sun, and Wan 2015).

From Table 9, we can see that the transition-based approach augmented with tree
approximation is comparable to the factorization approach in general. Compared with
the Turbo Parser, our individual and hybrid models perform significantly worse on
DeepBank but significantly better on EnjuBank. We think one main reason lies in
the annotation styles. Though both corpora are based on HPSG, the annotations in
question are quite different. DeepBank graphs are sparser than EnjuBank graphs, which
makes tree approximation less effective. It seems that the transition-based parser suffers
more when fewer output edges are targeted. The two approaches achieve equivalent
performance for CCG parsing.

6. Related Work

Deep linguistic processing is concerned with NLP approaches that aim at modeling
the complexity of natural languages in rich linguistic representations. Such approaches
are typically related to a particular computational linguistic theory (e.g., CCG, LFG, and


HPSG). Parsing in these formalisms provides an elegant way to generate deep syntacto-
semantic dependency structures with high quality (Clark and Curran 2007; Miyao,
Sagae, and Tsujii 2007; Miyao and Tsujii 2008). The incremental shift-reduce parsing
architecture has been implemented for CCG parsing (Zhang and Clark 2011a; Ambati
et autres. 2015). Besides using phrase-structure rules only, a shift-reduce parser can be
enhanced by incorporating a dependency model (Xu, Clark, and Zhang 2014). Our
parser and the two above parsers have some essential resemblances, including learning
and decoding algorithms. The main difference is the usage of syntactic and grammatical
information. The comparison in Section 5.7 gives a rough idea of the impact of explicitly
using grammatical constraints. A deep-grammar-guided parsing model usually cannot
produce full coverage and the time complexity of the corresponding parsing algorithms
is very high. Some NLP applications may favor lightweight solutions to build deep
dependency structures.

Different from grammar-guided approaches, data-driven approaches make essen-
tial use of machine learning from linguistic annotations in order to parse new sentences.
Such approaches, for example, transition-based (Yamada and Matsumoto 2003; Nivre
2008) and graph-based (McDonald 2006; Torres Martins, Smith, and Xing 2009) models,
have attracted the most attention in dependency parsing in recent years. Several
successful parsers (par exemple., MST, Mate, and Malt parsers) have been built and applied
to many NLP applications. Recently, two advanced techniques have been studied to
enhance a transition-based parser. First, developing features has been shown to be crucial
to advancing parsing accuracy, and a very rich feature set is carefully evaluated by
Zhang and Nivre (2011). Second, beyond deterministic greedy search, beam search and
principled dynamic programming strategies have been used to explore more possible
hypotheses (Zhang and Clark 2008; Huang and Sagae 2010). When we implement our
graph parser, we also leverage rich features and beam search to obtain good parsing
accuracy.

Most research has concentrated on surface dependency structures, and the majority
of existing approaches are limited to producing only tree-shaped graphs. We notice
three notable exceptions in earlier work. Sagae and Tsujii (2008) proposed a DAG
parser that is able to handle projective directed dependency graphs, and that uses the
pseudo-projective parsing technique (Nivre and Nilsson 2005) to build crossing arcs.
Titov et al. (2009) and Henderson et al. (2013) introduced non-planar parsing to parse
PropBank (Palmer, Gildea, and Kingsbury 2005) structures. However, neither technique
handles crossing arcs fully. There have been a number of papers trying to build
non-projective trees, which inspired the design of our transition systems. In particular, we
borrow key ideas from Nivre (2009), Gómez-Rodríguez and Nivre (2010), and
Gómez-Rodríguez and Nivre (2013). In addition to the investigation of the transition-based
approche, McDonald and Pereira (2006) presented a factorization parser that can gener-
ate dependency graphs in which a word may depend on multiple heads, and evaluated
it on the Danish Treebank. Very recently, the dual decomposition technique has been
adopted to achieve principled decoding for factorization models. High-accuracy models
have been introduced in Martins and Almeida (2014) and Du, Sun, and Wan (2015).

7. Conclusion

We study transition-based approaches that produce general dependency graphs directly
from input sequences of words, in a way nearly as simple as tree parsers. We introduce
two new graph-spanning algorithms to generate arbitrary directed graphs, which suit
deep dependency parsing well. We also introduce transition combination and tree


approximation for statistical disambiguation. Statistical parsers built upon these new
techniques have been evaluated with dependency structures that are extracted from
linguistically deep CCG, LFG, and HPSG derivations. Our models achieve state-of-the-art
performance on five representative data sets for English and Chinese parsing. Exper-
iments demonstrate the effectiveness of grammar-free, transition-based approaches to
dealing with complex linguistic phenomena beyond surface syntax.

In addition to deep dependency parsing, many other NLP tasks (e.g., quantifier
scope disambiguation [Manshadi, Gildea, and Allen 2013] and event extraction [Li, Ji,
and Huang 2013]) can be formulated as graph-spanning problems. We think such tasks
can benefit from algorithms that span general graphs rather than trees, and our new
transition-based parsers can provide practical solutions to these tasks.

Acknowledgments
This work was supported by the National
Natural Science Foundation of China under
grants 61300064 and 61331011, and the
National High-Tech R&D Program under
grant 2015AA015403. We are very grateful to
the anonymous reviewers for their insightful
and constructive comments and suggestions.

References
Ambati, Bharat Ram, Tejaswini Deoskar,
Mark Johnson, and Mark Steedman.
2015. An incremental algorithm for
transition-based CCG parsing. In
Proceedings of the 2015 Conference of the
North American Chapter of the Association for
Computational Linguistics: Human Language
Technologies, pages 53–63, Denver, CO.
Auli, Michael and Adam Lopez. 2011un. UN
comparison of loopy belief propagation
and dual decomposition for integrated
CCG supertagging and parsing. In
Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics:
Human Language Technologies,
pages 470–480, Portland, OR. Association
for Computational Linguistics.

Auli, Michael and Adam Lopez. 2011b.
Training a log-linear parser with loss
functions via softmax-margin. In
Proceedings of the 2011 Conference on
Empirical Methods in Natural Language
Processing, pages 333–343, Edinburgh.
Bohnet, Bernd. 2010. Top accuracy and fast

dependency parsing is not a contradiction.
In Proceedings of the 23rd International
Conference on Computational Linguistics
(Coling 2010), pages 89–97, Beijing.

Bresnan, J. and R. M. Kaplan. 1982.

Introduction: Grammars as mental
representations of language. In J. Bresnan,
editor, The Mental Representation of
Grammatical Relations. MIT Press,
Cambridge, MA, pages xvii–lii.


Choi, Jinho D. and Martha Palmer. 2011.

Getting the most out of transition-based
dependency parsing. In Proceedings of the
49th Annual Meeting of the Association for
Computational Linguistics: Human Language
Technologies, pages 687–692, Portland, OR.

Clark, Stephen and James R. Curran.

2007. Wide-coverage efficient statistical
parsing with CCG and log-linear
models. Computational Linguistics,
33(4):493–552.

Clark, Stephen, Julia Hockenmaier, and Mark

Steedman. 2002. Building deep
dependency structures using a
wide-coverage CCG parser. In Proceedings
of the 40th Annual Meeting of the Association
for Computational Linguistics,
pages 327–334. Philadelphia, Pennsylvanie.
Collins, Michael. 2002. Discriminative

training methods for hidden Markov
models: Theory and experiments with
perceptron algorithms. In Proceedings of the
2002 Conference on Empirical Methods in
Natural Language Processing, pages 1–8.
Philadelphia, Pennsylvanie.

Collins, Michael and Brian Roark. 2004.

Incremental parsing with the perceptron
algorithme. In Proceedings of the 42nd
Meeting of the Association for Computational
Linguistics (ACL’04), Main Volume,
pages 111–118, Barcelona.

Copestake, Ann, Dan Flickinger, Carl

Pollard, and Ivan A. Sag. 2005. Minimal
recursion semantics: An introduction.
Research on Language and Computation,
3:281–332.

Covington, Michael A. 2001. A fundamental
algorithm for dependency parsing. In
Proceedings of the 39th Annual ACM
Southeast Conference, pages 95–102.
Athens, GA.

Du, Yantao, Weiwei Sun, and Xiaojun Wan.
2015. A data-driven, factorization parser
for CCG dependency structures. In
Proceedings of the 53rd Annual Meeting of the

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi
/
c
o

je
je
/

je

un
r
t
je
c
e

p
d

F
/

/

/

/

4
2
3
3
5
3
1
8
0
6
9
1
0
/
c
o

je
je

_
un
_
0
0
2
5
2
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Zhang, Du, Sun, and Wan

Transition-Based Parsing for Deep Dependency Structures

Association for Computational Linguistics and
the 7th International Joint Conference on
Natural Language Processing (Volume 1: Long
Papers), pages 1545–1555, Beijing.

Du, Yantao, Fan Zhang, Weiwei Sun, et
Xiaojun Wan. 2014. Peking: Profiling
syntactic tree parsing techniques for
semantic graph parsing. In Proceedings of
the 8th International Workshop on Semantic
Evaluation (SemEval 2014), pages 459–464,
Dublin.

Du, Yantao, Fan Zhang, Xun Zhang, Weiwei

Sun, and Xiaojun Wan. 2015. Peking:
Building semantic dependency graphs
with a hybrid parser. In Proceedings of the
9th International Workshop on Semantic
Evaluation (SemEval 2015), pages 927–931,
Denver, CO.

Flickinger, Dan. 2000. On building a more
efficient grammar by exploiting types.
Natural Language Engineering, 6(1):15–28.

Flickinger, Daniel, Yi Zhang, and Valia

Kordoni. 2012. DeepBank: A dynamically
annotated treebank of the Wall Street
Journal. In Proceedings of the Eleventh
International Workshop on Treebanks and
Linguistic Theories, pages 85–96, Lisbon.
Gómez-Rodríguez, Carlos and Joakim Nivre.

2010. A transition-based parser for
2-planar dependency structures. Dans
Proceedings of the 48th Annual Meeting of the
Association for Computational Linguistics,
pages 1492–1501, Uppsala.

Gómez-Rodríguez, Carlos and Joakim Nivre.

2013. Divisible transition systems and
multiplanar dependency parsing.
Computational Linguistics, 39(4):799–845.

Hajič, Jan, Massimiliano Ciaramita,
Richard Johansson, Daisuke Kawahara,
Maria Antònia Martí, Lluís Màrquez,
Adam Meyers, Joakim Nivre,
Sebastian Padó, Jan Štěpánek,
Pavel Straňák, Mihai Surdeanu,
Nianwen Xue, and Yi Zhang. 2009.
The CoNLL-2009 shared task: Syntactic
and semantic dependencies in multiple
languages. In Proceedings of the Thirteenth
Conference on Computational Natural
Language Learning (CoNLL 2009):
Shared Task, pages 1–18, Boulder, CO.
Henderson, James, Paola Merlo, Ivan Titov,
and Gabriele Musillo. 2013. Multilingual
joint parsing of syntactic and semantic
dependencies with a latent variable
model. Computational Linguistics,
39(4):949–998.

Hockenmaier, Julia and Mark Steedman.
2007. CCGbank: A corpus of CCG
derivations and dependency structures

extracted from the Penn Treebank.
Computational Linguistics, 33(3):355–396.

Huang, Liang and Kenji Sagae. 2010.

Dynamic programming for linear-time
incremental parsing. In Proceedings of the
48th Annual Meeting of the Association for
Computational Linguistics, pages 1077–1086,
Uppsala.

Huang, Zhongqiang, Mary Harper, and Slav
Petrov. 2010. Self-training with products of
latent variable grammars. In Proceedings of
le 2010 Conference on Empirical Methods in
Natural Language Processing, pages 12–22,
Cambridge, MA.

Ivanova, Angelina, Stephan Oepen, Lilja

Øvrelid, and Dan Flickinger. 2012. Who
did what to whom? A contrastive study of
syntacto-semantic dependencies. In
Proceedings of the Sixth Linguistic Annotation
Workshop, pages 2–11, Jeju Island.
Koo, Terry and Michael Collins. 2010.

Efficient third-order dependency parsers.
In Proceedings of the 48th Annual Meeting of
the Association for Computational Linguistics,
pages 1–11, Uppsala.

Li, Qi, Heng Ji, and Liang Huang. 2013. Joint
event extraction via structured prediction
with global features. In Proceedings of the
51st Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long
Papers), pages 73–82, Sofia.

Manshadi, Mehdi, Daniel Gildea, and James

Allen. 2013. Plurality, negation, et
quantification: Towards comprehensive
quantifier scope disambiguation. Dans
Proceedings of the 51st Annual Meeting of the
Association for Computational Linguistics
(Volume 1: Long Papers), pages 64–72, Sofia.

Martins, André F. T. and Mariana S. C.

Almeida. 2014. Priberam: A turbo semantic
parser with second order features. In
Proceedings of the 8th International Workshop
on Semantic Evaluation (SemEval 2014),
pages 471–476, Dublin.

Matsuzaki, Takuya, Yusuke Miyao, et
Jun’ichi Tsujii. 2007. Efficient HPSG
parsing with supertagging and
CFG-filtering. In Proceedings of the 20th
International Joint Conference on Artificial
Intelligence, pages 1671–1676, San
Francisco, Californie.

McDonald, Ryan. 2006. Discriminative

Learning and Spanning Tree Algorithms
for Dependency Parsing. Ph.D. thesis,
University of Pennsylvania, Philadelphia,
Pennsylvanie.

McDonald, Ryan and Fernando Pereira. 2006.

Online learning of approximate
dependency parsing algorithms. Dans

Proceedings of the 11th Conference of the
European Chapter of the Association for
Computational Linguistics (EACL-2006)),
volume 6, pages 81–88, Trento.

McDonald, Ryan T. and Joakim Nivre. 2011.
Analyzing and integrating dependency
parsers. Computational Linguistics,
37(1):197–230.

Miyao, Yusuke, Takashi Ninomiya, et
Jun’ichi Tsujii. 2004. Corpus-oriented
grammar development for acquiring a
head-driven phrase structure grammar
from the Penn Treebank. In IJCNLP,
pages 684–693, Hainan Island.

Miyao, Yusuke, Rune Sætre, Kenji Sagae,
Takuya Matsuzaki, and Jun’ichi Tsujii.
2008. Task-oriented evaluation of syntactic
parsers and their representations. In
Proceedings of ACL-08: HLT, pages 46–54,
Columbus, OH.

Miyao, Yusuke, Kenji Sagae, and Jun’ichi
Tsujii. 2007. Towards framework-
independent evaluation of deep linguistic
parsers. In Proceedings of the GEAF
2007 Workshop, pages 238–258,
Stanford, Californie.

Miyao, Yusuke and Jun’ichi Tsujii.
2008. Feature forest models for
probabilistic HPSG parsing.
Computational Linguistics, 34(1):35–80.

Nivre, Joakim. 2008. Algorithms for

deterministic incremental dependency
parsing. Computational Linguistics,
34:513–553.

Nivre, Joakim. 2009. Non-projective

dependency parsing in expected linear
time. In Proceedings of the Joint Conference
of the 47th Annual Meeting of the ACL
and the 4th International Joint Conference
on Natural Language Processing of the
AFNLP, pages 351–359, Suntec.
Nivre, Joakim and Ryan McDonald.
2008. Integrating graph-based and
transition-based dependency parsers.
In Proceedings of ACL-08: HLT,
pages 950–958, Columbus, OH.
Nivre, Joakim and Jens Nilsson. 2005.

Pseudo-projective dependency parsing.
In Proceedings of the 43rd Annual Meeting of
the Association for Computational Linguistics
(ACL’05), pages 99–106, Ann Arbor, MI.
Oepen, Stephan, Marco Kuhlmann, Yusuke
Miyao, Daniel Zeman, Dan Flickinger,
Jan Hajič, Angelina Ivanova, and Yi Zhang.
2014. SemEval 2014 Task 8: Broad-coverage
semantic dependency parsing.
In Proceedings of the 8th International
Workshop on Semantic Evaluation
(SemEval 2014), pages 63–72, Dublin.


Oepen, Stephan and Jan Tore Lønning.

2006. Discriminant-based MRS banking.
In Proceedings of the Fifth International
Conference on Language Resources
and Evaluation (LREC-2006), Genoa.

Palmer, Martha, Daniel Gildea, et

Paul Kingsbury. 2005. The proposition
bank: An annotated corpus of semantic
roles. Computational Linguistics, 31:71–106.

Pollard, Carl and Ivan A. Sag. 1994.

Head-Driven Phrase Structure Grammar.
The University of Chicago Press, Chicago.
Punyakanok, Vasin, Dan Roth, and Wen-tau
Yih. 2008. The importance of syntactic
parsing and inference in semantic
role labeling. Computational Linguistics,
34(2):257–287.

Reddy, Siva, Mirella Lapata, et

Mark Steedman. 2014. Large-scale
semantic parsing without question-answer
pairs. Transactions of the Association for
Computational Linguistics (TACL),
2:377–392.

Sagae, Kenji and Alon Lavie. 2006. Parser

combination by reparsing. In Proceedings of
the Human Language Technology Conference
of the NAACL, Companion Volume: Short
Papers, pages 129–132, Stroudsburg, Pennsylvanie.

Sagae, Kenji and Jun’ichi Tsujii. 2008.
Shift-reduce dependency DAG
parsing. In Proceedings of the 22nd
International Conference on Computational
Linguistics, pages 753–760, Manchester.

Steedman, Mark. 2000. The Syntactic

Process. MIT Press, Cambridge, MA.

Sun, Weiwei, Yantao Du, Xin Kou, Shuoyang
Ding, and Xiaojun Wan. 2014. Grammatical
relations in Chinese: GB-ground extraction
and data-driven parsing. In Proceedings
of the 52nd Annual Meeting of the Association
for Computational Linguistics (Volume 1:
Long Papers), pages 446–456, Baltimore.

Sun, Weiwei and Xiaojun Wan. 2013.

Data-driven, PCFG-based and
pseudo-PCFG-based models for
Chinese dependency parsing. Transactions
of the Association for Computational
Linguistics (TACL), 1:301–314.

Surdeanu, Mihai, Richard Johansson,
Adam Meyers, Lluís Màrquez, and
Joakim Nivre. 2008. The CoNLL 2008
shared task on joint parsing of syntactic
and semantic dependencies. In CoNLL
2008: Proceedings of the Twelfth Conference
on Computational Natural Language
Learning, pages 159–177, Manchester.

Surdeanu, Mihai and Christopher D.

Manning. 2010. Ensemble models for
dependency parsing: Cheap and good?

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi
/
c
o

je
je
/

je

un
r
t
je
c
e

p
d

F
/

/

/

/

4
2
3
3
5
3
1
8
0
6
9
1
0
/
c
o

je
je

_
un
_
0
0
2
5
2
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

Zhang, Du, Sun, and Wan

Transition-Based Parsing for Deep Dependency Structures

In Human Language Technologies:
Le 2010 Annual Conference of the
North American Chapter of the
Association for Computational Linguistics,
pages 649–652, Los Angeles, CA.

Suzuki, Jun, Hideki Isozaki, Xavier Carreras,
and Michael Collins. 2009. An empirical
study of semi-supervised structured
conditional models for dependency
parsing. In Proceedings of the 2009 Conference
on Empirical Methods in Natural Language
Processing, pages 551–560, Singapore.
Titov, Ivan, James Henderson, Paola Merlo,
and Gabriele Musillo. 2009. Online graph
planarisation for synchronous parsing
of semantic and syntactic dependencies.
In Proceedings of the 21st International
Joint Conference on Artificial Intelligence,
pages 1562–1567, San Francisco, Californie.

Torres Martins, André, Noah Smith,

and Eric Xing. 2009. Concise integer
linear programming formulations for
dependency parsing. In Proceedings of the
Joint Conference of the 47th Annual Meeting
of the ACL and the 4th International Joint
Conference on Natural Language Processing
of the AFNLP, pages 342–350, Suntec.
Torres Martins, André Filipe, Dipanjan
Das, Noah A. Smith, and Eric P. Xing.
2008. Stacking dependency parsers.
In Proceedings of the 2008 Conference
on Empirical Methods in Natural Language
Processing, pages 157–166, Honolulu, HI.

Tse, Daniel and James R. Curran. 2010.
Chinese CCGbank: Extracting CCG
derivations from the Penn Chinese
Treebank. In Proceedings of the 23rd
International Conference on Computational
Linguistics (Coling 2010), pages 1083–1091,
Beijing.

Tse, Daniel and James R. Curran. 2012.
The challenges of parsing Chinese
with combinatory categorial grammar.
In Proceedings of the 2012 Conference of the
North American Chapter of the Association for
Computational Linguistics: Human Language
Technologies, pages 295–304, Montréal.

Weiss, David, Chris Alberti, Michael Collins,
and Slav Petrov. 2015. Structured training
for neural network transition-based
parsing. In Proceedings of the 53rd Annual

Meeting of the Association for Computational
Linguistics and the 7th International Joint
Conference on Natural Language Processing
(Volume 1: Long Papers), pages 323–333,
Beijing.

Xu, Wenduan, Stephen Clark, and Yue

Zhang. 2014. Shift-reduce CCG parsing
with a dependency model. In Proceedings
of the 52nd Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long
Papers), pages 218–227, Baltimore, MARYLAND.
Yamada, Hiroyasu and Yuji Matsumoto.
2003. Statistical dependency analysis
with support vector machines. In 8th
International Workshop of Parsing
Technologies (IWPT2003), pages 195–206,
Nancy.

Zhang, Hui, Min Zhang, Chew Lim Tan,

and Haizhou Li. 2009. K-best
combination of syntactic parsers.
In Proceedings of the 2009 Conference
on Empirical Methods in Natural Language
Processing, pages 1552–1560, Singapore.

Zhang, Yue and Stephen Clark. 2008.
A tale of two parsers: Investigating
and combining graph-based and
transition-based dependency parsing.
In Proceedings of the 2008 Conference
on Empirical Methods in Natural Language
Processing, pages 562–571, Honolulu, HI.

Zhang, Yue and Stephen Clark. 2011un.

Shift-reduce CCG parsing. In Proceedings of
the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language
Technologies, pages 683–692, Portland, OR.

Zhang, Yue and Stephen Clark. 2011b.

Syntactic processing using the generalized
perceptron and beam search. Computational
Linguistics, 37(1):105–151.

Zhang, Yue and Joakim Nivre. 2011.

Transition-based dependency parsing with
rich non-local features. In Proceedings of
the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language
Technologies, pages 188–193, Portland, OR.

Zhuang, Tao and Chengqing Zong. 2010.

A minimum error weighting combination
strategy for Chinese semantic role labeling.
In Proceedings of the 23rd International
Conference on Computational Linguistics
(Coling 2010), pages 1362–1370, Beijing.
