Arc-Eager Parsing with the Tree Constraint
Joakim Nivre∗
Uppsala University
Daniel Fern´andez-Gonz´alez∗∗
University of Vigo
The arc-eager system for transition-based dependency parsing is widely used in natural language
processing despite the fact that it does not guarantee that the output is a well-formed dependency
tree. We propose a simple modification to the original system that enforces the tree constraint
without requiring any modification to the parser training procedure. Experiments on multiple
languages show that the method on average achieves 72% of the error reduction possible and
consistently outperforms the standard heuristic in current use.
1. Einführung
One of the most widely used transition systems for dependency parsing is the arc-
eager system first described in Nivre (2003), which has been used as the backbone
for greedy deterministic dependency parsers (Nivre, Hall, and Nilsson 2004; Goldberg
and Nivre 2012), beam search parsers with structured prediction (Zhang and Clark
2008; Zhang and Nivre 2011), neural network parsers with latent variables (Titov and
Henderson 2007), and delexicalized transfer parsers (McDonald, Petrov, and Hall 2011).
Jedoch, in contrast to most similar transition systems, the arc-eager system does
not guarantee that the output is a well-formed dependency tree, which sometimes
leads to fragmented parses and lower parsing accuracy. Although various heuristics
have been proposed to deal with this problem, there has so far been no clean the-
oretical solution that also gives good parsing accuracy. In diesem Artikel, we present a
modified version of the original arc-eager system, which is provably correct for the
class of projective dependency trees, which maintains the linear time complexity of
greedy (or beam search) parsers, and which does not require any modifications to the
parser training procedure. Experimental evaluation on the CoNLL-X data sets show
that the new system consistently outperforms the standard heuristic in current use,
on average achieving 72% of the error reduction possible (compared with 41% für die
old heuristic).
∗ Uppsala University, Department of Linguistics and Philology, Kasten 635, SE-75126, Uppsala, Schweden.
Email: joakim.nivre@lingfil.uu.se.
∗∗ Universidad de Vigo, Departamento de Inform´atica, Campus As Lagoas, 32004, Ourense, Spanien.
Email: danifg@uvigo.es.
Einreichung erhalten: 25 Juni 2013; zur Veröffentlichung angenommen: 4 November 2013.
doi:10.1162/COLI a 00185
© 2014 Verein für Computerlinguistik
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
C
Ö
l
ich
/
l
A
R
T
ich
C
e
–
P
D
F
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
C
Ö
l
ich
_
A
_
0
0
1
8
5
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
Computerlinguistik
Volumen 40, Nummer 2
2. The Problem
The dependency parsing problem is usually defined as the task of mapping a sen-
tence x = w1, . . . , wn to a dependency tree T, which is a directed tree with one node
for each input token wi, plus optionally an artificial root node corresponding to a
dummy word w0, and with arcs representing dependency relations, optionally labeled
with dependency types (K ¨ubler, McDonald, and Nivre 2009). In diesem Artikel, we will
furthermore restrict our attention to dependency trees that are projective, meaning that
every subtree has a contiguous yield. Figur 1 shows a labeled projective dependency
tree.
Transition-based dependency parsing views parsing as heuristic search through a
non-deterministic transition system for deriving dependency trees, guided by a statisti-
cal model for scoring transitions from one configuration to the next. Figur 2 zeigt die
arc-eager transition system for dependency parsing (Nivre 2003, 2008). A parser con-
figuration consists of a stack σ, a buffer β, and a set of arcs A. The initial configuration
for parsing a sentence x = w1, . . . , wn has an empty stack, a buffer containing the words
w1, . . . , wn, and an empty arc set. A terminal configuration is any configuration with
an empty buffer. Whatever arcs have then been accumulated in the arc set A defines
the output dependency tree. There are four possible transitions from a configuration
(cid:7)
(cid:7)
DOBJ
IOBJ
(cid:4)
(cid:7)
SBJ
(cid:7)
?
He1
wrote2
(cid:4)
?
her3
P
(cid:7)
?
a4
DET
(cid:4)
(cid:4)
?
letter5
(cid:4)
?
.6
Figur 1
Projective labeled dependency tree for an English sentence.
Initial:
Terminal:
([ ], [w1, . . . , wn], { })
(σ, [ ], A)
(σ, wi|β, A)
(σ|wi, β, A)
Shift:
Reduce:
Right-Arc: (σ|wi, wj|β, A) ⇒ (σ|wi|wj, β, A ∪ {wi → wj})
Left-Arc:
(σ|wi, wj|β, A) ⇒ (σ, wj|β, A ∪ {wi ← wj})
⇒ (σ|wi, β, A)
⇒ (σ, β, A)
l
ich
_
A
_
0
0
1
8
5
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
HEAD(wi)
¬HEAD(wi)
Figur 2
Arc-eager transition system for dependency parsing. Wir gebrauchen | as list constructor, Bedeutung
that σ|wi is a stack with top wi and remainder σ and wj|β is a buffer with head wj and tail β.
The condition HEAD(wi) is true in a configuration (σ, β, A) if A contains an arc wk → wi
(for some k).
260
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
C
Ö
l
ich
/
l
A
R
T
ich
C
e
–
P
D
F
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
C
Ö
Nivre and Fern´andez-Gonz´alez
Arc-Eager Parsing with the Tree Constraint
where top is the word on top of the stack (wenn überhaupt) and next is the first word of the
buffer:1
1.
2.
3.
4.
Shift moves next to the stack.
Reduce pops the stack; allowed only if top has a head.
Right-Arc adds a dependency arc from top to next and moves next to the
stack.
Left-Arc adds a dependency arc from next to top and pops the stack;
allowed only if top has no head.
The arc-eager system defines an incremental left-to-right parsing order, where left
dependents are added bottom–up and right dependents top–down, which is advanta-
geous for postponing certain attachment decisions. Jedoch, a fundamental problem
with this system is that it does not guarantee that the output parse is a projective
dependency tree, only a projective dependency forest, das ist, a sequence of adjacent,
non-overlapping projective trees (Nivre 2008). This is different from the closely related
arc-standard system (Nivre 2004), which constructs all dependencies bottom–up and
can easily be constrained to only output trees. The failure to implement the tree con-
straint may lead to fragmented parses and lower parsing accuracy, especially with
respect to the global structure of the sentence. Darüber hinaus, even if the loss in accuracy
is not substantial, this may be problematic when using the parser in applications where
downstream components may not function correctly if the parser output is not a well-
formed tree.
The standard solution to this problem in practical implementations, such as Malt-
Parser (Nivre, Hall, and Nilsson 2006), is to use an artificial root node and to attach
all remaining words on the stack to the root node at the end of parsing. This fixes the
formal problem, but normally does not improve accuracy because it is usually unlikely
that more than one word should attach to the artificial root node. Daher, in the error
analysis presented by McDonald and Nivre (2007), MaltParser tends to have very low
precision on attachments to the root node. Other heuristic solutions have been tried,
usually by post-processing the nodes remaining on the stack in some way, but these
techniques often require modifications to the training procedure and/or undermine the
linear time complexity of the parsing system. In any case, a clean theoretical solution to
this problem has so far been lacking.
3. The Solution
We propose a modified version of the arc-eager system, which guarantees that the arc
set A in a terminal configuration forms a projective dependency tree. The new system,
shown in Figure 3, differs in four ways from the old system:
1.
Configurations are extended with a boolean variable e, keeping track of
whether we have seen the end of the input, das ist, whether we have
passed through a configuration with an empty buffer.
1 For simplicity, we only consider unlabeled parsing here. In labeled parsing, which is used in all
experiments, Right-Arc and Left-Arc also have to select a label for the new arc.
261
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
C
Ö
l
ich
/
l
A
R
T
ich
C
e
–
P
D
F
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
C
Ö
l
ich
_
A
_
0
0
1
8
5
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
Computerlinguistik
Volumen 40, Nummer 2
Initial:
Terminal:
([ ], [w1, . . . , wn], { }, false)
([wi], [ ], A, true)
Shift:
Unshift:
Reduce:
Right-Arc: (σ|wi, wj|β, A, e) ⇒
(σ, wi|β, A, false) ⇒ (σ|wi, β, A, [[β = [ ]]])
(σ|wi, [ ], A, true) ⇒ (σ, [wi], A, true)
(σ|wi, β, A, e)
⇒ (σ, β, A, e)
Left-Arc:
(σ|wi|wj, β, A ∪ {wi → wj}, [[e ∨ β = [ ]]])
(σ|wi, wj|β, A, e) ⇒ (σ, wj|β, A ∪ {wi ← wj}, e)
¬HEAD(wi)
HEAD(wi)
¬HEAD(wi)
Figur 3
Arc-eager transition system enforcing the tree constraint. The expression [[Phi]] evaluates to true if
φ is true and false otherwise.
2.
3.
4.
Terminal configurations have the form ([wi], [ ], A, true), das ist, they have
an empty buffer, exactly one word on the stack, and e = true.
The Shift transition is allowed only if e = false.
There is a new transition Unshift, which moves top back to the buffer and
which is allowed only if top has no head and the buffer is empty.
The new system behaves exactly like the old system until we reach a configuration with
an empty buffer, after which there are two alternatives. If the stack contains exactly one
word, we terminate and output a tree, which was true also in the old system. Jedoch,
if the stack contains more than one word, we now go on parsing but are forbidden to
make any Shift transitions. After this point, there are two cases. If the buffer is empty,
we make a deterministic choice between Reduce and Unshift depending on whether top
has a head or not. If the buffer is not empty, we non-deterministically choose between
Right-Arc and either Left-Arc or Reduce (the latter again depending on whether top
has a head). Because the new Unshift transition is only used in completely deterministic
Fälle, we can use the same statistical model to score transitions both before and after we
have reached the end of the input, as long as we make sure to block any Shift transition
favored by the model.
We first show that the new system is still guaranteed to terminate and that the
maximum number of transitions is O(N), where n is the length of the input sentence,
which guarantees linear parsing complexity for greedy (and beam search) parsers with
constant-time model predictions and transitions. From previous results, we know that
the system is guaranteed to reach a configuration of the form (σ, [ ], A) in 2n − k transi-
tionen, where k = |σ| (Nivre 2008).2 In any non-terminal configuration arising from this
point on, we can always perform Reduce or Unshift (in case the buffer is empty) oder
Right-Arc (ansonsten), which means that termination is guaranteed if we can show that
the number of additional transitions is bounded.
2 This holds because we must move n words from the buffer to the stack (in either a Shift or a Right-Arc
Übergang) and pop n − k words from the stack (in either a Reduce or Left-Arc transition).
262
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
C
Ö
l
ich
/
l
A
R
T
ich
C
e
–
P
D
F
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
C
Ö
l
ich
_
A
_
0
0
1
8
5
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
Nivre and Fern´andez-Gonz´alez
Arc-Eager Parsing with the Tree Constraint
Note first that we can perform at most k − 1 Unshift transitions moving a word back
to the buffer (because a word can only be moved back to the buffer if it has no head,
which can only happen once since Shift is now forbidden).3 daher, we can perform
at most k − 1 Right-Arc transitions, moving a word back to the stack and attaching
it to its head. Endlich, we can perform at most k − 1 Reduce and Left-Arc transitions,
removing a word from the stack (regardless of whether it has first been moved back
to the buffer). In Summe, we can thus perform at most 2n − k + 3(k − 1) < 4n transitions,
which means that the number of transitions is O(n).
Having shown that the new system terminates after a linear number of transitions,
we now show that it also guarantees that the output is a well-formed dependency tree.
In order to reach a terminal configuration, we must pop n − 1 words from the stack,
each of which has exactly one incoming arc and is therefore connected to at least one
other node in the graph. Because the word remaining in the stack has no incoming arc
but must be connected to (at least) the last word that was popped, it follows that the
resulting graph is connected with exactly n − 1 arcs, which entails that it is a tree.
It is worth noting that, although the new system can always construct a tree over
the unattached words left on the stack in the first configuration of the form (σ, [ ], A),
it may not be able to construct every possible tree over these nodes. More precisely,
a sequence of words wj, . . . , wk can only attach to a word on the left in the form of
a chain (not as siblings) and can only attach to a word on the right as siblings (not
as a chain). Nevertheless, the new system is both sound and complete for the class
of projective dependency trees, because every terminating transition sequence derives
a projective tree (soundness) and every projective tree is derived by some transition
sequence (completeness). By contrast, the original arc-eager system is complete but not
sound for the class of projective trees.
4. Experiments
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
c
o
In our empirical evaluation we make use of the open-source system MaltParser (Nivre,
Hall, and Nilsson 2006), which is a data-driven parser-generator for transition-based
dependency parsing supporting the use of different transition systems. Besides the
original arc-eager system, which is already implemented in MaltParser, we have added
an implementation of the new modified system. The training procedure used in Malt-
Parser derives an oracle transition sequence for each sentence and gold tree in the
training corpus and uses every configuration–transition pair in these sequences as a
training instance for a multi-class classifier. Because the oracle sequences in the arc-
eager system always produce a well-formed tree, there will be no training instances
corresponding to the extended transition sequences in the new system (i.e., sequences
containing one or more non-terminal configurations of the form (σ, [ ], A)). However,
because the Unshift transition is only used in completely deterministic cases, where the
classifier is not called upon to rank alternative transitions, we can make use of exactly
the same classifier for both the old and the new system.4
We compare the original and modified arc-eager systems on all 13 data sets from the
CoNLL-X shared task on multilingual dependency parsing (Buchholz and Marsi 2006),
l
i
_
a
_
0
0
1
8
5
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
3 The number is k − 1, rather than k, because Unshift requires an empty buffer, which together with only
one word on the stack would imply a terminal configuration.
4 Although this greatly simplifies the integration of the new system into existing parsing frameworks, it is
conceivable that accuracy could be improved further through specialized training methods, for example,
using a dynamic oracle along the lines of Goldberg and Nivre (2012). We leave this for future research.
263
Computational Linguistics
Volume 40, Number 2
which all assume the existence of a dummy root word prefixed to the sentence. We tune
the feature representations separately for each language and projectivize the training
data for languages with non-projective dependencies but otherwise use default settings
in MaltParser (including the standard heuristic of attaching any unattached tokens to
the artificial root node at the end of parsing for the original system). Because we want
to perform a detailed error analysis for fragmented parses, we initially avoid using the
dedicated test set for each language and instead report results on a development set
created by splitting off 10% of the training data.
Table 1 (columns 2–3) shows the unlabeled attachment score (including punctua-
tion) achieved with the two systems. We see that the new system improves over the
old one by 0.19 percentage points on average, with individual improvements ranging
from 0.00 (Japanese) to 0.50 (Slovene). These differences may seem quantitatively small,
but it must be remembered that the unattached tokens left on the stack in fragmented
parses constitute a very small fraction of the total number of tokens on which these
scores are calculated. In order to get a more fine-grained picture of the behavior of the
two systems, we therefore zoom in specifically on these tokens in the rest of Table 1.
Column 4 shows the number of unattached tokens left on the stack when reaching
the end of the input (excluding the artificial root node). Column 5 shows for how many
of these tokens the correct head is also on the stack (including the artificial root node).
Both statistics are summed over all sentences in the development set. We see from these
figures that the amount of fragmentation varies greatly between languages, from only
four unattached tokens for Japanese to 230 tokens for Slovene. These tendencies seem
to reflect properties of the data sets, with Japanese having the lowest average sentence
length of all languages and Slovene having a high percentage of non-projective depen-
dencies and a very small training set. They also partly explain why these languages
show the smallest and largest improvement, respectively, in overall attachment score.
Table 1
Experimental results for the old and new arc-eager transition systems (development sets).
UAS = unlabeled attachment score; Stack-Token = number of unattached tokens left in the stack
when reaching the end of the input (excluding the artificial root node); Stack-Head = number of
unattached tokens for which the head is also left in the stack (including the artificial root node);
Correct = number of tokens left in the stack that are correctly attached in the final parser output;
Recall = Correct/Stack-Head (%).
UAS
Stack
Correct
Recall
Language
Old New Token Head
Old New
Old
New
Arabic
Bulgarian
Chinese
Czech
Danish
Dutch
German
Japanese
Portuguese
Slovene
Spanish
Swedish
Turkish
77.38
90.32
89.28
83.26
88.10
86.23
87.44
93.53
87.68
76.50
81.43
88.18
81.44
77.74
90.42
89.48
83.46
88.17
86.49
87.75
93.53
87.84
77.00
81.59
88.33
81.45
38
20
33
41
29
63
66
4
44
230
57
43
24
24
15
31
36
23
57
39
4
36
173
42
28
16
3
7
8
8
18
13
6
4
15
50
13
13
9
22
13
18
21
22
28
27
4
26
97
22
24
10
12.50
46.67
25.81
22.22
78.26
22.80
15.38
100.00
41.67
28.90
30.95
46.43
56.25
91.67
86.67
58.06
58.33
95.65
49.12
69.23
100.00
72.22
56.07
52.38
85.71
62.50
Average
85.44
85.63
53.23
40.31
12.85
25.69
40.60
72.12
264
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
c
o
l
i
_
a
_
0
0
1
8
5
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Nivre and Fern´andez-Gonz´alez
Arc-Eager Parsing with the Tree Constraint
Table 2
Results on final test sets. LAS = labeled attachment score. UAS = unlabeled attachment score.
Ara
Bul Cze Chi Dan Dut Ger
Jap
Por
Slo
Spa Swe Tur Ave
LAS Old
65.90 87.39 86.11 77.82 84.11 77.28 85.42 90.88 83.52 70.30 79.72 83.19 73.92 80.43
New 66.13 87.45 86.27 77.93 84.16 77.37 85.51 90.84 83.76 70.86 79.73 83.19 73.98 80.55
UAS Old
76.33 90.51 89.79 83.09 88.74 79.95 87.92 92.44 86.28 77.09 82.47 88.70 81.09 84.95
New 76.49 90.58 89.94 83.09 88.82 80.18 88.00 92.40 86.33 77.34 82.49 88.76 81.20 85.05
Columns 6 and 7 show, for the old and the new system, how many of the unattached
tokens on the stack are attached to their correct head in the final parser output, as a
result of heuristic root attachment for the old system and extended transition sequences
for the new system. Columns 8 and 9 show the same results expressed in terms of recall
or error reduction (dividing column 6/7 by column 5). These results clearly demonstrate
the superiority of the new system over the old system with heuristic root attachment.
Whereas the old system correctly attaches 40.60% of the tokens for which a head can be
found on the stack, the new system finds correct attachments in 72.12% of the cases. For
some languages, the effect is dramatic, with Arabic improving from just above 10% to
over 90% and German from about 15% to almost 70%, but all languages clearly benefit
from the new technique for enforcing the tree constraint.
Variation across languages can to a large extent be explained by the proportion
of unattached tokens that should be attached to the artificial root node. Because the
old root attachment heuristic attaches all tokens to the root, it will have 100% recall
on tokens for which this is the correct attachment and 0% recall on all other tokens.
This explains why the old system gets 100% recall on Japanese, where all four tokens
left on the stack should indeed be attached to the root. It also means that, on average,
root attachment is only correct for about 40% of the cases (which is the overall recall
achieved by this method). By contrast, the new system only achieves a recall of 82.81%
on root attachments, but this is easily compensated by a recall of 63.50% on non-root
attachments.
For completeness, we report also the labeled and unlabeled attachment scores (in-
cluding punctuation) on the dedicated test sets from the CoNLL-X shared task, shown
in Table 2. The results are perfectly consistent with those analyzed in depth for the
development sets. The average improvement is 0.12 for LAS and 0.10 for UAS. The
largest improvement is again found for Slovene (0.58 LAS, 0.25 UAS) and the smallest
for Japanese, where there is in fact a marginal drop in accuracy (0.04 LAS/UAS).5 For
all other languages, however, the new system is at least as good as the old system
and in addition guarantees a well-formed output without heuristic post-processing.
Moreover, although the overall improvement is small, there is a statistically significant
improvement in either LAS or UAS for all languages except Bulgarian, Czech, Japanese,
Spanish, and Swedish, and in both LAS and UAS on average over all languages accord-
ing to a randomized permutation test (α = .05) (Yeh 2000). Finally, it is worth noting
that there is no significant difference in running time between the old and the new
system.
5 As we saw in the previous analysis, fragmentation happens very rarely for Japanese and all unattached
tokens should normally be attached to the root node, which gives 100% recall for the baseline parser.
265
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
c
o
l
i
_
a
_
0
0
1
8
5
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 40, Number 2
5. Conclusion
In conclusion, we have presented a modified version of the arc-eager transition system
for dependency parsing, which, unlike the old system, guarantees that the output is
a well-formed dependency tree. The system is provably sound and complete for the
class of projective dependency trees, and the number of transitions is still linear in
the length of the sentence, which is important for efficient parsing. The system can be
used without modifying the standard training procedure for greedy transition-based
parsers, because the statistical model used to score transitions is the same as for the
old system. An empirical evaluation on all 13 languages from the CoNLL-X shared task
shows that the new system consistently outperforms the old system with the standard
heuristic of attaching all unattached tokens to the artificial root node. Whereas the
old method only recovers about 41% of the attachments that are still feasible, the new
system achieves an average recall of 72%. Although this gives only a marginal effect on
overall attachment score (at most 0.5%), being able to guarantee that output parses are
always well formed may be critical for downstream modules that take these as input.
Moreover, the proposed method achieves this guarantee as a theoretical property of the
transition system without having to rely on ad hoc post-processing and works equally
well regardless of whether a dummy root word is used or not.
Acknowledgments
This research has been partially funded by
the Spanish Ministry of Economy and
Competitiveness and FEDER (project
TIN2010-18552-C03-01), Ministry of
Education (FPU Grant Program), and Xunta
de Galicia (projects CN 2012/319 and CN
2012/317).
References
Buchholz, Sabine and Erwin Marsi. 2006.
CoNLL-X shared task on multilingual
dependency parsing. In Proceedings of the
10th Conference on Computational Natural
Language Learning (CoNLL), pages 149–164,
New York, NY.
Goldberg, Yoav and Joakim Nivre. 2012.
A dynamic oracle for arc-eager
dependency parsing. In Proceedings
of the 24th International Conference on
Computational Linguistics (COLING),
pages 959–976, Jeju Island.
K ¨ubler, Sandra, Ryan McDonald, and
Joakim Nivre. 2009. Dependency Parsing.
Morgan and Claypool.
McDonald, Ryan and Joakim Nivre. 2007.
Characterizing the errors of data-driven
dependency parsing models. In Proceedings
of the 2007 Joint Conference on Empirical
Methods in Natural Language Processing
and Computational Natural Language
Learning (EMNLP-CoNLL), pages 122–131,
Prague.
266
McDonald, Ryan, Slav Petrov, and Keith
Hall. 2011. Multi-source transfer of
delexicalized dependency parsers. In
Proceedings of the Conference on Empirical
Methods in Natural Language Processing
(EMNLP), pages 62–72, Edinburgh.
Nivre, Joakim. 2003. An efficient algorithm
for projective dependency parsing.
In Proceedings of the 8th International
Workshop on Parsing Technologies (IWPT),
pages 149–160, Nancy.
Nivre, Joakim. 2004. Incrementality in
deterministic dependency parsing.
In Proceedings of the Workshop on
Incremental Parsing: Bringing Engineering
and Cognition Together (ACL), pages 50–57,
Stroudsburg, PA.
Nivre, Joakim. 2008. Algorithms for
deterministic incremental dependency
parsing. Computational Linguistics,
34:513–553.
Nivre, Joakim, Johan Hall, and Jens Nilsson.
2004. Memory-based dependency parsing.
In Proceedings of the 8th Conference on
Computational Natural Language Learning
(CoNLL), pages 49–56, Boston, MA.
Nivre, Joakim, Johan Hall, and Jens Nilsson.
2006. Maltparser: A data-driven
parser-generator for dependency parsing.
In Proceedings of the 5th International
Conference on Language Resources and
Evaluation (LREC), pages 2,216–2,219,
Genoa.
Titov, Ivan and James Henderson. 2007.
A latent variable model for generative
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
c
o
l
i
_
a
_
0
0
1
8
5
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Nivre and Fern´andez-Gonz´alez
Arc-Eager Parsing with the Tree Constraint
dependency parsing. In Proceedings of the
10th International Conference on Parsing
Technologies (IWPT), pages 144–155,
Prague.
Yeh, Alexander. 2000. More accurate tests
for the statistical significance of result
differences. In Proceedings of the 18th
International Conference on Computational
Linguistics (COLING), pages 947–953,
Saarbr ¨uken.
Zhang, Yue and Stephen Clark. 2008.
A tale of two parsers: Investigating
and combining graph-based and
transition-based dependency parsing.
In Proceedings of the Conference on
Empirical Methods in Natural Language
Processing (EMNLP), pages 562–571,
Honolulu, HI.
Zhang, Yue and Joakim Nivre. 2011.
Transition-based parsing with rich
non-local features. In Proceedings of the
49th Annual Meeting of the Association
for Computational Linguistics (ACL),
pages 188–193, Portland, OR.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
c
o
l
i
_
a
_
0
0
1
8
5
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
267
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
0
2
2
5
9
1
8
0
3
2
4
3
/
c
o
l
i
_
a
_
0
0
1
8
5
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
PDF Herunterladen