Automatic Selection of HPSG-Parsed Sentences for
Treebank Construction
Montserrat Marimon∗
Universitat de Barcelona
Núria Bel∗∗
Universitat Pompeu Fabra
Lluís Padró†
Universitat Politècnica de Catalunya
This article presents an ensemble parse approach to detecting and selecting high-quality lin-
guistic analyses output by a hand-crafted HPSG grammar of Spanish implemented in the LKB
System. The approach uses full agreement (i.e., exact syntactic match) along with a MaxEnt parse
selection model and a statistical dependency parser trained on the same data. The ultimate goal
is to develop a hybrid corpus annotation methodology that combines fully automatic annotation
and manual parse selection, in order to make the annotation task more efficient while maintaining
high accuracy and the high degree of consistency necessary for any foreseen uses of a treebank.
1. Introduction
Treebanks constitute a crucial resource for theoretical linguistic investigations as well
as for NLP applications. Therefore, in the past decades, there has been increasing interest in
their construction and both theory-neutral and theory-grounded treebanks have been
developed for a great variety of languages. Descriptions of available annotated corpora
can be found in Abeillé (2003) and in the proceedings from the annual editions of the
International Workshop on Treebanks and Linguistic Theories.
Quantity and quality are two very important objectives when building a treebank,
but speed and low labor costs are also required. In addition, guaranteeing consistency,
that is, that the same phenomena receive the same annotation throughout the corpus,
is crucial for any of the possible uses of the treebank. The first attempts at treebank
projects used manual annotation mainly and devoted many hours of human labor
to their construction. Human annotation is not only slow and expensive, but it also
introduces errors and inconsistencies because of the difficulty and tiring nature of the
∗ Gran Via de les Corts Catalanes 585, 08007-Barcelona. Email: montserrat.marimon@ub.edu.
∗∗ Roc Boronat 138, 08018-Barcelona. Email: nuria.bel@upf.edu.
† Jordi Girona 1-3, 08034-Barcelona. Email: padro@lsi.upc.edu.
Submission received: 16 October 2012; revised submission received: 20 October 2013; accepted for publication:
5 December 2013.
doi:10.1162/COLI_a_00190
© 2014 Association for Computational Linguistics
Computational Linguistics
Volume 40, Number 3
task.1 Therefore, automating parts of the annotation process aims to increase effectiveness,
producing a larger number of high-quality, consistent analyses in less time
and with fewer resources.
This article presents research that attempts to increase the degree of automation in
the annotation process when constructing a large treebank for Spanish (the IULA Span-
ish LSP Treebank) in the framework of the European project METANET4U (Enhancing
the European Linguistic Infrastructure, GA 270893GA).2
The treebank was developed using the following bootstrapping approach, details
of which are presented in Sections 3 and 4:
• First, we annotated the sentences using the DELPH-IN development
  framework, in which the annotation process is effected by manually
  selecting the correct parses from among all the analyses produced by a
  hand-built symbolic grammar.
• Second, when a number of human-validated parsed sentences were
  available, we trained a MaxEnt ranker.
• Third, we trained a dependency parser with the human-validated parsed
  sentences converted to the CoNLL format.
• Fourth, we provided a fully automated chain based on an ensemble
  method that compared the parse delivered by the dependency parser and
  the one delivered by the MaxEnt ranker, and then accepted the
  automatically proposed analysis, but only if both were identical.
• Fifth, sentences rejected by the ensemble were given to human annotators
  for manual disambiguation.
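The fourth and fifth steps can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual tool chain: the function name route_sentences, the representation of an analysis as a list of (head, label) pairs, and the toy parser outputs are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: an analysis is one (head index, relation label) pair per token.
Analysis = List[Tuple[int, str]]


def route_sentences(
    sentences: Iterable[str],
    ranker_parse: Callable[[str], Analysis],
    dependency_parse: Callable[[str], Analysis],
) -> Tuple[List[Tuple[str, Analysis]], List[str]]:
    """Accept a sentence automatically only when both analyses are identical;
    queue every other sentence for manual disambiguation."""
    accepted: List[Tuple[str, Analysis]] = []
    manual_queue: List[str] = []
    for sentence in sentences:
        first_best = ranker_parse(sentence)       # top parse of the MaxEnt ranker
        dependency = dependency_parse(sentence)   # output of the dependency parser
        if first_best == dependency:
            accepted.append((sentence, first_best))
        else:
            manual_queue.append(sentence)
    return accepted, manual_queue


# Toy stand-ins for the two models (hypothetical outputs and labels).
RANKER_OUT = {"s1": [(0, "ROOT"), (1, "OBJ")], "s2": [(2, "SUBJ"), (0, "ROOT")]}
PARSER_OUT = {"s1": [(0, "ROOT"), (1, "OBJ")], "s2": [(2, "OBJ"), (0, "ROOT")]}

accepted, manual_queue = route_sentences(["s1", "s2"], RANKER_OUT.get, PARSER_OUT.get)
```

Here the first toy sentence is accepted automatically, while the second, on which the two models disagree about one label, goes to the annotators.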
Obviously, using fully automatic parsing would have been the best solution for
speed and consistency, but no statistical parsers for Spanish are good enough yet, and
when using symbolic parsers, there is no way to separate good parses from incorrect
ones. The ensemble method we propose is a way of avoiding monitoring automatic
parsing; its error rate is more than acceptable (precision is about 90%; see Section 4),
and recall is expected to improve with re-training and refinement of the different parsers.
After this introduction, Section 2 presents an overview of related work on automatic
parse selection, Section 3 summarizes the set-up, Section 4 presents our experiments and
results and, finally, Section 5 concludes.
2. Related Work
In the broadest sense, this work is situated with respect to research into automatic
parse selection. Such projects have had a variety of different goals as well as different
approaches, based on (i) semantic filtering techniques (Yates, Schoenmackers,
and Etzioni 2006), (ii) sentence-level features (e.g., length; Kawahara and Uchimoto
1 In order to control errors, a common strategy is to control inter-annotator agreement by making two
annotators work on the same sentences. This makes the task even slower and more expensive.
2 The IULA Spanish LSP Treebank contains 43,000 annotated sentences, distributed among different
domains (Law, Economy, Computing Science, Medicine, and Environment) and sentence lengths
(ranging from 4 to 30 words). The treebank is publicly available at http://metashare.upf.edu.
2008), (iii) statistics about PoS sequences in a batch of parsed sentences (Reichart and
Rappoport 2009), and (iv) ensemble parse algorithms (Reichart and Rappoport 2007;
Sagae and Tsujii 2007; Baldridge and Osborne 2003). Here, we focus on ensemble
approaches.
Reichart and Rappoport (2007) selected high-quality constituency parses by
using the level of agreement among 20 copies of the same parser, trained on dif-
ferent subsets of a training corpus. Experiments using training and test data for the
same domain and in the parser-adaptation scenario showed improvements over several
baselines.
Sagae and Tsujii (2007) used an ensemble to select high-quality dependency parses.
They compared the outputs of two statistical shift-reduce LR models and selected only
identical parses, in their case to retrain the MaxEnt model. Following this procedure,
they achieved the highest score in the domain adaptation track of the CoNLL 2007
shared task.
Finally, Baldridge and Osborne (2003) used an ensemble of parsers in the context
of HPSG grammars applied to committee-based active learning, that is, to select
the most informative sentences to be hand-annotated and used as training material
to improve the statistical parser and to minimize the required amount of such sen-
tences. Using the English Resource Grammar (Flickinger 2002) and the Redwoods
treebank (Oepen et al. 2002), they showed that sample selection according to preferred
parse disagreement between two different machine learning algorithms (log-linear
and perceptron), or between the same algorithm trained on two independent feature
sets (configurational and ngram sets, based on the HPSG derivation trees), reduced the
amount of human-annotated material needed to train an HPSG parse selection model
compared with a certainty-based method based on tree entropy and several baseline
selection metrics.
Like Baldridge and Osborne (2003), we investigate ensemble parsing in the context
of HPSG grammars; however, our goal is not to select the most informative
sentences to retrain the parser, but rather to select those sentences most reliably parsed,
in order to enlarge the treebank automatically. Therefore, rather than selecting sentences on
which two models disagree, we select those where they agree completely. In addition,
we present two important contributions, going beyond what has been done in previous
work. First, although parsing ensembles have previously been proposed only for closely
related language models (i.e., parsers that use algorithms under the machine-learning
paradigm, varying only the feature set or training data), the presented work is the
first to combine parsers from different paradigms: stochastic dependency parsing and
MaxEnt parse selection over parses produced by a symbolic grammar. Second, the
current work is the first to propose such a methodology for parse selection as a way of
overcoming the seemingly impossible task of automatically selecting good parses from
automatic parsing to speed treebank production and, more importantly, to meet the
requirements of high precision and high consistency that all uses of
the treebank demand.
3. Set-up
We select high-quality HPSG analyses using full agreement between a MaxEnt parse
selection model and a dependency parser. The comparison between the two is performed
on the dependency structures that we obtain by converting the parse tree produced by a
symbolic grammar to the CoNLL format.
3.1 HPSG Parsing and Disambiguation
Our investigation uses the Deep Linguistic Processing with HPSG Initiative (DELPH-
IN),3 an open-source processing framework also used in several treebank projects
within this international initiative (Oepen et al. 2002; Flickinger et al. 2012). Using this
framework, the annotation process is divided into two parts: (1) the corpus is parsed
using a hand-built HPSG (Pollard and Sag 1994); (2) the grammar output is ranked by
a MaxEnt-based parse ranker (Toutanova et al. 2005), and the best parse is manually
selected.
The grammar applied in parsing is a broad-coverage, open-source Spanish gram-
mar implemented in the Linguistic Knowledge Builder (LKB) System (Copestake 2002),
the Spanish Resource Grammar (SRG) (Marimon 2013).
The manual selection task is performed with an interface provided as part of the
[incr tsdb()] grammar profiling environment (Oepen and Carroll 2000) that allows the
annotator to reduce the set of parses incrementally by choosing so-called discriminants
(Carter 1997); that is, by selecting the features that distinguish between the different
parses, until the appropriate parse is left or, if none of the displayed parses is the correct
one, all parses are rejected.
As is always the case with symbolic grammars, the SRG produces several hundreds
of analyses for a sentence. The DELPH-IN framework, however, provides a MaxEnt-based
ranker that sorts the parses produced by the grammar. Although this stochastic
ranker cannot be used to select automatically the correct parse without introducing a
considerable number of errors (as we will show, it only achieves accuracy of about 61%),
it nevertheless allows the annotator to reduce the forest to the n-best trees, typically the
500 top readings. The statistics that form the model of the MaxEnt ranker are gathered
from disambiguated parses and can be updated as the number of annotated sentences
increases.
3.2 Conversion to the CoNLL Format
The linguistic analysis produced by the LKB system for each parsed sentence provides,
together with a constituent structure and a Minimal Recursion Semantics (MRS) seman-
tic representation (Copestake et al. 2005), a derivation tree, obtained from a complete
syntactico-semantic analysis represented in a parse tree with standard HPSG-typed
feature structures at each node.
The derivation tree is encoded in a nested, parenthesized structure whose ele-
ments correspond to the identifiers of grammatical rules and the lexical items used
in parsing. Phrase structure rules, marked by the suffix ‘_c’ (for ‘construction’),
identify the daughter sequence, separated by a hyphen, and, in headed-phrase
constructions, a basic dependency relation between sentence constituents (e.g.,
subject-head (sb-hd) and head-complement (hd-cmp)). Lexical items are annotated with
part-of-speech information according to the EAGLES tag set for Spanish4 and their
lexical entry identifier, and they optionally include a lexical rule identifier. Figure 1
shows an example.
In order to compare the first-best trees selected by the MaxEnt selection model and
the outputs of the dependency parser, we convert the derivation trees to a dependency
3 http://www.delph-in.net/.
4 See http://www.ilc.cnr.it/EAGLES96/annotate/annotate.html.
Figure 1
Derivation tree and dependency graph of Conceder licencias, cuando así lo dispongan las ordenanzas
[To grant licences, when so stipulated by ordinances].
format, also illustrated in Figure 1. In this target annotation, lexical elements are linked
by asymmetrical dependency relations in which one of the elements is considered
the head of the relation and the other one is its dependant. The conversion is a fully
automatic and unambiguous process that produces the dependency structure in the
CoNLL format (Buchholz and Marsi 2006). A deterministic conversion algorithm
makes use of the identifiers of the phrase structure rules mentioned previously, in
order to identify the heads, dependants, and some dependency types that are directly
transferred onto the dependency structure (e.g., subject, specifier, and modifier). The
identifiers of the lexical entries, which include the syntactic category of the sub-
categorized elements, enable the identification of the argument-related dependency
functions.5
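Once both analyses are available in the CoNLL format, the full-agreement check reduces to comparing the head and dependency-relation columns token by token. The Python sketch below illustrates this under stated assumptions: it relies on the ten tab-separated columns of the CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and the helper names and the relation labels in the toy rows are hypothetical, not the treebank's label set.

```python
from typing import List, Tuple

# (ID, FORM, HEAD, DEPREL) taken from columns 1, 2, 7, and 8 of a CoNLL-X row.
Token = Tuple[int, str, int, str]


def read_conll_sentence(block: str) -> List[Token]:
    """Parse one sentence given as CoNLL-X rows (one token per line)."""
    tokens: List[Token] = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def full_agreement(a: List[Token], b: List[Token]) -> bool:
    """True only if every token has the same head and the same relation label."""
    return len(a) == len(b) and all(x[2:] == y[2:] for x, y in zip(a, b))


def make_row(idx: int, form: str, head: int, deprel: str) -> str:
    """Build a CoNLL-X row, leaving the columns unused here as underscores."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Hypothetical labels, for illustration only.
from_ranker = "\n".join([make_row(1, "Conceder", 0, "ROOT"), make_row(2, "licencias", 1, "DO")])
from_parser = "\n".join([make_row(1, "Conceder", 0, "ROOT"), make_row(2, "licencias", 1, "SUBJ")])

same = full_agreement(read_conll_sentence(from_ranker), read_conll_sentence(from_ranker))
different = full_agreement(read_conll_sentence(from_ranker), read_conll_sentence(from_parser))
```

A single differing label is enough for the sentence to be rejected, which is what makes the criterion strict.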
3.3 Dependency Parsing
For dependency parsing, we use MaltParser (Nivre et al. 2007). To train it, we use
the manually disambiguated parses selected from among those produced by the HPSG grammar,
converted to the dependency format described earlier.
5 An alternative proposal for projecting HPSG trees to CoNLL is described in Ivanova et al. (2012).
Table 1
Results of the MaxEnt model and MaltParser as labeled attachment scores (LAS), unlabeled
attachment scores (UAS), labeled accuracy score, and exact syntactic match.

                LAS      UAS      Label Accur Score   Exact Synt Match
MaxEnt model    95.4%    96.8%    97.6%               61.0%
MaltParser      92.0%    95.0%    94.5%               43.1%
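For readers unfamiliar with the four measures in Table 1, the following Python sketch shows how they are commonly computed from gold and predicted dependency analyses. It uses the standard definitions and assumes that exact syntactic match requires every head and label in a sentence to be correct; the function name and the toy data are hypothetical, and this is not the evaluation script used in the experiments.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[int, str]]  # one (head index, relation label) pair per token


def dependency_scores(gold: List[Sentence], pred: List[Sentence]) -> Dict[str, float]:
    """Token-level LAS, UAS, and label accuracy, plus sentence-level exact match."""
    tokens = las = uas = label_acc = exact = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(g_sent, p_sent):
            tokens += 1
            uas += g_head == p_head                             # head only
            label_acc += g_label == p_label                     # label only
            las += g_head == p_head and g_label == p_label      # head and label
        exact += g_sent == p_sent                               # whole sentence correct
    return {
        "LAS": las / tokens,
        "UAS": uas / tokens,
        "LA": label_acc / tokens,
        "exact_match": exact / len(gold),
    }


# Toy data: the second sentence has one wrong label.
gold = [[(0, "ROOT"), (1, "OBJ")], [(2, "SUBJ"), (0, "ROOT")]]
pred = [[(0, "ROOT"), (1, "OBJ")], [(2, "OBJ"), (0, "ROOT")]]
scores = dependency_scores(gold, pred)
```

The toy case shows why exact match is so much lower than the token-level scores in Table 1: one wrong label costs a quarter of the LAS here but half of the exact-match score.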
4. Experiments and Results
In our experiments, we tested the ability of the ensemble approach to select only correct
parses. The experiment proceeded as follows:
• We divided a set of 15,329 sentences into a training and test set (13,901 and
  1,428 sentences, respectively). Sentence length ranged from 4 to 20 words
  (longer sentences had not been annotated yet).
• We trained the MaxEnt model and MaltParser and ran each of the models
  on the test set. The results we achieved are displayed in Table 1.
• We compared the outputs of the two models and selected those sentences
  where both parsers produced identical analyses.
The performance of our parser ensemble approach was measured through precision
and recall on the task of selecting those sentences for which the first tree proposed by
the MaxEnt model was the correct one. Table 2 shows the confusion matrix resulting
from the experiment. The row predicted ok counts the number of sentences selected
by our ensemble method (the parses delivered by MaltParser and the MaxEnt model are
identical), and the row predicted nok contains the number of sentences not selected
because the parsers disagreed. The gold columns give the manual evaluation of the parse
ranked first by the MaxEnt model. From this table, we can compute the precision and
recall of our sentence selector: 445 sentences were selected out of the 1,428 sentences
in the test set (31.2%). Precision (the number of correctly selected sentences among all
the selected sentences) stood at 90.6% (403/445), and recall (the number of correctly
selected sentences among all the sentences whose first-ranked parse was actually
correct) was 46.6% (403/864).
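The arithmetic behind these figures can be reproduced from the four cells of the confusion matrix. The short Python sketch below uses only the counts reported in Table 2; the variable names are ours, and the F1 value it derives is computed here from those counts rather than quoted from the text.

```python
# Cells of the confusion matrix in Table 2.
tp = 403  # predicted ok,  gold ok:  selected and correct
fp = 42   # predicted ok,  gold nok: selected but incorrect
fn = 461  # predicted nok, gold ok:  correct first parse, but not selected
tn = 522  # predicted nok, gold nok: incorrect first parse, not selected

total = tp + fp + fn + tn          # sentences in the test set
selected = tp + fp                 # sentences accepted by the ensemble

selected_share = selected / total  # proportion of the test set selected
precision = tp / selected          # correct among the selected
recall = tp / (tp + fn)            # selected among the correctly ranked
f1 = 2 * precision * recall / (precision + recall)  # derived here, not reported above

print(f"selected {selected_share:.1%}, P {precision:.1%}, R {recall:.1%}")
# prints: selected 31.2%, P 90.6%, R 46.6%
```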
We compared the results of our ensemble method with two parse selection methods
based on: (i) a simple probability-based threshold (baseline) and (ii) a parser uncertainty
measure computed as tree entropy as used by Baldridge and Osborne (2003). The baseline
consisted of selecting sentences for which the ratio between the probabilities of the two
highest ranked analyses delivered by the MaxEnt model was over a given threshold.
Table 2
Confusion matrix used to assess the results in terms of precision and recall.

                 gold ok    gold nok    total
predicted ok     403        42          445
predicted nok    461        522         983
total            864        564         1,428
The idea was that a very high ratio would indicate that the parse ranked first had a
large advantage over the others, whereas if the ratio was close to 1, both the first and
the second analyses would have similar probabilities, indicating lower confidence of the
model in the decision. Tree entropy takes into account not just the two highest ranked
analyses, but all trees proposed by the parser for that sentence. The rationale is that
high entropy indicates a scattered probability distribution among possible trees (and
thus less certainty of the model in the prediction), whereas low entropy should indicate
that one tree (or a few) gets most of the probability mass.
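Both selection criteria can be written down in a few lines. The Python sketch below is a minimal illustration, assuming that the MaxEnt model exposes one probability per candidate tree; the function names are hypothetical, and the choice of natural logarithm and unnormalized entropy is our assumption, since the text does not state the log base or any normalization.

```python
import math
from typing import Sequence


def top2_ratio(probs: Sequence[float]) -> float:
    """Ratio between the probabilities of the two highest-ranked analyses."""
    ranked = sorted(probs, reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return math.inf  # a single analysis has no competitor
    return ranked[0] / ranked[1]


def tree_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the distribution over all candidate trees."""
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def select_by_ratio(probs: Sequence[float], threshold: float) -> bool:
    """Baseline: keep the sentence if the first parse clearly beats the second."""
    return top2_ratio(probs) > threshold


def select_by_entropy(probs: Sequence[float], threshold: float) -> bool:
    """Tree entropy: keep the sentence if the distribution is concentrated."""
    return tree_entropy(probs) < threshold


confident = [0.9, 0.05, 0.05]       # one tree dominates
uncertain = [0.25, 0.25, 0.25, 0.25]  # probability mass is scattered
```

With these toy distributions, the first sentence passes a ratio threshold of 4.5 and the second does not, because its ratio is exactly 1 and its entropy reaches the maximum for four trees.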
Results for different thresholds (both for the baseline and tree entropy) are shown
in Table 3 (top). As we can see, setting a high threshold for the baseline, we can select a
small subset of 20% of the sentences with precision similar to that achieved by our parse
ensemble approach. To select 31% of the sentences (i.e., about the same proportion we
obtained with the ensemble approach) we need to set a threshold of 4.5, obtaining a
precision of 84%, which is lower than the 90% obtained with the ensemble method.
Tree entropy exhibits similar behavior, in that a restrictive threshold can select
about 15% of sentences with precision over 90%, whereas a threshold that selects
about 31% of sentences yields precision of about 75%.
Note that although the baseline has an F1 score slightly higher than the ensemble,
our goal is a high precision filter that can be utilized to select correctly parsed sentences.
From this point of view, our approach beats both baselines.
The fact that tree entropy yields worse values than the baseline is somewhat
predictable: Given a sentence with n possible trees (note that n may be on the order of
dozens or even hundreds), if a small number m of those analyses (1 < m << n) concen-
trate a large portion of probability mass but exhibit small differences between them, the
sentence will be rejected by the baseline (because there is not enough distance between
the first and second analyses) but will be accepted by tree entropy (because entropy will
be relatively low, given the large value of n). Thus, tree entropy is a good measure for
Baldridge and Osborne (2003), whose purpose is to select sentences where the model
is less confident, but our simple baseline seems to be better when the goal is to select
sentences where the first parse is the correct one.
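A small numeric case makes this argument concrete. In the Python sketch below, the figures are invented for illustration (n = 100 trees, of which m = 2 share 98% of the probability mass equally), and reading "relatively low" as entropy divided by its maximum, log n, is our assumption rather than something stated in the text.

```python
import math

n = 100                                   # candidate trees for the sentence
top = [0.49, 0.49]                        # m = 2 near-identical leading analyses
rest = [0.02 / (n - 2)] * (n - 2)         # the remaining mass, spread thinly
probs = top + rest

ratio = probs[0] / probs[1]               # what the baseline looks at
entropy = -sum(p * math.log(p) for p in probs)
normalized_entropy = entropy / math.log(n)  # relative to the maximum for n trees

# The ratio is 1, so the baseline rejects the sentence for any threshold above 1,
# while the entropy stays below a fifth of its maximum for 100 trees.
```

The baseline thus flags the unresolved choice between the two leading trees, whereas an entropy criterion sees a fairly concentrated distribution and may accept the sentence.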
Table 3
Top: Comparative results using different threshold values for the baselines. Bottom: Results per
sentence length when selecting about 31% over all sentences. Thr = threshold; %sel = percentage
of selected sentences; P = precision; R = recall; Len = sentence length.

Baseline                          Tree entropy                      Ensemble
Thr.    %sel    P       R         Thr.    %sel    P       R         %sel    P       R
2       50.9%   67.6%   70.2%     0.2     59.6%   60.1%   73.2%
3       38.4%   77.5%   60.8%     0.15    38.2%   71.7%   56.0%
4.5     31.0%   84.1%   53.3%     0.133   31.3%   75.7%   48.3%     31.2%   90.6%   46.6%
10      20.8%   91.1%   38.6%     0.1     21.4%   82.0%   35.8%
20      12.1%   97.3%   24.0%     0.075   15.1%   91.5%   28.2%
30      9.9%    98.7%   19.9%     0.05    11.2%   96.6%   22.2%

        Baseline                  Tree entropy              Ensemble
Len.    %sel    P       R         %sel    P       R         %sel    P       R
1-10    56.0%   96.7%   70.6%     42.8%   96.6%   53.9%     43.6%   97.7%   70.4%
11-20   19.8%   68.2%   36.9%     26.1%   60.3%   43.0%     10.3%   83.7%   33.8%
All     31.0%   84.1%   53.3%     31.3%   75.7%   48.3%     31.2%   90.6%   46.6%
As shown in Table 3 (bottom), behavior is different for sentences of up to 10 words
than for longer sentences. All three systems have a bias towards selecting short rather
than long sentences (because short sentences are more often correctly analyzed by the
parser). The results for short sentences are similar in all three cases, but the ensemble
approach is clearly more precise for long sentences, with only a moderate loss in recall.
5. Conclusion
We have described research that aims to increase the degree of automation when
building annotated corpora. We propose a parser ensemble approach based on full
agreement between a MaxEnt model and a dependency parser to select correct linguistic
analyses output by an HPSG grammar. This enables a hybrid annotation methodology
that combines fully automatic annotation and manual parse selection, which makes the
annotation task more efficient while maintaining high accuracy and the high degree of
consistency necessary for a useful treebank. Our approach is grammar-independent and
can be used by any DELPH-IN-style treebank. In the future, we plan to investigate the
impact of automatic treebank enlargement on the performance of statistical parsers.
Acknowledgments
This work was supported by grant Ramón y
Cajal from Spanish MICINN and the project
METANET4U. We thank the reviewers for
their comments and Carlos Morell for his
support.
References
Abeillé, Anne (editor). 2003. Treebanks:
Building and Using Parsed Corpora. Kluwer,
Amsterdam.
Baldridge, Jason and Miles Osborne.
2003. Active learning for HPSG
parse selection. In Proceedings of the
7th Conference on Computational Natural
Language Learning, pages 17–24,
Edmonton.
Buchholz, Sabine and Erwin Marsi. 2006.
CoNLL-X shared task on multilingual
dependency parsing. In Proceedings of the
10th Conference on Computational Natural
Language Learning, pages 149–164,
New York, NY.
Carter, David. 1997. The TreeBanker: A tool
for supervised training of parsed corpora.
In Proceedings of the 14th National Conference
on Artificial Intelligence, pages 598–603,
Providence, RI.
Copestake, Ann. 2002. Implementing Typed
Feature Structure Grammars. CSLI
Publications, Stanford, CA.
Copestake, Ann, Dan Flickinger, Carl
Pollard, and Ivan A. Sag. 2005. Minimal
recursion semantics: An introduction.
Research on Language and Computation,
3(4):281–332.
Flickinger, Dan. 2002. On building a more
efficient grammar by exploiting types.
In Natural Language Engineering (6)1—
Special Issue: Efficiency Processing with
HPSG: Methods, Systems, Evaluation,
16(1):1–17.
Flickinger, Dan, Valia Kordoni, Yi Zhang,
António Branco, Kiril Simov, Petya
Osenova, Catarina Carvalheiro, Francisco
Costa, and Sérgio Castro. 2012.
ParDeepBank: Multiple parallel deep
treebanking. In Proceedings of the 11th
Workshop on Treebanks and Linguistic
Theories, pages 97–108, Lisbon.
Ivanova, Angelina, Stephan Oepen, Lilja
Ovrelid, and Dan Flickinger. 2012.
Who did what to whom? A contrastive
study of syntacto-semantic dependencies.
In Proceedings of the 6th Linguistic
Annotation Workshop, pages 2–11,
Jeju Island.
Kawahara, Daisuke and Kiyotaka
Uchimoto. 2008. Learning reliability
of parses for domain adaptation of
dependency parsing. In Proceedings of the
3rd International Joint Conference on Natural
Language Processing, pages 709–714,
Hyderabad.
Marimon, Montserrat. 2013. The Spanish
DELPH-IN grammar. Language Resources
and Evaluation, 47(2):371–397.
Nivre, Joakim, Johan Hall, Jens Nilsson,
Atanas Chanev, Gülsen Eryigit, Sandra
Kübler, Svetoslav Marinov, and Erwin
Marsi. 2007. MaltParser: A language-independent
system for data-driven
dependency parsing. Natural Language
Engineering, 13(2):95–135.
Oepen, Stephan and John Carroll. 2000.
Performance profiling for parser
engineering. In Natural Language
Engineering (6)1—Special Issue: Efficiency
Processing with HPSG: Methods, Systems,
and Evaluation, 16(1):81–97.
Oepen, Stephan, Dan Flickinger,
K. Toutanova, and C. D. Manning. 2002.
LinGo Redwoods. A rich and dynamic
treebank for HPSG. In Proceedings
of the 1st Workshop on Treebanks and
Linguistic Theories, pages 139–149,
Sozopol.
Pollard, Carl and Ivan A. Sag. 1994.
Head-driven Phrase Structure Grammar.
The University of Chicago Press and
CSLI Publications, Chicago.
Reichart, Roi and Ari Rappoport. 2007.
An ensemble method for selection of
high quality parses. In Proceedings of the
45th Annual Meeting of the Association for
Computational Linguistics, pages 408–415,
Prague.
Reichart, Roi and Ari Rappoport. 2009.
Automatic selection of high quality
parses created by a fully unsupervised
parser. In Proceedings of the 13th
Conference on Computational Natural
Language Learning, pages 156–164,
Boulder, CO.
Sagae, Kenji and Jun-Ichi Tsujii. 2007.
Dependency parsing and domain
adaptation with LR models and parser
ensembles. In Proceedings of the Joint
Meeting of the Conference on Empirical
Methods in Natural Language Processing and
Conference on Computational Natural
Language Learning, pages 1,044–1,050,
Prague.
Toutanova, Kristina, Christopher D. Manning,
Dan Flickinger, and Stephan Oepen. 2005.
Stochastic HPSG parse disambiguation
using the Redwoods corpus. Research on
Language and Computation, 3(1):83–105.
Yates, Alexander, Stefan Schoenmackers,
and Oren Etzioni. 2006. Detecting parser
errors using Web-based semantic filters.
In Proceedings of the 11th Conference of
Empirical Methods in Natural Language
Processing, pages 27–34, Sydney.