A Machine Learning Approach to
Modeling Scope Preferences

Derrick Higgins∗
University of Chicago

Jerrold M. Sadock†
University of Chicago

This article describes a corpus-based investigation of quantifier scope preferences. Following
recent work on multimodular grammar frameworks in theoretical linguistics and a long history
of combining multiple information sources in natural language processing, scope is treated as a
distinct module of grammar from syntax. This module incorporates multiple sources of evidence
regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments
discussed in this article evaluate the performance of our models in predicting the most likely scope
reading for a particular sentence, using Penn Treebank data both with and without syntactic
annotation. We wish to focus attention on the issue of determining scope preferences, which has
largely been ignored in theoretical linguistics, and to explore different models of the interaction
between syntax and quantifier scope.

1. Overview

This article addresses the issue of determining the most accessible quantifier scope
reading for a sentence. Quantifiers are elements of natural and logical languages (such
as each, no, and some in English and ∀ and ∃ in predicate calculus) that have certain
semantic properties. Loosely speaking, they express that a proposition holds for some
proportion of a set of individuals. One peculiarity of these expressions is that there
can be semantic differences that depend on the order in which the quantifiers are
interpreted. These are known as scope differences.

(1) Everyone likes two songs on this album.

As an example of the sort of interpretive differences we are talking about, consider
the sentence in (1). There are two readings of this sentence; which reading is meant
depends on which of the two quantified expressions everyone and two songs on this
album takes wide scope. The first reading, in which everyone takes wide scope, simply
implies that every person has a certain preference, not necessarily related to anyone
else’s. This reading can be paraphrased as “Pick any person, and that person will like
two songs on this album.” The second reading, in which everyone takes narrow scope,
implies that there are two specific songs on the album of which everyone is fond, say,
“Blue Moon” and “My Way.”

In theoretical linguistics, attention has been primarily focused on the issue of scope
generation. Researchers applying the techniques of quantifier raising and Cooper stor-
age have been concerned mainly with enumerating all of the scope readings for a

∗ Department of Linguistics, University of Chicago, 1010 East 59th Street, Chicago, IL 60637. Email:
dchiggin@alumni.uchicago.edu.

† Department of Linguistics, University of Chicago, 1010 East 59th Street, Chicago, IL 60637. Email:
j-sadock@uchicago.edu.

© 2003 Association for Computational Linguistics


sentence that are possible, without regard to their relative likelihood or naturalness.
Recently, however, linguists such as Kuno, Takami, and Wu (1999) have begun to turn
their attention to scope prediction, or determining the relative accessibility of different
scope readings.

In computational linguistics, more attention has been paid to the factors that de-
termine scope preferences. Systems such as the SRI Core Language Engine (Moran
1988; Moran and Pereira 1992), LUNAR (Woods 1986), and TEAM (Martin, Appelt,
and Pereira 1986) have employed scope critics that use heuristics to decide between
alternative scopings. However, the rules that these systems use in making quantifier
scope decisions are motivated only by the researchers’ intuitions, and no empirical
results have been published regarding their accuracy.

In this article, we use the tools of machine learning to construct a data-driven
model of quantifier scope preferences. For theoretical linguistics, this model serves as
an illustration that Kuno, Takami, and Wu’s approach can capture some of the clear-
est generalizations about quantifier scoping. For computational linguistics, this article
provides a baseline result on the task of scope prediction, with which other scope
critics can be compared. In addition, it is the most extensive empirical investigation
of which we are aware that collects data of any kind regarding the relative frequency
of different quantifier scope readings in English text.1

Section 2 briefly discusses treatments of scoping issues in theoretical linguistics,
and Section 3 reviews the computational work that has been done on natural language
quantifier scope. In Section 4 we introduce the models that we use to predict quantifier
scoping, as well as the data on which they are trained and tested. Section 5 combines
the scope model of the previous section with a probabilistic context-free grammar
(PCFG) model of syntax and addresses the issue of whether these two modules of
grammar ought to be combined in serial, with information from the syntax feeding the
quantifier scope module, or in parallel, with each module constraining the structures
provided by the other.

2. Approaches to Quantifier Scope in Theoretical Linguistics

Most, if not all, linguistic treatments of quantifier scope have closely integrated it with
the way in which the syntactic structure of a sentence is built up. Montague (1973) used
a syntactic rule to introduce a quantified expression into a derivation at the point where
it was to take scope, whereas generative semantic analyses such as McCawley (1998)
represented the scope of quantification at deep structure, transformationally lowering
quantifiers into their surface positions during the course of the derivation. More recent
work in the interpretive paradigm takes the opposite approach, extracting quantifiers
from their surface positions to their scope positions by means of a quantifier-raising
(QR) transformation (May 1985; Aoun and Li 1993; Hornstein 1995). Another popular
technique is to percolate scope information up through the syntactic tree using Cooper
storage (Cooper 1983; Hobbs and Shieber 1987; Pollard 1989; Nerbonne 1993; Park 1995;
Pollard and Yoo 1998).

The QR approach to dealing with scope in linguistics consists in the claim that
there is a covert transformation applying to syntactic structures that moves quantified
elements out of the position in which they are found on the surface and raises them to
a higher position that reflects their scope. The various incarnations of the strategy that

1 See Carden (1976), however, for a questionnaire-based approach to gathering data on the accessibility
of different quantifier scope readings.


Figure 1
Simple illustration of the QR approach to quantifier scope generation.

follows from this claim differ in the precise characterization of this QR transforma-
tion, what conditions are placed upon it, and what tree-configurational relationship
is required for one operator to take scope over another. The general idea of QR is
represented in Figure 1, a schematic analysis of the reading of the sentence Someone
saw everyone in which someone takes wide scope (i.e., ‘there is some person x such that
for all persons y, x saw y’).

In the Cooper storage approach, quantifiers are gathered into a store and passed
upward through a syntactic tree. At certain nodes along the way, quantifiers may be
retrieved from the store and take scope. The relative scope of quantifiers is determined
by where each quantifier is retrieved from the store, with quantifiers higher in the tree
taking wide scope over lower ones. As with QR, different authors implement this
scheme in slightly different ways, but the simplest case is represented in Figure 2, the
Cooper storage analog of Figure 1.

These structural approaches, QR and Cooper storage, have in common that they
allow syntactic factors to have an effect only on the scope readings that are available for
a given sentence. They are also similar in addressing only the issue of scope generation,
or identifying all and only the accessible readings for each sentence. That is to say,
they do not address the issue of the relative salience of these readings.

Kuno, Takami, and Wu (1999, 2001) propose to model the scope of quantified
elements with a set of interacting expert systems that basically consists of a weighted
vote taken of the various factors that may influence scope readings. This model is
meant to account not only for scope generation, but also for “the relative strengths of
the potential scope interpretations of a given sentence” (1999, page 63). They illustrate
the plausibility of this approach in their paper by presenting a number of examples
that are accounted for fairly well by the approach even when an unweighted vote of
the factors is allowed to be taken.

Thus, for example, in Kuno, Takami, and Wu’s (1999) example (49b), repeated here as (2), the
correct prediction is made: that the sentence is unambiguous with the first quantified
noun phrase (NP) taking wide scope over the second (the reading in which we don’t
all have to hate the same people). Table 1 illustrates how the votes of each of Kuno,
Takami, and Wu’s “experts” contribute to this outcome. Since the expression many of
us/you receives more votes, and the numbers for the two competing quantified expres-
sions are quite far apart, the first one is predicted to take wide scope unambiguously.

(2) Many of us/you hate some of them.


Figure 2
Simple illustration of the Cooper storage approach to quantifier scope generation.

Table 1
Voting to determine optimal scope readings for quantifiers, according to Kuno, Takami, and
Wu (1999).

                      many of us/you    some of them
Baseline:                   1                1
Subject Q:                  1
Lefthand Q:                 1
Speaker/Hearer Q:           1
Total:                      4                1

Some adherents of the structural approaches also seem to acknowledge the ne-
cessity of eventually coming to terms with the factors that play a role in determining
scope preferences in language. Aoun and Li (2000) claim that the lexical scope pref-
erences of quantifiers “are not ruled out under a structural account” (page 140). It is
clear from the surrounding discussion, though, that they intend such lexical require-
ments to be taken care of in some nonsyntactic component of grammar. Although
Kuno, Takami, and Wu’s dialogue with Aoun and Li in Language has been portrayed
by both sides as a debate over the correct way of modeling quantifier scope, they are
not really modeling the same things. Whereas Aoun and Li (1993) provide an account
of scope generation, Kuno, Takami, and Wu (1999) intend to model both scope gen-
eration and scope prediction. The model of scope preferences provided in this article
is an empirically based refinement of the approach taken by Kuno, Takami, and Wu,
but in principle it is consistent with a structural account of scope generation.


3. Approaches to Quantifier Scope in Computational Linguistics

Many studies, such as Pereira (1990) and Park (1995), have dealt with the issue of
scope generation from a computational perspective. Attempts have also been made
in computational work to extend a pure Cooper storage approach to handle scope
prediction. Hobbs and Shieber (1987) discuss the possibility of incorporating some sort
of ordering heuristics into the SRI scope generation system, in the hopes of producing
a ranked list of possible scope readings, but ultimately are forced to acknowledge that
„[T]he modifications turn out to be quite complicated if we wish to order quantifiers
according to lexical heuristics, such as having each out-scope some. Because of the
recursive nature of the algorithm, there are limits to the amount of ordering that can
be done in this manner” (page 55). The stepwise nature of these scope mechanisms
makes it hard to state the factors that influence the preference for one quantifier to
take scope over another.

Those natural language processing (NLP) systems that have managed to provide
some sort of account of quantifier scope preferences have done so by using a separate
system of heuristics (or scope critics) that apply postsyntactically to determine the most
likely scoping. LUNAR (Woods 1986), TEAM (Martin, Appelt, and Pereira 1986), and
the SRI Core Language Engine as described by Moran (1988; Moran and Pereira 1992)
all employ scope rules of this sort. By and large, these rules are of an ad hoc nature,
implementing a linguist’s intuitive idea of what factors determine scope possibilities,
and no results have been published regarding the accuracy of these methods. For
example, Moran (1988) incorporates rules from other NLP systems and from VanLehn
(1978), such as a preference for a logically weaker interpretation, the tendency for each
to take wide scope, and a ban on raising a quantifier across multiple major clause
boundaries. The testing of Moran’s system is “limited to checking conformance to
the stated rules” (pages 40–41). In addition, these systems are generally incapable of
handling unrestricted text such as that found in the Wall Street Journal corpus in a
robust way, because they need to do a full semantic analysis of a sentence in order
to make scope predictions. The statistical basis of the model presented in this article
offers increased robustness and the possibility of more serious evaluation on the basis
of corpus data.

4. Modeling Quantifier Scope

In this section, we argue for an empirically driven machine learning approach to
the identification of factors relevant to quantifier scope and the modeling of scope
preferences. Following much recent work that applies the tools of machine learning to
linguistic problems (Brill 1995; Pedersen 2000; van Halteren, Zavrel, and Daelemans
2001; Soon, Ng, and Lim 2001), we will treat the prediction of quantifier scope as
an example of a classification task. Our aim is to provide a robust model of scope
prediction based on Kuno, Takami, and Wu’s theoretical foundation and to address
the serious lack of empirical results regarding quantifier scope in computational work.
We describe here the modeling tools borrowed from the field of artificial intelligence
for the scope prediction task and the data from which the generalizations are to be
learned. Finally, we present the results of training different incarnations of our scope
module on the data and assess the implications of this exercise for theoretical and
computational linguistics.


4.1 Classification in Machine Learning
Determining which among multiple quantifiers in a sentence takes wide scope, given
a number of different sources of evidence, is an example of what is known in machine
learning as a classification task (Mitchell 1996). There are many types of classifiers
that may be applied to this task that both are more sophisticated than the approach
suggested by Kuno, Takami, and Wu and have a more solid probabilistic foundation.
These include the naive Bayes classifier (Manning and Schütze 1999; Jurafsky and
Martin 2000), maximum-entropy models (Berger, Della Pietra, and Della Pietra 1996;
Ratnaparkhi 1997), and the single-layer perceptron (Bishop 1995). We employ these
classifier models here primarily because of their straightforward probabilistic inter-
pretation and their similarity to the scope model of Kuno, Takami, and Wu (since they
each could be said to implement a kind of weighted voting of factors). In Section 4.3,
we describe how classifiers of these types can be constructed to serve as a grammatical
module responsible for quantifier scope determination.

All of these classifiers can be trained in a supervised manner. That is, given a sam-
ple of training data that provides all of the information that is deemed to be relevant
to quantifier scope and the actual scope reading assigned to a sentence, these classi-
fiers will attempt to extract generalizations that can be fruitfully applied in classifying
as-yet-unseen examples.

4.2 Data
The data on which the quantifier scope classifiers are trained and tested is an extract
from the Penn Treebank (Marcus, Santorini, and Marcinkiewicz 1993) that we have
tagged to indicate the most salient scope interpretation of each sentence in context.
Figure 3 shows an example of a training sentence with the scope reading indicated.
The quantifier lower in the tree bears the tag “Q1,” and the higher quantifier bears the
tag “Q2,” so this sentence is interpreted such that the lower quantifier has wide scope.
Reversing the tags would have meant that the higher quantifier takes wide scope,
while if both quantifiers had been marked “Q1,” this would have indicated that there
is no scope interaction between them (as when they are logically independent or take
scope in different conjuncts of a conjoined phrase).2

The sentences tagged were chosen from the Wall Street Journal (WSJ) section of
the Penn Treebank to have a certain set of attributes that simplify the task of design-
ing the quantifier scope module of the grammar. First, in order to simplify the coding
procedure, each sentence has exactly two scope-taking elements of the sort considered
for this project.3 These include most NPs that begin with a determiner, predeterminer,
or quantifier phrase (QP)4 but exclude NPs in which the determiner is a, an, or the. Ex-

2 This “no interaction” class is a sort of “elsewhere” category that results from phrasing the classification
question as “Which quantifier takes wider scope in the preferred reading?” Where there is no scope
interaction, the answer is “neither.” This includes cases in which the relative scope of operators does
not correspond to a difference in meaning, as in One woman bought one horse, or when they take scope
in different propositional domains, such as in Mary bought two horses and sold three sheep. The human
coders used in this study were instructed to choose class 0 whenever there was not a clear preference
for one of the two scope readings.

3 This restriction that each sentence contain only two quantified elements does not actually exclude

many sentences from consideration. We identified only 61 sentences with three quantifiers of the sort
we consider and 12 sentences with four. In addition, our review of these sentences revealed that many
of them simply involve lists in which the quantifiers do not interact in terms of scope (as in, for
example, “We ask that you turn off all cell phones, extinguish all cigarettes, and open any candy before
the performance begins”). Thus, the class of sentences with more than two quantifiers is small and
seems to involve even simpler quantifier interactions than those found in our corpus.

4 These categories are intended to be understood as they are used in the tagging and parsing of the Penn
Treebank. See Santorini (1990) and Bies et al. (1995) for details; the Appendix lists selected codes used
for annotating the Penn Treebank corpus. The category QP is particularly unintuitive in that it does not
correspond to a quantified noun phrase, but to a measure expression, such as more than half.


( (S

(NP-SBJ

(NP (DT Those) )
(SBAR

(WHNP-1 (WP who) )
(S

(NP-SBJ-2 (-NONE- *T*-1) )
(ADVP (RB still) )
(VP (VBP want)

(S

(NP-SBJ (-NONE- *-2) )
(VP (TO to)

(VP (VB do)

(NP (PRP it) ))))))))

(`` ``)
(VP (MD will)

(ADVP (RB just) )
(VP (VB find)

(NP

(NP (DT-Q2 some) (NN way) )
(SBAR

(WHADVP-3 (-NONE- 0) )
(S

(NP-SBJ (-NONE- *) )
(VP (TO to)

(VP (VB get)

(PP (IN around) ('' '')

(NP (DT-Q1 any) (NN attempt)

(S

(NP-SBJ (-NONE- *) )
(VP (TO to)

(VP (VB curb)

(NP (PRP it) ))))))

(ADVP-MNR (-NONE- *T*-3) ))))))))

(. .) ))

Figure 3
Tagged Wall Street Journal text from the Penn Treebank. The lower quantifier takes wide
scope, indicated by its tag “Q1.”

cluding these determiners from consideration largely avoids the problem of generics
and the complexities of assigning scope readings to definite descriptions. In addi-
tion, only sentences that had the root node S were considered. This serves to exclude
sentence fragments and interrogative sentence types. Our data set therefore differs
systematically from the full WSJ corpus, but we believe it is sufficient to allow many
generalizations about English quantification to be induced. Given these restrictions on
the input data, the task of the scope classifier is a choice among three alternatives:5

(Class 0) There is no scopal interaction.

(Class 1) The first quantifier takes wide scope.

(Class 2) The second quantifier takes wide scope.


5 Some linguists may find it strange that we have chosen to treat the choice of preferred scoping for two
quantified elements as a tripartite decision, since the possibility of independence is seldom treated in
the linguistic literature. As we are dealing with corpus data in this experiment, we cannot afford to
ignore this possibility.


The result is a set of 893 sentences,6 annotated with Penn Treebank II parse trees and
hand-tagged for the primary scope reading.

To assess the reliability of the hand-tagged data used in this project, the data were
coded a second time by an independent coder, in addition to the reference coding.
The independent codings agreed with the reference coding on 76.3% of sentences. The
kappa statistic (Cohen 1960) for agreement was .52, with a 95% confidence interval
between .40 and .64. Krippendorff (1980) has been widely cited as advocating the
view that kappa values greater than .8 should be taken as indicating good reliability,
with values between .67 and .8 indicating tentative reliability, but we are satisfied
with the level of intercoder agreement on this task. As Carletta (1996) notes, many
tasks in computational linguistics are simply more difficult than the content analysis
classifications addressed by Krippendorff, and according to Fleiss (1981), kappa values
between .4 and .75 indicate fair to good agreement anyhow.
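
For concreteness, the agreement figures above can be reproduced mechanically. The
following minimal Python sketch is our illustration, not part of the original study;
the two toy coding lists are invented placeholders over the three scope classes:

    from collections import Counter

    def cohens_kappa(coder_a, coder_b):
        # Cohen's (1960) kappa: (observed - expected agreement) / (1 - expected).
        n = len(coder_a)
        p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
        freq_a, freq_b = Counter(coder_a), Counter(coder_b)
        labels = set(coder_a) | set(coder_b)
        p_e = sum(freq_a[c] * freq_b[c] for c in labels) / n ** 2
        return (p_o - p_e) / (1 - p_e)

    # Invented toy codings (0 = no interaction, 1 = first wide, 2 = second wide):
    ref   = [0, 0, 1, 2, 0, 1, 0, 2, 0, 0]
    indep = [0, 1, 1, 2, 0, 0, 0, 2, 0, 1]
    print(cohens_kappa(ref, indep))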

Discussion between the coders revealed that there was no single cause for their dif-
ferences in judgments when such differences existed. Many cases of disagreement stem
from different assumptions regarding the lexical quantifiers involved. For example, the
coders sometimes differed on whether a given instance of the word any corresponds
to a narrow-scope existential, as we conventionally treat it when it is in the scope of
negation, or the “free-choice” version of any. To take another example, two universal
quantifiers are independent in predicate calculus (∀x∀y[φ] ⇐⇒ ∀y∀x[φ]), but in creat-
ing our scope-tagged corpus, it was often difficult to decide whether two universal-like
English quantifiers (such as each, any, every, and all) were actually independent in a
given sentence. Some differences in coding stemmed from coder disagreements about
whether a quantifier within a fixed expression (e.g., all the hoopla) truly interacts with
other operators in the sentence. Of course, another major factor contributing to inter-
coder variation is the fact that our data sentences, taken from Wall Street Journal text,
are sometimes quite long and complex in structure, involving multiple scope-taking
operators in addition to the quantified NPs. In such cases, the coders sometimes had
difficulty clearly distinguishing the readings in question.

Because of the relatively small amount of data we had, we used the technique of
tenfold cross-validation in evaluating our classifiers, in each case choosing 89 of the
893 total data sentences from the data as a test set and training on the remaining 804.
We preprocessed the data in order to extract the information from each sentence that
we would be treating as relevant to the prediction of quantifier scoping in this project.
(Although the initial coding of the preferred scope reading for each sentence was done
manually, this preprocessing of the data was done automatically.) At the end of this
preprocessing, each sentence was represented as a record containing the following
Information (see the Appendix for a list of annotation codes for Penn Treebank):

• the syntactic category, according to Penn Treebank conventions, of the
  first quantifier (e.g., DT for each, NN for everyone, or QP for more than half )

• the first quantifier as a lexical item (e.g., each or everyone). For a QP
  consisting of multiple words, this field contains the head word, or “CD”
  in case the head is a cardinal number.

• the syntactic category of the second quantifier

• the second quantifier as a lexical item

6 These data have been made publicly available to all licensees of the Penn Treebank by means of a
patch file that may be retrieved from ⟨http://humanities.uchicago.edu/linguistics/students/dchiggin/
qscope-data.tgz⟩. This file also includes the coding guidelines used for this project.


• the syntactic category of the lowest node dominating both quantified
  NPs (the “join” node)

• whether the first quantified NP c-commands the second

• whether the second quantified NP c-commands the first

• the number of nodes intervening7 between the two quantified NPs

• a list of the different categories of nodes that intervene between the
  quantified NPs (that is, for each nonterminal category, there is a distinct
  binary feature indicating whether a node of that category intervenes)

• whether a conjoined node intervenes between the quantified NPs

• a list of the punctuation types that are immediately dominated by nodes
  intervening between the two NPs (again, for each punctuation tag in the
  treebank there is a distinct binary feature indicating whether such
  punctuation intervenes)

Figure 4 illustrates how these features would be used to encode the example in Fig-
ure 3.

    class:               2
    first cat:           DT
    first head:          some
    second cat:          DT
    second head:         any
    join cat:            NP
    first c-commands:    YES
    second c-commands:   NO
    nodes intervening:   6
    VP intervenes:       YES
    ADVP intervenes:     NO
    ...
    S intervenes:        YES
    conj intervenes:     NO
    , intervenes:        NO
    : intervenes:        NO
    ...
    '' intervenes:       YES

Figure 4
Example record corresponding to the sentence shown in Figure 3.

The items of information included in the record, as listed above, are not the exact
factors that Kuno, Takami, and Wu (1999) suggest be taken into consideration in mak-
ing scope predictions, and they are certainly not sufficient to determine the proper
scope reading for all sentences completely. Surely pragmatic factors and real-world
knowledge influence our interpretations as well, although these are not represented
here. This list does, however, provide information that could potentially be useful in
predicting the best scope reading for a particular sentence. For example, information

7 We take a node α to intervene between two other nodes β and γ in a tree if and only if δ is the lowest
node dominating both β and γ, δ dominates α or δ = α, and α dominates either β or γ.



about whether one quantified NP in a given sentence c-commands the other corre-
sponds to Kuno, Takami, and Wu’s observation that subject quantifiers tend to take
wide scope over object quantifiers and topicalized quantifiers tend to outscope ev-
erything. The identity of each lexical quantifier clearly should allow our classifiers to
make the generalization that each tends to take wide scope, if this word is found in
the data, and perhaps even learn the regularity underlying Kuno, Takami, and Wu’s
observation that universal quantifiers tend to outscope existentials.
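
As an illustration of how such structural features might be computed, the sketch
below is our code, not the authors’ preprocessing scripts: it implements the
intervening-node definition from note 7 over trees encoded as nested lists
[label, child, ...], and the toy tree is invented.

    def path_to(tree, target):
        # Root-down list of subtrees ending at `target` (identity comparison).
        if tree is target:
            return [tree]
        if isinstance(tree, list):
            for child in tree[1:]:
                sub = path_to(child, target)
                if sub is not None:
                    return [tree] + sub
        return None

    def intervening(tree, beta, gamma):
        # Per note 7: delta is the lowest node dominating both beta and gamma;
        # an intervening node is delta itself or any node below delta that
        # dominates beta or gamma (the quantified NPs themselves are excluded).
        p1, p2 = path_to(tree, beta), path_to(tree, gamma)
        i = 0
        while i < min(len(p1), len(p2)) and p1[i] is p2[i]:
            i += 1                       # p1[i - 1] is now delta
        return p1[i - 1:-1] + p2[i:-1]

    np1 = ["NP", ["DT", "some"], ["NN", "way"]]
    np2 = ["NP", ["DT", "any"], ["NN", "attempt"]]
    toy = ["S", ["NP", "everyone"],
           ["VP", ["V", "found"], np1, ["PP", ["IN", "around"], np2]]]
    print([n[0] for n in intervening(toy, np1, np2)])   # ['VP', 'PP']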

4.3 Classifier Design
In this section, we present the three types of model that we have trained to predict
the preferred quantifier scoping on Penn Treebank sentences: a naive Bayes classifier,
a maximum-entropy classifier, and a single-layer perceptron.8 In evaluating how well
these models do in assigning the proper scope reading to each test sentence, it is im-
portant to have a baseline for comparison. The baseline model for this task is one that
simply guesses the most frequent category of the data (“no scope interaction”) every
time. This simplistic strategy already classifies 61.2% of the test examples correctly, as
shown in Table 2.

Table 2
Baseline performance, summed over all ten test sets.

Condition                Correct   Incorrect   Percentage correct
First has wide scope        0         281       0/281 = 0.0%
Second has wide scope       0          64       0/64 = 0.0%
No scope interaction      545           0       545/545 = 100.0%
Total                     545         345       545/890 = 61.2%
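
Concretely, this baseline amounts to a one-line classifier; a minimal Python sketch
(ours, for illustration only):

    from collections import Counter

    def majority_baseline(training_classes):
        # Always predict the most frequent training class
        # ("no scope interaction" in our data).
        guess = Counter(training_classes).most_common(1)[0][0]
        return lambda record: guess

    classify = majority_baseline([0, 0, 1, 2, 0])
    print(classify({"any": "record"}))   # -> 0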

It may surprise some linguists that this third class of sentences in which there is
no scopal interaction between the two quantifiers is the largest. In part, this may be
due to special features of the Wall Street Journal text of which the corpus consists. For
example, newspaper articles may contain more direct quotations than other genres. In
the process of tagging the data, however, it was also apparent that in a large proportion
of cases, the two quantifiers were taking scope in different conjuncts of a conjoined
phrase. This further tendency supports the idea that people may intentionally avoid
constructions in which there is even the possibility of quantifier scope interactions,
perhaps because of some hearer-oriented pragmatic principle. Linguists may also be
concerned that this additional category in which there is no scope interaction between
quantifiers makes it difficult to compare the results of the present work with theoretical
accounts of quantifier scope that ignore this case and concentrate on instances in which
one quantifier does take scope over another. In response to such concerns, however,
we point out first that we provide a model of scope prediction rather than scope
generation, and so it is in any case not directly comparable with work in theoretical
linguistics, which has largely ignored scope preferences. Second, we point out that
the empirical nature of this study requires that we take note of cases in which the
quantifiers simply do not interact.

8 The implementations of these classifiers are publicly available as Perl modules at
⟨http://humanities.uchicago.edu/linguistics/students/dchiggin/classifiers.tgz⟩.



4.3.1 Naive Bayes Classifier. Our data D will consist of a vector of features (d0 · · · dn)
that represent aspects of the sentence under examination, such as whether one quan-
tified expression c-commands the other, as described in Section 4.2. The fundamental
simplifying assumption that we make in designing a naive Bayes classifier is that
these features are independent of one another and therefore can be aggregated as in-
dependent sources of evidence about which class c∗ a given sentence belongs to. This
independence assumption is formalized in equations (1) and (2).

c∗ = arg max_c P(c) P(d0 · · · dn | c)                                   (1)

   ≈ arg max_c P(c) ∏_{k=0}^{n} P(dk | c)                                (2)

We constructed an empirical estimate of the prior probability P(c) by simply count-
ing the frequency with which each class occurs in the training data. We constructed
each P(dk | c) by counting how often each feature dk co-occurs with the class c to
construct the empirical estimate P̂(dk | c) and interpolated this with the empirical
frequency P̂(dk) of the feature dk, not conditioned on the class c. This interpolated
probability model was used in order to smooth the probability distribution, avoiding
the problems that can arise if certain feature-value pairs are assigned a probability of
zero.
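
A minimal sketch of such an interpolated naive Bayes classifier follows. This is our
illustration, not the authors’ released Perl code; the interpolation weight lam and the
toy records are assumed values, as the article does not report them.

    from collections import Counter, defaultdict

    class NaiveBayes:
        # Records are {feature: value} dicts; classes are 0, 1, or 2.
        def __init__(self, lam=0.9):
            self.lam = lam                       # assumed interpolation weight

        def train(self, records, classes):
            n = len(records)
            class_n = Counter(classes)
            self.prior = {c: k / n for c, k in class_n.items()}
            joint = defaultdict(Counter)         # (feature, value) counts per class
            marginal = Counter()                 # (feature, value) counts overall
            for rec, c in zip(records, classes):
                for fv in rec.items():
                    joint[c][fv] += 1
                    marginal[fv] += 1
            # Interpolated estimate lam * P^(fv | c) + (1 - lam) * P^(fv),
            # which keeps P(fv | c) nonzero for any pair seen in training.
            self.cond = {c: {fv: self.lam * joint[c][fv] / class_n[c]
                                 + (1 - self.lam) * marginal[fv] / n
                             for fv in marginal}
                         for c in class_n}

        def classify(self, rec):
            def score(c):
                p = self.prior[c]
                for fv in rec.items():
                    p *= self.cond[c].get(fv, 1e-9)   # tiny floor for unseen pairs
                return p
            return max(self.prior, key=score)

    # Invented toy usage:
    data = [({"first_c_commands": "YES", "conj_intervenes": "NO"}, 1),
            ({"first_c_commands": "NO",  "conj_intervenes": "YES"}, 0),
            ({"first_c_commands": "NO",  "conj_intervenes": "YES"}, 0)]
    nb = NaiveBayes()
    nb.train([r for r, _ in data], [c for _, c in data])
    print(nb.classify({"first_c_commands": "YES", "conj_intervenes": "NO"}))  # -> 1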

The performance of the naive Bayes classifier is summarized in Table 3. For each
of the 10 test sets of 89 items taken from the corpus, the remaining 804 of the total
893 sentences were used to train the model. The naive Bayes classifier outperformed
the baseline by a considerable margin.

Table 3
Performance of the naive Bayes classifier, summed over all 10 test runs.

Condition                Correct   Incorrect   Percentage correct
First has wide scope       177        104       177/281 = 63.0%
Second has wide scope       41         23       41/64 = 64.1%
No scope interaction       428        117       428/545 = 78.5%
Total                      646        244       646/890 = 72.6%

In addition to the raw counts of test examples correctly classified, though, we
would like to know something of the internal structure of the model (i.e., what sort of
features it has induced from the data). For this classifier, we can assume that a feature
f is a good predictor of a class c∗ when the value of P(f | c∗) is significantly larger
than the (geometric) mean value of P(f | c) for all other values of c. Those features
with the greatest ratio P(f) × P(f | c∗) / geom.mean(∀c ≠ c∗ [P(f | c)]) are listed in
Table 4.9

Table 4
Most active features from naive Bayes classifier.

Rank   Feature                                                  Predicted class   Ratio
1      There is an intervening conjunct node                    0                 1.63
2      There is an intervening comma                            0                 1.51
3      There is an intervening S node                           1                 1.33
4      The first quantified NP does not c-command the second    0                 1.25
5      Second quantifier is tagged QP                           1                 1.16
6      There is an intervening S node                           0                 1.12
15     The second quantified NP c-commands the first            2                 1.00

The first-ranked feature in Table 4 shows that there is a tendency for quanti-
fied elements not to interact when they are found in conjoined constituents, and the
second-ranked feature indicates a preference for quantifiers not to interact when there
is an intervening comma (presumably an indicator of greater syntactic “distance”).
Feature 3 indicates a preference for class 1 when there is an intervening S node,

9 We include the term P(f) in the product in order to prevent sparsely instantiated features from
showing up as highly ranked.



whereas feature 6 indicates a preference for class 0 under the same conditions. Pre-
sumably, this reflects a dispreference for the second quantifier to take wide scope
when there is a clause boundary intervening between it and the first quantifier. Der
fourth-ranked feature in Table 4 indicates that, if the first quantified NP does not
c-command the second, it is less likely to take wide scope. This is not surprising,
given the importance that c-command relations have had in theoretical discussions
of quantifier scope. The fifth-ranked feature expresses a preference for quantified ex-
pressions of category QP to take narrow scope, if they are the second of the two
quantifiers under consideration. This may simply be reflective of the fact that class
1 is more common than class 2, and the measure expressions found in QP phrases
in the Penn Treebank (such as more than three or about half ) tend not to be logically
independent of other quantifiers. Finally, feature 15 in Table 4 indicates a high
correlation between the second quantified expression’s c-commanding the first and
the second quantifier’s taking wide scope. We can easily see this as a translation into
our feature set of Kuno, Takami, and Wu’s claim that subjects tend to outscope ob-
jects and obliques and topicalized elements tend to take wide scope. Some of these
top-ranked features have to do with information found only in the written medium,
but on the whole, the features induced by the naive Bayes classifier seem consis-
tent with those suggested by Kuno, Takami, and Wu, although they are distinct by
necessity.

4.3.2 Maximum-Entropy Classifier. The maximum-entropy classifier is a sort of log-
linear model, defining the joint probability of a class and a data vector (d0 · · · dn) as
the product of the prior probability of the class c with a set of features related to the
data:10

P(d0 · · · dn, C) =

P(C)
Z

N(cid:7)

αk

k=0

(3)

This classifier superficially resembles in form the naive Bayes classifier in equation (2),
but it differs from that classifier in that the way in which values for each α are chosen
does not assume that the features in the data are independent. For each of the 10
training sets, we used the generalized iterative scaling algorithm to train this classifier
An 654 training examples, verwenden 150 examples for validation to choose the best set of

10 Z in equation (3) is simply a normalizing constant that ensures that we end up with a probability
distribution.




values for the αs.11 Test data could then be classified by choosing the class for the data
that maximizes the joint probability in equation (3).
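
For illustration, a compact version of generalized iterative scaling might look as
follows. This is a sketch under simplifying assumptions, not the authors’ Perl
implementation: features are binary indicator sets, the prior term of equation (3) is
folded out, and the usual correction feature that pads every event to a constant total
is omitted for brevity.

    import math
    from collections import defaultdict

    def train_gis(data, classes, iters=100):
        # `data` is a list of (active_feature_set, gold_class) pairs; one
        # multiplicative weight alpha per (feature, class) combination.
        C = max(len(feats) for feats, _ in data)
        alpha = defaultdict(lambda: 1.0)

        def probs(feats):
            score = {c: math.prod(alpha[(f, c)] for f in feats) for c in classes}
            z = sum(score.values())
            return {c: s / z for c, s in score.items()}

        # Empirical expectation of each (feature, class) pair.
        emp = defaultdict(float)
        for feats, gold in data:
            for f in feats:
                emp[(f, gold)] += 1.0

        for _ in range(iters):
            model = defaultdict(float)            # model expectation
            for feats, _ in data:
                p = probs(feats)
                for f in feats:
                    for c in classes:
                        model[(f, c)] += p[c]
            for key, m in model.items():
                e = max(emp[key], 1e-4)           # small floor keeps weights positive
                alpha[key] *= (e / m) ** (1.0 / C)
        return alpha, probs

Classification then amounts to choosing the class with the highest returned
probability; in the experiments, the 150 held-out sentences would be used to select
the best training iteration, as described in note 11.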

The results of training with the maximum-entropy classifier are shown in Table 5.
The classifier showed slightly higher performance than the naive Bayes classifier, with
the lowest error rate on the class of sentences having no scope interaction.

Table 5
Performance of the maximum-entropy classifier, summed over all 10 test runs.

Condition                Correct   Incorrect   Percentage correct
First has wide scope       148        133       148/281 = 52.7%
Second has wide scope       31         33       31/64 = 48.4%
No scope interaction       475         70       475/545 = 87.2%
Total                      654        236       654/890 = 73.5%

To determine exactly which features of the data the maximum-entropy classifier
sees as relevant to the classification problem, we can simply look at the α values (from
equation (3)) for each feature. Those features with higher values for α are weighted
more heavily in determining the proper scoping. Some of the features with the highest
values for α are listed in Table 6. Because of the way the classifier is built, predictor
features for class 2 need to have higher loadings to overcome the lower prior probabil-
ity of the class. Therefore, we actually rank the features in Table 6 according to α · P̂(c)^k
(which we denote as αc,k). P̂(c) represents the empirical prior probability of a class c,
and k is simply a constant (.25 in this case) chosen to try to get a mix of features for
different classes at the top of the list.

Table 6
Most active features from maximum-entropy classifier.

Rank   Feature                                              Predicted class   αc,.25
1      Second quantifier is each                            2                 1.13
2      There is an intervening comma                        0                 1.01
3      There is an intervening conjunct node                0                 1.00
4      First quantified NP does not c-command the second    0                 0.99
5      Second quantifier is every                           2                 0.98
6      There is an intervening quotation mark ('')          0                 0.95
7      There is an intervening colon                        0                 0.95
12     First quantified NP c-commands the second            1                 0.92
25     There is no intervening comma                        1                 0.90

The features ranked first and fifth in Table 6 express lexical preferences for certain
quantifiers to take wide scope, even when they are the second of the two quantifiers
according to linear order in the string of words. The tendency for each to take wide
scope is stronger than for the other quantifier, which is in line with Kuno, Takami,
and Wu’s decision to list it as the only quantifier with a lexical preference for scoping.
Feature 2 makes the “no scope interaction” class more likely if a comma intervenes, and

11 Overtraining is not a problem with the pure version of the generalized iterative scaling algorithm. For
efficiency reasons, however, we chose to take the training corpus as representative of the event space,
rather than enumerating the space exhaustively (see Jelinek [1998] for details). For this reason, it was
necessary to employ validation in training.



feature 25 makes a wide-scope reading for the first quantifier more likely if there is no
intervening comma. The third-ranked feature expresses the tendency mentioned above
for quantifiers in conjoined clauses not to interact. Features 4 and 12 indicate that if the
first quantified expression c-commands the second, it is likely to take wide scope, and
that if this is not the case, there is likely to be no scope interaction. Finally, the sixth-
and seventh-ranked features in the table show that an intervening quotation mark or
colon will make the classifier tend toward class 0, “no scope interaction,” which is easy
to understand. Quotations are often opaque to quantifier scope interactions. The top
features found by the maximum-entropy classifier largely coincide with those found
by the naive Bayes model, which indicates that these generalizations are robust and
objectively present in the data.

4.3.3 Single-Layer Perceptron. For our neural network classifier, we employed a feed-
forward single-layer perceptron, with the softmax function used to determine the acti-
vation of nodes at the output layer, because this is a one-of-n classification task (Bridle
1990). The data to be classified are presented as a vector of features at the input layer,
and the output layer has three nodes, representing the three possible classes for the
data: “first has wide scope,” “second has wide scope,” and “no scope interaction.”
The output node with the highest activation is interpreted as the class of the datum
presented at the input layer.

For each of the 10 test sets of 89 examples, we trained the connection weights
of the network using error backpropagation on 654 training sentences, reserving 150
sentences for validation in order to choose the weights from the training epoch with the
highest classification performance. In Table 7 we present the results of the single-layer
neural network in classifying our test sentences. As the table shows, the single-layer
perceptron has much better classification performance than the naive Bayes classifier
and maximum-entropy model, possibly because the training of the network aims to
minimize error in the activation of the classification output nodes, which is directly
related to the classification task at hand, whereas the other models do not directly
make use of the notion of “classification error.” The perceptron also uses a sort of
weighted voting and could be interpreted as an implementation of Kuno, Takami,
and Wu’s proposal for scope determination. This clearly illustrates that the tenability
of their proposal hinges on the exact details of its implementation, since all of our
classifier models are reasonable interpretations of their approach, but they have very
different performance results on our scope determination task.

Table 7
Performance of the single-layer perceptron, summed over all 10 test runs.

Condition                Correct   Incorrect   Percentage correct
First has wide scope       182         99       182/281 = 64.8%
Second has wide scope       35         29       35/64 = 54.7%
No scope interaction       468         77       468/545 = 85.9%
Total                      685        205       685/890 = 77.0%
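
A sketch of such a network in Python/NumPy follows (ours, not the authors’
implementation; the learning rate, epoch count, and initialization scale are assumed
values, and the validation-based epoch selection described above is omitted):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())              # shift for numerical stability
        return e / e.sum()

    def train_perceptron(X, y, n_classes=3, epochs=200, lr=0.1):
        # Single-layer softmax network trained by error backpropagation,
        # i.e., gradient descent on cross-entropy error.  X is an
        # (n_examples, n_features) 0/1 matrix; y holds class indices.
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.01, size=(n_classes, X.shape[1]))
        b = np.zeros(n_classes)
        for _ in range(epochs):
            for x, gold in zip(X, y):
                delta = softmax(W @ x + b)
                delta[gold] -= 1.0           # gradient of cross-entropy + softmax
                W -= lr * np.outer(delta, x)
                b -= lr * delta
        return W, b

    def classify(W, b, x):
        return int(np.argmax(W @ x + b))     # highest-activation output node wins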

To determine exactly which features of the data the network sees as relevant to
the classification problem, we can simply look at the connection weights for each
feature-class pair. Higher connection weights indicate a greater correlation between
input features and output classes. For one of the 10 networks we trained, some of
the features with the highest connection weights are listed in Table 8. Since class 0 is



simply more frequent in the training data than the other two classes, the weights for
this class tend to be higher. Therefore, we also list some of the best predictor features
for classes 1 and 2 in the table.

Table 8
Most active features from single-layer perceptron.

Rank   Feature                                          Predicted class   Weight
1      There is an intervening comma                    0                 4.31
2      Second quantifier is all                         0                 3.77
3      There is an intervening colon                    0                 2.98
4      There is an intervening conjunct node            0                 2.72
17     The first quantified NP c-commands the second    1                 1.69
18     Second quantifier is tagged RBS                  2                 1.69
19     There is an intervening S node                   1                 1.61
20     Second quantifier is each                        2                 1.50

The first- and third-ranked features in Table 8 show that an intervening comma or
colon will make the classifier tend toward class 0, “no scope interaction.” This finding
by the classifier is similar to the maximum-entropy classifier’s finding an intervening
quotation mark relevant and can be taken as an indication that quantifiers in distant
syntactic subdomains are unlikely to interact. Similarly, the fourth-ranked feature indi-
cates that quantifiers in separate conjuncts are unlikely to interact. The second-ranked
feature in the table expresses a tendency for there to be no scope interaction between
two quantifiers if the second of them is headed by all. This may be related to the
independence of universal quantifiers (∀x∀y[φ] ⇐⇒ ∀y∀x[φ]). Feature 17 in Table 8
indicates a high correlation between the first quantified expression’s c-commanding
the second and the first quantifier’s taking wide scope, which again supports Kuno,
Takami, and Wu’s claim that scope preferences are related to syntactic superiority re-
lations. Feature 18 expresses a preference for a quantified expression headed by most
to take wide scope, even if it is the second of the two quantifiers (since most is the
only quantifier in the corpus that bears the tag RBS). Feature 19 indicates that the
first quantifier is more likely to take wide scope if there is a clause boundary in-
tervening between the two quantifiers, which supports the notion that the syntactic
distance between the quantifiers is relevant to scope preferences. Finally, feature 20
expresses the well-known tendency for quantified expressions headed by each to take
wide scope.

4.4 Summary of Results
Table 9 summarizes the performance of the quantifier scope models we have presented
here. All of the classifiers have test set accuracy above the baseline, which a paired
t-test reveals to be significant at the .001 level. The differences between the naive
Bayes, maximum-entropy, and single-layer perceptron classifiers are not statistically
significant.

Table 9
Summary of classifier results.

                          Training data   Validation data   Test data
Baseline                       —                —            61.2%
Naive Bayes                  76.7%              —            72.6%
Maximum entropy              78.3%            75.5%          73.5%
Single-layer perceptron      84.7%            76.8%          77.0%
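
The significance test itself is a routine computation over the ten folds; a sketch with
invented per-fold accuracies (the article reports only the aggregate figures above):

    from scipy.stats import ttest_rel

    # Invented per-fold accuracies for illustration only:
    baseline = [0.61, 0.60, 0.63, 0.62, 0.59, 0.62, 0.61, 0.60, 0.62, 0.62]
    slp      = [0.78, 0.75, 0.79, 0.76, 0.77, 0.78, 0.75, 0.77, 0.78, 0.77]

    t, p = ttest_rel(slp, baseline)     # paired t-test across the ten folds
    print(f"t = {t:.2f}, p = {p:.2g}")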

The classifiers performed significantly better on those sentences annotated consis-
tently by both human coders at the beginning of the study, reinforcing the view that
this subset of the data is somehow simpler and more representative of the basic regu-
larities in scope preferences. For example, the single-layer perceptron classified 82.9%
of these sentences correctly. To further investigate the nature of the variation between
the two coders, we constructed a version of our single-layer network that was trained



on the data on which both coders agreed and tested on the remaining sentences. This
classifier agreed with the reference coding (the coding of the first coder) 51.4% of the
time and with the additional independent coder 35.8% of the time. The first coder con-
structed the annotation guidelines for this project and may have been more successful
in applying them consistently. Alternatively, it is possible that different individuals use
different strategies in determining scope preferences, and the strategy of the second
coder may simply have been less similar than the strategy of the first coder to that of
the single-layer network.

These three classifiers directly implement a sort of weighted voting, the method
of aggregating evidence proposed by Kuno, Takami, and Wu (although the classifiers’
implementation is slightly more sophisticated than the unweighted voting that is ac-
tually used in Kuno, Takami, and Wu’s paper). Of course, since we do not use exactly
the set of features suggested by Kuno, Takami, and Wu, our model should not be
seen as a straightforward implementation of the theory outlined in their 1999 paper.
Nevertheless, the results in Table 9 suggest that Kuno, Takami, and Wu’s suggested
design can be used with some success in modeling scope preferences. Moreover, the
project undertaken here provides an answer to some of the objections that Aoun and
Li (2000) raise to Kuno, Takami, and Wu. Aoun and Li claim that Kuno, Takami, and
Wu’s choice of experts is seemingly arbitrary and that it is unclear how the voting
weights of each expert are to be set, but the machine learning approach we employ
in this article is capable of addressing both of these potential problems. Supervised
training of our classifiers is a straightforward approach to setting the weights and
also constitutes our approach to selecting features (or “experts” in Kuno, Takami, Und
Wu’s terminology). In the training process, any feature that is irrelevant to scoping
preferences should receive weights that make its effect negligible.

5. Syntax and Scope

In this section, we show how the classifier models of quantifier scope determination
introduced in Section 4 may be integrated with a PCFG model of syntax. We com-
pare two different ways in which the two components may be combined, which may
loosely be termed serial and parallel, and argue for the latter on the basis of empirical
results.

5.1 Modular Design
Our use of a phrase structure syntactic component and a quantifier scope component
to define a combined language model is simplified by the fact that our classifiers are
probabilistic and define a conditional probability distribution over quantifier scopings.
The probability distributions that our classifiers define for quantifier scope structures
are conditional on syntactic phrase structure, because they are computed on the basis


of syntactically provided features, such as the number of nodes of a certain type that
intervene between two quantifiers in a phrase structure tree.

Thus, the combined language model that we define in this article assigns probabil-
ities according to the pairs of structures that may be assigned to a sentence by the Q-
structure and phrase structure syntax modules. The probability of a word string w1−n
is therefore defined as in equation (4), where Q ranges over all possible Q-structures
in the set Q and S ranges over all possible syntactic structures in the set S.

P(w1−n) = ∑_{S∈S, Q∈Q} P(S, Q | w1−n)                                    (4)

        = ∑_{S∈S, Q∈Q} P(S | w1−n) P(Q | S, w1−n)                        (5)

Equation (5) shows how we can use the definition of conditional probability to
break our calculation of the language model probability into two parts. The first of
these parts, P(S | w1−n), which we may abbreviate as simply P(S), is the probability
of a particular syntactic tree structure’s being assigned to a particular word string. Wir
model this probability using a probabilistic phrase structure grammar (vgl. Charniak
[1993, 1996]). The second distribution on the right side of equation (5) is the conditional
probability of a particular quantifier scope structure’s being assigned to a particular
word string, given the syntactic structure of that string. This probability is written as
P(Q | S, w1−n), or simply P(Q | S), and represents the quantity we estimated above
in constructing classifiers to predict the scopal representation of a sentence based on
aspects of its syntactic structure.

Thus, given a PCFG model of syntactic structure and a probabilistically defined
classifier of the sort introduced in Section 4, it is simple to determine the probability
of any pairing of two particular structures from each domain for a given sentence.
We simply multiply the values of P(S) and P(Q | S) to obtain the joint probability
P(Q, S). In the current section, we examine two different models of combination for
these components: one in which scope determination is applied to the optimal syn-
tactic structure (the Viterbi parse), and one in which optimization is performed in the
space of both modules to find the optimal pairing of syntactic and quantifier scope
structures.
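
In code, the parallel combination is a joint maximization over pairs of structures; a
sketch of equation (5) with hypothetical interfaces for the two modules (our
illustration, not the system’s actual code):

    def best_pair(parses, scope_model):
        # Parallel combination: search all (S, Q) pairs for the one maximizing
        # P(S) * P(Q | S).  `parses` yields (tree, p_syntax) pairs from the
        # PCFG; `scope_model(tree)` returns a dict mapping each scope class
        # to P(Q | S).  Both interfaces are hypothetical stand-ins.
        best, best_p = None, 0.0
        for tree, p_syn in parses:
            for q_class, p_scope in scope_model(tree).items():
                if p_syn * p_scope > best_p:
                    best, best_p = (tree, q_class), p_syn * p_scope
        return best, best_p

The serial alternative would instead fix the single best (Viterbi) parse first and
maximize P(Q | S) for that parse alone.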

5.2 The Syntactic Module
Before turning to the application of our multimodular approach to the problem of
scope determination in Section 5.3, we present here a short overview of the phrase
structure syntactic component used in these projects. As described above, we model syn-
tax as a probabilistic phrase structure grammar (PCFG), and in particular, we use a
treebank grammar (Charniak 1996) trained on the Penn Treebank.

A PCFG defines the probability of a string of words as the sum of the probabilities
of all admissible phrase structure parses (trees) for that string. The probability of a
given tree is the product of the probability of all of the rule instances used in the
construction of that tree, where rules take the form N → φ, with N a nonterminal
symbol and φ a finite sequence of one or more terminals or nonterminals.

To take an example, Figure 5 illustrates a phrase structure tree for the sentence Su-
san might not believe you, which is admissible according to the grammar in Table 10. (All
of the minimal subtrees in Figure 5 are instances of one of our rules.)

Figure 5
A simple phrase structure tree.

Table 10
A simple probabilistic phrase structure grammar.

Rule              Probability

S → NP VP         .7
S → VP            .2
S → V NP VP       .1

VP → V VP         .3
VP → ADV VP       .1
VP → V            .1
VP → V NP         .3
VP → V NP NP      .2

NP → Susan        .3
NP → you          .4
NP → Yves         .3

V → might         .2
V → believe       .3
V → show          .3
V → stay          .2

ADV → not         .5
ADV → always      .5

of this tree, which we can indicate as τ, can be calculated as in equation (6).

    P(τ) = ∏_{ρ∈Rules(τ)} P(ρ)                                    (6)

         = P(S → NP VP) × P(VP → V VP) × P(VP → ADV VP)
           × P(VP → V NP) × P(NP → Susan) × P(V → might)
           × P(ADV → not) × P(V → believe) × P(NP → you)          (7)

         = .7 × .3 × .1 × .3 × .3 × .2 × .5 × .3 × .4 = 2.268 × 10⁻⁵   (8)

The actual grammar rules and associated probabilities that we use in defining our syntactic module are derived from the WSJ corpus of the Penn Treebank by maximum-likelihood estimation. That is, for each rule N → φ used in the treebank, we add the rule to the grammar and set its probability to C(N → φ) / Σ_ψ C(N → ψ), where C(·) denotes the "count" of a rule (i.e., the number of times it is used in the corpus). A grammar composed in this manner is referred to as a treebank grammar, because its rules are directly derived from those in a treebank corpus.
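
This maximum-likelihood estimate is straightforward to implement. The sketch below is our own illustration, not the actual implementation used in our experiments; it assumes rules are represented as (lhs, rhs) pairs read off treebank trees:

    from collections import Counter, defaultdict

    def estimate_rule_probs(rule_instances):
        """MLE: P(N -> phi) = C(N -> phi) / sum over psi of C(N -> psi).
        rule_instances is a list of (lhs, rhs_tuple) pairs, one per use."""
        counts = Counter(rule_instances)
        lhs_totals = defaultdict(int)
        for (lhs, _rhs), c in counts.items():
            lhs_totals[lhs] += c
        return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

    # Toy usage: two S rules observed three times in all.
    probs = estimate_rule_probs([("S", ("NP", "VP")),
                                 ("S", ("NP", "VP")),
                                 ("S", ("VP",))])
    # probs[("S", ("NP", "VP"))] == 2/3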

We used sections 00–20 of the WSJ corpus of the Penn Treebank for collecting the rules and associated probabilities of our PCFG, which is implemented as a bottom-up chart parser. Before constructing the grammar, the treebank was preprocessed using known procedures (cf. Krotov et al. [1998]; Belz [2001]) to facilitate the construction of a rule list. Functional and anaphoric annotations (basically anything following a "-" in a node label; cf. Santorini [1990]; Bies et al. [1995]) were removed from nonterminal labels. Nodes that dominate only "empty categories" such as traces were removed. In addition, unary-branching constructions were removed by replacing the mother category in such a structure with the daughter node. (For example, given an instance of the rule X → YZ, if the daughter category Y were expanded by the unary rule Y → W, our algorithm would induce the single rule X → WZ.) Finally, we discarded all rules that had more than 10 symbols on the right-hand side (an arbitrary limit of our parser implementation). This resulted in a total of 18,820 rules, of which 11,156 were discarded as hapax legomena, leaving 7,664 rules in our treebank grammar. Table 11 shows some of the rules in our grammar with the highest and lowest corpus counts.
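
A rough sketch of this preprocessing is given below. The tree representation ((label, children) tuples with words as bare strings) and the empty-category test are simplifying assumptions of ours, not the implementation used in our experiments:

    def preprocess(tree):
        """Clean a treebank subtree: strip functional annotations, drop
        empty categories, and collapse unary-branching nonterminals."""
        label, children = tree
        if label == "-NONE-":            # an empty category: remove it
            return None
        label = label.split("-")[0]      # e.g., "NP-SBJ-1" -> "NP"
        kept = []
        for child in children:
            if isinstance(child, tuple):
                child = preprocess(child)
                if child is None:
                    continue             # dominated only empty categories
            kept.append(child)
        if not kept:
            return None
        # Collapse unary branching by replacing the mother with the
        # daughter; preterminals (a single word child) are kept intact.
        if len(kept) == 1 and isinstance(kept[0], tuple):
            return kept[0]
        return (label, kept)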

5.3 Unlabeled Scope Determination
In this section, we describe an experiment designed to assess the performance of
parallel and serial approaches to combining grammatical modules, focusing on the
task of unlabeled scope determination. This task involves predicting the most likely
Q-structure representation for a sentence, basically the same task we addressed in Sec-
tion 4, in comparing the performance levels of each type of classifier. The experiment
of this section differs, Jedoch, from the task presented in Section 4 in that instead of
providing a syntactic tree from the Penn Treebank as input to the classifier, we provide
the model only with a string of words (a sentence). Our dual-component model will
search for the optimal syntactic and scopal structures for the sentence (the pairing
(τ ∗, χ∗)) and will be evaluated based on its success in identifying the correct scope
reading χ∗.

Our concern in this section will be to determine whether it is necessary to search
the space of possible pairings (τ , χ) of syntactic and scopal structures or whether
it is sufficient to use our PCFG first to fix the syntactic tree τ , and then to choose

Table 11
Rules derived from sections 00–20 of the Penn Treebank WSJ corpus. "TOP" is a special "start" symbol that may expand to any of the symbols found at the root of a tree in the corpus.

Rule                          Corpus count

PP → IN NP                    59,053
TOP → S                       34,614
NP → DT NN                    28,074
NP → NP PP                    25,192
S → NP VP                     14,032
S → NP VP .                   12,901
VP → TO VP                    11,598

S → CC PP NNP NNP VP .        2
NP → DT “ NN NN NN ''         2
NP → NP PP PP PP PP PP        2
INTJ → UH UH                  2
NP → DT “ NN NNS              2
SBARQ → “ WP VP . ''          2
S → PP NP VP . ''             2

a scope reading to maximize the probability of the pairing. That is, are syntax and quantifier scope mutually dependent components of grammar, or can scope relations be "read off of" syntax? The serial model suggests that the optimal syntactic structure τ∗ should be chosen on the basis of the syntactic module only, as in equation (9), and the optimal quantifier scope structure χ∗ then chosen on the basis of τ∗, as in equation (10). The parallel model, on the other hand, suggests that the most likely pairing of structures must be chosen in the joint probability space of both components, as in equation (11).

    τ∗ = arg max_{τ∈S} PS(τ | w1−n)                               (9)

    χ∗ = arg max_{χ∈Q} PQ(χ | τ∗, w1−n)                           (10)

    τ∗ = { τ | (τ, χ) = arg max_{τ∈S, χ∈Q} PS(τ | w1−n) PQ(χ | τ, w1−n) }   (11)
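
The two selection rules are easy to state in code. In the sketch below (our own toy interface, not the actual implementation), candidates is a list of (tree, P_S) pairs from the parser, and p_scope(tree) is assumed to return a dict from each scope class χ ∈ {0, 1, 2} to PQ(χ | τ, w1−n):

    def serial_choice(candidates, p_scope):
        """Equations (9)-(10): fix the Viterbi parse first,
        then pick the best scope reading given that parse."""
        viterbi_tree, _ = max(candidates, key=lambda c: c[1])
        scope_probs = p_scope(viterbi_tree)
        return max(scope_probs, key=scope_probs.get)

    def parallel_choice(candidates, p_scope):
        """Equation (11): maximize the joint probability
        P_S(tree | w) * P_Q(chi | tree, w) over all pairings."""
        best_chi, best_p = None, -1.0
        for tree, p_tree in candidates:
            for chi, p_chi in p_scope(tree).items():
                if p_tree * p_chi > best_p:
                    best_chi, best_p = chi, p_tree * p_chi
        return best_chi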

5.3.1 Experimental Design. For this experiment, we implement the scoping component as a single-layer feed-forward network, because the single-layer perceptron classifier had the best prediction rate among the three classifiers tested in Section 4. The softmax activation function we use for the output nodes of the classifier guarantees that the activations of all of the output nodes sum to one and can be interpreted as class probabilities. The syntactic component, of course, is determined by the treebank PCFG grammar described above.
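
For concreteness, the softmax function in question can be written as follows; this is the standard formulation (with the usual max-subtraction for numerical stability), and the network weights themselves are omitted:

    import numpy as np

    def softmax(activations):
        """Map raw output-node activations to values that are positive
        and sum to one, interpretable as class probabilities."""
        shifted = activations - np.max(activations)  # numerical stability
        exps = np.exp(shifted)
        return exps / exps.sum()

    # e.g., softmax(np.array([2.0, 1.0, 0.1])) returns an array summing to 1.0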

Given these two models, which respectively define PQ(χ | τ , w1−n) and PS(τ | w1−n)
from equation (11), it remains only to specify how to search the space of pairings
(τ , χ) in performing this optimization to find χ∗. Unfortunately, it is not feasible to
examine all values τ ∈ S, since our PCFG will generally admit a huge number of

Table 12
Performance of models on the unlabeled scope prediction task, summed over all 10 test runs.

Condition                  Correct    Incorrect    Percentage correct

Parallel model
First has wide scope       168        113          168/281 = 59.8%
Second has wide scope      26         38           26/64  = 40.6%
No scope interaction       467        78           467/545 = 85.7%
Total                      661        229          661/890 = 74.3%

Serial model
First has wide scope       163        118          163/281 = 58.0%
Second has wide scope      27         37           27/64  = 42.2%
No scope interaction       461        84           461/545 = 84.6%
Total                      651        239          651/890 = 73.1%

trees for a sentence (especially given a mean sentence length of over 20 words in
the WSJ corpus).12 Our solution to this search problem is to make the simplifying as-
sumption that the syntactic tree that is used in the optimal set of structures (τ ∗, χ∗)
will always be among the top few trees τ for which PS(τ | w1−n) is the greatest.
That is, although we suppose that quantifier scope information is relevant to pars-
ing, we do not suppose that it is so strong a determinant as to completely over-
ride syntactic factors. In practice, this means that our parser will return the top 10
parses for each sentence, along with the probabilities assigned to them, and these
are the only parses that are considered in looking for the optimal set of linguistic
structures.

We again used 10-fold cross-validation in evaluating the competing models, di-
viding the scope-tagged corpus into 10 test sections of 89 sentences each, and we
used the same version of the treebank grammar for our PCFG. The first model re-
trieved the top 10 syntactic parses (τ0 · · · τ9) for each sentence and computed the
probability P(τ , χ) for each τ ∈ τ0 · · · τ9, χ ∈ {0, 1, 2}, choosing that scopal represen-
tation χ that was found in the maximum-probability pairing. We call this the par-
allel model, because the properties of each probabilistic model may influence the
optimal structure chosen by the other. The second model retrieved only the Viterbi
parse τ0 from the PCFG and chose the scopal representation χ for which the pair-
ing (τ0, χ) took on the highest probability. We call this the serial model, because it
represents syntactic phrase structure as independent of other components of grammar (in this case, quantifier scope), though other components are dependent upon it.

5.3.2 Results. There was an appreciable difference in performance between these two models on the quantifier scope test sets. As shown in Table 12, the parallel model narrowly outperformed the serial model, by 1.2%. A 10-fold paired t-test on the test sections of the scope-tagged corpus shows that the parallel model is significantly better (p < .05).

12 Since we are allowing χ to range only over the three scope readings (0, 1, 2), however, it is possible to enumerate all values of χ to be paired with a given syntactic tree τ.
This result suggests that, in determining the syntactic structure of a sentence, we must take aspects of structure into account that are not purely syntactic (such as quantifier scope). Searching both dimensions of the hypothesis space for our dual-component model allowed the composite model to handle the interdependencies between different aspects of grammatical structure, whereas fixing a phrase structure tree purely on the basis of syntactic considerations led to suboptimal performance in using that structure as a basis for determining quantifier scope.

6. Conclusion

In this article, we have taken a statistical, corpus-based approach to the modeling of quantifier scope preferences, a subject that has previously been addressed only with systems of ad hoc rules derived from linguists' intuitive judgments. Our model takes its theoretical inspiration from Kuno, Takami, and Wu (1999), who suggest an "expert system" approach to scope preferences, and follows many other projects in the machine learning of natural language that combine information from multiple sources in solving linguistic problems.

Our results are generally supportive of the design that Kuno, Takami, and Wu propose for the quantifier scope component of grammar, and some of the features induced by our models find clear parallels in the factors that Kuno, Takami, and Wu claim to be relevant to scoping. In addition, our final experiment, in which we combine our quantifier scope module with a PCFG model of syntactic phrase structure, provides evidence of a grammatical architecture in which different aspects of structure mutually constrain one another. This result casts doubt on approaches in which syntactic processing is completed prior to the determination of other grammatical properties of a sentence, such as quantifier scope relations.

Appendix: Selected Codes Used to Annotate Syntactic Categories in the Penn Treebank, from Marcus et al. (1993) and Bies et al. (1995)

Part-of-speech tags

Tag    Meaning                        Tag    Meaning
CC     Conjunction                    RB     Adverb
CD     Cardinal number                RBR    Comparative adverb
DT     Determiner                     RBS    Superlative adverb
IN     Preposition                    TO     "to"
JJ     Adjective                      UH     Interjection
JJR    Comparative adjective          VB     Verb in base form
JJS    Superlative adjective          VBD    Past-tense verb
NN     Singular or mass noun          VBG    Gerundive verb
NNS    Plural noun                    VBN    Past participial verb
NNP    Singular proper noun           VBP    Non-3sg, present-tense verb
NNPS   Plural proper noun             VBZ    3sg, present-tense verb
PDT    Predeterminer                  WP     WH pronoun
PRP    Personal pronoun               WP$    Possessive WH pronoun
PRP$   Possessive pronoun

Phrasal categories

Code    Meaning
ADJP    Adjective phrase
ADVP    Adverb phrase
INTJ    Interjection
NP      Noun phrase
PP      Prepositional phrase
QP      Quantifier phrase (i.e., measure/amount phrase)
S       Declarative clause
SBAR    Clause introduced by a subordinating conjunction
SBARQ   Clause introduced by a WH phrase
SINV    Inverted declarative sentence
SQ      Inverted yes/no question following the WH phrase in SBARQ
VP      Verb phrase

Acknowledgments
The authors are grateful for an Academic Technology Innovation Grant from the University of Chicago, which helped to make this work possible, and to John Goldsmith, Terry Regier, Anne Pycha, and Bob Moore, whose advice and collaboration have considerably aided the research reported in this article. Any remaining errors are, of course, our own.

References
Aoun, Joseph and Yen-hui Audrey Li. 1993. The Syntax of Scope. MIT Press, Cambridge.
Aoun, Joseph and Yen-hui Audrey Li. 2000. Scope, structure, and expert systems: A reply to Kuno et al. Language, 76(1):133–155.
Belz, Anja. 2001. Optimisation of corpus-derived probabilistic grammars. In Proceedings of Corpus Linguistics 2001, pages 46–57.
Berger, Adam L., Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71.
Bies, Ann, Mark Ferguson, Karen Katz, and Robert MacIntyre. 1995. Bracketing guidelines for Treebank II style. Technical report, Penn Treebank Project, University of Pennsylvania.
Bishop, Christopher M. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Bridle, John S. 1990. Probabilistic interpretation of feedforward classification network outputs with relationships to statistical pattern recognition. In F. Fougelman-Soulie and J. Herault, editors, Neurocomputing—Algorithms, Architectures, and Applications. Springer-Verlag, Berlin, pages 227–236.
Brill, Eric. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543–565.
Carden, Guy. 1976. English Quantifiers: Logical Structure and Linguistic Variation. Academic Press, New York.
Carletta, Jean. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249–254.
Charniak, Eugene. 1993. Statistical Language Learning. MIT Press, Cambridge.
Charniak, Eugene. 1996. Tree-bank grammars. In AAAI/IAAI, vol. 2, pages 1031–1036.
Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37–46.
Cooper, Robin. 1983. Quantification and Syntactic Theory. Reidel, Dordrecht.
Fleiss, Joseph L. 1981. Statistical Methods for Rates and Proportions. John Wiley & Sons, New York.
Hobbs, Jerry R. and Stuart M. Shieber. 1987. An algorithm for generating quantifier scopings. Computational Linguistics, 13:47–63.
Hornstein, Norbert. 1995. Logical Form: From GB to Minimalism. Blackwell, Oxford and Cambridge.
Jelinek, Frederick. 1998. Statistical Methods for Speech Recognition. MIT Press, Cambridge.
Jurafsky, Daniel and James H. Martin. 2000. Speech and Language Processing. Prentice Hall, Upper Saddle River, New Jersey.
Krippendorff, Klaus. 1980. Content Analysis: An Introduction to Its Methodology. Sage Publications, Beverly Hills, California.
Krotov, Alexander, Mark Hepple, Robert J. Gaizauskas, and Yorick Wilks. 1998. Compacting the Penn treebank grammar. In COLING-ACL, pages 699–703.
Kuno, Susumu, Ken-Ichi Takami, and Yuru Wu. 1999. Quantifier scope in English, Chinese, and Japanese. Language, 75(1):63–111.
Kuno, Susumu, Ken-Ichi Takami, and Yuru Wu. 2001. Response to Aoun and Li. Language, 77(1):134–143.
Manning, Christopher D. and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge.
Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
Martin, Paul, Douglas Appelt, and Fernando Pereira. 1986. Transportability and generality in a natural-language interface system. In B. J. Grosz, K. Sparck Jones, and B. L. Webber, editors, Natural Language Processing. Kaufmann, Los Altos, California, pages 585–593.
May, Robert. 1985. Logical Form: Its Structure and Derivation. MIT Press, Cambridge.
McCawley, James D. 1998. The Syntactic Phenomena of English. University of Chicago Press, Chicago, second edition.
Mitchell, Tom M. 1996. Machine Learning. McGraw Hill, New York.
Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In J. Hintikka et al., editors, Approaches to Natural Language. Reidel, Dordrecht, pages 221–242.
Moran, Douglas B. 1988. Quantifier scoping in the SRI core language engine. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (ACL'88), pages 33–40.
Moran, Douglas B. and Fernando C. N. Pereira. 1992. Quantifier scoping. In Hiyan Alshawi, editor, The Core Language Engine. MIT Press, Cambridge, pages 149–172.
Nerbonne, John. 1993. A feature-based syntax/semantics interface. In A. Manaster-Ramer and W. Zadrozsny, editors, Annals of Mathematics and Artificial Intelligence (Special Issue on Mathematics of Language), 8(1–2):107–132. Also published as DFKI Research Report RR-92-42.
Park, Jong C. 1995. Quantifier scope and constituency. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL'95), pages 205–212.
Pedersen, Ted. 2000. A simple approach to building ensembles of naïve Bayesian classifiers for word sense disambiguation. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2000), pages 63–69.
Pereira, Fernando. 1990. Categorial semantics and scoping. Computational Linguistics, 16(1):1–10.
Pollard, Carl. 1989. The syntax-semantics interface in a unification-based phrase structure grammar. In S. Busemann, C. Hauenschild, and C. Umbach, editors, Views of the Syntax/Semantics Interface, KIT-FAST Report 74. Technical University of Berlin, pages 167–185.
Pollard, Carl and Eun Jung Yoo. 1998. A unified theory of scope for quantifiers and WH-phrases. Journal of Linguistics, 34(2):415–446.
Ratnaparkhi, Adwait. 1997. A simple introduction to maximum entropy models for natural language processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
Santorini, Beatrice. 1990. Part-of-speech tagging guidelines for the Penn Treebank project. Technical Report MS-CIS-90-47, Department of Computer and Information Science, University of Pennsylvania.
Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544.
van Halteren, Hans, Jakub Zavrel, and Walter Daelemans. 2001. Improving accuracy in word class tagging through the combination of machine learning systems. Computational Linguistics, 27(2):199–229.
VanLehn, Kurt A. 1978. Determining the scope of English quantifiers. Technical Report AITR-483, Massachusetts Institute of Technology Artificial Intelligence Laboratory, Cambridge.
Woods, William A. 1986. Semantics and quantification in natural language question answering. In B. J. Grosz, K. Sparck Jones, and B. L. Webber, editors, Natural Language Processing. Kaufmann, Los Altos, California, pages 205–248.