Deterministic Coreference Resolution Based
on Entity-Centric, Precision-Ranked Rules
Heeyoung Lee∗
Stanford University
Angel Chang∗
Stanford University
Yves Peirsman∗∗
University of Leuven
Nathanael Chambers†
United States Naval Academy
Mihai Surdeanu‡
University of Arizona
Dan Jurafsky§
Stanford University
We propose a new deterministic approach to coreference resolution that combines the global
information and precise features of modern machine-learning models with the transparency
and modularity of deterministic, rule-based systems. Our sieve architecture applies a battery of
deterministic coreference models one at a time from highest to lowest precision, where each model
builds on the previous model’s cluster output. The two stages of our sieve-based architecture,
a mention detection stage that heavily favors recall, followed by coreference sieves that are
precision-oriented, offer a powerful way to achieve both high precision and high recall. Further,
our approach makes use of global information through an entity-centric model that encourages
the sharing of features across all mentions that point to the same real-world entity. Despite
its simplicity, our approach gives state-of-the-art performance on several corpora and genres,
and has also been incorporated into hybrid state-of-the-art coreference systems for Chinese and
∗ Stanford University, 450 Serra Mall, Stanford, CA 94305. E-mail: heeyoung@stanford.edu,
angelx@cs.stanford.edu.
∗∗ University of Leuven, Blijde-Inkomststraat 21 PO Box 03308, B-3000 Leuven, Belgium.
E-mail: yves.peirsman@arts.kuleuven.be.
† United States Naval Academy, 121 Blake Road, Annapolis, MD 21402. E-mail: nchamber@usna.edu.
‡ University of Arizona, PO Box 210077, Tucson, AZ 85721-0077. E-mail: msurdeanu@email.arizona.edu.
§ Stanford University, 450 Serra Mall, Stanford, CA 94305. E-mail: jurafsky@stanford.edu.
Submission received: 27 May 2012; revised submission received: 22 October 2012; accepted for publication:
20 November 2012.
doi:10.1162/COLI_a_00152
© 2013 Association for Computational Linguistics
Arabic. Our system thus offers a new paradigm for combining knowledge in rule-based systems
that has implications throughout computational linguistics.
1. Introduction
Coreference resolution, the task of finding all expressions that refer to the same entity in
a discourse, is important for natural language understanding tasks like summarization,
question answering, and information extraction.
The long history of coreference resolution has shown that the use of highly precise
lexical and syntactic features is crucial to high quality resolution (Ng and Cardie 2002b;
Lappin and Leass 1994; Poesio et al. 2004a; Zhou and Su 2004; Bengtson and Roth
2008; Haghighi and Klein 2009). Recent work has also shown the importance of global
inference—performing coreference resolution jointly for several or all mentions in a
document—rather than greedily disambiguating individual pairs of mentions (Morton
2000; Luo et al. 2004; Yang et al. 2004; Culotta et al. 2007; Yang et al. 2008; Poon and
Domingos 2008; Denis and Baldridge 2009; Rahman and Ng 2009; Haghighi and Klein
2010; Cai, Mujdricza-Maydt, and Strube 2011).
Modern systems have met this need for carefully designed features and global or
entity-centric inference with machine learning approaches to coreference resolution.
But machine learning, although powerful, has limitations. Supervised machine learning
systems rely on expensive hand-labeled data sets and generalize poorly to new words
or domains. Unsupervised systems are increasingly more complex, making them hard
to tune and difficult to apply to new problems and genres as well. Rule-based models
like Lappin and Leass (1994) were a popular early solution to the subtask of pronominal
anaphora resolution. Rules are easy to create and maintain and error analysis is more
transparent. But early rule-based systems relied on hand-tuned weights and were not
capable of global inference, two factors that led to poor performance and replacement
by machine learning.
We propose a new approach that brings together the insights of these modern
supervised and unsupervised models with the advantages of deterministic, rule-based
systems. We introduce a model that performs entity-centric coreference, where all men-
tions that point to the same real-world entity are jointly modeled, in a rich feature space
using solely simple, deterministic rules. Our work is inspired both by the seminal early
work of Baldwin (1997), who first proposed that a series of high-precision rules could
be used to build a high-precision, low-recall system for anaphora resolution, and by
more recent work that has suggested that deterministic rules can outperform machine
learning models for coreference (Zhou and Su 2004; Haghighi and Klein 2009) and for
named entity recognition (Chiticariu et al. 2010).
Figure 1 illustrates the two main stages of our new deterministic model: mention
detection and coreference resolution, as well as a smaller post-processing step. In the
mention detection stage, nominal and pronominal mentions are identified using a
high-recall algorithm that selects all noun phrases (NPs), pronouns, and named entity
mentions, and then filters out non-mentions (pleonastic it, i-within-i, numeric entities,
partitives, etc.).
The coreference resolution stage is based on a succession of ten independent coref-
erence models (or "sieves"), applied from highest to lowest precision. Precision can be
informed by linguistic intuition, or empirically determined on a coreference corpus (see
Section 4.4.3). For example, the first (highest precision) sieve links first-person pronouns
inside a quotation with the speaker of that quotation, and the tenth sieve (i.e., low precision
but high recall) implements generic pronominal coreference resolution.
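To make the control flow concrete, the following is a minimal Python sketch of the sieve cascade. The data structures (a document object with a mentions list, and sieve callables) are illustrative assumptions, not the actual Stanford implementation:

    # Minimal sketch of the precision-ranked sieve cascade (illustrative only).
    # Each sieve is a function that, given a mention and the clusters built so far,
    # either returns an antecedent mention or None (declines to resolve).

    def resolve(document, sieves):
        # Initially, every mention is a singleton cluster.
        clusters = {m: {m} for m in document.mentions}

        for sieve in sieves:                      # highest-precision sieve first
            for mention in document.mentions:     # traverse mentions left-to-right
                antecedent = sieve(mention, clusters, document)
                if antecedent is not None:
                    merged = clusters[antecedent] | clusters[mention]
                    for m in merged:              # entity-centric: members share one cluster
                        clusters[m] = merged
        return clusters

Because every member of a merged cluster points to the same set, later (lower precision) sieves can consult the attributes of all mentions already linked to an entity.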
Figure 1
The architecture of our coreference system.
Crucially, our approach is entity-centric—that is, our architecture allows each coref-
erence decision to be globally informed by the previously clustered mentions and their
shared attributes. In particular, each deterministic rule is run on the entire discourse,
using and extending clusters (i.e., groups of mentions pointing to the same real-world
entity, built by models in previous tiers). Thus, for example, in deciding whether two
mentions i and j should corefer, our system can consider not just the local features of
i and j but also any information (head word, named entity type, gender, or number)
about the other mentions already linked to i and j in previous steps.
Finalmente, the architecture is highly modular, which means that additional coreference
resolution models can be easily integrated.
The two stage architecture offers a powerful way to balance both high recall and
precision in the system and make use of entity-level information with rule-based
architecture. The mention detection stage heavily favors recall, and the following sieves
favor precision. Our results here and in our earlier papers (Raghunathan et al. 2010;
Lee et al. 2011) show that this design leads to state-of-the-art performance despite the
simplicity of the individual components, and that the lack of language-specific lexical
features makes the system easy to port to other languages. The intuition is not new; in
addition to the prior coreference work mentioned earlier and discussed in Section 6, we
draw on classic ideas that have proved to be important again and again in the history of
natural language processing. The idea of beginning with the most accurate models or
starting with smaller subproblems that allow for high-precision solutions combines the
intuitions of “shaping” or “successive approximations” first proposed for learning by
Skinner (1938), and widely used in NLP (e.g., the successively trained IBM MT models
of Brown et al. [1993] and the “islands of reliability” approaches to parsing and speech
recognition [Borghesi and Favareto 1982; Corazza et al. 1991]). The idea of beginning
with a high-recall list of candidates that are followed by a series of high-precision filters
dates back to one of the earliest architectures in natural language processing, the part of
speech tagging algorithm of the Computational Grammar Coder (Klein and Simmons
1963) and the TAGGIT tagger (Greene and Rubin 1971), which began with a high-recall
list of all possible tags for words and then used high-precision rules to filter likely tags
based on context.
In the next section we walk through an example of our system applied to a
simple made-up text. We then describe our model in detail and test its performance
on three different corpora widely used in previous work for the evaluation of
coreference resolution. We show that our model outperforms the state-of-the-art
on each corpus. Furthermore, in these sections we describe analytic and ablative
experiments demonstrating that both aspects of our algorithm (the entity-centric aspect
that allows the global sharing of features between mentions assigned to the same
cluster and the precision-based ordering of sieves) independently offer significant
improvements to coreference, perform an error analysis, and discuss the relationship
of our work to previous models and to recent hybrid systems that have used our
algorithm as a component to resolve coreference in English, Chinese, and Arabic.
2. Walking Through a Sample Coreference Resolution
Before delving into the details of our method, we illustrate the intuition behind our
approach with the simple pedagogical example listed in Table 1.
In the mention detection step, the system extracts mentions by inspecting all noun
phrases (NP) and other modifier pronouns (PRP) (see Section 3.1 for details). In Table 1,
this step identifies 11 different mentions and assigns them initially to distinct entities
(Entity id and mention id in each step are marked by superscript and subscript).
This component also extracts mention attributes—for example, John:{ne:person}, and
A girl:{gender:female, number:singular}. These mentions form the input for the
following sequence of sieves.
The first coreference resolution sieve (the speaker or quotation sieve) matches
pronominal mentions that appear in a quotation block to the corresponding speaker.
Generally, in all the coreference resolution sieves we traverse mentions left-to-right in
a given document (see Section 3.2.1). The first match for this model is my^9_9, which is
merged with John^10_10 into the same entity (entity id: 9). This illustrates the advantages
of our incremental approach: by assigning a higher priority to the quotation sieve, we
avoid linking my^9_9 with A girl^5_5, a common mistake made by generic coreference
models, since anaphoric candidates (especially in subject position) are generally preferred
to cataphoric ones (Hobbs 1978).
The next sieve searches for anaphoric antecedents that have the exact same string
as the mention under consideration. This component resolves the tenth mention, John^9_10,
by linking it with John^1_1. When searching for antecedents, we sort candidates in the same
sentential clause from left to right, and we prefer sentences that are closer to the mention
under consideration (see Section 3.2.2 for details). Thus, the sorted list of candidates for
John^9_10 is It^7_7, My favorite^8_8, My^9_9, A girl^5_5, the song^6_6, He^3_3, a new song^4_4,
John^1_1, a musician^2_2. The algorithm stops as soon as a matching antecedent is
encountered. In this case, the algorithm finds John^1_1 and does not inspect a musician^2_2.
The relaxed string match sieve searches for mentions satisfying a looser set of
string matching constraints than exact match (details in Section 3.3.3), but makes no
change because there are no such mentions. The precise constructs sieve searches for
several high-precision syntactic constructs, such as appositive relations and predicate
nominatives. In this example, there are two predicate nominative relations in the first
and fourth sentences, so this component clusters together John^1_1 and a musician^2_2,
and It^7_7 and my favorite^8_8.
Table 1
A sample run-through of our approach, applied to a made-up text. Superscript and subscript
indicate entity id and mention id.
Input:
John is a musician. He played a new song. A girl was listening to the song. "It is my
favorite," John said to her.

Mention Detection:
[John]^1_1 is [a musician]^2_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^6_6. "[It]^7_7 is [[my]^9_9 favorite]^8_8," [John]^10_10 said to [her]^11_11.

Speaker Sieve:
[John]^1_1 is [a musician]^2_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^6_6. "[It]^7_7 is [[my]^9_9 favorite]^8_8," [John]^9_10 said to [her]^11_11.

String Match:
[John]^1_1 is [a musician]^2_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^6_6. "[It]^7_7 is [[my]^1_9 favorite]^8_8," [John]^1_10 said to [her]^11_11.

Relaxed String Match:
[John]^1_1 is [a musician]^2_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^6_6. "[It]^7_7 is [[my]^1_9 favorite]^8_8," [John]^1_10 said to [her]^11_11.

Precise Constructs:
[John]^1_1 is [a musician]^1_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^6_6. "[It]^7_7 is [[my]^1_9 favorite]^7_8," [John]^1_10 said to [her]^11_11.

Strict Head Match A:
[John]^1_1 is [a musician]^1_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^7_7 is [[my]^1_9 favorite]^7_8," [John]^1_10 said to [her]^11_11.

Strict Head Match B,C:
[John]^1_1 is [a musician]^1_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^7_7 is [[my]^1_9 favorite]^7_8," [John]^1_10 said to [her]^11_11.

Proper Head Noun Match:
[John]^1_1 is [a musician]^1_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^7_7 is [[my]^1_9 favorite]^7_8," [John]^1_10 said to [her]^11_11.

Relaxed Head Match:
[John]^1_1 is [a musician]^1_2. [He]^3_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^7_7 is [[my]^1_9 favorite]^7_8," [John]^1_10 said to [her]^11_11.

Pronoun Match:
[John]^1_1 is [a musician]^1_2. [He]^1_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^4_7 is [[my]^1_9 favorite]^4_8," [John]^1_10 said to [her]^5_11.

Post Processing:
[John]^1_1 is a musician. [He]^1_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^4_7 is [my]^1_9 favorite," [John]^1_10 said to [her]^5_11.

Final Output:
[John]^1_1 is a musician. [He]^1_3 played [a new song]^4_4. [A girl]^5_5 was listening to
[the song]^4_6. "[It]^4_7 is [my]^1_9 favorite," [John]^1_10 said to [her]^5_11.
The next four sieves (strict head match A–C, proper head noun match) cluster
mentions that have the same head word with various other constraints. a new song^4_4
and the song^6_6 are linked in this step.
The last resolution component in this example addresses pronominal coreference
resolution. The three pronouns in this text, He^3_3, It^7_7, and her^11_11, are linked to their
compatible antecedents based on their attributes, such as gender, number, and animacy.
In this step we assign He^3_3 and her^11_11 to entities 1 and 5, respectively (same gender), and
It^7_7 to entity 4, which represents an inanimate concept.
The system concludes with a post-processing component, which implements
corpus-specific rules. For example, to align our output with the OntoNotes annotation
standard, we remove mentions assigned to singleton clusters (cioè., entities with a single
mention in text) and links obtained through predicate nominative patterns. Note that
even though we might remove some coreference links in this step, these links serve an
important purpose in the algorithm flow, as they allow new features to be discovered for
the corresponding entity and shared between its mentions. See Section 3.2.3 for details
on feature extraction.
3. The Algorithm
We first describe our mention detection stage, then introduce the general architecture of
the coreference stage, followed by a detailed examination of the coreference sieves. In
describing the architecture, we will sometimes find it helpful to discuss the precision of
individual components, drawn from our later experiments in Section 4.
3.1 Mention Detection
As we suggested earlier, the recall of our mention detection component is more impor-
tant than its precision. This is because for the OntoNotes corpus and for many practical
applications, any missed mentions are guaranteed to affect the final score by decreas-
ing recall, whereas spurious mentions may not impact the overall score if they are
assigned to singleton clusters, because singletons are deleted during post-processing.
Our mention detection algorithm implements this intuition via a series of simple yet
broad-coverage heuristics that take advantage of syntax, named entity recognition and
manually written patterns. Note that those patterns are built based on the OntoNotes
annotation guideline because mention detection in general depends heavily on the
annotation policy.
We start by marking all NPs, pronouns, and named entity mentions (see the named
entity tagset in Appendix A) that were not previously marked (i.e., they appear as
modifiers in other NPs) as candidate mentions. From this set of candidates we remove
the mentions that match any of the following exclusion rules:
1. We remove a mention if a larger mention with the same head word exists
(e.g., we remove The five insurance companies in The five insurance companies
approved to be established this time).
2. We discard numeric entities such as percents, money, cardinals, E
quantities (e.g., 9%, $10,000, Tens of thousands, 100 miles).

3. We remove mentions with partitive or quantifier expressions (e.g., a total of 177 projects,
none of them, millions of people).1

4. We remove pleonastic it pronouns, detected using a small set of patterns (e.g., It is
possible that . . . , It seems that . . . , It turns out . . . ). The complete set of patterns, using the
tregex2 notation, is shown in Appendix B.

5. We discard adjectival forms of nations or nationality acronyms (e.g., American, U.S.,
U.K.), following the OntoNotes annotation guidelines.

6. We remove stop words from the following list determined by error analysis on mention
detection: there, ltd., etc, 's, hmm.

1 These are NPs with the word 'of' preceded by one of nine quantifiers or 34 partitives.
2 http://nlp.stanford.edu/software/tregex.shtml.

Note that some rules change depending on the corpus we use for evaluation. In particular,
adjectival forms of nations are valid mentions in the Automated Content Extraction (ACE)
corpus (Doddington et al. 2004), thus they would not be removed when processing this
corpus.

3.2 Resolution Architecture

Traditionally, coreference resolution is implemented as a quadratic problem, where
potential coreference links between any two mentions in a document are considered. This
is not ideal, however, as it increases both the likelihood of errors and the processing time.
In this article, we argue that it is better to cautiously construct high-quality mention
clusters,3 and use an entity-centric model that allows the sharing of information across
these incrementally constructed clusters. We achieve these goals by: (a) aggressively
filtering the search space for which mention to consider for resolution (Section 3.2.1) and
which antecedents to consider for a given mention (Section 3.2.2), and (b) constructing
features from partially built mention clusters (Section 3.2.3).

3 In this article we use the terms mention cluster and entity interchangeably. We prefer the former
when discussing technical aspects of our approach and the latter in a more theoretical context.

3.2.1 Mention Selection in a Given Sieve. Recall that our model is a battery of resolution
sieves applied sequentially. Thus, in each given sieve, we have partial mention clusters
produced by the previous model. We exploit this information for mention selection, by
considering only mentions that are currently first in textual order in their cluster. For
example, given the following ordered list of mentions, {m_1^1, m_2^2, m_3^2, m_4^3, m_5^1, m_6^2},
where the superscript indicates cluster id, our model will attempt to resolve only m_2^2 and
m_4^3 (m_1^1 is not resolved because it is the first mention in a text). These two are the only
mentions that currently appear first in their respective clusters and have potential
antecedents in the document. The motivation behind this heuristic is two-fold. First, early
mentions are usually better defined than subsequent ones, which are likely to have fewer
modifiers or be pronouns (Fox 1993). Because several of our models use features extracted
from NP modifiers, it is important to prioritize mentions that include such information.
Second, by definition, first mentions appear closer to the beginning of the document, hence
there are fewer antecedent candidates to select from, and thus fewer opportunities to make
a mistake.
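A minimal sketch of this selection step, using the same illustrative data structures as the earlier cascade sketch (mentions are given in textual order and clusters maps each mention to its current cluster):

    def mentions_to_resolve(mentions, clusters):
        """Keep only mentions that are currently first (in textual order) in their
        cluster and that have at least one earlier mention to link to."""
        selected = []
        for i, mention in enumerate(mentions):            # mentions in textual order
            earlier_in_cluster = any(mentions.index(other) < i
                                     for other in clusters[mention]
                                     if other is not mention)
            if i > 0 and not earlier_in_cluster:          # document-initial mention is skipped
                selected.append(mention)
        return selected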
We further prune the search space using a simple model of discourse salience. We disable
coreference for mentions appearing first in their corresponding clusters that: (a) are or start
with indefinite pronouns (e.g., some, other), (b) start with indefinite articles (e.g., a, an), or
(c) are bare plurals. One exception to (a) and (b) is the model deployed in the Exact String
Match sieve, which only links mentions if their entire extents match exactly (see
Section 3.3.2). This model is triggered for all nominal mentions regardless of discourse
salience, because it is possible that indefinite mentions are repeated in a document when
concepts are discussed but not instantiated, e.g., a sports bar in the following:

Hanlon, a longtime Broncos fan, thinks it is the perfect place for a sports bar and has put up a
blue-and-orange sign reading, "Wanted Broncos Sports Bar On This Site." . . . In a Nov. 28 letter,
Proper states "while we have no objection to your advertising the property as a location for a
sports bar, using the Broncos' name and colors gives the false impression that the bar is or can be
affiliated with the Broncos."

3.2.2 Antecedent Selection for a Given Mention. Given a mention m_i, each model may either
decline to propose a solution (in the hope that one of the subsequent models will solve it)
or deterministically select a single best antecedent from a list of previous mentions
m_1, . . . , m_{i−1}. We sort candidate antecedents using syntactic information provided by
the Stanford parser. Candidates are sorted using the following criteria:

• In a given sentential clause (i.e., parser constituents whose label starts with S), candidates
are sorted using a left-to-right breadth-first traversal of the corresponding syntactic
constituent (Hobbs 1978), as sketched in the code after this list. Figure 2 shows an example
of candidate ordering based on this traversal. The left-to-right ordering favors subjects,
which tend to appear closer to the beginning of the sentence and are more probable
antecedents. The breadth-first traversal promotes syntactic salience by preferring noun
phrases that are closer to the top of the parse tree (Haghighi and Klein 2009).

• If the sentence containing the anaphoric mention contains multiple clauses, we repeat the
previous heuristic separately in each S* constituent, starting with the one containing the
mention.

• Clauses in previous sentences are sorted based on their textual proximity to the anaphoric
mention.

Figure 2
Example of left-to-right breadth-first tree traversal. The numbers indicate the order in which
the NPs are visited.
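The within-clause ordering in the first criterion can be sketched as a breadth-first traversal that emits NP nodes in the order they are visited; the label and children accessors below are assumed parse-tree attributes, not a specific parser API:

    from collections import deque

    def candidate_order(clause_root):
        """Left-to-right breadth-first traversal of a sentential clause, returning NP
        constituents in the order they should be tried as antecedents (higher and
        earlier NPs first)."""
        order, queue = [], deque([clause_root])
        while queue:
            node = queue.popleft()
            if node.label == "NP":
                order.append(node)
            queue.extend(node.children)           # children are stored left-to-right
        return order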
The sorting of antecedent candidates is important because our algorithm stops at the first
match. Thus, low-quality sorting negatively impacts the actual coreference links created.
This antecedent selection algorithm applies to all the coreference resolution sieves described
in this article, with the exception of the speaker identification sieve (Section 3.3.1) and the
sieve that applies appositive and predicate nominative patterns (Section 3.3.4).

3.2.3 Feature Sharing in the Entity-Centric Model. In a significant departure from previous
work, each model in our framework gets (possibly incomplete) entity information for each
mention from the clusters constructed by the earlier coreference models. In other words,
each mention m_i may already be assigned to an entity E_j containing a set of mentions:
E_j = {m_1^j, . . . , m_k^j}; m_i ∈ E_j. Unassigned mentions are unique members of their own
cluster. We use this information to share information between same-entity mentions.

This is especially important for pronominal coreference resolution (discussed later in this
section), which can be severely affected by missing attributes (which introduce precision
errors because incorrect antecedents are selected due to missing information) and incorrect
attributes (which introduce recall errors because correct links are not generated due to
attribute mismatch between mention and antecedent). To address this issue, we perform a
union of all mention attributes (e.g., number, gender, animacy) for a given entity and share
the result with all corresponding mentions. If attributes from different mentions contradict
each other we maintain all variants. For example, our naive number detection assigns
singular to the mention a group of students and plural to five students. When these
mentions end up in the same cluster, the resulting number attribute becomes the set
{singular, plural}. Thus this cluster can later be merged with both singular and plural
pronouns.

3.3 Coreference Resolution Sieves

We describe next the sequence of coreference models proposed in this article. Table 2 lists
all these models in the order in which they are applied. We discuss their individual
contribution to the overall system later, in Section 4.4.3.

Table 2
Sequence of sieves as they are applied in the overall model.

Sequence      Model Name
Pass 1        Speaker Identification Sieve
Pass 2        Exact String Match Sieve
Pass 3        Relaxed String Match Sieve
Pass 4        Precise Constructs Sieve (e.g., appositives)
Passes 5–7    Strict Head Match Sieves A–C
Pass 8        Proper Head Noun Match Sieve
Pass 9        Relaxed Head Match Sieve
Pass 10       Pronoun Resolution Sieve

3.3.1 Pass 1 – Speaker Identification. This sieve matches speakers to compatible pronouns,
using shallow discourse understanding to handle quotations and conversation transcripts,
following the early work of Baldwin (1995, 1997). We begin by identifying speakers within
text. In non-conversational text, we use a simple heuristic that searches for the subjects of
reporting verbs (e.g., say) in the same sentence or neighboring sentences to a quotation. In
conversational text, speaker information is provided in the data set.
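For non-conversational text, the reporting-verb heuristic can be sketched as follows; the verb list and the token and dependency accessors are illustrative placeholders rather than the exact resources used by the system:

    REPORTING_VERBS = {"say", "tell", "according"}        # illustrative, not the full list

    def guess_speaker(quote_sentence, neighbor_sentences):
        """Return the subject of a reporting verb in the same or a neighboring
        sentence as a quotation, if one can be found."""
        for sentence in [quote_sentence] + neighbor_sentences:
            for token in sentence.tokens:
                if token.lemma in REPORTING_VERBS:
                    subjects = [d for d in token.dependents if d.relation == "nsubj"]
                    if subjects:
                        return subjects[0].word
        return None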
The extracted speakers then allow us to implement the following sieve heuristics:

• ⟨I⟩s4 assigned to the same speaker are coreferent.
• ⟨you⟩s with the same speaker are coreferent.
• The speaker and ⟨I⟩s in her text are coreferent.

Thus for example I, my, and she in the following sentence are coreferent: "[I] voted for
[Nader] because [he] was most aligned with [my] values," [she] said.

In addition to this sieve, we impose speaker constraints on decisions made by subsequent
sieves:

• The speaker and a mention which is not ⟨I⟩ in the speaker's utterance cannot be coreferent.
• Two ⟨I⟩s (or two ⟨you⟩s, or two ⟨we⟩s) assigned to different speakers cannot be coreferent.
• Two different person pronouns by the same speaker cannot be coreferent.
• Nominal mentions cannot be coreferent with ⟨I⟩, ⟨you⟩, or ⟨we⟩ in the same turn or
quotation.
• In conversations, ⟨you⟩ can corefer only with the previous speaker.

The constraints result in causing [my] and [he] to not be coreferent in the earlier example
(due to the third constraint).

4 We define ⟨I⟩ as I, my, me, or mine, ⟨we⟩ as first person plural pronouns, and ⟨you⟩ as second
person pronouns.

3.3.2 Pass 2 – Exact Match. This model links two mentions only if they contain exactly the
same extent text, including modifiers and determiners (e.g., [the Shahab 3 ground-ground
missile] and [the Shahab 3 ground-ground missile]). As expected, this model is very
precise, with a precision over 90% B3 (see Table 8 in Section 4.4.3).

3.3.3 Pass 3 – Relaxed String Match. This sieve considers two nominal mentions as
coreferent if the strings obtained by dropping the text following their head words (such as
relative clauses and PP and participial postmodifiers) are identical (e.g., [Clinton] and
[Clinton, whose term ends in January]).

3.3.4 Pass 4 – Precise Constructs. This model links two mentions if any of the following
conditions are satisfied:

• Appositive – the two nominal mentions are in an appositive construction (e.g., [Israel's
Deputy Defense Minister], [Ephraim Sneh], said . . . ). We use the standard Haghighi and
Klein (2009) definition to detect appositives: third children of a parent NP whose
expansion begins with (NP , NP), when there is not a conjunction in the expansion.

• Predicate nominative – the two mentions (nominal or pronominal) are in a copulative
subject–object relation (e.g., [The New York-based College Board] is [a nonprofit
organization that administers the SATs and promotes higher education] [Poon and
Domingos 2008]).

• Role appositive – the candidate antecedent is headed by a noun and appears as a modifier
in an NP whose head is the current mention (e.g., [[actress] Rebecca Schaeffer]). This
feature is inspired by Haghighi and Klein (2009), who triggered it only if the mention is
labeled as a person by the Stanford named entity recognizer (NER).
We constrain this heuristic more in our work: we allow this feature to match only if:
(a) the mention is labeled as a person, (b) the antecedent is animate (we detail animacy
detection in Section 3.3.9), and (c) the antecedent's gender is not neutral.

• Relative pronoun – the mention is a relative pronoun that modifies the head of the
antecedent NP (e.g., [the finance street [which] has already formed in the Waitan district]).

• Acronym – both mentions are tagged as NNP and one of them is an acronym of the other
(e.g., [Agence France Presse] . . . [AFP]). Our acronym detection algorithm marks a mention
as an acronym of another if its text equals the sequence of upper case characters in the
other mention. The algorithm is simple, but our error analysis suggests it nonetheless does
not lead to errors.

• Demonym5 – one of the mentions is a demonym of the other (e.g., [Israel] . . . [Israeli]).
For demonym detection we use a static list of countries and their gentilic forms from
Wikipedia.6

5 Demonym is not annotated in OntoNotes but we keep it in the system.
6 http://en.wikipedia.org/wiki/List_of_adjectival_and_demonymic_forms_of_place_names.

All of these constructs are very precise; we show in Section 4.4.3 that the B3 precision of
the overall model after adding this sieve is approximately 90%. In the OntoNotes corpus,
this sieve does not enhance recall significantly, mainly because appositions and predicate
nominatives are not annotated in this corpus (they are annotated in ACE). Regardless of
annotation standard, however, this sieve is important because it grows entities with high
quality elements, which has a significant impact on the entity's features (as discussed in
Section 3.2.3).

3.3.5 Pass 5 – Strict Head Match. Linking a mention to an antecedent based on the naive
matching of their head words generates many spurious links because it completely ignores
possibly incompatible modifiers (Elsner and Charniak 2010). For example, Yale University
and Harvard University have similar head words, but they are obviously different entities.
To address this issue, this pass implements several constraints that must all be matched in
order to yield a link:

• Entity head match – the mention head word matches any head word of mentions in the
antecedent entity. Note that this feature is actually more relaxed than naive head matching
in a pair of mentions because here it is satisfied when the mention's head matches the
head of any mention in the candidate entity. We constrain this feature by enforcing a
conjunction with the following features.

• Word inclusion – all the non-stop7 words in the current entity to be solved are included in
the set of non-stop words in the antecedent entity. This heuristic exploits the discourse
property that states that it is uncommon to introduce novel information in later mentions
(Fox 1993). Typically, mentions of the same entity become shorter and less informative as
the narrative progresses. For example, based on this constraint, the model correctly
clusters together the two mentions in the following text:

. . . intervene in the [Florida Supreme Court]'s move . . .
does look like very dramatic change made by [the Florida court]

and avoids clustering the two mentions in the following text:

The pilot had confirmed . . . he had turned onto [the correct runway] but pilots behind him say
he turned onto [the wrong runway].

• Compatible modifiers only – the mention's modifiers are all included in the modifiers of
the antecedent candidate. This feature models the same discourse property as the previous
feature, but it focuses on the two individual mentions to be linked, rather than their
corresponding entities. For this feature we only use modifiers that are nouns or adjectives.

• Not i-within-i – the two mentions are not in an i-within-i construct, that is, one cannot be
a child NP in the other's NP constituent (Chomsky 1981).

7 Our stopword list includes person titles as well.

This pass continues to maintain high precision (over 86% B3) while improving recall
significantly (approximately 4.5 B3 points).

3.3.6 Passes 6 and 7 – Variants of Strict Head Match. Sieves 6 and 7 are different relaxations
of the feature conjunction introduced in Pass 5, that is, Pass 6 removes the compatible
modifiers only feature, and Pass 7 removes the word inclusion constraint. All in all, these
two passes yield an improvement of 0.9 B3 F1 points, due to recall improvements. Table 8
in Section 4.4.3 shows that the word inclusion feature is more precise than compatible
modifiers only, but the latter has better recall.

3.3.7 Pass 8 – Proper Head Word Match. This sieve marks two mentions headed by proper
nouns as coreferent if they have the same head word and satisfy the following constraints:

• Not i-within-i – same as in Pass 5.

• No location mismatches – the modifiers of two mentions cannot contain different location
named entities, other proper nouns, or spatial modifiers. For example, [Lebanon] and
[southern Lebanon] are not coreferent.

• No numeric mismatches – the second mention cannot have a number that does not appear
in the antecedent, e.g., [people] and [around 200 people] are not coreferent.

3.3.8 Pass 9 – Relaxed Head Match. This pass relaxes the entity head match heuristic by
allowing the mention head to match any word in the antecedent entity. For example, this
heuristic matches the mention Sanders to an entity containing the mentions {Sauls, the
judge, Circuit Judge N. Sanders Sauls}. To maintain high precision, this pass requires that
both mention and antecedent be labeled as named entities and that the types coincide.
Furthermore, this pass implements a conjunction of the given features with word inclusion
and not i-within-i. This pass yields less than 0.4 point improvement in most metrics.

3.3.9 Pass 10 – Pronominal Coreference Resolution. With one exception (Pass 1), all the
previous coreference models focus on nominal coreference resolution. It would be incorrect
to say that our framework ignores pronominal coreference in the previous passes, however.
In fact, the previous models prepare the stage for pronominal coreference by constructing
precise entities with shared mention attributes. These are crucial factors for pronominal
coreference.
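A hedged sketch of how the entity-level attribute pooling of Section 3.2.3 can feed agreement checks between a pronoun's cluster and a candidate antecedent's cluster; the attributes field and the use of "unknown" as a wildcard are assumptions made for illustration:

    def cluster_attribute(cluster, name):
        """Union of one attribute (e.g., 'number', 'gender', 'animacy') over all
        mentions already assigned to the cluster; contradictory values are all kept."""
        values = set()
        for mention in cluster:
            values.update(mention.attributes.get(name, {"unknown"}))
        return values

    def agree(cluster_a, cluster_b, name):
        """Clusters agree on an attribute if their value sets overlap, or if either
        side is unknown (a missing value is treated as compatible with anything)."""
        a = cluster_attribute(cluster_a, name)
        b = cluster_attribute(cluster_b, name)
        return "unknown" in a or "unknown" in b or bool(a & b)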
We implement pronominal coreference resolution using an approach standard for many
decades: enforcing agreement constraints between the coreferent mentions. We use the
following attributes for these constraints:

• Number – we assign number attributes based on: (a) a static list for pronouns; (b) NER
labels: mentions marked as a named entity are considered singular with the exception of
organizations, which can be both singular and plural; (c) part of speech tags: NN*S tags are
plural and all other NN* tags are singular; and (d) a static dictionary from Bergsma and
Lin (2006).

• Gender – we assign gender attributes from static lexicons from Bergsma and Lin (2006),
and Ji and Lin (2009).

• Person – we assign person attributes only to pronouns. We do not enforce this constraint
when linking two pronouns, however, if one appears within quotes. This is a simple
heuristic for speaker detection (e.g., I and she point to the same person in "[I] voted my
conscience," [she] said).

• Animacy – we set animacy attributes using: (a) a static list for pronouns; (b) NER labels
(e.g., PERSON is animate whereas LOCATION is not); and (c) a dictionary bootstrapped
from the Web (Ji and Lin 2009).

• NER label – from the Stanford NER.

• Pronoun distance – sentence distance between a pronoun and its antecedent cannot be
larger than 3.

When we cannot extract an attribute, we set the corresponding value to unknown and treat
it as a wildcard—that is, it can match any other value. As expected, pronominal coreference
resolution has a big impact on the overall score (e.g., 5 B3 F1 points in the development
partition of OntoNotes).

3.4 Post Processing

This step implements several transformations required to guarantee that our output
matches the annotation specification in the corresponding corpus. Currently this step is
deployed only for the OntoNotes corpus and it contains the following two operations:

• We discard singleton clusters.

• We discard the shorter mentions in appositive patterns and the mentions that appear later
in text in copulative relations. For example, in the text [[Yongkang Zhou], the general
manager] or [Mr. Savoca] had been [a consultant . . . ], the mentions Yongkang Zhou and
a consultant . . . are removed in this stage.

4. Experimental Results

We start this section with overall results on three corpora widely used for the evaluation of
coreference resolution systems. We continue with a series of ablative experiments that
analyze the contribution of each aspect of our approach and conclude with error analysis,
which highlights cases currently not solved by our approach.

4.1 Corpora

We used the following corpora for development and formal evaluation:

• OntoNotes-Dev – development partition of OntoNotes v4.0 provided in the CoNLL-2011
shared task (Pradhan et al. 2011).

• OntoNotes-Test – test partition of OntoNotes v4.0 provided in the CoNLL-2011 shared
task.

• ACE2004-Culotta-Test – partition of the ACE 2004 corpus reserved for testing by several
previous studies (Culotta et al. 2007; Bengtson and Roth 2008; Haghighi and Klein 2009).
• ACE2004-nwire – newswire subset of the ACE 2004 corpus, utilized by Poon and
Domingos (2008) and Haghighi and Klein (2009) for testing.

• MUC6-Test – test corpus from the sixth Message Understanding Conference (MUC-6)
evaluation.

The corpora statistics are shown in Table 3. We used the first corpus (OntoNotes-Dev) for
development and all others for the formal evaluation. We parsed all documents in the ACE
and MUC corpora using the Stanford parser (Klein and Manning 2003) and the Stanford
NER (Finkel, Grenager, and Manning 2005). We used the provided parse trees and named
entity labels (not gold) in the OntoNotes corpora to facilitate the comparison with other
systems.

Table 3
Corpora statistics.

Corpora               # Documents  # Sentences  # Words  # Entities  # Mentions
OntoNotes-Dev         303          6,894        136K     3,752       14,291
OntoNotes-Test        322          8,262        142K     3,926       16,291
ACE2004-Culotta-Test  107          1,993        33K      2,576       5,455
ACE2004-nwire         128          3,594        74K      4,762       11,398
MUC6-Test             30           576          13K      496         2,136

4.2 Evaluation Metrics

We use five evaluation metrics widely used in the literature. B3 and CEAF have
implementation variations in how to take system mentions into account. We followed the
same implementation as used in the CoNLL-2011 shared task.

• MUC (Vilain et al. 1995) – link-based metric which measures how many predicted and
gold mention clusters need to be merged to cover the gold and predicted clusters,
respectively. R = Σ_i (|G_i| − |p(G_i)|) / Σ_i (|G_i| − 1), where G_i is a gold mention cluster
and p(G_i) is the partition of G_i induced by the system clusters;
P = Σ_i (|S_i| − |p(S_i)|) / Σ_i (|S_i| − 1), where S_i is a system mention cluster and p(S_i) is
the partition of S_i induced by the gold clusters; F1 = 2PR / (P + R).

• B3 (Bagga and Baldwin 1998) – mention-based metric which measures the proportion of
overlap between predicted and gold mention clusters for a given mention. When G_{m_i} is
the gold cluster of mention m_i and S_{m_i} is the system cluster of mention m_i,
R = Σ_i |G_{m_i} ∩ S_{m_i}| / |G_{m_i}|, P = Σ_i |G_{m_i} ∩ S_{m_i}| / |S_{m_i}|, and
F1 = 2PR / (P + R).

• CEAF (Constrained Entity Aligned F-measure) (Luo 2005) – metric based on entity
alignment. For the best alignment g* = argmax_{g ∈ G_m} Φ(g) (where Φ(g) is the total
similarity of g, a one-to-one mapping from G, the gold mention clusters, to S, the system
mention clusters), R = Φ(g*) / Σ_i φ(G_i, G_i), P = Φ(g*) / Σ_i φ(S_i, S_i), and
F1 = 2PR / (P + R). If we use φ(G, S) = |G ∩ S|, it is called mention-based CEAF
(CEAF-φ3); if we use φ(G, S) = 2|G ∩ S| / (|G| + |S|), it is called entity-based CEAF
(CEAF-φ4).

• BLANC (BiLateral Assessment of NounPhrase Coreference) (Recasens and Hovy 2011) –
metric applying the Rand index (Rand 1971) to coreference to deal with the imbalance
between singletons and coreferent mentions by considering coreference and
non-coreference links. P_c = r_c / (r_c + w_c), R_c = r_c / (r_c + w_n),
F_c = 2 P_c R_c / (P_c + R_c); P_n = r_n / (r_n + w_n), R_n = r_n / (r_n + w_c),
F_n = 2 P_n R_n / (P_n + R_n); BLANC = (F_c + F_n) / 2, where r_c is the number of correct
coreference links, w_c the number of incorrect coreference links, r_n the number of correct
non-coreference links, and w_n the number of incorrect non-coreference links.

• CoNLL F1 – average of the MUC, B3, and CEAF-φ4 F1 scores. This was the official metric
in the CoNLL-2011 shared task (Pradhan et al. 2011).
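As a concrete illustration of the link-based MUC definition above, the following sketch computes MUC recall, precision, and F1 from gold and system clusterings given as lists of sets of mention ids (a toy illustration, not the official scorer):

    def muc_score(gold_clusters, system_clusters):
        """MUC (Vilain et al. 1995): recall and precision count, for each cluster,
        the links that survive after partitioning it by the other clustering."""
        def num_partitions(cluster, other_clustering):
            parts = set()
            for mention in cluster:
                home = next((i for i, c in enumerate(other_clustering) if mention in c), None)
                parts.add(home if home is not None else ("singleton", mention))
            return len(parts)

        r_num = sum(len(g) - num_partitions(g, system_clusters) for g in gold_clusters)
        r_den = sum(len(g) - 1 for g in gold_clusters)
        p_num = sum(len(s) - num_partitions(s, gold_clusters) for s in system_clusters)
        p_den = sum(len(s) - 1 for s in system_clusters)
        r = r_num / r_den if r_den else 0.0
        p = p_num / p_den if p_den else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return r, p, f1

For instance, muc_score([{1, 2, 3}], [{1, 2}, {3}]) returns approximately (0.5, 1.0, 0.67), reflecting one missing link out of two in the gold cluster.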
4.3 Experimental Results

Tables 4 and 5 compare the performance of our system with other state-of-the-art systems
in the CoNLL-2011 shared task and the ACE and MUC corpora, respectively. For the
CoNLL-2011 shared task we report results in the closed track, which did not allow the use
of external resources, and the open track, which allowed any other
resources. For the closed track, the organizers provided dictionaries for gender and number
information, in addition to parse trees and named entity labels (Pradhan et al. 2011). For
the open track, we used the following additional resources: (a) a hand-built list of genders
of first names that we created, incorporating frequent names from census lists and other
sources (Vogel and Jurafsky 2012), (b) an animacy list (Ji and Lin 2009), (c) a country and
state gazetteer, and (d) a demonym list. These resources were also used for the results
reported in Table 5.

Table 4
Performance of the top systems in the CoNLL-2011 shared task. All these systems use
automatically detected mentions. We report results for both the closed and the open tracks,
which allowed the use of resources not provided by the task organizers. MD indicates mention
detection, and gold boundaries indicate that mention boundary information is given. The table
reports R, P, and F1 for MD, MUC, B3, CEAF-φ4, and BLANC, plus the CoNLL F1, for the
following systems.
Closed Track: This paper, Sapena, Chang, Nugues, Santos, Song, Stoyanov, Sobha, Kobdani,
Zhou, Charton, Yang, Hao, Xinxin, Zhang, Kummerfeld, Zhekova, Irwin.
Open Track: This paper, Cai, Uryupina, Klenner, Irwin.
Closed Track – gold boundaries: This paper, Nugues, Chang, Santos, Kobdani, Stoyanov,
Zhang, Song, Zhekova.
[Per-system scores omitted here; see the published table.]

A significant difference between Tables 4 and 5 is that in the former (other than its last
block) we used predicted mentions (detected with the algorithm described in Section 3.1),
whereas in the latter we used gold mentions. The only reason for this distinction is to
facilitate comparison with previous work (all systems listed in Table 5 used gold mention
boundaries).

The two tables show that, regardless of evaluation corpus and methodology, our system
generally outperforms the previous state of the art. In the CoNLL shared task, our system
scores 1.8 CoNLL F1 points higher than the next system in the closed track and 2.6 points
higher than the second-ranked system in the open track. The Chang et al. (2011) system
has marginally higher B3 and BLANC F1 scores, but does not outperform our model on the
other two metrics and the average F1 score. Table 5 shows that our model has higher B3 F1
scores than all the other models in the two ACE corpora. The model of Haghighi and Klein
(2009) minimally outperforms ours by 0.6 B3 F1 points in the MUC corpus. All in all, these
results prove that our approach compares favorably with a wide range of models, which
include most aspects deemed important for coreference resolution, among other things,
supervised learning using rich feature sets (Sapena, Padró, and Turmo 2011; Chang et al.
2011), joint inference using spectral clustering (Cai, Mujdricza-Maydt, and Strube 2011),
and deterministic rule-based models (Haghighi and Klein 2009). We discuss in more detail
the similarities and differences between our approach and previous work in Section 6.

Table 4 shows that using additional resources yields minimal improvement: there is a
difference of only 0.5 CoNLL F1 points between our open-track and closed-track systems.
We show in Section 5 that the explanation of this modest improvement is that most of the
remaining errors require complex, context-sensitive semantics to be solved. Such semantic
models cannot be built with our shallow feature set that relies on simple semantic
dictionaries (e.g., animacy or even hyponymy).
It is not trivial to compare the mention detection system alone because its score is affected
by the performance of the coreference resolution model. For example, even if we start with
a perfect set of gold mentions, if we miss all coreference relations in a text, every mention
will remain as a singleton and will be removed by the OntoNotes post-processing, resulting
in zero mentions in the final output. Therefore, we included the score using gold mention
boundaries in the last part of Table 4 ("Closed Track – gold boundaries") to isolate the
performance of the coreference resolution component. This experiment shows that our
system outperforms the others with a considerable margin, demonstrating that our
coreference resolution model, rather than the mention detection component, is the one
responsible for the overall performance.

Table 5
Comparison of our system with the other reported results on the ACE and MUC corpora. All
these systems use gold mention boundaries.

                                  MUC                 B3
System                            R     P     F1      R     P     F1

ACE2004-Culotta-Test
This paper                        70.2  82.7  75.9    74.5  88.7  81.0
Haghighi and Klein (2009)         77.7  74.8  79.6    78.5  79.6  79.0
Culotta et al. (2007)             –     –     –       73.2  86.7  79.3
Bengtson and Roth (2008)          69.9  82.7  75.8    74.5  88.3  80.8

ACE2004-nwire
This paper                        75.1  84.6  79.6    74.1  87.3  80.2
Haghighi and Klein (2009)         75.9  77.0  76.5    74.5  79.4  76.9
Poon and Domingos (2008)          70.5  71.3  70.9    –     –     –
Finkel and Manning (2008)         58.5  78.7  67.1    65.2  86.8  74.5

MUC6-Test
This paper                        69.1  90.6  78.4    63.1  90.6  74.4
Haghighi and Klein (2009)         77.3  87.2  81.9    67.3  84.7  75.0
Poon and Domingos (2008)          75.8  83.0  79.2    –     –     –
Finkel and Manning (2008)         55.1  89.7  68.3    49.7  90.9  64.3

4.4 Analysis

In this section, we present a series of analytic and ablative experiments that demonstrate
that both aspects of our algorithm (the entity-centric approach and the multi-pass model
with precision-ordered sieves) independently offer significant improvements to coreference.
We also analyze the contribution of each proposed sieve and of the features deployed in
our model. We conclude with an experiment that measures the performance drop as we
move from an oracle system that uses gold information for mention boundaries, syntactic
analysis, and named entity labels, to the actual system where all this information is
predicted. For all the experiments reported here we used the OntoNotes-Dev corpus.

4.4.1 Contribution of the Entity-Centric Model. Table 6 shows the impact of our
entity-centric approach, which enables the sharing of features between mentions assigned
to the same cluster (detailed in Section 3.2.3). As a baseline, we use a typical mention-pair
model where this sharing is disabled. That is, when two mentions are compared, this
model uses only the features that were extracted from the corresponding textual extents.
The table shows that feature sharing has a considerable impact on all evaluation metrics,
with an overall contribution of approximately 3.4 CoNLL F1 points. This is further proof
that an entity-centric approach is beneficial for coreference resolution. As an illustration,
the following text shows an example where the incorrect decision is taken if feature
sharing is disabled:

This was the best result of a Chinese gymnast in 4 days of competition. . . .
It was the best result for Greek gymnasts since they began taking part in gymnastic internationals.

In the example text, the mention-pair model incorrectly links This and It, because all the features that can be extracted locally are compatible (e.g., number is singular for both pronouns). On the other hand, the entity-centric model avoids this decision because, in a previous sieve driven by predicate nominative relations, these pronouns were each linked to incompatible noun phrases, i.e., the best result of a Chinese gymnast and the best result for Greek gymnasts.

Table 6
Comparison of our entity-centric model against a baseline that handles mention pairs independently. The former model shares mention features across entities as they are constructed; the latter model does not.

                 MUC R/P/F1        B3 R/P/F1         CEAF-φ4 R/P/F1    BLANC R/P/F1      CoNLL F1
Entity-centric   60.0/60.9/60.3    68.6/73.3/70.9    47.5/46.2/46.9    73.5/79.3/76.0    59.3
Mention-pair     61.4/51.1/55.8    73.2/64.3/68.5    39.1/48.8/43.4    74.6/74.1/74.3    55.9

4.4.2 Impact of the Multi-Pass Model. Table 7 shows the contribution of our multi-pass model. We compare this model with a single-pass baseline, which uses the same sieves as the multi-pass system but applies all of them at the same time. That is, for each mention under consideration, we select the first antecedent that matches any of the available sieves. This experiment shows that our multi-pass model, which sorts and deploys sieves using precision-based ordering, yields improvements across the board, with an overall improvement of more than 6 CoNLL F1 points.

Table 7
Impact of the multi-pass model. The single-pass baseline uses the same sequence of sieves as the multi-pass model (i.e., all the sieves introduced in Section 3 with the exception of the optional ones), but it applies all of them at the same time.

                 MUC R/P/F1        B3 R/P/F1         CEAF-φ4 R/P/F1    BLANC R/P/F1      CoNLL F1
Multi-pass       59.6/60.9/60.3    68.6/73.3/70.9    47.5/46.2/46.9    73.5/79.3/76.0    59.3
Single-pass      44.7/63.1/52.3    55.1/80.1/65.3    51.2/34.8/41.5    64.2/78.4/68.5    53.0

This multi-pass model goes hand-in-hand with the entity-centric approach. That is, the higher the quality of the mention clusters built by the previous sieves, the better the features extracted from these clusters will be in the current sieve; better features, of course, drive better clustering decisions in the next sieve, and so on. This incremental process is highlighted in the given example: because the sieve based on predicate nominative patterns runs before pronominal coreference resolution, the two pronouns under consideration have additional, high-quality features that block the incorrect clustering decision.
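To make the contrast between the two settings concrete, the following minimal sketch (hypothetical names and simplified logic, not our released implementation) shows the control flow of a precision-ordered, entity-centric sieve architecture: each sieve makes a full pass over the document, and every decision can consult the entity-level attributes accumulated by the higher-precision sieves that ran before it.

# Minimal sketch of a multi-pass, entity-centric sieve loop (hypothetical names;
# the sieves shown are toy stand-ins for the real ones).

class Cluster:
    def __init__(self, mention):
        self.mentions = [mention]
        self.attributes = dict(mention["attributes"])   # entity-level, shared features

    def merge(self, other):
        self.mentions.extend(other.mentions)
        self.attributes.update(other.attributes)        # feature sharing across mentions

def resolve(mentions, sieves_in_precision_order):
    clusters = {i: Cluster(m) for i, m in enumerate(mentions)}   # start as singletons
    cluster_of = {i: i for i in range(len(mentions))}            # mention index -> cluster id

    for sieve in sieves_in_precision_order:      # one full pass per sieve, most precise first
        for i in range(len(mentions)):
            for j in range(i - 1, -1, -1):       # candidate antecedents, closest first
                a, b = cluster_of[j], cluster_of[i]
                if a != b and sieve(clusters[a], clusters[b]):
                    clusters[a].merge(clusters[b])
                    for k, cid in cluster_of.items():
                        if cid == b:
                            cluster_of[k] = a
                    break                        # cautious: at most one link per mention per pass
    return clusters, cluster_of

# Toy sieves, ordered from higher to lower precision.
def exact_string_match(antecedent, current):
    return any(m["text"] == n["text"] for m in antecedent.mentions for n in current.mentions)

def pronoun_match(antecedent, current):
    # Can consult attributes accumulated by earlier, more precise sieves.
    return (current.mentions[0]["attributes"].get("pos") == "pronoun"
            and antecedent.attributes.get("number") == current.attributes.get("number"))

sieves = [exact_string_match, pronoun_match]

Under this sketch, the single-pass baseline of Table 7 corresponds to collapsing the outer loop into a single pass and accepting the first antecedent that satisfies any of the sieves.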
4.4.3 Contribution of Individual Sieves. Table 8 lists the performance of our system as the ten sieves are incrementally added. This table illustrates our tuning process, which allowed us to deploy the sieves in descending order of their precision.

Table 8
Cumulative performance as sieves are added to the system.

              MUC R/P/F1        B3 R/P/F1         CEAF-φ4 R/P/F1    BLANC R/P/F1      CoNLL F1
Sieve 1        8.7/72.7/15.5    32.4/96.4/48.5    50.6/15.4/23.7    57.2/80.3/60.2    29.2
+ Sieve 2     29.5/71.8/41.9    46.4/90.4/61.4    51.8/23.8/32.6    63.0/82.2/67.8    45.3
+ Sieve 3     29.7/71.2/41.9    46.7/90.1/61.5    51.6/24.0/32.7    63.0/82.0/67.8    45.4
+ Sieve 4     30.2/71.0/42.3    47.1/89.9/61.8    51.5/24.1/32.9    63.2/81.7/68.0    45.7
+ Sieve 5     34.4/66.1/45.2    51.5/86.6/64.6    50.8/27.6/35.8    64.1/80.8/68.8    48.5
+ Sieve 6     34.9/65.8/45.6    51.9/86.1/64.8    50.4/27.8/35.9    64.2/80.6/68.9    48.8
+ Sieve 7     35.8/64.0/45.9    53.3/85.0/65.5    49.8/28.9/36.6    64.4/80.3/69.1    49.3
+ Sieve 8     36.2/63.5/46.1    53.7/84.5/65.7    49.4/29.1/36.6    64.6/79.9/69.2    49.5
+ Sieve 9     36.7/63.2/46.5    54.2/84.0/65.9    49.2/29.4/36.8    64.7/79.5/69.2    49.7
+ Sieve 10    59.6/60.9/60.3    68.6/73.3/70.9    47.5/46.2/46.9    73.5/79.3/76.0    59.3

With respect to individual contributions, this analysis highlights three significant performance increases. The first is caused by Sieve 2, exact string match. This sieve accounts for an improvement of approximately 16 CoNLL F1 points, which shows that a significant percentage of mentions in text are indeed repetitions of previously seen concepts. The second big jump in performance, of almost 3 CoNLL F1 points, is caused by Sieve 5, strict head match, which is the first pass that compares individual headwords. These results are consistent with error analyses from earlier work, which have shown the importance of string match in general (Zhou and Su 2004; Bengtson and Roth 2008; Recasens, Can, and Jurafsky 2013) and the high precision of strict head match (Recasens and Hovy 2010). Lastly, pronominal coreference resolution (Sieve 10) is responsible for an improvement of approximately 9.5 CoNLL F1 points.

Thus it would be possible to build an even simpler system, with just three sieves, that achieves 97% of the performance of our best model (based on the CoNLL score). This suggests that what is most important for coreference resolution, at least relative to today's state of the art, is not necessarily the clustering decision mechanism, but rather the entire architecture behind it, and in particular the use of cautious decision-making based on high-precision information, entity-centric modeling, and so forth.

4.4.4 Contribution of Feature Groups. Table 9 lists the results of an ablative experiment where each feature group was individually removed from the complete model. When a feature is eliminated, two mentions under consideration are always considered compatible with respect to that feature. For example, singular and plural mentions are number-compatible when the number feature is removed.
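A minimal sketch of this ablation setup (hypothetical names and toy attribute values, not the system's actual feature extractors): removing a feature group simply means that the corresponding compatibility test can never block a merge.

# Minimal sketch of the feature-group ablation in Section 4.4.4 (hypothetical names).
# When a feature group is removed, its compatibility test is treated as always satisfied.

ALL_FEATURES = ("number", "gender", "animacy", "ne")

def compatible(m1, m2, active_features=ALL_FEATURES):
    """Return True if two mentions agree on every active feature group.

    Attribute values of "unknown" never block a merge; a removed feature
    group never blocks a merge either, which is how the ablation is run.
    """
    for feat in active_features:
        v1, v2 = m1.get(feat, "unknown"), m2.get(feat, "unknown")
        if "unknown" not in (v1, v2) and v1 != v2:
            return False
    return True

she = {"number": "singular", "gender": "female", "animacy": "animate"}
year = {"number": "singular", "animacy": "inanimate"}

print(compatible(she, year))                                              # False: animacy clash
print(compatible(she, year, active_features=("number", "gender", "ne")))  # True once animacy is ablated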
As Table 9 shows, the most significant feature in our model is the number feature. This feature alone is responsible for 2.6 CoNLL F1 points. Removing it has a considerable negative impact on the pronoun resolution sieve, which makes a considerable number of errors without it (e.g., linking our and Jiaju Hou). The second most relevant feature is animacy, with an overall contribution of 1 CoNLL F1 point. Animacy helps disambiguate clustering decisions where the two mentions under consideration are otherwise number- and gender-compatible. For example, animacy enables the linking of firms from Taiwan and they, and avoids the linking of 17 year and she. Lastly, the NE and gender features contribute 0.5 and 0.4 F1 points, respectively. This relatively minor contribution is caused by the overlap with the other features (e.g., many errors corrected by using NE information are also corrected by a combination of animacy and number). Nevertheless, these features are still useful. For example, the NE feature covers many mentions that do not exist in our animacy dictionaries, which helps in several decisions, e.g., avoiding linking it and Saddam Hussein.

Table 9
Contribution of each feature group. This is an ablative experiment, that is, each feature group is analyzed by removing it from the complete system listed in the first row.

                  MUC R/P/F1        B3 R/P/F1         CEAF-φ4 R/P/F1    BLANC R/P/F1      CoNLL F1
Complete system   59.6/60.9/60.3    68.6/73.3/70.9    47.5/46.2/46.9    73.5/79.3/76.0    59.3
− Number          57.0/56.4/56.7    66.2/68.6/67.4    45.6/46.2/45.9    67.6/72.6/69.7    56.7
− Gender          59.3/60.2/59.7    68.2/72.3/70.2    47.2/46.3/46.7    72.6/77.8/74.9    58.9
− Animacy         58.2/58.6/58.4    67.8/71.6/69.6    47.1/46.8/47.0    71.6/77.3/74.0    58.3
− NE              58.5/60.4/59.5    67.5/73.3/70.3    47.6/45.7/46.6    72.3/78.8/75.1    58.8

4.4.5 Gold versus Predicted Information. We conclude this section with an analysis of the performance penalty suffered when using predicted information as input to our system (a realistic scenario) versus using gold information. We consider both linguistic information (i.e., part-of-speech tags, named entity labels, and syntax) and mention boundaries. Table 10 shows the results when various inputs were replaced with gold information.

Table 10
The relevance of gold information. The "no gold" system is our final system used in the formal evaluation. The system with "gold annotations" uses gold part-of-speech tags, syntactic analysis, and named entity labels.

                   MUC R/P/F1        B3 R/P/F1         CEAF-φ4 R/P/F1    BLANC R/P/F1      CoNLL F1
No gold            59.6/60.9/60.3    68.6/73.3/70.9    47.5/46.2/46.9    73.5/79.3/76.0    59.3
Gold NE            60.3/61.1/60.7    69.0/73.3/71.1    47.5/46.7/47.1    74.0/79.5/76.4    59.6
Gold syntax        62.3/62.5/62.4    69.9/73.5/71.7    47.8/47.6/47.7    74.8/80.0/77.1    60.6
Gold annotations   62.8/62.6/62.7    70.3/73.5/71.9    47.9/48.1/48.0    75.1/80.1/77.4    60.9
Gold mentions      73.0/90.3/80.7    69.1/89.5/78.0    79.2/51.4/62.4    78.8/89.4/83.1    73.7

The table shows that, among the linguistic resources, syntax is the most important. This is to be expected, because we use a constituent parser for mention identification, for mention traversal, and for some of the sieves (e.g., the precise constructs model). All in all, if all linguistic information is replaced with gold annotations, the performance of the system increases by 1.6 CoNLL F1 points, or 2.7% relative improvement. We consider this relatively small difference a success story for the quality of current natural language processors, especially considering our heavy reliance on such tools throughout the entire system. On the other hand, the difference between our actual system and the oracle system with gold mentions is 14.4 F1 points. This is because the gold mentions include anaphoricity information, the detection of which is already a hard task by itself.

4.4.6 Automatic Ordering. The ordering of our sieves was determined using linguistic intuition about how precise each sieve is (for example, exact match is clearly more precise than partial match). We also supplemented this intuition, early in our design process, by measuring the actual precision of some of the sieves on a development set from ACE. But because this development set, not to mention our intuition, may not match the circumstances of the OntoNotes corpus, we performed a study to see whether an automatically learned sieve ordering could yield superior performance. We used greedy search to find an ordering, choosing the best-precision sieve at each pass. We tuned the ordering on the OntoNotes-Train data and evaluated the comparison on the OntoNotes-Dev set. The optimization resulted in a 0.1 CoNLL F1 improvement, and gave an ordering very similar to our hand-built order:

Hand Ordered: Speaker Match, String Match, Relaxed String Match, Precise Constructs, Strict Head Match A-C, Proper Head Noun Match, Relaxed Head Match, Pronoun Match

Learned Ordering: String Match, Relaxed String Match, Speaker Match, Proper Head Noun Match, Strict Head Match A-C, Relaxed Head Match, Pronoun Match, Precise Constructs
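A minimal sketch of this greedy search (hypothetical names; the precision values below are illustrative toy numbers chosen to reproduce the learned ordering above, not measured results): at each step, the remaining sieve whose links are most precise, given the clusters already built, is appended to the ordering.

# Minimal sketch of the greedy, precision-driven sieve ordering (hypothetical names, toy values).
def measured_precision(sieve_name, chosen_so_far):
    # Stand-in for running the sieve on development data on top of the clusters
    # produced by the sieves already chosen, and scoring the precision of its links.
    toy_precision = {
        "String Match": 0.95, "Relaxed String Match": 0.90, "Speaker Match": 0.88,
        "Proper Head Noun Match": 0.85, "Strict Head Match A-C": 0.84,
        "Relaxed Head Match": 0.75, "Pronoun Match": 0.65, "Precise Constructs": 0.60,
    }
    return toy_precision[sieve_name]

def greedy_order(sieves):
    ordering, remaining = [], list(sieves)
    while remaining:
        best = max(remaining, key=lambda s: measured_precision(s, ordering))
        ordering.append(best)
        remaining.remove(best)
    return ordering

print(greedy_order(["Speaker Match", "String Match", "Relaxed String Match",
                    "Precise Constructs", "Strict Head Match A-C",
                    "Proper Head Noun Match", "Relaxed Head Match", "Pronoun Match"]))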
The main change is that the learned ordering downplays the importance of the precise constructs sieves, which is easily explained by the fact that OntoNotes does not annotate appositive or predicate nominative relations. This experiment confirms that hand-ordering sieves by linguistic intuition about how precise they are does remarkably well at choosing an ordering, despite the fact that the ordering was originally designed for ACE, a completely different corpus.

5. Error Analysis

To understand the errors made by the system, we analyzed and categorized them into five distinct groups. The distribution of the errors is given in Table 11, with specific examples for each category given in Table 12. For this analysis, we inspected 115 precision and recall errors.

Semantics, discourse. Whereas simple examples can be solved using shallow semantics, such as knowledge about the semantic compatibility of headwords (e.g., McCain – senator), most of the errors in this class require context-dependent semantics or discourse. For example, to know that the thrift and his property are coreferent, we need to understand the context and that both the thrift and his property are being seized, which involves relations not only between the coreferent words but also between other parts of the sentence.

Pronominal resolution errors. Our pronominal resolution algorithm includes several strong heuristics that model the matching of attributes (e.g., gender, number, animacy), the position of mentions in discourse (e.g., we model only the first mention in text for a given entity), or the distance between pronouns and antecedents. This is still far from language understanding, however. Table 12 shows that our approach often generates incorrect links when it finds other compatible antecedents that appear closer, according to our antecedent ordering, to the pronoun under consideration. In the example shown in the table, the land is selected as the antecedent of the pronoun its because the land appears earlier in the sentence than the correct antecedent, the ANC. Implementing a richer model of pronominal anaphora using syntactic and discourse information is an important next step.
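As a concrete illustration of this failure mode, here is a minimal sketch (toy data and hypothetical attribute names, not the actual implementation) of the closest-compatible-antecedent behavior described above:

# Minimal sketch: the first compatible candidate in the antecedent ordering wins,
# even when a later candidate is the correct one (toy data, hypothetical names).

candidates_in_order = [            # antecedent ordering for the pronoun "its"
    {"text": "the land", "number": "singular", "animacy": "inanimate"},
    {"text": "the ANC",  "number": "singular", "animacy": "inanimate"},
]
pronoun = {"text": "its", "number": "singular", "animacy": "inanimate"}

def first_compatible(pronoun, candidates):
    for cand in candidates:
        if (cand["number"] == pronoun["number"]
                and cand["animacy"] == pronoun["animacy"]):
            return cand
    return None

print(first_compatible(pronoun, candidates_in_order)["text"])   # "the land" (incorrect link)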
Non-referential mentions. The third significant cause of errors is non-referential mentions such as pleonastic it or generic mentions. Our mention detection model removes some of these non-referential mentions, but many are still left, and they generate precision errors. For example, in Table 12, the pronoun you is generic, but our system incorrectly links the two occurrences. The large number of these errors suggests the need to add more sophisticated anaphoricity detection to our system (Vieira and Poesio 2000; Ng and Cardie 2002a; Poesio et al. 2004b; Boyd, Gegg-Harrison, and Byron 2005; Gupta, Purver, and Jurafsky 2007; Bergsma, Lin, and Goebel 2008; Ng 2009).

Table 11
Distribution of errors.

Error type                      Percentage
Semantics, discourse                  41.7
Pronominal resolution errors          28.7
Non-referential mentions              14.8
Event mentions                         6.1
Miscellaneous                          8.7

Table 12
Examples of errors in each class. The mention to be resolved is in boldface, its correct antecedent is in italics, and we underlined the incorrect antecedent from our system result.

Semantics, discourse:
• Lincoln's parent company, American Continental Corp., entered bankruptcy-law proceedings this April 13, and regulators seized the thrift the next day. . . . Mr. Keating has filed his own suit, alleging that his property was taken illegally.
• New pictures reveal the sheer power of that terrorist bomb . . . In these photos obtained by NBC News, the damage much larger than first imagined . . .
• Of all the one-time expenses incurred by a corporation or professional firm, few are larger or longer term than the purchase of real estate or the signing of a commercial lease . . . To take full advantage of the financial opportunities in this commitment, . . .

Pronominal resolution errors:
Under the laws of the land, the ANC remains an illegal organization, and its headquarters are still in Lusaka, Zambia.

Non-referential mentions:
When you become a federal judge, all of a sudden you are relegated to a paltry sum.

Event mentions:
"Support the troops, not the regime." That's a noble idea until you're supporting the weight of an armoured vehicle on your chest.

Miscellaneous (inconsistent annotations, parser or NER errors, enumerations):
• Inconsistent annotation – inclusion of 's: . . . that's without adding in [Business Week 's] charge . . . Small wonder that [Britain] 's Labor Party wants credit controls.
• Parser or NER error: Um alright uh Mister Zalisko do you know anything from your personal experience of having been on the cruise as to what happened? – Mister Zalisko is not recognized as a PERSON.
• Enumerations: This year, the economies of the five large special economic zones, namely, Shenzhen, Zhuhai, Shantou, Xiamen and Hainan, have maintained strong growth momentum. . . . A three dimensional traffic frame in Zhuhai has preliminarily taken shape and the investment environment improves daily.
Event mentions. Our system was tailored for the resolution of entity coreference and does not have any event-specific features, such as, for example, the matching of event participants. Furthermore, our model considers only noun phrases as antecedent candidates, thus missing all mentions that are verbal phrases. Therefore, our system misses most coreference links between event mentions. For example, in Table 12 the pronoun That is coreferent with the event mention Support. Our system fails to detect the latter event mention and, as a consequence, incorrectly links That to the regime.

Miscellaneous. There are several other reasons for errors, including inconsistent annotations, parse or NER errors, and incorrect processing of enumerations. For example, the possessive ('s) is annotated inconsistently in several cases: sometimes it is included in the possessor mention in the gold mention annotation, but sometimes it is not. This penalizes the final score twice (once for recall, due to the missed mention, and once for precision, due to the incorrectly detected mention). Another considerable source of errors is incorrect NER labels or parse trees. NER errors can result in incorrect pronoun resolution due to incorrect attributes. Parser errors are responsible for many additional coreference resolution errors. First, incorrect syntactic attachments lead to incorrect mention boundaries, which are penalized by our strict scorer. Second, parser errors often lead to the selection of an incorrect head word for a given constituent, which influences many of our sieves. Third, because our parser does not always distinguish between coordinated nominal phrases and appositions, our system sometimes takes an entire coordinated phrase as a single mention, leading to a series of mention errors. For example, the last example in the table shows a compounded syntactic error: first, the parser failed to identify the entire construct (Shenzhen, Zhuhai, Shantou, Xiamen, and Hainan) as a single enumeration; second, our system believed that Zhuhai, Shantou, Xiamen is an appositive phrase and kept it as a single mention, rather than separating it into three distinct mentions.

Lastly, our processing of enumerations needs to be improved. Because we prefer to assign content words as head words of syntactic constituents, we take the head word of the first noun phrase in the enumeration to be the head word of the coordinated nominal phrase (Kuebler, McDonald, and Nivre 2009; de Marneffe and Manning 2008). Because of this, the coordinated phrase is often linked to another mention of the first element in the enumeration. For example, our system marks Zhuhai, Shantou, Xiamen as a single mention and incorrectly links it to Zhuhai, because they have the same headword.

6. Comparison with Previous Work

Algorithms for coreference (or just pronominal anaphora) include rule-based systems (Hobbs 1978; Brennan, Friedman, and Pollard 1987; Lappin and Leass 1994; Baldwin 1995; Zhou and Su 2004; Haghighi and Klein 2009, inter alia), supervised systems (Connolly, Burger, and Day 1994; McCarthy and Lehnert 1995; Kehler 1997; Soon, Ng, and Lim 2001; Ng and Cardie 2002b; Rahman and Ng 2009, inter alia), and unsupervised approaches (Cardie and Wagstaff 1999; Haghighi and Klein 2007; Ng 2008; Kobdani et al. 2011a). Our deterministic system draws from all of these, but specifically from three strands in the literature that cross-cut this classification.
The idea of doing accurate reference resolution by starting with a set of very high-precision constraints was first proposed for pronominal anaphora in Baldwin's (1995) important but undercited dissertation. Baldwin suggested using seven high-precision rules as filters, combining them so as to achieve reasonable recall. One of his rules, for example, resolved pronouns whose antecedents were unique in the discourse, and another resolved pronouns in quoted speech. Baldwin's idea of starting with high-precision knowledge was adopted by later researchers, such as Ng and Cardie (2002b), who trained to the highest-confidence rather than the nearest antecedent, or Haghighi and Klein (2009), who began with syntactic constraints (which tend to be higher precision) before applying semantic constraints. This general idea is known by different names in many NLP applications: Brown et al. (1993) used simple models as "stepping stones" for more complex word alignment models; Collins and Singer (1999) used "cautious" decision list learning for named entity classification; Borghesi and Favareto (1982) and Corazza et al. (1991) used "islands of reliability" approaches to parsing and speech recognition; and Spitkovsky et al. (2010) used "baby steps" for unsupervised dependency parsing, and so forth. Our work extends the intuition of Baldwin and others to the full coreference task (i.e., including mention detection and both nominal and pronominal coreference) and shows that it can result in extremely high-performing resolution when combined with global inference.

Our second inspiration comes from two works, Zhou and Su (2004) and Haghighi and Klein (2009), both of which extended Baldwin's approach to generic nominal coreference. Zhou and Su proposed a multi-agent model that triggers a different agent with a specific set of deterministic constraints for each anaphor, depending on its type and context (e.g., there are different constraints for noun phrases in appositive constructs, definite noun phrases, or bare noun phrases). Some of the constraints' parameters (e.g., the size of the candidate search space for a given anaphor type) are learned from training data. The authors showed that this model outperforms the state of the art on the MUC-6 and MUC-7 domains. To our knowledge, Zhou and Su's approach is the first work to demonstrate that a deterministic approach obtains state-of-the-art results for both nominal and pronominal coreference resolution. Our approach extends Zhou and Su's model in two significant ways. First, Zhou and Su solve the coreference task in a single pass over the text. We show that a multi-pass approach, which applies a series of sieves incrementally from highest to lowest precision, performs considerably better (see Table 7). Second, Zhou and Su's model follows a mention-pair approach, where coreference decisions are taken based only on information extracted from the two mentions under consideration. We demonstrate that an entity-centric approach, which allows features to be shared between mentions of the same entity, outperforms the mention-pair model (see Table 6).
Haghighi and Klein's (2009) two-pass system based on deterministic rules further proved that deterministic rules could achieve state-of-the-art performance. Haghighi and Klein's first, purely syntactic pass uses high-precision syntactic information to assign possible coreference. The second, transductive pass identifies Wikipedia articles relevant to the entity mentions in the test set, and then bootstraps a database of hyponyms and other semantically related head pairs from known syntactic patterns for apposition and predicate nominatives. Haghighi and Klein found that this transductive learning was essential for semantic knowledge to be useful (Aria Haghighi, personal communication); other researchers have found that semantic knowledge derived from Web resources can be quite noisy (Uryupina et al. 2011a). But although transductive learning (learning using test set mentions) thus offers advantages in precision, running a Web-based bootstrapping learner whenever a new data set is encountered is not practical and, ultimately, reduces the usability of this NLP component. Our system thus offers the deterministic simplicity and high performance of the Haghighi and Klein (2009) system without the need for gold mention labels or test-time learning. Furthermore, our work extends the multi-pass model to ten passes and shows that this approach can be naturally combined with an entity-centric model for better results.

Finally, recent work has shown the importance of performing coreference resolution jointly for all mentions in a document (McCallum and Wellner 2004; Daumé III and Marcu 2005; Denis and Baldridge 2007; Haghighi and Klein 2007; Culotta et al. 2007; Poon and Domingos 2008; Haghighi and Klein 2010; Cai, Mujdricza-Maydt, and Strube 2011), rather than the classic method of simply aggregating local decisions about pairs of mentions. Like these systems, our model adopts the entity-mention model (Morton 2000; Luo et al. 2004; Yang et al. 2008; Ng 2010)8 in which features can be extracted over not just pairs of mentions but over entire clusters of mentions defining an entity. Previous systems do this by encoding constraints using rich probabilistic models and complex global inference algorithms. By contrast, global reasoning is implemented in our system simply by allowing the rules in each stage to reason about any features of a cluster from a previous stage, including attributes like gender and number as well as headword information derived from the first (most informative) mention. Because our system begins with high-precision clusters, accurate information naturally propagates to later stages.

7. Other Systems Incorporating this Algorithm

A number of recent systems have incorporated our algorithm as an important component in resolving coreference. For example, the CoNLL-2012 shared task focused on coreference resolution in a multi-lingual setting: English, Chinese, and Arabic (Pradhan et al. 2012). Forty percent of the systems in the shared task (6 of the 15 systems) made use of our sieve architecture
(Chen and Ng 2012; Fernandes, dos Santos, and Milidiu 2012; Shou and Zhao 2012; Xiong and Liu 2012; Yuan et al. 2012; Zhang, Wu, and Zhao 2012), including the systems that were the highest scoring for each of the three languages (Fernandes, dos Santos, and Milidiu 2012; Chen and Ng 2012).

The system of Fernandes, dos Santos, and Milidiu (2012) had the highest average score over all languages, and the best score for English and Arabic, by implementing a stacking of two models. Our sieve-based approach was first used to generate mention-link candidates, which were then reranked by a supervised model inspired by dependency parsing. This result demonstrates that our deterministic approach can be naturally combined with more complex supervised models for further performance gains.

The system of Chen and Ng (2012) performed the best for Chinese by making the observation that most sieves in our model are minimally lexicalized, so they can be easily adapted to other languages. Their coreference model for Chinese incorporated our English sieves with only four modifications, only two of which were related to the differences between Chinese and English: the precise constructs sieve was extended to add patterns for Chinese name abbreviations, and the relaxed head-match sieve was removed, because Chinese tends not to have post-nominal modifiers.9 Chen and Ng (2012) then added a second component which first linked mentions with high string-pair or head-pair probabilities before running the sieve architecture. The strong performance of our English sieve system on Chinese with only this small number of changes speaks to the multi-lingual strength of our approach.

The intuition of our system can be further extended to the task of event coreference resolution. Our recent work (Lee et al. 2012) showed that an iterative method that cautiously constructs clusters of entity and event mentions, using linear regression to model cluster merge operations, allows information flow between entity and event coreference.

8 In this article, we call this approach entity-centric to avoid confusion with individual mentions of entities.
9 Two changes were related to differences between the English and Chinese shared tasks in the supplied annotations and data: the pronoun sieve was extended to determine gender for Chinese NPs, because the gender gazetteer used for the shared task and for our system only provides gender for English, and a new head-match sieve was added to deal with embedded heads, because the Chinese annotation marks embedded heads differently than the English annotation.

A similar easy-first machine learning approach to entity coreference by Stoyanov and Eisner (2012) also adopts this intuition. Their system greedily merges clusters with the highest score (the current easiest decision), using higher-precision classifications ("easier decisions") to guide harder decisions later.

In summary, recent systems have used the sieve architecture as a component in hybrid machine learning systems, either as a first pass in generating candidate links which are then incorporated in a probabilistic system, or as a second pass for generating links after high-probability mention pairs have already been linked.
These hybrid systems are the state of the art in English, Chinese, and Arabic coreference resolution. Further, our algorithm can be extended to other tasks, for example, event coreference resolution.

8. Conclusion

We have presented a simple deterministic approach to coreference resolution that incorporates document-level information, which is typically exploited only by more complex, joint learning models. Our approach exploits document-level information through an entity-centric model, which allows features to be shared across mentions that point to the same real-world entity. The sieve architecture applies a battery of deterministic coreference models one at a time, from highest to lowest precision, where each model builds on the previous model's entity output. Despite its simplicity, our approach outperforms or performs comparably to the state of the art on several corpora. An additional benefit of the sieve framework is its modularity: new features or models can be inserted into the system with limited understanding of the other features already deployed. Our code is publicly released10 and can be used both as a stand-alone coreference system and as a platform for the development of future systems.

The state-of-the-art performance of our system in coreference, either directly or as a component in hybrid systems, and that of other recent rule-based systems in named entity recognition (Chiticariu et al. 2010), suggests that rule-based systems are still an important tool for modern natural language processing. Our results further suggest that precision-ordered sieves may be an important way to structure rule-based systems, and suggest the use of sieves in other NLP tasks for which a variety of very high-precision features can be designed and non-local features can be shared. Likely candidates include relation and event extraction, template slot filling, and author name deduplication.

Our error analysis points to a number of places where our system could be improved, including better performance on pronouns. More sophisticated anaphoricity detection, drawing on the extensive literature in this area, could also help (Vieira and Poesio 2000; Ng and Cardie 2002a; Poesio et al. 2004b; Boyd, Gegg-Harrison, and Byron 2005; Gupta, Purver, and Jurafsky 2007; Bergsma, Lin, and Goebel 2008; Ng 2009). The main conclusion of our error analysis, however, is that the plurality of our errors are due to shallow knowledge of semantics and discourse. This result points to the crucial need for more sophisticated methods of incorporating semantic and discourse knowledge. Unsupervised or semi-supervised approaches to semantics such as Yang and Su (2007), Kobdani et al. (2011b), Uryupina et al. (2011b), Bansal and Klein (2012), or Recasens, Can, and Jurafsky (2013) may point the way forward. Although sieve-based architectures are at the modern state of the art, it is only by incorporating these more powerful models of meaning that we can eventually deal with the full complexity and richness of coreference.

10 http://nlp.stanford.edu/software/dcoref.shtml.

Appendix A: The OntoNotes Named Entity Tag Set
PERSON         People, including fictional
NORP           Nationalities or religious or political groups
FACILITY       Buildings, airports, highways, bridges, etc.
ORGANIZATION   Companies, agencies, institutions, etc.
GPE            Countries, cities, states
LOCATION       Non-GPE locations, mountain ranges, bodies of water
PRODUCT        Vehicles, weapons, foods, etc. (Not services)
EVENT          Named hurricanes, battles, wars, sports events, etc.
WORK OF ART    Titles of books, songs, etc.
LAW            Named documents made into laws
LANGUAGE       Any named language
DATE           Absolute or relative dates or periods
TIME           Times smaller than a day
PERCENT        Percentage (including "%")
MONEY          Monetary values, including unit
QUANTITY       Measurements, as of weight or distance
ORDINAL        "first", "second"
CARDINAL       Numerals that do not fall under another type

Appendix B: Set of Patterns for Detecting Pleonastic it

NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:is|was|become|became)/) $.. (VP < (VBN $.. /S|SBAR/))))
NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:is|was|become|became)/) $.. (ADJP $.. (/S|SBAR/))))
NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:is|was|become|became)/) $.. (ADJP < (/S|SBAR/))))
NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:is|was|become|became)/) $.. (NP < /S|SBAR/)))
NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:is|was|become|became)/) $.. (NP $.. ADVP $.. /S|SBAR/)))
NP < (PRP=m1) $.. (VP < (MD $.. (VP < ((/^V.*/ < /^(?:be|become)/) $.. (VP < (VBN $.. /S|SBAR/))))))
NP < (PRP=m1) $.. (VP < (MD $.. (VP < ((/^V.*/ < /^(?:be|become)/) $.. (ADJP $.. (/S|SBAR/))))))
NP < (PRP=m1) $.. (VP < (MD $.. (VP < ((/^V.*/ < /^(?:be|become)/) $.. (ADJP < (/S|SBAR/))))))
NP < (PRP=m1) $.. (VP < (MD $.. (VP < ((/^V.*/ < /^(?:be|become)/) $.. (NP < /S|SBAR/)))))
NP < (PRP=m1) $.. (VP < (MD $.. (VP < ((/^V.*/ < /^(?:be|become)/) $.. (NP $.. ADVP $.. /S|SBAR/)))))
NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:seems|appears|means|follows)/) $.. /S|SBAR/))
NP < (PRP=m1) $.. (VP < ((/^V.*/ < /^(?:turns|turned)/) $.. PRT $.. /S|SBAR/))
Acknowledgments
We gratefully acknowledge the support of
the Defense Advanced Research Projects
Agency (DARPA) Machine Reading Program
under Air Force Research Laboratory (AFRL)
prime contract no. FA8750-09-C-0181. Any
opinions, findings, and conclusions or
recommendations expressed in this material
are those of the author(s) and do not
necessarily reflect the view of the DARPA,
AFRL, or the U.S. government. We gratefully
thank Aria Haghighi, Marta Recasens,
Karthik Raghunathan, and Chris Manning
for useful suggestions; Sameer Pradhan for
help with the CoNLL infrastructure; the
Stanford NLP Group for help throughout;
and the four anonymous reviewers for
extremely helpful feedback.
References
Bagga, Amit and Breck Baldwin. 1998.
Algorithms for scoring coreference chains.
In The First International Conference on
Language Resources and Evaluation Workshop
on Linguistics Coreference, volume 1,
pages 563–566, Granada.
Baldwin, Breck. 1995. CogNIAC: A Discourse
Processing Engine. University of
Pennsylvania Department of Computer
and Information Sciences. Ph.D. thesis.
Baldwin, Breck. 1997. Cogniac: High
precision coreference with limited
knowledge and linguistic resources.
In Proceedings of a Workshop on Operational
Factors in Practical, Robust Anaphora
Resolution for Unrestricted Texts,
pages 38–45, Madrid.
Bansal, Mohit and Dan Klein. 2012.
Coreference semantics from web
features. In Proceedings of ACL 2012,
pages 389–398, Jeju Island.
Bengtson, Eric and Dan Roth. 2008.
Understanding the value of features for
coreference resolution. In Proceedings
of EMNLP 2008, pages 294–303,
Honolulu, HI.
Bergsma, Shane and Dekang Lin. 2006.
Bootstrapping path-based pronoun
resolution. In Proceedings of COLING-ACL,
pages 33–40, Stroudsburg, PA.
Bergsma, Shane, Dekang Lin, and Randy
Goebel. 2008. Distributional identification
of non-referential pronouns. In Proceedings
of ACL-HLT 2008, pages 10–18,
Columbus, OH.
Borghesi, Luigi and Chiara Favareto.
1982. Flexible parsing of discretely
uttered sentences. In Proceedings of
the 9th Conference on Computational
Linguistics-Volume 1, pages 37–42, Prague.
Boyd, Adriane, Whitney Gegg-Harrison,
and Donna Byron. 2005. Identifying
non-referential it: A machine learning
approach incorporating linguistically
motivated features. In Proceedings of the
ACL Workshop on Feature Engineering for
Machine Learning in NLP, pages 40–47,
Ann Arbor, MI.
Brennan, Susan E., Marilyn W. Friedman,
and Carl Pollard. 1987. A centering
approach to pronouns. In Proceedings of
the 25th Annual Meeting on Association for
Computational Linguistics, pages 155–162,
Stanford, CA.
Brown, Peter F., Vincent J. Della Pietra,
Stephen A. Della Pietra, and Robert L.
Mercer. 1993. The mathematics of
statistical machine translation: parameter
estimation. Computational Linguistics,
19(2):263–311.
Cai, Jie, Eva Mujdricza-Maydt, and
Michael Strube. 2011. Unrestricted
coreference resolution via global
hypergraph partitioning. In Proceedings of
the Fifteenth Conference on Computational
Natural Language Learning: Shared Task,
pages 56–60, Portland, OR.
Cardie, Claire and Kiri Wagstaff. 1999.
Noun phrase coreference as clustering.
In Proceedings of the Joint SIGDAT
Conference on Empirical Methods in Natural
Language Processing and Very Large Corpora,
pages 82–89, College Park, MD.
Chang, Kai-Wei, Rajhans Samdani,
Alla Rozovskaya, Nick Rizzolo, Mark
Sammons, and Dan Roth. 2011. Inference
protocols for coreference resolution.
In Proceedings of the Fifteenth Conference
on Computational Natural Language
Learning: Shared Task, pages 40–44,
Portland, OR.
Chen, Chen and Vincent Ng. 2012.
Combining the best of two worlds:
A hybrid approach to multilingual
coreference resolution. In Proceedings
of the CoNLL-2012 Shared Task,
pages 56–63, Jeju Island.
Chiticariu, Laura, Rajasekar Krishnamurthy,
Yunyao Li, Frederick Reiss, and
Shivakumar Vaithyanathan. 2010.
Domain adaptation of rule-based
annotators for named-entity recognition
tasks. In Proceedings of the 2010
Conference on Empirical Methods in Natural
Language Processing, pages 1,002–1,012,
Cambridge, MA.
Chomsky, Noam. 1981. Lectures on
Government and Binding. Mouton de
Gruyter, Berlin.
Collins, Michael and Yoram Singer. 1999.
Unsupervised models for named entity
classification. In Proceedings of the Joint
SIGDAT Conference on Empirical Methods
in Natural Language Processing and
Very Large Corpora, pages 100–110,
College Park, MD.
Connolly, Dennis, John D. Burger, and
David S. Day. 1994. A machine learning
approach to anaphoric reference.
In Proceedings of the International Conference
on New Methods in Language Processing
(NeMLaP-1), pages 255–261, Manchester.
Corazza, A., R. De Mori, R. Gretter, and
G. Satta. 1991. Stochastic context-free
grammars for island-driven probabilistic
parsing. In Proceedings of Second
International Workshop on Parsing
Technologies (IWPT 91), pages 210–217,
Cancun.
Culotta, Aron, Michael Wick, Robert
Hall, and Andrew McCallum. 2007.
First-order probabilistic models for
coreference resolution. In Proceedings
of HLT-NAACL 2007, pages 81–88,
Rochester, NY.
Daumé III, Hal and Daniel Marcu. 2005.
A large-scale exploration of effective
global features for a joint entity detection
and tracking model. In HLT-EMNLP 2005,
pages 97–104, Vancouver.
de Marneffe, Marie-Catherine and
Christopher D. Manning. 2008.
The Stanford typed dependencies
representation. In Proceedings of COLING
Workshop on Cross-framework and
Cross-domain Parser Evaluation,
pages 1–8, Manchester.
Denis, Pascal and Jason Baldridge. 2007.
Joint determination of anaphoricity and
coreference resolution using integer
programming. In Proceedings of
NAACL-HLT 2007, pages 236–243,
Rochester, NY.
Denis, Pascal and Jason Baldridge. 2009.
Global joint models for coreference
resolution and named entity classification.
Procesamiento del Lenguaje Natural, 42:87–96.
Doddington, George, Alexis Mitchell,
Mark Przybocki, Lance Ramshaw,
Stephanie Strassel, and Ralph Weischedel.
2004. The Automatic Content Extraction
(ACE) program—Tasks, data, and
evaluation. In Proceedings of LREC 2004,
pages 837–840, Lisbon.
Elsner, Micha and Eugene Charniak. 2010.
The same-head heuristic for coreference.
In Proceedings of ACL 2010 Short Papers,
pages 33–37, Uppsala.
Fernandes, Eraldo, Cicero dos Santos,
and Ruy Milidiu. 2012. Latent structure
perceptron with feature induction for
unrestricted coreference resolution.
In Proceedings of the CoNLL-2012 Shared
Task, pages 41–48, Jeju Island.
Finkel, Jenny Rose, Trond Grenager, and
Christopher Manning. 2005. Incorporating
non-local information into information
extraction systems by Gibbs sampling.
In Proceedings of the 43rd Annual Meeting on
Association for Computational Linguistics,
ACL ’05, pages 363–370, Stroudsburg, PA.
Finkel, Jenny Rose and Christopher D.
Manning. 2008. Enforcing transitivity in
coreference resolution. In Proceedings of
the 46th Annual Meeting of the Association
for Computational Linguistics on Human
Language Technologies: Short Papers,
pages 45–48, Columbus, OH.
Fox, Barbara A. 1993. Discourse Structure and
Anaphora: Written and Conversational
English. Cambridge University Press.
Greene, Barbara B. and Gerald M. Rubin.
1971. Automatic Grammatical Tagging of
English. Brown University Press.
Gupta, Surabhi, Matthew Purver, and
Dan Jurafsky. 2007. Disambiguating
between generic and referential “you” in
dialog. In Proceedings of the 45th Annual
Meeting of the ACL on Interactive Poster and
Demonstration Sessions, pages 105–108,
Prague.
Haghighi, Aria and Dan Klein. 2007.
Unsupervised coreference resolution
in a nonparametric Bayesian model.
In Proceedings of ACL 2007, pages 848–855,
Prague.
Haghighi, Aria and Dan Klein. 2009. Simple
coreference resolution with rich syntactic
and semantic features. In Proceedings of
EMNLP 2009, pages 1,152–1,161, Suntec.
Haghighi, Aria and Dan Klein. 2010.
Coreference resolution in a modular,
entity-centered model. In Proceedings
of HLT-NAACL 2010, pages 385–393,
Los Angeles, CA.
Hobbs, Jerry R. 1978. Resolving pronoun
references. Lingua, 44(4):311–338.
Ji, Heng and Dekang Lin. 2009. Gender
and animacy knowledge discovery from
web-scale n-grams for unsupervised
person mention detection. In Proceedings
of the Pacific Asia Conference on Language,
Information and Computation,
pages 220–229, Hong Kong.
Kehler, Andrew. 1997. Probabilistic
coreference in information extraction.
In Proceedings of EMNLP 1997,
pages 163–173, Providence, RI.
Klein, Dan and Christopher D. Manning.
2003. Accurate unlexicalized parsing. In
Proceedings of the 41st Annual Meeting on
Association for Computational Linguistics -
Volume 1, ACL ’03, pages 423–430,
Stroudsburg, PA.
Klein, Sheldon and Robert F. Simmons. 1963.
A computational approach to grammatical
coding of English words. Journal of the
Association for Computing Machinery,
10(3):334–347.
Kobdani, Hamidreza, Hinrich Schuetze,
Michael Schiehlen, and Hans Kamp. 2011a.
Bootstrapping coreference resolution using
word associations. In Proceedings of ACL
HLT 2011, pages 783–792, Portland, OR.
Kobdani, Hamidreza, Hinrich Schütze,
Michael Schiehlen, and Hans Kamp. 2011b.
Bootstrapping coreference resolution using
word associations. In Proceedings of ACL,
pages 783–792, Portland, OR.
Kuebler, Sandra, Ryan McDonald, and
Joakim Nivre. 2009. Dependency Parsing.
Morgan and Claypool Publishers.
Lappin, Shalom and Herbert Leass. 1994.
An algorithm for pronominal anaphora
resolution. Computational Linguistics,
20(4):535–561.
Lee, Heeyoung, Yves Peirsman, Angel
Chang, Nathanael Chambers, Mihai
Surdeanu, and Dan Jurafsky. 2011.
Stanford’s multi-pass sieve coreference
resolution system at the CoNLL-2011
shared task. In Proceedings of CoNLL 2011:
Shared Task, pages 28–34, Portland, OR.
Lee, Heeyoung, Marta Recasens,
Angel Chang, Mihai Surdeanu, and
Dan Jurafsky. 2012. Joint entity and event
coreference resolution across documents.
In Proceedings of the Conference on Empirical
Methods in Natural Language Processing and
Computational Natural Language Learning
(EMNLP-CoNLL), pages 489–500,
Jeju Island.
Luo, Xiaoqiang. 2005. On coreference
resolution performance metrics.
In Proceedings of HLT-EMNLP 2005,
pages 25–32, Vancouver.
Luo, Xiaoqiang, Abe Ittycheriah, Hongyan
Jing, Nanda Kambhatla, and Salim
Roukos. 2004. A mention-synchronous
coreference resolution algorithm based
on the Bell tree. In Proceedings of ACL 2004,
pages 21–26, Barcelona.
McCallum, Andrew and Ben Wellner. 2004.
Conditional models of identity uncertainty
with application to noun coreference.
In Proceedings of NIPS 2004, pages 905–912,
Vancouver.
McCarthy, Joseph F. and Wendy G. Lehnert.
1995. Using decision trees for coreference
resolution. In Proceedings of IJCAI 1995,
pages 1,050–1,055, Montréal.
Morton, Thomas S. 2000. Coreference for
NLP applications. In Proceedings of
ACL 2000, pages 173–180, Hong Kong.
Ng, Vincent. 2008. Unsupervised models
for coreference resolution. In Proceedings
of EMNLP 2008, pages 640–649,
Honolulu, HI.
Ng, Vincent. 2009. Graph-cut-based
anaphoricity determination for
coreference resolution. In Proceedings
of NAACL-HLT 2009, pages 575–583,
Boulder, CO.
Ng, Vincent. 2010. Supervised noun phrase
coreference research: The first fifteen years.
In Proceedings of ACL, pages 1,396–1,411,
Uppsala.
Ng, Vincent and Claire Cardie. 2002a.
Identifying anaphoric and non-anaphoric
noun phrases to improve coreference
resolution. In Proceedings of COLING,
pages 1–7, Taipei.
Ng, Vincent and Claire Cardie. 2002b.
Improving machine learning approaches
to coreference resolution. In Proceedings
of ACL 2002, pages 104–111,
Philadelphia, PA.
Poesio, Massimo, Rahul Mehta, Axel
Maroudas, and Janet Hitzeman. 2004a.
Learning to resolve bridging references.
In Proceedings of ACL, pages 143–150,
Barcelona.
Poesio, Massimo, Olga Uryupina, Renata
Vieira, Mijail Alexandrov-Kabadjov, and
Rodrigo Goulart. 2004b. Discourse-new
detectors for definite description
resolution: A survey and a preliminary
proposal. In ACL 2004: Workshop on
Reference Resolution and its Applications,
pages 47–54, Barcelona.
Poon, Hoifung and Pedro Domingos.
2008. Joint unsupervised coreference
resolution with Markov logic.
In Proceedings of EMNLP 2008,
pages 650–659, Honolulu, HI.
Pradhan, Sameer, Alessandro Moschitti,
Nianwen Xue, Olga Uryupina, and
Yuchen Zhang. 2012. CoNLL-2012 Shared
Task: Modeling Multilingual Unrestricted
Coreference in OntoNotes. In Proceedings
of the Sixteenth Conference on Computational
Natural Language Learning (CoNLL),
page 1, Jeju Island.
Pradhan, Sameer, Lance Ramshaw, Mitchell
Marcus, Martha Palmer, Ralph Weischedel,
and Nianwen Xue. 2011. CoNLL-2011
shared task: Modeling unrestricted
coreference in OntoNotes. In Proceedings
of the Fifteenth Conference on Computational
Natural Language Learning (CoNLL),
pages 1–27, Portland, OR.
Raghunathan, Karthik, Heeyoung Lee,
Sudarshan Rangarajan, Nathanael
Chambers, Mihai Surdeanu, Dan Jurafsky,
and Chris Manning. 2010. A multi-pass
sieve for coreference resolution.
In Proceedings of EMNLP 2010,
pages 492–501, Cambridge, MA.
Rahman, Altaf and Vincent Ng. 2009.
Supervised models for coreference
resolution. In Proceedings of the 2009
Conference on Empirical Methods in
Natural Language Processing (EMNLP),
pages 968–977, Suntec.
Rand, William M. 1971. Objective criteria for
the evaluation of clustering methods.
Journal of the American Statistical
Association, 66(336):846–850.
Recasens, Marta and Eduard Hovy.
2010. Coreference resolution across
corpora: Languages, coding schemes,
and preprocessing information. In
Proceedings of ACL 2010, pages 1,423–1,432,
Uppsala.
Recasens, Marta, Matthew Can, and Dan
Jurafsky. 2013. Same referent, different
words: Unsupervised mining of opaque
coreferent mentions. In Proceedings of
NAACL 2013, pages 897–906, Atlanta.
Recasens, Marta and Eduard Hovy. 2011.
BLANC: Implementing the Rand index for
coreference evaluation. Natural Language
Engineering, 17(4):485–510.
Sapena, Emili, Lluís Padró, and Jordi Turmo.
2011. Relaxcor participation in
CoNLL-shared task on coreference
resolution. In Proceedings of the Fifteenth
Conference on Computational Natural
Language Learning: Shared Task,
pages 35–39, Portland, OR.
Shou, Heming and Hai Zhao. 2012. System
paper for CoNLL-2012 shared task: Hybrid
rule-based algorithm for coreference
resolution. In Joint Conference on EMNLP
and CoNLL - Shared Task, pages 118–121,
Jeju Island.
Skinner, B. F. 1938. The Behavior of Organisms:
An Experimental Analysis. Appleton-
Century-Crofts.
Soon, Wee M., Hwee T. Ng, and Daniel C. Y.
Lim. 2001. A machine learning approach to
coreference resolution of noun phrases.
Computational Linguistics, 27(4):521–544.
Spitkovsky, Valentin I., Hiyan Alshawi,
and Daniel Jurafsky. 2010. From baby
steps to leapfrog: How “less is more”
in unsupervised dependency parsing.
In Human Language Technologies: The 2010
Annual Conference of the North American
Chapter of the Association for Computational
Linguistics, HLT ’10, pages 751–759,
Stroudsburg, PA.
Stoyanov, Veselin and Jason Eisner. 2012.
Easy-first coreference resolution.
In Proceedings of COLING 2012,
pages 2,519–2,534, Mumbai.
Uryupina, Olga, Massimo Poesio, Claudio
Giuliano, and Kateryna Tymoshenko.
2011a. Disambiguation and filtering
methods in using web knowledge for
coreference resolution. In FLAIRS
Conference, pages 317–322,
Palm Beach, FL.
Uryupina, Olga, Massimo Poesio,
Claudio Giuliano, and Kateryna
Tymoshenko. 2011b. Disambiguation
and filtering methods in using web
knowledge for coreference resolution.
In Proceedings of FLAIRS, pages 317–322,
Palm Beach, FL.
Vieira, Renata and Massimo Poesio.
2000. An empirically based system for
processing definite descriptions.
Computational Linguistics, 26(4):539–593.
Vilain, Marc, John Burger, John Aberdeen,
Dennis Connolly, and Lynette Hirschman.
1995. A model-theoretic coreference
scoring scheme. In Proceedings of MUC-6,
pages 45–52, Columbia, MD.
Vogel, Adam and Dan Jurafsky. 2012.
He Said, She Said: Gender in the ACL
Anthology. In ACL Workshop on
Rediscovering 50 Years of Discoveries,
pages 33–41, Jeju Island.
Xiong, Hao and Qun Liu. 2012. ICT:
System description for CoNLL-2012.
In Joint Conference on EMNLP and
CoNLL - Shared Task, pages 71–75,
Jeju Island.
Yang, Xiaofeng and Jian Su. 2007.
Coreference resolution using
semantic relatedness information
from automatically discovered
patterns. In Proceedings of ACL 2007,
pages 525–535, Prague.
Yang, Xiaofeng, Jian Su, Jun Lang, Chew L.
Tan, Ting Liu, and Sheng Li. 2008. An
entity-mention model for coreference
resolution with inductive logic
programming. In Proceedings of
ACL-HLT 2008, pages 843–851,
Columbus, OH.
Yang, Xiaofeng, Guodong Zhou, Jian Su,
and Chew L. Tan. 2004. An NP-cluster
approach to coreference resolution.
In Proceedings of COLING 2004,
pages 219–225, Geneva.
Yuan, Bo, Qingcai Chen, Yang Xiang,
Xiaolong Wang, Liping Ge, Zengjian Liu,
Meng Liao, and Xianbo Si. 2012. A mixed
deterministic model for coreference
resolution. In Joint Conference on EMNLP
and CoNLL - Shared Task, pages 76–82,
Jeju Island.
Zhang, Xiaotian, Chunyang Wu, and
Hai Zhao. 2012. Chinese coreference
resolution via ordered filtering.
In Joint Conference on EMNLP and
CoNLL - Shared Task, pages 95–99,
Jeju Island.
Zhou, Guodong and Jian Su. 2004.
A high-performance coreference
resolution system using a constraint-based
multi-agent strategy. In Proceedings
of the 16th International Conference on
Computational Linguistics (COLING),
page 522, Geneva.