Data-to-Text Generation with Macro Planning


Ratish Puduppully and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB

r.puduppully@sms.ed.ac.uk

mlap@inf.ed.ac.uk

Abstract

Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macro planning stage followed by a generation stage, reminiscent of traditional methods which embrace separate modules for planning and surface realization. Macro plans represent the high-level organization of important content such as entities, events, and their interactions; they are learned from data and given as input to the generator. Extensive experiments on two data-to-text benchmarks (ROTOWIRE and MLB) show that our approach outperforms competitive baselines in terms of automatic and human evaluation.

1 Introduction

Data-to-text generation refers to the task of generating textual output from non-linguistic input (Reiter and Dale, 1997, 2000; Gatt and Krahmer, 2018) such as databases of records, simulations of physical systems, accounting spreadsheets, or expert system knowledge bases. As an example, Figure 1 shows various statistics describing a major league baseball (MLB) game, including extracts from the box score (i.e., the performance of the two teams and individual team members who played as batters, pitchers or fielders; Table (A)), play-by-play (i.e., the detailed sequence of each play of the game as it occurred; Table (B)), and a human written game summary (Table (C)).

Traditional methods for data-to-text generation (Kukich, 1983; McKeown, 1992; Reiter and Dale, 1997) follow a pipeline architecture, adopting separate stages for text planning (determining which content to talk about and how it might be organized in discourse), sentence planning (aggregating content into sentences, deciding specific words to describe concepts and relations, and generating referring expressions), and linguistic realization (applying the rules of syntax, morphology, and orthographic processing to generate surface forms). Recent neural network–based approaches (Lebret et al., 2016; Mei et al., 2016; Wiseman et al., 2017) make use of the encoder-decoder architecture (Sutskever et al., 2014), are trained end-to-end, and have no special-purpose modules for how to best generate a text, aside from generic mechanisms such as attention and copy (Bahdanau et al., 2015; Gu et al., 2016). The popularity of end-to-end models has been further boosted by the release of new datasets with thousands of input-document training pairs. The example shown in Figure 1 is taken from the MLB dataset (Puduppully et al., 2019b), which contains baseball game statistics and human written summaries (∼25K instances). ROTOWIRE (Wiseman et al., 2017) is another widely used benchmark, which contains NBA basketball game statistics and their descriptions (∼5K instances).

Wiseman et al. (2017) show that despite being able to generate fluent text, neural data-to-text generation models are often imprecise, prone to hallucination (i.e., they generate text that is not supported by the input), and poor at content selection and document structuring. Attempts to remedy some of these issues focus on changing the way entities are represented (Puduppully et al., 2019b; Iso et al., 2019), allowing the decoder to skip low-confidence tokens to enhance faithful generation (Tian et al., 2019), and making the encoder-decoder architecture more modular by introducing micro planning (Puduppully et al., 2019a; Moryossef et al., 2019). Micro planning operates at the record level (see Table (A) in Figure 1).

Figure 1: MLB statistics tables and game summary. Tables summarize the performance of teams and individual team members who played as batters and pitchers, as well as the most important actions (and their actors) in each play (Tables (A) and (B)). The macro plan for the game summary is shown at the bottom (Table (E)); ⟨P⟩ indicates paragraph delimiters. There is a plan for every paragraph in the game summary (correspondence shown in the same color); verbalization tokens render entities, while inning-side verbalizations render events related to the top/bottom side of an inning (see Section 3.1). The set of candidate paragraph plans is shown above the macro plan (Table (D)) and grouped into two types: plans describing a single entity/event or their combinations. Best viewed in color.

For example, given records such as ⟨C.Mullins, BH, 2⟩ and ⟨J.Villar, TEAM, Orioles⟩, micro planning determines which facts should be mentioned within a textual unit (e.g., a sentence) and how these should be structured (e.g., the sequence of records). An explicit content planner essentially makes the job of the neural network less onerous, allowing it to concentrate on producing fluent natural language output, without expending too much effort on content organization.

In this work, we focus on macro planning, the high-level organization of information and how it should be presented, which we argue is important for the generation of long, multi-paragraph documents (see text (C) in Figure 1). Problematically, modern datasets like MLB (Puduppully et al., 2019b; and also Figure 1) and ROTOWIRE (Wiseman et al., 2017) do not naturally lend themselves to document planning as there is no explicit link between the summary and the content of the game (which is encoded in tabular form). In other words, the underlying plans are latent, and it is not clear how they might be best represented, namely, as sequences of records from a table, or simply words. Nevertheless, game summaries through their segmentation into paragraphs (and lexical overlap with the input) give clues as to how content might be organized. Paragraphs are a central element of discourse (Chafe, 1979; Longacre, 1979; Halliday and Hasan, 1976), the smallest domain where coherence and topic are defined and anaphora resolution is possible (Zadrozny and Jensen, 1991). We therefore operationalize the macro plan for a game summary as a sequence of paragraph plans.

Although resorting to paragraphs describes the summary plan at a coarse level, we still need to specify individual paragraph plans. In the sports domain, paragraphs typically mention entities (e.g., players important in the game), key events (e.g., scoring a run), and their interaction. And most of this information is encapsulated in the statistics accompanying game summaries (see Tables (A) and (B) in Figure 1). We thus define paragraph plans such that they contain verbalizations of entity and event records (see plan (E) in Figure 1). Given a set of paragraph plans and their corresponding game summary (see Table (D) and summary (C) in Figure 1), our task is twofold. At training time, we must learn how content was selected in order to give rise to specific game summaries (e.g., how input (D) led to plan (E) for summary (C) in Figure 1), while at test time, given input for a new game, we first predict a macro plan for the summary and then generate the corresponding document.

We present a two-stage approach where macro plans are induced from training data (by taking the table and corresponding summaries into account) and then fed to the text generation stage. Aside from making data-to-text generation more interpretable, the task of generating a document from a macro plan (rather than a table) affords greater control over the output text and plays to the advantage of encoder-decoder architectures which excel at modeling sequences. We evaluate model performance on the ROTOWIRE (Wiseman et al., 2017) and MLB (Puduppully et al., 2019b) benchmarks. Experimental results show that our plan-and-generate approach produces output that is more factual, coherent, and fluent compared with existing state-of-the-art models. Our code, trained models, and dataset with macro plans can be found at https://github.com/ratishsp/data2text-macro-plan-py.

2 Related Work

Content planning has been traditionally considered a fundamental component in natural language generation. Not only does it determine which information-bearing units to talk about, but it also arranges them into a structure that creates coherent output. Many content planners have been based on theories of discourse coherence (Hovy, 1993), schemas (McKeown et al., 1997), or have relied on generic planners (Dale, 1989). Plans are mostly based on hand-crafted rules after analyzing the target text, although a few approaches have recognized the need for learning-based methods. For example, Duboue and McKeown (2001) learn ordering constraints in a content plan, Konstas and Lapata (2013) represent plans as grammar rules whose probabilities are estimated empirically, while others make use of semantically annotated corpora to bootstrap content planners (Duboue and McKeown, 2002; Kan and McKeown, 2002).

More recently, various attempts have been made to improve neural generation models (Wiseman et al., 2017) based on the encoder-decoder architecture (Bahdanau et al., 2015) by adding various planning modules. Puduppully et al. (2019a) propose a model for data-to-text that first learns a plan from the records in the input table and then generates a summary conditioned on this plan. Shao et al. (2019) introduce a Planning-based Hierarchical Variational Model where a plan is a sequence of groups, each of which contains a subset of input items to be covered in a sentence. The content of each sentence is verbalized, conditioned on the plan and previously generated context. In their case, input items are a relatively small list of attributes (∼28) and the output document is also short (∼110 words).

There have also been attempts to incorporate neural modules in a pipeline architecture for data-to-text generation. Moryossef et al. (2019) develop a model with a symbolic text planning stage followed by a neural realization stage. They experiment with the WebNLG dataset (Gardent et al., 2017) which consists of RDF ⟨Subject, Object, Predicate⟩ triples paired with corresponding text. Their document plan is a sequence of sentence plans that in turn determine the division of facts into sentences and their order. Along similar lines, Castro Ferreira et al. (2019) propose an architecture composed of multiple steps including discourse ordering, text structuring, lexicalization, referring expression generation, and surface realization. Both approaches show the effectiveness of pipeline architectures; however, their task does not require content selection and the output texts are relatively short (24 tokens on average).

Although it is generally assumed that task-specific parallel data is available for model training, Laha et al. (2020) do away with this assumption and present a three-stage pipeline model which learns from monolingual corpora. They first convert the input to a form of tuples, which in turn are expressed in simple sentences, followed by a third stage of merging simple sentences to form more complex ones by aggregation and referring expression generation. They also evaluate on data-to-text tasks which have relatively short outputs. There have also been efforts to improve the coherence of the output, especially when dealing with longer documents. Puduppully et al. (2019b) make use of hierarchical attention over entity representations which are updated dynamically, while Iso et al. (2019) explicitly keep track of salient entities and memorize which ones have been mentioned.

Our work also attempts to alleviate deficiencies in neural data-to-text generation models. In contrast to previous approaches (Puduppully et al., 2019a; Moryossef et al., 2019; Laha et al., 2020), we place emphasis on macro planning and create plans representing the high-level organization of a document, including both its content and structure. We share with previous work (e.g., Moryossef et al. 2019) the use of a two-stage architecture. We show that macro planning can be successfully applied to long document data-to-text generation, resulting in improved factuality, coherence, and fluency without any postprocessing (e.g., to smooth referring expressions) or recourse to additional tools (e.g., parsing or information extraction).

3 Problem Formulation

We hypothesize that generation based on plans should fare better compared to generating from a set of records, since macro plans offer a bird's-eye view, a high-level organization of the document content and structure. We also believe that macro planning will work well for long-form text generation, that is, for datasets that have multi-paragraph target texts, a large vocabulary space, and require content selection.

We assume the input to our model is a set of paragraph plans E = {e_i}_{i=1}^{|E|}, where e_i is a paragraph plan. We model the process of generating output summary y given E as a two-step process, namely, the construction of a macro plan x based on the set of paragraph plans, followed by the generation of a summary given a macro plan as input. We now explain how E is obtained and how each step is realized. We discuss our model considering mainly an example from the MLB dataset (Puduppully et al., 2019b) but also touch on how the approach can be straightforwardly adapted to ROTOWIRE (Wiseman et al., 2017).

3.1 Macro Plan Definition

A macro plan consists of a sequence of paragraph plans separated by a paragraph discourse marker ⟨P⟩, that is, x = e_i ⟨P⟩ e_j . . . ⟨P⟩ e_k, where e_i, e_j, e_k ∈ E. A paragraph plan in turn is a sequence of entities and events describing the game. By entities we mean individual players or teams and the information provided about them in box score statistics (see rows and column headings in Figure 1, Table (A)), while events refer to information described in play-by-play (see Table (B)). In baseball, plays are grouped in half-innings. During each half of an inning, a team takes its turn to bat (the visiting team bats in the top half and the home team in the bottom half). An example macro plan is shown at the bottom of Figure 1.

Within a paragraph plan, entities and events are verbalized into a text sequence along the lines of Saleh et al. (2019). We make use of special tokens for the type of a record, followed by the value of the record from the table. We retain the same position for each record type and value. For example, batter C.Mullins from Figure 1 would be verbalized as C.Mullins . . . 4 . . . 2 . . . 2 . . . 1 . . . Orioles . . ., where each value is preceded by the special token for its record type. For the sake of brevity we write ⟨V(entity)⟩ as shorthand for the full verbalization of an entity.
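To make the verbalization concrete, the following minimal sketch (in Python) turns one entity's box-score records into such a token sequence. The record-type names and the dictionary layout are illustrative assumptions rather than the exact schema of our dataset release.

    def verbalize_entity(name, records):
        # Turn an entity's box-score records into a flat token sequence.
        # `records` is an ordered mapping from record type to value; the same
        # record types are emitted in the same positions for every entity.
        tokens = [name]
        for rec_type, value in records.items():
            tokens.append("<" + rec_type + ">")  # special token for the record type
            tokens.append(str(value))            # followed by its value
        return " ".join(tokens)

    # Hypothetical record types and values, for illustration only.
    print(verbalize_entity("C.Mullins", {"H": 4, "AB": 2, "RBI": 1, "TEAM": "Orioles"}))
    # C.Mullins <H> 4 <AB> 2 <RBI> 1 <TEAM> Orioles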

Paragraph Plan for Entities For a paragraph containing entities, the corresponding plan will be a verbalization of the entities in sequence. For paragraphs with multiple mentions of the same entity, the plan will verbalize an entity only once and at its first position of mention. Paragraph ''Keller gave up a home run . . . the teams with the worst records in the majors'' from the summary in Figure 1 describes four entities including B. Keller, C. Mullins, Royals, and Orioles. The respective plan is the verbalization of the four entities in sequence: ⟨V(B.Keller)⟩ ⟨V(C.Mullins)⟩ ⟨V(Royals)⟩ ⟨V(Orioles)⟩, where V stands for verbalization and ⟨V(B.Keller)⟩ is a

shorthand for B.Keller 7 5 8 . . ., ⟨V(Royals)⟩ is a shorthand for the team Royals 9 14 1, and so on.

Paragraph Plan for Events A paragraph may also describe one or more events. For example, the paragraph ''With the score tied 1–1 in the fourth . . . 423-foot home run to left field to make it 3-1'' discusses what happened in the bottom halves of the fourth and fifth innings. We verbalize an event by first describing the participating entities followed by the plays in the event. Entities are described in the order in which they appear in a play, and within the same play we list the batter followed by the pitcher, fielder, scorer, and basemen. The paragraph plan corresponding to the bottom halves of the fourth and fifth innings is the sequence of the two event verbalizations ⟨V(bottom-4th)⟩ ⟨V(bottom-5th)⟩. Here, ⟨V(bottom-5th)⟩ is a shorthand for the verbalizations ⟨V(W.Merrifield)⟩ ⟨V(A.Cashner)⟩ ⟨V(B.Goodwin)⟩ ⟨V(H.Dozier)⟩ followed by the verbalized plays of that half-inning, and so on. These entity verbalizations correspond in turn to W. Merrifield, A. Cashner, B. Goodwin, and H. Dozier, while the verbalization of the first play in the bottom half of the fifth inning (see the play-by-play table in Figure 1) abbreviates the following detailed plan (record-type tokens omitted for brevity): 5 B Royals Orioles 1 H.Dozier A.Cashner Home-run Royals-3-Orioles-1, and so on.

The procedure described above is not specific to MLB and can be ported to other datasets with similar characteristics, such as ROTOWIRE. However, ROTOWIRE does not provide play-by-play information, and as a result there is no event verbalization for this dataset.

3.2 Macro Plan Construction

We provided our definition of macro plans in the previous sections; however, it is important to note that such macro plans are not readily available in data-to-text benchmarks like MLB (Puduppully et al., 2019b) and ROTOWIRE (Wiseman et al., 2017), which consist of tables of records r paired with a gold summary y (see Tables (A)–(C) in Figure 1). We now describe our method for obtaining macro plans x from r and y.

Similar to Moryossef et al. (2019), we define macro plans to be conformant with gold summaries such that (1) they have the same splits into paragraphs—entities and events within a paragraph in y are grouped into a paragraph plan in x; and (2) the order of events and entities in a paragraph and its corresponding plan are identical. We construct macro plans by matching entities and events in the summary to records in the tables. Moreover, paragraph delimiters within summaries form natural units which taken together give rise to a high-level document plan.

We match entities in summaries with entities in tables using exact string match, allowing for some degree of variation in the expression of team names (e.g., A's for Athletics and D-backs for Diamondbacks). Information pertaining to innings appears in the summaries in the form of ordinal numbers (e.g., first, ninth) modifying the noun inning and can be relatively easily identified via pattern matching (e.g., in sentences like ''Dozier led off the fifth inning''). However, there are instances where the mention of innings is more ambiguous (e.g., ''With the score tied 1–1 in the fourth, Andrew Cashner (4–13) gave up a sacrifice fly''). We could disambiguate such mentions manually and then train a classifier to learn to predict whether an inning is mentioned. Instead, we explore a novel annotation-free method that makes use of the pretrained language model GPT2 (Radford et al., 2019). Specifically, we feed the context preceding the ordinal number to GPT2 (i.e., the current paragraph up to the ordinal number and the paragraph preceding it) and if inning appears in the top 10 next word predictions, we consider it a positive match. On a held-out dataset, this method achieves 98% precision and 98% recall at disambiguating inning mentions.
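As an illustration, the snippet below sketches this check with the HuggingFace transformers implementation of GPT2; the context construction and the top-10 threshold follow the description above, while the treatment of subword tokens (we compare decoded strings) is our own simplification.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def mentions_inning(context, top_k=10):
        # True if "inning" is among the top-k next-word predictions for the
        # context preceding the ordinal number.
        input_ids = tokenizer.encode(context, return_tensors="pt")
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]        # next-token distribution
        top_ids = torch.topk(logits, top_k).indices.tolist()
        top_words = {tokenizer.decode([i]).strip().lower() for i in top_ids}
        return "inning" in top_words

    # Context = preceding paragraph plus the current paragraph up to the ordinal.
    print(mentions_inning("With the score tied 1-1 in the fourth"))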

To resolve whether the summary discusses the top or bottom side of an inning, we compare the entities in the paragraph with the entities in each half-inning (play-by-play Table (B) in Figure 1) and choose the side with the greater number of entity matches. For example, Andrew Cashner, Merrifield, and fourth inning uniquely resolve to the bottom half of the fourth inning.
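A minimal sketch of this resolution step is given below; it assumes each half-inning is represented by the set of entities participating in its plays (the data layout and labels are hypothetical).

    def resolve_inning_side(paragraph_entities, half_innings):
        # Pick the half-inning whose entities overlap most with those
        # mentioned in the paragraph.
        return max(half_innings,
                   key=lambda side: len(half_innings[side] & paragraph_entities))

    halves = {"T4": {"J.Villar", "B.Keller"}, "B4": {"A.Cashner", "W.Merrifield"}}
    print(resolve_inning_side({"A.Cashner", "W.Merrifield"}, halves))  # -> B4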

3.3 Paragraph Plan Construction

Figure 1 shows the macro plan we obtain for game summary (C). Importantly, macro plan (E) is the outcome of a content selection process after

considering several candidate paragraph plans as input. So, what are the candidate paragraph plans that give rise to macro plan (E)? To answer this question, we examined the empirical distribution of paragraph plans in MLB and ROTOWIRE (training portion). Interestingly, we found that ∼79% of the paragraph plans in MLB refer to a single event or a single player (and team(s)). In ROTOWIRE, ∼92% of paragraphs are about a singleton player (and team(s)) or a pair of players.

Based on this analysis, we assume that paragraph plans can be either one (verbalized) entity/event or a combination of at most two. Under this assumption, we explicitly enumerate the set of candidate paragraph plans in a game. For the game in Figure 1, candidate paragraph plans are shown in Table (D). The first table groups plans based on individual verbalizations describing the team(s), players, and events taking place in specific innings. The second table groups pairwise combinations thereof. In MLB, such combinations are between team(s) and players. In ROTOWIRE, we also create combinations between players. Such paragraph plans form the set E based on which macro plan x is constructed to give rise to game summary y.
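The enumeration itself is straightforward; the sketch below illustrates it under the assumption that entities and events have already been verbalized into strings (the function and argument names are ours).

    from itertools import combinations

    def candidate_paragraph_plans(team_plans, player_plans, event_plans, pair_players=False):
        # Candidate plans are singletons plus pairwise combinations:
        # team(s) x player for MLB, plus player pairs for ROTOWIRE.
        singletons = team_plans + player_plans + event_plans
        pairs = [t + " " + p for t in team_plans for p in player_plans]
        if pair_players:  # ROTOWIRE additionally pairs players with each other
            pairs += [a + " " + b for a, b in combinations(player_plans, 2)]
        return singletons + pairs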

4 Model Description

The input to our model is a set of paragraph plans, each of which is a sequence of tokens. We first compute paragraph plan representations (vectors in R^n), and then apply a contextualization and content planning mechanism similar to planning modules introduced in earlier work (Puduppully et al., 2019a; Chen and Bansal, 2018). Predicted macro plans serve as input to our text generation model, which adopts an encoder-decoder architecture (Bahdanau et al., 2015; Luong et al., 2015).

4.1 Macro Planning

Paragraph Plan Representation We encode the tokens in a verbalized paragraph plan e_i as {e_{i,j}}_{j=1}^{|e_i|} with a BiLSTM (Figure 2, bottom part). To reflect the fact that some records will be more important than others, we compute an attention-weighted sum of {e_{i,j}}_{j=1}^{|e_i|} following Yang et al. (2016). Let d ∈ R^n denote a randomly initialized query vector learnt jointly with the rest of the parameters. We compute attention values α_{i,j} over d and the paragraph plan token representation e_{i,j}:

α_{i,j} ∝ exp(d⊤ e_{i,j})    (1)


Figure 2: Paragraph plan representation and contextualization for macro planning. The computation of e_3 is detailed in Equations (1) and (2), e^att_3 in Equation (3), and e^c_3 in Equation (4).

The paragraph plan vector e_i is the attention-weighted sum of the e_{i,j} (with Σ_j α_{i,j} = 1):

e_i = Σ_j α_{i,j} e_{i,j}    (2)

Next, we contextualize each paragraph plan representation vis-a-vis other paragraph plans (Figure 2, top left part). First, we compute attention scores β_{i,k} over paragraph plan representations to obtain an attentional vector e^att_i for each:

β_{i,k} ∝ exp(e_i⊤ W_a e_k)
c_i = Σ_{k≠i} β_{i,k} e_k
e^att_i = W_g [e_i; c_i]    (3)
where W_a ∈ R^{n×n} and W_g ∈ R^{n×2n} are parameter matrices, and Σ_{k≠i} β_{i,k} = 1. Then, we compute a content selection gate, and apply this gate to e_i to obtain the new paragraph plan representation e^c_i:

g_i = sigmoid(e^att_i)
e^c_i = g_i ⊙ e_i    (4)

where ⊙ denotes element-wise multiplication. Thus, each element in e_i is weighted by the corresponding element of g_i ∈ [0, 1]^n to obtain a contextualized paragraph plan representation e^c_i.
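The sketch below re-implements Equations (1)–(4) in PyTorch as a rough reference; module and variable names are ours, and the hyperparameters are placeholders rather than the settings used in our experiments.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ParagraphPlanEncoder(nn.Module):
        # Equations (1)-(4): token encoding, attentive pooling, contextualization
        # across plans, and a content selection gate.
        def __init__(self, vocab_size, n):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, n)
            self.bilstm = nn.LSTM(n, n // 2, bidirectional=True, batch_first=True)
            self.query = nn.Parameter(torch.randn(n))   # d, learnt jointly
            self.W_a = nn.Linear(n, n, bias=False)       # W_a in R^{n x n}
            self.W_g = nn.Linear(2 * n, n, bias=False)   # W_g in R^{n x 2n}

        def forward(self, plan_tokens):
            # plan_tokens: (num_plans, max_len) token ids, one row per paragraph plan
            h, _ = self.bilstm(self.embed(plan_tokens))          # (P, L, n)
            alpha = F.softmax(h @ self.query, dim=-1)            # Eq. (1)
            e = (alpha.unsqueeze(-1) * h).sum(dim=1)             # Eq. (2): (P, n)
            scores = e @ self.W_a(e).t()                         # Eq. (3)
            mask = torch.eye(e.size(0), dtype=torch.bool)
            beta = F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)
            c = beta @ e
            e_att = self.W_g(torch.cat([e, c], dim=-1))
            g = torch.sigmoid(e_att)                             # Eq. (4)
            return g * e                                         # contextualized e^c

    enc = ParagraphPlanEncoder(vocab_size=1000, n=64)
    plans = torch.randint(0, 1000, (5, 12))                      # 5 plans, 12 tokens each
    print(enc(plans).shape)                                      # torch.Size([5, 64])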

Content Planning Our model learns to predict macro plans, after having been trained on pairs of sets of paragraph plans and corresponding macro plans (constructed as described in Sections 3.2 and 3.3).

The macro plan x serves as input to the text generation stage as a sequence of paragraph plans with ⟨P⟩ separators in between. The conditional output probability p(y|x) is modeled as:

p(y|x) = ∏_{t=1}^{|y|} p(y_t | y_{<t}, x)

                        ROTOWIRE      MLB
Vocab Size                 11.3k    38.9k
# Tokens                    1.5M    14.3M
# Instances                 4.9k    26.3k
# Record Types                39       53
Avg Records                  628      565
Avg Paragraph Plans         10.7     15.1
Avg Length                 337.1   542.05

Table 1: Dataset statistics for ROTOWIRE and MLB. Vocabulary size, number of tokens, number of instances (i.e., table-summary pairs), number of record types, average number of records, average number of paragraph plans, and average summary length.

During inference, we employ beam search to find the most likely macro plan ẑ among candidate macro plans z′ given the paragraph plans as input:

ẑ = arg max_{z′} p(z′ | E; θ)

We deterministically obtain x̂ from ẑ, and the output summary ŷ among candidate outputs y′ given macro plan x̂ as input:

ŷ = arg max_{y′} p(y′ | x̂; φ)

5 Experimental Setup

Data We performed experiments on the ROTOWIRE (Wiseman et al., 2017) and MLB (Puduppully et al., 2019b) benchmarks. The details of these two datasets are given in Table 1. We can see that MLB is around 5 times bigger, has a richer vocabulary, and has longer game summaries. We use the official splits of 3,398/727/728 for ROTOWIRE and 22,821/1,739/1,744 for MLB. We make use of a tokenization script1 to detokenize and retokenize the summaries in both ROTOWIRE and MLB.

We reconstructed the MLB dataset, as the version released by Puduppully et al. (2019b) had removed all paragraph delimiters from game summaries. Specifically, we followed their methodology and downloaded the same summaries from the ESPN Web site2 and added the ⟨P⟩ delimiter to paragraphs in the summaries.3

1 https://github.com/neulab/DGT.
2 http://www.espn.com/mlb/recap?gameId={gameid}.
3 Although our model is trained on game summaries with paragraph delimiters, and also predicts these at generation time, for evaluation we strip ⟨P⟩ from model output.


ROTOWIRE does not have paragraph delimiters in game summaries either. We reverse engineered these as follows: (1) we split summaries into sentences using the NLTK (Bird et al., 2009) sentence tokenizer; (2) initialized each paragraph with a separate sentence; (3) merged two paragraphs into one if the entities in the former were a superset of the entities in the latter; (4) repeated Step 3 until no merges were possible.
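A rough sketch of this heuristic follows; it assumes an entity extractor entities_in() returning a set of entity names (a hypothetical helper) and that only adjacent paragraphs are merged.

    import nltk

    def induce_paragraphs(summary, entities_in):
        # One sentence per paragraph, then merge a paragraph into the preceding
        # one whenever the preceding paragraph's entity set is a superset of its own.
        paragraphs = [[s] for s in nltk.sent_tokenize(summary)]
        merged = True
        while merged:                                   # Step 4: repeat until no merges
            merged = False
            for i in range(len(paragraphs) - 1):
                former = entities_in(" ".join(paragraphs[i]))
                latter = entities_in(" ".join(paragraphs[i + 1]))
                if former >= latter:                    # Step 3: superset check
                    paragraphs[i] += paragraphs.pop(i + 1)
                    merged = True
                    break
        return [" ".join(p) for p in paragraphs]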

Training Configuration We tuned the model hyperparameters on the development set. For training the macro planning and the text generation stages, we used the Adagrad (Duchi et al., 2011) optimizer. In addition, the text generation stage made use of truncated BPTT (Williams and Peng, 1990) with truncation length 100. We learn a subword vocabulary (Sennrich et al., 2016) for paragraph plans in the macro planning stage. We used 2.5K merge operations for ROTOWIRE and 8K merge operations for MLB. In text generation, we learn a joint subword vocabulary for the macro plan and game summaries. We used 6K merge operations for ROTOWIRE and 16K merge operations for MLB. All models were implemented on OpenNMT-py (Klein et al., 2017). We add to the set E the paragraph plans corresponding to the output summary paragraphs, to ensure full coverage during training of the macro planner. During inference for predicting macro plans, we employ length normalization (Bahdanau et al., 2015) to avoid penalizing longer outputs; specifically, we divide the scores of beam search by the length of the output. In addition, we adopt bigram blocking (Paulus et al., 2018). For MLB, we further block beams containing more than two repetitions of a unigram. This helps improve the diversity of the predicted macro plans.
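These inference-time constraints amount to simple checks on each beam hypothesis; the sketch below illustrates them (the function names and hypothesis representation are ours, not the OpenNMT-py internals).

    from collections import Counter

    def normalized_score(log_prob, tokens):
        # Length normalization: divide the beam score by the output length.
        return log_prob / max(len(tokens), 1)

    def violates_blocking(tokens, max_unigram_repeats=2):
        # Bigram blocking, plus (for MLB) blocking beams that repeat any unigram
        # more than max_unigram_repeats times.
        bigrams = list(zip(tokens, tokens[1:]))
        if len(bigrams) != len(set(bigrams)):           # some bigram repeats
            return True
        return any(c > max_unigram_repeats for c in Counter(tokens).values())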

System Comparisons We compared our model against the following systems: (1) the Template-based generators from Wiseman et al. (2017) for ROTOWIRE and Puduppully et al. (2019b) for MLB. Both systems apply the same principle: they emit a sentence about the teams playing in the game, followed by player-specific sentences, and a closing sentence. The MLB template additionally contains a description of play-by-play; (2) ED+CC, the best performing system in Wiseman et al. (2017), a vanilla encoder-decoder model equipped with an attention and copy mechanism; (3) NCP+CC, the micro planning model of Puduppully et al.

(2019a), which generates content plans from the table by making use of Pointer Networks (Vinyals et al., 2015) to point to records; content plans are encoded with a BiLSTM and the game summary is decoded using another LSTM with attention and copy; (4) ENT, the entity-based model of Puduppully et al. (2019b), which creates dynamically updated entity-specific representations; the text is generated conditioned on the data input and entity memory representations using hierarchical attention at each time step.

6 Results

Automatic Evaluation For automatic evaluation, following earlier work (Wiseman et al. 2017; Puduppully et al. 2019a,b, inter alia) we report BLEU (Papineni et al., 2002) with the gold summary as reference, but also make use of the Information Extraction (IE) metrics from Wiseman et al. (2017), which are defined over the output of an IE system; the latter extracts entity (players, teams) and value (numbers) pairs in a summary, and then predicts the type of relation. For example, given the pair ⟨Kansas City Royals, 9⟩, it would predict their relation as TR (i.e., Team Runs). Training data for the IE system is obtained by checking for matches between entity, value pairs in the gold summary and entity, value, record type triplets in the table.

Let ŷ be the gold summary and y the model output. Relation Generation (RG) measures the precision and count of relations extracted from y that also appear in the records r. Content Selection (CS) measures the precision and recall of relations extracted from y that are also extracted from ŷ. Content Ordering (CO) measures the normalized Damerau-Levenshtein distance between the sequences of relations extracted from y and ŷ.
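For concreteness, the sketch below computes CS precision/recall and a CO score from two relation sequences, treating each relation as a hashable tuple; it is a simplified reading of these metrics rather than the exact evaluation scripts.

    from collections import Counter

    def content_selection(pred_rels, gold_rels):
        # CS precision/recall between relations extracted from the model output
        # and from the gold summary (duplicates counted).
        overlap = sum((Counter(pred_rels) & Counter(gold_rels)).values())
        precision = overlap / len(pred_rels) if pred_rels else 0.0
        recall = overlap / len(gold_rels) if gold_rels else 0.0
        return precision, recall

    def content_ordering(pred_rels, gold_rels):
        # CO: complement of the normalized Damerau-Levenshtein (OSA) distance
        # between the two relation sequences (1.0 means identical ordering).
        m, n = len(pred_rels), len(gold_rels)
        d = [[max(i, j) if i == 0 or j == 0 else 0 for j in range(n + 1)]
             for i in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if pred_rels[i - 1] == gold_rels[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
                if (i > 1 and j > 1 and pred_rels[i - 1] == gold_rels[j - 2]
                        and pred_rels[i - 2] == gold_rels[j - 1]):
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
        return 1.0 - d[m][n] / max(m, n, 1)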

We reused the IE model from Puduppully et al. (2019a) for ROTOWIRE but retrained it for MLB to improve its precision and recall. Moreover, the implementation of Wiseman et al. (2017) computes RG, CS, and CO excluding duplicate relations. This artificially inflates the performance of models whose outputs contain repetition. We include duplicates in the computation of the IE metrics (and recreate them for all comparison systems).

Table 2 (top) presents our results on the ROTOWIRE test set. In addition to Templ, NCP+CC, ENT, and ED+CC, we include the

ROTOWIRE       RG#   RG P%  CS P%  CS R%  CS F%  CO DLD%   BLEU
Templ          54.3   99.9   27.1   57.7   36.9    13.1     8.46
WS-2017        34.1   75.1   20.3   36.3   26.1    12.4    14.19
ED+CC          35.9   82.6   19.8   33.8   24.9    12.0    14.99
NCP+CC         40.8   87.6   28.0   51.1   36.2    15.8    16.50
ENT            32.7   91.7   34.7   48.5   40.5    16.6    16.12
RBF-2020       44.9   89.5   23.9   47.0   31.7    14.3    17.16
Macro          42.1   97.6   34.1   57.8   42.9    17.7    15.46
−Plan(4)       36.2   81.3   22.1   38.6   28.1    12.1    14.00

MLB            RG#   RG P%  CS P%  CS R%  CS F%  CO DLD%   BLEU
Templ          62.3   99.9   21.6   55.2   31.0    11.0     4.12
ED+CC          32.5   91.3   27.8   40.6   33.0    17.1     9.68
NCP+CC         19.6   81.3   44.5   44.1   44.3    21.9     9.68
ENT            23.8   81.1   40.9   49.5   44.8    20.7    11.50
Macro          30.8   94.4   40.8   54.9   46.8    21.8    12.62
−Plan(SP,4)    25.1   92.7   40.0   44.6   42.2    21.9    11.09

Table 2: Evaluation on ROTOWIRE and MLB test sets; relation generation (RG) count (#) and precision (P%), content selection (CS) precision (P%), recall (R%) and F-measure (F%), content ordering (CO) in normalized Damerau-Levenshtein distance (DLD%), and BLEU.

best performing model of Wiseman et al. (2017) (WS-2017; note that ED+CC is an improved re-implementation of their model), and the model of Rebuffel et al. (2020) (RBF-2020), which represents the state of the art on ROTOWIRE. This model has a Transformer encoder (Vaswani et al., 2017) with a hierarchical attention mechanism over entities and records within entities. The models of Saleh et al. (2019), Iso et al. (2019), and Gong et al. (2019) make use of additional information not present in the input (e.g., previous/next games, summary writer) and are not directly comparable to the systems in Table 2. Results for the MLB test set are in the bottom portion of Table 2.

Templ has the highest RG precision and count on both datasets. This is not surprising; by design Templ is always faithful to the input. However, notice that it achieves the lowest BLEU among comparison systems, indicating that it mostly regurgitates facts with low fluency. Macro achieves the highest RG precision among all neural models for ROTOWIRE and MLB. We obtain an absolute improvement of 5.9% over ENT for ROTOWIRE and 13.3% for MLB. Moreover, Macro achieves the highest CS F-measure for both datasets. On ROTOWIRE, Macro achieves the highest CO score, and the highest BLEU on MLB. On ROTOWIRE, in terms of BLEU, Macro is worse

than comparison models (e.g., NCP+CC or ENT). Inspection of the output showed that the opening paragraph, which mostly describes how the two teams fared, is generally shorter in Macro, leading to shorter summaries and thus lower BLEU. There is high variance in the length of the opening paragraph in the training data and Macro verbalizes the corresponding plan conservatively. Ideas such as length normalization (Wu et al., 2016) or length control (Kikuchi et al., 2016; Takeno et al., 2017; Fan et al., 2018) could help alleviate this; however, we do not pursue them further for fair comparison with the other models.

The Contribution of Macro Planning To study the effect of macro planning in more detail, we further compared Macro against text generation models (see Section 4.2) which are trained on verbalizations of the tabular data (and gold summaries) but do not make use of document plans or a document planning mechanism. On ROTOWIRE, the model was trained on verbalizations of players and teams, with the input arranged such that the verbalization of the home team was followed by the visiting team, the home team players, and the visiting team players. Mention of players was limited to the four best ones, following Saleh et al. (2019) (see −Plan(4) in Table 2). For MLB, we additionally include verbalizations of innings focusing on scoring plays which are likely to be discussed in game summaries (see −Plan(SP,4) in Table 2). Note that by preprocessing the input in such a way some simple form of content selection takes place, simply by removing extraneous information which the model does not need to consider.

Across both datasets, −Plan variants appear competitive. On ROTOWIRE, −Plan(4) is better than ED+CC in terms of content selection but worse compared to ENT. On MLB, −Plan(SP,4) is again superior to ED+CC in terms of content selection but not ENT, whose performance lags behind when considering RG precision. Taken together, these results confirm that verbalizing entities and events into a text sequence is effective. At the same time, we see that −Plan variants are worse than Macro across most metrics, which underlines the importance of an explicit planning component.

Table 3 presents an intrinsic evaluation of the macro planning stage. Here, we compare the inferred macro plans with the gold macro plans, using CS and CO metrics with regard to entities and events

Macro         CS-P   CS-R   CS-F    CO
ROTOWIRE      81.3   73.2   77.0   45.8
MLB           80.6   63.3   70.9   31.4

Table 3: Evaluation of the macro planning stage; content selection precision (CS-P), recall (CS-R), F-measure (CS-F), and content ordering (CO) between the inferred plans and gold plans in terms of entities and events for the ROTOWIRE and MLB test sets.

instead of relations. We see that our macro planning model (Macro) achieves high scores for CS and CO on both ROTOWIRE and MLB. We further used the CS and CO metrics to check how well the generated summary follows the (predicted) plan. We followed the steps in Section 3.2 and reverse engineered macro plans from the model summaries and compared these extracted plans with the original macro plans with regard to entities and events. We found that Macro creates summaries that follow the plan closely: For ROTOWIRE, the CS F-score and CO are greater than 98%; for MLB, the CS F-score is greater than 94% and CO is greater than 89%. We show an output summary for Macro in Table 4, together with the predicted document plan.

Human-Based Evaluation We also asked participants to assess model output in terms of relation generation, grammaticality, coherence, and conciseness (Wiseman et al., 2017; Puduppully et al., 2019a; Puduppully et al., 2019b). For ROTOWIRE, we compared Macro against RBF-2020,4 ED+CC, Gold, and Templ. For MLB, we compared Macro against ENT, ED+CC, Gold, and Templ.

We conducted our study on the Amazon Mechanical Turk (AMT) crowdsourcing platform, following best practices for human evaluation in NLG (van der Lee et al., 2019). Specifically, to ensure consistent ratings, we required crowdworkers to have an approval rating greater than 98% and a minimum of 1,000 previously completed tasks. Raters were restricted to English-speaking countries (i.e., US, UK, Canada, Ireland, Australia, or NZ). Participants were allowed to provide feedback on the task or field questions (our interface accepts free text).

4 We are grateful to Clément Rebuffel for providing us with the output of their system.





ST. PETERSBURG, Fla. (AP) – The Tampa Bay Rays are making the most of it.

Akinori Iwamura hit a two-run homer in the eighth inning and the Rays beat the Boston Red Sox 2-1 on Sunday to complete a three-game sweep.

The Rays, who have the best record in the majors, have won six of their last seven games.

The Rays have won four of their last five series, including three in a row against the Red Sox, who have won six of their last seven overall.

Dioner Navarro singled with one out in the eighth off Clay Buchholz (1-2) and moved to third on Jason Bartlett's flyout to center. Iwamura then drove a 1-1 pitch into the left-field stands for his second homer of the season.

Scott Dohmann (2-0) got the win in relief, striking out Manny Ramirez with runners on first and third to end the eighth.

Troy Percival worked the ninth for his fifth save in five opportunities.

Clay Buchholz (1-2) gave up two runs and three hits in eight innings. He struck out nine and walked two.

The Red Sox loaded the bases with one out in the fifth on a single by Coco Crisp, a wild pitch and a walk to Jed Lowrie. Jacoby Ellsbury drove in Crisp with a two-out single to center.

Jackson struck out four and walked three.

The Red Sox loaded the bases with one out in the fifth on a single by Coco Crisp, a walk to Jed Lowrie and a one-out walk to Jed Lowrie. Jackson struck out Julio Lugo, but Jacoby Ellsbury singled to center to put the Red Sox up 1-0.

The Red Sox threatened in the eighth when J. D. Drew drew a two-out walk against Trever Miller, but Ramirez struck out to end the inning.

Table 4: Predicted macro plan (top) with corresponding model output (bottom). Entities and events in the summary corresponding to those in the macro plan are boldfaced.

In our first study, we presented crowdworkers with sentences randomly selected from summaries along with their corresponding box score (and play-by-play in the case of MLB) and asked them to count supported and contradicting facts (ignoring hallucinations, i.e., unsupported facts). We did not require crowdworkers to be familiar with NBA or MLB. Instead, we provided a cheat sheet explaining the semantics of box score tables. In addition, we provided examples of sentences with supported/contradicting facts. We evaluated 40 summaries from the test set (20 per dataset), 4 sentences from each summary, and elicited 3 responses per summary. This resulted in 40 summaries × 5 systems × 3 raters, for a total of 600 tasks. Altogether, 131 crowdworkers participated in this study (agreement using Krippendorff's α was 0.44 for supported and 0.42 for contradicting facts).

As shown in Table 5, Macro yields the smallest number of contradicting facts among neural models on both datasets. On ROTOWIRE the number of contradicting facts for Macro is comparable to Gold and Templ (the difference is not statistically significant) and significantly smaller compared to RBF-2020 and ED+CC. The count

ROTOWIRE    #Supp  #Contra    Gram     Coher    Concis
Gold         3.63    0.07    38.33    46.25*    30.83
Templ        7.57*   0.08   −61.67*  −52.92*   −36.67*
ED+CC        3.92    0.91*   −4.58    −8.33      5.0
RBF-2020     5.08*   0.67*    3.75     4.58     13.33
Macro        4.00    0.27     6.67    10.42      5.0

MLB         #Supp  #Contra    Gram     Coher    Concis
Gold         3.59    0.14    21.67    30.0      26.67
Templ        4.21    0.04   −51.25*  −43.75*     7.5
ED+CC        3.42    0.72*  −22.5*   −12.08*   −39.17*
ENT          3.71    0.73*    5.83*   −0.83*   −22.08*
Macro        3.76    0.25    27.08    26.67     46.25

Table 5: Average number of supported (#Supp) and contradicting (#Contra) facts in game summaries and best-worst scaling evaluation (higher is better). Systems significantly different from Macro are marked with an asterisk * (using a one-way ANOVA with post hoc Tukey HSD tests; p ≤ 0.05).

of supported facts for Macro is comparable to Gold and ED+CC, and significantly lower than Templ and RBF-2020. On MLB, Macro has significantly fewer contradicting facts than ENT and ED+CC and is comparable to Templ and Gold (the difference is not statistically significant). The count of supported facts for Macro is comparable to Gold, ENT, ED+CC, and Templ. For both datasets, Templ has the lowest number of contradicting facts. This is expected as Templ essentially parrots facts (aka records) from the table.

We also conducted a second study to evaluate the quality of the generated summaries. We presented crowdworkers with a pair of summaries and asked them to choose the better one in terms of Grammaticality (is the summary written in well-formed English?), Coherence (is the summary well structured and well organized and does it have a natural ordering of the facts?), and Conciseness (does the summary avoid unnecessary repetition including whole sentences, facts or phrases?). We provided example summaries showcasing good and bad output. For this task, we required that the crowdworkers be able to comfortably comprehend NBA/MLB game summaries. We elicited preferences with Best-Worst Scaling (Louviere and Woodworth 1991; Louviere et al., 2015), a method shown to be more reliable than rating scales. The score of a system is computed as the number of times it is rated best minus the number of times it is rated worst (Orme, 2009). The scores


range from −100 (absolutely worst) to +100 (absolutely best). We divided the five competing systems into ten pairs of summaries and elicited ratings for 40 summaries (20 per dataset). Each summary pair was rated by 3 raters. This resulted in 40 summaries × 10 system pairs × 3 evaluation criteria × 3 raters, for a total of 3,600 tasks. A total of 206 crowdworkers participated in this task (agreement using Krippendorff's α was 0.47).

As shown in Table 5, on ROTOWIRE, Macro is comparable to Gold, RBF-2020, and ED+CC in terms of Grammaticality but significantly better than Templ. In terms of Coherence, Macro is comparable to RBF-2020 and ED+CC but significantly better than Templ and significantly worse than Gold. With regard to Conciseness, Macro is comparable to Gold, RBF-2020, and ED+CC, and significantly better than Templ. On MLB, Macro is comparable to Gold in terms of Grammaticality and significantly better than ED+CC, ENT, and Templ. Macro is comparable to Gold in terms of Coherence and significantly better than ED+CC, ENT, and Templ. In terms of Conciseness, raters found Macro comparable to Gold and Templ and significantly better than ED+CC and ENT. Taken together, our results show that macro planning leads to improvement in data-to-text generation in comparison to other systems on both the ROTOWIRE and MLB datasets.

7 Discussion

In this work we presented a plan-and-generate approach for data-to-text generation that consists of a macro planning stage representing high-level document organization in terms of structure and content, followed by a text generation stage. Extensive automatic and human evaluation shows that our approach achieves better results than existing state-of-the-art models and generates summaries which are factual, coherent, and concise.

Our results show that macro planning is more advantageous for generation tasks expected to produce longer texts with multiple discourse units, and could be easily extended to other sports domains such as cricket (Kelly et al., 2009) or American football (Barzilay and Lapata, 2005). Other approaches focusing on micro planning (Puduppully et al., 2019a; Moryossef et al., 2019) might be better tailored for generating shorter texts. There has been a surge of datasets recently focusing on single-paragraph outputs and the task of content selection, such as E2E (Novikova et al., 2017), WebNLG (Gardent et al., 2017), and WikiBio (Lebret et al., 2016; Perez-Beltrachini and Lapata, 2018). We note that in our model content selection takes place during macro planning and text generation. The results in Table 2 show that Macro achieves the highest CS F-measure on both datasets, indicating that the document as a whole and individual sentences discuss appropriate content.

Throughout our experiments we observed that template-based systems score poorly in terms of CS (but also CO and BLEU). This is primarily due to the inflexibility of the template approach, which is limited to the discussion of a fixed number of (high-scoring) players. Yet, human writers (and neural models to a certain extent) synthesize summaries taking into account the particulars of a specific game (where some players might be more important than others even if they scored less) and are able to override global defaults. Template sentences are fluent on their own, but since it is not possible to perform aggregation (Reiter, 1995), the whole summary appears stilted; it lacks coherence and variability, contributing to low BLEU scores. The template baseline is worse for MLB than ROTOWIRE, which reflects the greater difficulty of manually creating a good template for MLB. Overall, we observe that neural models are more fluent and coherent, being able to learn a better ordering of facts, which is in turn reflected in better CO scores.

Despite promising results, there is ample room to improve macro planning, especially in terms of the precision of RG (see Table 2, P% column of RG). We should not underestimate that Macro must handle relatively long inputs (the average input length in the MLB development set is ∼3100 tokens), which are challenging for the attention mechanism. Consider the following output of our model on the MLB dataset: Ramirez's two-run double off Joe Blanton tied it in the sixth, and Brandon Moss added a two-out RBI single off Alan Embree to give Boston a 3-2 lead. Here, the name of the pitcher should have been Joe Blanton instead of Alan Embree. In fact, Alan Embree is the pitcher for the following play in the half inning. In this case, attention diffuses over the relatively long MLB macro plan, leading to inaccurate content selection. We could alleviate this problem by adopting a noisy channel decomposition


(Yee et al., 2019; Yu et al., 2020), that is, by learning two different distributions: a conditional model that provides the probability of translating a paragraph plan to text and a language model that provides an unconditional estimate of the output (i.e., the whole game summary). However, we leave this to future work.
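Concretely, such a decomposition would rescore candidate summaries roughly as ŷ = arg max_{y′} log p(y′) + λ log p(x̂ | y′), where p(y′) is the unconditional language model over summaries, p(x̂ | y′) is the channel model that reconstructs the macro plan from the text, and λ is an interpolation weight; this is only a sketch of the standard noisy channel objective, not a configuration we evaluate.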

For ROTOWIRE, the main source of errors is the model's inability to understand numbers. For example, Macro generates the following output: The Lakers were the superior shooters in this game, going 48 percent from the field and 30 percent from the three-point line, while the Jazz went 47 percent from the floor and 30 percent from beyond the arc. Here, 30 percent should have been 24 percent for the Lakers, but the language model expects a higher score for the three-point line, and since 24 is low (especially compared to the 30 scored by the Jazz), it simply copies the 30 scored by the Jazz instead. A mechanism for learning better representations for numbers (Wallace et al., 2019) or executing operations such as argmax or minus (Nie et al., 2018) should help alleviate this problem.

Finally, although our focus so far has been on learning document plans from data, the decoupling of planning from generation allows us to flexibly generate output according to a specification. For example, we could feed the model with manually constructed macro plans, consequently controlling the information content and structure of the output summary (e.g., for generating short or long texts, or focusing on specific aspects of the game).

Acknowledgments

We thank the Action Editor, Claire Gardent, and the three anonymous reviewers for their constructive feedback. We also thank Laura Perez-Beltrachini for her comments on an earlier draft of this paper, and Parag Jain, Hao Zheng, Stefanos Angelidis, and Yang Liu for helpful discussions. We gratefully acknowledge the financial support of the European Research Council (Lapata; award number 681760, ''Translating Multiple Modalities into Text'').

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.

Regina Barzilay and Mirella Lapata. 2005. Collective content selection for concept-to-text generation. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 331–338, Vancouver, British Columbia, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1220575.1220617

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python, O'Reilly Media.

Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, and Emiel Krahmer. 2019. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 552–562, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1052

Wallace L. Chafe. 1979. The flow of thought and the flow of language. In Talmy Givón, editor, Syntax and Semantics, volume 12, pages 159–181, Academic Press Inc.

Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 675–686, Melbourne, Australia. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P18-1063

Robert Dale. 1989. Generating referring expressions in a domain of objects and processes.

Pablo Duboue and Kathleen McKeown. 2002.
Content planner construction via evolutionary


algorithms and a corpus-based fitness function. In Proceedings of the International Natural Language Generation Conference, pages 89–96, Harriman, New York, USA. Association for Computational Linguistics.

Pablo A. Duboue and Kathleen R. McKeown. 2001. Empirically estimating order constraints for content planning in generation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 172–179, Toulouse, France. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1073012.1073035

John C. Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159.

Angela Fan, David Grangier, and Michael Auli. 2018. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 45–54, Melbourne, Australia. Association for Computational Linguistics.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 179–188, Vancouver, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P17-1017

Albert Gatt and Emiel Krahmer. 2018. Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. J. Artif. Intell. Res., 61:65–170. DOI: https://doi.org/10.1613/jair.5477

Heng Gong, Xiaocheng Feng, Bing Qin, and Ting Liu. 2019. Table-to-text generation with effective hierarchical encoder on three dimensions (row, column and time). In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3143–3152, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1310

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O. K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1631–1640, Berlin, Germany. Association for Computational Linguistics.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 140–149, Berlin, Germany. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P16-1014

M. A. K. Halliday and Ruqaiya Hasan. 1976. Cohesion in English, London. Longman.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780. DOI: https://doi.org/10.1162/neco.1997.9.8.1735

Eduard H. Hovy. 1993. Automated discourse generation using discourse structure relations. Artificial Intelligence, 63(1-2):341–385. DOI: https://doi.org/10.1016/0004-3702(93)90021-3

Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, and Hiroya Takamura. 2019. Learning to select, track, and generate for data-to-text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2102–2113, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1202

Min-Yen Kan and Kathleen R. McKeown. 2002. Corpus-trained text generation for summarization. In Proceedings of the International
Natural Language Generation Conference, pages 1–8, Harriman, New York, USA. Association for Computational Linguistics.

Colin Kelly, Ann Copestake, and Nikiforos Karamanis. 2009. Investigating content selection for language generation using machine learning. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009), pages 130–137, Athens, Greece. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1610195.1610218

Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, and Manabu Okumura. 2016. Controlling output length in neural encoder-decoders. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1328–1338, Austin, Texas. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D16-1140

Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations, pages 67–72, Vancouver, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P17-4012

Ioannis Konstas and Mirella Lapata. 2013. Inducing document plans for concept-to-text generation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1503–1514, Seattle, Washington, USA. Association for Computational Linguistics.

Karen Kukich. 1983. Design of a knowledge-based report generator. In 21st Annual Meeting of the Association for Computational Linguistics. DOI: https://doi.org/10.3115/981311.981340

Anirban Laha, Parag Jain, Abhijit Mishra, and Karthik Sankaranarayanan. 2020. Scalable micro-planned generation of discourse from structured data. Computational Linguistics, 45(4):737–763. DOI: https://doi.org/10.1162/coli_a_00363

Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural text generation from structured data with application to the biography domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1203–1213, Austin, Texas. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D16-1128

R. E. Longacre. 1979. The paragraph as a grammatical unit. In Talmy Givón, editor, Syntax and Semantics, volume 12, pages 115–133, Academic Press Inc.

Jordan J. Louviere, Terry N. Flynn, and A. A. J. Marley. 2015. Best-Worst Scaling: Theory, Methods and Applications, Cambridge University Press. DOI: https://doi.org/10.1017/CBO9781107337855

Jordan J. Louviere and George G. Woodworth. 1991. Best-worst scaling: A model for the largest difference judgments. University of Alberta: Working Paper.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421, Lisbon, Portugal. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D15-1166

Kathleen R. McKeown. 1992. Text Generation. Studies in Natural Language Processing, Cambridge University Press.

Kathleen R. McKeown, Desmond A. Jordan, Shimei Pan, James Shaw, and Barry A. Allen. 1997. Language generation for multimedia healthcare briefings. In Fifth Conference on Applied Natural Language Processing, pages 277–282, Washington, DC, USA. Association for Computational Linguistics. DOI: https://doi.org/10.3115/974557.974598

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. In Proceedings of the
2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 720–730, San Diego, California. Association for Computational Linguistics.

Amit Moryossef, Yoav Goldberg, and Ido Dagan. 2019. Step-by-step: Separating planning from realization in neural data-to-text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2267–2277, Minneapolis, Minnesota. Association for Computational Linguistics.

Feng Nie, Jinpeng Wang, Jin-Ge Yao, Rong Pan, and Chin-Yew Lin. 2018. Operation-guided neural networks for high fidelity data-to-text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3879–3889, Brussels, Belgium. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D18-1422

Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. 2017. The E2E dataset: New challenges for end-to-end generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/W17-5525

Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In International Conference on Learning Representations.

Laura Perez-Beltrachini and Mirella Lapata. 2018. Bootstrapping generators from noisy data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1516–1527, New Orleans, Louisiana. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N18-1137

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019a. Data-to-text generation with content selection and planning. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, Hawaii. DOI: https://doi.org/10.1609/aaai.v33i01.33016908

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019b. Data-to-text generation with entity modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2023–2035, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1195

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

Bryan Orme. 2009. MaxDiff analysis: Simple counting, individual-level logit, and HB. Sawtooth Software.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1073083.1073135

Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, and Patrick Gallinari. 2020. A hierarchical model for data-to-text generation. In European Conference on Information Retrieval, pages 65–80. Springer. DOI: https://doi.org/10.1007/978-3-030-45439-5_5, PMCID: PMC7148215

Ehud Reiter. 1995. NLG vs. templates. CoRR, cmp-lg/9504013v1. DOI: https://doi.org/10.1017/S1351324997001502

Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Natural Language Engineering, 3(1):57–87.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Studies in Natural Language Processing, Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511519857

Fahimeh Saleh, Alexandre Berard, Ioan Calapodescu, and Laurent Besacier. 2019. Naver Labs Europe's systems for the document-level generation and translation task at WNGT 2019. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 273–279, Hong Kong. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-5631

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P16-1162

Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, and Xiaoyan Zhu. 2019. Long and diverse text generation with planning-based hierarchical variational model. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3257–3268, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1321

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, volume 27, pages 3104–3112. Curran Associates, Inc.

Shunsuke Takeno, Masaaki Nagata, and Kazuhide Yamamoto. 2017. Controlling target features in neural machine translation via prefix constraints. In Proceedings of the 4th Workshop on Asian Translation (WAT 2017), pages 55–63, Taipei, Taiwan. Asian Federation of Natural Language Processing.

Ran Tian, Shashi Narayan, Thibault Sellam, and Ankur P. Parikh. 2019. Sticking to the facts: Confident decoding for faithful data-to-text generation. CoRR, abs/1910.08684v2.

Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander Wubben, and Emiel Krahmer. 2019. Best practices for the human evaluation of automatically generated text. In Proceedings of the 12th International Conference on Natural Language Generation, pages 355–368, Tokyo, Japan. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/W19-8643

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2692–2700, Curran Associates, Inc.

Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, and Matt Gardner. 2019. Do NLP models know numbers? Probing numeracy in embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5307–5315, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1534

Ronald J. Williams and Jing Peng. 1990. An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2(4):490–501. DOI: https://doi.org/10.1162/neco.1990.2.4.490

Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in data-to-document
generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D17-1239

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144v2.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N16-1174

Kyra Yee, Yann Dauphin, and Michael Auli. 2019. Simple and effective noisy channel modeling for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5696–5701, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1571

Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, and Chris Dyer. 2020. Better document-level machine translation with Bayes' rule. Transactions of the Association for Computational Linguistics, 8:346–360. DOI: https://doi.org/10.1162/tacl_a_00319

Wlodek Zadrozny and Karen Jensen. 1991. Semantics of paragraphs. Computational Linguistics, 17(2):171–210.
