Data-to-text Generation with Variational Sequential Planning
Ratish Puduppully and Yao Fu and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, UK
r.puduppully@sms.ed.ac.uk yao.fu@ed.ac.uk mlap@inf.ed.ac.uk
Abstract
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We focus on generating long-form text, that is, documents with multiple paragraphs, and propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way. We infer latent plans sequentially with a structured variational model, while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Experiments on two data-to-text benchmarks (ROTOWIRE and MLB) show that our model outperforms strong baselines and is sample-efficient in the face of limited training data (e.g., a few hundred instances).
1 Introduction
Data-to-text generation refers to the task of generating textual output from non-linguistic input such as database tables, spreadsheets, or simulations of physical systems (Reiter and Dale, 1997, 2000; Gatt and Krahmer, 2018). Recent progress in this area (Mei et al., 2016; Lebret et al., 2016; Wiseman et al., 2017) has been greatly facilitated by the very successful encoder-decoder neural architecture (Sutskever et al., 2014) and the development of large-scale datasets. ROTOWIRE (Wiseman et al., 2017) and MLB (Puduppully et al., 2019b) constitute such examples. They both focus on the sports domain, which has historically drawn attention in the generation community (Barzilay and Lapata, 2005; Tanaka-Ishii et al., 1998; Robin, 1994), and consider the problem of generating long target texts from database records.
Figure 1 (reproduced from Puduppully and Lapata, 2021) provides a sample from the MLB dataset, which pairs human-written summaries (Table C in Figure 1) with major league baseball game statistics. These are mostly scores (collectively referred to as the box score) which summarize the performance of teams and players, for example, batters, pitchers, or fielders (Table A in Figure 1), and a play-by-play description of the most important events in the game (Table B in Figure 1). Game summaries in MLB are relatively long (540 tokens on average) with multiple paragraphs (15 on average). The complexity of the input and the length of the game summaries pose various challenges to neural models which, despite producing fluent output, are often imprecise, prone to hallucinations, and display poor content selection (Wiseman et al., 2017). Attempts to address these issues have seen the development of special-purpose modules that keep track of salient entities (Iso et al., 2019; Puduppully et al., 2019b), determine which records (see the rows in Tables A and B in Figure 1) should be mentioned in a sentence and in which order (Puduppully et al., 2019a; Narayan et al., 2020), and reconceptualize the input in terms of paragraph plans (Puduppully and Lapata, 2021) to facilitate document-level planning (see Table D in Figure 1).
Specifically, Puduppully and Lapata (2021) advocate the use of macro plans for improving the organization of document content and structure. A macro plan is a sequence of paragraph plans, and each paragraph plan corresponds to a document paragraph. A macro plan is shown in Table E (Figure 1). Examples of paragraph plans are given in Table D, where each plan verbalizes records pertaining to entities or to the Top/Bottom side of an inning. Verbalizations are sequences of record types followed by their values. Document paragraphs are shown in Table C and have the same color as their corresponding plans in Table E. During training, Puduppully and
Transactions of the Association for Computational Linguistics, vol. 10, pp. 697–715, 2022. https://doi.org/10.1162/tacl_a_00484
Action Editor: Ehud Reiter. Submission batch: 10/2021; Revision batch: 3/2021; Published 6/2022.
© 2022 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Figure 1: Example from the MLB dataset reproduced from Puduppully and Lapata (2021) with the authors’ permission. Table A is typically referred to as the box score. It summarizes the data of the game per team and player. Table B reports statistics pertaining to innings or play-by-play scores. Table C contains the game summary. Paragraphs in Table C are separated with paragraph delimiters. Table D contains paragraph plans obtained from Tables A and B. Paragraph plans in the first column correspond to a single entity or event. Paragraph plans in the second column describe combinations of entities or events. Paragraph plans verbalize records pertaining to entities and events, and correspond to paragraphs in Table C. Table E contains the macro plan for the document in Table C. A macro plan is a sequence of paragraph plans. Plan-document correspondences are highlighted using the same color.
Lapata (2021) learn to predict a macro plan from a pool of paragraph plans, and produce a game summary based on it. Continuing with our example in Figure 1, plan (E) is obtained from paragraph plans (D), to give rise to game summary (C).
The intermediate macro plan renders generation more interpretable (differences in the output can be explained by differences in macro planning). It also makes modeling easier: the input is no longer a complicated table but a sequence of paragraph plans, which in turn allows us to treat data-to-text generation as a sequence-to-sequence learning problem. However, decoding to a long document remains challenging for at least two reasons. Firstly, the macro plan may be encoded as a sequence but a very long one (more than 3,000 tokens), which the decoder has to attend to at each time step in order to generate a summary token-by-token. Secondly, the prediction of the macro plan is conditioned solely on the input (i.e., the pool of paragraph plans (D) in Figure 1) and does not make use of information present in the summaries. We hypothesize that planning would be more accurate were it to consider information available in the table (and corresponding paragraph plans) and the generated summary, more so because the plans are coarse-grained and there is a one-to-many relationship between a paragraph plan and its realization. For example, we can see that the same plan has two very different realizations in the summary in Figure 1 (see first and third paragraph).
Figure 2: Conceptual sequence of interleaved planning and generation steps. The paragraph plan and its corresponding paragraph have the same color.
In this work, we present a model which interleaves macro planning with text generation (see Figure 2 for a sketch of the approach). We begin by selecting a plan from a pool of paragraph plans (see Table D in Figure 1), and generate the first paragraph by conditioning on it. We select the next plan by conditioning on the previous plan and the previously generated paragraph. We generate the next paragraph by conditioning on the currently selected plan, the previously predicted plan, and the previously generated paragraph. We repeat this process until the final paragraph plan is predicted. We model the selection of paragraph plans as a sequential latent variable process, which we argue is intuitive since content planning is inherently latent. Contrary to Puduppully and Lapata (2021), we do not a priori decide on a global macro plan. Rather, our planning process is incremental and as a result less rigid. Planning is informed by generation and vice versa, which we argue should be mutually beneficial (they are conditioned on each other). During training, the sequential latent model can better leverage the summary to render paragraph plan selection more accurate and take previous decisions into account. We hypothesize that the interdependence between planning and generation allows the model to cope with diversity. In general, there can be many ways in which the input table can be described in the output summary, that is, different plans give rise to equally valid game summaries. The summary in Figure 1 (Table C) focuses on the performance of Brad Keller, who is a high-scoring pitcher (first three paragraphs). An equally plausible summary might have discussed a high-scoring batter first (e.g., Ryan O’Hearn). Also notice that the summary describes innings in chronological order. However, another ordering might have been equally plausible, for example, describing innings where the highest runs are scored first or innings that are important in flipping the outcome of the match. In the face of such diversity, there may never be enough data to learn an accurate global plan. It is easier to select a paragraph plan from the pool once some of the summary is known, and different plans can be predicted for the same input. Moreover, the proposed model is end-to-end differentiable and gradients for summary prediction also inform plan prediction.
Our contributions can be summarized as follows: (1) We decompose data-to-text generation into sequential plan selection and paragraph generation. The two processes are interleaved and generation proceeds incrementally. We look at what has been already generated, make a plan on what to discuss next, realize the plan, and repeat; (2) in contrast to previous models (Puduppully et al., 2019a; Puduppully and Lapata, 2021), where content plans are monolithic and determined in advance, our approach is more flexible, it simplifies modeling (we do not need to learn alignments between paragraph plans and summary paragraphs), and leads to sample efficiency in low-resource scenarios; (3) our approach scales better for tasks involving generation of long multi-paragraph texts, as we do not need to specify the document plan in advance; and (4) experimental results on English and German ROTOWIRE (Wiseman et al., 2017; Hayashi et al., 2019), and MLB (Puduppully et al., 2019b) show that our model is well-suited to long-form generation and generates more factual, coherent, and less repetitive output compared to strong baselines.
We share our code and models in the hope of being useful for other tasks (e.g., story generation, summarization).1
1https://github.com/ratishsp/data2text-seq-plan-py.
2 Related Work
A long tradition in natural language generation views content planning as a central component to identifying important content and structuring it appropriately (Reiter and Dale, 2000). Earlier work has primarily made use of hand-crafted content plans with some exceptions that pioneered
learning-based approaches. For example, Duboue and McKeown (2001) learn ordering constraints on the content plan, while Kan and McKeown (2002) learn content planners from semantically annotated corpora, and Konstas and Lapata (2013) predict content plans using grammar rules whose probabilities are learned from training data.
More recently, there have been attempts to equip encoder-decoder models (Bahdanau et al., 2015; Wiseman et al., 2017) with content planning modules. Puduppully et al. (2019a) introduce micro planning: They first learn a content plan corresponding to a sequence of records, and then generate a summary conditioned on it. Narayan et al. (2020) treat content selection as a task similar to extractive summarization. Specifically, they post-process Puduppully et al.’s (2019a) micro-plans with special tokens identifying the beginning and end of a sentence. Their model first extracts sentence plans and then verbalizes them one-by-one by conditioning on previously generated sentences. Moryossef et al. (2019b,a) propose a two-stage approach that first predicts a document plan and then generates text based on it. The input to their model is a set of RDF ⟨Subject, Object, Predicate⟩ tuples. Their document plan is a sequence of sentence plans where each sentence plan contains a subset of tuples in a specific order. Text generation is implemented using a sequence-to-sequence model enhanced with attention and copy mechanisms (Bahdanau et al., 2015). They evaluate their model on the WebNLG dataset (Gardent et al., 2017), where the outputs are relatively short (24 tokens on average).
Our approach is closest to Puduppully and Lapata (2021), who advocate macro planning as a way of organizing high-level document content. Their model operates over paragraph plans that are verbalizations of the tabular input and predicts a document plan as a sequence of paragraph plans. In a second stage, the summary is generated from the predicted plan making use of attention enriched with a copy mechanism. We follow their formulation of content planning as paragraph plan prediction. Our model thus operates over larger content units compared to related work (Puduppully et al., 2019a; Narayan et al., 2020) and performs the tasks of micro- and macro-planning in one go. In contrast to Puduppully and Lapata (2021), we predict paragraph plans and their corresponding paragraphs jointly in an incremental fashion. Our approach is reminiscent of psycholinguistic models of speech production (Levelt, 1993; Taylor and Taylor, 1990; Guhe, 2020), which postulate that different levels of processing (or modules) are responsible for language generation; these modules are incremental, each producing output as soon as the information it needs is available, and the output is processed immediately by the next module.
We assume that plans form a sequence of paragraphs, which we treat as a latent variable and learn with a structured variational model. Sequential latent variables (Chung et al., 2015; Fraccaro et al., 2016; Goyal et al., 2017) have previously found application in modeling attention in sequence-to-sequence networks (Shankar and Sarawagi, 2019), document summarization (Le et al., 2017), controllable generation (Li and Rush, 2020; Fu et al., 2020), and knowledge-grounded dialogue (Kim et al., 2020). In the context of data-to-text generation, latent variable models have been primarily used to inject diversity in the output. Shao et al. (2019) generate a sequence of groups (essentially a subset of the input) which specifies the content of the sentence to be generated. Their plans receive no feedback from text generation, they cover a small set of input items, and give rise to relatively short documents (approximately 100 tokens long). Ye et al. (2020) use latent variables to disentangle the content from the structure (operationalized as templates) of the output text. Their approach generates diverse output by sampling from the template-specific sample space. They apply their model to single-sentence generation tasks (Lebret et al., 2016; Reed et al., 2018).
3 Model
Following Puduppully and Lapata (2021), we assume that at training time our model has access to a pool of paragraph plans E (see Table D in Figure 1), which represent a clustering of records. We explain how paragraph plans are created from tabular input in Section 4. Given E, we aim to generate a sequence of paragraphs y = [y^1, ..., y^T] that describe the data following a sequence of chosen plans z = [z^1, ..., z^T]. Let y^t denote a paragraph, which can consist of multiple sentences, and T the count of paragraphs in a summary. With a slight abuse of notation, superscripts denote indices rather than exponentiation. So, y^t_i refers to the i-th word in the t-th paragraph.
Figure 3: Model workflow. Solid arrows show dependencies between random variables. Dashed arrows show the computation graph whose backbone consists of an LSTM_text and an LSTM_plan. Note that the variational model and the generative model are tied closely with the shared LSTM. To generate long documents, the model observes what has been already generated, decides on a plan about what to discuss next, uses this plan to guide next stage generation, and repeats until the end.
A plan z = [z^1, ..., z^T] is a list of discrete variables where z^t = j means that we choose the j-th item from pool E of candidate plans to guide the generation of paragraph y^t.
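The alternation between choosing z^t from the pool and generating y^t can be pictured as a simple loop. The sketch below is only a toy illustration, not the authors' implementation: `select_plan` and `realize` are hypothetical stand-ins for the learned plan-selection and paragraph-generation components, the `<EOP>` marker string is an assumed name, and the pool entries are made up.

```python
from typing import Callable, List, Tuple

EOP = "<EOP>"  # hypothetical end-of-plan marker stored in the pool


def interleaved_generate(
    pool: List[str],
    select_plan: Callable[[List[str], List[int], List[str]], int],
    realize: Callable[[str, List[int], List[str]], str],
    max_paragraphs: int,
) -> Tuple[List[int], List[str]]:
    """Alternate between choosing z^t = j from the pool and generating y^t."""
    z: List[int] = []  # indices of the plans chosen so far
    y: List[str] = []  # paragraphs generated so far
    for _ in range(max_paragraphs):
        j = select_plan(pool, z, y)  # conditions on previous plans and text
        if pool[j] == EOP:
            break
        z.append(j)
        y.append(realize(pool[j], z, y))  # conditions on current plan and history
    return z, y


# Toy stand-ins: pick the first unused plan, and echo it as a "paragraph".
def toy_select(pool: List[str], z: List[int], y: List[str]) -> int:
    return next(j for j in range(len(pool)) if j not in z)


def toy_realize(plan: str, z: List[int], y: List[str]) -> str:
    return f"Paragraph {len(y) + 1} about {plan}."


pool = ["plan-A", "plan-B", EOP, "plan-C"]
plans, paragraphs = interleaved_generate(pool, toy_select, toy_realize, max_paragraphs=10)
```

In this toy run the loop stops as soon as the selector points at the end-of-plan entry, so `plan-C` is never realized.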
Generation with Latent Plans The core technique of our model is learning the sequence of latent plans that guides long document generation. We consider a conditional generation setting where the input E is a set of paragraph plans and the output y^{1:T} are textual paragraphs verbalizing the selected sequence z = z^{1:T}. Our goal is to induce variables z that indicate which paragraphs are being talked about and in which order. Similar to previous work (Li and Rush, 2020; Fu et al., 2020), we model this process as a conditional generative model that produces both y and z and factorizes as:
p_θ(y, z | E) = ∏ p_θ(z^t | y …
this using the shorthand
paragraph plan for an event is the verbalization of
the players in the event followed by the verbaliza-
tion of play-by-plays. Candidate paragraph plans
                   RW      MLB     DE-RW
Vocab Size         11.3K   38.9K   9.5K
# Tokens           1.5M    14.3M   234K
# Instances        4.9K    26.3K   723
# Paragraphs       47.7K   399K    7K
# Record Types     39      53      39
Avg Records        628     565     628
Avg Length         337.1   542.1   323.6
Avg Plan Length    10.6    15.1    9.5
Table 1: Dataset statistics for ROTOWIRE (RW), MLB, and German ROTOWIRE (DE-RW): vocabulary size, number of tokens, number of instances (i.e., table-summary pairs), number of paragraphs, number of record types, average number of records, average summary length, and average macro plan length measured in terms of number of paragraphs.
E are obtained by enumerating entities and events
and their combinations (see Table D in Figure 1).
Oracle macro plans are obtained by matching the
mentions of entities and events in the gold sum-
mary with the input table. We make use of these
oracle macro plans during training. The versions
of MLB and ROTOWIRE released by Puduppully
and Lapata (2021) contain paragraph delimiters
for gold summaries; we preprocessed the German
ROTOWIRE in a similar fashion.
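As a rough illustration of enumerating entities, events, and their combinations, the following sketch builds a candidate pool from a list of units. It is an assumption-laden toy, not the authors' preprocessing: the unit names are invented, and capping combinations at pairs (`max_size=2`) is an illustrative choice that the text does not specify.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_plans(units: List[str], max_size: int = 2) -> List[Tuple[str, ...]]:
    """Enumerate single units and their combinations up to max_size."""
    plans: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        plans.extend(combinations(units, size))
    return plans


# Hypothetical entity and event identifiers for one game.
units = ["entity:pitcher", "entity:batter", "event:inning-9-top"]
plans = candidate_plans(units)
```

The pool grows quickly with the number of units, which is one reason a real system would likely restrict which combinations are allowed.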
Table 1 also shows the average length of the macro plan in terms of the number of paragraph plans it contains. This is 10.6 for ROTOWIRE, 15.1 for MLB, and 9.5 for German ROTOWIRE.
Training Configuration We train our model with the AdaGrad optimizer (Duchi et al., 2011) and tune parameters on the development set. We use a learning rate of 0.15. We learn a joint subword vocabulary (Sennrich et al., 2016) for paragraph plans and summaries with 6K merge operations for ROTOWIRE, 16K merge operations for MLB, and 2K merge operations for German ROTOWIRE. The model is implemented on a fork of OpenNMT-py (Klein et al., 2017). For efficiency, we batch using summaries instead of individual paragraphs. Batch sizes for MLB, ROTOWIRE, and German ROTOWIRE are 8, 5, and 1, respectively. We set λ to 2 in Equation (18). In Equation (19), c is 1/100000 for MLB, 1/50000 for ROTOWIRE, and 1/30000
for German-ROTOWIRE. We set the temperature
of Gumbel-Softmax to 0.1.
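To give a feel for what a Gumbel-Softmax temperature of 0.1 does, here is a minimal standard-library sketch of drawing one relaxed sample over plan candidates. It is a generic illustration of the technique, not the authors' code: the function name, the logits, and the seed are all assumptions.

```python
import math
import random
from typing import List


def gumbel_softmax_sample(
    logits: List[float], temperature: float, rng: random.Random
) -> List[float]:
    """Return a relaxed one-hot vector over plan candidates."""
    noisy = []
    for logit in logits:
        # Keep u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # arbitrary seed for the illustration
probs = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.1, rng=rng)
```

With a low temperature such as 0.1, the returned vector tends to sit close to a one-hot choice, while higher temperatures spread the mass more evenly.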
During inference in MLB, similar to Puduppully and Lapata (2021), we block the repetition of paragraph plan bigrams (i.e., we disallow the repetition of (z^t, z^{t+1})) and select the paragraph plan with the next higher probability in Equation (8). In addition, we block consecutive repetitions, and more than two repetitions of a unigram. During training we observed high variance in the length of paragraphs y^t, since the same plan can result in a shorter or longer paragraph. For example, the same plan corresponds to two paragraphs (first and third paragraph) with different lengths in Figure 1. We found that this encourages the model to be conservative and generate relatively short output. We control the paragraph length (Fan et al., 2018) by creating discrete bins, each containing approximately an equal number of paragraphs. During training, we prepend the embedding of the bin to the current plan r^t_z (see Equation (11)). For inference, bins are tuned on the validation set.
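The length bins described above, each holding roughly the same number of paragraphs, can be sketched with quantile boundaries. This is a hedged illustration only: the number of bins (3), the toy lengths, and the helper names are assumptions, and the text does not say how the bins were actually computed.

```python
from bisect import bisect_right
from typing import List


def make_bin_boundaries(lengths: List[int], num_bins: int) -> List[int]:
    """Boundaries chosen so each bin holds roughly the same number of paragraphs."""
    ordered = sorted(lengths)
    n = len(ordered)
    # Boundary k is the length found at the k/num_bins quantile position.
    return [ordered[(k * n) // num_bins] for k in range(1, num_bins)]


def bin_id(length: int, boundaries: List[int]) -> int:
    """Index of the bin for a paragraph of this length (0 is the shortest bin)."""
    return bisect_right(boundaries, length)


# Toy paragraph lengths in tokens.
lengths = [12, 15, 18, 22, 25, 30, 41, 47, 60, 85, 90, 120]
boundaries = make_bin_boundaries(lengths, num_bins=3)
ids = [bin_id(x, boundaries) for x in lengths]
```

The resulting bin index is what would be embedded and prepended to the plan representation during training.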
We run inference for 15 paragraphs on ROTOWIRE and German ROTOWIRE, and for 20 paragraphs on MLB; we stop when the model predicts the end of paragraph plan token EOP. Unlike previous work (Wiseman et al., 2017; Puduppully et al., 2019a,b, inter alia), we do not make use of truncated Back Propagation Through Time (Williams and Peng, 1990), as we incrementally generate paragraphs instead of long documents.
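The plan-repetition constraints used for MLB inference, together with the paragraph cap and the EOP stopping rule, can be combined in one selection loop. The sketch below is an assumption-based toy: `score_fn` is a hypothetical stand-in for the plan distribution of Equation (8), and "more than two repetitions of a unigram" is read here as allowing each plan at most twice.

```python
from collections import Counter
from typing import Callable, List, Sequence


def allowed(candidate: int, history: Sequence[int]) -> bool:
    """Check the three repetition constraints for a candidate plan index."""
    if not history:
        return True
    if candidate == history[-1]:
        return False  # consecutive repetition
    if Counter(history)[candidate] >= 2:
        return False  # assumption: a plan may be used at most twice
    seen_bigrams = set(zip(history, history[1:]))
    return (history[-1], candidate) not in seen_bigrams  # repeated bigram


def constrained_decode(
    score_fn: Callable[[List[int]], List[float]],
    eop: int,
    max_paragraphs: int,
) -> List[int]:
    """Pick plans greedily, skipping blocked ones, until EOP or the cap."""
    history: List[int] = []
    while len(history) < max_paragraphs:
        scores = score_fn(history)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # Fall back to the next most probable plan when one is blocked.
        choice = next((j for j in ranked if allowed(j, history)), eop)
        if choice == eop:
            break
        history.append(choice)
    return history


# Toy scorer with fixed preferences; index 3 plays the role of EOP.
plan_sequence = constrained_decode(
    lambda history: [0.5, 0.3, 0.15, 0.05], eop=3, max_paragraphs=20
)
```

Because the toy scorer never changes its preferences, the constraints alone drive the sequence, and decoding ends once every non-EOP plan is blocked.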
System Comparisons We compared our model with: (1) a Template-based generator which creates a document consisting of template sentences. We used Wiseman et al.’s (2017) system on ROTOWIRE and Puduppully et al.’s (2019b) system on MLB. They are both similar in that they describe team scores followed by player-specific statistics and a concluding statement. In MLB, the template additionally describes play-by-play details. We also created a template system for German ROTOWIRE following a similar approach. (2) ED+CC, the best performing model of Wiseman et al. (2017). It consists of an encoder-decoder model equipped with attention and copy mechanisms. (3) NCP+CC, the micro planning model of Puduppully et al. (2019a). It first creates a content plan by pointing to input records through the use of Pointer Networks (Vinyals et al., 2015). The content plan is then encoded with a BiLSTM and decoded using another LSTM with an attention and copy mechanism. (4) ENT, the entity model of Puduppully et al. (2019b). It creates entity-specific representations which are updated dynamically. At each time step during decoding, their model makes use of hierarchical attention by attending over entity representations and the records corresponding to these. (5) MACRO, the two-stage planning model of Puduppully and Lapata (2021), which first makes use of Pointer Networks (Vinyals et al., 2015) to predict a macro plan from a set of candidate paragraph plans. The second stage takes the predicted plan as input and generates the game summary with a sequence-to-sequence model enhanced with attention and copy mechanisms. In addition, we compare with a variant of Macro enhanced with length control (+Bin).
5 Results
Our experiments were designed to explore how the proposed model compares to related approaches which are either not enhanced with planning modules or non-incremental. We also investigated the sample efficiency of these models and the quality of the predicted plans when these are available. The majority of our results focus on automatic evaluation metrics. We also follow previous work (Wiseman et al., 2017; Puduppully et al., 2019a,b; Puduppully and Lapata, 2021) in eliciting judgments to evaluate system output.
5.1 Automatic Evaluation
We evaluate model output using BLEU (Papineni et al., 2002) with the gold summary as a reference. We also report model performance against the Information Extraction (IE) metrics of Wiseman et al. (2017), which are defined based on the output of an IE model which extracts entity (team and player names) and value (numbers) pairs from the summary and predicts the type of relation between them.
Let ŷ be the gold summary and y be the model output. Relation Generation (RG) measures the precision and count of relations obtained from y that are found in the input table. Content Selection (CS) measures the precision, recall, and F-measure of relations extracted from y also found in ŷ. And Content Ordering (CO) measures the
complement of the normalized Damerau-Levenshtein distance between relations extracted from y and ŷ. Higher values are better for RG Precision, CS F-measure, CO, and BLEU. We reuse the IE model from Puduppully et al. (2019a) for ROTOWIRE, Puduppully and Lapata (2021) for MLB, and Hayashi et al. (2019) for German ROTOWIRE. Our computation of IE metrics for all systems includes duplicate records (Puduppully and Lapata, 2021).
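The CS and CO definitions can be made concrete with a small sketch operating on lists of already-extracted relations. Several choices here are assumptions rather than details confirmed by the text: duplicates are handled with multiset overlap, the Damerau-Levenshtein distance uses the optimal string alignment variant, and normalization divides by the longer sequence length. The relation tuples are invented.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def content_selection(
    pred: Sequence[Hashable], gold: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of predicted relations against gold ones."""
    overlap = sum((Counter(pred) & Counter(gold)).values())  # multiset overlap
    p = overlap / len(pred) if pred else 0.0
    r = overlap / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def osa_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def content_ordering(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Complement of the length-normalized distance between relation sequences."""
    longest = max(len(pred), len(gold))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(pred, gold) / longest


# Invented (entity, record type, value) relations.
gold = [("A", "pts", 20), ("B", "reb", 10), ("C", "ast", 7)]
pred = [("B", "reb", 10), ("A", "pts", 20), ("D", "pts", 5)]
cs_p, cs_r, cs_f = content_selection(pred, gold)
co = content_ordering(pred, gold)
```

In this toy case two of three relations match, and the swapped first pair plus one wrong relation cost two edits, so CS is 2/3 on all three scores and CO is 1/3.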
In addition to IE-based metrics, we report the number of errors made by systems according to Number (incorrect number in digits, number spelled in words, etc.), Name (incorrect names of teams, players, days of week, etc.), and Word (errors in usage of words) following the classification of Thomson and Reiter (2020). We detect such errors automatically using the system of Kasner et al. (2021), which scored best against gold standard human annotations of the same type (Thomson and Reiter, 2021). We only report these metrics for English ROTOWIRE, since error annotations (for automatic metric learning) are not available for other datasets. Moreover, with regard to Word errors, we only report errors for incorrect usage of the word double-double.3 We found such errors to be detected reliably, in contrast to Word errors as a whole for which the precision of the system of Kasner et al. (2021) is ∼50%. Lower values are better for the Number, Name, and double-double errors. We note that metrics such as RG precision, Number, Name, and double-double errors directly compute the accuracy of the generation model. Metrics such as CS, CO, and BLEU measure how similar model output is to a reference summary. Thus, CS, CO, and BLEU measure generation accuracy indirectly under the assumption that gold summaries are accurate.
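The double-double criterion from the footnote (10 or more in two of points, rebounds, assists, steals, and blocked shots) lends itself to a tiny check against a player's box-score line. This is a hypothetical helper for illustration, not part of the error detector of Kasner et al. (2021); the dictionary keys are invented, and treating "two" as "at least two" is an assumption.

```python
from typing import Mapping

# Hypothetical record-type keys for the five relevant categories.
DOUBLE_DOUBLE_STATS = ("points", "rebounds", "assists", "steals", "blocks")


def is_double_double(stats: Mapping[str, int]) -> bool:
    """True when at least two of the five categories reach 10 or more."""
    return sum(stats.get(key, 0) >= 10 for key in DOUBLE_DOUBLE_STATS) >= 2


has_dd = is_double_double({"points": 22, "rebounds": 11, "assists": 4})
no_dd = is_double_double({"points": 30, "rebounds": 9})
```

A generated sentence claiming a double-double for the second player would count as a Word error under this reading.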
MLB Dataset Table 2 summarizes our results on MLB. Our sequential planning model (SeqPlan) has the highest RG P among neural models and performs best in terms of CS F, CO, and BLEU. The variant of Macro with length control (+Bin) performs comparably or worse than Macro.
To examine the importance of latent sequential planning, we also present a variant of our model that uniformly samples a plan from the pool E instead of Equation (8) (see row w(ith) Uniform
3A double-double occurs when a player scores 10 points or more in two record types: points, rebounds, assists, steals, and blocked shots.
MLB          RG            CS                   CO      BLEU
             #      P%     P%     R%     F%     DLD%
Templ        62.3   99.9   21.6   55.2   31.0   11.0    4.12
ED+CC        32.5   91.3   27.8   40.6   33.0   17.1    9.68
NCP+CC       19.6   81.3   44.5   44.1   44.3   21.9    9.68
ENT          23.8   81.1   40.9   49.5   44.8   20.7    11.50
Macro        30.8   94.4   40.8   54.9   46.8   21.8    12.62
+Bin         31.2   93.7   38.3   52.4   44.2   21.6    12.32
SeqPlan      28.9   95.9   43.3   53.5   47.8   22.7    14.29
w Uniform    18.5   90.9   36.5   30.6   33.3   14.5    10.30
w Oracle     27.6   95.9   42.5   50.4   46.1   22.0    13.13
2-Stage      28.6   95.9   41.4   50.8   45.6   21.3    13.96
Table 2: MLB results (test set); relation generation (RG) count (#) and precision (P%), content selection (CS) precision (P%), recall (R%), and F-measure (F%), content ordering (CO) as complement of normalized Damerau-Levenshtein distance (DLD%), and BLEU. Highest and second highest generation models are highlighted.
in Table 2). This version obtains lower values compared to SeqPlan across all metrics, underscoring the importance of sequential planning. We also present two variants of SeqPlan: (a) one that makes use of oracle (instead of predicted) plans during training to generate y^t; essentially, it replaces z^t with z* in Equation (12) (row w(ith) Oracle in Table 2); and (b) a two-stage model that trains the planner (Equation (15)) and generator (Equation (12)) separately (row 2-Stage in Table 2); in this case, we use greedy decoding to sample z^t from Equation (15) instead of Gumbel-Softmax and replace z^t with z* in Equation (12). Both variants are comparable to SeqPlan in terms of RG P but worse in terms of CS F, CO, and BLEU.
Furthermore, we evaluate the accuracy of the inferred plans by comparing them against oracle plans, using the CS and CO metrics (computed over the entities and events in the plan).4 Table 4 shows that SeqPlan achieves higher CS F and CO scores than Macro. Again, this indicates that planning is beneficial, particularly when taking the table and the generated summary into account.
4To compute the accuracy of macro plans, entities and
events from the model’s plan need to be compared against
entities and events in the oracle macro plan. Puduppully and
Lapata (2021) obtained the entities and events for the oracle
macro plan by extracting these from reference summaries.
We noted that this includes coreferent or repeat mentions of
entities and events within a paragraph. We instead extract
entities and events directly from the oracle macro plan.
RW           RG            CS                   CO      BLEU
             #      P%     P%     R%     F%     DLD%
Templ        54.3   99.9   27.1   57.7   36.9   13.1    8.46
WS-2017      34.1   75.1   20.3   36.3   26.1   12.4    14.19
ED+CC        35.9   82.6   19.8   33.8   24.9   12.0    14.99
NCP+CC       40.8   87.6   28.0   51.1   36.2   15.8    16.50
ENT          32.7   91.7   34.7   48.5   40.5   16.6    16.12
RBF-2020     44.9   89.5   23.9   47.0   31.7   14.3    17.16
Macro        42.1   97.6   34.1   57.8   42.9   17.7    15.46
+Bin         61.0   97.2   26.8   66.1   38.2   15.8    16.48
SeqPlan      46.7   97.6   30.6   57.4   39.9   16.7    16.26
w Uniform    22.0   80.2   18.2   19.6   18.9   6.0     8.61
w Oracle     50.4   97.2   29.0   59.1   38.9   16.8    16.32
2-Stage      53.4   97.5   28.5   61.3   38.9   16.1    16.61

DE-RW        RG            CS                   CO      BLEU
             #      P%     P%     R%     F%     DLD%
Templ        54.4   99.9   17.2   63.0   27.1   11.6    7.32
ED+CC        24.8   59.3   6.7    18.8   9.9    6.8     5.09
NCP+CC       17.7   52.5   11.3   25.7   15.7   9.6     7.29
ENT          17.4   64.7   13.3   24.0   17.1   9.8     6.52
RBF-2020     0.2    4.0    1.1    0.4    0.6    0.3     2.29
Macro        30.2   49.7   5.1    21.0   8.3    6.1     5.15
+Bin         20.4   55.0   7.9    20.0   11.3   8.1     6.18
SeqPlan      13.8   91.8   38.0   38.4   38.2   21.2    8.65
Table 3: Evaluation on ROTOWIRE (RW) and German ROTOWIRE (DE-RW) test sets; relation generation (RG) count (#) and precision (P%), content selection (CS) precision (P%), recall (R%), and F-measure (F%), content ordering (CO) as complement of normalized Damerau-Levenshtein distance (DLD%), and BLEU. Highest and second highest generation models are highlighted.
English and German ROTOWIRE Results on ROTOWIRE are presented in Table 3 (top). In addition to Templ, ED+CC, NCP+CC, and ENT, we compare with the models of Wiseman et al. (2017) (WS-2017) and Rebuffel et al. (2020) (RBF-2020). WS-2017 is the best performing model of Wiseman et al. (2017). Note that ED+CC is an improved re-implementation of WS-2017. RBF-2020 represents the current state-of-the-art on ROTOWIRE, and is composed of a Transformer encoder-decoder architecture (Vaswani et al., 2017) with hierarchical attention on entities and their records. The models of Saleh et al. (2019), Iso et al. (2019), and Gong et al. (2019) are not comparable as they make use of information additional to the table such as previous/next games or the author of the game summary. The model of Narayan et al. (2020) is also not comparable as it relies on a pretrained language model (Rothe et al., 2020) to generate the summary sentences.
Table 3 (bottom) shows our results on German ROTOWIRE. We compare
against NCP+CC's entry in the WNGT 2019 shared task (footnote 5;
Hayashi et al., 2019), and our implementation of Templ, ED+CC,
ENT, Macro, and RBF-2020. Saleh et al. (2019) are not comparable
as they pretrain on 32M parallel and 420M monolingual data.
Likewise, Puduppully et al. (2019c) make use of a jointly trained
multilingual model by combining ROTOWIRE with German ROTOWIRE.
We find that SeqPlan achieves the highest RG P among neural
models, and performs on par with Macro (it obtains higher BLEU but
lower CS F and CO scores). The +Bin variant of Macro performs
better on BLEU but worse on other metrics. As in Table 2,
w Uniform struggles across metrics, corroborating our hypothesis
that latent sequential planning improves generation performance.
The other two variants (w Oracle and 2-Stage) are worse than
SeqPlan in RG P and CS F, comparable in CO, and slightly higher in
terms of BLEU.
On German, our model is best across metrics, achieving an RG P of
91.8%, which is higher by 42% (absolute) compared to Macro. In
fact, the RG P of SeqPlan is superior to Saleh et al. (2019),
whose model is pretrained with additional data and is considered
state of the art (Hayashi et al., 2019). RG# is lower mainly
because of a bug in the German IE that excludes number records.
RG# for NCP+CC and Macro is too high because the summaries contain
considerable repetition. The same record will repeat at least once
with NCP+CC and three times with Macro, whereas only 7% of the
records are repeated with SeqPlan.
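As a rough illustration of what a repetition statistic of this kind measures, the sketch below computes the share of distinct records that occur more than once in a summary. The exact procedure behind the 7% figure is not specified here, so the function `repeated_record_rate`, the (entity, value, type) tuple format, and the toy records are all assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

Record = Tuple[str, str, str]  # (entity, value, record type), a hypothetical format


def repeated_record_rate(records: Iterable[Record]) -> float:
    """Percentage of distinct records that are mentioned more than once."""
    counts = Counter(records)
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n > 1)
    return 100.0 * repeated / len(counts)


# Toy output of an information extraction system for one summary.
extracted = [
    ("Player A", "25", "PTS"),
    ("Player A", "25", "PTS"),   # the same record mentioned a second time
    ("Player B", "10", "REB"),
    ("Team X", "110", "TEAM-PTS"),
    ("Team Y", "98", "TEAM-PTS"),
]
print(repeated_record_rate(extracted))  # 25.0
```

A summary that restates the same facts inflates RG# without adding content, which is why a high count is not necessarily desirable.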
Table 4 evaluates the quality of the plans inferred by our model
on the ROTOWIRE dataset. As can be seen, SeqPlan is slightly worse
than Macro in terms of CS F and CO. We believe this is because
summaries in ROTOWIRE are somewhat formulaic, with a plan similar
to Templ: an opening statement is followed by a description of the
top scoring players, and a conclusion describing the next match.
Such plans can be learned well by Macro without access to the
summary. MLB texts show much more diversity in terms of length, and
5 We thank Hiroaki Hayashi for providing us with the
output of the NCP+CC system.
Dataset   System    CS P%   CS R%   CS F%   CO DLD%
MLB       Macro     73.6    45.9    56.5    27.0
MLB       SeqPlan   74.4    51.1    60.6    27.1
RW        Macro     81.5    62.7    70.9    36.3
RW        SeqPlan   79.1    61.6    69.3    35.5
DE-RW     Macro     86.8    34.2    49.0    30.1
DE-RW     SeqPlan   73.1    60.8    66.4    31.0

Table 4: Evaluation of macro planning stage (test set); content
selection (CS) precision (P%), recall (R%), and F-measure (F%),
content ordering (CO) as complement of normalized
Damerau-Levenshtein distance (DLD%).
System     Number    Name     double-double
Templ       0.08*    3.05*    0.00*
WS-2017    13.01*    9.66*    0.36*
ED+CC       8.11*    8.29*    0.31*
NCP+CC      7.89*    7.76*    0.14
ENT         5.89*    7.24*    0.15
RBF-2020    6.20*    8.39*    0.41*
Macro       2.57     4.60*    0.18
SeqPlan     2.70     6.56     0.20

Table 5: Number, Name, and double-double (Word) errors per
example. Systems significantly different from SeqPlan are marked
with an asterisk * (using a one-way ANOVA with posthoc Tukey HSD
tests; p ≤ 0.05).
the sequencing of entities and events. The learning problem is
also more challenging, as supported by the fact that the template
system does not do very well in this domain (i.e., it is worse in
BLEU, CS F, and CO compared to ROTOWIRE). In German ROTOWIRE,
SeqPlan plans achieve higher CS F and CO than Macro.
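As a quick sanity check on how the CS columns relate, the sketch below recomputes the F-measure as the harmonic mean of precision and recall for the six rows of Table 4. The helper name `f_measure` is hypothetical; the P, R, and F values are the ones reported in the table, and small differences are expected because the reported P and R are already rounded.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, system) -> (CS P%, CS R%, reported CS F%), copied from Table 4.
table4 = {
    ("MLB", "Macro"): (73.6, 45.9, 56.5),
    ("MLB", "SeqPlan"): (74.4, 51.1, 60.6),
    ("RW", "Macro"): (81.5, 62.7, 70.9),
    ("RW", "SeqPlan"): (79.1, 61.6, 69.3),
    ("DE-RW", "Macro"): (86.8, 34.2, 49.0),
    ("DE-RW", "SeqPlan"): (73.1, 60.8, 66.4),
}

for (dataset, system), (p, r, reported_f) in table4.items():
    recomputed = f_measure(p, r)
    print(f"{dataset:6s} {system:8s} reported={reported_f:5.1f} recomputed={recomputed:5.1f}")
```

On German ROTOWIRE, Macro's high precision (86.8) is offset by low recall (34.2), which is why its F-measure falls well below that of SeqPlan.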
Table 5 reports complementary automatic metrics on English
ROTOWIRE aiming to assess the factuality of generated output. We
find that Templ has the fewest Number, Name, and double-double
errors. This is expected as it simply reproduces facts from the
table. SeqPlan and Macro have similar Number errors, and both are
significantly better than other neural models. SeqPlan has
significantly more Name errors than Macro, and significantly fewer
than other neural models. Inspection of Name errors revealed that
these are mostly due to incorrect information about next games.
Such information is not part of the input and models are prone to
hallucinate. SeqPlan fares worse as it attempts to discuss next
games for both teams while Macro focuses on one team only. In
terms of double-double errors, SeqPlan is comparable to Macro,
ENT, and NCP+CC, and significantly better than WS-2017, ED+CC, and
RBF-2020.
Figure 4: Sample efficiency for (a) MLB and (b) ROTOWIRE datasets.
SeqPlan and Macro are trained on different portions (%) of the
training dataset and performance is measured with RG P%.
5.2 Sample Efficiency
We also evaluated whether SeqPlan is more sample-efficient in
comparison to Macro, by examining how RG P varies with (training)
data size. As shown in Figure 4, the difference between SeqPlan
and Macro is more pronounced when relatively little data is
available. For example, with 10% of training data, RG P for
SeqPlan on MLB is 85.7% and 92.1% on ROTOWIRE. In contrast, Macro
obtains 57.5% on MLB and 47.1% on ROTOWIRE. As more training data
becomes available, the difference in RG P decreases. The slope of
increase in RG P for Macro is higher for ROTOWIRE than MLB. We
hypothesize that this is because MLB has longer summaries with
more paragraphs, and it is thus more difficult for Macro to learn
alignments between paragraph plans and text paragraphs in the
game summary.
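The size of the gap at 10% of the training data follows directly from the figures quoted above. The short sketch below only restates that arithmetic; the dictionary layout and variable names are illustrative.

```python
# RG P (%) when training on 10% of the data, as quoted in the text.
rg_p_at_10_percent = {
    "MLB": {"SeqPlan": 85.7, "Macro": 57.5},
    "ROTOWIRE": {"SeqPlan": 92.1, "Macro": 47.1},
}

# Absolute difference in percentage points, rounded to one decimal place.
gaps = {
    dataset: round(scores["SeqPlan"] - scores["Macro"], 1)
    for dataset, scores in rg_p_at_10_percent.items()
}
print(gaps)  # {'MLB': 28.2, 'ROTOWIRE': 45.0}
```

With little data, SeqPlan is therefore ahead of Macro by roughly 28 points on MLB and 45 points on ROTOWIRE in RG P.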
5.3 Human Evaluation
We used the Amazon Mechanical Turk crowdsourcing platform for our
judgment elicitation study. To ensure consistent ratings (van der
Lee et al., 2019), we required that raters have completed at least
1,000 tasks, and have at least 98% approval rate. Participants
were restricted to English-speaking countries (USA, UK, Canada,
Australia, Ireland, or New Zealand) and were allowed to provide
feedback or ask questions. Raters were paid an average of $0.35
for each task, ensuring that the remuneration is higher than the
minimum wage per hour in the United States. We compared SeqPlan
with Gold, Templ, ED+CC, and Macro; we did not compare against
ENT, as previous work (Puduppully and Lapata, 2021) has shown that
it performs poorly against Macro. For ROTOWIRE, we additionally
compared against RBF-2020.
Supported and Contradicted Facts  Our first elicitation study
provided raters with box scores (and play-by-plays in the case of
MLB), along with sentences randomly extracted from game summaries.
We asked them to count supported and contradicting facts (ignoring
hallucinations). Participants were given a cheatsheet to help them
understand box score and play-by-play statistics as well as
examples of sentences with the correct count of supported and
contradicting facts. This evaluation was conducted on 40 summaries
(20 for each dataset), with four sentences per summary, each rated
by three participants. For MLB, this resulted in 300 tasks
(5 systems × 20 summaries × 3 raters) and for ROTOWIRE in 360
(6 systems × 20 summaries × 3 raters). Altogether, we had 177
participants. The agreement between raters using Krippendorff's α
for supported facts and contradicting facts was 0.43.
Table 6 (columns #Supp and #Contra) presents our results. Lower is
better for contradicting facts. In the case of supporting facts,
the count should be neither too high nor too low. A high count of
supporting facts indicates poor content selection. A low count of
supporting facts with a high count of contradicting facts
indicates low accuracy of generation.
Templ achieves the lowest count of contradicting facts and the
highest count of supported facts for both datasets. This is no
surprise as it essentially regurgitates facts (i.e., records) from
the table. On MLB, all systems display a comparable count of
supported facts (differences are not statistically significant),
with the exception of Templ, which contains significantly more. In
terms of contradicting facts, SeqPlan performs on par with Macro,
Gold, and Templ, and is significantly better than ED+CC. On
ROTOWIRE, in terms of supported facts, SeqPlan performs on par
with the other neural models, is significantly
MLB        #Supp    #Contra   Gram      Coher     Concis
Gold       3.59     0.14       21.67     29.17     14.17
Templ      4.21*    0.04      −58.33*   −48.33*     9.17
ED+CC      3.42     0.72*     −32.50*   −18.33*   −48.33*
Macro      3.76     0.25       37.50     15.00     22.50
SeqPlan    3.68     0.19       31.67     22.50      2.50

ROTOWIRE   #Supp    #Contra   Gram      Coher     Concis
Gold       3.63*    0.07       42.67*    40.67     28.00
Templ      7.57*    0.08      −57.33*   −55.33*   −34.67*
ED+CC      3.92     0.91*       4.00    −14.67*   −13.33
RBF-2020   5.08     0.67*       6.00      1.33     −0.67
Macro      4.00     0.27        0.67      7.33     10.00
SeqPlan    4.84     0.17        4.00     20.67     10.67

Table 6: Average number of supported (#Supp) and contradicting
(#Contra) facts in game summaries and best-worst scaling
evaluation for Coherence (Coher), Conciseness (Concis), and
Grammaticality (Gram). Lower is better for contradicting facts;
higher is better for Coherence, Conciseness, and Grammaticality.
Systems significantly different from SeqPlan are marked with an
asterisk * (using a one-way ANOVA with post hoc Tukey HSD tests;
p ≤ 0.05).
higher than Gold, and significantly lower than Templ. In terms of
contradicting facts, SeqPlan performs on par with Macro, Gold, and
Templ, and significantly better than ED+CC and RBF-2020.
Coherence, Grammaticality, and Conciseness  In our second study,
raters were asked to choose the better summary from a pair of
summaries based on Coherence (Is the summary well structured and
well organized and does it have a natural ordering of the facts?),
Conciseness (Does the summary avoid unnecessary repetition
including whole sentences, facts or phrases?), and Grammaticality
(Is the summary written in well-formed English?). For this study,
we required that the raters be able to comfortably comprehend
summaries of NBA/MLB games. We obtained ratings using Best-Worst
scaling (Louviere and Woodworth, 1991; Louviere et al., 2015), an
elicitation paradigm shown to be more accurate than Likert scales.
The score for a system is obtained by the number of times it is
rated best minus the number of times it is rated worst (Orme,
2009). Scores range between −100 (absolutely worst) and +100
(absolutely best); higher is better. We assessed 40 summaries from
the test set (20 for each dataset). Each summary pair was rated by
three participants.
For MLB, we created 1,800 tasks (10 system pairs × 20 summaries ×
3 raters × 3 dimensions) and 2,700 for ROTOWIRE (15 pairs of
systems × 20 summaries × 3 raters × 3 dimensions). Altogether, 377
raters participated in this task. The agreement between the raters
using Krippendorff's α was 0.49.
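The task counts and the scoring rule of this study can be illustrated with a small sketch. The pair counts follow from choosing 2 systems out of 5 (MLB) or 6 (ROTOWIRE). The function `best_worst_scores` is hypothetical, and expressing the best-minus-worst count as a percentage of the comparisons a system took part in is an assumption made so that scores fall between −100 and +100.

```python
from collections import Counter
from math import comb
from typing import Dict, Iterable, List, Tuple

# Number of rating tasks: system pairs x summaries x raters x dimensions.
mlb_tasks = comb(5, 2) * 20 * 3 * 3       # 10 pairs -> 1,800 tasks
rotowire_tasks = comb(6, 2) * 20 * 3 * 3  # 15 pairs -> 2,700 tasks


def best_worst_scores(
    judgments: Iterable[Tuple[str, str]], systems: List[str]
) -> Dict[str, float]:
    """Score each system from pairwise (preferred, rejected) judgments.

    In a pairwise comparison the preferred summary counts as "best" and
    the other one as "worst". Assumption: the difference is normalized by
    the number of comparisons the system appeared in and scaled to percent.
    """
    judgments = list(judgments)
    best = Counter(winner for winner, _ in judgments)
    worst = Counter(loser for _, loser in judgments)
    scores = {}
    for system in systems:
        appearances = best[system] + worst[system]
        if appearances == 0:
            scores[system] = 0.0
        else:
            scores[system] = 100.0 * (best[system] - worst[system]) / appearances
    return scores


# Toy judgments for three hypothetical systems on one dimension.
toy = [("A", "B"), ("A", "C"), ("B", "C")]
print(mlb_tasks, rotowire_tasks)
print(best_worst_scores(toy, ["A", "B", "C"]))
```

In the toy example, system A is always preferred and scores +100, system C is never preferred and scores −100, and system B wins one comparison and loses one, giving 0.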
On MLB, SeqPlan is significantly more coher-
ent than ED+CC and Templ, and is comparable
with Gold and Macro. A similar picture emerges
with grammaticality. SeqPlan is as concise as
Gold, Macro, and Templ, and significantly better
than ED+CC. On ROTOWIRE, SeqPlan is signif-
icantly more coherent than Templ and ED+CC,
but on par with Macro, RBF-2020, and Gold.
In terms of conciseness, SeqPlan is compara-
ble to Gold, Macro, RBF-2020, and ED+CC,
and significantly better than Templ. In terms of
grammaticality, SeqPlan is comparable to Macro,
RBF-2020, and ED+CC, significantly better than
Templ, and significantly worse than Gold.
6 Discussion
In this work, we proposed a novel sequential latent variable model
for joint macro planning and generation. Key in our approach is
the creation of a latent plan in a sequential manner, while
interleaving the prediction of plans and the generation of
corresponding paragraphs. We proposed to deconstruct monolithic
long document generation into smaller units (paragraphs, in our
case), which affords flexibility and better communication between
planning and generation. Taken together, the results of automatic
and human evaluation suggest that SeqPlan performs best in terms
of factuality and coherence, it generates diverse, and overall
fluent, summaries, and is less data-hungry compared with strong
systems like Macro and NCP+CC. As SeqPlan does not have to learn
alignments between the macro plan and the output text, it is
better suited for long-form generation. Potential applications
include summarizing books (Kryściński et al., 2021), where the
output can be longer than 1,000 tokens, or generating financial
reports (Kogan et al., 2009; Händschke et al., 2018), where the
output exceeds 9,000 tokens.
Existing approaches for long-form generation summarize individual
paragraphs independently (Kryściński et al., 2021) or adopt a
hierarchical approach (Wu et al., 2021), where summaries of
ST. LOUIS – The St. Louis Cardinals have been waiting for
their starting rotation.
Skip Schumaker drove in the go-ahead
run with a double in the ninth inning, and the Cardinals beat the
Milwaukee Brewers 4–3 on Wednesday night to avoid a three-game
sweep.
The Cardinals have won four of five, and have won
four in a row.
The Cardinals have won four of five, including
a three-game sweep by the Brewers.
Brian Barton led off
the ninth with a pinch-hit double off Derrick Turnbow (0–1) and
moved to third on Cesar Izturis’ sacrifice bunt. Schumaker drove in
Barton with a double down the left-field line.
Ryan Braun,
who had two hits, led off the eighth with a double off Ryan Franklin
(1–1). Braun went to third on a wild pitch and scored on Corey Hart’s
triple into the right-field corner.
Albert Pujols was intentionally
walked to load the bases with one out in the eighth, and Guillermo
Ankiel flied out. Troy Glaus walked to load the bases for Kennedy,
who hit a sacrifice fly off Guillermo Mota.
Ryan Franklin (1–1)
got the win despite giving up a run in the eighth. Ryan Braun led
off with a double and scored on Corey Hart’s one-out triple.
Jason Isringhausen pitched a perfect ninth for his seventh save in
nine chances. He has converted his last six save opportunities and has
n’t allowed a run in his last three appearances.
The Brewers
lost for the seventh time in eight games.
Wainwright allowed
two runs and four hits in seven innings. He walked four and struck
out six.
Brewers manager Ron Roenicke was ejected by home
plate umpire Bill Miller for arguing a called third strike.
The
Cardinals took a 2–0 lead in the third. Albert Pujols walked with two
outs and Rick Ankiel walked. Glaus then lined a two-run double into
the left-field corner.
The Brewers tied it in the third. Jason
Kendall led off with a double and scored on Rickie Weeks’ double.
Ryan Braun’s RBI single tied it at 2.
Villanueva allowed two
runs and three hits in seven innings. He walked four and struck
out one.
Table 7: Predicted macro plan (top) and generated output from our
model. Transitions between paragraph plans are shown using →.
Paragraphs are separated with delimiters. Entities and events in
the summary corresponding to the macro plan are boldfaced.
paragraphs form the basis of chapter summaries which in turn are
composed into a book summary.
Table 7 gives an example of SeqPlan output. We see that the game
summary follows the macro plan closely. Moreover, the paragraph
plans and the paragraphs exhibit coherent ordering. Manual
inspection of SeqPlan summaries reveals that a major source of
errors in MLB relates to attention diffusing over long paragraph
plans. As an example, consider the following paragraph produced by
SeqPlan: “Casey Kotchman had three hits and three RBIs, including
a two-run double in the second inning that put the Angels up 2–0.
Torii Hunter had three hits and drove in a run.” In reality, Torii
Hunter had two hits but the model incorrectly generates hits for
Casey Kotchman.
The corresponding paragraph plan is 360 tokens long and attention
fails to discern important tokens. A more sophisticated encoder,
for example, based on Transformers (Vaswani et al., 2017), could
make attention more focused. In ROTOWIRE, the majority of errors
involve numbers (e.g., team attributes) and numerical comparisons.
Incorporating pre-executed operations such as min, max (Nie et
al., 2018) could help alleviate these errors.
Finally, it is worth mentioning that although the template models
achieve highest RG precision for both MLB and ROTOWIRE (Tables 2
and 3), this is mainly because they repeat facts from the table.
Template models score low against CS F, CO, and BLEU metrics.
Moreover, they obtain lowest scores in Grammaticality and
Coherence (Table 6), which indicates that they are poor at
selecting records from the table and ordering them correctly in a
fluent manner.
Acknowledgments
We thank the Action Editor, Ehud Reiter, and the anonymous
reviewers for their constructive feedback. We also thank Parag
Jain for helpful discussions. We acknowledge the financial support
of the European Research Council (award number 681760,
“Translating Multiple Modalities into Text”).
References
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural
machine translation by jointly learning to align and translate. In
3rd International Conference on Learning Representations, ICLR
2015, San Diego, CA, USA, May 7–9, 2015, Conference Track
Proceedings.
Regina Barzilay and Mirella Lapata. 2005. Collective content
selection for concept-to-text generation. In Proceedings of Human
Language Technology Conference and Conference on Empirical Methods
in Natural Language Processing, pages 331–338, Vancouver, British
Columbia, Canada. Association for Computational Linguistics.
https://doi.org/10.3115/1220575.1220617
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer.
2015. Scheduled sampling for sequence prediction with recurrent
neural networks. In Proceedings of the 28th International
Conference on Neural Information Processing Systems – Volume 1,
NIPS'15, pages 1171–1179, Cambridge, MA, USA. MIT Press.
Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron
C. Courville, and Yoshua Bengio. 2015. A recurrent latent variable
model for sequential data. In Advances in Neural Information
Processing Systems, volume 28. Curran Associates, Inc.
Carl Doersch. 2016. Tutorial on variational autoencoders. CoRR,
abs/1606.05908.
Pablo A. Duboue and Kathleen R. McKeown. 2001. Empirically
estimating order constraints for content planning in generation.
In Proceedings of the 39th Annual Meeting of the Association for
Computational Linguistics, pages 172–179, Toulouse, France.
Association for Computational Linguistics.
John C. Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive
subgradient methods for online learning and stochastic
optimization. Journal of Machine Learning Research, 12:2121–2159.
Angela Fan, David Grangier, and Michael Auli. 2018. Controllable
abstractive summarization. In Proceedings of the 2nd Workshop on
Neural Machine Translation and Generation, pages 45–54, Melbourne,
Australia. Association for Computational Linguistics.
Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, and Ole
Winther. 2016. Sequential neural models with stochastic layers. In
Advances in Neural Information Processing Systems, volume 29.
Curran Associates, Inc.
Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, and
Alexander Rush. 2020. Latent template induction with Gumbel-CRFs.
In Advances in Neural Information Processing Systems, volume 33,
pages 20259–20271. Curran Associates, Inc.
Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura
Perez-Beltrachini. 2017. Creating training corpora for NLG
micro-planners. In Proceedings of the 55th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers),
pages 179–188, Vancouver, Canada. Association for Computational
Linguistics. https://doi.org/10.18653/v1/P17-1017
Albert Gatt and Emiel Krahmer. 2018. Survey of the state of the
art in natural language generation: Core tasks, applications and
evaluation. Journal of Artificial Intelligence Research,
61:65–170. https://doi.org/10.1613/jair.5477
Sebastian Gehrmann, Yuntian Deng, and Alexander Rush. 2018.
Bottom-up abstractive summarization. In Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing,
pages 4098–4109, Brussels, Belgium. Association for Computational
Linguistics. https://doi.org/10.18653/v1/D18-1443
Heng Gong, Xiaocheng Feng, Bing Qin, and Ting Liu. 2019.
Table-to-text generation with effective hierarchical encoder on
three dimensions (row, column and time). In Proceedings of the
2019 Conference on Empirical Methods in Natural Language
Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP), pages 3143–3152, Hong Kong,
China. Association for Computational Linguistics.
https://doi.org/10.18653/v1/D19-1310
Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan
Rosemary Ke, and Yoshua Bengio. 2017. Z-forcing: Training
stochastic recurrent networks. In Advances in Neural Information
Processing Systems, volume 30. Curran Associates, Inc.
Markus Guhe. 2020. Incremental Conceptualization for Language
Production. Psychology Press.
https://doi.org/10.4324/9781003064398
Sebastian G. M. Händschke, Sven Buechel, Jan Goldenstein, Philipp
Poschmann, Tinghui Duan, Peter Walgenbach, and Udo Hahn. 2018. A
corpus of corporate annual and social responsibility reports: 280
million tokens of balanced organizational writing. In Proceedings
of the First Workshop on Economics and Natural Language
Processing, pages 20–31, Melbourne, Australia. Association for
Computational Linguistics. https://doi.org/10.18653/v1/W18-3103
Hiroaki Hayashi, Yusuke Oda, Alexandra Birch, Ioannis Konstas,
Andrew Finch, Minh-Thang Luong, Graham Neubig, and Katsuhito
Sudoh. 2019. Findings of the third workshop on neural generation
and translation. In Proceedings of the 3rd Workshop on Neural
Generation and Translation, pages 1–14, Hong Kong. Association for
Computational Linguistics. https://doi.org/10.18653/v1/D19-5601
Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji
Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, and
Hiroya Takamura. 2019. Learning to select, track, and generate for
data-to-text. In Proceedings of the 57th Annual Meeting of the
Association for Computational Linguistics, pages 2102–2113,
Florence, Italy. Association for Computational Linguistics.
https://doi.org/10.18653/v1/P19-1202
Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical
reparameterization with Gumbel-softmax. In International
Conference on Learning Representations (ICLR 2017).
OpenReview.net.
Min-Yen Kan and Kathleen R. McKeown. 2002. Corpus-trained text
generation for summarization. In Proceedings of the International
Natural Language Generation Conference, pages 1–8, Harriman, New
York, USA. Association for Computational Linguistics.
Zdeněk Kasner, Simon Mille, and Ondřej Dušek. 2021.
Text-in-context: Token-level error detection for table-to-text
generation. In Proceedings of the 14th International Conference on
Natural Language Generation, pages 259–265, Aberdeen, Scotland,
UK. Association for Computational Linguistics.
Byeongchang Kim, Jaewoo Ahn, and Gunhee Kim. 2020. Sequential
latent knowledge selection for knowledge-grounded dialogue. In
International Conference on Learning Representations.
https://doi.org/10.1145/3459637.3482314
Diederik P. Kingma and Max Welling. 2014. Auto-encoding
variational Bayes. In 2nd International Conference on Learning
Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014,
Conference Track Proceedings.
Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and
Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural
machine
translation. In Proceedings of ACL 2017, System Demonstrations,
pages 67–72, Vancouver, Canada. Association for Computational
Linguistics. https://doi.org/10.18653/v1/P17-4012
Shimon Kogan, Dimitry Levin, Bryan R. Routledge, Jacob S. Sagi,
and Noah A. Smith. 2009. Predicting risk from financial reports
with regression. In Proceedings of Human Language Technologies:
The 2009 Annual Conference of the North American Chapter of the
Association for Computational Linguistics, pages 272–280, Boulder,
Colorado. Association for Computational Linguistics.
https://doi.org/10.3115/1620754.1620794
Ioannis Konstas and Mirella Lapata. 2013. Inducing document plans
for concept-to-text generation. In Proceedings of the 2013
Conference on Empirical Methods in Natural Language Processing,
pages 1503–1514, Seattle, Washington, USA. Association for
Computational Linguistics.
Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming
Xiong, and Dragomir Radev. 2021. Booksum: A collection of datasets
for long-form narrative summarization.
Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural text
generation from structured data with application to the biography
domain. In Proceedings of the 2016 Conference on Empirical Methods
in Natural Language Processing, pages 1203–1213, Austin, Texas.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/D16-1128
Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander
Wubben, and Emiel Krahmer. 2019. Best practices for the human
evaluation of automatically generated text. In Proceedings of the
12th International Conference on Natural Language Generation,
pages 355–368, Tokyo, Japan. Association for Computational
Linguistics.
Willem J. M. Levelt. 1993. Speaking: From Intention to
Articulation, volume 1. MIT Press.
https://doi.org/10.7551/mitpress/6393.001.0001
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad,
Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke
Zettlemoyer. 2020. BART: Denoising sequence-to-sequence
pre-training for natural language generation, translation, and
comprehension. In Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics, pages 7871–7880,
Online. Association for Computational Linguistics.
https://doi.org/10.18653/v1/2020.acl-main.703
Piji Li, Wai Lam, Lidong Bing, and Zihao Wang. 2017. Deep
recurrent generative decoder for abstractive text summarization.
In Proceedings of the 2017 Conference on Empirical Methods in
Natural Language Processing, pages 2091–2100, Copenhagen, Denmark.
Association for Computational Linguistics.
Xiang Lisa Li and Alexander Rush. 2020. Posterior control of
blackbox generation. In Proceedings of the 58th Annual Meeting of
the Association for Computational Linguistics, pages 2731–2743,
Online. Association for Computational Linguistics.
Jordan J. Louviere, Terry N. Flynn, and Anthony Alfred John
Marley. 2015. Best-Worst Scaling: Theory, Methods and
Applications. Cambridge University Press.
https://doi.org/10.1017/CBO9781107337855
Jordan J. Louviere and George G. Woodworth. 1991. Best-Worst
Scaling: A Model for the Largest Difference Judgments. University
of Alberta: Working Paper.
Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. 2017. The
concrete distribution: A continuous relaxation of discrete random
variables. In International Conference on Learning Representations
(ICLR 2017). OpenReview.net.
Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to
talk about and how? Selective generation using LSTMs with
coarse-to-fine alignment. In Proceedings of the 2016 Conference of
the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pages 720–730, San
Diego, California. Association for Computational Linguistics.
Amit Moryossef, Yoav Goldberg, and Ido Dagan. 2019a. Improving
quality and efficiency in plan-based neural data-to-text
generation. In Proceedings of the 12th International
Conference on Natural Language Generation,
pages 377–382, Tokio, Japón. Asociación para
Ligüística computacional. https://doi
.org/10.18653/v1/W19-8645
Amit Moryossef, Yoav Goldberg, and Ido Dagan.
2019b. Step-by-step: Separating planning from
realization in neural data-to-text generation.
En procedimientos de
el 2019 Conference of
the North American Chapter of the Associ-
ation for Computational Linguistics: Humano
Language Technologies, Volumen 1 (Long and
Artículos breves), pages 2267–2277, Mineápolis,
Minnesota. Asociación de Computación
Lingüística.
Shashi Narayan, Joshua Maynez, Jakub Adamek,
Daniele Pighin, Blaz Bratanic, y ryan
McDonald. 2020. Stepwise extractive summa-
rization and planning with structured transform-
ers. En Actas de la 2020 Conferencia sobre
Métodos empíricos en Natural Language Pro-
cesando (EMNLP), pages 4143–4159, En línea.
Asociación de Lingüística Computacional.
https://doi.org/10.18653/v1/2020
.emnlp-main.339
Feng Nie, Jinpeng Wang, Jin-Ge Yao, Rong Pan,
and Chin-Yew Lin. 2018. Operation-guided
neural networks for high fidelity data-to-text
generación. En Actas de la 2018 Estafa-
Conferencia sobre métodos empíricos en Lan Natural.-
Procesamiento de calibre, pages 3879–3889, Bruselas,
Bélgica. Asociación de Lin Computacional-
guísticos. https://doi.org/10.18653
/v1/D18-1422
Bryan Orme. 2009. Maxdiff analysis: Simple
counting, individual-level logit, and HB. Saw-
tooth Software.
Kishore Papineni, Salim Roukos, Todd Ward,
and Wei-Jing Zhu. 2002. BLEU: A method for
automatic evaluation of machine translation.
In Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics,
pages 311–318, Philadelphia, Pennsylvania,
USA. Association for Computational Linguistics.
https://doi.org/10.3115/1073083.1073135
Ratish Puduppully, Li Dong, and Mirella Lapata.
2019a. Data-to-text generation with content se-
lection and planning. In Proceedings of the 33rd
AAAI Conference on Artificial Intelligence,
Honolulu, Hawaii.
https://doi.org/10.1609/aaai.v33i01.33016908
Ratish Puduppully, Li Dong, and Mirella Lapata.
2019b. Data-to-text generation with entity
modeling. In Proceedings of the 57th Annual
Meeting of the Association for Computational
Linguistics, pages 2023–2035, Florence, Italy.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/P19-1195
Ratish Puduppully and Mirella Lapata. 2021.
Data-to-text generation with macro planning.
Transactions of the Association for
Computational Linguistics, abs/2102.02723.
https://doi.org/10.1162/tacl_a_00381
Ratish Puduppully, Jonathan Mallinson, and
Mirella Lapata. 2019c. University of
Edinburgh's submission to the document-level
generation and translation shared task. In
Proceedings of the 3rd Workshop on Neural
Generation and Translation, pages 268–272,
Hong Kong. Association for Computational
Linguistics.
https://doi.org/10.18653/v1/D19-5630
Clément Rebuffel, Laure Soulier, Geoffrey
Scoutheeten, and Patrick Gallinari. 2020. A
hierarchical model for data-to-text generation. In
European Conference on Information Retrieval,
pages 65–80. Springer.
https://doi.org/10.1007/978-3-030-45439-5_5
Lena Reed, Shereen Oraby, and Marilyn Walker.
2018. Can neural generators for dialogue
learn sentence planning and discourse
structuring? In Proceedings of the 11th International
Conference on Natural Language Generation,
pages 284–295, Tilburg University, The
Netherlands. Association for Computational
Linguistics.
https://doi.org/10.18653/v1/W18-6535
Ehud Reiter and Robert Dale. 1997. Building
applied natural language generation systems.
Natural Language Engineering, 3(1):57–87.
https://doi.org/10.1017/S1351324997001502
Ehud Reiter and Robert Dale. 2000. Building
Natural Language Generation Systems.
Cambridge University Press, New York, NY.
https://doi.org/10.1017/CBO9780511519857
Danilo Jimenez Rezende, Shakir Mohamed, and
Daan Wierstra. 2014. Stochastic backpropagation
and approximate inference in deep
generative models. In Proceedings of the 31st
International Conference on Machine Learning,
ICML 2014, Beijing, China, 21–26 June
2014, volume 32 of JMLR Workshop and
Conference Proceedings, pages 1278–1286.
JMLR.org.
Jacques Robin. 1994. Revision-based generation
of Natural Language Summaries providing
historical Background. Ph.D. thesis, Columbia
University.
Sascha Rothe, Shashi Narayan, and Aliaksei
Severyn. 2020. Leveraging pre-trained checkpoints
for sequence generation tasks.
Transactions of the Association for Computational
Linguistics, 8:264–280.
https://doi.org/10.1162/tacl_a_00313
Fahimeh Saleh, Alexandre Berard, Ioan
Calapodescu, and Laurent Besacier. 2019. Naver
Labs Europe's systems for the document-level
generation and translation task at WNGT
2019. In Proceedings of the 3rd Workshop
on Neural Generation and Translation,
pages 273–279, Hong Kong. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/D19-5631
Abigail See, Peter J. Liu, and Christopher D.
Manning. 2017. Get to the point: Summarization
with pointer-generator networks. In
Proceedings of the 55th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1073–1083,
Vancouver, Canada. Association for
Computational Linguistics.
Rico Sennrich, Barry Haddow, and Alexandra
Birch. 2016. Neural machine translation of rare
words with subword units. In Proceedings of
the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long
Papers), pages 1715–1725, Berlin, Germany.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/P16-1162
Iulian Vlad Serban, Alessandro Sordoni, Ryan
Lowe, Laurent Charlin, Joelle Pineau, Aaron C.
Courville, and Yoshua Bengio. 2017. A
hierarchical latent variable encoder-decoder model
for generating dialogues. In Proceedings of
the Thirty-First AAAI Conference on
Artificial Intelligence, February 4-9, 2017, San
Francisco, California, USA, pages 3295–3301.
AAAI Press.
Shiv Shankar and Sunita Sarawagi. 2019.
Posterior attention models for sequence to
sequence learning. In International Conference
on Learning Representations.
https://doi.org/10.18653/v1/D18-1065
Zhihong Shao, Minlie Huang, Jiangtao Wen,
Wenfei Xu, and Xiaoyan Zhu. 2019. Long and
diverse text generation with planning-based
hierarchical variational model. In Proceedings
of the 2019 Conference on Empirical Methods
in Natural Language Processing and the
9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP),
pages 3257–3268, Hong Kong, China.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/D19-1321
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.
2014. Sequence to sequence learning with
neural networks. In Advances in Neural
Information Processing Systems 27, pages 3104–3112.
Curran Associates, Inc.
Kumiko Tanaka-Ishii, Koiti Hasida, and Itsuki
Noda. 1998. Reactive content selection in the
generation of real-time soccer commentary. In
36th Annual Meeting of the Association for
Computational Linguistics and 17th
International Conference on Computational
Linguistics, Volume 2, pages 1282–1288, Montréal,
Quebec, Canada. Association for Computational
Linguistics.
https://doi.org/10.3115/980691.980778
M. Martin Taylor and Insup Taylor. 1990. Book
reviews: Speaking: From intention to articulation.
Computational Linguistics, 16(1).
Craig Thomson and Ehud Reiter. 2020. A gold
standard methodology for evaluating accuracy
in data-to-text systems. In Proceedings
of the 13th International Conference on
Natural Language Generation, pages 158–168,
Dublin, Ireland. Association for Computational
Linguistics.
Craig Thomson and Ehud Reiter. 2021.
Generation challenges: Results of the accuracy
evaluation shared task. In Proceedings of the
14th International Conference on Natural
Language Generation, pages 240–248, Aberdeen,
Scotland, UK. Association for Computational
Linguistics.
Ashish Vaswani, Noam Shazeer, Niki Parmar,
Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,
Lukasz Kaiser, and Illia Polosukhin. 2017.
Attention is all you need. In I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach, R.
Fergus, S. Vishwanathan, and R. Garnett,
editors, Advances in Neural Information
Processing Systems 30, pages 5998–6008. Curran
Associates, Inc.
Oriol Vinyals, Meire Fortunato, and Navdeep
Jaitly. 2015. Pointer networks. In C. Cortes, N.
D. Lawrence, D. D. Lee, M. Sugiyama, and R.
Garnett, editors, Advances in Neural
Information Processing Systems 28, pages 2692–2700.
Curran Associates, Inc.
Ronald J. Williams and Jing Peng. 1990. An
efficient gradient-based algorithm for on-line
training of recurrent network trajectories. Neural
Computation, 2(4):490–501.
https://doi.org/10.1162/neco.1990.2.4.490
Sam Wiseman, Stuart Shieber, and Alexander
Rush. 2017. Challenges in data-to-document
generation. In Proceedings of the 2017
Conference on Empirical Methods in Natural
Language Processing, pages 2253–2263,
Copenhagen, Denmark. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/D17-1239
Jeff Wu, Long Ouyang, Daniel M. Ziegler,
Nisan Stiennon, Ryan Lowe, Jan Leike, and
Paul F. Christiano. 2021. Recursively
summarizing books with human feedback. CoRR,
abs/2109.10862.
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong
He, Alex Smola, and Eduard Hovy. 2016.
Hierarchical attention networks for document
classification. In Proceedings of the 2016
Conference of the North American Chapter of
the Association for Computational
Linguistics: Human Language Technologies,
pages 1480–1489, San Diego, California.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/N16-1174
Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei,
and Lei Li. 2020. Variational template machine
for data-to-text generation. In International
Conference on Learning Representations.