Data-to-Text Generation with Macro Planning
Ratish Puduppully and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
r.puduppully@sms.ed.ac.uk
mlap@inf.ed.ac.uk
Abstract
Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macro planning stage followed by a generation stage reminiscent of traditional methods which embrace separate modules for planning and surface realization. Macro plans represent the high-level organization of important content such as entities, events, and their interactions; they are learned from data and given as input to the generator. Extensive experiments on two data-to-text benchmarks (ROTOWIRE and MLB) show that our approach outperforms competitive baselines in terms of automatic and human evaluation.
1 Introduction
Data-to-text generation refers to the task of generating textual output from non-linguistic input (Reiter and Dale, 1997, 2000; Gatt and Krahmer, 2018) such as databases of records, simulations of physical systems, accounting spreadsheets, or expert system knowledge bases. As an example, Figure 1 shows various statistics describing a major league baseball (MLB) game, including extracts from the box score (i.e., the performance of the two teams and individual team members who played as batters, pitchers or fielders; Table (A)), play-by-play (i.e., the detailed sequence of each play of the game as it occurred; Table (B)), and a human written game summary (Table (C)).
Traditional methods for data-to-text generation (Kukich, 1983; McKeown, 1992; Reiter and Dale, 1997) follow a pipeline architecture, adopting separate stages for text planning (determining which content to talk about and how it might be organized in discourse), sentence planning (aggregating content into sentences, deciding specific words to describe concepts and relations, and generating referring expressions), and linguistic realization (applying the rules of syntax, morphology, and orthographic processing to generate surface forms). Recent neural network–based approaches (Lebret et al., 2016; Mei et al., 2016; Wiseman et al., 2017) make use of the encoder-decoder architecture (Sutskever et al., 2014), are trained end-to-end, and have no special-purpose modules for how to best generate a text, aside from generic mechanisms such as attention and copy (Bahdanau et al., 2015; Gu et al., 2016). The popularity of end-to-end models has been further boosted by the release of new datasets with thousands of input-document training pairs. The example shown in Figure 1 is taken from the MLB dataset (Puduppully et al., 2019b), which contains baseball game statistics and human written summaries (∼25K instances). ROTOWIRE (Wiseman et al., 2017) is another widely used benchmark, which contains NBA basketball game statistics and their descriptions (∼5K instances).

Wiseman et al. (2017) show that despite being able to generate fluent text, neural data-to-text generation models are often imprecise, prone to hallucination (i.e., they generate text that is not supported by the input), and poor at content selection and document structuring. Attempts to remedy some of these issues focus on changing the way entities are represented (Puduppully et al., 2019b; Iso et al., 2019), allowing the decoder to skip low-confidence tokens to enhance faithful generation (Tian et al., 2019), and making the encoder-decoder architecture more modular by introducing micro planning (Puduppully et al., 2019a; Moryossef et al., 2019).
Figure 1: MLB statistics tables and game summary. Tables summarize the performance of teams and individual team members who played as batters and pitchers, as well as the most important actions (and their actors) in each play (Tables (A) and (B)). The macro plan for the game summary is shown at the bottom (Table (E)); <P> indicates paragraph delimiters. There is a plan for every paragraph in the game summary (correspondence shown in the same color); the shorthand entries in the plan stand for verbalizations of entities and of the top/bottom side of an inning (see Section 3.1). The set of candidate paragraph plans is shown above the macro plan (Table (D)) and grouped into two types: plans describing a single entity/event or their combinations. Best viewed in color.
Micro planning operates at the record level (see Table (A) in Figure 1; e.g., C.Mullins BH 2, J.Villar TEAM Orioles): it determines which facts should be mentioned within a textual unit (e.g., a sentence) and how these should be structured (e.g., the sequence of records). An explicit content planner essentially makes the job of the neural network less onerous, allowing it to concentrate on producing fluent natural language output without expending too much effort on content organization.
In this work, we focus on macro planning, the high-level organization of information and how it should be presented, which we argue is important for the generation of long, multi-paragraph documents (see text (C) in Figure 1). Problematically, modern datasets like MLB (Puduppully et al., 2019b; see also Figure 1) and ROTOWIRE (Wiseman et al., 2017) do not naturally lend themselves to document planning, as there is no explicit link between the summary and the content of the game (which is encoded in tabular form). In other words, the underlying plans are latent, and it is not clear how they might be best represented, namely, as sequences of records from a table, or simply words. However, game summaries, through their segmentation into paragraphs (and lexical overlap with the input), give clues as to how content might be organized. Paragraphs are a central element of discourse (Chafe, 1979; Longacre, 1979; Halliday and Hasan, 1976), the smallest domain where coherence and topic are defined and anaphora resolution is possible (Zadrozny and Jensen, 1991). We therefore operationalize the macro plan for a game summary as a sequence of paragraph plans.

Although resorting to paragraphs describes the summary plan at a coarse level, we still need to specify individual paragraph plans. In the sports domain, paragraphs typically mention entities (e.g., players important in the game), key events (e.g., scoring a run), and their interaction. Most of this information is encapsulated in the statistics accompanying game summaries (see Tables (A) and (B) in Figure 1). We thus define paragraph plans such that they contain verbalizations of entity and event records (see plan (E) in Figure 1). Given a set of paragraph plans and their corresponding game summary (see Table (D) and summary (C) in Figure 1), our task is twofold. At training time, we must learn how content was selected in order to give rise to specific game summaries (e.g., how input (D) led to plan (E) for summary (C) in Figure 1), while at test time, given input for a new game, we first predict a macro plan for the summary and then generate the corresponding document.
We present a two-stage approach where macro plans are induced from training data (by taking the table and corresponding summaries into account) and then fed to the text generation stage. Aside from making data-to-text generation more interpretable, the task of generating a document from a macro plan (rather than a table) affords greater control over the output text and plays to the advantage of encoder-decoder architectures, which excel at modeling sequences. We evaluate model performance on the ROTOWIRE (Wiseman et al., 2017) and MLB (Puduppully et al., 2019b) benchmarks. Experimental results show that our plan-and-generate approach produces output that is more factual, coherent, and fluent compared with existing state-of-the-art models. Our code, trained models, and dataset with macro plans can be found at https://github.com/ratishsp/data2text-macro-plan-py.
2 Related Work

Content planning has been traditionally considered a fundamental component in natural language generation. Not only does it determine which information-bearing units to talk about, but it also arranges them into a structure that creates coherent output. Many content planners have been based on theories of discourse coherence (Hovy, 1993), schemas (McKeown et al., 1997), or have relied on generic planners (Dale, 1989). Plans are mostly based on hand-crafted rules after analyzing the target text, although a few approaches have recognized the need for learning-based methods. For example, Duboue and McKeown (2001) learn ordering constraints in a content plan, Konstas and Lapata (2013) represent plans as grammar rules whose probabilities are estimated empirically, while others make use of semantically annotated corpora to bootstrap content planners (Duboue and McKeown, 2002; Kan and McKeown, 2002).
More recently, various attempts have been made to improve neural generation models (Wiseman et al., 2017) based on the encoder-decoder architecture (Bahdanau et al., 2015) by adding various planning modules. Puduppully et al. (2019a) propose a model for data-to-text generation that first learns a plan from the records in the input table and then generates a summary conditioned on this plan. Shao et al. (2019) introduce a Planning-based Hierarchical Variational Model where a plan is a sequence of groups, each of which contains a subset of input items to be covered in a sentence. The content of each sentence is verbalized, conditioned on the plan and previously generated context. In their case, input items are a relatively small list of attributes (∼28) and the output document is also short (∼110 words).

There have also been attempts to incorporate neural modules in a pipeline architecture for data-to-text generation. Moryossef et al. (2019) develop a model with a symbolic text planning stage followed by a neural realization stage. They experiment with the WebNLG dataset (Gardent et al., 2017), which consists of RDF ⟨Subject, Object, Predicate⟩ triples paired with corresponding text. Their document plan is a sequence of sentence plans that in turn determine the division of facts into sentences and their order. Along similar lines, Castro Ferreira et al. (2019) propose an architecture composed of multiple steps including discourse ordering, text structuring, lexicalization, referring expression generation, and surface realization. Both approaches show the effectiveness of pipeline architectures; however, their task does not require content selection and the output texts are relatively short (24 tokens on average).
Although it is generally assumed that task-specific parallel data is available for model training, Laha et al. (2020) do away with this assumption and present a three-stage pipeline model which learns from monolingual corpora. They first convert the input to a form of tuples, which in turn are expressed in simple sentences, followed by a third stage of merging simple sentences to form more complex ones by aggregation and referring expression generation. They also evaluate on data-to-text tasks which have relatively short outputs. There have also been efforts to improve the coherence of the output, especially when dealing with longer documents. Puduppully et al. (2019b) make use of hierarchical attention over entity representations which are updated dynamically, while Iso et al. (2019) explicitly keep track of salient entities and memorize which ones have been mentioned.

Our work also attempts to alleviate deficiencies in neural data-to-text generation models. In contrast to previous approaches (Puduppully et al., 2019a; Moryossef et al., 2019; Laha et al., 2020), we place emphasis on macro planning and create plans representing the high-level organization of a document, including both its content and structure. We share with previous work (e.g., Moryossef et al. 2019) the use of a two-stage architecture. We show that macro planning can be successfully applied to long document data-to-text generation, resulting in improved factuality, coherence, and fluency without any postprocessing (e.g., to smooth referring expressions) or recourse to additional tools (e.g., parsing or information extraction).
3 Problem Formulation

We hypothesize that generation based on plans should fare better compared to generating from a set of records, since macro plans offer a bird's-eye view, a high-level organization of the document content and structure. We also believe that macro planning will work well for long-form text generation, that is, for datasets that have multi-paragraph target texts, a large vocabulary space, and require content selection.

We assume the input to our model is a set of paragraph plans E = {e_i}_{i=1}^{|E|}, where e_i is a paragraph plan. We model the process of generating output summary y given E as a two-step process, namely, the construction of a macro plan x based on the set of paragraph plans, followed by the generation of a summary given a macro plan as input. We now explain how E is obtained and how each step is realized. We discuss our model considering mainly an example from the MLB dataset (Puduppully et al., 2019b) but also touch on how the approach can be straightforwardly adapted to ROTOWIRE (Wiseman et al., 2017).
3.1 Macro Plan Definition

A macro plan consists of a sequence of paragraph plans separated by a paragraph discourse marker <P>, that is, x = e_i <P> e_j . . . <P> e_k, where e_i, e_j, e_k ∈ E. A paragraph plan in turn is a sequence of entities and events describing the game. By entities we mean individual players or teams and the information provided about them in box score statistics (see rows and column headings in Figure 1, Table (A)), while events refer to information described in play-by-play (see Table (B)). In baseball, plays are grouped in half-innings. During each half of an inning, a team takes its turn to bat (the visiting team bats in the top half and the home team in the bottom half). An example macro plan is shown at the bottom of Figure 1.

Within a paragraph plan, entities and events are verbalized into a text sequence along the lines of Saleh et al. (2019). We make use of special tokens for the type of each record from the table, followed by its value, and we retain the same position for each record type and value. For example, batter C.Mullins from Figure 1 would be verbalized as a sequence of record-type tokens and their values (his team, his 2 hits, and so on). For the sake of brevity, we use shorthand notation for the verbalization of an entity and, likewise, shorthand for the team verbalization.

Paragraph Plan for Entities. For a paragraph containing entities, the corresponding plan will be a verbalization of the entities in sequence. For paragraphs with multiple mentions of the same entity, the plan will verbalize an entity only once and at its first position of mention. The paragraph ''Keller gave up a home run . . . the teams with the worst records in the majors'' from the summary in Figure 1 describes four entities including B. Keller, C. Mullins, Royals, and Orioles. The respective plan is the verbalization of these four entities in sequence.
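The verbalization described above can be pictured with a small helper like the one below; the record layout and record-type tokens are illustrative assumptions rather than the exact token inventory used in the released data.

def verbalize_entity(name: str, records: dict) -> str:
    """Turn an entity's box-score records into a flat token sequence,
    keeping record types and values in a fixed position order."""
    # Hypothetical record-type tokens; the actual inventory follows the dataset schema.
    order = ["TEAM", "AB", "H", "RBI", "HR"]
    tokens = ["<PLAYER>", name]
    for rtype in order:
        if rtype in records:
            tokens += [f"<{rtype}>", str(records[rtype])]
    return " ".join(tokens)

# e.g., batter C.Mullins with two hits (toy values for the other records)
print(verbalize_entity("C.Mullins", {"TEAM": "Orioles", "AB": 4, "H": 2, "RBI": 1}))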
Paragraph Plan for Events. A paragraph may also describe one or more events. For example, the paragraph ''With the score tied 1–1 in the fourth . . . 423-foot home run to left field to make it 3-1'' discusses what happened in the bottom halves of the fourth and fifth innings. We verbalize an event by first describing the participating entities followed by the plays in the event. Entities are described in the order in which they appear in a play, and within the same play we list the batter followed by the pitcher, fielder, scorer, and basemen. The paragraph plan corresponding to the bottom halves of the fourth and fifth inning thus begins with the verbalizations of the participating entities, which correspond in turn to W. Merrifield, A. Cashner, B. Goodwin, and H. Dozier, while the verbalization of the bottom half of the fifth refers to the first play in that half-inning (see the play-by-play table in Figure 1) and abbreviates the detailed plan for that play (the batter H.Dozier, the Home-run play, and so on).

The procedure described above is not specific to MLB and can be ported to other datasets with similar characteristics such as ROTOWIRE. However, ROTOWIRE does not provide play-by-play information, and as a result there is no event verbalization for this dataset.

3.2 Macro Plan Construction

We provided our definition for macro plans in the previous sections; however, it is important to note that such macro plans are not readily available in data-to-text benchmarks like MLB (Puduppully et al., 2019b) and ROTOWIRE (Wiseman et al., 2017), which consist of tables of records r paired with a gold summary y (see Tables (A)–(C) in Figure 1). We now describe our method for obtaining macro plans x from r and y.

Similar to Moryossef et al. (2019), we define macro plans to be conformant with gold summaries such that (1) they have the same splits into paragraphs—entities and events within a paragraph in y are grouped into a paragraph plan in x; and (2) the order of events and entities in a paragraph and its corresponding plan are identical. We construct macro plans by matching entities and events in the summary to records in the tables. Moreover, paragraph delimiters within summaries form natural units which taken together give rise to a high-level document plan.

We match entities in summaries with entities in tables using exact string match, allowing for some degree of variation in the expression of team names (e.g., A's for Athletics and D-backs for Diamondbacks). Information pertaining to innings appears in the summaries in the form of ordinal numbers (e.g., first, ninth) modifying the noun inning and can be relatively easily identified via pattern matching (e.g., in sentences like ''Dozier led off the fifth inning''). However, there are instances where the mention of innings is more ambiguous (e.g., ''With the score tied 1–1 in the fourth, Andrew Cashner (4–13) gave up a sacrifice fly''). We could disambiguate such mentions manually and then train a classifier to predict whether an inning is mentioned. Instead, we explore a novel annotation-free method that makes use of the pretrained language model GPT2 (Radford et al., 2019). Specifically, we feed the context preceding the ordinal number to GPT2 (i.e., the current paragraph up to the ordinal number and the paragraph preceding it), and if inning appears in the top 10 next word predictions, we consider it a positive match. On a held-out dataset, this method achieves 98% precision and 98% recall at disambiguating inning mentions.

To resolve whether the summary discusses the top or bottom side of an inning, we compare the entities in the paragraph with the entities in each half-inning (play-by-play Table (B) in Figure 1) and choose the side with the greater number of entity matches. For example, Andrew Cashner, Merrifield, and fourth inning uniquely resolve to the bottom half of the fourth inning.
is the outcome of a content selection process after
oh
w
norte
oh
a
d
mi
d
r
oh
metro
h
t
/
/
i
r
mi
C
t
.
t
.
d
tu
t
C
yo
/
r
t
i
C
mi
–
pag
d
/
oh
/
0
1
1
6
2
t
C
_
a
_
0
0
3
8
1
1
9
2
4
1
7
6
t
C
_
a
_
0
0
3
8
1
pag
d
y
gramo
tu
mi
s
t
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
3.3 Paragraph Plan Construction

Figure 1 shows the macro plan we obtain for game summary (C). Importantly, macro plan (E) is the outcome of a content selection process after considering several candidate paragraph plans as input. So, what are the candidate paragraph plans that give rise to macro plan (E)? To answer this question, we examined the empirical distribution of paragraph plans in MLB and ROTOWIRE (training portion). Interestingly, we found that ∼79% of the paragraph plans in MLB refer to a single event or a single player (and team(s)). In ROTOWIRE, ∼92% of paragraphs are about a singleton player (and team(s)) or a pair of players.

Based on this analysis, we assume that paragraph plans can be either one (verbalized) entity/event or a combination of at most two. Under this assumption, we explicitly enumerate the set of candidate paragraph plans in a game. For the game in Figure 1, candidate paragraph plans are shown in Table (D). The first table groups plans based on individual verbalizations describing the team(s), players, and events taking place in specific innings. The second table groups pairwise combinations thereof. In MLB, such combinations are between team(s) and players. In ROTOWIRE, we also create combinations between players. Such paragraph plans form set E based on which macro plan x is constructed to give rise to game summary y.
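Enumerating this candidate set is straightforward; the sketch below mirrors the singleton-plus-pair assumption (function and variable names are illustrative).

from itertools import combinations

def candidate_paragraph_plans(team_verbs, player_verbs, event_verbs, dataset="MLB"):
    """Enumerate candidate paragraph plans: every singleton verbalization plus
    combinations of at most two, following the empirical analysis in Section 3.3."""
    singletons = team_verbs + player_verbs + event_verbs
    candidates = [[v] for v in singletons]
    # Pairwise combinations between team(s) and players (MLB and ROTOWIRE) ...
    candidates += [[t, p] for t in team_verbs for p in player_verbs]
    if dataset == "ROTOWIRE":
        # ... and additionally player-player pairs for ROTOWIRE.
        candidates += [list(pair) for pair in combinations(player_verbs, 2)]
    return candidates

plans = candidate_paragraph_plans(["V(Royals) V(Orioles)"],
                                  ["V(B.Keller)", "V(C.Mullins)"],
                                  ["V(top-1st)", "V(bottom-4th)"])
print(len(plans))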
4 Model Description

The input to our model is a set of paragraph plans, each of which is a sequence of tokens. We first compute paragraph plan representations e_i ∈ R^n, and then apply a contextualization and content planning mechanism similar to planning modules introduced in earlier work (Puduppully et al., 2019a; Chen and Bansal, 2018). Predicted macro plans serve as input to our text generation model, which adopts an encoder-decoder architecture (Bahdanau et al., 2015; Luong et al., 2015).

4.1 Macro Planning

Paragraph Plan Representation. We encode the tokens in a verbalized paragraph plan e_i as {e_{i,j}}_{j=1}^{|e_i|} with a BiLSTM (Figure 2, bottom part). To reflect the fact that some records will be more important than others, we compute an attention weighted sum of {e_{i,j}}_{j=1}^{|e_i|} following Yang et al. (2016). Let d ∈ R^n denote a randomly initialized query vector, learnt jointly with the rest of the parameters. We compute attention values \alpha_{i,j} over d and paragraph plan token representation e_{i,j}:

\alpha_{i,j} \propto \exp(d^\top e_{i,j})    (1)
Figure 2: Paragraph plan representation and contextualization for macro planning. Computation of e_3 is detailed in Equations (1) and (2), of e^{att}_3 in Equation (3), and of e^c_3 in Equation (4).

Paragraph plan vector e_i is the attention weighted sum of the e_{i,j} (with \sum_j \alpha_{i,j} = 1):

e_i = \sum_j \alpha_{i,j} e_{i,j}    (2)

Next, we contextualize each paragraph plan representation vis-a-vis other paragraph plans (Figure 2, top left part). First, we compute attention scores \beta_{i,k} over paragraph plan representations and use them to obtain an attentional vector e^{att}_i for each e_i:

\beta_{i,k} \propto \exp(e_i^\top W_a e_k)
c_i = \sum_{k \neq i} \beta_{i,k} e_k
e^{att}_i = W_g [e_i; c_i]    (3)

where W_a \in R^{n \times n} and W_g \in R^{n \times 2n} are parameter matrices, and \sum_{k \neq i} \beta_{i,k} = 1. Then, we compute a content selection gate and apply this gate to e_i to obtain the new paragraph plan representation e^c_i:

g_i = \mathrm{sigmoid}(e^{att}_i)
e^c_i = g_i \odot e_i    (4)

where \odot denotes element-wise multiplication. Thus, each element in e_i is weighted by the corresponding element of g_i \in [0, 1]^n to obtain a contextualized paragraph plan representation e^c_i.

Content Planning. Our model learns to predict macro plans, after having been trained on pairs of sets of paragraph plans and corresponding macro plans.

4.2 Text Generation

The text generation stage takes as input the macro plan, that is, the sequence of selected paragraph plans with paragraph separators in between. The conditional output probability p(y|x) is modeled as:

p(y|x) = \prod_{t=1}^{|y|} p(y_t \mid y_{<t}, x)
                      ROTOWIRE    MLB
Vocab Size            11.3K       38.9K
# Tokens              1.5M        14.3M
# Instances           4.9K        26.3K
# Record Types        39          53
Avg Records           628         565
Avg Paragraph Plans   10.7        15.1
Avg Length            337.1       542.05

Table 1: Dataset statistics for ROTOWIRE and MLB. Vocabulary size, number of tokens, number of instances (i.e., table-summary pairs), number of record types, average number of records, average number of paragraph plans, and average summary length.

During inference, we employ beam search to find the most likely macro plan ẑ among candidate macro plans z′ given the paragraph plans as input:

\hat{z} = \arg\max_{z'} p(z' \mid E; \theta)

We deterministically obtain x̂ from ẑ, and then find the most likely output summary ŷ among candidate outputs y′ given macro plan x̂ as input:

\hat{y} = \arg\max_{y'} p(y' \mid \hat{x}; \phi)

where θ and φ denote the parameters of the macro planning and text generation stages, respectively.

5 Experimental Setup

Data. We performed experiments on the ROTOWIRE (Wiseman et al., 2017) and MLB (Puduppully et al., 2019b) benchmarks. The details of these two datasets are given in Table 1. We can see that MLB is around 5 times bigger, has a richer vocabulary, and has longer game summaries. We use the official splits of 3,398/727/728 for ROTOWIRE and 22,821/1,739/1,744 for MLB. We make use of a tokenization script[1] to detokenize and retokenize the summaries in both ROTOWIRE and MLB.

We reconstructed the MLB dataset, as the version released by Puduppully et al. (2019b) had removed all paragraph delimiters from game summaries. Specifically, we followed their methodology and downloaded the same summaries from the ESPN Web site[2] and added the <P> delimiter to paragraphs in the summaries.[3]

[1] https://github.com/neulab/DGT.
[2] http://www.espn.com/mlb/recap?gameId={gameid}.
[3] Although our model is trained on game summaries with paragraph delimiters, and also predicts these at generation time, for evaluation we strip them from model output.

ROTOWIRE does not have paragraph delimiters in game summaries either. We reverse engineered these as follows: (1) we split summaries into sentences using the NLTK (Bird et al., 2009) sentence tokenizer; (2) initialized each paragraph with a separate sentence; (3) merged two paragraphs into one if the entities in the former were a superset of entities in the latter; (4) repeated Step 3 until no merges were possible.
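The paragraph reverse-engineering heuristic can be sketched as follows, with entity extraction left as a stub (a simplified reading of steps (1)–(4), not the exact preprocessing script).

import nltk  # may require nltk.download('punkt')

def split_into_paragraphs(summary: str, extract_entities) -> list:
    """One paragraph per sentence, then repeatedly merge a paragraph into the
    previous one if the previous paragraph's entities are a superset of its own."""
    sentences = nltk.sent_tokenize(summary)           # step (1)
    paragraphs = [[s] for s in sentences]             # step (2)
    merged = True
    while merged:                                     # step (4)
        merged = False
        i = 1
        while i < len(paragraphs):
            prev_ents = {e for s in paragraphs[i - 1] for e in extract_entities(s)}
            curr_ents = {e for s in paragraphs[i] for e in extract_entities(s)}
            if curr_ents and prev_ents >= curr_ents:  # step (3): superset check
                paragraphs[i - 1].extend(paragraphs.pop(i))
                merged = True
            else:
                i += 1
    return paragraphs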
Training Configuration. We tuned the model hyperparameters on the development set. For training the macro planning and the text generation stages, we used the Adagrad (Duchi et al., 2011) optimizer. In addition, the text generation stage made use of truncated BPTT (Williams and Peng, 1990) with truncation length 100. We learn a subword vocabulary (Sennrich et al., 2016) for paragraph plans in the macro planning stage. We used 2.5K merge operations for ROTOWIRE and 8K merge operations for MLB. In text generation, we learn a joint subword vocabulary for the macro plan and game summaries. We used 6K merge operations for ROTOWIRE and 16K merge operations for MLB. All models were implemented on OpenNMT-py (Klein et al., 2017). We add to set E the paragraph plans corresponding to the output summary paragraphs, to ensure full coverage during training of the macro planner. During inference for predicting macro plans, we employ length normalization (Bahdanau et al., 2015) to avoid penalizing longer outputs; specifically, we divide the scores of beam search by the length of the output. In addition, we adopt bigram blocking (Paulus et al., 2018). For MLB, we further block beams containing more than two repetitions of a unigram. This helps improve the diversity of the predicted macro plans.
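As an illustration of the decoding heuristics just described, the snippet below scores and filters beam hypotheses with length normalization and bigram blocking; it is a simplification and not the OpenNMT-py implementation used here.

def normalized_score(log_prob: float, tokens: list) -> float:
    """Length normalization: divide the beam score by the output length."""
    return log_prob / max(len(tokens), 1)

def violates_blocking(tokens: list, max_unigram_repeats: int = 2) -> bool:
    """Bigram blocking (Paulus et al., 2018) plus the MLB-specific rule of
    discarding beams that repeat any unigram more than twice."""
    bigrams = list(zip(tokens, tokens[1:]))
    if len(bigrams) != len(set(bigrams)):          # a repeated bigram
        return True
    return any(tokens.count(tok) > max_unigram_repeats for tok in set(tokens))

hyp = ["<V(Royals)>", "<P>", "<V(B.Keller)>", "<P>", "<V(Royals)>", "<P>", "<V(Royals)>"]
print(normalized_score(-12.3, hyp), violates_blocking(hyp))  # repeated bigram and 3x unigram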
System Comparisons. We compared our model against the following systems: (1) the Template-based generators (Templ) from Wiseman et al. (2017) for ROTOWIRE and Puduppully et al. (2019b) for MLB. Both systems apply the same principle: they emit a sentence about the teams playing in the game, followed by player-specific sentences, and a closing sentence. MLB additionally contains a description of play-by-play; (2) ED+CC, the best performing system in Wiseman et al. (2017), is a vanilla encoder-decoder model equipped with an attention and copy mechanism; (3) NCP+CC, the micro planning model of Puduppully et al. (2019a), which generates content plans from the table
by making use of Pointer networks (Vinyals et al., 2015) to point to records; content plans are encoded with a BiLSTM and the game summary is decoded using another LSTM with attention and copy; (4) ENT, the entity-based model of Puduppully et al. (2019b), which creates dynamically updated entity-specific representations; the text is generated conditioned on the data input and entity memory representations using hierarchical attention at each time step.

6 Results

Automatic Evaluation. For automatic evaluation, following earlier work (Wiseman et al. 2017; Puduppully et al. 2019a,b, inter alia) we report BLEU (Papineni et al., 2002) with the gold summary as reference, but also make use of the Information Extraction (IE) metrics from Wiseman et al. (2017), which are defined over the output of an IE system; the latter extracts entity (players, teams) and value (numbers) pairs in a summary, and then predicts the type of relation. For example, given the pair Kansas City Royals, 9, it would predict their relation as TR (i.e., Team Runs). Training data for the IE system is obtained by checking for matches between entity-value pairs in the gold summary and entity-value-record type triplets in the table.

Let ˆy be the gold summary and y the model output. Relation Generation (RG) measures the precision and count of relations extracted from y that also appear in records r. Content Selection (CS) measures the precision and recall of relations extracted from y that are also extracted from ˆy. Content Ordering (CO) measures the normalized Damerau-Levenshtein distance between the sequences of relations extracted from y and ˆy.
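The CO metric can be illustrated with a small dynamic program computing a normalized Damerau-Levenshtein similarity between relation sequences (a generic sketch; the normalization in the official IE evaluation scripts may differ in detail).

def damerau_levenshtein(a: list, b: list) -> int:
    """Edit distance with insertions, deletions, substitutions, and transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def content_ordering(pred_relations: list, gold_relations: list) -> float:
    """Normalized similarity in [0, 1]: higher means the predicted relation
    sequence is ordered more like the gold one."""
    dist = damerau_levenshtein(pred_relations, gold_relations)
    return 1.0 - dist / max(len(pred_relations), len(gold_relations), 1)

# A single swap of two relations yields a similarity of 0.5 here.
print(content_ordering(["TR|Royals|9", "H|Mullins|2"], ["H|Mullins|2", "TR|Royals|9"]))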
We reused the IE model from Puduppully et al. (2019a) for ROTOWIRE but retrained it for MLB to improve its precision and recall. In addition, the implementation of Wiseman et al. (2017) computes RG, CS, and CO excluding duplicate relations. This artificially inflates the performance of models whose outputs contain repetition. We include duplicates in the computation of the IE metrics (and recreate them for all comparison systems).

Table 2 (top) presents our results on the ROTOWIRE test set. In addition to Templ, NCP+CC, ENT, and ED+CC, we include the best performing model of Wiseman et al. (2017) (WS-2017; note that ED+CC is an improved re-implementation of their model), and the model of Rebuffel et al. (2020) (RBF-2020), which represents the state of the art on ROTOWIRE. This model has a Transformer encoder (Vaswani et al., 2017) with a hierarchical attention mechanism over entities and records within entities. The models of Saleh et al. (2019), Iso et al. (2019), and Gong et al. (2019) make use of additional information not present in the input (e.g., previous/next games, summary writer) and are not directly comparable to the systems in Table 2. Results for the MLB test set are in the bottom portion of Table 2.

ROTOWIRE      RG #   RG P%   CS P%   CS R%   CS F%   CO DLD%   BLEU
Templ         54.3   99.9    27.1    57.7    36.9    13.1      8.46
WS-2017       34.1   75.1    20.3    36.3    26.1    12.4      14.19
ED+CC         35.9   82.6    19.8    33.8    24.9    12.0      14.99
NCP+CC        40.8   87.6    28.0    51.1    36.2    15.8      16.50
ENT           32.7   91.7    34.7    48.5    40.5    16.6      16.12
RBF-2020      44.9   89.5    23.9    47.0    31.7    14.3      17.16
Macro         42.1   97.6    34.1    57.8    42.9    17.7      15.46
−Plan(4)      36.2   81.3    22.1    38.6    28.1    12.1      14.00

MLB           RG #   RG P%   CS P%   CS R%   CS F%   CO DLD%   BLEU
Templ         62.3   99.9    21.6    55.2    31.0    11.0      4.12
ED+CC         32.5   91.3    27.8    40.6    33.0    17.1      9.68
NCP+CC        19.6   81.3    44.5    44.1    44.3    21.9      9.68
ENT           23.8   81.1    40.9    49.5    44.8    20.7      11.50
Macro         30.8   94.4    40.8    54.9    46.8    21.8      12.62
−Plan(SP,4)   25.1   92.7    40.0    44.6    42.2    21.9      11.09

Table 2: Evaluation on ROTOWIRE and MLB test sets; relation generation (RG) count (#) and precision (P%), content selection (CS) precision (P%), recall (R%) and F-measure (F%), content ordering (CO) in normalized Damerau-Levenshtein distance (DLD%), and BLEU.

Templ has the highest RG precision and count on both datasets. This is not surprising; by design, Templ is always faithful to the input. However, notice that it achieves the lowest BLEU among comparison systems, indicating that it mostly regurgitates facts with low fluency. Macro achieves the highest RG precision among all neural models for ROTOWIRE and MLB. We obtain an absolute improvement of 5.9% over ENT for ROTOWIRE and 13.3% for MLB. Moreover, Macro achieves the highest CS F-measure for both datasets. On ROTOWIRE, Macro achieves the highest CO score, and on MLB the highest BLEU. On ROTOWIRE, in terms of BLEU, Macro is worse than comparison models (e.g., NCP+CC or ENT).
Inspection of the output showed that the opening paragraph, which mostly describes how the two teams fared, is generally shorter in Macro, leading to shorter summaries and thus lower BLEU. There is high variance in the length of the opening paragraph in the training data, and Macro verbalizes the corresponding plan conservatively. Ideas such as length normalization (Wu et al., 2016) or length control (Kikuchi et al., 2016; Takeno et al., 2017; Fan et al., 2018) could help alleviate this; however, we do not pursue them further for fair comparison with the other models.

The Contribution of Macro Planning. To study the effect of macro planning in more detail, we further compared Macro against text generation models (see Section 4.2) which are trained on verbalizations of the tabular data (and gold summaries) but do not make use of document plans or a document planning mechanism. On ROTOWIRE, the model was trained on verbalizations of players and teams, with the input arranged such that the verbalization of the home team was followed by the visiting team, the home team players, and the visiting team players. Mention of players was limited to the four best ones, following Saleh et al. (2019) (see −Plan(4) in Table 2). For MLB, we additionally include verbalizations of innings focusing on scoring plays, which are likely to be discussed in game summaries (see −Plan(SP,4) in Table 2). Note that by preprocessing the input in such a way, some simple form of content selection takes place simply by removing extraneous information which the model does not need to consider.

Across both datasets, −Plan variants appear competitive. On ROTOWIRE, −Plan(4) is better than ED+CC in terms of content selection but worse compared to ENT. On MLB, −Plan(SP,4) is again superior to ED+CC in terms of content selection but not ENT, whose performance lags behind when considering RG precision. Taken together, these results confirm that verbalizing entities and events into a text sequence is effective. At the same time, we see that −Plan variants are worse than Macro across most metrics, which underlines the importance of an explicit planning component.

Macro       CS-P   CS-R   CS-F   CO
ROTOWIRE    81.3   73.2   77.0   45.8
MLB         80.6   63.3   70.9   31.4

Table 3: Evaluation of the macro planning stage; content selection precision (CS-P), recall (CS-R), F-measure (CS-F), and content ordering (CO) between the inferred plans and gold plans in terms of entities and events for the ROTOWIRE and MLB test sets.

Table 3 presents an intrinsic evaluation of the macro planning stage. Here, we compare the inferred macro plan with the gold macro plans; CS and CO metrics are computed with regard to entities and events instead of relations.
We see that our macro planning model (Macro) achieves high scores for CS and CO on both ROTOWIRE and MLB. We further used the CS and CO metrics to check how well the generated summary follows the (predicted) plan. We followed the steps in Section 3.2 and reverse engineered macro plans from the model summaries and compared these extracted plans with the original macro plans with regard to entities and events. We found that Macro creates summaries that follow the plan closely: for ROTOWIRE, the CS F-score and CO are greater than 98%; for MLB, the CS F-score is greater than 94% and CO is greater than 89%. We show an output summary for Macro in Table 4, together with the predicted document plan.

Human-Based Evaluation. We also asked participants to assess model output in terms of relation generation, grammaticality, coherence, and conciseness (Wiseman et al., 2017; Puduppully et al., 2019a; Puduppully et al., 2019b). For ROTOWIRE, we compared Macro against RBF-2020,[4] ED+CC, Gold, and Templ. For MLB, we compared Macro against ENT, ED+CC, Gold, and Templ.

[4] We are grateful to Clément Rebuffel for providing us with the output of their system.

We conducted our study on the Amazon Mechanical Turk (AMT) crowdsourcing platform, following best practices for human evaluation in NLG (van der Lee et al., 2019). Specifically, to ensure consistent ratings, we required crowdworkers to have an approval rating greater than 98% and a minimum of 1,000 previously completed tasks. Raters were restricted to English-speaking countries (i.e., US, UK, Canada, Ireland, Australia, or NZ). Participants were allowed to provide feedback on the task or field questions (our interface accepts free text).
ST. PETERSBURG, Fla. (AP) – The Tampa Bay Rays are making the most of it.

Akinori Iwamura hit a two-run homer in the eighth inning and the Rays beat the Boston Red Sox 2-1 on Sunday to complete a three-game sweep.

The Rays, who have the best record in the majors, have won six of their last seven games. The Rays have won four of their last five series, including three in a row against the Red Sox, who have won six of their last seven overall.

Dioner Navarro singled with one out in the eighth off Clay Buchholz (1-2) and moved to third on Jason Bartlett's flyout to center. Iwamura then drove a 1-1 pitch into the left-field stands for his second homer of the season.

Scott Dohmann (2-0) got the win in relief, striking out Manny Ramirez with runners on first and third to end the eighth.

Troy Percival worked the ninth for his fifth save in five opportunities.

Clay Buchholz (1-2) gave up two runs and three hits in eight innings. He struck out nine and walked two.

The Red Sox loaded the bases with one out in the fifth on a single by Coco Crisp, a wild pitch and a walk to Jed Lowrie. Jacoby Ellsbury drove in Crisp with a two-out single to center. Jackson struck out four and walked three.

The Red Sox loaded the bases with one out in the fifth on a single by Coco Crisp, a walk to Jed Lowrie and a one-out walk to Jed Lowrie. Jackson struck out Julio Lugo, but Jacoby Ellsbury singled to center to put the Red Sox up 1-0.

The Red Sox threatened in the eighth when J. D. Drew drew a two-out walk against Trever Miller, but Ramirez struck out to end the inning.

Table 4: Predicted macro plan (top) with corresponding model output (bottom). Entities and events in the summary corresponding to those in the macro plan are boldfaced.

In our first study, we presented crowdworkers with sentences randomly selected from summaries along with their corresponding box score (and play-by-play in the case of MLB) and asked them to count supported and contradicting facts (ignoring hallucinations, i.e., unsupported facts). We did not require crowdworkers to be familiar with NBA or MLB. Instead, we provided a cheat sheet explaining the semantics of box score tables. In addition, we provided examples of sentences with supported/contradicting facts. We evaluated 40 summaries from the test set (20 per dataset), 4 sentences from each summary, and elicited 3 responses per summary. This resulted in 40 summaries × 5 systems × 3 raters, for a total of 600 tasks. Altogether, 131 crowdworkers participated in this study (agreement using Krippendorff's α was 0.44 for supported and 0.42 for contradicting facts).

Table 5: Average number of supported (#Supp) and contradicting (#Contra) facts in game summaries and best-worst scaling evaluation of Grammaticality (Gram), Coherence (Coher), and Conciseness (Concis); higher is better. Systems significantly different from Macro are marked with an asterisk * (using a one-way ANOVA with post hoc Tukey HSD tests; p ≤ 0.05). The compared systems are Gold, Templ, ED+CC, RBF-2020, and Macro for ROTOWIRE, and Gold, Templ, ED+CC, ENT, and Macro for MLB.

As shown in Table 5, Macro yields the smallest number of contradicting facts among neural models on both datasets. On ROTOWIRE, the number of contradicting facts for Macro is comparable to Gold and Templ (the difference is not statistically significant) and significantly smaller compared to RBF-2020 and ED+CC. The count of supported facts for Macro is comparable to Gold and ED+CC, and significantly lower than Templ and RBF-2020. On MLB, Macro has significantly fewer contradicting facts than ENT and ED+CC and is comparable to Templ and Gold (the difference is not statistically significant). The count of supported facts for Macro is comparable to Gold, ENT, ED+CC, and Templ. For both datasets, Templ has the lowest number of contradicting facts. This is expected as Templ essentially parrots facts (aka records) from the table.

We also conducted a second study to evaluate the quality of the generated summaries. We presented crowdworkers with a pair of summaries and asked them to choose the better one in terms of Grammaticality (is the summary written in well-formed English?), Coherence (is the summary well structured and well organized and does it have a natural ordering of the facts?), and Conciseness (does the summary avoid unnecessary repetition including whole sentences, facts or phrases?). We provided example summaries showcasing good and bad output. For this task, we required that the crowdworkers be able to comfortably comprehend NBA/MLB game summaries. We elicited preferences with Best-Worst Scaling (Louviere and Woodworth, 1991; Louviere et al., 2015), a method shown to be more reliable than rating scales. The score of a system is computed as the number of times it is rated best minus the number of times it is rated worst (Orme, 2009).
The scores range from −100 (absolutely worst) to +100 (absolutely best). We divided the five competing systems into ten pairs of summaries and elicited ratings for 40 summaries (20 per dataset). Each summary pair was rated by 3 raters. This resulted in 40 summaries × 10 system pairs × 3 evaluation criteria × 3 raters, for a total of 3,600 tasks. A total of 206 crowdworkers participated in this task (agreement using Krippendorff's α was 0.47).
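For concreteness, a best-worst score of this kind can be computed as in the sketch below (hypothetical judgment tuples; the exact normalization reported here may differ).

from collections import Counter

def best_worst_scores(judgments):
    """Each judgment is (best_system, worst_system) for one item.
    Score = percentage of times best minus percentage of times worst,
    one common scaling to [-100, 100] (Orme, 2009)."""
    best, worst, n = Counter(), Counter(), len(judgments)
    for b, w in judgments:
        best[b] += 1
        worst[w] += 1
    systems = set(best) | set(worst)
    return {s: 100.0 * (best[s] - worst[s]) / n for s in systems}

print(best_worst_scores([("Macro", "Templ"), ("Gold", "Templ"), ("Macro", "ED+CC")]))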
As shown in Table 5, on ROTOWIRE, Macro is comparable to Gold, RBF-2020, and ED+CC in terms of Grammaticality but significantly better than Templ. In terms of Coherence, Macro is comparable to RBF-2020 and ED+CC, but significantly better than Templ and significantly worse than Gold. With regard to Conciseness, Macro is comparable to Gold, RBF-2020, and ED+CC, and significantly better than Templ. On MLB, Macro is comparable to Gold in terms of Grammaticality and significantly better than ED+CC, ENT, and Templ. Macro is comparable to Gold in terms of Coherence and significantly better than ED+CC, ENT, and Templ. In terms of Conciseness, raters found Macro comparable to Gold and Templ and significantly better than ED+CC and ENT. Taken together, our results show that macro planning leads to improvements in data-to-text generation in comparison to other systems on both the ROTOWIRE and MLB datasets.

7 Discussion

In this work we presented a plan-and-generate approach for data-to-text generation that consists of a macro planning stage representing high-level document organization in terms of structure and content, followed by a text generation stage. Extensive automatic and human evaluation shows that our approach achieves better results than existing state-of-the-art models and generates summaries which are factual, coherent, and concise.

Our results show that macro planning is more advantageous for generation tasks expected to produce longer texts with multiple discourse units, and could be easily extended to other sports domains such as cricket (Kelly et al., 2009) or American football (Barzilay and Lapata, 2005). Other approaches focusing on micro planning (Puduppully et al., 2019a; Moryossef et al., 2019) might be better tailored for generating shorter texts. There has been a surge of datasets recently focusing on single-paragraph outputs and the task of content selection, such as E2E (Novikova et al., 2017), WebNLG (Gardent et al., 2017), and WikiBio (Lebret et al., 2016; Perez-Beltrachini and Lapata, 2018). We note that in our model, content selection takes place during both macro planning and text generation. The results in Table 2 show that Macro achieves the highest CS F-measure on both datasets, indicating that the document as a whole and individual sentences discuss appropriate content.

Throughout our experiments we observed that template-based systems score poorly in terms of CS (but also CO and BLEU). This is primarily due to the inflexibility of the template approach, which is limited to the discussion of a fixed number of (high-scoring) players. Yet, human writers (and neural models to a certain extent) synthesize summaries taking into account the particulars of a specific game (where some players might be more important than others even if they scored less) and are able to override global defaults. Template sentences are fluent on their own, but since it is not possible to perform aggregation (Reiter, 1995), the whole summary appears stilted; it lacks coherence and variability, contributing to low BLEU scores. The template baseline is worse for MLB than ROTOWIRE, which reflects the greater difficulty of manually creating a good template for MLB. Overall, we observe that neural models are more fluent and coherent, being able to learn a better ordering of facts, which is in turn reflected in better CO scores.

Despite promising results, there is ample room to improve macro planning, especially in terms of the precision of RG (see Table 2, P% column of RG). We should not underestimate that Macro must handle relatively long inputs (the average input length in the MLB development set is ∼3100 tokens), which are challenging for the attention mechanism. Consider the following output of our model on the MLB dataset: Ramirez's two-run double off Joe Blanton tied it in the sixth, and Brandon Moss added a two-out RBI single off Alan Embree to give Boston a 3-2 lead. Here, the name of the pitcher should have been Joe Blanton instead of Alan Embree. In fact, Alan Embree is the pitcher for the following play in the half inning. In this case, attention diffuses over the relatively long MLB macro plan, leading to inaccurate content selection. We could alleviate this problem by adopting a noisy channel decomposition (Yee et al., 2019; Yu et al., 2020), that is, by
learning two different distributions: a conditional model that provides the probability of translating a paragraph plan to text, and a language model that provides an unconditional estimate of the output (i.e., the whole game summary). However, we leave this to future work.
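As a rough illustration of the noisy channel idea alluded to above (our reading of how Yee et al. (2019) and Yu et al. (2020) would apply here, not something evaluated in this paper), the decoder would score an output summary y for a macro plan x as

\hat{y} = \arg\max_{y} \; \log p(x \mid y) + \lambda \, \log p(y)

where p(x | y) is the channel (reconstruction) model, p(y) is an unconditional language model over summaries, and λ balances the two terms; together they would replace the single direct model p(y | x) used in this work.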
the model’s inability to understand numbers. Para
ejemplo, Macro generates the following output:
The Lakers were the superior shooters in this
juego, going 48 percent from the field and 30 por-
cent from the three-point line, while the Jazz went
47 percent from the floor and 30 percent from
beyond the arc. Aquí, 30 percent should have been
24 percent for the Lakers but the language model
expects a higher score for the three-point line, y
desde 24 is low (especially compared to 30 scored
by the Jazz), it simply copies 30 scored by the
Jazz instead. A mechanism for learning better rep-
resentations for numbers (Wallace et al., 2019) o
executing operations such as argmax or minus (NO
et al., 2018) should help alleviate this problem.
learning document plans from data, the decoupling
of planning from generation allows to flexibly
generate output according to specification. Para
ejemplo, we could feed the model with manually
constructed macro plans, consequently controlling
the information content and structure of the output
summary (p.ej., for generating short or long texts,
or focusing on specific aspects of the game).
and the three anonymous reviewers for their
constructive feedback. We also thank Laura
Perez-Beltrachini for her comments on an earlier
draft of this paper, and Parag Jain, Hao Zheng,
Stefanos Angelidis and Yang Liu for helpful
discusiones. Nosotros
financial
the European Research Council
support of
(Lapata; award number 681760, ‘‘Translating
Multiple Modalities into Text’’).
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings.
Regina Barzilay and Mirella Lapata. 2005. Collective content selection for concept-to-text generation. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 331–338, Vancouver, British Columbia, Canada. Association for Computational Linguistics.
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media.
Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, and Emiel Krahmer. 2019. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 552–562, Hong Kong, China. Association for Computational Linguistics.
Wallace L. Chafe. 1979. The flow of thought and the flow of language. In Talmy Givón, editor, Syntax and Semantics, volume 12, pages 159–181. Academic Press Inc.
Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 675–686, Melbourne, Australia. Association for Computational Linguistics.
Robert Dale. 1989. Generating referring expressions in a domain of objects and processes.
Pablo Duboue and Kathleen McKeown. 2002. Content planner construction via evolutionary
algorithms and a corpus-based fitness function. In Proceedings of the International Natural Language Generation Conference, pages 89–96, Harriman, New York, USA. Association for Computational Linguistics.

Pablo A. Duboue and Kathleen R. McKeown. 2001. Empirically estimating order constraints for content planning in generation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 172–179, Toulouse, France. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1073012.1073035

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159.

Angela Fan, David Grangier, and Michael Auli. 2018. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 45–54, Melbourne, Australia. Association for Computational Linguistics.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 179–188, Vancouver, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P17-1017

Albert Gatt and Emiel Krahmer. 2018. Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61:65–170. DOI: https://doi.org/10.1613/jair.5477

Heng Gong, Xiaocheng Feng, Bing Qin, and Ting Liu. 2019. Table-to-text generation with effective hierarchical encoder on three dimensions (row, column and time). In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1310

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O. K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1631–1640, Berlin, Germany. Association for Computational Linguistics.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 140–149, Berlin, Germany. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P16-1014

M. A. K. Halliday and Ruqaiya Hasan. 1976. Cohesion in English. Longman, London.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780. DOI: https://doi.org/10.1162/neco.1997.9.8.1735

Eduard H. Hovy. 1993. Automated discourse generation using discourse structure relations. Artificial Intelligence, 63(1-2):341–385. DOI: https://doi.org/10.1016/0004-3702(93)90021-3

Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, and Hiroya Takamura. 2019. Learning to select, track, and generate for data-to-text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2102–2113, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1202

Min-Yen Kan and Kathleen R. McKeown. 2002. Corpus-trained text generation for summarization. In Proceedings of the International
Natural Language Generation Conference, pages 1–8, Harriman, New York, USA. Association for Computational Linguistics.

Colin Kelly, Ann Copestake, and Nikiforos Karamanis. 2009. Investigating content selection for language generation using machine learning. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009), pages 130–137, Athens, Greece. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1610195.1610218

Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, and Manabu Okumura. 2016. Controlling output length in neural encoder-decoders. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1328–1338, Austin, Texas. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D16-1140

Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations, pages 67–72, Vancouver, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P17-4012

Ioannis Konstas and Mirella Lapata. 2013. Inducing document plans for concept-to-text generation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1503–1514, Seattle, Washington, USA. Association for Computational Linguistics.

Karen Kukich. 1983. Design of a knowledge-based report generator. In 21st Annual Meeting of the Association for Computational Linguistics. DOI: https://doi.org/10.3115/981311.981340

Anirban Laha, Parag Jain, Abhijit Mishra, and Karthik Sankaranarayanan. 2020. Scalable micro-planned generation of discourse from structured data. Computational Linguistics, 45(4):737–763. DOI: https://doi.org/10.1162/coli_a_00363

Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural text generation from structured data with application to the biography domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1203–1213, Austin, Texas. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D16-1128

Robert E. Longacre. 1979. The paragraph as a grammatical unit. In Talmy Givón, editor, Syntax and Semantics, volume 12, pages 115–133. Academic Press Inc.

Jordan J. Louviere, Terry N. Flynn, and A. A. J. Marley. 2015. Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press. DOI: https://doi.org/10.1017/CBO9781107337855

Jordan J. Louviere and George G. Woodworth. 1991. Best-worst scaling: A model for the largest difference judgments. University of Alberta: Working Paper.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421, Lisbon, Portugal. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D15-1166

Kathleen R. McKeown. 1992. Text Generation. Studies in Natural Language Processing. Cambridge University Press.

Kathleen R. McKeown, Shimei Pan, James Shaw, and Barry A. Allen. 1997. Language generation for multimedia healthcare briefings. In Fifth Conference on Applied Natural Language Processing, pages 277–282, Washington, DC, USA. Association for Computational Linguistics. DOI: https://doi.org/10.3115/974557.974598

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. In Proceedings of the 2016 Conference of
the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 720–730, San Diego, California. Association for Computational Linguistics.

Amit Moryossef, Yoav Goldberg, and Ido Dagan. 2019. Step-by-step: Separating planning from realization in neural data-to-text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2267–2277, Minneapolis, Minnesota. Association for Computational Linguistics.

Feng Nie, Jinpeng Wang, Jin-Ge Yao, Rong Pan, and Chin-Yew Lin. 2018. Operation-guided neural networks for high fidelity data-to-text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3879–3889, Brussels, Belgium. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D18-1422

Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. 2017. The E2E dataset: New challenges for end-to-end generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/W17-5525

Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In International Conference on Learning Representations.

Laura Perez-Beltrachini and Mirella Lapata. 2018. Bootstrapping generators from noisy data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1516–1527, New Orleans, Louisiana. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N18-1137

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019a. Data-to-text generation with content selection and planning. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, Hawaii. DOI: https://doi.org/10.1609/aaai.v33i01.33016908

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019b. Data-to-text generation with entity modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2023–2035, Florence, Italy. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P19-1195

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
Bryan Orme. 2009. MaxDiff analysis: Simple counting, individual-level logit, and HB. Sawtooth Software.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1073083.1073135

Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, and Patrick Gallinari. 2020. A hierarchical model for data-to-text generation. In Advances in Information Retrieval, pages 65–80. Springer. DOI: https://doi.org/10.1007/978-3-030-45439-5_5, PMCID: PMC7148215

Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Natural Language Engineering, 3(1):57–87. cmp-lg/9504013v1. DOI: https://doi.org/10.1017/S1351324997001502
Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Studies in Natural Language Processing. Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511519857

Fahimeh Saleh, Alexandre Berard, Ioan Calapodescu, and Laurent Besacier. 2019. Naver Labs Europe's systems for the document-level generation and translation task at WNGT 2019. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 273–279, Hong Kong. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-5631

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/P16-1162

Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, and Xiaoyan Zhu. 2019. Long and diverse text generation with planning-based hierarchical variational model. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3257–3268, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1321

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, volume 27, pages 3104–3112. Curran Associates, Inc.

Shunsuke Takeno, Masaaki Nagata, and Kazuhide Yamamoto. 2017. Controlling target features in neural machine translation via prefix constraints. In Proceedings of the 4th Workshop on Asian Translation (WAT 2017), pages 55–63, Taipei, Taiwan. Asian Federation of Natural Language Processing.

Ran Tian, Shashi Narayan, Thibault Sellam, and Ankur P. Parikh. 2019. Sticking to the facts: Confident decoding for faithful data-to-text generation. CoRR, abs/1910.08684v2.

Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander Wubben, and Emiel Krahmer. 2019. Best practices for the human evaluation of automatically generated text. In Proceedings of the 12th International Conference on Natural Language Generation, pages 355–368, Tokyo, Japan. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/W19-8643

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2692–2700. Curran Associates, Inc.

Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, and Matt Gardner. 2019. Do NLP models know numbers? Probing numeracy in embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5307–5315, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1534

Ronald J. Williams and Jing Peng. 1990. An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2(4):490–501. DOI: https://doi.org/10.1162/neco.1990.2.4.490

Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in data-to-document
generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D17-1239

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144v2.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N16-1174

Kyra Yee, Yann Dauphin, and Michael Auli. 2019. Simple and effective noisy channel modeling for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5696–5701, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1571

Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, and Chris Dyer. 2020. Better document-level machine translation with Bayes' rule. Transactions of the Association for Computational Linguistics, 8:346–360. DOI: https://doi.org/10.1162/tacl_a_00319
Wlodek Zadrozny and Karen Jensen. 1991. Semantics of paragraphs. Computational Linguistics, 17(2):171–210.