A Survey on Cross-Lingual Summarization

Jiaan Wang1∗, Fandong Meng2†, Duo Zheng4, Yunlong Liang2
Zhixu Li3†, Jianfeng Qu1 and Jie Zhou2
1School of Computer Science and Technology, Soochow University, Suzhou, China

2Pattern Recognition Center, WeChat AI, Tencent Inc, China
3Shanghai Key Laboratory of Data Science, School of Computer Science,

Fudan University, Shanghai, China

4Beijing University of Posts and Telecommunications, Beijing, China

jawang1@stu.suda.edu.cn, {fandongmeng,yunlonliang,withtomzhou}@tencent.com

zd@bupt.edu.cn, zhixuli@fudan.edu.cn, jfqu@suda.edu.cn

Abstract

Cross-lingual summarization is the task of generating a summary in one language (e.g., English) for the given document(s) in a different language (e.g., Chinese). Under the background of globalization, this task has attracted increasing attention from the computational linguistics community. Nevertheless, there still remains a lack of a comprehensive review of this task. Therefore, we present the first systematic critical review on the datasets, approaches, and challenges in this field. Specifically, we carefully organize existing datasets and approaches according to different construction methods and solution paradigms, respectively. For each type of dataset or approach, we thoroughly introduce and summarize previous efforts and further compare them with each other to provide deeper analyses. In the end, we also discuss promising directions and offer our thoughts to facilitate future research. This survey is for both beginners and experts in cross-lingual summarization, and we hope it will serve as a starting point as well as a source of new ideas for researchers and engineers interested in this area.

1 Introduction

To help people efficiently grasp the gist of documents in a foreign language, Cross-Lingual Summarization (XLS) aims to generate a summary in the target language from the given document(s) in a different source language. This task can be regarded as a combination of monolingual summarization (MS) and machine translation (MT), both of which are unsolved natural language processing (NLP) tasks and have been continuously studied for decades (Paice, 1990; Brown et al., 1993). XLS is an extremely challenging task: (1) from the perspective of data, unlike MS, naturally occurring documents in a source language paired with the corresponding summaries in different target languages are rare, making it difficult to collect large-scale and human-annotated datasets (Ladhak et al., 2020; Perez-Beltrachini and Lapata, 2021); (2) from the perspective of models, XLS requires both the abilities to translate and summarize, which makes it hard to generate accurate summaries by directly conducting XLS (Cao et al., 2020).

∗Work was done when Jiaan Wang was interning at Pattern Recognition Center, WeChat AI, Tencent Inc, China.
†Corresponding authors.

Despite its importance, XLS attracted little attention (Leuski et al., 2003; Wan et al., 2010) in the statistical learning era due to its difficulties and the scarcity of parallel corpora. Recent years have witnessed the rapid development of neural networks, especially the emergence of pre-trained encoder-decoder models (Zhang et al., 2020a; Raffel et al., 2020; Lewis et al., 2020; Liu et al., 2020; Tang et al., 2021; Xue et al., 2021), making neural summarizers and translators achieve impressive performance. Meanwhile, creating large-scale XLS datasets has proven feasible by utilizing existing MS datasets (Zhu et al., 2019; Wang et al., 2022b) or Internet resources (Ladhak et al., 2020; Perez-Beltrachini and Lapata, 2021). The aforementioned successes have laid the foundation for the XLS research field and gradually attracted interest in XLS. In particular, recent researchers have put their efforts into solving the XLS task and published more than 20 papers over the past five years. Nevertheless,

there still lacks a systematic review of the progress, challenges, and opportunities of XLS.

To fill the above gap and help new researchers, in this paper we provide the first comprehensive review of existing efforts relevant to XLS and give multiple promising directions for future research. Specifically, we first briefly introduce the formal definition and evaluation metrics of XLS (§ 2), which serves as a strong background before delving further into XLS. Then, we provide an exhaustive overview of existing XLS research datasets (§ 3). In detail, to alleviate the scarcity of XLS data, previous work resorts to different ways to construct large-scale benchmark datasets, which are divided into synthetic datasets and multi-lingual website datasets. The synthetic datasets (Zhu et al., 2019; Bai et al., 2021a; Wang et al., 2022b) are constructed through (manually or automatically) translating the summaries of existing MS datasets from a source language to target languages, while the multi-lingual website datasets (Nguyen and Daumé III, 2019; Ladhak et al., 2020; Fatima and Strube, 2021; Perez-Beltrachini and Lapata, 2021) are collected from websites that provide multi-lingual versions of their content.

Next, we thoroughly introduce and summarize existing models, which are organized with respect to different paradigms, namely, pipeline (§ 4) and end-to-end (§ 5). In detail, the pipeline models adopt either translate-then-summarize approaches (Leuski et al., 2003; Boudin et al., 2011; Wan, 2011; Yao et al., 2015; Zhang et al., 2016; Linhares Pontes et al., 2018; Wan et al., 2018; Ouyang et al., 2019) or summarize-then-translate approaches (Orăsan and Chiorean, 2008; Wan et al., 2010). In this manner, the pipeline models avoid conducting XLS directly, thus bypassing the model challenge we discussed previously. However, the pipeline method suffers from error propagation and recurring latency, making it not suitable for real-world scenarios (Ladhak et al., 2020). Consequently, the end-to-end method has attracted more attention. To alleviate the model challenge, it generally utilizes related tasks (e.g., MS and MT) as auxiliaries or resorts to external resources. The end-to-end models mainly fall into four categories: multi-task methods (Zhu et al., 2019; Takase and Okazaki, 2020; Cao et al., 2020; Bai et al., 2021a; Liang et al., 2022), knowledge-distillation methods (Ayana et al., 2018; Duan et al., 2019; Nguyen and Luu, 2022), resource-enhanced methods (Zhu et al., 2020; Jiang et al., 2022), and pre-training methods (Dou et al., 2020; Xu et al., 2020; Ma et al., 2021; Chi et al., 2021a; Wang et al., 2022b). For each category, we thoroughly go through the previous work and discuss the corresponding pros and cons. Finally, we also point out multiple promising directions for XLS to push forward future research (§ 6), followed by conclusions (§ 7). Our contributions are summarized as follows:

• To the best of our knowledge, this survey
is the first that presents a thorough review
of XLS.

• We comprehensively review the existing
XLS work and carefully organize them ac-
cording to different frameworks.

• We suggest multiple promising directions to

facilitate future research on XLS.

2 Background

2.1 Task Definition

Given a collection of documents in the source language D = {x_i}_{i=1}^{m} (m denotes the number of documents and m ≥ 1), the goal of XLS is to generate the corresponding summary in the target language Y = {y_i}_{i=1}^{n} with n words. The conditional distribution of XLS models is:

P(Y | D) = ∏_{t=1}^{n} P(y_t | D, y_{1:t−1}; θ)

where θ represents the model parameters and y_{1:t−1} is the partial ground truth summary.

It is worth noting that: (1) when m > 1, the task is upgraded to cross-lingual multi-document summarization (XLMS), which has been discussed by some previous studies (Orăsan and Chiorean, 2008; Boudin et al., 2011; Zhang et al., 2016); (2) when the given documents are dialogues, the task becomes cross-lingual dialogue summarization (XLDS), which has been recently proposed by Wang et al. (2022b). XLMS and XLDS are also within the scope of this survey. Moreover, we define that the source and the target languages in XLS should be two exactly distinct human languages, which also means (1) if the source language is in code-mixed style of two natural


languages (e.g., Chinese and English), the target language should not be either of them; (2) programming languages (e.g., PYTHON or JAVA) should not be the source or the target language.1

2.2 Evaluation

Following MS, ROUGE scores (Lin, 2004) are universally adopted as the basic automatic metrics for XLS, especially the F1 scores of ROUGE-1, ROUGE-2, and ROUGE-L, which measure the unigram, bigram, and longest common subsequence overlap between the ground truth and the generated summaries, respectively. Nevertheless, the original ROUGE scores are specifically designed for English. To make these metrics suitable for other languages, some useful toolkits have been released, for example, multi-lingual ROUGE2 and MLROUGE.3 In addition to these metrics based on lexical overlap, recent work proposes new metrics based on semantic similarity (token/word embeddings), such as MoverScore4 (Zhao et al., 2019) and BERTScore5 (Zhang et al., 2020b), whose strong consistency with human judgments on MS has been shown (Koto et al., 2021).
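To make the metric usage concrete, the following is a minimal Python sketch of scoring generated summaries with the released bert_score package, whose embedding-based measure is language-agnostic; the example strings are hypothetical, and the lexical-overlap toolkits above expose a similar candidate/reference interface.

# Minimal sketch: scoring generated Chinese summaries against references with
# BERTScore. The example strings are hypothetical placeholders.
from bert_score import score

candidates = ["这部电影讲述了一个年轻人追梦的故事。"]   # system outputs (target language)
references = ["影片描绘了一位青年追逐梦想的历程。"]      # ground-truth summaries

# `lang` selects a suitable underlying model for non-English text.
P, R, F1 = score(candidates, references, lang="zh", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")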

3 Datasets

In this section, we review available large-scale XLS datasets6 and further divide them into two categories: synthetic datasets (§ 3.1) and multi-lingual website datasets (§ 3.2). For each category, we introduce the construction details and the key characteristics of the corresponding datasets. In addition, we compare these two categories to provide a deeper understanding (§ 3.3).

1If the source language is a programming language while
the target language is a human language, the task becomes
code summarization, which is beyond the scope of this survey.
2https://github.com/csebuetnlp/xl-sum/tree/master/multilingual_rouge_scoring.
3https://github.com/dqwang122/MLROUGE.
4https://github.com/AIPHES/emnlp19-moverscore.
5https://github.com/Tiiiger/bert_score.
6There are also some XLS datasets in the statistical learning era, e.g., multiple MultiLing datasets (Giannakopoulos, 2013; Giannakopoulos et al., 2015) and the translated DUC2001 dataset (Wan, 2011). However, these datasets are either not public or extremely limited in scale (typically less than 100 samples). Thus, we do not go into these datasets in depth.

3.1 Synthetic Datasets

Intuitively, one straightforward way to build XLS
datasets is directly translating the summaries of a
MS dataset from their original language to differ-
ent target languages. The datasets built in this way
are named synthetic datasets, which could benefit
from existing MS datasets.

Dataset Construction. En2ZhSum (Zhu et al., 2019) is constructed by utilizing a sophisticated MT service7 to translate the summaries of CNN/Dailymail (Hermann et al., 2015) and MSMO (Zhu et al., 2018) from English to Chinese. In the same way, Zh2EnSum (Zhu et al., 2019) is built by translating the summaries of LCSTS (Hu et al., 2015) from Chinese to English. Later, Bai et al. (2021a) propose En2DeSum by translating the English Gigaword8 to German using the WMT'19 English-German winner MT model (Ng et al., 2019).

More recently, Wang et al. (2022b) construct XSAMSum and XMediaSum, which directly employ professional translators to translate the summaries of two dialogue-oriented MS datasets, that is, SAMSum (Gliwa et al., 2019) and MediaSum (Zhu et al., 2021), from English to both German and Chinese. In this way, their datasets achieve much higher quality than the automatically constructed ones.

Quality Controlling. Since the translation results provided by MT services might contain flaws, En2ZhSum, Zh2EnSum, and En2DeSum further use the round-trip translation (RTT) strategy to filter out low-quality samples. Specifically, given a monolingual document-summary pair ⟨D_src, S_src⟩, the summary S_src is first translated to the target language as S′_tgt, and then S′_tgt is translated back to the source language as S′_src. Next, ⟨D_src, S′_tgt⟩ will be retained as an XLS sample only if the ROUGE scores between S_src and S′_src exceed pre-defined thresholds. In addition, the translated summaries in the test sets of En2ZhSum and Zh2EnSum are post-edited by human annotators to ensure the reliability of model evaluation.
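As a concrete illustration of the RTT strategy, the following Python sketch filters one candidate sample with the rouge-score package; the translate function, the threshold values, and the English-oriented tokenization are assumptions for illustration only (non-English summaries would use the multilingual ROUGE variants from § 2.2), not the exact settings of the original datasets.

# Minimal sketch of round-trip translation (RTT) filtering. `translate` is a
# hypothetical wrapper around any MT service/model; the thresholds are illustrative.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def rtt_filter(doc_src, sum_src, translate, src="en", tgt="zh",
               thresholds=(0.45, 0.20, 0.40)):
    sum_tgt = translate(sum_src, src, tgt)        # S_src -> S'_tgt
    sum_back = translate(sum_tgt, tgt, src)       # S'_tgt -> S'_src
    scores = scorer.score(sum_src, sum_back)
    keep = (scores["rouge1"].fmeasure >= thresholds[0]
            and scores["rouge2"].fmeasure >= thresholds[1]
            and scores["rougeL"].fmeasure >= thresholds[2])
    # Keep the cross-lingual pair <D_src, S'_tgt> only if round-trip ROUGE is high enough.
    return (doc_src, sum_tgt) if keep else None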

As for the manually translated synthetic datasets, namely, XSAMSum and XMediaSum, Wang et al. (2022b) design a quality control loop, where data

7http://www.zkfy.com/.
8LDC2011T07.


Dataset     Trans.  Genre  Scale   Src Lang.  Tgt Lang.
En2ZhSum    Auto.   News   371k    En         Zh
Zh2EnSum    Auto.   News   1.7M    Zh         En
En2DeSum    Auto.   News   438k    En         De
XSAMSum     Manu.   Dial.  16k×2   En         De/Zh
XMediaSum   Manu.   Dial.  40k×2   En         De/Zh

Table 1: Overview of existing synthetic XLS datasets. ‘‘Trans.’’ indicates the translation method (automatic or manual) used to construct the datasets. The ‘‘genres’’ of these datasets are divided into news articles and dialogues according to the underlying MS datasets. For ‘‘scale’’, some datasets contain two cross-lingual directions, thus we use ×2 to calculate the overall scale. ‘‘Src Lang.’’ and ‘‘Tgt Lang.’’ denote the source and target languages for each dataset, respectively (En: English, Zh: Chinese, and De: German).

reviewers and experts participate to ensure the
accuracy of the translation.

Dataset Statistics. Table 1 compares previous synthetic datasets in terms of the translation method, genre, scale, source language, and target language. We conclude that: (1) There is a trade-off between scale and quality. In line with MS, the scale of XLS datasets in the news domain is much larger than others since news articles are convenient to collect. When faced with such large-scale datasets, it is expensive and even impractical to manually translate or post-edit all their summaries. Thus, these datasets generally adopt automatic translation methods, causing limited quality. (2) The XLS datasets in the dialogue domain are more challenging than those in the news domain. Besides the limited scale, the key information of one dialogue is often scattered and spans multiple utterances, leading to low information density (Feng et al., 2022c), which together with complex dialogue phenomena (e.g., coreference, repetition, and interruption) makes the task quite challenging (Wang et al., 2022b).

3.2 Multi-Lingual Website Datasets

In the globalization process, online resources across different languages are overwhelmingly growing. One reason is that many websites have started to provide multi-lingual versions of their content to facilitate global users. Therefore, these websites might contain a large number of parallel documents in different languages. Some researchers try to utilize such resources to establish XLS datasets.

Dataset Construction. Nguyen and Daumé III (2019) collect news articles from the Global Voices website,9 which reports and translates news about unheard voices across the globe. The translation of news on this website is performed by volunteer translators. Each news article also links to its parallel articles in other languages, if available. Thus, it is convenient to obtain different language versions of an article. Then, they employ crowdworkers to write English summaries for hundreds of selected English articles. In this manner, the non-English articles together with the English summaries constitute the Global Voices XLS dataset.10 Although this dataset utilizes online resources, the way summaries are collected (i.e., crowd-sourcing) limits its scale and directions (the target language must be English).

To alleviate this dilemma, WikiLingua (Ladhak et al., 2020) collects multi-lingual guides from WikiHow,11 where each step in a guide consists of a paragraph and the corresponding one-sentence summary. Heuristically, the dataset combines paragraphs and one-sentence summaries of all the steps in one guide to create a monolingual article-summary pair. With the help of hyperlinks between parallel guides in different languages, the article in one language and its summary in another are easy to align. In this way, WikiLingua collects articles and the corresponding summaries in 18 different languages, leading to 306 (18 × 17) directions. Similarly, Perez-Beltrachini and Lapata (2021) construct XLS datasets from Wikipedia,12 a widely used multi-lingual encyclopedia. In detail, Wikipedia articles are typically organized into lead sections and bodies. They focus on 4 languages and pair lead sections with the corresponding bodies in different languages to construct XLS samples. In the end, the collected samples form the XWikis dataset with 12 directions.

9https://globalvoices.org/.
10The Global Voices dataset contains two subsets: gv-snippet and gv-crowd. The former cannot well meet the need of XLS due to its low quality (Nguyen and Daumé III, 2019), thus we only introduce the gv-crowd subset.
11https://www.wikihow.com/.
12https://www.wikipedia.org/.


Dataset         Domain         L    D      Scale (avg / max / min)
Global Voices   News           15   14     208 / 487 / 75
CrossSum        News           45   1936   845 / 45k / 1
WikiLingua      Guides         18   306    18k / 113k / 915
XWikis          Encyclopedia   4    12     214k / 469k / 52k

Table 2: Overview of representative multi-lingual website datasets. ‘‘L’’ denotes the number of languages involved in each dataset. ‘‘D’’ indicates the number of cross-lingual directions. ‘‘Scale (avg/max/min)’’ calculates the average/maximum/minimum number of XLS samples per direction.

In addition, Hasan et al. (2021a) construct the CrossSum dataset by automatically aligning identical news articles written in different languages from the XL-Sum dataset (Hasan et al., 2021b). The multi-lingual news article-summary pairs in XL-Sum are collected from the BBC website.13 As a result, CrossSum involves 45 languages and 1936 directions.

Quality Control. For the manually annotated dataset (i.e., Global Voices), Nguyen and Daumé III (2019) employ human evaluation to remove low-quality annotated summaries. For the automatically collected datasets (i.e., WikiLingua and XWikis), the desired content is typically extracted from the websites via heuristic matching rules to ensure correctness. As for the automatically aligned dataset (i.e., CrossSum), Hasan et al. (2021a) adopt LaBSE (Feng et al., 2022a) to encode all summaries from XL-Sum (Hasan et al., 2021b). Then, they align documents belonging to different languages based on the cosine similarity of the corresponding summaries, and pre-define a minimum similarity score to reduce the number of incorrect alignments.
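A minimal sketch of this alignment step is given below, assuming the sentence-transformers release of LaBSE; the similarity threshold and example texts are illustrative rather than the actual values used for CrossSum.

# Minimal sketch of LaBSE-based cross-lingual summary alignment.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

en_summaries = ["The president announced a new climate policy."]
zh_summaries = ["总统宣布了一项新的气候政策。", "球队赢得了周末的比赛。"]

en_emb = model.encode(en_summaries, convert_to_tensor=True, normalize_embeddings=True)
zh_emb = model.encode(zh_summaries, convert_to_tensor=True, normalize_embeddings=True)

sim = util.cos_sim(en_emb, zh_emb)   # (num_en, num_zh) cosine similarity matrix
MIN_SIM = 0.7                        # pre-defined minimum similarity score (illustrative)
for i in range(sim.size(0)):
    j = int(sim[i].argmax())
    if sim[i, j] >= MIN_SIM:
        print(f"aligned: en[{i}] <-> zh[{j}] (cos={sim[i, j]:.2f})")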

Dataset Statistics. Table 2 lists the key characteristics of the representative multi-lingual website datasets. It is worth noting that the number of XLS samples in each direction of the same dataset may differ since different articles might be available in different languages. Thus, we measure the overall scale of each dataset by its average, maximum, and minimum number of XLS samples per direction, respectively. We find that: (1) The scale of Global Voices is far lower than that of the other datasets due to the different methods for collecting summaries. Specifically, the WikiLingua, XWikis, and XL-Sum (the basis of CrossSum) datasets automatically extract a huge number of summaries from online resources via simple strategies rather than crowd-sourcing. (2) CrossSum and WikiLingua involve more languages than the others, and most language pairs have intersecting articles, resulting in numerous cross-lingual directions.

13https://www.bbc.com/.

3.3 Discussion

According to the above review of large-scale XLS datasets, the approaches for building datasets can be summarized as: (I) manually or (II) automatically translating the summaries of MS datasets; (III) automatically collecting documents as well as summaries from multi-lingual websites.

Among them, approach I involves less noise than the others since its translation and quality control are performed by professional translators rather than machine translation or volunteers. However, this approach is too labor-intensive and costly to build large-scale datasets. For example, to control costs, XMediaSum (Wang et al., 2022b) only manually translates part of (∼8.6%) the summaries of MediaSum (Zhu et al., 2021). Besides, Zh2EnSum and En2ZhSum (Zhu et al., 2019) are automatically collected via approach II, and only their test sets have been manually corrected. Therefore, despite the high quality of the constructed data, approach I is more suitable for building the validation and test sets of large-scale XLS datasets rather than the whole datasets.

Approaches II and III could be adopted to build whole XLS datasets. We discuss them in the following situations:

(1) High-resource source languages ⇒ high-resource target languages: This situation has been well studied in previous work, and most of the proposed XLS datasets focus on this situation. Both approaches II and III are useful to construct XLS datasets whose source and target languages are both high-resource languages.

(2) High-resource source languages ⇒ low-resource target languages: When the documents and summaries from XLS datasets are, respectively, in a high-resource language and a low-resource language, approach III loses its effectiveness. This is because, for a multi-lingual website, its content in a low-resource language is typically less than that in a high-resource

language. As a result, the number of collected XLS samples involving low-resource languages is significantly limited. For example, WikiLingua (Ladhak et al., 2020), as a multi-lingual website dataset, contains 113.2k English⇒Spanish samples, but only 7.2k English⇒Czech samples. In this situation, approach II might be a possible way to collect a large number of samples. Note that MT from a high-resource language to a low-resource language might involve more translation flaws than that between two high-resource languages. Thus, besides the RTT strategy, how to filter out the potential flaws is worthy of further study.

(3) Low-resource source languages ⇒ high- or low-resource target languages: If the source language is low-resource, there might be no MS dataset and not enough website content in this language, leading to the failure of approaches II and III. Therefore, how to build datasets in this situation is still an open-ended problem, which needs to be explored in the future. As pointed out by Feng et al. (2022b), one straightforward approach is to automatically translate both documents and summaries from high-resource MS datasets. However, translating documents with hundreds of words might introduce substantial noise, especially when low-resource languages are involved. Thus, its practicality and reliability need more careful justification.

4 Pipeline Methods

Early XLS work generally focuses on pipeline methods whose main idea is decomposing XLS into MS and MT sub-tasks, and then accomplishing them step by step. These methods can be further divided into summarize-then-translate (Sum-Trans) and translate-then-summarize (Trans-Sum) types according to the order in which the sub-tasks are performed. For each type, we systematically present previous methods. In addition, we compare the two types to provide deeper analyses.
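The two orders can be summarized by the following minimal Python sketch, where translate and summarize are hypothetical stand-ins for any MT system and any monolingual summarizer; real pipeline systems differ in which language each component is trained on and how errors propagate between stages.

# Minimal sketch of the two pipeline paradigms.

def summarize_then_translate(doc_src, summarize, translate, src="zh", tgt="en"):
    # Sum-Trans: summarize in the source language, then translate the (short) summary.
    summary_src = summarize(doc_src, lang=src)
    return translate(summary_src, src, tgt)

def translate_then_summarize(doc_src, summarize, translate, src="zh", tgt="en"):
    # Trans-Sum: translate the whole document first, then summarize in the target language.
    doc_tgt = translate(doc_src, src, tgt)
    return summarize(doc_tgt, lang=tgt)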

4.1 Sum-Trans

Orăsan and Chiorean (2008) utilize the Maximum Marginal Relevance (MMR) algorithm to summarize Romanian news, and then translate the summaries from Romanian to English via the eTranslator MT service.14 Furthermore, Wan et al. (2010) find that the translated summaries might suffer from low readability due to the limited MT performance at that time. To alleviate this issue, they first use a trained SVM model (Cortes and Vapnik, 1995) to predict the translation quality of each English sentence, where the model only leverages features of the English sentences. Then, they select sentences with high quality and informativeness to form summaries, which are finally translated to Chinese by the Google MT service.15

4.2 Trans-Sum

Compared with Sum-Trans, Trans-Sum attracts
more research attention, and this type of pipeline
method can be further classified into three
sub-types depending on whether its summarizer is
extractive, compressive, or abstractive:

• The extractive method selects complete sen-
tences from the translated documents as
summaries.

• The compressive method first extracts key sentences from the translated documents, and further removes non-relevant or redundant words in the key sentences to obtain the final summaries.

• The abstractive method generates new sen-
tences as summaries, which are not limited
to original words or phrases.

Note that we do not classify the Sum-Trans ap-
proaches in the same manner since their sum-
marizers are all extractive.

Extractive Trans-Sum. Leuski et al. (2003) build a cross-lingual information delivery system that first translates Hindi documents to English via a statistical MT model and then selects important English sentences to form summaries. In this system, the summarizer only uses the document information from the target-language side, which heavily depends on the MT results and might lead to flawed summaries. However, semantic information from both sides should be taken into account.

To this end, after translating English documents to Chinese, Wan (2011) designs two graph-based summarizers (i.e., SimFusion and CoRank) which utilize bilingual information to produce the final Chinese summaries: (i) the SimFusion

summarizer first measures the saliency scores of Chinese sentences by combining the English-side and Chinese-side similarities, and then the salient Chinese sentences constitute the final summaries; (ii) the CoRank summarizer simultaneously ranks both English and Chinese sentences by incorporating the mutual influences between them, and then the top-ranking Chinese sentences are used to constitute summaries.

Later, Boudin et al. (2011) translate documents from English to French, and then use an SVM regression method to predict the translation quality of each sentence based on bilingual features. Next, the crucial translated sentences are selected based on a modified PageRank algorithm (Page et al., 1999) that considers translation quality. Lastly, the redundant sentences are removed from the selected sentences to form the final summaries.

Compressive Trans-Sum. Inspired by phrase-based MT, Yao et al. (2015) propose a compressive summarization method that simultaneously selects and compresses sentences. Specifically, the sentence selection is based on bilingual features, and the sentence compression is performed by removing the redundant or poorly translated phrases in a single sentence. To further excavate the complementary information of similar sentences, Zhang et al. (2016) first parse bilingual documents into predicate-argument structures (PAS), and then produce summaries by fusing bilingual PAS structures. In this way, several salient PAS elements (concepts or facts) from different sentences can be merged into one summary sentence. Similarly, Linhares Pontes et al. (2018) take bilingual lexical chunks into account when measuring sentence similarity and further compress sentences at both the single- and multi-sentence levels.

Abstractive Trans-Sum. With the emergence of large-scale synthetic XLS datasets (Zhu et al., 2019), researchers attempt to adopt sequence-to-sequence models as summarizers in Trans-Sum methods. Considering that the translated documents might contain flaws, Ouyang et al. (2019) train an abstractive summarizer (i.e., PGNet, See et al. 2017) on English pairs of a noisy document and a clean summary. In this manner, the summarizer achieves good robustness when summarizing English documents that are translated from a low-resource language.

4.3 Sum-Trans vs. Trans-Sum

We compare Sum-Trans and Trans-Sum in the
following situations:

• When using extractive or compressive summarizers, the summarizers of the Trans-Sum methods can benefit from bilingual documents while those of the Sum-Trans methods can only utilize the source-language documents. Thus, the Trans-Sum methods typically achieve better performance than the Sum-Trans counterparts. For example, on the manually translated DUC 2001 dataset, PBCS (Yao et al., 2015), as a Trans-Sum method, outperforms its Sum-Trans baseline by 8%/8.4%/10.4% in terms of ROUGE-1/2/L. On the other hand, the Trans-Sum methods are less efficient since they need to translate the whole documents rather than a few summaries.

• Apart from the above discussion, when adopting abstractive summarizers, a large-scale MS dataset is required to train the summarizers. It is also worth noting that the MS datasets in low-resource languages are much smaller than the MT counterparts (Tiedemann and Thottingal, 2020; Hasan et al., 2021b). Thus, the Trans-Sum methods are helpful if the source language is low-resource. In contrast, if the target language is low-resource in MS, the Sum-Trans methods are more useful (Ouyang et al., 2019; Ladhak et al., 2020).

5 End-to-End Methods

Though the pipeline method is intuitive, it 1) suffers from error propagation; 2) needs either a large corpus to train MT models or the monetary cost of paid MT services; and 3) incurs latency during inference. Thanks to the rapid development of neural networks, many end-to-end XLS models have been proposed to alleviate the above issues.

In this section, we take stock of previous end-to-end XLS models and further divide them into four frameworks (cf. Figure 1): the multi-task framework (§ 5.1), the knowledge-distillation framework (§ 5.2), the resource-enhanced framework (§ 5.3), and the pre-training framework (§ 5.4). For each framework, we introduce its core idea and
1310

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
5
2
0
2
0
6
2
1
5
7

/

/
T

l

A
C
_
A
_
0
0
5
2
0
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

Figure 1: Overview of the four end-to-end frameworks (best viewed in color). XLS: cross-lingual summarization; MT: machine translation; MS: monolingual summarization. Dashed arrows indicate the supervised signals. Rimless colored blocks denote the input or output sequences of the corresponding tasks. Note that the knowledge-distillation framework might contain more than one teacher model, and the auxiliary/pre-training tasks used in the multi-task/pre-training framework are not limited to MT and MS; here we omit these for simplicity.

the corresponding models. Lastly, we discuss the pros and cons with respect to each framework (§ 5.5).

5.1 Multi-Task Framework

It is challenging for an end-to-end model to directly conduct XLS since it requires both the abilities to translate and summarize (Cao et al., 2020). As shown in Figure 1(a), many researchers use the related tasks (e.g., MT and MS) together with XLS to train unified models. In this way, XLS models can also benefit from the related tasks.

Zhu et al. (2019) utilize a shared transformer encoder to encode the input sequences of both XLS and MT/MS. Then, two independent transformer decoders are used to conduct XLS and MT/MS, respectively. This was the first work to show that the end-to-end method outperforms the pipeline ones. Later, Cao et al. (2020) use two encoder-decoder models to perform MS in the source and target languages, respectively. Meanwhile, the source encoder and the target decoder jointly conduct XLS. Then, two linear mappers are used to convert the context representation (i.e., the output of the encoders) from the source to the target language and vice versa. In addition, two discriminators are adopted to discriminate between the encoded and mapped representations. Thereby, the overall model can jointly learn to summarize documents and align representations between the two languages.
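A minimal PyTorch sketch of the shared-encoder, task-specific-decoder design described in the first approach above is shown below; the dimensions are illustrative, and positional encodings, attention masks, and the training loop are omitted, so it is a schematic rather than the original implementation.

# Minimal sketch: one encoder shared across XLS and MT/MS, with task-specific decoders.
import torch
import torch.nn as nn

class MultiTaskXLS(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.xls_decoder = nn.TransformerDecoder(dec_layer, num_layers)   # XLS head
        self.aux_decoder = nn.TransformerDecoder(dec_layer, num_layers)   # MT/MS head
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, task="xls"):
        # Positional encodings and causal/padding masks are omitted for brevity.
        memory = self.shared_encoder(self.embed(src_ids))
        decoder = self.xls_decoder if task == "xls" else self.aux_decoder
        hidden = decoder(self.embed(tgt_ids), memory)
        return self.lm_head(hidden)   # token logits for cross-entropy training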

Although the above efforts design unified models in the multi-task framework, their decoders are independent for different tasks, leading to limitations in capturing the relationships among the multiple tasks. To solve this problem, Takase and Okazaki (2020) train a single encoder-decoder model on MS, MT, and XLS datasets. They prepend a special token to the input sequences to indicate which task is performed. In addition, Bai et al. (2021a) make MS a prerequisite for XLS and propose MCLAS, an XLS model with a single encoder-decoder architecture. For the given documents, MCLAS generates the sequential concatenation of the corresponding monolingual and cross-lingual summaries. In this way, the translation alignment is also implicit in the generation process, making MCLAS achieve great performance in XLS. More recently, Liang et al. (2022) utilize a conditional variational auto-encoder (CVAE) (Sohn et al., 2015) to capture the hierarchical relationship among MT, MS, and XLS. Specifically, three variables are adopted in the proposed model to reconstruct the results of MT, MS, and XLS, respectively. Besides, the encoder and decoder are shared among all tasks, while the prior and recognition networks are independent to indicate the different tasks. Considering the limited XLS data in low-resource languages, Bai et al. (2021a) and Liang et al. (2022) also investigate XLS in the few-shot setting.

5.2 Knowledge-Distillation Framework

The original idea of knowledge distillation is to distill the knowledge in an ensemble of models (i.e., teacher models) into a single model (i.e., the student model) (Hinton et al., 2015). Due to the close relationship between MT/MS and XLS, some researchers attempt to use MS or MT models, or both, to teach the XLS model in the knowledge-distillation framework. In this way, besides the XLS labels, the student model can also learn from the output or hidden states of the teacher models.


Ayana et al. (2018) utilize large-scale MS and MT corpora to train MS and MT models, respectively. Then, they use the trained MS or MT models, or both, as the teacher models to teach the XLS student model. Both the teacher and student models are bi-directional GRU models (Cho et al., 2014). To let the student model mimic the output of the teacher model, the KL-divergence between the generation probabilities of these two models is used as the training objective. Later, Duan et al. (2019) implement the transformer (Vaswani et al., 2017) as the backbone of the MS teacher model and the XLS student model, and further train the student model with two objectives: (1) the cross-entropy between the generation distributions of these two models; (2) the Euclidean distance between the attention weights of both models. It is worth noting that both Ayana et al. (2018) and Duan et al. (2019) focus on zero-shot XLS due to the scarcity of XLS datasets at that time, so their training objectives do not include XLS.
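The distribution-matching objective described above can be sketched as follows, assuming the teacher and student both produce per-token vocabulary logits; the temperature and reduction choices are illustrative rather than the exact settings of the cited work.

# Minimal sketch of a KL-based distillation objective over generation distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """student_logits, teacher_logits: (batch, seq_len, vocab_size)."""
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_prob = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), normalized by batch size.
    return F.kl_div(student_logp, teacher_prob, reduction="batchmean")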

After the emergence of large-scale XLS datasets, Nguyen and Luu (2022) confirm that the knowledge-distillation framework can also be adopted in rich-resource scenarios. Specifically, they employ transformer student and teacher models, and further propose a variant of the Sinkhorn divergence, which together with the XLS objective supervises the student XLS model.

5.3 Resource-Enhanced Framework

As shown in Figure 1(c), the resource-enhanced framework utilizes additional resources to enrich the information of the input documents, and the generation probability of the output summaries is conditioned on both the encoded and the enriched information.

Zhu et al. (2020) explore the translation pattern in XLS. In detail, they first encode the input documents in the source language via a transformer encoder, and then obtain the translation distribution for the words of the input documents with the fast-align toolkit (Dyer et al., 2013). Lastly, a transformer decoder is used to generate summaries in the target language based on both its output distribution and the translation distributions. In this way, the extra bilingual alignment information helps the XLS model better learn the transformation from the source to the target language. Jiang et al. (2022) utilize the TextRank toolkit (Mihalcea and Tarau, 2004) to extract key clues from input sequences, and then construct article graphs based on these clues via a designed algorithm. Next, they encode the clues and the article graphs with a clue encoder (with a transformer encoder architecture) and a graph encoder (based on graph neural networks), respectively. Finally, a transformer decoder with two types of cross-attention (performed on the outputs of the clue and graph encoders) is adopted to generate the final summaries. In addition, they consider the translation distribution used by Zhu et al. (2020) to further strengthen the proposed model.

5.4 Pre-Training Framework

The emergence of pre-trained models has brought NLP to a new era (Qiu et al., 2020). Pre-trained models typically first learn general representations from large-scale corpora, and then adapt to a specific task through fine-tuning.

More recently, general multi-lingual pre-trained generative models have shown impressive performance on many multi-lingual NLP tasks. For example, mBART (Liu et al., 2020), as a multi-lingual pre-trained model, is derived from BART (Lewis et al., 2020). mBART is pre-trained with BART-style denoising objectives on a huge volume of unlabeled multi-lingual data. mBART originally showed its superiority in MT (Liu et al., 2020), and Liang et al. (2022) find it can also outperform many multi-task XLS models on large-scale XLS datasets through simple fine-tuning. Later, mBART-50 (Tang et al., 2021) goes a step further and extends the language processing abilities of mBART from 25 languages to 50 languages. In addition to BART-style pre-trained models, mT5 (Xue et al., 2021) is a multi-lingual T5 (Raffel et al., 2020) model, which is pre-trained in 101 languages with the T5-style span corruption objective. Although great performance has been achieved, these general pre-trained models only utilize the denoising or span corruption objectives in multiple languages without any cross-lingual supervision, resulting in under-explored cross-lingual ability.
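As a concrete reference point for the "simple fine-tuning" setting mentioned above, the following is a hedged sketch of adapting mBART-50 to one XLS direction through the Hugging Face Transformers interface; the checkpoint name, language codes, and example strings are illustrative, the optimization loop is reduced to a single step, and the cross-lingual pre-training objectives discussed next go beyond this recipe.

# Minimal sketch: fine-tuning an mBART-50 checkpoint on an English-to-Chinese XLS pair.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="en_XX", tgt_lang="zh_CN")
model = MBartForConditionalGeneration.from_pretrained(model_name)

document = "The quick brown fox jumps over the lazy dog."   # source-language document (placeholder)
summary = "一只敏捷的狐狸跳过了懒狗。"                         # target-language summary (placeholder)

batch = tokenizer(document, text_target=summary, max_length=512,
                  truncation=True, return_tensors="pt")
loss = model(**batch).loss          # standard cross-entropy over the target summary
loss.backward()                     # an optimizer step would follow in a real loop

# At inference time, force the decoder to start with the target-language code.
generated = model.generate(
    **tokenizer(document, return_tensors="pt"),
    forced_bos_token_id=tokenizer.lang_code_to_id["zh_CN"],
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])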

To solve this problem, Xu et al. (2020) propose
a mix-lingual XLS model which is pre-trained
with masked language model (MLM), denoising
auto-encoder (DAE), MS, translation span cor-
ruption (TSC), and MT tasks.16 The TSC and MT

16Typewriter font indicates the cross-lingual tasks.


Table 3: Examples of inputs and targets used by different cross-lingual pre-training tasks for the sentence ‘‘Everything that kills make me feel alive’’ with its Chinese translation and summarization. The randomly selected spans are replaced with unique mask tokens (i.e., [M1], [M2], and [M3]) in TSC and TPSC.

pre-training samples are derived from the OPUS English↔Chinese parallel corpus.17 Dou et al. (2020) utilize XLS, MT, and MS tasks to pre-train another XLS model. They leverage the English↔German/Chinese MT samples from the WMT2014/WMT2017 datasets. For XLS, they pre-train the model on the En2ZhSum and English-German datasets (Dou et al., 2020). Wang et al. (2022b) focus on dialogue-oriented XLS and extend mBART-50 with MS, MT, and two dialogue-oriented pre-training objectives (i.e., action infilling and utterance permutation) via a second pre-training stage on the MediaSum and XMediaSum datasets. Note that Xu et al. (2020), Dou et al. (2020), and Wang et al. (2022b) only focus on the XLS task. The languages supported by these models are limited to a few specific ones.
Furthermore, mT6 (Chi et al., 2021a) and ΔLM (Ma et al., 2021) are presented towards general cross-lingual abilities. In detail, Chi et al. (2021a) first present three tasks, namely, MT, TSC, and translation pair span corruption (TPSC), to extend mT5, and then design a PNAT decoding strategy to let the model separately decode each target span of SC-like pre-training tasks. Finally, Chi et al. (2021a) combine SC, TSC, and PNAT to jointly train the mT6 model. To support multiple languages, mT6 is pre-trained on the CC-Net (Wenzek et al., 2020), MultiUN (Ziemski et al., 2016), IIT Bombay (Kunchukuttan et al., 2018), OPUS, and WikiMatrix (Schwenk et al., 2021) corpora, covering a total of 94 languages. ΔLM reuses the parameters of InfoXLM (Chi et al., 2021b) and is further trained with SC and TSC tasks on the CC100 (Conneau et al., 2020), CC-Net, Wikipedia dump, CCAligned (El-Kishky et al., 2020), and OPUS corpora, including 100 languages. The superiority of mT6 and ΔLM on WikiLingua (a large-scale XLS dataset) has been demonstrated. Moreover, there are also some general cross-lingual pre-trained models that have not been evaluated on XLS, for example, XNLG (Chi et al., 2020) and VECO (Luo et al., 2021).

17http://opus.nlpl.eu/.

Table 3 shows the details of the above cross-lingual pre-training tasks. TSC and TPSC predict the masked spans from a translation pair. The input sequence of TSC is masked in only one language, while the counterpart of TPSC is masked in both languages.
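To make the difference concrete, the following sketch builds a TSC-style training instance by masking spans on one side of a translation pair only; the span positions, the mask-token format, and the separator are simplified stand-ins for the actual pre-training recipe rather than the exact mT6 procedure.

# Illustrative construction of a translation span corruption (TSC) instance.
def make_tsc_example(src_tokens, tgt_sentence, spans):
    """spans: list of (start, end) index pairs to mask in the source side only."""
    masked, targets = [], []
    cursor = 0
    for k, (start, end) in enumerate(spans):
        masked.extend(src_tokens[cursor:start])
        masked.append(f"[M{k + 1}]")                      # unique mask token
        targets.append((f"[M{k + 1}]", src_tokens[start:end]))
        cursor = end
    masked.extend(src_tokens[cursor:])
    # Input: masked source + unmasked target-language translation; output: the masked spans.
    model_input = " ".join(masked) + " </s> " + tgt_sentence
    model_target = " ".join(m + " " + " ".join(toks) for m, toks in targets)
    return model_input, model_target

src = "Everything that kills make me feel alive".split()
inp, out = make_tsc_example(src, "每个杀死我的事物都让我觉得活着", [(0, 2), (5, 7)])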

5.5 Discussion

Table 4 summarizes all end-to-end XLS models. We conclude that all four frameworks resort to external resources to improve XLS performance: (1) The multi-task framework uses large-scale MS and MT corpora to help XLS. Though multi-task learning is intuitive, its training strategy and the weights of the different tasks are non-trivial to determine. (2) The knowledge-distillation framework is another way to utilize the large-scale MS and MT corpora. This framework is most suitable for zero-shot XLS since it can be supervised by the MS and MT teacher models without any XLS labels. Nevertheless, knowledge distillation often fails to live up to its name, transferring very limited knowledge from teacher to student (Stanton et al., 2021). Thus, it should be verified more deeply in rich-resource XLS. (3) The resource-enhanced framework employs off-the-shelf toolkits to enhance the representation of input documents. This framework significantly relaxes the dependence on external data, but it suffers from error propagation. (4) The pre-training framework can benefit from both unlabeled and labeled corpora. In detail, pre-trained models learn general language knowledge from large-scale unlabeled data with self-supervised objectives. In order to improve the cross-lingual ability, they can resort to MT parallel corpora and design supervised signals. This framework absorbs more knowledge from more external corpora than the others, leading to promising performance on XLS.
1313

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
5
2
0
2
0
6
2
1
5
7

/

/
T

l

A
C
_
A
_
0
0
5
2
0
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

Model                                 Architecture   Training Objective    Evaluation Direction       Evaluation Dataset

Multi-Task Framework
CLS+MS (Zhu et al., 2019)             Transformer    XLS+MS                En↔Zh                      En2ZhSum, Zh2EnSum
CLS+MT (Zhu et al., 2019)             Transformer    XLS+MT                En↔Zh                      En2ZhSum, Zh2EnSum
Cao et al. (2020)                     Transformer    XLS+MS+REC            En↔Zh                      Gigaword†, DUC2004†, En2ZhSum, Zh2EnSum
Transum (Takase and Okazaki, 2020)    Transformer    XLS+MS+MT             Ar/Zh→En, En→Ja            DUC2004†, JAMUL†
MCLAS (Bai et al., 2021a)             Transformer    XLS+MS                En↔Zh, En→De               En2ZhSum, Zh2EnSum, En2DeSum
VHM (Liang et al., 2022)              Transformer∗   XLS+MS+MT             En↔Zh                      En2ZhSum, Zh2EnSum

Knowledge-Distillation Framework
MS teacher (Ayana et al., 2018)       GRU            XLS+KD (MS)           En→Zh                      DUC2003†, DUC2004†
MT teacher (Ayana et al., 2018)       GRU            XLS+KD (MT)           En→Zh                      DUC2003†, DUC2004†
MS+MT teachers (Ayana et al., 2018)   GRU            XLS+KD (MS+MT)        En→Zh                      DUC2003†, DUC2004†
Duan et al. (2019)                    Transformer    XLS+KD (MS)           Zh→En                      Gigaword†, DUC2004†
Nguyen and Luu (2022)                 Transformer    XLS+KD (MS)           En↔Zh, En↔Ja, En→Ar/Vi     En2ZhSum, Zh2EnSum, WikiLingua

Resource-Enhanced Framework
ATS (Zhu et al., 2020)                Transformer∗   XLS                   En↔Zh                      En2ZhSum, Zh2EnSum
GlueGraphSum (Jiang et al., 2022)     Transformer∗   XLS                   En↔Zh                      En2ZhSum, Zh2EnSum, CyEn2ZhSum‡

Pre-Training Framework
Xu et al. (2020)                      Transformer    MLM+DAE+MS+MT+TSC     En↔Zh                      En2ZhSum, Zh2EnSum
Dou et al. (2020)                     Transformer    XLS+MT+MS             En→Zh, En→De               En2ZhSum, English-German‡
mT6 (Chi et al., 2021a)               Transformer    SC+TSC+PNAT           Es/Ru/Tr/Vi→En             WikiLingua
ΔLM (Ma et al., 2021)                 Transformer    SC+TSC                Es/Ru/Tr/Vi→En             WikiLingua
mDIALBART (Wang et al., 2022b)        Transformer    AcI+UP+MS+MT          En→Zh, En→De               XMediaSum40k

Table 4: The summary of end-to-end XLS models. ‘‘Transformer’’ means the vanilla transformer encoder-decoder architecture. ∗ denotes a variant architecture. ‘‘REC’’ represents the reconstruction objective, which is used to supervise the linear mappers in the model proposed by Cao et al. (2020). ‘‘KD’’ denotes the knowledge distillation objectives, derived from the output or hidden states of the corresponding teacher models, such as MS and MT models. The ‘‘Training Objective’’ of pre-trained models lists the pre-training objectives. Language nomenclature used in ‘‘Evaluation Direction’’ follows ISO 639-1 codes. † indicates the number of samples in the dataset is less than 2000. ‡ denotes unreleased datasets.

                                        En2ZhSum                 Zh2EnSum
Model                                   R-1     R-2     R-L      R-1     R-2     R-L
CLS+MS♥† (Zhu et al., 2019)             38.25   20.20   34.76    40.34   22.65   36.39
CLS+MT♥† (Zhu et al., 2019)             40.23   22.32   36.59    40.25   22.58   36.21
Cao et al. (2020)♥†                     38.12   16.76   33.86    40.97   23.20   36.96
VHM♥∗ (Liang et al., 2022)              40.98   23.07   37.12    41.36   24.64   37.15
ATS (Zhu et al., 2020)♣†                40.47   22.21   36.89    40.68   24.12   36.97
mBART (Liu et al., 2020)♠‡              41.55   23.27   37.22    43.61   25.14   38.79
Dou et al. (2020)♠∗                     42.83   23.30   39.29    –       –       –
Xu et al. (2020)♠∗                      43.50   25.41   29.66    41.62   23.35   37.26
mVHM (Liang et al., 2022)♥♠∗            41.95   23.54   37.67    43.97   25.61   39.19

Table 5: The leaderboard of end-to-end XLS models on the En2ZhSum and Zh2EnSum datasets (Zhu et al., 2019) in terms of ROUGE(R)-1/2/L (Lin, 2004). The evaluation scripts refer to Zhu et al. (2020). ♥: multi-task framework; ♣: resource-enhanced framework; ♠: pre-training framework. † indicates the results are obtained by evaluating output files provided by the authors; ‡ denotes results obtained by running the released code; ∗ indicates the results reported in the original papers, which adopt the same evaluation scripts as Zhu et al. (2020).


To give a deeper comparison of end-to-end XLS models, as shown in Table 5, we organize a leaderboard with unified evaluation metrics, based on the released code and generated results from representative published literature. The models in the pre-training framework (Liu et al., 2020; Dou et al., 2020; Xu et al., 2020) generally outperform the others. In addition, the pre-training framework could also serve the other frameworks. For example, Liang et al. (2022) utilize mBART weights as model initialization for VHM (i.e., mVHM), bringing decent gains compared with vanilla VHM. Therefore, it is possible and valuable to combine the advantages of different frameworks, which is worthy of discussion in the future.

6 Prospects


In diesem Abschnitt, we discuss and suggest the follow-
ing promising future directions, which meet actual
application needs:

The Essence of XLS. Unifying two abilities (i.e., translation and summarization abilities) in


a single model is non-trivial (Cao et al., 2020). Even though the effectiveness of the state-of-the-art models has been proved, the essence of XLS remains unclear, especially (1) the hierarchical relationship between MT&MS and XLS (Liang et al., 2022), and (2) the theoretical analysis of what makes MT&MS help XLS.

XLS Datasets with Low-Resource Languages. There are thousands of languages in the world and most of them are low-resource. Despite the practical significance, building high-quality and large-scale XLS datasets whose source or target language is low-resource remains challenging (cf. Section 3.3), and needs to be further explored in the future.

Unified XLS across Genres and Domains. As we described in Section 3, existing XLS datasets cover multiple genres or domains, namely, news, dialogue, guides, and encyclopedia. The diversity across them naturally promotes the need for unified XLS, instead of promoting the trend of devising unique models for individual genres or domains. At present, unified XLS is still under-explored, making us believe in the urgent need for it.

Controllable XLS. Bai et al. (2021b) integrate a compression rate to control how much information should be kept in the target language. If the compression rate is 100%, XLS degrades to MT. Thus, the continuous variable unifies the XLS and MT tasks. In this manner, a new research view is introduced to leverage MT to help XLS. In addition, controlling some other attributes of the target summary may be useful in real applications, such as entity-centric XLS and aspect-based XLS.

Low-Resource XLS. Most languages in the world are low-resource, which makes large-scale parallel datasets across these languages rare and expensive. Thus, low-resource XLS is more realistic. Nevertheless, current work has not well investigated and explored this situation. Recently, prompt-based learning has become a new paradigm in NLP (Liu et al., 2021). With the help of a well-designed prompting function, a pre-trained model is able to perform few-shot or even zero-shot learning. Future work can adopt prompt-based learning to deal with low-resource XLS.

Triangular XLS. Following triangular MT, triangular XLS is a special case of low-resource XLS where the language pair of interest has limited parallel data, but both languages have abundant parallel data with a pivot language. This situation typically appears in multi-lingual website datasets (a category of XLS datasets, cf. § 3.2), because their documents are usually centered in English and then translated into other languages to facilitate global users. Thus, English acts as the pivot language. How to exploit such abundant parallel data to improve the XLS of the language pairs of interest remains challenging.

Many-to-Many XLS. Most previous work trains XLS models separately in each cross-lingual direction. In this way, the knowledge of XLS cannot be transferred among different directions. Besides, a trained model can only perform in a single direction, resulting in limited usage. To solve this problem, Hasan et al. (2021a) jointly fine-tune mT5 in multiple directions. Lastly, the fine-tuned model can perform in arbitrary, even unseen, directions, which is named many-to-many XLS. Future work can focus on designing robust and effective training strategies for many-to-many XLS.

Long Document XLS. Recently, long document MS has attracted wide research attention (Cohan et al., 2018; Sharma et al., 2019; Wang et al., 2021, 2022a). Long document XLS is also important in real scenes, for example, facilitating researchers to access the arguments of scientific papers in foreign languages. Nevertheless, this direction has not been noticed by previous work. Interestingly, we find that many non-English scientific papers have corresponding English abstracts due to the regulations of publishers. For example, many Chinese academic journals require researchers to write abstracts in both Chinese and English. This might be a feasible method to construct long document XLS datasets. We hope future work can promote this direction.

Multi-Document XLS. Previous multi-document XLS work (Orăsan and Chiorean, 2008; Boudin et al., 2011; Zhang et al., 2016) only utilizes statistical features to build pipeline systems, and further performs on early XLS datasets. Multi-document XLS is also worthy of discussion in the pre-trained model era.


Multi-Modal XLS. With the increase of multimedia data on the internet, some researchers have put their effort into multi-modal summarization (Zhu et al., 2018; Sanabria et al., 2018; Li et al., 2018, 2020; Fu et al., 2021), where the input of summarization systems is a document together with images or videos. Nevertheless, existing multi-modal summarization work only focuses on the monolingual scene and ignores cross-lingual ones. We hope future work could highlight multi-modal XLS.

Evaluation Metrics. Developing evaluation metrics for XLS is still an open problem that needs to be further studied. Current XLS metrics typically inherit from MS. However, different from MS, the XLS samples consist of ⟨source document, (target document), source summary, target summary⟩. Besides the target summary, how to apply the other information to assess the summary quality would be an interesting point for further study.

Others. Considering that current XLS research is still in the preliminary stage, many research points of MS are missing in XLS, such as the factual inconsistency and hallucination problems. These directions are also worth exploring deeply in future work.

7 Conclusion

In this paper, we present the first comprehensive survey of current research efforts on XLS. We systematically summarize existing XLS datasets and methods, highlight their characteristics, and compare them with each other to provide deeper analyses. In addition, we give multiple perspective directions to facilitate further research on XLS. We hope that this XLS survey can provide a clear picture of this topic and boost the development of current XLS technologies.

Acknowledgments

We would like to thank the anonymous reviewers for
their suggestions and comments. This research is
supported by the National Key Research and De-
velopment Project (NEIN. 2020AAA0109302), Die
National Natural Science Foundation of China
(NEIN. 62072323, 62102276), Shanghai Science

and Technology Innovation Action Plan (NEIN. 19-
511120400), Shanghai Municipal Science and
Technology Major Project (NEIN. 2021SHZDZX01-
03), the Natural Science Foundation of Jiangsu
Province (grant no. BK20210705), the Natural
Science Foundation of Educational Commission
of Jiangsu Province, China (grant no. 21KJD52-
0005), and the Priority Academic Program Devel-
opment of Jiangsu Higher Education Institutions.

Verweise

Ayana, Shi-qi Shen, Yun Chen, Cheng Yang, Zhi-yuan Liu, and Mao-song Sun. 2018. Zero-shot cross-lingual neural headline generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(12):2319–2327. https://doi.org/10.1109/TASLP.2018.2842432

Yu Bai, Yang Gao, and Heyan Huang. 2021A.
Cross-lingual abstractive summarization with
limited parallel resources. In Proceedings of
the 59th Annual Meeting of the Association
for Computational Linguistics and the 11th In-
ternational Joint Conference on Natural Lan-
guage Processing (Volumen 1: Long Papers),
pages 6910–6924, Online. Association for
Computational Linguistics. https://doi.org
/10.18653/v1/2021.acl-long.538

Yu Bai, Heyan Huang, Kai Fan, Yang Gao, Zewen
Chi, and Boxing Chen. 2021B. Bridging the gap:
Cross-lingual summarization with compression
rate. ArXiv preprint, abs/2110.07936v1.

Florian Boudin, Stéphane Huet, and Juan-Manuel
Torres-Moreno. 2011. A graph-based approach
to cross-language multi-document summariza-
tion. Polibits, 43:113–118.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.

Yue Cao, Hui Liu, and Xiaojun Wan. 2020. Jointly
learning to align and summarize for neural
cross-lingual summarization. In Proceedings of
the 58th Annual Meeting of the Association for
Computational Linguistics, pages 6220–6231,
Online. Association for Computational Lin-
guistics. https://doi.org/10.18653/v1
/2020.acl-main.554

Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Saksham Singhal, Xian-Ling Mao, Heyan Huang, Xia Song, and Furu Wei. 2021a. mT6: Multilingual pretrained text-to-text transformer with translation pairs. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1671–1683, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.125

Zewen Chi, Li Dong, Furu Wei, Wenhui Wang,
Xian-Ling Mao, and Heyan Huang. 2020.
Cross-lingual natural language generation via
pre-training. In The Thirty-Fourth AAAI Con-
ference on Artificial Intelligence, AAAI 2020,
The Thirty-Second Innovative Applications of
Artificial Intelligence Conference, IAAI 2020,
The Tenth AAAI Symposium on Educational
Advances in Artificial Intelligence, EAAI 2020,
New York, New York, USA, February 7–12, 2020,
pages 7570–7577. AAAI Press. https://doi
.org/10.1609/aaai.v34i05.6256

Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, and Ming Zhou. 2021b. InfoXLM: An information-theoretic framework for cross-lingual language model pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3576–3588, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.280

Kyunghyun Cho, Bart van Merriënboer, Caglar
Gulcehre, Dzmitry Bahdanau, Fethi Bougares,
Holger Schwenk, and Yoshua Bengio. 2014.
Learning phrase representations using RNN
encoder–decoder for statistical machine trans-
lation. In Proceedings of the 2014 Conference
on Empirical Methods in Natural Language
Processing (EMNLP), pages 1724–1734, Doha,
Qatar. Association for Computational Lin-
guistics.

Arman Cohan, Franck Dernoncourt, Doo Soon
Kim, Trung Bui, Seokhwan Kim, Walter Chang,
and Nazli Goharian. 2018. A discourse-aware
attention model for abstractive summariza-
tion of long documents. In Proceedings of

Die 2018 Conference of the North American
Chapter of the Association for Computational
Linguistik: Human Language Technologies,
Volumen 2 (Short Papers), pages 615–621, Neu
Orleans, Louisiana. Association for Computa-
tional Linguistics. https://doi.org/10
.18653/v1/N18-2097

Alexis Conneau, Kartikay Khandelwal, Naman
Goyal, Vishrav Chaudhary, Guillaume Wenzek,
Francisco Guzmán, Edouard Grave, Myle Ott,
Luke Zettlemoyer, and Veselin Stoyanov.
2020. Unsupervised cross-lingual representa-
tion learning at scale. In Proceedings of the
58th Annual Meeting of the Association for
Computational Linguistics, pages 8440–8451,
Online. Association for Computational Lin-
guistics. https://doi.org/10.18653/v1
/2020.acl-main.747

Corinna Cortes and Vladimir Vapnik. 1995.
Support-vector networks. Machine Learning,
20(3):273–297. https://doi.org/10.1007
/BF00994018

Zi-Yi Dou, Sachin Kumar, and Yulia Tsvetkov.
2020. A deep reinforced model
for zero-
shot cross-lingual summarization with bilingual
semantic similarity rewards. In Proceedings
of the Fourth Workshop on Neural Genera-
tion and Translation, pages 60–68, Online.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/2020
.ngt-1.7

Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, and Weihua Luo. 2019. Zero-shot cross-lingual abstractive sentence summarization through teaching generation and attention. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3162–3172, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1305

Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644–648, Atlanta, Georgia. Association for Computational Linguistics.

Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, and Philipp Koehn. 2020. CCAligned: A massive collection of cross-lingual web-document pairs. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5960–5969, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.480

Mehwish Fatima and Michael Strube. 2021.
A novel Wikipedia based dataset for mono-
lingual and cross-lingual summarization. In
Proceedings of the Third Workshop on New
Frontiers in Summarization, pages 39–50,
Online and in Dominican Republic. Associa-
tion for Computational Linguistics. https://
doi.org/10.18653/v1/2021.newsum-1.5

Fangxiaoyu Feng, Yinfei Yang, Daniel Cer,
Naveen Arivazhagan, and Wei Wang. 2022A.
Language-agnostic BERT sentence embedding.
In Proceedings of the 60th Annual Meeting
of the Association for Computational Linguis-
Tics (Volumen 1: Long Papers), pages 878–891,
Dublin,
Ireland. Association for Computa-
tional Linguistics. https://doi.org/10
.18653/v1/2022.acl-long.62

Xiachong Feng, Xiaocheng Feng, and Bing Qin.
2022B. MSAMSum: Towards benchmarking
multi-lingual dialogue summarization. In Pro-
ceedings of
the Second DialDoc Workshop
on Document-grounded Dialogue and Con-
versational Question Answering, pages 1–12,
Ireland. Association for Computa-
Dublin,
tional Linguistics. https://doi.org/10
.18653/v1/2022.dialdoc-1.1

Xiachong Feng, Xiaocheng Feng, and Bing
Qin. 2022C. A survey on dialogue summariza-
tion: Recent advances and new frontiers. In
Proceedings of the Thirty-First International
Joint Conference on Artificial Intelligence,
IJCAI-22, pages 5453–5460.
International
Joint Conferences on Artificial Intelligence
Organization. https://doi.org/10.24963
/ijcai.2022/764

Xiyan Fu, Jun Wang, and Zhenglu Yang. 2021.
MM-AVS: A full-scale dataset for multi-modal
summarization. In Proceedings of
Die 2021
the North American Chap-
Conference of

ter of
the Association for Computational
Linguistik: Human Language Technologies,
pages 5922–5926, Online. Association for
Computerlinguistik.

George Giannakopoulos. 2013. Multi-document
multilingual
summarization and evaluation
tracks in ACL 2013 MultiLing workshop. In
Proceedings of the MultiLing 2013 Workshop
on Multilingual Multi-document Summariza-
tion, pages 20–28, Sofia, Bulgaria. Association
for Computational Linguistics.

George Giannakopoulos, Jeff Kubina, John Conroy, Josef Steinberger, Benoit Favre, Mijail Kabadjov, Udo Kruschwitz, and Massimo Poesio. 2015. MultiLing 2015: Multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 270–274, Prague, Czech Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/W15-4638

Bogdan Gliwa, Iwona Mochol, Maciej Biesek, Und
Aleksander Wawer. 2019. SAMSum corpus: A
human-annotated dialogue dataset for abstrac-
tive summarization. In Proceedings of the 2nd
Workshop on New Frontiers in Summarization,
pages 70–79, Hong Kong, China. Association
for Computational Linguistics. https://doi
.org/10.18653/v1/D19-5409

Tahmid Hasan, Abhik Bhattacharjee, Wasi
Uddin Ahmad, Yuan-Fang Li, Yong-Bin Kang,
and Rifat Shahriyar. 2021a. CrossSum: Be-
yond English-centric cross-lingual abstractive
text summarization for 1500+ language pairs.
ArXiv preprint, abs/2112.08804v1.

Tahmid Hasan, Abhik Bhattacharjee, Md. Saiful
Islam, Kazi Mubasshir, Yuan-Fang Li, Yong-
Bin Kang, M. Sohel Rahman, and Rifat
Shahriyar. 2021B. XL-sum: Large-scale multi-
lingual abstractive summarization for 44 lan-
guages. In Findings of
the Association for
Computational Linguistics: ACL-IJCNLP 2021,
pages 4693–4703, Online. Association for
Computational Linguistics. https://doi.org
/10.18653/v1/2021.findings-acl.413

Karl Moritz Hermann, Tomáš Kočiský, Edward
Grefenstette, Lasse Espeholt, Will Kay,

Mustafa Suleyman, and Phil Blunsom. 2015.
Teaching machines to read and comprehend.
In Advances in Neural Information Process-
ing Systems 28: Annual Conference on Neural
Information Processing Systems 2015, Decem-
ber 7–12, 2015, Montreal, Quebec, Canada,
pages 1693–1701.

Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey
Dean. 2015. Distilling the knowledge in a
neural network. ArXiv preprint, abs/1503.
02531v1.

Baotian Hu, Qingcai Chen, and Fangze Zhu. 2015.
LCSTS: A large scale Chinese short text sum-
marization dataset. In Proceedings of the 2015
Conference on Empirical Methods in Natu-
ral Language Processing, pages 1967–1972,
Lisbon, Portugal. Association for Computa-
tional Linguistics.

Shuyu Jiang, Dengbiao Tu, Xingshu Chen, R.
Tang, Wenxian Wang, and Haizhou Wang.
2022. ClueGraphSum: Let key clues guide the
cross-lingual abstractive summarization. ArXiv
preprint, abs/2203.02797v2.

Fajri Koto, Jey Han Lau, and Timothy Baldwin.
2021. Evaluating the efficacy of summariza-
tion evaluation across languages. In Findings
of
the Association for Computational Lin-
guistics: ACL-IJCNLP 2021, pages 801–812,
Online. Association for Computational Lin-
guistics. https://doi.org/10.18653/v1
/2021.findings-acl.71

Anoop Kunchukuttan, Pratik Mehta, and Pushpak
Bhattacharyya. 2018. The IIT Bombay English-
Hindi parallel corpus. In Proceedings of the
Eleventh International Conference on Lan-
guage Resources and Evaluation (LREC 2018),
Miyazaki, Japan. European Language Re-
sources Association (ELRA).

Faisal Ladhak, Esin Durmus, Claire Cardie,
and Kathleen McKeown. 2020. WikiLingua:
A new benchmark dataset for cross-lingual
abstractive summarization. In Findings of the
Association for Computational Linguistics:
EMNLP 2020, pages 4034–4048, Online.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/2020
.findings-emnlp.360

Anton Leuski, Chin-Yew Lin, Liang Zhou, Ulrich Germann, Franz Josef Och, and Eduard H. Hovy. 2003. Cross-lingual c*st*rd: English access to Hindi information. ACM Transactions on Asian and Low-Resource Language Information Processing, 2:245–269. https://doi.org/10.1145/979872.979877

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.703

Haoran Li, Junnan Zhu, Tianshang Liu, Jiajun
Zhang, and Chengqing Zong. 2018. Multi-
modal sentence summarization with modality
attention and image filtering. In Proceedings
of the Twenty-Seventh International Joint Con-
ference on Artificial Intelligence, IJCAI-18,
pages 4152–4158. International Joint Confer-
ences on Artificial Intelligence Organization.
https://doi.org/10.24963/ijcai.2018
/577

Mingzhe Li, Xiuying Chen, Shen Gao, Zhangming Chan, Dongyan Zhao, and Rui Yan. 2020. VMSMO: Learning to generate multimodal summary for video-based news articles. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9360–9369, Online. Association for Computational Linguistics.

Yunlong Liang, Fandong Meng, Chulun Zhou,
Jinan Xu, Yufeng Chen, Jinsong Su, Und
Jie Zhou. 2022. A variational hierarchical
model for neural cross-lingual summarization.
In Proceedings of the 60th Annual Meeting of
the Association for Computational Linguistics
(Volumen 1: Long Papers), pages 2088–2099,
Dublin, Ireland. Association for Computational
Linguistics. https://doi.org/10.18653/v1
/2022.acl-long.148

Chin-Yew Lin. 2004. ROUGE: A package for
automatic evaluation of summaries. In Text
Summarization Branches Out, pages 74–81,

Barcelona, Spain. Association for Computa-
tional Linguistics.

Elvys Linhares Pontes, Stéphane Huet, Juan-
Manuel Torres-Moreno, and Andréa Carneiro
Linhares. 2018. Cross-language text summa-
rization using sentence and multi-sentence
compression. In Natural Language Process-
ing and Information Systems, pages 467–479,
Cham. Springer International Publishing.

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao
Jiang, Hiroaki Hayashi, and Graham Neubig.
2021. Pre-train, prompt, and predict: A system-
atic survey of prompting methods in natural
language processing. ArXiv preprint, abs/2107
.13586v1.

Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike
Lewis, and Luke Zettlemoyer. 2020. Multilin-
gual denoising pre-training for neural machine
Übersetzung. Transactions of the Association for
Computational Linguistics, 8:726–742.

Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi,
Songfang Huang, Fei Huang, and Luo Si. 2021.
VECO: Variable and flexible cross-lingual
pre-training for language understanding and
Generation. In Proceedings of the 59th Annual
Meeting of the Association for Computational
Linguistics and the 11th International Joint
Conference on Natural Language Processing
(Volumen 1: Long Papers), pages 3980–3994,
Online. Association for Computational Lin-
guistics. https://doi.org/10.18653/v1
/2021.acl-long.308

Shuming Ma, Li Dong, Shaohan Huang,
Dongdong Zhang, Alexandre Muzio, Saksham
Singhal, Hany Hassan Awadalla, Xia Song,
and Furu Wei. 2021. DeltaLM: Encoder-
decoder pre-training for language generation
and translation by augmenting pretrained mul-
tilingual encoders. ArXiv preprint, abs/2106
.13736v2.

Rada Mihalcea and Paul Tarau. 2004. TextRank:
Bringing order into text. In Proceedings of
Die 2004 Conference on Empirical Methods in
Natural Language Processing, pages 404–411,
Barcelona, Spain. Association for Computa-
tional Linguistics.

Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott,
Michael Auli, and Sergey Edunov. 2019. Face-

book FAIR’s WMT19 news translation task
submission. In Proceedings of the Fourth Con-
ference on Machine Translation (Volumen 2:
Shared Task Papers, Day 1), pages 314–319,
Florence, Italy. Association for Computational
Linguistics.

Khanh Nguyen and Hal Daumé III. 2019. Global Voices: Crossing borders in automatic news summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, pages 90–97, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-5411

Thong Thanh Nguyen and Anh Tuan Luu. 2022.
Improving neural cross-lingual abstractive
summarization via employing optimal transport
distance for knowledge distillation. Proceed-
ings of the AAAI Conference on Artificial Intel-
ligence, 36(10):11103–11111. https://doi
.org/10.1609/aaai.v36i10.21359

Constantin Orăsan and Oana Andreea Chiorean. 2008. Evaluation of a cross-lingual Romanian-English multi-document summarizer. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco. European Language Resources Association (ELRA).

Jessica Ouyang, Boya Song, and Kathy McKeown. 2019. A robust abstractive system for cross-lingual summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2025–2031, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1204

Lawrence Page, Sergey Brin, Rajeev Motwani,
and Terry Winograd. 1999. The PageRank ci-
tation ranking: Bringing order to the web.
Technical Report 1999-66, Stanford InfoLab.

Chris D. Paice. 1990. Constructing literature ab-
stracts by computer: Techniques and pros-
pects. Information Processing & Management,
26(1):171–186. https://doi.org/10.1016
/0306-4573(90)90014-S

Laura Perez-Beltrachini and Mirella Lapata.
2021. Models and datasets for cross-lingual
summarisation. In Proceedings of
the 2021
Conference on Empirical Methods in Natu-
ral Language Processing, pages 9408–9423,
Online and Punta Cana, Dominican Republic.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/2021
.emnlp-main.742

Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. ArXiv preprint, abs/2003.08271v4.

Colin Raffel, Noam Shazeer, Adam Roberts,
Katherine Lee, Sharan Narang, Michael
Matena, Yanqi Zhou, Wei Li, und Peter J. Liu.
2020. Exploring the limits of transfer learning
with a unified text-to-text transformer. Zeitschrift
of Machine Learning Research, 21(140):1–67.

Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. 2018. How2: A large-scale dataset for multimodal language understanding. In Proceedings of the Workshop on Visually Grounded Interaction and Language (ViGIL). NeurIPS.

Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, and Francisco Guzmán. 2021. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1351–1361, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.115

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics.

Eva Sharma, Chen Li, and Lu Wang. 2019.
BIGPATENT: A large-scale dataset for ab-
stractive and coherent summarization. In Pro-

ceedings of the 57th Annual Meeting of the
Association for Computational Linguistics,
pages 2204–2213, Florence, Italy. Association
for Computational Linguistics. https://doi
.org/10.18653/v1/P19-1212

Kihyuk Sohn, Honglak Lee, and Xinchen Yan.
2015. Learning structured output representa-
tion using deep conditional generative models.
In Advances in Neural Information Process-
ing Systems 28: Annual Conference on Neural
Information Processing Systems 2015, Decem-
ber 7–12, 2015, Montreal, Quebec, Canada,
pages 3483–3491.

Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, and Andrew G. Wilson. 2021. Does knowledge distillation really work? Advances in Neural Information Processing Systems, 34.

Sho Takase and Naoaki Okazaki. 2020. Multi-task
learning for cross-lingual abstractive summa-
rization. ArXiv preprint, abs/2010.07503v1.

Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen,
Naman Goyal, Vishrav Chaudhary, Jiatao Gu,
and Angela Fan. 2021. Multilingual transla-
tion from denoising pre-training. In Findings
of the Association for Computational Linguis-
Tics: ACL-IJCNLP 2021, pages 3450–3466,
Online. Association for Computational Lin-
guistics. https://doi.org/10.18653
/v1/2021.findings-acl.304

Jörg Tiedemann and Santhosh Thottingal. 2020.
OPUS-MT – building open translation services
for the world. In Proceedings of
the 22nd
Annual Conference of the European Associ-
ation for Machine Translation, pages 479–480,
Lisboa, Portugal. European Association for
Machine Translation.

Ashish Vaswani, Noam Shazeer, Niki Parmar,
Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,
Lukasz Kaiser, and Illia Polosukhin. 2017. At-
tention is all you need. In Advances in Neural
Information Processing Systems 30: Jährlich
Conference on Neural Information Process-
ing Systems 2017, December 4–9, 2017, Long
Beach, CA, USA, pages 5998–6008.

Xiaojun Wan. 2011. Using bilingual information
for cross-language document summarization.

In Proceedings of the 49th Annual Meeting of
the Association for Computational Linguistics:
Human Language Technologies, pages 1546–1555,
Portland, Oregon, USA. Association for
Computational Linguistics.

Xiaojun Wan, Huiying Li, and Jianguo Xiao.
2010. Cross-language document summariza-
tion based on machine translation quality pre-
diction. In Proceedings of the 48th Annual
Meeting of the Association for Computational
Linguistics, pages 917–926, Uppsala, Sweden.
Association for Computational Linguistics.

Xiaojun Wan, Fuli Luo, Xue Sun, Songfang
Huang, and Jin-ge Yao. 2018. Cross-language
document summarization via extraction and
ranking of multiple summaries. Knowledge and
Information Systems, 58:481–499.

Jiaan Wang, Zhixu Li, Qiang Yang, Jianfeng Qu, Zhigang Chen, Qingsheng Liu, and Guoping Hu. 2021. SportsSum2.0: Generating high-quality sports news from live text commentary. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 3463–3467, New York, NY, USA. Association for Computing Machinery. https://doi.org/10.1145/3459637.3482188

Jiaan Wang, Zhixu Li, Tingyi Zhang, Duo Zheng, Jianfeng Qu, An Liu, Lei Zhao, and Zhigang Chen. 2022a. Knowledge enhanced sports game summarization. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, WSDM ’22, pages 1045–1053, New York, NY, USA. Association for Computing Machinery. https://doi.org/10.1145/3488560.3498405

Jiaan Wang, Fandong Meng, Ziyao Lu, Duo
Zheng, Zhixu Li, Jianfeng Qu, and Jie Zhou.
2022B. ClidSum: A benchmark dataset for
cross-lingual dialogue summarization. ArXiv
preprint, abs/2202.05599v1.

Guillaume Wenzek, Marie-Anne Lachaux,
Alexis Conneau, Vishrav Chaudhary, Francisco
Guzm´an, Armand Joulin, and Edouard Grave.
2020. CCNet: Extracting high quality mono-
lingual datasets from web crawl data.
In
Proceedings of the 12th Language Resources

and Evaluation Conference, pages 4003–4012,
Marseille, France. European Language Re-
sources Association.

Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, and Xuedong Huang. 2020. Mixed-lingual pre-training for cross-lingual summarization. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 536–541, Suzhou, China. Association for Computational Linguistics.

Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2021. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics.

Jin-ge Yao, Xiaojun Wan, and Jianguo Xiao. 2015. Phrase-based compressive cross-language summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 118–127. Association for Computational Linguistics. https://doi.org/10.18653/v1/D15-1012

Jiajun Zhang, Yu Zhou, and Chengqing Zong.
2016. Abstractive cross-language summariza-
tion via translation model enhanced predicate
argument structure fusing. IEEE/ACM Trans-
actions on Audio, Speech, and Language Pro-
cessing, 24:1842–1853. https://doi.org
/10.1109/TASLP.2016.2586608

Jingqing Zhang, Yao Zhao, Mohammad Saleh,
und Peter J. Liu. 2020A. PEGASUS: Pre-training
with extracted gap-sentences for abstractive
summarization. In Proceedings of
the 37th
International Conference on Machine Learn-
ing, ICML 2020, 13–18 July 2020, Virtual
Ereignis, Volumen 119 of Proceedings of Ma-
chine Learning Research, pages 11328–11339.
PMLR.

Tianyi Zhang, Varsha Kishore, Felix Wu,
Kilian Q. Weinberger, and Yoav Artzi. 2020B.

BERTScore: Evaluating text generation with
BERT. In 8th International Conference on
Learning Representations, ICLR 2020, Ad-
dis Ababa, Ethiopia, April 26–30, 2020.
OpenReview.net.

Wei Zhao, Maxime Peyrard, Fei Liu, Yang
Gao, Christian M. Meyer, and Steffen Eger.
2019. MoverScore: Text generation evaluat-
ing with contextualized embeddings and earth
mover distance. In Proceedings of the 2019
Conference on Empirical Methods in Natural
Language Processing and the 9th Interna-
tional Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), pages 563–578,
Hong Kong, China. Association for Computa-
tional Linguistics. https://doi.org/10
.18653/v1/D19-1053

Chenguang Zhu, Yang Liu, Jie Mei, and Michael
Zeng. 2021. MediaSum: A large-scale media
interview dataset for dialogue summarization.
In Proceedings of the 2021 Conference of the
North American Chapter of the Association
for Computational Linguistics: Human Lan-
guage Technologies, pages 5927–5934, Online.
Association for Computational Linguistics.

Junnan Zhu, Haoran Li, Tianshang Liu, Yu Zhou,
Jiajun Zhang, and Chengqing Zong. 2018.
MSMO: Multimodal summarization with mul-
timodal output. In Proceedings of the 2018
Conference on Empirical Methods in Natu-

ral Language Processing, pages 4154–4164,
Brussels, Belgium. Association for Computa-
tional Linguistics. https://doi.org/10
.18653/v1/D18-1448

Junnan Zhu, Qian Wang, Yining Wang, Yu Zhou,
Jiajun Zhang, Shaonan Wang, and Chengqing
Zong. 2019. NCLS: Neural cross-lingual sum-
marization. In Proceedings of the 2019 Con-
ference on Empirical Methods in Natural
Language Processing and the 9th International
Joint Conference on Natural Language Pro-
cessing (EMNLP-IJCNLP), pages 3054–3064,
Hong Kong, China. Association for Computa-
tional Linguistics. https://doi.org/10
.18653/v1/D19-1302

Junnan Zhu, Yu Zhou, Jiajun Zhang, and Chengqing Zong. 2020. Attend, translate and summarize: An efficient method for neural cross-lingual summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1309–1321, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.121

Michał Ziemski, Marcin Junczys-Dowmunt, Und
Bruno Pouliquen. 2016. The United Na-
tions parallel corpus v1.0. In Proceedings of
the Tenth International Conference on Lan-
guage Resources and Evaluation (LREC’16),
pages 3530–3534, Portorož, Slovenia. Euro-
pean Language Resources Association (ELRA).
