Chinese Idiom Paraphrasing

Jipeng Qiang1∗ Yang Li1 Chaowei Zhang1 Yun Li1
Yi Zhu1 Yunhao Yuan1 Xindong Wu2,3

1 Yangzhou University, China,

2 Hefei University of Technology, China,

3 Zhejiang Lab, China

{jpqiang,cwzhang,liyun,zhuyi,yhyuan}@yzu.edu.cn, xwu@hfut.edu.cn

Abstract

Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters. Due to the properties of non-compositionality and metaphorical meaning, Chinese idioms are hard for children and non-native speakers to understand. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase idiom-containing sentences into non-idiomatic ones under the premise of preserving the original sentence's meaning. Since sentences without idioms are more easily handled by Chinese NLP systems, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation systems, Chinese idiom cloze, and Chinese idiom embeddings. In this study, we treat the CIP task as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,529 sentence pairs. In addition to three sequence-to-sequence methods as baselines, we further propose a novel infill-based approach based on text infilling. The results show that the proposed method has better performance than the baselines on the established CIP dataset.

1 Introduction

Idioms, called ''成语'' (ChengYu) in Chinese, are widely used in daily communication and various literary genres. Idioms are a kind of compact Chinese expression that consists of few words but implies relatively complex social nuances. Moreover, Chinese idioms are often used to describe similar phenomena, events, etc., which means the idioms cannot be interpreted with their literal meanings in some cases. Hence, it has always been a challenge for non-native speakers,

∗Corresponding author: jpqiang@yzu.edu.cn.


and even native speakers, to recognize Chinese idioms (Zheng et al., 2019). For example, the idiom ''鱼肉百姓'' (YuRouBaiXing) shown in Figure 1 represents ''oppress the people'' instead of its literal meaning ''fish meat the people''.

In real life, if some people do not understand the meaning of idioms, we have to explain the idioms by converting them into a set of word segments that provide more intuitive and understandable paraphrasing. In this study, we use computational approaches to automatically rephrase idiom-containing sentences into simpler sentences (i.e., non-idiom-containing sentences) that preserve context-based paraphrasing, thereby benefiting both Chinese-based natural language processing and societal applications.

Since idioms are a kind of obstacle for many NLP tasks, CIP can be used as a pre-processing phase that facilitates and improves the performance of machine translation systems (Ho et al., 2014; Shao et al., 2018), Chinese idiom cloze (Jiang et al., 2018; Zheng et al., 2019), and Chinese idiom embeddings (Tan and Jiang, 2021). Furthermore, CIP-based applications can help specific groups, such as children, non-native speakers, and people with cognitive disabilities, to improve their reading comprehension.

We propose a new task in this study, denoted as Chinese Idiom Paraphrasing (CIP), which aims to rephrase idiom-containing sentences into fluent, intuitive, and meaning-preserving non-idiom-containing sentences. We treat the CIP task as a special paraphrase generation task. The general paraphrase generation task aims to rephrase a given sentence to another one that possesses identical semantics but different lexicons or syntax (Kadotani et al., 2021; Lu et al., 2021). Similarly, CIP emphasizes rephrasing the idioms of input sentences to word segments that provide more intuitive and understandable paraphrasing.
In recent decades, many researchers devoted to paraphrase generation (McKeown, 1979; Meteer and Shaked, 1988) have struggled due to the lack of a reliable supervision dataset (Meng et al., 2021). Inspired by this challenge, we establish a large-scale training dataset for the CIP task in this work.


Figure 1: Given a Chinese idiom-containing sentence, we aim to output a fluent, intuitive, and meaning-preserving non-idiom-containing sentence. An idiom-containing sentence is hard to process by NLP applications. For example, this idiom-containing sentence is translated by the newest Google Translator.1 After processing the idiom-containing sentence using the proposed CIP method, we can output the correct translation. In the example, the idiom is underlined.

1 translate.google.com. Accessed: 2022-12-01.


Contributions. This study makes two main contributions toward the development of CIP systems.

First, a large-scale benchmark is established for the CIP task. The benchmark comprises 115,529 sentence pairs covering 8,421 unique idioms. A recurrent challenge in crowdsourcing NLP-oriented datasets at scale is that human writers frequently utilize repetitive patterns to fabricate examples, leading to a lack of linguistic diversity (Liu et al., 2022). A new large-scale CIP dataset is created in this study by taking advantage of the collaboration between humans and machines.

In detail, we initially divide a large-scale Chinese-English machine translation corpus into two parts (an idiom-containing sub-corpus and a non-idiom-containing sub-corpus) by judging whether a Chinese sentence contains idioms. Next, we train an English-to-Chinese machine translation (MT) system using the non-idiom-containing sub-corpus. Because the training corpus for the MT system does not include any idioms, the MT system will not translate input English sentences into idiom-containing Chinese sentences. Then, the MT system is deployed to translate the English sentences of the idiom-containing sub-corpus into non-idiom-containing sentences. A large-scale pseudo-parallel CIP dataset can be constructed by pairing the idiom-containing sentences of the idiom-containing sub-corpus with the translated non-idiom-containing sentences. Finally, we employ native speakers to validate the generated sentences and modify defective sentences where necessary.

Second, we propose a novel infill-based method to rephrase input idiom-containing sentences. Since the constructed dataset is used as the training dataset, we treat the CIP task as a paraphrase generation task. We adopt three different sequence-to-sequence (Seq2Seq) methods as baselines: an LSTM-based approach, a Transformer-based approach, and an mT5-based approach, where mT5 is a massively multilingual pre-trained text-to-text Transformer (Xue et al., 2021). Our proposed infill-based method only needs to rephrase the idioms of the sentence, meaning that we only need to generate context-based interpretations of the idioms rather than the whole sentence. Specifically, a CIP sentence pair can be processed into a (corrupted) input sentence, by replacing the idioms of the source sentence with blanks, and a corresponding target extracted from the simplified sentence. The mT5-based CIP method is fine-tuned to reconstruct the corresponding target. Experimental results show that, compared with the baselines evaluated on the constructed CIP dataset, our infill-based method can output high-quality paraphrases that are grammatically correct and semantically appropriate.

As the use of the Chinese language becomes more widespread, the need for effective Chinese paraphrasing methods may increase, leading to further research and development in this area. The constructed dataset and the baselines used to accelerate this research are open-source and available on GitHub.2

2 https://www.github.com/jpqiang/Chinese-Idiom-Paraphrasing.

2 Related Work

Paraphrase Generation: Paraphrase generation aims to extract paraphrases of given


sentences. The extracted paraphrases can preserve the original meaning of the sentence, but are assembled with different words or syntactic structures (McKeown, 1979; Meteer and Shaked, 1988; Zhou and Bhat, 2021).

Most recent neural paraphrase generation methods primarily take advantage of the sequence-to-sequence framework, which can achieve considerable performance improvements compared with traditional approaches (Zhou and Bhat, 2021). Some approaches use reinforcement learning or multi-task learning to improve the quality and diversity of generated paraphrases (Xie et al., 2022). A long-standing issue in paraphrase generation studies is the lack of reliable supervised datasets. The issue can be avoided by constructing manually annotated paired-paraphrase datasets (Kadotani et al., 2021) or designing unsupervised paraphrase generation methods (Meng et al., 2021).

Different from existing paraphrase generation research, we turn our attention to Chinese idiom paraphrasing, which rephrases idiom-containing sentences into non-idiom-containing ones.

Text Infilling: Originating from cloze tests (Taylor, 1953), text infilling aims to fill in missing blanks in a sentence or paragraph by making use of the preceding and subsequent text, to make the text complete and meaningful.

Current text infilling methods may be cate-
gorized into four groups. GAN-based methods
train GANs to ensure that the generator gen-
erates highly dependable infilling content that
can trick the discriminator (Fedus et al., 2018).
Intricate inference-based methods use dynamic
programming or gradient search to locate infilling
content that is highly probable within its sur-
rounding context (Zaidi et al., 2020). Masked
LM-based methods generate infilling content
based on its bidirectional contextual word em-
bedding (Shen et al., 2020). LM-based methods
fine-tune off-the-shelf LMs in an auto-regressive
manner, and some approaches modify the input format by putting an infilling answer after the masked input (Donahue et al., 2020), whereas others do not modify the input format (Zhu et al., 2019). In contrast to the aforementioned methods, our goal in this paper is not only to make the text complete, but also to maintain the sentence's meaning when creating paraphrases. Hence,
we employ a sequence-to-sequence framework to
identify infilling content.

Idioms: Idioms are an interesting linguistic phenomenon in the Chinese language. Compared with other types of words, most idioms are unique from the perspective of non-compositionality and metaphorical meaning. Idiom understanding plays an important role in research on Chinese language understanding. Many types of research related to Chinese idiom understanding have been proposed that can benefit a variety of related downstream tasks. For example, Shao et al. (2018) focused on evaluating the quality of idiom translation in machine translation systems. Zheng et al. (2019) provided a benchmark to assess the abilities of multiple models on Chinese idiom-based cloze tests, and evaluated how well the models can comprehend Chinese idiom-containing texts. Liu et al. (2019) studied how to improve essay writing skills by recommending Chinese idioms. Tan and Jiang (2021) investigated learning and quality evaluation of Chinese idiom embeddings. In this paper, we study a novel CIP task that is different from the above tasks. Since the proposed CIP method can rephrase idiom-containing sentences into non-idiom-containing ones, it is expected that CIP can benefit tasks related to idiom representation and idiom translation.

Pershina et al. (2015) studied a new task of English idiom paraphrasing, aiming to determine whether two idioms have alike or similar meanings. They collected idioms' definitions from a dictionary and utilized word embedding models to represent idioms and calculate the similarity between two idioms. Qiang et al. (2021) proposed a Chinese lexical simplification method, which focuses on replacing complex words in given sentences with simpler and meaning-equivalent alternatives. It is noteworthy that the substitutes in Chinese lexical simplification are all made up of a single word, but an idiom typically cannot be substituted by a single word to express the original concepts or ideas.

3 Human and Machine Collaborative Dataset Construction

This section describes the process of constructing a large-scale parallel dataset for CIP. A qualified CIP dataset needs to meet the following two requirements: (1) the two sentences in a sentence pair have to convey the same meaning; and (2) a sentence pair has to contain an idiom-containing

Figure 2: A pipelined illustration of creating a CIP dataset based on a Chinese-English MT corpus. (a) The corpus is split into an idiom-containing sub-corpus and a non-idiom-containing sub-corpus based on a Chinese idiom list. (b) We train an MT system using the non-idiom-containing sub-corpus, and create a pseudo-CIP dataset by pairing the original Chinese idiom-containing sentences with the non-idiom-containing sentences translated by the trained MT system. (c) We ask human annotators to revise the translated Chinese sentences of the pairs to strengthen the quality of the created CIP dataset.

sentence and a non-idiom-containing one. We outline a three-stage pipeline for dataset construction, which takes advantage of both the generative strength of machine translation (MT) methods and the evaluative strength of human annotators. Human annotators are generally reliable in correcting examples, but it is challenging for them to craft diverse and creative examples at scale. Therefore, we deploy a machine translator to automatically create an initial CIP dataset, and then ask annotators to proofread each generated instance.

3.1 Pipeline

Figure 2 exhibits the details of the pipeline. Our pipeline starts with an existing English-Chinese machine translation dataset denoted as D. Firstly, we refer to a collected Chinese idiom list I to split the MT dataset D into two parts: a non-idiom-containing sub-dataset D1 and an idiom-containing sub-dataset D2 (Stage 1). All the data items in both D1 and D2 are in the form of sentence pairs. Then, we train a neural machine translation system M using D1, which can translate English sentences into non-idiom-containing Chinese sentences. Subsequently, we input the English sentences in D2 to M to output non-idiom-containing Chinese sentences. Afterward, the Chinese sentences in D2 and the generated sentences are paired to construct a large-scale initial parallel CIP dataset (Stage 2). Finally, the constructed dataset is reviewed and revised by annotators for quality assurance (Stage 3).

Stage 1: Corpus Segmentation. The English-Chinese MT dataset D applied in this research is taken from WMT18 (Bojar et al., 2018), which contains 24,752,392 sentence pairs. We extract a Chinese idiom list I that embraces 31,114 idioms.3 Since the list enables determining whether the Chinese sentence in a pair contains idioms, D can be split into D1 and D2. The sub-dataset D1 is used to train a special MT system M that can translate English sentences into non-idiom-containing Chinese sentences. In our experiments, only 0.2% of the translated Chinese sentences contain idioms (see Table 6). After removing redundant Chinese sentences, the number of sentence pairs in D2 is 105,559.
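To make Stage 1 concrete, the following is a minimal sketch of the corpus segmentation; the idiom-list file format (one idiom per line) and all names here are illustrative assumptions, not our released code.

```python
# Minimal sketch of Stage 1 (corpus segmentation). The idiom-list format
# (one idiom per line) and all function names are illustrative assumptions.
def load_idiom_list(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def contains_idiom(zh_sentence, idioms):
    # Plain substring matching suffices because Chinese idioms are fixed
    # character sequences (mostly four characters); an Aho-Corasick
    # automaton would speed this up for a 31,114-entry list.
    return any(idiom in zh_sentence for idiom in idioms)

def split_corpus(pairs, idioms):
    """Split (zh, en) pairs into D1 (idiom-free) and D2 (idiom-containing)."""
    d1, d2 = [], []
    for zh, en in pairs:
        (d2 if contains_idiom(zh, idioms) else d1).append((zh, en))
    return d1, d2
```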

Stage 2: Pseudo-CIP Dataset. Given a sentence pair (ci, ei) in D2, we input the English sentence ei into the MT system M, and output a Chinese translation ti. We pair the Chinese sentence ci and the Chinese translation ti as a pseudo-CIP sentence pair. Hence, a CIP dataset can be built up by pairing the original Chinese sentences and the corresponding translated English-to-Chinese ones in D2. The pseudo-CIP dataset D2′ can meet the two requirements of CIP dataset construction. On the one hand, the pseudo-CIP data is from the MT dataset, which can guarantee that the paired sentences deliver the same meanings. On the other hand, all the original sentences include one or more idioms, and the translated sentences do not contain idioms.
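A corresponding sketch of Stage 2 is given below; `translate_en_to_zh` is a placeholder for the MT system M trained on D1, not a real API, and `contains_idiom` is reused from the Stage 1 sketch above.

```python
# Sketch of Stage 2 (pseudo-CIP construction). `translate_en_to_zh` stands
# in for the MT system M trained on D1; `contains_idiom` is defined in the
# Stage 1 sketch.
def build_pseudo_cip(d2, translate_en_to_zh, idioms):
    pseudo_pairs = []
    for zh_idiom_sent, en_sent in d2:
        zh_translation = translate_en_to_zh(en_sent)
        # Drop the rare translations that still contain an idiom, so both
        # CIP dataset requirements are met.
        if not contains_idiom(zh_translation, idioms):
            pseudo_pairs.append((zh_idiom_sent, zh_translation))
    return pseudo_pairs
```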

3https://github.com/pwxcoo/chinese-xinhua.


                      |              In-domain          |    Out-of-domain    |
                      | Train     | Dev     | Test      | Dev     | Test      | Total
Sentence pairs        | 95,560    | 5,000   | 4,999     | 4,994   | 4,976     | 115,529
Source tokens         | 3,390,179 | 173,001 | 169,793   | 225,850 | 221,673   | 4,180,496
Source avg. length    | 35        | 35      | 34        | 45      | 45        | 36
All idioms            | 102,997   | 5,423   | 5,494     | 5,808   | 5,800     | 251,055
Unique idioms         | 7,609     | 5,225   | 5,279     | 5,149   | 5,128     | 8,421
Reference tokens      | 3,454,127 | 175,083 | 172,028   | 239,578 | 224,907   | 4,265,723
Reference avg. length | 36        | 35      | 34        | 48      | 45        | 36
Avg. edit distance    | 7.85      | 7.26    | 7.37      | 6.21    | 5.36      | 7.62

Table 1: The statistics of the CIP dataset.

Freq. interval | In-domain Dev | In-domain Test | Out-of-domain Dev | Out-of-domain Test
0              | 415           | 421            | 279               | 284
[1,10)         | 1,787         | 1,814          | 1,871             | 1,854
[10,20)        | 941           | 946            | 808               | 810
[20,30)        | 1,159         | 1,171          | 1,120             | 1,108
[30,40)        | 1,026         | 1,057          | 1,643             | 1,657
[40,50)        | 67            | 60             | 62                | 66
[50,68)        | 28            | 25             | 25                | 21

Table 2: Frequency statistics for idiomatic usage in Dev and Test.

Stage 3: Human Review. As the final stage of the pipeline, we recruit five human annotators to review each sentence pair (ci, ti) in the pseudo-CIP dataset D2′. These annotators are all undergraduate native Chinese speakers. Given (ci, ti), annotators are asked to revise and improve the quality of ti, which is required to be non-idiom-containing and fully meaning-preserving.

3.2 Corpus Statistics

The statistical details of the CIP dataset are shown in Table 1. The dataset D2′ is treated as in-domain data, which contains 105,559 instances including 8,261 different idioms. D2′ is partitioned into three parts: a training set Train, a development set Dev, and a test set Test. The numbers of instances in Train, Dev, and Test are 95,560, 5,000, and 4,999, respectively.

We observe that both the Train and Test datasets come from the same distribution. However, when models are deployed in real-world applications, inference might be performed on data from different distributions, i.e., out-of-domain data (Desai and Durrett, 2020). Therefore, we additionally collected 9,970 sentences with idioms from modern vernacular classics, including prose and fiction, as out-of-domain data to assess the generalization ability of CIP methods. Unlike the MT corpus, these sentences have no English sentences as references, so we manually rewrite them into non-idiom-containing sentences with the help of Chinese native speakers.

There are three significant differences between the in-domain and out-of-domain data. First, the average length of sentences in the in-domain data is about 35 words, while it is about 45 words for the out-of-domain data. Second, the average number of idioms per sentence in the in-domain data is 1.07, which is lower than that of the out-of-domain data (i.e., 1.17). Third, the sentence pairs in the out-of-domain data need fewer modifications than those in the in-domain data. In this case, a lack of linguistic diversity might arise because human annotators often rely on repetitive patterns to generate sentences.

To verify the scalability and generalization ability of the CIP methods, we adopt the following strategy to construct Dev and Test. We counted the frequency of each idiom in the corpus, where the minimum and maximum idiom frequencies are 1 and 68, respectively. Based on the number of idioms in each frequency interval, we extract instances into Dev and Test. The idiom frequency statistics on Dev and Test are shown in Table 2. We can see that low-frequency idioms occupy a higher proportion of all the idiom occurrences (62.76% and 62.71%


for the low-frequency interval [0,20) in the in-domain Dev and Test). There are 415 and 421 instances containing idioms in the in-domain Dev and Test, respectively, that are never seen in Train.

Table 3: Examples containing the idiom ''…'' in the CIP dataset. c and e are a sentence pair from the machine translation corpus; t is the CIP reference sentence of c generated by collaborative machine translation and human intervention. Underlined words are idioms, and their translations and interpretations are marked with wavy lines.

3.3 Some Examples in the CIP Dataset

We present some examples of the idiom ''…'' (reclusive) in the CIP dataset, shown in Table 3. The idiom can be rephrased with different descriptions, displaying linguistic diversity.

4 Methods

Based on our constructed CIP dataset, the CIP task can be treated as a sentence paraphrasing task (Section 4.1). Furthermore, we propose a novel infill-based method to solve it (Section 4.2).

4.1 Paraphrasing for CIP

This task can be defined as follows. Given a source sentence c = {c1, . . . , cj, . . . , cm} with one or more idioms, we intend to produce a paraphrased sentence t = {t1, . . . , ti, . . . , tn}. More specifically, t is expected to be non-idiom-containing and meaning-preserving, where cj and ti refer to Chinese characters. In this study, we design a supervised method to approach this monolingual machine translation task. We adopt a Sequence-to-Sequence (Seq2Seq) framework that directly predicts the probability of the character-sequential translation from source sentences to target ones (Bahdanau et al., 2015), where the probability is calculated using Equation (1):

P(t \mid c) = \prod_{i=1}^{n} P(t_i \mid t_{<i}, c; \theta)    (1)

4.2 Infill-based CIP Method

To obtain idiom-interpretation pairs, we compare the source sentence c and the reference t using Diff,4 which marks spans deleted from c with '−' and spans inserted in t with '+'; the extraction rule is that a '+' segment follows a '−' segment in the Diff output. A matched sequence pair is ignored by the model if no idiom is included. In this way, we obtain sequence pairs whose two elements represent an idiom and its corresponding interpretation.

Training. Given one sentence pair {c, t}, we first construct a new training instance {c ⊕ ĉ, y}, where ĉ is c with the idiom replaced by a blank and y is the extracted interpretation, as shown in Figure 3. Then, we employ Seq2Seq methods to accomplish this task, as illustrated in Figure 4. Here, we make two modifications.

4 https://github.com/paulgb/simplediff.

Figure 3: The infill-based CIP method as a sequence-to-sequence task. A sentence pair {c, t} is transformed into the input ''c ⊕ ĉ'' and the output ''y''.

Figure 4: An example of the infill-based CIP method. The input sequence fed into the mT5-based Seq2Seq model is composed of the original sentence and a corrupted sentence in which the idiom of the original sentence is replaced by one blank. The interpretation of the idiom, rather than the whole target sentence, is treated as the reference output.

(1) If only the corrupted sentence ĉ were fed to the encoder, the information of the idiom in c would be ignored. We therefore concatenate the original c and the corrupted sentence ĉ as the input sequence. (2) When a sentence has two or more idioms, we construct one instance {c ⊕ ĉ, y} for each idiom in c. We use one blank for one idiom instead of multiple blanks for all idioms, because this preserves enough information when generating the sequences to infill the blank.
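A sketch of this instance construction is shown below; the blank token and the separator between c and ĉ are assumptions (mT5 marks blanks with sentinel tokens such as <extra_id_0>).

```python
BLANK = "<extra_id_0>"  # assumed blank marker; mT5 uses sentinel tokens

# Sketch: one training instance per idiom, with a single blank per instance,
# and the original sentence concatenated with the corrupted one.
def make_infill_instances(c, pairs):
    """pairs: (idiom, interpretation) tuples extracted via Diff."""
    instances = []
    for idiom, interpretation in pairs:
        corrupted = c.replace(idiom, BLANK, 1)   # blank out only this idiom
        source = c + "[SEP]" + corrupted         # keep the idiom visible in c
        instances.append((source, interpretation))
    return instances
```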


Given training data consisting of instances {c, ĉ, y}, it is straightforward to optimize the following objective function according to maximum likelihood estimation:

Loss = -\sum_{(c, \hat{c}, y) \in Train} \log P(y \mid c, \hat{c}; \theta)    (2)

mT5 is pre-trained with span masked language modeling (MLM) to build a Seq2Seq model. In contrast to the MLM in BERT (Devlin et al., 2019), span MLM masks consecutive spans of input tokens with a blank and reconstructs them. With the help of mT5, the proposed method can reconstruct the interpretation of the idiom in sentence c by replacing the idiom with the blank. Therefore, our infill-based method adopts mT5 to learn how to fill the blank.

During inference, if a sentence has multiple idioms, we iteratively decode each idiom into its corresponding interpretation.
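Inference can be sketched with HuggingFace transformers as follows; the public mT5 checkpoint and the input layout are illustrative assumptions, since we use a Chinese re-trained mT5 (footnote 6).

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Inference sketch for the infill-based method. The checkpoint and input
# layout are assumptions, not the released configuration.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

def paraphrase_one_idiom(c, idiom):
    corrupted = c.replace(idiom, "<extra_id_0>", 1)
    inputs = tokenizer(c + "[SEP]" + corrupted, return_tensors="pt")
    out = model.generate(**inputs, num_beams=5, max_new_tokens=32)
    interpretation = tokenizer.decode(out[0], skip_special_tokens=True)
    return c.replace(idiom, interpretation, 1)

# A sentence with several idioms is handled by applying this function
# iteratively, one idiom at a time.
```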

Relation to Previous Work. Compared with the sentence paraphrasing task (Zhou and Bhat, 2021; Xie et al., 2022), our infill-based method only needs to rephrase the idioms of a sentence, rather than the whole sentence. Our method is inspired by the text infilling method of Zhu et al. (2019), but it differs from existing text infilling methods because our aim is to rephrase the original sentence, whereas the aim of text infilling is to make the text complete and meaningful.

5 Experiments

5.1 Experiment Setup

Implementation Details. In this experiment, four CIP methods are deployed: LSTM-based Seq2Seq modeling (LSTM), Transformer-based Seq2Seq modeling (Transformer), mT5-based Seq2Seq modeling (mT5), and the infill-based CIP method (Infill). We implement the LSTM and Transformer methods using fairseq (Ott et al., 2019). The mT5 and Infill methods are mT5-based and are implemented using HuggingFace transformers (Wolf et al., 2020). In addition, sentence tokenization is accomplished using the Jieba Chinese word segmenter5 and BPE tokenization. The size of the vocabulary is set to 32K. The LSTM-based Seq2Seq method adopts the Adam optimizer configured with β = (0.9, 0.98), a 3e−4 learning rate, and a 0.2 dropout rate. The Transformer-based Seq2Seq method keeps the hyperparameters of the base Transformer (Vaswani et al., 2017), which contains a six-layer encoder and a six-layer decoder. The three parameters (β of the Adam optimizer, learning rate, and dropout rate) in the Transformer-based method are the same as those in the LSTM-based method. It is noteworthy that the learning rate is gradually increased to 3e−4 over 4k warmup steps and then decays according to the inverse square root schedule. For mT5 and Infill, we adopt an mT5 version that is re-trained on a Chinese corpus.6 We train the three methods with the Adam optimizer (Kingma and Ba, 2015) and an initial learning rate of 3e−4 for up to 20 epochs, using early stopping on the development data: training stops when the accuracy on the development set does not improve within 5 epochs. We use beam search with 5 beams for inference.
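For reference, the stated warmup-then-decay rule corresponds to the following schedule (a sketch; fairseq implements it as the inverse_sqrt scheduler):

```python
# Sketch of the learning-rate schedule described above: linear warmup to
# 3e-4 over 4k steps, then inverse square root decay.
def learning_rate(step, peak_lr=3e-4, warmup_steps=4000):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5
```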

Metrics. As mentioned above, the CIP task can be treated as a sentence paraphrasing task. Therefore, we apply four metrics commonly used to evaluate sentence paraphrasing, namely BLEU (Papineni et al., 2002), BERTScore (Zhang et al., 2020), and ROUGE-1 and ROUGE-2 (Lin, 2004). BLEU is a widely used machine translation metric, which measures lexical overlap between system outputs and human references (Papineni et al., 2002). BERTScore is chosen as another metric due to its high correlation with human judgments (Zhang et al., 2020). Compared to BLEU, BERTScore is measured using token-wise cosine similarity between representations produced by BERT. We measure semantic overlaps between generated sentences and reference ones using ROUGE scores (Lin, 2004). ROUGE is often used to evaluate text summarization. The two metrics ROUGE-1 and ROUGE-2 refer to the overlaps of unigrams and bigrams between the system and reference outputs, respectively.

Since generated paraphrases often only need to rewrite the idioms of the original sentence, evaluating the whole sentence cannot accurately reflect the quality of paraphrase generation. In order to better evaluate the quality of idiom paraphrasing, we evaluate only the rewritten part of the generated paraphrases instead of the whole sentence using the above metrics. Specifically, given an original sentence c, a reference sentence t, and the generated paraphrase sentence u, we first find the words common to all of c, t, and u, and remove these common words from t and u. We then evaluate the remaining words of t and u using the above metrics, denoted as BLEU-E, BERTScore-E, ROUGE1-E, and ROUGE2-E.
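The "-E" variants can be sketched as follows; the word segmentation granularity and the exact removal rule are assumptions based on the description above.

```python
# Sketch of the "-E" metrics: remove words shared by c, t, and u, then
# score only what remains. Segmentation granularity is an assumption.
def rewritten_parts(c_words, t_words, u_words):
    common = set(c_words) & set(t_words) & set(u_words)
    t_rest = [w for w in t_words if w not in common]
    u_rest = [w for w in u_words if w not in common]
    return t_rest, u_rest  # score these with BLEU/BERTScore/ROUGE
```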
Baselines. In this research, we adopt three Seq2Seq methods to handle the CIP task, which are LSTM-based, Transformer-based, and mT5-based models, respectively. We additionally provide two zero-shot methods that can facilitate solving the CIP problem, namely, Re-translation and BERT-CLS.

5https://github.com/fxsjy/jieba.

6www.github.com/ZhuiyiTechnology/t5-pegasus.


Method         | BLEU/BLEU-E        | BERTS/BERTS-E      | ROU1/ROU1-E        | ROU2/ROU2-E
Re-translation | 27.37/2.67         | 78.73/64.43        | 57.01/24.46        | 31.93/5.32
BERT           | 74.75/1.39         | 91.53/62.99        | 83.05/18.36        | 73.64/5.11
LSTM           | 81.99/31.87        | 93.79/78.73        | 87.83/55.52        | 80.20/43.40
Transformer    | 82.19/32.58        | 94.00/79.41        | 88.16/56.70        | 80.50/44.58
mT5            | 82.98/33.87        | 94.22/80.36        | 88.13/57.44        | 80.78/45.89
Infill         | 83.55/(34.26±0.02) | 94.46/(80.94±0.05) | 88.68/(58.44±0.03) | 81.57/(47.96±0.03)

Table 4: The results of different methods on the in-domain test set using the metrics BLEU, BERTS, ROU1, and ROU2, where BERTS, ROU1, and ROU2 refer to BERTScore, ROUGE-1, and ROUGE-2, respectively. ''±'' denotes the standard deviation over five runs.


(1) The LSTM-based Seq2Seq method is a basic Seq2Seq method, which uses an LSTM (long short-term memory [Hochreiter and Schmidhuber, 1997]) to convert a sentence into a dense, fixed-length vector representation. In contrast to vanilla RNNs, an LSTM can handle long sequences, but it fails to maintain the global information of the sequences.

(2) The Transformer-based Seq2Seq method (Vaswani et al., 2017) is a state-of-the-art Seq2Seq method that has been widely adopted to process various NLP tasks, such as machine translation and abstractive summarization. The Transformer applies a self-attention mechanism that directly models the relationships among all words of an input sequence regardless of the words' positions. Unlike the LSTM, the Transformer handles the entire input sequence at once, rather than iterating over words one by one.

(3) mT5 is a Seq2Seq method that uses the Transformer framework. Currently, most downstream NLP tasks build their models by fine-tuning pre-trained language models (Raffel et al., 2020). mT5 is a massively multilingual pre-trained language model that is implemented in a unified text-to-text form to process different downstream NLP problems. In this study, we fine-tune the mT5-based approach to handle the CIP task.

(4) Re-translation is implemented by utilizing the back-translation technique from machine translation. We first translate an idiom-containing sentence into an English sentence using an efficient Chinese-English translation system,7 and then translate the generated English sentence using our trained English-Chinese translation system (introduced in Section 3.1) to generate a non-idiom-containing Chinese sentence. The Chinese-English translation system can be easily accessed online. The trained English-Chinese translation system is a Transformer-based Seq2Seq method.

(5) BERT-CLS is an existing BERT-based Chi-
nese lexical simplification method (Qiang et al.,
2021). In this task, an idiom is treated as a complex
word that will be replaced with a simpler word.

5.2 Performance of CIP Methods

Table 4 summarizes the evaluation results on our established CIP dataset using the two types of metrics. The supervised CIP methods (LSTM, Transformer, mT5, and Infill) are significantly better than the two zero-shot methods (Re-translation and BERT) in terms of the four metrics. The results reveal that the dataset is a high-quality corpus, which can help to benefit the CIP task.

Table 4 and Table 5 show that the performance of the LSTM-based baseline is inferior to the other three baselines on the in-domain and out-of-domain test sets. Overall, the two mT5-based CIP methods (mT5 and Infill) outperform the other two methods (LSTM and Transformer), which suggests that CIP methods fine-tuned on mT5 can improve CIP performance. Infill yields the best results on the in-domain test set compared with the other CIP methods, which verifies that Infill is quite effective. On the out-of-domain test set, the BERT-based method achieves the best results on ROU1 and ROU2 but obtains the worst results on ROU1-E and ROU2-E, because
7huggingface.co/Helsinki-NLP/opus-mt-zh-en.


Method         | BLEU/BLEU-E        | BERTS/BERTS-E      | ROU1/ROU1-E        | ROU2/ROU2-E
Re-translation | 13.65/0.87         | 72.63/61.87        | 47.18/22.73        | 19.47/2.65
BERT           | 84.95/1.63         | 94.79/63.07        | 89.84/19.95        | 85.10/5.99
LSTM           | 81.20/7.81         | 93.50/67.03        | 87.66/31.08        | 81.69/14.14
Transformer    | 80.14/8.07         | 93.55/67.36        | 87.63/31.62        | 81.56/14.48
mT5            | 84.76/9.29         | 94.58/68.05        | 89.20/31.82        | 83.76/14.64
Infill         | 86.60/(10.68±0.03) | 94.98/(68.54±0.05) | 89.70/(31.10±0.03) | 84.89/(15.85±0.02)

Table 5: The results of different methods on the out-of-domain test set. ''±'' denotes the standard deviation over five runs.

Method      | In-domain: Simp. | Meaning | Fluency | Avg  | Out-of-domain: Simp. | Meaning | Fluency | Avg
Reference   | 4.41             | 4.26    | 4.29    | 4.32 | 4.00                 | 3.86    | 3.77    | 3.88
BERT        | 3.28             | 2.24    | 2.58    | 2.70 | 2.98                 | 2.06    | 2.26    | 2.44
LSTM        | 3.58             | 3.33    | 3.27    | 3.39 | 3.26                 | 3.02    | 2.89    | 3.06
Transformer | 3.48             | 3.29    | 3.22    | 3.33 | 3.16                 | 3.01    | 2.89    | 3.02
mT5         | 3.85             | 3.57    | 3.62    | 3.68 | 3.23                 | 3.02    | 2.97    | 3.07
Infill      | 4.02             | 3.78    | 3.81    | 3.87 | 3.72                 | 3.49    | 3.42    | 3.54

Table 6: The results of human evaluation. ''Simp.'' denotes ''simplicity''; ''Avg'' denotes ''average''.

it makes only minor modifications to the source sentence. This means that the -E metrics (BLEU-E, BERTS-E, ROU1-E, and ROU2-E) are more reasonable as evaluation metrics for the CIP task. Our proposed infill-based method is still the best option for the CIP task on the out-of-domain test set. Our proposed method Infill is superior to the baselines in several key ways. First, our approach is more efficient, allowing us to achieve better results in less time, because the infill-based method only needs to rephrase the idioms of the input sentence. Second, our method is more robust, since it achieves the best results on both the in-domain and out-of-domain test sets. Finally, our method has been extensively tested and validated through human evaluation, giving us confidence in its reliability and accuracy. Overall, our method represents a significant improvement over the existing baselines and is the best option for solving the CIP problem at hand.

5.3 Human Evaluation

To further evaluate the CIP methods, we adopt human evaluation to analyze the deployed CIP methods. We choose 184 sentences from the in-domain test set and 197 sentences from the out-


of-domain test set. To verify the scalability and generalization ability of the CIP methods, we also examine their performance on problems where the idioms are never seen in the corpus. Therefore, we choose 49 and 48 test sentences containing idioms that do not appear in the training set from the in-domain and out-of-domain test sets, respectively. We ask five native speakers to rate each generated sentence on three features: simplicity, meaning, and fluency. A five-point Likert scale is adopted to rate these features, and the average scores of

Table 8: The output paraphrases of the CIP methods when the idioms do not appear in the training set; for each of three example sentences, the table lists the English gloss, the reference, and the outputs of LSTM, Trans., mT5, and Infill. ''Trans.'' denotes ''Transformer''.

the features are calculated correspondingly. (1) Simplicity evaluates whether the re-paraphrased idioms of the generated sentences are easily understandable, which means idioms in the original sentences should be rewritten with simpler and more common words. (2) Meaning assesses whether the generated sentences preserve the meaning of the original sentences. (3) Fluency judges whether a generated sentence is fluent and does not contain grammatical errors.

The results of the human evaluation are shown in Table 6. We calculate the scores of the annotated sentences t, denoted as Reference. We see that the infill-based mT5 method outperforms the other methods on the in-domain and out-of-domain test sets, which means Infill is an effective method for the CIP task. These conclusions are consistent with the results of the automatic metrics. Compared with Reference, our method still has significant potential for improvement.

In addition, we calculated the inter-annotator agreement among different annotators. Specifically, we computed Fleiss' Kappa (Fleiss, 1971) scores for the different domain test sets. The scores are 0.199 and 0.097 on the in-domain and out-of-domain test sets, respectively. This indicates that the evaluation of Chinese idiom paraphrasing was relatively subjective but still managed to achieve a modest level of agreement. We acknowledge the limitations of human judgment in evaluating the quality of paraphrasing and believe that the diversity of opinions among raters is a valuable insight into the complexity of the CIP task.
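Fleiss' kappa can be computed from the raw Likert ratings as sketched below, assuming an (items × raters) score matrix, e.g., with the implementation in statsmodels.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Sketch of the agreement computation. `ratings` is an (items x raters)
# matrix of 1-5 Likert scores (toy values); aggregate_raters converts it
# to per-category counts before applying Fleiss' kappa (Fleiss, 1971).
ratings = np.array([[4, 5, 4, 4, 3],
                    [2, 3, 2, 2, 2],
                    [5, 4, 4, 5, 4]])
counts, _ = aggregate_raters(ratings)
print(fleiss_kappa(counts))
```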

5.4 Proportion of Idiom Paraphrasing

CIP aims to rephrase an input idiom-containing sentence into a meaning-preserving and non-idiom-containing sentence. In this subsection, we count the number of idiom-containing sentences that are rephrased into non-idiom-containing sentences. The results are shown in Table 7. Re-translation achieves the best results: it rephrases almost all idioms into non-idiom-containing representations. This means that our idea of constructing the CIP dataset using a machine translation method is feasible. Theoretically, if the trained English-Chinese machine translation method (Stage 2 in the pipeline) could always output high-quality results, we would not need to ask annotators to revise the Chinese translations. We observe that the proportions of the CIP methods (LSTM, Transformer, mT5, and Infill) are nearly 90%, which means they have great potential for dealing with idiom paraphrasing. Moreover, a


small number of idioms cannot be rephrased, because some idioms are simple and are therefore retained in the training set.

Method         | In     | Out
Re-translation | 99.82% | 99.86%
BERT           | 78.18% | 69.21%
LSTM           | 85.64% | 84.65%
Transformer    | 87.26% | 85.13%
mT5            | 87.54% | 72.07%
Infill         | 87.98% | 81.29%

Table 7: The proportion between the number of rephrased idioms and the number of all idioms.
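The statistic reported in Table 7 can be computed along the following lines (a sketch; the substring matching rule is an assumption):

```python
# Sketch of the Table 7 statistic: the share of idiom occurrences in the
# source sentences that no longer appear in the system outputs.
def paraphrasing_proportion(sources, outputs, idioms):
    total = rephrased = 0
    for src, out in zip(sources, outputs):
        for idiom in idioms:
            if idiom in src:
                total += 1
                if idiom not in out:
                    rephrased += 1
    return rephrased / total if total else 0.0
```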

5.5 Case Study

The in-domain test set contains idioms that are never seen in Train. We show the paraphrasing results of the different CIP methods in Table 8. Our method consistently outperforms all other approaches in the case study. We find that the LSTM-based and Transformer-based methods tend to retain or output part of the idioms, because they cannot learn any knowledge of these idioms from the training corpus. Both the mT5-based and Infill-based methods, built on the pre-trained language model mT5, can generate correct interpretations for some of the idioms, as the mT5 model has learned knowledge of these idioms during pre-training. The mT5-based method generates a whole new sentence for the original sentence, which can lead to some incorrect interpretations. In contrast, the Infill-based method only rephrases the idioms within the sentence based on their context, which produces higher-quality interpretations compared to the mT5-based method.

5.6 The Translations of Chinese Idioms

Not only do idioms present a challenge for people to understand, but they also present a greater challenge for Chinese-based NLP applications. Here, we use Chinese-English machine translation as an example of an NLP application to evaluate the usefulness of CIP methods. Given an input sentence containing an idiom, we first use our CIP method as a preprocessing technique to rephrase the sentence, and then translate the paraphrased version into an English sentence.

We give some examples to compare the differences, and the results are shown in Table 9. Because many idioms cannot be translated with their literal meanings, our method helps to identify and paraphrase these idioms, making them easier for the machine translation system to process.

Table 9: Examples demonstrating how our CIP method can improve a Chinese-English machine translation system. ''Tran1'' is the translation of the original sentence using Google Translate,8 ''Para'' is the paraphrased version generated by our infill-based method, and ''Tran2'' is the translation of the paraphrased sentence using Google Translate.

8 https://translate.google.com/. Accessed: 2022-12-01.

6 Conclusions



In this paper, we propose a novel Chinese Idiom Paraphrasing (CIP) task, which aims to rephrase sentences containing idioms into non-idiomatic versions. The CIP task can be treated as a special case of paraphrase generation and can be addressed using Seq2Seq modeling. We construct


a large-scale training dataset for CIP by exploiting collaboration between humans and machines. Specifically, we first design a framework to construct a pseudo-CIP dataset and then ask workers to revise and evaluate the dataset. In this study, we deploy three Seq2Seq methods and propose one novel CIP method (Infill) for the CIP task. Experimental results reveal that our proposed methods trained on our dataset yield good results. This could have a positive impact on the performance of machine translation systems, as well as other natural language processing applications that involve Chinese idioms. In subsequent research, our proposed methods can serve as strong baselines, and the established dataset can be used to accelerate the study of this topic.

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China under grants 62076217, 62120106008, and 61906060, and the Blue Project of Yangzhou University.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.

Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, and Christof Monz. 2018. Findings of the 2018 Conference on Machine Translation (WMT18). In Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Task Papers, pages 272–307, Belgium, Brussels. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-6401

Shrey Desai and Greg Durrett. 2020. Calibration of pre-trained transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 295–302, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.21

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186. https://doi.org/10.18653/v1/N19-1423

Chris Donahue, Mina Lee, and Percy Liang. 2020.
Enabling language models to fill in the blanks.
In Proceedings of the 58th Annual Meeting of
the Association for Computational Linguistics,
pages 2492–2501.

William Fedus, Ian J. Goodfellow, and Andrew M. Dai. 2018. MaskGAN: Better text generation via filling in the ______. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 – May 3, 2018, Conference Track Proceedings. OpenReview.net.

Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378. https://doi.org/10.1037/h0031619

Wan Yu Ho, Christine Kng, Shan Wang, 和
Francis Bond. 2014. Identifying idioms in
Chinese translations. In LREC, pages 716–721.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735, PMID: 9377276

Zhiying Jiang, Boliang Zhang, Lifu Huang,
and Heng Ji. 2018. Chengyu cloze test. 在
Proceedings of the Thirteenth Workshop on In-
novative Use of NLP for Building Educational
应用领域, pages 154–158.

Sora Kadotani, Tomoyuki Kajiwara, Yuki Arase, and Makoto Onizuka. 2021. Edit distance based curriculum learning for paraphrase generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 229–234. https://doi.org/10.18653/v1/2021.acl-srw.24

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015.


Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Alisa Liu, Swabha Swayamdipta, Noah A. Smith, and Yejin Choi. 2022. WANLI: Worker and AI collaboration for natural language inference dataset creation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6826–6847, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Yuanchao Liu, Bo Pang, and Bingquan Liu. 2019. Neural-based Chinese idiom recommendation for enhancing elegance in essay writing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5522–5526, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1552

Xinyu Lu, Jipeng Qiang, Yun Li, Yunhao Yuan, and Yi Zhu. 2021. An unsupervised method for building sentence simplification corpora in multiple languages. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 227–237, Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.22

Kathleen R. McKeown. 1979. Paraphrasing using given and new information in a question-answer system. In 17th Annual Meeting of the Association for Computational Linguistics, pages 67–72, La Jolla, California, USA. Association for Computational Linguistics.

Yuxian Meng, Xiang Ao, Qing He, Xiaofei Sun, Qinghong Han, Fei Wu, Chun Fan, and Jiwei Li. 2021. ConRPG: Paraphrase generation using contexts as regularizer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2551–2562. https://doi.org/10.18653/v1/2021.emnlp-main.199

Marie Meteer and Varda Shaked. 1988. Strategies for effective paraphrasing. In COLING Budapest 1988 Volume 2: International Conference on Computational Linguistics, pages 431–436. https://doi.org/10.3115/991719.991724

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48–53. https://doi.org/10.18653/v1/N19-4009

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. https://doi.org/10.3115/1073083.1073135

Maria Pershina, Yifan He, and Ralph Grishman. 2015. Idiom paraphrases: Seventh heaven vs cloud nine. In Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics, pages 76–82, Lisbon, Portugal. Association for Computational Linguistics. https://doi.org/10.18653/v1/W15-2709

Jipeng Qiang, Xinyu Lu, Yun Li, Yun-Hao Yuan, and Xindong Wu. 2021. Chinese lexical simplification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 1819–1828. https://doi.org/10.1109/TASLP.2021.3078361

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21:1–67.

Yutong Shao, Rico Sennrich, Bonnie Webber, and Federico Fancellu. 2018. Evaluating machine translation performance on Chinese idioms with a blacklist method. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Tianxiao Shen, Victor Quach, Regina Barzilay, and Tommi Jaakkola. 2020. Blank language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5186–5198.

Minghuan Tan and Jing Jiang. 2021. Learning and evaluating Chinese idiom embeddings. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1387–1396.


Wilson L. Taylor. 1953. ''Cloze procedure'': A new tool for measuring readability. Journalism Quarterly, 30(4):415–433. https://doi.org/10.1177/107769905303000401

Ashish Vaswani, Noam Shazeer, Niki Parmar,
Jakob Uszkoreit, Llion Jones, Aidan N.
Gomez, Łukasz Kaiser, and Illia Polosukhin.
2017. Attention is all you need. In Advances
in Neural Information Processing Systems,
pages 5998–6008.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-demos.6

Yanling Xiao, Lemao Liu, Guoping Huang, Qu Cui, Shujian Huang, Shuming Shi, and Jiajun Chen. 2022. BiTIIMT: A bilingual text-infilling method for interactive machine translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1958–1969, Dublin, Ireland. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.138

Xuhang Xie, Xuesong Lu, and Bei Chen. 2022. Multi-task learning for paraphrase generation with keyword and part-of-speech reconstruction. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1234–1243.
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2021. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.41

Najam Zaidi, Trevor Cohn, and Gholamreza
Haffari. 2020. Decoding as dynamic pro-
gramming for recurrent autoregressive models.
In International Conference on Learning
Representations.

Yuhui Zhang, Chenghao Yang, Zhengping Zhou, and Zhiyuan Liu. 2020. Enhancing transformer with sememe knowledge. In Proceedings of the 5th Workshop on Representation Learning for NLP, pages 177–184, Online. Association for Computational Linguistics.

Chujie Zheng, Minlie Huang, and Aixin Sun.
2019. ChID: A large-scale Chinese IDiom
dataset for cloze test. In Proceedings of the 57th
Annual Meeting of the Association for Compu-
tational Linguistics, pages 778–787. https://
doi.org/10.18653/v1/P19-1075

Jianing Zhou and Suma Bhat. 2021. Paraphrase generation: A survey of the state of the art. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5075–5086. https://doi.org/10.18653/v1/2021.emnlp-main.414

Wanrong Zhu, Zhiting Hu, and Eric P. Xing.
2019. Text infilling. CoRR, abs/1901.00158.
https://doi.org/10.48550/arXiv.1901
.00158
