Meta-Learning a Cross-lingual Manifold for Semantic Parsing

Tom Sherborne and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, UK

tom.sherborne@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

Localizing a semantic parser to support new
languages requires effective cross-lingual gen-
eralization. Recent work has found success
with machine-translation or zero-shot meth-
ods, although these approaches can struggle
to model how native speakers ask questions.
We consider how to effectively leverage min-
imal annotated examples in new languages
for few-shot cross-lingual semantic parsing.
We introduce a first-order meta-learning algo-
rithm to train a semantic parser with maximal
sample efficiency during cross-lingual trans-
fer. Our algorithm uses high-resource lan-
guages to train the parser and simultaneously
optimizes for cross-lingual generalization to
lower-resource languages. Results across six
languages on ATIS demonstrate that our
combination of generalization steps yields
accurate semantic parsers sampling ≤10% of
source training data in each new language.
Our approach also trains a competitive model
on Spider using English with generalization
to Chinese similarly sampling ≤10% of train-
ing data.1

1 Introduction

A semantic parser maps natural language (NL)
utterances to logical forms (LF) or executable pro-
grams in some machine-readable language (e.g.,
SQL). Recent improvement in the capability of
semantic parsers has focused on domain transfer
within English (Su and Yan, 2017; Suhr et al.,
2020), compositional generalization (Yin and
Neubig, 2017; Herzig and Berant, 2021; Scholak
et al., 2021), and, more recently, cross-lingual
methods (Duong et al., 2017; Susanto and Lu,
2017b; Richardson et al., 2018).

Within cross-lingual semantic parsing, there
has been an effort to bootstrap parsers with min-

1 Our code and data are available at github.com/tomsherborne/xgr.


imal data to avoid the cost and labor required
to support new languages. Recent proposals in-
clude using machine translation to approximate
training data for supervised learning (Moradshahi
et al., 2020; Sherborne et al., 2020; Nicosia et al.,
2021) and zero-shot models, which engineer cross-
lingual similarity with auxiliary losses (van der
Goot et al., 2021; Yang et al., 2021; Sherborne
and Lapata, 2022). These shortcuts bypass costly
data annotation but present limitations such as
‘‘translationese’’ artifacts from machine transla-
tion (Koppel and Ordan, 2011) or undesirable
domain shift (Sherborne and Lapata, 2022). How-
ever, annotating a minimally sized data sample
can potentially overcome these limitations while
incurring significantly reduced costs compared
to full dataset translation (Garrette and Baldridge,
2013).

We argue that a few-shot approach is more
realistic for an engineer motivated to support ad-
ditional languages for a database—as one can
rapidly retrieve a high-quality sample of transla-
tions and combine these with existing supported
languages (i.e., English). Beyond semantic pars-
ing, cross-lingual few-shot approaches have also
succeeded at leveraging a small number of anno-
tations within a variety of tasks (Zhao et al., 2021,
inter alia) including natural language inference,
paraphrase identification, part-of-speech-tagging,
and named-entity recognition. Recently, the ap-
plication of meta-learning to domain generali-
zation has further demonstrated capability for
models to adapt to new domains with small sam-
ples (Gu et al., 2018; Li et al., 2018; Wang et al.,
2020b).

In this work, we synthesize these directions
into a meta-learning algorithm for cross-lingual
semantic parsing. Our approach explicitly opti-
mizes for cross-lingual generalization using fewer
training samples per new language without per-
formance degradation. We also require minimal

Transactions of the Association for Computational Linguistics, vol. 11, pp. 49–67, 2023. https://doi.org/10.1162/tacl_a_00533
Action Editor: Wei Lu. Submission batch: 7/2022; Revision batch: 9/2022; Published 1/2023.
© 2023 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

computational overhead beyond standard gradient-
descent training and no external dependencies be-
yond in-task data and a pre-trained encoder. Our
algorithm, Cross-Lingual Generalization Reptile
(XG-REPTILE), unifies two-stage meta-learning into
a single process and outperforms prior and con-
stituent methods on all languages, given identical
data constraints. The proposed algorithm is still
model-agnostic and applicable to more tasks re-
quiring sample-efficient cross-lingual transfer.

Our innovation is the combination of both
intra-task and inter-language steps to jointly learn
the parsing task and optimal cross-lingual trans-
fer. Specifically, we interleave learning the overall
task from a high-resource language and learn-
ing cross-lingual transfer from a minimal sample
of a lower-resource language. Results on ATIS
(Hemphill et al., 1990) in six languages (English,
French, Portuguese, Spanish, German, Chinese)
and Spider (Yu et al., 2018) in two languages (En-
glish, Chinese) demonstrate our proposal works
in both single- and cross-domain environments.
Our contributions are as follows:

• We introduce XG-REPTILE, a first-order
meta-learning algorithm for cross-lingual
generalization. XG-REPTILE approximates an
optimal manifold using support languages
with cross-lingual regularization using tar-
get languages to train for explicit cross-
lingual similarity.

• We showcase sample-efficient cross-lingual
transfer within two challenging semantic
parsing datasets across multiple languages.
Our approach yields more accurate parsing
in a few-shot scenario and demands 10×
fewer samples than prior methods.

• We establish a cross-domain and cross-
lingual parser obtaining promising results for
both Spider in English (Yu et al., 2018) and
CSpider in Chinese (Min et al., 2019).

2 Related Work

Meta-Learning for Generalization Meta-
Learning2 has recently emerged as a promising
technique for generalization, delivering high
performance on unseen domains by learning to
learn, that is, improving learning over multiple
episodes (Hospedales et al., 2022; Wang et al.,
2021b). A popular approach is Model-Agnostic
Meta-Learning (Finn et al., 2017, MAML),
wherein the goal is to train a model on a variety
of learning tasks, such that it can solve new tasks
using a small number of training samples. In ef-
fect, MAML facilitates task-specific fine-tuning
using few samples in a two-stage process. MAML
requires computing higher-order gradients (i.e.,
‘‘gradient through a gradient’’), which can often
be prohibitively expensive for complex models.
This limitation has motivated first-order ap-
proaches to MAML which offer similar perfor-
mance with improved computational efficiency.

2We refer the interested reader to Wang et al. (2020b),
Hospedales et al. (2022), and Wang et al. (2021b) for more
extensive surveys on meta-learning.

In this vein, the Reptile algorithm (Nichol
et al., 2018) transforms the higher-order gradi-
ent approach into K successive first-order steps.
Reptile-based training approximates a solution
manifold across tasks (i.e., a high-density pa-
rameter sub-region biased for strong cross-task
likelihood), similarly followed by rapid
fine-tuning. By learning an optimal initializa-
tion, meta-learning proves useful for low-resource
adaptation by minimizing the data required
for out-of-domain tuning on new tasks. Kedia
et al. (2021) also demonstrate the utility of Reptile
to improve single-task performance. We build on
this to examine single-task cross-lingual transfer
using the manifold learned with Reptile.

Meta-Learning for Semantic Parsing A va-
riety of NLP applications have adopted meta-
learning in zero- and few-shot learning scenarios
as a method of explicitly training for general-
ization (Lee et al., 2021; Hedderich et al., 2021).
Within semantic parsing, there has been increasing
interest in cross-database generalization, moti-
vated by datasets such as Spider (Yu et al., 2018)
requiring navigation of unseen databases (Herzig
and Berant, 2017; Suhr et al., 2020).

Approaches to generalization have included
simulating source and target domains (Givoli and
Reichart, 2019) and synthesizing new training data
based on unseen databases (Zhong et al., 2020;
Xu et al., 2020a). Meta-learning has demonstrated
fast adaptation to new data within a monolin-
gual low-resource setting (Huang et al., 2018; Guo
et al., 2019; Lee et al., 2019; Sun et al., 2020).
Similarly, Chen et al. (2020) utilize Reptile to
improve generalization of a model, trained on
source domains, to fine-tune on new domains.


Our work builds on Wang et al. (2021a), who
explicitly promote monolingual cross-domain
generalization by ‘‘meta-generalizing’’ across dis-
joint, domain-specific batches during training.

Cross-lingual Semantic Parsing A surge of in-
terest in cross-lingual NLU has seen the creation of
many benchmarks across a breadth of languages
(Conneau et al., 2018; Hu et al., 2020; Liang
et al., 2020), thereby motivating significant ex-
ploration of cross-lingual transfer (Nooralahzadeh
et al., 2020; Xia et al., 2021; Xu et al., 2021;
Zhao et al., 2021, inter alia). Previous approaches
to cross-lingual semantic parsing assume parallel
multilingual training data (Jie and Lu, 2014) E
exploit multi-language inputs for training without
resource constraints (Susanto and Lu, 2017a,b).

There has been recent interest in evaluating if
machine translation is an economic proxy for cre-
ating training data in new languages (Sherborne
et al., 2020; Moradshahi et al., 2020). Zero-shot
approaches to cross-lingual parsing have also
been explored using auxiliary training objectives
(Yang et al., 2021; Sherborne and Lapata, 2022).
Cross-lingual learning has also been gaining
traction in the adjacent field of spoken-language
understanding (SLU). For datasets such as Multi-
ATIS (Upadhyay et al., 2018), MultiATIS++ (Xu
et al., 2020b), and MTOP (Li et al., 2021),
zero-shot cross-lingual transfer has been studied
through specialized decoding methods (Zhu et al.,
2020), machine translation (Nicosia et al., 2021),
and auxiliary objectives (van der Goot et al.,
2021).

Cross-lingual semantic parsing has mostly
remained orthogonal to the cross-database gen-
eralization challenges raised by datasets such as
Spider (Yu et al., 2018). While we primarily
present findings for multilingual ATIS into SQL
(Hemphill et al., 1990), we also train a parser on
both Spider and its Chinese version (Min et al.,
2019). To the best of our knowledge, we are
the first to explore a multilingual approach to
this cross-database benchmark. We use Reptile to
learn the overall task and leverage domain gener-
alization techniques (Li et al., 2018; Wang et al.,
2021a) for sample-efficient cross-lingual transfer.

3 Problem Definition

Semantic Parsing We wish to learn a param-
eterized parsing function, pθ, which maps a
natural language utterance and a relational data-
base context to an executable program expressed
in a logical form (LF) language:

P = pθ (Q, D)    (1)

As formalized in Equation (1), we learn pa-
rameters, θ, using paired data (Q, P, D) where
P is the logical form equivalent of natural lan-
guage question Q. In this work, our LFs are all
executable SQL queries and therefore grounded
in a database D. A single-domain dataset refer-
ences only one database D for all (Q, P ), whereas
a multi-domain dataset demands reasoning about
unseen databases to generalize to new queries.
This is expressed as a ‘zero-shot’ problem if the
databases at test time, Dtest, were unseen during
training. This challenge demands a parser capa-
ble of domain generalization beyond observed
databases. This is in addition to the structured
prediction challenge of semantic parsing.
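To make the data format concrete, a single training triple (Q, P, D) might be represented as below. The class, field names, and the example query are our own illustration, not the datasets' actual schema:

```python
from dataclasses import dataclass

@dataclass
class ParsingExample:
    question: str  # natural language utterance Q
    program: str   # logical form P, here an executable SQL query
    database: str  # identifier of the grounding database D

# Hypothetical ATIS-style pair: a single-domain dataset reuses one
# database for every example; multi-domain datasets (e.g., Spider) vary it.
example = ParsingExample(
    question="list flights from denver to boston",
    program="SELECT flight_id FROM flight "
            "WHERE from_airport = 'DENVER' AND to_airport = 'BOSTON'",
    database="atis",
)
```

A multi-domain dataset would populate `database` with many distinct values, and unseen values at test time make the task zero-shot with respect to D.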

Cross-Lingual Generalization Prototypical se-
mantic parsing datasets express the question, Q, In
English only. As discussed in Section 1, our parser
should be capable of mapping from additional
languages to well-formed, executable programs.
Tuttavia, prohibitive expense limits us from
reproducing a monolingual model for each addi-
tional language and previous work demonstrates
accuracy improvement by training multilingual
models (Jie and Lu, 2014). In aggiunta a
challenges of structured prediction and domain
generalization, we jointly consider cross-lingual
generalization. Training primarily relies on exist-
ing English data (i.e., QEN samples) and we show
that our meta-learning algorithm in Section 4
leverages a small sample of training data in new
languages for accurate parsing. We express this
sample, Sl, for some language, l, as:

Sl = {(Ql, P, D)i}_{i=0}^{Nl}    (2)

where Nl is the sample size from l, assumed
to be smaller than the original English dataset
(i.e., Nl ≪ NEN). Where available, we extend
this paradigm to develop models for L different
languages simultaneously in a multilingual setup
by combining samples as:


SL = {Sl1, Sl2, . . . , SlL}    (3)
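The disjoint, subtractive sampling used to build SL (detailed in Section 5.2) can be sketched as follows. The function name, the parallel-list data layout, and the fixed sampling rate are our illustrative assumptions:

```python
import random

def sample_targets(english, targets, rate=0.05, seed=13):
    """Draw a disjoint few-shot sample S_l per target language.

    `english` is a list of (Q, P, D) triples; `targets` maps a language
    code to a list parallel to `english` (same logical forms, translated
    questions). Sampled indices form each S_l and are removed from the
    English support pool, so support and target sets stay disjoint.
    """
    rng = random.Random(seed)
    n_sample = max(1, int(rate * len(english)))
    picked = set(rng.sample(range(len(english)), n_sample))
    s_l = {lang: [rows[i] for i in sorted(picked)]
           for lang, rows in targets.items()}
    support = [ex for i, ex in enumerate(english) if i not in picked]
    return support, s_l
```

Because the same indices are drawn for every target language, a 5% target sample leaves 95% of the English data as support, matching the subtractive splits used in our experiments (Section 5.2).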


We can express cross-lingual generalization as:

pθ (P | Ql, D) → pθ (P | QEN, D)    (4)

where pθ (P | QEN, D) is the predicted distri-
bution over all possible output SQL sequences
conditioned on an English question, QEN, and a
database D. Our goal is for the prediction from a
new language, Ql, to converge towards this ex-
isting distribution using the same parameters θ,
constrained to fewer samples in l than English.

We aim to maximize the accuracy of predict-
ing programs on unseen test data from each
non-English language l. The key challenge is
learning a performant distribution over each new
language with minimal available samples. This
includes learning to incorporate each l into the
parsing task and modeling the language-specific
surface form of questions. Our setup is akin to
few-shot learning; however, the number of ex-
amples needed for satisfactory performance is an
empirical question. We are searching for both
minimal sample sizes and maximal sampling ef-
ficiency. We discuss our sampling strategy in
Section 5.2 with results at multiple sizes of SL
in Section 6.

4 Methodology

We combine two meta-learning techniques for
cross-lingual semantic parsing. The first is the
Reptile algorithm outlined in Section 2. Reptile
optimizes for dense likelihood regions within the
parameters (i.e., a solution manifold) through pro-
moting inter-batch generalization (Nichol et al.,
2018). Standard Reptile iteratively optimizes the
manifold for an improved initialization across
objectives. Rapid fine-tuning yields the final
task-specific model. The second technique is the
first-order approximation of DG-MAML (Li et al.,
2018; Wang et al., 2021a). This single-stage pro-
cess optimizes for domain generalization by
simulating ‘‘source’’ and ‘‘target’’ batches from
different domains to explicitly optimize for cross-
batch generalization. Our algorithm, XG-REPTILE,
combines these paradigms to optimize a target
loss with the overall learning ‘‘direction’’ de-
rived as the optimal manifold learned via Rep-
tile. This trains an accurate parser demonstrating
sample-efficient cross-lingual transfer within an
efficient single-stage learning process.
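The overall loop can be sketched on a toy objective. The sketch below collapses the parser to a linear least-squares model, substitutes plain gradient steps for the Adam inner optimizer, and uses names of our own choosing, so it illustrates the structure of one XG-REPTILE episode rather than the paper's implementation:

```python
import numpy as np

def batch_grad(phi, batch):
    """Gradient of a least-squares loss, standing in for the parser loss."""
    X, y = batch
    return 2.0 * X.T @ (X @ phi - y) / len(y)

def xg_reptile_episode(theta, support_batches, target_batch,
                       alpha=0.05, beta=0.5):
    """One episode: K inner steps on support (source-language) batches,
    then combine the Reptile macro step with a target-language gradient."""
    phi = theta.copy()                           # phi_1 <- theta_{t-1}
    for batch in support_batches:                # inner loop, K steps
        phi = phi - alpha * batch_grad(phi, batch)
    macro_step = phi - theta                     # displacement phi_K - phi_1
    target_grad = batch_grad(phi, target_batch)  # gradient of L_T at phi_K
    # Follow the manifold direction and descend the target loss together.
    return theta + beta * (macro_step - alpha * target_grad)
```

Iterating this episode over batches drawn from a support language, with one target batch per episode, drives the loss down on both; in the real model `batch_grad` is backpropagation through the seq2seq parser.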

Figure 1: One iteration of XG-REPTILE. (1) Run K
iterations of gradient descent over K support batches to
learn φK, (2) compute ∇macro, the difference between
φK and φ1, (3) find the loss on the target batch using
φK, E (4) compute the final gradient update from
∇macro and the target loss.

4.1 The XG-REPTILE Algorithm

Each learning episode of XG-REPTILE comprises
two component steps: intra-task learning and
inter-language generalization to jointly learn pars-
ing and cross-lingual transfer. Alternating these
processes trains a competitive parser from multi-
ple languages with low computational overhead
beyond existing gradient-descent training. Our
approach combines the typical two stages of
meta-learning to produce a single model without
a fine-tuning requirement.

Task Learning Step We first sample from the
high-resource language (i.e., SEN) K ‘‘support’’
batches of examples, BS = {(QEN, P, D)}. For
each of K batches: We compute predictions,
compute losses, calculate gradients and adjust pa-
rameters using some optimizer (see illustration in
Figura 1). After K successive optimization steps
the initial weights in this episode, φ1, have been
optimized to φK. The difference between final
and initial weights is calculated as:

∇macro = φK − φ1    (5)

This ‘‘macro-gradient’’ step is equivalent to a
Reptile step (Nichol et al., 2018), representing
learning a solution manifold as an approximation
of the overall learning trajectory.
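For plain SGD inner steps, the macro-gradient telescopes exactly into −α times the sum of the per-step gradients, a fact the gradient analysis below relies on. A quick numerical check of this identity on a toy quadratic (all names ours):

```python
import numpy as np

# Check: with SGD inner steps phi_k = phi_{k-1} - alpha * g_k, the
# macro-gradient phi_K - phi_1 equals -alpha * sum_k g_k exactly.
alpha, K = 0.1, 10
phi = np.array([3.0, -1.0])          # phi_1
start = phi.copy()
grads = []
for _ in range(K):
    g = 2.0 * phi                    # gradient of f(phi) = ||phi||^2
    grads.append(g)
    phi = phi - alpha * g            # one inner SGD step
macro = phi - start                  # macro-gradient phi_K - phi_1
assert np.allclose(macro, -alpha * np.sum(grads, axis=0))
```

With Adam in the inner loop, as in Algorithm 1, the identity holds only approximately, since per-step preconditioning rescales each gradient.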

Algorithm 1 XG-REPTILE
Require: Support data, SEN; target data, SL
Require: Inner learning rate, α; outer learning rate, β
1: Initialise θ1, the vector of initial parameters
2: for t ← 1 to T do
3:    Copy φ1 ← θt−1
4:    Sample K support batches {BS}K_{k=1} from SEN
5:    Sample target language l from L languages
6:    Sample target batch BT from Sl
7:    for k ← 1 to K [Inner Loop] do
8:        LS_k ← Forward(BS_k, φ_{k−1})
9:        φ_k ← Adam(φ_{k−1}, ∇LS_k, α)
10:   end for
11:   Macro grad: ∇macro ← φK − φ1
12:   Target Step: LT ← Forward(BT, φK)
13:   Total gradient: ∇Σ = ∇macro + ∇_{φK} LT
14:   Update: θt ← SGD(θt−1, ∇Σ, β)
15: end for
1
K proportionality requires 10% of target-language
data relative to support. We demonstrate in
Sezione 6 that we can use a smaller < 1 K quantity per target language to increase sample efficiency. Gradient Analysis Following Nichol et al. (2018), we express gk = ∇LS k , the gradient in a single step of the inner loop (Line 9), as: gk = ¯gi + ¯Hk (φk − φ1) + O (cid:3) (cid:2) α2 (7) We use a Taylor series expansion to approximate gk by ¯gk, the gradient at the original point, φ1, the Hessian matrix of the gradient at the initial point, ¯Hk, the step difference between position φk and the initial position and some scalar terms with marginal influence, O α2 (cid:2) (cid:3) . Cross-Lingual Step The second step samples one ‘‘target’’ batch, BT = (Ql, P, D), from a sampled target language (i.e., Sl ⊂ SL). We com- pute the cross-entropy loss and gradients from the prediction of the model at φK on BT : LT = Loss (pφK (Ql, D) , P ) (6) We evaluate the parser at φK on a target lan- guage we desire to generalize to. We show below that the gradient of LT comprises the loss at φK and additional terms maximizing the inner product between the high-likelihood manifold and the tar- get loss. The total gradient encourages intra-task and cross-lingual learning (see Figure 1). Algorithm 1 outlines the XG-REPTILE process (loss calculation and batch processing are simpli- fied for brevity). We repeat this process over T episodes to train model pθ to convergence. If we optimized for target data to align with individual support batches (i.e., K = 1) then we may observe batch-level noise in cross-lingual generalization. Our intuition is that aligning the target gradient with an approximation of the task manifold, i.e., ∇macro, will overcome this noise and align new languages to a more mutually beneficial direction during training. We observe this intuitive behavior during learning in Section 6. 
We efficiently generalize to low-resource languages by exploiting the asymmetric data re- quirements between steps: One batch of the target language is required for K batches of the source language. For example, if K = 10 then using this By evaluating Equation (7) at i = 1 and rewrit- ing the difference as a sum of gradient steps (e.g., Equations (8) and (9)), we arrive at an expression for gk shown in Equation (10) expressing the gra- dient as an initial component, ˆgk, and the product of the Hessian at k, with all prior gradient steps. We refer to Nichol et al. (2018) for further vali- dation that the gradient of this product maximizes the cross-batch expectation—therefore promoting cross-batch generalization and towards the solu- tion manifold. The final gradient (Equation (11)) is the accumulation over gk steps and is equivalent to Equation (5). ∇macro comprises both gradi- ents of K steps and additional terms maximiz- ing the inner-product of cross-batch gradients. Use gj = ¯gj + O(α) φk − φ1 = −α k−1(cid:4) j=1 gj gk = ¯gi − α ¯Hi (cid:3) (cid:2) α2 ¯gj + O k−1(cid:4) j=1 ∇macro = K(cid:4) k=1 gk (8) (9) (10) (11) We can similarly express the gradient of the target batch as Equation (12) where the term, ¯HT ∇macro, is the cross-lingual generalization product similar to the intra-task generalization seen above. gT = ¯gT − α ¯HT ∇macro + O (cid:3) (cid:2) α2 (12) 53 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 3 3 2 0 6 7 8 7 3 / / t l a c _ a _ 0 0 5 3 3 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Equation (13) shows an example final gradi- ent when K = 2. Within the parentheses are the cross-batch and cross-lingual gradient prod- ucts as components promoting fast learning across multiple axes of generalization. 
∇Σ = g1 + g2 + gT = ¯g1 + ¯g2 + ¯gT − α (cid:2) ¯H2¯g1 + ¯HT [¯g1 + ¯g2] (13) (cid:3) (cid:3) (cid:2) α2 + O The key hyperparameter in XG-REPTILE is the number of inner-loop steps K representing a trade-off between manifold approximation and target step frequency. At small K, the manifold approximation may be poor, leading to sub- optimal learning. At large K, then improved mani- fold approximation incurs fewer target batch steps per epoch, leading to weaked cross-lingual trans- fer. In practice, K is set empirically, and Section 6 identifies an optimal region for our task. XG-REPTILE can be viewed as generalizing two existing algorithms. Without the LT loss, our approach is equivalent to Reptile and lacks cross-lingual alignment. If K = 1, then XG- to DG-FMAML (Wang REPTILE is equivalent et al., 2021a) but lacks generalization across support batches. Our unification of these al- gorithms represent the best of both approaches and outperforms both techniques within semantic parsing. Another perspective is that XG-REPTILE learns a regularized manifold, with immediate cross-lingual capability, as opposed to standard Reptile, which requires fine-tuning to transfer across tasks. We identify how this contrast in transfer in approaches influences cross-lingual Section 6. 5 Experimental Design We evaluate XG-REPTILE against several com- parison systems across multiple languages. Where possible, we re-implement existing models and use identical data splits to isolate the contribu- tion of our training algorithm. training pairs with 493 and 448 examples for validation and testing, respectively. We report performance as execution accuracy to test if pre- dicted SQL queries can retrieve accurate data- base results. We also evaluate on Spider (Yu et al., 2018), combining English and Chinese (Min et al., 2019, CSpider) versions as a cross-lingual task. The latter translates all questions to Chinese but re- tains the English database. 
Spider is significantly more challenging; it contains 10,181 questions and 5,693 unique SQL queries for 200 multi-table databases over 138 domains. We use the same split as Wang et al. (2021a) to measure general- ization to unseen databases/table-schema during testing. This split uses 8,659 examples from 146 databases for training and 1,034 examples from 20 databases for validation. The test set contains 2,147 examples from 40 held-out databases and is held privately by the authors. To our knowl- edge, we report the first multilingual approach for Spider by training one model for English and Chinese. Our challenge is now multi-dimensional, requiring cross-lingual and cross-domain gener- alization. Following Yu et al. (2018), we report exact set match accuracy for evaluation. 5.2 Sampling for Generalization Training for cross-lingual generalization often uses parallel samples across languages. We illus- trate this in Equation (14), where y1 is the equiv- alent output for inputs, x1, in each language: EN : (x1, y1) DE : (x1, y1) ZH : (x1, y1) (14) However, high sample overlap risks trivializing the task because models are not learning from new pairs, but instead matching only new inputs to known outputs. A preferable evaluation will test composition of novel outputs from unseen inputs: EN : (x1, y1) DE : (x2, y2) ZH : (x2, y2) (15) 5.1 Data We report results on two semantic parsing data- sets. First on ATIS (Hemphill et al., 1990), us- ing the multilingual version from Sherborne and Lapata (2022) pairing utterances in six languages (English, French, Portuguese, Spanish, German, Chinese) to SQL queries. ATIS is split into 4,473 Equation (15) samples exclusive, disjoint datasets for English and target languages during training. In other words, this process is subtractive—for example, a 5% sample of German (or Chinese) target data leaves 95% of data as the English support. This is similar to K-fold cross-validation used to evaluate across many data splits. 
We 54 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 3 3 2 0 6 7 8 7 3 / / t l a c _ a _ 0 0 5 3 3 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 sample data for our experiments with Equa- tion (15). It is also possible to use Equation (16), where target samples are also disjoint, but we find this setup results in too few English exam- ples for effective learning. EN : (x1, y1) DE : (x2, y2) ZH : (x3, y3) (16) 5.3 Semantic Parsing Models We use a Transformer encoder-decoder model similar to Sherborne and Lapata (2022) for our ATIS experiments. We use the same mBART50 encoder (Tang et al., 2021) and train a Trans- former decoder from scratch to generate SQL. For Spider, we use the RAT-SQL model (Wang et al., 2020a), which has formed the basis of many performant submissions to the Spider leaderboard. RAT-SQL can successfully reason about unseen databases and table schema using a novel schema-linking approach within the encoder. We use the version from Wang et al. (2021a) with mBERT (Devlin et al., 2019) input embeddings for a unified model between English and Chinese inputs. Notably, RAT-SQL can be over-reliant on lexical similarity features between input questions and tables (Wang et al., 2020a). This raises the challenge of generaliz- ing to Chinese where such overlap is null. For fair comparison, we implement identical models as prior work on each dataset and only evaluate the change in training algorithm. This is why we use an mBART50 encoder component for ATIS experiments and different mBERT input embed- dings for Spider experiments. 5.4 Comparison Systems Translate-Test A monolingual Transformer is trained on source English data (SEN). Ma- chine translation is used to translate test data from additional languages into English. Logical forms are predicted from translated data using the English model. 
Translate-Train Machine translation is used to translate English training data into each tar- get language. A monolingual Transformer is trained on translated training data and logical forms are predicted using this model. Train-EN∪All A Transformer is trained on English data and samples from all target languages together in a single stage (i.e., SEN ∪SL). This is superior to training without English (e.g., on SL only); we contrast to this approach for more competitive comparison. TrainEN→FT-All We first train on English sup- port data, SEN, and then fine-tune on target samples, SL. Reptile-EN→FT-All Initial training uses Rep- tile (Nichol et al., 2018) on English support data, SEN, followed by fine-tuning on tar- get samples, SL. This is a typical usage of Reptile for training a low-resource multi- domain parser (Chen et al., 2020). We also compare to DG-FMAML (Wang et al., 2021a) as a special case of XG-REPTILE when K = 1. Additionally, we omit pairwise versions of XG-REPTILE (e.g., separate models general- izing from English to individual languages). These approaches demand more computation and demonstrated no significant improvement over a multi-language approach. All Machine Transla- tion uses Google Translate (Wu et al., 2016). We compare our algorithm against several strong baselines and adjacent training methods including: 5.5 Training Configuration Monolingual Training A monolingual Trans- former is trained on gold-standard pro- fessionally translated data for each new language. This is a monolingual upper bound without few-shot constraints. Multilingual Training A multilingual Trans- former is trained on the union of all data from the ‘‘Monolingual Training’’ method. This ideal upper bound uses all data in all languages without few-shot constraints. Experiments focus on the expansion from English to additional languages, where we use English as the ‘‘support’’ language and additional languages as ‘‘target’’. Key hyperparameters are outlined in Table 1. 
We train each model using the given opti- mizers with early stopping where model selection is through minimal validation loss for combined support and target languages. Input utterances are tokenized using SentencePiece (Kudo and Richardson, 2018) and Stanza (Qi et al., 2020) for ATIS and Spider, respectively. All experi- ments are implemented in PyTorch on a single 55 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 3 3 2 0 6 7 8 7 3 / / t l a c _ a _ 0 0 5 3 3 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 ATIS Spider 16 10 SGD 1 × 10−4 Batch Size Inner Optimizer Inner LR Outer Optimizer Adam (Kingma and Ba, 2015) 1 × 10−3 Outer LR Optimum K 10 Max Train Steps Training Time 5 × 10−4 3 12 hours 2.5 days 20,000 Table 1: Experimental hyperparameters for XG- REPTILE on ATIS and Spider set primarily by replicating prior work. V100 GPU. We report key results for ATIS aver- aged over three seeds and five random data splits. For Spider, we submit the best singular model from five random splits to the leaderboard. 6 Results and Analysis We contrast XG-REPTILE to baselines for ATIS in Table 2 and present further analysis within Figure 2. Results for the multi-domain Spider are shown in Table 3. Our findings support our hypothesis that XG-REPTILE is a superior algo- rithm for jointly training a semantic parser and encouraging cross-lingual generalization with im- proved sample efficiency. Given the same data, XG-REPTILE produces more mutually beneficial parameters for both model requirements with only modifications to the training loop. Comparison across Generalization Strategies We compare XG-REPTILE to established learn- ing algorithms in Table 2. Across baselines, we find that single-stage training, that is, Train- EN∪All or machine-translation based mod- els, perform below two-stage approaches. 
The strongest competitor is the Reptile-EN→FT-All model, highlighting the effectiveness of Reptile for single-task generalization (Kedia et al., 2021). However, XG-REPTILE performs above all baselines across sample rates. Practically, 1%, 5%, and 10% correspond to 45, 225, and 450 example pairs, respectively. We identify significant improvements (p < 0.01 relative to the closest model, using an independent t-test) in cross-lingual transfer through jointly learning to parse and multi-language generalization while maintaining single-stage training efficiency.

Compared to the upper bounds, XG-REPTILE performs above Monolingual Training at ≥ 1% sampling, which further supports the prior benefit of multilingual modeling (Susanto and Lu, 2017a). Multilingual Training is only marginally stronger than XG-REPTILE at 1% and 5% sampling despite requiring many more examples. XG-REPTILE@10% improves on this model by an average +1.3%. Considering that our upper bound uses 10× the data of XG-REPTILE@10%, this accuracy gain highlights the benefit of explicit cross-lingual generalization. This is consistent at higher sample sizes (see Figure 2(c) for German).

At the smallest sample size, XG-REPTILE@1% demonstrates a +12.4% and +13.2% improvement relative to Translate-Train and Translate-Test. Machine translation is often viable for cross-lingual transfer (Conneau et al., 2018). However, we find that mistranslation of named entities incurs an exaggerated parsing penalty, leading to inaccurate logical forms (Sherborne et al., 2020). This suggests that sample quality has an outsized influence on semantic parsing performance. When training XG-REPTILE with MT data, we also observe a lower target-language average of 66.9%. This contrast further supports the importance of sample quality in our context. XG-REPTILE improves cross-lingual generalization across all languages at equivalent and lower sample sizes.
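The absolute counts quoted for each sampling rate follow from a single line of arithmetic. The ~4,500-pair English support size below is inferred here from the quoted counts rather than stated in this section:

```python
# Target-language sample counts at each sampling rate, relative to the
# English ATIS support set (~4,500 training pairs; approximate figure
# inferred from the counts quoted in the text).
support_size = 4500
counts = {rate: round(support_size * rate) for rate in (0.01, 0.05, 0.10)}
print(counts)  # {0.01: 45, 0.05: 225, 0.1: 450}
```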
At 1%, it improves by an average +15.7% over the closest model, Reptile-EN→FT-All. Similarly, at 5%, we find a +9.8% gain, and at 10%, we find +8.9% relative to the closest competitor. Contrasting across sample sizes, our best approach is @10%; however, this is only +3.5% above @1%, suggesting that smaller samples could be sufficient if 10% sampling is unattainable. This relative stability is an improvement compared to the 17.7%, 11.2%, or 10.3% difference between @1% and @10% for other models. This implies that XG-REPTILE better utilizes smaller samples than adjacent methods.

Across languages at 1%, XG-REPTILE improves primarily for languages dissimilar to English (Ahmad et al., 2019), better minimizing the cross-lingual transfer gap. For Chinese (ZH), we see that XG-REPTILE@1% is +26.4% above the closest baseline. This contrasts with the smallest gain, +8.5% for German, which has greater similarity to English. Our improvement also yields less variability across target languages: the standard
| Method | EN | FR | PT | ES | DE | ZH | Target Avg |
|---|---|---|---|---|---|---|---|
| ZX-PARSE (Sherborne and Lapata, 2022) | 76.9 | 70.2 | 63.4 | 59.7 | 69.3 | 60.2 | 64.6 ± 5.0 |
| Monolingual Training | 77.2 | 67.8 | 66.1 | 64.1 | 66.6 | 64.9 | 65.9 ± 1.4 |
| Multilingual Training | 73.9 | 72.5 | 73.1 | 70.4 | 72.0 | 70.5 | 71.7 ± 1.2 |
| Translate-Train | — | 55.9 | 56.1 | 57.1 | 60.1 | 56.1 | 57.1 ± 1.8 |
| Translate-Test | — | 58.2 | 57.3 | 57.9 | 56.9 | 51.4 | 56.3 ± 2.8 |
| @1% Train-EN∪All | 69.7 ± 1.4 | 44.0 ± 3.5 | 42.2 ± 3.7 | 38.3 ± 6.8 | 45.8 ± 2.6 | 41.7 ± 3.6 | 42.4 ± 2.8 |
| @1% Train-EN→FT-All | 71.2 ± 2.3 | 53.3 ± 5.2 | 49.7 ± 5.4 | 56.1 ± 2.7 | 52.5 ± 6.7 | 39.0 ± 4.0 | 50.1 ± 6.6 |
| @1% Reptile-EN→FT-All | 73.2 ± 0.7 | 58.9 ± 4.8 | 54.8 ± 3.4 | 52.8 ± 4.4 | 60.6 ± 3.6 | 41.7 ± 4.0 | 53.8 ± 7.4 |
| @1% XG-REPTILE | 73.8 ± 0.3 | 70.4 ± 1.8 | 70.8 ± 0.7 | 68.9 ± 2.3 | 69.1 ± 1.2 | 68.1 ± 1.2 | 69.5 ± 1.1 |
| @5% Train-EN∪All | 67.3 ± 1.6 | 55.2 ± 4.5 | 54.7 ± 4.5 | 44.4 ± 4.5 | 55.8 ± 2.9 | 52.3 ± 4.3 | 52.5 ± 4.7 |
| @5% Train-EN→FT-All | 69.2 ± 1.9 | 58.9 ± 5.3 | 54.8 ± 5.4 | 52.8 ± 4.5 | 60.6 ± 6.5 | 41.7 ± 9.5 | 53.8 ± 7.4 |
| @5% Reptile-EN→FT-All | 69.5 ± 1.8 | 65.3 ± 3.8 | 61.3 ± 6.0 | 59.6 ± 2.6 | 64.9 ± 5.1 | 56.9 ± 9.2 | 61.6 ± 3.6 |
| @5% XG-REPTILE | 74.4 ± 1.3 | 73.0 ± 0.9 | 71.6 ± 1.1 | 71.6 ± 0.7 | 71.1 ± 0.6 | 69.5 ± 0.5 | 71.4 ± 1.3 |
| @10% Train-EN∪All | 65.7 ± 1.9 | 61.5 ± 1.7 | 62.1 ± 2.3 | 53.7 ± 3.2 | 62.7 ± 2.3 | 60.6 ± 2.4 | 60.1 ± 3.7 |
| @10% Train-EN→FT-All | 67.4 ± 1.9 | 63.8 ± 5.8 | 60.3 ± 5.3 | 59.6 ± 4.0 | 64.5 ± 6.5 | 58.4 ± 6.4 | 61.3 ± 2.7 |
| @10% Reptile-EN→FT-All | 72.8 ± 1.8 | 66.3 ± 4.2 | 64.6 ± 4.9 | 62.3 ± 6.4 | 66.6 ± 5.0 | 60.7 ± 3.6 | 64.1 ± 2.6 |
| @10% XG-REPTILE | 75.8 ± 1.3 | 74.2 ± 0.2 | 72.8 ± 0.6 | 72.1 ± 0.7 | 73.0 ± 0.6 | 72.8 ± 0.5 | 73.0 ± 0.8 |

Table 2: Denotation accuracy using varying learning algorithms including XG-REPTILE at 1%, 5%, and 10% sampling rates for target dataset size relative to the support dataset for ATIS. We report for English, French, Portuguese, Spanish, German, and Chinese. Target Avg reports the average denotation accuracy across non-English languages ± standard deviation across languages. For few-shot experiments, we also report the standard deviation (±) across random samples.
Best few-shot results per language are bolded.

Figure 2: Ablation experiments on ATIS: (a) accuracy against inner loop size K across languages, (b) accuracy against K for German when varying batch size, and (c) accuracy against dataset sample size relative to the support dataset from 1% to 50% for German. For (b), the K = 1 case is equivalent to DG-FMAML (Wang et al., 2021a).

deviation across languages for XG-REPTILE@1% is 1.1, compared to 2.8 for Train-EN∪All or 7.4 for Reptile-EN→FT-All.

We can also compare to ZX-PARSE, the method of Sherborne and Lapata (2022) that engineers cross-lingual latent alignment for zero-shot semantic parsing without data in target languages. With 45 samples per target language, XG-REPTILE@1% improves by an average of +4.9%. XG-REPTILE is more beneficial for distant languages: the cross-lingual transfer penalty between English and Chinese is −12.3% for ZX-PARSE compared to −5.7% in our case. While these systems are not truly comparable, given different data requirements, this contrast is practically useful for comparison between zero- and few-shot localization.

Influence of K on Performance In Figure 2(a) we study how variation in the key hyperparameter K, the size of the inner loop in Algorithm 1, i.e., the number of batches used to approximate the solution manifold, influences model performance across languages (single run at 5% sampling). When K = 1, the model learns generalization from batch-wise similarity, which is equivalent to DG-FMAML (Wang et al., 2021a).
We empirically find that increasing K beyond one benefits performance by encouraging cross-lingual generalization with the task over a single batch; it is, therefore, beneficial to align an out-of-domain example with the overall direction of training. However, as theorized in Section 4, increasing K also decreases the frequency of the outer step within an epoch, leading to poor cross-lingual transfer at high K. This trade-off yields an optimal operating regime for this hyperparameter. We use K = 10 in our experiments as the center of this region. Given this setting of K, the target sample size must be 10% of the support sample size for training in a single epoch. However, Table 2 identifies XG-REPTILE as the most capable algorithm for ''over-sampling'' smaller target samples for resource-constrained generalization.

Influence of Batch Size on Performance We consider two further case studies to analyze XG-REPTILE performance. For clarity, we focus on German; however, these trends are consistent across all target languages. Figure 2(b) examines whether the effects of cross-lingual transfer within XG-REPTILE are sensitive to batch size during training (single run at 5% sampling). A dependence between K and batch size could imply that the desired inter-task and cross-lingual generalization outlined in Equation (13) is an unrealistic, edge-case phenomenon. This is not the case: the trend of an optimal K setting is consistent across many batch sizes. This suggests that K is an independent hyperparameter requiring tuning alongside existing experimental settings.

Performance across Larger Sample Sizes We consider a wider range of target data sample sizes between 1% and 50% in Figure 2(c). We observe that baseline approaches converge to between 69.3% and 73.9% at 50% target sample size. Surprisingly, the improvement of XG-REPTILE is retained at higher sample sizes, with an accuracy of 76.5%.
The benefit of XG-REPTILE is still greatest at low sample sizes, with a +5.4% improvement at 1%; however, we maintain a +2.6% gain over the closest system at 50%. While low sampling is the most economical, the consistent benefit of XG-REPTILE suggests a promising strategy for other cross-lingual tasks.

Learning Spider and CSpider Our results on Spider and CSpider are shown in Table 3.

| Method | EN Dev | EN Test | ZH Dev | ZH Test |
|---|---|---|---|---|
| *Monolingual* | | | | |
| DG-MAML | 68.9 | 65.2 | 50.4 | 46.9 |
| DG-FMAML | 56.8 | — | 32.5 | — |
| XG-REPTILE | 63.5 | — | 48.9 | — |
| *Multilingual* | | | | |
| XG-REPTILE @1% | 56.8 | 56.5 | 47.0 | 45.6 |
| XG-REPTILE @5% | 59.6 | 58.1 | 47.3 | 45.6 |
| XG-REPTILE @10% | 59.2 | 59.7 | 48.0 | 46.0 |

Table 3: Exact set match accuracy for RAT-SQL trained on Spider (English) and CSpider (Chinese) comparing XG-REPTILE to DG-MAML and DG-FMAML (Wang et al., 2021a). We experiment with sampling between 1% and 10% of Chinese examples relative to English. Monolingual and multilingual best results are bolded.

We compare XG-REPTILE to monolingual approaches from Wang et al. (2021a) and discuss cross-lingual results when sampling between 1% and 10% of the CSpider target during training. In the monolingual setting, XG-REPTILE shows significant improvement (p < 0.01; using an independent samples t-test) compared to DG-FMAML, with +6.7% for English and +16.4% for Chinese dev accuracy. This further supports our claim that generalizing with a task manifold is superior to batch-level generalization. Our results are closer to DG-MAML (Wang et al., 2021a), a higher-order meta-learning method requiring computational resources and training times exceeding 4× the requirements of XG-REPTILE. XG-REPTILE yields accuracies −5.4% and −1.5% below DG-MAML for English and Chinese, where DG-FMAML performs much lower at −12.1% (EN) and −17.9% (ZH).
Our results suggest that XG-REPTILE is a superior first-order meta-learning algorithm rivaling prior work with greater computational demands.³

In the multilingual setting, we observe that XG-REPTILE performs competitively using as little as 1% of Chinese examples. While training with 1% and 5% sampling performs similarly, the best model sees 10% of CSpider samples during training to yield accuracy only −0.9% (test) below the monolingual DG-MAML model. While performance does not match monolingual models, the multilingual approach has additional utility in serving more users. As a zero-shot setup, predicting SQL from CSpider inputs through the model trained for English yields 7.9% validation accuracy, underscoring that cross-lingual transfer for this dataset is non-trivial.

³We compare against DG-MAML as the best publicly available model on the CSpider leaderboard at the time of writing.

Figure 3: PCA visualizations of sentence-averaged encodings for English (EN), French (FR), and Chinese (ZH) from the ATIS test set (@1% sampling from Table 2). We identify the regularized weight manifold that improves cross-lingual transfer using XG-REPTILE. We also improve in two similarity metrics averaged across languages.

Varying the target sample size demonstrates more variable effects for Spider compared to ATIS. Notably, increasing the sample size yields poorer English performance beyond the optimal XG-REPTILE@5% setting for English.
This may be a consequence of the cross-database challenge in Spider: information shared across languages may be less beneficial than for the single-domain ATIS. The least performant model for both languages is XG-REPTILE@1%. Low performance here for Chinese can be expected, but the performance for English is surprising. We suggest that this result is a consequence of ''over-sampling'' the target data, which disrupts the overall training process. That is, for 1% sampling and optimal K = 4, the target data is ''over-used'' 25× for each epoch of support data. We further observe diminishing benefits for English with additional Chinese samples. While we trained a competitive parser with minimal Chinese data, this effect could be a consequence of RAT-SQL being unable to exploit certain English-oriented learning features (e.g., lexical similarity scores). Future work could explore cross-lingual strategies to unify entity modeling for improved feature sharing.

Visualizing the Manifold Analysis of XG-REPTILE in Section 4 relies on a theoretical basis that first-order meta-learning creates a dense high-likelihood sub-region in the parameter space (i.e., an optimal manifold). Under these conditions, representations of new domains should cluster within the manifold to allow for rapid adaptation with minimal samples. This contrasts with methods without meta-learning, which provide no guarantees of representation density. However, the metrics in Tables 2 and 3 do not directly explain whether this expected effect arises. To this end, we visualize ATIS test set encoder outputs using PCA (Halko et al., 2011) in Figure 3. We contrast English (support) against French and Chinese as the most and least similar target languages. Using PCA allows for direct interpretation of low-dimensional distances across approaches.
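The projection used in this visualization can be sketched as follows. This is a minimal numpy illustration (plain SVD; the paper cites the randomized algorithm of Halko et al., 2011), and the random vectors stand in for real encoder outputs, so only the mechanics are meaningful:

```python
import numpy as np

# Sketch: reduce sentence-averaged encoder outputs to 2-D with PCA.
# Random vectors stand in for real encodings; shapes are illustrative.
rng = np.random.default_rng(1)
enc_en = rng.normal(size=(100, 256))        # 100 EN sentences, dim 256
enc_fr = rng.normal(size=(100, 256)) + 0.1  # stand-in target encodings
X = np.vstack([enc_en, enc_fr])
Xc = X - X.mean(axis=0)                     # centre before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                      # 2-D PCA coordinates
print(coords.shape)                         # (200, 2)
```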
Cross-lingual similarity is a proxy for manifold alignment, as our goal is accurate cross-lingual transfer from closely aligned representations of source and target languages (Xia et al., 2021; Sherborne and Lapata, 2022).

Analyzing Figure 3, we observe that the meta-learning methods (Reptile-EN→FT-All, XG-REPTILE) fit target languages closer to the support (English, yellow circle). In contrast, methods not utilizing meta-learning (Train-EN∪All, Train-EN→FT-All) appear less ordered, with weaker representation overlap. Encodings from XG-REPTILE are less separable across languages and densely clustered, suggesting the regularized manifold hypothesized in Section 4 ultimately yields improved cross-lingual transfer. Visualizing encodings from English in the Reptile-EN model before fine-tuning produces a similar cluster (not shown); however, the required fine-tuning results in ''spreading'', leading to less cross-lingual similarity.

We also quantitatively examine the average encoding change in Figure 3 using cosine similarity and Hausdorff distance (Patra et al., 2019) between English and each target language. Cosine similarity is measured pair-wise across parallel inputs in each language to gauge similarity between representations with equivalent SQL outputs. As a measure of mutual proximity between sets, Hausdorff distance denotes a worst-case distance between languages to measure more general ''closeness''. Under both metrics, XG-REPTILE yields the best performance, with the strongest pair-wise similarity and Hausdorff similarity. These indicators of cross-lingual similarity further support the observation that our expected behavior is legitimately occurring during training. Our findings better explain why XG-REPTILE performs above other training algorithms.
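The two metrics just described can be computed directly in numpy. Toy vectors stand in for real encoder outputs here; the functions themselves implement mean pair-wise cosine similarity over parallel (same-SQL) inputs and the symmetric Hausdorff distance between two encoding sets:

```python
import numpy as np

def pairwise_cosine(A, B):
    # Mean cosine similarity between row-aligned (parallel) encodings.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float(np.mean(np.sum(A * B, axis=1)))

def hausdorff(A, B):
    # Symmetric Hausdorff distance: worst-case nearest-neighbour gap.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(max(D.min(axis=1).max(), D.min(axis=0).max()))

rng = np.random.default_rng(0)
en = rng.normal(size=(50, 32))              # stand-in "English" encodings
fr = en + 0.05 * rng.normal(size=(50, 32))  # nearly aligned "French" set
print(pairwise_cosine(en, fr))              # close to 1.0
print(hausdorff(en, fr))                    # small worst-case distance
```

Well-aligned encodings give a pair-wise cosine near 1 and a small Hausdorff distance, matching how the metrics are read in the discussion above.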
Specifically, our results suggest that XG-REPTILE learns a regularized manifold which produces stronger cross-lingual similarity and improved parsing compared to Reptile fine-tuning of a manifold. This contrast will inform future work on cross-lingual meta-learning where XG-REPTILE can be applied.

Error Analysis We can also examine where the improved cross-lingual transfer influences parsing performance. Similar to Figure 3, we consider the results of models using 1% sampling as the worst-case performance and examine where XG-REPTILE improves on other methods on the test set (448 examples) over five languages.

Figure 4: Contrast between SQL from a French input from ATIS for Train-EN∪All and XG-REPTILE. The entities ''San José'' and ''Phoenix'' are not observed in the 1% sample of French data but are mentioned in the English support data. The Train-EN∪All approach fails to connect attributes seen in English when generating SQL from French inputs (×). Training with XG-REPTILE better leverages support data to generate accurate SQL from other languages (✓).

Accurate semantic parsing requires sophisticated entity handling to translate mentioned proper nouns from utterance to logical form. In our few-shot sampling scenario, most entities will appear in the English support data (e.g., ''Denver'' or ''American Airlines''), and some will be mentioned within the target language sample (e.g., ''Mineápolis'' or ''Nueva York'' in Spanish). These samples cannot include all possible entities; effective cross-lingual learning must ''connect'' these entities from the support data to the target language, such that these names can be parsed when predicting SQL from the target language. As shown in Figure 4, the failure to recognize entities from support data, for inference on target languages, is a critical failing of all models besides XG-REPTILE.
The improvement in cross-lingual similarity using XG-REPTILE expresses a specific improvement in entity recognition. Compared to the worst performing model, Train-EN∪All, 55% of the improvement accounts for handling entities absent from the 1% target sample but present in the 99% English support data. While XG-REPTILE can generate accurate SQL, other models are limited in expressivity and fall back on using seen entities from the 1% sample. This notably accounts for 60% of the improvement in parsing Chinese, which has minimal orthographic overlap with English, indicating that XG-REPTILE better leverages support data without reliance on token similarity. In 48% of improved parses, entity mishandling is the sole error, highlighting how limiting poor cross-lingual transfer is for our task.

Our model also improves the handling of novel modifiers (e.g., ''on a weekday'', ''round-trip'') absent from target language samples. Modifiers are often realized as additional sub-queries and filtering logic in SQL outputs. Comparing XG-REPTILE to Train-EN∪All, 33% of the improvement is related to modifier handling. Less capable systems fall back on modifiers observed from the target sample or ignore them entirely, generating inaccurate SQL.

While XG-REPTILE better links parsing knowledge from English to target languages, the problem is not solved. Outstanding errors in all languages primarily relate to query complexity, and the cross-lingual transfer gap is not closed. Furthermore, our error analysis suggests a future direction of optimal sample selection to minimize the error from interpreting unseen phenomena.

7 Conclusion

We propose XG-REPTILE, a meta-learning algorithm for few-shot cross-lingual generalization in semantic parsing.
XG-REPTILE is able to better utilize fewer samples to learn an economical multilingual semantic parser with minimal cost and improved sample efficiency. Compared to adjacent training algorithms and zero-shot approaches, we obtain more accurate and consistent logical forms across languages similar and dissimilar to English. Results on ATIS show clear benefits across many languages, and results on Spider demonstrate that XG-REPTILE is effective in a challenging cross-lingual and cross-database scenario. We focus our study on semantic parsing; however, this algorithm could be beneficial in other low-resource cross-lingual tasks. In future work we plan to examine how to better align entities in low-resource languages to further improve parsing accuracy.

Acknowledgments

We thank the action editor and anonymous reviewers for their constructive feedback. The authors also thank Nikita Moghe, Seraphina Goldfarb-Tarrant, Ondrej Bohdal, and Heather Lent for their insightful comments on earlier versions of this paper. We gratefully acknowledge the support of the UK Engineering and Physical Sciences Research Council (grants EP/L016427/1 (Sherborne) and EP/W002876/1 (Lapata)) and the European Research Council (award 681760, Lapata).

References

Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng. 2019. On difficulties of cross-lingual transfer with order differences: A case study on dependency parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2440–2452, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1253

Xilun Chen, Asish Ghoshal, Yashar Mehdad, Luke Zettlemoyer, and Sonal Gupta. 2020. Low-resource domain adaptation for compositional task-oriented semantic parsing.
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5090–5100, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1269

Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, and Mark Johnson. 2017. Multilingual semantic parsing and code-switching. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 379–389, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/K17-1038

Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 1126–1135. PMLR.

Dan Garrette and Jason Baldridge. 2013.
Learning a part-of-speech tagger from two hours of annotation. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 138–147, Atlanta, Georgia. Association for Computational Linguistics.

Ofer Givoli and Roi Reichart. 2019. Zero-shot semantic parsing for instructions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4454–4464, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1438

Jiatao Gu, Yong Wang, Yun Chen, Victor O. K. Li, and Kyunghyun Cho. 2018. Meta-learning for low-resource neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3622–3631, Brussels, Belgium. Association for Computational Linguistics.

Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, and Jian Yin. 2019. Coupling retrieval and meta-learning for context-dependent semantic parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 855–866, Florence, Italy. Association for Computational Linguistics.

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288. https://doi.org/10.1137/090771806

Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow. 2021. A survey on recent approaches for natural language processing in low-resource scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2545–2568, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.201

Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990.
The ATIS spoken language systems pilot corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24–27, 1990. https://doi.org/10.3115/116580.116613

Jonathan Herzig and Jonathan Berant. 2017. Neural semantic parsing over multiple knowledge-bases. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 623–628, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-2098

Jonathan Herzig and Jonathan Berant. 2021. Span-based semantic parsing for compositional generalization. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 908–921, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.74
In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 732–738, New Orleans, Louisiana. Association for Computa- tional Linguistics. https://doi.org/10 .18653/v1/N18-2115 Zhanming Jie and Wei Lu. 2014. Multilingual semantic parsing: Parsing multiple languages into semantic representations. In Proceedings of COLING 2014, the 25th International Con- ference on Computational Linguistics: Techni- cal Papers, pages 1291–1301, Dublin, Ireland. Dublin City University and Association for Computational Linguistics. Akhil Kedia, Sai Chetan Chinthakindi, and Wonho Ryu. 2021. Beyond Reptile: Meta- learned dot-product maximization between improved single-task regular- gradients for ization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 407–420, Punta Cana, Dominican Re- public. Association for Computational Lin- guistics. https://doi.org/10.18653/v1 /2021.findings-emnlp.37 Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Repre- sentations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. Moshe Koppel and Noam Ordan. 2011. Trans- lationese and its dialects. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1318–1326, Portland, Oregon, USA. Association for Computational Linguistics. Taku Kudo and John Richardson. 2018. Sen- tencePiece: A simple and language indepen- dent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demon- strations, pages 66–71, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18 -2012 Dongjun Lee, Jaesik Yoon, Jongyun Song, Sang-gil Lee, and Sungroh Yoon. 2019. 
One-shot learning for text-to-SQL generation. CoRR, abs/1905.11499.

Hung-yi Lee, Ngoc Thang Vu, and Shang-Wen Li. 2021. Meta learning and its applications to natural language processing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Tutorial Abstracts, pages 15–20, Online. Association for Computational Linguistics.

Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy Hospedales. 2018. Learning to generalize: Meta-learning for domain generalization. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11596

Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, and Yashar Mehdad. 2021. MTOP: A comprehensive multilingual task-oriented semantic parsing benchmark. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2950–2962, Online. Association for Computational Linguistics.

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, and Ming Zhou. 2020. XGLUE: A new benchmark dataset for cross-lingual pre-training, understanding and generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6008–6018, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.484

Qingkai Min, Yuefeng Shi, and Yue Zhang. 2019.
A pilot study for Chinese SQL semantic parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3652–3658, Hong Kong, China. Association for Computational Linguistics.

Mehrad Moradshahi, Giovanni Campagna, Sina Semnani, Silei Xu, and Monica Lam. 2020. Localizing open-ontology QA semantic parsers in a day using machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5970–5983, Online. Association for Computational Linguistics.

Alex Nichol, Joshua Achiam, and John Schulman. 2018. On first-order meta-learning algorithms. CoRR, abs/1803.02999v3.

Massimo Nicosia, Zhongdi Qu, and Yasemin Altun. 2021. Translate & fill: Improving zero-shot multilingual semantic parsing with synthetic data. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3272–3284, Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.279

Farhad Nooralahzadeh, Giannis Bekoulis, Johannes Bjerva, and Isabelle Augenstein. 2020. Zero-shot cross-lingual transfer with meta learning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4547–4562, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.368

Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, and Graham Neubig. 2019. Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 184–193, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1018

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020.
Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 101–108, Online. Association for Computational Linguistics.

Kyle Richardson, Jonathan Berant, and Jonas Kuhn. 2018. Polyglot semantic parsing in APIs. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 720–730, New Orleans, Louisiana. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1066

Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021. PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9895–9901, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.779

Tom Sherborne and Mirella Lapata. 2022. Zero-shot cross-lingual semantic parsing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4134–4153, Dublin, Ireland. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.285

Tom Sherborne, Yumo Xu, and Mirella Lapata. 2020. Bootstrapping a crosslingual semantic parser. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 499–517, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.45

Yu Su and Xifeng Yan. 2017. Cross-domain semantic parsing via paraphrasing.
In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1235–1246, Copenhagen, Denmark. Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1127

Alane Suhr, Ming-Wei Chang, Peter Shaw, and Kenton Lee. 2020. Exploring unexplored generalization challenges for cross-database semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8372–8388, Online. Association for Computational Linguistics.

Yibo Sun, Duyu Tang, Nan Duan, Yeyun Gong, Xiaocheng Feng, Bing Qin, and Daxin Jiang. 2020. Neural semantic parsing in low-resource settings with back-translation and meta-learning. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8960–8967. https://doi.org/10.1609/aaai.v34i05.6427

Raymond Hendy Susanto and Wei Lu. 2017a. Neural architectures for multilingual semantic parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 38–44, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-2007

Raymond Hendy Susanto and Wei Lu. 2017b. Semantic parsing with neural hybrid trees. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1).

Shyam Upadhyay, Manaal Faruqui, Gökhan Tür, Dilek Hakkani-Tür, and Larry P. Heck. 2018. (Almost) zero-shot cross-lingual spoken language understanding. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15–20, 2018, pages 6034–6038. IEEE.

Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, and Barbara Plank. 2021. From masked language modeling to translation: Non-English auxiliary tasks improve zero-shot spoken language understanding.
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2479–2497, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.197

Bailin Wang, Mirella Lapata, and Ivan Titov. 2021a. Meta-learning for domain generalization in semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 366–379, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.33

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020a. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.677

Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, and Angela Fan. 2021. Multilingual translation from denoising pre-training. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3450–3466, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.304

Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, and Tao Qin. 2021b. Generalizing to unseen domains: A survey on domain generalization. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4627–4635. International Joint Conferences on Artificial Intelligence Organization. https://doi.org/10.24963/ijcai.2021/628
Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020b. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys, 53(3). https://doi.org/10.1145/3386252

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144v2.

Mengzhou Xia, Guoqing Zheng, Subhabrata Mukherjee, Milad Shokouhi, Graham Neubig, and Ahmed Hassan Awadallah. 2021. MetaXL: Meta representation transformation for low-resource cross-lingual learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 499–511, Online. Association for Computational Linguistics.

Silei Xu, Sina Semnani, Giovanni Campagna, and Monica Lam. 2020a. AutoQA: From databases to QA semantic parsers with only synthetic training data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 422–434, Online. Association for Computational Linguistics.

Weijia Xu, Batool Haider, Jason Krone, and Saab Mansour. 2021. Soft layer selection with meta-learning for zero-shot cross-lingual transfer. In Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing, pages 11–18, Online. Association for Computational Linguistics.

Weijia Xu, Batool Haider, and Saab Mansour. 2020b. End-to-end slot alignment and recognition for cross-lingual NLU.
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5052–5063, Online. Association for Computational Linguistics.

Jingfeng Yang, Federico Fancellu, Bonnie Webber, and Diyi Yang. 2021. Frustratingly simple but surprisingly strong: Using language-independent features for zero-shot cross-lingual semantic parsing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5848–5856, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.472

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1425

Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, and Hinrich Schütze. 2021. A closer look at few-shot crosslingual transfer: The choice of shots matters. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5751–5767, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.447

Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer. 2020.
Grounded adaptation for zero-shot executable semantic parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6869–6882, Online. Association for Computational Linguistics.

Qile Zhu, Haidar Khan, Saleh Soltan, Stephen Rawls, and Wael Hamza. 2020. Don't parse, insert: Multilingual semantic parsing with insertion based decoding. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 496–506, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.conll-1.40