RYANSQL: Recursively Applying
Sketch-based Slot Fillings
for Complex Text-to-SQL in
Cross-Domain Databases

DongHyun Choi
Natural Language Processing Team
Kakao Enterprise and School of Software
Sungkyunkwan University
heuristic.c@kakaoenterprise.com

Myeong Cheol Shin
Natural Language Processing Team
Kakao Enterprise
index.sh@kakaoenterprise.com

EungGyun Kim
Natural Language Processing Team
Kakao Enterprise
jason.ng@kakaoenterprise.com

Dong Ryeol Shin
School of Software
Sungkyunkwan University
drshin@skku.edu

Text-to-SQL is the problem of converting a user question into an SQL query, given the question
and the database. In this article, we present a neural network approach called RYANSQL
(Recursively Yielding Annotation Network for SQL) to solve complex Text-to-SQL tasks for
cross-domain databases. A Statement Position Code (SPC) is defined to transform a nested SQL
query into a set of non-nested SELECT statements; a sketch-based slot-filling approach is
proposed to synthesize each SELECT statement for its corresponding SPC. Additionally, two input
manipulation methods are presented to further improve generation performance. RYANSQL
achieved a competitive result of 58.2% accuracy on the challenging Spider benchmark. At the
time of submission (April 2020), RYANSQL v2, a variant of the original RYANSQL, was positioned
at 3rd place among all systems and 1st place among the systems not using database content

Submission received: 12 April 2020; revised version received: 3 January 2021; accepted for publication:
4 March 2021.

https://doi.org/10.1162/coli_a_00403

© 2021 Association for Computational Linguistics
Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
(CC BY-NC-ND 4.0) license

Computational Linguistics

Volume 47, Number 2

with 60.6% exact matching accuracy. The source code is available at
https://github.com/kakaoenterprise/RYANSQL.

1. Introduction

Relational databases are widely used to maintain and query structured data sets
in many fields such as healthcare (Hillestad et al. 2005), financial markets (Beck,
Demirguc-Kunt, and Levine 2000), or customer relation management (Ngai, Xiu, and
Chau 2009). Most relational databases support Structured Query Language (SQL) to
access the stored data. Although SQL is expressive and powerful, it is quite difficult to
master, especially for non-technical users.

Text-to-SQL is the task of generating an SQL query when a user question and a
target database are given. Examples are shown in Figure 1. Recently proposed
neural network architectures achieved more than 80% exact matching accuracy on the
well-known Text-to-SQL benchmarks such as ATIS (Air Travel Information Service),
GeoQuery, and WikiSQL (Xu, Liu, and Song 2017; Yu et al. 2018a; Shi et al. 2018; Dong
and Lapata 2018; Hwang et al. 2019; He et al. 2018). However, those benchmarks have
shortcomings that restrict their applications. The ATIS (Price 1990) and GeoQuery (Zelle
and Mooney 1996) benchmarks assume the same database across the training and test
data sets, so the trained systems cannot process a newly encountered database at
inference time. The WikiSQL (Zhong, Xiong, and Socher 2017) benchmark assumes
cross-domain databases. Cross-domain means that the databases for the training and test data
sets are different; the system should predict with an unseen database as its input during
testing. Meanwhile, the complexity of SQL queries and databases in the WikiSQL
benchmark is somewhat limited. WikiSQL assumes that an input database always has
only one table. It also assumes that the resultant SQL is non-nested, and contains SELECT
and WHERE clauses only. Figure 1(a) shows an example from the WikiSQL data set.

Different from those benchmarks, the Spider benchmark proposed by Yu et al.
(2018c) contains complex SQL queries with cross-domain databases. The SQL queries in
the Spider benchmark can contain nested queries with multiple table JOINs, and clauses
like ORDERBY, GROUPBY, and HAVING. Figure 1(b) shows an example from the Spider
benchmark; Yu et al. (2018c) showed that the state-of-the-art systems for the previous
benchmarks do not perform well on the Spider data set.

In this article, we propose a novel network architecture called RYANSQL (Recursively
Yielding Annotation Network for SQL) to handle such complex, cross-domain
Text-to-SQL problems. The proposed approach generates nested queries by recursively
yielding their component SELECT statements. A sketch-based slot-filling approach
is proposed to predict each SELECT statement. In addition, two simple but effective
input manipulation methods are proposed to improve the overall system performance.
Among the systems not using database content, the proposed RYANSQL and its variant
RYANSQL v2, with the aid of BERT (Devlin et al. 2019), improve the previous
state-of-the-art system SLSQL (Lei et al. 2020) by 2.5% and 4.9%, respectively, in terms of the
test set exact matching accuracy. RYANSQL v2 is ranked at 3rd place among all systems,
including those using database content. Our contributions are summarized as follows:

• We propose a detailed sketch for the complex SELECT statements, along
with a network architecture to fill the slots.

• Statement Position Code (SPC) is introduced to recursively predict nested
queries with the sketch-based slot-filling algorithm.




(a) A Text-to-SQL example from the WikiSQL data set.


(b) A Text-to-SQL example from the Spider data set.

Figure 1
Text-to-SQL examples.

• We suggest two simple input manipulation methods to further improve performance.

2. Task Definition

The Text-to-SQL task considered in this article is defined as follows: Given a question
with n tokens Q = {w^Q_1, . . . , w^Q_n} and a DB schema with t tables and f foreign key
relations D = {T_1, . . . , T_t, F_1, . . . , F_f}, find S, the SQL translation of Q. Each table T_i
consists of a table name with t_i words {w^{T_i}_1, . . . , w^{T_i}_{t_i}}, and a set of columns
{C_j, . . . , C_k}. Each column C_j consists of a column name {w^{C_j}_1, . . . , w^{C_j}_{c_j}},
and a marker to check whether the column is a primary key.
For an SQL query S we define a non-nested form of S, N(S) = {(P1, S1), . . . , (Pl, Sl)}.
In the definition, Pi is the i-th SPC, and Si is its corresponding SELECT statement. Table 1
shows examples of a natural language query Q, corresponding SQL translation S, and
non-nested form N(S).
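Constructing S from N(S) is a straightforward placeholder substitution, which can be sketched in Python. The pairs below mirror Case 1 of Table 1; the function and variable names are illustrative, not taken from the authors' implementation, and the sketch assumes the placeholders S2, S3, . . . do not occur in table or column names.

```python
# Sketch: rebuilding a nested SQL query S from its non-nested form N(S).
# Each pair holds an SPC (a list of position-code elements) and a SELECT
# statement whose nested parts are placeholders S2, S3, ...

def reconstruct(n_s):
    """n_s: list of (spc, statement) pairs, outermost statement first."""
    stmts = [stmt for _, stmt in n_s]
    # Substitute from the innermost statement outward.
    for j in range(len(stmts) - 1, 0, -1):
        placeholder, sub = f"S{j + 1}", f"( {stmts[j]} )"
        stmts = [s.replace(placeholder, sub) for s in stmts]
    return stmts[0]

n_s = [
    (["NONE"], "SELECT name FROM scientists EXCEPT S2"),
    (["EXCEPT"], "SELECT T3.name FROM assignedto AS T1 "
                 "JOIN projects AS T2 ON T1.project = T2.code "
                 "JOIN scientists AS T3 ON T1.scientist = T3.SSN "
                 "WHERE T2.hours = S3"),
    (["EXCEPT", "WHERE"], "SELECT max(hours) FROM projects"),
]
full_query = reconstruct(n_s)
assert "S2" not in full_query and "S3" not in full_query
assert full_query.startswith("SELECT name FROM scientists EXCEPT ( SELECT T3.name")
```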


[Figure 1 content:
(a) WikiSQL example. Question: "What is the format for South Australia?" Database: a single table with text columns State/territory, Text/background colour, Format, Current slogan, Current series, and Notes. Output: SELECT Format FROM table WHERE State/territory = "South Australia".
(b) Spider example. Question: "Find the first name and age of the students who are playing both Football and Lacrosse." Database: table Student (StuID INT primary key, LName, Fname, Age, Sex, Major, Advisor, city_code) and table SportsInfo (StuID INT foreign key, SportName, HoursPerWeek, GamesPlayed, OnScholarship). Output: SELECT Fname, Age FROM Student WHERE StuID IN (SELECT StuID FROM Sportsinfo WHERE SportName = "Football" INTERSECT SELECT StuID FROM Sportsinfo WHERE SportName = "Lacrosse").]


Table 1
Examples of a user question Q, SQL translation S, and non-nested form N(S).

Case 1

Q: Find the names of scientists who are not working on the project
   with the highest hours.

S:  SELECT name FROM scientists EXCEPT
    (SELECT T3.name FROM assignedto AS T1
     JOIN projects AS T2 ON T1.project = T2.code
     JOIN scientists AS T3 ON T1.scientist = T3.SSN
     WHERE T2.hours = ( SELECT max(hours) FROM projects ) )

N(S):
P1: [ NONE ]
S1: SELECT name FROM scientists EXCEPT S2
P2: [ EXCEPT ]
S2: SELECT T3.name FROM assignedto AS T1
    JOIN projects AS T2 ON T1.project = T2.code
    JOIN scientists AS T3 ON T1.scientist = T3.SSN
    WHERE T2.hours = S3
P3: [ EXCEPT, WHERE ]
S3: SELECT max(hours) FROM projects

Case 2

Q: Find the names of accounts whose checking balance is above the average
   checking balance, but savings balance is below the average savings balance.

S:  SELECT T1.name FROM accounts AS T1
    JOIN checking AS T2 ON T1.custid = T2.custid
    WHERE T2.balance > (SELECT avg(balance) FROM checking)
    INTERSECT
    SELECT T1.name FROM accounts AS T1
    JOIN savings AS T2 ON T1.custid = T2.custid
    WHERE T2.balance < (SELECT avg(balance) FROM savings)

N(S):
P1: [ NONE ]
S1: SELECT T1.name FROM accounts AS T1
    JOIN checking AS T2 ON T1.custid = T2.custid
    WHERE T2.balance > S2
    INTERSECT S3
P2: [ WHERE ]
S2: SELECT avg(balance) FROM checking
P3: [ INTERSECT ]
S3: SELECT T1.name FROM accounts AS T1
    JOIN savings AS T2 ON T1.custid = T2.custid
    WHERE T2.balance < S4
P4: [ INTERSECT, WHERE ]
S4: SELECT avg(balance) FROM savings

Each SPC P could be considered as a sequence of p position code elements, P = [c^P_1, . . . , c^P_p]. The possible set of position code elements is {NONE, UNION, INTERSECT, EXCEPT, WHERE, HAVING, PARALLEL}. NONE represents the outermost statement, and PARALLEL means a parallel element inside a single clause, for example, the second element of the WHERE clause. The other position code elements represent their corresponding SQL clauses.

Because it is straightforward to construct S from N(S), the goal of the proposed system is to construct N(S) for the given Q and D. To achieve the goal, the proposed system first sets the initial SPC P1 = [NONE], and predicts its corresponding SELECT statement and nested SPCs. The system recursively finds the corresponding SELECT statements for the remaining SPCs, until every SPC has its own corresponding SELECT statement.

3. Generating a SELECT Statement

In this section, the method to create the SELECT statement for the given question Q, database D, and SPC P is described. Section 3.1 describes the input encoder; the sketch-based slot-filling decoder is described in Section 3.2.

3.1 Input Encoder

Figure 2 shows the overall network architecture of the input encoder. The input encoder consists of five layers: the Embedding layer, Embedding Encoder layer, Question-Column Alignment layer, Table Encoder layer, and Question-Table Alignment layer.

[Figure 2: Network architecture of the proposed input encoder. A dedicated symbol in the figure marks self-attention.]

Embedding Layer. To get the embedding vector for a word w in the question, table names, or column names, its word embedding and character embedding are concatenated.
The word embedding is initialized with d1 = 300 dimensional pretrained GloVe (Pennington, Socher, and Manning 2014) word vectors, and is fixed during training. For character embedding, each character is represented as a trainable vector of dimension d2 = 50, and we take the maximum value of each dimension over the component characters to get a fixed-length vector. The two vectors are then concatenated to obtain the embedding vector e_w ∈ R^{d1+d2}. A one-layer highway network (Srivastava, Greff, and Schmidhuber 2015) is applied on top of this representation. For the SPC P, each code element c is represented as a trainable vector of dimension dp = 100.

Embedding Encoder Layer. A one-dimensional convolution layer with kernel size 3 is applied on top of the SPC element embedding vectors {e^P_1, . . . , e^P_p}. Max-pooling is applied on the output to get the SPC vector v^P ∈ R^{dp}. v^P is then concatenated to each question and column word embedding vector. The CNN with dense connection proposed in Yoon, Lee, and Lee (2018) is applied to encode each word of the question and columns, to capture local information.
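The character-level max-pooling of the Embedding layer above can be sketched with NumPy. The random embedding table stands in for the trainable one, the vocabulary handling is simplified, and the random word vector stands in for a fixed GloVe vector; all names are illustrative.

```python
import numpy as np

d2 = 50  # character embedding dimension, as in the text
rng = np.random.default_rng(0)
# Trainable character embedding table (here: a random stand-in), one row per char.
char_table = {c: rng.standard_normal(d2) for c in "abcdefghijklmnopqrstuvwxyz"}

def char_embedding(word):
    """Fixed-length vector: element-wise max over the word's character vectors."""
    vecs = np.stack([char_table[c] for c in word.lower() if c in char_table])
    return vecs.max(axis=0)

v = char_embedding("balance")
assert v.shape == (d2,)

# Concatenate with the (stand-in) d1 = 300 word embedding to get e_w ∈ R^(d1+d2).
word_vec = rng.standard_normal(300)
e_w = np.concatenate([word_vec, v])
assert e_w.shape == (350,)
```

Max-pooling over characters makes the output length-independent, so words of any length map to the same fixed dimension.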
Parameters are shared across the question and columns. For each column, a max-pooling layer follows; the outputs are concatenated with their table name representations and projected to dimension d. Layer outputs are the encoded question word vectors V^Q = {v^Q_1, v^Q_2, . . . , v^Q_n} ∈ R^{n×d}, and the hidden column vectors H^C = {h^C_1, . . . , h^C_m} ∈ R^{m×d}.

Question-Column Alignment Layer. In this layer, the model tries to update the column vectors with the input question. More precisely, the model first aligns question tokens with column vectors to obtain an attended question vector for each column. The attended question vectors are then fused with the corresponding column vectors to get question context-integrated column vectors. Scaled dot-product attention (Vaswani et al. 2017) is used to align question tokens with column vectors:

    A^{QtoC} = softmax( H^C (V^Q)^T / √d ) V^Q    (1)

where each i-th row of A^{QtoC} ∈ R^{m×d} is an attended question vector of the i-th column. The heuristic fusion function fusion(x, y), proposed in Hu et al. (2018), is applied to merge A^{QtoC} with H^C:

    x̃ = relu(W_r [x; y; x ◦ y; x − y])
    g = σ(W_g [x; y; x ◦ y; x − y])
    fusion(x, y) = g ◦ x̃ + (1 − g) ◦ x
    F^C = fusion(A^{QtoC}, H^C)    (2)

where W_r and W_g are trainable variables, σ denotes the sigmoid function, ◦ denotes element-wise multiplication, and F^C ∈ R^{m×d} is the fused column matrix. Once the column vectors are updated with the question context, a transformer layer (Vaswani et al. 2017) is applied on top of F^C to capture contextual column information. Layer outputs are the encoded column vectors V^C = {v^C_1, . . . , v^C_m} ∈ R^{m×d}.

Table Encoder Layer. Column vectors belonging to each table are integrated to get the encoded table vector. For a matrix M ∈ R^{n×d}, the self-attention function f_s(M) ∈ R^{1×d} is defined as follows:

    f_s(M) = softmax(W_2 tanh(W_1 M^T)) M    (3)
where W_1 ∈ R^{d×d} and W_2 ∈ R^{1×d} are trainable parameters. Then, for table t with columns {C_j, . . . , C_k}, the hidden table vector h^T_t is calculated as follows:

    h^T_t = f_s([v^C_j ; . . . ; v^C_k])    (4)

Outputs of the layer are the hidden table vectors H^T = {h^T_1, h^T_2, . . . , h^T_t} ∈ R^{t×d}.

Question-Table Alignment Layer. In this layer, the same network architecture as in the Question-Column Alignment layer is used to model the table vectors with contextual information of the question. Layer outputs are the encoded table vectors V^T = {v^T_1, v^T_2, . . . , v^T_t} ∈ R^{t×d}.

Encoder Output. The final outputs of the input encoder are: (1) the encoded question word vectors V^Q = {v^Q_1, . . . , v^Q_n} ∈ R^{n×d}, (2) the encoded column vectors V^C = {v^C_1, . . . , v^C_m} ∈ R^{m×d}, (3) the encoded table vectors V^T = {v^T_1, . . . , v^T_t} ∈ R^{t×d}, and (4) the encoded SPC v^P ∈ R^{dp}. Additionally, (5) the encoded question vector v^Q = f_s(V^Q) and (6) the encoded DB schema vector v^D = f_s(V^C) ∈ R^d are calculated for later use in the decoder.

3.1.1 BERT-based Input Encoder. Inspired by the work of Hwang et al. (2019) and Guo et al. (2019), BERT (Devlin et al. 2019) is considered as another version of the input encoder. The input to BERT is constructed by concatenating question words, SPC elements, and column words as follows: [CLS], w^Q_1, . . . , w^Q_n, [SEP], c^P_1, . . . , c^P_p, [SEP], w^{C_1}_1, . . . , w^{C_1}_{c_1}, [SEP], . . . , [SEP], w^{C_m}_1, . . . , w^{C_m}_{c_m}, [SEP]. Hidden states of the last layer are retrieved to form V^Q and V^C; for V^C, the state of each column's last word is taken to represent the encoded column vector. Each table vector v^T_j is calculated as a self-attended vector of its containing columns; v^Q, v^D, and v^P are calculated in the same way.
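The BERT input sequence of Section 3.1.1 can be sketched as plain token concatenation. The tokenization is simplified (whole words instead of WordPiece) and the example question, SPC, and columns are illustrative.

```python
def build_bert_input(question_tokens, spc_elements, columns):
    """[CLS] question [SEP] SPC [SEP] col_1 [SEP] ... [SEP] col_m [SEP]."""
    tokens = ["[CLS]", *question_tokens, "[SEP]", *spc_elements, "[SEP]"]
    for col_words in columns:
        tokens += [*col_words, "[SEP]"]
    return tokens

seq = build_bert_input(
    ["find", "the", "average", "balance"],
    ["NONE"],
    [["account", "id"], ["balance"]],
)
assert seq[0] == "[CLS]" and seq.count("[SEP]") == 4
```

The position of each closing [SEP] also tells us which hidden state to read off as a column's encoding (the state of the column's last word, per the text).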
3.2 Sketch-based Slot-Filling Decoder

Table 2 shows the proposed sketch for a SELECT statement. The sketch-based slot-filling decoder predicts values for the slots of the proposed sketch, as well as the number of slots.

Classifying Base Structure. By the term base structure of a SELECT statement, we refer to the existence of its component clauses and the number of conditions for each clause. We first combine the encoded vectors v^Q, v^D, and v^P to obtain the statement encoding vector v^S, as follows:

    hc(x, y) = concat(x, y, |x − y|, x ◦ y)    (5)
    v^S = W concat(hc(v^Q, v^D), v^P)    (6)

where W ∈ R^{d×(4d+dp)} is a trainable parameter, and the function hc(x, y) is the concatenation function for the heuristic matching method proposed in Mou et al. (2016).

Eleven values b_g, b_o, b_l, b_w, b_h, n_g, n_o, n_s, n_w, n_h, and c_IUEN are classified by applying two fully connected layers on v^S. Binary values b_g, b_o, b_l, b_w, and b_h represent the existence of GROUPBY, ORDERBY, LIMIT, WHERE, and HAVING, respectively. Note that the FROM and SELECT clauses must exist to form a valid SELECT statement. n_g, n_o, n_s, n_w, and n_h represent

Table 2
Proposed sketch for a SELECT statement. $TBL and $COL represent a table and a column, respectively. $AGG is one of {none, max, min, count, sum, avg}, $ARI is one of the arithmetic operators {none, -, +, *, /}, and $COND is one of the conditional operators {between, =, >, <, >=, <=, !=, in, like, is, exists}. $DIST and $NOT are Boolean variables representing the existence of the keywords DISTINCT and NOT, respectively. $ORD is a binary value for the keywords ASC/DESC, and $CONJ is one of the conjunctions {AND, OR}. $VAL is the value for a WHERE/HAVING condition; $SEL represents the slot for another SELECT statement.
CLAUSE                  SKETCH
FROM                    ( $TBL )+
SELECT                  $DIST ( $AGG ( $DIST1 $AGG1 $COL1 $ARI $DIST2 $AGG2 $COL2 ) )+
ORDERBY                 ( ( $DIST1 $AGG1 $COL1 $ARI $DIST2 $AGG2 $COL2 ) $ORD )∗
GROUPBY                 ( $COL )∗
LIMIT                   $NUM
WHERE                   ( $CONJ ( $DIST1 $AGG1 $COL1 $ARI $DIST2 $AGG2 $COL2 ) $NOT $COND $VAL1|$SEL1 $VAL2|$SEL2 )∗
HAVING                  ( $CONJ ( $DIST1 $AGG1 $COL1 $ARI $DIST2 $AGG2 $COL2 ) $NOT $COND $VAL1|$SEL1 $VAL2|$SEL2 )∗
INTERSECT UNION EXCEPT  $SEL

the number of conditions in the GROUPBY, ORDERBY, SELECT, WHERE, and HAVING clauses, respectively. The maximal numbers of conditions N_g = 3, N_o = 3, N_s = 6, N_w = 4, and N_h = 2 are defined for the GROUPBY, ORDERBY, SELECT, WHERE, and HAVING clauses, to solve the problem as an n-way classification problem. The values of the maximal condition numbers are chosen to cover all the training cases. Finally, c_IUEN represents the existence of one of INTERSECT, UNION, or EXCEPT, or NONE if no such clause exists. If the value of c_IUEN is one of INTERSECT, UNION, or EXCEPT, the corresponding SPC is created, and the SELECT statement for that SPC is generated recursively.

FROM Clause. A list of $TBLs should be decided to predict the FROM clause. For each table i, P_fromtbl(i|Q, D, P), the probability that table i is included in the FROM clause, is calculated as follows:

    c_i = concat(v^T_i, v^Q, v^D, v^P)
    s_i = W_2 tanh(W_1 c_i)
    P_fromtbl(i|Q, D, P) = σ(s)_i    (7)

where W_1 and W_2 are trainable variables, s = [s_1, . . . , s_t] ∈ R^t represents the scores for tables, and σ denotes the sigmoid function. From now on, we omit the notations Q, D, and P for the sake of simplicity.

The top n_t tables with the highest P_fromtbl(i) values are chosen. We set an upper bound N_t = 6 on the possible number of tables.
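The per-table scoring of Equation (7) and the subsequent top-n_t selection can be sketched in NumPy. The sizes and random weights are toy stand-ins for trained parameters, and n_t is hard-coded here although the text predicts it separately (Equation (8)).

```python
import numpy as np

d, dp, t = 8, 4, 5          # toy sizes: hidden dim, SPC dim, number of tables
rng = np.random.default_rng(2)
W1 = rng.standard_normal((d, 3 * d + dp)) * 0.1
W2 = rng.standard_normal((1, d)) * 0.1

def from_table_probs(VT, vQ, vD, vP):
    """P_fromtbl(i): independent sigmoid score per table (Equation (7))."""
    scores = []
    for i in range(VT.shape[0]):
        ci = np.concatenate([VT[i], vQ, vD, vP])   # concat(v^T_i, v^Q, v^D, v^P)
        scores.append((W2 @ np.tanh(W1 @ ci)).item())
    return 1.0 / (1.0 + np.exp(-np.array(scores)))  # sigmoid

VT = rng.standard_normal((t, d))
vQ, vD, vP = rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal(dp)
probs = from_table_probs(VT, vQ, vD, vP)

n_t = 2                                 # in the text, predicted via Equation (8)
chosen = np.argsort(-probs)[:n_t]       # indices of the top-n_t tables
assert probs.shape == (t,) and len(chosen) == n_t
```

Because each table gets an independent sigmoid score (rather than one softmax over tables), any number of tables up to N_t can plausibly be selected.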
The formula to get P_#tbl(n_t) for each possible n_t is:

    v^{T'} = softmax(s) V^T
    P_#tbl(n_t) = softmax(full_2(v^{T'}))    (8)

In the equation, full_2 means the application of two fully connected layers, and the table score vector s is from Equation (7). During inference, the $TBLs are classified first, and the $COLs for other clauses are chosen among the columns of the classified $TBLs.

SELECT Clause. The decoder first generates N_s conditions to predict the SELECT clause. Because each condition depends on different parts of Q, we calculate an attended question vector for each condition:

    A^Q_sel = W_3 tanh(V^Q W_1 + v^P W_2)^T
    V^Q_sel = softmax(A^Q_sel) V^Q    (9)

where W_1, W_2 ∈ R^{d×d} and W_3 ∈ R^{Ns×d} are trainable parameters, and V^Q_sel ∈ R^{Ns×d} is the matrix of attended question vectors for the N_s conditions. v^P is tiled to match the rows of V^Q. For m columns and N_s conditions, P_sel_col1 ∈ R^{Ns×m}, the probability matrix for each column to fill the slot $COL1 of each condition, is calculated as follows:

    A^C_sel[i] = W_6 tanh(V^Q_sel[i] W_4 + V^C W_5)^T
    P_sel_col1[i] = softmax(A^C_sel[i])    (10)

where W_4, W_5 ∈ R^{d×d} and W_6 ∈ R^{1×d} are trainable parameters. In this and the following equations, the notation M[i] is used to represent the i-th row of matrix M.

The attended question vectors are then updated with the selected column information to get the updated question vectors U^Q_sel_col1 ∈ R^{Ns×d}:

    U^C_sel_col1[i] = P_sel_col1[i] V^C
    U^Q_sel_col1[i] = W_7 hc(V^Q_sel[i], U^C_sel_col1[i])    (11)

where W_7 is a trainable variable, and hc(x, y) is defined in Equation (5). The probabilities for $DIST1, $AGG1, $ARI, and $AGG are calculated by applying a fully connected layer on U^Q_sel_col1[i].
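The condition-wise column attention of Equation (10) can be sketched in NumPy with toy dimensions; the random weights stand in for trained parameters.

```python
import numpy as np

d, Ns, m = 8, 3, 4   # toy sizes: hidden dim, #SELECT conditions, #columns
rng = np.random.default_rng(3)
W4 = rng.standard_normal((d, d)) * 0.1
W5 = rng.standard_normal((d, d)) * 0.1
W6 = rng.standard_normal((1, d)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

VQ_sel = rng.standard_normal((Ns, d))   # attended question vector per condition
VC = rng.standard_normal((m, d))        # encoded column vectors

# P_sel_col1[i, j]: probability that column j fills $COL1 of condition i.
P_sel_col1 = np.stack([
    softmax((W6 @ np.tanh(VQ_sel[i] @ W4 + VC @ W5).T).ravel())
    for i in range(Ns)
])
assert P_sel_col1.shape == (Ns, m)
assert np.allclose(P_sel_col1.sum(axis=1), 1.0)
```

Each condition gets its own softmax over the m columns, so different conditions can attend to different columns of the schema.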
Equation (10) is reused to calculate P_sel_col2, with V^Q_sel[i] replaced by U^Q_sel_col1[i]; then U^Q_sel_col2 is retrieved in the same way as in Equation (11), and the probabilities of $DIST2 and $AGG2 are calculated in the same way as $DIST1 and $AGG1. Finally, the $DIST slot, the DISTINCT marker for the overall SELECT clause, is calculated by applying a fully connected layer on v^S.

Once all the slots are filled for the N_s conditions, the decoder retrieves the first n_s conditions to predict the SELECT clause. This is possible because the CNN with Dense Connection used for question encoding (Yoon, Lee, and Lee 2018) captures relative position information. Due to the SQL consistency protocol of the Spider benchmark (Yu et al. 2018c), we expect the conditions to be ordered in the same way as they are presented in Q. For data sets without such a consistency protocol, the proposed slot-filling method could easily be changed to an LSTM-based model, as shown in Xu, Liu, and Song (2017).

ORDERBY Clause. The same network structure as for the SELECT clause is applied. The only difference is the prediction of the $ORD slot; this can be done by applying a fully connected layer on U^Q_ob_col1, the counterpart of U^Q_sel_col1.

GROUPBY Clause. The same network structure as for the SELECT clause is applied. For the GROUPBY case, retrieving only the values of P_gb_col1 is enough to fill the necessary slots.

LIMIT Clause. In many cases a question does not explicitly contain the $NUM slot value for the LIMIT clause, namely when the question asks for the top-1 result (for example: "Show the name and the release year of the song by the youngest singer"). Thus, the LIMIT decoder first determines whether the given Q requests the top-1 result. If so, the decoder sets the $NUM value to 1; otherwise, it tries to find the specific token for $NUM among the tokens of Q using a pointer network (Vinyals, Fortunato, and Jaitly 2015).
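The two-stage LIMIT decoding just described (top-1 detection first, otherwise a pointer over question tokens) can be sketched as follows. The 0.5 threshold, the toy scores, and the integer conversion are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decode_limit(top1_prob, token_scores, question_tokens):
    """Return the $NUM value: 1 for top-1 questions, else the pointed token."""
    if top1_prob > 0.5:                    # the question asks for the top-1 result
        return 1
    idx = int(np.argmax(softmax(token_scores)))
    return int(question_tokens[idx])       # pointed token, e.g. "3" -> 3

tokens = ["show", "the", "3", "oldest", "singers"]
scores = np.array([0.1, 0.2, 4.0, 0.3, 0.1])   # toy pointer-network scores
assert decode_limit(0.9, scores, tokens) == 1
assert decode_limit(0.2, scores, tokens) == 3
```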
The LIMIT top-1 probability P_limit_top1 is retrieved by applying a fully connected layer on v^S. P^Q_limit_num[i], the probability of the i-th question token being the $NUM slot value, is calculated as:

    A^Q_limit_num = W_3 tanh(V^Q W_1 + v^P W_2)^T
    P^Q_limit_num[i] = softmax(A^Q_limit_num)_i    (12)

where W_1, W_2 ∈ R^{d×d} and W_3 ∈ R^{1×d} are trainable parameters.

WHERE Clause. The same network structure as for the SELECT clause is applied to get the attended question vectors V^Q_wh ∈ R^{Nw×d}, and the probabilities for $COL1, $COL2, $DIST1, $DIST2, $AGG1, $AGG2, and $ARI. A fully connected layer is applied on U^Q_wh_col1 to get the probabilities for $CONJ, $NOT, and $COND. A fully connected layer is applied on U^Q_wh_col2 to determine whether the condition value for each column is another nested SELECT statement. If the value is determined to be a nested SELECT statement, the corresponding SPC is generated, and the SELECT statement for the SPC is predicted recursively. If not, the pointer network is used to get the start and end positions of the value span from the question tokens.

HAVING Clause. The same network structure as for the WHERE clause is applied.

4. Two Input Manipulation Methods

In this section, we introduce two input manipulation methods to further improve the performance of our proposed system.

4.1 JOIN Table Filtering

In a FROM clause, some tables (and their columns) are not mentioned explicitly in the given question, but they are still required to make a "link" between other tables to

Table 3
An example SQL query with a link table.

Q: What are the papers of Liwen Xiong in 2015?
SQL: SELECT DISTINCT t3.paperid FROM writes AS t2
     JOIN author AS t1 ON t2.authorid = t1.authorid
     JOIN paper AS t3 ON t2.paperid = t3.paperid
     WHERE t1.authorname = "Liwen Xiong" AND t3.year = 2015;

form a proper SQL query. One such example is given in Table 3. The table writes is not explicitly mentioned in Q, but it is used in the JOIN clause to link between the tables author and paper. Those "link" tables are necessary to create the proper SELECT statement, but they work as noise in aligning question tokens and tables, because the link tables have no corresponding tokens in Q.

To reduce the training noise, only the non-link tables are considered as the $TBL slot values of the FROM clause during training. A table of a FROM clause is considered a link table if (1) all the $AGG values of the SELECT clause are none, and (2) none of its columns appears in other clauses' slots. During inference, the link tables can easily be recovered by using the foreign key relations of the extracted tables. More precisely, the system uses a heuristic of finding the shortest joinable foreign key relation "path" between the extracted tables. Once a path is found, the tables in the path are added as the $TBLs of the FROM clause.

The goal of this method is to distinguish the link tables from non-link tables during the training phase. SyntaxSQLNet (Yu et al. 2018b) first predicts all the columns, and then chooses FROM tables based on the classified columns. As noted in Yu et al. (2018b), that approach cannot handle count queries with additional JOINs, for example, "SELECT T2.name, count(*) FROM singer_in_concert AS T1 JOIN singer AS T2 ON T1.singer_id = T2.singer_id GROUP BY T2.singer_id." Its corresponding user question is "List singer names and number of concerts for each singer." GNN (Bogin, Gardner, and Berant 2019) handles the problem by turning the database schema into a graph; foreign key links between nodes help the system to distinguish between the two types of tables.
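The shortest foreign-key "path" heuristic described above can be sketched with breadth-first search over a toy schema graph modeled on Table 3; the graph, function names, and recovery loop are illustrative, not the authors' exact implementation.

```python
from collections import deque

# Toy foreign-key adjacency: table -> tables reachable via a foreign key link.
fk_graph = {
    "author": ["writes"],
    "writes": ["author", "paper"],
    "paper": ["writes"],
}

def shortest_fk_path(graph, src, dst):
    """BFS over foreign-key links; returns the table sequence src..dst."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def recover_from_tables(graph, predicted):
    """Add the intermediate 'link' tables between each pair of predicted tables."""
    tables = set(predicted)
    for a in predicted:
        for b in predicted:
            path = shortest_fk_path(graph, a, b)
            if path:
                tables.update(path)
    return tables

# The decoder predicts only {author, paper}; `writes` is recovered as a link table.
assert recover_from_tables(fk_graph, ["author", "paper"]) == {"author", "writes", "paper"}
```

BFS guarantees the shortest path in the unweighted foreign-key graph, matching the "shortest joinable path" phrasing of the heuristic.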
IRNet (Guo et al. 2019) and RAT-SQL (Wang et al. 2020) have a separate schema linking module to explicitly link columns with question tokens.

4.2 Supplemented Column Names

We supplement the column names with their table names to distinguish between columns with the same name but belonging to different tables and representing different meanings. Table names are concatenated in front of their belonging column names to form supplemented column names (SCNs); but if the stemmed form of a table name is wholly included in the stemmed form of a column name, the table name is not concatenated. Table 4 shows SCN examples; the three columns with the same name id are distinguished by their SCNs. We can also expect the SCNs to align better with question tokens, since an SCN contains more information about what the column actually refers to.

The method aims to integrate tables with their columns. To achieve the goal, IRNet (Guo et al. 2019) and RAT-SQL (Wang et al. 2020) separately encode tables and columns,

Table 4
Examples of supplemented column names (SCNs).

Table        Column        SCN
tv channel   id            tv channel id
tv channel   series name   tv channel series name
tv series    id            tv series id
cartoon      id            cartoon id
Those works could be classified into three main categories, based on their decoder outputs. Sequence-to-Sequence translation approaches generate SQL query tokens. Dong and Lapata (2016) introduced the hierarchical tree decoder to prevent the model from generating grammatically incorrect semantic representations of the input sentences. Zhong, Xiong, and Socher (2017) used policy-based reinforcement learning to deal with the unordered nature of WHERE conditions. Grammar-based approaches generate a sequence of grammar rules and apply the generated rules sequentially to obtain the resultant SQL query. IRNet (Guo et al. 2019) defined a structural representation of an SQL query and a set of parse actions to handle the WikiSQL data set. IRNet defined the SemQL query, which is an abstraction of a SQL query in tree form. They also proposed a set of grammar rules to synthesize SemQL queries; synthesizing a SQL query from a SemQL tree structure is straightforward. RAT-SQL (Wang et al. 2020) improved the work of Guo et al. (2019) by proposing a relation-aware transformer to effectively encode relations between columns, tables, and question tokens. GNN (Bogin, Gardner, and Berant 2019) focused on the DB constraints selection problem during the grammar decoding process; they applied global reasoning between question words and database columns/tables. SLSQL (Lei et al. 2020) manually annotated link information between user questions and database columns to show the role of schema linking. Sketch-based slot-filling approaches use a sketch, which aligns with the syntactic structure of a SQL query. A sketch should be defined generic enough to handle all SQL queries of interest. Once a sketch is defined, one can simply fill the slots of the sketch to obtain the resultant SQL query. SQLNet (Xu, Liu, and Song 2017) first introduced a sketch to handle the WikiSQL data set, along with attention-based slot-filling algo- rithms. The proposed sketch for WikiSQL is shown in Table 5. 
TypeSQL (Yu et al. 2018a) added category information such as named entities to better encode the input question. SQLova (Hwang et al. 2019) introduced BERT (Devlin et al. 2019) to encode the input question and database, and the encoded vectors were used to fill the slots of the sketch. X-SQL (He et al. 2018) aligned the contextual information with column tokens to better summarize each column. The sketch-based approaches for WikiSQL described here all used the sketch shown in Table 5, which suffices for WikiSQL queries but is oversimplified for general SQL queries, for example, those contained in the Spider benchmark.

Choi et al. RYANSQL

Table 5
Sketch for the WikiSQL data set. $COL represents a column, and $AGG is one of {none, max, min, count, sum, avg}. $COND is one of the conditional operators {=, >, <}. $VAL is the value for the WHERE condition.

CLAUSE   SKETCH
SELECT   $AGG $COL
WHERE    ($COL $COND $VAL)*

Table 6
The sketch for a SELECT statement proposed by RCSQL (Lee 2019). $COL represents a column. $AGG is one of {none, max, min, count, sum, avg}, and $COND is one of the conditional operators {between, =, >, <, >=, <=, !=, in, like, is, exists}. $ORD is a binary value for the keywords ASC/DESC, and $CONJ is one of the conjunctions {AND, OR}. $VAL is the value for a WHERE/HAVING condition; $SEL represents the slot for another SELECT statement.

CLAUSE                   SKETCH
SELECT                   ($AGG $COL)+
ORDERBY                  ($AGG $COL)+ $ORD
GROUPBY                  ($COL)*
LIMIT                    $NUM
WHERE                    ($CONJ $COL $COND $VAL | $SEL)*
HAVING                   ($CONJ $AGG $COL $COND $VAL | $SEL)*
INTERSECT UNION EXCEPT   $SEL

So far, the sketch-based approach on the more complex Spider benchmark has shown relatively low performance compared to the grammar-based approaches.
There are two major reasons: (1) It is hard to define a sketch for Spider queries, since the allowed syntax of Spider SQL queries is far more complicated than that of WikiSQL queries. (2) Because the sketch-based approaches fill values for predefined slots, they have difficulty predicting nested queries. RCSQL (Lee 2019) tried to apply the sketch-based approach to the Spider data set; Table 6 shows the sketch proposed by Lee (2019). To predict a nested SELECT statement, RCSQL takes as input a temporary SQL query with a special token [SUB QUERY] in the corresponding location. For example, for Case 1 of Table 1, RCSQL gets the temporary query string "SELECT name FROM scientists EXCEPT [SUB QUERY]" as its input to generate the nested statement S2, along with the user question and database schema.

Our proposed approach has three improvements compared to RCSQL. First, our sketch in Table 2 is more "complete" in terms of expressiveness. For example, because the RCSQL sketch lacks $ARI elements, RCSQL cannot generate queries with arithmetic operations between columns, for example, "SELECT T1.name FROM accounts AS T1 JOIN checking AS T2 ON T1.custid = T2.custid JOIN savings AS T3 ON T1.custid = T3.custid ORDER BY T2.balance + T3.balance LIMIT 1." Second, while our proposed approach directly predicts the tables of the FROM clause, RCSQL heuristically infers the tables from the columns extracted for the other clauses.
The RCSQL approach cannot generate count queries with additional table JOINs, for example, "SELECT count(*) FROM institution AS T1 JOIN protein AS T2 ON T1.institution id = T2.institution id WHERE T1.founded > 1880 OR T1.type = 'Private'." Third, RCSQL fails to generate nested SELECT statements when two or more statements are at the same depth, for example, S2 and S3 in Case 2 of Table 1. Because RCSQL generates one SELECT statement per input, it expects only one special token per query.

In this article, we propose a more complete sketch than the WikiSQL (Table 5) and RCSQL (Table 6) sketches for complex SELECT statements, along with the Statement Position Code (SPC) to handle nested queries more efficiently. Although our proposed sketch is tuned on the Spider data set, it is based on generic SQL syntax and could be applied to other SQL generation tasks.
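The recursive application of SPCs can be sketched as a simple driver loop. Everything below is illustrative: `predict_statement` stands in for the neural slot-filling decoder, and the position labels are simplified stand-ins for the actual SPC vocabulary defined in the article:

```python
def generate_statements(question, schema, predict_statement):
    """Decompose a (possibly nested) SQL query into non-nested SELECT
    statements, one per Statement Position Code (SPC)."""
    results = []
    stack = [()]  # the empty tuple is the SPC of the outermost statement
    while stack:
        spc = stack.pop()
        # Fill the sketch slots of one non-nested SELECT statement;
        # nested_positions lists the slots that require a further statement.
        statement, nested_positions = predict_statement(question, schema, spc)
        results.append((spc, statement))
        for pos in nested_positions:  # e.g. "EXCEPT" or "WHERE"
            stack.append(spc + (pos,))
    return results

# A dummy decoder producing one nested statement under EXCEPT:
def dummy_decoder(question, schema, spc):
    if spc == ():
        return "SELECT name FROM scientists EXCEPT <nested>", ["EXCEPT"]
    return "SELECT ssn FROM projects", []
```

Running `generate_statements` with `dummy_decoder` yields two (SPC, statement) pairs, for the codes `()` and `("EXCEPT",)`; each pair corresponds to one non-nested SELECT statement synthesized by the sketch-based slot filler.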

6. Experiment

6.1 Experiment Setup

Implementation. The proposed RYANSQL is implemented with TensorFlow (Abadi et al. 2015). Layer normalization (Ba, Kiros, and Hinton 2016) and dropout (Srivastava et al. 2014) are applied between layers, with a dropout rate of 0.1. Exponential decay with decay rate 0.8 is applied to the learning rate every three epochs. After each epoch, the trained classifier is evaluated against the validation data set, and training stops when the exact match score on the validation data set has not improved for 20 consecutive training epochs. The minibatch size is set to 16; the learning rate is set to 4e−4. The loss is defined as the sum of all classification losses from the slot-filling decoder. The trained network has
22M parameters.
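The training schedule described above can be sketched as follows. The function names are illustrative, and we assume the decay is applied in staircase fashion (the text does not state whether it is staircase or continuous):

```python
def learning_rate(epoch, base_lr=4e-4, decay_rate=0.8, decay_every=3):
    """Staircase exponential decay: multiply the rate by 0.8 every 3 epochs."""
    return base_lr * decay_rate ** (epoch // decay_every)

def should_stop(dev_scores, patience=20):
    """Early stopping: stop once the best validation exact-match score
    is `patience` or more epochs old."""
    best_epoch = max(range(len(dev_scores)), key=dev_scores.__getitem__)
    return len(dev_scores) - 1 - best_epoch >= patience
```

For example, `learning_rate(0)` returns 4e−4 and `learning_rate(3)` returns 4e−4 × 0.8, while `should_stop` returns True as soon as 20 epochs have passed without a new best score.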

For pretrained language model–based input encoding, we downloaded the
publicly available pretrained model of BERT, BERT-Large, Uncased (Whole Word
Masking), and fine-tuned the model during training. The learning rate is set to 1e−5,
and minibatch size is set to 4. The model with BERT has 445M parameters.

Data sets. The Spider data set (Yu et al. 2018c) is mainly used to evaluate our proposed system. We use the same data split as Yu et al. (2018c): 206 databases are split into 146 train, 20 dev, and 40 test databases. All questions for the same database are in the same split; there are 8,659 questions for train, 1,034 for dev, and 2,147 for test. The test set of Spider is not publicly available, so our models were submitted to the data owners for testing. For evaluation, we used exact matching accuracy, with the same definition as in Yu et al. (2018c).
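The exact matching metric can be illustrated with a simplified sketch. The official Spider evaluation script decomposes each query into components and compares some of them order-insensitively; it handles many more details (aliases, value stripping, nested queries) than this toy version, whose clause representation is our own assumption:

```python
def exact_match(pred, gold):
    """Toy clause-wise exact match. `pred` and `gold` map clause names to
    lists of component strings. WHERE/HAVING conditions are compared as
    unordered collections; other clauses must match in order."""
    if pred.keys() != gold.keys():
        return False
    for clause, gold_items in gold.items():
        pred_items = pred[clause]
        if clause in ("WHERE", "HAVING"):
            if sorted(pred_items) != sorted(gold_items):
                return False
        elif pred_items != gold_items:
            return False
    return True

# Two queries that differ only in WHERE-condition order still match:
gold = {"SELECT": ["name"], "WHERE": ["age > 10", "id = 5"]}
pred = {"SELECT": ["name"], "WHERE": ["id = 5", "age > 10"]}
```

Here `exact_match(pred, gold)` returns True, reflecting the metric's insensitivity to condition ordering.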

6.2 Evaluation Results

Table 7 shows comparisons of the proposed system with several state-of-the-art systems; evaluation scores for the dev and test data sets are retrieved from the Spider leaderboard.1 The proposed system is compared with the grammar-based systems GrammarSQL (Lin et al. 2019), Global-GNN (Bogin, Gardner, and Berant 2019), IRNet (Guo et al. 2019), and RATSQL (Wang et al. 2020). We also compared the proposed system with RCSQL (Lee 2019), which so far has shown the best performance on the Spider data set using a sketch-based slot-filling approach.

Table 7
Evaluation results of the proposed systems and other state-of-the-art systems.

System                                         Dev     Test

Without pretrained language models
GrammarSQL (Lin et al. 2019)                   34.8%   33.8%
EditSQL (Zhang et al. 2019)                    36.4%   32.9%
IRNet (Guo et al. 2019)                        53.3%   46.7%
RATSQL v2 (Wang et al. 2020)                   62.7%   57.2%
RYANSQL (Ours)                                 43.4%   -

With pretrained language models
RCSQL (Lee 2019)                               28.5%   24.3%
EditSQL + BERT                                 57.6%   53.4%
IRNet + BERT                                   61.9%   54.7%
IRNet v2 + BERT                                63.9%   55.0%
SLSQL + BERT (Lei et al. 2020)                 60.8%   55.7%
RYANSQL + BERT (Ours)                          66.6%   58.2%
RYANSQL v2 + BERT (Ours)                       70.6%   60.6%

With DB content
Global-GNN (Bogin, Gardner, and Berant 2019)   52.7%   47.4%
IRNet++ + XLNet                                65.5%   60.1%
RATSQL v3 + BERT                               69.7%   65.6%

Evaluation results are presented in three groups, based on the use of pretrained language models and database content. Although the use of database content (i.e., cell values) can greatly improve the performance of a Text-to-SQL system (as shown in Wang et al. 2018; Hwang et al. 2019; He et al. 2018), a Text-to-SQL system rarely has access to database content in real-world applications, due to reasons such as personal privacy, business secrets, or legal issues. Because the use of database content improves system performance but decreases system availability, we put models using database content in a separate group.

For RYANSQL v2, we trained two networks, called the table network and the slot network, with the same network architecture as the proposed RYANSQL. The table network is trained to maximize the $TBL classification accuracy on the dev set; the slot network is trained to maximize the exact match accuracy on the dev set as RYANSQL does, but the $TBL classification results are fetched from the table network (which is fixed during the
training of the slot network). During inference, the model first classifies the $TBLs using the table network, and fills the other slots using the slot network.

1 https://yale-lily.github.io/spider, as of April 2020.

Table 8
Exact matching accuracy of the proposed system and other state-of-the-art systems for each hardness level. Med. means medium hardness.

Approaches                           Easy    Med.    Hard    Extra   ALL

On dev set
RCSQL                                53.2%   27.0%   20.1%    6.5%   28.8%
IRNet2                               70.4%   55.0%   46.6%   30.6%   53.3%
RATSQL v2 (with DB content)          80.4%   63.9%   55.7%   40.6%   62.7%
RATSQL v3 + BERT (with DB content)   86.4%   73.6%   62.1%   42.9%   69.7%
RYANSQL (Ours)                       69.2%   43.0%   28.2%   22.4%   43.4%
RYANSQL + BERT (Ours)                86.0%   70.5%   54.6%   40.6%   66.6%

On test set
IRNet                                70.1%   49.2%   39.5%   19.1%   46.7%
IRNet + BERT                         77.2%   58.7%   48.1%   25.3%   54.7%
RATSQL v2 (with DB content)          74.8%   60.7%   53.6%   31.5%   57.2%
RATSQL v3 + BERT (with DB content)   83.0%   71.3%   58.3%   38.4%   65.6%
RYANSQL + BERT (Ours)                81.2%   62.1%   51.9%   28.0%   58.2%

As can be observed from the table, the proposed system RYANSQL improves on the previous sketch-based slot-filling system RCSQL by a large margin of 15% on the dev set. Note that RCSQL fine-tuned another well-known pretrained language model, ELMo (Peters et al. 2018). With the use of BERT, among the systems without database content, the proposed systems RYANSQL + BERT and RYANSQL v2 + BERT outperform the previous state of the art by 2.5% and 4.9%, respectively, on the hidden test data set, in terms of exact matching accuracy. The proposed system still shows competitive results compared to the systems using database content; RATSQL v3 + BERT outperforms the proposed system by better aligning user questions and database schemas using database content.
Table 8 compares the exact matching accuracies of the proposed systems and other state-of-the-art systems for each hardness level. The proposed RYANSQL + BERT outperforms the previous sketch-based approach RCSQL at every hardness level on the dev set. However, the proposed RYANSQL + BERT shows relatively poor performance on the test set at the Extra hardness level, compared to RATSQL v3 + BERT. This suggests that many test examples at the Extra hardness level require database content to answer, since the two systems show comparable results on the Extra hardness dev set.

Next, ablation studies are conducted on the proposed methods to clarify the contribution of each feature. The results are presented in Table 9. It turns out that the use of SPC greatly improves the performance at the Hard and Extra hardness levels. This result shows that the SPC plays an important role in generating nested SQL queries. The SPC also slightly increases the performance at the Easy and Medium hardness levels.

2 For IRNet + BERT, we downloaded the source code from the authors' homepage (https://github.com/microsoft/IRNet) and trained the model, but we were not able to reproduce the authors' suggested dev set exact matching accuracy.

Table 9
Ablation study results of the proposed models for each hardness level on dev set. SPC, SCN, and JTF represent the use of Statement Position Code, supplemented column names, and JOIN table filtering, respectively.

Approaches       SPC  SCN  JTF   Easy    Med.    Hard    Extra   ALL
RYANSQL           O    O    O    69.2%   43.0%   28.2%   22.4%   43.4%
                  O    O    X    68.0%   40.2%   27.6%   19.4%   41.4%
                  O    X    O    62.0%   38.2%   24.1%   14.7%   37.7%
                  O    X    X    65.2%   37.7%   29.9%   18.8%   39.9%
                  X    O    O    63.2%   39.5%   19.0%   16.5%   38.0%
                  X    O    X    68.4%   41.6%   18.4%   14.7%   39.7%
                  X    X    O    63.2%   39.3%   14.9%   16.5%   37.2%
                  X    X    X    60.0%   38.2%   16.7%   11.8%   35.5%
RYANSQL + BERT    O    O    O    86.0%   70.5%   54.6%   40.6%   66.6%
                  O    O    X    86.8%   66.1%   46.6%   42.4%   63.9%
                  O    X    O    76.4%   58.2%   46.6%   30.6%   56.1%
                  O    X    X    78.0%   63.4%   46.0%   28.8%   58.3%
                  X    O    O    85.6%   66.6%   27.0%   22.4%   57.3%
                  X    O    X    83.6%   68.4%   25.9%   26.5%   58.0%
                  X    X    O    78.4%   60.2%   21.3%   24.7%   52.2%
                  X    X    X    77.2%   60.5%   23.6%   25.9%   52.6%

This is because the SPC helps the model distinguish between nested SELECT statements, thus removing noise when aligning question tokens and columns. The use of SCN moderately improves the accuracies at all hardness levels. This is expected, since SCN helps a database column align better with question tokens by supplementing the column name with its table information. JOIN table filtering (JTF) increases performance only when the other two features, SPC and SCN, are used together. Analysis shows that in some cases, the link tables removed by JTF actually have corresponding question tokens. One example is the SQL query "SELECT T3.amenity name FROM dorm AS T1 JOIN has amenity AS T2 ON T1.dormid = T2.dormid JOIN dorm amenity AS T3 ON T2.amenid = T3.amenid WHERE T1.dorm name = 'Smith Hall'" for the question "Find the name of amenities Smith Hall dorm have." The table has amenity is considered a link table, but there exist corresponding clues in the question. Removing the table from the $TBL list according to the
JTF feature would introduce alignment noise during training. However, the evaluation results also show that, by better aligning the question and database schema using the other two features, SPC and SCN, the model can recover from the alignment noise introduced by JTF, improving overall system performance.
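The JTF idea discussed above can be sketched as follows. This is a hypothetical reconstruction from the surrounding discussion, not the criterion actually implemented in Section 4.3: a table is treated as a link table, and dropped, when it contributes no used column and only connects other tables through foreign keys. The column and foreign-key naming convention is our assumption:

```python
def filter_join_tables(tables, used_columns, foreign_keys):
    """Hypothetical sketch of JOIN table filtering (JTF).

    tables:        candidate $TBL tables, e.g. ["dorm", "has_amenity"]
    used_columns:  "table.column" strings used in the other clauses
    foreign_keys:  pairs of "table.column" strings
    """
    kept = []
    for t in tables:
        has_own_column = any(col.split(".")[0] == t for col in used_columns)
        is_link_only = not has_own_column and any(
            t in (fk_a.split(".")[0], fk_b.split(".")[0])
            for fk_a, fk_b in foreign_keys
        )
        if not is_link_only:
            kept.append(t)
    return kept
```

For the Smith Hall example above, this sketch drops has amenity because none of its columns is selected or conditioned on, even though the question contains a clue for it; this is exactly the alignment noise the paragraph describes.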

The proposed models are also evaluated without all three features, SPC, SCN, and JTF, to isolate the contribution of our newly proposed sketch. Without the three features, RYANSQL shows 35.5% accuracy on the dev set, a 6.7% improvement over the other sketch-based slot-filling model, RCSQL; RYANSQL + BERT shows 52.6% dev set accuracy.

Effect of the pretrained language model. There exists a huge performance gap of 23.2% on the dev set between RYANSQL and RYANSQL + BERT. SQL component matching F1 scores for the two models are shown in Table 10 to investigate the reason. For the item keyword (which measures the existence of predefined SQL keywords such as SELECT or GROUP BY), the performance gap between the two models is 6.2%, which is relatively small compared to the overall performance gap of 23.2%. Meanwhile, the performance gaps on clause components such as WHERE are similar to or larger than the overall performance gap. These evaluation results suggest that the use of a pretrained language model mainly improves column classification performance, rather than the accuracy of classifying the base structure of a SQL query.

Table 10
SQL component matching F1 scores of RYANSQL and RYANSQL + BERT on dev set.

Approaches       SELECT   WHERE   GROUP   ORDER   keywords   ALL
RYANSQL          69.4%    47.4%   67.5%   73.9%   82.3%      43.4%
RYANSQL + BERT   88.2%    74.4%   78.8%   83.3%   88.5%      66.6%

Table 11
Evaluations with different pretrained language models on dev set.

System                  Dev
RYANSQL                 43.4%
RYANSQL + BERT-base     51.4%
RYANSQL + BERT-large    66.6%
RYANSQL + RoBERTa       65.7%
RYANSQL + ELECTRA       63.6%

Next, a series of experiments is conducted to see if additional performance
improvements could be gained by applying different pretrained language models.
Table 11 shows the evaluation results with four different pretrained language models,
namely, BERT-base, BERT-large, RoBERTa (Liu et al. 2019), and ELECTRA (Clark et al.
2020). Although RoBERTa and ELECTRA are generally known to perform better than
BERT, the evaluation results showed no performance improvement.

Generality of SCN and JTF. The two proposed input manipulation methods, SCN and JTF, are applied to IRNet (Guo et al. 2019) to test their generality. We downloaded the source code from the authors' homepage3 and trained the model to obtain the dev set accuracy. Evaluation results are shown in Table 12. The performance improvements due to the two input manipulation methods were negligible; because IRNet has a separate schema linking preprocessing module, whose purpose is to link columns with question tokens, the roles of SCN and JTF are greatly reduced.

6.3 Evaluation on Different Data Sets

We conducted experiments on the WikiSQL (Zhong, Xiong, and Socher 2017) and CSpider (Min, Shi, and Zhang 2019) data sets to test the generalization capability of the proposed model to new data sets.

3 https://github.com/microsoft/IRNet.

Table 12
Evaluations of IRNet with two input manipulation methods on dev set.

System   SCN   JTF   Dev
IRNet     O     O    52.3%
IRNet     O     X    52.9%
IRNet     X     O    52.3%
IRNet     X     X    52.2%4

Table 13
Evaluation results on the WikiSQL benchmark with other state-of-the-art systems.

System                        Dev LF   Dev X   Test LF   Test X
SQLova (Hwang et al. 2019)    81.6%    87.2%   80.7%     86.2%
X-SQL (He et al. 2018)        83.8%    89.5%   83.3%     88.7%
Guo and Gao (2019)            84.3%    90.3%   83.7%     89.2%
HydraNet (Lyu et al. 2020)    83.6%    89.1%   83.8%     89.2%
RYANSQL + BERT                81.6%    87.7%   81.3%     87.0%

Table 13 compares the proposed RYANSQL + BERT with other WikiSQL state-of-the-art systems. Only the systems without execution-guided decoding (EGD) (Wang et al. 2018) are compared, since EGD makes use of the database content. As can be observed from the table, the proposed RYANSQL + BERT shows results comparable to the other WikiSQL state-of-the-art systems.

Next, we evaluated the proposed models on the CSpider data set. CSpider (Min, Shi, and Zhang 2019) is a Chinese-translated version of the Spider benchmark. Only the questions of the Spider data set are translated; database table names and column names remain in English. Evaluation on the CSpider data set shows whether the proposed model can be applied to different languages, even when the question language and database schema language differ. To handle this case, we used multilingual BERT, which has the same network architecture as BERT-base but is trained on a multilingual corpus. Table 14 shows the comparisons between the proposed system and other state-of-the-art systems on the leaderboard. Compared to the 51.4% exact matching accuracy of RYANSQL + BERT-base on the Spider data set, the multilingual version shows 10% lower accuracy on the dev set, but it still shows results comparable to other state-of-the-art systems designed for the CSpider data set. Our proposed system achieved 34.7% accuracy on the test set and ranked 2nd on the leaderboard.

4 We re-trained the IRNet model using the authors' source code with the Spider data set. We were not able to obtain the authors' reported 53.3% accuracy on the dev set, and it turns out that the preprocessed Spider data set on the authors' homepage and the one generated by the source code script are different. Since we need to preprocess the data using the source code to apply the input manipulation methods, we present the dev set accuracy of our re-trained IRNet model, not the one reported in the authors' paper.


Table 14
Evaluation results on the CSpider data set5 with other state-of-the-art systems.

System                               Dev     Test
SyntaxSQLNet (Yu et al. 2018b)       16.4%   13.3%
CN-SQL (Anonymous)                   22.9%   18.8%
DG-SQL (Anonymous)                   35.5%   26.8%
XL-SQL (Anonymous)                   54.9%   47.8%
RYANSQL + Multilingual BERT (Ours)   41.3%   34.7%

Table 15
Exact matching accuracy of the proposed system on the Spider dev set, with different ranges of overlap scores.

Overlap Score         Accuracy   Examples
0.0 ≤ O(Q, S) ≤ 0.2   32.5%      40
0.2 < O(Q, S) ≤ 0.4   47.3%      74
0.4 < O(Q, S) ≤ 0.6   46.7%      182
0.6 < O(Q, S) ≤ 0.8   67.0%      282
0.8 < O(Q, S) ≤ 1.0   80.5%      456
Total                 66.6%      1,034

6.4 Error Analysis

We analyzed 345 failed examples of RYANSQL + BERT on the development set and were able to categorize 195 of those examples by failure type. The most common cause of failure is column selection failure; 68 of the 195 cases (34.9%) suffered from this error. In many of these cases, the correct column name is not mentioned in the question; for example, for the question "What is the airport name for airport 'AKO'?", the decoder chooses the column AirportName instead of AirportCode as its WHERE clause condition column. As mentioned in Yavuz et al. (2018), cell value examples for each column would help solve this problem.

The second most frequent error is table number classification error; 49 of the 195 cases (25.2%) belong to this category. The decoder occasionally chooses too many tables for the FROM clause, resulting in unnecessary table JOINs. Similarly, 22 of the 195 cases (11.3%) were due to condition number classification error. These errors could be handled by observing and updating the extracted slot values as a whole; for example, for the user question "List the maximum weight and type for each type of pet." the system generates the SQL query "SELECT PetType, max(weight), weight FROM Pets GROUP BY PetType." If the system could observe the extracted slot values as a whole, it would figure out that extracting weight and max(weight) together for the SELECT clause is unlikely. Our future work will mainly focus on solving this issue.

5 https://taolusi.github.io/CSpider-explorer/, as of July 2020.
Figure 3
Relation between the number of foreign key relations in the database and exact matching accuracy of the queries on dev set.

The remaining 150 errors were hard to classify into one category, and some of them were due to different representations of the same meaning, for example: "SELECT max(age) FROM Dogs" vs. "SELECT age FROM Dogs ORDER BY age DESC LIMIT 1."

Next, we tried to see whether the proposed model could handle user questions whose words differ from database column and table names. We define an overlap score O(Q, S) = size(w(Q) ∩ w(S)) / size(w(S)) between a user question Q and its SQL translation S. In the equation, w(Q) is the set of stemmed words in Q, and w(S) is the set of stemmed words from the column names and table names used in S. Intuitively, the score measures how much overlap exists between the column/table names of the SQL query S and the user question Q.

Overlap scores are calculated for the question-SQL query pairs in the Spider dev set. The data set is divided into five categories based on the calculated overlap scores; Table 15 shows the exact matching accuracies of the proposed RYANSQL for those categories. As can be seen from the table, the proposed system shows relatively low performance on the examples with low overlap scores. This suggests one limitation of the proposed system: Even with the aid of pretrained language models, the system frequently fails to link question tokens to the database schema when their words differ. Better alignment methods between question tokens and the database schema should be studied in future work to further improve system performance.

Another limitation of the proposed model is that it does not use foreign keys during encoding; foreign keys are used only for JOIN table filtering. We analyzed the correlation between the number of foreign keys and exact matching accuracy in Figure 3 to determine the effect of this limitation.
The number of foreign keys and exact matching accuracy show a weak negative correlation, with Pearson correlation coefficient ρ = −0.22. Based on this analysis, in future work we will try to integrate the foreign keys into the encoding process, for example, by using the relation-aware transformer proposed by Wang et al. (2020), to further improve the proposed model.

7. Conclusion

In this article, we proposed a sketch-based slot-filling algorithm for complex, cross-domain Text-to-SQL problems. A detailed sketch for complex SELECT statement prediction is proposed, along with the Statement Position Code to handle nested queries. Two simple but effective input manipulation methods are additionally proposed to further enhance overall system performance. The system achieved 3rd place among all systems and 1st place among the systems not using database content on the challenging Spider benchmark data set.

Based on the error analysis results, as a next step we will focus on globally updating the extracted slots by considering the slot prediction values as a whole. The analysis results also show the need to encode the relational structure of the database schema, for example, foreign keys, to improve performance. We will also work on a method to effectively use database content instead of only the database schema, to further improve system performance in cases where database content is available.

References

Abadi, Martín, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S.
Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Man, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Vigas, Oriol Vinyals, PeteWarden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. Computing Research Repository, arXiv:1607.06450. Beck, Thorsten, Asli Demirguc-Kunt, and Ross Levine. 2000. A new database on the structure and development of the financial sector. The World Bank Economic Review, 14(3):597–605. https://doi.org/10.1093 /wber/14.3.597 Bogin, Ben, Matt Gardner, and Jonathan Berant. 2019. Global reasoning over database structures for text-to-SQL parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3650–3655. Hong Kong. 330 Clark, Kevin, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. Computing Research Repository, arXiv:2003.10555. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 4171–4186. Minneapolis, MN. Dong, Li and Mirella Lapata. 2016. Language to logical form with neural attention. 
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 33–43. Berlin. https://doi .org/10.18653/v1/P16-1004 Dong, Li and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 731–742. Melbourne. https://doi.org/10.18653 /v1/P18-1068 Guo, Jiaqi, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. Computing Research Repository, arXiv:1905 .082057. Guo, Tong and Huilin Gao. 2019. Content enhanced BERT-based text-to-SQL generation. Computing Research Repository, arXiv:1910.07179. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 2 3 0 9 1 9 3 0 9 8 3 / c o l i _ a _ 0 0 4 0 3 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Choi et al. RYANSQL He, Pengcheng, Yi Mao, Kaushik Chakrabarti, and Weizhu Chen. 2018. X Reinforce schema representation with context. Computing Research Repository, arXiv:1808.073837. Hillestad, Richard, James Bigelow, Anthony Bower, Federico Girosi, Robin Meili, Richard Scoville, and Roger Taylor. 2005. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Affairs. 24(5):1103–1117. https://doi.org /10.1377/hlthaff.24.5.1103, PubMed: 16162551 Hu, Minghao, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, and Ming Zhou. 2018. Reinforced mnemonic reader for machine reading comprehension. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4099–4106. Stockholm. Hwang, Wonseok, Jinyeong Yim, Seunghyun Park, and Minjoon Seo. 2019. A comprehensive exploration on WikiSQL with table-aware word contextualization. Computing Research Repository, arXiv:1902.01069. Lee, Dongjun. 2019. 
Clause-wise and recursive decoding for complex and cross-domain text-to-SQL generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6047–6053, Hong Kong. https://doi.org/10.18653 /v1/D19-1624 Lei, Wenqiang, Weixin Wang, Zhixin Ma, Tian Gan, Wei Lu, Min-Yen Kan, and Tat-Seng Chua. 2020. Re-examining the role of schema linking in text-to-SQL. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 6943–6954. Online. Lin, Kevin, Ben Bogin, Mark Neumann, Jonathan Berant, and Matt Gardner. 2019. Grammar-based neural text-to-SQL generation. Computing Research Repository, arXiv:1905.13326. Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. Computing Research Repository. arXiv:1907.11692. Lyu, Qin, Kaushik Chakrabarti, Shobhit Hathi, Souvik Kundu, Jianwen Zhang, and Zheng Chen. 2020. Hybrid ranking network for text-to-SQL, Microsoft Dynamics 365 AI. Min, Qingkai, Yuefeng Shi, and Yue Zhang. 2019. A pilot study for Chinese SQL semantic parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3652–3658. Hong Kong. https://doi.org/10.18653/v1/D19-1377 Mou, Lili, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, and Zhi Jin. 2016. Natural language inference by tree-based convolution and heuristic matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 130–136. Berlin. https://doi.org /10.18653/v1/P16-2022 Ngai, Eric WT, Li Xiu, and Dorothy C. K. Chau. 2009. Application of data mining techniques in customer relationship management: A literature review and classification. 
Expert Systems with Applications, 36(2):2592–2602. https://doi.org/10.1016/j.eswa.2008.02.021

Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543. Doha. https://doi.org/10.3115/v1/D14-1162

Peters, Matthew, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 2227–2237. New Orleans, LA. https://doi.org/10.18653/v1/N18-1202

Price, P. J. 1990. Evaluation of spoken language systems: The ATIS domain. In HLT '90: Proceedings of the Workshop on Speech and Natural Language, pages 91–95. Hidden Valley, PA. https://doi.org/10.3115/116580.116612

Shi, Tianze, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, and Weizhu Chen. 2018. IncSQL: Training incremental text-to-SQL parsers with non-deterministic oracles. Computing Research Repository, arXiv:1809.05054.

Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Computational Linguistics Volume 47, Number 2

Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. 2015. Training very deep networks. In Advances in Neural Information Processing Systems, pages 2377–2385.

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.
Advances in Neural Information Processing Systems, 30:5998–6008.

Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.

Wang, Bailin, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578. Online. https://doi.org/10.18653/v1/2020.acl-main.677

Wang, Chenglong, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. 2018. Robust text-to-SQL generation with execution-guided decoding. Computing Research Repository, arXiv:1807.03100.

Xu, Xiaojun, Chang Liu, and Dawn Song. 2017. SQLNet: Generating structured queries from natural language without reinforcement learning. Computing Research Repository, arXiv:1711.04436.

Yavuz, Semih, Izzeddin Gur, Yu Su, and Xifeng Yan. 2018. What it takes to achieve 100% condition accuracy on WikiSQL. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1702–1711. Brussels. https://doi.org/10.18653/v1/D18-1197

Yoon, Deunsol, Dongbok Lee, and Sangkeun Lee. 2018. Dynamic self-attention: Computing attention over words dynamically for sentence embedding. Computing Research Repository, arXiv:1808.07383.

Yu, Tao, Zifan Li, Zilin Zhang, Rui Zhang, and Dragomir Radev. 2018a. TypeSQL: Knowledge-based type-aware neural text-to-SQL generation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 588–594. New Orleans, LA.

Yu, Tao, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. 2018b. SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task.
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1653–1663. Brussels. https://doi.org/10.18653/v1/D18-1193

Yu, Tao, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018c. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921. Brussels. https://doi.org/10.18653/v1/D18-1425

Zelle, John Marvin and Raymond Joseph Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, volume 2, pages 1050–1055. Portland, OR.

Zhang, Rui, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-based SQL query generation for cross-domain context-dependent questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5341–5352. Hong Kong.

Zhong, Victor, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. Computing Research Repository, arXiv:1709.00103.