Strong Equivalence of TAG and CCG

Lena Katharina Schiffer and Andreas Maletti

Faculty of Mathematics and Computer Science, Universität Leipzig, Germany
P.O. Box 100 920, D-04009 Leipzig, Germany
{schiffer,maletti}@informatik.uni-leipzig.de

Abstract

Tree-adjoining grammar (TAG) and combina-
tory categorial grammar (CCG) are two well-
established mildly context-sensitive grammar
formalisms that are known to have the same
expressive power on strings (i.e., generate the
same class of string languages). It is demon-
strated that their expressive power on trees
also essentially coincides. In fact, CCG with-
out lexicon entries for the empty string and
only first-order rules of degree at most 2 are
sufficient for its full expressive power.

1 Introduction

Combinatory categorial grammar (CCG) (Steedman, 2000; Steedman and Baldridge, 2011) is one of several grammar formalisms that were introduced as an extension of context-free grammars. In particular, CCG extends the classical categorial grammar (Bar-Hillel et al., 1960), which has the same expressivity as context-free grammar, by rules that are inspired by combinatory logic (Curry et al., 1958). CCG is a mildly context-sensitive grammar formalism (Joshi, 1985). Mildly context-sensitive grammar formalisms are efficiently parsable (i.e., in polynomial time) and have expressivity beyond the context-free languages. They are able to express a limited
amount of cross-serial dependencies and have the
constant growth property. Because of these fea-
tures and its notion of syntactic categories, which
is quite intuitive for natural languages, CCG has
become widely applied in computational linguistics (Steedman, 2000). Further, it can be enhanced
by semantics through the lambda calculus.

CCG is based on a lexicon and a rule system.
The lexicon assigns syntactic categories to the
symbols of an input string and the rule system
describes how neighboring categories can be
combined to new categories. Each category has

a target, which is similar to the return type of a
function, and optionally, a number of arguments.
Different from functions, each argument has a
directionality that indicates if it is expected on the
left or the right side. If repeated combination of
categories leads to a (binary) derivation tree that
comprises all input symbols and is rooted in an
initial category, then the input string is accepted.
When defining CCG, there are many degrees of
freedom yielding a number of different variants
(Steedman, 2000; Baldridge, 2002; Steedman and
Baldridge, 2011; Kuhlmann et al., 2015). This
is a consequence of the linguistically motivated
need to easily express specific structures that have
been identified in a particular theory of syntax
for a given natural language. However, we and
others (Kuhlmann et al., 2015) are interested in
the expressive power of CCGs as generators of
formal languages, since this allows us to disentan-
gle the confusion of subtly different formalisms
and identify the principal structures expressible
by a common core of the formalisms. As linguis-
tic structure calls for a representation that goes
beyond strings, we aim for a characterization of
expressive power in terms of the generated trees.
The most famous result on the expressive power of CCG is by Vijay-Shanker and Weir (1994), showing that tree-adjoining grammar (TAG), linear-indexed grammar (LIG), head grammar (HG), and CCG generate the same string languages. An equivalent automaton model is the embedded push-down automaton (Vijay-Shanker, 1988). In the definition of CCG used by Vijay-Shanker and Weir (1994), the lexicon allows
ε-entries, which assign syntactic categories to the
empty string ε. Their rule system restricts rules
to specific categories and limits the rule degree.
CCG with unbounded rule degree are Turing-
complete (Kuhlmann et al., 2018). Prefix-closed
CCG without target restrictions, in which the rules
obey special closure properties, are less powerful.

Transactions of the Association for Computational Linguistics, vol. 9, pp. 707–720, 2021. https://doi.org/10.1162/tacl_a_00393
Action Editor: Mark Steedman. Submission batch: 9/2020; Revision batch: 12/2020; Published 8/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

This even holds for multimodal CCGs (Kuhlmann
et al., 2010, 2015), which allow many types of
directionality indicators (i.e., slashes).

When going beyond the level of string lan-
guages, there exist different notions of strong
generative power. We consider two formalisms as
strongly equivalent if their generated derivation
tree languages coincide modulo relabelings. For
example, the well-known local and regular tree
grammars (Gécseg and Steinby, 1997) are strongly
equivalent. On the other hand, Hockenmaier and
Young (2008) regard two formalisms as strongly
equivalent if they capture the same sets of depen-
dencies. Then there exist specific scrambling
cases whose dependencies can be expressed by
their CCG, but not by Lexicalized TAG (LTAG).
Their CCG are syntactically more expressive than
ours and allow type-raising, whereas the strong
generative capacity (in our sense) of LTAG is
strictly smaller than that of TAG (Kuhlmann and
Satta, 2012). The dependencies expressed by CCG
without rule restrictions and TAG are shown to be
incomparable by Koller and Kuhlmann (2009).

Returning to our notion of strong generative
capacity, Kuhlmann et al. (2019) investigated the
tree-generative capacity of CCG without ε-entries.
The generated trees are always binary. CCG
with application and first-degree composition
rules generate exactly the regular tree languages
(Gécseg and Steinby, 1997). Without the composition rules, only a proper subset can be generated. The languages of CCG rule trees (i.e., trees labeled by applied rules instead of categories) with bounded rule degree can also be generated by simple monadic context-free tree grammar (sCFTG).
For the converse direction, we show that the
tree languages generated by sCFTG can also be
generated by CCG, which shows strong equiva-
lence. This answers several open questions. Since
sCFTG and TAG are strongly equivalent (Kepser
and Rogers, 2011), our result also shows strong
equivalence of CCG and TAG. In contrast to the
construction of Vijay-Shanker and Weir (1994),
which relies heavily on ε-entries, our construction
avoids them and shows that they do not increase
the expressive power of CCG. Additionally, we
only use rules up to degree 2 and first-order cat-
egories (i.e., arguments are atomic), which shows
that larger rule degree or higher-order categories
do not increase the expressive power.

Our construction proceeds roughly as follows.
We begin with a spine grammar, which is a variant

of sCFTG that is also strongly equivalent to TAG.
We encode its spines using a context-free gram-
mar, which in turn can be represented by a special
variant of push-down automata. Finally, the runs of the push-down automaton are simulated by a CCG such that the stack operations of the automaton are realized by adding and removing arguments of the categories.

2 Preliminaries

The nonnegative integers are N. For every k ∈ N,
we let [k] = {i ∈ N | 1 ≤ i ≤ k}. The set Σ∗
contains all strings over the finite set Σ including
the empty string ε. We let Σ+ = Σ∗ \ {ε}. The
length of w ∈ Σ∗ is |w|, and concatenation is
written as juxtaposition. The prefixes Pref(w) of a
string w ∈ Σ∗ are {u ∈ Σ∗ | ∃v ∈ Σ∗ : w = uv}.
A string language is a subset L ⊆ Σ∗. Given a
relation ⇒ ⊆ S2, we let ⇒∗ be the reflexive,
transitive closure of ⇒.

2.1 Tree Languages

In this paper, we only deal with binary trees since
the derivation trees of CCGs are binary. Thus, we
build trees over ranked sets Σ = Σ0 ∪ Σ1 ∪ Σ2.
If Σ is an alphabet, then it is a ranked alphabet.
For every k ∈ {0, 1, 2}, we say that symbol
a ∈ Σk has rank k. We write TΣ2, Σ1(Σ0) for the
set of all trees over Σ, which is the smallest set T
such that c(t1, . . . , tk) ∈ T for all k ∈ {0, 1, 2},
c ∈ Σk, and t1, . . . , tk ∈ T . As usual, we write
just a for leaves a() with a ∈ Σ0. A tree language
is a subset T ⊆ TΣ2,∅(Σ0). Let T = TΣ2,Σ1(Σ0).
The map pos : T → P₊([2]∗) assigns Gorn tree addresses (Gorn, 1965) to a tree, where P₊(S) is the set of all nonempty subsets of S. Let

    pos(c(t1, . . . , tk)) = {ε} ∪ ⋃_{i ∈ [k]} { iw | w ∈ pos(ti) }

for all k ∈ {0, 1, 2}, c ∈ Σk and t1, . . . , tk ∈ T. The set of all leaf positions of t is defined as leaves(t) = { w ∈ pos(t) | w1 ∉ pos(t) }. Given
a tree t ∈ T and a position w ∈ pos(t), we
write t|w and t(w) to denote the subtree rooted
in w and the symbol at w, respectively. Additionally, we let t[t′]w be the tree obtained when replacing the subtree appearing in t at position w by t′ ∈ T. Finally, let yield : T → Σ0⁺ be defined by yield(a) = a for all a ∈ Σ0 and yield(c(t1, . . . , tk)) = yield(t1) · · · yield(tk) for all k ∈ [2], c ∈ Σk, and t1, . . . , tk ∈ T.
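For illustration (our own encoding, not part of the formal development), the maps pos, leaves, and yield can be sketched in Python with binary trees written as nested tuples (symbol, child1, child2), where a leaf a is the one-element tuple ("a",):

```python
# Binary trees encoded as (symbol, children...); a leaf "a" is ("a",).
def pos(t):
    """Gorn addresses of t as strings over {'1','2'}; the root address is ''."""
    addresses = {""}
    for i, child in enumerate(t[1:], start=1):
        addresses |= {str(i) + w for w in pos(child)}
    return addresses

def leaves(t):
    """Positions w of t such that w·1 is not a position of t."""
    p = pos(t)
    return {w for w in p if w + "1" not in p}

def subtree(t, w):
    """t|_w : the subtree of t rooted at position w."""
    return t if w == "" else subtree(t[int(w[0])], w[1:])

def yield_(t):
    """Left-to-right sequence of the leaf symbols of t."""
    if len(t) == 1:
        return (t[0],)
    return tuple(s for child in t[1:] for s in yield_(child))

t = ("c", ("a",), ("d", ("b",), ("a",)))
print(sorted(pos(t)))    # ['', '1', '2', '21', '22']
print(yield_(t))         # ('a', 'b', 'a')
```

The string addresses directly mirror the Gorn addressing over [2]∗ used in the paper.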


The special leaf symbol □ is reserved and represents a hole in a tree. The set CΣ2,Σ1(Σ0) of contexts contains all trees of TΣ2,Σ1(Σ0 ∪ {□}), in which □ occurs exactly once. We write pos□(C) to denote the unique position of □ in the context C ∈ CΣ2,Σ1(Σ0). Moreover, given t ∈ T we simply write C[t] instead of C[t]pos□(C).

A tuple (ρ0, ρ1, ρ2) is called a relabeling if ρk : Σk → Δk for all k ∈ {0, 1, 2} and ranked set Δ. It induces the map ρ : T → TΔ2,Δ1(Δ0) given by ρ(c(t1, . . . , tk)) = ρk(c)(ρ(t1), . . . , ρ(tk)) for all k ∈ {0, 1, 2}, c ∈ Σk and t1, . . . , tk ∈ T.

2.2 Combinatory Categorial Grammar

In the following, we give a short introduction to
CCG. Given an alphabet A of atoms or atomic
categories and a set of slashes D = {/, \}
indicating directionality, the set of categories is
defined as C(A) = TD,∅(A). We usually write the
categories in infix notation and the slashes are
left-associative by convention, so each category
takes the form c = a |1 c1 · · · |k ck where a ∈ A,
|i ∈ D, ci ∈ C(A) for all i ∈ {1, . . . , k}. The
atom a is called the target of c and written as tar(c).
The slash-category pairs |i ci are called arguments
and their number k is called the arity of c and
denoted by ar(c). In addition, we write arg(c, i)
to get the i-th argument |i ci of c. In the sense
of trees, the sequence of arguments is a context
(cid:2) |1 c1 · · · |k ck. The set of argument contexts is
denoted by A(A) ⊆ CD,∅(A). We distinguish
between two types of categories. In first-order
categories, all arguments are atomic, whereas in
higher-order categories, the arguments can have
arguments themselves.
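The notions of target, arguments, and arity can be sketched as follows; the tuple encoding (target, arguments) with plain strings as atoms is our own illustration, not notation from the paper:

```python
# Categories as (target, arguments): each argument is (slash, category),
# atoms are plain strings.
def tar(c):                       # target of a category (an atom)
    return c if isinstance(c, str) else c[0]

def args(c):                      # sequence of arguments |1 c1 ... |k ck
    return () if isinstance(c, str) else c[1]

def ar(c):                        # arity = number of arguments
    return len(args(c))

def arg(c, i):                    # the i-th argument (1-based), as in arg(c, i)
    return args(c)[i - 1]

def first_order(c):               # first-order: all arguments are atomic
    return all(isinstance(d, str) for _, d in args(c))

def show(c):                      # infix notation with left-associative slashes
    if isinstance(c, str):
        return c
    return tar(c) + "".join(
        s + (show(d) if isinstance(d, str) else "(" + show(d) + ")")
        for s, d in args(c))

c2 = ("c", (("\\", "a"), ("/", "c")))     # the category c\a/c
print(show(c2), tar(c2), ar(c2))          # c\a/c c 2
```

Higher-order categories simply nest such tuples inside the argument slots.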

Next, we describe how 2 neighboring categories
can be combined. Intuitively, the direction of
the slash determines on which side a category
matching the argument is expected. Hence there
are 2 types of rules. Despite the conventions for
inference systems, we put the inputs (premises)
below and the output (conclusion) above to make
the shape of the proof tree apparent. A rule of
degree k with k ∈ N has one of the following
forms:

    ax |1 c1 · · · |k ck
    -----------------------------
    ax/c      c |1 c1 · · · |k ck         (forward rule)

    ax |1 c1 · · · |k ck
    -----------------------------
    c |1 c1 · · · |k ck      ax\c         (backward rule)

where a ∈ A, c ∈ C(A) ∪ {y}, |i ∈ D, and
ci ∈ C(A)∪{yi} for all i ∈ [k]. Here, y, y1, . . . , yk
are category variables that can match any category
in C(A) and x is an argument context variable that
can match any argument context in A(A). The
category taking the argument (ax | c with | ∈ D)
is called primary category, the one providing
it (c |1 c1 · · · |k ck) is called secondary category,
and they are combined to an output category
(ax |1 c1 · · · |k ck). Given rule r, we write sec(r) to
refer to the secondary category. Rules of degree 0
will be referred to as application rules, while rules
of higher degree are composition rules. We write
R(A) for the set of all rules over A. A rule system
is a pair Π = (A, R), where A is an alphabet
and R ⊆ R(A) is a finite set of rules over A.
Given a rule r ∈ R, we obtain a ground instance
of it by replacing the variables {y, y1, . . . } by
concrete categories and the variable x by a
concrete argument context. The ground instances
of Π induce a relation →Π ⊆ C(A)² × C(A), and we write c′′ above c c′ (in the proof-tree notation, annotated with Π) instead of (c, c′) →Π c′′. The relation →Π extends to a relation ⇒Π ⊆ (C(A)∗)² on sequences of categories. It is given by

    ⇒Π = ⋃_{ϕ,ψ ∈ C(A)∗} { (ϕ c c′ ψ, ϕ c′′ ψ) | (c, c′) →Π c′′ }
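A ground instance of a forward or backward rule can be sketched as follows. The (target, arguments) tuple encoding is our own, and the sketch sidesteps the variables x, y, yi by combining two concrete categories directly; it is an illustration, not the paper's construction:

```python
# Categories as (target, arguments); atoms may be written as bare strings.
def norm(c):
    """View an atom a as the category (a, ())."""
    return (c, ()) if isinstance(c, str) else c

def combine(left, right, max_degree=2):
    """All outputs of a forward rule (primary on the left, last slash /)
    or a backward rule (primary on the right, last slash \\) whose degree
    is at most max_degree."""
    out = []
    for primary, secondary, slash in ((left, right, "/"), (right, left, "\\")):
        pt, pargs = norm(primary)
        st, sargs = norm(secondary)
        if not pargs or pargs[-1][0] != slash:
            continue                          # primary must end in | c
        c = norm(pargs[-1][1])                # the category c the primary expects
        for k in range(min(max_degree, len(sargs)) + 1):
            # secondary must be c followed by k arguments |1 c1 ... |k ck
            if c == (st, sargs[:len(sargs) - k]):
                out.append((pt, pargs[:-1] + sargs[len(sargs) - k:]))
    return out

# Primary ⊥\a/c combined with secondary c\a/c by a degree-2 forward rule.
p = ("⊥", (("\\", "a"), ("/", "c")))
s = ("c", (("\\", "a"), ("/", "c")))
print(combine(p, s))    # [('⊥', (('\\', 'a'), ('\\', 'a'), ('/', 'c')))]
```

The output category carries over the primary's remaining arguments followed by the k trailing arguments of the secondary, matching the rule shapes above.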

A combinatory categorial grammar (CCG) is
a tuple G = (Σ, A, R, I, L) that consists of an
alphabet Σ of input symbols, a rule system (A, R),
a set I ⊆ A of initial categories, and a finite
relation L ⊆ Σ × C(A) called lexicon. It is called
k-CCG if each rule r ∈ R has degree at most k,
where k ∈ N.

The CCG G generates the category sequences CG ⊆ C(A)∗ and the string language L(G) ⊆ Σ∗ given by

    CG = ⋃_{a0 ∈ I} { ϕ ∈ C(A)∗ | ϕ ⇒∗(A,R) a0 }

and L(G) = L⁻¹(CG), where the string language L(G) contains all strings that can be relabeled via the lexicon to a category sequence in CG. A tree t ∈ TC(A),∅(L(Σ)) is called a derivation tree of G if (t(w·1), t(w·2)) →(A,R) t(w) for every w ∈ pos(t) \ leaves(t). We denote the set of all derivation trees of G by D(G).
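The acceptance condition (a sequence of lexicon categories reducing to an initial category) can be checked bottom-up in CKY fashion. The sketch below handles application rules only (degree 0) and writes first-order categories as flat strings with single-character atoms; these simplifications are our own and it is not the parser used in the paper:

```python
from itertools import product

def apply_rules(x, y):
    """Degree-0 combination of flat first-order categories such as 'b/a';
    atoms here are single characters, so string matching suffices."""
    out = set()
    if x.endswith("/" + y):
        out.add(x[: -len(y) - 1])     # forward application
    if y.endswith("\\" + x):
        out.add(y[: -len(x) - 1])     # backward application
    return out

def accepts(categories, initial):
    """CKY-style check whether a sequence of lexicon categories can be
    combined, two neighbors at a time, into an initial category."""
    n = len(categories)
    table = {(i, i + 1): {categories[i]} for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for m in range(i + 1, i + span):
                for x, y in product(table[(i, m)], table[(m, i + span)]):
                    cell |= apply_rules(x, y)
            table[(i, i + span)] = cell
    return initial in table[(0, n)]

print(accepts(["a", "s\\a/b", "b"], "s"))   # True
```

Every successful table entry corresponds to the root of a binary derivation tree over the chosen span.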

A category relabeling ρ : C(A) → Δ is a relabeling such that ρ(c) = ρ(c′) for all categories c, c′ ∈ C(A) with tar(c) = tar(c′)

geared towards our needs and still accepts the
context-free languages (of strings of length ≥ 2).
It will be similar to the push-down Moore ma-
chines of Decker et al. (2013). Instead of pro-
cessing input symbols as part of transitions (as
in Mealy machines), Moore machines output a
unique input symbol in each state (Fleischner,
1977). For every set Γ, we let Γ≤1 = {ε} ∪ Γ
and Γ≥2 = {w ∈ Γ∗ | 2 ≤ |w|} be the sets of
strings over Γ of length at most 1 and at least 2,
respectively.

Definition 2. A Moore push-down automaton (MPDA) A = (Q, Σ, Γ, δ, τ, I, F) consists of (i) finite sets Q, Σ, and Γ of states, input symbols, and stack symbols, respectively, (ii) a set δ ⊆ (Q × Γ≤1 × Γ≤1 × Q) \ (Q × Γ × Γ × Q) of transitions, (iii) an output function τ : Q → Σ, and (iv) sets I, F ⊆ Q of initial and final states, respectively.

Due to the definition of δ, in a single step we can
either push or pop a single stack symbol or ignore
the stack. Note that we explicitly exclude the case
where a symbol is popped and another symbol
is pushed at the same time. In the following, let
A = (Q, Σ, Γ, δ, τ, I, F) be an MPDA. On the set ConfA = Q × Γ∗ of configurations of A the move relation ⊢A ⊆ ConfA² is

    ⊢A = ⋃_{(q,γ,γ′,q′) ∈ δ} { (⟨q, γα⟩, ⟨q′, γ′α⟩) | α ∈ Γ∗ }

and a configuration ⟨q, α⟩ ∈ ConfA is initial (respectively, final) if q ∈ I and α ∈ Γ (respectively, q ∈ F and α = ε). An accepting run
is a sequence ξ0, . . . , ξn ∈ ConfA of configura-
tions that are successively related by moves (i.e.,
ξi−1 ⊢A ξi for all i ∈ [n]), starts with an initial
configuration ξ0, and finishes in a final configura-
tion ξn. In other words, we can start in an initial
state with an arbitrary symbol on the stack and
finish in a final state with the empty stack, and for
each intermediate step there has to exist a transi-
tion. The language accepted by A contains exactly
those strings w ∈ Σ∗, for which there exists
an accepting run ⟨q0, α0⟩, . . . , ⟨qn, αn⟩ such that
w = τ (q0) · · · τ (qn). Thus, we accept the strings
that are output symbol-by-symbol by the states
attained during an accepting run. As usual, two
MPDA are equivalent if they accept the same lan-
guage. Since no initial configuration is final, each
accepting run has length at least 2, so we can only
accept strings of length at least 2. While we could


Figure 1: CCG derivation tree (see Example 2.2).

and arg(c, ar(c)) = arg(c′, ar(c′)). The relabeled derivation trees Tρ(G) ⊆ TΔ2,∅(Δ0) are given by

    Tρ(G) = { ρ(t) | t ∈ D(G), t(ε) ∈ I }.

A tree language T ⊆ TΔ2,∅(Δ0) is generatable by G if there is a category relabeling ρ′ : C(A) → Δ such that T = Tρ′(G).
Example 1. Let G = (Σ, A, R(A, 2), {⊥}, L)
with Σ = {α, β, γ, δ} and A = {⊥, a, b, c, d, e}
be a CCG with the lexicon L given below, where
R(A, 2) is the set of all rules over A up to degree 2.
Thus, it is a 2-CCG.

L(α) = {a, b}
L(β) = {c\b, c\b\e, e, e/e}
L(γ) = {d/c, c\a/c}
L(δ) = {⊥/d}

The rule

    ⊥x\a/c
    ---------------
    ⊥x/c     c\a/c

is a forward rule of degree 2 in R(A, 2), where x is an argument context and can thus be replaced by an arbitrary sequence of arguments. Utilizing x = □\a yields the ground instance

    ⊥\a\a/c
    ----------------
    ⊥\a/c     c\a/c

which has primary category c1 = ⊥\a/c and secondary category c2 = c\a/c. The latter has target tar(c2) = c and the two arguments \a and /c, so its arity is ar(c2) = 2.

A derivation tree of G is depicted in Figure 1.
We start at the bottom with categories taken from
the lexicon in accordance with the input symbols.
Then neighboring categories are combined until
we arrive at the root with initial category ⊥, so the input word is accepted.

3 Push-down Automata

We start by introducing a Moore variant of push-
down automata (Autebert et al., 1997) that is

adjust the model to remove this restriction, the
presented version serves our later purposes best.
Theorem 3. MPDA accept the context-free languages of strings of length at least 2.
The MPDA A is pop-normalized if there exists
a map pop : Γ → Q such that q(cid:12) = pop(γ) for
every transition (q, γ, ε, q(cid:12)) ∈ δ. In other words,
for each stack symbol γ ∈ Γ there is a unique
state pop(γ) that the MPDA enters whenever γ is
popped from the stack.
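The acceptance behavior of an MPDA can be sketched by a brute-force exploration of runs. The example automaton (the state names qa, qb and stack symbol X are our own) outputs aⁿbⁿ and is pop-normalized, since every transition popping X enters qb:

```python
def mpda_accepts(w, delta, tau, initial, final, stack_syms):
    """Explore runs of an MPDA: configurations are (state, stack) with the
    stack as a tuple (top first); the i-th configuration of a run must
    output the i-th symbol of w, and acceptance needs an empty final stack."""
    if not w:
        return False
    # start in an initial state with an arbitrary single stack symbol
    configs = {(q, (g,)) for q in initial for g in stack_syms if tau[q] == w[0]}
    for symbol in w[1:]:
        nxt = set()
        for q, stack in configs:
            for p, pop, push, r in delta:
                if p != q or tau[r] != symbol:
                    continue
                if pop == "" and push != "":
                    nxt.add((r, (push,) + stack))          # push a symbol
                elif pop != "" and push == "" and stack and stack[0] == pop:
                    nxt.add((r, stack[1:]))                # pop a symbol
                elif pop == "" and push == "":
                    nxt.add((r, stack))                    # ignore the stack
        configs = nxt
    return any(q in final and stack == () for q, stack in configs)

# qa outputs 'a' and pushes X; qb outputs 'b' and pops X: accepts a^n b^n, n >= 1.
delta = {("qa", "", "X", "qa"), ("qa", "X", "", "qb"), ("qb", "X", "", "qb")}
tau = {"qa": "a", "qb": "b"}
print(mpda_accepts("aaabbb", delta, tau, {"qa"}, {"qb"}, {"X"}))   # True
```

Note that, as in the definition, a single transition never pops and pushes at the same time.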

Later on, we will simulate the runs of an MPDA
in a CCG such that subsequent configurations are
represented by subsequent primary categories.
Popping transitions are modeled by removing the
last argument of a category. Thus, the target state
has to be stored in the previous argument. This
argument is added when the according pushing
transition is simulated, so at that point we already
have to be aware in which state the MPDA will
end up after popping the symbol again. This will
be explained in more detail in Section 7.

We can easily establish this property by stor-
ing a state in each stack symbol. Each pushing
transition is replaced by one variant for each state
(i.e., we guess a state when pushing), but when a
symbol is popped, this is only allowed if the state
stored in it coincides with the target state.
Lemma 4. For every MPDA we can construct an equivalent pop-normalized MPDA.

The next statement shows that we can provide
a form of look-ahead on the output. In each new
symbol we store the current as well as the next
output symbol. Standard techniques can be used
to prove the statement. We will briefly sketch why
this look-ahead is necessary. Before constructing
the CCG, the MPDA will be used to model a
spine grammar. The next output symbol of the
MPDA corresponds to the label of the parent node
along a so-called spine of a tree generated by the
spine grammar. From this parent node we can
determine the possible labels of its other child.
This information will be used in the CCG to
control which secondary categories are allowed as
neighboring combination partners.

Lemma 5. For every context-free language L ⊆ Σ∗ and ◁ ∉ Σ, the language Next(L) is context-free, where

    Next(L) = { ⟨σ2, σ1⟩ · · · ⟨σn, σn−1⟩ ⟨◁, σn⟩ | n ∈ N, σ1, . . . , σn ∈ Σ, σ1 · · · σn ∈ L }

Corollary 6. For every context-free language L ⊆ Σ≥2 there exists a pop-normalized MPDA A such that L(A) = Next(L).
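The per-string effect of the look-ahead transformation of Lemma 5 can be sketched directly; the marker string "&lt;end&gt;" below stands in for the fresh symbol outside Σ:

```python
END = "<end>"   # stands in for the fresh look-ahead marker not in Sigma

def next_string(s):
    """Pair every symbol with its successor: the i-th output symbol stores
    the (i+1)-th input symbol together with the i-th one; the final pair
    uses the end marker as look-ahead."""
    symbols = list(s)
    succ = symbols[1:] + [END]
    return [(nxt, cur) for nxt, cur in zip(succ, symbols)]

print(next_string("abc"))   # [('b', 'a'), ('c', 'b'), ('<end>', 'c')]
```

Applying next_string to every member of L yields Next(L), which stays context-free by standard closure constructions.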

4 Spine Grammars

Now we move on to representations of tree lan-
guages. We first recall context-free tree grammars
(Rounds, 1969), but only the monadic simple
variant (Kepser and Rogers, 2011).

Definition 7. A simple monadic context-free tree
grammar (sCFTG) is a tuple G = (N, Σ, S, P )
consisting of (i) disjoint ranked alphabets N and
Σ of nonterminal and terminal symbols with N =
N1 ∪ N0 and Σ1 = ∅, (ii) a nullary start nontermi-
nal S ∈ N0, and (iii) a finite set P ⊆ P0 ∪ P1 of
productions, where P0 = N0 × TΣ2,N1(N0 ∪ Σ0) and P1 = N1 × CΣ2,N1(N0 ∪ Σ0).
In the following let G = (N, Σ, S, P) be an sCFTG. We write (n, r) ∈ P simply as n → r. Given t, u ∈ TΣ2,N1(Σ0 ∪ N0) we let t ⇒G u if there exist (n → r) ∈ P and a position w ∈ pos(t) such that (i) t|w = n and u = t[r]w with n ∈ N0, or (ii) t|w = n(t′) and u = t[r[t′]]w with n ∈ N1 and t′ ∈ TΣ2,N1(Σ0 ∪ N0). The tree language T(G) generated by G is

    T(G) = { t ∈ TΣ2,∅(Σ0) | S ⇒∗G t }.

The sCFTG G′ is strongly equivalent to G if T(G) = T(G′), and it is weakly equivalent to G if yield(T(G)) = yield(T(G′)).
Spine grammars (Fujiyoshi and Kasai, 2000)
are a restriction on simple monadic context-free
tree grammars that remain equally expressive by
Lemma 5.4 of Fujiyoshi and Kasai (2000) modulo
relabelings. Let us clarify this result. Clearly, each
spine grammar is itself an sCFTG and for each
sCFTG G there exists a spine grammar G(cid:12) and a
relabeling ρ such that T (G) = {ρ(t) | t ∈ T (G(cid:12))}.
Although sCFTGs are more established, we elect
to utilize spine grammars because of their essential
notion of spines.
Definition 8. The sCFTG G is a spine grammar if there exists a map d : Σ2 → {1, 2} such that wi ∈ Pref(pos□(C)) with i = d(C(w)) for every production (n → C) ∈ P with n ∈ N1 and w ∈ Pref(pos□(C)) with C(w) ∈ Σ2.
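The condition of Definition 8 can be sketched as a check on a context: at every binary symbol on the path to the hole, the path must continue in the direction that d prescribes. The nested-tuple encoding and the hole marker are our own illustration:

```python
HOLE = "[]"   # stands in for the reserved hole symbol

def hole_path(c):
    """Gorn address of the unique hole in a context, or None if absent."""
    if c == HOLE:
        return ""
    for i, child in enumerate(c[1:], start=1):
        p = hole_path(child)
        if p is not None:
            return str(i) + p
    return None

def respects_spine_direction(c, d):
    """Check that at every binary symbol on the path to the hole the path
    continues in the direction given by the spine direction map d."""
    path = hole_path(c)
    node = c
    for step in path:
        if len(node) == 3 and d[node[0]] != int(step):   # binary symbol, off-spine
            return False
        node = node[int(step)]
    return True

d = {"alpha2": 2, "gamma2": 1}
C = ("alpha2", ("a",), ("gamma2", HOLE, ("c",)))   # spine goes right, then left
print(respects_spine_direction(C, d))              # True
```

A production right-hand side violates the spine property exactly when some binary symbol on the spine branches against its prescribed direction.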
Henceforth let G be a spine grammar with
map d : Σ2 → {1, 2}. Consider a production


(n → C) ∈ P with n ∈ N1. The spine of C is simply the path from the root of C to the unique occurrence pos□(C) of □. The special feature of a spine grammar is that the symbols along the spine indicate exactly in which direction the spine continues. Since only the binary terminal symbols offer branching, the special feature of spine grammars is the existence of a map d that tells us for each binary terminal symbol σ ∈ Σ2 whether the spine continues to the left, in which case d(σ) = 1, or to the right, in which case d(σ) = 2. This map d, called spine direction, applies to all instances of σ in all productions with spines. In the original definition of spine grammars (Fujiyoshi and Kasai, 2000, Definition 3.2), only nonterminal symbols have a spine direction. By creating copies of binary terminal symbols we can show that both variants are equivalent modulo relabelings.
Definition 9. Spine grammar G is in normal form if each (n → r) ∈ P is of the form (i) start: r = b(α) or r = α for some b ∈ N1 and α ∈ Σ0, (ii) chain: r = b1(b2(□)) for some b1, b2 ∈ N1, or (iii) terminal: r = σ(□, a) or r = σ(a, □) for some σ ∈ Σ2 and a ∈ N0 \ {S}.

In spine grammars in normal form, the initial
nonterminals are isolated and cannot occur on the
right-hand sides. The 3 production types of the
normal form are illustrated in Figure 2. Using
a single start production followed by a number
of chain and terminal productions, a nullary
nonterminal n can be rewritten to a tree t that
consists of a spine of terminals, where each non-
spinal child is a nullary nonterminal. Formally,
for every nullary nonterminal n ∈ N0 let

    IG(n) = { t ∈ TΣ2,∅(Σ0 ∪ N0) | n (⇒G ; ⇒∗G′) t }

where G′ is the spine grammar G without start productions; that is, G′ = (N, Σ, S, P′) with productions P′ = {(n → r) ∈ P | n ∈ N1}.
So we perform a single derivation step using
the productions of G followed by any number of
derivation steps using only productions of G(cid:12). The
elements of IG(n) are called spinal trees for n and
their spine generator is n. By a suitable renaming
of nonterminals we can always achieve that the
spine generator does not occur in any of its spinal
trees. The spine grammar G is normalized if it is in
normal form and IG(n) ⊆ TΣ2, ∅(Σ0 ∪ (N0 \ {n}))
for every nullary nonterminal n ∈ N0.


Figure 2: Types of productions of spine grammars
in normal form (see Definition 9).

The following result is a variant of Theorem 1 of Fujiyoshi and Kasai (2000).

Theorem 10. For every spine grammar there is a strongly equivalent normalized spine grammar.

Example 11. We define the spine grammar G = (N, Σ, {s}, P) with N1 = {t, a, b, c, b′, e}, N0 = {s, ¯a, ¯b, ¯c, ¯e}, Σ2 = {α2, β2, γ2, η2}, Σ0 = {α, β, γ, δ}, and P as shown below.

    start:     s → t(δ)     ¯a → α     ¯b → β     ¯c → γ     ¯e → β     ¯b → e(β)
    chain:     t → a(b′(□))     b′ → b(c(□))     b → a(b′(□))     e → e(e(□))
    terminal:  a → α2(¯a, □)     b → β2(□, ¯b)     c → γ2(□, ¯c)     e → η2(¯e, □)

The tree in Figure 3a, in which the spines are marked by thick edges, is generated by G. The spinal tree corresponding to the main spine of the depicted tree is shown in Figure 3b. The yield of T(G) is {αⁿ δ γⁿ βᵐ | n, m ≥ 1}.

5 Tree-adjoining Grammars

Before we proceed we will briefly introduce TAG
and sketch how a spine grammar is obtained
from it. TAG is a mildly context-sensitive
grammar formalism that operates on a set of
elementary trees of which a subset is initial. To
generate a tree, we start with an initial tree and
successively splice elementary trees into nodes
using adjunction operations. In an adjunction,
we select a node, insert a new tree there, and
reinsert the original subtree below the selected
node at the distinguished and specially marked
foot node of the inserted tree. We use the non-
strict variant of TAG, in which the root and foot
labels of the inserted tree need not coincide with
the label of the replaced node to perform an
adjunction. To control at which nodes adjunction


Figure 3: Tree generated by spine grammar G, a spinal tree in IG(s) (see Example 11), and a tree in
F(S(G))S reassembled from spines (see Example 16).

is allowed, each node is equipped with two types
of constraints. The selective adjunction constraint
specifies a set of trees that can be adjoined and
the Boolean obligatory adjunction constraint spe-
cifies whether adjunction is mandatory. Only trees
without obligatory adjunction constraints are part
of the generated tree language.

Figure 4 shows the elementary trees of an
example TAG. Only tree 1 is initial and foot nodes
are marked by a superscript asterisk ·∗ on the label.
Whenever adjunction is forbidden (i.e., empty
set as selective adjunction constraint and non-
obligatory adjunction), we omit the constraints
altogether. Otherwise, the constraints are put next
to the label. For example, {2, 3}+ indicates that
tree 2 or 3 must (+ = obligatory) be adjoined.

We briefly sketch the transformation from TAG
to sCFTG by Kepser and Rogers (2011). TAG is
a notational variant of footed simple CFTG, in
which all variables in right-hand sides of produc-
tions appear in order directly below a designated
foot node. To obtain an sCFTG, the footed simple
CFTG is first converted into a spine grammar,
where the spine is the path from the root to the foot
node, and then brought into normal form using
the construction of Fujiyoshi and Kasai (2000).
The spine grammar of Example 11 is strongly
equivalent to the TAG shown in Figure 4.
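The adjunction operation described above can be sketched on tuple-encoded trees: select a node by its Gorn address, insert the auxiliary tree there, and re-insert the original subtree at the foot node. The asterisk foot-marking convention mirrors the figures; the encoding itself is our own, and adjunction constraints are omitted:

```python
FOOT = "*"   # a label ending in '*' marks the foot node in this illustration

def plug(aux, subtree):
    """Replace the unique foot node (leaf label ending in FOOT) by subtree."""
    if len(aux) == 1 and aux[0].endswith(FOOT):
        return subtree
    return (aux[0],) + tuple(plug(c, subtree) for c in aux[1:])

def adjoin(tree, address, aux):
    """Non-strict adjunction of the auxiliary tree aux at the node of tree
    with the given Gorn address; labels need not match."""
    if address == "":
        return plug(aux, tree)
    i = int(address[0])
    children = list(tree[1:])
    children[i - 1] = adjoin(children[i - 1], address[1:], aux)
    return (tree[0],) + tuple(children)

initial = ("S", ("a",))
aux = ("S", ("b",), ("S*",))          # auxiliary tree with foot node S*
print(adjoin(initial, "", aux))       # ('S', ('b',), ('S', ('a',)))
```

Repeating such adjunctions from an initial tree generates the tree language of the TAG, subject to the adjunction constraints not modeled here.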

6 Decomposition into Spines

We proceed with the construction starting from
the normalized spine grammar G. First, we will
construct a context-free grammar (CFG) that
captures all information of G. It represents the
spinal trees (from bottom to top) as strings and
enriches the symbols with the spine generator

Figure 4: Tree-adjoining grammar.

(initialized by start productions and preserved
by chain productions) and a non-spinal child
(given by terminal productions). The order of
these annotations depends on the spine direction of the symbol. The leftmost symbol of the generated strings has only a spine generator annotated since the bottom of the spine has no children. To simplify the notation, we write n_g for (n, g) ∈ N², α_n for (α, n) ∈ Σ0 × N, and σ_{n1 n2} for (σ, n1, n2) ∈ Σ2 × N².
Definition 12. Let G be normalized and ⋄ ∉ N. The spines S(G) = L(G′) of G are the strings generated by the CFG G′ = ({⋄} ∪ N², Σ′, ⋄, P′) with Σ′ = (Σ0 × N) ∪ (Σ2 × N²) and productions P′ = P0 ∪ P1 ∪ P2 given by

    P0 = { ⋄ → α_n | (n → α) ∈ P } ∪ { ⋄ → α_n b_n | (n → b(α)) ∈ P }
    P1 = ⋃_{g ∈ N} { n_g → b′_g b_g | (n → b(b′(□))) ∈ P }
    P2 = ⋃_{g ∈ N} ({ n_g → σ_{g n′} | (n → σ(□, n′)) ∈ P } ∪ { n_g → σ_{n′ g} | (n → σ(n′, □)) ∈ P })

Example 13. We list some corresponding productions of the spine grammar G (left) of Example 11 and the CFG G′ (right) for its spines S(G).

    ¯a → α          :  ⋄ → α_¯a
    s → t(δ)        :  ⋄ → δ_s t_s
    t → a(b′(□))    :  t_s → b′_s a_s,    t_¯b → b′_¯b a_¯b,    . . .
    a → α2(¯a, □)   :  a_s → (α2)_{¯a s},    a_¯b → (α2)_{¯a ¯b},    . . .

Note that for each start production we obtain a
single production since the nonterminal on the left
side becomes the spine generator. On the other
hand, for each chain or terminal production we
have to combine them with all nonterminals, as we
do not know the spine generator of the nonterminal
on the left side of the original production. When a string is derived, the spine generator introduced by the start production is pulled through and is thus consistent throughout the string. The language
generated by G◇ is

S(G) = { δ_s (α2_{¯a s})^m (γ2_{s ¯c})^n (β2_{s ¯b})^m : m ≥ 0, n ≥ 1 } ∪ { β_¯b (η2_{¯e ¯b})^n : n ≥ 1 } ∪ { α_¯a, β_¯e, γ_¯c }

Note that each string generated by the CFG belongs to (Σ0 × N)(Σ2 × N²)∗. Next we define
how to reassemble those spines to form trees again,
which then relabel to the original trees generated
by G. The operation given in the following
definition describes how a string generated by
the CFG can be transformed into a tree by
attaching subtrees in the non-spinal direction
of each symbol, whereby the non-spinal child
annotation of the symbol and the spinal annotation
of the root of the attached tree have to match.
Definition 14. Let T ⊆ T_{Σ2×N², ∅}(Σ0 × N) and w ∈ A with A = (Σ0 × N)(Σ2 × N²)∗. The generator gen : (Σ0 × N) ∪ (Σ2 × N²) → N is the nonterminal in spine direction and is given by

gen(a) = n if a = α_n ∈ Σ0 × N, and gen(a) = n_{d(σ)} if a = σ_{n1 n2} ∈ Σ2 × N².

For n ∈ N, let T_n = { t ∈ T : gen(t(ε)) = n } be those trees of T whose root label has n annotated in spinal direction. We define the tree language att_T(w) ⊆ T_{Σ2×N², ∅}(Σ0 × N) recursively by att_T(α_n) = {α_n} for all α_n ∈ Σ0 × N, and

att_T(w σ_{n1 n2}) = { σ_{n1 n2}(t1, t2) : t_{d(σ)} ∈ att_T(w), t_{3−d(σ)} ∈ T_{n_{3−d(σ)}} }

for all w ∈ A and σ_{n1 n2} ∈ Σ2 × N².
(cid:3)
To obtain the tree language defined by G, it is
necessary to apply this operation recursively on
the set of spines.
Definition 15. Let L ⊆ (Σ0 × N)(Σ2 × N²)∗. We inductively define the tree language F(L) generated by L to be the smallest tree language F such that att_F(w) ⊆ F for every w ∈ L.
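The attachment operation and the fixpoint F(L) can be sketched as follows. The tree encoding (a tree is a pair of a label and a tuple of children) and the function names `attach` and `forest` are our own; `d` is assumed to map each binary symbol to its spine direction 1 or 2.

```python
def attach(w, T, d):
    """Sketch of att_T(w) from Definition 14 (bottom-of-spine symbol first).

    w is a list: first an annotated leaf label (a, n), then triples
    (s, n1, n2).  T is a set of already-built trees."""
    def gen(tree):
        lab = tree[0]
        # leaf labels (a, n) carry their generator last; binary labels
        # (s, n1, n2) carry it at the spine-direction position
        return lab[1] if len(lab) == 2 else lab[d[lab[0]]]

    trees = {(w[0], ())}                  # att_T(a_n) = {a_n}
    for lab in w[1:]:                     # lab = (s, n1, n2)
        spine_i = d[lab[0]]               # position of the spinal child
        side_i = 3 - spine_i
        side_n = lab[side_i]              # generator required of side child
        new = set()
        for below in trees:
            for side in T:
                if gen(side) == side_n:   # annotations have to match
                    kids = (below, side) if spine_i == 1 else (side, below)
                    new.add((lab, kids))
        trees = new
    return trees

def forest(L, d):
    """F(L): least set F with attach(w, F, d) ⊆ F for every w in L."""
    F = set()
    while True:
        new = set().union(*(attach(w, F, d) for w in L)) - F
        if not new:
            return F
        F |= new
```

In the first iteration only the length-1 spines produce trees (the side loop finds no attachable subtrees yet); longer spines become assemblable once their side spines are in F, which is exactly the recursion described above Definition 15.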
Example 16. The CFG G◇ of Example 13 generates the set of spines S(G), and F(S(G))_S contains the correctly assembled trees formed from these spines. Figure 3c shows a tree of F(S(G))_S since the generator of the main spine is S = s, which is stored in spinal direction in the root label α2_{¯a s}. We can observe the correspondence of annotations in non-spinal direction and the spine generator of the respective child in the same direction.
Next we prove that F(S(G))_S and T(G) coincide modulo relabeling. This shows that the context-free language S(G) of spines completely describes the tree language T(G) generated by G.

Theorem 17. Let G be normalized. Then π(F(S(G))_S) = T(G), where the relabeling π : (Σ0 × N) ∪ (Σ2 × N²) → Σ0 ∪ Σ2 is given by π(α_n) = α and π(σ_{n1 n2}) = σ for all α ∈ Σ0, σ ∈ Σ2, and n, n1, n2 ∈ N.
Corollary 18. There exists a pop-normalized MPDA A such that L(A) ∪ L1 = Next(S(G)), where L1 = { w ∈ Next(S(G)) : |w| = 1 }. Moreover, F(L(A) ∪ L1)_S and T(G) coincide modulo relabeling.

(cid:5)

(cid:2)

(cid:3)

Example 19. The MPDA constructed in Corollary 18 for the spine grammar G of Example 11 is depicted in Figure 5. Initial states
are indicated using a start marker and final
states are marked by a double circle. Pushing
and popping stack operations are written with
downwards and upwards arrows, respectively. The
MPDA consists of two components. The bigger
one describes the main spine, and the smaller one

Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00393/1955143/tacl_a_00393.pdf by guest on 07 September 2023
The slash type and the combining nonterminal of a state tell whether the symbol τ(q) generated by state q occurs as the first or second child of its parent symbol and with which spine generator it is combined. Let τ(q) = (s1, s2) with s2 ∈ Σ″; the slash type and the combining nonterminal can be determined from the next symbol σ_{n1 n2} ∈ Σ2 × N². Formally, slash(q) = / if d(σ) = 1 and slash(q) = \ otherwise. In addition, comb(q) = n_{3−d(σ)} and comb(⊥) = S.

We simulate the accepting runs of A in the
spines consisting of primary categories of the
CCG. The main idea is that the primary categories
on the spine store the current configuration of A.
This is achieved by adding an additional argument
for transitions that push a symbol, whereas for
each popping transition, an argument is removed.
The rightmost argument stores the current state
in the first component and the top of the stack in
the second component. The previous arguments
store the preceding stack symbols in their second
components and the state the automaton returns to
when the stack symbol stored in the next argument
is popped in the first components. To implement
the required transformations of consecutive pri-
mary categories, the secondary categories need to
have a specific structure. This mandates that the
categories at the top of a spine (which act as sec-
ondary categories unless they belong to the main
spine) cannot store their corresponding automaton
state in the first component of the last argument as
usual, but instead utilize the third component of
their target. Thus each argument stores the final
state corresponding to its secondary combination
partner in the third component. This third compo-
nent also allows us to decide whether a category
is primary: A category is a primary category if
and only if the spine generator of the state stored
in the first component of the last argument and
the spine generator of the state stored in the last
component of the target coincide. This is possible
since G is normalized, which yields that attaching
spines have a spine generator that is different from
the spine generator of the spine that they attach to.

Definition 20. We define the CCG GA,L1 = (Δ0, A, R, I′, L) as follows:

Let A = {(q, γ, f) ∈ A′ : gen(f) = comb(q)} with A′ = (Q ∪ {⊥}) × Γ × (F ∪ L1). We use a_i to refer to the i-th component of an atom a ∈ A. Additionally, let I′ = {(⊥, ε, f) ∈ A : gen(f) = S}.


Figure 5: Sample MPDA (see Example 6).

describes the side spine. The distinction between the three stack symbols is necessary due to pop-normalization. The distinction between q1 and q′1 (and similar states) is necessary because their previous action distinguishes their produced input symbol since we recognize Next(S(G2)). For example, τ(q1) = (γ2_{s ¯c}, γ2_{s ¯c}) and τ(q′1) = (β2_{s ¯b}, γ2_{s ¯c}). Similarly, τ(p1) = (z, z) and τ(p′1) = (□, z), where z = η2_{¯e ¯b}. To completely capture the behavior of G, we additionally require the set L1 = {(□, α_¯a), (□, β_¯b), (□, β_¯e), (□, γ_¯c)}, which contains the spines of length 1.

7 Constructing the CCG

In this section, let G = (N, Σ, S, P) be a normalized spine grammar with spine direction d : Σ → {1, 2} and A = (Q, Δ, Γ, δ, τ, I, F) the pop-normalized MPDA constructed in Corollary 18 with pop : Γ → Q. We note that Δ = Σ′ × Σ″ with Σ′ = {□} ∪ (Σ2 × N²) as well as Σ″ = (Σ0 × N) ∪ (Σ2 × N²). Moreover, let ⊥ ∉ Q be a special symbol. To provide better access to the components of the MPDA A, we define some additional maps. The spine generator gen : Q → N is given for every state q ∈ Q by gen(q) = gen(s2), where τ(q) = (s1, s2) ∈ Δ. Since A cannot accept strings of length 1, we have to treat them separately. Let L1 = { w ∈ Next(S(G)) : |w| = 1 } and gen : L1 → N be given by gen(w) = n for all w = (□, α_n) ∈ L1. We extend τ : Q → Δ to τ′ : (Q ∪ L1) → Δ by τ′(q) = τ(q) for all q ∈ Q and τ′(a) = a for short strings a ∈ L1.

Recall that D = {/, \}. The slash type slash : (Q \ F) → D and the combining nonterminal comb : (Q \ F) ∪ {⊥} → N of a state q ∈ Q \ F are the maps described above.


In the rules R = ⋃_{|∈D} (R^|_1 ∪ R^|_2 ∪ R^|_3) the primary category ax/b always needs to fulfill gen(a3) = gen(b1).

R^/_1 = { ax/b  b|c ⇒ ax|c : a, b, c ∈ A, | ∈ D, (b1, ε, ε, c1) ∈ δ, b2 = c2 }   (1)

R^/_2 = { ax/b  b|c|′e ⇒ ax|c|′e : a, b, c, e ∈ A, |, |′ ∈ D, (b1, ε, e2, e1) ∈ δ, b2 = c2, c1 = pop(e2) }   (2)

R^/_3 = { ax/b  b ⇒ ax : a, b ∈ A, (b1, b2, ε, q) ∈ δ }   (3)

We listed all the forward rules, but for each forward rule there also exists a symmetric backward rule, yielding the rule sets R^\_1, R^\_2, and R^\_3.

We need some notions for the lexicon. A category c ∈ C(A) is well-formed if | = slash(b1) and b1 ∈ Q for every i ∈ [ar(c)] with |b = arg(c, i). Let Cwf = {c ∈ C(A) : c well-formed} be the set of well-formed categories. Clearly I′ ⊆ Cwf. In addition, we introduce sets ◇_{L1} and ◇_A of top-of-spine categories derived from the short strings of L1 and the strings accepted by A, respectively:

◇_{L1} = {a ∈ I′ : a3 ∈ L1} ∪ ⋃_{r∈R, ax=sec(r)} {ax ∈ Cwf : a3 ∈ L1}

◇_A = {a ∈ I′ : a3 ∈ F} ∪ ⋃_{r∈R, ax=sec(r)} {ax ∈ Cwf : a3 ∈ F}

Note that ◇_{L1} ∪ ◇_A ⊆ Cwf. Now we can define the lexicon as follows for all α ∈ Δ0 = Σ′ × (Σ0 × N):

L(α) = { ax : ax ∈ ◇_{L1}, τ′(a3) = α } ∪ { ax|b ∈ Cwf : ax ∈ ◇_A, b1 ∈ I, gen(a3) = gen(b1), pop(b2) = a3, τ′(b1) = α }

Each atom of A consists of three components.
The first component stores the current state of A
(or the special symbol ⊥), the second component
stores the current symbol at the top of the stack,
and the third component stores the final state
corresponding to the combining category of the
attaching side spine. With this intuition, the rule
system directly implements the transitions of A.
The lexicon assigns categories to symbols that can label leaves, so these symbols are taken from the nullary terminal symbols. The assigned
categories consist of a category that appears at
the top of a spine and an additional argument for
the initial state of an accepting run. The spines
of length 1 are translated directly to secondary
categories or initial categories.

Let us make a few general observations that hold
for all the categories that appear in derivation trees
of GA,L1: (i) All categories are well-formed. This
follows from the fact that only well-formed categories
occur in the lexicon and all categories in the
derivation trees consist of atoms and arguments
that were already present in the lexicon. (ii) All
primary categories ax | b obey gen(a3) = gen(b1).
This is directly required by the rule system.

Finally, we will now describe how to relabel the
derivation trees D(GA,L1) of the CCG GA,L1 that
uses categories built using the input symbols of the
MPDA A. Note that only well-formed categories
will occur in derivation trees. Primary and non-
primary categories are relabeled differently. The
relabeling ρ : Cwf → Δ is defined for every
c ∈ Cwf by ρ(ax | b) = τ (cid:12)(b1) for all primary
categories ax | b ∈ Cwf; i.e., gen(a3) = gen(b1).
Otherwise ρ(ax) = τ (cid:12)(a3) for all
initial and
secondary categories ax ∈ Cwf.

The following property requires that the spine
grammar G is normalized, so a spine never has the
same spine generator as its attached spines.
Lemma 21. For all secondary categories ax|b we have gen(a3) ≠ gen(b1).

We are now ready to describe the general form
of primary spines of GA,L1. Given a primary
spine c0 . . . cn read from lexicon entry towards
the root with n ≥ 1, we know that it starts
with a lexicon entry c0 = ax | b ∈ L(Δ0)
and ends with the non-primary category ax,
which as such cannot be further modified.
Hence each of the categories c ∈ {c0, . . . , cn−1}
has the form ax |1 b1 . . . |m bm with m ≥ 1.
Let bi = (qi, γi, fi) for every i ∈ [m]. The


Figure 6: Part of a derivation tree of GA,L1 (see Example 23).

category cn is relabeled to τ′(a3) and c is relabeled to τ′(qm). Additionally, unless a1 = ⊥, the first components of all atoms in ax have the same spine generator gen(a1) and gen(q1) = · · · = gen(qm), but gen(a1) ≠ gen(q1). Finally, neighboring arguments |i−1 bi−1 |i bi in the suffix are coupled such that pop(γi) = qi−1 for all i ∈ [m] \ {1}. This coupling is introduced in the rules of second degree and preserved by the other rules.

Using these observations, it can be proved that the primary spines of GA,L1 are relabeled to strings of Next(S(G)) and vice versa. Additionally, spines attach in essentially the same manner in the CCG and using F. This yields the following main theorem.
Theorem 22. Given a spine grammar G, we can construct a CCG G′ that can generate T(G).
Example 23. Figure 6 shows part of the derivation tree of CCG GA,L1 that corresponds to the tree of Figure 3a, which is generated by the spine grammar G of Example 11. We use the following abbreviations: α = (□, α_¯a), β = (□, β_¯b), and γ = (□, γ_¯c). The labeling of the depicted section is δ γ2 γ2 β2 for the main spine and β η2 for the side spine (see Figure 3a). The corresponding runs of A are ⟨q0, ω⟩, ⟨q1, ω⟩, ⟨q′1, υω⟩, ⟨q2, υω⟩ and ⟨p0, χ⟩, ⟨p′1, ε⟩.
Let us observe how the transitions of A are simulated by GA,L1. The first transition (q0, ε, ε, q1) on the main spine does not modify the stack. It is implemented by replacing the last argument /(q0, ω, γ) by /(q1, ω, γ). The next transition (q1, ε, υ, q′1) pushes the symbol υ to the stack. The argument /(q1, ω, γ) is thus replaced by \(q3, ω, α)/(q′1, υ, p′1). As the stack grows, an additional argument with the new state and stack symbol is added. The previous argument stores pop(υ) = q3 to ensure that we enter the correct


state after popping υ. It also contains the previous unchanged stack symbol ω. The popping transition (p0, χ, ε, p′1) on the side spine run is realized by removing /(p0, χ, β).
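The stack discipline of this simulation can be illustrated in isolation. This is a toy model of the argument list of a primary category, not the paper's construction itself: the function name `step` and the two-component arguments (state, top-of-stack) are our own simplification that drops the third atom component.

```python
def step(args, trans, pop):
    """Apply transition (state, popped, pushed, target) to the argument
    list of a primary category; '' stands for the empty symbol."""
    q, p, u, r = trans
    state, top = args[-1]          # last argument: current configuration
    assert state == q
    if p == '' and u == '':        # non-modifying: swap the state
        return args[:-1] + [(r, top)]
    if p == '' and u != '':        # push: append a new last argument and
        # store the return state pop(u) in the previous argument
        return args[:-1] + [(pop[u], top), (r, u)]
    if p == top and u == '':       # pop: drop the last argument; by
        # pop-normalization the return state was already stored below
        return args[:-1]
    raise ValueError("transition not applicable")
```

Replaying Example 23: the transition (q0, ε, ε, q1) swaps the state in place, (q1, ε, υ, q′1) grows the list by one argument, and the popping transition on the side spine shrinks it again.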

The third components are required to relabel the non-primary categories. At the bottom of the main spine, c1 = (⊥, ε, q′3)/(q0, ω, γ) is a primary category because q0 and q′3 are associated with the same spine generator s. Thus, c1 gets relabeled to τ′(q0). However, for c2 = (q0, ω, γ)/(q1, ω, γ) the spine generators of γ and of the output of q1 are different (c and s). Hence it is a non-primary category and gets relabeled to γ.

Concerning the lexicon, c1 is a lexical category due to the fact that (⊥, ε, q′3) ∈ ◇_A can appear at the top of a spine as an initial category with q′3 ∈ F in its third component, while the appended (q0, ω, γ) represents an initial configuration of A. Similarly, c2 is a well-formed secondary category of a rule and the third component of its target is in L1. Therefore, it is an element of ◇_{L1}, which is a subset of the lexicon.

Let us illustrate how the attachment of the side spine to the main spine is realized. The lexicon contains (q′1, υ, p′1)\(q2, υ, α)\(p0, χ, β), of which the first two atoms are responsible for performing a transition on the main spine. This part cannot be modified since the rule system disallows it. The target stores the final state p′1 of the side spine run in its third component. The appended argument models the initial configuration of the side spine run starting in state p0 with χ on the stack.

For the converse inclusion we utilize Theorem 20 of Kuhlmann et al. (2019). It states that for every CCG G′ there exists an sCFTG that generates the rule trees of G′. Whereas derivation trees are labeled by categories, rule
derivation trees are labeled by categories, rule
trees are labeled by lexicon entries at leaves and


by applied rules (instead of the output category) at
inner nodes. Rule trees are a natural encoding
of derivation trees using only a finite set of
labels. As each rule indicates the target and last
argument of its output category, rule trees can be
relabeled in the same manner as derivation trees.
For completeness’ sake we restate Definition 16
of Kuhlmann et al. (2019).
Definition 24. Let G = (Σ, A, R, I, L) be a CCG and T = T_{R,∅}(L(Σ)). A tree t ∈ T is a rule tree if cat(t) ∈ I, where the partial map cat : T → C(A) is inductively defined by (i) cat(a) = a for all lexicon entries a ∈ L(Σ), (ii) cat(r(t1, t2)) = azy for every rule r ∈ R with output category axy and all trees t1, t2 ∈ T with cat(t1) = az/b and cat(t2) = by, and (iii) cat(r(t1, t2)) = azy for every rule r ∈ R with output category axy and all t1, t2 ∈ T with cat(t1) = by and cat(t2) = az\b. The set of all rule trees of G is denoted by R(G).
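The inductive map cat of Definition 24 can be sketched as a bottom-up evaluation. The tree encoding is our own assumption: leaves are lexicon categories (target, args) with args a tuple of (slash, atom) pairs, and inner nodes abstract the applied rule to just its slash direction, ('/' or '\\', [t1, t2]).

```python
def cat(t):
    """Sketch of the partial map `cat`: compute the category of a rule
    tree by combining the children's categories as in the applied rule."""
    if isinstance(t, tuple) and len(t) == 2 and isinstance(t[1], list):
        d, (t1, t2) = t
        # case (ii): left child is the primary az/b; case (iii): right child
        prim, sec = (cat(t1), cat(t2)) if d == '/' else (cat(t2), cat(t1))
        a, args = prim
        sl, b = args[-1]                   # primary ends in |b
        assert sl == d and sec[0] == b     # cat of the secondary must be b y
        return (a, args[:-1] + sec[1])     # result az y
    return t                               # leaf: its lexicon category
```

This also shows why a category relabeling can be applied to rule trees directly: the node's rule determines the target and last argument of its output category without materializing the full derivation tree.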

We observe that any category relabeling can
equivalently be applied to rule trees instead of
derivation trees (because a category relabeling
only depends on the target a and the last
argument | b of a category ax | b). This yields
the second main theorem.

Theorem 25. CCGs and sCFTGs are strongly equivalent up to relabeling.

Kepser and Rogers (2011) proved that TAGs and sCFTGs are strongly equivalent, which shows that they are also strongly equivalent (up to relabeling) to CCGs.

Corollary 26. CCGs and TAGs are strongly equivalent up to relabeling.

Clearly, from strong equivalence we can conclude weak equivalence as well (without the relabeling since the lexicon provides the relabeling). Weak equivalence was famously
proven by Vijay-Shanker and Weir (1994), but
Theorem 3 of Kuhlmann et al. (2015) shows that
the original construction is incorrect. However,
Weir (1988) provides an alternative construction
and proof. Our contribution provides a stronger
form (and proof) of this old equivalence result. It
avoids the ε-entries that the original construction
heavily relies on. An ε-entry is a category assigned
to the empty string; these interspersed categories
form the main building block in the original
constructions. The necessity of these ε-entries


(Vijay-Shanker and Weir, 1994) is an interesting
and important question that naturally arises and
has been asked by Kuhlmann et al. (2015). We
settle this question and demonstrate that they can
be avoided.

Corollary 27. CCGs and TAGs are weakly equivalent, and CCGs with ε-entries and CCGs without ε-entries generate the same (ε-free) languages.

The tree expressive power of CCGs with re-
stricted rule degrees has already been investigated
by Kuhlmann et al. (2019). It has been shown that
0-CCGs accept a proper subset of the regular tree languages (Gécseg and Steinby, 1997), whereas
1-CCGs accept exactly the regular tree languages.
It remained open whether there is a k such that
k-CCGs and (k + 1)-CCGs have the same expres-
sive power. Our construction establishes that
2-CCGs are as expressive as k-CCGs for arbitrary
k ≥ 2. Another consequence of our construction
is that first-order categories are sufficient.

Corollary 28. 2-CCGs with first-order categories have the same expressive power as k-CCGs with k > 2.

8 Conclusion

We presented a translation from spine grammar
to CCG. Due to the strong equivalence of spine
grammar and TAG (Kepser and Rogers, 2011),
we can also construct a strongly equivalent CCG
for each TAG. Together with the translation from
CCG to sCFTG (Kuhlmann et al., 2019), this
proves the strong equivalence of TAG and CCG,
which means that both formalisms generate the
same derivation trees modulo relabelings. Our
construction uses CCG rules of degree at most 2,
only first-order categories, lexicon entries of arity
at most 3, and no ε-entries in the lexicon. Such
CCGs thus have full expressive power. Avoiding
ε-entries is particularly interesting because they
violate the Principle of Adjacency (Steedman,
2000, p. 54), which is a fundamental linguistic
principle underlying CCG and requires that all
combining categories correspond to phonologically realized counterparts in the input and are
string-adjacent. Their elimination is performed by
trimming them from the sCFTG obtained from a
CCG with ε-entries and translating the trimmed
sCFTG back to a CCG using our construction.

Translating CCG to sCFTG (Kuhlmann et al.,
2019) yields sCFTGs whose size is exponential


in a CCG-specific constant, which depends on
the maximal rule degree and the maximal arity of
lexicon entries. The increase can be attributed to
variables in CCG rules, which need to be properly
instantiated. Our construction increases the gram-
mar size only polynomially, which can be verified
for each step. Overall, a k-CCG can be converted
to an equivalent 2-CCG without ε-entries in time
and space exponential in k (and the maximal
length of lexicon entries) and polynomial in the
size of the grammar.

Acknowledgments

We would like to thank Mark Steedman and
the three anonymous reviewers for their valuable
and detailed comments, which greatly helped in
improving the comprehensibility of this paper. The
work of Lena Katharina Schiffer was funded by
the German Research Foundation (DFG) Research
Training Group GRK 1763 ‘Quantitative Logics
and Automata’.

References

Jean-Michel Autebert, Jean Berstel, and Luc Boasson. 1997. Context-free languages and pushdown automata. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 1, chapter 3, pages 111–174. Springer. https://doi.org/10.1007/978-3-642-59136-5_3

Jason Baldridge. 2002. Lexically Specified Derivational Control in Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh.

Yehoshua Bar-Hillel, Haim Gaifman, and Eli
Shamir. 1960. On categorial and phrase-
structure grammars. Bulletin of the Research
Council of Israel, 9F(1):1–16.

Haskell B. Curry, Robert Feys, and William Craig.
1958. Combinatory Logic. Number 1 in Studies
in Logic and the Foundations of Mathematics.
North-Holland.


Herbert Fleischner. 1977. On the equivalence
of Mealy-type and Moore-type automata and
a relation between reducibility and Moore-
reducibility. Journal of Computer and System
Sciences, 14(1):1–16. https://doi.org/10.1016/S0022-0000(77)80038-X

Akio Fujiyoshi and Takumi Kasai. 2000. Spinal-
formed context-free tree grammars. Theory of
Computing Systems, 33(1):59–83. https://doi.org/10.1007/s002249910004

Ferenc Gécseg and Magnus Steinby. 1997. Tree languages. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3, chapter 1, pages 1–68. Springer. https://doi.org/10.1007/978-3-642-59126-6_1

Saul Gorn. 1965. Explicit definitions and linguis-
tic dominoes. In Systems and Computer Science,
Proceedings of the Conference held at Univ. of
Western Ontario, pages 77–115. https://doi.org/10.3138/9781487592769-008

Julia Hockenmaier and Peter Young. 2008. Non-
local scrambling: The equivalence of TAG and
CCG revisited. In Proc. 9th TAG+. University of Tübingen.

Aravind K. Joshi. 1985. Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural Language Parsing, chapter 6, pages 206–250. Cambridge University Press. https://doi.org/10.1017/CBO9780511597855.007

Stephan Kepser and Jim Rogers. 2011. The
equivalence of tree adjoining grammars and
monadic linear context-free tree grammars.
Journal of Logic, Language and Information, 20(3):361–384. https://doi.org/10.1007/s10849-011-9134-0

Normann Decker, Martin Leucker, and Daniel Thoma. 2013. Impartiality and anticipation for monitoring of visibly context-free properties. In Proc. Runtime Verification, volume 8174 of LNCS, pages 183–200. Springer. https://doi.org/10.1007/978-3-642-40787-1_11

Alexander Koller and Marco Kuhlmann. 2009. Dependency trees and the strong generative capacity of CCG. In Proc. 12th EACL, pages 460–468. ACL. https://doi.org/10.3115/1609067.1609118


Marco Kuhlmann, Alexander Koller, and Giorgio
Satta. 2010. The importance of rule restrictions
in CCG. In Proc. Association for Computational
Linguistics, pages 534–543. ACL.

William C. Rounds. 1969. Context-free grammars
on trees. In Proc. Symposium on Theory of
Computing, pages 143–148. ACM. https://doi.org/10.1145/800169.805428

Marco Kuhlmann, Alexander Koller, and Giorgio
Satta. 2015. Lexicalization and generative
power in CCG. Computational Linguistics,
41(2):187–219. https://doi.org/10.1162/COLI_a_00219

Marco Kuhlmann, Andreas Maletti, and Lena K.
Schiffer. 2019. The tree-generative capacity
of combinatory categorial grammars. In Proc.
Foundations of Software Technology and
Theoretical Computer Science, volume 150 of
LIPIcs, pages 44:1–44:14. Schloss Dagstuhl — Leibniz-Zentrum für Informatik.

Marco Kuhlmann, Giorgio Satta, and Peter
Jonsson. 2018. On the complexity of CCG pars-
ing. Computational Linguistics, 44(3):447–482.
https://doi.org/10.1162/coli_a_00324

Marco Kuhlmann and Giorgio Satta. 2012. Tree-adjoining grammars are not closed under strong lexicalization. Computational Linguistics, 38(3):617–629.

Mark Steedman. 2000. The Syntactic Process.
MIT Press. https://doi.org/10.1002
/9781444395037.ch5

Mark Steedman and Jason Baldridge. 2011.
Combinatory categorial grammar. In Robert D.
Borsley and Kersti Börjars, editors, Non-
Transformational Syntax: Formal and Explicit
Models of Grammar, chapter 5, pages 181–224.
Blackwell.

Krishnamurti Vijay-Shanker. 1988. A Study of Tree Adjoining Grammars. Ph.D. thesis, University of Pennsylvania.

Krishnamurti Vijay-Shanker and David J. Weir.
1994. The equivalence of four extensions of
context-free grammars. Mathematical Systems
Theory, 27(6):511–546.

David J. Weir. 1988. Characterizing Mildly
Context-sensitive Grammar Formalisms. Ph.D.
thesis, University of Pennsylvania.
