Formal Basis of a Language Universal
Miloˇs Stanojevi´c
University of Edinburgh
School of Informatics
m.stanojevic@ed.ac.uk
Mark Steedman
University of Edinburgh
School of Informatics
steedman@inf.ed.ac.uk
Steedman (2020) proposes as a formal universal of natural language grammar that grammatical
permutations of the kind that have given rise to transformational rules are limited to a class
known to mathematicians and computer scientists as the “separable” permutations. This class
of permutations is exactly the class that can be expressed in combinatory categorial grammars
(CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of
studies of crosslinguistic variation in word order in nominal and verbal constructions.
The number of permutations that are separable grows in the number n of lexical elements in
the construction as the Large Schr¨oder Number Sn−1. Because that number grows much more
slowly than the n! number of all permutations, this generalization is also of considerable practical
interest for computational applications such as parsing and machine translation.
The present article examines the mathematical and computational origins of this restriction,
and the reason it is exactly captured in CCG without the imposition of any further constraints.
1. Introduction
Permutation of elements in a construction, either within a single language or across
languages, creating discontinuous semantic dependencies, is a widespread character-
istic of natural languages that has motivated transformational rules (Chomsky 1957,
passim), among other expressive grammar formalisms. Steedman (2020) proposes as
a formal universal of natural language grammar that grammatical permutations are
limited to a class known to mathematicians and computer scientists as the “separable”
permutations, which are those orders over which a tree can be constructed such that all
leaves descending from any node form a continuous subset i . . . j of the original ordered
ensemble 1, . . . , n.
The evidence for this claim is based on a number of studies of two data sets of
alternations in two major constructions. The first concerns the permutations of the four
Submission received: 29 Janvier 2019; revised version received: 29 Juillet 2020; accepted for publication:
10 Octobre 2020.
https://doi.org/10.1162/COLI a 00394
© 2021 Association for Computational Linguistics
Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
(CC BY-NC-ND 4.0) Licence
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
elements Det(erminer) Num(erator) Adj(ective) N(oun) that surface in that order in
English noun phrases (NP) like “These five young lads,” but in Yoruba and many other
languages are found in the mirror-image permutation. The crosslinguistic variation in
this construction has been studied by Cinque (2005) and Abels and Neeleman (2012)
(entre autres) for languages with a single dominant order, and also within a single
language with very free word order for the same construction by Nchare (2012). Le
second construction is the verb group, also consisting of four elements, which we can
think mnemonically of as the verb-sequence “must have been dancing,” whose varia-
tions with various verb-types both across and within the various Germanic languages
and dialects, has been studied by Wurmbrand (2004; 2006) and Abels (2016).
In both constructions, the attested orders fall on highly skewed Zipfian power-law
distributions, with the rarest orders found in only a very few cases. Such distributions
make it impossible to decide whether further unattested orders are entirely excluded
by hard constraints on the universal space of possible grammars, or merely disfavored
by sampling error arising from soft constraints such as “Harmony” (proposed by
Culbertson, Smolensky, and Legendre 2012, among others considered below) that may
merely make them vanishingly rare in samples of the size that we can actually access
(Abels and Neeleman 2012). Bien sûr, a further possibility is that the unattested orders
arise from a combination of hard and soft constraints.
The present study defers to such authors on the specific characteristics of the soft
constraints and the distributions themselves, and considers only the nature and origin
of a specific hard constraint on the formal operations that define the set of permutations
that such distributions can apply to in the first place.
Both the nominal and verbal constructions concerned can be thought of in grammat-
ical terms as made up of elements or “categories” with a “natural order of dominance”
stemming from their semantics. For these constructions, that order of dominance hap-
pens in English to be the same as their left-to-right order on the page, inducing a right-
branching tree-structure. Cependant, in other languages, the same relations of dominance
allow all of the word orders that would arise if the same tree were treated as a mobile,
with daughters allowed to rotate around each other, as well as some word orders
involving apparent displacements that go beyond rotation. While the constructions of
interest allow only one natural order of dominance, other sets of categories including
multiple words of the same category may allow more than one natural order to merge.
(Par exemple, the set consisting of the words “divorced, alleged, spy” has two natural
orders of dominance, the other one corresponding to a distinct sense “alleged divorced
spy.”)
We can therefore abstract over the two constructions of interest as being made up of
elements numbered 1 > 2 > 3 > 4 according to that natural order of dominance, sous
the following correspondence:
(1) These > five > young > lads
must > have > been > dancing
1 > 2 > 3 >
4
There is widespread suspicion that not all of the four-factorial permutations of these
constructions are allowed (Williams 2003; Svenonius 2007).
In considering whether a given permutation is allowed for the NP construction,
Cinque (2005; 2010) emphasizes the need to exclude from consideration word orders for
the NP in which the adjective is extraposed or has the markers of a relative clause. Le
exclusion is crucial, since such modifiers may be adjoined to the entire NP, rather than to
10
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
N, thereby defining a different natural order of dominance, with the extraposed element
comme 1. Cinque rejects a number of the orders claimed by Dryer (2018) on this ground. Le
present paper follows Cinque strictly concerning the attested fixed NP orders.
Cinque also proposes to exclude free word-order alternation of the kind investi-
gated by Nchare (2012) for the Shupamem NP, on the grounds that alternating orders
reflect distinctions of focus. Accounts of focus alternations have sometimes assumed
the existence of an FP-like focus-phrasal external functional projection, offering an
extra target for a different variety of movement resembling extraposition (Szabolcsi
1983, 1994; Giusti 2006). Cependant, this is an additional and somewhat theory-internal
assumption. English achieves similar alternations of focus or contrast in the NP by
in situ accent placement without varying word order, for which Steedman (2014) offers
a lexicalized alternative semantics without focus-movement or focus-projection, other
than by surface-compositional syntactic derivation. We therefore depart from Cinque
on this point, and provisionally accept Nchare’s free orders as arising from the same
mechanism as Cinque’s.
The orders attested in these three studies for the NP and VP constructions can then
be summarized as in Figure 1, in which the alphabetic ordering of the permutations
(which will be used for reference throughout this article) follows Cinque. (Abels 2016
adopts a different convention.) Cinque (2007 n.13) notes the possibility that Figure 1(m)
(Cinque 2005)
(Nchare 2012)
(Abels 2016)
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
un.
b.
c.
d.
e.
F.
g.
h.
je.
j.
k.
je.
m.
n.
o.
p.
q.
r.
s.
t.
toi.
v.
w.
X.
1 2 3 4
1 2 4 3
1 4 2 3
4 1 2 3
2 1 3 4
2 1 4 3
2 4 1 3
4 2 1 3
3 1 2 4
3 1 4 2
3 4 1 2
4 3 1 2
1 3 2 4
1 3 4 2
1 4 3 2
4 1 3 2
2 3 1 4
2 3 4 1
2 4 3 1
4 2 3 1
3 2 1 4
3 2 4 1
3 4 2 1
4 3 2 1
v/
v/
v/
v/
—
—
—
—
—
—
v/
v/
(v/ )
v/
v/
v/
—
v/
v/
v/
—
—
v/
v/
Chiffre 1
Attested permutations.
v/
v/
—
—
v/
v/
—
—
v/
—
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
—
(v/ )
—
(v/ )
—
—
v/
v/
(v/ )
v/
v/
v/
—
v/
(v/ )
v/
—
—
v/
v/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
11
Computational Linguistics
Volume 47, Nombre 1
constitutes a fifteenth order for the NP, attested for only one language so far, Dhivehi
(Maldivian; Cain 2000), indicated by parentheses in the figure.1
We note at this point that Dryer (2018) claims several further word orders as basic
for NP in fixed-order languages, in addition to those listed by Cinque. En particulier, il
claims (2018: pages 17, 29) that Kilivila (Senft 1986) and four other languages have order
(g), which is excluded by the theory developed below, as their default order. Cependant,
Cinque argues that Senft’s example of this order involves adjectival extraposition, pas-
ing that Senft (1986, pages 96) also cites the canonical order (un) for the full four-element
construction itself. Of the other four languages cited by Dryer, he himself notes for
Yapese (Jensen 1977) that the adjective in his example of order (g) is marked as a relative
clause including the copula, hence arguably also extraposed from NP. Of the remaining
three languages for which Dryer claims this order, Katu (Costello 1969, page 22) does
not lexically distinguish demonstratives from locatives, but the one example Costello
gives (1969, page 34 (87)) involving both an adjective and a demonstrative locative has
the canonical reverse order N Adj Dem. The example given by Tryon (1967, page 60) pour
Dehu/Drehu includes the copula with the adjective, so is arguably also extraposed.
In the lexically multifunctional language Teop (Mosel 2017), to which Dryer also
attributes (g) as base order, adjectives are expressed as adjoined adjectival phrases, avec
their own copies of the article or agreement and numerator, so that a strong possibility
of extraposition or apposition clearly exists here also.
In terms of the CCG theory developed below, adjective extraposition requires
the addition of a distinct NP-adjunct category, syntactically and semantically non-
isomorphic to the standard adjective, inducing a different Natural Order of Dominance
over the four elements, with the extraposed adjectival as 1, the demonstrative as 2, le
numerator as 3, and the noun as 4. Ainsi, none of these languages constitutes a coun-
terexample to the claim below that order (g) is universally excluded for the standard
catégories. We have strictly followed Cinque’s account of the attested NP orders, et
exclude several other orders listed by Dryer whose attestation appear to depend on
adjective extraposition, y compris (h).
Of the orders for the Germanic verb group admitted by Abels (2016), he gives only
qualified support for Figure 1f, h, m, and s, in some cases because they are only attested
as alternates, similarly marked by parentheses.
It is striking that just two of the 24 permutations over these two constructions are
entirely without even equivocal support for attestation in these three studies, namely,
Figure 1g and j—that is, 2413 and its mirror-image 3142. These are the permutations that
would give rise to the following word orders for the two constructions:
(2) g.
Five
lads
ces
young
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
NumP|N(cid:48)
2
N
4
NP|NumP N(cid:48)|N
1
j.
Jeune
ces
lads
3
five
N(cid:48)|N NP|NumP
3
1
N
4
NumP|N(cid:48)
2
1 Some of the possibilities admitted for the Shupamem NP are conditional on the presence of clitic
agreement and definiteness markers, and certain orders are associated with focus effects—see Nchare
(2012) chapter 3 and Steedman (2020) for details that we pass over here.
12
Stanojevi´c and Steedman
Formal Basis of a Language Universal
(3) g.
have dancing must
been
VP2|VP3 VP4 VP1|VP2 VP3|VP4
2
4
1
3
j.
been
must dancing have
VP3|VP4 VP1|VP2 VP4 VP2|VP3
3
1
4
2
The above examples of permutations that in CCG are neither allowed as base orders nor
as alternates for the two constructions include lexical types indicating the dominance
relations among these elements, using an order-free categorial notation, in which a
category of the form X|Y indicates an element that combines with a sister constituent of
type Y adjacent to the right or to the left to yield a constituent of type X.2
These categories immediately offer a hint as to a hard constraint forcing the two
ordres 2413 and its mirror-image 3142 to remain unattested. A formal proof is set out
below, but it is obvious from inspection of (2) et (3) that these are the only two
permutations in which no category of the form X|Y is adjacent to a category Y, or to a category
Oui|Z that could yield Y as its result.
It is also interesting that the two permutations 2413 and its mirror-image 3142 sont
distinguished by mathematicians and computer scientists as the only two permutations
on the order 1234 that are non-separable (Avis and Newborn 1981; Bose, Buss, and Lubiw
1998). This observation motivated Steedman (2020) to hypothesize that for a fixed
natural order of dominance of arbitrary length all permutations generated by CCG are
within the set of separable permutations.
The main contribution of the present article is to show that this hypothesis indeed
holds for sentences of arbitrary length. En outre, we also establish a close connection
between CCG derivations and separating trees, a connection between CCG parsing and
the number of paths in a lattice, and the relation of this result to the other grammar
formalisms, namely, ITG (Wu 1996), TAG (Joshi 1985), MG (Stabler 1996), and CAT
(Williams 2003).
The paper is structured in the following way. In Section 2 we give a short overview
of CCG. In Section 3 we give a more specific definition of natural order of dominance
and prove that Steedman’s (2020) hypothesis is true. Section 4 shows an alternative,
possibly more intuitive, way of enumerating the permutations that CCG allows, first
by establishing the isomorphism between the normal forms of CCG derivations and
separating trees, and then by showing the connection between CCG shift-reduce parsing
and the Large Schr ¨oder Number of paths in a lattice on the Cartesian plane. Section 5
establishes the statistical significance of the fact that the two non-separable permuta-
tions that are predicted to be unattested are among the vanishingly small number of
permutations that are in fact empirically unattested so far. Section 6 puts this result in
the context of other grammar formalisms and their expressiveness. Appendix A gives
CCG derivations for all 22 separable permutations of the attested NP and VP word
ordres.
2 This notation is isomorphic to the Minimalist Programmatic notation of Chomsky (1995b; 2001), dans lequel
the present |Y is written =Y, and which is essentially categorial (Chomsky 1995a, 2000; Stabler 2011;
Adger 2013).
13
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
2. Combinatory Categorial Grammar
Combinatory Categorial Grammar (CCG) is a radically lexicalized theory of grammar in
which all language-specific syntactic and semantic information concerning word order
and subcategorization is specified in lexical entries or categories, and is projected onto
the sentences of the language by universal rules that are combinatory, in the sense that
they apply to pairs of non-empty, strictly contiguous, catégories. The latter property
immediately guarantees that no CCG can recognize the unattested permutation in (2)
et (3).
2.1 The Categorial Lexicon
The lexical fragment for the very common English NP order in CCG is a directional
categorial grammar, in which all instances of | are instantiated as /, meaning that they
have to combine with an element to the right, thus:3
(4) These = NP/NumP
= NumP/N(cid:48)
five
young = N(cid:48)/N
lads
= N
Slashes identify categories of the form X/Y and X\Y as functions taking an argument of
syntactic type Y to the right or to the left, and as yielding a result of type X.
Par contre, the following lexical fragment defines the even more common mirror-
image word order glossed as “Lads young five these,” as required, Par exemple, pour
Yoruba (Hawkins 1983, page 119).
(5) “These” = NP\NumP
= NumP\N(cid:48)
“five”
“young” = N(cid:48)\N
“lads”
= N
2.2 Rules of Function Application
The universally available rules (6) of syntactic combination called forward and back-
ward application (respectively labeled > and <) allow syntactic derivation from such
lexicons.
(6) The Application Rules
a. X/(cid:63)Y Y ⇒ X
b. Y X\(cid:63)Y ⇒ X
(>)
(<)
The type (cid:63) of the slashes in X/(cid:63)Y and X\(cid:63)Y limit the categories to which these rules can
apply, and can be ignored for the moment.
3 Of course, we need further lexical categories to allow, e.g., These young lads, Five lads, etc., as NP. This
might be done via underspecification using X-bar-theoretic features (Chomsky 1970), but we will ignore
them for now.
14
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
c
o
l
i
/
l
a
r
t
i
c
e
-
p
d
f
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
l
i
_
a
_
0
0
3
9
4
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
The forward rule (6a) allows the following derivation for the English lexicon (4):
(7)
These
five
young lads
NP/NumP NumP/N(cid:48) N(cid:48)/N N
>
N(cid:48)
>
NumP
>
NP
The rightward arrow > on all combinations in (7) indicates that it is the rightward
functional application rule (6un) that has applied in these cases.4
The derivation in (7) is clearly isomorphic to the standard linguistic context-free
phrase structure tree, except that it is the right way up—that is, its leaves are at the top
and its root at the bottom.
Together with the rightward and leftward application rules (6) for combination to
the right and left, such categories immediately generate eight of the 24 permutations, comme
follows:
(8) un.
b.
n.
o.
r.
s.
five
five
These
These
These
These
five
young lads
NP/NumP NumP/N(cid:48) N(cid:48)/N N
lads young
five
NP/NumP NumP/N(cid:48) N N(cid:48)\N
young lads
NP/NumP N(cid:48)/N N NumP\N(cid:48)
lads young
NP/NumP N N(cid:48)\N NumP\N(cid:48)
young lads
NumP/N(cid:48) N(cid:48)/N N NP\NumP
lads young
NumP/N(cid:48) N N(cid:48)\N NP\NumP
five
N(cid:48)/N N NumP\N(cid:48) NP\NumP
five
N N(cid:48)\N NumP\N(cid:48) NP\NumP
ces
ces
ces
ces
Five
Five
w. Young lads
X. Lads young
Very many languages
Many languages
Few languages
Many languages
Few languages
Many languages
Very few languages
Very many languages
Cinque (2005) gives counts for each word order in his sample of 700 or so languages,
quantized to four levels. The present counts are updated according to Cinque’s 2013
sample of around 1,500 languages, with “very few” corresponding to 10–29, “few” to
4 Although compositional semantics and logical form are suppressed for the purposes of this article, le
semantics of the rules in (6) is also the application of semantic functions such as young(cid:48) to arguments such
as lads(cid:48) to yield logical forms such as young(cid:48) lads(cid:48). En général, if the functor X | Y has logical form f and the
argument Y has logical form a, then the result X always has logical form f a (read “f of a”). Ainsi,
semantics is “surface compositional” in CCG.
15
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
30–99, “many” to 100–299, and “very many” to 300+ (cf. Merlo 2015, Tableau 1). The dis-
continuous alpha-numeration in (8) again reflects the place of these orders in Cinque’s
ordering of the 24 permutations of these elements introduced earlier in Figure 1, lequel
we take as standard. All eight orders are among those endorsed by Cinque and Nchare.5
The above eight orders are the ones obtained by treating the tree (7) as a mobile,
allowing sisters to rotate around each other. The same eight orders are similarly avail-
able for the Germanic serial verb construction, although for these orders we have no
meaningful frequency estimates:
(9) un. will
help
teach swim
VP1/VP2 VP2/VP3 VP3/VP4 VP4
swim teach
help
b. will
VP1/VP2 VP2/VP3 VP4 VP3\VP4
n. will
teach swim help
o. will
VP1/VP2 VP3/VP4 VP4 VP2\VP3
swim teach
VP1/VP2 VP4 VP3\VP4 VP2\VP3
help
r.
help
teach swim will
s.
VP2/VP3 VP3/VP4 VP4 VP1\VP2
swim teach
VP2/VP3 VP4 VP3\VP4 VP1\VP2
help
will
w.
teach swim help
will
VP3/VP4 VP4 VP2\VP3 VP1\VP2
X. swim teach
help
will
VP4 VP3\VP4 VP2\VP3 VP1\VP2
Again all of these orders are attested in Germanic, although Abel’s endorsement of (s)
is equivocal, on the grounds that it is only attested as an alternate in his sample.
It is striking that all possible orders based on purely applicative derivations appear
to be attested for both constructions, and that all of the NP orders admitted by Cinque
(2013) with “very many” or “many” attestations are among them, suggesting soft con-
straints favoring “harmony” or consistent directionality across categories and purely
applicative derivations.6
Cependant, the other 14 orders attested by the authorities in Figure 1 involve disconti-
nuities. Par exemple, in Figure 1c, in its instantiation as the Masai NP order “These lads
five young,” element 3 the adjective “young” is separated from its 4, the noun “lads” by
2, the numerator “five.” How do we allow the attested orders without also allowing the
unnattested orders in (2) et (3)?
5 Cinque’s (2013) study doubles the size of his sample to around 1,500 languages, and adds no new
attested orders to his original 14—unsurprisingly, given the power-law distribution of the frequencies.
6 Abels and Neeleman (2012) and Abels (2016) assume that these orders are base-generated, without
mouvement.
16
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
To do so, something more than the simple application rules is required. Cinque
and others propose transformational movement as that “something more.” The present
approach offers an alternative to movement, or any other syntactic operation of “action
at a distance” over non-contiguous elements.
2.3 Rules That Change Word Order
CCGs also include universally available rules of functional composition, strictly limited
in the first-order case to the following four combinatory rules:7
(10) The Harmonic Composition Rules
un. X/(cid:5)Y Y/Z ⇒B X/Z
b. Y\Z X\(cid:5)Y ⇒B X\Z
(11) The Crossing Composition Rules
un. X/×Y Y\Z ⇒B× X\Z
b. Y/Z X\×Y ⇒B× X/Z
(>B )
(B×)
(B2
×)
The combination of crossing rules and generalized composition is the source of (slightly)
greater than context-free expressive power in CCG, allowing analyses of trans-context-
free constructions like Germanic crossed dependencies (Bresnan et al. 1982; Steedman
1985, 2000).
All syntactic rules in CCG are subject to a generalization called the Combinatory
Projection Principle (CPP), which says that rules must apply consistent with the direc-
tionality specified on the primary function X|Oui, and must project unchanged onto their
result the directionality of any argument(s) specified on the secondary function Y|Z:8
(13) The Combinatory Projection Principle (CPP)
Syntactic combinatory rules are binary rules that apply to contiguous non-empty
categories of the specified syntactic types in the specified linear order, such that the
syntactic type and directionality of any argument in the inputs that also appears in
the result are the same.
7 It will be convenient in the following discussions to refer to the function X|Y in such rules as the
“primary” function, and to the other function Y|Z (etc.) as “secondary.” While we continue to suppress
explicit semantics for the purposes of the present article, like the application rules (6) the composition
rules (10) et (11) have an invariant surface-compositional semantics, such that if the meaning of the
primary function X|Y is a functor f and that of the secondary function Y|Z is g, then the meaning of the
result X|Z is λz.f (g z), the composition of the two functors, which if applied to an argument of type Z and
meaning a, yields an X meaning f (g a).
8 This principle is defined more formally in Steedman (2000; 2012) in terms of three more elementary
Principles of Adjacency, Consistency, and Inheritance.
17
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
Because directionality is originally specified in the lexicon, this principle amounts to the
statement that syntax cannot override the word order specified there.
En particulier, this principle universally excludes rules like the following from CCG:
(14) Oui
Y/Z
X/Y
X/Y
X/Y
X/Y
Y/Z
Z
(cid:54)⇒ X
(cid:54)⇒ X/Z
(cid:54)⇒ X\Z
(cid:54)⇒ X Z
Oui
The same principle excludes all movement, copying, deletion-under-identity, ou autre
action-at-a-distance, all structure-changing operations such as “restructuring,” “reanal-
ysis,” or “reconstruction,” and all “traces” and other syntactic empty categories, et
makes syntactic derivation strictly type-dependent, rather than structure-dependent.
The types (cid:5) and × on the slashes on the primary function X|Y in the composition
rules (10) et (11), like the type (cid:63) on the application rules (6), allows categories to be
lexically restricted as to which of the rules can apply to them or to their projections
under the CPP. The absence of specific slash-typing on the secondary function Y|Z is an
abbreviation meaning that it schematizes over all slash types. Cependant, the CPP (13)
requires that the corresponding slash type on the argument |Z in the result X|Z is the
same slash type.
This slash-typing system is different from the one used by Baldridge and Kruijff
(2003) in that types are not organized into a type hierarchy. Avoiding slash type hier-
archy allows for a more strict specification of which rules can be applied. The type we
now write as X/(cid:5)(cid:63)Y was written following Steedman (2005) and Baldridge and Kruijff
(2003) as X/(cid:5)Oui, and X/×(cid:63)Y was written X/×Y.9
The inclusion of the harmonic composition rules (10) allows some additional
derivations, and supports a variety of “non-constituent” coordinations (Steedman 1985;
Dowty 1988), of which the following is a simple example:10
(15)
These
five
fat
et
seven
lean
cattle
NP/(cid:5)(cid:63)NumP NumP/(cid:5)(cid:63)N(cid:48) N(cid:48)/(cid:5)(cid:63)N (X\(cid:63)X)/(cid:63)X NumP/(cid:5)(cid:63)N(cid:48) N(cid:48)/(cid:5)(cid:63)N N
NumP/(cid:5)(cid:63) N
>B
NumP/(cid:5)(cid:63) N
(NumP/(cid:5)(cid:63) N)\(cid:63)(NumP/(cid:5)(cid:63) N)
NumP/(cid:5)(cid:63) N
>B
>
< >
NumP
>
NP
The crossing composition rules (11), unlike the harmonic rules (10), have a reorder-
ing effect that is relevant to the present discussion. Par exemple, in English they allow
a non-movement-based account of the “Heavy NP Shift” construction, thus:
9 The present slash-type system is believed to be equivalent to the rule-based type-system of Steedman
(2000), distinct from the earlier one investigated by Kuhlmann, Koller, and Satta (2015), and restoring
weak equivalence to TAG.
10 The variable X in the conjunction category schematizes over a bounded number of types. The category’s (cid:63)
slash types impose the across-the-board constraint on coordination (Steedman 2012), and are a
consequence of its semantics, which follows Partee and Rooth (1983).
18
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
(16)
je
will
buy tomorrow a very heavy book
NP (S\ NP)/VP VP/NP VP\ VP
S\ NP
<
S
>
It will be obvious from the above derivation that allowing the crossing composition
rules (11) to apply to unrestricted categories induces alternation of word order, as here
between the Heavy-Shifted order in (16) and the canonical order I will buy a book
tomorrow. Steedman (2020) shows that such word-order alternations can be excluded
for fixed NP word-order languages of the kind considered by Cinque by restricting the
slash type of the functor categories in the lexicon as either × (“only crossing-compose”)
ou (cid:63) (“only apply”).
If, as in the alternation in English between (15) and the canonical order, we want
a category to combine by both forward harmonic composition and forward application,
then we assign a lexical category of the form X/(cid:5)(cid:63)Oui, bearing the union of (cid:5) et (cid:63) slash
les types, as in derivation (15). If we want all three rule-types to apply to a forward category,
to allow free word order, then we assign it the union of all three slash types X/(cid:5)×(cid:63)Oui,
which to save space and maintain compatibility with earlier notations we write as the
universal forward slash X/Y as in (16). Cependant, such details of language-specific fine-
tuning of grammars can be ignored for the present purposes.
3. The Generalization Concerning the Permutations Allowed by CCG
In this section we prove that the earlier claim that CCG generates only separable per-
mutations holds for constructions of arbitrary length. Section 3.1 extends the definition
of natural order of dominance to an arbitrary number of elements and shows how the
natural order of dominance follows from the CCG lexicon that defines constructions.
Section 3.2 then defines the notion of separable permutations. Sections 3.3 et 3.4 define
the proposition more formally and prove its correctness. Section 3.5 shows how this
result applies to the standard version of CCG including lexical type-raising and the full
range of combinatory rules.
3.1 Natural Order of Dominance
The notion of natural order of dominance has been used up until this point informally,
in the context of the categories defining the four core elements of the NP and VP
constructions that point the way to a generalization. To see the generalization to other
constructions of arbitrary length, a more formal definition is needed.
We define the Natural Order of Dominance (NOD) over categories of type X|Oui
where both X and Y are categories of first order, as follows. A word ai with a category
X|Y immediately dominates a word aj with category Y|Z.11 A NOD over n words is then
11 This definition of immediate dominance is a special case of dependency structure for Pure First-Order
CCG as defined by Koller and Kuhlmann (2009). We shall see below that this definition in fact generalizes
to the second-order case where X and/or Y are function categories.
19
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
defined as an ordering a1, . . . an such that ai immediately dominates ai+1 for all i < n. (There may be more than one NOD if there is more than one word of the same type.) Because every word’s category can select for only one argument category, and because each word ai in a NOD immediatly dominates word ai+1, that NOD defines a total ordering of words paired with their categories. We can write any NOD as a . . . Xn−1|Xn, Xn|Xn+1 where each category sequence of categories X1|X2, X2|X3, Xi|Xi+1 is a category of a word taking position i in the NOD. In the previous sections we have invoked NOD over the elements of noun and verb phrases with categories: NP|NumP, NumP|N(cid:48), N(cid:48)|N, N and VP1|VP2, VP2|VP3, VP3|VP4, VP4. These sequences match the generalized definition of NOD except for the last element being atomic Xn instead of a function category Xn|Xn+1. To allow such sequences, we extend the definition of NOD to allow the last element of the node Xn|Xn+1 to be an atomic Xn. Because |Xn+1 will never be canceled, it makes no difference to NOD whether it is present or not. In the proofs we will use the Xn|Xn+1 form for convenience. The linearization of these n words that is consistent with their NOD will be called their Canonical Linearization (CL). 3.2 Separable Permutations and Separating Trees Separable permutations were first formulated in algorithmic form by Avis and Newborn (1981) and have since then been rediscovered in different (but formally equivalent) forms such as bootstrap percolation in permutation matrices (Shapiro and Stephens 1991), permutations avoiding the “forbidden” patterns 2413 and 3142 (West 1996), and inversion transduction grammar (ITG) permutations (Wu 1996, among others). The pattern-avoiding definition of West (1996) can be stated in the following way: The permutation π is separable if and only if there are no digits a < b < c < d that appear in π as b . . . d . . . a . . . c or c . . . a . . . d . . . b. For example, 4 1 6 3 5 2 is not separable because the digits 3 < 4 < 5 < 6 form the forbidden pattern 4 . . . 6 . . . 3 . . . 5. Notice that the elements of the forbidden pattern do not need to be immediately next to each other: They are forbidden even when discontiguous. Bose, Buss, and Lubiw (1998) define the separable permutations in terms of separat- ing trees, which we define for present purposes as follows: (17) A separating tree over a permutation of a totally ordered set 1 . . . n is a binary tree in which each node dominates a continuous range i . . . k of elements of the ordering, which its chidren split exhaustively into two contiguous continuous sub- ranges i . . . j and j + 1 . . . k. Nodes are labeled + if the range of their left child precedes that of their right child, and − if the range on the left succeeds that on the right. For example, a permutation such as 4 3 1 2 5 8 6 7 over the range [1 . . . 8] can be split into two contiguous sub-sequences over continuous ranges, namely: 4 3 1 2 over range [1 . . . 4] and 5 8 6 7 over range [5 . . . 8], with the resulting binary node labeled +. If we continue to recursively factorize contiguous sub-sequences over continuous ranges, we form the separating tree shown in Figure 2a. It is important for what follows to note that the identities of the leaves of a separable tree are redundant, in the sense that they are fully determined by the tree structure and + / − labels on the non-terminal nodes. Bose, Buss, and Lubiw (1998) show that labels on separating trees define a sorting operation that reorders the children based on their ranges: If the label on a node is + 20 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Stanojevi´c and Steedman Formal Basis of a Language Universal + + + + − + − + + + + + − + − + + − + + − + + + + + + + 34 21 7685 43 21 7685 21 43 7685 21 43 765 8 (a) (b) (c) (d) Figure 2 Separating trees. then the range in the canonical linearization covered by the left child is ordered before that of the right child, and the existing order and label + is retained. If the label is −, then the CL range covered by the left child is ordered after that of the right, and the − operator is applied to restore the canonical order of its children, changing the label to +. This sorting procedure is exemplified in the series of transformations of the trees in Figure 2 that swap children of all nodes labeled with − until the canonical order results. The restriction of separating trees to binary structures limits the permutations that they can cover. For instance, we cannot create a separating tree for the permutation 2 4 1 3 because it cannot be split into contiguous subtrees covering continuous ranges. Bose, Buss, and Lubiw (1998) show that the set of permutations that is described by separating trees is precisely the set of separable permutations. In other words, every separable permutation can be mapped to a separating tree and every separating tree can be mapped to a separable permutation. Although it is clear that every separating tree maps to a unique separable permuta- tion, some separable permutations can be mapped to multiple different separating trees. We will return to this point in Section 4. 3.3 The Claim We observe that there is a direct relation between CCG derivations and separating trees. The intuition is clear from Figure 3: All CCG derivations for the canonical lineariza- tion of the five-element NOD, exemplified in Figure 3a, consist entirely of forward combinations, while the isomorphic separating trees for the canonical linearization, exemplified in Figure 3b, consist entirely of positive + nodes (cf. Figure 2d). Similarly, CCG derivations for non-canonical linearizations, exemplified in Figure 3c, include backward combinations, which correspond to − nodes in the isomorphic separating tree in Figure 3d.12 We seek to prove that the following claims hold for any NOD of n elements: 1. 2. 3. CCG can derive only separable permutations of the canonical linearization. CCG can derive all separable permutations of the canonical linearization. The cardinality of the set of permutations derivable by CCG is bounded by Sn−1, the n−1th element of the Large Schr ¨oder Number series (OEIS series A006318). 12 The actual combinatory rules involved will either be crossed or harmonic, but we can ignore the distinction here because of our use of the universal | slash. 21 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Computational Linguistics Volume 47, Number 1 >B
+
>B
>B
+
+
>B
+
X1|X2
X2|X3
X3|X4
X4|X5
X5|X6
X1|X2
X2|X3
X3|X4
X4|X5
X5|X6
(un) CCG derivation for canonical linearization.
(b) Separating tree for canonical linearization.
>B
+
B
+
X2|X3
X1|X2
X4|X5
X5|X6
X3|X4
X2|X3
X1|X2
X4|X5
X5|X6
X3|X4
(c) CCG derivation for non-canonical linearization.
(d) Separating tree that derives the same non-
canonical linearization.
Chiffre 3
Relation between CCG and separating trees.
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
We begin by proving three helpful lemmas. To prove the first of these lemmas we
use strong mathematical induction, sometimes also called complete induction. To prove
that some proposition P(n) holds for all natural numbers using strong induction it is
sufficient to prove the following statements:
(je)
(ii)
base case P(1) et
inductive step [P.(1) ∧ P(2) ∧ · · · ∧ P(k)] =⇒ P(k + 1) holds for all k ∈ N
Strong induction is different than the more typical weak induction in the strength of
the assumption that is taken in the inductive step: Weak induction assumes only P(k),
whereas strong induction assumes P(1) ∧ · · · ∧ P(k). Both are equally sound proving
strategies, but strong induction is more convenient for proving propositions about trees.
A more detailed distinction between weak and strong induction is available in many
textbooks that cover proof methods (Eccles 1997; Liebeck 2010; Gunderson 2014).
3.4 The Proof
Lemma 1
For a fixed NOD of n elements with canonical linearization X1|X2 X2|X3 . . . Xn|Xn+1:
CCG derivations can form all possible contiguous binary branching trees;
Every node that spans from i to j in every tree derives a category Xi|Xj+1.
(je)
(ii)
22
Stanojevi´c and Steedman
Formal Basis of a Language Universal
Proof. Base case:
For n = 1 we have a single category in the canonical linearization X1|X2. This is the only
possible tree and it is binary in a trivial sense. Its only node spans from 1 à 1 deriving
category X1|X1+1.
Inductive step: Assuming the strong inductive hypothesis holds for all 1 ≤ k ≤
n, we prove that it also holds for n + 1. We analyze the canonical linearization
. . . Xn+1|Xn+2 by splitting it into two contiguous parts where the first
X1|X2 X2|X3
. . . Xn+1|Xn+2 for any
. . . Xk|Xk+1 and the second one is Xk+1|Xk+2
one is X1|X2
k ∈ [1 . . . n]. Each of these two parts forms a smaller NOD with length ≤ n. From the
inductive assumption it follows that for both of these parts CCG can form all possible
binary branching trees, and that all nodes spanning i to j within these parts will be
labeled with Xi|Xj+1. That means that the root node of the left and right part will be
X1|Xk+1 and Xk+1|Xn+2, respectivement. Now we can combine the left and right part into
the whole structure using forward function composition >B to give us a new node
X1|Xn+2. This node satisfies the labeling condition of the lemma. The binary branching
condition is also satisfied because we put no constraints on the size of the spans of the
left and the right part (as long as they are non-empty), and by the inductive assumption,
(cid:3)
each part can derive all possible binary trees.
Lemma 2
All word orders that can be derived by CCG for a given NOD can also be derived by
an isomorphic separating tree that reorders the same permutation into the canonical
linearization for that NOD.
Proof. It is sufficient to show that any CCG derivation tree corresponds to an isomorphic
separating tree over the same permutation.
Consider any arbitrary CCG derivation tree for a canonical linearization of a NOD
of length n. Exemple: One such derivation for the NOD for n = 5 is in Figure 3a.
From Lemma 1, we know that all nodes with span i to j will have label Xi|Xj+1
and all binary nodes will use forward function composition >B. We can derive the
same order with an isomorphic separating tree where instead of using CCG forward
combination we have the + operator at every node, because the children are in the
canonical order. Exemple: Figure 3b.
Non-canonical linearization in CCG requires backward combinations. For any node
in a CCG derivation for the canonical linearization whose left child has category
Xi|Xj+1 and right child category Xj+1|Xk+1 we can obtain the inverse order by reversing
the directionality of the combination and reversing the order of the child nodes, making
Xj+1|Xk+1 precede Xi|Xj+1, so that all words in the range j + 1 to k in the canonical order
precede words i to j. Exemple: Figure 3c shows this reordering effect for the two nodes
where forward is replaced by backward combination.
Any CCG derivation reordered in this way corresponds to an isomorphic separating
tree in which the unrotated nodes corresponding to forward combination are relabeled
comme + and the rotated nodes corresponding to backward combination are relabeled as −.
Exemple: Figure 3d.
By the result of Bose, Buss, and Lubiw discussed in Section 3.2, applying the −
operator to convert all such nodes to + and invert the linear order of their children
reverses the process to recover the canonical linearization of which this is a separable
(cid:3)
permutation.
23
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
Lemma 3
CCG can derive all possible separating trees for an NOD of arbitrary length.
Proof. By Lemma 1, CCG can derive all possible binary branching trees on the canonical
linearization. By Lemma 2, each node in each of these binary trees can be labeled either
+ or −, defining a separating tree isomorphic to the CCG derivation for the correspond-
ing linearization. Because we saw in Section 3.2 that a separating tree determines the
identities of its leaves, the set of all possible binary trees with all possible node labelings
(cid:3)
is the set of all possible separating trees under the definition (17) là.
Enfin, we can prove the main claims that relate CCG derivations of NOD and
separable permutations of canonical linearization of NOD.
Theorem 1
CCG derives only separable permutations.
Proof. By Lemma 2, all CCG derivations can be mapped to separating trees. All separat-
ing trees correspond to a separable permutation (Bose, Buss, and Lubiw 1998). Donc
(cid:3)
CCG derives only separable permutations.
Theorem 2
CCG derives all separable permutations over a NOD.
Proof. By Lemma 3, CCG can derive all possible separating trees. Every separable
permutation corresponds to a separating tree (Bose, Buss, and Lubiw 1998). Donc,
(cid:3)
CCG derives all separable permutations.
Theorem 3
The cardinality of the set of permutations derivable by CCG is bounded by Sn−1, le
n − 1th Large Schr ¨oder Number.
Proof. By Theorems 1 et 2, CCG derives all and only separable permutations over an
NOD of length n. The number of separable permutations of length n is the n−1th Large
Schr ¨oder Number (Shapiro and Stephens 1991). Donc, the number of permutations
(cid:3)
derivable by CCG for that NOD is also n−1th Large Schr ¨oder Number.
The recurrent formula for this series is given below, the base case being S0 = 1:
(18)
Sn = Sn−1 +
n−1
(cid:88)
k=0
SkSn−k−1
3.5 Generalization to Full CCG
The above results have been proven using a restricted version of CCG called “pure
first-order” CCG (Koller and Kuhlmann 2009), in which all variables in categories
Xi|Xi+1 correspond to atomic types such as N, rather than function categories, tel que
(S\NP). The standard version of CCG is a second-order calculus, in which arguments
can have function types, a possibility that is central to the CCG analysis of relativization,
coordination, and adjunction, including the phenomenon of parasitic gaps. En effet, tous
24
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
cases of so-called movement reduce in CCG to contiguous merger with a semantically
second-order functor.
Categories that select for multiple arguments are “Curried” unary functors in CCG,
with a function as their result, tel que (Xi|Xj)|Xi+1. They do not define a NOD over the
arguments Xi+1, Xj, because those arguments do not stand in a dominance relation, et
they potentially allow the second-level composition rules such as (12) to apply. Never-
theless, they have a canonical linearization in which all combinations are forward, et
(by Lemma 2) dominate separating trees over separable permutations of that canonical
commande, although they no longer allow all separable permutations. (The large Schr ¨oder
numbers thus constitute an upper bound on the allowable permutations of canonical
linearizations in CCG.)
The present results generalize immediately to full CCG. The semantics of the com-
binatory rules is invariant, so it makes no difference to the combinatorial possibilities
if the “canceling” variable Y in a rule such as application matches an atomic category
or a function category. (En fait, we have already exploited this possibility in writing the
atomic symbols VP and N for what are semantically predicate or functional types.)
Type-raising is a lexical operation that exchanges the role of argument and function
by making the former into a function over the latter and that does allow permutations
that would not be allowed by the original categories. It does so by changing the NOD
among the categories. Cependant, because this is a lexical operation whose effects are
determined before syntactic derivation can apply, the main result of the present paper
still applies: Only separable permutations on the canonical linearization order defined
by the lexical categories can be captured.
4. Normal-Form Separating Trees and CCG Parsing
The results shown above depended on the fact that the canonical linearization on the
category sets of interest is binary tree-complete, and therefore separating tree-complete
under rotation, and therefore separable-permutation-complete. Cependant, we noted ear-
lier that there are more separating trees than separable permutations, because many
separating trees yield the same permutation. En effet, for the canonical linearization of
NOD of length n and its inverse, the number of separating trees is Cn−1, the n−1th
Catalan Number (Church and Patil 1982), which is much larger than the corresponding
Large Schr ¨oder number.
Par exemple, Chiffre 4 shows all eight separating trees derived by rotation from the
two binary trees on the canonical linearization of the canonical linearization CL3 of three
elements, dans lequel + on each of the two nodes means that it is unrotated and − means
that it is rotated. These are indeed all separable permutations of three elements, mais le
two orders 1,2,3 et 3,2,1 are represented by two separating trees, where C3 = 2. Pour
the purpose of efficient parsing with CCG, we would like there to be only one CCG
derivation, and one separating tree, for each separable permutation.
4.1 Normal-Form Separating Trees
Shapiro and Stephens (1991) observed that the spurious ambiguity of separating
trees arises when two nodes that are in parent–child relation have the same label, as for
the trees in Figure 4a and 4b. They proposed a normal form that makes the mapping
between the set of normal form separating trees and the set of separable permutations
bijective. This normal form states that no node can be labeled with the same +/−
label as its right child. This excludes trees like those in Figure 4a and 4e and means
25
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
+
+
−
−
+
+
+
+
321
21
3
32
1
213
(un)
−
(b)
−
(c)
+
(d)
+
−
−
−
−
123
23
1
12
3
231
(e)
(F)
(g)
(h)
Chiffre 4
All possible separating trees for a three-word sentence.
that the number of normal-form separating trees is the same as that of the separable
permutations.
This normal form for separating trees is reminiscent of the CCG derivational
normal-form proposed by Eisner (1996) to enforce a one-to-one mapping between CCG
syntactic derivations and semantic interpretation that they generate (cf. Hockenmaier
and Bisk 2010). Eisner normal-form forbids particular sequences of combinatory rules
that are of the same type from occurring in a parent-child relationship in the derivation.
Using normal-form separating trees, we can distinguish the possible cases, shown
in Figure 5. Figures 5a and 5b show the configurations where the right child is a
single word. Figures 5c and 5d show those where the right child is a non-single word
constituent that adheres to the normal form. The length of the examples is identified as
n + 1 in order to be consistent with the conventional numeration for the Large Schr ¨oder
numbers. The length of the constituent A is k. We are interested in the number of normal-
form separating-trees Sn, since that will also be the number of separable permutations
of length n + 1.
The number of trees that conform to the pattern in Figure 5a clearly depends on the
number of possible concrete sub-trees that can act as constituent A. Here A is of length
k = n, which means that the number of separating trees of the pattern in Figure 5a is
Sn−1. If we ignore the label of the root node, the other three patterns show all possible
ways of constructing the right child: having a single terminal node, or binary node
labeled + or −. This means that we can express the sum of trees confirming to these
three patterns as the number of trees that have their root unlabeled (since we are ignor-
ing the root label) and have the left and right child confirming to the normal form for
separating trees. For a given k, which plays a role of a split point in parsing, the number
of those trees would be Sk−1Sn+1−k−1, but because k can be any number between 1 et
n, and is not fixed, we need to sum over all of those values (cid:80)n
k=1 Sk−1Sn−k. We can shift
this sum to start from zero to obtain the equivalent formula (cid:80)n−1
k=0 SkSn−k−1. When we
put the formulas for all four patterns together we get the Large Schr ¨oder recurrence in
Équation (18). We can conclude that the number of normal-form separating trees for a
NOD of length n is also Sn−1.
26
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
+
−
UN
UN
0
n
n+1
0
n
n+1
(un)
+
(b)
−
−
+
UN
B
C
UN
B
C
0
k
(c)
n+1
0
k
(d)
n+1
Chiffre 5
Normal form branching configurations.
4.2 The Relation of Large Schr ¨oder Numbers to Shift-Reduce Parsing
The Large Schr ¨oder numbers are often also presented in relation to the number of lattice
paths in the Cartesian plane such that each path starts at lower-left corner (0, 0), ends
at upper right corner (n, n), contains no points above the diagonal y = x (c'est, peut
only be in the lower right triangle), and is composed only of unit steps →, ↑, et (cid:37)
or more concretely (1, 0), (0, 1), et (1, 1) (Weisstein 2018). In this section we show how
this view of Large Schr ¨oder numbers can be interpreted as Shift-Reduce parsing with
CCG derivations corresponding to the normal-form separating trees defined above.
Shift-Reduce is one of the most widely used parsing algorithms in computational
linguistics and it is probably the simplest way to construct a parser. It is based on
having a parsing state with a stack (filled with constituents built so far) and a buffer
(filled with words in the sentence suffix that are yet to be processed). A shift transition
takes a word from the buffer and puts it on top of the stack. For binarized grammars
like CCG, a reduceX transition replaces the two topmost constituents on the stack with
a single new constituent labeled X. In our CCG derivations we will have two reduce
opérations: reduce+ for forward combinators and reduce− for backward combinators.
We will again take an example sentence of length n + 1. The initial parsing state will be
a stack containing the first word and a buffer containing the other n words.
We can visually represent any valid successful shift-reduce parsing trace by starting
at the point (0, 0) and taking → to represent a shift transition and ↑ labeled + or − to
represent a reduceX transition. The Shift-Reduce transition sequence for the derivation
tree in Figure 6a is shown in Figure 6b. Clairement, such paths will never cross the diagonal,
27
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
−
−
+
+
−
546
231
6
4
5
1
3
2
(un) separating tree
(b) all reduce transitions are labeled
6
4
5
1
3
2
6
4
5
1
3
2
(c) only first reduce in the reduce
sequence is labeled
(d) no labels
Chiffre 6
A separating tree and different representations of the shift-reduce transition sequence that
recognizes it.
because we can never have more reduce operations than shifts and every successful
parse has n shifts and n reduce steps, so that every path ends at (n, n).
The set of SR parse paths can be further restricted to those corresponding to the
the normal-form parses and separating trees defined in the preceding Section 4.1, comme
follows. D'abord, the normal form says that a parent node cannot have the same label as
its right child node. What that means in SR parsing terms is that we cannot have two
consecutive transitions both of which are reduce+ or reduce−. Donc, when we have a
sequence of reduce operations, expressed as an uninterrupted vertical sequence of ↑,
we need only know the label of the first reduce transition in the sequence because the
other labels can be inferred from the normal form: They will alternate between + et
−. This modified representation is shown in Figure 6c.
The final modification is to use a slightly different encoding of the initial reduce in
each of the reduce sequences. The first reduce step in the sequence always comes after
a shift, which means that it forms a shape → ↑. If that sequence is [shift, reduce−], nous
will encode it with a diagonal (cid:37) and if it is [shift, reduce+] we will encode it with
existing→ ↑ but without any label, as shown in Figure 6d. This encoding defines a
one-to-one mapping between CCG normal-form shift-reduce parses and paths in the
Schr ¨oder lattice.
These observations support another way to think about the number of permutations
that CCG allows for a NOD of length n in terms of SR normal-form parsing. Quand
28
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
+ 1,oui)
(oui
Sn−k−1
Sn−1
Sk
1
0
(un) Paths starting with (cid:37) transition
n
0
k+1
1
(b) Paths starting with → transition
n
Chiffre 7
Counting paths in the lattice for different initial transitions.
we start in the initial position (0, 0) we can take paths that start either with (cid:37) or →
(↑ is illegal because it would cross the upper diagonal). If we take (cid:37) (Figure 7a) nous
find ourselves in a lattice of the same shape, smaller by one unit, which means that the
number of normal-form paths through it is Sn−1. If, on the other hand, we take → we
will end up in position (1, 0). Any path that goes from (1, 0) à (n, n) must at some point
have a vertical arrow that crosses diagonal (oui + 1, oui), as shown in Figure 7b. The first
point at which the path crosses that diagonal will be some concrete point (k + 1, k). Le
number of paths that go from (1, 0) à (n, n) and contain arc ↑ at (k + 1, k) is SkSn−k−1.
Encore, we need to sum over all possible paths that cross the diagonal (oui + 1, oui) over
each possible k, which gives (cid:80)n−1
k=0 SkSn−k−1. Summing results for initial (cid:37) and initial
→ again yields the Large Schr ¨oder number recurrence function from Equation (18).
5. Statistical Significance of the Global Result
The above results confirm that all 22 separable permutations are allowed, at least as al-
ternates, for any construction defined by four elements of the form A|B, B|C, C|D, D, et
le 2 non-separable permutations are forbidden. The fact that in both the constructions
surveyed, the distribution of the frequencies with which the separable permutation
orders are actually observed are so skewed means that some orders are so vanishingly
rare as to only be attested in one construction, and/or even by only one or two lan-
guages. The possibility that this skewness arises from soft constraints related to ease of
processing or learnability is interesting in their own right, and is the subject of ongoing
linguistic research (Culbertson, Smolensky, and Legendre 2012; Merlo 2015; Dryer 2018;
Merlo and Ouwayda 2018).13
Nevertheless, according to the present theory, the two permutations (g) et (j) sont
excluded for a different reason, namely, by a hard constraint that is a consequence of
the theory of grammar itself. The fact that the two disallowed orders are the only two
that are totally unattested in the union of Cinque’s, Nchare’s, and Abels’s surveys is a
13 This possibility is also acknowledged by Abels and Neeleman (2012).
29
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
strong result that is unlikely to have arisen by chance. The strength of this result can be
assessed quantitatively, as follows.
Il y a 21 permutations that are attested for NP. We will assume that all 21 sont
sampled without replacement from a uniform distribution of 24 permutations. Without
knowing which 21 are attested, we could form 24 theories that exclude one permutation,
24 × 23 that exclude two, et 24 × 23 × 22 that exclude 3 permutations. For each one
of them, there is a different number of possible observations (c'est à dire., different sets of 21
permutations) that would not falsify them. Par exemple, a theory that forbids 3 permu-
tations will be consistent only with one possible set of 21 observed permutations, alors que
a theory that forbids 2 (like the one presented here) will be consistent with 22 different
sets of 21 permutations. The probability of our predictions being not falsified by chance
is the number of observations (permutation sets) that would be consistent with our
theory divided by the total number of possible observations—that is, environ 1
dans 100:
(19)
(cid:19)
(cid:19) = 22
2024
≈ 0.0109
p =
(cid:18) 22
21
(cid:18) 24
21
If on the other hand we accept (h), as tentatively attested on the basis of Abels’ and
Wurmbrand’s accounts, the probability of getting this stronger result by chance drops
to around 1 dans 250:
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
(20)
(cid:19)
(cid:19) = 1
276
≈ 0.0036
p =
(cid:18) 22
22
(cid:18) 24
22
Both results are evidently unlikely to come up by chance.
6. Discussion
The prediction of CCG concerning possible word orders is significant from both sci-
entific and engineering perspectives. The Large Schr ¨oder series Sn grows much more
slowly with n than the total factorial number of permutations n!, so that the proportion
n!−Sn−1
of non-separable permutations that are disallowed by CCG grows rapidly with
n!
Sn−1
n! of separable
n, comme 0%, 0%, 0%, 8%, 25%, 45%, 64%, 79%, 89%, . . . . The proportion
permutations that are allowed shrinks correspondingly rapidly with n.
From a practical NLP perspective there are many areas where this saving can
be achieved, with machine translation and parsing, including “sequence-to-sequence”
semantic parsing, being the obvious use-cases. If we know the limited number of word-
order variations among languages, alignment algorithms (or attention as its neural
equivalent [Bahdanau, Cho, and Bengio 2015]) can be constrained (or biased) dans le
right direction (Zhang and Gildea 2005; Dong and Lapata 2016).14
14 This observation is also implicit in Whitelock’s (1992) early proposal for “shake and bake” translation for
Japonais.
30
Stanojevi´c and Steedman
Formal Basis of a Language Universal
Unsupervised parser induction is another application where introducing an induc-
tive bias (or hard constraint) for contiguous derivation trees with a normal form sig-
nificantly reduces the number of alternatives that learner needs to consider (Clark and
Lappin 2010). These savings are especially significant in comparison to other theories
of grammar that use discontinuous constituents, empty strings, or very powerful, souvent
Turing-complete, unification mechanisms.
We saw earlier that the number and type of permutations allowed by CCG appears
to correspond to the crosslinguistic variation in word order that is actually attested.
CCG also provides a strong and easily testable prediction of what word orders will be
found in the future in human languages that have not so far been examined (Steedman
2020). These properties stem from the combinatory projection principle (13), et en
particular from the restriction to binary combinatory rules.
So far, the empirical studies of crosslinguistic variation cited above have been
confined to no more than four core elements of NP and VP. Cependant, we have shown
that the results proved here are completely general to sets of categories of any size that
define a natural order of dominance. The verb group in particular can be naturally
extended to serial verbs of various kinds. The cartographic spines of functional pro-
jections identified by Cinque and Rizzi (2008) offer several tens of functional heads in
a supposedly universal order of dominance, although there is more to say here about
the relation of syntax and morphology and the degrees of crosslinguistic variation they
allow. Any firm evidence for any non-separable permutation of these elements in any
language for which the natural order of dominance can be unequivocally determined
would immediately falsify the present theory. As we noted earlier, the number of for-
bidden non-separable falsifying cases rises very rapidly as the size of such sets increases
au-delà 4.
Cependant, the power-law distributions that we observe on all such dimensions of
variation, and the fact that the performance factors that lie behind them favor continuity
and semantic homomorphism, mean that those falsifying cases will always lie out in the
long tail. We have seen that the total number of languages that have ever been studied in
the requisite detail is in the case of the nominal construction only just enough to take us
far enough out into that long tail to achieve a confidence level against ever seeing a non-
separable permutation of even 1 dans 100. That means our chances of reaching a definitive
answer to the question for even a five-element construction are quite daunting.
Naturellement, our work is not the first work to try to find a formal constraint of this
kind. Here we review some results from other formalisms.
ITG Inversion Transduction Grammars (Wu 1996) are a type of syntax transduction
grammar developed for the task of machine translation. ITG can generate only separa-
ble permutations of the words in the source language, so it is natural to ask how this
property of ITG differs from the one proposed here. The main difference lies in what is
considered the reference word order that is being permuted. Ici, CCG gives a clear
definition: The reference word order is the CL that reflects the NOD, and ultimately the
semantics. To say that CCG generates only separable permutations of the NOD is to
make a clear prediction about what permutations can be found in natural languages.
Cependant, the same is not the case with ITG, which does not identify any NOD. For ITG
the reference word order is that of the source language in the MT task. Taking a different
language as a source language gives a different prediction of what will be found in the
world’s languages. Take, par exemple, the previous example “these five young lads.” If
ITG takes English as a source language then it will make the same prediction as CCG.
But if we had, Par exemple, taken Spanish as a source language, with the word order
“these five lads young,” ITG would be able to derive “five lads these young,” which has
31
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
not been attested and that CCG predicts to be impossible. More importantly, starting
from Spanish ITG would be unable to derive the word order “lads these young five,»
which was attested by both Cinque and Nchare for NP and by Abels and Neeleman for
the equivalent VP permutation. Extending ITG to a more powerful formalism such as
Permutation Trees (Zhang and Gildea 2007; Stanojevi´c and Sima’an 2015) does not solve
this problem as long as the NOD is ignored.
TAG Tree-Adjoining Grammars (Joshi 1985) are often compared to CCG because
they are weakly equivalent, in the sense that they generate the same set of languages
(Joshi, Vijay-Shanker, and Weir 1991). The problem discussed in our article is about the
set of permutations allowed for a fixed natural order of dominance. This means that
results from our paper would transfer to TAG only if TAG was strongly equivalent to
CCG, which Koller and Kuhlmann (2009) have shown not to be the case: There are some
dependency structures that are derivable by CCG that are not derivable by TAG and vice
versa. En particulier, Koller and Kuhlmann’s (2009, Figure 8b) example of a dependency
structure derivable by TAG but not by CCG is actually the dependency structure of the
non-separable permutation 2413, ou (g) under the alphanumeric convention used in this
article.
Previous work on the permutations derivable by TAG have looked at the different
scrambled argument orders allowed in German. This was motivated by the work of
Becker, Rambow, and Niv (1992), who claim that scrambling requires even higher
generative power than LCFRS. Joshi, Becker, and Rambow (2000) show that that is
indeed the case if we want unbounded embedding levels of scrambling, but note that
it is not clear that this is needed in modeling attestable natural language data. The tree-
local multi-component version of TAG (TL-MC-TAG) does not increase weak generative
power but does increase strong generative power, allowing complete scrambling of
three arguments but not four.
Scrambling in CCG is handled by the assumption that all arguments are lexically
type-raised as functions over the verb or verb complex, which allows less oblique
arguments (crucially, the subject) to compose with the German verb complex in ad-
vance of more oblique arguments. As noted earlier, lexical type raising alters the NOD,
making the subject take position 1 and the verb position n in the order of dominance.
If the generalization of composition rules in CCG is limited to the second-order case
illustrated in (12), CCG is subject to the same limitation as TL-MC-TAG: three arguments
can scramble freely, but four cannot. For more discussion on German scrambling and
CCG, see Hockenmaier and Young (2008).
Chen-Main and Joshi (2008) take the problem of scrambling of noun arguments
of four verbs: N1N2N3N4V1V2V3V4. They show that there are 22 permutations of N1
to N4 that can be derived with the extended version of TL-MC-TAG that supports
flexible composition and multiple adjoining. Cependant, these permutations cannot be
directly compared with the NP and VP constructions considered in the empirical studies
discussed earlier.15
MG Minimalist Grammars (Stabler 1996) are a rigorously defined formalism for
Chomskyan Minimalism (Chomsky 1995b) that has higher generative capacity than
CCG. Stabler (2011) has used MG to derive possible permutations for the four core ele-
ments of a noun phrase and found that, under some assumptions, MG can generate only
15 When Chen-Main and Joshi (2008) report that their variation of TAG can capture permutation (g) 2413 de
verbal arguments, they are actually referring to a permutation 24135678 of eight elements, of which the
last four (the verbs) are polyvalent, defining a quite different (partial) NOD from the 1 > 2 > 3 > 4
linearization.
32
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Stanojevi´c and Steedman
Formal Basis of a Language Universal
16 out of the complete 24 possible permutations. These 16 include the 14 permutations
claimed by Cinque (2005), but exclude some of those claimed by Nchare (2012) for the
free word-order language Shupamem. The assumption that Stabler makes is that the
grammar uses neither empty strings nor head-movement, both of which are prevalent
in most linguistic work on minimalist syntax. Fait intéressant, Stabler’s 16 include permu-
tation (j), which is one of the two non-separable permutations that are excluded under
the present theory.
The limitation of combinatory reduction to the separable permutations is also
implicit in Svenonius’s (2007) *1-3-X-2 restriction on adjunct placement (among other
constructions), where X is an adjunct to 1, and in Williams’s (2003, pages 203-211)
proposal for his categorial calculus CAT. CAT has a standard directional categorial
lexicon and rule of application, with a combinatory operation REASSOCIATE equiv-
alent to composition, and an operation FLIP, unavailable in CCG, which reverses the
directionality on the argument of a functor category.
Williams’s CAT variety of Minimalism is closely related to CCG. Cependant, without
the addition of morpho-lexical type-raising, permitting arguments to compose, CAT
is unable to express the variety of constructions that CCG makes available, lequel
include relativization, coordination-reduction, Germanic scrambling, and Hungarian
VM fronting (Steedman 2020). In the absence of lexical type-raising these constructions
require the extension of CAT by the addition of powerful rules of “action at a distance”
unavailable in CCG, such as movement, copying, and/or deletion, together with atten-
dant constraints to limit their considerable expressive power.
A related proposal to ours by Medeiros (2018) hypothesizes that the set of allowed
permutations is the one that could be sorted with a single stack, so-called stack-sortable
permutations or 231-avoiding permutations (Knuth 1968). This set of permutations
exactly covers Cinque’s 14, but does not fit the other empirical data considered here:
7 permutations out of 21 that are observed are predicted not to be possible by Medeiros.
As with Williams (2003), it is also unclear that Medeiros’s mechanism will also han-
dle phenomena of unbounded or wh movement, as CCG does. Cependant, there is an
interesting relation between the sorting automata for 231-avoiding permutations and
the automata that sort separable permutations: 231-avoiding permutations are sortable
with a single pop-stack while separable permutations are sortable with a sequence of
n − 1 pop-stacks (Avis and Newborn 1981).
7. Conclusion
This article has argued that the facts of attested word order in the constructions consid-
ered exploits all and only the degrees of freedom that CCG allows, despite the fact that
some of the orders that CCG allows for some constructions are vanishingly rare due
to unrelated performance soft constraints that induce Zipfian (power-law) distributions
discussed by Culbertson, Smolensky, and Legendre (2012) and Culbertson and Adger
(2014).
Explanatory adequacy in a theory depends on being able to explain why the data
has just the degrees of freedom it does, and why other things don’t happen. If the latter
have been prevented by hard constraints, as in various ways they have been for these
constructions by Cinque, Svenonius, Abels, Neeleman, Nchare, and Stabler, then those
constraints themselves have in turn to be explained.
No hard constraints are needed at the level of competence grammar in CCG. Le
limitation on the grammatical permutation of n elements in natural grammars to the
Large Schr ¨oder number Sn−1 of separable permutations is a formal universal stemming
33
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
c
o
je
je
/
je
un
r
t
je
c
e
–
p
d
F
/
/
/
/
4
7
1
9
2
0
0
8
4
4
0
/
c
o
je
je
_
un
_
0
0
3
9
4
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
Computational Linguistics
Volume 47, Nombre 1
from the limited expressivity of CCG itself—specifically, its restriction of its operators
to strictly adjacent pairs of non-empty constituent types via the Combinatory Projec-
tion Principle (13). This result is a corollary of the Combinatory Categorial theory of
grammar, and the formally explicit reduction that it affords of what in Minimalist terms
would be called MOVE or “internal merge” to contiguous (c'est, “external”) MERGE.
Appendix A: The Ensemble of Attested Word Orders Over 4 Elements
In this appendix we review the attested orders of four core elements of NP and VP as
reported by Cinque, Nchare, and Abels and Neeleman. We also give CCG derivation
trees for these orders from Steedman (2020).
If our lexicon consists of four entirely unrestricted categories of the form A|B, B|C,
C|D, D, then for both the nominal and verbal constructions all 22 separable permuta-
tions are allowed, and only the two non-separable permutations are unanalyzable, as a
consequence of the combinatory projection principle (13), as follows.
The NP
For the NP, le 22 orders are allowed under the following derivations (“×” marks
the two sequences (g) et (j) that are unanalyzable). Only essential compositions are
indicated: all other combinations are application. For non-basic orders, the annotation
“from z” indicates the basic pure-applicative order among those in (8) on whose lexicon
a particular derived order is based:16
Both (basique: v.many)
Both (basique: many)
Cinque (from b: v.few)
Cinque (from b: few)
Nchare (from r)
Nchare (from s)
(21) un.
b.
c.
These
These
five
young lads
NP/NumP NumP/N(cid:48) N(cid:48)/N N
lads young
five
NP/NumP NumP/N(cid:48) N N(cid:48)\N
young
NP/NumP N NumP/N(cid:48) N(cid:48)\N
>B×
These
lads
five
NumP\N
d. Lads
five
ces
young
N NP/NumP NumP/N(cid:48) N(cid:48)\N
NP/N(cid:48)
>B
NP\N
>B×
e.
F.
Five
ces
young lads
NumP/N(cid:48) NP\ NumP N(cid:48)/N N
B×
je. Jeune
ces
lads
N(cid:48)/N NP/NumP NumP\N(cid:48) N
five
NP\N(cid:48)
NP/N
>B×
B×
five
NP\N(cid:48)
je. Lads young
ces
N N(cid:48)\N NP/NumP NumP\N(cid:48)
>B×
five
NP\N(cid:48)
five
m.
n.
o.
These
young
lads
NP/NumP N(cid:48)/N NumP\N(cid:48) N
B×
NP\N
q.
Five
young
lads
NumP/N(cid:48) N(cid:48)/N NP\NumP N
>B
ces
NumP/N
NP/N
B×
NumP\N
toi. Jeune
five
ces
lads
N(cid:48)/N NumP\N(cid:48) NP\NumP N
NP\N(cid:48)
B×
five
NumP\(cid:63)N
NumP
>
< NP This is Cinque’s attested order (21c).17 17 Depending on the facts concerning coordinations like (15) in the language in question, we may want to use /(cid:5)(cid:63) or \(cid:5)(cid:63) slash types in place of /(cid:63) or \(cid:63) to allow harmonic composition as well as application in a language like English. 36 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Stanojevi´c and Steedman Formal Basis of a Language Universal For a language like Shupamem with many orders, in the worst case we can simply allow multiple entries for lexical items supporting all those orders. However, in practice, smaller lexicons with entries supporting more than one order seem to apply. (For example, in Shupamem, Dem seems to bear the completely unrestricted bidirectional slash type NP|NumP—Steedman [2020, page 19].) The VP Turning to the Germanic VP construction surveyed in Abels (2016), we see the identical picture to that for the elements of the NP (21) emerge for the verb group. That is, only 22 out of the total 24 permutations are allowed: (24) a. b. will help teach swim VP1/VP2 VP2/VP3 VP3/VP4 VP4 swim teach help will VP1/VP2 VP2/VP3 VP4 VP3\VP4 c. will swim help teach VP1/VP2 VP4 VP2/VP3 VP3\VP4 >B×
VP2\VP4
d. swim will
help
teach
VP4 VP1/VP2 VP2/VP3 VP3\VP4
VP1\VP3
>B
VP1\VP4
>B×
e.
will
help
swim
VP2/VP3 VP1\VP2 VP3/VP4 VP4
B×
je.
teach
swim
VP3/VP4 VP1/VP2 VP2\VP3 VP4
help
will
VP1\VP3
VP1/VP4
>B×
B×
VP1\VP3
je. swim teach
will
help
VP4 VP3\VP4 VP1/VP2 VP2\VP3
>B×
m.
will
swim
VP1/VP2 VP3/VP4 VP2\VP3 VP4
teach
VP2\VP3
help
B×
VP1\VP4
will
q. Help
teach
swim
VP2/VP3 VP3/VP4 VP1\VP2 VP4
>B
VP2/VP4
VP1/VP4
B×
VP2\VP4
teach
swim
VP3/VP4 VP2\VP3 VP1\VP2 VP4
help
will
VP1\VP3
VP1/VP4