Formal Basis of a Language Universal

Miloˇs Stanojevi´c
爱丁堡大学
School of Informatics
m.stanojevic@ed.ac.uk

Mark Steedman
爱丁堡大学
School of Informatics
steedman@inf.ed.ac.uk

斯蒂德曼 (2020) proposes as a formal universal of natural language grammar that grammatical
permutations of the kind that have given rise to transformational rules are limited to a class
known to mathematicians and computer scientists as the “separable” permutations. This class
of permutations is exactly the class that can be expressed in combinatory categorial grammars
(CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of
studies of crosslinguistic variation in word order in nominal and verbal constructions.

The number of permutations that are separable grows in the number n of lexical elements in
the construction as the Large Schr¨oder Number Sn−1. Because that number grows much more
slowly than the n! number of all permutations, this generalization is also of considerable practical
interest for computational applications such as parsing and machine translation.

The present article examines the mathematical and computational origins of this restriction,
and the reason it is exactly captured in CCG without the imposition of any further constraints.

1. 介绍

Permutation of elements in a construction, either within a single language or across
语言, creating discontinuous semantic dependencies, is a widespread character-
istic of natural languages that has motivated transformational rules (Chomsky 1957,
passim), among other expressive grammar formalisms. 斯蒂德曼 (2020) proposes as
a formal universal of natural language grammar that grammatical permutations are
limited to a class known to mathematicians and computer scientists as the “separable”
permutations, which are those orders over which a tree can be constructed such that all
leaves descending from any node form a continuous subset i . . . j of the original ordered
放 1, . . . , n.

The evidence for this claim is based on a number of studies of two data sets of
alternations in two major constructions. The ﬁrst concerns the permutations of the four

提交材料已收到: 29 一月 2019; 收到修订版: 29 七月 2020; 接受出版:
10 十月 2020.

https://doi.org/10.1162/COLI 00394

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

elements Det(erminer) Num(erator) Adj(ective) 氮(oun) that surface in that order in
English noun phrases (NP) like “These ﬁve young lads,” but in Yoruba and many other
languages are found in the mirror-image permutation. The crosslinguistic variation in
this construction has been studied by Cinque (2005) and Abels and Neeleman (2012)
(除其他外) for languages with a single dominant order, and also within a single
language with very free word order for the same construction by Nchare (2012). 这
second construction is the verb group, also consisting of four elements, which we can
think mnemonically of as the verb-sequence “must have been dancing,” whose varia-
tions with various verb-types both across and within the various Germanic languages
and dialects, has been studied by Wurmbrand (2004; 2006) and Abels (2016).

In both constructions, the attested orders fall on highly skewed Zipﬁan power-law
分布, with the rarest orders found in only a very few cases. Such distributions
make it impossible to decide whether further unattested orders are entirely excluded
by hard constraints on the universal space of possible grammars, or merely disfavored
by sampling error arising from soft constraints such as “Harmony” (提议者
Culbertson, 斯摩棱斯基, and Legendre 2012, among others considered below) that may
merely make them vanishingly rare in samples of the size that we can actually access
(Abels and Neeleman 2012). 当然, a further possibility is that the unattested orders
arise from a combination of hard and soft constraints.

The present study defers to such authors on the speciﬁc characteristics of the soft
constraints and the distributions themselves, and considers only the nature and origin
of a speciﬁc hard constraint on the formal operations that deﬁne the set of permutations
that such distributions can apply to in the ﬁrst place.

Both the nominal and verbal constructions concerned can be thought of in grammat-
ical terms as made up of elements or “categories” with a “natural order of dominance”
stemming from their semantics. For these constructions, that order of dominance hap-
pens in English to be the same as their left-to-right order on the page, inducing a right-
branching tree-structure. 然而, in other languages, the same relations of dominance
allow all of the word orders that would arise if the same tree were treated as a mobile,
with daughters allowed to rotate around each other, as well as some word orders
involving apparent displacements that go beyond rotation. While the constructions of
interest allow only one natural order of dominance, other sets of categories including
multiple words of the same category may allow more than one natural order to merge.
(例如, the set consisting of the words “divorced, alleged, spy” has two natural
orders of dominance, the other one corresponding to a distinct sense “alleged divorced
spy.”)

We can therefore abstract over the two constructions of interest as being made up of
elements numbered 1 > 2 > 3 > 4 according to that natural order of dominance, 在下面
the following correspondence:

(1) These > ﬁve > young > lads

must > have > been > dancing

1 > 2 > 3 >

There is widespread suspicion that not all of the four-factorial permutations of these
constructions are allowed (威廉姆斯 2003; Svenonius 2007).

In considering whether a given permutation is allowed for the NP construction,
Cinque (2005; 2010) emphasizes the need to exclude from consideration word orders for
the NP in which the adjective is extraposed or has the markers of a relative clause. 这
exclusion is crucial, since such modiﬁers may be adjoined to the entire NP, rather than to

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

氮, thereby deﬁning a different natural order of dominance, with the extraposed element
作为 1. Cinque rejects a number of the orders claimed by Dryer (2018) on this ground. 这
present paper follows Cinque strictly concerning the attested ﬁxed NP orders.

Cinque also proposes to exclude free word-order alternation of the kind investi-
gated by Nchare (2012) for the Shupamem NP, on the grounds that alternating orders
reﬂect distinctions of focus. Accounts of focus alternations have sometimes assumed
the existence of an FP-like focus-phrasal external functional projection, offering an
extra target for a different variety of movement resembling extraposition (Szabolcsi
1983, 1994; Giusti 2006). 然而, this is an additional and somewhat theory-internal
assumption. English achieves similar alternations of focus or contrast in the NP by
in situ accent placement without varying word order, for which Steedman (2014) 优惠
a lexicalized alternative semantics without focus-movement or focus-projection, 其他
than by surface-compositional syntactic derivation. We therefore depart from Cinque
on this point, and provisionally accept Nchare’s free orders as arising from the same
mechanism as Cinque’s.

The orders attested in these three studies for the NP and VP constructions can then
be summarized as in Figure 1, in which the alphabetic ordering of the permutations
(which will be used for reference throughout this article) follows Cinque. (Abels 2016
adopts a different convention.) Cinque (2007 n.13) notes the possibility that Figure 1(米)

(Cinque 2005)

(Nchare 2012)

(Abels 2016)

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

A.
乙.
C.
d.
e.
F.
G.
H.
我.
j.
k.
我.
米.
n.
哦.
p.
q.
r.
s.
t.
你.
v.
w.
X.

1 2 3 4
1 2 4 3
1 4 2 3
4 1 2 3
2 1 3 4
2 1 4 3
2 4 1 3
4 2 1 3
3 1 2 4
3 1 4 2
3 4 1 2
4 3 1 2
1 3 2 4
1 3 4 2
1 4 3 2
4 1 3 2
2 3 1 4
2 3 4 1
2 4 3 1
4 2 3 1
3 2 1 4
3 2 4 1
3 4 2 1
4 3 2 1

v/
v/
v/
v/
—
—
—
—
—
—
v/
v/
(v/ )
v/
v/
v/
—
v/
v/
v/
—
—
v/
v/

数字 1
Attested permutations.

v/
v/
—
—
v/
v/
—
—
v/
—
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/
v/

v/
v/
v/
v/
—
(v/ )
—
(v/ )
—
—
v/
v/
(v/ )
v/
v/
v/
—
v/
(v/ )
v/
—
—
v/
v/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

constitutes a ﬁfteenth order for the NP, attested for only one language so far, Dhivehi
(Maldivian; 该隐 2000), indicated by parentheses in the ﬁgure.1

We note at this point that Dryer (2018) claims several further word orders as basic
for NP in ﬁxed-order languages, in addition to those listed by Cinque. 尤其, 他
索赔 (2018: 页面 17, 29) that Kilivila (Senft 1986) and four other languages have order
(G), which is excluded by the theory developed below, as their default order. 然而,
Cinque argues that Senft’s example of this order involves adjectival extraposition, 不是-
ing that Senft (1986, 页面 96) also cites the canonical order (A) for the full four-element
construction itself. Of the other four languages cited by Dryer, he himself notes for
Yapese (詹森 1977) that the adjective in his example of order (G) is marked as a relative
clause including the copula, hence arguably also extraposed from NP. Of the remaining
three languages for which Dryer claims this order, Katu (科斯特洛 1969, 页 22) 做
not lexically distinguish demonstratives from locatives, but the one example Costello
gives (1969, 页 34 (87)) involving both an adjective and a demonstrative locative has
the canonical reverse order N Adj Dem. The example given by Tryon (1967, 页 60) 为了
Dehu/Drehu includes the copula with the adjective, so is arguably also extraposed.

In the lexically multifunctional language Teop (Mosel 2017), to which Dryer also
属性 (G) as base order, adjectives are expressed as adjoined adjectival phrases, 和
their own copies of the article or agreement and numerator, so that a strong possibility
of extraposition or apposition clearly exists here also.

In terms of the CCG theory developed below, adjective extraposition requires
the addition of a distinct NP-adjunct category, syntactically and semantically non-
isomorphic to the standard adjective, inducing a different Natural Order of Dominance
over the four elements, with the extraposed adjectival as 1, the demonstrative as 2, 这
numerator as 3, and the noun as 4. 因此, none of these languages constitutes a coun-
terexample to the claim below that order (G) is universally excluded for the standard
类别. We have strictly followed Cinque’s account of the attested NP orders, 和
exclude several other orders listed by Dryer whose attestation appear to depend on
adjective extraposition, 包括 (H).

Of the orders for the Germanic verb group admitted by Abels (2016), he gives only
qualiﬁed support for Figure 1f, H, 米, and s, in some cases because they are only attested
as alternates, similarly marked by parentheses.

It is striking that just two of the 24 permutations over these two constructions are
entirely without even equivocal support for attestation in these three studies, 即,
Figure 1g and j—that is, 2413 and its mirror-image 3142. These are the permutations that
would give rise to the following word orders for the two constructions:

(2) G.

Five

lads

这些

young

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

NumP|氮(西德:48)
2

氮
4

NP|NumP N(西德:48)|氮

Young

这些

lads

ﬁve

氮(西德:48)|N NP|NumP

氮
4

NumP|氮(西德:48)
2

1 Some of the possibilities admitted for the Shupamem NP are conditional on the presence of clitic

agreement and deﬁniteness markers, and certain orders are associated with focus effects—see Nchare
(2012) chapter 3 and Steedman (2020) for details that we pass over here.

Stanojevi´c and Steedman

Formal Basis of a Language Universal

(3) G.

have dancing must

been

VP2|VP3 VP4 VP1|VP2 VP3|VP4

been

must dancing have

VP3|VP4 VP1|VP2 VP4 VP2|VP3

The above examples of permutations that in CCG are neither allowed as base orders nor
as alternates for the two constructions include lexical types indicating the dominance
relations among these elements, using an order-free categorial notation, in which a
category of the form X|Y indicates an element that combines with a sister constituent of
type Y adjacent to the right or to the left to yield a constituent of type X.2

These categories immediately offer a hint as to a hard constraint forcing the two
orders 2413 and its mirror-image 3142 to remain unattested. A formal proof is set out
以下, but it is obvious from inspection of (2) 和 (3) that these are the only two
permutations in which no category of the form X|Y is adjacent to a category Y, or to a category
是|Z that could yield Y as its result.

It is also interesting that the two permutations 2413 and its mirror-image 3142 是
distinguished by mathematicians and computer scientists as the only two permutations
on the order 1234 that are non-separable (Avis and Newborn 1981; Bose, Buss, and Lubiw
1998). This observation motivated Steedman (2020) to hypothesize that for a ﬁxed
natural order of dominance of arbitrary length all permutations generated by CCG are
within the set of separable permutations.

The main contribution of the present article is to show that this hypothesis indeed
holds for sentences of arbitrary length. 此外, we also establish a close connection
between CCG derivations and separating trees, a connection between CCG parsing and
the number of paths in a lattice, and the relation of this result to the other grammar
formalisms, 即, ITG (吴 1996), TAG (Joshi 1985), MG (Stabler 1996), and CAT
(威廉姆斯 2003).

The paper is structured in the following way. 在部分 2 we give a short overview
of CCG. 在部分 3 we give a more speciﬁc deﬁnition of natural order of dominance
and prove that Steedman’s (2020) hypothesis is true. 部分 4 shows an alternative,
possibly more intuitive, way of enumerating the permutations that CCG allows, 首先
by establishing the isomorphism between the normal forms of CCG derivations and
separating trees, and then by showing the connection between CCG shift-reduce parsing
and the Large Schr ¨oder Number of paths in a lattice on the Cartesian plane. 部分 5
establishes the statistical signiﬁcance of the fact that the two non-separable permuta-
tions that are predicted to be unattested are among the vanishingly small number of
permutations that are in fact empirically unattested so far. 部分 6 puts this result in
the context of other grammar formalisms and their expressiveness. Appendix A gives
CCG derivations for all 22 separable permutations of the attested NP and VP word
orders.

2 This notation is isomorphic to the Minimalist Programmatic notation of Chomsky (1995乙; 2001), 其中

现在 |Y is written =Y, and which is essentially categorial (Chomsky 1995a, 2000; Stabler 2011;
Adger 2013).

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

2. Combinatory Categorial Grammar

Combinatory Categorial Grammar (CCG) is a radically lexicalized theory of grammar in
which all language-speciﬁc syntactic and semantic information concerning word order
and subcategorization is speciﬁed in lexical entries or categories, and is projected onto
the sentences of the language by universal rules that are combinatory, 在某种意义上说
they apply to pairs of non-empty, strictly contiguous, 类别. The latter property
immediately guarantees that no CCG can recognize the unattested permutation in (2)
和 (3).

2.1 The Categorial Lexicon

The lexical fragment for the very common English NP order in CCG is a directional
categorial grammar, in which all instances of | are instantiated as /, meaning that they
have to combine with an element to the right, 因此:3

(4) These = NP/NumP
= NumP/N(西德:48)

ﬁve
young = N(西德:48)/氮
lads

= N

Slashes identify categories of the form X/Y and X\Y as functions taking an argument of
syntactic type Y to the right or to the left, and as yielding a result of type X.

相比之下, the following lexical fragment deﬁnes the even more common mirror-
image word order glossed as “Lads young ﬁve these,” as required, 例如, 为了
Yoruba (Hawkins 1983, 页 119).

(5) “These” = NP\NumP
= NumP\N(西德:48)

“ﬁve”
“young” = N(西德:48)\氮
“lads”

= N

2.2 Rules of Function Application

The universally available rules (6) of syntactic combination called forward and back-
ward application (respectively labeled > and <) allow syntactic derivation from such lexicons. (6) The Application Rules a. X/(cid:63)Y Y ⇒ X b. Y X\(cid:63)Y ⇒ X (>)
(<) The type (cid:63) of the slashes in X/(cid:63)Y and X\(cid:63)Y limit the categories to which these rules can apply, and can be ignored for the moment. 3 Of course, we need further lexical categories to allow, e.g., These young lads, Five lads, etc., as NP. This might be done via underspeciﬁcation using X-bar-theoretic features (Chomsky 1970), but we will ignore them for now. 14 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Stanojevi´c and Steedman Formal Basis of a Language Universal The forward rule (6a) allows the following derivation for the English lexicon (4): (7) These ﬁve young lads NP/NumP NumP/N(cid:48) N(cid:48)/N N >

氮(西德:48)

NumP

The rightward arrow > on all combinations in (7) indicates that it is the rightward
functional application rule (6A) that has applied in these cases.4

The derivation in (7) is clearly isomorphic to the standard linguistic context-free
phrase structure tree, except that it is the right way up—that is, its leaves are at the top
and its root at the bottom.

Together with the rightward and leftward application rules (6) for combination to
the right and left, such categories immediately generate eight of the 24 permutations, 作为
如下:

(8) A.

乙.

哦.

ﬁve

这些

ﬁve
young lads
NP/NumP NumP/N(西德:48) 氮(西德:48)/N N
lads young
ﬁve
NP/NumP NumP/N(西德:48) N N(西德:48)\氮
young lads
NP/NumP N(西德:48)/N N NumP\N(西德:48)
lads young
NP/NumP N N(西德:48)\N NumP\N(西德:48)
young lads
NumP/N(西德:48) 氮(西德:48)/N N NP\NumP
lads young
NumP/N(西德:48) N N(西德:48)\N NP\NumP
ﬁve
氮(西德:48)/N N NumP\N(西德:48) NP\NumP
ﬁve
N N(西德:48)\N NumP\N(西德:48) NP\NumP

这些

Five

w. Young lads

X. Lads young

Very many languages

Many languages

Few languages

Many languages

Few languages

Many languages

Very few languages

Very many languages

Cinque (2005) gives counts for each word order in his sample of 700 or so languages,
quantized to four levels. The present counts are updated according to Cinque’s 2013
sample of around 1,500 语言, with “very few” corresponding to 10–29, “few” to

4 Although compositional semantics and logical form are suppressed for the purposes of this article, 这

semantics of the rules in (6) is also the application of semantic functions such as young(西德:48) to arguments such
as lads(西德:48) to yield logical forms such as young(西德:48) lads(西德:48). 一般来说, if the functor X | Y has logical form f and the
argument Y has logical form a, then the result X always has logical form f a (read “f of a”). 因此,
semantics is “surface compositional” in CCG.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

30–99, “many” to 100–299, and “very many” to 300+ (比照. Merlo 2015, 桌子 1). The dis-
continuous alpha-numeration in (8) again reﬂects the place of these orders in Cinque’s
ordering of the 24 permutations of these elements introduced earlier in Figure 1, 哪个
we take as standard. All eight orders are among those endorsed by Cinque and Nchare.5
The above eight orders are the ones obtained by treating the tree (7) as a mobile,
allowing sisters to rotate around each other. The same eight orders are similarly avail-
able for the Germanic serial verb construction, although for these orders we have no
meaningful frequency estimates:

(9) A. 将要

帮助

teach swim
VP1/VP2 VP2/VP3 VP3/VP4 VP4
swim teach

帮助

乙. 将要

VP1/VP2 VP2/VP3 VP4 VP3\VP4

n. 将要

teach swim help

哦. 将要

VP1/VP2 VP3/VP4 VP4 VP2\VP3
swim teach
VP1/VP2 VP4 VP3\VP4 VP2\VP3

帮助

teach swim will

VP2/VP3 VP3/VP4 VP4 VP1\VP2
swim teach
VP2/VP3 VP4 VP3\VP4 VP1\VP2

帮助

将要

teach swim help

将要

VP3/VP4 VP4 VP2\VP3 VP1\VP2

X. swim teach

帮助

将要

VP4 VP3\VP4 VP2\VP3 VP1\VP2

Again all of these orders are attested in Germanic, although Abel’s endorsement of (s)
is equivocal, on the grounds that it is only attested as an alternate in his sample.

It is striking that all possible orders based on purely applicative derivations appear
to be attested for both constructions, and that all of the NP orders admitted by Cinque
(2013) with “very many” or “many” attestations are among them, suggesting soft con-
straints favoring “harmony” or consistent directionality across categories and purely
applicative derivations.6

然而, 另一个 14 orders attested by the authorities in Figure 1 involve disconti-
nuities. 例如, in Figure 1c, in its instantiation as the Masai NP order “These lads
ﬁve young,” element 3 the adjective “young” is separated from its 4, the noun “lads” by
2, the numerator “ﬁve.” How do we allow the attested orders without also allowing the
unnattested orders in (2) 和 (3)?

5 Cinque’s (2013) study doubles the size of his sample to around 1,500 语言, and adds no new

attested orders to his original 14—unsurprisingly, given the power-law distribution of the frequencies.

6 Abels and Neeleman (2012) and Abels (2016) assume that these orders are base-generated, 没有

movement.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

这样做, something more than the simple application rules is required. Cinque
and others propose transformational movement as that “something more.” The present
approach offers an alternative to movement, or any other syntactic operation of “action
at a distance” over non-contiguous elements.

2.3 Rules That Change Word Order

CCGs also include universally available rules of functional composition, strictly limited
in the ﬁrst-order case to the following four combinatory rules:7

(10) The Harmonic Composition Rules
A. X/(西德:5)Y Y/Z ⇒B X/Z
乙. Y\Z X\(西德:5)Y ⇒B X\Z
(11) The Crossing Composition Rules
A. X/×Y Y\Z ⇒B× X\Z
乙. Y/Z X\×Y ⇒B× X/Z

(>B )
(B×)
(B2

×)

The combination of crossing rules and generalized composition is the source of (轻微地)
greater than context-free expressive power in CCG, allowing analyses of trans-context-
free constructions like Germanic crossed dependencies (Bresnan et al. 1982; 斯蒂德曼
1985, 2000).

All syntactic rules in CCG are subject to a generalization called the Combinatory
Projection Principle (CPP), which says that rules must apply consistent with the direc-
tionality speciﬁed on the primary function X|是, and must project unchanged onto their
result the directionality of any argument(s) speciﬁed on the secondary function Y|Z:8

(13) The Combinatory Projection Principle (CPP)

Syntactic combinatory rules are binary rules that apply to contiguous non-empty
categories of the speciﬁed syntactic types in the speciﬁed linear order, such that the
syntactic type and directionality of any argument in the inputs that also appears in
the result are the same.

7 It will be convenient in the following discussions to refer to the function X|Y in such rules as the

“primary” function, and to the other function Y|Z (ETC。) as “secondary.” While we continue to suppress
explicit semantics for the purposes of the present article, like the application rules (6) the composition
规则 (10) 和 (11) have an invariant surface-compositional semantics, such that if the meaning of the
primary function X|Y is a functor f and that of the secondary function Y|Z is g, then the meaning of the
result X|Z is λz.f (g z), the composition of the two functors, which if applied to an argument of type Z and
meaning a, yields an X meaning f (g a).

8 This principle is deﬁned more formally in Steedman (2000; 2012) in terms of three more elementary

Principles of Adjacency, Consistency, and Inheritance.

17

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

Because directionality is originally speciﬁed in the lexicon, this principle amounts to the
statement that syntax cannot override the word order speciﬁed there.

尤其, this principle universally excludes rules like the following from CCG:

(14) 是

Y/Z
X/Y
X/Y

X/Y
X/Y
Y/Z
Z

(西德:54)⇒ X
(西德:54)⇒ X/Z
(西德:54)⇒ X\Z
(西德:54)⇒ X Z

是

The same principle excludes all movement, copying, deletion-under-identity, or other
action-at-a-distance, all structure-changing operations such as “restructuring,” “reanal-
分析,” or “reconstruction,” and all “traces” and other syntactic empty categories, 和
makes syntactic derivation strictly type-dependent, rather than structure-dependent.

The types (西德:5) and × on the slashes on the primary function X|Y in the composition
规则 (10) 和 (11), like the type (西德:63) on the application rules (6), allows categories to be
lexically restricted as to which of the rules can apply to them or to their projections
under the CPP. The absence of speciﬁc slash-typing on the secondary function Y|Z is an
abbreviation meaning that it schematizes over all slash types. 然而, the CPP (13)
requires that the corresponding slash type on the argument |Z in the result X|Z is the
same slash type.

This slash-typing system is different from the one used by Baldridge and Kruijff
(2003) in that types are not organized into a type hierarchy. Avoiding slash type hier-
archy allows for a more strict speciﬁcation of which rules can be applied. The type we
now write as X/(西德:5)(西德:63)Y was written following Steedman (2005) and Baldridge and Kruijff
(2003) as X/(西德:5)是, and X/×(西德:63)Y was written X/×Y.9

The inclusion of the harmonic composition rules (10) allows some additional
derivations, and supports a variety of “non-constituent” coordinations (斯蒂德曼 1985;
Dowty 1988), of which the following is a simple example:10

(15)

这些

ﬁve

胖的

和

七

lean

cattle

NP/(西德:5)(西德:63)NumP NumP/(西德:5)(西德:63)氮(西德:48) 氮(西德:48)/(西德:5)(西德:63)氮 (X\(西德:63)X)/(西德:63)X NumP/(西德:5)(西德:63)氮(西德:48) 氮(西德:48)/(西德:5)(西德:63)N N

NumP/(西德:5)(西德:63) 氮

>B

NumP/(西德:5)(西德:63) 氮
(NumP/(西德:5)(西德:63) 氮)\(西德:63)(NumP/(西德:5)(西德:63) 氮)
NumP/(西德:5)(西德:63) 氮

>B

>

< >

NumP

>

NP

The crossing composition rules (11), unlike the harmonic rules (10), have a reorder-
ing effect that is relevant to the present discussion. 例如, in English they allow
a non-movement-based account of the “Heavy NP Shift” construction, 因此:

9 The present slash-type system is believed to be equivalent to the rule-based type-system of Steedman
(2000), distinct from the earlier one investigated by Kuhlmann, Koller, and Satta (2015), and restoring
weak equivalence to TAG.

10 The variable X in the conjunction category schematizes over a bounded number of types. The category’s (西德:63)

slash types impose the across-the-board constraint on coordination (斯蒂德曼 2012), and are a
consequence of its semantics, which follows Partee and Rooth (1983).

18

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

(16)

我

将要

buy tomorrow a very heavy book

NP (S\ NP)/VP VP/NP VP\ VP

S\ NP
< S >

It will be obvious from the above derivation that allowing the crossing composition
规则 (11) to apply to unrestricted categories induces alternation of word order, as here
between the Heavy-Shifted order in (16) and the canonical order I will buy a book
tomorrow. 斯蒂德曼 (2020) shows that such word-order alternations can be excluded
for ﬁxed NP word-order languages of the kind considered by Cinque by restricting the
slash type of the functor categories in the lexicon as either × (“only crossing-compose”)
或者 (西德:63) (“only apply”).

如果, as in the alternation in English between (15) and the canonical order, we want
a category to combine by both forward harmonic composition and forward application,
then we assign a lexical category of the form X/(西德:5)(西德:63)是, bearing the union of (西德:5) 和 (西德:63) slash
类型, as in derivation (15). If we want all three rule-types to apply to a forward category,
to allow free word order, then we assign it the union of all three slash types X/(西德:5)×(西德:63)是,
which to save space and maintain compatibility with earlier notations we write as the
universal forward slash X/Y as in (16). 然而, such details of language-speciﬁc ﬁne-
tuning of grammars can be ignored for the present purposes.

3. The Generalization Concerning the Permutations Allowed by CCG

In this section we prove that the earlier claim that CCG generates only separable per-
mutations holds for constructions of arbitrary length. 部分 3.1 extends the deﬁnition
of natural order of dominance to an arbitrary number of elements and shows how the
natural order of dominance follows from the CCG lexicon that deﬁnes constructions.
部分 3.2 then deﬁnes the notion of separable permutations. Sections 3.3 和 3.4 deﬁne
the proposition more formally and prove its correctness. 部分 3.5 shows how this
result applies to the standard version of CCG including lexical type-raising and the full
range of combinatory rules.

3.1 Natural Order of Dominance

The notion of natural order of dominance has been used up until this point informally,
in the context of the categories deﬁning the four core elements of the NP and VP
constructions that point the way to a generalization. To see the generalization to other
constructions of arbitrary length, a more formal deﬁnition is needed.

We deﬁne the Natural Order of Dominance (NOD) over categories of type X|是
where both X and Y are categories of ﬁrst order, as follows. A word ai with a category
X|Y immediately dominates a word aj with category Y|Z.11 A NOD over n words is then

11 This deﬁnition of immediate dominance is a special case of dependency structure for Pure First-Order

CCG as deﬁned by Koller and Kuhlmann (2009). We shall see below that this deﬁnition in fact generalizes
to the second-order case where X and/or Y are function categories.

19

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

deﬁned as an ordering a1, . . . an such that ai immediately dominates ai+1 for all i < n. (There may be more than one NOD if there is more than one word of the same type.) Because every word’s category can select for only one argument category, and because each word ai in a NOD immediatly dominates word ai+1, that NOD deﬁnes a total ordering of words paired with their categories. We can write any NOD as a . . . Xn−1|Xn, Xn|Xn+1 where each category sequence of categories X1|X2, X2|X3, Xi|Xi+1 is a category of a word taking position i in the NOD. In the previous sections we have invoked NOD over the elements of noun and verb phrases with categories: NP|NumP, NumP|N(cid:48), N(cid:48)|N, N and VP1|VP2, VP2|VP3, VP3|VP4, VP4. These sequences match the generalized deﬁnition of NOD except for the last element being atomic Xn instead of a function category Xn|Xn+1. To allow such sequences, we extend the deﬁnition of NOD to allow the last element of the node Xn|Xn+1 to be an atomic Xn. Because |Xn+1 will never be canceled, it makes no difference to NOD whether it is present or not. In the proofs we will use the Xn|Xn+1 form for convenience. The linearization of these n words that is consistent with their NOD will be called their Canonical Linearization (CL). 3.2 Separable Permutations and Separating Trees Separable permutations were ﬁrst formulated in algorithmic form by Avis and Newborn (1981) and have since then been rediscovered in different (but formally equivalent) forms such as bootstrap percolation in permutation matrices (Shapiro and Stephens 1991), permutations avoiding the “forbidden” patterns 2413 and 3142 (West 1996), and inversion transduction grammar (ITG) permutations (Wu 1996, among others). The pattern-avoiding deﬁnition of West (1996) can be stated in the following way: The permutation π is separable if and only if there are no digits a < b < c < d that appear in π as b . . . d . . . a . . . c or c . . . a . . . d . . . b. For example, 4 1 6 3 5 2 is not separable because the digits 3 < 4 < 5 < 6 form the forbidden pattern 4 . . . 6 . . . 3 . . . 5. Notice that the elements of the forbidden pattern do not need to be immediately next to each other: They are forbidden even when discontiguous. Bose, Buss, and Lubiw (1998) deﬁne the separable permutations in terms of separat- ing trees, which we deﬁne for present purposes as follows: (17) A separating tree over a permutation of a totally ordered set 1 . . . n is a binary tree in which each node dominates a continuous range i . . . k of elements of the ordering, which its chidren split exhaustively into two contiguous continuous sub- ranges i . . . j and j + 1 . . . k. Nodes are labeled + if the range of their left child precedes that of their right child, and − if the range on the left succeeds that on the right. For example, a permutation such as 4 3 1 2 5 8 6 7 over the range [1 . . . 8] can be split into two contiguous sub-sequences over continuous ranges, namely: 4 3 1 2 over range [1 . . . 4] and 5 8 6 7 over range [5 . . . 8], with the resulting binary node labeled +. If we continue to recursively factorize contiguous sub-sequences over continuous ranges, we form the separating tree shown in Figure 2a. It is important for what follows to note that the identities of the leaves of a separable tree are redundant, in the sense that they are fully determined by the tree structure and + / − labels on the non-terminal nodes. Bose, Buss, and Lubiw (1998) show that labels on separating trees deﬁne a sorting operation that reorders the children based on their ranges: If the label on a node is + 20 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Stanojevi´c and Steedman Formal Basis of a Language Universal + + + + − + − + + + + + − + − + + − + + − + + + + + + + 34 21 7685 43 21 7685 21 43 7685 21 43 765 8 (a) (b) (c) (d) Figure 2 Separating trees. then the range in the canonical linearization covered by the left child is ordered before that of the right child, and the existing order and label + is retained. If the label is −, then the CL range covered by the left child is ordered after that of the right, and the − operator is applied to restore the canonical order of its children, changing the label to +. This sorting procedure is exempliﬁed in the series of transformations of the trees in Figure 2 that swap children of all nodes labeled with − until the canonical order results. The restriction of separating trees to binary structures limits the permutations that they can cover. For instance, we cannot create a separating tree for the permutation 2 4 1 3 because it cannot be split into contiguous subtrees covering continuous ranges. Bose, Buss, and Lubiw (1998) show that the set of permutations that is described by separating trees is precisely the set of separable permutations. In other words, every separable permutation can be mapped to a separating tree and every separating tree can be mapped to a separable permutation. Although it is clear that every separating tree maps to a unique separable permuta- tion, some separable permutations can be mapped to multiple different separating trees. We will return to this point in Section 4. 3.3 The Claim We observe that there is a direct relation between CCG derivations and separating trees. The intuition is clear from Figure 3: All CCG derivations for the canonical lineariza- tion of the ﬁve-element NOD, exempliﬁed in Figure 3a, consist entirely of forward combinations, while the isomorphic separating trees for the canonical linearization, exempliﬁed in Figure 3b, consist entirely of positive + nodes (cf. Figure 2d). Similarly, CCG derivations for non-canonical linearizations, exempliﬁed in Figure 3c, include backward combinations, which correspond to − nodes in the isomorphic separating tree in Figure 3d.12 We seek to prove that the following claims hold for any NOD of n elements: 1. 2. 3. CCG can derive only separable permutations of the canonical linearization. CCG can derive all separable permutations of the canonical linearization. The cardinality of the set of permutations derivable by CCG is bounded by Sn−1, the n−1th element of the Large Schr ¨oder Number series (OEIS series A006318). 12 The actual combinatory rules involved will either be crossed or harmonic, but we can ignore the distinction here because of our use of the universal | slash. 21 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Computational Linguistics Volume 47, Number 1 >乙

+

>B

>B

+

+

>B

+

X1|X2

X2|X3

X3|X4

X4|X5

X5|X6

X1|X2

X2|X3

X3|X4

X4|X5

X5|X6

(A) CCG derivation for canonical linearization.

(乙) Separating tree for canonical linearization.

>B

+

乙

+

X2|X3

X1|X2

X4|X5

X5|X6

X3|X4

X2|X3

X1|X2

X4|X5

X5|X6

X3|X4

(C) CCG derivation for non-canonical linearization.

(d) Separating tree that derives the same non-
canonical linearization.

数字 3
Relation between CCG and separating trees.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

We begin by proving three helpful lemmas. To prove the ﬁrst of these lemmas we
use strong mathematical induction, sometimes also called complete induction. To prove
that some proposition P(n) holds for all natural numbers using strong induction it is
sufﬁcient to prove the following statements:

(我)

(二)

base case P(1) 和
inductive step [磷(1) ∧ P(2) ∧ · · · ∧ P(k)] =⇒ P(k + 1) holds for all k ∈ N

Strong induction is different than the more typical weak induction in the strength of
the assumption that is taken in the inductive step: Weak induction assumes only P(k),
whereas strong induction assumes P(1) ∧ · · · ∧ P(k). Both are equally sound proving
策略, but strong induction is more convenient for proving propositions about trees.
A more detailed distinction between weak and strong induction is available in many
textbooks that cover proof methods (Eccles 1997; Liebeck 2010; Gunderson 2014).

3.4 The Proof

Lemma 1
For a ﬁxed NOD of n elements with canonical linearization X1|X2 X2|X3 . . . Xn|Xn+1:

CCG derivations can form all possible contiguous binary branching trees;

Every node that spans from i to j in every tree derives a category Xi|Xj+1.

(我)

(二)

22

Stanojevi´c and Steedman

Formal Basis of a Language Universal

Proof. Base case:
For n = 1 we have a single category in the canonical linearization X1|X2. This is the only
possible tree and it is binary in a trivial sense. Its only node spans from 1 到 1 deriving
category X1|X1+1.
Inductive step: Assuming the strong inductive hypothesis holds for all 1 ≤ k ≤
n, we prove that it also holds for n + 1. We analyze the canonical linearization
. . . Xn+1|Xn+2 by splitting it into two contiguous parts where the ﬁrst
X1|X2 X2|X3
. . . Xn+1|Xn+2 for any
. . . Xk|Xk+1 and the second one is Xk+1|Xk+2
one is X1|X2
k ∈ [1 . . . n]. Each of these two parts forms a smaller NOD with length ≤ n. 从
inductive assumption it follows that for both of these parts CCG can form all possible
binary branching trees, and that all nodes spanning i to j within these parts will be
labeled with Xi|Xj+1. That means that the root node of the left and right part will be
X1|Xk+1 and Xk+1|Xn+2, 分别. Now we can combine the left and right part into
the whole structure using forward function composition >B to give us a new node
X1|Xn+2. This node satisﬁes the labeling condition of the lemma. The binary branching
condition is also satisﬁed because we put no constraints on the size of the spans of the
left and the right part (as long as they are non-empty), and by the inductive assumption,
(西德:3)
each part can derive all possible binary trees.

Lemma 2
All word orders that can be derived by CCG for a given NOD can also be derived by
an isomorphic separating tree that reorders the same permutation into the canonical
linearization for that NOD.

Proof. It is sufﬁcient to show that any CCG derivation tree corresponds to an isomorphic
separating tree over the same permutation.

Consider any arbitrary CCG derivation tree for a canonical linearization of a NOD

of length n. 例子: One such derivation for the NOD for n = 5 is in Figure 3a.

From Lemma 1, we know that all nodes with span i to j will have label Xi|Xj+1
and all binary nodes will use forward function composition >B. We can derive the
same order with an isomorphic separating tree where instead of using CCG forward
combination we have the + operator at every node, because the children are in the
canonical order. 例子: Figure 3b.

Non-canonical linearization in CCG requires backward combinations. For any node
in a CCG derivation for the canonical linearization whose left child has category
席|Xj+1 and right child category Xj+1|Xk+1 we can obtain the inverse order by reversing
the directionality of the combination and reversing the order of the child nodes, 制作
Xj+1|Xk+1 precede Xi|Xj+1, so that all words in the range j + 1 to k in the canonical order
precede words i to j. 例子: Figure 3c shows this reordering effect for the two nodes
where forward is replaced by backward combination.

Any CCG derivation reordered in this way corresponds to an isomorphic separating
tree in which the unrotated nodes corresponding to forward combination are relabeled
作为 + and the rotated nodes corresponding to backward combination are relabeled as −.
例子: Figure 3d.

By the result of Bose, Buss, and Lubiw discussed in Section 3.2, applying the −
operator to convert all such nodes to + and invert the linear order of their children
reverses the process to recover the canonical linearization of which this is a separable
(西德:3)
permutation.

23

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

Lemma 3
CCG can derive all possible separating trees for an NOD of arbitrary length.

Proof. By Lemma 1, CCG can derive all possible binary branching trees on the canonical
linearization. By Lemma 2, each node in each of these binary trees can be labeled either
+ or −, deﬁning a separating tree isomorphic to the CCG derivation for the correspond-
ing linearization. Because we saw in Section 3.2 that a separating tree determines the
identities of its leaves, the set of all possible binary trees with all possible node labelings
(西德:3)
is the set of all possible separating trees under the deﬁnition (17) 那里.

最后, we can prove the main claims that relate CCG derivations of NOD and

separable permutations of canonical linearization of NOD.

Theorem 1
CCG derives only separable permutations.

Proof. By Lemma 2, all CCG derivations can be mapped to separating trees. All separat-
ing trees correspond to a separable permutation (Bose, Buss, and Lubiw 1998). 所以
(西德:3)
CCG derives only separable permutations.

Theorem 2
CCG derives all separable permutations over a NOD.

Proof. By Lemma 3, CCG can derive all possible separating trees. Every separable
permutation corresponds to a separating tree (Bose, Buss, and Lubiw 1998). 所以,
(西德:3)
CCG derives all separable permutations.

Theorem 3
The cardinality of the set of permutations derivable by CCG is bounded by Sn−1, 这
n − 1th Large Schr ¨oder Number.

Proof. By Theorems 1 和 2, CCG derives all and only separable permutations over an
NOD of length n. The number of separable permutations of length n is the n−1th Large
Schr ¨oder Number (Shapiro and Stephens 1991). 所以, the number of permutations
(西德:3)
derivable by CCG for that NOD is also n−1th Large Schr ¨oder Number.

The recurrent formula for this series is given below, the base case being S0 = 1:

(18)

Sn = Sn−1 +

n−1
(西德:88)

k=0

SkSn−k−1

3.5 Generalization to Full CCG

The above results have been proven using a restricted version of CCG called “pure
ﬁrst-order” CCG (Koller and Kuhlmann 2009), in which all variables in categories
席|Xi+1 correspond to atomic types such as N, rather than function categories, 例如
(S\NP). The standard version of CCG is a second-order calculus, in which arguments
can have function types, a possibility that is central to the CCG analysis of relativization,
协调, and adjunction, including the phenomenon of parasitic gaps. 的确, 全部

24

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

cases of so-called movement reduce in CCG to contiguous merger with a semantically
second-order functor.

Categories that select for multiple arguments are “Curried” unary functors in CCG,
with a function as their result, 例如 (席|Xj)|Xi+1. They do not deﬁne a NOD over the
arguments Xi+1, Xj, because those arguments do not stand in a dominance relation, 和
they potentially allow the second-level composition rules such as (12) to apply. Never-
theless, they have a canonical linearization in which all combinations are forward, 和
(by Lemma 2) dominate separating trees over separable permutations of that canonical
ordering, although they no longer allow all separable permutations. (The large Schr ¨oder
numbers thus constitute an upper bound on the allowable permutations of canonical
linearizations in CCG.)

The present results generalize immediately to full CCG. The semantics of the com-
binatory rules is invariant, so it makes no difference to the combinatorial possibilities
if the “canceling” variable Y in a rule such as application matches an atomic category
or a function category. (实际上, we have already exploited this possibility in writing the
atomic symbols VP and N for what are semantically predicate or functional types.)

Type-raising is a lexical operation that exchanges the role of argument and function
by making the former into a function over the latter and that does allow permutations
that would not be allowed by the original categories. It does so by changing the NOD
among the categories. 然而, because this is a lexical operation whose effects are
determined before syntactic derivation can apply, the main result of the present paper
仍然适用: Only separable permutations on the canonical linearization order deﬁned
by the lexical categories can be captured.

4. Normal-Form Separating Trees and CCG Parsing

The results shown above depended on the fact that the canonical linearization on the
category sets of interest is binary tree-complete, and therefore separating tree-complete
under rotation, and therefore separable-permutation-complete. 然而, we noted ear-
lier that there are more separating trees than separable permutations, because many
separating trees yield the same permutation. 的确, for the canonical linearization of
NOD of length n and its inverse, the number of separating trees is Cn−1, the n−1th
Catalan Number (Church and Patil 1982), which is much larger than the corresponding
Large Schr ¨oder number.

例如, 数字 4 shows all eight separating trees derived by rotation from the
two binary trees on the canonical linearization of the canonical linearization CL3 of three
元素, 其中 + on each of the two nodes means that it is unrotated and − means
that it is rotated. These are indeed all separable permutations of three elements, 但是
two orders 1,2,3 和 3,2,1 are represented by two separating trees, where C3 = 2. 为了
the purpose of efﬁcient parsing with CCG, we would like there to be only one CCG
derivation, and one separating tree, for each separable permutation.

4.1 Normal-Form Separating Trees

Shapiro and Stephens (1991) observed that the spurious ambiguity of separating
trees arises when two nodes that are in parent–child relation have the same label, as for
the trees in Figure 4a and 4b. They proposed a normal form that makes the mapping
between the set of normal form separating trees and the set of separable permutations
bijective. This normal form states that no node can be labeled with the same +/−
label as its right child. This excludes trees like those in Figure 4a and 4e and means

25

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

+

+

-

-

+

+

+

+

321

21

3

32

1

213

(A)

-

(乙)

-

(C)

+

(d)

+

-

-

-

-

123

23

1

12

3

231

(e)

(F)

(G)

(H)

数字 4
All possible separating trees for a three-word sentence.

that the number of normal-form separating trees is the same as that of the separable
permutations.

This normal form for separating trees is reminiscent of the CCG derivational
normal-form proposed by Eisner (1996) to enforce a one-to-one mapping between CCG
syntactic derivations and semantic interpretation that they generate (比照. Hockenmaier
and Bisk 2010). Eisner normal-form forbids particular sequences of combinatory rules
that are of the same type from occurring in a parent-child relationship in the derivation.
Using normal-form separating trees, we can distinguish the possible cases, 显示
图中 5. Figures 5a and 5b show the conﬁgurations where the right child is a
single word. Figures 5c and 5d show those where the right child is a non-single word
constituent that adheres to the normal form. The length of the examples is identiﬁed as
n + 1 in order to be consistent with the conventional numeration for the Large Schr ¨oder
numbers. The length of the constituent A is k. We are interested in the number of normal-
form separating-trees Sn, since that will also be the number of separable permutations
of length n + 1.

The number of trees that conform to the pattern in Figure 5a clearly depends on the
number of possible concrete sub-trees that can act as constituent A. Here A is of length
k = n, which means that the number of separating trees of the pattern in Figure 5a is
Sn−1. If we ignore the label of the root node, the other three patterns show all possible
ways of constructing the right child: having a single terminal node, or binary node
labeled + or −. This means that we can express the sum of trees conﬁrming to these
three patterns as the number of trees that have their root unlabeled (since we are ignor-
ing the root label) and have the left and right child conﬁrming to the normal form for
separating trees. For a given k, which plays a role of a split point in parsing, 号码
of those trees would be Sk−1Sn+1−k−1, but because k can be any number between 1 和
n, and is not ﬁxed, we need to sum over all of those values (西德:80)n
k=1 Sk−1Sn−k. We can shift
this sum to start from zero to obtain the equivalent formula (西德:80)n−1
k=0 SkSn−k−1. 什么时候我们
put the formulas for all four patterns together we get the Large Schr ¨oder recurrence in
方程 (18). We can conclude that the number of normal-form separating trees for a
NOD of length n is also Sn−1.

26

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

+

-

A

A

0

n

n+1

0

n

n+1

(A)

+

(乙)

-

-

+

A

乙

C

A

乙

C

0

k

(C)

n+1

0

k

(d)

n+1

数字 5
Normal form branching conﬁgurations.

4.2 The Relation of Large Schr ¨oder Numbers to Shift-Reduce Parsing

The Large Schr ¨oder numbers are often also presented in relation to the number of lattice
paths in the Cartesian plane such that each path starts at lower-left corner (0, 0), 结束
at upper right corner (n, n), contains no points above the diagonal y = x (那是, 能
only be in the lower right triangle), and is composed only of unit steps →, ↑, 和 (西德:37)
or more concretely (1, 0), (0, 1), 和 (1, 1) (Weisstein 2018). In this section we show how
this view of Large Schr ¨oder numbers can be interpreted as Shift-Reduce parsing with
CCG derivations corresponding to the normal-form separating trees deﬁned above.

Shift-Reduce is one of the most widely used parsing algorithms in computational
linguistics and it is probably the simplest way to construct a parser. It is based on
having a parsing state with a stack (ﬁlled with constituents built so far) and a buffer
(ﬁlled with words in the sentence sufﬁx that are yet to be processed). A shift transition
takes a word from the buffer and puts it on top of the stack. For binarized grammars
like CCG, a reduceX transition replaces the two topmost constituents on the stack with
a single new constituent labeled X. In our CCG derivations we will have two reduce
运营: reduce+ for forward combinators and reduce− for backward combinators.
We will again take an example sentence of length n + 1. The initial parsing state will be
a stack containing the ﬁrst word and a buffer containing the other n words.

We can visually represent any valid successful shift-reduce parsing trace by starting
at the point (0, 0) and taking → to represent a shift transition and ↑ labeled + or − to
represent a reduceX transition. The Shift-Reduce transition sequence for the derivation
tree in Figure 6a is shown in Figure 6b. 清楚地, such paths will never cross the diagonal,

27

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

-

-

+

+

-

546

231

6

4

5

1

3

2

(A) separating tree

(乙) all reduce transitions are labeled

6

4

5

1

3

2

6

4

5

1

3

2

(C) only ﬁrst reduce in the reduce
sequence is labeled

(d) no labels

数字 6
A separating tree and different representations of the shift-reduce transition sequence that
recognizes it.

because we can never have more reduce operations than shifts and every successful
parse has n shifts and n reduce steps, so that every path ends at (n, n).

The set of SR parse paths can be further restricted to those corresponding to the
the normal-form parses and separating trees deﬁned in the preceding Section 4.1, 作为
如下. 第一的, the normal form says that a parent node cannot have the same label as
its right child node. What that means in SR parsing terms is that we cannot have two
consecutive transitions both of which are reduce+ or reduce−. 所以, when we have a
sequence of reduce operations, expressed as an uninterrupted vertical sequence of ↑,
we need only know the label of the ﬁrst reduce transition in the sequence because the
other labels can be inferred from the normal form: They will alternate between + 和
-. This modiﬁed representation is shown in Figure 6c.

The ﬁnal modiﬁcation is to use a slightly different encoding of the initial reduce in
each of the reduce sequences. The ﬁrst reduce step in the sequence always comes after
a shift, which means that it forms a shape → ↑. If that sequence is [转移, reduce−], 我们
will encode it with a diagonal (西德:37) and if it is [转移, reduce+] we will encode it with
existing→ ↑ but without any label, as shown in Figure 6d. This encoding deﬁnes a
one-to-one mapping between CCG normal-form shift-reduce parses and paths in the
Schr ¨oder lattice.

These observations support another way to think about the number of permutations
that CCG allows for a NOD of length n in terms of SR normal-form parsing. 什么时候

28

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

+ 1,y)

(y

Sn−k−1

Sn−1

Sk

1

0
(A) Paths starting with (西德:37) 过渡

n

0

k+1

1
(乙) Paths starting with → transition

n

数字 7
Counting paths in the lattice for different initial transitions.

we start in the initial position (0, 0) we can take paths that start either with (西德:37) or →
(↑ is illegal because it would cross the upper diagonal). If we take (西德:37) (Figure 7a) 我们
ﬁnd ourselves in a lattice of the same shape, smaller by one unit, 这意味着
number of normal-form paths through it is Sn−1. 如果, 另一方面, we take → we
will end up in position (1, 0). Any path that goes from (1, 0) 到 (n, n) must at some point
have a vertical arrow that crosses diagonal (y + 1, y), as shown in Figure 7b. 第一个
point at which the path crosses that diagonal will be some concrete point (k + 1, k). 这
number of paths that go from (1, 0) 到 (n, n) and contain arc ↑ at (k + 1, k) is SkSn−k−1.
再次, we need to sum over all possible paths that cross the diagonal (y + 1, y) 超过
each possible k, which gives (西德:80)n−1
k=0 SkSn−k−1. Summing results for initial (西德:37) and initial
→ again yields the Large Schr ¨oder number recurrence function from Equation (18).

5. Statistical Signiﬁcance of the Global Result

The above results conﬁrm that all 22 separable permutations are allowed, at least as al-
ternates, for any construction deﬁned by four elements of the form A|乙, 乙|C, C|D, D, 和
这 2 non-separable permutations are forbidden. The fact that in both the constructions
被调查的, the distribution of the frequencies with which the separable permutation
orders are actually observed are so skewed means that some orders are so vanishingly
rare as to only be attested in one construction, and/or even by only one or two lan-
guages. The possibility that this skewness arises from soft constraints related to ease of
processing or learnability is interesting in their own right, and is the subject of ongoing
linguistic research (Culbertson, 斯摩棱斯基, and Legendre 2012; Merlo 2015; Dryer 2018;
Merlo and Ouwayda 2018).13

尽管如此, according to the present theory, the two permutations (G) 和 (j) 是
excluded for a different reason, 即, by a hard constraint that is a consequence of
the theory of grammar itself. The fact that the two disallowed orders are the only two
that are totally unattested in the union of Cinque’s, Nchare’s, and Abels’s surveys is a

13 This possibility is also acknowledged by Abels and Neeleman (2012).

29

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

strong result that is unlikely to have arisen by chance. The strength of this result can be
assessed quantitatively, as follows.

有 21 permutations that are attested for NP. We will assume that all 21 是
sampled without replacement from a uniform distribution of 24 permutations. 没有
knowing which 21 are attested, we could form 24 theories that exclude one permutation,
24 × 23 that exclude two, 和 24 × 23 × 22 that exclude 3 permutations. For each one
of them, there is a different number of possible observations (IE。, different sets of 21
permutations) that would not falsify them. 例如, a theory that forbids 3 permu-
tations will be consistent only with one possible set of 21 observed permutations, 尽管
a theory that forbids 2 (like the one presented here) will be consistent with 22 不同的
sets of 21 permutations. The probability of our predictions being not falsiﬁed by chance
is the number of observations (permutation sets) that would be consistent with our
theory divided by the total number of possible observations—that is, 大约 1
在 100:

(19)

(西德:19)

(西德:19) = 22
2024

≈ 0.0109

p =

(西德:18) 22
21
(西德:18) 24
21

If on the other hand we accept (H), as tentatively attested on the basis of Abels’ and
Wurmbrand’s accounts, the probability of getting this stronger result by chance drops
to around 1 在 250:

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

(20)

(西德:19)

(西德:19) = 1
276

≈ 0.0036

p =

(西德:18) 22
22
(西德:18) 24
22

Both results are evidently unlikely to come up by chance.

6. 讨论

The prediction of CCG concerning possible word orders is signiﬁcant from both sci-
entiﬁc and engineering perspectives. The Large Schr ¨oder series Sn grows much more
slowly with n than the total factorial number of permutations n!, so that the proportion
n!−Sn−1
of non-separable permutations that are disallowed by CCG grows rapidly with
n!
Sn−1
n! of separable

n, 作为 0%, 0%, 0%, 8%, 25%, 45%, 64%, 79%, 89%, . . . . The proportion
permutations that are allowed shrinks correspondingly rapidly with n.

From a practical NLP perspective there are many areas where this saving can
be achieved, with machine translation and parsing, including “sequence-to-sequence”
语义解析, being the obvious use-cases. If we know the limited number of word-
order variations among languages, alignment algorithms (or attention as its neural
equivalent [Bahdanau, 给, and Bengio 2015]) can be constrained (or biased) 在里面
right direction (Zhang and Gildea 2005; Dong and Lapata 2016).14

14 This observation is also implicit in Whitelock’s (1992) early proposal for “shake and bake” translation for

Japanese.

30

Stanojevi´c and Steedman

Formal Basis of a Language Universal

Unsupervised parser induction is another application where introducing an induc-
tive bias (or hard constraint) for contiguous derivation trees with a normal form sig-
niﬁcantly reduces the number of alternatives that learner needs to consider (Clark and
Lappin 2010). These savings are especially signiﬁcant in comparison to other theories
of grammar that use discontinuous constituents, empty strings, or very powerful, 经常
Turing-complete, uniﬁcation mechanisms.

We saw earlier that the number and type of permutations allowed by CCG appears
to correspond to the crosslinguistic variation in word order that is actually attested.
CCG also provides a strong and easily testable prediction of what word orders will be
found in the future in human languages that have not so far been examined (斯蒂德曼
2020). These properties stem from the combinatory projection principle (13), 并在
particular from the restriction to binary combinatory rules.

迄今为止, the empirical studies of crosslinguistic variation cited above have been
conﬁned to no more than four core elements of NP and VP. 然而, we have shown
that the results proved here are completely general to sets of categories of any size that
deﬁne a natural order of dominance. The verb group in particular can be naturally
extended to serial verbs of various kinds. The cartographic spines of functional pro-
jections identiﬁed by Cinque and Rizzi (2008) offer several tens of functional heads in
a supposedly universal order of dominance, although there is more to say here about
the relation of syntax and morphology and the degrees of crosslinguistic variation they
allow. Any ﬁrm evidence for any non-separable permutation of these elements in any
language for which the natural order of dominance can be unequivocally determined
would immediately falsify the present theory. As we noted earlier, the number of for-
bidden non-separable falsifying cases rises very rapidly as the size of such sets increases
超过 4.

然而, the power-law distributions that we observe on all such dimensions of
variation, and the fact that the performance factors that lie behind them favor continuity
and semantic homomorphism, mean that those falsifying cases will always lie out in the
long tail. We have seen that the total number of languages that have ever been studied in
the requisite detail is in the case of the nominal construction only just enough to take us
far enough out into that long tail to achieve a conﬁdence level against ever seeing a non-
separable permutation of even 1 在 100. That means our chances of reaching a deﬁnitive
answer to the question for even a ﬁve-element construction are quite daunting.

Naturally, our work is not the ﬁrst work to try to ﬁnd a formal constraint of this

种类. Here we review some results from other formalisms.

ITG Inversion Transduction Grammars (吴 1996) are a type of syntax transduction
grammar developed for the task of machine translation. ITG can generate only separa-
ble permutations of the words in the source language, so it is natural to ask how this
property of ITG differs from the one proposed here. The main difference lies in what is
considered the reference word order that is being permuted. 这里, CCG gives a clear
deﬁnition: The reference word order is the CL that reﬂects the NOD, and ultimately the
语义学. To say that CCG generates only separable permutations of the NOD is to
make a clear prediction about what permutations can be found in natural languages.
然而, the same is not the case with ITG, which does not identify any NOD. For ITG
the reference word order is that of the source language in the MT task. Taking a different
language as a source language gives a different prediction of what will be found in the
world’s languages. 拿, 例如, the previous example “these ﬁve young lads.” If
ITG takes English as a source language then it will make the same prediction as CCG.
But if we had, 例如, taken Spanish as a source language, with the word order
“these ﬁve lads young,” ITG would be able to derive “ﬁve lads these young,” which has

31

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

not been attested and that CCG predicts to be impossible. 更重要的是, starting
from Spanish ITG would be unable to derive the word order “lads these young ﬁve,”
which was attested by both Cinque and Nchare for NP and by Abels and Neeleman for
the equivalent VP permutation. Extending ITG to a more powerful formalism such as
Permutation Trees (Zhang and Gildea 2007; Stanojevi´c and Sima’an 2015) does not solve
this problem as long as the NOD is ignored.

TAG Tree-Adjoining Grammars (Joshi 1985) are often compared to CCG because
they are weakly equivalent, in the sense that they generate the same set of languages
(Joshi, Vijay-Shanker, and Weir 1991). The problem discussed in our article is about the
set of permutations allowed for a ﬁxed natural order of dominance. 这意味着
results from our paper would transfer to TAG only if TAG was strongly equivalent to
CCG, which Koller and Kuhlmann (2009) have shown not to be the case: There are some
dependency structures that are derivable by CCG that are not derivable by TAG and vice
versa. 尤其, Koller and Kuhlmann’s (2009, Figure 8b) example of a dependency
structure derivable by TAG but not by CCG is actually the dependency structure of the
non-separable permutation 2413, 或者 (G) under the alphanumeric convention used in this
文章.

Previous work on the permutations derivable by TAG have looked at the different
scrambled argument orders allowed in German. This was motivated by the work of
Becker, Rambow, and Niv (1992), who claim that scrambling requires even higher
generative power than LCFRS. Joshi, Becker, and Rambow (2000) show that that is
indeed the case if we want unbounded embedding levels of scrambling, but note that
it is not clear that this is needed in modeling attestable natural language data. The tree-
local multi-component version of TAG (TL-MC-TAG) does not increase weak generative
power but does increase strong generative power, allowing complete scrambling of
three arguments but not four.

Scrambling in CCG is handled by the assumption that all arguments are lexically
type-raised as functions over the verb or verb complex, which allows less oblique
论据 (关键地, the subject) to compose with the German verb complex in ad-
vance of more oblique arguments. 如前所述, lexical type raising alters the NOD,
making the subject take position 1 and the verb position n in the order of dominance.
If the generalization of composition rules in CCG is limited to the second-order case
illustrated in (12), CCG is subject to the same limitation as TL-MC-TAG: three arguments
can scramble freely, but four cannot. For more discussion on German scrambling and
CCG, see Hockenmaier and Young (2008).

Chen-Main and Joshi (2008) take the problem of scrambling of noun arguments
of four verbs: N1N2N3N4V1V2V3V4. They show that there are 22 permutations of N1
to N4 that can be derived with the extended version of TL-MC-TAG that supports
ﬂexible composition and multiple adjoining. 然而, these permutations cannot be
directly compared with the NP and VP constructions considered in the empirical studies
discussed earlier.15

MG Minimalist Grammars (Stabler 1996) are a rigorously deﬁned formalism for
Chomskyan Minimalism (Chomsky 1995b) that has higher generative capacity than
CCG. Stabler (2011) has used MG to derive possible permutations for the four core ele-
ments of a noun phrase and found that, under some assumptions, MG can generate only

15 When Chen-Main and Joshi (2008) report that their variation of TAG can capture permutation (G) 2413 的
verbal arguments, they are actually referring to a permutation 24135678 of eight elements, of which the
last four (the verbs) are polyvalent, deﬁning a quite different (部分的) NOD from the 1 > 2 > 3 > 4
linearization.

32

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

Stanojevi´c and Steedman

Formal Basis of a Language Universal

16 out of the complete 24 possible permutations. 这些 16 include the 14 permutations
claimed by Cinque (2005), but exclude some of those claimed by Nchare (2012) 为了
free word-order language Shupamem. The assumption that Stabler makes is that the
grammar uses neither empty strings nor head-movement, both of which are prevalent
in most linguistic work on minimalist syntax. 有趣的是, Stabler’s 16 include permu-
站 (j), which is one of the two non-separable permutations that are excluded under
the present theory.

The limitation of combinatory reduction to the separable permutations is also
implicit in Svenonius’s (2007) *1-3-X-2 restriction on adjunct placement (除其他外
constructions), where X is an adjunct to 1, and in Williams’s (2003, 页面 203-211)
proposal for his categorial calculus CAT. CAT has a standard directional categorial
lexicon and rule of application, with a combinatory operation REASSOCIATE equiv-
alent to composition, and an operation FLIP, unavailable in CCG, which reverses the
directionality on the argument of a functor category.

Williams’s CAT variety of Minimalism is closely related to CCG. 然而, 没有
the addition of morpho-lexical type-raising, permitting arguments to compose, CAT
is unable to express the variety of constructions that CCG makes available, 哪个
include relativization, coordination-reduction, Germanic scrambling, and Hungarian
VM fronting (斯蒂德曼 2020). In the absence of lexical type-raising these constructions
require the extension of CAT by the addition of powerful rules of “action at a distance”
unavailable in CCG, such as movement, copying, and/or deletion, together with atten-
dant constraints to limit their considerable expressive power.

A related proposal to ours by Medeiros (2018) hypothesizes that the set of allowed
permutations is the one that could be sorted with a single stack, so-called stack-sortable
permutations or 231-avoiding permutations (Knuth 1968). This set of permutations
exactly covers Cinque’s 14, but does not ﬁt the other empirical data considered here:
7 permutations out of 21 that are observed are predicted not to be possible by Medeiros.
As with Williams (2003), it is also unclear that Medeiros’s mechanism will also han-
dle phenomena of unbounded or wh movement, as CCG does. 然而, there is an
interesting relation between the sorting automata for 231-avoiding permutations and
the automata that sort separable permutations: 231-avoiding permutations are sortable
with a single pop-stack while separable permutations are sortable with a sequence of
n - 1 pop-stacks (Avis and Newborn 1981).

7. 结论

This article has argued that the facts of attested word order in the constructions consid-
ered exploits all and only the degrees of freedom that CCG allows, despite the fact that
some of the orders that CCG allows for some constructions are vanishingly rare due
to unrelated performance soft constraints that induce Zipﬁan (power-law) 分布
discussed by Culbertson, 斯摩棱斯基, and Legendre (2012) and Culbertson and Adger
(2014).

Explanatory adequacy in a theory depends on being able to explain why the data
has just the degrees of freedom it does, and why other things don’t happen. If the latter
have been prevented by hard constraints, as in various ways they have been for these
constructions by Cinque, Svenonius, Abels, Neeleman, Nchare, and Stabler, then those
constraints themselves have in turn to be explained.

No hard constraints are needed at the level of competence grammar in CCG. 这
limitation on the grammatical permutation of n elements in natural grammars to the
Large Schr ¨oder number Sn−1 of separable permutations is a formal universal stemming

33

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

p

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你
/
C
哦

我
我
/

我

A
r
t
我
C
e
–
p
d

F
/

/

/

/

4
7
1
9
2
0
0
8
4
4
0
/
C
哦

我
我

_
A
_
0
0
3
9
4
p
d

.

F

乙
y
G
你
e
s
t

t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

计算语言学

体积 47, 数字 1

from the limited expressivity of CCG itself—speciﬁcally, its restriction of its operators
to strictly adjacent pairs of non-empty constituent types via the Combinatory Projec-
tion Principle (13). This result is a corollary of the Combinatory Categorial theory of
grammar, and the formally explicit reduction that it affords of what in Minimalist terms
would be called MOVE or “internal merge” to contiguous (那是, “external”) MERGE.

附录A: The Ensemble of Attested Word Orders Over 4 Elements

In this appendix we review the attested orders of four core elements of NP and VP as
reported by Cinque, Nchare, and Abels and Neeleman. We also give CCG derivation
trees for these orders from Steedman (2020).

If our lexicon consists of four entirely unrestricted categories of the form A|乙, 乙|C,
C|D, D, then for both the nominal and verbal constructions all 22 separable permuta-
tions are allowed, and only the two non-separable permutations are unanalyzable, 作为一个
consequence of the combinatory projection principle (13), as follows.

The NP
For the NP, 这 22 orders are allowed under the following derivations (“×” marks
the two sequences (G) 和 (j) that are unanalyzable). Only essential compositions are
表明的: all other combinations are application. For non-basic orders, the annotation
“from z” indicates the basic pure-applicative order among those in (8) on whose lexicon
a particular derived order is based:16

两个都 (basic: v.many)

两个都 (basic: 许多)

Cinque (from b: v.few)

Cinque (from b: 很少)

Nchare (from r)

Nchare (from s)

(21) A.

乙.

C.

这些

这些

ﬁve
young lads
NP/NumP NumP/N(西德:48) 氮(西德:48)/N N
lads young
ﬁve
NP/NumP NumP/N(西德:48) N N(西德:48)\氮
young
NP/NumP N NumP/N(西德:48) 氮(西德:48)\氮
>B×

这些

lads

ﬁve

NumP\N

d. Lads

ﬁve

这些

young
N NP/NumP NumP/N(西德:48) 氮(西德:48)\氮
NP/N(西德:48)

>B

NP\N

>B×

e.

F.

Five

这些
young lads
NumP/N(西德:48) NP\ NumP N(西德:48)/N N

B×

我. Young

这些

lads
氮(西德:48)/N NP/NumP NumP\N(西德:48) 氮

ﬁve

NP\N(西德:48)

NP/N

>B×

B×

ﬁve

NP\N(西德:48)

我. Lads young

这些
N N(西德:48)\N NP/NumP NumP\N(西德:48)
>B×

ﬁve

NP\N(西德:48)
ﬁve

米.

n.

哦.

这些

young

lads
NP/NumP N(西德:48)/N NumP\N(西德:48) 氮

B×

NP\N

q.

Five

young

lads
NumP/N(西德:48) 氮(西德:48)/N NP\NumP N
>B

这些

NumP/N

NP/N

B×

NumP\N

你. Young

ﬁve

这些

lads
氮(西德:48)/N NumP\N(西德:48) NP\NumP N
NP\N(西德:48)

B×

ﬁve

NumP\(西德:63)氮
NumP
>

< NP This is Cinque’s attested order (21c).17 17 Depending on the facts concerning coordinations like (15) in the language in question, we may want to use /(cid:5)(cid:63) or \(cid:5)(cid:63) slash types in place of /(cid:63) or \(cid:63) to allow harmonic composition as well as application in a language like English. 36 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 7 1 9 2 0 0 8 4 4 0 / c o l i _ a _ 0 0 3 9 4 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Stanojevi´c and Steedman Formal Basis of a Language Universal For a language like Shupamem with many orders, in the worst case we can simply allow multiple entries for lexical items supporting all those orders. However, in practice, smaller lexicons with entries supporting more than one order seem to apply. (For example, in Shupamem, Dem seems to bear the completely unrestricted bidirectional slash type NP|NumP—Steedman [2020, page 19].) The VP Turning to the Germanic VP construction surveyed in Abels (2016), we see the identical picture to that for the elements of the NP (21) emerge for the verb group. That is, only 22 out of the total 24 permutations are allowed: (24) a. b. will help teach swim VP1/VP2 VP2/VP3 VP3/VP4 VP4 swim teach help will VP1/VP2 VP2/VP3 VP4 VP3\VP4 c. will swim help teach VP1/VP2 VP4 VP2/VP3 VP3\VP4 >B×

VP2\VP4

d. swim will

帮助

teach

VP4 VP1/VP2 VP2/VP3 VP3\VP4

VP1\VP3

>B

VP1\VP4

>B×

e.

将要

帮助

swim
VP2/VP3 VP1\VP2 VP3/VP4 VP4
B×

我.

teach

swim
VP3/VP4 VP1/VP2 VP2\VP3 VP4

帮助

将要

VP1\VP3

VP1/VP4

>B×

B×

VP1\VP3

我. swim teach

将要

帮助

VP4 VP3\VP4 VP1/VP2 VP2\VP3
>B×

米.

将要

swim
VP1/VP2 VP3/VP4 VP2\VP3 VP4

teach

VP2\VP3
帮助

B×

VP1\VP4
将要

q. Help

teach

swim
VP2/VP3 VP3/VP4 VP1\VP2 VP4
>B

VP2/VP4

VP1/VP4

B×

VP2\VP4

teach

swim
VP3/VP4 VP2\VP3 VP1\VP2 VP4

帮助

将要

VP1\VP3

VP1/VP4

下载pdf