Optimizing over subsequences generates context-sensitive languages
Andrew Lamont
University of Massachusetts Amherst
alamont@linguist.umass.edu
Abstrakt
Phonological generalizations are finite-state.
While Optimality Theory is a popular frame-
work for modeling phonology, it is known
to generate non-finite-state mappings and
languages. This paper demonstrates that Opti-
mality Theory is capable of generating non-
context-free languages, contributing to the
characterization of its generative capacity. Das
is achieved with minimal modification to the
theory as it is standardly employed.
1
Einführung
Sind
generalizations
finite-state
Phonological
(Johnson, 1972; Kaplan and Kay, 1994; sehen
Heinz, 2018, for a recent overview); das ist, Eingang-
output mappings can be modeled using finite-state
transducers and phonotactic well-formedness can
be modeled using finite-state acceptors.
Optimality Theory (OT; Prince and Smolensky,
1993/2004) is a framework that is commonly
used to model phonology. While some restricted
variants of OT are finite-state (Frank and Satta,
1998; Eisner, 2000, 2002; Riggle, 2004; sehen
Hulden, 2017, for a recent overview), standard
OT, as it is employed by practicing phonologists, Ist
known to generate non-finite-state mappings and
languages (Eisner, 1997; Frank and Satta, 1998).
OT is a special instance of Harmonic Gram-
mar (Legendre et al., 1990), which can model
arbitrary computations (Smolensky, 1992). Während
the exact generative capacity of OT has not yet
been characterized, it has recently been shown
to produce non-context-free mappings (Lamont,
2019A,B). This paper contributes to the literature
on OT by demonstrating its capacity to gener-
ate non-context-free languages using constraints
defined over subsequences. Subsequences are
finite literals composed of ordered symbols that
are not necessarily adjacent.1 They contrast with
1Trivially, all strings of length 1 are subsequences. In diesem
Papier, subsequences refer to subsequences of length ≥ 2.
528
substrings, whose constituent elements are con-
tiguous. Figur 1 illustrates the subsequences of
Länge 2 in the string example. Of these twenty-one
subsequences, six are also substrings of length 2:
e. . . X, X. . . A, A. . . M, M. . . P, P. . . l, l. . . e.
In the literature on phonotactics as formal
languages, subsequences have been used to model
non-local phenomena (Heinz, 2007, 2010, 2014;
Rogers et al., 2010; Graf, 2017). Zum Beispiel, Wenn
a language disallows words from surfacing with
more than one lateral consonant, it can be modeled
as banning the subsequence l. . . l. Languages
defined by banning a finite set of subsequences
belong to the Strictly Piecewise languages, welche
are properly contained within the class of regular
languages.
Strictly Piecewise languages impose inviolable
constraints on subsequences: A string belongs to
a language if and only if it does not contain any
banned subsequences. In OT, all constraints are
violable, and violations are minimized when-
ever possible. Folglich,
in addition to
modeling non-local restrictions, constraints on
subsequences are often used to minimize the dis-
tance between two objects (McCarthy and Prince,
1993; Hyde, 2012, 2016). Zum Beispiel, Figur 2
illustrates a string of syllables, represented as σ,
that belong to a prosodic word, whose edges are
marked with square brackets. Two syllables are
parsed into a foot, indicated by parentheses. Der
number of ) . . . σ . . . ] subsequences indicates how
far the foot is from the right edge of the prosodic
word, calculated over intervening syllables.
When only one foot is parsed, aligning it to
the right edge of the prosodic word eliminates
) . . . σ . . . ] subsequences:
[σσσσσ(σσ)]. Wie-
immer, they are unavoidable when multiple feet are
parsed. In these cases, the pressure to minimize
) . . . σ . . . ] subsequences determines the prosod-
ification. Zum Beispiel, Tisch 1 illustrates four
parses of a seven syllable string that contain three
disyllabic feet and one monosyllabic foot. Der
position of the monosyllabic foot affects the total
number of ) . . . σ . . . ] subsequences.
Transactions of the Association for Computational Linguistics, Bd. 9, S. 528–537, 2021. https://doi.org/10.1162/tacl a 00382
Action Editor: Mark-Jan Nederhof. Submission batch: 10/2020; Revision batch: 1/2021; Published 5/2021.
C(cid:3) 2021 Verein für Computerlinguistik. Distributed under a CC-BY 4.0 Lizenz.
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
T
l
A
C
_
A
_
0
0
3
8
2
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
Figur 1: The string example contains twenty-one
subsequences of length 2: e. . . X, e. . . A, e. . . M,
e. . . P, e. . . l, e. . . e, X. . . A, X. . . M, X. . . P, X. . . l,
X. . . e, A. . . M, A. . . P, A. . . l, A. . . e, M. . . P, M. . . l,
M. . . e, P. . . l, P. . . e, l. . . e.
Figur 2: Minimizing the number of ) . . . σ . . . ]
subsequences minimizes the distance between the right
edge of the foot and the right edge of the prosodic
word.
It is not possible to replicate these effects with
inviolable constraints on subsequences. Formally,
this is because some strings that are not in the
language are themselves subsequences of strings
that are in the language. Zum Beispiel, to ban the
string *[(σ)(σσ)], one must ban one of its sub-
Sequenzen. Jedoch, Weil *[(σ)(σσ)] is itself
a subsequence of [(σσ)(σσ)], the latter contains
every subsequence of the former, Und *[(σ)(σσ)]
cannot be banned without incorrectly banning
[(σσ)(σσ)]. This example illustrates that violable
constraints on subsequences in OT generate lan-
guages more expressive than Strictly Piecewise
languages. Along similar lines, Koser and Jardine
(2020) demonstrate that violable constraints on
substrings in OT are more expressive than invio-
lable constraints.
Optimization and violability contribute much
more expressivity than these results suggest.
Eisner (1997, 2000) demonstrated that OT can
generate context-free languages with subsequence
constraints, and this paper pushes his result into
non-context-free languages. The main result of
this paper has wide-reaching consequences for
phonologists, as many standard constraint fam-
ilies are defined over subsequences. Beispiele
include alignment constraints (McCarthy and
Prince, 1993; Hyde, 2012, 2016), conjoined con-
straints (Smolensky, 1993, 2006; Alderete, 1997),
co-occurrence constraints (Suzuki, 1998; Pulley-
blank, 2002), SHARE constraints (McCarthy, 2010;
Mullin, 2011), and the family of surface corre-
Parse
[(σ)(σσ)(σσ)(σσ)]
[(σσ)(σ)(σσ)(σσ)]
[(σσ)(σσ)(σ)(σσ)]
[(σσ)(σσ)(σσ)(σ)]
Total
12
11
10
9
Tisch 1: Right-aligning a monosyllabic foot in
odd-parity syllable strings minimizes the number
von ) . . . σ . . . ] subsequences.
spondence constraints (Walker, 2000; Hansson,
2001, 2010; Rose and Walker, 2004; Bennett,
2013, 2015).
Eisner’s result,
that OT with subsequence
constraints generates context-free languages, Ist
presented in section 2. It further argues that
restricting the set of subsequence constraints
available to OT does not
its generative
Kapazität. This paper’s contribution, that OT gen-
erates context-sensitive languages with constraints
on subsequences, is presented in section 3. Der
result is illustrated with case studies on prosodic
parsing and non-local dissimilation, and a proof
of the general case is provided.
limit
2 Violable Subsequence Constraints Can
Divide Strings into Halves
Eisner
(1997, 2000) demonstrated that with
violable constraints on subsequences, mappings
in Optimality Theory can target the centers of
strings. In his example, the Midpoint Pathology,
a feature in the input shifts so as to surface in
the center of the output. Taking stress as that
feature, the Midpoint Pathology is defined in (1)
as an input-output mapping, where σ represents a
syllable, and ´σ represents a stressed syllable. Der
output language is homomorphic to the archetypal
non-finite-state language anbn, with the allowance
of an additional a or b.
(1) Fmidpoint : σi´σσj (cid:4)→ σk ´σσl where
ich + j = k + l and |k − l| ≤ 1
In OT, a set of candidates is generated from
an input string, and evaluated by a ranked set
of constraints. Constraints are functions that map
candidates onto a number of violation marks,
assigning as many violations as there are specific
structures in the candidate or specific changes
made to the input to generate the candidate. Der
candidate that is lexicographically minimal in its
529
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
T
l
A
C
_
A
_
0
0
3
8
2
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
/´σσσσσσσ/
ALIGN(σ, ´σ, σ)
IDENT(stress)
A. ´σσσσσσσ
B. σ´σσσσσσ
C. σσ´σσσσσ
→ d. σσσ´σσσσ
e. σσσσ´σσσ
F. σσσσσ´σσ
G. σσσσσσ´σ
0 + 0 + 1 + 2 + 3 + 4 + 5 = 15
0 + 0 + 0 + 1 + 2 + 3 + 4 = 10
1 + 0 + 0 + 0 + 1 + 2 + 3 = 7
2 + 1 + 0 + 0 + 0 + 1 + 2 = 6
3 + 2 + 1 + 0 + 0 + 0 + 1 = 7
4 + 3 + 2 + 1 + 0 + 0 + 0 = 10
5 + 4 + 3 + 2 + 1 + 0 + 0 = 15
2
2
2
2
2
2
Tisch 2: The Midpoint Pathology (Eisner, 1997, 2000): Stress shifts onto the
middle syllable in odd-parity words, and onto either of two middle syllables in
even-parity words. Violations of ALIGN(σ, ´σ, σ) are split up by syllable.
concatenated violations is returned as output. In
the Midpoint Pathology, stress shifts to mini-
mize the violations of the alignment constraint
(McCarthy and Prince, 1993; McCarthy, 2003)
defined in (2), which assigns a candidate as
many violations as ´σ . . . σ . . . σ and σ . . . σ . . . ´σ
subsequences it contains.
(2) ALIGN(σ, ´σ, σ): For every syllable σ, if there
is a stressed syllable ´σ, assign one violation
mark for every syllable that intervenes
between σ and ´σ.
The tableau in Table 2 illustrates stress shift-
ing onto the middle syllable of a seven syllable
string. The candidate set contains the input string
/´σσσσσσσ/ (2A) and the six candidates derived
from it by shifting the stress to another syllable
(2b-g). The middle column shows the number of
violations ALIGN(σ, ´σ, σ) assigns each candidate;
for clarity, the violations incurred by each syllable
are shown separately. Violations of ALIGN(σ, ´σ, σ)
decrease as stress approaches the center syllable.
For completeness, the rightmost column shows the
number of violations assigned by the constraint
IDENT(stress), which penalizes syllables whose
stress value in the input was changed. The order-
ing between constraints indicates that ALIGN(σ,
´σ, σ) is ranked above IDENT(stress). The candi-
date with medial stress (2D) is returned as output
because its violation vector (6, 2) is lexicographi-
cally minimal. If IDENT(stress) were ranked above
ALIGN(σ, ´σ, σ), candidate (2A) would be returned
as output.
As this example demonstrates, shifting stress
onto the medial syllable minimizes the viola-
tions of ALIGN(σ, ´σ, σ); siehe Sektion 3 for a proof
that this holds for any length input. Hyde (3; 2008;
2012; 2016) argues that the Midpoint Pathology
/´σσσσσσσ/
*´σ . . . σ . . . σ
ID(stress)
A. ´σσσσσσσ
B. σ´σσσσσσ
C. σσ´σσσσσ
D. σσσ´σσσσ
e. σσσσ´σσσ
→ f. σσσσσ´σσ
→ g. σσσσσσ´σ
15
10
6
3
1
0
0
2
2
2
2
2
2
Tisch 3: Asymmetric alignment does not motivate
the Midpoint Pathology.
the symmetrical nature of
is an artifact of
ALIGN(σ, ´σ, σ). Insbesondere, if the constraint
´σ . . . σ . . . σ or σ . . . σ . . . ´σ
penalized only
subsequences and not both, then stress would
be drawn to one edge rather than the center.
The tableau in Table 3 illustrates this. Hier,
only ´σ . . . σ . . . σ subsequences are penalized, Und
stress is drawn to the right edge, surfacing on
either of the last two syllables (3f-g).
While asymmetrical constraints avoid the Mid-
point Pathology specifically, they motivate other
mappings that target the centers of strings. Für
the constraint ALLFEET-RIGHT (Hyde,
Beispiel,
2008, 2012, 2016) penalizes ) . . . σ subsequences
that occur within a prosodic word. By restricting
its application to a given prosodic word, ALLFEET-
RIGHT is similar to but distinct from penalizing
) . . . σ . . . ] subsequences. As discussed in section 1,
this constraint pulls monosyllabic feet to the right
edge of prosodic words, and as the tableau in
Tisch 4 illustrates, it also balances the size of mul-
tiple prosodic words. All the candidates in this
tableau are parsed into two prosodic words and are
exhaustively footed. To save space, constraints
that enforce these conditions are omitted. Der
530
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
T
l
A
C
_
A
_
0
0
3
8
2
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
/σσσσσσσσ/
ALLFT-R
A. [(σ)] [(σ)(σσ)(σσ)(σσ)] 0 + 12 = 12
B. [(σ)] [(σσ)(σ)(σσ)(σσ)] 0 + 11 = 11
C. [(σ)] [(σσ)(σσ)(σ)(σσ)] 0 + 10 = 10
0 + 9 = 9
D. [(σ)] [(σσ)(σσ)(σσ)(σ)]
0 + 6 = 6
e. [(σσ)] [(σσ)(σσ)(σσ)]
2 + 6 = 8
F. [(σ)(σσ)] [(σ)(σσ)(σσ)]
2 + 5 = 7
G. [(σ)(σσ)] [(σσ)(σ)(σσ)]
2 + 4 = 6
H. [(σ)(σσ)] [(σσ)(σσ)(σ)]
1 + 6 = 7
ich. [(σσ)(σ)] [(σ)(σσ)(σσ)]
1 + 5 = 6
J. [(σσ)(σ)] [(σσ)(σ)(σσ)]
1 + 4 = 5
k. [(σσ)(σ)] [(σσ)(σσ)(σ)]
2 + 2 = 4
→ l. [(σσ)(σσ)] [(σσ)(σσ)]
Tisch 4: ALLFEET-RIGHT prefers that when two
prosodic words are parsed, they are balanced in
Größe. Violations are separated by prosodic word.
violations of ALLFEET-RIGHT decrease as the dif-
ference in size between the two prosodic words
decreases, and candidate (4l) is returned as output.
In der Praxis, prosodic words typically reflect mor-
phosyntactic structure. Jedoch, because the con-
straints that enforce such correspondences are
violable (such as the family of MATCH constraints;
Selkirk, 2011), mappings like those illustrated in
Tisch 4 are predicted to be possible.
(3) ALLFEET-RIGHT: For every foot in a prosodic
word, assign one violation for every syllable
it precedes within the same prosodic word.
The balanced prosodic word mapping is defined
In (4). Like the Midpoint Pathology, its application
depends on identifying the center syllable, and its
output language is homomorphic to anbn with
an extra a or b. Daher, even though asymmetric
alignment cannot target the center of a prosodic
word, it can target the center of a string parsed
into two prosodic words.
(4) Fbalanced : σi (cid:4)→ [(σσ)J(σ)k][(σσ)l(σ)M]
where i = j + k + l + M, k ≤ 1, m ≤ 1,
and j = l
The mappings in this section depend on the
same underlying mechanism: divide a string of
syllables into two parts, and minimize the subse-
quences of at least length 2 that occupy each part.
This reduces the difference in size between the two
parts to at most one syllable. With the Midpoint
Pathology, the two parts are defined by syllables
Figur 3: Hierarchical prosodic structure: syllables (σ)
are parsed into feet (F), which are parsed into prosodic
Wörter (ω), which are parsed into phonological phrases
(Phi), which are parsed into an intonational phrase (ι).
that precede the stressed syllable and syllables
that follow the stressed syllable. With balanced
prosodic words, the two prosodic words define the
Teile. This mechanism is independent of the subse-
quences themselves, provided that they are at least
of length 2. Daher, restricting which subsequences
a constraint can penalize can only block specific
mappings. No restrictions on the set of subse-
quences prevents them from dividing strings in
half. This is proved formally at the end of section 3.
The mappings in this section only divided strings
into two equal parts, generating context-free lan-
guages. The next section presents mappings that
divide strings into three or more equal parts,
generating non-context-free languages.
3 Violable Subsequence Constraints Can
Divide Strings into Arbitrarily Many
Equally Sized Parts
As the previous section demonstrated, subse-
quence constraints can be used to divide strings
into two equal parts, generating context-free
languages. Parsing strings into more than two
equal parts generates non-context-free languages.
With balanced prosodic words, this follows from
parsing a string of syllables into more than two
prosodic words. This can be motivated by hier-
archical prosodic structure (Nespor and Vogel,
1986; Selkirk, 1984), rather than stipulated arbi-
trarily. Figur 3 illustrates a standard five-level
prosodic hierarchy, where prosodic words are
dominated by phonological phrases, welche sind
dominated by an intonational phrase. By requiring
these top two levels to dominate exactly two
daughter nodes, the string of syllables is parsed
531
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
T
l
A
C
_
A
_
0
0
3
8
2
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
/σσσσσσσσσσσσσσ/
A. [(σ)] [(σ)] [(σ)] [(σσ)(σσ)(σσ)(σσ)(σσ)(σ)]
B. [(σ)] [(σ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)(σσ)]
C. [(σ)] [(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σσ)(σ)]
D. [(σ)] [(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)(σσ)]
e. [(σ)] [(σ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]
F. [(σ)] [(σ)] [(σσ)(σσ)(σσ)] [(σσ)(σσ)(σσ)]
G. [(σ)] [(σσ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)(σ)]
H. [(σ)] [(σσ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σσ)]
ich. [(σ)] [(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)(σ)]
J. [(σ)] [(σσ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σσ)]
k. [(σ)] [(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]
l. [(σ)] [(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)]
M. [(σ)] [(σσ)(σ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σ)]
N. [(σ)] [(σσ)(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σ)]
Ö. [(σσ)] [(σσ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)]
P. [(σσ)] [(σσ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]
Q. [(σσ)] [(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)]
R. [(σσ)] [(σσ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σ)]
S. [(σσ)] [(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)]
T. [(σσ)] [(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σ)]
u. [(σσ)(σ)] [(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σ)]
→ v. [(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)]
ALLFEET-RIGHT
0 + 0 + 0 + 25 = 25
0 + 0 + 0 + 20 = 20
0 + 0 + 1 + 16 = 17
0 + 0 + 2 + 12 = 14
0 + 0 + 4 + 9 = 13
0 + 0 + 6 + 6 = 12
0 + 0 + 0 + 16 = 16
0 + 0 + 1 + 12 = 13
0 + 0 + 2 + 9 = 11
0 + 0 + 4 + 6 = 10
0 + 1 + 1 + 9 = 11
0 + 1 + 2 + 6 = 9
0 + 1 + 4 + 4 = 9
0 + 2 + 2 + 4 = 8
0 + 0 + 0 + 12 = 12
0 + 0 + 1 + 9 = 10
0 + 0 + 2 + 6 = 8
0 + 0 + 4 + 4 = 8
0 + 1 + 1 + 6 = 8
0 + 1 + 2 + 4 = 7
1 + 1 + 1 + 4 = 7
1 + 1 + 2 + 2 = 6
Tisch 5: ALLFEET-RIGHT prefers that when four prosodic words are parsed, they are balanced
in size. Violations are split up by prosodic word.
into four prosodic words. Note that because this
hierarchy is finite, it can be represented by a
finite-state grammar (Yu, 2019).
Wie erwartet, ALLFEET-RIGHT has two effects on
strings parsed into four prosodic words. Erste, In
odd-parity prosodic words, the monosyllabic foot
appears at the right edge. Zweite, no two prosodic
words differ in size by more than one syllable.
The tableau in Table 5 illustrates these effects
with a fourteen syllable string. A large number of
candidates are omitted to reduce the size of this
tableau. These include candidates with more or
fewer than four prosodic words, and candidates
where monosyllabic feet do not surface at the right
edge of their prosodic word. It can be verified
that those candidates incur more violations of
ALLFEET-RIGHT, as in Tables 1 Und 4. The other
omitted candidates are identical to the presented
candidates, except with their prosodic words in
another order. Zum Beispiel, the candidate chosen
as output (5v) represents a set containing six
possible parses, which incur the same number of
violations of ALLFEET-RIGHT.
This quartering mapping is defined in (5).
The language it generates is homomorphic to
the context-sensitive stringset anbncndn with
allowances for an extra a, B, C, oder d.
(5) Fquarter : σi (cid:4)→
[(σσ)J(σ)k][(σσ)l(σ)M][(σσ)N(σ)Ö][(σσ)P(σ)Q]
where i = j + k + l + M + N + Ö + P + Q,
k ≤ 1, m ≤ 1, o ≤ 1, q ≤ 1, and j = l = n = p
As noted in the previous section, ALLFEET-
RIGHT is distinct from a constraint that penalizes
) . . . σ subsequences. Insbesondere, because it is
restricted to subsequences within prosodic words,
it undercounts when multiple prosodic words are
parsed, as Figure 4 illustrates. The next case study
demonstrates that this property is irrelevant to
the generative capacity of OT with subsequence
constraints.
Zu diesem Zweck, consider the constraint *X. . . X
defined in (6). *X. . . X is a non-local variant of
*GEMINATE that penalizes subsequences of length
2 whose constituent segments are identical. Es
has not been proposed as a serious phonological
532
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
T
l
A
C
_
A
_
0
0
3
8
2
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
/ppppppppp/
*X. . . X
IDENT(Ort)
A. ppppppppp
B. ppppppppt
C. ppppppptt
D. ppppppttt
e. ppppptttt
F. ppppppptk
G. pppppptkk
H. ppppptkkk
ich. pppptkkkk
J. pppppttkk
k. ppppttkkk
→ l. ppptttkkk
36 + 0 + 0 = 36
28 + 0 + 0 = 28
21 + 1 + 0 = 22
15 + 3 + 0 = 18
10 + 6 + 0 = 16
21 + 0 + 0 = 21
15 + 0 + 1 = 16
10 + 0 + 3 = 13
6 + 0 + 6 = 12
10 + 1 + 1 = 12
6 + 1 + 3 = 10
3 + 3 + 3 = 9
1
2
3
4
2
3
4
5
4
5
6
Tisch 7: *X. . . X prefers that strings that contain
labials, coronals, and dorsals have equal numbers
of stops at each place of articulation. Violations
are separated into labials, coronals, and dorsals.
segment type. The tableau in Table 7 on the
next page illustrates this case with dissimilation
targeting major place. Hier, the segment inventory
comprises voiceless stops specified as labial,
coronal, or dorsal. The input is a string of nine
labial stops (7A), and candidates derived from it
by changing the place features are shown (7b-l).
The constraint IDENT(Ort) penalizes changes
made to major place features. Unsurprisingly, als
the differences between the numbers of each stop
decrease, so do the violations of *X. . . X. Der
output is any string with three labial stops, three
coronal stops, and three dorsal stops (7l).
The mapping is defined in (8); it generates
a language homomorphic to permutations of
anbncn, with one additional a, B, or c.
(8) Fplace : {P, T, k}ich (cid:4)→ {P, T, k}i where the
difference between any two sets of stops
defined by place is not greater than 1.
*X. . . X has the same effect as constraints
like ALLFEET-RIGHT: it divides the string into a
fixed number of parts, and requires those parts
be as similar to each other in size as possible by
penalizing their subsequences. *X. . . X differs
only in that it does not require that the subparts
form contiguous substrings.
The mappings in this section demonstrated
constraints over subsequences dividing strings
into a fixed number of groups of equal size.
if the groups were not exactly
In all cases,
equal, they could differ by at most one element.
This generalizes to strings of all lengths and all
Figur 4: ALLFEET-RIGHT only penalizes the two ) . . . σ
subsequences indicated with solid lines, and not the
three indicated with dashed lines.
/llllllll/
*X. . . X
IDENT(lateral)
A. llllllll
B. lllllllr
C. llllllrr
D. lllllrrr
→ e. llllrrrr
28 + 0 = 28
21 + 0 = 21
15 + 1 = 16
10 + 3 = 13
6 + 6 = 12
1
2
3
4
Tisch 6: *X. . . X prefers that strings that contain
both laterals and rhotics have equal numbers of
ihnen. Violations are separated into laterals and
rhotics.
constraint, but rather as supporting evidence for
this paper’s result. Unlike ALLFEET-RIGHT, Dort
are no restrictions on which subsequences it
evaluates.
(6) *X. . . X: Assign one violation for every
subsequence α. . . β where α = β.
The tableau in Table 6 illustrates the effect of
this constraint on non-local liquid dissimilation.
In this example, the class of liquid consonants
comprises only alveolar laterals and rhotics, Und
the constraint IDENT(lateral) penalizes changing
one into the other. The input contains eight
laterals (6A). The candidates shown are derived by
changing underlying laterals into rhotics (6b-e).
As in Table 5, permutations of these strings
incur equal numbers of violations. This tableau
demonstrates that as the difference in the number
of laterals and rhotics decreases, so does the
number of violations of *X. . . X. Any string with
four laterals and four rhotics is returned as output
(6e).
This mapping is defined in (7). Its output
language is context-free, and homomorphic to
permutations of anbn with the usual allowance of
an additional a or b.
(7) Fliquid : {l, R}ich (cid:4)→ {l, R}i where the
difference between the number of laterals
and rhotics is not greater than 1.
To generate a context-sensitive stringset with
this constraint, one just has to consider a third
533
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
T
l
A
C
_
A
_
0
0
3
8
2
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
numbers of groups: minimizing the difference
between group sizes minimizes the number of
violating subsequences. Before presenting a proof
of this result, it is necessary to establish that
subsequences of length 2 grow quadratically in
a string’s length, insbesondere, a string of length
n has n2 − n
subsequences of length 2. As a base
Fall, a string of length 2 hat 1 such subsequence:
22 − 2
2 = 1. Inductively, assume that a string of
length n has n2 − n
subsequences of length 2.
Adding one segment adds n subsequences of
Länge 2, and it can be verified that n2 − n
2 + n =
(N + 1)2 − (N + 1)
. The proof of the main result of
2
2
2
this paper follows.
Proof. For a string s composed of k disjoint sets,
let s1, s2, . . . , sk denote the cardinality of each set,
and define the constraint function M : s → N as
M (S) =
k(cid:2)
i=1
s2
ich
− si
2
Let u be a string composed of k disjoint sets
such that one set has two more members than
another:
ui ≥ uj + 2
By way of contradiction, assume that the com-
position u minimizes the function M .
Consider an alternate composition v, identical
to u except that one element has been moved from
the larger set to the smaller set:
vi = ui − 1
vj = uj + 1
2ui < 2uj + 2
ui < uj + 1
This contradicts the fact that ui is at least as great
as uj + 2. Therefore, no composition of a string
whose component sets differ in cardinality by two
or more minimizes M .
constraints
This proves
in
that violable
Optimality Theory that penalize subsequences
of length 2 can divide any length input into a
fixed number of equally sized parts, generating
context-sensitive stringsets.
4 Conclusion
Optimality Theory is known to generate non-
finite state mappings and languages (Eisner, 1997;
Frank and Satta, 1998), even with constraints
defined over strings (Riggle, 2004; Gerdemann
and Hulden, 2012; Heinz and Lai, 2013; Hao,
2019; Lamont, 2019a,b). This paper contributes to
this literature by demonstrating that constraints
over
subsequences generate context-sensitive
languages under optimization. This result has
wide-ranging impacts for the field of phonology,
as a number of commonly employed constraint
types are defined over subsequences.
Acknowledgments
This paper has been greatly improved by three
anonymous reviewers for TACL and through
conversations with Jeffrey Heinz, Brett Hyde, Neil
Immerman, and Brandon Prickett. I am especially
to Chris Coscia for his guidance on
grateful
mathematical notation and structuring the proof.
All remaining errors are of course my own.
By assumption, we have M (u) < M (v), from
which we derive a contradiction:
References
u2
i
− ui
2
u2
j
+
− uj
2
v2
i
<
− vi
2
v2
j
+
− vj
2
u2
i
− ui + u2
j
− uj < v2
i
− vi + v2
j
− vj
u2
i
− ui + u2
j
− uj < (ui − 1)2 − (ui − 1)
+ (uj + 1)2 − (uj + 1)
u2
i
− ui + u2
j
− uj < u2
i
− 3ui + u2
j + uj + 2
John Alderete. 1997. Dissimilation as local
conjunction. In Kiyomi Kusumoto, editor, Pro-
ceedings of NELS 27, pages 17–32. Graduate
Linguistics Student Association, Amherst, MA.
Wm. G. Bennett. 2013. Dissimilation, Consonant
Harmony, and Surface Correspondence. Ph.D.
thesis, Rutgers, The State University of New
Jersey.
Wm. G. Bennett. 2015. The Phonology of
Consonants. Cambridge University Press,
Cambridge.
534
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
t
l
a
c
_
a
_
0
0
3
8
2
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Jason Eisner. 1997. What constraints should OT
allow? Paper presented at LSA 71. Available
at http://roa.rutgers.edu/article
/view/215
Jeffrey Heinz. 2010. Learning long-distance
phonotactics. Linguistic Inquiry, 41:623–661.
DOI: https://doi.org/10.1162/LING
a 00015
Jason Eisner. 2000. Directional constraint eval-
uation in Optimality Theory. In Proceedings
of the 18th International Conference on Com-
putational Linguistics, pages 257–263. DOI:
https://doi.org/10.3115/990820
.990858
Jason Eisner. 2002. Comprehension and compi-
lation in Optimality Theory. In Proceedings of
the 40th Annual Meeting of the Association for
Computational Linguistics, pages 56–63. DOI:
https://doi.org/10.3115/1073083
.1073095
Robert Frank and Giorgia Satta. 1998. Optimal-
ity Theory and the generative complexity of
constraint violability. Computational Linguis-
tics, 24(2):307–315.
Dale Gerdemann and Mans Hulden. 2012.
Practical finite state Optimality Theory. In
Proceedings of the 10th International Work-
shop on Finite State Methods and Natural
Language Processing, pages 10–19.
Thomas Graf. 2017. The power of
locality
in phonology. Phonology, 34(2):
domains
385–405. DOI: https://doi.org/10
.1017/S0952675717000197
Gunnar ´Olafur Hansson. 2001. Theoretical and
Typological Issues in Consonant Harmony.
Ph.D. thesis, University of California, Berkeley.
Gunnar
´Olafur Hansson. 2010. Consonant
Harmony: Long-Distance
in
Phonology. University of California Press,
Berkeley, CA.
Interactions
Yiding Hao. 2019. Finite-state Optimality Theory:
Non-rationality of Harmonic Serialism. Journal
of Language Modeling, 7(2):49–99. DOI:
https://doi.org/10.15398/jlm.v7
i2.210
Jeffrey Heinz. 2007.
Inductive Learning of
Phonotactic Patterns. Ph.D. thesis, University
of California Los Angeles.
Jeffrey Heinz. 2014. Culminativity times harmony
equals unbounded stress. In Harry van der Hulst,
editor, Word Stress: Theoretical and Typolog-
ical Issues, pages 255–275. Cambridge Uni-
versity Press, Cambridge. DOI: https://
doi.org/10.1017/CBO9781139600408
.012
Jeffrey Heinz. 2018. The computational nature
of phonological generalizations. In Larry M.
Hyman and Frans Plank, editors, Phonolog-
ical Typology, pages 126–195. De Gruyter
Mouton, Berlin. DOI: https://doi.org
/10.1515/9783110451931-005
Jeffrey Heinz and Regine Lai. 2013. Vowel
harmony and subsequentiality. In Proceedings
of the 13th Meeting on the Mathematics of
Language, pages 52–63.
Mans Hulden. 2017. Formal and computational
verification of phonological analyses. Phonol-
ogy, 34(2):407–435. DOI: https://doi
.org/10.1017/S0952675717000203
Brett Hyde. 2008. Alignment continued: Distance-
sensitivity, order-sensitivity, and the midpoint
pathology. Unpublished manuscript, Washing-
ton University. Available at http://roa
.rutgers.edu/article/view/1028.
Brett Hyde. 2012. Alignment constraints. Natural
Language and Linguistic Theory, 30:789–836.
DOI: https://doi.org/10.1007/s11
049-012-9167-3
Brett Hyde. 2016. Layering and Directionality.
Equinox, Sheffield.
C. Douglas Johnson. 1972. Formal Aspects of
Phonological Description. Mouton, The Hague.
DOI: https://doi.org/10.1515/978
3110876000
Ronald Kaplan
and Martin Kay.
1994.
Regular models of phonological rule systems.
Computational Linguistics, 20:331–378.
Nate Koser and Adam Jardine. 2020. The
complexity of optimizing over strictly local
constraints. In Proceedings of the 43rd Annual
Penn Linguistics Conference, pages 125–134.
535
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
t
l
a
c
_
a
_
0
0
3
8
2
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Andrew Lamont. 2019a. Majority rule in har-
monic serialism. In Supplemental Proceedings
of
the 2018 Annual Meeting on Phonol-
ogy, Washington, D.C., Linguistic Society of
America. DOI: https://doi.org/10.3765
/amp.v7i0.4546
Douglas Pulleyblank. 2002. Harmony drivers: No
disagreement allowed. In Proceedings of the
Twenty-Eighth Annual Meeting of the Berke-
ley Linguistics Society, pages 249–267. DOI:
https://doi.org/10.3765/bls.v28i1
.3841
Andrew Lamont.
2019b. Precededence
is
pathological: The problem of alphabetical
the 36th West
sorting.
Coast Conference on Formal Linguistics,
pages 243–249, Somerville, MA. Cascadilla
Proceedings Project.
In Proceedings of
G´eraldine Legendre, Yoshiro Miyata, and Paul
Smolensky. 1990. Harmonic Grammar: a
theory of
formal multi-level connectionist
linguistic well-formedness: an application. In
Proceedings of the Twelfth Annual Conference
of the Cognitive Science Society, page 884–891,
Hillsdale, NJ. Erlbaum.
John J. McCarthy. 2003. OT constraints are
categorical. Phonology, 20(1):75–138. DOI:
https://doi.org/10.1017/S095267
5703004470
John J. McCarthy. 2010. Autosegmental spreading
in Optimality Theory. In John A. Goldsmith,
Elizabeth Hume, and W. Leo Wetzels, editors,
Tones and Features, pages 195–222. Walter de
Gruyter, Berlin.
John J. McCarthy and Alan Prince. 1993. Gen-
eralized alignment. In Geert Booij and Jaap
van Marle, editors, Yearbook of Morphol-
ogy, pages 79–153. Kluwer, Dordrecht. DOI:
https://doi.org/10.1007/978-94
-017-3712-8 4
Kevin Mullin. 2011. Strength in harmony
systems: Trigger and directional asymmetries.
Unpublished manuscript, University of Mas-
sachusetts Amherst. Available at https://
www.academia.edu/393775/Strength
in Harmony Systems Trigger and
Directional Asymmetries
Marina Nespor and Irene Vogel. 1986. Prosodic
Phonology. Foris Publications, Dordrecht.
Alan Prince and Paul Smolensky. 1993/2004.
Optimality Theory: Constraint Interaction in
Generative Grammar. Blackwell Publishing,
Malden, MA.
536
Jason Riggle. 2004. Generation, Recognition, and
Learning in Finite State Optimality Theory.
thesis, University of California Los
Ph.D.
Angeles.
James Rogers, Jeffrey Heinz, Gil Bailey, Matt
Edlefsen, Molly Visscher, David Wellcome,
and Sean Wibel. 2010. On languages piece-
wise testable in the strict sense. The Math-
ematics of Language, 10/11:255–265. DOI:
https://doi.org/10.1007/978-3
-642-14322-9 19
Sharon Rose and Rachel Walker. 2004. A
typology of consonant agreement as corre-
spondence. Language, 80(3):475–531. DOI:
https://doi.org/10.1353/lan.2004
.0144
Elisabeth Selkirk. 1984. Phonology and Syntax:
The Relation between Sound and Structure.
MIT Press, Cambridge.
Elisabeth Selkirk. 2011. The syntax-phonology
interface. In John A. Goldsmith, Jason Riggle,
and Alan C. L. Yu, editors, The Hand-
book of Phonological Theory, 2nd edition,
pages 435–484. Blackwell Publishing, Oxford.
DOI: https://doi.org/10.1002/978
1444343069.ch14
Paul Smolensky. 1992. Harmonic Grammars for
formal languages. In Advances in Neural Infor-
mation Processing Systems 5, pages 847–854,
Morgan Kaufmann Publishers
Inc., San
Francisco.
Paul Smolensky. 1993. Harmony, markedness,
and phonological activity. Paper presented at
Rutgers Optimality Workshop 1. Available at
http://roa.rutgers.edu/article
/view/88.
Paul Smolensky. 2006. Optimality in phonology
II: Harmonic completeness,
local constraint
conjunction, and feature-domain markedness.
In Paul Smolensky and G´eraldine Legendre,
editors, The Harmonic Mind: From Neural
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
t
l
a
c
_
a
_
0
0
3
8
2
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Completeness to Optimality-Theoretic Gram-
mar, volume II, pages 27–160. MIT Press,
Cambridge, MA.
Keiichiro Suzuki. 1998. A Typological Investiga-
tion of Dissimilation. Ph.D. thesis, University
of Arizona.
Rachel Walker. 2000. Long-distance consonantal
identity effects. In Proceedings of WCCFL 19,
pages 532–545. Somerville, MA. Cascadilla
Press.
Kristine M. Yu. 2019. Parsing with Minimalist
Grammars and prosodic trees. In Robert C.
Berwick and Edward P. Stabler, editors, Min-
imalist Parsing, pages 69–109. Oxford Uni-
versity Press, Oxford. DOI: https://doi
.org/10.1093/oso/9780198795087.003
.0004
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
3
8
2
1
9
2
4
1
7
0
/
/
t
l
a
c
_
a
_
0
0
3
8
2
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
537
PDF Herunterladen