On the Universal Generation Problem for - Specialized Research AI at MIT

On the Universal Generation Problem for
Uniﬁcation Grammars

J ¨urgen Wedekind
University of Copenhagen

∗

The universal generation problem for uniﬁcation grammars is the problem of determining
whether a given grammar derives any terminal string with a given feature structure. It is known
that the problem is decidable for LFG and PATR grammars if only acyclic feature structures are
taken into consideration. In this brief note, we show that the problem is undecidable for cyclic
structures. This holds even for grammars that are off-line parsable.

The universal generation problem for uniﬁcation grammars is the problem of determin-
ing for an arbitrary grammar G and an arbitrary feature structure F whether there exists
at least one sentence that G derives with F. If F is acyclic, Wedekind and Kaplan (2012)
have shown that the problem is decidable for LFG (Kaplan and Bresnan 1982) and PATR
(Shieber et al. 1983) grammars. They prove that the set of strings that a grammar relates
to an acyclic feature structure can be described by a context-free grammar. Decidability
of the problem then follows because the emptiness problem is decidable for context-
free languages. For cyclic feature structures they demonstrated by example that the set
of strings that a grammar relates to an input might not be context-free, but they did not
further investigate the formal properties of the languages that are in general related to
cyclic structures.

In this brief note, we show the undecidability of the universal generation prob-
lem by reduction from the undecidable emptiness problem for the intersection of two
context-free languages. We provide a proof for LFG- or PATR-style grammars that asso-
ciate feature structures with trees derived in accordance with a context-free grammar.
Our result also applies to other systems such as HPSG (Pollard and Sag 1994) whose
formal devices are powerful enough to simulate, albeit indirectly, the effect of context-
free derivation.

To state the universal generation problem more formally, recall that a uniﬁcation
G between terminal strings and feature

grammar G deﬁnes a binary derivation relation Δ
structures, as given in (1).

(1) Δ

(s, F) iff G derives terminal string s with feature structure F

The universal generation problem is then the problem of deciding for an arbitrary
(s, F)} is
uniﬁcation grammar G and an arbitrary feature structure F whether {s | Δ
empty or not.

∗ Center for Language Technology, University of Copenhagen, Njalsgade 140, 2300 Copenhagen S,

Denmark. E-mail: jwedekind@hum.ku.dk.

Submission received: 29 October 2013; accepted for publication: 27 January 2014.

doi:10.1162/COLI a 00191

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u
/
c
o

l
i
/

a
r
t
i
c
e
–
p
d

f
/

4
0
3
5
3
3
1
8
0
3
2
1
8
/
c
o

l
i

_
a
_
0
0
1
9
1
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

Computational Linguistics

Volume 40, Number 3

L’
E’

root

R
B

E’

root

L’

R’

R’
B

E’

Figure 1
A sample c-structure and the f-structures associated with it by type 1 (top) and type 2 (bottom)
string grammar derivations.

For the reduction of the emptiness problem for the intersection of two context-free
languages, we can, without loss of generality, assume that the context-free languages are
(cid:2)-free. These languages can be described by grammars in Chomsky normal form, that
is, by context-free grammars G = (N, T, S, P) with nonterminal vocabulary N, terminal
vocabulary T, and start symbol S where every rule in P is of the form A → BC with
B, C ∈ N, or A → a with a ∈ T.

For the proof we ﬁrst deﬁne for each context-free grammar G in Chomsky normal
form two LFG grammars that both derive L(G) and that associate with each derivable
terminal string feature structures (f-structures) that provide slightly different encodings
of the derivable string.

Let G = (N, T, S, P) be a context-free grammar in Chomsky normal form. A type
(cid:5)
(cid:5)) whose rule set P
(G) for G is an LFG grammar (N, T, S, P
1 string grammar String1
includes for each rule A → BC in P a rule of the form (2a) and for each rule A → a in P
a rule of the form (2b).
(2) a. A →

b. A →

a
(↑ B a) = (↑ E)

B
(↑ L) = ↓

C
(↑ R) = ↓

(↑ B) = (↓ B) (↑ E) = (↓ E)
(↑ L E) = (↓ B)

(cid:5)) whose rule
includes a rule of the form (3a) for each A → BC in P and a rule of the form (3b)

(G) for G is an LFG grammar (N, T, S, P

(cid:5)

A type 2 string grammar String2
set P
for each A → a in P.
(3) a. A →

B
(↑ L’) = ↓

C
(↑ R’) = ↓

(↑ B) = (↓ B) (↑ E’) = (↓ E’)
(↑ L’ E’) = (↓ B)

b. A →

a
(↑ B a) = (↑ E’)

Figure 1 illustrates a c-structure and the f-structures associated with it by type 1
and type 2 string grammar derivations.1 The attributes L, R, B, and E are mnemonic

1 Note that the terminal symbols also occur as attributes in the annotations of the terminal rules. This
“abuse” of the terminal symbols is not essential to our argument (a set of new attributes that is in
one-to-one correspondence with the set of terminals would also sufﬁce), but it makes the encoding
of the terminal strings in the f-structures more perspicuous.

534

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u
/
c
o

l
i
/

a
r
t
i
c
e
–
p
d

f
/

4
0
3
5
3
3
1
8
0
3
2
1
8
/
c
o

l
i

_
a
_
0
0
1
9
1
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

Wedekind

On the Universal Generation Problem

for ‘left’, ‘right’, ‘begin’, and ‘end’, respectively. For later reference, we also depicted
the constant root that we uniformly use to instantiate the ↑ of a derivation that refers
to the c-structure root; root then labels the f-structure element to which it refers in
the minimal model of the f-description. (In Kaplan and Bresnan’s [1982] terminology,
root corresponds to the f-structure variable associated with the c-structure root, usually
notated by f1.)

Both types of string grammars have in common that they have G as their context-
free skeleton and that for every string in L(G), the f-structure for each string grammar
encodes both the string itself and also the branching structure of a derivation in G that
leads to that terminal string. The f-structures derived by the two types of grammars
vary only slightly in the labels that they use to encode those properties. An f-structure
of a type 2 grammar derivation for a given string shares the ‘begin’ attribute (B) with
the f-structure of a corresponding type 1 grammar derivation, but it has distinct ‘left’,
‘right’, and ‘end’ attributes (L’, R’, E’).

Because the derived f-descriptions can never be unsatisﬁable (the string grammars
do not contain atomic values), the f-structure constraints of the string grammars do not
actually ﬁlter the language of the context-free grammar. Thus G and its string grammars
(G)). By induction on
must have the same language L(G) = L(String1
the depth of the derivation trees it is also easy to see that the minimal solution of the
f-description of a derivation of a terminal string s is acyclic and single-rooted, and
(cid:5) = s.
satisﬁes (root B s
That is, these grammars both encode their terminal strings in their respective (root B) to
(root E)/(root E’) paths.

(cid:5)) = (root E’), respectively, if and only if s

(cid:5)) = (root E) and (root B s

(G)) = L(String2

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u
/
c
o

l
i
/

a
r
t
i
c
e
–
p
d

f
/

4
0
3
5
3
3
1
8
0
3
2
1
8
/
c
o

l
i

_
a
_
0
0
1
9
1
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

Before going into the details of the undecidability proof, we ﬁrst give an out-
line of the proof idea. For the reduction, we have to construct for two arbitrary
(cid:2)-free context-free languages L1 and L2 an LFG grammar G and an input structure
F such that the set of terminal strings that G derives with F is empty if and only
if the intersection of L1 and L2 is empty. Because every (cid:2)-free context-free language
is derivable by a context-free grammar in Chomsky normal form, we can make
),
(G2
the LFG grammar G by combining the productions of String1
)
= (N2, T2, S2, P2
for two arbitrary context-free grammars G1
in Chomsky normal form. To avoid undesired interactions between the rules of
the two string grammars, we assume that the sets of nonterminals of G1 and G2
are disjoint (this is without loss of generality because nonterminals can always be
renamed).

(G1
) and G2

= (N1, T1, S1, P1

) and String2

(cid:5)) where s

) and String2

We observed already that the string grammars String1

) asso-
ciate with any c-structure derivation of a terminal string s1 in G1 and any c-structure
derivation of a terminal string s2 in G2 f-structures that encode s1 and s2 as their
respective (root B) values. By construction of the string grammars, the only paths that
the two f-structures share are the paths (root B s
is a common preﬁx of s1
),
(G2
) and String2
and s2. Thus, if we deﬁne G to consist of the rules of String1
and a start rule that expands S to S1S2 and forces the f-structures for s1 and s2 to
unify, their (root E) and (root E’) paths become reentrant ((root E) = (root E’)) if and
only if s1 and s2 are identical. G then assigns to a terminal string an f-structure with
reentrant (root E) and (root E’) paths if and only if it has the form s
is in
L(G1

) ∩ L(G2
If all we do is uniﬁcation on the top level the f-structures for the strings s1s2
would still record information on the structure of their derivation. Thus distinct strings
)} would get assigned distinct f-structures. However, the
in {s
proof requires that there be a single f-structure that is assigned to all strings s
with

(cid:5) ∈ L(G1

) ∩ L(G2

and s

(cid:5) | s

(G1

(G2

(cid:5)

535

Computational Linguistics

Volume 40, Number 3

root

E’

E B

L’

R’

cycles for the attributes in T1

∩ T2

Figure 2
The functional contribution of the S rule to a derivation in G.

(cid:5)

in L(G1

) ∩ L(G2

). We achieve that by annotating the start rule so that the uniﬁed
s
f-structures derived by the string grammars are folded up into one and the same cyclic
| + 7 cycles
f-structure F. This f-structure consists of a single element (node) and |T1
of length 1, each one labeled with one of the attributes in {B, L, R, L’, R’, E, E’} ∪ (T1
).
F thus has the following form.2

∩ T2

(4)

L’

cycles for the attributes in T1

∩ T2

R’

E’

∩ T2, so that it imposes no constraints on
F must contain cycles for all terminals in T1
).
) ∩ L(G2
the strings that may appear in L(G1
The folding into F is accomplished by annotations of G’s start rule whose contribu-
tion to a derivation in G is depicted in Figure 2. As earlier, we include root for later use.
Obviously, if (root E) = (root E’) holds in the uniﬁed f-structures of the string grammars,
then the uniﬁcation of the string grammar f-structures and the structure in Figure 2
yields F. Otherwise, their uniﬁcation results in a structure that only properly subsumes
F. This is because neither (root E E) nor (root E’ E) exists in the uniﬁed f-structures of the
two string grammars, and therefore their values in the structure in Figure 2 are not
merged when the structures are uniﬁed. Thus G derives with F exactly the set of strings
(cid:5)
) is
s
empty.

). Hence, this set is empty if and only if L(G1

) ∩ L(G2

in L(G1

with s

(cid:5)

We now give a rigorous statement and proof of our undecidability theorem.

Theorem
For an arbitrary LFG grammar G and an arbitrary f-structure F it is undecidable whether
{s | Δ

(s, F)} = ∅.

2 This f-structure may look peculiar in that it does not contain atomic feature values. However, this is not
relevant to the proof. To make the f-structure look more “natural,” we can, for example, expand G by an
annotation (↑ D) = V at the start rule and F by a feature D with value V.

536

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u
/
c
o

l
i
/

a
r
t
i
c
e
–
p
d

f
/

4
0
3
5
3
3
1
8
0
3
2
1
8
/
c
o

l
i

_
a
_
0
0
1
9
1
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

Wedekind

On the Universal Generation Problem

) and G2

= (N1, T1, S1, P1

Proof
) be two arbitrary context-free gram-
Let G1
mars in Chomsky normal form. Without loss of generality, we can assume that
) we construct an LFG grammar
N1
G = (N, T, S, P) with N = N1
∪ T2. The rule set P
consists of the rules of String1

= ∅. On the basis of String1
∪ N2
(G1

∪ N2, and T = T1
) and the following start rule.

(G1
∪ {S}, S (cid:10)∈ N1
(G2

= (N2, T2, S2, P2

) and String2

∩ N2

(G2

S →

S1
↑ = ↓
(↑ E E) = ↑

S2
↑ = ↓
(↑ E’ E) = (↑ E’ E B)
(↑ E’ E) = (↑ E’ E L)
(↑ E’ E) = (↑ E’ E R)
(↑ E’ E) = (↑ E’ E L’)
(↑ E’ E) = (↑ E’ E R’)

……

(cid:2)

annotations (↑ E’ E) = (↑ E’ E x)
∩ T2
for all x in T1

) = (root E) and (root B s2

The functional contribution of this start rule to a derivation in G is depicted in Figure 2.
The (↑ E E) = ↑ annotation at S1 introduces the left cycle and the annotations at S2
account for the rest. Now let F be the f-structure in (4) and consider an arbitrary
derivation of a terminal string s with f-description FD in G. By construction of G, s must
= s2 iff
have the form s1s2, with s1 derived from S1 and s2 derived from S2. We claim s1
F is the f-structure for FD. Note ﬁrst that also G does not contain atomic values. Thus,
FD cannot be unsatisﬁable and must have an f-structure.
) = (root E’)
= s2, then FD (cid:11) (root E) = (root E’), since (root B s1
If s1
follow from FD. From (root E) = (root E’) and the instantiated annotations of the S rule,
∩ T2
we get (root x) = root, for all x ∈ {B, L, R, L’, R’} ∪ (T1
). With these equations we can
then derive from (root B s1
) = (root E’) equations root = (root x),
) = (root E) and (root B s2
for x ∈ {E, E’}. Thus F must be the f-structure that we obtain from a minimal model
of FD.
(cid:10)= s2. Let FD1 and FD2 be the f-descriptions of the string grammars,
Now suppose s1
(cid:5)
}, with n1 and n2 instantiating ↓ in the
∪ {root = n2
∪ {root = n1
= FD1
FD
1
(cid:5)
(cid:5)
(cid:5) = FD
∪ FD
2. By construction of G, the only terms
annotations at S1 and S2, and FD
1
(cid:5)
(cid:5)
)
2 are the common subterms of (root B s1
1 and FD
shared by the deductive closures of FD
(cid:5)
(cid:5) (cid:10)(cid:11) (root E) = (root E’), because otherwise FD
(cid:5)) = (root E)
(cid:11) (root B s
and (root B s2
1
(cid:5)
(cid:5)= s2, as we saw earlier. Because
and FD
2
(cid:5)
obviously (root E E) and (root E’ E) do not occur in any equation derivable from FD
,
(root E) = (root E’) cannot follow from FD either, and F cannot be the f-structure
for FD.
(s, F)} = ∅ if and only
Thus {s | Δ
if L(G1
) = ∅. Since the emptiness problem for the intersection of context-free
languages is in general undecidable, the generation problem must be undecidable
too.

(cid:5)) = (root E’) would imply s

)} and hence {s | Δ

(cid:5)= s1 and s

G
) ∩ L(G2

(s, F)} = {s

). Thus FD

(cid:5) ∈ L(G1

(cid:11) (root B s

) ∩ L(G2

= FD2

}, FD

(cid:5) | s

(cid:5)
2

(cid:5)

As a consequence of this theorem we know that there does not exist a general gen-
eration algorithm, at least if cyclic input structures are considered as legitimate inputs.
We note that the grammars constructed in this proof are off-line parsable (cf.,
e.g., Kaplan and Bresnan 1982; Johnson 1988; Jaeger, Francez, and Wintner 2005). Off-
line parsability is sufﬁcient to guarantee the decidability of the recognition/parsing
problem even for cyclic f-structures. But Wedekind and Kaplan (2012) have shown that
off-line parsability is not necessary to guarantee that generation from acyclic structures

537

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u
/
c
o

l
i
/

a
r
t
i
c
e
–
p
d

f
/

4
0
3
5
3
3
1
8
0
3
2
1
8
/
c
o

l
i

_
a
_
0
0
1
9
1
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

Computational Linguistics

Volume 40, Number 3

is decidable, and the grammars in this proof demonstrate that it is not sufﬁcient for
cyclic structures.

Off-line parsability typically bounds the size of the c-structures of a string by a func-
tion of the length of that string. This works for parsing because the size of the f-structure
is bounded by the size of the c-structure, but it is insufﬁcient for generation because it
does not constrain the structural correspondence between the c- and f-structure (see also
Dymetman 1991). A single constraint that guarantees decidability for both parsing and
generation must not only bound the size of the f-structures for a terminal string by the
length of the string, but it must also ensure, as we have learned from the proof herein,
that the determination of the terminal strings for an f-structure can be achieved with
ﬁnite control.

Acknowledgments
The author wishes to thank Ron Kaplan
for his insightful comments and helpful
suggestions during the preparation of this
paper, and the four anonymous reviewers for
their valuable feedback on an earlier draft.

References
Dymetman, Marc. 1991. Inherently reversible

grammars, logic programming and
computability. In Proceedings of the ACL
Workshop: Reversible Grammar in Natural
Language Processing, pages 20–30, Berkeley,
CA.

Jaeger, Efrat, Nissim Francez, and Shuly

Wintner. 2005. Uniﬁcation grammars and
off-line parsability. Journal of Logic,
Language, and Information, 14(2):199–234.
Johnson, Mark. 1988. Attribute–Value Logic

and the Theory of Grammar. CSLI
Publications, Stanford, CA.

Kaplan, Ronald M. and Joan Bresnan. 1982.
Lexical-Functional Grammar: A formal
system for grammatical representation.
In Joan Bresnan, editor, The Mental
Representation of Grammatical Relations. MIT
Press, Cambridge, MA, pages 173–281.
Pollard, Carl and Ivan Sag. 1994. Head-Driven
Phrase Structure Grammar. The University
of Chicago Press, Chicago, IL.
Shieber, Stuart M., Hans Uszkoreit,

Fernando C. N. Pereira, Jane Robinson,
and Mabry Tyson. 1983. The formalism
and implementation of PATR-II. In
Barbara J. Grosz and Mark E. Stickel,
editors, Research on Interactive Acquisition
and Use of Knowledge. SRI Final Report
1894. SRI International, Menlo Park, CA,
pages 39–79.

Wedekind, J ¨urgen and Ronald M. Kaplan.

2012. LFG generation by grammar
specialization. Computational Linguistics,
38(4):867–915.

538

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u
/
c
o

l
i
/

a
r
t
i
c
e
–
p
d