Provable Limitations of Acquiring Meaning from Ungrounded Form:
What Will Future Language Models Understand?
William Merrill∗ Yoav Goldberg∗ † Roy Schwartz‡ Noah A. Smith∗§
∗Allen Institute for AI, United States
†Bar Ilan University, Israel
‡Hebrew University of Jerusalem, Israel
§University of Washington, United States
{willm,yoavg,roys,noah}@allenai.org
Abstract
Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the question of whether, in principle, a system can ever ‘‘understand’’ raw text without access to some form of grounding. We formally investigate the abilities of ungrounded systems to acquire meaning. Our analysis focuses on the role of ‘‘assertions’’: textual contexts that provide indirect clues about the underlying semantics. We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence. We find that assertions enable semantic emulation of languages that satisfy a strong notion of semantic transparency. However, for classes of languages where the same expression can take different values in different contexts, we show that emulation can become uncomputable. Finally, we discuss differences between our formal model and natural language, exploring how our results generalize to a modal setting and other semantic relations. Together, our results suggest that assertions in code or language do not provide sufficient signal to fully emulate semantic representations. We formalize ways in which ungrounded language models appear to be fundamentally limited in their ability to ‘‘understand’’.
1 Introduction
Recently, language models trained on huge datasets of raw text have pushed the limits of natural language processing (Devlin et al., 2019; Raffel et al., 2019; Brown et al., 2020, among others). Such systems transcend the expert system paradigm, where rules about language and meaning are hardcoded into a system, as well as the supervised learning paradigm, where a notion of meaning is provided through ground-truth labels. Rather, analysis of massive language models has revealed that, to some degree, knowledge of syntactic and semantic dependencies can emerge without explicit supervision (Rogers et al., 2020; Tenney et al., 2019). This knowledge can then be transferred to a variety of downstream NLP tasks.
Yet, today’s NLP systems built on large language models still fall short of human-level general understanding (Yogatama et al., 2019; Zhang et al., 2020). Brown et al. (2020) discuss the limitations of their GPT-3 language model compared with humans, suggesting that:

Scaling up any LM-like model . . . may eventually run into (or could already be running into) the limits of the pretraining objective.
This possibility raises an interesting theoretical question. What are the fundamental limits of learning meaning from language modeling, even assuming a perfect learner with access to unlimited data? Recently, Bender and Koller (2020) argued that achieving true natural language understanding from text alone is impossible, and that, to really get at meaning, some type of semantic grounding is necessary.1 Their style of argumentation largely focused on developing thought experiments, rather than making formal arguments.
One thought experiment featuring prominently in Bender and Koller (2020) was the task of learning to understand a programming language’s semantics from raw code. Here, understanding was defined as fully emulating a compiler. This setup has clear parallels to learning to understand natural language, although the more well-defined nature of programming languages makes them easier to reason about. Bender and Koller (2020) argue that emulation is difficult in this setting, and perhaps impossible, because the source code alone contains no information about how it should be interpreted to create outputs. One counterpoint

1See Michael (2020) for a summary of the informal discussion around Bender and Koller (2020), much of which took place on social media.
Transactions of the Association for Computational Linguistics, vol. 9, pp. 1047–1060, 2021. https://doi.org/10.1162/tacl_a_00412
Action Editor: Mark-Jan Nederhof. Submission batch: 9/2020; Revision batch: 4/2021; Published 9/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
raised by the paper, as well as others (Michael, 2020; Potts, 2020), is the existence of unit tests, with assertions encoding examples of input/output pairs for blocks of code.2 For example, systematically observing blocks like x = 3; assert x == 3 could let a system bootstrap the semantics of variable assignment, because a programmer is likely to write assertions that will pass. These assertions constitute a form of implicit grounding embedded within language modeling by the pragmatic concerns of programmers, and they could potentially be leveraged to emulate a compiler.3 However, it is not immediately clear if unit tests provide ‘‘enough’’ supervision to do this, even with unlimited data.
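As a concrete illustration of this idea, one could imagine mining such assertion contexts directly from raw code. The sketch below is our own (the helper name and regular expression are illustrative, not from the paper), and only handles simple one-line blocks with whitespace-free operands:

```python
# Hypothetical sketch: mining `assert lhs == rhs` statements from raw
# code, yielding (context, lhs, rhs) triples that approximate the kind
# of equivalence signal discussed in the text.
import re

corpus = [
    "x = 3; assert x == 3",
    "y = 2 + 2; assert y == 4",
]

def mine_assertions(lines):
    """Extract (context, lhs, rhs) triples from one-line assert blocks."""
    triples = []
    for line in lines:
        m = re.search(r"^(.*?)assert (\S+) == (\S+)$", line)
        if m:
            context, lhs, rhs = m.groups()
            triples.append((context.strip(), lhs, rhs))
    return triples

print(mine_assertions(corpus))
```

A learner observing many such triples sees, in effect, noisy samples from the assertion oracle formalized in Section 2.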
Viewing the debate about the power of assertions as central to the larger philosophical question, we aim to clarify it in more formal terms. In this paper, we formally study whether observing a generalized notion of assertions can allow a system to ‘‘understand’’ strings. An assertion is a query about whether two strings evaluate to the same value within a fixed context. This is motivated by the role of assertions in unit tests, where asserting two expressions are equal suggests that they have the same value within the test.
While assertions are directly motivated by the compiler thought experiment, they also have analogs in natural language, where sentences make assertions about the world, and it is reasonable to expect some form of bias towards true statements (Potts, 2020). Indeed, this is one of Grice’s Maxims (Grice, 1975): a set of basic principles proposed to govern the pragmatics of natural language. For example, the truth conditions of This cat is the cat that Mary owns verify that two cats in the world identified in distinct ways are the same entity. In general, we might expect a sentence to appear with higher frequency if its truth conditions hold within its context, similar to an assertion in code, although of course there will also be other factors governing sentence frequency besides this. In this sense, the example sentence resembles the Python statement assert cat1 == cat2, where cat1 and cat2 are two Cat objects. See Section 6 for more discussion of how assertions and other formal concepts translate to natural language. We will generalize assertions to an abstract formal language context, allowing us to study how they can be used to emulate semantic relations.

2Unit tests are blocks of code in a software project that are designed to test whether the core code is behaving correctly.
3Contexts like assertions can be seen as an argument in favor of the distributional hypothesis (Harris, 1954).
Our findings are as follows. If every expression in a language has the same value in every valid context, then the language can be emulated using a finite number of assertion queries (Section 4). However, we construct a class of languages where expressions can take different values in different contexts, and where assertions do not enable emulation, i.e., infinite queries would be required (Section 5). Intuitively, this means that assertions do not provide enough signal for a Turing-complete emulator to fully ‘‘understand’’ languages from this class. We go on to discuss differences between our formal model and the less well-defined context of natural language (Section 6).
These results provide a formal way to characterize upper bounds on whether it is possible to emulate the semantics of a language from distributional properties of strings. Within our framework, in certain settings, we find that meaning cannot be learned from text alone. We strengthen claims made by Bender and Koller (2020) that assertions in code do not necessarily provide sufficient signal for a language model to emulate understanding. We do not make strong claims about how these results transfer to natural language, although we expect that the added complexity of natural language would make it, if anything, more difficult to ‘‘understand’’ than code.4
2 Preliminaries

Let L ⊆ Σ* denote a formal language over alphabet Σ. We will use λ to denote the empty string. Let (Σ*)² denote the Cartesian product of Σ* with itself; that is, the set of all pairs of strings. Resembling Clark (2010), we refer to a tuple ⟨l, r⟩ ∈ (Σ*)² as a syntactic context. We also use other symbols to refer to a context (e.g., κ = ⟨l, r⟩). We denote by λ² the empty context ⟨λ, λ⟩.
2.1 Meaning
We will model formal languages not just as sets of strings, but as having an associated semantics.5

4Appendix C documents and motivates conceptual changes since the original arXiv version of the paper.
5We slightly abuse notation by using L to refer to both a set of strings and a set of strings paired with a denotation function, which could be written more verbosely as ⟨L, ⟦·⟧L⟩.
Specifically, we assume the existence of a denotational semantics over every substring of L, which we now elaborate on. Let Y be a countable set of referents. First, we will say that some e ∈ Σ* is a valid expression within the context κ = ⟨l, r⟩ if there exists some contextual denotation ⟦e | κ⟧L ∈ Y. Intuitively, this represents the value of e when it occurs in the larger context ler ∈ L. We will also use the notation ⟦e | l, r⟧L where convenient. We will reserve ∅ ∈ Y as a special null symbol, defining ⟦e | κ⟧L = ∅ iff e is not a valid expression in the context κ.6

Each context κ ∈ (Σ*)² also has a support, or set of expressions that are valid within it:

suppL(κ) = {e ∈ Σ* | ⟦e | κ⟧L ≠ ∅}.
Example Let L be a language of integers along with the + operator, for example, 2 + 2. Y is simply the integers. We take ⟦e | κ⟧L to map e to its standard arithmetic interpretation, namely, ⟦2 + 6 | λ, + 4⟧L = 8. We take expressions that are not conventionally well-formed to be invalid: for example, ⟦+ | λ, +⟧L = ∅. Finally, let κ = ⟨λ, + 4⟩. Then suppL(κ) = L, since any valid expression can occur within κ.
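For this example language, the contextual denotation function can be sketched in a few lines of Python (our own illustration; denote is a hypothetical helper, with None standing in for the null symbol ∅):

```python
# Sketch of ⟦e | κ⟧ for the integer example language: e is valid in
# context ⟨left, right⟩ iff the full string left·e·right is well-formed
# arithmetic, in which case e's value ignores the context.
def denote(e, left, right):
    """Evaluate expression e in the context ⟨left, right⟩, or None (∅)."""
    full = f"{left}{e}{right}"
    try:
        eval(full)      # check that the whole string ler is well-formed
        return eval(e)  # in this language, e's value is context-independent
    except SyntaxError:
        return None

print(denote("2 + 6", "", " + 4"))  # ⟦2 + 6 | λ, + 4⟧ = 8
print(denote("+", "", " +"))        # ∅: "+ +" is not well-formed
```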
2.2 Strong Transparency

As defined above, we make very few assumptions about denotations. They are not necessarily compositional, and expressions may take different referents in different contexts. However, we saw in the integer expression language that the meaning of an expression did not depend on its context. We now define a property formalizing this idea.

Definition 1 (Strong transparency) L is strongly transparent iff, for all e ∈ Σ*, κ ∈ (Σ*)², either ⟦e | κ⟧L = ⟦e | λ²⟧L ≠ ∅, or ⟦e | κ⟧L = ∅.
Informally, strong transparency says each e has a well-defined denotation that exists independent of context, and that this simple denotation can be ‘‘plugged into’’ any context. Our previous example expression 2 + 6 is strongly transparent because it can be said to have a well-defined value 8 independent of its context. We could break strong transparency by adding bound variables to the language, for example, x = 2; x + 6 in Python. In this case, ⟦x | κ⟧L non-vacuously depends on κ.

Strong transparency resembles referential transparency (Whitehead and Russell, 1925–1927), but is a stronger condition, in that it does not allow the same name to ever refer to different values. For example, for a Python program, strong transparency does not allow assigning local variables within a function, even if the function output would remain completely specified by its inputs.

6Our simple model of denotations does not reflect the full range of semantic theories that have been proposed for natural language. In particular, our denotations ⟦e | κ⟧L depend only on the linguistic context κ rather than any external world state. This differs substantially from how truth conditions are traditionally conceptualized in formal semantics (Heim and Kratzer, 1998). For example, in our framework, the referent of English ⟦the dog | κ⟧L must be fixed with no regard for the extralinguistic context. Section 6 further contrasts our setup with the richer semantics of natural language.
2.3 Assertion Queries

We now define an oracle function providing assertion information about expressions in L, resembling assert e1 == e2 for two Python expressions e1, e2. A system is granted access to this function, and it can make assertion queries to it in order to learn about the semantics of L.7 An assertion query tells us whether two expressions e, e′ are equivalent within the context κ.

Definition 2 (Assertion oracle) For e, e′ ∈ Σ* and κ ∈ (Σ*)², define the assertion oracle

ℵL(e, e′ | κ) =
    1  if ⟦e | κ⟧L = ⟦e′ | κ⟧L
    0  otherwise.
Recall that we defined ⟦e | κ⟧L = ∅ if e is not valid in the context κ. In our example language of integer expressions, for all κ, ℵL(4, 2 + 2 | κ) = 1, since 4 = 2 + 2. The computational power of this oracle depends on the complexity of the underlying semantics: for arbitrary semantics, it can become uncomputable. In this paper, though, we focus on classes of languages for which the denotation function and assertion oracle are computable.

The ℵL oracle is motivated by assertion statements in programming languages, which occur naturally in environments like unit tests. The distribution of strings in a corpus of code should capture some notion of this oracle, since a programmer is more likely to assert two expressions are equal if

7This resembles the role of queries in classical grammar induction works (e.g., Angluin, 1987).
they are expected to have the same value. Our goal
is to study the limits of understanding achievable
from raw text, so we consider an ‘‘upper bound’’
setup by assuming a system has full access to ℵL.
Can the system use this powerful oracle to emulate
the underlying semantics?
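For the running integer example, Definition 2's oracle could be sketched as follows (our own self-contained illustration; denote and assertion_oracle are hypothetical helper names, with None standing in for ∅):

```python
# Sketch of the assertion oracle ℵ_L for the integer example language:
# it answers 1 iff the two contextual denotations agree.
def denote(e, left, right):
    """⟦e | ⟨left, right⟩⟧, or None (∅) if the full string is malformed."""
    full = f"{left}{e}{right}"
    try:
        eval(full)      # the whole string must be well-formed
        return eval(e)  # context-independent value (strong transparency)
    except SyntaxError:
        return None

def assertion_oracle(e1, e2, left, right):
    """ℵ_L(e1, e2 | ⟨left, right⟩)."""
    return int(denote(e1, left, right) == denote(e2, left, right))

print(assertion_oracle("4", "2 + 2", "", ""))  # 1, since 4 = 2 + 2
print(assertion_oracle("4", "2 + 3", "", ""))  # 0
```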
2.4 Turing Machines
Our notion of language understanding will be
based around the idea of emulation, which in turn
requires a model of computational realizability.
We will use Turing machines (Turing, 1936) as a model of universal computation. We write μ(e) for the output of Turing machine μ evaluated on input e ∈ Σ*. We will also define an oracle Turing machine as a standard Turing machine that can compute a blackbox ‘‘oracle’’ function f as a subroutine. We imagine the machine has a special query instruction and tape. After writing x to the query tape and executing the query instruction, the query tape will contain f(x). We will write μf(e) for the Turing machine μ evaluated on input e with oracle access to f. In the case where f = ℵL, we will simply write μL(e). Whereas,
in computability theory, oracle Turing machines
are generally leveraged to make reductions from
uncomputable problems, here we will use them
to formalize the ability of an emulator to make
assertion queries about L. This oracle provides
additional power because these queries contain
additional information beyond that encoded in the
input expression.
3 Research Question: Do Assertions
Enable Emulation?
There is a long history in AI of trying to define
and measure understanding. Turing (1950) consti-
tutes an early behaviorist perspective; more recent
approaches tend to emphasize not just an external
view of a system’s behavior, but also ‘‘how it is
achieved’’ (Levesque, 2014). Understanding can
be behaviorally diagnosed in neural models by
evaluating them on benchmarks (Wang et al., 2018). An alternate approach is probing (Adi et al., 2017; Conneau et al., 2018; Hupkes and Zuidema, 2018; Hewitt and Liang, 2019; Belinkov and Glass, 2019), which investigates how directly a model’s representations encode semantic relations by measuring if they can be easily decoded from them. Similarly, we take the position that systems are capable of understanding if they emulate representations that are isomorphic to underlying meaning under important semantic relations like equivalence. We will formalize this in Question 1, which asks whether such emulation is possible using assertions.

Figure 1: An illustration of Definition 3. μ emulates a representation of each expression using assertion queries. Then, δ compares the emulated representations to determine equivalence.
Definition 3 (ℵ-emulation) A class of languages L over Σ is ℵ-emulatable if there exists an oracle Turing machine μ and standard Turing machine δ such that, for all L ∈ L, κ ∈ (Σ*)², and e, e′ ∈ suppL(κ),

⟦e | κ⟧L = ⟦e′ | κ⟧L ⇐⇒ δ(μL(e), μL(e′) | κ).
μ can be thought of as an emulator that evaluates expressions, whereas δ receives two values and decides whether they are equal. Crucially, only μ has direct access to ℵL. δ can only use information from the oracle to the extent that it is encoded in the representations μL(e) and μL(e′).

Definition 3 formulates emulation as a decision problem, as is typical in theoretical computer science. Equivalently, δ can be replaced by a computable function ρ such that ρ(μL(e) | κ) evaluates μL(e) in context κ, that is, its output string is isomorphic to ⟦e | κ⟧L under =. The functions δ and ρ are Turing-reducible to each other, implying that if one definition is satisfied, so is the other.
Figure 2: emulate implements an emulator μ. Let all_strings be an iterable enumerating all strings in Σ*. We provide a concrete implementation of all_strings in Figure 5.
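The listing in Figure 2 is not reproduced in this text-only version; the following sketch is our reconstruction, consistent with the proof of Theorem 1, using a toy binary alphabet and a toy parity semantics of our own choosing for the demonstration:

```python
# Reconstruction (ours) of Figure 2's emulator: return the index of the
# first enumerated string that the oracle deems equivalent to expr in
# the empty context λ².
from itertools import count, product

SIGMA = ["0", "1"]  # toy alphabet (ours); the construction works for any Σ

def all_strings():
    """Enumerate Σ* in a fixed shortlex order, as in Figure 5."""
    yield ""
    for n in count(1):
        for chars in product(SIGMA, repeat=n):
            yield "".join(chars)

def emulate(expr, asserteq):
    """μ: index of expr's canonical form under the enumeration order."""
    for i, e2 in enumerate(all_strings()):
        if asserteq(expr, e2):  # query ℵ_L(expr, e2 | λ²)
            return i

# Toy strongly transparent semantics for the demo: a string's value is
# the parity of its number of 1s.
asserteq = lambda a, b: a.count("1") % 2 == b.count("1") % 2
print(emulate("1", asserteq), emulate("10", asserteq), emulate("11", asserteq))
# → 2 2 0: equivalent expressions map to the same index
```

Note that δ then only needs to compare the returned indices for equality, exactly as in the proof.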
With our definition of emulation in place, we can formally state the research question:

Question 1 For a class of languages L, is L ℵ-emulatable?

How does Question 1 relate to understanding in large language models? We imagine that, with sufficiently large amounts of data, the frequencies of strings in L carry enough signal such that the language model objective ‘‘supervises’’ access to ℵL. Thus, μL(e) can be thought of as the language model representation of an expression e. We then hope to recover underlying semantic relations from the representations produced by the language model via some function δ. The class L corresponds to a set of hypothesis languages over which the language model must search for the true L. We will see that whether emulation is possible will depend on the properties of L.

Stepping back, Question 1 bears on the role of assertions raised by Bender and Koller (2020). Does observing assertions allow a Turing-complete system to emulate a compiler? In more general terms, are assertions powerful enough implicit grounding to achieve representations that encode the denotational semantics of a language?
4 Strong Transparency

We first consider the case where the language being learned is known to be strongly transparent. Let TRANSPARENT denote the class of strongly transparent languages. We will show that TRANSPARENT is ℵ-emulatable. The core idea of the proof is to construct a canonical form for each expression. The canonical form is the first expression in a lexicographic ordering that the assertion oracle deems equivalent to the target expression. For technical reasons, the emulator returns the index of this string under the lexicographic order.
Theorem 1 TRANSPARENT is ℵ-emulatable.

Proof. As Python is Turing-complete, we write μ : Σ* → N as a Python function emulate in Figure 2. The function receives as input an expression expr and a callback function asserteq to an oracle computing ℵL. For each e ∈ Σ*, there exists e* ∈ Σ* such that ℵL(e, e* | λ²) = 1. In the ‘‘worst case’’, this holds when e* = e by symmetry. By construction, all_strings reaches all strings in finite time. Therefore, the number of loop iterations before reaching e* is finite. We can conclude that emulate halts on every e ∈ Σ*, establishing that it is computable.
Now, we move towards justifying that the emulation is correct for every κ ∈ (Σ*)². We note that δ is simply the indicator function for equality over the natural numbers:

δ(m, m′ | κ) =
    1  if m = m′
    0  otherwise.
The function emulate outputs i ∈ N, the index of the first string e* such that ⟦e | λ²⟧L = ⟦e* | λ²⟧L. Now, let e, e′ ∈ suppL(κ) be different inputs to μ. Because the enumeration order of the for loop is fixed across computation of μL(e) and μL(e′):

μL(e) = μL(e′) ⇐⇒ ⟦e | λ²⟧L = ⟦e* | λ²⟧L ∧ ⟦e′ | λ²⟧L = ⟦e* | λ²⟧L
            ⇐⇒ ⟦e | λ²⟧L = ⟦e′ | λ²⟧L
            ⇐⇒ ⟦e | κ⟧L = ⟦e′ | κ⟧L,

where the last step follows by strong transparency. We conclude that the conditions for emulation (Definition 3) are fully satisfied. □
Through a simple construction, we have shown it is possible to emulate meaning from assertion queries for languages with strongly transparent semantics. The number of bits in the emulated representation μL(e) is linear in the size of e. In the next section, we consider what happens without strong transparency, where, among other complexities, values can be bound to variables, complicating the construction used in Theorem 1.

Figure 3: Templates for strings in Lm, for m ∈ N ∪ {∞}. M evaluates to m in all strings, while other expressions are evaluated according to Python 3.8 semantics. The metavariable n ranges over N to form different strings in Lm, and is serialized as a decimal string.
5 General Case

Requiring strong transparency precludes a broad class of linguistic patterns allowing an expression to refer to different values in different contexts. For example, this includes assigning variable or function names in Python, or binding pronouns in natural language. These constructions can make emulation impossible to achieve from assertions. We will construct a class of languages based on Python where emulation is uncomputable.
Definition 4 Let LEQ = {Lm | m ∈ N ∪ {∞}}, where strings in Lm are defined according to Figure 3. For semantics, we first define ⟦M | κ⟧Lm = m. For any other ler ∈ Lm that is a well-formed Python 3.8 expression, we define ⟦e | l, r⟧Lm as the value of e assigned by the Python interpreter in the context ⟨l, r⟩. For strings that are not valid Python expressions, we define ⟦e | l, r⟧Lm = ∅.
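Since Figure 3 itself is not reproduced here, the sketch below gives our own plausible rendering of the Lm programs and their key denotation, consistent with the figure caption and footnote 9; the exact template strings are an assumption on our part:

```python
# Plausible rendering (ours) of an L_m template string: n is serialized
# in decimal, and M denotes the secret parameter m.
def program(n):
    return f"def leq(): return {n} < M\nprint(leq())"

# Per Definition 4, within the print(·) context, leq() denotes n < m;
# float("inf") models the m = ∞ case.
def denote_leq(n, m):
    return n < m

print(program(3))
print(denote_leq(3, 5), denote_leq(7, 5), denote_leq(7, float("inf")))
# → True False True
```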
What does it take to emulate the expressions leq() and True in Lm? If we knew m, then we could emulate them by simply comparing n < m. However, it turns out that recovering m for any Lm ∈ LEQ is not possible with a fixed number of assertion queries. Formalizing this, we will show that LEQ is not ℵ-emulatable.8
Theorem 2 LEQ is not ℵ-emulatable.

Proof. Without loss of generality, we focus on the contexts for leq()9 and True within print(·), each of which is parameterized by some value of n. Notationally, we identify each Lm with m, and each context with its parameter n. This enables shorthand like ⟦e | n⟧m for the denotation of the expression e in the context parameterized by n in Lm.

When m = ∞, it holds for all n that ℵ∞(leq(), True | n) = 1. To satisfy emulation of e ∈ {leq(), True}, μ∞ makes a finite number of assertion queries

ℵ∞(leq(), True | ni)

for some sequence of contexts n1, · · · , nq, which we assume without loss of generality is sorted in increasing order. We can adversarially construct m′ ≠ ∞ such that all these queries are answered the same, and thus μ∞(e) = μm′(e) for both e. To implement this, we simply set m′ = nq + 1. Since μ∞(e) = μm′(e), we conclude that, for all n,

δ(μm′(leq()), μm′(True) | n) = δ(μ∞(leq()), μ∞(True) | n).

On the other hand, consider n > nq. In this case,

⟦leq() | n⟧m′ = False
⟦leq() | n⟧∞ = True,

which can be rewritten as

⟦leq() | n⟧m′ ≠ ⟦True | n⟧m′
⟦leq() | n⟧∞ = ⟦True | n⟧∞.

Therefore, the conditions of ℵ-emulation (Definition 3) cannot be satisfied for both Lm′ and L∞. This implies that LEQ is not ℵ-emulatable. □

8Another example of a non-ℵ-emulatable language takes M to be a finite list of integers and replaces n < M with n in M.
9The only ‘‘valid’’ context for leq() is within print(·). The denotation of leq() when it occurs next to def is ∅.

5.1 Discussion
We briefly summarize this result in less formal
terms. LEQ contains languages Lm defined by
Figure 3. Every program in each Lm is easily
computable. With knowledge of the Python in-
terpreter and m, any agent could execute all of
these programs. This can be formalized by observing that, for a fixed m, the class {Lm} is ℵ-emulatable. Rather, what we have shown is that, with finite time, it is impossible for an ungrounded agent to emulate Lm using assertion queries when m is unknown in advance. In other words, without prior knowledge of m, no algorithm can use assertions to disambiguate which notion of = is used by Lm from the infinite other possibilities. In a rough sense, m can be thought of as a cryptographic key enabling linguistic understanding: agents that know m can directly emulate Lm, but agents without it cannot, at least using assertions.10

Figure 4: An informal construction adapting the program templates in Figure 3 to English. Under our framework, two sentences are considered equivalent if they are true in exactly the same set of contexts. If the number is allowed to be ∞, this cannot be done in general for the final lines of each template.
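The adversarial step of this argument can be played out concretely (our own rendering; None models m = ∞, and the particular query contexts are an arbitrary example):

```python
# Concrete rendering (ours) of the adversary in Theorem 2.
# oracle(m, n) plays ℵ_m(leq(), True | n): in L_m, leq() denotes n < m.
def oracle(m, n):
    lhs = True if m is None else (n < m)  # ⟦leq() | n⟧_m; None is m = ∞
    return int(lhs == True)               # compare with ⟦True | n⟧_m

# Any emulator's finitely many queries about L_∞ ...
queries = [1, 5, 9]                       # contexts n_1 < ... < n_q
answers_inf = [oracle(None, n) for n in queries]

# ... are answered identically by the adversarial L_{m'} with m' = n_q + 1,
m_prime = max(queries) + 1
answers_mp = [oracle(m_prime, n) for n in queries]
print(answers_inf == answers_mp)  # True: the languages look identical

# yet the two languages disagree in any larger context n > n_q:
n_big = m_prime
print(oracle(None, n_big), oracle(m_prime, n_big))  # 1 0
```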
Theorem 2 does not use the fact that δ must be computable, as opposed to an arbitrary function. Even if δ is an arbitrary function, it could not disambiguate the true value of m based on the queries.
It is more precise to state Theorem 2 in a formal
language, but an argument similar to Theorem 2
can be adapted to a natural language like English.
An example is shown in Figure 4, where we define
the meaning of a sentence as its truth conditions,
and we imagine the class of candidate languages is
formed by varying the unspecified number, which
can potentially be ∞. Deciding if n is less than
it has the same truth conditions as Zero equals
one is equivalent to comparing leq() and True.
A system must necessarily fail to emulate the
semantics of these expressions in some context,
for some secret number. The rest of the paper
further explores the implications and limitations
of applying our formal model to natural language.
10Alternatively, we can take a more complexity-theoretic perspective by measuring the number of queries needed to emulate up to a bounded context size. Fix a maximum n. Then we can use binary search with O(log n) queries to find the value of m. Since the number of context bits is O(log n), the number of queries is O(|κ|), beating the O(|Σ|^|κ|) query complexity achievable by brute force. This perspective somewhat resembles Pratt-Hartmann and Third (2006) and other work in semantic complexity theory on the computational complexity of evaluating fragments of natural language.
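Footnote 10's binary-search observation can be sketched as follows (our own code; find_m, the oracle helper, and the bound n_max are illustrative names):

```python
# Sketch (ours) of recovering m with O(log n_max) assertion queries
# when contexts are bounded by n_max, per footnote 10.
def oracle(m, n):
    """ℵ_m(leq(), True | n) for finite m: 1 iff n < m."""
    return int((n < m) == True)

def find_m(query, n_max):
    """Binary search for the smallest n with query(n) == 0; that n is m."""
    lo, hi = 0, n_max + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if query(mid) == 1:  # mid < m, so m lies strictly above mid
            lo = mid + 1
        else:                # mid >= m
            hi = mid
    return lo

m_secret = 37
print(find_m(lambda n: oracle(m_secret, n), 1000))  # 37
```

The impossibility in Theorem 2 arises precisely because no such bound n_max is available in advance when m may be ∞.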
6 Towards Natural Language
As discussed in Section 1, our results are inspired
by the thought experiment of whether a language
model can use raw code to learn a compiler. A
goal of this, of course, is to examine whether un-
derstanding can be acquired from natural language
text in a simplified setting. In principle, our formal
results can bear on this broader question about nat-
ural language, although some differences emerge
when extending the results to a less well-defined
setting. In many cases, these differences appear
to make the task of learning meaning harder, sug-
gesting that our negative claim in a simpler setting
(Theorem 2) may still hold as an impossibility
result. We now discuss some points of difference
between our formal model and natural language.
Truth Conditions There are connections between our framework and the concepts of truth values and truth conditions in linguistic semantics. For a Boolean-valued expression e, a truth value corresponds to computing ⟦e | κ⟧L in a fixed context. On the other hand, truth conditions correspond roughly to a function computing ⟦e | κ⟧L for any κ. A crucial difference, though, is that these conditions cannot be intensional (Von Fintel and Heim, 2011), that is, they are not functions of the world state, but rather of the linguistic context only. In this sense, emulation corresponds to recovering the ability to resolve non-intensional truth conditions of sentences. This model is natural for formalizing a closed programming language environment, for example, with no environment variables or user input, since in this case the program state is specified completely by the linguistic context. On the other hand, English has common elements like that whose meaning can change depending on world state external to language. Perhaps allowing such elements would only make understanding more difficult; or, arguably, generally impossible, since there is no way for the model to observe the grounding world state using only an assertion oracle. We are inclined to
believe that, since such changes would make un-
derstanding more difficult, Theorem 2 would still
hold as an impossibility result. However, future
work would be needed to make this idea precise.
Possible Worlds In the last paragraph, we discussed how mutable world state is an additional complexity of natural language compared to our setup. Similarly, speakers of natural languages have imperfect information about the world around them, which can be captured by modeling
the referent of an expression over a set of possible
worlds, rather than within a specific evaluation
context. In Appendix A, we explore to what de-
gree this setting makes the task of learning to
understand more difficult. In adapting our model
to this context, the assertion oracle must become
‘‘modal’’ in the sense that it quantifies over sets
of worlds. We explore two different models of
modality for the oracle, corresponding to different
physical interpretations. In one case, Theorem 1
and Theorem 2 apply analogously, while, in the
other, emulation becomes an ill-defined problem.
Denotation vs. Intent Bender and Koller (2020)
distinguish between standing meaning and com-
municative intent, reflecting a distinction between
denotational semantics and other pragmatic inten-
tions that a speaker has in producing an utterance.
In this paper, it is most straightforward to take ⟦e | κ⟧L to reflect standing meaning. In principle,
we could imagine that it represents the speaker’s
communicative intent, and that an omniscient ora-
cle ℵL can reveal information about the speaker’s
intents to the system. Even with this unrealistically
powerful oracle, Theorem 2 says that the system
cannot emulate the speaker’s intents.
Competence vs. Performance Chomsky (1965)
differentiates competence and performance in
linguistic theory, where competence corresponds
roughly to the correct algorithmic modeling of
a linguistic process, and performance describes
its implementation subject to resource constraints
like memory. Arguably, agents might be said to
understand language if they are competent in this
sense, even if they sometimes make performance
errors. In contrast, our definition of emulation
(Definition 3) permits no performance errors. In
future work, it would be interesting to adapt an
approximate notion of emulation that tolerates
performance errors in order to more closely target
understanding in a sense reflecting competence.
Other Relations Theorem 1 and Theorem 2
investigate whether ℵL can be used to emulate
meaning representations that preserve an equiva-
lence relation. While equivalence is an important
part of semantics, other semantic relations like
entailment are also necessary for language under-
standing. In Appendix B, we show that a generalization of Theorem 1 (Theorem 5) extends to any semantic relation. In
other words, referential transparency also enables
emulation of relations besides =.
Other Oracles We believe assertions are a fairly
general model of the types of semantics encoded
in unsupervised learning resulting from a prag-
matic bias for truth; however, it is possible other
information is also represented, resulting from
other pragmatic biases governing language usage
and dataset creation. This additional information
could be formalized as access to additional ora-
cles. It would be exciting to formalize the power
of multimodal setups by analyzing the interactions
of oracles enabled by different input modalities.
7 Stepping Back
In this work, we formalized an argument that was
raised by Bender and Koller (2020) as a thought
experiment. Bender and Koller (2020) question
whether unsupervised training objectives are the
right goal to target for achieving natural language
understanding. If meaning is defined as identify-
ing which object in the real world, or which set of
situations, a linguistic element refers to, then, in a
direct sense, an ungrounded system cannot under-
stand meaning. But Bender and Koller (2020) go
farther than this, claiming that an ungrounded sys-
tem cannot even emulate understanding because it
is not clear how a system should learn to interpret
strings, even if it can model their distribution. We
formalize this idea of emulation as ℵ-emulation.
One counterargument mentioned by Bender and
Koller (2020) is that indirect forms of grounding
do exist in programming and natural language,
which we formalize as assertions. The syntac-
tic distributions of statements like assert allow
us to indirectly observe semantic relations over
the denotations. Assertions are one way that the
distribution of strings in a corpus is not blind
to their semantics. By studying them, we study
whether this indirect grounding enables a compu-
tational system to emulate the underlying semantic
relations.
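The role of assertions can be made concrete with a toy example. The sketch below is our own illustration, not code from the paper's figures: it treats Python's `assert` statement as the assertion oracle ℵL, so that which assert statements a truthful author would write indirectly reveals semantic equivalence between expressions.

```python
# A minimal sketch (our own, under simplifying assumptions): the toy
# language is Python expressions, and `context` is a (left, right)
# pair of strings surrounding the assertion, as in the paper's setup.

def assertion_oracle(e1: str, e2: str, context=("", "")) -> int:
    """Return 1 iff `assert e1 == e2` would succeed in this context."""
    left, right = context
    program = f"{left}assert ({e1}) == ({e2}){right}"
    try:
        exec(program, {})  # a truthful corpus only contains passing asserts
        return 1
    except AssertionError:
        return 0
```

A truthful corpus contains `assert 2 + 2 == 4` but never `assert 2 + 2 == 5`, so a learner that models which strings occur is, indirectly, observing the oracle; a preceding context such as `"x = 3\n"` similarly determines the value of `x`.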
Key Takeaways While assertions allow a sys-
tem to emulate semantic relations in simple cases
where the semantics are referentially transparent,
we find that linguistic constructs like variable
binding bring this task into conflict with the fundamental laws of computability. In other words,
under our formal model of meaning and emula-
tion, it is not just intractable for an ungrounded
system to emulate understanding of a formal lan-
guage, but, in some cases, impossible. We provide
constructive examples where understanding must
necessarily break down. We present these results
in a well-defined framework building off for-
mal approaches in logic, linguistics, and computer
science. While we do not prove anything about
natural languages, we do show that ungrounded
models must fail to emulate equivalence in a very
simple setting. A similar result likely extends to
natural language understanding as well, which
among other things, requires modeling referen-
tial identity (e.g., for sentences like Manny is the
cat). Further, we believe much of our framework
can be readily adopted in other works formalizing
understanding in Turing-complete systems.
Open Questions
In this work, we have focused
on utterances, by default, as opposed to dialogues.
An exciting extension would be to formalize a
dialogue between two speakers, interrupted by the
‘‘octopus’’ of Bender and Koller (2020).11 Ex-
isting theories of discourse could potentially be
synthesized with this framework. What linguistic
properties besides referential transparency relate
to emulatability? Can this framework be extended
to formalize multimodal setups, where multiple
oracles from different domains can potentially be
combined to gain additional power? Finally, is
there a natural way to relax our standard of emu-
lation towards a probabilistic definition, and how
would this change the results?
Acknowledgments
We thank Mark-Jan Nederhof for his excellent
suggestions. We also thank Dana Angluin, Matt
Gardner, Eran Yahav, Zachary Tatlock, Kyle
Richardson, Ruiqi Zhong, Samuel Bowman,
Christopher Potts, Thomas Icard, and Zhaofeng
11The octopus thought experiment imagines a deep-sea octopus O that observes a dialogue between two humans by
intercepting an underwater cable. Could O learn to emulate
the role of one of the speakers without exposure to life on
land?
Wu for their feedback on various versions of this
work. Further thanks to our anonymous reviewers
and researchers at the Allen Institute for AI and
UW NLP. Finally, we appreciate the lively online
discussion of the paper, which informed updates
to the camera-ready version.
References
Yossi Adi, Einat Kermany, Yonatan Belinkov,
Ofer Lavi, and Yoav Goldberg. 2017. Fine-
grained analysis of sentence embeddings using
auxiliary prediction tasks. In 5th International
Conference on Learning Representations, ICLR
2017, Toulon, France, April 24-26, 2017, Con-
ference Track Proceedings. OpenReview.net.
Dana Angluin. 1987. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106. https://doi.org/10.1016/0890-5401(87)90052-6
Yonatan Belinkov and James Glass. 2019. Anal-
ysis methods in neural language processing:
A survey. Transactions of the Association for
Computational Linguistics, 7:49–72. https://
doi.org/10.1162/tacl_a_00254
Emily M. Bender and Alexander Koller. 2020.
Climbing towards NLU: On meaning, form,
and understanding in the age of data. In Pro-
ceedings of
the 58th Annual Meeting of
the Association for Computational Linguistics,
pages 5185–5198, Online. Association for
Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.463
Tom B. Brown, Benjamin Mann, Nick Ryder,
Melanie Subbiah,
Jared Kaplan, Prafulla
Dhariwal, Arvind Neelakantan, Pranav Shyam,
Girish Sastry, Amanda Askell, Sandhini
Agarwal, Ariel Herbert-Voss, Gretchen
Krueger, Tom Henighan, Rewon Child, Aditya
Ramesh, Daniel M. Ziegler,
Jeffrey Wu,
Clemens Winter, Christopher Hesse, Mark
Chen, Eric Sigler, Mateusz Litwin, Scott Gray,
Benjamin Chess,
Jack Clark, Christopher
Berner, Sam McCandlish, Alec Radford, Ilya
Sutskever, and Dario Amodei. 2020. Language
models are few-shot learners. arXiv:2005.14165.
Noam Chomsky. 1965. Aspects of the Theory of
Syntax, volume 11. MIT Press.
Alexander Clark. 2010. Three learnable models for the description of language. In Language and Automata Theory and Applications, pages 16–31, Berlin, Heidelberg. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_2
Alexis Conneau, German Kruszewski, Guillaume
Lample, Lo¨ıc Barrault, and Marco Baroni.
2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136, Melbourne, Australia. Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1198
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Herbert P. Grice. 1975. Logic and conversation. In Speech Acts, pages 41–58. Brill. https://doi.org/10.1163/9789004368811_003
Zellig S. Harris. 1954. Distributional structure.
WORD, 10(2–3):146–162.
Irene Heim and Angelika Kratzer. 1998. Semantics
in Generative Grammar. Blackwell.
John Hewitt and Percy Liang. 2019. Designing and
interpreting probes with control tasks. In Pro-
ceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and
the 9th International Joint Conference on Nat-
ural Language Processing (EMNLP-IJCNLP),
pages 2733–2743, Hong Kong, China. Associa-
tion for Computational Linguistics. https://
doi.org/10.18653/v1/D19-1275
Dieuwke Hupkes and Willem Zuidema. 2018.
Visualisation and ’diagnostic classifiers’ reveal
how recurrent and recursive neural networks
process hierarchical structure (extended ab-
stract). In Proceedings of the Twenty-Seventh
International Joint Conference on Artificial
Intelligence, IJCAI-18, pages 5617–5621. In-
ternational Joint Conferences on Artificial
Intelligence Organization. https://doi.org/10.24963/ijcai.2018/796
Hector Levesque. 2014. On our best behaviour.
Artificial Intelligence, 212. https://doi.org/10.1016/j.artint.2014.03.007
Julian Michael. 2020. To dissect an octopus:
Making sense of the form/meaning debate.
Christopher Potts. 2020. Is it possible for language models to achieve understanding?
Ian Pratt-Hartmann and Allan Third. 2006. More fragments of language. Notre Dame Journal of Formal Logic, 47(2):151–177. https://doi.org/10.1305/ndjfl/1153858644
Colin Raffel, Noam Shazeer, Adam Roberts,
Katherine Lee, Sharan Narang, Michael
Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv:1910.10683.
Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in BERTology: What we know about how BERT works. arXiv:2002.12327. https://doi.org/10.1162/tacl_a_00349
Ian Tenney, Dipanjan Das, and Ellie Pavlick.
2019. BERT rediscovers the classical NLP
pipeline. In Proceedings of the 57th Annual
Meeting of the Association for Computational
Linguistics, pages 4593–4601, Florence, Italy.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/P19-1452
Laurence R. Horn and Heinrich Wansing. 2020. Negation. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy, spring 2020 edition. Metaphysics Research Lab, Stanford University.
Alan M. Turing. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1):230–265.
Alan M. Turing. 1950. Computing machinery
and intelligence. Mind, LIX(236):433–460.
https://doi.org/10.1093/mind/LIX.236.433
Kai Von Fintel and Irene Heim. 2011. Intensional
semantics. Unpublished Lecture Notes.
Alex Wang, Amanpreet Singh, Julian Michael,
Felix Hill, Omer Levy, and Samuel Bowman.
2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5446
Alfred North Whitehead and Bertrand Russell.
1925–1927. Principia Mathematica, Cam-
bridge University Press.
Yoad Winter. 2016. Elements of Formal Se-
mantics: An Introduction to the Mathematical
Theory of Meaning in Natural Language.
Edinburgh University Press.
Dani Yogatama, Cyprien de Masson d’Autume,
Jerome Connor, Tomas Kocisky, Mike
Chrzanowski, Lingpeng Kong, Angeliki
Lazaridou, Wang Ling, Lei Yu, Chris Dyer,
and Phil Blunsom. 2019. Learning and evaluating general linguistic intelligence. arXiv:1901.11373.
Hongming Zhang, Xinran Zhao, and Yangqiu
Song. 2020. WinoWhy: A deep diagnosis of es-
sential commonsense knowledge for answering
Winograd schema challenge. In Proceedings of
the 58th Annual Meeting of the Association for
Computational Linguistics, pages 5736–5745,
Online. Association for Computational Linguis-
tics. https://doi.org/10.18653/v1/2020
.acl-main.508
A Multiple Worlds
Programs execute in well-defined environments
with a clear state. Speakers of natural language,
on the other hand, have imperfect information
and beliefs about the world around them. Thus, it
can be more natural to model grounding context
for language as a set of possible worlds, rather
than a single world state. We formalize this in
two different ways (with two different physical
interpretations) and explore how it affects our
results.
Let W be a set of all possible worlds. We
redefine denotations to be intensionalized (Von Fintel and Heim, 2011), that is, we write ⟦e | κ⟧L^w as the denotation of e in the context κ, evaluated in world w ∈ W. Assume for simplicity that
Y = {0, 1, ∅}. We will now introduce modal denotations and assertions using a generic modal quantifier ⊡, which reduces a sequence of worlds to a boolean value according to some intensional predicate. This quantifier controls how multiple possible worlds are collapsed to form denotations and query outputs.
Definition 5 (Modal denotation) Let ⊡ be a modal quantifier. For all e ∈ Σ*, κ ∈ (Σ*)², define

⊡⟦e | κ⟧L = ⊡_{w∈W} ⟦e | κ⟧L^w.
We will write the previously defined assertion oracle to apply in a specific world w, namely, ℵL^w. We also extend it to quantify over multiple worlds:
Definition 6 (Modal assertion) Let ⊡ be a modal quantifier. For all e ∈ Σ*, κ ∈ (Σ*)², define

⊡ℵL(e, e′ | κ) = ⊡_{w∈W} ℵL^w(e, e′ | κ).
Specifically, we consider ⊡ ∈ {□, ♦}, corresponding to universal and existential quantifiers over worlds. Thus, □ can be thought of as ∀ over worlds, and ♦ can be thought of as ∃. For either quantifier, if any ⟦e | κ⟧L^w = ∅, we define ⊡⟦e | κ⟧L = ∅ as well. Each quantifier will have
a different physical interpretation. With universal
quantification, we will find that results analogous
to Theorem 1 and Theorem 2 hold. With existen-
tial quantification, it turns out that the equivalence
class of μ is underspecified. In other words, not
only is it impossible to compute an emulator with
a finite number of assertion queries, but, even with
infinite assertions, there is no consistent way to
emulate the underlying modal semantics.
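Before treating the two quantifiers separately, the modal oracles can be sketched concretely. The following model is ours, not the paper's: a ‘‘world’’ is a variable environment (a dict), expressions are Python expressions, and Definition 6 is instantiated with □ as `all` and ♦ as `any`.

```python
# Illustrative sketch (our own assumptions): a world is an environment
# of variable bindings; the oracle checks equality of denotations.

def world_oracle(e1: str, e2: str, world: dict) -> bool:
    """The non-modal assertion oracle evaluated in a single world w."""
    return eval(e1, {}, dict(world)) == eval(e2, {}, dict(world))

def box_oracle(e1: str, e2: str, worlds) -> bool:
    """Universal modal assertion: an author asserts equality only if
    it holds in every world they believe possible."""
    return all(world_oracle(e1, e2, w) for w in worlds)

def diamond_oracle(e1: str, e2: str, worlds) -> bool:
    """Existential modal assertion: some author's world supports it."""
    return any(world_oracle(e1, e2, w) for w in worlds)
```

For `worlds = [{"x": 0}, {"x": 1}]`, `box_oracle("x", "x*x", worlds)` holds, while `box_oracle("x", "0", worlds)` fails even though `diamond_oracle("x", "0", worlds)` holds, previewing how the two quantifiers diverge.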
A.1 Universal Quantification
In the first case we let ⊡ = □. Two expressions
are viewed as having the same meaning if they
are equivalent in every possible belief world. This
is interpretable as observing text L(cid:2) written by a
single author whose belief state is represented by
multiple possible worlds. The author only asserts
a statement is true if it is consistent across all
worlds that they believe are possible.
In this setting, we will show that the modal
assertion oracle uniquely specifies a modal de-
notation for each expression, up to isomorphism.
In other words, as with the non-modal assertion
oracle, each assertion query would let us decide
some relation between two expressions. Thus, the
same results for the non-modal setting discussed
in the main body of the paper will also hold here.
Theorem 3 Consider e, e′ ∈ Σ* and any context κ ∈ (Σ*)² such that □⟦e | κ⟧L ≠ ∅ and □⟦e′ | κ⟧L ≠ ∅. Then,

□⟦e | κ⟧L = □⟦e′ | κ⟧L ⟺ □ℵL(e, e′ | κ).
Proof.

□⟦e | κ⟧L = □⟦e′ | κ⟧L
⟺ ⋀_{w∈W} ⟦e | κ⟧L^w = ⋀_{w∈W} ⟦e′ | κ⟧L^w
⟺ ⋀_{w∈W} (⟦e | κ⟧L^w = ⟦e′ | κ⟧L^w)
⟺ ⋀_{w∈W} ℵL^w(e, e′ | κ)
⟺ □ℵL(e, e′ | κ). ∎
Crucial to this simple proof is the fact that ∧ is distributive over =. This is specific to the quantifier being □. Theorem 3 implies that □⟦e | κ⟧L can be recovered from modal assertion queries analogously to the non-modal case. Thus, results analogous to Theorem 1 and Theorem 2 apply for emulating □⟦e | κ⟧L using queries to □ℵL.
A.2 Existential Quantification
In the second case we let ⊡ = ♦. Two expressions
are viewed as having the same meaning if they
are equivalent in some world. This is interpretable
as observing a large dataset of text L♦ generated
by many authors, each with a different single
belief world w. In the corpus, we imagine two
expressions can be asserted to be equivalent in
some context if any of the authors would consider
them to be equal in that context.
      e1   e2   ℵ          e1   e2   ℵ
w1    0    0    1          0    0    1
w2    0    0    1          0    1    0
♦     0    0    1          0    1    1

Table 1: Two tables (separated by a thick line) representing two different versions of W. Within each table, each cell i, j in the main 2-by-2 grid contains the boolean value ⟦ej | κ⟧L^wi. The column to the right contains ℵL^wi(e1, e2 | κ). The bottom row aggregates each column by quantifying ♦.
In this case, assertions do not even fully spec-
ify equivalence between the modal denotations.
This is a stronger sense in which meaning cannot
be emulated from assertion queries. Emulation
is not just impossible with finite assertions, but
mathematically underspecified.
Theorem 4 There exist e, e′ ∈ E(L) and κ ∈ (Σ*)² such that ♦⟦e | κ⟧L ≠ ∅ and ♦⟦e′ | κ⟧L ≠ ∅, and also ♦ℵL(e, e′ | κ) = 1 is consistent with either ♦⟦e | κ⟧L = ♦⟦e′ | κ⟧L or ♦⟦e | κ⟧L ≠ ♦⟦e′ | κ⟧L.
Proof. We construct an example with expressions e1, e2 in a single context κ. Fix W = {w1, w2}. Table 1 shows two versions of this modal setup. In both versions of the universe, ♦ℵL(e1, e2 | κ) = 1. However, on the left, ♦⟦e1 | κ⟧L = ♦⟦e2 | κ⟧L, while, on the right, the opposite holds. So, with ♦, modal assertions do not uniquely determine equivalence of modal denotations. ∎
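The counterexample in Table 1 can also be checked mechanically. The snippet below is our own verification sketch: each universe maps worlds to the boolean denotations of e1 and e2, and both universes produce the same ♦-assertion signal while disagreeing on equality of the ♦-denotations.

```python
# Two universes mirroring Table 1: world -> (denotation of e1, e2).
universe_left = {"w1": (0, 0), "w2": (0, 0)}
universe_right = {"w1": (0, 0), "w2": (0, 1)}

def diamond_denotations(universe):
    """♦-denotations of e1 and e2: existential quantification over worlds."""
    d1 = int(any(v1 for v1, _ in universe.values()))
    d2 = int(any(v2 for _, v2 in universe.values()))
    return d1, d2

def diamond_assertion(universe):
    """♦ℵ: some world's author would assert that e1 equals e2."""
    return int(any(v1 == v2 for v1, v2 in universe.values()))
```

Both universes give `diamond_assertion(...) == 1`, but `diamond_denotations` returns `(0, 0)` on the left and `(0, 1)` on the right, so the oracle's answer is consistent with both equality and inequality.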
As an equivalence class for μ is not even well-
defined by ♦ℵL, we cannot hope to compute it
from queries. This is an even stronger sense in
which emulation is impossible using assertions.
On some level, this may be a natural model for
language modeling corpora, which aggregate text
from potentially inconsistent sources.
In summary, if assertions uniquely determine
equivalence between denotations in a strongly
transparent language, then we can expect to emu-
late representations preserving equivalence using
assertions. Otherwise, there are various levels of
formal challenges to emulating equivalence.
B Other Semantic Relations
Sections 4, 5, and A investigate whether ℵL can
be used to emulate meaning representations that
Figure 5: A concrete implementation of all_strings, which is referenced in Figure 2 and Figure 6.

Figure 6: emulate computes a structured representation of the input string expr that preserves any semantic relation ◦ in terms of assertion queries. The iterable all_strings is defined in Figure 5.
preserve semantic equivalence. While equivalence
is an important part of semantics, other semantic
relations are also necessary for language un-
derstanding. For example, the following feature
prominently in theories of linguistic semantics:
• Entailment In general terms, an entailment (Winter, 2016) relation → is a partial order over Y. Intuitively, if y → y′, then y is a ‘‘special case’’ of y′. For example, one could construct E, a semantic analysis of English, where ⟦fat cat | a, sits⟧E → ⟦cat | a, sits⟧E.

• Contrary negation Negation is a complex topic in semantics. One sense of negation is if two meaning representations are ‘‘contrary’’ (Horn and Wansing, 2020), meaning both cannot be true at the same time.

Does Theorem 2 generalize to other relations besides =? To answer this, we first extend assertions and emulation to apply to a generic relation ◦ : M² → {0, 1}. The proof for Theorem 1 does not fully translate to this new setting, but we will show via a new argument that emulation is still possible.

Definition 7 For e, e′ ∈ Σ* and κ ∈ (Σ*)², define the assertion oracle

ℵL,◦(e, e′ | κ) = 1 if ⟦e | κ⟧L ◦ ⟦e′ | κ⟧L, and 0 otherwise.
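To make the relation-specific oracle of Definition 7 concrete, here is a small sketch of ours instantiating ◦ with boolean entailment (for booleans, y → y′ is material implication); the function names are our own, not the paper's.

```python
def entails(y: bool, y_prime: bool) -> bool:
    """Boolean entailment: y is a 'special case' of y_prime."""
    return (not y) or y_prime

def relation_oracle(e1: str, e2: str, rel=entails) -> int:
    """A sketch of Definition 7's oracle in the empty context, with
    boolean Python expressions standing in for denotations."""
    return int(rel(eval(e1), eval(e2)))
```

Mirroring the ⟦fat cat⟧ → ⟦cat⟧ example, a conjunction entails its conjunct: `relation_oracle("True and False", "False")` returns 1, while `relation_oracle("True", "False")` returns 0.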
Definition 8 A class of languages L over Σ is ℵ-emulatable w.r.t. ◦ if there exists an oracle Turing machine μ and standard Turing machine δ such that, for all L ∈ L, κ ∈ (Σ*)², and e, e′ ∈ suppL(κ),

⟦e | κ⟧L ◦ ⟦e′ | κ⟧L ⟺ δ(μL(e), μL(e′) | κ).
We now are ready to prove the extended form
of Theorem 1. The main idea of the proof will be
to memoize the value of the relation ◦ between ⟦e | κ⟧L and the values of all expressions smaller than e. This guarantees that δ will be able to ‘‘look
up’’ the correct output.
Theorem 5 TRANSPARENT is ℵ-emulatable w.r.t. ◦.
Proof. Similarly to Theorem 1, we present the
proof constructively as a Python program to
compute μ. We then show how to define δ
appropriately, completing the proof.
Figure 6 shows the algorithm to compute
μL(e) ∈ M. In Python, μL(e) is a dictionary; we interpret it as a function μL(e) : Σ* × Σ* → {0, 1, ∅}, where ∅ represents values that are not set. We define δ as follows:

δ(m, m′ | κ) ⟺ m(e, e′) if m(e, e′) ≠ ∅, and m′(e, e′) otherwise.

Crucially, it must be that μL(e)(e, e′) ≠ ∅ or μL(e′)(e, e′) ≠ ∅. In Figure 6, cand either reaches e before e′, or e′ before e. By symmetry, assume it reaches e before e′. Then μL(e′)(e, e′) ≠ ∅, so

δ(μL(e), μL(e′) | κ) ⟺ μL(e′)(e, e′) = 1
⟺ ℵL,◦(e, e′ | λ²) = 1
⟺ ⟦e | λ²⟧ ◦ ⟦e′ | λ²⟧
⟺ ⟦e | κ⟧ ◦ ⟦e′ | κ⟧.

Therefore emulate satisfies Definition 3. ∎

We needed to change the proof of Theorem 5 compared to Theorem 1 because ◦ is not an equivalence relation. In Theorem 1, the final steps relied on reflexivity, transitivity, and symmetry: the three properties that constitute equivalence. The
new proof enlarges the size of the emulated rep-
resentations. Rather than representing each e with
a number, μL(e) becomes a large dictionary of
strings. This represents an increase in space com-
plexity from linear to exponential in the size of e.
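The construction can be sketched end to end. The code below is our paraphrase of Figures 5 and 6, not the paper's verbatim code, under two simplifications: representations carry their own string (as the dictionary keys implicitly do in Figure 6), and all queries are made in the empty context λ².

```python
# all_strings enumerates Sigma* in order of nondecreasing length, so
# any expression is reached after finitely many steps. emulate(expr)
# memoizes how expr relates (via the oracle) to every string enumerated
# up to expr; delta then looks the answer up in one of the two
# representations, as in the proof of Theorem 5.
from itertools import count, product

def all_strings(alphabet):
    for n in count():  # n = 0, 1, 2, ...
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def emulate(expr, oracle, candidates):
    rep = {"expr": expr, "memo": {}}
    for cand in candidates:
        rep["memo"][(cand, expr)] = oracle(cand, expr)
        rep["memo"][(expr, cand)] = oracle(expr, cand)
        if cand == expr:  # stop once the enumeration reaches expr
            return rep

def delta(m1, m2):
    # whichever expression was enumerated later memoized the pair
    key = (m1["expr"], m2["expr"])
    memo = m1["memo"] if key in m1["memo"] else m2["memo"]
    return memo[key]
```

With an arithmetic oracle such as `lambda a, b: int(eval(a) == eval(b))` and candidates `["1", "2", "1+1"]`, `delta(emulate("1+1", ...), emulate("2", ...))` recovers 1. Each representation stores a table over all smaller strings, which is exactly the source of the exponential space blow-up noted above.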
C Old Emulation Definition

A previous version of this paper defined emulation slightly differently. We discuss the differences and explain the advantages of the new definition. First, we defined a general denotation as

⟦e⟧L = {⟨κ, ⟦e | κ⟧L⟩ | κ ∈ (Σ*)²}.

The general meaning represents the meaning of a word across all contexts. Now, say that two functions f, g are isomorphic (with respect to =) over a set X iff, for all x, x′ ∈ X,

f(x) = f(x′) ⟺ g(x) = g(x′).

We will write f ≅ g in this case. We will refer to a set of contexts S ⊆ (Σ*)² as a syntactic role. Each syntactic role has a set of expressions supp⁻¹L(S) whose support is that role:

suppL(e) = {κ ∈ (Σ*)² | ⟦e | κ⟧L ≠ ∅}
supp⁻¹L(S) = {e ∈ Σ* | suppL(e) = S}.

We can now give the old definition of emulation:

Definition 9 (Old ℵ-emulation) μ : Σ* → M emulates ⟦·⟧L w.r.t. = iff:

1. μ ≅ ⟦·⟧L over supp⁻¹L(S), for all S ⊆ (Σ*)²

2. There exists a Turing machine that computes whether m = m′ for each m, m′ ∈ M

3. There exists a Turing machine with oracle access to ℵL that computes μ

For a set of languages L, this is equivalent to saying L is ℵ-emulatable iff, for all L ∈ L, there exists an oracle Turing machine μ and normal Turing machine δ such that, for all S ⊆ (Σ*)² and e, e′ ∈ supp⁻¹L(S),

⟦e⟧L = ⟦e′⟧L ⟺ δ(μL(e), μL(e′)).

This more closely resembles Definition 3, but we will make two slight changes. First, we will change the quantifier order, such that a single μ must work for every L ∈ L. Then, we will grant δ access to a context κ, and rephrase the equation to hold over all κ ∈ (Σ*)² and e, e′ ∈ suppL(κ):

⟦e | κ⟧L = ⟦e′ | κ⟧L ⟺ δ(μL(e), μL(e′) | κ).

This recovers Definition 3. This version more faithfully reflects the intuitive notion of emulation. The old version required μL(e) to determine how e should evaluate in every possible context. Emulation would not be possible in some cases even with perfect knowledge of L. Now, it must just be possible in any context κ to compute ⟦e | κ⟧L from κ and μL(e), which is a weaker standard. Under the new definition, it is always possible to emulate a class of languages with one element, assuming ⟦e | κ⟧L is computable. An additional improvement is that emulation now applies to all expressions that share a context, whereas before it only targeted expressions with the same support.