Provable Limitations of Acquiring Meaning from Ungrounded Form:
What Will Future Language Models Understand?
William Merrill∗ Yoav Goldberg∗ † Roy Schwartz‡ Noah A. Smith∗§
∗Allen Institute for AI, United States
†Bar Ilan University, Israel
‡Hebrew University of Jerusalem, Israel
§University of Washington, United States
{willm,yoavg,roys,noah}@allenai.org
Abstract
Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the question of whether, in principle, a system can ever ‘‘understand’’ raw text without access to some form of grounding. We formally investigate the abilities of ungrounded systems to acquire meaning. Our analysis focuses on the role of ‘‘assertions’’: textual contexts that provide indirect clues about the underlying semantics. We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence. We find that assertions enable semantic emulation of languages that satisfy a strong notion of semantic transparency. However, for classes of languages where the same expression can take different values in different contexts, we show that emulation can become uncomputable. Finally, we discuss differences between our formal model and natural language, exploring how our results generalize to a modal setting and other semantic relations. Together, our results suggest that assertions in code or language do not provide sufficient signal to fully emulate semantic representations. We formalize ways in which ungrounded language models appear to be fundamentally limited in their ability to ‘‘understand’’.
1 Introduction
Recently, language models trained on huge datasets of raw text have pushed the limits of natural language processing (Devlin et al., 2019; Raffel et al., 2019; Brown et al., 2020, among others). Such systems transcend the expert system paradigm, where rules about language and meaning are hardcoded into a system, as well as the supervised learning paradigm, where a notion of meaning is provided through ground-truth labels. Rather, analysis of massive language models has revealed that, to some degree, knowledge of syntactic and semantic dependencies can emerge without explicit supervision (Rogers et al., 2020; Tenney et al., 2019). This knowledge can then be transferred to a variety of downstream NLP tasks.
Yet, today’s NLP systems built on large language models still fall short of human-level general understanding (Yogatama et al., 2019; Zhang et al., 2020). Brown et al. (2020) discuss the limitations of their GPT-3 language model compared with humans, suggesting that:

Scaling up any LM-like model . . . may eventually run into (or could already be running into) the limits of the pretraining objective.
This possibility raises an interesting theoretical question. What are the fundamental limits of learning meaning from language modeling, even assuming a perfect learner with access to unlimited data? Recently, Bender and Koller (2020) argued that achieving true natural language understanding from text alone is impossible, and that, to really get at meaning, some type of semantic grounding is necessary.1 Their style of argumentation largely focused on developing thought experiments, rather than making formal arguments.
One thought experiment featuring prominently in Bender and Koller (2020) was the task of learning to understand a programming language’s semantics from raw code. Here, understanding was defined as fully emulating a compiler. This setup has clear parallels to learning to understand natural language, although the more well-defined nature of programming languages makes them easier to reason about. Bender and Koller (2020) argue that emulation is difficult in this setting, and perhaps impossible, because the source code alone contains no information about how it should be interpreted to create outputs. One counterpoint

1See Michael (2020) for a summary of the informal discussion around Bender and Koller (2020), much of which took place on social media.
Transactions of the Association for Computational Linguistics, vol. 9, pp. 1047–1060, 2021. https://doi.org/10.1162/tacl_a_00412
Action Editor: Mark-Jan Nederhof. Submission batch: 9/2020; Revision batch: 4/2021; Published 9/2021.
© 2021 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
raised by the paper, as well as others (Michael, 2020; Potts, 2020), is the existence of unit tests, with assertions encoding examples of input/output pairs for blocks of code.2 For example, systematically observing blocks like x = 3; assert x == 3 could let a system bootstrap the semantics of variable assignment, because a programmer is likely to write assertions that will pass. These assertions constitute a form of implicit grounding embedded within language modeling by the pragmatic concerns of programmers, and they could potentially be leveraged to emulate a compiler.3 However, it is not immediately clear if unit tests provide ‘‘enough’’ supervision to do this, even with unlimited data.
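As a concrete illustration of this idea, one could imagine mining such assertion contexts directly from raw code. The sketch below is our own (the helper name and regular expression are illustrative, not from the paper), and only handles simple one-line blocks with whitespace-free operands:

```python
# Hypothetical sketch: mining `assert lhs == rhs` statements from raw
# code, yielding (context, lhs, rhs) triples that approximate the kind
# of equivalence signal discussed in the text.
import re

corpus = [
    "x = 3; assert x == 3",
    "y = 2 + 2; assert y == 4",
]

def mine_assertions(lines):
    """Extract (context, lhs, rhs) triples from one-line assert blocks."""
    triples = []
    for line in lines:
        m = re.search(r"^(.*?)assert (\S+) == (\S+)$", line)
        if m:
            context, lhs, rhs = m.groups()
            triples.append((context.strip(), lhs, rhs))
    return triples

print(mine_assertions(corpus))
```

A learner observing many such triples sees, in effect, noisy samples from the assertion oracle formalized in Section 2.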
Viewing the debate about the power of assertions as central to the larger philosophical question, we aim to clarify it in more formal terms. In this paper, we formally study whether observing a generalized notion of assertions can allow a system to ‘‘understand’’ strings. An assertion is a query about whether two strings evaluate to the same value within a fixed context. This is motivated by the role of assertions in unit tests, where asserting two expressions are equal suggests that they have the same value within the test.
While assertions are directly motivated by the compiler thought experiment, they also have analogs in natural language, where sentences make assertions about the world, and it is reasonable to expect some form of bias towards true statements (Potts, 2020). Indeed, this is one of Grice’s Maxims (Grice, 1975): a set of basic principles proposed to govern the pragmatics of natural language. For example, the truth conditions of This cat is the cat that Mary owns verify that two cats in the world identified in distinct ways are the same entity. In general, we might expect a sentence to appear with higher frequency if its truth conditions hold within its context, similar to an assertion in code, although of course there will also be other factors governing sentence frequency besides this. In this sense, the example sentence resembles the Python statement assert cat1 == cat2, where cat1 and cat2 are two Cat objects. See Section 6 for more discussion of how assertions and other formal concepts translate to natural language. We will generalize assertions to an abstract formal language context, allowing us to study how they can be used to emulate semantic relations.

2Unit tests are blocks of code in a software project that are designed to test whether the core code is behaving correctly.
3Contexts like assertions can be seen as an argument in favor of the distributional hypothesis (Harris, 1954).
Our findings are as follows. If every expression in a language has the same value in every valid context, then the language can be emulated using a finite number of assertion queries (Section 4). However, we construct a class of languages where expressions can take different values in different contexts, and where assertions do not enable emulation, i.e., infinite queries would be required (Section 5). Intuitively, this means that assertions do not provide enough signal for a Turing-complete emulator to fully ‘‘understand’’ languages from this class. We go on to discuss differences between our formal model and the less well-defined context of natural language (Section 6).
These results provide a formal way to characterize upper bounds on whether it is possible to emulate the semantics of a language from distributional properties of strings. Within our framework, in certain settings, we find that meaning cannot be learned from text alone. We strengthen claims made by Bender and Koller (2020) that assertions in code do not necessarily provide sufficient signal for a language model to emulate understanding. We do not make strong claims about how these results transfer to natural language, although we expect that the added complexity of natural language would make it, if anything, more difficult to ‘‘understand’’ than code.4
2 Preliminaries

Let L ⊆ Σ* denote a formal language over alphabet Σ. We will use λ to denote the empty string. Let (Σ*)² denote the Cartesian product of Σ* with itself; that is, the set of all pairs of strings. Resembling Clark (2010), we refer to a tuple ⟨l, r⟩ ∈ (Σ*)² as a syntactic context. We also use other symbols to refer to a context (e.g., κ = ⟨l, r⟩). We denote by λ² the empty context ⟨λ, λ⟩.
2.1 Meaning
We will model formal languages not just as sets of strings, but as having an associated semantics.5

4Appendix C documents and motivates conceptual changes since the original arXiv version of the paper.
5We slightly abuse notation by using L to refer to both a set of strings and a set of strings paired with a denotation function, which could be written more verbosely as ⟨L, ⟦·⟧L⟩.
Specifically, we assume the existence of a denotational semantics over every substring of L, which we now elaborate on. Let Y be a countable set of referents. First, we will say that some e ∈ Σ* is a valid expression within the context κ = ⟨l, r⟩ if there exists some contextual denotation ⟦e | κ⟧L ∈ Y. Intuitively, this represents the value of e when it occurs in the larger context ler ∈ L. We will also use the notation ⟦e | l, r⟧L where convenient. We will reserve ∅ ∈ Y as a special null symbol, defining ⟦e | κ⟧L = ∅ iff e is not a valid expression in the context κ.6

Each context κ ∈ (Σ*)² also has a support, or set of expressions that are valid within it:

suppL(κ) = {e ∈ Σ* | ⟦e | κ⟧L ≠ ∅}.
Example Let L be a language of integers along with the + operator, for example, 2 + 2. Y is simply the integers. We take ⟦e | κ⟧L to map e to its standard arithmetic interpretation, namely, ⟦2 + 6 | λ, + 4⟧L = 8. We take expressions that are not conventionally well-formed to be invalid: for example, ⟦+ | λ, +⟧L = ∅. Finally, let κ = ⟨λ, + 4⟩. Then suppL(κ) = L, since any valid expression can occur within κ.
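For this example language, the contextual denotation function can be sketched in a few lines of Python (our own illustration; denote is a hypothetical helper, with None standing in for the null symbol ∅):

```python
# Sketch of ⟦e | κ⟧ for the integer example language: e is valid in
# context ⟨left, right⟩ iff the full string left·e·right is well-formed
# arithmetic, in which case e's value ignores the context.
def denote(e, left, right):
    """Evaluate expression e in the context ⟨left, right⟩, or None (∅)."""
    full = f"{left}{e}{right}"
    try:
        eval(full)      # check that the whole string ler is well-formed
        return eval(e)  # in this language, e's value is context-independent
    except SyntaxError:
        return None

print(denote("2 + 6", "", " + 4"))  # ⟦2 + 6 | λ, + 4⟧ = 8
print(denote("+", "", " +"))        # ∅: "+ +" is not well-formed
```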
2.2 Strong Transparency

As defined above, we make very few assumptions about denotations. They are not necessarily compositional, and expressions may take different referents in different contexts. However, we saw in the integer expression language that the meaning of an expression did not depend on its context. We now define a property formalizing this idea.

Definition 1 (Strong transparency) L is strongly transparent iff, for all e ∈ Σ*, κ ∈ (Σ*)², either ⟦e | κ⟧L = ⟦e | λ²⟧L ≠ ∅, or ⟦e | κ⟧L = ∅.
Informally, strong transparency says each e has a well-defined denotation that exists independent of context, and that this simple denotation can be ‘‘plugged into’’ any context. Our previous example expression 2 + 6 is strongly transparent because it can be said to have a well-defined value 8 independent of its context. We could break strong transparency by adding bound variables to the language, for example, x = 2; x + 6 in Python. In this case, ⟦x | κ⟧L non-vacuously depends on κ.

Strong transparency resembles referential transparency (Whitehead and Russell, 1925–1927), but is a stronger condition, in that it does not allow the same name to ever refer to different values. For example, for a Python program, strong transparency does not allow assigning local variables within a function, even if the function output would remain completely specified by its inputs.

6Our simple model of denotations does not reflect the full range of semantic theories that have been proposed for natural language. In particular, our denotations ⟦e | κ⟧L depend only on the linguistic context κ rather than any external world state. This differs substantially from how truth conditions are traditionally conceptualized in formal semantics (Heim and Kratzer, 1998). For example, in our framework, the referent of English ⟦the dog | κ⟧L must be fixed with no regard for the extralinguistic context. Section 6 further contrasts our setup with the richer semantics of natural language.
2.3 Assertion Queries

We now define an oracle function providing assertion information about expressions in L, resembling assert e1 == e2 for two Python expressions e1, e2. A system is granted access to this function, and it can make assertion queries to it in order to learn about the semantics of L.7 An assertion query tells us whether two expressions e, e′ are equivalent within the context κ.

Definition 2 (Assertion oracle) For e, e′ ∈ Σ* and κ ∈ (Σ*)², define the assertion oracle

ℵL(e, e′ | κ) =
    1  if ⟦e | κ⟧L = ⟦e′ | κ⟧L
    0  otherwise.
Recall that we defined ⟦e | κ⟧L = ∅ if e is not valid in the context κ. In our example language of integer expressions, for all κ, ℵL(4, 2 + 2 | κ) = 1, since 4 = 2 + 2. The computational power of this oracle depends on the complexity of the underlying semantics: for arbitrary semantics, it can become uncomputable. In this paper, though, we focus on classes of languages for which the denotation function and assertion oracle are computable.

The ℵL oracle is motivated by assertion statements in programming languages, which occur naturally in environments like unit tests. The distribution of strings in a corpus of code should capture some notion of this oracle, since a programmer is more likely to assert two expressions are equal if

7This resembles the role of queries in classical grammar induction works (e.g., Angluin, 1987).
they are expected to have the same value. Our goal
is to study the limits of understanding achievable
from raw text, so we consider an ‘‘upper bound’’
setup by assuming a system has full access to ℵL.
Can the system use this powerful oracle to emulate
the underlying semantics?
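For the running integer example, Definition 2's oracle could be sketched as follows (our own self-contained illustration; denote and assertion_oracle are hypothetical helper names, with None standing in for ∅):

```python
# Sketch of the assertion oracle ℵ_L for the integer example language:
# it answers 1 iff the two contextual denotations agree.
def denote(e, left, right):
    """⟦e | ⟨left, right⟩⟧, or None (∅) if the full string is malformed."""
    full = f"{left}{e}{right}"
    try:
        eval(full)      # the whole string must be well-formed
        return eval(e)  # context-independent value (strong transparency)
    except SyntaxError:
        return None

def assertion_oracle(e1, e2, left, right):
    """ℵ_L(e1, e2 | ⟨left, right⟩)."""
    return int(denote(e1, left, right) == denote(e2, left, right))

print(assertion_oracle("4", "2 + 2", "", ""))  # 1, since 4 = 2 + 2
print(assertion_oracle("4", "2 + 3", "", ""))  # 0
```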
2.4 Turing Machines
Our notion of language understanding will be
based around the idea of emulation, which in turn
requires a model of computational realizability.
We will use Turing machines (Turing, 1936) as a model of universal computation. We write μ(e) for the output of Turing machine μ evaluated on input e ∈ Σ*. We will also define an oracle Turing machine as a standard Turing machine that can compute a blackbox ‘‘oracle’’ function f as a subroutine. We imagine the machine has a special query instruction and tape. After writing x to the query tape and executing the query instruction, the query tape will contain f(x). We will write μf(e) for the Turing machine μ evaluated on input e with oracle access to f. In the case where f = ℵL, we will simply write μL(e). Whereas,
in computability theory, oracle Turing machines
are generally leveraged to make reductions from
uncomputable problems, here we will use them
to formalize the ability of an emulator to make
assertion queries about L. This oracle provides
additional power because these queries contain
additional information beyond that encoded in the
input expression.
3 Research Question: Do Assertions
Enable Emulation?
There is a long history in AI of trying to define
and measure understanding. Turing (1950) consti-
tutes an early behaviorist perspective; more recent
approaches tend to emphasize not just an external
view of a system’s behavior, but also ‘‘how it is
achieved’’ (Levesque, 2014). Understanding can
be behaviorally diagnosed in neural models by
evaluating them on benchmarks (Wang et al., 2018). An alternate approach is probing (Adi et al., 2017; Conneau et al., 2018; Hupkes and Zuidema, 2018; Hewitt and Liang, 2019; Belinkov and Glass, 2019), which investigates how directly a model’s representations encode semantic relations by measuring if they can be easily decoded from them. Similarly, we take the position that systems are capable of understanding if they emulate representations that are isomorphic to underlying meaning under important semantic relations like equivalence. We will formalize this in Question 1, which asks whether such emulation is possible using assertions.

Figure 1: An illustration of Definition 3. μ emulates a representation of each expression using assertion queries. Then, δ compares the emulated representations to determine equivalence.
Definition 3 (ℵ-emulation) A class of languages L over Σ is ℵ-emulatable if there exists an oracle Turing machine μ and standard Turing machine δ such that, for all L ∈ L, κ ∈ (Σ*)², and e, e′ ∈ suppL(κ),

⟦e | κ⟧L = ⟦e′ | κ⟧L ⇐⇒ δ(μL(e), μL(e′) | κ).
μ can be thought of as an emulator that evaluates expressions, whereas δ receives two values and decides whether they are equal. Crucially, only μ has direct access to ℵL. δ can only use information from the oracle to the extent that it is encoded in the representations μL(e) and μL(e′).

Definition 3 formulates emulation as a decision problem, as is typical in theoretical computer science. Equivalently, δ can be replaced by a computable function ρ such that ρ(μL(e) | κ) evaluates μL(e) in context κ, that is, its output string is isomorphic to ⟦e | κ⟧L under =. The functions δ and ρ are Turing-reducible to each other, implying that if one definition is satisfied, so is the other.
Figure 2: emulate implements an emulator μ. Let all_strings be an iterable enumerating all strings in Σ*. We provide a concrete implementation of all_strings in Figure 5.
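The listing in Figure 2 is not reproduced in this text-only version; the following sketch is our reconstruction, consistent with the proof of Theorem 1, using a toy binary alphabet and a toy parity semantics of our own choosing for the demonstration:

```python
# Reconstruction (ours) of Figure 2's emulator: return the index of the
# first enumerated string that the oracle deems equivalent to expr in
# the empty context λ².
from itertools import count, product

SIGMA = ["0", "1"]  # toy alphabet (ours); the construction works for any Σ

def all_strings():
    """Enumerate Σ* in a fixed shortlex order, as in Figure 5."""
    yield ""
    for n in count(1):
        for chars in product(SIGMA, repeat=n):
            yield "".join(chars)

def emulate(expr, asserteq):
    """μ: index of expr's canonical form under the enumeration order."""
    for i, e2 in enumerate(all_strings()):
        if asserteq(expr, e2):  # query ℵ_L(expr, e2 | λ²)
            return i

# Toy strongly transparent semantics for the demo: a string's value is
# the parity of its number of 1s.
asserteq = lambda a, b: a.count("1") % 2 == b.count("1") % 2
print(emulate("1", asserteq), emulate("10", asserteq), emulate("11", asserteq))
# → 2 2 0: equivalent expressions map to the same index
```

Note that δ then only needs to compare the returned indices for equality, exactly as in the proof.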
With our definition of emulation in place, we can formally state the research question:

Question 1 For a class of languages L, is L ℵ-emulatable?

How does Question 1 relate to understanding in large language models? We imagine that, with sufficiently large amounts of data, the frequencies of strings in L carry enough signal such that the language model objective ‘‘supervises’’ access to ℵL. Thus, μL(e) can be thought of as the language model representation of an expression e. We then hope to recover underlying semantic relations from the representations produced by the language model via some function δ. The class L corresponds to a set of hypothesis languages over which the language model must search for the true L. We will see that whether emulation is possible will depend on the properties of L.

Stepping back, Question 1 bears on the role of assertions raised by Bender and Koller (2020). Does observing assertions allow a Turing-complete system to emulate a compiler? In more general terms, are assertions powerful enough implicit grounding to achieve representations that encode the denotational semantics of a language?
4 Strong Transparency

We first consider the case where the language being learned is known to be strongly transparent. Let TRANSPARENT denote the class of strongly transparent languages. We will show that TRANSPARENT is ℵ-emulatable. The core idea of the proof is to construct a canonical form for each expression. The canonical form is the first expression in a lexicographic ordering that the assertion oracle deems equivalent to the target expression. For technical reasons, the emulator returns the index of this string under the lexicographic order.
Theorem 1 TRANSPARENT is ℵ-emulatable.

Proof. As Python is Turing-complete, we write μ : Σ* → N as a Python function emulate in Figure 2. The function receives as input an expression expr and a callback function asserteq to an oracle computing ℵL. For each e ∈ Σ*, there exists e* ∈ Σ* such that ℵL(e, e* | λ²) = 1. In the ‘‘worst case’’, this holds when e* = e by symmetry. By construction, all_strings reaches all strings in finite time. Therefore, the number of loop iterations before reaching e* is finite. We can conclude that emulate halts on every e ∈ Σ*, establishing that it is computable.
Now, we move towards justifying that the emulation is correct for every κ ∈ (Σ*)². We note that δ is simply the indicator function for equality over the natural numbers:

δ(m, m′ | κ) =
    1  if m = m′
    0  otherwise.
The function emulate outputs i ∈ N, the index of the first string e* such that ⟦e | λ²⟧L = ⟦e* | λ²⟧L. Now, let e, e′ ∈ suppL(κ) be different inputs to μ. Because the enumeration order of the for loop is fixed across computation of μL(e) and μL(e′):

μL(e) = μL(e′) ⇐⇒ ⟦e | λ²⟧L = ⟦e* | λ²⟧L ∧ ⟦e′ | λ²⟧L = ⟦e* | λ²⟧L
            ⇐⇒ ⟦e | λ²⟧L = ⟦e′ | λ²⟧L
            ⇐⇒ ⟦e | κ⟧L = ⟦e′ | κ⟧L,

where the last step follows by strong transparency. We conclude that the conditions for emulation (Definition 3) are fully satisfied. □
Through a simple construction, we have shown it is possible to emulate meaning from assertion queries for languages with strongly transparent semantics. The number of bits in the emulated representation μL(e) is linear in the size of e. In the next section, we consider what happens without strong transparency, where, among other complexities, values can be bound to variables, complicating the construction used in Theorem 1.

Figure 3: Templates for strings in Lm, for m ∈ N ∪ {∞}. M evaluates to m in all strings, while other expressions are evaluated according to Python 3.8 semantics. The metavariable n ranges over N to form different strings in Lm, and is serialized as a decimal string.
5 General Case

Requiring strong transparency precludes a broad class of linguistic patterns allowing an expression to refer to different values in different contexts. For example, this includes assigning variable or function names in Python, or binding pronouns in natural language. These constructions can make emulation impossible to achieve from assertions. We will construct a class of languages based on Python where emulation is uncomputable.
Definition 4 Let LEQ = {Lm | m ∈ N ∪ {∞}}, where strings in Lm are defined according to Figure 3. For semantics, we first define ⟦M | κ⟧Lm = m. For any other ler ∈ Lm that is a well-formed Python 3.8 expression, we define ⟦e | l, r⟧Lm as the value of e assigned by the Python interpreter in the context ⟨l, r⟩. For strings that are not valid Python expressions, we define ⟦e | l, r⟧Lm = ∅.
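Since Figure 3 itself is not reproduced here, the sketch below gives our own plausible rendering of the Lm programs and their key denotation, consistent with the figure caption and footnote 9; the exact template strings are an assumption on our part:

```python
# Plausible rendering (ours) of an L_m template string: n is serialized
# in decimal, and M denotes the secret parameter m.
def program(n):
    return f"def leq(): return {n} < M\nprint(leq())"

# Per Definition 4, within the print(·) context, leq() denotes n < m;
# float("inf") models the m = ∞ case.
def denote_leq(n, m):
    return n < m

print(program(3))
print(denote_leq(3, 5), denote_leq(7, 5), denote_leq(7, float("inf")))
# → True False True
```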
What does it take to emulate the expressions leq() and True in Lm? If we knew m, then we could emulate them by simply comparing n < m. However, it turns out that recovering m for any Lm ∈ LEQ is not possible with a fixed number of assertion queries. Formalizing this, we will show that LEQ is not ℵ-emulatable.8
Theorem 2 LEQ is not ℵ-emulatable.

Proof. Without loss of generality, we focus on the contexts for leq()9 and True within print(·), each of which is parameterized by some value of n. Notationally, we identify each Lm with m, and each context with its parameter n. This enables shorthand like ⟦e | n⟧m for the denotation of the expression e in the context parameterized by n in Lm.

When m = ∞, it holds for all n that ℵ∞(leq(), True | n) = 1. To satisfy emulation of e ∈ {leq(), True}, μ∞ makes a finite number of assertion queries

ℵ∞(leq(), True | ni)

for some sequence of contexts n1, · · · , nq, which we assume without loss of generality is sorted in increasing order. We can adversarially construct m′ ≠ ∞ such that all these queries are answered the same, and thus μ∞(e) = μm′(e) for both e. To implement this, we simply set m′ = nq + 1. Since μ∞(e) = μm′(e), we conclude that, for all n,

δ(μm′(leq()), μm′(True) | n) = δ(μ∞(leq()), μ∞(True) | n).

On the other hand, consider n > nq. In this case,

⟦leq() | n⟧m′ = False
⟦leq() | n⟧∞ = True,

which can be rewritten as

⟦leq() | n⟧m′ ≠ ⟦True | n⟧m′
⟦leq() | n⟧∞ = ⟦True | n⟧∞.

Therefore, the conditions of ℵ-emulation (Definition 3) cannot be satisfied for both Lm′ and L∞. This implies that LEQ is not ℵ-emulatable. □

8Another example of a non-ℵ-emulatable language takes M to be a finite list of integers and replaces n < M with n in M.
9The only ‘‘valid’’ context for leq() is within print(·). The denotation of leq() when it occurs next to def is ∅.

5.1 Discussion
We briefly summarize this result in less formal
terms. LEQ contains languages Lm defined by
Figure 3. Every program in each Lm is easily
computable. With knowledge of the Python in-
terpreter and m, any agent could execute all of
these programs. This can be formalized by observing that, for a fixed m, the class {Lm} is ℵ-emulatable. Rather, what we have shown is that, with finite time, it is impossible for an ungrounded agent to emulate Lm using assertion queries when m is unknown in advance. In other words, without prior knowledge of m, no algorithm can use assertions to disambiguate which notion of = is used by Lm from the infinite other possibilities. In a rough sense, m can be thought of as a cryptographic key enabling linguistic understanding: agents that know m can directly emulate Lm, but agents without it cannot, at least using assertions.10

Figure 4: An informal construction adapting the program templates in Figure 3 to English. Under our framework, two sentences are considered equivalent if they are true in exactly the same set of contexts. If the number is allowed to be ∞, this cannot be done in general for the final lines of each template.
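The adversarial step of this argument can be played out concretely (our own rendering; None models m = ∞, and the particular query contexts are an arbitrary example):

```python
# Concrete rendering (ours) of the adversary in Theorem 2.
# oracle(m, n) plays ℵ_m(leq(), True | n): in L_m, leq() denotes n < m.
def oracle(m, n):
    lhs = True if m is None else (n < m)  # ⟦leq() | n⟧_m; None is m = ∞
    return int(lhs == True)               # compare with ⟦True | n⟧_m

# Any emulator's finitely many queries about L_∞ ...
queries = [1, 5, 9]                       # contexts n_1 < ... < n_q
answers_inf = [oracle(None, n) for n in queries]

# ... are answered identically by the adversarial L_{m'} with m' = n_q + 1,
m_prime = max(queries) + 1
answers_mp = [oracle(m_prime, n) for n in queries]
print(answers_inf == answers_mp)  # True: the languages look identical

# yet the two languages disagree in any larger context n > n_q:
n_big = m_prime
print(oracle(None, n_big), oracle(m_prime, n_big))  # 1 0
```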
Theorem 2 does not use the fact that δ must be computable, as opposed to an arbitrary function. Even if δ is an arbitrary function, it could not disambiguate the true value of m based on the queries.
It is more precise to state Theorem 2 in a formal
language, but an argument similar to Theorem 2
can be adapted to a natural language like English.
An example is shown in Figure 4, where we define
the meaning of a sentence as its truth conditions,
and we imagine the class of candidate languages is
formed by varying the unspecified number, which
can potentially be ∞. Deciding if n is less than
it has the same truth conditions as Zero equals
one is equivalent to comparing leq() and True.
A system must necessarily fail to emulate the
semantics of these expressions in some context,
for some secret number. The rest of the paper
further explores the implications and limitations
of applying our formal model to natural language.
10Alternatively, we can take a more complexity-theoretic perspective by measuring the number of queries needed to emulate up to a bounded context size. Fix a maximum n. Then we can use binary search with O(log n) queries to find the value of m. Since the number of context bits is O(log n), the number of queries is O(|κ|), beating the O(|Σ|^|κ|) query complexity achievable by brute force. This perspective somewhat resembles Pratt-Hartmann and Third (2006) and other work in semantic complexity theory on the computational complexity of evaluating fragments of natural language.
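Footnote 10's binary-search observation can be sketched as follows (our own code; find_m, the oracle helper, and the bound n_max are illustrative names):

```python
# Sketch (ours) of recovering m with O(log n_max) assertion queries
# when contexts are bounded by n_max, per footnote 10.
def oracle(m, n):
    """ℵ_m(leq(), True | n) for finite m: 1 iff n < m."""
    return int((n < m) == True)

def find_m(query, n_max):
    """Binary search for the smallest n with query(n) == 0; that n is m."""
    lo, hi = 0, n_max + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if query(mid) == 1:  # mid < m, so m lies strictly above mid
            lo = mid + 1
        else:                # mid >= m
            hi = mid
    return lo

m_secret = 37
print(find_m(lambda n: oracle(m_secret, n), 1000))  # 37
```

The impossibility in Theorem 2 arises precisely because no such bound n_max is available in advance when m may be ∞.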
6 Towards Natural Language
As discussed in Section 1, our results are inspired
by the thought experiment of whether a language
model can use raw code to learn a compiler. A
goal of this, of course, is to examine whether un-
derstanding can be acquired from natural language
text in a simplified setting. In principle, our formal
results can bear on this broader question about nat-
ural language, although some differences emerge
when extending the results to a less well-defined
setting. In many cases, these differences appear
to make the task of learning meaning harder, sug-
gesting that our negative claim in a simpler setting
(Theorem 2) may still hold as an impossibility
result. We now discuss some points of difference
between our formal model and natural language.
Truth Conditions There are connections between our framework and the concepts of truth values and truth conditions in linguistic semantics. For a Boolean-valued expression e, a truth value corresponds to computing ⟦e | κ⟧L in a fixed context. On the other hand, truth conditions correspond roughly to a function computing ⟦e | κ⟧L for any κ. A crucial difference, though, is that these conditions cannot be intensional (Von Fintel and Heim, 2011), that is, they are not functions of the world state, but rather of the linguistic context only. In this sense, emulation corresponds to recovering the ability to resolve non-intensional truth conditions of sentences. This model is natural for formalizing a closed programming language environment, for example, with no environment variables or user input, since in this case the program state is specified completely by the linguistic context. On the other hand, English has common elements like that whose meaning can change depending on world state external to language. Perhaps allowing such elements would only make understanding more difficult; or, arguably, generally impossible, since there is no way for the model to observe the grounding world state using only an assertion oracle. We are inclined to
believe that, since such changes would make un-
derstanding more difficult, Theorem 2 would still
hold as an impossibility result. However, future
work would be needed to make this idea precise.
Possible Worlds In the last paragraph, we discussed how mutable world state is an additional complexity of natural language compared to our setup. Similarly, speakers of natural languages have imperfect information about the world around them, which can be captured by modeling
the referent of an expression over a set of possible
worlds, rather than within a specific evaluation
context. In Appendix A, we explore to what de-
gree this setting makes the task of learning to
understand more difficult. In adapting our model
to this context, the assertion oracle must become
‘‘modal’’ in the sense that it quantifies over sets
of worlds. We explore two different models of
modality for the oracle, corresponding to different
physical interpretations. In one case, Theorem 1
and Theorem 2 apply analogously, while, in the
other, emulation becomes an ill-defined problem.
Denotation vs. Intent Bender and Koller (2020)
distinguish between standing meaning and com-
municative intent, reflecting a distinction between
denotational semantics and other pragmatic inten-
tions that a speaker has in producing an utterance.
In this paper, it is most straightforward to take ⟦e | κ⟧L to reflect standing meaning. In principle,
we could imagine that it represents the speaker’s
communicative intent, and that an omniscient ora-
cle ℵL can reveal information about the speaker’s
intents to the system. Even with this unrealistically
powerful oracle, Theorem 2 says that the system
cannot emulate the speaker’s intents.
Competence vs. Performance Chomsky (1965)
differentiates competence and performance in
linguistic theory, where competence corresponds
roughly to the correct algorithmic modeling of
a linguistic process, and performance describes
its implementation subject to resource constraints
like memory. Arguably, agents might be said to
understand language if they are competent in this
sense, even if they sometimes make performance
errors. In contrast, our definition of emulation
(Definition 3) permits no performance errors. In
future work, it would be interesting to adapt an
approximate notion of emulation that tolerates
performance errors in order to more closely target
understanding in a sense reflecting competence.
Other Relations Theorem 1 and Theorem 2
investigate whether ℵL can be used to emulate
meaning representations that preserve an equiva-
lence relation. While equivalence is an important
part of semantics, other semantic relations like
entailment are also necessary for language under-
standing. In Appendix B, we show that a generalization of Theorem 1 (Theorem 5) extends to any semantic relation. In
other words, referential transparency also enables
emulation of relations besides =.
Other Oracles We believe assertions are a fairly
general model of the types of semantics encoded
in unsupervised learning resulting from a prag-
matic bias for truth; however, it is possible other
information is also represented, resulting from
other pragmatic biases governing language usage
and dataset creation. This additional information
could be formalized as access to additional ora-
cles. It would be exciting to formalize the power
of multimodal setups by analyzing the interactions
of oracles enabled by different input modalities.
7 Stepping Back
In this work, we formalized an argument that was
raised by Bender and Koller (2020) as a thought
experiment. Bender and Koller (2020) question
whether unsupervised training objectives are the
right goal to target for achieving natural language
understanding. If meaning is defined as identify-
ing which object in the real world, or which set of
situations, a linguistic element refers to, then, in a
direct sense, an ungrounded system cannot under-
stand meaning. But Bender and Koller (2020) go
farther than this, claiming that an ungrounded sys-
tem cannot even emulate understanding because it
is not clear how a system should learn to interpret
strings, even if it can model their distribution. We
formalize this idea of emulation as ℵ-emulation.
One counterargument mentioned by Bender and
Koller (2020) is that indirect forms of grounding
do exist in programming and natural language,
which we formalize as assertions. The syntac-
tic distributions of statements like assert allow
us to indirectly observe semantic relations over
the denotations. Assertions are one way that the
distribution of strings in a corpus is not blind
to their semantics. By studying them, we study
whether this indirect grounding enables a compu-
tational system to emulate the underlying semantic
relations.
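The role of assertions can be made concrete with a toy example. The sketch below is our own illustration, not code from the paper's figures: it treats Python's `assert` statement as the assertion oracle ℵL, so that which assert statements a truthful author would write indirectly reveals semantic equivalence between expressions.

```python
# A minimal sketch (our own, under simplifying assumptions): the toy
# language is Python expressions, and `context` is a (left, right)
# pair of strings surrounding the assertion, as in the paper's setup.

def assertion_oracle(e1: str, e2: str, context=("", "")) -> int:
    """Return 1 iff `assert e1 == e2` would succeed in this context."""
    left, right = context
    program = f"{left}assert ({e1}) == ({e2}){right}"
    try:
        exec(program, {})  # a truthful corpus only contains passing asserts
        return 1
    except AssertionError:
        return 0
```

A truthful corpus contains `assert 2 + 2 == 4` but never `assert 2 + 2 == 5`, so a learner that models which strings occur is, indirectly, observing the oracle; a preceding context such as `"x = 3\n"` similarly determines the value of `x`.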
Key Takeaways While assertions allow a sys-
tem to emulate semantic relations in simple cases
where the semantics are referentially transparent,
we find that linguistic constructs like variable
binding bring this task into conflict with the fundamental laws of computability. In other words,
under our formal model of meaning and emula-
tion, it is not just intractable for an ungrounded
system to emulate understanding of a formal lan-
guage, but, in some cases, impossible. We provide
constructive examples where understanding must
necessarily break down. We present these results
in a well-defined framework building off for-
mal approaches in logic, linguistics, and computer
science. While we do not prove anything about
natural languages, we do show that ungrounded
models must fail to emulate equivalence in a very
simple setting. A similar result likely extends to
natural language understanding as well, which
among other things, requires modeling referen-
tial identity (e.g., for sentences like Manny is the
cat). Further, we believe much of our framework
can be readily adopted in other works formalizing
understanding in Turing-complete systems.
Open Questions
In this work, we have focused
on utterances, by default, as opposed to dialogues.
An exciting extension would be to formalize a
dialogue between two speakers, interrupted by the
‘‘octopus’’ of Bender and Koller (2020).11 Ex-
isting theories of discourse could potentially be
synthesized with this framework. What linguistic
properties besides referential transparency relate
to emulatability? Can this framework be extended
to formalize multimodal setups, where multiple
oracles from different domains can potentially be
combined to gain additional power? Finally, is
there a natural way to relax our standard of emu-
lation towards a probabilistic definition, and how
would this change the results?
Acknowledgments
We thank Mark-Jan Nederhof for his excellent
suggestions. We also thank Dana Angluin, Matt
Gardner, Eran Yahav, Zachary Tatlock, Kyle
Richardson, Ruiqi Zhong, Samuel Bowman,
Christopher Potts, Thomas Icard, and Zhaofeng
11The octopus thought experiment imagines a deep-sea octopus O that observes a dialogue between two humans by
intercepting an underwater cable. Could O learn to emulate
the role of one of the speakers without exposure to life on
land?
Wu for their feedback on various versions of this
work. Further thanks to our anonymous reviewers
and researchers at the Allen Institute for AI and
UW NLP. Finally, we appreciate the lively online
discussion of the paper, which informed updates
to the camera-ready version.
References
Yossi Adi, Einat Kermany, Yonatan Belinkov,
Ofer Lavi, and Yoav Goldberg. 2017. Fine-
grained analysis of sentence embeddings using
auxiliary prediction tasks. In 5th International
Conference on Learning Representations, ICLR
2017, Toulon, France, April 24-26, 2017, Con-
ference Track Proceedings. OpenReview.net.
Dana Angluin. 1987. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106. https://doi.org/10.1016/0890-5401(87)90052-6
Yonatan Belinkov and James Glass. 2019. Anal-
ysis methods in neural language processing:
A survey. Transactions of the Association for
Computational Linguistics, 7:49–72. https://
doi.org/10.1162/tacl_a_00254
Emily M. Bender and Alexander Koller. 2020.
Climbing towards NLU: On meaning, form,
and understanding in the age of data. In Pro-
ceedings of
the 58th Annual Meeting of
the Association for Computational Linguistics,
pages 5185–5198, Online. Association for
Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.463
Tom B. Brown, Benjamin Mann, Nick Ryder,
Melanie Subbiah,
Jared Kaplan, Prafulla
Dhariwal, Arvind Neelakantan, Pranav Shyam,
Girish Sastry, Amanda Askell, Sandhini
Agarwal, Ariel Herbert-Voss, Gretchen
Krueger, Tom Henighan, Rewon Child, Aditya
Ramesh, Daniel M. Ziegler,
Jeffrey Wu,
Clemens Winter, Christopher Hesse, Mark
Chen, Eric Sigler, Mateusz Litwin, Scott Gray,
Benjamin Chess,
Jack Clark, Christopher
Berner, Sam McCandlish, Alec Radford, Ilya
Sutskever, and Dario Amodei. 2020. Language
models are few-shot learners. arXiv:2005.14165.
Noam Chomsky. 1965. Aspects of the Theory of
Syntax, volume 11. MIT Press.
Alexander Clark. 2010. Three learnable models for the description of language. In Language and Automata Theory and Applications, pages 16–31, Berlin, Heidelberg. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_2
Alexis Conneau, German Kruszewski, Guillaume
Lample, Lo¨ıc Barrault, and Marco Baroni.
2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136, Melbourne, Australia. Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1198
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Herbert P. Grice. 1975. Logic and conversation. In Speech Acts, pages 41–58. Brill. https://doi.org/10.1163/9789004368811_003
Zellig S. Harris. 1954. Distributional structure.
WORD, 10(2–3):146–162.
Irene Heim and Angelika Kratzer. 1998. Semantics
in Generative Grammar. Blackwell.
John Hewitt and Percy Liang. 2019. Designing and
interpreting probes with control tasks. In Pro-
ceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and
the 9th International Joint Conference on Nat-
ural Language Processing (EMNLP-IJCNLP),
pages 2733–2743, Hong Kong, China. Associa-
tion for Computational Linguistics. https://
doi.org/10.18653/v1/D19-1275
Dieuwke Hupkes and Willem Zuidema. 2018.
Visualisation and ’diagnostic classifiers’ reveal
how recurrent and recursive neural networks
process hierarchical structure (extended ab-
stract). In Proceedings of the Twenty-Seventh
International Joint Conference on Artificial
Intelligence, IJCAI-18, pages 5617–5621. In-
ternational Joint Conferences on Artificial
Intelligence Organization. https://doi.org/10.24963/ijcai.2018/796
Hector Levesque. 2014. On our best behaviour.
Artificial Intelligence, 212. https://doi.org/10.1016/j.artint.2014.03.007
Julian Michael. 2020. To dissect an octopus:
Making sense of the form/meaning debate.
Christopher Potts. 2020. Is it possible for language models to achieve understanding?
Ian Pratt-Hartmann and Allan Third. 2006. More fragments of language. Notre Dame Journal of Formal Logic, 47(2):151–177. https://doi.org/10.1305/ndjfl/1153858644
Colin Raffel, Noam Shazeer, Adam Roberts,
Katherine Lee, Sharan Narang, Michael
Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv:1910.10683.
Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in BERTology: What we know about how BERT works. arXiv:2002.12327. https://doi.org/10.1162/tacl_a_00349
Ian Tenney, Dipanjan Das, and Ellie Pavlick.
2019. BERT rediscovers the classical NLP
pipeline. In Proceedings of the 57th Annual
Meeting of the Association for Computational
Linguistics, pages 4593–4601, Florence, Italy.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/P19-1452
Laurence R. Horn and Heinrich Wansing. 2020. Negation. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy, spring 2020 edition. Metaphysics Research Lab, Stanford University.
Alan M. Turing. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1):230–265.
Alan M. Turing. 1950. Computing machinery
and intelligence. Mind, LIX(236):433–460.
https://doi.org/10.1093/mind/LIX.236.433
Kai Von Fintel and Irene Heim. 2011. Intensional
semantics. Unpublished Lecture Notes.
Alex Wang, Amanpreet Singh, Julian Michael,
Felix Hill, Omer Levy, and Samuel Bowman.
2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5446
Alfred North Whitehead and Bertrand Russell.
1925–1927. Principia Mathematica, Cam-
bridge University Press.
Yoad Winter. 2016. Elements of Formal Se-
mantics: An Introduction to the Mathematical
Theory of Meaning in Natural Language.
Edinburgh University Press.
Dani Yogatama, Cyprien de Masson d’Autume,
Jerome Connor, Tomas Kocisky, Mike
Chrzanowski, Lingpeng Kong, Angeliki
Lazaridou, Wang Ling, Lei Yu, Chris Dyer,
and Phil Blunsom. 2019. Learning and evaluating general linguistic intelligence. arXiv:1901.11373.
Hongming Zhang, Xinran Zhao, and Yangqiu
Song. 2020. WinoWhy: A deep diagnosis of es-
sential commonsense knowledge for answering
Winograd schema challenge. In Proceedings of
the 58th Annual Meeting of the Association for
Computational Linguistics, pages 5736–5745,
Online. Association for Computational Linguis-
tics. https://doi.org/10.18653/v1/2020
.acl-main.508
A Multiple Worlds
Programs execute in well-defined environments
with a clear state. Speakers of natural language,
on the other hand, have imperfect information
and beliefs about the world around them. Thus, it
can be more natural to model grounding context
for language as a set of possible worlds, rather
than a single world state. We formalize this in
two different ways (with two different physical
interpretations) and explore how it affects our
results.
Let W be a set of all possible worlds. We
redefine denotations to be intensionalized (Von Fintel and Heim, 2011), that is, we write ⟦e | κ⟧L^w as the denotation of e in the context κ, evaluated in world w ∈ W. Assume for simplicity that
Y = {0, 1, ∅}. We will now introduce modal denotations and assertions using a generic modal quantifier ⊡, which reduces a sequence of worlds to a boolean value according to some intensional predicate. This quantifier controls how multiple possible worlds are collapsed to form denotations and query outputs.
Definition 5 (Modal denotation) Let ⊡ be a modal quantifier. For all e ∈ Σ*, κ ∈ (Σ*)², define

⊡⟦e | κ⟧L = ⊡_{w∈W} ⟦e | κ⟧L^w.
We will write the previously defined assertion oracle to apply in a specific world w, namely, ℵL^w. We also extend it to quantify over multiple worlds:
Definition 6 (Modal assertion) Let ⊡ be a modal quantifier. For all e ∈ Σ*, κ ∈ (Σ*)², define

⊡ℵL(e, e′ | κ) = ⊡_{w∈W} ℵL^w(e, e′ | κ).
Specifically, we consider ⊡ ∈ {□, ♦}, corresponding to universal and existential quantifiers over worlds. Thus, □ can be thought of as ∀ over worlds, and ♦ can be thought of as ∃. For either quantifier, if any ⟦e | κ⟧L^w = ∅, we define ⊡⟦e | κ⟧L = ∅ as well. Each quantifier will have
a different physical interpretation. With universal
quantification, we will find that results analogous
to Theorem 1 and Theorem 2 hold. With existen-
tial quantification, it turns out that the equivalence
class of μ is underspecified. In other words, not
only is it impossible to compute an emulator with
a finite number of assertion queries, but, even with
infinite assertions, there is no consistent way to
emulate the underlying modal semantics.
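Before treating the two quantifiers separately, the modal oracles can be sketched concretely. The following model is ours, not the paper's: a ‘‘world’’ is a variable environment (a dict), expressions are Python expressions, and Definition 6 is instantiated with □ as `all` and ♦ as `any`.

```python
# Illustrative sketch (our own assumptions): a world is an environment
# of variable bindings; the oracle checks equality of denotations.

def world_oracle(e1: str, e2: str, world: dict) -> bool:
    """The non-modal assertion oracle evaluated in a single world w."""
    return eval(e1, {}, dict(world)) == eval(e2, {}, dict(world))

def box_oracle(e1: str, e2: str, worlds) -> bool:
    """Universal modal assertion: an author asserts equality only if
    it holds in every world they believe possible."""
    return all(world_oracle(e1, e2, w) for w in worlds)

def diamond_oracle(e1: str, e2: str, worlds) -> bool:
    """Existential modal assertion: some author's world supports it."""
    return any(world_oracle(e1, e2, w) for w in worlds)
```

For `worlds = [{"x": 0}, {"x": 1}]`, `box_oracle("x", "x*x", worlds)` holds, while `box_oracle("x", "0", worlds)` fails even though `diamond_oracle("x", "0", worlds)` holds, previewing how the two quantifiers diverge.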
A.1 Universal Quantification
In the first case we let ⊡ = □. Two expressions
are viewed as having the same meaning if they
are equivalent in every possible belief world. This
is interpretable as observing text L(cid:2) written by a
single author whose belief state is represented by
multiple possible worlds. The author only asserts
a statement is true if it is consistent across all
worlds that they believe are possible.
In this setting, we will show that the modal
assertion oracle uniquely specifies a modal de-
notation for each expression, up to isomorphism.
In other words, as with the non-modal assertion
oracle, each assertion query would let us decide
some relation between two expressions. Thus, the
same results for the non-modal setting discussed
in the main body of the paper will also hold here.
Theorem 3 Consider e, e′ ∈ Σ* and any context κ ∈ (Σ*)² such that □⟦e | κ⟧L ≠ ∅ and □⟦e′ | κ⟧L ≠ ∅. Then,

□⟦e | κ⟧L = □⟦e′ | κ⟧L ⟺ □ℵL(e, e′ | κ).
Proof.

□⟦e | κ⟧L = □⟦e′ | κ⟧L
⟺ ⋀_{w∈W} ⟦e | κ⟧L^w = ⋀_{w∈W} ⟦e′ | κ⟧L^w
⟺ ⋀_{w∈W} (⟦e | κ⟧L^w = ⟦e′ | κ⟧L^w)
⟺ ⋀_{w∈W} ℵL^w(e, e′ | κ)
⟺ □ℵL(e, e′ | κ). ∎
Crucial to this simple proof is the fact that ∧ is distributive over =. This is specific to the quantifier being □. Theorem 3 implies that □⟦e | κ⟧L can be recovered from modal assertion queries analogously to the non-modal case. Thus, results analogous to Theorem 1 and Theorem 2 apply for emulating □⟦e | κ⟧L using queries to □ℵL.
A.2 Existential Quantification
In the second case we let ⊡ = ♦. Two expressions
are viewed as having the same meaning if they
are equivalent in some world. This is interpretable
as observing a large dataset of text L♦ generated
by many authors, each with a different single
belief world w. In the corpus, we imagine two
expressions can be asserted to be equivalent in
some context if any of the authors would consider
them to be equal in that context.
      e1   e2   ℵ          e1   e2   ℵ
w1    0    0    1          0    0    1
w2    0    0    1          0    1    0
♦     0    0    1          0    1    1

Table 1: Two tables (separated by a thick line) representing two different versions of W. Within each table, each cell i, j in the main 2-by-2 grid contains the boolean value ⟦ej | κ⟧L^wi. The column to the right contains ℵL^wi(e1, e2 | κ). The bottom row aggregates each column by quantifying ♦.
In this case, assertions do not even fully spec-
ify equivalence between the modal denotations.
This is a stronger sense in which meaning cannot
be emulated from assertion queries. Emulation
is not just impossible with finite assertions, but
mathematically underspecified.
Theorem 4 There exist e, e′ ∈ E(L) and κ ∈ (Σ*)² such that ♦⟦e | κ⟧L ≠ ∅ and ♦⟦e′ | κ⟧L ≠ ∅, and also ♦ℵL(e, e′ | κ) = 1 is consistent with either ♦⟦e | κ⟧L = ♦⟦e′ | κ⟧L or ♦⟦e | κ⟧L ≠ ♦⟦e′ | κ⟧L.
Proof. We construct an example with expressions e1, e2 in a single context κ. Fix W = {w1, w2}. Table 1 shows two versions of this modal setup. In both versions of the universe, ♦ℵL(e1, e2 | κ) = 1. However, on the left, ♦⟦e1 | κ⟧L = ♦⟦e2 | κ⟧L, while, on the right, the opposite holds. So, with ♦, modal assertions do not uniquely determine equivalence of modal denotations. ∎
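The counterexample in Table 1 can also be checked mechanically. The snippet below is our own verification sketch: each universe maps worlds to the boolean denotations of e1 and e2, and both universes produce the same ♦-assertion signal while disagreeing on equality of the ♦-denotations.

```python
# Two universes mirroring Table 1: world -> (denotation of e1, e2).
universe_left = {"w1": (0, 0), "w2": (0, 0)}
universe_right = {"w1": (0, 0), "w2": (0, 1)}

def diamond_denotations(universe):
    """♦-denotations of e1 and e2: existential quantification over worlds."""
    d1 = int(any(v1 for v1, _ in universe.values()))
    d2 = int(any(v2 for _, v2 in universe.values()))
    return d1, d2

def diamond_assertion(universe):
    """♦ℵ: some world's author would assert that e1 equals e2."""
    return int(any(v1 == v2 for v1, v2 in universe.values()))
```

Both universes give `diamond_assertion(...) == 1`, but `diamond_denotations` returns `(0, 0)` on the left and `(0, 1)` on the right, so the oracle's answer is consistent with both equality and inequality.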
As an equivalence class for μ is not even well-
defined by ♦ℵL, we cannot hope to compute it
from queries. This is an even stronger sense in
which emulation is impossible using assertions.
On some level, this may be a natural model for
language modeling corpora, which aggregate text
from potentially inconsistent sources.
In summary, if assertions uniquely determine
equivalence between denotations in a strongly
transparent language, then we can expect to emu-
late representations preserving equivalence using
assertions. Otherwise, there are various levels of
formal challenges to emulating equivalence.
B Other Semantic Relations
Sections 4, 5, and A investigate whether ℵL can
be used to emulate meaning representations that
Figure 5: A concrete implementation of all_strings, which is referenced in Figure 2 and Figure 6.

Figure 6: emulate computes a structured representation of the input string expr that preserves any semantic relation ◦ in terms of assertion queries. The iterable all_strings is defined in Figure 5.
preserve semantic equivalence. While equivalence
is an important part of semantics, other semantic
relations are also necessary for language un-
derstanding. For example, the following feature
prominently in theories of linguistic semantics:
• Entailment In general terms, an entailment (Winter, 2016) relation → is a partial order over Y. Intuitively, if y → y′, then y is a ‘‘special case’’ of y′. For example, one could construct E, a semantic analysis of English, where ⟦fat cat | a, sits⟧E → ⟦cat | a, sits⟧E.

• Contrary negation Negation is a complex topic in semantics. One sense of negation is if two meaning representations are ‘‘contrary’’ (Horn and Wansing, 2020), meaning both cannot be true at the same time.

Does Theorem 2 generalize to other relations besides =? To answer this, we first extend assertions and emulation to apply to a generic relation ◦ : M² → {0, 1}. The proof for Theorem 1 does not fully translate to this new setting, but we will show via a new argument that emulation is still possible.

Definition 7 For e, e′ ∈ Σ* and κ ∈ (Σ*)², define the assertion oracle

ℵL,◦(e, e′ | κ) = 1 if ⟦e | κ⟧L ◦ ⟦e′ | κ⟧L, and 0 otherwise.
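To make the relation-specific oracle of Definition 7 concrete, here is a small sketch of ours instantiating ◦ with boolean entailment (for booleans, y → y′ is material implication); the function names are our own, not the paper's.

```python
def entails(y: bool, y_prime: bool) -> bool:
    """Boolean entailment: y is a 'special case' of y_prime."""
    return (not y) or y_prime

def relation_oracle(e1: str, e2: str, rel=entails) -> int:
    """A sketch of Definition 7's oracle in the empty context, with
    boolean Python expressions standing in for denotations."""
    return int(rel(eval(e1), eval(e2)))
```

Mirroring the ⟦fat cat⟧ → ⟦cat⟧ example, a conjunction entails its conjunct: `relation_oracle("True and False", "False")` returns 1, while `relation_oracle("True", "False")` returns 0.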
Definition 8 A class of languages L over Σ is ℵ-emulatable w.r.t. ◦ if there exists an oracle Turing machine μ and standard Turing machine δ such that, for all L ∈ L, κ ∈ (Σ*)², and e, e′ ∈ suppL(κ),

⟦e | κ⟧L ◦ ⟦e′ | κ⟧L ⟺ δ(μL(e), μL(e′) | κ).
We now are ready to prove the extended form
of Theorem 1. The main idea of the proof will be
to memoize the value of the relation ◦ between ⟦e | κ⟧L and the values of all expressions smaller than e. This guarantees that δ will be able to ‘‘look
up’’ the correct output.
Theorem 5 TRANSPARENT is ℵ-emulatable w.r.t. ◦.
Proof. Similarly to Theorem 1, we present the
proof constructively as a Python program to
compute μ. We then show how to define δ
appropriately, completing the proof.
Figure 6 shows the algorithm to compute
μL(e) ∈ M. In Python, μL(e) is a dictionary; we interpret it as a function μL(e) : Σ* × Σ* → {0, 1, ∅}, where ∅ represents values that are not set. We define δ as follows:

δ(m, m′ | κ) ⟺ m(e, e′) if m(e, e′) ≠ ∅, and m′(e, e′) otherwise.

Crucially, it must be that μL(e)(e, e′) ≠ ∅ or μL(e′)(e, e′) ≠ ∅. In Figure 6, cand either reaches e before e′, or e′ before e. By symmetry, assume it reaches e before e′. Then μL(e′)(e, e′) ≠ ∅, so

δ(μL(e), μL(e′) | κ) ⟺ μL(e′)(e, e′) = 1
⟺ ℵL,◦(e, e′ | λ²) = 1
⟺ ⟦e | λ²⟧ ◦ ⟦e′ | λ²⟧
⟺ ⟦e | κ⟧ ◦ ⟦e′ | κ⟧.

Therefore emulate satisfies Definition 3. ∎

We needed to change the proof of Theorem 5 compared to Theorem 1 because ◦ is not an equivalence relation. In Theorem 1, the final steps relied on reflexivity, transitivity, and symmetry: the three properties that constitute equivalence. The
new proof enlarges the size of the emulated rep-
resentations. Rather than representing each e with
a number, μL(e) becomes a large dictionary of
strings. This represents an increase in space com-
plexity from linear to exponential in the size of e.
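The construction can be sketched end to end. The code below is our paraphrase of Figures 5 and 6, not the paper's verbatim code, under two simplifications: representations carry their own string (as the dictionary keys implicitly do in Figure 6), and all queries are made in the empty context λ².

```python
# all_strings enumerates Sigma* in order of nondecreasing length, so
# any expression is reached after finitely many steps. emulate(expr)
# memoizes how expr relates (via the oracle) to every string enumerated
# up to expr; delta then looks the answer up in one of the two
# representations, as in the proof of Theorem 5.
from itertools import count, product

def all_strings(alphabet):
    for n in count():  # n = 0, 1, 2, ...
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def emulate(expr, oracle, candidates):
    rep = {"expr": expr, "memo": {}}
    for cand in candidates:
        rep["memo"][(cand, expr)] = oracle(cand, expr)
        rep["memo"][(expr, cand)] = oracle(expr, cand)
        if cand == expr:  # stop once the enumeration reaches expr
            return rep

def delta(m1, m2):
    # whichever expression was enumerated later memoized the pair
    key = (m1["expr"], m2["expr"])
    memo = m1["memo"] if key in m1["memo"] else m2["memo"]
    return memo[key]
```

With an arithmetic oracle such as `lambda a, b: int(eval(a) == eval(b))` and candidates `["1", "2", "1+1"]`, `delta(emulate("1+1", ...), emulate("2", ...))` recovers 1. Each representation stores a table over all smaller strings, which is exactly the source of the exponential space blow-up noted above.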
C Old Emulation Definition

A previous version of this paper defined emulation slightly differently. We discuss the differences and explain the advantages of the new definition. First, we defined a general denotation as

⟦e⟧L = {⟨κ, ⟦e | κ⟧L⟩ | κ ∈ (Σ*)²}.

The general meaning represents the meaning of a word across all contexts. Now, say that two functions f, g are isomorphic (with respect to =) over a set X iff, for all x, x′ ∈ X,

f(x) = f(x′) ⟺ g(x) = g(x′).

We will write f ≅ g in this case. We will refer to a set of contexts S ⊆ (Σ*)² as a syntactic role. Each syntactic role has a set of expressions supp⁻¹L(S) whose support is that role:

suppL(e) = {κ ∈ (Σ*)² | ⟦e | κ⟧L ≠ ∅}
supp⁻¹L(S) = {e ∈ Σ* | suppL(e) = S}.

We can now give the old definition of emulation:

Definition 9 (Old ℵ-emulation) μ : Σ* → M emulates ⟦·⟧L w.r.t. = iff:

1. μ ≅ ⟦·⟧L over supp⁻¹L(S), for all S ⊆ (Σ*)²

2. There exists a Turing machine that computes whether m = m′ for each m, m′ ∈ M

3. There exists a Turing machine with oracle access to ℵL that computes μ

For a set of languages L, this is equivalent to saying L is ℵ-emulatable iff, for all L ∈ L, there exists an oracle Turing machine μ and normal Turing machine δ such that, for all S ⊆ (Σ*)² and e, e′ ∈ supp⁻¹L(S),

⟦e⟧L = ⟦e′⟧L ⟺ δ(μL(e), μL(e′)).

This more closely resembles Definition 3, but we will make two slight changes. First, we will change the quantifier order, such that a single μ must work for every L ∈ L. Then, we will grant δ access to a context κ, and rephrase the equation to hold over all κ ∈ (Σ*)² and e, e′ ∈ suppL(κ):

⟦e | κ⟧L = ⟦e′ | κ⟧L ⟺ δ(μL(e), μL(e′) | κ).

This recovers Definition 3. This version more faithfully reflects the intuitive notion of emulation. The old version required μL(e) to determine how e should evaluate in every possible context. Emulation would not be possible in some cases even with perfect knowledge of L. Now, it must just be possible in any context κ to compute ⟦e | κ⟧L from κ and μL(e), which is a weaker standard. Under the new definition, it is always possible to emulate a class of languages with one element, assuming ⟦e | κ⟧L is computable. An additional improvement is that emulation now applies to all expressions that share a context, whereas before it only targeted expressions with the same support.