
REPORT

Active Iterative Social Inference in Multi-Trial
Signaling Games

Asya Achimova1,2, Gregory Scontras3, Ella Eisemann4, and Martin V. Butz1,5

1Research Training Group 1808 “Ambiguity: Production and Perception”, University of Tübingen, Tübingen, Germany
2Department of General and Computational Linguistics, University of Tübingen, Tübingen, Germany
3Department of Language Science, University of California, Irvine, USA
4Institute of Vocational Education and Work Studies, Technische Universität Berlin, Berlin, Germany
5Department of Computer Science and Department of Psychology, University of Tübingen, Tübingen, Germany

Keywords: pragmatics, social learning, sequential learning, ambiguity, online experiments,
inference, learning about others

ABSTRACT

Human behavioral choices can reveal intrinsic and extrinsic decision-influencing factors. We
investigate the inference of choice priors in situations of referential ambiguity. In particular, we
use the scenario of signaling games and investigate to what extent study participants profit
from actively engaging in the task. Previous work has revealed that speakers are able to infer
listeners’ choice priors upon observing ambiguity resolution. However, it was also shown that
only a small group of participants was able to strategically construct ambiguous situations to
create learning opportunities. This paper sets out to address how prior inference unfolds in more
complex learning scenarios. In Experiment 1, we examine whether participants accumulate
evidence about inferred choice priors across a series of four consecutive trials. Despite the
intuitive simplicity of the task, information integration turns out to be only partially successful.
Integration errors result from a variety of sources, including transitivity failure and recency
bias. In Experiment 2, we investigate how the ability to actively construct learning scenarios
affects the success of prior inference and whether the iterative settings improve the ability to
choose utterances strategically. The results suggest that full task engagement and explicit
access to the reasoning pipeline facilitate the invocation of optimal utterance choices as well
as the accurate inference of listeners’ choice priors.

INTRODUCTION

With an objective in mind but multiple options at hand, an agent must make a choice about
the appropriate action to take. When observing such choices, we can learn about the mental
states of the agents who made them: what led the agent to choose option a over options b or c?
The current paper explores a particular type of social scenario that presents choices to an
agent: cases of referential ambiguity where one particular referent must be chosen in response
to an ambiguous utterance, which opens up multiple choice options. In this process, listeners
rely on their choice priors (the beliefs, preferences, or desires that shape an agent’s choice
behavior) as well as a variety of pragmatic reasoning strategies to come to a decision. We
explore how people reason about the apparent choice priors of their social partners as they
resolve ambiguity, particularly in cases where the speaker can create ambiguous situations
actively and iteratively over several interaction trials.

an open access journal

Citation: Achimova, A., Scontras, G.,
Eisemann, E., & Butz, M. V. (2023).
Active Iterative Social Inference in
Multi-Trial Signaling Games. Open
Mind: Discoveries in Cognitive Science,
7, 111–129. https://doi.org/10.1162/opmi_a_00074

DOI:
https://doi.org/10.1162/opmi_a_00074

Received: 17 March 2022
Accepted: 10 February 2023

Competing Interests: The authors
declare no conflict of interests.

Corresponding Author:
Asya Achimova
asya.achimova@uni-tuebingen.de

Copyright: © 2023
Massachusetts Institute of Technology
Published under a Creative Commons
Attribution 4.0 International
(CC BY 4.0) license

The MIT Press


Active Social Inference Achimova et al.

The human ability to interpret each other’s behavior as driven by motives, intentions, and
goals is a critical component of Theory of Mind. Early work in this direction developed within
attribution theory (Jones & Davis, 1965; Kelley, 1967; Kelley & Stahelski, 1970). The ability
to infer mental states of others upon observing their behavioral choices develops early in life.
Infants as young as 18 months of age have been shown to infer the preferences of the
experimenter in a setup where the experimenter is pulling toys from buckets, and the buckets differ
in their distributions of types of toys (Kushnir et al., 2010). In a different set of experiments, this
time with adults, Baker et al. (2017) show that participants are able to infer the food
preferences of an agent upon observing how the agent navigates the space between several food
trucks. The authors furthermore model the inference process as Bayesian Theory of Mind
inference. Jara-Ettinger et al. (2016, 2020) argue that this social inference is an integral part of a
naive utility calculus, an intuitive theory humans have about other agents making choices.
Here, we explore potential benefits of actively engaging the agent, who makes social
inferences iteratively across four trials of distinct signaling game interactions, by enabling her to
actively choose utterances in each trial. The utterances selectively restrict the response choices
available to the listener. We further embed our task in a 4-trial learning scenario where
participants observe the behavior of a particular simulated agent through several iterations.

Iterative decision-making has been previously explored with computational models of
social inference (Evans et al., 2016; Jara-Ettinger et al., 2020). Integrating information across
a sequence of trials entails not only retaining information in memory over a period of time
longer than a single trial, but also performing additional inference steps. For example,
participants may need to perform transitive inferences in a given learning scenario, inferring that a is
rated higher than c upon observing a > b and b > c scenarios. Ciranka et al. (2022) investigated
how inference success depends on the type of feedback provided to the participants. They
contrasted a model where full feedback is provided and participants do not have to make
transitive inferences about the ordering of values (Bryant & Trabasso, 1971; Wynne, 1995) with a
partial feedback model. In the full feedback model, if a is chosen over b, the model increases
the value of a and decreases the value of b at the same rate. In the partial feedback model,
on the other hand, the implicit value update is asymmetric: the model only increases or decreases
the value of the chosen or discarded property, respectively, but does not both increase and
decrease values. The authors demonstrated that transitive inference can be efficiently modeled
as a reinforcement learning scenario and showed that the model gives correct
predictions for a range of cognitive effects reported in psychophysics and decision-making.

From a reinforcement learning (RL) perspective, preference inference can be modeled as a
particular instance of hidden value learning (Sutton & Barto, 2018), inverse RL (Hadfield-
Menell et al., 2016), or inverse decision-making ( Jern et al., 2017). Jern et al. (2017) investigate
how participants infer preferences of agents choosing objects with multiple attributes. Building
on the naive utility calculus model of Jara-Ettinger et al. (2016), the authors offer an inverse
decision-making model that accounts for human inferences. Their model relies on a
decision-making function that provides an explicit link between the preferences of an agent and a
decision that she makes. Nevertheless, the choices humans make can be motivated by a multitude of
factors, and precisely specifying which of them drive decision making is a complex task. For example,
the model of Evans et al. (2016) infers not only preferences but also beliefs of the agent, moti-
vated by the general Beliefs-Desires-Intention model (Bratman, 1984).

In our work, we will use preference inference as a test scenario to investigate how choice
priors more generally can be inferred in situations where a participant makes a behavioral
choice. Even though we ask participants to infer potential “preferences” of a simulated listener,
due to the abstractness of the task, we cannot specifically test whether it is the preferences or

OPEN MIND: Discoveries in Cognitive Science


other factors that determine the choices of objects in the task. Rather, we regard the inference
of choice priors in our scenarios as a form of social inference concentrating on the following
aspects. First, we investigate whether participants successfully integrate information iteratively
across the four trials. Second, we explore how an active role of the participant in the learning
scenario affects inference success. In this work, we focus on the empirical investigation of
choice priors, without any further differentiation of the factors that contribute to them. The
results imply that the active creation of choice options helps improve the social inference
of choice priors.

AMBIGUITY RESOLUTION PARADIGM

We use a signaling game scenario in which choices can be made and reasoned about. In
classic signaling games, a speaker makes an utterance and signals an object to the listener (Lewis,
1975). The listener’s task is to identify the intended object. Typically, signaling games are used
to investigate how speakers make utterance choices to maximize the chance that the listener
will choose the target object. Moreover, the listeners’ choice behavior has been investigated in
situations when the utterance applies to more than one object, a case of referential ambiguity
(e.g., Frank & Goodman, 2012; Franke & Jäger, 2016; Goodman & Frank, 2016). In contrast,
here we focus on the extent to which speakers can draw iterative social inferences about the
behavioral choice priors of listeners. Speakers observe listeners’ object choices in four
successive signaling game interaction trials. In each trial, a particular set of three objects is shown, a
particular utterance is provided, and the consequent object choice is indicated. Participants,
acting as the speaker, are then asked to infer the apparent choice priors of the listener
(instructed as “apparent preference”). Moreover, in the second experiment, speakers are
additionally asked to choose the potentially choice-restricting utterance.

The general potential of such signaling game scenarios to infer listeners’ choice priors has
been previously explored in Achimova et al. (2022). The authors have shown that participants
were indeed able to infer listener priors upon observing the listener’s choice of an object
given an ambiguous object choice request. For example, in Figure 1, participants might
observe that the speaker said “red”—a referentially-ambiguous utterance that is consistent
with either of the two red objects—and the listener resolves the ambiguity by choosing the
center object, as indicated by the orange square. The task was to decide which “preferences”
the listener may have used to make her choice. What Achimova et al. (2022) labelled
“preferences” was operationalized as choice priors over potential object selections in their
Bayesian model. Accordingly, we use the more general term “choice prior” in the
remainder of the current paper. In response to a scenario like Figure 1, participants were more
likely to conclude that the listener has larger choice priors for clouds and stripes than for circles
and polka-dotted objects.

Figure 1. Example preference-inference communication scenario from Achimova et al. (2022).
Participants see the three-object scenario, observe that a speaker produced an utterance (e.g., “red”)
as an instructive choice request, and are informed about the listener’s consequent choice (i.e.,
picking the striped red cloud, as indicated by the orange dotted square).
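
The inference illustrated in Figure 1 can be sketched as a small Bayesian update. The following is a minimal sketch under stated assumptions, not the Rational Speech Act model of Achimova et al. (2022): the third object, the single-preferred-value hypothesis space, the uniform prior, and the `noise` parameter are all hypothetical choices made for illustration.

```python
# Hypothetical three-object scene: (color, shape, texture).
# The third object is an assumption; the text only describes the two red objects.
OBJECTS = [
    ("red", "cloud", "striped"),
    ("red", "circle", "dotted"),
    ("blue", "square", "solid"),
]


def choice_likelihood(chosen, utterance, preferred, objects, noise=0.05):
    """P(listener picks `chosen` | utterance, listener prefers `preferred`).

    Assumed listener: among objects consistent with the utterance, objects
    carrying the preferred value get weight 1, all others get weight `noise`.
    """
    candidates = [o for o in objects if utterance in o]
    if chosen not in candidates:
        return 0.0
    weights = [1.0 if preferred in o else noise for o in candidates]
    return weights[candidates.index(chosen)] / sum(weights)


def infer_choice_prior(chosen, utterance, objects, noise=0.05):
    """Posterior over which feature value the listener prefers (uniform prior)."""
    values = sorted({v for o in objects for v in o})
    unnormalized = {
        v: choice_likelihood(chosen, utterance, v, objects, noise) for v in values
    }
    total = sum(unnormalized.values())
    return {v: p / total for v, p in unnormalized.items()}


# The speaker says "red"; the listener picks the striped red cloud.
posterior = infer_choice_prior(OBJECTS[0], "red", OBJECTS)
for value, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{value:8s} {prob:.3f}")
```

Under these assumptions, "cloud" and "striped" receive the most posterior mass and "circle" and "dotted" the least, while values shared by both red objects or absent from both stay in between, which mirrors the qualitative pattern described above.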


Figure 2. Exemplar utterance-choice communication scenario from Achimova et al. (2022). In a
typical utterance choice task, participants are asked to use an object property (e.g., “blue,” “green,”
“cloud,” etc.) as a choice instruction to the listener, such that they can expect to learn about the
choice priors of the listeners when observing their consequent object choice. As a result, ambiguous
utterances typically promise more information gain than unambiguous ones.

For strategically creating cases of ambiguity, Achimova et al. (2022) had participants help
the speaker select their utterances in an effort to better learn about the listener’s choice priors.
So, for example, when confronted with the scenario in Figure 2, a subset of participants would
suggest “green”, “striped”, or “cloud” rather than “circle”, “blue”, or “solid” – because these
utterances create a referential ambiguity that can reveal information about listeners’ choice
priors upon observing their object choice. Surprisingly, a varying but significant subset of other
participants systematically selected unambiguous utterances, failing to pursue information
gain about choice priors but preferring ambiguity avoidance.1
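
Why ambiguous utterances promise more information gain can be illustrated by computing the expected reduction in uncertainty about the listener's preferred value for each candidate utterance. This is a minimal sketch, not the authors' model: the scene is a hypothetical one chosen so that "green", "striped", and "cloud" are ambiguous and "circle", "blue", and "solid" are not, and the listener model with its `noise` parameter is an assumption.

```python
import math

# Hypothetical scene (color, shape, texture), chosen only to match the
# ambiguous/unambiguous split of utterances described in the text.
OBJECTS = [
    ("green", "cloud", "striped"),
    ("green", "cloud", "solid"),
    ("blue", "circle", "striped"),
]


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def choice_probs(utterance, preferred, objects, noise=0.05):
    """Assumed listener: preferred-value objects get weight 1, others `noise`."""
    candidates = [o for o in objects if utterance in o]
    weights = [1.0 if preferred in o else noise for o in candidates]
    total = sum(weights)
    return {o: w / total for o, w in zip(candidates, weights)}


def expected_information_gain(utterance, objects, noise=0.05):
    """Expected entropy reduction over the preferred value (uniform prior)."""
    values = sorted({v for o in objects for v in o})
    prior = {v: 1.0 / len(values) for v in values}
    likelihood = {v: choice_probs(utterance, v, objects, noise) for v in values}
    gain = 0.0
    for choice in (o for o in objects if utterance in o):
        p_choice = sum(prior[v] * likelihood[v][choice] for v in values)
        post = {v: prior[v] * likelihood[v][choice] / p_choice for v in values}
        gain += p_choice * (entropy(prior) - entropy(post))
    return gain


UTTERANCES = ["green", "striped", "cloud", "circle", "blue", "solid"]
gains = {u: expected_information_gain(u, OBJECTS) for u in UTTERANCES}
for utterance, gain in gains.items():
    print(f"{utterance:8s} {gain:.3f} bits")
```

In this sketch an unambiguous utterance leaves the listener only one consistent object, so the observed choice cannot shift the posterior and the expected gain is zero; only the ambiguous utterances yield a positive expected gain.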

Achimova et al. (2022) articulate a hypothesis about how speakers reason about choice
priors in the context of ambiguity, a hypothesis in the form of a computational cognitive model
formulated within the Rational Speech Act modeling framework (Goodman & Frank, 2016).
While the authors found support for their hypothesis in terms of the model’s ability to
quantitatively predict human behavior in the experimental tasks, the model makes an
interesting and as yet untested prediction: when observing multiple ambiguity resolution
trials, participants should be able to gain even deeper insights into the (potentially complex)
choice priors that the listener may use to resolve cases of ambiguity. We explore this
expectation in the current work. In particular, we expected that participants would be able to both
integrate gained knowledge over subsequent trials and choose ambiguous utterances in a
more strategic manner when in a multi-trial setting. Moreover, we expected that the
participants who choose maximally effective ambiguous utterances would also learn more from the
consequent ambiguity resolution behavior.

To test these expectations, we asked to what extent participants can learn a more complex
hierarchy of choice priors when experiencing four subsequent signaling game interaction trials.
Moreover, we asked whether participants’ inference success could benefit from enabling active
utterance choices. Over the course of two experiments, we show that (i) multi-trial learning
about choice priors is possible (Exp. 1 & 2), (ii) the inference process suffers from a recency bias
(Exp. 1 & 2), (iii) some participants manage to actively choose ambiguous utterances in search
of information gain about choice priors (Exp. 2), and (iv) participants indeed learn more about
the listeners’ choice priors when they actively pursue ambiguous utterances (Exp. 2).

EXPERIMENT 1: ITERATED PRIOR INFERENCE

First, we extend the information-foraging experimental set-up from Achimova et al. (2022) to a
multi-trial setting, seeing whether participants are able to learn about the (potentially-complex)

1 See Achimova et al. (2022) for a full discussion of the variability in utterance choice strategies across par-
ticipants and across several experiments.


Figure 3. Sample trial for Experiment 1.

priors of conversation partners in the context of ambiguous utterances.2 Rather than the
single-trial design of Achimova et al.’s (2022) experiment, here participants are exposed
to four trials’ worth of interpretation behavior.

Material and Methods

Participants. We collected data online using the Prolific crowd-sourcing platform. Participants
received £1.3 as compensation, and the experiment lasted approximately 9 minutes (mean =
8.58 minutes, median = 7.71 minutes). The experimental protocol was approved by the
Psychology Department Ethics Committee at the University of Tübingen. We collected data
from 55 participants.

Design. Participants completed 4 blocks of trials, each containing 4 trials. Within a block, we
kept the simulated listener stable. Each listener had a name and an avatar. According to the test
scenario, the listener picked an object that fit the description she heard, and she always picked
her “favorite” shape, texture, or color (Figure 3). The task of the participant was to infer the
preferences of the listener along a particular dimension: color, shape, or texture. To indicate
the preferences, participants adjusted the sliders corresponding to the levels of the target
property. For example, if a participant’s task was to infer shape preferences (as in Figure 3), she was
asked to adjust the sliders “cloud”, “circle”, and “square”. At the end of the block, we provided
feedback to the participants showing whether they inferred the preferences of the listener
correctly. After that, participants proceeded to the next block.

2 Experiment 1 is available at https://cognitive-modeling-experiments.uni-tuebingen.de/publix/10/start
?batchId=17&generalMultiple.


The experiment featured two types of learning scenarios. In the a > b > c blocks, it was
possible to learn the full preference hierarchy of the simulated listener upon observing their
ambiguity resolution behavior. Over the course of four trials, participants saw scenarios that
allowed learning that a is preferred over b and b is preferred over c. The a > c pair was never
explicitly presented in the experiment, and thus participants were invited to make the
transitivity inference themselves. Thus, if the task was to infer color preferences and the simulated
listener preferred red over green and green over blue objects, critical trials showed the
listener’s choice for each of these pairs. Partial hierarchy blocks, or a > b, c blocks, allowed
participants to learn that one feature value was preferred to two other values, but there
was no evidence for the relative preference of b and c. In other words, participants saw
explicit evidence for both of the pairs a > b and a > c, but no evidence for the relationship
between b and c.

Each block contained four trials: two critical ones and two fillers. Filler trials differed in their
informativity. Redundant fillers provided the same information that was already presented in
critical trials, offering additional evidence to the participants to test their hypotheses.
Uninformative fillers featured scenarios where no learning about priors was possible. For example, it is
not possible to infer any preferences when the chosen utterance is unambiguous, or when
an utterance is ambiguous but applies to objects that do not differ in their target feature value
(e.g., the task is to infer color preferences, and the utterance “round” applies to 2 objects that
are both red).

Thus, crossing two types of learning scenarios and two types of filler trials yielded four types
of experimental blocks. Each participant completed all four blocks of trials; the block order
was randomized.

Results

Inference success. We begin presenting the results by identifying how often participants were
able to infer the most preferred feature value (i.e., a) in different blocks of trials upon observing
referential ambiguity resolution. Participants indicated the inferred preferences by adjusting
slider values. To convert slider values into hierarchies, we simply ordered the slider inputs.
If a participant assigned a value of 0.8 to a, 0.5 to b, and 0.1 to c, we recorded the inferred
hierarchy as a > b > c. Thus, we evaluated for the last trial in each block whether a participant
rated the property a higher than the properties b and c. Figure 4 plots success at inferring the
preferred feature value by block type. The results of a generalized linear mixed effects model
predicting preferred value inference by filler type (redundant vs. uninformative) and hierarchy
(a > b > c vs. a > b, c) with random intercepts for participants demonstrate that participants
were more successful in inferring the preferred value of the target feature in the simpler a > b, c
blocks compared to the more complicated a > b > c blocks (β = 2.1892, SE = 0.397, z = 5.509,
p < 0.001). Moreover, participants identified the correct preferred value less often when the
fillers were uninformative compared to redundant fillers, since the latter provided confirmatory
evidence (β = −1.132, SE = 0.352, z = −3.221, p < 0.01).

Figure 4. Experiment 1: Proportion of blocks where the most preferred value has been identified
correctly. Learning success increases when redundant information is provided. Participants are less
accurate when they infer priors in the a > b > c blocks compared to a > b, c blocks.

Integration of evidence across trials. Our main question in Experiment 1 was whether
participants were able to integrate the priors learned across a series of trials or whether they relied
only on single trial evidence instead. For the first trial, the trial evidence and the available
evidence are the same. However, for the second trial, the available evidence diverges from the
trial evidence: the available evidence incorporates what could have been learned from the
previous trials. Table 1 illustrates the difference between trial evidence and available
evidence. Table 1 also provides examples of accumulated evidence, or the preference
hierarchy indicated by a participant’s slider ratings on a given trial.
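
The conversion from slider values to an inferred hierarchy described under "Inference success" can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and writing tied sliders as `b, c` is an assumption, since the text does not say how ties were handled.

```python
def sliders_to_hierarchy(sliders):
    """Order slider inputs from highest to lowest and render them as a hierarchy.

    Assumption: equal slider values are rendered as a tie ("b, c").
    """
    ordered = sorted(sliders.items(), key=lambda kv: kv[1], reverse=True)
    parts = [ordered[0][0]]
    for (_, prev_value), (name, value) in zip(ordered, ordered[1:]):
        parts.append((", " if value == prev_value else " > ") + name)
    return "".join(parts)


def rates_highest(sliders, target):
    """True if `target` is rated strictly higher than every other value."""
    return all(sliders[target] > v for k, v in sliders.items() if k != target)


full = sliders_to_hierarchy({"a": 0.8, "b": 0.5, "c": 0.1})
partial = sliders_to_hierarchy({"a": 0.9, "b": 0.3, "c": 0.3})
print(full)     # a > b > c
print(partial)  # a > b, c
```

`rates_highest` corresponds to the check of whether a participant rated property a higher than b and c on the last trial of a block.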

To assess the rates at which participants rely on evidence collected in previous trials, we
first compared what relationship between the feature values a, b, and c the participants
inferred (i.e., their accumulated evidence) and what relationship could in principle have been
inferred given the set of trials a participant saw in that block (i.e., their available evidence). We
assigned a value of 1 as their accumulated evidence score if a participant’s accumulated
evidence matched the available evidence, suggesting that they successfully incorporated the
information they previously learned; we assigned a value of 0 if a participant’s accumulated
evidence did not match their available evidence, suggesting that they failed to integrate the
evidence from the previous trial. Then, for each participant we calculated an average
accumulated evidence score taking into account their performance either across all 16 trials (four
blocks of four trials each) or across blocks with similar evidence type (i.e., a > b > c blocks vs.
a > b, c blocks). This score reflects whether participants systematically integrated evidence
throughout (portions of) the experiment. Additionally, we calculated the proportion of trials
in which participants successfully inferred the priors just based on the information available in

Table 1. Trial evidence vs. available evidence and the corresponding accumulated evidence score.
True hierarchy: a > b > c

Trial   Trial evidence   Available evidence   Accumulated evidence   Accumulated evidence score
1       a > b            a > b                a > b                  1
2       b > c            a > b > c            a > b > c              1


Figure 5. Experiment 1: Density plots over the proportion of evidence-respecting preference
inference trials, dependent on the available trial evidence (bottom right; average trial evidence
score) or accumulated evidence (other panels; average accumulated evidence score) in all blocks
(bottom) or block-respective (top). While participants take the individual trial evidence well into
account, more errors can be detected in the accumulated evidence and in particular in the trials
where a more complex hierarchy (a > b > c) can be learned.

that trial. We refer to this metric as trial evidence and use it as a control showing task
engagement.
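
The distinction between trial evidence, available evidence, and the accumulated evidence score can be sketched in code. This is a minimal sketch under assumptions, not the authors' scoring script: available evidence is modeled as the transitive closure of all pairs shown so far, and a rating is taken to "match" when every available relation is reflected in the slider ordering. The exact matching criterion used in the paper is not specified in the text.

```python
from itertools import product


def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, v) in product(list(closure), repeat=2):
            if y == u and (x, v) not in closure:
                closure.add((x, v))
                changed = True
    return closure


def available_evidence(trial_evidence):
    """What could in principle be inferred after each trial of a block."""
    seen, per_trial = set(), []
    for pairs in trial_evidence:
        seen |= set(pairs)
        per_trial.append(transitive_closure(seen))
    return per_trial


def accumulated_evidence_score(sliders, available):
    """1 if the slider ordering respects every available relation, else 0.

    Assumed criterion: the slider ordering may contain extra relations.
    """
    inferred = {(x, y) for x in sliders for y in sliders if sliders[x] > sliders[y]}
    return int(available <= inferred)


# Block with true hierarchy a > b > c: trial 1 shows a > b, trial 2 shows b > c.
block = [[("a", "b")], [("b", "c")]]
avail = available_evidence(block)
integrated = accumulated_evidence_score({"a": 0.8, "b": 0.5, "c": 0.1}, avail[1])
recency_error = accumulated_evidence_score({"a": 0.4, "b": 0.9, "c": 0.1}, avail[1])
print(integrated, recency_error)
```

Averaging these 0/1 scores over a participant's trials gives the average accumulated evidence score; the second rating in the example shows the kind of response that ignores earlier evidence about a.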

Figure 5 shows the distribution of participants’ accumulated evidence scores across
different blocks of trials. The two upper panels contrast blocks where the full hierarchy (a > b > c)
could have been learned vs. blocks where only partial information was available (a > b, c).
The probability mass on the right side of each panel corresponds to participants who
successfully integrated evidence. A linear mixed effects model analysis predicting the accumulated
evidence scores (binomial variable) by block type confirms that participants were more
successful at integrating evidence across the blocks of trials for the a > b, c blocks compared to
a > b > c blocks (β = 0.375, SE = 0.028, t = 13.45, p < 0.001). This effect is expected: the partial
hierarchy is cognitively simpler since the participants do not need to make any transitivity
inferences and can simply rely on explicit evidence they register in a series of trials.

Figure 5 also provides a more general measure of evidence accumulation success by looking at
the performance in all blocks together (bottom left panel). This distribution has most of its mass
on the right, suggesting that most of the participants did successfully accumulate evidence across
a series of trials more than half of the time. This result confirms that prior inference upon
observing ambiguity resolution extends to multi-trial scenarios.

Finally, the bottom right panel of Figure 5 shows an example of a distribution when most
participants achieve the highest score: trial evidence. Trial evidence concerns whether a participant
uses the evidence available in a given trial on that trial. A high trial evidence score signals that
participants paid attention throughout the experiment and performed the task as expected.

In sum, the distributions of trial evidence and accumulated evidence scores demonstrate that
a) participants successfully infer the preferences of the simulated listener within a single
trial (trial evidence); b) they integrate the inferred information across a series of trials
(accumulated evidence); and c) they perform better in blocks with partial rather than full hierarchy
available (block effect on accumulated evidence score). In the next subsection, we will take a
closer look at those cases where participants fail to integrate evidence across trials.

Analysis of errors. The performance on a > b > c blocks (upper left panel of Figure 5) shows that
several participants made errors in accumulating evidence across trials. To better understand
these errors, we ask whether the presentation order of trials affected their inference success:
perhaps learning that b > c after learning that a > b made the transitivity inference that a > c
more difficult. In order to assess participants’ inference success, we calculated the total
inference score that participants achieved at the end of a block. The total inference score
was calculated by assigning a value of 1 for every pair of the hierarchy identified correctly,
namely a > b, b > c, and a > c, then summing over those values for the trials that made up a
block. Table 2 shows several examples of scoring.
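
The pairwise scoring rule can be sketched as a short function. This is a minimal sketch, assuming the score is computed from one set of slider ratings and that a tie between two values does not count as a correctly identified pair; the function and constant names are hypothetical.

```python
# Pairs implied by the true hierarchy a > b > c.
TRUE_PAIRS = [("a", "b"), ("b", "c"), ("a", "c")]


def total_inference_score(sliders, true_pairs=TRUE_PAIRS):
    """Count the true pairs whose order is reproduced by the slider ratings."""
    return sum(1 for higher, lower in true_pairs if sliders[higher] > sliders[lower])


examples = {
    "a > b > c": {"a": 0.9, "b": 0.5, "c": 0.1},
    "a > b, c": {"a": 0.9, "b": 0.3, "c": 0.3},
    "b > a > c": {"a": 0.5, "b": 0.9, "c": 0.1},
    "c > a > b": {"a": 0.5, "b": 0.1, "c": 0.9},
    "c > b > a": {"a": 0.1, "b": 0.5, "c": 0.9},
}
scores = {label: total_inference_score(s) for label, s in examples.items()}
for label, score in scores.items():
    print(f"{label:10s} {score}")
```

The slider values in `examples` are made up; only their ordering matters for the score.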

We can now scrutinize the performance in a > b > c blocks by looking at the effect of trial
order on the total inference scores. To be more precise, we are interested in the effect of early
vs. late presentation of evidence about the most preferred feature value. Comparing the total
inference scores in blocks with a > b versus b > c evidence appearing last in a block (Figure 6),
we find a marginal effect of trial order ( β = −0.196, SE = 0.105, t = −1.86, p = 0.064)—a trend
indicating that participants may have performed less well when they saw the information
about the most preferred value early in the block (b > c blocks). In this analysis, we treated
the total inference score as the dependent variable, the type of evidence block as an indepen-
dent variable, and included random intercepts for participants.

Looking qualitatively at the errors, we see that after receiving b > c as the final piece of
evidence, some participants rated the middle (b) value higher than the previously learned most
preferred value (a), since the latter did not appear in the final trial. In other words, memory
limitations may be responsible for less successful information integration in blocks where the
information about the most preferred feature value came early in the block.

The difficulties in learning the hierarchy in the a > b > c blocks might have been addition-
ally caused by a confound introduced by the wording of the task. The instructions specified
that the listener always chooses her favorite feature value. These instructions were added to
signal that the listener’s object choice is deterministic. Thus, “favorite” is predicted to be
interpreted as favorite among the available options. The meaning of words is commonly restricted
by the relevant context, or, to use linguistic terminology, by the current event or situation
(Barwise & Perry, 1983; Kratzer, 2021). However, we cannot exclude the possibility that
participants interpreted the predicate “favorite” as applying globally to the whole block of four

Table 2. Examples of total inference score calculation. True hierarchy: a > b > c

Inferred hierarchy   a > b   b > c   a > c   Total inference score
a > b > c            1       1       1       3
a > b, c             1       0       1       2
b > a > c            0       1       1       2
c > a > b            1       0       0       1
c > b > a            0       0       0       0

Figure 6. Experiment 1: Normalized histograms over total inference scores for the blocks in
which the full a > b > c preference hierarchy could be learned, dependent on whether the
information about a > b (left) or b > c (right) was available in the last trial(s). The total inference score
counts the number of correctly ordered feature pairs in the inferred preference hierarchy. The results
imply a tendency towards a recency bias: when information about b > c arrived last, participants
generated more preference ordering errors (i.e., incorrectly ranking b over a).

trials, rather than to the current trial only. This interpretation would yield confusion when b > c
evidence appeared last, as participants may have concluded that b was in fact the absolute
favorite feature value.

Experiment 1: Summary. In Experiment 1, we asked whether participants can a) infer choice
priors of the listener upon observing how she resolves referential ambiguity and b) integrate
information across a series of trials, manipulating the information that
was available to the participant. We replicate the results of Achimova et al. (2022) and confirm
that participants are indeed capable of inferring the choice priors of others upon observing a
choice of an object in a situation where the utterance applies to more than one object. We further
show that participants are more successful at integrating information across a series of trials for
blocks with partial hierarchy, that is, blocks with less information to integrate. An analysis of the
way that participants used trial evidence, and of the errors that resulted, suggests that errors are
attributable to recency effects in information integration and to potentially misleading instructions.

EXPERIMENT 2: COMBINED DESIGN

With evidence that speakers can use ambiguity-resolution behavior to infer choice priors in
multi-shot signaling game scenarios, we next explore the utterance-selection behavior of
speakers, seeing whether participants can strategically select ambiguous utterances in an
attempt to learn about the choice priors of their listeners, and whether selecting ambiguous
utterances leads to an increase in learning. In the process, we also explore whether the
increased task engagement necessitated by utterance selection leads to better learning relative
to Experiment 1, where participants encountered pre-selected utterances.

Figure 7. Sample trial for Experiment 2.

Material and Methods

Experiment 2 featured a combined utterance-selection and choice-prior-inference design.3 On
each trial, the participants first selected an utterance, then observed a choice of an object by
the listener in response to the selected utterance, and then adjusted the sliders indicating the
inferred listeners’ choice priors. The experiment was carried out on the Amazon Mechanical
Turk crowd-sourcing platform.

Participants. 100 participants completed the experiment and received £1.5 as compensation.
The experiment design and participant compensation were approved by the Psychology
Department Ethics Committee at the University of Tübingen. We excluded data from two
participants who self-identified as non-native speakers and from three other participants who
reported that they were confused and did not fully understand the instructions. The experiment
lasted approximately 9 minutes (mean = 9.3 minutes, median = 8.8 minutes).

Design. In each trial, the participants first selected an utterance and then watched a simulated
listener choose an object. Participants completed 4 blocks of trials, each containing 4 trials.
The simulated listener was kept constant within a block. Each subsequent block featured a
different simulated listener. For the utterance-choice portion of the task, participants encoun-
tered combinations of objects that could potentially let the speakers infer the choice priors of
the listener. Thus, we excluded scenarios with three identical objects, since no utterance can
lead to learning about the choice priors with those objects. We also avoided scenarios with all
objects being unique: if the objects do not share any properties, no utterance is ambiguous,
and therefore no learning is possible.

Once the participant selected an utterance, the simulated listener picked an object according
to her implicit preferences, which always represent a full hierarchy: a > b > c. For example,
if she preferred the solid texture to striped and polka-dotted in Figure 7, she would select a
solid object if this choice was available. If solid objects did not appear in the scene, the listener
would pick the next preferred object according to the implicit a > b > c hierarchy. This process
was deterministic: the listener always picked an object with the most preferred feature value
available given the current scene.
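
The deterministic choice rule described above can be illustrated with a minimal sketch. The object encoding (a dictionary from feature dimension to value), the function name `listener_choice`, and the example scene are assumptions made for illustration; they are not taken from the experiment code. We also assume that the listener only considers objects to which the utterance applies.

```python
from typing import Dict, List

Obj = Dict[str, str]  # hypothetical encoding: feature dimension -> feature value


def listener_choice(scene: List[Obj], utterance: str,
                    target_dim: str, ranking: List[str]) -> Obj:
    """Pick the utterance-conforming object with the most preferred target value.

    `ranking` lists the target feature values from most to least preferred
    (the hierarchy a > b > c).
    """
    # Assumption: only objects that the utterance applies to are candidates.
    candidates = [obj for obj in scene if utterance in obj.values()]
    if not candidates:
        raise ValueError("utterance matches no object in the scene")
    # A lower index in `ranking` means a more preferred value; if several
    # candidates share the best value, the first one in the scene is returned.
    return min(candidates, key=lambda obj: ranking.index(obj[target_dim]))


scene = [
    {"shape": "circle", "color": "red", "texture": "striped"},
    {"shape": "square", "color": "red", "texture": "polka-dotted"},
    {"shape": "cloud", "color": "blue", "texture": "solid"},
]
ranking = ["solid", "striped", "polka-dotted"]  # a > b > c on the texture dimension
choice = listener_choice(scene, "red", "texture", ranking)
```

In this illustrative scene, the utterance "red" excludes the solid object, so the listener falls back on the next preferred texture among the red objects.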

3 Experiment 2 is available at https://cognitive-modeling-experiments.uni-tuebingen.de/publix/9/start?batchId
=18&generalMultiple.

After observing the object choice, participants then adjusted sliders to indicate their beliefs
about the listener’s choice priors. The information gain potential of this second part of the trial
was modulated by the participants’ choices of an utterance in the first part. If participants
chose ambiguous utterances that picked out multiple objects that differed in their target feature
value, such a situation offered the potential for learning. However, if an unambiguous
utterance was chosen, no choice priors of the listener could be learned because the object choice
would be uninformative.
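
To make the notion of an informative trial concrete, the following sketch derives the pairwise preference evidence that a single trial reveals. The function name `trial_evidence`, the object encoding, and the example scene are hypothetical; the sketch assumes a deterministic listener, so that the chosen target value is preferred over every other target value among the utterance-conforming objects.

```python
from typing import Dict, List, Set, Tuple

Obj = Dict[str, str]  # hypothetical encoding: feature dimension -> feature value


def trial_evidence(scene: List[Obj], utterance: str,
                   target_dim: str, chosen: Obj) -> Set[Tuple[str, str]]:
    """Return pairs (x, y), read as 'x is preferred over y', revealed by one trial."""
    candidates = [obj for obj in scene if utterance in obj.values()]
    chosen_value = chosen[target_dim]
    # The chosen value beats every *different* target value among the candidates.
    return {
        (chosen_value, obj[target_dim])
        for obj in candidates
        if obj[target_dim] != chosen_value
    }


scene = [
    {"shape": "circle", "color": "red", "texture": "striped"},
    {"shape": "square", "color": "red", "texture": "polka-dotted"},
    {"shape": "cloud", "color": "blue", "texture": "solid"},
]
# Ambiguous utterance: two red objects with different textures.
ambiguous = trial_evidence(scene, "red", "texture", scene[0])
# Unambiguous utterance: only one blue object, so nothing can be learned.
unambiguous = trial_evidence(scene, "blue", "texture", scene[2])
```

An ambiguous utterance whose referents all share the same target value also yields an empty evidence set under this sketch, matching the requirement that the referents differ in their target feature value.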

Results

Unlike in Experiment 1, where we controlled the structure of blocks by either presenting the
full information about the hierarchy (a > b > c) or only partial information (a > b, c) over the
range of four trials, in Experiment 2 the type of learning scenario was determined by the par-
ticipant’s utterance choices. Ambiguous informative utterances created learning opportunities,
while unambiguous utterances did not permit any inferences. Despite the fact that we could
not systematically manipulate the learning scenario (i.e., block type) as an experimental
parameter, we were able to analyze the resulting trial configurations post-hoc. We identified
the blocks where participants could have learned the full preference hierarchy a > b > c or the
partial hierarchy a > b, c, given the utterances that they chose. The question we asked was
whether they indeed succeeded in inferring the choice priors.
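
One way to sketch such a post-hoc classification is to take the pairwise evidence that a block made available and close it under transitivity. The function names and the labels "full", "partial", and "none" are illustrative assumptions, not the authors' analysis code; the sketch further assumes that evidence from a deterministic listener is never contradictory.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (x, y) is read as 'x is preferred over y'


def transitive_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """Add every pair implied by transitivity (x > y and y > z imply x > z)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (x, y), (u, v) in product(list(closure), repeat=2):
            if y == u and (x, v) not in closure:
                closure.add((x, v))
                changed = True
    return closure


def block_type(pairs: Iterable[Pair], n_values: int = 3) -> str:
    """Classify what a block's evidence allowed a participant to learn."""
    closure = transitive_closure(pairs)
    n_possible = n_values * (n_values - 1) // 2  # 3 ordered pairs for a > b > c
    if len(closure) == n_possible:
        return "full"      # e.g., a > b and b > c together give a > b > c
    if closure:
        return "partial"   # e.g., a > b, c: nothing is known about b versus c
    return "none"          # only uninformative utterances were chosen


full = block_type({("a", "b"), ("b", "c")})
partial = block_type({("a", "b"), ("a", "c")})
```

Note that a > c never has to be observed directly in a full-hierarchy block; it follows from a > b and b > c, which is exactly the transitive step that participants have to perform.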

Inference success. Just like in Experiment 1, we start by examining whether participants were more successful in inferring the preferred value a in a > b, c vs. a > b > c blocks. We again coded whether they identified the preferred value correctly at the end of each block as a binomial variable and treated the type of block as the independent variable. We then fit the data with a generalized linear mixed model with random intercepts for participants. The analysis revealed that, similarly to Experiment 1, participants were more successful in identifying the preferred value in the partial hierarchy blocks (a > b, c: mean score 0.831, a > b > c: mean score 0.722; β = 0.977, SE = 0.378, z = 2.580, p < 0.01). We also registered a significant interaction of experiment and type of block (β = −1.284, SE = 0.484, z = −2.651, p < 0.01): the performance was comparable in a > b, c blocks over the two experiments (mean1 = 0.86; mean2 = 0.83), while the scores diverged to a greater extent between experiments in a > b > c blocks (mean1 = 0.5; mean2 = 0.72). Overall, participants were more successful in identifying the preferred value in Experiment 2 (β = 1.048, SE = 0.325, z = 3.220, p < 0.01).

Integration of evidence across trials. Next, we look at participants’ ability to infer the full hierarchy of preferences, looking at accumulated evidence scores which signal whether participants used the available evidence from the previous trials to update their inference of the listener’s choice priors. Figure 8 plots the distribution of participants’ accumulated evidence scores and the trial evidence score (bottom right panel). Unlike in the analysis reported above where we focused on inferring the preferred value at the end of the block, here we use all the trials of each block. We analyze whether the hierarchy that participants indicate at each step corresponds to the information about choice priors that was in principle available to them. The density plots in Figure 8 illustrate that participants were more successful in integrating evidence in the a > b, c blocks compared to the a > b > c blocks: the purple distribution in the top right is skewed to the right, while the gray distribution in the top left distributes the density mass more evenly. This interpretation is confirmed by the results of a generalized mixed effects model. Our model predicted binary accumulated evidence scores by the type of block; we also fit random intercepts for participants. The analysis reveals that, as in Experiment 1, participants

Figure 8. Experiment 2: Density plots over the proportion of evidence-respecting preference
inference trials, dependent on the available trial evidence (bottom right) or accumulated evidence
in all blocks (bottom left) or block-respective (top panels). Compared with Experiment 1 (cf.
Figure 5), the inferred accumulated information is of higher quality.

integrated evidence more successfully in a > b, c blocks (mean = 0.837) compared to full
hierarchy a > b > c blocks (mean = 0.557; β = 2.843, SE = 0.328, z = 8.658, p < 0.001).

Analysis of errors. In Figure 9, we look at the total inference scores for a > b > c blocks depending
on whether the a > b or b > c information was elicited later in the trials. We calculate total
inference scores by evaluating which of the pairs from the hierarchy a > b > c were identified correctly.
For each of the pairs {a, b}, {b, c}, and {a, c}, we assign a value of 1 if the pair was ordered
correctly and then sum the values. The first thing to note is that these inference scores are overall
higher for Experiment 2 compared to Experiment 1: they are skewed to the right regardless of which
piece of evidence came later, with almost 60% of participants inferring the correct full hierarchy
when experiencing a > b > c blocks. To assess this effect quantitatively, we fit the data with a linear
mixed model, treating the inference score as the dependent variable, and the experiment (1 vs. 2) as
an independent variable. The random effect structure included random intercepts for participants.
The analysis revealed that participants achieved higher inference scores in a > b > c blocks for
Experiment 2 (mean = 2.15) compared to Experiment 1 (mean = 1.94) (β = 0.216, SE = 0.08,
t = 2.418, p = 0.017).
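
The total inference score can be sketched as a count of correctly ordered pairs. The function below is an illustration rather than the authors' analysis code: it assumes that a participant's response is available as one slider rating per feature value (higher meaning more preferred) and that tied ratings do not count as a correctly ordered pair.

```python
from itertools import combinations
from typing import Dict, Sequence


def total_inference_score(ratings: Dict[str, float],
                          true_order: Sequence[str] = ("a", "b", "c")) -> int:
    """Count the pairs of the true hierarchy that the ratings order correctly.

    `true_order` lists the feature values from most to least preferred.
    """
    score = 0
    # combinations() yields (a, b), (a, c), (b, c): the first member of each
    # pair is the one that is truly preferred.
    for preferred, dispreferred in combinations(true_order, 2):
        if ratings[preferred] > ratings[dispreferred]:  # assumption: ties score 0
            score += 1
    return score


# Hypothetical slider ratings encoding four strict inferred hierarchies.
scores = {
    "a > b > c": total_inference_score({"a": 3, "b": 2, "c": 1}),
    "b > a > c": total_inference_score({"a": 2, "b": 3, "c": 1}),
    "c > a > b": total_inference_score({"a": 2, "b": 1, "c": 3}),
    "c > b > a": total_inference_score({"a": 1, "b": 2, "c": 3}),
}
```

Under these assumptions, a fully correct hierarchy scores 3 and a fully reversed one scores 0, so the recency error discussed for Experiment 1 (ranking b over a) costs exactly one point.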

Strategic ambiguity. We hypothesized that learning success (i.e., was the full choice prior
hierarchy inferred?) depends on the quality of utterances that the participants selected. We define
utterance quality in a trial based on a three-step procedure. First, we identify whether the
utterance that a person selected is ambiguous; in other words, does the utterance apply to more than
one object? Second, we check the extent to which the utterance-conforming objects differ on the
target feature dimension. For example, if the speaker selected the utterance “red” and there are
two red objects in the scene, we check whether they differ in their target feature values (if the
target feature is “shape”, we check whether they are both clouds, circles, or squares). If the
utterance “red” picks out a red circle and a red square, it receives a score of 2; if “red” applies to a red

Figure 9. Experiment 2: Histograms of total inference scores for the blocks in which the full a >
b > c preference hierarchy could be learned, dependent on whether information about a > b or
b > c was available in the last trial(s). In contrast to Experiment 1 (cf. Figure 6), the two densities
hardly differ, indicating a lower recency bias and thus a better accumulative integration of information.

circle, a red square, and a red cloud, it receives a score of 3. Unambiguous utterances receive
the score 1. Third, we evaluate how the utterance compares to the best possible utterance in that
trial. This comparison transforms the score calculated in the previous step into a value between 0
(worst) and 1 (best). This transformed value is the utterance quality score. The utterance quality
score reflects whether a person chose an utterance that was ambiguous (applied to more than 1
object), informative (it applied to objects that differ on the target dimension), and optimal (there
is no other utterance that would allow learning about more target feature values).
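
The three-step procedure can be sketched as follows. The text does not state the exact transformation used in the third step, so the rescaling `(raw - 1) / (best - 1)` is an assumption chosen only because it maps the worst utterance to 0 and the best one to 1; the function names, the object encoding, and the set of candidate utterances (every feature value present in the scene) are likewise hypothetical.

```python
from typing import Dict, List

Obj = Dict[str, str]  # hypothetical encoding: feature dimension -> feature value


def raw_score(scene: List[Obj], utterance: str, target_dim: str) -> int:
    """Steps 1 and 2: number of distinct target values among the referents."""
    values = {obj[target_dim] for obj in scene if utterance in obj.values()}
    if not values:
        raise ValueError("utterance matches no object in the scene")
    # Unambiguous utterances, and ambiguous ones whose referents share the
    # target value, receive a score of 1.
    return len(values)


def utterance_quality(scene: List[Obj], utterance: str, target_dim: str) -> float:
    """Step 3: compare the utterance with the best available one (assumed rescaling)."""
    utterances = {value for obj in scene for value in obj.values()}
    best = max(raw_score(scene, u, target_dim) for u in utterances)
    if best == 1:
        return 0.0  # assumption: no utterance in this scene permits learning
    return (raw_score(scene, utterance, target_dim) - 1) / (best - 1)


scene = [
    {"shape": "circle", "color": "red", "texture": "solid"},
    {"shape": "square", "color": "red", "texture": "solid"},
    {"shape": "cloud", "color": "blue", "texture": "solid"},
]
q_solid = utterance_quality(scene, "solid", "shape")  # refers to three shapes
q_red = utterance_quality(scene, "red", "shape")      # refers to two shapes
q_blue = utterance_quality(scene, "blue", "shape")    # unambiguous
```

In this illustrative scene, "red" is ambiguous and informative but not optimal, because "solid" would allow learning about all three shape values at once.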

We can now evaluate whether the utterance quality score is a predictor of the overall performance in the inference task. To get a metric of the performance, we assess whether participants inferred the full hierarchy of preferences a > b > c. Like in Experiment 1, we assign a score of 1 for every relation between values inferred correctly. For example, a participant who inferred the relations a > b, b > c, a > c receives the total inference score of 3. To plot the total inference scores against utterance quality scores, we first calculated average scores for every person depending on the trials that they saw in the experiment. Figure 10 plots average inference scores against average utterance quality scores, showing that when participants strategically selected more ambiguous utterances that created the potential for learning, they indeed were more likely to learn the choice priors better; this result is confirmed by a linear model where we treated total inference scores averaged per person as the dependent variable and the corresponding utterance quality scores as the independent variable (β = 1.599, SE = 0.1752, t = 9.124, p < 0.001). Utterance quality explains 46% of variance in the learning scores.

The utterance quality scores we calculated above depend on two interrelated properties of the utterances: their ambiguity and their informativity. When calculating the utterance quality score, we rewarded the choice of ambiguous utterances. However, since all trials contained at least one ambiguous utterance, it is possible that participants picked ambiguous utterances by chance rather than strategically. In order to assess the strategic aspect of utterance choice, we calculated the chance level of ambiguous utterances for each participant depending on the trials they saw, and then subtracted the number of ambiguous utterances predicted by chance from the number of ambiguous utterances each participant selected. Figure 11 shows the resulting difference scores: it plots participant IDs on the x-axis and their difference scores on the y-axis. Color coding reflects the magnitude and the polarity of the score. We observe that, while some participants strategically chose non-ambiguous utterances (data points below the reference line), 84% of the data points fall above the reference line, and darker color coding marks those participants who strategically and systematically chose ambiguous utterances. As a conservative estimate of the proportion of participants who strategically chose ambiguous utterances, we see that 55% have difference scores above 5.

Figure 10. Experiment 2: Utterance quality correlates with the total inference scores. Thus, more ambiguous utterances indeed enable participants to learn more about the hidden preference hierarchy in our task.

Figure 11. Experiment 2: Difference scores by participant. The difference score measures strategic usage of ambiguity. Values below zero imply active ambiguity avoidance, while values significantly above zero imply strategic ambiguity choices.

Experiment 2: Summary. In sum, higher rates of selecting informative ambiguous utterances are associated with greater success in learning the full hierarchy of choice priors of the listener, as demonstrated by the comparison of inference scores across experiments. Moreover, a group of participants chooses ambiguous utterances systematically rather than randomly, suggesting that they are strategically choosing those utterances to improve their learning chances.

GENERAL DISCUSSION

In this paper, we focused on the factors that determine the success of choice prior inference in a situation of observing a referential choice. We have demonstrated that participants are capable of inferring not only simple priors upon observing a single act of disambiguation, but also more complex, hierarchical choice priors by accumulating evidence over multiple trials in a rational manner. Experiment 1 revealed that, despite the low overall number of trials (only four trials in each block), many participants managed to successfully integrate the available information about preferences. This process was easier when only a simple a > b, c feature hierarchy had to be learned. The fact that redundant information about the simulated choice priors
helped to get the choice prior hierarchy correct indicates that the task was quite challenging. A
deeper analysis of those cases where participants failed to infer the relevant hierarchy showed
that some participants exhibited a recency bias—perhaps driven by the task instructions—
which led to overwriting previously encountered pair-wise choice prior differences: partici-
pants performed better at correctly concluding that a was the highest ranked option among
the relevant choices in the choice prior in blocks of trials where a > b information came last
compared to blocks that featured b > c trials as the last ones. Overall, the results of our first
experiment imply that iterative evidence accumulation is possible but challenging in the inves-
tigated signaling game scenario, perhaps because the scenario is somewhat artificial, but also
because our instructions may have been misleading. We thus moved on to a more active
social interaction scenario in Experiment 2.

Experiment 2 demonstrated that being able to play an active part in generating learning
scenarios—and thereby presumably being more engaged in the task—yielded higher inference
success. The data also revealed that the use of ambiguous utterances in the signaling game
scenario indeed allowed for learning about the listener’s priors. We observed that participants
were more likely to infer the correct priors if they used informative ambiguous utterances,
confirming that they are capable of strategically employing ambiguity as an epistemic tool.
Moreover, our results suggest that the observation of the full signaling game, including the active
utterance choice, helps participants to make use of the full Bayesian inference pipeline. This
conclusion is supported by the fact that the recency bias was much smaller in Experiment 2
compared to Experiment 1, particularly in the participants that managed to choose utterances
in a way that yielded sufficient information to extract the full a > b > c choice prior hierarchy.
Thus, Experiment 2 showed that task engagement can be enhanced when full signaling games
are played out pragmatically. Furthermore, it confirms that more complex hierarchies can be
learned iteratively over time by corroborating choice information over successive trials. In
the future, we plan to model the observed behavior by means of Achimova et al.’s (2022)
RSA-based utterance choice and choice prior-inference mechanisms, potentially contrasting

iterative Bayesian updating with reinforcement learning approaches (Ciranka et al., 2022;
Glasauer, 2019).

Despite the apparent simplicity of our task, we observed that memory limitations in some
circumstances likely prevented the successful integration of choice priors. Unlike in one of the
experiments reported in Baker et al. (2017), our participants received no prior information
about the structure of the relevant choice priors. Baker et al. informed their participants that
the agent preferred property a over property b, and properties a and b over property c, thus
providing prior expectations that might structure the inference process. The absence of this
information in our experiments might be partially responsible for lower inference scores for
the full choice prior hierarchy, particularly in Experiment 1 without active speaker engagement.
However, in Experiment 2 we show that enabling participants to set up their experiment
themselves, thus actively creating ambiguous instructive choice situations and observing the
ambiguity resolution behavior, facilitates choice prior inference. With at least 55% of
participants systematically selecting ambiguous utterances (their difference score was above 5 in
Figure 11), it appears that the multi-trial utterance-choice setting of our current task yields
greater rates of ambiguous utterance selection than the 26% of participants identified as having
done so in the single-trial experiment of Achimova et al. (2022).4 This comparison indicates
that the observation of the full signaling game trials may have better motivated the potential
utility of ambiguity. When participants could observe the listener’s choice of an object follow-
ing the utterance they selected, they were able to better anticipate in subsequent trials what
types of utterances are useful for learning. Compared to Experiment 1, consequent choice prior
inference was more successful, further supporting the benefit of setting up actual inference
experiments. While the participants did make inference errors in integrating the information
across a series of trials, their performance nevertheless remains relatively high given the num-
ber of trials that were available. Experiments that target the inference of more complex choice
priors that include seven rather than three values of the target feature and involve transitive
inference can include as many as 300–500 evidence trials (Ciranka et al., 2022).

Despite the observed increase in strategic utterance choices, our results still confirm that
actively engineering learning opportunities remains a complex task for some participants. In
this paper, we used the case of referential ambiguity to create a situation where a behavioral
choice may reflect the person’s priors. A further look into strategic ambiguity as a linguistic
phenomenon may in fact suggest a possible source of how such learning opportunities
develop. Theoretical models of dog-whistles (Henderson & McCready, 2017) and strategic
indirectness (Pinker et al., 2008) suggest that ambiguity may emerge as an epiphenomenon
of the speaker simultaneously pursuing a combination of information transfer and social goals.
More recent experimental evidence suggests that indirectness can also emerge when speakers
optimize social goals along with information transfer goals (Yoon et al., 2020). Regardless
of how referential choices emerge, the listener’s response reveals aspects of her choice prior.
The results presented here suggest that active engagement and iterative social exchanges can
increase the chance of inference success. Whether this is the case also in more natural social
interactions remains to be shown.

4 It is important to acknowledge that the procedure for calculating the proportion of participants who use
ambiguity strategically is not identical across the two studies. Achimova et al. (2022) used a modeling approach
and identified “strategic” participants based on the value of a parameter that regulates the choice of utterances
by scaling how important information gain is for a given person. In the present work, we conservatively identify
strategic ambiguity use by assessing whether a participant chose ambiguous utterances markedly more often
than would be expected by chance. However, in both cases, the selection of a cut-off point that separates
strategic ambiguity from its non-strategic use is to a certain degree arbitrary.

DATA AND MATERIALS AVAILABILITY

Data and analysis code are available at this OSF repository: https://osf.io/yn4wd/?view_only
=a723e0e89688475ea022cf59d2e3e9df.

ACKNOWLEDGMENTS

We would like to thank Johannes Bertram (University of Tübingen) for his help with experi-
ment implementation, data processing and analysis. Two anonymous reviewers provided
thoughtful comments and suggestions and allowed us to see the results in a new light.

FUNDING INFORMATION

The project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) via the Research Training Group 1808: Ambiguity – Production and Perception,
project number 198647426. Martin V. Butz is also a member of the Machine Learning Cluster
of Excellence, EXC number 2064/1 – Project number 390727645.

AUTHOR CONTRIBUTIONS

Asya Achimova: Conceptualization: Equal; Data curation: Lead; Formal analysis: Lead;
Methodology: Equal; Visualization: Lead; Writing – Original draft: Lead. Gregory Scontras:
Conceptualization: Equal; Formal analysis: Supporting; Visualization: Supporting; Writing – Original
draft: Equal. Ella Eisemann: Conceptualization: Equal; Data curation: Equal; Methodology:
Equal; Visualization: Supporting; Writing – Original draft: Supporting. Martin V. Butz:
Conceptualization: Equal; Formal analysis: Supporting; Funding acquisition: Lead; Methodology:
Equal; Supervision: Lead; Visualization: Supporting; Writing – Original draft: Equal.

REFERENCES

Achimova, A., Scontras, G., Stegemann-Philipps, C., Lohmann, J.,
& Butz, M. V. (2022). Learning about others: Modeling social
inference through ambiguity resolution. Cognition, 218, Article
104862. https://doi.org/10.1016/j.cognition.2021.104862,
PubMed: 34634532

Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017).
Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing. Nature Human Behaviour, 1(4), Article
0064. https://doi.org/10.1038/s41562-017-0064

Barwise, J., & Perry, J. (1983). Situations and attitudes. MIT Press.

Bratman, M. (1984). Two faces of intention. The Philosophical
Review, 93(3), 375–405. https://doi.org/10.2307/2184542

Bryant, P. E., & Trabasso, T. (1971). Transitive inferences and
memory in young children. Nature, 232(5311), 456–458.
https://doi.org/10.1038/232456a0, PubMed: 4937205

Ciranka, S., Linde-Domingo, J., Padezhki, I., Wicharz, C., Wu,
C. M., & Spitzer, B. (2022). Asymmetric reinforcement learning
facilitates human inference of transitive relations. Nature Human
Behaviour, 6(4), 555–564. https://doi.org/10.1038/s41562-021-01263-w,
PubMed: 35102348

Evans, O., Stuhlmüller, A., & Goodman, N. D. (2016). Learning the
preferences of ignorant, inconsistent agents. In V. Rus & Z.
Markov (Eds.), Proceedings of the 30th AAAI Conference on
Artificial Intelligence (pp. 323–329). AAAI Press.
https://doi.org/10.1609/aaai.v30i1.10010

Frank, M. C., & Goodman, N. D. (2012). Predicting pragmatic
reasoning in language games. Science, 336(6084), 998.
https://doi.org/10.1126/science.1218633, PubMed: 22628647

Franke, M., & Jäger, G. (2016). Probabilistic pragmatics, or why
Bayes’ rule is probably important for pragmatics. Zeitschrift für
Sprachwissenschaft, 35(1), 3–44. https://doi.org/10.1515/zfs-2016-0002

Glasauer, S. (2019). Sequential Bayesian updating as a model for
human perception. In S. Ramat & A. G. Shaikh (Eds.), Progress
in brain research (Vol. 249, pp. 3–18). Elsevier.
https://doi.org/10.1016/bs.pbr.2019.04.025, PubMed: 31325989

Goodman, N. D., & Frank, M. C. (2016). Pragmatic language
interpretation as probabilistic inference. Trends in Cognitive Sciences,
20(11), 818–829. https://doi.org/10.1016/j.tics.2016.08.005,
PubMed: 27692852

Hadfield-Menell, D., Russell, S. J., Abbeel, P., & Dragan, A. (2016).
Cooperative inverse reinforcement learning. In D. D. Lee, M.
Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances
in neural information processing systems 29 (pp. 3909–3917).
Curran Associates, Inc.

Henderson, R., & McCready, E. (2017). How dogwhistles work. In S. Arai,
K. Kojima, K. Mineshima, D. Bekki, K. Satoh, & Y. Ohta (Eds.), JSAI
International Symposium on Artificial Intelligence (pp. 231–240).
Springer. https://doi.org/10.1007/978-3-319-93794-6_16

Jara-Ettinger, J., Gweon, H., Schulz, L. E., & Tenenbaum, J. B.
(2016). The naïve utility calculus: Computational principles
underlying commonsense psychology. Trends in Cognitive
Sciences, 20(8), 589–604. https://doi.org/10.1016/j.tics.2016.05.011,
PubMed: 27388875

Jara-Ettinger, J., Schulz, L. E., & Tenenbaum, J. B. (2020). The naïve
utility calculus as a unified, quantitative framework for action

Verständnis. Cognitive Psychology, 123, Article 101334. https://
doi.org/10.1016/j.cogpsych.2020.101334, PubMed: 32738590
Jern, A., Lucas, C. G., & Kemp, C. (2017). People learn other peo-
ple’s preferences through inverse decision-making. Cognition,
168, 46–64. https://doi.org/10.1016/j.cognition.2017.06.017,
PubMed: 28662485

Jones, E. E., & Davis, K. E. (1965). From acts to dispositions: The
attribution process in person perception. In L. Berkowitz (Ed.),
Advances in experimental social psychology (Vol. 2, pp. 219–266).
Elsevier. https://doi.org/10.1016/S0065-2601(08)60107-0

Kelley, H. H. (1967). Attribution theory in social psychology. In
D. Levine (Ed.), Nebraska symposium on motivation (Vol. 15,
pp. 192–238). University of Nebraska Press.

Kelley, H. H., & Stahelski, A. J. (1970). Social interaction basis of
cooperators’ and competitors’ beliefs about others. Journal of
Personality and Social Psychology, 16(1), 66–91.
https://doi.org/10.1037/h0029849

Kratzer, A. (2021). Situations in natural language semantics. In E. N.
Zalta (Ed.), The Stanford encyclopedia of philosophy (Winter
2021 ed.). Metaphysics Research Lab, Stanford University.

Kushnir, T., Xu, F., & Wellman, H. M. (2010). Young children use
statistical sampling to infer the preferences of other people.
Psychological Science, 21(8), 1134–1140.
https://doi.org/10.1177/0956797610376652, PubMed: 20622142

Lewis, D. K. (1975). Adverbs of quantification. In E. L. Keenan (Ed.),
Formal semantics of natural language (pp. 3–15). Cambridge
University Press. https://doi.org/10.1017/CBO9780511897696.003

Pinker, S., Nowak, M. A., & Lee, J. J. (2008). The logic of indirect
speech. Proceedings of the National Academy of Sciences,
105(3), 833–838. https://doi.org/10.1073/pnas.0707192105,
PubMed: 18199841

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An
introduction (2nd ed.). MIT Press.

Wynne, C. D. L. (1995). Reinforcement accounts for transitive infer-
ence performance. Animal Learning & Behavior, 23(2), 207–217.
https://doi.org/10.3758/BF03199936

Yoon, E. J., Tessler, M. H., Goodman, N. D., & Frank, M. C. (2020).
Polite speech emerges from competing social goals. Open Mind,
4, 71–87. https://doi.org/10.1162/opmi_a_00035, PubMed:
33225196
