LETTER
Communicated by Karl Friston
Systems of Bounded Rational Agents with
Information-Theoretic Constraints
Sebastian Gottwald
sebastian.gottwald@uni-ulm.de
Daniel A. Braun
daniel.braun@uni-ulm.de
Institute of Neural Information Processing, Faculty of Engineering,
Computer Science and Psychology, University of Ulm,
Ulm, Baden-Württemberg, 89081 Germany

Neural Computation 31, 440–476 (2019). doi:10.1162/neco_a_01153
© 2018 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Specialization and hierarchical organization are important features of efficient collaboration in economic, artificial, and biological systems. Here, we investigate the hypothesis that both features can be explained by the fact that each entity of such a system is limited in a certain way. We propose an information-theoretic approach based on a free energy principle in order to computationally analyze systems of bounded rational agents that deal with such limitations optimally. We find that specialization allows a focus on fewer tasks, thus leading to a more efficient execution, but in turn requires coordination in hierarchical structures of specialized experts and coordinating units. Our results suggest that hierarchical architectures of specialized units at lower levels that are coordinated by units at higher levels are optimal, given that each unit's information-processing capability is limited and conforms to constraints on complexity costs.
1 Introduction
The question of how to combine a given set of individual entities in order to perform a certain task efficiently is a long-standing question shared by many disciplines, including economics, neuroscience, and computer science. Although the explicit nature of a single individual might differ between these fields—for example, an employee of a company, a neuron in a human brain, or a computer or processor as part of a cluster—they have one important feature in common that usually prevents them from functioning in isolation: they are all limited. In fact, this was the driving idea that inspired Herbert A. Simon's early work on decision making within economic organizations (Simon, 1943, 1955), which earned him a Nobel prize in 1978. He suggested that a scientific behavioral grounding of economics should be based on bounded rationality, which has remained an active research topic (Russell & Subramanian, 1995; Lipman, 1995; Aumann,
1997; Kaelbling, Littman, & Cassandra, 1998; DeCanio & Watkins, 1998; Gigerenzer & Selten, 2001; Jones, 2003; Sims, 2003; Burns, Ruml, & Do, 2013; Ortega & Braun, 2013; Acerbi, Vijayakumar, & Wolpert, 2014; Gershman, Horvitz, & Tenenbaum, 2015). Subsequent studies in management theory have built on Simon's basic observation, because "if individual managers had unlimited access to information that they could process costlessly and instantaneously, there would be no role for organizations employing multiple managers" (Geanakoplos & Milgrom, 1991). In neuroscience and biology, similar concepts have been used to explore the evolution of specialization and modularity in nature (Kashtan & Alon, 2005; Wagner, Pavlicev, & Cheverud, 2007). In modern computer science, the terms parallel computing and distributed computing denote two separate fields that share the concept of decentralized computing (Radner, 1993)—the combination of multiple processing units in order to decrease the time of computationally expensive calculations.

Despite their success, most approaches to the organization of decision-making units based on bounded rationality also have shortcomings. As DeCanio and Watkins (1998) point out, existing agent-based methods (including their own) do not use an overarching optimization principle but are tailored to the specific types of calculations the agents are capable of, and therefore lack generality. Moreover, it is usually imposed as a separate assumption that there are two types of units, specialized operational units and coordinating nonoperational units, which was expressed by Knight (1921) as "workers do, and managers figure out what to do."
Here, we use a free energy optimization principle in order to study systems of bounded rational agents, extending the work in Ortega and Braun (2011, 2013), Genewein and Braun (2013), and Genewein, Leibfried, Grau-Moya, and Braun (2015) on decision making, hierarchical information processing, and abstraction in intelligent systems with limited information-processing capacity, which has precursors in the economic and game-theoretic literature (McKelvey & Palfrey, 1995; Ochs, 1995; Mattsson & Weibull, 2002; Wolpert, 2006; Spiegler, 2011; Howes, Lewis, & Vera, 2009; Todorov, 2009; Still, 2009; Tishby & Polani, 2011; Kappen, Gómez, & Opper, 2012; Vul, Goodman, Griffiths, & Tenenbaum, 2014; Lewis, Howes, & Singh, 2014). Note that the free energy optimization principle of information-theoretic bounded rationality is connected to the free energy principle used in variational Bayes and active inference (Friston, Levin, Sengupta, & Pezzulo, 2015; Friston, Rigoli et al., 2015; Friston, Lin, Frith, & Pezzulo, 2017; Friston, Parr, & de Vries, 2017), but it has a conceptually distinct interpretation and some formal differences (see section 6.3 for a detailed comparison).
By generalizing the ideas in Genewein and Braun (2013) and Genewein
et al. (2015) on two-step information processing to an arbitrary number of
steps, we arrive at a general free energy principle that can be used to study
systems of bounded rational agents. The advantages of our approach can
be summarized as follows:
1. There is a unifying free energy principle that allows for a multiscale
problem formulation for an arbitrary number of agents distributed
among the steps of general multistep processes (see sections 3.3 and
4.2).
2. The computational nature of the optimization principle allows ex-
plicitly calculating and comparing optimal performances of differ-
ent agent architectures for a given set of objectives and resource
constraints (see section 5).
3. The information-theoretic description implies the existence of the
two types of units already mentioned, nonoperational units (selector
nodes) that coordinate the activities of operational units. Depending
on their individual resource constraints, the free energy principle as-
signs each unit to a region of specialization that is part of an optimal
partitioning of the underlying decision space (see section 4.3).
In particular, we find that for a wide range of objectives and resource limitations (see sections 5.2 and 5.3), hierarchical systems with specialized experts at lower levels and coordinating units at higher levels generally outperform other structures.
2 Preliminaries
This section serves as an introduction to the terminology required for our
framework presented in sections 3 and 4.
2.1 Notation. We use curly letters (e.g., $\mathcal{W}$, $\mathcal{X}$, $\mathcal{A}$) to denote sets of finite cardinality—in particular, the underlying spaces of the corresponding random variables (e.g., $W$, $A$, $X$)—whereas the values of these random variables are denoted by lowercase letters: $w \in \mathcal{W}$, $a \in \mathcal{A}$, and $x \in \mathcal{X}$, respectively. We denote the space of probability distributions on a given set $\mathcal{X}$ by $\mathcal{P}_{\mathcal{X}}$. Given a probability distribution $p \in \mathcal{P}_{\mathcal{X}}$, the expectation of a function $f : \mathcal{X} \to \mathbb{R}$ is denoted by $\langle f \rangle_p := \sum_x p(x) f(x)$. If the underlying probability measure is clear without ambiguity, we just write $\langle f \rangle$.
For a function $g$ with multiple arguments (e.g., for $g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, $(x, y) \mapsto g(x, y)$), we denote the function $\mathcal{X} \to \mathbb{R}$, $x \mapsto g(x, y)$ for fixed $y \in \mathcal{Y}$ by $g(\,\cdot\,, y)$ (partial application); that is, the dot indicates the variable of the new function. Similarly, for fixed $y \in \mathcal{Y}$, we denote a conditional probability distribution on $\mathcal{X}$ with values $p(x|y)$ by $p(\,\cdot\,|y)$. This notation shows the dependencies clearly without giving up the original function names and thus allows writing more complicated expressions in a concise form. For example, if $F$ is a functional defined on functions of one variable, such as $F[f] := \sum_x f(x)$ for all functions $f : \mathcal{X} \to \mathbb{R}$, then evaluating $F$ on the function $g$ in its first variable while keeping the second variable fixed is simply denoted by $F[g(\,\cdot\,, y)]$. Here, the dot indicates on which argument of $g$ the functional $F$ is acting, and at the same time it records that the resulting value
(which equals $\sum_x g(x, y)$ in the case of the example) does not depend on a particular $x$ but on the fixed $y$.
2.2 Decision Making. Here, we consider (multitask) decision making as the process of observing a world state $w \in \mathcal{W}$, sampled from a given distribution $\rho \in \mathcal{P}_{\mathcal{W}}$, and choosing a corresponding action $a \in \mathcal{A}$ drawn from a posterior policy $P(\,\cdot\,|w) \in \mathcal{P}_{\mathcal{A}}$. Assuming that the joint distribution of $W$ and $A$ is given by $p(a, w) := \rho(w) P(a|w)$, $P$ is the conditional probability distribution of $A$ given $W$. Unless stated otherwise, the capital letter $P$ always denotes a posterior, while the lowercase letter $p$ denotes the joint distribution or a marginal of the joint (i.e., a dependent variable).

A decision-making unit is called an agent. An agent is rational if its posterior policy $P$ maximizes the expected utility,

$$\langle U \rangle = \sum_{w \in \mathcal{W}} \rho(w) \sum_{a \in \mathcal{A}} P(a|w)\, U(a, w), \tag{2.1}$$
for a given utility function U : W × A → R. Note that the utility U may itself
represent an expected utility over consequences in the sense of von Neu-
mann and Morgenstern (1944), where W would serve as a context variable
for different tasks. The posterior P can be seen as a state-action policy that
selects the best action a ∈ A with respect to a utility function U given the
state w ∈ W of the world.
2.3 Bounded Rational Agents. In the information-theoretic model of bounded rationality (Ortega & Braun, 2011, 2013; Genewein et al., 2015), an agent is bounded rational if its posterior $P$ maximizes equation 2.1, subject to the constraint

$$\langle D_{\mathrm{KL}}(P\|q) \rangle = \sum_{w \in \mathcal{W}} \rho(w)\, D_{\mathrm{KL}}(P(\,\cdot\,|w)\|q) \leq D_0, \tag{2.2}$$

for a given bound $D_0 > 0$ and a prior policy $q \in \mathcal{P}_{\mathcal{A}}$. Here, $D_{\mathrm{KL}}(p\|q)$ denotes the Kullback-Leibler (KL) divergence between two distributions $p, q \in \mathcal{P}_{\mathcal{Y}}$ on a set $\mathcal{Y}$, defined by $D_{\mathrm{KL}}(p\|q) := \sum_{y \in \mathcal{Y}} p(y) \log(p(y)/q(y))$. Note that for $D_{\mathrm{KL}}(p\|q)$ to be well defined, $p$ must be absolutely continuous with respect to $q$, so that $q(y) = 0$ implies $p(y) = 0$. When $p$ or $q$ are conditional probabilities, we treat $D_{\mathrm{KL}}(p\|q)$ as a function of the additional variables.
Given a world state $w$, the information processing consists of transforming a prior $q$ to a world-state-specific posterior distribution $P(\,\cdot\,|w)$. Since $D_{\mathrm{KL}}(P(\,\cdot\,|w)\|q)$ measures by how much $P(\,\cdot\,|w)$ diverges from $q$, the upper bound $D_0$ in equation 2.2 characterizes the limitation of the agent's average information-processing capability: if $D_0$ is close to zero, the posterior must be close to the prior for all world states, which means that $A$ contains
only little information about W, whereas if D0 is large, the posterior is al-
lowed to deviate from the prior by larger amounts and therefore A contains
more information about W. We use the KL divergence as a proxy for any
resource measure, as any resource must be monotone in processed informa-
tion, which is measured by the KL divergence between prior and posterior.
Technically, maximizing expected utility under constraint 2.2 is the same
as minimizing expected complexity cost under the constraint of a minimal
expected performance, where complexity is given by the expected KL di-
vergence between prior and posterior and performance by expected utility.
Minimizing complexity means minimizing the number of bits required to
generate the actions.
2.4 Free Energy Principle. By the variational method of Lagrange multipliers, the above constrained optimization problem is equivalent to the unconstrained problem,

$$\max_P \left\{ \langle U \rangle - \frac{1}{\beta} \langle D_{\mathrm{KL}}(P\|q) \rangle \right\}, \tag{2.3}$$
where $\beta > 0$ is chosen such that the constraint 2.2 is satisfied. In the literature on information-theoretic bounded rationality (Ortega & Braun, 2011, 2013), the objective in equation 2.3 is known as the free energy $F$ of the corresponding decision-making process. In this form, the optimal posterior can be explicitly derived by determining the zeros of the functional derivative of $F$ with respect to $P$, yielding the Boltzmann-Gibbs distribution,

$$P(a|w) = \frac{1}{Z(w)}\, q(a)\, e^{\beta U(a,w)}, \qquad Z(w) := \sum_{a \in \mathcal{A}} q(a)\, e^{\beta U(a,w)}. \tag{2.4}$$
Note how the Lagrange multiplier $\beta$ (also known as inverse temperature) interpolates between an agent with zero processing capability that always acts according to its prior policy ($\beta = 0$) and a perfectly rational agent ($\beta \to \infty$). Note that plugging equation 2.4 back into the free energy equation 2.3 gives

$$\max_P F[P] = \frac{1}{\beta} \langle \log Z \rangle. \tag{2.5}$$
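To make these formulas concrete, the following minimal Python sketch (our own illustration, not part of the original text; the toy utility and parameter values are assumptions) computes the Boltzmann-Gibbs posterior of equation 2.4, the free energy of equation 2.5, and the information cost of equation 2.2 for a single bounded rational agent:

```python
import numpy as np

def boltzmann_posterior(U, q, beta):
    # Equation 2.4: P(a|w) = q(a) exp(beta U(a,w)) / Z(w),
    # with U of shape (|W|, |A|) and q a prior over actions.
    weights = q[None, :] * np.exp(beta * U)
    Z = weights.sum(axis=1)                 # partition function Z(w)
    return weights / Z[:, None], Z

beta = 2.0                                  # inverse temperature
U = np.eye(3)                               # toy utility: best action = world state
rho = np.full(3, 1 / 3)                     # world-state distribution rho(w)
q = np.full(3, 1 / 3)                       # uniform prior policy q(a)

P, Z = boltzmann_posterior(U, q, beta)
F = (rho * np.log(Z)).sum() / beta          # free energy, equation 2.5
D = (rho * (P * np.log(P / q[None, :])).sum(axis=1)).sum()  # average KL, eq. 2.2
```

Increasing beta recovers the perfectly rational agent (the posterior concentrates on the maximizing action), while beta close to zero leaves the posterior at the prior.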
2.5 Optimal Prior. The performance of a given bounded rational agent
crucially depends on the choice of the prior policy q. Depending on D0 and
the explicit form of the utility function, it can be advantageous to a priori
prefer certain actions over others. Therefore, optimal bounded rational de-
cision making includes optimizing the prior in equation 2.3. In contrast to
equation 2.3, the modified optimization problem,
$$\max_{P,q} \left\{ \langle U \rangle - \frac{1}{\beta} \langle D_{\mathrm{KL}}(P\|q) \rangle \right\}, \tag{2.6}$$
does not have a closed-form solution. However, since the objective is con-
vex in (P, q), a unique solution can be obtained iteratively by alternating
between fixing one and optimizing the other variable (Csiszár & Tusnády,
1984), resulting in a Blahut-Arimoto-type algorithm (Arimoto, 1972; Blahut,
1972) that consists of alternating the equations
$$\begin{cases} P(a|w) = \dfrac{1}{Z(w)}\, q(a)\, e^{\beta U(a,w)}, \\[1ex] q(a) = p(a) = \displaystyle\sum_w \rho(w)\, P(a|w), \end{cases} \tag{2.7}$$
with $Z(w)$ given by equation 2.4. In particular, the optimal prior policy is the marginal $p$ of the joint distribution of $W$ and $A$. In this case, the average Kullback-Leibler divergence between prior and posterior coincides with the mutual information between $W$ and $A$,

$$I(W; A) = \sum_{w \in \mathcal{W}} \sum_{a \in \mathcal{A}} p(w, a) \log \frac{p(w, a)}{\rho(w)\, p(a)} = \langle D_{\mathrm{KL}}(P\|p) \rangle.$$

It follows that the modified optimization principle, equation 2.6, is equivalent to
$$\max_P \left\{ \langle U \rangle - \frac{1}{\beta}\, I(W; A) \right\}. \tag{2.8}$$
Due to its equivalence to rate distortion theory (Shannon, 1959), with a negative distortion measure given by the utility function, equation 2.8 is referred to as the rate distortion case of bounded rationality in Genewein and Braun (2013).
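A minimal sketch of the resulting Blahut-Arimoto-type iteration (our own illustration; a fixed iteration count stands in for a proper convergence test) alternates the two equations in 2.7:

```python
import numpy as np

def blahut_arimoto(U, rho, beta, n_iter=100):
    # Alternate equation 2.7: Boltzmann posterior for a fixed prior,
    # then the action marginal p(a) = sum_w rho(w) P(a|w) as the new prior.
    n_w, n_a = U.shape
    p = np.full(n_a, 1 / n_a)               # initial prior over actions
    for _ in range(n_iter):
        P = p[None, :] * np.exp(beta * U)   # unnormalized posterior P(a|w)
        P /= P.sum(axis=1, keepdims=True)
        p = rho @ P                         # optimal prior = marginal of the joint
    return P, p
```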
2.6 Multistep and Multiagent Systems. When multiple random vari-
ables are involved in a decision-making process, such a process constitutes
a multistep system (see section 3). Consider the case of a prior over A that
is conditioned on an additional random variable X with values x ∈ X , that
is, q( · |x) ∈ PA for all x ∈ X . Remember that we introduced a bounded ra-
tional agent as a decision-making unit that, after observing a world state w,
transforms a single prior policy over a choice space A to a posterior policy
P( · |w) ∈ PA. Therefore, in the case of a conditional prior, the collection of
prior policies {q( · |x)}x∈X can be considered as a collection or ensemble of
agents, or a multiagent system, where for a given x ∈ X , the prior q( · |x) is
transformed to a posterior P( · |x, w) ∈ PA by exactly one agent. Note that a
single agent deciding about both X and A would be modeled by a prior of the form q(x, a) with x ∈ X and a ∈ A instead.
Hence, in order to combine multiple bounded rational agents, we are first
splitting the full decision-making process into multiple steps by introduc-
ing additional intermediate random variables (see section 3), which then
will be used to assign one or more agents to each of these steps (see section
4). In this view, we can regard a multiagent decision-making system as per-
forming a sequence of successive decision steps until an ultimate action is
selected.
3 Multistep Bounded Rational Decision Making
3.1 Decision Nodes. Let $W$ and $A$ denote the random variables describing the full decision-making process for a given utility function $U : \mathcal{W} \times \mathcal{A} \to \mathbb{R}$, as described in section 2. In order to separate the full process into $N > 1$ steps, we introduce internal random variables $X_1, \ldots, X_{N-1}$, which represent the outputs of additional intermediate bounded rational decision-making steps. For each $k$, let $\mathcal{X}_k$ denote the target space and $x_k \in \mathcal{X}_k$ a particular value of $X_k$. We call a random variable that is part of a multistep decision-making system a (decision) node. For simplicity, we assume that all intermediate random variables are discrete (just like $W$ and $A$).
Here, we are treating feedforward architectures originating at $X_0 := W$ and terminating in $X_N := A$. This allows labeling the variables $\{X_k\}_{k=0}^N$ according to the information flow, so that $X_j$ can potentially obtain information about $X_i$ only if $i < j$. The canonical factorization,

$$p(w, x_1, \ldots, x_{N-1}, a) = \rho(w)\, p(x_1|w)\, p(x_2|x_1, w) \cdots p(a|x_{N-1}, \ldots, x_1, w),$$

of the joint probability distribution of $\{X_k\}_{k=0}^N$ therefore consists of the posterior policies of each decision node.
3.2 Two Types of Nodes: Inputs and Prior Selectors. A specific multistep architecture is characterized by specifying the explicit dependencies on the preceding variables for each node's prior and posterior or, better, the missing dependencies. For example, in a given multistep system, the posterior of the node $X_3$ might depend explicitly on the outputs of $X_1$ and $X_2$ but not on $W$, so that $P(x_3|x_2, x_1, w) = P(x_3|x_2, x_1)$. If its prior has the form $q(x_3|x_1)$, then $X_3$ has to process the output of $X_2$. Moreover, in this case, the actual prior policy $q(\,\cdot\,|x_1) \in \mathcal{P}_{\mathcal{X}_3}$ that is used by $X_3$ for decision making is selected by $X_1$ (see Figure 1).
In general, the inputs $X_i, \ldots, X_j$ that have to be processed by a particular node $X_k$ are given by the variables in the posterior that are missing from the prior, and if its prior $q$ is conditioned on the outputs of $X_l, \ldots, X_m$, then these
Figure 1: Example of a processing node that is part of a multistep architecture with $N = 5$, visualized as a directed graph. Here, $X_3$ processes the output of $X_2$ by transforming a prior policy $p(x_3|x_1)$ to a posterior policy $P(x_3|x_2, x_1)$. The prior of $X_3$ being conditioned on the output of $X_1$ (indicated by the dashed arrow) means that $X_1$ determines which of the prior policies $\{p(\,\cdot\,|x_1)\}_{x_1 \in \mathcal{X}_1}$ is used by $X_3$ to process a given output of $X_2$.
nodes select which of the prior policies $\{q(\,\cdot\,|x_l, \ldots, x_m)\}_{x_l \in \mathcal{X}_l, \ldots, x_m \in \mathcal{X}_m} \subset \mathcal{P}_{\mathcal{X}_k}$ is used by $X_k$ for decision making, that is, for the transformation

$$q(x_k|x_l, \ldots, x_m) \longrightarrow P(x_k|x_l, \ldots, x_m, x_i, \ldots, x_j).$$
We denote the collection of input nodes of $X_k$ by $\mathcal{X}^k_{\mathrm{in}}$ ($:= \{X_i, \ldots, X_j\}$) and the prior-selecting nodes of $X_k$ by $\mathcal{X}^k_{\mathrm{sel}}$ ($:= \{X_l, \ldots, X_m\}$). The joint distribution of $X_0, \ldots, X_N$ is then given by

$$p(x_0, \ldots, x_N) = \rho(w)\, P_1\big(x_1 \big| x^1_{\mathrm{sel}}, x^1_{\mathrm{in}}\big) \cdots P_N\big(x_N \big| x^N_{\mathrm{sel}}, x^N_{\mathrm{in}}\big) \tag{3.1}$$

for all $x_k \in \mathcal{X}_k$ and $x^k_{\mathrm{sel}} \in \mathcal{X}^k_{\mathrm{sel}}$, $x^k_{\mathrm{in}} \in \mathcal{X}^k_{\mathrm{in}}$ ($k = 1, \ldots, N$).

Specifying the sets $\mathcal{X}^k_{\mathrm{sel}}$ and $\mathcal{X}^k_{\mathrm{in}}$ of selectors and inputs for each node in the system then uniquely characterizes a particular multistep decision-making system. Note that we always have $(\mathcal{X}^1_{\mathrm{sel}}, \mathcal{X}^1_{\mathrm{in}}) = (\{\}, \{X_0\})$. Decompositions of the form 3.1 are often visualized by directed acyclic graphs, so-called DAGs (see Bishop, 2006). Here, in addition to the decomposition of the joint in terms of posteriors, we have added the information about the prior dependencies in terms of dashed arrows, as shown in Figure 1.
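As a concrete illustration of this bookkeeping (a minimal sketch in our own encoding, not code from the paper), a multistep architecture can be written down by listing the selector and input nodes of each node; the hierarchical three-step system used later in section 4.4 (equation 4.6) would read:

```python
# Node 0 is W and node N is A; each decision node k lists the indices of
# its prior-selecting nodes and of its input nodes.
architecture = {
    1: {"sel": (), "in": (0,)},      # X1 has no selectors and processes W
    2: {"sel": (1,), "in": (0,)},    # X2's prior is selected by X1; input is W
    3: {"sel": (1, 2), "in": (0,)},  # X3 = A: prior selected by (X1, X2); input W
}
```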
3.3 Multistep Free Energy Principle. If $P_k$ and $q_k$ denote the posterior and prior of the $k$th node of an $N$-step decision process, then the free energy principle takes the form

$$\sup_{P_1, q_1, \ldots, P_N, q_N} \left\{ \langle U \rangle - \sum_{k=1}^N \frac{1}{\beta_k} \langle D_{\mathrm{KL}}(P_k \| q_k) \rangle \right\}, \tag{3.2}$$
where, in addition to the expectation over inputs, the average of $D_{\mathrm{KL}}(P_k\|q_k)$ now also includes the expectation with respect to $X_{\mathrm{sel}}$:

$$\langle D_{\mathrm{KL}}(P\|q) \rangle = \sum_{x_{\mathrm{sel}}, x_{\mathrm{in}}} p(x_{\mathrm{sel}}, x_{\mathrm{in}})\, D_{\mathrm{KL}}\big(P(\,\cdot\,|x_{\mathrm{sel}}, x_{\mathrm{in}})\big\|q(\,\cdot\,|x_{\mathrm{sel}})\big).$$
Since the prior policies appear only in the KL divergences and, moreover, there is exactly one KL divergence per prior, it follows, as in section 2.5, that for each $k = 1, \ldots, N$, the optimal prior is the marginal given for all $x_k \in \mathcal{X}_k$ by

$$q_k(x_k|x_{\mathrm{sel}}) = p_k(x_k|x_{\mathrm{sel}}) := \frac{1}{p(x_{\mathrm{sel}})} \sum_{\{x_0, \ldots, x_N\} \setminus (\{x_k\} \cup x_{\mathrm{sel}})} p(x_0, \ldots, x_N), \tag{3.3}$$

whenever $\mathcal{X}^k_{\mathrm{sel}} = x_{\mathrm{sel}}$. Hence, the free energy principle can be simplified to
$$\sup_{P_1, \ldots, P_N} \left\{ \langle U \rangle - \sum_{k=1}^N \frac{1}{\beta_k}\, I\big(X^k_{\mathrm{in}}; X_k \big| X^k_{\mathrm{sel}}\big) \right\}, \tag{3.4}$$

where $I(X; Y|Z)$ denotes the conditional mutual information of two random variables $X$, $Y$ given a third random variable $Z$.
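For reference, $I(X; Y|Z)$ can be computed from a joint distribution by direct summation; the following Python helper is a sketch under the assumption that the joint is given as a three-dimensional array:

```python
import numpy as np

def conditional_mutual_information(p_xyz):
    # I(X;Y|Z) for a joint distribution p_xyz of shape (|X|, |Y|, |Z|).
    p_z = p_xyz.sum(axis=(0, 1))
    I = 0.0
    for z in range(p_xyz.shape[2]):
        if p_z[z] == 0:
            continue
        pxy = p_xyz[:, :, z] / p_z[z]       # p(x, y | z)
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)
        mask = pxy > 0
        I += p_z[z] * np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask]))
    return I
```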
By optimizing equation 3.4 alternatingly, that is, optimizing one posterior at a time while keeping the others fixed, we obtain for each $k = 1, \ldots, N$,

$$P_k(x_k|x_{\mathrm{sel}}, x_{\mathrm{in}}) = \frac{p_k(x_k|x_{\mathrm{sel}})}{Z_k(x_{\mathrm{sel}}, x_{\mathrm{in}})} \exp\Big(\beta_k\, F_k[P_1, \ldots, P_N](x_k, x_{\mathrm{sel}}, x_{\mathrm{in}})\Big), \tag{3.5}$$

whenever $X^k_{\mathrm{sel}} = x_{\mathrm{sel}}$ and $X^k_{\mathrm{in}} = x_{\mathrm{in}}$. Here, $Z_k(x_{\mathrm{sel}}, x_{\mathrm{in}})$ denotes the normalization constant and $F_k[P_1, \ldots, P_N]$ denotes the (effective) utility function on which the decision making in $X_k$ is based. More precisely, given $\tilde{X} = (X_k, X^k_{\mathrm{sel}}, X^k_{\mathrm{in}})$, it is the free energy of the subsequent nodes in the system; that is, for any value $\tilde{x} := (x_k, x_{\mathrm{sel}}, x_{\mathrm{in}})$ we obtain for $F_k := F_k[P_1, \ldots, P_N]$,

$$F_k(\tilde{x}) = \frac{1}{p(\tilde{x})} \sum_{\{x_0, \ldots, x_N\} \setminus \tilde{x}} p(x_0, \ldots, x_N)\, F_{k,\mathrm{loc}}(x_0, \ldots, x_N), \tag{3.6}$$

where

$$F_{k,\mathrm{loc}}(x_0, \ldots, x_N) := U(x_0, x_N) - \sum_{i>k} \frac{1}{\beta_i} \log \frac{P_i(x_i|x^i_{\mathrm{sel}}, x^i_{\mathrm{in}})}{p_i(x_i|x^i_{\mathrm{sel}})}.$$
Here, $x^i_{\mathrm{in}}$ and $x^i_{\mathrm{sel}}$ are collections of values of the random variables in $\mathcal{X}^i_{\mathrm{in}}$ and $\mathcal{X}^i_{\mathrm{sel}}$, respectively. The final Blahut-Arimoto-type algorithm consists of iterating equations 3.5, 3.3, and 3.6 for each $k = 1, \ldots, N$ until convergence is achieved. Note that since each optimization step is convex (marginal convexity), convergence is guaranteed but generally not unique (Jain & Kar, 2017), so that depending on the initialization, one might end up in a local optimum.
3.4 Example: Two-Step Information Processing. The cases of serial and parallel information processing studied in Genewein and Braun (2013) are special cases of the multistep decision-making systems introduced above. Both cases are two-step processes ($N = 2$) involving the variables $X_0 = W$, $X_1 = X$, and $X_2 = A$. The serial case is characterized by $(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{\}, \{X_1\})$ and the parallel case by $(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{X_1\}, \{X_0\})$. There is a third possible combination for $N = 2$, given by $(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{\}, \{X_0, X_1\})$. However, it can be shown that this case is equivalent to the (one-step) rate distortion case from section 2, because if $A$ has direct world state access, then any extra input to the final node $A = X_2$ that is not a prior selector contains redundant information.
4 Systems of Bounded Rational Agents

4.1 From Multistep to Multiagent Systems. As explained in section 2.6, a single random variable $X_k$ that is part of an $N$-step decision-making system can represent a single agent or a collection of multiple agents, depending on the cardinality of $\mathcal{X}^k_{\mathrm{sel}}$, that is, whether $X_k$ has multiple priors selected by the nodes in $\mathcal{X}^k_{\mathrm{sel}}$. Therefore, an $N$-step bounded rational decision-making system with $N > 1$ represents a bounded rational multiagent system (of depth $N$).

For a given $k \in \{1, \ldots, N\}$, each value $x \in \mathcal{X}^k_{\mathrm{sel}}$ corresponds to exactly one agent in $X_k$. During decision making, the agents that belong to the nodes in $\mathcal{X}^k_{\mathrm{sel}}$ are choosing which of the $|\mathcal{X}^k_{\mathrm{sel}}|$ agents in $X_k$ will receive a given input $x_{\mathrm{in}}$ (see section 4.4 for a detailed example). This decision is based on how well the selected agent $x$ will perform on the input $x_{\mathrm{in}}$ by transforming its prior policy $p_k(\,\cdot\,|x)$ into a posterior policy $P_k(\,\cdot\,|x, x_{\mathrm{in}})$, subject to the constraint

$$\langle D_{\mathrm{KL}}(P_k\|p_k) \rangle(x) := \sum_{x_{\mathrm{in}}} p(x_{\mathrm{in}}|x)\, D_{\mathrm{KL}}\big(P_k(\,\cdot\,|x, x_{\mathrm{in}})\big\|p_k(\,\cdot\,|x)\big) \leq D_x, \tag{4.1}$$

where $D_x > 0$ is a given bound on the agent's information-processing capability. Similar to multistep systems, this choice is based on the performance measured by the free energy of the subsequent agents.
4.2 Multiagent Free Energy Principle. In contrast to multistep decision making, the information-processing bounds are allowed to be functions of the agents instead of just the nodes, resulting in an extra Lagrange multiplier for each agent in the free energy principle, equation 3.2. As in equation 3.4, optimizing over the priors yields the simplified free energy principle

$$\sup_{P_1, \ldots, P_N} \left( \langle U \rangle - \sum_{k=1}^N \sum_{x^k_{\mathrm{sel}} \in \mathcal{X}^k_{\mathrm{sel}}} \frac{p(x^k_{\mathrm{sel}})}{\beta_k(x^k_{\mathrm{sel}})}\, I\big(X^k_{\mathrm{in}}; X_k \big| X^k_{\mathrm{sel}} = x^k_{\mathrm{sel}}\big) \right), \tag{4.2}$$
which can be solved iteratively as explained in the previous section, the only difference being that the Lagrange parameters $\beta_k$ now depend on $x^k_{\mathrm{sel}}$. Hence, for the posterior of an agent that belongs to node $k$, we have

$$P_k(x_k|x_{\mathrm{sel}}, x_{\mathrm{in}}) = \frac{p_k(x_k|x_{\mathrm{sel}})}{Z_k(x_{\mathrm{sel}}, x_{\mathrm{in}})} \exp\Big(\beta_k(x^k_{\mathrm{sel}})\, F_k(x_k, x_{\mathrm{sel}}, x_{\mathrm{in}})\Big), \tag{4.3}$$

where $\beta_k(x^k_{\mathrm{sel}})$ is chosen such that constraint 4.1 is fulfilled for all $x \in \mathcal{X}^k_{\mathrm{sel}}$, and $F_k$ is given by equation 3.6, except that now we have

$$F_{k,\mathrm{loc}}(x_0, \ldots, x_N) := U(x_0, x_N) - \sum_{i>k} \frac{1}{\beta_i(x^i_{\mathrm{sel}})} \log \frac{P_i(x_i|x^i_{\mathrm{sel}}, x^i_{\mathrm{in}})}{p_i(x_i|x^i_{\mathrm{sel}})}. \tag{4.4}$$
(4.4)
The resulting Blahut-Arimoto-type algorithm is summarized in algo-
rithm 1.
4.3 Specialization. Although a given multiagent architecture predetermines the underlying set of choices for each agent, only a small part of such a set might be used by a given agent in the optimized system. For example, all agents in the final step can potentially perform any action $a \in \mathcal{A}$ (see Figure 2 and the example in section 4.4). However, depending on their individual information-processing capabilities, the optimization over the agents' priors can result in a (soft) partitioning of the full action space $\mathcal{A}$ into multiple chunks, where each of these chunks is given by the support of the prior of a given agent $x$, $\mathrm{supp}(p(\,\cdot\,|x)) \subset \mathcal{A}$. Note that the resulting partitioning is not necessarily disjoint, since agents might still be sharing a number of actions, depending on their available information-processing resources. If the processing capability is low compared to the number of possible actions in the full space and if there are enough agents at the same level, then this partitioning allows each agent to focus on a smaller number of options to choose from, provided that the coordinating agents have enough resources to decide among the partitions reliably.
Therefore, the amount of prior adaptation of an agent—by how much its optimal prior $p$ deviates from a uniform prior $p_0$ over all accessible choices, which is measured by the KL divergence $D_{\mathrm{KL}}(p\|p_0)$—determines its degree of specialization. More precisely, we define the specialization of an agent with prior $p$ and choice space $\mathcal{X}$ by

$$S[p] := \frac{D_{\mathrm{KL}}(p\|p_0)}{\log |\mathcal{X}|} = 1 - \frac{H[p]}{\log |\mathcal{X}|}, \tag{4.5}$$

where $H[p] := -\sum_x p(x) \log p(x)$ denotes the Shannon entropy of $p$. By normalizing with $\log |\mathcal{X}|$, we obtain a quantity between 0 and 1, since $0 \leq H[p] \leq \log |\mathcal{X}|$. Here, $S[p] = 0$ corresponds to $H[p] = \log |\mathcal{X}|$, which means that the agent is completely unspecialized, whereas $S[p] = 1$ corresponds to $H[p] = 0$, which implies that $p$ has support on a single option $x^* \in \mathcal{X}$, meaning that the agent always deterministically performs the same action and therefore is fully specialized.
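In code, equation 4.5 is a one-liner; the following sketch (with two assumed toy priors) illustrates the two extremes:

```python
import numpy as np

def specialization(p):
    # Equation 4.5: S[p] = 1 - H[p] / log|X|, with H the Shannon entropy.
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    H = -(nz * np.log(nz)).sum()
    return 1.0 - H / np.log(p.size)

specialization([0.25, 0.25, 0.25, 0.25])   # 0.0: completely unspecialized
specialization([1.0, 0.0, 0.0, 0.0])       # 1.0: fully specialized
```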
Figure 2: Example of a hierarchical architecture of 10 agents that are combined using the three-step decision-making system ($N = 3$) shown in the upper-left corner (see section 4.4 for details). Here, every node—and therefore every agent—has access to the world states (big circle). $X_1$ consists of one agent that decides about which of the $|\mathcal{X}_1| = 3$ agents in $X_2$ obtains a given world state as input. The selected agent in $X_2$ selects which of the $|\mathcal{X}_2| = 2$ agents, out of the $|\mathcal{X}_1| \cdot |\mathcal{X}_2| = 6$ agents in $A$ that are connected to it, obtains the world state to perform the final decision about an action $a \in \mathcal{A}$ (gray circles on the right). In our notation introduced below, this architecture is labeled by (1, 4)[1,3,(3,2)] (see section 5.1).
4.4 Example: Hierarchical Multiagent System with Three Levels. Consider the example of an architecture of 10 agents shown in Figure 2 that are combined via the three-step decision-making system given by

$$(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{X_1\}, \{W\}), \qquad (\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{X_1, X_2\}, \{W\}), \tag{4.6}$$

as visualized in the upper-left corner of Figure 2. The number of agents in each node is given by the cardinality of the target space of the selecting node(s) (or equals one if there are no selectors). Hence, $X_1$ consists of one agent, $X_2$ consists of $|\mathcal{X}_1|$ agents, and $A$ consists of $|\mathcal{X}_1| \cdot |\mathcal{X}_2|$ agents. For example, if we have $|\mathcal{X}_1| = 3$ and $|\mathcal{X}_2| = 2$, as in Figure 2, then this results in a hierarchy of one, three, and six agents.

The joint probability of the system characterized by equation 4.6 is given by

$$p(w, x_1, x_2, a) = \rho(w)\, P_1(x_1|w)\, P_2(x_2|x_1, w)\, P_3(a|x_2, x_1, w),$$
, x1
, w)
/
e
d
u
n
e
c
o
a
r
t
i
c
e
–
p
d
/
l
f
/
/
/
/
3
1
2
4
4
0
1
0
5
0
9
1
8
n
e
c
o
_
a
_
0
1
1
5
3
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Systems of Bounded Rational Agents
453
and the free energy by

$$F[P_1, P_2, P_3] = \sum_{w, x_1, x_2, a} p(w, x_1, x_2, a) \left[ U(a, w) - \frac{1}{\beta_1} \log \frac{P_1(x_1|w)}{p_1(x_1)} - \frac{1}{\beta_2(x_1)} \log \frac{P_2(x_2|x_1, w)}{p_2(x_2|x_1)} - \frac{1}{\beta_3(x_1, x_2)} \log \frac{P_3(a|x_2, x_1, w)}{p_3(a|x_2, x_1)} \right],$$

where the priors $p_1$, $p_2$, and $p_3$ are given by the marginals (see equation 3.3):

$$p_1(x_1) = \sum_w \rho(w)\, P_1(x_1|w), \quad p_2(x_2|x_1) = \sum_w p(w|x_1)\, P_2(x_2|x_1, w), \quad p_3(a|x_2, x_1) = \sum_w p(w|x_1, x_2)\, P_3(a|x_2, x_1, w).$$
By equation 3.5, the posteriors that iteratively solve the free energy principle are

$$P_1(x_1|w) = \frac{p_1(x_1)}{Z(w)} \exp\big(\beta_1 F_1(w, x_1)\big),$$
$$P_2(x_2|x_1, w) = \frac{p_2(x_2|x_1)}{Z(w, x_1)} \exp\big(\beta_2(x_1)\, F_2(w, x_1, x_2)\big),$$
$$P_3(a|x_2, x_1, w) = \frac{p_3(a|x_2, x_1)}{Z(w, x_1, x_2)} \exp\big(\beta_3(x_1, x_2)\, U(a, w)\big),$$
where, by equations 3.6 and 4.4,

$$F_1(w, x_1) := \sum_{x_2, a} p(x_2, a|x_1, w) \left[ U(a, w) - \frac{1}{\beta_2(x_1)} \log \frac{P_2(x_2|x_1, w)}{p_2(x_2|x_1)} - \frac{1}{\beta_3(x_1, x_2)} \log \frac{P_3(a|x_2, x_1, w)}{p_3(a|x_2, x_1)} \right],$$

$$F_2(w, x_1, x_2) := \sum_a P_3(a|x_2, x_1, w) \left[ U(a, w) - \frac{1}{\beta_3(x_1, x_2)} \log \frac{P_3(a|x_2, x_1, w)}{p_3(a|x_2, x_1)} \right].$$
Given a world state w ∈ W, the agent in X1 decides which of the three agents in X2 obtains w as an input. This narrows down the possible choices for the selected agent in X2 to two out of the six agents in A. The selected agent performs the final decision by choosing an action a ∈ A. Depending on its degree of specialization, which is a result of its own and the
coordinating agents' resources, this agent will choose its action from a certain subset of the full space A.
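The following Python sketch outlines the resulting update loop for this three-level example. It is a simplified illustration rather than the authors' implementation: the Lagrange parameters are held fixed (in the paper they are chosen to meet the bounds $D_x$), all effective utilities are recomputed once per sweep, and strictly positive probabilities are assumed throughout:

```python
import numpy as np

def three_level_updates(rho, U, beta1, beta2, beta3, n1, n2, n_sweeps=200):
    # rho: (n_w,) world-state distribution; U: (n_w, n_a) utility.
    # beta2: (n1,) and beta3: (n1, n2) hold one parameter per agent.
    rng = np.random.default_rng(0)
    n_w, n_a = U.shape
    P1 = rng.random((n_w, n1)); P1 /= P1.sum(-1, keepdims=True)
    P2 = rng.random((n_w, n1, n2)); P2 /= P2.sum(-1, keepdims=True)
    P3 = rng.random((n_w, n1, n2, n_a)); P3 /= P3.sum(-1, keepdims=True)

    for _ in range(n_sweeps):
        # marginals (equation 3.3): p1(x1), p2(x2|x1), p3(a|x2,x1)
        p_wx1 = rho[:, None] * P1
        p1 = p_wx1.sum(0)
        p2 = np.einsum('wi,wij->ij', p_wx1 / p1, P2)
        p_wx1x2 = p_wx1[:, :, None] * P2
        p3 = np.einsum('wij,wija->ija', p_wx1x2 / p_wx1x2.sum(0), P3)

        # effective utilities F2(w,x1,x2) and F1(w,x1)
        F2 = np.einsum('wija,wa->wij', P3, U) \
             - (P3 * np.log(P3 / p3)).sum(-1) / beta3
        F1 = (P2 * (F2 - np.log(P2 / p2) / beta2[:, None])).sum(-1)

        # posterior updates (equation 3.5); the division is the normalization Z
        P3 = p3 * np.exp(beta3[..., None] * U[:, None, None, :])
        P3 /= P3.sum(-1, keepdims=True)
        P2 = p2 * np.exp(beta2[:, None] * F2)
        P2 /= P2.sum(-1, keepdims=True)
        P1 = p1 * np.exp(beta1 * F1)
        P1 /= P1.sum(-1, keepdims=True)
    return P1, P2, P3
```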
5 Optimal Architectures
Here, we show how the framework we have described can be used to de-
termine optimal architectures of bounded rational agents. Summarizing the
assumptions made in the derivations, the multiagent systems that we ana-
lyze must fulfill the following requirements:
i. The information flow is feedforward. An agent in Xk can obtain in-
formation directly from another agent that belongs to Xm only if
m < k.
ii. Intermediate agents cannot be end points of the decision-making
process. The information flow always starts with the processing of
W and always ends with a decision a ∈ A.
iii. A single agent is not allowed to have multiple prior policies. Agents
are the smallest decision-making unit, in the sense that they trans-
form a prior to a posterior policy over a set of actions in one step.
The performance of the resulting architectures is measured with respect
to the expected utility they are able to achieve under a given set of re-
source constraints. To this end, we need to specify (1) the objective for the
full decision-making process, (2) the number N of decision-making steps in
the system, (3) the maximal number n of agents to be distributed among
the nodes, and (4) the individual resource constraints {D1
, . . . , Dn} of those
agents. We illustrate these specifications with a toy example in section 5.2
by showcasing and explicitly explaining the differences in performance of
several architectures. Moreover, we provide a broad performance compari-
son in section 5.3, where we systematically vary a set of objective functions
and resource constraints in order to determine which architectural features
most affect the overall performance. For simplicity, in all simulations, we
are limiting ourselves to architectures with N (cid:2) 3 nodes and n (cid:2) 10 agents.
In the following section, we start by describing how we characterize the
architectures conforming to the three requirements.
5.1 Characterization of Architectures

5.1.1 Type. In order to be able to reference the architectures resulting from i-iii, we label an $N$-step decision-making process with $N > 1$ by a tuple $(i_1, \ldots, i_{N-1})$ of length $N - 1$, which we call the type of the architecture, where $i_{N-2}$ characterizes the relation between the first $N$ variables $X_0, \ldots, X_{N-1}$, and $i_{N-1}$ determines how these variables are connected to $X_N$. Note that the explicit mapping between an index and the corresponding relation of random variables is arbitrary.

For example, for $N \leq 3$, we define the types shown in Figure 3, where $i \in \{0, 1, 2\}$ and $j \in \{0, \ldots, 5\}$ represent the following relations:
Figure 3: Overview of the resulting architectures for $N \leq 3$, each of them labeled by its type.
$i = 0$ : $(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{\}, \{X_1\})$,
$i = 1$ : $(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{X_1\}, \{W\})$,
$i = 2$ : $(\mathcal{X}^2_{\mathrm{sel}}, \mathcal{X}^2_{\mathrm{in}}) = (\{\}, \{W\})$,
$j = 0$ : $(\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{\}, \{X_2\})$,
$j = 1$ : $(\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{\}, \{X_1, X_2\})$,
$j = 2$ : $(\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{X_1\}, \{X_2\})$,
$j = 3$ : $(\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{X_2\}, \{X_1\})$,
$j = 4$ : $(\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{X_1, X_2\}, \{W\})$,
$j = 5$ : $(\mathcal{X}^3_{\mathrm{sel}}, \mathcal{X}^3_{\mathrm{in}}) = (\{X_2\}, \{W\})$.
For example, the architecture shown in Figure 2 has the type (1, 4). Correspondingly, the two-step cases are labeled by (i, ) for i ∈ {0, 1, 2}, and the one-step rate distortion case by (−1, ). Note that not every combination of i ∈ {0, 1, 2} and j ∈ {0, . . . , 5} describes a unique system; for example, (2, 3) is equivalent to (2, 2) when replacing X1 by X2. Moreover, as already mentioned, (2, ) is equivalent to (−1, ), and similarly, (0, 1) is equivalent to (0, ).
5.1.2 Shape. After the number of nodes has been fixed, the remaining property that characterizes a given architecture is the number of agents per node. For most architectures, there are multiple possibilities to distribute a given number of agents among the nodes, even when neglecting individual differences in resource constraints. We call such a distribution a shape, denoted by $[n_1, n_2, \ldots]$, where $n_k$ denotes the number of agents in node $k$. Note that not all architectures will be able to use the full number of available agents, most notably the one-step rate distortion case (one agent) or the two-step serial case (two agents). For these systems, we always use the agents with the highest available resources in our simulations.
For example, for $N \leq 3$, the resulting shapes for a maximum of n = 10 agents are as follows:
• [1] for (−1,), [1, 1] for (0,), and [1, 9] for (1,)
• [1, 1, 1] for (0, 0) and (2, 1)
• [1, 1, 8] for (0, 2), (0, 3), (0, 5), (2, 2)
• [1, 1, (2, 4)] and [1, 1, (4, 2)] for (0, 4) and (2, 4)
• [1, 8, 1] for (1, 0) and (1, 1)
• [1, 4, 4] for (1, 2)
• [1, 2, 7], [1, 3, 6], [1, 4, 5], [1, 5, 4], [1, 6, 3], [1, 7, 2] for (1, 3) and (1, 5)
• [1, 2, (2, 3)] and [1, 3, (3, 2)] for (1, 4),
where a tuple inside the shape means that two different nodes are deciding about the agents in that spot; for example, [1, 1, (2, 4)] means that there are eight agents in the last node, labeled by the values $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$ with $|\mathcal{X}_1| = 2$ and $|\mathcal{X}_2| = 4$. In Figure 4, we visualize one example architecture for each of the above three-step shapes, except for the shapes of type (1, 4), of which one example is shown in Figure 2.

Together, the type $(i, \ldots)$ and shape $[n_1, \ldots]$ uniquely characterize a given multiagent architecture, denoted by $(i, \ldots)_{[n_1, \ldots]}$.
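A small helper (again our own encoding of the shape notation, not the authors' code) makes the agent count of a shape explicit:

```python
def n_agents(shape):
    # A tuple entry (n, m) stands for n * m agents in a node whose prior
    # is selected jointly by two other nodes.
    return sum(x[0] * x[1] if isinstance(x, tuple) else x for x in shape)

n_agents([1, 3, (3, 2)])   # 10, the architecture (1, 4)[1,3,(3,2)] of Figure 2
n_agents([1, 1, (2, 4)])   # 10, the architecture (2, 4)[1,1,(2,4)]
```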
5.2 Example: Call Center. Consider the operation of a company’s call
center as a decision-making process, where customer calls (world states)
must be answered with an appropriate response (action) in order to achieve
high customer satisfaction (utility). The utility function shown on the left
of Figure 5 can be viewed as a simplistic model for a real-world call cen-
ter of a big company such as a communication service provider. In this
simplification, there are 24 possible customer calls that belong to three
separate topics—for example, questions related to telephone, Internet, or
television—which can be further subdivided into two subcategories—for
example, consisting of questions concerning the contract or problems with
the hardware. (See the Figure 5 caption for the explicit utility values.)
Handling all possible phone calls perfectly by always choosing the cor-
responding response with maximum utility requires log2(24) ≈ 4.6 bits (see
Figure 5). However, in practice, a single agent is usually not capable of
knowing the optimal answers to every type of question. For our example,
this means that the call center has access only to agents with information
processing capability less than 4.6 bits. It is then required to organize the agents in a way that each agent has to deal with only a fraction of the customer calls. This is often realized by first passing the phone call through several filters in order to forward it to a specialized agent. Arranging these selector or filter units in a strict hierarchy then corresponds to architectures of the form of (1, 4) or (1, 5) (see below for a comparison of these two), where at each stage, a single operator selects how a call is forwarded. In contrast, architectures of the form of (2, 4) allow for multiple independent filters working in parallel—for example, realized by multiple trained neural networks, where each is responsible for a particular feature of the call (say, one node deciding about the language of the call and another node deciding about the topic). In the following, we do not discriminate between human and artificial decision makers, since both can qualify equally well as information-processing units.

Figure 4: Visualization of exemplary three-step multiagent architectures specified by their types and shapes.
Figure 5: (a) Utility function for example 5.2. The set of phone calls W is parti-
tioned into three separate regions, corresponding to three different topics about
which customers might have complaints or questions. Each of these can be di-
vided into two subcategories of four customer calls each. For each phone call,
there is exactly one answer that achieves the best result (U = 1). Moreover,
the responses that belong to one subcategory of calls are also suitable for the
other calls in that particular subcategory, albeit slightly less effective (U = 0.85)
than the optimal answers. Similarly, the responses that belong to the same topic
of calls are still a lot better (U = 0.7) than responses to other topics (U = 0).
(b) Posterior policy of a single agent with an information bound of 4.6 bits.
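The block structure of this utility function is straightforward to reproduce; the following Python sketch is a hypothetical reconstruction from the values given in the caption:

```python
import numpy as np

n_calls, n_subcats = 4, 2                   # 4 calls per subcategory, 2 per topic
n = 3 * n_subcats * n_calls                 # 24 world states and 24 responses
U = np.zeros((n, n))
for w in range(n):
    for a in range(n):
        if w == a:
            U[w, a] = 1.0                   # the one optimal answer
        elif w // n_calls == a // n_calls:
            U[w, a] = 0.85                  # same subcategory
        elif w // (n_subcats * n_calls) == a // (n_subcats * n_calls):
            U[w, a] = 0.7                   # same topic; all else stays 0
```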
Figure 6: Optimal prior policies for each agent of the architecture (1, 5)[1,3,6] with an information bound of $(D_1, D_2, \ldots, D_{10}) = (1.6, 0.1, \ldots, 0.1)$.
Assume that there are n = 10 bounded rational agents available. Con-
sidering the given utility function, the architectures (1, 4)[1,3,(3,2)] (shown in
Figure 2) and (1, 5)[1,3,6] (shown in Figure 4) might be obvious choices as
they represent the hierarchical structure of the utility function. With an in-
formation bound of 1.6 (≈ log2(3)) bits for the first agent and 0.1 bits for
the rest, the optimal prior policies for (1, 5)[1,3,6] obtained by our free en-
ergy principle are shown in Figure 6. We can see that for this architecture,
the choice x1 of the agent at the first step corresponds to the general topic
of the phone call, the decisions x2 of the three agents at the second stage
correspond to the subcategory on which one of the six agents at the final stage is specialized; that agent then makes the decision about the final response a by picking one of the four actions in the support of its prior.

Figure 7: Performance comparison under two different information bounds.
We can see in Figure 7 on the left that a hierarchical structure as in (1, 5)[1,3,6] or (1, 4)[1,3,(3,2)] is indeed superior when compared with the architecture (2, 4)[1,1,(2,4)], because there is no good selector for the second filter. We have also added two architectures to the comparison that have a bottleneck of the information flow at either end of the decision-making process, (0, 3)[1,1,8] and (1, 0)[1,8,1] (see Figure 4 for a visualization), which perform considerably worse than the others: in (0, 3)[1,1,8], the first agent is the only one who has direct contact with the customer and passes the filtered information on to everybody else, whereas in (1, 0)[1,8,1], the customer talks to multiple agents; however, they cannot make any decisions and pass the information on to a final decision node, which has to select from all possible options. Interestingly, as can be seen on the right side of Figure 7, when changing the resource bounds such that the first agent has only $D_1 = 1$ bit instead of 1.6 and the second agent has $D_2 = 0.5$ bits instead of 0.1, the strictly hierarchical architectures (1, 5)[1,3,6] and (1, 4)[1,3,(3,2)] are outperformed by the architecture (2, 4)[1,1,(2,4)], because their first agent is no longer able to perfectly distinguish among the three topics. This is an ideal situation for (2, 4)[1,1,(2,4)], since here the total information processing for filtering the phone calls is split efficiently between the first two agents in the system.
Note that (1, 4) and (1, 5) do not necessarily perform identically (as can
be seen on the right in Figure 7), even though the structure of the utility
function might suggest that it is ideal for (1, 5)[1,3,6] to always have the opti-
mal priors shown in Figure 6. However, this crucially depends on the given
information-processing bounds. In Figure 8, we illustrate the difference be-
tween the two types in more detail by showing the processed information
that can actually be achieved per agent in the respective architecture for
an information bound of D = (0.4, 2.6, 2.6, 2.6, 0.4, . . . , 0.4). When the first
agent in the hierarchy has low capacity, then the rigid structure of (1, 4) is
penalized because the agents at the second stage cannot compensate the
errors of the first agent, regardless of their capacity. In contrast, for (1, 5),
the connection between the second stage and the executing stage can be
changed freely, which leads to ignoring the first agent and letting the three
Figure 8: Demonstration of the difference between the two architectures (1, 4)[1,3,(3,2)] and (1, 5)[1,3,6] for an information bound of D = (0.4, 2.6, 2.6, 2.6, 0.4, . . . , 0.4).
agents in the second stage determine the distribution of phone calls com-
pletely. In this sense, (1, 5) is more robust to errors in the first filter than
(1, 4).
5.3 Systematic Performance Comparison. In this section, we move
away from an explicit toy example to a broad performance comparison of
all architectures for $N \leq 3$, averaged over multiple types of utility functions
and a large number of resource constraints (as defined below). In section 6.1,
this is supplemented with an analysis of the architectural features that best
explain the performances.
5.3.1 Objectives. We compare all possible architectures for 12 different utility functions, $\{U_k\}_{k=1}^{12}$, defined on a world and action space of $|\mathcal{W}| = |\mathcal{A}| = 20$ elements, and we assume the same cardinality for the range of all hidden variables. Note that the cardinality of the target set $\mathcal{X}$ for selector
ular, we consider three kinds of utility functions (one-to-one, many-to-one,
one-to-many) that we vary in a 2 × 2 paradigm, where the first dimension
is the number of maximum utility peaks (single, multiple) and the second
dimension is the range of utility values (binary, multivalued). The utility
functions are visualized in Figure 9, where the three kinds of functions cor-
respond to the three rows of the plot. A one-to-one scenario applies to a
needle-in-a-haystack situation where each world state affords only a unique
action and, vice versa, each optimal action allows uniquely identifying the
world state, for example, an absolute identification task. A many-to-one sce-
nario allows for abstractions in the world states, for example, in categoriza-
tion when multiple instances are judged to belong to the same class (e.g.,
vegetables are boiled; fruit is eaten raw). A one-to-many scenario allows for
abstractions in the action space—for example, in hierarchical motor control
when a grasp action can be performed in many different ways.
5.3.2 Resource Limitations. We are considering three schemes of resource
constraints:
Figure 9: Utility functions on which the performances are measured.
1. Same constraints for all agents
2. Same constraints for all agents but one, which has a higher limit than
the other agents
3. Same constraints for all but two agents, which can have a different
limit and have higher limits than all the other agents.
For the first scheme, we compare 20 sets of constraints $\{D_0, D_1, \ldots\}$ with $D_i$ equally spaced in the range between 0 and 3 bits; for the second, we compare 39 sets in the same range but with the high-resource agent having 1, 2, and 3 bits; and for the third, we allow 89 sets with constraints similar to the second scheme but with additional combinations for the second high-resource agent.
5.3.3 Simulation Results. The performance of an architecture is given by
its expected utility with respect to a given objective and a given informa-
tion bound as defined above. In Figure 10, we show which of the architec-
tures won at least one condition, together with the proportion of conditions
won by each of these architectures. We can see that (2, 4)[1,1,(2,4)] overall out-
performs all the other systems (see Figure 4 for a visualization).

Figure 10: Proportion of conditions where the given architectures had the highest performance, for all conditions and separately for each of the three different schemes of resource constraints.

When all
agents have the same resource constraints, the architecture (1, 4)[1,3,(3,2)] is
a strong second winner; however, this is not the case if one or two agents have more resources than the rest. It is not surprising that in these situations, the parallel case with one high-resource agent distributing the work among the low-resource agents, and even the case of a single agent that does everything by itself, both perform well.
A closer look on the achieved expected utilities, however, shows several
architectures that are almost equally well performing for many conditions.
In order to increase comparability between the different utility functions,
we measure performance in terms of a relative score, which, for a given
utility function and resource constraint, is given by the architectures’ ex-
pected utility divided by the maximum expected utility of all architectures.
The score averaged over all conditions is shown for each architecture in
Figure 11 in the top row. We can see that the best architectures are pretty
close to each other. As expected, the architecture that won the most condi-
tions also has the highest overall performance; however, there are multiple
architectures that are very close. The top three architectures are
(2, 4)[1,1,(2,4)], (1, 5)[1,3,6], (1, 4)[1,3,(3,2)], (5.1)
which have already been visualized (see Figures 2 and 4).
A better understanding of their performances under different resource
constraints can be gathered from the remaining graphs in Figure 11. In the
second row, we can see that the top three overall architectures also perform
best for almost all utility functions when averaged over the information
Figure 11: Architecture performances averaged over all conditions (a), aver-
aged over all information bounds for each utility function (b), and averaged
over all utility functions for each information bound (c–e).
bounds. The last three graphs in Figure 11 show the expected utility of
each architecture averaged over all utility functions for each information
bound. We can see how the expected utility increases with higher informa-
tion bounds for some architectures more than for others. The top three architectures perform differently for most of the bounds, with spans of bounds where each of them clearly outperforms the others.
6 Discussion
6.1 Analysis of the Simulations. Plenty of factors influence the perfor-
mance of each of the given architectures. Here, we attempt to unfold the
features that determine their performances in the clearest way. To this end,
we compare the architectures with respect to the following quantities:
Average specialization of operational agents: The specialization (see equation
4.5) averaged over all agents in the final stage of the architecture
Hierarchical: Boolean value that specifies whether an architecture is hierarchical, meaning that consecutive nodes are occupied by an increasing number of agents
Agents with direct w-access: The number of agents with direct world state
access
Operational agents with direct w-access: The number of agents in the last
node of the architecture
Number of w-bottlenecks: The total number of nodes that are missing direct
access to the world state
As can be seen from Figure 12, we found that these architectural fea-
tures explain the differences in performance quite well. More precisely,
the architectures can be roughly grouped into different categories (indicated by slightly different color saturations in Figure 12). The poorest-
performing group consists of architectures that have between one and two
w-bottlenecks, and therefore have only a few agents with direct w-access; in
particular, none of their operational agents has direct w-access. Moreover,
in this group, most architectures are not hierarchical at all, and their oper-
ational agents have low specialization, with two exceptions that both have
two w-bottlenecks.
The architectures with medium performance have at most one
w-bottleneck, and many of them are hierarchical. Here, the systems whose
operational units have high specialization lack direct w-access, and the
systems whose operational units have direct w-access have low
specialization.
All architectures in the top group have many agents with direct world-
state access, and they have no w-bottlenecks. Interestingly, the best six ar-
chitectures are all strictly hierarchical. Moreover, the order of performance
is almost in direct accordance with the average specialization of the operational
agents.

Figure 12: Proposed features to explain the architectures' performances (see
section 6.1).
Overall, it is best to have as many operational units as needed to
discriminate the actions well, as long as the coordinating agents
have enough resources to discriminate among them properly. The architecture
(1, 4)[1,1,(2,4)] has eight operational agents managed by two coordinating
units, which need at most two bits (for choosing among four
agents) and one bit (for choosing among two agents) in order to perform
well. Both of the other top three architectures, (1, 5)[1,3,6] and (1, 4)[1,3,(3,2)],
have six operational agents managed by three coordinating units, so that
each of them needs at most one bit. But compared to (1, 4)[1,1,(2,4)], there
are fewer agents to spare for the operational stage. Hence, if the operational
units have low resources, there is always a trade-off between the number of
operational units and the resources of the coordinating ones.
Another way to see why the architecture (1, 4)[1,1,(2,4)] overall outper-
forms all the other high-ranked systems might be its lower average choice-
per-agent ratio—the average number of options for the decision of each
agent in the system. In (1, 4)[1,1,(2,4)], the second agent also directly ob-
serves the world state; moreover, the choice space of eight agents at the
operational stage is split into two and four choices. Therefore, there are
only (2 + 4 + 20)/10 = 2.6 choices per agent on average, whereas for (1, 5)[1,3,6] and
(1, 4)[1,3,(3,2)], there are (3 + 6 + 20)/10 = 2.9.
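The following minimal sketch reproduces this arithmetic. The assumption that the operational agents jointly partition 20 actions is read off from the sums above; the function name and encoding are ours.

def avg_choices_per_agent(coordinator_fanouts, n_actions, n_agents):
    """Average number of options per decision across all agents:
    each coordinator chooses among its subordinates, and the
    operational agents jointly partition the action space."""
    return (sum(coordinator_fanouts) + n_actions) / n_agents

# (1, 4)[1,1,(2,4)]: coordinators choose among 2 and 4 subordinates,
# 10 agents in total (2 coordinators + 8 operational agents).
print(avg_choices_per_agent([2, 4], n_actions=20, n_agents=10))  # 2.6

# (1, 5)[1,3,6] and (1, 4)[1,3,(3,2)]: coordinators choose among 3 and 6.
print(avg_choices_per_agent([3, 6], n_actions=20, n_agents=10))  # 2.9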
6.2 Limitations of Our Analysis. The analysis we have presented pro-
vides only a rough explanation of the differences in performance. Which ar-
chitecture is optimal depends a lot on the actual information bounds of each
agent. In all of our conditions, we assumed that most agents have the same
processing capabilities, which is why there is a certain bias toward architec-
tures that perform well under this assumption (low variance in choice-per-
agent ratio across the agents).
Due to the large number of Lagrange parameters in the free energy
principle (see equation 4.2), the data were generated by running the
Blahut-Arimoto-type algorithm for 10,000 different parameter combinations
for each architecture, for each of the three types of resource limitations
in section 5.3, and for each of the utility functions defined in section 5.3.
For a given information bound, the corresponding
parameters were determined by selecting the points with the highest free
energy that still respect the bound. A better approach would be to augment
the global parameter search with a finer-grained local search. Another
possibility is an evolutionary algorithm, where each population is
given by multiple sets of parameters and the information constraints are
built in by a method similar to Chehouri, Younes, Perron, and Ilinca (2016).
This works well but requires significantly more computation time.
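As a rough illustration of this selection step, the following sketch picks, from a precomputed grid of runs, the parameter combination with the highest free energy among those that respect a given information bound. The dictionary fields are hypothetical placeholders for the stored quantities, not names from the original code.

def select_parameters(runs, info_bound):
    """runs: list of dicts with keys 'betas' (Lagrange parameters),
    'free_energy', and 'information' (processing cost in bits)."""
    feasible = [r for r in runs if r["information"] <= info_bound]
    if not feasible:
        return None  # no parameter combination respects the bound
    return max(feasible, key=lambda r: r["free_energy"])

runs = [
    {"betas": (0.5, 2.0), "free_energy": 1.3, "information": 0.8},
    {"betas": (1.0, 4.0), "free_energy": 1.9, "information": 1.6},
    {"betas": (2.0, 8.0), "free_energy": 2.4, "information": 2.7},
]
best = select_parameters(runs, info_bound=2.0)  # picks betas (1.0, 4.0)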
Since the Blahut-Arimoto type of algorithm is not guaranteed to con-
verge to a global maximum, the resulting values for the expected utility
and mutual information for a given set of parameters can depend on the
initialization of the algorithm. In practice, this variation is small enough
that it influences the average performance over multiple conditions by
only a negligible amount. However, direct comparisons of architectures for
a given information bound and utility function should be repeated multiple
times to make sure that the results are stable.
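A simple way to implement such a stability check is to rerun the optimization from several initializations and inspect the spread of the resulting expected utilities. In the sketch below, blahut_arimoto is a placeholder for the actual solver, assumed to accept a seed and return a dictionary of results.

import statistics

def stability_check(blahut_arimoto, params, n_repeats=10):
    """Rerun the solver with different seeds and report the mean and
    standard deviation of the resulting expected utilities."""
    utilities = [blahut_arimoto(params, seed=s)["expected_utility"]
                 for s in range(n_repeats)]
    return statistics.mean(utilities), statistics.stdev(utilities)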
6.3 Relation to Variational Bayes and Active Inference. Above, we de-
termined the architectures that achieve the highest expected utility under a
given resource constraint. These constraints are fulfilled by tuning the La-
grange multipliers in the free energy principle. If the Lagrange multipliers
themselves are fixed, for instance, as exchange rates between information
and utility (Ortega & Braun, 2010), or inverse temperatures in thermody-
namics (Ortega & Braun, 2013), then the free energy itself would be an
appropriate performance measure. This is done, for example, in Bayesian
model selection, which is also known as structure learning and represents
an important problem in Bayesian inference and machine learning. The
Bayesian approach for evaluating different Bayesian network structures, in
order to find the relation of a given set of hidden variables that best ex-
plains a data set D, consists in comparing the marginal likelihood or evi-
dence p(D|S) of the structures S (Friedman & Koller, 2003). This can be seen
to be analogous to a performance comparison of different decision-making
architectures measured by the free energy. In the simple case of one observ-
able Y and one hidden variable X, we have
p(y|S) = Σ_{x∈X} p(x|S) p(y|x, S)   ∀y ∈ Y,
where the likelihood p(y|x, S) is assumed to be known. Given a prior p(x|S)
and, for simplicity, a single observed data point y ∈ Y, the posterior distri-
bution of X can be inferred by using Bayes’ rule:
p(x|y, S) = p(x|S) p(y|x, S) / p(y|S)   ∀x ∈ X.   (6.1)
As Ortega and Braun (2013) have noted, when comparing equation 6.1
with the Boltzmann equation, 2.4, we can see that equation 6.1 is equivalent
to the posterior P of a bounded rational decision maker with choice space
X , prior policy p(x|S), Lagrange parameter β = 1, and utility function given
by U(x) := log p(y|x, S). Since the marginal likelihood p(y|S) is the normal-
ization constant in equation 6.1, it follows immediately from equation 2.5
that log p(y|S) is the optimal free energy Fvar[P = p( · |y, S)] of this decision
maker, where
Fvar[P] := Σ_x P(x) log p(y|x, S) − Σ_x P(x) log [ P(x) / p(x|S) ]

= Σ_x P(x) log [ p(x|y, S) / P(x) ] + log p(y|S)

= Σ_x P(x) log [ p(x, y|S) / P(x) ].   (6.2)
In Bayesian statistics, Fvar is known as the variational free energy, and
the given decomposition is often referred to in terms of the difference be-
tween accuracy (expected log likelihood) and complexity (KL divergence
between prior and posterior). It is used in the variational characteriza-
tion of Bayes’ rule, that is, the approximation of the exact Bayesian pos-
terior p( · |y, S) given by equation 6.1 in terms of a simpler—for example, a
parameterized—distribution q by minimizing the KL divergence between q
and p( · |y, S). Since DKL(q ‖ p( · |y, S)) = −Fvar[q] + log p(y|S), this is
equivalent to the maximization of Fvar.
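This correspondence is easy to verify numerically. The following sketch (our illustration, not code from the letter) checks that Fvar evaluated at the exact posterior equals log p(y|S) and that any other distribution q falls short of this value by exactly DKL(q ‖ p( · |y, S)):

import numpy as np

rng = np.random.default_rng(0)
prior = rng.dirichlet(np.ones(4))           # p(x|S)
lik = rng.dirichlet(np.ones(3), size=4)     # p(y|x,S), one row per x
y = 1                                       # single observed data point

joint = prior * lik[:, y]                   # p(x, y|S)
evidence = joint.sum()                      # p(y|S)
posterior = joint / evidence                # p(x|y,S) via Bayes' rule (6.1)

def f_var(q):
    """Variational free energy, last line of equation 6.2."""
    return np.sum(q * np.log(joint / q))

q = rng.dirichlet(np.ones(4))               # arbitrary approximate posterior
kl = np.sum(q * np.log(q / posterior))      # DKL(q || p(.|y,S))

assert np.isclose(f_var(posterior), np.log(evidence))
assert np.isclose(np.log(evidence) - f_var(q), kl)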
The same is true for multiple hidden variables. For example, let S
be the three-step architecture of type (1, 4) from section 4.4 with W = Y
and hidden variables X1, X2, and X3 = A. Setting β1 = β2 = β3 = 1 and
U(a, x1, x2, y) = log p(y|a, x1, x2, S), we obtain

F2(y, x1, x2) = log p(y|x1, x2, S),   F1(y, x1) = log p(y|x1, S)

and

Z(y, x1, x2) = p(y|x1, x2, S),   Z(y, x1) = p(y|x1, S),   Z(y) = p(y|S).
Note that although so far we have always assumed that the utility function
depends only on the world states and actions, the equations in sections 3, 4,
and 4.4 are also valid in the general case of U depending on all the variables
in the system. The total free energy for a given y ∈ Y then takes the form
Σ_{x1,x2} p(x1, x2|y, S) [ log p(y|x1, x2, S) − log ( p(x2|x1, y, S) / p(x2|x1, S) ) − log ( p(x1|y, S) / p(x1|S) ) ]

= Σ_{x1} p(x1|y, S) [ log p(y|x1, S) − log ( p(x1|y, S) / p(x1|S) ) ]

= log p(y|S).
Hence, also in this case, the logarithm of the marginal likelihood is given
by the free energy of the corresponding decision-making system. Choosing
the multistep architecture with the highest free energy is then analogous to
Bayesian model selection with the marginal likelihood or Bayesian model
evidence as performance measure.
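The telescoping identity above can likewise be checked numerically. The sketch below (again our illustration, with arbitrary dimensions) builds a random two-hidden-variable model and verifies that the total free energy of the exact posteriors equals log p(y|S):

import numpy as np

rng = np.random.default_rng(2)
p_x1 = rng.dirichlet(np.ones(3))                 # p(x1|S)
p_x2 = rng.dirichlet(np.ones(4), size=3)         # p(x2|x1,S)
p_y = rng.dirichlet(np.ones(2), size=(3, 4))     # p(y|x1,x2,S)
y = 0

joint = p_x1[:, None] * p_x2 * p_y[:, :, y]      # p(x1,x2,y|S)
evidence = joint.sum()                           # p(y|S)
post12 = joint / evidence                        # p(x1,x2|y,S)
post1 = post12.sum(axis=1)                       # p(x1|y,S)
post2_given1 = post12 / post1[:, None]           # p(x2|x1,y,S)

total_free_energy = np.sum(
    post12 * (np.log(p_y[:, :, y])
              - np.log(post2_given1 / p_x2)
              - np.log(post1 / p_x1)[:, None]))

assert np.isclose(total_free_energy, np.log(evidence))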
Another interesting interpretation of equation 6.2 is that here, the hid-
den variable X can be thought of as an action causing observed outcomes y.
This is close to the framework of active inference (Friston, Rigoli et al., 2015;
Friston, Parr et al., 2017), where actions directly cause transitions of hidden
states, which generate outcomes that are observed by the actor. More pre-
cisely, there the real-world process generating observable outcomes is dis-
tinguished from an internal generative model describing the beliefs about
the external generative process (e.g., a Markov decision process). Observa-
tions are generated from transitions of hidden states, which depend on the
decision maker’s actions. Decision making is given by the optimization of a
variational free energy analogous to equation 6.2, where the log likelihood
is given by the generative model, which describes beliefs about the hidden
and control states of the generative process. This way, utilities are absorbed
into a (desired) prior (Ortega & Braun, 2015). There are several differences
from our approach. First, the structure of the free energy principle of bounded
rationality originates from the maximization of a given predefined exter-
nal utility function under information constraints, whereas the free energy
principle of active inference aims to minimize surprise or Bayesian model
evidence, effectively minimizing the divergence between approximate and
true posterior. Second, in active inference, utility is transformed into pref-
erences in terms of prior beliefs, while in bounded rationality, prior policies
over actions can be part of the optimization process, which results in spe-
cialization and abstraction. In contrast, active inference compounds utili-
ties and priors into a single desired prior, which is fixed and does not allow
separately optimizing utility and action priors.
7 Conclusion
In this work, we have presented an information-theoretic framework
to study systems of decision-making units with limited information-
processing capabilities. It is based on an overarching free energy opti-
mization principle that, on the one hand, allows computing the optimal
performances of explicit architectures and, on the other hand, produces op-
timal partitions of the involved choice spaces into regions of specialization.
In order to combine a given set of bounded rational agents, the full decision-
making process is split into multiple decision steps by introducing interme-
diate decision variables, and then a given set of agents is distributed among
these variables. We have argued that this leads to two types of agents,
nonoperational units that distribute the work among subordinates and op-
erational units that do the actual work in the sense of choosing a
particular action that either serves as an input for another agent in the sys-
tem or represents the final decision of the full process. This “vertical” spe-
cialization is enhanced by optimizing over the agents’ prior policies, which
leads to an optimal soft partitioning of the underlying choice space of each
step in the system, resulting in a “horizontal” specialization as well.
In order to illustrate the proposed framework, we have simulated and
analyzed the performances under a number of different resource con-
straints and tasks for all possible three-step architectures whose informa-
tion flow starts by observing a given world state and ends with the selection
of a final decision. Although the relative architecture performances depend
crucially on the explicit information-processing constraints, the overall best-
performing architectures tend to be hierarchical systems of nonoperational
manager units at higher hierarchical levels and operational worker units at
the lowest level.
Our approach is based on earlier work on information-theoretic bounded
rationality (Ortega & Braun, 2011, 2013; Genewein & Braun, 2013; Genewein
et al., 2015). In particular, the N-step decision-making systems introduced in
section 3 generalize the two-step processes studied in Genewein and Braun
(2013) and Genewein et al. (2015). According to Simon (1979), there are three
different bounded rational procedures that can transform intractable into
tractable decision problems: (1) looking for satisfactory choices instead of
optimal ones, (2) replacing global goals with tangible subgoals, and (3) di-
viding the decision-making task among many specialists. From this point
of view, the decision-making process of a single agent, given by the one-
step case of information-theoretic bounded rationality (Ortega & Braun,
2011, 2013) described in section 2, corresponds to the first procedure, while
the bounded rational multistep and multiagent decision-making processes
introduced in sections 3 and 4 can be attributed to the second and third
procedures.
The main advantage of a purely information-theoretic treatment is its
universality. To our knowledge, this work is the first systematic theory-
guided approach to the organization of agents with limited resources in the
generality of information theory. In other approaches, more specific meth-
ods, tailored to each particular focus of study, are used instead. In particular,
bounded rationality usually has a very specific meaning, often being imple-
mented by simply restricting the cardinality of the choice space. For exam-
ple, in management theory, the well-known results by Graicunas from the
1930s (Graicunas, 1933) suggest that managers must have a limited span
of control in order to be efficient. By counting the number of possible re-
lationships between managers and their subordinates, he concludes that
there is an explicit upper bound of five or six subordinates. Of course, there
are many cases of successful companies today that disagree with Graicu-
nas’s claim; for example, Apple’s CEO has 17 managers reporting directly
to him. However, current management experts think that the optimal num-
ber is somewhere between 5 and 12. The idea of restricting the cardinality of
the space of decision making is also studied for operational agents. For ex-
ample, Camacho and Persky (1988) explore the hierarchical organization of
specialized producers with a focus on production. Although their treatment
is more abstract and more general than many preceding studies, their take
on bounded rationality is very explicit and based on the assumption that the
number of elementary parts that form a product, as well as the number of
possibilities of each part, are larger than a single individual can handle. Sim-
ilarly, in most game-theoretic approaches that are based on automaton the-
ory (Neyman, 1985; Abreu & Rubinstein, 1988; Hernández & Solan, 2016),
the boundedness of an agent’s rationality is expressed by a bound on the
number of states of the automaton. Most of these non-information-theoretic
treatments consider cases when there is a hard upper bound on the number
of options, but they usually lack a probabilistic description of the behavior
in cases when the number of options is larger than the given bound.
The work by Geanakoplos and Milgrom (1991) uses “information” to
describe the limited attention of managers in a firm. But there, the
term is used more informally, and not in the classical information-theoretic sense.
However, one of their results suggests that “firms with more prior informa-
tion about parameters . . . will employ less able managers, or give their man-
agers wider spans of control” (Geanakoplos & Milgrom, 1991, p. 207). This
observation is in line with information-theoretic bounded rationality, since
by optimizing over priors in the free energy principle, the required process-
ing information is decreased compared to the case of nonoptimal priors, so
that less able agents can perform a given task or, similarly, an agent with a
higher information bound can have a larger choice space.
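This effect can be illustrated directly: the average KL divergence between an agent's posteriors and its prior (the processing cost) is minimized when the prior equals the marginal policy, where the cost reduces to the mutual information. A small numerical sketch (our illustration, not from the letter):

import numpy as np

rng = np.random.default_rng(1)
p_w = rng.dirichlet(np.ones(3))             # distribution over world states
post = rng.dirichlet(np.ones(4), size=3)    # posterior policies P(a|w)

def avg_kl(prior):
    """Average processing cost E_w[ DKL(P(.|w) || prior) ]."""
    return np.sum(p_w[:, None] * post * np.log(post / prior))

optimal_prior = p_w @ post                  # marginal policy p(a)
other_prior = rng.dirichlet(np.ones(4))     # any nonoptimal prior

# The optimized prior never requires more processing information.
assert avg_kl(optimal_prior) <= avg_kl(other_prior)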
In neuroscience, the variational Bayes approach explained in section 6.3
has been proposed as a theoretical framework to understand brain function
in terms of active inference (Friston, 2009, 2010; Friston, Levin et al., 2015;
Friston, Rigoli et al., 2015; Friston, Lin et al., 2017; Friston, Parr et al., 2017),
where perception is modeled as variational Bayesian inference over hid-
den causes of observations. There, a processing node (usually a neuron) is
limited in the sense that it can only linearly combine a set of input signals
into a single output signal. Decision making is modeled by approximating
Bayes’ rule in terms of these basic operations and then tuning the weights
of the resulting linear transformations in order to optimize the free energy
(see equation 6.2). Hence, there, the free energy serves as a tool to computa-
tionally simplify Bayesian inference on the neuronal level, whereas our free
energy principle is a tool to computationally trade off expected utility and
processing costs, providing an abstract probabilistic description of the best
possible choices when the information-processing capability is limited.
In the general setting of approximate Bayesian inference, there are
many interesting algorithms and belief update schemes—for example, be-
lief propagation in terms of message passing on factor graphs (see Yedidia,
Freeman, & Weiss, 2005). These algorithms make use of the notion of the
Markov boundary (minimal Markov blanket) of a node X, which consists
of the nodes that share a common factor with X (so-called neighbors).
Conditioned on its Markov boundary, a given random variable is inde-
pendent of all other variables in the system, which allows approximating
marginal probabilities in terms of local messages between neighbors. These
approximations are generally exact only on tree-like factor graphs with-
out loops (Mézard & Montanari, 2009, theorem 14.1). This raises the inter-
esting question of whether such algorithms could also be applied to our
setting. First, it should be noted that variational Bayesian inference consti-
tutes only a subclass of problems that can be expressed by utility optimiza-
tion with information constraints. In this subclass, all random variables
have to appear either in utility functions (they have to be given as log
likelihoods) or in marginal distributions that are kept fixed—see, for ex-
ample, the definition of the utility in the inference example above where
U(a, x1, x2, y) = log p(y|a, x1, x2, S), compared to the utility functions of the
form U(w, a) used throughout the letter that leave all intermediate random
variables X1, . . . , XN−1 unspecified. Second, while it may be possible to
exploit the notion of Markov blankets by recursively computing free ener-
gies among the nodes in a similar fashion to message passing, there can also
be contributions from outside the Markov boundary—for example, when
the action node has to take an expectation over possible world states that
lie outside the Markov boundary. Finally, it may be interesting to study
whether message-passing algorithms can be extended to deal with our gen-
eral problem setting and at least to approximately generate the same kind
of solutions as Blahut-Arimoto, even though in general, we do not have
tree-structured graphs.
There are plenty of other possible extensions of the basic frame-
work introduced in this work. Marschak and Reichelstein (1998) study
multiagent systems in terms of communication cost minimization, while ig-
noring the actual decision-making process. One could combine our model
with the information bottleneck method (Tishby, Pereira, & Bialek, 1999)
and explicitly include communication costs in order to study more gen-
eral agent architectures—in particular, systems with nondirected informa-
tion flow. Moreover, we have seen in our simulations that specialization
of operational agents is an important feature shared among all of the best-
performing architectures. In the biological literature, specialization is often
paired with modularity. For example, Kashtan and Alon (2005) and Wagner
et al. (2007) show that modular networks are an evolutionary consequence
of modularly varying goals. Similarly, it would be interesting to study the
effects of changing environments on specialization, abstraction, and opti-
mal network architectures of systems of bounded rational agents.
Appendix: Proof of Equation 3.5
The free energy functional F that is optimized in the free energy principle,
equation 3.4, is given by
F[P1, . . . , PN] = Σ_x p(x) F0,loc(x),

where x := (x0, . . . , xN) and, for all k ∈ {0, . . . , N},

p(x) = ρ(x0) P1(x1_sel | x1_in) · · · PN(xN_sel | xN_in),

Fk,loc(x) = U(x0, xN) − Σ_{i>k} (1/βi) log [ Pi(xi_sel | xi_in) / pi(xi_sel | xi_in) ].

By writing

F0,loc(x) = Fk,loc(x) − (1/βk) log [ Pk(xk_sel | xk_in) / pk(xk_sel | xk_in) ] − Rk(x