
Communicated by Minoru Asada

A Neural Framework for Organization and Flexible
Utilization of Episodic Memory in Cumulatively
Learning Baby Humanoids

Vishwanathan Mohan
vishwanathan.mohan@iit.it
Giulio Sandini
giulio.sandini@iit.it
Pietro Morasso
pietro.morasso@iit.it
Robotics, Brain and Cognitive Science Department, Istituto Italiano di Tecnologia,
Genova, Italy

Cumulatively developing robots offer a unique opportunity to reenact
the constant interplay between neural mechanisms related to learning,
memory, prospection, and abstraction from the perspective of an inte-
grated system that acts, learns, remembers, reasons, and makes mistakes.
Situated within such interplay lie some of the computationally elusive
and fundamental aspects of cognitive behavior: the ability to recall and
flexibly exploit diverse experiences of one’s past in the context of the
present to realize goals, simulate the future, and keep learning further.
This article is an adventurous exploration in this direction using a sim-
ple engaging scenario of how the humanoid iCub learns to construct
the tallest possible stack given an arbitrary set of objects to play with.
The learning takes place cumulatively, with the robot interacting with
different objects (some previously experienced, some novel) in an open-
ended fashion. Since the solution itself depends on what objects are
available in the “now,” multiple episodes of past experiences have to be
remembered and creatively integrated in the context of the present to
be successful. Starting from zero, where the robot knows nothing, we
explore the computational basis of the organization of episodic memory in a
cumulatively learning humanoid and address (1) how relevant past ex-
periences can be reconstructed based on the present context, (2) how
multiple stored episodic memories compete to survive in the neural
space and not be forgotten, (3) how remembered past experiences can
be combined with explorative actions to learn something new, and (4)
how multiple remembered experiences can be recombined to generate
novel behaviors (without exploration). Through the resulting behav-
iors of the robot as it builds, breaks, learns, and remembers, we em-
phasize that mechanisms of episodic memory are fundamental design
features necessary to enable the survival of autonomous robots in a real
world where neither everything can be known nor can everything be
experienced.

Neural Computation 26, 2692–2734 (2014)   doi:10.1162/NECO_a_00664
© 2014 Massachusetts Institute of Technology. Published under a Creative
Commons Attribution 3.0 Unported (CC BY 3.0) license.

1 Introduction

Our individual experiences play a fundamental role in leading us to ex-
hibit numerous instances of creativity, rationality, and irrationality in our
behaviors. Use of experience to go beyond experience is important simply
because we all inhabit a continuously changing world where neither every-
thing can be known nor everything can be experienced. To survive, we must
integrate diverse chunks of knowledge emerging from our past experiences
and exploit them flexibly in the context of the present to ensure smooth re-
alization of our goals. Neural mechanisms associated with the organization
and use of memory play a fundamental role in connecting our past with the
available present and possible future. Indeed, such processes are of crucial
importance for autonomous robots situated in unstructured environments.
Put simply, beyond a point, a software programmer cannot travel the jour-
ney of an autonomous robot. Instead, like natural cognitive agents, robots
must be endowed with mechanisms that enable them to efficiently organize
their sensorimotor experiences into their memories, remember and exploit
them effectively when needed, and keep learning cumulatively.

This article is an adventurous exploration in this direction using a play-
ful scenario of the humanoid iCub learning to assemble the tallest possible
stack using an arbitrary set of objects available to it: learning progressing
cumulatively in an open-ended fashion. There are several causal relations
that the robot has to learn, remember, and exploit. For example, nothing
can be stacked on top of objects like spheres, mushrooms, or pyramids;
it is better to stack large objects at the bottom; the color of objects is not
a causally dominant parameter while building stacks (but shape and size
do matter); and so on. Importantly, there are no unique solutions to be
optimized because the solution itself depends on what objects are avail-
able to the robot in the present. Sometimes past experiences may have to
be combined with explorative actions on a novel object, and sometimes
multiple past experiences could be creatively recombined to generate novel
behaviors. In general, this playful scenario allows the investigation of the
constant interplay between neural mechanisms related to learning, memory,
prospection, and abstraction from the perspective of an integrated system
that acts, learns, remembers, reasons, and makes mistakes.

1.1 The Context: Connecting Emerging Trends in Neurosciences to
Developmental Robotics. A central challenge for brain science today is to
causally and computationally correlate the complex behaviors of animals
to the complex activity in their brains. Here, emerging empirical studies
from the neurosciences connect to developmental robotics that attempts
to understand cognition through a model-building approach that reenacts
the gradual process of infant developmental learning through robots. 这
underlying value is both intrinsic (understanding ourselves) and extrinsic
(creating a new generation of autonomous systems that can cognitively as-
sist us in the environments we inhabit and create). Mechanisms related to
the organization of memory in the brain have been actively investigated
over several decades at multiple levels (Squire & Wixted, 2011) and ac-
companied by propositions of various computational models (Sederberg
& Norman, 2010; Chong, Tan, & Ng, 2007). More recent excitement in
this topic is attributable to studies that provide converging evidence for
shared neural processes underlying remembering past events and simu-
lating future events. Specifically, converging evidence suggests an exten-
sive overlap in the brain networks activated while recalling the past and
those engaged during other activities as diverse as thinking about the fu-
ture (Addis, Wong, & Schacter, 2007; Szpunar, Watson, & McDermott, 2007;
Hassabis & Maguire, 2011; Schacter et al., 2012; Addis & Schacter, 2012),
spatial navigation (Burgess, Maguire, & O'Keefe, 2002; Suddendorf, 2013),
social cognition (Raichle et al., 2001; Frith & Frith, 2012), and perspective
taking (Mason et al., 2007). This network of interacting cortical areas has
been termed the default mode network (DMN) (Raichle et al., 2001; Buckner
& Carroll, 2007; Buckner, Andrews-Hanna, & Schacter, 2008; Suddendorf,
Addis, & Corballis, 2009; Bressler & Menon, 2010; Welberg, 2012). While the
reviews cited go into precise details, functionally there is consensus that
the central function of the DMN is to construct self-referential episodic simula-
tions, which include reconstruction of past experiences based on contextual
cues, simulation of possible future alternatives, evaluating their desirability,
and generating goal-directed plans. What is the underlying computational
and neural basis of such processes? Can we emulate such mechanisms in a
cumulatively developing robot (这里, the humanoid iCub)?

Practically, when a humanoid robot like iCub interacts with various
objects in its playground, it is the ongoing sequences of actions on vari-
ous perceived objects, the ensuing consequences, internal body state, 和
rewards received that mainly form the content of its experiences. 尽管
multimodal elements of sensorimotor experience and their temporal order
(i.e., microtime: Eichenbaum, 2004) need to be bound together to create an
episodic trace, inversely, partial cues arising from multiple sensorimotor
modalities must be able to trigger the recollection of relevant past episodic
经历, filling in the remaining missing information—for example,
perceiving a pyramid and recalling that it is more rewarding to place it on
the top if the goal is to assemble the tallest stack. Since the real world is
the main source of partial cues processed bottom up through the sensory
and motor streams, clearly there must be a link between subsystems in-
volved in perception and action, how such information is bound together
to form the episodic trace, and mechanisms related to recall, prospection,
and goal-directed planning.

To functionally implement such a link in a cognitive robot, we took
guidance from multiple emerging results. Recent functional imaging stud-
ies have shed light on how conceptual knowledge is organized in the brain
(Patterson, Nestor, & Rogers, 2007; Martin, 2007, 2009; Meyer & Damasio,
2009). The main finding is that conceptual information is organized in a dis-
tributed fashion in property-specific cortical networks that directly support
perception and action (and that were active during learning). 相同
set of networks is known to be active during real perception and action,
imagination, and lexical processing. From a computational perspective, 我们
believe that such organization enables information coming from lower pro-
cessing areas in the cortical hierarchy (involved in, e.g., color, shape, size,
action, sound) to generate partial cues to trigger recall of context-relevant
past experiences and facilitates learning which properties are causally dom-
inant for a specific task (e.g., the color of objects is not a causally dominant
property while constructing the tallest stack). At the same time, informa-
tion processed by subsystems organized in a distributed property-specific
fashion must be coherently integrated both to form the episodic trace and
facilitate critical top-down, bottom-up interactions during learning, recall,
prospection, and forgetting. Findings from the field of connectomics, specif-
ically in relation to small-world properties, provide valuable clues in this
direction. Small worlds are complex systems involving a large number
of individual members (e.g., people, neurons, computers) that form tightly
knit local communities (high clustering) and are characterized by very short
path lengths (globally accessible in a very few hops). Since the seminal work
of Watts, Strogatz, and Barabási (Watts & Strogatz, 1998; Barabási & Albert,
1999; Barabási, 2003), it is now established that several complex systems
(e.g., the Internet, power grids) exhibit the small-world property (Barabási,
2012). More recent attempts to map the large-scale structural architecture of
the cerebral cortex (Hagmann et al., 2008; Sporns, 2010) have revealed that
cortical networks in the brain also exhibit the small-world property, specifically
pointing to the existence of a small set of hubs (highly connected clusters)
that mediate global traffic, facilitating swift integration and in turn forming
a core network of interacting cortical areas (Van den Heuvel & Sporns, 2013;
Bressler & Menon, 2010).

Guided by these studies, our working hypothesis was that while the
distributed property-specific organization brings in a level of functional
segregation enabling efficient organization of sensorimotor information,
the small-world property enables global integration between them and fa-
cilitates the emergence of a small set of hubs that together form a higher-level
cognitive network (like the DMN). In this sense, the proposed neural frame-
work both connects and embodies these emerging trends in neurosciences.
As seen in Figure 1, there is a distributed property-specific organization of
sensorimotor information, integrated through a small set of hubs. The tem-
poral order of activations in hubs while experience is being gained forms
the core content of the robot’s episodic memory, duly supplemented by

Figure 1: (Right) Block diagram of how information related to perception and
action is organized and the link to the episodic simulation system. There is a dis-
tributed property-specific organization of sensorimotor information, integrated
through a small set of hubs. The temporal sequence of activations in the hubs
when experience is originally gained is used to form the episodic memory. 在
同一时间, bottom-up activations in the hub provide partial cues to trig-
ger context-related recall. Activations in the episodic memory network in turn
modulate top down the hubs to mediate fundamental processes like combining
past experiences with exploration, flexibly connecting multiple experiences in
a novel situation, consolidation, and forgetting. (Left) Snapshots of the basic
perception-action loop at work: the robot perceiving objects through color and
shape and performing basic motor actions necessary to interact with objects to
kick-start the learning-memory-prospection-consolidation loop.

mechanisms that enable context-specific recall, combining past experiences
with explorative actions, creative plan formation, and forgetting.

1.2 Aims and Scope. The emerging trends in neuroscience coupled
with inherent difficulties faced while enabling robotic systems to exhibit
brainlike resourcefulness, purposefulness, and adaptivity in their behav-
iors call for novel frameworks for cumulative development going beyond
conventional engineering and machine learning techniques. In this article,
we integrate emerging ideas from neuroscience to create a brain-guided
framework for the organization and creative use of episodic memory in
a cumulatively developing humanoid. Both the proposed computational
framework and the results are described in a cumulative fashion as learn-
ing progresses gradually. The goal for the robot is to learn to build the tallest
possible stack given an arbitrary set of objects. Each episode of play may in-
volve objects that have been experienced previously along with novel ones.
Moreover, there is no unique solution, as the solution itself depends on
the objects available in the now and what the robot knows about them.
Thus, both learning and reasoning take place in an open-ended setup
where the robot is continuously pushed to both exploit what it “knows”
from its past experiences in the context of new situations and at the same
time learn by exploring novel objects, remember its own mistakes, and per-
form better in the future. The simple, playful scenario is both novel and
fitting to explore complex open issues that lie at the intersection of learning,
memory, prospection, and planning when any autonomous robot learns
incrementally in an unstructured setup. Using this scenario, we explore
the computational mechanisms related to organization and utilization of
episodic memories in a cumulatively learning humanoid and specifically
try to address the following open questions:

• What are the basic neural mechanisms underlying storage and recall
of past experiences based on the present context in an open-ended
cumulatively learning setup?

• How can remembered past experiences be combined with explorative

actions to learn and memorize something new?

• How can multiple remembered experiences be recombined to gener-
ate novel behaviors in a new situation (without the need for explo-
rative actions)?

• What is the relationship between the robot’s episodic memories and
the core subsystems directly involved in perception and action when
experience is gained originally?

• The neural basis for forgetting: How do multiple episodic memories
compete to survive in the neural space and thus not be forgotten?
• Putting it all together: What are the basic computational pro-
cesses governing the incessant interplay between learning, memory,
prospection, and abstraction in a cumulatively developing system?

We next present a brief overview of the robot and existing sensorimotor
infrastructure.

1.3 The iCub Humanoid and the Underlying Perception-Action
Loop. The iCub is a small humanoid robot with the dimensions of a
three-and-a-half-year-old child, designed by the RobotCub consortium
(www.icub.org). The 105 cm tall robot is characterized by 53 degrees of
freedom: 7 DoFs for each arm, 9 for each hand, 6 for the head, 3 for the
trunk and spine, and 6 for each leg. The iCub body is also endowed with
a range of sensors: force and torque sensors, joint-angle and inertial sen-
sors, tactile sensors in the hands and arms, three-axis gyroscopes, and
cameras and microphones for visual and auditory information acquisi-
tion. With a special focus on manipulation and interaction of the robot
with the real world, iCub is characterized by highly sophisticated hands,
a flexible oculomotor system, and a sizable bimanual workspace. Figure 1
shows a block diagram of how the perception-action related information is
organized. At the bottom is the Darwin sensory layer that includes the
sensors, associated communication protocols, and algorithms to analyze
properties of the objects—mainly color, shape, and size.1 Results of percep-
tual analysis activate various neural maps (property-specific SOMs in layer
1), ultimately leading to a distributed representation of the perceived object
in the connector hub. (Interested readers may refer to Mohan, Morasso,
Sandini, & Kasderidis, 2013, for a detailed description of the sensorimo-
tor organization and learning). The kind of distributed property-specific
organization and global integration through hubs is in line with emerging
results from neuroscience discussed in section 1.1. What is relevant as far
as this article is concerned is mainly that (1) bottom-up processing leads
to a distributed representation of the perceived objects in the object connec-
tor hub (i.e., “what is it?”), and (2) due to reciprocal connectivity between the
hubs and layer 1 SOMs, it becomes possible to learn which properties are
causally dominant in a particular task (we explore this issue in subsequent
sections).

In relation to the organization of action, there is a subtle separation
between the representation of actions at an abstract level (“what can be done
with an object”) and the action planning details (“how to do”). While the
former relates to the affordance of an object, the latter relates to procedural
memories of motor skills. The abstract layer forms the action hub and
consists of single neurons coding for different action goals like reach, grasp,
push, and stack, and grows with time as new skills are learned. In this
sense, neurons in the top-level action connector hub are similar to canonical
neurons found in the premotor cortex (Murata, Fadiga, Fogassi, Gallese,
Raos, & Rizzolatti, 1997) that are activated at the sight of objects to which
specific actions are applicable. The action hub in turn provides motor goals
to the action generation layer that is responsible for the details of motion
planning and the synthesis of motor commands to perform the requisite action.
The passive motion paradigm framework (Mohan & Morasso, 2011; Mohan
et al., 2011), coordinating the iCub upper body, is used to generate all motor
actions relevant to this article. To summarize, we begin the tallest stack task
with a functional identify-localize-reach-grasp loop. Figure 1 also illustrates
the link between the core hubs and the episodic simulation system that
forms the locus of investigation in this article. The temporal sequence of
activations in the hubs when experience is originally gained is used to
form episodic memory. At the same time, bottom-up activations in the
hub provide partial cues to trigger context-related recall. Activations in the
episodic memory network in turn modulate top down the hubs to mediate
fundamental processes like combining past experiences with exploration,
flexibly connecting multiple experiences in novel situations, and the role

1The acronym Darwin stands for the ongoing EU-funded project Dexterous Assembler

Robot Working with embodied Intelligence (www.darwin-project.eu).

Figure 2: (Top) An example of one memory organized as distributed activity in
1000 neurons arranged in a sheetlike structure with 20 rows, each containing 50
neurons. Activity in every row may be thought of as an event in time. The complete
sequence is considered an episode of experience. (Bottom) Reconstruction
of the complete episodic memory triggered by a partial cue coming from the
environment (e.g., perceiving a cylinder and remembering the past experience
of stacking the cylinder on top of a pyramid and failing to receive any reward).

of consolidation and forgetting as learning progresses cumulatively. These
topics form the central core of the rest of this article.

2 A Basic Implementation of Episodic Memory

In this section, we briefly summarize a recently proposed excitatory-
inhibitory neural network of autoassociative memory (Hopfield, 2008). This
network, which deals with basic storage and retrieval mechanisms, will be
taken as a starting point and further enriched in the context of a cumula-
tive developmental learning and reasoning framework where experiences
are cumulatively acquired by the robot by interacting with the world; the
number of memories grows with time, some eventually forgotten, some
consolidated; and multiple memories of past experiences retrieved based
on context and goals may have to be causally combined to generate novel
creative behaviors. For modeling purposes in the context of this article, we
deal with a small patch of the sheetlike neocortex, consisting of 1000 pyra-
midal cells (N = 1000). For simplicity in visualization, the 1000 neurons are
organized in a sheetlike structure with 20 rows, each containing 50 neu-
rons. An example is shown in the top panel of Figure 2; activity in every
row may be thought of as an event in time and the complete memory as
an episode of experience. We are mainly dealing with objects, actions and
rewards as these are different aspects relevant to the tallest stack assembly
scenario. But in general, anything worth remembering can be represented in
such neural activity. Importantly, in the memory network of 1000 neurons,
multiple episodic memories can be encoded and retrieved—for example,
playing on day 1 with cubes and pyramids; playing on day 2 with spheres,
cubes, and containers; and so on. At the same time, given a partial cue
(the robot perceives a red pyramid on day 3), the complete past experience
that it had on day 1 (or other days) can be recalled from this partial cue.
The memory circuit is characterized by all-to-all connections between the
N excitatory neurons (hence, the connectivity matrix is of order N × N).
Memories are stored in the network by updating the connections between
different neurons using Hebbian learning. In addition, there is an inhibitory
network equally driven by all N excitatory neurons that inhibits equally all
excitatory units. A rate-based model is used in which the instantaneous
firing rate of each neuron is a function of its instantaneous input current.
The procedures for storage and recall are as follows:

• Memorizing an episode: Let Vnew be a one-dimensional vector repre-
senting the activity of N (N = 1000) neurons shown as a 20 × 50 matrix (see
Figure 2, top panel). Let T denote the connectivity matrix between the N
neurons. Since there are 1000 neurons, the dimensionality of T is 1000 ×
1000; entry Tij represents the strength of the connection from neuron
i to neuron j. T is a null matrix to start with, as nothing is known. Con-
sider that the episode represented by activity Vnew has to be stored in the
memory network. This is done by updating all the connections Tij between
the N neurons in a Hebbian fashion, using a very simple rule:

If Vi = 1 and Vj = 1, then set Tij = 1 (regardless of what its value
was before); else, make no change in Tij.                          (2.1)

Starting with T = 0, the connectivity matrix is updated dynamically as new
experiences are gained, forgotten, or consolidated (a code-level sketch of both
the storage and the recall procedures follows equation 2.2).
• Network dynamics to remember an episode from a partial cue: To reconstruct
the complete memory (say, Vnew) from a partial cue (e.g., the next day the
robot perceives one of the objects it has played with in the past), such partial
cues (see Figure 3, bottom left) or initial conditions V are impressed on the
memory network, and the network is allowed to evolve according to the
equation of motion:

τ_rel dV_k/dt = −V_k + ( Σ_{j=1}^{N} T_{k,j} V_j ) − I_inhib,        (2.2a)

I_inhib = g( −α_in + β Σ_k V_k ),                                     (2.2b)

g(i) = 0 if i < 0; else g(i) = i.                                     (2.2c)
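To make the two procedures concrete, the following is a minimal Python sketch of the storage rule (equation 2.1) and the retrieval dynamics (equations 2.2a to 2.2c). It is an illustration only: the Euler step size, iteration count, binarization threshold, and all function names are choices of this sketch (the parameter values α_in = 30, β = 3.5, and τ_rel = 1000 are those reported below), not a description of the authors' implementation.

    import numpy as np

    N = 1000                                   # neurons in the 20 x 50 episodic memory patch

    def store_episode(T, V_new):
        """Equation 2.1: if V_i = 1 and V_j = 1, set T_ij = 1; otherwise leave T_ij unchanged."""
        V_new = np.asarray(V_new, dtype=np.uint8)
        T |= np.outer(V_new, V_new)            # co-active pairs switch their connection on
        return T

    def g(x):
        """Equation 2.2c: g(i) = 0 for i < 0, g(i) = i otherwise."""
        return np.maximum(x, 0.0)

    def recall(T, V_cue, alpha_in=30.0, beta=3.5, tau_rel=1000.0, steps=20000, dt=1.0):
        """Evolve the network from a partial cue under equations 2.2a and 2.2b
        (Euler integration) and return a binarized attractor state."""
        V = np.asarray(V_cue, dtype=float).copy()
        Tf = np.asarray(T, dtype=float)
        for _ in range(steps):
            I_inhib = g(-alpha_in + beta * V.sum())     # global inhibition (2.2b)
            dV = (-V + Tf @ V - I_inhib) / tau_rel      # rate dynamics (2.2a)
            V = g(V + dt * dV)                          # keep firing rates nonnegative
        return (V > 0.5).astype(np.uint8)

    # Usage: T = np.zeros((N, N), dtype=np.uint8); store_episode(T, V1); recall(T, partial_cue)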
In the proposed framework, the content of the robot’s episodic memory is the temporal sequence of activity in the ob- ject, action hubs, or reward received when experience was originally gained by the robot and encoded in the neural connectivity (using equation 2.1). Every row (in the 20 × 50 sheet of neurons) is a discrete event in time and the complete sequence an episode of experience (like stacking a cylinder on top of the mushroom and receiving a reward of 0). Hence, there is a direct relation between activity in the hubs and the activity in the episodic memory network. There is both biological grounding (see section 1.1) and computational simplicity behind this proposition. The crucial advantage is that such a scheme allows both bottom-up activation of the hub to generate partial cues, thus triggering a recall of past experiences, and inversely, the possibility of such remembered episodic experiences to modulate the hub’s top-down facilitating core processes related to combining past experiences with explorative actions, creative plan formation, and forgetting. Both of these issues will be addressed in detail gradually with numerous examples in this letter. Figure 3 gives a global picture of bottom-up and top-down in- teractions between the subsystems involved in perception-action, the hubs, and the episodic simulation system. Objects present in the world activate l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u n e c o a r t i c e - p d / l f / / / / 2 6 1 2 2 6 9 2 2 0 1 7 2 4 8 n e c o _ a _ 0 0 6 6 4 p d . / f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 A Neural Framework for Episodic Memory 2703 the object hub, bottom up (black arrows) through the perceptual streams processing color, shape, and size-related information. The distributed ac- tivity in the object hub is the source of partial cues. From partial cues, context-relevant past experiences are recalled (using equation 2.2). How- ever, as the robot learns cumulatively, there will be several remembered ex- periences. Thus arises the need to both filter out the most valuable “team” of past experiences relevant to the present context or goal and at the same time gradually consolidate or forget some of these stored memories. This is functionally implemented by the top-down information flow through a sur- vival of the fittest–like competition mechanism. Only memories that gain top-down control over the hubs enter the construction system and get their content reenacted again through the body thus reasserting the value of their content to the organism. This ensures their longevity. Memories that never win the top-down competition are either consolidated or eventually forgot- ten. In sum, bottom-up activation of the hub is equivalent to what is there in the world (this is also the input to the visuospatial sketch pad, a component of the working memory that keeps track of things in the present). Top-down activation of the hub is equivalent to what is known from experience and plays a crucial role in facilitating how past experiences are combined with explorative actions on novel objects (see section 4) or recombining multiple past experiences to generate novel goal-oriented behaviors (see section 5) or consolidation and forgetting (see section 6). 3.2 Day 1: Playing with a Green Mushroom and a Yellow Cylinder. In episode 1, the robot is presented with a green sphere (with a flat base like a mushroom; see Figure 4) and a yellow cylinder. 
Since there is no past expe- rience, the connectivity matrix T is null. Considering that nothing is known, the only option is to explore. Randomly the robot chooses to stack the mush- room on top of the cylinder. The sequence of activation in various neural maps (color, shape, word, and hub) as a function of time when the sphere is stacked on top of the cylinder is shown in the top panel of Figure 4A. The yellow cylinder is identified and localized (sensory streams trigger different property-specific maps processing color and shape information leading to activation in the object hub in relation to the yellow cylinder). Since the goal is to stack and this comes directly from the user, the single neuron coding for stacking in the action hub is activated. Next, attention is focused on the mushroom, activating the hub in relation to the sphere that is stacked on top of the yellow cylinder. Finally, the user/teacher gives a reward (a number entered by keyboard) to the robot. In this case, the reward received is 2 because two objects were stacked successfully. This temporal sequence forms the basis of our first episodic memory, say, EM1, shown in Figure 4B. Every row in the 20 × 50 memory represents activity in the object hub, action hub, or reward received (that terminates the sequence). In the case of episode 1, the first row corresponds to activity in the object hub l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u n e c o a r t i c e - p d / l f / / / / 2 6 1 2 2 6 9 2 2 0 1 7 2 4 8 n e c o _ a _ 0 0 6 6 4 p d . / f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 2704 V. Mohan, G. Sandini, and P. Morasso l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u n e c o a r t i c e - p d / l f / / / / 2 6 1 2 2 6 9 2 2 0 1 7 2 4 8 n e c o _ a _ 0 0 6 6 4 p d . / f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 in relation to the yellow cylinder, the second row corresponds to activity in the action hub related to action taken (stack), the third row the activity corresponding to the green sphere, the fourth row corresponding to action hub activity, and the fifth indicating the reward received. Columns 43 to 45 in each row code the identity of the hub to which the information in the row is related (object, action, or value). EM1 is stored in the memory based A Neural Framework for Episodic Memory 2705 on the learning rule of equation 2.1 to update the connectivity matrix T. The robot has not yet exhausted all its explorative options. In episode 2, it attempts to stack the cylinder on top of the sphere (4C). If we compare episodic memory 1 and 2, the difference is that the object representations swap roles (spheres moving to row 1 and cylinders to row 3). This turns out to be a disaster, and the user rewards the robot with just 1 (row 5). Episode 2 is also impressed in the neural network and stored as a new memory. So now the robot has two episodic memories of its explorative experi- ences: sequences of actions on different objects with reward received at the end. 3.3 Generation of Partial Cues. What happens even after these two initial episodes of explorative sensorimotor experience is interesting. Two cases are shown in Figure 4D. In the first case (scenario 1), a green mushroom is presented to the robot. Perception of the green mushroom generates two partial cues from which the past experiences related to it (episodes 1 and 2) can be recalled from memory (using the dynamics of equation 2.2). 
In short, what is remembered is that “in the past, I have seen this object coming along with yellow cylinders and stacking the spherical object on the top was more rewarding.” While equations 2.1 and 2.2 describe storage and retrieval mechanisms of the episodic memory, we now describe the computational basis of how partial cues are generated. This is a nontrivial problem in a cumulative learning setup where the robot gradually gains Figure 4: (A) Temporal sequence of activations in various neural maps (color, shape, object, and action hubs) when the robot by random exploration stacks the mushroom on top of the cylinder. The content encoded in the episodic memory network is the temporal sequence of activity in the object, action, and value hubs, when the robot gains experiences. (B) The complete temporal sequence of bottom-up activity in the object-action hubs and rewards received when ex- perience is acquired (Panel A) as represented in the 50 × 20 episodic memory network. (C) The similar encoding of episode 2 where the robot stacks the cylin- der on top of the mushroom, receiving the lesser reward (as the tallest stack was not built). Note that the activations in rows 1 and 3 are swapped in Pan- els B and C, reflecting the temporal sequence of activations when experience is gained. Rewards received are based on the robot’s success in building the tallest stack and changes dynamically with the situation. (D) The behavior im- mediately after two episodes of experience are encoded. Bottom-up perception of the mushroom activates the object hub and fills in partial information in the episodic memory network, leading to recall of past experiences associated with it. What is recalled filling in all missing information is a valuable inference that it is more rewarding to stack mushroom-like objects on the top. The emphasis in the preliminary example is that valuable action sequences are implicitly evident in the episodic recall of past experiences. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u n e c o a r t i c e - p d / l f / / / / 2 6 1 2 2 6 9 2 2 0 1 7 2 4 8 n e c o _ a _ 0 0 6 6 4 p d . / f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 2706 V. Mohan, G. Sandini, and P. Morasso new experiences, new memories are formed, and some forgotten, and the same objects may be a part of multiple episodic memories (in combinations with other objects and rewards received). Partial cues basically come from objects perceived in one’s immediate environment and action-related goals (to build a stack) that activate the top-level object action hubs (bottom- up information Flow of Figure 3). To generate partial cues in the episodic memory network based on bottom-up activations in the hubs, we introduce three new variables. The first variable, C, is a scalar counter that keeps track of the number of episodic experiences stored in the memory. C starts from zero and is incre- mented when a new memory is stored and decremented when memories are forgotten (C = 2 at present because two episodes, EM1 and EM2, are stored in memory). W, the second variable, encodes connections between neurons in the object hub and episodic memory network. If there are (cid:3) neurons in the object hub (42, here) and N neurons in the episodic memory patch (1000, here), then W is a matrix of (cid:3) × N. W is also null to start with and learned in a Hebbian fashion at the same time when a particular episode is stored in the memory. 
The learning rule is that if a neuron i in the object hub activates a neuron j in the episodic memory patch, then reinforce the connection between them. For example, if neuron h in the object hub and neuron n in the episodic memory patch (in rows that relate to activity in the object hub) are concurrently active and Whn denotes the link between them, then set Whn = 1 (3.1) The net effect is that any time in the future when the same neurons in the object hub are activated due to bottom-up perception (like seeing a sphere on day 2), specific neurons in the 20 × 50 episodic memory network are activated as modulated by W, giving rise to partial cues to trigger retrieval of the complete past experience (through the dynamics of equation 2.2). However, one further issue must be dealt with: the connectivity matrix W encodes all possible partial cues that could be triggered by a perceived object. Hence, there is a need to bring in additional context that must have the effect of switching on only a subset of W that relates to the generation of partial cues for retrieving one episodic memory and not all of them at the same time. This is done by introducing a local parameter, Mhn, associated to every Whn, that encodes the identity of the episodic memory during which Whn was adapted (using equation 3.1). For example, if a connection between a neuron h in the hub and a neuron n in the episodic memory patch, Whn, was learned while memorizing episodic memory c, then Mhn is set to c. In this way, the connectivity matrix Whn can be further modulated to enable generation of partial cues related to retrieval of specific episodic memories. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u n e c o a r t i c e - p d / l f / / / / 2 6 1 2 2 6 9 2 2 0 1 7 2 4 8 n e c o _ a _ 0 0 6 6 4 p d . / f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 A Neural Framework for Episodic Memory 2707 If there are C episodic memories stored in T (connectivity matrix of the 1000-neuron-patch encoding past experiences), then the partial cue Pc related to an episodic memory c, based on present context (i.e. object hub activity H), can be generated using Pc (cid:5) hn = H × K, where Khn = 1 if, Mhn = c = Whn × (cid:5) hn (3.2) As depicted in equation 3.2, partial cues generated are a function of three components: the activity in the hub H, which represents objects perceived in the present; W, which encodes all possible partial cues that can be generated by all the objects active in the hub; and (cid:5), which has the net effect of switching on a partial cue related to a specific episodic memory c, and not all of them at the same time. When the robot is presented with a green sphere (see Figure 5D), two partial cues are generated—one in relation to EM1 and other in relation to EM2. The dynamics of equation 2.2 triggers a pattern completion process by which the full experience in relation to the partial cue is reconstructed. What is then known at the end is not just information about spheres, but also how they were stacked along with cylinders in the past and what the consequence was (information that was not available in the partial cue itself). Retrieved past experiences are transferred to temporary buffers (working memory) and begin their life in the system. This summarizes the bottom-up information flow of Figure 3. 3.4 Valuable Action Sequences Are Evident in the Episodic Recon- struction. 
3.4 Valuable Action Sequences Are Evident in the Episodic Reconstruction. As seen in the right panel of Figure 4D, from the retrieval of the past experiences it is possible to infer which behavior is more rewarding. This is the simplest example to illustrate the use of episodic reconstruction of the past toward planning actions in the present. One may also envision that the two remembered “past experiences” are competing to survive (as depicted in Figure 3), with the “losers” gradually forgotten. In this simplest case, anticipated reward is the criterion based on which a reconstructed memory of past experience wins the competition. Note that there is no need for an explicit planner; the valuable action sequence is evident in the reconstructed episodic memories that win the top-down competition (in this case, EM1, which anticipates greater reward). Memories that win the top-down competition manage to reenact their sensorimotor content through the body (in a way, reasserting their value to the organism). Inversely, consistent losers like EM2 may be forgotten as learning progresses incrementally. We elaborate these topics in detail with examples in the sections that follow.

3.5 Causally Irrelevant “Properties” Can Be Eliminated During the Assimilation of Episodic Memories. Before introducing new objects in the environment, we describe an interesting consequence of the distributed property-specific organization of objects in our computational framework. It becomes possible to go beyond object-action associations and learn which properties are causally dominant in a particular task. How can we abstract which property is causally dominant for a specific task by playing and learning incrementally with objects in the real world? We briefly address this topic here in the context of stacking.

Figure 5: When iCub is presented with a blue cylinder and an orange sphere, partial cues are generated (A), leading iCub to recall its past experiences of exploring with the green sphere and yellow cylinders (B). Only the more rewarding experience, EM1, is shown (B). (C) The present behavior of the robot: stacking the orange sphere on top of the blue cylinder. Note that this episode is different from the recalled past experience (panel B). The difference between panels B and C is highlighted and mainly corresponds to a change in activations related to color in the object hub. Nevertheless, the consequence in terms of reward received is as anticipated. Thus, instead of storing the present behavior as a new memory, EM1 is consolidated by eliminating the difference between the recalled past experience and the present behavior. In other words, the robot encodes that the color of objects does not matter when building the tallest stack. Thus, not every episode of experience is stored. Only those that contain information that is not available in the retrieved past experiences are stored. In the future, when any combination of spheres and cylinders (E) is encountered, the new consolidated episodic memory is remembered and used to guide the present action plan.
Considering that the robot has past experiences with the green sphere and yellow cylinder, the teacher now presents the robot with a blue cylinder and an orange sphere. Bottom-up visual analysis of the scene activates the object hub and leads to the generation of partial cues (see Figure 5A). Note that the generated partial cue is different and contains less information compared to the partial cues in Figure 4D. This is because the objects in the scene that caused the generation of partial cues are also different: they share similarity in shape but not in color. From the partial cue, the past experience of playing with the green sphere and yellow cylinder is recalled successfully. Only the more rewarding memory (i.e., placing the green sphere on top of the yellow cylinder), EM1, is shown (see Figure 5B). Although the robot knows nothing about stacking blue cylinders and orange spheres, it knows something about yellow cylinders and green spheres and the fact that it was more rewarding in the past to place the sphere on top of the cylinder. EM1, the more rewarding action sequence, is once again executed, and it turns out that the consequence (in terms of reward received) is the same as anticipated. This new episode generated by the robot is shown in Figure 5C. Note that it is different from the recalled past experience but results in the same consequence (the difference, which is highlighted, mainly concerns different activity in terms of color in the object hub). Does this new episode also have to be stored in the memory by updating the T matrix? Not really, because we can come up with an elimination rule that compares a reconstructed past experience with the present experience: if a change in a property results in no change in the anticipated consequence, then the property that has changed is not causally dominant for the task being learned. Hence, the nondominant property can be eliminated. Thus, instead of storing episode 3, the knowledge that the color of objects does not matter while building stacks can be assimilated into the previously stored episodic memory by inhibiting the ability of the color map to activate the object hub in the context of stacking (this ensures that color-related activations do not trigger the partial cues related to stacking). The consolidated memory is shown in Figure 5D. Thus, instead of memorizing the new episode, the robot has implicitly learned that the color of objects does not affect the way they should be stacked. Hence, not every episode is encoded in the memory. Only those that contain information that is not available in the retrieved past experiences are stored (we see this in the next section when cubes are introduced).

4 When Memories of Past Experiences Compete to Become Alive Again: Introducing Cuboids as Novel Objects

Cubes are introduced as novel objects along with spheres and cylinders. Now there is an interesting combination, because the robot has incomplete knowledge: it knows something about cylinders and spheres but has never experienced the effect of cubes in the context of assembling the tallest stack. This is the simplest case where exploration and experience have to be combined.
In the sections that follow, we incrementally propose a number of ideas related to this topic, the implementation of the necessary subsystems, and experimental results.

4.1 Top-Down Information Flow: What Does It Take for Past Experiences to Become Alive Again. “Becoming alive again” refers to the ability of a remembered memory trace to get its content reenacted by the actor (the body), hence reasserting its value to the organism. To functionally implement this, we introduce a survival of the fittest–like top-down competition between remembered episodic memories to gradually retain the valuable ones and forget consistent losers. A schematic representation of this process is also shown in Figure 3. In our framework, of all the remembered experiences in relation to the present context, only a small subset that manage to gain control over the object hub top down get access to the construction system (and the body). Gaining access to the construction system basically means that either the complete remembered experience or a part of it will be used or reenacted in the “now,” hence ensuring the longevity of that memory trace. This, in fact, is the beauty of top-down and bottom-up driving each other. The only way for a memory to stay alive is to go through the same process that gave birth to it in the first place: control the object hub top down. Whichever memories manage to do so enter the construction system, have an opportunity to reenact their content through the body, reassert their value, and ultimately survive longer. We believe mechanisms related to the interleaving of top-down and bottom-up control of hubs may be crucial in the efficient exploitation, growth, and assimilation of memory, importantly when it is acquired by a process of cumulative learning through playful sensorimotor interactions.

A subtle point to note here is that episodic memories of past experiences that manage to enter the construction system may involve actions on several objects that may not actually be present in the now and hence cannot be acted on (e.g., when the robot is presented with a green sphere, the past experience that was remembered involved both the green sphere and the yellow cylinder: see Figure 4D). To eliminate such elements of the past that are not relevant in the now and extract only the doable actions, we need another subsystem that represents just the objects in the now and is not corrupted by top-down activity. To this end we add the visuospatial sketchpad (VSSP), an element of the working memory. Though it has several cognitive functions, we consider for simplicity that the VSSP represents perceived objects that are available in the now. The VSSP itself is refreshed through bottom-up perception as the robot perceives objects present in front of it and has representations similar to the bottom-up activity of the object hub. The only difference between the VSSP and the object hub is that the VSSP holds only context-dependent information, while the object hub may be activated even top down (by reconstructed memories of past experiences). So an object that is not present in the environment but is internally simulated manages to activate the object hub top down but not the VSSP (the VSSP in this sense represents objects on which real actions can take place).
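A simple way to picture this role of the VSSP is the set-level comparison sketched below, which is used again in section 4.2.3 to decide when exploration is needed; the set representation and names are purely illustrative, not the representation used on the robot.

    def analyse_situation(vssp, topdown_hub):
        """Compare objects currently perceived (VSSP) with objects activated top down
        by the winning episodic memories."""
        to_explore = vssp - topdown_hub      # present now, but not covered by past experience
        simulated = topdown_hub - vssp       # recalled or imagined, but not actually present
        actionable = vssp & topdown_hub      # known from experience and available for action
        return to_explore, simulated, actionable

    # With the scenario of section 4: vssp = {"sphere", "cylinder", "cube"},
    # topdown = {"sphere", "cylinder"}  ->  the cube is the object that must be explored.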
4.2 Combining Exploration and Past Experience to Create Plans. Figure 6 shows the temporal evolution of the system when cubes are introduced as novel objects along with spheres and cylinders. For clarity, we break this scenario into three phases.

4.2.1 Bottom-Up Information Flow: From Objects (and Goals) in the World to Remembering Past Experiences Encountered with Them. Figure 6A shows the bottom-up information flow. Objects present in the world are analyzed by perceptual modules, ultimately activating the object hub. Bottom-up activation of the object hub (which indicates the recognized objects in the world) is also transferred to the VSSP. As a result of bottom-up information flow, both the VSSP and the object hub activity show the presence of three objects. Object hub activations generate partial cues (as in Figure 4D, hence not shown here), leading to retrieval of the two experiences the robot has had in the past: EM1, stacking the sphere on top of the cylinder and receiving a reward of 2, and EM2, stacking the cylinder on the sphere and receiving a reward of 1. To summarize, bottom-up information processing first refreshes the VSSP (what objects are there) and then gives rise to partial cues that lead to the retrieval of relevant past experiences (“what I have done in the past in relation to the present situation”).

4.2.2 Top-Down Inhibitory Competition Between Multiple Remembered Episodic Memories to Assert Their Significance with Respect to Others in the Present Context. Remembered episodes of the past now compete and inhibit each other in an attempt to control the hub in a top-down way. Which episodic memories (among all those remembered) win the competition is based on two factors:

1. The anticipated reward that could be obtained by the robot if the content encoded by the remembered episodic memory (or a part of it) is reenacted to realize the goal at hand.

2. The exclusivity of the knowledge they encode in the context of the goal. This implies that there need not be one winning past experience; multiple experiences may reach the construction system by controlling parts of the object hub. Hence, the hub can be controlled in a distributed fashion by multiple reconstructed episodic memories. This is because different past experiences may encode different kinds of knowledge that could contribute to realizing the present goal. In such cases, it is like a team of past experiences connected together in the context of the present situation to realize the goal. This interesting issue is elaborated in the next section when the robot is presented with a large box, cube, cylinder, and sphere.

Figure 6: The temporal evolution of the system when cubes are introduced as novel objects along with spheres and cylinders. (A) Bottom-up hub activation through perceptual streams shows the presence of three objects. This information is also copied into the VSSP. Hub activity generates partial cues leading to recall of past experiences (EM1 and EM2, which encode knowledge related to spheres and cylinders). (B) EM1 and EM2 compete to gain top-down control of the hub. The temporal evolution of the top-down influence of these competing memories on the object hub is shown.
Top-down inhibitory competition between episodic memories to gain access to the hub is implemented as follows. Let H be the bottom-up activation of the hub (the present situation), and let there be m episodic memories reconstructed through partial cues competing to gain top-down access to the hub. Then the individual value or top-down influence Φ_i of the ith episodic memory on the hub (without taking into account the effect of other competitors) is as follows:

Φ_i = ((W × EM_i) × H) × R_i.    (4.1)

The component (W × EM_i) × H determines how much information is known in memory EM_i in the context of the present situation (i.e., bottom-up hub activation H). W is the hub-to-episodic-memory connection learned whenever any memory is stored using equation 3.1; W is a 42 × 1000 matrix here. EM_i is the reconstructed past experience (a 1000 × 1 vector representing activations of the 50 × 20 episodic memory patch). So the component W × EM_i is a 42 × 1 vector and determines all possible top-down influences caused by EM_i. To bring in the present context, we multiply every element of W × EM_i with the corresponding element of the bottom-up hub activity H (which encodes the present environmental situation). The result is weighted by a scalar R_i that denotes the normalized reward fetched by this past experience (e.g., R_i for EM1 is 1 and for EM2 is 0.5), giving rise to Φ_i (a 42 × 1 vector that captures the initial top-down influence of memory EM_i on every neuron in the hub).

Equation 4.1 accounts only for the influence of the normalized reward fetched by EM_i on its ability to inhibit other competitors EM_j. This suffices for simple cases where all competing memories encode the same knowledge but yield different rewards, as in the present case (both EM1 and EM2 encode knowledge related to cylinders and spheres but yield different rewards). In addition, we also need to take into account case 2: the exclusivity of the knowledge encoded in the competitors. Hence, in our scheme, Φ_i is basically the initial condition for the net top-down influence of memory EM_i. Starting from this initial condition, the Φ_i of every episodic memory EM_i evolves in time based on its own value in the present context and the inhibitory effects of the other competitors EM_j (where j → 1:m episodic memories retrieved through partial cues).

The temporal evolution of the top-down control Φ_i exerted by every episodic memory EM_i on the hub (i.e., τ Φ̇_i = F(Φ_i)) is implemented by means of an Euler integration step:

Φ_i(t) = Φ_i(t − 1) + (1/τ) F(Φ_i(t − 1)),
where F(Φ_i) = − Σ_{j=1, j≠i}^{m} (Φ_j .∗ Φ_i) λ_j  and  λ_j = R_j Σ_{k=1}^{η} (Φ_j)_k.    (4.2)

The function F basically generates the net inhibitory effect on memory EM_i coming from all other m competing episodic memories and takes into account (1) how much knowledge is common between EM_i and EM_j, determined by the term Φ_j .∗ Φ_i, and (2) how much more a memory EM_j knows than EM_i in the present context, which is determined by the scalar λ_j (e.g., if the environment contains spheres, cylinders, and cubes and a remembered past experience, say EM3, encodes information related to all of them, then it can inhibit the ones that know only parts, like EM1 and EM2). In such cases, EM3 will inhibit EM1 because it knows more in relation to the present context, and both EM3 and EM1 in turn will cumulatively inhibit EM2, which knows the same as EM1 but yields even less of a reward. In the present scenario, only case 1 applies, since both remembered past experiences EM1 and EM2 encode knowledge related to spheres and cylinders, with EM1 (stacking the sphere on the top) anticipating a greater reward (last row). Figure 6B shows five temporal iterations of the top-down influence of EM1 and EM2 on the object hub. As seen, EM1, which fetches a greater reward, quickly manages to inhibit EM2 and gains complete control of the hub.
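To make the competition concrete, the following NumPy sketch (a toy illustration with random values and small sizes; the real W, reconstructed memories, and rewards are those learned by the robot) computes the initial influences of equation 4.1 and then iterates the Euler step of equation 4.2 until the less valuable memory is suppressed.

    import numpy as np

    def initial_influence(W, EM, H, R):
        """Equation 4.1: Phi_i = ((W x EM_i) elementwise H) x R_i, for every memory i."""
        return np.stack([(W @ EM[i]) * H * R[i] for i in range(len(EM))])

    def run_competition(Phi, R, tau=500.0, steps=500):
        """Equation 4.2: Euler integration of the mutual top-down inhibition."""
        Phi = Phi.copy()
        m = Phi.shape[0]
        for _ in range(steps):
            lam = R * Phi.sum(axis=1)                        # lambda_j = R_j * sum_k (Phi_j)_k
            F = np.zeros_like(Phi)
            for i in range(m):
                for j in range(m):
                    if j != i:
                        F[i] -= (Phi[j] * Phi[i]) * lam[j]   # inhibition of memory i by memory j
            Phi = np.clip(Phi + F / tau, 0.0, None)          # influences stay non-negative
        return Phi

    rng = np.random.default_rng(0)
    W = rng.uniform(0, 1, (6, 20))       # toy hub-to-memory connections (6-neuron hub)
    EM = rng.uniform(0, 1, (2, 20))      # two reconstructed episodes
    H = np.ones(6)                       # bottom-up hub activation (present context)
    R = np.array([1.0, 0.5])             # normalized rewards of EM1 and EM2

    Phi = initial_influence(W, EM, H, R)
    print(run_competition(Phi, R).round(3))   # EM1 (higher reward) suppresses EM2 and keeps control of the hub

With equal knowledge the reward term alone decides the outcome, reproducing the simple case of Figure 6B; exclusivity effects would enter through λ_j when memories cover different subsets of the hub.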
4.2.3 Combining the Task-Relevant Action Sequence Known from the Past with Explorative Actions (to Come Up with New Plans and Learn Further). Note that the top-down activity in the object hub (see Figure 6B, right corner) is different from the activity in the VSSP (which holds the bottom-up object hub activation). This is because there is no experience related to cubes encoded in the winning episodic memory EM1. In other words, by directly comparing the VSSP and the top-down hub activity, it is possible to infer that past experience is not sufficient to realize the goal in the present context, thus requiring explorative actions to be combined with what is known from past experience. The inverse of this argument is even more intriguing and will be addressed in section 5.

Now we are left with the problem of connecting the explorative stacking action on the cube with the partial action sequence that comes from the memory of past experience (EM1). This is straightforward: the explorative stacking action binds at either the end or the beginning of the chunk that comes from past experience, so the robot tries out two different action sequences. In the first episode (new experience 1), the robot places the cube at the bottom (explorative action) and then places the cylinder on top of the cube and the sphere on top of the cylinder. It is rewarded fully (i.e., 3: all objects are stacked correctly). Exploring further (new experience 2), the robot tries to put the cube on top of the sphere but does not succeed in getting the full reward (for obvious reasons). The more rewarding experience is now encoded as a new episodic memory (EM3), as shown in Figure 6C. The connectivity matrix T is shown in the right corner. As seen, beginning from a null matrix, it has slowly started to grow.

Figure 7: (A) Snapshots of the robot building the tallest stack as a result of combining past experience (with cylinders and mushrooms) and exploration (on the novel object: cubes). (B) Novel scenarios where this new experience is exploited to give rise to novel action sequences. Both of these action sequences generated by the robot are new and related to achieving the goal in the new situation (a novel combination of objects). Neither learning nor planning is needed. The correct action sequence is implicitly embedded in the remembered past experience.

Figure 7A shows snapshots of the robot combining exploration with past experience to build the tallest stack using cubes, spheres, and cylinders. Figure 7B shows novel scenarios where no further learning is needed to come up with the correct stacking plan. In the first novel scenario, cubes are presented with spheres. The same bottom-up and top-down activity flow (see Figures 6A and 6B) ensues, and the new memory EM3 (which encodes the knowledge related to spheres, cubes, and cylinders) controls the complete hub and enters the construction system. Since the cylinder is not represented in the VSSP, the action chunk related to the cylinder is not possible and is deleted from the action sequence encoded by EM3. The robot stacks the sphere on top of the cube and anticipates the full reward. The same applies to the second scenario. Note that both of these action sequences generated by the robot are new and related to achieving the goal in the new situation (one not encountered previously). Neither learning nor planning is needed. The correct action sequence is implicitly embedded in the remembered past experience, that is, the “winning” episodic memory.

4.3 Introducing Large Objects. Before moving to the next level of complexity, we introduce one more object category: a large box. This section may also serve as a case that summarizes all that has been said so far. In the next episode of experience, the robot is given a large box and a small cube. The temporal evolution of the behavior is shown in Figure 8A. The bottom-up information flow leads to neural activations in the object hub and the VSSP. From the object hub activity, partial cues are generated, reconstructing the most relevant past experience in the context of the present situation. EM3 (the previous episode related to stacking cubes, cylinders, and spheres) emerges as the winner and controls the hub top down. Note again that the top-down hub activation differs from the bottom-up hub activation because the past experience itself is not sufficient (there is a new object of which nothing is known).

Figure 8: This episode is similar to the one presented in Figure 6, but with one more new object: a large box. The temporal evolution of the system dynamics is shown. Bottom-up information flow leads to the generation of a partial cue and the recall of past experiences. EM3 (the previous episode related to stacking cubes, cylinders, and spheres) emerges as the winner and controls the hub top down. Note that the top-down hub activation differs from the bottom-up hub activation because the past experience itself is not sufficient. The winning episodic memory enters the construction system, where the task-specific chunk is extracted (cylinders and spheres are not present in the world or the VSSP, hence vanish); only the cube remains. The robot explores by placing the large cylinder at the bottom and placing the cube on top of it and is rewarded fully (i.e., 2, as seen in the last row of explorative binding 1); explorative binding 2, putting the novel object on top of the cube, fails. The more rewarding action sequence is now stored as EM4 by updating the T matrix.
The winning past episodic experience enters the construction system, where the task-specific chunk is extracted (cylinders and spheres are not present in the world or the VSSP, hence vanish); only the cube remains. The robot explores by placing the large box at the bottom and placing the cube on top of it and is rewarded fully (as seen in the last row of explorative binding 1); explorative binding 2, putting the novel object on top of the cube, fails (and hence yields a lesser reward). The more rewarding action sequence is now stored as EM4 by updating the T matrix.
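The “extract the doable chunk, then bind exploration at either end” procedure can be sketched as follows (hypothetical helper names; the toy reward function below simply counts correctly ordered pairs and stands in for the reward actually received through interaction).

    def task_relevant_chunk(episode, vssp):
        """Drop steps acting on objects absent from the VSSP (not doable now)."""
        return [obj for obj in episode if obj in vssp]

    def candidate_plans(chunk, novel_object):
        """Bind the explorative action either before or after the known chunk."""
        return [[novel_object] + chunk, chunk + [novel_object]]

    def explore_and_store(chunk, novel_object, execute_and_reward):
        """Try both bindings on the robot and keep the more rewarding plan."""
        trials = [(plan, execute_and_reward(plan)) for plan in candidate_plans(chunk, novel_object)]
        return max(trials, key=lambda t: t[1])       # (best plan, reward) -> new episodic memory

    # Toy stand-in for acting in the world: a pair is stable only if the lower object is larger.
    SIZE = {"large_box": 3, "cube": 2, "cylinder": 1, "sphere": 0}
    def execute_and_reward(plan):
        return sum(1 for a, b in zip(plan, plan[1:]) if SIZE[a] > SIZE[b])

    vssp = {"cube", "large_box"}                     # objects actually present now
    em3 = ["cube", "cylinder", "sphere"]             # winning past experience (bottom to top)
    chunk = task_relevant_chunk(em3, vssp)           # -> ["cube"]
    print(explore_and_store(chunk, "large_box", execute_and_reward))   # large box at the bottom wins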
5 How Novel Action Sequences Emerge out of Multiple Past Experiences (Without Exploration)

The user puts all the objects (cube, small cylinder, large box, and sphere) in front of the robot to assemble the tallest stack. Note that iCub has isolated past experiences with all of them; however, it has never encountered all of them together. This is an interesting scenario because none of the past experiences of the robot has the full information to deal with all these objects (all of them have partial chunks of sequences), but if the robot is able to combine knowledge from multiple experiences to come up with a novel action sequence without any further learning, it is indeed interesting. With the help of Figure 9, we discuss how multiple past experiences remembered in the context of the present can be recombined to generate novel behavior (without any exploration).

The process initiates with bottom-up information coming from the world activating the object hub and the generation of partial cues that enable recall of all four past experiences (EM1–EM4) stored so far in the episodic memory. This is because all these memories have some information related to a subset of the objects present in the world. This summarizes the bottom-up process, from objects in the world to remembering past experiences encountered with them (see Figure 9A). At the same time, not all of these episodic memories enter the construction system; for this, they have to assert their significance by controlling the hub either fully or partially. The temporal evolution of the top-down influence of these competing memories on the hub is shown in Figure 9B. Note that EM1 and EM2 are completely wiped out in the competition because there are other competitors that know more (in the context of the present situation). EM3 encodes information related not just to cylinders and spheres (encoded by EM1 and EM2) but also to cubes, and hence is a stronger competitor. But in addition to EM3, EM4 also manages to stay alive (it knows something about large objects that none of the others knows anything about). Furthermore, EM3 and EM4 know something in common (i.e., the cube), over which they inhibit each other (the overlapping neuron is shown in the box; note that it is approximately 50% controlled by EM3 and 50% controlled by EM4). Note that in this interesting case, the sum of the activity imposed top down on the hub by EM3 and EM4 is equal to the activity in the bottom-up object hub activation (unlike the cases of Figures 7 and 8, where there was a difference because there was a novel object for which there was no experience).

Figure 9: The cube, cylinder, large box, and sphere are presented. The robot has had isolated experiences with all of these objects, but none of its past experiences encode the complete solution to build the tallest stack now. The figure shows how the robot achieves this by recombining its multiple past experiences; the arrow shows the temporal evolution of this process. (A) Bottom-up perception leads to the generation of partial cues and the retrieval of relevant past experiences (note that all past episodes, EM1 to EM4, are remembered, as they all contain some information relevant to the present context). Remembered episodic memories compete to control the hub top down (the temporal evolution of the top-down influence of these memories on the hub is shown in panel 9B). EM1 and EM2 are eliminated; EM3 and EM4 both jointly control the hub, partly competing for the common element both of them know, the cube (EM4 exclusively encodes experience related to large objects). The net top-down activity of the hub is identical to the bottom-up activity, which indirectly implies that the complete solution is available in the isolated past experiences (without the need for exploration). (C) Action sequence chunks of EM3 and EM4 enter the construction system, with two ways to bind these sequences. The preferred solution is the one in which overlapping elements of knowledge encoded by different experiences are brought as close as possible (overlaps in this sense playing the role of a subgoal). (D) The final solution: a large box-cube-cylinder-sphere stack, with anticipation of a full reward that is given. (E) The novel sequence of actions generated to assemble the tallest stack using the four available objects in the scene.
This implies that the complete action sequence to solve the problem is already available in the isolated past experiences that won the competition, and this holds independent of how many past experiences claim control over the hub. Either the most valuable action sequence is directly available (in a single episodic memory) or multiple past experiences may have to be combined in a novel fashion to generate a new behavior. In any case, if the net top-down hub activity is equivalent to the bottom-up hub activity (or, equivalently, the VSSP), then even if the environment is “novel” (as in the present case), the robot can infer that its past experiences contain enough information to realize the goal (by optimally combining these past memories into a novel sequence). So the action sequence chunks encoded by EM3 and EM4 enter the construction system (see Figure 9C, with the overlapping object, the cube, shown). The overlaps in knowledge between different remembered experiences are advantageous because they help to connect the experiences together. The construction system employs one simple rule to achieve this: if there are overlaps in the knowledge encoded by different winning past experiences, bring them as close as possible. In this sense, the overlapping element is similar to an intermediate subgoal (a point of intersection between two different past experiences). As seen in Figure 9C (right panels), when the sequence encoded by EM4 is bound before EM3, the overlaps are closest; this is the binding enforced by the construction rule. The other alternative is also shown but does not make cognitive sense, because we believe that overlaps in knowledge related to past experiences in general play the function of subgoals (or points where one chunk of knowledge of memory connects to another). When the isolated memories of past experiences are combined, a novel sequence emerges (see Figure 9D): stack the large box at the bottom, then the cube, the small cylinder on top of the cube, and the sphere on top of the small cylinder, and anticipate a full reward for this. Indeed, the full reward is given!
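A minimal sketch of this construction rule is given below (with action chunks represented as hypothetical bottom-to-top object lists; the actual system operates on neural action sequences in the construction system). The two possible bindings of the chunks contributed by EM3 and EM4 are generated, the binding whose shared element sits closest is preferred, and the duplicated overlap is collapsed so that it acts as a single subgoal. The sufficiency check discussed above is included as a simple comparison.

    def past_experience_suffices(topdown_sum, bottomup, tol=1e-6):
        """If the summed top-down hub activity matches the bottom-up activity,
        the winning memories already contain everything needed (no exploration)."""
        return all(abs(t - b) <= tol for t, b in zip(topdown_sum, bottomup))

    def bind(a, b):
        """Concatenate two chunks and measure how far apart their shared elements sit."""
        seq = a + b
        shared = set(a) & set(b)
        gap = min(len(a) - 1 - a.index(x) + b.index(x) for x in shared) if shared else len(seq)
        return seq, gap

    def merge_at_overlap(a, b):
        """Construction rule: prefer the binding that brings overlapping elements
        closest together, then collapse the duplicated overlap (the subgoal)."""
        seq, _ = min([bind(a, b), bind(b, a)], key=lambda c: c[1])
        merged = []
        for obj in seq:
            if not merged or merged[-1] != obj:   # drop the adjacent duplicate
                merged.append(obj)
        return merged

    em3 = ["cube", "cylinder", "sphere"]          # bottom-to-top chunk encoded by EM3
    em4 = ["large_box", "cube"]                   # bottom-to-top chunk encoded by EM4
    print(past_experience_suffices([1, 1, 0.5], [1, 1, 0.5]))   # True -> recombine, do not explore
    print(merge_at_overlap(em3, em4))             # ['large_box', 'cube', 'cylinder', 'sphere']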
6 Effects of Key Parameters, Change in Order of Objects During Cumulative Learning, and Mechanisms for Forgetting

We have gradually described the neural episodic memory of the robot in section 2, explorative learning and the recall of relevant past experiences based on partial cues to generate goal-oriented behaviors in section 3, combining explorative actions with past experiences in section 4, and combining multiple past experiences to generate novel behaviors in section 5. In this section, we quantitatively analyze the global behavior of the proposed computational framework under dynamic conditions: changes in the key parameters of the episodic memory network, changes in the order of presentation of objects during cumulative learning, consolidation mechanisms that minimize the encoding of similar memories, and mechanisms related to forgetting and the ensuing computational advantages.

Figure 10: The retrieval performance of the episodic memory network when the parameters τ_rel (row A), α_in (row B), and β (row C) are varied.

6.1 Effects of Change in Parameters in the Episodic Memory Network. As described in section 2, the dynamics of the episodic memory network depends on some key parameters: T (the learned network connectivity matrix), the time constant of the relaxation τ_rel, and the parameters of the inhibitory network, α_in and β. T is null to begin with and is cumulatively learned from scratch; it changes dynamically as new information is encoded or forgotten. The other parameters are constant and are set empirically. Here we briefly investigate the behavior of the episodic memory when these parameters are modified. Figure 10 shows the effects of variations in the parameters on the retrieval performance of the episodic memory network. Row A (left corner) is the partial cue used to trigger retrieval in all the cases. The middle and right panels of row A show the recalled patterns resulting from the dynamics of equation 2.2 when τ_rel is varied. Low time constants (τ_rel = 40) adversely affect the convergence, resulting in both reduced activations in the neurons and spurious retrieval (row A, right panel). Nominally, a time constant of 1000 is sufficient to ensure stable and robust recall from partial cues in real time (on a quad-core laptop).

The middle row shows the retrieval performance from the same partial cue (row A, left panel) when α_in is changed from 2 to 50, keeping τ_rel at 1000. As observed, a change in this parameter does not significantly affect the retrieval performance. Instead, increasing it from 2 to 15 has a gradual scaling effect on the activation of the neurons. This behavior was also indicated by Hopfield (2008). Increasing this parameter beyond a certain value (>20) has no significant effect (row B, right column). A nominal value of 5 for α_in was chosen for all the experiments reported in this article.
相反, the network behavior is more sensitive to β as it has an
effect on the inhibitory current to the neurons. Row C shows the retrieval
performance for the same partial cue (row A, left panel) when β is varied
从 0.5 到 18, keeping αin constant at 5 and τ
rel as 1000. As observed in row
C (right panel), very small values of β result in a very low inhibition current.
因此, we can see spurious activations and incorrect retrieval. The mid-
dle panel shows the retrieval when β is set to 5, resulting in correct recall
of the stored memory from the partial cue. 然而, further increasing β
also abnormally affects the retrieval because of the high level of inhibition.
因此, the parameter β must be neither too small (resulting in very low
inhibitory current) nor too large (resulting in very high inhibition). In all
our experiments, a nominal value of 3.5 was set for β. The retrieval perfor-
mance can also be affected as more memories are stored, but as estimated
by Hopfield (2008), 大约 250 episodes can be simultaneously stored
and correctly retrieved in a network of 1000 神经元. 更远, 专业人士-
posed framework also includes mechanisms related to both consolidation
and forgetting, which have an effect of either merging multiple memories
into one or eliminating them altogether.


6.2 Effects of Change in the Order of Presentation of Objects During
Cumulative Learning. While section 6.1 dealt with variations of the key parameters of the episodic memory network and their effect on the recall of encoded experiences,
we now go to the next level: change in order of presentation of different
objects during cumulative learning and the resulting effect on the behavior
of the robot. Through Figure 11, we also revisit sections 3 到 5 (勘探,
combining past experiences with exploration to learn further, combining
multiple past experiences to generate novel behavior) when the order of
presentation of objects to the robot is changed. Rows A to D show four
cases of different orders of presentation of objects and the resulting behav-
ior of the robot under situations described in sections 3 到 5 (columns in
数字 11). “EM” stands for episodic memory encoded during the partic-
ular stage. 例如, in row C, at the beginning the robot is presented
with a large box and a cylinder leading to the formation of EM1. 在里面
next episodes, cuboids and mushrooms are introduced as novel objects
(with the robot already having past experience with a large box and cylin-
这) leading to the formation of EM2 and EM3. 下一栏显示
the behavior when all the objects are presented together to construct the
tallest stack. 在这种情况下, multiple past experiences have to be com-
bined to generate novel behavior (参见部分 5). 在这种情况下, EM1 and
EM3 win the top-down competition and control the hub. Merging EM1
and EM3 through the process described in section 5 leads to generation
of the novel behavior. As also seen in the other cases (rows A, B, and D),
change in the order of presentation of objects mainly affects the content of
the episodic memory encoded during the learning process. Despite this,
the novel behavior generated by combining multiple past experiences to
construct the tallest stack using all the objects is the same (all rows, 正确的
柱子).


数字 11: Rows A–D show the gradual progression of learning and formation
of episodic memories (EM) with the order of presentation of objects varied. 每个
row is divided into three phases: basic exploration, combining past experience
with exploration, and combining multiple past experiences to generate novel
行为 (that captures the essence of sections 3– 5). In the right column, 尽管
the changes in order of presentation of the objects, behaviors generated to build
the tallest stack in different conditions do not change.

6.3 The Computational Advantage of Forgetting: 什么时候, 为什么, 和
什么. As evident in Figures 6 到 9, memories related to episodes 1 和 2
(IE。, 数字 4) no longer win the top-down competition to control the object
hub and get their content reenacted. New episodic memories that in fact
originated through their support now exert greater influence on the hub, inhibiting them.

Figure 12: Starting from null, the left panel shows the growth of the connectivity matrix encoding memories EM1–EM5 acquired cumulatively by the robot. The right panel shows the connectivity matrix after EM1–EM4 are gradually forgotten and only the memory trace EM5, encoding the cumulative knowledge of all the experience gained so far, is retained.

In the proposed framework, memories that consistently
lose the competition are forgotten because there is a new “competitor” that
overshadows them by not only encapsulating the knowledge they encode
but going beyond and extending the knowledge (to newer objects expe-
rienced cumulatively). Since EM5 encapsulates all the knowledge related
to large objects, cubes, cylinders, and spheres, it is retained and all others
(EM1–EM4) are forgotten. 数字 12 shows the T matrix before and after
the assimilative process (forgetting EM1–EM4 and storing only EM5). 作为
见过, the result is that now we have a trimmed T matrix as compared to the
previous case (where all memories were stored in the connectivity matrix).
This is because now there is one big memory encapsulating everything
instead of five different isolated sequences. So what is the computational
advantage of such forgetting and assimilation? When should it take place?
The “when” part is when older memories no longer win access to the hub because a new competitor encapsulates the knowledge they encode and goes even beyond. Now, regarding the “what is the advantage” part of the question,
there are two central advantages:

1. Forgetting decreases patterns that are too close to each other, 制作
retrieval more efficient (when triggered by partial cues) and increas-
ing the storage capacity of the episodic memory network;

2. Reduces computational load: Instead of retrieving several isolated
experiences that then compete against each other top down to control
the hub, there is a minimal set of a few winners that encode all
necessary information to synthesize any goal-directed behavior (the
global system at the same time open to further learning and formation
of new memories).

因此, forgetting is indeed advantageous, and we have further shown
“when” memories are forgotten (i.e., when they do not win access to the hub for a long time and hence are unable to reenact their plans through the body),
how they are forgotten (by duly updating the connectivity matrix of the
memory network), and hinted at the computational advantages of such
流程 (IE。, more efficient retrieval, increase in storage capacity, 减少-
tion in computational load).
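A schematic version of this criterion is sketched below (hypothetical loss counters and object sets; in the actual framework forgetting amounts to removing the corresponding trace from the connectivity matrix T of the memory network). A memory that keeps losing the top-down competition while some winner encodes a superset of its knowledge is dropped; winners have their loss counters reset.

    def update_after_competition(memories, winners, patience=5):
        """Count consecutive losses; forget consistent losers whose knowledge is
        covered by some winning memory (a superset of their objects)."""
        survivors = []
        for m in memories:
            if m["name"] in winners:
                m["losses"] = 0
                survivors.append(m)
                continue
            m["losses"] += 1
            covered = any(m["objects"] <= w["objects"] for w in memories if w["name"] in winners)
            if m["losses"] >= patience and covered:
                continue                     # forgotten: its trace would be removed from T
            survivors.append(m)
        return survivors

    memories = [
        {"name": "EM1", "objects": {"sphere", "cylinder"}, "losses": 4},
        {"name": "EM5", "objects": {"sphere", "cylinder", "cube", "large_box"}, "losses": 0},
    ]
    print([m["name"] for m in update_after_competition(memories, winners={"EM5"})])   # ['EM5']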

After the phase of forgetting, the memory network contains only one
episodic trace, EM5, that encodes the cumulative knowledge of everything
experienced and learned so far. Before we conclude, we add one more object
that is commonly found: the container. The robot is presented with a combination
of cubes and containers. Figure 13A shows the resulting behavior, 这是
similar to what happened when cubes and large objects were introduced
(见图 5). Bottom-up activation of the object hub gives rise to partial
cues. 重要的, note that now there is only one winner that encodes a
large sequence (because of the effect of forgetting). Only one winner means
no competition to control the hub top down; EM5 is the winner and gets
access to the hub. 然而, comparison between bottom-up and top-down
hub activation indicates that not everything is known from past experience.
The task-relevant chunk of EM5 gains access to the construction system
and becomes part of the plan, with the rest to be developed by explor-
ing with the new object. Putting the container on top of the cube leads to
greater reward. This episode of playing with containers and cubes becomes
our new memory, EM6. So now there are two episodic memories: one that
encodes knowledge about large cylinders, cubes, cylinders, and spheres
and one that knows something about cubes and containers. Figure 13B
presents the response when the robot is presented now with four objects
in a novel combination: a large object, a small cylinder, a small cube, 和
a container. Similar to the situation encountered in section 5, the robot has
isolated experiences with all these objects but none of the memories encode
the complete solution. The hub is controlled in a distributed fashion. 作为
we can see, the sum of the top-down hub activity imposed by competing
episodic memories (EM5 and EM6) is equal to the bottom-up hub activity
(resulting through bottom-up sensory stream). This implies that the com-
plete knowledge to solve the problem is embedded in the isolated past
experiences if recombined creatively. A novel behavior emerges and brings
the full reward!

7 讨论

“It’s a wrong sort of memory that only works backwards,” remarked the
White Queen in Lewis Carroll’s Alice in Wonderland. 有趣的是, 新兴的

数字 13: (A) The temporal evolution of the behavior when containers are
introduced as novel objects along with cubes. Partial cues lead to recall of
EM5, the only existing experience in the new consolidated memory network.
Note that top-down hub activity as modulated by EM5 differs from bottom-
up hub activity, indicating that not everything is known from past experience
(similar to Figure 6). The task-relevant chunk of EM5 needs to be combined
with explorative actions on the container. Stacking the container on top of
the cube brings greater reward. This experience is encoded as a new memory,
EM6. (乙) The behavior when a large object, a cube, a cylinder, and a container
are presented together. As seen, both context-relevant experiences EM5 and
EM6 are recalled and now control the hub in distributed fashion (as described
in section 5). Note that in contrast to Figure 13A, the sum of the down hub
activity imposed by competing episodic memories (EM5 and EM6) is equal to
the bottom-up hub activity. This implies that the complete solution is available
in the isolated past experiences if they are recombined flexibly. A novel behavior
emerges. The robot places the cube on the large box, the cylinder on the cube,
and the container on top of the cylinder and anticipates a reward of 4.

trends in neurosciences, in particular the discovery of the DMN, now pro-
vide converging evidence suggesting an extensive overlap in the brain
networks activated while recalling the past and those engaged during activi-
ties as diverse as simulating the future, goal-directed planning, perspective
采取, and some forms of spatial navigation. Such a perspective urges
viewing memory not just as past oriented but also future oriented—in other
字, as a key component of the prospective brain that actively facilitates
simulation of future events, formation of flexible plans, and predictions—
the essence beautifully captured in Carroll’s novel.

While the computational bases of such mechanisms are still elusive, it is imperative that cognitive robots envisaged to assist us in the unstructured environments we inhabit be equipped with a powerful biologically
inspired memory architecture that allows them to remember their past expe-
riences based on context and exploit them flexibly in novel situations. 这
article was an exploration in this direction, capturing in a simple way the
constant interplay between neural mechanisms related to learning, 内存-
奥里, prospection, and abstraction in a cumulatively developing humanoid
robot. 在这个部分, we briefly summarize the general perspective we have
gained by teaching a baby humanoid to build the tallest stack and how we
are taking the framework ahead in the near future.

7.1 “No Traveler, No Travel.” In a seminal article Tulving (1972) 苏格-
gested that retrieval of one’s own past experiences involves a conscious
reliving of past events, like a mental journey into the past. 最近几年,
evidence has accumulated that such time travels are also responsible for
simulating the possible future in order to facilitate flexible goal-directed
behaviors in the present. 的确, if the sole function of episodic memory
mechanism was to record the past, it might be expected to function in a
reproductive manner, similar to a video recorder (Suddendorf & Corbal-
利斯, 1997). 反而, it functions in a constructive fashion, where multiple
experiences can be retrieved, eliminated by competition and at the same
time creatively recombined to facilitate the survival of the “traveler” in
the dynamically changing unstructured world. While much of the discus-
sion on mental time travel has been centered around whether nonhuman
animals possess this ability (Tulving 2002; Suddendorf & Corballis, 2007),
attempts to emulate such mechanisms on cumulatively learning embodied
robots have been negligible to our knowledge. Such an exercise may give
rise to novel computational insights and at the same time aid the creation
of better cognitive artifacts. This article goes in this direction. Whether a stack of objects gets destroyed or is built successfully, iCub learns something from the episode and uses such memories in the future. A time travel to its past
explorative interactions with the world and resulting consequences enables
it to do so. 同时, had it not experienced these events gradually
in time through direct sensorimotor interactions, it would not have been
able to encode such diverse experiences into its episodic memory or use
them in the future. Time travel needs an active traveler, and this directly
resonates with the concept of embodiment and the emergence of repre-
sentational content as a consequence of sensory-motor interactions of the
agent with its environment (Wiener, 1961; Gibson, 1966, 1979; Maturana &
瓦雷拉, 1980; 克拉克, 1997). We show in this article how such continuous ex-
change of signals between the brain, 身体, and the environment leads
to the formation and flexible use of episodic memories in an embodied
robot. Given the diversity of the world, the travels of different travelers
are indeed unique, and this is reflected in the diversity of individuals’ behaviors and preferences. Some of the facts may eventually get assimilated
into semantic knowledge (iCub learns that the color of objects does not affect the construction of the tallest stacks). But the diversity in our behaviors is in many
ways attributable to our own unique episodic experiences. This may reflect
also in the behaviors of different iCub robots, each learning cumulatively
and guided by its own episodic memories. 一般来说, just as we all have
to travel our own journey, software programmers cannot travel the journey
for an autonomous robotic assistant expected to inhabit an unstructured
世界. Instead they must keep learning cumulatively in time and use their
experiences effectively in the future (Georg Stork, 2012). 在此背景下,
mechanisms related to episodic memory as addressed in this article serve
as a central design feature of the prospective brain, and there is a need
to push further the state of the art in relation to creation and use of such
mechanisms in cognitive robots.

7.2 Why Top Down and Bottom Up Must Share Neural Substrates. A
central feature in our computational framework is the innovative use of
top-down and bottom-up information flows that share neural substrates.
Numerous studies from functional imaging and embodied cognition
provide direct evidence for this (Hesslow, 2002; Grafton, 2009; 马丁,
2009; Bressler & Menon, 2010; Gallese & Sinigaglia, 2011). However it is not
clear what the computational advantages are, how cognitive architectures
for embodied robots must exploit this idea, and how much of the compu-
tational and neural substrates are eventually shared (this is the more recent
debate between hard embodiment vs. soft embodiment; see Martin, 2009).
In our framework, the higher-level maps related to perception and action
(见图 1) can be activated both top down and bottom up (while both
early stages of perception like sensory processing and late stages of action
at the level of motor commands are not involved). 一般来说, sharing of
computational substrates between top down and bottom up gives rise to
two main advantages that we have exploited in our framework. 第一的, 它
simplifies comparison between what has been experienced in the past (IE。,
reconstructed through memory) with what an embodied agent is presently
experiencing, since both mechanisms are brought down to a common
平台 (IE。, the shared computational/neural substrate: hub). Such com-
parisons play a crucial role in both inference and assimilation. The former
utility is fairly straightforward: the resonance between top down and
bottom up directly indicates that the world is working as anticipated (和
the inverse is true if there is dissonance). 换句话说, sharing of neural
substrates between top down and bottom up can be effectively used to close
the loop between learning and reasoning in an open-ended setup: 更多的
learning leading to better reasoning, inconsistencies in reasoning leading
to greater learning. As seen in section 3.5, a direct comparison of the remembered past experience (of stacking the green sphere on the yellow cylinder) with the present behavior (of stacking the blue cylinder and the orange sphere) and their resulting consequences is sufficient to infer that color is not a causally dominant property as far as the goal of creating the tallest stack is concerned. Hence, instead of storing the new episode in the memory, the ability of the color map to activate the object hub in the case of stacking was reduced.
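A toy illustration of this credit-assignment step is given below (hypothetical feature weights standing in for the learned map-to-hub connections): two remembered episodes that differ in color but not in their consequence license a reduction of the color feature's influence in the stacking context.

    def update_feature_relevance(weights, episode_a, episode_b, lr=0.5):
        """If two remembered episodes differ in a feature but not in their outcome,
        that feature is not causally dominant for this goal: reduce its weight."""
        if episode_a["outcome"] == episode_b["outcome"]:
            for feat in episode_a["features"]:
                if episode_a["features"][feat] != episode_b["features"][feat]:
                    weights[feat] *= (1.0 - lr)
        return weights

    weights = {"shape": 1.0, "color": 1.0}
    past = {"features": {"shape": "sphere_on_cylinder", "color": "green_yellow"}, "outcome": "stable"}
    now = {"features": {"shape": "sphere_on_cylinder", "color": "orange_blue"}, "outcome": "stable"}
    print(update_feature_relevance(weights, past, now))   # color weight reduced, shape unchanged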

The second application of this idea in our article is subtle and relates
to control of the object hub bottom up by real perception and top down
by multiple competing episodic memories. In our framework, top-down
activation of the hub when compared with the bottom up gives rise to three
crucial pieces of information as demonstrated by numerous examples in
sections 4 到 6:

1. It clusters what is known from past experience about the present sit-
uation and what is unknown because of novelty in the environment.
This facilitates combining past experiences with explorative actions
to learn further (see Figures 6, 8, 和 13).

2. It separates out experiences that are valuable in the present context
from the set of all remembered episodic memories, which are com-
peting to control the hub top down. The bottom-up activation of the
hub represents the present context and helps generate the partial cues
that lead to the retrieval of multiple related past experiences. 哪个
past experiences win the competition and control the hub top down
are based on the anticipated rewards they fetch (information that is
filled in during episodic reconstruction) and the exclusivity of the
knowledge they encode (IE。, knowing something about the present
situation that no competitor knows). 总共, memories are recon-
structed bottom up through partial cues because they are relevant in
the present context. But this is not enough; to survive, they have to
compete and demonstrate that they are valuable in comparison with
其他的. The hub is the arena where both top-down and bottom-up
processes culminate.

3. If the net top-down hub activity is equivalent to the bottom-up hub
活动, then even if the environment is novel, the robot can infer that
its past experiences contain enough information to realize the goal
(sometimes through generation of a novel behavior). Either the most
valuable action sequence is directly available (in a single episodic
记忆: see Figures 5 和 7) or multiple past experiences may have
to be combined in a novel fashion to generate a new behavior (看
人物 9 和 13). In the latter case, it is like a team of past experiences
reassembled together in the context of the goal. As a final remark,
we chose not to involve initial layers of sensation and final stages
of action because otherwise it would be impossible to distinguish
imagination from reality (imagine activating the joints every time you
read words like lick, kick). There is evidence from functional imaging
too to justify this assumption (马丁, 2009). We believe millions of
years of evolution have managed to strike the right balance on the set
of neural substrates shared by top down and bottom up processing.

7.3 Survival of the Fittest May Apply to Memory Too. While there is
some support for the opinion that the act of recalling refreshes an episodic
trace anew (Dudai, 2006), there is no clear consensus on what the under-
lying computational mechanisms are and what their advantages are. 我们
were forced to consider this topic when the robot was experiencing differ-
ent episodes of interactions with objects on different days. Some of them
encoded partially the same content but with different consequences; 一些
of them included knowledge related to objects that others did not encode
but coincided partially with others. Even assuming that all such memories
may be recalled accurately based on partial cues from the present (那
is not true because there are indeed errors in retrieving similar patterns),
not all of them can be used at the same time. Hence we introduced the
idea that only the fittest memories—those that win the competition and
manage to control the hub top down (even partially)—are refreshed. 我们
found that the inverse naturally leads to a mechanism of forgetting: 这
only way for an episodic trace to survive is by reenacting its content (甚至
部分的) through the body. 反过来, consistent losers are eliminated. Just
like the old EM1 to EM4 were forgotten and the new EM5 and EM6 took over (themselves constructed using the old experiences), one day some other memory that knows more or reaps greater rewards may eventually replace them as learning goes on. We believe that this is a consequence of a natural process
of cognitive development of a cumulatively learning agent: as it encoun-
ters new things, old things have to be put in context and some of them
get eliminated, their knowledge encoded in a new competitor who goes
超过. We also showed that such a scheme is healthy in the sense that it
decreases patterns that are too close to each other, hence making retrieval
more efficient and in turn also increasing storage capacity of the memory
网络, and it reduces computational load by decreasing drastically the
number of isolated experiences that have to be remembered and then com-
pete against each other to control the hub top down. 在这个意义上, we believe
survival of the fittest applies even in the mental space, with direct implications for the efficient management of computational resources, reduced computational load, and hence fast reaction times, as well as for the growth of the cognitive agent. It may be interesting to see what happens
if this capacity is deactivated in iCub. We look forward to this in our future
作品.

7.4 Counting Comes Before Calculus: Role of the Teacher. We are so-
cial agents, and helping and seeking help is undoubtedly cognitive. 这
obvious reason is that it has a minimizing effect related to efforts that an
agent needs to direct toward exploration (that can be expensive energeti-
卡莉). In moving from basic counting to the complexity of calculus, 经常
the helping hand of the good teacher helps. And it does so for an embodied
robot that learns cumulatively. If the “user” can support its development
by creating scenarios of gradually increasing complexity for it to act and
学习, we believe it may minimize the need for engaging in needless ex-
ploration. Readers might have noticed that while the robot itself learns
cumulatively, the teacher has also intelligently introduced various objects
in the environment cumulatively in time (see sections 3 到 6) 为了
intentionally cause contradictions and trigger explorations or generation of
novel behaviors. We believe the introduction of such a social context and soft
user guidance moves in the direction of a middle path that both minimizes
excessive exploration by the robot on one hand and eliminates hard coding
by the programmer on the other. Human infants often go through this phase
where toys of different levels of complexity are introduced gradually to play
和 (even categorized approximately in age groups). The same applies to a
baby humanoid learning cumulatively, but unlike with a human infant, this helps us look deeper into the underlying computational principles, as we users are ourselves at the receiving end and are constantly learning. So closing
the loop between robot and an intelligent teacher/user can make learning
more productive. 总结, infants often learn in a social environment
where the parent/teacher plays a key role in nurturing the developmental
curve. Cognitive robots/assistants are also envisaged to exist in a shared environment with their users, and it is up to the user to train them in the tasks in which he or she needs assistance. We have attempted to incorporate such
an aspect into our ongoing efforts to develop iCub cognition. 一般来说,
the teacher/user plays three crucial roles:

1. Motivate: Set goals and create rich sensorimotor worlds where the
robot can get diverse experiences; at the same time ensure that the
environment is within the zone of proximal development (Vygotsky,
1978) of the robot.

2. Demonstrate: This deals with imitation learning to acquire motor
技能, for example learning to use common day-to-day tools found
in domestic and industrial setups. This issue is ongoing, 和最近的
results have been addressed elsewhere (Mohan & Morasso, 2011,
2012).

3. Reinforce: Rewards and penalties coming from the teacher/user aid
the value-dependent learning process of the robot. This feature
contributes toward creating contradictions between what the robot
anticipates getting and what it gets, hence driving it to learn
what was wrong and help it to reason better next time. Perhaps a
“humanlike” touch to machine learning is the need of the times if we
are to see the emergence of machines that can assist us flexibly in the
environments we inhabit and create.

致谢

The research presented in this article is supported by Istituto Italiano di
Tecnologia and the European Union through the FP7 project DARWIN
(www.darwin-project.eu, grant FP7-270138). We express our gratitude to the anonymous reviewers for their enormous patience and constructive feedback, which helped develop the article.

参考

Addis, D. R。, & Schacter, D. L.

(2012). The hippocampus and imagining
未来: Where do we stand? Frontiers in Human Neuroscience, 5, 173.
土井:10.3389/fnhum.2011.00173

Addis, D. R。, 黄, A. T。, & Schacter, D. L. (2007). Remembering the past and imagin-
ing the future: Common and distinct neural substrates during event construction
and elaboration. Neuropsychologia, 45, 1363–1377.

Barabási, A.-L. (2003). Linked: The new science of networks. Cambridge, MA: Perseus Books.
Barabási, A.-L. (2012). The network takeover. Nature Physics, 8, 14–16.
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

Bressler, S. L。, & Menon, V. (2010). Large-scale brain networks in cognition: Emerging

methods and principles. 认知科学的趋势, 14, 277–290.

巴克纳, 右. L。, Andrews-Hanna, J. R。, & Schacter, D. L. (2008). The brain’s default
网络: Anatomy, function, and relevance to disease. 安. NY Acad. Sci., 1124,
1–38.

巴克纳, 右. L。, & 卡罗尔, D. C. (2007). Self-projection and the brain. Trends in Cognitive

科学, 2, 49–57.

伯吉斯, N。, Maguire, 乙. A。, & 奥基夫, J. (2002). The human hippocampus and

spatial and episodic memory. 神经元, 35, 625–641.

Chong, H. Q., Tan, A.-H., & 的, G.-H. (2007). Integrated cognitive architecture: A

survey. Art. 情报审查, 28, 103–130.

克拉克, A. (1997). Being there: Putting brain, body and world together again. 剑桥,

嘛: 与新闻界.

Dudai, 是. (2006). Reconsolidation: The advantage of being refocused. Curr. Opin.

Neurobiol., 16, 174–178.

Eichenbaum, H. (2004). Hippocampus: Cognitive processes and neural representa-

tions that underlie declarative memory. 神经元, 44, 109–120.

Frith, C. D ., & Frith, U. (2012). Mechanisms of social cognition. 安努. 牧师. Psychol.,

63, 287–313.

Gallese, 五、, & Sinigaglia, C. (2011). What is so special with embodied simulation.

认知科学的趋势, 15, 512–519.

Georg Stork, H. (2012). Towards a scientific foundation for engineering. Biologically

Inspired Cognitive Architecture, 1, 82–91.

吉布森, J. J. (1966). The senses considered as perceptual systems. 波士顿: Houghton

Mifflin.

吉布森, J. J. (1979). The ecological approach to visual perception. 波士顿: Houghton

Mifflin.

Grafton, S. 时间. (2009). Embodied cognition and the simulation of action to understand

其他的. Annals of the New York Academy of Sciences, 1156, 97–117.

哈格曼, P。, 卡蒙, L。, 巨人, X。, 买, R。, 蜂蜜, C. J。, 威登, V. J。, &
斯波恩斯, 氧. (2008). 绘制人类大脑皮层的结构核心图. 公共科学图书馆
生物学, 6, e59.

Hassabis, D ., & Maguire, 乙. A. (2011). The construction system of the brain. 在米. Bar
(埃德。), Predictions in the brain: Using our past to generate a future. 纽约: 牛津
大学出版社.

Hesslow, G. (2002). Conscious thought as a simulation of behavior and perception.

认知科学的趋势, 6, 242–247.

Hopfield, J. J. (2008). Searching for memories: Sudoku, implicit check bits, 和
iterative use of not-always-correct rapid neural computation. 神经计算,
20, 512–519.

马丁, A. (2007). The representation of object concepts in the brain. Annual Review

心理学系, 58, 25–45.

马丁, A. (2009). Circuits in mind: The neural foundations for object concepts. 在米.
Gazzaniga (埃德。), The cognitive neurosciences (4第三版。, PP. 1031–1045). 剑桥,
嘛: 与新闻界.

石匠, 中号. F。, 诺顿, 中号. 我。, Van Horn, J. D ., Wegner, D. M。, Grafton, S. T。, & Macrae,
C. 氮. (2007). Wandering minds: The default network and stimulus-independent
想法. 科学, 315, 393–395.

Maturona, H. R。, & 瓦雷拉, F. J. (1980). Autopoiesis and cognition. 多德雷赫特:

D. Reidel.

迈耶, K., Damasio, A. (2009). Convergence and divergence in a neural architecture

for recognition and memory. Trends in Neuroscience, 32, 376–382.

Mohan, 五、, & Morasso, 磷. (2011). Passive motion paradigm: An alternative to optimal

控制. Front. Neurorobot., 5, 4. 土井:10.3389/fnbot.2011.00004

Mohan, 五、, & Morasso, 磷. (2012). How past experience, imitation and practice can be
combined to swiftly learn to use novel “tools”: Insights from skill learning experiments
with baby humanoids. Intl. Conf. on Biomimetic and Biohybrid Systems: Living
Machines 2012, 巴塞罗那, 西班牙.

Mohan, 五、, Morasso, P。, 桑迪尼, G。, & Kasderidis, S. (2013). Inference through
embodied simulation in cognitive robots. Cognitive Computation, 5, 355–382.
土井:10.1007/s12559-013-9205-4

Mohan, 五、, Morasso, P。, Zenzeri, J。, Metta, G。, Chakravarthy, V. S。, & 桑迪尼, G.
(2011). Teaching a humanoid robot to draw “Shapes.” Autonomous Robots, 31(1),
21–53.

Murata, A。, Fadiga, L。, Fogassi, L。, Gallese, 五、, Raos, 五、, & Rizzolatti, G. (1997). 目的
representation in the ventral premotor cortex (area f5) of the monkey. 杂志
Neurophysiology, 78, 2226–2230.

D

w
n

A
d
e
d

F
r


H

t
t

p

:
/
/

d

r
e
C
t
.


t
.

/

e
d

n
e
C

A
r
t

C
e

p
d

/

F
/

/

/

/

2
6
1
2
2
6
9
2
2
0
1
7
2
4
8
n
e
C

_
A
_
0
0
6
6
4
p
d

.

/

F


y
G

e
s
t

t


n
0
7
S
e
p
e


e
r
2
0
2
3

2734

V. Mohan, G. 桑迪尼, 和P. Morasso

帕特森, K., Nestor, 磷. J。, & 罗杰斯, 时间. 时间. (2007). Where do you know what you
知道? The representation of semantic knowledge in the human brain. 自然
评论 神经科学, 8(12), 976–987.

Raichle, 中号. E., MacLeod, A. M。, 斯奈德, A. Z。, 权力, 瓦. J。, Gusnard, D. A。, &
舒尔曼, G. L. (2001). A default mode of brain function. Proc. Natl. Acad. Sci.
美国。, 98, 676–682.

Schacter, D. L。, Addis, D. R。, Hassabis, D ., 马丁, 五、, Nathan, 右. S。, & Szpunar, K. K.
(2012). The future of memory: Remembering, imagining, and the brain. 神经元,
76, 677–694.

Sederberg, 磷. B., & Norman, K. A. (2010). Learning and memory: 计算型
型号. In G. F. Koob, 中号. Le Moal, & 右. F. 汤普森 (编辑。), Encyclopedia of
behavioral neuroscience. 牛津: 学术出版社.

斯波恩斯, 氧. (2010). Networks of the brain. 剑桥, 嘛: 与新闻界.
Squire, L. R。, & Wixted, J. (2011). The cognitive neuroscience of human memory since

H.M. Annual Review of Neuroscience, 34, 259–288.

Suddendorf, 时间. (2013). Mental time travel: Continuities and discontinuities. 趋势

认知科学, 17, 151–152.

Suddendorf, T。, & Corballis, 中号. (1977). Mental time travel and the evolution of the
human mind. Genetic, Social, and General Psychology Monographs, 123, 133–164.
Suddendorf, T。, & Corballis, 中号. (2007). The evolution of foresight: What is mental
time travel and is it unique to humans? Behavioral and Brain Sciences, 30, 299–313.
Suddendorf, T。, Addis, D. R。, & Corballis, 中号. C. (2009). Mental time travel and the
shaping of the human mind. Philosophical Transactions of the Royal Society of London
乙: Biological Sciences, 364, 1317–1324.

Szpunar, K. K., 沃森, J. M。, & 麦克德莫特, K. 乙. (2007). Neural substrates of

envisioning the future. Proc. Natl. Acad. Sci. 美国. 104, 642–647.

Tulving, 乙. (1972). Episodic and semantic memory. 在E中. Tulving & 瓦. Donaldson

(编辑。), Organisation of memory (PP. 381–403). 奥兰多, FL: 学术出版社.

Tulving, 乙. (2002). Episodic memory: From mind to brain. 安努. 牧师. Psychol., 53,

1–25.

Van den Heuvel, 中号. P。, & 斯波恩斯, 氧. (2013). Network hubs in the human brain. Trends

Cogn. Sci., 17, 683–696.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes.

剑桥, 嘛: 哈佛大学出版社.

Watts, J. D ., & Strogatz, S. (1998). Collective dynamics of small world networks.

自然, 393(6684), 440–442.

Welberg, L. (2012). Neuroimaging: Rats join the “default mode” club. 自然评论

神经科学, 13, 223–223, 土井:10.1038/nrn3224

Wiener, 氮. (1961) Cybernetics: Or control and communication in the animal and the

机器 (2nd rev. 编辑。). 剑桥, 嘛: 与新闻界.

11月收到 23, 2012; accepted June 15, 2014.
