LETTER
Communicated by Lea Duncker
Probing the Relationship Between Latent Linear Dynamical
Systems and Low-Rank Recurrent Neural Network Models
Adrian Valente
adrian.valente@ens.fr
Srdjan Ostojic
srdjan.ostojic@ens.fr
Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960,
Ecole Normale Superieure–PSL Research University, 75005 巴黎, 法国
Jonathan W. Pillow
pillow@princeton.edu
Princeton Neuroscience Institute, Princeton University, Princeton,
NJ 08544, U.S.A.
A large body of work has suggested that neural populations exhibit low-
dimensional dynamics during behavior. However, there are a variety of
different approaches for modeling low-dimensional neural population
activity. One approach involves latent linear dynamical system (LDS)
models, in which population activity is described by a projection of low-
dimensional latent variables with linear dynamics. A second approach
involves low-rank recurrent neural networks (RNNs), in which popula-
tion activity arises directly from a low-dimensional projection of past
activity. Although these two modeling approaches have strong similar-
ities, they arise in different contexts and tend to have different domains
of application. Here we examine the precise relationship between latent
LDS models and linear low-rank RNNs. When can one model class be
converted to the other, and vice versa? We show that latent LDS models
can only be converted to RNNs in specific limit cases, due to the non-
Markovian property of latent LDS models. Conversely, we show that lin-
ear RNNs can be mapped onto LDS models, with latent dimensionality at
most twice the rank of the RNN. A surprising consequence of our results
is that a partially observed RNN is better represented by an LDS model
than by an RNN consisting of only observed units.
1 Introduction
Recent work on large-scale neural population recordings has suggested
that neural activity is often confined to a low-dimensional space, with
fewer dimensions than the number of neurons in a population (Church-
land, Byron, Sahani, & Shenoy, 2007; Gao & Ganguli, 2015; Gallego, Perich,
Miller, & Solla, 2017; Saxena & Cunningham, 2019; Jazayeri & Ostojic,
2021). To describe this activity, modelers have at their disposal a wide ar-
ray of tools that give rise to different forms of low-dimensional activity
(Cunningham & Yu, 2014). Two classes of modeling approaches that have
generated a large following in the literature are descriptive statistical mod-
els and mechanistic models. Broadly speaking, descriptive statistical mod-
els aim to identify a probability distribution that captures the statistical
properties of an observed neural dataset, while remaining agnostic about
the mechanisms that gave rise to it. Mechanistic models, in contrast, aim
to reproduce certain characteristics of observed data using biologically
inspired mechanisms, but often with less attention to a full statistical de-
scription. Although these two classes of models often have similar mathe-
matical underpinnings, there remain a variety of important gaps between
them. Here we focus on reconciling the gaps between two simple but pow-
erful models of low-dimensional neural activity: latent linear dynamical
systems (LDS) and linear low-rank recurrent neural networks (RNNs).
The latent LDS model with gaussian noise is a popular statistical model
for low-dimensional neural activity in both systems neuroscience (Smith
& Brown, 2003; Semedo, Zandvakili, Kohn, Machens, & Byron, 2014) and
brain-machine interface settings (Kim, Simeral, Hochberg, Donoghue, &
Black, 2008). This model has a long history in electrical engineering, where
the problem of inferring latents from past observations has an analytical
solution known as the Kalman filter (Kalman, 1960). In neuroscience set-
tings, this model has been used to describe high-dimensional neural pop-
ulation activity in terms of linear projections of low-dimensional latent
变量. Although the basic form of the model includes only linear dy-
namics, recent extensions have produced state-of-the-art models for high-
dimensional spike train data (Yu et al., 2005; Petreska et al., 2011; Macke
et al., 2011; Pachitariu, Petreska, & Sahani, 2013; Archer, Koster, Pillow, &
Macke, 2014; Duncker, Bohner, Boussard, & Sahani, 2019; Zoltowski, Pillow,
& Linderman, 2020; Glaser, Whiteway, 坎宁安, Paninski, & Linder-
男人, 2020; Kim et al., 2008).
Recurrent neural networks, in contrast, have emerged as a powerful
framework for building mechanistic models of neural computations under-
lying cognitive tasks (Sussillo, 2014; Barak, 2017; Mante, Sussillo, Shenoy, &
Newsome, 2013) and have more recently been used to reproduce recorded
neural data (Rajan, Harvey, & Tank, 2016; Cohen, DePasquale, Aoi, &
Pillow, 2020; Finkelstein et al., 2021; Perich et al., 2021). While randomly
connected RNN models typically have high-dimensional activity (Som-
polinsky, Crisanti, & Sommers, 1988; Laje & Buonomano, 2013), recent work
has shown that RNNs with low-rank connectivity provide a rich theoretical
framework for modeling low-dimensional neural dynamics and the result-
ing computations (Mastrogiuseppe & Ostojic, 2018; Landau & Sompolin-
sky, 2018; Pereira & Brunel, 2018; Schuessler, Dubreuil, Mastrogiuseppe,
Ostojic, & Barak, 2020; Beiran, Dubreuil, Valente, Mastrogiuseppe, &
Ostojic, 2021; Dubreuil, Valente, Beiran, Mastrogiuseppe, & Ostojic, 2022;
Bondanelli, Deneux, Bathellier, & Ostojic, 2021; Landau & Sompolinsky,
2021). In these low-rank RNNs, the structure of low-dimensional dynam-
ics bears direct commonalities with latent LDS models, yet the precise
relationship between the two classes of models remains to be clarified.
Understanding this relationship would open the door to applying to low-
rank RNNs probabilistic inference techniques developed for LDS models
and conversely could provide mechanistic interpretations of latent LDS
models fitted to data.
In this letter, we examine the mathematical relationship between latent
LDS and low-rank RNN models. We focus on linear RNNs, which are less
expressive but simpler to analyze than their nonlinear counterparts while
still leading to rich dynamics (Hennequin, Vogels, & Gerstner, 2014; Kao,
Sadabadi, & Hennequin, 2021; Bondanelli et al., 2021). We show that even if
both LDS models and linear low-rank RNNs produce gaussian distributed
activity patterns with low-dimensional linear dynamics, the two model
classes have different statistical structures and are therefore not in general
equivalent. More specifically, in latent LDS models, the output sequence has
non-Markovian statistics, meaning that the activity in a single time step is
not independent of its history given the activity on the previous time step.
This stands in contrast to linear RNNs, which are Markovian regardless
of the rank of their connectivity. A linear low-rank RNN can nevertheless
provide a first-order approximation to the distribution over neural activity
generated by a latent LDS model, and we show that this approximation be-
comes exact in several cases of interest, and in particular, in the limit where
the number of neurons is large compared to the latent dimensionality. Con-
versely, we show that any linear low-rank RNN can be converted to a latent
LDS, although the dimensionality of the latent space depends on the over-
lap between the subspaces spanned by left and right singular vectors of
the RNN connectivity matrix and may be as high as twice the rank of this
矩阵. The two model classes are thus closely related, with linear low-rank
RNNs comprising a subset of the broader class of latent LDS models. An in-
teresting implication of our analyses is that the activity of an RNN in which
only a subset of neurons are observed is better fit by a latent LDS model
than by an RNN consisting only of observed units.
2 Modeling Frameworks
We start with a formal description of the two model classes in question,
both of which describe the time-varying activity of a population of n
神经元.
2.1 Latent LDS Model. The latent linear dynamical system (LDS)
model, also known as a linear gaussian state-space model, describes neural
population activity as a noisy linear projection of a low-dimensional latent
Figure 1: (A) Schematic representation of the latent linear dynamical system
model, as defined by equations 2.1 to 2.3. (B) Schematic representation of the
low-rank linear RNN, as defined by equations 2.4 and 2.5.
variable governed by linear dynamics with gaussian noise (Kalman, 1960;
Roweis & Ghahramani, 1999; see Figure 1A). The model is characterized by
the following equations:
x_{t+1} = A x_t + w_t,   w_t ∼ N(0, Q),   (2.1)
y_t = C x_t + v_t,   v_t ∼ N(0, R).   (2.2)
这里, xt is a d-dimensional latent (or “unobserved”) vector that follows
discrete-time linear dynamics specified by a d × d matrix A and is cor-
rupted on each time step by a zero-mean gaussian noise vector wt ∈ Rd with
covariance Q. The vector of neural activity yt arises from a linear trans-
formation of xt via the n × d observation (or “emissions”) matrix C, cor-
rupted by zero-mean gaussian noise vector vt ∈ Rn with covariance R.
Generally we assume d < n, so that the high-dimensional observations yt
are explained by the lower-dimensional dynamics of the latent vector xt. For
clarity, in the main text, we focus on LDS models without external inputs
and study their effect in appendix D.
The complete model also contains a specification of the distribution of
the initial latent vector x0, which is commonly assumed to have a zero-
mean gaussian distribution with covariance Σ_0:

x_0 ∼ N(0, Σ_0).   (2.3)

The complete parameters of the model are thus θ_LDS = {A, C, Q, R, Σ_0}.
Note that this parameterization of an LDS is not unique: any invertible
linear transformation of the latent space leads to an equivalent model if the
appropriate transformations are applied to matrices A, C, Q, and Σ_0.
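As a concrete illustration of equations 2.1 to 2.3, the following sketch samples trajectories from a latent LDS. It is not the authors' code: it is a minimal NumPy implementation in which all parameter values are arbitrary placeholders.

```python
import numpy as np

def simulate_lds(A, C, Q, R, Sigma0, T, seed=0):
    """Sample a trajectory (x_0..x_T, y_0..y_T) from the latent LDS of equations 2.1-2.3."""
    rng = np.random.default_rng(seed)
    d, n = A.shape[0], C.shape[0]
    x = rng.multivariate_normal(np.zeros(d), Sigma0)          # x_0 ~ N(0, Sigma_0)  (eq. 2.3)
    xs, ys = [], []
    for _ in range(T + 1):
        y = C @ x + rng.multivariate_normal(np.zeros(n), R)   # y_t = C x_t + v_t    (eq. 2.2)
        xs.append(x)
        ys.append(y)
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)   # x_{t+1} = A x_t + w_t (eq. 2.1)
    return np.array(xs), np.array(ys)

# Arbitrary example: d = 2 latent dimensions, n = 50 neurons.
d, n = 2, 50
A = 0.95 * np.eye(d)
C = np.random.default_rng(1).normal(size=(n, d))
X, Y = simulate_lds(A, C, Q=0.1 * np.eye(d), R=np.eye(n), Sigma0=np.eye(d), T=200)
```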
2.2 Low-Rank Linear RNN. A linear RNN, also known as an autore-
gressive (AR) model, represents observed neural activity as a noisy linear
projection of the activity at the previous time step. We can write the model
as (see Figure 1B)
y_{t+1} = J y_t + ε_t,   ε_t ∼ N(0, P),   (2.4)

where J is an n × n recurrent weight matrix and εt ∈ R^n is a gaussian noise
vector with mean zero and covariance P. Moreover, we assume that the ini-
tial condition is drawn from a zero-mean distribution with covariance V_0^y:

y_0 ∼ N(0, V_0^y).   (2.5)
A low-rank RNN model is obtained by constraining the rank of the re-
current weight matrix J to be r ≪ n. In this case, the recurrence matrix can
be factorized as

J = M N⊤,   (2.6)
where M and N are both n × r matrices of rank r.
Note that this factorization is not unique, but a particular factorization
can be obtained from a low-rank J matrix using the truncated singular value
decomposition: J = USV⊤, where U and V are semiorthogonal n × r matri-
ces of left and right singular vectors, respectively, and S is an r × r diagonal
matrix containing the largest singular values. We can then set M = U and
N = VS, so that N⊤ = SV⊤.
The model parameters of the low-rank linear RNN are therefore given
by θ_RNN = {M, N, P, V_0^y}.
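To make the factorization concrete, here is a minimal sketch (ours, not taken from the letter) that extracts M and N from a given low-rank connectivity matrix via the truncated SVD and simulates the RNN of equations 2.4 and 2.5; all dimensions and noise levels are illustrative assumptions.

```python
import numpy as np

def factorize_low_rank(J, r):
    """Truncated SVD factorization J = M N^T, with M = U and N = V S (equation 2.6)."""
    U, s, Vt = np.linalg.svd(J)
    M = U[:, :r]              # semiorthogonal n x r matrix of left singular vectors
    N = Vt[:r].T * s[:r]      # N = V S, so that M @ N.T reproduces the rank-r part of J
    return M, N

def simulate_rnn(J, P, Vy0, T, seed=0):
    """Sample a trajectory from the linear RNN y_{t+1} = J y_t + eps_t (equations 2.4-2.5)."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    y = rng.multivariate_normal(np.zeros(n), Vy0)
    ys = [y]
    for _ in range(T):
        y = J @ y + rng.multivariate_normal(np.zeros(n), P)
        ys.append(y)
    return np.array(ys)

# Illustrative rank-one example with n = 20 neurons.
rng = np.random.default_rng(2)
m, nvec = rng.normal(size=(20, 1)), rng.normal(size=(20, 1))
J = 0.04 * m @ nvec.T                      # rank-1 connectivity, scaled for stability
M, N = factorize_low_rank(J, r=1)
Y = simulate_rnn(J, P=0.1 * np.eye(20), Vy0=np.eye(20), T=500)
```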
2.3 Comparing the Two Models. Both models described above ex-
hibit low-dimensional dynamics embedded in a high-dimensional obser-
vation space. In the following, we examine the probability distributions
P(y1, . . . , yT) over time series (y1, . . . , yT) generated by the two models.
We show that in general, the two models give rise to different distribu-
tions, such that the family of probability distributions generated by the LDS
model cannot all be captured with low-rank linear RNNs. Specifically, RNN
models are constrained to purely Markovian distributions, which is not the
case for LDS models. However, the two model classes can be shown to be
equivalent when the observations yt contain exact information about the
latent state xt, which is in particular the case if the observation noise is or-
thogonal to the latent subspace or in the limit of a large number of neurons
n ≫ d. Conversely, a low-rank linear RNN can in general be mapped to a
latent LDS with a dimensionality of the latent state at most twice the rank
of the RNN.
3 Mapping from LDS Models to Linear Low-Rank RNNs
3.1 Nonequivalence in the General Case. Let us consider a latent LDS
described by equations 2.1 to 2.3 and a low-rank linear RNN defined by
equations 2.4 and 2.5. We start by comparing the properties of the joint dis-
tribution P(y0, . . . , yT) for any value of T for the two models. For both mod-
els, the joint distribution can be factored under the form

P(y_0, . . . , y_T) = P(y_0) ∏_{t=1}^{T} P(y_t | y_{t−1}, . . . , y_0),   (3.1)
where each term in the product is the distribution of neural population ac-
tivity at a single time point given all previous activity (see appendix A for
details). More specifically, each of the conditional distributions in equation
3.1 is gaussian, and for the LDS, we can parameterize these distributions as
P(x_t | y_{t−1}, . . . , y_0) := N(x̂_t, V_t),   (3.2)
P(y_t | y_{t−1}, . . . , y_0) = N(C x̂_t, C V_t C⊤ + R),   (3.3)
where ˆxt is the mean of the conditional distribution over the latent at time
step t, given observations until time step t − 1. It obeys the recurrence
equation
x̂_{t+1} = A(x̂_t + K_t (y_t − C x̂_t)),   (3.4)

where Kt is the Kalman gain given by

K_t = V_t C⊤ (C V_t C⊤ + R)^{−1},   (3.5)
and Vt represents a covariance matrix, which is independent of the obser-
vations and follows a recurrence equation detailed in appendix A.
Iterating equation 3.4 over multiple time steps, one can see that x̂t+1 de-
pends not only on the last observation yt but on the full history of obser-
vations (y0, . . . , yt), which therefore affects the distribution at any given
time step. The process (y0, . . . , yt) generated by the LDS model is hence
non-Markovian.
Conversely, for the linear RNN, the observations (y0, . . . , yt) instead do
form a Markov process, meaning that observations are conditionally inde-
pendent of their history given the activity from the previous time step:

P(y_t | y_{t−1}, . . . , y_0) = P(y_t | y_{t−1}).   (3.6)
Figure 2: Mean autocorrelation of observations yt from latent LDS processes
compared with their first-order RNN approximations. The latent space is one-
dimensional (d = 1), and the dimension n of the observation space is increased
from left to right: (a) n = 3, (b) n = 20, (c) n = 100. The parameters of the latent
state processes are fixed scalars (A = (0.97), Q = (0.1)), while the elements of
the observation matrices C are drawn randomly and independently from a cen-
tered gaussian distribution of variance 1. The observation noise has covariance
R = σ_v^2 I_n with σ_v^2 = 2. Note that we have chosen observation noise to largely
dominate over latent state noise in order to obtain a large difference between
models at low n. Dots and shaded areas indicate, respectively, mean and stan-
dard deviation of different estimations of the mean autocorrelation done on 10
independent folds of 100 trials each (where C was identical across trials).
The fact that this property does not in general hold for the latent LDS shows
that the two model classes are not equivalent. Due to this fundamental con-
straint, the RNN can only approximate the complex distribution (see equa-
tion 3.1) parameterized by an LDS, as detailed in the following section and
illustrated in Figure 2.
3.2 Matching the First-Order Marginals of an LDS Model. We can
obtain a Markovian approximation of the LDS-generated sequence of ob-
servations (y0, . . . , yt) by deriving the conditional distribution P(yt+1 | yt)
under the LDS model and matching it with a low-rank RNN (Pachitariu
et al., 2013). This type of first-order approximation will preserve exactly the
one-time-step-difference marginal distributions P(yt+1, yt), although struc-
ture across longer timescales might not be captured correctly.
First, we note that we can express both yt and yt+1 as noisy linear pro-
jections of xt:
y_t = C x_t + v_t,   (3.7)
y_{t+1} = C(A x_t + w_t) + v_{t+1},   (3.8)
which follows from equation 2.1.
Let N(0, Σt) denote the gaussian marginal distribution over the la-
tent vector xt at time t. Then we can use standard identities for linear
transformations of gaussian variables to derive the joint distribution over
yt and yt+1:
\begin{pmatrix} y_t \\ y_{t+1} \end{pmatrix} ∼ N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} C Σ_t C⊤ + R & C Σ_t A⊤ C⊤ \\ C A Σ_t C⊤ & C(A Σ_t A⊤ + Q) C⊤ + R \end{pmatrix} \right).   (3.9)
We can then apply the formula for conditioning of multivariate gaussians
(see Bishop, 2006, equations 2.81 and 2.82) to obtain
y_{t+1} | y_t ∼ N(J_t y_t, P_t),   (3.10)

where

J_t = C A Σ_t C⊤ (C Σ_t C⊤ + R)^{−1},   (3.11)
P_t = C(A Σ_t A⊤ + Q)C⊤ + R − C A Σ_t C⊤ (C Σ_t C⊤ + R)^{−1} C Σ_t A⊤ C⊤.   (3.12)
In contrast, from equation 2.4, for a low-rank RNN, the first-order
marginal is given by
y_{t+1} | y_t ∼ N(J y_t, P).   (3.13)
Comparing equations 3.10 and 3.13, we see for the LDS model that the
effective weights Jt and the covariance Pt depend on time through Σt, the
marginal covariance of the latent at time t, while for the RNN, they do not.
Note, however, that Σt follows the recurrence relation
Σ_{t+1} = A Σ_t A⊤ + Q,   (3.14)

which converges toward a fixed point Σ∞ that obeys the discrete Lyapunov
equation,

Σ_∞ = A Σ_∞ A⊤ + Q,   (3.15)
provided all eigenvalues of A have absolute value less than 1.
The LDS can therefore be approximated by an RNN with constant
weights when the initial covariance Σ0 is equal to the asymptotic covari-
ance Σ∞, as noted previously (Pachitariu et al., 2013). Even if this condition
does not hold at time 0, Σ∞ will in general be a good approximation of the
latent covariance after an initial transient. In this case, we obtain the fixed
recurrence weights

J = C A Σ_∞ C⊤ (C Σ_∞ C⊤ + R)^{−1} := M N⊤,   (3.16)
where we define M = C, which has shape n × d, and N⊤ = AΣ∞C⊤(CΣ∞C⊤ + R)^{−1},
which has shape d × n, so that J is a rank r matrix with r = d.
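The construction of equation 3.16 is easy to sketch numerically. The snippet below (our illustration, not code from the letter) computes the fixed recurrence weights J and noise covariance P of the first-order RNN approximation from assumed LDS parameters A, C, Q, R, using SciPy's discrete Lyapunov solver for Σ∞.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lds_to_rnn(A, C, Q, R):
    """First-order (Markovian) RNN approximation of a latent LDS.

    Returns J and P such that y_{t+1} | y_t ~ N(J y_t, P) matches the
    stationary one-time-step marginals of the LDS (equations 3.11-3.16).
    """
    Sigma = solve_discrete_lyapunov(A, Q)          # Sigma_inf = A Sigma_inf A^T + Q  (eq. 3.15)
    S_y = C @ Sigma @ C.T + R                      # stationary covariance of y_t
    J = C @ A @ Sigma @ C.T @ np.linalg.inv(S_y)   # equation 3.16
    P = (C @ (A @ Sigma @ A.T + Q) @ C.T + R
         - J @ C @ Sigma @ A.T @ C.T)              # equation 3.12 evaluated at Sigma_inf
    return J, P
```

The returned J has rank at most d, consistent with the factorization J = MN⊤ with M = C.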
3.3 Cases of Equivalence between LDS and RNN Models. Although
latent LDS and low-rank linear RNN models are not equivalent in gen-
eral, we can show that the first-order Markovian approximation introduced
above becomes exact in two limit cases of interest: for observation noise or-
thogonal to the latent subspace and in the limit n ≫ d, with coefficients of
the observation matrix generated randomly and independently.
Our key observation is that if KtC = I in equation 3.4, with I the identity
matrix, we have x̂t+1 = AKtyt, so that the dependence on the observations
before time step t disappears and the LDS therefore becomes Markovian.
Interestingly, this condition KtC = I also implies that the latent state can
be inferred from the current observation yt alone (see equation A.7 in ap-
pendix A) and that this inference is exact, since the variance of the distribu-
tion p(xt|yt ) is then equal to 0 as seen from equation A.8. We next examine
two cases where this condition is satisfied.
We first consider the situation where the observation noise vanishes: R =
0. Then, as shown in appendix A, the Kalman gain is Kt = (C⊤C)^{−1}C⊤, so
that KtC = I. In that case, the approximation of the LDS by the RNN defined
in section 3.2 is exact, with equations 3.11 and 3.12 becoming:

J = C A (C⊤C)^{−1} C⊤,   (3.17)
P = C Q C⊤.   (3.18)
More generally, this result remains valid when the observation noise is or-
thogonal to the latent subspace spanned by the columns of the observation
matrix C (in which case the recurrence noise given by equation 3.18 becomes
P = CQC⊤ + R).
A second case in which we can obtain KtC ≈ I is in the limit of many neu-
rons, n ≫ d, assuming that coefficients of the observation matrix are gen-
erated randomly and independently. Indeed, under these hypotheses, the
Kalman gain given by equation 3.5 is dominated by the term CVtC⊤, so that
the observation covariance R becomes negligible, as shown formally in ap-
pendix B. Intuitively this means that the information about the latent state
ˆxt is distributed over a large enough population of neurons for the Kalman
filter to average out the observation noise and estimate it optimally with-
out making use of previous observations. Ultimately this makes the LDS
asymptotically Markovian in the case where we have an arbitrarily large
neural population relative to the number of latent dimensions.
To illustrate the convergence of the low-rank RNN approximation to
the target latent LDS in the large n limit, in Figure 2, we consider a sim-
ple example with a one-dimensional latent space and observation spaces of
increasing dimensionality. To visualize the difference between the LDS and
its low-rank RNN approximation, we plot the trace of the autocorrela-
tion matrix of observations yt in the stationary regime, ρ(δ) = Tr(E[y_t y_{t+δ}⊤]).
Since the RNNs are constructed to capture the marginal distributions of ob-
servations separated by at most one time step, the two curves match exactly
for a lag δ ∈ {−1, 0, 1}, but dependencies at longer timescales cannot be ac-
curately captured by an RNN due to its Markov property (see Figure 2a).
However, these differences vanish as the dimensionality of the observation
space becomes much larger than that of the latent space (see Figures 2b and
2c), which illustrates that the latent LDS converges to a process equivalent
to a low-rank RNN.
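The comparison shown in Figure 2 can be reproduced in outline with the sketch below, which estimates the mean autocorrelation ρ(δ) = Tr(E[y_t y_{t+δ}⊤]) from simulated trajectories. The commented usage relies on the hypothetical simulate_lds, simulate_rnn, and lds_to_rnn helpers sketched earlier, and the parameter values are assumptions rather than the exact simulation settings of the letter.

```python
import numpy as np

def mean_autocorrelation(Y, max_lag=20, burn_in=100):
    """Estimate rho(delta) = Tr(E[y_t y_{t+delta}^T]) from a trajectory Y of shape (T, n)."""
    Y = Y[burn_in:]                 # discard the initial transient (near-stationary regime)
    T = Y.shape[0]
    return np.array([np.mean(np.einsum('ti,ti->t', Y[:T - lag], Y[lag:]))
                     for lag in range(max_lag + 1)])

# Sketch of the Figure 2 comparison, reusing the hypothetical helpers defined above:
# A, Q = np.array([[0.97]]), np.array([[0.1]])        # d = 1 latent dynamics
# C = np.random.default_rng(0).normal(size=(3, 1))    # n = 3 random observation matrix
# R = 2.0 * np.eye(3)                                 # strong observation noise
# _, Y_lds = simulate_lds(A, C, Q, R, Sigma0=Q / (1 - 0.97**2), T=5000)
# J, P = lds_to_rnn(A, C, Q, R)
# Y_rnn = simulate_rnn(J, P, Vy0=C @ (Q / (1 - 0.97**2)) @ C.T + R, T=5000)
# rho_lds, rho_rnn = mean_autocorrelation(Y_lds), mean_autocorrelation(Y_rnn)
```

The two estimates agree at lags 0 and ±1 by construction and deviate at longer lags, as in Figure 2a.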
4 Mapping Low-Rank Linear RNNs onto Latent LDS Models
We now turn to the reverse question: Under what conditions can a low-rank
linear RNN be expressed as a latent LDS model? We start with an intuitive
mapping for the deterministic case (when noise covariance P = 0) and then
extend it to a more general mapping valid in the presence of noise.
We first consider a deterministic linear low-rank RNN obeying
y_{t+1} = M N⊤ y_t.   (4.1)
Since M is an n × r matrix, it is immediately apparent that for all t, yt is
confined to a linear subspace of dimension r, spanned by the columns of
M. Hence, we can define the r-dimensional latent state as

x_t = M^+ y_t,   (4.2)

where M^+ is the pseudoinverse of M defined as M^+ = (M⊤M)^{−1}M⊤ (well
defined since M is of rank r), so that we retrieve yt as

y_t = M x_t.   (4.3)

We then obtain a recurrence equation for the latent state:

x_{t+1} = M^+ y_{t+1} = M^+ M N⊤ y_t = N⊤ M x_t := A x_t,   (4.4)
which with A = N⊤M describes the dynamics of a latent LDS with d = r. A
key insight from equation 4.4 is that the overlap between the columns of N
and M determines the part of the activity that is integrated by the recurrent
dynamics (Mastrogiuseppe & Ostojic, 2018; Schuessler et al., 2020; Beiran
et al., 2021; Dubreuil et al., 2022).
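In code, the deterministic mapping of equations 4.2 to 4.4 is a few lines. The sketch below is our illustration under the same assumption (M of full column rank), not code from the letter.

```python
import numpy as np

def deterministic_rnn_to_lds(M, N):
    """Map a noiseless low-rank RNN y_{t+1} = M N^T y_t to latent dynamics x_{t+1} = A x_t.

    The latent state is x_t = M^+ y_t and the observations are recovered as y_t = M x_t
    (equations 4.2-4.4); A = N^T M captures the overlap between the columns of N and M.
    """
    M_pinv = np.linalg.pinv(M)   # equals (M^T M)^{-1} M^T when M has full column rank
    A = N.T @ M                  # latent recurrence matrix, shape r x r
    C = M                        # observation matrix of the equivalent (noiseless) LDS
    return A, C, M_pinv
```

For a unit-rank network, for instance, A reduces to the scalar overlap N⊤M.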
In the presence of noise εt, yt is no longer confined to the column space of
M. Part of this noise is integrated into the recurrent dynamics and can con-
tribute to the activity across many time steps. This integration of noise can
occur in an LDS at the level of latent dynamics through wt, but not at the
level of observation noise vt, which is independent across time steps. As
noted above, recurrent dynamics only integrate the activity present in the
column space of N. In the presence of noise, this part of state space there-
fore needs to be included into the latent variables. More important, a similar
observation can be made about external inputs when they are added to the
RNN dynamics (see appendix D).
A full mapping from a noisy low-rank RNN to an LDS model can there-
fore be built by extending the latent space to the linear subspace F of Rn
spanned by the columns of M and N (see appendix C), which has dimen-
sion d with r ≤ d ≤ 2r. Let C be a matrix whose columns form an orthogonal
basis for this subspace (which can be obtained via the Gram-Schmidt algo-
rithm). In that case, we can define the latent vector as
x_t = C⊤ y_t,   (4.5)

and the latent dynamics are given by

x_{t+1} = A x_t + w_t,   (4.6)

where the recurrence matrix is A = C⊤JC, and the latent dynamics noise is
wt ∼ N(0, Q) with Q = C⊤PC. Introducing vt = yt − Cxt, under a specific
condition on the noise covariance P, we obtain a normal random variable
independent of the other sources of noise in the process (appendix C), so
that yt can be described as a noisy observation of the latent state xt as in the
LDS model:

y_t = C x_t + v_t.   (4.7)
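A possible implementation of this construction is sketched below under the stated assumptions; the orthonormal basis is obtained via an SVD (a numerically stable stand-in for Gram-Schmidt), and the function name is ours.

```python
import numpy as np

def rnn_to_lds(M, N, P, tol=1e-10):
    """Map a noisy low-rank RNN y_{t+1} = M N^T y_t + eps_t, eps_t ~ N(0, P),
    onto a latent LDS x_{t+1} = A x_t + w_t, y_t = C x_t + v_t (equations 4.5-4.7)."""
    # Orthonormal basis C of the subspace F spanned by the columns of M and N
    # (dimension d with r <= d <= 2r).
    B = np.hstack([M, N])
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    C = U[:, s > tol * s.max()]

    J = M @ N.T
    A = C.T @ J @ C                  # latent recurrence matrix (d x d)
    Q = C.T @ P @ C                  # latent dynamics noise covariance
    R = P - C @ C.T @ P @ C @ C.T    # marginal covariance of v_t = (I - CC^T) eps_{t-1}
    return A, C, Q, R
```

As stated in appendix C, vt and wt are independent only when P has its eigenvectors aligned with or orthogonal to F; otherwise the mapping still holds but with correlated noises.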
4.1 Subsampled RNNs. Experimental recordings typically access only
the activity of a small fraction of neurons in the local network. An important
question for interpreting neural data concerns the statistics of activity when
only a random subset of k neurons in an RNN is observed. This situation
can be formalized by introducing the set of observed activities ot:
y_{t+1} = J y_t + ε_t,
o_t = y_t[:k] = D y_t.   (4.8)
Here [: k] symbolizes the selection of the first k values of a vector and D is
the corresponding projection matrix on the subspace spanned by the first k
Figure 3: Mean autocorrelation of k neurons subsampled from an n-dimensional
rank-one RNN, compared with a k-dimensional RNN built to match the first-
order marginals of partial observations. Formally, we first built an LDS equiv-
alent to the partial observations as in equation 4.9, and then the corresponding
RNN as in section 3.2. The rank-one RNN contains n = 20 neurons, of which
only k = 3 are observed. The mismatch occurs because the long-term correla-
tions present in the partial observations are caused by the larger size of the orig-
inal RNN with 20 neurons and cannot be reproduced by an RNN with only 3
neurons.
neurons. The system described by equation 4.8 is exactly an LDS but with
latent state yt and observations ot. In contrast to the regime considered in
the previous sections, the latents have a higher dimensionality than obser-
vations. However, assuming as before that J is low-rank, this model can be
mapped onto an equivalent LDS following the steps in appendix C:
x_{t+1} = A x_t + w_t,
o_t = D C x_t + D v_t.   (4.9)
This LDS is equivalent to equation 4.8, but with latent dynamics xt of di-
mension r ≤ d ≤ 2r, where r is the rank of J. The dynamics of the latent
state xt are identical to those of the fully observed low-rank RNN (see equa-
tion 4.6), but the observations are generated from a subsampled observation
matrix DC.
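Concretely, the subsampled LDS of equation 4.9 can be assembled from the full-network mapping; the sketch below takes the latent description (A, C, Q, R) returned by the hypothetical rnn_to_lds above and simply restricts the observation model to the first k rows.

```python
import numpy as np

def subsample_lds(A, C, Q, R, k):
    """Equation 4.9: LDS for the first k observed units of a low-rank RNN,
    given the full-network latent description (A, C, Q, R) of equations 4.5-4.7."""
    C_obs = C[:k, :]      # D C: keep the rows of C corresponding to observed neurons
    R_obs = R[:k, :k]     # D R D^T: observation-noise covariance among observed units
    return A, C_obs, Q, R_obs

# Example (assuming the rnn_to_lds sketch above):
# A, C, Q, R = rnn_to_lds(M, N, P)
# A_sub, C_sub, Q_sub, R_sub = subsample_lds(A, C, Q, R, k=3)
```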
Interestingly, this mapping highlights the fact that the activity statistics
of the k subsampled neurons are in general not Markovian, in contrast to
the full activity yt of the n neurons in the underlying RNN. In particular,
for that reason, the statistics of ot cannot be exactly reproduced by a smaller
RNN consisting of k units (see Figure 3). Remarkably, when considering the
subsampled activity of an RNN, a latent LDS is therefore a more accurate
model than a smaller RNN containing only the observed units.
5 Discussion
In this letter, we have examined the relationship between two simple yet
powerful classes of models of low-dimensional activity: latent linear dy-
namical systems (LDS) and low-rank linear recurrent neural networks
(RNN). We have focused on these tractable linear models with additive
gaussian noise to highlight their mathematical similarities and differences.
Although both models induce a jointly gaussian distribution over neural
population activity, generic latent LDS models can exhibit long-range, non-
Markovian temporal dependencies that cannot be captured by low-rank
linear RNNs, which describe neural population activity with a first-order
Markov process. Conversely, we showed that generic low-rank linear RNNs
can be captured by an equivalent latent LDS model. However, we have
shown that the two classes of models are effectively equivalent in limit cases
of practical interest for neuroscience, in particular when the number of sam-
pled neurons is much higher than the latent dimensionality.
Although these two model classes can generate similar sets of neural
trajectories, different approaches are typically used for fitting them to neu-
ral data: parameters of LDS models are in general inferred by variants of
the expectation-maximization algorithm (Yu et al., 2005; Pachitariu et al.,
2013; Nonnenmacher, Turaga, & Macke, 2017; Durstewitz, 2017), which
include the Kalman smoothing equations (Roweis & Ghahramani, 1999),
while RNNs are often fitted with variants of linear regression (Rajan et al.,
2016; Eliasmith & Anderson, 2003; Pollock & Jazayeri, 2020; Bondanelli
et al., 2021) or backpropagation through time (Dubreuil et al., 2022). The re-
lationship uncovered here therefore opens the door to comparing different
fitting approaches more directly, and in particular to developing probabilis-
tic methods for inferring RNN parameters from data.
We have considered here only linear RNN and latent LDS models.
Nonlinear low-rank RNNs without noise can be directly reduced to
nonlinear latent dynamics with linear observations following the same
mapping as in section 4 (Mastrogiuseppe & Ostojic, 2018; Schuessler et al.,
2020; Beiran et al., 2021; Dubreuil et al., 2022) and therefore define a natural
class of nonlinear LDS models. A variety of other nonlinear generalizations
of LDS models have been considered in the literature. One line of work
has examined linear latent dynamics with a nonlinear observation model
(Yu et al., 2005) or nonlinear latent dynamics (Yu et al., 2005; Durstewitz,
2017; Duncker et al., 2019; Pandarinath et al., 2018; Kim et al., 2008). An-
other line of work has focused on switching LDS models (Linderman et al.,
2017; Glaser et al., 2020) for which the system undergoes different linear dy-
namics depending on a hidden discrete state, thus combining elements of
latent LDS and hidden Markov models. Both nonlinear low-rank RNNs and
switching LDS models are universal approximators of low-dimensional dy-
namical systems (Funahashi & Nakamura, 1993; Chow & Li, 2000; Beiran
et al., 2021). Relating switching LDS models to local linear approximations
of nonlinear low-rank RNNs (Beiran et al., 2021; Dubreuil et al., 2022) is
therefore an interesting avenue for future investigations.
Appendix A: Kalman Filtering Equations
We reproduce in this appendix the recurrence equations followed by the
conditional distributions in equation 3.1 for both the latent LDS and the
linear RNN models.
For the latent LDS model, the conditional distributions are gaussians,
and their form is given by the Kalman filter equations (Kalman, 1960; Yu
et al., 2004; Welling, 2010). Following Yu et al. (2004), we observe that for
any two time steps τ ≤ t, the conditional distributions P(yt+1|yτ, . . . , y0) and
P(xt+1|yτ, . . . , y0) are gaussian, and we introduce the notations
P(y_t|y_τ, . . . , y_0) := N(ŷ_t^τ, W_t^τ),   (A.1)
P(x_t|y_τ, . . . , y_0) := N(x̂_t^τ, V_t^τ).   (A.2)
In particular, we are interested in expressing ŷ_{t+1}^t and x̂_{t+1}^t, which are the
predicted future observation and latent state, but also in x̂_t^t, which represents
the latent state inferred from the history of observations until time step t
included. To lighten notations, in the main text, we remove the exponent
when it has one time step difference with the index, by writing x̂_{t+1}, ŷ_{t+1},
W_{t+1} and V_{t+1} instead of, respectively, x̂_{t+1}^t, ŷ_{t+1}^t, W_{t+1}^t and V_{t+1}^t.
First, note that we have the natural relationships
x̂_{t+1}^t = A x̂_t^t,   (A.3)
ŷ_{t+1}^t = C x̂_{t+1}^t,   (A.4)
V_{t+1}^t = A V_t^t A⊤ + Q,   (A.5)
W_{t+1}^t = C V_{t+1}^t C⊤ + R,   (A.6)
so that it is sufficient to find expressions for x̂_t^t and V_t^t. After calculations
detailed in Yu et al. (2004) or Welling (2010), we obtain
x̂_t^t = x̂_t^{t−1} + K_t (y_t − C x̂_t^{t−1}),   (A.7)
V_t^t = (I − K_t C) V_t^{t−1},   (A.8)

where Kt is the Kalman gain given by

K_t = V_t^{t−1} C⊤ (C V_t^{t−1} C⊤ + R)^{−1}.   (A.9)
These equations form a closed recurrent system, as can be seen by com-
bining equations A.3 and A.7 and equations A.5 and A.8 to obtain a self-
consistent set of recurrence equations for the predicted latent state and
its variance:
x̂_{t+1}^t = A(x̂_t^{t−1} + K_t (y_t − C x̂_t^{t−1})),   (A.10)
V_{t+1}^t = A(I − K_t C) V_t^{t−1} A⊤ + Q
         = A(I − V_t^{t−1} C⊤ (C V_t^{t−1} C⊤ + R)^{−1} C) V_t^{t−1} A⊤ + Q.   (A.11)
From equation A.10, we see that the predicted state at time t + 1, and
thus the predicted observation, depends on observations at time steps τ ≤
t − 1 through the term ˆxt, making the system non-Markovian. Also note
that equations for the variances do not involve any of the observations yt,
showing these are exact values and not estimations.
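For reference, a compact sketch of this predictive recursion (equations A.9 to A.11) is given below, in the same assumed NumPy setting as the earlier snippets; it returns the one-step-ahead predicted latent means and their covariances.

```python
import numpy as np

def kalman_predict(Y, A, C, Q, R, Sigma0):
    """One-step-ahead Kalman prediction for the LDS (equations A.9-A.11).

    Y has shape (T, n); returns predicted latent means xhat[t] = E[x_t | y_0..y_{t-1}]
    and their covariances V[t], from which predicted observations are C @ xhat[t].
    """
    T, n = Y.shape
    d = A.shape[0]
    xhat = np.zeros((T + 1, d))
    V = np.zeros((T + 1, d, d))
    V[0] = Sigma0                                                # prior covariance of x_0
    for t in range(T):
        K = V[t] @ C.T @ np.linalg.inv(C @ V[t] @ C.T + R)       # Kalman gain (A.9)
        xhat[t + 1] = A @ (xhat[t] + K @ (Y[t] - C @ xhat[t]))   # predicted mean (A.10)
        V[t + 1] = A @ (np.eye(d) - K @ C) @ V[t] @ A.T + Q      # predicted covariance (A.11)
    return xhat, V
```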
This derivation, however, is not valid in the limit case R = 0, since Kt
is then undefined. In that case, however, we can observe that yt lies in the
linear subspace spanned by the columns of C, so that one can simply replace
equation A.7 by
x̂_t^t = C^+ y_t = x_t,   (A.12)

where C^+ = (C⊤C)^{−1}C⊤ is the pseudoinverse of C. Since this equation is
deterministic, the variance of the estimated latent state is equal to 0, so that
equation A.8 becomes V_t^t = 0. This case can be encompassed by equations
A.3 to A.8 if we rewrite the Kalman gain as

K_t = C^+ = (C⊤C)^{−1}C⊤.   (A.13)
Finally, for the linear RNN, the conditional distribution of equation A.1
is directly given by
P(y_{t+1}|y_t, . . . , y_0) = N(J y_t, P),   (A.14)
which shows that the predicted observation depends only on the last one,
making the system Markovian.
Appendix B: Equivalence in the Large Network Limit
Here we make the assumption that the coefficients of the observation matrix
are generated randomly and independently. We show that in the limit of
large n with d fixed, one obtains KtC → I so that the LDS is asymptotically
Markovian and can therefore be exactly mapped to an RNN.
We start by considering a latent LDS whose conditional distributions
obey equations A.1 to A.11, with the Kalman gain obeying equation A.9.
To simplify equation A.9, we focus on the steady state where variance Vt
has reached its stationary limit V in equation A.11.
Without loss of generality, we reparameterize the LDS by applying a
change of basis to the latent states such that V = I. We also apply a change of
basis to the observation space such that R = I in the new basis (this transfor-
mation does not have an impact on the conditional dependencies between
the yt at different time steps, and it can also be shown that it cancels out in
the expression KtC). Equation A.9 then becomes
K_tC = C⊤(I + CC⊤)^{−1}C.   (B.1)
Applying the matrix inversion lemma gives (I + CC⊤)^{−1} = I − C(I +
C⊤C)^{−1}C⊤, from which we get

K_tC = C⊤C − C⊤C(I + C⊤C)^{−1}C⊤C.
Using a Taylor expansion, we then write

(I + C⊤C)^{−1} = (I + (C⊤C)^{−1})^{−1}(C⊤C)^{−1}
             = ∑_{k=0}^{∞} (−(C⊤C)^{−1})^k (C⊤C)^{−1}
             ≈ (C⊤C)^{−1} − ((C⊤C)^{−1})^2 + ((C⊤C)^{−1})^3,

which gives

K_tC ≈ C⊤C − C⊤C(C⊤C)^{−1}C⊤C + C⊤C((C⊤C)^{−1})^2C⊤C − C⊤C((C⊤C)^{−1})^3C⊤C
     ≈ C⊤C − C⊤C + I − (C⊤C)^{−1}.
Assuming the coefficients of the observation matrix are independent
and identically distributed (i.i.d.) with zero mean and unit variance, for n
large, we obtain C⊤C = nI + O(√n) from the central limit theorem, so that
(C⊤C)^{−1} = O(1/n) (which can again be proven with a Taylor expansion).
This finally leads to KtC = I + O(1/n).
An alternative proof takes advantage of the spectral theorem applied to
C⊤C. Indeed, since it is a symmetric matrix, it can be decomposed as C⊤C =
UDU⊤, where U is an orthonormal matrix and D the diagonal matrix of
eigenvalues. Starting from equation B.1, we derive
K_tC = C⊤C − C⊤C(I + C⊤C)^{−1}C⊤C
     = UDU⊤ − UDU⊤(I + UDU⊤)^{−1}UDU⊤
     = UDU⊤ − UDU⊤(U(D + I)U⊤)^{−1}UDU⊤
     = UDU⊤ − UDU⊤U(D + I)^{−1}U⊤UDU⊤
     = UDU⊤ − UD^2(D + I)^{−1}U⊤
     = U(D − D^2/(D + I))U⊤
     = U(D/(D + I))U⊤
     = U(I − I/(D + I))U⊤
     = I − U(I/(D + I))U⊤.
Assuming as before that the coefficients of C are i.i.d. gaussian with zero
mean and unit variance, C⊤C is then the empirical covariance of i.i.d. sam-
ples of a gaussian ensemble with identity matrix covariance. The matrix
C⊤C = UDU⊤ then follows the (I, n)-Wishart distribution, and for n large,
its eigenvalues are all greater than √n (using, e.g., the tail bounds of Wain-
wright, 2019, theorem 6.1). This shows that (I/(D + I)) = O(1/√n)I, com-
pleting the proof.
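This asymptotic statement is easy to probe numerically. The short check below (our illustration, under the same i.i.d.-gaussian assumption on C and with V = R = I as in the reparameterized setting) measures how far KtC is from the identity as n grows.

```python
import numpy as np

def ktc_deviation(n, d, seed=0):
    """Deviation of K_t C from the identity for V = I, R = I and i.i.d. gaussian C.

    In this setting K_t C = C^T (I + C C^T)^{-1} C (equation B.1); the deviation
    should shrink roughly as O(1/n) as the number of neurons n grows.
    """
    C = np.random.default_rng(seed).normal(size=(n, d))
    KtC = C.T @ np.linalg.solve(np.eye(n) + C @ C.T, C)
    return np.linalg.norm(KtC - np.eye(d))

for n in (10, 100, 1000):
    print(n, ktc_deviation(n, d=3))   # deviation decreases with n
```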
Appendix C: Derivation of the RNN to LDS Mapping
As mentioned in section 4, we consider an RNN defined by equation 2.4
with J = MN⊤, and denote by C an orthonormal matrix whose columns form a
basis of F, the linear subspace spanned by the columns of M and N. Note
that CC⊤ is an orthogonal projector onto the subspace F and that since all
columns of M and N belong to this subspace, we have CC⊤M = M and
CC⊤N = N. Hence, we have

CC⊤JCC⊤ = J.   (C.1)
We thus define the latent vector as xt = C⊤yt, and we can then write

x_{t+1} = C⊤ y_{t+1}
        = C⊤ J y_t + C⊤ ε_t
        = C⊤ CC⊤ J CC⊤ y_t + C⊤ ε_t   (by equation C.1)
        = C⊤ J C C⊤ y_t + C⊤ ε_t   (because C⊤C = I)
        = A x_t + w_t,

where we have defined the recurrence matrix A = C⊤JC and the latent dy-
namics noise wt = C⊤εt, which follows wt ∼ N(0, Q) with Q = C⊤PC.
Let us define vt = yt − Cxt = (I − CC⊤)yt. We need to determine the con-
ditions under which vt is normally distributed and independent of yt−1 and
xt. For this, we write

C x_t = C A x_{t−1} + C w_{t−1}
      = CC⊤ J C x_{t−1} + C w_{t−1}
      = CC⊤ J CC⊤ y_{t−1} + C w_{t−1}
      = J y_{t−1} + C w_{t−1},

and hence,

v_t = ε_{t−1} − C w_{t−1} = (I − CC⊤)ε_{t−1},
which is independent of yt−1 and has a marginal distribution vt ∼ N (0, R)
with R = P − CC⊤PCC⊤
but is not in general independent of wt−1. A suffi-
cient and necessary condition for the independence of wt−1 and vt is that the
RNN noise covariance P has all its eigenvectors either aligned with or or-
thogonal to the subspace F (in this case, the covariance R is degenerate and
has F as a null space, which implies that observation noise is completely
orthogonal to F). If that is not the case, the reparameterization stays valid
up to the fact that the observation noise vt and the latent dynamics noise wt
can be correlated.
Appendix D: Addition of Input Terms
Let us consider an extension of both the latent LDS and the linear RNN
models to take into account inputs. More specifically, we consider adding
to both model classes an input under the form of a time-varying signal ut
fed to the network through a constant set of input weights. In the latent LDS
model, the input is fed directly to the latent variable and equations 2.1 and
2.2 become
x_t = A x_{t−1} + B u_t + w_t,   w_t ∼ N(0, Q),   (D.1)
y_t = C x_t + v_t,   v_t ∼ N(0, R).   (D.2)

The linear RNN equation 2.4 becomes

y_t = J y_{t−1} + W_in u_t + ε_t,   ε_t ∼ N(0, P),   (D.3)
so that we will represent by B a low-dimensional input projection and W_in
a high-dimensional one.
For the LDS-to-RNN mapping, we can directly adapt the derivations of
section 3.2, which lead to
y_{t+1} | y_t ∼ N(C B u_t + J_t y_t, P_t)   (D.4)
with the same expressions for Jt and Pt, given in equations 3.11 and 3.12.
For the RNN-to-LDS mapping, assuming again that J is low-rank and
written as J = MN⊤, we can define

x_t = C⊤ y_t,

where C is a matrix whose columns form an orthonormal basis for the sub-
space F spanned by the columns of M, N, and W_in. This latent vector then
follows the dynamics

x_{t+1} = C⊤JC x_t + C⊤W_in u_t + C⊤ε_t,   (D.5)
which corresponds to equation D.1, and it is straightforward to show that
it leads to equation D.2, with the technical condition that the covariance
of εt should have its eigenvectors aligned with the subspace F to avoid
correlations between observation and recurrent noises.
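Extending the earlier rnn_to_lds sketch to handle inputs only requires adding the columns of W_in to the basis; again this is our illustration of equations D.1 and D.5, not the authors' code.

```python
import numpy as np

def rnn_to_lds_with_inputs(M, N, W_in, P, tol=1e-10):
    """Map y_t = M N^T y_{t-1} + W_in u_t + eps_t onto latent dynamics
    x_t = A x_{t-1} + B u_t + w_t with observations y_t = C x_t + v_t (appendix D)."""
    B_basis = np.hstack([M, N, W_in])                 # F is spanned by M, N, and W_in
    U, s, _ = np.linalg.svd(B_basis, full_matrices=False)
    C = U[:, s > tol * s.max()]                       # orthonormal basis of F
    J = M @ N.T
    A = C.T @ J @ C                                   # latent recurrence (equation D.5)
    B = C.T @ W_in                                    # low-dimensional input projection
    Q = C.T @ P @ C
    R = P - C @ C.T @ P @ C @ C.T
    return A, B, C, Q, R
```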
Acknowledgments
We thank both reviewers for constructive suggestions that have signifi-
cantly improved this letter. In particular, we thank Scott Linderman for
the alternative proof in appendix B. A.V. and S.O. were supported by
the program Ecoles Universitaires de Recherche (ANR-17-EURE-0017), the
CRCNS program through French Agence Nationale de la Recherche (ANR-
19-NEUC-0001-01), and the NIH BRAIN initiative (U01NS122123). J.W.P.
was supported by grants from the Simons Collaboration on the Global Brain
(SCGB AWD543027), the NIH BRAIN initiative (R01EB026946), and a visit-
ing professorship grant from the Ecole Normale Superieure.
References
Archer, E., Koster, U., Pillow, J., & Macke, J. (2014). Low-dimensional models of neu-
ral population activity in sensory cortical circuits. In Z. Ghahramani, M. Welling,
C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information
processing systems, 27 (pp. 343–351). Red Hook, NY: Curran.
Barak, O. (2017). Recurrent neural networks as versatile tools of neuroscience
research. Current Opinion in Neurobiology, 46, 1–6. 10.1016/j.conb.2017.06.003,
PubMed: 28668365
Beiran, M., Dubreuil, A., Valente, A., Mastrogiuseppe, F., & Ostojic, S. (2021). Shap-
ing dynamics with multiple populations in low-rank recurrent networks. Neural
Computation, 33, 1572–1615. 10.1162/neco_a_01381, PubMed: 34496384
Bishop, C. (2006). Pattern recognition and machine learning. Berlin: Springer.
Bondanelli, G., Deneux, T., Bathellier, B., & Ostojic, S. (2021). Network dynam-
ics underlying OFF responses in the auditory cortex. eLife, 10, e53151. 10.7554/
eLife.53151, PubMed: 33759763
Chow, T., & Li, X. (2000). Modeling of continuous time dynamical systems with input
by recurrent neural networks. IEEE Transactions on Circuits and Systems I: Funda-
mental Theory and Applications, 47, 575–578. 10.1109/81.841860
Churchland, M., Byron, M., Sahani, M., & Shenoy, K. (2007). Techniques for extract-
ing single-trial activity patterns from large-scale neural recordings. Current Opin-
ion in Neurobiology, 17, 609–618. 10.1016/j.conb.2007.11.001, PubMed: 18093826
Cohen, Z., DePasquale, B., Aoi, M., & Pillow, J. (2020). Recurrent dynamics of prefrontal
cortex during context-dependent decision-making. bioRxiv.
Cunningham, J., & Yu, B. (2014). Dimensionality reduction for large-scale neu-
ral recordings. Nature Neuroscience, 17, 1500–1509. 10.1038/nn.3776, PubMed:
25151264
Dubreuil, A., Valente, A., Beiran, M., Mastrogiuseppe, F., & Ostojic, S. (2022). The
role of population structure in computations through neural dynamics. Nature
Neuroscience, 25, 783–794. 10.1038/s41593-022-01088-4, PubMed: 35668174
Duncker, L., Bohner, G., Boussard, J., & Sahani, M. (2019). Learning interpretable
continuous-time models of latent stochastic dynamical systems. In Proceedings of
the International Conference on Machine Learning (pp. 1726–1734).
Durstewitz, D. (2017). A state space approach for piecewise-linear recurrent neu-
ral networks for identifying computational dynamics from neural measure-
ments. PLOS Computational Biology, 13, e1005542. 10.1371/journal.pcbi.1005542,
PubMed: 28574992
Eliasmith, C., & Anderson, C. (2003). Neural engineering: Computation, representation,
and dynamics in neurobiological systems. Cambridge, MA: MIT Press.
Finkelstein, A., Fontolan, L., Economo, M., Li, N., Romani, S., & Svoboda, K. (2021).
Attractor dynamics gate cortical information flow during decision-making. Na-
ture Neuroscience, 24, 843–850. 10.1038/s41593-021-00840-6, PubMed: 33875892
Funahashi, K., & Nakamura, Y. (1993). Approximation of dynamical systems by con-
tinuous time recurrent neural networks. Neural Networks, 6, 801–806. 10.1016/
S0893-6080(05)80125-X
Gallego, J., Perich, M., Miller, L, & Solla, S. (2017). Neural manifolds for the con-
trol of movement. Neuron, 94, 978–984. 10.1016/j.neuron.2017.05.025, PubMed:
28595054
Gao, P., & Ganguli, S. (2015). On simplicity and complexity in the brave new world
of large-scale neuroscience. Current Opinion in Neurobiology, 32, 148–155. 10.1016/
j.conb.2015.04.003, PubMed: 25932978
Glaser, J., Whiteway, M., Cunningham, J., Paninski, L., & Linderman, S. (2020). Re-
current switching dynamical systems models for multiple interacting neural pop-
ulations. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.),
Advances in neural information processing systems, 33 (pp. 14867–14878). Red Hook,
NY: Curran.
Hennequin, G., Vogels, T., & Gerstner, W. (2014). Optimal control of transient dynam-
ics in balanced networks supports generation of complex movements. Neuron, 82,
1394–1406. 10.1016/j.neuron.2014.04.045, PubMed: 24945778
Jazayeri, M., & Ostojic, S. (2021). Interpreting neural computations by examining in-
trinsic and embedding dimensionality of neural activity. Current Opinion in Neu-
robiology, 70, 113–120. 10.1016/j.conb.2021.08.002, PubMed: 34537579
Kalman, R. (1960). A new approach to linear filtering and prediction problems. Jour-
nal of Basic Engineering, 82, 35–45. 10.1115/1.3662552
Kao, T., Sadabadi, M., & Hennequin, G. (2021). Optimal anticipatory control as a
theory of motor preparation: A thalamo-cortical circuit model. Neuron, 109, 1567–
1581. 10.1016/j.neuron.2021.03.009, PubMed: 33789082
Kim, S., Simeral, J., Hochberg, L., Donoghue, J., & Black, M. (2008). Neural control of
computer cursor velocity by decoding motor cortical spiking activity in humans
with tetraplegia. Journal of Neural Engineering, 5, 455. 10.1088/1741-2560/5/4/010
Laje, R., & Buonomano, D. (2013). Robust timing and motor patterns by taming chaos
in recurrent neural networks. Nature Neuroscience, 16, 925–933. 10.1038/nn.3405,
PubMed: 23708144
Landau, I., & Sompolinsky, H. (2018). Coherent chaos in a recurrent neural network
with structured connectivity. PLOS Computational Biology, 14, e1006309. 10.1371/
journal.pcbi.1006309, PubMed: 30543634
Landau, I., & Sompolinsky, H. (2021). Macroscopic fluctuations emerge in balanced
networks with incomplete recurrent alignment. Phys. Rev. Research., 3, 023171.
10.1103/PhysRevResearch.3.023171
Linderman, S., Johnson, M., Miller, A., Adams, R., Blei, D., & Paninski, L. (2017).
Bayesian learning and inference in recurrent switching linear dynamical systems.
In Proceedings of the 20th International Conference on Artificial Intelligence and Statis-
tics (pp. 914–922).
Macke, J., Buesing, L., Cunningham, J., Byron, M., Shenoy, K., & Sahani, M. (2011).
Empirical models of spiking in neural populations. In S. Solla, T. Leen, & K. R.
Müller (Eds.), Advances in neural information processing systems (pp. 1350–1358).
Cambridge, MA: MIT Press.
Mante, V., Sussillo, D., Shenoy, K., & Newsome, W. (2013). Context-dependent com-
putation by recurrent dynamics in prefrontal cortex. Nature, 503, 78–84. 10.1038/
nature12742, PubMed: 24201281
Mastrogiuseppe, F., & Ostojic, S. (2018). Linking connectivity, dynamics, and com-
putations in low-rank recurrent neural networks. Neuron, 99, 609–623. 10.1016/
j.neuron.2018.07.003, PubMed: 30057201
Nonnenmacher, M., Turaga, S., & Macke, J. (2017). Extracting low-dimensional dy-
namics from multiple large-scale neural population recordings by learning to pre-
dict correlations. In I. Guyon, Y. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S.
Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing sys-
tems, 30. Red Hook, NY: Curran.
Pachitariu, M., Petreska, B., & Sahani, M. (2013), Recurrent linear models of
simultaneously-recorded neural populations. In C. J. C. Burges, L. Bottou, M.
Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information
processing systems, 26 (pp. 3138–3146). Red Hook, NY: Curran.
Pandarinath, C., O’Shea, D., Collins, J., Jozefowicz, R., Stavisky, S., Kao, J., . . .
Sussillo, D. (2018). Inferring single-trial neural population dynamics using se-
quential auto-encoders. Nature Methods, 15, 805–815. 10.1038/s41592-018-0109-9,
PubMed: 30224673
Pereira, U., & Brunel, N. (2018). Attractor dynamics in networks with learning rules
inferred from in vivo data. Neuron, 99, 227–238. 10.1016/j.neuron.2018.05.038,
PubMed: 29909997
Perich, M., Arlt, C., Soares, S., Young, M., Mosher, C., Minxha, J., . . . Rajan, K. (2021).
Inferring brain-wide interactions using data-constrained recurrent neural network mod-
els. bioRxiv:2020-12.
Petreska, B., Byron, M., Cunningham, J., Santhanam, G., Ryu, S., Shenoy, K., & Sa-
hani, M. (2011). Dynamical segmentation of single trials from population neu-
ral data. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Q. Weinberger
(Eds.), Advances in neural information processing systems, 24 (pp. 756–764). Red
Hook, NY: Curran.
Pollock, E., & Jazayeri, M. (2020). Engineering recurrent neural networks from task-
relevant manifolds and dynamics. PLOS Computational Biology, 16, e1008128.
10.1371/journal.pcbi.1008128
Rajan, K., Harvey, C., & Tank, D. (2016). Recurrent network models of sequence
generation and memory. Neuron, 90, 128–142. 10.1016/j.neuron.2016.02.009,
PubMed: 26971945
Roweis, S., & Ghahramani, Z. (1999). A unifying review of linear gaussian models.
Neural Computation, 11, 305–345. 10.1162/089976699300016674, PubMed: 9950734
Saxena, S., & Cunningham, J. (2019). Towards the neural population doctrine. Cur-
rent Opinion in Neurobiology, 55, 103–111. 10.1016/j.conb.2019.02.002, PubMed:
30877963
Schuessler, F., Dubreuil, A., Mastrogiuseppe, F., Ostojic, S., & Barak, O. (2020). Dy-
namics of random recurrent networks with correlated low-rank structure. Physi-
cal Review Research, 2, 013111. 10.1103/PhysRevResearch.2.013111
Semedo, J., Zandvakili, A., Kohn, A., Machens, C., & Byron, M. (2014). Extracting
latent structure from multiple interacting neural populations. In Z. Ghahramani,
M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural
information processing systems, 27 (pp. 2942–2950). Red Hook, NY: Curran.
Smith, A., & Brown, E. (2003). Estimating a state-space model from point pro-
cess observations. Neural Computation, 15, 965–991. 10.1162/089976603765202622,
PubMed: 12803953
Sompolinsky, H., Crisanti, A., & Sommers, H. (1988). Chaos in random neural
networks. Phys. Rev. Lett., 61, 259–262. 10.1103/PhysRevLett.61.259, PubMed:
10039285
Sussillo, D. (2014). Neural circuits as computational dynamical systems. Cur-
rent Opinion in Neurobiology, 25, 156–163. 10.1016/j.conb.2014.01.008, PubMed:
24509098
Wainwright, M. (2019). High-dimensional statistics: A non-asymptotic viewpoint. Cam-
bridge: Cambridge University Press.
Welling, M. (2010). The Kalman filter (Caltech lecture note 136-93).
Yu, B., Afshar, A., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2005). Extracting
dynamical structure embedded in neural activity. In Y. Weiss, B. Schölkopf, & J.
Platt (Eds.), Advances in neural information processing systems, 18. Cambridge, MA:
MIT Press.
Yu, B., Shenoy, K., & Sahani, M. (2004). Derivation of Kalman filtering and smoothing
equations. Stanford, CA: Stanford University.
Zoltowski, D., Pillow, J., & Linderman, S. (2020). A general recurrent state space
framework for modeling neural dynamics during decision-making. In Proceed-
ings of the International Conference on Machine Learning (pp. 11680–11691).
Received October 26, 2021; accepted April 15, 2022.