LETTER
Communicated by Lea Duncker
Probing the Relationship Between Latent Linear Dynamical
Systems and Low-Rank Recurrent Neural Network Models
Adrian Valente
adrian.valente@ens.fr
Srdjan Ostojic
srdjan.ostojic@ens.fr
Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960,
Ecole Normale Superieure–PSL Research University, 75005 巴黎, 法国
Jonathan W. Pillow
pillow@princeton.edu
Princeton Neuroscience Institute, Princeton University, Princeton,
NJ 08544, U.S.A.
A large body of work has suggested that neural populations exhibit low-
dimensional dynamics during behavior. However, there are a variety of
different approaches for modeling low-dimensional neural population
activity. One approach involves latent linear dynamical system (LDS)
models, in which population activity is described by a projection of low-
dimensional latent variables with linear dynamics. A second approach
involves low-rank recurrent neural networks (RNNs), in which popula-
tion activity arises directly from a low-dimensional projection of past
activity. Although these two modeling approaches have strong similar-
ities, they arise in different contexts and tend to have different domains
of application. Here we examine the precise relationship between latent
LDS models and linear low-rank RNNs. When can one model class be
converted to the other, and vice versa? We show that latent LDS models
can only be converted to RNNs in specific limit cases, due to the non-
Markovian property of latent LDS models. Conversely, we show that lin-
ear RNNs can be mapped onto LDS models, with latent dimensionality at
most twice the rank of the RNN. A surprising consequence of our results
is that a partially observed RNN is better represented by an LDS model
than by an RNN consisting of only observed units.
1 Introduction
Recent work on large-scale neural population recordings has suggested
that neural activity is often confined to a low-dimensional space, with
fewer dimensions than the number of neurons in a population (Church-
land, Byron, Sahani, & Shenoy, 2007; Gao & Ganguli, 2015; Gallego, Perich,
Miller, & Solla, 2017; Saxena & Cunningham, 2019; Jazayeri & Ostojic,
2021). To describe this activity, modelers have at their disposal a wide ar-
ray of tools that give rise to different forms of low-dimensional activity
(Cunningham & Yu, 2014). Two classes of modeling approaches that have
generated a large following in the literature are descriptive statistical mod-
els and mechanistic models. Broadly speaking, descriptive statistical mod-
els aim to identify a probability distribution that captures the statistical
properties of an observed neural dataset, while remaining agnostic about
the mechanisms that gave rise to it. Mechanistic models, in contrast, aim
to reproduce certain characteristics of observed data using biologically
inspired mechanisms, but often with less attention to a full statistical de-
scription. Although these two classes of models often have similar mathe-
matical underpinnings, there remain a variety of important gaps between
them. Here we focus on reconciling the gaps between two simple but pow-
erful models of low-dimensional neural activity: latent linear dynamical
systems (LDS) and linear low-rank recurrent neural networks (RNNs).
The latent LDS model with gaussian noise is a popular statistical model
for low-dimensional neural activity in both systems neuroscience (Smith
& Brown, 2003; Semedo, Zandvakili, Kohn, Machens, & Byron, 2014) and
brain-machine interface settings (Kim, Simeral, Hochberg, Donoghue, &
Black, 2008). This model has a long history in electrical engineering, where
the problem of inferring latents from past observations has an analytical
solution known as the Kalman filter (Kalman, 1960). In neuroscience set-
tings, this model has been used to describe high-dimensional neural pop-
ulation activity in terms of linear projections of low-dimensional latent
变量. Although the basic form of the model includes only linear dy-
namics, recent extensions have produced state-of-the-art models for high-
dimensional spike train data (Yu et al., 2005; Petreska et al., 2011; Macke
et al., 2011; Pachitariu, Petreska, & Sahani, 2013; Archer, Koster, Pillow, &
Macke, 2014; Duncker, Bohner, Boussard, & Sahani, 2019; Zoltowski, Pillow,
& Linderman, 2020; Glaser, Whiteway, 坎宁安, Paninski, & Linder-
男人, 2020; Kim et al., 2008).
Recurrent neural networks, in contrast, have emerged as a powerful
framework for building mechanistic models of neural computations under-
lying cognitive tasks (Sussillo, 2014; Barak, 2017; Mante, Sussillo, Shenoy, &
Newsome, 2013) and have more recently been used to reproduce recorded
neural data (Rajan, Harvey, & Tank, 2016; Cohen, DePasquale, Aoi, &
Pillow, 2020; Finkelstein et al., 2021; Perich et al., 2021). While randomly
connected RNN models typically have high-dimensional activity (Som-
polinsky, Crisanti, & Sommers, 1988; Laje & Buonomano, 2013), recent work
has shown that RNNs with low-rank connectivity provide a rich theoretical
framework for modeling low-dimensional neural dynamics and the result-
ing computations (Mastrogiuseppe & Ostojic, 2018; Landau & Sompolin-
sky, 2018; Pereira & Brunel, 2018; Schuessler, Dubreuil, Mastrogiuseppe,
Ostojic, & Barak, 2020; Beiran, Dubreuil, Valente, Mastrogiuseppe, &
Ostojic, 2021; Dubreuil, Valente, Beiran, Mastrogiuseppe, & Ostojic, 2022;
Bondanelli, Deneux, Bathellier, & Ostojic, 2021; Landau & Sompolinsky,
2021). In these low-rank RNNs, the structure of low-dimensional dynam-
ics bears direct commonalities with latent LDS models, yet the precise
relationship between the two classes of models remains to be clarified.
Understanding this relationship would open the door to applying to low-
rank RNNs probabilistic inference techniques developed for LDS models
and conversely could provide mechanistic interpretations of latent LDS
models fitted to data.
In this letter, we examine the mathematical relationship between latent
LDS and low-rank RNN models. We focus on linear RNNs, which are less
expressive but simpler to analyze than their nonlinear counterparts while
still leading to rich dynamics (Hennequin, Vogels, & Gerstner, 2014; Kao,
Sadabadi, & Hennequin, 2021; Bondanelli et al., 2021). We show that even if
both LDS models and linear low-rank RNNs produce gaussian distributed
activity patterns with low-dimensional linear dynamics, the two model
classes have different statistical structures and are therefore not in general
equivalent. More specifically, in latent LDS models, the output sequence has
non-Markovian statistics, meaning that the activity in a single time step is
not independent of its history given the activity on the previous time step.
This stands in contrast to linear RNNs, which are Markovian regardless
of the rank of their connectivity. A linear low-rank RNN can nevertheless
provide a first-order approximation to the distribution over neural activity
generated by a latent LDS model, and we show that this approximation be-
comes exact in several cases of interest, and in particular, in the limit where
the number of neurons is large compared to the latent dimensionality. Con-
versely, we show that any linear low-rank RNN can be converted to a latent
LDS, although the dimensionality of the latent space depends on the over-
lap between the subspaces spanned by left and right singular vectors of
the RNN connectivity matrix and may be as high as twice the rank of this
矩阵. The two model classes are thus closely related, with linear low-rank
RNNs comprising a subset of the broader class of latent LDS models. An in-
teresting implication of our analyses is that the activity of an RNN in which
only a subset of neurons are observed is better fit by a latent LDS model
than by an RNN consisting only of observed units.
2 Modeling Frameworks
We start with a formal description of the two model classes in question,
both of which describe the time-varying activity of a population of n
神经元.
2.1 Latent LDS Model. The latent linear dynamical system (LDS)
model, also known as a linear gaussian state-space model, describes neural
population activity as a noisy linear projection of a low-dimensional latent
Figure 1: (A) Schematic representation of the latent linear dynamical system
model, as defined by equations 2.1 to 2.3. (B) Schematic representation of the
low-rank linear RNN, as defined by equations 2.4 and 2.5.
variable governed by linear dynamics with gaussian noise (Kalman, 1960;
Roweis & Ghahramani, 1999; see Figure 1A). The model is characterized by
the following equations:
x_{t+1} = A x_t + w_t,   w_t ∼ N(0, Q),   (2.1)
y_t = C x_t + v_t,   v_t ∼ N(0, R).   (2.2)
这里, xt is a d-dimensional latent (or “unobserved”) vector that follows
discrete-time linear dynamics specified by a d × d matrix A and is cor-
rupted on each time step by a zero-mean gaussian noise vector wt ∈ Rd with
covariance Q. The vector of neural activity yt arises from a linear trans-
formation of xt via the n × d observation (or “emissions”) matrix C, cor-
rupted by zero-mean gaussian noise vector vt ∈ Rn with covariance R.
Generally we assume d < n, so that the high-dimensional observations yt
are explained by the lower-dimensional dynamics of the latent vector xt. For
clarity, in the main text, we focus on LDS models without external inputs
and study their effect in appendix D.
The complete model also contains a specification of the distribution of
the initial latent vector x0, which is commonly assumed to have a zero-
mean gaussian distribution with covariance Σ_0:

x_0 ∼ N(0, Σ_0).   (2.3)

The complete parameters of the model are thus θ_LDS = {A, C, Q, R, Σ_0}.
Note that this parameterization of an LDS is not unique: any invertible
linear transformation of the latent space leads to an equivalent model if the
appropriate transformations are applied to matrices A, C, Q, and Σ_0.
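As a concrete illustration of equations 2.1 to 2.3, the following sketch samples trajectories from a latent LDS. It is not the authors' code: it is a minimal NumPy implementation in which all parameter values are arbitrary placeholders.

```python
import numpy as np

def simulate_lds(A, C, Q, R, Sigma0, T, seed=0):
    """Sample a trajectory (x_0..x_T, y_0..y_T) from the latent LDS of equations 2.1-2.3."""
    rng = np.random.default_rng(seed)
    d, n = A.shape[0], C.shape[0]
    x = rng.multivariate_normal(np.zeros(d), Sigma0)          # x_0 ~ N(0, Sigma_0)  (eq. 2.3)
    xs, ys = [], []
    for _ in range(T + 1):
        y = C @ x + rng.multivariate_normal(np.zeros(n), R)   # y_t = C x_t + v_t    (eq. 2.2)
        xs.append(x)
        ys.append(y)
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)   # x_{t+1} = A x_t + w_t (eq. 2.1)
    return np.array(xs), np.array(ys)

# Arbitrary example: d = 2 latent dimensions, n = 50 neurons.
d, n = 2, 50
A = 0.95 * np.eye(d)
C = np.random.default_rng(1).normal(size=(n, d))
X, Y = simulate_lds(A, C, Q=0.1 * np.eye(d), R=np.eye(n), Sigma0=np.eye(d), T=200)
```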
2.2 Low-Rank Linear RNN. A linear RNN, also known as an autore-
gressive (AR) model, represents observed neural activity as a noisy linear
projection of the activity at the previous time step. We can write the model
as (see Figure 1B)
y_{t+1} = J y_t + ε_t,   ε_t ∼ N(0, P),   (2.4)

where J is an n × n recurrent weight matrix and εt ∈ R^n is a gaussian noise
vector with mean zero and covariance P. Moreover, we assume that the ini-
tial condition is drawn from a zero-mean distribution with covariance V_0^y:

y_0 ∼ N(0, V_0^y).   (2.5)
A low-rank RNN model is obtained by constraining the rank of the re-
current weight matrix J to be r ≪ n. In this case, the recurrence matrix can
be factorized as

J = M N⊤,   (2.6)
where M and N are both n × r matrices of rank r.
Note that this factorization is not unique, but a particular factorization
can be obtained from a low-rank J matrix using the truncated singular value
decomposition: J = USV⊤, where U and V are semiorthogonal n × r matri-
ces of left and right singular vectors, respectively, and S is an r × r diagonal
matrix containing the largest singular values. We can then set M = U and
N = VS, so that N⊤ = SV⊤.
The model parameters of the low-rank linear RNN are therefore given
by θ_RNN = {M, N, P, V_0^y}.
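To make the factorization concrete, here is a minimal sketch (ours, not taken from the letter) that extracts M and N from a given low-rank connectivity matrix via the truncated SVD and simulates the RNN of equations 2.4 and 2.5; all dimensions and noise levels are illustrative assumptions.

```python
import numpy as np

def factorize_low_rank(J, r):
    """Truncated SVD factorization J = M N^T, with M = U and N = V S (equation 2.6)."""
    U, s, Vt = np.linalg.svd(J)
    M = U[:, :r]              # semiorthogonal n x r matrix of left singular vectors
    N = Vt[:r].T * s[:r]      # N = V S, so that M @ N.T reproduces the rank-r part of J
    return M, N

def simulate_rnn(J, P, Vy0, T, seed=0):
    """Sample a trajectory from the linear RNN y_{t+1} = J y_t + eps_t (equations 2.4-2.5)."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    y = rng.multivariate_normal(np.zeros(n), Vy0)
    ys = [y]
    for _ in range(T):
        y = J @ y + rng.multivariate_normal(np.zeros(n), P)
        ys.append(y)
    return np.array(ys)

# Illustrative rank-one example with n = 20 neurons.
rng = np.random.default_rng(2)
m, nvec = rng.normal(size=(20, 1)), rng.normal(size=(20, 1))
J = 0.04 * m @ nvec.T                      # rank-1 connectivity, scaled for stability
M, N = factorize_low_rank(J, r=1)
Y = simulate_rnn(J, P=0.1 * np.eye(20), Vy0=np.eye(20), T=500)
```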
2.3 Comparing the Two Models. Both models described above ex-
hibit low-dimensional dynamics embedded in a high-dimensional obser-
vation space. In the following, we examine the probability distributions
P(y1, . . . , yT) over time series (y1, . . . , yT) generated by the two models.
We show that in general, the two models give rise to different distribu-
tions, such that the family of probability distributions generated by the LDS
model cannot all be captured with low-rank linear RNNs. Specifically, RNN
models are constrained to purely Markovian distributions, which is not the
case for LDS models. However, the two model classes can be shown to be
equivalent when the observations yt contain exact information about the
latent state xt, which is in particular the case if the observation noise is or-
thogonal to the latent subspace or in the limit of a large number of neurons
n ≫ d. Conversely, a low-rank linear RNN can in general be mapped to a
latent LDS with a dimensionality of the latent state at most twice the rank
of the RNN.
3 Mapping from LDS Models to Linear Low-Rank RNNs
3.1 Nonequivalence in the General Case. Let us consider a latent LDS
described by equations 2.1 to 2.3 and a low-rank linear RNN defined by
equations 2.4 and 2.5. We start by comparing the properties of the joint dis-
tribution P(y0, . . . , yT) for any value of T for the two models. For both mod-
els, the joint distribution can be factored under the form

P(y_0, . . . , y_T) = P(y_0) ∏_{t=1}^{T} P(y_t | y_{t−1}, . . . , y_0),   (3.1)
where each term in the product is the distribution of neural population ac-
tivity at a single time point given all previous activity (see appendix A for
details). More specifically, each of the conditional distributions in equation
3.1 is gaussian, and for the LDS, we can parameterize these distributions as
P(x_t | y_{t−1}, . . . , y_0) := N(x̂_t, V_t),   (3.2)
P(y_t | y_{t−1}, . . . , y_0) = N(C x̂_t, C V_t C⊤ + R),   (3.3)
where ˆxt is the mean of the conditional distribution over the latent at time
step t, given observations until time step t − 1. It obeys the recurrence
equation
x̂_{t+1} = A(x̂_t + K_t (y_t − C x̂_t)),   (3.4)

where Kt is the Kalman gain given by

K_t = V_t C⊤ (C V_t C⊤ + R)^{−1},   (3.5)
and Vt represents a covariance matrix, which is independent of the obser-
vations and follows a recurrence equation detailed in appendix A.
Iterating equation 3.4 over multiple time steps, one can see that x̂t+1 de-
pends not only on the last observation yt but on the full history of obser-
vations (y0, . . . , yt), which therefore affects the distribution at any given
time step. The process (y0, . . . , yt) generated by the LDS model is hence
non-Markovian.
Conversely, for the linear RNN, the observations (y0, . . . , yt) instead do
form a Markov process, meaning that observations are conditionally inde-
pendent of their history given the activity from the previous time step:

P(y_t | y_{t−1}, . . . , y_0) = P(y_t | y_{t−1}).   (3.6)
Figure 2: Mean autocorrelation of observations yt from latent LDS processes
compared with their first-order RNN approximations. The latent space is one-
dimensional (d = 1), and the dimension n of the observation space is increased
from left to right: (a) n = 3, (b) n = 20, (c) n = 100. The parameters of the latent
state processes are fixed scalars (A = (0.97), Q = (0.1)), while the elements of
the observation matrices C are drawn randomly and independently from a cen-
tered gaussian distribution of variance 1. The observation noise has covariance
R = σ_v^2 I_n with σ_v^2 = 2. Note that we have chosen observation noise to largely
dominate over latent state noise in order to obtain a large difference between
models at low n. Dots and shaded areas indicate, respectively, mean and stan-
dard deviation of different estimations of the mean autocorrelation done on 10
independent folds of 100 trials each (where C was identical across trials).
The fact that this property does not in general hold for the latent LDS shows
that the two model classes are not equivalent. Due to this fundamental con-
straint, the RNN can only approximate the complex distribution (see equa-
tion 3.1) parameterized by an LDS, as detailed in the following section and
illustrated in Figure 2.
3.2 Matching the First-Order Marginals of an LDS Model. We can
obtain a Markovian approximation of the LDS-generated sequence of ob-
servations (y0, . . . , yt) by deriving the conditional distribution P(yt+1 | yt)
under the LDS model and matching it with a low-rank RNN (Pachitariu
et al., 2013). This type of first-order approximation will preserve exactly the
one-time-step-difference marginal distributions P(yt+1, yt), although struc-
ture across longer timescales might not be captured correctly.
First, we note that we can express both yt and yt+1 as noisy linear pro-
jections of xt:
y_t = C x_t + v_t,   (3.7)
y_{t+1} = C(A x_t + w_t) + v_{t+1},   (3.8)
which follows from equation 2.1.
Let N(0, Σt) denote the gaussian marginal distribution over the la-
tent vector xt at time t. Then we can use standard identities for linear
transformations of gaussian variables to derive the joint distribution over
yt and yt+1:
\begin{pmatrix} y_t \\ y_{t+1} \end{pmatrix} ∼ N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} C Σ_t C⊤ + R & C Σ_t A⊤ C⊤ \\ C A Σ_t C⊤ & C(A Σ_t A⊤ + Q) C⊤ + R \end{pmatrix} \right).   (3.9)
We can then apply the formula for conditioning of multivariate gaussians
(see Bishop, 2006, equations 2.81 and 2.82) to obtain
y_{t+1} | y_t ∼ N(J_t y_t, P_t),   (3.10)

where

J_t = C A Σ_t C⊤ (C Σ_t C⊤ + R)^{−1},   (3.11)
P_t = C(A Σ_t A⊤ + Q)C⊤ + R − C A Σ_t C⊤ (C Σ_t C⊤ + R)^{−1} C Σ_t A⊤ C⊤.   (3.12)
In contrast, from equation 2.4, for a low-rank RNN, the first-order
marginal is given by
y_{t+1} | y_t ∼ N(J y_t, P).   (3.13)
Comparing equations 3.10 and 3.13, we see for the LDS model that the
effective weights Jt and the covariance Pt depend on time through Σt, the
marginal covariance of the latent at time t, while for the RNN, they do not.
Note, however, that Σt follows the recurrence relation
Σ_{t+1} = A Σ_t A⊤ + Q,   (3.14)

which converges toward a fixed point Σ∞ that obeys the discrete Lyapunov
equation,

Σ_∞ = A Σ_∞ A⊤ + Q,   (3.15)
provided all eigenvalues of A have absolute value less than 1.
The LDS can therefore be approximated by an RNN with constant
weights when the initial covariance Σ0 is equal to the asymptotic covari-
ance Σ∞, as noted previously (Pachitariu et al., 2013). Even if this condition
does not hold at time 0, Σ∞ will in general be a good approximation of the
latent covariance after an initial transient. In this case, we obtain the fixed
recurrence weights

J = C A Σ_∞ C⊤ (C Σ_∞ C⊤ + R)^{−1} := M N⊤,   (3.16)
where we define M = C, which has shape n × d, and N⊤ = AΣ∞C⊤(CΣ∞C⊤ + R)^{−1},
which has shape d × n, so that J is a rank r matrix with r = d.
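The construction of equation 3.16 is easy to sketch numerically. The snippet below (our illustration, not code from the letter) computes the fixed recurrence weights J and noise covariance P of the first-order RNN approximation from assumed LDS parameters A, C, Q, R, using SciPy's discrete Lyapunov solver for Σ∞.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lds_to_rnn(A, C, Q, R):
    """First-order (Markovian) RNN approximation of a latent LDS.

    Returns J and P such that y_{t+1} | y_t ~ N(J y_t, P) matches the
    stationary one-time-step marginals of the LDS (equations 3.11-3.16).
    """
    Sigma = solve_discrete_lyapunov(A, Q)          # Sigma_inf = A Sigma_inf A^T + Q  (eq. 3.15)
    S_y = C @ Sigma @ C.T + R                      # stationary covariance of y_t
    J = C @ A @ Sigma @ C.T @ np.linalg.inv(S_y)   # equation 3.16
    P = (C @ (A @ Sigma @ A.T + Q) @ C.T + R
         - J @ C @ Sigma @ A.T @ C.T)              # equation 3.12 evaluated at Sigma_inf
    return J, P
```

The returned J has rank at most d, consistent with the factorization J = MN⊤ with M = C.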
3.3 Cases of Equivalence between LDS and RNN Models. Although
latent LDS and low-rank linear RNN models are not equivalent in gen-
eral, we can show that the first-order Markovian approximation introduced
above becomes exact in two limit cases of interest: for observation noise or-
thogonal to the latent subspace and in the limit n ≫ d, with coefficients of
the observation matrix generated randomly and independently.
Our key observation is that if KtC = I in equation 3.4, with I the identity
matrix, we have x̂t+1 = AKtyt, so that the dependence on the observations
before time step t disappears and the LDS therefore becomes Markovian.
Interestingly, this condition KtC = I also implies that the latent state can
be inferred from the current observation yt alone (see equation A.7 in ap-
pendix A) and that this inference is exact, since the variance of the distribu-
tion p(xt|yt ) is then equal to 0 as seen from equation A.8. We next examine
two cases where this condition is satisfied.
We first consider the situation where the observation noise vanishes: R =
0. Then, as shown in appendix A, the Kalman gain is Kt = (C⊤C)^{−1}C⊤, so
that KtC = I. In that case, the approximation of the LDS by the RNN defined
in section 3.2 is exact, with equations 3.11 and 3.12 becoming:

J = C A (C⊤C)^{−1} C⊤,   (3.17)
P = C Q C⊤.   (3.18)
More generally, this result remains valid when the observation noise is or-
thogonal to the latent subspace spanned by the columns of the observation
matrix C (in which case the recurrence noise given by equation 3.18 becomes
P = CQC⊤ + R).
A second case in which we can obtain KtC ≈ I is in the limit of many neu-
rons, n ≫ d, assuming that coefficients of the observation matrix are gen-
erated randomly and independently. Indeed, under these hypotheses, the
Kalman gain given by equation 3.5 is dominated by the term CVtC⊤, so that
the observation covariance R becomes negligible, as shown formally in ap-
pendix B. Intuitively this means that the information about the latent state
ˆxt is distributed over a large enough population of neurons for the Kalman
filter to average out the observation noise and estimate it optimally with-
out making use of previous observations. Ultimately this makes the LDS
asymptotically Markovian in the case where we have an arbitrarily large
neural population relative to the number of latent dimensions.
To illustrate the convergence of the low-rank RNN approximation to
the target latent LDS in the large n limit, in Figure 2, we consider a sim-
ple example with a one-dimensional latent space and observation spaces of
increasing dimensionality. To visualize the difference between the LDS and
its low-rank RNN approximation, we plot the trace of the autocorrela-
tion matrix of observations yt in the stationary regime, ρ(δ) = Tr(E[y_t y_{t+δ}⊤]).
Since the RNNs are constructed to capture the marginal distributions of ob-
servations separated by at most one time step, the two curves match exactly
for a lag δ ∈ {−1, 0, 1}, but dependencies at longer timescales cannot be ac-
curately captured by an RNN due to its Markov property (see Figure 2a).
However, these differences vanish as the dimensionality of the observation
space becomes much larger than that of the latent space (see Figures 2b and
2c), which illustrates that the latent LDS converges to a process equivalent
to a low-rank RNN.
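The comparison shown in Figure 2 can be reproduced in outline with the sketch below, which estimates the mean autocorrelation ρ(δ) = Tr(E[y_t y_{t+δ}⊤]) from simulated trajectories. The commented usage relies on the hypothetical simulate_lds, simulate_rnn, and lds_to_rnn helpers sketched earlier, and the parameter values are assumptions rather than the exact simulation settings of the letter.

```python
import numpy as np

def mean_autocorrelation(Y, max_lag=20, burn_in=100):
    """Estimate rho(delta) = Tr(E[y_t y_{t+delta}^T]) from a trajectory Y of shape (T, n)."""
    Y = Y[burn_in:]                 # discard the initial transient (near-stationary regime)
    T = Y.shape[0]
    return np.array([np.mean(np.einsum('ti,ti->t', Y[:T - lag], Y[lag:]))
                     for lag in range(max_lag + 1)])

# Sketch of the Figure 2 comparison, reusing the hypothetical helpers defined above:
# A, Q = np.array([[0.97]]), np.array([[0.1]])        # d = 1 latent dynamics
# C = np.random.default_rng(0).normal(size=(3, 1))    # n = 3 random observation matrix
# R = 2.0 * np.eye(3)                                 # strong observation noise
# _, Y_lds = simulate_lds(A, C, Q, R, Sigma0=Q / (1 - 0.97**2), T=5000)
# J, P = lds_to_rnn(A, C, Q, R)
# Y_rnn = simulate_rnn(J, P, Vy0=C @ (Q / (1 - 0.97**2)) @ C.T + R, T=5000)
# rho_lds, rho_rnn = mean_autocorrelation(Y_lds), mean_autocorrelation(Y_rnn)
```

The two estimates agree at lags 0 and ±1 by construction and deviate at longer lags, as in Figure 2a.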
4 Mapping Low-Rank Linear RNNs onto Latent LDS Models
We now turn to the reverse question: Under what conditions can a low-rank
linear RNN be expressed as a latent LDS model? We start with an intuitive
mapping for the deterministic case (when noise covariance P = 0) and then
extend it to a more general mapping valid in the presence of noise.
We first consider a deterministic linear low-rank RNN obeying
y_{t+1} = M N⊤ y_t.   (4.1)
Since M is an n × r matrix, it is immediately apparent that for all t, yt is
confined to a linear subspace of dimension r, spanned by the columns of
M. Hence, we can define the r-dimensional latent state as

x_t = M^+ y_t,   (4.2)

where M^+ is the pseudoinverse of M defined as M^+ = (M⊤M)^{−1}M⊤ (well
defined since M is of rank r), so that we retrieve yt as

y_t = M x_t.   (4.3)

We then obtain a recurrence equation for the latent state:

x_{t+1} = M^+ y_{t+1} = M^+ M N⊤ y_t = N⊤ M x_t := A x_t,   (4.4)
which with A = N⊤M describes the dynamics of a latent LDS with d = r. A
key insight from equation 4.4 is that the overlap between the columns of N
and M determines the part of the activity that is integrated by the recurrent
dynamics (Mastrogiuseppe & Ostojic, 2018; Schuessler et al., 2020; Beiran
et al., 2021; Dubreuil et al., 2022).
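In code, the deterministic mapping of equations 4.2 to 4.4 is a few lines. The sketch below is our illustration under the same assumption (M of full column rank), not code from the letter.

```python
import numpy as np

def deterministic_rnn_to_lds(M, N):
    """Map a noiseless low-rank RNN y_{t+1} = M N^T y_t to latent dynamics x_{t+1} = A x_t.

    The latent state is x_t = M^+ y_t and the observations are recovered as y_t = M x_t
    (equations 4.2-4.4); A = N^T M captures the overlap between the columns of N and M.
    """
    M_pinv = np.linalg.pinv(M)   # equals (M^T M)^{-1} M^T when M has full column rank
    A = N.T @ M                  # latent recurrence matrix, shape r x r
    C = M                        # observation matrix of the equivalent (noiseless) LDS
    return A, C, M_pinv
```

For a unit-rank network, for instance, A reduces to the scalar overlap N⊤M.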
In the presence of noise εt, yt is no longer confined to the column space of
M. Part of this noise is integrated into the recurrent dynamics and can con-
tribute to the activity across many time steps. This integration of noise can
occur in an LDS at the level of latent dynamics through wt, but not at the
level of observation noise vt, which is independent across time steps. As
noted above, recurrent dynamics only integrate the activity present in the
column space of N. In the presence of noise, this part of state space there-
fore needs to be included into the latent variables. More important, a similar
observation can be made about external inputs when they are added to the
RNN dynamics (see appendix D).
A full mapping from a noisy low-rank RNN to an LDS model can there-
fore be built by extending the latent space to the linear subspace F of Rn
spanned by the columns of M and N (see appendix C), which has dimen-
sion d with r ≤ d ≤ 2r. Let C be a matrix whose columns form an orthogonal
basis for this subspace (which can be obtained via the Gram-Schmidt algo-
rithm). In that case, we can define the latent vector as
x_t = C⊤ y_t,   (4.5)

and the latent dynamics are given by

x_{t+1} = A x_t + w_t,   (4.6)

where the recurrence matrix is A = C⊤JC, and the latent dynamics noise is
wt ∼ N(0, Q) with Q = C⊤PC. Introducing vt = yt − Cxt, under a specific
condition on the noise covariance P, we obtain a normal random variable
independent of the other sources of noise in the process (appendix C), so
that yt can be described as a noisy observation of the latent state xt as in the
LDS model:

y_t = C x_t + v_t.   (4.7)
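A possible implementation of this construction is sketched below under the stated assumptions; the orthonormal basis is obtained via an SVD (a numerically stable stand-in for Gram-Schmidt), and the function name is ours.

```python
import numpy as np

def rnn_to_lds(M, N, P, tol=1e-10):
    """Map a noisy low-rank RNN y_{t+1} = M N^T y_t + eps_t, eps_t ~ N(0, P),
    onto a latent LDS x_{t+1} = A x_t + w_t, y_t = C x_t + v_t (equations 4.5-4.7)."""
    # Orthonormal basis C of the subspace F spanned by the columns of M and N
    # (dimension d with r <= d <= 2r).
    B = np.hstack([M, N])
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    C = U[:, s > tol * s.max()]

    J = M @ N.T
    A = C.T @ J @ C                  # latent recurrence matrix (d x d)
    Q = C.T @ P @ C                  # latent dynamics noise covariance
    R = P - C @ C.T @ P @ C @ C.T    # marginal covariance of v_t = (I - CC^T) eps_{t-1}
    return A, C, Q, R
```

As stated in appendix C, vt and wt are independent only when P has its eigenvectors aligned with or orthogonal to F; otherwise the mapping still holds but with correlated noises.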
4.1 Subsampled RNNs. Experimental recordings typically access only
the activity of a small fraction of neurons in the local network. An important
question for interpreting neural data concerns the statistics of activity when
only a random subset of k neurons in an RNN is observed. This situation
can be formalized by introducing the set of observed activities ot:
y_{t+1} = J y_t + ε_t,
o_t = y_t[:k] = D y_t.   (4.8)
Here [: k] symbolizes the selection of the first k values of a vector and D is
the corresponding projection matrix on the subspace spanned by the first k
Figure 3: Mean autocorrelation of k neurons subsampled from an n-dimensional
rank-one RNN, compared with a k-dimensional RNN built to match the first-
order marginals of partial observations. Formally, we first built an LDS equiv-
alent to the partial observations as in equation 4.9, and then the corresponding
RNN as in section 3.2. The rank-one RNN contains n = 20 neurons, of which
only k = 3 are observed. The mismatch occurs because the long-term correla-
tions present in the partial observations are caused by the larger size of the orig-
inal RNN with 20 neurons and cannot be reproduced by an RNN with only 3
neurons.
neurons. The system described by equation 4.8 is exactly an LDS but with
latent state yt and observations ot. In contrast to the regime considered in
the previous sections, the latents have a higher dimensionality than obser-
vations. However, assuming as before that J is low-rank, this model can be
mapped onto an equivalent LDS following the steps in appendix C:
x_{t+1} = A x_t + w_t,
o_t = D C x_t + D v_t.   (4.9)
This LDS is equivalent to equation 4.8, but with latent dynamics xt of di-
mension r ≤ d ≤ 2r, where r is the rank of J. The dynamics of the latent
state xt are identical to those of the fully observed low-rank RNN (see equa-
tion 4.6), but the observations are generated from a subsampled observation
matrix DC.
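Concretely, the subsampled LDS of equation 4.9 can be assembled from the full-network mapping; the sketch below takes the latent description (A, C, Q, R) returned by the hypothetical rnn_to_lds above and simply restricts the observation model to the first k rows.

```python
import numpy as np

def subsample_lds(A, C, Q, R, k):
    """Equation 4.9: LDS for the first k observed units of a low-rank RNN,
    given the full-network latent description (A, C, Q, R) of equations 4.5-4.7."""
    C_obs = C[:k, :]      # D C: keep the rows of C corresponding to observed neurons
    R_obs = R[:k, :k]     # D R D^T: observation-noise covariance among observed units
    return A, C_obs, Q, R_obs

# Example (assuming the rnn_to_lds sketch above):
# A, C, Q, R = rnn_to_lds(M, N, P)
# A_sub, C_sub, Q_sub, R_sub = subsample_lds(A, C, Q, R, k=3)
```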
Interestingly, this mapping highlights the fact that the activity statistics
of the k subsampled neurons are in general not Markovian, in contrast to
the full activity yt of the n neurons in the underlying RNN. In particular,
for that reason, the statistics of ot cannot be exactly reproduced by a smaller
RNN consisting of k units (see Figure 3). Remarkably, when considering the
subsampled activity of an RNN, a latent LDS is therefore a more accurate
model than a smaller RNN containing only the observed units.
5 Discussion
In this letter, we have examined the relationship between two simple yet
powerful classes of models of low-dimensional activity: latent linear dy-
namical systems (LDS) and low-rank linear recurrent neural networks
(RNN). We have focused on these tractable linear models with additive
gaussian noise to highlight their mathematical similarities and differences.
Although both models induce a jointly gaussian distribution over neural
population activity, generic latent LDS models can exhibit long-range, non-
Markovian temporal dependencies that cannot be captured by low-rank
linear RNNs, which describe neural population activity with a first-order
Markov process. Conversely, we showed that generic low-rank linear RNNs
can be captured by an equivalent latent LDS model. However, we have
shown that the two classes of models are effectively equivalent in limit cases
of practical interest for neuroscience, in particular when the number of sam-
pled neurons is much higher than the latent dimensionality.
Although these two model classes can generate similar sets of neural
trajectories, different approaches are typically used for fitting them to neu-
ral data: parameters of LDS models are in general inferred by variants of
the expectation-maximization algorithm (Yu et al., 2005; Pachitariu et al.,
2013; Nonnenmacher, Turaga, & Macke, 2017; Durstewitz, 2017), which
include the Kalman smoothing equations (Roweis & Ghahramani, 1999),
while RNNs are often fitted with variants of linear regression (Rajan et al.,
2016; Eliasmith & Anderson, 2003; Pollock & Jazayeri, 2020; Bondanelli
et al., 2021) or backpropagation through time (Dubreuil et al., 2022). The re-
lationship uncovered here therefore opens the door to comparing different
fitting approaches more directly, and in particular to developing probabilis-
tic methods for inferring RNN parameters from data.
We have considered here only linear RNN and latent LDS models.
Nonlinear low-rank RNNs without noise can be directly reduced to
nonlinear latent dynamics with linear observations following the same
mapping as in section 4 (Mastrogiuseppe & Ostojic, 2018; Schuessler et al.,
2020; Beiran et al., 2021; Dubreuil et al., 2022) and therefore define a natural
class of nonlinear LDS models. A variety of other nonlinear generalizations
of LDS models have been considered in the literature. One line of work
has examined linear latent dynamics with a nonlinear observation model
(Yu et al., 2005) or nonlinear latent dynamics (Yu et al., 2005; Durstewitz,
2017; Duncker et al., 2019; Pandarinath et al., 2018; Kim et al., 2008). An-
other line of work has focused on switching LDS models (Linderman et al.,
2017; Glaser et al., 2020) for which the system undergoes different linear dy-
namics depending on a hidden discrete state, thus combining elements of
latent LDS and hidden Markov models. Both nonlinear low-rank RNNs and
switching LDS models are universal approximators of low-dimensional dy-
namical systems (Funahashi & Nakamura, 1993; Chow & Li, 2000; Beiran
et al., 2021). Relating switching LDS models to local linear approximations
of nonlinear low-rank RNNs (Beiran et al., 2021; Dubreuil et al., 2022) is
therefore an interesting avenue for future investigations.
Appendix A: Kalman Filtering Equations
We reproduce in this appendix the recurrence equations followed by the
conditional distributions in equation 3.1 for both the latent LDS and the
linear RNN models.
For the latent LDS model, the conditional distributions are gaussians,
and their form is given by the Kalman filter equations (Kalman, 1960; Yu
et al., 2004; Welling, 2010). Following Yu et al. (2004), we observe that for
any two time steps τ ≤ t, the conditional distributions P(yt+1|yτ, . . . , y0) and
P(xt+1|yτ, . . . , y0) are gaussian, and we introduce the notations
P(y_t|y_τ, . . . , y_0) := N(ŷ_t^τ, W_t^τ),   (A.1)
P(x_t|y_τ, . . . , y_0) := N(x̂_t^τ, V_t^τ).   (A.2)
In particular, we are interested in expressing ŷ_{t+1}^t and x̂_{t+1}^t, which are the
predicted future observation and latent state, but also in x̂_t^t, which represents
the latent state inferred from the history of observations until time step t
included. To lighten notations, in the main text, we remove the exponent
when it has one time step difference with the index, by writing x̂_{t+1}, ŷ_{t+1},
W_{t+1} and V_{t+1} instead of, respectively, x̂_{t+1}^t, ŷ_{t+1}^t, W_{t+1}^t and V_{t+1}^t.
First, note that we have the natural relationships
x̂_{t+1}^t = A x̂_t^t,   (A.3)
ŷ_{t+1}^t = C x̂_{t+1}^t,   (A.4)
V_{t+1}^t = A V_t^t A⊤ + Q,   (A.5)
W_{t+1}^t = C V_{t+1}^t C⊤ + R,   (A.6)
so that it is sufficient to find expressions for x̂_t^t and V_t^t. After calculations
detailed in Yu et al. (2004) or Welling (2010), we obtain
x̂_t^t = x̂_t^{t−1} + K_t (y_t − C x̂_t^{t−1}),   (A.7)
V_t^t = (I − K_t C) V_t^{t−1},   (A.8)

where Kt is the Kalman gain given by

K_t = V_t^{t−1} C⊤ (C V_t^{t−1} C⊤ + R)^{−1}.   (A.9)
These equations form a closed recurrent system, as can be seen by com-
bining equations A.3 and A.7 and equations A.5 and A.8 to obtain a self-
consistent set of recurrence equations for the predicted latent state and
its variance:
x̂_{t+1}^t = A(x̂_t^{t−1} + K_t (y_t − C x̂_t^{t−1})),   (A.10)
V_{t+1}^t = A(I − K_t C) V_t^{t−1} A⊤ + Q
         = A(I − V_t^{t−1} C⊤ (C V_t^{t−1} C⊤ + R)^{−1} C) V_t^{t−1} A⊤ + Q.   (A.11)
From equation A.10, we see that the predicted state at time t + 1, and
thus the predicted observation, depends on observations at time steps τ ≤
t − 1 through the term ˆxt, making the system non-Markovian. Also note
that equations for the variances do not involve any of the observations yt,
showing these are exact values and not estimations.
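For reference, a compact sketch of this predictive recursion (equations A.9 to A.11) is given below, in the same assumed NumPy setting as the earlier snippets; it returns the one-step-ahead predicted latent means and their covariances.

```python
import numpy as np

def kalman_predict(Y, A, C, Q, R, Sigma0):
    """One-step-ahead Kalman prediction for the LDS (equations A.9-A.11).

    Y has shape (T, n); returns predicted latent means xhat[t] = E[x_t | y_0..y_{t-1}]
    and their covariances V[t], from which predicted observations are C @ xhat[t].
    """
    T, n = Y.shape
    d = A.shape[0]
    xhat = np.zeros((T + 1, d))
    V = np.zeros((T + 1, d, d))
    V[0] = Sigma0                                                # prior covariance of x_0
    for t in range(T):
        K = V[t] @ C.T @ np.linalg.inv(C @ V[t] @ C.T + R)       # Kalman gain (A.9)
        xhat[t + 1] = A @ (xhat[t] + K @ (Y[t] - C @ xhat[t]))   # predicted mean (A.10)
        V[t + 1] = A @ (np.eye(d) - K @ C) @ V[t] @ A.T + Q      # predicted covariance (A.11)
    return xhat, V
```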
This derivation, however, is not valid in the limit case R = 0, since Kt
is then undefined. In that case, however, we can observe that yt lies in the
linear subspace spanned by the columns of C, so that one can simply replace
equation A.7 by
x̂_t^t = C^+ y_t = x_t,   (A.12)

where C^+ = (C⊤C)^{−1}C⊤ is the pseudoinverse of C. Since this equation is
deterministic, the variance of the estimated latent state is equal to 0, so that
equation A.8 becomes V_t^t = 0. This case can be encompassed by equations
A.3 to A.8 if we rewrite the Kalman gain as

K_t = C^+ = (C⊤C)^{−1}C⊤.   (A.13)
Finally, for the linear RNN, the conditional distribution of equation A.1
is directly given by
P(y_{t+1}|y_t, . . . , y_0) = N(J y_t, P),   (A.14)
which shows that the predicted observation depends only on the last one,
making the system Markovian.
Appendix B: Equivalence in the Large Network Limit
Here we make the assumption that the coefficients of the observation matrix
are generated randomly and independently. We show that in the limit of
large n with d fixed, one obtains KtC → I so that the LDS is asymptotically
Markovian and can therefore be exactly mapped to an RNN.
We start by considering a latent LDS whose conditional distributions
obey equations A.1 to A.11, with the Kalman gain obeying equation A.9.
To simplify equation A.9, we focus on the steady state where variance Vt
has reached its stationary limit V in equation A.11.
Without loss of generality, we reparameterize the LDS by applying a
change of basis to the latent states such that V = I. We also apply a change of
basis to the observation space such that R = I in the new basis (this transfor-
mation does not have an impact on the conditional dependencies between
the yt at different time steps, and it can also be shown that it cancels out in
the expression KtC). Equation A.9 then becomes
K_tC = C⊤(I + CC⊤)^{−1}C.   (B.1)
Applying the matrix inversion lemma gives (I + CC⊤)^{−1} = I − C(I +
C⊤C)^{−1}C⊤, from which we get

K_tC = C⊤C − C⊤C(I + C⊤C)^{−1}C⊤C.
Using a Taylor expansion, we then write

(I + C⊤C)^{−1} = (I + (C⊤C)^{−1})^{−1}(C⊤C)^{−1}
             = ∑_{k=0}^{∞} (−(C⊤C)^{−1})^k (C⊤C)^{−1}
             ≈ (C⊤C)^{−1} − ((C⊤C)^{−1})^2 + ((C⊤C)^{−1})^3,

which gives

K_tC ≈ C⊤C − C⊤C(C⊤C)^{−1}C⊤C + C⊤C((C⊤C)^{−1})^2C⊤C − C⊤C((C⊤C)^{−1})^3C⊤C
     ≈ C⊤C − C⊤C + I − (C⊤C)^{−1}.
Assuming the coefficients of the observation matrix are independent
and identically distributed (i.i.d.) with zero mean and unit variance, for n
large, we obtain C⊤C = nI + O(√n) from the central limit theorem, so that
(C⊤C)^{−1} = O(1/n) (which can again be proven with a Taylor expansion).
This finally leads to KtC = I + O(1/n).
An alternative proof takes advantage of the spectral theorem applied to
C⊤C. Indeed, since it is a symmetric matrix, it can be decomposed as C⊤C =
UDU⊤, where U is an orthonormal matrix and D the diagonal matrix of
eigenvalues. Starting from equation B.1, we derive
K_tC = C⊤C − C⊤C(I + C⊤C)^{−1}C⊤C
     = UDU⊤ − UDU⊤(I + UDU⊤)^{−1}UDU⊤
     = UDU⊤ − UDU⊤(U(D + I)U⊤)^{−1}UDU⊤
     = UDU⊤ − UDU⊤U(D + I)^{−1}U⊤UDU⊤
     = UDU⊤ − UD^2(D + I)^{−1}U⊤
     = U(D − D^2/(D + I))U⊤
     = U(D/(D + I))U⊤
     = U(I − I/(D + I))U⊤
     = I − U(I/(D + I))U⊤.
Assuming as before that the coefficients of C are i.i.d. gaussian with zero
mean and unit variance, C⊤C is then the empirical covariance of i.i.d. sam-
ples of a gaussian ensemble with identity matrix covariance. The matrix
C⊤C = UDU⊤ then follows the (I, n)-Wishart distribution, and for n large,
its eigenvalues are all greater than √n (using, e.g., the tail bounds of Wain-
wright, 2019, theorem 6.1). This shows that (I/(D + I)) = O(1/√n)I, com-
pleting the proof.
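This asymptotic statement is easy to probe numerically. The short check below (our illustration, under the same i.i.d.-gaussian assumption on C and with V = R = I as in the reparameterized setting) measures how far KtC is from the identity as n grows.

```python
import numpy as np

def ktc_deviation(n, d, seed=0):
    """Deviation of K_t C from the identity for V = I, R = I and i.i.d. gaussian C.

    In this setting K_t C = C^T (I + C C^T)^{-1} C (equation B.1); the deviation
    should shrink roughly as O(1/n) as the number of neurons n grows.
    """
    C = np.random.default_rng(seed).normal(size=(n, d))
    KtC = C.T @ np.linalg.solve(np.eye(n) + C @ C.T, C)
    return np.linalg.norm(KtC - np.eye(d))

for n in (10, 100, 1000):
    print(n, ktc_deviation(n, d=3))   # deviation decreases with n
```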
Appendix C: Derivation of the RNN to LDS Mapping
As mentioned in section 4, we consider an RNN defined by equation 2.4
with J = MN⊤, and denote by C an orthonormal matrix whose columns form a
basis of F, the linear subspace spanned by the columns of M and N. Note
that CC⊤ is an orthogonal projector onto the subspace F and that since all
columns of M and N belong to this subspace, we have CC⊤M = M and
CC⊤N = N. Hence, we have

CC⊤JCC⊤ = J.   (C.1)
We thus define the latent vector as xt = C⊤yt, and we can then write

x_{t+1} = C⊤ y_{t+1}
        = C⊤ J y_t + C⊤ ε_t
        = C⊤ CC⊤ J CC⊤ y_t + C⊤ ε_t   (by equation C.1)
        = C⊤ J C C⊤ y_t + C⊤ ε_t   (because C⊤C = I)
        = A x_t + w_t,

where we have defined the recurrence matrix A = C⊤JC and the latent dy-
namics noise wt = C⊤εt, which follows wt ∼ N(0, Q) with Q = C⊤PC.
Let us define vt = yt − Cxt = (I − CC⊤)yt. We need to determine the con-
ditions under which vt is normally distributed and independent of yt−1 and
xt. For this, we write

C x_t = C A x_{t−1} + C w_{t−1}
      = CC⊤ J C x_{t−1} + C w_{t−1}
      = CC⊤ J CC⊤ y_{t−1} + C w_{t−1}
      = J y_{t−1} + C w_{t−1},

and hence,

v_t = ε_{t−1} − C w_{t−1} = (I − CC⊤)ε_{t−1},
which is independent of yt−1 and has a marginal distribution vt ∼ N (0, R)
with R = P − CC⊤PCC⊤
but is not in general independent of wt−1. A suffi-
cient and necessary condition for the independence of wt−1 and vt is that the
RNN noise covariance P has all its eigenvectors either aligned with or or-
thogonal to the subspace F (in this case, the covariance R is degenerate and
has F as a null space, which implies that observation noise is completely
orthogonal to F). If that is not the case, the reparameterization stays valid
up to the fact that the observation noise vt and the latent dynamics noise wt
can be correlated.
Appendix D: Addition of Input Terms
Let us consider an extension of both the latent LDS and the linear RNN
models to take into account inputs. More specifically, we consider adding
to both model classes an input under the form of a time-varying signal ut
fed to the network through a constant set of input weights. In the latent LDS
model, the input is fed directly to the latent variable and equations 2.1 and
2.2 become
x_t = A x_{t−1} + B u_t + w_t,   w_t ∼ N(0, Q),   (D.1)
y_t = C x_t + v_t,   v_t ∼ N(0, R).   (D.2)

The linear RNN equation 2.4 becomes

y_t = J y_{t−1} + W_in u_t + ε_t,   ε_t ∼ N(0, P),   (D.3)
so that we will represent by B a low-dimensional input projection and W_in
a high-dimensional one.
For the LDS-to-RNN mapping, we can directly adapt the derivations of
section 3.2, which lead to
y_{t+1} | y_t ∼ N(C B u_t + J_t y_t, P_t)   (D.4)
with the same expressions for Jt and Pt, given in equations 3.11 and 3.12.
For the RNN-to-LDS mapping, assuming again that J is low-rank and
written as J = MN⊤, we can define

x_t = C⊤ y_t,

where C is a matrix whose columns form an orthonormal basis for the sub-
space F spanned by the columns of M, N, and W_in. This latent vector then
follows the dynamics

x_{t+1} = C⊤JC x_t + C⊤W_in u_t + C⊤ε_t,   (D.5)
which corresponds to equation D.1, and it is straightforward to show that
it leads to equation D.2, with the technical condition that the covariance
of εt should have its eigenvectors aligned with the subspace F to avoid
correlations between observation and recurrent noises.
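Extending the earlier rnn_to_lds sketch to handle inputs only requires adding the columns of W_in to the basis; again this is our illustration of equations D.1 and D.5, not the authors' code.

```python
import numpy as np

def rnn_to_lds_with_inputs(M, N, W_in, P, tol=1e-10):
    """Map y_t = M N^T y_{t-1} + W_in u_t + eps_t onto latent dynamics
    x_t = A x_{t-1} + B u_t + w_t with observations y_t = C x_t + v_t (appendix D)."""
    B_basis = np.hstack([M, N, W_in])                 # F is spanned by M, N, and W_in
    U, s, _ = np.linalg.svd(B_basis, full_matrices=False)
    C = U[:, s > tol * s.max()]                       # orthonormal basis of F
    J = M @ N.T
    A = C.T @ J @ C                                   # latent recurrence (equation D.5)
    B = C.T @ W_in                                    # low-dimensional input projection
    Q = C.T @ P @ C
    R = P - C @ C.T @ P @ C @ C.T
    return A, B, C, Q, R
```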
Acknowledgments
We thank both reviewers for constructive suggestions that have signifi-
cantly improved this letter. In particular, we thank Scott Linderman for
the alternative proof in appendix B. A.V. and S.O. were supported by
the program Ecoles Universitaires de Recherche (ANR-17-EURE-0017), the
CRCNS program through French Agence Nationale de la Recherche (ANR-
19-NEUC-0001-01), and the NIH BRAIN initiative (U01NS122123). J.W.P.
was supported by grants from the Simons Collaboration on the Global Brain
(SCGB AWD543027), the NIH BRAIN initiative (R01EB026946), and a visit-
ing professorship grant from the Ecole Normale Superieure.
References
Archer, E., Koster, U., Pillow, J., & Macke, J. (2014). Low-dimensional models of neu-
ral population activity in sensory cortical circuits. In Z. Ghahramani, M. Welling,
C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information
processing systems, 27 (pp. 343–351). Red Hook, NY: Curran.
Barak, O. (2017). Recurrent neural networks as versatile tools of neuroscience
research. Current Opinion in Neurobiology, 46, 1–6. 10.1016/j.conb.2017.06.003,
PubMed: 28668365
Beiran, M., Dubreuil, A., Valente, A., Mastrogiuseppe, F., & Ostojic, S. (2021). Shap-
ing dynamics with multiple populations in low-rank recurrent networks. Neural
Computation, 33, 1572–1615. 10.1162/neco_a_01381, PubMed: 34496384
Bishop, C. (2006). Pattern recognition and machine learning. Berlin: Springer.
Bondanelli, G., Deneux, T., Bathellier, B., & Ostojic, S. (2021). Network dynam-
ics underlying OFF responses in the auditory cortex. eLife, 10, e53151. 10.7554/
eLife.53151, PubMed: 33759763
Chow, T., & Li, X. (2000). Modeling of continuous time dynamical systems with input
by recurrent neural networks. IEEE Transactions on Circuits and Systems I: Funda-
mental Theory and Applications, 47, 575–578. 10.1109/81.841860
Churchland, M., Byron, M., Sahani, M., & Shenoy, K. (2007). Techniques for extract-
ing single-trial activity patterns from large-scale neural recordings. Current Opin-
ion in Neurobiology, 17, 609–618. 10.1016/j.conb.2007.11.001, PubMed: 18093826
Cohen, Z., DePasquale, B., Aoi, M., & Pillow, J. (2020). Recurrent dynamics of prefrontal
cortex during context-dependent decision-making. bioRxiv.
Cunningham, J., & Yu, B. (2014). Dimensionality reduction for large-scale neu-
ral recordings. Nature Neuroscience, 17, 1500–1509. 10.1038/nn.3776, PubMed:
25151264
Dubreuil, A., Valente, A., Beiran, M., Mastrogiuseppe, F., & Ostojic, S. (2022). The
role of population structure in computations through neural dynamics. Nature
Neuroscience, 25, 783–794. 10.1038/s41593-022-01088-4, PubMed: 35668174
Duncker, L., Bohner, G., Boussard, J., & Sahani, M. (2019). Learning interpretable
continuous-time models of latent stochastic dynamical systems. In Proceedings of
the International Conference on Machine Learning (pp. 1726–1734).
Durstewitz, D. (2017). A state space approach for piecewise-linear recurrent neu-
ral networks for identifying computational dynamics from neural measure-
ments. PLOS Computational Biology, 13, e1005542. 10.1371/journal.pcbi.1005542,
PubMed: 28574992
Eliasmith, C., & Anderson, C. (2003). Neural engineering: Computation, representation,
and dynamics in neurobiological systems. Cambridge, MA: MIT Press.
Finkelstein, A., Fontolan, L., Economo, M., Li, N., Romani, S., & Svoboda, K. (2021).
Attractor dynamics gate cortical information flow during decision-making. Na-
ture Neuroscience, 24, 843–850. 10.1038/s41593-021-00840-6, PubMed: 33875892
Funahashi, K., & Nakamura, Y. (1993). Approximation of dynamical systems by con-
tinuous time recurrent neural networks. Neural Networks, 6, 801–806. 10.1016/
S0893-6080(05)80125-X
Gallego, J., Perich, M., Miller, L, & Solla, S. (2017). Neural manifolds for the con-
trol of movement. Neuron, 94, 978–984. 10.1016/j.neuron.2017.05.025, PubMed:
28595054
Gao, P., & Ganguli, S. (2015). On simplicity and complexity in the brave new world
of large-scale neuroscience. Current Opinion in Neurobiology, 32, 148–155. 10.1016/
j.conb.2015.04.003, PubMed: 25932978
Glaser, J., Whiteway, M., Cunningham, J., Paninski, L., & Linderman, S. (2020). Re-
current switching dynamical systems models for multiple interacting neural pop-
ulations. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.),
Advances in neural information processing systems, 33 (pp. 14867–14878). Red Hook,
NY: Curran.
Hennequin, G., Vogels, T., & Gerstner, W. (2014). Optimal control of transient dynam-
ics in balanced networks supports generation of complex movements. Neuron, 82,
1394–1406. 10.1016/j.neuron.2014.04.045, PubMed: 24945778
Jazayeri, M., & Ostojic, S. (2021). Interpreting neural computations by examining in-
trinsic and embedding dimensionality of neural activity. Current Opinion in Neu-
robiology, 70, 113–120. 10.1016/j.conb.2021.08.002, PubMed: 34537579
Kalman, R. (1960). A new approach to linear filtering and prediction problems. Jour-
nal of Basic Engineering, 82, 35–45. 10.1115/1.3662552
Kao, T., Sadabadi, M., & Hennequin, G. (2021). Optimal anticipatory control as a
theory of motor preparation: A thalamo-cortical circuit model. Neuron, 109, 1567–
1581. 10.1016/j.neuron.2021.03.009, PubMed: 33789082
Kim, S., Simeral, J., Hochberg, L., Donoghue, J., & Black, M. (2008). Neural control of
computer cursor velocity by decoding motor cortical spiking activity in humans
with tetraplegia. Journal of Neural Engineering, 5, 455. 10.1088/1741-2560/5/4/010
Laje, R., & Buonomano, D. (2013). Robust timing and motor patterns by taming chaos
in recurrent neural networks. Nature Neuroscience, 16, 925–933. 10.1038/nn.3405,
PubMed: 23708144
Landau, I., & Sompolinsky, H. (2018). Coherent chaos in a recurrent neural network
with structured connectivity. PLOS Computational Biology, 14, e1006309. 10.1371/
journal.pcbi.1006309, PubMed: 30543634
Landau, I., & Sompolinsky, H. (2021). Macroscopic fluctuations emerge in balanced
networks with incomplete recurrent alignment. Phys. Rev. Research., 3, 023171.
10.1103/PhysRevResearch.3.023171
Linderman, S., Johnson, M., Miller, A., Adams, R., Blei, D., & Paninski, L. (2017).
Bayesian learning and inference in recurrent switching linear dynamical systems.
In Proceedings of the 20th International Conference on Artificial Intelligence and Statis-
tics (pp. 914–922).
Macke, J., Buesing, L., Cunningham, J., Byron, M., Shenoy, K., & Sahani, M. (2011).
Empirical models of spiking in neural populations. In S. Solla, T. Leen, & K. R.
Müller (Eds.), Advances in neural information processing systems (pp. 1350–1358).
Cambridge, MA: MIT Press.
Mante, V., Sussillo, D., Shenoy, K., & Newsome, W. (2013). Context-dependent com-
putation by recurrent dynamics in prefrontal cortex. Nature, 503, 78–84. 10.1038/
nature12742, PubMed: 24201281
Mastrogiuseppe, F., & Ostojic, S. (2018). Linking connectivity, dynamics, and com-
putations in low-rank recurrent neural networks. Neuron, 99, 609–623. 10.1016/
j.neuron.2018.07.003, PubMed: 30057201
Nonnenmacher, M., Turaga, S., & Macke, J. (2017). Extracting low-dimensional dy-
namics from multiple large-scale neural population recordings by learning to pre-
dict correlations. In I. Guyon, Y. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S.
Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing sys-
tems, 30. Red Hook, NY: Curran.
Pachitariu, M., Petreska, B., & Sahani, M. (2013), Recurrent linear models of
simultaneously-recorded neural populations. In C. J. C. Burges, L. Bottou, M.
Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information
processing systems, 26 (pp. 3138–3146). Red Hook, NY: Curran.
Pandarinath, C., O’Shea, D., Collins, J., Jozefowicz, R., Stavisky, S., Kao, J., . . .
Sussillo, D. (2018). Inferring single-trial neural population dynamics using se-
quential auto-encoders. Nature Methods, 15, 805–815. 10.1038/s41592-018-0109-9,
PubMed: 30224673
Pereira, U., & Brunel, N. (2018). Attractor dynamics in networks with learning rules
inferred from in vivo data. Neuron, 99, 227–238. 10.1016/j.neuron.2018.05.038,
PubMed: 29909997
Perich, M., Arlt, C., Soares, S., Young, M., Mosher, C., Minxha, J., . . . Rajan, K. (2021).
Inferring brain-wide interactions using data-constrained recurrent neural network mod-
els. bioRxiv:2020-12.
Petreska, B., Byron, M., Cunningham, J., Santhanam, G., Ryu, S., Shenoy, K., & Sa-
hani, M. (2011). Dynamical segmentation of single trials from population neu-
ral data. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Q. Weinberger
(Eds.), Advances in neural information processing systems, 24 (pp. 756–764). Red
Hook, NY: Curran.
Pollock, E., & Jazayeri, M. (2020). Engineering recurrent neural networks from task-
relevant manifolds and dynamics. PLOS Computational Biology, 16, e1008128.
10.1371/journal.pcbi.1008128
Rajan, K., Harvey, C., & Tank, D. (2016). Recurrent network models of sequence
generation and memory. Neuron, 90, 128–142. 10.1016/j.neuron.2016.02.009,
PubMed: 26971945
Roweis, S., & Ghahramani, Z. (1999). A unifying review of linear gaussian models.
Neural Computation, 11, 305–345. 10.1162/089976699300016674, PubMed: 9950734
Saxena, S., & Cunningham, J. (2019). Towards the neural population doctrine. Cur-
rent Opinion in Neurobiology, 55, 103–111. 10.1016/j.conb.2019.02.002, PubMed:
30877963
Schuessler, F., Dubreuil, A., Mastrogiuseppe, F., Ostojic, S., & Barak, O. (2020). Dy-
namics of random recurrent networks with correlated low-rank structure. Physi-
cal Review Research, 2, 013111. 10.1103/PhysRevResearch.2.013111
Semedo, J., Zandvakili, A., Kohn, A., Machens, C., & Byron, M. (2014). Extracting
latent structure from multiple interacting neural populations. In Z. Ghahramani,
M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural
information processing systems, 27 (pp. 2942–2950). Red Hook, NY: Curran.
Smith, A., & Brown, E. (2003). Estimating a state-space model from point pro-
cess observations. Neural Computation, 15, 965–991. 10.1162/089976603765202622,
PubMed: 12803953
Sompolinsky, H., Crisanti, A., & Sommers, H. (1988). Chaos in random neural
networks. Phys. Rev. Lett., 61, 259–262. 10.1103/PhysRevLett.61.259, PubMed:
10039285
Sussillo, D. (2014). Neural circuits as computational dynamical systems. Cur-
rent Opinion in Neurobiology, 25, 156–163. 10.1016/j.conb.2014.01.008, PubMed:
24509098
Wainwright, M. (2019). High-dimensional statistics: A non-asymptotic viewpoint. Cam-
bridge: Cambridge University Press.
Welling, M. (2010). The Kalman filter (Caltech lecture note 136-93).
Yu, B., Afshar, A., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2005). Extracting
dynamical structure embedded in neural activity. In Y. Weiss, B. Schölkopf, & J.
Platt (Eds.), Advances in neural information processing systems, 18. Cambridge, MA:
MIT Press.
Yu, B., Shenoy, K., & Sahani, M. (2004). Derivation of Kalman filtering and smoothing
equations. Stanford, CA: Stanford University.
Zoltowski, D., Pillow, J., & Linderman, S. (2020). A general recurrent state space
framework for modeling neural dynamics during decision-making. In Proceed-
ings of the International Conference on Machine Learning (pp. 11680–11691).
Received October 26, 2021; accepted April 15, 2022.