Song, L., Ren, Y., Xu, S., Hou, Y., & He, X. (2023). A hybrid spatio-temporal deep belief network and sparse representation based framework reveals multi-level core functional components in decoding multi-task fMRI signals. Network Neuroscience, Advance publication. https://doi.org/10.1162/netn_a_00334


A Hybrid Spatio-Temporal Deep Belief Network and Sparse Representation-Based Framework Reveals Multi-Level Core Functional Components in Decoding Multi-Task fMRI Signals

Limei Song1#, Yudan Ren1#*, Shuhan Xu1, Yuqing Hou1, Xiaowei He1

1 School of Information Science & Technology, Northwest University, China;

# These authors contributed equally to this work and should be considered co-first authors.

* Corresponding authors.


Abstract

Decoding human brain activity on various task-based functional brain imaging data is of great significance for uncovering the functioning mechanism of the human mind. Currently, most feature extraction model-based methods for brain state decoding are shallow machine learning models, which may struggle to capture complex and precise spatio-temporal patterns of brain activity from the highly noisy fMRI raw data. Moreover, although decoding models based on deep learning methods benefit from their multi-layer structure that can extract spatio-temporal features at multiple scales, relatively large fMRI datasets are indispensable and the explainability of their results is elusive. To address the above problems, we proposed a computational framework based on a hybrid spatio-temporal deep belief network and sparse representations to differentiate multi-task fMRI (tfMRI) signals. Using a relatively small cohort of tfMRI data as a testbed, our framework can achieve an average classification accuracy of 97.86% and define the multi-level temporal and spatial patterns of multiple cognitive tasks. Intriguingly, our model can characterize the key components for differentiating


the multi-task fMRI signals. Overall, the proposed framework can identify the interpretable and discriminative fMRI composition patterns at multiple scales, offering an effective methodology for basic neuroscience and clinical research with relatively small cohorts.

Keywords: Multi-task classification, Task-based fMRI, Deep belief network, Sparse representation, Functional brain network.

Introduction

For years, researchers have been attempting to decode human brain states based on functional magnetic resonance imaging (fMRI) data (Haynes & Rees, 2006; Jang, Plis, Calhoun, & Lee, 2017; Rubin et al., 2017; Stanislas Dehaene, 1998), where distinguishing different cognitive tasks from fMRI data and extracting discriminative fMRI composition patterns are effective means to improve our understanding of the relationship among current cognitive tasks, brain responses, and individual behavior (Friston, 2009; Logothetis, 2008). To decode meaningful neurological patterns embedded in diverse task-based fMRI data, various computational and statistical methods have been proposed in recent decades. The most widely used brain state decoding strategy is multi-voxel pattern analysis (MVPA) (Davatzikos et al., 2005; Jang et al., 2017; Kriegeskorte & Bandettini, 2007). Despite its popularity, its commonly used classification strategy, the support vector machine (SVM), usually struggles to perform well on high-dimensional fMRI data and thus requires effective techniques for feature selection/extraction (LeCun, Bengio, & Hinton, 2015; Vieira, Pinaya, & Mechelli, 2017). Thus, the feasibility of feature selection/extraction has been investigated using various machine learning methods (LeCun et al., 2015; Vieira et al., 2017; S. Zhang et al., 2016).

However, most of these machine learning methods rely on shallow models, and their shallow nature may hinder them from effectively capturing non-linear relationships in the highly noisy fMRI raw data, resulting in difficulties in extracting complex and specific spatio-temporal features (Qiang et al., 2020; Rashid, Singh, & Goyal, 2020; Varoquaux & Thirion, 2014). Recently, studies applying deep learning models such as deep neural networks (DNN) and convolutional neural networks (CNN) to decode brain states based on task-based fMRI signals have been reported (J. Hu et al., 2019; Liu, He, Chen, & Gao, 2019; Sotetsu Koyamadaa, 2015; Y. Zhang, Tetrel, Thirion, & Bellec, 2021). Such deep learning models take advantage of a multi-layer architecture built by stacking multiple building blocks with similar structure, which has demonstrated the ability to significantly reduce noise in raw fMRI data and to model the non-linear relationships among the neural activities of brain regions, allowing for the extraction of multi-level spatio-temporal features (Bengio, Courville, & Vincent, 2012; Najafabadi et al., 2015; Ren, Xu, Tao, Song, & He, 2021). Nevertheless, there are still some limitations in current brain state decoding strategies based on deep learning models. First, as large-size samples are indispensable for deep learning models, current decoding models are not suitable for small datasets (Bo Liu, 2017; Litjens et al., 2017; Wang et al., 2020; Wen et al., 2018). For example, Wang et al. (2020) proposed a DNN-based model for tfMRI signal classification that requires 1034 subjects, making it less practical for clinical populations. Second, most decoding models based on deep learning are end-to-end, and the explainability of such models is elusive (J. Hu et al., 2019; LeCun et al., 2015; Wang et al., 2020). Recently, some researchers have attempted to define the key components for decoding brain states using machine learning methods. For example, our previous study based on sparse dictionary learning

has determined that the key components for multi-task classification tend to be functional brain networks (FBNs) (Song, Ren, Hou, He, & Liu, 2022). Another study has shown that artifact components such as movement-related artifacts are significantly more informative with respect to the classification accuracy of multi-task electroencephalogram (EEG) signals (McDermott et al., 2021). However, uncovering the interpretable key features in decoding tfMRI signals has received much less attention.

Due to the pitfalls in existing research, it is desirable to develop an appropriate framework capable of identifying the interpretable and discriminative fMRI composition patterns embedded in multi-task fMRI data. Therefore, in this study, we aim to extract both multi-level group-wise temporal features and spatial features from tfMRI signals, and to define interpretable classification features for multi-task fMRI data simultaneously. Recent studies have revealed that the deep belief network (DBN) can effectively identify multi-layer spatial and temporal features from fMRI signals (Dong, 2020; Ren et al., 2021); a DBN is typically stacked from multiple restricted Boltzmann machines (RBMs) (Geoffrey E. Hinton & Sejnowski, 1986) and thus can naturally act as a multi-level feature extractor. Furthermore, these prior studies have integrated the least absolute shrinkage and selection operator (LASSO) regression with the DBN model, indicating the efficacy of LASSO regression in extracting relevant spatial patterns. Therefore, we here proposed a novel two-stage feature extraction framework based on a hybrid DBN and sparse representation framework (DBN-SR) to decode multi-task fMRI signals with the capability of extracting multi-scale deep features. Specifically, the DBN model was utilized to capture multi-level group-wise temporal features, based on which the individual spatial features were estimated by LASSO regression. Subsequently, a sparse representation method that combines

dictionary learning and LASSO regression was utilized to further characterize the group-wise spatial features and individual spatio-temporal features for the purpose of classification. Based on the correspondence between the individual classification features and the group-wise spatial features, a relationship between the decoding capability of classification features and their spatial patterns can be effectively established, which can facilitate the interpretation of neural implications associated with the classification features. Finally, due to its strong generalization capability with small sample sizes, SVM was employed for the multi-class classification task.

Our results demonstrated that the proposed framework could successfully classify seven-task fMRI signals on a relatively small dataset. Moreover, by taking advantage of the DBN in extracting mid-level and high-level features and of sparse coding in brain functional network representation (Lv, Jiang, Li, Zhu, Chen, et al., 2015; Ren et al., 2021; Song et al., 2022), our framework could effectively characterize the multi-level spatiotemporal features embedded in multi-task fMRI signals, which provides the basis to identify the interpretable key components for characterizing and differentiating multi-task signals. Overall, the proposed model can disclose the underlying neural implications of key components with greater classification capacity, offering an effective and interpretable methodology for decoding fMRI data.

Materials and methods

Overview

The framework of our proposed method is illustrated in Figure 1. The pipeline of the proposed framework can be divided into four stages: 1) individual data preparation; 2) data preparation for

five-fold cross-validation; 3) training and testing; and 4) SVM-based classification and ratio of activation (ROA) analysis (Fig. 1A). In the data preparation stage, each individual's tfMRI data for the seven different tasks were extracted and then spatially concatenated into one signal matrix (the first panel in Fig. 1A). In this work, five-fold cross-validation was performed for model validation, so the whole dataset was randomly divided into five folds (the second panel in Fig. 1A). In the training process, four folds served as the training set, and the tfMRI signal matrices of all the subjects in the training set were spatially concatenated into a multi-subject signal matrix. The DBN model was then applied to the training set to derive the weight matrix W, which served as the group-wise temporal features D^1. LASSO regression was then used to extract the corresponding loading coefficients α^1 based on the defined temporal dictionary D^1. In the second stage of our model, the loading coefficients α^1 were employed as input to the sparse representation (SR) model, where they were decomposed into group-wise dictionaries D^2 and loading coefficients α^2. In the testing process, the individual signal matrices of the testing set and the group-wise dictionary D^1 obtained during the training phase were used as the inputs to LASSO regression, which yielded the loading coefficients α^1_test. Subsequently, employing α^1_test and the D^2 obtained during the training phase, we performed a second LASSO regression to obtain α^2_test, which were then used as the classification features for the testing subjects (the third panel in Fig. 1A). Note that during the training phase, we used only the independent training data to learn the regularization parameters employed for LASSO regression, as well as the group-wise dictionaries D^1 and D^2, without using any information from the test data. Afterwards, to further assess the multi-task fMRI data classification performance of the proposed model, the loading coefficients α^2 derived from the training set were used to train a support vector

machine (SVM), and the loading coefficients α^2_test derived from the testing set were then fed into this trained SVM model to predict the testing set labels (the last panel in Fig. 1A).

Our DBN-SR based framework can also identify the multi-level temporal features, spatial features, and features for multi-task classification (Fig. 1B). Specifically, the DBN model took fMRI time series from the training data as input and produced a weight matrix W for each layer, which represents the multi-layer temporal features of the group-wise tfMRI signals (the first two panels in Fig. 1B). These multi-layer temporal features W served as the temporal dictionary D^1 and were used as input to the LASSO algorithm to regress the corresponding loading coefficients α^1, which represent individual-level spatial patterns (the third panel in Fig. 1B). Next, the loading coefficients α^1 were used as the input of the SR stage to derive the common dictionary D^2 and the loading coefficients α^2, which represent group-wise spatial patterns and features for multi-task classification for each layer, respectively (the last three panels in Fig. 1B).

Figure 1. Overview of the hybrid deep belief network and sparse representation framework (DBN-SR). (A) The pipeline of multi-task fMRI data classification analysis via the proposed model. The seven capital letters refer to the seven different tasks (E: emotion, G: gambling, R: relational, M: motor, L: language, S: social, and W: working memory). (B) Detailed illustration of using the DBN and SR models to extract multi-level temporal features, spatial features, and features for classification from multi-task fMRI signals. In the second block, the blue lines represent temporal features derived from the weights of the DBN, while the red lines represent task design paradigms.


Data acquisition and preprocessing

We employed the seven-task fMRI data from the Q1 release of the Human Connectome Project (HCP) in this study (Barch et al., 2013). The details of the tfMRI data acquisition and preprocessing pipeline can be found in our previous study (Song et al., 2022).

Specifically, the seven tasks are emotion, gambling, relational, motor, language, social, and working memory (WM). The number of time points for each task is shown in Table 1. As the tfMRI data consist of different numbers of time points, we truncated all tfMRI signals to the same length (176 frames). In this work, 60 subjects were used from the released dataset.

Table 1. Details of the conditions and frames for the seven tasks

Task        Emotion  Gambling  Relational  Motor  Language  Social  WM
Conditions  2        2         2           6      2         2       8
Frames      176      253       232         284    316       274     405

The truncation preprocessing unavoidably influences the integrity of the task design. For example, four conditions are excluded from the WM task due to data truncation. Nevertheless, for the other tasks, the truncated tfMRI data include at least one block for all events (sFig. 1).

Data preparation

First, we extracted the whole-brain fMRI signal for each subject using the standard MNI152 template as the mask, resulting in a 2-dimensional matrix per subject. The signal matrices of the seven tasks for each subject were then spatially concatenated into a large matrix S_i^1 = [S_{i,E}^1, S_{i,G}^1, S_{i,R}^1, S_{i,M}^1, S_{i,L}^1, S_{i,S}^1, S_{i,W}^1] ∈ R^{t×(n×7)}, where S_{i,E}^1 ∈ R^{t×n} has t time points and n voxels. The seven capital-letter subscripts refer to the seven different tasks (E: emotion, G:

gambling, R: relational, M: motor, L: language, S: social, and W: working memory). The tfMRI time series of each voxel were normalized to zero mean and unit norm. In this work, a five-fold cross-validation scheme was chosen; accordingly, the 60 subjects were randomly divided into five equal folds. In each iteration, one fold (12 subjects) was used for testing and the remaining four (48 subjects) for training. It is noteworthy that the training and testing sets for each iteration were completely independent. Then, the multi-task fMRI signal matrices of all the subjects in the training set were spatially concatenated to compose a multi-subject fMRI matrix S^1 = [S_1^1, S_2^1, ..., S_p^1] ∈ R^{t×(n×7×p)}, where p is the number of training subjects (p = 48) (Fig. 1A).

As whole-brain fMRI data generally contain an enormous number of voxels, the group-wise tfMRI signals consisting of multiple tasks and subjects exhibit relatively high dimensionality, inevitably resulting in an overloaded computational burden and memory consumption. To tackle these problems, we randomly selected only 10% of each subject's whole-brain voxel signals in the training stage (Huan Liu 2017; Song et al., 2022). To ensure a uniform distribution of sampled voxels across different brain regions, we employed the Fisher-Yates shuffle algorithm implemented by the randperm function in MATLAB, which generates random permutations with a uniform distribution (Fisher & Yates, 1938). The distribution of the randomly selected 10% of voxels across all subjects can be found in the Supplementary Materials (sFig. 6-7).
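The voxel subsampling step can be illustrated with a short sketch. The snippet below is a minimal NumPy illustration of drawing a uniform random 10% voxel subset via a random permutation (the role played by MATLAB's randperm in our pipeline); the function name, the seed, and the (time × voxel) layout are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def subsample_voxels(signal_matrix, fraction=0.10, seed=0):
    """Randomly keep a fraction of voxel columns of a (time x voxel) matrix.

    A uniform random permutation of the voxel indices plays the same role as
    MATLAB's randperm; the first round(fraction * n) indices are retained.
    """
    rng = np.random.default_rng(seed)
    n_voxels = signal_matrix.shape[1]
    n_keep = int(round(fraction * n_voxels))
    keep_idx = rng.permutation(n_voxels)[:n_keep]
    return signal_matrix[:, keep_idx], keep_idx

# Example: S_i with 176 time points and n voxels -> keep roughly 10% of the columns.
```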

Deep belief network model-based analysis

In this work, we chose the DBN to extract group-wise temporal features based on previous research demonstrating its ability to identify meaningful FBNs (Qiang et al., 2020; Ren et al., 2021). In

general, a DBN can be regarded as stacked blocks of restricted Boltzmann machines (RBMs) (G. E. Hinton, Osindero, & Teh, 2006), an energy-based probabilistic generative model that simulates the underlying distribution of the input data via interactions between visible and hidden variables. While units in the visible layer v and the hidden layer h are connected by weights, there are no connections within a layer. As a stack of multiple RBMs, the DBN model is designed to learn and train weights for each layer. As described in Asja Fischer (2012) and X. Hu et al. (2018), the energy function of the DBN model adopted to update the weights layer by layer is defined as follows:

$$ E(v, h) = -\sum_{i} b_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij} \tag{1} $$

where v_i and h_j refer to the activation states of the two layers, b_i and b_j represent their biases, and w_{ij} indicates the weight between visible unit i and hidden unit j.
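As a minimal illustration of Equation (1), the sketch below evaluates the RBM energy for one visible/hidden configuration in NumPy; the variable names and shapes (one voxel's time series as the visible vector, k_1 hidden units) are our assumptions for illustration only.

```python
import numpy as np

def rbm_energy(v, h, b_vis, b_hid, W):
    """Energy of a configuration (v, h) as in Eq. (1):
    E(v, h) = -sum_i b_i v_i - sum_j b_j h_j - sum_{i,j} v_i h_j w_ij."""
    return -(b_vis @ v) - (b_hid @ h) - (v @ W @ h)

# v: (t,) visible activations, h: (k1,) hidden activations,
# b_vis: (t,) visible biases, b_hid: (k1,) hidden biases, W: (t, k1) weights.
```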

As introduced in the previous section, the tfMRI signals of the randomly selected 10% of voxels in each individual's whole brain across the multiple tasks in the training set were spatially concatenated to generate a multi-subject fMRI matrix for model training, and thus the group-wise tfMRI time series (176 time points) were taken as training samples for the DBN model. In our work, the neural architecture of the DBN model was set to 4 layers and 128 neurons, determined experimentally and empirically (see the Parameter Selection section). Specifically, the number of visible variables t is the same as the number of time points of the fMRI signal (i.e., 176 in our study), and the number of hidden variables k_1 in each hidden layer represents the number of latent components expressed in the fMRI data (k_1 = 128). The DBN model was adopted to model the group-wise tfMRI matrix S^1 and obtain a weight matrix w_j from each layer. The weight matrix of the visible layer is represented by w_1 ∈ R^{t×k_1}, and the weight matrix of each hidden layer is w_j ∈ R^{k_1×k_1} (j = 2, 3, 4). The

multi-layer temporal features W_j in each layer of the DBN model can be derived by successive multiplication of the weight matrices of adjacent layers (W_j ∈ R^{t×k_1}), that is, W_4 = w_4 ∗ w_3 ∗ w_2 ∗ w_1, W_3 = w_3 ∗ w_2 ∗ w_1, W_2 = w_2 ∗ w_1, and W_1 = w_1. Since each sample input to the DBN model consists of all time points of one voxel, the weights w_j (j = 1, 2, 3, 4) across the 4 layers represent the temporal features of the input fMRI data at different levels of abstraction. Thus, the successive products W_j (j = 1, 2, 3, 4) obtained from each layer of the DBN model represent the multi-level temporal features embedded in the fMRI signals.
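The derivation of the multi-level temporal features from the per-layer weights can be written compactly. The following sketch assumes the per-layer weight matrices are available as NumPy arrays (w_1 of shape t×k_1, deeper w_j of shape k_1×k_1) and simply accumulates the products described above.

```python
import numpy as np

def multilayer_temporal_features(layer_weights):
    """Given [w1 (t x k1), w2 (k1 x k1), w3, w4], return the accumulated products
    W1 = w1, W2 = w2*w1, W3 = w3*w2*w1, W4 = w4*w3*w2*w1 (each of shape t x k1).

    With w1 mapping time points to layer-1 units, applying w1 first corresponds
    numerically to right-multiplying by the deeper weights: W_j = w1 @ w2 @ ... @ wj.
    """
    features, acc = [], None
    for w in layer_weights:
        acc = w if acc is None else acc @ w
        features.append(acc)
    return features
```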

Drawing inspiration from the successful application of LASSO regression for deriving spatial features in previous studies (Haufe et al., 2014; Lee, Jeong, & Ye, 2013), we performed LASSO regression to derive the individual spatial features. Specifically, the multi-layer temporal features W_j derived by the DBN model were normalized and then served as the temporal dictionary D^1 ∈ R^{t×k_1} (Calhoun et al., 2001; Tibshirani, 2011). Here, as the successive multiplication of weight matrices leads to larger scales for the deeper dictionaries, the normalization procedure ensures reasonable performance of LASSO regression at the same scale. Subsequently, we employed the original individual signal matrix S_i (i ∈ 1, 2, ..., p), along with the temporal dictionary D^1, as input to the LASSO algorithm, which produces the

corresponding individual loading coefficients α_i^1 (α_i^1 ∈ R^{k_1×n}, n = 228,453). Since D^1 incorporates the group-wise temporal features, the individual loading coefficients α_i^1 obtained through regression can be considered spatial sparse representations of each individual's fMRI signals S_i on the common temporal dictionary D^1. Consequently, the individual loading coefficients α_i^1 represent the individual spatial features. Here, the full loading coefficient matrix derived from LASSO regression is denoted α^1 (α^1 = [α_1^1, α_2^1, ..., α_i^1, ..., α_p^1] ∈ R^{k_1×(n×7×p)}, with α_i^1 = [α_{i,E}^1, α_{i,G}^1, α_{i,R}^1, α_{i,M}^1, α_{i,L}^1, α_{i,S}^1, α_{i,W}^1] ∈ R^{k_1×(n×7)}).

Similarly, to derive the loading coefficient matrix α^1_test of the testing set for each layer, the group-wise time-series dictionary matrix D^1 derived from the training stage was applied to model S^1_test and obtain α^1_test by solving a typical l1-regularized LASSO problem. In this work, the regularization parameter λ_1 of the LASSO regression was set to 0.1, determined experimentally and empirically.
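A hedged sketch of this LASSO step is given below, using scikit-learn's multi-target Lasso: each voxel's time series is regressed on the normalized temporal dictionary D^1 with an l1 penalty. Note that scikit-learn's alpha is not on exactly the same scale as the λ_1 reported here (its objective divides the residual term by the number of samples), so the value 0.1 is only indicative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def spatial_loadings(D1, S, lam=0.1):
    """Regress the (t x n) signal matrix S on the temporal dictionary D1 (t x k1)
    with an l1 penalty, returning the individual spatial loadings alpha1 (k1 x n)."""
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    model.fit(D1, S)          # one l1-regularized regression per voxel (column of S)
    return model.coef_.T      # coef_ is (n, k1); transpose to (k1, n)
```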

Sparse Representation model

Although we successfully obtained the individual loading coefficient matrices α^1 and α^1_test through LASSO regression for the training and testing sets, respectively, these features were unsuitable for classification due to their high dimensionality (α^1 ∈ R^{k_1×n}, k_1 = 128, n = 228,453). Therefore, our next goal was to extract the multi-level group-wise spatial patterns based on the individual spatial patterns, and finally to excavate multi-level features for multi-task classification that could distinguish multi-task fMRI signals and reveal the distinctive organization patterns of different task stimulations. Here, we adopted a sparse representation based model, which has already been proven in previous research to be an effective algorithm for identifying the intrinsic spatial functional patterns and features for multi-task classification from fMRI data (Song et al., 2022; S. Zhang et al., 2016). Specifically, we first aggregated the loading coefficient matrices α_i^1 of all the subjects into one matrix S^2 for each layer of the DBN model (S^2 = [S_1^2, S_2^2, ..., S_i^2, ..., S_p^2] ∈ R^{n×(7×k_1×p)}, where S_i^2 = [(α_{i,E}^1)^T, (α_{i,G}^1)^T, (α_{i,R}^1)^T, (α_{i,M}^1)^T, (α_{i,L}^1)^T, (α_{i,S}^1)^T, (α_{i,W}^1)^T] ∈ R^{n×(7×k_1)}). Then S^2 served as the input for dictionary learning and sparse

representation to derive a group-wise spatial dictionary D^2 ∈ R^{n×k_2} and the corresponding loading coefficients α^2 for each layer, respectively. Note that k_2 represents the number of dictionary atoms, which was set to the same value as k_1 (k_2 = 128). Here, α^2 = [α_1^2, α_2^2, ..., α_i^2, ..., α_p^2] ∈ R^{k_2×(k_1×7×p)}, where α_i^2 = [α_{i,E}^2, α_{i,G}^2, α_{i,R}^2, α_{i,M}^2, α_{i,L}^2, α_{i,S}^2, α_{i,W}^2] ∈ R^{k_2×(k_1×7)}.

The loss function of the sparse representation model imposes a sparsity constraint on the loading coefficients α^2 through an l1 regularization (Eq. (2)), where λ_2 is a regularization parameter that balances the regression residual and the sparsity level; λ_2 was set to 0.05.

$$ \min_{D^{2} \in C,\ \alpha^{2}} \ \frac{1}{2}\,\lVert S^{2} - D^{2}\alpha^{2} \rVert_F^{2} + \lambda_2 \lVert \alpha^{2} \rVert_{1,1} \tag{2} $$

To prevent D^2 from taking arbitrarily large values that would lead to a trivial solution of the optimization, the columns d_1, d_2, ..., d_{k_2} are restricted by Equation (3):

$$ C \triangleq \left\{ D^{2} \in R^{n \times k_2} \ \text{s.t.} \ \forall j = 1, \dots, k_2,\ d_j^{T} d_j \le 1 \right\} \tag{3} $$

As the dictionary D^2 was obtained by a sparse representation of α^1, which comprises all individual spatial features, the learned dictionary D^2 consequently represents the group-wise spatial features. Correspondingly, α_i^2 is a sparse representation on the common spatial dictionary D^2. Given the ability of a sparse representation model to effectively reduce the dimensionality of raw fMRI data while retaining its essential information, the resulting intrinsic features (α_i^2), derived from the extraction of common temporal and spatial dictionaries, can effectively capture the variations in the spatio-temporal patterns of functional brain activity across different tasks. As a result, these intrinsic features are suitable for multi-task classification.
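The factorization in Equations (2)-(3) can be sketched with scikit-learn's online dictionary learning, which likewise minimizes a Frobenius reconstruction term plus an l1 penalty on the codes under unit-norm atoms. The mapping of S^2, D^2, and α^2 onto scikit-learn's sample/feature layout and the penalty value are assumptions for illustration; the exact solver and penalty scaling differ from the implementation used in the paper.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_spatial_dictionary(S2, k2=128, lam2=0.05, seed=0):
    """Factorize S2 (n voxels x m columns) as D2 (n x k2) @ alpha2 (k2 x m).

    scikit-learn expects samples in rows, so each column of S2 is treated as a
    sample: the learned components give D2 after transposition, and the sparse
    codes give alpha2.
    """
    dl = MiniBatchDictionaryLearning(n_components=k2, alpha=lam2,
                                     transform_algorithm="lasso_lars",
                                     transform_alpha=lam2, random_state=seed)
    codes = dl.fit_transform(S2.T)   # (m, k2) sparse codes
    D2 = dl.components_.T            # (n, k2) group-wise spatial dictionary
    alpha2 = codes.T                 # (k2, m) loading coefficients
    return D2, alpha2
```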

To derive the α^2_test of the testing set for post-hoc classification analysis, we again leveraged the LASSO regression algorithm for each layer. Specifically, the loading coefficient matrix α^1_test was regarded as the input matrix S^2_test, and the dictionary matrix D^2 derived from the

training stage was employed to model S^2_test and learn the loading coefficients α^2_test. All the parameters in the testing stage were set the same as in the training stage.

Parameter Selection

The determination of hyperparameters, such as the number of cross-validation folds, the number of layers and neurons of the DBN model, and the regularization parameters of the sparse representation model, was accomplished through a combination of referring to previous studies and learning from the training set; the testing set was not involved in any parameter selection process.

The choice of cross-validation folds is crucial as it offers a trade-off between precision and computational cost for performance estimation (Hansen et al., 2013). Commonly used cross-validation schemes in current machine learning experiments include 2-fold, 5-fold, 10-fold, or the leave-one-out method. In theory, while some studies suggest that the 10-fold or leave-one-out method may provide a higher estimated accuracy (Kohavi, 1995), others indicate that 5-fold or 10-fold is the optimal choice for balancing computational cost and accuracy (Hansen et al., 2013). However, because our framework combines all individuals within the training set to extract group-wise temporal features during the training phase, the computational resource demands of the 10-fold and leave-one-out methods are greater. Therefore, we opted for the 5-fold approach. To further validate this selection, we conducted a comparative analysis between 2-fold and 5-fold cross-validation in terms of decoding accuracy. The findings revealed that the average decoding rate was slightly lower for 2-fold than for 5-fold, providing additional confirmation of our initial selection (sTab. 1).

Our selection of a 4-layer, 128-neuron DBN structure was based on our previous study utilizing the neural architecture search (NAS) technique for recognizing spatio-temporal features from fMRI data (Xu, Ren, Tao, Song, & He, 2022), which determined the optimal structure for a DBN model to be 3 layers with 120-150 neurons. Therefore, in our study, we fixed the number of neurons at 128 and experimented with both 3-layer and 4-layer configurations to extract meaningful task-related temporal features. Specifically, we compared the group-wise temporal features derived from DBN models with 3-layer and 4-layer structures, in terms of their Pearson correlation coefficient (PCC) with the task paradigm curves, based on the training set (fold 5). The results revealed that the 4-layer DBN performed better in capturing temporal features, as indicated by the higher PCC values observed for the 4-layer structure (Tab. 2). In selecting the number of neurons, we also took computational efficiency into consideration: choosing 128 neurons, a power of two within the desired range of 120-150, optimizes computational speed. Thus, we concluded that the optimal configuration for the DBN model is 128 neurons and 4 layers.

The regularization parameter (λ) plays a crucial role in sparse representation and LASSO regression. Although no gold standard exists for determining the value of λ, previous studies on FBN recognition have experimentally set λ within the range of 0.05 to 0.5 (Fangfei Ge, 2018; Lv, Jiang, Li, Zhu, Chen, et al., 2015; Shu Zhang 2017). In our previous work on task fMRI data classification using a two-stage sparse representation approach, we conducted parameter selection experiments over the range of λ from 0.05 to 0.5 and found that the highest accuracy was achieved when λ1 = 0.1 and λ2 = 0.05 or 0.1 (Song et al., 2022). Here, λ1 and λ2 represent the regularization parameters for the LASSO regression and sparse

representation, respectively. Therefore, in this study, we fixed λ1 at 0.1 and systematically varied the regularization parameter of the sparse representation λ2 (λ2 = 0.05, 0.1) while evaluating its impact on the group-wise spatial features derived from the training set (fold 5). The results showed that when λ2 was set to 0.05, a greater number of FBNs could be identified in the group-wise spatial features D^2 by comparison with the general linear model (GLM)-derived activation patterns (Tab. 3). Consequently, we set λ1 = 0.1 and λ2 = 0.05 as the regularization parameters for the LASSO regression and sparse representation stages, respectively. To further validate this, we assessed the classification accuracy on the testing dataset using the two different λ2 values (0.05, 0.1) while keeping λ1 = 0.1 for all 5 folds. The results demonstrated that λ2 = 0.05 achieved higher accuracy, reconfirming our choice (sTab. 2).

Table 2. Comparison of Pearson correlation coefficients (PCC) for the 3-layer and 4-layer structures.

Structure  Layer1      Layer2      Layer3      Layer4      Mean±SD
3-layer    0.48±0.12   0.52±0.06   0.50±0.06   --          0.50±0.08
4-layer    0.55±0.00   0.63±0.01   0.66±0.03   0.71±0.02   0.64±0.02

Table 3. Comparison of the number of identified FBNs across each layer for different λ2 values.

λ2     Layer1  Layer2  Layer3  Layer4
0.05   15      17      22      45
0.1    12      13      18      27

Identification of multi-level temporal patterns

As mentioned in the "Deep belief network model-based analysis" section, W_j of the j-th hidden layer (j = 1, 2, 3, 4) represents the temporal features of the group-wise tfMRI for the respective layer

(Fig. 1B). Here we used the PCC as a metric to identify the task-related temporal features (Benesty, Chen, Huang, & Cohen, 2009; Lv, Jiang, Li, Zhu, Chen, et al., 2015). Specifically, we first calculated the task paradigm curves convolved with the hemodynamic response function (HRF). Next, we computed the PCC values between the convolved task paradigm curves and the atoms in the group-wise temporal features D^1 derived from the DBN model, following standard procedures employed in previous studies (Kay, Rokem, Winawer, Dougherty, & Wandell, 2013; O'Reilly, Woolrich, Behrens, Smith, & Johansen-Berg, 2012). The PCC between an identified temporal feature and the task-based stimulus is defined in Equation (4):

$$ P_{corr,c} = \mathrm{corr}\left(D_c^{1}, \mathrm{TASK}\right) \tag{4} $$

Here, D_c^1 refers to the c-th component of the temporal features D^1 derived from the DBN stage (c = 1, ..., k_1), and TASK represents the task paradigm curve convolved with the HRF. Essentially, P_{corr,c} measures the temporal similarity between the temporal pattern of D_c^1 and the task stimulus. The atoms with the highest PCC value in the group-wise temporal features D^1 were chosen to represent the multi-layer temporal features.
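A minimal sketch of this PCC screening is shown below: a boxcar task paradigm is convolved with a simple double-gamma HRF and correlated with each dictionary atom. The HRF parameters and the repetition time are illustrative assumptions (the canonical HRF used in practice may differ), not the exact settings of our analysis.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """A simple double-gamma HRF sampled every `tr` seconds (illustrative parameters)."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
    return hrf / hrf.sum()

def task_pcc(atom, boxcar, tr=0.72):
    """Eq. (4): Pearson correlation between one atom of D1 and the HRF-convolved paradigm."""
    regressor = np.convolve(boxcar, canonical_hrf(tr))[: len(boxcar)]
    return np.corrcoef(atom, regressor)[0, 1]

# Example: pccs = [task_pcc(D1[:, c], boxcar) for c in range(D1.shape[1])]
# and the atom with the largest PCC is taken as the task-related temporal feature.
```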

Identification of multi-level spatial patterns

The multi-level spatial patterns can also be identified in the second stage of the sparse representation model. Specifically, S_{i,t}^1 can be factorized into D^1 and the loading coefficients α_{i,t}^1, which represent the group-wise temporal features and the individual spatial features, respectively. Here, i refers to the i-th subject (i ∈ 1, 2, ..., p, with p = 48 in this work), and t denotes the task type, t ∈ Φ = {E, G, R, M, L, S, W}. To further derive the group-wise spatial features, the transposition of α^1 can then be decomposed into D^2 and α^2, as shown in

Equation (5). Since the transpose of α_{i,t}^1 can be expressed as the dictionary D^2 multiplied by the loading coefficients α_{i,t}^2 (Equation (5)), the relationship between S_{i,t}^1 and D^1, D^2, and α^2 can be deduced as shown in Equation (6), which is also consistent with previous studies (Huan Liu 2017; Song et al., 2022).

$$ S_{i,t}^{2} = \left(\alpha_{i,t}^{1}\right)^{T} = D^{2} \times \alpha_{i,t}^{2} \tag{5} $$

$$ S_{i,t}^{1} = D^{1} \times \alpha_{i,t}^{1} = D^{1} \times \left(D^{2} \times \alpha_{i,t}^{2}\right)^{T} \tag{6} $$

Since all subjects share the same group-wise temporal dictionary D^1, the common dictionary D^2 contains the group-wise spatial patterns, whose atoms can be used to define the FBNs. Thus, the corresponding multi-layer spatial features were derived from the common dictionary D^2 for each layer of the proposed framework (the fourth and fifth panels in Fig. 1B).

We then used the spatial correlation coefficient (SCC) to quantify the similarity between the spatial patterns obtained from the proposed framework and the GLM-derived activation patterns. Specifically, the GLM-based analysis was performed individually, followed by group-wise analysis using FSL FEAT (http://www.fmrib.ox.ac.uk/fsl/feat5/index.html). The group-level GLM-based results were employed for comparison. More details of the GLM analysis are available in previous literature (Lv, Jiang, Li, Zhu, Zhang, et al., 2015). The SCC is defined in Equation (7) (Ben J. Harrison, 2008; Zuo et al., 2010):


$$ R(X, T) = \frac{\sum_{p=1}^{n}\left(X_p - \bar{X}\right)\left(T_p - \bar{T}\right)}{\sqrt{\sum_{p=1}^{n}\left(X_p - \bar{X}\right)^{2} \cdot \sum_{p=1}^{n}\left(T_p - \bar{T}\right)^{2}}} \tag{7} $$

where X is the spatial functional network derived by the proposed framework, T represents the GLM-derived activation template, and n refers to the number of voxels in the whole brain.


SVM-based classification method

To further classify the multi-task fMRI signals, we performed five-fold cross-validation to evaluate the classification performance of the proposed framework. As the linear SVM has good optimization and generalization capability with limited sample sizes, as well as proven effectiveness in multi-class classification (Chang & Lin, 2011b; Jang et al., 2017), we conducted the multi-task classification analysis with a linear SVM classifier implemented using the LIBSVM toolbox (Chang & Lin, 2011a). For each layer, as the loading coefficients α^2 contain both temporal and spatial features embedded in the fMRI signals, we first trained the SVM classifier using α^2 derived from the training set, and then evaluated the classification performance by feeding the α^2_test of the testing set into the trained SVM model. Based on the true labels of the seven tasks for each loading coefficient α^2_test, the classification accuracy of each layer in each fold was defined as the percentage of correctly predicted samples. The final classification accuracy for each layer is the average over the five folds for the seven tasks. We then calculated the specificity of each fold for each layer, and the final specificity for each layer is the average over the five folds.
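For reference, the per-layer classification step can be sketched as follows with scikit-learn's linear SVC (which wraps LIBSVM, the toolbox used here); the feature layout, with one row per subject-and-task sample, is an assumption for illustration.

```python
from sklearn.svm import SVC

def classify_layer(features_train, labels_train, features_test, labels_test):
    """Train a linear SVM on training loading coefficients and score the held-out fold.

    features_*: (n_samples, n_features) arrays built from alpha2 / alpha2_test,
    labels_*: task labels (seven classes). Returns the fold classification accuracy.
    """
    clf = SVC(kernel="linear")
    clf.fit(features_train, labels_train)
    return clf.score(features_test, labels_test)
```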

ROA-based analysis

A further goal was to uncover the discriminative functional components for multi-task classification. Inspired by the successful use of the ratio of activation (ROA) in identifying discriminative components for decoding resting-state fMRI (rsfMRI) and tfMRI (S. Zhang et al., 2016), we devised a novel ROA metric to identify the key components for seven-task classification. The ROA of the j-th row of the loading coefficients α^2 is defined as follows:

$$ N_t = \left| \alpha^{2}(j, k) \right|_0, \quad \text{where the } k\text{-th column belongs to task } t $$
$$ \mathrm{ROA}_j = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(N_t - \bar{N}_t\right)^{2}} \tag{8} $$

In Equation (8), α^2 represents all the individual spatio-temporal features, α^2 = [α_1^2, α_2^2, ..., α_i^2, ..., α_p^2] ∈ R^{k_2×(k_1×7×p)} (k_1 = k_2 = 128, p = 48); i refers to the i-th subject (i ∈ 1, 2, ..., p); t is the task index (t ∈ 1, 2, ..., 7); and T is the number of task paradigms (i.e., 7 in our work). Task(t) represents each of the seven different tasks. N_t represents the activation level for each task, and N̄_t represents the average of N_t (t = 1, ..., 7). Here, the activation level N_t was defined by counting the number of non-zero entries belonging to each task in the corresponding row vector of α^2 (t ∈ 1, 2, ..., 7). As α^2 is a sparse matrix, a task with a higher count of non-zero elements in a row vector of α^2 is deemed more "active". Therefore, N_t represents each task's activation level in the row vectors of α^2. The ROA is the standard deviation of N_t across the seven tasks; a larger ROA value (i.e., a larger standard deviation) indicates greater differences in activity levels across the seven tfMRI signals, and hence a component that is more discriminative for multi-task classification.
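A direct NumPy transcription of Equation (8) is given below; it assumes the columns of α^2 carry an integer task label (0-6) and uses the population standard deviation, matching the 1/T factor in the equation.

```python
import numpy as np

def roa_per_component(alpha2, task_of_column):
    """Eq. (8): for each row j of alpha2, count the non-zero entries falling in each
    task's columns (N_t) and return the standard deviation of N_t over the T tasks.

    alpha2: (k2, m) sparse loading matrix; task_of_column: (m,) integer labels in 0..6.
    """
    tasks = np.unique(task_of_column)
    counts = np.stack(
        [(alpha2[:, task_of_column == t] != 0).sum(axis=1) for t in tasks], axis=1
    )                              # (k2, T) matrix of N_t values
    return counts.std(axis=1)      # population std = sqrt(mean((N_t - mean N_t)^2))
```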

To validate that the components with higher ROA values have greater capacity for classifying the multi-task fMRI signals, an experiment was designed as illustrated below. After sorting the ROA values of all components (i.e., rows of the loading coefficients α^2) from highest to lowest, we iteratively adopted more rows of α^2, sorted by their ROA values, as feature inputs for training the SVM classifier; that is, the components with higher ROA values were used preferentially for training. Afterwards, the corresponding components of α^2_test from the testing set were entered into the trained SVM model to evaluate the classification accuracy. Specifically, to define the key components with greater capacity for multi-task classification in each layer, we repeated this ROA analysis using α^2 derived from each layer of the proposed model. Here

we applied the same classification scheme described in the previous "SVM-based classification method" section.
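The ranking-and-retraining experiment can be sketched as below. For simplicity the sketch assumes each component contributes a single column of the per-sample feature matrix; in our actual setting each component (row of α^2, atom of D^2) contributes a block of k_1 features, so this is an illustration of the procedure rather than the exact implementation.

```python
import numpy as np
from sklearn.svm import SVC

def accuracy_vs_top_components(feat_train, y_train, feat_test, y_test, roa_values):
    """Rank components by ROA (highest first) and record test accuracy as more of the
    top-ranked components are included as SVM features (cf. the curves in Fig. 6A)."""
    order = np.argsort(roa_values)[::-1]
    accuracies = []
    for k in range(1, len(order) + 1):
        cols = order[:k]
        clf = SVC(kernel="linear")
        clf.fit(feat_train[:, cols], y_train)
        accuracies.append(clf.score(feat_test[:, cols], y_test))
    return accuracies
```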

After establishing the ROA metric for the classification features α^2, our subsequent objective was to elucidate the neural implications of these classification features. Given that each row of α^2 corresponds to a column of D^2 (i.e., an atom of D^2), and that these atoms can be mapped back to brain space, we established a relationship between the brain activations derived from the atoms of D^2 and the ROA values of the row vectors of α^2. This connection allows us to interpret the neural implications of the classification features.

Results

Classification performance of multi-task fMRI signals

By applying the proposed DBN-SR framework to the multi-task fMRI data using a five-fold cross-validation strategy, our results reveal that the fMRI data of the seven tasks can be accurately classified. In detail, the classification accuracy across the five folds ranges from 92.86% to 100%, with an average accuracy of 97.86%±3.42% (Mean ± SD) in layer 4 (Fig. 2A), which demonstrates that the proposed framework can effectively uncover the inherent differences in the composition patterns of multi-task fMRI signals.

We also explored the classification performance based on features derived from each layer of the proposed framework (Fig. 2). The classification accuracy curves for the five folds are relatively steady, with an average accuracy of 98.15%±0.90% (Mean±SD) (Fig. 2A). Moreover, the average accuracies across the five folds from layer 1 to layer 4 are 99.29%, 98.33%, 97.14%, and 97.86%, respectively. We depicted confusion matrices for each layer to represent the average classification accuracy for the seven tasks, as shown in Figure 2B. The results indicate that all the average classification accuracies for the seven tasks across the five folds are greater than 95% in each layer, except for three major confusions: the gambling task in layers 3 and 4, the relational task in layers 2 and 3, and the language task in layer 3 (Fig. 2B). Additionally, the specificity of the classification results of the first two layers is slightly higher than that of the deeper two layers (Fig. 2C). Overall, the classification performance of the shallower layers is relatively better than that of the deeper layers.

Figure 2. Classification performance. (A) The classification accuracy of the five folds in each layer. (B) The average confusion matrices of the five-fold cross-validation on the seven tasks. (C) The average specificity of the five-fold cross-validation classification on the seven tasks.

Identified multi-level temporal and spatial patterns of multi-task fMRI signals

Multi-level temporal patterns

Our DBN-SR based framework can effectively identify the temporal patterns of multi-task fMRI signals at multiple scales (Fig. 3). In each layer, we quantitatively compared the PCC between the identified temporal features and each task-based stimulus. The atoms with the highest PCC value in the temporal dictionary D^1 were chosen to represent the task-related temporal patterns. We randomly selected one training fold as an example to show the representative temporal patterns for each layer (fold 5) (Fig. 3). The average PCC values of the seven tasks for all five folds can be found in Supplemental Table 6.

The overall multi-level temporal patterns are relatively consistent with the task design paradigms. Specifically, the average PCC of the seven tasks from layer 1 to layer 4 is 0.55±0.12, 0.61±0.03, 0.65±0.07, and 0.71±0.08 (Mean ± SD), respectively, with the highest correlation observed in layer 4 (Fig. 3). Intriguingly, there is a gradient in the resolution of the temporal patterns derived from different layers. In the shallow layers, the identified temporal patterns are mixed with considerable random noise, resulting in relatively poor correlation with the task paradigms. In comparison, in the deeper layers, the temporal patterns are smoother and more consistent with the original task design curves, indicating that the DBN-SR model can filter noise in each layer while keeping useful information about brain activity, which agrees with previous research (H. Huang et al., 2018; Wei Zhang, 2020).

Figure 3. Comparison of group-wise temporal patterns for seven tasks across different layers, including the identified temporal features (blue lines) and the task paradigms (red lines). The quantitative similarities (PCC) of the identified temporal features with the task paradigms are also provided. The y-axis represents the stimulus response amplitude, while the x-axis represents the time point. The background colors represent different layers of our DBN-SR model; lighter colors represent shallower layers, while darker colors represent deeper layers.

Multi-level spatial patterns

Our framework can also effectively identify the spatial patterns in different layers. The most predominant spatial patterns identified by the proposed framework are the task-evoked FBNs, including emotion, gambling, relational, motor, social, language, and working memory. In each layer, we quantitatively compared the SCC between the identified spatial patterns and the GLM-derived activation patterns. The atoms with the highest SCC value in the spatial dictionaries D^2 were chosen to represent the spatial patterns. We randomly selected one training fold to illustrate the representative FBNs for each layer (Fig. 4).

Overall, the spatial patterns are generally consistent with the GLM-derived activation patterns, with increasingly precise resolution from shallow to deep layers. Quantitatively, the average SCC of the seven tasks from layer 1 to layer 4 is 0.36±0.20, 0.26±0.11, 0.40±0.12, and 0.48±0.12 (Mean ± SD), respectively, with the highest SCC observed in layer 4 (Fig. 4). Intriguingly, there are distinct differences among the spatial patterns derived from different layers. The spatial patterns across layers show a trend of increasing consistency with the GLM-derived activation patterns and are more compact in deeper layers for most tasks. Meanwhile, more FBNs can be found in the deeper layers compared with the shallow layers. For example, some FBNs cannot be identified in the first three layers, such as the FBNs related to the gambling and relational tasks (Fig. 4).

Figure 4. Comparison of group-wise spatial patterns for seven tasks across different layers. The spatial correlation coefficient (SCC) between each identified spatial pattern and the GLM-derived activation pattern is labeled on top of each brain map.

Apart from FBNs, the proposed framework can also effectively detect various artifact-related components. Specifically, the atoms in the spatial dictionary D^2 represent the group-wise spatial features and can be mapped back to the 3D brain volume. Subsequently, we manually inspected whether each spatial map matched the known types of artifacts based on a previous study (Salimi-Khorshidi et al., 2014). Through this process, we found several artifact-related components, including movement-related, cardiac-related, sagittal sinus, susceptibility-motion, white-matter, and MRI acquisition/reconstruction-related components (Fig. 5).

Figure 5. Identified artifact components, including movement-related, cardiac-related, sagittal sinus, susceptibility-motion, white-matter, and MRI acquisition/reconstruction-related components.

Overall, our DBN-SR model is capable of characterizing the multi-level spatiotemporal features of brain function. The quantitative analysis further demonstrates that, in the deeper layers, the representative temporal features correspond well with the task design curves, and the spatial features are relatively more consistent with the GLM-derived activations. In addition to task-evoked functional components, our framework can also effectively identify artifact components from group-wise multi-task fMRI data, laying the groundwork for further research into the functional role of these components in multi-task classification.

Identification of discriminative features by ROA analysis

As described in the "ROA-based analysis" section, we first computed the ROA index by sorting the ROA values of all the components in the loading coefficients α^2 of the training set; then, in order to evaluate the classification performance, the corresponding components of the loading

525

coefficient 𝜶𝑡𝑒𝑠𝑡

2

of testing set were fed sequentially into the trained SVM classifier according

526

to the ROA index. Hier, the classification results of each layer on one randomly selected testing

527

fold dataset (fold 5) using different number of components, sorted by their ROA values, Sind

528

illustrated in Fig. 6A. While the number of components increases from 1 Zu 20, the accuracy

529

curves of four layers grow monotonically, and the average accuracy of all curves rises to

530

91.96%. When more than twenty components are included for classification, the accuracy

531

curves of four layers exhibit a plateau with accuracies reaching close to 100%, indicating that

532

the additional components with lower ROA values contribute less to the successful

533

classification of multi-task signals. Daher, the top twenty components with higher ROA values

534

can be regarded as key components for the classification task to some extent. Generally, unser

535

method can effectively disclose the key components with great classification capacity. In

536

addition, the findings are consistent across different testing folds, hence the additional results

537

of the other four folds are included in the Supplementary Materials (sFig2-5).
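A minimal sketch of this incremental evaluation is given below. It is illustrative only: the loading coefficients, task labels, and ROA values are assumed to be precomputed, scikit-learn's SVC (a LIBSVM wrapper) stands in for the trained SVM classifier, and the classifier is refit on each top-k subset for simplicity, which may differ from the exact protocol.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def roa_accuracy_curve(alpha_train, alpha_test, y_train, y_test, roa_values, max_k=40):
    # alpha_train, alpha_test: (n_samples, n_components) loading coefficients
    # roa_values: (n_components,) ROA value of each component
    order = np.argsort(roa_values)[::-1]        # ROA index: components sorted by descending ROA
    accuracies = []
    for k in range(1, max_k + 1):
        cols = order[:k]                        # top-k components by ROA
        clf = SVC(kernel="linear").fit(alpha_train[:, cols], y_train)
        accuracies.append(accuracy_score(y_test, clf.predict(alpha_test[:, cols])))
    return np.array(accuracies)                 # accuracy as a function of k (cf. Fig. 6A)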

To further investigate the neural implications of the key components with greater classification capacity, we inspected the spatial patterns of the top twenty key components identified by the ROA analysis in each layer. Analyzing the composition of these twenty key components in each layer, we found that the key atoms are either FBNs or artifact-related components, which were identified by visually examining their spatial patterns against established templates and by further calculating their SCC with the GLM-derived activation maps.
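The SCC here is a spatial correlation between two brain maps restricted to brain voxels. A minimal illustrative sketch, computing it as a Pearson correlation (an assumption consistent with the Benesty et al. reference) with assumed array shapes, is:

import numpy as np
from scipy.stats import pearsonr

def spatial_correlation(component_map, glm_map, mask):
    # component_map, glm_map: 3D arrays of identical shape
    # mask: 3D boolean array selecting brain voxels
    r, _ = pearsonr(component_map[mask].ravel(), glm_map[mask].ravel())
    return r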

Intriguingly, our results show that the top twenty key components in the four layers are largely composed of artifacts, while the proportion of FBNs among the key components is small overall. On the other hand, the proportion of FBNs is relatively higher in deeper layers than in shallower layers (Fig. 6B). This conclusion aligns with the findings obtained when using the top 40 components as key components (sFig. 8).

Figure 6. ROA classification results in each layer (fold 5). (A) Classification accuracy for SVM-based classification in the four layers using different numbers of components sorted by their ROA values. (B) The composition of the twenty key components sorted by ROA value in each layer.

Discussion

In this study, we proposed a hybrid spatio-temporal deep belief network and sparse representation framework to decode multi-task fMRI signals on a relatively small cohort. Our framework classified the fMRI signals of seven tasks with high accuracy and detected multi-level temporal patterns and FBNs, suggesting the effectiveness of the proposed method. Additionally, our framework can reveal the key components in multi-task classification, including artifact components and functional brain networks, and uncover their underlying neurological implications.

Our proposed framework is composed of several elements, including a DBN model, LASSO regression, sparse representation, and an SVM classifier, resulting in a relatively complex structure. Nevertheless, it achieved higher classification accuracy than prior research that also classified the 7 task states on the HCP dataset (X. Huang, Xiao, & Wu, 2021; Wang et al., 2020), while also yielding interpretable classification components. Specifically, Wang et al. (2020) reported two standard machine learning algorithms, namely MVPA-SVM and a DNN, and X. Huang et al. (2021) proposed a novel framework (CRNN) incorporating multiple modules such as a CNN, a recurrent neural network (RNN), and an attention mechanism. The average accuracy of our framework (98.15%) is much higher than that of MVPA-SVM (69.2%) and comparable to the accuracies of the DNN-based model (93.7%) and the CRNN-based model (94.31%) (X. Huang et al., 2021; Wang et al., 2020). Additionally, the neuroscientific implications of their results remain elusive. In conclusion, our proposed model achieved higher decoding accuracy than these models, while also providing a more comprehensive and interpretable methodology for decoding fMRI data.

Furthermore, our model unveils multi-level temporal and spatial patterns, demonstrating a resolution gradient spanning from shallow to deep layers. Specifically, in the deeper layers, the identified temporal features are better correlated with the original task paradigm curves. Meanwhile, more diverse FBNs can be detected in deeper layers, and their spatial features show more consistency with the GLM-derived activation patterns.

Intriguingly, although more higher-order FBNs can be detected in deeper layers, the classification accuracy obtained with features derived from deeper layers is lower than that of shallower layers, indicating that these higher-order FBNs are not especially helpful for multi-task classification. To validate this observation, we selected only the FBN components from all available components across all five folds for multi-task classification, resulting in an average accuracy of 97.08%±2.14% (Mean±SD), slightly lower than the classification rate obtained using all components (98.15%±0.90%) (sTab. 3). A possible reason is that the FBNs evoked by different cognitive tasks may share co-activated brain regions, so the FBN components alone may not fully reveal the fundamental differences in the functional composition patterns of multi-task fMRI data. On the other hand, the ROA-based analyses indicate that artifact components occupy a higher proportion of the key components for multi-task classification in shallower layers than in deeper layers, along with higher classification accuracy and specificity in the shallower layers. These findings suggest that artifact components play an important role in multi-task fMRI signal classification, which is consistent with previous research showing that artifact components of the EEG signal are significantly more informative than brain activity with respect to classification accuracy (McDermott et al., 2021).
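For illustration only, the FBN-only comparison can be sketched as below, assuming each fold provides loading coefficients, labels, and a boolean mask marking which components are FBNs; these names and the data structure are hypothetical rather than taken from our implementation.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def fbn_only_accuracy(folds):
    # folds: list of dicts with 'alpha_train', 'alpha_test', 'y_train', 'y_test',
    #        and 'is_fbn' (boolean mask marking which components are FBNs)
    accuracies = []
    for fold in folds:
        cols = np.flatnonzero(fold["is_fbn"])   # keep only the FBN components
        clf = SVC(kernel="linear").fit(fold["alpha_train"][:, cols], fold["y_train"])
        pred = clf.predict(fold["alpha_test"][:, cols])
        accuracies.append(accuracy_score(fold["y_test"], pred))
    accuracies = np.asarray(accuracies)
    return accuracies.mean(), accuracies.std(ddof=1)   # summarized as Mean±SD over folds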

While our study provides novel insight into the core functional components in decoding multi-task fMRI signals, it is important to note three limitations. The first limitation is the manual setting of parameters for the DBN and sparse representation framework, mainly including the number of neuron nodes and layers in the DBN and the sparsity penalty parameter of the SR. Thus, automatic optimization of the model parameters is one of the future research directions. The second limitation stems from our inability to detect FBNs related to the gambling and relational tasks within the first two to three layers of the DBN-SR framework. This could be attributed to more noise being present in the group-wise temporal features 𝑫1 extracted at lower levels (Fig. 1). Additionally, LASSO regression may not be well suited for handling noisy shallow features, making it challenging to accurately capture the underlying spatial patterns. To address this limitation, future studies could explore alternative regression approaches that are better suited for noisy shallow features, thereby improving the recovery of the underlying spatial patterns. The third limitation is that our study employed a relatively small dataset, consisting of 60 individuals out of 68 from the HCP Q1 dataset. To assess the robustness of our model, we included the remaining 8 individuals from the same dataset as a hold-out dataset, 6 of whom do not have complete data for all 7 tasks (sTab. 4). However, this does not affect their suitability as an independent lock-box dataset for testing the performance of our trained model. The results revealed that the average decoding accuracy for these 8 individuals (96.43%) was comparable to the 5-fold cross-validation accuracy of the 60 individuals (sTab. 5), suggesting the robustness of our model. Nevertheless, we acknowledge that a larger dataset would lend further support to our findings. In future work, we aim to apply our model to more extensive or multicenter datasets to evaluate its generalizability and robustness.
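The lock-box check amounts to fitting the classifier once on the 60-subject training features and scoring it a single time on the 8 held-out subjects. A minimal illustrative sketch (variable names assumed, not our exact code) is:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def lockbox_accuracy(X_train, y_train, X_holdout, y_holdout):
    # Fit once on the training cohort, evaluate once on the untouched hold-out cohort.
    clf = SVC(kernel="linear").fit(X_train, y_train)
    return accuracy_score(y_holdout, clf.predict(X_holdout))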

Overall, given the interpretability and effectiveness of the DBN-SR model on small datasets, our framework could potentially be useful for differentiating abnormal brain function in clinical research.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62006187, 61971350, and 12271434), the Youth Innovation Team Foundation of the Education Department of Shaanxi Province Government (Grant No. 21JP119), the China Postdoctoral Science Foundation Funded Project (Grant No. 2021M702650), the Natural Science Basic Research Program of Shaanxi (Grant No. 2023-JC-JQ-57), and the Key Research and Development Program Project of Shaanxi Province (Grant No. 2020SF-036). We thank the Human Connectome Project for providing the Quarter 1 (Q1) dataset (https://www.humanconnectome.org/study/hcp-young-adult/document/q1-data-release).

References

Fischer, A., & Igel, C. (2012). An introduction to restricted Boltzmann machines. Paper presented at the Iberoamerican Congress on Pattern Recognition, Berlin.
Barch, D. M., Burgess, G. C., Harms, M. P., Petersen, S. E., Schlaggar, B. L., Corbetta, M., . . . WU-Minn HCP Consortium. (2013). Function in the human connectome: task-fMRI and individual differences in behavior. Neuroimage, 80, 169-189. doi:10.1016/j.neuroimage.2013.05.033
Harrison, B. J., Pujol, J., López-Solà, M., Hernández-Ribas, R., Deus, J., Ortiz, H., Soriano-Mas, C., Yücel, M., Pantelis, C., & Cardoner, N. (2008). Consistency and functional specialization in the default mode brain network. PNAS, 105, 9781–9786.
Benesty, J., Chen, J., Huang, Y., & Cohen, I. (2009). Pearson correlation coefficient. In Noise reduction in speech processing (pp. 1-4). Springer.
Bengio, Y., Courville, A. C., & Vincent, P. (2012). Unsupervised feature learning and deep learning: A review and new perspectives. CoRR, abs/1206.5538.
Liu, B., Wei, Y., Zhang, Y., & Yang, Q. (2017, August). Deep neural networks for high dimension, low sample size data. Paper presented at IJCAI, Melbourne.
Calhoun, V. D., Adali, T., McGinty, V. B., Pekar, J. J., Watson, T. D., & Pearlson, G. D. (2001). fMRI activation in a visual-perception task: network of areas detected using the general linear model and independent components analysis. Neuroimage, 14(5), 1080-1088. doi:10.1006/nimg.2001.0921
Chang, C.-C., & Lin, C.-J. (2011a). LIBSVM. ACM Transactions on Intelligent Systems and Technology, 2(3), 1-27. doi:10.1145/1961189.1961199
Chang, C.-C., & Lin, C.-J. (2011b). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 1-27.
Davatzikos, C., Ruparel, K., Fan, Y., Shen, D. G., Acharyya, M., Loughead, J. W., . . . Langleben, D. D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. Neuroimage, 28(3), 663-668. doi:10.1016/j.neuroimage.2005.08.009
Dong, Q. (2020). Modeling hierarchical brain networks via volumetric sparse deep belief network (VSDBN). Computerized Medical Imaging and Graphics.
Ge, F., Lv, J., Hu, X., Guo, L., Han, J., Zhao, S., & Liu, T. (2018, April 4-7). Exploring intrinsic networks and their interactions using group wise temporal sparse coding. Paper presented at the International Symposium on Biomedical Imaging (ISBI 2018), Washington, D.C., USA.
Fisher, R. A., & Yates, F. (1938). Statistical tables for biological, agricultural and medical research.
Friston, K. J. (2009). Modalities, modes, and models in functional neuroimaging. Science, 326, 399-403.
Hansen, K., Montavon, G., Biegler, F., Fazli, S., Rupp, M., Scheffler, M., . . . Müller, K. R. (2013). Assessment and validation of machine learning methods for predicting molecular atomization energies. J Chem Theory Comput, 9(8), 3404-3419. doi:10.1021/ct400195d
Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., & Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87, 96-110.
Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nat Rev Neurosci, 7(7), 523-534. doi:10.1038/nrn1931
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Comput, 18(7), 1527-1554. doi:10.1162/neco.2006.18.7.1527
Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 282-317).
Hu, J., Kuang, Y., Liao, B., Cao, L., Dong, S., & Li, P. (2019). A multichannel 2D convolutional neural network model for task-evoked fMRI data classification. Comput Intell Neurosci, 2019, 5065214. doi:10.1155/2019/5065214
Hu, X., Huang, H., Peng, B., Han, J., Liu, N., Lv, J., . . . Liu, T. (2018). Latent source mining in fMRI via restricted Boltzmann machine. Hum Brain Mapp, 39(6), 2368-2380. doi:10.1002/hbm.24005
Liu, H., Zhang, M., Hu, X., Ren, Y., Zhang, S., Han, J., Guo, L., & Liu, T. (2017). fMRI data classification based on hybrid temporal and spatial sparse representation. Paper presented at the IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia.
Huang, H., Hu, X., Zhao, Y., Makkie, M., Dong, Q., Zhao, S., . . . Liu, T. (2018). Modeling task fMRI data via deep convolutional autoencoder. IEEE Trans Med Imaging, 37(7), 1551-1561. doi:10.1109/TMI.2017.2715285
Huang, X., Xiao, J., & Wu, C. (2021). Design of deep learning model for task-evoked fMRI data classification. Comput Intell Neurosci, 2021, 6660866. doi:10.1155/2021/6660866
Jang, H., Plis, S. M., Calhoun, V. D., & Lee, J. H. (2017). Task-specific feature extraction and classification of fMRI volumes using a deep neural network initialized with a deep belief network: Evaluation using sensorimotor tasks. Neuroimage, 145(Pt B), 314-328. doi:10.1016/j.neuroimage.2016.04.003
Kay, K., Rokem, A., Winawer, J., Dougherty, R., & Wandell, B. (2013). GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Frontiers in Neuroscience, 247.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Paper presented at IJCAI.
Kriegeskorte, N., & Bandettini, P. (2007). Analyzing for information, not activation, to exploit high-resolution fMRI. Neuroimage, 38(4), 649-662.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. doi:10.1038/nature14539
Lee, J., Jeong, Y., & Ye, J. C. (2013). Group sparse dictionary learning and inference for resting-state fMRI analysis of Alzheimer's disease. Paper presented at the 2013 IEEE 10th International Symposium on Biomedical Imaging.
Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., . . . Sanchez, C. I. (2017). A survey on deep learning in medical image analysis. Med Image Anal, 42, 60-88. doi:10.1016/j.media.2017.07.005
Liu, X., He, P., Chen, W., & Gao, J. (2019). Multi-task deep neural networks for natural language understanding. arXiv preprint arXiv:1901.11504.
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature, 453(7197), 869-878.
Lv, J., Jiang, X., Li, X., Zhu, D., Chen, H., Zhang, T., . . . Liu, T. (2015). Sparse representation of whole-brain fMRI signals for identification of functional networks. Med Image Anal, 20(1), 112-134. doi:10.1016/j.media.2014.10.011
Lv, J., Jiang, X., Li, X., Zhu, D., Zhang, S., Zhao, S., . . . Liu, T. (2015). Holistic atlases of functional networks and interactions reveal reciprocal organizational architecture of cortical function. IEEE Trans Biomed Eng, 62(4), 1120-1131. doi:10.1109/TBME.2014.2369495
McDermott, E. J., Raggam, P., Kirsch, S., Belardinelli, P., Ziemann, U., & Zrenner, C. (2021). Artifacts in EEG-based BCI therapies: Friend or foe? Sensors (Basel), 22(1). doi:10.3390/s22010096
Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., & Muharemagic, E. (2015). Deep learning applications and challenges in big data analytics. Journal of Big Data, 2(1), 1-21.
O'Reilly, J. X., Woolrich, M. W., Behrens, T. E., Smith, S. M., & Johansen-Berg, H. (2012). Tools of the trade: psychophysiological interactions and functional connectivity. Social Cognitive and Affective Neuroscience, 7(5), 604-609.
Qiang, N., Dong, Q., Zhang, W., Ge, B., Ge, F., Liang, H., . . . Liu, T. (2020). Modeling task-based fMRI data via deep belief network with neural architecture search. Comput Med Imaging Graph, 83, 101747. doi:10.1016/j.compmedimag.2020.101747
Rashid, M., Singh, H., & Goyal, V. (2020). The use of machine learning and deep learning algorithms in functional magnetic resonance imaging—a systematic review. Expert Systems, 37(6), e12644.
Ren, Y., Xu, S., Tao, Z., Song, L., & He, X. (2021). Hierarchical spatio-temporal modeling of naturalistic functional magnetic resonance imaging signals via two-stage deep belief network with neural architecture search. Front Neurosci, 15, 794955. doi:10.3389/fnins.2021.794955
Rubin, T. N., Koyejo, O., Gorgolewski, K. J., Jones, M. N., Poldrack, R. A., & Yarkoni, T. (2017). Decoding brain activity using a large-scale probabilistic functional-anatomical atlas of human cognition. PLoS Comput Biol, 13(10), e1005649. doi:10.1371/journal.pcbi.1005649
Salimi-Khorshidi, G., Douaud, G., Beckmann, C. F., Glasser, M. F., Griffanti, L., & Smith, S. M. (2014). Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers. Neuroimage, 90, 449-468. doi:10.1016/j.neuroimage.2013.11.046
Zhang, S., Li, X., Guo, L., & Liu, T. (2017, April 18-21). Exploring human brain activation via nested sparse coding and functional operators. Paper presented at the International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia.
Song, L., Ren, Y., Hou, Y., He, X., & Liu, H. (2022). Multitask fMRI data classification via group-wise hybrid temporal and spatial sparse representations. eNeuro, 9(3). doi:10.1523/ENEURO.0478-21.2022
Koyamada, S., Shikauchi, Y., Nakae, K., Koyama, M., & Ishii, S. (2015). Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv preprint.
Dehaene, S., Le Clec'H, G., Cohen, L., Poline, J.-B., van de Moortele, P.-F., & Le Bihan, D. (1998). Inferring behavior from functional brain images.
Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B, 73, 273-282.
Varoquaux, G., & Thirion, B. (2014). How machine learning is shaping cognitive neuroimaging. GigaScience, 3(1), 1-7.
Vieira, S., Pinaya, W. H., & Mechelli, A. (2017). Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications. Neurosci Biobehav Rev, 74(Pt A), 58-75. doi:10.1016/j.neubiorev.2017.01.002
Wang, X., Liang, X., Jiang, Z., Nguchu, B. A., Zhou, Y., Wang, Y., . . . Qiu, B. (2020). Decoding and mapping task states of the human brain via deep learning. Hum Brain Mapp, 41(6), 1505-1519. doi:10.1002/hbm.24891
Zhang, W., Zhao, S., Hu, X., Dong, Q., Huang, H., Zhang, S., Zhao, Y., Dai, H., Ge, F., Guo, L., & Liu, T. (2020). Hierarchical organization of functional brain networks revealed by hybrid spatiotemporal deep learning. Brain Connectivity, 10. doi:10.1089/brain.2019.0701
Wen, D., Wei, Z., Zhou, Y., Li, G., Zhang, X., & Han, W. (2018). Deep learning methods to process fMRI data and their application in the diagnosis of cognitive impairment: A brief overview and our opinion. Front Neuroinform, 12, 23. doi:10.3389/fninf.2018.00023
Xu, S., Ren, Y., Tao, Z., Song, L., & He, X. (2022). Hierarchical individual naturalistic functional brain networks with group consistency uncovered by a two-stage NAS-volumetric sparse DBN framework. eNeuro, 9(5). doi:10.1523/ENEURO.0200-22.2022
Zhang, S., Li, X., Lv, J., Jiang, X., Guo, L., & Liu, T. (2016). Characterizing and differentiating task-based and resting state fMRI signals via two-stage sparse representations. Brain Imaging Behav, 10(1), 21-32. doi:10.1007/s11682-015-9359-7
Zhang, Y., Tetrel, L., Thirion, B., & Bellec, P. (2021). Functional annotation of human cognitive states using deep graph convolution. Neuroimage, 231, 117847. doi:10.1016/j.neuroimage.2021.117847
Zuo, X. N., Kelly, C., Adelstein, J. S., Klein, D. F., Castellanos, F. X., & Milham, M. P. (2010). Reliable intrinsic connectivity networks: test-retest evaluation using ICA and dual regression approach. Neuroimage, 49(3), 2163-2177. doi:10.1016/j.neuroimage.2009.10.080

Author Summary

Decoding different cognitive processes using task-based functional magnetic resonance imaging (tfMRI) is crucial for understanding the relationship between brain activities and cognitive states. However, existing machine learning-based feature extraction methods for decoding brain states may struggle to capture the complex and precise spatiotemporal patterns of brain activity from the highly noisy raw fMRI data. Additionally, current deep learning-based end-to-end decoding models struggle to unveil interpretable components in tfMRI signal decoding.

To address these limitations, we proposed a novel framework, the hybrid spatio-temporal deep belief network and sparse representations (DBN-SR) framework, which effectively distinguished multi-task fMRI signals with an average accuracy of 97.86%. Moreover, it simultaneously identified multi-level temporal and spatial patterns of multiple cognitive tasks. By utilizing a novel Ratio-of-Activation metric, our framework unveiled interpretable components with greater classification capacity, offering an effective methodology for basic neuroscience and clinical research.
