
REVIEW

Disentangling causal webs in the brain using
functional magnetic resonance imaging:
A review of current approaches

Natalia Z. Bielczyk1,2, Sebo Uithol1,3, Tim van Mourik1,2, Paul Anderson1,4, Jeffrey C. Glennon1,2, and Jan K. Buitelaar1,2

1 Donders Institute for Brain, Cognition and Behavior, Nijmegen, the Netherlands
2 Department of Cognitive Neuroscience, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands
3 Bernstein Centre for Computational Neuroscience, Charité Universitätsmedizin, Berlin, Germany
4 Faculty of Science, Radboud University Nijmegen, Nijmegen, the Netherlands

an open access journal

Keywords: Causal inference, Effective connectivity, Functional Magnetic Resonance Imaging,
Dynamic Causal Modeling, Granger Causality, Structural Equation Modeling, Bayesian Nets,
Directed Acyclic Graphs, Pairwise inference, Large-scale brain networks

ABSTRACT

In the past two decades, functional Magnetic Resonance Imaging (fMRI) has been used
to relate neuronal network activity to cognitive processing and behavior. Recently, this
approach has been augmented by algorithms that allow us to infer causal links between
component populations of neuronal networks. Multiple inference procedures have been
proposed to approach this research question, but so far each method has limitations when it
comes to establishing whole-brain connectivity patterns. In this paper, we discuss eight ways
to infer causality in fMRI research: Bayesian Nets, Dynamic Causal Modelling, Granger
Causality, Likelihood Ratios, Linear Non-Gaussian Acyclic Models, Patel’s Tau, Structural
Equation Modelling, and Transfer Entropy. We finish by formulating some recommendations
for future directions in this area.

INTRODUCTION

What is causality?

Although inferring causal relations is a fundamental aspect of scientific research, the notion
of causation itself is notoriously difficult to define. The basic idea is straightforward: if
process A is the cause of process B, A necessarily precedes B, and without A, B would
not occur. But in practice, and in dynamic systems such as the brain in particular, the picture
is far less clear. First, for any event a large number of (potential) causes can be identified. The
efficacy of a certain neuronal process in producing behavior depends on the state of many
other (neuronal) processes, but also on the availability of glucose and oxygen in the brain, and
so forth. In a neuroscientific context, we are generally not interested in most of these causes,
but only in a cause that stands out in such a way that it is deemed to provide a substantial part
of the explanation, for instance, causes that vary with the experimental conditions. However,
the contrast between relevant and irrelevant causes (in terms of explanatory power) is arbitrary
and strongly dependent on the experimental setup, contextual factors, and so on. For example,
respiratory movement is typically considered a confound in fMRI experiments, unless the
research question concerns the influence of respiration speed on the dynamics of the neuronal
networks.

Citation: Bielczyk, N. Z., Uithol, S.,
van Mourik, T., Anderson, P.,
Glennon, J. C., & Buitelaar, J. K. (2019).
Disentangling causal webs in the brain
using functional magnetic resonance
imaging: A review of current
approaches. Network Neuroscience,
3(2), 237–273. https://doi.org/10.1162/
netn_a_00062

DOI:
https://doi.org/10.1162/netn_a_00062

Received: 13 March 2018
Accepted: 08 June 2018

Competing Interests: The authors have
declared that no competing interests
exist.

Corresponding Author:
Natalia Z. Bielczyk
natalia.bielczyk@gmail.com

Handling Editor:
Olaf Sporns

Copyright: © 2018
Massachusetts Institute of Technology
Published under a Creative Commons
Attribution 4.0 International
(CC BY 4.0) license

The MIT Press


Disentangling causal webs in the brain using fMRI

In dynamic systems, causal processes are unlikely to be part of a unidirectional chain
of events; rather, they form a causal web, often with mutual influences between processes A and B
(Mannino & Bressler, 2015). As a result, it is hard to maintain the temporal ordering of cause
and effect and, indeed, a clear separation between them (Schurger & Uithol, 2015).

Moreover, causation can never be observed directly, only correlation (Hume, 1772).
When a correlation is highly stable, we are inclined to infer a causal link. Additional
information is then needed to assess the direction of the assumed causal link, as correlation
indicates association and not causation (Altmann & Krzywiński, 2015). For example, the
motor cortex is always active when a movement is made, so we assume a causal link between
the two phenomena. The anatomical and physiological properties of the motor cortex, and
the timing of the two phenomena, provide clues about the direction of causality (i.e., cortical
activity causes the movement, and not the other way around). However, only interventional
studies, such as delivering Transcranial Magnetic Stimulation pulses over the motor cortex
(Kim, Pesiridou, & O’Reardon, 2009) or lesion studies, can confirm the causal link between the
activity in the motor cortex and behavior.

Causal studies in fMRI are based on three types of correlations: correlating neuronal activity
to (1) mental and behavioral phenomena, (2) physiological states (such as neurotransmitters,
hormones, etc.), and (3) neuronal activity in other parts of the brain. In this review, we
focus on the last field of research: establishing causal connections between activity in two or
more brain areas.

A Note on the Limitations of fMRI Data

fMRI studies currently use a variety of algorithms to infer causal links (Fornito, Zalesky, &
Breakspear, 2013; S. Smith et al., 2011). All these methods have different assumptions,
advantages, and disadvantages (see, e.g., Stephan & Roebroeck, 2012; Valdes-Sosa, Roebroeck,
Daunizeau, & Friston, 2011), and approach the problem from different angles. An important
reason for this variety of approaches is the complex nature of fMRI data, which imposes severe
restrictions on the possibility of finding causal relations using fMRI.

• Temporal resolution and hemodynamics. First, and best known, the temporal resolution of image acquisition in MR imaging is generally restricted to a sampling rate below 1[Hz]. Recently, multiband fMRI protocols have gained in popularity (Feinberg & Setsompop, 2013); they increase the upper limit for the scanning frequency to up to 10[Hz], albeit at the cost of a severely decreased signal-to-noise ratio. However, no imaging protocol (including multiband imaging) can overcome the limitation of the recorded signal itself: the lagged change in blood oxygenation, which peaks 3 to 6[s] after neuronal firing in the adult human brain (Arichi et al., 2012). The hemodynamic response thus acts as a low-pass filter, which results in high correlations between activity in consecutive frames (J. D. Ramsey et al., 2010). Since the hemodynamic lags (understood as the peaks of the hemodynamic response) are region- and subject-specific (Devonshire et al., 2012) and vary over time (Glomb, Ponce-Alvarez, Gilson, Ritter, & Deco, 2017), it is difficult to infer causality between two time series with potentially different hemodynamic lags (Bielczyk, Llera, Buitelaar, Glennon, & Beckmann, 2017). Computational work by Seth, Chorley, and Barnett (2013) suggests that upsampling the signal to low repetition times (TRs) (< 0.1[s]) might potentially overcome this issue. Furthermore, hemodynamics typically fluctuates in time. These slow fluctuations, similarly to other low-frequency artifacts such as heartbeat or body movements, should be removed from the datasets through high-pass filtering before the inference procedure (J. D. Ramsey, Sanchez-Romero, & Glymour, 2014).

• Signal-to-noise ratio. Second, fMRI data are characterized by a relatively low signal-to-noise ratio.
In gray matter, the recorded hemodynamic response changes by 1 to 2% at field strengths of 1.5–2.0[T] (Boxerman et al., 1995; Ogawa et al., 1993), and by 5 to 6% at field strengths of 4.0[T]. Moreover, typical fMRI protocols generate relatively short time series. For example, the Human Connectome Project resting-state datasets (Essen et al., 2013) contain at most a few hundred to a few thousand samples. The two most popular ways of improving the signal-to-noise ratio in fMRI datasets are averaging signals over multiple voxels (K. J. Friston, Ashburner, Kiebel, Nichols, & Penny, 2007) and spatial smoothing (Triantafyllou, Hoge, & Wald, 2006).

• Caveats associated with region definition. Third, in order to propose a causal model, one first needs to define the nodes of the network. A single voxel does not represent a biologically meaningful part of the brain (Stanley et al., 2013). Therefore, before attempting to establish causal connections in the network, one needs to integrate the BOLD time series over regions of interest (ROIs): groups of voxels that are assumed to share a common signal with a neuroscientific meaning. Choosing the optimal ROIs for a study is a complex problem (Fornito et al., 2013; Kelly et al., 2012; Marrelec & Fransson, 2011; Poldrack, 2007; Thirion, Varoquaux, Dohmatob, & Poline, 2014). In task-based fMRI, ROIs are often chosen on the basis of activation patterns revealed by the standard General Linear Model analysis (K. J. Friston et al., 2007). On the other hand, in research on resting-state brain activity, the analysis is usually exploratory, and connectivity in larger, meso- and macroscale networks is typically considered. In that case, a few strategies for ROI definition are possible. First, one can define ROIs on the basis of brain anatomy. However, a consequence of this strategy could be that BOLD activity related to the cognitive process of interest will be mixed with other, unrelated activity within the ROIs.
This is particularly likely to happen given that brain structure is not exactly replicable across individuals, so that a specific area cannot be defined reliably based on location alone. As indicated in the computational study by S. Smith et al. (2011), and also in a recent study by Bielczyk et al. (2017), such signal mixing is detrimental to causal inference and causes all the existing methods for causal inference in fMRI to underperform. What these studies demonstrate is that parcellating into ROIs based on anatomy rather than common activity can induce additional scale-free background noise in the networks. Since this noise has high power in low frequencies, the modeled BOLD response cannot effectively filter it out. As a consequence, the signatures of different connectivity patterns get lost. As an alternative to anatomical parcellation, ROIs can be chosen in a functional, data-driven fashion. Multiple techniques have been developed to reach this goal; recent examples include Instantaneous Correlations Parcellation implemented through a hierarchical Independent Component Analysis (ICP; van Oort et al., 2017), probabilistic parcellation based on the Chinese restaurant process (Janssen, Jylänki, Kessels, & van Gerven, 2015), graph clustering based on intervoxel correlations (van den Heuvel, Mandl, & Pol, 2008), large-scale network identification through comparison between correlations among ROIs and a model of the correlations generated by the noise (LSNI; Bellec et al., 2006), multi-level bootstrap analysis (Bellec, Rosa-Neto, Lyttelton, Benali, & Evans, 2010), clustering of voxels revealing common causal patterns in terms of Granger Causality (DSouza, Abidin, Leistritz, & Wismüller, 2017), spatially constrained hierarchical clustering (Blumensath et al., 2013), and algorithms providing a trade-off between machine learning techniques and knowledge coming from neuroanatomy (Glasser et al., 2016). Another possibility to reduce the effect of mixing signals is to perform Principal Component Analysis (PCA; Jolliffe, 2002; Shlens, 2014): instead of averaging activity over full anatomical regions, one separates the BOLD time series within each anatomical region into a sum of orthogonal signals (eigenvariates) and chooses only the signal with the highest contribution to the BOLD signal (the first eigenvariate; K. J. Friston, Harrison, & Penny, 2003). Finally, one can build ROIs on the basis of patterns of activation only (task localizers; Fedorenko, Hsieh, Nieto-Castañón, Whitfield-Gabrieli, & Kanwisher, 2010; Heinzle, Wenzel, & Haynes, 2012). However, this approach cannot be applied to resting-state research.

Causal inference: Inferring direct causal effects within a given network based on available empirical data, e.g., BOLD fMRI recordings in the nodes of the network.

In this work, we assume that the definition of ROIs has been performed by the researcher prior to the causal inference, and we do not discuss it any further.

Criteria for Evaluating Methods for Causal Inference in Functional Magnetic Resonance Imaging

Given the aforementioned characteristics of fMRI data (low temporal resolution, slow hemodynamics, low signal-to-noise ratio) and the fact that causal webs in the brain are likely dense and dynamic, is it in principle possible to investigate causality in the brain by using fMRI? Multiple distinct families of models have been developed to approach this problem over the past two decades. One can look at the methods from different angles and classify them into different categories. One important distinction, proposed by K. Friston, Moran, and Seth (2013), divides methods with respect to the depth of the neuroimaging measurements at which a method is defined. Most methods (such as the original formulation of Structural Equation Modeling for fMRI (McIntosh & Gonzalez-Lima, 1994); see section Structural Equation Modeling) operate on the experimental observables, that is, the measured BOLD responses. These methods are referred to as directed functional connectivity measures. In contrast, other methods (e.g., Dynamic Causal Modeling) consider the underlying neuronal processes. These methods are referred to as effective connectivity measures. Note that while some methods, such as Dynamic Causal Modeling, are hardwired to assess effective connectivity (as they are built upon a generative model), other methods can be used to assess either directed functional connectivity or effective connectivity. For example, in Granger Causality research, blind deconvolution is often used to deconvolve the observed BOLD responses into an underlying neuronal time series (David et al., 2008; Goodyear et al., 2016; Hutcheson et al., 2015; Ryali et al., 2016; Ryali, Supekar, Chen, & Menon, 2011; Sathian, Deshpande, & Stilla, 2013; Wheelock et al., 2014), which allows for assessing effective connectivity. In contrast, when Granger Causality is used without deconvolution (Y. C. Chen et al., 2017; Regner et al., 2016; Zhao et al., 2016), it is a directed functional connectivity method. Of course, both scenarios have pros and cons, as blind deconvolution can be a very noisy operation (Bush et al., 2015); for more details, see K. Friston, Moran, and Seth (2013). Another important distinction was proposed by Valdes-Sosa et al. (2011).
According to this point of view, methods can be divided on the basis of their approach to the temporal sequence of the samples. Some methods are based on the temporal sequence of the signals (e.g., Transfer Entropy (Schreiber, 2000; see section Transfer Entropy) or Granger Causality (Granger, 1969; see section Granger Causality)) or rely on dynamics expressed by state-space equations (so-called state-space models, e.g., Dynamic Causal Modeling), while other methods do not draw information from the sequence in time and focus solely on the statistical properties of the time series (so-called structural models, e.g., Bayesian Nets (Frey & Jojic, 2005); see section Bayesian Nets).

Directed functional connectivity: Causal relations between nodes of the investigated network, derived from experimental observables, e.g., measured BOLD responses.

Effective connectivity: Causal relations between nodes of the investigated network, derived from a model that additionally considers the underlying neuronal processes.

Generative model: A model representing prior knowledge of how underlying causal structures are manifested in the experimental datasets.

In this work, we propose another classification of methods for causal inference in fMRI. First, we identify nine characteristics of models used to study causality. Then, we compare and contrast the popular approaches to causal research in fMRI according to these criteria. Our list of features of causality is as follows:

1. Sign of connections: Can the method distinguish between excitatory and inhibitory causal relations?
   In this context, we do not mean synaptic effects, but rather an overall driving or attenuating impact of the activity in one brain region on the activity in another region. Certain methods only detect the existence of a causal influence from the BOLD responses, whereas others can distinguish between these distinct forms of influence.
2. Strength of connections: Can the method distinguish between weak and strong connections, apart from indicating the directionality of connections at a certain confidence level?
3. Confidence intervals: How are the confidence intervals for the connections determined?
4. Bidirectionality: Can the method pick up bidirectional connections X ⇄ Y, or only indicate the stronger of the two connections X → Y and Y → X? Some methods do not allow for bidirectional relations, since they cannot deal with cycles in the network.
5. Immediacy: Does the method specifically identify direct influences X → Y, or does it pool across direct and indirect influences X → Zᵢ → Y? We assume that Zᵢ represent nodes in the network and that the activity in these nodes is measured (otherwise, Zᵢ become latent confounders). While some methods aim to make this distinction, others highlight any influence X → Y, whether it is direct or not.
6. Resilience to confounds: Does the method correct for possible spurious causal effects from a common source (Z → X, Z → Y, so that we infer X → Y and/or Y → X), or other confounders? In general, confounding variables are an issue for all methods for causal inference, especially when a given study is noninterventional (Rohrer, 2017); however, different methods suffer from these issues to different extents.
7. Type of inference: Does the method probe causality through classical hypothesis testing or through model comparison? Hypothesis-based methods test a null hypothesis H0 that there is no causal link between two variables against a hypothesis H1 that there is a causal link between the two.
   In contrast, model-comparison-based methods do not have an explicit null hypothesis. Instead, the evidence for a predefined set of models is computed. In particular cases, when the investigated network contains only a few nodes and the estimation procedure is computationally cheap, a search through all the connectivity patterns by means of model comparison is possible. In all other cases, prior knowledge is necessary to select a subset of possible models for model comparison.
8. Computational cost: What is the computational complexity of the inference procedure? In the case of model comparison, the computational cost refers to the cost of finding the likelihood of a single model, as the range of possible models depends on the research question. This can lead to practical limitations based on computing power.
9. Size of the network: What sizes of network does the method allow for? Some methods are restricted in the number of nodes they allow, for computational or interpretational reasons.

In certain applications, an additional criterion of empirical accuracy in realistic simulations could help to evaluate the method. Testing the method on synthetic ground-truth datasets available for the research problem at hand can give a good picture of whether or not the method gives reliable results when applied to experimental datasets. In fMRI research, multiple methods for causal inference were directly compared with each other in a seminal simulation study by S. Smith et al. (2011). In this study, the authors employed a Dynamic Causal Modeling generative model (DCM; K. J. Friston et al., 2003), introduced in section Dynamic Causal Modeling, to create synthetic datasets with a known ground truth. Surprisingly, most of the methods struggled to perform above chance level, even though the test networks were sparse and the noise levels introduced to the model were low compared with what one would expect in real recordings. In this manuscript, we will refer to this study throughout the text. However, we will not list empirical accuracy as a separate criterion, for two reasons. First, some of the methods reviewed here, for example, Structural Equation Modeling (SEM; McIntosh & Gonzalez-Lima, 1994), were not tested on the synthetic benchmark datasets. Second, the most popular method in the field, DCM (K. J. Friston et al., 2003), builds on the same generative model that was used for comparing methods to each other in Smith’s study. Therefore, it is hard to perform a fair comparison between DCM and other methods in the field by using this generative model.

Confounder: A node that projects information to two other nodes in the network, causing a spurious causal association between them. A confounder can be latent in the experiment.

Classical hypothesis testing: Testing whether a given hypothesis is plausible in the light of available data. This approach requires the assumption of a null distribution, i.e., the distribution of the values for that variable if the hypothesis is not true.

Model comparison: Causal inference in which one model is selected from a set of candidate models representing potential causal structures in the network on the basis of experimental evidence.

In the following chapters, references to this “causality list” will be marked in the text with subscripted indices that refer to 1–9 above.

With respect to assumptions made on the connectivity structure, the approaches discussed here can be divided into three main groups (Figure 1).
The first group comprises multivariate methods that search for directed graphs without imposing any particular structure onto the graph: GC (Seth, Barrett, & Barnett, 2015), Transfer Entropy (TE; Marrelec et al., 2006), SEM (McIntosh & Gonzalez-Lima, 1994), and DCM (K. J. Friston et al., 2003). These methods will be referred to as network-wise models throughout the manuscript. The second group of methods is also multivariate, but requires an additional assumption of acyclicity. Models in this group assume that information travels through the brain by feed-forward projections only. As a result, the network can always be represented by a Directed Acyclic Graph (DAG; Thulasiraman & Swamy, 1992). Methods in this group include Linear Non-Gaussian Acyclic Models (LiNGAM; Shimizu, Hoyer, Hyvärinen, & Kerminen, 2006) and Bayesian Nets (BNs; Mumford & Ramsey, 2014), and will be referred to as hierarchical network-wise models throughout the manuscript.

Directed Acyclic Graph (DAG): A graph structure with no closed loops (i.e., no directed path leads from any node back to itself). This property imposes a structural hierarchy on the network.

Figure 1. Causal research in fMRI. The discussed methods can be divided into two families: Network Inference Methods, which are based on a one-step multivariate procedure, and Pairwise Inference Methods, which are based on a two-step pairwise inference procedure. As pairwise methods by definition establish causal connections on a connection-by-connection basis, they do not require any assumptions on the structure of the network, but also do not reveal the structure of the network.
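The acyclicity assumption of hierarchical network-wise models can be checked mechanically. The sketch below is our own illustration, not code from any of the reviewed toolboxes: it assumes a network is given as a list of directed edges between hypothetically named regions and applies Kahn’s topological-sort algorithm, which succeeds exactly when the graph has no directed cycle. Any feedback loop, including a bidirectional pair X ⇄ Y, makes the check fail, which is why such connections cannot be represented in a DAG.

```python
from collections import defaultdict, deque


def is_dag(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains no directed cycle.

    Kahn's algorithm: repeatedly remove nodes without incoming edges.
    If every node can be removed, a topological order exists, so the graph is acyclic.
    """
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes that were never removed lie on, or downstream of, a directed cycle.
    return removed == len(nodes)


# Hypothetical region names and edges, for illustration only.
regions = ["V1", "V2", "PFC"]
feedforward = [("V1", "V2"), ("V2", "PFC")]
with_feedback = feedforward + [("PFC", "V1")]
bidirectional = [("V1", "V2"), ("V2", "V1")]

print(is_dag(regions, feedforward))     # feed-forward chain: acyclic
print(is_dag(regions, with_feedback))   # V1 -> V2 -> PFC -> V1 forms a cycle
print(is_dag(regions, bidirectional))   # V1 <-> V2 is a two-node cycle
```

The check only inspects the graph structure; it says nothing about how a method such as LiNGAM or a Bayesian Net estimates the edges in the first place.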
The last group of methods, referred to as pairwise methods, uses a two-stage procedure: first, a map of nondirectional functional connections is rendered; second, the directionality of each connection is assessed. Since these methods focus on pairwise connections rather than complete network architectures, they by definition do not impose network assumptions such as acyclicity. Patel’s tau (PT; Patel, Bowman, & Rilling, 2006) and Pairwise Likelihood Ratios (PW-LR; Hyvärinen & Smith, 2013) are members of this group. In this review, we do not include Psycho-Physiological Interactions (PPIs; K. J. Friston et al., 1997), which study the coupling between a brain region and the rest of the brain in relation to a particular cognitive task, as we focus only on methods for assessing causal links within brain networks and do not cover brain-behavior causal interactions.

NETWORK-WISE METHODS

The first group of models that we discuss in this review involves multivariate methods: methods that simultaneously assess all causal links in the network, specifically GC (Granger, 1969), TE (Schreiber, 2000), SEM (Wright, 1920), and DCM (K. J. Friston et al., 2003). These methods do not pose any constraints on the connectivity structure. GC, TE, and SEM infer causal structures through classical hypothesis testing. As there are no limits to the size of the analyzed network, these methods allow for (relatively) hypothesis-free discovery. DCM, on the other hand, compares a number of predefined causal structures in networks of only a few nodes. As such, it requires a specific hypothesis based on prior knowledge.

Granger Causality

Clive Granger introduced Granger Causality (GC) in the field of economics (Granger, 1969). GC has found its way into many other disciplines, including fMRI research (Bressler & Seth, 2011; Roebroeck, Seth, & Valdes-Sosa, 2011; Seth et al., 2015; Solo, 2016).
GC is based on prediction (Diebold, 2001): the signal in a certain region depends on its past values. Therefore, a time series Y(t) at time point t can be partly predicted by its past values Y(t − i). A signal in an upstream region is followed by the same signal in a downstream region with a certain temporal lag. Therefore, if the prediction of Y(t) improves when past values of another signal X(t − i) are taken into account, X is said to Granger-cause Y. Time series X(t) and Y(t) can be multivariate; therefore, they will be further referred to as $\vec{X}(t)$ and $\vec{Y}(t)$. $\vec{Y}(t)$ is represented as an autoregressive process: it is predicted by a linear combination of its past states and Gaussian noise (there is also an equivalent of GC in the frequency domain, spectral GC [Geweke, 1982, 1984], but this method will not be covered in this review). This model is compared with a model that also includes the past values of $\vec{X}(t)$:

$$H_0:\quad \vec{Y}(t) = \sum_{i=1}^{N} B_{yi}\,\vec{Y}(t-i) + \vec{\sigma}(t) \tag{1}$$

$$H_1:\quad \vec{Y}(t) = \sum_{i=1}^{N} B_{yi}\,\vec{Y}(t-i) + \sum_{i=1}^{N} B_{xi}\,\vec{X}(t-i) + \vec{\sigma}(t) \tag{2}$$

where $\vec{\sigma}(t)$ denotes noise (or rather, the portion of the signal not explained by the model). Theoretically, this autoregressive (AR) model can take any order N (which can be optimized using, e.g., the Bayesian Information Criterion; Schwarz, 1978), but in fMRI research it is usually set to N = 1 (Seth et al., 2015), that is, a lag equal to the TR.

By fitting the parameters of the AR model, which include the influence magnitudes $B_{yi}$ and $B_{xi}$, the sign₁ as well as the strength₂ of the causal direction can be readily assessed with GC.
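To illustrate how the AR model order N could be chosen with the Bayesian Information Criterion, the following sketch fits univariate AR(p) models by ordinary least squares for p = 1, …, p_max and keeps the order with the lowest BIC = n·ln(RSS/n) + p·ln(n). It assumes NumPy, a single demeaned regional signal, and a simulated AR(2) process; the helper names `ar_design` and `select_ar_order` are hypothetical, and this is a simplified sketch rather than the procedure of any particular GC package.

```python
import numpy as np


def ar_design(y, p, p_max):
    """Lagged design matrix for AR(p); all orders share the same targets y[p_max:]."""
    n = len(y) - p_max
    # The i-th column holds y(t - i) for t = p_max, ..., len(y) - 1.
    X = np.column_stack([y[p_max - i : p_max - i + n] for i in range(1, p + 1)])
    return X, y[p_max:]


def select_ar_order(y, p_max=5):
    """Return (best_order, bic_per_order) using BIC = n*ln(RSS/n) + p*ln(n)."""
    y = np.asarray(y, dtype=float) - np.mean(y)  # demean instead of fitting an intercept
    bics = {}
    for p in range(1, p_max + 1):
        X, target = ar_design(y, p, p_max)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = np.sum((target - X @ coef) ** 2)
        n = len(target)
        bics[p] = n * np.log(rss / n) + p * np.log(n)
    return min(bics, key=bics.get), bics


# Simulated (hypothetical) AR(2) signal: y(t) = 0.5 y(t-1) - 0.4 y(t-2) + noise.
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(2, len(y)):
    y[t] = 0.5 * y[t - 1] - 0.4 * y[t - 2] + rng.standard_normal()

best, bics = select_ar_order(y, p_max=5)
print("BIC-selected order:", best)
```

All orders are fitted to the same targets (the first p_max samples are dropped) so that their BIC values are comparable. In fMRI, each lag corresponds to one full TR, which is why such a search is rarely performed and N = 1 is the usual default.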
The significance of the results is evaluated by comparing the variance of the noise obtained from the models characterized by Equation 1 and Equation 2. This can be achieved either by means of F tests or by permutation testing₃. Like all the methods in this chapter, GC does not impose any constraints on the network architecture and therefore can yield bidirectional connections₄. As a multivariate method, GC fits the whole connectivity structure at once. Therefore, ideally, it indicates the direct causal connections only₅, whereas the indirect connections should be captured only through higher order paths in the graph revealed in the GC analysis. However, this is not enforced directly by the method. Furthermore, in the original formulation of the problem by Granger, GC between X and Y works based on the assumption that the input of all the other variables in the environment potentially influencing X and Y has been removed (Granger, 1969). In theory, this would provide resilience to confounds₆. However, in reality this assumption is most often not valid in fMRI (Grosse-Wentrup, 2014b). As a result, direct and indirect causality between X and Y are in fact pooled.

In terms of the inference type, one can look at GC in two ways. On the one hand, GC is a model comparison technique, since the inference procedure is, in principle, based on a comparison between the two models expressed by Equations 1 and 2. On the other hand, GC differs from other model comparison techniques in that it does not optimize any cost function, but uses F tests or permutation testing instead; it can therefore also be interpreted as a method for classical hypothesis testing₇. Since the temporal resolution of fMRI is so low, first-order AR models with a time lag equal to 1 TR are typically used for inference in fMRI. Therefore, there is no need to optimize either the temporal lag or the model order, and as such the computational cost of the GC estimation procedure in fMRI is low₈.
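The comparison between Equations 1 and 2 can be sketched for univariate X and Y. Assuming NumPy, the hypothetical function `granger_f` below fits both AR models by least squares (after demeaning, without an intercept) and returns the standard F statistic built from their residual sums of squares. It is an illustrative sketch, not a validated GC implementation: p-values (from the F distribution or from permutation testing) and the multivariate case are left out, and the simulated coupling strengths are arbitrary assumptions.

```python
import numpy as np


def granger_f(x, y, lag=1):
    """F statistic for 'x Granger-causes y' using AR models of order `lag`.

    Restricted model (Eq. 1): y(t) regressed on the past of y.
    Full model (Eq. 2):       y(t) regressed on the past of y and of x.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    T = len(y)
    target = y[lag:]
    past_y = np.column_stack([y[lag - i : T - i] for i in range(1, lag + 1)])
    past_x = np.column_stack([x[lag - i : T - i] for i in range(1, lag + 1)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.sum((target - design @ coef) ** 2)

    rss_restricted = rss(past_y)
    rss_full = rss(np.hstack([past_y, past_x]))
    n, k_full = len(target), 2 * lag
    # The full model has `lag` extra coefficients (the B_x terms).
    return ((rss_restricted - rss_full) / lag) / (rss_full / (n - k_full))


# Simulated (hypothetical) pair in which x drives y with a one-sample lag, no feedback.
rng = np.random.default_rng(1)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 * y[t - 1] + 0.6 * x[t - 1] + rng.standard_normal()

f_xy = granger_f(x, y)
f_yx = granger_f(y, x)
print("F(x -> y):", f_xy)
print("F(y -> x):", f_yx)
```

With a true X → Y coupling and no feedback, the X → Y statistic should be large, while the reverse statistic should stay in the range expected under the null hypothesis. Because the regressors are shifted by one full sample, in fMRI this lag equals one TR.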
One constraint, though, is that the AR model imposes a mathematical restriction on the size of the network: the number of regions multiplied by the number of lags (the model order) can never exceed the number of time points (degrees of freedom).

GC is used in fMRI research in two forms: as mentioned in section Criteria for Evaluating Methods for Causal Inference in Functional Magnetic Resonance Imaging, GC can be applied either to the observed BOLD responses (Y. C. Chen et al., 2017; Regner et al., 2016; Zhao et al., 2016) or to BOLD responses deconvolved into neuronal time series (David et al., 2008; Goodyear et al., 2016; Hutcheson et al., 2015; Ryali et al., 2016, 2011; Sathian et al., 2013; Wheelock et al., 2014). The purpose of deconvolution is to model fMRI data more faithfully. However, estimating the hemodynamic response from the data, which is necessary to perform this deconvolution, adds uncertainty to the results.

The applicability of GC to fMRI data has been heavily debated (Stokes & Purdon, 2017). Firstly, the application of GC requires certain additional assumptions, such as signal stationarity, which does not always hold in fMRI data. (Stationarity means that the joint probability distribution of the signal does not change over time; this also implies that the mean, variance, and other moments of the distribution of the samples in the signal do not change over time.) Theoretical work by Seth et al. (2013) and work by Roebroeck, Formisano, and Goebel (2005) suggest that, despite the limitations related to slow hemodynamics, GC is still informative about the directionality of causal links in the brain (Seth et al., 2015). In the study by S. Smith et al. (2011), several versions of GC implementation were tested. However, all versions of GC were characterized by a low sensitivity to false positives and low overall accuracy in the directionality estimation.
The face validity of GC analysis was empirically validated using joint fMRI and magnetoencephalography recordings (Mill, Bagic, Bostan, Schneider, & Cole, 2017), with the causal links inferred with GC matching the ground truth confirmed by MEG. On the other hand, experimental findings report that GC predominantly identifies major arteries and veins as causal hubs (Webb, Ferguson, Nielsen, & Anderson, 2013). This result can be associated with the regular pulsatile behavior of the arteries, which has different phases across the brain. This is a well-known effect and is even explicitly targeted by physiological noise correction methods such as RETROICOR (Glover, Li, & Ress, 2000). Another point of concern is the time lag in fMRI data, which restricts the possible scope of AR models that can be fit in the GC procedure. Successful implementations of GC in EEG/MEG research typically involve lags of less than 100 ms (Hesse, Möller, Arnold, & Schack, 2003). In contrast, for fMRI the minimal lag is one full TR, which is typically between 0.7 s and 3.0 s (although new acceleration protocols allow for further reduction of TR). What is more, the hemodynamic response function (HRF) may well vary across regions (David et al., 2008; Handwerker, Ollinger, & D’Esposito, 2004), giving rise to spurious causal connections: when the HRF in one region is faster than in another, the temporal precedence of the peak will easily be mistaken for causation. In the worst case, the estimated directionality can even be reversed, when the region with the slower HRF in fact causes the region with the faster HRF (Bielczyk et al., 2017).
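The HRF-latency concern can be illustrated with a small simulation, a sketch under hypothetical settings: two regions receive the same white-noise neuronal drive and do not influence each other at all, but one region's HRF peaks at 4 s and the other's at 6 s. Each HRF is modeled here as a single gamma function (an assumption chosen for simplicity, not a physiological model), sampled at a TR of 1 s. The lagged correlation then peaks at a positive lag, so the faster region appears to lead the slower one.

```python
import math

import numpy as np


def gamma_hrf(t, peak):
    """Single-gamma HRF (scale 1 s) whose maximum lies at `peak` seconds."""
    shape = peak + 1.0                     # the mode of Gamma(shape, 1) is shape - 1
    t = np.asarray(t, dtype=float)
    return t ** (shape - 1.0) * np.exp(-t) / math.gamma(shape)


def lagged_corr(a, b, lag):
    """Correlation between a[t] and b[t + lag]; a negative lag means b leads a."""
    if lag > 0:
        a, b = a[:-lag], b[lag:]
    elif lag < 0:
        a, b = a[-lag:], b[:lag]
    return np.corrcoef(a, b)[0, 1]


rng = np.random.default_rng(0)
kernel_t = np.arange(0.0, 30.0, 1.0)       # HRF support in seconds, TR = 1 s (hypothetical)
neural = rng.standard_normal(3000)         # shared drive; no causal link between the regions

bold_fast = np.convolve(neural, gamma_hrf(kernel_t, peak=4.0))[: neural.size]
bold_slow = np.convolve(neural, gamma_hrf(kernel_t, peak=6.0))[: neural.size]

best_lag = max(range(-5, 6), key=lambda k: lagged_corr(bold_fast, bold_slow, k))
```

A lag-based method such as GC is likely to read this precedence as a fast-to-slow causal link, although the only true structure is a common input.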
Furthermore, the BOLD signal might not be invertible into the neuronal time series (Seth et al., 2015), which can affect GC analysis regardless of whether it is performed on the BOLD time series or on the deconvolved signal.

Transfer Entropy

Transfer Entropy (TE; Schreiber, 2000) is another data-driven technique, equivalent to Granger Causality under Gaussian assumptions (Barnett, Barrett, & Seth, 2009), and asymptotically equivalent to GC for general Markovian (nonlinear, non-Gaussian) systems (Barnett & Bossomaier, 2012a). In other words, TE is a nonparametric form of GC (or, GC is a parametric form of TE). It was originally defined for pairwise analysis and later extended to multivariate analysis (J. Lizier, Prokopenko, & Zomaya, 2008; Montalto, Faes, & Marinazzo, 2014). TE is based on the concept of Shannon entropy (Shannon, 1948). Shannon entropy H(X) quantifies the information contained in a signal of unknown spectral properties as the amount of uncertainty, or unpredictability. For example, a binary signal that takes the value 0 with probability p and the value 1 with probability 1 − p is most unpredictable when p = 0.5. This is because there is then only a 50% chance of correctly predicting the next sample, the lowest achievable for a binary signal. Therefore, being informed about the next sample in a binary signal with p = 0.5 reduces uncertainty to a greater extent than being informed about the next sample in a binary signal with, say, p = 0.75. This can be interpreted as a larger amount of information contained in the first signal compared with the second.
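The binary example can be checked numerically with the standard Shannon entropy formula, H = −Σ p log₂ p. The minimal sketch below uses only the Python standard library; the function name is ours.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits: H = -sum p * log2(p); outcomes with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


h_fair = shannon_entropy([0.5, 0.5])      # binary signal with p = 0.5
h_biased = shannon_entropy([0.75, 0.25])  # binary signal with p = 0.75
print(round(h_fair, 4), round(h_biased, 4))  # 1.0 0.8113
```

The fair signal carries one full bit per sample, whereas the biased one carries roughly 0.81 bits, matching the intuition that it is easier to predict.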
The Shannon entropy, which quantifies the information content according to this rule, is defined as

$$H(X) = -\sum_i P(x_i)\,\log_2 P(x_i) \tag{3}$$

where $x_i$ denotes the possible values of the signal (for a binary signal, there are only two possible values: 0 and 1). TE builds on the concept of Shannon entropy by extending it to conditional Shannon entropy: it describes the reduction in uncertainty about future values of Y gained by knowing the past values of X in addition to the past values of Y:

$$TE_{X \to Y} = H(Y_t \mid Y_{t-\tau}) - H(Y_t \mid X_{t-\tau}, Y_{t-\tau}) \tag{4}$$

where τ denotes the time lag.

In theory, TE requires no assumptions about the properties of the data, not even signal stationarity. However, in most real-world applications, stationarity is required to almost the same extent as in GC. Certain solutions for TE in nonstationary processes are also available (Wollstadt, Martinez-Zarzuela, Vicente, Diaz-Pernas, & Wibral, 2014). TE does not need an a priori definition of the causal process, and it may work for both linear and nonlinear interactions between the nodes. TE can distinguish the signum of connections¹, as the drop in the Shannon entropy can be both positive and negative. Furthermore, the absolute value of the drop in the Shannon entropy can provide a measure of the connection strength². TE can also detect bidirectional connections, as in this case both $TE_{X \to Y}$ and $TE_{Y \to X}$ will be nonzero⁴. In TE, significance testing by means of permutation testing is advised (Vicente, Wibral, Lindner, & Pipa, 2011)³. Immediacy and resilience to confounds in TE are the same as in GC: multivariate TE represents direct interactions, and becomes resilient to confounds only when defined for an isolated system. Inference in TE is performed through classical hypothesis testing⁷ and is highly cost-efficient⁸.
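Equation 4 can be sketched for discrete (here binary) signals with a plug-in estimator that counts joint occurrences, using the identity H(A | B) = H(A, B) − H(B) and a lag of τ = 1 sample. This is a minimal illustration under stated assumptions: the function names and toy data are hypothetical, and plug-in estimates are biased upward for short series, so continuous fMRI signals would need binning or a dedicated estimator.

```python
import math
import random
from collections import Counter


def entropy_from_counts(counts):
    """Shannon entropy (bits) of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def transfer_entropy(x, y, lag=1):
    """Plug-in TE_{X->Y} = H(Y_t | Y_{t-lag}) - H(Y_t | X_{t-lag}, Y_{t-lag}) for discrete series."""
    y_now, y_past, x_past = y[lag:], y[:-lag], x[:-lag]
    # Conditional entropy H(A | B) = H(A, B) - H(B)
    h_given_own_past = (entropy_from_counts(Counter(zip(y_now, y_past)))
                        - entropy_from_counts(Counter(y_past)))
    h_given_both_pasts = (entropy_from_counts(Counter(zip(y_now, x_past, y_past)))
                          - entropy_from_counts(Counter(zip(x_past, y_past))))
    return h_given_own_past - h_given_both_pasts


random.seed(1)
x = [random.randint(0, 1) for _ in range(20000)]
# y copies x with a one-sample delay, flipped 10% of the time (hypothetical coupling)
y = [0] + [xi if random.random() > 0.1 else 1 - xi for xi in x[:-1]]

te_xy = transfer_entropy(x, y)   # about 1 - H(0.1), roughly 0.53 bits
te_yx = transfer_entropy(y, x)   # close to zero: y's past says nothing about x's future
```

The asymmetry between `te_xy` and `te_yx` is what a permutation test would then assess for significance.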
As in GC, the size of the network is limited by the number of time points (degrees of freedom)⁹. TE is a straightforward and computationally cheap method (Vicente et al., 2011). However, TE did not perform well when applied to synthetic fMRI benchmark datasets (S. Smith et al., 2011). One reason for this could be the time lag embedded in the inference procedure, which poses an obstacle to TE in fMRI research for the same reason as for GC: it requires at least one full TR. TE is nevertheless gaining interest in the field of fMRI (Chai, Walther, Beck, & Fei-Fei, 2009; J. T. Lizier, Heinzle, Horstmann, Haynes, & Prokopenko, 2011; Montalto et al., 2014; Ostwald & Bagshaw, 2011; Sharaev, Ushakov, & Velichkovsky, 2016).

Structural Equation Modeling

Structural Equation Modeling (SEM; McIntosh & Gonzalez-Lima, 1994) is a simplified version of GC and can be considered a predecessor to DCM (K. J. Friston et al., 2003). This method was originally applied in a few disciplines: economics, psychology and genetics (Wright, 1920), and was only much later adapted for fMRI research (McIntosh & Gonzalez-Lima, 1994). SEM is used to study effective connectivity in cognitive paradigms, for example, on motor coordination (Kiyama, Kunimi, Iidaka, & Nakai, 2014; Zhuang, LaConte, Peltier, Zhang, & Hu, 2005), as well as in the search for biomarkers of psychiatric disorders (Carballedo et al., 2011; R. Schlösser et al., 2003). It was also used for investigating the heritability of large-scale resting-state connectivity patterns (Carballedo et al., 2011). The idea behind SEM is to express every ROI time series in a network as a linear combination of all the other time series (with the addition of noise), which implies no time lag in the communication.
These signals are combined in a mixing matrix B:

$$\vec{X}(t) = B\,\vec{X}(t) + \vec{\sigma}(t) \tag{5}$$

where $\vec{\sigma}$ denotes the noise, and the assumption is that each univariate component $X_i(t)$ is a mixture of the remaining components $X_j(t)$, $j \neq i$. This is a simple multivariate regression equation. The most common strategy for fitting this model is a search for the regression coefficients that correspond to the maximum likelihood (ML) solution: the set of model parameters B that gives the highest probability of the observed data (Anderson & Gerbing, 1988; McIntosh & Gonzalez-Lima, 1994). Assuming that the variables $X_i$ are normally distributed, the ML function can be computed and optimized. This function depends on the observed covariance between variables, as well as on a so-called implied covariance; for details, see Bollen (1989), and for a practical example of SEM inference, see Ferron and Hess (2007). Furthermore, under the assumption of normality of the noise, there is a closed-form solution to this problem which gives the ML solution for the parameters B, known as the Ordinary Least Squares (OLS) approximation (Bentler, 1985; Hayashi, 2000). In SEM applications to fMRI datasets, it is common practice to establish the presence of connections using anatomical information derived, for example, from Diffusion Tensor Imaging (Protzner & McIntosh, 2006). In that case, SEM inference focuses on estimating the strength of causal effects and not on identifying the causal structure. SEM does not constrain the weight of connections; therefore it can retrieve both excitatory and inhibitory connections¹ as well as bidirectional connections⁴.
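A minimal sketch of the equation-by-equation least-squares fit of Equation 5 is shown below, assuming demeaned data (so no intercept) and an optional boolean mask that plays the role of an anatomically defined structure. The function name, mask convention, and toy data are hypothetical; this is not the covariance-structure ML fit that dedicated SEM packages perform.

```python
import numpy as np


def fit_sem_ols(X, mask=None):
    """Least-squares fit of X(t) = B X(t) + sigma(t), one equation per node.

    X    : array (T, N), one column per demeaned ROI time series.
    mask : optional boolean (N, N); mask[i, j] = True allows an edge j -> i.
           Default: every j != i is allowed. Self-connections are always excluded.
    Returns B, where B[i, j] is the weight of X_j in the equation for X_i.
    """
    X = np.asarray(X, dtype=float)
    n_nodes = X.shape[1]
    if mask is None:
        mask = np.ones((n_nodes, n_nodes), dtype=bool)
    mask = np.array(mask, dtype=bool)          # copy, so the caller's mask is not modified
    np.fill_diagonal(mask, False)
    B = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        parents = np.flatnonzero(mask[i])
        if parents.size == 0:
            continue
        coef, *_ = np.linalg.lstsq(X[:, parents], X[:, i], rcond=None)
        B[i, parents] = coef
    return B


# Hypothetical ground truth: node 0 -> node 1 with weight 0.8
rng = np.random.default_rng(1)
x1 = rng.standard_normal(5000)
x2 = 0.8 * x1 + rng.standard_normal(5000)
X = np.column_stack([x1, x2])

B_free = fit_sem_ols(X)                                        # every j != i allowed
B_anat = fit_sem_ols(X, mask=[[False, False], [True, False]])  # only edge 0 -> 1 allowed
```

Without a mask, both `B_free[1, 0]` (about 0.8) and `B_free[0, 1]` (about 0.49) come out nonzero, since a zero-lag regression cannot tell the direction; fixing the structure in advance, as in the anatomy-informed practice described above, recovers only the strength of the assumed edge.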
The connection coefficients $B_{ij}$ can take any real value, and as such they can reflect the strength of the connections². Since OLS gives a point estimate for B, it does not provide a measure of confidence that would determine whether the obtained coefficients are significantly different from zero. This issue can be overcome in multiple ways. First, one can perform parametric tests, for example, a t test. Second, one can obtain confidence intervals through nonparametric permutation testing (generating a null distribution of B values by repeatedly shuffling node labels across subjects and creating surrogate subjects). Third, one can perform causal inference through model comparison: various models are fitted one by one, and the variance of the residual noise resulting from different model fits is compared, using either an F test or a goodness-of-fit measure (Zhuang et al., 2005). Highly optimized software packages such as LISREL (Jöreskog & Thillo, 1972) allow for an exploratory analysis with SEM by comparing millions of models against each other (James et al., 2009). Last, one can fit the B matrix with newer methods that include regularization enforcing sparsity of the solution (Jacobucci, Grimm, & McArdle, 2016), thereby eliminating weak and noise-induced connections from the connectivity matrix³. As with GC, SEM was designed to reflect direct connections⁵: if regions $X_i$ and $X_j$ are connected only through a polysynaptic causal web, $B_{ij}$ should come out as zero, and the polysynaptic connection should be retrievable from path analysis. Again, similar to GC, SEM is resilient to confounds only under the assumption that the model represents an isolated system and all the relevant variables present in the environment are taken into account⁶. Moreover, in order to obtain the ML solution for the B parameters, one needs to make a range of assumptions on the properties of the noise in the network.
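The first option, a parametric t test on OLS coefficients, can be sketched as follows: each coefficient is divided by its standard error, computed from the residual variance and (XᵀX)⁻¹. This minimal sketch assumes independent, identically distributed Gaussian residuals; fMRI residuals are usually autocorrelated, which makes such naive t statistics overconfident. The function name and toy data are hypothetical; a p-value would come from a t distribution with n − k degrees of freedom (for example via `scipy.stats.t.sf`).

```python
import numpy as np


def ols_t_stats(design, target):
    """OLS coefficients and their t statistics (coefficient / standard error)."""
    design = np.asarray(design, dtype=float)
    target = np.asarray(target, dtype=float)
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    sigma2 = resid @ resid / (n - k)                   # unbiased residual variance
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov_beta))


# Hypothetical data: x1 influences x2, x3 does not
rng = np.random.default_rng(2)
n = 400
x1 = rng.standard_normal(n)
x3 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)
beta, t = ols_t_stats(np.column_stack([x1, x3]), x2)   # t[0] large, t[1] near zero
```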
In SEM, the noise is typically assumed to be Gaussian and white, although background noise in the brain is most probably scale-free (He, 2014). Inference can be performed either through classical hypothesis testing (the computationally cheap version) or through model comparison (the computationally heavier version)⁷,⁸.

In summary, SEM is a straightforward approach: it simplifies causal inference by reducing a complex network with a low-pass filter at the output to a very simple linear system, but this simplicity comes at the cost of a number of assumptions. In the first decade of fMRI research, SEM was often the method of choice (R. G. M. Schlösser et al., 2008; Zhuang, Peltier, He, LaConte, & Hu, 2008); more recently, however, DCM has become more popular in the field.

One recently published approach in this domain, by Schwab et al. (2018), extends linear models by introducing time-varying connectivity coefficients, which allows for tracking the dynamics of causal interactions over time. In this approach, linear regression is applied to each node in the network separately (in order to find the causal influence of all the remaining nodes in the network on that node). The whole graph is then composed node by node from the node-specific DAGs, and this compound graph can be cyclic.

Dynamic Causal Modeling

All the aforementioned network-wise methods were developed in other disciplines and only later applied to fMRI data. Yet, using prior knowledge about the properties of fMRI datasets can prove useful when searching for causal interactions. Dynamic Causal Modeling (DCM; K. J. Friston et al., 2003) is a model comparison tool that uses state space equations reflecting the structure of fMRI datasets.
This technique was also implemented for other neural recording methods: EEG and MEG (Kiebel, Garrido, Moran, & Friston, 2008). DCM is well received within the neuroimaging community (the original article by K. J. Friston et al. had gained over 3,300 citations at the time of publishing this manuscript). In this work, we describe the original version (K. J. Friston et al., 2003) because, despite multiple recent developments (Daunizeau, Stephan, & Friston, 2012; Frässle, Lomakina, Razi, Friston, Buhmann, & Stephan, 2017; Frässle, Lomakina-Rumyantseva, Razi, Buhmann, & Friston, 2016; K. J. Friston, Kahan, Biswal, & Razi, 2011; Havlicek et al., 2015; Kiebel, Kloppel, Weiskopf, & Friston, 2007; Li et al., 2011; Marreiros, Kiebel, & Friston, 2008; Prando, Zorzi, Bertoldo, & Chiuso, 2017; Razi & Friston, 2016; Seghier & Friston, 2013; Stephan et al., 2008; Stephan, Weiskopf, Drysdale, Robinson, & Friston, 2007), it remains the most popular version of DCM in the fMRI community.

The idea of DCM is as follows. First, one needs to build a generative model (Figure 2). This model has two levels of description: the neuronal level (Figure 2, iii) and the hemodynamic level (Figure 2, v). Both of these levels contain parameters that are not directly recorded in the experiment and need to be inferred from the data. This model reflects scientific evidence on how the BOLD response is generated from neuronal activity. At the neuronal level of the DCM generative model, simple interactions between brain areas are posited, either bilinear (K. J. Friston et al., 2003) or nonlinear (Stephan et al., 2008). In the simplest, bilinear version of the model, the state equation reads:

$$\dot{z} = \Big(A + \sum_j u_j B_j\Big) z + C u \tag{6}$$

where z denotes the dynamics in the nodes of the network, u denotes the experimental inputs, A denotes the connectivity matrix characterizing causal interactions between the nodes of the network, $B_j$ denotes the modulatory influence of the j-th experimental input on the connections within the network, and C denotes the experimental inputs to the nodes of the network (Figure 2). The hemodynamic level is more complex and follows the biologically informed Balloon–Windkessel model (Buxton, Wong, & Frank, 1998); for details please see K. J. Friston et al. (2003).

Figure 2. The full pipeline for the DCM forward model. The model involves a three-node network stimulated during the cognitive experiment (i). The parameter set describing the dynamics in this network includes a fixed connectivity matrix (A), modulatory connections (B), and inputs to the nodes (C) (ii). In the equation describing the fast neuronal dynamics, z denotes the dynamics in the nodes, and u is an experiment-related input. Red: excitatory connections. Blue: inhibitory connections. The dynamics in this network can be described using ordinary differential equations. The outcome is the fast neuronal dynamics (iii). The neuronal time series is then convolved with the hemodynamic response function (HRF) (iv) in order to obtain the BOLD response (v), which may then be subsampled (vertical bars). This is the original, bilinear implementation of DCM (K. J. Friston et al., 2003). More complex versions of DCM with additional features are now available, such as spectral DCM (K. J. Friston et al., 2011), stochastic DCM (Daunizeau et al., 2012), nonlinear DCM (Stephan et al., 2008), two-state DCM (Marreiros et al., 2008), large DCMs (Frässle et al., 2018; Frässle, Lomakina-Rumyantseva, et al., 2016; Seghier & Friston, 2013) and so on.
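The forward (generative) direction of the Figure 2 pipeline can be sketched end to end: integrate the bilinear neuronal equation, convolve each node with an HRF, and subsample at the TR. Every setting here is a hypothetical assumption for illustration: the A, B, C values, a simple forward-Euler integrator, a TR of 2 s, and, in place of the Balloon–Windkessel model that DCM actually uses, a fixed double-gamma HRF (a common canonical approximation peaking near 5 s). DCM's inversion, which estimates these parameters from data, is not shown.

```python
import math

import numpy as np


def simulate_bilinear(A, B, C, u, dt):
    """Forward-Euler integration of dz/dt = (A + sum_j u_j B_j) z + C u, starting from z = 0.

    A: (n, n) fixed connectivity, B: (m, n, n) modulatory matrices (one per input),
    C: (n, m) driving-input weights, u: (T, m) inputs sampled every dt seconds.
    """
    z = np.zeros((u.shape[0], A.shape[0]))
    for t in range(1, u.shape[0]):
        effective = A + np.tensordot(u[t - 1], B, axes=1)   # A + sum_j u_j B_j
        z[t] = z[t - 1] + dt * (effective @ z[t - 1] + C @ u[t - 1])
    return z


def double_gamma_hrf(t):
    """Double-gamma HRF (peak near 5 s, undershoot near 15 s); a stand-in, not Balloon-Windkessel."""
    def gamma_pdf(s, k):
        return s ** (k - 1) * np.exp(-s) / math.gamma(k)
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


dt = 0.01                                     # integration step (s)
time = np.arange(0.0, 40.0, dt)
block = (time >= 1.0) & (time < 11.0)
u = np.zeros((time.size, 2))
u[block, 0] = 1.0                             # driving input to node 0
u[block, 1] = 1.0                             # modulatory input, active during the same block

A = np.array([[-1.0, 0.0, 0.0],               # negative self-connections keep the system stable
              [0.4, -1.0, 0.0],               # node 0 -> node 1
              [0.0, 0.3, -1.0]])              # node 1 -> node 2
B = np.zeros((2, 3, 3))
B[1, 1, 0] = 0.5                              # input 1 strengthens node 0 -> node 1
C = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])

z = simulate_bilinear(A, B, C, u, dt)         # fast neuronal dynamics (Figure 2, iii)
hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
bold = np.stack([np.convolve(z[:, i], hrf)[: time.size] * dt for i in range(3)], axis=1)
tr = 2.0
bold_sampled = bold[:: int(round(tr / dt))]   # BOLD subsampled at the assumed TR (Figure 2, v)
```

During the block, the modulatory input raises the effective 0 → 1 weight from 0.4 to 0.9, and the BOLD peak of each node lags its neuronal peak by a few seconds, which is the delay that makes lag-based methods fragile in fMRI.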
The Balloon–Windkessel model (Buxton et al., 1998) describes the BOLD signal observed in fMRI experiments as a function of neuronal activity, but also of region-specific and subject-specific physiological features such as the time constant of signal decay, the rate of flow-dependent elimination, the hemodynamic transit time and the resting oxygen extraction fraction. This is a weakly nonlinear model with free parameters estimated for each brain region. These parameters determine the shape of the hemodynamic response (Figure 2, iv), which typically peaks 4 to 6 s after the neuronal activity takes place, matching the lagged oxygen consumption in the neuronal tissue mentioned in section A Note on the Limitation of fMRI Data. The Balloon–Windkessel model is being iteratively updated based on new experimental findings, for instance to mimic adaptive decreases to sustained inputs during stimulation or the poststimulus undershoot (Havlicek et al., 2015).

In this paper, the deterministic, bilinear, single-state-per-region DCM will be described (K. J. Friston et al., 2003). The DCM procedure starts with defining hypotheses based on observed activations, which involves defining which regions are included in the network (usually on the basis of activations found through the General Linear Model; K. J. Friston et al., 2007) and then defining a model space based on the research hypotheses. In this model selection phase, a range of literature-informed connectivity patterns and inputs in the networks (referred to as "models") are posited (Figure 2, i). The definition of the model space is key to the DCM analysis. The models should be considered carefully in the light of the existing literature. The model space represents the formulation of a prior over models; therefore, it should always be constructed prior to the DCM analysis.
Subsequently, for every model one needs to set priors on the parameters of interest: the connectivity strengths and input weights in the model (Figure 2, ii), and the hemodynamic parameters. The priors for the hemodynamic parameters are experimentally informed Gaussian distributions (K. J. Friston et al., 2003). The priors for connectivity strengths are Gaussian probability distributions centered at zero (often referred to as conservative shrinkage priors). The user usually does not need to specify the priors, as they are already implemented in the DCM algorithms.

Next, an iterative procedure is used to find the model evidence by maximizing a cost function, the so-called negative free energy (K. J. Friston & Stephan, 2007). Negative free energy is a particular cost function which gives a trade-off between model accuracy and complexity (the complexity term accounts for correlations between parameters and for moving away from the prior distributions). During the iterative procedure, the prior probability distributions gradually shift their mean and standard deviation and converge toward the final posterior distributions. Negative free energy is a more sophisticated approximation of the model evidence than criteria such as Akaike's Information Criterion (AIC; Akaike, 1998) or the Bayesian Information Criterion (BIC; Schwarz, 1978): AIC and BIC simply count the number of free parameters (thereby assuming that all parameters are independent), while negative free energy also takes the covariance of the parameters into account (W. D. Penny, 2012).

In DCM, causality is modeled as a set of upregulating or downregulating connections between nodes. During the inference procedure, the conservative shrinkage priors can shift toward both positive and negative values, which can be interpreted as effective excitation or effective inhibition.
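To make the contrast with free energy concrete, the sketch below computes AIC and BIC for two hypothetical linear-Gaussian models from their residual sums of squares. The penalty depends only on the parameter count k and the sample size, never on how the parameters covary, which is the limitation described above. All numbers are invented for illustration; lower values are better.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a linear model with i.i.d. Gaussian residuals."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)


def aic(log_lik, k):
    """Akaike's Information Criterion (lower is better)."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion (lower is better)."""
    return k * math.log(n) - 2.0 * log_lik


n = 200
# Hypothetical fits: model B fits slightly better but uses three more parameters.
ll_a = gaussian_log_likelihood(rss=210.0, n=n)
ll_b = gaussian_log_likelihood(rss=205.0, n=n)
aic_a, aic_b = aic(ll_a, k=3), aic(ll_b, k=6)
bic_a, bic_b = bic(ll_a, k=3, n=n), bic(ll_b, k=6, n=n)
```

Here both criteria favor the simpler model A, because the gain in fit does not pay for the three extra parameters.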
Unlike the connections between regions, self-connections are always negative (this self-inhibition is mathematically motivated: the system characterizing the fast dynamics of the neuronal network must be stable, and to this end the diagonal terms of the connectivity matrix A are constrained to be negative; see Figure 2, ii, connections denoted in blue)¹. During the inference procedure, the neural and hemodynamic parameters of all models postulated for model comparison are optimized². The posterior probability distributions determine the significance of all the parameters³. The models can contain both uni- and bidirectional connections (Buijink et al., 2015; Vaudano et al., 2013)⁴. The estimated model evidence can then be compared⁷. As such, the original DCM (K. J. Friston et al., 2003) is a hypothesis-testing tool working only through model comparison. However, a linear version of DCM dedicated to exploratory research in large networks is now also available (Frässle, Lomakina-Rumyantseva, et al., 2016). Testing immediacy⁵ and resilience to confounds⁶ in DCM is possible by creating separate models and comparing their evidence. For instance, one can compare the evidence for X → Y with the evidence for X → Z → Y in order to test whether the connection X → Y is direct or rather mediated by another region Z. Note that this strategy requires an explicit specification of the alternative models, and it cannot take hidden causes into consideration (in this work, we refer to the original DCM implementation [K. J. Friston et al., 2003], but there are also implementations of DCM involving estimation of time-varying hidden states, such as Daunizeau, Friston, & Kiebel, 2009).
However, including extra regions in order to increase resilience to confounds is not necessarily a good idea. Considering the potentially large number of fitted parameters per region (the minimum number of parameters per region is two hemodynamic parameters and one input/output to connect to the rest of the network), this may result in a combinatorial explosion. Also, models with different nodes are not comparable in DCM for fMRI (K. J. Friston et al., 2003). DCM is, in general, computationally costly. The original DCM (K. J. Friston et al., 2003) is restricted to small networks of a few nodes⁹ (as mentioned previously, large DCMs dedicated to exploratory research in large networks are now also available; Frässle, Lomakina-Rumyantseva, et al., 2016; Seghier & Friston, 2013). The proper application of DCM requires a substantial amount of expertise (Daunizeau, David, & Stephan, 2011; Stephan et al., 2010). Even though ROIs can be defined in a data-driven fashion (through a preliminary classical General Linear Model analysis; K. J. Friston et al., 1995), the model space definition requires prior knowledge of the research problem (Kahan & Foltynie, 2013). In principle, the model space should reflect prior knowledge about possible causal connections between the nodes in the network. If a paradigm developed for the fMRI study is novel, there might be no reference study that can be used to build the model space. In that case, family-wise DCM modeling can be helpful (W. D. Penny et al., 2010). Family-wise models group large families of models defined on the same set of nodes, in order to test a particular hypothesis. For instance, one can explore a three-node network with nodes X, Y, Z and compare the joint evidence for all the possible models that contain the connection X → Y with the joint evidence for all the possible models that contain the connection Y → X (Figure 2, i).
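The family-wise comparison for a three-node network can be sketched by enumerating all 2⁶ = 64 directed graphs on X, Y, Z and pooling posterior model probabilities within families. Assumptions: a uniform prior over models; a hypothetical `log_evidence` function standing in for the free energy of each fitted DCM; and families defined by the X–Y coupling (X → Y only, Y → X only, both, neither), a choice that keeps the families disjoint and equally sized, since models containing both connections would otherwise be counted twice.

```python
import itertools
import math

nodes = ["X", "Y", "Z"]
edges = [(a, b) for a in nodes for b in nodes if a != b]     # 6 possible directed edges
models = [frozenset(e for e, keep in zip(edges, mask) if keep)
          for mask in itertools.product([False, True], repeat=len(edges))]


def log_evidence(model):
    """HYPOTHETICAL stand-in for the (free-energy) log evidence of a fitted DCM."""
    return 3.0 * (("X", "Y") in model) - 0.5 * len(model)


def family_of(model):
    """Assign a model to a family according to its X-Y coupling."""
    xy, yx = ("X", "Y") in model, ("Y", "X") in model
    return {(True, False): "X->Y only", (False, True): "Y->X only",
            (True, True): "both", (False, False): "neither"}[(xy, yx)]


# Uniform prior over models: posterior probability is proportional to exp(log evidence).
log_ev = [log_evidence(m) for m in models]
max_log_ev = max(log_ev)                                     # subtract for numerical stability
weights = [math.exp(v - max_log_ev) for v in log_ev]
total = sum(weights)
family_posterior = {}
for m, w in zip(models, weights):
    fam = family_of(m)
    family_posterior[fam] = family_posterior.get(fam, 0.0) + w / total

best_family = max(family_posterior, key=family_posterior.get)
```

Pooling over families lets the hypothesis about X → Y be tested without committing to any single configuration of the remaining connections.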
Another solution that allows for constraining a large model space is Bayesian model averaging (Hoeting, Madigan, Raftery, & Volinsky, 1999; Stephan et al., 2010), which explores the entire model space and returns an average value for each model parameter, weighted by the posterior probability of each model. Finally, one can perform Bayesian model reduction (K. J. Friston et al., 2016), in which the considered models are reduced versions of a full (or "parent") model. This is possible when the priors can be reduced, for example, when the prior distribution of a parameter in the parent model is set to a mean and variance of zero.

There are a few points that need particular attention when interpreting the results of a DCM analysis. First, if the data quality is poor, evidence for one model over another will not be conclusive. In the worst case, it could give a preference to the simplest model (i.e., the model with the fewest free parameters). In that case, simpler models will be preferred over more complex ones regardless of the low quality of fit. It is important, therefore, to include a "null model" in a DCM analysis, with all parameters of interest fixed at zero. This null model can then act as a baseline against which other models can be compared (W. D. Penny, 2012). Second, the winning model might contain parameters with a high probability of being equal to zero. To illustrate this, let us consider causal inference in a single subject (also referred to as first level analysis). Let us assume that we chose a correct set of priors (i.e., model space).
The Variational Bayes (VB; Bishop, 2006) procedure then returns a posterior probability distribution for every estimated connectivity strength. This distribution gives a measure of the probability that the associated causal link is larger than zero. Some parameters may turn out to have a high probability of being equal to zero in the light of this posterior distribution. This may be because the winning model is correct, but some of the underlying causal links are weak and therefore hard to confirm with the VB procedure. Also, DCM requires high-quality data; when the signal-to-noise ratio is insufficient, the winning model may explain only a small portion of the variance in the data. In that case, obtaining nonsignificant parameters in the winning model is likely. Therefore, it is advisable to check the amount of variance explained by the winning model at the end of the DCM analysis.

The most popular implementation of the DCM estimation procedure is based on VB (Bishop, 2006), which is a deterministic algorithm. Recently, Markov chain Monte Carlo (MCMC; Bishop, 2006; Sengupta, Friston, & Penny, 2015) has also been implemented for DCM. When applied to a unimodal free energy landscape, both algorithms will identify the global maximum, although MCMC will be slower than VB, as it is stochastic and therefore computationally costly. However, the free energy landscape for multiple-node networks is most often multimodal and complex. In such a case, VB, as a local optimization algorithm, might settle on a local maximum. MCMC, on the other hand, is guaranteed to converge to the true posterior densities, and thus to the global maximum (given an infinite number of samples).

DCM was tailored for fMRI and, unlike other methods, it explicitly models the hemodynamic response in the brain.
DCM tends to return highly reproducible results and is therefore statistically reliable (Bernal-Casas et al., 2013; Rowe, Hughes, Barker, & Owen, 2010; Schuyler, Ollinger, Oakes, Johnstone, & Davidson, 2010; Tak et al., 2018). A recent longitudinal study on spectral DCM in resting state revealed systematic and reliable patterns of hemispheric asymmetry (Almgren et al., 2018). DCM also yielded high test-retest reliability in an fMRI motor task study (Frässle et al., 2015), in a face perception study (Frässle, Paulus, Krach, & Jansen, 2016), in a facial emotion perception study (Schuyler et al., 2010), and in a finger-tapping task in a group of subjects suffering from Parkinson's disease (Rowe et al., 2010). It has also been shown to be the most reliable method when directly compared with GC and SEM (W. Penny, Stephan, Mechelli, & Friston, 2004). Furthermore, the DCM procedure can provide information complementary to GC (K. Friston, Moran, & Seth, 2013): GC models dependency among the observed BOLD responses, whereas DCM models coupling among the hidden states generating the observations. GC seems to be as effective as DCM in certain circumstances, such as when the HRF is deconvolved from the data (David et al., 2008; Ryali et al., 2016, 2011; Wang, Katwal, Rogers, Gore, & Deshpande, 2016). Importantly, the face validity of DCM was examined on experimental datasets from an interventional study using a rat model of epilepsy (David et al., 2008; Papadopoulou et al., 2015).

DCM is not always the method of choice in causal studies in fMRI. Proper use of DCM requires knowledge of the biology and of the inference procedure. DCM also has limitations in terms of the size of the possible models. Modeling a large network may run into problems with identifiability: there will be many possible combinations of parameter settings that could give rise to the same or similar model evidence. In other words, strong covariance between parameters will preclude confident estimates of the strength of each connection. One possible remedy for this, in the context of large-scale networks, is to impose appropriate prior constraints on the connections, for example, using priors based on functional connectivity (Razi et al., 2017). Large networks may also give rise to comparisons of a large number of different models with varying combinations of connections. To reduce the possibility of overfitting at the level of model comparison (that is, finding a model which is appropriate for the data of one subject or group of subjects, but not for others), it can be useful to group the models into a small number of families (W. D. Penny et al., 2010) based on predefined hypotheses. More information on the limitations of DCM can be found in work by Daunizeau et al. (2011). A critical note on the limitations of DCM in terms of network size can also be found in Lohmann, Erfurth, Muller, and Turner (2012); see also the responses to this article by Breakspear (2013) and K. Friston, Daunizeau, and Stephan (2013).

However, to extend the scope of application of DCM analysis to larger networks, two approaches were recently developed. First, a new, large-scale DCM framework for resting-state fMRI has been proposed (Razi et al., 2017). This framework uses the new spectral DCM (K. J. Friston et al., 2011) designed for resting-state fMRI and is able to handle dozens of nodes in the network. Spectral DCM is then combined with functional connectivity priors in order to estimate the effective connectivity in large-scale resting-state networks.
Second, a new approach by Frässle et al. (2018) imposes sparsity constraints on the variational Bayesian framework for task fMRI, which enables causal inference at the level of the whole-brain network.

DCM has been further developed into multiple procedures with more sophisticated generative models than the original model discussed here, and the field of DCM research in fMRI is still growing (K. J. Friston et al., 2017). The DCM generative model is continuously being updated in terms of the structure of the forward model (Havlicek et al., 2015), the estimation procedure (Sengupta et al., 2015), and the scope of possible applications (K. J. Friston et al., 2017).

HIERARCHICAL NETWORK-WISE MODELS

The second group of methods comprises hierarchical network-wise models: Linear Non-Gaussian Acyclic Models (LiNGAM; Shimizu et al., 2006) and Bayesian Nets (BNs; Frey & Jojic, 2005). Like the network-wise methods reviewed in the previous section, these methods are multivariate, but with one additional constraint: the network can only include feedforward projections (and therefore no closed cycles). Consequently, the resulting models have a hierarchical structure with a feedforward flow of information through the network.

LiNGAM

Linear Non-Gaussian Acyclic Models (LiNGAM; Shimizu et al., 2006) are an example of a data-driven approach working under the assumption of acyclicity (Thulasiraman & Swamy, 1992). The model is simple: every time course $X_i(t)$ within an ROI is considered to be a linear combination of all other signals with no time lag:

$$\vec{X}(t) = B\,\vec{X}(t) + \vec{\sigma}(t) \qquad (7)$$

in which $B$ denotes a matrix containing the connectivity weights, and $\vec{\sigma}$ denotes multivariate noise. The model is in principle the same as in SEM (section Structural Equation Modeling), but the difference lies in the inference procedure: whereas in SEM inference is based on minimizing the variance of the residual noise under the assumption of independence and Gaussianity, LiNGAM finds connections based on the dependence between the residual noise components $\vec{\sigma}(t)$ and the regressors $\vec{X}(t)$.

The rationale of this method is as follows (Figure 3). Let us assume that the network is noisy, and every time series within the network is associated with a background noise uncorrelated with the signal in that node. An example of such a mixture of signal and noise is given in Figure 3A. Now let us assume that $\hat{X}(t)$, which is a mixture of the signal $X(t)$ and the noise $\sigma_X(t)$, causes $Y(t)$. As $Y$ cannot distinguish between the signal and the noise, it becomes a function of both components. $Y(t)$ is also associated with its own noise $\sigma_Y(t)$; however, as there is no causal link Y → X, $X(t)$ does not depend on the noise component $\sigma_Y(t)$. Therefore, if Y depends on the $\sigma_X(t)$ component, but X does not depend on the $\sigma_Y(t)$ component, one can infer the projection X → Y.

This effect is further illustrated in Figure 3B with a simple causal relationship between two variables: age versus length in a fish. If fish length is expressed as a function of fish age (upper panel), the residual noise in the dependent variable (length) is independent of the independent variable (age); therefore, the noise variance is constant over a large range of fish age. In contrast, once the variables are flipped and fish age becomes a function of fish length (lower panel), the noise variance becomes dependent on the independent variable (length): it is small for small values of fish length and large for large values of fish length. Therefore, the first causal model (fish age influencing fish length) is correct.

Figure 3. The Linear Non-Gaussian Acyclic Model (LiNGAM). A: The noisy time series $\hat{X}(t)$ consists of the signal $X(t)$ and the noise $\sigma_X(t)$. $Y(t)$ thus becomes a function of both the signal and the noise in $\hat{X}(t)$. B: Causal inference through the analysis of the noise residuals (figure reprinted from http://videolectures.net/bbci2014_grosse_wentrup_causal_inference/). The causal link from age to length in a population of fish can be inferred from the properties of the residual noise in the system. If fish length is expressed as a function of fish age (upper panel), the residual noise in the dependent variable (length) is independent of the independent variable (age): the noise variance is constant over a large range of fish age (red bars). In contrast, once the variables are flipped and fish age becomes a function of fish length (lower panel), the noise variance becomes dependent on the independent variable (length): it is small for small values of fish length and large for large values of fish length (red bars).

In applications to causal research in fMRI, the LiNGAM inference procedure is often accompanied by an Independent Component Analysis (ICA; Hyvärinen & Oja, 2000), as follows. The connectivity matrix $B$ in Equation 7 describes how signals in the network mix together. By convention, not $B$ but its transformation

$$A = (I - B)^{-1} \qquad (8)$$

where $I$ is the identity matrix, is used as the mixing matrix in the LiNGAM inference procedure. Using this mixing matrix $A$, one can look at Equation 7 in a different way:

$$\vec{X}(t) = A\,\vec{\sigma}(t) \qquad (9)$$

Now, the BOLD time courses in the network $\vec{X}(t)$ can be represented as a mixture of independent sources of noise $\vec{\sigma}(t)$.
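The residual-based reasoning of the fish example can be illustrated numerically. The Python sketch below is not the LiNGAM estimator itself: it fits a least-squares regression in each direction and uses the absolute correlation between the squared residual and the squared regressor as a crude, assumed stand-in for a proper independence test. The helper name `residual_dependence` and the choice of uniform sources are illustrative assumptions.

```python
import numpy as np

def residual_dependence(cause, effect):
    """Regress `effect` on `cause` by least squares and return a crude
    dependence score |corr(residual**2, regressor**2)| (hypothetical helper).
    Values near zero mean the residual looks independent of the regressor."""
    c = cause - cause.mean()
    e = effect - effect.mean()
    b = np.dot(c, e) / np.dot(c, c)          # least-squares slope
    res = e - b * c                          # residual, uncorrelated with c by construction
    return abs(np.corrcoef(res ** 2, c ** 2)[0, 1])

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)                    # non-Gaussian "cause" (like fish age)
y = x + rng.uniform(-1, 1, n)                # effect with independent non-Gaussian noise

forward = residual_dependence(x, y)          # model X -> Y
backward = residual_dependence(y, x)         # model Y -> X
print(f"X->Y: {forward:.3f}  Y->X: {backward:.3f}")
```

In the correct direction the residual is the independent noise, so the score stays near zero; in the reversed direction the residual is uncorrelated with the regressor yet clearly dependent on it, which mirrors the changing noise variance in the lower panel of Figure 3B.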
Representing the observed signals as a linear mixture of independent sources in this way corresponds to the well-known cocktail party problem, originally described in acoustics (Bronkhorst, 2000): in a crowded room, the human ear registers a linear combination of sounds coming from multiple sources. In order to decode the components of this cacophony, the brain needs to perform blind source separation (Comon & Jutten, 2010), that is, to decompose the incoming sound into a linear mixture of independent sound sources. In the LiNGAM procedure, ICA (Hyvärinen & Oja, 2000) is used to approach this issue. ICA assumes that the noise components $\vec{\sigma}$ are independent and non-Gaussian, and it finds these components as well as the mixing matrix $A$, after a preprocessing step of dimensionality reduction with Principal Component Analysis (Jolliffe, 2002; Shlens, 2014). From this mixing matrix, one can in turn estimate the desired connectivity matrix $B$ using Equation 8.

Since the entries $B_{ij}$ of the connectivity matrix $B$ can take any value, LiNGAM can in principle retrieve both excitatory and inhibitory connectivity¹ of any strength². The author of LiNGAM recommends (Shimizu, 2014) performing significance testing through either bootstrapping (Hyvärinen, Zhang, Shimizu, & Hoyer, 2010; Komatsu, Shimizu, & Shimodaira, 2010; Thamvitayakul, Shimizu, Ueno, Washio, & Tashiro, 2012) or permutation testing (Hyvärinen & Smith, 2013)³. However, LiNGAM assumes acyclicity; therefore, only unidirectional connections can be picked up⁴. Moreover, the connectivity matrix revealed by LiNGAM is meant to capture direct connections⁵. The original formulation of LiNGAM assumes no latent confounds (Shimizu et al., 2006), but the model can be extended to a framework that captures causal links even in the presence of (unknown) hidden confounds (Z. Chen & Chan, 2013; Hoyer, Shimizu, Kerminen, & Palviainen, 2008)⁶.
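Equations 7 to 9, and the inversion of Equation 8 that yields $B$, can be checked numerically. The following Python sketch assumes a hypothetical three-node acyclic matrix `B` and uniform (non-Gaussian) sources; it builds $A$, generates $\vec{X}$, and recovers $B$. It skips the ICA step entirely: a real ICA estimate of $A$ is determined only up to a permutation and scaling of its columns, which LiNGAM must resolve before $B$ can be read off.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_samples = 3, 10000

# Hypothetical acyclic connectivity: B[i, j] is the weight of the link j -> i.
# Strictly lower-triangular: node 0 -> node 1 -> node 2, plus node 0 -> node 2.
B = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.3, -0.5, 0.0]])

# Independent, non-Gaussian (uniform) sources: one row per node
sigma = rng.uniform(-1, 1, size=(n_nodes, n_samples))

# Equation 8: mixing matrix A = (I - B)^{-1};  Equation 9: X = A sigma
A = np.linalg.inv(np.eye(n_nodes) - B)
X = A @ sigma

# The generated data also satisfy Equation 7: X = B X + sigma
assert np.allclose(X, B @ X + sigma)

# Inverting Equation 8 gives the connectivity back: B = I - A^{-1}
B_recovered = np.eye(n_nodes) - np.linalg.inv(A)
print(np.round(B_recovered, 3))
```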
LiNGAM-ICA's causal inference consists of ICA and a simple machine learning algorithm; as such, it is a fully data-driven strategy that does not involve model comparison⁷. Confidence intervals for the connections in $B$ can be found through permutation testing. ICA itself can be computationally costly, and its numerical stability cannot be guaranteed (the procedure that searches for independent sources of noise can get stuck in a local minimum). Therefore, the computational cost of LiNGAM can vary depending on the dataset⁸. This also limits the potential size of the causal network: when the number of connections approaches the number of time points (degrees of freedom), the fitting procedure becomes increasingly unstable, as it overfits the data⁹.

When tested on synthetic fMRI benchmark datasets (S. Smith et al., 2011), LiNGAM-ICA performs relatively well, but it is more sensitive to confounders than several other methods discussed in this paper, such as Patel's tau or GC. However, as LiNGAM performs particularly well on datasets containing a large number of samples, the authors of the benchmark study suggested that a group analysis could resolve the sensitivity problem in LiNGAM. The concept was then picked up and developed by at least two groups. First, Ramsey et al. (J. D. Ramsey, Hanson, & Glymour, 2011) proposed the LiNG Orientation, Fixed Structure (LOFS) technique. The method is inspired by LiNGAM and uses the fact that, within one graph equivalence class, the correct causal model should return conditional probability distributions that are maximally non-Gaussian. LOFS was tested on the synthetic benchmark datasets, where it achieved performance very close to 100%.
Second, Xu et al. published a pooling-LiNGAM technique (Xu et al., 2014), which is a classic LiNGAM-ICA applied to surrogate datasets. Validation on synthetic datasets revealed that both the false positive (FP) and false negative (FN) rates decrease exponentially with the length of the (surrogate) time series; however, pooled time series as long as 5,000 samples are necessary for this method to bring both FP and FN down to a reasonable level of 5%. Despite the promising results obtained on synthetic datasets, LiNGAM is still rarely applied in causal research in fMRI.

Bayesian Nets

The LiNGAM inference procedure assumes a linear mixing of signals underlying a causal interaction. Model-free methods do not make this assumption: the bare fact that one is likely to observe Y given the presence of X can indicate that the causal link X → Y exists (Figure 4). Let us assume the simplest example: causal inference for two binary signals X(t) and Y(t). In a binary signal, only two values are possible, 1 and 0; 1 can be interpreted as an "event" and 0 as "no event." Then, if events in signal Y(t) occur in 80% of the cases in which events in signal X(t) occur (Figure 4A), but the opposite is not true, the causal link X → Y is likely. Computing the odds of events given the events in the other signal is sufficient to establish causality. In a model-based approach, on the other hand, a model is fitted to the data in order to establish the precise form of the influence of the independent variable X on the dependent variable Y. Note that both model-based and model-free approaches contain a measure of uncertainty, but this uncertainty is computed differently. In model-based approaches, p values associated with the fitted model are a measure of confidence that the modeled causal link is a true positive (Figure 4A, left panel).
In contrast, in model-free approaches this confidence is quantified directly by expressing causal relationships in terms of conditional probabilities (Figure 4A, right panel). In practice, since the BOLD response (unlike the aforementioned example of binary signals) takes continuous values, conditional probabilities are estimated on the basis of the joint distribution of the variables X and Y (Figure 4B): the conditional probability P(Y|X) becomes a distribution of Y when X takes a given value.

Figure 4. Bayesian nets. A: Model-based versus model-free approach. β: a regression coefficient fitted in the modeling procedure. σ(t): additive noise. Both the model-based and the model-free approach contain a measure of confidence. In a model-based approach, a model is fitted to the data, and p values associated with this fit are a measure of confidence that the causal link exists (i.e., is a true positive; left panel). In a model-free approach, this confidence is quantified directly by expressing causal relationships in terms of conditional probabilities (right panel). B: Conditional probability for continuous variables. Since the BOLD fMRI signal is a continuous variable, the joint probability distribution for variables X and Y is a two-dimensional distribution; therefore, the conditional probability $P(Y \mid X = x)$ becomes a distribution. C: (i) An exemplary Bayesian Net. $X_1$, $X_2$, $X_3$: parents; $X_4$, $X_5$: children. (ii) Competitive Bayesian Nets: one can define competitive models (causal structures) for the network and compare their joint probabilities derived from the data. (iii) Cyclic belief propagation: if there were a cycle in the network, the expression for the joint probability would turn into an infinite series of conditional probabilities.

BNs (Frey & Jojic, 2005) are based on such a model-free approach (Figure 4C). Causal inference in BNs is based on the concept of conditional independence (a.k.a. the Causal Markov Condition; Hausman & Woodward, 1999). For example, suppose there are two events that could independently cause the grass to get wet: a sprinkler or rain. When one only observes that the grass is wet, the direct cause of this event is unknown. However, once rain is observed, it becomes less likely that the sprinkler was used. Therefore, one can say that the variables $X_1$ (sprinkler) and $X_2$ (rain) are conditionally dependent given the variable $X_3$ (wet grass), because $X_1$ and $X_2$ become dependent on each other once information about $X_3$ is provided. In BNs, the conditional dependencies encoded in the network are used to compute the joint probability of a given model, that is, the model evidence (if $X_i$ depends on $X_j$, the joint distribution $P(X_i, X_j)$ factorizes into the product $P(X_j)\,P(X_i \mid X_j)$).

Implementing a probabilistic BN requires defining a model: choosing a graph of "parents" that send information to their "children." For instance, in Figure 4C (i), node $X_1$ is a parent of nodes $X_4$ and $X_5$, and node $X_4$ is a child of nodes $X_1$, $X_2$, and $X_3$. The joint probability of the model can then be computed as the product of all marginal probabilities of the parents and the conditional probabilities of the children given their parents. The marginal probability $P(X_j)$ is the total probability that the variable of interest $X_j$ occurs, disregarding the values of all the other variables in the system. For instance, in Figure 4C (i), $P(X_1)$ is the marginal probability of $X_1$ occurring in this experiment. The conditional probability $P(X_i \mid X_j)$ is the probability of a given variable ($X_i$) occurring given that another variable ($X_j$) has occurred.
For instance, in Figure 4C (i), $P(X_5 \mid X_1, X_3)$ is the conditional probability of $X_5$ given its parents $X_1$ and $X_3$.

Then, once the whole graph is factorized into a chain of marginal and conditional probabilities, the joint probability of the model can be computed as the product of all these marginal and conditional probabilities. For instance, in Figure 4C (i), the joint probability of the model M is

$$P(M) = P(X_1)\,P(X_2)\,P(X_3)\,P(X_4 \mid X_1, X_2, X_3)\,P(X_5 \mid X_1, X_2, X_3) \qquad (10)$$

Finally, there are at least three possible approaches to causal inference with BNs:

1. Model comparison: choosing the scope of possible models (by defining their structure a priori) and comparing their joint probabilities. Note that in this case, the algorithm will simply return the winning graphical model, without estimating the coefficients representing connection weights³.
2. Assuming one model structure a priori and only inferring the weights. This is common practice, related to, for example, Naive Bayes (Bishop, 2006), in which the structure is assumed and the connectivity weights are estimated from conditional probabilities. In this case, the algorithm will assume that the proposed graphical model is correct and infer only the connection weights.
3. Inferring the structure of the model from the data in an iterative way, using a variety of approximate inference techniques that attempt to maximize the posterior probability of the model by minimizing a cost function called free energy (Frey & Jojic, 2005; similar to DCM): expectation maximization (EM; Bishop, 2006; Dempster, Laird, & Rubin, 1977), variational procedures (Jordan, Ghahramani, Jaakkola, & Saul, 1998), Gibbs sampling (Neal, 1993), or the sum-product algorithm (Kschischang, Frey, & Loeliger, 2001). This offers a broader selection of procedures than DCM.

BNs can detect both excitatory and inhibitory connections X → Y, depending on whether the conditional probability P(Y|X) is higher or lower than the marginal probability P(Y)¹. Like LiNGAM, BNs cannot, in general, pick up on bidirectional connections. The assumption of acyclicity comes from cyclic belief propagation (Figure 4C, iii): the joint probability of a cyclic graph would be expressed as an infinite chain of conditional probabilities, which usually does not converge to a closed form. This restricts the scope of possible models to DAGs (Thulasiraman & Swamy, 1992). However, there are also implementations of BNs that cope with cyclic propagation of information throughout the network, for example, the Cyclic Causal Discovery algorithm (CCD; Richardson & Spirtes, 2001). This algorithm is not often used in practice: its guarantees hold only in the large-sample limit, it requires assumptions on the graph structure, and it returns a complex output⁴. In BNs, the value of the conditional probability P(Y|X) can serve as a measure of connection strength². Conditional probabilities significantly higher than chance can be considered an indication of significant connections³. In principle, BNs are not resilient to latent confounds.
However, some classes of algorithms were designed specifically to tackle this problem, such as Stimulus-based Causal Inference (SBCI; Grosse-Wentrup, Janzing, Siegel, & Schölkopf, 2016), Fast Causal Inference (FCI; P. Spirtes, Glymour, & Scheines, 1993; Zhang, 2008), and Greedy Fast Causal Inference (GFCI; Ogarrio, Spirtes, & Ramsey, 2016)⁶. BNs can work either through model comparison or as an exploratory technique⁷. In the first case, the approach involves model specification that, as in DCM, requires a priori knowledge about the experimental paradigm. In the latter case, the likelihood is intractable and can only be approximated (Diggle, 1984)⁸. In principle, networks of any size can be modeled with BNs, either through model comparison or through exploratory techniques. Exploratory techniques typically minimize a cost function during the iterative search for the best model. As the network grows, the landscape of the cost function becomes high-dimensional and complex, and the algorithm is more likely to fall into a local minimum; exploratory techniques may therefore become unreliable for large networks⁹.

Bayesian inference: A probabilistic method for causal inference, in which competitive models representing the causal structure of the network are evaluated with respect to the evidence in the experimental data supporting these models.

Pairwise inference: A two-step causal inference procedure that reduces causal inference in a large graph to the study of two-node interactions, in contrast to network-wise inference and hierarchical network-wise models.
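The sprinkler example and the factorization behind Equation 10 can be made concrete by brute-force enumeration over a three-node net (sprinkler S, rain R, wet grass W). All probability values below are hypothetical and chosen only for illustration. The sketch computes the joint probability as P(S) P(R) P(W | S, R) and shows that S and R are independent a priori but become dependent once W is observed (so-called explaining away).

```python
from itertools import product

# Hypothetical probabilities for the sprinkler example (illustration only)
P_S = {1: 0.3, 0: 0.7}                 # sprinkler on / off
P_R = {1: 0.2, 0: 0.8}                 # rain / no rain
P_W_given = {                          # P(W = 1 | S, R): grass is wet
    (0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99,
}

def joint(s, r, w):
    """Joint probability from the factorization P(S) P(R) P(W | S, R)."""
    p_w = P_W_given[(s, r)]
    return P_S[s] * P_R[r] * (p_w if w == 1 else 1 - p_w)

def prob(event):
    """Sum the joint probability over all configurations satisfying `event`."""
    return sum(joint(s, r, w) for s, r, w in product((0, 1), repeat=3)
               if event(s, r, w))

total = prob(lambda s, r, w: True)     # should be 1
p_s_given_w = prob(lambda s, r, w: s == 1 and w == 1) / prob(lambda s, r, w: w == 1)
p_s_given_wr = (prob(lambda s, r, w: s == 1 and w == 1 and r == 1)
                / prob(lambda s, r, w: w == 1 and r == 1))
print(f"P(S=1 | W=1) = {p_s_given_w:.3f}, P(S=1 | W=1, R=1) = {p_s_given_wr:.3f}")
```

Observing rain lowers the probability that the sprinkler was on, which is exactly the conditional dependence of sprinkler and rain given wet grass described above.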
Another practical issue with BNs is that many BN algorithms return an equivalence class of graphs: the set of all graphs that cannot be distinguished from the true causal structure on the basis of probabilistic independencies alone (Spirtes, 2010). These structures cannot be further distinguished without additional assumptions or experimental interventions. For finite data, even one wrong assumption about the direction of a causal link in the graph can propagate through the network and cause an avalanche of incorrect orientations (Spirtes, 2010). One approach designed to overcome this issue is the Constraint-Based Causal Inference (Claassen & Heskes, 2012). In this approach, Bayesian inference is employed to estimate the reliability of a set of constraints. This estimate can then be used to decide whether this prior information should be used to determine the causal structure of the graph.

BNs cope well with noisy datasets, which makes them an attractive option for causal research in fMRI (Mumford & Ramsey, 2014). S. Smith et al. (2011) tested multiple implementations of BNs, including FCI and CCD, as well as other algorithms: Greedy Equivalence Search (GES; Chickering, 2002; Meek, 1995), the "Peter and Clark" algorithm (PC; Meek, 1995), and a conservative version of "Peter and Clark" (J. Ramsey, Zhang, & Spirtes, 2006). All these implementations performed similarly well in estimating the existence of connections, but not in estimating their directionality. BNs are not widely used in fMRI research to date, the main reason being the assumption of acyclicity. One exception is Fast Greedy Equivalence Search (FGES; J. D. Ramsey, 2015; J. D. Ramsey, Glymour, Sanchez-Romero, & Glymour, 2017; J. D. Ramsey et al., 2014), a variant of GES optimized for large graphs. The algorithm assumes that the network is acyclic with no hidden confounders, and it returns an equivalence class for the graph.
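Which DAGs fall into the same equivalence class can be checked with the standard graphical criterion (due to Verma and Pearl): two DAGs are Markov equivalent if and only if they have the same skeleton and the same v-structures (colliders whose two parents are not adjacent). The Python sketch below implements this check for small graphs given as sets of directed edges; the function names are illustrative.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (a, b) meaning a -> b."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), child))
    return found

def markov_equivalent(g1, g2):
    """True if the two DAGs share their skeleton and their v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {("X", "Y"), ("Y", "Z")}       # X -> Y -> Z
reverse = {("Z", "Y"), ("Y", "X")}     # Z -> Y -> X
fork = {("Y", "X"), ("Y", "Z")}        # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}    # X -> Y <- Z

print(markov_equivalent(chain, reverse),
      markov_equivalent(chain, fork),
      markov_equivalent(chain, collider))
```

The chain, its reversal, and the fork imply the same independencies and cannot be told apart from observational data alone, whereas the collider can, which is why algorithms such as FGES return a class of graphs rather than a single one.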
In a recent study by Dubois et al. (2017), FGES was applied within a new computational-experimental approach to causal inference from fMRI datasets. In the initial step, causal inference is performed on large observational resting-state fMRI datasets with FGES in order to obtain the aforementioned equivalence class of candidate causal structures. Further steps involve causal inference in a single patient, informed by the results of the initial analysis, and an interventional study using electrical stimulation in order to determine which of the equivalent structures revealed by FGES applies to a particular subject.

PAIRWISE INFERENCE

The last group of methods reflects the most recent trends in the field of causal inference in fMRI. This family of methods is represented by Pairwise Likelihood Ratios (PW-LR; Hyvärinen & Smith, 2013) and involves a two-stage inference procedure. In the first step, functional connectivity is used to find connections, without assessing their directionality. Unlike network-wise methods, which eliminate insignificant connections post hoc, pairwise methods eliminate insignificant connections prior to causal inference. In the second step, each previously found connection is analyzed separately, and the two nodes involved are classified as an upstream and a downstream region. These methods do not involve assumptions on the global pattern of connectivity at the network level (recurrent vs. feedforward). However, they involve the assumption that the connections are nontransitive: if X projects to Y, and Y projects to Z, it does not imply that X projects to Z. The causal inference is based on pairs of nodes only, and this has consequences for the interpretation of the network as a whole. As there is uncertainty associated with the estimation of every single causal link, the probability that all connections are correctly estimated decreases rapidly with the number of nodes in the network.

Pairwise Likelihood Ratios

A two-step procedure for causal inference in fMRI was first proposed by Patel et al. (2006) as Patel's tau (PT). The first step involves identifying the (undirected) connections by means of functional connectivity, and is achieved on the basis of correlations between the time series in different regions. This step results in a binary graph of connections, and edges identified as empty are excluded from further consideration, because if there is no correlation, there is no causation.

The second step determines the directionality of each of the previously detected connections. The causal inference boils down to a two-node Bayesian network, as the whole concept is based on a simple observation: if there is a causal link X → Y, Y should get a transient boost of activity every time X increases its activity. Conversely, if there is a causal link Y → X, X should react to activation in Y by increasing its activity. Therefore, one can threshold the signals X(t) and Y(t) and compute the difference between the conditional probabilities P(Y|X) and P(X|Y). Three scenarios are possible:

1. P(Y|X) equals P(X|Y): the connection is bidirectional, X ↔ Y (since empty connections were sorted out in the previous step).
2. The difference between P(Y|X) and P(X|Y) is positive: the connection X → Y is likely.
3. The difference between P(Y|X) and P(X|Y) is negative: the connection Y → X is likely.

Building on the concept of PT, the Pairwise Likelihood Ratios approach (PW-LR; Hyvärinen & Smith, 2013) was proposed.
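Before turning to the details of PW-LR, the three-scenario PT decision rule can be sketched on already-binarized event trains. This is a simplified illustration of the rule as summarized here, not Patel et al.'s original τ statistic: the thresholding step is skipped, events in X and Y are assumed to be aligned in time, and the tolerance `tol` for calling a connection bidirectional is a hypothetical parameter.

```python
import numpy as np

def pt_direction(x_events, y_events, tol=0.05):
    """Simplified pairwise decision rule on binary event trains.

    Returns ('X->Y' | 'Y->X' | 'X<->Y', difference), where the difference is
    P(Y|X) - P(X|Y); `tol` is a hypothetical margin for bidirectionality.
    """
    x = np.asarray(x_events, dtype=bool)
    y = np.asarray(y_events, dtype=bool)
    p_y_given_x = np.mean(y[x])        # fraction of X events accompanied by a Y event
    p_x_given_y = np.mean(x[y])        # fraction of Y events accompanied by an X event
    diff = p_y_given_x - p_x_given_y
    if abs(diff) < tol:
        return "X<->Y", diff
    return ("X->Y" if diff > 0 else "Y->X"), diff

rng = np.random.default_rng(2)
n = 20000
x = rng.random(n) < 0.1                # events in X
follow = rng.random(n) < 0.8           # Y responds to 80% of X events
spontaneous = rng.random(n) < 0.2      # Y also fires on its own
y = (x & follow) | spontaneous

label, diff = pt_direction(x, y)
print(label, round(diff, 2))
```

Because P(Y|X) − P(X|Y) = P(X, Y)·(1/P(X) − 1/P(Y)), the sign of the difference is driven by the event rates of the two binarized signals, so in practice the choice of thresholds matters.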
Hyvärinen and Smith (2013) improved on the second step of the inference by analytically deriving a classifier to distinguish between the two causal models X → Y and Y → X, which corresponds to the LiNGAM model for two variables. They compared the likelihoods of these two competing models derived under LiNGAM's assumptions (Hyvärinen et al., 2010) and provided a cumulant-based approximation to their ratio. In particular, the authors focused on approximating the likelihood ratio with third-order cross-cumulants of the variables X and Y, which reflect the asymmetry (skewness) of their distributions (this version of the method is referred to by the authors as "PW-LR skew"):

$$C_3 = \frac{1}{N}\sum_{i=1}^{N}\left(X(i)^2\,Y(i) - X(i)\,Y(i)^2\right) \qquad (11)$$

If the value of this cumulant is positive, it indicates the connection X → Y; otherwise, it indicates Y → X. The authors also proposed a modified version of the third cumulant that includes a nonlinear transformation of the signal to improve resilience against outliers (referred to as "PW-LR r skew"), as well as a version based on the fourth cumulant ("PW-LR kurtosis").

PW-LR methods cannot distinguish between excitation and inhibition¹, but they do provide a quantitative measure of the strength of the connection². The authors recommended testing the significance of PW-LR results through permutation testing (Hyvärinen & Smith, 2013)³. Following Patel's interpretation, it is possible to distinguish between unidirectional and bidirectional connections (since scores close to zero might indicate bidirectionality)⁴. The authors proposed using partial correlation instead of Pearson's correlation in the first step of the causal inference, which aims to find direct connections in the network⁵. As for resilience to confounds, PW-LR methods were tested on benchmark data in which common inputs to the nodes of the network were introduced (S. Smith et al., 2011, simulation no. 12). PW-LR performed much better than the best competitors (LiNGAM-ICA and PT) and reached as much as 84% correctly classified connections across all the benchmark datasets⁶. In its original formulation, PW-LR provides a point estimate of the strength of effective connectivity and lacks an estimate of confidence intervals. In such cases, confidence intervals in fMRI studies are estimated in a data-driven fashion. This is typically achieved by means of permutation testing (Hyvärinen & Smith, 2013; S. Smith et al., 2011), but can also be approached with mixture modeling (Bielczyk et al., 2018)⁷. PW-LR, as a closed-form solution, is computationally cheap⁸. As the pair-by-pair inferences do not require network-fitting procedures, this approach can easily be applied to larger networks⁹. On the benchmark datasets, all versions of PW-LR performed very well compared with the best competitors, PT and LiNGAM, with PW-LR r skew giving the best results. In all but one of the 28 simulations, PW-LR methods performed well above chance, and in a few cases they even reached 100% accuracy. However, PW-LR has not been validated on experimental fMRI datasets to date.
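A skew-based pairwise score can be sketched as follows. This illustrative Python sketch is not the authors' code: it standardizes both signals, flips their signs so that each has positive skewness, and multiplies the cross-cumulant mean(x²y − xy²) by the correlation ρ so that the sign convention also holds for negative coupling. The standardization, the sign flip, and the ρ factor are assumptions of this sketch; under a linear model with a positively skewed cause, a positive score points to X → Y, and swapping the arguments flips the sign.

```python
import numpy as np

def pw_skew_score(x, y):
    """Skewness-based pairwise direction score (illustrative sketch).

    Positive values suggest X -> Y, negative values suggest Y -> X.
    """
    x = (x - x.mean()) / x.std()             # standardize
    y = (y - y.mean()) / y.std()
    x = x * np.sign(np.mean(x ** 3))         # make skewness positive
    y = y * np.sign(np.mean(y ** 3))
    rho = np.mean(x * y)                     # correlation of the standardized signals
    return rho * np.mean(x ** 2 * y - x * y ** 2)

rng = np.random.default_rng(4)
n = 20000
x = rng.exponential(1.0, n) - 1.0            # skewed source in X
e = rng.exponential(1.0, n) - 1.0            # skewed, independent noise in Y
y = 0.6 * x + 0.8 * e                        # hypothetical link X -> Y

print(pw_skew_score(x, y), pw_skew_score(y, x))
```

For standardized signals with Y = ρX + noise, the expected cross-cumulant is ρ·skew(X)·(1 − ρ), which is why the score is positive in the causal direction when the cause is positively skewed.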
NEW DIRECTIONS IN CAUSAL RESEARCH IN fMRI

A number of methods have been discussed, but the search for new ways of extracting causal information from fMRI data is ongoing; here we highlight four representative directions.

Laminar Analysis

Advancements in fMRI acquisition have made it possible to scan at submillimeter resolution, which opens up the possibility of a layer-specific examination of the BOLD signal. As feedforward and feedback information is received and processed largely in different cortical layers (e.g., Bastos et al., 2015; Felleman & Van Essen, 1991), these different processes could be visible in the laminar BOLD response. In rat studies, the BOLD response was indeed shown to have laminar specificity, with its onset in the input layer of the motor and somatosensory cortex (Yu, Qian, Chen, Dodd, & Koretsky, 2014). In humans, too, several studies suggest laminar specificity of feedback processes (Kok, Bains, van Mourik, Norris, & de Lange, 2016; Muckli et al., 2015). These results suggest that the human laminar BOLD signal may contain directional and causal information. To date, only single-region laminar fMRI has been employed, but it may well be worthwhile to investigate how the output layers of one region influence the input layer of another.

Fractional Cumulants

Certain new methods take a more statistical approach to neuroimaging data. For instance, characterizing the shape of BOLD distributions by means of fractional moments combined into cumulants (Bielczyk et al., 2016) can improve the classification of the two nodes within one connection into an upstream and a downstream node. Fractional moments of a distribution are a mathematical concept with limited practical interpretation, but they could still contain valuable (causal) information. In this method, a classification procedure using fractional cumulants derived from the BOLD distribution is developed.
The classifier is informed by the DCM generative model. Initial results show that the causal classification scores similarly to or better than competing methods when applied to low-noise benchmark synthetic datasets (S. Smith et al., 2011), and that its performance is, in general, similar to that of PW-LR r-skew. The difference shows up after imposing higher levels of neuronal noise on the network: the fractional cumulant-based classifier is the most robust approach in the presence of such natural confounds. However, validation of this method on real fMRI datasets is still pending.

Rendering Whole-Brain Effective Connectivity with Use of Covariance Matrices

A recent approach to causal inference in fMRI infers the directionality of information transfer from a set of covariance matrices with both zero and nonzero time lags (Gilson, Moreno-Bote, Ponce-Alvarez, Ritter, & Deco, 2016). The authors build a dynamic model of the brain network and optimize the effective connectivity (adjacency matrix) such that the model covariances reproduce the empirical fMRI/BOLD covariance matrices. In this way, the fitted model best matches the BOLD dynamics with respect to second-order statistics. The authors validate the model on synthetic datasets and apply it to experimental fMRI datasets, using diffusion-weighted MRI to constrain the network connectivity. The concept of lagged covariance matrices was also used to evaluate the difference in cortical activation between two behavioral conditions (in application to movie watching; M. Gilson et al., 2017). As this method incorporates lags, it shares a limitation with other lagged methods (such as GC or TE): it is lag-dependent. The authors demonstrate theoretically that, for accurate estimation of directed connectivity, the time lag must be matched to the time constant of the underlying dynamical system representing the network.
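Why lagged covariances carry directional information can be illustrated with a simplified discrete-time stand-in for such network models. Assuming a first-order vector autoregression $x_{t+1} = A x_t + \varepsilon_t$ with white noise $\varepsilon_t$, the lag-one and zero-lag covariances satisfy $Q_1 = A Q_0$, so $A = Q_1 Q_0^{-1}$, and more generally $Q_k = A^k Q_0$. The Python sketch below (the two-node network and the helper `estimate_var1` are hypothetical) recovers a one-way connection from simulated data; it is not the model-fitting procedure of Gilson et al. (2016), which fits a dynamic network model to empirical BOLD covariances.

```python
import numpy as np


def estimate_var1(x, lag=1):
    """Estimate M in x[t+lag] ~ M x[t] from lagged covariances; x has shape (time, nodes).

    For a VAR(1) process, M approximates A**lag (matrix power), not A itself.
    """
    x = x - x.mean(axis=0)
    past, future = x[:-lag], x[lag:]
    n = len(past)
    q0 = past.T @ past / n        # zero-lag covariance Q0
    q_lag = future.T @ past / n   # lagged covariance E[x(t+lag) x(t)^T]
    # Q_lag = M Q0 and Q0 is symmetric, so M^T = Q0^{-1} Q_lag^T (solve, don't invert)
    return np.linalg.solve(q0, q_lag.T).T


# Hypothetical two-node network: node 0 drives node 1, but not vice versa.
# Row = target, column = source.
a_true = np.array([[0.5, 0.0],
                   [0.6, 0.5]])
rng = np.random.default_rng(0)
T = 20_000
eps = rng.standard_normal((T, 2))
x = np.zeros((T, 2))
for t in range(T - 1):
    x[t + 1] = a_true @ x[t] + eps[t + 1]

a_hat = estimate_var1(x)
print(np.round(a_hat, 2))  # asymmetry of the estimate reveals the direction 0 -> 1
```

Calling `estimate_var1(x, lag=2)` returns approximately $A^2$ rather than $A$, a simple instance of the lag dependence noted above: what the lagged covariances recover depends on the lag used relative to the dynamics of the system.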
How to match the time lag to this time constant in practice remains an open research question.

Another recent contribution in this domain, by Schiefer et al. (2018), focuses on inferring causal connections from resting-state fMRI datasets (and other continuous time series from noninterventional studies). It is based on the assumption that the symmetric, nonlagged covariance matrix derived from the observed activity contains footprints of the direction and sign of sparse directed connections. This underlying sparse structure is found via L1-minimization with gradient descent, which yields an asymmetric connectivity matrix from the initial symmetric covariance structure. In the process, the method uses the fact that when a collider is present in the network (X and Y projecting to the same node Z), the projecting nodes X and Y have a positive covariance, which signals a particular motif in the covariance structure. Validation on ground-truth synthetic datasets derived from a simple Ornstein–Uhlenbeck process yielded impressive results. On the other hand, application to experimental fMRI datasets produced less clear-cut results; the method therefore requires further exploration on fMRI data.

Neural Network Models

Another recent development relevant to causal inference is the use of neural network models trained to perform a complex task that is emblematic of human cognition (most commonly, visual object recognition). It is then possible to study the functional architecture and representational space of such models and to draw insight from the optimal model parameters into how such tasks are implemented in the human brain.
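Comparing the representational space of a model with that of a brain region is commonly done with representational similarity analysis: one builds a representational dissimilarity matrix (RDM) over the same stimulus conditions for each system and correlates the two. The Python sketch below assumes NumPy and SciPy, uses 1 − Pearson correlation as the dissimilarity and the Spearman correlation of the RDMs as the comparison, and runs on hypothetical random data; the helper names `rdm` and `rdm_similarity` are illustrative, and the choices made in the studies cited in this section may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(patterns):
    """Condensed RDM: 1 - Pearson r between the response patterns of each pair of conditions.

    patterns: array of shape (n_conditions, n_features), e.g. voxels or model units.
    """
    return pdist(patterns, metric="correlation")


def rdm_similarity(patterns_a, patterns_b):
    """Spearman correlation between two RDMs computed over the same conditions."""
    rho, _ = spearmanr(rdm(patterns_a), rdm(patterns_b))
    return rho


rng = np.random.default_rng(1)
n_conditions = 20
model_layer = rng.standard_normal((n_conditions, 100))  # hypothetical model-unit responses
mixing = rng.standard_normal((100, 300))
# Hypothetical "voxels": a noisy linear readout of the model representation.
brain_roi = model_layer @ mixing + 0.5 * rng.standard_normal((n_conditions, 300))
unrelated = rng.standard_normal((n_conditions, 300))     # representation with no shared geometry

print(round(rdm_similarity(model_layer, brain_roi), 2))  # high: shared representational geometry
print(round(rdm_similarity(model_layer, unrelated), 2))  # close to zero
```

Because the RDM abstracts away from the units of each system, a model layer and a set of voxels can be compared directly even though they have different numbers of features.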
In recent years, neural network models designed to recognize objects have reached human levels of performance (Kriegeskorte, 2015; Krizhevsky, Sutskever, & Hinton, 2012), and using them as models of how biological brains represent object space has become a realizable goal. A finding from early studies of feedforward neural networks that has been replicated across multiple studies is that the more closely a model's representational space resembles inferior temporal cortex fMRI activity, the better the model performs (Khaligh-Razavi & Kriegeskorte, 2014; D. L. Yamins, Hong, Cadieu, & DiCarlo, 2013; D. L. K. Yamins et al., 2014). Of particular interest is the finding that object representations in neural network models correlate with human brain representations in a hierarchical fashion, a result shown across both spatial and temporal dimensions (Cichy, Khosla, Pantazis, Torralba, & Oliva, 2016). While care must be taken not to overinterpret the generalisability of such models, these promising findings indicate that neural network models may provide insight into the fundamental constraints of certain computational processes, which in turn can be applied to determining functional (and causal) relationships in human cognition.

SUMMARY

We summarize the characteristics of all the discussed methods in Table 1.

DISCUSSION

In this work, we discussed the methods with respect to the causal structure they impose on the brain. According to this criterion, the methods fall into three categories. Network-wise methods, such as GC or SEM, do not restrict the connectivity patterns, whereas DAGs, such as BNs, assume a hierarchical structure and unidirectional connections.
In the latter category, a primary node receives input from outside the network and distributes information downstream throughout the network. This may be a good approximation for many processes (see, for instance, recent work on the visual cortex by Michalareas et al., 2016). However, the feedforward structure assumes a strictly hierarchical organization, which limits its capacity to model communication between different brain networks. Under what circumstances DAGs can accurately represent causal structures in the brain remains an open question.

Next to network-wise methods and DAGs, we also discussed a third group of methods, referred to as “pairwise.” In this approach, causal inference is performed by splitting it into many pairwise inferences. Prior to this, the dimensionality is reduced using functional connectivity, on the idea that (partial) correlation is a good indicator of the existence of causal links (S. Smith et al., 2011); this simplifies the problem both computationally and conceptually.

Table 1. Summary for all the methods discussed in this paper. GC: Granger causality; SEM: Structural Equation Modeling; DCM: Dynamic Causal Modeling; LN: LiNGAM; BN: Bayesian Nets; TE: Transfer Entropy; PW-LR: Pairwise Likelihood Ratios; net: network-wise; dag: Directed Acyclic Graphs only; pw: pairwise; +/−: depends on implementation; mc: model comparison; c: classical hypothesis testing; ml: machine learning; l: low; h: high; n/a: nonapplicable. PW-LR is based on the same concept as Patel’s tau (PT), and the inference is the same; therefore, we did not add a separate column for PT.

Feature / Method          GC     SEM    DCM    LN     BN     TE     PW-LR
Group of methods          net    net    net    dag    dag    net    pw
Sign of connections       +      +      +      +      −      +      −
Directionality            +      +      +      −      −      +      +
Connection strength       +      +      +      +      +      +      +
Immediacy                 +/−    +/−    −      +      +      +/−    +
Resilience to confounds   +/−    +/−    −      +/−    +/−    +/−    +
Causality through...      c      mc/c   mc     ml+c   mc/ml  c      c
Computational cost        l      l/h    h      h      l/h    l      l
Model-free?               −      −      −      −      +      +      +
Prespecify the graph?     −      −      +      −      +/−    −      −
Regression in time        +      −      −      −      −      +      −

Since the inference in this class of methods is split into a set of pairwise inferences, it is important to be aware that the confidence levels are also obtained connection by connection. Therefore, for a network represented by a set of connections with p-values $p_i$, the joint probability of the model is roughly $\prod_i (1 - p_i)$. (In practice, confidence values for the existence of single connections are not independent, so this is only a rough approximation of the joint probability.) This also means that there is a trade-off between the joint probability of the graph and its density: the joint probability of the whole network pattern can be increased by thresholding connectivity at more conservative p-values, which yields a sparser graph.

Furthermore, one can view the pairwise inference methods as a sort of model comparison, because in the second step of the inference only three options are possible for every connection. The difference from the DCM procedure lies in the fact that pairwise inference methods are based on simple statistical properties emerging from causation in linear systems, and do not involve optimizing a cost function, such as the negative free energy, as is done in DCM.

In the fMRI community, the DCM family (K. J. Friston et al., 2003) is currently the most popular approach to causal inference. This is partly because DCM was tailor-made for fMRI and includes a generative model based on the biological underpinnings of BOLD dynamics (Buxton et al., 1998).
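The two-step pairwise logic discussed in this section (first decide whether two nodes are connected, then decide the direction of each retained connection), together with the rough joint probability $\prod_i (1 - p_i)$, can be sketched in Python as follows. This is a simplified illustration under explicit assumptions: step 1 thresholds the Pearson correlation instead of testing (partial) correlations for significance; step 2 uses a skewness-based score in the spirit of the pairwise likelihood ratios of Hyvärinen and Smith (2013), which assumes a linear model with independent, skewed inputs (the published measures, including robust variants such as r-skew, may differ in detail); the three options per pair are taken here to be no connection, i → j, or j → i; and all function names are hypothetical.

```python
import numpy as np


def skew_direction(x, y):
    """Skewness-based direction score for one pair of signals (hypothetical helper).

    Assumes a linear model with independent, skewed inputs. A positive score
    suggests x -> y; a negative score suggests y -> x.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    # Make each variable positively skewed, as the score assumes.
    x = x * np.sign(np.mean(x ** 3))
    y = y * np.sign(np.mean(y ** 3))
    rho = np.mean(x * y)
    return rho * np.mean(x ** 2 * y - x * y ** 2)


def pairwise_graph(data, corr_threshold=0.3):
    """Two-step pairwise inference on data of shape (time, nodes).

    Step 1: keep pairs whose |Pearson r| exceeds corr_threshold.
    Step 2: orient each kept pair with skew_direction.
    Returns a list of directed edges (source, target).
    """
    r = np.corrcoef(data, rowvar=False)
    n_nodes = data.shape[1]
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if abs(r[i, j]) < corr_threshold:
                continue  # option 1: no connection
            score = skew_direction(data[:, i], data[:, j])
            edges.append((i, j) if score > 0 else (j, i))  # options 2 and 3
    return edges


def rough_joint_probability(p_values):
    """Rough joint probability of a graph, prod_i (1 - p_i), ignoring dependencies."""
    return float(np.prod(1.0 - np.asarray(p_values)))


# Simulated chain 0 -> 1 -> 2 driven by skewed (centered exponential) inputs.
rng = np.random.default_rng(2)
T = 20_000
s = rng.exponential(size=(T, 3)) - 1.0
x0 = s[:, 0]
x1 = 0.8 * x0 + s[:, 1]
x2 = 0.8 * x1 + s[:, 2]
data = np.column_stack([x0, x1, x2])

print(pairwise_graph(data))
print(rough_joint_probability([0.01, 0.02, 0.04]))  # three retained connections
print(rough_joint_probability([0.01, 0.02]))        # weakest connection dropped
```

Because step 1 here uses Pearson rather than partial correlation, the indirect 0 → 2 link of the simulated chain survives and is reported as a direct connection, which illustrates why replacing Pearson with partial correlation, as in PW-LR, is attractive. The last two lines show the density trade-off: dropping the weakest connection raises the rough joint probability of the graph.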
Some GC studies also estimate the HRF and deconvolve the data before applying the estimation procedure (David et al., 2008; Goodyear et al., 2016; Hutcheson et al., 2015; Ryali et al., 2016, 2011; Sathian et al., 2013; Wheelock et al., 2014). This explicit modeling of the hemodynamics is both a strength and a weakness: the generative model fits the data well, but only as long as the current state of knowledge is accurate. Studies suggest that human hemodynamics are highly dynamic and driven by state-dependent processes (Handwerker, Gonzalez-Castillo, D’Esposito, & Bandettini, 2012; Miezin, Maccotta, Ollinger, Petersen, & Buckner, 2000). The influence of this complex behavior on the performance of DCM is hard to estimate.

The DCM procedure performs causal inference through model comparison and is therefore restricted to small networks containing a few nodes, since the computational cost grows factorially with the number of nodes. With the rise of research into resting-state networks that contain up to 200 nodes, this may prove to be a limiting characteristic (S. M. Smith et al., 2009a). This issue can be addressed with new methods for pairwise inference, such as PT and PW-LR, which do not impose any upper bound on the size of the network, as well as with new versions of whole-brain DCM (Frässle et al., 2016, 2018).

It is important to remember that there are always two aspects to a method for causal inference. First, the method should have assumptions grounded in a biologically plausible framework, well suited to the given dataset.
For instance, a method for causal inference in fMRI should respect (1) the confounding, region- and subject-specific BOLD dynamics (Handwerker et al., 2004) and (2) the co-occurrence of cause and effect: since the time resolution of the data is low compared with the underlying neuronal dynamics, causes and their effects most likely fall within the same frame of the fMRI data. The new methods for pairwise inference address these issues by (1) breaking the time order and performing causal inference on the basis of statistical properties of the distribution of the BOLD samples rather than the timing of events, and (2) using correlation to detect connections. A good counterexample here is GC. GC has proven useful in multiple disciplines, and its estimation procedure is impeccable: it is nonparametric and computationally straightforward, and it gives a unique, unbiased solution. However, there is an ongoing discussion on whether GC is suited for causal interpretations of fMRI data. On the one hand, theoretical work by Seth et al. (2013) and Roebroeck et al. (2005) suggests that, despite the slow hemodynamics, GC can still be informative about the directionality of causal links in the brain. On the other hand, the work by Webb et al. (2013) demonstrates that the spatial distribution of GC corresponds to the Circle of Willis, the major blood vessels in the brain.

Second, an estimation procedure needs to be computationally stable. Even if the generative model faithfully describes the data, whether the method returns correct results still depends on the estimation algorithm. However, the face validity of the algorithms can only be tested in paradigms in which the ground truth is known. If the ground truth is unknown in a given paradigm, which is most often the case in fMRI experiments, only reliability can be tested.
One way of assessing the reliability of a method is to test its test-retest convergence. So far, DCM is the only method whose test-retest reliability has been extensively tested in separate studies (Frässle, Paulus, et al., 2016; Frässle et al., 2015; Rowe et al., 2010; Schuyler et al., 2010; Tak et al., 2018), and it performed well overall. In general, more studies testing the reliability of the methods on experimental fMRI datasets are desirable, as such validation is still missing for multiple methods, such as GC or SEM.

One last remark about the nature of the different methods: some methods, such as DCM, were developed for event-related fMRI. Yet, new implementations of spectral DCM for the resting state have also been developed (K. J. Friston et al., 2011). For other methods, application to resting-state studies is relatively straightforward, while task fMRI can pose certain constraints. For instance, lag-based methods such as GC work best when the task is executed in the form of epochs (Deshpande, LaConte, James, Peltier, & Hu, 2008) rather than short stimulus-response blocks of a few seconds, because it is extremely difficult to fit an AR model to datasets of 1 to 2 frames in length. For this reason, structural methods that do not consider the time sequence, such as BNs or PW-LR, will be much more efficient in estimating causality in such cases.

Coming back to the main question posed in this review: can we hope to uncover causal relations in the brain using fMRI? Although there are new concepts in the field that propose to consider causal interactions in the brain in probabilistic terms (Griffiths, 2015; Mannino & Bressler, 2015), “traditional,” deterministic models of causality are prevalent in neuroimaging.
Within these deterministic models, and in the light of the existing literature, the new research directions based on breaking with time order as the axiom of causal inference (such as PW-LR, PT, and LiNGAM) prove more successful than the more “traditional” approaches, which take regression in time into account (such as GC or TE; Hyvärinen & Smith, 2013; S. Smith et al., 2011). Also, Patel’s two-step design for obtaining a causal map of connections is very promising, especially once the Pearson correlation is replaced with partial correlation, as is done in PW-LR. One caveat is that the “success” of any method for causal inference in fMRI depends on the forward model used to generate the synthetic dataset. In the seminal paper by S. Smith et al. (2011), multiple methods were evaluated and critically discussed on the basis of simulations of the DCM generative model. However, there are alternatives, for example, the generative model of Seth et al. (2013), which might yield a different ranking of methods in terms of success rate in inferring causal links from synthetic fMRI BOLD datasets.

In this paper, we discuss the topic of inferring causal processes from fMRI datasets at the level of the individual subject. One approach that could further contribute to the development of methods for causal inference in fMRI, though, is group inference. In such an approach, a prior that different subjects share similar causal structures is added to the inference procedure. As pooling datasets from different subjects increases the amount of data from which to derive the causal structure, this assumption generally facilitates the inference. Multiple algorithms for group inference of effective connectivity in fMRI have already been proposed, including Independent Multiple-sample Greedy Equivalence Search (IMaGES; J. D. Ramsey et al., 2010), the previously mentioned LOFS algorithm (J. D. Ramsey et al., 2011), and Group Iterative Multiple Model Estimation (GIMME; Gates & Molenaar, 2012).

Furthermore, with the current rapid growth of translational research and the increasing use of invasive and acute stimulation techniques such as optogenetics (Deisseroth, 2011; Ryali et al., 2016) or transcranial magnetic stimulation (Kim et al., 2009), a rigorous validation of methodology for causal inference becomes feasible through interventional studies. Recently, multiple methods for inferring causality from fMRI data were validated using a joint fMRI and MEG experiment (Mill et al., 2017), with promising results for GC and BNs. This gives hope for establishing causal relations in neural networks using fMRI.

ACKNOWLEDGMENTS

We thank Lionel Barnett, Christian Beckmann, Daniel Borek, Patrick Ebel, Daniel Gomez, Moritz Grosse-Wentrup, Max Hinne, Maciej Jedynak, Christopher Keown, Sándor Kolumbán, Vinod Kumar, Randy McIntosh, Nils Müller, Hanneke den Ouden, Payam Piray, Thomas Rhys-Marshall, Gido Schoenmacker, Ghaith Tarawneh, Fabian Walocha, and Johannes Wilbertz for sharing knowledge about causal inference in fMRI and for providing valuable content. We further thank Martha Nari-Havenith and Peter Vavra for their contribution to the conceptual work. In addition, we cordially thank Thomas Wolfers for encouragement and help at an early stage.

AUTHOR CONTRIBUTIONS

Natalia Bielczyk: Conceptualization; Writing – original draft; Writing – review & editing. Sebo Uithol: Conceptualization; Writing – original draft; Writing – review & editing. Tim van Mourik: Conceptualization; Writing – original draft; Writing – review & editing. Paul Anderson: Conceptualization; Writing – original draft; Writing – review & editing. Jeffrey Glennon: Writing – review & editing. Jan K Buitelaar: Writing – review & editing.
FUNDING INFORMATION

Natalia Bielczyk, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 305697.
Natalia Bielczyk, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 278948.
Natalia Bielczyk, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 603016.
Sebo Uithol, H2020 Marie Skłodowska-Curie Actions (http://dx.doi.org/10.13039/100010665), Award ID: 657605.
Jeffrey Glennon, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 603016.
Jeffrey Glennon, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 278948.
Jeffrey Glennon, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 602805.
Jeffrey Glennon, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 305697.
Jeffrey Glennon, Horizon 2020 (http://dx.doi.org/10.13039/501100007601), Award ID: 115916.
Jan K Buitelaar, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 115300.
Jan K Buitelaar, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 603016.
Jan K Buitelaar, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 278948.
Jan K Buitelaar, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 602805.
Jan K Buitelaar, FP7 Ideas: European Research Council (http://dx.doi.org/10.13039/100011199), Award ID: 305697.
Jan K Buitelaar, Horizon 2020 (http://dx.doi.org/10.13039/501100007601), Award ID: 115916.

REFERENCES

Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected papers of Hirotugu Akaike (pp. 199–213). New York: Springer.
Almgren, H. B. J., de Steen, F. V., Kühn, S., Razi, A., Friston, K. J., & Marinazzo, D. (2018). Variability and reliability of effective connectivity within the core default mode network: A longitudinal spectral DCM study. bioRxiv. https://doi.org/10.1101/273565
Altman, N., & Krzywiński, M. (2015). Association, correlation and causation. Nature Methods, 12(10), 899–900. https://doi.org/10.1038/nmeth.3587
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3), 411–23. https://doi.org/10.1037/0033-2909.103.3.411
Arichi, T., Fagiolo, G., Varela, M., Melendez-Calderon, A., Allievi, A., Merchant, N., . . . Edwards, A. D. (2012). Development of BOLD signal hemodynamic responses in the human brain. NeuroImage, 63(2), 663–73. https://doi.org/10.1016/j.neuroimage.2012.06.054
Barnett, L., Barrett, A. B., & Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. arXiv. https://doi.org/10.1103/PhysRevLett.103.238701
Barnett, L., & Bossomaier, T. (2012). Transfer entropy as a log-likelihood ratio. Physical Review Letters, 109(13). https://doi.org/10.1103/PhysRevLett.109.138105
Bastos, A. M., Vezoli, J., Schoffelen, C. A. B. J.-M., Oostenveld, R., Dowdall, J. R., Weerd, P. D., . . . Fries, P. (2015). Visual areas exert feedforward and feedback influences through distinct frequency channels. Neuron, 85(2), 390–401. https://doi.org/10.1016/j.neuron.2014.12.018
Bellec, P., Perlbarg, V., Jbabdi, S., Pélégrini-Issac, M., Anton, J. L., Doyon, J., . . . Benali, H. (2006). Identification of large-scale networks in the brain using fMRI. NeuroImage, 29(4), 1231–43. https://doi.org/10.1016/j.neuroimage.2005.08.044
Bellec, P., Rosa-Neto, P., Lyttelton, O. C., Benali, H., & Evans, A. C. (2010). Multi-level bootstrap analysis of stable clusters in resting-state fMRI. NeuroImage, 51(3), 1126–39. https://doi.org/10.1016/j.neuroimage.2010.02.082
Bentler, P. M. (1985). Theory and implementation of EQS, a structural equations program. BMDP Statistical Software, Pennsylvania State University.
Bernal-Casas, D., Balaguer-Ballester, E., Gerchen, M. F., Iglesias, S., Walter, H., Heinz, A., . . . Kirsch, P. (2013). Multi-site reproducibility of prefrontal-hippocampal connectivity estimates by stochastic DCM. NeuroImage, 82, 555–63. https://doi.org/10.1016/j.neuroimage.2013.05.120
Bielczyk, N. Z., Llera, A., Buitelaar, J. K., Glennon, J. C., & Beckmann, C. F. (2016). Increasing robustness of pairwise methods for effective connectivity in Magnetic Resonance Imaging by using fractional moment series of BOLD signal distributions. arXiv preprint. Retrieved from https://arxiv.org/abs/1606.08724
Bielczyk, N. Z., Llera, A., Buitelaar, J. K., Glennon, J. C., & Beckmann, C. F. (2017). The impact of haemodynamic variability and signal mixing on the identifiability of effective connectivity structures in BOLD fMRI. Brain and Behavior, 7(8), e00777. https://doi.org/10.1002/brb3.777
Bielczyk, N. Z., Walocha, F., Ebel, P. W. J., Haak, K., Llera, A., Buitelaar, J. K., . . . Beckmann, C. F. (2018). Thresholding functional connectomes by means of mixture modeling. NeuroImage, 171, 402–414. https://doi.org/10.1016/j.neuroimage.2018.01.003
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
Blumensath, T., Jbabdi, S., Glasser, M. F., Essen, D. C. V., Ugurbil, K., Behrens, T. E., & Smith, S. M. (2013). Spatially constrained hierarchical parcellation of the brain with resting-state fMRI. NeuroImage, 76, 313–24. https://doi.org/10.1016/j.neuroimage.2013.03.024
Bollen, K. (1989). Structural Equations with Latent Variables. New York: John Wiley and Sons.
Boxerman, J. L., Bandettini, P. A., Kwong, K. K., Baker, J. R., Davis, T. L., Rosen, B. R., & Weisskoff, R. M. (1995). The intravascular contribution to fMRI signal change: Monte Carlo modeling and diffusion-weighted studies in vivo. Magnetic Resonance in Medicine, 34(1), 4–10. https://doi.org/10.1002/mrm.1910340103
Breakspear, M. (2013). Dynamic and stochastic models of neuroimaging data: A comment on Lohmann et al. NeuroImage, 75, 270–4. https://doi.org/10.1016/j.neuroimage.2012.02.047
Bressler, S. L., & Seth, A. K. (2011). Wiener-Granger causality: A well established methodology. NeuroImage, 58(2), 323–9. https://doi.org/10.1016/j.neuroimage.2010.02.059
Bronkhorst, A. W. (2000). The cocktail party phenomenon: A review on speech intelligibility in multiple-talker conditions. Acta Acustica United with Acustica, 86, 117–28. https://doi.org/10.1121/1.1345696
Buijink, A. W. G., van der Stouwe, A. M. M., Broersma, M., Sharifi, S., Groot, P. F. C., Speelman, J. D., . . . van Rootselaar, A.-F. (2015). Motor network disruption in essential tremor: A functional and effective connectivity study. Brain, 138(10), 2934–47. https://doi.org/10.1093/brain/awv225
Bush, K., Cisler, J., Bian, J., Hazaroglu, G., Hazaroglu, O., & Kilts, C. (2015). Improving the precision of fMRI BOLD signal deconvolution with implications for connectivity analysis. Magnetic Resonance Imaging, 33(10), 1314–23. https://doi.org/10.1016/j.mri.2015.07.007
Buxton, R. B., Wong, E. C., & Frank, L. R. (1998). Dynamics of blood flow and oxygenation changes during brain activation: The Balloon model. Magnetic Resonance in Medicine, 39(6), 855–64. https://doi.org/10.1002/mrm.1910390602
Carballedo, A., Scheuerecker, J., Meisenzahl, E., Schoepf, V., Bokde, A., Möller, H. J., . . . Frodl, T. (2011). Functional connectivity of emotional processing in depression. Journal of Affective Disorders, 134(1-3), 272–9. https://doi.org/10.1016/j.jad.2011.06.021
Chai, B., Walther, D., Beck, D., & Fei-fei, L. (2009). Exploring functional connectivities of the human brain using multivariate information analysis. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 22 (pp. 270–278). La Jolla, CA: Curran Associates, Inc.
Chen, Y. C., Xia, W., Chen, H., Feng, Y., Xu, J. J., Gu, J. P., . . . Yin, X. (2017). Tinnitus distress is linked to enhanced resting-state functional connectivity from the limbic system to the auditory cortex. Human Brain Mapping, 38(5), 2384–97. https://doi.org/10.1002/hbm.23525
Chen, Z., & Chan, L. (2013). Causality in linear nongaussian acyclic models in the presence of latent gaussian confounders. Neural Computation, 25(6), 1605–41.
Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3, 507–54. https://doi.org/10.1162/153244303321897717
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6, 1–13.
Claassen, T., & Heskes, T. (2012). A Bayesian approach to constraint based causal inference. In UAI, Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence.
Comon, P., & Jutten, C. (2010). Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press.
Daunizeau, J., David, O., & Stephan, K. E. (2011). Dynamic causal modelling: A critical review of the biophysical and statistical foundations. NeuroImage, 58(2), 312–22. https://doi.org/10.1016/j.neuroimage.2009.11.062
Daunizeau, J., Friston, K. J., & Kiebel, S. J. (2009). Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Physica D: Nonlinear Phenomena, 238(21), 2089–118. https://doi.org/10.1016/j.physd.2009.08.002
Daunizeau, J., Stephan, K. E., & Friston, K. (2012). Stochastic dynamic causal modelling of fMRI data: Should we care about neural noise? NeuroImage, 62(1), 464–81. https://doi.org/10.1016/j.neuroimage.2012.04.061
David, O., Guillemain, I., Saillet, S., Reyt, S., Deransart, C., Segebarth, C., & Depaulis, A. (2008). Identifying neural drivers with functional MRI: An electrophysiological validation. PLoS Biology, 6(12), e315. https://doi.org/10.1371/journal.pbio.0060315
Deisseroth, K. (2011). Optogenetics. Nature Methods, 8, 26–9. https://doi.org/10.1038/nmeth.f.324
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38. https://doi.org/10.2307/2984875
Deshpande, G., LaConte, S., James, G. A., Peltier, S., & Hu, X. (2008). Multivariate Granger causality analysis of fMRI data. Human Brain Mapping, 30(4), 1361–73. https://doi.org/10.1002/hbm.20606
Devonshire, I. M., Papadakis, N. G., Port, M., Berwick, J., Kennerley, A. J., Mayhew, J. E., & Overton, P. G. (2012). Neurovascular coupling is brain region-dependent. NeuroImage, 59(3), 1997–2006. https://doi.org/10.1016/j.neuroimage.2011.09.050
Diebold, F. X. (2001). Elements of Forecasting (2nd ed.). Cincinnati: South Western.
Diggle, P. J. (1984). Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society, Series B, 46, 193–227.
DSouza, A. M., Abidin, A. Z., Leistritz, L., & Wismüller, A. (2017). Exploring connectivity with large-scale Granger causality in resting-state functional MRI. Journal of Neuroscience Methods, 287, 68–79. https://doi.org/10.1016/j.jneumeth.2017.06.007
Dubois, J., Oya, H., Tyszka, J. M., Howard, M., Eberhardt, F., & Adolphs, R. (2017). Causal mapping of emotion networks in the human brain: Framework and preliminary findings. Neuropsychologia. https://doi.org/10.1016/j.neuropsychologia.2017.11.015
Essen, D. C. V., Smith, S. M., Barch, D. M., Behrens, T., Yacoub, E., Ugurbil, K., & Consortium W.-M. H. (2013). The Human Connectome Project: A data acquisition perspective. NeuroImage, 62(4), 2222–31. https://doi.org/10.1016/j.neuroimage.2012.02.018
Fedorenko, E., Hsieh, P.-J., Nieto-Castañón, A., Whitfield-Gabrieli, S., & Kanwisher, N. (2010). New method for fMRI investigations of language: Defining ROIs functionally in individual subjects. Journal of Neurophysiology, 104(2), 1177–94. https://doi.org/10.1152/jn.00032.2010
Feinberg, D. A., & Setsompop, K. (2013). Ultra-fast MRI of the human brain with simultaneous multi-slice imaging. Journal of Magnetic Resonance, 229, 90–100. https://doi.org/10.1016/j.jmr.2013.02.002
Felleman, D. J., & Essen, D. C. V. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47.
Ferron, J. M., & Hess, M. R. (2007). Estimation in SEM: A concrete example. Journal of Educational and Behavioral Statistics, 32(1), 110–20. https://doi.org/10.3102/1076998606298025
Fornito, A., Zalesky, A., & Breakspear, M. (2013). Graph analysis of the human connectome: Promise, progress, and pitfalls. NeuroImage, 80, 426–44. https://doi.org/10.1016/j.neuroimage.2013.04.087
Frässle, S., Lomakina, E. I., Razi, A., Friston, K. J., Buhmann, J. M., & Stephan, K. E. (2017). Regression DCM for fMRI. NeuroImage, 155, 406–21. https://doi.org/10.1016/j.neuroimage.2017.02.090
Frässle, S., Lomakina, E. I., Kasper, L., Manjaly, Z. M., Leffe, A., Pruessmann, K. P., . . . Stephan, K. E. (2018). A generative model of whole-brain effective connectivity. NeuroImage. https://doi.org/10.1016/j.neuroimage.2018.05.058
Frässle, S., Lomakina-Rumyantseva, E., Razi, A., Buhmann, J. M., & Friston, K. J. (2016). Whole-brain Dynamic Causal Modeling of fMRI data. Retrieved from https://www.researchgate.net/project/Whole-brain-dynamic-causal-modeling-of-fMRI-data
Frässle, S., Paulus, F. M., Krach, S., & Jansen, A. (2016). Test-retest reliability of effective connectivity in the face perception network. Human Brain Mapping, 37(2), 730–44. https://doi.org/10.1002/hbm.23061
Frässle, S., Stephan, K. E., Friston, K. J., Steup, M., Krach, S., Paulus, F. M., & Jansen, A. (2015). Test-retest reliability of dynamic causal modeling for fMRI. NeuroImage, 117, 56–66. https://doi.org/10.1016/j.neuroimage.2015.05.040
Frey, B. J., & Jojic, N. (2005). A comparison of algorithms for inference and learning in probabilistic Graphical Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9), 1392–416. https://doi.org/10.1109/TPAMI.2005.169
Friston, K., Daunizeau, J., & Stephan, K. E. (2013). Model selection and gobbledygook: Response to Lohmann et al. NeuroImage, 75, 275–8. https://doi.org/10.1016/j.neuroimage.2011.11.064
Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 23(2), 172–8. https://doi.org/10.1016/j.conb.2012.11.010
Friston, K. J., Ashburner, J., Kiebel, S. J., Nichols, T. E., & Penny, W. D. (2007). Statistical Parametric Mapping: The Analysis of Functional Brain Images. Cambridge, MA: Academic Press.
Friston, K. J., Buchel, C., Fink, G. R., Morris, J., Rolls, E., & Dolan, R. (1997). Psychophysiological and modulatory interactions in neuroimaging. NeuroImage, 6(3), 218–29. https://doi.org/10.1006/nimg.1997.0291
Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modeling. NeuroImage, 19(4), 1273–302. https://doi.org/10.1016/S1053-8119(03)00202-7
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. J. (1995). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2, 189–210.
Friston, K. J., Kahan, J., Biswal, B., & Razi, A. (2011). A DCM for resting state fMRI. NeuroImage, 94, 396–407. https://doi.org/10.1016/j.neuroimage.2013.12.009
Friston, K. J., Preller, K. H., Mathys, C., Cagnan, H., Heinzle, J., Razi, A., & Zeidman, P. (2017). Dynamic causal modelling revisited. NeuroImage, S1053-8119(17), 30156–8. https://doi.org/10.1016/j.neuroimage.2017.02.045
Friston, K. J., & Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159(3), 417–58. https://doi.org/10.1007/s11229-007-9237-y
Gates, K. M., & Molenaar, P. C. M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. NeuroImage, 63(1), 310–9. https://doi.org/10.1016/j.neuroimage.2012.06.026
Geweke, J. F. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378), 304–13. https://doi.org/10.1080/01621459.1982.10477803
Geweke, J. F. (1984). Measures of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 79(388), 907–15. https://doi.org/10.1080/01621459.1984.10477110
Gilson, M., Moreno-Bote, R., Ponce-Alvarez, A., Ritter, P., & Deco, G. (2016). Estimation of directed effective connectivity from fMRI functional connectivity hints at asymmetries of cortical connectome. PLoS Computational Biology. https://doi.org/10.1371/journal.pcbi.1004762
Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub, E., . . . Essen, D. C. V. (2016). A multi-modal parcellation of human cerebral cortex. Nature, 536(7615), 171–8.
https://doi.org/10.1038/nature18933 Glomb, K., Ponce-Alvarez, A., Gilson, M., Ritter, P., & Deco, G. (2017). Stereotypical modulations in dynamic functional con- nectivity explained by changes in BOLD variance. NeuroImage. https://doi.org/10.1016/j.neuroimage.2017.12.074 (2000). Image-based method for retrospective correction of physiological motion ef- fects in fMRI: RETROICOR. Magnetic Resonance in Medicine, 44(1), 162–167. https://doi.org/10.1002/1522-2594(200007)44: 1<162::AID-MRM23>3.0.CO;2-E

Glover, G. H., Li, T. Q., & Ress, D.

Goodyear, K., Parasuraman, R., Chernyak, S., Madhavan, P., Deshpande, G., & Krueger, F. (2016). Advice taking from humans and machines: An fMRI and effective connectivity study. Frontiers in Human Neuroscience, 4(10), 542. https://doi.org/10.3389/fnhum.2016.00542

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424–38. https://doi.org/10.2307/1912791

Griffiths, J. D. (2015). Causal influence in neural systems: Reconciling mechanistic-reductionist and statistical perspectives. Comment on “Foundational perspectives on causality in large-scale brain networks” by M. Mannino & S. L. Bressler. Physics of Life Reviews, 15, 130–2. https://doi.org/10.1016/j.plrev.2015.11.003

Grosse-Wentrup, M. (2014). Lecture: An introduction to causal inference in neuroimaging. Max Planck Institute for Intelligent Systems. Retrieved from http://videolectures.net/bbci2014_grosse_wentrup_causal_inference/

Grosse-Wentrup, M., Janzing, D., Siegel, M., & Schölkopf, B. (2016). Identification of causal relations in neuroimaging data with latent confounders: An instrumental variable approach. NeuroImage, 125, 825–33. https://doi.org/10.1016/j.neuroimage.2015.10.062

Handwerker, D. A., Gonzalez-Castillo, J., D'Esposito, M., & Bandettini, P. A. (2012). The continuing challenge of understanding and modeling hemodynamic variation in fMRI. NeuroImage, 62(2), 1017–23. https://doi.org/10.1016/j.NeuroImage.2012.02.015

Handwerker, D. A., Ollinger, J. M., & D'Esposito, M. (2004). Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. NeuroImage, 21(4), 1639–51. https://doi.org/10.1016/j.NeuroImage.2003.11.029

Hausman, D. M., & Woodward, J. (1999). Independence, invariance, and the causal Markov condition. British Journal for the Philosophy of Science, 50(4), 521–83. https://doi.org/10.1093/bjps/50.4.521

Havlicek, M., Roebroeck, A., Friston, K., Gardumi, A., Ivanov, D., & Uludag, K. (2015). Physiologically informed Dynamic Causal Modeling of fMRI data. NeuroImage, 122, 355–72. https://doi.org/10.1016/j.NeuroImage.2015.07.078

Hayashi, F. (2000). Econometrics. Princeton University Press.
He, B. J. (2014). Scale-free brain activity: Past, present, and future. Trends in Cognitive Sciences, 18(9), 480–87. https://doi.org/10.1016/j.tics.2014.04.003

Heinzle, J., Wenzel, M. A., & Haynes, J.-D. (2012). Visuomotor functional network topology predicts upcoming tasks. Journal of Neuroscience, 32(29), 9960–8. https://doi.org/10.1523/JNEUROSCI.1604-12.2012

Hesse, W., Möller, E., Arnold, M., & Schack, B. (2003). The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. Journal of Neuroscience Methods, 124(1), 27–44. https://doi.org/10.1016/S0165-0270(02)00366-7

Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382–401. https://doi.org/10.1214/ss/1009212519

Hoyer, P. O., Shimizu, S., Kerminen, A., & Palviainen, M. (2008). Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2), 362–78. https://doi.org/10.1016/j.ijar.2008.02.006

Hume, D. (1772). Cause and effect. In An Enquiry Concerning Human Understanding.

Hutcheson, N. L., Sreenivasan, K. R., Deshpande, G., Reid, M. A., Hadley, J., White, D. M., . . . Lahti, A. C. (2015). Effective connectivity during episodic memory retrieval in schizophrenia participants before and after antipsychotic medication. Human Brain Mapping, 36(4), 1442–57. https://doi.org/10.1002/hbm.22714

Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5), 411–430. https://doi.org/10.1016/s0893-6080(00)00026-5

Hyvärinen, A., & Smith, S. M. (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14(1), 111–52.

Hyvärinen, A., Zhang, K., Shimizu, S., & Hoyer, P. O. (2010). Estimation of a structural vector autoregression model using non-gaussianity. Journal of Machine Learning Research, 11, 1709–31.

Jacobucci, R., Grimm, K. J., & McArdle, J. J. (2016). Regularized structural equation modeling. Structural Equation Modeling, 23(4), 555–66. https://doi.org/10.1080/10705511.2016.1154793

James, G., Kelley, M., Craddock, R., Holtzheimer, P., Dunlop, B., Nemeroff, C., . . . Hu, X. (2009). Exploratory structural equation modeling of resting-state fMRI: Applicability of group models to individual subjects. NeuroImage, 45(3), 778–87. https://doi.org/10.1016/j.NeuroImage.2008.12.049

Janssen, R. J., Jylänki, P., Kessels, R. P., & van Gerven, M. A. (2015). Probabilistic model-based functional parcellation reveals a robust, fine-grained subdivision of the striatum. NeuroImage, S1053-8119(15), 00589-3. https://doi.org/10.1016/j.NeuroImage.2015.06.084

Friston, K. J., Litvak, V., Oswal, A., Razi, A., Stephan, K. E., van Wijk, B. C. M., . . . Zeidman, P. (2016). Bayesian model reduction and empirical Bayes for group (DCM) studies. NeuroImage, 128, 413–31. https://doi.org/10.1016/j.neuroimage.2015.11.015

Jolliffe, I. T. (2002). Principal Component Analysis. New York: Springer.

Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1998). An introduction to variational methods for Graphical Models. In M. I. Jordan (Ed.), Learning in Graphical Models. Kluwer Academic.

Jöreskog, K. G., & Thillo, M. V. (1972). LISREL a general computer program for estimating a linear structural equation system involving multiple indicators of unmeasured variables. ETS Research Bulletin Series, 2, i–71. https://doi.org/10.1002/j.2333-8504.1972.tb00827.x

Kahan, J., & Foltynie, T. (2013). Understanding DCM: Ten simple rules for the clinician. NeuroImage, 83, 542–9. https://doi.org/10.1016/j.NeuroImage.2013.07.008

Kelly, C., Toro, R., Martino, A. D., Cox, C., Bellec, P., Castellanos, F. X., & Milham, M. P. (2012). A convergent functional architecture of the insula emerges across imaging modalities. NeuroImage, 61(4), 1129–42. https://doi.org/10.1016/j.neuroimage.2012.03.021

Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915–29.

Kiebel, S. J., Garrido, M. I., Moran, R. J., & Friston, K. J. (2008). Dynamic causal modelling for EEG and MEG. Cognitive Neurodynamics, 2(2), 121–36. https://doi.org/10.1007/s11571-008-9038-0

Kiebel, S. J., Kloppel, S., Weiskopf, N., & Friston, K. J. (2007). Dynamic causal modeling: A generative model of slice timing in fMRI. NeuroImage, 34(4), 1487–96. https://doi.org/10.1016/j.neuroimage.2006.10.026

Kim, D. R., Pesiridou, A., & O’Reardon, J. P. (2009). Transcranial magnetic stimulation in the treatment of psychiatric disorders. Current Psychiatry Reports, 11(6), 447–52. https://doi.org/10.1007/s11920-009-0068-z

Kiyama, S., Kunimi, M., Iidaka, T., & Nakai, T. (2014). Distant functional connectivity for bimanual finger coordination declines with aging: An fMRI and SEM exploration. Frontiers in Human Neuroscience, 8, 251. https://doi.org/10.3389/fnhum.2014.00251

Kok, P., Bains, L., van Mourik, T., Norris, D., & de Lange, F. (2016). Selective activation of the deep layers of the human primary visual cortex by top-down feedback. Current Biology, 26(3), 371–376. https://doi.org/10.1016/j.cub.2015.12.038

Komatsu, Y., Shimizu, S., & Shimodaira, H. (2010). Assessing statistical reliability of LiNGAM via multiscale bootstrap. In Proceedings of the 20th International Conference on Artificial Neural Networks (ICANN2010).

Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1(1), 417–446.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems – Volume 1 (pp. 1097–1105). USA: Curran Associates Inc. Retrieved from http://dl.acm.org/citation.cfm?id=2999134.2999257

Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519. https://doi.org/10.1109/18.910572

Li, B., Piriz, J., Mirrione, M., Chung, C., Proulx, C. D., Schulz, D., . . . Malinow, R. (2011). Synaptic potentiation onto habenula neurons in the learned helplessness model of depression. Nature, 470(7335), 535–9. https://doi.org/10.1038/nature09742

Lizier, J., Prokopenko, M., & Zomaya, A. (2008). Local information transfer as a spatiotemporal filter for complex systems. Physical Review E – Statistical, Nonlinear, and Soft Matter Physics, 77(2), 026110. https://doi.org/10.1103/PhysRevE.77.026110

Lizier, J. T., Heinzle, J., Horstmann, A., Haynes, J. D., & Prokopenko, M. (2011). Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. Journal of Computational Neuroscience, 30(1), 85–107. https://doi.org/10.1007/s10827-010-0271-2

Lohmann, G., Erfurth, K., Müller, K., & Turner, R. (2012). Critical comments on dynamic causal modelling. NeuroImage, 59(3), 2322–29. https://doi.org/10.1016/j.neuroimage.2011.09.025

Mannino, M., & Bressler, S. L. (2015). Foundational perspectives on causality in large-scale brain networks. Physics of Life Reviews, 15, 107–23. https://doi.org/10.1016/j.plrev.2015.09.002

Marreiros, A. C., Kiebel, S. J., & Friston, K. J. (2008). Dynamic causal modelling for fMRI: A two-state model. NeuroImage, 39(1), 269–78. https://doi.org/10.1016/j.NeuroImage.2007.08.019

Marrelec, G., & Fransson, P. (2011). Assessing the influence of different ROI selection strategies on functional connectivity analyses of fMRI data acquired during steady-state conditions. PLoS One, 6(4), e14788. https://doi.org/10.1371/journal.pone.0014788

Marrelec, G., Krainik, A., Duffau, H., Pélégrini-Issac, M., Lehéricy, S., Doyon, J., & Benali, H. (2006). Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage, 32(1), 228–37. https://doi.org/10.1016/j.NeuroImage.2005.12.057

McIntosh, A., & Gonzalez-Lima, F. (1994). Structural equation modeling and its application to network analysis in functional brain imaging. Human Brain Mapping, 2, 2–22. https://doi.org/10.1002/hbm.460020104

Meek, C. (1995). Causal inference and causal explanation with background knowledge. In Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence 558 (pp. 403–10).

Gilson, M., Deco, G., Friston, K. J., Hagmann, P., Mantini, D., Betti, V., Romani, G. L., & Corbetta, M. (2017). Effective connectivity inferred from fMRI transition dynamics during movie viewing points to a balanced reconfiguration of cortical interactions. NeuroImage. https://doi.org/10.1016/j.neuroimage.2017.09.061

Michalareas, G., Vezoli, J., van Pelt, S., Schoffelen, J.-M., Kennedy, H., & Fries, P. (2016). Alpha-beta and gamma rhythms subserve feedback and feedforward influences among human visual cortical areas. Neuron, 89(2), 384–97. https://doi.org/10.1016/j.neuron.2015.12.018

Miezin, F. M., Maccotta, L., Ollinger, J. M., Petersen, S. E., & Buckner, R. L. (2000). Characterizing the hemodynamic response: Effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing. NeuroImage, 11(6), 735–59. https://doi.org/10.1006/nimg.2000.0568

Mill, R. D., Bagic, A., Bostan, A., Schneider, W., & Cole, M. W. (2017). Empirical validation of directed functional connectivity. NeuroImage, 146, 275–87. https://doi.org/10.1016/j.NeuroImage.2016.11.037

Montalto, A., Faes, L., & Marinazzo, D. (2014). MuTE: A Matlab toolbox to compare established and novel estimators of the multivariate transfer entropy. PLoS One, 9(10), e109462. https://doi.org/10.1371/journal.pone.0109462

Muckli, L., De Martino, F., Vizioli, L., Petro, L., Smith, F., Ugurbil, K., . . . Yacoub, E. (2015). Contextual feedback to superficial layers of V1. Current Biology, 25(20), 2690–2695. https://doi.org/10.1016/j.cub.2015.08.057

Mumford, J. A., & Ramsey, J. D. (2014). Bayesian networks for fMRI: A primer. NeuroImage, 86, 573–82. https://doi.org/10.1016/j.NeuroImage.2013.10.020

Neal, R. M. (1993). Probabilistic inference using Markov Chain Monte Carlo methods (Technical Report CRG-TR-93-1). Department of Computer Science, University of Toronto.

Ogarrio, J. M., Spirtes, P., & Ramsey, J. (2016). A hybrid causal search algorithm for latent variable models. In Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR.

Ogawa, S., Menon, R. S., Tank, D. W., Kim, S. G., Merkle, H., Ellermann, J. M., & Ugurbil, K. (1993). Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophysical Journal, 64(3), 803–12. https://doi.org/10.1016/S0006-3495(93)81441-3

Ostwald, D., & Bagshaw, A. P. (2011). Information theoretic approaches to functional neuroimaging. Magnetic Resonance Imaging, 29(10), 1417–28. https://doi.org/10.1016/j.mri.2011.07.013

Papadopoulou, M., Leite, M., van Mierlo, P., Vonck, K., Lemieux, L., Friston, K., & Marinazzo, D. (2015). Tracking slow modulations in synaptic gain using dynamic causal modelling: Validation in epilepsy. NeuroImage, 107, 117–126. https://doi.org/10.1016/j.neuroimage.2014.12.007

Patel, R., Bowman, F. D., & Rilling, J. (2006). A Bayesian approach to determining connectivity of the human brain. Human Brain Mapping, 27(3), 267–76. https://doi.org/10.1002/hbm.20182

Penny, W., Stephan, K., Mechelli, A., & Friston, K. (2004). Modelling functional integration: A comparison of structural equation and dynamic causal models. NeuroImage, 23(S1), 264–74. https://doi.org/10.1016/j.NeuroImage.2004.07.041

Penny, W. D. (2012). Comparing dynamic causal models using AIC, BIC and free energy. NeuroImage, 59(1), 319–330. https://doi.org/10.1016/j.neuroimage.2011.07.039

Penny, W. D., Stephan, K. E., Daunizeau, J., Rosa, M. J., Friston, K. J., Schofield, T. M., & Leff, A. P. (2010). Comparing families of dynamic causal models. PLoS Computational Biology, 6(3), e1000709. https://doi.org/10.1371/journal.pcbi.1000709

Poldrack, R. A. (2007). Region of interest analysis for fMRI. Social Cognitive and Affective Neuroscience, 2(1), 67–70. https://doi.org/10.1093/scan/nsm006

Prando, G., Zorzi, M., Bertoldo, A., & Chiuso, A. (2017). Estimating effective connectivity in linear brain network models. arXiv preprint. Retrieved from https://arxiv.org/abs/1703.10363

Protzner, A. B., & McIntosh, A. R. (2006). Testing effective connectivity changes with structural equation modeling: What does a bad model tell us? Human Brain Mapping, 27(12), 935–47. https://doi.org/10.1002/hbm.20233

Ramsey, J., Zhang, J., & Spirtes, P. (2006). Adjacency-faithfulness and conservative causal inference. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (pp. 401–8).

Ramsey, J. D. (2015). Scaling up Greedy Causal Search for continuous variables. arXiv:1507.7749.

Ramsey, J. D., Glymour, M., Sanchez-Romero, R., & Glymour, C. (2017). A million variables and more: The fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 3(2), 121–9. https://doi.org/10.1007/s41060-016-0032-z

Ramsey, J. D., Hanson, S. J., & Glymour, C. (2011). Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study. NeuroImage, 58(3), 838–48. https://doi.org/10.1016/j.NeuroImage.2011.06.068

Ramsey, J. D., Hanson, S. J., Hanson, C., Halchenko, Y. O., Poldrack, R., & Glymour, C. (2010). Six problems for causal inference from fMRI. NeuroImage, 49(2), 1545–58. https://doi.org/10.1016/j.NeuroImage.2009.08.065

Ramsey, J. D., Sanchez-Romero, R., & Glymour, C. (2014). Non-Gaussian methods and high-pass filters in the estimation of effective connections. NeuroImage, 84, 986–1006. https://doi.org/10.1016/j.neuroimage.2013.09.062

Razi, A., & Friston, K. J. (2016). The connected brain: Causality, models, and intrinsic dynamics. IEEE Signal Processing Magazine, 33(3), 14–35. https://doi.org/10.1109/MSP.2015.2482121

Razi, A., Seghier, M. L., Zhou, Y., McColgan, P., Zeidman, P., Park, H.-J., . . . Friston, K. J. (2017). Large-scale DCMs for resting state fMRI. Network Neuroscience, 1, 222–241.

Regner, M. F., Saenz, N., Maharajh, K., Yamamoto, D. J., Mohl, B., Wylie, K., . . . Tanabe, J. (2016). Top-down network effective connectivity in abstinent substance dependent individuals. PLoS One, 11(10), e0164818. https://doi.org/10.1371/journal.pone.0164818

Richardson, T., & Spirtes, P. (2001). Automated discovery of linear feedback models. In C. Glymour & G. Cooper (Eds.), Computation, Causation and Causality. Cambridge, MA: MIT Press.

Roebroeck, A., Formisano, E., & Goebel, R. (2005). Mapping directed influence over the brain using Granger causality and fMRI. NeuroImage, 25(1), 230–42. https://doi.org/10.1016/j.NeuroImage.2004.11.017

Roebroeck, A., Seth, A. K., & Valdes-Sosa, P. (2011). Causal time series analysis of functional magnetic resonance imaging data. Journal of Machine Learning Research: Workshop and Conference Proceedings, 12, 65–94.

Rohrer, J. M. (2017). Clarifying the confusion surrounding correlations, statistical control and causation. PsyArXiv preprint. https://doi.org/10.17605/OSF.IO/T3QUB

Rowe, J., Hughes, L., Barker, R., & Owen, A. (2010). Dynamic causal modelling of effective connectivity from fMRI: Are results reproducible and sensitive to Parkinson’s disease and its treatment? NeuroImage, 52(3), 1015–26. https://doi.org/10.1016/j.NeuroImage.2009.12.080

Ryali, S., Shih, Y. Y., Chen, T., Kochalka, J., Albaugh, D., Fang, Z., . . . Menon, V. (2016). Combining optogenetic stimulation and fMRI to validate a multivariate dynamical systems model for estimating causal brain interactions. NeuroImage, 132, 398–405. https://doi.org/10.1016/j.NeuroImage.2016.02.067

Ryali, S., Supekar, K., Chen, T., & Menon, V. (2011). Multivariate dynamical systems models for estimating causal interactions in fMRI. NeuroImage, 54(2), 807–23. https://doi.org/10.1016/j.NeuroImage.2010.09.052

Sathian, K., Deshpande, G., & Stilla, R. (2013). Neural changes with tactile learning reflect decision-level reweighting of perceptual readout. Journal of Neuroscience, 33(12), 5387–98. https://doi.org/10.1523/JNEUROSCI.3482-12.2013

Schiefer, J., Niederbühl, A., Pernice, V., Lennartz, C., Hennig, J., LeVan, P., & Rotter, S. (2018). From correlation to causation: Estimating effective connectivity from zero-lag covariances of brain signals. PLoS Computational Biology, 14(3), e1006056. https://doi.org/10.1371/journal.pcbi.1006056

Schlösser, R., Gesierich, T., Kaufmann, B., Vucurevic, G., Hunsche, S., Gawehn, J., & Stoeter, P. (2003). Altered effective connectivity during working memory performance in schizophrenia: A study with fMRI and structural equation modeling. NeuroImage, 19(3), 751–63. https://doi.org/10.1016/S1053-8119(03)00106-X

Schlösser, R. G. M., Wagner, G., Koch, K., Dahnke, R., Reichenbach, J. R., & Sauer, H. (2008). Fronto-cingulate effective connectivity in major depression: A study with fMRI and Dynamic Causal Modeling. NeuroImage, 43(3), 645–55.

Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–4. https://doi.org/10.1103/PhysRevLett.85.461

Schurger, A., & Uithol, S. (2015). Nowhere and everywhere: The causal origin of voluntary action. Review of Philosophy and Psychology, 6(4), 761–78. https://doi.org/10.1007/s13164-014-0223-2

Schuyler, B., Ollinger, J. M., Oakes, T. R., Johnstone, T., & Davidson, R. J. (2010). Dynamic causal modeling applied to fMRI data shows high reliability. NeuroImage, 49(1), 603–11. https://doi.org/10.1016/j.neuroimage.2009.07.015

Schwab, S., Harbord, R., Zerbi, V., Elliott, L., Afyouni, S., Smith, J. Q., Woolrich, M. W., . . . Nichols, T. E. (2018). Directed functional connectivity using dynamic graphical models. NeuroImage, S1053–8119(18), 30284–2. https://doi.org/10.1016/j.neuroimage.2018.03.074

Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–4. https://doi.org/10.1214/aos/1176344136

Seghier, M. L., & Friston, K. J. (2013). Network discovery with large DCMs. NeuroImage, 68, 181–91. https://doi.org/10.1016/j.neuroimage.2012.12.005

Sengupta, B., Friston, K. J., & Penny, W. D. (2015). Gradient-free MCMC methods for dynamic causal modelling. NeuroImage, 112, 375–81. https://doi.org/10.1016/j.NeuroImage.2015.03.008

Seth, A. K., Barrett, A. B., & Barnett, L. (2015). Granger causality analysis in neuroscience and neuroimaging. Journal of Neuroscience, 35(8), 3293–7. https://doi.org/10.1523/JNEUROSCI.4399-14.2015

Seth, A. K., Chorley, P., & Barnett, L. C. (2013). Granger causality analysis of fMRI BOLD signals is invariant to hemodynamic convolution but not downsampling. NeuroImage, 65, 540–55. https://doi.org/10.1016/j.NeuroImage.2012.09.049

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(4), 623–56. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

Sharaev, M., Ushakov, V., & Velichkovsky, B. (2016). Causal interactions within the default mode network as revealed by low-frequency brain fluctuations and information transfer entropy. In A. V. Samsonovich, V. V. Klimov, & G. V. Rybina (Eds.), Biologically Inspired Cognitive Architectures (BICA) for Young Scientists: Proceedings of the First International Early Research Career Enhancement School (FIERCES 2016) (pp. 213–18).

Shimizu, S. (2014). LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1), 65–98. https://doi.org/10.2333/bhmk.41.65

Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7, 2003–30.

Shlens, J. (2014). A tutorial on principal component analysis. arxiv.org/abs/1404.1100

Smith, S., Miller, K., Salimi-Khorshidi, G., Webster, M., Beckmann, C., Nichols, T., . . . Woolrich, M. (2011). Network modelling methods for fMRI. NeuroImage, 54(2), 875–91. https://doi.org/10.1016/j.NeuroImage.2010.08.063

Smith, S. M., Fox, P. T., Miller, K. L., Glahn, D. C., Fox, P. M., Mackay, C. A., . . . Beckmann, C. F. (2009). Correspondence of the brain’s functional architecture during activation and rest. Proceedings of the National Academy of Sciences, 106(31), 13040–5. https://doi.org/10.1073/pnas.0905267106

Solo, V. (2016). State-space analysis of Granger-Geweke causality measures with application to fMRI. Neural Computation, 28(5), 914–49. https://doi.org/10.1162/NECO_a_00828

Spirtes, P. (2010). Introduction to causal inference. Journal of Machine Learning Research, 11, 1643–62.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, Prediction, and Search. Springer-Verlag Lecture Notes in Statistics.

Stanley, M. L., Moussa, M. N., Paolini, B. M., Lyday, R. G., Burdette, J. H., & Laurienti, P. J. (2013). Defining nodes in complex brain networks. Frontiers in Computational Neuroscience, 7, 169. https://doi.org/10.3389/fncom.2013.00169

Stephan, K. E., Kasper, L., Harrison, L., Daunizeau, J., van den Ouden, H. E. M., Breakspear, M., . . . Friston, K. J. (2008). Nonlinear dynamic causal models for fMRI. NeuroImage, 42(2), 649–62. https://doi.org/10.1016/j.NeuroImage.2008.04.262

Stephan, K. E., Penny, W. D., Moran, R. J., den Ouden, H. E., Daunizeau, J., & Friston, K. J. (2010). Ten simple rules for dynamic causal modeling. NeuroImage, 49(4), 3099–109. https://doi.org/10.1016/j.NeuroImage.2009.11.015

Stephan, K. E., & Roebroeck, A. (2012). A short history of causal modeling of fMRI data. NeuroImage, 62(2), 856–63. https://doi.org/10.1016/j.NeuroImage.2012.01.034

Stephan, K. E., Weiskopf, N., Drysdale, P. M., Robinson, P. A., & Friston, K. J. (2007). Comparing hemodynamic models with DCM. NeuroImage, 38(3), 387–401. https://doi.org/10.1016/j.neuroimage.2007.07.040

Stokes, P. A., & Purdon, P. L. (2017). A study of problems encountered in Granger causality analysis from a neuroscience perspective. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1704663114

Tak, S., Noh, J., Cheong, C., Zeidman, P., Razi, A., Penny, W. D., & Friston, K. J. (2018). A validation of dynamic causal modelling for 7T fMRI. Journal of Neuroscience Methods. https://doi.org/10.1016/j.jneumeth.2018.05.002

Thamvitayakul, K., Shimizu, S., Ueno, T., Washio, T., & Tashiro, T. (2012). Assessing statistical reliability of LiNGAM via multiscale bootstrap. In Proceedings of 2012 IEEE 12th International Conference on Data Mining Workshops (ICDMW2012).

Thirion, B., Varoquaux, G., Dohmatob, E., & Poline, J.-B. (2014). Which fMRI clustering gives good brain parcellations? Frontiers in Neuroscience, 8, 167. https://doi.org/10.3389/fnins.2014.00167

Thulasiraman, K., & Swamy, M. N. S. (1992). Directed acyclic graphs. In Graphs: Theory and Algorithms. New York: John Wiley and Sons.

Triantafyllou, C., Hoge, R. D., & Wald, L. (2006). Effect of spatial smoothing on physiological noise in high-resolution fMRI. NeuroImage, 32(2), 551–7. https://doi.org/10.1016/j.neuroimage.2006.04.182

Valdes-Sosa, P. A., Roebroeck, A., Daunizeau, J., & Friston, K. (2011). Effective connectivity: Influence, causality and biophysical modeling. NeuroImage, 58(2), 339–61. https://doi.org/10.1016/j.NeuroImage.2011.03.058

van den Heuvel, M., Mandl, R., & Pol, R. H. (2008). Normalized cut group clustering of resting-state fMRI data. PLoS One, 3(4), e2001. https://doi.org/10.1371/journal.pone.0002001

van Oort, E. S. B., Mennes, M., Schröder, T. N., Kumar, V. J., Jimenez, N. I. Z., Grodd, W., . . . Beckmann, C. F. (2017). Functional parcellation using time courses of instantaneous connectivity. NeuroImage. https://doi.org/10.1016/j.neuroimage.2017.07.027

Vaudano, A. E., Avanzini, P., Tassi, L., Ruggieri, A., Cantalupo, G., Benuzzi, F., . . . Meletti, S. (2013). Causality within the epileptic network: An EEG-fMRI study validated by intracranial EEG. Frontiers in Neurology, 14(4), 185. https://doi.org/10.3389/fneur.2013.00185

Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2011). Transfer entropy—A model-free measure of effective connectivity for the neurosciences. Journal of Computational Neuroscience, 30(1), 45–67. https://doi.org/10.1007/s10827-010-0262-3

Wang, Y., Katwal, S., Rogers, B., Gore, J., & Deshpande, G. (2016). Experimental validation of dynamic Granger causality for inferring stimulus-evoked sub-100ms timing differences from fMRI. IEEE Transactions on Neural Systems and Rehabilitation Engineering, PP(99). https://doi.org/10.1109/TNSRE.2016.2593655

Webb, J. T., Ferguson, M. A., Nielsen, J. A., & Anderson, J. S. (2013). BOLD Granger causality reflects vascular anatomy. PLoS One, 8, e84279. https://doi.org/10.1371/journal.pone.0084279

Wheelock, M. D., Sreenivasan, K. R., Wood, K. H., Hoef, L. W. V., Deshpande, G., & Knight, D. C. (2014). Threat-related learning relies on distinct dorsal prefrontal cortex network connectivity. NeuroImage, 102(2), 904–12. https://doi.org/10.1016/j.neuroimage.2014.08.005

Wollstadt, P., Martinez-Zarzuela, M., Vicente, R., Diaz-Pernas, F. J., & Wibral, M. (2014). Efficient transfer entropy analysis of non-stationary neural time series. PLoS One, 9(7), e102833. https://doi.org/10.1371/journal.pone.0102833

Wright, S. (1920). The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences, 6(6), 320–32. https://doi.org/10.1073/pnas.6.6.320

Xu, L., Fan, T., Wu, X., Chen, K., Guo, X., Zhang, J., & Yao, L. (2014). A pooling-LiNGAM algorithm for effective connectivity analysis of fMRI data. Frontiers in Computational Neuroscience, 8, 125. https://doi.org/10.3389/fncom.2014.00125

Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 3093–3101). Curran Associates, Inc.

Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 111(23), 8619–8624. https://doi.org/10.1073/pnas.1403112111

Yu, X., Qian, C., Chen, D.-y., Dodd, S. J., & Koretsky, A. P. (2014). Deciphering laminar-specific neural inputs with line-scanning fMRI. Nature Methods, 11(1), 55–58.

Zhang, J. (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172(16–17), 1873–96. 0.1016/j.artint.2008.08.001

Zhao, Z., Wang, X., Fan, M., Yin, D., Sun, L., Jia, J., . . . Gong, J. (2016). Altered effective connectivity of the primary motor cortex in stroke: A resting-state fMRI study with Granger causality analysis. PLoS One, 11(11), e0166210. https://doi.org/10.1371/journal.pone.0166210

Zhuang, J., LaConte, S., Peltier, S., Zhang, K., & Hu, X. (2005). Connectivity exploration with structural equation modeling: An fMRI study of bimanual motor coordination. NeuroImage, 25(2), 462–70. https://doi.org/10.1016/j.neuroimage.2004.11.007

Zhuang, J., Peltier, S., He, S., LaConte, S., & Hu, X. (2008). Mapping the connectivity with structural equation modeling in an fMRI study of shape from motion task. NeuroImage, 42(2), 799–806. https://doi.org/10.1016/j.neuroimage.2008.05.036