RESEARCH

Inferring network properties from time series
using transfer entropy and mutual information:
Validation of multivariate versus
bivariate approaches

Leonardo Novelli1 and Joseph T. Lizier1

an open access journal

1Centre for Complex Systems, Faculty of Engineering, University of Sydney, Sydney, Australia

Keywords: Directed connectivity, Functional connectivity, Network inference, Multivariate transfer
entropy, Information theory, Complex networks

ABSTRACT

Functional and effective networks inferred from time series are at the core of network
neuroscience. Interpreting properties of these networks requires inferred network models to
reflect key underlying structural features. However, even a few spurious links can severely
distort network measures, posing a challenge for functional connectomes. We study the
extent to which micro- and macroscopic properties of underlying networks can be inferred
by algorithms based on mutual information and bivariate/multivariate transfer entropy. The
validation is performed on two macaque connectomes and on synthetic networks with
various topologies (regular lattice, small-world, random, scale-free, modular). Simulations
are based on a neural mass model and on autoregressive dynamics (employing Gaussian
estimators for direct comparison to functional connectivity and Granger causality). We find
that multivariate transfer entropy captures key properties of all network structures for longer
time series. Bivariate methods can achieve higher recall (sensitivity) for shorter time series but
are unable to control false positives (lower specificity) as available data increases. This leads
to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest
path lengths and hub centrality, and fattened degree distribution tails. Caution should
therefore be used when interpreting network properties of functional connectomes obtained
via correlation or pairwise statistical dependence measures, rather than more holistic
(yet data-hungry) multivariate models.

AUTHOR SUMMARY

We compare bivariate and multivariate methods for inferring networks from time series,
which are generated using a neural mass model and autoregressive dynamics. We assess
their ability to reproduce key properties of the underlying structural network. Validation is
performed on two macaque connectomes and on synthetic networks with various topologies
(regular lattice, small-world, random, scale-free, modular). Even a few spurious links can
severely bias key network properties. Multivariate transfer entropy performs best on all
topologies for longer time series.

Citation: Novelli, L., & Lizier, J. T.
(2021). Inferring network properties
from time series using transfer entropy
and mutual information: Validation of
multivariate versus bivariate
approaches. Network Neuroscience,
5(2), 373–404. https://doi.org/10.1162/netn_a_00178

DOI:
https://doi.org/10.1162/netn_a_00178

Supporting Information:
https://doi.org/10.1162/netn_a_00178
https://github.com/pwollstadt/IDTxl
https://github.com/LNov/infonet

Received: 22 July 2020
Accepted: 3 December 2020

Competing Interests: The authors have
declared that no competing interests
exist.

Corresponding Author:
Leonardo Novelli
leonardo.novelli@sydney.edu.au

Handling Editor:
Daniele Marinazzo

Copyright: © 2020
Massachusetts Institute of Technology
Published under a Creative Commons
Attribution 4.0 International
(CC BY 4.0) license

The MIT Press



Brain parcellation:
The subdivision of the brain into
“parcels” or regions of interest,
serving as nodes in network-based
models of the brain.

Network topology:
The structure of the network
describing the connections between
its elements.

Multivariate network inference:
Considering multiple time series at
once (as opposed to bivariate
inference, which considers each pair
of nodes in isolation).

INTRODUCTION

Functional and effective network inference in neuroscience typically involves preprocessing
the data, defining the parcellation, extracting the time series, inferring the links in the model
network, and measuring network properties, for example, to compare patients and controls
or to predict phenotype (Bassett & Sporns, 2017; Fornito, Zalesky, & Bullmore, 2016). Each
step in the pipeline requires making modeling and analysis choices, whose influence on the
final results is the subject of ongoing research (Aquino, Fulcher, Parkes, Sabaroedin, & Fornito,
2020; Cliff, Novelli, Fulcher, Shine, & Lizier, 2020; Zalesky et al., 2016). As part of this effort,
we study how the choice of different inference algorithms affects the properties of the resulting
network model in comparison to the underlying structural network, and whether the ability to
accurately reflect these properties changes across different underlying structural networks.

The structure (or topology) of a network can be described at multiple scales (Bassett
& Sporns, 2017): from the microscopic (individual links), to the mesoscopic (modules and
motifs) and the macroscopic (summary statistics, such as average shortest path length and
measures of small-worldness; Rubinov & Sporns, 2010). At each scale, the structure is asso-
ciated with development, aging, cognition, and neuropsychiatric diseases (Xia et al., 2020).
Previous studies have assessed the performance of different network inference algorithms in
identifying the structural links at the microscale (Kim, Rogers, Sun, & Bollt, 2016; Novelli,
Wollstadt, Mediano, Wibral, & Lizier, 2019; Razi, Kahan, Rees, & Friston, 2015; Runge,
Nowack, Kretschmer, Flaxman, & Sejdinovic, 2018; Sun, Taylor, & Bollt, 2015). The goal
of this work is to extend the assessment to all scales and to a variety of topologies, across a
range of related network inference algorithms. We link the performance at the microscale to
the resulting network properties at the macroscale, and describe how this changes as a function
of the overall topology.

We compare bivariate and multivariate approaches for inferring network models, employ-
ing statistical dependence measures based on information theory (Shannon, 1948). These
approaches include functional network inference, which produces models of networks of pair-
wise or bivariate statistical relationships between nodes, and can either quantify undirected
statistical dependence, in the case of mutual information (MI; Cover & Thomas, 2005), or di-
rected dependence, in the case of transfer entropy (TE; Bossomaier, Barnett, Harré, & Lizier,
2016; Schreiber, 2000). These approaches also include effective network inference, which is
intended to produce the simplest possible circuit models that explain the observed responses
(Aertsen, Gerstein, Habib, & Palm, 1989). In this class, we evaluate the use of multivariate
TE, which, in contrast to the bivariate approaches, aims to minimize spurious links and infer
minimal models of the parent sets for each target node in the network.

All of these inference techniques seek to infer a network model of the relationships between
the nodes in a system. Different methods capture different aspects of these relationships and
don’t necessarily seek to replicate the underlying structural topology, nor do we expect them
to in general (particularly in neuroimaging experiments, where aspects of the structure may
be expressed more or less or not at all, depending on the cognitive task). In spite of that, in
this paper we do seek to evaluate and indeed validate these methods in inferring microscopic,
mesoscopic, and macroscopic features of the underlying network structure. Crucially, we per-
form this validation under idealized conditions—including full observability, stationarity, no
subsampling, and so forth—that allow us to establish a hypothesis that effective networks
should be not just complementary to the structural network but converge to it under these conditions,
as our available data increase. Indeed, under these idealized conditions (specifically in the
absence of hidden nodes, and other simplifying assumptions, including stationarity), effective


networks inferred via multivariate TE are proven to converge to the underlying structure for
sufficiently long time series (Runge, 2018; Sun et al., 2015). In gaining an understanding of
these multivariate effective connectivity inference algorithms, it is important to validate that
they perform to that expectation where it is applicable, and investigate how that performance
varies with respect to factors such as sample size, and so on. In doing so, we also address the
recent call for more extensive model diversity in testing multivariate algorithms: “To avoid bi-
ased conclusions, a large number of different randomly selected connectivity structures should
be tested [including link density as well as properties such as small-worldness]” (Runge, 2018,
p. 13).

Outside of these idealized circumstances though, we can no longer make a clear general
hypothesis on how the effective network models are expected to reflect the underlying struc-
ture, yet a successful validation gives confidence that the directed statistical relationships they
represent remain accurate as an effective network model at the microscale. Furthermore, it
is at least desirable for not only effective networks but also functional networks to recognize
important features in the underlying network structure: to track overall regime changes in the
macroscopic structure reliably, and to reflect the mesoscopic properties of distinctive nodes
(or groups of nodes) in the structure. The desire for recognition of important features in the
network is applicable whether the inference is made under idealized conditions or not.

This motivates our validation study, which is primarily conducted under idealized condi-
tions as above and based on synthetic datasets involving ground truth networks of 100–200
nodes with different topologies, from regular lattice to small-world, random, scale-free, and
modular. Many of these structural properties are incorporated in the macaque connectomes
analyzed in the last section. At the macroscale, we measure several fundamental and widely
used properties, including shortest path length, clustering coefficient, small-world coefficient,
betweenness centrality, and features of the degree distributions (Rubinov & Sporns, 2010).
These properties of the inferred network models are compared with those of the real under-
lying structural networks in order to validate and benchmark different inference algorithms
in terms of their ability to capture the key properties of the underlying topologies. At the
microscale, the performance is assessed in terms of precision, recall, and specificity of the
inferred model in classifying the links of the underlying structural network. As above, while
we do not expect all approaches to strictly capture the microscale features, these results help
to explain their performance at the macroscale.

For most of our experiments, the time series of node activity on these networks are generated
by vector autoregressive (VAR) dynamics, with linearly coupled nodes and Gaussian noise.
Both the VAR process and the inference algorithms are described in detail in the Methods sec-
tion, where we also discuss how MI and the magnitude of Pearson correlation are equivalent for
stationary VAR processes. This implies that the undirected networks obtained via the bivariate
MI algorithm are equivalent to the widely employed undirected functional networks obtained
via correlation, extending the implications of our results beyond information-theoretic meth-
ods. Further, our results based on TE extend to Granger causality, which is equivalent to TE for
stationary VAR processes (Barnett, Barrett, & Seth, 2009). Networks inferred using bivariate TE
are typically referred to as directed functional networks, to emphasize the directed nature of
their links. The extension to multivariate TE for effective network inference can also be viewed
as an extension to multivariate Granger causality for the stationary VAR processes here.
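This equivalence can be checked numerically: for stationary Gaussian VAR processes, TE reduces to the Granger log-ratio of residual variances. The sketch below (the coupled pair, coefficients, and sample size are our own illustrative choices, not from the paper) estimates TE in both directions for a unidirectionally coupled pair:

```python
import numpy as np

def gaussian_te(x, y):
    """TE x -> y (in nats) for first-order linear dynamics, via the
    Granger form: 0.5 * ln(var(reduced residual) / var(full residual))."""
    yt, yp, xp = y[1:], y[:-1], x[:-1]
    # Reduced model: y_t regressed on y_{t-1} only
    A = np.column_stack([np.ones_like(yp), yp])
    res_r = yt - A @ np.linalg.lstsq(A, yt, rcond=None)[0]
    # Full model: y_t regressed on y_{t-1} and x_{t-1}
    B = np.column_stack([np.ones_like(yp), yp, xp])
    res_f = yt - B @ np.linalg.lstsq(B, yt, rcond=None)[0]
    return 0.5 * np.log(res_r.var() / res_f.var())

# Unidirectionally coupled pair: x drives y, x is autonomous
rng = np.random.default_rng(1)
T = 20_000
x, y = np.zeros(T), np.zeros(T)
for t in range(T - 1):
    x[t + 1] = 0.5 * x[t] + rng.normal(0.0, 0.1)
    y[t + 1] = 0.5 * y[t] + 0.4 * x[t] + rng.normal(0.0, 0.1)

print(gaussian_te(x, y), gaussian_te(y, x))  # large TE x->y, near-zero y->x
```

The asymmetry of the two estimates reflects the directed nature of the coupling, which MI and correlation cannot resolve.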

We find that multivariate TE performs better on all network topologies at all scales, for
longer time series. Bivariate methods can achieve better recall with limited amounts of data


(shorter time series) in some circumstances, but the precision and the ability to control false
positives are neither consistent nor predictable a priori. On the other hand, thanks to recent
statistical improvements, multivariate TE guarantees high specificity and precision regardless
of the amount of data available, and the recall steadily increases with more data. We discuss
how the mesoscopic properties of the underlying structural network—particularly the network
motifs—can influence the precision and recall of the model at the microscale.
In turn, we
show how the performance at the microscale affects the inferred network properties at the
macroscale. We observe that bivariate methods are often unable to capture the most distinctive
topological features of the networks under study (including path length, clustering, degree
distribution, and modularity), largely because of their inability to control false positives at the
microscale.

Our final section moves beyond the validation under idealized conditions to extend the
experiments to time series from a neural mass model employed on the CoCoMac connectome.
Although the incorporation of complexities such as nonlinear interactions and subsampled
time series do somewhat reduce the performance of the methods, the superior performance of
multivariate TE aligns with the results above from the validation in idealized conditions. While
further and more wide-ranging experiments are required, our experiments provide substantial
evidence for the validity of multivariate TE in providing effective network models, which still
retain meaningful network insights in more realistic conditions.

METHODS

Generating Dynamics on Networks

Two models are used to generate time series dynamics on networks of coupled variables. Vec-
tor autoregressive processes are employed for validation studies under idealized conditions,
and a neural mass model is used on the weighted CoCoMac connectome as a final investi-
gation going beyond these conditions. The simulation and analysis pipeline is illustrated in
Figure 1.


Networks of linearly coupled Gaussian variables. We simulate discrete-time, stationary, first-
order vector autoregressive processes (VAR) on underlying structural networks of N nodes. A
VAR process is described by the recurrence relation

Z(t + 1) = Z(t) · C + ε(t),

(1)

where Z(t) is a row vector and Zi(t) is the activity of node i at time t. The Gaussian noise ε(t)
is spatially and serially uncorrelated, with standard deviation θ = 0.1. The N × N weighted
adjacency matrix C = [Cij] describes the network structure, where Cij is the weight of the
directed connection from node i to node j. These dynamics are generated on various network
topologies, as detailed in the following sections. The choice of the weights Cij (detailed in
the following sections) guarantees the stability of the system, which is a sufficient condition
for stationarity (Atay & Karabacak, 2006). Since stationary VAR processes have multivariate
Gaussian distributions, the information-theoretic measures we use can be directly related to
Pearson correlation and Granger causality (Granger, 1969). The simple VAR dynamics are
chosen as the primary model for our validation studies instead of nonlinear alternatives be-
cause the main goal is not to prove the superiority of nonlinear dependence measures on
nonlinear systems, which has been shown elsewhere (Novelli et al., 2019). We rather aim to
show that, even on linearly coupled Gaussian variables—perfectly suitable to be studied via
cross-correlations—multivariate approaches are better able to infer the macroscopic network


Figure 1. Pipeline of the network inference comparison, using the CoCoMac connectome as an example (see Methods and Numerical
Simulations in the Macaque Connectome section for full details). With the real adjacency matrix defining the links, synthetic time series are
generated using either a linear autoregressive (VAR) system with uniform link weights, or a nonlinear neural mass model with realistic weights.
Three network inference algorithms (bivariate MI, bivariate TE, multivariate TE) are then employed to analyze the time series. Bivariate methods
consider pairs of nodes independently of each other, either taking the past states into account (bivariate TE) or not (bivariate MI). On the other
hand, multivariate TE analyzes pairs of nodes in the context of the whole network. At the microscale, the links in the inferred networks
are classified as true/false positives/negatives and these scores are used to compute standard performance measures (precision, recall, and
specificity). At the macroscale, the performance is instead measured according to the ability of each algorithm to faithfully reflect network
properties (summary statistics) of the underlying structural network serving as ground truth.


properties. In addition, the VAR dynamics are amenable to investigation using the faster
Gaussian estimator for the information-theoretic measures, allowing us to carry out more
extensive simulations over a wider range of parameters.
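For concreteness, the VAR recurrence of Equation 1 can be simulated directly. The sketch below uses NumPy and a directed ring with in-degree 4 plus self-loops and uniform weights of 0.15 (anticipating the Watts-Strogatz setup described later); the burn-in length is our own illustrative choice:

```python
import numpy as np

def simulate_var(C, T, noise_std=0.1, burn_in=1000, seed=0):
    """Simulate Z(t+1) = Z(t) @ C + eps(t) with i.i.d. Gaussian noise."""
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    Z = np.zeros((burn_in + T, N))
    for t in range(burn_in + T - 1):
        Z[t + 1] = Z[t] @ C + rng.normal(0.0, noise_std, size=N)
    return Z[burn_in:]  # discard the transient so the process is ~stationary

# Directed ring of N = 100 nodes: each node driven by itself and by its
# two neighbours on each side, all with weight 0.15. Column sums are 0.75,
# bounding the spectral radius below 1, so the system is stable.
N = 100
C = np.zeros((N, N))
for i in range(N):
    C[i, i] = 0.15                     # self-loop
    for d in (1, 2):                   # two neighbours on each side
        C[(i - d) % N, i] = 0.15
        C[(i + d) % N, i] = 0.15

Z = simulate_var(C, T=10_000)
print(Z.shape)  # (10000, 100)
```

Here `C[i, j]` is the weight of the directed link from node i to node j, matching the convention of Equation 1.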

Neural mass model on the CoCoMac connectome. To provide an extension beyond the linear
VAR dynamics, neural activity in various brain regions is modeled (following Shine, Aburn,
Breakspear, & Poldrack, 2018; Li et al., 2019) as an oscillating two-dimensional neu-
ral mass model derived by mode decomposition from the Fitzhugh-Nagumo single neuron
model (FitzHugh, 1961). As previously presented (Li et al., 2019; Shine et al., 2018), the
CoCoMac connectome (Kötter, 2004) is used to provide directed coupling between 76 re-
gions, with axonal time delays between these regions based on the length of fiber tracts
as estimated by diffusion spectrum imaging (Sanz Leon et al., 2013). Data provided from
Li et al. (2019) and Shine et al. (2018) was simulated using the open-source framework The
Virtual Brain (Sanz Leon et al., 2013), using code implementing the model freely available at
https://github.com/macshine/gain_topology (Shine, 2018).

Langevin equations (Equation 2) specify the neural mass model, via the dynamics of local

mean membrane potential (V) and the slow recovery variable (W) at each regional node i:


˙Vi(t) = 20 (Wi(t) + 3Vi(t)2 − Vi(t)3 + γIi) + ξi(t),
˙Wi(t) = 20 (−Wi(t) − 10Vi(t)) + ηi(t).

(2)

In the above, ξi and ηi are independent standard Wiener noises and Ii is the synaptic current

Ii = ∑j CjiSj(t − τji),

(3)

with Cji indicating the connection weight from j to i and incorporating time delays τji from j to
i (estimated as described above). The CoCoMac connectome network contains 1,560 directed
connections (including 66 self-links), with τji on non-self links having an average of 19.8 ms
(standard deviation 8.32 ms). The membrane potentials Vi are converted to normalized firing
rates Si via a sigmoid activation function

Si(t) = 1 / (1 + e−σ(Vi(t)−m)),

(4)

with parameter m = 1.5 chosen to align the sigmoid with its typical input. The parameters for
gain σ = 0.5 (in Equation 4) and excitability γ = 0.3 (in Equation 2) are selected to simulate
activity in the integrated regime of dynamics identified by Shine et al. (2018).

Finally, the time series of membrane voltage Vi(t) (originally obtained with a 0.5-ms tem-
poral resolution via stochastic Heun integration) are subsampled at 15 ms (selected as half the
median time for the autocorrelation functions to decay to 1/e).
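A minimal sketch of these dynamics is given below. It simplifies the published setup in several ways that we flag explicitly: Euler-Maruyama integration instead of stochastic Heun, zero axonal delays (τji = 0), a toy 3-node coupling matrix instead of the CoCoMac weights, and an arbitrary noise scale. Only the drift terms of Equations 2-4 and the parameters γ = 0.3, σ = 0.5, m = 1.5 follow the text:

```python
import numpy as np

def simulate_neural_mass(C, n_steps, dt=5e-4, gamma=0.3, sigma=0.5, m=1.5,
                         noise_scale=0.05, seed=0):
    """Euler-Maruyama integration of Equations 2-4 with zero delays."""
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    V = rng.normal(0.0, 0.1, N)
    W = rng.normal(0.0, 0.1, N)
    sqdt = np.sqrt(dt)
    out = np.empty((n_steps, N))
    for t in range(n_steps):
        S = 1.0 / (1.0 + np.exp(-sigma * (V - m)))     # Eq. 4: firing rates
        I = S @ C                                      # Eq. 3 with tau_ji = 0
        dV = 20.0 * (W + 3 * V**2 - V**3 + gamma * I)  # Eq. 2 drift terms
        dW = 20.0 * (-W - 10.0 * V)
        V = V + dt * dV + noise_scale * sqdt * rng.normal(size=N)
        W = W + dt * dW + noise_scale * sqdt * rng.normal(size=N)
        out[t] = V
    return out

# Toy 3-node directed coupling (hypothetical weights, not the CoCoMac matrix)
C = np.array([[0.0, 0.4, 0.0],
              [0.0, 0.0, 0.4],
              [0.4, 0.0, 0.0]])
V_series = simulate_neural_mass(C, n_steps=2000)
print(V_series.shape)  # (2000, 3)
```

With dt = 5e-4 this matches the 0.5-ms resolution quoted above; a subsampling step (e.g., keeping every 30th sample for 15 ms) would then be applied before network inference.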

Network Inference Algorithms

As illustrated in Figure 1, three algorithms are employed to infer network models from the time
series using the IDTxl Python package (Wollstadt et al., 2019):

Bivariate mutual information for functional connectivity. Mutual information (MI) is computed
between all pairs of nodes independently, in a bivariate fashion, and only the measurements

that pass a strict statistical significance test (described below) are interpreted as undirected
links.

IDTxl:
The Information Dynamics Toolkit xl
is an open-source Python package
for network inference, freely
available on GitHub
(https://github.com/pwollstadt/IDTxl).

MI is a measure of statistical dependence between random variables (Cover & Thomas,
2005), introduced by Shannon in laying the foundations of information theory (Shannon,
1948). Formally, the MI between two continuous random variables X and Y with joint probability
density function μ(x, y) and marginal densities μX(x) and μY(y) is defined as

I(X; Y) := ∫∫ μ(x, y) log [μ(x, y) / (μX(x)μY(y))] dxdy,

(5)

where the integral is taken over the set of pairs (x, y) such that μ(x, y) > 0. The strength of MI
lies in its model-free nature, meaning that it doesn’t require any assumptions on the distribution
of the variables (e.g., Gaussian). Being able to capture nonlinear relationships, MI is typically
presented as a generalized version of the Pearson correlation coefficient. However, for the VAR
processes considered here (Equation 1) with stationary multivariate Gaussian distributions, the
MI between two variables X and Y under these dynamics is completely determined by the
magnitude of their Pearson correlation coefficient ρ (Cover & Thomas, 2005):


I(X; Y) = −½ ln(1 − ρ2).

(6)

Crucially, this one-to-one relationship between MI and the absolute value of ρ (for VAR pro-
cesses) implies that the networks inferred via the bivariate MI algorithm are equivalent to the
functional networks obtained via cross-correlation—widely employed in neuroscience. This
equivalence persists whenever a Gaussian estimator for MI is used (which models the processes
as VAR), even for nonlinear dynamics, as is used in our experiments. Differences may lie in
how the raw MI values are transformed into a network structure. Early approaches often used
a fixed threshold aimed at obtaining a prescribed link density, while the bivariate MI algorithm
used here adopts an adaptive threshold (different for each link) to meet a desired statistical
significance level. The statistical significance is computed via null hypothesis testing to reflect
the probability of observing a larger MI from the same samples if their temporal relationship
were destroyed (the p value is obtained from a chi-square test, as summarized in Lizier, 2014).
The critical level for statistical significance is set to α = 0.01/N, where N is the network size.
This produces a Bonferroni correction for the inference of parent nodes for each target (i.e., for
each target, there is a 0.01 chance under the null hypothesis that at least one spurious parent
node is selected, assuming independent sources).
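The Gaussian MI identity of Equation 6 and the significance test can be illustrated numerically. The sketch below assumes the standard likelihood-ratio form of the chi-square test for a Gaussian MI estimate (2T·Î distributed as chi-square with 1 degree of freedom under the null); the text specifies only that a chi-square test is used, and the sample size and correlation are illustrative:

```python
import numpy as np
from math import erfc, sqrt, log

def gaussian_mi(x, y):
    """MI in nats between two Gaussian variables: -0.5 * ln(1 - rho^2)."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * log(1.0 - rho**2)

def mi_p_value(mi, n_samples):
    """p value assuming 2*n*MI_hat ~ chi-square(1 dof) under the null."""
    stat = 2.0 * n_samples * mi
    return erfc(sqrt(stat / 2.0))  # survival function of chi2 with 1 dof

rng = np.random.default_rng(0)
n, rho = 10_000, 0.3
x = rng.normal(size=n)
y = rho * x + sqrt(1.0 - rho**2) * rng.normal(size=n)

mi = gaussian_mi(x, y)
print(mi, -0.5 * log(1.0 - rho**2))  # estimate vs. theoretical value
alpha = 0.01 / 100                   # Bonferroni level for a 100-node network
print(mi_p_value(mi, n) < alpha)     # the link passes the significance test
```

In the actual algorithm this test is applied per candidate link, so the adaptive threshold depends on the sample size rather than on a prescribed link density.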

Bivariate transfer entropy for directed functional connectivity. Transfer entropy (TE) is computed
between all pairs of nodes independently, in a bivariate fashion, and only the measurements
that pass a strict statistical significance test (described below) are interpreted as links.

TE is a model-free measure of statistical dependence between random variables (Schreiber,
2000); however, different from MI and cross-correlation, it is a directed and not a symmetric
measure (i.e.,
the TE from a source node X to a target node Y is not necessarily the same as
the TE from Y to X), and specifically considers information about the dynamic state updates
of the target Y. Thus, employing TE has the advantage of generating directed networks and
providing a more detailed model of the dynamics of the system under investigation. Formally,
the TE from a source stochastic process X to a target process Y is defined as (Schreiber, 2000)

TX→Y(t) := I(Xt−1; Yt | Y(k)t−1)

[…]

φ(k) = E>k / (N>k(N>k − 1)),

(10)

where E>k denotes the number of edges among the N>k nodes having degree higher
than a given value k.
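The rich-club density of Equation 10 can be computed directly from an adjacency matrix. In this sketch the choice of total degree (in- plus out-degree) and the exclusion of self-loops are our assumptions, and the core-periphery toy network is hypothetical:

```python
import numpy as np

def rich_club_coefficient(A, k):
    """Density of links among the N_{>k} nodes with total degree > k
    (Equation 10), self-loops excluded."""
    A = (np.asarray(A) > 0).astype(int)
    np.fill_diagonal(A, 0)
    degree = A.sum(axis=0) + A.sum(axis=1)   # in-degree + out-degree
    rich = np.flatnonzero(degree > k)
    n = len(rich)
    if n < 2:
        return float("nan")                  # coefficient undefined
    e_gt_k = A[np.ix_(rich, rich)].sum()     # E_{>k}: links among rich nodes
    return e_gt_k / (n * (n - 1))

# Toy core-periphery network (hypothetical): a dense 4-node core
# plus 4 peripheral nodes each sending one link into the core.
A = np.zeros((8, 8), dtype=int)
A[:4, :4] = 1 - np.eye(4, dtype=int)
A[4, 0] = A[5, 1] = A[6, 2] = A[7, 3] = 1
print(rich_club_coefficient(A, k=4))  # 1.0: the core is fully connected
```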

SMALL-WORLD NETWORKS

Numerical Simulations

The first experiment is aimed at testing the robustness of the three inference algorithms with
respect to vast changes in network structure. The Watts-Strogatz model is used to generate
a spectrum of topologies, ranging from regular lattices to random networks (similar to Erdős-
Rényi networks, although not equivalent; Maier, 2019) through a small-world transition (Watts
& Strogatz, 1998). Each simulation starts with a directed ring network of 100 nodes with
uniform link weights Cij = Cii = 0.15 and fixed in-degree din = 4 (i.e., each node is linked


to two neighbors on each side, as well as to itself via a self-loop). The source of each link is
then rewired with a given probability p ∈ [0, 1], so as to change the overall network topology
while keeping the in-degree of each node fixed. Only rewiring attempts that keep the network
connected are accepted, in order to allow the measurement of the average shortest path length.
The simulations for each p are repeated 10 times on different network realizations and with
random initial conditions.
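The degree-preserving source rewiring described above can be sketched as follows. We interpret "keep the network connected" as weak connectivity, checked by BFS on the undirected version of the graph (our reading; the paper does not specify weak versus strong connectivity):

```python
import numpy as np

def is_weakly_connected(A):
    """BFS on the undirected version of A (self-loops ignored)."""
    N = A.shape[0]
    und = (A + A.T) > 0
    np.fill_diagonal(und, False)
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(und[i]):
            j = int(j)
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == N

def watts_strogatz_directed(N=100, p=0.1, weight=0.15, seed=0):
    """Directed ring lattice (in-degree 4 plus a self-loop per node), then
    rewire each non-self link's *source* with probability p, preserving
    every in-degree and rejecting moves that disconnect the network."""
    rng = np.random.default_rng(seed)
    C = np.zeros((N, N))
    for i in range(N):
        C[i, i] = weight                     # self-loop, never rewired
        for d in (1, 2):                     # two neighbours on each side
            C[(i - d) % N, i] = weight
            C[(i + d) % N, i] = weight
    for target in range(N):
        for source in np.flatnonzero(C[:, target]):
            if source == target or rng.random() >= p:
                continue
            candidates = [s for s in range(N)
                          if s != target and C[s, target] == 0]
            new_source = int(rng.choice(candidates))
            C_try = C.copy()
            C_try[source, target] = 0.0
            C_try[new_source, target] = weight
            if is_weakly_connected(C_try):   # keep only connectivity-preserving moves
                C = C_try
    return C

C = watts_strogatz_directed(N=50, p=0.3)
indeg = (C > 0).sum(axis=0) - 1              # in-degree excluding the self-loop
print(indeg.min(), indeg.max())  # every node keeps in-degree 4
```

Because only the source of each link is moved within its column, the in-degree of every node is fixed by construction for any rewiring probability p.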

Results

At the microscale, the performance is evaluated in terms of precision, recall, and specificity
in the classification of the links (present or absent) in the inferred network compared with the
underlying structural network. In the case of bivariate MI, each undirected link in the inferred
network is represented as two directed links in opposite directions. For longer time series of
10,000 samples, multivariate TE is the most accurate method, achieving optimal performance
according to all metrics on all the network topologies generated by the Watts-Strogatz rewiring
model (Figure 2, right column). Bivariate TE also achieves nearly optimal recall and high speci-
ficity on all topologies; however, despite the strict statistical significance level, the precision is
significantly lower on lattice-like topologies (low rewiring probability) than on random ones
(high rewiring probability). The opposite trend is shown by the bivariate MI algorithm, whose
precision and recall drastically decrease with increasing rewiring probability. As expected,
the recall of all methods decreases when shorter time series of 1,000 samples are provided
(Figure 2, left column). However, the recall for multivariate TE is consistent across topologies,
while it decreases with higher rewiring probability when bivariate methods are used. This re-
sults in the bivariate TE having larger recall for lattice-like topologies, while multivariate TE has
larger recall than bivariate for more random topologies (i.e.,
for a rewiring probability larger
than p = 0.2). A further interesting effect is that bivariate TE attains better precision on shorter
time series than on longer ones.
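The microscale scores can be computed directly from the true and inferred adjacency matrices. In this sketch, self-loops are excluded from the scoring (a detail the text does not spell out), and undirected MI networks would first be symmetrized so that each undirected link counts as two directed links, as described above:

```python
import numpy as np

def link_classification_scores(A_true, A_inferred):
    """Precision, recall, specificity over all directed off-diagonal links."""
    mask = ~np.eye(A_true.shape[0], dtype=bool)  # exclude self-loops (our choice)
    t = A_true[mask].astype(bool)
    p = A_inferred[mask].astype(bool)
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    tn = np.sum(~t & ~p)
    return {"precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

def symmetrize(A):
    """Represent each undirected (MI) link as two directed links."""
    return ((A + A.T) > 0).astype(int)

# Toy 3-node example: one spurious link and one missed link
A_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
A_inf = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 0, 0]])
print(link_classification_scores(A_true, A_inf))
```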

At the macroscale, the three algorithms are tested on their ability to accurately measure
three fundamental network properties relevant through the small-world transition, using the
longer time series of 10,000 samples. Multivariate TE is able to closely approximate the real
shortest path length on all the network topologies generated by the Watts-Strogatz rewiring
model, while the bivariate MI and TE algorithms produce significant underestimates, particu-
larly on lattice-like topologies (Figure 3). Similarly, multivariate TE is able to closely match the
real mean clustering on all the network topologies generated by the Watts-Strogatz rewiring
model, while the bivariate MI and TE algorithms consistently overestimate it (Figure 4). The
related measure of local efficiency (Latora & Marchiori, 2001) is reported in the Supporting
Information.

Given the above results on the characteristic path length and the mean clustering coef-
ficient, it is not surprising that the bivariate MI and TE algorithms significantly overestimate
the real small-worldness coefficient, while the multivariate TE method produces accurate esti-
mates on all the network topologies generated by the Watts-Strogatz rewiring model (Figure 5).
Equivalent results are found if the alternative measures of “small-world index” (Neal, 2017) or
“double-graph normalized index” (Telesford, Joyce, Hayasaka, Burdette, & Laurienti, 2011)
are computed instead (not shown).
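The two ingredients of the small-world coefficient can be computed with small self-contained routines. The sketch below works on the binarized undirected version of the network (our simplification; the exact directed conventions used in the paper are not spelled out here); the small-world coefficient then follows by normalizing each quantity by its value on a degree-matched random graph:

```python
import numpy as np

def _undirected(A):
    """Binarized undirected version of a (possibly directed) adjacency matrix."""
    und = (A + A.T) > 0
    np.fill_diagonal(und, False)
    return und

def characteristic_path_length(A):
    """Mean shortest path length over reachable node pairs (BFS per source)."""
    und = _undirected(A)
    N = und.shape[0]
    total, pairs = 0, 0
    for s in range(N):
        dist = np.full(N, -1)
        dist[s] = 0
        frontier = [s]
        while frontier:
            nxt = []
            for i in frontier:
                for j in np.flatnonzero(und[i]):
                    if dist[j] < 0:
                        dist[j] = dist[i] + 1
                        nxt.append(j)
            frontier = nxt
        total += dist[dist > 0].sum()
        pairs += int((dist > 0).sum())
    return total / pairs

def mean_clustering(A):
    """Average local clustering coefficient of the undirected graph."""
    und = _undirected(A).astype(int)
    coeffs = []
    for i in range(und.shape[0]):
        nbrs = np.flatnonzero(und[i])
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = und[np.ix_(nbrs, nbrs)].sum() / 2  # edges among neighbours
        coeffs.append(links / (k * (k - 1) / 2))
    return float(np.mean(coeffs))

# Regular ring lattice: each node tied to two neighbours on each side
N = 30
A = np.zeros((N, N), dtype=int)
for i in range(N):
    for d in (1, 2):
        A[(i - d) % N, i] = A[(i + d) % N, i] = 1

print(characteristic_path_length(A), mean_clustering(A))
```

On this lattice the clustering coefficient is high (0.5) and the path length grows with N, which is exactly the regime where spurious shortcut links distort both quantities most strongly.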

Discussion

At the microscale, the results concerning the bivariate TE can be explained in the light of the
recent theoretical derivation of TE from network motifs for VAR dynamics (Novelli, Atay, Jost,


Figure 2. Performance as a function of the rewiring probability in Watts-Strogatz ring networks
(N = 100 nodes). Multivariate TE is consistent across network structure and guarantees high pre-
cision regardless of the amount of data available (third row). Bivariate TE (second row) can have
better recall than multivariate TE for shorter time series (T = 1,000, left column) but its precision
is not consistent (it drops when T = 10,000, right column) and the optimal time series length can-
not be determined a priori. Bivariate MI (first row) has lower precision and recall than TE-based
methods, for both T = 1,000 and T = 10,000. For each value of the rewiring probability, the re-
sults for 10 simulations on different networks are presented (low-opacity markers) in addition to the
mean values (solid markers).

& Lizier, 2020). For a fixed in-degree, the TE decreases with the rewiring probability, making
it harder for candidate links to pass the statistical significance tests when only short time se-
ries are available. This explains why the recall for the bivariate TE slightly drops with higher
rewiring probability for T = 1,000 (Figure 2). We can speculate that a similar mechanism
could be responsible for the more drastic drop in the recall for the bivariate MI (via evidence
from derivations for covariances from network structure for similar processes; Pernice, Staude,
Cardanobile, & Rotter, 2011; Schwarze & Porter, 2020). The fact that bivariate TE is larger for

regular lattice structures also explains why its recall is slightly higher than for multivariate TE
here: The redundancy between close sources that elevates their bivariate TE is explicitly
conditioned out of the multivariate TE for secondary sources. On the other hand, as the rewiring
increases, the higher recall for multivariate TE must be due to this method capturing synergistic
effects that (more disparate) multiple sources have on the target, which the bivariate method
does not.

Figure 3. Characteristic path length as a function of the rewiring probability in Watts-Strogatz
ring networks (N = 100 nodes and T = 10,000 time samples). Multivariate TE is able to closely
approximate the characteristic path length of the real topologies (ground truth). On the other hand,
bivariate MI and TE produce significant underestimates due to spurious links creating shortcuts
across the network, particularly on lattice-like topologies (low rewiring probability). The results
for 10 simulations on different network realizations are presented (low-opacity markers) in addition
to the mean values (solid markers).

Figure 4. Average clustering coefficient as a function of the rewiring probability in Watts-Strogatz
ring networks (N = 100 nodes and T = 10,000 time samples). The multivariate TE algorithm closely
matches the average clustering coefficient of the real networks (ground truth), which is instead
overestimated by bivariate MI and TE. The results for 10 simulations on different network realizations
are presented (low-opacity markers) in addition to the mean values (solid markers).

Figure 5. Small-world coefficient as a function of the rewiring probability in Watts-Strogatz ring
networks (N = 100 nodes and T = 10,000 time samples). The multivariate TE algorithm produces
accurate estimates of the small-world coefficient of the real topologies (ground truth), which
is instead strongly overestimated by bivariate MI and TE. The results for 10 simulations on
different network realizations are presented (low-opacity markers) in addition to the mean values
(solid markers).

Comparing the results between shorter and longer time series raises another question: Why
is the precision of the bivariate TE worse for longer time series than for shorter ones, especially
for lattice-like topologies? More complex motifs involving common parents and multiple
walks, which are more prevalent in regular lattice topologies, can result in nonzero TE on
spurious links. These indirect effects are typically weak; however, for long enough time series, the
low TE values can be distinguished from noise and thus pass the statistical significance tests.
The resulting spurious links (false positives) decrease the precision and the specificity as the
time series length is increased, with the effect being stronger in regular lattice topologies. In
other words, the Bonferroni correction of the statistical significance level (i.e., dividing α by
the network size N) does not result in a well-calibrated test for bivariate inference methods—
the sources are correlated, and the tests on them are not independent. The differences in the
specificity on the plots are subtle because the networks are sparse; however, they manifest in
large differences in the precision. Crucially, this effect is not seen for the multivariate TE, which
maintains specificity consistent with the requested α for all topologies and time series lengths.
Thus, lower recall achieved by multivariate TE on regular lattice networks for short time series
(compared with bivariate TE) can be viewed as a compromise to control the specificity in a
consistent fashion. A compelling argument in favor of controlling the specificity is provided
by Zalesky et al. (2016, p. 407), who conclude that “specificity is at least twice as important as
sensitivity [i.e., recall] when estimating key properties of brain networks, including topological
measures of network clustering, network efficiency and network modularity.” Unfortunately,
there is currently no consistent a priori way (and no reasonable candidate) to determine the
optimal time series length for bivariate TE to attain high precision.
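To make the link-wise testing procedure discussed above concrete, the following sketch shows a permutation test at the Bonferroni-corrected level α/N, built on a linear-Gaussian TE estimator. All names and parameter values here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def _residual(a, b):
    """Residual of a after linear regression on regressor b (with intercept)."""
    design = np.column_stack([b, np.ones(len(b))])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ coef

def gaussian_te(source, target_past, target):
    """Linear-Gaussian transfer entropy: -0.5 * log(1 - r^2), where r is the
    partial correlation of the target with the source's past, conditioned
    on the target's own past."""
    r = np.corrcoef(_residual(target, target_past),
                    _residual(source, target_past))[0, 1]
    return -0.5 * np.log(1.0 - r**2)

def link_is_significant(source, target_past, target, alpha=0.05, n_nodes=100,
                        n_perm=200):
    """Permutation test at Bonferroni-corrected level alpha / n_nodes:
    shuffling the source in time destroys its predictive contribution
    while preserving its marginal distribution."""
    te = gaussian_te(source, target_past, target)
    null = [gaussian_te(rng.permutation(source), target_past, target)
            for _ in range(n_perm)]
    p_value = np.mean([v >= te for v in null])
    return p_value < alpha / n_nodes

# Toy VAR pair: Y is driven by its own past and by X's past.
T = 1000
x = rng.standard_normal(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

print(link_is_significant(x[:-1], y[:-1], y[1:]))  # True: the real link passes
```

Note that the correction only calibrates the tests if they are independent; as argued above, correlated sources violate this assumption for bivariate methods.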

Moving to the macroscale results, it is clear that the ability to control the false positives
while building connectomes is a crucial prerequisite for the application of complex network
measures. Adding only a few spurious links leads to significant underestimation of the aver-
age shortest path length—an effect that has previously been reported for lattice-like networks
using MI (Bialonski, Horstmann, & Lehnertz, 2010) and extended here to TE and across a
range of topologies (Figure 3). Together with the clustering coefficient, the shortest path length
is a defining feature of small-world networks. Although evidence of small-world properties
of functional networks obtained from fMRI recordings has been provided in several studies
(e.g., van den Heuvel, Stam, Boersma, & Hulshoff Pol, 2008), whether or not the brain is a


small-world network is still being debated (Hilgetag & Goulas, 2015; Papo, Zanin, Martínez,
& Buldú, 2016). Following Papo et al. (2016), the question addressed here is of a pragmatic
rather than an ontological nature: Independently of whether the brain is a small-world network,
to what extent can neuroscientists using standard system-level neuroimaging techniques inter-
pret the small-world construct in the context of functional brain networks? An indication that
the interpretation is problematic was provided by Hlinka, Hartman, and Paluš (2012), who
showed that functional connectivity matrices of randomly coupled autoregressive processes
show small-world properties. The effect is due to intrinsic properties of correlation rather than
just to the finite sample size problem or spatial oversampling. Specifically, correlation has a
transitivity property: For any node X with neighbors Y and Z (and respective correlations ρ_XY
and ρ_XZ), a lower bound can be derived for the correlation between the neighbors (Langford,
Schwertman, & Owens, 2001):

ρ_YZ ≥ ρ_XY ρ_XZ − √(1 − ρ²_XY) √(1 − ρ²_XZ).   (11)

In particular, a strong positive correlation between two pairs of them implies a positive
correlation within the third pair: ρ²_XY + ρ²_XZ > 1 implies ρ_YZ > 0 (Langford et al., 2001). The
problem was further investigated by Zalesky, Fornito, and Bullmore (2012), who showed that
functional connectivity matrices of independent processes also exhibit small-world properties
and that—in practice—the correlation between neighbors is much higher than the theoretical
lower bound in Equation 11. These considerations on correlation extend to bivariate MI, given
the one-to-one relationship between MI and the absolute value of Pearson’s correlation coeffi-
cient for the Gaussian variables considered in this study (see the Bivariate mutual information
for functional connectivity section in the Methods). This transitivity property results in
more triangular cliques in functional networks, that is, an inflated clustering coefficient across
the whole spectrum of networks in Figure 4. Together with the underestimate of the shortest
path length discussed above, the outcome is an overestimate of the small-worldness coefficient
(Figure 5). As shown, the limitations of bivariate methods can be overcome by multivariate TE,
to a large degree for shorter time series and certainly when sufficiently long time series are
available.
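The practical force of Equation 11 is easy to check numerically. In the sketch below (with illustrative coupling values, not taken from the study), Y and Z share the common driver X but have no direct interaction, yet their empirical correlation respects the lower bound and is positive whenever ρ²_XY + ρ²_XZ > 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Y and Z are both driven by the common source X, with no direct Y-Z link.
x = rng.standard_normal(n)
y = 0.8 * x + 0.4 * rng.standard_normal(n)
z = 0.7 * x + 0.5 * rng.standard_normal(n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)

# Lower bound of Equation 11 (Langford, Schwertman, & Owens, 2001).
bound = r_xy * r_xz - np.sqrt(1 - r_xy**2) * np.sqrt(1 - r_xz**2)

print(r_yz >= bound)                     # True
print(r_xy**2 + r_xz**2 > 1, r_yz > 0)   # True True: transitivity forces the third link
```

In a functional network thresholded on correlation (or, equivalently for Gaussian variables, on bivariate MI), the spurious Y-Z edge closes a triangle that does not exist in the underlying structure.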

SCALE-FREE NETWORKS

Numerical Simulations

The linear preferential attachment algorithm without attractiveness (Barabási & Albert, 1999)
is used to generate undirected scale-free networks of N = 200 nodes. Starting with two con-
nected nodes, a new node is added at each iteration and linked bidirectionally to two existing
nodes, selected with probability proportional to their current degree (via linear preferential
attachment). This preferential mechanism makes high-degree nodes more likely to be selected
and further increase their degree—a positive feedback loop that generates few highly con-
nected hubs and many low-degree nodes. The resulting density is approximately 4/N = 0.02
(average in- and out-degrees being approximately 4), with hubs having degrees up to around
50 here. A constant uniform link weight CXY = CXX = 0.1 is assigned to all the links,
achieving strong coupling but ensuring the stationarity of the VAR dynamics. For robustness,
each simulation is repeated 10 times on different network realizations and with random initial
conditions.
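The generation and simulation steps can be sketched as follows. This is a hypothetical reconstruction, assuming networkx's Barabási-Albert generator matches the stated linear preferential attachment with m = 2 and that the dynamics are a first-order VAR process; it is not the authors' code:

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)

# Undirected scale-free network: each new node attaches to m = 2 existing
# nodes with probability proportional to their current degree.
N, m, weight = 200, 2, 0.1
g = nx.barabasi_albert_graph(N, m, seed=1)

# Uniform coupling C_XY = C_XX = 0.1 on all bidirectional links and self-links.
C = np.zeros((N, N))
for u, v in g.edges():
    C[u, v] = C[v, u] = weight
np.fill_diagonal(C, weight)

# Stationarity of the VAR process X_t = C X_{t-1} + noise requires the
# spectral radius of C to stay below 1 (the reason for the modest weight).
print(max(abs(np.linalg.eigvals(C))))

# Simulate T samples of the VAR dynamics.
T = 1000
X = np.zeros((T, N))
for t in range(1, T):
    X[t] = C @ X[t - 1] + rng.standard_normal(N)

print(2 * g.number_of_edges() / (N * (N - 1)))  # density, approximately 4/N = 0.02
```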

Results

At the microscale, the performance is evaluated in terms of precision and recall in the clas-
sification of the links. The outcome is qualitatively similar to the small-world case presented


above: For longer time series (10,000 samples), multivariate TE is the most accurate method,
achieving optimal performance according to all metrics (Figure 6, right column). Bivariate
TE also achieves optimal recall; however, despite the strict statistical significance level, the
precision is significantly lower than for multivariate TE. The bivariate MI algorithm scores
comparatively very poorly in terms of both precision and recall (< 40% on average). As expected,
the recall of all methods decreases when shorter time series of 1,000 samples are provided
(Figure 6, left column). Once more, bivariate TE attains better precision on shorter time series
than on longer ones, and for these networks attains slightly better recall than multivariate TE on
the shorter time series.

At the macroscale, the three algorithms are tested on their ability to accurately measure several
relevant properties of scale-free networks. It is well known that the degree distribution of
networks generated via this preferential attachment algorithm follows a power-law, with
theoretical exponent β = 3 in the limit of large networks (Barabási & Albert, 1999). Fitting
power-laws to empirical data requires some caution, such as adopting a logarithmic binning
scheme (Virkar & Clauset, 2014), and the dedicated powerlaw Python package is employed for
this purpose (Alstott, Bullmore, & Plenz, 2014). For sufficiently long time series (T = 10,000 in
this study), multivariate TE is able to accurately recover the in-degrees of the nodes in our
scale-free networks, while the bivariate MI and TE algorithms produce significant overestimates
(Figure 7). As a consequence, the (absolute value of the) exponent of the fitted power-law is
underestimated by the latter methods, as shown in Figure 8. Hubs are a key feature of scale-free
networks and have high betweenness centrality, since most shortest paths pass through them.
However, their centrality is highly underestimated by bivariate methods, often making it
indistinguishable from the centrality of peripheral nodes (Figure 9).

Figure 6. Precision (top row) and recall (bottom row) in scale-free networks obtained via
preferential attachment (N = 200 nodes). Multivariate TE guarantees high precision regardless of
the amount of data available (T = 1,000 in the left column and T = 10,000 in the right column).
Bivariate TE can achieve slightly better recall than multivariate TE for shorter time series (bottom
left panel) but its precision drops substantially for longer time series (top right panel) and the
optimal time series length cannot be determined a priori. Bivariate MI has lower precision and
recall than TE-based methods, for both T = 1,000 and T = 10,000. The box-whiskers plots
summarize the results over 10 simulations on different network realizations, with median values
indicated in color.

Figure 7. Inferred vs. real in-degree in scale-free networks obtained via preferential attachment
(N = 200 nodes and T = 10,000 time samples). Multivariate TE is the only algorithm able to
preserve the in-degrees of the nodes as compared with their values in the real networks (ground
truth). The dashed black line represents the identity between real and inferred values. Surprisingly,
bivariate methods can inflate the in-degree of nonhubs by over an order of magnitude, making
hubs less distinguishable. The results are collected over 10 simulations on different network
realizations.

Figure 8. Log-log plot of the in-degree distribution in scale-free networks obtained via preferential
attachment (N = 200 nodes and T = 10,000 time samples).
The best power-law distribution fits for the real networks (ground truth) and the inferred network
models are plotted with dashed lines (decay exponents reported in the legend). Despite the
finite-size effect due to the small network size, multivariate TE is able to approximate the
theoretical power-law decay exponent β = 3 and to match the power-law fit to the real in-degree
distribution (β = 3.1). On the other hand, bivariate TE and MI underestimate the absolute value
of the exponent. The results are collected over 10 simulations on different network realizations.

Figure 9. Inferred vs. real betweenness centrality for nodes in scale-free networks obtained via
preferential attachment (N = 200 nodes and T = 10,000 time samples). Multivariate TE is the only
algorithm able to preserve the centrality of the nodes as compared with their values in the real
networks (ground truth). The dashed black line represents the identity between real and inferred
values. Bivariate MI underestimates the centrality of all nodes, while bivariate TE particularly
underestimates the centrality of the most central nodes in the ground truth network. The results
are collected over 10 simulations on different network realizations.

Figure 10. Inferred vs. real average clustering coefficient in scale-free networks obtained via
preferential attachment (N = 200 nodes and T = 10,000 time samples). Multivariate TE is the only
algorithm able to preserve the average clustering of the real networks (ground truth), while
bivariate TE and MI consistently overestimate it. The box-whiskers plots summarize the results
over 10 simulations, with median values indicated in color.

As in the small-world case, multivariate TE is able to very closely approximate the real mean
clustering coefficient, while the bivariate MI and TE algorithms consistently overestimate it
(Figure 10). The related measure of local efficiency (Latora & Marchiori, 2001) is reported in the
Supporting Information.

A closer examination of the clustering of individual nodes (instead of the average) reveals that
low clustering values are consistently overestimated by bivariate methods, while high clustering
values are underestimated (Supporting Information). Finally, bivariate methods overestimate the
rich-club coefficient (Supporting Information).

Discussion

Echoing the discussion of small-world networks above, the ability to control the false positives
while building connectomes—exhibited only by multivariate TE—is also crucial for correctly
identifying fundamental features of scale-free networks, such as the power-law degree distribution
and the presence of hub nodes. Hubs are characterized by high degree and betweenness centrality.
Unfortunately, the centrality of hubs is not robust with respect to false positives: The addition
of spurious links causes strong underestimates of the betweenness centrality of real hubs, since
additional links provide alternative shortest paths. For bivariate TE, the effect is so prominent
that the inferred centrality of real hubs can be indistinguishable from the centrality of peripheral
nodes, as shown in Figure 9. The in-degree is in principle more robust with respect to false
positives; however, bivariate methods infer so many spurious incoming links into nonhubs that
these become as connected as (or more connected than) the real hubs are inferred to be (Figure 7).
Taken together, these effects on the in-degree and centrality greatly hinder the identification of
real hubs when bivariate MI or TE are employed.
The inflation of the in-degree of peripheral nodes also fattens the tail of the in-degree distribution
(Figure 8), resulting in an underestimate of the exponent of the fitted power-law with respect to
the theoretical value β = 3 (Barabási & Albert, 1999). This has severe implications for the
synthetic networks used in this study, erroneously providing evidence against the simple
preferential attachment algorithm used to generate them. The third distinct characteristic of these
networks is their low average clustering, which is also induced by the preferential attachment
algorithm, whereby each new node is only connected to two existing ones. However, bivariate
methods fail to capture this feature, producing a strong overestimate of the average clustering
coefficient (Figure 10). This can be attributed to the transitivity property of Pearson’s correlation,
which produces overabundant triangular cliques in functional networks (as previously discussed).
Given the significant biases affecting all the distinctive properties of scale-free networks—in
addition to the small-world networks presented above—it is evident that great caution should be
used when applying bivariate inference methods (cross-correlation, MI, TE) to draw conclusions
as to topological properties of real-world networks. In contrast, again, the multivariate TE was
demonstrated to produce network models with microscopic and macroscopic topological
properties consistent with those of the underlying structural scale-free networks.

MODULAR NETWORKS

Numerical Simulations

In order to study the performance of the three inference algorithms on modular topologies,
networks of 100 nodes are generated and equally partitioned into five groups of 20. Initially,
each node is directly linked to 10 random targets within its own group, such that the five
communities are completely disconnected. The initial density is thus 50% within each group and
10% overall.
Link targets are then gradually rewired from within to between groups, weakening the modular
structure but preserving the overall density and keeping the out-degrees fixed. Eventually, the
concepts of “within” and “between” groups are no longer meaningful—the links are equally
distributed and the topology resembles a random Erdős-Rényi network of equal overall density.
This happens when the rewiring is so prevalent that only 2 links are left within the initial groups
and 8 out of 10 links are formed between them (for each node). Going even further, when all 10
links are formed between the initial groups and none within, the network becomes multipartite,
that is, the nodes are partitioned into five independent sets having no internal connections. A
constant uniform link weight CXY = CXX = 0.08 is assigned to all the links, achieving strong
coupling but ensuring the stationarity of the VAR dynamics. Each simulation is repeated 10 times
on different network realizations and with random initial conditions.

Results

At the microscale, we find that bivariate MI and TE infer more spurious links within the initial
groups than between them for smaller between-group densities (Figure 11, left column). As the
between-group density increases though, we find more spurious links between the initial groups
than within them. The normalized false positive rate is also significantly higher within groups for
smaller between-group densities (right column); however, the normalization sees the false
positive rate becoming comparable between and within groups as the between-group density
increases. The number of false positives produced by multivariate TE is comparatively negligible.
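The modular construction described in the Numerical Simulations above, together with the modularity of the five-group partition, can be sketched as follows (a hypothetical illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def modular_network(n_groups=5, group_size=20, out_degree=10, n_between=0):
    """Directed modular network: every node sends out_degree links,
    n_between of which target nodes outside its own group."""
    n = n_groups * group_size
    adj = np.zeros((n, n), dtype=int)
    for node in range(n):
        group = node // group_size
        within = [t for t in range(group * group_size, (group + 1) * group_size)
                  if t != node]
        between = [t for t in range(n) if t // group_size != group]
        targets = list(rng.choice(within, out_degree - n_between, replace=False))
        targets += list(rng.choice(between, n_between, replace=False))
        adj[node, targets] = 1
    return adj

def directed_modularity(adj, labels):
    """Newman's modularity for directed networks:
    Q = sum over communities c of [ e_c / m - (k_out_c * k_in_c) / m^2 ]."""
    m = adj.sum()
    k_out, k_in = adj.sum(axis=1), adj.sum(axis=0)
    q = 0.0
    for c in set(labels):
        idx = [i for i, label in enumerate(labels) if label == c]
        q += adj[np.ix_(idx, idx)].sum() / m - k_out[idx].sum() * k_in[idx].sum() / m**2
    return q

labels = [i // 20 for i in range(100)]
print(directed_modularity(modular_network(n_between=0), labels))  # ~0.8: fully modular
print(directed_modularity(modular_network(n_between=8), labels))  # near 0: Erdos-Renyi-like
```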
At the mesoscale, the modularity of the partition corresponding to the five disconnected
communities is maximal in the absence of rewiring and decreases as more and more links are
formed between groups rather than within them (Figure 12). Bivariate and multivariate TE
produce accurate estimates of the real modularity, while bivariate MI often underestimates it,
particularly for shorter time series (T = 1,000) and intermediate between-group densities.

Discussion

Our results on modular networks confirm and extend previous findings on correlation-based
functional connectivity, stating that “false positives occur more prevalently between network
modules than within them, and the spurious inter-modular connections have a dramatic impact
on network topology” (Zalesky et al., 2016, p. 407). Indeed, the left column of Figure 11 shows
that bivariate MI and TE infer a larger number of false positives between the initial groups than
within them, once we have a midrange between-group link density in the underlying structure
(which induces the transitive relationships). However, the same does not apply to the false
positive rate (i.e., the normalized number of false positives) shown in the right column of
Figure 11: Where an edge does not actually exist, it is more likely to be inferred if it is within
rather than across groups (for up to midrange between-group link densities). As such, the higher
number of false positives between modules is mostly due to the larger number of potential
spurious links available between different communities compared with those within them.
Nonetheless, the key message is that the modular structure (at the mesoscale level) affects the
performance of bivariate algorithms in inferring single links (at the microscale level).
This provides further empirical evidence for the theoretical finding that bivariate TE—despite
being a pairwise measure—does not depend solely on the directed link weight between a single
pair of nodes, but on the larger network structure they are embedded in, via the mesoscopic
network motifs (Novelli et al., 2020). In particular, the abundance of specific “clustered motifs”
in modular structures increases the bivariate TE, making links within each group easier to detect
but also increasing the false positive rate within modules. Other studies have also related
correlation-based functional connectivity to specific structural features, such as search
information, path transitivity (Goni et al., 2014), and topological similarity (Bettinardi et al., 2017).

The underestimate of the modularity of the initial partition by bivariate MI (Figure 12, bottom
panel) is a direct result of these higher numbers of spurious between-group links. This has
important implications for the identification of the modules, since a lower score makes this
partition less likely to be deemed optimal by popular greedy modularity maximization algorithms
(Blondel, Guillaume, Lambiotte, & Lefebvre, 2008). We speculate that the spurious intermodular
links would also hinder the identification of the modules when alternative approaches for
community detection are employed (a thorough comparison of which is beyond the scope of
this study).

Figure 11. False positives per node (left column) and false positive rate (right column) in modular
networks of N = 100 nodes and T = 10,000 time samples. Each node is connected to 10 neighbors
and the horizontal axis represents the number of links between groups (in the two extreme cases
all 10 links are formed exclusively within each group or exclusively between groups). Bivariate
MI and TE infer more spurious links (left column) within the initial groups than between them for
smaller between-group densities; the comparison then reverses for larger between-group densities.
The false positive rate (i.e., the normalized number of false positives; right column) is also higher
within groups for smaller between-group densities, while the rate for between groups becomes
comparable (instead of larger) as between-group density increases. The number of false positives
produced by multivariate TE is comparatively negligible. The results for 10 simulations on
different networks are presented (low-opacity markers) in addition to the mean values (solid
markers).

Figure 12. Modularity of the partition representing five groups of 20 nodes (T = 10,000 time
samples). Each node is initially connected to 10 neighbors within the same group via directed
links. As link targets are gradually rewired from within to between groups, the modularity of the
initial partition linearly decreases.
The horizontal axis represents the number of links between groups for each node (in the two
extreme cases all 10 links are formed exclusively within each group or exclusively between
groups). Bivariate and multivariate TE produce accurate estimates of the real modularity, while
bivariate MI often underestimates it, for both shorter and longer time series (T = 1,000 in the top
panel and T = 10,000 in the bottom one). The results for 10 simulations on different networks are
presented (low-opacity markers) in addition to the mean values (solid markers).

MACAQUE CONNECTOME

Finally, the three inference algorithms are compared on two real macaque brain connectomes,
using both linear VAR dynamics and a nonlinear neural mass model (pipeline illustrated in
Figure 1).

Numerical Simulations

Linear VAR dynamics. As a final validation study under idealized conditions, the linear VAR
dynamics in Equation 1 is run on the connectome obtained via tract-tracing by Young (1993).
This directed network consists of 71 nodes and 746 links (15% density) and incorporates multiple
properties investigated in the previous sections, including a small-world topology and the
presence of hubs and modules. The scaling of the performance is studied as a function of the
cross-coupling strength (i.e., the sum of incoming link weights into each node, denoted as Cin
and formally defined as Cin = ∑X CXY for each node Y). The coupling Cin is varied in the
[0.3, 0.7] range (making CXY constant for each parent X of a given Y to achieve this), and the
self-link weights are kept constant at CXX = 0.2 to ensure the stationarity of the VAR dynamics.
For robustness, each simulation is repeated 10 times with random initial conditions.

Nonlinear neural mass model. As a final experiment, we provide an initial investigation of
whether the insights from the previous validation studies extend beyond the idealized conditions
there.
Specifically, in moving towards a more realistic setting, neural mass model dynamics are
simulated on the CoCoMac connectome, as described in the Methods Section. This network
structure contains 76 nodes with 1,560 directed connections (27% density), which are weighted
and have experimentally estimated coupling delays. Importantly, by incorporating nonlinear
coupling, coupling delays, a distribution of coupling weights, and subsampling, this last study
drops many of the simplifying assumptions made using the VAR dynamics in the previous
sections. The linear Gaussian estimator is retained for our information-theoretic measures despite
the nonlinear interactions here, so as to remain consistent with the previous studies. Dropping the
assumption of sampling at the real causal process resolution adds a particular challenge, and is
often encountered in practice in modalities with low temporal resolution. To handle the variation
in coupling delays, we consider sources at lags up to L = 4 time steps (60 ms) here. The longest
time series analyzed (30,000 samples) corresponds to 7.5 min at one sample per 15 ms.

Results

At the microscale, the results for the linear and nonlinear dynamics (Figure 13 and Figure 14)
are complementary and summarize the main findings presented so far. There exists a window—
characterized by low cross-coupling strength and short time series—where bivariate TE attains
similar or better performance compared with multivariate TE in terms of recall, specificity, and
precision. For stronger coupling or longer time series, the recall of all methods increases, but
the precision and specificity of the bivariate methods substantially drop while those of
multivariate TE remain consistently high.
An intuitive visual representation of how these differences in precision, recall, and specificity affect the macroscopic inferred network is provided in Figure 15, where the inferred adjacency matrices are displayed beside the real connectome, with different colors indicating which links are correctly/incorrectly inferred or missed by each method. The macroscale results (in terms of local and global efficiency measures) are reported in the Supporting Information.

Discussion

Interestingly, Figures 13 and 14 show how similar outcomes are produced by either stronger coupling (link weights) or longer time series. An explanation is readily available in the simple case of VAR dynamics: the bivariate TE on spurious links is typically lower than the TE on real links, and it increases with the coupling strength. (Bivariate TE can be analytically derived from the network structure under the assumption of VAR dynamics; spurious links are associated with larger motifs, e.g., longer chains, contributing to the TE at lower orders of magnitude; Novelli et al., 2020.) Therefore, spurious links can only pass statistical significance tests when sufficiently long time series are available (in order for their weak TE values to be distinguished from noise); for the same reason, for shorter time series, spurious links can only be detected in the presence of strong enough coupling. Unfortunately, for real datasets, there is no consistent a priori way to determine the optimal window of time series lengths for bivariate TE before the increasing false positive rate degrades precision and specificity.

Network Neuroscience 395

Figure 13. Performance as a function of coupling weight in a real macaque connectome with 71 nodes and linear VAR dynamics. Multivariate TE (third row) guarantees high specificity (adequate control of the false positive rate) regardless of the cross-coupling strength and time series length (T = 1,000 in the left column and T = 10,000 in the right column). Bivariate TE (second row) attains similar performance to multivariate TE for low cross-coupling strength and short time series, according to all metrics. For stronger coupling or longer time series, the recall of all methods increases, but the precision and specificity of the bivariate methods substantially drop. For each value of the cross-coupling weights, the results for 10 simulations from random initial conditions are presented (low-opacity markers) in addition to the mean values (solid markers).

Figure 14. Performance as a function of the number of time series samples (T) in the CoCoMac connectome with 76 nodes and nonlinear dynamics (neural mass model). Similarly to the linear VAR dynamics case presented in the previous figure, multivariate TE (third row) has lower recall than bivariate methods, particularly for shorter time series. More importantly though, it guarantees higher and more consistent precision and specificity, testifying to a more effective control of false positives. This advantage becomes increasingly important as longer time series are provided. The results for all simulations are presented (low-opacity markers) in addition to the mean values (solid markers).

Figure 15. Inferred adjacency matrices beside the real CoCoMac connectome with N = 76 nodes. Rows are source regions, columns are targets. Different colors indicate which links are correctly/incorrectly inferred or missed by each inference method. True positives are indicated in green, false positives in red, false negatives in yellow, and true negatives in white. The large number of spurious interhemispheric links produced by bivariate methods can hinder the identification of important macroscopic features, including the two hemispheres (particularly for longer time series, as in the case of T = 30,000 time samples shown here).

It is crucial to note that, despite moving beyond the idealized conditions used for validation with the VAR model, the qualitative differences between the inference algorithms remain unchanged in the neural mass model study. That is, multivariate TE attains higher and more consistent precision and specificity than bivariate methods, testifying to a more effective control of false positives that enables more faithful representation of macroscale network features, an advantage that becomes increasingly important as longer time series are provided. Indeed, on a more intuitive level, Figure 15 provides immediate visual evidence of the importance of a reliable method for controlling the false positive rate.
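The VAR-dynamics explanation above (weak but nonzero bivariate TE on spurious links, e.g., along chains) can be reproduced in a few lines: for Gaussian variables, bivariate TE reduces to a log-ratio of regression residual variances (the Granger causality equivalence), so a toy chain X → Y → Z suffices. This is an illustrative sketch with our own toy coefficients and lag settings, not the paper's simulation setup:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 20000
X, Y, Z = np.zeros(T), np.zeros(T), np.zeros(T)
W = rng.standard_normal(T)                 # an unconnected control node
e = rng.standard_normal((3, T))
for t in range(1, T):                      # chain X -> Y -> Z (lag 1 each)
    X[t] = 0.5 * X[t - 1] + e[0, t]
    Y[t] = 0.5 * Y[t - 1] + 0.4 * X[t - 1] + e[1, t]
    Z[t] = 0.5 * Z[t - 1] + 0.4 * Y[t - 1] + e[2, t]

def bivariate_te_gaussian(src, tgt, p=2):
    """Linear-Gaussian bivariate TE: 0.5 * ln(RSS_restricted / RSS_full)."""
    n = len(tgt)
    y = tgt[p:]
    tgt_past = np.column_stack([tgt[p - k:n - k] for k in range(1, p + 1)])
    src_past = np.column_stack([src[p - k:n - k] for k in range(1, p + 1)])
    def rss(design):
        d = np.column_stack([np.ones(len(y)), design])
        beta, *_ = np.linalg.lstsq(d, y, rcond=None)
        r = y - d @ beta
        return r @ r
    return 0.5 * np.log(rss(tgt_past) / rss(np.column_stack([tgt_past, src_past])))

te_real = bivariate_te_gaussian(Y, Z)   # true link Y -> Z
te_spur = bivariate_te_gaussian(X, Z)   # spurious: X reaches Z only via Y
te_null = bivariate_te_gaussian(W, Z)   # no path at all
```

Here te_real > te_spur > te_null: the spurious chain link carries a weak but systematic TE that, with enough samples, separates from the null distribution, which is why spurious links eventually pass significance tests as time series grow.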
The large number of spurious interhemispheric links produced by bivariate methods can hinder the identification of important macroscopic features, starting from the very presence of the two hemispheres themselves and extending to other fundamental network properties (as shown for local and global efficiency measures in the Supporting Information). This issue becomes particularly problematic for longer time series, as in the case of T = 30,000 time samples shown in Figure 15.

With that said, the precision and specificity of the multivariate TE for this more realistic study in Figure 14 are noticeably lower compared with the previous experiments in idealized conditions, such as that shown in Figure 13. Specifically, the specificity is lower than would be expected from the proven well-controlled false positive rate under idealized conditions. This is potentially due to a number of factors in this study, including the subsampling, nonlinear dynamics, strong autocorrelation, and, to a lesser extent, coupling delays. While we retained the linear Gaussian estimator to be consistent with the previous experiments, we have previously demonstrated substantial performance enhancements for the multivariate TE algorithm when using a nonlinear estimator for studying nonlinear dynamics (Novelli et al., 2019). That could be expected to improve performance here as well. Similarly, we have recently demonstrated approaches to rigorously control the known inflation of false positive rates for MI and TE because of autocorrelation (Cliff et al., 2020), although this is not expected to have as dramatic an effect in this study because of the selection of the subsampling time via the autocorrelation time. The subsampling itself, though (by a factor of 30 on the original time series), is likely to have had a significant impact on performance.
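The subsampling effect can be illustrated with the same linear-Gaussian TE machinery on a toy unidirectional pair (again with our own toy coefficients, not the neural mass model): keeping only every third sample sharply reduces the detectable transfer, because the lag-1 coupling no longer falls on an observed sampling interval.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 30000
X, Y = np.zeros(T), np.zeros(T)
e = rng.standard_normal((2, T))
for t in range(1, T):                 # single coupling X -> Y at lag 1
    X[t] = 0.3 * X[t - 1] + e[0, t]
    Y[t] = 0.3 * Y[t - 1] + 0.4 * X[t - 1] + e[1, t]

def bivariate_te_gaussian(src, tgt, p=2):
    """Linear-Gaussian bivariate TE: 0.5 * ln(RSS_restricted / RSS_full)."""
    n = len(tgt)
    y = tgt[p:]
    tgt_past = np.column_stack([tgt[p - k:n - k] for k in range(1, p + 1)])
    src_past = np.column_stack([src[p - k:n - k] for k in range(1, p + 1)])
    def rss(design):
        d = np.column_stack([np.ones(len(y)), design])
        beta, *_ = np.linalg.lstsq(d, y, rcond=None)
        r = y - d @ beta
        return r @ r
    return 0.5 * np.log(rss(tgt_past) / rss(np.column_stack([tgt_past, src_past])))

te_full = bivariate_te_gaussian(X, Y)            # sampled at the causal scale
te_sub = bivariate_te_gaussian(X[::3], Y[::3])   # subsampled by a factor of 3
```

The subsampled estimate te_sub is a small fraction of te_full: the source innovations driving the target between retained samples are unobserved, so most of the transfer is absorbed into residual noise, consistent with weaker detectability under subsampling.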
This is because subsampling obscures our view of the dynamics at the real interaction scale, and prevents us from properly conditioning on the past of the target (which is known to inflate the false positive rate; Wibral et al., 2011). While initial studies have suggested some level of robustness of TE to subsampling (Lizier et al., 2011), a more systematic analysis would be important future work to properly understand its effect.

CONCLUSION

We have sought to evaluate how well network models produced via bivariate and multivariate network inference methods capture features of underlying structural topologies. As outlined in the Introduction, these inference techniques seek to infer a network model of the relationships between the nodes in a system, and are not necessarily designed or expected to replicate the underlying structural topology in general. Our primary focus, however, was in evaluating the techniques under specific idealized conditions under which effective network models are proven to converge to the underlying structure. The focus on such conditions is important because they provide assumptions under which our evaluation becomes a validation study. The performance of these methods was evaluated at both the microscopic and the macroscopic scales of the network. While we may not expect the same performance in identifying links at the microscopic scale, we should expect all of the methods to identify relevant macroscopic features in the underlying network structure, as well as distinctive nodes or groups of nodes.

For longer time series, multivariate TE performs better on all network topologies (lattice-like, small-world, scale-free, modular, and the real macaque connectome). This enhanced performance is very clear at the microscale of single links, achieving high precision and recall, and consequently at the macroscale of network properties, accurately reflecting the key summary statistics of the ground truth networks used for validation.
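To make the macroscale stakes concrete, a toy sketch (our own construction, not the paper's pipeline): adding a handful of "transitive" false positives (links that close open triangles, mimicking how bivariate false positives sit close to real links) to a sparse undirected random graph inflates the average clustering coefficient and shortens the average path length.

```python
import random
from itertools import combinations

random.seed(0)
N, P, FP = 60, 0.1, 30   # nodes, edge probability, number of false positives

# sparse Erdos-Renyi-style undirected graph as an adjacency dict
adj = {i: set() for i in range(N)}
for a, b in combinations(range(N), 2):
    if random.random() < P:
        adj[a].add(b); adj[b].add(a)

def avg_clustering(adj):
    cs = []
    for v, nbrs in adj.items():
        if len(nbrs) < 2:
            cs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        cs.append(2.0 * links / (len(nbrs) * (len(nbrs) - 1)))
    return sum(cs) / len(cs)

def avg_path_length(adj):
    total = pairs = 0
    for s in adj:                     # BFS from every node
        dist, frontier = {s: 0}, [s]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        total += sum(dist.values())
        pairs += len(dist) - 1        # reachable pairs only
    return total / pairs

c0, l0 = avg_clustering(adj), avg_path_length(adj)

# add FP transitive false positives: connect two neighbours of a random node
added = 0
while added < FP:
    b = random.randrange(N)
    if len(adj[b]) < 2:
        continue
    a, c = random.sample(sorted(adj[b]), 2)
    if c not in adj[a]:
        adj[a].add(c); adj[c].add(a)
        added += 1

c1, l1 = avg_clustering(adj), avg_path_length(adj)
```

After adding the triangle-closing links, clustering rises (each false positive completes at least one triangle in a graph whose baseline clustering is low) and the average shortest path falls (distance-2 pairs become directly connected), in the same direction as the macroscale distortions reported for the bivariate methods.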
Bivariate methods (directed and undirected) can exhibit higher recall (or sensitivity) for shorter time series for certain underlying topologies; however, as available data increase, they are unable to control false positives (that is, they have lower specificity). While decreasing statistical significance thresholds (critical α levels) for inferring links is a common strategy to reduce false positives, the bivariate measures simply cannot match the sensitivity of the multivariate approach at the same specificity (compare the precisions at the same recall level in Figure 6 as an example, or refer to the sample ROC curve in the Supporting Information for a more extensive comparison). At the macroscale, the comparatively larger number of false positives leads to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest path lengths and hub centrality, and fattened degree distribution tails. The changes in these measures are partly due to the aforementioned transitivity property for bivariate measures (implying that false positives are often "close" to real links in the network), and partly due to higher density; untangling these effects is a topic for future work. In any case, caution should therefore be used when interpreting network properties of functional connectomes obtained via correlation or pairwise statistical dependence measures. Their use is advisable only when the limited amount of data does not allow the use of the more sophisticated but more accurate multivariate TE, which more faithfully tracks trends in underlying structural topology.

Further research is required to try to reliably identify, a priori, situations where bivariate TE will exhibit higher precision and recall (particularly in terms of time series length), as there is no clear candidate approach to do so at present. In the current status quo, the critical strength
of the multivariate approach lies in its ability to appropriately control the false positive rate to meet the requested values.

Our evaluation of the inference techniques under idealized conditions considered several of the highest profile complex network topologies: lattice-like, small-world, scale-free, modular, and a mix of their features in a real macaque connectome. This complements previous work (Novelli et al., 2019; Sun et al., 2015) at a similar scale, which evaluated performance on random network structures and incorporated a study of the effect of network size and linear-versus-nonlinear dynamics and estimators. Obviously, we have only scratched the surface of examining the effects of the myriad combinations of network parameters that could be investigated, which could include larger variations in degree, distributions on edge weights, super- or sublinear preferential attachment, nonuniform module sizes, or cross-module connection probabilities, and could also incorporate experiments across other types of dynamics. Thus far, our conclusions on how the multivariate TE approach performs against the bivariate measures were consistent across the variety of structures, and while it would be interesting to see how other variations in structure affect the performance, we do expect the general conclusions on the comparison between approaches to remain similar. Of course, there is a computational time tradeoff, with the run-time of the multivariate TE algorithm requiring O(d) longer in comparison to the bivariate approach (where d is the average inferred in-degree). The runtime complexity is analyzed in detail and benchmarked for both linear and nonlinear estimators on similar scale experiments in Novelli et al. (2019; Supporting Information).
Our experiments here on 10,000 time samples for up to 200 nodes, with the more efficient linear estimator, took less than 2 hr (single core) on average per target on the same hardware. Given the availability of parallel computing to analyze targets simultaneously, we believe the trade-off in runtime increase is justifiable for the performance increase demonstrated here.

Beyond idealized conditions, effective network inference techniques are not guaranteed to converge in such manner to an underlying structure. This can be for many reasons, including hidden nodes (or lack of full observability), nonstationarity or short sample size, or subsampling obscuring the scale of interaction. Yet, our final experiment (examining time series dynamics of a neural mass model on the 76-node CoCoMac connectome) extended our investigations into the domain beyond idealized conditions and also demonstrated superior performance of the multivariate TE, aligning with the validation studies in idealized conditions. Importantly, this included visually revealing the characteristic hemispheric macroscopic structure of this connectome. With that said, the performance of multivariate TE in this example was certainly reduced in comparison to our experiments under idealized conditions. This appears to be due to various factors as discussed in that section, including the use of a linear estimator on nonlinear dynamics as well as the effect of subsampling. There is substantial scope for further study to understand the performance of inference techniques under nonideal conditions, and how the effective network models they infer are related to underlying structure.
This will involve further experiments on realistic neural dynamics; systematic study of the effect of subsampling in network inference (building on existing studies for Granger causality; Barnett & Seth, 2017); and assessing the ability of inference algorithms to capture key network features when only a subset of nodes is observed (i.e., in the presence of hidden nodes).

Finally, while we focus on functional brain networks, our conclusions and methods also apply to anatomical brain networks in which connectivity is measured using correlation in cortical thickness or volume (He, Chen, & Evans, 2007). Beyond neuroscience, they also extend to metabolite, protein, and gene correlation networks (Gillis & Pavlidis, 2011; a similar validation study using synthetic networks was carried out in gene regulatory networks using bivariate MI and TE; Budden & Crampin, 2016).

ACKNOWLEDGMENTS

The authors thank Mac Shine and Daniele Marinazzo for useful discussions. The authors acknowledge the Sydney Informatics Hub and the University of Sydney's high-performance computing cluster Artemis for providing the high-performance computing resources that have contributed to the research results reported within this paper. The authors thank Matthew Aburn for providing time-series data simulated from the neural mass model on the CoCoMac connectome from Li et al. (2019) and Shine et al. (2018).

SUPPORTING INFORMATION

Supporting Information for this article is available at https://doi.org/netn_a_00178.
The network inference algorithms described in this paper are implemented in the open-source Python software package IDTxl (Wollstadt et al., 2019), which is freely available on GitHub (https://github.com/pwollstadt/IDTxl). The code used for the systematic exploration of network structures and inference methods is also publicly available (Novelli, 2020; https://github.com/LNov/infonet).

AUTHOR CONTRIBUTIONS

Leonardo Novelli: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Software; Validation; Visualization; Writing – original draft. Joseph Lizier: Conceptualization; Funding acquisition; Investigation; Supervision; Writing – review & editing.

FUNDING INFORMATION

Joseph Lizier, Australian Research Council, Award ID: DECRA Fellowship grant DE160100630. Joseph Lizier, University of Sydney, Award ID: Research Accelerator (SOAR) prize program.

REFERENCES

Aertsen, A. M., Gerstein, G. L., Habib, M. K., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." Journal of Neurophysiology, 61(5), 900–917. DOI: https://doi.org/10.1152/jn.1989.61.5.900, PMID: 2723733
Alstott, J., Bullmore, E., & Plenz, D. (2014). Powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE, 9(1), e85777. DOI: https://doi.org/10.1371/journal.pone.0085777, PMID: 24489671, PMCID: PMC3906378
Aquino, K. M., Fulcher, B. D., Parkes, L., Sabaroedin, K., & Fornito, A. (2020). Identifying and removing widespread signal deflections from fMRI data: Rethinking the global signal regression problem. NeuroImage, 212, 116614. DOI: https://doi.org/10.1016/j.neuroimage.2020.116614, PMID: 32084564
Atay, F. M., & Karabacak, Ö. (2006). Stability of coupled map networks with delays. SIAM Journal on Applied Dynamical Systems, 5(3), 508–527. DOI: https://doi.org/10.1137/060652531
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
DOI: https://doi.org/10.1126/science.286.5439.509, PMID: 10521342
Barnett, L., Barrett, A. B., & Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23), 238701. DOI: https://doi.org/10.1103/PhysRevLett.103.238701, PMID: 20366183
Barnett, L., & Seth, A. K. (2017). Detectability of Granger causality for subsampled continuous-time neurophysiological processes. Journal of Neuroscience Methods, 275, 93–121. DOI: https://doi.org/10.1016/j.jneumeth.2016.10.016, PMID: 27826091
Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience, 20(3), 353–364.
Bettinardi, R. G., Deco, G., Karlaftis, V. M., Van Hartevelt, T. J., Fernandes, H. M., Kourtzi, Z., . . . Zamora-López, G. (2017). How structure sculpts function: Unveiling the contribution of anatomical connectivity to the brain's spontaneous correlation structure. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(4), 047409. DOI: https://doi.org/10.1063/1.4980099, PMID: 28456160
Bialonski, S., Horstmann, M.-T., & Lehnertz, K. (2010). From brain to earth and climate systems: Small-world interaction networks or not? Chaos: An Interdisciplinary Journal of Nonlinear Science, 20(1), 013134. DOI: https://doi.org/10.1063/1.3360561, PMID: 20370289
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. DOI: https://doi.org/10.1088/1742-5468/2008/10/P10008
Bossomaier, T., Barnett, L., Harré, M., & Lizier, J. T. (2016).
An introduction to transfer entropy. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-319-43222-9
Budden, D. M., & Crampin, E. J. (2016). Information theoretic approaches for inference of biological networks from continuous-valued data. BMC Systems Biology, 10(1), 89. DOI: https://doi.org/10.1186/s12918-016-0331-y, PMID: 27599566, PMCID: PMC5013667
Cliff, O. M., Novelli, L., Fulcher, B. D., Shine, J. M., & Lizier, J. T. (2020). Exact inference of linear dependence between multiple autocorrelated time series. https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.3.013145
Colizza, V., Flammini, A., Serrano, M. A., & Vespignani, A. (2006). Detecting rich-club ordering in complex networks. Nature Physics, 2(2), 110–115. DOI: https://doi.org/10.1038/nphys209
Cover, T. M., & Thomas, J. A. (2005). Elements of information theory (2nd ed.). John Wiley & Sons. DOI: https://doi.org/10.1002/047174882X
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Physical Review E, 83(5), 051112. DOI: https://doi.org/10.1103/PhysRevE.83.051112, PMID: 21728495
Fagiolo, G. (2007). Clustering in complex directed networks. Physical Review E, 76(2), 026107. DOI: https://doi.org/10.1103/PhysRevE.76.026107
FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6), 445–466. DOI: https://doi.org/10.1016/S0006-3495(61)86902-6
Fornito, A., Zalesky, A., & Bullmore, E. T. (2016). Fundamentals of brain network analysis (1st ed.). Academic Press.
Gillis, J., & Pavlidis, P. (2011). The role of indirect connections in gene networks in predicting function. Bioinformatics, 27(13), 1860–1866. DOI: https://doi.org/10.1093/bioinformatics/btr288, PMID: 21551147, PMCID: PMC3117376
Goni, J., van den Heuvel, M. P., Avena-Koenigsberger, A., Velez de Mendizabal, N., Betzel, R.
F., Griffa, A., . . . Sporns, O. (2014). Resting-brain functional connectivity predicted by analytic measures of network communication. Proceedings of the National Academy of Sciences, 111(2), 833–838. DOI: https://doi.org/10.1073/pnas.1315529111, PMID: 24379387, PMCID: PMC3896172
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424–438. DOI: https://doi.org/10.2307/1912791
He, Y., Chen, Z. J., & Evans, A. C. (2007). Small-world anatomical networks in the human brain revealed by cortical thickness from MRI. Cerebral Cortex, 17(10), 2407–2419. DOI: https://doi.org/10.1093/cercor/bhl149, PMID: 17204824
Hilgetag, C. C., & Goulas, A. (2015). Is the brain really a small-world network? Brain Structure and Function, 221(4), 2361–2366. DOI: https://doi.org/10.1007/s00429-015-1035-6, PMID: 25894630, PMCID: PMC4853440
Hlinka, J., Hartman, D., & Paluš, M. (2012). Small-world topology of functional connectivity in randomly connected dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(3), 033107. DOI: https://doi.org/10.1063/1.4732541, PMID: 23020446
Honey, C. J., Kotter, R., Breakspear, M., & Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences, 104(24), 10240–10245. DOI: https://doi.org/10.1073/pnas.0701519104, PMID: 17548818, PMCID: PMC1891224
Humphries, M. D., & Gurney, K. (2008). Network "small-world-ness": A quantitative method for determining canonical network equivalence. PLoS ONE, 3(4), e0002051. DOI: https://doi.org/10.1371/journal.pone.0002051, PMID: 18446219, PMCID: PMC2323569
Kim, P., Rogers, J., Sun, J., & Bollt, E. M. (2016). Causation entropy identifies sparsity structure for parameter estimation of dynamic systems. Journal of Computational and Nonlinear Dynamics, 12(1), 011008. DOI: https://doi.org/10.1115/1.4034126
Kötter, R.
(2004). Online retrieval, processing, and visualization of primate connectivity data from the CoCoMac database. Neuroinformatics, 2(2), 127–144. DOI: https://doi.org/10.1385/NI:2:2:127
Kugiumtzis, D. (2013). Direct-coupling information measure from nonuniform embedding. Physical Review E, 87(6), 062918. DOI: https://doi.org/10.1103/PhysRevE.87.062918
Langford, E., Schwertman, N., & Owens, M. (2001). Is the property of being positively correlated transitive? The American Statistician, 55(4), 322–325.
Latora, V., & Marchiori, M. (2001). Efficient behavior of small-world networks. Physical Review Letters, 87(19), 198701. DOI: https://doi.org/10.1103/PhysRevLett.87.198701, PMID: 11690461
Li, M., Han, Y., Aburn, M. J., Breakspear, M., Poldrack, R. A., Shine, J. M., & Lizier, J. T. (2019). Transitions in information processing dynamics at the whole-brain network level are driven by alterations in neural gain. PLOS Computational Biology, 15(10), e1006957. DOI: https://doi.org/10.1371/journal.pcbi.1006957, PMID: 31613882, PMCID: PMC6793849
Lizier, J. T. (2014). JIDT: An information-theoretic toolkit for studying the dynamics of complex systems. Frontiers in Robotics and AI, 1, 11. DOI: https://doi.org/10.3389/frobt.2014.00011
Lizier, J. T., Heinzle, J., Horstmann, A., Haynes, J.-D., & Prokopenko, M. (2011). Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. Journal of Computational Neuroscience, 30(1), 85–107. DOI: https://doi.org/10.1007/s10827-010-0271-2, PMID: 20799057
Lizier, J.
T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute.
Maier, B. F. (2019). Generalization of the small-world effect on a model approaching the Erdős–Rényi random graph. Scientific Reports, 9(1), 9268. DOI: https://doi.org/10.1038/s41598-019-45576-3, PMID: 31239466, PMCID: PMC6592893
Marinazzo, D., Wu, G., Pellicoro, M., Angelini, L., & Stramaglia, S. (2012). Information flow in networks and the law of diminishing marginal returns: Evidence from modeling and human electroencephalographic recordings. PLoS ONE, 7(9), e45026. DOI: https://doi.org/10.1371/journal.pone.0045026, PMID: 23028745, PMCID: PMC3445562
Montalto, A., Faes, L., & Marinazzo, D. (2014). MuTE: A MATLAB toolbox to compare established and novel estimators of the multivariate transfer entropy. PLoS ONE, 9(10), e109462. DOI: https://doi.org/10.1371/journal.pone.0109462, PMID: 25314003, PMCID: PMC4196918
Neal, Z. P. (2017). How small is it? Comparing indices of small worldliness. Network Science, 5(1), 30–44. DOI: https://doi.org/10.1017/nws.2017.5
Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113. DOI: https://doi.org/10.1103/PhysRevE.69.026113, PMID: 14995526
Novelli, L. (2020). Infonet, GitHub. https://github.com/LNov/infonet
Novelli, L., Atay, F. M., Jost, J., & Lizier, J. T. (2020). Deriving pairwise transfer entropy from network structure and motifs. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 476(2236), 20190779. DOI: https://doi.org/10.1098/rspa.2019.0779, PMID: 32398937
Novelli, L., Wollstadt, P., Mediano, P., Wibral, M., & Lizier, J. T. (2019). Large-scale directed network inference with multivariate transfer entropy and hierarchical statistical testing. Network Neuroscience, 3(3), 827–847. DOI: https://doi.org/10.1162/netn_a_00092, PMID: 31410382, PMCID: PMC6663300
Orlandi, J.
G., Stetter, O., Soriano, J., Geisel, T., & Battaglia, D. (2014). Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging. PLoS ONE, 9(6), e98842. DOI: https://doi.org/10.1371/journal.pone.0098842, PMID: 24905689, PMCID: PMC4048312
Papo, D., Zanin, M., Martínez, J. H., & Buldú, J. M. (2016). Beware of the small-world neuroscientist! Frontiers in Human Neuroscience, 10(March), 1–4. DOI: https://doi.org/10.3389/fnhum.2016.00096
Pernice, V., Staude, B., Cardanobile, S., & Rotter, S. (2011). How structure determines correlations in neuronal networks. PLoS Computational Biology, 7(5), e1002059. DOI: https://doi.org/10.1371/journal.pcbi.1002059, PMID: 21625580, PMCID: PMC3098224
Razi, A., Kahan, J., Rees, G., & Friston, K. J. (2015). Construct validation of a DCM for resting state fMRI. NeuroImage, 106, 1–14. DOI: https://doi.org/10.1016/j.neuroimage.2014.11.027, PMID: 25463471, PMCID: PMC4295921
Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage, 52(3), 1059–1069. DOI: https://doi.org/10.1016/j.neuroimage.2009.10.003, PMID: 19819337
Runge, J. (2018). Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos, 28(7), 075310. DOI: https://doi.org/10.1063/1.5025050, PMID: 30070533
Runge, J., Nowack, P., Kretschmer, M., Flaxman, S., & Sejdinovic, D. (2018). Detecting causal associations in large nonlinear time series datasets. DOI: https://doi.org/10.1126/sciadv.aau4996, PMID: 31807692, PMCID: PMC6881151
Sanz Leon, P., Knock, S. A., Woodman, M. M., Domide, L., Mersmann, J., McIntosh, A. R., & Jirsa, V. (2013). The Virtual Brain: A simulator of primate brain network dynamics. Frontiers in Neuroinformatics, 7(MAY). DOI: https://doi.org/10.3389/fninf.2013.00010, PMID: 23781198, PMCID: PMC3678125
Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464.
DOI: https://doi.org/10.1103/PhysRevLett.85.461, PMID: 10991308
Schwarze, A. C., & Porter, M. A. (2020). Motifs for processes on networks.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. DOI: https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Shine, J. M. (2018). Gain topology, GitHub. https://github.com/macshine/gain_topology
Shine, J. M., Aburn, M. J., Breakspear, M., & Poldrack, R. A. (2018). The modulation of neural gain facilitates a transition between functional segregation and integration in the brain. eLife, 7, 1–16. DOI: https://doi.org/10.7554/eLife.31130, PMID: 29376825, PMCID: PMC5818252
Stetter, O., Battaglia, D., Soriano, J., & Geisel, T. (2012). Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals. PLoS Computational Biology, 8(8), e1002653. DOI: https://doi.org/10.1371/journal.pcbi.1002653, PMID: 22927808, PMCID: PMC3426566
Stramaglia, S., Cortes, J. M., & Marinazzo, D. (2014). Synergy and redundancy in the Granger causal analysis of dynamical networks. New Journal of Physics, 16(10), 105003. DOI: https://doi.org/10.1088/1367-2630/16/10/105003
Sun, J., Taylor, D., & Bollt, E. M. (2015). Causal network inference by optimal causation entropy. SIAM Journal on Applied Dynamical Systems, 14(1), 73–106. DOI: https://doi.org/10.1137/140956166
Takens, F. (1981). Detecting strange attractors in turbulence. In D. Rand & L. Young (Eds.), Dynamical systems and turbulence (pp. 366–381). Springer Berlin Heidelberg. DOI: https://doi.org/10.1007/BFb0091924
Telesford, Q. K., Joyce, K. E., Hayasaka, S., Burdette, J. H., & Laurienti, P. J. (2011). The ubiquity of small-world networks. Brain Connectivity, 1(5), 367–375. DOI: https://doi.org/10.1089/brain.2011.0038, PMID: 22432451, PMCID: PMC3604768
van den Heuvel, M., Stam, C., Boersma, M., & Hulshoff Pol, H. (2008). Small-world and scale-free organization of voxel-based resting-state functional connectivity in the human brain. NeuroImage, 43(3), 528–539. DOI: https://doi.org/10.1016/j.neuroimage.2008.08.010, PMID: 18786642
Virkar, Y., & Clauset, A. (2014). Power-law distributions in binned empirical data. The Annals of Applied Statistics, 8(1), 89–119. DOI: https://doi.org/10.1214/13-AOAS710
Vlachos, I., & Kugiumtzis, D. (2010). Nonuniform state-space reconstruction and coupling detection. Physical Review E, 82(1), 016207.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of "small-world" networks. Nature, 393(6684), 440–442. DOI: https://doi.org/10.1038/30918, PMID: 9623998
Wibral, M., Rahm, B., Rieder, M., Lindner, M., Vicente, R., & Kaiser, J. (2011). Transfer entropy in magnetoencephalographic data: Quantifying information flow in cortical and cerebellar networks. Progress in Biophysics and Molecular Biology, 105(1–2), 80–97. DOI: https://doi.org/10.1016/j.pbiomolbio.2010.11.006, PMID: 21115029
Wollstadt, P., Lizier, J. T., Vicente, R., Finn, C., Martínez-Zarzuela, M., Mediano, P., . . . Wibral, M. (2019). IDTxl: The Information Dynamics Toolkit xl: A Python package for the efficient analysis of multivariate information dynamics in networks. Journal of Open Source Software, 4(34), 1081. DOI: https://doi.org/10.21105/joss.01081
Xia, C. H., Ma, Z., Cui, Z., Bzdok, D., Thirion, B., Bassett, D. S., . . . Witten, D. M. (2020). Multi-scale network regression for brain-phenotype associations. Human Brain Mapping, 41, 2553–2566. DOI: https://doi.org/10.1002/hbm.24982, PMID: 32216125, PMCID: PMC7383128
Young, M. P. (1993). The organization of neural systems in the primate cerebral cortex.
Proceedings of the Royal Society of London. Series B: Biological Sciences, 252(1333), 13–18. DOI: https://doi.org/10.1098/rspb.1993.0040, PMID: 8389046
Zalesky, A., Fornito, A., & Bullmore, E. (2012). On the use of correlation as a measure of network connectivity. NeuroImage, 60(4), 2096–2106. DOI: https://doi.org/10.1016/j.neuroimage.2012.02.001, PMID: 22343126
Zalesky, A., Fornito, A., Cocchi, L., Gollo, L. L., van den Heuvel, M. P., & Breakspear, M. (2016). Connectome sensitivity or specificity: Which is more important? NeuroImage, 142, 407–420. DOI: https://doi.org/10.1016/j.neuroimage.2016.06.035, PMID: 27364472
Zalesky, A., Fornito, A., Harding, I. H., Cocchi, L., Yücel, M., Pantelis, C., & Bullmore, E. T. (2010). Whole-brain anatomical networks: Does the choice of nodes matter? NeuroImage, 50(3), 970–983. DOI: https://doi.org/10.1016/j.neuroimage.2009.12.027, PMID: 20035887
Zanin, M. (2015). On alternative formulations of the small-world metric in complex networks.
Zhou, S., & Mondragon, R. (2004). The rich-club phenomenon in the internet topology. IEEE Communications Letters, 8(3), 180–182. DOI: https://doi.org/10.1109/LCOMM.2004.823426