ARTICLE
Communicated by Terrence Sejnowski
Face Representations via Tensorfaces of Various Complexities
Sidney R. Lehky
sidney.lehky@riken.jp
Cognitive Brain Mapping Laboratory, RIKEN Center for Brain Science, Wako-shi,
Saitama 351-0198, Japan, and Computational Neurobiology Laboratory,
Salk Institute, La Jolla, CA 92037, U.S.A.
Anh Huy Phan
a.phan@skoltech.ru
Center for Computational and Data-Intensive Science and Engineering, Skolkovo
Institute of Science and Technology, 143026 Moscow, Russia; and Institute of Global
Innovation Research, Tokyo University of Agriculture and Technology,
Tokyo 183-8538, Japan
Andrzej Cichocki
a.cichocki@skoltech.ru
Center for Computational and Data-Intensive Science and Engineering, Skolkovo
Institute of Science and Technology, 143026 Moscow, Russia; Systems Research
Institute, Polish Academy of Sciences, 01447 Warsaw, Poland; College of Computer
Science, Hangzhou Dianzi University, Hangzhou 310018, China; and Institute of
Global Innovation Research, Tokyo University of Agriculture and Technology,
Tokyo 183-8538, Japan
Keiji Tanaka
keiji@riken.jp
Cognitive Brain Mapping Laboratory, RIKEN Center for Brain Science,
Wako-shi, Saitama 351-0198, Japan
Neurons selective for faces exist in humans and monkeys. However,
characteristics of face cell receptive fields are poorly understood. In this
theoretical study, we explore the effects of complexity, defined as al-
gorithmic information (Kolmogorov complexity) and logical depth, on
possible ways that face cells may be organized. We use tensor decompo-
sitions to decompose faces into a set of components, called tensorfaces,
and their associated weights, which can be interpreted as model face cells
and their firing rates. These tensorfaces form a high-dimensional rep-
resentation space in which each tensorface forms an axis of the space.
A distinctive feature of the decomposition algorithm is the ability to
specify tensorface complexity. We found that low-complexity tensor-
faces have blob-like appearances crudely approximating faces, while
high-complexity tensorfaces appear clearly face-like. Low-complexity
tensorfaces require a larger population to reach a criterion face recon-
struction error than medium- or high-complexity tensorfaces, and thus
are inefficient by that criterion. Low-complexity tensorfaces, however,
generalize better when representing statistically novel faces, which are
faces falling beyond the distribution of face description parameters
found in the tensorface training set. The degree to which face rep-
resentations are parts based or global forms a continuum as a func-
tion of tensorface complexity, with low and medium tensorfaces being
more parts based. Given the computational load imposed in creating
high-complexity face cells (in the form of algorithmic information and
logical depth) and in the absence of a compelling advantage to using
high-complexity cells, we suggest face representations consist of a mix-
ture of low- and medium-complexity face cells.
1 Introduction
The ability to recognize individual faces and interpret facial expressions
is central to human social interactions, as well as the social interactions
of some nonhuman primates (Leopold & Rhodes, 2010; Parr, 2011; Parr,
Winslow, Hopkins, & de Waal, 2000). Neurons whose responses are selec-
tive for faces have been demonstrated in humans and nonhuman primates,
both neurophysiologically and through fMRI (Duchaine & Yovel, 2015; Frei-
wald, Duchaine, & Yovel, 2016; Haxby, Hoffman, & Gobbini, 2000; Kan-
wisher & Yovel, 2006; Nestor, Plaut, & Behrmann, 2016; Parr, Hecht, Barks,
Preuss, & Votaw, 2009; Tsao, 2014; Tsao & Livingstone, 2008). How those
neurons are used to represent faces is a matter of extensive research.
Neurophysiological evidence indicates that faces can be encoded using
a neural population code, with each face represented by a point within
a high-dimensional face response space (Chang & Tsao, 2017; Eifuku, De
Souza, Tamura, Nishijo, & Ono, 2004; Rolls & Tovée, 1995; Young & Ya-
mane, 1992). Each neuron forms an axis of the neural face space. Neural
responses within the high-dimensional response space can be visualized
through dimensional-reduction techniques such as multidimensional scal-
ing (MDS) or principal components analysis (PCA). The dimensionality of
face space has been estimated psychophysically to be on the order of 100
(Meytlis & Sirovich, 2007; Sirovich & Meytlis, 2009). Within an axis-based
face space, the average face may have special status as defining the origin of
the face space coordinate system (Leopold, Bondar, & Giese, 2006; Leopold,
O’Toole, Vetter, & Blanz, 2001; Rhodes & Jeffery, 2006; Tsao & Freiwald, 2006;
Wilson, Loffler, & Wilkinson, 2002), though this remains controversial.
The use of axis-based high-dimensional neural response spaces has
become commonplace for interpreting neural data, not just for describ-
ing faces responses but also for describing neural responses to object
stimuli in general. Those using an axis-based approach to characterize
neurophysiological object responses (nonface) include MDS studies
(Kayaert, Biederman, & Vogels, 2005; Kiani, Esteky, Mirpour, & Tanaka,
2007; Lehky & Sereno, 2007; Murata, Gallese, Luppino, Kaseda, & Sakata,
2000; Op de Beeck, Wagemans, & Vogels, 2001; Romero, Van Dromme, &
Janssen, 2013; Sereno & Lehky, 2018; Sereno, Sereno, & Lehky, 2014), as well
as those based on PCA (Baldassi et al., 2013; Chang & Tsao, 2017). This axis-
based approach can be extended to interpreting fMRI data for objects, in this
case using each voxel as an axis for the high-dimensional response space
(Bracci & Op de Beeck, 2016; Connolly et al., 2012; Kravitz, Peng, & Baker,
2011; Kriegeskorte et al., 2008).
There are two perspectives on the development of face processing
circuitry in temporal cortex. The first is that there are face-specific neu-
ral processes that are hardwired (domain specificity) (Kanwisher, 2000;
McKone, Kanwisher, & Duchaine, 2007; Tsao & Livingstone, 2008; Yovel &
Kanwisher, 2004). The second is that the temporal cortex can also acquire
processing for different classes of nonface stimuli through experience
(expertise) (Cowell & Cottrell, 2013; Gauthier, Behrmann, & Tarr, 1999;
Gauthier, Skudlarski, Gore, & Anderson, 2000; Gauthier & Tarr, 1997; Tong,
Joyce, & Cottrell, 2008; Wang, Gauthier, & Cottrell, 2016). For the purposes
of this study, we remain agnostic between these possibilities, focusing on
the face representations themselves, not their development.
A neural face space is defined by the properties of the individual neu-
rons that constitute the axes of the space (plus possible interactions within
the face cell population if the face space is nonlinear). Therefore, a central
task in characterizing face space is to characterize those individual neurons.
As with high-level representations of objects in general, the complexity of
face representations at the population level reflects the complexity in the
organization of individual face cell receptive fields. Given the complexity
of face cell organization, a fruitful approach is to constrain the possibilities
of what aspects of facial features are important to face cells. An interesting
example of this sort of analysis is given by Freiwald, Tsao, and Livingstone
(2009) for monkey inferotemporal cortex, based on the geometry of facial
features and parts/whole organization using simple cartoon face stimuli.
In contrast, we have hesitations concerning the conclusions of Chang and
Tsao (2017) that face space corresponds to one unique linear space that they
have discovered. We believe that other linear face spaces are also consis-
tent with their data under their mathematical analysis methods, as we will
consider in section 4.
Here we suggest that image complexity may be a novel way to charac-
terize face representations, where complexity is given a well-defined math-
ematical definition. We approach the issue of face complexity theoretically
by using a mathematical technique based on tensor decomposition (Bro,
1997; Cichocki et al., 2015; Favier & de Almeida, 2014; Kolda & Bader,
2009; Rabanser, Shchur, & Günnemann, 2017; Sidiropoulos et al., 2017) that
allows us to vary the complexity of the face cells that constitute the encoding
dimensions. Complexity as used here is defined as Kolmogorov com-
plexity, also known as algorithmic information (Adriaans, 2019; Cover &
Thomas, 2006; Grünwald & Vitányi, 2008a, 2008b; Li & Vitányi, 2008), as
well as another complexity measure called logical depth (Bennett, 1988,
1994; Zenil, Delahaye, & Gaucherel, 2012). Comparing properties of face
representations with different complexities is the central focus of this
study.
Tensor analysis decomposes faces into a set of components called ten-
sorfaces. Under the algorithm used here, the original faces can be recon-
structed by a weighted linear sum of the tensorfaces (under other tensor
algorithms the mixing can be multilinear). A set of components and their
associated weights can be thought of as model face cells and their firing
rates. This tensor decomposition is analogous to reconstructing faces using
a weighted linear sum of components derived from principal components
analysis (PCA) (Turk & Pentland, 1991), a weighted linear sum of com-
ponents derived from independent components analysis (ICA) (Bartlett,
Movellan, & Sejnowski, 2002; Bartlett & Sejnowski, 1997), or a weighted
linear sum of components derived from nonnegative matrix factorization
(NMF; Wang, Jia, Hu, & Turk, 2005), among other possibilities. These de-
composition algorithms differ based on what constraint is applied to the de-
composition. PCA produces components subject to the constraint that they
are orthogonal, ICA that they are statistically independent, and NMF that
they are nonnegative. Another member of this genre of decomposing faces
into linear components is active appearance modeling (AAM) (Cootes, Ed-
wards, & Taylor, 2001; Edwards, Cootes, & Taylor, 1998), as used by Chang
and Tsao (2017). AAM is similar to PCA except that fiducial markers are
placed on the face images by hand to help with aligning features during
decomposition (thus, this is not an automatic algorithm). In this study, the
constraint we place on the face decomposition is that the components have
fixed complexity.
Tensor decomposition is not a single algorithm but a category of algo-
rithms. The term tensorface was originated by Vasilescu and Terzopoulos
(2002) for a particular nonlinear (multilinear) tensorface decomposition al-
gorithm (see also Vasilescu & Terzopoulos, 2002, 2003, 2005, 2011). We use
a different tensor algorithm to linearly decompose faces (Phan, Cichocki,
Tichavský, Zdunek, & Lehky, 2013), one that is constrained to produce ten-
sorfaces with specified image complexity. Each tensorface can be visual-
ized as a matrix of pixels, and the rank of that matrix serves as the direct
proxy of image complexity when running the algorithm. (Rank is defined
as the maximum number of linearly independent columns or rows in a
matrix.) Matrix rank is the input parameter specified for the algorithm to
specify the face complexity we want, while Kolmogorov complexity and
logical depth are calculated from the output tensorfaces after the algorithm
is run.
Figure 1: Examples of different classes of faces included in the face sets.
We are not advocating the algorithm used here as a specific model of
biological face cells, and we are not interested in creating a canonical face
space (we believe such an effort is premature). Rather, we are interested in
exploring the concept of complexity in face representations in general us-
ing this algorithm as an example, with hopes that this concept will prove
useful in future investigations of biological face processing. We create ten-
sorfaces with specified complexity by adding a rank constraint to a tensor
decomposition algorithm. Creating other face representations with speci-
fied complexity could also be done by adding rank constraints to other de-
composition algorithms not based on tensor algorithms. An example of this
is PCA decomposition with a rank constraint added (Yu, 2016). We confine
ourselves here to issues of basic face representation and do not attempt to
categorize different views of individual faces because we are not creating a
full face recognition model.
2 Methods
2.1 Face Stimulus Set. Synthetic colored faces were generated using
FaceGen software (Singular Inversions, Inc.; facegen.com). Some details of
the FaceGen algorithms are discussed in Blanz and Vetter (1999). The face
set included equal numbers of males and females and equal numbers from
the four racial groups provided by the software: African, East Asian, Eu-
ropean, and South Asian. Because we included color in our consideration
of facial representations, we wanted to have different skin tones in the face
sample. Example faces are shown in Figure 1. Within each racial group, we
generated faces with random shape, color, and texture parameters using
the Generate button in the software control panel. This automatic, random
generation of faces sometimes led to unnatural-looking faces, which either
were rejected from inclusion in the face set or had their parameters manually
tweaked. Faces had zero rotation. The illumination angle was 0° azimuth and
0° elevation. Tensor decomposition was carried out on a sample set usually
consisting of 128 faces (see the examples in Figure 2a). The resulting
tensorfaces were tested by using them to reconstruct a different set of faces,
a test set containing 40 faces (see Figure 2b).
For this initial study of tensorface complexity, we have kept the face-
sample set simple, all front-facing with identical illumination. The multi-
way nature of tensor decompositions would allow inclusion of additional
image parameters as additional dimensions to the input tensor containing
the sample face set. For example, representations of rotated faces (changes
in viewpoint) are an important aspect of face identification (Fang, Murray, &
He, 2007; Freiwald & Tsao, 2010; Jiang, Blanz, & O’Toole, 2006; Natu et al.,
2010; Noudoost & Esteky, 2013; Perrett et al., 1991, 1985; Ramírez, Cichy,
Allefeld, & Haynes, 2014). Face rotation in depth (azimuth) could be added
as a fifth dimension to the current four-dimensional input tensor (x spatial
dimension, y spatial dimension, color, and different individuals), and
analogously for additional image parameters.
2.2 Tensor Decomposition Algorithm Background. We computed face
components using tensor methods rather than the matrix methods used in
PCA, ICA, and NMF. PCA and other matrix techniques can deal only with
2D data. That means each face image must be unfolded or vectorized into
one long 1D vector. Then the vectors for the individual faces are placed
together to form the columns of a 2D matrix, which serves as the input to
PCA (see Figure 3a). In contrast, tensor methods can be applied to data with
an arbitrarily large number of parameter dimensions. Therefore, images do
not need to be vectorized, and each pixel within the image retains its spa-
tial context during the decomposition process (see Figure 3b). Here, we did
tensor decompositions of 4D face data structures, which included two spa-
tial dimensions for each face, color as the third dimension, and different
individuals as the fourth dimension. While PCA and other matrix methods
use linear algebra, tensor methods use multilinear algebra that allows con-
sideration of multiple parameter dimensions concurrently. While we used
a multilinear algorithm to decompose faces into a set of components and
weights, the faces were reconstructed linearly as the weighted sum of the
components.
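For concreteness, the difference in data layout can be sketched in a few lines of Matlab; the variable faces below is an illustrative 200 × 200 × 3 × 128 array of face images, not a variable taken from the released code:

% faces: 200 x 200 x 3 x 128 array holding the sample face images.

% Matrix methods (PCA, ICA, NMF): each face is vectorized into one column.
Ymat = reshape(faces, 200*200*3, 128);   % 120000 x 128 matrix input

% Tensor methods: the faces keep their spatial and color structure.
Ytensor = faces;                         % 200 x 200 x 3 x 128 tensor input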
2.3 Tensor Decomposition Algorithm. Matlab code and example face
files are available at https://github.com/slehky/tensorfaces-neco.
2.3.1 Matrix Operators. The tensor decomposition algorithm uses the
Kronecker, Khatri-Rao, and Hadamard products between two matrices, as
Figure 2: Face sets used to examine the tensor decomposition algorithm.
(a) Sample set. Shows 64 out of 128 faces serving as input to the algorithm to
create the tensorfaces. (b) Test set: A different set of faces to evaluate properties
of the tensorfaces.
Figure 3: Comparison between matrix methods used in PCA and tensor meth-
ods. (a) Matrix methods can operate only on 2D data. That requires faces to
be unfolded into 1D vectors before being placed as columns in a 2D matrix.
(b) Tensor methods allow data structures with an indefinite number of dimen-
sions. That means faces do not need to be vectorized but can be stacked on top
of each other, retaining their 2D organization. Here we used a 4D data structure
for faces, including two spatial dimensions, a color dimension, and a dimension
representing faces of different individuals.
well as Hadamard division. The properties and applications of those ma-
trix operators have been reviewed by Van Loan (2000), as well as Liu and
Trenkler (2008), and are included in the Matlab toolbox software of Kolda
et al. (2017) and Phan (2018). Here we briefly look at these operators before
describing the algorithm.
The Kronecker product ⊗ of the matrix A ∈ M_{m,n} and the matrix B ∈ M_{p,q} is defined as
$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}, \tag{2.1}$$
for example,
$$\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \otimes \begin{bmatrix} g & h & i \\ j & k & l \end{bmatrix} = \begin{bmatrix} ag & ah & ai & bg & bh & bi & cg & ch & ci \\ aj & ak & al & bj & bk & bl & cj & ck & cl \\ dg & dh & di & eg & eh & ei & fg & fh & fi \\ dj & dk & dl & ej & ek & el & fj & fk & fl \end{bmatrix}. \tag{2.2}$$
The Kronecker product is the generalization to matrices of the vector outer
product. It is sometimes called the tensor product.
The Khatri-Rao product ⊙ of the matrix A ∈ M_{m,n} and the matrix B ∈ M_{p,n} is defined as the Kronecker product between corresponding columns of the two matrices:
$$A \odot B = \left[ a_1 \otimes b_1,\; a_2 \otimes b_2,\; \ldots,\; a_n \otimes b_n \right], \tag{2.3}$$
where $a_n$ and $b_n$ are the nth column vectors. The Khatri-Rao product is defined only if the matrices have the same number of columns, for example,
$$\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \odot \begin{bmatrix} g & h & i \\ j & k & l \end{bmatrix} = \begin{bmatrix} ag & bh & ci \\ aj & bk & cl \\ dg & eh & fi \\ dj & ek & fl \end{bmatrix}. \tag{2.4}$$
The Hadamard product ⊛ between two matrices A ∈ M_{m,n} and B ∈ M_{m,n} is defined as the element-wise multiplication between them:
$$A \circledast B = [A]_{ij}[B]_{ij}, \tag{2.5}$$
for all 1 ≤ i ≤ m, 1 ≤ j ≤ n. The Hadamard product is defined only if the two matrices have the same dimensions, for example,
$$\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \circledast \begin{bmatrix} g & h & i \\ j & k & l \end{bmatrix} = \begin{bmatrix} ag & bh & ci \\ dj & ek & fl \end{bmatrix}. \tag{2.6}$$
Hadamard division ⊘ is defined analogously as element-wise division between two matrices.
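As a check on these definitions, the three products and Hadamard division can be reproduced with elementary Matlab; the Khatri-Rao product is written out explicitly so the sketch does not depend on any toolbox:

A = [1 2 3; 4 5 6];              % A is 2 x 3
B = [7 8 9; 10 11 12];           % B is 2 x 3

K = kron(A, B);                  % Kronecker product (equation 2.1), 4 x 9

KR = zeros(size(A,1)*size(B,1), size(A,2));   % Khatri-Rao product (equation 2.3)
for j = 1:size(A,2)
    KR(:,j) = kron(A(:,j), B(:,j));           % column-wise Kronecker products
end

H  = A .* B;                     % Hadamard product (equation 2.5)
Hd = A ./ B;                     % Hadamard division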
2.3.2 The Model. The tensor decomposition algorithm we use is de-
scribed by Phan et al. (2013). We have not made any changes to it but
present it in more detail here. The algorithm is a variant of the CAN-
DECOMP/PARAFAC (CP) algorithm (Carroll & Chang, 1970; Harshman,
1970). It falls into the category of structured or constrained CP incorpo-
rating a PARALIND algorithm (Bro, Harshman, Sidiropoulos, & Lundy,
2009). Constrained CP algorithms have been reviewed by Favier and de
Almeida (2014). Although this algorithm is derived from CP, it is not based
on an outer product sum of rank 1 components as is done by CP. Rather,
the decomposition is based on a Kronecker product between two tensors—
namely, a components tensor and a weights tensor. The structured CP algo-
rithm used here can be viewed in some sense as intermediate between two
commonly used tensor decomposition models, the conventional CP model
(Carroll & Chang, 1970; Harshman, 1970) and the Tucker model (Tucker,
1966), and incorporates aspects of both. The reason for using a structured
CP model in this study rather than either the conventional CP or Tucker
models is briefly outlined in Phan et al. (2013).
Consider a data tensor Y of size $I_1 \times I_2 \times \cdots \times I_N$. Our aim is to represent this tensor by multiple basis components (tensorfaces) in which the components were specified to have various levels of complex structures. In our case, we are dealing with a four-way tensor (N = 4) with size 200 (pixels) × 200 (pixels) × 3 (color channels) × 128 (individuals), which represents 128 colored face images concatenated into a single data structure. All calculations are performed with the color channels converted from RGB to CIE 1976 L*a*b* color space, which approximates human color vision more closely.
The tensor decomposition algorithm we use factors the tensor Y into a sum of components (basis patterns) and mixing weights:
$$\mathcal{Y} \approx \sum_{p=1}^{P} \mathcal{X}_p \otimes \mathcal{A}_p, \tag{2.7}$$
where ⊗ denotes the generalized Kronecker product, X p are compo-
nents (tensorfaces in our case), and Ap the associated coefficient tensors
(weights), for p = 1, 2, . . . , P (P = number of patterns). Unlike matrix de-
compositions such as PCA, ICA, and NMF, where the weight for each com-
ponent must be a scalar, tensor decompositions can allow weights to be
a higher-order tensor, allowing multilinear mixing during reconstruction
(Vasilescu & Terzopoulos, 2002). However, in this model, we arranged the
algorithm such that the weights tensor is order-1 and rank-1, thereby mak-
ing the weight for each component scalar and the mixing linear. The compo-
nents tensor X p is a higher-rank tensor. Although the face reconstruction is
linear here, the decomposition of the input face tensor Y itself into weights
Ap and components X p is multilinear.
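A minimal sketch of this linear reconstruction step, assuming the P tensorfaces have been stacked into a 200 × 200 × 3 × P array X and the scalar weights for one face into a vector w (both names illustrative):

recon = zeros(200, 200, 3);
for p = 1:P
    recon = recon + w(p) * X(:,:,:,p);   % weighted linear sum of tensorfaces
end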
Various decomposition algorithms can be carried out subject to different
constraints on X p, such as orthogonality (PCA), statistical independence
(ICA), and nonnegativity (NMF), as well as possible constraints on Ap such
as sparseness. Here the constraint was on the tensor rank of X p, where we
take rank to be a measure of the complexity of the tensorface patterns. As
the number of components P is limited, the decomposition will be only ap-
proximately equal to the original data Y.
For our model, the weights Ap are of size $J_{p_1} \times J_{p_2} \times \cdots \times J_{p_M}$, with their order (dimensionality) given by M. Within the algorithm, we defined Ap to be order M = 1, and thus Ap is represented by an n × 1 vector, where n is the number of face images in the input set (typically n = 128). The patterns Xp are of size $K_{p_1} \times K_{p_2} \times \cdots \times K_{p_L}$, with their order given by L. Xp are of order L = 3 and form m × m × 3-sized tensors, where m is the size of the input image in pixels (in our case, always 200 pixels).
Ap and X p are rank-Sp and rank-Rp tensors, respectively. The rank of Ap
is always Sp = 1. The rank of Xp is set over the range Rp = 2 to Rp = 32 for
different runs of the tensor decomposition algorithm. Examining the effects
of changing Rp (changing the complexity of tensorfaces) is a central concern
of this study.
The subscript p for different tensorface patterns is included for general-
ity, but we hold both the order and the rank of both Ap and X p constant for
all p. Notably the rank of X p is constant for the entire population of ten-
sorfaces during a single run. Although we had the option to set the rank of
each tensorface individually, we do not do so here.
In implementing the model, Ap and X p can be expressed as sets of matrices U(m) and V(l) through canonical polyadic decomposition (CPD; Carroll & Chang, 1970; Harshman, 1970) of Ap and X p (see Figure 4):
$$\mathcal{A}_p = \mathcal{I} \times_1 U_p^{(1)} \times_2 U_p^{(2)} \cdots \times_M U_p^{(M)}, \tag{2.8}$$
$$\mathcal{X}_p = \mathcal{I} \times_1 V_p^{(1)} \times_2 V_p^{(2)} \cdots \times_L V_p^{(L)}, \tag{2.9}$$
where $\times_n$ is tensor matrix multiplication along the nth mode (dimension), I is a tensor with ones along the superdiagonal, and the superscripts indicate the dimension number and the subscripts the pattern number. The sizes of the matrices were $U^{(m)} \in \mathbb{R}^{J_{p_m} \times S_p}$ and $V^{(l)} \in \mathbb{R}^{K_{p_l} \times R_p}$. For Ap, which has order Mp = 1 and rank Sp = 1, U(m) reduces to a single 128 × 1 vector. For X p, which has order Lp = 3 and rank Rp as variably defined, there were three matrices in which the number of rows was set equal to the image dimensions and the number of columns equal to the tensorface rank. Assuming rank Rp = 8 as an example, the sizes of the three matrices associated with each pattern were 200 × 8, 200 × 8, and 3 × 8. It is here that the rank constraint enters explicitly into the calculations. This model of Y is equivalent to a CP decomposition with total rank $T = \sum_{p=1}^{P} R_p S_p$.
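To make the rank constraint concrete, the sketch below assembles one rank-R tensorface from its three factor matrices in equation 2.9, using the column-major identity that the vectorized outer product of v1, v2, and v3 equals kron(v3, kron(v2, v1)); the matrices V1, V2, and V3 stand in for the three V(l) of one pattern and are filled with random numbers purely for illustration:

R  = 8;                       % chosen tensorface rank (complexity)
V1 = randn(200, R);           % first spatial mode
V2 = randn(200, R);           % second spatial mode
V3 = randn(3,   R);           % color mode

Xp = zeros(200, 200, 3);
for r = 1:R                   % sum of R rank-1 (outer product) terms
    Xp = Xp + reshape(kron(V3(:,r), kron(V2(:,r), V1(:,r))), [200 200 3]);
end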
The tensor decompositions in equations 2.7 to 2.9 are particular cases of
Kronecker tensor decomposition (KTD) and also constitute a generalized
model of block term decomposition (BTD) (De Lathauwer, 2008a, 2008b;
Figure 4: Illustration of the tensor decomposition equations. Order-3 tensors $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ are shown here as examples. (a) Block term decomposition (BTD) for P terms of Kronecker tensor products of Ap (weights) and X p (tensorface patterns) (see equation 2.7), where ⊗ is the Kronecker product and P is the number of tensorfaces in the decomposition. In general, the algorithm allows tensor size for each term P to be set individually, as shown in the diagram, but in practice, all were set the same size. (b) Rank-constrained BTD decomposition illustrated for a single term. Ap and X p can each be expressed as a set of matrices U(m) and V(l) (indicated by small rectangles) (see equations 2.8 and 2.9). Setting the number of columns for those matrices equal to the desired rank values, Sp and Rp, respectively, imposes the rank constraints of the decomposition. Rank of the X p decomposition determines tensorface complexity. Rank of the Ap decomposition was always 1.
Sorber, Van Barel, & De Lathauwer, 2013). If all Ap are of order 1 (i.e., M =
1), as was the case here, then the above model is simplified into the rank-Rp
◦ rank-1 BTD (Sorber et al., 2013).
In order to derive the algorithm that updates the factor matrices of the
basis patterns and the weight tensors, we rewrite the tensor decomposition
in equation 2.7 with rank constraints in equations 2.8 and 2.9 in the form of
the CP decomposition.
Lemma 1. The decomposition in equations 2.7 to 2.9 is equivalent to a structured
canonical polyadic decomposition,
$$\mathcal{Y} \approx \mathcal{I} \times_1 W^{(1)} \times_2 W^{(2)} \cdots \times_N W^{(N)}, \tag{2.10}$$
where the factor matrices W(n) are given by
$$W^{(n)} = \begin{cases} \tilde{V}^{(n)} Q_X, & n = 1, 2, \ldots, L \\ \tilde{U}^{(n)} Q_A, & n = L+1, \ldots, N \end{cases} \tag{2.11}$$
$$\tilde{U}^{(n)} = \left[ U_1^{(n)}, U_2^{(n)}, \ldots, U_P^{(n)} \right], \tag{2.12}$$
$$\tilde{V}^{(n)} = \left[ V_1^{(n)}, V_2^{(n)}, \ldots, V_P^{(n)} \right], \tag{2.13}$$
$$Q_X = \mathrm{blkdiag}\left( I_{R_1} \otimes 1_{S_1}^T,\; I_{R_2} \otimes 1_{S_2}^T,\; \ldots,\; I_{R_P} \otimes 1_{S_P}^T \right), \tag{2.14}$$
$$Q_A = \mathrm{blkdiag}\left( 1_{R_1}^T \otimes I_{S_1},\; 1_{R_2}^T \otimes I_{S_2},\; \ldots,\; 1_{R_P}^T \otimes I_{S_P} \right). \tag{2.15}$$
In equations 2.14 and 2.15, ⊗ is the Kronecker product, defined in equa-
tion 2.1. I is the tensor with ones along the superdiagonal, 1 is a vector of
ones, and T is the matrix transpose operator.
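For illustration, the dependency matrices QX and QA of equations 2.14 and 2.15 can be assembled directly from these definitions; the rank values below are placeholders rather than values from the study:

Rp = [8 8 8];      % ranks of the patterns X_p (one entry per pattern; illustrative)
Sp = [1 1 1];      % ranks of the weights A_p (always 1 in this study)

QXblocks = arrayfun(@(r, s) kron(eye(r), ones(1, s)), Rp, Sp, 'UniformOutput', false);
QAblocks = arrayfun(@(r, s) kron(ones(1, r), eye(s)), Rp, Sp, 'UniformOutput', false);
QX = blkdiag(QXblocks{:});    % block diagonal of I_Rp (x) 1_Sp' terms (equation 2.14)
QA = blkdiag(QAblocks{:});    % block diagonal of 1_Rp' (x) I_Sp terms (equation 2.15)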
In this decomposition, due to properties of the Kronecker product, each component (column) of $U_p^{(n)}$ was replicated Sp times in W(n) for n ≤ L, and each component of $V_p^{(n)}$ was replicated Rp times in W(n) for n > L. Such behavior is related to the rank-overlap problem (the decomposition creates multiple identical components), which often exists in real-world signals such as chemical data (flow injection analysis (FIA) data; Bro, 1998; Bro et al., 2009) or spectral tensors of EEG signals (Phan et al., 2013). However, in our case, this does not lead to the creation of multiple identical tensorfaces X p because each X p is the result of combining all factor matrices $V_p^{(n)}$.
The structured CPD in lemma 1 is a particular case of parallel factor anal-
ysis (CANDECOMP/PARAFAC; Carroll & Chang, 1970; Harshman, 1970)
with linearly dependent loadings (PARALIND; Bro et al., 2009) in which
the dependency matrices (Bro et al., 2009) are fixed and given in lemma 1.
Discussions on the uniqueness of the CPD with linearly dependent load-
ings can be found in Guo, Miron, Brie, and Stegeman (2012) and Stegeman
and Lam (2012).
2.3.3 Algorithm. We use an alternating least squares (ALS) algorithm to
learn the approximate factorization of Y into Ap and X p. The ALS algo-
rithm is applied to the structured CPD in lemma 1 in order to iteratively
update ˜U(n) and ˜V(n):
$$\tilde{U}^{(n)} \leftarrow G_n Q_L^T \left( Q_L \Gamma_n Q_L^T \right)^{-1}, \quad (n = 1, 2, \ldots, S), \tag{2.16}$$
$$\tilde{V}^{(n)} \leftarrow G_n Q_M^T \left( Q_M \Gamma_n Q_M^T \right)^{-1}, \quad (n = S+1, \ldots, N), \tag{2.17}$$
where
$$G_n = Y_{(n)} \left( W^{(N)} \odot \cdots \odot W^{(n+1)} \odot W^{(n-1)} \odot \cdots \odot W^{(1)} \right), \tag{2.18}$$
$$\Gamma_n = \left( W^{(1)T} W^{(1)} \right) \circledast \cdots \circledast \left( W^{(n-1)T} W^{(n-1)} \right) \circledast \left( W^{(n+1)T} W^{(n+1)} \right) \circledast \cdots \circledast \left( W^{(N)T} W^{(N)} \right), \tag{2.19}$$
and ⊙ and ⊛ denote the Khatri-Rao product (see equation 2.3) and Hadamard product (see equation 2.5), respectively.
Updating $\tilde{U}^{(n)}$ and $\tilde{V}^{(n)}$ in turn allows updates of Ap and X p through equations 2.8 and 2.9. ALS acts to iteratively adjust the factors Ap and X p in equation 2.7 so as to minimize the Frobenius error between the original data tensor Y and the reconstructed data tensor $\hat{\mathcal{Y}}$, $\mathrm{Error} = \| \mathcal{Y} - \hat{\mathcal{Y}} \|_F$. $\hat{\mathcal{Y}}$ is calculated from the estimated Ap and X p during each iteration. The ALS algorithm updates each parameter sequentially, in contrast to error minimization using a gradient descent algorithm, which updates all parameters simultaneously. The error minimization loop is begun by initializing Ap and X p using the singular value decomposition (SVD) of Y. SVD is performed on a matrix in which each column is formed by vectorizing a face image (creating a vector with 200 × 200 × 3 pixels), with the number of columns equal to the number of images (128 images). The left SVD vector is saved to a tensor with image dimensions, then approximated by a low-rank tensor using CANDECOMP/PARAFAC, and finally assigned as the initialization of X p. Ap is initialized using the right SVD vector.
Although we did not impose nonnegativity constraints, they could be
included using the iterative algorithm below (Cichocki, Zdunek, Phan, &
Amari, 2009; Lantéri, Soummer, & Aime, 1999; Lee & Seung, 1999; Lin,
2007):
$$\tilde{U}^{(n)} = \tilde{U}^{(n)} \circledast \left( G_n Q_L^T \right) \oslash \left( \tilde{U}^{(n)} Q_L \Gamma_n Q_L^T \right), \quad (n = 1, 2, \ldots, S),$$
$$\tilde{V}^{(n)} = \tilde{V}^{(n)} \circledast \left( G_n Q_M^T \right) \oslash \left( \tilde{V}^{(n)} Q_M \Gamma_n Q_M^T \right), \quad (n = S+1, \ldots, N), \tag{2.20}$$
where ⊘ denotes (element-wise) Hadamard division.
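One multiplicative update of equation 2.20 for a single factor matrix could be written as below; Gn, QL, and Gamman are assumed to have been computed as in equations 2.18 and 2.19, U is the current factor matrix, and the eps safeguard against division by zero is an added detail not specified in the text:

numer = Gn * QL';                    % G_n * Q_L'
denom = U * (QL * Gamman * QL');     % U * Q_L * Gamma_n * Q_L'
U = U .* numer ./ max(denom, eps);   % element-wise multiply and divide (equation 2.20)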
2.4 Reconstruction Error. We measure the error between original faces
and faces reconstructed from a set of tensorface components. Error is cal-
culated as the Frobenius norm (Euclidean matrix norm) of the pixel-wise
difference between the original face and the reconstructed face, divided by
the Frobenius norm of the original face:
$$\mathrm{Err} = \frac{\sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} \left( a_{ij} - \hat{a}_{ij} \right)^2}}{\sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^2}}. \tag{2.21}$$
Reconstructions and reconstruction errors are meant to illustrate the
amount of information contained in the tensorfaces and associated weights
and are not intended to imply that the brain reconstitutes face pixel maps
somewhere along the visual pathways.
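Equation 2.21 reduces to a ratio of Frobenius norms and could be computed in one line of Matlab; face and recon are illustrative names for the original and reconstructed images:

Err = norm(face(:) - recon(:)) / norm(face(:));   % normalized Frobenius error (equation 2.21)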
2.5 Displaying Tensorfaces. The pixel values of the tensorfaces pro-
duced by the tensor decomposition algorithm generally extend beyond the
range of values allowed by the L ∗ a ∗ b color space, as the decomposition
was not constrained to fit requirements of the color space. The L (luminance)
channel allows values on the range 0 to 100, while the a (red-green oppo-
nent) and b (blue-yellow opponent) channels both allow values on the range
−100 to 100. For display purposes, each tensorface was individually nor-
malized to fill out the allowable values of the color space. The L channel
was separately normalized, while the a and b channels were jointly normal-
ized so as not to affect the color balance between the two. After the L ∗ a ∗ b
color space normalization, the tensorface was converted to RGB color space
for display.
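A sketch of this display normalization for one tensorface T stored as a 200 × 200 × 3 L*a*b* array; the exact rescaling choices shown are one plausible reading of the procedure, not the authors' code:

L = T(:,:,1);  a = T(:,:,2);  b = T(:,:,3);

% Rescale the luminance channel to its allowed range, 0 to 100.
L = 100 * (L - min(L(:))) / (max(L(:)) - min(L(:)));

% Jointly rescale a and b to [-100, 100], preserving their relative balance.
ab = cat(3, a, b);
ab = 100 * ab / max(abs(ab(:)));

rgb = lab2rgb(cat(3, L, ab));   % convert the normalized tensorface to RGB for display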
2.6 Kolmogorov Complexity (Algorithmic Information). The Kol-
mogorov complexity of a pattern or, equivalently, the algorithmic informa-
tion it contains is the length of the shortest algorithm required to reproduce
it (Grünwald & Vitányi, 2008a, 2008b; Li & Vitányi, 2008). In other words,
the complexity of a pattern is the size of the most compressed descrip-
tion of the pattern. The concept of Kolmogorov complexity was indepen-
dently introduced by Solomonoff (1964), Kolmogorov (1965), and Chaitin
(1969), and is sometimes known as Kolmogorov-Chaitin-Solomonoff (KCS)
complexity.
To illustrate the difference between algorithmic information and Shan-
non information, consider a communications channel in which only two
messages are possible: face A or face B. Whenever one of those faces is trans-
mitted, the Shannon information is one bit because there are only two possi-
bilities. However, the algorithmic information transmitted is vastly higher
because it requires many bits to form a complete description of the face.
While the definition of Kolmogorov complexity is straightforward, de-
termining its value is problematic as there is no systematic way to determine
the most compact description of a pattern. In other words, Kolmogorov
complexity is uncomputable (no algorithm exists). In practice, therefore, we
use lossless compression algorithms to approximate an upper bound to the
complexity of the tensorfaces (Ruffini, 2017).
Here, we base our estimate of the Kolmogorov complexity of tensorfaces
on the file size of the tensorface images after they underwent a lossless com-
pression. That is done by saving a tensorface image in PNG image format
and noting the number of bits in the saved file. The PNG image format
uses an efficient, lossless compression algorithm called DEFLATE, based
on the Lempel-Ziv algorithm (Lempel & Ziv, 1976; Ruffini, 2017) together
with Huffman coding. To further compress the tensorface files beyond the
standard PNG format, we use the program ImageOptim (imageoptim.com),
which runs an additional set of compression algorithms, also based on
DEFLATE, that are more efficient but too time-consuming for ordinary use,
and combines their results. The algorithms
included Zopfli, PNGOUT, OptiPNG, AdvPNG, and PNGCrush. Using Im-
ageOptim reduces tensorface file sizes beyond the standard PNG compres-
sion by an amount depending on tensorface rank, ranging on average from
19% for rank = 2 tensorfaces to 9% for rank = 32 tensorfaces.
The number of bits in the compressed image was then normalized by
the number of pixels in the image, giving an estimate of Kolmogorov com-
plexity as bits per pixel for the compressed image. While all tensorface im-
ages had identical file sizes when uncompressed and initially had 24 bits
per pixel, some tensorfaces were more compressible than others, reflecting
image complexity.
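The PNG-based estimate can be sketched as follows; the further ImageOptim compression runs outside Matlab, so only the standard PNG step appears here, and the file and variable names are illustrative:

imwrite(img, 'tensorface.png');        % lossless PNG (DEFLATE) compression
info = dir('tensorface.png');
bits = 8 * info.bytes;                 % compressed file size in bits
bitsPerPixel = bits / (200 * 200);     % Kolmogorov complexity estimate (uncompressed: 24 bpp)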
After setting the desired rank of the tensor decomposition, the face sam-
ple set was decomposed into 100 components, and then the decomposition
was replicated 10 times to produce 1000 tensorfaces. Kolmogorov complex-
ity was averaged over those 1000 tensorfaces. The same set of tensorfaces
was used in calculations of logical depth, power spectra, and globality de-
scribed below.
In this analysis, the tensor decomposition algorithm provides us with
model face receptive fields (tensorfaces). Such fields are presented as im-
ages of receptive fields, analogous to the way that V1 Gabor receptive fields
are presented as images of the receptive fields. Having access to such re-
ceptive field images makes it feasible to employ mathematical methods for
evaluating algorithmic complexity. On the other hand, in an experimental
neurophysiological situation, producing images of face cell receptive fields
is problematic because of the intractability with finding optimal face stim-
uli given undefined spatial nonlinearities in the receptive fields. We discuss
nonlinearities in face cells further below.
2.7 Logical Depth. Logical depth is another way to measure the com-
plexity of tensorfaces. In the present context, logical depth is the duration
of computational time required to restore an image back to its original state
after it has been maximally compressed in a lossless manner. The concept
of logical depth was originated by Bennett (1988, 1994) and has previously
been applied to the characterization of images by Zenil et al. (2012). The ba-
sic idea is that objects that “contain internal evidence of a nontrivial causal
history” (Bennett, 1988) have a complex structure that requires more com-
putational time to reconstitute from their shortest descriptions (maximally
compressed states) than objects without complex structure.
While Kolmogorov complexity can be thought of as measuring complex-
ity in terms of space (the length of the shortest description of an object),
logical depth measures complexity in terms of time (the number of com-
putational steps required to reconstruct the object from that shortest de-
scription). An important difference between the two is that Kolmogorov
complexity considers both structured states and random states to be com-
plex, but logical depth considers only structured states as complex, while
treating both trivial and random states as noncomplex. Thus, as Zenil et al.
(2012) pointed out, logical depth may lie closer to our intuitive concept of
complexity than Kolmogorov complexity.
To measure logical depth, we first compress the tensorface images by
running the program dzip within Matlab (Mathworks, Natick, MA). Then
the image is uncompressed using dunzip, and the elapsed time to perform
the uncompression was measured using the Matlab tic-toc timer function.
The uncompression time is measured 1000 times for each tensorface and
then averaged. Timing is measured with no user applications running aside
from Matlab, with WiFi and Bluetooth turned off, and nothing attached to
any of the computer ports.
Dzip implements the DEFLATE lossless compression algorithm.
(Dzip and dunzip are available for download from the Matlab File
Exchange: www.mathworks.com/matlabcentral/fileexchange/8899-rapid
-lossless-data-compression-of-numerical-or-string-variables.)
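A sketch of the timing loop, using the dzip and dunzip functions named above (variable names are illustrative):

compressed = dzip(tensorface);        % losslessly compress the tensorface image

nReps = 1000;
t = zeros(nReps, 1);
for k = 1:nReps
    tic;
    restored = dunzip(compressed);    % reconstitute the image from its compressed form
    t(k) = toc;
end
logicalDepth = mean(t);               % mean uncompression time, in seconds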
2.8 Power Spectra. We calculated the 2D spatial frequency power spec-
tra of tensorfaces having different levels of complexity. The tensorfaces were
first converted from color to grayscale images. The 2D spectra were then
transformed to 1D by performing rotational averaging (averaging spectral
power over all orientations in the images).
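A minimal sketch of this computation for one grayscale tensorface g (200 × 200 pixels), with the rotational average implemented as a simple radial binning:

F  = fftshift(fft2(g));                        % centered 2D Fourier transform
P2 = abs(F).^2;                                % 2D power spectrum

[x, y] = meshgrid(-100:99, -100:99);           % frequency coordinates (cycles/image)
r = round(sqrt(x.^2 + y.^2));                  % radial frequency of each coefficient

P1 = accumarray(r(:) + 1, P2(:), [], @mean);   % rotational average: 1D power spectrum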
2.9 Globality Index. We define the globality of a tensorface component
as the fraction of the face it covers. This is the number of pixels in a ten-
sorface divided by the average number of pixels in a face (averaged over
all faces in the sample). The number of pixels in the faces is simple to de-
termine, as the faces are on a black background and easy to segment. The
number of pixels in a tensorface is more difficult, as the tensorfaces had a
continuum of values that could blend in with the background. Including
all tensorface pixels that differed just a tiny bit from the background would
greatly inflate the size of the tensorfaces and therefore their globality.
We therefore use the following procedure to exclude small pixel val-
ues from the globality calculations and isolate the high-activity regions of
the tensorfaces. First, we convert the tensorfaces to grayscale and subtract
the background, leaving the tensorfaces on a black background. Then we
set a gray threshold level and exclude pixels below that level. The thresh-
old is set using Otsu’s method (Otsu, 1979), which minimizes the intra-
class variance of the pixels above and below threshold (Matlab command
graythresh in the Image Processing Toolbox). The grayscale tensorface is
then binarized based on that threshold level, with pixels above threshold set
to white and those below set to black. This thresholding typically leaves the
high-activity tensorface regions as a set of disjoint white patches. To create
a unitary tensorface region for purposes of globality calculations, all the in-
dividual white patches are enclosed by their convex hull (Matlab command
convhull). The interior of this convex hull constitutes the high-activity re-
gion of the tensorface. Finally, the area of a tensorface enclosed by the
convex hull, measured in pixels, is divided by the area of the face. The re-
sulting fraction is the globality index of the tensorface.
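A sketch of the globality calculation for one tensorface, where gray is the background-subtracted grayscale tensorface and facePixels is the average face area in pixels (both illustrative names):

level = graythresh(gray);              % Otsu threshold
bw    = imbinarize(gray, level);       % binarize: high-activity pixels become white

[row, col] = find(bw);                 % coordinates of the above-threshold pixels
k = convhull(col, row);                % convex hull enclosing the white patches
hullArea = polyarea(col(k), row(k));   % hull area in pixels

globality = hullArea / facePixels;     % fraction of the face covered by the tensorface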
2.10 Selectivity and Sparseness. We use kurtosis as a measure of both
the selectivity of single tensorfaces and the sparseness of populations of ten-
sorfaces. Kurtosis is a measure of the shape of a probability distribution—in
this case, the distribution of tensorface responses to stimuli. A high kurtosis
distribution, corresponding to high selectivity or high sparseness, empha-
sizes the peak and tails of the distribution with less probability in between.
A low kurtosis distribution, corresponding to low selectivity or low sparse-
ness, has a flatter distribution. By tensorface “response” to a stimulus, we
mean the weight associated with that tensorface when reconstructing the
stimulus image.
Cell selectivity is based on the probability distribution of the responses
of a single cell (single tensorface) when presented with a set of stimuli over
time. Selectivity has also been called the “lifetime sparseness” of single neu-
rons (Willmore & Tolhurst, 2001). Population sparseness is based on the
probability distribution of the simultaneous responses of a population of
cells to a single stimulus (using the terminology of Lehky, Kiani, Esteky, &
Tanaka, 2011; Lehky, Sejnowski, & Desimone, 2005; Lehky & Sereno, 2007).
In the work presented here, responses could take both positive and nega-
tive values, which we interpret as deviations from a spontaneous level of
activity, and probability distributions were roughly symmetrical.
The equation for kurtosis is
$$\mathrm{kurtosis} = \frac{\sum_{i=1}^{n} (r_i - \bar{r})^4}{(n - 1)\, s^4} - 3. \tag{2.22}$$
For single-cell responses, ri refers to the response of the neuron to the ith
stimulus, and n refers to the number of stimuli. For population responses,
ri refers to the response of the ith cell in the population to a single partic-
ular stimulus, and n refers to the number of cells in the population. Mean
response is indicated by ¯r, and the standard deviation of the responses is
given by s. Subtracting three shifts the values so that a normal distribution has
a reduced kurtosis of zero.
Kurtosis has previously been used as a measure of selectivity and sparse-
ness in the theoretical literature (Bell & Sejnowski, 1997; Olshausen & Field,
1996; Simoncelli & Olshausen, 2001). Kurtosis has also been used in the ex-
perimental literature for extrastriate cortex (Lehky et al., 2011, 2005; Lehky
& Sereno, 2007; Tolhurst, Smyth, & Thompson, 2009).
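Equation 2.22 can be computed directly as below, where r is a vector of responses (either one tensorface's responses across stimuli, or a population's responses to one stimulus):

n = numel(r);
k = sum((r - mean(r)).^4) / ((n - 1) * std(r)^4) - 3;   % reduced kurtosis (equation 2.22)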
2.11 Multidimensional Scaling. Multidimensional scaling (MDS;
Hout, Papesh, & Goldinger, 2013) is used to visualize the face space
produced by tensor decomposition. The MDS analysis is based on the
tensorface weights that allow reconstruction of the faces in the sample set,
after the faces have been decomposed into a set of 100 tensorfaces. We
examined responses (weights) of a population of 100 tensorfaces to each
of the 128 faces in the sample face set, as well as the responses of those
tensorfaces to the average face calculated from the 128 faces. Thus, in total,
we have population responses for 129 faces. These faces form 129 points
in a 100-dimensional face space defined by the tensorface population.
Because the relative positions of faces in the high-dimensional face space
cannot be visualized, we use MDS to reduce the dimensionality of the face
space down to two dimensions while maintaining approximate relative
positions. While MDS is useful for low-dimensional visualization, the MDS
algorithm has nonlinearities within it and should not be relied on to produce
a quantitatively accurate depiction of biological face space.
The responses of the tensorface population to a single face form a re-
sponse vector with a length of 100 elements, defining the position of that
face in face space. The first step in performing MDS is to calculate the dis-
tances between response vectors for all 129 faces, forming a 129 × 129 dis-
tance matrix. A Euclidean distance metric is used. The distance matrix is
then fed into the cmdscale command in the Matlab Statistics and Machine
Learning Toolbox, which performs the MDS.
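A sketch of these steps, assuming W is the 129 × 100 matrix of tensorface weights, one row per face and one column per tensorface:

D  = squareform(pdist(W, 'euclidean'));   % 129 x 129 Euclidean distance matrix
Y  = cmdscale(D);                         % classical multidimensional scaling
Y2 = Y(:, 1:2);                           % keep the first two dimensions
scatter(Y2(:,1), Y2(:,2));                % 2D visualization of the face space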
3 Results
3.1 Appearance of Tensorfaces. The tensor decomposition algorithm
was applied to a set of 128 sample faces (the examples are shown in
Figure 2a), producing tensorface components. Shown are the resulting
low-complexity tensorfaces (rank = 2, Figure 5), medium-complexity ten-
sorfaces (rank = 8, Figure 6), and high-complexity tensorfaces (rank = 32,
Figure 7). These are all shown as 200 × 200 pixel images. The number of ten-
sorface components created by the algorithm was set by a parameter, and
here we show examples of a decomposition of the face set into 40 compo-
nents. The qualitative appearance of the components did not change as we
varied the number of components over the range 5 to 100.
An expanded view of example tensorfaces at the three complexity levels
is shown in Figure 8. As the complexity increases, the face representation
progresses from crude blobs to a clear face-like appearance.
For comparison, eigenfaces resulting from a PCA decomposition of the
sample face set are shown in Figure 9. They most closely resemble the high-
complexity tensorfaces. The eigenfaces have rank = 142, so they are more
complex than any of the tensorfaces we created. Applying ICA to the sample
faces produced components that qualitatively resembled the eigenfaces and
were also highly complex, with the same rank = 142.
3.2 Reconstructing Faces Using Tensorfaces. The tensorfaces were
used to reconstruct a set of test faces (see Figure 2b), which were different
Figure 5: Tensorfaces with low complexity (rank = 2).
from the sample faces used to create the tensorfaces. Although the tensor
decomposition algorithm used to create the tensorfaces is nonlinear, recon-
structing faces from a population of tensorfaces is a linear process. These
face reconstructions are used to graphically illustrate how much informa-
tion is available in the tensorfaces for representing faces and do not im-
ply that the brain reconstitutes face bitmaps somewhere along the visual
pathways.
Face reconstructions are shown in Figures 10a (reconstructed using
low-complexity tensorfaces), 10b (reconstructed using medium-complexity
faces), and 10c (reconstructed using high-complexity tensorfaces). In all
three cases the reconstructions are subjectively comparable, showing that
even the blob-like, low-complexity tensorfaces are capable of providing a
good face representation.
Reconstruction errors are plotted as a function of the number of compo-
nents in Figure 11a. Reconstruction error is the normalized Euclidean pixel-
wise distance between original and reconstructed images. Not surprisingly,
performance improved as tensorface population size increased. There was
a trade-off between tensorface complexity and the population size required
Figure 6: Tensorfaces with medium complexity (rank = 8).
to reach a criterion error level. A large population of low-complexity tensor-
faces can match the performance of a smaller population of high-complexity
tensorfaces.
Reconstruction error is plotted as a function of complexity in Figure
11b (holding the tensorface population size constant at 100). Error is large
for low-complexity tensorfaces, with the error dropping greatly going to
medium complexity but then staying approximately constant with further
increases in complexity. There is, in fact, a slight rise in reconstruction er-
ror at high complexities. That is because error is being measured here on a
test set of faces different from the sample set used to create the tensorfaces,
and high-complexity tensorfaces have a poorer ability to generalize to new
stimuli. (Generalization is further discussed below.)
3.3 Computational Complexity of Tensorfaces. We have been measur-
ing complexity in terms of the rank of the matrix of pixel values represent-
ing a tensorface image. The algorithm allows specification of the desired
tensorface rank resulting from the decomposition process. Matrix rank is
the minimum number of column vectors that can be used to generate all
Figure 7: Tensorfaces with high complexity (rank = 32).
Figure 8: Example tensorfaces with different levels of complexity.
the columns in the matrix (equivalently, it can be done in terms of rows
rather than columns). For example, a tensorface with rank = 8 means that
all 200 columns in the tensorface image can be generated by different linear
Figure 9: Eigenfaces resulting from PCA decomposition of the sample face set.
This shows the first 64 eigenfaces out of 128. PCA calculated after converting
from RGB to L*A*B color space. When vectorizing the faces, the three color chan-
nels were concatenated to form one long 1D vector for each face. The average
face was not subtracted prior to performing PCA, so the first eigenface here is
the average face.
combinations of just eight column vectors. A matrix with a larger rank re-
quires a larger basis set of vectors to define it and is therefore more complex.
A standard way to measure complexity within computational theory
is Kolmogorov complexity, also known as algorithmic information (Grün-
wald & Vitányi, 2008b; Li & Vitányi, 2008). As described in section 2, we op-
erationally define Kolmogorov complexity as the number of bits per pixel
required to store the tensorface image after undergoing maximal lossless
compression. A more complex image requires a larger file size. The rela-
tionship between complexity measured as matrix rank and Kolmogorov
Figure 10: Reconstructions of the face test set (see Figure 2b) using tensorfaces.
(a) Reconstruction using tensorfaces with low complexity (rank = 2, Figure 5).
(b) Reconstruction using tensorfaces with medium complexity (rank = 8, Fig-
ure 6). (c) Reconstruction using tensorfaces with high complexity (rank = 32,
Figure 7).
Figure 11: Plots of reconstruction errors. (a) Mean reconstruction error as a
function of the number of tensorface components, holding tensorface complex-
ity (rank) constant. Mean was calculated over 128 faces in sample set. (b) Mean
reconstruction error as a function of tensorface complexity (rank), holding the
number of tensorface components constant. Mean and standard error calculated
over 128 faces in the sample set.
complexity is plotted in Figure 12a. We see that Kolmogorov complexity
correlates with complexity measured by rank. A second measure of com-
putational complexity is logical depth, the time duration of computations
required to uncompress a compressed tensorface (Bennett, 1988, 1994; Zenil
et al., 2012). The logical depth of tensorfaces as a function of tensorface rank
is plotted in Figure 12b.
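The operational definitions of Kolmogorov complexity and logical depth given above
can be approximated for any image array with an off-the-shelf lossless compressor.
The Python sketch below uses zlib as a stand-in compressor, so the absolute values it
returns are not comparable to those in Figure 12.

import time
import zlib
import numpy as np

def complexity_measures(image_uint8):
    # Approximate Kolmogorov complexity: bits per pixel of the losslessly
    # compressed image. Approximate logical depth: time taken to uncompress it.
    compressed = zlib.compress(image_uint8.tobytes(), 9)   # stand-in for maximal lossless compression
    kolmogorov = 8.0 * len(compressed) / image_uint8.size
    t0 = time.perf_counter()
    zlib.decompress(compressed)
    logical_depth = time.perf_counter() - t0               # seconds
    return kolmogorov, logical_depth

rng = np.random.default_rng(2)
smooth = np.tile(np.arange(200, dtype=np.uint8), (200, 1))       # highly regular pattern
noisy = rng.integers(0, 256, size=(200, 200), dtype=np.uint8)    # irregular pattern

for name, img in [("smooth", smooth), ("noisy", noisy)]:
    k, d = complexity_measures(img)
    print(name, round(k, 2), "bits/pixel,", round(d * 1e6, 1), "microseconds to uncompress")

On such a proxy, more regular images both compress to smaller files and take less time
to uncompress, which is the qualitative pattern reported in Figure 12.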
Both Kolmogorov complexity and logical depth provide similar esti-
mates of the relative complexity of different tensorfaces. Tensorfaces that
compress to a small file size (low Kolmogorov complexity) take less com-
putational time to uncompress (small logical depth). Tensorfaces that pro-
duce large compressed file sizes (high Kolmogorov complexity) take more
computational time to uncompress (large logical depth).
Figure 12: Complexity measurements of tensorface images. (a) Relation be-
tween tensorface rank and mean Kolmogorov complexity. (b) Relation between
tensorface rank and mean logical depth. Means were calculated from 100 ten-
sorfaces. Shaded area shows standard deviation.
Note the large standard deviations in Figure 12. For each plotted point,
although all tensorfaces had identical matrix ranks, the resulting values for
Kolmogorov complexity and logical depth were spread out over a broad
range. The relation between tensorface rank and the two complexity mea-
surements is therefore statistical rather than deterministic.
Figure 13: Spatial frequency power spectra of tensorfaces. (a) Power spectra as
a function of spatial frequency for tensorfaces for different rank values. Geo-
metrical means of power spectra for 100 tensorfaces shown. (b) Power at a high
spatial frequency (100 cycles/image) as a function of tensorface rank. Geomet-
rical means and geometrical standard deviations plotted.
In addition to characterizing tensorfaces by their complexity as defined
by computational theory, we can also characterize them using concepts
from signal processing theory. The average spatial frequency power spectra
of tensorfaces at rank = 2, 8, and 32 are plotted in Figure 13a. We see that as
tensorface complexity increases, the spectral power at high spatial frequen-
cies also increases (see Figure 13b). Computational complexity of tensorface
receptive fields (Figures 12a and 12b) correlates strongly with their Fourier
power at high spatial frequencies.
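An orientation-averaged power spectrum of the kind plotted in Figure 13a can be
computed along the following lines; this is a generic Python sketch rather than the
exact analysis pipeline used in the study.

import numpy as np

def radial_power_spectrum(image):
    # 2D FFT power, averaged over orientation to give power versus spatial frequency.
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    ny, nx = image.shape
    y, x = np.indices((ny, nx))
    radius = np.hypot(y - ny // 2, x - nx // 2).astype(int)   # cycles/image
    n_bins = radius.max() + 1
    spectrum = np.bincount(radius.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(radius.ravel(), minlength=n_bins)
    return spectrum / np.maximum(counts, 1)

rng = np.random.default_rng(3)
image = rng.normal(size=(200, 200))          # stand-in for a tensorface luminance plane
spectrum = radial_power_spectrum(image)
print("power at 100 cycles/image:", spectrum[100])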
3.4 Selectivity and Sparseness of Tensorfaces. Selectivity and sparse-
ness of neural responses to stimuli are major concerns in neural coding
theory. Under our terminology, selectivity is a function of the statistical dis-
tribution of responses of a single neuron to a large set of stimuli presented
sequentially (Lehky et al., 2005). Sparseness is a function of the statistical
distribution of responses over a population of neurons when simultane-
ously presented with a single stimulus. Here, we quantify both selectiv-
ity and sparseness by calculating kurtosis of the appropriate probability
distribution.
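In code, the two measures differ only in the axis over which the kurtosis of the response
distribution is taken, as in this sketch with simulated responses standing in for the
tensorface weights.

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)
# responses[i, j]: response of model cell i to face stimulus j
responses = rng.normal(size=(100, 128))       # stand-in for 100 tensorfaces x 128 faces

# Selectivity (lifetime sparseness): kurtosis of each cell's responses over stimuli.
selectivity = kurtosis(responses, axis=1, fisher=True)   # fisher=True gives 0 for a gaussian

# Population sparseness: kurtosis over the population for each single stimulus.
sparseness = kurtosis(responses, axis=0, fisher=True)

print("median selectivity:", np.median(selectivity))
print("median population sparseness:", np.median(sparseness))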
Tensorface selectivity is plotted as a function of rank in Figure 14a. Al-
though there is a lot of variability for different tensorfaces, the median value
of selectivity (kurtosis) is close to zero, independent of tensorface complex-
ity (rank). That means that as one presents many faces to a particular ten-
sorface, responses tend to be gaussian distributed, as gaussians have zero
reduced kurtosis. Population sparseness of tensorfaces is plotted as a func-
tion of rank in Figure 14b. Although there is higher population sparseness
with very low complexities, for medium and high complexities, the sparse-
ness settles down to values of around 1.0.
Sparseness and selectivity of tensorfaces are lower than reported in mon-
key inferotemporal cortex (Lehky et al., 2011), with single-unit selectivity
= 1.88 and population sparseness = 9.61 (as measured by kurtosis). One
reason tensorfaces have lower sparseness values and lower selectivity val-
ues is that responses of tensorfaces do not include a threshold nonlinearity
for response rates. In real neurons, response rates cannot have negative val-
ues. That causes the probability distribution of response rates to be skewed
to the right, leading to higher sparseness and selectivity values. Without
that threshold nonlinearity, the response probability distributions of ten-
sorfaces are closer to gaussian and the sparseness and selectivity values are
therefore smaller.
Another factor reducing model values of sparseness and selectivity is
that tensorfaces are linear filters, as are all the face decompositions men-
tioned earlier (PCA, ICA, NMF, AAF), whereas biological face cells are non-
linear spatial filters, as is the case for inferotemporal object representations
in general (Tanaka, 1996). By being nonlinear spatial filters, we mean that
different portions of the receptive field sum nonlinearly to produce the total
response of a neuron to an object stimulus. As the nature of the spatial non-
linearities within face cells and object cells is unknown, we cannot quantify
their contributions to sparseness and selectivity. Spatial nonlinearities are
discussed further below.
3.5 Tensorfaces: Local or Global Representations. We define the glob-
ality of a tensorface as the fraction of the face covered by that tensorface.
Figure 14: Median cell selectivity and population sparseness of tensorfaces as
a function of tensorface rank. Responses are from 100 tensorfaces stimulated by
the 128 faces in the sample set. Cell selectivity and population sparseness are
both calculated as kurtosis of the probability distribution of responses. Cell se-
lectivity refers to sparseness of single neurons calculated to a set of stimuli pre-
sented over time (also called lifetime sparseness). Population sparseness refers
to population response to a single stimulus. The shaded area shows interquar-
tile range. (a) Cell selectivity. (b) Population sparseness.
Thus, local and global representations form end points on a continuum
rather than a dichotomy. Within that continuum, we observe tensorfaces
that can be strongly local or strongly global (Figure 15a), and also every-
thing in between.
To measure globality, we threshold the tensorfaces to include only the
envelope (convex hull) of the areas that gave a “strong” activation, as
described in section 2. Examples of such thresholded tensorfaces are shown
in Figure 15b, with the high-activation region outlined in a black line. Only
the high-activation region was used in calculations of globality.
Figure 15: Globality of tensorface representations. (a) Examples of local and
global tensorfaces. (b) Examples showing tensorface “high-activation” regions
(enclosed by black lines) used to define the area covered by a tensorface. The
globality of a tensorface is defined as the area of the tensorface divided by the
average area of a face. (c) Plot of mean globality as a function of tensorface rank.
The mean is calculated over 100 tensorfaces. The shaded area shows standard
deviation.
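The globality calculation just described can be sketched as thresholding a tensorface,
taking the convex hull of the high-activation pixels, and dividing the hull area by the
average face area. The Python sketch below uses a simple percentile threshold and a
hypothetical face_area constant; the actual thresholding rule is the one given in
section 2, which is not reproduced here.

import numpy as np
from scipy.spatial import ConvexHull

def globality(tensorface_plane, face_area, percentile=90):
    # Fraction of the average face area covered by a tensorface's high-activation
    # region, defined as the convex hull of strongly activated pixels.
    magnitude = np.abs(tensorface_plane)
    threshold = np.percentile(magnitude, percentile)   # placeholder thresholding rule
    ys, xs = np.nonzero(magnitude >= threshold)
    hull = ConvexHull(np.column_stack([xs, ys]))
    return hull.volume / face_area                     # ConvexHull.volume is the enclosed area in 2D

rng = np.random.default_rng(5)
plane = rng.normal(size=(200, 200))    # stand-in for one tensorface weight plane
face_area = 0.75 * 200 * 200           # hypothetical average face area in pixels
print("globality index:", globality(plane, face_area))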
Globality as a function of tensorface complexity is plotted in Figure 15c.
More complex tensorfaces have greater globality on average, although there
is large variability in globality across a tensorface population. This rela-
tionship breaks down at the lowest values of rank. The discrepancy at low
ranks appears to be an artifact of the methodology we are using to calculate
globality. Some low-rank tensorfaces form bilaterally symmetric pairs of
blobs at the left and right edges of the face. When those two widely sepa-
rated blobs are enclosed in a single envelope to define the high-activation
region of the tensorface, the envelope inflates the area covered by the ten-
sorface and thereby increases the globality measure as we calculate it.
3.6 Generalization to Statistically Novel Categories of Faces. When
previously examining face reconstructions based on tensorfaces (see Fig-
ures 10 and 11), we used a set of test faces (see Figure 3b) that closely
resembled the original sample set (see Figure 3a). Although the two sets
contained different individuals, both are drawn from the same statistical
distribution of face parameters in the face-generating software and thus are
statistically nonnovel. As used here, the statistically nonnovel/statistically
novel distinction is based purely on the physical characteristics of faces and
not on cognitive or semantic factors.
In Figure 16, we examine what happens when we reconstruct statisti-
cally novel faces that are radically different from those used to create the
tensorfaces. Yoda (see Figure 16ai) is a face we can instantly perceive with-
out a period of training, yet it is unlikely from an evolutionary perspective
that we would have developed face cells specifically tuned to handle that
stimulus. Presented here are reconstructions of Yoda using tensorface pop-
ulations of different complexities that were created using human faces. The
quality of all the tensorface reconstructions is poor, as nothing resembling
this stimulus was part of the face sample set used to create the tensorfaces.
However, subjectively, it looks as if the low-complexity reconstruction is
better than the high-complexity one. The high-complexity reconstruction
appears to be overconstrained to resemble the faces in the training set.
Measuring reconstruction error, we can see the reconstruction error of
Yoda does indeed get worse as tensorface complexity increases (see Fig-
ure 16aii, dashed line). That trend is the opposite of what we saw for the
reconstruction of the test face set, where reconstruction error decreased
with greater tensorface complexity (see Figure 16aii, solid line, repeated
from Figure 11b). Figure 16bi shows the reconstruction of another statisti-
cally novel face far beyond the bounds of what was included in the sample
face set, the face of a chimpanzee. As with Yoda, we see in Figure 16bii that
reconstruction error increases with tensorface complexity, the opposite of
what occurs when reconstructing the test face set.
Although high-complexity tensorfaces produce the best reconstructions
of faces that are statistically nonnovel, they have a reduced ability to gen-
eralize to statistically novel faces. For statistically novel faces, the
low-complexity tensorfaces produce the best reconstructions. The lower
ability to generalize as tensorface complexity increases cannot be explained
by changes in the selectivity of tensorfaces. We see in Figure 14a that
tensorface selectivity remains constant (and low) regardless of tensorface
complexity.
Figure 16: Reconstruction of statistically novel faces. These faces are radically
different from the sample set (Figure 3a) used to create the tensorfaces. (ai) Re-
constructing Yoda using tensorfaces with different complexities. (aii) Relative
reconstruction error for Yoda as a function of tensorface complexity (rank)
(dashed line), as well as relative reconstruction error for the test face set (solid
line). The solid line is duplicated from Figure 11b. Reconstruction error de-
creases as a function of tensorface complexity for familiar faces (solid) but
increases for the statistically novel face (dashed). Each line independently nor-
malized so that the maximum equals one. (bi) Reconstructing chimp using
tensorfaces with different complexities. (bii) Relative reconstruction error for
chimp as a function of tensorface complexity (rank) (dashed line), as well as rel-
ative reconstruction error for the test face set (solid line). The solid line is again
duplicated from Figure 11b. As with Yoda, reconstruction error decreases as a
function of tensorface complexity for familiar faces (solid) but increases for the
statistically novel face (dashed).
Figure 17: Average face is at the origin of the face space. (a) Average face, based
on 128 faces in sample set. (b) Face space as derived by multidimensional scal-
ing (MDS). Based on responses of a population of 100 tensorface cells (rank = 8)
to 128 face stimuli, as well as responses of those tensorfaces to the average face.
MDS reduced the original 100-dimensional face space to a two-dimensional ap-
proximation to allow visualization. Plot symbols show positions of individual
faces in the face space, classified by race and gender. Black star shows the aver-
age face located at the origin of the face space.
3.7 Representation of the Average Face. There is evidence indicat-
ing that the representation of the average face forms the origin of a high-
dimensional face space (Leopold et al., 2001; Rhodes & Jeffery, 2006; Tsao &
Freiwald, 2006; Wilson et al., 2002). Using multidimensional scaling (MDS)
based on a Euclidean distance metric, we examined the location of the
average face in a face space formed by 100 tensorfaces. This set of faces
thus formed 129 points (128 sample faces plus the average face) in a 100-
dimensional face space. The MDS analysis is based on tensorfaces with rank
= 8, with other rank values performing similarly.
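A face space of the kind shown in Figure 17 can be approximated with standard MDS
applied to Euclidean distances between population response vectors. The sketch below
uses scikit-learn's MDS on simulated responses and is not the analysis code used for
the figure.

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(6)
# Rows: 128 sample faces; columns: responses of 100 model face cells.
responses = rng.normal(size=(128, 100))
average_face_response = responses.mean(axis=0, keepdims=True)
points = np.vstack([responses, average_face_response])    # 129 points in a 100-D face space

embedding = MDS(n_components=2, dissimilarity="euclidean", random_state=0)
coords = embedding.fit_transform(points)                   # 2-D approximation for visualization

print("average face lands at:", coords[-1])                # near the centroid of the cloud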
The result of MDS analysis is given in Figure 17. It shows that the faces
of different racial groups and genders cluster into different regions of face
space. Furthermore, we see that the face space formed by the tensorfaces
places the representation of the average face at the origin of the face space.
Note that the representation of the average face at the origin is due to activ-
ity across a population of tensorfaces. No individual tensorface specifically
represents the average face.
3.8 Cross-Stimulation of Tensorfaces by Nonface Stimuli. A com-
plete and autonomous face processing system should reject nonface inputs.
Therefore, now we look at the reconstruction of a nonface object by ten-
sorfaces. Reconstruction of a melon is shown in Figure 18a. The recon-
struction obviously fails, producing an enormous reconstruction error (see
Figure 18b). Going beyond just the magnitude of the error, the organization
of the errors gives the reconstructed melon the shape of a face, although
with melon texture on it.
Representing faces is essentially the only thing that tensorfaces are ca-
pable of. Any object presented to that face representation system will be
interpreted as a face. In contrast, the ideal representation of a nonface ob-
ject produced by a specialized face representation system should be null,
no response.
The spurious reconstruction of nonface objects as faces by the tensorface
population occurs because response magnitudes of tensorfaces are simi-
lar for face and nonface objects (see Figure 18c). This cross-stimulation of
tensorfaces by face and nonface stimuli appears to be the result of the low
stimulus selectivity of tensorfaces seen in Figure 14a. As will be discussed
further below, a possible solution to this cross-stimulation problem would
be to have a nonlinearity associated with the linear tensorface receptive
fields that would filter out nonface stimuli from being processed (see
Figure 18d).
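The sequential organization in Figure 18d(i) can be pictured as a detection gate placed
in front of the linear identification stage. In the Python sketch below, the detector is
only a placeholder (a similarity threshold against the average face); the article does
not specify the form of the nonlinearity, so this is an illustrative assumption.

import numpy as np

def gated_face_code(stimulus, components, average_face, threshold=0.5):
    # Sequential nonlinearity sketch: a face-detection gate followed by the
    # linear identification stage (projection onto the model face cells).
    similarity = stimulus @ average_face / (np.linalg.norm(stimulus) * np.linalg.norm(average_face))
    if similarity < threshold:
        return np.zeros(components.shape[1])           # nonface input: null response
    weights, *_ = np.linalg.lstsq(components, stimulus, rcond=None)
    return weights                                      # face input: model firing rates

rng = np.random.default_rng(7)
n_pixels, n_cells = 4096, 100
components = rng.normal(size=(n_pixels, n_cells))       # stand-in tensorfaces
average_face = rng.normal(size=n_pixels)                # stand-in average face
melon = rng.normal(size=n_pixels)                       # stand-in nonface stimulus

print("nonface response norm:", np.linalg.norm(gated_face_code(melon, components, average_face)))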
4 Discussion
Based on measures of algorithmic information (Kolmogorov complexity),
we show here that low-complexity and high-complexity face representations have differ-
ent properties and therefore that complexity can be a way of constraining
possible ways that face space is organized. Just as Shannon information has
proven useful for understanding processing in early vision (Barlow, 1961;
Field, 1994), we suggest that Kolmogorov complexity and related measures
such as logical depth may prove useful in providing a framework for study-
ing high-level vision, including face recognition and object recognition in
general.
Cover and Thomas (2006), in their textbook on information theory, state,
“We consider Kolmogorov complexity to be more fundamental than Shan-
non entropy.” Kolmogorov complexity is associated with concepts from
computational theory (Turing machines), while Shannon entropy is a statis-
tical theory not derived from computational theory. Both Shannon entropy
and Kolmogorov complexity can be used as measures of efficient coding,
indicating how compressed a representation can be. In addition to con-
sidering compression from the statistical perspective of Shannon entropy
(Barlow, 1961; Field, 1994; Olshausen & Field, 2004) it can also be consid-
ered from the algorithmic perspective of Kolmogorov complexity (Adri-
aans, 2007; Chater & Vitányi, 2003; Feldman, 2016). A critical difference
between the two types of information is that Shannon entropy is defined
probabilistically in terms of a distribution over an ensemble of symbols,
without any connection to the structure of individual messages, while Kol-
mogorov complexity is a deterministic concept measuring information of a
single entity (message) by itself in isolation (Grünwald & Vitányi, 2008a).
While assigning probabilities to repetitive low-level structures (e.g., V1 Ga-
bor receptive fields) is clearly reasonable within the framework of Shannon
entropy, assigning such probabilities to high-level structures that are es-
sentially unique (e.g., inferotemporal receptive fields) may be problematic
(Chater & Vitányi, 2003). As a nonprobabilistic computation, Kolmogorov
complexity can assign a measure of information content to an individual
high-level structure purely in terms of its internal structure.
Figure 18: Reconstruction of a nonface object (melon) using tensorfaces (rank
= 8). (ai) Original melon image. (ii) Reconstructed melon using tensorfaces de-
rived from human face sample set. (iii) Ideal reconstruction of melon, which
should be null for a specialized face processing system. (b) Reconstruction error
for melon compared to other stimuli. Human faces’ response shows the mean
and standard deviation for 128 faces in the sample set. Other response values are
for a single stimulus image. (c) Average response magnitudes of tensorfaces to
face and nonface stimuli, which are very similar. This similarity leads tensorface
populations to create spurious reconstructions of nonface objects. Shows mean
responses of 100 tensorfaces of rank 8 to 512 faces and 512 nonface objects. (d)
To prevent spurious reconstructions of nonface stimuli, the face identification
stage (tensorfaces) requires a nonlinearity. Two possible organizations for such
nonlinearity are (i) sequential nonlinearity, with nonlinear face detector stage
preceding linear face identification stage in separate neurons, and (ii) parallel
nonlinearity with nonlinear spatial interactions present within receptive fields
of single face cells. In this case, face detection and face identification occur con-
currently within single face cells.
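The distinction drawn above between the two kinds of information can be made concrete
in a few lines: Shannon entropy is computed from a probability distribution (here, the
pixel-value histogram), whereas a compression-based proxy for Kolmogorov complexity
applies to a single image in isolation. In the Python sketch below, a gradient image has
a uniform pixel histogram (high Shannon entropy) yet compresses to a very small size
(low complexity proxy); the compressor is again zlib, used only as a stand-in.

import zlib
import numpy as np

def shannon_entropy_bits_per_pixel(image_uint8):
    # Entropy of the pixel-value distribution (an ensemble-level quantity).
    counts = np.bincount(image_uint8.ravel(), minlength=256)
    p = counts[counts > 0] / image_uint8.size
    return float(-(p * np.log2(p)).sum())

def kolmogorov_proxy_bits_per_pixel(image_uint8):
    # Compressed size of this single image (an individual-level quantity).
    return 8.0 * len(zlib.compress(image_uint8.tobytes(), 9)) / image_uint8.size

gradient = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
print("Shannon entropy:", shannon_entropy_bits_per_pixel(gradient))
print("Kolmogorov proxy:", kolmogorov_proxy_bits_per_pixel(gradient))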
Receptive field complexity appears to increase as one ascends through
the hierarchy of visual cortical areas. Although this impression is not yet
confirmed through neurophysiological measurements of Kolmogorov com-
plexity, the ventral stream deep learning model of Güçlü and van Ger-
ven (2015) reinforces this perception of increased complexity, as it shows a
monotonic increase in Kolmogorov complexity as a function of the layer in
the network. While Kolmogorov complexity has not been measured exper-
imentally, sparseness has been. It is well established that visual representa-
tions are sparse (Dan, Atick, & Reid, 1996; Lehky et al., 2011, 2005; Lehky &
Sereno, 2007; Pitkow & Meister, 2012; Rolls & Tovée, 1995; Vinje & Gallant,
2000; Willmore, Mazer, & Gallant, 2011). What is not well established is the
gradient of how sparseness changes across different cortical areas, as such
data are limited. Kolmogorov complexity and sparseness need not neces-
sarily be correlated. Just because receptive field organization may appear
highly complex does not mean that there must be a correspondingly high
level of sparseness (see Figure 14). Indeed, a comparison of a low-level vi-
sual area (V1) (see Lehky et al., 2005) and a high-level visual area (anterior
inferotemporal cortex) (see Lehky et al., 2011) shows only a very modest
increase in sparseness (median kurtosis going from 0.84 to 1.88, measuring
lifetime sparseness or what we call cell selectivity). The data of Willmore
et al. (2011) indicate sparseness stays essentially the same going from V1 to
V4. As Willmore et al. (2011) conclude, the data suggest that “maximization
of lifetime sparseness is not a principle that determines the organization of
visual cortex.” In contrast, there appears to be a steady and substantial in-
crease in receptive field complexity across the cortical areas of the ventral
visual hierarchy. In view of that, Kolmogorov complexity may be a more
interesting parameter than sparseness in high-level visual processing.
Given these preliminary remarks on the general significance of using
Kolmogorov complexity for characterizing visual receptive fields, we now
turn to face processing specifically. The different complexities of tensor-
faces we examine here demonstrate a range of possibilities that biological
face cells could have. In particular, low- and medium-complexity face cells
form feasible representations in addition to very high-complexity represen-
tations, such as those formed by PCA eigenfaces or variants thereof (e.g.,
active appearance models; Chang & Tsao, 2017). Such high-complexity face
representations have in the past been suggested as forming the basis of bi-
ological face space. The actual complexity of biological face cells remains a
question for future experimental studies.
We observe a trade-off between receptive field complexity and the pop-
ulation size necessary to reach a criterion error in reconstructing faces (see
Figure 11a). A large population of low-complexity tensorfaces is equivalent
to a smaller population of high-complexity tensorfaces. This trade-off can
be observed for receptive fields in earlier cortical areas. For example, large
populations of low-complexity Gabor functions in striate cortex can also
accurately represent faces, and face identification can be performed using
Gabor-based face representations without any face cells (Wiskott, Krüger,
Kuiger, & von der Malsburg, 1997).
From the perspective of information contained in a population of tensor-
faces as indicated by reconstruction error, there does not seem to be a bene-
fit to using high-complexity face cells. Reconstruction error as a function of
tensorface complexity does not decrease moving from medium- to high-
complexity tensorfaces (see Figure 11b). Moreover, high-complexity face
cells incur high computational costs to create, measured as Kolmogorov
complexity or logical depth (see Figure 12). Low-complexity cells are in-
efficient in that they require larger population sizes to reach a criterion
reconstruction error. The sweet spot for face representations may be at
intermediate complexity, perhaps at about rank = 8. Nevertheless, low-
complexity face cells may balance their representational inefficiency with
their increased ability to generalize to statistically novel faces (see Fig-
ure 16). Thus, there may be an advantage to having a mixture of low- to
medium-complexity face cells but not high-complexity face cells such as
produced by PCA. Furthermore, not all face cells in a population need to
have the same level of complexity. That is another empirical question for
future experimental work.
What might be the advantage of increased complexity of face repre-
sentations in higher visual cortical areas (i.e., the creation of face cells)?
The smaller population sizes allowed by more complex receptive fields
means that face spaces with lower dimensionalities can be created (see
Lehky, Kiani, Esteky, & Tanaka, 2014, for a discussion of dimensionality).
In other words, creating face representations with more complex receptive
fields may be a dimensionality-reduction technique. Lower-dimensional
face spaces may make it easier to categorize faces (Plastria, De Bruyne,
& Carrizosa, 2008). However, the benefits for creating more efficient face
spaces using more complex receptive fields must be balanced with com-
putational costs of the increased complexity as measured by Kolmogorov
complexity and logical depth of receptive field spatial structure.
There is a high correlation between computational complexity (see Fig-
ure 12) and spectral power at high spatial frequencies (see Figure 13b). The
link between computational complexity and spatial frequency provides ad-
ditional motivation to characterize spatial frequency properties of face cells,
expanding on current physiological (Inagaki & Fujita, 2011; Rolls, Baylis, &
Hasselmo, 1987) and psychophysical studies (Costen, Parker, & Craw, 1996;
Gaspar, Sekuler, & Bennett, 2008; Näsänen, 1999). Nevertheless, tensorfaces
with different complexities are not simply Fourier amplitude filtered ver-
sions of each other but have substantial differences in appearance (phase
spectra). The spatial frequency content of a facial representation is not suf-
ficient to completely characterize its complexity.
We worked with colored faces rather than the monochromatic faces as
used in most studies of face coding. Color can be an important aspect of
face identification (Nestor, Plaut, & Behrmann, 2013; Tanaka, Weiskopf, &
Williams, 2001), particularly if the shape information is degraded or am-
biguous (Choi, Ro, & Plataniotis, 2009; Yip & Sinha, 2002). We include joint
shape and color sensitivity in the tensorfaces developed here. Responsive-
ness to both shape and color is found in the same face cells in the inferotem-
poral cortex of monkeys as measured neurophysiologically (Edwards, Xiao,
Keysers, Földiák, & Perrett, 2003). However, there is also fMRI evidence for
separate, parallel channels coding face shape and color (Lafer-Sousa & Con-
way, 2013; Lafer-Sousa, Conway, & Kanwisher, 2016).
A significant question is whether the representation of faces is global
(holistic) or local (parts based) (Behrmann, Richler, Avidan, & Kimchi,
2015; Maurer, Grand, & Mondloch, 2002; Piepers & Robbins, 2012; Rich-
ler, Palmeri, & Gauthier, 2012; Tanaka & Simonyi, 2016). We examined this
issue by measuring a globality index for tensorfaces, defined as the average
fraction of the face covered by the tensorfaces. Tensorfaces across a popula-
tion exhibit a great deal of variability in their globality. Some tensorfaces are
local, and others are strongly global. On average, high-complexity tensor-
faces are more global than low-complexity ones (see Figure 15). Typically, a
tensorface covers a sizable fraction of a face but not the entire face.
This variability in globality is consistent with both psychophysical
(Tanaka & Simonyi, 2016) and neurophysiological reports (Freiwald et al.,
2009), which conclude that face processing involves both global and parts-
based processing. We have previously proposed such mixed and interme-
diate globality for inferotemporal object representations in general, not just
faces (Lehky & Tanaka, 2016), based on data from monkey neurophysiol-
ogy showing sensitivity of neurons to a partial set of features but gener-
ally not the entire object (Fujita, Tanaka, Ito, & Cheng, 1992; Ito, Fujita,
Tamura, & Tanaka, 1994; Ito, Tamura, Fujita, & Tanaka, 1995; Kobatake &
Tanaka, 1994; Tanaka, Saito, Fukada, & Moriya, 1991; Yamane, Tsunoda,
Matsumoto, Phillips, & Tanifuji, 2006).
Recently Chang and Tsao (2017) reported that biological face space corre-
sponds to one specific linear space that they have discovered. However, we
believe that the linear face space they report is not uniquely defined under
their mathematical data analyses. Rather, a variety of different face spaces
are consistent with their data.
Approximate linear transforms (i.e., multiple linear regression) can be
fit between face coefficients for various linear decompositions (e.g., PCA,
ICA, NMF, our version of tensorfaces). Fits between the different linear
face decompositions will be good provided each is capable of doing accept-
able reconstructions of faces (e.g., under some psychophysical criterion for
reconstruction error). If the neurophysiological data provide a good fit to
face components from one linear decomposition, such as the active appearance
model (AAM) of Chang and Tsao (2017), then the data will also provide good
fits to other linear face decompositions. Chang and Tsao (2017) have stud-
ied one predetermined linear face decomposition, and since it happened
to meet their criterion of goodness of fit, they did not continue to examine
other possible decompositions.
For example, we investigated the transform between PCA coefficients
(PCAcoeff) and tensor coefficients (tensorCoeff) for two linear face decompo-
sitions. This transform is given by PCAcoeff′ = tensorCoeff ∗ b, where PCAcoeff
and tensorCoeff are matrices of coefficients for a set of faces, one face per
column, and PCAcoeff′ are the estimated PCA coefficients. The coefficient b is
given by b = pinv(tensorCoeff) ∗ PCAcoeff, where pinv is the Moore-Penrose
pseudoinverse operator performing a multiple linear regression, and tensor-
Coeff has been augmented by a column of ones to include offsets. There
were 128 faces as input, the tensor decomposition had 100 components, and
we modeled the first 50 PCA components. We fit the model with one face
left out for testing, repeating with a different face left out each time. The results
show that when comparing actual PCAcoeff and estimated PCAcoeff′, the
model accounts for a 0.985 fraction of the variance. This shows that it is
possible to predict PCAcoeff from tensorCoeff with high accuracy. Therefore,
the two linear face decompositions would each provide essentially the same
fit to the neurophysiological data. The interchangeability between different
linear face spaces means that if one wants to select a face model, it would
have to be constrained based on some criterion other than overall goodness
of fit of data to one single linear model in isolation, but perhaps based in-
stead on comprehensive experimental characterizations of receptive fields
of individual face cells across the population.
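The regression just described can be written out directly. The Python sketch below
follows the pinv formulation with simulated coefficient matrices (arranged one face per
row for convenience) and a leave-one-face-out loop; it is an illustration of the procedure,
not the authors' code, and its output depends on the simulated noise level.

import numpy as np

rng = np.random.default_rng(8)
n_faces, n_tensor, n_pca = 128, 100, 50

# Simulated coefficient matrices, one face per row here for convenience.
tensor_coeff = rng.normal(size=(n_faces, n_tensor))
true_map = rng.normal(size=(n_tensor + 1, n_pca))
pca_coeff = (np.hstack([tensor_coeff, np.ones((n_faces, 1))]) @ true_map
             + 0.1 * rng.normal(size=(n_faces, n_pca)))      # PCA coefficients to be predicted

X = np.hstack([tensor_coeff, np.ones((n_faces, 1))])          # augmented with offsets
ss_res, ss_tot = 0.0, 0.0
for i in range(n_faces):                                      # leave one face out for testing
    train = np.delete(np.arange(n_faces), i)
    b = np.linalg.pinv(X[train]) @ pca_coeff[train]           # multiple linear regression
    pred = X[i] @ b
    ss_res += np.sum((pca_coeff[i] - pred) ** 2)
    ss_tot += np.sum((pca_coeff[i] - pca_coeff[train].mean(axis=0)) ** 2)

print("fraction of variance accounted for:", 1 - ss_res / ss_tot)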
Furthermore, experimental face stimulus sets are limited in that they
cover only a limited range of the faces that are possible. It is conceivable
that observed face spaces such as Chang and Tsao (2017) are approximately
linear at a local scale but that a more complete sampling of faces will reveal
a nonlinear face space at a broader scale. Even color space at high-level vi-
sual cortex is a complicated, nonlinear space (Bohon, Hermann, Hansen, &
Conway, 2016; Komatsu, Ideura, Kaji, & Yamane, 1992; Lehky & Sejnowski,
1999), and there is no reason to expect that face space is not similarly com-
plicated and nonlinear.
Underlying a possible nonlinear face space would be face cells them-
selves that act individually as spatially nonlinear filters. Reports of infer-
otemporal processing indicate that object representations, and in particular
face representations, involve nonlinear spatial filters, as mentioned earlier
(Owaki et al., 2018; Tanaka, 1996; Yamane et al., 2006). Those nonlinear spa-
tial interactions are why we cannot map face cell receptive fields with a sim-
ple stimulus spot as we do in striate cortex. The spurious reconstruction of
nonface objects using linear components such as tensorfaces (see Figure 18)
and eigenfaces (see Figure 2b in Tsao & Livingstone, 2008) also indicates a
requirement to introduce some sort of nonlinearity in face cells.
Linear models of biological facial representations, including the partic-
ular implementation of tensorfaces used here, can reveal some significant
aspects of face processing and thus can be useful in theoretical discus-
sions as long as the biological limitations of those models are kept firmly in
sight. However, without nonlinearity, they cannot be considered complete
solutions. Nonlinearity is a central stumbling block in understanding bi-
ological face processing and object processing generally. One approach to
introducing nonlinearity into face representations is illustrated by the non-
linear tensor modeling of Vasilescu and Terzopoulos (2011). However, there
are multitudes of other possibilities,
including the development of
Kolmogorov-complexity constrained deep learning networks.
Overall, the results here suggest that spatial complexity of face cells is
likely to be a significant factor, among others, in characterizing face space.
Defining the complexity of face representations may contribute to a more
complete framework for guiding future research.
Acknowledgments
This research was supported by a grant to K.T. from the Strategic Research
Program for Brain Sciences of the Japan Agency for Medical Research and
Development. It was also supported by grants to A.C. and A-H.P. from
the Ministry of Education and Science of the Russian Federation (grant
14.756.31.0001) and to A.C. from the Polish National Science Center (grant
2016/20/W/N24/00354). We thank Topi Tanskanen for comments on the
manuscript.
References
Adriaans, P. (2007). Learning as data compression. Paper presented at the Third Con-
ference on Computability in Europe: Computation and Logic in the Real World,
Siena, Italy.
Adriaans, P. (2019). Information. In E. N. Zalta (Ed.), The Stanford encyclopedia of phi-
losophy (Spring 2019 ed.). https://plato.stanford.edu/archives/spr2019/entries
/information/
Baldassi, C., Alemi-Neissi, A., Pagan, M., Dicarlo, J. J., Zecchina, R., & Zoccolan,
D. (2013). Shape similarity, better than semantic membership, accounts for the
structure of visual object representations in a population of monkey inferotem-
poral neurons. PLOS Computational Biology, 9, e1003167.
Barlow, H. B. (1961). Possible principles underlying the transformations of sensory
messages. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cam-
bridge, MA: MIT Press.
Bartlett, M. S., Movellan, J. R., & Sejnowski, T. J. (2002). Face recognition by indepen-
dent component analysis. IEEE Transactions on Neural Networks, 13, 1450–1464.
Bartlett, M. S., & Sejnowski, T. J. (1997). Independent components of face images: A repre-
sentation for face recognition. Paper presented at the 4th Annual Joint Symposium
on Neural Computation, Pasadena, CA.
Behrmann, M., Richler, J. J., Avidan, G., & Kimchi, R. (2015). Holistic face perception.
In J. Wagemans (Ed.), The Oxford handbook of perceptual organization (pp. 758–774).
Oxford: Oxford University Press.
Bell, A. J., & Sejnowski, T. J. (1997). The “independent components” of natural scenes
are edge filters. Vision Research, 37, 3327–3338.
Bennett, C. H. (1988). Logical depth and physical complexity. In R. Herken (Ed.),
The universal Turing machine—a half-century survey (pp. 227–257). Oxford: Oxford
University Press.
Bennett, C. H. (1994). Complexity in the universe. In J. J. Halliwell, J. Peres-Mercader,
& W. H. Zurek (Eds.), Physical origins of time asymmetry (pp. 33–46). New York:
Cambridge University Press.
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. Paper
presented at Siggraph ’99: The 26th Annual Conference on Computer Graphics and
Interactive Techniques, Los Angeles.
Bohon, K. S., Hermann, K. L., Hansen, T., & Conway, B. R. (2016). Representation
of perceptual color space in macaque posterior inferior temporal cortex (the V4
complex). eNeuro, 3, ENEURO.0039-16.2016.
Bracci, S., & Op de Beeck, H. (2016). Dissociations and associations between shape
and category representations in the two visual pathways. Journal of Neuroscience,
36, 432–444.
Bro, R. (1997). PARAFAC: Tutorial and applications. Chemometrics and Intelligent Lab-
oratory Systems, 38, 149–171.
Bro, R. (1998). Multi-way analysis in the food industry: Models, algorithms, and applica-
tions. PhD diss., University of Amsterdam.
Bro, R., Harshman, R. A., Sidiropoulos, N. D., & Lundy, M. E. (2009). Modeling mul-
tiway data with linearly dependent loadings. Journal of Chemometrics, 23, 324–
340.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidi-
mensional scaling via an n-way generalization of Eckart-Young decomposition.
Psychometrika, 35, 283–319.
Chaitin, G. (1969). On the length of programs for computing finite binary sequences:
Statistical considerations. Journal of the Association of Computing Machinery, 16,
145–159.
Chang, L., & Tsao, D. Y. (2017). The code for facial identity in the primate brain. Cell,
169, 1013–1028.
Chater, N., & Vitányi, P. (2003). Simplicity: A unifying principle in cognitive science?
Trends in Cognitive Sciences, 7, 19–22.
Choi, J. Y., Ro, Y. M., & Plataniotis, K. N. (2009). Color face recognition for degraded
face images. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics,
39, 1217–1230.
Cichocki, A., Mandic, D., Phan, A.-H., Caiafa, C., Zhou, G., Zhao, Q., . . . De Lath-
auwer, L. (2015). Tensor decompositions for signal processing applications: From
two-way to multiway component analysis. IEEE Signal Processing Magazine, 32,
145–163.
Cichocki, A., Zdunek, R., Phan, A.-H., & Amari, S.-I. (2009). Nonnegative matrix and
tensor factorizations: Applications to exploratory multi-way data analysis and blind
source separation. Chichester: Wiley.
Connolly, A. C., Guntupalli, J. S., Gors, J., Hanke, M., Halchenko, Y. O., Wu, Y. C.,
. . . Haxby, J. V. (2012). The representation of biological classes in the human brain.
Journal of Neuroscience, 32, 2608–2618.
Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 23, 681–685.
Costen, N. P., Parker, D. M., & Craw, I. (1996). Effects of high-pass and low-pass
spatial filtering on face identification. Perception and Psychophysics, 58, 602–612.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Hoboken,
NJ: Wiley.
Cowell, R. A., & Cottrell, G. W. (2013). What evidence supports special processing for
faces? A cautionary tale for fMRI interpretation. Journal of Cognitive Neuroscience,
25, 1777–1793.
Dan, Y., Atick, J. J., & Reid, R. C. (1996). Efficient coding of natural scenes in the
lateral geniculate nucleus: Experimental test of a computational theory. Journal of
Neuroscience, 16, 3351–3362.
De Lathauwer, L. (2008a). Decompositions of a higher-order tensor in block terms—
Part I: Lemmas for partitioned matrices. SIAM Journal on Matrix Analysis and Ap-
plications, 30, 1022–1032.
De Lathauwer, L. (2008b). Decompositions of a higher-order tensor in block terms—
Part II: Definitions and uniqueness. SIAM Journal on Matrix Analysis and Applica-
tions, 30, 1033–1066.
Duchaine, B., & Yovel, G. (2015). A revised neural framework for face processing.
Annual Review of Vision Science, 1, 393–416.
Edwards, G. J., Cootes, T. F., & Taylor, C. J. (1998). Face recognition using active ap-
pearance models. Paper presented at the 5th European Conference on Computer
Vision, Freiburg, Germany.
Edwards, R., Xiao, D., Keysers, C., Földiák, P., & Perrett, D. (2003). Color sensitivity
of cells responsive to complex stimuli in the temporal cortex. Journal of Neurophys-
iology, 90, 1245–1256.
Eifuku, S., De Souza, W. C., Tamura, R., Nishijo, H., & Ono, T. (2004). Neuronal corre-
lates of face identification in the monkey anterior temporal cortical areas. Journal
of Neurophysiology, 91, 358–371.
Fang, F., Murray, S. O., & He, S. (2007). Duration-dependent fMRI adaptation and
distributed viewer-centered face representation in human visual cortex. Cerebral
Cortex, 17, 1402–1411.
Favier, G., & de Almeida, A. L. (2014). Overview of constrained PARAFAC models.
EURASIP Journal on Advances in Signal Processing, 2014, 142.
Feldman, J. (2016). The simplicity principle in perception and cognition. Wiley Inter-
disciplinary Review: Cognitive Science, 7, 330–340.
Field, D. J. (1994). What is the goal of sensory coding? Neural Computation, 6, 559–601.
Freiwald, W. A., Duchaine, B., & Yovel, G. (2016). Face processing systems: From
neurons to real-world social perception. Annual Review of Neuroscience, 39, 325–
346.
Freiwald, W. A., & Tsao, D. Y. (2010). Functional compartmentalization and view-
point generalization within the macaque face-processing system. Science, 330,
845–851.
Freiwald, W. A., Tsao, D. Y., & Livingstone, M. (2009). A face feature space in the
macaque temporal lobe. Nature Neuroscience, 12, 1187–1196.
Fujita, I., Tanaka, K., Ito, M., & Cheng, K. (1992). Columns for visual features of ob-
jects in monkey inferotemporal cortex. Nature, 360, 343–346.
Gaspar, C., Sekuler, A. B., & Bennett, P. J. (2008). Spatial frequency tuning of upright
and inverted face identification. Vision Research, 48, 2817–2826.
Gauthier, I., Behrmann, M., & Tarr, M. J. (1999). Can face recognition really be
dissociated from object recognition? Journal of Cognitive Neuroscience, 11, 349–
370.
Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars
and birds recruits brain areas involved in face recognition. Nature Neuroscience, 3,
191–197.
Gauthier, I., & Tarr, M. J. (1997). Becoming a “Greeble” expert: Exploring mecha-
nisms for face recognition. Vision Research, 37, 1673–1682.
Grünwald, P., & Vitányi, P. (2008a). Algorithmic information theory. https://arxiv.org
/abs/0809.2754
Grünwald, P., & Vitányi, P. (2008b). Shannon information and Kolmogorov complexity.
https://arxiv.org/abs/cs/0410002
Güçlü, U., & van Gerven, M. A. (2015). Deep neural networks reveal a gradient in
the complexity of neural representations across the ventral stream. Journal of Neu-
roscience, 35, 10005–10014.
Guo, X., Miron, S., Brie, D., & Stegeman, A. (2012). Uni-mode and partial unique-
ness conditions for CANDECOMP/PARAFAC of three-way arrays with linearly
dependent loadings. SIAM Journal on Matrix Analysis and Applications, 33, 111–
129.
Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and con-
ditions for an explanatory multimodal factor analysis. UCLA Working Papers in
Phonetics, 16, 1–84.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural
system for face perception. Trends in Cognitive Sciences, 4, 223–233.
Hout, M. C., Papesh, M. H., & Goldinger, S. D. (2013). Multidimensional scaling.
Wiley Interdisciplinary Review: Cognitive Science, 4, 93–103.
Inagaki, M., & Fujita, I. (2011). Reference frames for spatial frequency in face rep-
resentation differ in the temporal visual cortex and amygdala. Journal of Neuro-
science, 31, 10371–10379.
Ito, M., Fujita, I., Tamura, H., & Tanaka, K. (1994). Processing of contrast polarity of
visual images in inferotemporal cortex of the macaque monkey. Cerebral Cortex,
14, 499–508.
Ito, M., Tamura, H., Fujita, I., & Tanaka, K. (1995). Size and position invariance of
neuronal responses in monkey inferotemporal cortex. Journal of Neurophysiology,
73, 218–226.
Jiang, F., Blanz, V., & O’Toole, A. J. (2006). Probing the visual representation of faces
with adaptation: A view from the other side of the mean. Psychological Science, 17,
493–500.
Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3,
759–763.
Kanwisher, N., & Yovel, G. (2006). The fusiform face area: A cortical region special-
ized for the perception of faces. Philosophical Transactions of the Royal Society of
London B Biological Sciences, 361, 2109–2128.
Kayaert, G., Biederman, I., & Vogels, R. (2005). Representation of regular and ir-
regular shapes in macaque inferotemporal cortex. Cerebral Cortex, 15, 1308–
1321.
Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in
response patterns of neuronal population in monkey inferior temporal cortex.
Journal of Neurophysiology, 97, 4296–4309.
Kobatake, E., & Tanaka, K. (1994). Neuronal selectivities to complex object features
in the ventral visual pathway of the macaque cerebral cortex. Journal of Neuro-
physiology, 71, 856–867.
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM
Review, 51, 455–500.
Kolda, T. G., Bader, B. W., Acar Ataman, E., Dunlary, D., Bassett, R., . . . Hansen,
S. (2017). Matlab tensor toolbox (Version 3.0-dev). https://www.tensortoolbox
.org
Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of infor-
mation. Problems of Information Transmission, 1, 1–7.
Komatsu, H., Ideura, Y., Kaji, S., & Yamane, S. (1992). Color selectivity of neurons
in the inferior temporal cortex of the awake macaque monkey. Journal of Neuro-
science, 12, 408–424.
Kravitz, D. J., Peng, C. S., & Baker, C. I. (2011). Real-world scene representations in
high-level visual cortex: It’s the spaces more than the places. Journal of Neuro-
science, 31, 7322–7333.
Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., . . . Ban-
detlini, P. A. (2008). Matching categorical object representations in inferior tem-
poral cortex of man and monkey. Neuron, 60, 1126–1141.
Lafer-Sousa, R., & Conway, B. R. (2013). Parallel, multi-stage processing of colors,
faces and shapes in macaque inferior temporal cortex. Nature Neuroscience, 16,
1870–1878.
Lafer-Sousa, R., Conway, B. R., & Kanwisher, N. (2016). Color-biased regions of the
ventral visual pathway lie between face- and place-selective regions in humans,
as in macaques. Journal of Neuroscience, 36, 1682–1697.
Lantéri, H., Soummer, R., & Aime, C. (1999). Comparison between ISRA and RLA
algorithms: Use of a Wiener filter based stopping criterion. Astronomy and Astro-
physics Supplementary Series, 140, 235–246.
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix
factorization. Nature, 401, 788–791.
Lehky, S. R., Kiani, R., Esteky, H., & Tanaka, K. (2011). Statistics of visual responses
in primate inferotemporal cortex to object stimuli. Journal of Neurophysiology, 106,
1097–1117.
Lehky, S. R., Kiani, R., Esteky, H., & Tanaka, K. (2014). Dimensionality of object rep-
resentations in monkey inferotemporal cortex. Neural Computation, 26, 2135–2162.
Lehky, S. R., & Sejnowski, T. J. (1999). Seeing white: Qualia in the context of decoding
population codes. Neural Computation, 11, 1261–1280.
Lehky, S. R., Sejnowski, T. J., & Desimone, R. (2005). Selectivity and sparseness in
the responses of striate complex cells. Vision Research, 45, 57–73.
Lehky, S. R., & Sereno, A. B. (2007). Comparison of shape encoding in primate dorsal
and ventral visual pathways. Journal of Neurophysiology, 97, 307–319.
Lehky, S. R., & Tanaka, K. (2016). Neural representation for object recognition in
inferotemporal cortex. Current Opinion in Neurobiology, 37, 23–35.
Lempel, A., & Ziv, J. (1976). On the complexity of finite sequences. IEEE Transactions
on Information Theory, 22, 75–81.
Leopold, D. A., Bondar, I. V., & Giese, M. A. (2006). Norm-based face encoding by
single neurons in the monkey inferotemporal cortex. Nature, 442, 572–575.
Leopold, D. A., O’Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced
shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4, 89–94.
Leopold, D. A., & Rhodes, G. (2010). A comparative view of face perception. Journal
of Comparative Psychology, 124, 233–251.
Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications
(3rd ed.). New York: Springer.
Lin, C. J. (2007). Projected gradient methods for non-negative matrix factorization.
Neural Computation, 19, 2756–2779.
Liu, S., & Trenkler, G. (2008). Hadamard, Khatri-Rao, Kronecker and other matrix
products. International Journal of Information and Systems Sciences, 4, 160–177.
Maurer, D., Grand, R. L., & Mondloch, C. J. (2002). The many faces of configural
processing. Trends in Cognitive Sciences, 6, 255–260.
McKone, E., Kanwisher, N., & Duchaine, B. C. (2007). Can generic expertise explain
special processing for faces? Trends in Cognitive Sciences, 11, 8–15.
Meytlis, M., & Sirovich, L. (2007). On the dimensionality of face space. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, 29, 1262–1267.
Murata, A., Gallese, V., Luppino, G., Kaseda, M., & Sakata, H. (2000). Selectivity for
the shape, size, and orientation of objects for grasping in neurons of monkey pari-
etal area AIP. Journal of Neurophysiology, 83, 2580–2601.
Näsänen, R. (1999). Spatial frequency bandwidth used in the recognition of facial
images. Vision Research, 39, 3824–3833.
Natu, V. S., Jiang, F., Narvekar, A., Keshvari, S., Blanz, V., & O’Toole, A. J. (2010).
Dissociable neural patterns of facial identity across changes in viewpoint. Journal
of Cognitive Neuroscience, 22, 1570–1582.
Nestor, A., Plaut, D. C., & Behrmann, M. (2013). Face-space architectures: Evidence
for the use of independent color-based features. Psychological Science, 24, 1294–
1300.
Nestor, A., Plaut, D. C., & Behrmann, M. (2016). Feature-based face representations
and image reconstruction from behavioral and neural data. Proceedings of the Na-
tional Academy of Sciences of the United States of America, 113, 416–421.
Noudoost, B., & Esteky, H. (2013). Neuronal correlates of view representation re-
vealed by face-view aftereffect. Journal of Neuroscience, 33, 5761–5772.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field
properties by learning a sparse code for natural images. Nature, 381, 607–
609.
Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Current Opin-
ion in Neurobiology, 14, 481–487.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent
low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4,
1244–1252.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE
Transactions on Systems, Man and Cybernetics, 9, 62–66.
Owaki, T., Vidal-Naquet, M., Nam, Y., Uchida, G., Sato, T., Câteau, H., . . . Tanifuji,
M. (2018). Searching for visual features that explain response variance of face
neurons in inferior temporal cortex. PLOS One, 13, e0201192.
Parr, L. A. (2011). The evolution of face processing in primates. Philosophical Transac-
tions of the Royal Society of London B Biological Sciences, 366, 1764–1777.
Parr, L. A., Hecht, E., Barks, S. K., Preuss, T. M., & Votaw, J. R. (2009). Face processing
in the chimpanzee brain. Current Biology, 19, 50–53.
Parr, L. A., Winslow, J. T., Hopkins, W. D., & de Waal, F. B. (2000). Recognizing fa-
cial cues: Individual discrimination by chimpanzees (Pan troglodytes) and rhesus
monkeys (Macaca mulatta). Journal of Comparative Psychology, 114, 47–60.
Perrett, D. I., Oram, M. W., Harries, M. H., Bevan, R., Hietanen, J. K., Benson, P. J.,
& Thomas, S. (1991). Viewer-centred and object-centred coding of heads in the
macaque temporal cortex. Experimental Brain Research, 86, 159–173.
Perrett, D. I., Smith, P. A., Potter, D. D., Mistlin, A. J., Head, A. S., Milner, A. D.,
& Jeeves, M. A. (1985). Visual cells in the temporal cortex sensitive to face view
and gaze direction. Proceedings of the Royal Society of London. Series B: Biological
Sciences, 223, 293–317.
Phan, A.-H. (2018). Matlab TensorBox (Version 2018.08). https://faculty.skoltech.ru
/people/anhhuyphan-tabs-4
Phan, A.-H., Cichocki, A., Tichavský, P., Zdunek, R., & Lehky, S. R. (2013). From basis
components to complex structural patterns. Paper presented at the 38th IEEE Inter-
national Conference on Acoustics, Speech, and Signal Processing.
Piepers, D. W., & Robbins, R. A. (2012). A review and clarification of the terms “holis-
tic,” “configural,” and “relational” in the face perception literature. Frontiers in
Psychology, 3, 559.
Pitkow, X., & Meister, M. (2012). Decorrelation and efficient coding by retinal gan-
glion cells. Nature Neuroscience, 15, 628–635.
Plastria, F., De Bruyne, S., & Carrizosa, E. (2008). Dimensionality reduction for classi-
fication. Lecture Notes in Computer Science: Vol. 5139. Advanced Data Mining and
Applications. Berlin: Springer-Verlag.
Rabanser, S., Shchur, O., & Günnemann, S. (2017). Introduction to tensor decomposi-
tions and their applications in machine learning. https://arxiv.org/abs/1711.10781
Ramírez, F. M., Cichy, R. M., Allefeld, C., & Haynes, J. D. (2014). The neural code
for face orientation in the human fusiform face area. Journal of Neuroscience, 34,
12155–12167.
Rhodes, G., & Jeffery, L. (2006). Adaptive norm-based coding of facial identity. Vision
Research, 46, 2977–2987.
Richler, J. J., Palmeri, T. J., & Gauthier, I. (2012). Meanings, mechanisms, and mea-
sures of holistic processing. Frontiers in Psychology, 3, 553.
Rolls, E. T., Baylis, G. C., & Hasselmo, M. E. (1987). The responses of neurons in
the cortex in the superior temporal sulcus of the monkey to band-pass spatial
frequency filtered faces. Vision Research, 27, 311–326.
Rolls, E. T., & Tovée, M. J. (1995). Sparseness of the neuronal representation of stimuli
in the primate temporal visual cortex. Journal of Neurophysiology, 73, 713–726.
Romero, M. C., Van Dromme, I. C., & Janssen, P. (2013). The role of binocular dispar-
ity in stereoscopic images of objects in the macaque anterior intraparietal area.
PLOS One, 8, e55340.
Ruffini, G. (2017). Lempel-Ziv complexity reference. https://arxiv.org/abs/1707.09848
Sereno, A. B., & Lehky, S. R. (2018). Attention effects on neural population repre-
sentations for shape and location are stronger in the ventral than dorsal stream.
eNeuro, 5, ENEURO.0371-17.2018.
Sereno, A. B., Sereno, M. E., & Lehky, S. R. (2014). Recovering stimulus locations us-
ing populations of eye-position modulated neurons in dorsal and ventral visual
streams of non-human primates. Frontiers in Integrative Neuroscience, 8, 28.
Sidiropoulos, N. D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E. E., & Falout-
sos, C. (2017). Tensor decomposition for signal processing and machine learning.
IEEE Transactions on Signal Processing, 65, 3551–3582.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural rep-
resentation. Annual Review of Neuroscience, 24, 1193–1216.
Sirovich, L., & Meytlis, M. (2009). Symmetry, probability, and recognition in face
space. Proceedings of the National Academy of Sciences of the United States of America,
106, 6895–6899.
Solomonoff, R. (1964). A formal theory of inductive inference. Part I. Information and
Control, 7, 1–22.
Sorber, L., Van Barel, M., & De Lathauwer, L. (2013). Optimization-based algorithms
for tensor decompositions: Canonical polyadic decomposition, decomposition in
rank-(Lr, Lr, 1) terms, and a new generalization. SIAM Journal on Optimization, 23,
695–720.
Stegeman, A., & Lam, T. (2012). Improved uniqueness conditions for canonical ten-
sor decompositions with linearly dependent loadings. SIAM Journal on Matrix
Analysis and Applications, 33, 1250–1271.
Tanaka, J., & Simonyi, D. (2016). The “parts and wholes” of face recognition: A review
of the literature. Quarterly Journal of Experimental Psychology, 69, 1876–1889.
Tanaka, J., Weiskopf, D., & Williams, P. (2001). The role of color in high-level vision.
Trends in Cognitive Sciences, 5, 211–215.
Tanaka, K. (1996). Inferotemporal cortex and object vision. Annual Review of Neuro-
science, 19, 109–139.
Tanaka, K., Saito, H., Fukada, Y., & Moriya, M. (1991). Coding visual images of ob-
jects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysi-
ology, 66, 170–189.
Tolhurst, D. J., Smyth, D., & Thompson, I. D. (2009). The sparseness of neuronal re-
sponses in ferret primary visual cortex. Journal of Neuroscience, 29, 2355–2370.
Tong, M. H., Joyce, C. A., & Cottrell, G. W. (2008). Why is the fusiform face area
recruited for novel categories of expertise? A neurocomputational investigation.
Brain Research, 1202, 14–24.
Tsao, D. Y. (2014). The macaque face patch system: A window into object represen-
tation. Cold Spring Harbor Symposia on Quantitative Biology, 79, 109–114.
Tsao, D. Y., & Freiwald, W. A. (2006). What’s so special about the average face? Trends
in Cognitive Sciences, 10, 391–393.
Tsao, D. Y., & Livingstone, M. S. (2008). Mechanisms of face perception. Annual Re-
view of Neuroscience, 31, 411–437.
Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279–311.
Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neu-
roscience, 3, 71–86.
Van Loan, C. F. (2000). The ubiquitous Kronecker product. Journal of Computational
and Applied Mathematics, 123, 85–100.
Vasilescu, M. A., & Terzopoulos, D. (2002). Multilinear analysis of image ensembles:
TensorFaces. Paper presented at the European Conference on Computer Vision,
Copenhagen, Denmark.
Vasilescu, M. A., & Terzopoulos, D. (2003). Multilinear subspace analysis of image en-
sembles. Paper presented at the IEEE Conference on Computer Vision and Pattern
Recognition, Madison, WI.
Vasilescu, M. A., & Terzopoulos, D. (2005). Multilinear independent components anal-
ysis. Paper presented at the IEEE Conference on Computer Vision and Pattern
Recognition, San Diego, CA.
Vasilescu, M. A., & Terzopoulos, D. (2011). Multilinear projection for face recognition
via canonical decomposition. Paper presented at the IEEE Conference on Automatic
Face and Gesture Recognition, Santa Barbara, CA.
Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual
cortex during natural vision. Science, 287, 1273–1276.
Wang, P., Gauthier, I., & Cottrell, G. (2016). Are face and object recognition inde-
pendent? A neurocomputational modeling exploration. Journal of Cognitive Neu-
roscience, 28, 558–574.
Wang, Y., Jia, Y., Hu, C., & Turk, M. (2005). Non-negative matrix factorization frame-
work for face recognition. International Journal of Pattern Recognition and Artificial
Intelligence, 19, 495–511.
Willmore, B., Mazer, J. A., & Gallant, J. L. (2011). Sparse coding in striate and extras-
triate visual cortex. Journal of Neurophysiology, 105, 2907–2919.
Willmore, B., & Tolhurst, D. J. (2001). Characterizing the sparseness of neural codes.
Network, 12, 255–270.
Wilson, H. R., Loffler, G., & Wilkinson, F. (2002). Synthetic faces, face cubes, and the
geometry of face space. Vision Research, 42, 2909–2923.
Wiskott, L., Fellous, J.-M., Krüger, N., & von der Malsburg, C. (1997). Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 775–779.
Yamane, Y., Tsunoda, K., Matsumoto, M., Phillips, A. N., & Tanifuji, M. (2006). Rep-
resentation of the spatial relationship among object parts by neurons in macaque
inferotemporal cortex. Journal of Neurophysiology, 96, 3147–3156.
Yip, A. W., & Sinha, P. (2002). Contribution of color to face recognition. Perception,
31, 995–1003.
Young, M. P., & Yamane, S. (1992). Sparse population coding of faces in the infer-
otemporal cortex. Science, 256, 1327–1331.
Yovel, G., & Kanwisher, N. (2004). Face perception: Domain specific, not process
specific. Neuron, 44, 889–898.
Yu, J. (2016). Rank-constrained PCA for intrinsic images decomposition. Paper presented
at the IEEE Conference on Image Processing, Phoenix, AZ.
Zenil, H., Delahaye, J.-P., & Gaucherel, C. (2012). Image characterization and classi-
fication by physical complexity. Complexity, 17, 26–42.
Received August 18, 2018; accepted October 8, 2019.