Mid-level Feature Differences Support Early Animacy
and Object Size Distinctions: Evidence from
Electroencephalography Decoding
Ruosi Wang, Daniel Janini, and Talia Konkle
Harvard University
Abstract
■ Responses to visually presented objects along the cortical surface of the human brain have a large-scale organization reflecting the broad categorical divisions of animacy and object size. Emerging evidence indicates that this topographical organization is supported by differences between objects in mid-level perceptual features. With regard to the timing of neural responses, images of objects quickly evoke neural responses with decodable information about animacy and object size, but are mid-level features sufficient to evoke these rapid neural responses? Or is slower iterative neural processing required to untangle information about animacy and object size from mid-level features, requiring hundreds of milliseconds more processing time? To answer this question, we used EEG to measure human neural responses to images of objects and their texform counterparts—unrecognizable images that preserve some mid-level feature information about texture and coarse form. We found that texform images evoked neural responses with early decodable information about both animacy and real-world size, as early as responses evoked by original images. Furthermore, successful cross-decoding indicates that both texform and original images evoke information about animacy and size through a common underlying neural basis. Broadly, these results indicate that the visual system contains a mid-level feature bank carrying linearly decodable information on animacy and size, which can be rapidly activated without requiring explicit recognition or protracted temporal processing. ■
INTRODUCTION
The ventral visual stream contains extensive information
about different object categories, with a large-scale spatial
organization of response preferences characterized by the
broad categories of animacy and object size (Thorat,
Proklova, & Peelen, 2019; Julian, Ryan, & Epstein, 2017;
Grill-Spector & Weiner, 2014; Konkle & Caramazza,
2013; Konkle & Oliva, 2012). Classic understanding of
the ventral stream posits a hierarchical series of processing
stages, en route to a more conceptual format that ulti-
mately abstracts away from perceptual information
(Proklova, Kaiser, & Peelen, 2016; Mahon, Anzellotti,
Schwarzbach, Zampini, & Caramazza, 2009; e.g., for a
review, see Peelen & Downing, 2017). However, emerging
evidence has revealed that the broad categorical distinc-
tions of the ventral stream are supported by more primi-
tive perceptual differences among “mid-level features” of
texture, shape, and curvature ( Jagadeesh & Gardner,
2022; Vinken, Konkle, & Livingstone, 2022; Bao, She,
McGill, & Tsao, 2020; Yue, Robert, & Ungerleider, 2020;
Jozwik, Kriegeskorte, & Mur, 2016; Long, Yu, & Konkle,
2018; Long, Störmer, & Alvarez, 2017; Long, Konkle,
Cohen, & Alvarez, 2016; Baldassi et al., 2013). On this
emerging account of visual system processing, the ventral
stream represents objects in a rich mid-level feature bank, from which more categorical distinctions can be extracted (e.g., with linear read-out).
Evidence for this mid-level feature bank account comes
from recent work by Long et al. (2018) investigating brain
responses to a new stimulus class called “texforms” (Long
et al., 2016, 2017, 2018; Figure 1A). Texform images are
created using a texture-synthesis algorithm (Freeman &
Simoncelli, 2011), which preserves some mid-level feature
information related to the texture and coarse form of the
original depicted objects, while obscuring higher-level shape
features like clear contours and explicit shape information.
Empirically, people cannot identify what these are at the
basic level (e.g., as a “cat”). Long et al. (2018) found that tex-
form images evoked extensive responses along the entire
ventral visual cortex with a similar large-scale organization
as evoked by original, recognizable images. For example,
zones of cortex responding more strongly to original animals
also responded more to texformed animals. However, given
that fMRI data obscure temporal information, there are a
number of possible accounts of these large-scale activations.
Thus, in the present study, we examined the time-evolving
signatures of visual system processing to ask when there
is information about animacy and size in neural responses
to texform images relative to their original counterparts.
According to the mid-level feature bank account, rapid feedforward activations of the ventral stream reflect sensitivity to mid-level featural distinctions, which directly carry information about animacy and object size.
Figure 1. Stimuli and decoding results. (A) Example stimulus images. Each of the four conditions (animacy × size) contained 15 exemplars, yielding 60 unrecognizable texforms (upper) and their 60 original counterparts (lower). (B) Time course of animate versus inanimate decoding. Classification accuracy is plotted along the y axis, as a function of time (x axis), for original (solid silver lines), texform (solid black lines), and texform-to-original cross-decoding (dashed gray lines). Significant time points are depicted with horizontal lines above the time courses in the corresponding color (ps < .05, one-sided signed-rank test, FDR corrected in the time window of interest, 100–500 msec). The shaded region indicates a 95% confidence interval. Adjacent to this axis is an MDS visualization, with a 2-D projection of the pairwise distances in the neural responses to each image at the peak animacy cross-decoding time (176 msec). (C) Time course for big versus small decoding, as in (B). The adjacent MDS plot reflects a 2-D projection of the neural similarity structure measured at the peak size cross-decoding time (140 msec).
A strong temporal prediction of this account is that animacy and
object size information emerge early in the time-evolving
responses, with comparable timing for texform and origi-
nal formats. Indeed, EEG and magnetoencephalography
decoding studies measuring responses to intact pictures
have found that information can be decoded relatively
early in the time course of processing about depicted
animals versus inanimate objects (Ritchie et al., 2021;
Grootswagers, Ritchie, Wardle, Heathcote, & Carlson,
2017; Kaneshiro, Perreau Guimaraes, Kim, Norcia, &
Suppes, 2015; Cichy, Pantazis, & Oliva, 2014; Carlson,
Tovar, Alink, & Kriegeskorte, 2013) and about big versus
small objects (depicted at the same visual size on the
screen; Khaligh-Razavi, Cichy, Pantazis, & Oliva, 2018).
Furthermore, neurophysiological studies in nonhuman
primates also have found that within 100 msec of stimulus
onset, information about the animacy of the presented
images can be decoded from the population structure
of neural responses in V4 and IT (Cauchoix, Crouzet, Fize,
& Serre, 2016). Early decoding performance of these
high-level properties in original images is consistent with
a more primitive underlying format—although this infer-
ence is not required by the data.
An alternate temporal prediction is that neural re-
sponses to texforms will show more gradual emergence of
animacy and object size information, increasing steadily
over hundreds of milliseconds. This pattern of data might
emerge if texforms contain only very subtle feature differ-
ences related to animacy and object size, which are not
linearly decodable in an initial feed-forward pass. These
subtle differences may trigger later stages of processing,
which can reformat and amplify the visual input through
more iterative processing steps, so that animacy and
object size information is evident in the structure of the
responses at later time points. Indeed, Grootswagers,
Robinson, Shatek, and Carlson (2019) recently argued
for this possibility. They measured responses to texform
and original images with EEG, using a rapid presentation
design (Grootswagers, Robinson, & Carlson, 2019) in
which they varied the presentation speed of the stimuli.
Considering neural responses to original images, they
found that animacy and size information could be robustly
decoded with presentation rates up to 30 Hz. However,
considering neural responses to texform images, they
found that animacy could only be decoded at the slowest
rate (5 Hz), and size information was not decodable at all.
Based on these results, they argued that texforms can elicit
animacy signatures, but only given sufficient processing
time, and that perhaps higher-order visual areas are
required to further “untangle” these features into linearly
separable categorical organizations (DiCarlo & Cox, 2007).
Here, we also measured EEG responses to both original
and texform images depicting animate and inanimate
objects of big and small real-world sizes. However, we
used a standard event-related paradigm, allowing us to
probe the structure of the neural responses without addi-
tional effects of forward and backward masking. To antic-
ipate, we found that both animacy and size information
could be decoded from EEG responses to texforms, as
early in the time-evolving responses as evoked by original
recognizable images. Moreover, we found that classifiers
trained on neural responses to texform images were able
to predict the animacy and size of responses to original
images, indicating that these two image formats reflect
animacy and object size information through a common
representational basis. Broadly, our results thus support
the view that mid-level feature differences contain signa-
tures of animacy and object size, which are available early
in the visual processing stream.
METHODS
The experimental data and code used in this study can be
found at osf.io/mxrge.
Participants
Participants (n = 19) with normal or corrected-to-normal
vision were recruited from the Harvard University community
(mean age = 27.5 years, range: 20–42 years; 13 women;
one left-handed). This sample size was chosen based on previous
similar studies using EEG decoding (Bae & Luck, 2018;
Grootswagers, Ritchie, et al., 2017). All participants pro-
vided informed consent and received course credits or
financial compensation. We excluded one participant
from further analyses because of excessive movements
and self-reports of discomfort during the experiment.
All procedures were approved by the institutional review
board at Harvard University.
Stimuli and Tasks
The stimulus set consisted of 120 total images with 60
recognizable images of 15 big animals, 15 big objects,
15 small animals, and 15 small objects and their texform
counterparts (Figure 1A), which were created using a modified texture-synthesis process (Freeman &
Simoncelli, 2011). See Long et al. (2018) for detailed
descriptions of stimulus generation. The image set
reflects a stratified randomly selected subset of the
full stimulus set of Long et al. (2018), which consisted
of 240 images.1
Stimuli were presented on a 13-in. LCD monitor (1024 ×
768 pixels; refresh rate = 60 Hz) at a viewing distance of
around 60 cm with a visual angle of 12°, using MATLAB
and Psychophysics Toolbox extensions (Brainard, 1997).
A bullseye-like fixation remained present at the center
of the screen at all times. At the start of each trial, an
image was shown for a 400-msec stimulus presentation.
In the first 100 msec, the image was linearly faded in,
and in the last 83.3 msec, the image was linearly faded
out. We made this choice reasoning that it might reduce abrupt onset and offset signals that could otherwise swamp signatures of later-stage processing. At
image offset, there was a 600-msec blank period before
the subsequent trial began. We instructed the participants
to view the stimulus images attentively while undergoing
EEG recording. To minimize artifacts, we included a
1.5-sec “blinking period” every five trials. During this
period, the fixation dot turned green to signal that participants were encouraged to blink; they were asked to refrain from blinking at all other times.
For each run, all 60 exemplars within a given stimulus
type (original or texform) were shown in randomized
order and repeated 4 times, resulting in 240 trials
(5.32 min). Participants first completed six runs of this
protocol in which they saw texform stimuli, followed by
six runs with original stimuli. The texform runs were all
completed first (rather than alternating with original runs),
because we wanted to avoid the possibility that partici-
pants hypothesized and looked for correspondences
between original images and texform images. This
texform-first procedure was also used in the fMRI design
from Long et al. (2018).
EEG Recording and Preprocessing
Continuous EEG was recorded from 32 Ag/AgCl electrodes
mounted on an elastic cap (EasyCap) and amplified by a
Brain Products ActiCHamp system (Brain Vision).2 The
following scalp sites were used: FP1, FP2, F3, F4, FC1,
FC2, Cz, C3, C4, CP1, CP2, CP5, CP6, P3, P4, P7, P8, POz,
PO3, PO4, PO7, PO8, Oz, O1, O2, Iz, I3, I4. This montage
was arranged according to the 10–10 system with some
modifications. Specifically, three frontal electrodes were
rearranged to have more electrodes over the posterior
occipital pole (Störmer, Alvarez, & Cavanagh, 2014).
Another two sites, T7 and T8, were also recorded but not used because of noisy data. The horizontal electrooc-
ulogram was measured using electrodes positioned at the
external ocular canthi to monitor horizontal eye move-
ments. The vertical electrooculogram was measured at
electrode FP1 to detect eye blinks. All scalp electrodes
were on-line referenced to the average of both mastoids
and digitized at a rate of 500 Hz.
We conducted EEG data preprocessing and analysis
using the MNE-Python package (Gramfort et al., 2014).
First, portions of EEG containing excessive muscle move-
ments were identified by visual inspection and removed.
Continuous signals were then bandpass filtered with cutoff
frequencies of 0.01 Hz and 100 Hz. In the next step, we
applied independent component analysis (ICA) for each
participant to identify and remove components associated
with eye blinks or horizontal eye movements. The ICA-
corrected data were segmented into 1000-msec epochs
from −100 to 900 msec relative to the stimulus onset
and baseline-corrected to the prestimulus period. Finally, automated
artifact rejection was employed to drop and repair
bad epochs using the code package Autoreject ( Jas,
Engemann, Bekhti, Raimondo, & Gramfort, 2017) with
default parameters.
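As a concrete illustration, the sketch below strings together the steps described above (band-pass filtering, ICA-based ocular artifact removal, epoching from −100 to 900 msec, and Autoreject with default parameters) using MNE-Python. It is a minimal sketch, not the authors' code: the file name, number of ICA components, excluded components, and event handling are placeholder assumptions.

```python
# Minimal sketch of the preprocessing described above (MNE-Python + Autoreject).
# File name, ICA settings, and excluded components are illustrative placeholders.
import mne
from mne.preprocessing import ICA
from autoreject import AutoReject

raw = mne.io.read_raw_brainvision("sub-01.vhdr", preload=True)  # hypothetical recording
raw.filter(l_freq=0.01, h_freq=100.0)          # band-pass 0.01-100 Hz

ica = ICA(n_components=20, random_state=0)     # component count is an assumption
ica.fit(raw)
ica.exclude = [0, 1]                           # components identified as ocular by inspection
raw = ica.apply(raw)

events = mne.find_events(raw)                  # assumes stimulus triggers on a stim channel
epochs = mne.Epochs(raw, events, tmin=-0.1, tmax=0.9,
                    baseline=(None, 0), preload=True)  # epochs baselined to the prestimulus period

ar = AutoReject()                              # default parameters, as in the text
epochs_clean = ar.fit_transform(epochs)        # drop or repair bad epochs
```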
Following these preprocessing steps, participants had,
on average, 1373 trials (SD = 77) for texform stimuli and
1358 trials (SD = 85) for original stimuli, with no significant
difference between these two stimulus types, t(17) = .97,
p = .35, paired t test. The number of trials did not differ
across conditions (big animals, big objects, small animals,
and small objects) for either original stimuli, F(3, 51) =
0.96, p = .42, ANOVA, or texform stimuli, F(3, 51) =
1.12, p = .35, ANOVA. We also conducted the main analy-
ses without ICA and autoreject procedures in place and
obtained the same patterns of results.
Decoding Analyses
Category-level Decoding
A linear discriminant analysis classifier was trained to dis-
criminate animate versus inanimate objects based on neu-
ral activation patterns across scalp electrodes, at each time
point. The classifier was implemented with scikit-learn
(Pedregosa et al., 2011) with default parameters (solver:
singular value decomposition with threshold of 1.0e-4).
We conducted decoding analyses on supertrials averaged
across multiple trials rather than on single-trial data. This
procedure is included because previous studies showed
that averaging across several trials can improve the signal-
to-noise ratio (Bae & Luck, 2018; Grootswagers, Wardle,
& Carlson, 2017; Isik, Meyers, Leibo, & Poggio, 2014). In
particular, six supertrials were computed for each stimulus
exemplar by averaging over two to four trials because the
numbers of trials varied across different stimuli after auto-
matic artifact rejection. The number of averaged trials was
determined by the recommendation of Grootswagers,
Wardle, et al. (2017). This procedure yielded 360 supertrials
for recognizable stimuli (i.e., 180 animate and 180 inanimate)
and 360 supertrials for texform stimuli. In addition, we also
conducted data analysis without applying supertrial aver-
aging and observed the same pattern of results.
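For readers who want to mirror this averaging step, a minimal sketch is given below; the array layout (trials × electrodes × time points) and the random grouping of trials into six groups per exemplar are assumptions made for illustration.

```python
# Sketch of supertrial averaging: for each stimulus exemplar, split its trials
# into six random groups and average within each group. Array layout is assumed
# to be trials x electrodes x time points.
import numpy as np

def make_supertrials(X, item_id, n_super=6, seed=0):
    rng = np.random.default_rng(seed)
    super_X, super_item = [], []
    for item in np.unique(item_id):
        idx = rng.permutation(np.where(item_id == item)[0])
        for chunk in np.array_split(idx, n_super):   # groups of roughly 2-4 trials each
            super_X.append(X[chunk].mean(axis=0))
            super_item.append(item)
    return np.stack(super_X), np.array(super_item)
```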
Following standard EEG decoding practices on category
decoding (Grootswagers, Ritchie, et al., 2017; Carlson
et al., 2013; see Grootswagers, Wardle, et al., 2017, for a
method review), we employed independent exemplar
cross-validation (five-fold), which requires the classifier
to generalize to new stimuli. In each fold, the supertrials
for 24 animate stimuli and 24 inanimate stimuli (80%
exemplars) were used to train the classifier, which was
then tested on the supertrials from the remaining six ani-
mate stimuli and six inanimate stimuli (20% exemplars).
For each fold, we measured the area under the curve of
the receiver-operating characteristic (AUC ROC), which
reflects an aggregate measure of performance across all
possible classification thresholds. Size decoding was
computed with a similar logic. Classifiers were trained
to discriminate between 24 big and 24 small stimuli
and tested on the remaining six big and six small stimuli.
In a further analysis to explore tripartite representation
(Konkle & Caramazza, 2013), we conducted size decoding
separately for big versus small animals and for big versus
small inanimate objects.
To ensure the robustness of this AUC ROC estimate, we
iterated the above procedure 20 times to minimize the
idiosyncrasies in supertrial averaging and five-fold strati-
fied splits. After completing all iterations of cross-
validation, the final decoding performance was computed
as the average of the 100 decoding attempts (5 folds ×
20 iterations).
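The sketch below illustrates the core of this analysis: at each time point, a linear discriminant classifier is trained and tested with exemplar-wise cross-validation and scored with ROC AUC. GroupKFold is used here as a stand-in for the exemplar-stratified five-fold splits described above, and the input arrays and labels are illustrative assumptions rather than the authors' code.

```python
# Sketch of per-time-point animacy decoding with LDA and exemplar-wise
# cross-validation, scored with ROC AUC. X: supertrials x electrodes x times;
# y: 0/1 animacy labels; exemplar_id: which of the 60 stimuli each row came from.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

def decode_over_time(X, y, exemplar_id, n_splits=5):
    n_trials, n_channels, n_times = X.shape
    auc = np.zeros(n_times)
    cv = GroupKFold(n_splits=n_splits)             # hold out whole exemplars, not just trials
    for t in range(n_times):
        scores = []
        for train, test in cv.split(X[:, :, t], y, groups=exemplar_id):
            clf = LinearDiscriminantAnalysis(solver="svd", tol=1e-4)
            clf.fit(X[train, :, t], y[train])
            scores.append(roc_auc_score(y[test], clf.decision_function(X[test, :, t])))
        auc[t] = np.mean(scores)
    return auc
```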
Cross-decoding
A similar decoding procedure was followed for the cross-decoding analyses, but classifiers were trained on one stimulus type and tested on the other. For example, in one fold, the classi-
fier was trained using supertrials from 24 animate exem-
plars and 24 inanimate exemplars in their texform format.
Critically, this classifier was then tested with supertrials
from the remaining six animate and six inanimate exem-
plars in their recognizable form. These procedures
ensured that the numbers of trials used for training and testing were exactly the same as in the within-format decoding analyses, so the two analyses are similarly powered.
We also conducted cross-decoding in the opposite direc-
tion (training on recognizable originals, testing on
texforms).
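A minimal sketch of one such cross-decoding fold at a single time point is shown below; the variable names and exemplar bookkeeping are assumptions for illustration. Training uses texform supertrials from the training exemplars, and testing uses original-image supertrials from the held-out exemplars only.

```python
# Sketch of texform-to-original cross-decoding at one time point: train on
# texform supertrials, test on held-out exemplars' original-image supertrials.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

def cross_decode_fold(X_tex, y_tex, ex_tex, X_orig, y_orig, ex_orig,
                      train_exemplars, test_exemplars):
    clf = LinearDiscriminantAnalysis(solver="svd", tol=1e-4)
    train_mask = np.isin(ex_tex, train_exemplars)    # texform trials, training exemplars
    test_mask = np.isin(ex_orig, test_exemplars)     # original trials, held-out exemplars
    clf.fit(X_tex[train_mask], y_tex[train_mask])
    return roc_auc_score(y_orig[test_mask], clf.decision_function(X_orig[test_mask]))
```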
To create a graphical depiction of the similarity struc-
ture in the measured EEG responses, we used the follow-
ing approach. First, electrode patterns were extracted for
each object exemplar at each time point, yielding 60 con-
ditions for recognizable images and 60 conditions for
texform images. Next, we measured the multivariate
noise-normalized Euclidean distance (Guggenmos, Sterzer,
& Cichy, 2018) between EEG patterns of all possible object
pairs. Therefore, a 120 × 120 representational dissimilarity
matrix was obtained for each participant at each time
point. Finally, we used multidimensional scaling (MDS)
to transform the group-averaged EEG representational
dissimilarity matrix at the peak decoding time into a 2-D
space. Note that these plots are purely a supplementary
visualization to provide a graphical intuition of the suc-
cessful cross-decoding results (e.g., the main decoding
analyses were not conducted in this 2-D MDS space).
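The projection step can be reproduced with off-the-shelf metric MDS on a precomputed dissimilarity matrix, as in the sketch below; the file holding the group-averaged 120 × 120 RDM is a hypothetical placeholder.

```python
# Sketch of the MDS visualization step: embed a precomputed 120 x 120
# representational dissimilarity matrix into 2-D for plotting.
import numpy as np
from sklearn.manifold import MDS

rdm = np.load("group_rdm.npy")                     # hypothetical file with the group-averaged RDM
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(rdm)                    # 120 x 2 embedding, one point per image
```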
Pairwise Decoding
To determine the decodability of each object against
others, we estimated the pairwise decoding performance
of all pairs of objects for both original and texform images.
Linear discriminant analysis classifiers were trained and
evaluated with the ROC AUC metric via five iterations of
cross-validation. On each iteration, we trained a classifier
to discriminate between two objects on 80% of trials and
tested on the held-out 20% of trials. Note that no
supertrial averaging was applied here because of the
limited number of trials for each single object stimulus
(original: 22.6 ± 1.4; texform: 22.9 ± 1.3). The final pair-
wise decoding performance at each time point was the
average of all pairwise decoding results across all cross-
validation attempts (1770 pairs × 5 iterations). To save computation time, we downsampled the EEG data with a decimation factor of two.
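The sketch below illustrates this item-level analysis at a single time point, looping over all pairs of stimuli; stratified five-fold splits are used here as a stand-in for the five cross-validation iterations described above, and the array names are assumptions.

```python
# Sketch of pairwise (item-level) decoding at one time point: average LDA
# ROC AUC over every pair of the 120 stimuli, using single trials.
from itertools import combinations
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def pairwise_decoding(X, item_id, t, n_splits=5):
    aucs = []
    for a, b in combinations(np.unique(item_id), 2):
        mask = np.isin(item_id, [a, b])
        Xp, yp = X[mask, :, t], (item_id[mask] == b).astype(int)
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
        fold_scores = []
        for train, test in cv.split(Xp, yp):
            clf = LinearDiscriminantAnalysis(solver="svd", tol=1e-4)
            clf.fit(Xp[train], yp[train])
            fold_scores.append(roc_auc_score(yp[test], clf.decision_function(Xp[test])))
        aucs.append(np.mean(fold_scores))
    return np.mean(aucs)
```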
Statistical Testing
To examine whether the decoding performance was
significantly above chance, we conducted one-sided
Wilcoxon signed-rank tests, which are nonparametric and do not make assumptions about the shape of the
data distribution. When comparing the performance of dif-
ferent conditions of interest, we used two-sided Wilcoxon
signed-rank tests. We conducted these statistical tests
across the time points in a time window of interest
(100–500 msec) and then applied false discovery rate
(FDR) correction ( p < .05). The time window of interest
was set to span the 400-msec stimulus presentation, starting at 100 msec, when the stimuli reached full contrast (stimuli were faded in over the first 100 msec).
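A compact sketch of this group-level testing procedure is given below, assuming a subjects × time points array of decoding accuracies restricted to the window of interest; the use of mne.stats.fdr_correction is one implementation choice for illustration.

```python
# Sketch of the group-level significance test: one-sided Wilcoxon signed-rank
# tests against chance (AUC = 0.5) at each time point, followed by FDR correction.
import numpy as np
from scipy.stats import wilcoxon
from mne.stats import fdr_correction

def test_above_chance(auc_by_subject, chance=0.5, alpha=0.05):
    # auc_by_subject: subjects x time points (within the 100-500 msec window)
    pvals = np.array([
        wilcoxon(auc_by_subject[:, t] - chance, alternative="greater").pvalue
        for t in range(auc_by_subject.shape[1])
    ])
    reject, pvals_fdr = fdr_correction(pvals, alpha=alpha)
    return reject, pvals_fdr
```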
The latency of decoding onset was defined as the first
time point with above-chance decoding ( p < .01, uncor-
rected) for three consecutive time points; this approach
was adapted from several previous studies (Robinson,
Grootswagers, & Carlson, 2019; Cichy et al., 2014; Carlson
et al., 2013). Note that in this procedure, correction for multiple comparisons is not applied, so that the estimation of onset
latency does not depend on the decoding performance
of later time points. The time of peak decoding was defined
as the time point with maximum performance within the
time window of interest (100–500 msec). When there were multiple local maxima within the window, the first was selected.
We assessed the median and confidence interval of the onset and peak latencies using bootstrap sampling (with replacement) with 5000 iterations (for a similar analysis, see Robinson et al., 2019; Cichy, Pantazis, & Oliva, 2016; Cichy et al., 2014). To test differences in onset and peak latencies, we estimated p values based on the bootstrapped distributions. These results were corrected for multiple comparisons using FDR correction at a significance level of p < .05.
RESULTS
Animate versus Inanimate Decoding
First, we examined whether recognizable images of ani-
mate and inanimate objects evoked distinguishable spatial
EEG patterns over time, as has been previously shown
(e.g., Khaligh-Razavi et al., 2018; Grootswagers, Wardle,
et al., 2017; Ritchie, Tovar, & Carlson, 2015; Carlson
et al., 2013). Figure 1B (solid silver line) shows a plot of
decoding accuracy as a function of time for original
images. Consistent with previous work, we observed a
robust ability to classify animacy information: The spatial
topographies of the EEG responses elicited by animate and inanimate recognizable images were distinguishable
from each other ( ps < .05, one-sided signed-rank test,
FDR corrected), with significant onset at 126 msec (95%
CI [116, 142] msec) and peak classification accuracy at
188 msec (95% CI [184, 200] msec).
Next, we investigated (i) whether unrecognizable tex-
form images of animate and inanimate objects evoke dis-
tinct spatial EEG patterns, and if so, (ii) at what time these
distinctions emerge relative to the recognizable image
counterparts. The same classification analysis as above
was performed but considering only responses to texform
images (Figure 1B, solid black line). Animate and inani-
mate texforms elicited different EEG patterns, with an
onset of significant decoding at 152 msec (95% CI [106,
164] msec) and an early classification peak at 176 msec
(95% CI [146, 190] msec). The onset and peak latencies
of decoding for texform images did not significantly differ
from those for recognizable images (onset: p = .34, peak:
p = .14, bootstrapping test, FDR corrected). Critically, ani-
macy decoding did not emerge only after several hundred milliseconds, as would be predicted if extra processing
time was needed to extract and/or amplify animacy infor-
mation from texform images. However, animacy decoding
did have a lower accuracy for texforms in comparison to
original images (non-independent peak decoding accu-
racy: original 74.46% vs. texform 56.01%, p < .001, two-sided
signed-rank test). Overall, these results indicate that the
mid-level feature content preserved in texform images con-
tains early perceptual signatures of animacy information.
Are the features that support the animacy distinction in
texforms the same as those supporting animacy decoding
in original recognizable images? If this is the case, both tex-
forms and original images should evoke the same topo-
graphical differences that distinguish between animate
and inanimate objects. To test this possibility, we con-
ducted cross-decoding analyses in which we trained clas-
sifiers to discriminate EEG responses to animate versus
inanimate texform images, and then tested the classifiers
on responses to animate and inanimate original images.
To ensure that classifiers were generalizing to new exemplars, the original images used for testing were never the counterparts of the texforms used to train the classifier. As
shown in Figure 1B (dashed gray line), we found that
texform-trained classifiers could successfully classify
whether a new recognizable object was animate or inani-
mate ( ps < .05, one-sided signed-rank test, FDR cor-
rected). Such successful decoding was also evident early
(onset: 140 msec, 95% CI [114, 152] msec; peak: 176 msec,
95% CI [174, 192] msec), with no significant difference in
time to original images (onset: p = .44, peak: p = .34,
bootstrapping test, FDR corrected) or texform images
(onset: p = .14, peak: p = .46, bootstrapping testing,
FDR corrected). Moreover, we also observed similar
results when conducting the cross-decoding in the oppo-
site direction (training on recognizable originals and test-
ing on texforms). Thus, the classification boundary
between animate and inanimate texforms also separates
the animate and inanimate recognizable images, demon-
strating the activation patterns are similar between these
image formats.
Unexpectedly, we found that texform-trained classifiers
could predict the animacy more accurately for recognizable
images than for other texform images (non-independent
peak decoding: texform-original 63.4% vs. texform-texform
56.0%, p < .001, two-tailed signed-rank test). How is this
superior classification accuracy possible? One possibility
is that the original images evoke more discriminable neu-
ral responses than texforms, while still sharing a common
large-scale topographic decision boundary. Consistent
with this possibility, Figure 1B (right) provides a graphi-
cal intuition for this explanation. This MDS plot visualizes
the neural pattern similarity structure among the original
images (open dots) and texform images (filled dots) at
the peak cross-decoding time (176 msec), such that
items with similar neural response patterns are nearby
in the plot. Note that there is a general separation
between animates (purple dots) and inanimates (green
dots) across both texforms and originals. Furthermore,
the texform images (filled dots) are closer to each other;
in contrast, recognizable images (open dots) are more
distinctive and farther apart in this visualization. Thus,
this visualization helps provide an intuition for how orig-
inal images can be classified more accurately than tex-
form images by a texform-trained classifier.
A second piece of evidence also supports the interpre-
tation that the original images evoke more separable, dis-
tinctive neural responses than those evoked by texforms.
Specifically, we estimated the discriminability of responses
at the item level, estimating the average pairwise decoding
accuracy over all pairs of items. Figure 2 shows that pair-
wise decoding accuracy is significantly higher for original
images than for texform images ( ps < .05, two-sided
signed-rank test, FDR corrected). Thus, we reason that,
to the degree that both texforms and original images
Figure 2. Time course of pairwise decoding. Pairwise decoding
performance averaged across all object pairs is plotted along the y axis,
as a function of time (x axis), for originals (silver line) and texforms
(black line); shaded region indicates 95% CI. Time points with
significant difference between original and texform stimuli are depicted
below the time courses (two-sided signed-rank test, ps < .05, FDR
corrected in the time window of interest, 100–500 msec).
evoke similar patterns of neural responses that share a
common decision boundary, the original images should
be more easily classifiable because of their more dis-
tinctive evoked brain responses. In this way, this
cross-decoding result provides strong evidence that the
differences in elicited spatial topography that reflect the animacy distinction in texforms are highly compatible with those that distinguish recognizable animals from objects.
Big versus Small Decoding
Next, we examined evoked differences between big and
small entities. Overall, we observed a similar pattern of results, but with weaker overall decoding accuracy, plot-
ted in Figure 1C (left). There was a significant difference
between the elicited EEG response patterns to original
images depicting big entities and small entities ( ps <
.05, one-sided signed-rank test, FDR corrected), as well
as for texform images ( ps < .05, one-sided signed-rank
test, FDR corrected). The timing of this emerging size dis-
tinction was also early in the response: neither decoding
onsets nor decoding peaks for texform responses (onset:
130 msec, 95% CI [120, 246] msec; peak: 150 msec, 95% CI
[114, 162] msec) were significantly different from those for
original images (onset: 120 msec, 95% CI [114, 132] msec,
p = .38; peak: 174 msec, 95% CI [110, 194] msec, p = .88,
bootstrapping test, FDR corrected), although we note the
lower accuracy is also accompanied by less confident
estimates of the onset. Furthermore, we found significant
cross-decoding evident in classifiers trained on texform
images and tested on original images ( ps < .05, one-
sided signed-rank test, FDR corrected), also evident
early in time (onset: 124 msec, 95% CI [120, 128] msec;
peak: 140 msec, 95% CI [130, 172] msec), with no signif-
icant difference in timing to original images (onset: p =
.38, peak: p = .88, bootstrapping test, FDR corrected) or
to texform images (onset: p = .38; peak: p = .88, boot-
strapping test, FDR corrected). In summary, the above
results demonstrate systematic (albeit weak) differences
in neural responses to texformed versions of big and
small images, evident early in the time course of process-
ing, with compatible EEG response structure as evoked
by original images.
We next conducted further analysis to assess size decod-
ing separately for the animate and inanimate domains,
motivated by previous work with fMRI by Konkle and
Caramazza (2013). In particular, the spatial activations of
ventral visual cortex exhibit three large-scale cortical zones
preferentially responding to big inanimate objects, small
inanimate objects, and animals (of both sizes). That is,
there were similar spatial activation patterns for big and
small animals (Konkle & Caramazza, 2013). Thus, we next
examined the degree to which this “tripartite” signature
was also apparent in the decoding of EEG responses.
Given these previous findings from fMRI, we expected size
decoding to be stronger among inanimate objects than
among animals.
The results are shown in Figure 3 (top). We found that
size information was decodable from responses evoked by
inanimate objects, and by animate objects, for both origi-
nals and texforms (all ps < .05, one-sided signed-rank
test, FDR corrected). However, size decoding from
responses to animal images was actually stronger than size
decoding from responses to object images, contrary to
what we expected ( ps < .05, two-sided signed-rank test,
FDR corrected). Note that this pattern of results held in
both texform and original images. Considering the time
course of this size decoding, responses to big versus small
animals show an earlier and more rapid rise in their clas-
sifiability, whereas responses to big versus small inanimate
objects show a slower and more gradual separability.
Figure 3. Decoding size among animate or inanimate objects only, for (A) original images and (B) texform images. In both plots (upper),
classification accuracy ( y axis) is plotted as a function of time (x axis). Purple line: big animals versus small animals. Green line: big objects versus
small objects. Gray line: the combined classification for animates and inanimates is plotted for reference, which corresponds to the silver (original) or
black (texform) line in Figure 1C. Time points with significant decoding are depicted above the time courses ( ps < .05, one-sided signed-rank test,
FDR corrected in the time window of interest, 100–500 msec), and time points with significant difference between original and texform stimuli are
depicted below the time courses ( ps < .05, two-sided signed-rank test, FDR corrected). Below the line plots are MDS visualizations, with a 2-D
projection of the pairwise distances of the neural responses to animate objects only or inanimate objects only, examined at the peak cross-decoding time (140 msec).
Using MDS, we visualized the EEG pattern similarity
structure among animals and among inanimate objects,
separately for original and texform images (Figure 3 bot-
tom). Specifically, we visualized the similarity structure
evident at 140 msec, when the texform-to-original cross-
decoding showed peak performance. In line with the
decoding results, this visualization shows that the separa-
tion between big and small objects is clearer for animate
objects in comparison to inanimate objects (for both orig-
inal and texform images). Thus, these EEG size decoding
results reveal a notable difference between the scalp-
electrode response patterns over time and the large-scale
cortical activation patterns along the ventral pathway, which aggregate responses over time. We speculate on the underlying
causes of these patterns of data in the Discussion section.
DISCUSSION
Here, we employed multivariate EEG decoding to examine
whether and when the visual system is sensitive to mid-
level feature differences related to the broad distinctions
of animacy and real-world size. We used a well-established
stimulus set that includes recognizable images of big and
small animals and objects, as well as their unrecognizable
“texform” counterparts (Long et al., 2016, 2017, 2018). We
found that: (1) neural responses measured by EEG to tex-
form images contained early information about animacy
and size, as evidenced by above-chance decoding accuracy.
(2) This broad categorical information was decodable
from evoked responses to texforms at a similar time as
from evoked responses to recognizable original images.
(3) In addition, the time-evolving activation patterns were
similar between these image formats, as evidenced by signif-
icant cross-decoding, suggesting a common underlying
basis. Broadly, these EEG results indicate that the visual
system contains an extensive mid-level feature bank, with
early sensitivity to mid-level feature differences supporting
animacy and size distinctions.
These patterns of data, and our subsequent interpreta-
tions, offer a different perspective than recent work by
Grootswagers, Robinson, Shatek, et al. (2019). Specifically,
Grootswagers, Robinson, Shatek, et al. (2019) also
explored whether animacy and size could be decoded from tex-
form images, but they employed a fast image presentation
paradigm (Grootswagers, Robinson, & Carlson, 2019)
in which the presentation rate was varied from 5 Hz to
60 Hz—differing from our slow event-related design. In
their data, texforms elicited brain response structure with
weaker decoding of animacy information than recogniz-
able objects, and only at the slowest presentation rate.
Based on these results, they proposed that additional pro-
cessing time in higher order visual areas is required to
further “untangle” the mid-level feature differences
evident in texforms into linearly separable categorical
organizations (cf. DiCarlo & Cox, 2007). In contrast, we
propose that no further “untangling” is required for ani-
macy and object size information to emerge.
To reconcile our findings with Grootswagers, Robinson,
Shatek, et al. (2019), we offer the following possibility. We
propose that the visual system contains a mid-level feature
bank that carries linearly decodable information on ani-
macy and size. Texforms and original images rapidly
activate this feature bank in a primarily feedforward
processing sweep, enabling early decoding. However,
perhaps when stimuli are presented in rapid succession
without gap time in between, as in Grootswagers, Robinson,
Shatek, et al. (2019), the recurrent/feedback activity from
the previous stimulus interferes with the early processing
stages of the incoming stimulus. For example, back-to-
back presentations have been reported to elicit smaller
periodic signals (Retter, Jiang, Webster, & Rossion, 2018)
and delayed neural responses (Robinson et al., 2019) in
comparison to presentation schedules with gap time
between successive stimuli. We also observed that texforms do not elicit the same magnitude of feature activation as original images—this is evident in their generally lower decoding accuracy, both at the cate-
gory and item-level, and is also found in neuroimaging
results (Long et al., 2018). Thus, these responses may
be more likely to be extinguished under conditions of
forward masking, leading to accentuated differences
between original and texform images. In sum, rather than
requiring more untangling time, our proposal accounts
for the similarities between texforms and originals seen
in our study at early time points, and instead posits
increased susceptibility to forward masking during rapid
texform presentation.
One other pattern of these data was that animacy
decoding was more accurate from neural responses to
original stimuli than to texform stimuli—what factors
might underlie this accuracy difference? One possibility
is that the original stimuli have additional mid-level visual
features not captured by the texform generation algorithm
(e.g., clear outer and inner contours). It is important to
keep in mind that texforms preserve some mid-level visual
features related to second-order image statistics in local-
ized pooling regions, but these are not necessarily a per-
fect model of mid-level visual representation. Relatedly,
another possibility is that decoding was higher for original
images because they contain additional category-specific
object parts that are not present in texforms. For example,
animals often have tails, eyes, and noses, and these object
parts are obscured in the texform images. Finally, partici-
pant attention may have differed between these two sets
of stimuli, as recognizable original stimuli may better cap-
ture attention than texform stimuli. These possibilities are
not mutually exclusive. Further studies are needed to
determine what stimulus properties and task effects
account for the animacy decoding gap between original
and texform stimuli.
How do the current real-world size decoding results
relate to previous fMRI work? Specifically, Konkle and
Caramazza (2013) found that big and small object images
evoked a large-scale organization of responses across the
cortical surface, whereas big versus small animals had sim-
ilar response topographies. We expected that EEG decod-
ing accuracy would also reflect this tripartite organization,
but that is not what we found. We can rule out the possi-
bility that the distinction between big and small animals
was driven by the detection of recognizable eyes or frontal
faces, because this result was also evident in the texform
images, which lack clear facial features. One possibility,
invited by the time-course of decoding, is that the neural
populations that distinguish between big and small
animals are only engaged early and transiently, and their
responses may not be evident in slower aggregated
responses of fMRI. This spatial–temporal hypothesis may be
possible to explore through fMRI-magnetoencephalography
fusion (Khaligh-Razavi et al., 2018; Cichy et al., 2016),
electrocorticography, or neural recordings in monkey
populations. More generally, these results highlight the
need for a deeper exploration of the convergences and
discrepancies between the spatial similarity structure of
neural activation patterns over EEG electrodes, and
BOLD-estimated activations over cortical voxels.
Although texform and original stimuli both quickly
evoked neural responses with information on animacy
and size, one limitation of this study relates to the preci-
sion with which we could measure the onset latencies of
these decoding results. In some cases, the onset latency
of decoding had a 95% confidence interval spanning sev-
eral tens of milliseconds, making it difficult to detect subtle
differences in the timing of decoding results. Such ranges
of variability could arise from individual differences in
decoding time course and have also been observed in
other studies that have reported the confidence intervals
of onset latency (Robinson et al., 2019; Cichy et al., 2014,
2016). Because of this variability, we interpret the onset times with some caution. In this study, the
timing of animacy and size decoding for both original
and texform stimuli is compatible with an early, primarily
feedforward stage of processing, rather than protracted
recurrent processing evolving over hundreds of millisec-
onds. However, subtle differences between texforms and
original images on the order of tens of milliseconds may
not have been revealed by our methods. Another limita-
tion is that the number of stimuli employed was relatively
small (n = 60, 15 per animacy-size combination), leaving
open the possibility that these randomly selected exem-
plars may not be fully representative of the broader
categories they were sampled from. In our analyses, we
leveraged cross-validation methods that require predict-
ing animacy and size in held-out stimuli, mitigating this
concern with an analytical approach.
This work joins a growing set of results showing the
tight links between original and texformed counterparts
in perceptual processes (e.g., Chen, Deza, & Konkle,
2022; Long et al., 2016, 2017, 2018) and more generally
between mid-level feature distinctions and broader cate-
gorical distinctions (Groen, Silson, & Baker, 2017). Overall,
this work provides clear support for the claim that early
visual processes operating over mid-level features con-
tain information about the broad categorical distinctions
of animacy and object size.
Acknowledgments
We thank Aylin Kallmayer and Hrag Pailian for their help during
the experiments and data collection. This work was supported
by NSF CAREER BCS-1942438 (T. K.).
Reprint requests should be sent to Ruosi Wang or Daniel Janini,
Department of Psychology, Harvard University, 33 Kirkland St., 7th floor, Cambridge, MA 02138, United States, or via e-mail:
wang.ruosi@outlook.com or janinidp@gmail.com.
Author Contributions
R. W., D. J., and T. K. designed research. R. W. and D. J.
performed research. R. W. analyzed data. R. W., D. J.,
and T. K. interpreted the results. R. W. and T. K. wrote
the first draft of the paper. R. W., D. J., and T. K. edited
the paper. R. W. and D. J. contributed unpublished ana-
lytic tools.
Funding Information
Talia Konkle, Division of Behavioral and Cognitive Sci-
ences (https://dx.doi.org/10.13039/100000169), grant
number: CAREER BCS-1942438.
Diversity in Citation Practices
Retrospective analysis of the citations in every article pub-
lished in this journal from 2010 to 2021 reveals a persistent
pattern of gender imbalance: Although the proportions of
authorship teams (categorized by estimated gender iden-
tification of first author/last author) publishing in the Jour-
nal of Cognitive Neuroscience ( JoCN) during this period
were M(an)/M = .407, W(oman)/M = .32, M/ W = .115,
and W/ W = .159, the comparable proportions for the arti-
cles that these authorship teams cited were M/M = .549,
W/M = .257, M/ W = .109, and W/ W = .085 (Postle and
Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encour-
ages all authors to consider gender balance explicitly when
selecting which articles to cite and gives them the oppor-
tunity to report their article’s gender citation balance.
Notes
1. In the initial stimulus set from Long et al. (2018), there
were 120 texforms total, 30 for each of the animacy × size con-
ditions. These 30 images were further split into six groups
based on their level of classifiability, reflecting how well inde-
pendent participants could guess whether the texform was
animate/inanimate and big/small. We used a subset of these
stimuli, randomly selecting exemplars from each level of classifia-
bility: three exemplars from each group with highest, high, and
medium–high classifiability; and two exemplars from each
group with medium–low, low, and lowest classifiability.
2. Early in piloting, we tested this paradigm both with our 64-
channel EEG system and with a custom channel configuration
with more electrodes over the visual cortex. These equipment
changes did not yield any differences in the overall pattern of
our pilot data. Thus, we opted for the 32-channel system because
the setup time was much shorter, which enabled us to increase
the power per subject within the limited duration of an EEG
experimental session.
REFERENCES
Bae, G.-Y., & Luck, S. J. (2018). Dissociable decoding of spatial
attention and working memory from EEG oscillations and
sustained potentials. Journal of Neuroscience, 38, 409–422.
https://doi.org/10.1523/JNEUROSCI.2860-17.2017, PubMed:
29167407
Baldassi, C., Alemi-Neissi, A., Pagan, M., DiCarlo, J. J., Zecchina,
R., & Zoccolan, D. (2013). Shape similarity, better than
semantic membership, accounts for the structure of visual
object representations in a population of monkey
inferotemporal neurons. PLoS Computational Biology, 9,
e1003167. https://doi.org/10.1371/journal.pcbi.1003167,
PubMed: 23950700
Bao, P., She, L., McGill, M., & Tsao, D. Y. (2020). A map of
object space in primate inferotemporal cortex. Nature, 583,
103–108. https://doi.org/10.1038/s41586-020-2350-5, PubMed:
32494012
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision,
10, 433–436. https://doi.org/10.1163/156856897X00357,
PubMed: 9176952
Carlson, T., Tovar, D. A., Alink, A., & Kriegeskorte, N. (2013).
Representational dynamics of object vision: The first 1000 ms.
Journal of Vision, 13, 1. https://doi.org/10.1167/13.10.1,
PubMed: 23908380
Cauchoix, M., Crouzet, S. M., Fize, D., & Serre, T. (2016).
Fast ventral stream neural activity enables rapid visual
categorization. Neuroimage, 125, 280–290. https://doi.org/10
.1016/j.neuroimage.2015.10.012, PubMed: 26477655
Chen, Y.-C., Deza, A., & Konkle, T. (2022). How big should this
object be? Perceptual influences on viewing-size preferences.
Cognition, 225, 105114. https://doi.org/10.1016/j.cognition
.2022.105114, PubMed: 35381479
Cichy, R. M., Pantazis, D., & Oliva, A. (2014). Resolving human
object recognition in space and time. Nature Neuroscience,
17, 455–462. https://doi.org/10.1038/nn.3635, PubMed:
24464044
Cichy, R. M., Pantazis, D., & Oliva, A. (2016). Similarity-based
fusion of MEG and fMRI reveals spatio-temporal dynamics in
human cortex during visual object recognition. Cerebral
Cortex, 26, 3563–3579. https://doi.org/10.1093/cercor
/bhw135, PubMed: 27235099
DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object
recognition. Trends in Cognitive Sciences, 11, 333–341.
https://doi.org/10.1016/j.tics.2007.06.010, PubMed: 17631409
Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral
stream. Nature Neuroscience, 14, 1195–1201. https://doi.org
/10.1038/nn.2889, PubMed: 21841776
Gramfort, A., Luessi, M., Larson, E., Engemann, D. A.,
Strohmeier, D., Brodbeck, C., et al. (2014). MNE software for
processing MEG and EEG data. Neuroimage, 86, 446–460.
https://doi.org/10.1016/j.neuroimage.2013.10.027, PubMed:
24161808
Grill-Spector, K., & Weiner, K. S. (2014). The functional
architecture of the ventral temporal cortex and its role in
categorization. Nature Reviews Neuroscience, 15, 536–548.
https://doi.org/10.1038/nrn3747, PubMed: 24962370
Groen, I. I. A., Silson, E. H., & Baker, C. I. (2017). Contributions
of low- and high-level properties to neural processing of
visual scenes in the human brain. Philosophical Transactions
of the Royal Society B: Biological Sciences, 372, 20160102.
https://doi.org/10.1098/rstb.2016.0102, PubMed: 28044013
Grootswagers, T., Ritchie, J. B., Wardle, S. G., Heathcote, A.,
& Carlson, T. A. (2017). Asymmetric compression of
representational space for object animacy categorization
under degraded viewing conditions. Journal of Cognitive
Neuroscience, 29, 1995–2010. https://doi.org/10.1162/jocn_a
_01177, PubMed: 28820673
Grootswagers, T., Robinson, A. K., & Carlson, T. A. (2019). The
representational dynamics of visual objects in rapid serial
visual processing streams. Neuroimage, 188, 668–679.
https://doi.org/10.1016/j.neuroimage.2018.12.046, PubMed:
30593903
Grootswagers, T., Robinson, A. K., Shatek, S. M., & Carlson,
T. A. (2019). Untangling featural and conceptual object
representations. Neuroimage, 202, 116083. https://doi.org
/10.1016/j.neuroimage.2019.116083, PubMed: 31400529
Grootswagers, T., Wardle, S. G., & Carlson, T. A. (2017).
Decoding dynamic brain patterns from evoked responses: A
tutorial on multivariate pattern analysis applied to time series
neuroimaging data. Journal of Cognitive Neuroscience, 29,
677–697. https://doi.org/10.1162/jocn_a_01068, PubMed:
27779910
Guggenmos, M., Sterzer, P., & Cichy, R. M. (2018). Multivariate
pattern analysis for MEG: A comparison of dissimilarity
measures. Neuroimage, 173, 434–447. https://doi.org/10
.1016/j.neuroimage.2018.02.044, PubMed: 29499313
Isik, L., Meyers, E. M., Leibo, J. Z., & Poggio, T. (2014). The
dynamics of invariant object recognition in the human visual
system. Journal of Neurophysiology, 111, 91–102. https://doi
.org/10.1152/jn.00394.2013, PubMed: 24089402
Jagadeesh, A. V., & Gardner, J. L. (2022). Texture-like
representation of objects in human visual cortex.
Proceedings of the National Academy of Sciences, 119,
e2115302119. https://doi.org/10.1073/pnas.2115302119,
PubMed: 35439063
Jas, M., Engemann, D. A., Bekhti, Y., Raimondo, F., & Gramfort,
A. (2017). Autoreject: Automated artifact rejection for MEG
and EEG data. Neuroimage, 159, 417–429. https://doi.org/10
.1016/j.neuroimage.2017.06.030, PubMed: 28645840
Jozwik, K. M., Kriegeskorte, N., & Mur, M. (2016). Visual
features as stepping stones toward semantics: Explaining
object similarity in IT and perception with non-negative
least squares. Neuropsychologia, 83, 201–226. https://doi
.org/10.1016/j.neuropsychologia.2015.10.023, PubMed:
26493748
Julian, J. B., Ryan, J., & Epstein, R. A. (2017). Coding of object
size and object category in human visual cortex. Cerebral
Cortex, 27, 3095–3109. https://doi.org/10.1093/cercor
/bhw150, PubMed: 27252351
Kaneshiro, B., Perreau Guimaraes, M., Kim, H.-S., Norcia, A. M.,
& Suppes, P. (2015). A representational similarity analysis of
the dynamics of object processing using single-trial EEG
classification. PLoS One, 10, e0135697. https://doi.org/10.1371
/journal.pone.0135697, PubMed: 26295970
Khaligh-Razavi, S.-M., Cichy, R. M., Pantazis, D., & Oliva, A.
(2018). Tracking the spatiotemporal neural dynamics of
real-world object size and animacy in the human brain.
Journal of Cognitive Neuroscience, 30, 1559–1576. https://
doi.org/10.1162/jocn_a_01290, PubMed: 29877767
Konkle, T., & Caramazza, A. (2013). Tripartite organization of
the ventral stream by animacy and object size. Journal of
Neuroscience, 33, 10235–10242. https://doi.org/10.1523
/JNEUROSCI.0983-13.2013, PubMed: 23785139
Konkle, T., & Oliva, A. (2012). A real-world size organization of
object responses in occipitotemporal cortex. Neuron, 74,
1114–1124. https://doi.org/10.1016/j.neuron.2012.04.036,
PubMed: 22726840
Long, B., Konkle, T., Cohen, M. A., & Alvarez, G. A. (2016).
Mid-level perceptual features distinguish objects of different
real-world sizes. Journal of Experimental Psychology:
General, 145, 95–109. https://doi.org/10.1037/xge0000130,
PubMed: 26709591
Long, B., Störmer, V. S., & Alvarez, G. A. (2017). Mid-level
perceptual features contain early cues to animacy. Journal of
Vision, 17, 20. https://doi.org/10.1167/17.6.20, PubMed:
28654965
Long, B., Yu, C.-P., & Konkle, T. (2018). Mid-level visual features
underlie the high-level categorical organization of the ventral
stream. Proceedings of the National Academy of Sciences,
U.S.A., 115, E9015–E9024. https://doi.org/10.1073/pnas
.1719616115, PubMed: 30171168
Mahon, B. Z., Anzellotti, S., Schwarzbach, J., Zampini, M., &
Caramazza, A. (2009). Category-specific organization in the
human brain does not require visual experience. Neuron, 63,
397–405. https://doi.org/10.1016/j.neuron.2009.07.012,
PubMed: 19679078
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion,
B., Grisel, O., et al. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research, 12,
2825–2830.
Peelen, M. V., & Downing, P. E. (2017). Category selectivity in
human visual cortex: Beyond visual object recognition.
Neuropsychologia, 105, 177–183. https://doi.org/10.1016/j
.neuropsychologia.2017.03.033, PubMed: 28377161
Proklova, D., Kaiser, D., & Peelen, M. V. (2016). Disentangling
representations of object shape and object category in
human visual cortex: The animate–inanimate distinction.
Journal of Cognitive Neuroscience, 28, 680–692. https://doi
.org/10.1162/jocn_a_00924, PubMed: 26765944
Retter, T. L., Jiang, F., Webster, M. A., & Rossion, B. (2018).
Dissociable effects of inter-stimulus interval and presentation
duration on rapid face categorization. Vision Research, 145,
11–20. https://doi.org/10.1016/j.visres.2018.02.009, PubMed:
29581059
Ritchie, J. B., Tovar, D. A., & Carlson, T. A. (2015). Emerging
object representations in the visual system predict reaction
times for categorization. PLoS Computational Biology, 11,
e1004316. https://doi.org/10.1371/journal.pcbi.1004316,
PubMed: 26107634
Ritchie, J. B., Zeman, A. A., Bosmans, J., Sun, S., Verhaegen, K.,
& Op de Beeck, H. P. (2021). Untangling the animacy
organization of occipitotemporal cortex. Journal of
Neuroscience, 41, 7103–7119. https://doi.org/10.1523
/JNEUROSCI.2628-20.2021, PubMed: 34230104
Robinson, A. K., Grootswagers, T., & Carlson, T. A. (2019). The
influence of image masking on object representations during
rapid serial visual presentation. Neuroimage, 197, 224–231.
https://doi.org/10.1016/j.neuroimage.2019.04.050, PubMed:
31009746
Störmer, V. S., Alvarez, G. A., & Cavanagh, P. (2014). Within-
hemifield competition in early visual areas limits the ability
to track multiple objects with attention. Journal of
Neuroscience, 34, 11526–11533. https://doi.org/10.1523
/JNEUROSCI.0980-14.2014, PubMed: 25164651
Thorat, S., Proklova, D., & Peelen, M. V. (2019). The nature of
the animacy organization in human ventral temporal cortex.
eLife, 8, e47142. https://doi.org/10.7554/eLife.47142, PubMed:
31496518
Vinken, K., Konkle, T., & Livingstone, M. (2022). The neural
code for ‘face cells’ is not face specific. bioRxiv. https://doi
.org/10.1101/2022.03.06.483186
Yue, X., Robert, S., & Ungerleider, L. G. (2020). Curvature
processing in human visual cortical areas. Neuroimage, 222,
117295. https://doi.org/10.1016/j.neuroimage.2020.117295,
PubMed: 32835823