RESEARCH ARTICLE

Evidence for a Spoken Word Lexicon in the
Auditory Ventral Stream

Srikanth R. Damera1, Lillian Chang1, Plamen P. Nikolov1, James A. Mattei1,
Suneel Banerjee1, Laurie S. Glezer2, Patrick H. Cox1, Xiong Jiang1,
Josef P. Rauschecker1, and Maximilian Riesenhuber1

1Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA
2Department of Speech, Language, and Hearing Sciences, San Diego State University, San Diego, California, USA

Keywords: auditory lexicon, auditory ventral stream, speech recognition, superior temporal gyrus

ABSTRACT

The existence of a neural representation for whole words (i.e., a lexicon) is a common feature
of many models of speech processing. Prior studies have provided evidence for a visual
lexicon containing representations of whole written words in an area of the ventral visual
stream known as the visual word form area. Similar experimental support for an auditory
lexicon containing representations of spoken words has yet to be shown. Using functional
magnetic resonance imaging rapid adaptation techniques, we provide evidence for an
auditory lexicon in the auditory word form area in the human left anterior superior temporal
gyrus that contains representations highly selective for individual spoken words. Furthermore,
we show that familiarization with novel auditory words sharpens the selectivity of their
representations in the auditory word form area. These findings reveal strong parallels in how
the brain represents written and spoken words, showing convergent processing strategies
across modalities in the visual and auditory ventral streams.

An open access journal

Citation: Damera, S. R., Chang, L., Nikolov, P. P., Mattei, J. A., Banerjee, S., Glezer, L. S., Cox, P. H., Jiang, X., Rauschecker, J. P., & Riesenhuber, M. (2023). Evidence for a spoken word lexicon in the auditory ventral stream. Neurobiology of Language, 4(3), 420–434. https://doi.org/10.1162/nol_a_00108

DOI: https://doi.org/10.1162/nol_a_00108

Supporting Information: https://doi.org/10.1162/nol_a_00108

Received: 12 December 2022
Accepted: 27 April 2023

Competing Interests: The authors have declared that no competing interests exist.

Corresponding Author: Srikanth R. Damera, srd49@georgetown.edu

Handling Editor: Sophie Scott

Copyright: © 2023 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press
INTRODUCTION

Speech perception is perhaps the most remarkable achievement of the human auditory system
and one that likely is critically dependent on its overall cortical architecture. Es generalmente
accepted that the functional architecture of auditory cortex in human and nonhuman primates
comprises two processing streams (Hickok & Poeppel, 2007; Rauschecker & Scott, 2009;
Rauschecker & Tian, 2000). There is an auditory dorsal stream that is involved in the process-
ing of auditory space and motion (van der Heijden et al., 2019) as well as in sensorimotor
transformations such as those required for speech production (Archakov et al., 2020; Hickok
et al., 2011; Rauschecker, 2011, 2018). There is also an auditory ventral stream specialized for
recognizing auditory objects such as spoken words. This stream is organized along a simple-
to-complex feature hierarchy (Rauschecker & Scott, 2009), akin to the organization of the
visual ventral stream (Kravitz et al., 2013).

Visual object recognition studies support a simple-to-complex model of cortical visual pro-
cessing in which neuronal populations in the visual ventral stream are selective for increas-
ingly complex features and ultimately visual objects along a posterior-to-anterior gradient
extending from lower-to-higher-order visual areas (Felleman & Van Essen, 1991; Hubel &
Wiesel, 1977). For the special case of recognizing written words, this simple-to-complex
model predicts that progressively more anterior neuronal populations are selective for
increasingly complex orthographic patterns (Dehaene et al., 2005; Vinckier et al., 2007). Thus,
analogous to general visual processing, orthographic word representations are predicted to
culminate in representations of whole visual words—an orthographic lexicon. Evidence sug-
gests that these lexical representations are subsequently linked to concept representations in
downstream areas like the anterior temporal lobe (Damera et al., 2020; Lambon Ralph et al.,
2017; Liuzzi et al., 2019; Malone et al., 2016). The existence of this orthographic lexicon in
the brain is predicted by neuropsychological studies of reading (Coltheart, 2004). Indeed,
functional magnetic resonance imaging (fMRI; Glezer et al., 2009, 2015) and, more recently,
electrocorticographic data (Hirshorn et al., 2016; Woolnough et al., 2021) have confirmed the
existence of such a lexicon in a region of the posterior fusiform cortex known as the visual
word form area (VWFA; Dehaene & Cohen, 2011; Dehaene et al., 2005).

It has been proposed that an analogous simple-to-complex hierarchy exists in the auditory
ventral stream as well (Kell et al., 2018; Rauschecker, 1998) extending anteriorly from
Heschl’s gyrus along the superior temporal cortex (STC; DeWitt & Rauschecker, 2012;
Rauschecker & Scott, 2009). Yet, the existence and location of a presumed auditory lexicon,
that is, a neural representation for the recognition (and storage) of real words, has been quite
controversial (Bogen & Bogen, 1976): The traditional posterior view is that the auditory lexicon
should be found in posterior STC (pSTC; Geschwind, 1970). In contrast, a notable meta-
analysis (DeWitt & Rauschecker, 2012) provided strong evidence for the existence of word-
selective auditory representations in anterior STC (aSTC), consistent with imaging studies of
speech intelligibility (Binder et al., 2000; Scott et al., 2000) and proposals for an auditory word
form area (AWFA) in the human left anterior temporal cortex (Cohen et al., 2004; DeWitt &
Rauschecker, 2012). Such a role of the aSTC is compatible with nonhuman primate studies
that show selectivity for complex communication calls in aSTC (Ortiz-Rios et al., 2015;
Rauschecker et al., 1995; Tian et al., 2001) and demonstrate, in humans and nonhuman pri-
mates, that progressively anterior neuron populations in the STC pool over longer timescales
(Hamilton et al., 2018; Hullett et al., 2016; Jasmin et al., 2019; Kajikawa et al., 2015). In this
anterior account of lexical processing, the pSTC and speech-responsive regions in the inferior
parietal lobule (IPL) are posited to be involved in “inner speech” and phonological reading
(covert articulation), but not auditory comprehension (DeWitt & Rauschecker, 2013;
Rauschecker, 2011). Yet, despite
this compelling alternative to traditional theories, there is still little direct evidence for an
auditory lexicon in the aSTC.

Investigating the existence and location of auditory lexica is critical for understanding the
neural bases of speech processing and, consequently, the neural underpinnings of speech pro-
cessing disorders. Sin embargo, finely probing the selectivity of neural representations in the
human brain with fMRI is challenging, in part because it is difficult to assess the selectivity
of these populations. Many studies have identified speech processing areas by contrasting
speech stimuli with various nonspeech controls (Evans et al., 2014; Okada et al., 2010; Scott
et al., 2000). Sin embargo, these coarse contrasts cannot reveal what neurons in a particular audi-
tory word-responsive region of interest (ROI) are selective for, for example, phonemes, sylla-
bles, or whole words. More sensitive techniques such as fMRI rapid adaptation (fMRI-RA;
Grill-Spector & Malach, 2001; Krekelberg et al., 2006) are needed to probe the selectivity
of speech representations in the brain and resolve the question of the existence of auditory
lexica. In the current study, we used fMRI-RA to test the existence of lexical representations
in the auditory ventral stream. Paralleling previous work in the visual system that used fMRI-RA
to provide evidence for the existence of an orthographic lexicon in the VWFA (Glezer et al.,
2009, 2015, 2016), we first performed an independent auditory localizer scan that we used to
identify the AWFA in individual subjects, and then conducted three fMRI-RA scans that probed
the representation in the AWFA and its plasticity. The first two scans consisted of real words
(RWs) and pseudowords (PWs; i.e., pronounceable nonwords), respectively. These scans
revealed an adaptation profile consistent with lexical selectivity in the putative AWFA for
RWs, but not novel PWs, directly replicating results for written words in the VWFA. We then
tested the lexicon hypothesis by predicting that training subjects to recognize novel PWs
would add them to their auditory lexica, leading them to exhibit lexical-like selectivity in
the AWFA following training, as previously shown for written words in the VWFA (Glezer
et al., 2015). To do so, we conducted a third fMRI-RA scan after PW training. Results from this
scan showed RW-like lexical selectivity to the now-familiar PWs following training, supporting
the role of the AWFA as an auditory lexicon shaped by experience with auditory words.

MATERIALS AND METHODS

Overall Procedure

In this study, participants completed an auditory localizer scan and three RA experiments over
the course of three scanning sessions. In the first session, subjects completed the auditory loca-
lizer and RW scans. The auditory localizer scan was used to identify a candidate AWFA region
in the anterior auditory ventral stream (Rauschecker & Scott, 2009), adopting the approach
used in the visual/orthographic case for defining the VWFA (Glezer et al., 2009, 2015,
2016). The RW RA scan was used to test if the candidate AWFA exhibited lexical selectivity
for RWs. In the remaining two scanning sessions subjects completed pre- and post-training PW
scans separated by six behavioral training sessions outside of the scanner. During the pre-
training scan, subjects were presented with the then-untrained pseudowords (UTPW). Next,
subjects completed six behavioral training sessions consisting of a 2-back and a novel/familiar
task designed to familiarize and assess subject familiarity with the UTPW, respectively. Sub-
jects completed a maximum of one behavioral session each day and those participants that
achieved at least 80% accuracy on the novel/familiar task by the final session qualified for the
post-training scan. During the post-training scan, subjects were presented with the same set of
PWs, now called trained pseudowords (TPW). The pre- and post-training scans were used to
test whether the candidate AWFA exhibited lexical selectivity for TPW (i.e., after training) but
not to the UTPW (i.e., before training).

Participants

We recruited a total of 28 right-handed healthy native English speakers for this study (ages 18–
34, 12 females). Georgetown University’s Institutional Review Board approved all experimen-
tal procedures, and written informed consent was obtained from all subjects before the
experiment. Two subjects were excluded from further analyses for performing 2 standard
deviations below the average on the in-scanner task. Additionally, two subjects dropped
out of the study after completing the RW RA scan. In total, we analyzed 26 subjects for the
RW RA scan and 24 of those 26 for the pre-training UTPW scan. Due to subject dropout (n = 4)
or failure to achieve 80% accuracy on the novel/familiar task (n = 4), 16 out of 24 subjects
were analyzed for the post-training TPW scan.

Stimuli

Real word stimuli for the RW RA experiment were chosen using the English Lexicon Project
(Balota et al., 2007). Analogous to studies of the neural representation of written words (Glezer
et al., 2009, 2015), three sets of 50 high-frequency (>50 per million), monosyllabic RWs that
were 3–5 phonemes in length were created. One set of words (target words) served as the
reference for the other two lists. The second set was created by altering each target word by a
single phoneme to create another RW. The third set was created by selecting for each target
word another RW that had the same number of (but no shared) phonemes. All three of these
lists were matched on the number of phonemes, orthographic neighborhood, and phonological
neighborhood. To create the UTPW/TPW we used MCWord (Medler & Binder, 2005) to gen-
erate four sets of 50 target PWs, 3–5 phonemes in length: one set to be trained and three sets to
remain untrained and serve as foils in the training task. As with the RW stimuli, we then used the
target PW as the reference to generate a set of PWs each differing by one phoneme from the
target word, and another set of PWs matched to the target PWs by number of phonemes but not
sharing any phonemes. RW and PW sets were matched for length, bigram and trigram fre-
quency, and phonological neighborhood. All stimuli were recorded using a 44.1-kHz sampling
rate in a sound-isolated booth by a trained female native speaker of American English.
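To make these list constraints concrete, a minimal sketch follows (our illustration, not the authors’ stimulus-generation code; the function names are hypothetical, and we treat the single-phoneme change as a substitution over equal-length phoneme sequences):

```python
# Sketch of the two list-building constraints described above, assuming words
# are represented as phoneme sequences (e.g., ARPAbet symbols).

def differs_by_one_phoneme(target, candidate):
    """True if candidate is a single-phoneme substitution of target."""
    if len(target) != len(candidate):
        return False
    mismatches = sum(a != b for a, b in zip(target, candidate))
    return mismatches == 1

def shares_no_phonemes(target, candidate):
    """True if candidate has the same number of phonemes but none in common."""
    return len(target) == len(candidate) and not set(target) & set(candidate)

# Example: "cat" /K AE T/ vs. "bat" /B AE T/ vs. "dog" /D AO G/
assert differs_by_one_phoneme(["K", "AE", "T"], ["B", "AE", "T"])
assert shares_no_phonemes(["K", "AE", "T"], ["D", "AO", "G"])
```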

Auditory Localizer Scan

The auditory localizer scan (Figure 1A) was used to independently identify auditory speech-
selective areas. In this scan, subjects were randomly presented with trials from one of five
conditions: Real Words, Pseudowords, Scrambled Real Words, Scrambled Pseudowords, and
Silence. Real Words and Pseudowords were one syllable long; the lists were matched for length
and orthographic and phonological neighborhood. Scrambled Real Words and Scrambled
Pseudowords were generated by randomly rearranging 200 ms by 1-octave tiles of the constant
Q spectrogram (Brown, 1991) for each RW or PW, respectively, and then reconstructing a
time-domain waveform with an inverse transform (Ortiz-Rios et al., 2015). Each trial in the
auditory localizer scan was 9 s long and began with 2.5 s of silence (during which 1.68 s of
scanner acquisition occurred), followed by a ∼3–4 s stimulus presentation period, and
concluded with silence. During the stimulus presentation period, subjects heard four stimuli
from a given condition and responded with a left-handed button press if any of the four stimuli
was a repeat within that block. In total, there were 145 trials per run, with 25 trials of each of
the five conditions and five 1-back trials for each of the four non-Silence conditions. An
additional 18 s of fixation were added to the start and end of each run. Subjects completed
five runs of the task.

Figure 1. Rapid adaptation and auditory localizer experimental paradigms. (A) The slow clustered
acquisition paradigm used in the auditory localizer scan. Each trial was 9 s long with 1.5 s of
volume acquisition and 7.5 s of silence. During the silent period, the subject heard four sounds
from one of five stimulus classes and performed a 1-back task. (B) The rapid clustered acquisition
paradigm used for the RA scans (Chevillet et al., 2013; Jiang et al., 2018). Each trial was 3.36 s long
with 1.68 s of volume acquisition. During the silent period, two spoken words were played to the
subject with a 50 ms interstimulus interval. The first word acted as a prime and the second word as
the target. The experimental paradigms for (C) real words and (D) pseudowords. The prime was
followed by a target word that was either the same word (SAME), a word that differed from the
target by one phoneme (1PH), or a word that shared no phonemes with the target (DIFF).
Additionally, subjects were presented with silence trials that served as an explicit baseline. During
the task, subjects were asked to attend to all the words and respond when they heard the oddball
stimulus (RW or PW containing the rhyme “-ox,” e.g., “socks”) in either the prime or target position.
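For intuition, here is a minimal sketch of the tile-shuffling step (our illustration, not the authors’ code, which operated on the constant Q transform and inverted it back to a waveform; the function name and the even-tiling assumption are ours):

```python
# Shuffle time-frequency tiles of a precomputed magnitude spectrogram.
import numpy as np

def scramble_tiles(spec, bins_per_octave, frames_per_tile, rng=None):
    """Randomly rearrange (1 octave x 200 ms) tiles of a spectrogram.

    spec: 2D array (frequency bins x time frames). With a constant-Q
    transform, a fixed number of bins spans one octave, and frames_per_tile
    is the number of frames covering 200 ms at the chosen hop size.
    """
    rng = np.random.default_rng() if rng is None else rng
    nf, nt = spec.shape
    assert nf % bins_per_octave == 0 and nt % frames_per_tile == 0, \
        "pad the spectrogram so tiles divide it evenly"
    tiles = [spec[f:f + bins_per_octave, t:t + frames_per_tile].copy()
             for f in range(0, nf, bins_per_octave)
             for t in range(0, nt, frames_per_tile)]
    rng.shuffle(tiles)  # random rearrangement of all tiles
    out = np.empty_like(spec)
    it = iter(tiles)
    for f in range(0, nf, bins_per_octave):
        for t in range(0, nt, frames_per_tile):
            out[f:f + bins_per_octave, t:t + frames_per_tile] = next(it)
    return out
```

Scrambling at this tile size preserves local spectrotemporal energy while destroying the phonemic structure of the word, which is what makes these stimuli a useful acoustic control.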

Rapid Adaptation Scans

There were three RA scans (Figure 1B–D; i.e., RW, UTPW, TPW) performed on different days. In
the RA scans, subjects heard a pair of words (prime/target) on each trial (Figure 1C–D). The
words in each pair were either identical (SAME), differed by a single phoneme (1PH), or shared
no phonemes at all (DIFF). These pairs were generated by using the three matched word lists
described above. To engage subjects’ attention, we asked subjects to perform an oddball rhyme
detection task in the scanner. To do so, we created an additional condition (Oddball) in which a
word or pseudoword containing the oddball rhyme “-ox” (e.g., socks, grox) was presented in
lieu of either the prime or target word. Participants were asked to attentively listen to all stimuli
and respond with a left-handed button press when an oddball stimulus was heard. In all three
scans, the number of repetitions of each word was counterbalanced across all conditions to
control for long-lag priming effects (Henson et al., 2000). Trial order and timing were adjusted
using M-sequences (Buračas & Boynton, 2002). Each trial was 3.36 s long and consisted of a
1.68 s silent period followed by a 1.68 s stimulus presentation period during which the word pairs
were presented. Following prior auditory RA studies (Chevillet et al., 2013; Jiang et al., 2018),
we presented stimuli with a 50 ms interstimulus interval (ISI). In total, there were 25 trials of each condition (SAME,
1PH, DIFF, oddball, and silence) for a total of 125 trials per run. An additional 10.08 s of fixation
was added to the start and end of each run. Subjects completed four runs for each scan.
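For readers unfamiliar with m-sequences, the sketch below generates a binary maximal-length sequence with a linear-feedback shift register; it is our illustration of the underlying construction only (the study required the q-ary generalization described by Buračas & Boynton, 2002, to order its five trial types, which we do not reproduce here):

```python
# Binary m-sequence from a 5-bit LFSR using the primitive polynomial
# x^5 + x^2 + 1; the register visits all 31 nonzero states before repeating,
# which gives the sequence its near-ideal counterbalancing properties.

def lfsr_m_sequence(taps=(5, 2), nbits=5, seed=1):
    """Return one full period (2**nbits - 1 bits) of a binary m-sequence."""
    state = seed
    out = []
    for _ in range(2**nbits - 1):
        out.append(state & 1)                      # emit the low bit
        feedback = 0
        for t in taps:                             # XOR the tapped bits
            feedback ^= (state >> (t - 1)) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return out

seq = lfsr_m_sequence()
assert len(seq) == 31 and set(seq) == {0, 1}
```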

Data Acquisition

MRI data were acquired on a 3.0 Tesla Siemens Prisma-fit scanner. We used whole-head echo-
planar imaging (EPI) sequences (flip angle = 70°, echo time [TE] = 35 ms, field of view [FOV] =
205 mm, 102 × 102 matrix) with a 64-channel head coil. Building off other auditory localizer
(Damera et al., 2021) and RA paradigms (Chevillet et al., 2013), a slow (repetition time [TR] =
9,000 ms, acquisition time [TA] = 1,680 ms) and a fast (TR = 3,360 ms, TA = 1,680 ms) clustered
acquisition paradigm were used for the auditory localizer and RA scans, respectively. Fifty-four
axial slices were acquired in descending order (thickness = 1.8 mm, 0.18 mm gap; in-plane
resolution = 2.0 × 2.0 mm²). A T1-weighted MPRAGE (magnetization-prepared rapid acquisi-
tion with gradient echo) image (resolution 1 × 1 × 1 mm³) was also acquired for each subject.

fMRI Data Preprocessing

Image preprocessing was performed using SPM12 (Ashburner et al., 2021; https://www.fil.ion
.ucl.ac.uk/spm/software/spm12/). The first three volumes of each run were discarded to allow
for T1 stabilization, and the remaining EPI images were spatially realigned to the mean blood
oxygen level-dependent (BOLD) reference image. No slice-time correction was done given the
presence of temporal discontinuities between successive volumes in clustered acquisition par-
adigms (Perrachione & Ghosh, 2013). EPI images for each subject were co-registered to the
anatomical image. The anatomical image was then segmented and the resulting deformation
fields for spatial normalization were used to normalize the functional data to the standard MNI
(Montreal Neurological Institute) space. Next, we smoothed the normalized functional images
with a 4-mm full-width at half maximum Gaussian kernel. Finally, a first-level model contain-
ing regressors for each condition and the six motion parameters from realignment was fit. In
the auditory localizer scan, the regressors were Real Words, Pseudowords, Scrambled Real
Words, Scrambled Pseudowords, Button Press, and Silence. In the RA scans, the regressors
were SAME, 1PH, DIFF, Button Press, and Silence.
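The bookkeeping behind such a first-level model can be pictured as follows; this is our illustrative sketch only (SPM12’s actual machinery adds basis functions, high-pass filtering, and autocorrelation modeling, and the function and argument names here are hypothetical):

```python
# Assemble a simple design matrix: one boxcar regressor per condition plus
# the six realignment (motion) parameters and an intercept.
import numpy as np

def build_design_matrix(n_scans, tr, onsets_by_condition, durations, motion):
    """onsets_by_condition: dict name -> list of onset times (s);
    durations: dict name -> event duration (s); motion: (n_scans, 6) array."""
    names = sorted(onsets_by_condition)
    X = np.zeros((n_scans, len(names)))
    for j, name in enumerate(names):
        for onset in onsets_by_condition[name]:
            start = int(onset // tr)
            stop = int(np.ceil((onset + durations[name]) / tr))
            X[start:min(stop, n_scans), j] = 1.0  # boxcar; no HRF convolution
    X = np.hstack([X, motion, np.ones((n_scans, 1))])
    cols = names + [f"motion{i}" for i in range(1, 7)] + ["constant"]
    return X, cols

# Condition betas then follow from ordinary least squares:
# beta = np.linalg.pinv(X) @ Y, with Y of shape (n_scans, n_voxels).
```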

Defining the Auditory Word Form Area

The AWFA was determined at the individual-subject level using the auditory localizer scan.
Analogous to studies of the visual word form area (Glezer et al., 2009, 2015, 2016), for each
subject we first defined an RW vs. Silence contrast and an RW vs. Scrambled Words contrast.
Next, at the group level, we masked the RW vs. Silence t statistic map with the RW vs. Scram-
bled Words t statistic map thresholded at p < 0.05 before applying a p < 0.001 voxel-wise
threshold. The resulting map had a peak at MNI: −62, −14, 2. These coordinates are consistent
with the hypothesized locus (MNI: −61, −15, −5) of the AWFA from prior studies (DeWitt &
Rauschecker, 2012). We then created the same RW vs. Silence masked by RW vs. Scrambled
Words maps at the individual-subject level using the same thresholds as above. Then, for each
subject we identified the local peak in the resulting maps closest to the group peak (MNI: −62,
−14, 2). Finally, to create each individual subject’s AWFA, we created an ROI consisting of the
50 closest voxels to each individual’s local peak.
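A minimal sketch of this 50-nearest-voxel rule follows (our illustration, not the authors’ code; function and variable names are hypothetical):

```python
# Select the n voxels closest to a subject's local peak.
import numpy as np

def closest_voxels_roi(voxel_coords_mm, peak_mm, n_voxels=50):
    """voxel_coords_mm: (N, 3) coordinates of candidate voxels (e.g., voxels
    surviving the masked contrast); returns indices of the n closest voxels."""
    d = np.linalg.norm(voxel_coords_mm - np.asarray(peak_mm), axis=1)
    return np.argsort(d)[:n_voxels]

# e.g., roi_idx = closest_voxels_roi(mask_coords, peak_mm=(-62.0, -14.0, 2.0))
```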
Behavioral Training

Subjects were trained to recognize 150 auditory PWs (TPW, see above). A 2-back training task,
in which subjects had to detect repeats of a PW separated by another PW, was used to
familiarize subjects with the PWs. Each session of the 2-back task consisted of 15 blocks of
75 trials each, with self-paced breaks between blocks. Each trial lasted 1.5 s, during which
subjects heard a single PW and had to respond if they heard a 2-back repeat (i.e., if the current
PW was the same as the PW before the last). Each block lasted 112.5 s, and each session lasted
for a total task length of 28.125 min excluding breaks.

Following the 2-back task, subjects’ familiarity with the trained PWs was assessed using a
novel/familiar task. For this task, we developed three sets of foils for each of the 150 PWs. Each
foil differed from its base PW by a single phoneme. Each session of the novel/familiar task
consisted of three blocks of 100 trials with self-paced breaks between each block. Each trial
lasted 1.5 s, during which subjects heard a single PW and had to respond with either the left or
right arrow key to indicate a novel (i.e., a foil) or familiar (i.e., trained) PW. In total, each block
lasted 150 s for a total task length of 7.5 min excluding breaks. Over the course of the
novel/familiar task sessions, each foil list was paired with the trained PW list only twice, with at
least two days since the last pairing. Six sessions of both tasks were performed over the course
of about eight days. To proceed to the post-training RA scan, subjects had to achieve at least
80% accuracy on the novel/familiar task by their sixth session.

Task-Based Functional Connectivity

After preprocessing, we used the CONN toolbox Version 21.a (Whitfield-Gabrieli &
Nieto-Castanon, 2012) to calculate seed-to-voxel stimulus-driven functional connectivity from
the AWFA during auditory word processing in the auditory localizer scans. Word processing
was included as the primary task condition by combining the onset times and durations for RW
and PW into one condition. The individual subject (n = 26) AWFA seeds were used for the
analysis. We then performed denoising with confounds, which included the subject-specific
motion regressors computed during preprocessing in SPM, cerebrospinal fluid and white-matter
signals, and their derivatives, following the CompCor strategy (Behzadi et al., 2007) as
implemented in CONN. Data were then band-pass filtered (0.008–0.09 Hz) with linear
detrending. Seed-to-voxel functional connectivity was performed using a weighted general
linear model computing the bivariate correlation between the AWFA seed and the whole brain.
Due to the sparse acquisition type of this analysis, no hemodynamic response function
weighting was performed. Group-level significance was determined with a voxel threshold of
p < 0.001 and a cluster threshold of p < 0.05, false discovery rate (FDR) corrected.
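For intuition, the core seed-to-voxel computation reduces to a bivariate correlation between the seed’s mean time series and every voxel. The sketch below is our illustration under simplifying assumptions (already-denoised data, no condition weighting), not CONN’s implementation:

```python
# Seed-to-voxel correlation with Fisher z-transform for group statistics.
import numpy as np

def seed_to_voxel_correlation(seed_ts, voxel_ts):
    """seed_ts: (T,) mean seed time series; voxel_ts: (T, V) voxel time series.
    Returns (V,) Fisher z-transformed Pearson correlations."""
    s = (seed_ts - seed_ts.mean()) / seed_ts.std()
    v = (voxel_ts - voxel_ts.mean(axis=0)) / voxel_ts.std(axis=0)
    r = (s[:, None] * v).mean(axis=0)          # Pearson r per voxel
    return np.arctanh(np.clip(r, -0.999999, 0.999999))  # Fisher z
```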
RESULTS

Auditory Localizer Scan Identifies Bilateral Speech-Selective ROI in the Auditory Ventral Stream

An independent auditory localizer scan was used to identify a putative AWFA near previous
literature coordinates (Figure 2A). To do so, analogous to prior studies of written word
representations in the VWFA (Glezer et al., 2009, 2015, 2016), we first identified group-level
auditory speech-selective areas (see Materials and Methods) by examining the (Real Words) vs.
Silence contrast thresholded at p < 0.001 masked by the RW vs. Scrambled Real Words contrast
thresholded at p < 0.05 at the group level, cluster-corrected at FDR p < 0.05. This approach
mimicked the definition of the VWFA in prior studies (Glezer et al., 2009, 2015) and gave us
the greatest flexibility for defining ROIs while ensuring responsiveness to sound and intelligible
speech. This revealed several clusters of activation in the superior temporal, frontal, and
inferior temporal cortices (Figure 2B). A local peak in the left superior temporal gyrus (STG)
was identified at MNI: −62, −14, 2 (Figure 2C) near the literature coordinates (DeWitt &
Rauschecker, 2012) of MNI: −61, −15, −5. Individual subject AWFA ROIs (Figure 2D) were
created by building a 50-voxel ROI (400 mm³) around the local peak of each subject (mean ±
SD: −62 ± 2, −14.9 ± 3, 2.6 ± 2.6) closest to the group peak (see Materials and Methods). This
is similar to the size of the VWFA identified in prior studies (Glezer et al., 2009, 2015).

Figure 2. Identifying the auditory word form area (AWFA). (A) Proposed location of the AWFA
(MNI: −61, −15, −5). Adapted from DeWitt and Rauschecker (2012). Color bar (arbitrary units)
reflects the activation likelihood estimation (Laird et al., 2005) statistic. (B) The RW vs. Silence
contrast (p < 0.001) masked by the RW vs. Scrambled Real Words contrast (p < 0.05) in the
auditory localizer scan. Only clusters significant at the FDR p < 0.05 level are shown. Colors reflect
t statistics. (C) The peak in the left STG (MNI: −62, −14, 2). (D) The AWFA defined in individual
subjects. The inset zooms in on the perisylvian region to highlight the location of the AWFA.

Lexical Selectivity for Real Words but Not Pseudowords in the AWFA

The first two fMRI-RA scans were performed with RWs and UTPWs, respectively. In these
experiments, we predicted the lowest signal for the SAME condition, since the two identical
words presented in that condition would repeatedly activate the same neural populations,
thereby causing maximal adaptation. Likewise, we predicted the least amount of adaptation
for the DIFF condition, because two words that share no phonemes should activate disjoint
groups of neurons, irrespective of whether responsiveness to auditory words in the localizer
scan in that ROI was due to neurons selective for phonemes, syllables, or whole words. Finally,
we tested specific predictions regarding responses in the 1PH condition. Specifically, if neurons
in the AWFA ROI are tightly tuned to whole RWs (i.e., if the AWFA contains an auditory
lexicon), the two similar but nonidentical RWs in the 1PH condition should have minimal
neural overlap, and therefore no adaptation should occur and response levels in the 1PH
condition should be comparable to those of the DIFF condition (as found for written real words
in the VWFA; Glezer et al., 2009). In contrast, if neurons in the AWFA were tuned to sublexical
phoneme combinations, then there should be a gradual release from adaptation from SAME to
1PH to DIFF, with 1PH < DIFF, as sublexical overlap would continue to increase from 1PH to
DIFF. For PWs, we predicted that there would be a gradual increase in the BOLD signal
paralleling the increasing dissimilarity (i.e., SAME to 1PH to DIFF). This is thought to reflect
low-level activation of RW-tuned neurons to phonologically similar PWs. These predictions
mirror findings in the VWFA for written words (Glezer et al., 2009, 2015, 2016) and are
compatible with an experience-driven increase in selectivity of neurons in the AWFA to real
words, because of extensive auditory experience with, and the need to discriminate among,
real words but not pseudowords.

To test our hypotheses, we ran a 2-way repeated measures analysis of variance (ANOVA) on
AWFA responses to investigate the relationship between lexicality (RW and UTPW) and word
similarity (SAME, 1PH, and DIFF). The ANOVA was run on subjects who successfully
completed both the RW and UTPW scans (n = 24). This revealed a significant main effect of
similarity (F(2,46) = 42.597; p = 3.4E−11) but no significant main effect of lexicality (F(1,23) =
2.547; p = 0.124). Critically, however, the analysis revealed a significant interaction between
lexicality and word similarity (F(2,46) = 4.092; p = 0.023). Planned paired t tests revealed an
adaptation profile in the AWFA that was consistent with tight neural tuning for individual RWs
(Figure 3A): There was a significant difference in the mean percent signal between the DIFF vs.
SAME conditions (t(23) = 7.86; p < 0.001) and the 1PH vs. SAME conditions (t(23) = 5.71;
p < 0.001). However, there was no significant difference between the DIFF and 1PH conditions
(t(23) = 1.95; p = 0.189). Similar results were obtained using all 26 subjects who completed the
RW scan (Figure S1 in the Supporting Information). In contrast, the adaptation profile for
UTPWs was not consistent with lexical selectivity (Figure 3B): There was a significant response
difference between the DIFF vs. SAME conditions (t(23) = 7.03; p < 0.001), the 1PH vs. SAME
conditions (t(23) = 3.28; p = 0.010), and, critically, also the DIFF vs. 1PH conditions (t(23) =
3.89; p = 0.002).
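As a pointer to how such planned comparisons can be computed, here is a minimal sketch (our illustration; the paper does not publish analysis code) using SciPy’s paired t test with a Bonferroni factor of 3 for the three condition pairs:

```python
# Planned pairwise comparisons of per-subject mean responses.
from scipy import stats

def planned_adaptation_tests(same, one_ph, diff, n_tests=3):
    """same/one_ph/diff: (n_subjects,) arrays of per-subject mean responses.
    Returns {comparison: (t, Bonferroni-corrected p)}."""
    pairs = {"DIFF vs SAME": (diff, same),
             "1PH vs SAME": (one_ph, same),
             "DIFF vs 1PH": (diff, one_ph)}
    results = {}
    for name, (a, b) in pairs.items():
        t, p = stats.ttest_rel(a, b)           # paired (within-subject) t test
        results[name] = (t, min(p * n_tests, 1.0))  # corrected p capped at 1
    return results
```

Capping the corrected p at 1.0 matches the convention visible in the reported statistics (e.g., p = 1 for a nonsignificant comparison below).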
Figure 3. Evidence for auditory lexical representations in the auditory word form area. Within-
subject (n = 24) adaptation profile for auditory real words (RWs) and untrained pseudowords
(UTPWs). Patterns of release from adaptation are compatible with tight tuning to individual RWs,
consistent with an auditory lexicon. In contrast, UTPWs show a graded release from adaptation as
a function of phonological similarity. Horizontal black line in violin plots indicates the median.
***, **, *, and N.S. mark p < 0.001, <0.01, <0.05, and not significant (>0.1), all Bonferroni-
corrected for multiple comparisons.
Finally, we ran a whole-brain analysis to test whether lexical representations were localized to
the anterior STG or more distributed in nature. Specifically, for RW, the conjunction of DIFF vs.
SAME and 1PH vs. SAME, after excluding voxels where DIFF vs. 1PH
was p < 0.05, produced a cluster in the left anterior STG (family-wise error corrected p < 0.05;
MNI: −62, −14, 6), within 3.5 mm of the average coordinate (MNI: −62, −14.9, 2.6) of our
individual ROIs. Importantly, this analysis for UTPWs produced no significant clusters. Thus,
the whole-brain analysis also supports the special status of the AWFA as the location of a
lexicon for spoken real words.

Planned paired t tests using all 26 subjects who completed the RW scan likewise revealed an
adaptation profile consistent with tight neural tuning for individual RWs (see Figure S1 in the
Supporting Information, available at https://doi.org/10.1162/nol_a_00108): There was a
significant difference in the mean percent signal between the DIFF vs. SAME conditions (t(25) =
6.94; p < 0.001) and the 1PH vs. SAME conditions (t(25) = 5.36; p < 0.001). However, there
was no significant difference between the DIFF and 1PH conditions (t(25) = 1.70; p = 0.3039).

Adaptation Patterns to Pseudowords in the AWFA Exhibit Lexical Selectivity After but Not
Before Familiarization

Next, in the pre- and post-training scans (UTPW and TPW, respectively), we tested the
hypothesis that familiarization with previously novel PWs drives the formation of lexical
selectivity in the AWFA. To do so, we examined the adaptation profiles for RWs, UTPWs, and
TPWs in subjects who had completed all three scans (n = 16). We ran a 2-way repeated
measures ANOVA to investigate the relationship between lexicality (RW, UTPW, and TPW) and
similarity (SAME, 1PH, and DIFF). This revealed a significant main effect of similarity (F(2,30) =
43.023; p = 2.95E−9) but not a significant main effect of lexicality (F(2,30) = 2.398; p = 0.116).
Critically, there was again a significant interaction between lexicality and similarity (F(4,60) =
4.144; p = 0.012).

Figure 4. Auditory lexical representations emerge in the auditory word form area (AWFA) for
pseudowords after familiarization training. Within-subject (n = 16) adaptation profile for auditory
real words (RW), untrained pseudowords (UTPW), and trained pseudowords (TPW). The RW
adaptation profile shows tuning to individual RWs, consistent with an auditory lexicon. UTPWs
show a graded release from adaptation as a function of phonological similarity. Importantly,
following familiarization training, adaptation patterns in the AWFA to the same pseudowords (now
TPW) reveal tight lexical tuning, similar to RW. Horizontal black line in violin plots indicates the
median. ***, **, *, and N.S. mark p < 0.001, <0.01, <0.05, and not significant (>0.1), all
Bonferroni-corrected.

Consistent with the full data set (Figure 3), planned paired t tests revealed an adaptation
profile that was consistent with tight neural tuning for individual RWs (Figure 4). There was
a significant difference in the mean percent signal change between the DIFF vs. SAME condi-
tions (t(15) = 7.28; p < 0.001) and 1PH vs. SAME (t(15) = 5.85; p < 0.001). However, there was
no significant difference between the DIFF and 1PH conditions (t(15) = 0.750; p = 1). In
contrast, the adaptation profile for UTPW was not consistent with lexical selectivity (Figure 4):
There was a significant difference between the DIFF vs. SAME conditions (t(15) = 7.84;
p < 0.001) and the DIFF vs. 1PH conditions (t(15) = 4.57; p = 0.001), but not 1PH vs. SAME
(t(15) = 2.16; p = 0.141). Crucially, the adaptation profile for PW after training (i.e., TPW) was
consistent with lexical selectivity (Figure 4): There was a significant difference in the mean
percent signal between the DIFF vs. SAME conditions (t(15) = 5.55; p < 0.001) and the 1PH vs.
SAME conditions (t(15) = 3.17; p = 0.019). However, there was no significant difference
between the DIFF and 1PH conditions (t(15) = 1.71; p = 0.327).

The AWFA Connects to the Language Network

We next calculated task-based functional connectivity in the auditory localizer data set (n = 26)
to examine the connections between the AWFA and the rest of the brain. This seed-to-voxel
analysis (Figure 5) showed that the AWFA is highly connected with brain regions previously
shown (Wilson et al., 2004) to be involved in language processing, such as the inferior frontal
gyrus (local peak at MNI −46, 12, 26) and premotor cortex (local peak at MNI −50, −8, 48).
Especially noteworthy is the connectivity between the AWFA and a cluster in the left posterior
fusiform cortex (circled in red; MNI: −44, −46, −14) that encompasses the reported location of
the VWFA (MNI: −45, −54, −20; Dehaene et al., 2005; Kronbichler et al., 2004). Crucially,
subjects never saw any words in print during the auditory localizer scan, thereby precluding
the possibility of spurious correlations between these regions.

Figure 5. Functional connectivity of the auditory word form area (AWFA). Whole-brain functional
connectivity of the AWFA during the auditory localizer task (n = 26). Results are thresholded at a
voxel-wise p < 0.001 and cluster-level p < 0.05, family-wise error corrected. Cluster corresponding
to the literature coordinates of the visual word form area is circled in red. Color bar represents
t statistic.

DISCUSSION

Cognitive models of speech comprehension (Marslen-Wilson & Welsh, 1978; McClelland &
Elman, 1986; Morton, 1969) have proposed the existence of an auditory lexicon. More recent
models of speech comprehension in the brain (Hickok & Poeppel, 2007; Rauschecker & Scott,
2009) have posited the existence of such an auditory lexicon in an auditory ventral “what”
stream. Yet, significant disagreement exists between these models about the location of such
an auditory lexicon, and no studies to date have directly tested the existence of an auditory
lexicon for speech comprehension in the brain. Prior work (Glezer et al., 2009, 2015, 2016)
has used fMRI-RA techniques to establish the existence of an orthographic lexicon in a region
of the visual ventral stream, the VWFA (subsequently confirmed by human neurophysiological
recordings; Hirshorn et al., 2016; Lochy et al., 2018; Woolnough et al., 2021).
In the present study, we leveraged these techniques to test the existence of an auditory lexicon
in the auditory ventral stream, particularly in the AWFA of the anterior STG (Cohen et al., 2004;
DeWitt & Rauschecker, 2012). We first defined the ROI for an individual AWFA through an
independent auditory localizer scan. We then showed in RW and UTPW RA scans that,
consistent with an auditory lexicon, spoken RWs engaged distinct neural populations in the
AWFA, with even a single phoneme change causing full release from adaptation, whereas PWs
did not exhibit this lexical adaptation profile but instead showed a graded release from
adaptation as a function of phoneme overlap. This graded release from adaptation suggests that
neurons in the AWFA, like those in the VWFA (Glezer et al., 2009), exhibit broader tuning to
PWs due to experience-driven refinement of tuning of neurons to RWs but not PWs. Thus, these
results directly replicated analogous findings of highly selective lexical tuning to RWs but not
PWs in the VWFA (Glezer et al., 2009). Then, in the TPW RA scan, we showed that training
subjects to recognize a set of PWs led to the development of lexical selectivity for these TPWs,
again replicating previous results for written words in the VWFA (Glezer et al., 2015). Finally,
the AWFA was connected to an area in the left posterior fusiform cortex coincident with the
reported location of the VWFA, further supporting analogous roles of the AWFA and VWFA in
the processing of spoken and written words, respectively.

Our novel evidence of auditory lexical representations in the brain informs current cognitive
models of speech comprehension. While all these models map acoustic input to meaning, they
disagree on whether auditory lexica exist (Coltheart, 2004; Woollams, 2015). Our data not
only present compelling evidence for the existence of an auditory lexicon; they also place its
location in the anterior STG, where it is ideally suited to interface with semantic
representations located further anteriorly in the temporal lobe (Lambon Ralph et al., 2017;
Ueno et al., 2011), thereby completing the mapping of speech sounds to meaning.
Furthermore, this anterior STG location is consistent with prior studies demonstrating
selectivity for other familiar auditory objects (Griffiths & Warren, 2004), such as human voices
(Belin et al., 2000; Bodin et al., 2021; Staib & Frühholz, 2023) or musical instruments (Leaver
& Rauschecker, 2010). Moreover, such a simple-to-complex progression in selectivity, from
simple perceptual features over lexical representations to semantic representations in the
anteroventral auditory processing stream along the STC, is a direct counterpart of the ventral
visual stream in the inferior temporal cortex (Damera et al., 2020; Kravitz et al., 2013),
revealing convergent processing strategies across modalities.

FUNDING INFORMATION

Maximilian Riesenhuber, National Science Foundation (https://dx.doi.org/10.13039/100000001),
Award ID: BCS-1756313. Maximilian Riesenhuber, National Science Foundation
(https://dx.doi.org/10.13039/100000001), Award ID: ACI-1548562. Ashley VanMeter,
Foundation for the National Institutes of Health (https://dx.doi.org/10.13039/100000009),
Award ID: 1S10OD023561.
AUTHOR CONTRIBUTIONS

Srikanth R. Damera: Conceptualization; Formal analysis; Investigation; Methodology; Project
administration; Visualization; Writing – original draft; Writing – reviewing & editing. Lillian
Chang: Investigation; Writing – reviewing & editing. Plamen P. Nikolov: Investigation; Writing –
reviewing & editing. James A. Mattei: Investigation; Writing – reviewing & editing. Suneel
Banerjee: Investigation; Writing – reviewing & editing. Laurie S. Glezer: Conceptualization;
Methodology. Patrick H. Cox: Conceptualization; Methodology. Xiong Jiang:
Conceptualization; Methodology; Project administration; Writing – reviewing & editing. Josef P.
Rauschecker: Conceptualization; Project administration; Writing – reviewing & editing.
Maximilian Riesenhuber: Conceptualization; Methodology; Project administration; Writing –
reviewing & editing.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available at https://osf.io/pbuh5/.

REFERENCES

Archakov, D., DeWitt, I., Kuśmierek, P., Ortiz-Rios, M., Cameron, D., Cui, D., Morin, E. L., VanMeter, J. W., Sams, M., Jääskeläinen, I. P., & Rauschecker, J. P. (2020). Auditory representation of learned sound sequences in motor regions of the macaque brain. Proceedings of the National Academy of Sciences, 117(26), 15242–15252. https://doi.org/10.1073/pnas.1915610117, PubMed: 32541016

Ashburner, J., Barnes, G., Chen, C.-C., Daunizeau, J., Flandin, G., Friston, K., Gitelman, D., Glauche, V., Henson, R., Hutton, C., Jafarian, A., Kiebel, S., Kilner, J., Litvak, V., Mattout, J., Moran, R., Penny, W., Phillips, C., Razi, A., … Zeidman, P. (2021). SPM12 manual. Wellcome Centre for Human Neuroimaging.

Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459. https://doi.org/10.3758/BF03193014, PubMed: 17958156

Behzadi, Y., Restom, K., Liau, J., & Liu, T. T. (2007). A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage, 37(1), 90–101. https://doi.org/10.1016/j.neuroimage.2007.04.042, PubMed: 17560126

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403(6767), 309–312. https://doi.org/10.1038/35002078, PubMed: 10659849

Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., Kaufman, J. N., & Possing, E. T. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex, 10(5), 512–528. https://doi.org/10.1093/cercor/10.5.512, PubMed: 10847601

Bodin, C., Trapeau, R., Nazarian, B., Sein, J., Degiovanni, X., Baurberg, J., Rapha, E., Renaud, L., Giordano, B. L., & Belin, P. (2021). Functionally homologous representation of vocalizations in the auditory cortex of humans and macaques. Current Biology, 31(21), 4839–4844. https://doi.org/10.1016/j.cub.2021.08.043, PubMed: 34506729

Bogen, J. E., & Bogen, G. M. (1976). Wernicke’s region—Where is it? Annals of the New York Academy of Sciences, 280(1), 834–843. https://doi.org/10.1111/j.1749-6632.1976.tb25546.x, PubMed: 1070943

Brown, J. C. (1991). Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1), 425–434. https://doi.org/10.1121/1.400476

Buračas, G. T., & Boynton, G. M. (2002). Efficient design of event-related fMRI experiments using M-sequences. NeuroImage, 16(3, Pt. A), 801–813. https://doi.org/10.1006/nimg.2002.1116, PubMed: 12169264

Chevillet, M. A., Jiang, X., Rauschecker, J. P., & Riesenhuber, M. (2013). Automatic phoneme category selectivity in the dorsal auditory stream. Journal of Neuroscience, 33(12), 5208–5215. https://doi.org/10.1523/JNEUROSCI.1870-12.2013, PubMed: 23516286

Cohen, L., Jobert, A., Bihan, D. L., & Dehaene, S. (2004). Distinct unimodal and multimodal regions for word processing in the left temporal cortex. NeuroImage, 23(4), 1256–1270. https://doi.org/10.1016/j.neuroimage.2004.07.052, PubMed: 15589091

Coltheart, M. (2004). Are there lexicons? Quarterly Journal of Experimental Psychology Section A, 57(7), 1153–1172. https://doi.org/10.1080/02724980443000007, PubMed: 15513241

Damera, S. R., Malone, P. S., Stevens, B. W., Klein, R., Eberhardt, S. P., Auer, E. T., Bernstein, L. E., & Riesenhuber, M. (2021). Metamodal coupling of vibrotactile and auditory speech processing systems through matched stimulus representations. bioRxiv. https://doi.org/10.1101/2021.05.04.442660

Damera, S. R., Martin, J. G., Scholl, C., Kim, J. S., Glezer, L., Malone, P. S., & Riesenhuber, M. (2020). From shape to meaning: Evidence for multiple fast feedforward hierarchies of concept processing in the human brain. NeuroImage, 221, Article 117148. https://doi.org/10.1016/j.neuroimage.2020.117148, PubMed: 32659350

Dehaene, S., & Cohen, L. (2011). The unique role of the visual word form area in reading. Trends in Cognitive Sciences, 15(6), 254–262. https://doi.org/10.1016/j.tics.2011.04.003, PubMed: 21592844

Dehaene, S., Cohen, L., Sigman, M., & Vinckier, F. (2005). The neural code for written words: A proposal. Trends in Cognitive Sciences, 9(7), 335–341. https://doi.org/10.1016/j.tics.2005.05.004, PubMed: 15951224

DeWitt, I., & Rauschecker, J. P. (2012). Phoneme and word recognition in the auditory ventral stream. Proceedings of the National Academy of Sciences, 109(8), E505–E514. https://doi.org/10.1073/pnas.1113427109, PubMed: 22308358

DeWitt, I., & Rauschecker, J. P. (2013). Wernicke’s area revisited: Parallel streams and word processing. Brain and Language, 127(2), 181–191. https://doi.org/10.1016/j.bandl.2013.09.014, PubMed: 24404576

Evans, S., Kyong, J. S., Rosen, S., Golestani, N., Warren, J. E., McGettigan, C., Mourão-Miranda, J., Wise, R. J. S., & Scott, S. K. (2014). The pathways for intelligible speech: Multivariate and univariate perspectives. Cerebral Cortex, 24(9), 2350–2361. https://doi.org/10.1093/cercor/bht083, PubMed: 23585519

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47. https://doi.org/10.1093/cercor/1.1.1-a, PubMed: 1822724

Geschwind, N. (1970). The organization of language and the brain: Language disorders after brain damage help in elucidating the neural basis of verbal behavior. Science, 170(3961), 940–944. https://doi.org/10.1126/science.170.3961.940, PubMed: 5475022

Glezer, L. S., Eden, G., Jiang, X., Luetje, M., Napoliello, E., Kim, J., & Riesenhuber, M. (2016). Uncovering phonological and orthographic selectivity across the reading network using fMRI-RA. NeuroImage, 138, 248–256. https://doi.org/10.1016/j.neuroimage.2016.05.072, PubMed: 27252037

Glezer, L. S., Jiang, X., & Riesenhuber, M. (2009). Evidence for highly selective neuronal tuning to whole words in the “visual word form area.” Neuron, 62(2), 199–204. https://doi.org/10.1016/j.neuron.2009.03.017, PubMed: 19409265

Glezer, L. S., Kim, J., Rule, J., Jiang, X., & Riesenhuber, M. (2015). Adding words to the brain’s visual dictionary: Novel word learning selectively sharpens orthographic representations in the VWFA. Journal of Neuroscience, 35(12), 4965–4972. https://doi.org/10.1523/JNEUROSCI.4031-14.2015, PubMed: 25810526

Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience, 5(11), 887–892. https://doi.org/10.1038/nrn1538, PubMed: 15496866

Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional properties of human cortical neurons. Acta Psychologica, 107(1–3), 293–321. https://doi.org/10.1016/S0001-6918(01)00019-1, PubMed: 11388140

Hamilton, L. S., Edwards, E., & Chang, E. F. (2018). A spatial map of onset and sustained responses to speech in the human superior temporal gyrus. Current Biology, 28(12), 1860–1871. https://doi.org/10.1016/j.cub.2018.04.033, PubMed: 29861132

Henson, R., Shallice, T., & Dolan, R. (2000). Neuroimaging evidence for dissociable forms of repetition priming. Science, 287(5456), 1269–1272. https://doi.org/10.1126/science.287.5456.1269, PubMed: 10678834

Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron, 69(3), 407–422. https://doi.org/10.1016/j.neuron.2011.01.019, PubMed: 21315253

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393–402. https://doi.org/10.1038/nrn2113, PubMed: 17431404

Hirshorn, E. A., Li, Y., Ward, M. J., Richardson, M. R., Fiez, J. A., & Ghuman, A. (2016). Decoding and disrupting left midfusiform gyrus activity during word reading. Proceedings of the National Academy of Sciences, 113(29), 8162–8167. https://doi.org/10.1073/pnas.1604126113, PubMed: 27325763

Hubel, D. H., & Wiesel, T. N. (1977). Ferrier lecture: Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London, Series B: Biological Sciences, 198(1130), 1–59. https://doi.org/10.1098/rspb.1977.0085, PubMed: 20635

Hullett, P. W., Hamilton, L. S., Mesgarani, N., Schreiner, C. E., & Chang, E. F. (2016). Human superior temporal gyrus organization of spectrotemporal modulation tuning derived from speech stimuli. Journal of Neuroscience, 36(6), 2014–2026. https://doi.org/10.1523/JNEUROSCI.1779-15.2016, PubMed: 26865624

Jasmin, K., Lima, C. F., & Scott, S. K. (2019). Understanding rostral–caudal auditory cortex contributions to auditory perception. Nature Reviews Neuroscience, 20(7), 425–434. https://doi.org/10.1038/s41583-019-0160-2, PubMed: 30918365

Jiang, X., Chevillet, M. A., Rauschecker, J. P., & Riesenhuber, M. (2018). Training humans to categorize monkey calls: Auditory feature- and category-selective neural tuning changes. Neuron, 98(2), 405–416. https://doi.org/10.1016/j.neuron.2018.03.014, PubMed: 29673483

Kajikawa, Y., Frey, S., Ross, D., Falchier, A., Hackett, T. A., & Schroeder, C. E. (2015). Auditory properties in the parabelt regions of the superior temporal gyrus in the awake macaque monkey: An initial survey. Journal of Neuroscience, 35(10), 4140–4150. https://doi.org/10.1523/JNEUROSCI.3556-14.2015, PubMed: 25762661

Kell, A., Yamins, D., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630–644. https://doi.org/10.1016/j.neuron.2018.03.044, PubMed: 29681533

Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 26–49. https://doi.org/10.1016/j.tics.2012.10.011, PubMed: 23265839

Krekelberg, B., Boynton, G. M., & van Wezel, R. J. A. (2006). Adaptation: From single cells to BOLD signals. Trends in Neurosciences, 29(5), 250–256. https://doi.org/10.1016/j.tins.2006.02.008, PubMed: 16529826

Kronbichler, M., Hutzler, F., Wimmer, H., Mair, A., Staffen, W., & Ladurner, G. (2004). The visual word form area and the frequency with which words are encountered: Evidence from a parametric fMRI study. NeuroImage, 21(3), 946–953. https://doi.org/10.1016/j.neuroimage.2003.10.021, PubMed: 15006661

Laird, A. R., Fox, P. M., Price, C. J., Glahn, D. C., Uecker, A. M., Lancaster, J. L., Turkeltaub, P. E., Kochunov, P., & Fox, P. T. (2005). ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts. Human Brain Mapping, 25(1), 155–164. https://doi.org/10.1002/hbm.20136, PubMed: 15846811

Lambon Ralph, M. A., Jefferies, E., Patterson, K., & Rogers, T. T. (2017). The neural and computational bases of semantic cognition. Nature Reviews Neuroscience, 18(1), 42–55. https://doi.org/10.1038/nrn.2016.150, PubMed: 27881854

Leaver, A. M., & Rauschecker, J. P. (2010). Cortical representation of natural complex sounds: Effects of acoustic features and auditory object category. Journal of Neuroscience, 30(22), 7604–7612. https://doi.org/10.1523/JNEUROSCI.0296-10.2010, PubMed: 20519535

Liuzzi, A. G., Bruffaerts, R., & Vandenberghe, R. (2019). The medial temporal written word processing system. Cortex, 119, 287–300. https://doi.org/10.1016/j.cortex.2019.05.002, PubMed: 31174078

Lochy, A., Jacques, C., Maillard, L., Colnat-Coulbois, S., Rossion, B., & Jonas, J. (2018). Selective visual representation of letters and words in the left ventral occipito-temporal cortex with intracerebral recordings. Proceedings of the National Academy of Sciences, 115(32), E7595–E7604. https://doi.org/10.1073/pnas.1718987115, PubMed: 30038000

Malone, P. S., Glezer, L. S., Kim, J., Jiang, X., & Riesenhuber, M. (2016). Multivariate pattern analysis reveals category-related organization of semantic representations in anterior temporal cortex. Journal of Neuroscience, 36(39), 10089–10096. https://doi.org/10.1523/JNEUROSCI.1599-16.2016, PubMed: 27683905

Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10(1), 29–63. https://doi.org/10.1016/0010-0285(78)90018-X

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1–86. https://doi.org/10.1016/0010-0285(86)90015-0

Medler, D. A., & Binder, J. R. (2005). MCWord: An on-line orthographic database of the English language. Language Imaging Laboratory, Medical College of Wisconsin. https://www.neuro.mcw.edu/mcword/

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76(2), 165–178. https://doi.org/10.1037/h0027366

Okada, K., Rong, F., Venezia, J., Matchin, W., Hsieh, I.-H., Saberi, K., Serences, J. T., & Hickok, G. (2010). Hierarchical organization of human auditory cortex: Evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex, 20(10), 2486–2495. https://doi.org/10.1093/cercor/bhp318, PubMed: 20100898

Ortiz-Rios, M., Kuśmierek, P., DeWitt, I., Archakov, D., Azevedo, F. A. C., Sams, M., Jääskeläinen, I. P., Keliris, G. A., & Rauschecker, J. P. (2015). Functional MRI of the vocalization-processing network in the macaque brain. Frontiers in Neuroscience, 9, Article 113. https://doi.org/10.3389/fnins.2015.00113, PubMed: 25883546

Perrachione, T. K., & Ghosh, S. S. (2013). Optimized design and analysis of sparse-sampling fMRI experiments. Frontiers in Neuroscience, 7, Article 55. https://doi.org/10.3389/fnins.2013.00055, PubMed: 23616742

Rauschecker, J. P. (1998). Cortical processing of complex sounds. Current Opinion in Neurobiology, 8(4), 516–521. https://doi.org/10.1016/S0959-4388(98)80040-8, PubMed: 9751652

Rauschecker, J. P. (2011). An expanded role for the dorsal auditory pathway in sensorimotor control and integration. Hearing Research, 271(1–2), 16–25. https://doi.org/10.1016/j.heares.2010.09.001, PubMed: 20850511

Rauschecker, J. P. (2018). Where, when, and how: Are they all sensorimotor? Towards a unified view of the dorsal pathway in vision and audition. Cortex, 98, 262–268. https://doi.org/10.1016/j.cortex.2017.10.020, PubMed: 29183630

Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience, 12(6), 718–724. https://doi.org/10.1038/nn.2331, PubMed: 19471271

Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences, 97(22), 11800–11806. https://doi.org/10.1073/pnas.97.22.11800, PubMed: 11050212

Rauschecker, J. P., Tian, B., & Hauser, M. (1995). Processing of complex sounds in the macaque nonprimary auditory cortex. Science, 268(5207), 111–114. https://doi.org/10.1126/science.7701330, PubMed: 7701330

Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. (2000). Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123(12), 2400–2406. https://doi.org/10.1093/brain/123.12.2400, PubMed: 11099443

Staib, M., & Frühholz, S. (2023). Distinct functional levels of human voice processing in the auditory cortex. Cerebral Cortex, 33(4), 1170–1185. https://doi.org/10.1093/cercor/bhac128, PubMed: 35348635

Tian, B., Reser, D., Durham, A., Kustov, A., & Rauschecker, J. P. (2001). Functional specialization in rhesus monkey auditory cortex. Science, 292(5515), 290–293. https://doi.org/10.1126/science.1058911, PubMed: 11303104

Ueno, T., Saito, S., Rogers, T. T., & Lambon Ralph, M. A. (2011). Lichtheim 2: Synthesizing aphasia and the neural basis of language in a neurocomputational model of the dual dorsal-ventral language pathways. Neuron, 72(2), 385–396. https://doi.org/10.1016/j.neuron.2011.09.013, PubMed: 22017995

van der Heijden, K., Rauschecker, J. P., de Gelder, B., & Formisano, E. (2019). Cortical mechanisms of spatial hearing. Nature Reviews Neuroscience, 20(10), 609–623. https://doi.org/10.1038/s41583-019-0206-5, PubMed: 31467450

Vinckier, F., Dehaene, S., Jobert, A., Dubus, J., Sigman, M., & Cohen, L. (2007). Hierarchical coding of letter strings in the ventral stream: Dissecting the inner organization of the visual word-form system. Neuron, 55(1), 143–156. https://doi.org/10.1016/j.neuron.2007.05.031, PubMed: 17610823

Whitfield-Gabrieli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity, 2(3), 125–141. https://doi.org/10.1089/brain.2012.0073, PubMed: 22642651

Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7(7), 701–702. https://doi.org/10.1038/nn1263, PubMed: 15184903

Woollams, A. M. (2015). Lexical is as lexical does: Computational approaches to lexical representation. Language, Cognition and Neuroscience, 30(4), 395–408. https://doi.org/10.1080/23273798.2015.1005637, PubMed: 25893204

Woolnough, O., Donos, C., Rollo, P. S., Forseth, K. J., Lakretz, Y., Crone, N. E., Fischer-Baum, S., Dehaene, S., & Tandon, N. (2021). Spatiotemporal dynamics of orthographic and lexical processing in the ventral visual pathway. Nature Human Behaviour, 5(3), 389–398. https://doi.org/10.1038/s41562-020-00982-w, PubMed: 33257877