Going Beyond Rote Auditory Learning: Neural Patterns

Going Beyond Rote Auditory Learning: Neural Patterns
of Generalized Auditory Learning

Shannon L. M. Heald1

, Stephen C. Van Hedger2, John Veillette1,

Katherine Reis1, Joel S. Snyder3, and Howard C. Nusbaum1

Astratto

■ The ability to generalize across specific experiences is vital
for the recognition of new patterns, especially in speech percep-
tion considering acoustic–phonetic pattern variability. Infatti,
behavioral research has demonstrated that listeners are able
via a process of generalized learning to leverage their experi-
ences of past words said by difficult-to-understand talker to
improve their understanding for new words said by that talker.
Here, we examine differences in neural responses to generalized
versus rote learning in auditory cortical processing by training
listeners to understand a novel synthetic talker. Using a
pretest–posttest design with EEG, participants were trained
using either (1) a large inventory of words where no words were
repeated across the experiment (generalized learning) O (2) UN
small inventory of words where words were repeated (rote

apprendimento). Analysis of long-latency auditory evoked potentials
at pretest and posttest revealed that rote and generalized learn-
ing both produced rapid changes in auditory processing, yet the
nature of these changes differed. Generalized learning was
marked by an amplitude reduction in the N1–P2 complex and
by the presence of a late negativity wave in the auditory evoked
potential following training; rote learning was marked only by
temporally later scalp topography differences. The early N1–P2
change, found only for generalized learning, is consistent with
an active processing account of speech perception, which pro-
poses that the ability to rapidly adjust to the specific vocal char-
acteristics of a new talker (for which rote learning is rare) relies
on attentional mechanisms to selectively modify early auditory
processing sensitivity.

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

INTRODUCTION

A fundamental problem faced by all theories of speech per-
ception is to explain how listeners understand speech
despite extensive variability and noise in acoustic patterns
across talkers and contexts. One explanation is that listeners
overcome acoustic–linguistic variability by remapping the
relationship of acoustic cues to linguistic categories by gen-
eralizing across their recent experiences (Weatherholtz &
Jaeger, 2016; Heald & Nusbaum, 2014). Under this view, lis-
teners leverage their past experiences with a talker to pre-
sumably form an abstract representation of the talker’s
acoustic–phonetic (vocal) space that ultimately can be used
by the listener to better modify attention toward the most
diagnostic acoustic cues for that talker. This form of learn-
ing has the benefit of improving recognition for even previ-
ously unheard words said by the same talker. Tuttavia,
many studies investigating the neural correlates of general-
ized learning in such settings have focused on generaliza-
tion acquired after long-term rote training (Tremblay, Ross,
Inoue, McClannahan, & Collet, 2014; Ross & Tremblay,
2009), where listeners are trained and tested on a small
set of repeating words. Although rote training may be one

1The University of Chicago, 2Huron University College, London,
Canada, 3University of Nevada, Las Vegas

© 2022 Istituto di Tecnologia del Massachussetts

way to rapidly learn the meaning associated with a small set
of acoustic patterns (Fenn, Margoliash, & Nusbaum, 2013),
it is not the most effective way of producing generalization
(Fenn et al., 2013; Greenspan, Nusbaum, & Pisoni, 1988).
Invece, broad exposure to a variety of patterns promotes
rapid learning of general perceptual categories, particularly
for speech (generalized training; Heald & Nusbaum, 2014).
The ability for listeners to generalize beyond their per-
ceptual experiences to novel acoustic patterns has been
shown to depend on the type of experience or training a
learner is given. When participants are trained on a
difficult-to-understand computer-generated (synthetic)
talker, a listener’s ability to generalize beyond the words
in the training set has been shown to depend on whether
they were given all novel words during training (general-
ized training) or if they were given a small set of words that
repeated (rote training; Greenspan et al., 1988). Partici-
pants who are given all novel words during training
demonstrate significantly better generalization compared
with participants who were trained on a small set of words
that repeated. The notion that equal amounts of rote and
generalized training yield different performance outcomes
suggests that they may be mediated by different pro-
cesses. The work of Fenn et al. (2013) supported this idea
by showing that memory consolidation during sleep
selectively benefited generalized learning, but not rote

Journal of Cognitive Neuroscience 34:3, pag. 425–444
https://doi.org/10.1162/jocn_a_01805

apprendimento. Tuttavia, this conjecture of different neural pro-
cesses underlying online rote and generalized learning
has not been directly tested.

The evidence that type of training (rote vs. generalized)
can determine the degree to which learning will transfer
beyond previous experiences raises questions as to what
cognitive and neural mechanisms allow for such transfer
of learning or generalization. Although the neural under-
pinnings of rapid generalized learning have, to our knowl-
edge, not been empirically examined, rapid generalized
learning has been described cognitively as being depen-
dent on the mechanism of selective attention. From a
cognitive view, rapid generalized learning improves per-
ception by orienting attention toward the most phoneti
cally relevant acoustic cues and away from irrelevant ones
for a given circumstance (Francis & Nusbaum, 2002;
Goldstone, 1998; Nosofsky, 1986). In the context of
speech perception, the process of generalized learning
has been utilized to understand how listeners come to
learn a difficult-to-understand talker given more listening
experience. The emphasis on selective attention in the
context of learning a difficult-to-understand talker, COME
opposed to one that emphasizes, Dire, learning new per-
ceptual categories, stems from the idea that adult listeners
already possess a complete phonological category system
(Liberman, 1970; Chomsky & Halle, 1968). As such, UN
listener adjusting to the circumstance of trying to under-
stand a difficult-to-understand talker—provided they are
speaking the same language—has been discussed as a pro-
cess of narrowing attention toward the most diagnostic
acoustic–phonetic cues for the given talker and away from
uninformative ones. Infatti, the work of Francis and
Nusbaum (2009) has shown that generalized learning
modifies the way available attentional and working mem-
ory capacity is used, which has led many to draw on
resource allocation models of perception (per esempio., Lavie,
1995) to explain how training leads to such improvements
(per esempio., Heald & Nusbaum, 2014). According to these cogni-
tive accounts, the initial poor intelligibility manifests
because listeners do not know which acoustic cues to
focus their attention on to derive meaning appropriately,
and as such, ongoing recognition is associated with higher
attentional and working memory costs. Following rapid
generalized learning, Tuttavia, this selective attention
account suggests that listening will be much less effortful
for new words spoken by the same talker, as listeners are
able to shift attention to the subset of acoustic features
that are most diagnostic of the phonemes produced by
the talker (Francis, Baldwin, & Nusbaum, 2000). For this
reason, training on synthetic speech offers a promising
way to investigate how selective attention works in the
context of perceptual learning.

Although synthetic speech learning has been used as a
model to understand how listeners adapt to difficult-
to-understand speech (per esempio., foreign accented, deaf, dys-
arthric, or time compressed), it has been argued that the
processes underlying synthetic speech learning may be

reflective of learning mechanisms that are critical in
speech processing in general (see Heald & Nusbaum,
2014). Specifically, generalized learning mechanisms
have been used to understand how listeners overcome
the huge amount of acoustic variability in their listening
ambiente. Even when speech is putatively easy to
understand, listeners encounter many circumstances
when the underlying acoustic-to-phonetic mapping
changes—such as a shift in talker, speaking rate, or social
register (cf. Mugnaio, 1987; Liberman, Cooper, Shankweiler,
& Studdert-Kennedy, 1967; Ladefoged & Broadbent,
1957). In these cases, behavioral work shows that a
momentary increase in load on attentional and working
memory resources occurs (per esempio., Magnuson & Nusbaum,
2007), potentially indicative of a learning process in which
listeners must determine which acoustic cues are most
diagnostic in that setting. For this reason, neural changes
found to underlie synthetic speech learning may indeed
be pertinent to speech perception in general. Infatti,
the application of generalized learning mechanisms likely
extends beyond speech perception, as this approach to
speech perception (where the acoustic–phonetic map-
pings are dynamically modified by experience) is consis-
tent with general category learning models, dove il
need to remap is tied to changes in how cues relate to cat-
egory membership (per esempio., Goldstone, 1994).

Neuroimaging studies have shown that instruction to
selectively attend to phonetic content modulates activity
in anterior parts of the auditory cortex, whereas instruc-
tion to selectively attend to a spatial location modulates
posterior activity ( Woods et al., 2009; Ahveninen et al.,
2006; Petkov et al., 2004). This dissociation—likely related
to the “what” and “where” processing streams proposed
by Rauschecker and Tian (2000)—is mirrored in electro-
physiological studies demonstrating that the combined
activity of these two sources (anterior and posterior)
contributes to the morphology of the N1, a negative
peak around 100 msec in the auditory evoked potential
(McEvoy, Levänen, & Loveless, 1997). The electrophysio-
logical measures indicate that activity in anterior parts of
the auditory cortex has a longer latency than activity aris-
ing from the posterior source, which has been argued to
reflect why the process of identifying an object takes lon-
ger than recognizing its spatial origin (Picton, 2011). As
come, although N1 as a whole has been argued as a marker
of attention, the more posterior, earlier-latency N1 source
appears to support the gating of awareness to novel
sounds, and the more anterior, later-latency N1 source
supports a subsequent attentional focus to acoustic fea-
tures comprising the auditory object (Gutschalk, Micheyl,
Oxenham, von Kriegstein, & Warren, 2008; Jääskeläinen
et al., 2004; Tiitinen, May, Reinikainen, & Näätänen,
1994). This differentiation between early and late N1
sources relates to earlier the work of McCallum and
Curry (1979), who argued that the N1 wave should be dif-
ferentiated into separate waves. Specifically, McCallum
and Curry (1979) proposed that the N1 wave should be

426

Journal of Cognitive Neuroscience

Volume 34, Numero 3

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

approached as three separate waves, an N1a wave (con un
frontotemporal maximal peak at ∼70 msec), an N1b wave
(with a vertex maximal peak at ∼100 msec), and an N1c
wave (with a temporal maximal peak at ∼140 msec). Adopt-
ing this framework, Picton (2011) has speculated that the
N1c wave can be recognized as arising from this anterior,
longer-latency N1 source. Given that selective attention
is thought to exclusively alter how attention is aligned
to featural information in phoneme recognition for gener-
alized learning, we hypothesize that the rapid generalized
learning of a synthetic talker may exclusively alter the
longer-latency N1 activity (see Figure 1A).

Although neural studies on selective attention offer
some context to our current question, research on the
neural correlates of auditory perceptual learning can also
offer additional insight. Tuttavia, it is important to note
that generalized perceptual learning of synthetic speech
marks a departure from other paradigms used to study
the neural underpinnings of perceptual learning. Primo,
extant neural studies investigating perceptual learning
have almost exclusively focused on rote (not generalized)
perceptual learning, in which participants are repeatedly
trained and tested on the same, small set of stimuli. Sez-
ond, in extant perceptual learning paradigms, participants
have been required to either learn (1) to differentially label
sounds that are functionally equivalent in their native lan-
guage (Tremblay et al., 2014; Ross & Tremblay, 2009) O
(2) to separately label two concurrently presented vowels
(with the same spatial origin; Alain & Snyder, 2008; Alain,
Snyder, Lui, & Reinke, 2007; Reinke, Lui, Wang, & Alain,
2003). Neither of these paradigms examines learning that
occurs at the phonological system level, which is critical
for understanding a difficult-to-understand talker. Piuttosto,
these paradigms emphasize the learning of tokens—
either labeling nonnative tokens (new phonological cate-
sanguinose) or distributing attention over known tokens in
novel ways in the case of labeling two stimuli at once.
For this reason, past paradigms that have been used to
understand perceptual learning may only offer partial
clues into what neural mechanisms support rapid general-
ized perceptual learning.

Perceptual learning paradigms where participants are
given experience with a novel phonetic contrast not in
their native language have documented that learning is
marked by an overall decrease in N1 amplitude (maximal
at vertex ∼100 msec; Alain, Campeanu, & Tremblay, 2010;
Ross & Tremblay, 2009). Work in this paradigm, Tuttavia,
has argued that this N1 change in this context may be a
consequence of habituation and not learning, as the stim-
ulus set only consists of two sounds played repeatedly,
often as participants passively listen (Tremblay et al.,
2014). Tuttavia, in more active tasks, in which partici-
pants are asked to rapidly learn to segregate concurrently
presented vowels, learning has been demonstrated to lead
to a positive shift in the ERP wave ∼130 msec from stimu-
lus onset in temporal electrodes (Alain & Snyder, 2008;
Alain et al., 2007). As previously mentioned, Picton

Figura 1. Schematic of hypothesized time of changes to the ERP signal
as a consequence of generalized learning compared with rote learning.
(UN) We hypothesize that an overall decrease in potentiation for the longer-
latency N1 source or N1c will be observed in the generalized learning
condition, but not in the rote learning condition, if the longer-latency N1
source or N1c is sensitive to the demands of attention toward features
comprising an auditory object. (B) We hypothesize that an overall decrease
in P2 potentiation will be observed in the generalized learning condition,
but for not the rote learning condition, if P2 is sensitive to the number
of active featural relationships searing current recognition. (C) Noi
hypothesize that an overall decrease in late negativity should be found in
the generalized learning condition, but not in the rote learning condition,
if late negativity is reflective of a prediction error correction process that
supports perception.

(2011) has argued that this temporal positive going shift
can be taken as a modulation in the longer-latency N1
source that supports a subsequent attentional focus to
acoustic features comprising the auditory object
(Gutschalk, Micheyl, & Oxenham, 2008; Jääskeläinen
et al., 2004; Tiitinen et al., 1994). As such, this finding sug-
gests that a similar positive change (cioè., smaller negative
amplitude) in the longer-latency N1 may be observed fol-
lowing the rapid generalized learning for a difficult-to-
understand synthetic talker (see Figure 1A).

In contrast to rapid generalized learning, rote learning
of speech tokens may be more similar to paired-associate
apprendimento. Successful paired-associate learning entails the
formation of associations between stimuli and associated
responses rather than the systematic relationships among
the speech tokens as a phonological system for a single
talker. If rapid rote learning of specific utterances from a
difficult-to-understand synthetic talker is characterized by
the encoding and retrieval of episodic memories, Poi
such learning will not be marked by a change in sensory
processing because there is no need to develop a system-
atic relationship between the talker’s phonetic idiosyncra-
sies and the native phonological system. Invece, listeners
may simply memorize the limited set of acoustic patterns
and their associated meanings. From this perspective,

Heald et al.

427

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

rapid rote learning should not modify attention to
acoustic–phonetic properties and therefore should not
influence longer-latency N1 activity (see Figure 1A).

Multiday rote perceptual learning experiments have
also reported an increase in the auditory evoked P2
response after 2 O 3 days of training (Bosnyak, Eaton, &
Roberts, 2004). The change in the auditory evoked P2
risposta, which occurs around 200 msec poststimulus
(Ross et al., 2013; Näätänen & Winkler, 1999) is thought
to reflect a relatively slow learning process, perhaps relat-
ing to the consolidation of a featural representation for the
nonnative phonetic contrast that participants are learning
in long-term memory (Tremblay et al., 2014; Ross &
Tremblay, 2009). In the context of learning to understand
a difficult-to-understand synthetic talker via generalized
apprendimento, it is unclear if a change in the auditory evoked
P2 response would be observed. If the change in the audi-
tory evoked P2 response marks the consolidation of a
newly learned featural representation in long-term mem-
ory, then P2 changes should not be observed following
generalized learning of a difficult-to-understand talker.
Tuttavia, given that the evoked P2 response has been
shown to increase after new perceptual categories have
been formed (Tremblay et al., 2014; Tremblay, Shahin,
Picton, & Ross, 2009), another interpretation is that the
auditory evoked P2 response may simply be sensitive to
the number of active featural representations serving cur-
rent recognition. Such an interpretation is consistent with
research showing that the P2 response is sensitive to spec-
tral complexity but only for experts who presumably rely
on more featural representations as spectral complexity
increases (Shahin, Roberts, Pantev, Trainor, & Ross,
2005) If the auditory evoked P2 response is sensitive to
the number of active feature representations that underlie
perceptual recognition, then rapid generalized learning of
a difficult-to-understand talker may indeed yield immedi-
ate effects on the auditory evoked P2 response. Specifi-
cally, if rapid generalized learning leads to a reduction in
the ambiguity of how acoustic patterns match to linguistic
categories by reducing the number of active feature repre-
sentations required for ongoing perception, we should
see an immediate reduction in the auditory evoked P2
response following training (see Figure 1B). In contrasto
to generalized learning, it is unlikely that any change
would be observed for rote learning of a difficult-to-
understand talker because rote learning of a difficult-to-
understand talker may rely more on the encoding and
retrieval of episodic memories. Consequently, rote learn-
ing of a difficult-to-understand talker should not alter the
number of active featural representations nor, by this
logic, should it elicit a change in the auditory P2 (Vedere
Figure 1B).

Although previous research examining cortical and sub-
cortical evoked activity associated with perceptual learn-
ing has largely focused on changes in the auditory N1
and P2 waves (sometimes referred to collectively as the
N1–P2 complex), a decrease in late negativity in the

auditory evoked potential starting 600 msec poststimulus
onset has also been found to be coincident with improved
perception following training (Tremblay et al., 2014).
Given that negative deflections in event-related potentials
are often affiliated with processes related to error correc-
zione (per esempio., N400, late difference negativity, and error-
related negativity), changes in the late negativity wave
posttraining may represent a change in an error monitor-
ing mechanism that supports learning. This is consistent
with perceptual learning models that specify that the reor-
ganization or formation of perceptual categories (Quale
are implicit in nature) should be dependent on a trial-by-
trial prediction error correction process (Ashby & O’Brien,
2005; Ashby, Alfonso-Reese, Turken, & Waldron, 1998).
Under this view, changes in late negativity in the auditory
evoked potential posttraining should only be found when
training leads to changes in the reorganization or forma-
tion of implicit categories (such as those that presumably
guide perception). As this view suggests that learning
should lead to an improvement in trial-by-trial prediction
error monitoring, as attention becomes appropriately
organized, we hypothesize that the late negativity wave
should lessen as a consequence of generalized learning,
but not rote learning (see Figure 1C).

The goal of this study is to examine how neural
responses produced by generalized learning or rote learn-
ing of synthetic speech differ. Specifically, we assess
whether generalized and rote learning are marked by
different changes in neural activity during early sensory
auditory processing (cioè., during the first 250 msec) or only
during later processing. To test this, we used a pretest–
training–posttest design while performing EEG. Noi
trained participants using either (1) a large inventory of
words in which no words were repeated across the exper-
iment (generalized learning) O (2) a small inventory of
words where words were repeated (rote learning; Vedere
Figura 2). Although participants in the rote learning con-
dition can adopt a simple memorization strategy, partici-
pants in the generalized learning condition cannot use
such a strategy as no words were repeated across the
experiment. Using 128 electrodes and nonparametric
significance testing using a permutation test procedure,
we compared auditory evoked potentials from stimuli at
pretest to those at posttest for each learning condition.
If resource allocation models of perception are correct
(Heald & Nusbaum, 2014), we should find that only gen-
eralized learning leads to changes in N1–P2 window post-
training. Such a finding would suggest that generalized
learning improves perception by constraining how
listeners selectively attend to and process acoustic infor-
mazione. Additionally, to the degree that an error correction
process is needed to guide the reorganization of attention
(as suggested by perceptual learning models, see Ashby &
O’Brien, 2005; Ashby et al., 1998), we should observe
decreases in the late negativity wave as listeners become
more successful in selectively attending to and processing
the difficult-to-understand speech.

428

Journal of Cognitive Neuroscience

Volume 34, Numero 3

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

Figura 2. Schematic for trial structure for the generalized learning condition and rote learning condition at pretest, training, and posttest. Trial time
is relativized against initial word onset (black bolded screen). All trials start with a fixation cross (1250 msec before initial word onset), followed by a
blank screen (500 msec before initial word onset). A 1000 msec after initial word onset, participants are asked to type the word that they heard. In
training, this identification procedure was followed by visual written feedback in tandem with an additional auditory presentation of the initial word
(gray bolded screen). Participants in the rote condition were given an additional test of 100 novel words (not shown) to behaviorally assess their
generalization performance (EEG during this additional test was not recorded).

METHODS

Participants

Twenty-nine individuals participated in the generalized
learning portion of the experiment (M = 21.4 years,
SD = 3.94 years, age range: 18–38, 13 women, 2 left-
handed), E 33 participants participated in the rote
learning portion of the experiment (M = 20.3 years, SD =
2.19 years, age range: 18–26, 17 women, 5 left-handed).
These two groups of participants did not statistically differ
in terms of age, T(60) = 1.38, p = .17; genere (Fisher’s exact
test, p = .62); or handedness (Fisher’s exact test, p = .43).
All participants were recruited from the University of Chi-
cago and surrounding community. Participants were paid
or granted course credit for their participation. All partici-
pants identified as native English speakers, with no
reported history of either a hearing or speech disorder.
Upon completion of the task, all participants reported no
prior experience with the stimuli heard in the experiment.
Additionally, informed consent was obtained from all par-
ticipants, and the research protocol was approved by the
University of Chicago institutional review board.

Stimuli

Stimuli consisted of 500 monosyllabic words produced
using the text-to-speech synthesizer Rsynth (Ing-Simmons,
1994). These words were taken from or modeled after a
phonetically balanced inventory of words that approxi-
mate the distribution of phonemes in American English
(Egan, 1948; see Open Science Framework for

a complete list of the words used, https://osf.io/kwcsv
/?view_only=ee2b903d3c2c467aa6b954b19c7a2afa).
Rsynth uses a formant synthesizer (Klatt, 1980), together
with relatively primitive orthography-to-speech rules, E
it has a reduced and degraded acoustic–phonetic cue set
with low acoustic cue covariation compared with natural
speech. For these reasons, the intelligibility for Rsynth is
quite low. Tuttavia, listeners show rapid improvement
even after a 1-hr training session, even when no words
were repeated across the experiment, increasing their
understanding on average by 15 parole (15% improve-
ment) compared with their performance at pretest (Fenn
et al., 2013; Fenn, Nusbaum, & Margoliash, 2003; Schwab,
Nusbaum, & Pisoni, 1985). This is similar to the work of
Nygaard and Pisoni (1998), which has demonstrated that
listeners learning the speech of a particular (nonsyn-
thetic) talker show significant improvements in speech
recognition even for words not previous trained on. Sim-
ilarly, research with accented talkers shows that learning
the accent improves recognition for untrained words
(Bradlow & Bent, 2008).

For the generalized learning condition, two test lists
containing 100 words each (Test 1, average word duration
Di 354 msec, SD = 75 msec, and Test 2, average word dura-
tion of 331 msec, SD = 83 msec) and a training list contain-
ing 300 word (average word duration of 348 msec, SD =
80 msec) were constructed from the synthesized set of
500 parole. For the rote learning condition, 20 words were
picked from the generalized training list of 300 words to
serve as the test (both pretest and posttest) and training
parole. The same 20 words were used for each participant

Heald et al.

429

in this condition (average word duration of 348 msec,
SD = 82 msec). The two tests (Test 1 and Test 2) for gen-
eralized learning were piloted to be performance balanced
in terms of difficulty. Although no words were repeated
across test and training in the generalized learning condi-
zione, there was repetition of words in the rote learning
condition such that the 20 selected words were repeated
5 times each during testing (100 items per test) E
15 times each during training (300 training items).

To ensure that the 20 selected rote words approximated
the properties of the full 500 words from the generalized
learning condition, we made a random selection of
20 words from the generalized list (500 possible words)
and calculated the number of unique phonemic items in
this set, repeating this process 10,000 times to generate
a distribution against which we could evaluate our
observed number of unique phonemes in the actual rote
set. The actual rote set possessed 24 unique elements,
which fell within the 95% confidence interval (CI) del
constructed distribution (21–28 unique phonemes). In
addition to this, we also calculated the percent overlap
between the top 24 phonemes that occurred the most in
the generalized word list (500 possible words) and the 24
unique phonemes identified in the actual rote word list.
We observed that there was an 85% overlap between these
two lists. To evaluate this statistic, we again created a dis-
tribution of this statistic by making a random selection of
20 words from the generalized list (500 possible words)
and calculating the percent overlap between this set
and the top 24 phonemes found in the generalized word
list, repeating this process 10,000 times. Our observed
percent overlap fell within the 95% CI of the constructed
distribution (70–95%). These analyses suggest that the 20
selected rote words well approximated the properties of
the generalized word list, given their set size.

Procedure

Behavioral work has clearly demonstrated that perfor-
mance changes due to perceptual learning are long-
lasting (for a review, see Goldstone, 1998). Specifically,
research by Schwab et al. (1985) has demonstrated that
improvements in understanding from rapid generalized
training of a difficult-to-understand synthetic voice lasts
for 6 months. To avoid obvious carryover effects, partici-
pants were either assigned to engage in generalized
learning or rote learning. For both groups, informed con-
sent was obtained before beginning the experiment. Tutto
participants were initially tested (pretest), trained (train-
ing), and retested (posttest) on their identification perfor-
mance for monosyllabic synthetic speech stimuli (Vedere
Figura 2 for a schematic of trial and condition structure).
In both groups, stimuli were presented binaurally using
MATLAB 2015 with Psychtoolbox 3 over insert earphones
(3M E-A-RTONE GOLD) at 65–70 dB SPL In the general-
ized learning condition, the test lists used at pretest and
posttest were counterbalanced across participants to

ensure that any change in performance from pretest to
posttest would reflect learning. As the testing material
for the rote learning condition was the same at pretest
(5 repetitions for each word–100 test trials), training (15
repetitions for each word–300 training trials), and posttest
(5 repetitions for each word–100 test trials), there was no
need to counterbalance the tests for the rote learning con-
dizione. For both rote and generalized learning, test trials
consisted of participants hearing a synthetic speech token
E, after a short delay, being asked to type back what they
heard. For training, this identification procedure was
followed by visual written feedback in tandem with an
additional auditory presentation of the synthetic speech
token. In the generalized learning condition, no words
were ever repeated across the experiment (cioè., partici-
pants heard 100 unique words during the pretest, 300
unique words during training, E 100 unique words
during the posttest).

At the conclusion of the experiment, participants in the
rote condition were given a generalized learning test of
100 novel words (identical to Test 1 in the generalized
learning condition). This was done to replicate previous
research that showed that rote learning training leads to
poorer generalization to untrained words compared with
training on all novel words when learning a difficult-to-
understand talker (Fenn et al., 2013; Greenspan et al.,
1988). Important to our present hypotheses, these previ-
ous studies show that rote learning training for 20 repeat
words of a difficult-to-understand talker does allow for
some generalization to untrained words, but that this gen-
eralization is significantly weaker than the generalization
found for those who are trained on all novel words (Fenn
et al., 2013; Greenspan et al., 1988). For this reason, we
expect generalization performance for rote individuals to
be better than the generalized learning conditions’ pretest
performance but worse than their posttest performance.
EEG signals were recorded continuously during pretest,
training, and posttest. After the experiment concluded,
participants’ heads were photographed using a geodesic
dome with 11 mounted infrared cameras to precisely
determine the location of all 128 electrodes (Russell,
Jeffrey Eriksen, Poolman, Luu, & Tucker, 2005).

Data Acquisition

Neurophysiological responses were obtained using an
Electrical Geodesics, Inc. (EGI) GES 300 Amp system (fuori-
put resistance of 200 , with a recording ranging from
0.01 A 1000 Hz). The high-density EEG (128 electrodes)
was recorded at 250 samples/sec in reference to vertex
using unshielded HCGSN 130 nets. Before both pretest
and posttest periods, impedances were minimized by
reseating or, if necessary, by rewetting electrode sponges
using a transfer pipette and saline (A 50 kΩ or less).
Resulting amplified EEG signals were recorded using EGI
Net Station software (v. 4.5.7) on a computer running Mac
OSX (10.6) operating system. No filtering was applied to

430

Journal of Cognitive Neuroscience

Volume 34, Numero 3

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

the EEG signal at acquisition. Trial types were tagged in
Netstation using the Netstation Toolbox in Psychtoolbox.
Timing of tags was corrected during preprocessing as a
mean tagging latency of 15 msec (SD = 2.4 msec) was found
between the stimulus presentation computer running Mac
OSX and Net Station via EGI’s audio timing test kit.

EEG Preprocessing

EEG recordings were preprocessed in Brain Electrical
Source Analysis (BESA) software (BESA Research 7.0).
Electrode coordinates from individuals’ net placement
photos were used to assign individual sensor locations
for each participant. Recordings were filtered with 0.3–
50 Hz band pass and 60 Hz notch filters to remove electri-
cal noise. Voltage was rereferenced to the average of all
electrodes. Based on the trial tags, epochs of interest
around the times of stimuli presentation were selected
COME 200 msec before to 800 msec after the onset of the stim-
ulus. Epochs were then examined for artifacts including
eye blinks and movements. Beyond visual inspection, volt-
age threshold detection was also used (voltage thresholds
for eye movements were 150 μV for horizontal movements
picked up in the EOG electrodes and 250 μV for vertical
movements). Artifacts were removed from the epochs of
interest using ocular source components using BESA
(BESA Research 7.0; Picton et al., 2000; Berg & Scherg,
1994). In some cases, artifacts due to large movements
or to sweat could not be removed by independent compo-
nent analysis. In these cases, the contaminated trials were
not included in further analysis. Individual channels that
were problematic for a majority of trials (amplitude of
>150 μV indicating excessive noise, <0.01 μV indicating low signal, or changes of >75 μV from one sample to
il prossimo) were replaced by interpolation using surround-
ing channels. Because electrode impedances were only
checked before pretest and posttest periods, we declined
to analyze EEG data from the training block to ensure we
only present the highest quality data.

The data collected from three participants in the gener-
alized learning condition (two men, one woman; all right
handed) and three participants in the rote learning condi-
zione (two men, one woman; all right handed) were
removed from further analysis because of excessive arti-
fact contamination (removal of more than 30 trials in
either pretest or posttest). In the generalized learning
condition, the remaining 26 participants had an average
Di 90 trials remaining for the pretest (SD = 8.69, range:
74–100), and an average of 88 trials remaining for the
posttest (SD = 7.22, range: 75–99). In the rote learning
condition, the remaining 30 participants had an average
Di 89 trials remaining for the pretest (SD = 8.10, range:
71–100) and an average of 91 trials remaining for the post-
test (SD = 8.43, range: 70–100). For each participant, aver-
aged waveforms for the conditions of interest (per esempio., pretest
and posttest) were created, as were corresponding files for
topographic analysis in RAGU (RAndomization Graphical

User interface; Koenig, Kottlow, Stein, & Melie-García,
2011). The 100-msec prestimulus period was used to
baseline correct the ERP averages by subtracting the
average during the prestimulus period from each time
point in the waveform. To compute topographic maps,
participant-specific 3-D electrode locations were used.
The averaged ERP data, participant-specific electrode
location files, specific words used in each condition, E
behavioral data have been made available on Open
Science Framework (https://osf.io/ kwcsv/?view_only
=ee2b903d3c2c467aa6b954b19c7a2afa).

Statistical Analyses

Global Analyses

To conduct global analyses over the course of the entire
epoch using every electrode, we used RAGU, an open-
source MATLAB-based program that performs nonpara-
metric significance testing by generating 5000 simulations
in which data from the conditions of interest have been
randomly shuffled to bootstrap a control data set (Vedere
www.thomaskoenig.ch/index.php/software/ragu
/download). This set of simulations functions as a null
distribution, against which the observed data can be
compared, usually using a measure of effect size such as
global field power (GFP; or the standard deviation across
electrodes at a given time point) or global map dissimilar-
ità (a measure that captures scalp topography differences
between conditions at a given time point). This avoids
biases associated with a priori assumptions about which
time windows and electrodes should be included in the
analysis (Koenig et al., 2011; Murray, Brunet, & Michel,
2008). We used RAGU to perform a GFP analysis and a
topographic ANOVA (TANOVA) that compared strength
of scalp field potential and scalp topography, rispettivamente,
between pretest and posttest.

RAGU calculates GFP at every time point in the epoch

of interest as follows:

S

GFP ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X
− μÞ2
ðμ
N
i¼1

io

N

where n is the number of electrodes, μi is the voltage of
electrode i, and _μ is the mean voltage across all elec-
trodes (Koenig, Gianotti, & Lorena, 2009). Così, GFP
is a measure of standard deviation across electrodes.
Conceptually, this means that if there is a strong
response over part of the scalp, then the GFP will be
greater due to more variance across locations, whereas
weak responses will yield low GFPs. GFP also has the
benefit of being entirely reference independent. Once
the observed GFP is calculated, the data are shuffled
between conditions and GFP is recalculated. Questo
reshuffling procedure is carried out 5000 times at each
time point to obtain the null distribution of GFP for a
given time point. At each time point, a p value is

Heald et al.

431

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
4
3
4
2
5
1
9
8
4
8
4
9

/
j

o
C
N
_
UN
_
0
1
8
0
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

calculated that represents the proportion of randomized
GFPs that exceed the observed GFP.

N1 and P2 auditory evoked potentials (Picton, 2011) Di
interest to us here.

Although GFP is a good measurement of differences
in the strength of potentials across the scalp, potentially
important topographical information is lost by calculat-
ing standard deviation across all electrodes. RAGU’s
TANOVA measures differences in topographical distribu-
tions of voltage between pairs of conditions or time
points. The measure of effect size used is generalized
dissimilarity s across the experimental conditions:

s ¼

X

C
i¼1

S

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
(cid:4)
P
2

(cid:3)

N
j¼1

υij − υj
N

where c is the number of conditions, n is the number of
electrodes, __υij is the mean voltage of condition i at electrode
j across participants, and __υj is the mean voltage at electrode
j across all participants, with conditions averaged together
(Koenig & Melie-García, 2010). Because this measure
accounts for differences between condition-wise maps
at individual electrodes, it preserves topographical infor-
mazione, unlike GFP: The farther apart the voltage at elec-
trode j in condition A versus B, the larger the squared
difference added to s and therefore the larger the differ-
ence in voltage patterns on the scalp.

Once RAGU has calculated the generalized dissimilar-
ità (S) from the data, it shuffles the data between conditions
and recalculates s 5000 times to generate a null distribution
of generalized dissimilarities. These are the effect sizes that
would be expected in the absence of a true difference
between the conditions. A p value is calculated at each time
point by comparing this null distribution to the observed
data as with GFP. The TANOVA was used to identify periods
of interest as time windows where there appeared to be
significant map differences between pretest and posttest
( P < .05). Before the TANOVA analysis, we normalized data by dividing all voltage values of a given map by its time-specific GFP. This was done so that significant differ- ences found between the conditions in the TANOVA analysis could be attributed solely to underlying differ- ences in source contributions in the brain. For both GFP and TANOVA analyses, we also report whether the window passed a duration threshold test. This was done by collecting the duration of continuous win- dows found to be significant in the bootstrapped data. The distribution of these durations represents the distri- bution of duration under the null hypothesis that the data are interchangeable between conditions, as they were obtained from the shuffled data (in which the data are interchanged between conditions randomly). For each test, we set the threshold for the duration as the 95th per- centile of spurious window durations that appear across the 5000 random permutations. Windows in the observed data that pass this threshold testing are clearly noted; however, we decided to report all windows, especially those before 300 msec, given the transient nature of the Beyond the RAGU analysis, we used BESA Statistics 2.0 to ascertain which electrodes were responsible for the observed topographic changes. To do this, we averaged each electrode’s voltage over the windows identified in the TANOVA analysis and performed paired-samples t tests between pretest and posttest. This analysis used a spatiotemporal permutation-based correction to adjust for multiple comparisons. For these analyses, we used a cluster alpha level of .05 for cluster building, 5000 permu- tations, and a channel distance of 4 cm that resulted in an average of 6.58 neighbors per channel for the generalized learning condition and an average of 7.09 neighbors per channel for the rote learning condition. The small differ- ence in neighbors in the cluster analysis between the two conditions is due to small variation in head size between the two conditions, as a geodesic dome with 11 mounted infrared cameras was used to precisely deter- mine the location of all 128 electrodes for each participant. Source Level Analysis To investigate the intracranial sources underlying the topographic window changes identified by the TANOVA analysis in RAGU, we used the local auto regressive aver- age (LAURA) model in BESA Research 7.0. LAURA is a dis- tributed source localization method that does not make a priori assumption with respect to the number of discrete sources. Similar to other distributed volume inverse imag- ing methods, LAURA seeks to find a solution where the dis- tribution of the current over all source points is minimized while optimally trying to explain the observed topography. LAURA, however, uses a spatial weighting function to account for the fact that source strength should decrease by the inverse of the cubic distance between a putative source and recording electrodes on the scalp. The result of this technique is a spatiotemporal projection of current density in a neuroanatomical space similar to a functional map in fMRI. LAURA modeling was applied to the full ERP epoch (−200 to 800 msec) for each participant. Using a cluster-based permutation test in BESA Statistics 2.0, we computed average distributed source images across the time windows identified by the TANOVA analysis in RAGU. These average distributed source images were then con- trasted between pretest and posttest for each learning con- dition using a cluster-based permutation test (contrast: posttest–pretest). This analysis allowed us to identify ana- tomical locations in the brain volume responsible for the topographical differences identified by the TANOVA analysis. For this analysis, cluster values were ascertained by the sum of all t values within a given cluster. The significance of observed clusters is determined by generating and compar- ing clusters from 5000 permutations of the data between stimulus conditions. Statistically, the results reported from this analysis are highly conservative as BESA corrects for multiple comparisons across all voxels and time points to l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 432 Journal of Cognitive Neuroscience Volume 34, Number 3 control the familywise error rate. Because of the con- servative nature of this analysis and that the windows we are investigating have been previously identified through the TANOVA analysis in RAGU, all identified clusters from this analysis are reported including null results. RESULTS Behavioral Results condition at posttest (independent, equal variance unas- sumed two-sample [Welch’s] t test: t(41.90) = −2.18, p = .035, Cohen’s d = 0.60). This finding replicates previ- ous work showing that although rote training (repeat experience on a small subset of words) can yield some generalized learning, such generalized learning is signifi- cantly weaker than generalized learning that results from training on a set of all novel words. Generalized Learning Behavioral Results Electrophysiology Results Word recognition performance (i.e., the number of words transcribed correctly) at posttest was subtracted from word recognition performance at pretest to obtain a participant- specific learning score. Pretest performance averaged 29 words correct out of 100 (SD = 8.19, range: 15–45). After training, recognition performance on the posttest signifi- cantly increased to an average of 42 words correct out of 100 (SD = 13.27, range: 20–71; paired-samples t test: t(25) = 7.11, p < .00001, Cohen’s d = 1.402). This means that individuals significantly recognized more words at post- test than they did at pretest, despite no words repeating across the tests (or training) in the generalized learning condition. To verify that the tests were indeed performance balanced, we compared pretest, posttest, or learning per- formance between the two test orders in the generalized learning condition. We found no evidence for any difference in performance (pretest: Welch’s two-sample independent- sample t test: t(24.0) = −0.93, p = .36, Cohen’s d = 0.37; posttest: Welch’s two-sample independent-sample t test: t(23.7) = −1.30, p = .22, Cohen’s d = 0.51; learning: Welch’s two-sample independent-sample t test: t(23.5) = −1.00, p = .32, Cohen’s d = 0.39). Rote Learning Behavioral Results Similar to generalized learning, word recognition perfor- mance at posttest was subtracted from word recognition performance at pretest to obtain a participant-specific learning score. Pretest performance averaged 16 words correct out of 100 (SD = 7.83, range: 4–30).1 After training, recognition performance on the posttest significantly increased to an average of 95 words correct out of 100 (SD = 8.6, range: 68–100; paired-samples t test: t(29) = 41.48, p < .00001, Cohen’s d = 7.571). This indicates that training significantly helped individuals to appropriately recognize the words shown at pretest by the posttest in the rote learning condition. Performance on the additional generalized learning test (all novel words) for those in the rote learning condition shows that participants, on average, correctly identified 36 words correctly out of 100 (SD = 8.7, range: 13–47 words). Although this performance was better than the perfor- mance demonstrated by individuals in the generalized learning condition at pretest (independent, equal variance unassumed two-sample [Welch’s] t test: t(53.55) = 2.77, p = .008, Cohen’s d = 0.74), it was also significantly worse than performance by those in the generalized learning Generalized Learning Electrophysiology Results Figure 3A shows the grand-averaged ERPs (across all par- ticipants in the generalized learning condition) elicited during both pretest and posttest. For both pretest and posttest, N1 and P2 had maximal voltages at central sites (e.g., C3, Cz, C4). At pretest, N1 peaked at 112 msec at Cz, whereas P2 peaked at 200 msec at Cz. At posttest, N1 peaked at 108 msec at Cz, whereas P2 peaked at 200 msec at Cz. Consistent with prior research, inverted polarity for the N1 and P2 waves was found over sites P9 (left temporal) and P10 (right temporal; see Figure 3B) below the Sylvian fissure, thereby suggesting that the neural generators for both N1 and P2 are in or near primary auditory cortex (Yvert, Fischer, Bertrand, & Pernier, 2005; Liégeois-Chauvel, Musolino, Badier, Marquis, & Chauvel, 1994; Andrews, Knight, & Kirby, 1990; Scherg, Vajsar, & Picton, 1989). Rote Learning Electrophysiology Results Figure 3C shows the grand-averaged ERPs (obtained by averaging across all participants in the rote learning condi- tion) elicited during pretest and posttest. At both pretest and posttest, N1 and P2 had maximal voltages at central sites (e.g., C3, Cz, C4), with the highest peak at Cz. At pre- test, N1 peaked at 112 msec at Cz, whereas P2 peaked at 200 msec at Cz. At posttest, N1 peaked at 108 msec at Cz, whereas P2 peaked at 200 msec at Cz. Similar to the gen- eralized learning condition, inverted polarity for the N1 and P2 waves was found over sites P9 (left temporal) and P10 (right temporal) below the Sylvian fissure in the rote learning condition (see Figure 1D), indicating that the neural generators for both N1 and P2 in this condition are also in or near primary auditory cortex ( Yvert et al., 2005; Liégeois-Chauvel et al., 1994; Andrews et al., 1990; Scherg et al., 1989). Analysis of Overall Amplitude Difference for Generalized Learning Based on the analysis with RAGU, we identified two win- dows in the time series of auditory evoked potential in which the observed GFP difference exceeded the top bound of the null distribution’s 95% CI. Both of these windows had sufficient length and passed window thresh- olding.2 In the first window from 116 to 208 msec, GFP was shown to be lower at posttest (mean = 1.16 μV, SD = 0.14) compared with pretest (mean = 1.39 μV, SD = 0.14; see Heald et al. 433 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 3. Grand-averaged ERPs for generalized learning (A) and rote learning (C) for the nine centralized locations using a virtual montage of the 10–20 system available in BESA Research 7.0 (F3, Fz, F4, C3, Cz, C4, P3, Pz, P4). Grand-average ERPs for P9 and P10 electrodes for generalized learning (B) and rote learning (D) demonstrate the inversion of the N1–P2 complex that is typical of auditory evoked potentials. The ERPs associated with pretest trials are shown in red, and the ERPs associated with posttest trials are shown in blue. Horizontal tick marks span 100 msec, and vertical tick marks represent 1 μV; negative is plotting up. Word onset was used to register and align the EEG traces for averaging, and thus, 0 msec in these plots represents word onset time. Average duration of words in the generalized learning condition was ∼340 msec, whereas the average duration of words in the rote learning condition was ∼350 msec. Figure 4, top plot). The mean GFP difference in the observed data in this time window was 0.22 μV (posttest– pretest). The mean GFP difference under the null was 0.073 μV, 95% CI [0.03, 0.14]. During this time window, the RAGU GFP procedure showed that, on average, there was .02 probability that the effect size of GFP under the null was larger than the observed difference in GFP (minimum and maximum p values of where the observed data fall in the null distribution for GFP difference in this interval are .002 and .05, respectively; see Figure 4, bottom plot). In the second window from 580 to 800 msec, GFP was shown to be lower at posttest (M = 2.14 μV, SD = 0.08) compared with pretest (M = 2.77 μV, SD = 0.05; see Figure 4, top). The mean GFP difference in the observed data in this time window was 0.64 μV (posttest–pretest). The mean GFP difference under the null was 0.21 μV, 95% CI [0.05, 0.49]. During this time window, the RAGU GFP procedure showed that, on average, there was .01 probability that the effect size of GFP under the null was larger than the observed difference in GFP (minimum and maximum p values of where the observed data falls in the null distribution for GFP difference in this interval are .0014 and .05, respectively; see Figure 4, bottom). Analysis of Overall Amplitude Difference for Rote Learning Using the same analysis on the rote learning data yielded no significant GFP differences. The top panel of Figure 5 plots the observed GFP values over time at pretest and posttest for rote learning, along with the probability of obtaining a GFP difference by chance more extreme than the observed GFP difference. The only time that the observed data came close to falling in the extreme tail ( p < .05) of the null dis- tribution is around 300 msec. The bottom panel of Figure 5 shows where the observed GFP difference fell relative to the null distribution (mean and 95% CI are shown). Analysis of Overall Topographic Difference for Generalized Learning We used RAGU to generate the generalized dissimilarity between pretest and posttest that may be obtained due to chance by shuffling the data 5000 times. This distribution for the generalized dissimilarity statistic under the null was then compared with the observed generalized dissim- ilarity statistic. For generalized learning, only one window (250–272 msec) for topographic change was identified (see Figure 6A).3 Although this interval did not pass the win- dow threshold test in RAGU, its appearance at the end of the N1–P2 time window (previous to 300 msec) fits our pre- diction that generalized perceptual learning modifies sen- sory evoked responses. The average observed generalized dissimilarity statistic between pretest and posttest in this time window was 2.99, whereas the average generalized dis- similarity statistic between pretest and posttest under the null was 2.22, with 95% CI [1.76, 2.77]. During this time 434 Journal of Cognitive Neuroscience Volume 34, Number 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 4. The top plot shows mean GFP (right y-axis) over time for both pretest (short dashed line) and posttest (long dashed line; error bars show ±1 SE ) in the generalized learning condition overlaid on the probability over time that the GFP difference under the null was larger than the observed difference in GFP (black line; left y-axis). Significant time periods identified by the GFP analysis are shaded gray: one occurring from 116 to 208 msec and another occurring from 580 to 800 msec. The bottom plot shows the difference in GFP between pretest and posttest (dark black line). For context, the mean (light gray line) and 95% CI (dark gray area) for the GFP difference expected due to random chance (estimated from randomizing the data 5000 times) has been plotted. Significant time periods are again shaded gray, although note that these periods are identified by the observed difference in GFP exceeding the upper bound of the 95% CI of the shuffled data. window, the RAGU TANOVA procedure showed that, on average, there was .011 probability that the generalized dissimilarity statistic between pretest and posttest under the null was larger than the observed difference in the generalized dissimilarity statistic (minimum and maximum p values of where the observed data fall in the null distri- bution for the generalized dissimilarity statistic in this interval are .01 and .04, respectively; see Figure 6A). A spatiotemporal permutation-based analysis per- formed in BESA Statistics 2.0 on average surface electrode activity during this time window identified three signifi- cant electrode clusters driving the topographic change between pretest and posttest (see Figure 6, W1). The first significant cluster ( p = .0006) was found between frontal and central electrodes left of the midline and comprised the following EGI electrodes (an approximate of 10–10 equivalent is included if available; Luu & Ferree, 2000): 6 (Fcz), 7, 12, 13 (FC1), 110, 111 (FC4), 112 (FC2), 117(FC6), and 118. The second significant cluster ( p = .0038) was found over the left preauricular region and comprised the following EGI electrodes: 114 (T10) and 113. The third significant cluster ( p = .0154) was found between parietal and occipital electrode left of the midline and comprised the following EGI electrodes (an approximate of 10–10 equivalent is included if available): 83 (O2), 84, 89, 90 (PO8), and 91. Analysis of Overall Topographic Difference for Rote Learning To determine if a change in the scalp distribution of brain electrical activity occurred because of rote learning, we used RAGU to calculate the observed generalized dissimi- larity statistic between pretest and posttest, along with this statistic for 5000 shuffles of the data, to calculate a null distribution. This analysis identified six windows (see Figure 7A: W1, W2, W3, W4, W5, and W6) where the observed generalized dissimilarity statistic exceeded the top bound of the null distribution’s 95% CI. However, only two of these windows (W4: 424–484 msec and W6: 660– 800 msec) passed the window threshold test in RAGU. Table 1 reports (1) the observed generalized dissimilarity Heald et al. 435 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 5. The top plot shows mean GFP (right y-axis) over time for both pretest (short dashed line) and posttest (long dashed line) in the rote learning condition (error bars show ±1 SE ) overlaid on the probability over time that the GFP difference under the null was larger than the observed difference in GFP (black line; left y-axis). No significant time windows were observed. The bottom plot shows the difference in GFP between pretest and posttest. For context, the mean (light gray line) and 95% CI (dark gray area) for the GFP difference expected due to random chance (estimated from randomizing the data 5000 times) has been plotted. Figure 6. Plot A shows how the generalized dissimilarity statistic between pretest and posttest topographies varies overtime in the generalized learning condition (black line). For context, the mean (light gray line) and 95% CI (medium gray area) for the generalized dissimilarity statistic expected due to random chance (estimated from randomizing the data 5000 times) has been plotted. Plot W1 shows the results of the spatiotemporal permutation-based analysis that was performed on the W1 window found in RAGU. This plot shows the average topographic difference between pretest and posttest (contrast: posttest–pretest) and indicates where the three significant electrode clusters ( p < .05) are topographically located. 436 Journal of Cognitive Neuroscience Volume 34, Number 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 7. Plot A shows the time-varying generalized dissimilarity between pretest and posttest topographies for rote learning (black line). For context, the mean (light gray line) and 95% CI (medium gray area) for the generalized dissimilarity expected due to random chance (estimated from randomizing the data 5000 times) has been plotted. Six windows were identified where the observed data exceeded the upper bound of the 95% CI of the shuffled data. Plots W1, W2, W3, W4, W5, and W6 show the results of the spatiotemporal permutation-based analysis that was performed on each window. All electrode clusters shown, uniquely colored on each plot, are significant at a p < .05 level. Heald et al. 437 Table 1. Significant Time Windows Showing Topographic Change from Pretest to Posttest Identified by the TANOVA RAGU Analysis in the Rote Learning Condition Window Time Observed GD Mean GD from the Permutation Distribution W1 W2 W3 W4 W5 W6 100 msec 156–160 msec 380–400 msec 424–484 msec 580–604 msec 660–800 msec 3.22 2.5 2.4 2.33 2.2 2.66 2.46 1.96 1.89 1.8 1.81 2.05 95% CI for GD from the Permutation Distribution [1.92, 3.14] [1.58, 2.43] [1.55, 2.3] [1.48, 2.19] [1.51, 2.17] [1.72, 2.4] Probability of the Observed GD Statistic (or Greater) under the Permutation Distribution .04 .04 .02 .03 .04 .01 For each window, the observed generalized dissimilarity (GD) between pretest and posttest is reported along with the mean and the 95% CI from the permutation distribution. The final column (Probability of the Observed GD statistic (or Greater) under the Permutation Distribution) reports the percentage of the 5000 shuffled versions of the data that obtained a GD statistic between pretest and posttest more extreme than actually observed. Data in bold indicate windows that pass the duration threshold in RAGU. statistic between pretest and posttest, (2) the mean, and (3) 95% CI for the generalized dissimilarity statistic between pretest and posttest under the null, as well as (4) the probability of the observed data given the shuffled data for each of these identified time windows. To identify significant electrode clusters driving the topographic differences during these periods, a spatio- temporal permutation-based analysis was performed in BESA Statistics 2.0 for each of these time windows. An examination of the first of these periods to pass the win- dow threshold test (W4) revealed three significant clusters of electrodes that showed significantly different activity from pretest to posttest (see Figure 7, W4). The first signif- icant cluster ( p < .0001) was centered between the Cz and Fz electrodes and comprised the following EGI electrodes (an approximate of 10–10 equivalent is included if avail- able; Luu & Ferree, 2000): 5, 6 (Fcz), 7, 20, 30, 55 (CpZ), 106, 112 (FC2), and 118. The second significant cluster ( p = .0214) in W4 was located over the Fpz region and included the following EGI electrodes (an approximate of 10–10 equivalent is included if available): 14 (Fpz), 15 (Fcz), 16 (Afz), 17, 21, and 22 (Fp1). The final significant cluster ( p = .022) in W4 was found over the frontal– temporal region and comprised the following EGI elec- trodes (an approximate of 10–10 equivalent is included if available): 39 and 40. The analysis of the second time period to pass the window threshold test ( W6) revealed one significant cluster of electrodes ( p = .04) that cen- tered near FT9 and F9 electrodes. This cluster consisted of EGI electrodes (an approximate of 10–10 equivalent is included if available): 48, 128, and 127 (see Figure 7, W6). Analysis of Source Difference for Generalized Learning To estimate the source activations that lead to the map difference in the generalized learning condition that spanned from 252 to 272 msec, average distributed source images obtained from LAURA modeling were contrasted between pretest and posttest using a cluster- based permutation test in BESA Statistics 2.0 (contrast: posttest–pretest). For each identified source cluster, the closest cortical region was determined using the MATLAB toolbox version of the Brede Database (Nielsen, 2003). For generalized learning, four clusters were identified as decreasing in activity following training: a cluster near superior temporal gyrus (31.5, −30.9, 9.7, p = .231), a cluster in left superior parietal (−17.5, −79.9, 37.7, p = .335), a cluster in left anterior cingulate gyrus (−17.5, 11.1, 23.7, p = .39), and a cluster in right anterior cingulate gyrus (10.5, 25.1, 9.7, p = .559. Although these four clusters do not pass the significance threshold of p < .05, these identified regions directly align with areas that would be expected given our hypothesis that generalized learning of a difficult-to- understand talker helps to alleviate attention by reorga- nizing attention to the most diagnostic phonological features for the talker. Analysis of Source Difference for Rote Learning Similar to Generalized Learning, we estimated the source activations that led to the six identified map differences ( W1, W2, W3, W4, W5, and W6) in the rote learning condi- tion by contrasting pretest and posttest average distrib- uted source images obtained from LAURA modeling using a cluster-based permutation test in BESA Statistics 2.0 (contrast: posttest–pretest). For each identified source cluster, the closest cortical region in Talairach coordinate space was determined using the MATLAB toolbox version of the Brede Database (Nielsen, 2003). Table 2 reports the results of the cluster-based permutation analysis for each of the six windows identified through the RAGU TANOVA 438 Journal of Cognitive Neuroscience Volume 34, Number 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Table 2. Results of the Cluster-based Permutation Analysis on the Average Distributed Source Images Obtained from LAURA Modeling Comparing Pretest and Posttest for Each of the Six Windows Identified through the RAGU TANOVA Analysis for the Rote Learning Condition Window Time Sources Lobar Anatomy W1 100 msec −17.5, −51.9, −25.3 Left cerebellum 31.5, 60.1, 23.7 Right middle frontal gyrus 59.5, −51.9, −18.3 Right inferior temporal gyrus −31.5, 4.1, 16.7 Left inferior frontal gyrus Cluster Significance Pretest Activity Posttest Activity .01 .02 .07 .11 0.137 0.173 0.1026 0.1302 0.1237 0.9824 0.1821 0.1469 W2 145–160 msec 24.5, −23.9, 16.7 Right superior temporal gyrus <.0001 0.1855 0.1406 10.5, −79.9, 44.7 Right medial parietal/precuneus W3 380–400 msec 10.5, −30.9, 9.7 Right posterior cingulate 10.5, 32.1, 16.7 Right anterior cingulate 52.5, 18.1, 9.7 Right inferior frontal gyrus W4 424–484 msec 24.4, −23.9, 16.7 Right superior temporal gyrus −31.5, −9.9, 9.7 Left temporal insula W5 580–604 msec −31.5, −23.9, 9.7 Left Heschl’s gyrus −17.5, −72.9, 30.7 Left medial parietal/precuneus W6 660–800 msec −31.5, −2.9, 23.7 Left inferior frontal gyrus −3.5, −79.9, −4.3 Lingual gyrus −24.4, −65.9, 30.7 Left precuneus .44 .18 .31 .48 .14 .38 .07 .2 .23 .29 .31 0.0327 0.0263 0.2578 0.185 0.5819 0.4024 0.1492 0.1135 0.2937 0.2148 0.3886 0.3078 0.4862 0.3298 0.212 0.1597 0.4096 0.3171 0.1262 0.1626 0.2423 0.1797 For each identified cluster, the x, y, z location of peak activation is reported in Talairach coordinate space. Data in bold indicate windows that pass the duration threshold in RAGU TANOVA analysis. analysis. It is important to note that the areas implicated by this analysis in the rote learning condition are to some extent consistent with episodic learning models (Spaniol et al., 2009). DISCUSSION One model of generalization and abstraction in memory is that individual experiences are encoded as rote represen- tations and that generalization emerges from the aggre- gate response of long-term memory given a novel test item. The test item elicits responses from prior experi- ences that are stored, and the emergent response to the novel item is a generalization over those individual traces (McClelland & Rumelhart, 1985). If this were the case, there should be substantial similarities in rote learning and generalized learning with the primary difference being the strength of representation in rote learning (more instances of encoding the same trace). However, prior research has argued there are different mechanisms underlying rote and generalized perceptual learning of synthetic speech (Fenn et al., 2013) by showing different patterns of consolidation for each type of learning during sleep. The present patterns of neural responses support this latter view. In this study, generalized learning was marked by an amplitude reduction in the latter portion of the N1 wave into the peak of the P2 wave from 116 to 208 msec not seen in rote learning. These generalization training effects were followed by (1) a source configura- tion change from 250 to 272 msec that was estimated to arise from a decrease in activity in the right superior temporal gyrus, the left superior parietal, and the anterior cingulate gyrus bilaterally and (2) late negativity in the auditory evoked potential 580–800 msec poststimulus onset. Unlike generalized learning, rote learning was only marked by a series of source configuration changes mostly Heald et al. 439 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 occurring 380 msec after stimulus onset. The demonstra- tion of changes in the N1–P2 complex for generalized learning, but not for rote learning, supports the theoretical view that the transfer of learning beyond talker-specific experiences is garnered through an attentional reorganiza- tion process that adaptively modifies early auditory pro- cessing to cope with systematic acoustic variability. This is consistent with the work of Francis et al. (2000) that has demonstrated that generalized learning reduces atten- tion to uninformative acoustic cues and increases it to informative ones. An alternative to this account is that the observed reduction in the N1–P2 complex is reflective of an automatization of sensory processing (Shiffrin & Schneider, 1977). Generalized learning on this kind of syn- thetic speech has been shown to reduce working memory demands, consistent with increased automatization and reduced cognitive load (Francis & Nusbaum, 2009). How- ever, one might expect that if automatization were the explanation of the N1–P2 change observed in generalized learning, the same change should have been observed for rote learning, given that overtraining with a small set of stimuli is much more consistent with the conditions for automatization. As previously discussed, the auditory evoked N1 poten- tial has been argued to be composed of at least two phys- ically and arguably functionally distinct sources (Picton, 2011; Jääskeläinen et al., 2004; McEvoy et al., 1997), with the temporally earlier N1 source supporting mechanisms by which novel, unattended sounds are brought into awareness and the temporally later N1 source supporting additional attentional focus to features comprising the auditory object ( Jääskeläinen et al., 2004). Our GFP analy- sis in the generalized learning condition revealed a signif- icant change in the later part of the N1 time period, from 116 to 208 msec, starting approximately at the height of the N1 peak and lasting through to the P2 component. The absence of change in the temporally earlier N1 source and presence of the change during the temporally later N1 source is additionally consistent with the view that gener- alized learning of synthetic speech is related to a substan- tial reduction in the demands of attention toward features comprising an auditory object (Gutschalk, Micheyl, & Oxenham, 2008; Jääskeläinen et al., 2004; Tiitinen et al., 1994). The absence of a similar decrease in N1 following rote training supports the idea that rote learning in this setting does not substantially alter early attentional pro- cesses and as such may be much more similar to memory encoding of episodic traces. Consequently, it also offers an explanation as to why transfer of learning is found to a much greater extent following generalized training com- pared with rote training in the context of learning a difficult-to-understand talker. Changes found in the P2 component of the auditory evoked potential in the generalized learning condition dif- fer from those found in previous studies examining per- ceptual learning (Ross & Tremblay, 2009; Tremblay et al., 2014). Although previous studies demonstrate postsleep increases in the P2 component following train- ing, generalized learning here coincided with an immedi- ate reduction in P2 amplitude (see the Analysis of Overall Amplitude Difference for Generalized Learning section). Given the reliance on sleep to consolidate rapidly acquired learning into long-term representations (Nusbaum, Uddin, Van Hedger, & Heald, 2018), previous studies have argued that the sleep-dependent P2 change marks the consolidation of a feature-based representation in long- term memory (Tremblay et al., 2014; Ross & Tremblay, 2009). Here, we highlight an implication of this interpreta- tion: If the P2 component is sensitive to the formation of an additional featural representation in long-term mem- ory, it indicates that the auditory evoked P2 response may be sensitive to the number of active featural represen- tations serving current recognition. Under this view, the change in P2 following generalized training suggests that generalized training decreases the number of active fea- tural representations required for ongoing perception. Beyond the observed change in GFP following generalized training during the first half of the P2 response due to gen- eralized training, the TANOVA analysis indicated a change in topography in the latter half of the P2 component between 250 and 272 msec (see the Analysis of Overall Topographic Difference for Generalized Learning sec- tion). Although this window did not pass the duration threshold for RAGU, its appearance during the N1–P2 complex arguably elevates its relevance, and as such, we interpret its appearance. The distributed source modeling with LAURA estimates that this window of topographic change was driven by a decrease in activity in the right superior temporal gyrus, the left superior parietal, and the anterior cingulate gyrus bilaterally. Although the phys- ics of volume conduction, as well as individual differences in neuroanatomy, can limit the specificity with which we can draw inferences about particular neural sources, we consider it noteworthy that the results of the distributed source estimation using LAURA aligns closely with the a priori dipole model derived from past fMRI work (Uddin, Reis, Heald, Van Hedger, & Nusbaum, 2020; Wong, Nusbaum, & Small, 2004). Indeed, both superior parietal cortex as well as the anterior cingulate have been argued to be responsible for attentional resource allocation for ongoing processing (Myers & Theodore, 2017; Piai, Roelofs, Acheson, & Takashima, 2013; Wong et al., 2004). Further- more, the superior temporal gyrus has been associated with processing that is sensitive to talker-specific phonology (Myers & Theodore, 2017; Wong et al., 2004). These regions therefore appear to comprise a network capable of reorganizing attentional resources to acoustic cues, which are most diagnostic for a to-be-learned, difficult-to- understand talker. Although the alignment of our distrib- uted source modeling with LAURA to these past studies should not be taken as confirmatory, it can be said that our data are consistent with an existing theoretical model in which a network of middle/superior temporal and supe- rior parietal regions is leveraged when normalizing across 440 Journal of Cognitive Neuroscience Volume 34, Number 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 idiosyncratic differences between talkers to facilitate the perception of speech categories. Taken collectively, the observed source changes during the P2 window in the cur- rent study adds to mounting evidence that the transfer of learning beyond utterance-specific experiences is accom- plished by modifying attention toward features that are most informative. Again, the lack of similar changes in the P2 component following rote training supports the idea that rote learning (at least in the context of understanding a difficult-to-understand talker) does not substantially alter early attentional processes and as such may be best thought of as a process that involves the simple formation of mem- ory representations or associations between phonetic pat- terns and their meanings. Results from distributed source modeling with LAURA at the windows of topographic changes found in the rote learning condition appear to support this view, with many of the identified areas (see Table 2) implicated in episodic learning models (Spaniol et al., 2009). As such, our source analysis work supports the view that the performance dif- ferences found between rote and generalized learning arise from the engagement of two distinct learning strate- gies. In rote learning, brain regions focused on the mem- ory encoding and retrieving of specific learned patterns were engaged, whereas in generalized learning, brain regions involved in the reorganizing of attention during early sensory processing were active. According to the reverse hierarchy theory (RHT; Ahissar, Nahum, Nelken, & Hochstein, 2009), these differences in neural changes offer explanation for (1) why rote recognition perfor- mance reaches ceiling or near ceiling for the small, trained set of words and (2) why this group shows significantly weaker generalized learning compared with those in the generalized learning condition at posttest. According to RHT, the level that learning occurs at is determined by the minimum-level representation needed to observe and uncover systematic pattern variability. As stimulus var- iability increases, higher, more abstract representations are needed to understand the relationships among the stimuli, as there will be little experience with specific stim- ulus patterns to guide learning. According to a predictive coding framework of the brain, given that rote and gener- alized learning fundamentally alter different types of rep- resentations, these forms of learning should lead learners to make fundamentally different kinds of predictions to guide perception. In rote learning, representations tied to specific stimulus patterns are likely strengthened with training, which in turn fosters strong, stimulus-level pre- dictions for the trained words to guide perception. Learn- ing at this level of representation, however, does little for predicting the meaning of untrained, novel words. In gen- eralized learning, more abstract representations, perhaps tied to modeling the talker’s acoustic–phonetic space, are needed to drive learning. Learning at this level would yield more abstract predictions that support better predictions for novel words by helping to orient attention to acoustic cues that are most diagnostic of the speech sound categories for the given talker. This trickling down of learn- ing to lower processing levels is consistent with RHT, which largely casts perceptual learning as a top–down pro- cess that is organized by higher-level representations. Beyond changes in the N1–P2 complex, we observed a decrease in the late negativity of the auditory evoked potential starting 580 msec poststimulus onset following generalized learning, but not following rote learning. This mirrors results by Tremblay et al. (2014), who found late negativity in the auditory evoked potential starting 600 msec poststimulus onset following training. As previ- ously mentioned, this decrease in this late negative poten- tial may be reflective of an improvement in trial-by-trial, prediction error monitoring that helps to drive the reorga- nization of attention (Ashby & O’Brien, 2005; Ashby et al., 1998). According to a predictive coding framework of the brain, a reduction in prediction error monitoring in the context of generalized learning would suggest that trial-by-trial predictions have improved as a consequence of attentional realignment to more diagnostic cues. As previously mentioned, this improvement in the predic- tion of novel words is entirely predicted according to RHT, which asserts that, in generalized learning, higher- level representations, perhaps reflective of the to-be- learned talker’s acoustic–phonetic space, should develop to suppor better predictions for novel words. Our obser- vation that a decrease in the late negativity was only found in the generalized learning condition suggests that differ- ent forms of learning selectively engages and impacts prediction error monitoring. This is consistent with an active–cognitive view of speech recognition in general, which has argued that the cognitive resources recruited during perception (such as selective attention, working memory, and learning) are dynamically determined by an interplay between the ambiguity of the speech signal and context (here training type) in which it occurs (Heald & Nusbaum, 2014). Conclusion Previous research has suggested that generalized and rote learning may be mediated by different neural mechanisms (Fenn et al., 2013). This study tested this directly by com- paring how patterns of neural responses during speech recognition change following rote and generalized learn- ing. The present results demonstrate substantial differ- ences in neural responses for these two types of learning. On the one hand, rote and generalized learning might have been supported by the same neural process of encoding (McClelland & Rumelhart, 1985). Under this view, generalization and abstraction is an emergent prop- erty of experience, captured by the aggregate response of long-term memory traces that are utilized to process novel stimuli. Although this view corresponds to an entire class of memory models, the present data reject this view, showing that generalized learning entails early sensory changes in processing that may be attributed to changes Heald et al. 441 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 in attention that are not seen in rote learning. This differ- ence argues against a passive, bottom–up fixed speech processing system that simply records auditory traces that are then later brought in aggregate to afford general- ization. Rather, the data support the view that speech perception is mediated by active neural processing, in which listeners are able to leverage their recent experi- ence to selectively attend to and process the most mean- ingful acoustic cues for a given situation. Given that the capacity for generalized learning can be used to under- stand how listeners are able to adaptively respond to and overcome the lack of invariance problem in the speech sig- nal, the present work makes clear the need for further work to demonstrate whether or not the neural markers found in the current study—in the context of learning to understand a difficult talker—are also present in more naturalistic circumstances when the underlying acoustic-to-phonetic mapping has been disrupted system- atically (such as a shift in talker, speaking rate, or social register). Acknowledgments The authors thank Sophia Uddin for her helpful comments on an earlier draft, as well as Nina Bartram, Edward Wagner, and Brendan Colson for their assistance with data collection. Reprint requests should be sent to Shannon L. M. Heald, Department of Psychology, The University of Chicago, Chicago, IL 60637, or via e-mail: smbowdre@uchicago.edu. Author Contributions Shannon L. M. Heald: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Software; Supervision; Visualization; Writing—Original draft; Writing—Review & editing. Stephen C. Van Hedger: Writing—Review & editing. John Veillette: Data curation; Writing—Review & editing. Katherine Reis: Data curation; Writing—Review & editing. Joel S. Snyder: Supervision; Writing—Review & editing. Howard C. Nusbaum: Conceptualization; Funding acquisition; Methodology; Resources; Supervision; Writing—Review & editing. Funding Information This work was supported in part by the Multidisciplinary University Research Initiatives (MURI) Program of the Office of Naval Research (https://dx.doi.org/10.13039 /100000006), grant number: DOD/ONR N00014–13-1- 0205 and NSF NCS (https://dx.doi.org/10.13039 /100000169), grant number: 1835181. Diversity in Citation Practices Retrospective analysis of the citations in every article pub- lished in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience ( JoCN ) during this period were M(an)/M = .407, W(oman)/M = .32, M/ W = .115, and W/ W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/ W = .109, and W/ W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encour- ages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the oppor- tunity to report their article’s gender citation balance. Notes 1. Some readers may wonder why poorer performance was found at pretest in the rote learning condition (16 of 100 correct) compared to pretest in the generalized learning condition (29 of 100). One potential explanation for this could be the difference in phonetic diversity between the two lists given their word set size differences. Although the rote list was constructed to closely capture the phonetic diversity of the top occurring phonetic items in the generalized words list (see the Stimuli section), the rote list’s phonemic inventory is smaller than the generalized learning’s inventory simply because of the word set size differ- ences between the lists. As such, the error rate for the rote word list may be higher than for the larger sample used in generalized learning. Moreover, we were less concerned about matching ini- tial performance given that rote training is known to quickly move performance to ceiling with repeat practice. 2. An additional GFP analysis using “test order” as a between- subject covariate yielded near identical pre–post effects, suggest- ing that it is highly unlikely that the observed pre–post effects were driven by any one particular test order. 3. An additional TANOVA analysis using “test order” as a between-subject covariate yielded near identical pre–post effects, suggesting that it is highly unlikely that the observed pre–post effects were driven by any one particular test order. REFERENCES Ahissar, M., Nahum, M., Nelken, I., & Hochstein, S. (2009). Reverse hierarchies and sensory learning. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 364, 285–299. https://doi.org/10.1098 /rstb.2008.0253, PubMed: 18986968 Ahveninen, J., Jääskeläinen, I. P., Raij, T., Bonmassar, G., Devore, S., Hämäläinen, M., et al. (2006). Task-modulated “what” and “where” pathways in human auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 103, 14608–14613. https://doi.org/10.1073/pnas.0510480103, PubMed: 16983092 Alain, C., Campeanu, S., & Tremblay, K. (2010). Changes in sensory evoked responses coincide with rapid improvement in speech identification performance. Journal of Cognitive Neuroscience, 22, 392–403. https://doi.org/10.1162/jocn.2009 .21279, PubMed: 19485700 Alain, C., & Snyder, J. S. (2008). Age-related differences in auditory evoked responses during rapid perceptual learning. Clinical Neurophysiology, 119, 356–366. https://doi.org/10 .1016/j.clinph.2007.10.024, PubMed: 18083619 Alain, C., Snyder, J. S., He, Y., & Reinke, K. S. (2007). Changes in auditory cortex parallel rapid perceptual learning. Cerebral Cortex, 17, 1074–1084. https://doi.org/10.1093/cercor/bhl018, PubMed: 16754653 Andrews, R. J., Knight, R. T., & Kirby, R. P. (1990). Evoked potential mapping of auditory and somatosensory cortices 442 Journal of Cognitive Neuroscience Volume 34, Number 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 in the miniature swine. Neuroscience Letters, 114, 27–31. https://doi.org/10.1016/0304-3940(90)90423-7, PubMed: 2116608 Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442–481. https://doi.org/10.1037/0033-295X.105.3.442, PubMed: 9697427 Ashby, F. G., & O’Brien, J. B. (2005). Category learning and multiple memory systems. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2004.12.003, PubMed: 15668101 Berg, P., & Scherg, M. (1994). A multiple source approach to the correction of eye artifacts. Electroencephalography and Clinical Neurophysiology, 90, 229–241. https://doi.org/10 .1016/0013-4694(94)90094-9, PubMed: 7511504 Bosnyak, D. J., Eaton, R. A., & Roberts, L. E. (2004). Distributed auditory cortical representations are modified when non-musicians are trained at pitch discrimination with 40 Hz amplitude modulated tones. Cerebral Cortex, 14, 1088–1099. https://doi.org/10.1093/cercor/bhh068, PubMed: 15115745 Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106, 707–729. https://doi.org /10.1016/j.cognition.2007.04.005, PubMed: 17532315 Chomsky, N., & Halle, M. (1968). The sound pattern of English. Cambridge, MA: MIT Press. Egan, J. P. (1948). Articulation testing methods. Laryngoscope, 58, 955–991. https://doi.org/10.1288/00005537-194809000 -00002, PubMed: 18887435 Fenn, K. M., Margoliash, D., & Nusbaum, H. C. (2013). Sleep restores loss of generalized but not rote learning of synthetic speech. Cognition, 128, 280–286. https://doi.org/10.1016/j .cognition.2013.04.007, PubMed: 23747650 e138. https://doi.org/10.1371/journal.pbio.0060138, PubMed: 18547141 Heald, S. L. M., & Nusbaum, H. C. (2014). Speech perception as an active cognitive process. Frontiers in Systems Neuroscience, 8, 35. https://doi.org/10.3389/fnsys.2014.00035, PubMed: 24672438 Ing-Simmons, N. (1994). RSYNTH: Complete speech synthesis system for UNIX [Computer software]. https://www.speech .cs.cmu.edu/comp.speech/Section5/Synth/rsynth.html. Jääskeläinen, I. P., Ahveninen, J., Bonmassar, G., Dale, A. M., Ilmoniemi, R. J., Levanen, S., et al. (2004). Human posterior auditory cortex gates novel sounds to consciousness. Proceedings of the National Academy of Sciences, U.S.A., 101, 6809–6814. https://doi.org/10.1073/pnas.0303760101, PubMed: 15096618 Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971–995. https://doi.org/10.3758/BRM.42.3.863, PubMed: 20805608 König, T., & Gianotti, L. (2009). Scalp field maps and their characterization. In C. Michel, T. König, D. Brandeis, L. Gianotti & J. Wackermann (Eds.), Electrical Neuroimaging (pp. 25–48). Cambridge: Cambridge University Press. https:// doi.org/10.1017/CBO9780511596889.003 Koenig, T., Kottlow, M., Stein, M., & Melie-García, L. (2011). Ragu: A free tool for the analysis of EEG and MEG event- related scalp field data using global randomization statistics. Computational Intelligence and Neuroscience, 2011, 1–14. https://doi.org/10.1155/2011/938925, PubMed: 21403863 Koenig, T., & Melie-García, L. (2010). A method to determine the presence of averaged event-related fields using randomization tests. Brain Topography, 23, 233–242. https:// doi.org/10.1007/s10548-010-0142-1 Fenn, K. M., Nusbaum, H. C., & Margoliash, D. (2003). Ladefoged, P., & Broadbent, D. E. (1957). Information Consolidation during sleep of perceptual learning of spoken language. Nature, 425, 614–616. https://doi.org/10.1038 /nature01951, PubMed: 14534586 Francis, A. L., Baldwin, K., & Nusbaum, H. C. (2000). Effects of training on attention to acoustic cues. Perception & Psychophysics, 62, 1668–1680. https://doi.org/10.3758 /BF03212164, PubMed: 11140187 Francis, A. L., & Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28, 349–366. https://doi.org/10.1037/0096-1523 .28.2.349, PubMed: 11999859 Francis, A. L., & Nusbaum, H. C. (2009). Effects of intelligibility on working memory demand for speech perception. Attention, Perception, & Psychophysics, 71, 1360–1374. https://doi.org/10.3758/APP.71.6.1360, PubMed: 19633351 Goldstone, R. L. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123, 178–200. https://doi.org/10.1037 /0096-3445.123.2.178 Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585–612. https://doi.org/10.1146/annurev .psych.49.1.585, PubMed: 9496632 Greenspan, S. L., Nusbaum, H. C., & Pisoni, D. B. (1988). Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 421–433. https://doi.org/10.1037/0278 -7393.14.3.421, PubMed: 2969941 Gutschalk, A., Micheyl, C., & Oxenham, A. J. (2008). Neural correlates of auditory perceptual awareness under informational masking. PLoS Biology, 6, e138. https://doi.org /10.1371/journal.pbio.0060138, PubMed: 18547141 Gutschalk, A., Micheyl, C., Oxenham, A. J., von Kriegstein, K., & Warren, J. (2008). Neural correlates of auditory perceptual awareness under informational masking. PLoS Biology, 6, conveyed by vowels. Journal of the Acoustical Society of America, 29, 98–104. https://doi.org/10.1121/1.1908694 Lavie, N. (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 451. https://doi .org/10.1037/0096-1523.21.3.451, PubMed: 7790827 Liberman, A. M. (1970). The grammars of speech and language. Cognitive Psychology, 1, 301–323. https://doi.org/10.1016 /0010-0285(70)90018-6 Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert- Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461. https://doi.org/10.1037 /h0020279, PubMed: 4170865 Liégeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology/ Evoked Potentials, 92, 204–214. https://doi.org/10.1016/0168-5597(94)90064-7, PubMed: 7514990 Luu, P., & Ferree, T. C. (2000). Determination of the geodesic sensor nets’ average electrode positions and their 10–10 international equivalents. Eugene, OR: Electronic Geodesics Inc. Magnuson, J. S., & Nusbaum, H. C. (2007). Acoustic differences, listener expectations, and the perceptual accommodation of talker variability. Journal of Experimental Psychology: Human Perception and Performance, 33, 391–409. https:// doi.org/10.1037/0096-1523.33.2.391, PubMed: 17469975 McCallum, W. C., & Curry, S. H. (1979). Hemisphere differences in event related potentials and CNV’s associated with monaural stimuli and lateralized motor responses. In D. Lehmann & E. Callaway (Eds.), Human evoked potentials (pp. 235–250). Springer. https://doi.org/10.1007/978-1-4684 -3483-5_16 Heald et al. 443 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 McClelland, J. L., & Rumelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159–188. https://doi.org/10.1037/0096-3445.114.2.159, PubMed: 3159828 McEvoy, L., Levänen, S., & Loveless, N. (1997). Temporal characteristics of auditory sensory memory: Neuromagnetic evidence. Psychophysiology, 34, 308–316. https://doi.org/10 .1111/j.1469-8986.1997.tb02401.x, PubMed: 9175445 Miller, S. M. (1987). Monitoring and blunting: Validation of a questionnaire to assess styles of information seeking under threat. Journal of Personality and Social Psychology, 52, 345. https://doi.org/10.1037/0022-3514.52.2.345, PubMed: 3559895 Murray, M. M., Brunet, D., & Michel, C. M. (2008). Topographic ERP analyses: A step-by-step tutorial review. Brain Topography, 20, 249–264. https://doi.org/10.1007/s10548-008-0054-5, PubMed: 18347966 Myers, E. B., & Theodore, R. M. (2017). Voice-sensitive brain networks encode talker-specific phonetic detail. Brain and Language, 165, 33–44. https://doi.org/10.1016/j.bandl.2016 .11.001, PubMed: 27898342 Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125, 826–859. https://doi.org/10.1037 /0033-2909.125.6.826, PubMed: 10589304 Nielsen, F. Å. (2003). The Brede database: A small database for functional neuroimaging. Neuroimage, 19, 19–22. Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57. https://doi .org/10.1037/0096-3445.115.1.39, PubMed: 2937873 Nusbaum, H. C., & Pisoni, D. B. (1985). Constraints on the perception of synthetic speech generated by rule. Behavior Research Methods, Instruments, & Computers, 17, 235–242. https://doi.org/10.3758/BF03214389, PubMed: 24511177 Nusbaum, H. C., Uddin, S., Van Hedger, S. C., & Heald, S. L. (2018). Consolidating skill learning through sleep. Current Opinion in Behavioral Sciences, 20. https://doi.org/10.1016/j .cobeha.2018.01.013 Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355–376. https://doi.org/10.3758/BF03206860, PubMed: 9599989 Petkov, C. I., Kang, X., Alho, K., Bertrand, O., Yund, E. W., & Woods, D. L. (2004). Attentional modulation of human auditory cortex. Nature Neuroscience, 7, 658–663. https://doi .org/10.1038/nn1256, PubMed: 15156150 Piai, V., Roelofs, A., Acheson, D. J., & Takashima, A. (2013). Attention for speaking: Domain-general control from the anterior cingulate cortex in spoken word production. Frontiers in Human Neuroscience, 7, 832. https://doi.org/10 .3389/fnhum.2013.00832, PubMed: 24368899 Picton, T. W. (2011). Human auditory evoked potentials. Ear and hearing (Vol. 33). San Diego: Plural Publishing. Picton, T. W., Van Roon, P., Armilio, M. L., Berg, P., Ille, N., & Scherg, M. (2000). The correction of ocular artifacts: A topographic perspective. Clinical Neurophysiology, 111, 53–65. https://doi .org/10.1016/S1388-2457(99)00227-8, PubMed: 10656511 Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 97, 11800–11806. https://doi.org/10.1073/pnas.97.22.11800, PubMed: 11050212 Reinke, K. S., He, Y., Wang, C., & Alain, C. (2003). Perceptual learning modulates sensory evoked response during vowel segregation. Cognitive Brain Research, 17, 781–791. https:// doi.org/10.1016/S0926-6410(03)00202-7, PubMed: 14561463 Ross, B., Jamali, S., Tremblay, K. L., Abdi, H., Toga, A., Evans, A., et al. (2013). Plasticity in neuromagnetic cortical responses suggests enhanced auditory object representation. BMC Neuroscience, 14, 151. https://doi.org/10.1186/1471-2202-14 -151, PubMed: 24314010 Ross, B., & Tremblay, K. (2009). Stimulus experience modifies auditory neuromagnetic responses in young and older listeners. Hearing Research, 248, 48–59. https://doi.org/10 .1016/j.heares.2008.11.012, PubMed: 19110047 Russell, G. S., Jeffrey Eriksen, K., Poolman, P., Luu, P., & Tucker, D. M. (2005). Geodesic photogrammetry for localizing sensor positions in dense-array EEG. Clinical Neurophysiology, 116, 1130–1140. https://doi.org/10.1016/j.clinph.2004.12.022, PubMed: 15826854 Scherg, M., Vajsar, J., & Picton, T. W. (1989). A source analysis of the late human auditory evoked potentials. Journal of Cognitive Neuroscience, 1, 336–355. https://doi.org/10.1162 /jocn.1989.1.4.336, PubMed: 23971985 Schwab, E. C., Nusbaum, H. C., & Pisoni, D. B. (1985). Some effects of training on the perception of synthetic speech. Human Factors, 27, 395–408. https://doi.org/10.1177 /001872088502700404, PubMed: 2936671 Shahin, A., Roberts, L. E., Pantev, C., Trainor, L. J., & Ross, B. (2005). Modulation of P2 auditory-evoked responses by the spectral complexity of musical sounds. NeuroReport, 16, 1781–1785. https://doi.org/10.1097/01.wnr.0000185017 .29316.63, PubMed: 16237326 Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127–190. https://doi.org/10.1037/0033-295X.84.2.127 Spaniol, J., Davidson, P. S. R., Kim, A. S. N., Han, H., Moscovitch, M., & Grady, C. L. (2009, July). Event-related fMRI studies of episodic encoding and retrieval: Meta-analyses using activation likelihood estimation. Neuropsychologia, 47, 1765–1779. https://doi.org/10.1016/j .neuropsychologia.2009.02.028, PubMed: 19428409 Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature, 372, 90–92. https:// doi.org/10.1038/372090a0, PubMed: 7969425 Tremblay, K. L., Ross, B., Inoue, K., McClannahan, K., & Collet, G. (2014). Is the auditory evoked P2 response a biomarker of learning? Frontiers in Systems Neuroscience, 8, 28. https:// doi.org/10.3389/fnsys.2014.00028, PubMed: 24600358 Tremblay, K. L., Shahin, A. J., Picton, T., & Ross, B. (2009). Auditory training alters the physiological detection of stimulus-specific cues in humans. Clinical Neurophysiology, 120, 128–135. https://doi.org/10.1016/j.clinph.2008.10.005, PubMed: 19028139 Uddin, S., Reis, K. S., Heald, S. L., Van Hedger, S. C., & Nusbaum, H. C. (2020). Cortical mechanisms of talker normalization in fluent sentences. Brain and Language, 201, 104722. https:// doi.org/10.1016/j.bandl.2019.104722, PubMed: 31835154 Weatherholtz, K., & Jaeger, T. F. (2016). Speech perception and generalization across speakers and accents. Oxford Research Encyclopedia of Linguistics. https://doi.org/10.1093/acrefore /9780199384655.013.95 Wong, P. C. M., Nusbaum, H. C., & Small, S. L. (2004). Neural bases of talker normalization. Journal of Cognitive Neuroscience, 16, 1173–1184. https://doi.org/10.1162 /0898929041920522, PubMed: 15453972 Woods, D. L., Stecker, G. C., Rinne, T., Herron, T. J., Cate, A. D., Yund, E. W., et al. (2009). Functional maps of human auditory cortex: Effects of acoustic features and attention. PLoS One, 4. https://doi.org/10.1371/journal.pone.0005183, PubMed: 19365552 Yvert, B., Fischer, C., Bertrand, O., & Pernier, J. (2005). Localization of human supratemporal auditory areas from intracerebral auditory evoked potentials using distributed source models. Neuroimage, 28, 140–153. https://doi.org/10 .1016/j.neuroimage.2005.05.056, PubMed: 16039144 444 Journal of Cognitive Neuroscience Volume 34, Number 3 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 4 3 4 2 5 1 9 8 4 8 4 9 / j o c n _ a _ 0 1 8 0 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image
Going Beyond Rote Auditory Learning: Neural Patterns image

Scarica il pdf