Concurrent Sound Segregation Is
Enhanced in Musicians

Benjamin Rich Zendel and Claude Alain

Abstract

The ability to segregate simultaneously occurring sounds is
fundamental to auditory perception. Many studies have shown
that musicians have enhanced auditory perceptual abilities;
however, the impact of musical expertise on segregating con-
currently occurring sounds is unknown. Therefore, we exam-
ined whether long-term musical training can improve listeners’
ability to segregate sounds that occur simultaneously. Partic-
ipants were presented with complex sounds that had either all
harmonics in tune or the second harmonic mistuned by 1%,
2%, 4%, 8%, or 16% of its original value. The likelihood of
hearing two sounds simultaneously increased with mistuning,
and this effect was greater in musicians than nonmusicians.
The segregation of the mistuned harmonic from the harmonic

series was paralleled by an object-related negativity that was
larger and peaked earlier in musicians. It also coincided with a
late positive wave referred to as the P400 whose amplitude was
larger in musicians than in nonmusicians. The behavioral and
electrophysiological effects of musical expertise were specific
to processing the mistuned harmonic as the N1, the N1c, and
the P2 waves elicited by the tuned stimuli were comparable in
both musicians and nonmusicians. These results demonstrate
that listeners’ ability to segregate concurrent sounds based on
harmonicity is modulated by experience and provides a basis
for further studies assessing the potential rehabilitative effects
of musical training on solving complex scene analysis problems
illustrated by the cocktail party example.

INTRODUCTION

Musical performance requires rapid, accurate, and consis-
tent perceptual organization of the auditory environment.
Specifically, this requires the organization of acoustic
components that occur simultaneously (i.e., concurrent
sound organization) as well as the organization of suc-
cessive sounds that takes place over several seconds (i.e.,
sequential organization). Broadly, this organization of the
auditory world is known as "auditory scene analysis,"
which is important because natural auditory environments
often contain multiple sound sources that occur simulta-
neously (Bregman, 1990). The present study focused on
the impact of musical expertise on listeners’ ability to
perceptually organize sounds that occur concurrently.

A powerful way to organize the incoming acoustic
waveform is based on the harmonic relations between
components of a single physical sound source. If a tonal
component is not harmonically related to the sound’s
fundamental frequency ( f0), it can be heard as a simul-
taneous but separate entity, especially if it is a lower
rather than a higher harmonic and if the amount of
mistuning is greater than 4% of its original value (Alain,
2007; Moore, Glasberg, & Peters, 1986). The mecha-
nisms underlying the perception of the mistuned har-
monic as a separate sound are not well understood but
likely involve neurons that are sensitive to frequency

University of Toronto

periodicity. Neurophysiological studies indicate that vio-
lations of harmonicity (i.e., a mistuned harmonic) are
registered at various stages along the ascending auditory
pathways including the auditory nerve (Sinex, Guzik, &
Sabes, 2003), the cochlear nucleus (Sinex, 2008), the in-
ferior colliculus (Sinex, Sabes, & Li, 2002), and the pri-
mary auditory cortex (Fishman et al., 2001). These early
and automatic representations of frequency suggest that
violations of harmonicity are encoded as primitive cues
to parsing the auditory scene.

In humans, the neural correlates of concurrent sound
processing have been investigated using scalp recorded
ERPs. When ERPs elicited by a complex sound are com-
pared with those elicited by the same complex sound
with a mistuned tonal component (especially above 8%),
an increased negativity is observed, which peaks around
140 msec poststimulus onset (see Alain, 2007). This object-
related negativity (ORN) is best illustrated by subtracting
ERPs to tuned stimuli from those elicited by the mistuned
stimuli. The difference wave reveals a negative deflection
at fronto-central sites that reverses in polarity at electrodes
placed near the mastoids and the cerebellar areas.

The segregation of concurrent sounds based on har-
monicity, as indexed by ORN generation, is little affected
by attentional demands, as ORN can be observed in situ-
ations where participants are attending to other tasks
including a contralateral auditory task (Alain & Izenberg,
2003), reading a book (Alain, Arnott, & Picton, 2001), or

© 2008 Massachusetts Institute of Technology

Journal of Cognitive Neuroscience 21:8, pp. 1488–1498

watching a silent movie (Alain, Schuler, & McDonald,
2002). These findings provide strong support for the
proposal that the organization of simultaneous auditory
objects is not under volitional control. However, when
participants were asked to make a perceptual judgment
about the incoming complex sounds (i.e., if he or she
heard one sound or two simultaneous sounds), the like-
lihood of reporting two concurrent sounds was corre-
lated with ORN amplitude (see Alain, 2007). Moreover,
when subjects reported hearing two simultaneous sounds,
a later positive difference (tuned minus mistuned stim-
uli) wave peaking at about 400 msec after sound onset
(P400) emerged (Alain et al., 2001). Like the ORN, the
amplitude of the P400 correlated with perceptual judg-
ment, being larger when participants perceived the mis-
tuned harmonic as a separate tone (Alain et al., 2001).
These findings suggest that the P400 reflects a conscious
evaluation and decision-making process regarding the
number of auditory objects present, whereas the ORN
reflects low-level primitive perceptual organization (see
Alain, 2007).

One important issue that remains unanswered and
deserves further empirical work is whether the organi-
zation of simultaneous acoustic components can be en-
hanced by experience. It is well accepted that auditory
scene analysis engages learned schema-driven processes
that reflect listeners' intention, experience, and knowl-
edge of the auditory environment (Bregman, 1990). For
instance, psychophysical studies have shown that pre-
senting an auditory cue with an identical frequency to an
auditory target improved detection of the target when
embedded in noise (Hafter, Schlauch, & Tang, 1993;
Schlauch & Hafter, 1991). Similarly, familiarity with a
melody facilitates detection when interweaved with dis-
tracter sounds (Bey & McAdams, 2002; Dowling, 1973).
Hence, schema-driven processes provide a way to resolve
perceptual ambiguity in complex listening situations
when the signal to noise ratio is poor. In more recent
studies, short-term training (over the course of an hour
or a few days) has been shown to improve listeners’
ability to segregate and to identify two synthetic vowels
presented simultaneously in young (Alain, Snyder, He, &
Reinke, 2007; Reinke, He, Wang, & Alain, 2003) as well
as in older adults (Alain & Snyder, 2008), suggesting that
learning and intention can enhance sound segregation
and identification. However, it is unclear from these
studies whether improvement in identifying concurrent
vowels occurred because of a greater reliance on schema-
driven processes or whether the improvement also reflects
learning-related changes in primitive auditory processes.
Studies measuring scalp-recorded ERPs suggest that
musical expertise may be associated with neuroplastic
changes in early sensory processes. For example, the am-
plitude of the N1 (Pantev, Roberts, Schultz, Engelien, &
Ross, 2001; Pantev et al., 1998), N1c (Shahin, Bosnyak,
Trainor, & Roberts, 2003), and P2 (Shahin, Roberts, Pantev,
Trainor, & Ross, 2005; Shahin et al., 2003) waves, evoked

by transient tones with musical timbres, are larger in
musicians compared with nonmusicians. The N1 is
further enhanced in musicians when the evoking stim-
ulus is similar in timbre to the instrument on which
they were trained, with violin tones evoking a larger
response in violinists and trumpet tones evoking a larger
response in trumpeters (Pantev et al., 2001). Similarly,
increasing the spectral complexity of a sound so that it
approached the sound of a real piano yielded a larger P2
wave in musicians compared with nonmusicians (Shahin
et al., 2005). More importantly, these enhancements are
smaller or nonexistent when presented with pure tones,
suggesting that the observed changes in sensory-evoked
responses in musicians are specific to musical stimuli
(Shahin et al., 2005; Pantev et al., 1998). In addition to
the cortical change related to processing sounds with
musical timbres, evidence suggests that the encoding of
frequency at the subcortical level (i.e., the brain stem) is
also enhanced in musicians, which suggests that low-
level auditory processing may be modulated by experi-
ence (Wong, Skoe, Russo, Dees, & Kraus, 2007).

The current study investigated whether long-term mu-
sical training influenced the segregation of concurrently
occurring sounds. The nature of music performance in-
volves the processing of multiple sounds occurring
simultaneously, which leads us to believe that expert
musicians should demonstrate enhanced concurrent
sound segregation paralleled by modulations to the
associated neural correlates. By using nonmusical stim-
uli, we assessed whether general (not specific to music)
processes were influenced by long-term musical train-
ing. To test this hypothesis, we presented participants
with complex sounds similar to those of Alain et al.
(2001), and they indicated whether the incoming har-
monic series fused into a single auditory object or
whether it segregated into two distinct sounds, that is,
a buzz plus another sound with a pure tone quality. In
addition, the same stimuli were presented without
requiring a response to examine whether electrophysi-
ological differences related to musical expertise were
response dependent. It was expected that the percep-
tion of concurrent auditory objects will increase as a
function of mistuning and that the perception of con-
current sounds will be paralleled by ORN and P400
waves, as was found in previous studies (e.g., Alain &
Izenberg, 2003; Alain et al., 2001, 2002). In addition, it
was hypothesized that musicians will be more likely to
report hearing the mistuned harmonic as a separate
sound and that these behavioral changes will be accom-
panied by changes to the ORN and the P400 waves.

METHODS

Participants

Twenty-eight participants were recruited for the study:
14 expert musicians (M = 28.2 years, SD = 3.2, 8 women)

Zendel and Alain

1489

and 14 nonmusicians (M = 32.9 years, SD = 9.9, 7 women).
Expert musicians were defined as having advanced musi-
cal training (i.e., undergraduate or graduate degree in
music, conservatory Grade 8 or equivalent) and con-
tinued to practice on a regular basis. Nonmusicians had
no more than 1 year of formal or self-directed music
lessons and did not play any musical instruments. All
participants were screened for hearing loss and neurologi-
cal and psychiatric illness. In addition, all participants had
pure tone thresholds below 30 dB hearing level (HL) for
frequencies ranging from 250 to 8000 Hz.

Stimuli

Stimuli consisted of six complex sounds each compris-
ing six harmonically related tonal elements. The funda-
mental frequency was 220 Hz. Each component (220,
440, 660, 880, 1100, and 1320 Hz) was a pure tone sine
wave generated with Sig-Gen software (Tucker-Davis
Technology, Alachua, FL) and had durations of 150 msec
with 10 msec rise/fall times. The pure tone components
were combined into a harmonic complex using Cubase
SX (Steinberg, V.3.0, Las Vegas, NV). The third compo-
nent (second harmonic) of the series (660 Hz) was
either tuned or mistuned by 1%, 2%, 4%, 8%, or 16%,
corresponding to 666.6, 673.2, 686.4, 712.8, and 765.6 Hz,
respectively. All stimuli were presented binaurally at 80 dB
sound pressure level (SPL) through ER 3A insert earphones
(Etymotic Research, Elk Grove).
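The mistuned frequencies follow directly from the stated percentages (e.g., 660 Hz × 1.08 = 712.8 Hz). The stimulus construction can be sketched in a few lines of numpy; this is an illustrative reconstruction, not the authors' SigGen/Cubase pipeline, and the 44.1-kHz sampling rate is an assumption (the paper does not report one).

```python
import numpy as np

FS = 44100          # assumed sampling rate; not reported in the paper
F0 = 220.0          # fundamental frequency (Hz)
DUR = 0.150         # 150 msec duration
RAMP = 0.010        # 10 msec rise/fall times

def make_complex(mistuning_pct=0.0, fs=FS):
    """Six-harmonic complex (220-1320 Hz) with the third component
    (second harmonic, 660 Hz) shifted by mistuning_pct percent."""
    t = np.arange(int(DUR * fs)) / fs
    freqs = [F0 * k for k in range(1, 7)]
    freqs[2] *= 1 + mistuning_pct / 100.0          # shift the 660-Hz component
    wave = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    # linear rise/fall ramps approximating the 10 msec gating
    n = int(RAMP * fs)
    env = np.ones_like(t)
    env[:n] = np.linspace(0.0, 1.0, n)
    env[-n:] = np.linspace(1.0, 0.0, n)
    return wave * env / 6.0                        # normalize to unit peak

# mistuned frequencies for 1, 2, 4, 8, and 16% match the values in the text
for pct in (1, 2, 4, 8, 16):
    print(pct, round(660 * (1 + pct / 100.0), 1))
```

Calling `make_complex(16.0)` shifts only the 660-Hz component (to 765.6 Hz) while leaving the other five harmonics in tune.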

Procedure

Stimuli were presented in two listening conditions,
active and passive. A total of 720 stimulus iterations
(120 exemplars of each stimulus type) were presented
in each condition. During the passive condition, partic-
ipants were instructed to relax and not to pay attention
to the sounds being presented. The passive condition
was spread across two blocks of 360 randomly ordered
stimulus presentations with interstimulus intervals (ISIs)
that varied randomly between 1200 and 2000 msec. The
active condition was spread across four blocks of 180
stimulus presentations in random order with an ISI that
varied randomly between 2000 and 3000 msec. After
each trial, participants indicated whether they heard one
complex sound (i.e., a buzz) or whether they heard two
sounds (i.e., a buzz plus another sound with a pure
tone quality) by pressing a button on a response box.
The longer ISI in the active condition allowed time for a
response. All participants first completed a passive block,
then four active blocks, and finally a second passive block.

Recording of Electrical Brain Activity

Neuroelectric brain activity was digitized continuously
de 64 scalp locations with a band-pass filter of 0.05–
100 Hz and a sampling rate of 500 Hz per channel using

SynAmps2 amplifiers (Compumedics Neuroscan, El Paso,
Texas) and stored for analysis. Electrodes on the outer
canthi and at the superior and inferior orbit monitored
ocular activity. During recording, all electrodes were ref-
erenced to electrode Cz; however, for data analysis, we
re-referenced all electrodes to an average reference.

All averages were computed using BESA software (ver-
sion 5.1.6). The analysis epoch included 100 msec of
prestimulus activity and 1000 msec of poststimulus
activity. Trials containing excessive noise (±125 μV) at
electrodes not adjacent to the eyes (i.e., IO1, IO2, LO1,
LO2, FP1, FP2, FP9, FP10) were rejected before averag-
ing. ERPs were then averaged separately for each con-
dition, stimulus type, and electrode site.
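The epoching, artifact rejection, and averaging steps can be sketched as follows. This is a simplified stand-in for the BESA pipeline: it applies the ±125 μV rejection criterion to every channel of a toy recording, whereas the actual analysis excluded the eye-adjacent electrodes from the rejection check.

```python
import numpy as np

FS = 500                      # sampling rate reported in the paper (Hz)
PRE, POST = 0.100, 1.000      # 100 msec prestimulus, 1000 msec poststimulus

def epoch_and_average(eeg, onsets, reject_uv=125.0):
    """eeg: (n_channels, n_samples) array in microvolts.
    Returns the average of epochs whose peak absolute amplitude
    stays within +/-reject_uv on every channel."""
    n_pre, n_post = int(PRE * FS), int(POST * FS)
    kept = []
    for s in onsets:
        ep = eeg[:, s - n_pre : s + n_post]
        if np.abs(ep).max() <= reject_uv:          # artifact rejection
            # baseline-correct to the 100 msec prestimulus interval
            kept.append(ep - ep[:, :n_pre].mean(axis=1, keepdims=True))
    return np.mean(kept, axis=0)

# toy check: clean epochs survive, an epoch with a 200 uV artifact is dropped
rng = np.random.default_rng(0)
eeg = rng.normal(0.0, 5.0, (4, 5000))
eeg[:, 2000:2100] += 200.0                         # simulated blink artifact
avg = epoch_and_average(eeg, onsets=[500, 2050, 3500])
```

The epoch at onset 2050 overlaps the simulated artifact and is excluded, so `avg` is built from the two clean epochs only.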

For each participant, a set of ocular movements was
obtained before and after the experiment (Picton et al.,
2000). From this set, averaged eye movements were
calculated for both lateral and vertical eye movements
as well as for eye blinks. A PCA of these averaged re-
cordings provided a set of components that best ex-
plained the eye movements. The scalp projections of
these components were then subtracted from the ex-
perimental ERPs to minimize ocular contamination such
as blinks, saccades, and lateral eye movements for each
individual average. ERPs were then digitally low-pass
filtered to attenuate frequencies above 30 Hz.

All data were analyzed using a mixed design repeated
measures ANOVA with musical training (musician and
nonmusician) as a between-subjects factor and mistun-
ing of the second harmonic (tuned, 1%, 2%, 4%, 8%, and
16%) as a within-subjects factor. For ERP data, condition
(active and passive) and various electrode montages
were included as within-subjects factors. The first analy-
sis examined the effect of musical expertise on the peak
amplitude and the latency of the N1, N1c, P2, and late
positive complex (LPC). The N1 wave was defined as the
largest negative deflection between 85 and 120 msec
and was quantified at fronto-central scalp sites (Fz, F1,
F2, FCz, FC1, FC2, Cz, C1, and C2). The N1c was defined
as the maximum negative deflection between 110 and
210 msec at the left and right (T7/T8) temporal elec-
trodes. The P2 peak was measured during the 130- and
the 230-msec interval at fronto-central scalp sites (Fz, F1,
F2, FCz, FC1, FC2, Cz, C1, and C2). Lastly, the LPC was
quantified between 300 and 700 msec at parietal and
parieto-occipital sites (Pz, P1, P2, POz, PO3, and PO4).
The second and the third analyses focus on the ORN
and the P400 components, respectively. The effect of
musical expertise on the ORN was quantified by com-
paring the mean amplitude during the 100- to 180-msec
interval following stimulus onset with ANOVA, using mu-
sical expertise, listening condition, and mistuning level
as factors. Two analyses were conducted over two dif-
ferent brain regions: The first was quantified over nine
fronto-central electrodes (Fz, F1, F2, FCz, FC1, FC2, Cz,
C1, and C2), and the second was quantified over four
mastoid/cerebellar electrodes (M1, M2, CB1, and CB2).
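A mean-amplitude measurement of the kind used for the ORN (100-180 msec over the nine fronto-central electrodes) reduces to averaging each electrode's waveform over the window and then over the montage. A minimal sketch with hypothetical data, assuming epochs sampled at 500 Hz that begin 100 msec before stimulus onset:

```python
import numpy as np

FS, PRE = 500, 0.100   # sampling rate (Hz) and prestimulus interval (sec)

def mean_amplitude(erps, montage, t0, t1):
    """erps: dict mapping electrode name -> 1-D epoch waveform (uV) that
    starts PRE sec before stimulus onset.  Returns the amplitude averaged
    over the t0..t1 sec poststimulus window and over the montage."""
    i0, i1 = int((PRE + t0) * FS), int((PRE + t1) * FS)
    return float(np.mean([erps[ch][i0:i1].mean() for ch in montage]))

# the nine fronto-central electrodes used for the ORN analysis
FRONTO_CENTRAL = ["Fz", "F1", "F2", "FCz", "FC1", "FC2", "Cz", "C1", "C2"]

# hypothetical flat 1-uV epochs; 550 samples span -100 to 1000 msec
erps = {ch: np.ones(550) for ch in FRONTO_CENTRAL}
orn_amplitude = mean_amplitude(erps, FRONTO_CENTRAL, 0.100, 0.180)
```

The same function applied over the four mastoid/cerebellar electrodes and the 300- to 400-msec window would yield the P400 measure described below.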


These electrodes were chosen because the peak activa-
tion of the ORN and its inversion were observed at
these points. In addition, the measurements over the left
and right mastoids and the cerebellar electrodes allow
us to test for potential hemispheric differences in pro-
cessing the mistuned harmonic. For the P400, the effect
of musical expertise was quantified for the mean ampli-
tude during the 300- to 400-msec interval with ANOVA,
using musical expertise and mistuning level as factors
(condition was excluded for reasons explained below). As
with the ORN, two analyses were conducted over two dif-
ferent brain regions. The first was quantified over a wid-
ened fronto-central scalp region to account for the right
asymmetry of the P400 (Fz, F1, F2, FCz, FC1, FC2, Cz, C1,
C2, C3, and C4), and the second was quantified over the
left and the right mastoid/cerebellar sites (CB1, CB2, M1,
and M2). In addition, the rate of change in amplitude
during both of these time windows (100–180 and 300–
400 msec) as a function of mistuning and musical exper-
tise was also examined by orthogonal polynomial decom-
position with a focus on the linear and quadratic trends.
Preliminary analyses indicated that the ORN recorded
during the first and the second passive listening blocks
were comparable. Thus, the ERPs recorded during these
two blocks of trials were averaged together, and subse-
quent analyses were performed on the ERPs averaged
across block. For the P400 wave, the effects of musical
expertise and mistuning were limited to ERPs recorded
during the active listening condition because there was
no reliable P400 wave during the passive listening (differ-
ences between Blocks 1 and 2 were also examined, and
no difference was found). Thus, all analyses on the P400
were done only during active listening.

RESULTS

Behavioral Data

Figure 1 shows the proportion of trials where partici-
pants reported hearing two concurrent sounds as a func-
tion of mistuning. The ANOVA yielded a main effect of

mistuning, F(5,130) = 133.7, p < .001, and a significant interaction between expertise and mistuning, F(5,130) = 3.68, p < .01. Post hoc comparisons revealed that musicians were more likely than nonmusicians to report hearing two simultaneous sounds when the second harmonic was mistuned by 4%, 8%, and 16% ( p < .05 in all cases). There was no difference in perceptual judgment between musicians and nonmusicians when the second harmonic was either tuned or mistuned by 1% ( p > .1), but there was a trend toward a difference at 2% ( p = .09).

Electrophysiological Data

Figure 2A and B show the group mean ERPs averaged
across stimulus type during active and passive listening,
respectively. The ERPs comprised N1 and P2 waves that
were largest over the fronto-central scalp sites and
peaked at about 100 and 180 msec after sound onset,
respectively. During active listening, the N1–P2 complex
was followed by a sustained potential that was positive
and maximal over the parietal regions, referred to as an
LPC. First, analyses of N1, N1c, and P2 peaks were done
only on tuned stimuli to examine whether musical
expertise modulates the processing of complex sounds
irrespective of mistuning. The main effect of musical
expertise on the N1, N1c, and P2 amplitude was not
significant nor was the interaction between musical
expertise and listening condition ( p > .2 in all cases).
The N1 and N1c were both larger in active listening,
F(1,26) = 30.9 and 14.0, p < .01; however, the P2 was not affected by listening condition ( p > .2).

In subsequent analyses, mistuning was included as an
additional factor. As expected, the N1 and the N1c waves
were larger during active than passive listening, F(1,26) =
41.93 and 12.08, p < .01, and the P2 wave was not affected by listening conditions ( p > .2). The main effect
of musical expertise and the interaction between exper-
tise and listening condition were not significant for N1,
N1c, or P2 ( p > .1); sin embargo, the effect of mistuning inter-
acted with musical expertise for the N1 and P2, F(5,130) =

Figure 1. Percentage of
stimuli perceived as two
tones as a function of
mistuning of the second
harmonic (error bars = 1 SE).



Figure 2. (A) Active listening: Sensory-evoked responses averaged across all mistuning conditions in active trials separated by group. The topographic
maps for each peak show activity at the following latencies: N1, 100 msec; P2, 180 msec; and LPC, 500 msec. Electrode Cz is a solid black line, POz is a
dotted line, and all other electrodes are gray. Horizontal gray lines show that the amplitude of N1 and P2 is similar between musicians and nonmusicians, and
that the LPC is larger in musicians. (B) Passive listening: Sensory-evoked responses averaged across all mistuning conditions in passive trials separated by
group. The topographic maps for each peak show activity at the following latencies: N1, 100 msec; and P2, 180 msec. Electrode Cz is a solid black line, POz
is a dotted line, and all other electrodes are gray. Horizontal gray lines show that the amplitude of N1 and P2 is similar between musicians and nonmusicians.

3.3 and 2.3, p < .05, but no effect of mistuning was observed for the N1c ( p > .05). The source of the N1
interaction was an increasing negativity for N1 in musi-
cians but not nonmusicians, whereas the source of the P2
interaction was an increasing negativity for nonmusicians
but not musicians. This interaction is likely due to the
differing latencies of the ORN between groups and is
explained in more detail below. Lastly, the LPC was
significantly larger in musicians during active listening,
F(1,26) = 5.4, p < .05, and was not observed in passive trials (see Figure 2). In addition, the effect of mistuning on the LPC was significant, F(5,130) = 6.81, p < .01. Post hoc tests revealed a smaller LPC at the 2% and the 4% mistuning conditions compared with the tuned condition ( p < .01 in both cases), whereas no differences in LPC were observed in the 1%, the 8%, and the 16% mistuning conditions compared with the tuned condition ( p > .1).
The mistuning by expertise interaction was not significant
for LPC amplitude ( p > .2).

Object-related Negativity

In both groups, the increase in mistuning was associ-
ated with a greater negativity over the 100- to 180-msec
time window at fronto-central sites, F(5,130) = 16.2, p < .01, and greater positivity at mastoid/cerebellar sites, F(5,130) = 16.61, p < .01, consistent with an ORN that was superimposed over the N1 and the P2 waves, with generator(s) in auditory cortices along the superior temporal plane (Figures 3 and 4).

The ANOVA also revealed a significant interaction between musical expertise and mistuning for the ORN recorded at mastoid/cerebellar sites, F(5,130) = 3.74, p < .01 [linear trend: F(1,26) = 6.7, p < .01; see Figure 5], with a similar trend for the ORN measured at fronto-central sites, F(5,130) = 1.7, p = .14 [linear trend: F(1,26) = 4.93, p < .05]. To gain a better understanding of this interaction, we performed separate ANOVAs for each group. In musicians, pairwise comparisons revealed greater negativity in the 8% and the 16% mistuning conditions compared with the tuned and the 1% conditions ( p < .01 in all cases). In nonmusicians, only ERPs elicited by the 16% mistuned stimuli differed from those elicited by the tuned stimuli ( p < .05). This suggests that nonmusicians required a greater level of mistuning than musicians to elicit an ORN. In addition, taking into account the polynomial decompositions, these results demonstrate that the ORN is larger in musicians compared with nonmusicians (greater change from tuned to 16% mistuned in musicians compared with nonmusicians at fronto-central: 0.686 vs. 0.304 μV; mastoid/cerebellar: 0.888 vs. 0.429 μV). Finally, the interaction between listening condition and mistuning level was not significant nor was the three-way interaction between group, listening condition, and mistuning level ( p > .1 in all cases). These latter analyses indicate that

the ORN was little affected by listening condition in both
groups. Finally, the interaction between hemisphere,
mistuning, listening condition, and expertise was not sig-
nificant nor were any lower-order interactions that in-
cluded hemisphere as a factor at mastoid/cerebellar sites
( p > .1), indicating no hemispheric asymmetries in ORN
amplitude.

To assess the impact of musical expertise on the ORN
latency, we measured the peak latency of the difference
wave between ERPs elicited by the tuned and those
elicited by the 16% mistuned harmonic stimuli. The ORN
latency was quantified as the peak activity between 100
and 200 msec poststimulus onset at the midline fronto-
central electrode (FCz) in both active and passive listen-
ing conditions. The ANOVA, with expertise and listening
conditions as factors, yielded a main effect of expertise,
with ORN latency being shorter in musicians than in
nonmusicians (135 vs. 149 msec), F(1,26) = 4.28, p < .05. Finally, the main effect of listening condition was not significant nor was the interaction between musical expertise and listening condition, suggesting that the ORN latency is similar in both active and passive listening ( p > .1 in both cases).
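The latency measurement just described — the peak of the mistuned-minus-tuned difference wave between 100 and 200 msec at FCz — can be sketched with toy waveforms (the real ERP data are not reproduced here):

```python
import numpy as np

FS, PRE = 500, 0.100   # sampling rate (Hz) and prestimulus interval (sec)

def orn_peak_latency(tuned_fcz, mistuned_fcz):
    """Difference wave (16% mistuned minus tuned) at FCz; returns the
    latency (msec) of the most negative point between 100 and 200 msec."""
    diff = mistuned_fcz - tuned_fcz
    i0, i1 = int((PRE + 0.100) * FS), int((PRE + 0.200) * FS)
    peak = i0 + int(np.argmin(diff[i0:i1]))
    return (peak / FS - PRE) * 1000.0

# toy waveforms: identical except for a negativity centered at 134 msec
t = np.arange(550) / FS - PRE
tuned = np.zeros(550)
mistuned = -np.exp(-((t - 0.134) ** 2) / (2 * 0.010 ** 2))
latency_ms = orn_peak_latency(tuned, mistuned)   # ~134 msec for this toy case
```

On the real data the same search window would return the 135-msec (musicians) and 149-msec (nonmusicians) peaks reported above.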

P400

In both groups, the P400 elicited during active listening
was slightly right lateralized over the fronto-central scalp


Figure 3. (A) Active listening: Topographic maps of the ORN and the P400 at three angles recorded during active listening. The ORN
contour maps show the peak amplitude for musicians (135 msec) and nonmusicians (149 msec). The P400 contour maps show the mean
peak amplitude for musicians (358 msec) and nonmusicians (378 msec). Black arrows in the top row indicate fronto-central ORN and P400
activation; arrows in the middle row show the inversion of the ORN and the P400 at mastoid and cerebellar sites. (B) Passive listening:
Topographic maps of the ORN at three angles recorded during passive listening. The latency maps show the amplitude distribution of
the ORN at electrode FCz for musicians (135 msec) and nonmusicians (149 msec). Black arrows in the top row indicate fronto-central ORN
activation; arrows in the middle row show the inversion of ORN at mastoid and cerebellar sites.


Figure 4. (A) Active listening: The difference between the evoked response in the 16% mistuned stimulus and the tuned stimulus in active trials. The
difference wave (in solid black) illustrates the ORN and the P400. Horizontal gray lines from the peak of the ORN and the P400 show the enhancement
in musicians. (B) Passive listening: The difference between the evoked response in the 16% mistuned stimulus and the tuned stimulus in passive trials.
The difference wave (in solid black) illustrates the ORN. Horizontal gray lines from the peak of the ORN show the enhancement in musicians.

region and inverted in polarity at mastoid/cerebellar sites
(Figure 3A). The increase in mistuning was associated
with an enhanced positivity over the 300- to 400-msec
time window at fronto-central sites, F(5,130) = 12.52,
pag < .01, and greater negativity at mastoid/cerebellar sites, F(5,130) = 13.31, p < .01, consistent with a P400 with generator(s) in auditory cortices along the superior tem- poral plane (Figures 3A and 4A). More importantly, the ANOVA on the mean amplitude over the 300- to the 400-msec interval yielded an interac- tion between musical expertise and mistuning at mastoid/ cerebellar sites, F(5,130) = 2.50, p < .05 [quadratic trend, F(1,26) = 8.37, p < .01], and fronto-central sites, F(5, 130) = 2.40, p < .05 [quadratic trend, F(1,26) = 1.55, p < .01]. To gain a better understanding of this interac- tion, we performed separate ANOVAs for each group. In musicians, pairwise comparisons revealed greater positiv- ity in the 8% and the 16% mistuning conditions compared with the tuned and the 1% conditions ( p < .05 in all cases). In nonmusicians, ERPs elicited by the 8% and the 16% mistuned stimuli differed from those elicited by only the tuned stimuli ( p < .05 in both cases). This sug- gests that both groups required similar levels of mistun- ing to elicit a P400. Taking into account the polynomial decompositions, the P400 was elicited with similar lev- els of mistuning but was larger in musicians (greater change from tuned to 16% mistuned in musicians com- pared with nonmusicians at fronto-central 1.02 vs. 0.77 AV o n 1 8 M a y 2 0 2 1 1494 Journal of Cognitive Neuroscience Volume 21, Number 8 Figure 5. ORN 100–180 msec: Mean amplitude of the evoked response averaged across four mastoid electrodes, from 100 to 180 msec poststimulus onset, as a function of mistuning (error bars = 1 SE). D o w n l o a d e d l l / / / / j f / t t i t . : / / f r o m D h o t w t n p o : a / d / e m d i f t r o p m r c h . s p i l d v i e r e r c c t . h m a i r e . d u c o o m c / n j a o r c t i n c / e a - p r d t i 2 c 1 l 8 e - 1 p 4 d 8 f 8 / 1 2 9 1 3 / 7 8 8 / 6 1 0 4 o 8 c 8 n / 1 2 0 7 0 6 9 0 2 2 6 1 7 1 4 / 0 j o p c d n . b y 2 0 g 0 u 9 e . s t 2 o 1 n 1 4 0 0 8 . 
S p e d p f e m b y b e g r u 2 0 e 2 s 3 t / j . f / . t . o n 1 8 M a y 2 0 2 1 and mastoid/cerebellar 1.29 vs. 0.78 AV). Finally, the interaction between hemisphere, mistuning, and musical expertise was not significant nor were any lower-order interactions that included hemisphere as a factor ( p > .1).
There was a significant main effect of hemisphere at
mastoid/cerebellar sites, F(1,26) = 10.35, pag < .01, indi- cating greater activity (not P400 because P400 requires a mistuning effect) recorded over the right hemisphere. The P400 latency was defined as the largest peak on the difference wave (ERPs to tuned stimuli minus ERPs elicited by the 16% mistuned stimuli) at electrodes C2 and C4 during the 250- to 450-msec interval. The latency of the P400 was slightly shorter in musicians compared with nonmusicians (358 vs. 378 msec); however, this effect was not statistically reliable ( p > .1).
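The difference-wave measurement used here (the evoked response to the mistuned stimulus minus the response to the tuned stimulus, with peak latency taken from a 250- to 450-msec search window) can be sketched in a few lines. This is a hypothetical illustration with synthetic data, not the study's analysis code; the sampling rate, component shapes, and amplitudes are assumptions chosen only to make the computation concrete:

```python
import math

# Hedged sketch (not the authors' pipeline): computing a difference wave
# and its peak latency. All data are synthetic stand-ins; FS and the
# component latencies/amplitudes below are illustrative assumptions.

FS = 500                                  # sampling rate in Hz (assumed)
t = [i / FS for i in range(300)]          # 0-598 msec epoch, 2-msec steps

def bump(time_s, latency_s, width_s, amp):
    """A smooth Gaussian bump standing in for an ERP component."""
    return amp * math.exp(-0.5 * ((time_s - latency_s) / width_s) ** 2)

# Simulated fronto-central ERPs: the mistuned response adds a positive
# component near 360 msec, a stand-in for the P400.
erp_tuned = [bump(x, 0.10, 0.02, -2.0) + bump(x, 0.18, 0.03, 1.5) for x in t]
erp_mistuned = [v + bump(x, 0.36, 0.04, 1.0) for v, x in zip(erp_tuned, t)]

# Difference wave: mistuned minus tuned isolates mistuning-related activity.
diff_wave = [m - u for m, u in zip(erp_mistuned, erp_tuned)]

# Peak latency: largest value of the difference wave in a 250-450 msec window.
window = [(x, d) for x, d in zip(t, diff_wave) if 0.25 <= x <= 0.45]
peak_time, peak_amp = max(window, key=lambda pair: pair[1])
peak_latency_msec = peak_time * 1000.0

print(f"P400-like peak latency: {peak_latency_msec:.0f} msec")
```

Restricting the search to a component-specific window, as done for the P400 here, keeps earlier deflections (such as the ORN) from being mistaken for the peak of interest.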

DISCUSSION

The purpose of this study was to examine the influence
of long-term training on concurrent sound segregation.
We found that musicians were more likely to identify a
mistuned harmonic as a distinct auditory object com-
pared with nonmusicians. This was paralleled by larger
amplitude and earlier ORN waves and larger P400 waves.
Our behavioral and electrophysiological data demon-
strate that musicians have enhanced ability to partition
the incoming acoustic wave based on harmonic rela-
tions. More importantly, these results cannot easily be
accounted for by models of auditory scene analysis that
postulate that low-level processes occur independently
of listeners’ experience. Instead, the findings support
the more contemporary idea that long-term training
can alter even primitive perceptual functions (see Wong
et al., 2007; Koelsch, Schroger, & Tervaniemi, 1999;
Beauvois & Meddis, 1997).

The earlier and enhanced ORN amplitude in musicians
likely reflects greater abilities in the primitive processing
of periodicity cues. Studies measuring the mismatch
negativity (MMN) wave, an ERP component thought to
index a change detection process (e.g., Picton et al., 2000;

Näätänen, Gaillard, & Mantysalo, 1978), have shown en-
hancements to the MMN in musicians across numerous
domains, including violations of periodicity (Koelsch et al.,
1999), violations of melodic contour and interval structure
(Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004, 2005), and
violations of temporal structure (Russeler, Altenmuller,
Nager, Kohlmetz, & Munte, 2001). Interestingly, Koelsch
et al. (1999) found that when the same components of
the harmonic series were presented in isolation, the
deviant mistuned tone evoked a comparable MMN in
both musicians and nonmusicians; however, when the
same deviant sound was presented as part of a chord,
musicians had a larger MMN and were able to identify the
deviant chord more consistently. Therefore, although
both musicians and nonmusicians can detect differences
in frequency, musicians have an advantage when dealing
with concurrently occurring sounds and detecting viola-
tions of periodicity.

Detection of periodic (harmonic) violations must pre-
cede or coincide with concurrent sound segregation
because without detection, perception of a second
auditory object would be impossible. Although musical
training did not alter the amount of mistuning required
to perceive a second auditory object (2–4% in both
groups), musicians were more consistent in their per-
ceptions, which suggests that as a result of musical train-
ing, harmonic violations are more easily detected by
músicos. The increased ability of musicians to detect
mistuning in a complex sound allows for more consis-
tent sound segregation.

Koelsch et al. (1999) observed musician-related en-
hancements at identifying mistuning in a complex sound.
The Koelsch et al. study used pure tones arranged as
chords, which isolated the harmonic relations found in
music without using the timbres of musical instruments,
much like the current study used mistuned harmonics
to investigate sound segregation without using stimuli
with musical timbres. Isolating low-level perceptual func-
ciones (from the effect of timbre) is paramount to draw-
ing conclusions about low-level scene analysis functions
because previous research has shown enhanced ampli-
tude for the N1 (Pantev et al., 1998, 2001), N1c (Shahin
et al., 2003), and P2 (Shahin et al., 2003, 2005) in mu-
sicians when presented with stimuli of musical timbre.
The enhancements to the N1, the N1c, and the P2 in
musicians are typically observed for musical sounds, es-
pecially for those that are similar to the instrument of
training (p.ej., piano tone for pianist, trumpet sounds for
trumpeter). The expertise-related differences in sensory-
evoked responses are typically small or even nonexistent
when musicians and nonmusicians are presented with
pure tones (see Shahin et al., 2003, 2005).

It is important to acknowledge the cortical source of
the myriad enhancements observed in musicians. Long-
latency auditory-evoked responses (i.e., N1, N1c, and
P2) are thought to originate at various points along the
superior temporal plane (see Scherg, Vajsar, & Picton,
1999), and therefore enhancements to these waveforms
were thought to be due to cortical plasticity. Emerging
evidence suggests that the plasticity goes even deeper
and may be at the level of the brainstem (Wong et al.,
2007). Taking these new data into account, one could
hypothesize that enhancements to long latency auditory-
evoked responses are due to a stronger signal coming in
from the brain stem. In terms of the present study, the
ORN enhancements could be due to enhanced frequency
coding at precortical stages of the auditory pathway, as a
reliable ORN emerges with less mistuning in musicians
compared with nonmusicians. The data from the present
study cannot support or refute this hypothesis, and
further study is warranted.

In the present study, cortical representations of har-
monic complexes (as indexed by N1, N1c, and P2 waves)
were similar in both musicians and nonmusicians. Group
differences were only observed in ERP components
related to the perception of simultaneous sounds. Har-
monic complexes are not domain specific to music; thus,
the lack of effects on the N1, the N1c, and the P2 waves
were to be expected. Musicians do, however, segregate
simultaneous sounds as part of their training. Perform-
ers in a large group must be able to segregate instru-
ments from one another; even practicing alone requires
the musician to segregate the sounds of his or her in-
strument from environmental noise. Some of this seg-
regation is probably based on harmonicity, which may
be why musicians demonstrate enhanced concurrent
sound processing.

The use of harmonicity as a cue for auditory scene
analysis in a musical setting also explains the enhance-
ment to the LPC. The LPC has been described as an
index of the decision-making process about an incoming
sound stimulus (Starr & Don, 1988). The data in the
current study support this explanation because the LPC
was smallest in conditions where the decision about the
harmonic complex was difficult (2–4% mistuning) for
both groups. This may be related to the increased var-
iance in behavioral performance in the 2% and the
4% mistuning conditions, indicating that LPC amplitude
might be related to the confidence in behavioral re-
sponses. In addition, a larger LPC was observed in musi-
cians. Previous research demonstrated increased LPC
activity in musicians when making decisions about ter-
minal note congruity (Besson & Faita, 1995). The en-
hanced LPC in musicians in the current study may be
due to the salience of periodicity and violations of pe-
riodicity for musicians. For a performing musician, differ-
ent cues would require different behavioral responses.
For example, a violinist in a group may determine that
she is slightly out of tune with the rest of the group and
adjust her fingering accordingly. For the lay person, slight
harmonic violations are not normally important. This
alternative explanation suggests that the change in the
LPC observed in musicians is due to cortical enhance-
ments related to harmonic detection and related actions.
Despite the evidence for the effect of musical exper-
tise on primitive auditory scene analysis, some alterna-
tive explanations should be considered. One possibility
is that musicians were better at focusing their attention
to the frequency region of the mistuned harmonic. In
the present study, musicians may have realized that it
was always the second harmonic that was mistuned and
used this information to focus their attention to the
frequency of the mistuned harmonic. Although the bulk
of research suggests that the ORN indexes an attention-
independent process (Alain, 2007), there is some evi-
dence that under certain circumstances (i.e., when the
mistuned harmonic is predictable) the ORN amplitude
may be enhanced by attention (see Experiment 1 by
Alain et al., 2001). Hence, the enhancements observed in
the ORN of musicians could be due to a greater alloca-
tion of attention to the frequency region of the mis-
tuned harmonic. The data, however, do not support
this view. Nonsignificant interactions between mistuning
and listening condition and between mistuning, listen-
ing condition, and musical training indicate that the
observed effects were consistent in both passive and
active listening. The ORN was enhanced in musicians
compared with nonmusicians by similar amounts in both
listening conditions.

Another possible explanation for our findings is that in
the present study we used a strict selection criterion for
nonmusicians, excluding participants with intermediate
levels of musical training. By using a strict criterion for
selecting nonmusicians, we may have selected individu-
als who have poor auditory processing abilities in gen-
eral. Individuals with poor auditory abilities may not
have been detected using pure tone thresholds as the
sole screening procedure. Future research should con-
sider a more comprehensive assessment of auditory
abilities when comparing musicians and nonmusicians.
Poor auditory processing abilities could explain why the
ORN of the nonmusicians was much smaller compared
with the ORN observed in previous studies (where
musical training was not a criterion). Similarly, in the
current study, we aimed to select a group of highly
trained musicians who may have enhanced auditory
processing abilities. Thus, our screening method may
have created two groups at opposite ends of the spec-
trum in terms of auditory abilities.

Conclusion

The findings of the current study support the hypothesis
that musical training enhances concurrent sound segre-
gation. Music perception is governed by the same
primitive auditory scene processes as all other audi-
tory perception. Bregman (1990) points out that “the
primitive processes of auditory organization work in
the same way whether they are tested by studying
simplified sounds in the laboratory or by examining
examples in the world of music” (p. 528). If we apply
this theory to the current data, we can conclude that mu-
sical training engenders general enhancements to con-
current sound segregation, regardless of stimulus type.
The process of concurrent sound segregation is differ-
ent in expert musicians. Musicians are better at identify-
ing concurrently occurring sounds, and this is paralleled
by neural change. This positive change in musicians is
probably due to experience in dealing with chords and
other harmonic (and inharmonic) relations found in
música. Enhancements to concurrent sound segregation
and related neural activity suggest that primitive auditory
scene abilities are improved by long-term musical training.

Acknowledgments

The research was supported by grants from the Canadian
Institutes of Health Research and the Natural Sciences and
Engineering Research Council of Canada. Special thanks to
Dr. Takako Fujioka, Dr. Ivan Zendel, Patricia Van Roon, and two
anonymous reviewers for constructive comments on earlier ver-
sions of this manuscript.

Reprint requests should be sent to Claude Alain, Rotman Re-
search Institute, Baycrest Centre for Geriatric Care, 3560 Bathurst
Street, Toronto, Ontario, Canada M6A 2E1, or via e-mail: calain@
rotman-baycrest.on.ca.

Note

1. The N1 wave refers to a deflection in the auditory ERPs
that peaks at about 100 msec after sound onset and is largest
over the fronto-central scalp region. It is followed by an N1c,
which is a smaller negative wave over the right and the left
temporal sites, and a P2 wave that peaks at about 180 msec
after sound onset and is maximal over the central scalp region. For a more
detailed review of long-latency human auditory-evoked poten-
tials, see Crowley and Colrain (2004), Scherg et al. (1999), Starr
and Don (1988), and Näätänen and Picton (1987).

REFERENCES

Alain, C. (2007). Breaking the wave: Effects of attention and
learning on concurrent sound perception. Hearing
Research, 229, 225–236.

Alain, C., Arnott, S. R., & Picton, T. W. (2001). Bottom–up and
top–down influences on auditory scene analysis: Evidence
from ERPs. Journal of Experimental Psychology, 27,
1072–1089.

Alain, C., & Izenberg, A. (2003). Effects of attentional load on
auditory scene analysis. Journal of Cognitive Neuroscience,
15, 1063–1073.

Alain, C., Schuler, B. M., & McDonald, K. L. (2002). Neural
activity associated with distinguishing concurrent auditory
objects. Journal of the Acoustical Society of America, 111,
990–995.

Alain, C., & Snyder, J. S. (2008). Age-related differences in
auditory evoked responses during rapid perceptual learning.
Clinical Neurophysiology, 119, 356–366.

Alain, C., Snyder, J. S., He, Y., & Reinke, K. (2007). Changes in
auditory cortex parallel rapid perceptual learning. Cerebral
Cortex, 17, 1074–1084.

Beauvois, M. W., & Meddis, R. (1997). Time decay of auditory
stream biasing. Perception and Psychophysics, 59, 81–86.

Besson, M., & Faita, F. (1995). An ERP study of musical
expectancy: Comparison of musicians with nonmusicians.
Journal of Experimental Psychology, 21, 1278–1296.

Bey, C., & McAdams, S. (2002). Schema-based processing in
auditory scene analysis. Perception and Psychophysics, 64,
844–854.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual
organization of sound. Cambridge, MA: MIT Press.

Crowley, K. E., & Colrain, I. M. (2004). A review of the evidence
for P2 being an independent component process: Age, sleep
& modality. Clinical Neurophysiology, 115, 732–744.

Dowling, W. J. (1973). The perception of interleaved melodies.
Cognitive Psychology, 5, 322–337.

Fishman, Y. I., Volkov, I. O., Noh, M. D., Garell, P. C., Bakken,
H., Arezzo, J. C., et al. (2001). Consonance and dissonance of
musical chords: Neural correlates in auditory cortex of
monkeys and humans. Journal of Neurophysiology, 86,
2761–2788.

Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C.
(2004). Musical training enhanced automatic encoding of
melodic contour and interval structure. Journal of Cognitive
Neuroscience, 16, 1010–1021.

Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C.
(2005). Automatic encoding of polyphonic melodies in
musicians and non-musicians. Journal of Cognitive
Neuroscience, 17, 1578–1592.

Hafter, E. R., Schlauch, R. S., & Tang, J. (1993). Attending to
auditory filters that were not stimulated directly. Journal of
the Acoustical Society of America, 94, 743–747.

Koelsch, S., Schroger, E., & Tervaniemi, M. (1999). Superior
pre-attentive auditory processing in musicians. NeuroReport,
10, 1309–1313.

Moore, B. C., Glasberg, B. R., & Peters, R. W. (1986).
Thresholds for hearing mistuned partials as separate tones
in harmonic complexes. Journal of the Acoustical Society of
America, 80, 479–483.

Näätänen, R., Gaillard, A. W. K., & Mantysalo, S. (1978). Early
selective attention effect on evoked potential reinterpreted.
Acta Psychologica, 42, 313–329.

Näätänen, R., & Picton, T. (1987). The N1 wave of the human
electric and magnetic response to sound: A review and an
analysis of the component structure. Psychophysiology, 24,
375–425.

Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E.,
& Hoke, M. (1998). Increased auditory cortical
representation in musicians. Nature, 392, 811–814.

Pantev, C., Roberts, L. E., Schultz, M., Engelien, A., & Ross, B.
(2001). Timbre-specific enhancement of auditory cortical
representations in musicians. NeuroReport, 12, 169–174.

Picton, T. W., van Roon, P., Armilio, M. L., Berg, P., Ille, N., &
Scherg, M. (2000). The correction of ocular artifacts: A
topographic perspective. Clinical Neurophysiology, 111,
53–65.

Reinke, K. S., He, Y., Wang, C., & Alain, C. (2003). Perceptual
learning modulates sensory evoked response during vowel
segregation. Cognitive Brain Research, 17, 781–791.

Russeler, J., Altenmuller, E., Nager, W., Kohlmetz, C., & Munte,
T. F. (2001). Event related brain potentials to sound
omissions differ in musicians and non-musicians.
Neuroscience Letters, 308, 33–36.

Scherg, M., Vajsar, J., & Picton, T. W. (1999). A source analysis
of the late human auditory evoked potentials. Journal of
Cognitive Neuroscience, 1, 336–355.

Schlauch, R. S., & Hafter, E. R. (1991). Listening bandwidths
and frequency uncertainty in pure-tone signal detection.
Journal of the Acoustical Society of America, 90,
1332–1339.

Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003).
Enhancement of neuroplastic P2 and N1c auditory evoked
potentials in musicians. Journal of Neuroscience, 23,
5545–5552.

Shahin, A., Roberts, L. E., Pantev, C., Trainor, L. J., & Ross, B.
(2005). Modulation of P2 auditory-evoked responses by the
spectral complexity of musical sounds. NeuroReport, 16,
1781–1785.

Sinex, D. G., Sabes, J. H., & Li, H. (2002). Responses of inferior
colliculus neurons to harmonic and mistuned complex
tones. Hearing Research, 168, 150–162.

Sinex, D. G., Guzik, H., Li, H., & Sabes, J. H. (2003). Responses
of auditory nerve fibers to harmonic and mistuned complex
tones. Hearing Research, 182, 130–139.

Starr, A., & Don, M. (1988). Brain potentials evoked by acoustic
stimuli. In T. W. Picton (Ed.), Human event-related
potentials: EEG handbook (Vol. 3, pp. 97–157). Amsterdam:
Elsevier Science Publishers.

Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N.
(2007). Musical experience shapes human brainstem
encoding of linguistic pitch patterns. Nature Neuroscience,
10, 420–422.
