RESEARCH ARTICLE
Effects of Syllable Rate on Neuro-Behavioral
Synchronization Across Modalities: Brain
Oscillations and Speech Productions
Deling He1,2, Eugene H. Buder1,2, and Gavin M. Bidelman3,4
1School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
2Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
3Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
4Program in Neuroscience, Indiana University, Bloomington, IN, USA
Keywords: cortical tracking, phase locking, sensorimotor integration, speech rhythm, speech
synchronization
ABSTRACT
Considerable work suggests the dominant syllable rhythm of the acoustic envelope is
remarkably similar across languages (∼4–5 Hz) and that oscillatory brain activity tracks these
quasiperiodic rhythms to facilitate speech processing. However, whether this fundamental
periodicity represents a common organizing principle in both auditory and motor systems
involved in speech has not been explicitly tested. To evaluate relations between entrainment
in the perceptual and production domains, we measured individuals’ (i) neuroacoustic
tracking of the EEG to speech trains and their (ii) simultaneous and non-simultaneous
productions synchronized to syllable rates between 2.5 and 8.5 Hz. Productions made without
concurrent auditory presentation isolated motor speech functions more purely. We show that
neural synchronization flexibly adapts to the heard stimuli in a rate-dependent manner, but
that phase locking is boosted near ∼4.5 Hz, the purported dominant rate of speech. Cued
speech productions (recruiting sensorimotor interaction) were optimal between 2.5 and 4.5 Hz,
suggesting a low-frequency constraint on motor output and/or sensorimotor integration. In
contrast, “pure” motor productions (without concurrent sound cues) were most precisely
generated at rates of 4.5 and 5.5 Hz, paralleling the neuroacoustic data. Correlations further
revealed strong links between receptive (EEG) and production synchronization abilities;
individuals with stronger auditory-perceptual entrainment better matched speech rhythms
motorically. Together, our findings support an intimate link between exogenous and
endogenous rhythmic processing that is optimized at 4–5 Hz in both auditory and motor
systems. Parallels across modalities could result from dynamics of the speech motor system
coupled with experience-dependent tuning of the perceptual system via the sensorimotor
interface.
INTRODUCTION
The auditory cortex faithfully tracks amplitude modulations in continuous sounds, regardless
of whether those acoustic events are speech (Ahissar et al., 2001; Casas et al., 2021; Luo &
Poeppel, 2007), modulated white noise (Henry & Obleser, 2012), or clicks (Will & Berg,
2007). This phenomenon, whereby a listener’s rhythmic brain activity (i.e., oscillations)
entrains to the physical signal, is described as neural synchronization or cortical tracking.
an open access journal
Citation: He, D., Buder, E. H., & Bidelman, G. M. (2023). Effects of syllable rate on
neuro-behavioral synchronization across modalities: Brain oscillations and speech
productions. Neurobiology of Language, 4(2), 344–360. https://doi.org/10.1162/nol_a_00102
DOI: https://doi.org/10.1162/nol_a_00102
Received: 8 September 2022
Accepted: 25 January 2023
Competing Interests: The authors have declared that no competing interests exist.
Corresponding Author: Deling He, dhe2@memphis.edu
Handling Editor: David Poeppel
Copyright: © 2023 Massachusetts Institute of Technology. Published under a Creative
Commons Attribution 4.0 International (CC BY 4.0) license.
The MIT Press
Neurocognitive models suggest that the phase of ongoing brain oscillations, especially within
the low theta band (4–8 Hz), lock to the slowly varying amplitude envelope to parse contin-
uous sounds into discrete segments necessary for speech comprehension (Doelling et al.,
2014; Ghitza, 2011, 2012; Giraud & Poeppel, 2012; Luo & Poeppel, 2007). In particular,
speech syllable rhythms, which exhibit a quasiregularity in their envelope modulation (Ding
et al., 2017; Tilsen & Johnson, 2008), have been used to study how the brain parses the con-
tinuous speech stream (Ghitza, 2012; Hyafil et al., 2015). However, such brain entrainment is
not solely low-level neural activity that simply mirrors the acoustic attributes of speech. Rather,
entrained responses also serve to facilitate speech comprehension (Doelling et al., 2014; Luo
& Poeppel, 2007; Peelle et al., 2013). These studies demonstrate that the degree to which
auditory cortical activity tracks acoustic speech (and non-speech) signals provides an impor-
tant mechanism for perception.
Syllable rhythms in speech range in speed from 2–8 Hz (Ding et al., 2017). With this var-
iability in mind, it is natural to ask whether the brain’s speech systems are equally efficient
across syllable rates, or instead are tuned to a specific natural speech rhythm. Indeed, the
majority of the world’s languages unfold at rates centered near 4–5 Hz and neuroacoustic
entrainment is enhanced at these ecological syllable speeds (Ding et al., 2017; Poeppel &
Assaneo, 2020). In their neuroimaging study, Assaneo and Poeppel (2018) demonstrated that
auditory entrainment (i.e., sound-to-brain synchronization) is modulated by speech rates from
2.5 to 6.5 Hz but declines at faster rates. In contrast, a more restricted 2.5–4.5 Hz frequency
coupling was found in phase-locked responses to speech between auditory and motor cortices
(i.e., brain-to-brain synchronization; Assaneo & Poeppel, 2018). This suggests that while neu-
ral oscillations can entrain to a wider band of external rhythms (e.g., 2.5–6.5 Hz), motor cortex
resonates at select frequencies to emphasize syllable coding at 4.5 Hz. A neural model was
proposed accordingly: speech-motor cortical function is modeled as a neural oscillator, an
element capable of generating rhythmic activity, with maximal coupling to the auditory system
at 4.5 Hz. Such studies suggest, at least theoretically, a convergence of the frequency of
endogenous brain rhythms during speech production and the cortical encoding of speech at
its input.
In parallel with auditory-motor cortex coupling, behavioral sensorimotor synchronization
has been extensively characterized by having individuals produce certain movements in time
along with external physical events. Sensorimotor skills have most often been studied in the
form of tapping to a periodic stimulus (Repp, 2005). The rate limits of synchronization in beat
tapping approximately correspond with inter-onset intervals between 100 ms (Pressing &
Jolley-Rogers, 1997) et 1800 ms (Miyake et al., 2004; Repp, 2005). Cependant, these exam-
ples of non-speech motor synchronization may not generalize to speech considering its
unique nature in human cognition. The therapeutic benefits of synchronizing to audio or
visual speech productions, referred to as speech entrainment, have been demonstrated in patients
with Broca’s aphasia (Fridriksson et al., 2012; Thors, 2019). However, experience-based rates
(i.e., the patient’s most comfortable rate) have been implicitly used in speech entrainment tasks
rather than systematically verified. In addition, using a spontaneous speech synchronization
(SSS) task, Assaneo et al. (2019) found some listeners involuntarily match their speech with an
external rhythm while others remain impervious. Listeners were instructed to freely produce
syllable trains while hearing syllables at rates of 4.5 syll/s with the goal of monitoring the
occurrence of syllables. Their data established a link between word learning capabilities
and sensorimotor speech synchrony. Critically, the optimal rate of the speech sounds in those
studies was assumed to be close to the natural/normal speaking rate (i.e., ∼4–5 Hz). Uncer-
tainty also persists regarding how wider ranges of syllable rates might affect speech
synchronization. Further, studies have shown that better rhythm perception abilities are indic-
ative of increased conversational quality mediated by better speech entrainment (Wynn et al.,
2022). Thus, it is highly plausible that an individual’s preference for certain stimulus rates
perceptually might facilitate their successful entrainment at similar preferred rates during
production. To address this knowledge gap and explicitly test for frequency-specific coupling
in speech perception and production, sensorimotor and auditory synchronization must be
measured in a common paradigm.
In the present study, we aimed to empirically compare syllable rate sensitivity of the
auditory-perceptual and (sensori)motor systems. In doing so, we ask whether brain and speech
entrainment is or is not selectively tuned to the fundamental periodicity inherent to speech
(∼4.5 Hz) and thus represents a common organizing principle of processing across modalities.
This notion has been suggested, but to our knowledge has remained largely untested, in prom-
inent neurocognitive models of speech processing (Assaneo et al., 2021; Assaneo & Poeppel,
2018; Poeppel & Assaneo, 2020). To this end, we measured neuroacoustic tracking of
listeners’ electroencephalogram (EEG) to speech syllable trains to quantify their perceptual
entrainment to speech. To quantify motor entrainment, we measured speech productions
where participants synchronized to a wide range of syllable rates between 2.5 and 8.5 Hz
along with (simultaneous production) or without (non-simultaneous production) a concurrent
auditory speech stimulus. Employing both production tasks allowed us to isolate more or less
pure measures of the motor system by including/excluding external auditory stimuli. Brain-
behavior correlations and comparisons of rate profiles across EEG and production data
allowed us to explicitly characterize possible links between auditory neural and motor produc-
tion entrainment mechanisms of speech processing.
MATERIALS AND METHODS
Participants
Fifteen young adults participated in the study (mean age 26.7 ± 3.4 years; 10/5 females/males).
(One additional participant completed the experiment but their data were lost due to a
logging error). They were of mixed race and ethnicity. Ten were native English speakers
and five were bilingual with English as a second language. Several participants had musical
training (mean 9.9 ± 3.8 years). All participants were right-handed (Oldfield, 1971) and
reported no history of neuropsychiatric disorders. All had normal hearing sensitivity, defined
as air-conduction pure tone thresholds ≤ 25 dB HL (hearing level) at octave frequencies from
500 Hz to 4000 Hz. Listeners provided written informed consent in compliance with a
protocol approved by the University of Memphis institutional review board and were mon-
etarily compensated for their time.
Stimuli
EEG stimuli
We used stimuli inspired by Assaneo and Poeppel (2018) to characterize brain synchrony to
rhythmic speech. Each consisted of trains of a single repeating syllable from the set /ba/, /ma/,
/wa/, /va/ (random draw). Individual tokens were synthesized from online text-to-speech
software (FromTextToSpeech.com, n.d.) using a male voice, and time compressed in Praat
to 120 ms durations (Boersma & Weenink, 2013). Tokens were concatenated to create syllable
trains of 6 s duration. To vary syllable rate, we parametrically varied the silent gap between
tokens from 0 to 280 ms to create seven continuous streams of speech syllables with rates of
2.5, 3.5, 4.5, 5.5, 6.5, 7.5, and 8.5 syll/s. In practice, the 8.5 Hz condition was presented at a
nominal rate of 8.33 Hz to achieve the fastest presentation speed possible given the 120 ms
duration of our individual speech tokens.
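For illustration, the timing arithmetic behind these rates can be sketched in MATLAB as follows (a minimal sketch with hypothetical variable names, not the published stimulus-generation code):

```matlab
% Sketch: build a 6 s syllable train at a target rate from a 120 ms token.
% `token` is a hypothetical 120 ms column-vector waveform at sampling rate fs.
fs     = 44100;              % audio sampling rate (Hz)
tokDur = 0.120;              % duration of each time-compressed token (s)
rate   = 2.5;                % target syllable rate (syll/s)
gapDur = 1/rate - tokDur;    % silent gap: 0.280 s at 2.5 syll/s, 0 s at 8.33 syll/s
gap    = zeros(round(gapDur*fs), 1);
nTok   = floor(6*rate);      % number of syllables in a 6 s train
train  = repmat([token; gap], nTok, 1);   % concatenate token + gap repeatedly
```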
Speech production stimuli
To assess simultaneous (cued) and non-simultaneous (un-cued) speech production synchroni-
zation, we generated another two sets of stimuli adapted from the SSS task (Assaneo et al.,
2019). To study the non-simultaneous rhythm production, we used syllable trains of continuous
repetitions of /ta/ lasting for 10 s.
For simultaneous rhythm production, we used 60 s long syllable streams with 16 distinct
syllables (unique consonant-vowel combinations) that were randomly concatenated. Nous
generated seven rate conditions (∼2.5–8.5 syll/s). This was achieved by temporally
compressing/expanding the 4.5 Hz syllable stream from Assaneo et al. (2019) by the appro-
priate scale factor using the “Lengthen” algorithm in Praat (Boersma & Weenink, 2013).
Data Acquisition and Preprocessing
Participants were seated comfortably in front of a PC monitor and completed the three exper-
imental tasks in a double-walled, sound-attenuating booth (Industrial Acoustics Company,
2023). Auditory stimuli were presented binaurally at 82 dB SPL (sound pressure level) via elec-
tromagnetically shielded ER-2 insert earphones (Etymotic, 2023). Stimuli and task instructions
were controlled by MATLAB 2013 (MathWorks, 2013) routed to a TDT RP2 signal processing
interface (Tucker-Davis Technologies, 2022). Speech production samples were recorded dig-
itally with a professional microphone (Blue Yeti USB, Logitech; 44100 Hz; 16 bits; cardioid
pattern; Blue Yeti, 2022).
EEG data
During neural recordings, participants listened to rhythmic syllable trains (Figure 1A). To main-
tain attention, they were instructed to identify which syllable (i.e., /ba/, /ma/, /wa/, /va/) was
presented at the end of the trial via button press. There was no time constraint to respond, and
the next trial started after the button press. Listeners heard 10 trials of each 6 s syllable train per
syllable rate condition. Rate and syllable token were randomized within and between
participants.
Continuous EEGs were recorded differentially between Ag/AgCl disc electrodes placed on
the scalp at the mid-hairline referenced to linked mastoids (A1/A2) (mid-forehead = ground).
This single channel, sparse montage is highly effective for recording auditory cortical EEG
given the fronto-central scalp topography of these responses (Bidelman et al., 2013; Picton et al., 1999).
Interelectrode impedance was kept ≤ 10 kΩ. EEGs were digitized at 1000 Hz using SynAmps
RT amplifiers (Compumedics Neuroscan, 2022) and an online passband of 0–400 Hz. Neural
signals were bandpass filtered (0.9–30 Hz; 10th order Butterworth), epoched into individual
6 s trial segments synchronized to the audio stimuli, and concatenated. This resulted in 60 s
of EEG data per rate condition. Eyeblinks were then nullified in the continuous EEG via a
wavelet-based denoising algorithm (Khatun et al., 2016). Trials were averaged in the time
domain to derive cortical neural oscillations for each condition. We measured synchronization
between brain and acoustic speech signals via phase-locking values (PLV; see Phase-Locking
Value).
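A minimal MATLAB sketch of this preprocessing chain, assuming a single-channel recording `eeg` and known trial-onset samples `onsets` (both hypothetical placeholders), might look as follows:

```matlab
% Sketch of the EEG preprocessing described above (filter specs from the text).
fs = 1000;                                  % EEG sampling rate (Hz)
[b, a] = butter(5, [0.9 30]/(fs/2));        % bandpass; order doubles to 10 for bandpass
eegFilt = filtfilt(b, a, eeg);              % zero-phase filtering
epLen = 6*fs;                               % 6 s epochs
epochs = zeros(epLen, numel(onsets));
for k = 1:numel(onsets)                     % epoch relative to each stimulus onset
    epochs(:, k) = eegFilt(onsets(k) : onsets(k)+epLen-1);
end
eegCat = epochs(:);                         % concatenated 60 s of EEG per rate condition
```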
Figure 1. Examples of neural entrainment and speech synchronizations. (A) Brain entrainment to speech envelope for a slower (2.5 syll/s) and
higher (8.5 syll/s) syllable rate. Black = cortical EEG responses; green = schematized EEG envelope; red = stimulus waveform; pink = speech
fundamental envelope. (B) Schematic of the non-simultaneous (un-cued) speech production task (2.5 Hz rate). (C) Schematic of the cued
(simultaneous) production synchronization task (2.5 Hz rate). Pink = auditory stimuli; light blue = speech production samples.
Speech production data
Non-simultaneous syllable rhythm synchronization (Figure 1B). Participants first listened to rhyth-
mic syllable trains (/ta/ repeated for 10 s). They were instructed to then whisper /ta/ with the
same pace as the previous stimulus for 10 s (i.e., without a concurrent audio stimulus). With
this explicit instruction and whispering articulation requirement, we aimed to investigate
intentional speech rhythm production guided by internal rhythmic cues, minimizing self-
auditory feedback. The procedure was repeated twice for each rate condition. Two runs were
conducted in anticipation of possible practice effects. However, data from the two
runs were highly correlated (r2.5 = 0.75, r3.5 = 0.88, r4.5 = 0.80, r5.5 = 0.91, r6.5 = 0.86, r7.5 =
0.77, r8.5 = 0.82, p < 0.001), indicating good test-retest repeatability. Moreover, paired t tests
further confirmed the two runs did not differ at any of the rates ( p2.5 = 0.85, p3.5 = 0.66, p4.5 =
0.22, p5.5 = 0.17, p6.5 = 0.23, p7.5 = 0.94, p8.5 = 0.17).
Simultaneous syllable rhythm synchronization (Figure 1C). We adapted the SSS test (Assaneo et al.,
2019) to measure cued motor speech to auditory synchronization. Participants were instructed to
continuously whisper /ta/ while concurrently listening to a rhythmic syllable stream for 60 s. By
employing whispered speech and insert earphones, we aimed to prevent participants from using
their own production sounds as auditory feedback to their speech output. After each trial, listeners
indicated whether a target syllable was presented in the previous stream. Four target syllables were
randomly chosen from a pool of eight (50% were from the syllable stream). Importantly, we did not
explicitly instruct participants to synchronize to the external audio rhythm, and we also omitted the training session.
In previous studies using the SSS, listeners first heard a fixed syllable rate at 4.5 Hz presented audi-
torily (Assaneo et al., 2019). This may have primed them to produce syllables at the same pace,
leading to an artificial increase in performance at 4.5 Hz. Participants were informed the goal was to
correctly identify the target syllable and that the speech they heard was only to increase task diffi-
culty. The purpose of this behavioral task was to prevent participants from intentionally matching
their speech to the aural inputs by directing their attention to the syllable identification task.
Data Analysis: Quantifying Synchronization and Rate Accuracy
We performed analyses using custom scripts written in MATLAB and used TF32 software to
examine the rate of acoustic signals (Milenkovic, 2002).
Phase-locking value
We measured brain-to-stimulus synchronization (and similarly speech-to-stimulus synchroni-
zation) as a function of frequency via the phase-locking value (PLV; Lachaux et al., 1999).
Neural and auditory signals were bandpass filtered (±0.5 Hz) around each frequency bin
from 1 to 12 Hz (0.5 Hz steps).
The envelope was calculated as the absolute value of the signal’s Hilbert transform. PLV was
then computed in each narrow frequency band according to Equation 1.
$$\mathrm{PLV} = \left|\frac{1}{T}\sum_{t=1}^{T} e^{\,i\,[\theta_1(t)-\theta_2(t)]}\right| \tag{1}$$
where θ1(t) and θ2(t) are the Hilbert phases of the EEG and stimulus signals, respectively. Intuitively,
PLV describes the consistency in the phase difference (and hence the correspondence) between
the two signals over time. PLV ranges from 0–1, where 0 represents no (random) phase synchrony
and 1 reflects perfect phase synchrony between signals. The PLV was computed for windows of
6 s length and averaged within each rate condition. Repeating this procedure across frequencies
(1–12 Hz; 0.5 Hz steps) resulted in a continuous function of PLV describing the degree of brain-to-
speech synchronization across the bandwidth of interest (e.g., Assaneo et al., 2019). PLVs were then
baselined in the frequency domain by subtracting the value of the first (i.e., 1 Hz) frequency
bin, centering each function at 0. This allowed us to evaluate the relative change in stimulus-evoked
PLV above the noise floor of the metric. We then measured the peak magnitude from each PLV
function to trace changes in brain-to-speech synchronization with increasing syllable rate.
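To make the analysis concrete, the following MATLAB sketch implements Equation 1 across the 1–12 Hz bins described above (hypothetical variable names: x = EEG trace, s = stimulus envelope, equal length at a common sampling rate; filter orders are illustrative rather than taken from the authors' scripts):

```matlab
% Sketch of the narrowband PLV analysis (Eq. 1).
fs    = 1000;
freqs = 1:0.5:12;                            % analysis bins (Hz)
plv   = zeros(size(freqs));
for k = 1:numel(freqs)
    band = [freqs(k)-0.5, freqs(k)+0.5] / (fs/2);   % +/-0.5 Hz around each bin
    [b, a] = butter(2, band);                % narrow bandpass (order is illustrative)
    th1 = angle(hilbert(filtfilt(b, a, x))); % Hilbert phase of the EEG
    th2 = angle(hilbert(filtfilt(b, a, s))); % Hilbert phase of the stimulus
    plv(k) = abs(mean(exp(1i*(th1 - th2)))); % Eq. (1): consistency of phase difference
end
plv = plv - plv(1);                          % baseline to the 1 Hz bin
[pkPLV, pkIdx] = max(plv);                   % peak PLV traces rate effects
```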
For speech production-to-stimulus synchronization (which are both acoustic signals), we
processed the recordings using the speech modulation procedure described by Tilsen and
Johnson (2008). We first discarded the first/last 5 s of each recording to avoid onset/offset artifacts
and then normalized the amplitude. We then bandpass filtered the signal (3000–4000 Hz; 4th
order Butterworth) to highlight the voiceless whispered energy followed by half-wave rectification
to extract the speech envelope. We then lowpass filtered (fc = 30 Hz), downsampled (Fs = 80 Hz),
windowed (Tukey window), and de-meaned the envelope-modulated signal to isolate slower
speech rhythms. As in the brain-to-stimulus synchronization analysis, we then measured PLV
between the acoustic productions and speech stimulus for each rate.
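A hedged MATLAB sketch of this envelope pipeline, assuming a column-vector recording `y` (hypothetical name) at 44100 Hz, is:

```matlab
% Sketch of the whispered-speech envelope extraction (parameters from the text).
Fs0 = 44100;
y   = y(5*Fs0+1 : end-5*Fs0);                % discard first/last 5 s of the recording
y   = y / max(abs(y));                       % amplitude normalization
[b, a] = butter(2, [3000 4000]/(Fs0/2));     % 4th-order bandpass (order doubles)
env = max(filtfilt(b, a, y), 0);             % half-wave rectification
[bl, al] = butter(4, 30/(Fs0/2));            % lowpass at fc = 30 Hz (order illustrative)
env = filtfilt(bl, al, env);
env = resample(env, 80, Fs0);                % downsample to Fs = 80 Hz
env = env .* tukeywin(numel(env));           % Tukey window
env = env - mean(env);                       % de-mean to isolate slow speech rhythms
```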
Speech rate
As an alternate approach to corroborate the automatic rate measures, we manually counted
syllables for each 10 s recording of participants’ non-simultaneous productions from wideband
spectrograms computed in TF32. Speech rate was calculated as the number of syllables per second;
onset and offset silences were not included in these calculations. Since the audio recordings of
implicit speech rate productions were 60 s each, we further validated the reliability of syllable
counting by applying an automatic peak finding algorithm. Again, the first/last 5 s were discarded
to avoid transient onset/offset effects. We then extracted the Hilbert envelope and smoothed the
signal using a 30 ms moving average. The amplitude was normalized before and after envelope
extraction. Lastly, we employed MATLAB’s ‘findpeaks’ function (‘MinPeakHeight’ = 0.08, ‘MinPeak-
Prominence’ = 0.01, ‘MinPeakDistance’ = 117 ms) to automatically detect and measure syllable
peaks. Visual inspection and auditory playback were used to determine these optimal parameters.
The speech rate calculated from the spectrogram and peak finding algorithm were highly corre-
lated (r = 0.95; p < 0.0001) confirming the reliability of the automatic analysis approach.
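For concreteness, the peak-counting procedure might be sketched in MATLAB as follows (parameters from the text; the recording variable `y` is a hypothetical placeholder):

```matlab
% Sketch of the automatic syllable-peak counting for a 60 s recording.
fs  = 44100;
y   = y(5*fs+1 : end-5*fs);                  % drop first/last 5 s (onset/offset effects)
y   = y / max(abs(y));                       % normalize before envelope extraction
env = abs(hilbert(y));                       % Hilbert envelope
env = movmean(env, round(0.030*fs));         % 30 ms moving-average smoothing
env = env / max(env);                        % re-normalize after extraction
[pks, locs] = findpeaks(env, ...
    'MinPeakHeight', 0.08, ...
    'MinPeakProminence', 0.01, ...
    'MinPeakDistance', round(0.117*fs));     % 117 ms converted to samples
rate = numel(pks) / (numel(y)/fs);           % syllables per second
```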
Statistical Analysis
Unless noted otherwise, we analyzed the data using one-way, mixed-model analyses of var-
iance (ANOVAs) in R (Version 1.3.1073; ‘lme4’ package; Bates et al., 2015) with rate (7 levels;
2.5–8.5 Hz) as a categorical fixed effect and subjects as a random factor (e.g., PLV ∼ rate +
(1|subject)) to assess whether the brain-to-stimulus and speech-to-stimulus synchrony differed
across syllable rate. The Tukey post hoc test for multiple comparisons was used. Moreover, to
test whether PLV at 4.5 Hz is enhanced, following the omnibus ANOVA, we used an a priori
contrast to compare neural PLV at 4.5 Hz versus other syllable rates. For production data, we
tested whether participants’ produced rate achieved the target syllable rate using one-sample
t tests and Wilcoxon signed-rank tests for the simultaneous (implicit) and non-simultaneous
(explicit) rate production tasks, respectively. Significance in these tests indicates a participant’s
production speed deviated from (e.g., was slower/faster than) the nominal stimulus rate. To
assess brain-behavior associations, we first used Pearson’s correlations to test the across
individual association after aggregating across rates between neural and production PLV.
We then used repeated measures correlations (rmCorr; Bakdash & Marusich, 2017) to assess
within-subject relations between neural and acoustic synchrony measures. Unlike conven-
tional correlations, rmCorr accounts for non-independence among each listener’s observa-
tions and measures within-subject correlations by evaluating the common intra-individual
association between two measures. Initial diagnostics (quantile–quantile plot and residual
plots) were used to verify normality and homogeneity assumptions. Consequently, PLV mea-
sures were square-root transformed to allow for parametric ANOVAs. Behavioral data from
the EEG task (i.e., percentage of correctly perceived syllables) were rationalized arcsine
transformed (Studebaker, 1985). A priori significance level was set at α = 0.05. Effect sizes
are reported as η²p.
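The models were fit in R with lme4; for readers working in MATLAB, an equivalent random-intercept model can be sketched with fitlme (a hedged analogue, not the authors' analysis script; variable names are hypothetical):

```matlab
% Sketch: MATLAB analogue of the R model PLV ~ rate + (1|subject).
% `plv`, `rate`, `subj` are hypothetical column variables (rate categorical, 7 levels).
tbl = table(sqrt(plv), categorical(rate), categorical(subj), ...
            'VariableNames', {'PLV', 'rate', 'subject'});  % sqrt transform per text
lme = fitlme(tbl, 'PLV ~ rate + (1|subject)');             % random intercept per subject
anova(lme)                                                 % omnibus test of the rate effect
```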
RESULTS
Cortical Oscillation Synchrony Is Enhanced at ∼4.5 Hz Syllable Rate
The percentage of correctly perceived syllables during EEG recordings showed no significant
difference (F6,90 = 1.76, p = 0.1162, η²p = 0.11) across conditions, confirming participants were
equally engaged in the listening task across rates. We evaluated neural-speech PLV (Figure 2)
equally engaged in the listening task across rates. We evaluated neural-speech PLV (Figure 2)
to assess how ongoing brain activity synchronized to speech (Assaneo & Poeppel, 2018) over
Figure 2. Phase-locked neural oscillations synchronize to the rate of the syllable envelope. The phase-locking value (PLV) increment from
baseline between neuroelectric activity and the stimulus envelope across frequency is enhanced at 4.5 Hz. Note the peak in the PLV close to
the nominal syllable rate as well as higher harmonics. Similar harmonics were observed in the spectra of the acoustic stimulus envelopes,
owing to the non-sinusoidal nature of speech waveforms. The bottom right panel represents the distribution of peak PLV across participants as a
function of stimulus syllable rate. Shading = ±1 standard error of the mean.
an expanded range of ecologically valid syllable rates (2.5–8.5 Hz) characteristic of most
languages (Ding et al., 2017; Poeppel & Assaneo, 2020). Each PLV plot shows a strong peak at
the fundamental frequency surrounding the rate of the stimulus as well as additional peaks at
harmonic frequencies. Harmonic energy was also present in the acoustic stimuli. An ANOVA
conducted on neural PLV revealed a main effect of syllable rate (F6,90 = 3.76, p = 0.0022,
η²p = 0.2). An a priori contrast showed that PLV was stronger for 4.5 Hz compared to all other rates
( p = 0.026). Interestingly, 4.5 Hz corresponds with the mean syllable rate in English (Goswami
& Leong, 2013; Greenberg et al., 2003) as well as most other languages (Ding et al., 2017;
Varnet et al., 2017). Our results reinforce the notion that neural oscillations synchronize to the
speech envelope and are modulated by syllable rate. More critically, we observed an enhance-
ment of PLV at the frequency close to the predominant syllable rhythm (4.5 syll/s) inherent to
most languages, suggesting a preferred rate of neural oscillation coherent with listeners’ long-
term listening experience.
Spontaneous Speech Synchronization Is Restricted to Slower Rates
We next examined whether listeners’ cued speech productions were synchronized to the
simultaneous audio track at various syllable rates (Figure 3). Speech-to-stimulus PLVs showed
selective peaks at the audio speech rhythm that declined with increasing rate above ∼6.5 Hz
(main effect of syllable rate: F6,90 = 14.355, p < 0.0001, η²p = 0.49). Post hoc analysis revealed
stronger PLV for slower (2.5–4.5 Hz) versus faster (5.5–8.5 Hz) rates (all p values < 0.05).
These results suggest that participants can only synchronize their speech productions to
relatively slow syllable rates (i.e., motor performance is akin to a lowpass filter).
Correspondence Between Syllable Perception and Production
To explore the link between syllable rhythm entrainment in perception and production, we
measured participants’ accuracy for producing target syllable rates under the two experimental
settings: one following an explicit instruction to replicate a previously heard rhythm (non-
simultaneous/un-cued productions) and the other with an implicit instruction to mirror a concur-
rently presented syllable train (simultaneous/cued production). One sample t tests showed that
Figure 3. Simultaneous speech synchronization to syllable trains is modulated by rate. The phase-locking value (PLV) increment against
baseline was computed between acoustic stimuli and listeners’ speech productions. Note the performance optimizes at slower (2.5–4.5 Hz)
compared with higher rates (5.5–8.5 Hz). The bottom right panel represents the distribution of peak PLV across participants as a function of
stimulus syllable rate. Shading = ±1 standard error of the mean.
for non-simultaneously produced syllable rate (NSR; Figure 4A), participants only hit target
rates at 4.5 and 5.5 syll/s (4.5 Hz: t(14) = −1.49, p = 0.16; 5.5 Hz: t(14) = −1.74, p = 0.10).
However, the variability in productions also appeared to differ across rates. Indeed, measuring
the mean absolute deviation of responses, we found smaller variability in productions at rates
of 2.5 and 3.5 Hz versus 4.5 and 5.5 Hz ( p = 0.003, one-way ANOVA). This suggests at least
part of the effect at 4.5–5.5 Hz in Figure 4A might be attributed to more/less precise produc-
tions across rates. Notably, productions deviated from (were slower than) the target speeds
above 6.5 Hz, indicating participants failed to keep pace with the audio stimulus. Simultaneously pro-
duced rate (SSR; Figure 4B) measures showed highly accurate reproductions for ∼2.5–4.5 Hz
(p2.5 = 0.46, p3.5 = 0.13, p4.5 = 0.26), with slowing of production at higher rates. The results of
SSR were consistent with the enhanced speech-to-stimulus PLV at 2.5–4.5 Hz (see Figure 3).
Figure 4. Participants’ produced speech rate compared to the target rate of auditory stimuli. (A) Speech rate was produced after rhythmic
syllable trains were presented (non-simultaneous), with explicit instructions to duplicate the pace. (B) Participants produced syllables while
simultaneously listening to rhythmic streams with implicit rate synchronization. *p < 0.05, significant deviations from the expected rate
(red +) based on one-sample tests against the nominal (target) rate value. Shaded region = ±1 standard deviation (SD).
Figure 5. Correlations between brain and production synchronization to speech. (A) Pearson correlation (between-subjects) aggregating
across rate conditions between neural and production PLV. (B) Repeated measures correlations (within-subjects) between neural and
production PLV. PLV_EEG = neural-to-stimulus PLV; PLV_pro = speech-to-stimulus PLV. Dots = individual participants’ responses; solid lines =
within-subject fits to each individual’s data across the seven rates; dashed line = linear fit across the aggregate sample. *p < 0.05, **p < 0.01,
***p < 0.001.
Brain-Behavior Correlations Between Production and Neural Speech Entrainment Accuracy
To explore the relationship between auditory and motor (production) responses, we conducted
between- and within-subject correlations. Figure 5A suggests a non-significant relation
between neural and production PLV when the data are considered on the whole, without
respect to each individual. Indeed, rmCorr correlations assessing within-subject correspon-
dence revealed a positive correlation between neural and speech PLV (r = 0.25, p = 0.019,
Figure 5B), indicating an auditory-motor relation in rhythmic synchronization abilities at the
individual level.
DISCUSSION
By measuring EEG oscillations and acoustical speech productions in response to syllable
trains presented at various rates, the current study evaluated syllable rate-dependencies
in auditory neural entrainment and simultaneous speech synchronization, and possible
dynamic relations between these domains. We first confirmed that auditory brain activity
robustly synchronizes to the ongoing speech envelope and flexibly adapts to the speed of syl-
lable trains in a rate-dependent manner (Assaneo & Poeppel, 2018; Ding et al., 2016; Rimmele
et al., 2021; Will & Berg, 2007). More interestingly, we found that neuroacoustic phase locking
was boosted at rates of ∼4.5 Hz, corresponding to the putative dominant syllable rate
observed across languages (Ding et al., 2017). Production data showed that simultaneous
speech synchronization to audio rhythms was largely restricted to slower syllable rates
(2.5–4.5 Hz). In contrast, and converging with the neural data, we found “pure” motor
productions were more accurate; participants more precisely matched syllable
rates between 4–5 syll/s even without concurrent auditory cuing. Lastly, correlations
between brain and production PLV data extend prior work (Assaneo et al., 2019; Assaneo
& Poeppel, 2018) by explicitly linking auditory and motor entrainment skills. We found that
individuals with superior auditory entrainment to speech also show enhanced motor speech
capabilities in speech-audio synchronization.
Cortical Oscillation Synchrony Is Modulated by the Heard Syllable Rates
Corroborating previous magnetoencephalography (MEG)/EEG studies (Assaneo & Poeppel,
2018; Ding et al., 2016; Keitel et al., 2018; Teng et al., 2017), our data reveal that low fre-
quency neural oscillatory signals (2.5–8.5 Hz) robustly phase lock and closely mirror the rate
of auditorily presented speech. Neuroacoustic phase locking did diminish with increasing rate,
consistent with previous findings showing cortical activity fails to synchronize with the enve-
lope of accelerated speech (Ahissar et al., 2001; Nourski et al., 2009). However, entrainment
remained above the noise floor even for the fastest syllable rate (8.5 Hz). Accurate neural
entrainment to a larger range of frequencies, some of which are well beyond the regular
speeds of intelligible speech (Adams & Moore, 2009; Momtaz et al., 2021; Viemeister, 1979),
is perhaps not surprising given the ease with which the auditory system tags temporal acoustic
landmarks of speech and non-speech signals (Doelling et al., 2014; Luo & Ding, 2020;
Momtaz et al., 2021; Viemeister, 1979). In order to cope with the varying timescales of
temporal patterns in speech, neuronal processing must demonstrate rate flexibility (Saltzman
& Munhall, 1989; van Lieshout, 2004). Indeed, neural entrainment to external rhythmicity
helps ensure proper signal detection (Besle et al., 2011; Stefanics et al., 2010) and facilitates
speech comprehension (Doelling et al., 2014; Giraud & Poeppel, 2012; Luo & Poeppel, 2007).
One hypothesis of these phenomena is that continuous speech is discretized and segmented
on multiscale temporal analysis windows formed by cortical oscillation locking to the input
speech rhythm (Ghitza, 2011, 2012, 2014; Giraud & Poeppel, 2012). Our data support these
general notions that low-frequency activity of auditory cortex flexibly tracks the speed of the
speech envelope via phase synchronization of cortical activity.
Interestingly, cortical responses also showed enhanced phase locking for speech rates prox-
imal to 4.5 Hz. Notably, we observed a bell-shaped rate-dependence with the maximum gain
in neural phase locking near 4.5 Hz, which aligns with the dominant spectral profile of sylla-
ble rates across languages (Ding et al., 2017). This finding suggests that neural excitability is
adjusted to align with the acoustic temporal structure of speech such that neural oscillations are
tuned to track the acoustic proclivities of natural languages. This is probably coherent with
listeners’ long-term listening and speaking experience with the dominant speech rhythms in
their language. This supports the notion that neural oscillations coding speech reflect an inter-
play of input processing and output generation in which the associated neural activities are
shaped over time by the statistical structure of speech (Poeppel, 2003).
Simultaneous Speech-Audio Synchronization Is Rate Restricted
Paralleling our brain-audio synchronization data, we further asked whether simultaneous
speech-audio synchronization is affected by syllable rates from 2.5–8.5 syll/s. Importantly,
we did not explicitly instruct participants to match the audio rate nor did we provide practice
on the task, which we speculate can lead to priming effects and apparent enhancements in
synchronization at certain rates (cf. Assaneo et al., 2019). The resulting production data dem-
onstrate that participants’ rhythmic speech output does not uniformly synchronize across rates
but is instead severely restricted to slower frequencies from 2.5 to 4.5 Hz. Because the simul-
taneous production task implicitly instructed listeners to align their self-speech production to
heard audio, it necessarily evoked sensorimotor integration. The fact that such productions are
limited to low rates is consistent with neuroimaging results indicating selective coupling between
auditory and motor cortices between 2.5 and 4.5 Hz (Assaneo & Poeppel, 2018). Moreover,
the lack of entrainment at higher frequencies as observed in our EEG data perhaps suggests the
sensorimotor effects of producing while also listening to speech might create a mixture of
entrained brain processes that interfere with or are at least distinct from one another. The shift
to slower rate preferences in motor speech synchronization also seems reasonable given the
risk of articulatory undershooting when speaking fast (Gay et al., 1974), and the fact that speed
of articulation is constrained by the biomechanical limits of the articulators. Alternatively, this
rate restriction could result from the oscillator tuning of the motor system, whereby it involuntarily
entrains to (i.e., resonates with) the auditory stimuli when rates are close to its intrinsic rhythm.
It is conceivable that auditory-motor interaction has adapted its sensitivity to both forms of
natural constraints imposed by the articulatory and motor systems.
Neurophysiologically, this lowpass filter shape could also result if motor responses are
dominated by lower-frequency rhythms of the brain. Indeed, delta (0.5–4 Hz) oscillations
are thought to reflect endogenous rhythms from primary motor cortex (Keitel & Gross,
2016; Morillon et al., 2019), which can emerge in the absence of acoustic stimulation (Ding
et al., 2016; Rimmele et al., 2021). Another possible explanation lies in the cognitive
demands of this task, which imposes a heavier cognitive load (Zhou et al., 2018) and requires
extra neurocomputational time to match the motor program with the auditory input. Higher
task demands would tend to result in successful synchronization only at the easiest (slowest)
rate conditions. Low-frequency components of the EEG have been linked to cognitive opera-
tions such as sustained attention and working memory (Bidelman et al., 2021; Kirmizi-Alsan
et al., 2006). However, this explanation seems speculative since we could not explicitly
measure brain oscillations during the production tasks. Instead, the lowpass nature of the
simultaneous production data seems parsimoniously described in terms of limits to sensori-
motor processing, with more severe constraints imposed by the motor component.
Non-Simultaneous Productions Highlight an Intrinsic Rhythm at 4–5 Hz
Under an oscillatory framework, different aspects of spoken communication arise from neural
oscillations that are accessible for both perception and production. Such oscillations could
emerge in the context of input processing and output generation and result in the associated
auditory and motor activities that would reflect the structure of speech (Giraud et al., 2007;
Giraud & Poeppel, 2012; Liberman & Whalen, 2000).
A second aspect of our study design examined natural speech rate productions via non-
simultaneous productions. Some conditions were quite challenging given the rapid production
speeds required of the task. This paradigm provided listeners with minimal auditory feedback
and thus, better isolated more pure motor system responses during speech output. Without
concurrent auditory feedback either from their own speech or external stimuli, possible inter-
ference confounds from sound-evoked auditory oscillations mentioned earlier are minimized.
Surprisingly, we found participants’ productions under these conditions hit target speeds
(statistically speaking) only for rates of 4.5 and 5.5 syll/s. Productions failed to meet targets
(i.e., were slower than the nominal rates) at all lower and higher syllable speeds. However,
we also note production variability differed as speeds increased (Figure 4A). While we inter-
pret the non-simultaneous data to reflect motor speech function during limited auditory
involvement, an alternate interpretation might be the more explicit instruction of rate imitation.
Nevertheless, those findings align with our EEG results on auditory entrainment, which simi-
larly showed maximum synchronization at 4.5 Hz and flexibility with a wide range of speech
rates. This frequency specialization in both the speech perception and production data is sug-
gestive of a resonance of intrinsic neural oscillations representing syllable rhythm (Assaneo &
Poeppel, 2018; Luo & Poeppel, 2007; Poeppel & Assaneo, 2020).
The notion of an intrinsic 4–5 Hz rhythm receives further support from several other obser-
vations: the predominant peak in speech envelope spectra for many languages and speaking
conditions (Ding et al., 2017; Goswami & Leong, 2013); the mean syllable duration in English
(∼200 ms; Greenberg et al., 2003; Pellegrino et al., 2011); the coordinated articulation or
motor gesture trajectory in sound production (Poeppel & Assaneo, 2020); movement of the
lips, tongue, and hyoid with a 5 Hz rhythm during lip-smacking in monkeys (Ghazanfar
et al., 2012). Neurologically, continuous speech is processed through a temporal integration
window of ∼200 ms (period of 4–5 Hz; Luo & Poeppel, 2007). Studies using transcranial alter-
nating current stimulation further show that 5 Hz stimulation enhances cortical entrainment
and results in better sentence comprehension (Wilsch et al., 2018). The striking coherence
between these divergent methodologies, along with the present data, supports the notion of
an intrinsic rhythm at ∼4–5 Hz, a computational primitive in cortical speech processing that
also seems to link input and output processing.
Differences and Limitations to Related Studies
Our stimulus paradigm was adapted from previous neuroimaging studies on neural entrain-
ment to speech rhythm (e.g., Assaneo et al., 2019; Assaneo & Poeppel, 2018). However, there
are several distinct aspects of the findings presented here. First, our cortical tracking data
revealed stronger brain-to-speech phase synchronization at 4.5 syll/s, which contrasts
with previous reports suggesting auditory cortex is invariant in syllable tracking across rates
(Assaneo & Poeppel, 2018). Although listening to rhythmic sounds induces motor cortex activity
(Bengtsson et al., 2009; Wilson et al., 2004), our single channel EEG recordings do not allow
us to localize our effects to auditory versus motor cortex generators, per se. In this regard,
high-density neural recordings (Assaneo & Poeppel, 2018) revealed enhanced intracranial
coupling of speech-evoked oscillations between auditory and motor cortices specifically at
4.5 Hz. It is possible then that the gain in cortical phase locking at 4.5 Hz observed in our
data reflects neural entrainment in motor-related regions (Assaneo & Poeppel, 2018). Accord-
ingly, other neuroimaging studies have shown that oscillation power in motor areas modulates
auditory cortex tracking of acoustic dynamics to facilitate comprehension (Keitel et al., 2017,
2018). Given that the scalp EEG reflects a mixture of intracranial sources, the effects we
observe here probably reflect a mixture of entrained oscillations in auditory and motor cortex
as suggested by previous MEG studies (Bengtsson et al., 2009; Wilson et al., 2004). Multi-
channel EEG recordings with source reconstruction analysis could test this hypothesis in
future studies. Privileged recruitment of motor brain regions induced by concurrent auditory
entrainment may account for the local enhancements in PLV we observe near 4.5 Hz in
both our EEG and production data.
Second, we observed a more complex syllable rate-constrained pattern in speech-audio
responses (simultaneous productions) but a preferred syllable rhythm for isolated motor syn-
chronization (non-simultaneous productions). To our knowledge, these novel findings have
not been observed previously and are only revealed by comparing speech productions with
varying degrees of sensory and motor involvement. By explicitly examining multiple modes of
production and tasks which tease apart sensory from motor processes, our data establish a link
between exogenous and endogenous speech entrainment mechanisms and further reveal
unique specialization at 4–5 Hz in both the auditory and motor modalities. These parallel
effects likely trace back to the long-term experience of the listener and dominant syllable rates
for input processing and output production. In contrast, with concurrent auditory inputs, the
rate-restricted pattern could emerge from the tuning of the motor oscillator and its interaction with
the sensory system. Future studies are also needed to test whether this oscillator tuning is
mediated by the better versus worse synchronization performance. It is possible the bimodal
distribution in speech-rate synchronization observed in prior work (Assaneo et al., 2019) is
apparent only with a very large number of participants or with those with more heterogeneous
backgrounds.
In conclusion, our data establish a positive speech perception-production link for rate syn-
chronization. Both perceptual and motor entrainment for speech processing seem optimized
for rates between 4 and 5 Hz, the putative nominal speech rate across languages. Still, these
links are only identifiable when carefully considering the nature of speech production and
tasks that isolate motor from sensorimotor processes. Moreover, we find synchronization skills
are subject to individual differences, with performance in the perceptual domain predicting
skills in the motor domain and vice versa. As such, our findings provide support for theoretical
notions of an oscillation-based account of speech processing which organizes both input
and output domains of speech processing.
ACKNOWLEDGMENTS
The authors thank Dr. Bashir Morshed for supplying code for the denoising algorithm, and
Dr. M. Florencia Assaneo and Dr. David Poeppel for providing the code for the spontaneous
speech synchronization test.
FUNDING INFORMATION
Gavin M. Bidelman, National Institute on Deafness and Other Communication Disorders
(https://dx.doi.org/10.13039/100000055), Award ID: R01DC016267.
AUTHOR CONTRIBUTIONS
Deling He: Conceptualization; Formal analysis; Investigation; Methodology; Visualization;
Writing – original draft; Writing – review & editing. Gavin M. Bidelman: Conceptualization;
Formal analysis; Funding acquisition; Methodology; Supervision; Writing – original draft;
Writing – review & editing. Eugene H. Buder: Conceptualization; Writing – review & editing.
DATA AVAILABILITY
The data that support the findings of this study are available on request from Gavin M.
Bidelman (gbidel@indiana.edu). The data are not publicly available because of privacy/ethical
restrictions.
REFERENCES
Adams, E. M., & Moore, R. E. (2009). Effects of speech rate, back-
ground noise, and simulated hearing loss on speech rate judgment
and speech intelligibility in young listeners. Journal of the American
Academy of Audiology, 20(1), 28–39. https://doi.org/10.3766/jaaa
.20.1.3, PubMed: 19927680
Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke,
H., & Merzenich, M. M. (2001). Speech comprehension is corre-
lated with temporal response patterns recorded from auditory
cortex. Proceedings of the National Academy of Sciences,
98(23), 13367–13372. https://doi.org/10.1073/pnas.201400998,
PubMed: 11698688
Assaneo, M. F., & Poeppel, D. (2018). The coupling between audi-
tory and motor cortices is rate-restricted: Evidence for an intrinsic
speech-motor rhythm. Science Advances, 4(2), Article eaao3842.
https://doi.org/10.1126/sciadv.aao3842, PubMed: 29441362
Assaneo, M. F., Rimmele, J. M., Sanz Perl, Y., & Poeppel, D.
(2021). Speaking rhythmically can shape hearing. Nature Human
Behaviour, 5(1), 71–82. https://doi.org/10.1038/s41562-020
-00962-0, PubMed: 33046860
Assaneo, M. F., Ripollés, P., Orpella, J., Lin, W. M., de Diego-
Balaguer, R., & Poeppel, D. (2019). Spontaneous synchroniza-
tion to speech reveals neural mechanisms facilitating language
learning. Nature Neuroscience, 22(4), 627–632. https://doi.org
/10.1038/s41593-019-0353-z, PubMed: 30833700
Bakdash, J. Z., & Marusich, L. R. (2017). Repeated measures corre-
lation. Frontiers in Psychology, 8, Article 456. https://doi.org/10
.3389/fpsyg.2017.00456, PubMed: 28439244
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear
mixed-effects models using lme4. Journal of Statistical Software,
67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bengtsson, S. L., Ullén, F., Ehrsson, H. H., Hashimoto, T., Kito, T.,
Naito, E., Forssberg, H., & Sadato, N. (2009). Listening to
rhythms activates motor and premotor cortices. Cortex, 45(1),
62–71. https://doi.org/10.1016/j.cortex.2008.07.002, PubMed:
19041965
Besle, J., Schevon, C. A., Mehta, A. D., Lakatos, P., Goodman, R. R.,
McKhann, G. M., Emerson, R. G., & Schroeder, C. E. (2011).
Tuning of the human neocortex to the temporal dynamics of
attended events. Journal of Neuroscience, 31(9), 3176–3185.
https://doi.org/10.1523/JNEUROSCI.4518-10.2011, PubMed:
21368029
Bidelman, G. M., Brown, J. A., & Bashivan, P. (2021). Auditory cortex
supports verbal working memory capacity. NeuroReport, 32(2),
163–168. https://doi.org/10.1097/WNR.0000000000001570,
PubMed: 33323838
Bidelman, G. M., Moreno, S., & Alain, C. (2013). Tracing the emer-
gence of categorical speech perception in the human auditory
system. NeuroImage, 79(1), 201–212. https://doi.org/10.1016/j
.neuroimage.2013.04.093, PubMed: 23648960
Blue Yeti. (2022). USB microphone [Equipment]. https://blueyeti.us
.com
Boersma, P., & Weenink, D. (2013). Praat: Doing phonetics by
computer ( Version 5.3.51) [Computer software]. https://www
.fon.hum.uva.nl/praat
Casas, A. S. H., Lajnef, T., Pascarella, A., Guiraud-Vinatea, H.,
Laaksonen, H., Bayle, D., Jerbi, K., & Boulenger, V. (2021).
Neural oscillations track natural but not artificial fast speech: Novel
insights from speech-brain coupling using MEG. NeuroImage,
244, Article 118577. https://doi.org/10.1016/j.neuroimage.2021
.118577, PubMed: 34525395
Compumedics Neuroscan. (2022). SynAmps RT amplifier [Equip-
ment]. https://compumedicsneuroscan.com
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016).
Cortical tracking of hierarchical linguistic structures in connected
speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10
.1038/nn.4186, PubMed: 26642090
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D.
(2017). Temporal modulations in speech and music. Neurosci-
ence & Biobehavioral Reviews, 81(Pt B), 181–187. https://doi
.org/10.1016/j.neubiorev.2017.02.011, PubMed: 28212857
Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014).
Acoustic landmarks drive delta–theta oscillations to enable
speech comprehension by facilitating perceptual parsing. Neuro-
Image, 85(Pt 2), 761–768. https://doi.org/10.1016/j.neuroimage
.2013.06.035, PubMed: 23791839
Etymotic. (2023). Insert earphones (ER-2) [Equipment]. https://www
.etymotic.com/
Fridriksson, J., Hubbard, H. I., Hudspeth, S. G., Holland, A. L.,
Bonilha, L., Fromm, D., & Rorden, C. (2012). Speech entrain-
ment enables patients with Broca’s aphasia to produce fluent
speech. Brain, 135(12), 3815–3829. https://doi.org/10.1093
/brain/aws301, PubMed: 23250889
FromTextToSpeech.com. (n.d.). [Online software]. https://www
.fromtexttospeech.com
Gay, T., Ushijima, T., Hiroset, H., & Cooper, F. S. (1974). Effect of
speaking rate on labial consonant-vowel articulation. Journal of
Phonetics, 2(1), 47–63. https://doi.org/10.1016/S0095-4470(19)
31176-3
Ghazanfar, A. A., Takahashi, D. Y., Mathur, N., & Fitch, W. T.
(2012). Cineradiography of monkey lip-smacking reveals puta-
tive precursors of speech dynamics. Current Biology, 22(13),
1176–1182. https://doi.org/10.1016/j.cub.2012.04.055,
PubMed: 22658603
Ghitza, O. (2011). Linking speech perception and neurophysiology:
Speech decoding guided by cascaded oscillators locked to the
input rhythm. Frontiers in Psychology, 2, Article 130. https://doi
.org/10.3389/fpsyg.2011.00130, PubMed: 21743809
Ghitza, O. (2012). On the role of theta-driven syllabic parsing in
decoding speech: Intelligibility of speech with a manipulated
modulation spectrum. Frontiers in Psychology, 3, Article 238.
https://doi.org/10.3389/fpsyg.2012.00238, PubMed: 22811672
Ghitza, O. (2014). Behavioral evidence for the role of cortical θ
oscillations in determining auditory channel capacity for speech.
Frontiers in Psychology, 5, Article 652. https://doi.org/10.3389
/fpsyg.2014.00652, PubMed: 25071631
Giraud, A.-L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak,
R. S., & Laufs, H. (2007). Endogenous cortical rhythms determine
cerebral specialization for speech perception and production.
Neuron, 56(6), 1127–1134. https://doi.org/10.1016/j.neuron
.2007.09.038, PubMed: 18093532
Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and
speech processing: Emerging computational principles and oper-
ations. Nature Neuroscience, 15(4), 511–517. https://doi.org/10
.1038/nn.3063, PubMed: 22426255
Goswami, U., & Leong, V. (2013). Speech rhythm and temporal
structure: Converging perspectives? Laboratory Phonology, 4(1),
67–92. https://doi.org/10.1515/lp-2013-0004
Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003).
Temporal properties of spontaneous speech—A syllable-centric
perspective. Journal of Phonetics, 31(3–4), 465–485. https://doi
.org/10.1016/j.wocn.2003.09.005
Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains
slow neural oscillations and optimizes human listening behavior.
Proceedings of the National Academy of Sciences, 109(49),
20095–20100. https://doi.org/10.1073/pnas.1213390109,
PubMed: 23151506
Hyafil, A., Fontolan, L., Kabdebon, C., Gutkin, B., & Giraud, A.-L.
(2015). Speech encoding by coupled cortical theta and gamma
oscillations. eLife, 4, Article e06213. https://doi.org/10.7554/eLife
.06213, PubMed: 26023831
Industrial Acoustics Company. (2023). Sound-attenuating booth
[Equipment]. https://www.iacacoustics.com
Keitel, A., & Gross, J. (2016). Individual human brain areas can be
identified from their characteristic spectral activation fingerprints.
PLOS Biology, 14(6), Article e1002498. https://doi.org/10.1371
/journal.pbio.1002498, PubMed: 27355236
Keitel, A., Gross, J., & Kayser, C. (2018). Perceptually relevant
speech tracking in auditory and motor cortex reflects distinct
linguistic features. PLOS Biology, 16(3), Article e2004473.
https://doi.org/10.1371/journal.pbio.2004473, PubMed:
29529019
Keitel, A., Ince, R. A. A., Gross, J., & Kayser, C. (2017). Auditory
cortical delta-entrainment interacts with oscillatory power in
multiple fronto-parietal networks. NeuroImage, 147, 32–42.
https://doi.org/10.1016/j.neuroimage.2016.11.062, PubMed:
27903440
Khatun, S., Mahajan, R., & Morshed, B. I. (2016). Comparative
study of wavelet-based unsupervised ocular artifact removal
techniques for single-channel EEG data. IEEE Journal of Transla-
tional Engineering in Health and Medicine, 4(1), Article
2000108. https://doi.org/10.1109/JTEHM.2016.2544298,
PubMed: 27551645
Kirmizi-Alsan, E., Bayraktaroglu, Z., Gurvit, H., Keskin, Y. H., Emre,
M., & Demiralp, T. (2006). Comparative analysis of event-related
potentials during Go/NoGo and CPT: Decomposition of electro-
physiological markers of response inhibition and sustained
attention. Brain Research, 1104(1), 114–128. https://doi.org/10
.1016/j.brainres.2006.03.010, PubMed: 16824492
Lachaux, J. P., Rodriguez, E., Martinerie, J., & Varela, F. J. (1999).
Measuring phase synchrony in brain signals. Human Brain Map-
ping, 8(4), 194–208. https://doi.org/10.1002/(SICI)1097-0193
(1999)8:4<194::AID-HBM4>3.0.CO;2-C, PubMed: 10619414
Liberman, A. M., & Whalen, D. H. (2000). On the relation of speech
to language. Trends in Cognitive Sciences, 4(5), 187–196. https://
doi.org/10.1016/S1364-6613(00)01471-6, PubMed: 10782105
Luo, C., & Ding, N. (2020). Cortical encoding of acoustic and
linguistic rhythms in spoken narratives. eLife, 9, Article e60433.
https://doi.org/10.7554/eLife.60433, PubMed: 33345775
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal
responses reliably discriminate speech in human auditory cortex.
Neuron, 54(6), 1001–1010. https://doi.org/10.1016/j.neuron
.2007.06.004, PubMed: 17582338
MathWorks. (2013). MATLAB 2013 [Computer software]. https://www
.mathworks.com/products/matlab.html
Milenkovic, P. (2002). TF32 [Computer software]. University of
Wisconsin–Madison.
Miyake, Y., Onishi, Y., & Pöppel, E. (2004). Two types of anticipa-
tion in synchronization tapping. Acta Neurobiologiae Experi-
mentalis, 64(3), 415–426. PubMed: 15283483
Momtaz, S., Moncrieff, D., & Bidelman, G. M. (2021). Dichotic
listening deficits in amblyaudia are characterized by aberrant
neural oscillations in auditory cortex. Clinical Neurophysiology,
132(9), 2152–2162. https://doi.org/10.1016/j.clinph.2021.04
.022, PubMed: 34284251
Morillon, B., Arnal, L. H., Schroeder, C. E., & Keitel, A. (2019).
Prominence of delta oscillatory rhythms in the motor cortex
and their relevance for auditory and speech perception. Neuro-
science & Biobehavioral Reviews, 107, 136–142. https://doi.org
/10.1016/j.neubiorev.2019.09.012, PubMed: 31518638
Nourski, K. V., Reale, R. A., Oya, H., Kawasaki, H., Kovach, C. K.,
Chen, H., Howard, M. A., III, & Brugge, J. F. (2009). Temporal
envelope of time-compressed speech represented in the human
auditory cortex. Journal of Neuroscience, 29(49), 15564–15574.
https://doi.org/10.1523/JNEUROSCI.3065-09.2009, PubMed:
20007480
Oldfield, R. C. (1971). The assessment and analysis of handedness:
The Edinburgh inventory. Neuropsychologia, 9(1), 97–113.
https://doi.org/10.1016/0028-3932(71)90067-4, PubMed:
5146491
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked
responses to speech in human auditory cortex are enhanced
during comprehension. Cerebral Cortex, 23(6), 1378–1387.
https://doi.org/10.1093/cercor/bhs118, PubMed: 22610394
Pellegrino, F., Coupé, C., & Marsico, E. (2011). A cross-language
perspective on speech information rate. Language, 87(3),
539–558. https://doi.org/10.1353/lan.2011.0057
Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M.,
Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intrace-
rebral sources of human auditory-evoked potentials. Audiology &
Neuro-otology, 4(2), 64–79. https://doi.org/10.1159/000013823,
PubMed: 9892757
Poeppel, D. (2003). The analysis of speech in different temporal
integration windows: Cerebral lateralization as “asymmetric
sampling in time.” Speech Communication, 41(1), 245–255.
https://doi.org/10.1016/S0167-6393(02)00107-3
Poeppel, D., & Assaneo, M. F. (2020). Speech rhythms and their
neural foundations. Nature Reviews Neuroscience, 21(6),
322–334. https://doi.org/10.1038/s41583-020-0304-4, PubMed:
32376899
Pressing, J., & Jolley-Rogers, G. (1997). Spectral properties of
human cognition and skill. Biological Cybernetics, 76(5),
339–347. https://doi.org/10.1007/s004220050347, PubMed:
9237359
Repp, B. H. (2005). Sensorimotor synchronization: A review of
the tapping literature. Psychonomic Bulletin & Review, 12(6),
969–992. https://doi.org/10.3758/BF03206433, PubMed:
16615317
Rimmele, J. M., Poeppel, D., & Ghitza, O. (2021). Acoustically
driven cortical δ oscillations underpin prosodic chunking.
eNeuro, 8(4), Article ENEURO.0562-20.2021. https://doi.org/10
.1523/ENEURO.0562-20.2021, PubMed: 34083380
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to
gestural patterning in speech production. Ecological Psychology,
1(4), 333–382. https://doi.org/10.1207/s15326969eco0104_2
Stefanics, G., Hangya, B., Hernádi, I., Winkler, I., Lakatos, P., &
Ulbert, I. (2010). Phase entrainment of human delta oscillations
can mediate the effects of expectation on reaction speed. Journal
of Neuroscience, 30(41), 13578–13585. https://doi.org/10.1523
/JNEUROSCI.0703-10.2010, PubMed: 20943899
Studebaker, G. A. (1985). A “rationalized” arcsine transform. Jour-
nal of Speech, Language, and Hearing Research, 28(3), 455–462.
https://doi.org/10.1044/jshr.2803.455, PubMed: 4046587
Teng, X., Tian, X., Rowland, J., & Poeppel, D. (2017). Concurrent
temporal channels for auditory processing: Oscillatory neural
entrainment reveals segregation of function at different scales.
PLOS Biology, 15(11), Article e2000812. https://doi.org/10
.1371/journal.pbio.2000812, PubMed: 29095816
Thors, H. (2019). Speech entrainment to improve spontaneous
speech in Broca’s aphasia [Doctoral dissertation, Norman J.
Arnold School of Public Health, University of South Carolina].
University of South Carolina Scholar Commons Theses and
Dissertations. https://scholarcommons.sc.edu/etd/5454
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of
speech rhythm. Journal of the Acoustical Society of America,
124(2), EL34–EL39. https://doi.org/10.1121/1.2947626,
PubMed: 18681499
Tucker-Davis Technologies. (2022). Signal processing interface
(RP2) [Equipment]. https://www.tdt.com
van Lieshout, P. H. H. M. (2004). Dynamical systems theory and its
application in speech. In B. Maassen, R. D. Kent, H. F. M. Peters,
P. H. H. M. van Lieshout, & W. Hulstijn (Eds.), Speech motor
control in normal and disordered speech (pp. 51–82). Oxford
University Press.
Varnet, L., Ortiz-Barajas, M. C., Erra, R. G., Gervain, J., & Lorenzi,
C. (2017). A cross-linguistic study of speech modulation spectra.
Journal of the Acoustical Society of America, 142(4), 1976–1989.
https://doi.org/10.1121/1.5006179, PubMed: 29092595
Viemeister, N. F. (1979). Temporal modulation transfer functions
based upon modulation thresholds. Journal of the Acoustical
Society of America, 66(5), 1364–1380. https://doi.org/10.1121/1
.383531, PubMed: 500975
Will, U., & Berger, E. (2007). Brain wave synchronization and
entrainment to periodic acoustic stimuli. Neuroscience Letters,
424(1), 55–60. https://doi.org/10.1016/j.neulet.2007.07.036,
PubMed: 17709189
Wilsch, UN., Neuling, T., Obleser, J., & Herrmann, C. S. (2018).
Transcranial alternating current stimulation with speech enve-
lopes modulates speech comprehension. NeuroImage, 172,
766–774. https://doi.org/10.1016/j.neuroimage.2018.01.038,
PubMed: 29355765
Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004).
Listening to speech activates motor areas involved in speech
production. Nature Neuroscience, 7(7), 701–702. https://doi.org
/10.1038/nn1263, PubMed: 15184903
Wynn, C. J., Barrett, T. S., & Borrie, S. A. (2022). Rhythm percep-
tion, speaking rate entrainment, and conversational quality: A
mediated model. Journal of Speech, Language, and Hearing
Research, 65(6), 2187–2203. https://doi.org/10.1044/2022
_JSLHR-21-00293, PubMed: 35617456
Zhou, J., Yu, K., Chen, F., Wang, Y., & Arshad, S. Z. (2018). Multi-
modal behavioral and physiological signals as indicators of cog-
nitive load. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G.
Potamianos, & A. Krüger (Eds.), The handbook of multimodal-
multisensor interfaces: Vol. 2. Signal processing, architectures,
and detection of emotion and cognition (pp. 287–329). Morgan
& Claypool. https://doi.org/10.1145/3107990.3108002