Seeing an Auditory Object: Pupillary Light Response

Seeing an Auditory Object: Pupillary Light Response
Reflects Covert Attention to Auditory Space and Object

Hsin-I Liao1

, Haruna Fujihira1,2, Shimpei Yamagishi1,

Yung-Hao Yang1, and Shigeto Furukawa1

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
5
2
2
7
6
2
0
6
7
0
1
8

/
j

o
C
N
_
UN
_
0
1
9
3
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

Astratto

■ Attention to the relevant object and space is the brain’s strat-
egy to effectively process the information of interest in complex
environments with limited neural resources. Numerous studies
have documented how attention is allocated in the visual
domain, whereas the nature of attention in the auditory domain
has been much less explored. Here, we show that the pupillary
light response can serve as a physiological index of auditory
attentional shift and can be used to probe the relationship
between space-based and object-based attention as well. Exper-
iments demonstrated that the pupillary response corresponds
to the luminance condition where the attended auditory object

(per esempio., spoken sentence) was located, regardless of whether
attention was directed by a spatial (left or right) or nonspatial
(per esempio., the gender of the talker) cue and regardless of whether
the sound was presented via headphones or loudspeakers.
These effects on the pupillary light response could not be
accounted for as a consequence of small (although observable)
biases in gaze position drifting. The overall results imply a uni-
fied audiovisual representation of spatial attention. Auditory
object-based attention contains the space representation of
the attended auditory object, even when the object is oriented
without explicit spatial guidance.

INTRODUCTION

The auditory world is rarely silent. It is usually full of var-
ious sounds, including background noises, and some-
times with multiple people talking at the same time.
Attention to the relevant object and space is the brain’s
strategy to effectively process the information of interest.
A massive number of studies based on behavioral mea-
surements have described how attention is distributed
in visual space, Per esempio, as the spotlighting metaphor
(Posner, Snyder, & Davidson, 1980), the zoom-lens meta-
phor (Cave & Bichot, 1999; Eriksen & St. James, 1986), O
the gradient model (Downing, 1988). Tuttavia, IL
evidence for the nature of “auditory” spatial attention is
less robust and controversial (Best, Shinn-Cunningham,
Ozmeral, & Kopco, 2010; Best, Ozmeral, Kopčo, &
Shinn-Cunningham, 2008; Spence & Driver, 1994, 1996;
Mondor & Zatorre, 1995; Quinlan & Bailey, 1995; Rhodes,
1987). This makes it difficult to infer how spatial attention
functions in auditory space and for an auditory object.
Neuroimaging studies indicate that auditory spatial atten-
tion and visual spatial attention share the same neural cir-
cuit underlying the dorsal frontoparietal cortical networks
(Braga, Fu, Seemungal, Wise, & Leech, 2016; Smith et al.,
2010; Corbetta, 1998). It suggests that auditory spatial
attention operates in a way similar to visual spatial

1NTT Communication Science Laboratories, Japan, 2Japan Soci-
ety for the Promotion of Science

Attenzione. In the current study, we aimed to demonstrate
that the internal processes related to auditory attention
can be “read out” via the pupillary light response (PLR)
as they have been shown to be in visual attention (Mathôt,
van der Linden, Grainger, & Vitu, 2013). Inoltre, PLR
tracks auditory spatial attention across time, presumably
reflecting the internal processing of how attention selects
an auditory object.

Recent evidence indicates that the PLR is modulated by
neural activities from the FEF (Ebitz & Moore, 2017), E
this neural basis may explain why it reflects not only
changes in physical luminance but also top–down modu-
lated perceptual brightness (also see Strauch, Wang,
Einhäuser, Van der Stigchel, & Naber, 2022; Binda &
Gamlin, 2017). Pioneering studies have demonstrated that
pupils respond to perceived brightness even when the
physical luminance input remains the same in the case
of visual awareness to luminance in binocular rivalry
(Naber, Frässle, & Einhäuser, 2011), the bright illusion
(Suzuki, Minami, Laeng, & Nakauchi, 2019; Laeng &
Endestad, 2012), visual scene interpretation (Binda,
Pereverzeva, & Murray, 2013; Naber & Nakayama, 2013),
and mental imagery (Laeng & Sulutvedt, 2014). Remark-
ably, pupils also reflect the luminance condition of the
location to which visual attention is directed while the eyes
fixate steadily (Strauch, Romein, Naber, Van der Stigchel,
& Ten Brink, 2022; Binda, Pereverzeva, & Murray, 2014;
Mathôt, Dalmaijer, Grainger, & Van der Stigchel, 2014;
Binda et al., 2013; Mathôt et al., 2013; Haab, 1886). For

© 2022 Istituto di Tecnologia del Massachussetts. Published under a
Creative Commons Attribution 4.0 Internazionale (CC BY 4.0) licenza.

Journal of Cognitive Neuroscience 35:2, pag. 276–290
https://doi.org/10.1162/jocn_a_01935

instance, Mathôt et al. (2013) demonstrated that when
participants view a visual display containing luminance dis-
parity between the left and right visual hemifields (per esempio.,
darkness on the left and brightness on the right, or vice
versa), pupil size is larger when they covertly attend to,
but not overtly shift the eyes to, the dark rather than bright
visual hemifield. A recent study further showed that this
attentional modulation of the PLR operates on abstract
mental content of a left-to-right spatial–numerical associa-
zione (Salvaggio, Andres, Zénon, & Masson, 2022). Questo
implies that the attentional modulation of the PLR does
not only serve for the anticipation of the coming percep-
tion in the visual system (Mathôt, 2018; Mathôt & Van der
Stigchel, 2015) but automatically interacts with a more
general cognitive attentional function as well. This view
also suggests the possibility of attentional modulation of
PLR in other sensory domains.

One apparent difference between visual attention and
auditory attention is that visual attention is better under-
stood in its spatial deployment, whereas auditory attention
enables us to understand more about its temporal aspect,
assuming that both operate through the same underlying
neural network (Noyce, Kwasa, & Shinn-Cunningham,
2022). Auditory objects or streams consist of acoustic fea-
tures extended through time, which can be presented in
separate or overlapped space (Shinn-Cunningham, 2008;
Fritz, Elhilali, David, & Shamma, 2007). Space is not con-
sidered an essential feature to define an auditory object.
By contrast, in natural scenes, visual objects are generally
associated with particular locations in space. Theoretically,
the location serves as the master map in visual attention
(Treisman & Gelade, 1980). Assuming the same spatial
attention mechanism operates in both the visual and
auditory domains, it would be expected that the same
attentional modulation of the PLR operates in auditory
spazio, particularly when the attended auditory object is
referenced by a spatial cue. Tuttavia, it is unclear whether
the location information is automatically and compulsively
represented in the auditory object when it is attended and
defined via nonspatial acoustic features. Neuroimaging
studies have shown controversial evidence: Some indicate
that nonspatial auditory attention engages networks such
as the inferior frontal gyrus (Larson & Lee, 2014; Hill &
Mugnaio, 2009); others indicate an overlapping neural circuit
between space-based and object-based auditory attention
(Bushara et al., 1999; Zatorre, Mondor, & Evans, 1999). If
nonspatial object-based auditory attention operates inde-
pendently of space-based auditory attention, it is plausible
that the auditory object can be attended and processed
without spatial attention allocated to its location. Alterna-
tively, if object-based and space-based auditory attention
share a common mechanism, spatial attention is expected
to shift to the object’s location even when the auditory
object is selected via nonspatial features. Inoltre, IL
timing of spatial attention shifts may vary if the involve-
ment of spatial information requires further processing
depending on the circumstances.

We addressed the above issues by investigating how
PLR reflects spatial attention to auditory objects. In four
esperimenti, human participants listened to two concur-
rent environmental sounds or speech sentences pre-
sented dichotically to the two ears through headphones
(Experiments 1–3) or through two loudspeakers located
left and right in space (Experiment 4). They were
instructed to attend the one defined by a spatial cue (left
or right) or by a nonspatial cue, Per esempio, the gender
of the talker (male or female). Sitting in front of a visual
display containing luminance disparity between the left
and right visual hemifields, they fixated the center of
the display while listening to the auditory stimuli. An
infrared video-based eye tracker recorded their pupillary
responses and gaze positions throughout the experi-
menti. The results consistently demonstrated that their
pupils dilated more strongly when the attended auditory
object was located on the dark side of the visual hemi-
field than when it was on the bright side. The finding
was replicated regardless of whether attention was
directed by a spatial or nonspatial cue and whether the
sound was presented via headphones or loudspeakers.
The timing of the PLR divergence (cioè., the difference
between attend-to-dark and bright conditions) occurred
earlier for the spatial cue than nonspatial cue. Local lumi-
nance differences due to gaze position drift could not
explain most of the effects. The overall results imply that
auditory attention to an object in space and visual spatial
attention recruit a common underlying mechanism.
When the auditory object is directed by the nonspatial
cue, extra time is needed to identify the auditory object’s
location before shifting spatial attention accordingly. IL
finding provides profound insights into not only the neu-
ral mechanism of spatial attention across modalities but
also into brain–computer interface to predict human
auditory spatial attention by eyes.

METHODS

Participants
Seventy-four adults (54 women, age range 20–50 years,
median age 39 years) who had normal or corrected-to-
normal vision and normal hearing acuity participated in
the current study (15 in Experiment 1, 15 in Experiment 2,
27 in Experiment 3, E 17 in Experiment 4). Sample
sizes were chosen based on our previous studies with
comparable pupillometry measurements and trial numbers
per participant (Liao, Kashino, & Shimojo, 2021; Liao,
Kidani, Yoneya, Kashino, & Furukawa, 2016). In Experi-
ment 3, the number of trials in the critical dichotic condi-
tion for each participant was half of that in Experiment 2,
because of adding a diotic sound presentation condition
(see Design and Procedure for details). To have a similar
number of critical trials, more participants were recruited.
One participant in Experiment 4 was excluded because
del < 50% accuracy in task performance (46.9%). All were naive about the purpose of the study. The current Liao et al. 277 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 study was approved by the NTT Communication Science Laboratories ethics committee. All participants gave written informed consent before the experiment and received payment for their participation. Overview of Experiments Experiment 1 was modified from Mathôt et al. (2013) by changing the main task to an auditory task to discrimi- nate two environmental sounds. Experiments 2, 3, and 4 used spoken sentences. Participants dichotically lis- tened to two sentences spoken by different talkers, one male and one female, and attended the one according to the instruction. In Experiment 2, the target sentence was defined by space and the gender of the talker. In Exper- iments 3 and 4, the target sentence was defined by gen- der. In Experiment 3, half of the target sentences were presented dichotically, and the other half was presented diotically. In Experiment 4, the auditory stimuli were pre- sented via two loudspeakers instead of the headphones as in Experiments 1–3. All the experiments had a general common structure as follows. A trial started with presentation of a visual display with its left and right hemifields dark and bright, respec- tively, or vice versa. After 3 sec as an adaptation period, a “voice cue” was presented to indicate the target dimension (i.e., space or gender) in Japanese, followed by a “silent period” (Experiment 1) or a “listening period” (Experi- ments 2–4). This was followed by a “response period” wherein participants were asked to discriminate the envi- ronmental sound as soon and accurately as possible (Experiment 1) or to recall the content of the target sen- tence with no time pressure (the others). Participants were given a written and oral explanation about the nature of the auditory task and performed several practice trials to familiarize themselves with the task. Apparatus and Stimuli Auditory and visual stimuli were generated with MATLAB (The MathWorks, Inc.) and PsychToolbox (Kleiner, Brainard, & Pelli, 2007; Brainard, 1997; Pelli, 1997), con- trolled by a personal computer (Dell OptiPlex 755). Visual stimuli were presented on an 18.1-in. monitor (Eizo FlexScan L685Ex) with a 60-Hz frame rate and a 1280 × 1024 resolution in Experiments 1, 2, and 3 and a 1024 × 768 resolution in Experiment 4. There were two types of visual displays with luminance disparity between the left and right visual fields: black (0.35 cd/m2) on the left and white (94.04 cd/m2) on the right, and vice versa. In Experiment 1, a gray fixation cross (0.5° × 0.5°, 21.04 cd/m2) was presented at the center of the visual display against the background with luminance disparity. In Experiments 2, 3, and 4, a gray vertical band (2.3° in width, 21.04 cd/m2) acted as a “buffer zone” that was superimposed at the center against the background with luminance disparity to lessen the sharp luminance variation in fixation. A Gabor patch (0.5° × 0.5°), which was generated by superimposing a Gaussian and a sine- wave function with a vertical orientation (10 cycles per degree) and presented at the center, served as the fixation point. Auditory stimuli were presented through headphones (Sennheiser HD 595) in Experiments 1, 2, and 3 and through two custom-built loudspeakers in Experiment 4. The loudspeakers were positioned so that their centers were located 70 cm left and right of the fixation point on the visual display, and the surfaces of the display and the loudspeakers were aligned. The voice cues were recorded by a female native-Japanese speaker in our laboratory. They had a duration of 400–570 msec and were presented diotically (identical signal to both ears). The target sounds used in Experiment 1, a dog barking and a phone ringing, were sampled from a database on a compact disc (Audio Pro Sound Effects by Yannick Chevalier) and edited to be 500 msec in length. In Experiments 2, 3, and 4, the target sentences were sampled from the coordinate response measure (CRM) corpus recorded in our laboratory, Japanese version with a slight variation in the keywords of Bolia, Nelson, Ericson, and Simpson (2000). Each CRM sentence consisted of three Japanese keywords, namely, an animal name (KUMA [bear], SHIKA [deer], TORA [tiger], INU [dog], NEKO [cat], TORI [bird], SARU [monkey], or BUTA [pig]), a color (AKA [red], AO [blue], SHIRO [white], KURO [black], MOMO [pink], or NIJI [rainbow]), and a number (ZERO [zero], ICHI [one], SAN [three], YON [four], ROKU [six], NANA [seven], HACHI [eight], KYUU [nine], or JUU [ten]). The sentences were spoken by 10 different native-Japanese talkers (five of each gender) in a 2.5- to 3-sec length. The sounds were presented at a comfortable listening level, self-adjusted by each participant. During the response period, participants viewed the fixation display with luminance disparity in Experiment 1. In Experiments 2, 3, and 4, there were three types of response displays, corresponding to the three keywords, namely, an animal name, color, and number. The animal- response display consisted of eight patches (2.5° × 1.3°), each labeled with animal names. The color-response dis- play consisted of six patches, each labeled with color names. The number-response display consisted of nine patches, each labeled with a number. All the labels were written in Japanese Katakana. Eye movements including pupillary responses were recorded binocularly with an infrared eye-tracker camera (Eyelink 1000 Plus Desktop Mount, SR Research Ltd.). The camera was positioned below the monitor. The sampling rate of the recording was 1000 Hz. Participants sat in front of the monitor at an 80-cm distance with their head sup- ported on a chin rest. Before each formal experimental session, they went through the 5-point Eyelink calibration program to calibrate and validate their eye data. After the calibration, they were instructed to fixate the central fixa- tion point while performing the auditory task. 278 Journal of Cognitive Neuroscience Volume 35, Number 2 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Design and Procedure In Experiment 1 (Figure 1A), the voice cue indicating “left” or “right” was presented, followed by a 5-sec silent period. In the response period, the target sounds (i.e., dog barking and phone ringing) were presented dichoti- cally (one to each ear). Participants were asked to shift their attention to the left or right ear, according to the voice cue’s instruction, and discriminate whether the sound presented in the cued direction was “dog barking” or “phone ringing” by pressing a designated key on the keyboard as soon and accurately as possible. The correspondence of the response key and target sound was counterbalanced across the participants: Half of the participants pressed the “M” key for “dog barking” and the “N” key for “phone ringing,” and the other half did the opposite. Once they pressed the response key, the next trial started. The visual display disparity (black on left or right), the voice cue (left or right), and the tar- get sound (dog barking sound to the left or right ear) were counterbalanced across trials. There were 48 trials in each block. Each block took about 15 min, and the participants performed two blocks with a 15-min break in between. In Experiments 2, 3, and 4, 1 sec after the voice cue pre- sentation, the listening period started, in which the two target sentences were presented dichotically (Figure 2A). The target sentences were selected so that the contents of the three keywords would not overlap. Each sentence was repeated twice, resulting in a total duration of 6 sec. Partic- ipants were asked to pay attention to the target sentence, memorize its content, and recall it during the response period. With no time pressure, they gave the answer by using the mouse to click the patch in the response display, which was randomly selected from the three types of response displays for each trial. Participants did not know in advance the response set (animal, color, or number) for each trial and thus had to memorize the entire content to perform the task. Experiment 2 consisted of two conditions, namely, the attend-to-location and attend-to-gender conditions. In the attend-to-location condition, the voice cue was on the spa- tial dimension (“left” or “right”). Participants paid atten- tion to the sentence presented at the cued direction, as in Experiment 1. In the attend-to-gender condition, the voice cue was on the dimension of the talker’s gender (“male” or “female”). Participants paid attention to the sentence spoken by the male talker if the voice cue said “male” beforehand or to the one spoken by the female talker if the voice cue said “female” beforehand. The visual display disparity, the voice cue (left or right in the attend- to-location condition; male or female in the attend-to- gender condition), and the target sound (the male’s spoken sentence to the left or right ear) were counterba- lanced across trials. There were 64 trials in each block. Participants performed the two types of attention l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 1. Experimental procedure and pupillary response results in Experiment 1. (A) Schematic procedure. The visual display with two different luminance disparities was randomly presented for each trial. Participants paid attention to their left or right ear, depending on the voice cue, and responded by indicating whether the cued target sound was a “dog barking” or a “phone ringing” as soon and accurately as possible while looking at the central fixation cross for pupillary response recording. (B) Pupillary response as a function of time from the voice cue onset, parameterized with the cue–luminance associations (attend dark or bright side). The shaded area represents standard errors across participants. The horizontal black line indicates a significant difference between the attend-dark and attend-bright conditions (nonparametric cluster-based permutation tests, p < .05) during the period of 821–5000 msec. (C) Mean pupil diameter as a function of the cue–luminance associations. Error bars represent standard errors among participants. Gray lines represent data of individual participants. Norm. = Normalized. Liao et al. 279 Figure 2. Experimental procedure and pupillary response results in Experiment 2. (A) Schematic of procedure. The visual display with two different luminance disparities was randomly presented for each trial. The center vertical gray band was presented as a “buffer” zone. Participants paid attention to the target sentence, memorized its content, and recalled it after the sound presentation. The target sentence was defined by a spatial cue (I) or a nonspatial cue (II). (B) Pupillary response as a function of time from the voice cue onset in the spatial (i.e., attend-to-location) and nonspatial (i.e., attend-to- gender) cue conditions, parameterized with the cue– luminance associations (attend dark or bright side; left) or the target–luminance associations (target located on the dark or bright side; right), respectively. The shaded area represents standard errors across participants. The horizontal black lines indicate a significant difference between the attend-dark and attend-bright conditions (nonparametric cluster-based permutation tests, p < .05) during the period of 978–7000 msec in the left panel and the difference between the target-at-dark and target-at-bright conditions during that of 1542–7000 msec in the right panel. (C) Mean pupil diameter as a function of the cue–luminance associations (left) or target–luminance associations (right). Error bars represent standard errors among participants. Gray lines represent data of individual participants. Norm. = Normalized. conditions in separate sessions, with the order counter- balanced across participants. Each attention condition consisted of two blocks with a short break in between. In Experiment 3, the design and procedure were the same as in Experiment 2 except for the following. First, only the gender voice cue (male or female) was used. Second, the target sentences were presented dichotically as well as diotically, where the two target sentences were mixed into one signal and presented to the two ears. The two types of target sound presentation (dichotic, diotic) were presented in randomized order within a block. There were 32 trials for each target sound presentation type, resulting in 64 trials in each block. Participants per- formed two blocks with a break in between. Experiment 4 followed the same design and procedure of the attend-to- gender condition in Experiment 2, except that all the auditory stimuli were presented via speakers instead of headphones. Participants only performed one block. Behavioral Data Analyses In all experiments, trials with incorrect answers were excluded from further analyses (1.7%, 3.4%, 4.4%, and 1.6% of trials excluded in Experiments 1, 2, 3, and 4, respectively). Further exclusion criterion was set for Experiment 1: Trials with an RT exceeding two times the standard deviation from the mean within each session were excluded (4.3% of trials). No RT criterion was set for Experiments 2–4 because participants performed the task with no time pressure. 280 Journal of Cognitive Neuroscience Volume 35, Number 2 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Eye Metrics Data Analyses Because eye movements and pupillary responses are con- sensual, only the data from the left eye were used. During the silent or listening period, blinks accounted for 12.3% of data points in Experiment 1, 12.2% in Experiment 2, 12.8% in Experiment 3, and 12.3% in Experiment 4. The missing pupil-diameter data during blinks were interpo- lated by using shape-preserving piecewise cubic interpola- tion. To compare the pupillary response results across participants and conditions, pupil diameter data were normalized by z-transform using all the data recorded in each block and then baseline-corrected trial-by-trial by subtracting the mean of the data during the 1-sec period before the cue onset. Statistical Analysis Frequentist and Bayesian Statistics Both frequentist and Bayesian statistics were performed on average data, including RTs (Experiment 1), accuracies, and mean pupil diameters during the silent period (Exper- iment 1) or the listening period (Experiments 2–4). The Bayesian statistical analyses were computed by using the open-source program JASP ( Jeffreys’s Amazing Statistic Program; jasp-stats.org), referenced by Keysers, Gazzola, and Wagenmakers (2020). Detailed parameters and tests for each experiment are described below. Behavioral Performance In Experiment 1, mean RTs and accuracies were subjected to repeated-measure ANOVAs with Display Disparity (black on left or right), Cue Direction (left, right), and Target Type (dog barking or phone ringing) as within- subject factors. In Experiments 2 and 3, mean accuracies were subjected to paired two-sample t tests on the com- parison between attend-to-location and attend-to-gender conditions (Experiment 2) and on the comparison between dichotic and diotic conditions (Experiment 3). For between-experiment comparison, mean accuracies in the attend-to-gender condition in Experiment 2 and mean accuracies of the dichotic trials in Experiment 3 were subjected to independent samples t test. repeated-measure ANOVA with the Target–Luminance Associations (dichotic trials with the target located on the dark or bright side, or diotic trials with the target inter- laced at the center) as a within-subject factor. Between- experiment comparison was conducted as follows: Mean pupil diameters in Experiment 2 (only in the attend-to- gender condition) and Experiment 3 (only the dichotic presentation trials) were subjected to a mixed-design ANOVA with the Target–Luminance Associations (target located at the dark or bright side) as a within-subject factor and Experiment as a between-subject factor. Eye Metrics Data across Time Pupil diameter change and luminance of the gazed posi- tion were subjected to nonparametric cluster-based per- mutation tests (Maris & Oostenveld, 2007) to examine the difference between the conditions across time. Lumi- nance contrast of the gazed position was smoothed by moving averaging with a 100-msec time window for each participant. This was done to reduce the noise across time. The cluster-based analyses were computed by using the Fieldtrip MATLAB toolbox (Oostenveld, Fries, Maris, & Schoffelen, 2011) with 5000 iterations and both the cluster-defining height threshold and FWE-corrected clus- ter size threshold below an α level of .05. Linear Mixed-Effects Models Linear mixed-effects (LME) analyses were performed at each 1-msec sampling point with luminance of the gazed position (dark, bright in Experiment 1; dark, bright, gray in Experiments 2–4) and attended luminance (cue–luminance or target–luminance association as dark, bright) as fixed effects, participant as the random effect, and pupil diameter as the dependent variable. The LME analyses were com- puted by using the lem4 package in R (Bates, Mächler, Bolker, & Walker, 2015). Following a criterion similar to Mathôt et al. (2014), the significant clusters were defined as a t value > 2 with at least 500 consecutive samples.
Unlike Mathôt et al. (2014), who chose 200 consecutive
samples, we set the criterion stricter with larger consecu-
tive samples because the gazed luminance was noisy
across time.

Attention-Based PLR Bias
Mean pupil diameters during the silent (cioè., 0–5 sec time-
locked to the cue onset in Experiment 1) or listening (cioè.,
1–7 sec time-locked to the cue onset in Experiments 2–4)
period were subjected to paired two-sample t tests with
the cue–luminance associations (attend the dark or bright
side) for the space cue (Experiments 1 E 2) and with
the target–luminance associations (target located at the
dark or bright side) for the gender cue (Experiments 2–
4) as the independent variables. In Experiment 3, mean
pupil diameter data were further subjected to a

PLR Divergence Latency

The latency was defined as the earliest end of the period
with the cluster-based significant difference between the
attend-to- (or target-at-) dark side and attend-to- (O
target-at-) bright side conditions. Following the jackknife
resampling technique, the latency of the divergence differ-
ence and its variance were estimated. Pupil size difference
between the two attended luminance conditions was sub-
jected to nonparametric cluster-based permutation tests
by omitting the data from one participant. To guarantee
that the effective clusters could be identified for all

Liao et al.

281

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
5
2
2
7
6
2
0
6
7
0
1
8

/
j

o
C
N
_
UN
_
0
1
9
3
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

participants, the threshold was set to be the smallest α
level or the second smallest α level if the difference
between them was smaller than .05. The procedure was
repeated until all participants were omitted once. The esti-
mated latencies were subjected to both frequentist and
Bayesian statistics. Within-experiment comparison was
conducted in Experiment 2, in which the estimated
latencies were subjected to a paired two-sample t test with
the attention condition (attend-to-location, attend-to-
genere) as the independent variable. Between-experiment
comparisons were conducted for the attend-to-location
and attend-to-gender conditions separately. For the
attend-to-location condition, the estimated latencies in
Experiment 1 and those of the attend-to-location condi-
tion in Experiment 2 were subjected to an independent
sample t test with the experiment as the grouping
variable. For the attend-to-gender condition, the esti-
mated latencies in Experiments 3 E 4 and those of
the attend-to-gender condition in Experiment 2 were
subjected to ANOVAs with the Experiment as the fixed
factor.

RESULTS

Experiment 1: PLR Reflects the Direction of
Endogenous Auditory Spatial Attention

We started by replicating the paradigm of Mathôt et al.
(2013), but with a modification of the main task to dis-
criminate auditory objects (Experiment 1). Participants
listened to an auditory voice cue saying “left” or “right”
to shift their attention to the left or right ear, rispettivamente.
Dopo 5 sec, two environmental sounds, a dog barking
and a phone ringing, were presented for 500 msec
dichotically through headphones. They responded by
pressing a designated key to report whether the sound
presented in the cued ear was “dog barking” or “phone
ringing” as soon and accurately as possible (see Figure 1A
for procedure).

Behavioral results showed that RTs and accuracies were
equally fast and accurate among all the conditions (Vedere
Tavolo 1). For RTs, the three-way interaction of Visual Display
disparity (darkness at the left or right visual hemifield), Cue
Direction (left or right), and Target Sound (dog barking or
phone ringing) showed significance, F(1, 14) = 4.87, p =
.045, whereas the Bayes factor (BF) with the model

including only the three-way interaction was very small
(BFincl = 0.70). None of the other effects or interactions
were significant (Fs < 3.6, ps > .08, 0.1 < BFsincl < 0.6). For accuracies, none of any main effects or interactions were significant (Fs < 1.8, ps > .2, 0.2 < BFsincl < 0.7). Most importantly, mean pupil size was larger when the target sound was presented in the cued direction where the visual hemifield was dark than when it was bright, t(14) = 5.32, p < .001, BF+0 = 530.19, median δ = 1.24, 95% CI [0.538, 1.976] (Figure 1C). This differ- ence was significant 821 msec after the voice cue onset ( p < .05, nonparametric cluster-based permutation test; see Figure 1B). Experiment 2: PLR Reflects Continuous Covert Attention to Auditory Object in Space In Experiment 1, the spatial attention effect reflected in the PLR was observed during the period in which no auditory stimulus was presented. It could be argued that, during this period, the participant’s spatial attention was not directed to an auditory object but to the visual dis- play, and thus the present result is a mere replication of the experiment on visual attention (Mathôt et al., 2013). We examined this possibility in Experiment 2, in which the target sound was changed to a continuous speech sentence presented for 6 sec, and we investigated whether and how the PLR effect was observed during the target sound presentation period. The stimuli for each trial were two sentences randomly sampled from a Japanese version of the CRM corpus (Bolia et al., 2000). Versions of the CRM corpus have been widely used to study speech intelligibility in competing speech or noise, and each sentence in the corpus consists of three key- words, namely, an animal name, color, and number in the Japanese version. In our procedure (see Figure 2A), two sentences by two talkers, one male and one female, were presented dichotically through headphones. In the attend-to-location condition, the voice cue saying “left” or “right” (as in Experiment 1) was presented 1 sec before the target sentence. Participants paid attention to the cued direction (or ear), memorized the keywords in the target sentence presented at the cued direction, and later recalled them. In the attend-to-gender condition, the voice cue said “male” or “female,” and participants recalled the content of the sentence expressed by the Table 1. Mean RTs (msec) and Accuracies (in Parentheses) Under Each Condition in Experiment 1 Cue-Left Cue-Right Black-left Black-right Phone 1128 (99%) 1151 (98%) Dog 1149 (97%) 1188 (98%) Phone 1213 (98%) 1144 (99%) Dog 1154 (99%) 1178 (99%) 282 Journal of Cognitive Neuroscience Volume 35, Number 2 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 cued-gender talker. Participants performed these two conditions in separate blocks, in a counterbalanced order across participants. Behavioral performance was equally good in these two conditions (mean accuracy = 96.1% and 97.1% for the attend-to-location and attend-to-gender conditions, respectively, t(14) = 1.81, p = .092, BF10 = 0.97, median δ = −0.396, 95% CI [−0.916, 0.092]. Critically, in both conditions, mean pupil size was larger when the target sentence appeared in the dark visual hemifield than when it appeared at the bright one: t(14) = 3.43, p = .004, BF+0 = 23.28, median δ = 0.775, 95% CI [0.217, 1.385], and t(14) = 3.97, p = .001, BF+0 = 58.34, median δ = 0.907, 95% CI [0.306, 1.553], in the attend-to-location and attend-to-gender conditions, respectively (Figure 2C). The difference reached significance 978 msec after the spatial cue (left or right) and 1542 msec after the nonspa- tial cue (male or female; Figure 2B). Experiment 3: Does PLR-Reflecting Object-Based Auditory Attention Require Consistent Association with Target Space? An interesting finding of the above experiment (Experi- ment 2) was that PLR reflected attention even to a nonspa- tially guided auditory target. This suggests compulsory involvement of space information in object-based auditory attention. It should be noted, however, that the target sounds were always presented in separate locations (or ears) in that experiment. Thus, it is arguable that the participants might have strategically used the spatial rep- resentation of the target sound predominantly even when the target sound was defined by a nonspatial cue and that the observed PLR effectively reflected an intention of attention to space. Experiment 3 was designed to discour- age the participant from taking this strategy. We mixed trials with dichotic stimulus presentation (as in Experiment 2) and those with diotic presentation in a random order within a block. In the diotic presentation trial, the two talkers’ sentences were mixed into one signal, which was presented to the two ears. In this case, the two voices are not perceptually lateralized, so shifting spatial attention toward the left or right direction would not help in the task. We expected that the modified procedure would encourage the participants to consistently pay more atten- tion to the nonspatial acoustic characteristics to perform the task. Behavioral results showed that the accuracies of dich- otic trials in Experiment 3 were as good as in the attend- to-gender condition in Experiment 2 (96.6% vs. 97.1%, t(40) = 0.28, p = .784, BF10 = 0.32, median δ = 0.067, 95% CI [−0.486, 0.633]), and they were better than those of the diotic trials in Experiment 3 (96.6% vs. 93.5%, t(26) = 6.16, p < .001, BF10 = 11321.85, median δ = 1.115, 95% CI [0.625, 1.621]). Critically, mean pupil sizes were different among the three conditions we tested (i.e., dichotic trials with target on the dark side, dichotic trials with target on the bright side, and diotic trials), F(1.54, 40.15)with Greenhouse–Geisser correction = 3.74, p = .043, BFM = 1.73. Post hoc multiple comparisons indicated that mean pupil size was larger when the target sound was located in the dark hemifield than when it was located in the bright one ( pBonferroni = .026, BF+0 = 3.11, median δ = 0.389, 95% CI [0.062, 0.772]), but no significant difference was found for the other two pairs ( psBonferroni >
.4, BFs+0 < 0.94; Figure 3B). The difference among these conditions was found during the periods of 2036–3403 and 4408–7000 msec (Figure 3A). The segment of two significant periods corresponds to how the target sen- tences were presented, namely, twice with a short break in between. The result suggests that the allocation of spa- tial attention to the auditory object matches the temporal dynamic of the auditory object’s presentation. We compared the PLRs in comparable conditions between Experiments 2 and 3, in which the gender-cue- guided target and the two sentences were presented dichotically. The purpose of this comparison was to examine whether the PLR-reflecting object-based atten- tion was modulated by the consistency of the availability of the sound space information. Mean pupil sizes in the dichotic trials in Experiment 3 and those in the attend- to-gender condition in Experiment 2 were subjected to a mixed-design ANOVA with the Target Sound Location (located in the dark or bright hemifield) as a within-subject factor and Experiment as a between-subject factor. Consis- tent with the previous analyses conducted for individual experiments, mean pupil size was overall larger when the target sound was located in the dark than bright hemifield, F(1, 40) = 23.22, p < .001, BFincl = 112.77. Mean pupil size did not differ between the two experi- ments, F(1, 40) = 0.04, p = .849, BFincl = 0.48. Impor- tantly, the PLR effect was stronger in Experiment 2 than Experiment 3, indexed by the interaction between the target sound location and experiment, F(1, 40) = 5.75, p = .021, BFincl = 2.27. The overall results indicate that PLRs reflected spatial auditory attention even when a nonspatial cue guided attention. However, the PLR bias became weaker when the spatial representation of the sound source did not always help for the task at hand (as in Experiment 3). Experiment 4: Attention-Related PLRs to Sounds from Loudspeakers in Space It can be argued that the attention-related PLRs reflected the attention directed to the ear, not the space, per se, in the previous three experiments, in which all the sounds were presented via headphones. To test for this possibil- ity, in Experiment 4, we replicated the attend-to-gender condition in Experiment 2 but presented stimuli via loud- speakers, which were located on either side of the visual display. Participants performed the task with the same accuracy as in Experiment 2 (98.4% vs. 97.1%, t(29) = Liao et al. 283 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 3. Pupillary response results in Experiments 3 and 4. (A) Pupillary response as a function of time from the voice cue onset, parameterized with the target–luminance associations (dichotic trials with the target located on the dark or bright side, or diotic trials). The shaded area represents standard errors across the participants. The horizontal black line indicates a significant difference between the target- at-dark and target-at-bright conditions during periods of 2493–3403 and 4408–5986 msec. The horizontal blue line indicates a significant difference between the target-at-dark and diotic conditions during the period of 2036–2997 msec. The horizontal red line indicates a significant difference between the target-at-bright and diotic conditions during that of 4803– 7000 msec. (B) Mean pupil diameter as a function of the target–luminance associations. Error bars represent standard errors among participants. Gray lines represent data of individual participants. (C) Pupillary response as a function of time from the voice cue onset, parameterized with the target–luminance associations (target located at the dark or bright side). The shaded area represents standard errors across participants. The horizontal black lines indicate a significant difference between the target-at-dark and target-at-bright conditions during the periods of 2559–3669 and 4856–6831 msec. (D) Mean pupil diameter as a function of the target–luminance associations. Error bars represent standard errors among participants. Gray lines represent data of individual participants. Norm. = Normalized. 1.27, p = .216, BF10 = 0.62, median δ = −0.340, 95% CI [−1.016, 0.270]). Critically, the pupillary response results showed that mean pupil size was larger when the target sound came from the loudspeaker located next to the dark visual hemifield than from the one next to the bright one, t(15) = 3.04, p = .008, BF+0 = 12.62, median δ = 0.666, 95% CI [0.161, 1.227] (Figure 3D). The difference reached significance 2559–3669 and 4856–6831 msec after the cue presentation (Figure 3C). The segment of the two periods showed a pattern similar to the result in Experiment 3. Gaze Bias and PLR It can be argued that the attention-related PLR is because of the change in light inputs after uncontrolled eye move- ments such as ocular drifts. To address this issue, we first identified the luminance of the local gazed position by matching the gaze position at each 1-msec sampling point and the luminance of the viewed display for each trial. We found that although the global luminance was controlled to be constant, there were significant differences in the local luminance of the gazed position between attending-dark and attending-bright conditions in all experiments. The difference reached significance at 726 msec after the cue presentation in Experiment 1; 494 and 1470 msec in the attend-to-location and attend- to-gender conditions, respectively, in Experiment 2; 1564 msec in Experiment 3; and 1595 msec in Experiment 4 (see Figure 4). One unexpected finding is that the gaze was focused more when a luminance boundary was presented near the fixation as in Experiment 1 (93% of the data points within 2°) than it was in Experiments 2–4 (percentage of data points within the vertical gray area: 67% for the attend-to-location condition and 70% for the attend-to-gender condition in Experiment 2, 66% in Experiment 3, and 74% in Experiment 4). The difference could be also because of the nature of the task. In any case, although the initial motivation for inserting the gray vertical area (2.3 visual degrees in width) in Experiments 2–4 was to make a “buffer zone” between the hemifields to reduce drastic luminance variations in foveal inputs accompanied by small gaze drifts, the gaze analysis suggests that gaze drift was beyond the gray buffer zone and thus still accompanied local luminance variation. 284 Journal of Cognitive Neuroscience Volume 35, Number 2 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Figure 4. Gazed luminance contrast as a function of time from the voice cue onset, parameterized with the attended luminance condition (dark or bright) in all experiments. The horizontal black lines indicate a significant difference (nonparametric cluster-based permutation tests, p < .05) between the two attended luminance conditions during the periods of 726–5000 msec in (A), 494–7000 msec in (B), 1470–6747 msec in (C), 1564–7000 msec in (D), and 1595–6268 msec in (E). l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 We therefore performed LME analyses to examine whether gazed local luminance or attended luminance could better explain the changes in pupil size. LME analy- ses were conducted at each 1-msec sampling point with gazed luminance and attended luminance as fixed effects, participant as the random effect, and pupil size as the dependent measure. Results showed that, except for Experiment 3, pupil size was better predicted by attended luminance than gazed luminance. In Experiment 1, only attended luminance, not gazed luminance, significantly predicted pupil size. The t value reached significance 800 msec after the cue onset. In Experiments 2 and 4, attended luminance, compared with gazed luminance, predicted pupil size with earlier timings: 971 versus 1690 msec in the attend-to-location condition, 1861 versus 3848 msec in the attend-to-gender condition in Experi- ment 2, and 2579 versus 6755 msec in Experiment 4. The significant periods were also longer and more stable for attended luminance than for gazed luminance (see Figure 5). By contrast, in Experiment 3, although attended luminance showed earlier prediction timing than gazed luminance (1977 vs. 2364 msec), the effect did not last long. The gazed luminance instead showed more stable and larger t values than attended luminance. The results suggest that when the spatial representation of the sound source was not always clear and lateralized, as we mixed dichotic trials and diotic trials here, the PLR was better explained by the gazed local luminance than the attended luminance. The uncertainty of the spatial property may evoke unex- pected gaze drift, obscuring the attention-evoked effect. Liao et al. 285 Figure 5. Result of the LME analysis in all experiments. t Value as a function of time from the voice cue onset, parameterized with regression variables of gazed luminance (blue lines) and attended luminance (orange lines) predicting pupil size. The plot was smoothed by moving averaging with a 200-msec time window. The horizontal color lines indicate the significant period (t > 2) with at least
500 consecutive samples.

l

D
o
w
N
o
UN
D
e
D

F
R
o
M
H

T
T

P

:
/
/

D
io
R
e
C
T
.

M

io
T
.

e
D
tu

/
j

/

o
C
N
UN
R
T
io
C
e

P
D

l

F
/

/

/

/

3
5
2
2
7
6
2
0
6
7
0
1
8

/
j

o
C
N
_
UN
_
0
1
9
3
5
P
D

.

F

B

G
tu
e
S
T

T

o
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3

Tracking the Time Course of Attentional Shift by
PLR Divergence

On the assumption that the instantaneous change in pupil
size reflects the state of spatial attention at a time point
(with some delay), we may be able to infer the partici-
pants’ internal process for directing spatial attention. Quello
È, the divergence of the PLR between the attend-to-dark
and attend-to-bright conditions can be regarded as reflect-
ing the existence or degree of attention-direction bias.
Figura 6 represents the time course of the difference (In
z score) between the pupil size for the attend-to- (O
target-at-) dark and bright side conditions, summarizing
the results of Experiments 1–4. The horizontal color bars
at the bottom indicate the time ranges in which the

difference (or divergence) was significantly above zero.
We adopted the jackknife resampling technique to esti-
mate the variance of the onset latency for statistical analy-
ses (see Methods for details). Results showed that, In
Experiment 2, the onset latency was shorter in the
attend-to-location condition than in the attend-to-gender
condition, T(14) = 6.38, P < .001, BF10 = 1358.88, median δ = 1.496, 95% CI [0.724, 2.320]. For the attend-to- location condition, the onset latency was shorter in Exper- iment 1 than in Experiment 2, t(28) = 2.74, p = .011, BF10 = 4.89, median δ = 0.814, 95% CI [0.103, 1.592]. For the attend-to-gender condition, the onset latency was differ- ent among the three experiments (i.e., Experiments 2, 3, and 4; F(2, 55) = 179.00, p < .001, BFM = 2.463e+21). Post hoc analysis indicated that the latency was shorter 286 Journal of Cognitive Neuroscience Volume 35, Number 2 Experiment 1 may reflect the complexity or difficulty of the task, or the onset of a target sentence, which delays attentional shift. When the cue was nonspatial (i.e., the attend-to-gender condition in Experiments 2, 3, and 4), participants needed extra time to consciously or uncon- sciously identify the location of the target talker before shifting their spatial attention to the auditory object accordingly. Within a comparable attend-to-gender condi- tion, the latency was longer when the sounds were pre- sented via loudspeakers in space (Experiment 4) than when presented directly to the ears through headphones (Experiment 2). It should be noted that the size of the PLR difference was also smaller in Experiment 4 ( loud- speakers) than in Experiment 2 (headphones). The slower and smaller divergence in Experiment 4 may be because of a smaller difference between the two sound sources in the internal representation. With the loud- speaker presentation, the azimuthal angles of the sources were smaller, and the sound from a loudspeaker reached both ears. The observed PLR effect in Experiment 3 (dichotic and diotic sound presentations mixed in an experimental block in random order) was generally weaker (compared with Experiment 2) and better explained by the local gazed luminance caused by unstable ocular drifting than by attended luminance (Figure 5). This could be because of the complexity of the stimulus presentation procedure, in which the auditory objects were sometimes presented in overlapping space. Participants may have, in general, decreased their incentives for using space representation of the auditory object. Alternatively, they may have changed their strategy trial-by-trial, depending on the availability of a clear space presentation of the auditory object (i.e., left or right vs. center/unknown). Such a trial-by-trial switching of strategy is implied by the behavioral perfor- mance in that the accuracy was higher when the sounds were presented in separate spaces (i.e., dichotic trials) than mixed into one signal (i.e., diotic trials). In any case, the Experiment 3 results suggest that, despite the possibil- ity of a coactivation of a spatial-attention-related mecha- nism by object-based attentional orientation, the spatial attention may more strongly affect fixational eye move- ments than the PLR per se. These movements can be regarded as ocular drifts or microtremors rather than sac- cades, because most of the samples of the gaze position fell within 1 visual degree around the fixation point, albeit it is difficult to classify the detailed characteristics with the video-based eye-tracking system used in the current study (Ko, Snodderly, & Poletti, 2016). As a result of the fixa- tional eye movements, together with the nature of the visual display we used here, the local luminance is input to the eyes differently and thus affects pupil sizes. Future study should investigate to use of a wider gray area around the fixation to avoid drastic luminance input differences because of gaze position drifting or to better control the participant’s fixation within a designed area by online contingent-gaze position monitoring. Liao et al. 287 Figure 6. Differences in pupillary responses between the attend-to- (or target-at-) dark and attend-to- (or target-at-) bright conditions, as a function of the time from the cue onset. The horizontal color bars indicate the significance (nonparametric cluster-based permutation tests, p < .05) as the pupillary response divergence deviated from zero. The boxplots show the median values, the interquartile range (IQR), IQR × 1.5, and outliers of the divergence latency estimated by the jackknife resampling technique. Exp = Experiment. in Experiment 2 than in Experiment 3 ( pTukey < .001, BF10 = 3.600e+15) or in Experiment 4 ( pTukey < .001, BF10 = 3.981e+9). DISCUSSION PLR Reflects Spatial Attentional Shift to Auditory Object In four experiments, we demonstrated that the PLR reflects the focus of covert auditory attention not only when attention is directed to a particular space by an endogenous spatial cue (Experiments 1 and 2) but also when it is allocated to a particular auditory object via non- spatial characteristics (e.g., gender; Experiments 2, 3, and 4). The finding was replicated regardless of whether the sounds were presented via headphones (Experiments 1, 2, and 3) or loudspeakers (Experiment 4). The time course of the PLR divergence (i.e., the differ- ence between attend-to-dark and attend-to-bright condi- tions; Figure 6) can be assumed to reflect the timing of the auditory spatial attention shift in participant’s planning process of goal-directed action. Given this assumption, we can explain the observed differences in PLR-divergence latencies among the conditions by the fol- lowing scenarios. When the spatial cue was provided explicitly (i.e., Experiment 1 and the attend-to-location condition in Experiment 2), participants started shifting their attention immediately after the cue presentation. The somewhat longer latency in Experiment 2 than in l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Possible Neural Mechanism of PLR Reflecting Auditory Spatial Attention and Its Implications The PLR has been considered primarily as a reflex, but it is not entirely accounted for by physical inputs. The neural pathway from the optic nerves to the oculomotor nerves controls the “reflex” response. When the photosensitive ganglion cells in the retina are activated by light, the infor- mation is transmitted via the optic nerves to the olivary pretectal nucleus (OPN) and projected to the Edinger– Westphal nucleus (EWN) in the midbrain. The EWN sup- plies preganglionic parasympathetic fibers to the eye, which exit with the oculomotor nerves and form synapses with neurons in the ciliary ganglion. Postsynaptic fibers of the parasympathetic root leave the ciliary ganglion in short ciliary nerves, which innervate the pupillary sphincter muscle to induce pupil constriction. In addition to this direct “reflex pathway,” several other modulatory inputs exist. Developmentally, the optic nerves are derived from the diencephalon, a division of the forebrain, although they are categorized as cranial nerves involved in sensory and somatic motor functions. Anatomically, the OPN receives inputs from the FEF in pFC (McDougal & Gamlin, 2015; Leichnetz, 1990; Huerta, Krubitzer, & Jon, 1986; Künzle & Akert, 1977). Recent evidence has further demonstrated that subthreshold electrical microstimulation of the FEF modulates the PLR (Ebitz & Moore, 2017). It has been proposed that the PLR is modulated by multiple cortical and subcortical pro- jections, including a direct projection from the FEF to OPN, or indirect projections involving the relayed areas such as occipital visual cortical areas or superior colliculus (SC) to the OPN or EWN ( Joshi & Gold, 2020; Binda & Gamlin, 2017). Recent evidence supports the account that emphasizes the contribution of the SC, especially in relation to orient- ing responses (Strauch, Wang, et al., 2022; Wang & Munoz, 2018). This SC-centered circuit receives top–down modu- lations from the FEF, ACC, and lateral intraparietal cortex and projects neural activities to the OPN and EWN, as men- tioned above. Microstimulation of the intermediate layers of the SC evokes not only pupil dilation orienting responses ( Wang, Boehnke, White, & Munoz, 2012) but also location-specific pupil luminance modulation ( Wang & Munoz, 2018). Particularly, Wang and Munoz (2018) used the visual display with constant global luminance, similar to the current study, and demonstrated that through altering activity of the intermediate layers of the SC via electrical stimulation (facilitation) and lidocaine injection (inhibition), pupil size was modulated by local luminance at the next fixated location (i.e., attended loca- tion). This serves as the neural basis of top–down modu- lated pupillary responses to the attended or expected luminance condition. The SC, together with the FEF and lateral intraparietal cortex, constitutes the neural network that takes charge of the control of visual selective attention and eye movements (Maunsell, 2015; Knudsen, 2011). In addition to visual attention, the SC is known to integrate multisensory information and have a cross-modal space map. The inferior colliculus, a major brainstem nucleus in the auditory pathway, transmits information that is essential for forming the space map in the SC (Cohen & Knudsen, 1999). The inferior colliculus also receives descending projections from the auditory cortex and thus could be under the influence of cognitive states, including attention (Huffman & Henson, 1990). Our finding of the PLR’s reflecting auditory spatial attention ties closely with the SC-centered account. Assuming that attention- modulated PLR serves for the preparation of the upcoming perception, in particular in the visual domain (Mathôt, 2018; Mathôt & Van der Stigchel, 2015), our finding implies that the spatial representation in auditory space is merged with the pupil-related spatial map in the visual domain. Conclusions The current study provides a method to infer auditory spa- tial attention by examining the luminance condition of the environment and pupillary response. The attention- modulated PLR reflects the dynamics of attentional shift to auditory objects. The overall results imply a unified audiovisual representation of spatial attention. Auditory object-based attention contains a space representation of the attended auditory object, even when the object is oriented without explicit spatial guidance. Reprint requests should be sent to Hsin-I Liao, NTT Communi- cation Science Laboratories, NTT Corporation, 3-1, Morinosato Wakamiya, Atsugi, Kanagawa 243-0198, Japan, or via e-mail: hsini.liao.pb@hco.ntt.co.jp. Data Availability Statement Data and analysis code are available at github.com/hsiniliao /PLR_AuditoryObject. Author Contributions H.-I. L. and S. F. developed the study concept, contributed to the study design, and wrote the manuscript. H.-I. L., H. F., and Y.-H. Y. conducted experiments and collected the data. H.-I. L. performed the data analysis. H. F., S. Y. and Y.-H. Y. provided critical comments. All authors approved the final version of the manuscript for submission. Diversity in Citation Practices Retrospective analysis of the citations in every article pub- lished in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender 288 Journal of Cognitive Neuroscience Volume 35, Number 2 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 identification of first author/last author) publishing in the Journal of Cognitive Neuroscience ( JoCN ) during this period were M(an)/M = .407, W(oman)/M = .32, M/ W = .115, and W/ W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/ W = .109, and W/ W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article’s gender citation balance. The authors of this article report its pro- portions of citations by gender category to be as follows: M/M = .707; W/M = .122; M/W = .049; W/W = .122. REFERENCES Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01 Best, V., Ozmeral, E. J., Kopčo, N., & Shinn-Cunningham, B. G. (2008). Object continuity enhances selective auditory attention. Proceedings of the National Academy of Sciences, U.S.A., 105, 13174–13178. https://doi.org/10.1073/pnas .0803718105, PubMed: 18719099 Best, V., Shinn-Cunningham, B., Ozmeral, E., & Kopco, N. (2010). Exploring the benefit of auditory spatial continuity. Journal of the Acoustical Society of America, 127, EL258–EL264. https://doi.org/10.1121/1.3431093, PubMed: 20550229 Binda, P., & Gamlin, P. D. (2017). Renewed attention on the pupil light reflex. Trends in Neurosciences, 40, 455–457. https://doi.org/10.1016/j.tins.2017.06.007, PubMed: 28693846 Binda, P., Pereverzeva, M., & Murray, S. O. (2013). Attention to bright surfaces enhances the pupillary light reflex. Journal of Neuroscience, 33, 2199–2204. https://doi.org/10.1523 /jneurosci.3440-12.2013, PubMed: 23365255 Binda, P., Pereverzeva, M., & Murray, S. (2014). Pupil size reflects the focus of feature-based attention. Journal of Neurophysiology, 112, 3046–3052. https://doi.org/10.1152/jn .00502.2014, PubMed: 25231615 Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. Journal of the Acoustical Society of America, 107, 1065–1066. https://doi.org/10.1121/1.428288, PubMed: 10687719 Braga, R. M., Fu, R. Z., Seemungal, B. M., Wise, R. J. S., & Leech, R. (2016). Eye movements during auditory attention predict individual differences in dorsal attention network activity. Frontiers in Human Neuroscience, 10, 164. https://doi.org /10.3389/fnhum.2016.00164, PubMed: 27242465 Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436. https://doi.org/10.1163 /156856897X00357, PubMed: 9176952 Bushara, K. O., Weeks, R. A., Ishii, K., Catalan, M.-J., Tian, B., Rauschecker, J. P., et al. (1999). Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nature Neuroscience, 2, 759–766. https://doi.org/10 .1038/11239, PubMed: 10412067 Cave, K. R., & Bichot, N. P. (1999). Visuospatial attention: Beyond a spotlight model. Psychonomic Bulletin & Review, 6, 204–223. https://doi.org/10.3758/BF03212327, PubMed: 12199208 Cohen, Y. E., & Knudsen, E. I. (1999). Maps versus clusters: Different representations of auditory space in the midbrain and forebrain. Trends in Neurosciences, 22, 128–135. https:// doi.org/10.1016/S0166-2236(98)01295-8, PubMed: 10199638 Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to visual locations: Identical, independent, or overlapping neural systems? Proceedings of the National Academy of Sciences, U.S.A., 95, 831–838. https://doi.org/10.1073/pnas.95.3.831, PubMed: 9448248 Downing, C. J. (1988). Expectancy and visual–spatial attention: Effects on perceptual quality. Journal of Experimental Psychology: Human Perception and Performance, 14, 188–202. https://doi.org/10.1037/0096-1523.14.2.188, PubMed: 2967876 Ebitz, R. B., & Moore, T. (2017). Selective modulation of the pupil light reflex by microstimulation of prefrontal cortex. Journal of Neuroscience, 37, 5008–5018. https://doi.org/10 .1523/jneurosci.2433-16.2017, PubMed: 28432136 Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics, 40, 225–240. https://doi.org/10 .3758/BF03211502, PubMed: 3786090 Fritz, J. B., Elhilali, M., David, S. V., & Shamma, S. A. (2007). Auditory attention—Focusing the searchlight on sound. Current Opinion in Neurobiology, 17, 437–455. https://doi .org/10.1016/j.conb.2007.07.011, PubMed: 17714933 Haab, O. (1886). Vortrag gesellschaft der ärzte in zürich am 21. November 1885. Korrespondenzblatt für Schweizer Ärzte, 16, 153. Hill, K. T., & Miller, L. M. (2009). Auditory attentional control and selection during cocktail party listening. Cerebral Cortex, 20, 583–590. https://doi.org/10.1093/cercor/bhp124, PubMed: 19574393 Huerta, M. F., Krubitzer, L. A., & Jon, H. C. K. (1986). Frontal Intracortical eye field as defined by microstimulation in squirrel monkeys, owl monkeys, and macaque monkeys: I. subcortical connections. Journal of Comparative Neurology, 253, 415–439. https://doi.org/10.1002/cne.902530402, PubMed: 3793998 Huffman, R. F., & Henson, O. W. (1990). The descending auditory pathway and acousticomotor systems: Connections with the inferior colliculus. Brain Research Reviews, 15, 295–323. https://doi.org/10.1016/0165-0173(90)90005-9, PubMed: 2289088 Joshi, S., & Gold, J. I. (2020). Pupil size as a window on neural substrates of cognition. Trends in Cognitive Sciences, 24, 466–480. https://doi.org/10.1016/j.tics.2020.03.005, PubMed: 32331857 Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience, 23, 788–799. https://doi.org/10.1038/s41593-020-0660-4, PubMed: 32601411 Kleiner, M., Brainard, D., & Pelli, D. (2007). What’s new in Psychtoolbox-3? Perception, 36, 1–16. Knudsen, E. I. (2011). Control from below: The role of a midbrain network in spatial attention. European Journal of Neuroscience, 33, 1961–1972. https://doi.org/10.1111/j.1460 -9568.2011.07696.x, PubMed: 21645092 Ko, H.-K., Snodderly, D. M., & Poletti, M. (2016). Eye movements between saccades: Measuring ocular drift and tremor. Vision Research, 122, 93–104. https://doi.org/10.1016 /j.visres.2016.03.006, PubMed: 27068415 Künzle, H., & Akert, K. (1977). Efferent connections of cortical, area 8 (frontal eye field) in Macaca fascicularis. A reinvestigation using the autoradiographic technique. Journal of Comparative Neurology, 173, 147–164. https://doi .org/10.1002/cne.901730108, PubMed: 403205 Laeng, B., & Endestad, T. (2012). Bright illusions reduce the eye’s pupil. Proceedings of the National Academy of Sciences, U.S.A., 109, 2162–2167. https://doi.org/10.1073/pnas .1118298109, PubMed: 22308422 Laeng, B., & Sulutvedt, U. (2014). The eye pupil adjusts to imaginary light. Psychological Science, 25, 188–197. https:// doi.org/10.1177/0956797613503556, PubMed: 24285432 Liao et al. 289 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 Larson, E., & Lee, A. K. C. (2014). Switching auditory attention Pelli, D. G. (1997). The VideoToolbox software for visual using spatial and non-spatial features recruits different cortical networks. Neuroimage, 84, 681–687. https://doi.org /10.1016/j.neuroimage.2013.09.061, PubMed: 24096028 Leichnetz, G. R. (1990). Preoccipital cortex receives a differential input from the frontal eye field and projects to the pretectal olivary nucleus and other visuomotor-related structures in the rhesus monkey. Visual Neuroscience, 5, 123–133. https://doi.org/10.1017/S095252380000016X, PubMed: 2177637 Liao, H.-I., Kashino, M., & Shimojo, S. (2021). Attractiveness in the eyes: A possibility of positive loop between transient pupil constriction and facial attraction. Journal of Cognitive Neuroscience, 33, 315–340. https://doi.org/10.1162/jocn_a _01649, PubMed: 33166194 Liao, H.-I., Kidani, S., Yoneya, M., Kashino, M., & Furukawa, S. (2016). Correspondences among pupillary dilation response, subjective salience of sounds, and loudness. Psychonomic Bulletin & Review, 23, 412–425. https://doi.org/10.3758 /s13423-015-0898-0, PubMed: 26163191 Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164, 177–190. https://doi.org/10.1016/j.jneumeth .2007.03.024, PubMed: 17517438 Mathôt, S. (2018). Pupillometry: Psychology, physiology, and function. Journal of Cognition, 1, 16. https://doi.org/10.5334 /joc.18, PubMed: 31517190 Mathôt, S., Dalmaijer, E., Grainger, J., & Van der Stigchel, S. (2014). The pupillary light response reflects exogenous attention and inhibition of return. Journal of Vision, 14, 7. https://doi.org/10.1167/14.14.7, PubMed: 25761284 Mathôt, S., van der Linden, L., Grainger, J., & Vitu, F. (2013). The pupillary light response reveals the focus of covert visual attention. PLoS One, 8, e78168. https://doi.org/10.1371 /journal.pone.0078168, PubMed: 24205144 Mathôt, S., & Van der Stigchel, S. (2015). New light on the mind’s eye: The pupillary light response as active vision. Current Directions in Psychological Science, 24, 374–378. https://doi.org/10.1177/0963721415593725, PubMed: 26494950 Maunsell, J. H. R. (2015). Neuronal mechanisms of visual attention. Annual Review of Vision Science, 1, 373–391. https://doi.org/10.1146/annurev-vision-082114-035431, PubMed: 28532368 McDougal, D. H., & Gamlin, P. D. (2015). Autonomic control of the eye. Comprehensive Physiology, 5, 439–473. https://doi .org/10.1002/cphy.c140014, PubMed: 25589275 Mondor, T. A., & Zatorre, R. J. (1995). Shifting and focusing auditory spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 387–409. https://doi .org/10.1037//0096-1523.21.2.387, PubMed: 7714479 Naber, M., Frässle, S., & Einhäuser, W. (2011). Perceptual rivalry: Reflexes reveal the gradual nature of visual awareness. PLoS One, 6, e20910. https://doi.org/10.1371/journal.pone .0020910, PubMed: 21677786 Naber, M., & Nakayama, K. (2013). Pupil responses to high-level image content. Journal of Vision, 13, 7. https://doi.org/10 .1167/13.6.7, PubMed: 23685390 Noyce, A. L., Kwasa, J. A. C., & Shinn-Cunningham, B. G. (2022). Defining attention from an auditory perspective. WIREs Cognitive Science, e1610. https://doi.org/10.1002/wcs.1610, PubMed: 35642475 Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011, 156869. https://doi.org/10.1155/2011/156869, PubMed: 21253357 psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. https://doi.org/10.1163/156856897X00366, PubMed: 9176953 Posner, M., Snyder, C., & Davidson, B. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160–174. https://doi.org/10.1037/0096-3445 .109.2.160, PubMed: 7381367 Quinlan, P. T., & Bailey, P. J. (1995). An examination of attentional control in the auditory modality: Further evidence for auditory orienting. Perception & Psychophysics, 57, 614–628. https://doi.org/10.3758/BF03213267, PubMed: 7644322 Rhodes, G. (1987). Auditory attention and the representation of spatial information. Perception & Psychophysics, 42, 1–14. https://doi.org/10.3758/BF03211508, PubMed: 3658631 Salvaggio, S., Andres, M., Zénon, A., & Masson, N. (2022). Pupil size variations reveal covert shifts of attention induced by numbers. Psychonomic Bulletin & Review, 29, 1844–1853. https://doi.org/10.3758/s13423-022-02094-0, PubMed: 35384595 Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive Sciences, 12, 182–186. https://doi.org/10.1016/j.tics.2008.02.003, PubMed: 18396091 Smith, D. V., Davis, B., Niu, K., Healy, E. W., Bonilha, L., Fridriksson, J., et al. (2010). Spatial attention evokes similar activation patterns for visual and auditory stimuli. Journal of Cognitive Neuroscience, 22, 347–361. https://doi.org/10.1162 /jocn.2009.21241, PubMed: 19400684 Spence, C. J., & Driver, J. (1994). Covert spatial orienting in audition: Exogenous and endogenous mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 20, 555–574. https://doi.org/10.1037/0096-1523 .20.3.555 Spence, C., & Driver, J. (1996). Audiovisual links in endogenous covert spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 22, 1005–1030, https://doi.org/10.1037/0096-1523.22.4.1005, PubMed: 8756965 Strauch, C., Romein, C., Naber, M., Van der Stigchel, S., & Ten Brink, A. F. (2022). The orienting response drives pseudoneglect—Evidence from an objective pupillometric method. Cortex, 151, 259–271. https://doi.org/10.1016/j .cortex.2022.03.006, PubMed: 35462203 Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S., & Naber, M. (2022). Pupillometry as an integrated readout of distinct attentional networks. Trends in Neurosciences, 45, 635–647. https://doi.org/10.1016/j.tins.2022.05.003, PubMed: 35662511 Suzuki, Y., Minami, T., Laeng, B., & Nakauchi, S. (2019). Colorful glares: Effects of colors on brightness illusions measured with pupillometry. Acta Psychologica, 198, 102882. https://doi.org /10.1016/j.actpsy.2019.102882, PubMed: 31288107 Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136. https:// doi.org/10.1016/0010-0285(80)90005-5, PubMed: 7351125 Wang, C. A., Boehnke, S. E., White, B. J., & Munoz, D. P. (2012). Microstimulation of the monkey superior colliculus induces pupil dilation without evoking saccades. Journal of Neuroscience, 32, 3629–3636. https://doi.org/10.1523 /jneurosci.5512-11.2012, PubMed: 22423086 Wang, C.-A., & Munoz, D. P. (2018). Neural basis of location-specific pupil luminance modulation. Proceedings of the National Academy of Sciences, U.S.A., 115, 10446–10451. https://doi.org/10.1073/pnas.1809668115, PubMed: 30249636 Zatorre, R. J., Mondor, T. A., & Evans, A. C. (1999). Auditory attention to space and frequency activates similar cerebral systems. Neuroimage, 10, 544–554. https://doi.org/10.1006 /nimg.1999.0491, PubMed: 10547331 290 Journal of Cognitive Neuroscience Volume 35, Number 2 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / j / o c n a r t i c e - p d l f / / / / 3 5 2 2 7 6 2 0 6 7 0 1 8 / j o c n _ a _ 0 1 9 3 5 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3Seeing an Auditory Object: Pupillary Light Response image
Seeing an Auditory Object: Pupillary Light Response image
Seeing an Auditory Object: Pupillary Light Response image
Seeing an Auditory Object: Pupillary Light Response image
Seeing an Auditory Object: Pupillary Light Response image
Seeing an Auditory Object: Pupillary Light Response image
Seeing an Auditory Object: Pupillary Light Response image

Scarica il pdf