Pupil Diameter Tracks the Exploration–Exploitation
Trade-off during Analogical Reasoning and Explains
Individual Differences in Fluid Intelligence
Taylor R. Hayes and Alexander A. Petrov
Abstract
■ The ability to adaptively shift between exploration and exploi-
tation control states is critical for optimizing behavioral perfor-
mance. Converging evidence from primate electrophysiology
and computational neural modeling has suggested that this ability
may be mediated by the broad norepinephrine projections ema-
nating from the locus coeruleus (LC) [Aston-Jones, G., & Cohen,
J. D. An integrative theory of locus coeruleus-norepinephrine
function: Adaptive gain and optimal performance. Annual Review
of Neuroscience, 28, 403–450, 2005]. There is also evidence that
pupil diameter covaries systematically with LC activity. Although
imperfect and indirect, this link makes pupillometry a useful tool
for studying the locus coeruleus norepinephrine system in hu-
mans and in high-level tasks. Here, we present a novel paradigm
that examines how the pupillary response during exploration and
exploitation covaries with individual differences in fluid intelli-
gence during analogical reasoning on Raven’s Advanced Progressive
Matrices. Pupillometry was used as a noninvasive proxy for LC
activity, and concurrent think-aloud verbal protocols were used
to identify exploratory and exploitative solution periods. This
novel combination of pupillometry and verbal protocols from
40 participants revealed a decrease in pupil diameter during exploi-
tation and an increase during exploration. The temporal dynamics
of the pupillary response were characterized by a steep increase
during the transition to exploratory periods, sustained dilation
for many seconds afterward, and a gradual return to
baseline. Moreover, the individual differences in the relative mag-
nitude of pupillary dilation accounted for 16% of the variance in
Advanced Progressive Matrices scores. Assuming that pupil diam-
eter is a valid index of LC activity, these results establish promising
preliminary connections between the literature on locus coeruleus
norepinephrine-mediated cognitive control and the literature on
analogical reasoning and fluid intelligence. ■
INTRODUCTION
The ability to adaptively regulate the balance between
exploration and exploitation is critical for optimizing
behavior in the diverse, dynamic environments we encoun-
ter on a daily basis. Despite the ubiquity of the exploration–
exploitation trade-off and its broad importance in under-
standing executive control, the neural mechanisms involved
are still not well understood (Cohen, McClure, & Yu, 2007;
Berridge & Waterhouse, 2003). Recent animal and human
studies suggest that the locus coeruleus norepinephrine
(LC-NE) system may support the exploration–exploitation
trade-off, but the work is limited by the use of low-level tasks
(Jepma & Nieuwenhuis, 2011; Gilzenrat, Nieuwenhuis,
Jepma, & Cohen, 2010; Aston-Jones & Cohen, 2005). Here,
we present a unique paradigm that examines how the
pupillary response during exploration and exploitation
covaries with individual differences in fluid intelligence
(Gf ) by combining pupillometry and verbal protocol anal-
ysis during analogical reasoning on Raven’s Advanced
Progressive Matrices (APM; Raven, Raven, & Court, 1998).
Expanding the study of the exploration–exploitation trade-
off to a high-level analogical reasoning task and employ-
ing an individual differences approach provided novel
insights into the relationship between the exploration–
exploitation trade-off, noradrenergic function, and indi-
vidual differences in Gf.
Ohio State University
© 2015 Massachusetts Institute of Technology
Converging evidence suggests that the LC-NE system plays
a central role in mediating the exploration–exploitation
trade-off (Cohen et al., 2007; Aston-Jones & Cohen, 2005).
Specifically, the LC-NE system is thought to monitor for
unexpected uncertainty and actively mediate the shift
between exploration and exploitation in response to
reward history (Aston-Jones & Cohen, 2005). Much of
the current theory of LC-NE function is based on monkey
electrophysiological recordings (see Aston-Jones & Cohen,
2005, for a review) and computational models thereof
(Brown, Gilzenrat, & Cohen, 2005; Usher, Cohen, Servan-
Schreiber, Rajkowski, & Aston-Jones, 1999). The electro-
physiological data suggested that locus coeruleus (LC)
neurons in monkeys exhibit distinct firing patterns that
lie along a continuum of task performance from offline to
phasic to tonic modes.
Journal of Cognitive Neuroscience 28:2, pp. 308–318
doi:10.1162/jocn_a_00895
Offline mode occurs when the animal is drowsy or
actively sedated and is characterized by low levels of LC
activity and poor task performance. In phasic mode, the
baseline firing rate remains low, but the LC neurons fire
phasic bursts of activity synchronized to task-relevant
events; the animals exhibit high task performance. Finally,
the tonic mode is characterized by high baseline rates of
LC firing, poor task performance, exploratory behaviors,
and indiscriminate sensitivity to both task-related and
task-unrelated stimuli. These electrophysiological find-
ings have been incorporated into a broader theory of
LC function in which the LC mediates the exploration–
exploitation trade-off in response to online assessments
of task utility (adaptive gain theory, AGT; Aston-Jones &
Cohen, 2005). AGT postulates that the LC actively me-
diates the gain of cortical units through the release of nor-
epinephrine to promote exploitation via phasic mode or
exploration via tonic mode (Aston-Jones & Cohen, 2005).
Although AGT provides an elegant account of the existing
data, it is limited by a paucity of corroborative human
studies to validate and test the theorized link between
LC function and the exploration–exploitation trade-off.
One major obstacle to studying LC function in humans
is identifying a noninvasive method for measuring LC
activity. Recently, pupil diameter has emerged as one
promising noninvasive measure for LC activity and is being
increasingly employed for this purpose (e.g., Cheadle et al.,
2014; Eldar, Cohen, & Niv, 2013; Jepma & Nieuwenhuis,
2011; Einhäuser, Koch, & Carter, 2010; Gilzenrat et al.,
2010; Einhäuser, Stout, Koch, & Carter, 2008). Neuro-
imaging work has shown that the pupil diameter covaries
with fMRI BOLD activity in the LC (Murphy, O’Connell,
O’Sullivan, Robertson, & Balsters, 2014) and that both P3
ERP and pupil diameter are sensitive to LC-NE modes of
task engagement (Cheadle et al., 2014; Murphy, Robertson,
Balsters, & O’Connell, 2011). Converging evidence from
electrophysiology (e.g., Rajkowski, Kubiak, & Aston-Jones,
1994) and pharmacology (e.g., Phillips, Szabadi, &
Bradshaw, 2000; Koss, 1986) also suggests that pupil
diameter correlates with LC activity in animals. The anatom-
ical pathways linking the LC and the pupil are a topic of
ongoing research but probably involve α2-adrenoreceptor-
mediated inhibition of the parasympathetic Edinger–
Westphal nucleus responsible for pupil constriction (Samuels
& Szabadi, 2008).
Here, we extend the study of the exploration–exploitation
trade-off and LC function by tracking real-time shifts in
exploration and exploitation in a rich, temporally extended
analogical reasoning task, Raven’s APM (Raven et al., 1998).
The APM is a geometric analogy test with excellent psycho-
metric properties (Brouwers, Van de Viver, & Van Hemert,
2009) that has been a popular and trusted instrument in
psychology for 70 years (e.g., Hayes, Petrov, & Sederberg,
2011, 2015; Gray, Chabris, & Braver, 2003; Carpenter,
Just, & Shell, 1990). Raven’s APM is an excellent envi-
ronment to induce strong shifts between exploration and
exploitation because it repeatedly places participants in
an unfamiliar relational environment in which they must
engage in relational foraging to attempt to construct and/
or identify the correct answer.
Although the term “relational foraging” is not com-
monly used in the literature on relational reasoning and
Gf, it is pertinent to geometric analogy problems such as
Raven’s APM (Figure 1, left). The problem-solving pro-
cess is a search of an abstract problem space and involves
the formulation of hypotheses and subgoals that either
succeed or fail (e.g., Taatgen, Huss, Dickison, & Anderson,
2008; Newell & Simon, 1976). There are deep structural
parallels, although not strict isomorphism, with rein-
forcement learning concepts such as reward, punishment,
exploration, and exploitation. Indeed, reinforcement learn-
ing algorithms are commonly used to learn the utilities of
production rules in production systems (e.g., Taatgen,
2013; Anderson et al., 2004). These structural similarities
provide connections with independently developed theo-
ries of LC-NE function.
Specifically in Raven’s APM, the relational foraging
process consists of an extended series of goals and
subgoals, in which hypothesized patterns are extracted
from one part of the problem matrix and tested on
others. If the hypothesized pattern generalizes to an-
other part of the matrix, the current pattern receives
reinforcement. If the test fails, then a new pattern hy-
pothesis must be generated (Carpenter et al., 1990). This
iterative strategy is used to extract all the relations
contained in a given Raven item or as many relations as
needed to narrow the number of possible responses. This
process has been formalized in models of matrix reason-
ing (Lovett, Tomai, Forbus, & Usher, 2009; Carpenter
et al., 1990).
Importantly, for our present purposes, the relational
foraging process maps onto key aspects of AGT. Each
Raven problem is a miniature environment with a precise
definition of optimal performance (i.e., pinpointing the
item that best completes the relational pattern), fluctua-
tions in task utility over time that result from testing
hypothesized patterns, and implicit reinforcement re-
ceived when hypothesized patterns generalize or fail to
generalize. Last but not least, the use of Raven’s APM
instead of simpler reinforcement learning tasks is meth-
odologically beneficial for a pupillometric study because
it produces more extended periods of exploration and
exploitation, which are better suited to the relatively
low temporal resolution of the pupillary response.
In the current study, pupil diameter was recorded as
an indirect proxy for LC activity on each trial. Periods of
exploration and exploitation were identified with the aid
of think-aloud verbal protocols (Ericsson & Simon, 1993)
that were collected while the participants solved each
Raven problem. The results revealed a decrease in pupil-
lary response during exploitative periods and a significant
increase during exploratory periods.
This pattern is consistent with prominent theories of
LC-NE system function and provides the first evidence
that this system may be involved in cognitive control of
the exploration–exploitation trade-off during analogical
reasoning. Moreover, the individual differences in the
exploratory pupillary dilation could account for 16% of
the variation in APM scores across participants.
Figure 1. Raven problem format and trial sequence. (Left) The problem matrix and the eight response alternatives are shown with solid lines.
The height of the rectangular box around the matrix subtended 9° of visual angle. This example item (generated by the authors) contains three
relations that must be extracted: distribution of three shapes (diamond, triangle, parallelogram), distribution of three line orientations (0°, 45°, 90°),
and decreasing line number down columns (3 → 2 → 1). (Right) Each trial had three phases: fixation, solution, and response. Participants fixated
for 1 sec. Eye movements and concurrent think-aloud verbal protocols were collected during the solution phase. Moving the mouse cursor out
of the fixation box triggered the response phase, during which the problem matrix was masked, and the participant clicked on their chosen answer.
The intertrial interval (ITI) was 200 msec.
METHODS
This study was conducted in a larger context of several
related experiments that combined think-aloud verbal
protocols and eye tracking to investigate the role of stra-
tegic cognitive control during visual relational reasoning
on Raven’s APM (Hayes, 2015; Hayes et al., 2011, 2015).
These experiments involved multiple sessions and vari-
ous manipulations from Session 2 onward, but the first
session was always the same: To establish a common
baseline, eye-tracking and think-aloud protocols were
collected while the participants worked on 14 Raven prob-
lems as detailed below. This study is based exclusively on
data from this common baseline session. Although other
aspects of this large and multifaceted data set have been
published elsewhere (Hayes et al., 2011, 2015), the pupil-
lometric and verbal protocol aspects are reported here
for the first time.
Participants
One hundred thirty-six students at the Ohio State Uni-
versity participated in the experiments outlined above
(Hayes, 2015). They responded to recruitment flyers
posted in the Ohio State University Psychology Building
and were paid $6 per hour plus $1 bonus for each correct
answer to a Raven problem. Sixteen participants did not
consistently provide think-aloud protocols throughout
each trial and were excluded from further consideration
for the present purposes. Because of the labor-intensive
nature of verbal protocol preprocessing, only 40 sessions’
worth of verbal protocols were coded and analyzed. Thus,
all results reported below are based on a random stratified
sample of 40 participants (20 women and 20 men).
The distribution of Gf scores in the large sample (N =
120) was partitioned into four ability groups as follows: high
(APM scores of 13–14), medium-high (scores of 11–12),
medium-low (scores of 8–10), and low (scores of ≤7) abili-
ties. Ten participants were then drawn at random from each
ability group, and the verbal protocols and pupillometric
data from their first (baseline) session were processed.
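The stratified draw described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed seed, and the toy pool of 120 scores are hypothetical and are not the authors' procedure or data, and the balancing of women and men mentioned in the text is not modeled.

```python
import random

def ability_group(apm_score):
    """Map a short-form APM score (0-14) to the four ability groups in the text."""
    if apm_score >= 13:
        return "high"
    if apm_score >= 11:
        return "medium-high"
    if apm_score >= 8:
        return "medium-low"
    return "low"

def stratified_sample(scores, per_group=10, seed=0):
    """Draw per_group participants at random from each ability group.

    scores: dict mapping participant id -> APM score.
    Returns a dict mapping group name -> sorted list of sampled ids.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = {}
    for pid, score in scores.items():
        groups.setdefault(ability_group(score), []).append(pid)
    return {g: sorted(rng.sample(sorted(ids), per_group))
            for g, ids in groups.items()}

# Hypothetical pool: 120 participants, scores 3..14 each occurring 10 times.
pool = {pid: 3 + pid % 12 for pid in range(120)}
sample = stratified_sample(pool)
```

With four groups of ten, the draw yields the 40 participants analyzed, regardless of how unevenly the groups are populated in the larger sample.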
Stimuli
The participants completed a short-form test from Raven’s
APM Set II (Raven et al., 1998). Participants either com-
pleted Items 2, 4, 6, 9, 10, 11, 16, 17, 19, 21, 23, 24,
26, and 29 or Items 1, 3, 5, 7, 12, 13, 14, 15, 18, 20, 22,
25, 27, and 28. The participant instructions followed the
Raven APM Manual guidelines for untimed individual test
administration (Raven et al., 1998). The two 14-item sub-
sets of the complete (36-item) APM were chosen to be
approximately matched for difficulty on the basis of their
psychometric characteristics published in the manual
(Raven et al., 1998). There were no statistically significant
differences in the respective distributions of scores in our
sample (Hayes, 2015).
Apparatus
The Raven items were presented on a 21-in. NEC AccuSync
120 color CRT using Experiment Builder (SR Research,
Mississauga, Canada). Participants viewed the items bino-
cularly from a chin-and-forehead rest located 935 mm
away. Gaze position and pupil response data were recorded
from the left eye using an EyeLink 1000 desktop eye tracker
(SR Research) at a sampling rate of 250 Hz. The experi-
mental room had a constant ambient illuminance with
25 lux incident at participants’ eyes to control for the pupil-
lary light reflex. Image analysis of the Raven APM items
revealed high luminance consistency across the 28 Raven
test items (grayscale intensity: M = 0.96, SD = 0.02) and
across the individual matrix and response cells within each
item (grayscale intensity: M = 0.90, SD = 0.04). Therefore,
we did not alter the luminance properties of the original
Raven APM test images, preserving their original psycho-
metric properties.
Verbal protocols were recorded for each Raven item
using a Shure Beta 58A supercardioid dynamic micro-
phone (Shure, Inc., Niles, IL) and E-MU 0202 audio inter-
face (E-MU, Scotts Valley, CA) controlled via Experiment
Builder (SR Research). The microphone was placed close
to the participant’s mouth (≈5 cm) using a telescop-
ing boom tripod microphone stand to provide clear
audio recordings. The concurrent think-aloud verbal
protocols were collected according to standard think-
aloud procedures (Ericsson & Simon, 1993). After the
participants received instructions on thinking aloud, they
practiced it on unrelated items such as multiplication prob-
lems until the experimenter was confident they under-
stood the instructions.
Procedure
Before the study, participants completed the EyeLink
1000 9-point calibration procedure. Each Raven item was
preceded by a beep and fixation cross (similar to the Eye-
Link 1000 9-point calibration procedure) that appeared in
the middle of the screen (Figure 1, right). The fixation
screen was equal in luminance to the subsequent Raven
item to avoid luminance changes at stimulus onset. After
the participant fixated for 1 sec, which allowed for equip-
ment recalibration, the Raven problem appeared, and the
participant had unlimited time to work on it. Once par-
ticipants had chosen an answer, they used the mouse to
click on one of the eight responses, thereby ending the
trial. Moving the mouse out of the fixation box triggered
an isoluminant mask to be drawn over the problem matrix,
which delineated solution and response phases. No accu-
racy feedback was provided until the very end of the exper-
imental session to avoid feedback-induced pupillary
dilations. Accuracy and solution time data were collected
for each trial. Accuracy was defined as the total number
of Raven items answered correctly, and solution time was
measured from stimulus onset until response selection.
Pupil diameter and gaze position were recorded through-
out (i.e., from pretrial fixation through the end of the inter-
trial interval).
Pupil Data Preprocessing
Before analysis, the pupillary data were corrected for
blink artifacts and pupil foreshortening error. Following
standard procedures, pupillary measurements were first
filtered for blink artifacts, linearly interpolated, and then
smoothed for measurement noise (Klingner, 2010; Beatty
& Lucero-Wagoner, 2000). In addition, the pupil data were
corrected to account for pupil foreshortening error—the
systematic foreshortening of the pupil image as the eye
rotates away from the eye-tracking camera (Hayes &
Petrov, 2015). Pupil foreshortening error must be cor-
rected before analysis because solving Raven items re-
quires the participants to freely scan the screen. The
pupil foreshortening error correction described in detail
in Hayes and Petrov (2015) fits a geometric model that
expresses the pupil foreshortening as a function of the
cosine of the angle between the eye-to-camera axis and
the eye-to-stimulus axis. In calibration studies with arti-
ficial eyes with known, fixed pupil diameters (Hayes &
Petrov, 2015), the geometric correction successfully
reduced the root mean squared error in pupil diameter
estimates by 97.5% when the model parameters were
optimized to fit the empirical error surface. The cali-
bration results strongly indicated that the pupil fore-
shortening error is invariant across changes in pupil size
and systematically varies as a function of the orientation
of the eye with respect to the camera. The results cor-
responded well with previous empirical measurements of
pupil foreshortening error in biological human eyes (Mathur,
Gehrmann, & Atchison, 2013; Jennings & Charman, 1978;
Jay, 1962; Spring & Stiles, 1948). Together, these findings
suggest the geometric correction can be used to virtually
eliminate pupil foreshortening error. In this study, we first
performed artifact correction to measure the apparent
pupil diameter and then applied the optimized geometric
model correction from Hayes and Petrov (2015) to esti-
mate the true pupil diameter.
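To make the geometry concrete, the following is a minimal sketch of a cosine-based foreshortening correction. It is an assumption-laden illustration, not the optimized model of Hayes and Petrov (2015): the multiplicative form (apparent diameter = true diameter × cos(θ) raised to an exponent), the default exponent of 0.5, and all coordinates below are hypothetical placeholders.

```python
import math

def cos_between_axes(eye, camera, target):
    """Cosine of the angle between the eye-to-camera and eye-to-target axes."""
    to_camera = [c - e for c, e in zip(camera, eye)]
    to_target = [t - e for t, e in zip(target, eye)]
    dot = sum(a * b for a, b in zip(to_camera, to_target))
    norm_camera = math.sqrt(sum(a * a for a in to_camera))
    norm_target = math.sqrt(sum(b * b for b in to_target))
    return dot / (norm_camera * norm_target)

def correct_foreshortening(apparent, eye, camera, target, exponent=0.5):
    """Estimate true pupil diameter from the apparent (foreshortened) one.

    Assumes apparent = true * cos(theta) ** exponent; the exponent is a
    hypothetical free parameter, not a value taken from the paper.
    """
    cos_theta = cos_between_axes(eye, camera, target)
    if cos_theta <= 0:
        raise ValueError("gaze target is 90 degrees or more away from the camera axis")
    return apparent / cos_theta ** exponent

# Hypothetical layout in millimeters: eye at the origin, screen 935 mm away,
# camera below and in front of the screen.
eye = (0.0, 0.0, 0.0)
camera = (0.0, -200.0, 600.0)
screen_center = (0.0, 0.0, 935.0)
screen_corner = (200.0, 150.0, 935.0)
center = correct_foreshortening(4.0, eye, camera, screen_center)
corner = correct_foreshortening(4.0, eye, camera, screen_corner)
```

In this sketch the same apparent diameter is scaled up more when gaze is directed farther from the camera axis, which is the qualitative behavior the correction is meant to capture.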
Verbal Protocol Coding
Concurrent think-aloud verbal protocols were used to
segment each trial into exploration and exploitation solu-
tion periods. A broad coding scheme was developed to
assist the coder in identifying exploration and exploita-
tion periods during each Raven trial. Exploration periods
were indicated by utterances that described isolated
Raven image features (e.g., “Alright, it looks like we have
a bunch of circles and squares….”) or expressed uncer-
tainty (“Not sure what is going on here…I don’t see any
patterns yet.”). Exploitation periods were indicated by
utterances that described a specific pattern within the
Raven item (e.g., “In each line it looks like we have circle
and diamond and square…. So on this we have square…
circle…So the bottom should be a diamond.”). Many
transitions from exploration to exploitation were signaled
by insight language (“Oh, I see!”). Early Raven items that
were easier to solve often only contained one exploration-
to-exploitation shift, whereas later more difficult Raven
items contained multiple transitions between exploring
and exploiting. On these more difficult items, subse-
quent transitions from exploiting back to exploring were
preceded by participants realizing either that a pattern
extracted on one row of the Raven problem matrix did
not generalize to the subsequent rows or that there
was no response option that matched the final solution
they had in mind. In these cases, the return to exploration
was signaled by failure utterances (e.g., “But that doesn’t
match the second row” or “ok, it doesn’t look like that is
even one of the possible options”) followed by a transition
back to uncertainty utterances and/or isolated feature
descriptions.
A semiautomated coding routine was developed in
MATLAB (The MathWorks, Natick, MA) and was used to
code all verbal protocol data. In this routine, for each trial,
the human expert coder would be presented with an
image of the relevant APM item while the recorded verbal
protocol audio was played back in real time. The coder
served as an “exploration detector” pressing one key to
indicate the beginning of an exploratory period and
another key to indicate the end of an exploratory period
and the beginning of an exploitative period. The begin-
ning of a trial was coded as neutral before any key presses.
The MATLAB routine would then convert the time-stamped
key presses into a code stream that contained the neutral
(0), exploratory (+1), and exploitative (−1) codes for that
verbal protocol, each sampled at 250 Hz. This procedure
was completed for all participants (n = 40) and trials
(n = 14), resulting in 560 individual protocol code streams.
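The conversion from time-stamped key presses to a sampled code stream can be sketched as below. The original routine was written in MATLAB and is not reproduced here; this Python version, its function name, and the toy key presses are hypothetical and only illustrate the coding convention (0 = neutral, +1 = exploratory, −1 = exploitative, 250 samples per second).

```python
def code_stream(key_presses, trial_duration_s, rate_hz=250):
    """Expand time-stamped key presses into one code per sample.

    key_presses: list of (time_in_seconds, code) pairs sorted by time,
                 where code is +1 (exploration onset) or -1 (exploitation onset).
    Samples before the first key press are coded 0 (neutral).
    """
    n_samples = int(round(trial_duration_s * rate_hz))
    stream = [0] * n_samples
    # Convert press times to sample indices once, to avoid float comparisons.
    onsets = [(int(round(t * rate_hz)), code) for t, code in key_presses]
    current, next_press = 0, 0
    for i in range(n_samples):
        while next_press < len(onsets) and onsets[next_press][0] <= i:
            current = onsets[next_press][1]
            next_press += 1
        stream[i] = current
    return stream

# Toy trial: 1 sec neutral, 2 sec exploring, then 1 sec exploiting.
stream = code_stream([(1.0, +1), (3.0, -1)], trial_duration_s=4.0)
counts = {code: stream.count(code) for code in (0, +1, -1)}
```

Sampling the codes at the eye tracker's rate means each code stream lines up sample for sample with the pupil recording of the same trial.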
Recent studies that have examined the effect of think-
ing aloud relative to silent control conditions have not
shown any pupillary effect of vocalization (Hertzum &
Holmegaard, 2013; Kammerer & Gerjets, 2013). Therefore,
no distinction was made in the verbal protocol coding
between periods of vocalization and gaps in vocalization.
All coding was performed by the first author (T. R. H.).
He did not have access to any pupillometric data while he
was coding the verbal protocols. Coder reliability was
assessed by coding the data from five randomly sampled
participants twice. The recoding was done approximately
1 full year after the original coding. The intrarater reli-
ability for T. R. H. across the two coding sessions was high
(mean % agreement = 82.16, 95% CI [80.17, 84.15]). This
suggests that the coding scheme was applied consistently.
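One simple way to quantify agreement between two codings of the same trial is the percentage of samples that receive the same code. The text does not state exactly how percent agreement was computed, so the sample-wise definition below is an assumption, and the two short streams are toy data.

```python
def percent_agreement(stream_a, stream_b):
    """Percentage of samples on which two equally long code streams agree."""
    if len(stream_a) != len(stream_b):
        raise ValueError("code streams must have the same length")
    if not stream_a:
        raise ValueError("code streams must not be empty")
    matches = sum(a == b for a, b in zip(stream_a, stream_b))
    return 100.0 * matches / len(stream_a)

# Toy example: the second coding marks the switch to exploitation one sample earlier.
first_coding = [0, 1, 1, -1, -1]
second_coding = [0, 1, -1, -1, -1]
agreement = percent_agreement(first_coding, second_coding)
```

Under this definition, disagreements arise mainly from small differences in where the coder places a transition, so long segments contribute mostly agreeing samples.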
Synchronizing Pupil and Verbal Protocol Streams
To synchronize the pupillary response stream with the
verbal protocol code stream, three sources of latency
were considered: participant latency, coder latency, and
pupillary response/LC latency. Participant latency refers
to the latency that occurs because of a participant pro-
cessing the APM item information and transforming it
into an utterance. Participant latency unfortunately can-
not be accounted for in our study because it is known
to vary across individuals and types of processing steps
and, therefore, will invariably add some noise to our data
(Ericsson & Simon, 1993). In contrast, coder latency and
LC pupillary response latency could be and were accounted for before
analysis. Coder response latency refers to the processing
time it takes for the verbal protocol coder to process
what they are hearing, make the decision to switch
codes, and then actually press the key on the keyboard.
To estimate this value, a random sample of 50 trials was
used to compare the coder key RT stamps to the original
audio time series using audio editing software (Apple,
Cupertino, CA). The results showed a coder response
latency of approximately 1 sec (M = 1014 msec, SD =
198 msec). Finally, we considered the documented lag
between LC activity and the pupillary response. Single-
cell studies of LC neurons show that LC activity is tightly
linked to stimulus onset, with a lag of only ≈200 msec
(Clayton, Rajkowski, Cohen, & Aston-Jones, 2004; Rajkowski,
Majczynski, Clayton, & Aston-Jones, 2004). However, the
temporal resolution of the pupillary response is much
lower than that of LC neurons. The pupil acts as a low-pass
filter of LC activity with a lag of approximately 1 sec after
stimulus onset (Hayes & Petrov, submitted; van Steenbergen
& Band, 2013; Gagl, Hawelka, & Hutzler, 2011). As the
coder and pupillary response latencies were approxi-
mately equivalent (each about 1 sec), no additional pre-
processing was necessary to synchronize the pupil and
code streams before analysis.
Segmentation of the Pupillary Data
Finally, the pupillary data were segmented according to
the exploratory and exploitative periods obtained from
the verbal protocols. First, a baseline pupil diameter
was calculated for each segment as follows: The baseline
for the first nonneutral (exploratory) segment at the begin-
ning of each trial was computed as the average pupil
diameter during the first 500 msec of that segment to pro-
vide a more accurate baseline estimate as participants
began the trial. The baseline for all subsequent segments
until the end of the trial was computed as the average pupil
diameter during the last 1000 msec of the immediately
preceding segment.
Our main dependent variable is the percent change in
pupil diameter (PCPD) relative to the relevant (most recent)
baseline. The PCPD is measured in dimensionless units
and is invariant with respect to the considerable individual
differences in absolute pupil diameter as well as to slow
drifts in pupillary tone. Specifically, the PCPD was com-
puted as the task-evoked diameter minus the baseline di-
ameter, divided by the baseline diameter. The mean PCPD
was calculated by averaging the PCPD time series within
each exploratory or exploitative segment for each partici-
pant on each trial (Beatty & Lucero-Wagoner, 2000).
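The baseline rule and the PCPD computation described above can be sketched as follows. The function names, the segment representation as (start, end, code) sample indices, and the toy pupil trace are assumptions made for illustration. The value returned is the fractional change exactly as defined in the text; multiplying by 100 would express it in percent.

```python
def mean(values):
    return sum(values) / len(values)

def segment_pcpd(pupil, segments, rate_hz=250):
    """Mean PCPD for each non-neutral segment of one trial.

    pupil:    list of pupil diameters, one per sample.
    segments: list of (start, end, code) in temporal order, end exclusive,
              with code +1 (explore) or -1 (exploit).
    Returns a list of (code, mean_pcpd) pairs.
    """
    first_n = int(0.5 * rate_hz)  # first 500 msec of the first segment
    last_n = int(1.0 * rate_hz)   # last 1000 msec of the preceding segment
    results = []
    for k, (start, end, code) in enumerate(segments):
        if k == 0:
            baseline = mean(pupil[start:start + first_n])
        else:
            prev_start, prev_end, _ = segments[k - 1]
            baseline = mean(pupil[max(prev_start, prev_end - last_n):prev_end])
        pcpd = [(d - baseline) / baseline for d in pupil[start:end]]
        results.append((code, mean(pcpd)))
    return results

# Toy trace: 2 sec exploring at 4.0 mm, then 2 sec exploiting at 3.8 mm.
pupil = [4.0] * 500 + [3.8] * 500
pcpd_by_segment = segment_pcpd(pupil, [(0, 500, +1), (500, 1000, -1)])
```

Because each segment is referenced to the samples just before it, the measure reflects the change at each transition and is not dominated by slow drifts across the trial.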
RESULTS
The accuracy and RT data replicated well-documented
patterns in the literature on Raven’s APM (e.g., Bors &
Vigneau, 2003; Carpenter et al., 1990). There were sub-
stantial individual differences in overall APM scores, and
the trial-by-trial accuracy decreased whereas RTs in-
creased for the later, more difficult problems on the test.
The verbal protocols indicated a slightly greater number
of exploration than exploitation periods (990 explore,
945 exploit, 560 neutral). This is because the first non-
neutral period on a trial was always exploratory, the
two types alternated thereafter, and some trials ended in
exploration mode. However, the exploitation periods were,
on average, longer in duration (exploit: M = 25.9 sec, SD =
25.2 sec; explore: M = 18.7 sec, SD = 14.4 sec; neutral:
M = 1.75 sec, SD = 0.9 sec).
Figure 2. Comparison of group-averaged PCPD for exploratory and exploitative periods by APM score, and a scatterplot of exploratory PCPD by individual APM score.
(Left) Mean PCPD from baseline for exploratory and exploitative periods, averaged across all 40 participants and for subgroups at four ability levels (n = 10 for each subgroup).
The group averages revealed a decrease in PCPD during exploitative periods and an increase during exploratory periods. The latter increase was significantly greater in
the higher ability subgroups. The error bars represent ±1 SEM. (Right) Individual differences in APM score were correlated with individual
differences in mean PCPD during the exploratory periods.
A significant boost in mean PCPD was observed during
exploration periods relative to exploitation periods (Fig-
ure 2, left). A repeated-measures ANOVA with Segment
type (explore vs. exploit) as a fixed factor and Partici-
pant as a random factor confirmed a strong exploration/
exploitation effect on the PCPD (F(1, 39) = 71.9, p <
.001, ηp² = 0.65). A pair of one-tailed t tests confirmed that
the exploration effect was significantly greater than zero
(t(39) = 7.59, p < .001, r² = .59) and the exploitation
effect was significantly less than zero (t(39) = −4.90, p <
.001, r² = .38). Under the linking hypothesis that PCPD is
a valid index of LC-NE function, which in turn mediates
the exploration–exploitation trade-off, this novel finding
suggests that this LC-NE mediation also operates during
high-level analogical reasoning.
Furthermore, the mean exploratory PCPD increased
linearly as a function of fluid reasoning ability as indexed
by the APM. The steady increase is evident in both the
group-level (Figure 2, left) and individual-level (Figure 2,
right) data. A linear regression with mean exploratory
PCPD as the sole predictor accounted for 16% of the variance
in individual APM scores (F(1, 38) = 7.05, p = .01,
r² = .16). Under the linking hypothesis outlined above,
this additional novel finding suggests that individual dif-
ferences in the mediation of the exploration–exploitation
trade-off may contribute to individual differences in Gf.
By contrast, no significant trends were observed in the
exploitative pupillary response as a function of fluid
ability.
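For a regression with a single predictor, the variance accounted for and the F statistic are tied together by r² = F / (F + df_error), so the reported F(1, 38) = 7.05 corresponds to r² ≈ .16. A minimal Python sketch follows. The helper names `simple_regression` and `r_squared_from_f` are hypothetical, and the five data points are toy values for illustration only.

```python
import statistics


def simple_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns (slope, intercept, r_squared, F), with F on (1, n - 2) df.
    """
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    f_value = r_squared / (1 - r_squared) * (n - 2)
    return slope, intercept, r_squared, f_value


def r_squared_from_f(f_value, df_error):
    """Recover r squared from a reported single-predictor F statistic."""
    return f_value / (f_value + df_error)


# Toy data (assumption: not the study's PCPD or APM values).
slope, intercept, r_squared, f_value = simple_regression(
    [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
)
print(slope, intercept, r_squared, f_value)

# Reported values from the text: F(1, 38) = 7.05.
print(round(r_squared_from_f(7.05, 38), 2))  # 0.16
```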
The underlying temporal dynamics of the pupillary re-
sponse revealed that these patterns in the averaged PCPD
data were driven by sustained rather than momentary
changes in pupil diameter (Figure 3). Figure 3A shows
Figure 3. Temporal dynamics
of the pupillary response
near segment transition
boundaries. (Left) Group-
averaged PCPD, time-locked
to exploration and exploitation
onset. The dashed vertical
line at 0 sec indicates the
onset of the new segment
as identified in the verbal
protocols. Transitions to
exploration (black line) were
accompanied by steep,
sustained pupillary dilation,
whereas transitions to
exploitation (white line)
showed only a slow, steady
decrease in pupil diameter.
The semitransparent gray error bands delineate 95% within-subject confidence limits. (Right) PCPD averaged within 5-sec-wide bins as a function
of time spent exploring or exploiting. The error bars represent ±1 SEM.
the grand-mean PCPD time-locked to exploration and
exploitation onset (see Note 1). The transitions from exploratory to
exploitative segments were not associated (on average)
with steep changes in pupil diameter. Rather, the exploit-
ative segments exhibited (on average) a slow steady
decrease in pupil diameter, depicted by the white line
in Figure 3A. By contrast, the transitions from neutral/
exploitative to exploratory segments were accompanied
by steep increases in pupil diameter (black line) that
began before exploratory language became manifest in
the verbal protocols, as can be seen from the positive
slope of the black line near the transition boundary in
Figure 3A. This suggests that exploration likely preceded
verbalization (on average). As this same period was taken as
the baseline for subsequent change in pupil diameter, the
PCPD values used in the statistical analyses may under-
estimate the magnitude of the pupillary dilation during
exploration.
Furthermore, the exploratory pupil dilation seemed to
persist for many seconds into the exploration period, as
depicted in Figure 3 (right). We interpret this sustained
pupillary dilation as a marker of a temporally extended
exploratory state as opposed to a transient event.
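The binned summary in Figure 3 (right) can be thought of as averaging PCPD samples within consecutive 5-sec windows measured from segment onset. The sketch below shows one way to do this in Python. The function name, the (time, value) sample format, and the toy samples are assumptions made for illustration, not the authors' pipeline.

```python
from collections import defaultdict


def bin_means(samples, bin_width=5.0):
    """Average values within consecutive time bins.

    samples: iterable of (seconds_since_segment_onset, pcpd) pairs.
    Returns {bin_start_seconds: mean_pcpd}, ordered by bin start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in samples:
        if t < 0:
            continue  # ignore samples recorded before segment onset
        index = int(t // bin_width)
        sums[index] += value
        counts[index] += 1
    return {i * bin_width: sums[i] / counts[i] for i in sorted(sums)}


# Toy samples (assumption: illustration only).
toy = [(0.5, 1.0), (2.0, 3.0), (5.0, 4.0), (9.9, 6.0), (12.0, 2.0)]
print(bin_means(toy))  # {0.0: 2.0, 5.0: 5.0, 10.0: 2.0}
```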
In the behavioral data, we expected that both mean
error rate and mean solution time would increase as a
function of trial number according to the progressive
nature of Raven’s test and in agreement with previous
findings (e.g., Bors & Vigneau, 2003; Carpenter et al.,
1990). A one-tailed Pearson’s product–moment correlation
test confirmed that trial number accounted for a
significant amount of variance in both error rate (t(12) =
8.42, p < .001, r² = .85) and solution time (t(12) = 9.13,
p < .001, r² = .87).
Given the strong trial effect on error rate and solution
time, we tested for a trial difficulty effect on the pupillary
response during exploration and exploitation. A trend
analysis revealed a significant linear decrease in mean
PCPD as a function of within-subject trial number during
the exploration periods (F(1, 507) = 41.01, p < .001, ηp² =
.08), whereas no statistically significant trend was detected
during the exploitation periods (F(1, 507) =
1.08, p = .299). This negative relationship between the
magnitude of the exploratory pupillary dilation, on the
one hand, and trial number, on the other, can be attri-
buted to the much longer solution times on later, more
difficult trials. Although the exploratory dilation could
be sustained, on average, for at least 20 sec (Figure 3B),
many exploratory periods were considerably longer on difficult
trials, eventually diluting the exploratory PCPD increase.
To investigate this further, Figure 4 presents some
basic descriptive statistics about the number and dura-
tion of exploratory and exploitative segments. Both quan-
tities increased as a function of trial number (and trial
difficulty). The earliest trials typically exhibited only one
brief exploration period followed by a single brief exploi-
tation period. On the more difficult middle and late
items, however, the participants tended to alternate multiple
times between exploring and exploiting. Trend
analyses revealed significant linear (F(1, 507) = 193.38,
p < .001, ηp² = .27) and quadratic (F(1, 507) = 21.02,
p < .001, ηp² = .04) trends in the total number of transitions
between exploration and exploitation as a function
of Trial. Analogous analyses also revealed significant
linear and quadratic trends in exploration duration (linear:
F(1, 468) = 32.72, p < .001, ηp² = .06; quadratic: F(1,
468) = 46.82, p < .001, ηp² = .09) and exploitation duration
(linear: F(1, 468) = 18.21, p < .001, ηp² = .04; quadratic:
F(1, 468) = 26.47, p < .001, ηp² = .05).
Furthermore, there was evidence for interactions be-
tween the difficulty of the test items and the Gf of the
participants as indexed by their APM scores. Recall that
the participants were sampled from four ability groups.
Repeated-measures ANOVAs with Group as a between-
subject factor and Trial as a within-subject factor showed
significant Trial × Group interactions for the number of
transitions (F(39, 468) = 1.65, p < .01, ηp² = .12), exploration
duration (F(39, 468) = 1.83, p < .01, ηp² = .13),
exploitation duration (F(39, 468) = 2.22, p < .01, ηp² =
.16), and total solution time (F(39, 468) = 2.37, p < .001,
ηp² = .16). These Ability × Difficulty interactions reflected
Figure 4. Mean number and
duration of exploration and
exploitation periods as a
function of trial number.
(Top) The mean number of
exploration and exploitation
periods increased as the Raven
problems got progressively
more difficult. (Bottom) The
mean period duration also
increased as trial difficulty
increased, reflecting the
increasingly complex figural
elements and relations
characteristic of the most
difficult problems. The error
bars on both panels represent
±1 SD.
314
Journal of Cognitive Neuroscience
Volume 28, Number 2
the differential ability of the participants to engage with
the most difficult items (Trials 11–14). High-ability par-
ticipants would struggle yet work through those difficult
items over a lengthy sequence of alternating exploration
and exploitation periods, whereas lower ability partici-
pants were prone to become overwhelmed, take a guess,
and terminate the trial after a comparatively short effort.
These Ability × Difficulty interactions raise a possible
alternative explanation for the correlation between explor-
atory PCPD and ability group depicted in Figure 2A above.
It is possible that this correlation might simply be driven
by the difference in performance between high- and low-
ability participants on the most difficult items. However,
the following analysis suggests that this is unlikely. When
the mean exploratory PCPD, taken across Trials 1–10 only,
excluding the most difficult trials (Trials 11–14), was used
as a predictor in a linear regression, it accounted for 20%
of the variance in individual APM scores (F(1, 38) = 9.28,
p < .01, r2 = .20). Recall that the mean exploratory PCPD
across all 14 trials accounted for 16% of this variance.
Therefore, when the high- and low-ability participants
spent the same amount of time exploring, the correlation
between exploratory PCPD and APM scores actually in-
creased. This rules out the late trials as a potential con-
founding factor. If anything, the random guessing on the
most difficult trials in the lower ability groups probably
adds nonsystematic variance that degrades the correlation.
We also checked the so-called time-on-task effect as a
potential confound. Prior studies have found that pupil
diameter can decrease systematically during the experi-
mental session (Hayes & Petrov, submitted; Beatty, 1982;
Kahneman & Beatty, 1967). It should be noted that these
studies used low-level perceptual tasks with subsecond
RTs (e.g., vigilance, auditory discrimination, visual-motion
discrimination). These monotonous simple tasks are quite
different from Raven’s APM, which is designed to vary the
figural material constantly to measure fluid (as opposed to
crystallized) intelligence. It has been suggested that
the decreasing pupil size in earlier studies may be a result
of decreasing arousal as participants get bored with the
task (Laeng, Sirois, & Gredeback, 2012; Beatty & Lucero-
Wagoner, 2000). Some participants in our study worked
for over 5 min on some of the (difficult) Raven problems,
which raises the concern of a within-trial time-on-task
effect as a potentially confounding factor in our data. To
estimate the magnitude of the time-on-task effect over
the course of individual trials, we performed a series of
robust linear regressions (see Note 2) on the pupillary diameter as a
function of the time since each stimulus onset. This pro-
duced one slope-parameter estimate per trial. These were
averaged across trials to produce one aggregate slope
estimate per participant. The latter estimates did not dif-
fer significantly from zero (t(39) = 1.18, two-tailed p =
.25). In addition, recall that the exclusion of the four lon-
gest trials from the analysis only strengthened the pattern
in Figure 2A. Being the longest, these trials should be the
most vulnerable to a possible time-on-task effect. Overall,
this confound does not seem a viable explanation of our
results. Apparently, the consistently novel and challenging
nature of the Raven task kept our participants engaged
throughout each trial and throughout the session as a
whole.
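The time-on-task check described above (a robust regression slope per trial, averaged within each participant, then a one-sample t test across participants) can be sketched as follows. This is a simplified pure-Python illustration of iteratively reweighted least squares with bisquare weights. The tuning constant 4.685, the MAD-based scale estimate, the stopping rules, and all function names are assumptions rather than details reported in the article.

```python
import math
import statistics


def weighted_line_fit(t, y, w):
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(w)
    mean_t = sum(wi * ti for wi, ti in zip(w, t)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - mean_t) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - mean_t) * (yi - mean_y) for wi, ti, yi in zip(w, t, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def robust_slope(t, y, c=4.685, max_iter=50, tol=1e-10):
    """Slope of y on t via IRLS with bisquare weights (sketch)."""
    slope, intercept = weighted_line_fit(t, y, [1.0] * len(t))  # OLS start
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
        med = statistics.median(resid)
        # Robust scale: median absolute deviation, rescaled for normal data.
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale <= 1e-12:  # most points already fit (almost) exactly
            break
        cutoff = c * scale
        w = [(1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
             for r in resid]
        new_slope, new_intercept = weighted_line_fit(t, y, w)
        converged = abs(new_slope - slope) < tol
        slope, intercept = new_slope, new_intercept
        if converged:
            break
    return slope


def one_sample_t(values):
    """t statistic for testing whether the mean of values differs from 0."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


# Toy trial (assumption: illustration only): a linear drift of 0.5 per
# second with one gross outlier in the final sample.
times = list(range(10))
diameters = [2.0 + 0.5 * s for s in times]
diameters[-1] = 100.0
print(robust_slope(times, diameters))
```

In this toy trial the outlier pulls the ordinary least-squares slope well above 0.5, whereas the robust estimate stays close to the slope of the remaining points. In an actual analysis, `robust_slope` would be applied to each trial, the slopes averaged within each participant, and `one_sample_t` applied to the per-participant averages.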
Finally, we checked whether the exploration and ex-
ploitation periods differed in terms of missing values in
the pupillometric time series and in terms of saccade fre-
quency. Missing values occur when the eye tracker tem-
porarily loses the pupil, for example, because of blink
artifacts. Such artifacts were rejected during preprocessing.
The artifact frequency in the raw pupil data
was similar for exploration (M = 11.18%, SD = 12.42% of
the period) and exploitation (M = 10.96%, SD = 12.47%,
t(39) = 0.50, p = .62). Saccade frequency was significantly
lower during exploration (M = 10.03%, SD = 5.11%
of the period) compared with exploitation (M =
11.19%, SD = 5.58%, t(39) = 6.28, p < .001). However, we
are not aware of any studies showing a systematic effect
of saccade frequency on the pupillary response, and the
difference of roughly 1 percentage point is not likely to
account for the large exploration/
exploitation effect in our data. Saccades produce a
known risk of pupil foreshortening error as they change
the gaze position, but this source of systematic error was
corrected during preprocessing (Hayes & Petrov, 2015).
DISCUSSION
A novel combination of pupillometry and verbal protocol
analysis was used to compare changes in pupil diameter
during exploration and exploitation control states during
visual analogy making. The analysis revealed a significant
increase in pupil diameter during exploration and de-
crease during exploitation. This broad finding is the first
to generalize theories of the LC-NE system’s role in the
exploration–exploitation trade-off to a high-level analogi-
cal reasoning task such as Raven’s APM. More impor-
tantly, individual differences in the relative magnitude
of exploratory pupillary dilation accounted for 16% of
the variance in APM scores. This novel result suggests
that individual differences in general Gf may be related
to underlying differences in noradrenergic function.
Our findings build upon and are consistent with pre-
vious studies of the exploration–exploitation trade-off
that monitored the pupillary response as a noninvasive
index of the LC-NE system (Jepma & Nieuwenhuis, 2011;
Gilzenrat et al., 2010). Gilzenrat et al. (2010) had partici-
pants complete an auditory pitch discrimination task in
which reward increased as the pitch discrimination diffi-
culty increased, until the discrimination eventually became
impossible. Importantly, their participants were allowed
the option to escape before each trial. Escaping would
reset the reference tone, difficulty, and reward levels. They
found that baseline pupil diameter increased leading up to
escape trials and decreased afterward. This is consistent
with our observation of a decrease during exploitation
and increase during exploration. However, the effect size
was modest, and Gilzenrat et al. (2010) suggested that this
may be because of the escape manipulation not sufficiently
emulating exploration. In a follow-up study, Jepma and
Nieuwenhuis (2011) tracked the exploration–exploitation
trade-off during a dynamic n-armed bandit gambling task
in which participants repeatedly had to choose to play
one of four slot machines with nonstationary rewards.
Although the dynamic n-armed bandit task strongly pro-
moted shifts in the exploration–exploitation trade-off,
gaze position was not controlled during bandit selection,
and reward feedback was visually presented immediately
after selection. To avoid measurement artifacts, Jepma
and Nieuwenhuis (2011) restricted their pupillary response
analysis to the pretrial baseline period only. The main
result showed an overall increase in baseline pupillary
response before exploratory trial choices (i.e., trials where
participants switched their bandit choice) compared with
exploitative trial choices (i.e., trials in which participants
picked the same bandit). The present data add converging
indirect support for the role of the LC-NE system in the
exploration–exploitation trade-off.
Our study builds upon this earlier work (Jepma &
Nieuwenhuis, 2011; Gilzenrat et al., 2010) in three important
ways. First, it generalizes the exploration–exploitation
trade-off findings from animal electrophysiology and
lower level human tasks to a high-level visual analogy-
making task. The previously studied tasks such as percep-
tual discrimination (Gilzenrat et al., 2010) and forced
choice (Jepma & Nieuwenhuis, 2011) are relatively simple
tasks with subsecond RTs. Raven’s APM provides a much
richer task environment with solutions that often unfold
over minutes and produce extended periods of explora-
tion and exploitation. These more temporally extended
periods are better suited to the lower temporal resolution
of the pupillary response as an index for the LC-NE system.
In addition, exploration–exploitation shifts in Raven’s
APM were often triggered by insight moments or pattern
failures providing a sharper boundary compared with pre-
vious studies where the explore–exploit transitions were
more gradual in nature. Second, the combination of think-
aloud verbal protocols and pupillometry allowed for pupil-
lometric analysis of exploration–exploitation shifts as they
happened. Combining these diverse data sources and ad-
dressing limitations of earlier human studies (i.e., removing
overt task feedback and correcting for pupil foreshortening
error) allowed for tighter experimental control over the
pupillary response during control state shifts. Last but not
least, Raven’s APM test is strongly correlated with a major
dimension of individual differences—Gf. This allowed for
a novel examination of how the exploration–exploitation
trade-off and noradrenergic function may contribute to
individual differences in Gf.
Examining how individual differences in APM score
covary with the pupillary response during control state
shifts expands the domain in a novel direction and offers
a plausible explanation for past inconsistencies in the
literature. Recent work (Van Der Meer et al., 2010) has
indicated that high fluid-intelligence individuals have
larger task-evoked pupillary responses when performing
difficult tasks. This supports the view that people with high
Gf may simply have more cognitive resources that can be
recruited during demanding tasks (resource hypothesis;
Van Der Meer et al., 2010). Earlier work (Ahern & Beatty,
1979, 1981) showed the opposite pattern in which higher
intelligence individuals showed smaller task-evoked pupil-
lary responses than those with average intelligence. This
supports the view that high-intelligence individuals use their
cognitive resources more efficiently (efficiency hypothesis;
Ahern & Beatty, 1979, 1981).
Our results do not directly refute either of these earlier
hypotheses but offer a third account—a control hypoth-
esis. Higher fluid-ability individuals may be better able to
regulate their task-relevant control state. Our finding that
the exploratory boost in pupil diameter covaried with Gf
opens up the interesting possibility that individual differ-
ences in Gf may be related to individual differences in
mediating control state through stronger shifts in neural
gain. The control hypothesis offers a parsimonious expla-
nation for the conflicting earlier findings on the relation-
ship between intelligence and pupillary response. In
tasks that require exploration (such as the geometric
analogy task used by Van Der Meer et al., 2010), high-
Gf individuals who shift into higher gain states will have
larger task-evoked pupillary responses than low-Gf indi-
viduals. On the other hand, overlearned tasks that primarily
require exploitation (such as the mental multiplication and
digit span tasks used by Ahern & Beatty, 1979, 1981) are easier
for high-Gf than low-Gf individuals. This produces a smaller
task-evoked pupillary response in high-Gf individuals. Al-
though our study does not directly bear on the role of
the pupillary response during overlearned tasks, there
are many cognitive load studies indicating that easier tasks
induce smaller pupillary response than difficult tasks (see
Beatty & Lucero-Wagoner, 2000, for a review).
One limitation of our individual differences finding is
that both exploratory pupillary response and Gf were
measured simultaneously on a common task. Although
Raven’s APM is a strong psychometric test (e.g.,
Brouwers et al., 2009), it is not a noise-free measure of
Gf (cf. Hayes et al., 2015). Therefore, it is possible that
the exploratory pupillary response and Gf may share
error variance because of other factors such as partici-
pant motivation or alertness. This study is a first step,
but it will be important to test in future research whether
its findings replicate in a design that measures Gf and
exploratory pupillary response on independent tasks
(e.g., use Raven’s APM on Day 1 to assess Gf and an iso-
luminant foraging task on Day 2 to assess the exploratory
pupillary response).
In conclusion, by combining verbal protocol analysis
and pupillometry, we identified and tracked shifts in the
exploration–exploitation trade-off during analogical rea-
soning on Raven’s APM fluid intelligence test. The results
showed decreased pupil diameter during exploitation
and increased diameter during exploration, consistent
with prominent theories of LC-NE function. Importantly,
one sixth of the variance in Raven scores was accounted
for by individual differences in exploratory pupillary dila-
tion. These findings shed new light on the relationship
between the exploration–exploitation trade-off, nor-
adrenergic function, and individual differences in Gf.
Acknowledgments
This research was supported by the National Eye Institute (R21
EY022745).
Reprint requests should be sent to Taylor R. Hayes, Center for
Mind and Brain, University of California, Davis, CA 95618, or via
e-mail: taylor.r.hayes@gmail.com.
Notes
1. Note that the pupil baseline is applied retroactively to the
data points before the segment transition boundary in Figure 3A.
This is for plotting purposes only. In the statistical analyses, these
data points were incorporated into the preceding segment.
2. Each regression used iteratively reweighted least squares with
a bisquare weighting function. Ordinary regressions yielded sim-
ilar results.
REFERENCES
Ahern, S. K., & Beatty, J. (1979). Pupillary responses during
information processing vary with scholastic aptitude test
scores. Science, 205, 1289–1292.
Ahern, S. K., & Beatty, J. (1981). Physiological evidence that
demand for processing capacity varies with intelligence. In
M. P. Friedman, J. P. Das, & N. O’Connor (Eds.), Intelligence
and learning (pp. 121–128). New York: Plenum.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere,
C., & Qin, Y. (2004). An integrated theory of the mind.
Psychological Review, 111, 1036–1060.
Aston-Jones, G., & Cohen, J. D. (2005). An integrative theory of
locus coeruleus-norepinephrine function: Adaptive gain and
optimal performance. Annual Review of Neuroscience,
28, 403–450.
Beatty, J. (1982). Phasic not tonic pupillary responses vary
with auditory vigilance performance. Psychophysiology,
19, 167–172.
Beatty, J., & Lucero-Wagoner, B. (2000). The pupillary system.
In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.),
Handbook of psychophysiology (2nd ed., pp. 142–162).
Cambridge: Cambridge University Press.
Berridge, C. W., & Waterhouse, B. D. (2003). The locus
coeruleus-noradrenergic system: Modulation of behavioral
state and state-dependent cognitive processes. Brain
Research Reviews, 42, 33–84.
Bors, D. A., & Vigneau, F. (2003). The effect of practice on
Raven’s advanced progressive matrices. Learning and
Individual Differences, 13, 291–312.
Brouwers, S. A., Van de Vijver, F. J. R., & Van Hemert, D. A.
(2009). Variation in Raven’s progressive matrices scores
across time and place. Learning and Individual Differences,
19, 330–338.
Brown, E. T., Gilzenrat, M. S., & Cohen, J. D. (2005). The locus
coeruleus, adaptive gain, and the optimization of simple
decision tasks (Technical Report). Princeton, NJ: Princeton
University.
Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one
intelligence test measures: A theoretical account of the
processing in the Raven Progressive Matrices test.
Psychological Review, 97, 404–431.
Cheadle, S., Wyart, V., Tsetsos, K., Myers, N., de Gardelle, V.,
Castanon, S. H., et al. (2014). Adaptive gain control during
human perceptual choice. Neuron, 81, 1429–1441.
Clayton, E. C., Rajkowski, J., Cohen, J. D., & Aston-Jones, G.
(2004). Phasic activation of monkey locus coeruleus neurons
by simple decisions in a forced-choice task. Journal of
Neuroscience, 24, 9914–9920.
Cohen, J. D., McClure, S. M., & Yu, A. J. (2007). Should I stay or
should I go? How the human brain manages the trade-off
between exploitation and exploration. Philosophical
Transactions of the Royal Society, Series B, Biological
Sciences, 362, 933–942.
Einhäuser, W., Koch, C., & Carter, O. L. (2010). Pupil dilation
betrays the timing of decisions. Frontiers in Human
Neuroscience, 4, 1–9.
Einhäuser, W., Stout, J., Koch, C., & Carter, O. (2008). Pupil
dilation reflects perceptual selection and predicts subsequent
stability in perceptual rivalry. Proceedings of the National
Academy of Sciences, U.S.A., 105, 1704–1709.
Eldar, E., Cohen, J. D., & Niv, Y. (2013). The effects of neural gain on
attention and learning. Nature Neuroscience, 16, 1146–1153.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal
reports as data (Rev. ed.). Cambridge, MA: MIT Press.
Gagl, B., Hawelka, S., & Hutzler, F. (2011). Systematic influence
of gaze position on pupil size measurement: Analysis and
correction. Behavior Research Methods, 43, 1171–1181.
Gilzenrat, M. S., Nieuwenhuis, S., Jepma, M., & Cohen, J. D.
(2010). Pupil diameter tracks changes in control state
predicted by the adaptive gain theory of locus coeruleus
function. Cognitive, Affective & Behavioral Neuroscience,
10, 252–269.
Gray, J. R., Chabris, C. F., & Braver, T. S. (2003). Neural
mechanisms of general fluid intelligence. Nature
Neuroscience, 6, 316–322.
Hayes, T. R. (2015). Mechanisms of visual relational reasoning
(Unpublished doctoral dissertation). The Ohio State
University, Columbus, OH.
Hayes, T. R., & Petrov, A. A. (2015). Mapping and correcting the
influence of gaze position on pupil size measurements.
Behavior Research Methods, 1–18. Advance online
publication. doi:10.3758/s13428-015-0588-x.
Hayes, T. R., & Petrov, A. A. (submitted). Learning is in the eye
of the beholder: Phasic pupil diameter decreases during
perceptual learning.
Hayes, T. R., Petrov, A. A., & Sederberg, P. B. (2011). A novel
method for analyzing sequential eye movements reveals
strategic influence on Raven’s Advanced Progressive Matrices.
Journal of Vision, 11, 1–11.
Hayes, T. R., Petrov, A. A., & Sederberg, P. B. (2015). Do we
really become smarter when our fluid-intelligence scores
improve? Intelligence, 48, 1–14.
Hertzum, M., & Holmegaard, K. D. (2013). Thinking aloud
in the presence of interruptions and time constraints.
International Journal of Human–Computer Interaction,
29, 351–364.
Jay, B. S. (1962). The effective pupillary area at varying
perimetric angles. Vision Research, 1, 418–424.
Jennings, J. A., & Charman, W. N. (1978). Optical image quality
in the peripheral retina. American Journal of Optometry
and Physiological Optics, 55, 582–590.
Jepma, M., & Nieuwenhuis, S. (2011). Pupil diameter predicts
changes in exploration–exploitation trade-off: Evidence for
the adaptive gain theory. Journal of Cognitive Neuroscience,
23, 1587–1596.
Kahneman, D., & Beatty, J. (1967). Pupillary responses in a
pitch-discrimination task. Perception & Psychophysics, 2,
101–105.
Kammerer, Y., & Gerjets, P. (2013). The role of thinking-aloud
instructions and prior domain knowledge in information
processing and source evaluation during Web search. In
M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.),
Proceedings of the 35th Annual Conference of the
Cognitive Science Society (pp. 716–721). Austin, TX:
Cognitive Science Society.
Klingner, J. (2010). Measuring cognitive load during visual
tasks by combining pupillometry and eye tracking
(Unpublished doctoral dissertation). Stanford University,
Stanford, CA.
Koss, M. (1986). Pupillary dilation as an index of central nervous
system α2-adrenoceptor activation. Journal of Pharmacological
Methods, 15, 1–19.
Laeng, B., Sirois, S., & Gredeback, G. (2012). Pupillometry:
A window to the preconscious? Perspectives on Psychological
Science, 7, 18–27.
Lovett, A., Tomai, E., Forbus, K., & Usher, J. (2009). Solving
geometric analogy problems through two-stage analogical
mapping. Cognitive Science, 33, 1192–1231.
Mathur, A., Gehrmann, J., & Atchison, D. A. (2013). Pupil shape
as viewed along the horizontal visual field. Journal of Vision,
13, 1–8.
Murphy, P. R., O’Connell, R. G., O’Sullivan, M., Robertson,
I. H., & Balsters, J. H. (2014). Pupil diameter covaries with
BOLD activity in human locus coeruleus. Human Brain
Mapping, 35, 4140–4154.
Murphy, P. R., Robertson, I. H., Balsters, J. H., & O’Connell,
R. G. (2011). Pupillometry and P3 index of locus coeruleus
noradrenergic arousal function in humans. Psychophysiology,
48, 1531–1542.
Newell, A., & Simon, H. A. (1976). Computer science as
empirical inquiry: Symbols and search. Communications of
the Association for Computing Machinery, 19, 113–126.
Phillips, M. A., Szabadi, E., & Bradshaw, C. M. (2000). Comparison
of the effects of clonidine and yohimbine on spontaneous
pupillary fluctuations in healthy human volunteers.
Psychopharmacology, 150, 85–89.
Rajkowski, J., Kubiak, P., & Aston-Jones, G. (1994). Locus coeruleus
activity in monkey: Phasic and tonic changes are associated
with altered vigilance. Brain Research Bulletin, 35, 607–616.
Rajkowski, J., Majczynski, H., Clayton, E., & Aston-Jones, G.
(2004). Activation of monkey locus coeruleus neurons varies
with difficulty and performance in a target detection task.
Journal of Neurophysiology, 92, 361–371.
Raven, J. C., Raven, J., & Court, J. H. (1998). Manual for
Raven’s progressive matrices and vocabulary scales. Section
4: Advanced progressive matrices. San Antonio, TX: Pearson.
Samuels, E. R., & Szabadi, E. (2008). Functional neuroanatomy
of the noradrenergic locus coeruleus: Its roles in the regulation
of arousal and autonomic function part II: Physiological and
pharmacological manipulations and pathological alterations of
locus coeruleus activity in humans. Current Neuropharmacology,
6, 254–285.
Spring, K. H., & Stiles, W. S. (1948). Apparent shape and size of
the pupil viewed obliquely. British Journal of Ophthalmology,
32, 347–354.
Taatgen, N. A. (2013). The nature and transfer of cognitive
skills. Psychological Review, 120, 439–471.
Taatgen, N. A., Huss, D., Dickison, D., & Anderson, J. R. (2008).
The acquisition of robust and flexible cognitive skills. Journal
of Experimental Psychology: General, 137, 548–565.
Usher, M., Cohen, J. D., Servan-Schreiber, D., Rajkowski, J., &
Aston-Jones, G. (1999). The role of locus coeruleus in the
regulation of cognitive control. Science, 283, 549–554.
Van Der Meer, E., Beyer, R., Horn, J., Foth, M., Bornemann, B.,
Ries, J., et al. (2010). Resource allocation and fluid intelligence:
Insights from pupillometry. Psychophysiology, 47, 158–169.
van Steenbergen, H., & Band, G. P. H. (2013). Pupil dilation in
the Simon task as a marker of conflict processing. Frontiers in
Human Neuroscience, 7, 1–11.