Don’t Think, Just Feel the Music: Individuals with Strong
Pavlovian-to-Instrumental Transfer Effects Rely Less on
Model-based Reinforcement Learning

Miriam Sebold1,2, Daniel J. Schad1,3, Stephan Nebe4, Maria Garbusow1,2, Elisabeth Jünger4,
Nils B. Kroemer4,5,6, Norbert Kathmann2, Ulrich S. Zimmermann4, Michael N. Smolka4,
Michael A. Rapp3, Andreas Heinz1, and Quentin J. M. Huys7,8

Abstract

■ Behavioral choice can be characterized along two axes. One
axis distinguishes reflexive, model-free systems that slowly accu-
mulate values through experience and a model-based system
that uses knowledge to reason prospectively. The second axis
distinguishes Pavlovian valuation of stimuli from instrumental
valuation of actions or stimulus–action pairs. This results in four
values and many possible interactions between them, with important
consequences for accounts of individual variation. We
here explored whether individual variation along one axis was
related to individual variation along the other. Specifically, we
asked whether individuals’ balance between model-based and
model-free learning was related to their tendency to show
Pavlovian interferences with instrumental decisions. In two in-
dependent samples with a total of 243 participants, Pavlovian–
instrumental transfer effects were negatively correlated with the
strength of model-based reasoning in a two-step task. This sug-
gests a potential common underlying substrate predisposing in-
dividuals to both have strong Pavlovian interference and be less
model-based and provides a framework within which to inter-
pret the observation of both effects in addiction. ■

INTRODUCTION

Pavlovian expectations of rewards or losses richly color
and confound instrumental action choice. Background
music is deployed in shops and restaurants to promote
spending and specific choices, whereas stimuli associated
with addictive substances are thought to perpetuate use
and promote relapse. Individual variation in the nature of
the underlying decision-making systems likely deter-
mines the strength of these effects.

1Charité-Universitätsmedizin Berlin, 2Humboldt-Universität zu
Berlin, 3University of Potsdam, 4Technische Universität Dresden,
5Yale University School of Medicine, 6The John B. Pierce Laboratory,
New Haven, CT, 7University of Zurich, 8ETH Zürich

Decision-making in humans and animals can be characterized
along at least two axes, both of which are important
for individual variation (Dayan & Berridge, 2014; Huys,
Tobler, Hasler, & Flagel, 2014). The first axis concerns
the distinction between model-free (MF) and model-based
(MB) decision-making (Doll, Duncan, Simone, Shohamy, &
Daw, 2015; Lee, Shimojo, & O’Doherty, 2014; Dezfouli &
Balleine, 2013; Daw, Gershman, Seymour, Dayan, & Dolan,
2011; Glascher, Daw, Dayan, & O’Doherty, 2010). The MF
habit system learns through repeated experience, whereas
the MB goal-directed system uses an internal model to
prospectively reason about the value of actions. Computationally,
MF decision-making relies on temporal difference
learning: Values are learned through comparisons of estimated
and actual received reward and updated with prediction
errors. In MB reinforcement learning algorithms,
the computation of values happens on the fly, integrating
internal representations of state-action-reward probabilities
and rewards (Sutton & Barto, 1998). MB decision-making
is therefore computationally costly, whereas MF
decision-making is experientially demanding, as changes
have to be experienced multiple times for the iterative
prediction error updates to change existing values. After
an outcome devaluation (e.g., through satiation), the MB
system can change preferences quickly, but the MF system
cannot. Individual variation in the balance between
MB and MF decisions, with a shift toward MF and away
from MB learning, is associated with addictive and impulsive
traits in animals (Huys et al., 2014; Everitt & Robbins,
2005), and a bias has been reported in conditions such as
addiction and obsessive-compulsive disorder where behavioral
preferences persist against explicit desires (Voon
et al., 2014, 2015; Gillan et al., 2011, 2014; Sebold et al.,
2014; Sjoerds et al., 2013).
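
The prediction error update described above can be illustrated with a minimal sketch. It assumes a single cached value and a fixed learning rate; the function names, the learning rates, and the tolerance are hypothetical choices for illustration and are not taken from the study. The sketch shows why an MF learner needs repeated experience before a changed outcome is reflected in its value.

```python
def td_update(value, reward, learning_rate):
    """One model-free update: shift the cached value by a fraction of the prediction error."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def trials_to_adapt(start_value, new_reward, learning_rate, tolerance=0.05):
    """Count the experienced outcomes needed until the cached value is within
    `tolerance` of the new reward (hypothetical convergence criterion)."""
    value, n_trials = start_value, 0
    while abs(new_reward - value) > tolerance:
        value = td_update(value, new_reward, learning_rate)
        n_trials += 1
    return n_trials


# An outcome that used to be worth 1 is now worth 0 (e.g., after devaluation).
fast = trials_to_adapt(1.0, 0.0, learning_rate=0.5)
slow = trials_to_adapt(1.0, 0.0, learning_rate=0.1)
```

With a smaller learning rate the cached value takes more trials to catch up, whereas an MB learner that recomputes values from its internal model would not need these repeated experiences.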

The second axis concerns the distinction between
instrumental and Pavlovian paradigms. In instrumental
paradigms, actions a have values that depend on the
presence of particular stimuli or situations s, leading to
the valuation of stimulus–action pairs V(s, a). In Pavlovian
conditioning paradigms, stimuli s predict outcomes

Journal of Cognitive Neuroscience 28:7, pp. 985–995
doi:10.1162/jocn_a_00945

independent of actions. These situations are described by
action-independent stimulus values V(s) (Dayan, Niv,
Seymour, & Daw, 2006). Pavlovian values V(s) influence
actions in a variety of ways, including by eliciting
approach/withdrawal to the stimulus s and by promoting
or inhibiting the species-specific innate responses to s.
They also have two distinct influences on instrumental
processes in so-called Pavlovian–instrumental transfer
(PIT) paradigms. Pavlovian stimuli influence the tendency
to emit behavior generally (general PIT), with a stimulus
predicting water for instance also enhancing respond-
ing for food, and they specifically increase choices of
actions that lead to the outcome the Pavlovian stimulus
is associated with (outcome-specific PIT). Individual vari-
ation in Pavlovian processes has again been related to
addictive and compulsive traits (Garbusow et al., 2014;
Flagel, Waselus, Clinton, Watson, & Akil, 2014; Flagel et al.,
2011; Robinson & Berridge, 1993).

MB and MF systems have been shown to work in
parallel in both instrumental and Pavlovian paradigms
(Dayan & Berridge, 2014; Huys et al., 2014; Jones et al.,
2012; Daw et al., 2011; McDannald, Lucantonio, Burke,
Niv, & Schoenbaum, 2011; Daw, Niv, & Dayan, 2005;
Killcross & Coutureau, 2003), leading to four values and
many opportunities for complex interactions (Dayan &
Berridge, 2014; Huys et al., 2014). For instance, outcome-
specific PIT requires access to the specific nature of
the outcome associated with the Pavlovian stimulus s.
Computationally, this is by definition not contained in
the MF value and, therefore, must depend on aspects
of MB evaluation. On the other hand, devaluation of
the outcome frequently fails to impact outcome-specific
PIT (Eder & Dignath, 2015; Watson, Wiers, Hommel, &
de Wit, 2014; Hogarth & Chase, 2011; Allman, DeLeon,
Cataldo, Holland, & Johnson, 2010; Hogarth, Dickinson,
& Duka, 2010; Corbit, Janak, & Balleine, 2007; Holland,
2004; Rescorla, 1994), suggesting computational mix-
tures, with MB processes for instance retrieving MF values
that are resistant to devaluation. Indeed, possibilities for
such complex interactions have been increasingly exam-
ined recently (Cushman & Morris, 2015; Huys et al., 2012,
2015; Guitart-Masip et al., 2012).

There are thus multiple paths toward the interaction
between different valuation systems, and these are likely
influenced by established individual variation both in
terms of Pavlovian influences on choice and the balance
away from MB decisions. We thus wanted to examine
what the empirical, dominant pattern of covariation between
MB/MF tradeoffs and Pavlovian influences on
choice is in a healthy sample.

Specifically, we explored whether individual differ-
ences in PIT effects are associated with individual differ-
ences in the behavioral contribution of MB/MF learning
in a separate instrumental choice task (Daw et al.,
2011). We have previously observed increased PIT and
reduced MB decisions in alcohol-dependent patients
(Garbusow et al., 2014, 2015; Sebold et al., 2014) and
hence expected PIT effects overall to be driven more
by MF learning and to covary negatively with MB control.
On the basis of these findings, we expected decreased
MB but enhanced MF behavior in those participants with
higher PIT effects. We aimed to test these
hypotheses in an exploration sample and to replicate them
in a second, demographically and behaviorally distinct
sample.

METHODS

Participants

At the time of analysis, a total of 267 participants were
recruited as part of a longitudinal study on alcohol use
disorder (LeAd study, www.Lead-studie.de, clinical trial
numbers NCT01679145 and NCT01744834). The two-
center study contains two separate projects. One project
examines alcohol-dependent patients and age, sex, and
education-matched healthy control participants. Because
our hypotheses did not focus on alcohol dependence, we
here examined healthy control participants only (n =
78). Data of 11 participants were excluded, two due to
positive drug screenings, three due to technical issues,
and another six due to poor task performance, leav-
ing 67 participants (10 women, Mage = 43.07 years,
SDage = 11.02 years) for analyses. We first analyzed these
participants and will therefore subsequently refer to
them as the exploration sample. The second project
examines 18-year-old male participants, representatively
sampled from the local registry (n = 187). Data of
two participants were removed due to technical issues,
five due to positive drug screenings, two due to other
exclusion criteria of the LeAd study (e.g., no alcohol
intake in the past year), and two additional participants
due to poor task performance, leaving 176 participants
for analyses. Those participants were analyzed after the
exploration sample, and we will thus henceforth refer
to them as the replication sample. As the two samples
differed profoundly in terms of demographics and behav-
ior, this is a very stringent test. Both samples were exam-
ined for current and past psychiatric disorders using the
Composite International Diagnostic Interview (Jacobi et al.,
2013; Wittchen & Pfister, 1997). Exclusion criteria com-
prised a lifetime history of bipolar or psychotic disorder,
current diagnosis of major depression, posttraumatic
stress disorder, borderline personality disorder, obsessive-
compulsive disorder, hypomania, generalized anxiety dis-
order, past and current substance dependencies other
than nicotine, past and current neurological disorders, a
history of severe head trauma, and current medication
that affects the CNS.

Procedure

All participants first completed a PIT task and then the two-
step task (Daw et al., 2011). Both tasks were programmed
using Matlab 2011 (version 7.12.0; The MathWorks, Natick,
MA) with the Psychophysics Toolbox Version 3 (PTB-3;
Brainard, 1997; Pelli, 1997). The two-step task and parts
of the PIT task were performed inside an MRI scanner.
The study was approved by local ethics committees. All par-
ticipants gave written informed consent and were paid a
fixed amount (€10/hr) plus an additional bonus contingent
on their performance.

PIT Task

Participants underwent (1) instrumental training, (2)
Pavlovian training, (3) PIT, and (4) a forced-choice task
(see Garbusow et al., 2014). For a description of each part,
see Figure 1.

The task is notable for three features: the use of approach;
the use of both appetitive and aversive Pavlovian stimuli;
and the inclusion of instrumental stimuli for which
either go or no-go yields more, but on average equal,
reinforcement. The use of approach is motivated by the
intuitive importance of maladaptive approach to drugs in
addiction. By collapsing across equally valued go and no-go
instrumental scenarios, it ensures that the PIT effect is not
specific to active versus inactive responses. By including
both gains and losses, it extracts Pavlovian conditioned
stimuli (CS) effects that are related specifically to value
independent of its sign.

Two-step Task

Each participant performed 201 trials of the two-step
decision-making task described by Sebold et al. (2014;
see Figure 2A). In each trial, participants had to perform
an initial choice between two stimuli on a gray back-
ground. This choice then led to one of two second-stage
options (either green or yellow) from which one stimulus


Figure 1. (A) Instrumental training: Participants were instructed to collect shells by repeated button presses after which they received probabilistic
feedback. In “go trials”, collection of a shell was monetarily rewarded in 80% and punished in 20% of trials, and vice versa if not collected. In
“no-go trials”, collection of a shell was monetarily punished in 80% and rewarded in 20% of the trials, and vice versa if not collected. A learning
criterion for the instrumental training was enforced to ensure comparable task performance between participants (after a minimum of 60 trials,
80% correct choices over 16 consecutive trials). Participants performed the instrumental training until the learning criterion was met or for a
maximum of 120 trials. (B) Pavlovian conditioning: At the beginning of each trial, participants saw a fractal-like stimulus accompanied by the
sound of a tone (combined CS). After a delay of 3 sec, an unconditioned coin stimulus (US) was presented for another 3 sec. Participants were
instructed to be attentive to the CS–US pairings. CS–US associations consisted of two CSs paired with images of +2/+1 EUR coins, one CS paired
with 0 EUR, and two CSs paired with −1/−2 EUR, respectively. All participants completed 80 trials. (C) PIT: Each trial consisted of the presentation
of one of the previously learned shells while both the auditory and visual CS from the Pavlovian conditioning were presented. Participants were
instructed to perform the instrumental task again. Participants had 3 sec to respond. The intertrial interval was exponentially distributed, ranging from
2 to 6 sec, and a fixation cross was displayed centrally. No feedback was presented, but participants were instructed that their choices would influence
their final monetary outcome. There were 90 trials. (D) Forced choice task: Participants were presented with two combined CSs sequentially
and asked to choose one. All possible CS pairings were presented three times in a randomized order. We used these data to verify acquisition of
Pavlovian expectations and excluded participants from further data analyses (exploration sample n = 6, replication sample n = 2) if they did not
perform better than chance in this part.

Figure 2. (A) The structure of the two-step task. In each trial, participants chose between two initial stimuli, leading them to a second stage
(either green or yellow), at which they again had to make a choice. Each second-stage choice was probabilistically rewarded. These reward
probabilities slowly changed over time. Each first-stage choice was frequently associated with a certain transition to the second stage (70% of all trials)
but rarely associated with the opposing second stage (30% of trials). (B) MF decision-making does not consider transition frequencies. Stage 1
actions resulting in reward have a higher probability of being repeated than actions that did not end up being rewarded. Thus, MF decision-making
predicts a main effect of reward. (C) Only MB decision-making takes transition probabilities into account. After a rewarded rare transition, the
best chance of reaching that same rewarding second-stage stimulus again is to switch stimuli at the first stage and thereby use the frequent transition.
Likewise, after a rare, unrewarded transition, the best chance of avoiding that same stimulus is to stay at this same first-stage stimulus, which commonly
leads to the other, possibly rewarding second-stage stimuli. Both exploration (D) and replication (E) samples show a mixture of MB and MF
choices.

had to be selected again. Crucially, the transition from
first-stage choices to the specific second stage was probabilistic:
Whereas one option on the first stage led frequently
to the green second-stage option (70%) but
rarely to the yellow second-stage option (30%), the other
first-stage choice was associated with frequent yellow
second-stage visits but rare green second-stage visits. At
the second stage, participants were probabilistically rewarded
with 20 cents or 0 cents (red cross superimposed
on the 20-cent coin). To encourage participants to learn
throughout the experiment, all four second-stage payoff
probabilities changed slowly according to Gaussian random
walks with reflecting boundaries at 0.25 and 0.75.
We used the same random walk as in the original publication.
In each stage, participants had 2 sec to perform
their response. Variable intertrial intervals were drawn
from an exponential distribution between 1 and 6 sec.
Before starting the task, participants completed a training
session with different random walks and a different stim-
ulus set. Crucially, the training version was carefully trans-
lated from the version implemented by Daw et al. (2011).
MB and MF decisions make distinct predictions on how
reward and transition should influence first-stage behav-
ior (Figure 2B and C).
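
The slowly drifting second-stage payoff probabilities described above can be sketched as a Gaussian random walk that is mirrored back at the boundaries 0.25 and 0.75. This is only an illustration: the study reused a fixed walk from the original publication, and the step size, starting value, seed, and function names below are assumptions made for this sketch.

```python
import random


def reflect(p, lower=0.25, upper=0.75):
    """Mirror a value back into [lower, upper] at the boundaries."""
    while p < lower or p > upper:
        if p < lower:
            p = 2 * lower - p
        else:
            p = 2 * upper - p
    return p


def reward_walk(n_trials, start=0.5, step_sd=0.025, seed=0):
    """Generate one payoff probability per trial.

    step_sd, start, and seed are assumed values, not parameters reported here.
    """
    rng = random.Random(seed)
    p, walk = start, []
    for _ in range(n_trials):
        p = reflect(p + rng.gauss(0.0, step_sd))
        walk.append(p)
    return walk


# One walk per second-stage stimulus (four in total) over 201 trials.
walks = [reward_walk(201, seed=s) for s in range(4)]
```

Because the probabilities never settle, a participant has to keep tracking outcomes throughout the task, which is the stated purpose of the drifting payoffs.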

Data Analysis

We first analyzed data from the exploration sample and
subsequently validated our results with the replication
sample. All regression analyses were conducted using

generalized linear mixed-effects models implemented with
the lme4 package (Bates, Maechler, Bolker, & Walker,
2014) in the R programming language, version 3.1.2
(cran.us.r-project.org). For orthogonal contrasts in linear
mixed-effects models, we used effect coding (−0.5/
+0.5). Computational modeling was performed in Matlab
2012–2015 (versions 8.0–8.5).

PIT Task

All analyses focused on the PIT part (see Figure 1C),
when participants had to perform a previously acquired
response in the presence of Pavlovian stimuli.

The number of button presses in each trial was mod-
eled as a Poisson distribution in a generalized linear
mixed-effects model. In each trial, it was regressed on
the nominal Pavlovian value of the CS in the background
(−2, −1, 0, +1, +2). The model contained an additional
nuisance variable to remove the influence of instrumental
value (go/no-go) from the foreground stimuli. The within-
subject factors (intercept, main effect of Pavlovian value,
instrumental value, and their interaction) were treated as
random effects across participants. Specific instrumental
stimuli (shells) and Pavlovian stimuli (fractal-like) were
taken as additional crossed random effects to control for
item effects. We extracted individual regression coef-
ficients for the CS stimuli (henceforth referred to as
PIT slope) for further analyses. As the PIT slope histo-
grams were bimodal, we clustered participants into two
groups using a mixture of Gaussians fitted with expectation
maximization (mixtools package; Benaglia, Chauveau,
Hunter, & Young, 2009). We also tested whether the PIT
regression coefficients were significant in individual participants.
However, these are for descriptive purposes only:
As participants did not respond at all on some trials, button
presses showed zero inflation.
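
To make the notion of a PIT slope concrete, the following sketch fits a Poisson regression of button presses on CS value for a single participant. It is a deliberately simplified stand-in, not the analysis reported here: the study estimated the slopes within a generalized linear mixed-effects model in lme4 with random effects and an instrumental-value nuisance regressor, all of which this sketch omits. The function name and the example data are hypothetical.

```python
import math


def fit_poisson_slope(cs_values, presses, n_iter=50):
    """Fit log(E[presses]) = b0 + b1 * cs_value by Newton-Raphson.

    Returns (b0, b1); b1 plays the role of a single-participant PIT slope.
    """
    mean_presses = sum(presses) / len(presses)
    b0, b1 = math.log(max(mean_presses, 1e-6)), 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(cs_values, presses):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu            # gradient w.r.t. intercept
            g1 += (y - mu) * x      # gradient w.r.t. slope
            h00 += mu               # entries of the information matrix
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical participant whose presses double with each step in CS value.
cs = [-2, -1, 0, 1, 2]
presses = [1, 2, 4, 8, 16]
intercept, pit_slope = fit_poisson_slope(cs, presses)
```

A positive slope means more presses under positively valued background CSs and fewer under negatively valued ones, which is the direction of the PIT effect described in this article.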

Two-step Task

We performed two sets of analyses. The first was a mixed-effects
logistic regression (Otto, Skatova, Madlon-Kay, & Daw, 2015;
Schad et al., 2014; Otto, Raio, Chiang, Phelps, & Daw,
2013) where first-stage choices (stay/switch) were regressed
on the previous trial outcome and transition
frequency (common or rare). Within-subject factors (intercept,
main effect of reward, main effect of transition,
and their interaction) were taken as random effects
across participants.
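
The logic of this regression can be illustrated on raw stay probabilities. The sketch below is a descriptive analogue under simplifying assumptions, not the mixed-effects logistic regression used here: it tabulates how often a first-stage choice is repeated in each Reward × Transition cell and then computes the main effect of reward (the MF signature) and the Reward × Transition interaction (the MB signature). Function names and the example trials are hypothetical.

```python
def stay_probabilities(trials):
    """trials: iterable of (rewarded, common, stayed) booleans, where the first
    two describe the previous trial and the third the current first-stage choice."""
    counts = {}
    for rewarded, common, stayed in trials:
        n_stay, n_total = counts.get((rewarded, common), (0, 0))
        counts[(rewarded, common)] = (n_stay + int(stayed), n_total + 1)
    return {cell: n_stay / n_total for cell, (n_stay, n_total) in counts.items()}


def mf_mb_signatures(p_stay):
    """Return (reward main effect, Reward x Transition interaction)."""
    reward_effect = ((p_stay[(True, True)] + p_stay[(True, False)])
                     - (p_stay[(False, True)] + p_stay[(False, False)])) / 2
    interaction = ((p_stay[(True, True)] - p_stay[(True, False)])
                   - (p_stay[(False, True)] - p_stay[(False, False)]))
    return reward_effect, interaction


def make_cell(rewarded, common, n_stay, n_total):
    """Build hypothetical trials for one cell with n_stay repetitions."""
    return [(rewarded, common, i < n_stay) for i in range(n_total)]


# Hypothetical purely model-based pattern: stay after rewarded-common and
# unrewarded-rare trials, switch otherwise.
trials = (make_cell(True, True, 4, 5) + make_cell(True, False, 2, 5)
          + make_cell(False, True, 2, 5) + make_cell(False, False, 4, 5))
reward_effect, interaction = mf_mb_signatures(stay_probabilities(trials))
```

In this constructed example the reward main effect is zero and the interaction is positive, which is the pattern sketched for MB choices in Figure 2C.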

RTs. Knowledge of the transition frequency is only used
when decisions are model-based, whereas in MF decisions
common and rare trials are considered equivalent.
Thus, the difference between second-stage RTs
after common versus rare transitions should reflect the
level of involvement of MB control (Deserno, Huys,
et al., 2015). We therefore repeated the above analyses,
but using log-transformed second-stage RTs. Values two
standard deviations below the mean (0.5% of cases) were excluded
from further analyses. This step did not influence
the results. For visualization, MB RT effects were calculated
from the individual difference between mean second-stage
RTs after rare versus common transitions.
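
The visualization measure described above amounts to a simple difference of means per participant. A minimal sketch, assuming RTs in seconds and string labels for the transition type (both are illustrative conventions, as is the function name):

```python
from statistics import mean


def mb_rt_effect(second_stage_rts, transitions):
    """Mean second-stage RT after rare transitions minus mean RT after common ones."""
    rare = [rt for rt, t in zip(second_stage_rts, transitions) if t == "rare"]
    common = [rt for rt, t in zip(second_stage_rts, transitions) if t == "common"]
    return mean(rare) - mean(common)


# Hypothetical participant who slows down after rare transitions.
rts = [0.50, 0.70, 0.60, 0.90]
labels = ["common", "rare", "common", "rare"]
effect = mb_rt_effect(rts, labels)
```

A positive value indicates slower responses after rare transitions, which is read here as a sign that the transition structure is being used.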

Computational model. We additionally fitted a repar-
ameterization of the original Daw et al. (2011) rein-
forcement learning model to the data. It contains an
MF parameter (βMF) that weighs the contribution of an
MF temporal difference learner and a parameter (βMB)
that weighs contributions by the MB learner, which uses
the transition matrix as well as the reward contingen-
cies. We imposed broad Gaussian priors (mean 0, vari-
ance 10) on all parameters, and results are based on
maximum a posteriori parameter estimates. The model
fitted better than chance in 75% (55/67) of the participants
in the exploration sample and 72% (126/176) of
the participants in the replication sample. Table 2 reports
the estimated parameters of both samples. For inference,
all parameters were transformed such that they
were unbounded, and we retained these transformations
to test correlations. None of the conclusions are affected
by this transformation.
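
How separate MB and MF weights can combine at the first stage is sketched below. This follows the general form of hybrid models in the tradition of Daw et al. (2011) and is not the exact reparameterization fitted here: the stickiness parameter ρ, the learning rates, and the eligibility trace are omitted, and all function names and numerical values are hypothetical.

```python
import math


def model_based_values(transition_probs, second_stage_values):
    """Q_MB(a) = sum over second-stage states s of P(s | a) * max_a' Q2(s, a')."""
    return [sum(p * max(q) for p, q in zip(probs, second_stage_values))
            for probs in transition_probs]


def first_stage_choice_probs(q_mb, q_mf, beta_mb, beta_mf):
    """Softmax over the weighted sum beta_mb * Q_MB + beta_mf * Q_MF."""
    logits = [beta_mb * mb + beta_mf * mf for mb, mf in zip(q_mb, q_mf)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Rows: first-stage actions; columns: second-stage states (70%/30% transitions).
transition_probs = [[0.7, 0.3], [0.3, 0.7]]
# Hypothetical learned values of the two stimuli in each second-stage state.
second_stage_values = [[0.6, 0.2], [0.4, 0.1]]
q_mb = model_based_values(transition_probs, second_stage_values)
q_mf = [0.3, 0.5]  # hypothetical cached first-stage values

mb_only = first_stage_choice_probs(q_mb, q_mf, beta_mb=2.0, beta_mf=0.0)
mf_only = first_stage_choice_probs(q_mb, q_mf, beta_mb=0.0, beta_mf=2.0)
```

With these made-up values the two weights pull the choice in opposite directions, which is why separate estimates of βMB and βMF can dissociate the two systems.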

Relationship between PIT and Two-step Tasks

To test whether PIT effects were related to two-step perfor-
mance, we added individual PIT slopes (as z-transformed

variable) as a between-subject predictor in the binomial
models of the two-step task and tested its interactions
with the other fixed effects in the model.

For RT analyses, we performed linear mixed-effects re-
gression with PIT slopes (z-transformed) and transition
frequency as predictors for second-stage RTs.

In addition, we correlated individual MB (βMB) and MF
(βMF) subject parameters from the computational model
with PIT coefficients (Spearman correlation).
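
For readers unfamiliar with the rank-based correlation used here, the following is a small self-contained sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks. The data are invented for illustration and do not reproduce any value reported in this article; in practice a statistics package would be used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical PIT slopes and MB weights that are perfectly inversely ordered.
pit_slopes = [0.05, 0.20, 0.45, 0.80]
beta_mb = [3.1, 2.0, 1.2, 0.1]
r = spearman(pit_slopes, beta_mb)
```

Because only the ordering of participants enters the coefficient, it is insensitive to monotone transformations of the parameters.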

RESULTS

Exploration Sample: Choices

There was a significant group level PIT effect (fixed effect
Pavlovian value, p < .0001; see Figure 3A) such that
participants pressed more when there was a positive
background CS and less when it was negative. More than
half of the participants showed an individually
significant effect (slope significantly positive in 63%,
42/67 participants). The PIT slope was b = 0.27 on average
(fixed-effect coefficient) and varied substantially across
participants (random-effect SD = 0.36), suggesting large
interindividual variation in the extent to which actions
are controlled by Pavlovian stimuli, which is in line with
previous research on PIT effects in humans (Garbusow
et al., 2014; Prévost, Liljeholm, Tyszka, & O’Doherty, 2012).

In the two-step task, group level behavior reflected a
mixture of MF and MB decision-making. There were both
a significant main effect of reward (p < .0001) and a
significant interaction between reward and transition
(p < .0001; see Figure 2D).

To examine the relationship between PIT and the tradeoff
between MB and MF choices, we performed two tests.
First, we entered individual PIT effects as additional
regressors in the two-step logistic regression and tested
(1) Reward × PIT slope and (2) Reward × Transition ×
PIT slope interactions. Significant interactions would
indicate that a relationship exists between the extent to
which actions are influenced by Pavlovian values and MF
versus MB learning, respectively. Individual PIT effects
significantly interacted with MB decision-making
(Reward × Transition × PIT slope: p < .05), but not with
MF behavior (Reward × PIT slope, p > .05; see Table 1
and Figure 3B); as hypothesized, the association between
PIT effects and MB learning was negative. Thus, participants
who showed larger PIT effects were less model-based.

There was also a significant negative interaction between
transition and PIT (Transition × PIT, p < .05),
indicating that participants with small PIT effects tended
to stay more after common compared with rare trials.
Although the transition itself does not play a role in either
the MB or the MF system, the fact that those individuals
who were less sensitive to it were more sensitive to Pavlovian
CSs is in keeping with a shift away from MB learning.

Figure 3. Results of the exploration sample. (A) Observed PIT effects. Button presses in the PIT task were strongly influenced by the value of the
Pavlovian background (CS value). (B) Repetition probability as a function of reward and transition frequency in the exploration sample displayed
separately for participants who show high and low PIT effects. Low PIT participants had a mean PIT effect of 0.03 (n = 41), whereas high PIT
participants had an average PIT effect of 0.66 (n = 26). (C) Second-stage RT as a function of transition frequency covaried negatively with PIT effect:
Participants who showed no PIT effect discriminated strongly between rare and common trials in their second-stage RTs, whereas participants who
displayed large PIT effects did not show this discriminative second-stage RT behavior. (D) Estimates of the MB parameter βMB displayed for
participants who showed high and low PIT effects. Participants with high PIT values had lower βMB parameter estimates.

Exploration Sample: Computational Modeling Results

Modeling analyses replicated these findings. There was a
significant negative correlation between the weight given
to MB choices, βMB, and PIT coefficients (rSpearman =
−.31, p < .01; see Figure 3D). There was no association
between PIT and βMF (p > .05).

Exploration Sample: RTs

Only the MB component has access to transition frequency.
Hence, any difference in RTs between common
and rare transitions should be related to the involvement
of the MB system. RT differences between rare and common
transitions correlated with βMB (rSpearman = .49,
p < .0001) but not with βMF (p > .05) and with Transition ×
Reward effects (rSpearman = .59, p < .0001) but not with
reward effects (p > .05), indicating that RT effects indeed
reflect MB control. PIT effects again interacted negatively
with transition (p < .01; Figure 4C). Participants with low
PIT effects showed stronger transition effects on second-stage
RTs and responded faster on common than rare trials.

Replication Sample: Choices

As in the exploration sample, there were significant PIT
effects (fixed effect Pavlovian value, p < .0001; see
Figure 4A). However, the replication sample showed PIT
effects less frequently (52/176 = 29% of participants),
and the overall PIT slope (fixed-effect coefficient b = 0.12,
random effect SD = 0.23) was numerically half the size of
that in the exploration sample.

The two-step task again reflected a mixture of MF and
MB decision-making with a significant main effect of reward
(p < .0001) and a significant interaction between reward
and transition (p < .0001). Results of the interactions
between PIT and all two-step parameters are outlined in
Table 1. As in the exploration sample, individual PIT effects
interacted with MB decision-making (Reward × Transition ×
PIT slope: p < .05), but not with MF behavior (Reward ×
PIT slope, p > .05; see Table 1 and Figure 4B). Again, the
association between PIT effects and MB learning was
negative, indicating that participants with large PIT effects
used less MB behavior in the two-step task.

Table 1. Binomial Mixed-effects Results Testing the Influence of PIT Effects, Outcome of Previous Trials, and Transition of Previous Trial, upon
Response Repetition for the Exploration and Replication Sample

                                   Exploration Sample          Replication Sample
Coefficient                        Estimate (SE)    p          Estimate (SE)    p
Intercept                           1.36 (0.13)     <.0001*     0.96 (0.06)     <.0001*
Transition                          0.24 (0.07)     .0006*      0.21 (0.04)     <.0001*
Reward                              0.80 (0.09)     <.0001*     0.36 (0.04)     <.0001*
PIT slope                          −0.17 (0.13)     .18        −0.03 (0.06)     .57
Transition × Reward                 0.77 (0.17)     <.0001*     1.75 (0.14)     <.0001*
Transition × PIT slope             −0.13 (0.06)     .04*       −0.01 (0.04)     .75
Reward × PIT slope                 −0.03 (0.09)     .73         0.06 (0.04)     .13
Reward × Transition × PIT slope    −0.41 (0.16)     .012*      −0.31 (0.14)     .03*

*p < .05.

Figure 4. Results of the replication sample. (A) Observed PIT effects. Button presses in the PIT task are strongly influenced by the value of the
Pavlovian background (CS value). (B) Repetition probability as a function of reward and transition frequency in the replication sample separately
displayed for participants who show high and low PIT effects according to clustering of PIT effects as a mixture of Gaussians. Low PIT participants
had a mean PIT effect of 0.008 (n = 130), whereas high PIT participants had an average PIT effect of 0.39 (n = 46). (C) Second-stage RT as a
function of transition frequency negatively covaried with PIT effect: Participants who show no PIT effect discriminate strongly between rare and
common trials in their second-stage RTs, whereas participants who display large PIT effects do not show this discriminative second-stage RT
behavior. (D) Estimates of the MB parameter βMB displayed for participants who show high and low PIT effects according to clustering of PIT effects
as a mixture of Gaussians: Participants with high PIT values tended to have lower βMB parameter estimates, even though this failed to reach
statistical significance.

Of note, however, participants in the replication sample
were younger (18 vs. 43.1 years on average) and, in
keeping with previous results, were substantially more MB
but less MF (Age × Reward × Transition, p < .01 and
Age × Reward, p < .0001).

Replication Sample: Computational Modeling Results

There was no association between βMF and PIT (p > .05),
which mirrors the results from the regression analyses.
However, the correlation between individual βMB and PIT
coefficients also failed to reach significance (p > .05).
Upon visual inspection, participants with high PIT values
tended to have lower βMB values (Figure 4D). For exploratory
purposes, we conducted an additional analysis
among the high PIT effect group for whom the model
fitted better than chance. Within this subgroup, PIT effects
were negatively correlated with βMB (rSpearman = −.37,
p < .05) but not with βMF (p > .05).

Table 2. Estimates for All Parameters Shown as the Medians Plus Quartiles across Participants

Parameter    25th percentile    Median    75th percentile

Exploration Sample
βMB          0.09               0.76      3.49
βMF          1.29               2.41      3.77
ρ            0.32               0.72      1.11
β2           1.7                2.52      3.87
α1           0.34               0.57      0.79
α2           0.33               0.62      0.82
λ            0.36               0.61      0.96

Replication Sample
βMB          0.64               2.08      4.87
βMF          0.76               1.43      2.49
ρ            0.17               0.49      0.95
β2           1.7                2.64      3.71
α1           0.26               0.62      0.91
α2           0.39               0.63      0.80
λ            0.23               0.52      0.91

βMB = MB component; βMF = MF component; ρ = stickiness parameter indicating first-order perseveration; β2 = inverse temperature; α1 = first-stage learning rate; α2 = second-stage learning rate; λ = eligibility trace decay parameter.
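To make the roles of these parameters concrete, the following sketch shows one common way a hybrid model of this kind can be written. It assumes the standard two-step formulation (cf. Daw et al., 2011) with separate MB and MF weights; the common-transition probability of 0.7, the function names, and the exact form of the update equations are assumptions made for illustration and are not taken from this section.

```python
import math


def softmax2(logit_a, logit_b):
    """Probability of option a in a two-option softmax."""
    return 1.0 / (1.0 + math.exp(logit_b - logit_a))


def first_stage_prob(q_mf, q2, prev_choice, beta_mb, beta_mf, rho, p_common=0.7):
    """Probability of choosing first-stage action 0.

    q_mf        : MF values of the two first-stage actions
    q2          : q2[state][action], second-stage values
    prev_choice : previous first-stage choice (0, 1, or None)
    p_common    : assumed probability of the common transition
    """
    # MB values: expected best second-stage value under the transition model.
    best = [max(q2[0]), max(q2[1])]
    q_mb = [p_common * best[0] + (1 - p_common) * best[1],
            p_common * best[1] + (1 - p_common) * best[0]]
    # Weighted sum of MB values, MF values, and a perseveration bonus (rho).
    logits = [beta_mb * q_mb[a] + beta_mf * q_mf[a]
              + rho * float(prev_choice == a) for a in (0, 1)]
    return softmax2(logits[0], logits[1])


def mf_update(q_mf, q2, a1, s2, a2, reward, alpha1, alpha2, lam):
    """In-place MF update after one trial (SARSA(lambda)-style sketch)."""
    delta1 = q2[s2][a2] - q_mf[a1]   # first-stage prediction error
    delta2 = reward - q2[s2][a2]     # second-stage prediction error
    # lam scales how much the second-stage error reaches the first stage.
    q_mf[a1] += alpha1 * delta1 + alpha1 * lam * delta2
    q2[s2][a2] += alpha2 * delta2
```

In this sketch, the second-stage inverse temperature β2 would enter an analogous softmax over `q2[s2]`. With `lam = 0`, a single reward leaves the first-stage MF values unchanged on that trial, so even a large `beta_mf` produces no immediate reward effect on the next first-stage choice.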


Replication Sample: RTs

Analysis of the second-stage RTs also replicated the results of the exploration sample, with individual PIT effects showing a trend toward interacting negatively with transition (p < .05; Figure 4C and Table 2).

DISCUSSION

We examined the relationship between Pavlovian influences on behavior and the distinction between MB and MF choices. Across two independent and demographically diverse samples, we found that the extent to which Pavlovian values exerted control over behavior covaried negatively with MB decision-making in an independent task. In other words, participants whose decisions were strongly controlled by Pavlovian values also expressed decreased contributions of deliberative MB strategies. The same pattern was evident in RT analyses. Computational modeling analyses revealed effects in the same direction, as the MB parameter βMB from a hybrid reinforcement learning model was negatively associated with PIT effects, although this association was only significant in one of the two samples.

The PIT paradigm we employed could theoretically allow for both outcome-specific and general PIT effects: The fact that the reward in the instrumental task and in the Pavlovian conditioning were both monetary suggests that outcome-specific PIT effects might be present. However, the parametric effect of CSs on behavior we observe clarifies that the value of the stimulus, not just its identity, is retrieved and influences choice. What we can say, then, is that the tendency to retrieve the value of a CS in PIT covaries negatively with MB reasoning in healthy populations. We therefore judge it highly unlikely that the CS value retrieved would itself rely on MB processes and judge it more likely that it depends on MF ones. Such an interpretation is in accordance with recent work on individual variation in Pavlovian conditioning: Sign-trackers, who by definition express increased approach behavior toward conditioned cues, have stronger MF phasic dopaminergic signals (Flagel et al., 2011).
Furthermore, they show less MB learning in that they are less sensitive to devaluation (Morrison, Bamkole, & Nicola, 2015) and Pavlovian extinction (Ahrens, Singer, Fitzpatrick, Morrow, & Robinson, 2016), and abolishing their MF learning through dopamine blockade does not uncover alternative MB reasoning (Flagel et al., 2011). Moreover, in humans, sign-trackers express increased PIT effects (Garofalo & di Pellegrino, 2015). As mentioned in the Introduction, in outcome-specific PIT the outcome must be explicitly accessed through a mental representation (a mental model) not available to the MF system and has hence been associated with the MB prospective system (Cartoni, Puglisi-Allegra, & Baldassarre, 2013; Dolan & Dayan, 2013; Clark, Hollon, & Phillips, 2012). Recent work has shown that the MB system can also access MF values (Cushman & Morris, 2015), which might explain the persistence of outcome-specific PIT after devaluation (Eder & Dignath, 2015; Watson et al., 2014; Corbit et al., 2007; Holland, 2004; Rescorla, 1994) and extinction (Rosas, Paredes-Olay, Garcia-Gutierrez, Espinosa, & Abad, 2010). However, such an interpretation of our data would have allowed even strongly MB participants to show strong PIT effects, which was not the case: PIT arose primarily in the absence of, or in conflict with, MB control.

In addition to a negative correlation with MB, we had also predicted a positive correlation between MF decision-making and (general) PIT effects, both because the two-step task measures a tradeoff between MF and MB (Doll, Bath, Daw, & Frank, 2016; Daw et al., 2011) and because we had expected the strength of MF behavior in the two-step task to covary with the strength of Pavlovian MF conditioning and for that reason to promote general PIT. Against our expectations, we did not find a relationship between MF behavior and PIT, either through regression analyses or by analyzing the MF component from the computational model.
This is likely because the task does not have much power to detect variation in the MF component, particularly separately from MB variation (cf. Doll et al., 2016). Most studies have found correlations with the MB but not with the MF component, including cognitive (Schad et al., 2014; Otto et al., 2013) and emotional (Otto et al., 2013) variables as well as pharmacological challenges (Worbe et al., 2015; Wunderlich, Smittenaar, & Dolan, 2012), brain stimulation (Smittenaar, FitzGerald, Romei, Wright, & Dolan, 2013; but see Smittenaar, Prichard, FitzGerald, Diedrichsen, & Dolan, 2014), and interindividual differences such as age (Eppinger, Walter, Heekeren, & Li, 2013) or psychiatric disorders (Sebold et al., 2014; Voon et al., 2014). Other tasks such as the probabilistic selection task may be more appropriate to specifically assess the MF system (Doll et al., 2016). Finally, it is worth noting that the reward effect in the one-step repetition probabilities is strongly influenced by the λ parameter in the model. This parameter directly determines how strongly a reward at the second step impacts on MF expectations at the first step. The MF weight βMF, however, could also theoretically be large without such an effect, that is, for λ = 0, in which case one-step repeat probabilities would show little reward effect. Hence, analyses of the reward-related repeat effects relate to aspects of the MF system more than to its overall behavioral dominance.

The study has some limitations. First, we cannot rule out that other, more general mechanisms mediated the described association between both tasks. For instance, decreased MB performance and increased PIT effects might be caused by misunderstanding the instructions of either task. Specifically, we instructed all participants to rely on transition frequencies in the two-step task and to respond to the foreground stimuli in the PIT task (which interferes with PIT effects).
Thus, those participants who showed decreased PIT effects and strong MB control might have also been those who were more attentive to the instructions. A second limitation is that, at least in the replication sample, Pavlovian values tended to have comparably little influence on choice behavior and only a small number of participants showed PIT effects at all. Thus, the correlation between behaviors in both tasks is likely to be caused by a subset of participants only. Indeed, when we correlated the MB parameter from the computational modeling with the PIT coefficients, the association became significant only when we limited our sample to participants with comparably high PIT effects. Moreover, we note that there were strong differences in the MF and MB components of the two-step task between the exploration and the replication sample. The samples differed very substantially by age, and there is strong evidence that age reduces MB behavior (Eppinger et al., 2013). As such, the pattern emerging across the two samples is strongly supportive of the findings in both individual samples that a reduction in MB tendencies covaries with increased PIT effects. Third, across both samples, we found a significant main effect of transition. Thus, participants tended to stay more after common compared with rare trials, an effect that is not obviously related to either MF or MB accounts.
Even though this effect has not been observed in the original study (Daw et al., 2011), several other studies have reported it. It is a small effect that becomes apparent in large sample sizes (Voon et al., 2014; Skatova, Chan, & Daw, 2013). Thus, null findings might be due to a lack of statistical power. However, we speculate that rare trials might be particularly salient and induce subsequent shifts in response behavior by reengaging MB controllers (Yasuda, Sato, Miyawaki, Kumano, & Kuboki, 2004).

There is accumulating evidence that in substance dependence and disorders of compulsivity PIT effects are increased (Garbusow et al., 2014, 2015; Hogarth, Field, & Rose, 2013; Glasner, Overmier, & Balleine, 2005) whereas MB control appears to be disrupted (Sebold et al., 2014; Voon et al., 2014). Moreover, MB neural signatures are reduced in high-impulsive individuals (Deserno, Wilbertz, et al., 2015), and impulsivity further seems to be associated with PIT effects (Garofalo & di Pellegrino, 2015). Our findings suggest a common underlying mechanism driving individual variation, possibly increasing the risk of developing substance dependence.

Acknowledgments

We thank the LeAD study teams in Dresden and Berlin for data acquisition. This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG, FOR 1617; grants HE 2597/13-1, HE 2597/14-1, HE 2597/15-1, RA 1047/2-1, SM 80/7-1, ZI 1119/3-1, WI 709/10-1, SCHA 1971/1-2, HE 2597/13-2, HE 2597/14-2, HE 2597/15-2, RA 1047/2-2, SM 80/7-2, ZI 1119/3-2, WI 709/10-2).

Reprint requests should be sent to Miriam Sebold, Department of Psychiatry and Psychotherapy, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany, or via e-mail: miriam.sebold@charite.de.

REFERENCES

Ahrens, A. M., Singer, B. F., Fitzpatrick, C. J., Morrow, J. D., & Robinson, T. E. (2016). Rats that sign-track are resistant to Pavlovian but not instrumental extinction. Behavioural Brain Research, 296, 418–430.

Allman, M. J., DeLeon, I. G., Cataldo, M. F., Holland, P. C., & Johnson, A. W. (2010). Learning processes affecting human decision making: An assessment of reinforcer-selective Pavlovian-to-instrumental transfer following reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes, 36, 402–408.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7. Available at CRAN.R-project.org/package=lme4.

Benaglia, T., Chauveau, D., Hunter, D. R., & Young, D. S. (2009). mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32, 1–29.

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.

Cartoni, E., Puglisi-Allegra, S., & Baldassarre, G. (2013). The three principles of action: A Pavlovian–instrumental transfer hypothesis. Frontiers in Behavioral Neuroscience, 7, 153.

Clark, J. J., Hollon, N. G., & Phillips, P. E. (2012). Pavlovian valuation systems in learning and decision making. Current Opinion in Neurobiology, 22, 1054–1061.

Corbit, L. H., Janak, P. H., & Balleine, B. W. (2007). General and outcome-specific forms of Pavlovian–instrumental transfer: The effect of shifts in motivational state and inactivation of the ventral tegmental area. European Journal of Neuroscience, 26, 3141–3149.

Cushman, F., & Morris, A. (2015). Habitual control of goal selection in humans. Proceedings of the National Academy of Sciences, U.S.A., 112, 13817–13822.

Daw, N. D., Gershman, S., Seymour, B., Dayan, P., & Dolan, R. (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron, 69, 1204–1215.

Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704–1711.

Dayan, P., & Berridge, K. C. (2014). Model-based and model-free Pavlovian reward learning: Revaluation, revision, and revelation. Cognitive, Affective & Behavioral Neuroscience, 14, 473–492.

Dayan, P., Niv, Y., Seymour, B., & Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural Networks, 19, 1153–1160.

Deserno, L., Huys, Q. J., Boehme, R., Buchert, R., Heinze, H. J., Grace, A. A., et al. (2015). Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proceedings of the National Academy of Sciences, U.S.A., 112, 1595–1600.

Deserno, L., Wilbertz, T., Reiter, A., Horstmann, A., Neumann, J., Villringer, A., et al. (2015). Lateral prefrontal model-based signatures are reduced in healthy individuals with high trait impulsivity. Translational Psychiatry, 5, e659.

Dezfouli, A., & Balleine, B. W. (2013). Actions, action sequences and habits: Evidence that goal-directed and habitual action control are hierarchically organized. PLoS Computational Biology, 9, e1003364.

Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80, 312–325.

Doll, B. B., Bath, K. G., Daw, N. D., & Frank, M. J. (2016). Variability in dopamine genes dissociates model-based and model-free reinforcement learning. Journal of Neuroscience, 36, 1211–1222.

Doll, B. B., Duncan, K. D., Simon, D. A., Shohamy, D., & Daw, N. D. (2015). Model-based choices involve prospective neural activity. Nature Neuroscience, 18, 767–772.

Eder, A. B., & Dignath, D. (2015). Cue-elicited food seeking is eliminated with aversive outcomes following outcome devaluation. Quarterly Journal of Experimental Psychology, 69, 574–588.

Eppinger, B., Walter, M., Heekeren, H. R., & Li, S. C. (2013). Of goals and habits: Age-related and individual differences in goal-directed decision-making. Frontiers in Neuroscience, 7, 253.

Everitt, B. J., & Robbins, T. W. (2005). Neural systems of reinforcement for drug addiction: From actions to habits to compulsion. Nature Neuroscience, 8, 1481–1489.

Flagel, S. B., Clark, J. J., Robinson, T. E., Mayo, L., Czuj, A., Willuhn, I., et al. (2011). A selective role for dopamine in stimulus-reward learning. Nature, 469, 53–57.

Flagel, S. B., Waselus, M., Clinton, S. M., Watson, S. J., & Akil, H. (2014). Antecedents and consequences of drug abuse in rats selectively bred for high and low response to novelty. Neuropharmacology, 76(Pt B), 425–436.

Garbusow, M., Schad, D. J., Sebold, M., Friedel, E., Bernhardt, N., Koch, S. P., et al. (2015). Pavlovian-to-instrumental transfer effects in the nucleus accumbens relate to relapse in alcohol dependence. Addiction Biology. doi:10.1111/adb.12243.

Garbusow, M., Schad, D. J., Sommer, C., Jünger, E., Sebold, M., Friedel, E., et al. (2014). Pavlovian-to-instrumental transfer in alcohol dependence: A pilot study. Neuropsychobiology, 70, 111–121.

Garofalo, S., & di Pellegrino, G. (2015). Individual differences in the influence of task-irrelevant Pavlovian cues on human behavior. Frontiers in Behavioral Neuroscience, 9, 163.

Gillan, C. M., Morein-Zamir, S., Urcelay, G. P., Sule, A., Voon, V., Apergis-Schoute, A. M., et al. (2014). Enhanced avoidance habits in obsessive-compulsive disorder. Biological Psychiatry, 75, 631–638.

Gillan, C. M., Papmeyer, M., Morein-Zamir, S., Sahakian, B. J., Fineberg, N. A., Robbins, T. W., et al. (2011). Disruption in the balance between goal-directed behavior and habit learning in obsessive-compulsive disorder. American Journal of Psychiatry, 168, 718–726.

Glascher, J., Daw, N., Dayan, P., & O’Doherty, J. P. (2010). States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66, 585–595.

Glasner, S. V., Overmier, J. B., & Balleine, B. W. (2005). The role of Pavlovian cues in alcohol seeking in dependent and nondependent rats. Journal of Studies on Alcohol, 66, 53–61.

Guitart-Masip, M., Huys, Q. J., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J. (2012). Go and no-go learning in reward and punishment: Interactions between affect and effect. Neuroimage, 62, 154–166.

Hogarth, L., & Chase, H. W. (2011). Parallel goal-directed and habitual control of human drug-seeking: Implications for dependence vulnerability. Journal of Experimental Psychology: Animal Behavior Processes, 37, 261–276.

Hogarth, L., Dickinson, A., & Duka, T. (2010). The associative basis of cue-elicited drug taking in humans. Psychopharmacology, 208, 337–351.

Hogarth, L., Field, M., & Rose, A. K. (2013). Phasic transition from goal-directed to habitual control over drug-seeking produced by conflicting reinforcer expectancy. Addiction Biology, 18, 88–97.

Holland, P. C. (2004). Relations between Pavlovian–instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes, 30, 104–117.

Huys, Q. J. M., Eshel, N., O’Nions, E., Sheridan, L., Dayan, P., & Roiser, J. P. (2012). Bonsai trees in your head: How the pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology, 8, e1002410.

Huys, Q. J. M., Lally, N., Faulkner, P., Eshel, N., Seifritz, E., Gershman, S. J., et al. (2015). Interplay of approximate planning strategies. Proceedings of the National Academy of Sciences, U.S.A., 112, 3098–3103.

Huys, Q. J. M., Tobler, P. N., Hasler, G., & Flagel, S. B. (2014). The role of learning-related dopamine signals in addiction vulnerability. Progress in Brain Research, 211, 31–77.

Jacobi, F., Mack, S., Gerschler, A., Scholl, L., Hofler, M., Siegert, J., et al. (2013). The design and methods of the mental health module in the German Health Interview and Examination Survey for Adults (DEGS1-MH). International Journal of Methods in Psychiatric Research, 22, 83–99.

Jones, J. L., Esber, G. R., McDannald, M. A., Gruber, A. J., Hernandez, A., Mirenzi, A., et al. (2012). Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science, 338, 953–956.

Killcross, S., & Coutureau, E. (2003). Coordination of actions and habits in the medial prefrontal cortex of rats. Cerebral Cortex, 13, 400–408.

Lee, S. W., Shimojo, S., & O’Doherty, J. P. (2014). Neural computations underlying arbitration between model-based and model-free learning. Neuron, 81, 687–699.

McDannald, M. A., Lucantonio, F., Burke, K. A., Niv, Y., & Schoenbaum, G. (2011). Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. Journal of Neuroscience, 31, 2700–2705.

Morrison, S. E., Bamkole, M. A., & Nicola, S. M. (2015). Sign tracking, but not goal tracking, is resistant to outcome devaluation. Frontiers in Neuroscience, 9, 468.

Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A., & Daw, N. D. (2013). Working-memory capacity protects model-based learning from stress. Proceedings of the National Academy of Sciences, U.S.A., 110, 20941–20946.

Otto, A. R., Skatova, A., Madlon-Kay, S., & Daw, N. D. (2015). Cognitive control predicts use of model-based reinforcement learning. Journal of Cognitive Neuroscience, 27, 319–333.

Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.

Prévost, C., Liljeholm, M., Tyszka, J. M., & O’Doherty, J. P. (2012). Neural correlates of specific and general Pavlovian-to-instrumental transfer within human amygdalar subregions: A high-resolution fMRI study. Journal of Neuroscience, 32, 8383–8390.

Rescorla, R. A. (1994). Transfer of instrumental control mediated by a devalued outcome. Animal Learning & Behavior, 22, 27–33.

Robinson, T., & Berridge, K. (1993). The neural basis of drug craving: An incentive-sensitization theory of addiction. Brain Research Reviews, 18, 247–291.

Rosas, J. M., Paredes-Olay, M. C., Garcia-Gutierrez, A., Espinosa, J. J., & Abad, M. J. F. (2010). Outcome-specific transfer between predictive and instrumental learning is unaffected by extinction but reversed by counterconditioning in human participants. Learning and Motivation, 41, 150.

Schad, D. J., Jünger, E., Sebold, M., Garbusow, M., Bernhardt, N., Javadi, A. H., et al. (2014). Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning. Frontiers in Psychology, 5, 1450.

Sebold, M., Deserno, L., Nebe, S., Schad, D. J., Garbusow, M., Hagele, C., et al. (2014). Model-based and model-free decisions in alcohol dependence. Neuropsychobiology, 70, 122–131.

Sjoerds, Z., de Wit, S., van den Brink, W., Robbins, T. W., Beekman, A. T., Penninx, B. W., et al. (2013). Behavioral and neuroimaging evidence for overreliance on habit learning in alcohol-dependent patients. Translational Psychiatry, 3, e337.

Skatova, A., Chan, P. A., & Daw, N. D. (2013). Extraversion differentiates between model-based and model-free strategies in a reinforcement learning task. Frontiers in Human Neuroscience, 7, 525.

Smittenaar, P., FitzGerald, T. H., Romei, V., Wright, N. D., & Dolan, R. J. (2013). Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans. Neuron, 80, 914–919.

Smittenaar, P., Prichard, G., FitzGerald, T. H., Diedrichsen, J., & Dolan, R. J. (2014). Transcranial direct current stimulation of right dorsolateral prefrontal cortex does not affect model-based or model-free reinforcement learning in humans. PLoS One, 9, e86850.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Voon, V., Baek, K., Enander, J., Worbe, Y., Morris, L. S., Harrison, N. A., et al. (2015). Motivation and value influences in the relative balance of goal-directed and habitual behaviours in obsessive-compulsive disorder. Translational Psychiatry, 5, e670.

Voon, V., Derbyshire, K., Ruck, C., Irvine, M. A., Worbe, Y., Enander, J., et al. (2014). Disorders of compulsivity: A common bias towards learning habits. Molecular Psychiatry, 20, 345–352.

Watson, P., Wiers, R. W., Hommel, B., & de Wit, S. (2014). Working for food you don’t desire. Cues interfere with goal-directed food-seeking. Appetite, 79, 139–148.

Wittchen, H.-U., & Pfister, H. (1997). DIA-X Interviews: Manual Für Screening-Verfahren Und Interview; Interviewheft Längsschnittuntersuchung (DIA-X-Lifetime); Ergänzungsheft (DIA-X-Lifetime); Interviewheft Querschnittuntersuchung (DIA-X-12 Monate); Ergänzungsheft (DIA-X-12 Monate); PC-Programm Zur Durchführung Des Interviews (Längs- Und Querschnittuntersuchung); Auswertungsprogramm. Frankfurt am Main: Swets Test Service.

Worbe, Y., Palminteri, S., Savulich, G., Daw, N. D., Fernandez-Egea, E., Robbins, T. W., et al. (2015). Valence-dependent influence of serotonin depletion on model-based choice strategy. Molecular Psychiatry. doi:10.1038/mp.2015.46.

Wunderlich, K., Smittenaar, P., & Dolan, R. J. (2012). Dopamine enhances model-based over model-free choice behavior. Neuron, 75, 418–424.

Yasuda, A., Sato, A., Miyawaki, K., Kumano, H., & Kuboki, T. (2004). Error-related negativity reflects detection of negative reward prediction error. NeuroReport, 15, 2561–2565.
