Lookit (Part 2): Assessing the Viability of Online
Developmental Research, Results From Three
Case Studies
Kimberly Scott¹, Junyi Chu¹, and Laura Schulz¹

¹Massachusetts Institute of Technology
Keywords: cognitive development, research methods, Internet, looking time, preferential looking
An open access journal

ABSTRACT
To help address the participant bottleneck in developmental research, we developed a new
platform called "Lookit," introduced in an accompanying article (Scott & Schulz, 2017), that
allows families to participate in behavioral studies online via webcam. To evaluate the
viability of the platform, we administered online versions of three previously published
studies involving different age groups, methods, and research questions: an infant (M = 14.0
months, N = 49) study of novel event probabilities using violation of expectation, a study of
two-year-olds' (M = 29.2 months, N = 67) syntactic bootstrapping using preferential looking,
and a study of preschoolers' (M = 48.6 months, N = 148) sensitivity to the accuracy of
informants using verbal responses. Our goal was to evaluate the overall feasibility of moving
developmental methods online, including our ability to host the research protocols, securely
collect data, and reliably code the dependent measures, and parents' ability to
self-administer the studies. Due to procedural differences, these experiments should be
regarded as user case studies rather than true replications. Encouragingly, however, all
studies with all age groups suggested the feasibility of collecting developmental data online,
and the results of two of three studies were directly comparable to laboratory results.
Access to participants is one of the primary bottlenecks in developmental research. Much of
the time involved in executing a study is spent, not on science per se, but on participant re-
cruitment, outreach, scheduling, database maintenance, and the cultivation of relationships
with partner institutions (preschools, children’s museums, hospitals, etc.). This puts pressure
on investigators to pursue questions that can be addressed with very few children per condi-
tion, motivating elegant designs but also necessarily limiting the questions that researchers can
investigate. To address the participant bottleneck, we developed a new web platform, Lookit,
that allows researchers to conduct behavioral studies in infants and young children online.
Parents access Lookit through their web browsers, self-administer the studies with their child
at their convenience, and transmit the data collected by their webcam for analysis. In a com-
panion paper (Scott & Schulz, 2017), we discuss the conceptual motivation behind Lookit
and the overall feasibility of the approach. Here we report the results of three user case studies
designed to assess the platform’s methodological potential.
We conducted three test studies adapted from previously published studies, selected ar-
bitrarily with the criteria that each use video stimuli, focus on a different age group (infants,
Kleinkinder, and preschoolers), and use a different dependent measure (violation of expectation,
preferential looking, and verbal responses). The three studies selected were a looking time
Citation: Scott, K., Chu, J., & Schulz, L. (2017). Lookit (part 2): Assessing the viability of online developmental research, results from three case studies. Open Mind: Discoveries in Cognitive Science, 1(1), 15–29. doi:10.1162/opmi_a_00001

DOI: http://doi.org/10.1162/opmi_a_00001

Supplemental Materials: http://dx.doi.org/10.7910/DVN/9PCXYB

Received: 1 March 2016
Accepted: 7 November 2016

Competing Interests: The authors declare that they have no competing interests.

Corresponding Author: Kimberly Scott, kimscott@mit.edu

Copyright: © 2017 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press
study with infants (11–18 months) based on Téglás, Girotto, Gonzalez, and Bonatti (2007), a
preferential looking study with toddlers (24–36 months) based on Yuan and Fisher (2009),
and a forced choice study with preschoolers (36–60 months) based on Pasquini, Corriveau,
Koenig, and Harris (2007). All studies were adapted for testing in the online environment and,
as such, do not constitute true replications. Although similar results on Lookit and the lab
would be grounds for optimism (as in a validity assessment), our goal is neither to better esti-
mate the true effect size in each study (replication) nor to judge whether the results obtained
on Lookit are acceptably close to accepted values (formal validation). Eher, these experi-
ments should be regarded as user case studies in the development of a new online platform.
Differences from published works may indicate areas where the online methodology needs
refinement or further study.
METHODS
Details relevant to the platform as a whole (recruitment, consent, video quality, intercoder
agreement, etc.) are discussed in a companion paper (Scott & Schulz, 2017). For additional
details of procedures and analysis for each study, and example participant videos, see the
Supplemental Materials (Scott, Chu, & Schulz, 2017).
Data Analysis
Data were analyzed using MATLAB 2014b (MathWorks, Inc., Natick, MA) and R version 3.2.1
(R Core Team, 2015); see Supplemental Materials for code. Sample sizes were determined in
advance, although the final sample size depended on coding completed after stopping data
collection. No conditions were dropped. All dependent measures collected and participant
exclusions are noted.
Exclusion
Participants were excluded if they did not have a valid consent record, if they were not in the
age range for the study, or if their video was not usable. Table 1 summarizes the number
of children excluded for these reasons in each study. Study-specific exclusions are noted in
individual methods sections. One concern was that underrepresented families might be
disproportionately likely to encounter technical problems or otherwise be excluded from final
analysis. To address this, we performed a logistic regression of whether each participant was
included in the final sample (N = 176) or not (N = 345) for all participants with complete
demographic data on file, using income, maternal education, multilingual status, Hispanic
origin, and White race as predictors. (For details, see Supplemental Materials.) We found
no evidence of any collective effect of demographic details on inclusion in the studies. The
model was not significant (chi-square = 1.71, df = 6, p = .89) and no individual predictors
were significant (all ps > .2).
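For readers who wish to adapt this check, a minimal R sketch of the inclusion regression follows; the data frame "demo" and its column names are hypothetical placeholders, not the variables in our analysis scripts (see Supplemental Materials for the actual code).

    # Sketch of the inclusion regression, assuming one row per participant in a
    # data frame "demo"; all names here are hypothetical.
    # included: 1 if in the final sample, 0 otherwise; predictors as described above.
    fit <- glm(included ~ income + maternal_ed + multilingual + hispanic + white,
               data = demo, family = binomial)
    summary(fit)  # per-predictor Wald tests (all ps > .2 in our data)

    # Omnibus chi-square test of the full model against an intercept-only model
    null <- glm(included ~ 1, data = demo, family = binomial)
    anova(null, fit, test = "Chisq")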
Table 1. Numbers of unique Lookit participants in each study and numbers excluded before study-specific criteria.

                         Study 1   Study 2   Study 3
Unique participants      269       329       399
Invalid consent          52        51        58
Out of age range         20        13        28
Unusable video           85        125       79
Potentially included     112       140       234
STUDY 1: SINGLE-EVENT PROBABILITY, USING LOOKING TIME IN INFANTS
This study was adapted from Téglás et al. (2007), Experiment 1. In the original study, infants
were shown videos of “lottery containers” with four bouncing objects inside: three blue and
one yellow (counterbalanced). Infants looked longer when the yellow object exited through a
narrow opening in the container, demonstrating sensitivity to the probability (25% vs. 75%) of
events they had never seen before.
Method
Participants  Videos from 112 infants (11–18 months) were coded for looking time unless
the child was fussy or distracted on more than one test trial (n = 36) or the parent's eyes were
open on at least one test trial or she peeked on at least two (n = 13). Participants were excluded
if recording did not start on time (n = 1), if brief inability to see the child's eyes affected any
looking time measurement by over 4 s (n = 6), or, following the original study, if more than
two test trials were at ceiling or any test trial had a looking time under 1 s (n = 7). Data from
49 infants (23 female) are included in the main analysis. Their ages ranged from 11.0 to 18.0
months (M = 13.9 months, SD = 2.2 months).
Procedure  The study started with four familiarization trials. On each familiarization trial, a
video appeared showing two blue and two yellow objects bouncing in a container for 14 s. The
container was then occluded and parents were instructed to press the space bar to continue
the experiment, at which point a single object emerged from the container (blue or yellow,
counterbalanced). The occluder was removed, and the video paused for 20 s to measure
the infant’s looking time to the outcome. The familiarization trials were followed by four test
trials in which one of the objects was a different shape and color from the others. Outcomes
alternated between probable and improbable, with order counterbalanced. On the last two
familiarization trials and all four test trials, parents were instructed to close their eyes from the
time they pushed the space bar until audio instructions signaled the end of the trial.
We used the same stimuli as in Téglás et al. (2007). Some changes were necessary to run
the study online (see Table 2). Of these, the most important is that videos were not contingent
on infant gaze. This would require automated gaze detection, which may be an option in
the future (for a recent review, see Ferhat & Vilariño, 2016). We expanded the age range
from 12–13 months to 11–18 months only to speed data collection. Additionally, we asked
parents to close their eyes rather than wear opaque glasses. To minimize the time during which
parents had to close their eyes, we asked them to close their eyes only for each trial outcome,
introducing a delay before the start of the trial.
Coding  Two coders blind to condition coded each clip for looking time using VCode
(Hagedorn, Hailpern, & Karahalios, 2008) and for fussiness, distraction, and parent actions, as
described in Scott and Schulz (2017). Looking time for each trial was computed based on the
time from the first look to the screen until the start of the first continuous one-second lookaway,
or until the end of the trial if no valid lookaway occurred. Coders agreed on 95% of frames
on average (N = 63 Kinder; SD = 5.6%), and the mean absolute difference in looking time
between coders was .77 S (SD = .94 S).
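As an illustration of this scoring rule, the R sketch below computes a single trial's looking time from a per-frame vector of looking codes; the function and argument names are hypothetical, not part of our coding scripts.

    # Sketch of the looking-time rule, assuming per-frame codes at a known frame
    # rate (TRUE = looking at screen); names and defaults are hypothetical.
    looking_time <- function(looks, fps, lookaway_s = 1) {
      first_look <- match(TRUE, looks)          # first frame of looking
      if (is.na(first_look)) return(0)          # child never looked at the screen
      run <- 0                                  # current lookaway run, in frames
      for (i in first_look:length(looks)) {
        run <- if (looks[i]) 0 else run + 1
        if (run >= lookaway_s * fps)            # first continuous 1-s lookaway:
          return((i - run + 1 - first_look) / fps)  # time up to its start
      }
      (length(looks) - first_look + 1) / fps    # no valid lookaway: rest of trial
    }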
Table 2. Summary of differences between Téglás et al. (2007) and the Lookit study (Study 1).

Parent blinding
  Téglás et al. (2007): Parents wore opaque glasses for the entire study.
  Lookit: Parents were asked to close their eyes during looking time measurements, but could watch the initial videos of objects bouncing in the lottery containers. Coders checked parent compliance.

Familiarization trials
  Téglás et al. (2007): Two.
  Lookit: Four (repeated original sequence).

Infant control
  Téglás et al. (2007): Videos of objects bouncing in the container were paused whenever the infant wasn't paying attention; test trials ended after lookaway.
  Lookit: No infant control during videos.

Starting test trials
  Téglás et al. (2007): Test trial started (object exited container) only once infant was looking at the screen (experimenter controlled).
  Lookit: Test trial started after 5-s instructions and once infant was looking at the screen (parent controlled).

Ending test trials
  Téglás et al. (2007): First continuous 2-s lookaway or at 30-s total looking time.
  Lookit: After 20 s; looking time measured from video as time until first 1-s lookaway.

Age range
  Téglás et al. (2007): 12–13 months.
  Lookit: 11–18 months.

Exclusion criteria
  Téglás et al. (2007): Looking at ceiling on more than two test trials, looking away in synchrony with object exiting the container, or fussy (20 of 40 participants excluded).
  Lookit: Looking at ceiling on more than two test trials, looking < 1 s on any test trial, or fussy or distracted on more than one test trial (43 of 92 participants excluded, not counting participants excluded for technical problems or parent interference).

Physical setup
  Téglás et al. (2007): Container covering 14 x 14-cm area on a 17-inch monitor; infants 80 cm away from monitor.
  Lookit: Videos displayed full-screen on participants' computer monitors; monitor sizes and distances not measured. Container covers approx. 11 x 11-cm area on a typical 13-inch laptop monitor or 20 x 20-cm area on a 22-inch external monitor.

Results and Discussion

Mean looking times to probable and improbable outcomes for this study and Téglás et al.
(2007) are shown in Figure 1. To better assess the effect of outcome probability on infants'
looking times, we performed a hierarchical linear regression of the four test trial looking times
(grouped by child) against whether the outcome was improbable and trial number, omitting
trials where children were distracted or fussy (see Table 3). Improbable outcomes were asso-
ciated with an increase in looking time of 0.62 s (95% CI [–0.68, 1.92]). This is smaller than
the 3.21 s (95% CI [0.36, 6.1]) reported by Téglás et al.
(2007), which was replicated twice
within the lab with differences of 3.7 s (Téglás, Vul, Girotto, Gonzalez, Tenenbaum, & Bonatti,
2011) and 3.4 s (Téglás, Ibanez-Lillo, Costa, & Bonatti, 2015). However, variances in looking
times in each condition are similar (4.0–4.2 s on Lookit compared to 3.6–5.6 s in the original
data). We observed no correlations between fraction looking time to improbable outcomes
and age, total looking time during training trials, or number of fussy/distracted trials (all |r| <
.1 and p > .5, Spearman rank order correlation).
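The model above can be expressed in R as follows; this is a sketch assuming long-format trial data with hypothetical column names, using lme4 for the by-child grouping (our actual analysis code is in the Supplemental Materials).

    # Sketch of the trial-level model: looking time against outcome probability
    # and trial order, with a random intercept per child. Names are hypothetical.
    library(lme4)
    ok <- !trials$fussy & !trials$distracted    # drop fussy/distracted trials
    fit <- lmer(looking ~ improbable + order + (1 | child), data = trials[ok, ])
    summary(fit)
    confint(fit, method = "Wald")               # 95% CIs for the fixed effects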
This study confirms our ability to collect and reliably code looking times on Lookit, with
good intercoder agreement despite varying webcam placement and video quality. The effect
we observed on Lookit was smaller but in the same direction as Téglás et al. (2007).
[Figure 1: bar graph of mean looking time (s) to improbable and probable outcomes, for Téglás et al. (2007) and Lookit.]

Figure 1. Mean looking times to improbable and probable outcomes for Study 1 (N = 49) and Téglás et al. (2007) Experiment 1 (N = 20). Error bars show SEM.
Although there was no evidence that our broader age range or fussiness/distraction contributed to the
reduced effect, several procedural differences likely contributed. First, parents were asked
to close their eyes and press the space bar after the container was occluded and before the
outcome was displayed; this introduced a 5-s delay, potentially an untenable memory demand
for the infants. Fortunately, this is not due to any fundamental limitation of the online platform;
this delay could be avoided by having the parent close her eyes earlier in the trial or throughout.
Second, the original experiment paused familiarization videos when the infant looked away
and ended test trials upon lookaway. Here both familiarization and test videos were displayed
at a fixed rate and neither was contingent on infant gaze. Third, we cannot discount the
possibility that the looking time measure was not well suited to online testing in children's
homes, requiring a less distracting or more standardized environment. However, overall mean
looking times on Lookit were comparable to those on the same videos in the lab (M = 10.8 s,
SD = 3.6 s), suggesting that we did not simply lose the child's attention to more interesting
visual environments at home.
Table 3. Coefficients for hierarchical linear regression of test trial looking times, excluding measurements where the child was fussy or distracted (178 measurements, three or four per participant, 49 participants).

                       B        SE B    p       95% CI
Intercept              13.08    .94             [11.23, 14.94]
Improbable (1 or 0)    .62      .66     .35     [–0.68, 1.93]
Order (1, 2, 3, 4)     –.86     .29     .003    [–1.44, –0.29]

Note. B = regression coefficient (unstandardized); SE = standard error; p = p-value for the null hypothesis that B = 0; CI = confidence interval.
STUDY 2: PARTICIPANT ROLES FROM SYNTACTIC STRUCTURE, USING PREFERENTIAL LOOKING IN TWO-YEAR-OLDS
This study was adapted from Yuan and Fisher (2009). The original study demonstrated syn-
tactic bootstrapping in toddlers (27–30 months). A novel transitive or intransitive verb ("blicking")
was introduced by videotaped dialogues, absent any potential visual targets for the verb's
meaning. Later, children were either asked to "Find blicking" or simply asked "What's hap-
pening?” and looking preference for videotaped one- versus two-participant actions was mea-
sured. Children who had heard a transitive verb showed a preference for the two-participant
actions at test only when asked to find the verb, showing that they had mapped the transitive
verbs onto two-participant events.
Method
Participants  We received potentially usable video from 140 toddlers (24–36 months). One
child was excluded for previous participation and another due to a technical problem that led
to recording starting late. Videos from the remaining 138 participants were coded; children
were excluded if the parent pointed, spoke, or had her eyes open during either test trial, or if
the parent peeked on both test trials (n = 14). Children were also excluded if their practice
scores were under .15 (n = 50), their total looking time during the two test trials was less than
7.5 seconds (n = 1), or they looked at less than 60% of the dialogue (n = 6). Data from the
remaining 67 children (32 female) are included in the main analysis. Their ages ranged from
24.2 to 35.8 months (M = 29.2 months, SD = 3.8 months).
Procedure  The study started with two practice phases using familiar verbs, one intransitive
(clapping) and one transitive (tickling). Each practice phase consisted of the child being asked
to find the target verb while a video of the target action was shown on one side of the screen
and a distractor video (showing sleeping or feeding, respectively) was simultaneously playing
on the other side. This 8-s trial was repeated three times.
Next, children saw a video of two women using one of four novel verbs (blicking, glorping,
meeking, or pimming) in three short dialogues comprising four transitive or intransitive
sentences each. Finally, children completed a test phase in which a one-participant and a
two-participant action were shown simultaneously on opposite sides of the screen. In the
experimental condition, children were asked to find the novel verb (e.g., "Find blicking!"). In
the control condition they were simply asked, "What's happening?" Parents were instructed to
close their eyes during the second practice phase and the test phase. Immediately before the
practice and test phases, a 13-s calibration clip was shown in which a colorful spinning ball
appeared first on the left and then on the right of the screen.
Differences between this replication and the original experiments are summarized in
Table 4. Because delay between the dialogue and test phase was not found to matter in the
original studies, we chose a shorter delay, closer to Yuan and Fisher’s (2009) Experiment 1,
for convenience. We created four stimulus sets (verbs paired with examples of one- and two-
participant actions) to reduce the probability that an observed effect (or lack thereof) could be
explained by low-level features of the stimuli.
Table 4. Summary of procedural differences between Yuan and Fisher (2009) and Study 2.

Dialogues before familiar verbs? ("Mary was clapping")
  Exp. 1: Yes.
  Exp. 2 (same-day condition): No (to avoid superficial learning during training).
  Lookit: No.

Familiar verbs
  All: Clapping and tickling.

Delay between end of novel-verb dialogue and start of test phase
  Exp. 1: 7 s.
  Exp. 2: 100–120 s.
  Lookit: 13 s (calibration video).

Novel-verb dialogue
  Exp. 1: 8 sentences.
  Exp. 2: 12 sentences.
  Lookit: 12 sentences.

Novel verbs and actions used
  Exp. 1 and 2: Blick; one intransitive and one transitive action.
  Lookit: Blick, glorp, meek, pimm; four of each type of action paired to verbs.

Video events
  Exp. 1 and 2: 8 s, on two separate screens.
  Lookit: 8 s, on same screen.

Test trials
  Exp. 1: 2.
  Exp. 2: 3.
  Lookit: 3; first 2 used in analysis to allow stricter inclusion criteria regarding fussiness and parent intervention.

Control ("What's happening?")
  Exp. 1: No.
  Exp. 2: Yes.
  Lookit: Yes.

Age range
  Exp. 1: 26.6–30.2 months (M = 28.6 months).
  Exp. 2: 26.8–30.4 months (M = 28.4 months).
  Lookit: 24.2–35.8 months (M = 29.2 months).

Exclusion criteria
  Exp. 1: Fussiness (6%).
  Exp. 2: Side bias (3%), distraction (1%), practice trial performance ≥ 2.5 SD below mean (1%), looking time to 2-participant event ≥ 2.5 SD from condition mean (3%).
  Lookit: Parent interference (10%), practice scores < 0.15 (36%), low total looking time to test or dialogues (5%).

Coding  Two coders blind to left/right placement of action videos coded each clip using
VCode (Hagedorn et al., 2008) and for fussiness, distraction, and parent actions, as described
in Scott and Schulz (2017). As in the original study, our dependent measure was preferential
looking; although some children pointed at the screen, this was rare, and we expected looking
to be a more robust measure. For each trial, we computed the fraction of total looking time
spent looking to the right/left (fractional looking time). Mean disagreement between coders on
fractional looking time was 4.4% of trial length (SD = 2.0%, N = 138 participants).
A practice score was computed for each child as fractional right looking time during the
“tickling” practice phase minus fractional right looking time during the “clapping” practice
phase. A score of 1 indicates looking only to the target actions. The mean practice score
across the 138 coded participants was .24 (SD = .23); 36% of children were excluded due to
practice scores under .15 (see Figure 2). Children were attentive during the dialogues, looking
for an average of 83% (SD = 16%) of the duration of the dialogue videos.
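In code, the practice score amounts to a difference of two proportions; a minimal sketch with hypothetical variable names:

    # Sketch of the practice score for one child, assuming per-phase looking
    # totals in seconds; all names are hypothetical placeholders.
    frac_right <- function(right_s, left_s) right_s / (right_s + left_s)
    practice_score <- frac_right(tickle_right, tickle_left) -  # target on right
                      frac_right(clap_right,  clap_left)       # target on left
    include <- practice_score >= 0.15   # Study 2 inclusion threshold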
Results and Discussion
Based on the original studies conducted by Yuan and Fisher (2009), we expected the
proportion of looking time to the two-participant events to be greater for transitive than
intransitive verbs in the experimental condition, but not the control condition. Figure 3 shows
the fractional two-participant looking time per child. To measure the effect of transitive verbs
in each condition, we performed a linear regression of the fractional two-participant looking
time on condition (experimental/control), verb type (transitive/intransitive), the interaction
between condition and verb type, and stimulus set used (dummy-coded). Observations were
weighted by (0.5 − 0.5 × practice score)^−2, using the time spent looking at mismatching
actions during practice to estimate measurement variance. This reflects the intuition that if a child
performed "well" on practice trials then her behavior at test better indicates her understanding
of the novel verb. Regression coefficients are shown in Table 5.

[Figure 2: per-child fractions of right-looking during the clapping and tickling practice phases, for included and excluded participants.]

Figure 2. Fraction of time spent looking to the right during the two practice action phases in Study 2. In the first phase, children were asked to find clapping (on the left), and in the second they were asked to find tickling (on the right). Each of 138 potentially included children's responses is displayed as one line; the x-axis position is jittered slightly for display. Children were only included in the main analysis if the fraction of looking time to the right was at least 0.15 greater during "tickling" than "clapping" clips. Yellow lines show the mean fractions of time spent looking to the right for included (solid line) and excluded (dotted line) participants.
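Returning to the weighted regression described above, a minimal R sketch follows; the data frame and column names are hypothetical (the fitted coefficients from our data appear in Table 5).

    # Sketch of the weighted model, one row per child. frac2p is the fractional
    # two-participant looking time; names are hypothetical placeholders.
    w <- (0.5 - 0.5 * test$practice_score)^-2   # better practice => higher weight
    fit <- lm(I(frac2p - 0.5) ~ cond * verb_type + factor(stim_set),
              data = test, weights = w)         # cond * verb_type adds interaction
    summary(fit)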
For comparison, we performed a similar regression on the data from the 80 children
in Experiment 2 of Yuan and Fisher (2009); for details, see the Supplemental Materials. Figure 4
shows the beta coefficients associated with transitive verbs in each condition. We observe
effect sizes similar to the original in both conditions on Lookit, with a positive effect in the
experimental condition and nearly zero effect in the control condition.
The familiar-verb trials afforded an opportunity to check for effects of family socio-
economic status (SES) on data quality. Among potentially usable sessions with demographic
data on file, we did not observe any correlations between practice scores and either SES
measure (income: r = .033, p = .74, n = 101; maternal education: r = .077, p = .44, n =
103, Spearman rank order correlations).
This study confirms that we can collect and reliably code preferential looking mea-
sures on Lookit, with good intercoder agreement. Like Yuan and Fisher (2009), we observed
increased looking to the two-participant actions only when children were asked to find
transitive verbs. However, looking to the correct actions during the practice phases was less
reliable than in the original study; additional work is needed to obtain preferential looking
responses online that are directly comparable to behavior in the lab and retain more of the
data collected. Practice scores were not correlated with SES measures, suggesting that differences
were due to the presentation medium rather than the population.

[Figure 3: per-child fractional looking time to the 2-participant video, by condition (Experimental "Find [verb]!" vs. Control "What's happening?") and verb type (transitive vs. intransitive).]

Figure 3. Fraction of total looking time (left and right) across the two test trials spent looking toward the two-participant action by condition. Each dot represents one child; lines are drawn at the means. The dotted line at 0.5 represents equal looking to one- and two-participant actions.
Table 5. Coefficients for practice score weighted linear regression of fractional two-participant looking time on condition, verb type, interaction between condition and verb type, and stimulus set used (N = 67).

                                               B        SE B    p       95% CI
Intercept                                      .013     .04     .76     [–.07, .10]
Condition (1 = experimental, 0 = control)      .0010    .05     .98     [–.10, .10]
Verb type (1 = transitive, 0 = intransitive)   .0014    .05     .98     [–.10, .10]
Experimental * transitive                      .10      .07     .18     [–.05, .25]
Stimuli set 1 (of 4)                           .11      .05     .04     [.01, .21]
Stimuli set 2                                  –.067    .05     .17     [–.17, .03]
Stimuli set 3                                  –.002    .05     .97     [–.10, .09]

Note. Fractional two-participant looking time is reduced by 0.5 so that 0 represents equal looking to one- and two-participant actions. The coefficient associated with verb type here corresponds to the effect of verb type within the control condition. B = regression coefficient (unstandardized); SE = standard error; p = p-value for the null hypothesis that B = 0; CI = confidence interval.
[Figure 4: effect sizes for Yuan and Fisher (2009) and Lookit in the Experimental ("Find [verb]!") and Control ("What's happening?") conditions; y-axis: effect size.]

Figure 4. Effect sizes for effect of transitivity on fractional looking time to two-participant actions in experimental and control conditions. Effect sizes are the coefficients associated with transitive, as compared to intransitive, verbs from 2 x 2 linear regressions of the fraction of looking time each child spent looking at two-participant actions against condition and verb type. The regression of data from Study 2 (N = 67) additionally weighted data based on variance estimated from practice trials and included predictors for the various stimulus sets used. Each regression was run with condition coded "Find [verb]!" = 0, "What's happening?" = 1 and vice versa so that the beta coefficients associated with transitive verbs reflected the increase in fractional looking time to transitive verbs in the "Find [verb]!" and "What's happening?" conditions respectively. Error bars show standard errors. Data from Yuan and Fisher (2009) used with permission (N = 80).
STUDY 3: TRUST IN TESTIMONY, USING VERBAL RESPONSES FROM PRESCHOOLERS

The final user study was adapted from Pasquini et al. (2007), which investigated whether 3-
and 4-year-olds monitor and evaluate the previous reliability of informants when deciding
which of two informants to trust. Children watched videos in which two informants, one more
and one less reliable, labeled four familiar objects and later provided conflicting labels for
novel objects. Four-year-olds explicitly identified the more accurate informant and endorsed
her novel-object labels in all conditions tested; 3-year-olds' performance was distinguishable
from chance when one informant was 100% accurate.
Method
Participants We received potentially usable video from 125 three-year-olds and 109 four-
year-olds. Children were excluded for failure to answer at least three of the four familiar-
object questions correctly (33 three-year-olds and 14 four-year-olds), failure to choose either of
the informants on the first explicit-judgment question (9 three-year-olds and 9 four-year-olds),
and failure to endorse either informant’s label on at least one of the novel-object questions
(11 three-year-olds and 10 four-year-olds). (For details of exclusion criteria selection see the
Supplemental Materials.) Data from the remaining 72 three-year-olds (45 female; M = 3.53
years) and 76 four-year-olds (44 female; M = 4.54 years) are included in the main analysis.
Procedure  Following the original study, children completed four familiar-object trials, one
initial explicit judgment trial, four novel-object trials, and one final explicit judgment trial.
Children were assigned to one of four conditions where informants demonstrated 100% vs.
0%, 100% vs. 25%, 75% vs. 0%, or 75% vs. 25% accuracy, transforming the original within-
subjects to a between-subjects design in order to keep the study short (about 10 min). During
object trials, children saw a video of two informants taking turns labeling the same object. The
informants' answers were repeated and the child was asked what he/she thought the object was
called (endorsement measure). Onscreen instructions guided parents to replay the question if
needed or to prompt the child without repeating the object labels. The objects and labels used
are shown in Table 6. During explicit judgment trials, children were asked, "Who was better
at answering these questions, the girl in the yellow shirt or the girl in the red shirt?" Differences
between this replication and the original experiments are summarized in Table 7.

See the Supplemental Materials for coding procedures. If the parent gave the answer
before the child's final answer, it was treated as an invalid answer. Parents interfered in 8%
of trials by repeating the two options or answering the question themselves before the child's
final answer.

Table 6. Object pictures and labels used in Study 3.

Familiar objects (pictured), with their labels:
  Label 1 (accurate):     spoon, duck, hat, bottle
  Label 2 (inaccurate):   apple, fork, brush, plate
  Label 3 (inaccurate, used if both informants are inaccurate): key, doll, cup, tree

Novel objects (pictured), with their labels:
  Label 1 (given by girl in yellow shirt): toma, gobi, danu, modi
  Label 2 (given by girl in red shirt):    dax, wug, riff, fep
Results and Discussion
We analyzed two measures of children’s performance:
their answers to the initial explicit-
judgment question and the number of endorsements of each informant’s labels during the
novel-object phase. To compare these endorsements to the original results, we calculated
the fraction correct (number of endorsements of the more accurate informant divided by total
number of endorsements). Performance on each measure in the present study and Pasquini
et al.
(2007) is shown in Figure 5. Children were less likely to endorse either label on Lookit
than in the original study (where all children were required to answer all questions); only 21%
of 3-year-olds and 45% of 4-year-olds gave valid answers to all four endorsement questions.
Overall, we did observe modestly lower performance on Lookit on both explicit judgment and
endorsement questions; weighting each condition and age group equally, Lookit scores were
7.2 percentage points lower on explicit-judgment questions and 10 percentage points lower
on endorsement questions. However, the overall patterns observed were similar: the eight
explicit judgment and endorsement performances, per condition and age group, were highly
correlated across the two studies (explicit preference: r = .78, p = .024; endorsement: r = .88,
p = .004).
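A sketch of these two summary computations in R, with hypothetical vector names:

    # Per child: endorsements of the more accurate informant / all endorsements
    frac_correct <- n_endorse_accurate / (n_endorse_accurate + n_endorse_other)

    # Across the 8 condition-by-age-group cells, correlate mean performance
    # on Lookit with the corresponding means from Pasquini et al. (2007)
    cor.test(lookit_cell_means, pasquini_cell_means)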
Table 7. Summary of procedural differences between Pasquini et al. (2007) and Study 3.

Accuracy conditions
  Pasquini et al., 2007, Exp. 1: 100% vs. 0%; 100% vs. 25%; 75% vs. 0%.
  Pasquini et al., 2007, Exp. 2: 75% vs. 0%; 75% vs. 25%.
  Lookit: 100% vs. 0%; 100% vs. 25%; 75% vs. 0%; 75% vs. 25%.

Design
  Exp. 1 and 2: Within-subjects; different informants and objects used for each condition.
  Lookit: Between-subjects; same informants and objects used for each condition. (Familiar objects and labels, and novel-object labels, were the ones used by Pasquini et al., 2007, for the 100% vs. 0% condition of Exp. 1 and the 75% vs. 0% condition of Exp. 2.)

False-belief task
  Exp. 1: Yes (no relationship with other measures found).
  Exp. 2: No.
  Lookit: No.

Dependent measures
  Exp. 1 and 2: Endorsement, explicit judgment, and ask ("which person would you like to ask?"). Children were also asked what each object was called before the informants answered. No main effects of question type were found on ask vs. endorsement questions with explicit judgment as a covariate.
  Lookit: Endorsement and explicit judgment; to keep the experiment short we omitted the "ask" measure and did not ask children what objects were called before trials.

Explicit judgment question
  Exp. 1: "One of these people was not very good at answering these questions. Which person was not very good at answering these questions?"
  Exp. 2: 1. [For each informant] "Was the girl with the ___ shirt good at answering the questions or was she not very good at answering the questions?" 2. "Who was better at answering the questions: the girl in the ___ shirt or the girl in the ___ shirt?"
  Lookit: "Who was better at answering these questions, the girl in the yellow shirt or the girl in the red shirt?"

Exclusion criteria
  Exp. 1 and 2: Incorrect response to any familiar-object question, unless both informants were incorrect (6% overall).
  Lookit: Failure to answer at least 3 of 4 familiar-object questions correctly (20%); failure to choose an informant on the first explicit-judgment question or an informant's label on at least one endorsement question (17%).

To check for effects of SES on performance, we conducted a logistic regression of explicit
judgment responses from the 105 included subjects with demographic information on file,
using a composite SES score (mean of z-scored maternal education level and z-scored family
income) in addition to age in years, condition, and age by condition interactions. The effect
of SES on the probability of giving a correct response was small and nonsignificant
(e^B = 1.00, 95% CI [0.56, 1.83], z = .02, p = .98).
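The corresponding R sketch, again with hypothetical data frame and column names:

    # Sketch of the SES check, one row per included child with demographics.
    # Composite SES = mean of z-scored maternal education and family income.
    kids$ses <- rowMeans(scale(kids[, c("maternal_ed", "income")]))
    fit <- glm(correct ~ ses + age_years * condition, data = kids,
               family = binomial)
    exp(coef(fit)["ses"])                 # odds ratio for SES (e^B)
    exp(confint.default(fit)["ses", ])    # Wald 95% CI on the odds-ratio scale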
In this study we confirmed the viability of Lookit for collecting verbal responses from
preschoolers. Despite increased variation expected due to the between-subjects design and
slightly reduced performance overall, we observed very similar patterns of performance based
on age group and condition compared to the original study.
[Figure 5: bar graphs of mean fraction correct on explicit judgment and endorsement questions for (a) 3-year-olds (ns = 18, 17, 18, 19) and (b) 4-year-olds (ns = 22, 18, 16, 20) in the 100/0, 100/25, 75/0, and 75/25 conditions, comparing Lookit with Pasquini et al. (2007), Exps. 1 and 2.]
Figure 5. Mean performance on explicit judgment questions (top row) and endorsement ques-
tions (bottom row), by age group.
Means and 95% confidence intervals are plotted for Study 3 and for Experiments 1 and 2 of Pasquini
et al.
(2007). The study was conducted between-subjects on Lookit and within-subjects in both
Pasquini et al. experiments. Explicit judgment performance is 0 or 1 for Lookit participants and 0,
0.5, or 1 for Pasquini et al. participants. A child’s endorsement question performance is the number
of times (over the four trials) that she chose the label of the more accurate informant divided by the
number of times she endorsed either label.
GENERAL DISCUSSION
Collectively, our user case studies confirm the feasibility and suggest the promise of conduct-
ing developmental research online. Parents of children ranging from infants to preschoolers
were able to access the platform and self-administer the study protocols in their homes at their
convenience. Researchers were able to securely collect and reliably code looking time, prefer-
ential looking, and verbal response measures. We did not observe any relationships between
SES and children’s performance. More critically (since such relationships may well emerge
or be the topic of investigation in other studies), SES differences did not adversely affect par-
ents’ ability to interact with the platform: there was no effect of SES on exclusion rates. This
suggests that online testing can fulfill the goals of expanding access and lowering barriers to
participation in developmental research.
The current project was designed to investigate the possibility of collecting looking
time, preferential looking, and verbal response measures online; the results of the user case
studies suggest that in these respects, the project was successful. However, in adapting the
studies to the online environment, the studies fell short of true replications. Assessing the
degree to which various designs and results are directly reproducible online, and whether
sample diversity moderates effect size, remains an important direction for future research,
and will be critical to understanding the relationship between online testing and laboratory-
based protocols.
Although we cannot yet conclude that measures collected on Lookit are directly com-
parable with those collected in the lab, the similarity of the results of Studies 2 and 3 to pub-
lished results is very encouraging. In Study 1, we observed effects in the same direction as the
lab-based study, but a smaller effect size than initially reported; further research must deter-
mine to what extent this was due to the protocol differences we introduced or to difficulties
adapting infant looking time measures to the online environment. As noted in the accompa-
nying conceptual paper (Scott & Schulz, 2017), online testing is not appropriate for every
study, and may be more appropriate for some designs than others (i.e., preferential looking
rather than looking time). The initial empirical results, however, provide grounds for opti-
mism about the potential of Lookit to extend the scope, transparency, and reproducibility of
developmental research.
ACKNOWLEDGMENTS
We thank all of the families who participated in this research. Thanks to the Boston Children’s
Museum where we conducted early piloting of computer-based studies. We thank Joseph
Alvarez, Daniela Carrasco, Jean Chow, DingRan (Annie) Dai, Hope Fuller-Becker, Kathryn
Hanling, Nia Jin, Rianna Shah, Shirin Shivaei, Tracy Sorto, Yuzhou (Vivienne) Wang, and Jean
Yu for help with data collection and coding. Special thanks to Kathleen Corriveau, Cynthia
Fisher, and Ernő Téglás for original stimuli and helpful comments; lab managers Rachel Magid
and Samantha Floyd for logistical support; and Elizabeth Spelke, Joshua Tenenbaum, and
Rebecca Saxe for helpful discussions. This material is based upon work supported by the Na-
tional Science Foundation (NSF) under Grant No. 1429216, NSF Graduate Research Fellow-
ship under Grant No. 1122374, and by the Center for Brains, Minds and Machines (CBMM),
funded by NSF STC award CCF-1231216.
AUTHOR CONTRIBUTIONS
KS developed the methodology, designed the studies, and collected data with the advice of
LS. Data analysis and interpretation was performed by KS with contributions from JC. KS and
LS prepared the manuscript.
REFERENCES
Ferhat, O., & Vilariño, F. (2016). Low cost eye tracking: The current panorama. Computational Intelligence and Neuroscience, 2016(3), 1–14. doi.org/10.1155/2016/8680541
R Core Team. (2015). R: A language and environment for statisti-
cal computing. R Foundation for Statistical Computing, Vienna,
Austria. URL http://www.R-project.org/.
Hagedorn, J., Hailpern, J., & Karahalios, K. (2008). VCode and VData: Illustrating a new framework for supporting the video annotation workflow. Proceedings of the Workshop on Advanced Visual Interfaces AVI, 2008, 317–321. doi.acm.org/10.1145/1385569.1385622
Pasquini, E. S., Corriveau, K. H., Koenig, M., & Harris, P. L. (2007). Preschoolers monitor the relative accuracy of informants. Developmental Psychology, 43(5), 1216–1226. doi.org/10.1037/0012-1649.43.5.1216
Scott, K. M., Chu, J., & Schulz, L. E. (2017). Replication data for: Lookit (part 2): Assessing the viability of online developmental research, results from three case studies. doi.org/10.7910/DVN/9PCXYB, Harvard Dataverse, V1.
Scott, K. M., & Schulz, L. E. (2017). Lookit: A new online platform for developmental research. Open Mind: Discoveries in Cognitive Science.
Téglás, E., Girotto, V., Gonzalez, M., & Bonatti, L. L. (2007). Intuitions of probabilities shape expectations about the future at
12 months and beyond. Proceedings of the National Academy of
Sciences of the United States of America, 104(48), 19156–
19159. doi.org/10.1073/pnas.0700271104
Téglás, E., Ibanez-Lillo, A., Costa, A., & Bonatti, L. L. (2015). Numerical representations and intuitions of probabilities at 12 months. Developmental Science, 18(2), 183–193. doi.org/10.1111/desc.12196
Téglás, E., Vul, E., Girotto, V., Gonzalez, M., Tenenbaum, J. B., & Bonatti, L. L. (2011). Pure reasoning in 12-month-old infants as probabilistic inference. Science, 332(6033), 1054–1059. doi.org/10.1126/science.1196404
Yuan, S., & Fisher, C. (2009). "Really? She blicked the baby?" Two-year-olds learn combinatorial facts about verbs by listening. Psychological Science, 20(5), 619–626. doi.org/10.1111/j.1467-9280.2009.02341.x