REPORT
Inconvenient Samples: Modeling Biases Related
to Parental Consent by Coupling Observational
and Experimental Results
Yue Yu¹, Patrick Shafto², and Elizabeth Bonawitz²
¹Centre for Research in Child Development, National Institute of Education, Singapore
²Rutgers University-Newark
Keywords: parental consent, parent–child interaction, pedagogical question, multiple imputation,
propensity score matching
ABSTRACT
In studies involving human subjects, voluntary participation may lead to sampling bias,
thus limiting the generalizability of findings. This effect may be especially pronounced in
developmental studies, where parents serve as both the primary environmental input and
decision maker of whether their child participates in a study. We present a novel empirical
and modeling approach to estimate how parental consent may bias measurements of
children’s behavior. Specifically, we coupled naturalistic observations of parent–child
interactions in public spaces with a behavioral test with children, and used modeling
methods to impute the behavior of children who did not participate. Results showed that
parents’ tendency to use questions to teach was associated with both children’s behavior
in the test and parents’ tendency to participate. Exploiting these associations with a
model-based multiple imputation and a propensity score–matching procedure, we estimated
that the means of the participating and not-participating groups could differ by as much as 0.23
standard deviations for the test measurements, and standard deviations themselves are likely
underestimated. These results suggest that ignoring factors associated with consent may lead
to systematic biases when generalizing beyond lab samples, and the proposed general
approach provides a way to estimate these biases in future research.
Citation: Yu, Y., Shafto, P., & Bonawitz, E. (2020). Inconvenient Samples: Modeling Biases
Related to Parental Consent by Coupling Observational and Experimental Results. Open Mind:
Discoveries in Cognitive Science, 4, 13–24. https://doi.org/10.1162/opmi_a_00031
DOI: https://doi.org/10.1162/opmi_a_00031
Supplemental Materials: https://www.mitpressjournals.org/doi/suppl/10.1162/opmi_a_00031
Received: 6 September 2018
Accepted: 23 December 2019
Competing Interests: The authors declare they have no conflict of interest.
Corresponding Author: Yue Yu, yue.yu@nie.edu.sg
Copyright: © 2020 Massachusetts Institute of Technology. Published under a Creative Commons
Attribution 4.0 International (CC BY 4.0) license.
An open access journal. The MIT Press.
INTRODUCTION
Sampling and generalizability are the methodological bedrocks of behavioral science, and
knowing whether the sample is representative of the population is critical to the validity and
generalizability of research findings. Among the many factors that may bias the sampling
process, one prevalent but understudied factor is the refusal to participate in research. The goal
of this study is to develop a method to estimate would-be experimental performance for those
who did not consent, and to inform the generalizability of research findings.
We chose to focus on one of the fields in which behavior tends to be heterogeneous
along factors that may associate with nonenrollment: research with young children. Before the
start of schooling, children’s experiences are heavily influenced by the values and practices of
their parents, which are known to be heterogeneous both within and between social groups
(Bornstein, 1991; Hoff, Laursen, Tardif, & Bornstein, 2002). Yet, parents are also the ones who
decide whether their children participate in research, and those same values and practices may
play a role in their decision to consent. If variables that influence the likelihood of parental
consent also influence children’s behavior, then this presents a major hurdle for generalizing
findings from the field.
Nonenrollment may also have larger impacts on fields in which substantial proportions
of potential participants decline or ignore recruitment efforts from researchers. Through a
survey among lab managers and project leaders, we estimated the base rate of parental consent
in developmental experiments as approximately 50%, which also differed based on recruitment
methods (see Supplemental Materials A; Yu, Shafto, & Bonawitz, 2020). This suggests a
substantial rate of nonenrollment, which necessitates closer examination of its impact.
To date, little is known about whether and how parental consent may bias findings from
experiments with young children. We know that in the field of survey-based research with
school-aged children and adolescents, parental consent has been associated with family
demographics and student behavior (Kearney, Hopkins, Mauss, & Weisheit, 1983; Lueptow, Mueller,
Hammes, & Master, 1977). Moreover, students recruited through passive consent (which
requires a reply to opt out of a study) differ from students recruited through active consent (which
requires a reply to opt in) in a number of characteristics, including race, family environment,
school performance, and percentage of at-risk youth (Anderman et al., 1995; Dent et al., 1993;
Esbensen, Miller, Taylor, He, & Freng, 1999). However, this type of research is lacking regarding
children before school age, possibly due to a lack of archival information like school records,
as well as ethical concerns in using passive consent for experiments that require face-to-face
interactions with young children. More importantly, although existing research on parental
consent has shed light on who is underrepresented because of nonenrollment, we still know
little about whether and how nonenrollment may bias research findings.
Existing research on nonenrollment also tends to focus on studies with correlational
designs, but experimental research is subject to biases related to nonenrollment as well. Whereas
randomly assigning participants to treatment and control groups may eliminate systematic
between-group differences on potential confounding factors, the effects of treatment found in
such studies are still confined by the characteristics of the research sample before the random
assignment, and may not apply to those who are underrepresented in the sample from the start.
This study takes a first step to investigate whether and how factors associated with parental
consent affect research findings in developmental experiments. We developed a novel
approach to achieve that goal: coupling naturalistic observations of public parent–child
interactions with behavioral tests with children. Observations of public behavior are commonly used
in sociology, anthropology, and psychology to study human behavior (Goffman, 1966).
Specific to developmental psychology, researchers have observed and live-coded children’s and
adults’ actions and interactions in public spaces like zoos and supermarkets (Ridge, Weisberg,
Ilgaz, Hirsh-Pasek, & Golinkoff, 2015; Whiten et al., 2016), and these studies have
contributed to our understanding of people’s naturalistic behavior without awareness of being in
an experiment.
We started with observing parent–child interactions in public spaces to obtain a relatively
representative distribution that is unaffected by the consent process. During the observation,
we coded aspects of parent–child interactions that are known to be causally linked to
children’s behavior. We then invited children who were observed to participate in a behavioral
test. By analyzing the correlations between the observational and test data, and between the
observational data and participation, we looked for predictors that may associate test data with
participation itself, which would indicate a difference in test data between children who
participated and did not participate. We then used model-based multiple imputation and propensity
score–matching procedures to simulate the behavior of children who did not participate.1
Finally, these simulation results were used to assess whether the means and standard
deviations of the participating group were biased estimates of those of the initial population.
We examined a domain where there is known heterogeneity in parenting practices:
asking questions to teach. This line of research is grounded in a rich literature about informal
pedagogy (Bonawitz et al., 2011; Csibra & Gergely, 2009), which suggests that the format in
which parents and educators choose to present evidence to children influences how children
infer and learn. Specifically, recent experiments have shown that questions asked by
knowledgeable adults improve children’s learning (Gutwill & Allen, 2010, 2012; Haden, Cohen,
Uttal, & Marcus, 2015), and these “pedagogical questions” are particularly effective in
facilitating children’s exploratory learning of causal properties of a novel artifact (Yu, Landrum,
Bonawitz, & Shafto, 2018). Therefore, in this study we examine whether asking pedagogical
questions may also be associated with parental consent, thus leading to biases in measurements
of children’s exploratory learning. We examined this hypothesis by replicating one condition
of the previous experiment (Yu et al., 2018), in which an experimenter asked a pedagogical
question about a novel toy before leaving children to explore that toy. Here we added a
critical observation phase, in which parents’ pedagogical questions toward children were coded
along with other parent–child interaction measurements. This allows us to look for associations
between parents’ pedagogical questions and children’s exploratory learning in the test phase.
And because the observational data are available for children who did not participate, these
associations could then be used to simulate what the not-participating children would have
done in the test. The final goal is to compare the results from the participating group and the
results we would have obtained if all parents consented. A shortened version of this research
was presented in the Proceedings of the 39th Annual Conference of the Cognitive Science
Society (Yu, Bonawitz, & Shafto, 2017). This report includes results from new analyses and a new
survey, which resulted in a more robust and grounded method of evaluating biases related to
nonparticipation.
METHOD
Participants
We set up the study in two sites: an indoor reptile exhibit in a zoo, and an indoor playground.
We chose these two sites to ensure diversity in the population we initially observed (for details see
Supplemental Materials B; Yu et al., 2020). Seventy-eight parent–child dyads (41 from the
zoo and 37 from the playground) were observed and then invited for the test. All children
were between 3 and 6 years old. Thirty-one additional dyads were observed but were not
invited for the test, for reasons detailed in the Supplemental Materials B (Yu et al., 2020).
1 The multiple imputation procedure uses a multiple regression model fitted on the participating parents and
children to impute the behavior of the not-participating children. The propensity score–matching procedure
selects subsamples of the participating dyads who, judging from patterns of parent–child interaction, were unlikely
to participate, and data from these subsamples are then used to simulate the behavior of the not-participating
children.
Procedure
This study was approved by the Institutional Review Board of Rutgers University-Newark. The
observation phase was considered observation of public behavior based on the guidelines from
the Department of Health and Human Services, and was therefore exempted from the requirement
of obtaining informed consent. During each trip to the test sites, three researchers collected
data from parent–child dyads in three phases: Two coders first observed and coded the
interactions between the parent and the child (observation phase). Then a third researcher invited
the dyad to participate in a test (recruitment phase). She and one of the coders conducted the
test if the dyad agreed to participate (test phase).2
Observation Phase. Two coders who were blind to the study hypotheses pretended to be
visitors so that they could code parent–child interactions without the dyad’s awareness. Each
dyad was observed for 5 minutes, during which the coders independently coded the length
of parent–child interactions and the frequency of parent–child communications. Length of
parent–child interactions was measured by the time period of dyadic activities (parent and
child engaging in the same activity), supervised activities (parent watching, following, or
taking pictures of child when child is engaging in his or her own activities), and unsupervised
activities (parent and child engaging in different activities). Frequency of parent–child
communications was measured using an adaptation of the Dyadic Parent–Child Interaction Coding
System (Eyberg, Nelson, Ginn, Bhuiyan, & Boggs, 2013): The coders recorded the numbers
of parents’ questions, statements, and commands toward children. Critical to our interest,
parents’ questions were further differentiated based on their functions (Yu, Bonawitz, & Shafto,
2016): Those used to help children learn were coded as “pedagogical questions,” whereas
those used to request information from children were coded as “information-seeking
questions.” Interrater reliabilities were computed based on all observations, and were high across
all measurements: Interrater correlation r = .78 to .84 for the length of parent–child
interactions, and r = .79 to .86 for the frequency of parent–child communications. The average of the
two coders’ codes was used for data analysis.
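As an illustration of the reliability check and averaging described above, the following minimal Python sketch computes a Pearson correlation between two coders and then averages their codes per dyad. The counts are hypothetical, and the function names `pearson_r` and `average_codes` are illustrative labels rather than part of the original coding scheme.

```python
from math import sqrt
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation between two equally long lists of codes."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def average_codes(a, b):
    """Average the two coders' values for each dyad."""
    return [(x + y) / 2 for x, y in zip(a, b)]

# Hypothetical counts of pedagogical questions for five dyads (not study data).
coder_1 = [0, 2, 1, 4, 3]
coder_2 = [1, 2, 1, 5, 2]

reliability = pearson_r(coder_1, coder_2)               # interrater correlation
analysis_values = average_codes(coder_1, coder_2)       # values carried into analysis
```

Averaging the two coders keeps every observed dyad in the data set, including dyads that later declined to participate.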
Recruitment Phase. After the 5-minute observation, a third researcher who was blind to the
observation phase approached the parent and invited the parent–child dyad to participate in a
test. The recruitment procedure followed a script that resembled that of a typical developmental
experiment. Among the 78 parent–child dyads that were observed, 59 agreed to participate
and 19 declined. The 19 dyads who refused to participate comprised the “not-participating”
group. The consent rate (75.6%) is similar to the average consent rate of onsite recruitment
indicated in our survey (84.4%). Of the 59 parents who agreed, data from the test phase were
available for 47 children, who comprised the “participating” group (aged 3.0 to 6.3 years).
Test Phase. Parents and children who agreed to participate were led to a corner of the zoo
exhibit or a separate room in the indoor playground, where the test was conducted by the
recruiter (acting as an experimenter) and one of the coders (acting as a confederate). The
materials and procedure of the test were identical to the pedagogical question condition in
Yu et al. (2018), and details are included in Supplemental Materials E (Yu et al., 2020).
Children were presented with a novel toy that, unbeknownst to them, had five functional parts. The
experimenter explained that she knew all about the toy, then pointed to a button (which was the
trigger of one of the functions) and asked, “What does this button do?” Children were then left
alone to play with the toy until they stopped playing and signaled the researchers. The whole
phase was video-recorded.
2 Detailed procedures for each phase and the coding scheme for parent–child interactions can be found in
the Supplemental Materials C (Yu et al., 2020).
Video Coding. After data collection, the videos from the test phase were coded by a new
research assistant who was blind to the observation phase and the hypotheses of the study. She
first determined the total time children spent playing with the toy, and then coded three
measurements regarding both the whole playing period and the first minute after children started
playing: whether children activated the target function (the one triggered by the button), the
number of unique actions they performed with the toy, and the number of nontarget functions
(out of 4) they activated. A second assistant, also blind to the observation phase and study
hypotheses, coded 14 (30%) of the videos for reliability. The interrater agreement
was high for all measurements (rs and κs > .75; for details see Supplemental Materials F; Yu
et al., 2020). To better capture individual differences in children’s exploratory learning, we
further standardized all outcome variables across children and created two composite scores
for each child: Exploration variability is the sum of z-scores of all measurements during the
whole playing period. Exploration efficiency is the sum of z-scores of all measurements during
the first minute of play.
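The composite scores described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical, the measure names are placeholders, and the sample standard deviation (n − 1 denominator) is assumed for standardization, which the text does not specify.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw scores across children (assumes the sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(measures):
    """Sum z-scores across measures; `measures` maps a name to one value per child."""
    standardized = [z_scores(vals) for vals in measures.values()]
    return [sum(child) for child in zip(*standardized)]

# Hypothetical whole-period measurements for four children (not study data).
whole_period = {
    "target_activated": [1, 0, 1, 1],
    "unique_actions": [12, 5, 9, 14],
    "nontarget_functions": [3, 1, 2, 4],
}
exploration_variability = composite(whole_period)
```

Because each measure is standardized before summing, no single measure dominates the composite merely because of its scale.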
Data Analysis
Between-group comparisons, correlations, and regressions were conducted in IBM SPSS 22.
Fisher’s exact test was used for comparisons of frequencies. Model-based multiple imputation
was implemented with the Multiple Imputation module of SPSS. Bootstrapping and propensity
score matching were implemented with R 3.2.3. An α level of .05 (two-tailed) was used for all
tests.
RESULTS
The consent rates and children’s behavior in the test did not differ significantly across test sites
(for details see Supplemental Materials G; Yu et al., 2020); therefore, data from the two sites
were combined.
Are Parent–Child Interactions Associated With Children’s Behavior in the Test?
Because previous research has suggested an association between parents’ pedagogical
questions and children’s exploratory learning (Yu et al., 2018), we first examined correlations
between these measures. The results confirmed our hypothesis: After controlling for test site
and age, children whose parents asked more pedagogical questions received higher scores in
both exploration variability and exploration efficiency, rs > .3, ps < (Figure 1). Also, children
of parents who spent more time interacting with them were more efficient in their exploration,
r(42) = .35, p = .021. On the other hand, measurements regarding the composition of the
group being observed (parent’s and child’s gender, and whether they were accompanied by
other adults or children) did not correlate with exploration variability or efficiency, ps > .01
(for details see Supplemental Materials H and Table S1; Yu et al., 2020). These results
suggest that patterns observed in parent–child interactions were indeed associated with children’s
exploratory learning during the test.
Figure 1. Children whose parents asked more pedagogical questions explored more variably
during the whole playing period (A), and also explored more efficiently during the first minute of
play (B). The shaded area depicts the 95% confidence interval, which means there is a 95% chance
that the true linear regression line of the population will lie within the area. This is different from a
95% prediction interval, which means there is a 95% chance that the real value of y corresponding
to certain x will lie within the area.
Are Parent–Child Interactions Associated With Participation?
We fitted a logistic regression model with participation as the outcome variable and the
observational measurements as predictors (Table S2; Yu et al., 2020). Overall, the predictors were
able to explain a significant amount of variance in participation, Nagelkerke’s R² = .297.
Among individual predictors, we first examined the role of parental pedagogical questioning,
which has been shown to be associated with children’s behavior in the test. As predicted,
parents who asked more pedagogical questions during observation were also more likely to
have their children participate in the test, B = 1.49, p = .047. In addition, parents were
also more likely to have their boys participate than their girls, B = 1.47, p = .032.
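For readers unfamiliar with Nagelkerke’s R², the sketch below shows how it is computed from the log-likelihoods of an intercept-only model and a fitted logistic regression model. The group sizes and the fitted-model log-likelihood are hypothetical values chosen for illustration, not quantities from this study, and the function names are illustrative.

```python
import math

def null_log_likelihood(n_yes, n_no):
    """Log-likelihood of an intercept-only model for a binary outcome."""
    n = n_yes + n_no
    return n_yes * math.log(n_yes / n) + n_no * math.log(n_no / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2: the Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    upper_bound = 1 - math.exp(2 * ll_null / n)
    return cox_snell / upper_bound

# Hypothetical example: 40 consenting and 20 declining dyads, with an assumed
# fitted-model log-likelihood of -32.0 (neither value is taken from the study).
ll0 = null_log_likelihood(40, 20)
r2 = nagelkerke_r2(ll0, -32.0, 60)
```

The rescaling makes the statistic range from 0 (no improvement over the intercept-only model) to 1 (perfect prediction), which the unscaled Cox-Snell value cannot reach for binary outcomes.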
What Can Be Predicted for Children Who Did Not Participate?
Results so far have shown that the number of pedagogical questions parents asked children
predicted both the consent for children’s participation in a test and children’s behavior during
the test. This indicates that children’s participation and behavior may be related as well; that
is, if we had tested children whose parents did not consent to their participation, they may have
responded differently than children who did participate (Figure 2).
To test this hypothesis, we applied model-based multiple imputation to our data (Rubin,
2004). We first fitted regression models to predict the seven test measurements from the seven
observational measurements, based on data from the participating group. The resulting
models were then used to predict behavior of the not-participating group stochastically for 100
independent runs of simulations.3
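The logic of this step can be sketched in Python under strong simplifying assumptions: a single predictor instead of seven, hypothetical data, and residual noise drawn from a normal distribution with fixed fitted coefficients. The analysis reported here used the Multiple Imputation module of SPSS, so this is an illustration of stochastic regression imputation in general, not a reproduction of that procedure.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns intercept, slope, residual SD."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return intercept, slope, sigma

def impute_runs(x_obs, y_obs, x_missing, runs=100, seed=0):
    """Predict missing outcomes with added residual noise, once per simulation run."""
    rng = random.Random(seed)
    a, b, sigma = fit_line(x_obs, y_obs)
    return [[a + b * x + rng.gauss(0, sigma) for x in x_missing] for _ in range(runs)]

# Hypothetical data: parental question counts (x) and exploration scores (y).
x_participating = [0, 1, 2, 3, 4, 5]
y_participating = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]
x_not_participating = [0, 0, 1]  # observed, but no test data

simulated = impute_runs(x_participating, y_participating, x_not_participating)
run_means = [mean(run) for run in simulated]
```

In this toy example the not-participating dyads have low predictor values, so every simulated group mean falls below the participating group’s mean, which mirrors the direction of bias described in the text.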
3 The rationale of using multiple imputation and the technical details of the implementation are provided in
the Supplemental Materials I and J (Yu et al., 2020).
Figure 2. Illustration of why nonenrollment may lead to biased estimates of test measurements.
Consider the scenario where a certain parental practice, such as asking pedagogical questions (PQs),
is associated with both parental consent and children’s behavior in the test (e.g., exploration). In
this example, we know from the observation phase that parents who asked many PQs make up a
larger proportion of the participating group (top-left) than of the not-participating group (top-right).
We also know from the test phase that children of those parents (shaded bar in bottom-left) scored
higher in exploration than children whose parents asked few PQs (open bar in bottom-left).
Assuming a similar relationship between PQ and exploration in the not-participating group
(bottom-right), the different compositions of the two groups with regard to parental PQ will result in
different averages of the test measurements: The mean estimate of exploration is lower in the simulated
not-participating group (bottom-right, dashed line) than in the participating group (bottom-left, dashed
line). Therefore, ignoring the not-participating group may result in an overestimation of children’s
exploration.
Results showed that across the 100 runs of simulations, the means of the imputed
not-participating groups were significantly lower than those of the participating group for five out of the
seven test measurements (Table S3; Yu et al., 2020). When we compared the imputed groups
to size-matched subsamples randomly chosen from the participating group (bootstrapping
groups, also resampled for 100 runs), these mean differences reached statistical significance for
all five measurements (Figure S1; Yu et al., 2020). This implies that the mean differences were
robust and not caused by the stochastic nature of the imputation procedure. Furthermore, all
the mean differences were in the same direction: they all suggested that the not-participating
group would have learned and explored less with the toy. The effect sizes (Cohen’s d) of these
differences ranged from 0.08 to 0.23, which means that the systematic between-group
differences could be as much as 23% of the pooled within-group standard deviations. In addition to
these mean differences, the standard deviations of the simulated not-participating groups are
significantly higher than those of the participating group for six out of the seven measurements
(Figure S1; Yu et al., 2020).
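To make the effect-size statement concrete, the following sketch computes Cohen’s d as the mean difference divided by the pooled within-group standard deviation. The scores are hypothetical and the function name is illustrative.

```python
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: mean difference in units of the pooled within-group SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

# Hypothetical scores (not study data).
participating = [2.0, 4.0, 6.0]
not_participating = [1.0, 3.0, 5.0]
d = cohens_d(participating, not_participating)
```

Read this way, d = 0.23 corresponds to a between-group mean difference equal to 23% of the pooled within-group standard deviation.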
Figure 3. Examples of comparisons of the estimated group means between the participating
group and the simulated not-participating group. Empirical = children who actually participated
in the test phase (n = 47); Bootstrap = randomly selected subsamples of the participating group
that match the not-participating group in size (n = 19); PSM = subsamples of the participating
group that match the not-participating group in size and propensity score (n = 19); Imputed =
simulations of the not-participating group using a model-based multiple imputation procedure (n =
19). To examine the robustness of simulation results, the Bootstrap, PSM, and Imputed groups were
resampled for m = 100 runs, and the standard errors across all runs are shown in the figure as
the error bars. For all measurements except for total play time, the estimated group means of the
not-participating group were significantly lower than those of the participating group, indicating an
overestimation of children’s test performance by not considering those whose parents did not consent
to their participation. Figures of all measurements can be found in the Supplemental Materials (Figure S1;
Yu et al., 2020). *p < .05; **p < .01; ***p < .001.
To verify our predictions from multiple imputation, we used propensity score matching
(PSM; Rosenbaum & Rubin, 1985) to select subsamples of the participating group that matched
the not-participating group in size as well as the probability to participate (i.e., although these
children actually participated in the test, the way their parents interacted with them during
observation resembled that of parents who did not participate, thus resulting in a low propensity score
for participation).4 Details about the matching methods are provided in the Supplemental
Materials (Yu et al., 2020). Group means of test measurements from the PSM groups resembled
those of the imputed groups and not those of the participating group (Figure 3 and Figure S1;
Yu et al., 2020): For six out of seven measurements, the PSM groups had significantly lower
means than the bootstrapping groups, indicating that children whose parents interacted with
them like those who did not participate learned and explored less with the toy. At the same
time, the PSM groups did not significantly differ from the imputed groups for five out of these
six measurements, which validates the imputed results.
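The matching idea can be illustrated with a minimal sketch. It assumes that propensity scores have already been estimated (for example, from a logistic regression on the observational measurements) and uses greedy nearest-neighbor matching without replacement, which is one common variant. The matching method actually used is described in the Supplemental Materials and may differ; the scores and the function name below are hypothetical.

```python
def match_nearest(target_scores, pool_scores):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns, for each target score, the index of the matched unit in the pool.
    """
    available = dict(enumerate(pool_scores))
    matched = []
    for t in target_scores:
        best = min(available, key=lambda i: abs(available[i] - t))
        matched.append(best)
        del available[best]  # each pool unit can be matched only once
    return matched

# Hypothetical propensity scores (estimated probability of participating).
not_participating = [0.30, 0.50]
participating = [0.90, 0.35, 0.55, 0.80]

matched_idx = match_nearest(not_participating, participating)
psm_subsample = [participating[i] for i in matched_idx]
```

The matched subsample consists of participating dyads whose observed interactions made participation relatively unlikely, so their test data can stand in for the dyads who declined.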
4 The rationale of using propensity score matching and the technical details of the implementation are provided
in the Supplemental Materials K and L (Yu et al., 2020).
Are Predictions From the Models Valid?
We performed cross-validation to confirm the validity of our multiple imputation and
propensity score–matching procedures. Specifically, we chose one factor that was uncorrelated with the
test measurements, children’s gender, and explored whether our models could recover
hypothetically “missing” data of one gender without any systematic biases.5 To do that, we randomly
selected half of the boys who participated in the study (n = 18) and deliberately left out their test
measurements. We then ran the multiple imputation and propensity score–matching
procedures on the remaining data to recover these left-out values, and compared the results to the
observed values. Results (Table S4 and Figure S2; Yu et al., 2020) showed that for six out of the
seven test measurements, the observed group mean fell within the 95% confidence interval
of the simulated group mean, and the effect sizes (Cohen’s d) of the differences between the
observation and simulation were less than 0.05 for all measurements. We also observed no
systematic bias for the standard deviations of the simulated group as compared to the observed
group. These results suggest that the differences between the simulated not-participating group
and the observed participating group are not caused by biases inherent to our models.
What Are the Implications for Generalizing From the Participating Group to the Population?
By combining measurements from the participating group with those from the imputed
not-participating group, we can estimate the means and standard deviations of the initial
population during the observation phase, and then compare them to the participating group to look
for potential biases. Results showed such biases to exist in the means for five out of seven test
measurements, and in the standard deviations for all seven measurements (Table S3; Yu et al.,
2020). Compared to the initial population, focusing on the participating group results in
consistent overestimation of children’s performance in exploring the toy, as well as consistent
underestimation of individual differences.
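The combination step can be sketched as follows, using group summaries (size, mean, and sample standard deviation) to recover the mean and standard deviation of the pooled sample. The scores are hypothetical and the function name is illustrative; the sketch only shows the arithmetic of pooling one imputed run with the participating group.

```python
from statistics import mean, stdev

def combined_mean_sd(n1, m1, s1, n2, m2, s2):
    """Mean and sample SD of two groups pooled into one sample, from group summaries."""
    n = n1 + n2
    grand_mean = (n1 * m1 + n2 * m2) / n
    within = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    return grand_mean, ((within + between) / (n - 1)) ** 0.5

# Hypothetical scores (not study data): a participating group and one imputed run.
participating = [4.0, 5.0, 6.0, 7.0]
imputed = [1.0, 2.0, 3.0]

pop_mean, pop_sd = combined_mean_sd(
    len(participating), mean(participating), stdev(participating),
    len(imputed), mean(imputed), stdev(imputed),
)
```

In this toy example the pooled mean is lower and the pooled standard deviation is larger than in the participating group alone, the same pattern of overestimated means and underestimated variability reported above.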
DISCUSSION
This study takes a first step toward evaluating whether experimental findings from young
children can be generalized to the population, despite nonenrollment caused by lack of parental
consent. We estimated these potential biases by pairing a behavioral test with naturalistic
observations of parent–child interactions prior to parental consent. Results have shown that a
specific parenting practice (asking questions to teach) is correlated with both parents’
tendency to have their children participate in the test, and children’s exploratory learning during
the test. And since the observational data were also available for those who did not participate,
we were able to exploit these associations to simulate behavior for the not-participating
children. Results from model-based multiple imputation and propensity score matching showed
differences in group means between the participating and not-participating groups for five out
of the seven test measurements. Furthermore, the participating group showed a lower standard
deviation than the population for all test measurements.
It is worth noting that several assumptions underlie these simulated estimates. First, we
assumed no direct causal relation between parents’ decisions to have their children partici-
pate and children’s potential behavior in the test, therefore the behavior of not-participating
children can be considered missing at random (Rubin, 2004). This assumption is plausible
because parental values and practices associated with consent, not parental consent per se,
5 We thank an anonymous reviewer for suggesting this method of cross-validation.
are the factors that causally influence children’s behavior. Second, our approach depends on
variations in parent–child interactions for both the participating and not-participating groups,
as well as a significant overlap between the two groups. This allows us to find subsamples of
the participating group that match the not-participating group in their propensity to participate,
and also allows imputation to be done as interpolation within the range of empirical
support. In short, our methods for generalizing experimental results are themselves subject to
the usual conditions for generalization.
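The overlap requirement can be checked directly once a propensity to participate has been estimated for each dyad. The sketch below is a minimal illustration under stated assumptions, not the procedure used in this study. The function names `common_support` and `match_with_caliper`, the caliper of 0.05, and all propensity scores are hypothetical. Matching is done with replacement, so one participating dyad may serve as the match for several not-participating dyads.

```python
def common_support(p_participating, p_not_participating):
    """Return the (low, high) propensity range covered by both groups, or None."""
    low = max(min(p_participating), min(p_not_participating))
    high = min(max(p_participating), max(p_not_participating))
    return (low, high) if low <= high else None


def match_with_caliper(p_participating, p_not_participating, caliper=0.05):
    """Nearest-neighbor matching on the propensity score, with replacement.

    For each not-participating dyad, return the index of the participating
    dyad with the closest propensity score, or None if no participating dyad
    lies within the caliper (an assumed tolerance, chosen for illustration).
    """
    matches = []
    for score in p_not_participating:
        best = min(
            range(len(p_participating)),
            key=lambda i: abs(p_participating[i] - score),
        )
        within = abs(p_participating[best] - score) <= caliper
        matches.append(best if within else None)
    return matches


# Hypothetical propensity scores, for illustration only.
participating_scores = [0.35, 0.48, 0.55, 0.62, 0.80]
not_participating_scores = [0.20, 0.36, 0.50, 0.60]

support = common_support(participating_scores, not_participating_scores)
matches = match_with_caliper(participating_scores, not_participating_scores)
unmatched = [i for i, m in enumerate(matches) if m is None]
```

Dyads listed in `unmatched` fall outside the region where the participating group provides empirical support. Estimates for such dyads would rest on extrapolation rather than interpolation.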
The extent to which this new approach can and should be implemented in developmental
experiments also depends on several factors. First, our approach could be beneficial
for research settings that provide opportunities to observe and recruit from a relatively di-
verse population, such as in public spaces. Second, our approach could be more valuable
for domains in which parent–child interactions and children’s behavior are expected to be as-
sociated with enrollment via common predictors. In our case, both parental questioning and
children’s exploratory learning could relate to traits like curiosity, which may also predict fami-
lies’ tendency to enroll in research. We expect our findings to generalize better to domains that
correlate highly with traits like these (e.g., prosocial behavior), compared to domains that are
less correlated (e.g., basic perceptual development). Third, our approach is more important for
research that focuses on how children typically perform in certain tasks (e.g., research aiming
to provide normative data). Sampling bias may be less of a concern for research that focuses on
competence (e.g., showing that at least some children can demonstrate certain capacities in
an ideal context). Finally, pre-consent observations are ethically viable only for public actions,
and the complexity and quality of coding may be limited by what can be observed without the
dyad’s awareness.
In cases where our approach can be applied, it could benefit the interpretation and
generalization of experimental findings in several ways: First, it could reveal correlations be-
tween parent–child interactions and children’s behavior, which may help explain the cognitive
mechanisms and environmental inputs associated with the observed behavior. Second, it could
inform the generalizability of experimental findings to children whose parents did not consent
to their participation. Third, it could serve as an empirical basis for future research to recruit a more
representative sample. By knowing the associations between parental consent and patterns in
parent–child interaction, it may be possible to intentionally focus recruitment on parent–child
dyads who are likely underrepresented in typical recruitment procedures.
Our findings support and add to the recent concern about persistent sampling biases in
psychology broadly (Arnett, 2008; Henrich, Heine, & Norenzayan, 2010), and developmental
psychology in particular (LeWinn, Sheridan, Keyes, Hamilton, & McLaughlin, 2017; Nielsen,
Haun, Kärtner, & Legare, 2017). We show that, in addition to the biases resulting from the
unrepresentative pools of participants from which researchers usually recruit, the process of recruitment
itself may also skew the sample. Specifically, we show that variations in parenting practices
may directly associate participation with measurements of children’s behavior, which may ex-
plain some of the sampling biases associated with indirect factors such as culture, race, and
socioeconomic status. Therefore, although existing sampling techniques (such as stratified sam-
pling across sociocultural factors) and analytical tools (such as weighting adjustments) could
help in balancing who comprises a research sample, understanding how findings from a re-
search sample can be generalized or not may require measurements of factors more closely
related to children’s behavior, such as patterns of parent–child interactions. One way to do
that, as suggested by this study, is to pair experimental studies with naturalistic observations of
parent–child interactions.
Our results may also have implications for developmental theories. Many developmental
theories are built upon findings from experiments, as experimental designs have advantages in
addressing a range of developmental questions, such as depicting developmental trajectories,
disentangling causal mechanisms underlying children’s behavior, and testing causal effects of
interventions. In typical cases, random assignment of participants across groups removes un-
wanted systematic differences between groups, so that the effects of age, manipulations, or
treatments can be estimated by comparing between-group differences with within-group dif-
ferences. However, our results have shown that parental consent may bias these comparisons
in two ways that random assignment cannot solve: First, it could lead to an underestimation
of within-group variation, so that Type I error rates may be underestimated and effect sizes
may be overestimated. Second, compared to the general population, children who received
consent may be more or less susceptible to certain manipulations or treatments, thereby
biasing the estimation of between-group differences. Because theories built upon
experimental findings often guide real-world practices that apply to the general population,
understanding factors and biases associated with nonenrollment is essential when interpreting
and applying these findings.
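To make the first point concrete, the short sketch below shows how a standardized effect size grows when the within-group standard deviation is underestimated while the raw between-group difference stays fixed. All numbers are hypothetical and chosen only for illustration, including the assumed 20% underestimation of the standard deviation; none are estimates from this study.

```python
def cohens_d(mean_a, mean_b, within_group_sd):
    """Standardized mean difference between two groups."""
    return (mean_a - mean_b) / within_group_sd


# Hypothetical values, for illustration only.
treatment_mean = 6.0
control_mean = 5.0
population_sd = 2.0
# Assume the consented sample shows only 80% of the population SD.
sample_sd = 0.8 * population_sd

d_population = cohens_d(treatment_mean, control_mean, population_sd)
d_sample = cohens_d(treatment_mean, control_mean, sample_sd)

# With the raw difference held fixed, the inflation factor is 1 / 0.8.
inflation = d_sample / d_population
```

Under these assumptions, the effect size computed from the consented sample exceeds the population value by the inverse of the standard deviation ratio, even though random assignment leaves the raw difference unbiased.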
To conclude, this study provides an empirical demonstration that preschool children with
and without parental consent to participate in research may behave differently in experiments.
In addition, we provided a method that could be used to estimate the biases in experimental
results that are related to parental consent.
ACKNOWLEDGMENTS
We thank Reham Bader, Merna Botros, Milagros Grados, Anishka Jean, and Natasha Patel for
help in testing and coding data. We also thank Vanessa LoBue and members of the Child Study
Center for helpful feedback on an earlier draft of this manuscript. We thank all participating
children and their parents.
FUNDING INFORMATION
PS, National Science Foundation (http://dx.doi.org/10.13039/501100008982), Award ID: SMA-
1640816. EB, National Science Foundation (http://dx.doi.org/10.13039/501100008982), Award
ID: ECR-1660885. EB, Jacobs Foundation (http://dx.doi.org/10.13039/501100003986).
AUTHOR CONTRIBUTIONS
YY: Conceptualization: Supporting; Data curation: Lead; Formal analysis: Lead; Methodol-
ogy: Lead; Visualization: Lead; Writing—Original Draft: Lead; Writing—Review & Editing:
Lead. PS: Conceptualization: Equal; Funding acquisition: Equal; Supervision: Equal; Writing—
Review & Editing: Supporting. EB: Conceptualization: Equal; Funding acquisition: Equal; Su-
pervision: Equal; Writing—Review & Editing: Supporting.
REFERENCES
Anderman, C., Cheadle, A., Curry, S., Diehr, P., Shultz, L., & Wagner,
E. (1995). Selection bias related to parental consent in school-
based survey research. Evaluation Review, 19, 663–674. doi:10.
1177/0193841X9501900604
Arnett, J. J. (2008). The neglected 95%: Why American psychol-
ogy needs to become less American. American Psychologist, 63,
602–614. doi:10.1037/0003-066X.63.7.602
Bonawitz, E., Shafto, P., Gweon, H., Goodman, N. D., Spelke,
E., & Schulz, L. (2011). The double-edged sword of pedagogy:
Instruction limits spontaneous exploration and discovery.
Cognition, 120, 322–330. doi:10.1016/j.cognition.2010.10.001
Bornstein, M. H. (1991). Cultural approaches to parenting. Hillsdale,
NJ: Erlbaum.
Csibra, G., & Gergely, G. (2009). Natural pedagogy. Trends in
Cognitive Sciences, 13(4), 148–153. doi:10.1016/j.tics.2009.01.005
Dent, C. W., Galaif, J., Sussman, S., Stacy, A., Burtun, D., &
Flay, B. R. (1993). Demographic, psychosocial and behavioral
differences in samples of actively and passively consented
adolescents. Addictive Behaviors, 18(1), 51–56.
doi:10.1016/0306-4603(93)90008-W
Esbensen, F.-A., Miller, M. H., Taylor, T., He, N., & Freng, A. (1999).
Differential attrition rates and active parental consent. Evaluation
Review, 23, 316–335. doi:10.1177/0193841X9902300304
Eyberg, S., Nelson, M., Ginn, N., Bhuiyan, N., & Boggs, S. (2013).
Dyadic parent-child interaction coding system (DPICS): Compre-
hensive manual for research and training (4th ed.). Gainesville,
FL: PCIT International.
Goffman, E. (1966). Behavior in public places: Notes on the social
organization of gatherings. New York, NY: Simon & Schuster.
Gutwill, J. P., & Allen, S. (2010). Facilitating family group inquiry at
science museum exhibits. Science Education, 94, 710–742. doi:
10.1002/sce.20387
Gutwill, J. P., & Allen, S. (2012). Deepening students’ scientific
inquiry skills during a science museum field trip. Journal of
the Learning Sciences, 21(1), 130–181. doi:10.1080/10508406.
2011.555938
Haden, C. A., Cohen, T., Uttal, D. H., & Marcus, M. (2015). Building
learning: Narrating and transferring experiences in a children’s
museum. In D. Sobel & J. Jipson (Eds.), Cognitive development
in museum settings: Relating research and practice (pp. 84–103).
New York, NY: Routledge.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest peo-
ple in the world? Behavioral and Brain Sciences, 33(2–3), 61–83.
doi:10.1017/S0140525X0999152X
Hoff, E., Laursen, B., Tardif, T., & Bornstein, M. (2002). Socio-
economic status and parenting. In M. H. Bornstein (Ed.), Hand-
book of parenting, Vol. 2: Biology and ecology of parenting
(pp. 231–252). Mahwah, NJ: Erlbaum.
Kearney, K. A., Hopkins, R. H., Mauss, A. L., & Weisheit, R. A.
(1983). Sample bias resulting from a requirement for written
parental consent. Public Opinion Quarterly, 47(1), 96–102. doi:
10.1086/268769
LeWinn, K. Z., Sheridan, M. A., Keyes, K. M., Hamilton, A., &
McLaughlin, K. A. (2017). Sample composition alters associa-
tions between age and brain structure. Nature Communications,
8(1), 874. doi:10.1038/s41467-017-00908-7
Lueptow, L., Mueller, S. A., Hammes, R. R., & Master, L. S. (1977).
The impact of informed consent regulations on response rate and
response bias. Sociological Methods & Research, 6(2), 183–204.
doi:10.1177/004912417700600204
Nielsen, M., Haun, D., Kärtner, J., & Legare, C. H. (2017). The persis-
tent sampling bias in developmental psychology: A call to action.
Journal of Experimental Child Psychology, 162, 31–38. doi:10.
1016/j.jecp.2017.04.017
Ridge, K. E., Weisberg, D. S., Ilgaz, H., Hirsh-Pasek, K. A., &
Golinkoff, R. M. (2015). Supermarket speak: Increasing talk
among low-socioeconomic status families. Mind, Brain, and
Education, 9(3), 127–135. doi:10.1111/mbe.12081
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a con-
trol group using multivariate matched sampling methods that in-
corporate the propensity score. The American Statistician, 39(1),
33–38. doi:10.2307/2683903
Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys.
Hoboken, NJ: Wiley.
Whiten, A., Allan, G., Devlin, S., Kseib, N., Raw, N., & McGuigan,
N. (2016). Social learning in the real-world: “over-imitation” oc-
curs in both children and adults unaware of participation in an
experiment and independently of social interaction. PloS One,
11, e0159920. doi:10.1371/journal.pone.0159920
Yu, Y., Bonawitz, E., & Shafto, P. (2016). Questions in informal teach-
ing: A study of mother-child conversations. In Proceedings of
the 38th Annual Conference of the Cognitive Science Society
(pp. 1086–1091). Austin, TX: Cognitive Science Society.
Yu, Y., Bonawitz, E., & Shafto, P. (2017). Inconvenient samples: Mod-
eling the effects of non-consent by coupling observational and
experimental results. In Proceedings of the 39th Annual Meeting
of the Cognitive Science Society (pp. 1406–1411). Austin, TX:
Cognitive Science Society.
Yu, Y., Landrum, A., Bonawitz, E., & Shafto, P. (2018). Questioning
supports effective transmission of knowledge and increased ex-
ploratory learning in pre-kindergarten children. Developmental
Science, 21, e12696.
Yu, Y., Shafto, P., & Bonawitz, E. (2020). Supplemental material for
“Inconvenient samples: Modeling biases related to parental con-
sent by coupling observational and experimental results.” Open
Mind: Discoveries in Cognitive Science, 4. doi:10.1162/opmi_a_
00031