David Blazar

Department of Teaching and Learning, Policy and Leadership

College of Education

University of Maryland

College Park, Maryland 20742

dblazar@umd.edu

VALIDATING TEACHER EFFECTS ON

STUDENTS’ ATTITUDES AND BEHAVIORS:

EVIDENCE FROM RANDOM ASSIGNMENT

OF TEACHERS TO STUDENTS

Abstract
There is growing interest among researchers, policy makers, and
practitioners in identifying teachers who are skilled at improving
student outcomes beyond test scores. However, questions remain
about the validity of these teacher effect estimates. Leveraging
the random assignment of teachers to classes, I find that teach-
ers have causal effects on their students’ self-reported behavior in
class, self-efficacy in math, and happiness in class that are similar
in magnitude to effects on math test scores. Weak correlations be-
tween teacher effects on different student outcomes indicate that
these measures capture unique skills that teachers bring to the
classroom. Teacher effects calculated in nonexperimental data are
related to these same outcomes following random assignment, re-
vealing that they contain important information content on teach-
ers. However, for some nonexperimental teacher effect estimates,
large and potentially important degrees of bias remain. These re-
sults suggest that researchers and policy makers should proceed
with caution when using these measures. They likely are more
appropriate for low-stakes decisions—such as matching teachers
to professional development—than for high-stakes personnel de-
cisions and accountability.

This paper is part of a series invited by this journal, in which authors present results of dissertation research
receiving the Jean Flanigan Outstanding Dissertation Award from the Association for Education Finance and
Policy.

https://doi.org/10.1162/edfp_a_00251

© 2018 Association for Education Finance and Policy


1. INTRODUCTION
Decades’ worth of research on education production has narrowed in on the impor-
tance of teachers to student outcomes (Murnane and Phillips 1981; Todd and Wolpin
2003). Over the last several years, these studies have coalesced around two key findings.
First, teachers vary considerably in their abilities to improve students’ academic perfor-
mance (Nye, Konstantopoulos, and Hedges 2004; Hanushek and Rivkin 2010), which
in turn influences a variety of long-term outcomes including teenage pregnancy rates,
college attendance, and earnings in adulthood (Chetty, Friedman, and Rockoff 2014b).
Second, experimental and quasi-experimental studies indicate that “value-added” ap-
proaches to estimating teachers’ contributions to student test scores are valid ways to
identify effective teachers (Kane and Staiger 2008; Kane et al. 2013; Chetty, Friedman,
and Rockoff 2014a; Glazerman and Protik 2015; Bacher-Hicks et al. 2017). In other
words, on average, these teacher effect estimates are not confounded with the non-
random sorting of teachers to students, the specific set of students in the classroom,
or factors beyond teachers’ control. Policy makers have taken notice of these findings,
leading to widespread changes in teacher evaluation, compensation, and promotion.

While these studies have focused predominantly on teachers’ impact on students’
academic performance, the research community is starting to collect evidence that
teachers also vary in their contributions to a variety of other student outcomes in ways
that are only weakly related to their effects on test scores (Jennings and DiPrete 2010;
Jackson 2012; Gershenson 2016; Kraft, forthcoming). For example, in work drawing
on the study that generated data used in this paper, Blazar and Kraft (2017) found that
teachers identified as 1 standard deviation (SD) above the mean in the distribution of ef-
fectiveness improved students’ self-reported behavior in class, self-efficacy in math, and
happiness in class by between 0.15 and 0.30 SD. These effects are similar to or larger
than teacher effects on students’ test scores (Hanushek and Rivkin 2010). However,
teachers who were effective at improving these outcomes often were not equally effec-
tive at improving students’ math test scores, with correlations between teacher effect
estimates no higher than 0.19. Jackson (2012) came to similar conclusions using addi-
tional student outcomes, and also found that teacher effects on non-tested outcomes
captured in ninth grade predicted longer-run outcomes, including high school comple-
tion above and beyond teachers’ effects on test scores (Jackson 2016). Together, these
findings lend empirical evidence to the multidimensional nature of teaching and, thus,
the need for policy makers to account for this sort of complexity.

Given that the research base examining teachers’ contributions to student outcomes
beyond test scores is relatively new, important questions remain about the validity of
these measures. In the value-added literature more broadly, researchers have asked
about the sensitivity of teacher effects to different model specifications and the spe-
cific set of covariates included in the model (Goldhaber and Theobald 2012), as well
as the most appropriate ways to calculate these scores in light of measurement error
(Guarino et al. 2015). Further, it is not clear whether the key identifying assumption un-
derlying the estimation of teacher effects—that estimates are not biased by nonrandom
sorting of students to teachers (Kane et al. 2013; Chetty, Friedman, and Rockoff 2014a)—
holds when test scores are replaced with other student outcomes. Researchers who es-
timate value added to students’ test scores typically control for prior test scores because
they capture many of the predetermined factors that also affect current achievement,


including the schools students attend, their neighborhoods, and the family members
with whom they interact. However, it is possible there are additional factors not cap-
tured by prior test scores or by prior measures of the outcome variable that lead to bias
in teacher effects on other student outcomes beyond test scores.

I examine these issues by drawing on a dataset in which participating students com-
pleted a survey that asked about a range of attitudes and behaviors in class. In the third
year of the study, a subset of participating teachers (n = 41) was randomly assigned
to class rosters within schools. Together, these data allow me to examine the extent to
which teachers vary in their contribution to students’ attitudes and behaviors, even after
random assignment; the sensitivity of teacher effects on students’ attitudes and behav-
iors to different model specifications, including those that control for students’ prior
academic performance versus prior attitudes and behaviors; and, ultimately, whether
nonexperimental estimates of teacher effects on these attitudes and behaviors predict
these same outcomes following random assignment, which produces a measure of
forecast bias.

Findings indicate that teachers have causal effects on students’ self-reported behav-
ior in class, self-efficacy in math, and happiness in class. The magnitude of the teacher-
level variation on these outcomes is similar to or larger than effects on math test scores
(e.g., Hanushek and Rivkin 2010). Weak correlations between teacher effects on differ-
ent student outcomes indicate that these measures capture unique skills that teachers
bring to the classroom. However, in some cases, value-added approaches to estimating
these teacher effects appear to be insufficient to account for all sources of bias. One
exception is teacher effects on students’ behavior in class, where predicted differences
perfectly predict actual differences following random assignment. In the observational
portion of this study, teacher effects are not particularly sensitive to models that control
for students’ prior achievement, student demographic characteristics, or prior survey
responses. Given that these are the tools and data typically available to the econometri-
cian, it is not clear that bias could easily be reduced further. In turn, it will be important
for researchers and policy makers to use these estimates of teacher effectiveness with
caution. In the Conclusion, I describe some potential uses of these measures, focus-
ing on low-stakes decision making (such as matching teachers to professional develop-
ment) rather than high-stakes decisions (such as teacher evaluation and promotion).

2. VALIDATING METHODS FOR ESTIMATING TEACHER EFFECTS ON STUDENT OUTCOMES

Over the last decade, several experimental and quasi-experimental studies have tested
the validity of nonexperimental methods for estimating teacher effects on student
achievement. In the first of these, Kane and Staiger (2008) described the rationale
and setup for such a study: “Non-experimental estimates of teacher effects attempt
to answer a very specific question: If a given classroom of students were to have
teacher A rather than teacher B, how much different would their average test scores
be at the end of the year?” (p. 1). However, as these sorts of teacher effect estimates
are derived from conditions where nonrandom sorting is the norm (Clotfelter, Ladd,
and Vigdor 2006; Rothstein 2010), these models assume that statistical controls (e.g.,
students’ prior achievement, demographic characteristics) are sufficient to isolate the


talents and skills of individual teachers rather than “principals’ preferential treatment
of their favorite colleagues, ability-tracking based on information not captured by prior
resultados de las pruebas, or the advocacy of engaged parents for specific teachers” (Kane and Staiger
2008, pag. 1).1

Random assignment of teachers to classes offers a way to test this assumption.
If nonexperimental teacher effects are causal estimates that capture true differences
in quality between teachers, then nonexperimental or predicted differences should be
igual, on average, to actual differences following the random assignment of teachers to
classes. In other words, a 1 SD increase in predicted differences in achievement across
classrooms should result in a 1 SD increase in observed differences, on average. Esti-
mates that are statistically significantly greater than 0 SD indicate that nonexperimen-
tal teacher effects contain some information content about teachers’ underlying talents
and skills. However, deviations from the 1:1 relationship would signal that these scores
also are influenced by factors beyond teachers’ control, including students’ background
and skill, the composition of students in the classroom, or strategic assignment poli-
cies. These deviations often are referred to as “forecast bias.”
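Stated compactly, the test described here is a regression of post-randomization outcomes on the nonexperimental (predicted) teacher effect. The display below is a schematic preview, in simplified notation, of equation 2 presented later in the paper:

$$Y^{\text{post}}_{ij} = \delta\,\hat{\tau}^{\text{nonexp}}_{j} + \nu_{sg} + \epsilon_{ij},$$

where Y^post_ij is student i’s outcome after random assignment to teacher j and ν_sg are randomization-block fixed effects. A δ of 1 indicates no forecast bias; a δ statistically greater than 0 indicates that the nonexperimental estimates contain information about teachers; and a δ below 1 indicates that predicted differences overstate actual differences.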

Results from Kane and Staiger (2008) and other experimental studies (Kane et al.
2013; Glazerman and Protik 2015; Bacher-Hicks et al. 2017) have accumulated to pro-
vide strong evidence against bias in teacher effects on students’ test scores. Pooling
results from three experimental studies with the same research design (i.e., teachers
randomly assigned to class rosters within schools2), Bacher-Hicks et al. (2017) found an
estimate of 0.96 SD relating predicted, nonexperimental teacher effects on students’
math achievement to actual differences in this same outcome following random as-
signment. Predicted teacher effects were calculated from models that controlled for
students’ prior achievement. Given the nature of their meta-analytic approach, the stan-
dard error around this estimate (0.099) was much smaller than in each individual
study, and the corresponding 95 percent confidence interval included 1 SD, indicating
little or no bias. This result was quite similar to findings from quasi-experimental stud-
ies in much larger administrative datasets, which leveraged plausibly exogenous varia-
tion in teacher assignments due to staffing changes at the school-grade level (Bacher-
Hicks, Kane, and Staiger 2014; Chetty, Friedman, and Rockoff 2014a).

Following a long line of inquiry around the sensitivity of value-added estimates to
different model specifications and which may be most appropriate for policy (Aaronson,
Barrow, and Sander 2007; Newton et al. 2010; Goldhaber and Theobald 2012; Blazar,
Litke, and Barmore 2016), many of these studies also examined the predictive validity of
alternative methods for estimating teacher effects. Por ejemplo, some have advocated
for controlling for the composition of students in the classroom, which is thought to
influence test scores beyond teachers themselves (Hanushek et al. 2003; Kupermintz
2003). Others have specified models that only compare teachers within schools in order

1. See Bacher-Hicks et al. (2017) for an analysis of persistent sorting in the classroom data used in this study.
2. In a fourth experimental study, Glazerman and Protik (2015) exploited random assignment of teachers across
schools as part of a merit pay program. There, findings were more mixed. In the elementary school sample, the
authors estimated a standardized effect size of roughly 1 SD relating nonexperimental value-added scores (stack-
ing across math and reading) to student test scores following random assignment. Sin embargo, in their smaller
sample, the standard error was large (0.34), meaning they could not rule out potentially large degrees of bias.
Further, in the middle school sample, they found no statistically significant relationship between nonexperi-
mental and experimental teacher effect estimates.


to limit bias due to sorting of teachers and students across schools (Rivkin, Hanushek,
and Kain 2005); sin embargo, this approach can lead to large differences in teacher rank-
ings relative to models that compare teachers across schools (Goldhaber and Theobald
2012). The general conclusion across validation studies is that controlling for students’
prior achievement is sufficient to account for the vast majority of bias in teacher effect
estimates on achievement (Chetty, Friedman, and Rockoff 2014a; Kane y otros. 2013; kane
and Staiger 2008).

To my knowledge, only one study has examined the predictive validity of teacher
effects on student outcomes beyond test scores.3 Drawing on the quasi-experimental
design described by Chetty, Friedman, and Rockoff (2014a), Backes and Hansen (2018)
examined the validity of teacher effects on a range of observed school behaviors cap-
tured in administrative records. They found that teacher effects on students’ suspen-
sions and percent of classes failed did not contain bias when pooling across all grade
niveles. Sin embargo, teacher effects on unexcused absences, grade point average, and on-
time grade progression did contain moderate to large degrees of bias, at least in some
grade levels. For teacher effects on both unexcused absences and on-time grade pro-
gression, predicted differences at the elementary level overstated actual differences (i.e.,
coefficient less than 1 SD), likely due to sorting of higher-performing students to higher-
performing teachers in a way that could not be controlled for in the model. The opposite
was true at the high school level, where predicted differences in teacher effectiveness
understated actual differences (i.e., coefficient greater than 1 SD). This suggests that
bias in teacher effects on outcomes beyond test scores may not be easily quantified or
classified across contexts.

3. DATA AND SAMPLE
As in Blazar and Kraft (2017) and Bacher-Hicks et al. (2017), this paper draws on data
from the National Center for Teacher Effectiveness (NCTE), whose goal was to develop
valid measures of effective teaching in upper-elementary mathematics. Over the course
of three school years (2010–11 through 2012–13), the project collected data from partic-
ipating fourth- and fifth-grade teachers (n = 310) in four anonymous districts from
three states on the east coast of the United States. Participants were generalists who
taught all subject areas. This is important, as it provided an opportunity to estimate
the contribution of individual teachers to students’ attitudes and behaviors that was not
confounded with the effect of another teacher with whom a student engaged in the
same year. Teacher–student links were verified for all study participants based on class
rosters provided by teachers.

Measures of students’ attitudes and behaviors came from a survey administered in
the spring of each school year (see Appendix table A.1 for survey item text and descrip-
tive statistics). Based on theory and exploratory factor analyses (see Blazar and Kraft

3. Two additional studies have examined teacher effects on students’ attitudes and behaviors using the random
assignment portion of the Measures of Effective Teaching (MET) project. Kraft (forthcoming) found sizeable
teacher effects on students’ grit, growth mindset, and effort in class (0.10 to 0.17 SD). Correlations between
teacher effects on students’ academic performance versus effects on other outcomes were no higher than 0.22.
Kane et al. (2013) found that a composite measure of teacher effectiveness based on observational data predicted
student effort following random assignment. However, measures of students’ attitudes and behaviors were col-
lected in only one year. Therefore, it was not possible to relate teacher effects calculated under nonexperimental
conditions to teacher effects on this same outcome calculated under experimental ones.


2017), I divided items into three constructs: Behavior in Class (internal consistency re-
liability [α] is 0.74), Self-Efficacy in Math (α = 0.76), and Happiness in Class (α = 0.82).
Teacher reports of student behavior and self-reports of versions of the latter two con-
structs have been linked to labor market outcomes even controlling for cognitive ability
(Lyubomirsky, King, and Diener 2005; Mueller and Plug 2006; Chetty et al. 2011), lend-
ing strong consequential validity to these metrics. Blazar and Kraft (2017) describe ad-
ditional validity evidence, including convergent validity, for these constructs. For each
of these outcomes, I created final scales by reverse coding items with negative valence,
averaging student responses across all available items, and then standardizing to mean
of 0 and SD of 1.4 Standardization occurred within school year but across grades.
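A minimal sketch of this scale construction in Python follows; the item columns, the list of negatively worded items, and the 1–5 response scale are illustrative assumptions rather than the project’s actual variable names:

```python
import pandas as pd

NEGATIVE_ITEMS = ["behavior_item2", "behavior_item4"]  # assumed negatively worded items
LIKERT_MAX = 5  # assumed 1-5 response scale


def build_scale(df: pd.DataFrame, items: list, year_col: str = "school_year") -> pd.Series:
    """Reverse code negative items, average available items, standardize within year."""
    coded = df[items].copy()
    for col in items:
        if col in NEGATIVE_ITEMS:
            coded[col] = (LIKERT_MAX + 1) - coded[col]  # reverse coding
    raw = coded.mean(axis=1, skipna=True)  # average across all available items
    # Standardize to mean 0, SD 1 within school year but across grades.
    return raw.groupby(df[year_col]).transform(lambda x: (x - x.mean()) / x.std())
```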

Student demographic and achievement data came from district administrative
records. Demographic data included sex, race/ethnicity, free- or reduced-price lunch
(FRPL) eligibility, limited English proficiency (LEP) status, and special education
(SPED) status. These records also included current- and prior-year test scores in math
and reading on state assessments, which were standardized within district by grade,
subject, and year using the entire population of students in each district, grade, sub-
ject, and year.

I focus on two subsamples from the larger group of 310 teachers. The primary ana-
lytic sample includes the subset of 41 teachers who were part of the random assignment
portion of the NCTE study in the third year of data collection. I describe this sample
and the experimental design in the next section. The second sample includes the set of
students (and their teachers) who took the project-administered survey in both the cur-
rent and prior years. This allowed me to test the sensitivity of teacher effect estimates to
different model specifications, including those that controlled for students’ prior survey
responses, from a balanced sample of teachers and students. As noted previously, the
student survey was administered only in the spring of each year; therefore, this sample
consisted of the group of fifth-grade teachers who happened to have students who also
were part of the NCTE study in the fourth grade (n = 51 teachers; n = 548 students).5
Generally, I found that average teacher characteristics, including their sex, race,
math course taking, math knowledge, route to certification, years of teaching expe-
rience, and value-added scores calculated from state math tests were similar across
samples (see table 1).6 Given that teachers self-selected into the NCTE study, I also

4. For all three outcomes, composite scores that average across raw responses are correlated at 0.99 and above

with scales that incorporate weights from the factor analysis.

5. This sample size is driven by teachers whose students had current- and prior-year survey responses for Happiness
in Class, which was only available in two of the three years of the study. Additional teachers and students had
current- and prior-year data for Behavior in Class (n = 111) and Self-Efficacy in Math (n = 108), both of which were
available in all three years of the study. For consistency, I limit this sample to teachers and students who had
current- and prior-year scores for all three survey measures. I did not place any restriction on the number of
students each teacher needed to have in order to be included in this sample. Although others advocate excluding
teachers with fewer than five students per class (Kane and Staiger 2008), for example, this would have further
reduced my sample. Instead, I rely on the fact that shrinkage estimators shrink estimates more for teachers
with few students than for teachers with larger classes.

6. Background information on teachers was captured on a questionnaire administered in the fall of each year. Sur-
vey items included years teaching math, route to certification, amount of undergraduate or graduate coursework
in math and math courses for teaching (1 = No classes, 2 = One or two classes, 3 = Three to five classes, 4 =
Six or more classes). For simplicity, I averaged these last two items to form one construct capturing teachers’
mathematics coursework. Further, the survey included a test of teachers’ mathematical content knowledge,
with items from both the Mathematical Knowledge for Teaching assessment and the Massachusetts Test for


tested whether these samples differed from the full population of fourth- and fifth-grade
teachers in each district with regard to value-added scores on the state math test
(see equation 1 for more details on these value-added predictions). Although I found a
marginally significant difference between the full NCTE sample and the district popu-
lations (p = 0.065), I found no difference between the district populations and either
the experimental or nonexperimental subsamples used in this analysis (p = 0.890 and
0.652, respectively; not shown in table 1). These similarities lend external validity to my
findings.

Table 1. Demographic Characteristics of Participating Teachers

| | Full NCTE Sample | Experimental Sample, Mean | Experimental Sample, p-value on Difference | Nonexperimental Sample, Mean | Nonexperimental Sample, p-value on Difference | District Populations, Mean | District Populations, p-value on Difference |
| Male | 0.16 | 0.15 | 0.529 | 0.19 | 0.604 | | |
| African American | 0.22 | 0.18 | 0.408 | 0.24 | 0.790 | | |
| Asian | 0.03 | 0.05 | 0.866 | 0.00 | 0.241 | | |
| Hispanic | 0.03 | 0.03 | 0.525 | 0.02 | 0.686 | | |
| White | 0.65 | 0.70 | 0.697 | 0.67 | 0.807 | | |
| Mathematics coursework | 2.58 | 2.62 | 0.816 | 2.54 | 0.735 | | |
| Mathematical content knowledge | 0.01 | 0.05 | 0.923 | 0.07 | 0.671 | | |
| Alternative certification | 0.08 | 0.08 | 0.005 | 0.12 | 0.362 | | |
| Teaching experience | 11.04 | 14.35 | 0.646 | 11.44 | 0.704 | | |
| Value added on state math test | 0.02 | 0.00 | 0.533 | 0.01 | 0.810 | 0.00 | 0.065 |
| p-value on joint test | | | 0.95 | | 0.958 | | NA |
| Teachers | 310 | 41 | | 51 | | 3,454 | |

Note: p-value refers to difference from the full National Center for Teacher Effectiveness (NCTE) sample.

4. EXPERIMENTAL DESIGN
In the spring of 2012, the NCTE project team worked with staff at participating schools
to randomly assign sets of teachers to class rosters of the same grade level (i.e., fourth
or fifth grade) that were constructed by principals or school leaders. To be eligible for
randomization, teachers had to work in schools and grades in which there was at least
one other participating teacher. In addition, their principal had to consider these teach-
ers as capable of teaching any of the rosters of students designated for the group of
teachers.

In order to fully leverage this experimental design, it was important to limit the
most pertinent threats to internal validity: attrition and noncompliance among partic-
ipating teachers and students (Murnane and Willett 2011). My general approach was
to focus on randomization blocks in which attrition and noncompliance were not a
concern. As these blocks are analogous to individual experiments, dropping individual
ones should not threaten the internal validity of results. Primero, I restricted the sample to
blocks where teachers and their randomization block partner(s) had both current-year

Educator Licensure. Teacher scores were generated by IRTPro software and standardized in these models, con
a reliability of 0.92. For more information about these constructs, see Hill, Blazar, and Lynch 2015.


student outcomes and prior-year, nonexperimental teacher effect estimates. Of the orig-
inal 79 teachers who agreed to participate and were randomly assigned to class rosters
within schools7, I dropped seven teachers who left the study before the beginning of
the 2012–13 school year for reasons unrelated to the experiment (i.e., leaving the district
or teaching, maternity leave, change in teaching assignment); eleven teachers who only
were part of the study in the third year and, therefore, did not have the necessary data
from prior years to calculate nonexperimental teacher effects on students’ attitudes and
behaviors; and seven teachers whose random assignment partner(s) left the study for
either of the two reasons above.8

Next, I restricted the remaining sample to randomization blocks with low levels
of noncompliance among participating students. Here, noncompliance refers to the
fact that some students switched out of their randomly assigned teacher’s classroom.
Other studies that exploit random assignment between teachers and students have ac-
counted for this form of noncompliance through instrumental variables estimation and
calculation of treatment on the treated (Kane et al. 2013; Glazerman and Protik 2015;
Bacher-Hicks et al. 2017). However, this approach was not possible in this study, given
that students who transferred out of an NCTE teacher’s classroom no longer had sur-
vey data to calculate teacher effects on these outcomes. Further, I would have needed
to have prior student survey responses for these students’ actual teachers, which I did
not. In total, 28 percent of students moved out of their randomly assigned teachers’
aula (see Appendix table A.2 for information on reasons for and patterns of non-
compliance). At the same time, noncompliance was nested within a small subset of six
randomization blocks. In these blocks, rates of noncompliance ranged from 40 percent
a 82 percent due primarily to principals and school leaders who made changes to the
originally constructed class rosters. By eliminating these blocks, I am able to focus on
a sample with a much lower rate of noncompliance (11 percent) and where patterns of
noncompliance are much more typical. The remaining eighteen blocks had a total of
67 noncompliers and an average rate of noncompliance of 9 percent per block; three
randomization blocks had full compliance.

In table 2, I confirm the success of the randomization process among the teach-
ers in my final analytic sample (n = 41) and the students on their randomly assigned
rosters (n = 598).9 In a traditional experiment, one can examine balance at baseline
by calculating differences in average student characteristics between the treatment and
control groups. In this context, though, treatment consisted of multiple possible teach-
ers within a given randomization block. Thus, to examine balance, I examined the rela-
tionship between the assigned teacher’s predicted effectiveness at improving students’

7. Two other teachers from the same randomization block also agreed to participate. However, the principal de-
cided that it was not possible to randomly assign rosters to these teachers. Thus, I exclude them from all anal-
yses.

8. One concern with dropping teachers in this way is that they may differ from other teachers on post-
randomization outcomes, which could bias results. Comparing attriters for whom I had post-randomization
data (n = 21, which excludes the four teachers who either left teaching, left the district, moved to third grade
and therefore out of my dataset, or were on maternity leave) to the remaining teachers (n = 54) on their ob-
served effectiveness at raising students’ math achievement in the 2012–13 school year, I found no difference (p =
0.899). Further, to ensure strong external validity, I compared attriters to the experimental sample on each of
the teacher characteristics listed in table 1 and found no difference on any.

9. Thirty-eight students were hand-placed in these teachers’ classrooms after the random assignment process. Como

these students were not part of the experiment, they were excluded from all analyses.


state math test scores in years prior to the experiment and baseline student characteristics.
Specifically, I regressed these teacher effect estimates on a vector of observable student
characteristics and fixed effects for randomization block. As expected, observable student
characteristics were not related to teacher effects on state math tests, either tested
individually or as a group (p = 0.808), supporting the fidelity of the randomization process.
Students who did not stay in their randomly assigned teacher’s classroom (i.e.,
“noncompliers”) look similar to compliers based on observable baseline characteristics, as
well as on the observed effectiveness of their randomly assigned teacher at improving state
math test scores in years prior to random assignment (see Appendix table A.3).10 As such,
I am less concerned about having to drop the few noncompliers left in my sample from all
subsequent analyses.

Table 2. Relationship between Randomly Assigned Teacher Effectiveness in Math and Student Characteristics

| | Teacher Effects on State Math Scores from Randomly Assigned Teacher |
| Male | −0.005 (0.009) |
| African American | 0.028 (0.027) |
| Asian | 0.030 (0.029) |
| Hispanic | 0.043 (0.028) |
| White | 0.010 (0.028) |
| FRPL | 0.002 (0.011) |
| SPED | −0.023 (0.021) |
| LEP | 0.004 (0.014) |
| Prior achievement on state math test | 0.009 (0.007) |
| Prior achievement on state reading test | −0.001 (0.007) |
| p-value on joint test | 0.316 |
| Teachers | 41 |
| Students | 598 |

Notes: The regression model includes fixed effects for randomization block. Robust standard errors in parentheses. FRPL = free- or reduced-price lunch eligible; LEP = limited English proficiency status; SPED = special education status.
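A minimal sketch of the table 2 balance check in Python, assuming a student-level DataFrame with the assigned teacher’s pre-experiment math value added in a column named va_math, a randomization-block identifier named block, and the listed covariates (all column names are placeholders, not the project’s actual variable names):

```python
import statsmodels.formula.api as smf

COVARIATES = ["male", "african_american", "asian", "hispanic", "white",
              "frpl", "sped", "lep", "prior_math", "prior_read"]


def balance_check(df):
    """Regress assigned-teacher effectiveness on student characteristics and
    randomization-block fixed effects; report the joint test on all covariates."""
    formula = "va_math ~ " + " + ".join(COVARIATES) + " + C(block)"
    fit = smf.ols(formula, data=df).fit(cov_type="HC1")  # robust standard errors
    joint = fit.f_test(", ".join(f"{c} = 0" for c in COVARIATES))
    return fit, joint
```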

10. Twenty-six students were missing baseline data on at least one characteristic. In order to retain all students, I
imputed missing data to the mean of the students’ randomization block. I take the same approach to missing
data in all subsequent analyses. This includes the nineteen students who were part of my main analytic sample
but happened to be absent on the day that project managers administered the student survey and, thus, were
missing outcome data. This approach to imputation seems reasonable given that there was no reason to believe
that students were absent on purpose to avoid taking the survey.


5. EMPIRICAL STRATEGY
For all analyses, I began with the following model of student production:

$$\mathrm{OUTCOME}_{idsgjt} = \alpha f(A_{it-1}) + \zeta\,\mathrm{OUTCOME}_{it-1} + \pi X_{it} + \omega \bar{X}^{c}_{it} + \varphi \bar{X}^{s}_{it} + \varepsilon_{idsgjt}. \qquad (1)$$

OUTCOME_idsgjt was used interchangeably for each survey construct—that is, Behavior
in Class, Self-Efficacy in Math, and Happiness in Class—for student i in district d, school s,
grade g taught by teacher j in year t. As a point of comparison, I also specify models that
use students’ math achievement as an outcome. Throughout the paper, I test a variety of
alternative models that include different combinations of control variables. The full set
of controls includes a cubic function of students’ prior academic achievement, A_it−1,
in both math and reading; a prior measure of the outcome variable, OUTCOME_it−1;
student demographic characteristics, X_it, including gender, race, FRPL eligibility, SPED
status, and LEP status; these same test-score variables and demographic characteristics
averaged to the class level, X̄^c_it, and to the school level, X̄^s_it; and school fixed effects,
σ_s, which replace school characteristics in some models.

To generate teacher effect estimates, which I refer to as τ̂^S_jt, I took two approaches,
each with strengths and limitations. First, I calculated teacher effects by fitting equation
1 using ordinary least squares (OLS) regression and then averaging student-level resid-
uals to the teacher level. I did so separately for each outcome measure, as well as with
several different model specifications denoted by the superscript, S. This approach is
intuitive, as it creates an estimate of the contribution of teachers to student outcomes
above and beyond factors already controlled for in the model. It also is computationally
simple.11 At the same time, measurement error in these estimates due to, for example,
small class sizes, sampling idiosyncrasies, and measurement error in students’ survey
responses may lead me to overstate the variance of true teacher effects; it also could
attenuate the relationship between different measures of teacher effectiveness (e.g.,
measures at two points in time), even if they capture the same underlying construct.
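A minimal sketch of this first (unshrunken) approach under stated assumptions—placeholder column names for prior achievement, demographics, a school identifier, and a teacher_id link; the exact control set varies by model specification:

```python
import statsmodels.formula.api as smf

CONTROLS = ("prior_math + I(prior_math**2) + I(prior_math**3) + "
            "prior_read + I(prior_read**2) + I(prior_read**3) + "
            "male + frpl + sped + lep + C(school)")


def unshrunken_teacher_effects(df, outcome):
    """Fit equation (1) by OLS, then average student-level residuals by teacher."""
    fit = smf.ols(f"{outcome} ~ {CONTROLS}", data=df).fit()
    return df.assign(_resid=fit.resid).groupby("teacher_id")["_resid"].mean()
```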

Therefore, I also calculated a form of empirical Bayes estimates that take into ac-
count measurement error and shrink teacher effects back toward the mean based on
their precision. To do so, I included a teacher-level random effect in the model, which
I fit using restricted maximum likelihood. This approach is similar to a two-step ap-
proach described by others (e.g., Kane et al. 2013; Chetty, Friedman, and Rockoff 2014a;
Guarino et al. 2015) that first calculates the unshrunken teacher effects using OLS and
then multiplies these by a shrinkage factor. The shrinkage factor generally is calculated
by looking at variation in teacher effects within teachers and across classrooms. It was
not possible to use this two-step approach here given that data from multiple class-
rooms from the same teacher were not available in the experimental portion of the
study; elementary teachers in the sample all worked with just one class in a given year.
Instead, the one-step random effects approach I use shrinks estimates back toward the
mean based on the variance of the observed data (Raudenbush and Bryk 2002). Al-
though shrinking teacher effects is commonplace in both research and policy (Koedel,

11. An alternative fixed-effects specification is preferred by some because it does not assume that teacher assign-
ment is uncorrelated with factors that predict student outcomes (Guarino et al. 2015). However, in these data,
this approach returned similar estimates in models where it was feasible to include teacher fixed effects in addi-
tion to the other set of control variables, with correlations of 0.99 or higher (see Blazar and Kraft 2017 for more
details).


Mihaly, and Rockoff 2015), theory and simulated analyses show that shrunken estimates
are biased downward relative to the size of the measurement error (Jacob and Lefgren
2005). I refer to these two sets of estimates as “unshrunken” and “shrunken” teacher
effects.
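A minimal sketch of the one-step random-effects approach using statsmodels’ MixedLM, which fits a teacher random intercept by restricted maximum likelihood; the predicted random intercepts are shrunk toward zero in proportion to their imprecision. Column names are placeholders:

```python
import statsmodels.formula.api as smf

CONTROLS = "prior_math + prior_read + male + frpl + sped + lep + C(block)"


def shrunken_teacher_effects(df, outcome):
    """Random-intercept model for teachers, fit by REML; returns one shrunken
    effect (the predicted random intercept) per teacher."""
    model = smf.mixedlm(f"{outcome} ~ {CONTROLS}", data=df, groups=df["teacher_id"])
    result = model.fit(reml=True)
    # result.random_effects maps each teacher_id to a one-element Series
    # holding the predicted random intercept.
    return {tid: float(re.iloc[0]) for tid, re in result.random_effects.items()}
```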

I utilized these teacher effect estimates for three subsequent analyses. First, I estimated
the variance of τ̂^S_jt in order to examine the extent to which teachers vary in
their contributions to students’ attitudes and behaviors. I focused on the experimental
sample in order to be assured that estimates were not biased by nonrandom sorting.
Given that the variance of true teacher effects is bounded between the unshrunken
and shrunken estimates (Raudenbush and Bryk 2002), I present both. The latter are
model-based estimates reported directly from the random effects model. By calculating
pairwise correlations between these teacher effect estimates, I also examined whether
teachers who improved one student outcome were equally effective at improving
others.

Second, I examined the sensitivity of τ̂^S_jt to different model specifications. I began
with a baseline model that calculated teacher effects controlling only for students’ prior
academic achievement, as this is the measure typically used to account for nonrandom
sorting when test scores are the outcome of interest (Kane and Staiger 2008; Kane et al.
2013; Chetty, Friedman, and Rockoff 2014a). I also considered a model that conditioned
estimates on a lagged measure of students’ survey response, which is a more direct
analog of the value-added approach by looking at gains in student outcomes. Additional
variations of these models include ones that control for student, class, or school char-
acteristics. In order to address concerns about “reference bias” in self-reported survey
measures (Duckworth and Yeager 2015; West et al. 2016), I also replaced school charac-
teristics with school fixed effects. By making within-school comparisons, I am able to
difference out school-level factors, including norms around behavior or engagement,
that can create an implicit standard of comparison students use when judging their
own behavior or engagement. It was not possible to run this second set of analyses in
the experimental sample, given that only a small subset of students and teachers in that
sample had lagged survey measures. There also was no guarantee that a teacher who
had students with prior survey measures had a randomization block partner whose stu-
dents had these measures. Instead, I focused on the balanced sample of teachers and
students with all possible control variables from the larger observational dataset.

In my third and final set of analyses, I examined whether nonexperimental teacher
effect estimates calculated in years prior to 2012–13 predicted student outcomes follow-
ing random assignment. The randomized design allowed for a straightforward analytic
model:

$$\mathrm{OUTCOME}_{ijsg,2012\text{--}13} = \delta\,\hat{\tau}^{S}_{j,t<2012\text{--}13} + \nu_{sg} + \varepsilon_{ijsgt}. \qquad (2)$$

OUTCOME_ijsg,2012–13 was used interchangeably for each outcome measure for student i in teacher j’s classroom in the 2012–13 school year. I predicted these measures in the random assignment year with predicted, nonexperimental teacher effect estimates, τ̂^S_j,t<2012–13. That is, when Behavior in Class is the outcome of interest, τ̂^S_j,t<2012–13 represents a nonexperimental estimate of teachers’ effectiveness at improving students’ Behavior in Class in prior years; when Self-Efficacy in Math is the outcome of interest, τ̂^S_j,t<2012–13 represents a nonexperimental estimate of teachers’ effectiveness at improving Self-Efficacy in Math in prior years. Following the research design, I included fixed effects for each randomization block, ν_sg. In order to increase the precision of my estimates, I calculated nonexperimental teacher effects using all available teacher-years prior to the experiment. For the same reason, in equation 2, I also controlled for students’ prior achievement, demographic characteristics, and class characteristics captured from the randomly assigned rosters. I clustered standard errors at the class level to account for the nested structure of the data.

My parameter of interest is δ, which describes the relationship between nonexperimental teacher effect estimates and current student outcomes. As in Kane and Staiger (2008), I examined whether these estimates had any predictive validity (i.e., whether they were statistically significantly different from 0 SD) and whether they contained some degree of bias (i.e., whether they were statistically significantly different from 1 SD).
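A minimal sketch of the equation 2 regression and the two hypothesis tests, with placeholder column names (tau_nonexp for the pre-period teacher effect, class_id for the clustering variable, block for the randomization block):

```python
import statsmodels.formula.api as smf


def forecast_bias_test(df, outcome, tau_col="tau_nonexp"):
    """Regress the post-randomization outcome on the nonexperimental teacher
    effect with block fixed effects and student controls; cluster standard
    errors by class; test delta = 0 (predictive validity) and delta = 1 (no bias)."""
    formula = (f"{outcome} ~ {tau_col} + prior_math + prior_read "
               "+ male + frpl + sped + lep + C(block)")
    fit = smf.ols(formula, data=df).fit(cov_type="cluster",
                                        cov_kwds={"groups": df["class_id"]})
    return {
        "delta": fit.params[tau_col],
        "test_delta_eq_0": fit.t_test(f"{tau_col} = 0"),
        "test_delta_eq_1": fit.t_test(f"{tau_col} = 1"),
    }
```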
6. RESULTS

Experimental Teacher Effects on Students’ Attitudes and Behaviors

In table 3, I present results describing the extent to which teachers vary in their contributions to students’ attitudes and behaviors, as well as their math achievement. Estimates represent the standard deviation of the teacher-level variance, with panel A and panel B presenting unshrunken and shrunken estimates, respectively. In table 4, I present correlations between corresponding unshrunken and shrunken estimates. All models focus on the experimental sample in which teachers were randomly assigned to class rosters within schools, and therefore include randomization block (i.e., school-by-grade) fixed effects to match this design. Model 1 estimates teacher effects with no additional controls, and model 2 adds students’ prior achievement in math and reading, which is standard practice when estimating teacher effects. In model 3, I add student demographic characteristics, as well as class characteristics that aim to remove the contribution of peer effects from the teacher effect estimates. It was not possible to model classroom-level shocks directly, as random assignment data were not available over multiple classes or school years. Class characteristics describe the set of students included on the randomly assigned rosters rather than the students who ultimately stayed in that classroom.

Table 3. Standard Deviation of Teacher-Level Variance

| | (1) | (2) | (3) |
| Panel A: Unshrunken Estimates | | | |
| State math test | 0.28 | 0.19 | 0.13 |
| Behavior in Class | 0.28 | 0.24 | 0.14 |
| Self-Efficacy in Math | 0.29 | 0.27 | 0.19 |
| Happiness in Class | 0.35 | 0.35 | 0.26 |
| Panel B: Shrunken Estimates | | | |
| State math test | 0.22 | 0.18 | 0.13 |
| Behavior in Class | 0.13 | 0.09 | 0.05 |
| Self-Efficacy in Math | 0.00 | 0.00 | 0.08 |
| Happiness in Class | 0.33 | 0.33 | 0.34 |
| Prior achievement | | X | X |
| Student characteristics | | | X |
| Class characteristics | | | X |
| School-by-grade fixed effects | X | X | X |
| Teachers | 41 | 41 | 41 |
| Students | 531 | 531 | 531 |

Table 4. Correlations between Unshrunken and Shrunken Teacher Effect Estimates

| | (1) | (2) | (3) |
| Teacher effects on state math test | 0.93 | 0.95 | 0.84 |
| Teacher effects on Behavior in Class | 0.87 | 0.84 | 0.87 |
| Teacher effects on Self-Efficacy in Math | — | — | 0.83 |
| Teacher effects on Happiness in Class | 0.95 | 0.95 | 0.87 |
| Prior achievement | | X | X |
| Student characteristics | | | X |
| Class characteristics | | | X |
| School-by-grade fixed effects | X | X | X |
| Teachers | 41 | 41 | 41 |

Note: Empty cells indicate that variation in unshrunken or shrunken teacher effects is close to 0 SD (see table 3).

I begin by describing the magnitude of teacher effects on students’ math performance on state tests, which have been well documented in the academic literature (for a review, see Hanushek and Rivkin 2010) and thus provide a point of comparison for the magnitude of teacher effects on students’ attitudes and behaviors. I find that a 1 SD increase in teacher effectiveness is equivalent to between a 0.13 SD and 0.28 SD increase in students’ math achievement. Results are fairly similar between the corresponding unshrunken and shrunken estimates, particularly when controlling for students’ prior achievement. In models 2 and 3, the magnitude of the teacher-level variation for these unshrunken and shrunken estimates is almost identical to two decimal places, and correlations between them range from 0.84 to 0.95. These results are quite similar to those found by Guarino et al. (2015), who argued that the “effect of shrinkage itself does not appear to be practically important for properly ranking teachers or to ameliorate the performance of the [unshrunken] estimator” (p. 212).

Whereas I do not find large differences in the magnitude of teacher effects on math test scores between shrunken and unshrunken estimates, I do observe differences depending on the set of covariates included in the model. In particular, the variance of teacher effects on math test scores is substantively larger in model 1 (0.28 and 0.22 SD for unshrunken and shrunken estimates, respectively), which only controls for randomization block fixed effects, than in model 3 (0.13 SD for both unshrunken and shrunken estimates), which also controls for student and class characteristics. This is consistent with other literature, suggesting that controlling for observable class or peer characteristics produces a conservative estimate of the magnitude of teacher effects on student test scores (Kane et al. 2013; Thompson, Guarino, and Wooldridge 2015). In model 3, both shrunken and unshrunken teacher effect estimates of 0.13 SD indicate that, relative to an average teacher, teachers at the 84th percentile of the distribution of effectiveness move the median student up to roughly the 55th percentile of math achievement.
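As a quick check of the arithmetic behind this percentile translation (assuming approximately normal distributions of teacher effects and student achievement):

$$\Phi(1) \approx 0.84 \;\;\text{(a teacher 1 SD above the mean sits near the 84th percentile)}, \qquad \Phi(0.13) \approx 0.55 \;\;\text{(a median student who gains 0.13 SD lands near the 55th percentile)}.$$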
I also find that teachers have substantive impacts on self-reported measures of students’ attitudes and behaviors. The largest of these teacher effects is on students’ Happiness in Class, where a 1 SD increase in teacher effectiveness leads to a roughly 0.30 SD increase in this outcome. Similar to teacher effects on students’ math performance, results for teacher effects on students’ Happiness in Class are fairly consistent between panel A and panel B, indicating that shrinkage does not necessarily boost performance. Correlations between unshrunken and shrunken estimates range from 0.87 to 0.95. For the unshrunken teacher effects, estimates are smaller when controlling for student and class characteristics in model 3 (0.26 SD) compared with estimates from the other two models (0.35 SD). This is not the case for the shrunken teacher effects, where estimates across all three models are roughly 0.33 SD. In model 3, the variance of the unshrunken teacher effects on students’ Happiness in Class is slightly larger than the variance of the analogous shrunken estimates. This is possible given that, as described above, the shrunken estimates are not derived by directly shrinking the unshrunken estimates. Rather, these estimates come from a separate model that includes a teacher-level random effect and generates model-based estimates of the variance component.

The evidence also points to sizeable teacher effects on students’ Behavior in Class and Self-Efficacy in Math, though results are less consistent between unshrunken and shrunken estimators. Without shrinkage, the magnitude of teacher effects on these two outcomes generally is larger than teacher effects on students’ math performance but smaller than teacher effects on students’ Happiness in Class: between 0.14 SD and 0.28 SD for teacher effects on students’ Behavior in Class, and between 0.19 SD and 0.29 SD for teacher effects on students’ Self-Efficacy in Math. Shrunken estimates are considerably smaller. For example, in model 3, these shrunken estimates are 0.05 SD for teacher effects on Behavior in Class and 0.08 SD for teacher effects on Self-Efficacy in Math. Correlations between the unshrunken and shrunken estimates still are strong, but never above 0.87. Models that exclude class characteristics and use shrinkage to calculate teacher effects on students’ Self-Efficacy in Math produce estimates close to 0 SD. For this reason, I exclude from table 4 correlations between unshrunken and shrunken estimates for this outcome and these models.

It is counterintuitive that models that include class characteristics produce estimates that are larger than those that exclude these control variables. It is possible the error structure for students’ self-reported Self-Efficacy in Math is quite different from the error structure for other measures, which in turn leads to challenges when implementing shrinkage through a random effects model fit using restricted maximum likelihood estimation. Although restricted maximum likelihood aims to address concerns that full maximum likelihood tends to produce variance estimates that are biased downward, this may also be a concern in the relatively small sample of teachers and students (Harville 1977; Raudenbush and Bryk 2002).
Mixed models can result in singular fits (i.e., variance-covariance components that are exactly zero) in several instances, including a small number of random effects and complex random effects models (Gelman 2006). This topic is beyond the scope of this paper but is an important one for future research.

In table 5, I present a correlation matrix of teacher effects on different student outcomes. Here, teacher effects come from model 3 (see tables 3 and 4), which controls for prior achievement, student characteristics, and class characteristics. I focus on this model given that the magnitude of shrunken and unshrunken teacher effects are greater than 0 SD for all outcomes. One concern when estimating relationships between different measures of teacher quality is that individual teacher effect estimates are measured with error, which will attenuate these correlations (Spearman 1904). Indeed, correlations between the shrunken teacher effect estimates in panel B—which are estimated in a way that aims to reduce measurement error—generally are larger than correlations between the unshrunken ones in panel A. At the same time, differences in correlations between these two panels are not large, suggesting that additional approaches to address attenuation due to measurement error are unlikely to change overall patterns of results.

Table 5. Pairwise Correlations between Teacher Effects on Different Student Outcomes

| | State Math Test | Behavior in Class | Self-Efficacy in Math | Happiness in Class |
| Panel A: Unshrunken Estimates | | | | |
| Teacher effects on state math test | 1.0 | | | |
| Teacher effects on Behavior in Class | 0.16 | 1.0 | | |
| Teacher effects on Self-Efficacy in Math | 0.17 | 0.48** | 1.0 | |
| Teacher effects on Happiness in Class | −0.22 | 0.17 | 0.44** | 1.0 |
| Panel B: Shrunken Estimates | | | | |
| Teacher effects on state math test | 1.0 | | | |
| Teacher effects on Behavior in Class | 0.17 | 1.0 | | |
| Teacher effects on Self-Efficacy in Math | −0.03 | 0.65*** | 1.0 | |
| Teacher effects on Happiness in Class | −0.38* | 0.17 | 0.59*** | 1.0 |

Notes: Teacher effects are calculated from model 3 in tables 3 and 4, which controls for prior achievement, student characteristics, class characteristics, and randomization block fixed effects. Samples include 41 teachers. *p < 0.05; **p < 0.01; ***p < 0.001.

The largest of these correlations is between teacher effects on different measures of students’ attitudes and behaviors. For example, teacher effects on students’ Self-Efficacy in Math and teacher effects on the other two non-tested outcomes fall between 0.44 and 0.65. However, teachers do not appear to be equally effective at improving all three attitudes and behaviors. The correlation between teacher effects on students’ Happiness in Class and teacher effects on students’ Behavior in Class is weak and nonsignificant. Correlations between teacher effects on students’ attitudes and behaviors versus effects on students’ math achievement are similarly weak; most are not statistically significant. One exception is the relationship between teacher effects on students’ math performance and teacher effects on students’ Happiness in Class, which is negative and statistically significantly correlated when using the shrunken estimates (r = −0.38).
This suggests teachers who are skilled at improving students’ math achievement may do so in ways that make students less happy or less engaged in class.

Overall, these findings provide strong evidence that teachers impact several student attitudes and behaviors in addition to their academic performance. Weak, nonsignificant correlations between many of these teacher effect estimates indicate these measures identify unique skills that teachers bring to and engage in the classroom.

Sensitivity of Teacher Effects Across Model Specifications

In tables 6 and 7, I present results describing the relationship between teacher effects on students’ attitudes and behaviors across model specifications. As before, panel A shows correlations for unshrunken estimates, and panel B shows correlations for shrunken estimates.

Table 6. Pairwise Correlations between Teacher Effects across Model Specifications

| | ρ(Model 1, Model 2) | ρ(Model 1, Model 3) | ρ(Model 2, Model 3) |
| Panel A: Unshrunken Estimates | | | |
| Teacher effects on Behavior in Class | 0.89*** | 0.89*** | 1.00*** |
| Teacher effects on Self-Efficacy in Math | 0.88*** | 0.91*** | 0.98*** |
| Teacher effects on Happiness in Class | 0.96*** | 0.97*** | 0.99*** |
| Panel B: Shrunken Estimates | | | |
| Teacher effects on Behavior in Class | 0.90*** | 0.91*** | 1.00*** |
| Teacher effects on Self-Efficacy in Math | 0.86*** | 0.90*** | 0.97*** |
| Teacher effects on Happiness in Class | 0.96*** | 0.96*** | 0.99*** |

Notes: Model 1 calculates teacher effectiveness ratings that control for students’ prior achievement in math and reading. Model 2 controls for a prior measure of students’ attitude or behavior. Model 3 controls for prior scores on both prior achievement and prior attitude or behavior. Samples include 51 teachers. ***p < 0.001.

Table 7. Pairwise Correlations between Teacher Effects from Model 1 Versus Other Model Specifications

| | ρ(Model 1, Model 4) | ρ(Model 1, Model 5) | ρ(Model 1, Model 6) | ρ(Model 1, Model 7) |
| Panel A: Unshrunken Estimates | | | | |
| Teacher effects on state math test | 0.98*** | 0.72*** | 0.64*** | 0.49*** |
| Teacher effects on Behavior in Class | 0.95*** | 0.74*** | 0.64*** | 0.38*** |
| Teacher effects on Self-Efficacy in Math | 0.99*** | 0.84*** | 0.78*** | 0.46*** |
| Teacher effects on Happiness in Class | 0.97*** | 0.85*** | 0.52*** | 0.49*** |
| Panel B: Shrunken EB Estimates | | | | |
| Teacher effects on state math test | 0.99*** | 0.76*** | 0.54*** | 0.53*** |
| Teacher effects on Behavior in Class | 0.98*** | 0.69*** | 0.63*** | 0.41*** |
| Teacher effects on Self-Efficacy in Math | 0.99*** | — | — | — |
| Teacher effects on Happiness in Class | 0.99*** | 0.90*** | 0.71*** | 0.66*** |

Notes: Baseline model to which others are compared (model 1) calculates teacher effectiveness ratings that only control for students’ prior achievement in math and reading. Model 4 adds student demographic characteristics, including gender, race, free or reduced-price lunch eligibility, special education status, and limited English proficiency status; model 5 adds classroom characteristics. Model 6 adds school characteristics. Model 7 replaces school characteristics with school fixed effects. Empty cells indicate that variation in teacher effects from one of the models is close to 0 SD (see Appendix table A.4). Samples include 51 teachers. ***p < 0.001.
Because patterns of results are quite similar for the unshrunken and shrunken estimates, I focus my discussion on the latter for simplicity. This analysis includes the balanced sample of teachers and students with all possible control variables from the larger observational dataset. In Appendix table A.4, I present the magnitude of the teacher-level variation on all student outcomes using this sample, and find that results are similar to those presented in table 3 using the experimental sample.

In the first of these tables (table 6), I examine the correlations between teacher effects on students' attitudes and behaviors that are estimated controlling for prior achievement (model 1), a prior measure of the survey outcome (model 2), or both (model 3). Because this analysis examines the sensitivity of teacher effects to inclusion or exclusion of lagged measures of the outcome variable, I focus only on teacher effects on the three measures of students' attitudes and behaviors. For teacher effects on students' math performance used elsewhere in the paper, all models control for lagged achievement. Here, I find correlations of teacher effects across model specifications above 0.86. As expected, the smallest of these correlations are those between teacher effects that control for prior achievement (model 1) and teacher effects that control for students' prior survey responses (model 2). However, these correlations still are quite strong: 0.90 for teacher effects on Behavior in Class, 0.86 for Self-Efficacy in Math, and 0.96 for Happiness in Class. Correlations between teacher effects from models that have overlapping sets of controls (i.e., between models 1 and 3 or between models 2 and 3) are stronger, between 0.90 and 0.99. This suggests that teacher effects on these attitudes and behaviors are not particularly sensitive to the inclusion of prior achievement or prior survey responses. In light of these findings, I exclude prior measures of students' attitudes and behaviors from most subsequent analyses, allowing me to retain the largest possible sample of teachers and students.

Next, I examine the sensitivity of teacher effects from this baseline model (model 1) to models that control for additional student, class, or school characteristics (see table 7). In the table, empty cells indicate instances where the teacher-level variation is close to 0 SD (i.e., for shrunken teacher effects on students' Self-Efficacy in Math generated from models 5 through 7; see Appendix table A.4). I find that teacher effects on students' math performance and on the three measures of students' attitudes and behaviors are not particularly sensitive to student demographic characteristics but are sensitive to additional control variables. Correlations between teacher effect estimates from model 1 (which controls for prior test scores) and model 4 (which builds on model 1 by adding student demographic characteristics) all are greater than or equal to 0.95. For teacher effects on students' math performance, Behavior in Class, and Self-Efficacy in Math, correlations between estimates from model 1 and from model 5 (which builds on previous models by adding classroom characteristics) are substantively smaller, at 0.76, 0.69, and 0.82, respectively. For teacher effects on students' Happiness in Class, the correlation stays above 0.90.
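To make this specification check concrete, the sketch below estimates simple, unshrunken teacher effects under two control sets and then correlates the resulting teacher rankings. It is an illustration only: the column names (teacher_id, self_efficacy, prior_math, prior_read, prior_survey) are hypothetical, and residual averages stand in for the mixed-model estimates used in the paper.

```python
import pandas as pd
import statsmodels.formula.api as smf

def unshrunken_teacher_effects(df: pd.DataFrame, formula: str) -> pd.Series:
    """Residualize the outcome on the controls in `formula`, then average the
    residuals within teacher. This yields a simple, unshrunken teacher effect."""
    fit = smf.ols(formula, data=df).fit()
    resid = pd.DataFrame({
        "teacher_id": df.loc[fit.resid.index, "teacher_id"],
        "resid": fit.resid,
    })
    return resid.groupby("teacher_id")["resid"].mean()

# Model 1: control for prior achievement; model 2: control for the lagged survey scale.
effects_m1 = unshrunken_teacher_effects(df, "self_efficacy ~ prior_math + prior_read")
effects_m2 = unshrunken_teacher_effects(df, "self_efficacy ~ prior_survey")

# How stable are teacher rankings across the two specifications?
print(effects_m1.corr(effects_m2))
```

A correlation near 1 would indicate that the two control sets rank teachers almost identically, which is the pattern table 6 reports for these outcomes.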
Adding school characteristics to teacher effect specifications appears to have the largest impact on teacher rankings. Correlations between estimates from model 1 and from model 6 (which builds on previous models by adding observable school characteristics) range from 0.54 (for teacher effects on students' math performance) to 0.71 (for teacher effects on students' Happiness in Class). Correlations between estimates from model 1 and from model 7 (which replaces observable school characteristics with school fixed effects) range from 0.41 (for teacher effects on students' Behavior in Class) to 0.66 (for teacher effects on students' Happiness in Class). Correlations between estimates from models 6 and 7 (not shown in table 7) are 0.94, 0.70, 0.71, and 0.92 for teacher effects on students' math performance, Behavior in Class, Self-Efficacy in Math, and Happiness in Class, respectively. Reference bias is one possible explanation for the lower correlations between models that do and do not control for school fixed effects. At the same time, these estimates are well within the range reported in studies looking at the sensitivity of teacher effects on test scores across models that control for school characteristics or school fixed effects, between roughly 0.5 and 0.9 (Aaronson, Barrow, and Sander 2007; Hill, Kapitula, and Umland 2011; Goldhaber and Theobald 2012).

Predictive Validity of Nonexperimental Teacher Effects

In table 8, I report estimates describing the relationship between nonexperimental teacher effects on student outcomes and these same measures following random assignment. Cells contain estimates from separate regression models where the dependent variable is the student attitude or behavior listed in each column. The independent variable of interest is the nonexperimental teacher effect on this same outcome estimated in years prior to random assignment. Nonexperimental teacher effects are modeled from five separate equations discussed above, each with a different set of covariates. I exclude teacher effects calculated from models 2 and 3 (described in table 6), both of which controlled for prior measures of students' attitudes and behaviors that were not available for many teachers' students in the experimental portion of the study. However, at the end of this section I describe results from additional analyses that relate nonexperimental teacher effects controlling for imputed lagged survey measures to outcomes following random assignment; results are consistent with the main results. Stars indicate whether point estimates are statistically significantly different from 0 SD, and p-values testing the null hypothesis that effect sizes are equal to 1 SD are presented next to each estimate. The sample size for Happiness in Class is reduced by one teacher who did not have nonexperimental teacher effects on this outcome.
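A stripped-down version of the estimating equation behind each cell of table 8 is sketched below: regress the post-random-assignment outcome on the prior, nonexperimental teacher effect plus controls, cluster standard errors at the class level, and test whether the coefficient equals 1 SD. The column names are hypothetical, the control set is abbreviated, and the data frame is assumed to have no missing rows; this illustrates the approach rather than reproducing the paper's exact specification.

```python
import statsmodels.formula.api as smf

# Hypothetical columns: outcome_post (student outcome after random assignment),
# te_nonexp (the teacher's prior, nonexperimental effect on that outcome, in SD units),
# prior_math and prior_read (lagged test scores), block (randomization block),
# and class_id (classroom identifier used for clustering).
fit = smf.ols(
    "outcome_post ~ te_nonexp + prior_math + prior_read + C(block)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["class_id"]})

print(f"estimate = {fit.params['te_nonexp']:.3f}, SE = {fit.bse['te_nonexp']:.3f}")

# Forecast-bias test: a coefficient of 1 SD means predicted differences in
# teacher effectiveness equal observed differences under random assignment.
print(fit.t_test("te_nonexp = 1"))
```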
Table 8. Relationship between Student Outcomes Following Random Assignment and Prior, Nonexperimental Teacher Effect Estimates

                                       State Math Test         Behavior in Class       Self-Efficacy in Math    Happiness in Class
                                       Estimate/SE  p vs. 1 SD Estimate/SE  p vs. 1 SD Estimate/SE  p vs. 1 SD  Estimate/SE  p vs. 1 SD
Panel A: Unshrunken Estimates
  Teacher effects calculated           0.846***     0.046      0.685***     0.064      0.418~       0.010       0.350*       0.000
  from model 1                         (0.075)                 (0.165)                 (0.216)                  (0.142)
  Teacher effects calculated           0.869***     0.115      0.706***     0.054      0.414~       0.011       0.361*       0.000
  from model 4                         (0.081)                 (0.148)                 (0.219)                  (0.140)
  Teacher effects calculated           0.914***     0.366      0.719***     0.058      0.447~       0.029       0.349*       0.000
  from model 5                         (0.094)                 (0.144)                 (0.244)                  (0.133)
  Teacher effects calculated           0.915***     0.361      0.742***     0.077      0.461~       0.035       0.399**      0.000
  from model 6                         (0.092)                 (0.142)                 (0.247)                  (0.136)
  Teacher effects calculated           0.903***     0.314      0.768***     0.118      0.448~       0.023       0.399**      0.000
  from model 7                         (0.095)                 (0.145)                 (0.234)                  (0.134)
Panel B: Shrunken Estimates
  Teacher effects calculated           0.960***     0.609      1.003***     0.992      0.514        0.195       0.427*       0.003
  from model 1                         (0.078)                 (0.266)                 (0.369)                  (0.177)
  Teacher effects calculated           0.995***     0.953      1.090***     0.738      0.507        0.192       0.438*       0.003
  from model 4                         (0.084)                 (0.268)                 (0.372)                  (0.175)
  Teacher effects calculated           1.055***     0.585      1.240***     0.436      —            —           0.416*       0.001
  from model 5                         (0.100)                 (0.305)                                          (0.167)
  Teacher effects calculated           1.079***     0.435      1.472***     0.207      —            —           0.487**      0.005
  from model 6                         (0.101)                 (0.368)                                          (0.174)
  Teacher effects calculated           1.084***     0.419      1.789***     0.092      —            —           0.522**      0.008
  from model 7                         (0.102)                 (0.458)                                          (0.172)

Notes: Cells include estimates from separate regression models that control for students' prior achievement in math and reading, student demographic characteristics, classroom characteristics from randomly assigned rosters, and fixed effects for randomization block. Robust standard errors clustered at the class level in parentheses. Model 1 calculates nonexperimental teacher effectiveness ratings that only control for students' prior achievement in math and reading; Model 4 adds student demographic characteristics; Model 5 adds classroom characteristics; Model 6 adds school characteristics; Model 7 replaces school characteristics with school fixed effects. Empty cells indicate that variation in nonexperimental teacher effects is close to 0 SD (see Appendix table A.4). Samples include 41 teachers and 531 students; the sample for Happiness in Class includes 40 teachers and 509 students. ~p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.

Validity evidence for teacher effects on students' math performance is consistent with other experimental studies (Kane and Staiger 2008; Kane et al. 2013), where predicted differences in teacher effectiveness in observational data are equal to or come close to actual differences following random assignment of teachers to classes. The nonexperimental teacher effect estimate that comes closest to a 1:1 relationship is the shrunken estimate that controls for students' prior achievement and other demographic characteristics (0.995 SD). Despite a relatively small sample of teachers, the standard error for this estimate (0.084) is substantively smaller than those in other experimental studies—including the meta-analysis conducted by Bacher-Hicks et al. (2017)—and allows me to rule out relatively large degrees of bias in teacher effects calculated from this model. A likely explanation for the greater precision in this study relative to others is that other studies generate estimates through instrumental variables estimation to recover the effect of treatment on the treated. Instead, I use OLS regression and account for noncompliance by narrowing in on randomization blocks in which very few, if any, students moved out of their randomly assigned teacher's classroom. Nonexperimental teacher effects calculated without shrinkage are related less strongly to current student outcomes, though differences in estimates and associated standard errors between panel A and panel B are not large.
All corresponding estimates (e.g., model 1 from panel A versus panel B) have overlapping 95 percent confidence intervals.

Results examining forecast bias in teacher effects on students' Behavior in Class are not substantively different from what I would expect based on the math test-score outcome. I find that the teacher effects with the best predictive validity are the shrunken estimates from model 1, which calculates nonexperimental teacher effects controlling only for students' prior achievement.12 Here, I find an estimate of 1.00 SD that matches the hypothesis described by Kane and Staiger (2008), where predicted differences across classrooms should equal observed differences. However, the standard error around this estimate is substantively larger (0.25) than the standard errors for the test-score estimates. Comparison of estimates between panel A and panel B provides some insight here and, in particular, into the tradeoff between accuracy and precision.
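The shrinkage at issue here can be summarized with the standard empirical Bayes formula, shown below for intuition. This is the textbook form rather than the paper's exact mixed-model (best linear unbiased prediction) estimator:

\[
\hat{\mu}^{\text{shrunken}}_{j} \;=\; \lambda_j\,\hat{\mu}^{\text{unshrunken}}_{j},
\qquad
\lambda_j \;=\; \frac{\sigma^{2}_{\mu}}{\sigma^{2}_{\mu} + \sigma^{2}_{\varepsilon}/n_j},
\]

where \(\sigma^{2}_{\mu}\) is the variance of true teacher effects, \(\sigma^{2}_{\varepsilon}\) is the student-level residual variance, and \(n_j\) is the number of students contributing to teacher \(j\)'s estimate. Noisier estimates receive smaller \(\lambda_j\) and are pulled more strongly toward the mean, which reduces measurement error in the estimate but also compresses its variation.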
In panel B, estimates relating nonexperimental, shrunken teacher effect estimates on students' Behavior in Class to current student outcomes are notably larger and closer to 1 SD than the estimates in panel A relating unshrunken estimates to current student outcomes. This makes sense, as shrunken estimates are adjusted for the amount of measurement error the unshrunken estimates contain. Measurement error will attenuate the relationship between two teacher effect estimates, even if the true relationship is equal to 1 SD. Indeed, earlier in the paper, I showed that teacher effects on students' Behavior in Class generally underwent more shrinkage than teacher effects on students' math test scores (see table 3). At the same time, relationships between shrunken teacher effect estimates and current student outcomes in panel B are measured with considerably less precision than relationships drawing on unshrunken teacher effect estimates in panel A. Standard errors in panel B are roughly two to three times as large as those in panel A. This also makes sense, as shrunken estimates provide a lower bound on the variation of true teacher effects, particularly when measurement error is large (Jacob and Lefgren 2005); decreased variation in the independent variable decreases statistical power. Considering results from panels A and B jointly provides evidence that nonexperimental methods for estimating teacher effects on students' Behavior in Class account for a large degree of bias due to nonrandom sorting and factors beyond teachers' control.

For both Self-Efficacy in Math and Happiness in Class, nonexperimental teacher effect estimates have moderate predictive validity. Generally, I can distinguish estimates from 0 SD, indicating that they contain some information content on teachers. The exception is the shrunken estimates for Self-Efficacy in Math: although these estimates are similar in magnitude to the unshrunken estimates in panel A, between 0.42 SD and 0.58 SD, their standard errors are large and the 95 percent confidence intervals cross 0 SD. I also can distinguish many estimates from 1 SD. This indicates that nonexperimental teacher effects on students' Self-Efficacy in Math and Happiness in Class contain potentially large and important degrees of bias. For both measures of teacher effectiveness, point estimates around 0.5 SD suggest that they contain roughly 50 percent bias.

12. Results in panel B suggest that adding class- and school-level controls to the model calculating nonexperimental teacher effects on students' Behavior in Class may in fact add bias. Point estimates describing the relationship between current student outcomes and nonexperimental teacher effects calculated from models 5 through 7 all are greater than 1.2 SD. These patterns are somewhat consistent with findings from the MET project, in which researchers suggested that some models "over control," resulting in the removal of peer effects that actually predict important differences in teacher performance (Kane et al. 2013). This explanation makes sense here as well, where teachers' ability to improve individual students' behavior likely is closely related to the control they have over other peer-to-peer relationships. At the same time, I do not want to place too much emphasis on these differences across models, given that standard errors are large and, thus, point estimates have overlapping 95 percent confidence intervals.
One concern with these results is that the nonexperimental teacher effects do not control for prior survey responses, and such a model might reduce bias. Earlier in the paper, I show that in the observational portion of the study, teacher effects that control for some combination of prior achievement and/or prior survey responses do not return markedly different teacher rankings. However, in some instances correlations are lower than 0.90 (see table 6), leaving open the possibility that controlling for lagged survey responses may boost performance. To address this concern in the experimental data, I conduct a robustness check that calculates nonexperimental teacher effects after imputing students' lagged survey responses. To impute, I fit a series of regression models that predict students' lagged survey response with all other available data (i.e., prior test scores in math and reading, and the student demographic characteristics listed in table 2) for the sample of students with these data used elsewhere in this paper.13 Then, I use this model to infer predicted values for all students without lagged survey responses. Finally, I calculate nonexperimental teacher effects on students' attitudes and behaviors controlling for these lagged measures, an indicator for whether or not the lagged survey measure was imputed, and, in some instances, students' prior test scores. Model 2 calculates nonexperimental teacher effects controlling for students' prior survey response, while model 3 calculates nonexperimental teacher effects controlling for both prior achievement and the prior survey response. I present results that relate these teacher effects to student outcomes following random assignment in Appendix table A.5. Patterns of results are very similar to those presented in table 8, suggesting that controlling for lagged measures of the outcome variable when calculating nonexperimental teacher effects on students' attitudes and behaviors does not appear to change inferences regarding bias in these measures. These results also are similar to those from the quasi-experimental validation study by Backes and Hansen (2018), where the authors controlled for prior measures of their non-tested outcomes in all models and still found large degrees of bias in some instances.

13. Students' prior test scores are statistically significant predictors of all three measures of students' attitudes and behavior, as are several demographic characteristics. Together, prior achievement and demographic characteristics explain a sizeable amount of the variation in prior survey responses: 22 percent for Behavior in Class, 10 percent for Self-Efficacy in Math, and 7 percent for Happiness in Class. Further, imputed measures of students' prior survey responses are moderately to strongly related to current survey responses, with standardized regression coefficients between 0.43 SD (Happiness in Class) and 0.63 SD (Behavior in Class).
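A compressed version of the imputation step described above might look like the sketch below. The column names (prior_survey, prior_math, male, frpl, sped, lep, teacher_id, and so on) are hypothetical, and the final step collapses the paper's mixed-model teacher effects into a single OLS regression with teacher indicators for brevity.

```python
import statsmodels.formula.api as smf

# Hypothetical columns: prior_survey is the lagged survey scale (missing for many
# students); prior_math, prior_read, and the demographic indicators are observed
# for everyone; outcome is the current-year survey scale. df is a pandas DataFrame.
has_lag = df["prior_survey"].notna()

# Step 1: predict the lagged survey response from prior test scores and demographics.
impute_fit = smf.ols(
    "prior_survey ~ prior_math + prior_read + male + frpl + sped + lep",
    data=df[has_lag],
).fit()

# Step 2: fill in predicted values for students without a lagged response,
# and keep a flag marking which values were imputed.
df["prior_survey_filled"] = df["prior_survey"]
df.loc[~has_lag, "prior_survey_filled"] = impute_fit.predict(df[~has_lag])
df["imputed"] = (~has_lag).astype(int)

# Step 3: estimate teacher effects controlling for the (possibly imputed) lag and
# the imputation indicator; teacher indicators stand in for random teacher effects.
te_fit = smf.ols(
    "outcome ~ prior_survey_filled + imputed + prior_math + prior_read + C(teacher_id)",
    data=df,
).fit()
```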
7. DISCUSSION AND CONCLUSION

Where does this leave policy, practice, and research? Should these measures be used in policy settings, despite concerns about bias? This is not an easy question to answer. For some readers, the relationships presented in this paper could point to considerable policy usefulness for the nonexperimental estimates. If one were to rank teachers using experimental and nonexperimental estimates, the results would be similar. Thus, ignoring reference bias problems and possible gaming, a teacher de-selection policy using biased measures would still improve outcomes on average.

Another possible reason to incorporate measures of students' attitudes and behaviors, and teachers' ability to improve them, into selection and accountability policy, in spite of bias, would be to create clear incentives for improving these skills in school. Many, including myself, see students' social and emotional development as a central goal of teachers' and schools' work (e.g., Pianta and Hamre 2009; Durlak et al. 2011; Farrington et al. 2012). Yet accountability systems that focus predominantly or exclusively on student achievement send a message that the skills captured on these tests are the ones that policy makers want students to have when they leave school. Broadening what it means to be a successful student and "making the development of the whole child central to the mission of education" (Garcia 2014, p. 4) clearly is good policy.

At the same time, lessons learned from new teacher evaluation systems that incorporate teacher effects on students' test scores highlight several reasons why making high-stakes policy decisions based on teacher effects on students' attitudes and behaviors may not be appropriate or advantageous. Despite convincing evidence against bias in teacher effects on students' academic performance, teachers still are skeptical about the use and fairness of these measures (Jiang, Sporte, and Luppescu 2015). One reason for this skepticism discussed in the academic literature is that, even if teacher effects are unbiased, they often are quite noisy measures of teachers' effectiveness (Ballou and Springer 2015). Large confidence intervals around individual teachers' scores—due to the number of students attached to that teacher and error in the student-level assessment itself—mean that a teacher's underlying ability often is statistically indistinguishable from that of other teachers. This likely would be an even greater issue for teacher effects on students' attitudes and behaviors, given well-documented concerns about error in student, teacher, and parent reports of these measures (Duckworth and Yeager 2015). Even if systems were to incorporate classical measurement error into teachers' effectiveness ratings, which most do not, there still would be lingering concerns about other sources of error. In particular, there are bound to be concerns about cheating (Campbell 1979; Koretz 2008). In this study, student surveys were administered under low-stakes conditions where student responses were not visible to the teacher or other students in the classroom. It is possible that estimates of bias might differ—and likely increase—under high-stakes settings where survey responses could be coached or influenced by other pressures.

Despite concerns about using teacher effects on students' attitudes and behaviors in high-stakes policy settings, I believe there are other uses of these measures that fall within and would enhance existing school practices.
In particular, measures of teachers' effectiveness at improving students' attitudes and behaviors could be used to identify areas for professional growth and to connect teachers with targeted professional development. Bringing costly but effective development programs, such as teacher coaching (Kraft, Blazar, and Hogan 2018), to scale requires at least two key pieces of information that measures such as those used in this study could provide. First, it would be useful to know which teachers require immediate support in order to allocate professional development dollars to those teachers, as opposed to investing in lower-cost but less-effective programs that reach all teachers. Second, the individualized nature of coaching and related development programs requires that school leaders know teachers' individual strengths and weaknesses in order to facilitate appropriate teacher–coach or teacher–team matches, where members have complementary skill sets (Papay et al. 2016). Observation rubrics provide one source of data for this purpose, yet they have logistical constraints, including the need for school leaders who have the time and knowledge to assess multiple teachers on multiple teaching skills (Hill and Grossman 2013). In light of moderate to strong relationships between teachers' observed classroom behaviors captured on established observation rubrics and teacher effects on several student attitudes and behaviors (Blazar and Kraft 2017), it is possible that the latter could be used as a lower-cost proxy for the former. In these instances, biased measures are less likely to be a concern than in settings where teachers' jobs are on the line.

Finally, supporting teachers in the work of developing students' attitudes and behaviors will require investments in research in addition to changes in policy and practice. Building on experimental studies of teacher effects on student achievement, newer research is starting to examine related questions (e.g., how instructional supports impact students' achievement; Kane et al. 2016) using observational and value-added approaches that generally are less expensive and considerably more tractable than randomized control trials. Making this same decision in the context of research on students' attitudes and behaviors will require close consideration of the tradeoff between two key issues: bias and the availability of data. The analyses presented here suggest that value-added approaches likely will reduce some but not all of the sorting bias that could influence estimates of the impact of different inputs on measures of students' behavior, self-efficacy, and happiness. Even if some degree of bias remains, this approach likely would improve upon much of the existing body of research to date, which lacks convincing evidence about what works in education (Murnane and Willett 2011; Kane 2015). At the same time, such studies would not have the benefit of the easily accessible administrative data that have made this type of work possible when examining gains in student achievement outcomes.
Building up administrative datasets that include rich measures of students' attitudes and behaviors in addition to their academic performance is, in my opinion, a worthy goal (see West 2016 for how this is starting to happen in some education agencies, including the CORE districts in California). Until that happens, though, researchers will need to continue to collect these measures themselves. In turn, we likely will want to conduct more random assignment studies and learn as much as possible from them.

ACKNOWLEDGMENTS

I would like to thank Martin West, Thomas Kane, Heather Hill, and anonymous reviewers for their feedback on earlier drafts. The data collection effort described here was supported by the Institute of Education Sciences, U.S. Department of Education, through grant R305C090023 to the President and Fellows of Harvard College to support the National Center for Teacher Effectiveness. The opinions expressed are those of the author and do not represent views of the Institute or the U.S. Department of Education. Research support came from the Mathematica Policy Research summer fellowship and the Smith Richardson Foundation.

REFERENCES

Aaronson, Daniel, Lisa Barrow, and William Sander. 2007. Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics 25(1): 95–135. doi:10.1086/508733.

Bacher-Hicks, Andrew, Mark Chin, Thomas J. Kane, and Douglas O. Staiger. 2017. An evaluation of bias in three measures of teacher quality: Value-added, classroom observations, and student surveys. NBER Working Paper No. 23478.

Bacher-Hicks, Andrew, Thomas J. Kane, and Douglas O. Staiger. 2014. Validating teacher effect estimates using changes in teacher assignments in Los Angeles. NBER Working Paper No. 20657.

Backes, Ben, and Michael Hansen. 2018. The impact of Teach for America on non-test academic outcomes. Education Finance and Policy 13(2): 168–193.

Ballou, Dale, and Matthew G. Springer. 2015. Using student test scores to measure teacher performance: Some problems in the design and implementation of evaluation systems. Educational Researcher 44(2): 77–86. doi:10.3102/0013189X15574904.

Blazar, David, and Matthew A. Kraft. 2017. Teacher and teaching effects on students' academic behaviors and mindsets. Educational Evaluation and Policy Analysis 39(1): 146–170. doi:10.3102/0162373716670260.

Blazar, David, Erica Litke, and Johanna Barmore. 2016. What does it mean to be ranked a "high" or "low" value-added teacher? Observing differences in instructional quality across districts. American Educational Research Journal 53(2): 324–359. doi:10.3102/0002831216630407.

Campbell, Donald T. 1979. Assessing the impact of planned social change. Evaluation and Program Planning 2(1): 67–90. doi:10.1016/0149-7189(79)90048-X.

Chetty, Raj, John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan. 2011. How does your kindergarten classroom affect your earnings? Evidence from Project STAR. Quarterly Journal of Economics 126(4): 1593–1660. doi:10.1093/qje/qjr041.

Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014a. Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review 104(9): 2593–2632. doi:10.1257/aer.104.9.2593.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014b. Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood. American Economic Review 104(9): 2633–2679. doi:10.1257/aer.104.9.2633.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2006. Teacher-student matching and the assessment of teacher effectiveness. Journal of Human Resources 41(4): 778–820. doi:10.3368/jhr.XLI.4.778.

Duckworth, Angela L., and David Scott Yeager. 2015. Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher 44(4): 237–251. doi:10.3102/0013189X15584327.

Durlak, Joseph A., Roger P. Weissberg, Allison B. Dymnicki, Rebecca D. Taylor, and Kriston B. Schellinger. 2011. The impact of enhancing students' social and emotional learning: A meta-analysis of school-based universal interventions. Child Development 82(1): 405–432. doi:10.1111/j.1467-8624.2010.01564.x.

Farrington, Camille A., Melissa Roderick, Elaine Allensworth, Jenny Nagaoka, Tasha Seneca Keyes, David W. Johnson, and Nicole O. Beechum. 2012. Teaching adolescents to become learners: The role of noncognitive factors in shaping school performance: A critical literature review. Chicago: University of Chicago Consortium on Chicago School Research.

Garcia, Emma. 2014. The need to address noncognitive skills in the education policy agenda. Washington, DC: Economic Policy Institute Briefing Paper No. 386.

Gelman, Andrew. 2006. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1(3): 515–534. doi:10.1214/06-BA117A.

Gershenson, Seth. 2016. Linking teacher quality, student attendance, and student achievement. Education Finance and Policy 11(2): 125–149. doi:10.1162/EDFP_a_00180.

Glazerman, Steven, and Ali Protik. 2015. Validating value-added measures of teacher performance. Paper presented at the Association for Public Policy Analysis & Management, November, Miami.

Goldhaber, Dan, and Roddy Theobald. 2012. Do different value-added models tell us the same things? Stanford, CA: Carnegie Knowledge Network Brief No. 4.

Guarino, Cassandra M., Michelle Maxfield, Mark D. Reckase, Paul N. Thompson, and Jeffrey M. Wooldridge. 2015. An evaluation of Empirical Bayes' estimation of value-added teacher performance measures. Journal of Educational and Behavioral Statistics 40(2): 190–222. doi:10.3102/1076998615574771.

Hanushek, Eric A., John F. Kain, Jacob M. Markman, and Steven G. Rivkin. 2003. Does peer ability affect student achievement? Journal of Applied Econometrics 18(5): 527–544. doi:10.1002/jae.741.

Hanushek, Eric A., and Steven G. Rivkin. 2010. Generalizations about using value-added measures of teacher quality. American Economic Review 100(2): 267–271. doi:10.1257/aer.100.2.267.

Harville, David A. 1977. Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72(358): 320–338. doi:10.1080/01621459.1977.10480998.

Hill, Heather C., David Blazar, and Kathleen Lynch. 2015. Resources for teaching: Examining personal and institutional predictors of high-quality instruction. AERA Open 1(4): 1–23. doi:10.1177/2332858415617703.
Hill, Heather, and Pam Grossman. 2013. Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review 83(2): 371–384. doi:10.17763/haer.83.2.d11511403715u376.

Hill, Heather C., Laura Kapitula, and Kristin Umland. 2011. A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal 48(3): 794–831. doi:10.3102/0002831210387916.

Jackson, C. Kirabo. 2012. Non-cognitive ability, test scores, and teacher quality: Evidence from ninth grade teachers in North Carolina. NBER Working Paper No. 18624.

Jackson, C. Kirabo. 2016. What do test scores miss? The importance of teacher effects on non-test score outcomes. NBER Working Paper No. 22226.

Jacob, Brian, and Lars Lefgren. 2005. Principals as agents: Subjective performance assessment in education. NBER Working Paper No. 11463.

Jennings, Jennifer L., and Thomas A. DiPrete. 2010. Teacher effects on social and behavioral skills in early elementary school. Sociology of Education 83(2): 135–159. doi:10.1177/0038040710368011.

Jiang, Jennie Y., Susan E. Sporte, and Stuart Luppescu. 2015. Teacher perspectives on evaluation reform: Chicago's REACH students. Educational Researcher 44(2): 105–116. doi:10.3102/0013189X15575517.

Kane, Thomas J. 2015. Frustrated with the pace of progress in education? Invest in better evidence. Washington, DC: The Brookings Institution, Brown Center Chalkboard Series.

Kane, Thomas J., Daniel F. McCaffrey, Trey Miller, and Douglas O. Staiger. 2013. Have we identified effective teachers? Validating measures of effective teaching using random assignment. Seattle, WA: Bill and Melinda Gates Foundation MET Project.

Kane, Thomas J., Antoniya M. Owens, William H. Marinell, Daniel R. C. Thal, and Douglas O. Staiger. 2016. Teaching higher: Educators' perspectives on Common Core implementation. Cambridge, MA: Harvard University, Center for Education Policy Research.

Kane, Thomas J., and Douglas O. Staiger. 2008. Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper No. 14607.

Koedel, Corey, Kata Mihaly, and Jonah E. Rockoff. 2015. Value-added modeling: A review. Economics of Education Review 47: 180–195. doi:10.1016/j.econedurev.2015.01.006.

Koretz, Daniel M. 2008. Measuring up. Cambridge, MA: Harvard University Press.

Kraft, Matthew A. Forthcoming. Teaching effects on complex cognitive skills and social-emotional competencies. Journal of Human Resources. doi:10.3368/jhr.54.1.0916.8265R3.

Kraft, Matthew A., David Blazar, and Dylan Hogan. 2018. The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research.

Kupermintz, Haggai. 2003. Teacher effects and teacher effectiveness: A validity investigation of the Tennessee Value Added Assessment System. Educational Evaluation and Policy Analysis 25(3): 287–298. doi:10.3102/01623737025003287.

Lyubomirsky, Sonja, Laura King, and Ed Diener. 2005. The benefits of frequent positive affect: Does happiness lead to success? Psychological Bulletin 131(6): 803–855. doi:10.1037/0033-2909.131.6.803.

Mueller, Gerrit, and Erik Plug. 2006. Estimating the effect of personality on male and female earnings. Industrial & Labor Relations Review 60(1): 3–22. doi:10.1177/001979390606000101.
Murnane, Richard J., and Barbara R. Phillips. 1981. What do effective teachers of inner-city children have in common? Social Science Research 10(1): 83–100. doi:10.1016/0049-089X(81)90007-7.

Murnane, Richard J., and John B. Willett. 2011. Methods matter: Improving causal inference in educational and social science research. New York: Oxford University Press.

Newton, Xiaoxia A., Linda Darling-Hammond, Edward Haertel, and Ewart Thomas. 2010. Value-added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives 18(23): 1–27.

Nye, Barbara, Spyros Konstantopoulos, and Larry V. Hedges. 2004. How large are teacher effects? Educational Evaluation and Policy Analysis 26(3): 237–257. doi:10.3102/01623737026003237.

Papay, John P., Eric S. Taylor, John H. Tyler, and Mary Laski. 2016. Learning job skills from colleagues at work: Evidence from a field experiment using teacher performance data. NBER Working Paper No. 21986.

Pianta, Robert C., and Bridget K. Hamre. 2009. Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher 38(2): 109–119. doi:10.3102/0013189X09332374.

Raudenbush, Stephen W., and Anthony S. Bryk. 2002. Hierarchical linear models: Applications and data analysis methods, 2nd ed. Thousand Oaks, CA: Sage Publications.

Rivkin, Steven G., Eric A. Hanushek, and John F. Kain. 2005. Teachers, schools, and academic achievement. Econometrica 73(2): 417–458. doi:10.1111/j.1468-0262.2005.00584.x.

Rothstein, Jesse. 2010. Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics 125(1): 175–214. doi:10.1162/qjec.2010.125.1.175.

Spearman, Charles. 1904. "General Intelligence," objectively determined and measured. American Journal of Psychology 15(2): 201–292. doi:10.2307/1412107.

Thompson, Paul N., Cassandra M. Guarino, and Jeffrey M. Wooldridge. 2015. An evaluation of teacher value-added models with peer effects. Unpublished paper, Oregon State University.

Todd, Petra E., and Kenneth I. Wolpin. 2003. On the specification and estimation of the production function for cognitive achievement. Economic Journal (Oxford) 113(485): F3–F33. doi:10.1111/1468-0297.00097.

West, Martin R. 2016. Should non-cognitive skills be included in school accountability systems? Preliminary evidence from California's CORE districts. Evidence Speaks Reports 1(13): 1–7.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F. Gabrieli, and John D. Gabrieli. 2016. Promise and paradox: Measuring students' non-cognitive skills and the impact of schooling. Educational Evaluation and Policy Analysis 38(1): 148–170. doi:10.3102/0162373715597298.

APPENDIX A: ADDITIONAL DATA
Table A.1. Univariate and Bivariate Descriptive Statistics for Student Survey

Scale-level statistics:
                                 Mean    SD     Cronbach's Alpha    Correlation with:  Behavior    Self-Efficacy    Happiness
Behavior in Class                4.10    0.93         0.74                              1.00
Self-Efficacy in Math            4.17    0.58         0.76                              0.35***     1.00
Happiness in Class               4.10    0.85         0.82                              0.27***     0.62***          1.00

Survey items (mean, SD):
Behavior in Class
  My behavior in this class is good. (4.23, 0.89)
  My behavior in this class sometimes annoys the teacher. (3.80, 1.35)
  My behavior is a problem for the teacher in this class. (4.27, 1.13)
Self-Efficacy in Math
  I have pushed myself hard to completely understand math in this class. (4.23, 0.97)
  If I need help with math, I make sure that someone gives me the help I need. (4.12, 0.97)
  If a math problem is hard to solve, I often give up before I solve it. (4.26, 1.15)
  Doing homework problems helps me get better at doing math. (3.86, 1.17)
  In this class, math is too hard. (4.05, 1.10)
  Even when math is hard, I know I can learn it. (4.49, 0.85)
  I can do almost all the math in this class if I don't give up. (4.35, 0.95)
  I'm certain I can master the math skills taught in this class. (4.24, 0.90)
  When doing work for this math class, focus on learning not time work takes. (4.11, 0.99)
  I have been able to figure out the most difficult work in this math class. (3.95, 1.09)
Happiness in Class
  This math class is a happy place for me to be. (3.98, 1.13)
  Being in this math class makes me feel sad or angry. (4.38, 1.11)
  The things we have done in math this year are interesting. (4.04, 0.99)
  Because of this teacher, I am learning to love math. (4.02, 1.19)
  I enjoy math class this year. (4.12, 1.13)

Notes: Statistics are generated from all available data. All survey items are on a scale from 1 to 5. ***p < 0.001.

Table A.2. Summary of Random Assignment Student Compliance

                                            Number of Students    Percent of Total
Remained with randomly assigned teacher            677                  72
Switched teacher within school                     168                  18
Left school                                         40                   4
Left district                                       49                   5
Unknown                                              9                   1
Total                                              943                 100

Table A.3. Comparison of Student Compliers and Noncompliers in Randomization Blocks with Low Levels of Noncompliance

                                               Noncompliers    Compliers    p-value on Difference
Student characteristics
  Male                                             0.38           0.49            0.044
  African American                                 0.38           0.33            0.374
  Asian                                            0.12           0.15            0.435
  Hispanic                                         0.15           0.21            0.128
  White                                            0.31           0.27            0.403
  FRPL                                             0.64           0.66            0.572
  SPED                                             0.06           0.05            0.875
  LEP                                              0.11           0.21            0.016
  Prior achievement on state math test             0.30           0.26            0.689
  Prior achievement on state reading test          0.28           0.30            0.782
  p-value on joint test                                                           0.146
Teacher characteristics
  Prior teacher effects on state math scores      −0.01          −0.01            0.828
Students                                             67            531

Note: Means and p-values are calculated from a regression framework that controls for randomization block. FRPL = free- or reduced-price lunch eligible; LEP = limited English proficiency status; SPED = special education status.
Table A.4. Standard Deviation of Teacher-Level Variance in Nonexperimental Sample

                                 (1)     (2)     (3)     (4)     (5)     (6)     (7)
Panel A: Unshrunken Estimates
  State math test                0.26    NA      NA      0.25    0.13    0.19    0.15
  Behavior in Class              0.48    0.38    0.38    0.41    0.16    0.31    0.29
  Self-Efficacy in Math          0.35    0.30    0.30    0.33    0.14    0.37    0.30
  Happiness in Class             0.48    0.45    0.44    0.43    0.27    0.43    0.34
Panel B: Shrunken Estimates
  State math test                0.17    NA      NA      0.17    0.14    0.13    0.11
  Behavior in Class              0.33    0.22    0.22    0.26    0.10    0.19    0.21
  Self-Efficacy in Math          0.15    0.14    0.14    0.13    0.00    0.00    0.00
  Happiness in Class             0.34    0.31    0.31    0.34    0.33    0.34    0.30
Controls included
  Prior achievement               X              X       X       X       X       X
  Prior survey response                   X      X
  Student characteristics                                X       X       X       X
  Class characteristics                                          X       X       X
  School characteristics                                                 X
  School fixed effects                                                           X
Teachers                          51      51      51      51      51      51      51
Students                         548     548     548     548     548     548     548

Table A.5. Relationship between Student Outcomes Following Random Assignment and Prior, Nonexperimental Teacher Effect Estimates that Control for an Imputed Measure of Students' Prior Survey Response

                                     Behavior in Class         Self-Efficacy in Math      Happiness in Class
                                     Estimate/SE  p vs. 1 SD   Estimate/SE  p vs. 1 SD    Estimate/SE  p vs. 1 SD
Panel A: Unshrunken Estimates
  Teacher effects calculated         0.692***     0.059        0.452*       0.008         0.346*       0.000
  from model 2                       (0.159)                   (0.197)                    (0.141)
  Teacher effects calculated         0.702***     0.078        0.447*       0.013         0.347*       0.000
  from model 3                       (0.165)                   (0.213)                    (0.141)
Panel B: Shrunken Estimates
  Teacher effects calculated         1.067***     0.805        0.604        0.279         0.423*       0.002
  from model 2                       (0.269)                   (0.360)                    (0.175)
  Teacher effects calculated         1.073***     0.799        0.563        0.241         0.421*       0.002
  from model 3                       (0.286)                   (0.367)                    (0.175)
Teachers                               41                        41                         40
Students                              531                       531                        509

Notes: Cells include estimates from separate regression models that control for students' prior achievement in math and reading, student demographic characteristics, classroom characteristics from randomly assigned rosters, and fixed effects for randomization block. Robust standard errors clustered at the class level in parentheses. Model 2 calculates nonexperimental teacher effects controlling for a prior measure of students' attitude or behavior. Model 3 controls for both prior achievement and the prior attitude or behavior. Both models include an indicator for whether or not students' lagged survey response was imputed. *p < 0.05; ***p < 0.001.