RACIAL INTERACTION EFFECTS AND STUDENT ACHIEVEMENT

Jeffrey Penney

Department of Economics

Pontificia Universidad Javeriana

S.J., Bogotá D.C., Colombia

dr.jeffrey.penney@gmail.com

Abstract
Previous research has found that students who are of the same
race as their teacher tend to perform better academically. This
paper examines the possibility that both dosage and timing matter
for these racial complementarities. Using a model of education
production that explicitly accounts for past observable inputs,
I apply a conditional differences-in-differences estimation
procedure to nonparametrically identify dynamic treatment effects
of various sequences of interventions. Applying the methodology to
Tennessee's Project STAR class size experiment, I find that racial
complementarities may vary considerably according to the treatment
path. Early exposures to same-race teachers yield benefits
that persist in the medium run. This same-race matching effect
may explain a nontrivial portion of the black–white test score gap.

Downloaded from http://direct.mit.edu/edfp/article-pdf/12/4/447/1692150/edfp_a_00202.pdf by guest on 08 September 2023


doi:10.1162/EDFP_a_00202

Racial Interactions and Achievement

1. INTRODUCTION
There is a conventional wisdom among educators that when a minority student is
paired with a teacher of the same racial or ethnic background, he or she is more likely to
excel educationally (Dee 2004). There are a number of theories that have been put forth
as the reason for these racial complementarities, perhaps the most popular of which
is that minority teachers can serve as important role models for minority students;
another is the idea of "cultural synchronicity" that is hypothesized to occur between
minority students and teachers who share the same cultural background (Ingersoll and
May 2011). These arguments, among others, have been advanced to encourage the
recruitment of minority teachers.

Racial complementarities in educational achievement have much empirical support.
There is a positive correlation between academic achievement and racial match for
both black and white students on concrete measures of performance such as test scores
(Hanushek et al. 2005; Egalite, Kisida, and Winters 2015). Using data from Project
STAR (Student/Teacher Achievement Ratio) and examining each gender–race combination
separately, Dee (2004) finds that the contemporaneous test score benefits are
on the order of 4 percentile points in all subjects, with the exception
of white girls in reading, where the gain is smaller and not statistically significant. On
average, the gains are almost as large as those obtained through the small class size
interventions reported in Krueger (1999) using the same data. Favoritism in grading
toward students who share the teacher's racial or ethnic background is a possible
source of these test score gains, because there is evidence
that teachers evaluate students of the same race more favorably using subjective
assessments of academic performance (Ouazad 2014). However, the complementarities
are also present when the evaluations are externally administered, which effectively
rules out this possibility. Clotfelter, Ladd, and Vigdor (2010) demonstrate the
importance of accounting for racial interactions in education production by illustrating their
effect on race coefficients when same-race dummies are excluded from the regression
equation.

Cost–benefit exercises of any policy should consider not just the immediate effects
but also those in the medium- and long-run time horizons. Moreover, the interventions
themselves can exhibit substantial differences in terms of timing and dosage, both of
which may matter for the outcomes. These considerations suggest that ignoring the
cumulative and dynamic nature of education production may lead to incorrect inferences.
Investigating the effect of small class sizes using data from Project STAR, Krueger
(1999) finds that attending a small class yields contemporaneous test score benefits
in kindergarten through third grade. On the other hand, estimating a dynamic model
of education production that takes into account the full history of observable inputs,
Ding and Lehrer (2010) conclude that statistically significant achievement gains from
the small class intervention in Project STAR are present only in kindergarten and first
grade; they further find that the intervention may actually reduce achievement in third
grade for those who were not in small classes from kindergarten through second grade.1

1. Ding and Lehrer (2010) also pay important attention to the selective attrition and noncompliance issues of
Project STAR.

These results are consistent with those of Hanushek (1999), who finds an erosion of the
small class effect in later grades.2

In this paper, I extend the work of Dee (2004) by conducting an analysis using
an approach similar to the one developed in Ding and Lehrer (2010). I investigate
the effect of racial interactions on student achievement using a dynamic model
of education production that takes into account the full history of observable inputs.
The specification allows for both the timing and the dosage to matter for the outcome
at each grade: for example, being instructed by a teacher of the same race in
kindergarten and first grade may have a different effect on third-grade math test scores than
having a same-race teacher in the first and second grades. To estimate the model, I
use a conditional differences-in-differences procedure to nonparametrically identify
dynamic treatment effects of exposure(s) to a same-race teacher in the short and medium
run. The effect of any particular treatment path can be estimated. The regression model
can be thought of as a value-added model that allows for nonuniform decay of past
inputs.

This study makes use of data from Project STAR, a highly influential education
experiment that took place in Tennessee in the 1980s and sought to determine the
effect of class size on student achievement. I use data from a cohort that participated
in the experiment from kindergarten until the end of third grade.

The main findings of the empirical analysis are as follows. I find that both the
timing and the dosage of being assigned to a teacher of the same race can matter for test
score gains. The contemporaneous benefits are strongest in the early grades. The
estimated dynamic treatment effects show that the benefits persist in the medium run,
with early grade exposure to same-race teachers having statistically significant benefits
to scores in later grades. I examine whether the findings are merely an artifact of
across-school sorting of teachers by race. Repeating the analysis using classroom fixed effects
in order to expunge any possible bias arising from within-school quality differences
between black and white teachers, I find no substantive changes in the results. The main
results were qualitatively, and in many cases quantitatively, similar when the analysis
was repeated on subgroups of students defined by (1) compliance with their
treatment assignment, (2) size of school, (3) race, (4) gender, and (5) socioeconomic
status.

To conclude the article, I discuss the policy implications of the empirical findings.
The gains in achievement when students are taught by teachers of the same race are
economically significant and moderate in magnitude, ranging from approximately
4 to 10 percentile points on third grade test scores for continuous treatment
from kindergarten through third grade. The existence of effects that persist past the
short run and the economic significance of the effects indicate that future research
should investigate the channels through which they occur. Once the channels are
identified, policy prescriptions relating to within-school racial sorting may be found to be
desirable. In kindergarten, the own-race teacher effect on achievement explains
approximately 14 percent and 22 percent of the black–white reading and math test score gaps,
respectively, because minorities are far less likely to be matched with a teacher of the
same race.

2. Chetty et al. (2011), however, find that improvements in other outcomes, such as higher rates of college
attendance, do occur as a result of smaller class sizes, even though their test score benefits fade over time.

This paper is organized as follows. Section 2 details the theory and estimation of
the econometric model. I outline the data and perform some initial analyses in section
3. The empirical exercise and robustness checks are performed in section 4. I conclude
the main body of the paper with a discussion of policy implications in section 5. In the
online appendix, which is available on the Education Finance and Policy Web site at
www.mitpressjournals.org/doi/suppl/10.1162/EDFP_a_00202, I examine issues relating
to the validity of the experiment as well as other technical concerns.

2. MODEL
The primary purpose of this paper is to derive estimates of the effects of racial matches
between students and teachers on academic achievement for both the short and
medium term. To this end, I use an approach similar to that of Ding and Lehrer (2010),
wherein the estimated parameters from a system of equations (one equation for each
school grade) are used to calculate the dynamic effects of various sequences of
interventions. To make these estimates obtainable, the usual analysis of
education production is augmented by explicitly including past observable inputs and
same-race dummies in the model. I begin this section by detailing the system of
equations, then continue by describing the procedure through which the dynamic effects of
own-race teachers are obtained and how they are interpreted.

Theory and Estimation
Define $A_{ig}$ as the achievement of student $i$ in grade $g$, and let grade $g = k$ denote
kindergarten. Let $X$ be a matrix of control variables and a constant term, and define $d$ to be
a dummy for the same-race intervention, where $d = 1$ if the student has a same-race
teacher and $d = 0$ otherwise. Student fixed effects are given by $v$. The $\alpha$ vectors
denote the estimated effects of the controls, and the $\beta$ coefficients the effects of the
treatments. For a given coefficient $\gamma_{lm}$, where $\gamma \in \{\alpha, \beta\}$, $l$ denotes the level of achievement
that is affected by the input, and $m$ is the time period of the input; for example, $\beta_{31}$ is
the estimated effect of the same-race treatment in first grade on third grade academic
achievement. The system of equations to be estimated is as follows:

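\begin{align}
A_{ik} &= v_i + X_{ik}\alpha_{kk} + \beta_{kk} d_{ik} + \varepsilon_{ik}, \tag{1} \\
A_{i1} &= v_i + X_{i1}\alpha_{11} + X_{ik}\alpha_{1k} + \beta_{11} d_{i1} + \beta_{1k} d_{ik} + \varepsilon_{i1}, \tag{2} \\
A_{i2} &= v_i + X_{i2}\alpha_{22} + X_{i1}\alpha_{21} + X_{ik}\alpha_{2k} + \beta_{22} d_{i2} + \beta_{21} d_{i1} + \beta_{2k} d_{ik} + \varepsilon_{i2}, \tag{3} \\
A_{i3} &= v_i + X_{i3}\alpha_{33} + X_{i2}\alpha_{32} + X_{i1}\alpha_{31} + X_{ik}\alpha_{3k} + \beta_{33} d_{i3} + \beta_{32} d_{i2} \notag \\
       &\quad + \beta_{31} d_{i1} + \beta_{3k} d_{ik} + \varepsilon_{i3}, \tag{4}
\end{align}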

where $\varepsilon$ is the usual error term. This formulation allows for the effect of inputs to vary
over time: for example, the effect of having a same-race teacher in kindergarten on
achievement in kindergarten can differ from its effect on achievement in first grade
(that is, I allow the possibility that $\beta_{kk} \neq \beta_{1k}$). The inclusion of past inputs also serves
as a prophylactic against omitted variable bias. To illustrate, consider the equation for
$A_{i1}$: if past assignment to a same-race teacher $d_{ik}$ is correlated with current assignment
to a same-race teacher $d_{i1}$, and past assignment affects contemporaneous achievement
($\beta_{1k} \neq 0$), then omitting past assignment $d_{ik}$ as a regressor in the equation for $A_{i1}$ will
cause the estimate of the effect of current assignment on current achievement, $\beta_{11}$, to
be biased and inconsistent.
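
To see where the bias comes from, a worked version of the textbook omitted-variable result may help. The sketch below assumes, purely for illustration, a stripped-down first-grade equation with no controls and no fixed effect, and an error term uncorrelated with both dummies. The symbol $\tilde{\beta}_{11}$ is not used in the paper; it stands for the slope from regressing $A_{i1}$ on $d_{i1}$ alone.

```latex
% Simplified first-grade equation (illustrative assumption: no X, no v_i)
A_{i1} = \beta_{11} d_{i1} + \beta_{1k} d_{ik} + \varepsilon_{i1}

% Probability limit of the slope when d_{ik} is omitted
\operatorname{plim}\, \tilde{\beta}_{11}
  = \beta_{11}
  + \beta_{1k}\,
    \frac{\operatorname{Cov}(d_{i1}, d_{ik})}{\operatorname{Var}(d_{i1})}
```

Under these simplifying assumptions the second term vanishes only when $\beta_{1k} = 0$ or when current and past assignment are uncorrelated, which matches the two conditions named in the text.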

I make the following assumptions to permit inference on the regression results.3
The fixed effect $v_i$ can be correlated with the observed and unobserved determinants
of achievement; it contains the effect of not only student ability but also other
time-invariant inputs and characteristics. Other unobservable inputs into the education
production function are assumed to be either fixed over the course of the sample (and
are thus absorbed by the fixed effects) or uncorrelated with the included inputs.4 I
assume no pretreatment effects; that is, treatment assignment in future periods does
not affect current achievement. The matrix of controls $X$ contains the following: teacher
characteristics (teacher's race, years of experience, and whether the teacher has a
graduate degree), the type of class the student attends (a small class, a regular class, or a
regular class with a full-time teacher's aide), school fixed effects that I allow to vary
by grade, and free lunch status. Given these assumptions, any differential effects of
changes in treatment assignment will reveal themselves in the same-race coefficient
vectors $\{\beta_k, \beta_1, \beta_2, \beta_3\}$, where $\beta_1 = [\beta_{11}\ \beta_{1k}]$, and so forth.5

There remains the issue of the fixed effect $v_i$, which is unobservable. If it is
correlated with the included inputs but excluded from the regression equation, estimation of
the system of equations 1–4 is biased and inconsistent. Under the assumptions outlined
above, the regression coefficients can be consistently estimated by first-differencing
the system of equations; such a transformation eliminates the individual fixed
effects. Miquel (2003) and Lechner and Miquel (2010) demonstrate that this
conditional differences-in-differences approach can be used to nonparametrically identify
the causal effects of sequences of interventions. The equation for achievement in first
grade is:

\begin{equation}
A_{i1} - A_{ik} = X_{i1}\alpha_{11} + X_{ik}(\alpha_{1k} - \alpha_{kk}) + \beta_{11} d_{i1} + (\beta_{1k} - \beta_{kk}) d_{ik} + \varepsilon^{*}_{i1}, \tag{5}
\end{equation}

where $\varepsilon^{*}_{i1} = \varepsilon_{i1} - \varepsilon_{ik}$. Note that the kindergarten equation remains unchanged in this
transformation: although the fixed effect is still present, random assignment in this
grade ensures that the fixed effect is not correlated with the included covariates. Because
of potentially nonrandom attrition in the following grades, however, we require that the
fixed effect be differenced out in the other achievement equations.
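
The way differencing removes the fixed effect can be checked numerically. The following is a minimal sketch, assuming no controls and no error term, with coefficient values that are hypothetical and chosen only for illustration (they are not estimates from the paper). It builds the kindergarten and first-grade scores from equations 1 and 2 and compares their difference with the right-hand side of equation 5.

```python
# Hypothetical coefficient values, chosen only for illustration.
BETA_KK = 3.0  # kindergarten match -> kindergarten score
BETA_11 = 2.0  # first-grade match  -> first-grade score
BETA_1K = 1.5  # kindergarten match -> first-grade score


def kindergarten_score(v_i, d_ik):
    """Equation 1 with controls and the error term left out."""
    return v_i + BETA_KK * d_ik


def first_grade_score(v_i, d_ik, d_i1):
    """Equation 2 with controls and the error term left out."""
    return v_i + BETA_11 * d_i1 + BETA_1K * d_ik


def differenced_score(v_i, d_ik, d_i1):
    """Left-hand side of equation 5: A_i1 - A_ik."""
    return first_grade_score(v_i, d_ik, d_i1) - kindergarten_score(v_i, d_ik)


def equation_5_rhs(d_ik, d_i1):
    """Right-hand side of equation 5; the fixed effect v_i does not appear."""
    return BETA_11 * d_i1 + (BETA_1K - BETA_KK) * d_ik
```

Whatever value the fixed effect takes, `differenced_score` and `equation_5_rhs` agree, which is the sense in which the transformation makes the unobservable $v_i$ irrelevant for estimation.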

Note that the differencing procedure, under the assumptions above, is an
identification strategy to obtain unbiased and consistent estimates of the system of equations
1–4. This can be thought of as analogous to a fixed effects procedure: although some
of the variables are transformed in order to perform the estimation, the interpretation
of the coefficients does not change as a result.

3. A discussion about the particulars regarding identification can be found in the online appendix.
4. Using a restricted version of the method developed in Ding and Lehrer (2014), Ding and Lehrer (2010) find
evidence that the effect of unobserved inputs on achievement in the STAR data for second and third grades is
relatively constant.
5. Some scholars believe that dynamic complementarities in inputs ought to be modeled in education production
functions. Such an approach is not required here because random assignment (see section 3) guarantees that
such effects, if they exist, would not bias the coefficients of interest. For the same reason, peer effects are not
explicitly modeled (although they are taken into account in classroom fixed effect models; see section 4).

Because the differenced system of equations is triangular, it can be estimated using
equation-by-equation ordinary least squares to obtain the coefficient estimates.
Moreover, no assumptions are necessary as to the distribution of the error terms. As the
parameters enter recursively into the equations, one is required to estimate them in a
sequential fashion (starting with kindergarten) because the desire is to separately
identify the coefficients of interest. For example, we require the estimates of $\alpha_{kk}$ and $\beta_{kk}$
from equation 1 to enter into equation 5 in order to obtain the estimates of $\alpha_{11}$ and
$\beta_{11}$.6
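
One piece of this sequential logic can be sketched in code. In equation 5 the coefficients on the kindergarten inputs are differences such as $\beta_{1k} - \beta_{kk}$, so a lagged structural coefficient can be backed out by adding the corresponding kindergarten estimate from equation 1. The function name and the numbers in the usage note are hypothetical; this is an illustration of the arithmetic, not the author's code.

```python
def recover_lagged_coefficients(kindergarten_estimates, differenced_estimates):
    """Back out first-grade coefficients on kindergarten inputs.

    kindergarten_estimates: estimates from equation 1 (e.g. beta_kk, alpha_kk terms).
    differenced_estimates:  matching coefficients from equation 5, each of which
                            estimates (first-grade coefficient - kindergarten coefficient).
    Returns the implied first-grade coefficients, element by element.
    """
    if len(kindergarten_estimates) != len(differenced_estimates):
        raise ValueError("both lists must have the same length")
    return [diff + base
            for base, diff in zip(kindergarten_estimates, differenced_estimates)]
```

For instance, with a hypothetical kindergarten estimate of 3.0 and a differenced coefficient of -1.5, `recover_lagged_coefficients([3.0], [-1.5])` gives an implied $\hat{\beta}_{1k}$ of 1.5.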

This specification can be thought of as a value-added model.7 Using the language of
Rothstein (2010), the model estimated here is most similar to the VAM2 specification
(value-added model with a lagged achievement variable as a regressor), which implicitly
includes the effect of past inputs by including a lagged term in achievement. Including
lagged achievement as a regressor imposes an assumption of constant decay; that is,
past inputs, both observed and unobserved, are all assumed to decay at a constant rate.
The model used here relaxes the constant decay assumption for the observed inputs
but at the cost of assuming that past time-variant unobservables are uncorrelated with
future observables.
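
The contrast with a lagged-score model can be made concrete with a stylized derivation. The symbols $\lambda$, $\beta$, and $u_{ig}$ below are illustrative assumptions, not notation from the paper or from Rothstein (2010), and controls are left out for brevity.

```latex
% Stylized lagged-score model (illustrative symbols)
A_{ig} = \lambda A_{i,g-1} + \beta d_{ig} + u_{ig}

% Repeated substitution: an input received s grades ago has effect
\text{effect of } d_{i,g-s} \text{ on } A_{ig} = \lambda^{s}\beta,
\qquad s = 0, 1, 2, \ldots
```

In third grade this stylized model would restrict the four same-race effects to $\beta$, $\lambda\beta$, $\lambda^{2}\beta$, and $\lambda^{3}\beta$, whereas equation 4 leaves $\beta_{33}$, $\beta_{32}$, $\beta_{31}$, and $\beta_{3k}$ unrestricted.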

Dynamic Treatment Effects
I now describe the procedure to produce the estimates of the dynamic effects of
own-race teachers on student achievement. In this paper, they are dynamic average
treatment on the treated (DATT) estimates; that is, the net effect of the treatment sequence
compared with some other sequence for those who have experienced that treatment
path.8 Denote $t(a, b)$ to be the treatment sequence of an individual, where $a$ is the
treatment experienced in the first period and $b$ is the treatment received in the second. For
$i \in \{a, b\}$, let $i = 1$ if treatment was received and $i = 0$ otherwise. Then, a person
experiencing treatment in both periods would be denoted as receiving the treatment sequence
$t(1, 1)$, a person treated only in the second period but not in the first is denoted
as experiencing the sequence $t(0, 1)$, and so on. Using this notation, I can define the
dynamic treatment effects of interest.

6. Even if simultaneous estimation of the system were possible, estimates of the fixed effect would still be
inconsistent because the number of observations for each fixed effect is limited to four by construction. In a
first-differenced approach, the treatment effects are consistent and unbiased.
7. It is important to note that although this specification can be thought of as a value-added model, this is only
an observation and not an indication that value-added assumptions in this paper are being used to identify the
causal effects of interest for the sequences of interventions.
8. Ding and Lehrer (2010) use the terminology dynamic treatment effects for treated (DTET) instead of the
DATT used here. Both refer to the same thing; the latter terminology is used because I feel that the
acronym solidly connects it to the frequently used average treatment effect on the treated, which is commonly
shortened as ATT.

For purposes of exposition, I consider the case of two periods. Let $\tau(a, b)(w, x)$ be
the DATT of the treatment sequence $t(a, b)$ with the counterfactual sequence $t(w, x)$;
put simply, it is the net benefit (or cost, if the estimate is negative) of experiencing $t(a, b)$
rather than $t(w, x)$, but the estimate of this effect is only for those who have experienced
$t(a, b)$.9 For example, $\tau(1, 1)(0, 0)$ refers to the DATT of having an own-race teacher
in kindergarten and first grade compared with having a same-race teacher in neither
grade for those who had teachers of the same race in both grades, and $\tau(1, 0)(0, 0)$
describes the effect on achievement of an exposure to a teacher of the same race in
kindergarten compared with never having had a same-race teacher for those who have
only had a same-race teacher in kindergarten. Using the estimated parameters from
equation 5, the DATT for the two examples would be calculated as follows:

\begin{align*}
\tau(1, 1)(0, 0) &= \hat{\beta}_{11} + \hat{\beta}_{1k}, \\
\tau(1, 0)(0, 0) &= \hat{\beta}_{1k}.
\end{align*}

The standard errors of these effects are calculated using the standard formula for sums
of random variables.10 The same logic extends to more than two periods.
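
The DATT arithmetic can be sketched as follows. This is a minimal illustration, assuming the coefficient estimates and their covariance matrix are already available as plain dictionaries; the function name, the keys, and the numbers in the usage note are hypothetical. The standard error uses the usual variance of a sum: every variance and covariance among the selected coefficients is added up before taking the square root.

```python
import math


def datt_with_se(coefs, cov, terms):
    """Sum selected coefficient estimates and return (estimate, standard error).

    coefs: dict mapping a coefficient name to its estimate.
    cov:   nested dict, cov[a][b] is the estimated covariance of a and b
           (cov[a][a] is the variance of a).
    terms: names of the coefficients that make up the treatment path.
    """
    estimate = sum(coefs[name] for name in terms)
    # Var(sum) = sum of all pairwise covariances, which for two terms is
    # var(a) + var(b) + 2 cov(a, b).
    variance = sum(cov[a][b] for a in terms for b in terms)
    return estimate, math.sqrt(variance)
```

With hypothetical inputs `coefs = {"b11": 2.0, "b1k": 1.5}` and variances of 1.0 and 2.0 with covariance 0.5, `datt_with_se(coefs, cov, ("b11", "b1k"))` corresponds to $\tau(1, 1)(0, 0)$, and passing `("b1k",)` alone corresponds to $\tau(1, 0)(0, 0)$.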

3. DATA
Description
The data used in this study come from a cohort of students who participated in Project
STAR, an experiment that took place in Tennessee and ran from 1985 until 1989. The
experiment was legislated into existence and funded by the state government11 at a cost
of approximately $12 million over five years; this figure includes the data analysis and
reporting that took place in the fifth year. The primary goal of the STAR experiment, as
its acronym implies, was to determine the effect of class size on student achievement
in primary education (Finn et al. 2007). Across the state, 79 schools signed up for the
experiment and had to commit to participation for four years. Data were also
gathered from nonparticipating schools to use as a benchmark. To qualify for participation
in Project STAR, schools required enough students to support at least three different
classes per grade. Students and teachers were randomly assigned within schools to one
of three class types: a small class (13 to 17 students), a regular class (22 to 25 students),
or a regular class with a full-time teacher's aide. Regular classes in first through third
grade still had a part-time teacher's aide available to assist the class for approximately
25 percent to 33 percent of the time, on average. It was initially intended that students
stay in their assigned class type from kindergarten through third grade, although
after kindergarten students in regular or regular with aide classes were randomly and
permanently reassigned between these two class types. An examination of 1,581 students
enrolled in kindergarten found that compliance was almost perfect (Krueger 1999). In
first grade and beyond, however, there were some problems with noncompliance, with
a number of students switching in or out of small classes. Noncompliance was
primarily due to parental complaints or discipline problems (Krueger 1999). At the end of each
year, all participating students were given a battery of academic and nonacademic tests.
More detailed overviews of Project STAR can be found in Krueger (1999) and Finn et al.
(2007).

9. This condition, that the estimate refers only to those who experienced the sequence $t(a, b)$, is necessary because
treatment effect on the treated estimates are being obtained. Additional assumptions are required in order to
interpret them simply as dynamic average treatment effects; I do not make these assumptions here.
10. For example, the standard error of $\tau(1, 1)(0, 0) = \hat{\beta}_{11} + \hat{\beta}_{1k}$ is equal to
$\sqrt{\operatorname{var}(\hat{\beta}_{11}) + \operatorname{var}(\hat{\beta}_{1k}) + 2\operatorname{cov}(\hat{\beta}_{11}, \hat{\beta}_{1k})}$.
11. See Word et al. (1990).

Table 1.
Proportion of Students with a Teacher of the Same Race

                  Kindergarten   First Grade   Second Grade   Third Grade
White students    0.9414         0.9588        0.9213         0.9501
Black students    0.4023         0.4454        0.4480         0.5036

Notes: Numbers calculated from sample data. The table shows the proportion of
students of a given race who had a teacher of the same race for the listed grade in
the Project STAR cohort.

In this paper, the measures of student achievement examined are obtained from the
seventh edition Stanford Achievement Test scores in mathematics, reading, and word
recognition. The tests were designed so that the scores were comparable across grades
(Finn et al. 2007); that is, students effectively took the same tests in each subject.12 I
elect to use the natural scaled scores in this analysis in order to avoid potential pitfalls
associated with some transformations of the test score data. Cascio and Staiger (2012)
show that the use of normalized scores mechanically causes the estimated impacts of
interventions to appear to fade over time.13 Percentile scores are typically used when
the scaled test scores across several grades are not directly comparable, which is not
the case here. However, the results of this paper are qualitatively, and in most cases
quantitatively, similar (in terms of precision of the estimates and relative magnitude
of the coefficients) when examined in percentile and normalized form, which should
assuage concerns raised in Bond and Lang (2013) regarding the ordinality of test score
variables and its effects on inference.

I follow the STAR cohort of students who entered the program in 1985, with the exception of
students who joined after kindergarten. This is done to more credibly estimate the full
sequence of dynamic effects (Ding and Lehrer 2010). I only keep students whose race
is either black or white, which results in a loss of 33 students from the sample (under 1
percent). Dropping these students does not affect the results.14

Summary Statistics
The Project STAR cohort of students is highly segregated according to school: only
about one in five of any particular Project STAR grade has a racial balance that lies
between 20 percent and 80 percent of students being of a single race.15 Moreover,
most teachers encountered by cohorts with predominantly white student bodies
are themselves white, whereas teachers who teach Project STAR cohorts with
majority black student bodies have a more even racial distribution. The proportions of
students for each grade who are taught by a teacher of the same race are displayed in
table 1.

12. There is considerable overlap in the test scores across grades. For example, the top kindergarten students
performed similarly to the median third grade students in mathematics.
13. Cascio and Staiger (2012) show that these results stem from the increasing variance in accumulated knowledge
as students move through school. There is no such pattern of increasing variance in the scaled test scores in
the data used here.
14. There are twelve teachers in the sample (less than 1 percent of the teacher pool) who are neither black nor white,
and they all teach third grade. Excluding them from the analysis does not change the results.
15. Unfortunately, information is not available on the racial composition at the level of the entire school.

Table 2.
Transition Tree

Kindergarten     First Grade       Second Grade        Third Grade
t(1) = 4,814     t(1, 1) = 3,155   t(1, 1, 1) = 2,290  t(1, 1, 1, 1) = 1,946
                                                       t(1, 1, 1, 0) = 117
                                                       t(1, 1, 1, ·) = 227
                                   t(1, 1, 0) = 252    t(1, 1, 0, 1) = 176
                                                       t(1, 1, 0, 0) = 44
                                                       t(1, 1, 0, ·) = 32
                                   t(1, 1, ·) = 613
                 t(1, 0) = 358     t(1, 0, 1) = 182    t(1, 0, 1, 1) = 101
                                                       t(1, 0, 1, 0) = 43
                                                       t(1, 0, 1, ·) = 38
                                   t(1, 0, 0) = 64     t(1, 0, 0, 1) = 18
                                                       t(1, 0, 0, 0) = 31
                                                       t(1, 0, 0, ·) = 15
                                   t(1, 0, ·) = 112
                 t(1, ·) = 1,301
t(0) = 1,435     t(0, 1) = 436     t(0, 1, 1) = 164    t(0, 1, 1, 1) = 101
                                                       t(0, 1, 1, 0) = 34
                                                       t(0, 1, 1, ·) = 29
                                   t(0, 1, 0) = 164    t(0, 1, 0, 1) = 76
                                                       t(0, 1, 0, 0) = 44
                                                       t(0, 1, 0, ·) = 44
                                   t(0, 1, ·) = 108
                 t(0, 0) = 502     t(0, 0, 1) = 115    t(0, 0, 1, 1) = 48
                                                       t(0, 0, 1, 0) = 32
                                                       t(0, 0, 1, ·) = 35
                                   t(0, 0, 0) = 233    t(0, 0, 0, 1) = 52
                                                       t(0, 0, 0, 0) = 136
                                                       t(0, 0, 0, ·) = 45
                                   t(0, 0, ·) = 154
                 t(0, ·) = 497

Notes: The number of students that experience each treatment path is given after the equal sign.
A downward move corresponds to d_ig = 0 in the previous period, and an upward move signifies
d_ig = 1 in the previous period. A floating dot symbol · denotes attrition in period g. For example,
108 children had the sequence t(0, 1) then attrited in the second grade, and 43 children have
undergone the treatment sequence t(1, 0, 1, 0).
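
Counts like those in the transition tree can be tabulated from per-student treatment histories. The sketch below is a hypothetical illustration, not the author's procedure: it assumes each history is a tuple of 0/1 values, with `None` marking the grade in which the student attrited, and it labels paths in the same `t(...)` notation used in the table.

```python
from collections import Counter


def path_label(history):
    """Format a history such as (0, 1, None) as 't(0, 1, ·)'."""
    parts = ["·" if d is None else str(d) for d in history]
    return "t(" + ", ".join(parts) + ")"


def transition_counts(histories):
    """Count students at every node of the tree, one node per grade reached.

    A student contributes to each prefix of their history and stops
    contributing after the grade in which attrition (None) is recorded.
    """
    counts = Counter()
    for history in histories:
        for g in range(1, len(history) + 1):
            counts[path_label(history[:g])] += 1
            if history[g - 1] is None:
                break  # attrited students leave the tree here
    return counts
```

Applied to the full sample, a tabulation of this kind should be internally consistent in the way table 2 is: the three children of each node sum to the parent count (for example, 3,155 + 358 + 1,301 = 4,814).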

The transitions that students experience are displayed in table 2. We see that the vast
majority of students have a teacher of the same race throughout the grades, and that
other treatment paths have less support. This means that the standard errors produced
in the estimation process will be conservative in terms of inference by favoring the null
hypothesis, ceteris paribus.

Table 3.
Average Test Scores by Race and Racial Match

                          White Students                 Black Students
                      Teacher is     Different       Teacher is     Different
                      Same Race      Race            Same Race      Race
Kindergarten
  Mathematics         491.79         487.99          481.51         468.49
  Reading             440.77         437.36          432.74         426.49
  Word recognition    439.07         433.77          429.21         422.89
  Observations        3,655          183             725            1,131
First grade
  Mathematics         545.52         525.20          521.39         511.32
  Reading             540.56         518.95          503.81         499.43
  Word recognition    529.34         513.80          502.61         498.70
  Observations        2,446          105             505            608
Second grade
  Mathematics         596.64         593.09          569.64         565.89
  Reading             602.56         595.43          570.85         566.93
  Word recognition    601.72         599.65          576.50         567.31
  Observations        2,277          199             430            519
Third grade
  Mathematics         633.04         620.39          607.63         609.35
  Reading             630.23         624.09          608.54         608.09
  Word recognition    627.80         620.17          602.88         604.01
  Observations        2,096          110             404            362

Notes: Numbers calculated from sample data. The table displays the average scaled Stanford Achievement Test
scores by subject, race, racial match, and grade.

An initial look at the relationship between having an own-race teacher and test score
performance is presented in table 3. The average test score is never higher for white
students with black teachers in any of the twelve grade–subject pairs; for black
students, it is higher in two of the twelve categories if they have a white teacher (in the
third grade). To see whether these findings may hold across the distribution, I perform
Kolmogorov-Smirnov tests on the kindergarten test scores. I find that students in
classes with a teacher of the same race have higher mathematics, reading, and word
recognition test scores compared with those who do not (p = 0.000 for all tests).
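
For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical distribution functions. The sketch below computes only that statistic, using the standard library; it does not compute a p-value, and the function name and inputs are hypothetical rather than taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs of
    the two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )
```

A value near 0 indicates that the two score distributions are nearly identical, and a value near 1 indicates almost no overlap; in the comparison above the two samples would be the kindergarten scores of students with and without a same-race teacher.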

One of the primary purposes of this paper is to determine whether the timing of
the same-race teacher treatment matters. To illustrate the potential importance of this,
in figure 1 I plot the density of the third-grade math and reading test scores for two
different treatment paths: one in which the student experienced two early treatments
(the treatment path t(1, 1, 0, 0)), and the other in which the treatments arrived later (the
treatment path t(0, 0, 1, 1)). In both subjects, students who were exposed to teachers
of the same race earlier have a distribution of test scores that lies to the right of the
distribution that corresponds to those who were exposed later, despite both groups of
students experiencing the same number of same-race treatments.16

This preliminary analysis allows us to come to several substantive conclusions:
There may be reason to believe that own-race teachers increase student achievement
and, should this be true, it may be that this can explain part of the black–white student
test score gap because white students are far more likely to be paired with a teacher of
the same race compared with black students. Moreover, the timing of the treatments
may matter for academic outcomes.

16. The results are also similar for word recognition test scores.

Figure 1. Distribution of Third Grade Test Scores by Treatment Path.

Notes: Kernel density estimates of the probability density functions for two different treatment paths calculated from sample data
using the Epanechnikov kernel with optimal bandwidth. The kernel densities are evaluated at 50 points. See section 2 for notational
details.
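As a rough illustration of the estimator named in the figure notes, the following Python sketch evaluates an Epanechnikov kernel density estimate on a 50-point grid. The scores and the bandwidth are hypothetical placeholders chosen for illustration; they are not Project STAR data, and the bandwidth is hand-picked rather than the optimal bandwidth used for the figure.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(sample, grid, bandwidth):
    """Kernel density estimate of `sample`, evaluated at each point of `grid`."""
    n = len(sample)
    return [
        sum(epanechnikov((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)
        for x in grid
    ]


def linspace(lo, hi, num):
    """Evenly spaced grid of `num` points from lo to hi inclusive."""
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


# Hypothetical scaled scores (assumption: NOT taken from the paper's sample).
scores = [598.0, 612.0, 620.0, 625.0, 631.0, 640.0, 655.0]
h = 15.0  # hand-picked bandwidth (assumption), not an optimal-bandwidth rule

# Evaluate at 50 points, covering the full support of the estimate.
grid = linspace(min(scores) - h, max(scores) + h, 50)
density = kde(scores, grid, h)

# Sanity check: a density should integrate to roughly one (Riemann sum).
area = sum(density) * (grid[1] - grid[0])
print(f"approximate area under the estimate: {area:.3f}")
```

Plotting `density` against `grid` for each treatment path on the same axes would give a picture of the kind described in the figure.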

4. EMPIRICAL ANALYSIS
Results
Table 4 presents the estimates of the coefficients on the d_ig variables obtained by
estimating the system of equations described in section 2.17 I refer to the estimated
coefficients in the table as structural because they (and their covariance matrix) are
required to calculate the DATT estimates.

Taken in isolation, the estimated parameters from the system of equations cor-
respond to dynamic average treatment on the treated estimates for single exposures.
For example, the estimate of the coefficient on d_ik in grade 3 is the estimate of
τ(1, 0, 0, 0)(0, 0, 0, 0), which is the estimated DATT of having an own-race teacher in
kindergarten but never again, relative to never having one, for those students who had
an own-race teacher only in kindergarten. Examining these results we see that, for a
single intervention, early exposure generally benefits children more than late exposure.
There appear to be precisely estimated positive effects up until second grade for the
case of mathematics. The effect of the same-race teacher treatment can be persistent in
the medium run: The benefit from kindergarten for a single exposure is statistically
significant in all grade–subject combinations.

17. Note that these estimates assume the effect of racial interactions is the same across races and school-grade
racial compositions; I examine these considerations in the next section.

Table 4. Structural Coefficient Estimates

                 Mathematics        Reading            Word Recognition
Kindergarten
  βkk            11.40** (2.52)     5.08** (1.63)      5.10** (1.87)
First grade
  β1k            4.39 (2.58)        8.78** (2.57)      9.09** (3.18)
  β11            12.00** (2.68)     3.79 (2.63)        3.22 (3.18)
Second grade
  β2k            6.33** (2.42)      4.94 (2.65)        4.55 (3.57)
  β21            −0.17 (2.95)       −1.39 (3.04)       1.71 (4.03)
  β22            6.24** (2.77)      3.98 (2.74)        1.31 (3.19)
Third grade
  β3k            5.65 (2.99)        11.26** (2.36)     12.76** (3.97)
  β31            1.53 (2.81)        2.05 (2.45)        7.03 (3.91)
  β32            5.14 (2.99)        1.79 (2.74)        −0.19 (3.29)
  β33            −4.53 (2.45)       −2.38 (2.51)       0.81 (3.60)

Notes: The table contains the structural coefficient estimates of an
own-race teacher on the d_ig variables in the system of equations
described in section 3 that are to be used in the calculation of the
DATT; see the text for details. Standard errors clustered at the level of
the classroom are given in parentheses. Scaled test scores are used
as the response variable. Observations are weighted using inverse
probability weights; see section A.3 of the online appendix.
**Statistical significance at the 1% level.

The case of having an own-race teacher in multiple grades is displayed in table 5.
Multiple exposures are shown to be beneficial in many cases. However, although the
number of doses matters, so does their timing: Examining τ(1, 1, 0, 0)(0, 0, 0, 0) and
τ(0, 0, 1, 1)(0, 0, 0, 0) in third grade, we see that the former sequence of treatments
gives far more of a benefit than the latter in all subjects, even though both sequences
give two exposures to a teacher of the same race. The difference between these treat-
ment paths is statistically significant at the 5 percent level for mathematics and at the
1 percent level for reading and word recognition. Differences in timing do not always
result in differences in outcomes: in second grade, there are no statistically signifi-
cant differences in the dynamic average treatment on the treated estimates between
τ(0, 1, 1)(0, 0, 0) and τ(1, 1, 0)(0, 0, 0) in any of the subjects. Another insight is that
additional doses of treatment on a treatment path may not always yield additional tan-
gible benefits. Comparing τ(1, 1, 1, 1)(0, 0, 0, 0) to τ(1, 1, 1, 0)(0, 0, 0, 0), the former
sequence does not appear to be much more beneficial in any subject because the
estimated DATTs are well within each other's confidence intervals (that is, there is no
statistically significant difference at the 5 percent level). Therefore, the benefit of a teacher

Table 5. Dynamic Average Treatment on the Treated Estimates

                          Mathematics        Reading            Word Recognition
Kindergarten
  τ(1)(0)                 11.40** (2.52)     5.08** (1.63)      5.10** (1.87)
  Observations            5,782              5,701              5,762
First grade
  τ(1,1)(0,0)             16.38** (2.74)     12.57** (2.76)     12.31** (3.15)
  τ(1,0)(0,0)             4.39 (2.58)        8.78** (2.57)      9.09** (3.18)
  Observations            3,958              3,865              3,359
Second grade
  τ(1,1,1)(0,0,0)         12.40** (3.20)     7.53* (2.95)       7.57 (4.00)
  τ(1,1,0)(0,0,0)         6.16* (3.09)       3.55 (3.29)        6.26 (4.29)
  τ(0,1,1)(0,0,0)         6.07 (3.64)        2.59 (3.35)        3.02 (4.45)
  Observations            2,336              2,338              2,348
Third grade
  τ(1,1,1,1)(0,0,0,0)     7.80* (3.41)       12.73** (3.03)     20.41** (5.10)
  τ(1,1,1,0)(0,0,0,0)     12.32** (3.45)     15.10** (3.42)     19.60** (5.44)
  τ(1,1,0,0)(0,0,0,0)     7.18* (3.34)       13.31** (2.93)     19.79** (4.53)
  τ(0,0,1,1)(0,0,0,0)     0.61 (3.77)        −0.58 (3.19)       0.62 (4.46)
  Observations            1,840              1,852              1,877

Notes: The table displays the dynamic average treatment on the treated estimates
for exposure to a teacher of the same race for a given treatment path τ(·). Stan-
dard errors clustered at the level of the classroom are given in parentheses. Scaled
test scores are used as the response variable. Regression controls include
class type, free lunch status, teacher years of experience and its square, whether
the teacher has a graduate degree, and whether the teacher is black. Student fixed
effects are included in the specification. Observations are weighted using inverse
probability weights; see section A.3 of the online appendix.
*Statistical significance at the 5% level; **statistical significance at the 1% level.

of the same race for mathematics, reading, and word comprehension in third grade
may potentially be limited.
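The point estimates in table 5 can be reproduced, up to rounding of about 0.01, by summing the table 4 coefficients for the grades in which the treatment path has an own-race teacher. The Python sketch below checks this arithmetic for mathematics. It reflects a reading of the two tables rather than a statement of the estimation procedure itself, and it does not attempt the standard errors, which would require the covariance matrix of the coefficients (not reported in the tables). The function name `datt` and the dictionary layout are illustrative choices.

```python
# Table 4 point estimates for mathematics, keyed by (outcome grade, exposure grade).
# Grade codes: 0 = kindergarten, 1 = first, 2 = second, 3 = third grade.
BETA_MATH = {
    (0, 0): 11.40,
    (1, 0): 4.39, (1, 1): 12.00,
    (2, 0): 6.33, (2, 1): -0.17, (2, 2): 6.24,
    (3, 0): 5.65, (3, 1): 1.53, (3, 2): 5.14, (3, 3): -4.53,
}


def datt(path, beta):
    """Sum the coefficients for the grades in which `path` has a 1.

    The outcome grade is the last position of `path`; the comparison path
    is all zeros (never matched with an own-race teacher).
    """
    outcome_grade = len(path) - 1
    return sum(
        beta[(outcome_grade, grade)]
        for grade, treated in enumerate(path)
        if treated
    )


early = datt((1, 1, 0, 0), BETA_MATH)  # two early exposures
late = datt((0, 0, 1, 1), BETA_MATH)   # two late exposures
print(f"early path: {early:.2f}, late path: {late:.2f}")
```

The two sums come to 7.18 and 0.61, the third-grade mathematics entries for τ(1,1,0,0)(0,0,0,0) and τ(0,0,1,1)(0,0,0,0) in table 5, which makes the timing contrast discussed above easy to trace back to the individual grade coefficients.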

This paper has heretofore demonstrated that there exist statistically significant ben-
efits to academic achievement from sorting students and teachers along the dimension of
race. The question remains as to the policy relevance, which depends on the economic
significance of these gains. Dividing the DATT estimates from table 5 by the
standard deviations of the test scores in the respective grades, I find the gains from
having a same-race teacher to be about the level of the benefit of being assigned to a
small class in Project STAR (see, e.g., Mueller 2013). For example, the benefits of treat-
ment in kindergarten range from 0.14 standard deviations for word recognition test
scores to 0.24 standard deviations for mathematics scores. Continuous treatment in
reading from kindergarten through third grade yields a test score increase of 0.34 stan-
dard deviations. The magnitude of the effects is roughly comparable with those found

in Dee (2004).18 Overall, these represent moderate gains in academic achievement and
are therefore policy relevant.
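The conversion to standard deviation units is a simple ratio, sketched below in Python. Because the grade-level standard deviations are not reported alongside table 5, the sketch runs the calculation in reverse and backs out the standard deviation implied by each quoted effect size. These implied values are approximations only, since the quoted effect sizes are rounded to two decimals; the function names are illustrative.

```python
def effect_size(datt_points, sd_points):
    """Convert a DATT in scaled-score points to standard deviation units."""
    return datt_points / sd_points


def implied_sd(datt_points, reported_effect):
    """Back out the test score standard deviation implied by a reported effect size."""
    return datt_points / reported_effect


# (DATT point estimate from table 5, effect size quoted in the text)
examples = {
    "kindergarten mathematics": (11.40, 0.24),
    "kindergarten word recognition": (5.10, 0.14),
    "third-grade reading, tau(1,1,1,1)": (12.73, 0.34),
}

implied = {name: implied_sd(d, e) for name, (d, e) in examples.items()}
for name, sd in implied.items():
    print(f"{name}: implied SD of about {sd:.1f} scaled-score points")
```

Under these assumptions the implied standard deviations fall between roughly 36 and 48 scaled-score points.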

Robustness
Teacher Sorting
There is a concern that the results of the analysis are driven by selection due to teacher
sorting across schools since teachers were randomized only within schools. This is an
important consideration, as it has been shown that teacher–school matching is a rel-
evant factor in education production (Jackson 2013). If schools whose students were
primarily white attracted high-quality white teachers and poor-quality black teachers,
and predominantly black schools attracted high-quality black teachers and low-quality
white teachers, then the estimates of the racial complementarities would be biased up-
wards. Because approximately 85 percent of the total variation in teacher quality occurs
within schools (Chetty, Friedman, and Rockoﬀ 2014; Rothstein 2014), this pattern of
sorting is a possibility that must be taken seriously. Note that controlling for teacher
observables does not solve the selection problem in this case because the signiﬁcance
of teacher unobservable heterogeneity in the determination of student achievement is
quite high, and is responsible for far more of its variation than observable character-
istics, such as the teacher’s qualiﬁcations or experience (Rivkin, Hanushek, and Kain
2005).

The effects of teacher quality on the robustness of the racial interaction effects can
be assessed by using classroom fixed effects in place of school fixed effects in the re-
gression (Dee 2004). This will result in the racial interaction effects being identified us-
ing within-classroom variation; therefore, any potential teacher sorting across schools
by quality and race will no longer be conflated with the racial interaction effect. This
is because the estimate of the effect is no longer also capturing any potential within-
school quality differences of a low-quality teacher whose students are mostly of the
opposite race with a high-quality teacher whose students are largely of the same race
(which is a potential danger if the racial interaction effects are identified using within-
school variation, i.e., by using a school fixed effect). An additional benefit of including
classroom fixed effects is that it also controls for other unobservable teacher inputs and
classroom effects (such as peer effects). Estimating the system of equations using class-
room fixed effects, I find no substantive differences in the results, which are displayed
in table 6.19
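The logic of the classroom fixed effects check can be illustrated with a toy example. In the Python sketch below, the classrooms, teacher quality values, and the 10-point same-race effect are all invented for illustration; none come from Project STAR. Teacher quality is deliberately made higher in classrooms with more matched students, so a pooled regression overstates the effect, while demeaning within classrooms (the "within" transformation that a classroom fixed effect performs) recovers it. This is a simplified single-equation setting, not the system of equations with student fixed effects estimated in the paper.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ols_slope(x, y):
    """Slope from a one-regressor OLS with an intercept."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - means[g] for v, g in zip(values, groups)]


TRUE_EFFECT = 10.0  # hypothetical same-race effect in score points

# Hypothetical classrooms: (id, teacher quality, same-race flag per student).
classrooms = [
    ("A", 20.0, [1, 1, 1, 0]),  # strong teacher, mostly matched students
    ("B", 0.0, [1, 0, 0, 0]),   # weak teacher, mostly unmatched students
    ("C", 5.0, [1, 1, 0, 0]),
]

room, same_race, score = [], [], []
for cid, quality, flags in classrooms:
    for d in flags:
        room.append(cid)
        same_race.append(float(d))
        score.append(500.0 + quality + TRUE_EFFECT * d)

pooled = ols_slope(same_race, score)
within = ols_slope(demean_by_group(same_race, room), demean_by_group(score, room))
print(f"pooled slope: {pooled:.2f}, within-classroom slope: {within:.2f}")
```

In this construction the pooled slope absorbs part of the teacher quality differences, whereas the within-classroom slope equals the assumed effect. Note that the within estimate relies on classrooms containing both matched and unmatched students.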

Subsample Analysis
There is a moderate level of noncompliance with classroom type assignment in the
Project STAR data. Although noncompliance was estimated to be only about 0.3 per-
cent of the sample in kindergarten (Krueger 1999), a significant number of students
moved between regular, regular with aide, and small classes in first grade and beyond.
In the sample, approximately 5 percent do not comply in first grade, about 13 percent

18. Dee (2004) uses percentile scores in his analysis; an indirect comparison can be made by examining what
the percentile scores correspond to on average in terms of standard deviations.

19. These regressions contain far fewer control variables. In particular, there is no teacher race variable, because
this would be perfectly collinear with the "same race as teacher" variable. For example, if the teacher is black,
all white students would have the same-race dummy equal to zero.

Table 6. Own-race Teacher Effect on Achievement, Using Classroom Fixed Effects

                          Mathematics        Reading            Word Recognition
Kindergarten
  τ(1)(0)                 15.39** (2.63)     6.00** (1.76)      5.28* (2.05)
  Observations            5,782              5,701              5,762
First grade
  τ(1,1)(0,0)             17.12** (2.69)     13.01** (2.86)     13.15** (3.40)
  τ(1,0)(0,0)             5.57* (2.55)       10.94** (2.93)     10.28** (3.79)
  Observations            3,958              3,865              3,359
Second grade
  τ(1,1,1)(0,0,0)         13.76** (3.36)     8.46* (3.40)       9.22* (4.11)
  τ(1,1,0)(0,0,0)         7.89** (3.00)      3.28 (2.95)        4.01 (3.81)
  τ(0,1,1)(0,0,0)         6.68* (3.31)       3.55 (3.27)        5.26 (4.42)
  Observations            2,336              2,338              2,348
Third grade
  τ(1,1,1,1)(0,0,0,0)     13.68** (3.25)     17.85** (3.26)     26.94** (5.13)
  τ(1,1,1,0)(0,0,0,0)     16.90** (3.39)     18.52** (3.77)     25.26** (5.63)
  τ(1,1,0,0)(0,0,0,0)     11.32** (3.18)     14.50** (3.05)     20.01** (4.43)
  τ(0,0,1,1)(0,0,0,0)     2.36 (3.38)        3.35 (3.22)        6.93 (4.30)
  Observations            1,840              1,852              1,877

Notes: The table displays the dynamic average treatment on the treated estimates
for exposure to a teacher of the same race for a given treatment path τ(·). Standard
errors clustered at the level of the classroom are given in parentheses. Scaled test
scores are used as the response variable. Regressions only include free lunch status
as an additional covariate, since the classroom fixed effects absorb all classroom-
invariant control variables. Student fixed effects are included in the specification.
Observations are weighted using inverse probability weights; see section A.3 of the
online appendix.
*Statistical significance at the 5% level; **statistical significance at the 1% level.

do not comply in second grade, and roughly 20 percent do not comply in third grade. If
students nonrandomly switched class types based on the race of the teacher they would
have been assigned, estimates of the teacher effects would be biased and inconsistent.
To examine whether the results are sensitive to nonrandom switchers, I estimate the
system of equations using only those that comply with their treatment assignment, the
results of which are displayed in table 7.20 Despite the loss of a considerable number
of observations, the results are largely similar to those from the full sample.21

20. These estimates may be biased in the presence of nonrandom attrition; they are meant to serve as a sanity
check.

21. To account for the possibility of nonrandom sorting across classrooms (such as noncompliance based on the
student's lack of a racial match with his teacher in the assigned class type), Dee (2004) uses an instrumental
variables strategy using the probability that a student is assigned to a teacher of the same race as the instru-
ment. Compared with the ordinary least squares estimates, the results are almost unchanged, providing strong

Table 7. Own-race Teacher Effect on Achievement, Compliers of Treatment Assignment

                          Mathematics        Reading            Word Recognition
First grade
  τ(1,1)(0,0)             16.88** (3.91)     11.42** (3.99)     7.38 (4.33)
  τ(1,0)(0,0)             6.26* (2.71)       10.43** (3.05)     10.91** (3.59)
  Observations            3,660              3,572              3,121
Second grade
  τ(1,1,1)(0,0,0)         16.02** (3.58)     8.83* (3.69)       7.73 (4.57)
  τ(1,1,0)(0,0,0)         9.10* (3.64)       7.49 (4.10)        8.40 (5.06)
  τ(0,1,1)(0,0,0)         6.29 (4.15)        0.14 (4.31)        −3.98 (5.67)
  Observations            1,996              1,997              2,005
Third grade
  τ(1,1,1,1)(0,0,0,0)     12.26** (3.62)     11.27** (3.56)     15.19** (5.81)
  τ(1,1,1,0)(0,0,0,0)     15.13** (3.79)     15.09** (4.06)     12.56* (5.75)
  τ(1,1,0,0)(0,0,0,0)     10.98** (3.84)     17.36** (3.75)     16.96** (4.92)
  τ(0,0,1,1)(0,0,0,0)     1.28 (4.15)        −6.09 (3.45)       −1.77 (4.78)
  Observations            1,457              1,468              1,487

Notes: The table displays the dynamic average treatment on the treated estimates for expo-
sure to a teacher of the same race for a given treatment path τ(·) using the subpopulation of
those that comply with their assigned class type. Standard errors clustered at the level of the
classroom are given in parentheses. Scaled test scores are used as the response variable.
Regression controls include class type, free lunch status, teacher years of experience
and its square, whether the teacher has a graduate degree, and whether the teacher is black.
Student fixed effects are included in the specification. Kindergarten estimates are not in-
cluded because students are assumed to have complied in this initial grade; they would be
identical to those in table 5. Observations are weighted using inverse probability weights;
see section A.3 of the online appendix.
*Statistical significance at the 5% level; **statistical significance at the 1% level.

Past research has found that the effect of small classes may vary according to the
school characteristics (Ding and Lehrer 2011). Given this, I examine whether there exists
a differential effect of racial matching according to school size. Both small schools (de-
fined as the bottom 50 percent in school enrollment at kindergarten) and large schools
(defined as the top 50 percent) show largely similar qualitative and, in most cases, quan-
titative results. Unfortunately, robustness checks according to a school's racial composi-
tion are not possible because data are only available for the current grade of each school
in Project STAR.

Dee (2004) finds that own-race teacher effects existed in almost all subjects for both
blacks and whites, and that the magnitude of the effects was similar. Here, I investigate
whether there exists a differential effect of an own-race teacher treatment for black
students, who constitute about a third of the sample. I estimate the regressions only

evidence that this type of sorting was absent in the data. An analogous approach is not possible here because
of the estimation strategy utilized.

Table 8. Own-race Teacher Effect on Achievement, Black Students

                          Mathematics        Reading            Word Recognition
Kindergarten
  τ(1)(0)                 5.31 (4.60)        3.36 (3.02)        2.49 (3.26)
  Observations            1,889              1,852              1,889
First grade
  τ(1,1)(0,0)             16.07* (6.55)      12.76* (5.72)      12.50 (7.07)
  τ(1,0)(0,0)             10.87* (4.96)      9.60* (4.57)       7.99 (5.42)
  Observations            1,156              1,150              990
Second grade
  τ(1,1,1)(0,0,0)         5.37 (6.96)        3.91 (6.62)        0.90 (8.22)
  τ(1,1,0)(0,0,0)         0.87 (5.64)        2.87 (5.77)        2.74 (6.70)
  τ(0,1,1)(0,0,0)         1.33 (6.29)        −1.17 (6.60)       −5.09 (8.48)
  Observations            642                641                646
Third grade
  τ(1,1,1,1)(0,0,0,0)     3.69 (7.56)        15.84* (7.26)      14.67 (8.07)
  τ(1,1,1,0)(0,0,0,0)     10.67 (6.60)       17.01** (6.28)     17.07* (7.22)
  τ(1,1,0,0)(0,0,0,0)     0.56 (5.40)        14.52** (5.17)     19.22** (5.72)
  τ(0,0,1,1)(0,0,0,0)     3.13 (6.20)        1.32 (5.09)        −4.56 (5.47)
  Observations            436                441                448

Notes: The table displays the dynamic average treatment on the treated estimates
for exposure to a teacher of the same race for a given treatment path τ(·) using
black students only. Standard errors clustered at the level of the classroom are given
in parentheses. Scaled test scores are used as the response variable. Regression
controls include class type, free lunch status, teacher years of experience and
its square, whether the teacher has a graduate degree, and whether the teacher
is black. Student fixed effects are included in the specification. Observations are
weighted using inverse probability weights; see section A.3 of the online appendix.
*Statistical significance at the 5% level; **statistical significance at the 1% level.

using black students; the results of the estimation are in table 8. It is important to
note that there are significantly fewer observations compared with most of the other
regressions in this paper, which entails a substantial cost in precision. The benefits
from treatment appear to exhibit a qualitatively similar pattern to the results in table 5.
However, the lack of precision means that we cannot reject the hypothesis that the
estimates are equal to zero in many cases, even though they may be numerically
similar to the DATT estimates of the full sample.
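The loss of precision can be made concrete by comparing one estimate across the two samples. The Python sketch below takes the third-grade mathematics estimate for τ(1,1,1,0)(0,0,0,0) from tables 5 and 8 and computes a two-sided p-value for the null of a zero effect. It assumes a standard normal reference distribution, which is a simplification adopted here for illustration; the inference behind the tables, with classroom-clustered standard errors, may rest on a different reference distribution.

```python
import math


def two_sided_p(estimate, std_error):
    """Two-sided p-value for H0: effect = 0 under a normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))


def stars(p):
    """Significance markers in the style of the table notes."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Mathematics, third grade, tau(1,1,1,0)(0,0,0,0): (estimate, standard error)
full_sample = (12.32, 3.45)  # table 5
black_only = (10.67, 6.60)   # table 8

p_full = two_sided_p(*full_sample)
p_black = two_sided_p(*black_only)
print(f"full sample: p = {p_full:.4f} {stars(p_full)}")
print(f"black students only: p = {p_black:.4f} {stars(p_black)}")
```

The point estimates differ by less than two scaled-score points, but the standard error in the black-student subsample is nearly twice as large, so only the full-sample estimate is distinguishable from zero at conventional levels under this approximation.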

Additional robustness checks were also performed to see whether the effects vary
by gender and by socioeconomic status. I repeat the analysis of the main section by
gender and do not find any substantive changes in the results. An analysis of those of
low socioeconomic status (proxied by receiving free or reduced-price lunches in kinder-
garten) also reveals that the numbers are largely unchanged. For reasons of space, these
tables are not included in this paper.

5. POLICY DISCUSSION
Because much of the benefit from an own-race teacher comes from kindergarten and
first grade in most subjects and the benefit appears to persist for at least a few years,
some may argue that it may be justifiable to sort teachers and students according to race
in the first few grades if the goal is to maximize student achievement.22 Such a policy
is especially attractive because effectively costless gains may potentially be obtained by
simply reallocating students and teachers across classrooms. Though academic ben-
efits of sorting students and teachers by race within classrooms are present, they
come with an important caveat: Additional research should first be conducted on the
source of these complementarities to determine why they exist before incorporating
such a policy.23 For example, if they are present because teachers exert more effort toward
students who are of the same race, and such effort comes at the intensive margin, then
teachers are engaging in favoritism towards students of the same race at the expense of
students who do not share their racial background. In short, we do not yet know if the
same-race effects are a "free lunch." Moreover, such racial sorting could have pernicious
effects on student noncognitive skills, such as the ability to socialize and interact with
students of different races or the willingness to respect authority figures of a different
race. General equilibrium issues may also be relevant because of supply constraints.
Should a concerted effort to hire a more representative workforce in order to more ef-
fectively incorporate this policy be successful, it may result in a lower average quality of
teachers from the underrepresented races if we assume that the highest quality teachers
are hired first (and a higher average quality for the majority race teachers). This latter
assumption seems plausible: California's experiment with class size reductions led
to considerable decreases in teacher quality and exacerbated inequalities across school
districts because educational institutions were forced to hire teachers who lacked ex-
perience and credentials in order to implement the policy (Imazeki 2003; Jepsen and
Rivkin 2009). At this time, there appears to be insufficient evidence to support a policy
of sorting students and teachers across classrooms by race.

The positive influence of a teacher of the same race on student achievement may
help explain a small but nontrivial part of the racial test-score gap between black and
white students, because black students in the sample are far less likely to be matched
with an own-race teacher compared with white students. This pattern continues to this
day because of the continuing shortage of minority teachers (Ingersoll 2015). Table 9
displays the data concerning the racial test-score gap in the Project STAR data, where
the figures are in standard deviation units.24 The raw gap for math is slightly over half
the size of that in Fryer and Levitt (2006) (where they find a gap of 0.663 standard de-
viations), and the raw gap in reading is minimally smaller (where it is 0.4 standard
deviations). Including student and teacher covariates does not appreciably change the

22. Racial sorting at the level of the classroom, although not wholesale segregation, may nonetheless be subject to
legal challenges under Title IX, for example.

23. Furthermore, segregation at the school level (rather than at the level of the classroom) is not a policy that is being
recommended here. Historically, such policies have had many pernicious social and economic effects when
implemented.

24. School ﬁxed eﬀects are included in the adjusted gaps to account for the fact that the kindergarten grades in the
sample have a high level of racial segregation. In this analysis school grades whose student bodies are white are
much more likely to have own-race teacher matches, and therefore the contribution to the same-race teacher
gap may be overestimated if this is not controlled for.

Table 9. Estimated Black–White Test Score Gap in Kindergarten

                         Raw Gap    Adjusted    With Same Race    % of Gap Explained
Mathematics              −0.37      −0.36       −0.28             21.72
Reading                  −0.36      −0.25       −0.21             13.55
School fixed effects?    No         Yes         Yes

Notes: This table displays regression results where a normalized test score is the response
variable, and the displayed coefficient is the black student dummy. Numbers are in standard
deviations, save the final column. The adjusted column includes student and teacher covariates,
and the column following adds a same-race teacher dummy.

gap in math but decreases it considerably in reading, although these adjusted gaps
are much larger than in Fryer and Levitt (2006).25 Augmenting the model further with
an own-race teacher variable moderately narrows the racial gap for mathematics, and
provides a drop of roughly half that reduction in reading. Overall, accounting for racial
matches appears to explain a nontrivial portion of the gap in test scores between black
and white students.
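One way to read the final column of table 9 is as the proportional reduction in the adjusted gap once the same-race teacher dummy is added. The source does not state the formula, so this is an assumption. The Python sketch below applies that reading to the reading scores and, because the gaps are displayed to only two decimals, also computes the range of shares consistent with rounding. The function names are illustrative.

```python
def share_explained(adjusted_gap, gap_with_match):
    """Percent of the adjusted gap that disappears after adding the same-race dummy.

    Assumed formula (not stated in the source): 100 * (adjusted - with) / adjusted.
    """
    return 100.0 * (adjusted_gap - gap_with_match) / adjusted_gap


def rounding_bounds(adjusted_gap, gap_with_match, half_width=0.005):
    """Range of shares consistent with both gaps being rounded to two decimals."""
    a, m = abs(adjusted_gap), abs(gap_with_match)
    low = 100.0 * ((a - half_width) - (m + half_width)) / (a - half_width)
    high = 100.0 * ((a + half_width) - (m - half_width)) / (a + half_width)
    return low, high


REPORTED_READING_SHARE = 13.55  # final column of table 9, reading row

reading_point = share_explained(-0.25, -0.21)
reading_low, reading_high = rounding_bounds(-0.25, -0.21)
print(f"share from rounded gaps: {reading_point:.1f}%")
print(f"range allowed by rounding: {reading_low:.1f}% to {reading_high:.1f}%")
```

The rounded gaps give a share of 16 percent, and the reported 13.55 percent lies inside the range that rounding allows, so the assumed formula is at least consistent with the reading row of the table.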

ACKNOWLEDGMENTS
This paper is based on a chapter of my doctoral thesis and I thank my thesis committee for their
comments. I graciously thank my advisors Steve Lehrer and James MacKinnon for their supervi-
sion and comments on this project. I have benefited from discussions with Joseph Altonji, Gigi
Foster, Weili Ding, Jean-Sebastien Fontaine, Vincent Pohl, and Caroline Weber. I would like to
thank seminar participants at Queen's University, the University of Toronto, and SOLE 2014 for
their feedback.

REFERENCES
Bond, Timothy N., and Kevin Lang. 2013. The evolution of the black-white test score gap in grades
K–3: The fragility of results. Review of Economics and Statistics 95(5):1468–1479.

Cascio, Elizabeth U., and Douglas O. Staiger. 2012. Knowledge, tests, and fadeout in educational
interventions. NBER Working Paper No. 18038.

Chetty, Raj, John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzen-
bach, and Danny Yagan. 2011. How does your kindergarten classroom affect your earnings? Evi-
dence from Project STAR. Quarterly Journal of Economics 126(4):1593–1660.

Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014. Measuring the impacts of teach-
ers I: Evaluating bias in teacher value-added estimates. American Economic Review 104(9):2593–
2632.

Clotfelter, Charles T., Helen F. Ladd, and Jacob Vigdor. 2010. Teacher credentials and student
achievement in high school: A cross-subject analysis with student fixed effects. Journal of Human
Resources 45(3):655–681.

Dee, Thomas S. 2004. Teachers, race, and student achievement in a randomized experiment.
Review of Economics and Statistics 86(1):195–210.

25. Caution should be taken in comparing the adjusted gaps, however. A direct comparison between these gaps
and the gaps of Fryer and Levitt (2004, 2006) is not possible because the latter use a different set of covariates.

Ding, Weili, and Steven F. Lehrer. 2010. Estimating treatment eﬀects from contaminated multi-
period education experiments: The dynamic impacts of class size reductions. Review of Economics
and Statistics 92(1):31–42.

Ding, Weili, and Steven F. Lehrer. 2011. Experimental estimates of the impacts of class size on
test scores: Robustness and heterogeneity. Education Economics 19(3):229–252.

Ding, Weili, and Steven F. Lehrer. 2014. Understanding the role of time-varying unobserved
ability heterogeneity in education production. Economics of Education Review 40:55–75.

Egalite, Anna J., Brian Kisida, and Marcus A. Winters. 2015. Representation in the classroom: The
effect of own-race teachers on student achievement. Economics of Education Review 45:44–52.

Finn, Jeremy D., Jayne Boyd-Zaharias, Reva M. Fish, and Susan B. Gerber. 2007. Project STAR
and beyond: Database user's guide. Lebanon, TN: HEROS, Incorporated.

Fryer, Roland G., and Steven D. Levitt. 2004. Understanding the black–white test score gap in
the first two years of school. Review of Economics and Statistics 86(2):447–464.

Fryer, Roland G., and Steven D. Levitt. 2006. The black–white test score gap through third grade.
American Law and Economics Review 8(2):249–281.

Hanushek, Eric A. 1999. Some findings from an independent investigation of the Tennessee
STAR Experiment and from other investigations of class size effects. Educational Evaluation and
Policy Analysis 21(2):143–163.

Hanushek, Eric A., John F. Kain, Daniel M. O'Brien, and Steven G. Rivkin. 2005. The market
for teacher quality. NBER Working Paper No. 11154.

Imazeki, Jennifer. 2003. Class-size reduction and teacher quality: Evidence from California. In
School finance and teacher quality: Exploring the connections, edited by David Monk and Margaret
Plecki, 159–178. Abingdon, UK: Routledge.

Ingersoll, Richard. 2015. What do the national data tell us about minority teacher shortages?
In The state of teacher diversity in American education, 14–22. Washington, DC: Albert Shanker
Institute.

Ingersoll, Richard, and Henry May. 2011. Recruitment, retention and the minority teacher shortage.
Philadelphia, PA: Consortium for Policy Research in Education.

Jackson, C. Kirabo. 2013. Matching quality, worker productivity, and worker mobility: Direct evi-
dence from teachers. Review of Economics and Statistics 95(4):1096–1116.

Jepsen, Christopher, and Steven Rivkin. 2009. Class size reduction and student achievement:
The potential tradeoﬀ between teacher quality and class size. Journal of Human Resources
44(1):223–250.

Krueger, Alan B. 1999. Experimental estimates of education production functions. Quarterly Jour-
nal of Economics 114(2):497–532.

Lechner, Michael, and Ruth Miquel. 2010. Identification of the effects of dynamic treatments by
sequential conditional independence assumptions. Empirical Economics 39(1):111–137.

Miquel, Ruth. 2003. Identification of effects of dynamic treatments with a difference-in-
differences approach. Unpublished paper, University of St. Gallen, Switzerland.

Mueller, Steﬀen. 2013. Teacher experience and the class size eﬀect—Experimental evidence. Jour-
nal of Public Economics 98: 44–52.

Ouazad, Amine. 2014. Assessed by a teacher like me: Race and teacher assessments. Education
Finance and Policy 9(3):334–372.

Rivkin, Steven G., Eric A. Hanushek, and John F. Kain. 2005. Teachers, schools, and academic
achievement. Econometrica 73(2):417–458.

Rothstein, Jesse. 2010. Teacher quality in educational production: Tracking, decay, and student
achievement. Quarterly Journal of Economics 125(1):175–214.

Rothstein, Jesse. 2014. Revisiting the impacts of teachers. Unpublished paper, University of Cal-
ifornia, Berkeley.

Word, Elizabeth, John Johnston, Helen Pate Bain, B. DeWayne Fulton, Jayne Boyd Zaharias,
Charles M. Achilles, Martha Nannette Lintz, John Folger, and Carolyn Breda. 1990. The state of Ten-
nessee's Student/Teacher Achievement Ratio (STAR) Project. Technical report 1985–1990. Available
www.classsizematters.org/wp-content/uploads/2016/09/STAR-Technical-Report-Part-I.pdf. Ac-
cessed 18 January 2017.