Amine Ouazad

INSEAD

Economics Department

77300 Fontainebleau

France

amine.ouazad@insead.edu

ASSESSED BY A TEACHER LIKE

ME: RACE AND TEACHER

ASSESSMENTS

Abstract
Do teachers assess same-race students more favorably? This paper uses nationally
representative data on teacher assessments of student ability that can be compared
with test scores to determine whether teachers give better assessments to same-race
students. The data set follows students from kindergarten to grade 5, a period during
which racial gaps in test scores increase rapidly. Teacher assessments comprise up to
twenty items measuring specific skills. Using a unique within-student and
within-teacher identification and while controlling for subject-specific test scores,
I find that teachers do assess same-race students more favorably. Effects appear in
kindergarten and persist thereafter. Robustness checks suggest that: student behavior
does not explain this effect; same-race effects are evident in teacher assessments of
most of the skills; grading “on the curve” should be associated with lower
assessments; and measurement error in assessments or test scores does not
significantly affect the estimates.


doi:10.1162/EDFP_a_00136
© 2014 Association for Education Finance and Policy


1. INTRODUCTION
A growing body of research in education and psychology argues that minority
students receive less favorable feedback and less praise than do their white
peers (Meier, Stewart, and England 1989; Marcus, Gross, and Seefeldt 1991;
Casteel 1998; Van Ewijk 2011). The research is usually conducted on small
samples, which may cast doubt on the wider applicability of results obtained
for particular schools or school districts (i.e., on whether results are externally
valid; Carpenter, Harrison, and List 2005). In this paper I use a longitudinal
and nationally representative data set to measure whether or not teachers
assess same-race students more favorably. Field experiments with nationally
representative European data sets have recently measured whether teachers
assess minority students more favorably (Hinnerich, Höglin, and Johannesson
2011). In the United States, however, there are no nationally representative
data on teachers’ perceptions of same-race students’ skills. Analysis of the
National Educational Longitudinal Study of 1988 suggests that teachers have
more favorable perceptions of same-race students (Dee 2005), but in that study
the variables used to capture those perceptions (e.g., “constantly inattentive,”
“frequently disruptive,” “rarely completes homework”) are measures more of
student behavior than of student performance. Hence these data cannot be
used to infer a same-race effect because such teacher perceptions are not
comparable to test scores.

There is another reason why it is so difﬁcult to measure whether teachers
assess same-race students more highly. Even if the researcher has comparable
teacher assessments of students and test scores, a ﬁnding that teachers give
better assessments to same-race students (conditional on test scores) cannot
be given a causal interpretation owing to possible confounding factors.
Causal effects can be estimated if the researcher randomizes the assignment
of teachers to students, but such randomization is a long and costly process
that is usually performed only for small, nonrepresentative samples.

These considerations leave the researcher in a quandary. On the one hand,
randomized samples with comparable teacher assessments and test scores
provide convincing evidence that teachers have more favorable perceptions
of same-race students’ skills, but randomized estimates are typically available
only for nonrepresentative samples of students. On the other hand, nationally
representative samples usually lack two important features: teacher assessments
of student performance that are comparable to test scores, and
randomized assignment of teachers to students.1

1. Lavy (2004) uses a nationally representative sample to estimate the impact of student gender on
grades at the high-school matriculation exam in Israel, but teacher assignments are not randomized.
Adding unique teacher identiﬁers to Lavy (2004) would also allow an identiﬁcation strategy based
on comparisons of teacher assessments and test scores while controlling for teacher effects.


This paper uses a longitudinal, nationally representative data set, the Early
Childhood Longitudinal Study, Kindergarten Class of 1998–1999, which includes
detailed teacher assessments and test scores (in both mathematics and English)
in each wave of data collection from kindergarten to grade 5.
The teacher assessments are available for both subjects, and there are as
many as ten questions on specific skills within each subject in each follow-up
(Tourangeau et al. 2009). Given these data, continuous teacher assessments
can be compared with test scores.2 Teachers are not randomly assigned to
students. Because the data set follows students through five follow-ups (from
kindergarten to grade 5) and includes teacher and student identifiers, however,
I am able to estimate the same-race effect on teacher assessments by
using a unique within-student (i.e., across grades3) and within-teacher
identification strategy that controls for student- and teacher-specific confounding
factors. The paper also describes several robustness checks, which indicate
that: (1) behavior does not explain the reported estimate of the same-race effect
on teacher assessments; (2) the same-race effect appears in kindergarten for
most skills that are assessed by the teacher; (3) grading on the curve within a
classroom would result in lower teacher assessments for same-race students;
and (4) measurement error in teacher assessments or in test scores has no
significant effect on the point estimates.

The within-student identiﬁcation strategy yields the following result: A
student who moves from a same-race teacher in one grade to a different-race
teacher in the next grade encounters a signiﬁcant drop in teacher assessments.4
The second, within-teacher identification strategy compares the teacher
assessment of same-race students to the average teacher assessment in the
student’s classroom. I combine the within-student and within-teacher
identification strategies and condition the results on student test scores: being
assessed by a same-race teacher increases teacher assessments of student
performance by 4 percent of a standard deviation in English and by 7 percent of
a standard deviation in mathematics.

I design robustness checks to assess whether these results are consistent
with a teacher bias in favor of same-race students. One might object that
higher teacher assessments for same-race students reflect behavioral differences.
After all, teacher assessments of student performance do reflect, in part,

2. Tourangeau et al. (2009) mention that teacher assessments and test scores measure students’ skills
within the same broad curricular domains. Section 4 examines teachers’ perceptions of students
skill by skill—and as early as in kindergarten—for skills that are the most likely to be assessed by
test scores; the results are similar (if not stronger) same-race effects.

3. Also, the survey is designed in a way that facilitates test score comparisons across grades. The tests
consist of two stages: an initial routing test for student ability, and second-stage tests that include
questions common to multiple grades (Tourangeau et al. 2009).

4. All of these results are conditional on student test scores.


student behavior (Sherman and Cormier 1974). The within-student identification
strategy used here neutralizes the effect of permanent student behavioral
differences, but it cannot control for changes in student behavior that could
affect teacher assessments. Because the allocation of teachers to students is
not random,5 behavioral changes that raise teacher assessments may correlate
with being assigned to a same-race teacher in the subsequent grade. The data
set includes four reliable measures of student behavior that are based on the
Social Skills Rating System (Gresham and Elliott 1990). These measures vary
both across students and across grades. I do not find that behavioral differences
between same-race and other-race students explain the within-student
and within-teacher estimates of same-race effects on teacher assessments. Neither
do I find that changes in behavior from one grade to the next are associated
with the student moving from a same-race (other-race) teacher to an other-race
(same-race) teacher.

A second possible objection is that, as measures of student performance,
test scores are noisy and therefore may not fully condition for student performance
when assessing same-race effects on teacher assessments. In that
case, teacher assessments could be higher for same-race students simply because
same-race students perform better. Test scores and teacher assessments
are highly reliable, but the question is whether a small amount of measurement
error would be sufficient to confound the estimate of a same-race effect.
This paper calculates the impact of a given amount of measurement error in
test scores on the derived estimate of the same-race effect. A test score measurement
error of 50 percent would be required to account for the estimated
same-race effect.

The third major objection to this paper’s findings is that teacher assessments
may be an implicit ranking of students within a given classroom rather
than measures (e.g., test scores) based on a common scale. I have used a
simple statistical framework to show that, because minority students have (on
average) lower test scores than white students and because minority and white
students tend to be in different classrooms, grading on a curve would lead
to higher teacher assessments for minority students, even though minority
students have significantly (up to 40 percent of a standard deviation) lower
teacher assessments. Grading on a curve also would affect estimates of the
same-race effect if peer group composition were correlated with assignment
to a same-race teacher. Controlling for peers’ average test score in the main
speciﬁcation does not affect my estimate of the same-race effect on teacher

5. For some evidence of nonrandom allocation of teachers to students, see Clotfelter, Ladd, and Vigdor (2005).


assessment. Moreover, assignment to a same-race teacher is not significantly
correlated with peers’ average test score.

My main ﬁnding—that students are assessed more highly by teachers of
their own race—is robust to the three objections just detailed. That ﬁnding is
of particular relevance if teacher assessments are shown to have an effect on
student achievement. Identifying the impact of teacher perceptions of student
skills on later test scores is difﬁcult, and it has led to a large and somewhat
controversial literature in psychology and education (Rosenthal and Jacobson
1968; Jussim 1989; Jussim and Harber 2005). In the so-called Pygmalion
experiments, a random subset of students in a small sample of participating
schools is typically labeled “bloomers,” and the research focus is on estimating
the effect of such information on student performance. In this paper’s
nationally representative data set, I find that previous assessments have a
significant impact on later test scores (after conditioning for student effects,
teacher effects, and grade effects).6 In fact, previous teacher assessments are
more strongly correlated with later test scores than are previous test scores.

The paper contributes to two separate literatures. First, it belongs to the
growing literature that documents same-race effects in a number of other
contexts. Price and Wolfers (2010) provide statistical evidence that National
Basketball Association referees favor players of their own race. In firms,
Giuliano, Levine, and Leonard (2009) found that white, Hispanic, and Asian
managers hire more whites and fewer blacks than do black managers. In the
data set of Giuliano, Levine, and Leonard (2011), employees have better
outcomes when they are the same race as their manager. The main contribution
of this paper to that literature is providing evidence of same-race effects on
perceptions in education while using a nationally representative data set and
novel robustness checks.

In studying teacher perceptions of student skills from kindergarten to
grade 5, this paper also adds to the literature on teachers’ perceptions of
minority students during their early years of schooling. The previous literature
on race and student assessment has used data for no earlier than grade 8 (Dee
2004). Racial test score gaps expand rapidly much sooner, however; Fryer and
Levitt (2004) document that, between the start of kindergarten and the end
of ﬁrst grade, black students’ scores fall by 20 percent of a standard deviation
relative to white students with otherwise similar characteristics.

The conclusions reported in this paper should be of particular interest to
policy makers. First, teachers as a group are less diverse than the U.S. student
population. There is, in particular, a persistent gap between the percentage of

6. I also instrument the previous test score by lagged test scores to avoid biases stemming from
regression to the mean (see, e.g., Arellano and Bond 1991).


minority teachers and the percentage of minority students. Numerous papers
and reports have suggested improvements in the recruitment and retention
of minority teachers (Kirby, Berends, and Naftel 1999; Achinstein et al. 2010;
Ingersoll and May 2011). Second, the paper’s results suggest that involving
teachers in student assessments7 may affect those assessments in ways that
reflect racial perceptions. To ensure fairness, therefore, an assessment system
that involves teachers should exhibit an appropriate racial balance among
graders. Note also that an interesting area of research suggests that racial
perceptions are not ﬁxed and can be signiﬁcantly altered.8

The paper is structured as follows. Section 2 presents the data set and
descriptive evidence for higher teacher assessments of same-race students
(conditional on test scores). Section 3 presents the within-student and
within-teacher identification strategies separately before combining them to obtain
the paper’s baseline estimate. Section 4 discusses the three major objections as
well as two policy implications of our results on teacher assessments. Section 5
concludes.

2. DATA SET AND DESCRIPTIVE EVIDENCE
Structure of the Data Set

The data set is the Early Childhood Longitudinal Study, Kindergarten cohort
of 1998 (ECLS-K) from the National Center for Education Statistics, U.S.
Department of Education. The data follow a nationally representative sample
of 20,000 kindergarten students in fall and spring kindergarten 1998,
spring grade 1, spring grade 3, and spring grade 5. About a thousand schools
participated.

Overall, the design of the experiment is such that observations are mostly
missing at random. Follow-ups have combined procedures to reduce costs
and maintain the sample’s representativeness. Students who move to another
school are randomly subsampled to reduce costs, and new schools and
children have been added to the data set to strengthen the survey’s
representativeness. In the spring of 1999, some of the schools that had previously
declined participation were included. The new participating children
rendered the cross-sectional sample representative of first-grade children, all
of whom were followed in the spring of grades 3 and 5. This paper uses weights
7. For example, Darling-Hammond and Pecheone (2010) argue that teachers should be integrally
involved in the scoring of assessments.

8. Stangor, Sechrist, and Jost (2001) show how informing participants that others hold different beliefs
about African Americans changes their beliefs about that group. Lyons and Kashima (2003)
suggest that interpersonal communication figures strongly in maintaining stereotypes. An interesting
avenue for future research involves examining how colleagues’ perceptions may affect a
teacher’s perceptions—using data as in Jackson and Bruegmann (2009) but instead with teachers’
perceptions of student performance.


Table 1. Descriptive Statistics

Observations per Student

Observations per Teacher

Test Score

English
Mathematics

Teacher Assessment

English
Mathematics

Teacher Racea

White, non-Hispanic
Black, non-Hispanic
Asian, non-Hispanic
Hispanic, any race
Other race, non-Hispanic

Student Racea

White, non-Hispanic
Black, non-Hispanic
Asian, non-Hispanic
Hispanic, any race
Other race, non-Hispanic

Same-race Teacher by Student Raceb

White, non-Hispanic
Black, non-Hispanic
Asian, non-Hispanic
Hispanic, any race
Other race, non-Hispanic

Mean

6.991

8.198

50.00
50.00

0.809
0.063
0.019
0.052
0.057

0.587
0.137
0.057
0.157
0.062

0.436
0.683
0.188
0.069
0.163
0.056

SD

(2.020)

(5.914)

(10.00)
(10.00)

(0.393)
(0.244)
(0.135)
(0.221)
(0.232)

(0.492)
(0.344)
(0.232)
(0.364)
(0.241)

(0.496)
(0.465)
(0.391)
(0.253)
(0.369)
(0.230)

Observations

115,950

67,885
48,065

115,950
115,950
115,950
115,950
115,950

115,950

115,950
115,950
115,950
115,950
115,950

aOther race, non-Hispanic includes Pacific Islanders, American Indians, and non-Hispanic students
reporting multiple races.
bBoth of the same race, non-Hispanic, or Hispanic, any race.

provided by the survey’s designers to estimate representative effects, although
the analysis is robust to changes in weights.

Observations that lacked data on basic variables (test scores, subjective
assessments, teachers’ and children’s race and gender) were deleted.9 The
analysis in this paper is based on 48,065 observations in mathematics and
67,885 in English, numbers that are similar to Fryer and Levitt (2006).

The restricted-use version of the data set includes both student and teacher
identifiers. Therefore, students can be followed across grades. Within each follow-up,
observations can be grouped by classroom using the teacher identifiers.
Table 1 shows that the data set includes about 6.9 observations per student (3.45

9. Results are robust to an alternative speciﬁcation where missing observations are present with a

dummy variable indicating that the data are missing.


on average per student in each subject); the data set includes 8.2 observations
per teacher.

Test Scores and Teacher Assessments

Test scores are based on answers to multiple-choice questionnaires conducted
by external assessors. They conform to national and state standards.10 Overall,
tests ask more than seventy questions in English, and more than sixty questions
in mathematics. Skills covered by the English assessments from kindergarten
to ﬁfth grade include: print familiarity, letter recognition, and beginning and
ending sounds; recognition of common words (sight vocabulary) and decoding
multisyllabic words; vocabulary knowledge, such as receptive vocabulary
and vocabulary in context; and passage comprehension. Skills covered by the
mathematics assessment include: number sense, properties, and operations;
measurement; geometry and spatial sense; data analysis, statistics, and probability;
and patterns, algebra, and functions. Test scores were standardized to a
mean of 50 and a standard deviation of 10 (Table 1). Reliability measures based
on repeated estimates of test scores indicate that the tests are highly reliable;
Rasch coefficients range between 0.88 and 0.95, inclusive.
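As a rough illustration of the rescaling described above (mean 50, standard deviation 10), the sketch below standardizes a list of made-up raw scores. The function name `standardize`, the sample values, and the use of the unweighted population standard deviation are assumptions made for illustration only; the actual ECLS-K scaling procedure is not described in this section.

```python
from statistics import mean, pstdev


def standardize(scores, target_mean=50.0, target_sd=10.0):
    """Linearly rescale scores to the target mean and standard deviation."""
    m = mean(scores)
    s = pstdev(scores)  # assumption: unweighted population SD
    if s == 0:
        raise ValueError("scores have zero variance")
    return [target_mean + target_sd * (x - m) / s for x in scores]


# Hypothetical raw scores, not taken from the ECLS-K.
raw = [12.0, 15.0, 9.0, 20.0, 14.0]
scaled = standardize(raw)
print([round(v, 2) for v in scaled])
```

On this scale, a coefficient of 1 point corresponds to 10 percent of a standard deviation, which is how the regression estimates later in the paper can be read.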

Teacher assessments of student skills11 are collected at approximately the
same time as the tests are taken. Up to the spring of grade 3, the same teacher
in English and in mathematics assesses students. A different teacher assesses
students in each grade. Teachers do not see the test results, so that test score
results do not directly affect teacher assessments. The user guide speciﬁes that
“This is not a test and should not be administered directly to the child” (see,
e.g., the Spring 2004 Fifth Grade questionnaire12). Teachers complete
one questionnaire per student. There are three different teacher assessments:
for language and literacy, mathematical thinking, and general knowledge.
The current paper uses the English (language and literacy) and mathematics
(mathematical thinking) assessments, as there is no corresponding test score
for general knowledge. The instructions make it clear that these assessments
should not be administered as a test directly to the student. For English and
for mathematics, teachers answer seven to nine questions, for a total number
of fourteen to eighteen questions. Answers are on a 5-point scale: Not Yet,

10. These include the National Assessment for Educational Progress, the National Council of Teachers
of Mathematics, the American Association for the Advancement of Science, and the National
Academy of Sciences.
11. In the ECLS-K user guide, teacher assessments are also known as the academic rating scale.
12. Page 3 of the 2004 Grade 5 mathematics form: “Please rate this child’s skills, knowledge, and
behaviors in mathematics based on your experience with the child identiﬁed on the cover of this
questionnaire. This is NOT a test and should not be administered directly to the child. Each question
includes examples that are meant to help you think of the range of situations in which the child
may demonstrate similar skills and behaviors.”


Beginning, In Progress, Intermediate, and Proﬁcient. An overall assessment
is computed for English and for mathematics. Teacher assessments, like test
scores, were standardized to a mean of 50 and a standard deviation of 10
(Table 1). Reliability measures suggest that teacher assessments are highly
reliable; Rasch coefficients range between 0.87 and 0.94.

Descriptive Evidence of Same-Race Effects on Teacher Assessments

The restricted-use version of the ECLS-K reports teachers’ and students’ race
and gender. The survey combines race and ethnicity for teachers. “Hispanic,
any race” is one category, and others are “White, any race,” “Black, any race,”
and so on. The survey does distinguish race and ethnicity for students, however.
The two variables for students’ race and ethnicity were hence combined
to match the single teacher’s race and ethnicity variable. Hence “same
race” should be read as “same race (non-Hispanic) or both Hispanic (any
race).”13
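The coding rule just described can be sketched as follows. This is a minimal illustration, not the ECLS-K variable coding: the function names, the boolean `hispanic` flag, and the category strings (borrowed from the row labels of Table 1) are all assumptions.

```python
def combined_category(race, hispanic):
    """Collapse a student's separate race and ethnicity fields into one category.

    Assumption: Hispanic ethnicity takes precedence over race, mirroring the
    single combined variable that is available for teachers.
    """
    return "Hispanic, any race" if hispanic else f"{race}, non-Hispanic"


def same_race(student_race, student_hispanic, teacher_category):
    """Return True when student and teacher fall in the same combined category."""
    return combined_category(student_race, student_hispanic) == teacher_category


# Hypothetical records.
print(same_race("Black", False, "Black, non-Hispanic"))
print(same_race("White", True, "Hispanic, any race"))
print(same_race("White", True, "White, non-Hispanic"))
```

Under this rule a Hispanic student who reports a white race is matched to Hispanic teachers, not to white non-Hispanic teachers, which is the reading of “same race” stated in the text.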

The data set oversamples students from racial and ethnic minorities to
increase the precision of the estimates. In the data set, 14 percent are black
students, 16 percent are Hispanic students, and 6 percent are Asian students.
There are signiﬁcantly more white teachers than white students as a fraction of
the observations, and signiﬁcantly fewer black, Hispanic, and Asian teachers
compared with the corresponding fractions of black, Hispanic, and Asian
students. Hence a white student is significantly more likely to be assessed by
a same-race teacher than a black, Hispanic, or Asian student.

Figure 1 presents the average teacher assessments at each test score level,
for students assessed by a same-race teacher and for students assessed by a
teacher of another race. Each line is a local polynomial regression of teacher
assessments on test scores;14 the solid line (the dashed line) is estimated on
observations for students assessed by a same-race teacher (a teacher of another
race). The two graphs suggest that, at most test score levels, students have on
average higher teacher assessments when assessed by a same-race teacher.
The gap appears larger for Hispanic students (bottom graph) than for black
students (top graph).
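For readers unfamiliar with this kind of smoother, the sketch below shows local mean smoothing with an Epanechnikov kernel, the method named in the figure notes. It is an illustration under stated assumptions: the data points and the half-width of 10 are made up, the function names are hypothetical, and no attempt is made to choose an optimal half-width or to use survey weights.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_mean(x, y, grid, half_width):
    """Kernel-weighted mean of y at each grid point (Nadaraya-Watson estimator)."""
    fitted = []
    for g in grid:
        weights = [epanechnikov((xi - g) / half_width) for xi in x]
        total = sum(weights)
        if total > 0:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
        else:
            fitted.append(None)  # no observations within the half-width
    return fitted


# Hypothetical test scores (x) and teacher assessments (y).
x = [40.0, 45.0, 50.0, 55.0, 60.0]
y = [42.0, 46.0, 50.0, 54.0, 58.0]
fitted = local_mean(x, y, grid=[50.0, 100.0], half_width=10.0)
print(fitted)
```

Running the smoother separately on same-race and other-race observations and comparing the two fitted curves at each grid point gives the kind of gap that the figure displays.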

13. Also the student’s race variable follows the 1997 U.S. Revisions to the Standards for the Classification
of Federal Data on Race and Ethnicity published by the Office for Management and Budget,
which allow for the possibility of specifying “more than one race.” However, the share of multira-
cial students is small. Multiracial students are classiﬁed as “Other race,” but results are robust to
alternative classiﬁcations.

14. Figure generated with local mean smoothing with 500 points, Epanechnikov kernel, and optimal
half-width. The gap is robust to a variety of number of points, kernels, and half-width sizes.


Figure 1. Descriptive Evidence of the Same-Race Effect in (A) Black Students and (B) Hispanic
Students. Notes: Each panel plots a local polynomial regression of teacher assessments on test
scores, using an Epanechnikov kernel, 500 points, and optimal half-width. The gap between the two
curves is present even when changing the type of the kernel, number of points, and the half-width.

An ordinary least squares (OLS) regression estimates the average effect of
same race teachers on the difference between teacher assessments and test
scores, and provides conﬁdence intervals:

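\[
\begin{aligned}
TA_{i,f,t} - TS_{i,f,t} = {} & \text{constant} + \delta \cdot \mathit{SameRace}_{i,f,t} + \mathit{StudentCharacteristics}_{i}\,\beta \\
& + \mathit{TeacherCharacteristics}_{i,f,t}\,\gamma + \mathit{Grade}_{t} + \varepsilon_{i,f,t}
\end{aligned}
\tag{1}
\]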


where $i$ indexes students, $f$ the subject area (mathematics or English), and $t$
the wave of the longitudinal data ($t \in \{$fall kindergarten, spring kindergarten,
spring grade 1, spring grade 3, spring grade 5$\}$). $TA_{i,f,t}$ is the standardized
teacher assessment, and $TS_{i,f,t}$ is the standardized test score. $\mathit{SameRace}_{i,f,t}$
is a dummy set to 1 if student $i$ in subject $f$ in wave $t$ was assessed by a same-race
teacher. $\mathit{StudentCharacteristics}_{i}$ is a vector of dummies for the student’s
gender and race. $\mathit{TeacherCharacteristics}_{i,f,t}$ is a vector of dummies for student
$i$’s teacher in subject $f$ in wave $t$. $\mathit{Grade}_{t}$ is a grade effect, and
$\varepsilon_{i,f,t}$ is the residual, clustered by student.15

The regression is performed separately for English and for mathematics.
Throughout the paper, I also present the regression with the teacher assessment
as the dependent variable, and the test score as a control. While the
regression with the test score as an explanatory variable corresponds to the
concept of conditional bias (Ferguson 2003), putting the test score on the right-hand
side means that the estimate of the coefficient of the same-race dummy
may capture measurement error in test scores. Speciﬁcation 1 has both teacher
assessment and test score on the left-hand side, which substantially alleviates
any bias caused by measurement error.
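The measurement-error argument can be illustrated with a small simulation. The sketch below is not the paper's data or estimator: the data-generating process, every parameter value, and the function names are assumptions chosen for illustration. In particular, it assumes that teacher assessments move one-for-one with true ability, that same-race students have higher average ability, and that the test score equals ability plus independent noise.

```python
import random


def slope(y, x):
    """OLS slope of y on x, with an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(y, x1, x2):
    """OLS of y on an intercept, x1 and x2; returns the two slopes."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


TRUE_DELTA = 0.5           # assumed same-race effect, in score points
rng = random.Random(0)     # fixed seed so the sketch is reproducible
same, ta, ts = [], [], []
for _ in range(40000):
    s = 1.0 if rng.random() < 0.5 else 0.0
    ability = 50.0 + 5.0 * s + rng.gauss(0.0, 10.0)  # assumed ability gap of 5
    ts.append(ability + rng.gauss(0.0, 5.0))         # noisy test score
    ta.append(ability + TRUE_DELTA * s + rng.gauss(0.0, 5.0))
    same.append(s)

# Test score as a control on the right-hand side.
delta_rhs, alpha_hat = ols_two(ta, same, ts)
# Test score moved to the left-hand side, as in specification 1.
delta_lhs = slope([a - b for a, b in zip(ta, ts)], same)
print(round(delta_rhs, 2), round(alpha_hat, 2), round(delta_lhs, 2))
```

In this setup the noise attenuates the test score coefficient, so the same-race dummy absorbs part of the ability gap when the test score is a control; differencing moves the noise into the residual, where it is uncorrelated with the dummy. The result depends on the assumed one-for-one relationship between ability and assessments.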

The OLS regression suggests that a student assessed by a same-race teacher
gets a teacher assessment that is about 2.8 percent to 5.7 percent of a standard
deviation higher in mathematics, and 4.3 percent to 6.7 percent of a standard
deviation higher in English (Table 2). In this specification, the test score as an
explanatory variable explains only 34.8 to 44 percent of the variance of teacher
assessments.
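The percentages quoted here follow from the scaling of the variables: with a standard deviation of 10, a coefficient in score points is converted by dividing by 10 and multiplying by 100. A short sketch of that arithmetic, using the same-race coefficients reported in Table 2 (the function and dictionary names are illustrative):

```python
SD = 10.0  # test scores and teacher assessments are scaled to SD 10


def percent_of_sd(coefficient, sd=SD):
    """Express a coefficient in score points as a percentage of one SD."""
    return 100.0 * coefficient / sd


# Same-race coefficients from Table 2, columns (1) to (4).
same_race_coefficients = {
    "mathematics, test score as control": 0.281,
    "mathematics, assessment minus test score": 0.566,
    "English, test score as control": 0.428,
    "English, assessment minus test score": 0.665,
}
for label, coefficient in same_race_coefficients.items():
    print(label, round(percent_of_sd(coefficient), 2))
```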

3. IDENTIFICATION STRATEGY
Within-Student Identification: Using Student Mobility from/to a Same-Race Teacher

In the descriptive evidence that was presented in the previous section, the
OLS estimate of the same-race effect may be biased because a number of
student-speciﬁc variables are omitted from the regression.

For example, the literature suggests that teacher perceptions of student
performance might depend on a number of characteristics other than student
race: student behavior (Sherman and Cormier 1974), language (Gluszek and
Dovidio 2010), parental involvement (Wilson and Martinussen 1999), student
academic engagement (Hughes and Kwok 2007), and other factors. None of
these variables is measured by test scores or reflects racial perceptions per se.

15. Clustering by classroom, by student, or two-way clustering (Cameron, Gelbach, and Miller 2011) by
both student and classroom has little impact on the standard errors. Because two-way clustering
with two-way fixed effects (used later in section 3) does not yet exist in the literature, I chose to
present standard errors clustered by student. Clustering by classroom yields very similar standard
errors in all specifications.


Table 2. OLS Regressions

                Mathematics                                 English
                (1)                   (2)                   (3)                   (4)
                Teacher               Teacher Assessment    Teacher               Teacher Assessment
                Assessment            – Test Score          Assessment            – Test Score

Same-Race       0.281∗                0.566∗∗               0.428∗∗               0.665∗∗
                (0.118)               (0.131)               (0.093)               (0.122)
Test Score      0.591∗∗               –                     0.659∗∗               –
                (0.004)                                     (0.003)
Controls        Student and teacher race and gender, grade effects
Observations    48,065                48,065                67,855                67,855
Students        20,252                20,252                20,252                20,252
Teachers        5,297                 5,297                 5,496                 5,496
R-squared       0.348                 0.034                 0.436                 0.029
F Statistic     1,218.5               85.3                  2,501.1               68.9

Notes: Standard errors clustered by student. Clustering by classroom yields similar significance
levels. Test scores and teacher assessments are standardized to a mean of 50 and a standard
deviation of 10.
∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level.

Identifying the speciﬁc effect of the student’s race requires a more complete
speciﬁcation than equation 1, one that at least controls for student-speciﬁc
omitted variables. Such omitted variables will confound the estimate of the
same-race effect if teachers and students are non-randomly matched.

Assume that the teacher assessment incorporates a measure of the test
score, captures a same-race bias, and also reflects student-specific omitted variables:

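\[
\begin{aligned}
TA_{i,f,t} = {} & \text{constant} + \delta \cdot \mathit{SameRace}_{i,f,t} + \alpha \cdot \mathit{TestScore}_{i,f,t} + \mathit{Grade}_{t} \\
& + \mathit{Controls}_{i,f,t} + \mathit{StudentOmittedVariable}_{i,f,t} + \mathit{Residual}_{i,f,t}
\end{aligned}
\tag{2}
\]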

with the same notations as in specification 1, and $\varepsilon_{i,f,t} = \mathit{StudentOmittedVariable}_{i,f,t} + \mathit{Residual}_{i,f,t}$.
$\mathit{Controls}_{i,f,t}$ is a set of dummies for the teacher’s
race and gender. If student-specific omitted variables that have a positive impact
on the teacher assessment are correlated with assignment to a same-race
teacher, the effect $\delta$ of a same-race teacher on assessments is overestimated.
In other words, if assignment to teachers depends on unobservables that affect
teacher assessments, the estimate of the same-race effect is biased. Student-specific omitted
variables that are not correlated with same-race assignments will also imply a
correlation of residuals common to a given student, that is, $\mathrm{Corr}(\varepsilon_{i,f,t}, \varepsilon_{i,f',t'})$ is


not equal to 0, and standard errors will need to be corrected for student-level
clustering.16

If student-specific omitted variables do not vary across grades,17 specification 2 can be estimated using a student fixed effect $\text{Student}_{i,f}$:

$$TA_{i,f,t} = \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \cdot \text{Test Score}_{i,f,t} + \text{Controls}_{i,f,t} + \text{Student}_{i,f} + \text{Grade}_t + \text{Residual}_{i,f,t} \tag{3}$$

which is estimated using either a set of student dummies or first differences. A major advantage of the dummy variable approach is that it allows us to recover an estimate of the student unobservables $\text{Student}_{i,f}$; using this estimate we can check whether there is a significant correlation between assignment to a same-race teacher and student unobservables. Specification 3 can also be estimated in first differences,18 that is, using a within-student regression:

$$\begin{aligned}
TA_{i,f,t+1} - TA_{i,f,t} = {} & \delta \, (\text{Same Race}_{i,f,t+1} - \text{Same Race}_{i,f,t}) \\
& + (\text{Controls}_{i,f,t+1} - \text{Controls}_{i,f,t}) \\
& + \alpha \, (\text{Test Score}_{i,f,t+1} - \text{Test Score}_{i,f,t}) \\
& + (\text{Grade}_{t+1} - \text{Grade}_t) + (\text{Residual}_{i,f,t+1} - \text{Residual}_{i,f,t}).
\end{aligned} \tag{4}$$
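To make the contrast between the estimators concrete, the following is a minimal simulation sketch, not the paper's estimation code. It generates a synthetic panel in which a time-invariant student unobservable raises both the teacher assessment and the probability of a same-race match, then compares pooled OLS with the within-student estimator (equivalent to student dummies in specification 3) and the first-differenced estimator (specification 4). All function names, sample sizes, and parameter values are illustrative assumptions, and grade effects and controls are omitted for brevity.

```python
import numpy as np

def simulate_panel(n_students=5000, n_grades=4, delta=0.5, alpha=0.6, seed=0):
    """Synthetic single-subject panel; all parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    student = rng.normal(size=n_students)  # time-invariant student unobservable
    # Assumption: a higher student effect makes a same-race match more likely,
    # which is the kind of sorting that biases pooled OLS upward.
    p_match = 1.0 / (1.0 + np.exp(-student))
    same_race = (rng.random((n_students, n_grades)) < p_match[:, None]).astype(float)
    test_score = 0.5 * student[:, None] + rng.normal(size=(n_students, n_grades))
    residual = rng.normal(scale=0.5, size=(n_students, n_grades))
    ta = delta * same_race + alpha * test_score + student[:, None] + residual
    return ta, same_race, test_score

def _ols(y, X):
    # Least-squares coefficients of y on the columns of X.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def pooled_ols_delta(ta, same_race, test_score):
    # Ignores the student effect: constant, same-race dummy, test score.
    X = np.column_stack([np.ones(ta.size), same_race.ravel(), test_score.ravel()])
    return _ols(ta.ravel(), X)[1]

def within_student_delta(ta, same_race, test_score):
    # Demeaning by student is numerically equivalent to student dummies.
    demean = lambda a: a - a.mean(axis=1, keepdims=True)
    X = np.column_stack([demean(same_race).ravel(), demean(test_score).ravel()])
    return _ols(demean(ta).ravel(), X)[0]

def first_difference_delta(ta, same_race, test_score):
    # Grade-to-grade differences remove the time-invariant student effect.
    diff = lambda a: np.diff(a, axis=1)
    X = np.column_stack([diff(same_race).ravel(), diff(test_score).ravel()])
    return _ols(diff(ta).ravel(), X)[0]

if __name__ == "__main__":
    ta, sr, ts = simulate_panel()
    print("pooled OLS      :", pooled_ols_delta(ta, sr, ts))
    print("within-student  :", within_student_delta(ta, sr, ts))
    print("first-difference:", first_difference_delta(ta, sr, ts))
```

Under these assumptions the pooled estimate overstates the true δ, while the two within-student estimators recover it; only students whose same-race status changes across grades contribute to the latter two.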

The first-differenced specification makes clear that the identification of the same-race effect δ relies on student mobility from/to a same-race teacher. The effect of a same-race teacher is estimated without bias if the mobility of a student from a teacher of the same race (another race) in one grade to a teacher of another race (the same race) in the next grade is uncorrelated with time-varying student unobservables that have an impact on test scores, that is, $\mathrm{Corr}(\text{Same Race}_{i,f,t+1} - \text{Same Race}_{i,f,t},\ \text{Residual}_{i,f,t+1} - \text{Residual}_{i,f,t}) = 0$. Student behavior is one such time-varying unobservable that may affect teacher assessments and is potentially correlated with student mobility to/from

16. Specifically, $\mathrm{Cov}(\varepsilon_{i,f,t}, \varepsilon_{i,f',t'}) = \mathrm{Cov}(\text{Student Omitted Variable}_{i,f,t}, \text{Student Omitted Variable}_{i,f',t'})$ for $f \neq f'$ and for $t \neq t'$. If student-specific omitted variables are constant across grades, then $\mathrm{Cov}(\varepsilon_{i,f,t}, \varepsilon_{i,f,t'}) = \mathrm{Var}(\text{Student Omitted Variable}_{i,f})$ and the correlation of residuals for a given student across grades will be equal to the ratio of the variance of student unobservables to the overall variance of the residuals (Moulton 1990).

17. $\text{Student Omitted Variable}_{i,f,t} = \text{Student Omitted Variable}_{i,f,t'}$ for any $t$, $t'$.
18. Both approaches (student dummies and first-differenced specification) are equivalent with a large number of observations as long as the strict exogeneity assumption is satisfied (Baltagi 2008), that is, $E(\text{Residual}_{i,f,t} \mid X_{i,f,1}, X_{i,f,2}, \ldots, X_{i,f,5}) = 0$, where 1, 2, …, 5 indexes waves of the survey, and $X_{i,f,t}$ denotes the vector of explanatory variables for student i in subject area f, in grade t (constant, same-race dummy, test score, and grade dummies).


a teacher of the same race. I discuss the impact of behavior on estimates in section 4.

Because identiﬁcation relies on student mobility across teachers, it is im-
portant to check that a sufﬁcient number of students move to teachers of
different races. Otherwise identiﬁcation would rely on a small number of stu-
dents who move from/to a teacher of the same race.19 There are a large number
of such moves: 51 percent of students experience mobility from/to a same-race
teacher at some point between kindergarten and grade 5, and the sample of
movers is balanced in terms of race, gender, and parental income.20

Columns (1) and (4) of table 3 present the estimation of the first-differenced specification 4 in mathematics and in English, with standard errors clustered by student.21 Being assessed by a teacher of the same race raises teacher assessments by 3.5 percent of a standard deviation in mathematics and by 4.3 percent in English. The specification has fewer observations because the number of observations is equal to the number of first-differenced teacher assessments. Columns (2) and (5) present results of the estimation of specification 3, which includes a student fixed effect. Being assessed by a teacher of the same race raises assessments by 7 percent of a standard deviation in mathematics and by 4.8 percent of a standard deviation in English. The regression is strongly significant with an F statistic of 82.6. Importantly, there is a significantly positive correlation between the estimated student effects and assignment to a same-race teacher both in mathematics and in English, which indicates that the regression without student fixed effects underestimates the impact of a same-race teacher on assessments. Columns (3) and (6) regress the difference between the teacher assessment and the test score on the explanatory variables. Estimates of the same-race effect are comparable to columns (2) and (5) of the same table.

Within-Classroom Identiﬁcation

Teacher-specific omitted variables may also confound the estimate of the same-race effect. Although OLS specification 1 controls for teachers' race and gender, other teacher characteristics, imperfectly correlated with race and gender, affect teacher assessments. For example, Figlio and Lucas (2004) find that some teachers give higher average grades regardless of their students' ability, race, or gender. Such variation in average assessments across classrooms should

19. In general, if a covariate does not vary for a given student in a panel data regression with student fixed effects, the student's observation will not contribute to the estimation of the effect (Wooldridge 2002).

20. At each parental income level, from 41 percent to 52 percent of students experience a transition from/to a same-race teacher. Statistics available on request.

21. Clustering either by classroom, by student, or by both classroom and student (Cameron, Gelbach, and Miller 2011) does not significantly affect the estimated standard errors.


Table 3. Results with First-Differenced Specification and with Student Fixed Effects

[Table body not legible in this copy. Columns (1) to (3) report mathematics and columns (4) to (6) report English; within each subject the dependent variables are, in order, the first-differenced teacher assessment, the teacher assessment, and the teacher assessment minus the test score. Rows: Same Race; Test Score; Student Effect; Student and Teacher Race and Gender; Grade Effects; Observations; R2; F Statistic for Student Effects (p value); Corr(Same Race, Student Effects) (p value).]

Notes: Standard errors clustered by student. Results robust to clustering by classroom. Test scores and teacher assessments standardized to a mean of 50 and a standard deviation of 10.
aSmaller number of observations due to first differencing.
∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level; +statistically significant at the 10% level.


be controlled for in speciﬁcation 1 as the nonrandom sorting of teachers to
students implies that the teacher’s average assessment may be correlated with
assignment to a same-race student.

All these teacher-specific omitted variables enter in the determination of teacher assessments:

$$TA_{i,f,t} = \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \, \text{Test Score}_{i,f,t} + \text{Teacher Omitted Variable}_{i,f,t} + \text{Controls}_{i,f,t} + \text{Grade}_t + \text{Residual}_{i,f,t}. \tag{5}$$

Teacher omitted variables ($\text{Teacher Omitted Variable}_{i,f,t}$), if correlated positively with assignment to a same-race teacher ($\text{Same Race}_{i,f,t}$), lead to an upward bias in the estimate δ of the same-race effect. The presence of teacher-specific omitted variables also implies a correlation of residuals in the OLS specification across observations of the same classroom, and standard errors should be corrected for clustering at the classroom level.22 Because of the large number of fixed effects (6,093 teachers), a specification like specification 5 is usually estimated by taking the within-classroom difference of teacher assessments, test scores, and each covariate of the specification:

$$\begin{aligned}
TA_{i,f,t} - E(TA_{\cdot,f,t} \mid \text{classroom}) = {} & \delta \cdot \big(\text{Same Race}_{i,f,t} - E(\text{Same Race}_{\cdot,f,t} \mid \text{classroom})\big) \\
& + \alpha \cdot \big(TS_{i,f,t} - E(TS_{\cdot,f,t} \mid \text{classroom})\big) \\
& + \big(\text{Controls}_{i,f,t} - E(\text{Controls}_{\cdot,f,t} \mid \text{classroom})\big) \\
& + \text{Residual}'_{i,f,t}
\end{aligned} \tag{6}$$

where $E(x_{\cdot,f,t} \mid \text{classroom})$ is the average of covariate x in the classroom of student i in subject f in year t. The within-classroom specification makes it clear that the identification relies on comparing the teacher assessment $TA_{i,f,t}$ of a student to the average teacher assessment $E(TA_{\cdot,f,t} \mid \text{classroom})$ in the classroom. A classroom contributes to the identification of the same-race effect if it has both same-race and other-race students.23 Fortunately, 97.2 percent of the classrooms of the sample have observations of same-race and other-race students, and 44 percent of students are of the same race as their teacher on average.
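The within-classroom transformation can be illustrated with a short sketch on synthetic data. It is an assumption-laden illustration rather than the paper's code: it demeans the outcome and covariates by classroom, runs OLS on the demeaned data, and compares the result with a regression that includes one dummy per classroom. The function names, sample sizes, and the sorting rule (more lenient teachers having more same-race students) are hypothetical.

```python
import numpy as np

def simulate_classrooms(n_classrooms=300, class_size=20, delta=0.5, alpha=0.6, seed=1):
    """Synthetic classrooms; all parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    classroom = np.repeat(np.arange(n_classrooms), class_size)
    teacher_effect = rng.normal(size=n_classrooms)
    # Assumption: teachers with a higher average assessment have more same-race students.
    p_match = 1.0 / (1.0 + np.exp(-teacher_effect))
    same_race = (rng.random(classroom.size) < p_match[classroom]).astype(float)
    test_score = rng.normal(size=classroom.size)
    noise = rng.normal(scale=0.5, size=classroom.size)
    y = delta * same_race + alpha * test_score + teacher_effect[classroom] + noise
    X = np.column_stack([same_race, test_score])
    return y, X, classroom

def within_classroom_estimate(y, X, classroom):
    # Subtract classroom means from the outcome and from each covariate.
    _, idx = np.unique(classroom, return_inverse=True)
    counts = np.bincount(idx)
    def demean(v):
        return v - (np.bincount(idx, weights=v) / counts)[idx]
    y_d = demean(y)
    X_d = np.column_stack([demean(X[:, j]) for j in range(X.shape[1])])
    return np.linalg.lstsq(X_d, y_d, rcond=None)[0]

def classroom_dummy_estimate(y, X, classroom):
    # One dummy per classroom; the constant is absorbed by the dummies.
    _, idx = np.unique(classroom, return_inverse=True)
    dummies = np.eye(idx.max() + 1)[idx]
    coef = np.linalg.lstsq(np.column_stack([X, dummies]), y, rcond=None)[0]
    return coef[: X.shape[1]]

if __name__ == "__main__":
    y, X, classroom = simulate_classrooms()
    print("within-classroom:", within_classroom_estimate(y, X, classroom))
    print("classroom dummies:", classroom_dummy_estimate(y, X, classroom))
```

In this sketch the two routes give numerically identical slope coefficients, and classrooms in which every student has the same same-race status contribute nothing to the estimate of δ.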

22. Throughout the paper I cluster standard errors at the student level, but clustering at the classroom
level or two-way clustering at the student and classroom levels (Cameron, Gelbach, and Miller 2011)
yields similar signiﬁcance levels.

23. Formally, if the value of $\text{Same Race}_{i,f,t} - E(\text{Same Race}_{\cdot,f,t} \mid \text{classroom})$ changes within a classroom.


Specification 5 can also be estimated by including a set of teacher fixed effects, that is, one dummy for each teacher of the sample:

$$TA_{i,f,t} = \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \, \text{Test Score}_{i,f,t} + \text{Teacher Effect}_{i,f,t} + \text{Grade}_t + \text{Residual}_{i,f,t}. \tag{7}$$

Both approaches (specifications 6 and 7) yield the same estimate with a large number of observations (Baltagi 2008).24 The advantage of such a specification is that it allows us to recover an estimate of the teacher effect. In all waves except the spring grade 5 follow-up, the same teacher assesses students in English and mathematics, but separate teacher effects are estimated for English and for mathematics.

Columns (1) and (4) of table 4 show the results of the within-classroom specification 6. Students assessed by a teacher of the same race have higher teacher assessments, by 4.1 percent of a standard deviation in English and 5.5 percent in mathematics. All results are significant at 1 percent. Interestingly, test scores and observable controls explain 34 percent of the variance of teacher assessments. Columns (2) and (5) present results of the estimation of specification 7, which includes teacher effects. The point estimates are larger than in the within-teacher approach, but they are not statistically different from the estimates of columns (1) and (4). Having a same-race teacher raises teacher assessments by 6.9 percent of a standard deviation in English and 7.0 percent of a standard deviation in mathematics. The specification allows us to estimate that teacher effects are significant (the null hypothesis that teacher effects are equal to zero is rejected), indicating that teacher unobservables play a role in assessments. Moreover, being assessed by a same-race teacher is negatively correlated with the teacher effect (especially in mathematics), and we indeed observe a downward bias: The OLS estimation of the same-race effect without teacher effects in columns (1) and (3) of table 2 is lower than the estimates of columns (2) and (5) of table 4. Finally, results available on request show that teacher unobservables are not accounted for by the teacher's race, gender, experience, or tenure.

Combining the Within-Student and Within-Classroom Identiﬁcation Strategies

Finally, I combine the two previous identification strategies to control for both student-specific and teacher-specific omitted variables. My preferred

24. That is, both estimators converge in probability to the same estimate under the assumption that residuals are strictly exogenous within each classroom, that is, $E(\text{Residual}'_{i,f,t} \mid X_{\cdot,f,t}) = 0$, where $X_{i,f,t}$ is the vector of explanatory (right-hand side) variables in specification 6.


Table 4. Results of the Within-Teacher Estimation, of the Teacher Fixed Effects Specification, and Combining Both Student and Teacher Fixed Effects

[Table body not legible in this copy. Columns (1) to (3) report mathematics and columns (4) to (6) report English; within each subject the dependent variables are, in order, the teacher assessment minus the average TA, the teacher assessment, and the teacher assessment. Rows: Same Race; Test Score; Teacher Effects; Student Effects; Student and Teacher Observables; Observations; R2; F statistics and Corr(Same Race, ·), each with p values, for teacher effects and for student effects.]

Notes: All specifications include grade effects. Standard errors clustered by student. Clustering by classroom yields similar estimates. TA, teacher assessment.
∗∗Statistically significant at the 1% level.


estimate is thus the same-race δ coefﬁcient in the regression that controls for
both teacher effects and student effects:

$$TA_{i,f,t} = \text{constant} + \delta \, \text{Same Race}_{i,f,t} + \alpha \, \text{Test Score}_{i,f,t} + \text{Teacher Effect}_{i,f,t} + \text{Student Effect}_i + \text{Grade}_t + \text{Residual}_{i,f,t} \tag{8}$$

where the teacher effect ($\text{Teacher Effect}_{i,f,t}$) and the student effect ($\text{Student Effect}_i$) are estimated by including a set of dummies for teachers and a set of dummies for students as controls. The large number of students (21,409) and the large number of teachers (6,093) make it necessary to estimate the model using econometric techniques pioneered by Abowd, Creecy, and Kramarz (2002) and Abowd, Kramarz, and Margolis (1999) in the labor economics employer–employee literature. The technique provides estimates for all student effects, teacher effects, grade effects, and same-race and test score coefficients. Standard errors are clustered at the student level; clustering by classroom yields similar standard errors.
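To show why a model with two high-dimensional sets of dummies can be estimated without building the full dummy matrix, here is a hedged sketch that uses alternating projections (repeated demeaning by student and by teacher). This is a swapped-in, simpler technique chosen for illustration; it is not the conjugate-gradient approach of the cited employer–employee literature and not the paper's implementation. Function names, sample sizes, and parameter values are hypothetical, and grade effects are omitted.

```python
import numpy as np

def simulate_two_way(n_students=500, n_teachers=50, n_grades=5,
                     delta=0.5, alpha=0.6, seed=2):
    """Synthetic student-teacher panel; all parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    student = np.repeat(np.arange(n_students), n_grades)
    teacher = rng.integers(0, n_teachers, size=student.size)  # random assignment
    s_eff = rng.normal(size=n_students)
    t_eff = rng.normal(size=n_teachers)
    same_race = (rng.random(student.size) < 0.44).astype(float)
    test_score = 0.5 * s_eff[student] + rng.normal(size=student.size)
    noise = rng.normal(scale=0.5, size=student.size)
    y = delta * same_race + alpha * test_score + s_eff[student] + t_eff[teacher] + noise
    X = np.column_stack([same_race, test_score])
    return y, X, student, teacher

def group_demean(v, idx):
    # Subtract the group mean of v within each group identified by idx.
    means = np.bincount(idx, weights=v) / np.bincount(idx)
    return v - means[idx]

def residualize_two_way(v, student, teacher, tol=1e-10, max_iter=1000):
    # Alternate student and teacher demeaning until the vector stops changing.
    r = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        previous = r
        r = group_demean(group_demean(r, student), teacher)
        if np.max(np.abs(r - previous)) < tol:
            break
    return r

def two_way_fe(y, X, student, teacher):
    # OLS on variables purged of both student and teacher effects.
    y_r = residualize_two_way(y, student, teacher)
    X_r = np.column_stack([residualize_two_way(X[:, j], student, teacher)
                           for j in range(X.shape[1])])
    return np.linalg.lstsq(X_r, y_r, rcond=None)[0]

if __name__ == "__main__":
    y, X, student, teacher = simulate_two_way()
    print("two-way FE estimates (same race, test score):",
          two_way_fe(y, X, student, teacher))
```

The sketch returns only the slope coefficients; recovering the student and teacher effects themselves, as the technique described above does, would require an additional step and a normalization within each connected group of students and teachers.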

Columns (3) and (6) of table 4 present the estimates. Teachers give better assessments to students of their own race; the effect is 7.1 percent of a standard deviation in mathematics and 4.4 percent of a standard deviation in English. Teacher and student effects are significant.

4. DISCUSSION OF THE FINDINGS
Behavior and Assessments

Teacher assessments of student performance are partly determined by student behavior (Sherman and Cormier 1974). Column (1) (respectively, column (2)) of table 5 shows a regression of mathematics teacher assessments (respectively, English teacher assessments) on four behavioral measures.

The four behavioral measures come from a separate questionnaire of each wave of the study. Teachers reported the measures in terms of the social rating scale: approaches to learning, interpersonal skills, externalizing problem behaviors, and internalizing problem behaviors. The scale for approaches to learning measures the ease with which children can benefit from their learning environment. The interpersonal skills scale rates the child's skill in forming and maintaining friendships; getting along with people who are different; comforting or helping other children; expressing feelings, ideas, and opinions in positive ways; and showing sensitivity to the feelings of others. The externalizing problem behaviors scale (i.e., impulsive/overactive scale) addresses acting-out behaviors, and the internalizing problem behaviors scale addresses evidence of anxiety, loneliness, low self-esteem, or sadness.


Table 5. Behavior and Assessments

(1)

(2)

(3)

(4)

Mathematics Teacher
Assessment

English Teacher
Assessment

Same Race

Test Score

Approaches

to Learning

Interpersonal

Skills

Externalizing

Problem Behavior

Internalizing

Problem Behavior

Student and Teacher
Race and Gender

Student Effects

Teacher Effects

F Statistic

0.707∗∗
(0.199)

0.207∗∗
(0.008)

0.267∗∗
(0.008)

0.042∗∗
(0.007)

0.045∗∗
(0.012)

−0.040∗∗
(0.006)

No

Yes

4.62

0.73

Observations

48,065

0.419∗∗
(0.134)

0.265∗∗
(0.006)

0.298∗∗
(0.004)

0.035∗∗
(0.004)

0.035∗∗
(0.003)

−0.058∗∗
(0.005)

No

Yes

0.80

67,855

–0.001
(0.001)

0.001∗∗
(0.001)

−0.001
(0.001)

−0.001∗∗
(0.000)

Yes

No

4,249.2

0.59

0.001
(0.000)

−0.001+
(0.000)

0.000
(0.000)

0.001
(0.001)

0.001∗∗
(0.000)

Yes

No

26.66

0.79

67,855a

aRegression performed using English observations. Students are assessed by the same teacher in English and mathematics from kindergarten to grade 3, and different teachers in grade 5. Similar results hold when estimating the regression with mathematics observations.
∗∗Statistically significant at the 1% level. All specifications include grade effects. Standard errors clustered by student. Clustering by classroom yields similar estimates.

The measures of behavior vary substantially, both across students and for a given student, across time. On the interpersonal skills scale, 50.1 percent of the variance is explained by within-student variance, and the behavioral measure in the previous wave of the study explains about 31 percent of the variance of the behavioral measure of the next grade.

In Column (1) of table 5, the teacher assessment in mathematics is re-
gressed on the mathematics test score, the same-race dummy, the four behav-
ioral measures, a student effect, and a teacher effect.

The ﬁrst noticeable fact is the impact of behavior on assessment. Smaller
values indicate stronger behavioral problems. A one standard deviation in-
crease in the approaches to learning scale raises teacher assessments by 3
percent of a standard deviation. A one standard deviation increase in the inter-
personal skills measure raises teacher assessments by 0.4 percent of a standard


deviation. Externalizing behavior problems have a similar positive effect. Internalizing behavior problems have a negative impact on teacher assessments. That last result is consistent with the finding (Rutherford, Quinn, and Mathur 2004) that students with internalizing behavior problems (social withdrawal, anxiety, depression) are harder to identify than students with externalizing behavior problems (noncompliance, aggression, disruption).

How behavior affects the baseline estimate of the same-race effect in specification 8 depends on whether students are partly matched to teachers based on their behavior. Because I am using a student fixed-effect regression, behavior is a confounding factor in the regression if changes in behavior across grades are significantly correlated with the probability of being assigned a same-race teacher. If students whose behavior improves are more likely to be assigned to a same-race teacher, the same-race effect δ in specification 8 will be overestimated. Column (3) regresses the same-race dummy on the test score, the four behavioral measures, and student and teacher race and gender dummies. The effect of behavior on same-race assignments is either nonsignificant or very small. Column (4) confirms the finding when including student fixed effects. Unsurprisingly, therefore, behavioral controls leave the same-race effect (0.707 compared with 0.702 in mathematics, 0.420 compared with 0.435 in English) virtually unchanged compared with the estimate with a student effect and a teacher effect in table 4.

Same-Race Effects Skill by Skill

Table 6 presents results of the baseline regression for English, considering only kindergarten fall semester observations. The novelty is that the dependent variable is the teacher assessment broken down into eight separate skills. The results are informative with regard to the likelihood of a bias for two reasons: First, it is unlikely that students benefit from the better teaching of a same-race teacher (Dee 2005) only a few weeks after the start of school, and hence better teacher assessments for same-race students are more likely to represent perceptions rather than actual skills. Second, same-race assessment gaps appear also for the least abstract questions, in other words, questions that address the skills that are most likely to be captured by achievement tests. Take, for example, the statement: “This child easily and quickly names all upper- and lower-case letters of the alphabet.” In the fall semester of kindergarten, teachers assess students of their own race 4 percent of a standard deviation higher than children of other races. This English skill is measured in the kindergarten test and is measured early in the curriculum. Similar regressions in grade 5 present similar positive same-race effects.

The same-race effect can also be estimated separately for each grade by in-
cluding interactions between the grade dummies and the same-race dummy.


Table 6. Same-Race Effects Skill by Skill

[Table body not legible in this copy. Dependent variables: fall kindergarten English teacher assessments for eight skills (Complex, Understands, Names, Rhyming, Reads, Writing, Conventions, Computer). Rows: Same Race; Controls (English test scores and teacher effects); Observations; R2; F Statistic.]

Notes: Test scores have a standard deviation of 10 and a mean of 50; child controls include controls for race and gender; teacher controls include controls for the teacher's race, gender, tenure, and experience.
Definitions: Complex = This child uses complex sentence structures. Understands = This child understands and interprets a story or other text read to him/her. Names = This child easily and quickly names all upper- and lower-case letters of the alphabet. Rhyming = This child produces rhyming words. Reads = This child reads simple books independently. Writing = This child demonstrates early writing behaviors. Conventions = This child demonstrates an understanding of some of the conventions of print. Computer = This child uses the computer for a variety of purposes.
∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level.


These results (available from the author) show that teachers give more favorable assessments to same-race students as early as the fall of kindergarten: 14 percent of a standard deviation higher in mathematics and 11 percent of a standard deviation higher in English. After the fall semester of kindergarten, the effect is about 6 percent (3 percent) of a standard deviation in mathematics (English).

Measurement Error in Test Scores and Teacher Assessments

Two types of measurement error may confound the main estimates of our same-race effect in specification 3. First, teacher assessments may be noisy measures of teacher perceptions of student performance. Second, test scores of multiple-choice questionnaires may be noisy measures of underlying ability (Rudner and Schafer 2001). Random error may be introduced in the design of the questionnaire, and distractors (wrong options) may be partially correct. Measurement error in test scores may also be due to the student's sleep patterns, illness, careless errors when filling out the questionnaire, misinterpretation of test instructions, and other exam conditions.

Measurement error in teacher assessments is likely to make our estimates
of the same-race effect less signiﬁcant, because classical measurement error on
the dependent variable of a linear regression (speciﬁcation 3) does not typically
bias estimates but leads to larger standard errors for the estimated coefﬁcients
(Wooldridge 2002; Greene 2011). Thus, finding a significant effect of a same-race teacher is evidence that teacher assessments are a sufficiently precise25 measure of teacher perceptions of student performance.
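The claim that classical measurement error in the dependent variable inflates standard errors without biasing coefficients can be checked with a small simulation. The sketch below is illustrative only: the data are synthetic, the parameter values and the 44 percent same-race share are used as arbitrary inputs, and the standard errors are the textbook homoskedastic ones rather than the student-clustered errors used in the paper.

```python
import numpy as np

def ols_with_se(y, X):
    # OLS coefficients with conventional (homoskedastic) standard errors.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

def compare_noise_in_assessments(n=50_000, delta=0.5, noise_sd=8.0, seed=3):
    """Same regression with and without noise added to the dependent variable."""
    rng = np.random.default_rng(seed)
    same_race = (rng.random(n) < 0.44).astype(float)
    test_score = rng.normal(50.0, 10.0, size=n)
    # Hypothetical "true" teacher perception of the student.
    perception = 20.0 + delta * same_race + 0.6 * test_score + rng.normal(scale=5.0, size=n)
    # Reported assessment = perception + classical measurement error.
    reported = perception + rng.normal(scale=noise_sd, size=n)
    X = np.column_stack([np.ones(n), same_race, test_score])
    b_clean, se_clean = ols_with_se(perception, X)
    b_noisy, se_noisy = ols_with_se(reported, X)
    return b_clean[1], se_clean[1], b_noisy[1], se_noisy[1]

if __name__ == "__main__":
    b_c, se_c, b_n, se_n = compare_noise_in_assessments()
    print(f"no noise in assessment: delta = {b_c:.3f} (se {se_c:.3f})")
    print(f"noisy assessment      : delta = {b_n:.3f} (se {se_n:.3f})")
```

Both regressions are centered on the same δ in this setup; only the precision of the estimate changes when noise is added to the assessment.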

Measurement error in test scores may be more problematic. Indeed, proper conditioning for student ability in a given grade is key to the estimation of same-race effects on teacher perceptions of students' skills. This paper measures conditional bias as in Ferguson (2003), that is, the impact of the student's race on teacher assessments when conditioning on covariates that include measures of student ability. The main specification (specification 8) estimates same-race effects on teacher assessments conditional on test scores and student effects. At the extreme, if test scores are such a noisy measure of student ability that most of their variance is accounted for by measurement error, conditioning on test scores will have no impact on the same-race coefficient; the coefficient on test scores will be nonsignificant.26 In such a case, the same-race coefficient will measure a sum of the same-race effects on teacher perceptions

25. Precision in the statistical sense, as the inverse of the standard deviation.
26. In table 4, the coefficient for test scores in all regressions is less than 1, whereas we would naturally expect this coefficient to equal 1, given that both assessments and test scores have a standard deviation of 10. Constraining this coefficient to be equal to 1 does not significantly alter the coefficients of interest. Results available on request.


and the positive effect of same-race teachers on student ability (Dee 2005). At the other extreme, if test scores measure student ability accurately,27 the same-race coefficient in specification 9 will be an estimate of same-race biases. ECLS-K documentation specifies that test scores are highly reliable (see section 2). But the question here is whether a small amount of measurement error in test scores can explain away the same-race effect, that is, whether the same-race coefficient captures some unobserved student ability rather than a bias in teacher assessments.

So is there some amount of measurement error that explains the same-race estimates of table 4? Test scores are noisy measures of the child's underlying ability, so that $\text{Test Score}_{i,f,t} = \text{Ability}_{i,f,t} + \nu_{i,f,t}$. Measurement error is assumed to be classical (i.e., $\nu_{i,f,t}$ is not correlated with ability), which, as Bound, Brown, and Mathiowetz (2001) suggest, is a reasonable assumption in many common cases.

Assume also that teacher assessments capture student ability and are affected
by a same-race bias δ:

$$TA_{i,f,t} = \text{constant} + \alpha \, \text{Student Ability}_{i,f,t} + \delta \, \text{Same race}_{i,f,t} + \varepsilon_{i,f,t} \qquad (9)$$

For clarity and without loss of generality, student and teacher ﬁxed effects
are not included in this equation. I do not observe student ability and so
estimate speciﬁcation 9 by regressing assessments on the test score and the
same-race dummy. With that approach, the estimate of δ will not be consistent
because it will capture part of student ability instead of capturing only teacher
biases:28

$$\operatorname{plim}(\text{Estimator of } \delta) = \delta + \alpha \cdot \lambda \theta \qquad (10)$$

where δ is the coefficient of teacher bias,
$\theta = \operatorname{var}(\nu) / [\operatorname{var}(\nu) + \operatorname{var}(\text{Ability})]$, and
$\lambda = \operatorname{Cov}(\text{Same race}, \text{Student Ability}) / [\operatorname{Var}(\text{Same race})(1 - \operatorname{Corr}(\text{Same race}, \text{Test score})^2)]$.
If, as suggested by Dee (2005), student ability is higher when taught by a
same-race teacher, ability and the same-race dummy are positively correlated,
λ > 0, α · λθ > 0, and the effect δ of same-race teachers on assessments will
be overestimated.29
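
The probability limit in equation 10 can be checked numerically. The following is a minimal simulation sketch, assuming no fixed effects, a binary same-race dummy, and normally distributed ability and noise. The parameter values (`alpha`, `delta`, `gamma`, the standard deviations) and the helper `ols` are hypothetical choices for illustration, not the paper's estimates or code.

```python
import numpy as np


def ols(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 400_000
alpha, delta = 0.25, 0.70   # hypothetical: weight on ability, true same-race bias
gamma = 3.0                 # hypothetical: ability gain with a same-race teacher

same_race = rng.binomial(1, 0.5, n).astype(float)
ability = 50 + gamma * same_race + rng.normal(0, 9, n)
noise = rng.normal(0, 4, n)                  # classical measurement error nu
test_score = ability + noise
assessment = 10 + alpha * ability + delta * same_race + rng.normal(0, 1, n)

# Naive regression: ability is unobserved, so the noisy test score stands in.
delta_hat = ols(assessment, test_score, same_race)[2]

# Right-hand side of equation 10, built from sample moments.
theta = noise.var() / (noise.var() + ability.var())
corr = np.corrcoef(same_race, test_score)[0, 1]
lam = np.cov(same_race, ability)[0, 1] / (same_race.var() * (1 - corr ** 2))
predicted = delta + alpha * lam * theta

print(f"true delta      : {delta:.3f}")
print(f"naive estimate  : {delta_hat:.3f}")
print(f"delta + a*l*th  : {predicted:.3f}")
```

In this setup the naive estimate should exceed the true δ by roughly α · λθ. Setting `gamma = 0` removes the correlation between ability and the same-race dummy, so λ is close to zero and the gap disappears.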

If the relative size θ of the measurement error were known, an unbiased
effect of same-race teachers on assessments could be recovered. This unbiased

27. Formally, if the test score is a sufficient statistic for student ability.
28. The algebra is a particular case of the formulas of Greene (2011); plim denotes the probability limit
of the estimate.
29. This result is very close to equations of the statistical discrimination literature (see, e.g., Phelps
1972). On the labor market, the employer’s hiring decision may depend on the race of the job
candidate because the candidate’s education, experience, and other covariates are not sufficient
statistics for the candidate’s productivity.

Table 7. Could Measurement Error in Test Scores Explain the Same-Race Effect?

Mathematics – Size of Measurement Error in Test Scores

                        θ = 0.00   θ = 0.05   θ = 0.10   θ = 0.15   θ = 0.20   θ = 0.25   θ = 0.30
Same Race               0.711∗∗    0.668∗∗    0.620∗     0.566∗     0.506∗     0.438∗     0.360∗
                        (0.211)    (0.189)    (0.267)    (0.241)    (0.252)    (0.212)    (0.142)
Corrected Test Score    0.241∗∗    0.254∗∗    0.268∗∗    0.284∗∗    0.301∗∗    0.322∗∗    0.345∗∗
                        (0.010)    (0.008)    (0.013)    (0.011)    (0.009)    (0.015)    (0.017)

English – Size of Measurement Error in Test Scores

                        θ = 0.00   θ = 0.05   θ = 0.10   θ = 0.15   θ = 0.20   θ = 0.25   θ = 0.30
Same Race               0.435∗     0.384∗     0.327∗∗    0.264∗     0.193      0.113      0.021
                        (0.174)    (0.152)    (0.090)    (0.123)    (0.153)    (0.143)    (0.178)
Corrected Test Score    0.313∗∗    0.330∗∗    0.348∗∗    0.368∗∗    0.391∗∗    0.417∗∗    0.446∗∗
                        (0.007)    (0.006)    (0.008)    (0.006)    (0.007)    (0.008)    (0.011)

Notes: Test scores have a standard deviation of 10 and a mean of 50. All regressions are two-way
fixed-effects regressions with both a child and a teacher fixed effect. Standard errors are
bootstrapped, clustered by student. The corrected test score is such that equation 13 holds.
∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level.

estimate of same-race effects is obtained by regressing assessments on a
corrected value of the test scores, defined as follows:

$$\text{Corrected Test score}_{i,f,t} = \theta \cdot E[\text{Test score}_{\cdot,f,t} \mid \text{Same race}] + (1 - \theta) \cdot \text{Test score}_{i,f,t} \qquad (11)$$

When we estimate specification 8 replacing the test score with this corrected
test score, the estimator of the same-race effect will be an unbiased estimate
of the same-race effect on teacher assessments δ.

This holds if we know the size of measurement error θ. But θ is unknown,
and we estimate the parameter of interest δ using different values of θ. The
lowest value of measurement error θ that cancels the estimate of the effect of a
same-race teacher on assessments yields an estimate of the lowest amount of
measurement error that could account for the baseline results. Results for the
baseline specifications with corrected test scores are presented in table 7.30
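
The sweep over candidate values of θ can be sketched on simulated data. This is a minimal illustration under stated assumptions: no fixed effects, a binary same-race dummy, and a true noise share of 0.20. The function names `corrected_score` and `same_race_coef` and all parameter values are hypothetical. The correction follows equation 11, with the conditional expectation replaced by the sample mean of test scores within each same-race group.

```python
import numpy as np


def same_race_coef(assessment, score, same_race):
    """Coefficient on the same-race dummy in an OLS of assessment on score."""
    X = np.column_stack([np.ones_like(score), score, same_race])
    coef, *_ = np.linalg.lstsq(X, assessment, rcond=None)
    return coef[2]


def corrected_score(test_score, same_race, theta):
    """Equation 11: theta * E[score | same race] + (1 - theta) * score."""
    group_mean = np.where(same_race == 1,
                          test_score[same_race == 1].mean(),
                          test_score[same_race == 0].mean())
    return theta * group_mean + (1 - theta) * test_score


rng = np.random.default_rng(1)
n = 400_000
alpha, delta, gamma = 0.25, 0.70, 3.0      # hypothetical parameters

same_race = rng.binomial(1, 0.5, n).astype(float)
ability = 50 + gamma * same_race + rng.normal(0, np.sqrt(77.75), n)
test_score = ability + rng.normal(0, np.sqrt(20.0), n)   # noise share near 0.20
assessment = 10 + alpha * ability + delta * same_race + rng.normal(0, 1, n)

thetas = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
estimates = {
    t: same_race_coef(assessment, corrected_score(test_score, same_race, t), same_race)
    for t in thetas
}
for t in thetas:
    print(f"assumed theta = {t:.2f} -> same-race estimate = {estimates[t]:.3f}")
```

In this simulated setting the estimate falls as the assumed θ rises, and it is close to the true δ when the assumed θ matches the true share of noise. Assuming more noise than is actually present pushes the estimate below the true δ, which is why the lowest θ that cancels the estimate is read as a bound rather than as a point estimate.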

For mathematics test scores, a measurement error of more than 30 percent
is required to render the coefficient nonsignificant, and additional results
show that 40 to 50 percent of measurement error is required to cancel the
point estimate. For English, a 20 percent measurement error makes the
coefficient nonsignificant, and additional results show that measurement error of

30. Results for measurement error above 30 percent are available upon request.

40 percent cancels the point estimate. In short, a substantial amount of
measurement error would be necessary to cancel the coefficients. Even though
this statistic does not exclude the potentially confounding effect of
measurement error, it does indicate that only a large amount of measurement
error in test scores would alter the conclusions.

Grading on a Curve

Teacher assessments in each subject are an average of ten different
assessments on a scale of 1 to 5, which is then standardized to a mean of 50 and
standard deviation of 10. Although the skills that each assessment evaluates are
clearly defined by the survey questionnaire, there is no guideline as such on
what should be the standard deviation of assessments across students within
a classroom, or what exact proficiency level justifies awarding a 5 or a 4. It may
well be that the teacher implicitly ranks students within a classroom.31

The implications of grading on a curve for the measurement of a bias in
favor of same-race students are multiple. First, teacher assessments may not
be directly comparable to test scores, as they will reflect a ranking of students
within a classroom, while test scores have a common scale for all participating
students. Second, the teacher assessment of a given student will be correlated
with peers’ average test score in the classroom. Third, if peer group ability is
significantly correlated with being assigned a same-race teacher, the estimated
OLS effect of a same-race teacher on teacher assessments in specification 1
will be biased.
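
The second implication can be illustrated with a short simulation that contrasts assessments on a common scale with assessments that only rank students within their classroom. This is a sketch under simplifying assumptions: test scores measure ability without error, classrooms differ only by a common ability shift, and curve grading is modeled as a within-classroom standardization. The classroom counts, standard deviations, and the helper `ols` are hypothetical.

```python
import numpy as np


def ols(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(2)
n_classes, class_size = 2_000, 20            # hypothetical sizes

class_level = rng.normal(0, 5, (n_classes, 1))            # classroom-wide shift
score = 50 + class_level + rng.normal(0, 8, (n_classes, class_size))

# Leave-one-out average score of each student's classmates.
peer_mean = (score.sum(axis=1, keepdims=True) - score) / (class_size - 1)

# (a) Common scale: the assessment tracks the score plus idiosyncratic noise.
common = score + rng.normal(0, 3, score.shape)

# (b) Curve: the assessment is the student's standing within the classroom,
#     rescaled to mean 50 and standard deviation 10 in every classroom.
curve = 50 + 10 * (score - score.mean(axis=1, keepdims=True)) / score.std(axis=1, keepdims=True)

peer_coef_common = ols(common.ravel(), score.ravel(), peer_mean.ravel())[2]
peer_coef_curve = ols(curve.ravel(), score.ravel(), peer_mean.ravel())[2]

print(f"peer coefficient, common scale: {peer_coef_common:.3f}")
print(f"peer coefficient, curve       : {peer_coef_curve:.3f}")
```

Under curve grading the coefficient on peers' average score is clearly negative, because a given score earns a lower rank among stronger classmates. On a common scale it is close to zero.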

If teacher assessments reflect a ranking of students within a classroom
rather than a measure on a common scale, we should expect black students to
get lower assessments than white students. Indeed, consider a simple model
where there are only two students in each classroom, and each student can have
either a low teacher assessment ($a_l$) or a high teacher assessment ($a_h$). A
student gets a high assessment if he is the student with the highest ability in
the classroom. Student ability is denoted ω, and follows a cumulative
distribution function F(ω). Each student can be either white, r = w, or minority,
r = m. The cumulative distribution function given the student’s race r is
denoted F(ω|r). Then a student gets a high assessment $a_h$ if his ability is
higher than his peer’s ability. Thus, a student of race r has a high teacher
assessment with probability
$P(a = a_h \mid r, \omega) = P(\omega > \omega' \mid r, \omega) = F(\omega' \mid r, \omega)$.
For simplicity, assume that peer ability ω′ is independent of student ability
conditional on race, that is, $F(\omega' \mid r, \omega) = F(\omega' \mid r)$.32 In
the data we observe that minority students are in classrooms
with lower average test scores. Black students are in classrooms that have

31. Grading on a curve is one of the potential grading practices considered by Figlio and Lucas (2004).
32. Similar results hold if students are sorted by ability across classrooms.

an average test score 13.7 percent of a standard deviation below the average test score of white students’ peers. We also observe that the distribution of black students’ peers’ test scores is strictly worse than white students’ peers’ test scores. Formally, white students’ peers’ test score distribution first-order stochastically dominates black students’ peers’ test score distribution, $F(\omega' \mid w) < F(\omega' \mid b)$. Then, at a given ability level ω, white students are less likely to get a high assessment than black students: $P(a = a_h \mid w, \omega) - P(a = a_h \mid b, \omega) = F(\omega' \mid w, \omega) - F(\omega' \mid b, \omega) < 0$. If teacher assessments reflect a ranking in the classroom, we should thus observe that, conditional on test scores, minority students get higher teacher assessments than white students. But results (available from the author) show a nonsignificant or a negative and significant effect of race on teacher assessment conditional on test scores. Another regression suggests a nonsignificant effect of peers’ test scores on teacher assessments. Such results make it unlikely that teacher assessments are a ranking of students within each classroom.

The baseline effect of a same-race teacher on teacher assessments of table 4 and specification 8 is also not likely to be affected by teachers grading on a curve within each classroom. Column (1) of table 8 suggests that being assigned a same-race teacher is negatively correlated with peers’ test scores. But column (2) of table 8 shows that being assigned a same-race teacher is not significantly correlated with peers’ test scores when controlling for a student effect and teacher observables. Column (3) of the same table estimates the same-race effect in mathematics. The novelty compared to baseline specification 8 is that the specification controls for peers’ test scores. The estimate (+0.701) is virtually unchanged compared to table 4. Similar results, available from the author, hold in English.

Results with All Racial Interaction Terms

What races drive the results of the main specification? We disentangle the effects of different racial interactions in specification 8, by replacing the Same Race dummy by a set of dummies, one dummy for each interaction between the teacher’s and the student’s race:

$$TA_{i,f,t} = \text{constant} + \alpha \, TS_{i,f,t} + \text{student}_{i,f} + \text{teacher}_{i,f,t} + \sum_{r \neq r'} \delta_{r,r'} \, \text{Dummy}(\text{teacher race} = r) \times \text{Dummy}(\text{student race} = r') + \text{grade}_{t,f} + \varepsilon_{i,f,t} \qquad (12)$$

Table 8. Grading on a Curve Hypothesis

Mathematics
                                        (1)            (2)            (3)
                                        Peers’ Test    Same Race      Teacher
                                        Scores         Teacher        Assessment

Same Race                               –0.609∗∗       –              0.701∗∗
                                        (0.168)                       (0.247)
Peers’ Average Test Score               –              –0.002         0.065
                                                       (0.002)        (0.061)
Test Score                              –              –0.002∗∗       0.264∗∗
                                                       (0.001)        (0.025)
Student and Teacher Race and Gender     Yes            Yes            No
Student Effects                         No             Yes            Yes
Teacher Effects                         No             No             Yes
F Statistic                             114.5          13.5           4.2
R2                                      0.13           0.82           0.79
Observations                            48,065         48,065         48,065

Notes: Standard errors clustered by student. Coefficients have similar significance levels when clustering by classroom.
∗∗Statistically significant at the 1% level.

where there is one racial interaction dummy for each pair of races r, r′. Dummy(teacher race = r) × Dummy(student race = r′) = 1 if the teacher’s race is r and the student’s race is r′, and 0 otherwise. The effects of interest are the coefficients $\delta_{r,r'}$. The omitted dummy variables are the dummies for a teacher and a student of the same race, hence coefficients are interpreted relative to the assessment given by a same-race teacher. Results are presented in table 9.33 In mathematics, being assessed by a white teacher lowers the assessment of Hispanic children by 17.3 percent of a standard deviation, compared with being assessed by a Hispanic teacher (the same-race interaction dummy is omitted). The interaction between white teachers and black students is not significant, but the coefficient’s order of magnitude is comparable to baseline estimates. In English, the interaction is significant. White teachers give lower assessments to black children, lower by 11.1 percent of a standard deviation.
They also give lower assessments to Hispanic children, by 14.8 percent of a standard deviation.

33. Results from very small minority groups (Pacific Islanders, American Indians) may not be robust. All racial interactions are included in the regressions but only coefficients for blacks, Hispanics, and whites are reported in the table.

Table 9. Effects of All Racial Interactions Terms on Teacher Assessments

Columns: (1) Mathematics Teacher Assessment; (2) English Teacher Assessment. Within each column, Race of the Teacher: White, non-Hispanic; Black; Hispanic.
Rows (Race of the Student): White, non-Hispanic; Black; Hispanic, Any Race.

[Interaction coefficients: cell alignment lost in extraction; values in extraction order, Ref. = omitted same-race cell]
–1.728∗∗ (0.627); –1.337 (0.872); Ref.; Ref.; 0.530 (0.414); 1.684∗∗ (0.568); Ref.; –0.590 (0.479); 0.899 (0.675); –0.616 (0.512); Ref.; 0.371 (1.697); –1.110∗∗ (0.300); –1.480∗∗ (0.221); –0.980 (0.756); Ref.; Ref.; –0.643 (0.741)

                      (1) Mathematics    (2) English
Test Score            0.241∗∗            0.314∗∗
                      (0.009)            (0.008)
F Statistic           4.2                5.6
R2                    0.787              0.774
Student Effects       Yes                Yes
Teacher Effects       Yes                Yes
Grade Effects         Yes                Yes
Observations          48,065             67,855

Notes: This table presents the results of two separate regressions, each with the full set of interactions between the teacher’s race and the child’s race. Only the three largest minority group interactions are displayed in this table, but other interactions are included in the regressions. Ref. = interaction dummy omitted from the regression.
∗∗Statistically significant at the 1% level.

Despite the size of standard errors, statistical tests show that black teachers give significantly higher English assessments to white students than white teachers to black students.
Hispanic teachers, too, tend to give higher assessments in English to white students than white teachers to Hispanic students.34 In mathematics, white teachers give significantly lower assessments to Hispanic students than to white and black students.35

Table 9 also shows that Hispanic teachers tend to give higher grades to white students than to Hispanic students in English. Hence most of the same-race effect on teacher assessments is driven by the behavior of white teachers toward black and Hispanic students.

34. A post-regression χ² test rejects the equality of coefficients “white teacher–black student” and “black teacher–white student,” as well as the equality of coefficients “white teacher–Hispanic student” and “Hispanic teacher–white student.” The χ² statistic is 15.28 (respectively, 15.11) with a p-value of 0.0001 (respectively, 0.0001).
35. The “white teacher–Hispanic student” coefficient is significant. Moreover, a χ² test rejects the equality of the “white teacher–Hispanic student” coefficient and the “white teacher–black student” coefficient. The statistic equals 4.62 and the p-value is 0.0316.

Policy Implications

Racial Gaps in Test Scores and in Teacher Assessments

Columns (1) to (4) of table 10 estimate racial gaps in test scores and in teacher assessments from kindergarten to grade 5 for both mathematics and English.36 As documented in the literature, the gap between white and black test scores increases from kindergarten to grade 5: from 63 percent to 93 percent of a standard deviation in mathematics, and from 45 percent to nearly 80 percent of a standard deviation in English. However, teacher assessments present a different picture.
The white–black teacher assessment gap narrows slightly, decreasing from 47 percent to 45.5 percent of a standard deviation in mathematics and from 42 percent to 38.5 percent of a standard deviation in English. It is interesting that, over the same period, the fraction of black students assessed by a same-race teacher increases from 27.3 percent in kindergarten to 34.5 percent in grade 5, and the fraction of white students assessed by a same-race teacher remains relatively constant, at 92 percent.

Because teacher assessments may depend on teachers’ identities, columns (9) to (12) present teacher assessment racial gaps while controlling for teachers’ race and for teacher–student racial interaction dummies.37 In these columns, the gap in teacher assessments increases from fall kindergarten to grade 5, from 37 percent to 49 percent of a standard deviation in mathematics, and from 46.6 percent to 49 percent of a standard deviation in English. The racial teacher assessment gap is increasing only when controlling for teachers’ race and teacher–student racial interactions.38

For Hispanic students, gaps in teacher assessments narrow faster than gaps in test scores. The white–Hispanic test score gap declines from 78 percent to 54 percent of a standard deviation in mathematics (a reduction of 24 percentage points [p.p.]); the white–Hispanic teacher assessment gap declines from 57 percent to 22 percent of a standard deviation in mathematics (a reduction of 35 p.p.). In columns (9) and (10), where regressions incorporate teachers’ race dummies and teacher–student racial interaction dummies, the gap in teacher assessment of student mathematics skills goes from 43 percent to 28 percent of a standard deviation (a 15-p.p. reduction). The situation is similar for assessments of English skills: although the gap in test scores rises by 10 p.p., the gap in teacher assessments goes down by 35 p.p. With controls, in columns (11) and (12), the gap in teacher assessments falls by only 15 p.p.

Broadly speaking, relying solely on teacher assessments may not provide an accurate description of racial gaps from kindergarten to grade 5. Black–white gaps in teacher assessments do not increase from kindergarten to grade 5, whereas racial gaps in test scores suggest that African American students are falling behind. Hispanic–white gaps in teacher assessments narrow faster than gaps in test scores, except when controlling for dummies for the teacher’s race and teacher–student racial interaction dummies.

36. Spring kindergarten, spring grade 1, and spring grade 3 are omitted from the table to save space, but the gaps evolve in the same manner from fall kindergarten to spring grade 5.
37. The full set of variables Dummy(Student race = r) × Dummy(Teacher race = r′) for all pairs of races r and r′.
38. Including other teacher observables as controls, such as gender, experience, tenure, and teacher fixed effects, does not affect white–black teacher assessment gaps.

Table 10. Racial Gaps in Test Scores and in Teacher Assessments
[Table body not recoverable from the extraction. Columns (1) to (4): test scores; columns (5) to (8): teacher assessments; columns (9) to (12): teacher assessments with teacher race and racial interaction terms. Each group covers mathematics and English, in fall kindergarten and spring grade 5. Rows: Black, Hispanic, Asian.]
∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level.

Teacher Assessments and Later Test Scores

The paper’s main result will be especially important if teacher assessments reflect perceptions that have a causal impact on student performance in mathematics and English. The effect of more favorable assessments is ambiguous as, on the one hand, studies report that more positive treatment and attitudes toward minority students lead to higher achievement (Casteel 1998); on the other hand, in a survey of existing research, Cohen and Steele (2002) describe the potentially negative impacts of “overpraising” and “underchallenging” students (Mueller and Dweck 1998). Importantly, in this paper’s data set, students do not see teacher assessments. Therefore, it is unlikely that teachers were trying to please students by being too positive about their English and mathematics abilities.39

39. My result that white teachers give lower assessments to blacks and Hispanics suggests that teachers were not trying to provide socially desirable answers. Bertrand and Mullainathan (2001) describe such “social desirability” bias in surveys, but here a social desirability bias would mean even lower teacher assessments for black and Hispanic students.

Estimating the impact of teacher perceptions on student performance is difficult because a causal estimation requires an experimental setting in which teachers get randomized information on students; typical experiments deceive teachers, inducing them to think more positively about a random subset of students (Jussim and Harber 2005). Experiments are typically performed on relatively smaller samples that are not nationally representative. In the well-known Pygmalion study, a random fraction of students was labeled as bloomers and the impact of this information on students’ IQ progress was found significant (Rosenthal and Jacobson 1968). Effects of teacher perceptions on later achievement are still debated (Jussim and Harber 2005).

The challenge with my observational data set is to identify the impact of teacher assessments separately from the impact of teacher quality, which may be correlated with assessments, and from the impact of student ability, which is likely positively correlated with teacher assessments conditional on test scores. Because the data set follows students over time, and because teacher identifiers are available, we can estimate the impact of previous assessments on later scores conditional on student and teacher effects. A student effect controls for student unobservables that do not vary across grades, while the teacher effect controls for teacher quality and other teacher characteristics that affect later test scores:

$$TS_{i,f,t} = \text{constant} + b \cdot TA_{i,f,t-1} + c \cdot TS_{i,f,t-1} + \text{Student}_{i,f} + \text{Grade}_{t,f} + \text{Teacher}_{i,f,t} + \text{Residual}_{i,f,t} \qquad (13)$$

where notations are as above, $TS_{i,f,t}$ is the test score of child i in field f in grade t, $TA_{i,f,t-1}$ is the subjective assessment of student i in the previous grade, $TS_{i,f,t-1}$ is the test score in the same subject in the previous period, $\text{Student}_{i,f}$ is a student effect, $\text{Grade}_{t,f}$ is a grade effect, and $\text{Teacher}_{i,f,t}$ is a teacher effect. The coefficient of interest here is b, the effect of the previous teacher assessment on the test score.

In such a regression, estimates of the coefficients may be biased due to regression to the mean (Arellano and Bond 1991): a child who has a test score much above the average in, say, grade 1, is likely to have a test score closer to the average in the next period, in grade 3. This typically leads to biases in the estimation of the coefficients of interest b and c (Nickell 1981).
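
The bias from combining a lagged score with a student effect can be illustrated on a simulated panel. The sketch below is an assumption-laden illustration, not the paper's estimator: it uses a single lagged outcome with persistence `rho`, no teacher effects or assessments, and a simplified first-difference instrument (the score two waves back, in the spirit of Anderson and Hsiao) in place of the full Arellano and Bond (1991) procedure. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_students, n_waves, rho = 20_000, 6, 0.5     # hypothetical panel dimensions

student = rng.normal(0, 1, n_students)        # time-invariant student effect
y = np.empty((n_students, n_waves))
# Start from the stationary distribution so that all waves are comparable.
y[:, 0] = student / (1 - rho) + rng.normal(0, 1, n_students) / np.sqrt(1 - rho ** 2)
for t in range(1, n_waves):
    y[:, t] = rho * y[:, t - 1] + student + rng.normal(0, 1, n_students)

# Within (student fixed-effects) estimator: demean by student, then OLS.
cur, lag = y[:, 1:], y[:, :-1]
cur_w = cur - cur.mean(axis=1, keepdims=True)
lag_w = lag - lag.mean(axis=1, keepdims=True)
rho_within = (lag_w * cur_w).sum() / (lag_w ** 2).sum()

# First differences remove the student effect; the lagged difference is then
# instrumented with the score two waves back (simple IV ratio estimator).
d_cur = y[:, 2:] - y[:, 1:-1]
d_lag = y[:, 1:-1] - y[:, :-2]
z = y[:, :-2]
rho_iv = (z * d_cur).sum() / (z * d_lag).sum()

print(f"true persistence        : {rho:.3f}")
print(f"within estimator        : {rho_within:.3f}")
print(f"first-difference IV     : {rho_iv:.3f}")
```

With only a handful of waves per student, the within estimator of the lagged-score coefficient is pulled well below the true value, while the instrumented first-difference estimate stays close to it.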

To alleviate this issue, the test score $TS_{i,f,t-1}$ is instrumented by test scores from previous grades as in Arellano and Bond (1991) as long as a student effect is included, in columns (2) to (4) and (6) to (8) of table 11. This table shows that, in such specifications, teacher assessments have an effect on later test scores, over and above prior test scores, child fixed effects, and teacher fixed effects. This effect is robust to a variety of specifications with or without the Arellano and Bond (1991) instrument, with or without child and teacher fixed effects, and with or without controls for peers’ test scores. A one standard deviation increase in prior teacher assessment is correlated with a 3.7 percent to 8 percent standard deviation increase in next grade’s test score, conditional on the effects and the maintained controls. In the regression, teacher assessments have a greater impact than test scores on later test scores.40 Also, keeping in mind the limitations of the regression (absence of an experimental design), the results suggest that having a same-race teacher from kindergarten to grade 5 raises teacher assessments by 7 percent of a standard deviation in mathematics (table 4), which raises grade 5 scores cumulatively over the five waves by 2.8 percent of a standard deviation in mathematics. Although only 2.57 percent of white students never have a same-race teacher from kindergarten to grade 5, 54.3 percent of black students and 63 percent of Hispanic students have not had a single same-race teacher during the same period.

40. But interestingly, results available on request suggest that teacher assessments do not have an impact on test scores in the same grade. Teacher assessments have an impact on later test scores but not a significant impact on current test scores.

Table 11. Impact of Teacher Assessments on Later Test Scores
[Table body not recoverable from the extraction. Columns (1) to (4): Mathematics Test Score; columns (5) to (8): English Test Score. Rows: Test Score in Previous Wave; Teacher Assessment in Previous Wave; Student Race and Gender; Teacher Race and Gender; Student Effects; Teacher Effects; Grade Effects; F Statistic; R2; Observations.]
Notes: Standard errors clustered by student. Clustering by classroom yields similar estimates.
∗∗Statistically significant at the 1% level.

5. CONCLUSION

The paper presents evidence that teachers give better assessments to students of their own race, even when controlling for test scores, student unobservables, teacher unobservables, and behavioral measures. Results are not significantly explained by measurement error in test scores or grading on a curve within each classroom. The same-race effect appears as early as kindergarten for skills covered by the tests. The presence of continuous detailed teacher assessments of similar skills as test scores, the longitudinal nature of the data set, and the use of econometric techniques controlling for a large number of teacher and student fixed effects are key ingredients for obtaining this paper’s results.

Such evidence of better perceptions of same-race students’ performance using nationally representative data from the early years, with detailed robustness checks, should contribute to the debate in at least two ways. First, shifting from standardized test scores to teacher assessments of students may introduce bias in assessments. Although teachers may have a better grasp of student ability than tests, teachers’ perceptions are also affected by race and ethnicity. Second, my results suggest that teachers’ perceptions of same-race students explain part of the positive impact of same-race teachers on student test scores, as documented by Dee (2005).

I would like to thank Brian Jacob, Francis Kramarz, Eric Maurin, Jesse Rothstein, Cecilia Rouse, and Timothy Van Zandt, as well as two anonymous referees, for particularly helpful suggestions on previous versions of this paper. I also thank audiences at the London School of Economics, the University of Amsterdam, Uppsala University, and the Industrial Relations Section at Princeton University. I am indebted to Cecilia Rouse for access to the data set. This project was undertaken while visiting Princeton University.
For computing and financial support I thank INSEAD, CREST, the London School of Economics, and the Marie Curie Programme. The usual disclaimers apply.

REFERENCES

Abowd, John M., Robert Creecy, and Francis Kramarz. 2002. Computing person and firm effects using linked longitudinal employer–employee dataset. Unpublished paper, Cornell University.

Abowd, John M., Francis Kramarz, and David N. Margolis. 1999. High wage workers and high wage firms. Econometrica 67(2): 251–334. doi:10.1111/1468-0262.00020

Achinstein, Betty, Rodney T. Ogawa, Dena Sexton, and Casia Freitas. 2010. Retaining teachers of color: A pressing problem and a potential strategy for “hard-to-staff” schools. Review of Educational Research 80(1): 71–107. doi:10.3102/0034654309355994

Arellano, Manuel, and Stephen Bond. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58(2): 277–297. doi:10.2307/2297968

Baltagi, Badi. 2008. Econometric analysis of panel data. Hoboken, NJ: Wiley.

Bertrand, Marianne, and Sendhil Mullainathan. 2001. Do people mean what they say? Implications for subjective survey data. American Economic Review 91(2): 67–72. doi:10.1257/aer.91.2.67

Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. Measurement error in survey data. In Handbook of econometrics, vol. 5, edited by James J. Heckman and Edward Leamer, pp. 3705–3843. Amsterdam, The Netherlands: Elsevier.

Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. 2011. Robust inference with multiway clustering. Journal of Business & Economic Statistics 29(2): 238–249. doi:10.1198/jbes.2010.07136

Carpenter, Jeffrey P., Glenn W. Harrison, and John A. List. 2005. Field experiments in economics: An introduction. In Research in experimental economics 10, edited by R. Mark Isaac and Douglas A. Norton, pp. 1–15. Bingley, UK: Emerald Publishing.

Casteel, Clifton A. 1998. Teacher–student interactions and race in integrated classrooms. Journal of Educational Research 92(2): 115–120. doi:10.1080/00220679809597583

Clotfelter, Charles T., Helen F. Ladd, and Jacob Vigdor. 2005. Who teaches whom? Race and the distribution of novice teachers. Economics of Education Review 24(4): 377–392. doi:10.1016/j.econedurev.2004.06.008

Cohen, Geoffrey L., and Claude M. Steele. 2002. A barrier of mistrust: How negative stereotypes affect cross-race mentoring. In Improving academic achievement: Impact of psychological factors on education, edited by Joshua Aronson, pp. 305–331. Bingley, UK: Emerald Publishing. doi:10.1016/B978-012064455-1/50018-X

Darling-Hammond, Linda, and Ray Pecheone. 2010. Developing an internationally comparable balanced assessment system that supports high-quality learning. Paper presented at the National Conference on Next Generation Assessment Systems, Center for K-12 Assessment & Performance Management, Washington, DC, March.

Dee, Thomas S. 2004. Teachers, race, and student achievement in a randomized experiment. Review of Economics and Statistics 86(1): 195–210. doi:10.1162/003465304323023750

Dee, Thomas S. 2005. A teacher like me: Does race, ethnicity, or gender matter? American Economic Review 95(2): 158–165. doi:10.1257/000282805774670446

Ferguson, Ronald F. 2003. Teachers’ perceptions and expectations and the black-white test score gap. Urban Education 38(4): 460–507. doi:10.1177/0042085903038004006

Figlio, David N., and Maurice E. Lucas. 2004. Do high grading standards affect student performance? Journal of Public Economics 88(9): 1815–1834. doi:10.1016/S0047-2727(03)00039-2

Fryer, Roland G., Jr., and Steven D. Levitt. 2004. Understanding the black-white test score gap in the first two years of school. Review of Economics and Statistics 86(2): 447–464. doi:10.1162/003465304323031049

Fryer, Roland G., and Steven D. Levitt. 2006. The black-white test score gap through third grade. American Law and Economics Review 8(2): 249–281. doi:10.1093/aler/ahl003

Giuliano, Laura, David I. Levine, and Jonathan Leonard. 2009. Manager race and the race of new hires. Journal of Labor Economics 27(4): 589–631.

Giuliano, Laura, David I. Levine, and Jonathan Leonard. 2011. Racial bias in the manager–employee relationship: An analysis of quits, dismissals, and promotions at a large retail firm. Journal of Human Resources 46(1): 26–52. doi:10.1353/jhr.2011.0022

Gluszek, Agata, and John F. Dovidio. 2010. The way they speak: A social psychological perspective on the stigma of nonnative accents in communication. Personality and Social Psychology Review 14(2): 214–237. doi:10.1177/1088868309359288

Greene, William H. 2011. Econometric analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.

Gresham, Frank M., and Stephen N. Elliott. 1990. Social skills rating system (SSRS). Circle Pines, MN: American Guidance Service.

Hinnerich, Björn Tyrefors, Erik Höglin, and Magnus Johannesson. 2011. Are boys discriminated in Swedish high schools? Economics of Education Review 30(4): 682–690. doi:10.1016/j.econedurev.2011.02.007

Hughes, Jan, and Oi-man Kwok. 2007. Influence of student–teacher and parent–teacher relationships on lower achieving readers’ engagement and achievement in the primary grades. Journal of Educational Psychology 99(1): 39–51. doi:10.1037/0022-0663.99.1.39
f f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Ingersoll, Richard M., and Henry May. 2011. Recruitment, retention and the minority teacher shortage. Consortium for Policy Research in Education Research Report No. RR-69. Jackson, C. Kirabo, and Elias Bruegmann. 2009. Teaching students and teaching each other: The importance of peer learning for teachers. American Economic Journal: Applied Economics 1(4): 85–108. Jussim, Lee. 1989. Teacher expectations: Self-fulﬁlling prophecies, perceptual bi- ases, and accuracy. Journal of Personality and Social Psychology 57(3): 469–480. doi:10.1037/0022-3514.57.3.469 Jussim, Lee, and Kent D. Harber. 2005. Teacher expectations and self-fulﬁlling prophe- cies: Knowns and unknowns, resolved and unresolved controversies. Personality and Social Psychology Review 9(2): 131–155. doi:10.1207/s15327957pspr0902_3 Kirby, Sheila Nataraj, Mark Berends, and Scott Naftel. 1999. Supply and demand of minority teachers in Texas: Problems and prospects. Educational Evaluation and Policy Analysis 21(1): 47–66. doi:10.3102/01623737021001047 370 Amine Ouazad Lavy, Victor. 2004. Do gender stereotypes reduce girls’ human capital outcomes? Evidence from a natural experiment. NBER Working Paper No. 10678. Lyons, Anthony, and Yoshihisa Kashima. 2003. How are stereotypes maintained through communication? The inﬂuence of stereotype sharedness. Journal of Person- ality and Social Psychology 85(6): 989. doi:10.1037/0022-3514.85.6.989 Marcus, Geoffrey, Susan Gross, and Carol Seefeldt. 1991. Black and white students’ perceptions of teacher treatment. Journal of Educational Research 84(6): 363–367. doi:10.1080/00220671.1991.9941817 Meier, Kenneth J., Joseph Stewart, Jr., and Robert E. England. 1989. Race, class, and education: The politics of second-generation discrimination. Madison, WI: University of Wisconsin Press. Moulton, Brent R. 1990. An illustration of a pitfall in estimating the effects of ag- gregate variables on micro units. 
Review of Economics and Statistics 72(2): 334–338. doi:10.2307/2109724 Mueller, Claudia M., and Carol S. Dweck. 1998. Praise for intelligence can undermine children’s motivation and performance. Journal of Personality and Social Psychology 75(1): 33–52. doi:10.1037/0022-3514.75.1.33 Nickell, Stephen. 1981. Biases in dynamic models with ﬁxed effects. Econometrica 49(6): 1417–1426. doi:10.2307/1911408 Phelps, Edmund S. 1972. The statistical theory of racism and sexism. American Economic Review 62(4): 659–661. Price, Joseph, and Justin Wolfers. 2010. Racial discrimination among NBA ref- erees. Quarterly Journal of Economics 125(4): 1859–1887. doi:10.1162/qjec.2010.125.4 .1859 Rosenthal, Robert, and Lenore Jacobson. 1968. Pygmalion in the classroom: Teacher expectation and pupils’ intellectual development. New York: Holt, Rinehart & Winston. Rudner, Lawrence M., and William D. Schafer. 2001. Reliability: ERIC Digest No. ED458213. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation. Rutherford, Jr., Robert B., Mary Magee Quinn, and Sarup R. Mathur. 2004. Handbook of research in emotional and behavioral disorders. New York: Guilford Publications. Sherman, Thomas M., and William H. Cormier. 1974. An investigation of the inﬂuence of student behavior on teacher behavior. Journal of Applied Behavior Analysis 7(1): 11–21. doi:10.1901/jaba.1974.7-11 Stangor, Charles, Gretchen B. Sechrist, and John T. Jost. 2001. Changing racial beliefs by providing consensus information. Personality and Social Psychology Bulletin 27(4): 486–496. doi:10.1177/0146167201274009 Tourangeau, Karen, Christine Nord, Thanh Le, Alberto G. Sorongon, and Michelle Najarian. 2009. Combined user’s manual for the ECLS-K eighth-grade and K-8 full sam- ple data ﬁles and electronic codebooks. Alexandria, VA: National Center for Education Statistics. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . 
/ f / e d u e d p a r t i c e - p d l f / / / / / 9 3 3 3 4 1 6 9 1 3 8 7 e d p _ a _ 0 0 1 3 6 p d . f f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 371 ASSESSED BY A TEACHER LIKE ME Van Ewijk, Reyn. 2011. Same work, ers’ subjective assessments. Economics of Education Review 30(5): doi:10.1016/j.econedurev.2011.05.008 lower grade? Student ethnicity and teach- 1045–1058. Wilson, Robert J., and Rhonda L. Martinussen. 1999. Factors affecting the assessment of student achievement. Alberta Journal of Educational Research 45(3): 267–277. Wooldridge, Jeffrey M. 2002. Econometric analysis of cross section and panel data. Cam- bridge, MA: MIT Press. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / f / e d u e d p a r t i c e - p d l f / / / / / 9 3 3 3 4 1 6 9 1 3 8 7 e d p _ a _ 0 0 1 3 6 p d . f f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 372 Amine Ouazad image
