Amine Ouazad

INSEAD

Department of Economics

77300 Fontainebleau

France

amine.ouazad@insead.edu

ASSESSED BY A TEACHER LIKE ME: RACE AND TEACHER ASSESSMENTS

Abstract
Do teachers assess same-race students more favorably?
This paper uses nationally representative data on teacher
assessments of student ability that can be compared with
test scores to determine whether teachers give better
assessments to same-race students. The data set follows
students from kindergarten to grade 5, a period during
which racial gaps in test scores increase rapidly.
Teacher assessments comprise up to twenty items
measuring specific skills. Using a unique within-student
and within-teacher identification and while controlling
for subject-specific test scores, I find that teachers do
assess same-race students more favorably. Effects
appear in kindergarten and persist thereafter. Robustness
checks suggest that: student behavior does not explain
this effect; same-race effects are evident in teacher
assessments of most of the skills; grading “on the curve”
should be associated with lower assessments; and
measurement error in assessments or test scores does not
significantly affect the estimates.


doi:10.1162/EDFP_a_00136
© 2014 Association for Education Finance and Policy


1. INTRODUCTION
A growing body of research in education and psychology argues that minority
students receive less favorable feedback and less praise than do their white
peers (Meier, Stewart, and England 1989; Marcus, Gross, and Seefeldt 1991;
Casteel 1998; Van Ewijk 2011). The research is usually conducted on small
samples, which may cast doubt on the wider applicability of results obtained
for particular schools or school districts (i.e., on whether results are externally
valid; Carpenter, Harrison, and List 2005). In this paper I use a longitudinal
and nationally representative data set to measure whether or not teachers
assess same-race students more favorably. Field experiments with nationally
representative European data sets have recently measured whether teachers
assess minority students more favorably (Hinnerich, Höglin, and Johannesson
2011). In the United States, however, there are no nationally representative
data on teachers’ perceptions of same-race students’ skills. Analysis of the
National Educational Longitudinal Study of 1988 suggests that teachers have
more favorable perceptions of same-race students (Dee 2005), but in that study
the variables used to capture those perceptions (e.g., “constantly inattentive,”
“frequently disruptive,” “rarely completes homework”) are measures more of
student behavior than of student performance. Hence these data cannot be
used to infer a same-race effect because such teacher perceptions are not
comparable to test scores.

There is another reason why it is so difficult to measure whether teachers
assess same-race students more highly. Even if the researcher has comparable
teacher assessments of students and test scores, a finding that teachers give
better assessments to same-race students (conditional on test scores) cannot
be given a causal interpretation owing to possible confounding factors.
Causal effects can be estimated if the researcher randomizes the assignment
of teachers to students, but such randomization is a long and costly process
that is usually performed only for small, nonrepresentative samples.

These considerations leave the researcher in a quandary. On the one hand,
randomized samples with comparable teacher assessments and test scores
provide convincing evidence that teachers have more favorable perceptions
of same-race students’ skills, but randomized estimates are typically available
only for nonrepresentative samples of students. On the other hand, nationally
representative samples usually lack two important features: teacher
assessments of student performance that are comparable to test scores, and
randomized assignment of teachers to students.1

1. Lavy (2004) uses a nationally representative sample to estimate the impact of student gender on
grades at the high-school matriculation exam in Israel, but teacher assignments are not randomized.
Adding unique teacher identifiers to Lavy (2004) would also allow an identification strategy based
on comparisons of teacher assessments and test scores while controlling for teacher effects.


This paper uses a longitudinal, nationally representative data set, the Early
Childhood Longitudinal Study, Kindergarten Class of 1998–1999, which in-
cludes detailed teacher assessments and test scores—in both mathematics
and English—in each wave of data collection from kindergarten to grade 5.
The teacher assessments are available for both subjects, and there are as
many as ten questions on specific skills within each subject in each follow-up
(Tourangeau et al. 2009). Given these data, continuous teacher assessments
can be compared with test scores.2 Teachers are not randomly assigned to
students. Because the data set follows students through five follow-ups (from
kindergarten to grade 5) and includes teacher and student identifiers, however,
I am able to estimate the same-race effect on teacher assessments by
using a unique within-student (i.e., across grades3) and within-teacher
identification strategy that controls for student- and teacher-specific confounding
factors. The paper also describes several robustness checks, which indicate
that: (1) behavior does not explain the reported estimate of the same-race effect
on teacher assessments; (2) the same-race effect appears in kindergarten for
most skills that are assessed by the teacher; (3) grading on the curve within a
classroom would result in lower teacher assessments for same-race students;
and (4) measurement error in teacher assessments or in test scores has no
significant effect on the point estimates.

The within-student identification strategy yields the following result: A
student who moves from a same-race teacher in one grade to a different-race
teacher in the next grade encounters a significant drop in teacher assessments.4
Our second, within-teacher identification strategy compares the teacher as-
sessment of same-race students to the average teacher assessment in the
student’s classroom. I combine the within-student and within-teacher
identification strategies and condition the results on student test scores: being
assessed by a same-race teacher increases teacher assessments of student
performance by 4 percent of a standard deviation in English and by 7 percent of
a standard deviation in mathematics.

I design robustness checks to assess whether these results are consistent
with a teacher bias in favor of same-race students. One might object that
higher teacher assessments for same-race students reflect behavioral
differences. After all, teacher assessments of student performance do reflect, in part,

2. Tourangeau et al. (2009) mention that teacher assessments and test scores measure students’ skills
within the same broad curricular domains. Section 4 examines teachers’ perceptions of students
skill by skill—and as early as in kindergarten—for skills that are the most likely to be assessed by
test scores; the results are similar (if not stronger) same-race effects.

3. Also, the survey is designed in a way that facilitates test score comparisons across grades. The tests
consist of two stages: an initial routing test for student ability, and second-stage tests that include
questions common to multiple grades (Tourangeau et al. 2009).

4. All of these results are conditional on student test scores.


student behavior (Sherman and Cormier 1974). The within-student identifica-
tion strategy used here neutralizes the effect of permanent student behavioral
differences, but it cannot control for changes in student behavior that could
affect teacher assessments. Because the allocation of teachers to students is
not random,5 behavioral changes that raise teacher assessments may correlate
with being assigned to a same-race teacher in the subsequent grade. The data
set includes four reliable measures of student behavior that are based on the
Social Skills Rating System (Gresham and Elliott 1990). These measures vary
both across students and across grades. I do not find that behavioral differ-
ences between same-race and other-race students explain the within-student
and within-teacher estimates of same-race effects on teacher assessments. Nei-
ther do I find that changes in behavior from one grade to the next are associated
with the student moving from a same-race (other-race) teacher to an other-race
(same-race) teacher.

A second possible objection is that, as measures of student performance,
test scores are noisy and therefore may not fully condition for student
performance when assessing same-race effects on teacher assessments. In that
case, teacher assessments could be higher for same-race students simply
because same-race students perform better. Test scores and teacher assessments
are highly reliable, but the question is whether a small amount of measure-
ment error would be sufficient to confound the estimate of a same-race effect.
This paper calculates the impact of a given amount of measurement error in
test scores on the derived estimate of the same-race effect. A test score mea-
surement error of 50 percent would be required to account for the estimated
same-race effect.

The third major objection to this paper’s findings is that teacher assess-
ments may be an implicit ranking of students within a given classroom rather
than measures (e.g., test scores) based on a common scale. I have used a
simple statistical framework to show that, because minority students have (on
average) lower test scores than white students and because minority and white
students tend to be in different classrooms, grading on a curve would lead
to higher teacher assessments for minority students—even though minority
students have significantly (up to 40 percent of a standard deviation) lower
teacher assessments. Grading on a curve also would affect estimates of the
same-race effect if peer group composition were correlated with assignment
to a same-race teacher. Controlling for peers’ average test score in the main
specification does not affect my estimate of the same-race effect on teacher

5. For some evidence of nonrandom allocation of teachers to students, see Clotfelter, Ladd, and Vigdor (2005).


assessment. Moreover, assignment to a same-race teacher is not significantly
correlated with peers’ average test score.

My main finding—that students are assessed more highly by teachers of
their own race—is robust to the three objections just detailed. That finding is
of particular relevance if teacher assessments are shown to have an effect on
student achievement. Identifying the impact of teacher perceptions of student
skills on later test scores is difficult, and it has led to a large and somewhat
controversial literature in psychology and education (Rosenthal and Jacobson
1968; Jussim 1989; Jussim and Harber 2005). In the so-called Pygmalion
experiments, a random subset of students in a small sample of participating
schools is typically labeled “bloomers,” and the research focus is on estimat-
ing the effect of such information on student performance. In this paper’s
nationally representative data set, I find that previous assessments have a
significant impact on later test scores (after conditioning for student effects,
teacher effects, and grade effects).6 In fact, previous teacher assessments are
more strongly correlated with later test scores than are previous test scores.

The paper contributes to two separate literatures. First, it belongs to the
growing literature that documents same-race effects in a number of other
contexts. Price and Wolfers (2010) provide statistical evidence that National
Basketball Association referees favor players of their own race. In firms,
Giuliano, Levine, and Leonard (2009) found that white, Hispanic, and Asian
managers hire more whites and fewer blacks than do black managers. In the
data set of Giuliano, Levine, and Leonard (2011), employees have better
outcomes when they are the same race as their manager. The main contribution
of this paper to that literature is providing evidence of same-race effects on
perceptions in education while using a nationally representative data set and
novel robustness checks.

In studying teacher perceptions of student skills from kindergarten to
grade 5, this paper also adds to the literature on teachers’ perceptions of
minority students during their early years of schooling. The previous literature
on race and student assessment has used data for no earlier than grade 8 (Dee
2004). Racial test score gaps expand rapidly much sooner, however; Fryer and
Levitt (2004) document that, between the start of kindergarten and the end
of first grade, black students’ scores fall by 20 percent of a standard deviation
relative to white students with otherwise similar characteristics.

The conclusions reported in this paper should be of particular interest to
policy makers. First, teachers as a group are less diverse than the U.S. student
population. There is, in particular, a persistent gap between the percentage of

6. I also instrument the previous test score by lagged test scores to avoid biases stemming from
regression to the mean (see, e.g., Arellano and Bond 1991).


minority teachers and the percentage of minority students. Numerous papers
and reports have suggested improvements in the recruitment and retention
of minority teachers (Kirby, Berends, and Naftel 1999; Achinstein et al. 2010;
Ingersoll and May 2011). Second, the paper’s results suggest that involving
teachers in student assessments7 may affect those assessments in ways that
reflect racial perceptions. To ensure fairness, therefore, an assessment system
that involves teachers should exhibit an appropriate racial balance among
graders. Note also that an interesting area of research suggests that racial
perceptions are not fixed and can be significantly altered.8

The paper is structured as follows. Section 2 presents the data set and
descriptive evidence for higher teacher assessments of same-race students
(conditional on test scores). Section 3 presents the within-student and
within-teacher identification strategies separately before combining them to obtain
the paper’s baseline estimate. Section 4 discusses the three major objections as
well as two policy implications of the results on teacher assessments. Section 5
concludes.

2. DATA SET AND DESCRIPTIVE EVIDENCE
Structure of the Data Set

The data set is the Early Childhood Longitudinal Study, Kindergarten cohort
of 1998 (ECLS-K) from the National Center for Education Statistics, U.S.
Department of Education. The data follow a nationally representative sample
of 20,000 kindergarten students in fall and spring kindergarten 1998,
spring grade 1, spring grade 3, and spring grade 5. About a thousand schools
participated.

Overall, the design of the experiment is such that observations are mostly
missing at random. Follow-ups have combined procedures to reduce costs
and maintain the sample’s representativeness. Students who move to an-
other school are randomly subsampled to reduce costs, and new schools and
children have been added to the data set to strengthen the survey’s repre-
sentativeness. In the spring of 1999, some of the schools that had previ-
ously declined participation were included. The new participating children
rendered the cross-sectional sample representative of first-grade children, all
of whom were followed in the spring of grades 3 and 5. This paper uses weights

7. For example, Darling-Hammond and Pecheone (2010) argue that teachers should be integrally
involved in the scoring of assessments.

8. Stangor, Sechrist, and Jost (2001) show how informing participants that others hold different be-
liefs about African Americans changes their beliefs about that group. Lyons and Kashima (2003)
suggest that interpersonal communication figures strongly in maintaining stereotypes. An inter-
esting avenue for future research involves examining how colleagues’ perceptions may affect a
teacher’s perceptions—using data as in Jackson and Bruegmann (2009) but instead with teachers’
perceptions of student performance.


Table 1. Descriptive Statistics

                                        Mean     SD        Observations
Observations per Student                6.991    (2.020)   115,950
Observations per Teacher                8.198    (5.914)   115,950
Test Score
  English                               50.00    (10.00)    67,885
  Mathematics                           50.00    (10.00)    48,065
Teacher Assessment
  English                               50.00    (10.00)    67,885
  Mathematics                           50.00    (10.00)    48,065
Teacher Race (a)
  White, non-Hispanic                   0.809    (0.393)   115,950
  Black, non-Hispanic                   0.063    (0.244)   115,950
  Asian, non-Hispanic                   0.019    (0.135)   115,950
  Hispanic, any race                    0.052    (0.221)   115,950
  Other race, non-Hispanic              0.057    (0.232)   115,950
Student Race (a)
  White, non-Hispanic                   0.587    (0.492)   115,950
  Black, non-Hispanic                   0.137    (0.344)   115,950
  Asian, non-Hispanic                   0.057    (0.232)   115,950
  Hispanic, any race                    0.157    (0.364)   115,950
  Other race, non-Hispanic              0.062    (0.241)   115,950
Same-race Teacher by Student Race (b)   0.436    (0.496)   115,950
  White, non-Hispanic                   0.683    (0.465)   115,950
  Black, non-Hispanic                   0.188    (0.391)   115,950
  Asian, non-Hispanic                   0.069    (0.253)   115,950
  Hispanic, any race                    0.163    (0.369)   115,950
  Other race, non-Hispanic              0.056    (0.230)   115,950

(a) Other race, non-Hispanic includes Pacific Islanders, American Indians, and non-Hispanic students
reporting multiple races.
(b) Both of the same race, non-Hispanic, or Hispanic, any race.

provided by the survey’s designers to estimate representative effects, although
the analysis is robust to changes in weights.

Observations that lacked data on basic variables (test scores, subjective
assessments, teachers’ and children’s race and gender) were deleted.9 The
analysis in this paper is based on 48,065 observations in mathematics and
67,885 in English, numbers that are similar to Fryer and Levitt (2006).

The restricted-use version of the data set includes both student and teacher
identifiers. Hence, students can be followed across grades. Within each
follow-up, observations can be grouped by classroom using the teacher identifiers.
Table 1 shows that the data set includes about 6.9 observations per student (3.45

9. Results are robust to an alternative specification where missing observations are present with a
dummy variable indicating that the data are missing.


on average per student in each subject); the data set includes 8.2 observations
per teacher.

Test Scores and Teacher Assessments

Test scores are based on answers to multiple-choice questionnaires conducted
by external assessors. They conform to national and state standards.10 Overall,
tests ask more than seventy questions in English, and more than sixty questions
in mathematics. Skills covered by the English assessments from kindergarten
to fifth grade include: print familiarity, letter recognition, and beginning and
ending sounds; recognition of common words (sight vocabulary) and decod-
ing multisyllabic words; vocabulary knowledge, such as receptive vocabulary
and vocabulary in context; and passage comprehension. Skills covered by the
mathematics assessment include: number sense, properties, and operations;
measurement; geometry and spatial sense; data analysis, statistics, and
probability; and patterns, algebra, and functions. Test scores were standardized to a
mean of 50 and a standard deviation of 10 (table 1). Reliability measures based
on repeated estimates of test scores indicate that the tests are highly reliable;
Rasch coefficients range between 0.88 and 0.95, inclusive.
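The rescaling described above can be illustrated with a minimal sketch. The raw scores below are hypothetical, and the sketch assumes an unweighted population standard deviation; the source does not state whether survey weights enter the standardization.

```python
import numpy as np

def standardize(raw, mean=50.0, sd=10.0):
    """Rescale scores to a target mean and standard deviation.

    Assumption: unweighted mean and population SD (ddof=0).
    """
    raw = np.asarray(raw, dtype=float)
    z = (raw - raw.mean()) / raw.std()
    return mean + sd * z

# Hypothetical raw scores, for illustration only.
raw_scores = np.array([12.0, 18.0, 25.0, 31.0, 40.0])
scaled = standardize(raw_scores)
```

On this scale, a coefficient of 1 point corresponds to 10 percent of a standard deviation, which is how the regression estimates later in the paper are read.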

Teacher assessments of student skills11 are collected at approximately the
same time as the tests are taken. Up to the spring of grade 3, the same teacher
in English and in mathematics assesses students. A different teacher assesses
students in each grade. Teachers do not see the test results, so that test score
results do not directly affect teacher assessments. The user guide specifies that
“This is not a test and should not be administered directly to the child” (see,
e.g., the Spring 2004 Fifth Grade questionnaire12). Teachers complete
one questionnaire per student. There are three different teacher assessments:
for language and literacy, mathematical thinking, and general knowledge.
The current paper uses the English (language and literacy) and mathematics
(mathematical thinking) assessments, as there is no corresponding test score
for general knowledge. The instructions make it clear that these assessments
should not be administered as a test directly to the student. For English and
for mathematics, teachers answer seven to nine questions, for a total number
of fourteen to eighteen questions. Answers are on a 5-point scale: Not Yet,

10. These include the National Assessment of Educational Progress, the National Council of Teachers
of Mathematics, the American Association for the Advancement of Science, and the National
Academy of Sciences.
11. In the ECLS-K user guide, teacher assessments are also known as the academic rating scale.
12. Page 3 of the 2004 Grade 5 mathematics form: “Please rate this child’s skills, knowledge, and
behaviors in mathematics based on your experience with the child identified on the cover of this
questionnaire. This is NOT a test and should not be administered directly to the child. Each question
includes examples that are meant to help you think of the range of situations in which the child
may demonstrate similar skills and behaviors.”


Beginning, In Progress, Intermediate, and Proficient. An overall assessment
is computed for English and for mathematics. Teacher assessments, like test
scores, were standardized to a mean of 50 and a standard deviation of 10
(table 1). Reliability measures suggest that teacher assessments are highly
reliable; Rasch coefficients range between 0.87 and 0.94.

Descriptive Evidence of Same-Race Effects on Teacher Assessments

The restricted-use version of the ECLS-K reports teachers’ and students’ race
and gender. The survey combines race and ethnicity for teachers. “Hispanic,
any race” is one category, and others are “White, any race,” “Black, any race,”
and so on. The survey does distinguish race and ethnicity for students,
however. The two variables for students’ race and ethnicity were hence
combined to match the single teacher’s race and ethnicity variable. Hence “same
race” should be read as “same race (non-Hispanic) or both Hispanic (any
race).”13

The data set oversamples students from racial and ethnic minorities to
increase the precision of the estimates. In the data set, 14 percent are black
students, 16 percent are Hispanic students, and 6 percent are Asian students.
There are significantly more white teachers than white students as a fraction of
the observations, and significantly fewer black, Hispanic, and Asian teachers
compared with the corresponding fractions of black, Hispanic, and Asian
students. Hence a white student is significantly more likely to be assessed by
a same-race teacher than a black, Hispanic, or Asian student.

Figure 1 presents the average teacher assessments at each test score level,
for students assessed by a same-race teacher and for students assessed by a
teacher of another race. Each line is a local polynomial regression of teacher
assessments on test scores;14 the solid line (the dashed line) is estimated on
observations for students assessed by a same-race teacher (a teacher of another
race). The two graphs suggest that, at most test score levels, students have on
average higher teacher assessments when assessed by a same-race teacher.
The gap appears larger for Hispanic students (bottom graph) than for black
students (top graph).

13. Also the student’s race variable follows the 1997 U.S. Revisions to the Standards for the Classification
of Federal Data on Race and Ethnicity published by the Office of Management and Budget,
which allow for the possibility of specifying “more than one race.” However, the share of multira-
cial students is small. Multiracial students are classified as “Other race,” but results are robust to
alternative classifications.

14. Figure generated with local mean smoothing with 500 points, Epanechnikov kernel, and optimal
half-width. The gap is robust to a variety of number of points, kernels, and half-width sizes.


Figure 1. Descriptive Evidence of the Same-Race Effect in (A) Black Students and (B) Hispanic
Students. Notes: Each panel plots a local polynomial regression of teacher assessments on test
scores, using an Epanechnikov kernel, 500 points, and optimal half-width. The gap between the two
curves is present even when changing the type of the kernel, number of points, and the half-width.
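The smoothing behind the figure can be sketched as a kernel-weighted local mean (local polynomial regression of degree zero) with an Epanechnikov kernel evaluated on 500 grid points. Everything below is an illustrative assumption rather than the paper's procedure: the data are simulated, the gap of 2 points is hypothetical, and the half-width is fixed at 5 instead of being chosen optimally.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_mean(x, y, grid, half_width):
    """Kernel-weighted local mean of y at each grid point."""
    u = (x[None, :] - grid[:, None]) / half_width   # shape (grid, obs)
    w = epanechnikov(u)
    total = w.sum(axis=1)
    safe_total = np.where(total > 0, total, 1.0)     # avoid division by zero
    smoothed = (w * y[None, :]).sum(axis=1) / safe_total
    return np.where(total > 0, smoothed, np.nan)

# Simulated data (hypothetical; not the ECLS-K sample).
rng = np.random.default_rng(0)
n = 5_000
test_score = 50.0 + 10.0 * rng.standard_normal(n)
same_race = rng.random(n) < 0.5
gap = 2.0  # hypothetical same-race gap in assessment points
assessment = (50.0 + 0.6 * (test_score - 50.0) + gap * same_race
              + 5.0 * rng.standard_normal(n))

grid = np.linspace(35.0, 65.0, 500)   # 500 evaluation points
half_width = 5.0                      # fixed for illustration, not "optimal"
curve_same = local_mean(test_score[same_race], assessment[same_race],
                        grid, half_width)
curve_other = local_mean(test_score[~same_race], assessment[~same_race],
                         grid, half_width)
mean_gap = float(np.mean(curve_same - curve_other))
```

Plotting `curve_same` and `curve_other` against `grid` gives the two-line comparison that the figure describes; `mean_gap` summarizes the average vertical distance between them.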

An ordinary least squares (OLS) regression estimates the average effect of
same-race teachers on the difference between teacher assessments and test
scores, and provides confidence intervals:

\[
\begin{aligned}
TA_{i,f,t} - TS_{i,f,t} ={}& \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} \\
&+ \text{Student Characteristics}_{i}\,\beta \\
&+ \text{Teacher Characteristics}_{i,f,t}\,\gamma \\
&+ \text{Grade}_{t} + \varepsilon_{i,f,t}
\end{aligned}
\tag{1}
\]


where i indexes students, f the subject area (mathematics or English), and t
the wave of the longitudinal data (t ∈ {fall kindergarten, spring kindergarten,
spring grade 1, spring grade 3, spring grade 5}). $TA_{i,f,t}$ is the standardized
teacher assessment, and $TS_{i,f,t}$ is the standardized test score. $\text{Same Race}_{i,f,t}$
is a dummy set to 1 if student i in subject f in wave t was assessed by a
same-race teacher. $\text{Student Characteristics}_{i}$ is a vector of dummies for the student’s
gender and race. $\text{Teacher Characteristics}_{i,f,t}$ is a vector of dummies for the race and gender of student
i’s teacher in subject f in wave t. $\text{Grade}_{t}$ is a grade effect, and $\varepsilon_{i,f,t}$ is the residual,
clustered by student.15

The regression is performed separately for English and for mathematics.
Throughout the paper, I also present the regression with the teacher assess-
ment as the dependent variable, and the test score as a control. While the
regression with the test score as an explanatory variable corresponds to the
concept of conditional bias (Ferguson 2003), putting the test score on the right-
hand side means that the estimate of the coefficient of the same-race dummy
may capture measurement error in test scores. Specification 1 has both teacher
assessment and test score on the left-hand side, which substantially alleviates
any bias caused by measurement error.
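The measurement-error argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's estimation: the data are simulated, the true coefficient on skill is set to one, the measurement error is classical, and same-race assignment is made to correlate with true skill. The helper `ols` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (not the ECLS-K data).
same_race = rng.binomial(1, 0.5, size=n).astype(float)
# True skill is allowed to correlate with same-race assignment.
skill = 50.0 + 2.0 * same_race + 10.0 * rng.standard_normal(n)
delta_true = 0.5
teacher_assessment = (skill + delta_true * same_race
                      + 5.0 * rng.standard_normal(n))
# Observed test score = true skill + classical measurement error.
test_score = skill + 5.0 * rng.standard_normal(n)

def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Test score on the right-hand side: its coefficient is attenuated by
# measurement error, so the same-race dummy absorbs part of the skill gap.
delta_rhs = ols(teacher_assessment, same_race, test_score)[1]

# Difference on the left-hand side, as in specification 1: the measurement
# error moves into the residual and is uncorrelated with the dummy here.
delta_lhs = ols(teacher_assessment - test_score, same_race)[1]
```

In this simulated setting `delta_lhs` recovers a value close to `delta_true`, while `delta_rhs` is inflated. The result depends on the assumed unit coefficient on skill; it is meant only to show the direction of the bias discussed in the text.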

The OLS regression suggests that a student assessed by a same-race teacher
gets a teacher assessment that is about 2.8 percent to 5.7 percent of a standard
deviation higher in mathematics, and 4.3 percent to 6.7 percent of a standard
deviation higher in English (table 2). In this specification, the test score as an
explanatory variable explains only 34.8 to 44 percent of the variance of teacher
assessments.
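As a worked check of the units, the percentages follow from dividing the same-race coefficients reported in table 2 by the standard deviation of 10 to which assessments are standardized:

\[
\frac{0.281}{10} \approx 2.8\%, \qquad
\frac{0.566}{10} \approx 5.7\%, \qquad
\frac{0.428}{10} \approx 4.3\%, \qquad
\frac{0.665}{10} \approx 6.7\%.
\]

The first two values are the mathematics estimates and the last two are the English estimates.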

3. IDENTIFICATION STRATEGY
Within-Student Identification: Using Student Mobility from/to a Same-Race Teacher

In the descriptive evidence that was presented in the previous section, the
OLS estimate of the same-race effect may be biased because a number of
student-specific variables are omitted from the regression.

For example, the literature suggests that teacher perceptions of student
performance might depend on a number of characteristics other than student
race: student behavior (Sherman and Cormier 1974), language (Gluszek and
Dovidio 2010), parental involvement (Wilson and Martinussen 1999), student
academic engagement (Hughes and Kwok 2007), and other factors. None of
these variables is measured by test scores or reflects racial perceptions per se.

15. Clustering by classroom, by student, or two-way clustering (Cameron, Gelbach, and Miller 2011) by
both student and classroom has little impact on the standard errors. Because two-way clustering
with two-way fixed effects (used later in section 3) does not yet exist in the literature, I chose to
present standard errors clustered by student. Clustering by classroom yields very similar standard
errors in all specifications.


Table 2. OLS Regressions

                        Mathematics                          English
               (1)           (2)                    (3)           (4)
               Teacher       Teacher Assessment     Teacher       Teacher Assessment
               Assessment    - Test Score           Assessment    - Test Score

Same-Race      0.281         0.566**                0.428**       0.665**
               (0.118)       (0.131)                (0.093)       (0.122)
Test Score     0.591**                              0.659**
               (0.004)                              (0.003)

Controls       Student and teacher race and gender, grade effects

Observations   48,065        48,065                 67,855        67,855
Students       20,252        20,252                 20,252        20,252
Teachers       5,297         5,297                  5,496         5,496
R2             0.348         0.034                  0.436         0.029
F Statistic    1,218.5       85.3                   2,501.1       68.9

Notes: Standard errors clustered by student. Clustering by classroom yields similar significance
levels. Test scores and teacher assessments are standardized to a mean of 50 and a standard
deviation of 10.
*Statistically significant at the 5% level; **statistically significant at the 1% level.

Identifying the specific effect of the student’s race requires a more complete
specification than equation 1, one that at least controls for student-specific
omitted variables. Such omitted variables will confound the estimate of the
same-race effect if teachers and students are non-randomly matched.

Assume that the teacher assessment incorporates a measure of the test
score, captures a same-race bias, and also reflects student-specific omitted variables:

\[
\begin{aligned}
TA_{i,f,t} ={}& \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha\,\text{Test Score}_{i,f,t} + \text{Grade}_{t} \\
&+ \text{Controls}_{i,f,t} + \text{Student Omitted Variable}_{i,f,t} + \text{Residual}_{i,f,t}
\end{aligned}
\tag{2}
\]

with the same notation as in specification 1, and
$\varepsilon_{i,f,t} = \text{Student Omitted Variable}_{i,f,t} + \text{Residual}_{i,f,t}$.
$\text{Controls}_{i,f,t}$ is a set of dummies for the teacher’s
race and gender. If student-specific omitted variables that have a positive
impact on the teacher assessment are correlated with assignment to a same-race
teacher, the effect δ of a same-race teacher on assessments is overestimated.
In other words, if assignment to teachers depends on unobservables that affect
teacher assessments, the estimate of the same-race effect is biased. Student-specific omitted
variables that are not correlated with same-race assignments will also imply a
correlation of residuals common to a given student, that is,
$\mathrm{Corr}(\varepsilon_{i,f,t}, \varepsilon_{i,f',t'})$ is

not equal to 0, and standard errors will need to be corrected for student-level clustering.16

If student-specific omitted variables do not vary across grades,17 specification 2 can be estimated using a student fixed effect $\text{Student}_{i,f}$:

\[
TA_{i,f,t} = \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \cdot \text{Test Score}_{i,f,t} + \text{Controls}_{i,f,t} + \text{Student}_{i,f} + \text{Grade}_{t} + \text{Residual}_{i,f,t}
\tag{3}
\]

which is estimated using either a set of student dummies or first differences. A major advantage of the dummy variable approach is that it allows us to recover an estimate of the student unobservables $\text{Student}_{i,f}$; using this estimate we can check whether there is a significant correlation between assignment to a same-race teacher and student unobservables. Specification 3 can also be estimated in first differences,18 that is, using a within-student regression:

\[
\begin{aligned}
TA_{i,f,t+1} - TA_{i,f,t} = {} & \delta (\text{Same Race}_{i,f,t+1} - \text{Same Race}_{i,f,t}) \\
& + (\text{Controls}_{i,f,t+1} - \text{Controls}_{i,f,t}) \\
& + \alpha (\text{Test Score}_{i,f,t+1} - \text{Test Score}_{i,f,t}) \\
& + (\text{Grade}_{t+1} - \text{Grade}_{t}) + (\text{Residual}_{i,f,t+1} - \text{Residual}_{i,f,t}).
\end{aligned}
\tag{4}
\]

The first-differenced specification makes clear that the identification of the same-race effect δ relies on student mobility from/to a same-race teacher. The effect of a same-race teacher is estimated without bias if the mobility of a student from a teacher of the same race (another race) in one grade to a teacher of another race (the same race) in the next grade is uncorrelated with time-varying student unobservables that have an impact on test scores, that is, $\mathrm{Corr}\big((\text{Same Race}_{i,f,t+1} - \text{Same Race}_{i,f,t}), (\text{Residual}_{i,f,t+1} - \text{Residual}_{i,f,t})\big) = 0$. Student behavior is one such time-varying unobservable that may affect teacher assessments and is potentially correlated with student mobility to/from

16. Specifically, $\mathrm{Cov}(\varepsilon_{i,f,t}, \varepsilon_{i,f',t'}) = \mathrm{Cov}(\text{Student Omitted Variable}_{i,f,t}, \text{Student Omitted Variable}_{i,f',t'})$ for $f \neq f'$ and for $t \neq t'$. If student-specific omitted variables are constant across grades, then $\mathrm{Cov}(\varepsilon_{i,f,t}, \varepsilon_{i,f,t'}) = \mathrm{Var}(\text{Student Omitted Variable}_{i,f})$ and the correlation of residuals for a given student across grades will be equal to the ratio of the variance of student unobservables to the overall variance of the residuals (Moulton 1990).

17. $\text{Student Omitted Variable}_{i,f,t} = \text{Student Omitted Variable}_{i,f,t'}$ for any $t$, $t'$.
18. Both approaches (student dummies and first-differenced specification) are equivalent with a large number of observations as long as the strict exogeneity assumption is satisfied (Baltagi 2008), that is, $E(\text{Residual}_{i,f,t} \mid X_{i,f,1}, X_{i,f,2}, \ldots, X_{i,f,5}) = 0$, where 1, 2, …, 5 indexes waves of the survey, and $X_{i,f,t}$ denotes the vector of explanatory variables for student i in subject area f, in grade t (constant, same-race dummy, test score, and grade dummies).


a teacher of the same race. I discuss the impact of behavior on estimates in section 4.
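As a rough illustration of the two ways of estimating specification 3, the sketch below simulates a small panel and compares pooled OLS, the within-student (demeaned) estimator, which gives the same slope estimates as including student dummies, and the first-difference estimator. All names and parameter values (`simulate_panel`, `delta=0.5`, the sorting rule, the noise scales) are assumptions made for the example and are not taken from the ECLS-K data; grade effects and controls are left out for brevity.

```python
import numpy as np


def simulate_panel(n_students=20000, n_grades=3, delta=0.5, alpha=0.6, seed=0):
    """Simulate assessments with a student effect that is positively
    correlated with assignment to a same-race teacher (hypothetical)."""
    rng = np.random.default_rng(seed)
    student_effect = rng.normal(0.0, 5.0, size=(n_students, 1))
    # Assumed sorting rule: a higher student effect raises the probability
    # of being matched to a same-race teacher in every grade.
    p_same = 1.0 / (1.0 + np.exp(-student_effect / 5.0))
    same_race = (rng.random((n_students, n_grades)) < p_same).astype(float)
    test_score = 50.0 + student_effect + rng.normal(0.0, 8.0, size=(n_students, n_grades))
    residual = rng.normal(0.0, 3.0, size=(n_students, n_grades))
    ta = 20.0 + delta * same_race + alpha * test_score + student_effect + residual
    return ta, same_race, test_score


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def pooled_ols(ta, same_race, test_score):
    """OLS without student effects; returns [delta, alpha]."""
    X = np.column_stack([np.ones(ta.size), same_race.ravel(), test_score.ravel()])
    return ols(ta.ravel(), X)[1:]  # drop the constant


def demean_by_student(a):
    """Subtract each student's mean across grades (rows are students)."""
    return (a - a.mean(axis=1, keepdims=True)).ravel()


def within_student(ta, same_race, test_score):
    """Student fixed effects via within-student demeaning."""
    X = np.column_stack([demean_by_student(same_race), demean_by_student(test_score)])
    return ols(demean_by_student(ta), X)


def first_difference(ta, same_race, test_score):
    """First differences across consecutive grades."""
    X = np.column_stack([np.diff(same_race, axis=1).ravel(),
                         np.diff(test_score, axis=1).ravel()])
    return ols(np.diff(ta, axis=1).ravel(), X)


if __name__ == "__main__":
    ta, same_race, test_score = simulate_panel()
    for name, estimator in [("pooled OLS", pooled_ols),
                            ("within-student", within_student),
                            ("first difference", first_difference)]:
        d, a = estimator(ta, same_race, test_score)
        print(f"{name}: delta = {d:.3f}, alpha = {a:.3f}")
```

In this simulated setting the pooled estimate of δ is pushed upward by the omitted student effect, while the two within-student estimators should land close to the assumed value; the direction of the bias depends entirely on the assumed sorting rule.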

Because identification relies on student mobility across teachers, it is important to check that a sufficient number of students move to teachers of different races. Otherwise identification would rely on a small number of students who move from/to a teacher of the same race.19 There are a large number of such moves: 51 percent of students experience mobility from/to a same-race teacher at some point between kindergarten and grade 5, and the sample of movers is balanced in terms of race, gender, and parental income.20

Columns (1) and (4) of table 3 present the estimation of the first-differenced specification 4 in mathematics and in English, with standard errors clustered by student.21 Being assessed by a teacher of the same race raises teacher assessments by 3.5 percent of a standard deviation in mathematics and by 4.3 percent in English. The specification has fewer observations because the number of observations is equal to the number of first-differenced teacher assessments. Columns (2) and (5) present results of the estimation of specification 3, which includes a student fixed effect. Being assessed by a teacher of the same race raises assessments by 7 percent of a standard deviation in mathematics and by 4.8 percent of a standard deviation in English. The regression is strongly significant with an F statistic of 82.6. Importantly, there is a significantly positive correlation between the estimated student effects and assignment to a same-race teacher both in mathematics and in English, which indicates that the regression without student fixed effects underestimates the impact of a same-race teacher on assessments. Columns (3) and (6) regress the difference between the teacher assessment and the test score on the explanatory variables. Estimates of the same-race effect are comparable to columns (2) and (5) of the same table.

Within-Classroom Identification

Teacher-specific omitted variables may also confound the estimate of the same-race effect. Although OLS specification 1 controls for teachers’ race and gender, other teacher characteristics, imperfectly correlated with race and gender, affect teacher assessments. For example, Figlio and Lucas (2004) find that some teachers give higher average grades regardless of their students’ ability, race, or gender. Such variation in average assessments across classrooms should

19. In general, if a covariate does not vary for a given student in a panel data regression with student fixed effects, the student’s observations will not contribute to the estimation of the effect (Wooldridge 2002).

20. At each parental income level, from 41 percent to 52 percent of students experience a transition from/to a same-race teacher. Statistics available on request.

21. Clustering either by classroom, by student, or by both classroom and student (Cameron, Gelbach, and Miller 2011) does not significantly affect the estimated standard errors.


Table 3. Results with the First-Differenced Specification and with Student Fixed Effects: Mathematics and English
[Rotated table; cell values not recoverable from the extracted text. Columns (1)–(3) are for mathematics and columns (4)–(6) for English: columns (1) and (4) use the first-differenced specification, columns (2) and (5) include student fixed effects, and columns (3) and (6) use the teacher assessment minus the test score as the dependent variable.]


be controlled for in specification 1, as the nonrandom sorting of teachers to students implies that the teacher’s average assessment may be correlated with assignment to a same-race student.

All these teacher-specific omitted variables enter in the determination of teacher assessments:

\[
\begin{aligned}
TA_{i,f,t} = {} & \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \cdot \text{Test Score}_{i,f,t} \\
& + \text{Teacher Omitted Variable}_{i,f,t} + \text{Controls}_{i,f,t} \\
& + \text{Grade}_{t} + \text{Residual}_{i,f,t}.
\end{aligned}
\tag{5}
\]

Teacher omitted variables ($\text{Teacher Omitted Variable}_{i,f,t}$), if correlated positively with assignment to a same-race teacher ($\text{Same Race}_{i,f,t}$), lead to an upward bias in the estimate δ of the same-race effect. The presence of teacher-specific omitted variables also implies a correlation of residuals in the OLS specification across observations of the same classroom, and standard errors should be corrected for clustering at the classroom level.22 Because of the large number of fixed effects (6,093 teachers), a specification like specification 5 is usually estimated by taking the within-classroom difference of teacher assessments, test scores, and each covariate of the specification:

\[
\begin{aligned}
TA_{i,f,t} - E(TA_{\cdot,f,t} \mid \text{classroom}) = {} & \delta \cdot \big(\text{Same Race}_{i,f,t} - E(\text{Same Race}_{\cdot,f,t} \mid \text{classroom})\big) \\
& + \alpha \cdot \big(TS_{i,f,t} - E(TS_{\cdot,f,t} \mid \text{classroom})\big) \\
& + \big(\text{Controls}_{i,f,t} - E(\text{Controls}_{\cdot,f,t} \mid \text{classroom})\big) \\
& + \text{Residual}'_{i,f,t}.
\end{aligned}
\tag{6}
\]

where $TS$ denotes the test score and $E(x_{\cdot,f,t} \mid \text{classroom})$ is the average of covariate x in the classroom of student i in subject f in year t. The within-classroom specification makes it clear that the identification relies on comparing the teacher assessment $TA_{i,f,t}$ of a student to the average teacher assessment $E(TA_{\cdot,f,t} \mid \text{classroom})$ in the classroom. A classroom contributes to the identification of the same-race effect if it has both same-race and other-race students.23 Fortunately, 97.2 percent of the classrooms of the sample have observations of same-race and other-race students, and 44 percent of students are of the same race as their teacher on average.
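The within-classroom transformation of specification 6 can be sketched as follows. This is a minimal illustration on simulated data, not the paper’s code: the function names, the sorting rule linking teacher leniency to the share of same-race students, and all parameter values are assumptions, and teacher controls are omitted.

```python
import numpy as np


def group_demean(values, groups):
    """Subtract each group's mean; `groups` holds integer ids 0..G-1."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / np.maximum(counts, 1))[groups]


def within_classroom(ta, same_race, test_score, classroom):
    """OLS on classroom-demeaned variables; returns [delta, alpha]."""
    y = group_demean(ta, classroom)
    X = np.column_stack([group_demean(same_race, classroom),
                         group_demean(test_score, classroom)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def share_mixed_classrooms(same_race, classroom):
    """Share of classrooms with both same-race and other-race students,
    i.e., classrooms that contribute to identifying delta."""
    counts = np.bincount(classroom)
    n_same = np.bincount(classroom, weights=same_race)
    return np.mean((n_same > 0) & (n_same < counts))


def simulate_classrooms(n_classrooms=2000, class_size=20,
                        delta=0.5, alpha=0.6, seed=0):
    """Simulated classrooms with a teacher effect (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    classroom = np.repeat(np.arange(n_classrooms), class_size)
    teacher_effect = rng.normal(0.0, 3.0, size=n_classrooms)
    # Assumed sorting: more lenient teachers have fewer same-race students.
    p_same = np.clip(0.45 - 0.05 * teacher_effect, 0.05, 0.95)
    same_race = (rng.random(classroom.size) < p_same[classroom]).astype(float)
    test_score = rng.normal(50.0, 10.0, size=classroom.size)
    ta = (20.0 + delta * same_race + alpha * test_score
          + teacher_effect[classroom]
          + rng.normal(0.0, 3.0, size=classroom.size))
    return ta, same_race, test_score, classroom


if __name__ == "__main__":
    ta, same_race, test_score, classroom = simulate_classrooms()
    d, a = within_classroom(ta, same_race, test_score, classroom)
    print(f"within-classroom delta = {d:.3f}, alpha = {a:.3f}")
    print(f"share of mixed classrooms = {share_mixed_classrooms(same_race, classroom):.3f}")
```

Demeaning removes anything constant within a classroom, including the teacher effect, so only classrooms with variation in the same-race dummy carry information about δ.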

22. Throughout the paper I cluster standard errors at the student level, but clustering at the classroom
level or two-way clustering at the student and classroom levels (Cameron, Gelbach, and Miller 2011)
yields similar significance levels.

23. Formally, if the value of $\text{Same Race}_{i,f,t} - E(\text{Same Race}_{\cdot,f,t} \mid \text{classroom})$ varies within a classroom.


Specification 5 can also be estimated by including a set of teacher fixed effects, namely, one dummy for each teacher of the sample:

\[
\begin{aligned}
TA_{i,f,t} = {} & \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \cdot \text{Test Score}_{i,f,t} \\
& + \text{Teacher Effect}_{i,f,t} \\
& + \text{Grade}_{t} + \text{Residual}_{i,f,t}.
\end{aligned}
\tag{7}
\]

Both approaches (specifications 6 and 7) yield the same estimate with a large number of observations (Baltagi 2008).24 The advantage of such a specification is that it allows us to recover an estimate of the teacher effect. In all waves except the spring grade 5 follow-up, the same teacher assesses students in English and mathematics, but separate teacher effects are estimated for English and for mathematics.

Columns (1) and (4) of table 4 show the results of the within-classroom specification 6. Students assessed by a teacher of the same race have higher teacher assessments, by 4.1 percent of a standard deviation in English and 5.5 percent in mathematics. All results are significant at 1 percent. Interestingly, test scores and observable controls explain 34 percent of the variance of teacher assessments. Columns (2) and (5) present results of the estimation of specification 7, which includes teacher effects. The point estimates are larger than in the within-teacher approach, but they are not statistically different from the estimates of columns (1) and (4). Having a same-race teacher raises teacher assessments by 6.9 percent of a standard deviation in English and 7.0 percent of a standard deviation in mathematics. The specification allows us to estimate that teacher effects are significant (the null hypothesis that teacher effects are equal to zero is rejected), indicating that teacher unobservables play a role in assessments. Moreover, being assessed by a same-race teacher is negatively correlated with the teacher effect (especially in mathematics), and we indeed observe a downward bias: the OLS estimation of the same-race effect without teacher effects in columns (1) and (3) of table 2 is lower than the estimates of columns (2) and (5) of table 4. Finally, results available on request show that teacher unobservables are not accounted for by the teacher’s race, gender, experience, or tenure.

Combining the Within-Student and Within-Classroom Identification Strategies

Finally, I combine the two former identification strategies to control for both student-specific and teacher-specific omitted variables. My preferred

24. That is, both estimators converge in probability to the same estimate under the assumption that residuals are strictly exogenous within each classroom, that is, $E(\text{Residual}'_{i,f,t} \mid X_{i,f,t}) = 0$, where $X_{i,f,t}$ is the vector of explanatory (right-hand side) variables in specification 6.


Table 4. Results of the Within-Teacher Estimation, of the Teacher Fixed Effects Specification, and Combining Student and Teacher Fixed Effects: Mathematics and English
[Rotated table; cell values not recoverable from the extracted text. Columns (1) and (4) report the within-classroom specification 6, columns (2) and (5) the teacher fixed effects specification 7, and columns (3) and (6) the specification with both student and teacher effects.]


estimate is thus the same-race coefficient δ in the regression that controls for both teacher effects and student effects:

\[
\begin{aligned}
TA_{i,f,t} = {} & \text{constant} + \delta \cdot \text{Same Race}_{i,f,t} + \alpha \cdot \text{Test Score}_{i,f,t} \\
& + \text{Teacher Effect}_{i,f,t} + \text{Student Effect}_{i} \\
& + \text{Grade}_{t} + \text{Residual}_{i,f,t}
\end{aligned}
\tag{8}
\]

where the teacher effect ($\text{Teacher Effect}_{i,f,t}$) and the student effect ($\text{Student Effect}_{i}$) are estimated by including a set of dummies for teachers and a set of dummies for students as controls. The large number of students (21,409) and the large number of teachers (6,093) make it necessary to estimate the model using econometric techniques pioneered by Abowd, Creecy, and Kramarz (2002) and Abowd, Kramarz, and Margolis (1999) in the labor economics employer–employee literature. The technique provides estimates for all student effects, teacher effects, grade effects, and same-race and test score coefficients. Standard errors are clustered at the student level; clustering by classroom yields similar standard errors.
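To make the mechanics of a two-way (student and teacher) fixed effects regression concrete, the sketch below uses alternating projections: each variable is repeatedly demeaned by student and by teacher until it stops changing, and OLS is then run on the residualized variables. This is a simpler stand-in and not the estimator of Abowd, Creecy, and Kramarz (2002); by the Frisch-Waugh-Lovell theorem it gives the same coefficients on the same-race dummy and the test score as including both sets of dummies, but it does not return the student and teacher effects themselves. All names, the simulated sorting rule, and parameter values are assumptions for the example.

```python
import numpy as np


def group_demean(values, groups):
    """Subtract each group's mean; `groups` holds integer ids 0..G-1."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / np.maximum(counts, 1))[groups]


def sweep_out_two_way(values, student, teacher, tol=1e-10, max_iter=1000):
    """Remove student and teacher means by alternating projections."""
    v = np.asarray(values, dtype=float).copy()
    for _ in range(max_iter):
        previous = v
        v = group_demean(group_demean(v, student), teacher)
        if np.max(np.abs(v - previous)) < tol:
            break
    return v


def two_way_fe(ta, same_race, test_score, student, teacher):
    """Coefficients [delta, alpha] with student and teacher effects absorbed."""
    y = sweep_out_two_way(ta, student, teacher)
    X = np.column_stack([sweep_out_two_way(same_race, student, teacher),
                         sweep_out_two_way(test_score, student, teacher)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_two_way(n_students=6000, n_teachers=600, n_grades=4,
                     delta=0.5, alpha=0.6, seed=0):
    """Simulated student-teacher matches (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    student = np.repeat(np.arange(n_students), n_grades)
    teacher = rng.integers(0, n_teachers, size=student.size)
    student_effect = rng.normal(0.0, 5.0, size=n_students)
    teacher_effect = rng.normal(0.0, 3.0, size=n_teachers)
    # Assumed sorting on both unobserved effects.
    index = 0.1 * student_effect[student] - 0.2 * teacher_effect[teacher]
    same_race = (rng.random(student.size) < 1.0 / (1.0 + np.exp(-index))).astype(float)
    test_score = (50.0 + student_effect[student]
                  + rng.normal(0.0, 8.0, size=student.size))
    ta = (delta * same_race + alpha * test_score
          + student_effect[student] + teacher_effect[teacher]
          + rng.normal(0.0, 3.0, size=student.size))
    return ta, same_race, test_score, student, teacher


if __name__ == "__main__":
    data = simulate_two_way()
    d, a = two_way_fe(*data)
    print(f"two-way FE: delta = {d:.3f}, alpha = {a:.3f}")
```

The effects are separately identified only because students move across teachers; in the simulation each student meets several randomly drawn teachers, which keeps the student-teacher network connected.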

Columns (3) and (6) present the estimates. Teachers give better assessments to students of their own race; the effect is 7.1 percent of a standard deviation in mathematics and 4.4 percent of a standard deviation in English. Teacher and student effects are significant.

4. DISCUSSION OF THE FINDINGS
Behavior and Assessments

Teacher assessments of student performance are partly determined by student behavior (Sherman and Cormier 1974). Column (1) (respectively, column (2)) of table 5 shows a regression of mathematics teacher assessments (respectively, English teacher assessments) on four behavioral measures.

The four behavioral measures come from a separate questionnaire in each wave of the study. Teachers reported the measures in terms of the social rating scale: approaches to learning, interpersonal skills, externalizing problem behaviors, and internalizing problem behaviors. The scale for approaches to learning measures the ease with which children can benefit from their learning environment. The interpersonal skills scale rates the child’s skill in forming and maintaining friendships; getting along with people who are different; comforting or helping other children; expressing feelings, ideas, and opinions in positive ways; and showing sensitivity to the feelings of others. The externalizing problem behaviors scale (i.e., impulsive/overactive scale) addresses acting-out behaviors, and the internalizing problem behaviors scale addresses evidence of anxiety, loneliness, low self-esteem, or sadness.


Table 5. Behavior and Assessments

                                      (1)            (2)            (3)           (4)
                                      Mathematics    English
                                      Teacher        Teacher
                                      Assessment     Assessment     Same Race     Same Race

Same Race                             0.707∗∗        0.419∗∗
                                      (0.199)        (0.134)
Test Score                            0.207∗∗        0.265∗∗        −0.001        0.001
                                      (0.008)        (0.006)        (0.001)       (0.000)
Approaches to Learning                0.267∗∗        0.298∗∗        0.001∗∗       −0.001+
                                      (0.008)        (0.004)        (0.001)       (0.000)
Interpersonal Skills                  0.042∗∗        0.035∗∗        −0.001        0.000
                                      (0.007)        (0.004)        (0.001)       (0.000)
Externalizing Problem Behavior        0.045∗∗        0.035∗∗        −0.001        0.001
                                      (0.012)        (0.003)        (0.001)       (0.001)
Internalizing Problem Behavior        −0.040∗∗       −0.058∗∗       −0.001∗∗      0.001∗∗
                                      (0.006)        (0.005)        (0.000)       (0.000)

Student and Teacher Race and Gender                                 Yes           Yes
Student Effects                       Yes            Yes                          Yes
Teacher Effects                       Yes            Yes
F Statistic                           4.62                          4,249.2       26.66
R2                                    0.73           0.80           0.59          0.79
Observations                          48,065         67,855         67,855a       67,855a

aRegression performed using English observations. Students are assessed by the same teacher in English and mathematics from kindergarten to grade 3, and different teachers in grade 5. Similar results hold when estimating the regression with mathematics observations.
∗∗Statistically significant at the 1% level. All specifications include grade effects. Standard errors clustered by student. Clustering by classroom yields similar estimates.

The measures of behavior vary substantially, both across students and, for a given student, across time. On the interpersonal skills scale, 50.1 percent of the variance is explained by within-student variance, and the behavioral measure in the previous wave of the study explains about 31 percent of the variance of the behavioral measure of the next grade.

In column (1) of table 5, the teacher assessment in mathematics is regressed on the mathematics test score, the same-race dummy, the four behavioral measures, a student effect, and a teacher effect.

The first noticeable fact is the impact of behavior on assessment. Smaller values indicate stronger behavioral problems. A one standard deviation increase in the approaches to learning scale raises teacher assessments by 3 percent of a standard deviation. A one standard deviation increase in the interpersonal skills measure raises teacher assessments by 0.4 percent of a standard
deviation. Externalizing behavior problems have a similar positive effect. Internalizing behavior problems have a negative impact on teacher assessments. That last result is consistent with the finding (Rutherford, Quinn, and Mathur 2004) that students with internalizing behavior problems (social withdrawal, anxiety, depression) are harder to identify than students with externalizing behavior problems (noncompliance, aggression, disruption).

How behavior affects the baseline estimate of the same-race effect in specification 8 depends on whether students are partly matched to teachers based on their behavior. Because I am using a student fixed-effect regression, behavior is a confounding factor in the regression if changes in behavior across grades are significantly correlated with the probability of being assigned a same-race teacher. If students whose behavior improves are more likely to be assigned to a same-race teacher, the same-race effect δ in specification 8 will be overestimated. Column (3) regresses the same-race dummy on the test score, the four behavioral measures, and student and teacher race and gender dummies. The effect of behavior on same-race assignments is either nonsignificant or very small. Column (4) confirms the finding when including student fixed effects. Not surprisingly, therefore, behavioral controls leave the same-race effect (0.707 compared with 0.702 in mathematics, 0.420 compared with 0.435 in English) virtually unchanged compared with the estimate with a student effect and a teacher effect in table 4.

Same-Race Effects Skill by Skill

Table 6 presents results of the baseline regression for English, considering only kindergarten fall semester observations. The novelty is that the dependent variable is the teacher assessment broken down into eight separate skills. The results are informative with regard to the likelihood of a bias for two reasons. First, it is unlikely that students benefit from the better teaching of a same-race teacher (Dee 2005) only a few weeks after the start of school, and hence better teacher assessments for same-race students are more likely to represent perceptions rather than actual skills. Second, same-race assessment gaps appear also for the least abstract questions, in other words, questions that address the skills that are most likely to be captured by achievement tests. Take, for example, the statement: “This child easily and quickly names all upper- and lower-case letters of the alphabet.” In the fall semester of kindergarten, teachers assess students of their own race 4 percent of a standard deviation higher than children of other races. This English skill is measured in the kindergarten test and is measured early in the curriculum. Similar regressions in grade 5 also show positive same-race effects.

The same-race effect can also be estimated separately for each grade by including interactions between the grade dummies and the same-race dummy.


Table 6. Same-Race Effects, Skill by Skill: Fall Kindergarten English Teacher Assessments
[Rotated table; cell values not recoverable from the extracted text. Each column reports the same-race effect for one of eight English skills assessed by the teacher.]


These results (available from the author) show that teachers give more favorable assessments to same-race students as early as the fall of kindergarten: 14 percent of a standard deviation higher in mathematics and 11 percent of a standard deviation higher in English. After the fall semester of kindergarten, the effect is about 6 percent (3 percent) of a standard deviation in mathematics (English).

Measurement Error in Test Scores and Teacher Assessments

Two types of measurement error may confound the main estimates of our same-race effect in specification 3. First, teacher assessments may be noisy measures of teacher perceptions of student performance. Second, test scores of multiple-choice questionnaires may be noisy measures of underlying ability (Rudner and Schafer 2001). Random error may be introduced in the design of the questionnaire, and distractors (wrong options) may be partially correct. Measurement error in test scores may also be due to the student’s sleep patterns, illness, careless errors when filling out the questionnaire, misinterpretation of test instructions, and other exam conditions.

Measurement error in teacher assessments is likely to make our estimates of the same-race effect less significant, because classical measurement error on the dependent variable of a linear regression (specification 3) does not typically bias estimates but leads to larger standard errors for the estimated coefficients (Wooldridge 2002; Greene 2011). Thus, finding a significant effect of a same-race teacher is evidence that teacher assessments are a sufficiently precise25 measure of teacher perceptions of student performance.

Measurement error in test scores may be more problematic. Indeed, proper conditioning for student ability in a given grade is key to the estimation of same-race effects on teacher perceptions of students’ skills. This paper measures conditional bias as in Ferguson (2003), that is, the impact of the student’s race on teacher assessments when conditioning on covariates that include measures of student ability. The main specification (specification 8) estimates same-race effects on teacher assessments conditional on test scores and student effects. At the extreme, if test scores are such a noisy measure of student ability that most of their variance is accounted for by measurement error, conditioning on test scores will have no impact on the same-race coefficient; the coefficient on test scores will be nonsignificant.26 In such a case, the same-race coefficient will measure a sum of the same-race effects on teacher perceptions

25. Precision in the statistical sense, as the inverse of the standard deviation.
26. In table 4, the coefficient for test scores in all regressions is less than 1, whereas we would naturally expect this coefficient to equal 1, given that both assessments and test scores have a standard deviation of 10. Constraining this coefficient to be equal to 1 does not significantly alter the coefficients of interest. Results available on request.


and the positive effect of same-race teachers on student ability (Dee 2005).
On the other extreme, if test scores measure student ability accurately,27 这
same-race coefficient in specification 9 will be an estimate of same-race biases.
ECLS-K documentation specifies that test scores are highly reliable (看
部分 2). But the question here is whether a small amount of measurement
error in test scores can explain away the same-race effect—that is, if the same-
race coefficient captures some unobserved student ability rather than a bias in
teacher assessments.

So is there some amount of measurement error that explains the same-race
estimates of table 4? Test scores are noisy measures of the child's underlying
ability, so that $\text{Test score}_{i,f,t} = \text{Ability}_{i,f,t} + \nu_{i,f,t}$.
Measurement error is assumed to be classical (i.e., $\nu_{i,f,t}$ is not correlated
with ability), which, as Bound, Brown, and Mathiowetz (2001) suggest, is a
reasonable assumption in many common cases.

Assume also that teacher assessments capture student ability and are affected
by a same-race bias δ:

$$TA_{i,f,t} = \text{constant} + \alpha \cdot \text{Student Ability}_{i,f,t} + \delta \cdot \text{Same race}_{i,f,t} + \varepsilon_{i,f,t} \tag{9}$$

For clarity and without loss of generality, student and teacher fixed effects
are not included in this equation. I do not observe student ability and so
estimate specification 9 by regressing assessments on the test score and the
same-race dummy. With that approach, the estimate of δ will not be consistent
because it will capture part of student ability instead of capturing only teacher
biases:28

$$\operatorname{plim}(\text{Estimator of } \delta) = \delta + \alpha \cdot \lambda \theta \tag{10}$$

where δ is the coefficient of teacher bias, θ = var(ν)/[var(ν) + var(Ability)],
and λ = Cov(Same race, Student Ability)/[Var(Same race)(1 − Corr(Same race,
Test score)²)]. If, as suggested by Dee (2005), student ability is higher when
taught by a same-race teacher, ability and the same-race dummy are positively
correlated, so that λ > 0 and α · λθ > 0, and the effect δ of same-race teachers on
assessments will be overestimated.29
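
The direction of this bias can be illustrated with a small simulation. The sketch below is not the paper's estimation code: it assumes simulated data, NumPy, a hypothetical function name `simulate_same_race_estimate`, and illustrative parameter values (a true bias δ = 0.5, α = 1, and an assumed ability gain of 0.3 under a same-race teacher). It regresses simulated assessments on a noisy test score and a same-race dummy and returns the estimated same-race coefficient.

```python
import numpy as np


def simulate_same_race_estimate(theta, delta=0.5, alpha=1.0,
                                ability_gain=0.3, n=200_000, seed=0):
    """OLS estimate of the same-race coefficient when ability is only
    observed through a noisy test score.

    theta is the share of test-score variance due to measurement error.
    All parameter values are illustrative assumptions, not estimates
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    same_race = rng.integers(0, 2, size=n).astype(float)

    # Assumption: ability is higher under a same-race teacher, so that
    # ability and the same-race dummy are positively correlated.
    ability = ability_gain * same_race + rng.normal(size=n)

    # Classical measurement error, scaled so that
    # var(noise) / (var(noise) + var(ability)) equals theta.
    var_noise = theta / (1.0 - theta) * ability.var()
    test_score = ability + rng.normal(scale=np.sqrt(var_noise), size=n)

    # Teacher assessment as in equation 9, without fixed effects.
    assessment = alpha * ability + delta * same_race + rng.normal(size=n)

    # Regress assessments on the observed test score and the dummy.
    X = np.column_stack([np.ones(n), test_score, same_race])
    coef, *_ = np.linalg.lstsq(X, assessment, rcond=None)
    return coef[2]


if __name__ == "__main__":
    for theta in (0.0, 0.1, 0.2, 0.3):
        estimate = simulate_same_race_estimate(theta)
        print(f"theta={theta:.2f}  estimated same-race effect={estimate:.3f}")
```

Under these assumptions the estimate should sit close to the true δ when θ = 0 and drift upward as θ grows, which is the overestimation described above.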

If the relative size θ of the measurement error were known, an unbiased
effect of same-race teachers on assessments could be recovered. This unbiased

27. Formally, if the test score is a sufficient statistic for student ability.
28. The algebra is a particular case of the formulas of Greene (2011); plim denotes the probability limit
of the estimate.
29. This result is very close to equations of the statistical discrimination literature (see, e.g., Phelps
1972). On the labor market, the employer's hiring decision may depend on the race of the job
candidate because the candidate's education, experience, and other covariates are not sufficient
statistics for the candidate's productivity.


Table 7. Could Measurement Error in Test Scores Explain the Same-Race Effect?

Mathematics – Size of Measurement Error in Test Scores

                      θ = 0.00  θ = 0.05  θ = 0.10  θ = 0.15  θ = 0.20  θ = 0.25  θ = 0.30
Same Race             0.711∗∗   0.668∗∗   0.620     0.566     0.506     0.438     0.360
                      (0.211)   (0.189)   (0.267)   (0.241)   (0.252)   (0.212)   (0.142)
Corrected Test Score  0.241∗∗   0.254∗∗   0.268∗∗   0.284∗∗   0.301∗∗   0.322∗∗   0.345∗∗
                      (0.010)   (0.008)   (0.013)   (0.011)   (0.009)   (0.015)   (0.017)

English – Size of Measurement Error in Test Scores

                      θ = 0.00  θ = 0.05  θ = 0.10  θ = 0.15  θ = 0.20  θ = 0.25  θ = 0.30
Same Race             0.435     0.384     0.327∗∗   0.264     0.193     0.113     0.021
                      (0.174)   (0.152)   (0.090)   (0.123)   (0.153)   (0.143)   (0.178)
Corrected Test Score  0.313∗∗   0.330∗∗   0.348∗∗   0.368∗∗   0.391∗∗   0.417∗∗   0.446∗∗
                      (0.007)   (0.006)   (0.008)   (0.006)   (0.007)   (0.008)   (0.011)

Notes: Test scores have a standard deviation of 10 and a mean of 50. All regressions are two-
way fixed-effects regressions with both a child and a teacher fixed effect. Standard errors are
bootstrapped, clustered by student. The corrected test score is such that equation 13 holds.
∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level.

estimate of same-race effects is obtained by regressing assessments on a
corrected value of the test scores, defined as follows:

$$\text{Corrected Test score}_{i,f,t} = \theta \cdot E[\text{Test score}_{\cdot,f,t} \mid \text{Same race}] + (1 - \theta) \cdot \text{Test score}_{i,f,t} \tag{11}$$

When we estimate specification 8 replacing the test score with this corrected
test score, the estimator of the same-race effect will be an unbiased estimate
of the same-race effect δ on teacher assessments.

This holds if we know the size of measurement error θ. But θ is unknown,
and we estimate the parameter of interest δ using different values of θ. The
lowest value of measurement error θ that cancels the estimate of the effect of a
same-race teacher on assessments yields an estimate of the lowest amount of
measurement error that could account for the baseline results. Results for the
baseline specifications with corrected test scores are presented in table 7.30
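
The correction and the search over values of θ can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: it uses plain OLS without the student and teacher fixed effects of specification 8, and the function names `corrected_test_score`, `same_race_coefficient`, and `sweep_theta` are hypothetical. The first function applies the corrected test score formula (equation 11), taking the conditional expectation to be the mean test score within each same-race group.

```python
import numpy as np


def corrected_test_score(test_score, same_race, theta):
    """theta * E[test score | same race] + (1 - theta) * test score."""
    test_score = np.asarray(test_score, dtype=float)
    same_race = np.asarray(same_race)
    group_mean = np.empty_like(test_score)
    for value in np.unique(same_race):
        mask = same_race == value
        group_mean[mask] = test_score[mask].mean()
    return theta * group_mean + (1.0 - theta) * test_score


def same_race_coefficient(assessment, test_score, same_race, theta):
    """Same-race coefficient from an OLS regression of assessments on
    the corrected test score and the same-race dummy (no fixed effects,
    which is a simplification relative to the paper)."""
    corrected = corrected_test_score(test_score, same_race, theta)
    X = np.column_stack([
        np.ones(corrected.size),
        corrected,
        np.asarray(same_race, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(assessment, dtype=float),
                               rcond=None)
    return coef[2]


def sweep_theta(assessment, test_score, same_race,
                thetas=(0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30)):
    """Re-estimate the same-race coefficient for each assumed theta."""
    return {
        theta: same_race_coefficient(assessment, test_score, same_race, theta)
        for theta in thetas
    }
```

Given arrays of assessments, test scores, and same-race indicators, `sweep_theta` returns one coefficient per assumed θ; the smallest θ at which the coefficient reaches zero plays the role of the lowest amount of measurement error that could account for the baseline results.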

For mathematics test scores, a measurement error of more than 30 percent
is required to render the coefficient nonsignificant, and additional results
show that 40 to 50 percent of measurement error is required to cancel the
point estimate. For English, a 20 percent measurement error makes the
coefficient nonsignificant, and additional results show that measurement error of

30. Results for measurement error above 30 percent are available upon request.


40 percent cancels the point estimate. In short, a significant amount of
measurement error would be necessary to cancel coefficients. Even though this
statistic does not exclude the potentially confounding effect of measurement
error, it does indicate that only a large amount of measurement error in test
scores would alter the conclusions.

Grading on a Curve

Teacher assessments in each subject are an average of ten different assessments
on a scale of 1 to 5, which is then standardized to a mean of 50 and a
standard deviation of 10. Although the skills that each assessment evaluates are
clearly defined by the survey questionnaire, there is no guideline as such on
what should be the standard deviation of assessments across students within
a classroom, or what exact proficiency level justifies awarding a 5 or a 4. It may
well be that the teacher implicitly ranks students within a classroom.31

The implications of grading on a curve for the measurement of a bias in
favor of same-race students are multiple. First, teacher assessments may not
be directly comparable to test scores, as they will reflect a ranking of students
within a classroom, while test scores have a common scale for all participating
students. Second, the teacher assessment of a given student will be correlated
with peers' average test score in the classroom. Third, if peer group ability is
significantly correlated with being assigned a same-race teacher, the estimated
OLS effect of a same-race teacher on teacher assessments in specification 1
will be biased.
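
The second implication can be made concrete with a simulation. The sketch below is an illustration under stated assumptions rather than the paper's analysis: grading on a curve is represented by a within-classroom z-score of ability, classrooms differ by a random classroom effect, and the function names are hypothetical. It regresses the curved assessment on own ability and on the leave-one-out average ability of classroom peers.

```python
import numpy as np


def curve_graded_assessments(ability, classroom):
    """Within-classroom z-score, standing in for grading on a curve."""
    ability = np.asarray(ability, dtype=float)
    classroom = np.asarray(classroom)
    out = np.empty_like(ability)
    for c in np.unique(classroom):
        mask = classroom == c
        out[mask] = (ability[mask] - ability[mask].mean()) / ability[mask].std()
    return out


def peer_average(ability, classroom):
    """Leave-one-out mean ability of each student's classroom peers."""
    ability = np.asarray(ability, dtype=float)
    classroom = np.asarray(classroom)
    out = np.empty_like(ability)
    for c in np.unique(classroom):
        mask = classroom == c
        out[mask] = (ability[mask].sum() - ability[mask]) / (mask.sum() - 1)
    return out


def simulate_peer_coefficient(n_classrooms=500, class_size=20, seed=0):
    """Coefficient on peers' average ability when assessments are curved.

    Classroom sizes, the number of classrooms, and the distributions
    are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    classroom = np.repeat(np.arange(n_classrooms), class_size)
    classroom_effect = rng.normal(size=n_classrooms)[classroom]
    ability = classroom_effect + rng.normal(size=classroom.size)

    assessment = curve_graded_assessments(ability, classroom)
    peers = peer_average(ability, classroom)

    X = np.column_stack([np.ones(ability.size), ability, peers])
    coef, *_ = np.linalg.lstsq(X, assessment, rcond=None)
    return coef[2]
```

In this setup the coefficient on peers' average ability should come out clearly negative: holding own ability fixed, a student surrounded by stronger peers ranks lower. A coefficient close to zero in real data would therefore be hard to reconcile with pure within-classroom ranking.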

If teacher assessments reflect a ranking of students within a classroom
rather than a measure on a common scale, we should expect black students to
get lower assessments than white students. Indeed, consider a simple model
where there are only two students in each classroom, and each student can have
either a low teacher assessment ($a_l$) or a high teacher assessment ($a_h$). A student
gets a high assessment if he is the student with the highest ability in the
classroom. Student ability is denoted $\omega$, and follows a cumulative distribution
function $F(\omega)$. Each student can be either white, $r = w$, or minority, $r = m$. The
cumulative distribution function given the student's race $r$ is denoted $F(\omega \mid r)$. Then a
student gets a high assessment $a_h$ if his ability is higher than his peer's ability.
Thus, a student of race $r$ has a high teacher assessment with probability
$P(a = a_h \mid r, \omega) = P(\omega > \omega' \mid r, \omega) = F(\omega' \mid r, \omega)$. For simplicity, assume that peer
ability $\omega'$ is independent of student ability conditional on race, that is,
$F(\omega' \mid r, \omega) = F(\omega' \mid r)$.32 In the data we observe that minority students are in classrooms
with lower average test scores. Black students are in classrooms that have

31. Grading on a curve is one of the potential grading practices considered by Figlio and Lucas (2004).
32. Similar results hold if students are sorted by ability across classrooms.


an average test score 13.7 percent of a standard deviation below the average test score of white students' peers. We also observe that the distribution of black students' peers' test scores is strictly worse than white students' peers' test scores. Formally, white students' peers' test score distribution first-order stochastically dominates black students' peers' test score distribution, $F(\omega' \mid w) < F(\omega' \mid b)$. Then, at a given ability level $\omega$, white students are less likely to get a high assessment than black students: $P(a = a_h \mid w, \omega) - P(a = a_h \mid b, \omega) = F(\omega' \mid w, \omega) - F(\omega' \mid b, \omega) < 0$. If teacher assessments reflect a ranking in the classroom, we should thus observe that, conditional on test scores, minority students get higher teacher assessments than white students. But results (available from the author) show a nonsignificant or a negative and significant effect of race on teacher assessment conditional on test scores. Another regression suggests a nonsignificant effect of peers' test scores on teacher assessments. Such results make it unlikely that teacher assessments are a ranking of students within each classroom.

The baseline effect of a same-race teacher on teacher assessments of table 4 and specification 8 is also not likely to be affected by teachers grading on a curve within each classroom. Column (1) of table 8 suggests that being assigned a same-race teacher is negatively correlated with peers' test scores. But column (2) of table 8 shows that being assigned a same-race teacher is not significantly correlated with peers' test scores when controlling for a student effect and teacher observables. Column (3) of the same table estimates the same-race effect in mathematics. The novelty compared to baseline specification 8 is that the specification controls for peers' test scores. The estimate (+0.701) is virtually unchanged compared to table 4. Similar results, available from the author, hold in English.

Table 8. Grading on a Curve Hypothesis (Mathematics)

                                      (1)            (2)            (3)
                                      Peers' Test    Same Race      Teacher
                                      Scores         Teacher        Assessment
Same Race                             –0.609∗∗       –              0.701∗∗
                                      (0.168)                       (0.247)
Peers' Average Test Score             –              –0.002         0.065
                                                     (0.002)        (0.061)
Test Score                            –              –0.002∗∗       0.264∗∗
                                                     (0.001)        (0.025)
Student and Teacher Race and Gender   Yes            Yes            No
Student Effects                       No             Yes            Yes
Teacher Effects                       No             No             Yes
F Statistic                           114.5          13.5           4.2
R2                                    0.13           0.82           0.79
Observations                          48,065         48,065         48,065

Notes: Standard errors clustered by student. Coefficients have similar significance levels when clustering by classroom.
∗∗Statistically significant at the 1% level.

Results with All Racial Interaction Terms

What races drive the results of the main specification? We disentangle the effects of different racial interactions in specification 8, by replacing the Same Race dummy by a set of dummies, one dummy for each interaction between the teacher's and the student's race:

$$TA_{i,f,t} = \text{teacher}_{i,f,t} + \text{constant} + \alpha \, TS_{i,f,t} + \text{student}_{i,f} + \sum_{r \neq r'} \delta_{r,r'} \, \text{Dummy}(\text{teacher race} = r) \times \text{Dummy}(\text{student race} = r') + \text{grade}_{t,f} + \varepsilon_{i,f,t} \tag{12}$$

where there is one racial interaction dummy for each pair of races $r, r'$. Dummy(teacher race = $r$) × Dummy(student race = $r'$) = 1 if the teacher's race is $r$ and the student's race is $r'$, and 0 otherwise. The effects of interest are the coefficients $\delta_{r,r'}$. The omitted dummy variables are the dummies for a teacher and a student of the same race, hence coefficients are interpreted relative to the assessment given by a same-race teacher. Results are presented in table 9.33 In mathematics, being assessed by a white teacher lowers the assessment of Hispanic children by 17.3 percent of a standard deviation, compared with being assessed by a Hispanic teacher (the same-race interaction dummy is omitted). The interaction between white teachers and black students is not significant, but the coefficient's order of magnitude is comparable to baseline estimates. In English, the interaction is significant.
White teachers give lower assessments to black children, lower by 11.1 percent of a standard deviation. They also give lower assessments to Hispanic children, by 14.8 percent of a standard deviation.

33. Results from very small minority groups (Pacific Islanders, American Indians) may not be robust. All racial interactions are included in the regressions but only coefficients for blacks, Hispanics, and whites are reported in the table.

Table 9. Effects of All Racial Interactions Terms on Teacher Assessments

Mathematics Teacher Assessment English Teacher Assessment (1) (2) Race of the Student Race of the Teacher Race of the Teacher non-Hispanic Black Hispanic non-Hispanic Black Hispanic White, White, –1.728∗∗ (0.627) –1.337 (0.872) Ref. Ref. 0.530 (0.414) 1.684∗∗ (0.568) White, non-Hispanic Ref. Black Hispanic, Any Race –0.590 (0.479) 0.899 (0.675) Test Score F Statistic R2 Student Effects Teacher Effects Grade Effects Observations –0.616 (0.512) Ref. 0.371 (1.697) 0.241∗∗ (0.009) 4.2 0.787 Yes Yes Yes 48,065 –1.110∗∗ (0.300) –1.480∗∗ (0.221) –0.980 (0.756) Ref. Ref. –0.643 (0.741) 0.314∗∗ (0.008) 5.6 0.774 Yes Yes Yes 67,855

Notes: This table presents the results of two separate regressions, each with the full set of interactions between the teacher's race and the child's race. Only the three largest minority group interactions are displayed in this table, but other interactions are included in the regressions. Ref. = interaction dummy omitted from the regression.
∗∗Statistically significant at the 1% level.

Despite the size of standard errors, statistical tests show that black teachers give significantly higher English assessments to white students than white teachers to black students.
Hispanic teachers, too, tend to give higher assessments in English to white students than white teachers to Hispanic students.34 In mathematics, white teachers give significantly lower assessments to Hispanic students than to white and black students.35

Table 9 also shows that Hispanic teachers tend to give higher grades to white students than to Hispanic students in English. Hence most of the same-race effect on teacher assessments is driven by the behavior of white teachers toward black and Hispanic students.

34. A post-regression χ² test rejects the equality of coefficients "white teacher–black student" and "black teacher–white student," as well as the equality of coefficients "white teacher–Hispanic student" and "Hispanic teacher–white student." The χ² statistic is 15.28 (respectively, 15.11) with a p-value of 0.0001 (respectively, 0.0001).
35. The "white teacher–Hispanic student" coefficient is significant. Moreover, a χ² test rejects the equality of the "white teacher–Hispanic student" coefficient and the "white teacher–black student." The statistic equals 4.62 and the p-value is 0.0316.

Policy Implications

Racial Gaps in Test Scores and in Teacher Assessments

Columns (1) to (4) of table 10 estimate racial gaps in test scores and in teacher assessments from kindergarten to grade 5 for both mathematics and English.36 As documented in the literature, the gap between white and black test scores increases from kindergarten to grade 5: from 63 percent to 93 percent of a standard deviation in mathematics, and from 45 percent to nearly 80 percent of a standard deviation in English. However, teacher assessments present a different picture.
The white–black teacher assessment gap narrows slightly, decreasing from 47 percent to 45.5 percent of a standard deviation in mathematics and from 42 percent to 38.5 percent of a standard deviation in English. It is interesting that, over the same period, the fraction of black students assessed by a same-race teacher increases from 27.3 percent in kindergarten to 34.5 percent in grade 5, and the fraction of white students assessed by a same-race teacher remains relatively constant, at 92 percent.

Because teacher assessments may depend on teachers' identities, columns (9) to (12) present teacher assessment racial gaps while controlling for teachers' race and for teacher–student racial interaction dummies.37 In these columns, the gap in teacher assessments increases from fall kindergarten to grade 5, from 37 percent to 49 percent of a standard deviation in mathematics, and from 46.6 percent to 49 percent of a standard deviation in English. The racial teacher assessment gap is increasing only when controlling for teachers' race and teacher–student racial interactions.38

For Hispanic students, gaps in teacher assessments narrow faster than gaps in test scores. The white–Hispanic test score gap declines from 78 percent to 54 percent of a standard deviation in mathematics (a reduction of 24 percentage points [p.p.]); the white–Hispanic teacher assessment gap declines from 57 percent to 22 percent of a standard deviation in mathematics (a reduction of 35 p.p.). In columns (9) and (10), where regressions incorporate teachers' race dummies and teacher–student racial interaction dummies, the gap in teacher assessment of student mathematics skills goes from 43 percent to 28 percent of a standard deviation (a 15-p.p. reduction). The situation is similar for assessments of English skills: although the gap in test scores rises by 10 p.p., the gap in teacher assessments goes down by 35 p.p. With controls, in columns (11) and (12), the gap in teacher assessments falls by only 15 p.p.

Broadly speaking, relying solely on teacher assessments may not provide an accurate description of racial gaps from kindergarten to grade 5. Black–white gaps in teacher assessments do not increase from kindergarten to grade 5, whereas racial gaps in test scores suggest that African American students are falling behind. Hispanic–white gaps in teacher assessments narrow faster than gaps in test scores, except when controlling for dummies for the teacher's race and teacher–student racial interaction dummies.

36. Spring kindergarten, spring grade 1, and spring grade 3 are omitted from the table to save space, but the gaps evolve in the same manner from fall kindergarten to spring grade 5.
37. The full set of variables Dummy(Student race = $r$) × Dummy(Teacher race = $r'$) for all pairs of races $r$ and $r'$.
38. Including other teacher observables as controls, such as gender, experience, tenure, and teacher fixed effects, does not affect white–black teacher assessment gaps.

Table 10. Racial Gaps in Test Scores and in Teacher Assessments

Test Score
                           Mathematics                   English
                           Fall           Spring         Fall           Spring
                           Kindergarten   Grade 5        Kindergarten   Grade 5
                           (1)            (2)            (3)            (4)
Black                      –6.236∗∗       –9.287∗∗       –4.538∗∗       –7.957∗∗
                           (0.296)        (0.534)        (0.281)        (0.386)
Hispanic                   –7.785∗∗       –5.387∗∗       –5.251∗∗       –6.264∗∗
                           (0.309)        (0.421)        (0.271)        (0.303)
Asian                      1.350∗         0.615          2.357∗∗        –1.374∗∗
                           (0.574)        (0.780)        (0.525)        (0.502)
Teacher Race and Racial
Interaction Terms          No             No             No             No
F Statistic                118.3          62.3           89.6           94.4
R2                         0.12           0.12           0.07           0.11
Observations               11,600         5,233          16,304         10,627

Teacher Assessment
                           Mathematics                   English
                           Fall           Spring         Fall           Spring
                           Kindergarten   Grade 5        Kindergarten   Grade 5
                           (5)            (6)            (7)            (8)
Black                      –4.741∗∗       –4.555∗∗       –4.145∗∗       –3.858∗∗
                           (0.401)        (0.551)        (0.330)        (0.417)
Hispanic                   –5.568∗∗       –2.176∗∗       –4.570∗∗       –2.427∗∗
                           (0.346)        (0.430)        (0.289)        (0.317)
Asian                      –0.378         2.383∗∗        –0.607         1.604∗∗
                           (0.765)        (0.722)        (0.509)        (0.489)
Teacher Race and Racial
Interaction Terms          No             No             No             No
F Statistic                46.4           15.9           55.3           46.7
R2                         0.07           0.04           0.05           0.05
Observations               11,600         5,233          16,304         10,627

Teacher Assessment
                           Mathematics                   English
                           Fall           Spring         Fall           Spring
                           Kindergarten   Grade 5        Kindergarten   Grade 5
                           (9)            (10)           (11)           (12)
Black                      –3.744∗∗       –4.900∗∗       –4.662∗∗       –4.933∗∗
                           (1.181)        (0.974)        (0.771)        (0.711)
Hispanic                   –4.306∗∗       –2.761∗∗       –4.579∗∗       –3.094∗∗
                           (1.139)        (0.886)        (0.744)        (0.634)
Asian                      0.663          1.444          –0.570         0.477
                           (1.358)        (1.070)        (0.871)        (0.722)
Teacher Race and Racial
Interaction Terms          Yes            Yes            Yes            Yes
F Statistic                32.3           10.9           40.48          33.1
R2                         0.07           0.04           0.05           0.05
Observations               11,600         5,233          16,304         10,627

∗Statistically significant at the 5% level; ∗∗statistically significant at the 1% level.
Teacher Assessments and Later Test Scores

The paper's main result will be especially important if teacher assessments reflect perceptions that have a causal impact on student performance in mathematics and English. The effect of more favorable assessments is ambiguous: on the one hand, studies report that more positive treatment and attitudes toward minority students lead to higher achievement (Casteel 1998); on the other hand, in a survey of existing research, Cohen and Steele (2002) describe the potentially negative impacts of "overpraising" and "underchallenging" students (Mueller and Dweck 1998). Importantly, in this paper's data set, students do not see teacher assessments. Therefore, it is unlikely that teachers were trying to please students by being too positive about their English and mathematics abilities.39

Estimating the impact of teacher perceptions on student performance is difficult because a causal estimation requires an experimental setting in which teachers get randomized information on students; typical experiments deceive teachers, inducing them to think more positively about a random subset of students (Jussim and Harber 2005). Experiments are typically performed on relatively small samples that are not nationally representative. In the well-known Pygmalion study, a random fraction of students was labeled as bloomers and the impact of this information on students' IQ progress was found significant (Rosenthal and Jacobson 1968). Effects of teacher perceptions on later achievement are still debated (Jussim and Harber 2005).

39. My result that white teachers give lower assessments to blacks and Hispanics suggests that teachers were not trying to provide socially desirable answers. Bertrand and Mullainathan (2001) describe such "social desirability" bias in surveys but here a social desirability bias would mean even lower teacher assessments for black and Hispanic students.

The challenge with my observational data set is to identify the impact of teacher assessments separately from the impact of teacher quality, which may be correlated with assessments, and from the impact of student ability, which is likely positively correlated with teacher assessments conditional on test scores. Because the data set follows students over time, and because teacher identifiers are available, we can estimate the impact of previous assessments on later scores conditional on student and teacher effects. A student effect controls for student unobservables that do not vary across grades, while the teacher effect controls for teacher quality and other teacher characteristics that affect later test scores:

$$TS_{i,f,t} = \text{constant} + b \cdot TA_{i,f,t-1} + c \cdot TS_{i,f,t-1} + \text{Student}_{i,f} + \text{Grade}_{t,f} + \text{Teacher}_{i,f,t} + \text{Residual}_{i,f,t} \tag{13}$$

where notations are as above, $TS_{i,f,t}$ is the test score of child $i$ in field $f$ in grade $t$, $TA_{i,f,t-1}$ is the subjective assessment of student $i$ in the previous grade, $TS_{i,f,t-1}$ is the test score in the same subject in the previous period, $\text{Student}_{i,f}$ is a student effect, $\text{Grade}_{t,f}$ is a grade effect, and $\text{Teacher}_{i,f,t}$ is a teacher effect. The coefficient of interest here is $b$, the effect of the previous teacher assessment on the test score.

In such a regression, estimates of the coefficients may be biased due to regression to the mean (Arellano and Bond 1991): A child who has a test score much above the average in, say, grade 1, is likely to have a test score closer to the average in the next period, in grade 3. This typically leads to biases in the estimation of the coefficients of interest $b$ and $c$ (Nickell 1981).
To alleviate this issue, the test score $TS_{i,f,t-1}$ is instrumented by test scores from previous grades as in Arellano and Bond (1991) as long as a student effect is included, in columns (2) to (4) and (6) to (8) of table 11. This table shows that, in such specifications, teacher assessments have an effect on later test scores, over and above prior test scores, child fixed effects, and teacher fixed effects. This effect is robust to a variety of specifications with or without the Arellano and Bond (1991) instrument, with or without child and teacher fixed effects, and with or without controls for peers' test scores. A one standard deviation increase in prior teacher assessment is correlated with a 3.7 percent to 8 percent standard deviation increase in next grade's test score, conditional on the effects and the maintained controls. In the regression, teacher assessments have a greater impact than test scores on later test scores.40 Also, keeping in mind the limitations of the regression (absence of an experimental design), the results suggest that having a same-race teacher from kindergarten to grade 5 raises teacher assessments by 7 percent of a standard deviation in mathematics (table 4), which raises grade 5 scores cumulatively over the five waves by 2.8 percent of a standard deviation in mathematics. Although only 2.57 percent of white students never have a same-race teacher from kindergarten to grade 5, 54.3 percent of black students and 63 percent of Hispanic students have not had a single same-race teacher during the same period.

40. But interestingly, results available on request suggest that teacher assessments do not have an impact on test scores in the same grade. Teacher assessments have an impact on later test scores but not a significant impact on current test scores.

Table 11. Impact of Teacher Assessments on Later Test Scores

Mathematics Test Score
                                      (1)         (2)         (3)         (4)
Test Score in Previous Wave           0.779∗∗     0.057∗∗     0.740∗∗     –0.010
                                      (0.004)     (0.011)     (0.005)     (0.012)
Teacher Assessment in Previous Wave   0.100∗∗     0.061∗∗     0.140∗∗     0.080∗∗
                                      (0.004)     (0.007)     (0.006)     (0.013)
Student Race and Gender               Yes         No          Yes         No
Teacher Race and Gender               Yes         Yes         No          No
Student Effects                       No          Yes         No          Yes
Teacher Effects                       No          No          Yes         Yes
Grade Effects                         Yes         Yes         Yes         Yes
F Statistic                           10,188.3    30.2        7,288.2     7.3
R2                                    0.698       0.916       0.779       0.956
Observations                          11,103      11,103      11,103      11,103

English Test Score
                                      (5)         (6)         (7)         (8)
Test Score in Previous Wave           0.685∗∗     0.057∗∗     0.655∗∗     0.063∗∗
                                      (0.004)     (0.006)     (0.004)     (0.007)
Teacher Assessment in Previous Wave   0.138∗∗     0.019∗∗     0.168∗∗     0.037∗∗
                                      (0.004)     (0.005)     (0.004)     (0.007)
Student Race and Gender               Yes         No          Yes         No
Teacher Race and Gender               Yes         Yes         No          No
Student Effects                       No          Yes         No          Yes
Teacher Effects                       No          No          Yes         Yes
Grade Effects                         Yes         Yes         Yes         Yes
F Statistic                           14,124.5    34.6        11,955.5    17.4
R2                                    0.614       0.827       0.688       0.87
Observations                          31,649      31,649      31,649      31,649

Notes: Standard errors clustered by student. Clustering by classroom yields similar estimates.
∗∗Statistically significant at the 1% level.

5. CONCLUSION

The paper presents evidence that teachers give better assessments to students of their own race, even when controlling for test scores, student unobservables, teacher unobservables, and behavioral measures. Results are not significantly explained by measurement error in test scores or grading on a curve within each classroom. The same-race effect appears as early as kindergarten for skills covered by the tests. The presence of continuous detailed teacher assessments of similar skills as test scores, the longitudinal nature of the data set, and the use of econometric techniques controlling for a large number of teacher and student fixed effects are key ingredients for obtaining this paper's results.

Such evidence of better perceptions of same-race students' performance using nationally representative data from the early years, with detailed robustness checks, should contribute to the debate in at least two ways. First, shifting from standardized test scores to teacher assessments of students may introduce bias in assessments. Although teachers may have a better grasp of student ability than tests, teachers' perceptions are also affected by race and ethnicity. Second, my results suggest that teachers' perceptions of same-race students explain part of the positive impact of same-race teachers on student test scores, as documented by Dee (2005).

I would like to thank Brian Jacob, Francis Kramarz, Eric Maurin, Jesse Rothstein, Cecilia Rouse, and Timothy Van Zandt, as well as two anonymous referees, for particularly helpful suggestions on previous versions of this paper. I also thank audiences at the London School of Economics, the University of Amsterdam, Uppsala University, and the Industrial Relations Section at Princeton University. I am indebted to Cecilia Rouse for access to the data set. This project was undertaken while visiting Princeton University.
For computing and financial support I thank INSEAD, CREST, the London School of Economics, and the Marie Curie Programme. The usual disclaimers apply.

REFERENCES

Abowd, John M., Robert Creecy, and Francis Kramarz. 2002. Computing person and firm effects using linked longitudinal employer–employee dataset. Unpublished paper, Cornell University.
Abowd, John M., Francis Kramarz, and David N. Margolis. 1999. High wage workers and high wage firms. Econometrica 67(2): 251–334. doi:10.1111/1468-0262.00020
Achinstein, Betty, Rodney T. Ogawa, Dena Sexton, and Casia Freitas. 2010. Retaining teachers of color: A pressing problem and a potential strategy for “hard-to-staff” schools. Review of Educational Research 80(1): 71–107. doi:10.3102/0034654309355994
Arellano, Manuel, and Stephen Bond. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58(2): 277–297. doi:10.2307/2297968
Baltagi, Badi. 2008. Econometric analysis of panel data. Hoboken, NJ: Wiley.
Bertrand, Marianne, and Sendhil Mullainathan. 2001. Do people mean what they say? Implications for subjective survey data. American Economic Review 91(2): 67–72. doi:10.1257/aer.91.2.67
Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. Measurement error in survey data. In Handbook of econometrics, vol. 5, edited by James J. Heckman and Edward Leamer, pp. 3705–3843. Amsterdam, The Netherlands: Elsevier.
Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. 2011. Robust inference with multiway clustering. Journal of Business & Economic Statistics 29(2): 238–249. doi:10.1198/jbes.2010.07136
Carpenter, Jeffrey P., Glenn W. Harrison, and John A. List. 2005. Field experiments in economics: An introduction. In Research in experimental economics 10, edited by R. Mark Isaac and Douglas A. Norton, pp. 1–15. Bingley, UK: Emerald Publishing.
Casteel, Clifton A. 1998. Teacher–student interactions and race in integrated classrooms. Journal of Educational Research 92(2): 115–120. doi:10.1080/00220679809597583
Clotfelter, Charles T., Helen F. Ladd, and Jacob Vigdor. 2005. Who teaches whom? Race and the distribution of novice teachers. Economics of Education Review 24(4): 377–392. doi:10.1016/j.econedurev.2004.06.008
Cohen, Geoffrey L., and Claude M. Steele. 2002. A barrier of mistrust: How negative stereotypes affect cross-race mentoring. In Improving academic achievement: Impact of psychological factors on education, edited by Joshua Aronson, pp. 305–331. Bingley, UK: Emerald Publishing. doi:10.1016/B978-012064455-1/50018-X
Darling-Hammond, Linda, and Ray Pecheone. 2010. Developing an internationally comparable balanced assessment system that supports high-quality learning. Paper presented at the National Conference on Next Generation Assessment Systems, Center for K-12 Assessment & Performance Management, Washington, DC, March.
Dee, Thomas S. 2004. Teachers, race, and student achievement in a randomized experiment. Review of Economics and Statistics 86(1): 195–210. doi:10.1162/003465304323023750
Dee, Thomas S. 2005. A teacher like me: Does race, ethnicity, or gender matter? American Economic Review 95(2): 158–165. doi:10.1257/000282805774670446
Ferguson, Ronald F. 2003. Teachers’ perceptions and expectations and the black-white test score gap. Urban Education 38(4): 460–507. doi:10.1177/0042085903038004006
Figlio, David N., and Maurice E. Lucas. 2004. Do high grading standards affect student performance? Journal of Public Economics 88(9): 1815–1834. doi:10.1016/S0047-2727(03)00039-2
Fryer, Jr., Roland G., and Steven D. Levitt. 2004. Understanding the black-white test score gap in the first two years of school. Review of Economics and Statistics 86(2): 447–464. doi:10.1162/003465304323031049
Fryer, Roland G., and Steven D. Levitt. 2006. The black-white test score gap through third grade. American Law and Economics Review 8(2): 249–281. doi:10.1093/aler/ahl003
Giuliano, Laura, David I. Levine, and Jonathan Leonard. 2009. Manager race and the race of new hires. Journal of Labor Economics 27(4): 589–631.
Giuliano, Laura, David I. Levine, and Jonathan Leonard. 2011. Racial bias in the manager–employee relationship: An analysis of quits, dismissals, and promotions at a large retail firm. Journal of Human Resources 46(1): 26–52. doi:10.1353/jhr.2011.0022
Gluszek, Agata, and John F. Dovidio. 2010. The way they speak: A social psychological perspective on the stigma of nonnative accents in communication. Personality and Social Psychology Review 14(2): 214–237. doi:10.1177/1088868309359288
Greene, William H. 2011. Econometric analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Gresham, Frank M., and Stephen N. Elliott. 1990. Social skills rating system (SSRS). Circle Pines, MN: American Guidance Service.
Hinnerich, Björn Tyrefors, Erik Höglin, and Magnus Johannesson. 2011. Are boys discriminated in Swedish high schools? Economics of Education Review 30(4): 682–690. doi:10.1016/j.econedurev.2011.02.007
Hughes, Jan, and Oi-man Kwok. 2007. Influence of student–teacher and parent–teacher relationships on lower achieving readers’ engagement and achievement in the primary grades. Journal of Educational Psychology 99(1): 39–51. doi:10.1037/0022-0663.99.1.39
Ingersoll, Richard M., and Henry May. 2011. Recruitment, retention and the minority teacher shortage. Consortium for Policy Research in Education Research Report No. RR-69.
Jackson, C. Kirabo, and Elias Bruegmann. 2009. Teaching students and teaching each other: The importance of peer learning for teachers. American Economic Journal: Applied Economics 1(4): 85–108.
Jussim, Lee. 1989. Teacher expectations: Self-fulfilling prophecies, perceptual biases, and accuracy. Journal of Personality and Social Psychology 57(3): 469–480. doi:10.1037/0022-3514.57.3.469
Jussim, Lee, and Kent D. Harber. 2005. Teacher expectations and self-fulfilling prophecies: Knowns and unknowns, resolved and unresolved controversies. Personality and Social Psychology Review 9(2): 131–155. doi:10.1207/s15327957pspr0902_3
Kirby, Sheila Nataraj, Mark Berends, and Scott Naftel. 1999. Supply and demand of minority teachers in Texas: Problems and prospects. Educational Evaluation and Policy Analysis 21(1): 47–66. doi:10.3102/01623737021001047
Lavy, Victor. 2004. Do gender stereotypes reduce girls’ human capital outcomes? Evidence from a natural experiment. NBER Working Paper No. 10678.
Lyons, Anthony, and Yoshihisa Kashima. 2003. How are stereotypes maintained through communication? The influence of stereotype sharedness. Journal of Personality and Social Psychology 85(6): 989. doi:10.1037/0022-3514.85.6.989
Marcus, Geoffrey, Susan Gross, and Carol Seefeldt. 1991. Black and white students’ perceptions of teacher treatment. Journal of Educational Research 84(6): 363–367. doi:10.1080/00220671.1991.9941817
Meier, Kenneth J., Joseph Stewart, Jr., and Robert E. England. 1989. Race, class, and education: The politics of second-generation discrimination. Madison, WI: University of Wisconsin Press.
Moulton, Brent R. 1990. An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Review of Economics and Statistics 72(2): 334–338. doi:10.2307/2109724
Mueller, Claudia M., and Carol S. Dweck. 1998. Praise for intelligence can undermine children’s motivation and performance. Journal of Personality and Social Psychology 75(1): 33–52. doi:10.1037/0022-3514.75.1.33
Nickell, Stephen. 1981. Biases in dynamic models with fixed effects. Econometrica 49(6): 1417–1426. doi:10.2307/1911408
Phelps, Edmund S. 1972. The statistical theory of racism and sexism. American Economic Review 62(4): 659–661.
Price, Joseph, and Justin Wolfers. 2010. Racial discrimination among NBA referees. Quarterly Journal of Economics 125(4): 1859–1887. doi:10.1162/qjec.2010.125.4.1859
Rosenthal, Robert, and Lenore Jacobson. 1968. Pygmalion in the classroom: Teacher expectation and pupils’ intellectual development. New York: Holt, Rinehart & Winston.
Rudner, Lawrence M., and William D. Schafer. 2001. Reliability: ERIC Digest No. ED458213. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation.
Rutherford, Jr., Robert B., Mary Magee Quinn, and Sarup R. Mathur. 2004. Handbook of research in emotional and behavioral disorders. New York: Guilford Publications.
Sherman, Thomas M., and William H. Cormier. 1974. An investigation of the influence of student behavior on teacher behavior. Journal of Applied Behavior Analysis 7(1): 11–21. doi:10.1901/jaba.1974.7-11
Stangor, Charles, Gretchen B. Sechrist, and John T. Jost. 2001. Changing racial beliefs by providing consensus information. Personality and Social Psychology Bulletin 27(4): 486–496. doi:10.1177/0146167201274009
Tourangeau, Karen, Christine Nord, Thanh Le, Alberto G. Sorongon, and Michelle Najarian. 2009. Combined user’s manual for the ECLS-K eighth-grade and K-8 full sample data files and electronic codebooks. Alexandria, VA: National Center for Education Statistics.
Van Ewijk, Reyn. 2011. Same work, lower grade? Student ethnicity and teachers’ subjective assessments. Economics of Education Review 30(5): 1045–1058. doi:10.1016/j.econedurev.2011.05.008
Wilson, Robert J., and Rhonda L. Martinussen. 1999. Factors affecting the assessment of student achievement. Alberta Journal of Educational Research 45(3): 267–277.
Wooldridge, Jeffrey M. 2002. Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.