THE PROMISE OF ADMINISTRATIVE DATA

IN EDUCATION RESEARCH

Abstract
Thanks to extraordinary and exponential improvements in data storage and computing
capacities, it is now possible to collect, manage, and analyze data in magnitudes and in
manners that would have been inconceivable just a short time ago. As the world has
developed this remarkable capacity to store and analyze data, so have the world’s
governments developed large-scale, comprehensive datafiles on tax programs, workforce
information, benefit programs, health, and education. Although these data are collected
for purely administrative purposes, they represent remarkable new opportunities for
expanding our knowledge. We describe some of the benefits and challenges associated
with the use of administrative data in education research.

Presidential Essay

David Figlio

(corresponding author)

Institute for Policy Research

Northwestern University

and NBER

Evanston, IL 60208

figlio@northwestern.edu

Krzysztof Karbownik

Institute for Policy Research

Northwestern University

Evanston, IL 60208

krzysztof.karbownik@northwestern.edu

Kjell Salvanes

Norwegian School of Economics

Bergen, Norway

doi:10.1162/EDFP_a_00229
© 2017 Association for Education Finance and Policy

INTRODUCTION
Thanks to extraordinary and exponential improvements in data storage and computing
capacities, it is now possible to collect, manage, and analyze data in magnitudes and in
manners that would have been inconceivable just a short time ago. And as the world
has developed this remarkable capacity to store and analyze data, so have the world’s
governments developed large-scale, comprehensive datafiles on tax programs, workforce
information, benefit programs, health, and education. Today, in many countries
around the world, governments collect, maintain, and store an archive of information
regarding a vast range of behaviors and outcomes over an individual’s entire lifetime
(Card et al. 2010). Governments have established statistics offices to maintain and use
these data to produce official statistics about their populations. In the education sector,
governments have invested large sums to develop longitudinal data systems.
The U.S. Department of Education alone has invested over $750 million to help states
build, populate, and maintain these data systems.

Although these data are collected for purely administrative purposes, they represent
remarkable new opportunities for expanding our knowledge. Analyses with more
comprehensive data and better sources of exogenous variation than could typically be
used in times past can challenge conventional wisdom in many areas that rests on
previous research utilizing other sources (such as surveys). Administrative data also
facilitate study of research questions that have heretofore not been possible to
credibly study at all. Researchers who are able to access these data (especially those
able to link data across administrative domains) have the ability to make extraordinary
scientific advances by exploiting the population-wide datasets in combination with the
increased opportunity for identification of causal effects through exogenous variation
(arising, for example, from policy changes, natural disasters, and other shocks that
affect some groups of people but not others). In addition to natural experiments, these
data can facilitate the conduct of field experiments, where the subjects of short-term
experiments can be followed administratively for a longer period of time in manners
that would have been impossible or prohibitively expensive absent large-scale
administratively collected data. The new insights from these studies have extraordinary
potential to inform education policy and practice, and we believe the massive growth in
the quality and diversity of social science research on educational topics, on display
over the past decade of the Association for Education Finance and Policy (AEFP), is
surely highly related to the increased availability of good administrative data. In this
essay, we describe some of the benefits and challenges associated with the use of
administrative data in education research.

BENEFITS ASSOCIATED WITH ADMINISTRATIVE EDUCATION DATA
There are many uses for designed survey data, but designed data collections are not
a panacea. They offer great opportunities to ask questions in very specific ways, yet
they are also expensive, necessarily have relatively modest sample sizes, are subject to
attrition biases, and are not well-suited to prospectively studying policy and practice
changes.

Administrative datasets offer a number of clear benefits for empirical research in
education, supplementing designed datasets in some cases and supplanting them to
some degree in others. The ability to study population-level data offers a number of
remarkable new possibilities that are extremely difficult to achieve with designed surveys
and purpose-built datasets. Perhaps the most obvious involves statistical power: in
contrast to datasets with hundreds or thousands of observations, administrative
datasets with many times that number of observations mean that one can frequently detect
modest but meaningful relationships with much greater precision than was previously
possible. But there are at least two other distinct advantages of administrative data that
are afforded by the large numbers of observations. One involves the ability to detect
rare events that might be useful for identification: In administrative datasets, it is often
possible to make twin comparisons or study children from three-child families; to
investigate the effects of extremely rare climatic or seismic events that offer the
opportunity for plausible identification of treatment effects; or to study specific
economic events like plant closures (Roed and Raaum 2003; Card et al. 2010). In
traditionally designed surveys, it is rare to have sufficient numbers of observations to be
able to carry out analyses of these types. Another major advantage of having large-scale
administrative data is the ability to study heterogeneous effects of educational policies
and practice: With very large numbers of observations, it becomes possible to see
whether the effects are similar across wildly different groups of individuals, and, if they
differ, how they differ and for whom. Similarly, with population-level administrative data
it is possible to study people at the extremes of the income distribution, who are
typically not well covered in traditionally designed surveys. Likewise, large-scale
administrative data provide opportunities to use very rich nonparametric specifications
when measuring effects, a valuable advantage in cases where relationships of interest
may not be linear.
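
The statistical power point can be made concrete with a rough calculation. The sketch below is illustrative only and is not drawn from any study cited here: it assumes a simple two-group comparison of means, a common standard deviation, a two-sided 5 percent test, and 80 percent power, and the function name `minimum_detectable_effect` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_total, alpha=0.05, power=0.80, sd=1.0, share_treated=0.5):
    """Smallest true difference in means detectable with the given power.

    Assumes a two-group comparison of means with common standard deviation
    `sd` and a two-sided test at level `alpha` (normal approximation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Standard error of the difference in means between the two groups.
    se = sd * sqrt(1.0 / (share_treated * (1 - share_treated) * n_total))
    return (z_alpha + z_power) * se


if __name__ == "__main__":
    # Survey-sized versus register-sized samples (sizes chosen for illustration).
    for n in (1_000, 100_000, 1_000_000):
        print(f"n = {n:>9,}: MDE = {minimum_detectable_effect(n):.4f} sd")
```

Under these assumptions the minimum detectable effect shrinks with the square root of the sample size, so a dataset one hundred times larger can detect effects roughly one tenth the size.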

Another benefit of using administrative data for research purposes is that because
data coverage is universal, it is possible to link administrative data from one domain
(e.g., education) to data from another domain (e.g., workforce or health). This is
obviously possible in non-administrative settings as well, but doing so is considerably
more difficult because people would have to be purposefully followed longitudinally,
and because a cross-section of educational data, for example, and a cross-section of
health data may only include some of the same individuals by happenstance.
Administrative data, by virtue of their population-level nature and the frequency of data
observation, allow the researcher to follow individuals or entities over time so that there
is a panel structure to the data.
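
As a minimal sketch of what such cross-domain linkage might look like in practice, the example below joins toy education and workforce records on a shared identifier and arranges them as a per-person panel. The records, the field names (`person_id`, `test_score`, `earnings`), and the helper functions are all hypothetical; real registers are linked by statistical agencies under far stricter procedures.

```python
from collections import defaultdict

# Hypothetical, already de-identified records; field names are illustrative only.
education = [
    {"person_id": "a1", "year": 2006, "test_score": 0.6},
    {"person_id": "a1", "year": 2005, "test_score": 0.4},
    {"person_id": "b2", "year": 2005, "test_score": -0.2},
]
workforce = [
    {"person_id": "a1", "year": 2015, "earnings": 41000},
    {"person_id": "c3", "year": 2015, "earnings": 38000},
]


def link_domains(education_rows, workforce_rows, key="person_id"):
    """Build one panel per person from two administrative domains.

    Returns a dict mapping each identifier to its education and workforce
    records, each sorted by year. People seen in only one domain are kept
    with an empty list for the other, so coverage gaps stay visible.
    """
    panel = defaultdict(lambda: {"education": [], "workforce": []})
    for row in education_rows:
        panel[row[key]]["education"].append(row)
    for row in workforce_rows:
        panel[row[key]]["workforce"].append(row)
    for records in panel.values():
        records["education"].sort(key=lambda r: r["year"])
        records["workforce"].sort(key=lambda r: r["year"])
    return dict(panel)


def linked_in_both(panel):
    """Identifiers observed in both domains, in sorted order."""
    return sorted(pid for pid, rec in panel.items()
                  if rec["education"] and rec["workforce"])


if __name__ == "__main__":
    panel = link_domains(education, workforce)
    print(linked_in_both(panel))
```

Keeping unmatched individuals in the panel, rather than dropping them, is a deliberate choice in this sketch: it lets the analyst see how much of each domain fails to link before deciding how to handle it.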

Administrative datasets also provide novel types of variables typically not found in
non-administrative data (Einav and Levin 2013). They can offer new opportunities, for
instance, to look at measures of delinquency, of changing geographical location, of
social networks, and of health instances that are nearly impossible to study in any other
way. The real-time nature of administrative data also provides new opportunities
to study the effects of educational policies and practices that are very recent. It also
offers the chance for researchers to make their scholarship much more relevant to the
specific policy decisions that policy makers must make right away than are studies
that make use of retrospective information (Einav and Levin 2013). And, of course,
natural experiments need not be rare events to be better studied using administrative
datasets. Because natural experiments are unannounced, and often occur via chance
or quirks, it is very difficult to set up a prospective study that will permit the evaluation
of a natural experiment. With administrative data that cover a population and that are
recorded regularly, it is much more feasible to identify and study these natural
experiments ex post (Roed and Raaum 2003).

Although not always the case, data quality is frequently better in administrative
data than in retrospective data collection. Rather than asking people whether they
participated in a given program twenty years ago, scholars who make use of
administrative data can observe directly whether the individuals participated, according to
the authorities who paid for the participation and therefore had a strong interest in
correctly recording the occurrence! In addition, because of the mandatory nature of
participation in the activities that generate administrative data, these data are much
less likely to suffer from attrition problems or nonresponse problems than are data
collected through voluntary means (Card et al. 2010). Likewise, administrative data are
likely to be less subject to over-reporting or under-reporting of key variables than is the
case with voluntarily collected data.

Administrative data also facilitate the study of intergenerational issues. It is possible,
at least in some contexts, to match children’s administrative records to those of their
parents, and even grandparents. Whereas it is certainly possible to purposefully follow
families longitudinally, the risk of attrition is surely greater when attempting to move
from one generation to the next than if it is possible to directly match individuals
using administrative means (Roed and Raaum 2003). And, in the case of questions
that require a long amount of time to study in real time (e.g., intergenerational issues),
the time horizon over which they may be studied can be shrunk considerably with
administrative data.

Administrative data have major practical value for local policy as well. Different
countries have extremely different policy environments, and so do different states. We
absolutely can learn a lot from different contexts, but there are times when policy
makers wish to act on data that are immediately relevant to their own populations.
Developing, maintaining, and sharing administrative data with researchers creates
new opportunities for knowledge creation that is directly tied to local policy, practice,
and populations. And, of course, finding results from different settings increases the
external validity and generalizability of research findings as well.

In sum, administrative data are more comprehensive than are designed survey
data, and frequently contain far more accurate information. In addition,
the costs of conducting research with administrative data are much lower as well,
at least once the data systems are developed. Once data structures are established,
linking and extracting more records from administrative data cost only the time of the
programmer. Also, the marginal cost of adding more individuals or periods of data
to the analytical sample is extremely small, suggesting remarkably large economies of
scale associated with administrative data (Roed and Raaum 2003). There are obviously
many important roles for purpose-built designed survey data, not least the fact that
only with purpose-built data is it possible to study precisely the questions that one
wishes to study in exactly the manner in which one wishes to study them. Yet it is also
evident that administrative data offer numerous new opportunities to conduct research
on questions that were previously impossible to study, or at least to study so well. Indeed,
administrative data and survey data, although sometimes substitutes, can frequently be
considered complements, as when administrative data can reduce the set of questions
that need to be answered via surveys, or when administrative data can be used to serve
as a check on the reliability of retrospective information collected via surveys (Roed
and Raaum 2003). Administrative data can also be thought of as complementary to
the conduct of field experiments, as the costs of tracking and following up with field
experiment participants are much lower, and the data frequently much better, when
field experiments can be linked with data collected by governments for administrative
purposes (Card et al. 2010). Wide access to administrative data facilitates accountability
of research as well, and helps to ensure increased research quality (reducing researcher
“monopolies” over data and, therefore, making more and better research possible). For all
of these reasons, having a high degree of access to administrative data makes a wide
range of empirical studies in education more feasible and more believable.

CHALLENGES ASSOCIATED WITH THE USE OF ADMINISTRATIVE DATA
Although administrative data provide many exciting opportunities for improving
education research and practice, there are some substantial limitations as well. For one,
administrative datasets are collected for different reasons than research, and the types
of variables that are captured in administrative data often do not comport with the
types of variables that testing many educational and social science theories demands.
For example, administrative datasets provide very little information on cognitive skills
other than certain measures of achievement and attainment (such as standardized test
scores) or of social and behavioral skills (such as attendance and suspensions). Many
research questions demand that we know generally unmeasured information, such as
motivation, attitudes, and “big five” psychological traits. In some countries, some of these
variables are occasionally measured but even then the data are only seen in limited
circumstances, such as in military data for men for a limited number of years. But this
does not foreclose the opportunities to collect these data in the future, either formally
as part of an administrative data collection or as a supplemental purpose-built survey.
For example, in Norway, just prior to the decision to attend high school, a subset of
pupils was surveyed about “big five” traits, time use, and other variables typically only
seen in designed studies. These students were recruited into lab experiments regarding
willingness to compete, risk taking, patience, and other variables, the results of which
were then appended to administrative data (Almås et al. 2014). Approaches like this
provide the opportunity to capitalize on the best attributes of both designed survey data
and lab experiments (ability to measure the variables we most want to observe), and
administrative data (efficiency in following people, accurate program participation data,
and the like).

Another shortcoming of administrative data involves technical issues. One huge
advantage of administrative data for research purposes is the potential ability to match
data across domains (e.g., combining education data with health, labor, crime, or
other data). In some locales, such as the Nordic countries, all administrative registers
maintain common identification numbers and are subject to common laws for statistical
research purposes, enabling researchers to use merged registers across administrative
units, with de-identified and merged data made available through the national
statistical offices. But in many locations either a unique personal identification number
does not exist or there are legal restrictions on merging data with registers across
administrative units.1 In the United States, for example, only a few states have linked
children’s social security numbers, which are used for all labor market and benefits data,
to their birth records, and in many states it is illegal to link social security numbers to
education records. In cases like these, it becomes much more challenging to link data
across administrative domains. Although many states in the United States are making
strong progress in linking education and workforce data (thanks to the leadership of the
Data Quality Campaign, a national organization dedicated to promoting the development,
implementation, and use of high-quality administrative education datasets, among other
allied groups), this is a difficult and slow-going process.

Attrition is less pronounced in the case of administrative data than in most cases
of designed survey data, yet it still presents substantial challenges in administrative
data applications. People often move to other countries to work or change citizenship,
and this issue is compounded in countries like the United States where people move
freely and often between states but individual states maintain their own birth records,
health records, education records, and workforce data. In recent years, it has become
possible in rare circumstances to match school records to tax data from the U.S. Internal
Revenue Service in order to follow children living in one state to adult outcomes in
another state (see, e.g., Chetty et al. 2014), and we are hopeful that more cases like this
will occur in the future (although these cases are themselves still very limited).

Moreover, because administrative datasets are not designed for research in the first
place, they are often not particularly well documented. As a result, researchers
using administrative data must often undertake large time and effort investments
relative to those who make use of well-designed and well-documented surveys. In addition,
as part of the very nature of administrative data collection, it is not uncommon for
variable definitions to change as the things being measured change, and these changes are
not always easily available to those outside the administrative units collecting the data.
Finally, of course, it is always possible that the administrative datasets are incomplete
or contain errors, since their purpose was never research quality but rather
recording activities such as governmental program participation and compliance. As
a consequence, these data may not have been subjected to the same type of quality
assurance/quality control that is standard in the case of datasets collected specifically
for research purposes.

One other highly important issue with using administrative registers is that, because
of security and confidentiality concerns, they cannot be made available publicly. Given
that unique personal identifiers exist, most countries with these datasets have developed
secure systems for making them available to researchers or research groups. In the
Nordic countries (with the exception of Finland, where a slightly different system is in
place), very similar systems have been developed over time where statistical agencies
play an important role in merging and de-identifying data for researchers, generally
through research centers that have been through a quite extensive application procedure
with data authorities, owners of data, and the national statistical offices. In all situations,
governments must balance the costs associated with potential security breaches against
the very large benefits of making data available to a wide range of researchers, who have
insights and expertise in a larger set of substantive research issues. It is imperative
that researchers work diligently to ensure that they treat administrative data with
care and maintain a high degree of security so that justifiably worried stewards of
administrative data can feel more confident that sharing data with scholars provides
high benefits to citizens with extremely low risks of security breaches or other forms of
negligent behavior. Moreover, it is essential that researchers build trust relationships
with partners in government so that officials are confident that the data they are
entrusted to collect and protect are only used for the stated purposes, that they are not
used to score political points, and that they are treated with care and the highest levels
of protection.

1. Another legal restriction was in place, until recently, in Finland, where the possibility
of indirect identification of individuals in small groups prevented researchers from using
registry data in full capacity.

CONCLUSION
We are learning more than ever before about education finance, practice, and policy
as a direct consequence of the widespread use of administrative data in education
research. Administrative data open up new questions that could not previously have
been studied, allow us to reevaluate existing questions with new and more compelling
empirical approaches and identification strategies, and permit analysis of questions
of specific interest to particular localities. Therefore, although administrative datasets
will never eliminate the need for or utility of purpose-built designed survey data, they
can make these surveys more efficient and effective by concentrating more of the
energy on the types of designed data collections not seen in administrative data, while
supplementing surveys with administrative data when possible. The same, of course,
is also true for field and lab experiments, which can be much more efficient when
merged with administrative data.

For administrative data to reach their full potential for research, policy, and practice,
however, more work needs to be done. There are practical and technical issues that
impede the utility of these data, and governmental officials have good reason to be hesitant
to widely share the data that they are entrusted to collect and protect. An important role
of the membership of AEFP should be to show the value of thoughtful and mutually
beneficial collaborations between researchers, practitioners, and policy makers. The
more we engage in these types of partnerships and build trust relationships, the more
likely it will be that widespread use of administrative data will continue to revolutionize
and improve education research, practice, and policy in the United States and around
the world.

ACKNOWLEDGMENTS
David Figlio acknowledges financial support from the National Science Foundation (award
1244752) for this work.

REFERENCES
Almås, Ingvild, Alexander W. Cappelen, Kjell G. Salvanes, Erik Ø. Sørensen, and Bertil
Tungodden. 2014. Willingness to compete: Family matters. Bergen, Norway: NHH Department of
Economics Discussion Paper No. 03/2014.

Card, David, Raj Chetty, Martin Feldstein, and Emmanuel Saez. 2010. Expanding access to
administrative data for research in the United States. Washington, DC: National Science
Foundation White Paper No. 10–069.

Chetty, Raj, Nathaniel Hendren, Patrick Kline, and Emmanuel Saez. 2014. Where is the land of
opportunity? The geography of intergenerational mobility in the United States. Quarterly Journal
of Economics 129(4):1553–1623. doi:10.1093/qje/qju022.

Einav, Liran, and Jonathan D. Levin. 2013. The data revolution and economic analysis. NBER
Working Paper No. 19035.

Roed, Knut, and Oddbjørn Raaum. 2003. Administrative registers – Unexplored reservoirs of
scientific knowledge? Economic Journal (Oxford) 113(488):F258–F281. doi:10.1111/1468-0297.00134.
