Data shown in the Opportunity Explorer are from SEDA 5.0. A simplified description of the SEDA 5.0 methodology is provided here. For more detail on how we construct the estimated test score parameters, please see the SEDA 5.0 Technical Documentation. For more detail on the statistical methods that we use, as well as information about the accuracy of the estimates, please see our technical papers. Not all data from SEDA 5.0 are visualized on the Opportunity Explorer; the complete data are downloadable on the Get the Data page.

Data shown in the 2019-2023 Education Recovery Explorer are unique estimates (downloadable as “SEDA 2023” on the Get the Data page). For a detailed description of the difference between the most recent version of SEDA and the SEDA 2023 data, view the SEDA 2023 Special Release Technical Documentation.

Background

Federal law requires all states to test all students in grades 3-8 each year in math and RLA (commonly called “accountability” testing). It also requires that states make the aggregated results of those tests public.

All states report test results for schools and school districts to the Department of Education's National Center for Education Statistics (NCES), which houses these data in the EDFacts database. There are two versions of these data: a public version, available on the NCES website, and a restricted version, available by request. In the public files, data for small places or small subgroups are suppressed to protect student privacy. In contrast, the restricted files contain all data, regardless of the number of students in the school/district or subgroup. We use the restricted files to estimate test score distributions in education-policy-relevant units (e.g., schools, geographic districts, administrative school districts, counties, metropolitan statistical areas, commuting zones, and states). The school, geographic district, county, and state data are visualized on the Opportunity Explorer; all other aggregations can be found in the downloadable data files on the Get the Data page. Details on the data construction process can be found below.

SEDA methods: Addressing the challenges

While these challenges are substantial, they are not insurmountable. The EOP team has developed methods that address them and produce estimates of students' average test scores, average learning rates across grades, and trends in average scores over time for each unit (e.g., school, geographic district). All estimates are comparable across states, grades, and years.

Below we describe the raw data used to create SEDA and how we:

  1. Estimate the location of each state's proficiency thresholds
  2. Place states' proficiency thresholds on the same scale
  3. Estimate mean test scores from the raw data and the threshold estimates
  4. Scale the estimates to grade equivalents (i.e., grade levels)
  5. Estimate average scores, learning rates, and trends in average scores

Raw data

The data used to develop SEDA come from the EDFacts restricted-use files, provided to our team via a data-use agreement with NCES.

EDFacts provides the counts of students scoring at each proficiency level for nearly every school in the U.S.

From the data provided, we use data for:

  • All 50 states and DC
  • School years 2008-09 through 2018-19
  • Grades 3-8
  • Math and RLA
  • Various subgroups of students: all students, racial/ethnic subgroups, gender subgroups, and economic subgroups

In the data, schools are identified by NCES IDs, which can be linked to other data sources such as the Common Core of Data (CCD). Following the CCD's guidelines, we create a stable school identifier we call the SEDA school ID. To create stable geographic district identifiers, called SEDA LEA IDs, we use each school's most recent geographic location and assign the school to the district in which it was most recently observed, based on the 2019 elementary and unified school district boundaries from EDGE. Similarly, we use the most recent administrative district, county, metropolitan area, commuting zone, and state for each school to ensure stability over time. Note that SEDA includes special education schools (as defined by CCD) in school, administrative district, and state level data only (i.e., they are not included in geographic district, county, metropolitan statistical area, or commuting zone level data). You can also use the published crosswalk on our Get the Data page to obtain the stable or time-varying geographical information for years 2009-2019.

Below is a mock-up of the proficiency data format we use in school estimation:

SEDA School ID | Subgroup | Subject | Grade | Year (Spring) | Students at Level 1 | Students at Level 2 | Students at Level 3 | Students at Level 4
11111122222 | All students | RLA | 3 | 2009 | 10 | 50 | 100 | 50
777777755555 | All students | RLA | 3 | 2009 | 5 | 30 | 80 | 40

Estimating the location of each state’s proficiency thresholds

We use a statistical technique called heteroskedastic ordered probit (HETOP) modeling to estimate the location of the thresholds that define the proficiency categories within each state, subject, grade, and year. We estimate the model using all the counts of students in each geographic school district within a state-subject-grade-year.

A simplified description of this method follows. We assume the distribution of test scores in each school district is bell-shaped. For each state, grade, year, and subject, we then find the set of test-score thresholds that meet two conditions: 1) they would most closely produce the reported proportions of students in each proficiency category; and 2) they represent a test-score scale in which the average student in the state-grade-year-subject has a score of 0 and the standard deviation of scores is 1.
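To make this concrete, here is a deliberately simplified sketch in Python. It is not the HETOP estimation used for SEDA: it pools a state's counts across districts and assumes a standard normal statewide distribution, so each threshold reduces to the normal quantile of the cumulative share of students scoring below it. The counts are hypothetical.

```python
# Simplified threshold recovery (not the full HETOP model): pool a state's
# proficiency counts, assume a standard normal statewide score distribution,
# and read each threshold off as a normal quantile.
import numpy as np
from scipy.stats import norm

# Hypothetical counts of students in 4 proficiency levels, one row per district
counts = np.array([
    [10, 50, 100, 50],
    [5,  30,  80, 40],
])

cum_share = counts.sum(axis=0).cumsum() / counts.sum()
thresholds = norm.ppf(cum_share[:-1])   # 3 thresholds separate 4 categories
print(thresholds.round(2))              # approx. [-1.74 -0.64  0.69]
```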



Example: State A, Grade 4 reading in 2014–15

In the example below, there are three districts in State A. The table shows the number and proportion of scores in each of the state's four proficiency categories. District 1 has more lower-scoring students than the others; District 3 has more higher-scoring students. Assuming each district's test-score distribution is bell-shaped, we determine the locations of the three thresholds that would yield the proportions of students shown in the table. In this example, the top threshold is one standard deviation above the statewide average score. At this value, we would expect 0% of students from District 1, 16% of students from District 2, and 20% of students from District 3 to score in the top proficiency category.

[Table and chart showing each district's distribution across the four proficiency categories]
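The example's percentages follow directly from the bell-curve assumption. In the sketch below, the district means and standard deviations are hypothetical values chosen to reproduce the example; in practice they are estimated by the HETOP model.

```python
# Share of each district's (assumed normal) score distribution lying above
# the top threshold, which sits one state SD above the state mean (i.e., 1.0).
from scipy.stats import norm

top = 1.0
districts = [("District 1", -1.50, 0.8),   # hypothetical (mean, SD) pairs
             ("District 2",  0.00, 1.0),
             ("District 3",  0.16, 1.0)]
for name, mu, sd in districts:
    share = 1 - norm.cdf((top - mu) / sd)
    print(name, f"{share:.0%}")            # 0%, 16%, 20%
```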

Placing the proficiency thresholds on the same scale

As discussed above, we cannot compare proficiency thresholds across places, grades, and years because states use different tests with completely different scales and set their proficiency thresholds at different levels of mastery. Knowing that a proficiency threshold is one standard deviation above the state average score does not help us compare proficiency thresholds across places, grades, or years because we do not know how a state’s average score in one grade and year compares to that in other states, grades, and years.

Luckily, we can use the National Assessment of Educational Progress (NAEP), a test taken in every state, to place the thresholds on the same scale. This step facilitates comparisons across states, grades, and years.

A random sample of students in every state takes the NAEP assessment in Grades 4 and 8 in math and RLA in odd years (e.g., 2009, 2011, 2013, 2015, 2017, and 2019). From NAEP, then, we know the relative performance of states on a common assessment. For the grades and years in which NAEP was not administered, we average the scores from the grades and years just before and just after to obtain estimates for the untested grades and years.

We use the states’ NAEP results in each grade, year, and subject to rescale the thresholds to the NAEP scale. For each subject, grade, and year, we multiply the thresholds by the state’s NAEP standard deviation and add the state’s NAEP average score.



Example: State A, Grade 4 reading in 2014–15

The average score and standard deviation of State A NAEP scores in Grade 4 reading in 2014–15 were:

  • Mean NAEP Score: 200
  • Standard Deviation of NAEP Score: 40

We have three thresholds:

  • Threshold 1: -0.75
  • Threshold 2: 0.05
  • Threshold 3: 1.0

As an example, let’s convert Threshold 1 onto the NAEP scale. First, we multiply by 40. Then, we add 200:

(-0.75 x 40.0) + 200 = 170

This yields a new “linked” Threshold 1 of 170. The table below shows all three linked thresholds.

Threshold | Original | Linked (on NAEP Scale)
1 | -0.75 | 170
2 | 0.05 | 202
3 | 1.0 | 240

We repeat this step for every state in every subject, grade, and year. The result is a set of thresholds for every state, subject, grade, and year that are all on the same scale, the NAEP scale.
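The linking arithmetic is a single rescaling per state-subject-grade-year. A minimal sketch using the State A values from the example:

```python
# Map state-scale thresholds onto the NAEP scale: multiply by the state's
# NAEP standard deviation, then add the state's NAEP mean.
naep_mean, naep_sd = 200.0, 40.0        # State A, Grade 4 reading, 2014-15
state_thresholds = [-0.75, 0.05, 1.0]   # from the HETOP step

linked = [t * naep_sd + naep_mean for t in state_thresholds]
print(linked)   # [170.0, 202.0, 240.0]
```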

For more information, see Reardon, Kalogrides & Ho (2019).

Estimating the mean from proficiency count data

The next step of our process is to estimate the mean test score in each unit for all students and by student subgroups (gender, race/ethnicity, and economic disadvantage). To do this, we estimate heteroskedastic ordered probit models using both the raw proficiency count data (shown above) and the linked thresholds from the prior step. This method allows us to estimate the mean standardized test score in each unit for every subgroup, subject, grade, and year on the same scale.
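As a rough illustration (not the estimation code used for SEDA, which models all units jointly), one can recover a single school's mean and standard deviation on the NAEP scale by maximum likelihood from its proficiency counts and the linked thresholds; the counts below are hypothetical:

```python
# One-unit ordered-probit fit: ML estimate of a school's mean and SD on the
# NAEP scale, given its proficiency counts and the linked thresholds.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

linked_thresholds = np.array([170.0, 202.0, 240.0])  # from the linking step
counts = np.array([10, 50, 100, 50])                 # hypothetical school

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                         # keeps the SD positive
    cdf = norm.cdf((linked_thresholds - mu) / sigma)
    probs = np.diff(np.concatenate(([0.0], cdf, [1.0])))  # 4 category probabilities
    return -np.sum(counts * np.log(probs))

fit = minimize(neg_log_lik, x0=[200.0, np.log(40.0)])
print(fit.x[0].round(1), np.exp(fit.x[1]).round(1))   # estimated mean and SD
```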

For more information, see Steps 5 and 6 in the technical documentation; Reardon, Shear, et al. (2017); and Shear and Reardon (2020).


Scaling the estimates to grade equivalents

On the website, we report all data in grade levels, or what we call the Grade (within Cohort) Standardized (GCS) scale. On this scale, users can interpret one unit as one grade level. The national average performance is 3 in Grade 3, 4 in Grade 4, and so on.

To convert our estimates from the NAEP scale into grade levels, we first approximate the average amount student test scores grow per grade on NAEP. To do this, we use data from four national NAEP cohorts: the cohorts who were in 4th grade in 2009, 2011, 2013, and 2015. Below we show the average national NAEP scores in Grades 4 and 8 for these four cohorts. We average across the four cohorts to create a stable baseline, or reference group.

Subject | Grade | 2009 Cohort | 2011 Cohort | 2013 Cohort | 2015 Cohort | Average
Math | 4 | 238.1 | 239.2 | 240.4 | 239.1 | 239.2
Math | 8 | 282.7 | 280.4 | 280.9 | 279.9 | 281.0
Reading | 4 | 217.0 | 217.8 | 219.1 | 220.0 | 218.5
Reading | 8 | 264.8 | 263.0 | 264.0 | 260.6 | 263.1

We calculate the amount the test scores changed between 4th and 8th grade (Average 4th to 8th Grade Growth) as the average score in 8th grade minus the average score in 4th grade. Then, to get an estimate of per-grade growth, we divide that value by 4 (Average Per-Grade Growth).

Subject | Average 4th Grade Score | Average 8th Grade Score | Average 4th to 8th Grade Growth | Average Per-Grade Growth
Math | 239.2 | 281.0 | 41.8 | 10.44
Reading | 218.5 | 263.1 | 44.6 | 11.16

(The growth and per-grade values are computed from unrounded cohort averages, so they can differ slightly from arithmetic on the rounded values shown.)
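The table's arithmetic, reproduced from the cohort averages (this also shows why the math per-grade value is 10.44 rather than 41.8/4 = 10.45: growth is computed before rounding):

```python
# Per-grade NAEP growth computed from the unrounded cohort averages above.
import numpy as np

math_g4 = np.mean([238.1, 239.2, 240.4, 239.1])   # 239.2
math_g8 = np.mean([282.7, 280.4, 280.9, 279.9])   # 280.975 (shown as 281.0)
read_g4 = np.mean([217.0, 217.8, 219.1, 220.0])   # 218.475 (shown as 218.5)
read_g8 = np.mean([264.8, 263.0, 264.0, 260.6])   # 263.1

print((math_g8 - math_g4) / 4)   # 10.44375 -> reported as 10.44
print((read_g8 - read_g4) / 4)   # 11.15625 -> reported as 11.16
```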

Now, we can use these numbers to rescale the SEDA estimates that are on the NAEP scale into grade equivalents. From the SEDA estimates we subtract the 4th-grade average score, divide by the per-grade growth, and add 4.



Example: Converting NAEP scores into grade levels

A score of 250 in 4th-grade math becomes:

  (250 – 239.2)/10.44 + 4 = 5.03.

In other words, these students score at a 5th-grade level, or approximately one grade level ahead of the national average (the reference group) in math.

A score of 200 in 3rd-grade reading becomes:

  (200 – 218.5)/11.16 + 4 = 2.34.

In other words, these students score approximately two-thirds of a grade level behind the national average for 3rd graders in reading.
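A small helper wrapping this conversion (the function name and structure are ours, for illustration; the constants are the ones derived above):

```python
# Hypothetical helper for the NAEP-to-GCS (grade equivalent) conversion.
GRADE4_MEAN = {"math": 239.2, "reading": 218.5}   # average NAEP 4th-grade scores
PER_GRADE = {"math": 10.44, "reading": 11.16}     # average per-grade NAEP growth

def naep_to_gcs(score: float, subject: str) -> float:
    """Subtract the 4th-grade mean, divide by per-grade growth, add 4."""
    return (score - GRADE4_MEAN[subject]) / PER_GRADE[subject] + 4

print(round(naep_to_gcs(250, "math"), 2))      # 5.03
print(round(naep_to_gcs(200, "reading"), 2))   # 2.34
```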

The three parameters: Average test scores, learning rates, and trends in test scores

We use hierarchical linear models to produce estimates of average test scores, learning rates, and trends in average test scores. The intuition behind these models is described in this section.

We have measures of the average test scores in up to 66 grade-year cells in each tested subject for each unit. The scores are adjusted so that a value of 3 corresponds to the average achievement of 3rd graders nationally, a value of 4 corresponds to the average achievement of 4th graders nationally, and so on. For each subject, these can be represented in a table like this:

Hypothetical Average Test Scores (Grade-level Equivalents), By Grade and Year

Grade | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019
8 | 8.5 | 8.6 | 8.7 | 8.8 | 8.9 | 9.0 | 9.1 | 9.2 | 9.3 | 9.4 | 9.5
7 | 7.4 | 7.5 | 7.6 | 7.7 | 7.8 | 7.9 | 8.0 | 8.1 | 8.2 | 8.3 | 8.4
6 | 6.3 | 6.4 | 6.5 | 6.6 | 6.7 | 6.8 | 6.9 | 7.0 | 7.1 | 7.2 | 7.3
5 | 5.2 | 5.3 | 5.4 | 5.5 | 5.6 | 5.7 | 5.8 | 5.9 | 6.0 | 6.1 | 6.2
4 | 4.1 | 4.2 | 4.3 | 4.4 | 4.5 | 4.6 | 4.7 | 4.8 | 4.9 | 5.0 | 5.1
3 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 3.9 | 4.0

In this hypothetical school district, students in 3rd grade in 2009 earned an average score of 3 in this subject, indicating that students scored at a 3rd-grade level, on average (equal to the national average for 3rd graders). Students in 8th grade in 2019 scored at a Grade 9.5 level, on average (1.5 grade levels above the national average for 8th graders).

From this table, we can compute the average test score, the average learning rate, and the average test score trend for the district.

Computing the average test score

To compute the average test score across grades and years, we first use the information in the table to calculate how far above or below the national average students are in each grade and year. This entails subtracting the national grade-level average—e.g., 8 in 8th grade—from the grade-year-specific score.


Hypothetical Average Test Scores (Grade-level Equivalents Relative to National Average), By Grade and Year

Grade | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019
8 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5
7 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4
6 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1 | 1.2 | 1.3
5 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1 | 1.2
4 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1
3 | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0

In this representation, students in Grade 3 in 2009 have a score of 0, meaning their test scores are equal to the national average for 3rd graders. Students in Grade 8 in 2019 have a score of 1.5, meaning their scores are 1.5 grade levels above the national average for 8th graders.

We then compute the average of these values. In this example, the average difference (the average of the values in the table) is 0.75, meaning that the average grade 3–8 student in the district scores 0.75 grade levels above the national average.

Computing the average learning rate

To compute the average learning rate, we compare students’ average scores in one grade and year to those in the next grade and year (see below). In other words, we look at grade-to-grade improvements in performance within each cohort.

Hypothetical Average Test Scores (Grade-level Equivalents), By Grade and Year

Grade | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019
8 | 8.5 | 8.6 | 8.7 | 8.8 | 8.9 | 9.0 | 9.1 | 9.2 | 9.3 | 9.4 | 9.5
7 | 7.4 | 7.5 | 7.6 | 7.7 | 7.8 | 7.9 | 8.0 | 8.1 | 8.2 | 8.3 | 8.4
6 | 6.3 | 6.4 | 6.5 | 6.6 | 6.7 | 6.8 | 6.9 | 7.0 | 7.1 | 7.2 | 7.3
5 | 5.2 | 5.3 | 5.4 | 5.5 | 5.6 | 5.7 | 5.8 | 5.9 | 6.0 | 6.1 | 6.2
4 | 4.1 | 4.2 | 4.3 | 4.4 | 4.5 | 4.6 | 4.7 | 4.8 | 4.9 | 5.0 | 5.1
3 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 3.9 | 4.0

For example, we compare the average score in Grade 3 in 2009 (3.0) to the average score in Grade 4 in 2010 (4.2). The difference of 1.2 indicates that students’ test scores are 1.2 grade levels higher in 4th grade than they were in 3rd grade, or that students’ learning rate in that year and grade was 1.2. We compute this difference for each diagonal pair of cells in the table, and then take their average. In this table, the average learning rate is also 1.2. If average test scores were at the national average in each grade and year, the average learning rate would be 1.0 (indicating that the average student’s scores improved by one grade level each grade). So, a value of 1.2 indicates that learning rates in this district are 20% faster than the national average.

Computing the trend in average test scores

To compute the average test score trend, we compare students’ average scores in one grade and year to those in the same grade in the next year (see below). In other words, we look at year-to-year improvements in performance within each grade.

Hypothetical Average Test Scores (Grade-level Equivalents), By Grade and Year

Grade | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019
8 | 8.5 | 8.6 | 8.7 | 8.8 | 8.9 | 9.0 | 9.1 | 9.2 | 9.3 | 9.4 | 9.5
7 | 7.4 | 7.5 | 7.6 | 7.7 | 7.8 | 7.9 | 8.0 | 8.1 | 8.2 | 8.3 | 8.4
6 | 6.3 | 6.4 | 6.5 | 6.6 | 6.7 | 6.8 | 6.9 | 7.0 | 7.1 | 7.2 | 7.3
5 | 5.2 | 5.3 | 5.4 | 5.5 | 5.6 | 5.7 | 5.8 | 5.9 | 6.0 | 6.1 | 6.2
4 | 4.1 | 4.2 | 4.3 | 4.4 | 4.5 | 4.6 | 4.7 | 4.8 | 4.9 | 5.0 | 5.1
3 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 3.9 | 4.0

For example, we compare the average score in Grade 3 in 2009 (3.0) to the average score in Grade 3 in 2010 (3.1). The difference of 0.1 indicates that students’ test scores are 0.1 grade levels higher in 3rd grade in 2010 than they were in 3rd grade in 2009. We compute this difference for each horizontal pair of cells in the table, and then take their average. In this example, the average test score trend is 0.1 grade levels per year.
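All three quantities can be reproduced from the hypothetical table with a few lines of NumPy. This sketch generates the table's regular pattern rather than typing it out:

```python
# Average score, learning rate, and trend for the hypothetical district.
import numpy as np

grades = np.arange(3, 9)[:, None]         # rows: grades 3-8
years = np.arange(2009, 2020)[None, :]    # columns: 2009-2019
scores = grades + 0.1 * (grades - 3) + 0.1 * (years - 2009)

# Average test score: mean distance from the national grade-level average.
avg_score = (scores - grades).mean()

# Learning rate: mean within-cohort gain across diagonal cell pairs.
learning_rate = (scores[1:, 1:] - scores[:-1, :-1]).mean()

# Trend: mean within-grade change across horizontal cell pairs.
trend = (scores[:, 1:] - scores[:, :-1]).mean()

print(round(avg_score, 2), round(learning_rate, 2), round(trend, 2))  # 0.75 1.2 0.1
```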

For technical details, see Step 9 of the technical documentation.

Data reporting

Estimates Shown on the Website

We report average test scores, learning rates, and trends in average test scores for schools, geographic districts, counties, and states in our Opportunity Explorer. To access data for the other units (e.g., administrative districts, commuting zones, and metropolitan statistical areas) or other types of estimates (e.g., estimates separately by subject, grade, and year), please visit our Get the Data page.

Suppression of Estimates

We do not report average performance, learning, and/or trend estimates if:

  • Fewer than 20 students are represented in the estimate
  • More than 20% of students in the unit took alternative assessments
  • The estimates are too imprecise to be informative

Data accuracy

We have taken several steps to ensure the accuracy of the data reported here. The statistical and psychometric methods underlying the data we report are summarized here and published in peer-reviewed journals. First, we conduct statistical analyses to ensure that our methods of converting the raw data into measures of average test scores are accurate. For example, in a small subset of school districts, students take the NAEP test in addition to their state-specific tests. Since the NAEP test is the same across districts, we can use these districts’ NAEP scores to determine the accuracy of our method of converting the state test scores to a common scale. When we do this, we find that our measures are accurate, and generally yield the same conclusions about relative average test scores as we would get if all students took the NAEP test. For more information on these analyses, see Reardon, Kalogrides & Ho (2019).

Second, one might be concerned that our learning-rate estimates do not account for students moving in and out of schools and districts. For example, if many high-achieving students move out of a school or district in the later grades and/or many low-achieving students move in, the average test scores will appear to grow less from 3rd to 8th grade than they should. This would cause us to underestimate the learning rate in a school or district.

To determine the accuracy of our learning-rate estimates, we compared them to the estimated learning rate we would get if we could track individual students’ learning rates over time. Working with research partners who had access to student-level data in three states, we determined that our learning-rate estimates are generally sufficiently accurate to allow comparisons among districts and schools. We did find that our learning-rate estimates tend to be slightly less accurate for charter schools. On average, our estimated learning rates for charter schools tend to overstate the true learning rates in charter schools in these three states by roughly 5%. This is likely because charter schools have more student in- and out-mobility than traditional public schools. It suggests that learning-rate comparisons between charter and traditional public schools should be interpreted with some caution. For more information on these analyses, see Reardon et al. (2019).

Third, we have constructed margins of error for each of the measures of average test scores, learning rates, and trends in average scores. On the explorer, we show 95% confidence intervals. In the downloadable data, we provide standard errors, which can be used in statistical analyses and comparisons. Interested users can download data files that include these standard errors from our Get the Data page.

Fourth, we do not release any estimates on the website or in the downloadable data files where the margin of error is large. In places where there are a small number of students (or a small number of students of a given subgroup), the margin of error is sometimes large; we do not report data in such cases. Margins of error of school learning rates are also large when there are only two or three grade levels in a school; as a result, roughly one-third of schools are missing learning rates on the website.