Methods

How SEDA produced estimates of student performance that are comparable across places, grades, and years.

This page provides a simplified description of the methods used to create the SEDA 2023 data shown in the 2019-2023 Education Recovery Explorer. For more detail, see the SEDA 2023 Technical Documentation.

What is SEDA 2023?

SEDA 2023 is a special release of the Stanford Education Data Archive that can be viewed on the 2019-2023 Education Recovery Explorer. This release is designed to provide insight into how school district average achievement in 2023, three years after the onset of the COVID-19 pandemic, compares to achievement in 2022, two years after the onset of the pandemic, and 2019, the year prior to the pandemic.

Source Data & Construction

The construction of the SEDA 2023 data follows steps similar to those used for SEDA 4.1, described in the 2009-2019 Opportunity Explorer Methods. However, there are four key differences, which we discuss here:

Source Data

The state proficiency data used to construct the SEDA 2023 test score estimates come from two sources. The first source is the U.S. Department of Education’s EDFacts data, which is the source data used to construct SEDA 4.1. However, because EDFacts data were not yet available for 2022 or 2023, we used publicly released 2022 and 2023 data from state websites. Notably, not all states have released 2022 and 2023 proficiency data to date, and some state-released data were not sufficiently complete for our use; therefore, only a subset of U.S. states is included in the current data.

As in SEDA 4.1, because different states use different tests and proficiency thresholds, the test score estimates derived from the state data sources are not readily comparable across states, grades, or years. Therefore, we also draw on the 2019 and 2022 administrations of the National Assessment of Educational Progress (NAEP) in 4th and 8th grade to link the estimates to a scale that is comparable among states and over grades and years.

Definition of a School District

In SEDA 2023, we report estimates for administrative school districts. Administrative school districts operate sets of public and charter schools. The schools operated by each school district are identified using the National Center for Education Statistics (NCES) school and district identifiers. Most commonly, administrative school districts operate local public schools within a given physical boundary; these are what we refer to as “traditional public school districts.” There are specialized administrative districts, like charter school and virtual school districts, that do not have a physical boundary. These districts will not appear on our maps.

Administrative districts differ from the geographic districts used in SEDA 4.1. The key difference is that for geographic school districts, we “reassign” charter schools to the district in which they are physically located (regardless of the entity that operates the schools). We do no reassignment of charter schools in producing the administrative district estimates; charter schools are attached to the traditional public or charter district that operates them. For more information on geographic districts, we refer you to the SEDA 4.1 Technical Documentation and the 2009-2019 Opportunity Explorer Methods.

The rationale for using administrative districts in SEDA 2023 is twofold. First, one of the aims of SEDA 2023 is to help school districts understand their learning recovery needs. Administrative districts have the authority to set policy for their schools, so it is most useful for the estimates to reflect only the schools they operate. Second, to construct geographic school districts, we need data for individual charter schools. While many states report such data in 2023, data for many individual schools are suppressed because of the small numbers of students taking assessments. Because of this, we cannot reliably construct geographic school district estimates for the 2022 and 2023 school years.

Linking

For 2019 and 2022, we use the proficiency threshold linking methodology described here. At the end of this step, we have proficiency thresholds in 2019 and 2022 that are linked to a common scale—the NAEP scale.

We must use a different process in 2023, because we do not have 2023 NAEP data. Instead, for 2023, we use the 2022 linked proficiency thresholds. For this approach to enable accurate comparisons of 2022-2023 test score changes among districts in the same state, states’ test score scales and proficiency thresholds must be comparable from 2022 to 2023. We exclude 2023 data for states where we found evidence of changes in state assessments. For this approach to enable accurate comparisons of 2022-2023 test score changes among districts in different states, we also assume that state test score trends from 2022 to 2023 are comparable to unmeasured NAEP trends from 2022 to 2023.

Available Estimates and Scales

In the 2019-2023 Education Recovery Explorer and the downloadable files, we provide the following district-level estimates by subgroup (where data are available):

  • 2019-2022 change in average math scores
  • 2019-2022 change in average reading scores
  • 2022-2023 change in average math scores
  • 2022-2023 change in average reading scores
  • 2019-2023 change in average math scores
  • 2019-2023 change in average reading scores

We report all test score changes in grade levels. On this scale, each unit is interpretable as one grade level. For example:

  • A 2019-2022 change in average math scores of -1 grade levels means that students in 2022 scored, on average, 1 grade level below their 2019 counterparts.

Grade levels are defined using the 2019 national NAEP 4th and 8th grade data (“the 2019 norm group”). Using the 4th and 8th grade data, we approximate the average number of NAEP points by which student test scores differ per grade in each subject. We then rescale the NAEP-scale estimates using those parameters.

Note that SEDA 2023 grade levels are not equivalent to SEDA 4.1 grade levels. In SEDA 4.1, the per-grade growth is defined by a 4-cohort norm group (rather than the 2019 norm group described above). For more details on how we calculate SEDA 4.1 growth, we refer you to the SEDA 4.1 Technical Documentation and the 2009-2019 Opportunity Explorer Methods. For those using our downloadable test score files, the estimates in the explorer are the Empirical Bayes (EB) estimates.

Interpretation and Data Accuracy

We think of changes in average scores as reflecting changes in the average educational opportunities available to students between two time points. For example, if the 2019-2022 change in average math or reading achievement is negative, it means that students in 2022 in that district scored lower, on average, than students in 2019 in that district. This suggests that, by the time they were tested, students in 2022 had had fewer educational opportunities to learn (in their schools, homes, neighborhoods, and beyond) than the student population in 2019.

Changes in test scores during and after the pandemic may be due to a variety of mechanisms. The test score data in SEDA 2023 may only enable understanding of some of these mechanisms. To provide context for interpreting the data, we include data flags and margins of error.

Population change flag

In many districts, the student population shifted from 2019 to 2022 (enrollment data for 2023 are not yet available). Using enrollment data from the Common Core of Data (CCD), we flag any districts where: (a) the total number of students enrolled changed by more than 20%; and/or (b) the percentage of students from any racial group changed by more than 5 percentage points. For details on the calculation of this population change flag, see the SEDA 2023 technical documentation.
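
To make the flag concrete, the sketch below shows the flag logic in Python. It is an illustration only, not SEDA’s code; the enrollment counts and field names are hypothetical.

```python
# Hypothetical sketch of the population change flag logic (not SEDA's actual code).
import pandas as pd

def population_change_flag(enroll_2019: pd.Series, enroll_2022: pd.Series) -> bool:
    """Flag a district if total enrollment changed by more than 20%, or if the
    share of students in any racial group changed by more than 5 percentage points.
    Each Series holds a total enrollment count plus counts by racial group."""
    total_change = abs(enroll_2022["total"] - enroll_2019["total"]) / enroll_2019["total"]
    if total_change > 0.20:
        return True
    for group in (g for g in enroll_2019.index if g != "total"):
        share_2019 = enroll_2019[group] / enroll_2019["total"]
        share_2022 = enroll_2022[group] / enroll_2022["total"]
        if abs(share_2022 - share_2019) > 0.05:
            return True
    return False

# Example: a district whose Hispanic enrollment share rises from 20% to 26% is flagged.
d_2019 = pd.Series({"total": 1000, "white": 600, "black": 200, "hispanic": 200})
d_2022 = pd.Series({"total": 1050, "white": 609, "black": 168, "hispanic": 273})
print(population_change_flag(d_2019, d_2022))  # True
```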

Margin of error

In some cases, estimates are imprecise, such that a change in average scores is not statistically distinguishable from zero. We have constructed margins of error (“standard errors”) for each of the estimates to help users identify such cases. We also do not show any estimates on the website where the margin of error is large. For those downloading the data and using it in analysis, standard errors are included in the downloadable data files.
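
For those working with the downloadable files, the sketch below shows one way to use an estimate and its standard error to judge whether a change is statistically distinguishable from zero. The numbers are hypothetical.

```python
# Hypothetical example: use a change estimate and its standard error to form a
# 95% confidence interval and check whether the change is distinguishable from zero.
change_2019_2022 = -0.42   # change in average math scores, in grade levels
standard_error = 0.25      # standard error of that change

margin_of_error = 1.96 * standard_error          # half-width of a 95% confidence interval
ci_low = change_2019_2022 - margin_of_error
ci_high = change_2019_2022 + margin_of_error
distinguishable_from_zero = not (ci_low <= 0 <= ci_high)

print((round(ci_low, 2), round(ci_high, 2)))   # (-0.91, 0.07)
print(distinguishable_from_zero)               # False: the interval includes zero
```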

Data shown in the Opportunity Explorer are from SEDA 5.0. A simplified description of the SEDA 5.0 methodology is provided here. For more detail on how we construct the estimated test score parameters, please see the SEDA 5.0 Technical Documentation. For more detail on the statistical methods that we use, as well as information on the accuracy of the estimates, please see our technical papers. Not all data from SEDA 5.0 are visualized on the Opportunity Explorer; the complete data are downloadable on the Get the Data page.

Data shown in the 2019-2023 Education Recovery Explorer are unique estimates (downloadable as “SEDA 2023” on the Get the Data page). For a detailed description of the difference between the most recent version of SEDA and the SEDA 2023 data, view the SEDA 2023 Special Release Technical Documentation.

Background

Federal law requires all states to test all students in grades 3-8 each year in math and Reading Language Arts (RLA), commonly called “accountability” testing. It also requires that states make the aggregated results of those tests public.

All states report test results for schools and school districts to the Department of Education’s National Center for Education Statistics (NCES), which houses this data in the EDFacts database. There are two versions of this data: a public version, available on their website, and a restricted version, available by request. In the public files, data for small places or small subgroups are not reported to ensure the privacy of the students. In contrast, the restricted files contain all data, regardless of the number of students within the school/district or subgroup. We use the restricted files to estimate test score distributions in education policy relevant units (e.g., schools, geographic districts, administrative school districts, counties, metropolitan statistical areas, commuting zones, and states). The school, geographic district, county, and state data are visualized on the Opportunity Explorer; all other aggregations can be found in the downloadable data files on the Get the Data page. Details on the data construction process can be found below.

Challenges Working with Proficiency Data

While there is a substantial amount of data from every state available in EDFacts, there are four key challenges when using these data:

  1. States provide only “proficiency data”: the count of students at each of the proficiency levels (sometimes called “achievement levels” or “performance levels”). The levels represent different degrees of mastery of the subject-specific grade-level material. Levels are defined by score “thresholds” (sometimes called “cut scores”), which are set by experts in the field. Scoring above or below different thresholds determines placement in a specific proficiency level. Common levels include “below basic,” “basic,” “proficient,” and “advanced.” An example is shown below.

    Test Score Proficiency Level Description
    200-500 Below Basic Inadequate performance; minimal mastery
    501-600 Basic Marginal performance; partial mastery
    601-700 Proficient Satisfactory performance; adequate mastery
    701-800 Advanced Superior performance; complete mastery
  2. Most states use their own test and define “proficiency” in different ways, meaning that we cannot directly compare test results in one state to those in another. Proficient in one state/grade/year/subject is not comparable to proficient in another.

    Consider two states that use the same test, which is scored on a scale from 200 to 800 points. Each state sets its own threshold for proficiency at different scores.

    State A: Higher threshold for proficiency (cutoff at 600 on a scale of test scores relative to grade-level expectations)

    State B: Lower threshold for proficiency (cutoff at 500 on a scale of test scores relative to grade-level expectations)

    Imagine 500 students take the test. The results are as follows: 50 students score below 400 on the exam; 100 score between 400 and 500; 200 score between 500 and 600; 50 score between 600 and 650; 50 score between 650 and 700; and 50 score above 700. If we use State A’s thresholds for assignment to categories, we find that 150 students are proficient. However, if we use State B’s thresholds, 350 students are proficient. (A short code sketch following this list illustrates the calculation.)

    Number of students at each proficiency level (Levels 1-2 are not proficient; Levels 3-4 are proficient)

    State   Level 1   Level 2   Level 3   Level 4
    A       150       200       100       50
    B       50        100       250       100

    In practice, this means that students in State B may appear to have higher “proficiency” rates than those in State A—even if their true achievement patterns are identical! Using the proficiency data without accounting for differing proficiency thresholds may lead to erroneous conclusions about the relative performance of students in different states.

    This problem is more complicated than the example suggests, because most states use different tests with material of varying difficulty and scores reported on different scales. Therefore, we cannot compare proficiency, nor can we compare students’ test scores between states.

  3. Even within a state, different tests are used in different grade levels. This means that we cannot readily compare the performance of students in 4th grade in one year to that of students in 5th grade in the next year. Therefore, we cannot measure average learning rates across grades.

  4. States may change the tests they use over time. This may result from changes in curricular standards; for example, the introduction of the Common Core State Standards led many states to adopt different tests. These changes make it hard to compare average performance in one year to that of the next. Therefore, we cannot readily measure trends in average performance over time.
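
The short sketch below restates the arithmetic from the example in point 2: the same hypothetical score distribution classified under State A’s and State B’s proficiency cutoffs. It is purely illustrative.

```python
# Hypothetical illustration of the example in point 2: the same 500 students,
# classified as proficient or not under two different state proficiency cutoffs.
score_bins = [            # (lower bound, upper bound, number of students)
    (200, 400, 50),
    (400, 500, 100),
    (500, 600, 200),
    (600, 650, 50),
    (650, 700, 50),
    (700, 800, 50),
]

def count_proficient(bins, cutoff):
    """Count students whose scores fall at or above the proficiency cutoff."""
    return sum(n for low, high, n in bins if low >= cutoff)

print(count_proficient(score_bins, 600))  # State A's cutoff: 150 proficient
print(count_proficient(score_bins, 500))  # State B's cutoff: 350 proficient
```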

SEDA methods: Addressing the challenges

While these challenges are substantial, they are not insurmountable. The EOP team has developed methods to address these challenges in order to produce estimates of students’ average test scores, average learning rates across grades, and trends over time in each unit (e.g., school, geographic district, etc.). All estimates are comparable across states, grades, and years.

Below we describe the raw data used to create SEDA and how we:

  1. Estimate the location of each state’s proficiency thresholds
  2. Place states’ proficiency thresholds on the same scale
  3. Estimate mean test scores from the raw data and the threshold estimates
  4. Scale the estimates to grade equivalents
  5. Estimate average scores, learning rates, and trends in average scores

Raw data

The data used to develop SEDA come from the EDFacts restricted-use files, provided to our team via a data-use agreement with NCES.

EDFacts provides the counts of students scoring in proficiency levels for nearly every school in the U.S.

From the data provided, we use data for:

  • All 50 states and DC
  • School years 2008-09 through 2018-19
  • Grades 3-8
  • Math and RLA
  • Various subgroups of students: all students, racial/ethnic subgroups, gender subgroups, and economic subgroups

In the data, schools are identified by NCES IDs, which can be linked to other data sources such as the Common Core of Data (CCD). Following the CCD’s guidelines, we create a stable school identifier we call the SEDA school ID. To create stable geographic district identifiers, called SEDA LEA IDs, we use the school’s most recent geographical information and the most recent school district the school was observed in based on the 2019 elementary and unified school district boundaries from EDGE. Similarly, we use the most recent administrative district, county, metropolitan area, commuting zone, and state for each school to ensure stability over time. Note that SEDA includes special education schools (as defined by CCD) in school, administrative district, and state level data only (i.e., they are not included in geographic district, county, metropolitan statistical area, or commuting zone level data). You can also use the published crosswalk on our Get the Data page to obtain the stable or time-varying geographical information for years 2009-2019.

Below is a mock-up of the proficiency data format we use in school estimation:

SEDA School ID   Subgroup       Subject   Grade   Year (Spring)   Level 1   Level 2   Level 3   Level 4
11111122222      All students   RLA       3       2009            10        50        100       50
777777755555     All students   RLA       3       2009            5         30        80        40

(The last four columns give the number of students scoring at each proficiency level.)

Estimating the location of each state’s proficiency thresholds

We use a statistical technique called heteroskedastic ordered probit (HETOP) modeling to estimate the location of the thresholds that define the proficiency categories within each state, subject, grade, and year. We estimate the model using all the counts of students in each geographic school district within a state-subject-grade-year.

A simplified description of this method follows. We assume the distribution of test scores in each school district is bell-shaped. For each state, grade, year, and subject, we then find the set of test-score thresholds that meet two conditions: 1) they would most closely produce the reported proportions of students in each proficiency category; and 2) they represent a test-score scale in which the average student in the state-grade-year-subject has a score of 0 and the standard deviation of scores is 1.



Example: State A, Grade 4 reading in 2014–15

In the example below, there are three districts in State A. The table shows the number and proportion of scores in each of the state’s four proficiency categories. District 1 has more lower-scoring students than the others; District 3 has more higher-scoring students. Assuming each district’s test-score distribution is bell-shaped, we determine where the three thresholds would need to be located to yield the proportions of students shown in the table for each district. In this example, the top threshold is one standard deviation above the statewide average score. At this value, we would expect 0% of students from District 1, 16% of students from District 2, and 20% of students from District 3 to score in the top proficiency category.

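As an illustration only (not the actual SEDA estimation code), the sketch below captures the simplified logic under two assumptions: the statewide score distribution is standard normal, and each district’s distribution is bell-shaped. Pooling the districts’ counts gives the statewide proportion below each threshold; inverting the normal CDF places the thresholds on the standardized state scale; and a district’s predicted share in the top category is then the area of its distribution above the top threshold. The counts are hypothetical, and the full HETOP model estimates the thresholds and the district means and standard deviations jointly.

```python
# Simplified, illustrative stand-in for the HETOP step (not SEDA's actual code).
import numpy as np
from scipy.stats import norm

# Hypothetical counts of students in four proficiency levels for three districts
# in one state-subject-grade-year.
counts = np.array([
    [150, 200, 100,  50],   # District 1: more lower-scoring students
    [ 80, 170, 170,  80],   # District 2
    [ 50, 100, 200, 150],   # District 3: more higher-scoring students
])

# Pool the districts, compute the statewide proportion scoring below each
# threshold, and invert the normal CDF to place the thresholds on a scale where
# the statewide mean is 0 and the standard deviation is 1.
statewide = counts.sum(axis=0)
prop_below = np.cumsum(statewide)[:-1] / statewide.sum()
thresholds = norm.ppf(prop_below)
print(np.round(thresholds, 2))

# For a district with a given mean and SD on that scale, the predicted share of
# its students in the top proficiency category is the area above the top threshold.
district_mean, district_sd = 0.0, 1.0
share_top = 1 - norm.cdf(thresholds[-1], loc=district_mean, scale=district_sd)
print(round(share_top, 2))
```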

Placing the proficiency thresholds on the same scale

As discussed above, we cannot compare proficiency thresholds across places, grades, and years because states use different tests with completely different scales and set their proficiency thresholds at different levels of mastery. Knowing that a proficiency threshold is one standard deviation above the state average score does not help us compare proficiency thresholds across places, grades, or years because we do not know how a state’s average score in one grade and year compares to that in other states, grades, and years.

Luckily, we can use the National Assessment of Educational Progress (NAEP), a test taken in every state, to place the thresholds on the same scale. This step facilitates comparisons across states, grades, and years.

A random sample of students in every state takes the NAEP assessment in Grades 4 and 8 in math and RLA in odd years (e.g., 2009, 2011, 2013, 2015, 2017, and 2019). From NAEP, then, we know the relative performance of states on the NAEP assessment. In the grades and years when NAEP assessments were not administered to students, we average the scores in the grades and years just before and just after to obtain estimates for untested grades, subjects, and years.

We use the states’ NAEP results in each grade, year, and subject to rescale the thresholds to the NAEP scale. For each subject, grade, and year, we multiply the thresholds by the state’s NAEP standard deviation and add the state’s NAEP average score.



Example: State A, Grade 4 reading in 2014–15

The average score and standard deviation of State A NAEP scores in Grade 4 reading in 2014–15 were:

  • Mean NAEP Score: 200
  • Standard Deviation of NAEP Score: 40

We have three thresholds:

  • Threshold 1: -0.75
  • Threshold 2: 0.05
  • Threshold 3: 1.0

As an example, let’s convert Threshold 1 onto the NAEP scale. First, we multiply by 40. Then, we add 200:

(-0.75 x 40.0) + 200 = 170

This yields a new “linked” Threshold 1 of 170. The table below shows all three linked thresholds.

Threshold Original Linked (on NAEP Scale)
1 -0.75 170
2 0.05 202
3 1.0 240
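
In code, this linking step is a single linear transformation. Below is a minimal sketch using the numbers from the example above (not SEDA’s estimation code):

```python
# Link within-state thresholds to the NAEP scale: multiply by the state's NAEP
# standard deviation and add the state's NAEP mean (hypothetical State A example).
naep_mean, naep_sd = 200.0, 40.0
thresholds = [-0.75, 0.05, 1.0]

linked = [t * naep_sd + naep_mean for t in thresholds]
print(linked)  # [170.0, 202.0, 240.0]
```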

We repeat this step for every state in every subject, grade, and year. The result is a set of thresholds for every state, subject, grade, and year that are all on the same scale, the NAEP scale.

For more information, see Reardon, Kalogrides & Ho (2019).

Estimating the mean from proficiency count data

The next step of our process is to estimate the mean test score in each unit for all students and by student subgroups (gender, race/ethnicity, and economic disadvantage). To do this, we estimate heteroskedastic ordered probit models using both the raw proficiency count data (shown above) and the linked thresholds from the prior step. This method allows us to estimate the mean standardized test score in each unit for every subgroup, subject, grade, and year on the same scale.
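
The sketch below illustrates the idea for a single unit, using hypothetical counts and the linked thresholds from the earlier example. It is a simplified stand-in rather than SEDA’s estimation code (the HETOP models are fit jointly across units and also produce standard errors): with the linked thresholds held fixed, the mean and standard deviation of an assumed normal score distribution are chosen to best match the unit’s observed category counts.

```python
# Illustrative sketch (not SEDA's code): recover a unit's mean and SD on the
# linked (NAEP) scale by maximum likelihood, treating the category counts as
# multinomial with cell probabilities given by a normal distribution cut at the
# fixed, linked thresholds. Counts and thresholds below are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

linked_thresholds = np.array([170.0, 202.0, 240.0])   # from the linking step
counts = np.array([10, 50, 100, 50])                   # students per proficiency level

def neg_log_likelihood(params):
    mean, log_sd = params
    sd = np.exp(log_sd)                                # keep the SD positive
    cdf = norm.cdf(linked_thresholds, loc=mean, scale=sd)
    probs = np.diff(np.concatenate(([0.0], cdf, [1.0])))
    probs = np.clip(probs, 1e-12, None)                # guard against log(0)
    return -np.sum(counts * np.log(probs))

fit = minimize(neg_log_likelihood, x0=np.array([200.0, np.log(40.0)]))
mean_hat, sd_hat = fit.x[0], np.exp(fit.x[1])
print(round(mean_hat, 1), round(sd_hat, 1))
```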

For more information, see Steps 5 and 6 in the technical documentation; Reardon, Shear, et al. (2017); and Shear and Reardon (2020).


Scaling the estimates to grade equivalents

On the website, we report all data in grade levels, or what we call the Grade (within Cohort) Standardized (GCS) scale. On this scale, users can interpret one unit as one grade level. The national average performance is 3 in Grade 3, 4 in Grade 4, and so on.

To convert our estimates from the NAEP scale into grade levels, we first approximate the average amount by which student test scores grow per grade on NAEP. To do this, we use data from four national NAEP cohorts: the cohorts who were in 4th grade in 2009, 2011, 2013, and 2015. Below we show the average national NAEP scores in Grades 4 and 8 for these four cohorts. We average the four cohorts to create a stable baseline, or reference group.

Subject   Grade   2009 Cohort   2011 Cohort   2013 Cohort   2015 Cohort   Average
Math      4       238.1         239.2         240.4         239.1         239.2
Math      8       282.7         280.4         280.9         279.9         281.0
Reading   4       217.0         217.8         219.1         220.0         218.5
Reading   8       264.8         263.0         264.0         260.6         263.1

We calculate the amount the test scores changed between 4th and 8th grade (Average 4th to 8th Grade Growth) as the average score in 8th grade minus the average score in 4th grade. Then, to get an estimate of per-grade growth, we divide that value by 4 (Average Per-Grade Growth).

Subject   Average 4th Grade Score   Average 8th Grade Score   Average 4th to 8th Grade Growth   Average Per-Grade Growth
Math      239.2                     281.0                     41.8                              10.44
Reading   218.5                     263.1                     44.6                              11.16

Now, we can use these numbers to rescale the SEDA estimates that are on the NAEP scale into grade equivalents. From the SEDA estimates we subtract the 4th-grade average score, divide by the per-grade growth, and add 4.



Example: Converting NAEP scores into grade levels

A score of 250 in 4th-grade math becomes:

  (250 – 239.2)/10.44 + 4 = 5.03.

In other words, these students score at a 5th-grade level, or approximately one grade level ahead of the national average (the reference group) in math.

A score of 200 in 3rd-grade reading becomes:

  (200 – 218.5)/11.16 + 4 = 2.34.

In other words, these students score approximately two-thirds of a grade level behind the national average for 3rd graders in reading.
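
The same conversion written as a small function (a sketch using the reference-group parameters from the tables above):

```python
# Convert a NAEP-scale score to grade equivalents: subtract the reference
# 4th-grade average, divide by the per-grade growth, and add 4. The parameters
# come from the cohort-average tables above.
PARAMS = {
    "math":    {"grade4_mean": 239.2, "per_grade_growth": 10.44},
    "reading": {"grade4_mean": 218.5, "per_grade_growth": 11.16},
}

def naep_to_grade_equivalent(score: float, subject: str) -> float:
    p = PARAMS[subject]
    return (score - p["grade4_mean"]) / p["per_grade_growth"] + 4

print(round(naep_to_grade_equivalent(250, "math"), 2))     # 5.03
print(round(naep_to_grade_equivalent(200, "reading"), 2))  # 2.34
```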

Estimating average test scores, learning rates, and trends in average test scores

We use hierarchical linear models to produce estimates of average test scores, learning rates, and trends in average test scores. The intuition behind these models is described in this section.

We have measures of the average test scores in up to 66 grade-year cells in each tested subject for each unit. The scores are adjusted so that a value of 3 corresponds to the average achievement of 3rd graders nationally, a value of 4 corresponds to the average achievement of 4th graders nationally, and so on. For each subject, these can be represented in a table like this:

Hypothetical Average Test Scores (Grade-level Equivalents), By Grade and Year
Grade 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
8 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5
7 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
6 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3
5 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2
4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1
3 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0

In this hypothetical school district, students in 3rd grade in 2009 earned an average score of 3 in this subject, indicating that students scored at a 3rd-grade level, on average (equal to the national average for 3rd graders). Students in 8th grade in 2019 scored at a Grade 9.5 level, on average (1.5 grade levels above the national average for 8th graders).

From this table, we can compute the average test score, the average learning rate, and the average test score trend for the district.

Computing the average test score

To compute the average test score across grades and years, we first use the information in the table to calculate how far above or below the national average students are in each grade and year. This entails subtracting the national grade-level average—e.g., 8 in 8th grade—from the grade-year-specific score.


Hypothetical Average Test Scores (Grade-level Equivalents Relative to National Average), By Grade and Year
Grade 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
8 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5
7 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4
6 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3
5 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2
4 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1
3 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

In this representation, students in Grade 3 in 2009 have a score of 0, meaning their test scores are equal to the national average for 3rd graders. Students in Grade 8 in 2019 have a score of 1.5, meaning their scores are 1.5 grade levels above the national average for 8th graders.

We then compute the average of these values. In this example, the average difference (the average of the values in the table) is 0.75, meaning that the average grade 3–8 student in the district scores 0.75 grade levels above the national average.
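
A compact sketch of this calculation, reproducing the hypothetical table above (the actual SEDA estimates come from hierarchical linear models rather than a simple average of cells):

```python
# Sketch: subtract the national grade-level average from each grade-year cell,
# then average across all cells. The scores reproduce the hypothetical district
# table above (grades 3-8, years 2009-2019).
import numpy as np

grades = np.arange(3, 9)          # grades 3 through 8
years = np.arange(2009, 2020)     # 2009 through 2019

# Hypothetical scores: grade level, plus 0.1 per grade above 3, plus 0.1 per year after 2009
scores = grades[:, None] + 0.1 * (grades[:, None] - 3) + 0.1 * (years[None, :] - 2009)

relative = scores - grades[:, None]   # distance from the national average for each grade
print(round(relative.mean(), 2))      # 0.75 grade levels above the national average
```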

Computing the average learning rate

To compute the average learning rate, we compare students’ average scores in one grade and year to those in the next grade and year (see below). In other words, we look at grade-to-grade improvements in performance within each cohort.

Hypothetical Average Test Scores (Grade-level Equivalents), By Grade and Year
Grade 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
8 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5
7 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
6 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3
5 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2
4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1
3 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0

For example, we compare the average score in Grade 3 in 2009 (3.0) to the average score in Grade 4 in 2010 (4.2). The difference of 1.2 indicates that students’ test scores are 1.2 grade levels higher in 4th grade than they were in 3rd grade, or that students’ learning rate in that year and grade was 1.2. We compute this difference for each diagonal pair of cells in the table, and then take their average. In this table, the average learning rate is also 1.2. If average test scores were at the national average in each grade and year, the average learning rate would be 1.0 (indicating that the average student’s scores improved by one grade level each grade). So, a value of 1.2 indicates that learning rates in this district are 20% faster than the national average.

Computing the trend in average test scores

To compute the average test score trend, we compare students’ average scores in one grade and year to those in the same grade in the next year (see below). In other words, we look at year-to-year improvements in performance within each grade.

Hypothetical Average Test Scores (Grade-level Equivalents), By Grade and Year
Grade 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
8 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5
7 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
6 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3
5 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2
4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1
3 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0

For example, we compare the average score in Grade 3 in 2009 (3.0) to the average score in Grade 3 in 2010 (3.1). The difference of 0.1 indicates that students’ test scores are 0.1 grade levels higher in 3rd grade in 2010 than they were in 3rd grade in 2009. We compute this difference for each horizontal pair of cells in the table, and then take their average. In this example, the average test score trend is 0.1 grade levels per year.
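
The sketch below computes both the average learning rate (diagonal differences) and the average trend (horizontal differences) from the same hypothetical table; again, the actual SEDA estimates come from hierarchical linear models rather than simple averaging.

```python
# Sketch: learning rate from diagonal differences (same cohort, next grade and
# next year) and trend from horizontal differences (same grade, next year),
# using the hypothetical district table above.
import numpy as np

grades = np.arange(3, 9)
years = np.arange(2009, 2020)
scores = grades[:, None] + 0.1 * (grades[:, None] - 3) + 0.1 * (years[None, :] - 2009)

learning_rates = scores[1:, 1:] - scores[:-1, :-1]   # grade g+1 in year y+1 minus grade g in year y
trends = scores[:, 1:] - scores[:, :-1]              # same grade, adjacent years

print(round(learning_rates.mean(), 2))  # 1.2 grade levels gained per grade
print(round(trends.mean(), 2))          # 0.1 grade levels gained per year
```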

For technical details, see Step 9 of the technical documentation.

Data reporting

Estimates Shown on the Website

We report average test scores, learning rates, and trends in average test scores for schools, geographic districts, counties, and states in our Opportunity Explorer. To access data for the other units (e.g., administrative districts, commuting zones, and metropolitan statistical areas) or other types of estimates (e.g., estimates separately by subject, grade, and year), please visit our Get the Data page.

Suppression of Estimates

We do not report average performance, learning, and/or trend estimates if:

  • Fewer than 20 students are represented in the estimate
  • More than 20% of students in the unit took alternative assessments
  • The estimates are too imprecise to be informative

Data accuracy

We have taken several steps to ensure the accuracy of the data reported here. The statistical and psychometric methods underlying the data we report are summarized here and published in peer-reviewed journals. First, we conduct statistical analyses to ensure that our methods of converting the raw data into measures of average test scores are accurate. For example, in a small subset of school districts, students take the NAEP test in addition to their state-specific tests. Since the NAEP test is the same across districts, we can use these districts’ NAEP scores to determine the accuracy of our method of converting the state test scores to a common scale. When we do this, we find that our measures are accurate, and generally yield the same conclusions about relative average test scores as we would get if all students took the NAEP test. For more information on these analyses, see Reardon, Kalogrides & Ho (2019).

Second, one might be concerned that our learning-rate estimates do not account for students moving in and out of schools and districts. For example, if many high-achieving students move out of a school or district in the later grades and/or many low-achieving students move in, the average test scores will appear to grow less from 3rd to 8th grade than they should. This would cause us to underestimate the learning rate in a school or district.

To determine the accuracy of our learning-rate estimates, we compared them to the estimated learning rate we would get if we could track individual students’ learning rates over time. Working with research partners who had access to student-level data in three states, we determined that our learning-rate estimates are generally sufficiently accurate to allow comparisons among districts and schools. We did find that our learning-rate estimates tend to be slightly less accurate for charter schools. On average, our estimated learning rates for charter schools tend to overstate the true learning rates in charter schools in these three states by roughly 5%. This is likely because charter schools have more student in- and out-mobility than traditional public schools. It suggests that learning-rate comparisons between charter and traditional public schools should be interpreted with some caution. For more information on these analyses, see Reardon et al. (2019).

Third, we have constructed margins of error for each of the measures of average test scores, learning rates, and trends in average scores. On the explorer, we show 95% confidence intervals. In the downloadable data, we provide standard errors, which can be used in statistical analyses and comparisons. Interested users can download data files that include these standard errors from our Get the Data page.

Fourth, we do not release any estimates on the website or in the downloadable data files where the margin of error is large. In places where there are a small number of students (or a small number of students of a given subgroup), the margin of error is sometimes large; we do not report data in such cases. Margins of error of school learning rates are also large when there are only two or three grade levels in a school; as a result, roughly one-third of schools are missing learning rates on the website.