Education Endowment Foundation: EEF Blog: Is quick, cost-effective and robust evaluation of education interventions possible?

That’s the question the EEF, working with FFT Education Datalab, have set out to address. Here, the EEF’s Guillermo Rodriguez-Guzmán, Camila Nevill and Emily Yeomans, together with FFT’s Laura James, explore what we’re learning from our pilot of a new Education Data Service.

Existing routes to evaluate the impact of education interventions vary from very robust, yet costly, evaluations usually using randomised designs (like the 150 so far funded by the EEF), through to low-cost, lower security approaches, such as comparing results to national or local averages.

However, there is a paucity of options in between, offering quick, cost-effective and fairly robust estimates of the impact of education interventions.

In 2015, the Department for Education asked the EEF to explore the creation of an ‘Education Data Service’ (EDS), like the Ministry of Justice Datalab, to fill this gap. Following a competitive tender process, the EEF began working with FFT Education Datalab to pilot the EDS.

Here we discuss its purpose and some of the early results.

Piloting the Education Data Service (EDS)

The EDS aims to estimate impact by identifying schools working with specific programmes and then, using the National Pupil Database (NPD) and other datasets, creating groups of schools with similar observable characteristics. This approach offers a transparent and methodologically sound way to compare the outcomes of ‘treatment schools’ with those of schools with similar characteristics (‘matched comparison schools’) to estimate the impact of those programmes.
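To make the matching idea concrete, here is a minimal sketch in Python. It assumes greedy nearest-neighbour matching on standardised characteristics, which is one common approach and not necessarily the method used by the EDS. The field names (`fsm_pct`, `prior`, `outcome`) and all figures are hypothetical and do not come from the NPD.

```python
from math import sqrt

def standardise(rows, keys):
    """Rescale each characteristic to mean 0 and standard deviation 1."""
    scaled = [dict(r) for r in rows]
    for k in keys:
        values = [r[k] for r in rows]
        mean = sum(values) / len(values)
        sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        for s, v in zip(scaled, values):
            s[k] = (v - mean) / sd
    return scaled

def match_schools(treated, pool, keys):
    """Pair each treated school with its nearest unused comparison school.

    Returns (treated_index, pool_index) pairs. Assumes the pool holds at
    least as many schools as the treated group.
    """
    scaled = standardise(treated + pool, keys)
    t_scaled, p_scaled = scaled[:len(treated)], scaled[len(treated):]
    available = list(range(len(pool)))
    pairs = []
    for i, t in enumerate(t_scaled):
        nearest = min(
            available,
            key=lambda j: sum((t[k] - p_scaled[j][k]) ** 2 for k in keys),
        )
        available.remove(nearest)  # match without replacement
        pairs.append((i, nearest))
    return pairs

def estimated_effect(treated, pool, pairs, outcome):
    """Mean outcome difference between treated schools and their matches."""
    diffs = [treated[i][outcome] - pool[j][outcome] for i, j in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical schools: % pupils eligible for free school meals, prior
# attainment score, and a later outcome score. All values are invented.
treated = [
    {"name": "T1", "fsm_pct": 40.0, "prior": 98.0, "outcome": 104.0},
    {"name": "T2", "fsm_pct": 10.0, "prior": 105.0, "outcome": 108.0},
]
pool = [
    {"name": "C1", "fsm_pct": 12.0, "prior": 104.0, "outcome": 106.0},
    {"name": "C2", "fsm_pct": 38.0, "prior": 99.0, "outcome": 103.0},
    {"name": "C3", "fsm_pct": 25.0, "prior": 101.0, "outcome": 104.0},
]

pairs = match_schools(treated, pool, ["fsm_pct", "prior"])
effect = estimated_effect(treated, pool, pairs, "outcome")
```

In this toy example each treated school is paired with the comparison school closest to it on the two characteristics, and the impact estimate is the average difference in outcomes across the pairs. Anything not captured by the matching variables remains unaccounted for.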

Ultimately, the results from the EDS could be used in two ways:

  1. Internally by EEF to inform its grant-making decisions by providing early evidence of promise to support investments in more rigorous evaluations through efficacy and effectiveness studies generally using a randomised controlled trial.
  2. Externally to provide tentative evidence of impact as a separate strand of the EEF’s work to generate evidence that can be used by teachers and senior leaders. These results might supplement the evidence provided by EEF evaluations (with relevant caveats around the security of these results, which are discussed below).

During this pilot we are exploring the methodological needs, as well as the practical and strategic implications, of a fully implemented EDS. We are currently half-way through. A feasibility report will be available in the second half of 2020.

Early results from the Education Data Service (EDS)

Consistent with the EEF’s commitment to transparency, and to avoid publication bias, the EEF anticipates that all results from a fully functioning EDS will be made publicly available.

In this exploratory pilot, however, organisations that volunteered to take part could choose whether to publish their results. Two organisations decided to publish results for their programmes – Mathematics Mastery and Magic Breakfast – available on FFT Education Datalab’s website.

Mathematics Mastery

This is a whole-school approach to teaching mathematics that aims to deepen pupils’ conceptual understanding of key mathematical concepts. Compared to traditional curricula, fewer topics are covered in more depth and greater emphasis is placed on problem-solving and on encouraging mathematical thinking.

This approach was previously tested by two EEF trials which, combined, found a positive impact equivalent to +1 month’s additional progress. The results for the study conducted in primary schools found positive impacts equivalent to +2 months’ additional progress.

The new EDS evaluation suggests that pupils in primary schools which used Mathematics Mastery were more likely to be working ‘beyond the expected level’, equivalent to +2 months’ additional progress. However, the proportions of pupils working ‘at the expected level’ were similar across the groups. These effects varied by the length of school participation, although no clear trend could be identified.

These new EDS results are well-aligned with the EEF’s previous findings and strengthen the evidence that Mathematics Mastery can have positive impacts. Schools interested in the programme should consider the fact that it seems to be particularly beneficial for high-attaining pupils and, as with all programmes, should monitor the impact in their context.

Magic Breakfast

Magic Breakfast supports schools to offer a free, universal, before-school breakfast club with the aim of increasing the number of children who eat a healthy breakfast. This programme was also previously evaluated by an EEF-funded effectiveness study in 2015 that found favourable impacts on outcomes for primary-aged children.

The new EDS evaluation explores differences in the Key Stage 2 outcomes of schools which took part in that study, two and three years after the implementation of the programme (in 2017 and 2018). A key limitation is that we cannot identify whether schools continued to implement Magic Breakfast from 2016 to 2018 or, if they did, how they did so. Consequently, this study would have lower security than a ‘typical’ EDS study.

That caveat in place, the EDS findings suggest that pupils in schools which implemented Magic Breakfast in 2015 made the equivalent of +2 months’ additional progress in Key Stage 2 maths and reading in 2017; and the equivalent of +3 months’ additional progress in Key Stage 2 maths and reading in 2018.

Those are our best estimates of the impact. However, the data for 2017 is also reasonably consistent with a very small negative effect or a larger positive effect (ie, the finding was not statistically significant).
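The link between an estimate, its uncertainty and statistical significance can be sketched as follows. This is an illustration only: it assumes a normal approximation with a 95% interval, and the standard errors are invented for the example and are not taken from the EDS report.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate 95% interval around an estimate (normal approximation)."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width

def is_significant(estimate, standard_error, z=1.96):
    """True when the interval excludes zero, i.e. 'no effect' is ruled out."""
    low, high = confidence_interval(estimate, standard_error, z)
    return low > 0 or high < 0

# Hypothetical: a +2 months estimate with a standard error of 1.2 months.
# The interval runs from a small negative effect to a larger positive one.
wide = confidence_interval(2.0, 1.2)

# Hypothetical: the same estimate measured more precisely (standard error 0.8).
narrow = confidence_interval(2.0, 0.8)
```

With the wider interval the best estimate is still positive, but zero and small negative values cannot be ruled out, which is what "not statistically significant" means here.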

Accepting all these limitations, this new EDS data does nonetheless provide additional evidence that Magic Breakfast’s programme can have a positive impact on pupil attainment.

How confident can we be in the findings from the Education Data Service (EDS)?

The EEF has a classification system for the security of our evaluations which rates the strength of single studies from 0 to 5 ‘padlocks’.

Generally, results from the EDS are likely to be the equivalent of 3 ‘padlocks’ on the EEF scale when they account for observable confounding factors. The EEF currently considers evaluations which have 3 ‘padlocks’ to have moderate-to-high security. Awarding these studies 3 ‘padlocks’ is in line with recent literature suggesting that these methods can provide credible estimates of impact for educational programmes (see references below: Cook et al., 2008; Wong et al., 2016; Wong and Steiner, 2018; Fenton Villar and Waddington, 2019; Weidmann and Miratrix, 2019).

In cases where the approach is not successful in creating a similar comparison group, results could be less secure and will be described accordingly. There could be other reasons that would reduce the security of EDS studies, such as described above with Magic Breakfast, where it wasn’t possible to identify what schools were doing in a given year.

The EDS approach has some limitations. The major one is that results may be affected by selection bias. That is, schools may have been selected, or self-selected, to join the programme based on criteria that aren’t recorded in the NPD. As these criteria aren’t available in the data, the matching process is not able to account for them, and fundamental differences may remain between the treatment schools and the matched comparison schools.

Analysis is also dependent on the data kept on participating schools by the organisation responsible for running the programme. If this data is incomplete or inaccurate for any reason, this will affect the reliability of the results.

Finally, it is possible that schools in the matched comparison group may have taken part in some other, similar programme. Assuming that the other programme had some positive effect, this could make the effect of the programme being tested appear smaller than it really is.
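A simple worked example shows how this dilution could arise. The sketch below assumes that effects are additive and that a known share of comparison schools used another programme. Both are simplifying assumptions, and the numbers are purely illustrative.

```python
def observed_effect(true_effect, share_in_other_programme, other_programme_effect):
    """Effect measured against a comparison group that is partly 'treated'.

    The comparison group's average outcome is raised by the share of its
    schools using the other programme multiplied by that programme's effect,
    so the measured gap shrinks by the same amount.
    """
    return true_effect - share_in_other_programme * other_programme_effect

# Hypothetical: the programme adds 3 months' progress, but half of the
# comparison schools use another programme that adds 2 months.
diluted = observed_effect(3.0, 0.5, 2.0)  # 3.0 - 0.5 * 2.0 = 2.0
```

Under these assumptions a programme that truly adds 3 months’ progress would appear to add only 2.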

Next steps

FFT Education Datalab will present a feasibility report to the EEF in mid-2020, with a decision made later that year on whether and how EEF will implement the Education Data Service.


Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within‐study comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.

Fenton Villar, P. & Waddington, H. (2019). Within Study Comparisons and Risk of Bias in International Development: Systematic Review and Critical Appraisal. Campbell Systematic Reviews, 15(1).

Weidmann, B. & Miratrix, L. (2019). Lurking inferential monsters? Quantifying bias in non-experimental evaluations of school programs. Under Review.

Wong, V., & Steiner, P. M. (2018). Designs of Empirical Evaluations of Non-experimental Methods in Field Settings. Evaluation Review, 42(2), 176–213.

Wong, V., Valentine, J. C., & Miller-Bains, K. (2016). Empirical Performance of Covariates in Education Observational Studies. Journal of Research on Educational Effectiveness, 10(1), 207–236.


Minor edits were made to the Magic Breakfast section of this blog on 3 Dec 2019 to clarify the distinction between the EEF’s 2015 effectiveness study and the new EDS study; and that the EDS study of Magic Breakfast would have lower security than a ‘typical’ EDS study.