EEF Blog: Evaluation of whole-school interventions – how hard can it be?

Triin Edovald, Head of Evaluation at the Education Endowment Foundation, asks: what's so special about evaluating whole-school interventions?

If we’ve learnt one thing from six years of the EEF, it’s that robust evaluation is hard. Running randomised controlled trials to find out the impact of even relatively straightforward interventions, like extending the school day or targeted catch-up programmes, is complicated. Evaluating complex interventions like whole-school leadership programmes is even more challenging.

Complex interventions often have many different components (and interactions between those components) and many different outcomes. The intervention is often tailored in the delivery too.

There is no single view of what complexity means in discussions of evaluation theory and practice. However, the conceptualisation by Glouberman and Zimmerman (2002), further developed by Rogers (2008), emphasises the difference between what is complicated (multiple components) and what is complex (emergent).

Whole-school programmes that focus on leadership and school improvement can often be described as both complicated and complex. They typically involve multiple components and multiple simultaneous (and alternative) causal strands, as well as tipping points, where at critical levels a small change can make a big difference, and emergent outcomes (Rogers, 2008).

Can’t we just do larger RCTs over a longer period of time?

In health research, it has been questioned whether health improvement programmes fit with MRC guidance (Medical Research Council, 2006) on evaluating complex interventions, and argued that randomised controlled trials (RCTs) are not always practical for the evaluation of new health policies and programmes (MacKenzie et al., 2010).

Even though RCTs are the best way of finding out whether a programme works, complex interventions such as whole-school leadership programmes don’t lend themselves easily to them. The more complex an intervention, and the longer the period over which its outcomes are to be measured, the more challenging an RCT is to run.

However, effective leadership is likely to be a key element of successful whole-school approaches. As one of our aims is to actively encourage more high-quality applications focused on leadership, we have set out to learn more about the best ways to evaluate their whole-school impact.

What are the things to consider when evaluating such programmes?

As a first step, it’s important to question what design to use and why. An RCT may be both preferable and feasible, but there are certainly circumstances under which a quasi-experimental design may be more appropriate. Whatever the trial design, there are a range of methodological, analytical and practical aspects that evaluators need to consider.

To support us and our evaluators on this journey of evaluating complex whole-school and leadership programmes, we commissioned a group of researchers from UCL Institute of Education, the Behavioural Insights Team and Education Datalab to undertake a review of the evaluation of such interventions.

Their report explores the issues inherent in evaluating complex whole-school interventions and considers both the methodological and practical aspects of undertaking such evaluations.