James Turner on the lessons learnt from five years of EEF evaluations – and the challenges we still face
Think of the EEF and think of evidence: robust evaluation has become the hallmark of our work. But the golden thread running through what we do is actually partnership. As a small organisation, it is only by being part of a broad coalition of schools, universities and charities that we’ve been able to work together to create a step change in the way that evidence is used in education and programmes in schools are evaluated.
School leaders, teachers and policymakers are rightly demanding more robust evidence of effectiveness and the sector has responded accordingly. An indication of the scale and nature of the shift is that before the EEF was set up in 2011, only a handful of randomised controlled trials (RCTs) had been conducted in English schools: the number now stands at over 100.
This is good news. RCTs are the bedrock of the EEF approach and we believe they remain our best bet in providing useful, comparable results which answer the real life questions head teachers ask about where to spend their marginal pound.
But that is not to say we think RCTs are a panacea, or that we can afford to be complacent about what we do. Far from it. As a charity genuinely committed to producing strong and accessible research (and principally funded by public money), it is right for us to critically examine what we have learnt and what we should do better – and to make the case for why we believe our overall approach to evaluation is still the right one.
As a starter, it is worth restating that the EEF’s position has never been about individual RCTs in isolation: an entire way of teaching and learning shouldn’t stand or fall on one study, however good it is. The recommendations we make to schools and the decisions we take about what trials to commission are rooted in the wider evidence base captured in the Sutton Trust / EEF Teaching and Learning Toolkit. This summarises the fruits of thousands of studies from all over the world. Individual RCTs help to fill gaps in this overall body of knowledge, often where there is little or no robust UK-based research. In some cases, a number of EEF-commissioned studies have actually changed the emphasis of the overall evidence-base – for example, on the impact of teaching assistants.
Importantly, the evaluation of individual programmes also helps us to identify the most effective ways of implementing an approach in schools. This is critical if the evidence we are gathering is to be actionable. Head teachers not only need to know the headline that, say, approaches which use meta-cognition are one of the most cost effective ways to boost results; they also need to know how, practically, to implement such an approach in their schools through clearly-defined programmes which are most likely to result in positive benefits.
Of course individual studies are only useful on both these counts if they are of high quality. For this reason we introduced the EEF padlock system, our way of summarising for practitioners the level of confidence we have in the results of individual trials. This is far from straightforward – there is a huge amount of complexity involved in the interpretation of trial results – but it is vital if we expect schools to make intelligent use of our work. We have also set ourselves a deliberately high bar. So far 60% of our studies have been awarded three padlocks or more out of five – and our aim is that at least 80% of our studies will be at this level or higher.
We are equally keen for other researchers to rate their work in a similarly transparent way so that those in the education sector can become critical consumers. A quick and dirty analysis of some of the studies which gained attention in the media last year suggests most would barely register on the EEF scale. They created a lot of heat, but not necessarily much light.
But even with the best intentions, RCTs can be compromised. A recent report from America analysed a number of maths trials conducted to the US What Works Centre’s evidence standards, and identified a number of threats to their usefulness, covering factors like the independence of the evaluation team, the implementation of the programme and the conditions of the control group. We believe the EEF has generally managed to avoid these threats (Milly Nevill, the EEF’s evaluation manager, addresses them directly in her blog), but the study is food for thought as we look to refine our own processes further and review our security ratings.
The challenges don’t end there, nor do the lessons we are learning from this steep learning curve. For instance, it is critical we look beyond the headline impact of a trial to understand why a programme did or didn’t work – and whether some groups of pupils benefited more than others. So we’re commissioning more extensive process analysis in our new studies, and becoming cleverer about how to power trials to detect what works for whom.
And as EEF-commissioned RCTs report their findings, we are encountering a host of new challenges around replicating the most positive results as the programmes grow and reach more children. In all of this we also need to be alive to the possibility that initially positive impacts wash out in the medium or long term, which is why our data archive (which will follow the educational trajectories of young people from EEF trials and allow us to make comparisons between programmes) is so important.
This is all complex and difficult stuff. It is all too easy to retreat to the comfort of not saying anything, not rocking the boat and commissioning more research (when can we ever be 100 percent sure something works?). But as an organisation originally founded to narrow the attainment gap and help the most disadvantaged, we also feel acutely the urgency to get the best evidence out there and used by schools. Can it really wait another five years?
It is a tension we face every day and we are working with schools, academics and not-for-profits to pick our way through the issues. We’ve not cracked it – what we do is real world, imperfect and compromised – but we are yet to see an alternative that offers a better way forward.