The EEF’s Jonathan Kay takes an in-depth look at the findings of one of our latest trials to examine the issue of how we respond to programmes which might be ‘gap wideners’…
Today, we have published the independent evaluation of onebillion, an app-based programme designed to support young children to acquire basic maths skills.
The key finding is promising – on average, the 1,089 Year 1 pupils who used the apps made +3 months’ additional progress compared to the control group of pupils who didn’t use them.
However, looking below this headline result, there is cause for caution. While, on average, pupils using ‘onebillion’ made greater progress, those pupils eligible for free school meals (FSM) made less progress (-2 months) than FSM-eligible pupils in the control group.
This is not the only time we have seen an EEF trial produce different results for FSM-eligible children. Earlier this year, we published the independent evaluation of Improving Working Memory, which found an average impact of +3 months’ additional progress for all pupils, but an impact of only +1 additional month for FSM-eligible pupils. This means that the attainment gap actually widened for those pupils who received the programme.
By contrast, the independent evaluations of two EEF Promising Projects – Thinking, Doing, Talking Science and Philosophy for Children – found that FSM-eligible pupils made greater progress than the overall average.
Does ‘onebillion’ really hinder progress of pupils eligible for free school meals?
We do know from the evaluation that, in this trial, FSM-eligible pupils who received ‘onebillion’ did worse than FSM-eligible pupils in the control group. What we do not know, however, is whether this difference was caused by the programme itself, or whether it can be generalised to FSM-eligible pupils beyond this trial.
The reason we can’t generalise the result essentially boils down to sample size. The EEF funds evaluations of programmes in order to generate ‘what works’ evidence that teachers and senior leaders across England can put to use, for example to inform decisions about Pupil Premium spending. The more schools and pupils that take part in a trial, the more accurate the estimate of the programme’s impact on pupil outcomes will be. We ensure there is sufficient statistical power to detect effects on the pupil population within the trial.
However, because FSM-eligible pupils are a smaller sub-group of this overall pupil population, our trials often do not allow us to make confident claims about outcomes for FSM-eligible pupils more generally. The smaller number of pupils in the sample means there is a risk that the result is a ‘false positive’ or ‘false negative’, caused by the way specific pupils happened to be allocated to the intervention or control groups rather than by the actual impact of the programme we are trialling. This is one of the reasons why you don’t see an EEF ‘padlock’ security rating beside the outcomes we report for FSM-eligible pupils – unless the trial has been specifically designed to detect an effect on FSM-eligible pupils.
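A quick simulation makes the point concrete. In the sketch below (illustrative numbers only – none of them come from the ‘onebillion’ evaluation), every simulated pupil receives exactly the same true benefit from the programme, yet the estimated effect for a sub-group making up a quarter of the sample swings around far more from trial to trial than the whole-sample estimate does – which is precisely why a surprising sub-group result can be a ‘false positive’ or ‘false negative’.

```python
import random
import statistics

# Illustrative simulation: the TRUE effect is identical for every pupil,
# so any difference between the whole-sample and sub-group estimates is
# pure allocation noise. Sample size and effect size are made up.
random.seed(42)

def one_trial(n=1089, fsm_share=0.25, true_effect=0.2, noise_sd=1.0):
    """Return (whole-sample estimate, sub-group estimate) for one trial."""
    groups = {"all": {True: [], False: []}, "fsm": {True: [], False: []}}
    for _ in range(n):
        fsm = random.random() < fsm_share          # ~25% in the sub-group
        treated = random.random() < 0.5            # random allocation
        outcome = (true_effect if treated else 0.0) + random.gauss(0, noise_sd)
        groups["all"][treated].append(outcome)
        if fsm:
            groups["fsm"][treated].append(outcome)
    return tuple(
        statistics.mean(g[True]) - statistics.mean(g[False])
        for g in (groups["all"], groups["fsm"])
    )

# Repeat the trial many times and compare how much each estimate varies.
overall, subgroup = zip(*(one_trial() for _ in range(300)))
print(f"spread of whole-sample estimates: {statistics.stdev(overall):.3f}")
print(f"spread of sub-group estimates:    {statistics.stdev(subgroup):.3f}")
```

Because the sub-group is roughly a quarter of the sample, its estimate is about twice as noisy – so an apparent ‘gap-widening’ result can appear by chance even when the true effect is the same for everyone.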
Why doesn’t the EEF make sure its trials are big enough to detect effects on pupils eligible for free school meals?
Conducting trials with generalisable results for pupils eligible for free school meals requires a very large number of schools and pupils, which is both expensive to fund and difficult to recruit. We therefore only do this for programmes which have already demonstrated positive impact when first trialled by the EEF, or where there is good reason to believe at the outset that the programme will have a differential impact on FSM-eligible pupils compared to all other pupils.
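The scale of the problem can be sketched with the standard two-sample power formula (a textbook approximation, not the EEF’s actual design calculations, and with an illustrative effect size): if FSM-eligible pupils make up, say, a quarter of the sample, the whole trial must be roughly four times larger before the sub-group alone is big enough to detect the same effect.

```python
from statistics import NormalDist

def pupils_per_arm(effect_size, alpha=0.05, power=0.8):
    """Pupils needed per arm to detect `effect_size` (in standard-deviation
    units) with the usual two-sided 5% significance and 80% power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # ~1.96
    z_beta = z.inv_cdf(power)            # ~0.84
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

n_arm = pupils_per_arm(0.2)              # 0.2 SD: an illustrative effect
fsm_share = 0.25                         # assumed sub-group proportion
print(f"pupils per arm for the whole sample: {n_arm:.0f}")
print(f"total recruited so the sub-group alone has this power: "
      f"{2 * n_arm / fsm_share:.0f}")
```

The total scales as one over the sub-group’s share of the sample, which is why sub-group-powered trials are reserved for the most promising programmes.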
Despite the challenges of generalising from results for FSM-eligible pupils, it is important to maintain transparency and communicate these results. This information is a key consideration for the EEF when making future funding decisions and can help teachers implement approaches successfully.
What does the EEF do with potential ‘gap widening’ programmes?
The core mission of the EEF is to close the attainment gap for disadvantaged pupils. Learning about the impacts of programmes on FSM-eligible pupils helps us target our grant-funding, as well as our key messages to teachers, senior leaders, and policy makers.
First, it helps us prioritise interventions that are more likely to close that attainment gap – for example, by highlighting promising programmes like Thinking, Doing, Talking Science. For targeted interventions, this might mean encouraging schools that implement the approach to make sure they do target FSM-eligible pupils.
Secondly, for programmes like ‘onebillion’, which have promising results overall but less promising results for FSM-eligible pupils, we can fund larger effectiveness trials designed to give us a secure estimate of impacts for population sub-groups. As well as recruiting more pupils, future evaluations may use a process evaluation specifically to examine the reasons why a programme might widen the gap.
In addition, the EEF is commissioning analysis from Durham University to examine data from all published EEF trials and explore the impact on sub-populations of pupils – not only those eligible for free school meals, but also pupils with English as an Additional Language (EAL) or with Special Educational Needs and Disabilities (SEND). We anticipate that this work will group projects together to help us understand which ‘types’ or ‘groups’ of programmes could have differential effects on these particular sub-groups.
How should teachers treat projects like ‘onebillion’?
When a project reports a different impact on FSM-eligible pupils, this might flag a risk for schools and teachers thinking about implementing the approach – even if it does not provide a secure estimate of how the approach will generally impact those pupils. Knowing that, in the EEF trial, FSM-eligible pupils made less progress than FSM-eligible pupils in the control group might lead a teacher to carefully monitor the impact that ‘onebillion’ has on this group of students in their school or classroom.
All evidence – whether it is an individual evaluation, or an evidence summary like the EEF Toolkit – will always need professional judgement to be implemented well in the classroom. Results, such as today’s evaluation of ‘onebillion’, are an important reminder that average effects are just that: averages. Different programmes may be more positive, or more negative, for different pupils.