EEF senior associate, Prof. Rob Coe, explores how we’re responding to the challenge of presenting nuanced evidence in ways that teachers can put to practical use – and invites you to get involved…

One of the most persistent and knotty challenges for anyone who wants to support teachers’ use of research findings is how to report conclusions in ways that are simple and actionable but retain appropriate levels of complexity and uncertainty.

Research results are rarely decisive, unequivocal or precise, but teachers need to make decisions. If we stress the uncertainty too much, we risk diluting and confusing the message. If we oversimplify, people may over-interpret implications or, worse, have their trust undermined when what they thought was a clear finding is later overturned.

A traditional approach in statistics is to use p‑values or confidence intervals to convey the uncertainty around a sample-based estimate of an effect size. But we know these approaches are widely misunderstood by researchers, can certainly mislead, and arguably distort the whole scientific process. [1] Even if teachers do understand them, it is not clear how these ways of reporting results should influence decisions and actions.

A key challenge is to be really clear about what actions or interpretations are desirable

If we do a trial that finds that A is better than B (or A is better than ‘business as usual’) with some level of statistical ‘significance’, does that mean we want all teachers to infer that they should do A? Maybe, if A is substantially and unequivocally better across a range of contexts, it could be that simple. But, more often, the difference will be small, uncertain and variable by context.

And the results of any new trial do not appear in a vacuum: we already have some evidence on the question, both from research studies and from our own experience. In that case an appropriate interpretation by a teacher might be, ‘Given this finding and other research evidence, as well as my existing context, my inclination towards A should rise a bit, but not enough to make the change just yet.’

The case where A and B are choices a teacher can make is probably the simplest

For example, in our first Teacher Choices Trial, ‘A Winning Start’, we want to compare two ways of starting a lesson: quizzing vs discussion. At the end, we would like to be able to give simple advice to teachers about which approach is better. But that advice could depend on a range of factors:

- The effect size: how big is the difference, in terms of the impact on attainment?
- The precision of that point estimate: how big is the uncertainty, or likely margin of error, on our effect size?
- Costs: what is the cost difference (including the time it takes to do each) between the two choices?
- Existing evidence: what did we know (or believe) already, and with what level of confidence? How relevant is that evidence to my context (eg pupil ages, subject)?

During my recent ResearchED talk about the EEF’s Teacher Choices trials, I used a slide to invite people to say which of the two approaches they thought would be better and how sure they were.

With that audience there was a wide spread of views. A question like this could be one way of capturing people’s prior beliefs, though we may need to distinguish between beliefs that are based on sound evidence and (perhaps strongly held) intuitive perceptions that may simply be wrong.

The key idea is that if you are already pretty sure that quizzing is better, new evidence in favour of discussion would have to be really strong before it will change your mind (and hence your practice). On the other hand, if you are completely undecided, even quite weak evidence could tip you one way or the other.
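One way to make this idea concrete is a simple Bayesian update. The sketch below is illustrative only: it assumes that both the teacher's prior belief and the trial estimate can be summarised as normal distributions, and every number in it is made up rather than taken from any trial. The function name `update_belief` and its parameters are hypothetical.

```python
def update_belief(prior_mean, prior_sd, trial_effect, trial_se):
    """Precision-weighted (normal-normal) update of a belief about an effect.

    All effects are 'quizzing minus discussion' in standard-deviation units,
    so positive values favour quizzing. Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    trial_precision = 1.0 / trial_se ** 2
    post_precision = prior_precision + trial_precision
    post_mean = (prior_precision * prior_mean
                 + trial_precision * trial_effect) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical trial result: a small effect in favour of discussion.
trial_effect, trial_se = -0.10, 0.05

# A teacher who is already fairly sure quizzing is better (narrow prior).
confident_mean, confident_sd = update_belief(0.20, 0.05, trial_effect, trial_se)

# A teacher who is undecided (wide prior centred on no difference).
undecided_mean, undecided_sd = update_belief(0.00, 0.50, trial_effect, trial_se)

print(f"confident teacher: {confident_mean:+.3f}")
print(f"undecided teacher: {undecided_mean:+.3f}")
```

With these invented numbers, the confident teacher's estimate moves from +0.20 to about +0.05, so it still sits on the quizzing side, while the undecided teacher ends up at about -0.10, very close to the trial result.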

Cost is another matter that should affect your decision. If the two choices have the same impact on learning, but one is much easier, quicker and cheaper to do, then we would surely prefer that one. Alternatively, if, for example, quizzing starters take quite a bit more preparation time than discussions, then we might be able to identify a tipping point: the size of benefit at which quizzing becomes worth the extra time.

If we know a teacher’s prior beliefs and have used cost estimates to calculate this tipping point, then we can analyse the results of the trial to generate a ‘benefit likelihood’: an estimate of how likely it is that changing their practice would bring a worthwhile benefit.

We might present this as a personalised recommendation: that they should definitely/probably/possibly do A (or B), or that there is ‘no recommendation’, either because we do not yet have enough information or because the evidence we have is well balanced between the two options in their context.
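As a rough sketch of how such a recommendation might be produced, the code below reads ‘benefit likelihood’ as the probability that the true effect of A over B exceeds the tipping point, given a normally distributed belief about that effect. That reading, the function names `benefit_likelihood` and `recommend`, and the probability thresholds are all assumptions made for illustration; none of them is the EEF's method.

```python
from math import erf, sqrt


def benefit_likelihood(mean, sd, tipping_point):
    """P(effect of A over B > tipping_point) for a normal belief (mean, sd)."""
    z = (mean - tipping_point) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def recommend(p_a):
    """Turn a benefit likelihood for A into a verbal recommendation.

    Thresholds are hypothetical. If A is unlikely to clear the tipping
    point, the cheaper option B is favoured with probability 1 - p_a.
    """
    p, option = (p_a, "A") if p_a >= 0.5 else (1.0 - p_a, "B")
    if p >= 0.95:
        return f"definitely do {option}"
    if p >= 0.80:
        return f"probably do {option}"
    if p >= 0.65:
        return f"possibly do {option}"
    return "no recommendation"


# Made-up beliefs after a trial, in standard-deviation units, with a
# tipping point of 0.02 reflecting A's extra preparation time.
for mean, sd in [(0.10, 0.04), (0.03, 0.05), (-0.05, 0.05)]:
    p_a = benefit_likelihood(mean, sd, tipping_point=0.02)
    print(f"mean={mean:+.2f}, sd={sd:.2f}: P(A worthwhile)={p_a:.2f} -> {recommend(p_a)}")
```

In this sketch the ‘no recommendation’ outcome covers the case where the evidence is well balanced; distinguishing it from the case where there is simply too little information would need an extra check, for example on how wide the belief is.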

An approach like this could be a way to reconcile the requirements of reporting results in a way that is simple and actionable with the need to respect the uncertainty and provisional nature of research findings.

In our forthcoming work on Teacher Choices trials we will be exploring different ways of reporting in order to learn more about how we can make research results more directly useful to teachers. If you want to get involved in this work, or just to find out more, please visit the Teacher Choices webpage.

[1] See, for example: Morey, R. D., Hoekstra, R., Rouder, J. N., & Wagenmakers, E. J. (2016). Continued misinterpretation of confidence intervals: response to Miller and Ulrich. Psychonomic Bulletin & Review, 23(1), 131–140.

Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a World Beyond “p < 0.05”. The American Statistician, 73(sup1), 1–19. DOI: 10.1080/00031305.2019.1583913