Education Endowment Foundation

Why is evaluating writing programmes and approaches so challenging?

Writing is a multidimensional skill, so it’s hard to define ‘good writing’.
Author
Dr Maria Pomoni
Evaluation Manager

Dr Maria Pomoni, one of our evaluation managers, explores how to measure writing attainment effectively, including new approaches that use AI, and how to choose measures that fit what your evaluation needs to capture.

Research methods • 5 minutes

Writing is a vital skill that supports children’s success in school, work, and life.

But recent data shows a worrying decline in writing attainment, especially among disadvantaged pupils. (Covid-19 also had a negative impact on writing attainment.)

In response, we used a recent funding round to prioritise developing and testing promising approaches to improving pupils’ writing outcomes.

How do we evaluate these approaches? It’s a challenge. Evaluating writing remains a complex task in education research.

What exactly are we measuring?

Unlike reading, which has clear benchmarks such as decoding accuracy or comprehension scores, writing is a multidimensional skill. It draws on cognitive, linguistic, and motor processes. In other words – writing has thinking, language, and physical components.

These layers make it difficult to capture progress or define “good writing” in consistent, measurable ways. So when evaluating writing interventions, researchers and teachers face a fundamental question: what exactly are we measuring?

Writing encompasses multiple components, including creativity, organisation, style, grammar, and mechanics. Each of these requires distinct assessment tools.

Only a few established standardised tools assess these dimensions effectively across different ages and subject areas. Where no established tool exists, or where standardised tests have fallen short, evaluators have resorted to developing bespoke measures.

The Grammar for Writing trial that we funded illustrated the challenge: developing bespoke assessment tools aligned with curriculum expectations while maintaining reliability.

In this trial, evaluators used a bespoke writing assessment adapted from previous key stage 2 assessment materials (a shorter persuasive writing task and a longer narrative task). They did this because the statutory tests were not sensitive enough: the evaluation needed a test that could detect relatively small, short‑term changes in pupils’ writing attributable to the intervention.

Approaches to measuring writing attainment

In previous EEF trials, we’ve used a variety of tests to measure writing attainment.

In recent projects that we’ve funded, researchers have also used some more innovative approaches to assessing writing.

Innovation with AI

For example, experiments with technology have shown promise.

In the Writing Roots trial, researchers are using an innovative AI-driven tool designed to mark handwritten assessments.

Early findings from the pilot of this tool suggest that it can make consistent judgements and justify its scoring decisions. This is a potential breakthrough for large-scale evaluations. Of course, human oversight remains essential for safeguarding and quality assurance.

In another ongoing evaluation, this time of the Rehearsal Room Writing programme, pupils write for 30 minutes in response to a picture‑based prompt. Their work is assessed using comparative judgement, where markers compare two scripts at a time and decide which is better.

What’s innovative in this trial is that all scanned scripts are processed through the No More Marking platform. The platform automates the entire comparative judgement workflow: pairing scripts for comparison, collecting marker decisions, generating final scores, and running reliability checks.
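This post does not say which statistical model the platform uses to turn pairwise decisions into final scores. Comparative judgement scores are commonly estimated with a Bradley-Terry model (or the closely related Rasch formulation), so the sketch below illustrates that general idea under that assumption; it is not No More Marking’s own method. The function name `bradley_terry_scores`, the “virtual reference script” smoothing, and the example decisions are all hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry_scores(judgements, n_iter=500):
    """Estimate a score for each script from pairwise marker decisions.

    judgements: list of (winner_id, loser_id) tuples, one per decision,
    always comparing two different scripts. Returns a dict mapping each
    script ID to a log-strength score (higher means judged better).
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)      # real wins per script
    meetings = defaultdict(int)  # comparisons per unordered pair of scripts
    for winner, loser in judgements:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    # Hypothetical smoothing: every script also plays one virtual win and one
    # virtual loss against a fixed reference script of strength 1. This keeps
    # the estimate finite for a script that never (or always) wins.
    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        updated = {}
        for i in scripts:
            numerator = wins[i] + 1               # real wins + 1 virtual win
            denominator = 2 / (strength[i] + 1)   # 2 virtual games vs reference
            for pair, n in meetings.items():
                if i in pair:
                    (j,) = pair - {i}
                    denominator += n / (strength[i] + strength[j])
            # Minorise-maximise (MM) update for the Bradley-Terry likelihood.
            updated[i] = numerator / denominator
        strength = updated

    # Report scores on a log scale.
    return {s: math.log(v) for s, v in strength.items()}


# Hypothetical decisions from markers comparing three scripts.
decisions = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("A", "C"), ("C", "A")]
for script, score in sorted(bradley_terry_scores(decisions).items(), key=lambda kv: -kv[1]):
    print(script, round(score, 2))
```

The design choice worth noting is that each script’s score depends not only on how often it wins but on the strength of the scripts it was compared with, which is what lets comparative judgement place scripts on a single scale without every marker seeing every script. A real platform would also report reliability statistics, which this sketch leaves out.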

Human oversight has been built into the process to ensure that any safeguarding concerns are appropriately identified.

How to choose how you measure writing

Choosing the right outcome to measure starts with being clear about what aspect of writing the evaluation is trying to capture.

For example, this could be overall quality, idea generation, organisation, or technical accuracy. Combining holistic measures (a single overall score for a piece of writing) with analytic measures (separate scores for individual elements of writing) often gives the most complete picture of writing performance.

Studies suggest two other things that can help make evaluations more reliable:

  • collecting multiple writing samples (McMaster & Espin, 2007)
  • making sure marking is fully blind, so that markers don’t know which pieces of writing came from pupils who took part in the programme and which came from the control group (Ritchey & Coker, 2013).
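Why several samples help can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result rather than something reported in the studies cited above. Assuming the samples are comparable (parallel) measures of the same skill, it predicts the reliability ρ_k of a pupil’s average score over k samples from the reliability ρ of a single sample. The single-sample reliability of 0.6 below is a hypothetical value chosen only for illustration.

```latex
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho}
\qquad\text{e.g. } \rho = 0.6,\ k = 3:\quad
\rho_3 = \frac{3 \times 0.6}{1 + 2 \times 0.6} = \frac{1.8}{2.2} \approx 0.82
```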

Conclusion

When designing evaluations of programmes aimed at improving writing attainment, it’s important to recognise that writing is difficult to assess because it is multidimensional. That’s why standardised measures are often limited or insufficient.

We hope this blog and the resources below can help you make informed decisions about how to measure, support, and improve pupils’ writing attainment.

Resources for Evaluators and Educators

Several recent pieces of EEF-commissioned work, along with the recent DfE framework, can support decision-making when designing writing trials and interventions:

- Understanding Current Practice and Research Priorities in Teaching Writing (Pearson, 2023). An EEF-commissioned practice review mapping how writing is currently taught across primary and secondary settings.

- Writing Approaches in Years 3 to 13: Evidence Review (Slavin et al., 2019). An EEF-commissioned evidence review of evaluations of writing approaches.

- The Writing Framework (DfE). Provides practical guidance for teaching writing from Reception through key stage 2.