Why is evaluating writing programmes and approaches so challenging?

Writing is a vital skill that supports children’s success in school, work, and life.

But recent data shows a worrying decline in writing attainment, especially among disadvantaged pupils. (Covid-19 also had a negative impact on writing attainment.)

In response, we used a recent funding round to prioritise developing and testing promising approaches for improving pupils’ writing outcomes.

How do we evaluate these approaches? It’s a challenge. Evaluating writing remains a complex task in education research.

What exactly are we measuring?

Unlike reading, which has clear benchmarks such as decoding accuracy or comprehension scores, writing is a multidimensional skill. It draws on cognitive, linguistic, and motor processes. In other words, writing has thinking, language, and physical components.

These layers make it difficult to capture progress or define “good writing” in consistent, measurable ways. So when evaluating writing interventions, researchers and teachers face a fundamental question: what exactly are we measuring?

Writing encompasses multiple components, including creativity, organisation, style, grammar, and mechanics. Each of these requires distinct assessment tools.

There are few established standardised tools that effectively assess these dimensions across different ages and subject areas. Where there are no established tools, or where standardised tests have fallen short, evaluators have resorted to developing bespoke measures.

The Grammar for Writing trial that we funded illustrated the challenge: developing bespoke assessment tools aligned with curriculum expectations while maintaining reliability.

In this trial, evaluators used a bespoke writing assessment adapted from previous key stage 2 assessment materials (a shorter persuasive writing task and a longer narrative task). They did this because the statutory tests were not sensitive enough: the evaluation needed a test that could detect relatively small, short‑term changes in pupils’ writing attributable to the intervention.

Approaches to measuring writing attainment

In previous EEF trials, we’ve used a variety of tests to measure writing attainment:

Statutory tests (tests required for all pupils by law).
Standardised tests, such as the sentence combining subtest from the Wechsler Individual Achievement Test, Second Edition (WIAT-II; Wechsler, 2005).
Other standardised tests, such as the Progress Test in English (PTE) from GL Assessment.
Unstandardised but reliable tests, such as the Writing Assessment Measure (WAM).

In recent projects that we’ve funded, researchers have also used more innovative approaches to assess writing.

Innovation with AI

For example, experiments with technology have shown promise.

In the Writing Roots trial, researchers are using an innovative AI-driven tool designed to mark handwritten assessments.

Early findings from the piloting of this tool suggest that it can provide consistent judgments and justify scoring decisions. This is a potential breakthrough for large-scale evaluations. Of course, human oversight remains essential for safeguarding and quality assurance.

In another ongoing evaluation, of the Rehearsal Room Writing programme, pupils write for 30 minutes in response to a picture‑based prompt. Their work is assessed using comparative judgement, where markers compare two scripts at a time and decide which is better.

What’s innovative in this trial is that all scanned scripts are processed through the No More Marking platform, which automates the entire comparative judgement workflow: pairing scripts for comparison, collecting marker decisions, generating final scores, and undertaking reliability checks.

Human oversight has been built into the process to ensure that any safeguarding concerns are appropriately identified.
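
To make the scoring step of comparative judgement more concrete, here is a minimal sketch of how a set of "which script is better?" decisions can be turned into a score for each script. It assumes a Bradley-Terry model, which is a common choice for this kind of data; we are not claiming this is how the No More Marking platform works internally. The function name, the script labels, and the sample decisions are all hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry_scores(judgements, n_iter=200):
    """Turn pairwise (winner, loser) decisions into one score per script.

    Assumption: a Bradley-Terry model fitted with the standard iterative
    (minorise-maximise) update. Scores are on a log scale, centred on zero.
    """
    wins = defaultdict(int)         # times each script was judged better
    losses = defaultdict(int)       # times each script was judged worse
    pair_counts = defaultdict(int)  # comparisons made between each pair
    for winner, loser in judgements:
        if winner == loser:
            raise ValueError("a script cannot be compared with itself")
        wins[winner] += 1
        losses[loser] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scripts = sorted(set(wins) | set(losses))
    # Simple guard: a script that never wins (or never loses) has no
    # finite score under this model.
    for s in scripts:
        if wins[s] == 0 or losses[s] == 0:
            raise ValueError(f"script {s!r} needs at least one win and one loss")

    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        updated = {}
        for s in scripts:
            denom = 0.0
            for pair, n in pair_counts.items():
                if s in pair:
                    (other,) = pair - {s}
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom
        # Rescale so the average log-strength is zero.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {s: v / math.exp(log_mean) for s, v in updated.items()}
    return {s: math.log(v) for s, v in strength.items()}


# Hypothetical decisions: each tuple is (better script, weaker script).
decisions = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("B", "C"),
    ("B", "D"), ("C", "B"), ("C", "D"), ("D", "C"),
]
scores = bradley_terry_scores(decisions)
for script in sorted(scores, key=scores.get, reverse=True):
    print(script, round(scores[script], 2))
```

In this sketch, a script's score rises when it beats scripts that are themselves rated highly, which is why comparative judgement can produce a scale without markers ever assigning an absolute mark.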

How to effectively choose how you measure writing

Choosing the right outcome to measure starts with being clear about what aspect of writing the evaluation is trying to capture.

For example, this could be overall quality, idea generation, organisation, or technical accuracy. A combination of holistic (a single overall score for a piece of writing) and analytic (scores on separate writing elements) measures often gives the most complete picture of writing performance.

Studies suggest two other things that can help make evaluations more reliable:

collecting multiple writing samples (McMaster & Espin, 2007)
making sure marking is fully blind (Ritchey & Coker, 2013). That is, making sure markers don’t know which pieces of writing came from pupils who took part in the programme and which came from the control group.
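
As an illustration of the second point, here is a minimal sketch of one way to prepare scripts for blind marking. It is an assumption about how this could be done, not a description of the procedure used in any EEF trial. The function name, field names, and ID format are hypothetical. It also copes with multiple writing samples per pupil, because each script gets its own anonymous ID.

```python
import random


def blind_scripts(scripts, seed=None):
    """Prepare scripts for blind marking.

    Returns two things:
    - marking_pack: shuffled scripts carrying only an anonymous ID and the text
    - key: anonymous ID -> pupil and group, kept away from markers
    """
    rng = random.Random(seed)
    shuffled = list(scripts)
    rng.shuffle(shuffled)  # so the order gives no clue about group

    marking_pack = []
    key = {}
    for i, script in enumerate(shuffled, start=1):
        anon_id = f"S{i:04d}"
        marking_pack.append({"anon_id": anon_id, "text": script["text"]})
        key[anon_id] = {"pupil_id": script["pupil_id"], "group": script["group"]}
    return marking_pack, key


# Hypothetical data: pupil P1 has two writing samples.
scripts = [
    {"pupil_id": "P1", "group": "intervention", "text": "Once upon a time..."},
    {"pupil_id": "P1", "group": "intervention", "text": "Dear headteacher..."},
    {"pupil_id": "P2", "group": "control", "text": "The storm began..."},
]
marking_pack, key = blind_scripts(scripts, seed=2024)
for item in marking_pack:
    print(item["anon_id"], item["text"])
```

Markers only ever see the marking pack. The key is used after marking is complete, to link scores back to pupils and groups for analysis.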

Conclusion

When designing evaluations of programmes aimed at improving writing attainment, it’s important to recognise that writing is difficult to assess because it is multidimensional. That’s why standardised measures are often limited or insufficient.

We hope this blog and the resources below can help you make informed decisions about how to measure, support, and improve pupils’ writing attainment.
