Education Endowment Foundation: KeyMaths3 UK

About the measure

Version

KeyMaths3 UK

Previous version(s)

Versions 1 and 2. US, Canadian and UK editions.

Subject

Maths

Assessment screening

Subscales

Three content areas made up of a total of 10 subtests: basic concepts (numeracy, algebra, geometry, measurement, data analysis and probability); operations (mental computation and estimation, addition and subtraction, multiplication and division); applications (foundations of problem solving, applied problem solving).

Publisher

The Psychological Corporation/Pearson Assessment

Test source

https://www.pearsonclinical.co.uk/Education/Assessments/mathematical-assessments/keymaths-3/keymaths-3-uk.aspx

Guidelines available?

Yes

Norm-referenced scores

Yes

Age range

6 – 16;11 years

Key Stage

Key Stage 1, Key Stage 2, Key Stage 3, Key Stage 4

UK standardisation sample

Yes

Publication date

2013

Re-norming date

N/a

Eligibility

Validity measures available?

Yes

Reliability measures available?

Yes

Note whether shortlisted, and reasons why not if relevant

Shortlisted

Administration format

Additional information about what the test measures

Measures multiple areas of mathematics: basic concepts (conceptual knowledge), operations (computational skills), and applications (problem solving).

Are additional versions available?

KeyMaths3 Diagnostic Assessment is a substantial revision in concept and design from previous editions of the KeyMaths Diagnostic Assessment. KeyMaths3UK was further adapted for the UK through alignment with the UK maths curriculum, anglicisation and new UK normative data.

Can subtests be administered in isolation?

Yes

Administration Group Size

Individual

Administration duration

Untimed, but typically the time taken to administer the full battery of subtests is approximately 30 – 40 minutes if the examinee is in the early primary years, and 75 – 90 minutes for older examinees. Estimated administration times for each component and in each age range are provided on p12 of the Diagnostic Assessment Manual (Connolly, 2013).

Description of materials needed to administer test

Materials supplied in administration kit: manual, two administration easels (Easel 1 and Easel 2), record form, two written computation examinee booklets. Additional materials: writing implements, calculator.

Any special testing conditions?

Quiet and well lit. A separate room is recommended, but the test can be administered in a classroom if visual and audible distractions can be prevented.

Response format

Response mode

Oral, Paper and Pencil

What device is required?

N/a

Question format

Mixed

Progress through questions

Adaptive

Assessor requirements

Is any prior knowledge/training/professional accreditation required for administration?

Yes

Is administration scripted?

Examiners may include, but are not limited to, educational psychologists and special-education professionals. Qualified examiners need to have training in and an understanding of the principles of test administration, including establishing and maintaining rapport, following standardised testing procedures, and applying the statistical concepts related to scoring and interpreting test results. Examiners should also have experience of testing individuals who are of the same age and educational or disability status as those undergoing KeyMaths-3UK DA testing. Before administering, scoring and interpreting the results of the KeyMaths-3UK DA, examiners should read and study all testing materials, including the administration easels, the record form, the booklet and the manual. In addition, examiners should practise administering and scoring the test until they are comfortable with its procedures and item content.

Assessor requirements

Description of materials needed to score test

Diagnostic Assessment Manual. Because of the basal and ceiling rules, accuracy must be scored concurrently with administration.
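To illustrate why a ceiling rule forces scoring to happen during administration, here is a minimal Python sketch. It is not the KeyMaths3 UK procedure: the function name `administer`, the start point and the `ceiling_run` value are hypothetical, and the sketch assumes the basal has already been established at the start point (the manual defines the actual basal and ceiling rules).

```python
def administer(answers, start, ceiling_run=4):
    """Simulate a forward pass through a subtest with a ceiling rule.

    answers     -- list of booleans; answers[i] is True if item i is correct
    start       -- index of the first item administered (basal assumed met)
    ceiling_run -- hypothetical number of consecutive errors that stops testing

    Returns (raw_score, last_item_administered).
    """
    raw_score = start          # items below the start point are credited
    consecutive_wrong = 0
    last_item = start - 1
    for i in range(start, len(answers)):
        last_item = i
        if answers[i]:
            raw_score += 1
            consecutive_wrong = 0
        else:
            consecutive_wrong += 1
            # The stop decision depends on the running tally of errors,
            # so each response has to be scored as it is given.
            if consecutive_wrong == ceiling_run:
                break
    return raw_score, last_item


# Illustrative response pattern (made up for this sketch).
pattern = [True] * 5 + [True, False, True, False, False, False, False, True, True]
score, last = administer(pattern, start=3)
```

In this made-up pattern, testing stops after four consecutive errors, so the two later items that would have been answered correctly are never administered and do not count towards the raw score.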

Types and range of available scores

Subtest raw scores, scaled scores, age equivalents; Derived (concept area) raw scores, age standard score, age equivalents, percentile rank; Total test raw score, age standard score (55 – 145), age equivalents (<4;6 – >16;0), percentile rank; Area comparisons (significance levels for discrepancy analysis).
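As a rough guide to how an age standard score relates to a percentile rank, the following Python sketch assumes standard scores are normally distributed with mean 100 and standard deviation 15 (an assumption consistent with, but not stated by, the 55 – 145 range above). The function name `percentile_rank` is hypothetical; actual scores should always be read from the norm tables in the manual.

```python
import math


def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Approximate percentile rank of a standard score.

    Assumes a normal distribution with the given mean and SD
    (mean=100, sd=15 is an assumption, not taken from the manual).
    """
    z = (standard_score - mean) / sd
    # Normal cumulative distribution function expressed via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for s in (55, 85, 100, 115, 145):
    print(s, round(percentile_rank(s), 1))
```

Under these assumptions a standard score of 100 sits at the 50th percentile, and the ends of the 55 – 145 range lie three standard deviations either side of the mean.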

Score transformation for standard score

Age standardised

Age bands used for norming

3 months

Scoring procedures

Complex manual scoring; training required.

Automatised norming

None

Construct Validity

Rating Construct

Does it reflect the multidimensionality of the subject?

Generic maths (with multiple specific subtests)

Construct validity comments (and reference for source)

The target constructs and the development of the test are carefully considered and explained in the Diagnostic Assessment Manual (Connolly, 2013). The assessment shows excellent validity on a number of different measures across different studies. However, all of the evidence for test validity comes from studies conducted in the US using the US form of the test. Because test development was tightly bound to the US maths curriculum, it is unclear how readily these measures of validity transfer to the UK. Further research should explore this.

Five studies explored validity by calculating the correlations between KeyMaths3 and other measures of attainment in representative subgroups of the US standardisation sample. Here, we summarise correlations with total score only (see p91 – 103 of the Diagnostic Assessment Manual for extensive analyses of correlations with subtests, area scores and total scores in each case). Excellent correlations were observed between KeyMaths3 and its predecessor, the KeyMaths-R normative update (adjusted r > .9), the Kaufman Test of Educational Achievement, Second Edition, Maths composite score (adjusted r > .88), Iowa Tests of Basic Skills, Mathematics scores (adjusted r = .79), Measures of Academic Progress (adjusted r > .82) and the Group Mathematics Assessment and Diagnostic Evaluation (adjusted r > .82).

Contrasted group validity was assessed by comparing the performance of clinical/special population samples with that of the general population, using data collected during US standardisation. These studies showed the expected differences in performance for children categorised as having giftedness (p < .001), specific learning difficulties (maths only p < .001; maths and reading p < .001; reading only p < .001) and mild intellectual disability (p < .001). No significant differences in performance were observed between children with ADHD and the general population, indicating that one-to-one assessment prevented ADHD from confounding the measurement of maths ability (reflecting discriminant validity).

Criterion Validity

Rating Criterion

Summarise available comparisons

None available to review.

Reliability

Rating Reliability

Summarise available comparisons

Excellent levels of internal consistency, temporal stability and alternate form reliability are reported in the Diagnostic Assessment Manual (Connolly, 2013), having been estimated from multiple adequately sized and representative studies that were conducted separately from the norming study. However, these studies were all conducted in the US using the US version of the test, so it is not clear whether these measures would transfer to the UK version.

Internal consistency is excellent (> .9) for the total test and most area scores. Extensive analyses of split-half reliability (more appropriate than Cronbach’s alpha because of the basal/ceiling rules) are provided for Form A and Form B subtests, areas and total scores by school year and term, and by age (see p75 – 77 of the Diagnostic Assessment Manual). SEMs are 0.7 – 2.0 for subtest scaled scores, 2 – 9 for area scores and 1.6 – 5.5 for total scores (detail presented on p79 – 81 of the Diagnostic Assessment Manual).

Test-retest reliability correlation coefficients (adjusted r2) are excellent: .76 – .97 for subtests, .93 – .95 for area scores and .97 for total test scores (see p85 of the Diagnostic Assessment Manual for detailed analyses of subtest, area and total score test-retest reliabilities by school year). Alternate form reliability correlation coefficients (adjusted r2) are .74 – .92 for subtests, .87 – .95 for area scores and .95 – .97 for total test scores (see p84 of the Diagnostic Assessment Manual for detailed analyses of subtest, area and total score alternate form reliabilities by school year).
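For readers less familiar with the statistics summarised above, the Python sketch below shows the standard textbook relationships between a half-test correlation, full-test reliability, the standard error of measurement (SEM) and a confidence band around an observed score. The function names and all input values are illustrative assumptions, not figures or procedures taken from the manual.

```python
import math


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate of full-test reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def sem(sd, reliability):
    """Standard error of measurement from the score SD and reliability."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score, sem_value, z=1.96):
    """Approximate 95% band around an observed score (z=1.96 by default)."""
    return (score - z * sem_value, score + z * sem_value)


# Illustrative values only: SD of 15 and reliability of .96 are assumptions.
example_sem = sem(15.0, 0.96)
low, high = confidence_band(100.0, example_sem)
print(round(example_sem, 2), round(low, 1), round(high, 1))
```

With these illustrative inputs the SEM is about 3 standard-score points, which gives a band of roughly 94 to 106 around an observed score of 100. The practical point is that smaller SEMs give tighter bands around an individual pupil's score.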

Is the norm-derived population appropriate and free from bias?

Does the standardisation sample represent the target/general population well?

If any biases are noted in sampling, these will be indicated here.

UK norms are based on a relatively small-scale but well-stratified and representative validation study in the UK (representative in terms of age, gender, parental education, ethnic group and geographic region). Norms were standardised according to the percentile equivalent method, using percentile bands generated by a very large representative US sample as templates.

Sources

Connolly, A.J. (2013). KeyMaths3UK: Diagnostic Assessment Manual. London, UK: Pearson Assessment.