Test identification

Name of test KeyMaths3 UK
Version KeyMaths3 UK
Previous version(s) Versions 1 and 2; US, Canadian and UK editions.
Subjects Maths
Summary Assess maths skills of students aged 6 years to 16 years 11 months, and assist in intervention planning

Assessment screening

Subscales Three content areas made up of a total of 10 subtests: Basic Concepts (Numeracy, Algebra, Geometry, Measurement, Data Analysis and Probability), Operations (Mental Computation and Estimation, Addition and Subtraction, Multiplication and Division), Applications (Foundations of Problem Solving, Applied Problem Solving)
Additional References n/a
Authors Austin J Connolly
Publisher The Psychological Corporation/Pearson Assessment
Test source https://www.pearsonclinical.co.uk/Education/Assessments/mathematical-assessments/keymaths-3/keymaths-3-uk.aspx
Guidelines available? Yes
Norm-referenced scores. Yes
Age range 6;00-16;11 Years
Key Stage(s) applicable to KS1, KS2, KS3, KS4
UK standardisation sample Yes
Publication date 2013
Re-norming date n/a


Validity measures available? Yes
Reliability measures available? Yes
Reason for exclusion from shortlist shortlisted

Evaluation and Appraisal

Additional information about what the test measures Measures multiple areas of mathematics - Basic concepts (conceptual knowledge), operations (computational skills), and applications (problem solving)
Are additional versions available? KeyMaths3 Diagnostic Assessment is a substantial revision in concept and design from previous editions of KeyMaths Diagnostic Assessment. KeyMaths3 UK was further adapted for the UK through alignment with the maths curriculum, anglicisation and new (UK) normative data.
Can subtests be administered in isolation? Yes
Administration group size Individual
Administration duration Untimed, but administering the full battery of subtests typically takes approximately 30-40 minutes for examinees in the early primary years and 75-90 minutes for older examinees. Estimated administration times for each component and in each age range are provided on p12 of the Diagnostic Assessment Manual (Connolly, 2013).
Description of materials needed to administer test Materials supplied in administration kit: manual, two administration easels (Easel 1 and Easel 2), record form, two written computation examinee booklets. Additional materials: writing implements, calculator.
Any special testing conditions? Quiet and well lit. A separate room is recommended, but the classroom can be used if visual and auditory distractions can be prevented.

Response format

Response mode Oral/paper and pencil
What device is required n/a
Question format. mixed
Progress through questions adaptive
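
Adaptive progress in an individually administered test of this kind is usually governed by basal and ceiling rules: items below the starting point are credited without being administered, and testing stops after a run of consecutive errors. The sketch below illustrates that general convention only. The function name, the default run length of four errors and the crediting rule are assumptions made for illustration and are not taken from the KeyMaths3 UK manual, which remains the authority on the actual rules.

```python
def raw_score(responses, ceiling_run=4):
    """Illustrative raw score under a generic basal/ceiling convention.

    responses: dict mapping 1-based item number -> 1 (correct) or 0 (incorrect)
               for the items actually administered (assumed contiguous).
    ceiling_run: hypothetical number of consecutive errors that ends testing.
    Assumes a basal has already been established at the first administered item.
    """
    items = sorted(responses)
    first, last = items[0], items[-1]
    if items != list(range(first, last + 1)):
        raise ValueError("administered items must be contiguous")

    score = first - 1          # unadministered items below the basal are credited
    consecutive_errors = 0
    for item in items:
        if responses[item]:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == ceiling_run:
                break          # ceiling reached: later items are ignored
    return score


# Hypothetical record: testing started at item 5 and stopped at item 13.
example = {5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0, 11: 0, 12: 0, 13: 0}
example_score = raw_score(example)
```

Because the stopping point depends on the running pattern of errors, each response has to be scored as it is given, which is why accuracy cannot be scored after the session.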

Assessor requirements

Is any prior knowledge/training/profession accreditation required for administration? Yes
Is administration scripted? Yes


Description of materials needed to score test Diagnostic Assessment Manual. Because of the basal and ceiling rules, accuracy must be scored concurrently with administration.
Types and range of available scores Subtest raw scores, scaled scores, age equivalents; derived (concept area) raw scores, age standard score, age equivalents, percentile rank; total test raw score, age standard score (55-145), age equivalents (<4;6 to >16;0), percentile rank; area comparisons (significance levels for discrepancy analysis).
Score transformation for standard score Age standardised
Age bands used for norming 3 months
Scoring procedures Complex manual scoring - training required
Automatised norming none
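
To show how an age standard score relates to a percentile rank, the sketch below converts one to the other under a normal distribution. It assumes the conventional standard score scale with mean 100 and SD 15, which is consistent with the reported 55-145 range (three SDs either side of 100) but is not stated in this record. In practice scores are read from the norm tables in the manual; the function name and constant here are illustrative only.

```python
from statistics import NormalDist

# Assumption: standard scores follow the conventional mean-100, SD-15 scale.
ASSUMED_SCALE = NormalDist(mu=100, sigma=15)


def percentile_rank(standard_score):
    """Approximate percentile rank implied by a standard score."""
    return 100 * ASSUMED_SCALE.cdf(standard_score)


# Percentile ranks implied at the ends and centre of the reported score range.
for score in (55, 85, 100, 115, 145):
    print(score, round(percentile_rank(score), 1))
```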

Construct Validity

Does it adequately measure literacy, mathematics or science?
Does it reflect the multidimensionality of the subject? Generic maths (with multiple specific subtests)
Construct validity comments (and reference for source)

The target constructs and development of the test are carefully considered and explained in the Diagnostic Assessment Manual (Connolly, 2013). The assessment shows excellent validity on a number of different measures across different studies. However, all of the evidence for test validity comes from studies conducted in the US using the US form of the test. Given that test development was tightly bound to the US maths curriculum, it is unclear how well these measures of validity transfer to the UK. Further research should explore this.

Five studies explored validity by calculating the correlations between KeyMaths3 and other measures of attainment in representative subgroups of the US standardisation sample. Here, we summarise correlations with total score only (see p91-103 for extensive analyses of correlations with subtests, area scores and total scores in each case). Excellent correlations were observed between KeyMaths3 and its predecessor, the KeyMaths-R normative update (adjusted r >.9), the Kaufman Test of Educational Achievement, Second Edition, Maths composite score (adjusted r >.88), the Iowa Tests of Basic Skills, Mathematics scores (adjusted r =.79), Measures of Academic Progress (adjusted r >.82) and the Group Mathematics Assessment and Diagnostic Evaluation (adjusted r >.82).

Contrasted group validity was assessed by comparing the performance of clinical/special population samples with that of the general population, using data collected during US standardisation. These studies showed the expected differences in performance for children identified as gifted (p<.001), children with specific learning difficulties (maths only p<.001; maths and reading p<.001; reading only p<.001) and children with mild intellectual disability (p<.001).

No significant differences in performance levels were observed between children with ADHD and the general population, suggesting that one-to-one assessment prevented ADHD from confounding the measurement of maths ability (evidence of discriminant validity).

Criterion Validity

Does test performance adequately correlate with later, current or past performance?
Summarise available comparisons n/a


Is test performance reliable?
Summarise available comparisons Excellent levels of internal consistency, temporal stability and alternate form reliability are reported in the Diagnostic Assessment Manual (Connolly, 2013), estimated from multiple adequately sized and representative studies that were conducted separately from the norming study. However, these studies were all conducted in the US using the US version of the test, so it is not clear whether these measures would transfer to the UK version.

Internal consistency is excellent (>.9) for the total test and most area scores. Extensive analyses of split-half reliability (more appropriate than Cronbach's alpha because of the basal/ceiling rules) are provided for Form A and Form B subtests, areas and total scores by school year and term, and by age (see p75-77 of the Diagnostic Assessment Manual).

SEMs for scaled scores: 0.7-2.0 for subtests, 2-9 for area scores, 1.6-5.5 for total scores (detail presented on p79-81 of the Diagnostic Assessment Manual).

Excellent test-retest reliability correlation coefficients (adjusted r2): subtests .76-.97; area scores .93-.95; total test score .97 (see p85 of the Diagnostic Assessment Manual for detailed analyses of subtest, area and total score test-retest reliabilities by school year).

Alternate form reliability correlation coefficients (adjusted r2): subtests .74-.92; area scores .87-.95; total test score .95-.97 (see p84 of the Diagnostic Assessment Manual for detailed analyses of subtest, area and total score alternate form reliabilities by school year).
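
The reliability figures above can be related to one another with standard classical test theory formulas, sketched below. The function names are illustrative, the SD of 15 is an assumed standard score scale, and this record does not confirm that the manual derived its values in exactly this way. The sketch shows the Spearman-Brown step-up commonly applied to a split-half correlation, the link between reliability and SEM, and the confidence band that a given SEM implies around an observed score.

```python
from math import sqrt
from statistics import NormalDist


def spearman_brown(half_test_r):
    """Step a correlation between two test halves up to full-length reliability."""
    return 2 * half_test_r / (1 + half_test_r)


def sem_from_reliability(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def confidence_band(score, sem, level=0.95):
    """Symmetric band around an observed score, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return score - z * sem, score + z * sem


# Bands implied by the smallest and largest total-score SEMs reported above,
# around a hypothetical observed score of 100.
narrow_band = confidence_band(100, 1.6)
wide_band = confidence_band(100, 5.5)
```

The width of the band grows in direct proportion to the SEM, so the practical precision of a total score differs considerably between the two ends of the reported 1.6-5.5 range.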

Is the norm-derived population appropriate and free from bias?

Is population appropriate and free from bias? Yes
If any biases are noted in sampling, these will be indicated here. UK norms are based on a relatively small-scale but well-stratified validation study in the UK (representative in terms of age, gender, parental education, ethnic group and geographic region). Norms were standardised using the percentile equivalent method, with percentile bands generated from a very large representative US sample used as templates.


Sources Connolly, A.J. (2013). KeyMaths3UK: Diagnostic Assessment Manual. London, UK: Pearson Assessment.