Standardised Testing

Test scores cannot be directly compared or combined on the basis of raw scores alone. It is not meaningful to compare or add together raw scores from tests that differ in type, length, time limit or difficulty. Standardised scores, on the other hand, are suitable for this purpose, and adding together standardised scores ensures that the tests all have equal weight.

Usually, tests are standardised so that the average standardised score automatically comes out as 100, irrespective of the difficulty of the test, and so it is easy to see whether a pupil is above or below the average. The measure of the spread of scores is called the "standard deviation", and this is usually set to 15 for educational attainment and ability tests. This means that, for example, irrespective of the difficulty of the test, about 68 per cent of the pupils in the national sample will have a standardised score within 15 points of the average (between 85 and 115) and about 95 per cent will have a standardised score within two standard deviations (30 points) of the average (between 70 and 130). These examples come from a frequency distribution known as "the normal distribution", which is shown in the figure below. Published standardised scores usually range from 70 to 140 or from 69 to 141.
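The proportions quoted above can be checked with a short calculation. The following is a minimal sketch in Python, assuming an exactly normal distribution with mean 100 and standard deviation 15; the names scale and share_between are illustrative only.

```python
from statistics import NormalDist

# Standardised-score scale described in the text: mean 100, standard deviation 15.
scale = NormalDist(mu=100, sigma=15)

def share_between(low, high):
    """Proportion of a normally distributed population scoring between low and high."""
    return scale.cdf(high) - scale.cdf(low)

within_one_sd = share_between(85, 115)   # within 15 points of the average
within_two_sd = share_between(70, 130)   # within 30 points of the average

print(f"85 to 115: {within_one_sd:.1%}")  # 68.3%
print(f"70 to 130: {within_two_sd:.1%}")  # 95.4%
```

Real score distributions are only approximately normal, so the proportions observed in an actual sample will differ a little from these figures.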

If no age allowance were applied to the standardised scores, the equation for converting raw scores to standardised scores would be:

S = 15(b – a)/sd + 100

where S is the pupil’s standardised score, b is the pupil’s raw score, a is the average raw score of all the pupils, and sd is the standard deviation of the raw scores.

As an example, take a test of 80 questions. After the test has been administered and marked, the average (or ‘mean’) raw score and the standard deviation of the raw scores are computed. Suppose that the average score is 45 and the standard deviation is 12.5. For a pupil with a raw score of 55, the standardised score will be:

S = 15 × (55 – 45)/12.5 + 100 = 112
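The same calculation can be written as a small function. This is a minimal sketch of the equation without any age allowance; the function and parameter names are chosen here for illustration and do not come from any published scoring software.

```python
def standardised_score(raw, mean_raw, sd_raw, scale_mean=100, scale_sd=15):
    """Apply S = 15(b - a)/sd + 100, with no age allowance.

    raw      -- the pupil's raw score (b)
    mean_raw -- the average raw score of all the pupils (a)
    sd_raw   -- the standard deviation of the raw scores (sd)
    """
    if sd_raw <= 0:
        raise ValueError("the standard deviation of the raw scores must be positive")
    return scale_sd * (raw - mean_raw) / sd_raw + scale_mean

# Worked example from the text: mean 45, standard deviation 12.5, raw score 55.
score = standardised_score(55, 45, 12.5)
print(score)  # 112.0
```

A pupil whose raw score equals the average (45 in this example) receives exactly 100, whatever the difficulty of the test.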

However, in order to allow for the differing ages of the pupils as accurately and as fairly as possible across the complete score range, the age-standardised scores are calculated in a much more statistically complex way, although the effect is similar to computing sets of scores using the above equation for pupils of the same age (to the nearest month).

Standardised scores will follow the normal distribution even if they are locally rather than nationally standardised and, in most secondary selection procedures, it is locally standardised scores that are used. There is, however, a problem of comparison. "Off-the-shelf" tests that are used in schools for non-selective purposes are usually standardised on a nationally representative sample, and the mean (average) score of 100 represents a truly average level of performance. On the other hand, if an 11+ test is standardised to a mean of 100 based only on those pupils who took the test, then these locally standardised scores will not be equivalent to national scores unless it can be demonstrated that the two populations are of equivalent ability. If it is the case that the group taking the test has a higher average ability than the national sample, then their locally standardised scores on the 11+ tests will tend to be lower than their national scores on "off-the-shelf" tests of a similar type.
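The effect of local standardisation can be illustrated with the basic equation and no age allowance. In the sketch below, the national mean and standard deviation are borrowed from the earlier worked example, while the local figures are hypothetical: they assume a group that averages 5 raw marks higher with the same spread.

```python
def standardised_score(raw, mean_raw, sd_raw):
    """S = 15(b - a)/sd + 100, ignoring any age allowance."""
    return 15 * (raw - mean_raw) / sd_raw + 100

raw = 55  # the same raw score, standardised against two different groups

# National sample: mean 45 and standard deviation 12.5, as in the worked example.
national = standardised_score(raw, mean_raw=45, sd_raw=12.5)

# Local group (hypothetical): higher average ability, so a higher mean raw score.
local = standardised_score(raw, mean_raw=50, sd_raw=12.5)

print(national, local)  # 112.0 106.0
```

The same performance yields a lower score against the abler local group, which is why locally and nationally standardised scores cannot be treated as interchangeable.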

Raw scores and standardised scores come from different scales, and are therefore not easily comparable with each other. An everyday example of this is the comparison of temperatures in degrees Fahrenheit and degrees Celsius. Fahrenheit temperatures above 32 degrees convert to positive numbers on the Celsius scale, whereas those below 32 degrees convert to negative numbers on the Celsius scale. The conversion of raw scores to age-standardised scores is much more statistically complex, though, than the conversion of Fahrenheit to Celsius. It actually depends on the level of difficulty of the test, the average score and the spread of scores in the test, and on the relative levels of performance by pupils of differing ages.

It should be understood that scores expressed as percentages are never used. Unlike standardised scores, percentages cannot relate to the average performance of the pupils or to the extent of the variation in test scores. Only by taking these into account can scores be placed on a common scale.
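To see why a percentage says nothing about relative performance, consider the following sketch. Both tests and all of their statistics are hypothetical, and the basic equation is used without any age allowance.

```python
def standardised_score(raw, mean_raw, sd_raw):
    """S = 15(b - a)/sd + 100, ignoring any age allowance."""
    return 15 * (raw - mean_raw) / sd_raw + 100

questions = 80
raw = 60                              # the pupil scores 60 out of 80 on both tests
percentage = 100 * raw / questions    # identical on both tests

# Hypothetical easy test: the average pupil also scores 60.
easy = standardised_score(raw, mean_raw=60, sd_raw=10)

# Hypothetical hard test: the average pupil scores only 40.
hard = standardised_score(raw, mean_raw=40, sd_raw=10)

print(percentage, easy, hard)  # 75.0 100.0 130.0
```

The percentage is 75 in both cases, yet the performance is exactly average on the easy test and two standard deviations above average on the hard one.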

Why is account taken of a pupil's age?

Nearly all pupils taking secondary selection tests during a particular school year are born between 1st September and 31st August of the following year, which means that the oldest pupils are very nearly 12 months older than the youngest. Almost invariably in ability and attainment tests, older pupils achieve slightly higher raw scores than younger pupils. In order not to disadvantage pupils who were born in, say, June, July or August rather than the previous September or October, the tests should, in theory, be taken by the pupils when they reach a particular exact age, e.g. 10 years 8 months. However, this is completely impractical, as it would take a full 12 months to administer the 11+ tests for a typical year group. So, instead, an allowance is included in the standardised scores that enables all the pupils to take the test on the same day, eliminating the age differential. Consequently, there is no advantage or disadvantage according to the month of birth. In effect, pupils are only being compared with other pupils of exactly the same age as themselves (measured to the nearest month).

How is the age allowance calculated?

In order to allow for the effect of age as accurately and as fairly as possible across the complete score range, the statistical model that is employed is complex. The age allowance that is included is ‘empirical’, i.e. it is based on the actual extent to which older pupils score more highly, rather than an allowance that is fixed in advance before the test scores are known. For example, in the unlikely event that older pupils did not score more highly, there would be no age allowance.

A description of the statistical model that is used can be found in the following paper from an academic journal:

SCHAGEN, I.P. (1990). ‘A method for the age standardisation of test scores.’ Applied Psychological Measurement, 14, 4, 387-93.
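The published model is not reproduced here. As noted earlier, however, its effect is similar to applying the basic equation separately to pupils of the same age in months, and that simplified idea can be sketched as follows. The data are invented for illustration, and the function name age_grouped_scores is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (age in months, raw score) pairs, invented for illustration.
pupils = [
    (128, 40), (128, 50), (128, 60),   # pupils aged 10 years 8 months
    (139, 44), (139, 54), (139, 64),   # pupils aged 11 years 7 months
]

def age_grouped_scores(records, scale_mean=100, scale_sd=15):
    """Standardise each raw score against pupils of the same age in months.

    A simplification only: the real model is statistically more complex
    and does not treat each month as a separate, independent group.
    """
    by_age = defaultdict(list)
    for age, raw in records:
        by_age[age].append(raw)

    results = []
    for age, raw in records:
        group = by_age[age]
        a = mean(group)        # average raw score for this age group
        sd = pstdev(group)     # spread of raw scores for this age group
        if sd == 0:
            raise ValueError(f"no spread of scores at age {age} months")
        results.append((age, raw, scale_sd * (raw - a) / sd + scale_mean))
    return results

for age, raw, score in age_grouped_scores(pupils):
    print(age, raw, round(score, 1))
```

In this invented data the older group averages 4 raw marks more than the younger group, so a younger pupil with a raw score of 50 and an older pupil with a raw score of 54 both receive a standardised score of 100. Had the two groups scored the same on average, the calculation would have applied no allowance at all, which matches the 'empirical' nature of the allowance described above.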

Standardisation and sex differences

The standardisation treats boys and girls in exactly the same way. Separate standardisations for boys and girls were once conducted in educational testing, but this has not been the practice for a number of years under equal opportunities legislation.

A consequence of this is that the standardisation procedure does not eliminate sex differences. If boys achieve higher raw scores, on average, than girls on the test, then the boys' standardised scores will be higher; similarly, if girls obtain higher raw scores, then the girls' standardised scores will be higher.

What does a standardisation table look like?

An example of a table can be seen below. In order to be more easily readable, this example is based upon results from a test of only 40 questions (most secondary selection tests have more questions than this). It does, however, show how a standardisation table typically works.

Because standardised scores depend upon a pupil's raw score and age, a standardisation table is called a ‘two-way entry table’. In a column at the left-hand side of the table are the raw scores. Along the top of the table are the different ages - for example, 10:11 means 10 years and 11 months. As an illustration, a pupil aged 10:07 with a raw score of 23 will have a standardised score of 106 on this example test.

Two features of standardisation tables can be seen in this example:

  • As one moves along a row from left to right (i.e. as the age increases), the standardised scores decrease slightly. This is the age allowance at work, compensating for the fact mentioned earlier that, almost invariably, younger pupils score slightly lower on average. The rate at which standardised scores decrease with increasing age will vary from one test to another, and therefore the pattern observed in this table may well be different from that in other tables. The inclusion of this age allowance means that a younger pupil can achieve the same standardised score as an older pupil whilst having a slightly lower raw score. As stated before, this is in order not to disadvantage summer-born children in comparison with pupils who happen to have been born, say, in the previous autumn. An important consequence of this is that, in whatever month pupils were born, roughly the same proportion will achieve the specified pass mark. This is because pupils are, in effect, only being compared with other pupils of the same age as themselves.
  • As one moves down any particular column, the standardised scores increase. This means quite simply that higher raw scores will result in higher standardised scores.
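A two-way entry table of this kind can be represented as a simple lookup. In the sketch below, only the entry for a raw score of 23 at age 10:07 (a standardised score of 106) is taken from the text; every other value is hypothetical and chosen merely to show the features listed above.

```python
# Ages run along the top of the table (years:months).
AGES = ["10:06", "10:07", "10:08"]

# Raw scores run down the left-hand side. All values are hypothetical
# except the entry for raw score 23 at age 10:07, which is 106 in the text.
TABLE = {
    22: [105, 104, 103],
    23: [107, 106, 105],
    24: [109, 108, 107],
}

def look_up(raw_score, age):
    """Return the standardised score for a given raw score and age."""
    return TABLE[raw_score][AGES.index(age)]

print(look_up(23, "10:07"))  # 106

# Along each row, scores do not rise as age increases (the age allowance).
rows_decrease = all(
    row[i] >= row[i + 1] for row in TABLE.values() for i in range(len(row) - 1)
)

# Down each column, a higher raw score gives a higher standardised score.
raws = sorted(TABLE)
columns_increase = all(
    TABLE[lower][i] < TABLE[higher][i]
    for lower, higher in zip(raws, raws[1:])
    for i in range(len(AGES))
)

print(rows_decrease, columns_increase)  # True True
```

In this invented extract, a pupil aged 10:06 with a raw score of 22 and a pupil aged 10:08 with a raw score of 23 both receive 105, which shows how a younger pupil can reach the same standardised score with a slightly lower raw score.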