Alphabet Soup – a post about A level marking.

Cambridge Assessment tell us that ‘more than half of teachers’ grade predictions are wrong’ at A level. There is an implication behind this headline that many teachers don’t know their students and perhaps have to be ‘put right’ by the cold hard reality of exam results.

What Cambridge Assessment neglect to say is that greater accuracy in predicting A level module results would actually require prophetic powers. No teacher, however good, can predict module results with any reliability because many results are VERY unpredictable. I teach history and politics, humanities subjects, and so likely to have less consistent results than subjects like the sciences. Our history results have been relatively predictable of late but our A level politics results are another matter.

To illustrate, we will now play a game. Your job is to look at a selection of real data from this year and predict each student’s results using their performance in previous modules as a guide. You might think that your lack of knowledge of each student’s ability will be a hindrance to accurate prediction. Hmm, well, we’ll see! (If you are a party pooper and don’t want to play you can scroll forward to the last table and see all the answers…)

It may help you to know that in politics the two AS and the two A2 modules follow similar formats and are of similar difficulty (perhaps Unit 2 is a little harder than Unit 1), so a teacher will probably predict the same grade for each module: there is no reason a teacher could anticipate for grades to vary markedly between them. The A2 modules are a bit harder than the AS modules, and a teacher will bear this in mind when predicting A2 results with the AS grades in front of them. (That said, students mature and frequently do better at A2 than at AS, so this isn’t an entirely safe trend to anticipate.)

So using the Unit 1 grades in the first table below, what might you predict for these students’ Unit 2 results? (Remember the teacher will necessarily have given the exam board the same predicted grade for Units 1 and 2.)


| Candidate | Unit 1 [AS] | Unit 2 [AS] | Unit 3 [A2] | Unit 4 [A2] |
|---|---|---|---|---|
| 1 | B | | | |
| 2 | B | | | |
| 3 | B | | | |
| 4 | A | | | |
| 5 | B | | | |
| 6 | A | | | |
| 7 | A | | | |
| 8 | B | | | |
| 9 | A | | | |
| 10 | A | | | |
| 11 | A | | | |

(The overall results will soon be in the public domain and I have anonymised the students by not including the whole cohort (which was 19) and by not listing numerical results (which, interestingly, would actually make prediction even harder if included). I have not listed retake results as these would not have been available when predictions were made. We use Edexcel.)

Check the table below to see if you were right.

| Candidate | Unit 1 [AS] | Unit 2 [AS] | Unit 3 [A2] | Unit 4 [A2] |
|---|---|---|---|---|
| 1 | B | B | | |
| 2 | B | C | | |
| 3 | B | E | | |
| 4 | A | C | | |
| 5 | B | E | | |
| 6 | A | E | | |
| 7 | A | D | | |
| 8 | B | B | | |
| 9 | A | B | | |
| 10 | A | A | | |
| 11 | A | A | | |


Were you close? Did you get as many as half right? If you simply predicted the same grade for Unit 2 as was scored in Unit 1 (as teachers generally would) you would have got only 7 of the full cohort of 19 correct.
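If you want to check the arithmetic, the naive ‘same grade again’ strategy can be scored against the eleven candidates listed in the tables (a sketch in Python using only the grades shown here; as noted, the same strategy got 7/19 over the full cohort):

```python
# Unit 1 and Unit 2 grades for the 11 anonymised candidates listed above.
unit1 = ["B", "B", "B", "A", "B", "A", "A", "B", "A", "A", "A"]
unit2 = ["B", "C", "E", "C", "E", "E", "D", "B", "B", "A", "A"]

# Naive strategy: predict the same grade for Unit 2 as scored in Unit 1.
correct = sum(predicted == actual for predicted, actual in zip(unit1, unit2))
print(f"{correct}/{len(unit1)} correct")  # prints: 4/11 correct
```

Only 4 of the 11 candidates shown here scored the same grade twice, which is about as good as the naive strategy got across the whole cohort.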

Now don’t look at the table below until you have tried to predict the first A2 grade! Go on! Just jot down your predictions on the back of an envelope. (Isn’t this fun?)

| Candidate | Unit 1 [AS] | Unit 2 [AS] | Unit 3 [A2] | Unit 4 [A2] |
|---|---|---|---|---|
| 1 | B | B | D | |
| 2 | B | C | E | |
| 3 | B | E | B | |
| 4 | A | C | E | |
| 5 | B | E | D | |
| 6 | A | E | D | |
| 7 | A | D | E | |
| 8 | B | B | C | |
| 9 | A | B | D | |
| 10 | A | A | A | |
| 11 | A | A | C | |

Rather an unpredictably steep drop there! It was all the more puzzling for us given that our A2 history results (same teachers and many of the same students) were great.

If you’ve got this far why not predict the final module?


| Candidate | Unit 1 [AS] | Unit 2 [AS] | Unit 3 [A2] | Unit 4 [A2] |
|---|---|---|---|---|
| 1 | B | B | D | C |
| 2 | B | C | E | B |
| 3 | B | E | B | E |
| 4 | A | C | E | D |
| 5 | B | E | D | B |
| 6 | A | E | D | D |
| 7 | A | D | E | D |
| 8 | B | B | C | B |
| 9 | A | B | D | D |
| 10 | A | A | A | A |
| 11 | A | A | C | C |


We have no idea why there is a steep drop at A2 this year. Perhaps our department didn’t quite ‘get’ how to prepare our students for these papers in particular. I doubt it – but if so, what does this say about our exams if seasoned and successful teachers can fail to see how to prepare students for a particular exam despite much anxious poring over past papers and mark schemes? Our politics AS results were superb this year for both modules. Our department’s history results were fantastic at AS and A2. These results were entirely UNPREDICTABLE. Or to put it another way, they were only predictable in the sense that we anticipated in advance that they would look like alphabet soup – because they often do.

The first point that should be clear is that no teacher could possibly predict these module results. To even make the statement ‘more than half of teachers’ grade predictions are wrong’ is to wilfully mislead the reader as to what the real issue is here.

In fairness, it might feel like the grades have been created by a random letter generator but the results aren’t quite random. Some very able students, such as candidate 10, get As on all papers and would only ever have had A predictions. The final grades of our students were generally within one grade of expectations so the average of the four results has some validity. This said, surely there is a more worrying headline that the TES should have run with?
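The sense in which the average has some validity can be sketched with the candidates shown above. This maps A–E onto a simple points scale (A=5 … E=1) and compares each candidate’s unweighted average across the four units with their Unit 1 grade, a rough proxy for the predicted grade. To be clear, the points scale and the unweighted averaging are my illustration, not Edexcel’s actual UMS aggregation:

```python
# Illustrative points scale (my assumption, not Edexcel's UMS method).
points = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

# The four module grades for the 11 candidates listed in the final table.
results = {
    1: "BBDC", 2: "BCEB", 3: "BEBE", 4: "ACED", 5: "BEDB", 6: "AEDD",
    7: "ADED", 8: "BBCB", 9: "ABDD", 10: "AAAA", 11: "AACC",
}

for cand, grades in results.items():
    avg = sum(points[g] for g in grades) / len(grades)
    gap = points[grades[0]] - avg  # how far the average sits below Unit 1
    print(f"candidate {cand}: average {avg:.2f}, gap from Unit 1 {gap:.2f}")
```

The averages cluster far more tightly than the individual module grades do – candidate 10 averages a clean A, candidate 1 lands near a C from a B prediction – which is the sense in which the mean of four scattered results can still be broadly in line with expectations.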

Just how good are these exams at reliably discriminating between candidates of different ability? It is argued that marking is reliable, but what does that then tell us about the discriminating power of the exams themselves or their mark schemes? It can’t have helped that there were only 23 marks (out of 90) between an A and an E on Unit 3, or 24 marks between those grades on Unit 4. I have discussed other reasons I think may be causing the unpredictability here and here.
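To put those numbers in perspective, here is the back-of-envelope arithmetic (assuming, for illustration, that the four boundaries between A and E are evenly spaced, which won’t be exactly true):

```python
# Marks between the A and E boundaries, out of 90, as quoted above.
for unit, a_to_e in (("Unit 3", 23), ("Unit 4", 24)):
    band = a_to_e / 4  # four bands: A-B, B-C, C-D, D-E
    print(f"{unit}: each grade band is about {band:.1f} marks wide")
```

On that rough assumption each grade band is only about six marks wide, so a swing of half a dozen marks out of ninety moves a candidate a whole grade.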

Not all exams are as unpredictable as our Government and Politics A level, but if we want good exams that reliably and fairly discriminate between students we need to understand why some exams currently have such unpredictable outcomes. Ofqual has been moving in that direction with its interesting studies of maths GCSE and MFL at A level. As well as being unfair in its implication, the Cambridge Assessment headline is simply unhelpful because it obscures the real problems.



No one questions what they want to believe. The problem with EPPSE

The EPPSE (Effective Pre-school, Primary and Secondary Education) project is a very large and enormously influential study commissioned by the DfE to find out what types of pre-school provision and early experiences are most effective. It followed 3,000+ children from age 3 to 16 and reaches some very significant conclusions which I would question.

The research team was from the Institute of Education, Birkbeck and Oxford and as the Institute of Education blog explains:

The EPPSE project… has become one of the highest impact educational research programmes in Europe… EPPSE’s findings underpin billions in Government spending on nursery expansion, including the Sure Start programme, the extension of free pre-school to all three and four-year-olds in 2010 and this year, to the poorest 40% of two-year-olds this year… EPPSE’s evidence documenting excellent pre-school education and its ongoing benefits, especially for the most deprived children, has fed heavily into England’s early childhood curriculum and informed curricula in countries as diverse as Australia, China and Brazil. Nursery World editor Liz Roberts has noted “how highly regarded the Early Years Foundation Stage is around the world”.

The EPPSE project findings are stunning:

Attending any pre-school, compared to none, predicted higher total GCSE scores, higher grades in GCSE English and maths, and the likelihood of achieving 5 or more GCSEs at grade A*-C. The more months students had spent in pre-school, the greater the impact on total GCSE scores and grades in English and maths… the equivalent of getting seven B grades at GCSE, rather than seven C grades.

The EPPSE project also found that:

There was some evidence of statistically significant continuing pre-school effects on social behavioural outcomes at age 16 but these were weaker than at younger ages. Having attended a high quality pre-school predicted better social-behavioural outcomes in the longer term, though the effects were small.

The IOE blog gushes that EPPSE:

…brought together a rare combination: research funded by Government with a genuinely open mind, carried out by excellent and dedicated academics savvy enough to work with and influence politicians of all stripes…And thanks to the detailed work that began 17 years ago at the IOE, we also know what excellent nursery provision looks like.

Hold on a moment! It seems these researchers did not have an open mind. I have previously blogged about the fact that EPPE (the study’s acronym before it moved on to secondary school outcomes) measured quality using ECERS-R, a scale based on predefined, prejudged measures of quality. Having read through much of the voluminous literature, there is a great deal I could discuss about the EPPE findings, but in this post I will focus on another claim in the IOE blog: that this study has rigour. I am really not sure it does.

That is a big accusation to make against such a large and influential study conducted by highly regarded academics, but the study seems to have a fundamental problem that strikes at the heart of the validity of its findings.

The problem of the EPPE/EPPSE control group.

In their report at the end of KS1 (when the study children were 7 years old) the researchers acknowledge that there were problems with the control group. Because in England the vast majority of children attend a pre-school, it was not possible to find a representative sample of those children who didn’t:

The ‘home’ control group are from significantly disadvantaged backgrounds when compared with the sample as a whole with most mothers having less than a GCSE qualification (p11).

On p28 the report explains that:

…comparison of the ‘home’ sample (the control who did not attend pre-school) with children who attended a pre-school centre showed that both the characteristics and attainments of home children vary significantly from those who had been in pre-school. It is not possible to conclude with any certainty that the much lower attainments of the ‘home’ group are directly due to lack of pre-school experience.

The writers go on to talk positively about how they have used ‘contextualised multilevel analysis’ to try to compensate for the unrepresentative nature of the control sample, and they feel this means their results are worth considering. But they admit, for example, that when making judgements about the impact of longer duration of pre-schooling on higher cognitive outcomes by comparing with the ‘home’ group:

“causal connections cannot be drawn”

The problems with the control are not mentioned in the overall findings, but in 2004 they are acknowledged in the body of the report. The 2004 report also attempts to show that variation in pre-school quality and in duration of attendance can have an impact, and it can make that argument because those particular findings do not have to rely on the problematic control.

It seems obvious that children from very disadvantaged homes, as the ‘home’ control group largely are, may well benefit from pre-school in ways most children wouldn’t. Therefore, despite contextualised multilevel analysis to take account of other variables such as SES, there will always be problems with using this control to reach firm conclusions about the impact of pre-schooling on the whole population.
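The worry can be illustrated with a toy simulation. Every number below is hypothetical and has nothing to do with EPPSE’s actual data: I simply assume that outcomes depend only on family advantage, that pre-school has zero true effect, and that disadvantaged families are much less likely to use pre-school. Even then, a naive comparison of attenders with a ‘home’ group shows a large apparent pre-school effect:

```python
import random

random.seed(0)

def child():
    ses = random.gauss(0, 1)  # family advantage (hypothetical scale)
    # Disadvantaged families (ses < -1) are far less likely to use pre-school.
    attends = random.random() < (0.9 if ses > -1 else 0.3)
    # Outcome depends ONLY on advantage: pre-school has zero true effect here.
    outcome = 50 + 10 * ses + random.gauss(0, 5)
    return attends, outcome

sample = [child() for _ in range(20000)]
pre = [o for a, o in sample if a]
home = [o for a, o in sample if not a]
gap = sum(pre) / len(pre) - sum(home) / len(home)
print(f"apparent 'pre-school effect': {gap:.1f} points (true effect: 0)")
```

The gap is entirely an artefact of who ends up in the ‘home’ group, which is why the 2004 caveat that ‘causal connections cannot be drawn’ matters, and why statistical adjustment can only partly repair a control that differs systematically from the treated group.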

Fast forward to 2014 and the final reports on the children at age 16. I have looked through all the reports. It is clear from the tables included that all the startlingly good educational findings rest on comparisons with this control group. However, I can find NO MENTION AT ALL of the sorts of problems with the control that the researchers were willing to acknowledge in 2004. It is as if the control group issue just wafted away. It seems that such a large and important study, ‘one of the highest impact educational research programmes in Europe’, had no need to concern itself with pesky issues like that annoyingly poor control on which the whole vast edifice of EPPSE rests.

How can it be that in 2004 the problem of the unrepresentative control meant many findings on the impact of pre-school were tentative, yet in 2014 the issue of the control has totally disappeared? It is as if it never existed. Given the startlingly strong impact ANY form of pre-school is claimed to have, the findings need to be robust if they are to be used to make policy.

The EPPSE claims to be ‘proper’ research, not the sort of stuff that gives education research a bad name. It is also enormously influential and directly used in government policy making. I can understand the researchers, invested as they are, claiming their conclusions have validity but where is the scrutiny? What is happening at peer review? If anything highlights the unhealthiness of the rigid orthodoxy in education departments, especially in early years research, it is this EPPSE study. Once again no one seems to question what they want to believe.

If you found this interesting you may also want to read these posts: