I have just wrested the computer from my husband. He is currently slogging through some external exam board marking. He keeps being interrupted in that specialised form of hellish tedium by a lunatic ranting female, bursting in to make another criticism of Ofqual and the new report on the quality of marking of GCSEs and A levels. Fortunately my husband, like me an experienced examiner, shares my view that the report covers up serious problems with the system.
I don’t know of any A level teachers in subjects such as mine, politics and history, who are happy with the quality of marking. However, many don’t know much about examining. They assume that the lottery of a module results printout in a bad year is due to the incompetence of the markers, and can be solved by recruiting better qualified examiners and giving them better general training. Actually, the very real and serious problems I see with marking have much more complicated causes. This piece focuses on one significant cause of problems, stemming from the desire of exam boards to cut costs and administrative headaches by standardising online. Am I alone in singling this issue out? Well no: online standardisation has been common for around five years, and Ofqual suggest in their report that:
“In a series of in depth interviews with some 50 examiners… they…voiced overwhelming opposition to online standardisation.”
In the good old days of examining, all markers for a paper would meet in a slightly dingy hotel room, probably in Bloomsbury, and spend a thrilling day together gaining ‘a common understanding of the mark scheme’. We were all expected to mark ten papers before arriving, so we were already familiar with the paper and the mark scheme. We would sit in ‘teams’, each around a table. Our team leader would direct us to the first question and, using the chief examiner’s mark scheme (refined the previous day by the chief and his team leaders), we would all have a stab at a mark. Then the fun would begin as, with much dread, we all had to share our chosen mark with the team. Without fail the marks we shared would range over three grade boundaries. Over an intense day, full of questioning, discussion, negotiating and some frustration, a room of diverse, opinionated teachers would gradually start to mark as one. What the uninitiated don’t grasp is that experienced markers might produce a similar rank order in their marks, but a sense of level has to be re-calibrated for each question. OK, face to face standardisation was never perfect and it would have been easier for the chief examiner to herd cats, but it kind of worked.
Ofqual seem happy that in the brave new world of modern technology this palaver is now unnecessary as:
“Evidence available does not suggest that online standardisation is a real threat to the quality of marking”.
However, if I chose to mark this year (I won’t) I would have to sit at home with a mark scheme and some sample responses, trying to work out how to apply the mark scheme with minimal extra guidance. I would then take a stab, enter a mark on the computer and wait, dreading the appearance of red writing on the screen. It is like Russian roulette, and even the best of markers can often fail with this minimal guidance. In the past, the guidance now given for online standardisation was always available to face to face markers, days before they ever met. Yet a full, intense day of work was still deemed necessary for a team to really gain a common understanding of the mark scheme. After that day you would spend another at home preparing a sample of your marking for the team leader. It is true that with online standardisation I do have access to a (busy) team leader on the phone, and some boards try forms of conferencing, but in total you get about the same amount of guidance as before – minus the full day of face to face discussion.
However, Ofqual say my understanding of the mark scheme is just as good as it was previously. Yes – and black is also white… Is it any wonder that those telephone conversations with a team leader have been known to involve some quite specific guidance to get through the on-screen sample questions? Some team leaders are informally told by chief examiners (struggling to get the markers to pass with such limited guidance) to use ‘guided landing’. Otherwise (I have been told anecdotally…), when the mark scheme is poor, most of a marking team would sometimes fail the online standardisation. The online system was heralded as a solution to the problems that previously made a face to face meeting necessary. In fact, in the old days it had always been practically possible to standardise without face to face meetings. It wasn’t the lack of technology that led to face to face meetings, but a belief that they were important for the quality of marking.
Ofqual do admit that ‘overwhelming’ numbers of examiners are concerned about online standardisation, but the thing is, Ofqual make clear, our opinion is simply not reliable.
What do Ofqual base their faith in online standardisation on?
1. Ofqual use research evidence.
Ofqual do state that “admittedly research is limited” and I would agree. They cite three studies: one on a state-wide reading assessment in America, another about training in writing assessment at a New Zealand university, and a third, more relevant but sponsored by AQA. Given the distinctive nature of GCSE and especially A level humanities marking, this is a paucity of evidence that makes Ofqual’s admission of its limitations an enormous understatement. Ofqual, wisely, acknowledge that it is possible that previous good practice at face to face meetings has prepared examiners to mitigate any weaknesses of online standardisation. Too right, would be my view. Ofqual say they ‘expect exam boards to monitor this closely’, and I wonder how. Their trust that there is either the will or the ability to do so seems enormously naïve.
The AQA research cited is interesting because it looks at the marking of my subject, history, at GCSE. The study finds that on a sample of GCSE papers face to face standardisation did not lead to better quality marking than online.
This research fails to grasp a fundamental issue with marking. The quality of marking is only as good as the mark scheme, and thus only as good as the chief examiner.
You can mark well, according to the mark scheme, and your marking can be ‘reliable’ but the result can still be some of the gross injustices to candidates I have seen as a teacher. This is because:
- A poor mark scheme leads to bad marking, and ‘correct’ marking using a poor mark scheme still leads to injustice. At a face to face meeting, when 30-100 pairs of eyes fillet a mark scheme it becomes more workable; even a weak mark scheme, short of helpful guidance and produced by a poor chief examiner, can be salvaged there. Online standardisation removes that check and so increases exposure to human error.
- It is less possible to negotiate more subtle and intelligent judgements when at home in front of a screen. It is another error of the uninitiated to assume it is the job of an examiner to decide what they think is the quality of a piece. Their job is actually to implement the mark scheme. When stuck at home in front of a screen, without the ability to negotiate over the quality of common student responses, good examiners end up sticking to a chief examiner’s guidance. They may ring up their team leader a few times but ultimately they have to get on with the job. Marking becomes more mechanistic and any poor decisions by the chief examiner (we are all human) can’t be challenged.
- Mark schemes become more and more prescriptive over time. This is because chief examiners try to get reliable outcomes without the chance to communicate nuance. Their pool of examiners gradually contains fewer seasoned examiners who have the benefit of years of shared discussion in face to face meetings. In A level Politics, something called ‘threshold’ guidance has recently been introduced, specifying how many arguments an essay must contain to reach a certain level. This is an inevitable consequence of being deprived of face to face discussion of what makes a quality response to a specific question.
- Judgements also become more cautious when examiners are less sure of the mark scheme. If you put a good response just inside level 3, it is less risky than placing it near the top of the level. Therefore the marks, while still within tolerance (and apparently reliable), bunch up. Bunching means results are dramatically skewed by just one poor answer. On one AS Politics paper, on an essay worth 25 marks, there are actually only 5 marks between the A and the E grade due to the bunching created by cautious marking. This has serious knock-on effects for the overall reliability of the paper.
So Ofqual admit the research is limited and don’t even acknowledge these problems with online standardisation.
2. Ofqual use a questionnaire to all examiners.
In a general questionnaire, 85% of 10,000 examiners responded positively to the statement:
“I receive sufficient briefing about a paper and mark scheme before I begin my marking for each exam”.
From this Ofqual blithely extrapolate that ‘standardisation is clearly perceived to work effectively for most examiners’. It seems pretty obvious to me that examiners who used the old system (a dwindling pool) might be positive in assessing how well they were helped within the parameters of the new system, and still concerned by the loss of a better one. Ofqual prefer to believe that the contradiction between the interviews about online standardisation and the questionnaire responses arises because we didn’t really know our own minds. Ofqual hold to this interpretation despite the fact that in the same questionnaire, when asked how the system could be improved, the most frequent suggestion was a return to face to face standardisation.
3. Ofqual cite a piece of research showing there is no established link between higher confidence among markers and better marking.
One wonders why Ofqual asked examiners in the first place, when their ‘overwhelming’ view can be dismissed out of hand as unreliable. I suppose if overwhelming numbers of examiners don’t know their own minds and hold unreliable views anyway, it makes perfect sense to place your faith in three weak research studies instead.
Ofqual appear to be in thrall to the big exam boards, and meanwhile marking becomes more and more mechanistic and erratic. The whole currency of exams is gradually undermined, and the only people who can raise the alarm are ignored.