Data Tracking and the LFs*

Until recently I was unfamiliar with the sorts of pupil tracking systems used in most schools. I’ve also had to get to grips with the plethora of acronyms commonly used to categorise groups of students being tracked. I’ve come across PP, LPAs, HPAs and LACs but, rather surprisingly, no mention of the LF. To be honest, this gap puzzles me, given that in my considerable experience it is how the teacher and school manage the performance of the LFs that is most crucial to healthy end of year data. If the LFs perform near their potential you’re basically laughing all the way to the exam hall.

I should, at this stage, be clear. LF is not a standard acronym (it was invented by my husband) but it does describe a clearly recognisable and significant sub-section of any secondary school population. The L stands for lazy (and the second word begins with an F).

I am being very flippant, I know, but my point is serious enough.

Today I happened to need to look at a spreadsheet containing data for an old cohort from my last school. As my eye glanced down the baseline testing stats, used for tracking, I couldn’t help emitting frequent snorts of derision. The trigger of my scorn was the original baseline test data for some of my most ‘affectionately’ remembered GCSE students (truthfully, actually, I do remember them all with warmth). I commented to my husband that they needed to be real… erm… ‘LFs’ to score that low on the baseline given the brains with which I knew perfectly well that they were blessed.

If I and my colleagues had based our ambitions for those particular individuals on their predicted grades from the baseline they’d have cruised lazily through school. Their meagre efforts would have been continually affirmed as adequate, which would have been ruinous for their habits and character and a betrayal of their potential.

Even if value added is what drives you, there is an obvious problem with capping your ambitions for pupils by only showing concern when they miss the grades predicted from their baseline. You will still have to absorb the scores of pupils who were never going to live up to their predictions, while losing some of the scores of those who should do better than their baseline suggests – the very scores that would otherwise balance everything out.
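A minimal, purely hypothetical sketch of that arithmetic (the pupils and grade points below are invented for illustration): when the able-but-lazy are allowed to coast to exactly their predicted grade, the shortfall from pupils who miss their predictions has nothing left to offset it, and the cohort’s value added turns negative.

```python
# Hypothetical cohort: baseline-predicted grade points vs. actual results.
# All names and numbers are invented purely to illustrate the balancing argument above.
predicted = {"A": 5, "B": 5, "C": 5, "D": 5}

# If every pupil is pushed, the able-but-lazy beat their predictions and
# offset those who fall short, so value added averages out around zero.
pushed = {"A": 7, "B": 6, "C": 4, "D": 3}

# If concern is only triggered by falling below the prediction, the 'LFs' coast
# to exactly their predicted grade; the shortfalls remain, with nothing to balance them.
coasting = {"A": 5, "B": 5, "C": 4, "D": 3}

def value_added(actual, predicted):
    """Mean difference between actual and predicted grade points across the cohort."""
    return sum(actual[p] - predicted[p] for p in predicted) / len(predicted)

print(value_added(pushed, predicted))    # 0.0   -> over- and under-performance cancel out
print(value_added(coasting, predicted))  # -0.75 -> capped ambition drags the cohort down
```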

I think what bothers me most is the ‘inhumanity’ of a purely data-driven approach to progress. How could school teachers, of all people, have devised a system that allows no room to acknowledge the obvious human truth before our eyes? When and where haven’t some humans been, at least sometimes, rather lazy? Down through the centuries school teachers have exercised their craft, ensuring pupils learn important things despite the entirely natural human propensity towards sloth, magnified in the teenage years. What made us think we could dispense with that wisdom, that our spreadsheets knew better?

Can we re-learn to teach the pupils that actually sit before us, responding to them using our hard-won expertise? Oh, I do hope so.

*Warning: this post nearly contains bad language.

The ‘quite tidy garden’ …or why level descriptors aren’t very helpful.

Dear Josh,

Thank you for agreeing to sort out our garden over your long holiday. As we’ll be away all summer, here is a guide that tells you all you need to know to get

from this… [photo of the garden as it is now]

…to this. [photo of the garden as we’d like it]

STEP A: You should begin by assessing the garden to decide its level. Read through these level descriptors to decide:

Level 1: Your garden is very overgrown. Any lawn has not been mown for some years. Shrubs have not been pruned for a considerable period. There are no visible beds and typically there will be large areas taken over by brambles and/or nettles. There will probably be an abandoned armchair (or similar worn-out furniture) somewhere in the overgrowth as well as assorted rubble and the old concrete base from a fallen shed. Boundary fencing will have collapsed.

Level 2: Your garden is just a little overgrown. The lawn is patchy through neglect and has only been mown sporadically. Shrubs generally have not been pruned recently. Beds look neglected and are not well stocked. There may be various forms of old rubbish abandoned in the far corners of the garden along with old lawn clippings and hedge trimmings. Boundary fences are in disrepair.

Level 3: Your garden is well tended. Lawns are mown regularly and contain no moss or weeds, and shrubs are regularly pruned. Flower beds are well demarcated and contain no weeds. They are well stocked with appropriate bedding plants. The garden is quite tidy and boundary fencing is new and strong.

STEP B:

Josh, if you decide the garden is Level 1 (that is certainly our view) then I suggest you look at the Level 2 descriptor to guide you as to your next steps. It is clear that you need to move the garden from ‘very overgrown’ to ‘just a little overgrown’. For example, in a Level 1 garden, shrubs ‘have not been pruned for a considerable period’. You need to move on from that to a Level 2 garden where ‘shrubs have not been pruned recently’. The lawn needs to move from having ‘not been mown for some years’ to Level 2 ‘has only been mown sporadically’. Aim to move the boundary fencing on from Level 1 ‘will have collapsed’ to Level 2 ‘in disrepair’.  To move on from Level 1 for rubbish, for example, you’ll need to move that old armchair to a far corner of the garden.

STEP C:

Now move the garden from Level 2 to Level 3. This means you should ensure the garden is ‘well tended’ rather than ‘a little overgrown’. What useful advice!

Using level descriptors makes it so clear for you doesn’t it? Hubby is trying to insist that I also leave you his instructions but they are hopeless as he doesn’t understand that you need to know your next steps to make progress in gardening. He’s written reams and reams of advice including instructions like:

‘You’ll find the strimmer in the garage’

‘Start by clearing all the nettles’

‘Ken will come and help you shift the concrete’

‘The tip is open from 10-4 at weekends’

‘Marion next door can advise you about the best bedding plants to buy’

His instructions are just too specific to our garden. To learn the gardening skills that will achieve a Level 3 garden what you need is to really know your next level targets. I won’t confuse you by leaving you his nonsense!

We’ll see you in September and in the meantime we wish you happy gardening!

 

With apologies to any actual gardeners out there who know what they are talking about, and enormous thanks to Daisy Christodoulou, whose recent book helped me appreciate just why we shouldn’t use level descriptors as feedback.

Knowledge organisers: fit for purpose?

Definition of a knowledge organiser: Summary of what a student needs to know that must be fitted onto an A4 sheet of paper.

Desk bins: Stuff I Don’t Need to Know…

If you google the term ‘knowledge organisers’ you’ll find a mass of examples. They are on sale on the TES resource site – some sheets of A4 print costing up to £7.50. It seems knowledge organisers have taken off. Teachers up and down the country are beavering away to summarise what needs to be known in their subject area.

It is good news that teachers are starting to think more about curriculum. More discussion of the ‘what’ that is being taught, how it should be sequenced and how it can be remembered is long overdue. However, I think there is a significant weakness with some of these documents. I looked at lots of knowledge organisers to prepare for training our curriculum leaders and probably the single biggest weakness I saw was a confusion over purpose.

 

I think there are three very valid purposes for knowledge organisers:

  A. Curriculum mapping – for the TEACHER

Identifying powerful knowledge, planning to build schemas, identifying transferable knowledge and mapping progression in knowledge.

  B. For reference – for the PUPIL

In place of a textbook, or as a form of summary notes for pupils to reference.

  C. A list of revision items – for the PUPIL (and possibly the parents)

What the teacher has decided ALL pupils need to know as a minimum at the end of the topic.

 

All three purposes can be valid but when I look at the mass of organisers online I suspect there has often been a lack of clarity about the purpose the knowledge organiser is to serve.

Classic confusions of purpose:

  1. Confusing a curriculum mapping document with a reference document:

A teacher sits down and teases out what knowledge seems crucial for a topic. As they engage in this thinking they create a dense document, full of references, that summarises their ideas. So far so good…but a document that summarises a teacher’s thinking is unlikely to be in the best format for a child to use. The child, given this document, sees what looks like a mass of information in tiny text, crammed onto one sheet of A4. They have no real notion of which bits to learn, how to prioritise all that detail or how to apply it. Which bits matter is self-evident to the teacher but not to the child.

  2. Confusing a knowledge organiser with a textbook:

Teachers who have written textbooks tell me that there is a painstaking editorial process to ensure quality. Despite this there is a cottage industry of teachers writing series of knowledge organisers which amount to their own textbooks. Sometimes this is unavoidable. Some textbooks are poor and some topics aren’t covered in the textbooks available. Perhaps sometimes the desperate and continual begging of teachers that their school should prioritise the purchase of textbooks falls on deaf ears and teachers have no choice but to spend every evening creating their own textbooks photocopied on A4 paper…

…but perhaps we all sometimes need to remind ourselves that there is no virtue in reinventing the wheel.

  3. Confusing a textbook with summary notes:

The information included on an A4 sheet of paper necessarily lacks the explanatory context contained in a textbook or detailed notes. If such summaries are used in place of a textbook or detailed notes the student will lack the explanation they need to make sense of the detail.

  4. Confusing a reference document or notes with a list of revision items for a test:

If we want all pupils to acquire mastery of some basics we can list these basic facts we have identified as threshold knowledge in a knowledge organiser. We can then check that the whole class know these facts using a test. The test requires the act of recall which also strengthens the memory of these details in our pupils’ minds.

Often, however, pupils are given reference documents to learn. In this situation the details will be too extensive to be learnt for one test. It is not possible to expect the whole class to know everything listed, so the teacher cannot ensure that all pupils have mastered the identified ‘threshold’ facts. Weaker students will be very poor at recognising which details are the most important to focus on learning, at judging what is likely to come up in a test and at anticipating the format in which it will be asked. Many will also find that a longer reference document contains an overwhelming amount of detail and give up. The chance to build self-efficacy, and thus self-esteem, has been lost.

 

If you are developing knowledge organisers to facilitate factual testing then your focus is on Purpose C – creating a list of revision items. Below is a list of criteria I think are worth considering:

  1. Purpose (to facilitate mastery testing of a list of revision items)
  • Exclude knowledge present for the benefit of the teacher.
  • Exclude explanatory detail, which should be in notes or a textbook.
  2. Amount
  • A short topic’s worth (e.g. two weeks’ teaching at GCSE).
  • An amount that all in the class can learn.
  • Be careful of expectations that are too low; if necessary, ramp up demand once the habit is in place.
  3. Threshold or most ‘powerful’ knowledge
  • Which knowledge is necessary for the topic?
  • Which knowledge is ‘collectively sufficient’ for the topic?
  • Which knowledge will allow future learning of subsequent topics?
  • Which knowledge will best prompt retrieval of chunks of explanatory detail?
  • CUT any extraneous detail (even if it looks pretty).
  • Include relevant definitions, brief lists of factors/reasons/arguments, quotes, diagrams, summaries etc.
  • Check accuracy (especially when adapting internet finds).
  4. Necessary prior knowledge
  • Does knowledge included in the organiser presume a grasp of other material unlikely yet to be mastered?
  5. Concise wording
  • Is the knowledge phrased in the way you wish it to be learned?

Happy knowledge organising!

 

One approach to regular, low-stakes and short factual tests.

I find the way in which the Quizlet app has taken off fascinating. Millions (or billions?) have been pumped into ed tech, but Quizlet did not take off because education technology companies marketed it to schools. Pupils and teachers had to ‘discover’ Quizlet. They appreciated its usefulness for that most basic purpose of education – learning. The growth of Quizlet was ‘bottom up’, while schools continue to have technological solutions looking for problems thrust upon them from above. What an indictment of the ed tech industry.

There has been a recent growth of interest in methods of ensuring students retain, long term, the content they have been taught. This is partly due to the influence of research in cognitive psychology, but also to some influential education bloggers such as Joe Kirby and to the changing educational climate caused by the shift away from modular examinations. Wouldn’t it be wonderful if innovation in technology focused on finding simple solutions to actual problems (like Quizlet) instead of chasing Sugata Mitra’s unicorn of revolutionising learning?

In the meantime we must look for useful ways to ensure students learn key information without the help of the ed tech industry. I was very impressed by the ideas Steve Mastin shared at the Historical Association conference yesterday but I realised I had never blogged about my own approach and its pros and cons compared with others I have come across.

I developed a system of regular testing for our history and politics department about four years ago. I didn’t know about the research from cognitive psychology back then and instead used what I had learnt from using Direct Instruction programmes with my primary-aged children.

Key features of this approach to regular factual testing at GCSE and A level:

  • Approximately once a fortnight a class is given a learning homework, probably at the end of a topic or sub topic.
  • All children are given a guidance sheet that lists exactly what areas will come up in the test and need to be learnt. Often textbook page references are provided so key material can be easily located.

[Image: example test]

  • The items chosen for the test reflect the test writer’s judgement of the key facts that could provide a minimum framework of knowledge for that topic (n.b. the students are familiar with the format and know how much material will be sufficient for an ‘explain’ question). The way knowledge has been presented in notes or the textbook can make it easier or more difficult for the students to find relevant material to learn. In the example above the textbook very conveniently summarises all they need to know.
  • The test normally takes about 10-15 minutes of a lesson. The test is always out of 20 and the pass mark is high, always 14/20. Any students who fail the test have to resit it in their own time. We give rewards for full marks in the test (these rules are sketched out just after this list). The test writer must try and ensure that the test represents a reasonable amount to ask all students to learn for homework or the system won’t work.
  • There is no time limit for the test. I just take them in when all are finished.
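As a purely illustrative sketch (the names and marks below are invented), the rules above reduce to a few lines of logic: marks out of 20, a fixed pass mark of 14, a resit in the student’s own time for anything below it, and a reward for full marks.

```python
# Illustrative only: hypothetical names and marks, applying the rules described above.
PASS_MARK = 14    # out of 20, fixed for every test
FULL_MARKS = 20

marks = {"Aisha": 20, "Ben": 17, "Callum": 13, "Dina": 9}

# Anyone below the pass mark resits in their own time; full marks earns a reward.
resits = [name for name, mark in marks.items() if mark < PASS_MARK]
rewards = [name for name, mark in marks.items() if mark == FULL_MARKS]

print("Resit needed:", resits)     # ['Callum', 'Dina']
print("Reward earned:", rewards)   # ['Aisha']
```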

I haven’t developed ‘knowledge organisers’, even though I can see their advantages, because I don’t want to limit test items to the amount of material that can be fitted onto one sheet of paper. Additionally, I’ve always felt a bit nervous about sending the message that there is something comprehensive about the material selected for testing. I’ve found my approach has some advantages and disadvantages.

Advantages of this approach to testing:

  • It is regular enough that tests never have to cover too much material and become daunting.
  • I can set a test that I can reasonably expect all students in the class to pass if they do their homework.
  • The regularity allows a familiar routine to develop. The students adjust to the routine quickly and they quite like it.
  • The guidance sheet works better than simply telling students which facts to learn. This is because they must go back to their notes or textbook and find the information, which provides a form of review and requires some active thought about the topic.
  • The guidance sheet works when it is clear enough to ensure all students can find the information but some thought is still necessary to locate the key points.
  • Test questions often ask students to use information in the way they will need to use it in extended writing. For example, I won’t just ask questions like “When did Hitler come to power?”. I will also ask questions like “Give two reasons why Hitler ordered the Night of the Long Knives”.
  • Always making the test out of 20 allows students to try and beat their last total. The predictability of the pass mark also leads to acceptance of it.
  • Initially we get lots of retakers but the numbers very quickly dwindle as the students realise the inevitability of the consequences of failing to do their homework.
  • The insistence on retaking any failed tests means all students really do end up having to learn a framework of key knowledge.
  • I’ve found that ensuring all students learn a minimum framework of knowledge before moving on has made it easier to teach each subsequent topic. There is a lovely sense of steadily accumulating knowledge and understanding. I also seem to be getting through the course material faster despite the time taken for testing.

Disadvantages of my approach to testing:

  • It can only work in a school with a culture of setting regular homework that is generally completed.
  • Teachers have to mark the tests because the responses are not simple factual answers. I think this is a price worth paying for a wider range of useful test items but I can see that this becomes more challenging depending on workload.
  • There is no neat and simple knowledge organiser listing key facts.
  • We’re fallible. Sometimes guidance isn’t as clear as intended, so you need to ensure test materials really are refined for next year and that problems that arise are not just forgotten.
  • If you’re not strict about your marking your class will gradually learn less and less for each point on the guidance sheet.
  • This system does not have a built-in mechanism for reviewing old test material in a systematic way.

We have simply not found that lower-ability students (within an ability range of A*-D) struggle. I know that other schools using similar testing with wider ability ranges have not encountered significant problems either. Sometimes students tell us that they find it hard to learn the material. A few do struggle to develop the self-discipline necessary to settle down to some learning, but we haven’t had a student who is incapable when they devote a reasonable amount of time. Given that those complaining are usually just making an excuse for failing to do their homework, I generally respond that if they can’t learn the material for one tiny test how on earth are they proposing to learn a whole GCSE? I check that anyone who fails a test is revising efficiently, but after a few retakes it usually transpires that they don’t, after all, have significant difficulties learning the material. Many students who are weak on paper like the tests.

We also set regular tests of chronology. At least once a week my class will put events printed onto cards into chronological order and every now and then I give them a test like the one below after a homework or two to learn the events. I don’t have to mark these myself – which is rather an advantage!

[Image: example chronology test]

 

I very much liked Steve Mastin’s approach of giving periodic multiple choice tests that review old material. Good multiple choice questions can be really useful but are very hard to write. Which brings me back to my first point. Come on, education technology industry! How about dropping the development of impractical, rather time-consuming and gimmicky apps? We need those with funding and expertise to work in conjunction with curriculum subject experts to develop genuinely useful and subject-specific forms of assessment. It must be possible to develop products that can really help us assess and track success in learning the key information children need to know in each subject.

Testing – a double-edged sword.

As teachers we’ve all had those moments when, eyes shining, tongue loosed by the excitement of the moment, we share a fascinating nugget of detail with our class. We’ve all also experienced the dull deflation of that enthusiasm when our students respond “But is this in the exam? Do we actually need to know this?” It seems our focus on testing has created a generation of students who view their studies purely as a means to an end and have lost the ability to enjoy learning for its own sake. Such responses from my classes normally trigger an agony of soul searching on my part. I question whether my desire to get my students good results means these sorts of responses are my own fault, a just retribution for my desire to show off my teaching prowess through a healthy end of year results spreadsheet. The same problem is seen when primary children ask what they need to do to get to the next level rather than enquiring further into a subject. We see it again in GCSE English courses in which the easiest books are chosen, and only read in extract form, to optimise exam results. I also despise (yes, it is that strong) the nonsensical hoop-jumping drill that consumes hours of teaching time and exists only to ensure student responses conform to exam rubrics so they can get the marks they deserve.

These drawbacks of testing are explained in a blog post by Daisy Christodoulou, who recently took part in a debate with Toby Young, Tristram Hunt and Tony Little on the subject of testing. Daisy explains that the proposition of the debate was that tests were ‘essentially a necessary evil…in many ways inimical to good education…Tony Little said that our focus should not be on exams, but on ensuring a love of learning.’ In her post Daisy argues coherently that testing is nonetheless very useful for the reliable feedback it provides and for the way the ‘testing effect’ aids memory. I agree, but would go further than arguing for teacher-set tests. I question the assumption that external exams such as GCSEs and A levels are just a necessary evil, inimical to good education. I’ll explain further.

A week ago my school had its year 13 parents’ evening. The talk was all of university applications and predicted grades. Students had been investigating universities and realisation had dawned that they were not going to get to the prestigious institutions they aspired to without those crucial A grades. Every year, students who had never quite been able to take their studies seriously wise up to reality; you can see a new purpose in their demeanour as they ‘set aside childish things’ and get down to some serious study. External exams are essential for good education because without them too many students would never summon up the motivation to learn enough, in enough detail, and would never reach the standard they are capable of. Witness what happens when teachers are told their subject will still be taught but no longer examined at GCSE or A level. You may have noticed the campaigns to stop A levels being scrapped in languages such as Polish. Teachers know perfectly well that what is examined generally IS what is taken seriously. Where exams aren’t used, other forms of competition tend to arise to serve the same purpose.

The assumption that motivation in education should be intrinsic goes pretty much unquestioned, but while most teachers would profess to believe this, their behaviour suggests otherwise. Why is it that every year children are under so much stress from SATs? The children have no reason to take these tests seriously. It is the teachers who explain the importance of the tests to the pupils – to ensure they take them seriously, pay attention and work hard. Researchers expect a significant diminution in performance on tests when the stakes are low and have to factor this into their analysis.

Eric Kalenze said in his talk at ResearchEd that extrinsic motivation is seriously underrated in education, and I agree with him. On the one hand, we must avoid bribing children when they would or could work happily with no reward; that is clearly counterproductive. We also want to skilfully withdraw extrinsic rewards as the children become capable of appreciating the content for its own sake. We want to stimulate our students’ curiosity and help them to appreciate what they are learning. However, human motivation is complex. Just how many children would ever learn to their full potential with only intrinsic motivators? I’ve certainly heard of some, but even then enthusiasms tend to be selective. I can’t help thinking that if avoidance of extrinsic motivators were an educational panacea, Steiner schools would have taken off in a way they never have.

Just how many students would be sitting in our secondary schools or our A level classes if school were not compulsory and they didn’t need proof of their learning for success later in life? Could it be that external exams, rather than being harmful to deeper learning, are actually the very REASON why children end up learning lots? If, at 16, it had made no difference to my future whether I understood maths GCSE, I might just have spent more time following my enthusiasm for 19th-century novels and neglected mathematics entirely. I have also known countless students fall in love with a subject as they study it, even though the initial impetus for that study was the desire for exam success. To really excel in a subject takes serious hard work and discipline. Often the rewards of study are only really appreciated after much toil. Even as an adult, can I really say that my motivation to learn things I find interesting is purely for its own sake? So often genuine curiosity is mixed with a wish for acknowledgement of our erudition or a desire to bolster our own self-esteem through feeling learned.

Exams are a double-edged sword. True, a focus on exam success over the subject matter for its own sake is undoubtedly harmful. We must work to limit that harm while acknowledging that exam certificates are often the very reason our students choose to study. The idea that most children would learn more without exams is untested idealism and ignores lived reality.

Every September I ask my new year 12 politics students why they are studying A levels. Every year they tell me it is so they can go to university and get a good career. At the end of every year I ask them if they are pleased they now understand so much more about politics – and they are. Job done!

Is reliability becoming the enemy of validity?

What would happen if I, as a history graduate, set out to write a mark scheme for a physics GCSE question? I dropped physics after year 9 but I think it is possible I could devise some instructions to markers that would ensure they all came up with the same marks for a given answer. In other words my mark scheme could deliver a RELIABLE outcome. However, what would my enormously experienced physics teacher husband think of my mark scheme? I think he’d either die of apoplexy or from laughing so hard:

“Heather, why on earth should they automatically get 2 marks just because they mentioned the sun? You’ve allowed full marks for students using the word gravity…”

After all, I haven’t a notion of how to discriminate effectively between different levels of understanding of physics concepts. My mark scheme might be reliable but it would not deliver a valid judgement of the students’ understanding of physics.
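To make the distinction concrete, here is a deliberately silly sketch of such a mark scheme, using the keyword rules quoted above (the answers and mark total are invented). Because the rule is fixed, any two markers applying it will agree perfectly, so it is reliable; it rewards keyword-spotting rather than understanding, so it is not valid.

```python
# A deliberately bad 'physics' mark scheme: perfectly consistent (reliable),
# yet it rewards keyword-spotting rather than understanding (not valid).
MAX_MARKS = 6  # hypothetical total for the question

def mark(answer: str) -> int:
    score = 0
    if "sun" in answer.lower():
        score += 2            # 2 marks just for mentioning the sun
    if "gravity" in answer.lower():
        score = MAX_MARKS     # full marks for using the word 'gravity'
    return min(score, MAX_MARKS)

# Any marker applying this rule gives the same script the same mark...
print(mark("Gravity keeps the planets in orbit around the sun."))     # 6
# ...but a physically confused answer scores just as highly.
print(mark("Gravity is the sun pulling things with its magnetism."))  # 6
```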

A few weeks ago at ResearchEd the fantastically informed Amanda Spielman gave a talk on research Ofqual has done into the reliability of exam marking. Their research, and that of Cambridge Assessment, suggests marking is more RELIABLE than teachers have assumed. This might surprise teachers familiar with scenes like this every summer:

It is late August. A level results are out and the endless email discussions begin:

Hi Heather, Jake has emailed me. He doesn’t understand how he got an E on Politics A2 Unit 4 when he revised just as much as for Unit 3 (in which he got a B). I advised him to get a remark. Best, Alan

Dear Mrs F, My remark has come back unchanged and it means I’ve missed my Uni offer. I just don’t understand how I could have got an E. I worked so hard and had been getting As in my essays by the end. Would you look at my exam paper if I order it and see what you think? Thanks, Jake

Hi Alan, I’ve looked at Jake’s paper. I thought he must have fluffed an answer but all five answers have been given D/E level marks. I just don’t get it. He’s written what we taught him. Maybe the answers aren’t quite B standard – but E? Last year this unit was our best paper and this year it is a car crash. I’ll ask Clarissa if we can order her paper as she got full uniform marks. It might give some insight. Heather

Alan, I’ve looked at Clarissa’s paper. See what you think. It is a great answer. She learns like a machine and has reproduced the past mark scheme. Jake has made lots of valid points but not necessarily the ones in the mark scheme. Arguably they are key, but then again you could just as easily argue other points are as important. And how can such decent answers end up at E grade even if they don’t hit all the mark scheme bullet points precisely? I just despair. How can we continue to deliver this course with conviction when we have no idea what will happen in the exam each year? Heather

I don’t like to blow my own trumpet, but the surprisingly low marks on our A2 students’ politics papers were an aberration in what was a fantastic results day for our department this year:

Hi Heather, OMG those AS history results are amazing!!!! Patrick an A!!!! I worried Susie would get a C and she scored 93/100, where did that come from? Trish

I don’t tend to quibble when the results day lottery goes our way, but I can admit that it is part of the same problem. Marking in subjects such as history and politics will always be less reliable than in maths, and we must remember that it is the overall A level score (not the swings between individual module results) that needs to be reliable. But… even so… there seems to be enormous volatility in our exam system. The following are seen in my department every year:

  1. Papers where the results have a very surprising (wrong) rank order. Weak students score high As while numerous students who have only ever written informed, insightful and intelligent prose have D grades.
  2. Students with massive swings in grades between papers (e.g. B on one and E on the other) despite both papers being taught by the same teacher and with the same general demands.
  3. Exam scripts where it is unclear to the teacher why a remark didn’t lead to a significant change in the result for a candidate.
  4. Quite noticeable differences in the degree of volatility in results over the years, depending on paper, subject (history or politics in my case) and even exam board.

Cambridge Assessment have been looking into this volatility and suggested that different markers ARE coming up with similar enough marks for the same scripts – marking is reliable enough. However, the report writers then assume that all other variation must be at school/student level. There is no doubt that a multitude of school- and student-level factors might explain volatility in results, such as different teachers covering a course, variations in teaching focus or simply a student having a bad day. But why was no thought given to whether a lack of validity explains volatility in exam results?

For example, I have noticed a trend in our own results at GCSE and A level. The papers with quite flexible mark schemes, relying more on marker expertise, deliver more consistent outcomes closer to our own expectations of the students. It looks like attempts to make our politics A level papers more reliable have simply narrowed the range of possible responses that gain reward, limiting the ability of the assessment to discriminate effectively between student responses. Organisations such as HMC know there is a problem but perhaps overemphasise the impact of inexperienced markers.

The mounting pressure on exam boards from schools has driven them to make their marking ever more reliable, but this actually increases unexpected grade variation and produces greater injustices as the assessment becomes worse at discriminating between candidates. The process is exacerbated by the loss of face-to-face standardisation meetings (and, in subjects such as politics, by markers unused to teaching the material), so markers are ever more dependent on, and tied to, the mark scheme in front of them to guide their decision making. If students regularly have three grades’ difference between modules, perhaps exam boards should stop blathering on about the reliability of their systems and start thinking about the validity of their assessments.

The drive for reliability can too often be directly at the expense of validity.

It is a dangerously faulty assumption that if marking is reliable then valid inferences can be drawn from the results. We know that for some time the education establishment has been rather blasé about the validity of its assessments.

  • Apparently our country’s school children have been marching fairly uniformly up through National Curriculum levels, even though we know learning is not actually linear or uniform. It seems that whatever levels were presumed to measure, they were not giving a valid snapshot of progress.
  • I’ve written about how history GCSE mark schemes assume a spurious progression in (non-existent) generic analytical skills.
  • Too often levels of response mark schemes are devised by individuals with little consideration of validity.
  • Dylan Wiliam points out that reliable assessment of problem solving often requires extensive rubrics which must define a ‘correct’ method of solving the problem.
  • EYFS assesses progress in characteristics such as resilience, when we don’t even know whether it can be taught, and in critical thinking and creativity, when these are not constructs that can be generically assessed.

My experience at A level is just one indication of this bigger problem of inattention to validity of assessments in our education system.