Part 2: Early years assessment is not reliable or valid and thus not helpful

This is the second post on early years assessment. The first is here.

Imagine the government decided they wanted children to be taught to be more loving. Perhaps the powers that be could decide to make teaching how to love statutory and tell teachers they should measure each child’s growing capacity to love.

Typical scene in the EYFS classroom – a teacher recording observational assessment. 

There would be serious problems with trying to teach and assess this behaviour:

Definition: What is love? Does the word actually mean the same thing in different contexts? When I talk about ‘loving history’, am I describing the same thing (or ‘construct’) as when I ‘love my child’?

Transfer: Is ‘love’ something that universalises between contexts? For example, if you get better at loving your sibling, will that transfer to a love of friends, of school, or of learning geography?

Teaching: Do we know how to teach people to love in schools? Are we even certain it’s possible to teach it?

Progress: How does one get better at loving? Is progress linear? Might it just develop naturally?

Assessment: If ‘loving skills’ actually exist can they be effectively measured?

Loving – a universalising trait that can be taught?

The assumption that we can teach children to ‘love’ in one context and they’ll exercise ‘love’ in another might seem outlandish but, as I will explain, the writers of early years assessment fell into just such an error in the Early Years Foundation Stage framework and assessment profile.

In my last post I explained how the priority on assessment in authentic environments has been at the cost of reliability and has meant valid conclusions cannot be drawn from Early Years Foundation Stage Profile assessment data. There are, however, other problems with assessment in the early years…

Problems of ‘validity’ and ‘construct validity’

Construct validity: the degree to which a test measures what it claims, or purports, to be measuring.

Validity: the degree to which inferences can be drawn from an assessment about what students can do in other situations, at other times and in other contexts.

If we think we are measuring ‘love’ but it doesn’t really exist as a single skill that can be developed then our assessment is not valid. The inferences we draw from that assessment about student behaviour would also be invalid.

Let’s relate this to the EYFS assessment profile.

Problems with the EYFS Profile ‘characteristics of effective learning’

The EYFS Profile Guide requires practitioners to comment on a child’s skills and abilities in relation to three ‘constructs’ labelled as ‘characteristics of effective learning’: playing and exploring, active learning, and creating and thinking critically.

We can take one of these characteristics of effective learning to illustrate a serious problem with the validity of the assessment. While a child might well demonstrate creativity and critical thinking (the third characteristic listed), it is now well established that such behaviours are NOT skills or abilities that can be learnt in one context and transferred to another entirely different context – they don’t universalise any more than ‘loving’ does. In fact the capacity to be creative or think critically is dependent on specific knowledge of the issue in question. Many children can think very critically about football, but that apparent behaviour evaporates when they are faced with some maths. You think critically in maths because you know a lot about solving similar maths problems, and this capacity won’t make you think any more critically when solving something different, like a word puzzle or a detective mystery.

Creating and thinking critically are NOT skills or abilities that can be learnt in one context and then applied to another

Creating and thinking critically are not ‘constructs’ which can be taught and assessed in isolation, so no valid general inference can be drawn from observing and reporting these behaviours as a ‘characteristic of learning’. If you wish a child to display critical thinking you should teach them lots of relevant knowledge about the specific material you would like them to think critically about.

In fact, what is known about traits such as critical thinking suggests that they are ‘biologically primary’ and don’t even need to be learned [see an accessible explanation here].

Moving on to another characteristic of effective learning: active learning, or motivation. This presupposes both that ‘motivation’ is a universalising trait and that we are confident we know how to inculcate it. In fact, as with critical thinking, it is perfectly possible to be involved and willing to concentrate in some activities (computer games) but not others (writing).

There has been high-profile research on motivation, particularly Dweck’s work on growth mindset and Angela Duckworth’s on grit. Duckworth has created a test that, she argues, demonstrates that adult subjects possess a universalising trait she calls ‘Grit’. But even this world expert concedes that we do not know how to teach Grit, and she rejects the use of her Grit scale in high-stakes testing. Regarding Growth Mindset, serious doubts have been raised about failures to replicate Dweck’s research findings and about studies with statistically insignificant results being used to support it.

Despite serious questions around the teaching of motivation, the EYFS Profile ‘characteristics of effective learning’ presume that this is a trait that can be inculcated in pre-schoolers and, without solid research evidence, that it can be reliably assessed.

The remaining characteristic of effective learning is playing and exploring. Of course children learn when playing. This does not mean, however, that the behaviours to be assessed under this heading (‘finding out and exploring’, ‘using what they know in play’ or ‘being willing to have a go’) are any more universalising as traits, or any less dependent on context, than the other characteristics discussed. It cannot just be presumed that they are.

Problems with the ‘Early Learning Goals’

At the end of reception each child’s level of development is assessed against the 17 EYFS Profile ‘Early Learning Goals’. In my previous post I discussed the problems with the reliability of this assessment. We also see the problem of construct validity in many of the assumptions within the Early Learning Goals. Some goals are clearly not constructs in their own right, and others may well not be; serious questions need to be asked about whether they are universalising traits or actually context-dependent behaviours.

For example, ELG 2 is ‘understanding’. Understanding is not a generic skill; it is dependent on domain-specific knowledge. True, a child does need to know the meaning of the words ‘how’ and ‘why’, which are highlighted in the assessment, but while understanding is a goal of education it can’t be assessed generically: you have to understand something, and understanding one thing does not mean you will understand something else. The same is true for ‘being imaginative’ (ELG 17).

An example of evidence of ELG 2, understanding, in the EYFS profile exemplification materials.

Are ELG 1 ‘listening and attention’ or ELG 16 ‘exploring and using media and materials’ actually universalising constructs? I rarely see qualitative and observational early years research that even questions whether these early learning goals are universalising traits, let alone looks seriously at whether they can be assessed. This is despite decades of research in cognitive psychology leading to a settled consensus which challenges many of the unquestioned constructs that underpin EYFS assessment.

It is well known that traits such as understanding, creativity and critical thinking don’t universalise. Why, in early years education, are these bogus forms of assessment not only used uncritically but allowed to dominate the precious time when vulnerable children could be benefiting from valuable teacher attention?

n.b. I have deliberately limited my discussion to a critique using general principles of assessment rather than arguments that would need to be based on experience or practice.

Early years assessment is not reliable or valid and thus not helpful

The academic year my daughter was three she attended two different nursery settings. She took away two quite different EYFS assessments, one from each setting, at the end of the year. The disagreement between these was not a one-off mistake or due to incompetence but inevitable, because EYFS assessment does not meet the basic requirements of effective assessment – that it should be reliable and valid*.

We have very well-researched principles to guide educational assessment, and these principles can and should be applied to the ‘Early Years Foundation Stage Profile’. This is the statutory assessment used nationally to assess the learning of children up to the age of 5. The purpose of the EYFS assessment profile is summative:

‘To provide an accurate national data set relating to levels of child development at the end of EYFS’

It is also used to ‘accurately inform parents about their child’s development’. The EYFS profile is not fit for these purposes and its weaknesses are exposed when it is judged using standard principles of assessment design.

EYFS profiles are created by teachers when children are 5 to report on their progress against 17 early learning goals and describe the ‘characteristics of their learning’. The assessment is through teacher observation. The profile guidance stresses that,

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration is taken from EYFS assessment exemplification materials for reading

Thus the EYFS Profile exemplification materials for literacy and maths only give examples of assessment through teacher observation while children are engaged in activities they have chosen themselves (child initiated activities). This is a very different approach from the subsequent assessment of children throughout their later schooling, which is based on tests created by adults. The EYFS profile writers no doubt wanted to avoid what Wiliam and Black (1996) call the ‘distortions and undesirable consequences’ created by formal testing.

Reaching valid conclusions in formal testing requires:

  1.    Standard conditions – so there is reassurance that all children receive the same level of help
  2.    A range of difficulty in the items used for testing – carefully chosen test items will discriminate between the proficiency of different children
  3.    Careful selection of content from the domain to be covered – to ensure the items are representative enough to allow for an inference about the domain (Koretz, pp. 23-28)

The EYFS profile is specifically designed to avoid the distortions created by such restrictions, which lead to an artificial test environment very different from the real-life situations in which learning will ultimately need to be used. However, as I explain below, in so doing the profile loses so much reliability that teacher observations cannot support valid inferences.

This is because, when assessing summatively, the priority is to create a shared meaning about how pupils will perform beyond school and in comparison with their peers nationally (Koretz, 2008). As Wiliam and Black (1996) explain, ‘the considerable distortions and undesirable consequences [of formal testing] are often justified by the need to create consistency of interpretation.’ This is why GCSE exams are not currently sat in authentic contexts, with clipboard-wielding teachers (as in the EYFS) observing children in attempted simulations of real-life situations. Teacher observation can be very useful for an individual teacher when assessing formatively (deciding what a child needs to learn next), but the challenges of obtaining a reliable shared meaning nationally – the challenges that stop observational forms of assessment being used for GCSEs – do not just disappear because the children involved are very young.

Problems of reliability

Reliability: Little inconsistency between one measurement and the next (Koretz, 2008)
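
To make that definition concrete, here is a minimal sketch – written in Python, with entirely invented judgements, and not taken from any EYFS materials – of one standard way to quantify how consistently two assessors judge the same child: raw agreement and Cohen’s kappa across the 17 early learning goals, using the profile’s ‘emerging/expected/exceeding’ style of judgement.

```python
from collections import Counter

GOALS = 17

# Invented judgements from two settings for the same child against the 17 goals
setting_a = ["expected"] * 10 + ["emerging"] * 4 + ["exceeding"] * 3
setting_b = ["expected"] * 5 + ["emerging"] * 9 + ["exceeding"] * 3

assert len(setting_a) == len(setting_b) == GOALS

# Raw agreement: the proportion of goals given the same judgement by both settings
observed = sum(a == b for a, b in zip(setting_a, setting_b)) / GOALS

# Agreement we would expect by chance, given each setting's own mix of judgements
count_a, count_b = Counter(setting_a), Counter(setting_b)
categories = set(setting_a) | set(setting_b)
chance = sum((count_a[c] / GOALS) * (count_b[c] / GOALS) for c in categories)

# Cohen's kappa: agreement corrected for chance (1 = perfect, 0 = no better than chance)
kappa = (observed - chance) / (1 - chance)
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```

The invented numbers mean nothing in themselves; the point is simply that consistency between one measurement and the next is something that can be checked, and nothing in the profile’s design gives much reason to expect it to be high.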

Assessing child initiated activities and the problem of reliability

The variation in my daughter’s two assessments was unsurprising given that…

  • Valid summative conclusions require ‘standardised conditions of assessment’ between settings, and this is not possible when observing child initiated play.
  • Nor is it possible even to create comparative tasks, ranging in difficulty, that all the children in one setting will attempt.
  • The teacher cannot be sure their observations effectively identify progress in each separate area, as they have to make do with whatever children choose to do.
  • These limitations make it hard to standardise between children even within one setting, and make it unsurprising that the two nurseries had built different profiles of my daughter.

The EYFS Profile Guide does instruct that practitioners ‘make sure the child has the opportunity to demonstrate what they know, understand and can do’ and does not preclude all adult initiated activities from assessment. However, the exemplification materials only reference child initiated activity and, of course, the guide instructs practitioners that

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration from EYFS assessment exemplification materials for writing. Note these do not have examples of assessment from written tasks a teacher has asked children to undertake – ONLY writing voluntarily undertaken by the child during play.

Assessing adult initiated activities and the problem of reliability

Even when some children are engaged in an activity initiated or prompted by an adult:

  • The setting cannot ensure the conditions of the activity have been standardised: for example, it isn’t possible to predict how a child will choose to approach a number game set up for them to play.
  • It’s not practically possible to ensure the same task has been given to all children in the same conditions to discriminate meaningfully between them.

Assessment using ‘a range of perspectives’ and the problem of reliability

The EYFS profile handbook suggests that:

‘Accurate assessment will depend on contributions from a range of perspectives…Practitioners should involve children fully in their own assessment by encouraging them to communicate about and review their own learning…. Assessments which don’t include the parents’ contribution give an incomplete picture of the child’s learning and development.’

A parent’s contribution taken from EYFS assessment exemplification materials for number

Given the difficulty one teacher will have observing all aspects of 30 children’s development, it is unsurprising that the profile guide stresses the importance of contributions from others to increase the validity of inferences. However, it is incorrect to claim that the input of the child or of parents will make the assessment more accurate for summative purposes. In this feedback the conditions, the difficulty and the specifics of the content will not have been controlled, creating unavoidable inconsistency.

Using child-led activities to assess literacy and numeracy and the problem of reliability

The reading assessment for one of my daughters seemed oddly low. The reception teacher explained that, while she knew my daughter could read at a higher level, the local authority guidance on the EYFS profile said her judgement must be based on ‘naturalistic’ behaviour. She had to observe my daughter (one of 30) voluntarily going to the book corner, choosing to read out loud to herself at the requisite level and volunteering sensible comments on her reading.

Illustration is taken from EYFS assessment exemplification materials for reading. Note these do not have examples of assessment from reading a teacher has asked children to undertake – ONLY reading voluntarily undertaken by the child during play.

The determination to privilege assessment of naturalistic behaviour is understandable when assessing how well a child can interact with their peers. However, the reliability sacrificed in the process can’t be justified when assessing literacy or maths. The success of explicit testing in these areas suggests they do not need the same naturalistic criteria to ensure a valid inference can be made from the assessment.

Are teachers meant to interpret the profile guidance in this way? The profile is unclear, but while the exemplification materials only include examples of naturalistic observational assessment we are unlikely to acquire accurate assessments of reading, writing and mathematical ability from EYFS profiles.

Five-year-olds should not sit test papers in formal exam conditions, but this does not mean that only observation in naturalistic settings (whether adult or child initiated) is reasonable or the most reliable option. The inherent unreliability of observational assessment means results can’t support the inferences required for such summative assessment to be a meaningful exercise. It cannot, as intended, ‘provide an accurate national data set relating to levels of child development at the end of EYFS’ or ‘accurately inform parents about their child’s development’.

In my next post I explore the problems with the validity of our national early years assessment.

 

*n.b. I have deliberately limited my discussion to a critique using assessment theory rather than arguments that would need to be based on experience or practice.

References

Koretz, D. (2008). Measuring Up. Cambridge, Massachusetts: Harvard University Press.

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile. Retrieved from https://www.gov.uk/government/publications/early-years-foundation-stage-profile-handbook

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile: exemplification materials. Retrieved from https://www.gov.uk/government/publications/eyfs-profile-exemplication-materials

Wiliam, D., & Black, P. (1996). Meanings and consequences: a basis for distinguishing formative and summative functions of assessment? British Educational Research Journal, 22(5), 537-548.

Data Tracking and the LFs*

Until recently I was unfamiliar with the sorts of pupil tracking systems used in most schools. I have also had to get to grips with the plethora of acronyms commonly used to categorise groups of students being tracked. I’ve come across PP, LPAs, HPAs and LACs but, rather surprisingly, no mention of the LF. To be honest I am surprised by this gap, given that in my considerable experience it is how the teacher and school manage the performance of the LFs that is most crucial to healthy end-of-year data. If the LFs perform near their potential you’re basically laughing all the way to the exam hall.

I should, at this stage, be clear. LF is not a standard acronym (it was invented by my husband) but it does describe a clearly recognisable and significant sub-section of any secondary school population. The L stands for lazy (and the second word begins with an F).

I am being very flippant, I know, but my point is serious enough.

Today I happened to need to look at a spreadsheet containing data for an old cohort from my last school. As my eye glanced down the baseline testing stats, used for tracking, I couldn’t help emitting frequent snorts of derision. The trigger of my scorn was the original baseline test data for some of my most ‘affectionately’ remembered GCSE students (truthfully, actually, I do remember them all with warmth). I commented to my husband that they needed to be real… erm… ‘LFs’ to score that low on the baseline given the brains with which I knew perfectly well that they were blessed.

If I and my colleagues had based our ambitions for those particular individuals on their predicted grades from the baseline they’d have cruised lazily through school. Their meagre efforts would have been continually affirmed as adequate, which would have been ruinous for their habits and character and a betrayal of their potential.

If value added is what drives you, there is also an obvious truth here: if you effectively cap your ambitions for pupils by only showing concern when they fall short of the grades predicted from their baseline, you will still have to absorb the scores of those pupils who just aren’t going to live up to their predictions, while losing some of the above-prediction scores – from pupils who should do better than their baseline result suggests – that would otherwise balance everything out.
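
A toy simulation makes the arithmetic of that point visible. Everything below is hypothetical – invented grade-point figures and a deliberately crude model of teaching, not data from any school – but it shows the asymmetry: cap ambition at the predicted grade and the above-prediction scores vanish while the below-prediction scores remain, so average value added sinks below zero.

```python
import random

random.seed(1)
N = 1000

# Invented baseline predictions (in grade points) and the amount each pupil
# would actually land above or below that prediction if taught ambitiously
predicted = [random.gauss(5.0, 1.0) for _ in range(N)]
residual = [random.gauss(0.0, 0.7) for _ in range(N)]

# Teach to each pupil's potential: over- and under-performance balance out
ambitious = [p + r for p, r in zip(predicted, residual)]

# Cap ambition at the predicted grade: would-be over-achievers merely hit their
# prediction, while under-achievers still fall short of theirs
capped = [p + min(r, 0.0) for p, r in zip(predicted, residual)]

def mean_value_added(outcomes):
    return sum(o - p for o, p in zip(outcomes, predicted)) / N

print(f"average value added, ambitious teaching: {mean_value_added(ambitious):+.2f}")
print(f"average value added, capped ambition:    {mean_value_added(capped):+.2f}")
```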

I think what bothers me most is the ‘inhumanity’ of a purely data-driven approach to progress. How could school teachers, of all people, have devised a system that allows no room to acknowledge the obvious human truth before our eyes? When and where haven’t some humans, at least sometimes, been rather lazy? Down through the centuries school teachers have exercised their craft, ensuring pupils learn important things despite the entirely natural human propensity towards sloth, magnified in the teenage years. What made us think we could dispense with that wisdom, that our spreadsheets knew better?

Can we re-learn to teach the pupils that actually sit before us, responding to them using our hard-won expertise? Oh, I do hope so.

*Warning: this post nearly contains bad language.

The ‘quite tidy garden’ …or why level descriptors aren’t very helpful.

Dear Josh,

Thank you for agreeing to sort out our garden over your long holiday. As we’ll be away all summer here is a guide that tells you all you need to know to get

from this…

…to this

STEP A: You should begin by assessing the garden to decide its level. Read through these level descriptors to decide:

Level 1: Your garden is very overgrown. Any lawn has not been mown for some years. Shrubs have not been pruned for a considerable period. There are no visible beds and typically there will be large areas taken over by brambles and/or nettles. There will probably be an abandoned armchair (or similar worn-out furniture) somewhere in the overgrowth, as well as assorted rubble and the old concrete base from a fallen shed. Boundary fencing will have collapsed.

Level 2: Your garden is just a little overgrown. The lawn is patchy through neglect and has only been mown sporadically. Shrubs generally have not been pruned recently. Beds look neglected and are not well stocked. There may be various forms of old rubbish abandoned in the far corners of the garden, along with old lawn clippings and hedge trimmings. Boundary fences are in disrepair.

Level 3: Your garden is well tended. Lawns are mown regularly and contain no moss or weeds, and shrubs are regularly pruned. Flower beds are well demarcated and contain no weeds. They are well stocked with appropriate bedding plants. The garden is quite tidy and boundary fencing is new and strong.

STEP B:

Josh, if you decide the garden is Level 1 (that is certainly our view) then I suggest you look at the Level 2 descriptor to guide you as to your next steps. It is clear that you need to move the garden from ‘very overgrown’ to ‘just a little overgrown’. For example, in a Level 1 garden, shrubs ‘have not been pruned for a considerable period’. You need to move on from that to a Level 2 garden where ‘shrubs have not been pruned recently’. The lawn needs to move from having ‘not been mown for some years’ to Level 2 ‘has only been mown sporadically’. Aim to move the boundary fencing on from Level 1 ‘will have collapsed’ to Level 2 ‘in disrepair’.  To move on from Level 1 for rubbish, for example, you’ll need to move that old armchair to a far corner of the garden.

STEP C:

Now move the garden from Level 2 to Level 3. This means you should ensure the garden is ‘well tended’ rather than ‘a little overgrown’. What useful advice!

Using level descriptors makes it so clear for you, doesn’t it? Hubby is trying to insist that I also leave you his instructions, but they are hopeless as he doesn’t understand that you need to know your next steps to make progress in gardening. He’s written reams and reams of advice, including instructions like:

‘You’ll find the strimmer in the garage’

‘Start by clearing all the nettles’

‘Ken will come and help you shift the concrete’

‘The tip is open from 10-4 at weekends’

‘Marion next door can advise you about the best bedding plants to buy’

His instructions are just too specific to our garden. To learn the gardening skills that will achieve a Level 3 garden what you need is to really know your next level targets. I won’t confuse you by leaving you his nonsense!

We’ll see you in September and in the meantime we wish you happy gardening!

 

With apologies to any actual gardeners out there who know what they are talking about and enormous thanks to Daisy Christodoulou whose recent book helped me appreciate just why we shouldn’t use level descriptors as feedback. 

Knowledge organisers: fit for purpose?

Definition of a knowledge organiser: Summary of what a student needs to know that must be fitted onto an A4 sheet of paper.

Desk bins: Stuff I Don’t Need to Know…

If you google the term ‘knowledge organisers’ you’ll find a mass of examples. They are on sale on the TES resource site – some sheets of A4 print costing up to £7.50. It seems knowledge organisers have taken off. Teachers up and down the country are beavering away to summarise what needs to be known in their subject area.

It is good news that teachers are starting to think more about curriculum. More discussion of what is being taught, how it should be sequenced and how it can be remembered is long overdue. However, I think there is a significant weakness with some of these documents. I looked at lots of knowledge organisers to prepare for training our curriculum leaders, and probably the single biggest weakness I saw was confusion over purpose.

 

I think there are three very valid purposes for knowledge organisers:

  A. Curriculum mapping – for the TEACHER

Identifying powerful knowledge, planning to build schemas, identifying transferable knowledge and mapping progression in knowledge.

  B. For reference – for the PUPIL

In place of a textbook or a form of summary notes for pupils to reference.

  C. A list of revision items – for the PUPIL (and possibly the parents)

What the teacher has decided ALL pupils need to know as a minimum at the end of the topic.

 

All three purposes can be valid but when I look at the mass of organisers online I suspect there has often been a lack of clarity about the purpose the knowledge organiser is to serve.

Classic confusions of purpose:

  1. Confusing a curriculum mapping document with a reference document:

A teacher sits down and teases out what knowledge seems crucial for a topic. As they engage in this valuable thinking they create a dense document, full of references, that summarises their ideas. So far so good… but a document that summarises a teacher’s thinking is unlikely to be in the best format for a child to use. The child, given this document, sees what looks like a mass of information in tiny text, crammed onto one sheet of A4. They have no real notion of which bits to learn, how to prioritise all that detail or how to apply it. This is self-evident to the teacher but not to the child.

  2. Confusing a knowledge organiser with a textbook:

Teachers who have written textbooks tell me that there is a painstaking editorial process to ensure quality. Despite this there is a cottage industry of teachers writing series of knowledge organisers which amount to their own textbooks. Sometimes this is unavoidable. Some textbooks are poor and some topics aren’t covered in the textbooks available. Perhaps sometimes the desperate and continual begging of teachers that their school should prioritise the purchase of textbooks falls on deaf ears and teachers have no choice but to spend every evening creating their own textbooks photocopied on A4 paper…

…but perhaps we all sometimes need to remind ourselves that there is no virtue in reinventing the wheel.

  3. Confusing a textbook with summary notes:

The information included on an A4 sheet of paper necessarily lacks the explanatory context contained in a textbook or detailed notes. If such summaries are used in place of a textbook or detailed notes the student will lack the explanation they need to make sense of the detail.

  4. Confusing a reference document or notes with a list of revision items for a test:

If we want all pupils to acquire mastery of some basics we can list these basic facts we have identified as threshold knowledge in a knowledge organiser. We can then check that the whole class know these facts using a test. The test requires the act of recall which also strengthens the memory of these details in our pupils’ minds.

Often, however, pupils are given reference documents to learn. In this situation the details will be too extensive to be learnt for one test. It is not possible to expect the whole class to know everything listed, and so the teacher cannot ensure that all pupils have mastered the identified ‘threshold’ facts. Weaker students will be very poor at recognising which details are the most important ones to focus on learning, and poor at realising what is likely to come up in a test and the format in which it will be asked. Many will also find that a longer reference document contains an overwhelming amount of detail and give up. The chance to build self-efficacy, and thus self-esteem, has been lost.

 

If you are developing knowledge organisers to facilitate factual testing then your focus is on Purpose C – creating a list of revision items. Below is a list of criteria I think are worth considering:

  1. Purpose (to facilitate mastery testing of a list of revision items)
  • Exclude knowledge present for the benefit of the teacher.
  • Exclude explanatory detail, which should be in notes or a textbook.
  2. Amount
  • A short topic’s worth (e.g. two weeks’ teaching at GCSE).
  • An amount that all in the class can learn.
  • Be careful of expectations that are too low and, if necessary, ramp up demand once the habit is in place.
  3. Threshold or most ‘powerful’ knowledge
  • Which knowledge is necessary for the topic?
  • Which knowledge is ‘collectively sufficient’ for the topic?
  • Which knowledge will allow future learning of subsequent topics?
  • Which knowledge will best prompt retrieval of chunks of explanatory detail?
  • CUT any extraneous detail (even if it looks pretty).
  • Include relevant definitions, brief lists of factors/reasons/arguments, quotes, diagrams, summaries etc.
  • Check accuracy (especially when adapting internet finds).
  4. Necessary prior knowledge
  • Does knowledge included in the organiser presume a grasp of other material unlikely yet to be mastered?
  5. Concise wording
  • Is the knowledge phrased in the way you wish it to be learned?

Happy knowledge organising!

 

One approach to regular, low stakes and short factual tests.

I find the way in which the Quizlet app has taken off fascinating. Millions (or billions?) have been pumped into ed tech, but Quizlet did not take off because education technology companies marketed it to schools. Pupils and teachers had to ‘discover’ Quizlet. They appreciated its usefulness for that most basic purpose of education – learning. The growth of Quizlet was ‘bottom up’, while schools continue to have technological solutions looking for problems thrust upon them from above. What an indictment of the ed tech industry.

There has been a recent growth of interest in methods of ensuring that students learn, long term, the content they have been taught. This is partly due to the influence of research in cognitive psychology, but also to some influential education bloggers such as Joe Kirby and to the changing educational climate caused by a shift away from modular examinations. Wouldn’t it be wonderful if innovation in technology focused on finding simple solutions to actual problems (like Quizlet) instead of chasing Sugata Mitra’s unicorn of revolutionising learning?

In the meantime we must look for useful ways to ensure students learn key information without the help of the ed tech industry. I was very impressed by the ideas Steve Mastin shared at the Historical Association conference yesterday but I realised I had never blogged about my own approach and its pros and cons compared with others I have come across.

I developed a system of regular testing for our history and politics department about four years ago. I didn’t know about the research from cognitive psychology back then and instead used what I had learnt from using Direct Instruction programmes with my primary aged children.

Key features of this approach to regular factual testing at GCSE and A level:

  • Approximately once a fortnight a class is given a learning homework, probably at the end of a topic or sub topic.
  • All children are given a guidance sheet that lists exactly what areas will come up in the test and need to be learnt. Often textbook page references are provided so key material can be easily located.

[Image: example guidance sheet and test]

  • The items chosen for the test reflect the test writer’s judgement of the key facts that could provide a minimum framework of knowledge for that topic (n.b. the students are familiar with the format and know how much material will be sufficient for an ‘explain’ question). The way knowledge has been presented in notes or the textbook can make it easier or more difficult for students to find the relevant material to learn. In the example above the textbook very conveniently summarises all they need to know.
  • The test normally takes about 10-15 minutes of a lesson. The test is always out of 20 and the pass mark is high: always 14/20. Any students who fail the test have to resit it in their own time. We give rewards for full marks. The test writer must try to ensure that the test represents a reasonable amount to ask all students to learn for homework, or the system won’t work.
  • There is no time limit for the test. I just take them in when all are finished.

I haven’t developed ‘knowledge organisers’, even though I can see their advantages, because I don’t want to limit test items to the amount of material that can be fitted onto one sheet of paper. Additionally, I’ve always felt a bit nervous about sending the message that there is something comprehensive about the material selected for testing. I’ve found my approach has some advantages and disadvantages.

Advantages of this approach to testing:

  • It is regular enough that tests never have to cover too much material and become daunting.
  • I can set a test that I can reasonably expect all students in the class to pass if they do their homework.
  • The regularity allows a familiar routine to develop. The students adjust to the routine quickly and they quite like it.
  • The guidance sheet works better than simply telling students which facts to learn, because they must go back to their notes or textbook and find the information, which provides a form of review and requires some active thought about the topic.
  • The guidance sheet works when it is clear enough to ensure all students can find the information but some thought is still necessary to locate the key points.
  • Test questions often ask students to use information in the way they will need to use it in extended writing. For example I won’t just ask questions like "When did Hitler come to power?" I will also ask questions like "Give two reasons why Hitler ordered the Night of the Long Knives".
  • Always making the test out of 20 allows students to try and beat their last total. The predictability of the pass mark also leads to acceptance of it.
  • Initially we get lots of retakers, but the numbers very quickly dwindle as the students realise the inevitability of the consequences of failing to do their homework.
  • The insistence on retaking any failed tests means all students really do end up having to learn a framework of key knowledge.
  • I’ve found that ensuring all students learn a minimum framework of knowledge before moving on has made it easier to teach each subsequent topic. There is a lovely sense of steadily accumulating knowledge and understanding. I also seem to be getting through the course material faster despite the time taken for testing.

Disadvantages of my approach to testing:

  • It can only work in a school with a culture of setting regular homework that is generally completed.
  • Teachers have to mark the tests because the responses are not simple factual answers. I think this is a price worth paying for a wider range of useful test items but I can see that this becomes more challenging depending on workload.
  • There is no neat and simple knowledge organiser listing key facts.
  • We’re fallible. Sometimes guidance isn’t as clear as intended and you need to ensure test materials really are refined for next year and problems that arise are not just forgotten.
  • If you’re not strict about your marking your class will gradually learn less and less for each point on the guidance sheet.
  • This system does not have a built in mechanism for reviewing old test material in a systematic way.

We have not really found that lower-ability students (within an ability range of A*-D) struggle. I know that other schools using similar testing with wider ability ranges have not encountered significant problems either. Sometimes students tell us that they find it hard to learn the material. A few do struggle to develop the self-discipline necessary to settle down to some learning, but we haven’t had a student who is incapable when they devote a reasonable amount of time. Given that those complaining are usually just making an excuse for failing to do their homework, I generally respond that if they can’t learn the material for one tiny test how on earth are they proposing to learn a whole GCSE? I check that anyone who fails a test is revising efficiently, but after a few retakes it usually transpires that they don’t, after all, have significant difficulties learning the material. Many students who are weak on paper like the tests.

We also set regular tests of chronology. At least once a week my class will put events printed onto cards into chronological order and every now and then I give them a test like the one below after a homework or two to learn the events. I don’t have to mark these myself – which is rather an advantage!

[Image: example chronology test]

 

I very much liked Steve Mastin’s approach of giving periodic multiple-choice tests which review old material. Good multiple-choice questions can be really useful but are very hard to write. Which brings me back to my first point. Come on, education technology industry! How about dropping the development of impractical, rather time-consuming and gimmicky apps? We need those with funding and expertise to work in conjunction with curriculum subject experts to develop genuinely useful and subject-specific forms of assessment. It must be possible to develop products that can really help us assess and track success in learning the key information children need to know in each subject.