Part 2: Early years assessment is not reliable or valid and thus not helpful

This is the second post on early years assessment. The first is here.

Imagine the government decided they wanted children to be taught to be more loving. Perhaps the powers that be could decide to make teaching how to love statutory and tell teachers they should measure each child’s growing capacity to love.

Typical scene in the EYFS classroom – a teacher recording observational assessment. 

There would be serious problems with trying to teach and assess this behaviour:

Definition: What is love? Does the word actually mean the same thing in different contexts? When I talk about ‘loving history’, am I describing the same thing (or ‘construct’) as when I ‘love my child’?

Transfer: Is ‘love’ something that universalises between contexts? For example, if you get better at loving your sibling, will that transfer to a love of friends, or of school, or of learning geography?

Teaching: Do we know how to teach people to love in schools? Are we even certain it’s possible to teach it?

Progress: How does one get better at loving? Is progress linear? Might it just develop naturally?

Assessment: If ‘loving skills’ actually exist, can they be effectively measured?





Loving – a universalising trait that can be taught?

The assumption that we can teach children to ‘love’ in one context and they’ll exercise ‘love’ in another might seem outlandish but, as I will explain, the writers of early years assessment fell into just such an error in the Early Years Foundation Stage framework and assessment profile.

In my last post I explained how prioritising assessment in authentic environments has come at the cost of reliability and means valid conclusions cannot be drawn from Early Years Foundation Stage Profile assessment data. There are, however, other problems with assessment in the early years…

Problems of ‘validity’ and ‘construct validity’

Construct validity: the degree to which a test measures what it claims, or purports, to be measuring.

Validity: the degree to which inferences can be drawn from an assessment about what students can do in other situations, at other times and in other contexts.

If we think we are measuring ‘love’ but it doesn’t really exist as a single skill that can be developed then our assessment is not valid. The inferences we draw from that assessment about student behaviour would also be invalid.

Let’s relate this to the EYFS assessment profile.

Problems with the EYFS Profile ‘characteristics of effective learning’

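The EYFS Profile Guide requires practitioners to comment on a child’s skills and abilities in relation to three ‘constructs’ labelled as ‘characteristics of effective learning’:

  • playing and exploring (engagement)
  • active learning (motivation)
  • creating and thinking critically (thinking)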

We can take one of these characteristics of effective learning to illustrate a serious problem with the validity of the assessment. While a child might well demonstrate creativity and critical thinking (the third characteristic listed), it is now well established that such behaviours are NOT skills or abilities that can be learnt in one context and transferred to another, entirely different context: they don’t universalise any more than ‘loving’ does. In fact, the capacity to be creative or think critically depends on specific knowledge of the issue in question. Many children can think very critically about football, but that apparent ability evaporates when they are faced with some maths. You’ll think critically in maths because you know a lot about solving similar maths problems, and this capacity won’t make you think any more critically when solving something different, like a word puzzle or a detective mystery.

Creating and thinking critically are NOT skills or abilities that can be learnt in one context and then applied to another

Creating and thinking critically are not ‘constructs’ which can be taught and assessed in isolation. Therefore no valid general inference can be drawn when these behaviours are observed and reported as a ‘characteristic of learning’. If you wish a child to display critical thinking, you should teach them lots of relevant knowledge about the specific material you would like them to think critically about.

In fact, what is known about traits such as critical thinking suggests that they are ‘biologically primary’ and don’t even need to be learned [see an accessible explanation here].

Moving on to another characteristic of effective learning: active learning, or motivation. This presupposes both that ‘motivation’ is also a universalising trait and that we know how to inculcate it. In fact, as with critical thinking, it is perfectly possible to be involved and willing to concentrate in some activities (computer games) but not others (writing).

There has been high-profile research on motivation, particularly Carol Dweck’s work on growth mindset and Angela Duckworth’s on Grit. Angela Duckworth has created a test that she argues demonstrates that adult subjects possess a universalising trait which she calls ‘Grit’. But even this world expert concedes that we do not know how to teach Grit, and she rejects the use of her Grit scale for high-stakes testing. Regarding growth mindset, serious doubts have been raised by failures to replicate Dweck’s research findings and by the use of studies with statistically insignificant results to support it.

Despite serious questions around the teaching of motivation, the EYFS Profile ‘characteristics of learning’ presume this is a trait that can be inculcated in pre-schoolers and, without solid research evidence, simply assume it can be reliably assessed.

That leaves the final characteristic of effective learning, playing and exploring. Of course children learn when playing. This does not mean the behaviours to be assessed under this heading (‘finding out and exploring’, ‘using what they know in play’ or ‘being willing to have a go’) are any more universalising as traits, or less dependent on context, than the other characteristics discussed. It cannot just be presumed that they are.

Problems with the ‘Early Learning Goals’

At the end of reception, each child’s level of development is assessed against the 17 EYFS Profile ‘Early Learning Goals’. In my previous post I discussed the problems with the reliability of this assessment. We also see the problem of construct validity in many of the assumptions within the Early Learning Goals. Some goals are clearly not constructs in their own right, others may well not be, and serious questions need to be asked about whether they are universalising traits or actually context-dependent behaviours.

For example, ELG 2 is ‘understanding’. Understanding is not a generic skill; it depends on domain-specific knowledge. True, a child does need to know the meaning of the words ‘how’ and ‘why’, which are highlighted in the assessment. But while understanding is a goal of education, it can’t be assessed generically: you always have to understand something in particular, and understanding one thing does not mean you will understand something else. The same is true of ELG 17, ‘being imaginative’.

An example of evidence of ELG 2, understanding, in the EYFS profile exemplification materials.

Are ELG 1 ‘listening and attention’ or ELG 16 ‘exploring and using media and materials’ actually universalising constructs? I rarely see qualitative and observational early years research that even questions whether these early learning goals are universalising traits, let alone looks seriously at whether they can be assessed. This is despite decades of research in cognitive psychology leading to a settled consensus which challenges many of the unquestioned constructs which underpin EYFS assessment.

It is well known that traits such as understanding, creativity and critical thinking don’t universalise. Why, in early years education, are these bogus forms of assessment not only used uncritically but allowed to dominate the precious time when vulnerable children could be benefiting from valuable teacher attention?

n.b. I have deliberately limited my discussion to a critique using general principles of assessment rather than arguments that would need to be based on experience or practice.

Early years assessment is not reliable or valid and thus not helpful

In the academic year when my daughter was three, she attended two different nursery settings. She took away two quite different EYFS assessments, one from each setting, at the end of the year. The disagreement between these was not a one-off mistake or due to incompetence but inevitable, because EYFS assessment does not meet the basic requirements of effective assessment: that it should be reliable and valid*.

We have very well-researched principles to guide educational assessment, and these principles can and should be applied to the ‘Early Years Foundation Stage Profile’. This is the statutory assessment used nationally to assess the learning of children up to the age of five. The purpose of the EYFS assessment profile is summative:

‘To provide an accurate national data set relating to levels of child development at the end of EYFS’

It is also used to ‘accurately inform parents about their child’s development’. The EYFS profile is not fit for these purposes and its weaknesses are exposed when it is judged using standard principles of assessment design.

EYFS profiles are created by teachers when children are five, to report on their progress against 17 early learning goals and describe the ‘characteristics of their learning’. The assessment is through teacher observation. The profile guidance stresses that:

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration is taken from EYFS assessment exemplification materials for reading

Thus the EYFS Profile exemplification materials for literacy and maths only give examples of assessment through teacher observation when children are engaged in activities they have chosen themselves (child initiated activities). This is a very different approach from the assessment of children throughout their later schooling, which is based on tests created by adults. The EYFS profile writers no doubt wanted to avoid what Wiliam and Black (1996) call the ‘distortions and undesirable consequences’ created by formal testing.

Reaching valid conclusions in formal testing requires:

  1.    Standard conditions – so there is reassurance that all children receive the same level of help
  2.    A range of difficulty in items used for testing – carefully chosen test items will discriminate between the proficiency of different children
  3.    Careful selection of content from the domain to be covered – to ensure items are representative enough to allow an inference about the domain (Koretz, 2008, pp. 23-28)

The EYFS profile is specifically designed to avoid the distortions created by such restrictions, which lead to an artificial test environment very different from the real-life situations in which learning will ultimately need to be used. However, as I explain below, in doing so the profile loses so much reliability that teacher observations cannot support valid inferences.

This is because, when assessing summatively, the priority is to create a shared meaning about how pupils will perform beyond school and in comparison with their peers nationally (Koretz, 2008). As Wiliam and Black (1996) explain, ‘the considerable distortions and undesirable consequences [of formal testing] are often justified by the need to create consistency of interpretation.’ This is why GCSE exams are not currently sat in authentic contexts, with teachers holding clipboards (as in EYFS) observing children in attempted simulations of real-life situations. Teacher observation can be very useful for an individual teacher when assessing formatively (deciding what a child needs to learn next), but the challenges of obtaining a reliable shared meaning nationally, which stop observational forms of assessment being used for GCSEs, do not just disappear because the children involved are very young.

Problems of reliability

Reliability: Little inconsistency between one measurement and the next (Koretz, 2008)

Assessing child initiated activities and the problem of reliability:

The variation in my daughter’s two assessments was unsurprising given that…

  • Valid summative conclusions require ‘standardised conditions of assessment’ between settings and this is not possible when observing child initiated play.
  • Nor is it possible even to create comparable tasks, ranging in difficulty, that all the children in one setting will attempt.
  • The teacher cannot be sure their observations effectively identify progress in each separate area as they have to make do with whatever children choose to do.
  • These limitations make it hard to standardise between children even within one setting, so it is unsurprising that the two nurseries had built different profiles of my daughter.

The EYFS Profile Guide does instruct that practitioners ‘make sure the child has the opportunity to demonstrate what they know, understand and can do’ and does not preclude all adult initiated activities from assessment. However, the exemplification materials only reference child initiated activity and, of course, the guide instructs practitioners that

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration from EYFS assessment exemplification materials for writing. Note these do not have examples of assessment from written tasks a teacher has asked children to undertake – ONLY writing voluntarily undertaken by the child during play.

Assessing adult initiated activities and the problem of reliability

Even when children are engaged in an activity initiated or prompted by an adult:

  • The setting cannot ensure the conditions of the activity have been standardised, for example it isn’t possible to predict how a child will choose to approach a number game set up for them to play.
  • It’s not practically possible to ensure the same task has been given to all children in the same conditions to discriminate meaningfully between them.

Assessment using ‘a range of perspectives’ and the problem of reliability

The EYFS profile handbook suggests that:

‘Accurate assessment will depend on contributions from a range of perspectives…Practitioners should involve children fully in their own assessment by encouraging them to communicate about and review their own learning…. Assessments which don’t include the parents’ contribution give an incomplete picture of the child’s learning and development.’

A parent’s contribution taken from EYFS assessment exemplification materials for number

Given the difficulty one teacher will have observing all aspects of 30 children’s development, it is unsurprising that the profile guide stresses the importance of contributions from others to increase the validity of inferences. However, it is incorrect to claim that input from the child or parents will make the assessment more accurate for summative purposes. In such contributions the conditions, difficulty and specific content of the evidence will not have been controlled, creating unavoidable inconsistency.

Using child-led activities to assess literacy and numeracy and the problem of reliability

The reading assessment for one of my daughters seemed oddly low. The reception teacher explained that, while she knew my daughter could read at a higher level, the local authority guidance on the EYFS profile said her judgement must be based on ‘naturalistic’ behaviour. She had to observe my daughter (one of 30) voluntarily going to the book corner, choosing to read out loud to herself at the requisite level and volunteering sensible comments on her reading.


Illustration is taken from EYFS assessment exemplification materials for reading. Note these do not have examples of assessment from reading a teacher has asked children to undertake – ONLY reading voluntarily undertaken by the child during play.

The determination to prioritise assessment of naturalistic behaviour is understandable when assessing how well a child can interact with their peers. However, the reliability sacrificed in the process can’t be justified when assessing literacy or maths. The success of explicit testing in these areas suggests they do not need the same naturalistic criteria to ensure a valid inference can be made from the assessment.

Are teachers meant to interpret the profile guidance in this way? The guidance is unclear, but while the exemplification materials include only examples of naturalistic observational assessment, we are unlikely to acquire accurate assessments of reading, writing and mathematical ability from EYFS profiles.

Five-year-olds should not sit test papers in formal exam conditions, but this does not mean that only observation in naturalistic settings (whether adult or child initiated) is reasonable or the most reliable option. The inherent unreliability of observational assessment means results can’t support the inferences required for such summative assessment to be a meaningful exercise. It cannot, as intended, ‘provide an accurate national data set relating to levels of child development at the end of EYFS’ or ‘accurately inform parents about their child’s development’.

In my next post I explore the problems with the validity of our national early years assessment.


*n.b. I have deliberately limited my discussion to a critique using assessment theory rather than arguments that would need to be based on experience or practice.


Koretz, D. (2008). Measuring Up. Cambridge, Massachusetts: Harvard University Press.

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile. Retrieved from

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile: exemplification materials. Retrieved from

Wiliam, D., & Black, P. (1996). Meanings and consequences: a basis for distinguishing formative and summative functions of assessment? British Educational Research Journal, 537-548.

The curator of memories (and other metaphors)

Teaching is a complex job. As an experienced teacher walks out of a classroom their mind subconsciously assesses a 3D mental map the lesson has just created.

A bad lesson:

A bad lesson may lead to a morose consideration of a stubbornly undulating terrain. The high points of the teacher’s mental topography are the children, or groups of children, this professional just knows have ‘got it’. The lows are failures of understanding, made painfully apparent to the professional by all the lesson’s subtle (and less subtle) feedback cues. Tending to the understanding of all 30 children (to mix my metaphors) can often feel like spinning multiple plates.

But I realised a while ago that teachers shouldn’t only create ‘understanding’, the transient appreciation of the content learned just now. That newly learned content needs to be remembered because ‘if nothing has been stored in long term memory, nothing has been learned’. In the last few years I think my teaching has improved because I have become not just a creator of understanding but an active, conscious curator of those newly formed understandings, freshly and precariously held in the memories of my students.

One of the most useful ways to strengthen memory is through short, low-stakes factual tests. I set fixed, regular tests and I work with other teachers, helping them introduce testing. Regular pre-planned testing is also a way we can automate our teaching. This can be no bad thing, as automation saves time and relieves that plate-spinning stress.

However, following any practice unthinkingly, whether regular testing or an Ofsted outstanding lesson formula, is dangerous. Rather than exercising our professional judgement, we follow the magic recipe (sorry, another metaphor). There are superb off-the-peg teaching courses out there, perhaps akin to a magic recipe any teacher could follow and get results. Nonetheless, we teachers just can’t switch off. To be successful we must always consciously work at creating and then curating knowledge.

Below I outline some of the methods I use to ‘curate knowledge’.

I think very carefully about the content of each test I write and try to choose the most useful items: the pieces of knowledge most likely to trigger whole webs of interconnections in my students’ minds. This means I reuse the lines of explanation from class in the phrasing of the test items, to re-trigger the same web of knowledge. Here is an example of a test I’ve used with my Year 7 class. I wanted them to be in a position to write confidently about the causes of the Reformation in England in three weeks’ time. This meant teaching a whole range of ideas but then curating them, keeping them alive in the minds of my pupils so they could all be used together at essay time. Therefore I recycled test items. This test recapped old learning on Wolsey and Erasmus and reviewed fresh learning on Luther’s teachings.

Test 4 – up to Luther:

Name the very corrupt churchman who ran the government of England for Henry VIII until 1530.

Name a famous critic of corruption in the Catholic Church.

What was an indulgence?

Why did Pope Leo X sell indulgences?

Name the monk who sold indulgences in Germany.

How did Luther decide that people get to heaven?

What different belief did Catholics have about how you get to heaven?

Luther said that many beliefs of the Catholic Church were wrong because they weren’t in the Bible. Give 5 examples of Catholic beliefs which Luther criticised.






Keeping memories alive in the minds of all 30 of your students takes more than the weekly test, though.

That goal of ‘active curation’ meant I thought hard about what else I could use to warm up my pupils’ memories. In this quiz I took another tack. I hoped that a reminder of the colourful descriptions I had given of key historical characters would trigger those rich, interconnected memories I sought. (NB: apologies for the wonky formatting in WordPress!)

Join the character to the correct description:

Edward IV (of York)

Henry VII (of Lancaster)

Edward V and brother Richard (died 1483)

Richard III

Johannes Gutenberg

Empson and Dudley

Elizabeth of York

The ‘pretenders’ Lambert Simnel and Perkin Warbeck

Became king in 1485, defeating Richard III at the Battle of Bosworth

They led rebellions against Henry VII by suggesting they were Edward IV’s relatives

Married Henry VII. Daughter of Edward IV and mother of Henry VIII.

Became king in 1483. Brother of Edward IV. Probably killed his nephews.

Died 1483, leaving his 12-year-old son Edward to inherit the throne and his brother in charge.

The ‘princes in the tower’. Sons of Edward IV, probably killed by their uncle Richard III.

Established the first printing press in Germany in 1450

Very unpopular officials of Henry VII who made nobles and other people pay the miserly Henry VII lots of money to help him stay powerful. They were executed when Henry VII died.

Pope Alexander VI

Pope Leo X

Prince Arthur

Catherine of Aragon

Tetzel

Erasmus



Martin Luther

Cardinal Wolsey

Eldest son of Henry VII. Died in 1502 aged 15 after marrying Catherine of Aragon.

Sold indulgences around Germany in 1517. A great salesman.

A very clever Catholic who wrote books criticising corruption in the church.

A pope who was famous for ‘debauchery’. He held all night parties and had affairs.

First wife of Henry VIII. A Spanish princess who had a daughter called Mary.

A monk who began to argue in 1517 that the teachings of the Catholic Church were wrong and you get to heaven by ‘faith alone’.

A pope who organised for indulgences to be sold to pay for St Peter’s church in Rome.

Corrupt churchman who ran England for Henry VIII until he failed to get Henry’s divorce in 1530. From poor background but very arrogant. Built Hampton Court Palace.

To return to the plate-spinning metaphor, I deliberately chose items for this quiz that I knew would give another spin to the memory plates in particular children’s minds. Look at the bottom description, of Cardinal Wolsey. Some of my class had been taken with a description of his arrogance. I made sure to include that point here, but the word ‘corrupt’ is also in there on purpose, as I hoped to reawaken notions of the word learned previously. I remembered a number of the class nodding vigorously at the mention of Hampton Court Palace, so I added that too, giving the memory plates another shove. I phrased these descriptions to latch onto previously taught memory hooks of the sort I’ve outlined.

I was aware that the chronology of events was still an issue, so the class worked on putting sets of five events in order, first as homework and then repeatedly over a series of lessons (see below). I’ve put Gutenberg in the first set to emphasise a chronological point: I wasn’t sure many in the class had really grasped that printing presses were well established by the time of Luther. Many of the other points echo the learning for the basic knowledge tests, but the same details are now in the context of testing chronology. While the class thought about chronological order, I was simultaneously taking the opportunity to get those memory plates spinning again.

  • Henry VII marries Edward IV’s daughter, Elizabeth of York. This unites the rival noble ‘houses’ of Lancaster and York.
  • Edward IV dies
  • Henry VII wins the Battle of Bosworth
  • Richard III becomes king
  • Gutenberg sets up his first printing press

  • Thetford Priory is closed
  • Thomas Cromwell is executed
  • The Dissolution of the Monasteries begins
  • Henry VIII dies
  • Henry VIII gets the Act of Supremacy passed by Parliament. This makes him head of the Church of England instead of the Pope.

  • Martin Luther nails his 95 Theses to the door of the church in the German town of Wittenberg (probably)
  • Henry VIII marries Anne Boleyn (who is pregnant with their daughter Elizabeth)
  • Henry VIII decides he wants to divorce Catherine of Aragon
  • Henry VIII becomes king
  • Pope Clement (prisoner of Holy Roman Emperor Charles V) refuses to grant Henry VIII a divorce

  • Pope Leo X commissions Tetzel to sell indulgences around Germany to pay for rebuilding St Peter’s Church
  • Henry VIII becomes King
  • Pope Clement refuses to give Henry a divorce
  • Pope Clement becomes a prisoner of Holy Roman Emperor Charles V, whose army have captured Rome
  • Henry VIII gets the Act of Supremacy passed by Parliament. This makes him head of the Church of England instead of the Pope.

Once I am happy my class have some confidence with these bite-sized chronologies, they can begin to practise putting longer strings of events into order, in a card sort format. I’ll keep adding to the card sort below for the rest of the year. That means whenever it is used as a starter activity, all that old knowledge is reawakened. Note that I can’t resist using the marriage to Anne Boleyn event card to give a sneaky fresh spin to the Elizabeth I memory plate…

Gutenberg invents the printing press in Germany
Edward IV dies leaving his young son, Edward V to be king.
Richard III makes himself king, probably murdering Edward V.
Henry Tudor beats Richard III at the Battle of Bosworth and becomes Henry VII.
Henry VIII becomes king
Henry VIII marries Catherine of Aragon
Pope Leo X commissions Tetzel to sell indulgences around Germany to pay for the restoration of St Peter’s Basilica in Rome.
Luther publicises his 95 theses criticising Catholic beliefs
Pope Clement becomes prisoner of Emperor Charles V. He refuses Cardinal Wolsey’s request that Henry VIII be granted a divorce.
Henry marries Anne Boleyn, pregnant with Elizabeth.
Parliament passes the Act of Supremacy in 1534 making Henry VIII head of the Church of England instead of the Pope.
Dissolution of the Monasteries begins

The quiz below got a number of outings. It will come out again to prepare the ground for Puritanism and Archbishop Laud. I knew how useful developed notions of these terms would be later and so I curated that knowledge as best I could ready for future use and development.

Quiz! Which ideas are:

Catholic (C) or Luther’s Protestant ideas (P)

1.     Priests are allowed to marry and are encouraged to live like ordinary people.

2.     The head of the church is the Pope, who lives in the Vatican City, Rome.

3.     The Bible SHOULD be translated from Latin into ordinary language.

4.     Church services (called Mass) should be in Latin, as should the Bible.

5.     Nuns or monks should live religious lives in monasteries or abbeys.

6.     Churches are plain so as not to distract people from thinking about God for themselves.

7.     What is written in the Bible should replace traditional practices.

8.     Churches are colourful and decorated with lots of gold and painting.

9.     You get out of purgatory by doing good works

10.   People should pray to the Virgin Mary, pray to saints and keep relics.

11.   You get to heaven through faith alone – what you believe – not what you do

Meanwhile, I kept going with the standard regular testing, which is set as homework learning. There are enormous benefits to building habitual working practices. You might think I had no time to teach the actual material with all that supplementary recap, but I averaged only one recap session per lesson. You also move faster when your class carry in their heads so much useful and relevant foundational knowledge.

Memory curation starts with careful planning of the knowledge you want children to remember. It involves presenting that knowledge in ways that make it memorable, consciously creating memory hooks as you teach. Tending memories means planning new tasks that use old learning wherever possible. It means an ongoing awareness of the likely memories, as well as the understanding, of each of the 30 class members. What memory plates are spinning in their minds, and what actions might be necessary to keep all those different plates spinning?

Data Tracking and the LFs*

Until recently I was unfamiliar with the sorts of pupil tracking systems used in most schools. I’ve also recently had to get to grips with the plethora of acronyms commonly used to categorise groups of students being tracked. I’ve come across PP, LPAs, HPAs and LACs but, rather surprisingly, no mention of the LF. This gap puzzles me, given that in my considerable experience it is how the teacher and school manage the performance of the LFs that is most crucial to healthy end-of-year data. If the LFs perform near their potential, you’re basically laughing all the way to the exam hall.

I should, at this stage, be clear. LF is not a standard acronym (it was invented by my husband) but it does describe a clearly recognisable and significant sub-section of any secondary school population. The L stands for lazy (and the second word begins with an F).

I am being very flippant, I know, but my point is serious enough.

Today I needed to look at a spreadsheet containing data for an old cohort from my last school. As my eye glanced down the baseline testing stats, used for tracking, I couldn’t help emitting frequent snorts of derision. The trigger of my scorn was the original baseline test data for some of my most ‘affectionately’ remembered GCSE students (truthfully, I do remember them all with warmth). I commented to my husband that they needed to be real… erm… ‘LFs’ to score that low on the baseline, given the brains I knew perfectly well they were blessed with.

If I and my colleagues had based our ambitions for those particular boys on their predicted grades from the baseline, they’d have cruised lazily through school. Their meagre efforts would have been continually affirmed as adequate, which would have been ruinous for their habits and character and a betrayal of their potential.

If value added is what drives you, there is also an obvious cost to effectively capping your ambitions for pupils by showing concern only when they miss their baseline predictions. You will still have to absorb the scores of some pupils who just aren’t going to live up to their predictions, while losing some of the extra scores from those who should do better than their baseline suggests, scores that would otherwise balance everything out.

I think what bothers me most is the ‘inhumanity’ of a purely data-driven approach to progress. How could school teachers, of all people, have devised a system that allows no room to acknowledge the obvious human truth before our eyes? When and where have some humans not, at least sometimes, been rather lazy? Down through the centuries, school teachers have exercised their craft, ensuring pupils learn important things despite the entirely natural human propensity towards sloth, magnified in the teenage years. What made us think we could dispense with that wisdom, that our spreadsheets knew better?

Can we re-learn to teach the pupils that actually sit before us, responding to them using our hard-won expertise? Oh, I do hope so.

*Warning: this post nearly contains bad language.

What is high challenge teaching?

This post appears in Schools Week:

“A question for you: What does high challenge teaching look like?”

“Oh, easy answer: make the work harder”

“OK, another question – what is harder work?”

“Er… more difficult work?”

“And what is the nature of more difficult work?”

“[now trying desperately to break out of synonym soup] I suppose work which moves pupils on further and faster…”

“And how does the work achieve this?”

“Umm… by being highly challenging?”

We were asked the first question at one of our regular trust Curriculum and Assessment Group meetings. Perhaps aware that playing with synonyms wasn’t going to take us any nearer to a useful definition, we didn’t spend time on this game!

We were also unlikely to attempt to define challenge by using descriptions of good summative performance. In so doing, as Christodoulou explains, we would simply confuse ‘the description of a phenomenon with its explanation’. Sure, an observer with subject expertise could decide a class must have been challenged because of the high quality of their work. However, if we define high challenge by what it achieves (described in summative level descriptions), we move no closer to defining what challenging teaching looks like, or which tasks provide the challenge that will lead to great performance in a summative assessment. Giving our own pupils these summative descriptions of their academic destination also moves them no closer to understanding the route to get there.

So we cannot define what high challenge teaching looks like by describing more successful outcomes. Perhaps we can reach a better answer by identifying the sorts of tasks that do move children on ‘further, faster’ as being ‘high challenge’. On the face of it this seems quite straightforward: “I will give my history class tasks that require them to really struggle with difficult concepts and explain those ideas in increasingly analytical extended writing.”

But this definition is flawed in several ways:

  1. Challenge varies by subject. Increasingly analytical extended writing won’t provide the requisite ‘high challenge’ in maths. The tasks that push pupils ‘further, faster’ vary enormously by subject. It seems the moment I use specific tasks to define challenge I have to abandon any non-subject specific description of ‘high challenge’.
  1. It goes beyond tasks. Surely in history the range and specificity of the knowledge students can deploy (a key summative descriptor of quality) will depend in part on the quality of prior teacher explanations? I’m going to have to abandon the attempt to define ‘high challenge’ just through the tasks pupils do.
  1. Challenge ≠ struggle. Does moving pupils ‘further, faster’ have to involve ‘struggle’ or difficulty? I’m very familiar with Direct Instruction programmes for literacy and maths, and they are highly successful despite being designed to introduce new learning in easy, incrementally tiny steps. There is progress with no struggle. Working memory theory from psychology suggests cognitive overload is a threat to learning when tasks are complex, which means struggle can be a bad thing.
  1. It’s about the process. My description of a ‘high challenge’ history task is not specific enough anyway. It is still really a summative description of success. What prior work would make success in this particular analytical task more likely? As Christodoulou points out ‘the process of acquiring skills is different from the product’.

The term ‘high challenge’ is often unhelpfully associated with the experience of struggle. Perhaps a class will feel challenged as they grapple with a complex text, assimilate detail or force themselves to knuckle down and learn when they aren’t in the habit of revising. However, a strong teacher explanation of a difficult concept and its use in different contexts might feel painless. The important practice of learning times tables to automaticity might even feel too easy.

I’ve realised that it is impossible to meaningfully define ‘high challenge’ in any general way. Summative descriptions simply define the outcome, and the suitability of tasks is entirely context dependent. Observations can look at outcomes, but teachers must simply use their expertise to ask themselves what actions will most efficaciously move their class forward ‘further, faster’ at any given time.

'May I be excused? The pressure is getting to me.'

The ‘quite tidy garden’ …or why level descriptors aren’t very helpful.

Dear Josh,

Thank you for agreeing to sort out our garden over your long holiday. As we’ll be away all summer, here is a guide that tells you all you need to know to get

from this…

…to this

STEP A: You should begin by assessing the garden to decide its level. Read through these level descriptors to decide:

Level 1: Your garden is very overgrown. Any lawn has not been mown for some years. Shrubs have not been pruned for a considerable period. There are no visible beds and typically there will be large areas taken over by brambles and/or nettles. There will probably be an abandoned armchair (or similar worn-out furniture) somewhere in the overgrowth as well as assorted rubble and the old concrete base from a fallen shed. Boundary fencing will have collapsed.

Level 2: Your garden is just a little overgrown. The lawn is patchy through neglect and has only been mown sporadically. Shrubs generally have not been pruned recently. Beds look neglected and are not well stocked. There may be various forms of old rubbish abandoned in the far corners of the garden along with old lawn clippings and hedge trimmings. Boundary fences are in disrepair.

Level 3: Your garden is well tended. Lawns are mown regularly and contain no moss or weeds, and shrubs are regularly pruned. Flower beds are well demarcated and contain no weeds. They are well stocked with appropriate bedding plants. The garden is quite tidy and boundary fencing is new and strong.


Josh, if you decide the garden is Level 1 (that is certainly our view), then I suggest you look at the Level 2 descriptor to guide you as to your next steps. It is clear that you need to move the garden from ‘very overgrown’ to ‘just a little overgrown’. For example, in a Level 1 garden, shrubs ‘have not been pruned for a considerable period’. You need to move on from that to a Level 2 garden where ‘shrubs have not been pruned recently’. The lawn needs to move from having ‘not been mown for some years’ to Level 2 ‘has only been mown sporadically’. Aim to move the boundary fencing on from Level 1 ‘will have collapsed’ to Level 2 ‘in disrepair’. To move on from Level 1 for rubbish, for example, you’ll need to move that old armchair to a far corner of the garden.


Now move the garden from Level 2 to Level 3. This means you should ensure the garden is ‘well tended’ rather than ‘a little overgrown’. What useful advice!

Using level descriptors makes it so clear for you, doesn’t it? Hubby is trying to insist that I also leave you his instructions, but they are hopeless as he doesn’t understand that you need to know your next steps to make progress in gardening. He’s written reams and reams of advice, including instructions like:

‘You’ll find the strimmer in the garage’

‘Start by clearing all the nettles’

‘Ken will come and help you shift the concrete’

‘The tip is open from 10-4 at weekends’

‘Marion next door can advise you about the best bedding plants to buy’

His instructions are just too specific to our garden. To learn the gardening skills that will achieve a Level 3 garden what you need is to really know your next level targets. I won’t confuse you by leaving you his nonsense!

We’ll see you in September and in the meantime we wish you happy gardening!


With apologies to any actual gardeners out there who know what they are talking about and enormous thanks to Daisy Christodoulou whose recent book helped me appreciate just why we shouldn’t use level descriptors as feedback. 

The Secret Of My (Maths) Success

Over half term some good friends visited. I had an interesting chat with Dan, who is in his fifties and gained a first in maths through the OU a few years ago. He’s just done a PGCE as a maths teacher and has been trained to build understanding through plenty of problem solving tasks.

The discussion made me reflect on the stark difference between the way I’ve taught maths to my own children at home, with the lion’s share of time spent learning to fluency, and the focus in schools on exercises to build understanding. After all, I reflected, the progress of my children has stunned even me. How is it they missed out on SO much work on understanding while accelerating far ahead of their peers?

It isn’t that I don’t appreciate that children need some degree of understanding of what they are doing. I remember when I discovered that the reason my friend’s daughter was struggling with maths at the end of Year 1 was that she had failed to grasp that crucial notion of ‘one more’. Her teacher had advised that she needed to learn her number bonds (and indeed she did), but while she did not grasp this basic notion, the bonds were gibberish to her. What we call ‘understanding’ does matter (more thoughts here).

I’ve realised why I’ve never had to invest significant time in exercises to build understanding. It is because when my children are given a new sort of problem, they can already calculate the separate parts of that problem automatically. All their working memory is focused on the only novel element of a procedure, and so it is very quickly understood. Understanding is just not a biggy. Identify the knowledge necessary to calculate the component parts of a problem, secure fluency in those, and activities for understanding generally become a (crucial but) small part of the maths diet.

The degree of focus on fluency that my children were given is highly unusual. I have huge piles of exercise books full of years of repeated calculations continued a year, two years, after they were first learned. My children learnt all possible addition and subtraction facts between one and twenty until they were known so well that recall was like remembering your own name. I did the same with multiplication and division facts. There were hours and hours and hours and hours of quite low level recall work.

Generally, the focus in schools is the opposite, and this creates a vicious cycle. Children are taught more complex problems before they are fluent in the constituent parts of those problems. They therefore struggle to complete calculations because their working memory breaks down. The diagnosis is made that children don’t ‘understand’ the problem posed. The cure is yet more work focused on helping children understand how the problem should be solved and why. The children may remember this explanation (briefly), but it is too complex to be remembered long term, as too many of the constituent elements of the problem are themselves not secure. When the children inevitably forget the explanation, what is the diagnosis? A failure of understanding. Gradually, building ‘understanding’ eats more and more lesson time. Gurus of the maths world deride learning to fluency as ‘rote’, but perversely, the more time is spent on understanding instead of fluency, the harder it is for children to understand new learning. By comparison, my children seem to have a ‘gift that keeps on giving’. Their acceleration isn’t just in the level of maths proficiency they have reached; it is in the capacity they have to learn new maths so much more easily.

Fluency… the gift that keeps on giving.

I’ve not got everything right, but I’ve learned so much from teaching my own children, including that the same general principle holds for understanding maths and understanding history. If understanding is a struggle, it is because necessary prior knowledge is not in place or not secure.

Go back – as far as you can get away with.

Diagnose those knowledge gaps.

Teach and secure fluency.

You’ll find understanding is no longer the same challenge.