Part 2: Early years assessment is not reliable or valid and thus not helpful

This is the second post on early years assessment. The first is here.

Imagine the government decided they wanted children to be taught to be more loving. Perhaps the powers that be could decide to make teaching how to love statutory and tell teachers they should measure each child’s growing capacity to love.

Typical scene in the EYFS classroom – a teacher recording observational assessment. 

There would be serious problems with trying to teach and assess this behaviour:

Definition: What is love? Does the word actually mean the same thing in different contexts? When I talk about ‘loving history’ am I describing the same thing (or ‘construct’) as when I ‘love my child’?

Transfer: Is ‘love’ something that universalises between contexts? For example, if you get better at loving your sibling, will that transfer to a love of friends, of school, or of learning geography?

Teaching: Do we know how to teach people to love in schools? Are we even certain it’s possible to teach it?

Progress: How does one get better at loving? Is progress linear? Might it just develop naturally?

Assessment: If ‘loving skills’ actually exist can they be effectively measured?





Loving – a universalising trait that can be taught?

The assumption that we can teach children to ‘love’ in one context and they’ll exercise ‘love’ in another might seem outlandish but, as I will explain, the writers of early years assessment fell into just such an error in the Early Years Foundation Stage framework and assessment profile.

In my last post I explained how the priority on assessment in authentic environments has been at the cost of reliability and has meant valid conclusions cannot be drawn from Early Years Foundation Stage Profile assessment data. There are, however, other problems with assessment in the early years…

Problems of ‘validity’ and ‘construct validity’

Construct validity: the degree to which a test measures what it claims, or purports, to be measuring.

Validity: the degree to which inferences can be drawn from an assessment about what students can do in other situations, at other times and in other contexts.

If we think we are measuring ‘love’ but it doesn’t really exist as a single skill that can be developed then our assessment is not valid. The inferences we draw from that assessment about student behaviour would also be invalid.
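The transfer problem can be put in statistical terms. Here is a toy sketch (my own illustration, with made-up numbers, not EYFS data): if performance in each context is driven mostly by separate context-specific knowledge, with only a weak shared factor, then scores in two contexts barely correlate, and a single ‘construct’ score licenses almost no inference from one context to the other.

```python
import random

random.seed(1)

# Assumption under test: each pupil's performance in a context is a weak
# shared factor plus a large dose of context-specific knowledge.
N = 500
general  = [random.gauss(0, 0.3) for _ in range(N)]
football = [g + random.gauss(0, 1) for g in general]
maths    = [g + random.gauss(0, 1) for g in general]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# A score observed in one context tells us almost nothing about the other.
print(f"cross-context correlation: {pearson(football, maths):.2f}")
```

Under these assumptions the correlation comes out close to zero: observing ‘critical thinking about football’ simply does not predict ‘critical thinking about maths’.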

Let’s relate this to the EYFS assessment profile.

Problems with the EYFS Profile ‘characteristics of effective learning’

The EYFS Profile Guide requires practitioners to comment on each child’s skills and abilities in relation to three ‘constructs’ labelled as ‘characteristics of effective learning’.

We can take one of these characteristics of effective learning to illustrate a serious problem with the validity of the assessment. While a child might well demonstrate creativity and critical thinking (the third characteristic listed), it is now well established that such behaviours are NOT skills or abilities that can be learnt in one context and transferred to an entirely different one – they don’t universalise any more than ‘loving’. In fact, the capacity to be creative or to think critically depends on specific knowledge of the issue in question. Many children can think very critically about football, but that apparent ability evaporates when they are faced with some maths. You think critically in maths because you know a lot about solving similar maths problems, and this capacity won’t make you think any more critically when solving something different, like a word puzzle or a detective mystery.

Creating and thinking critically are NOT skills or abilities that can be learnt in one context and then applied to another

Creating and thinking critically are not ‘constructs’ which can be taught and assessed in isolation, so no valid general inference can be drawn from observing and reporting these behaviours as a ‘characteristic of learning’. If you wish a child to display critical thinking you should teach them lots of relevant knowledge about the specific material you would like them to think critically about.

In fact, what is known about traits such as critical thinking suggests that they are ‘biologically primary’ and don’t even need to be learned [see an accessible explanation here].

Moving on to another characteristic of effective learning: active learning, or motivation. This presupposes both that ‘motivation’ is a universalising trait and that we are confident we know how to inculcate it. In fact, as with critical thinking, it is perfectly possible to be involved and willing to concentrate in some activities (computer games) but not others (writing).

There has been high-profile research on motivation, particularly Dweck’s work on growth mindset and Angela Duckworth’s on grit. Duckworth has created a test that she argues demonstrates that adult subjects possess a universalising trait she calls ‘Grit’. But even this world expert concedes that we do not know how to teach Grit, and she rejects the use of her Grit scale in high-stakes tests. Regarding growth mindset, serious doubts have been raised about failures to replicate Dweck’s research findings, and about studies with statistically insignificant results being used to support the theory.

Despite these serious questions around the teaching of motivation, the EYFS Profile ‘characteristics of learning’ presume it is a trait that can be inculcated in pre-schoolers and, without solid research evidence, that it can be reliably assessed.

The final characteristic of effective learning is playing and learning. Of course children learn when playing. But this does not mean the behaviours to be assessed under this heading (‘finding out and exploring’, ‘using what they know in play’ or ‘being willing to have a go’) are any more universalising as traits, or any less dependent on context, than the other characteristics discussed. It cannot just be presumed that they are.

Problems with the ‘Early Learning Goals’

At the end of reception each child’s level of development is assessed against the 17 EYFS Profile ‘Early Learning Goals’. In my previous post I discussed the problems with the reliability of this assessment. We also see the problem of construct validity in many of the assumptions within the Early Learning Goals. Some goals are clearly not constructs in their own right, others may well not be, and serious questions need to be asked about whether they are universalising traits or actually context-dependent behaviours.

For example, ELG 2 is ‘understanding’. Understanding is not a generic skill; it depends on domain-specific knowledge. True, a child does need to know the meaning of the words ‘how’ and ‘why’, which are highlighted in the assessment. But while understanding is a goal of education it can’t be assessed generically: you have to understand something, and understanding one thing does not mean you will understand something else. The same is true for ‘being imaginative’ (ELG 17).

An example of evidence of ELG 2, understanding, in the EYFS profile exemplification materials.

Are ELG 1 ‘listening and attention’ or ELG 16 ‘exploring and using media and materials’ actually universalising constructs? I rarely see qualitative and observational early years research that even questions whether these early learning goals are universalising traits, let alone looks seriously at whether they can be assessed. This is despite decades of research in cognitive psychology leading to a settled consensus which challenges many of the unquestioned constructs that underpin EYFS assessment.

It is well known that traits such as understanding, creativity and critical thinking don’t universalise. Why, in early years education, are these bogus forms of assessment not only used uncritically but allowed to dominate the precious time when vulnerable children could be benefiting from valuable teacher attention?

n.b. I have deliberately limited my discussion to a critique using general principles of assessment rather than arguments that would need to be based on experience or practice.

Data Tracking and the LFs*

Until recently I was unfamiliar with the sorts of pupil tracking systems used in most schools. I’ve also recently had to get to grips with the plethora of acronyms commonly used to categorise groups of students being tracked. I’ve come across PP, LPAs, HPAs and LACs but, rather surprisingly, no mention of the LF. To be honest I am surprised by this gap given that in my considerable experience it is how the teacher and school manage the performance of the LFs that is most crucial to healthy end of year data. If the LFs perform near their potential you’re basically laughing all the way to the exam hall.

I should, at this stage, be clear. LF is not a standard acronym (it was invented by my husband) but it does describe a clearly recognisable and significant sub-section of any secondary school population. The L stands for lazy (and the second word begins with an F).

I am being very flippant, I know, but my point is serious enough.

Today I happened to need to look at a spreadsheet containing data for an old cohort from my last school. As my eye glanced down the baseline testing stats, used for tracking, I couldn’t help emitting frequent snorts of derision. The trigger of my scorn was the original baseline test data for some of my most ‘affectionately’ remembered GCSE students (truthfully, actually, I do remember them all with warmth). I commented to my husband that they needed to be real… erm… ‘LFs’ to score that low on the baseline given the brains with which I knew perfectly well that they were blessed.

If I and my colleagues had based our ambitions for those particular individuals on their predicted grades from the baseline they’d have cruised lazily through school. Their meagre efforts would have been continually affirmed as adequate, which would have been ruinous for their habits and character and a betrayal of their potential.

If value added is what drives you, there is also an obvious problem: if you effectively cap your ambitions for pupils by only showing concern when they fall short of the grades predicted from the baseline, you will still have to absorb the scores of pupils who were never going to live up to their predictions, while losing the scores of those who should have done better than their baseline suggested – the very scores that would otherwise balance everything out.
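A toy simulation (my own sketch, with invented numbers, not any real school tracking system) makes the arithmetic concrete: baseline tests are noisy, so if teaching effort stops once a pupil is ‘on track’ for their predicted grade, the under-predicted pupils never supply the upside that would offset the over-predicted ones, and average value added turns negative.

```python
import random

random.seed(0)

N = 1000
abilities = [random.gauss(50, 10) for _ in range(N)]     # true potential
baselines = [a + random.gauss(0, 8) for a in abilities]  # noisy baseline test

def exam(ability):
    """Final exam score if a pupil is taught to their actual potential."""
    return ability + random.gauss(0, 3)

# Uncapped ambition: teach everyone to their potential.
uncapped = [exam(a) - b for a, b in zip(abilities, baselines)]

# Capped ambition: stop pushing once a pupil is 'on track' for the baseline
# prediction, so nobody exceeds it, but over-predicted pupils still fall
# short of it.
capped = [min(exam(a), b) - b for a, b in zip(abilities, baselines)]

print(f"mean value added, uncapped: {sum(uncapped) / N:+.2f}")
print(f"mean value added, capped:   {sum(capped) / N:+.2f}")
```

Under these assumptions the uncapped policy averages out near zero, while the capped policy keeps all the downside and discards all the upside, so its average value added is clearly negative.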

I think what bothers me most is the ‘inhumanity’ of a purely data-driven approach to progress. How could school teachers, of all people, have devised a system that allows no room to acknowledge the obvious human truth before our eyes? When and where haven’t some humans been, at least sometimes, rather lazy? Down through the centuries school teachers have exercised their craft, ensuring pupils learn important things despite the entirely natural human propensity towards sloth, magnified in the teenage years. What made us think we could dispense with that wisdom, that our spreadsheets knew better?

Can we re-learn to teach the pupils that actually sit before us, responding to them using our hard-won expertise? Oh, I do hope so.

*Warning: this post nearly contains bad language.

Research and primary education

I took part in a panel discussion at the national ResearchEd conference yesterday. The subject of the discussion was primary education and I thought I would post the thoughts I shared:

At all levels of schooling classroom research is undoubtedly useful but the process of generalising from this research is fraught with difficulty. As E D Hirsch explains, each classroom context is different.

I think ideally we would like to base educational decisions on:

  • Converging evidence from many years of research in numerous fields
  • That integrates both classroom research and lab-based work so…
  • We can construct theoretical accounts of underlying causal processes.

These theoretical insights allow us to interpret sometimes contradictory classroom research. We actually have this ideal in the case of research into early reading and the superiority of systematic synthetic phonics. Despite this evidence, the vast majority of primary schools ignore, or are unaware of, the research and continue to teach the ‘multi-cueing’ approach to reading.

While research on phonics is ignored, some lamentably poor research has been enduringly influential in early primary education and treated with a breathtaking lack of criticality. In the 1950s a comparison study of 32 children found that children taught at nursery using teacher-centred methods showed evidence of delinquent behaviour in later life. However, the sample was tiny and so was the effect, and no account was taken of the fact that the teacher-led group had many more boys than the child-centred comparison group – among other fatal flaws. Despite this, that piece of research is STILL continually and uncritically cited. For example, the OECD used this study to support character education. It is also central to the National Audit Office definition of ‘high quality’ early years provision as ‘developmentally appropriate’.

That flawed research features in the literature review for the EPPSE longitudinal study, which has become one of the highest-impact educational research programmes in Europe and whose findings underpin billions of pounds of government spending. EPPSE claims to have demonstrated that high quality pre-school provision is child-centred, and to have shown that such provision has an incredible impact on outcomes at age 16. However, merely scratch the surface and you find obvious flaws. The scales used by classroom observers to identify the nature of quality provision lacked validity and actually predefined high quality provision as child-centred. The researchers admitted that problems with the control group meant causal connections couldn’t be drawn from the findings, but then ignored this problem, despite its undermining their key conclusions.

It seems the key principles influencing early years education are too frequently drawn from obviously flawed research. These principles are also the product of misuse of the research we have. For example, it is statutory to devise activities to build resilience in early years education. However, Angela Duckworth, an international authority, admits that although it is a desirable trait we don’t really know for sure how to create it.

What explains the astonishing situation where theoretical research from cognitive psychology is ignored, obviously flawed huge government funded research projects become influential and new pedagogical approaches, based on faulty understanding of the evidence, are made statutory?

A glance along the bookshelves at any primary teacher training institution gives us a clue. There is a rigid child-centred and developmentalist orthodoxy among primary educationalists. This explains the lack of rigorous scrutiny of supportive research. In fact, except on social media, sceptical voices are barely heard.

Testing – a double-edged sword.

As teachers we’ve all had those moments when, eyes shining, tongue loosed by the excitement of the moment, we share a fascinating nugget of detail with our class. We’ve all also experienced the dull deflation of that enthusiasm when our students respond, “But is this in the exam? Do we actually need to know this?” It seems our focus on testing has created a generation of students who view their studies purely as a means to an end and have lost the ability to enjoy learning for its own sake. Such responses from my classes normally trigger an agony of soul-searching on my part. I question whether my desire to get my students good results makes these responses my own fault, a just retribution for my desire to show off my teaching prowess through a healthy end-of-year results spreadsheet. The same problem is seen with primary children asking what they need to do to get to the next level rather than enquiring further into a subject, and with GCSE English courses in which the easiest books are chosen, and read only in extract form, to optimise exam results. I also despise (yes, it is that strong) the nonsensical hoop-jumping drill that consumes hours of teaching time and serves only to ensure student responses conform to exam rubrics so pupils can get the marks they deserve.

These drawbacks of testing are explained in a blog post by Daisy Christodoulou, who recently took part in a debate with Toby Young, Tristram Hunt and Tony Little on the subject of testing. Daisy explains that the proposition of the debate was that tests were ‘essentially a necessary evil…in many ways inimical to good education…Tony Little said that our focus should not be on exams, but on ensuring a love of learning.’ In her post Daisy argues coherently that testing is nonetheless very useful for the reliable feedback it provides and for the way the ‘testing effect’ aids memory. I agree, but would go further than arguing for teacher-set tests. I question the assumption that external exams such as GCSEs and A levels are just a necessary evil, inimical to good education. I’ll explain further.

A week ago my school had its year 13 parents’ evening. The talk was all of university applications and predicted grades. Students had been investigating universities, and the realisation had dawned that they were not going to get into the prestigious institutions their ambitions desired without those crucial A grades. Every year, students who had never quite been able to take their studies seriously wise up to reality; you can see a new purpose in their demeanour as they ‘set aside childish things’ and get down to some serious study. External exams are essential for good education because without them too many students would never summon up the motivation to learn, or to learn enough, in enough detail, and would never reach the standard they are capable of. Witness what happens when teachers are told their subject will still be taught but no longer examined at GCSE or A level. You may have noticed the campaigns to stop A levels being scrapped in languages such as Polish. Teachers know perfectly well that what is examined generally IS what is taken seriously. Where exams aren’t used, other forms of competition tend to arise to serve the same purpose.

The assumption that motivation in education should be intrinsic goes pretty much unquestioned, but while most teachers would profess to believe this, their behaviour suggests otherwise. Why is it that every year children are under so much stress from SATs? The children themselves have no reason to take these tests seriously. It is teachers who explain the importance of the tests to their pupils – to ensure they take them seriously, pay attention and work hard. Researchers expect a significant diminution in performance on tests when the stakes are low, and have to factor this into their analysis.

Eric Kalenze said in his talk at ResearchEd that extrinsic motivation is seriously underrated in education, and I agree with him. Of course we must avoid bribing children when they would or could work happily with no reward; that is clearly counterproductive. We also want to skilfully withdraw extrinsic rewards as children become capable of appreciating the content for its own sake. We want to stimulate our students’ curiosity and help them appreciate what they are learning. However, human motivation is complex. Just how many children would ever learn to their full potential with only intrinsic motivators? I’ve certainly heard of some, but even then their enthusiasms tend to be selective. I can’t help thinking that if avoidance of extrinsic motivators were an educational panacea, Steiner schools would have taken off in a way they never have.

Just how many students would be sitting in our secondary schools or our A level classes if attendance were not compulsory and they didn’t need proof of their learning for success later in life? Could it be that external exams, rather than being harmful to deeper learning, are actually the very REASON why children end up learning lots? If, at 16, it had made no difference to my future whether I understood GCSE maths, I might just have spent more time following my enthusiasm for 19th-century novels and neglected mathematics entirely. I have also known countless students fall in love with a subject as they study, even though the initial impetus for that study was the desire for exam success. To really excel in a subject takes serious hard work and discipline, and often the rewards of study are only really appreciated after much toil. Even as an adult, can I really say that my motivation to learn things I find interesting is purely for its own sake? So often genuine curiosity is mixed with a wish for acknowledgement of my erudition, or a desire to bolster my self-esteem through feeling learned.

Exams are a double-edged sword. True, a focus on exam success over the subject matter taught for its own sake is undoubtedly harmful. We must work to limit that harm while acknowledging that exam certificates are often the very reason our students choose to study. The idea that most children would learn more without exams is untested idealism and ignores lived reality.

Every September I ask my new year 12 politics students why they are studying A levels. Every year they tell me it is so they can go to university and get a good career. At the end of every year I ask them if they are pleased they now understand so much more about politics – and they are. Job done!

A truism that needs questioning.

A truism that needs questioning: The importance of ‘high quality’ preschool education.

It is a truth universally acknowledged that young children, especially those not in possession of a good middle class upbringing, must be in need of ‘high quality preschool provision’. The phrase is on every politician’s lips. David Cameron is clear about this. Nicky Morgan, Tristram Hunt and Liz Kendall are sure it will create a skilled workforce of the future and Barack Obama has pumped countless dollars into ‘high quality’ preschool programmes in the belief that research shows that ‘high quality’ provision is the key to better life outcomes.

You might be surprised to learn that ‘high quality’ has a very specific meaning that goes well beyond the common sense idea that some preschools must be better run than others. The National Audit Office commissioned a summary of the evidence on the impact of early years’ provision in which they explained that “In pre-school education (3+ years), quality is most often associated with the concept of developmentally appropriate practice”.

The English Early Years Foundation Stage statutory framework explains what is meant by ‘developmentally appropriate’ (i.e. high quality) practice for 0–5 year olds:

“Each area of learning and development must be implemented through planned, purposeful play and through a mix of adult-led and child-initiated activity.”

Everybody believes young children should play lots and can learn while playing. But in England high quality provision does not just mean giving young children time to play: it is statutory that the bulk of any learning must be through child-initiated play. As the statutory framework explains:

“Children learn by leading their own play, and by taking part in play which is guided by adults.”

To be clear, if I want my four-year-old to learn to wash himself I could:

  1. Instruct him directly, but that would be bad practice under the EYFS framework for the majority of learning goals (not really a ‘high quality’ approach).
  2. Play a game with him that involves washing. That would be ‘adult-led’ play and is only acceptable some of the time.
  3. Finally, I could try to engineer a situation where my child is likely to want to play at washing himself (an ‘enabling environment’), offering gentle nudges to ‘enrich’ his play in the right direction. That is ‘child-initiated’ learning and is at the heart of what is meant by ‘high quality’ preschool practice.

Child-initiated play is prioritised because it is believed to facilitate the central goal of ‘high quality’ pre-schooling: character development. For example, the ‘guiding principles’ of the English EYFS statutory framework are a series of dispositions – children should become resilient, capable, confident and self-assured, strong and independent. This is what is meant by the phrase ‘educating the whole child’.

What is the basis for this widely held view of ‘high quality’ pre-schooling?

For me, this statutory definition of high quality pre-schooling is problematic for a number of reasons.

1. I’ve looked into character education and there seems a limited basis for the belief that the dispositions and skills which are the goals of this form of pre-schooling can be taught, and, if they can be inculcated, no real basis for the idea that child-initiated learning is the way to do so. For example, it is statutory in English preschools to devise activities to build resilience. Angela Duckworth is viewed as an international authority on ‘grit’, but she admits that although it is a desirable trait we don’t really know for sure how to create it!

2. I taught my own young children to read, do maths, swim, wash and dress. They learnt maths to a high level without my engineering ‘enabling environments’ for child-initiated learning.

3. The belief that high quality preschools are child-centred and ‘developmentally appropriate’ flies in the face of the enormous American state-sponsored Project Follow Through, which found that direct instruction pre-schooling delivered far greater cognitive gains than child-centred approaches.

4. The research by cognitive psychologists is pretty damning of the idea of developmentally appropriate practice.

A report by the National Audit Office on the evidence for the impact of pre-schooling suggests the evidence base for the widely cited definition of ‘high quality’ is a small handful of very old and tiny studies, particularly one I have already written about: the highly flawed High/Scope Perry study, which didn’t even find any long-term cognitive benefits.

I thought there had to be a firmer basis for what amounts to an international education policy. I investigated further and found plenty of studies looking at the effect of pre-schooling on outcomes, but it is hard to find any of the sort policy makers would be interested in that provide a basis for how ‘high quality’ has been defined.

There is one very well-known and significant study that purports to do so: the ‘Effective Provision of Pre-School Education Project’ (EPPE), a large longitudinal study involving 3,000 children and sponsored by the DfES. One of its aims was to identify the characteristics of an effective pre-school setting. The study involved careful classroom observation, particularly using the most widely used measure of preschool classroom quality, the Early Childhood Environment Rating Scale (ECERS-R). The EPPE report explains the use of this measure:

“Matters of pedagogy are very much to the fore in ECERS-R. For example, the sub scale Organisation and Routine has an item ‘Schedule’ that gives high ratings to a balance between adult initiated and child initiated activities. In order to score a 5 the centre must have a balance between adult initiated and child initiated activities.”

Hold on, surely not? This very large government-funded longitudinal study aims to identify high quality practice using a rating system which predefines what is meant by high quality! The ECERS-R rating system was developed in the late 1970s and is used extensively around the world to judge preschool quality. I spent some time looking for the evidence base for its assumptions and found that the quality ratings were compiled by one of its creators, drawing on her teaching experience. There has been some criticism of the ECERS-R. Gordon et al write:

“The ECERS and ECERS-R reflect the early childhood education field’s concept of developmentally appropriate practice, which includes a predominance of child initiated activities…a ‘whole child’ approach that integrates physical, emotional, social and cognitive development…there is surprisingly little empirical evidence of the validity of the ECERS-R instrument using item response models.”

Gordon et al explain that there is a fundamental problem with the ECERS-R scoring system: statements that allow higher scores (‘indicators’) are only counted if the indicators for lower scores are met, yet the scales ‘mix dimensions’. I’ll explain. One of the scales a preschool is judged along, the ECERS10, includes indicators of nutrition (food served is of unacceptable nutritional value), caregiver–child interactions (non-punitive atmosphere during meals), language (meals and snacks are times for conversation) and sanitation, among others! If the food is of unacceptable nutritional value the scorer cannot even judge items higher up the scale, even though they are really unrelated. Unsurprisingly, researchers found ‘the category ordering assumed by the scale’s developers is not consistently evident’. Interestingly, they also found few associations between ECERS-R scores and child outcomes, and they suggest ‘small correlations may be attributable, in part, to the low validity of the measure itself’.
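A minimal sketch of the kind of ‘stop rule’ Gordon et al describe (my own simplification for illustration; the real ECERS-R items and indicator levels are more elaborate): because scoring halts at the first failed level, a failure on a low-level indicator from one dimension (nutrition) hides everything above it, including unrelated dimensions (conversation at mealtimes).

```python
def stop_rule_score(indicators_met):
    """Score an item on a 1-7 scale where level n only counts if every
    level below it was met; scoring stops at the first failure."""
    score = 0
    for level in range(1, 8):
        if indicators_met.get(level, False):
            score = level
        else:
            break  # higher levels are never even examined
    return score

# Centre A: fails the low-level nutrition indicator but meets the
# higher-level interaction and language indicators - they stay invisible.
print(stop_rule_score({1: False, 2: True, 3: True, 4: True}))  # 0

# Centre B: meets only the two lowest levels, yet outscores Centre A.
print(stop_rule_score({1: True, 2: True, 3: False}))  # 2
```

The mixed dimensions mean the ordering baked into the stop rule has no reason to hold in practice, which is exactly what the researchers found when they tested it with item response models.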

So the best recent research in England on what is meant by ‘high quality’ preschool education, the EPPE longitudinal study, uses a measure which predefines quality. That measure has been widely used to define high quality, yet it is based on one teacher’s observations and has questionable correlation with outcomes – unsurprising when you consider that its scales mix dimensions. Finally, the very best evidence the National Audit Office could find to justify the ‘developmentally appropriate’ definition of high quality was a tiny, highly flawed study from 50 years ago.

I don’t suppose politicians have any idea that they are endorsing a very particular ‘child-centred/developmentally appropriate’ form of early education when they herald ‘high quality early education’ as the panacea for society’s ills, or that there is little justification for actively endorsing this particular approach, let alone making it statutory. In fact, whatever might be written about the findings of the EPPE study, the actual statistics endorse something much more like direct instruction. I talk more about the problems with EPPE/EPPSE here.

Other relevant posts on:

What a child initiated education looks like

The view of early years’ educationalists on direct teaching

The Hydra Part 2


or ‘Weikart and Schweinhart’s [Perry] High/Scope Preschool Curriculum Comparison Study Through Age 23’ and ‘Lifetime Effects: The High/Scope Perry Preschool Study Through Age 40’ (2005). For details of these studies see Part 1.

This post begins with the story of two preschool approaches and their fates. One approach was ‘teacher-led’ Direct Instruction and the other the ‘child-led’ High/Scope Perry preschool, already discussed at length in my previous post. As the NIFDI website explains:

Beginning in 1968 under the sponsorship of the American federal government, an enormous educational experiment called Project Follow Through compared these two approaches. It was the most extensive educational experiment ever conducted, charged with determining the best way of teaching at-risk children from kindergarten through grade 3. Over 200,000 children in 178 communities were included in the study, and 22 different models of instruction were compared.


Evaluation of the project occurred in 1977, nine years after the project began. The results were strong and clear. Students who received Direct Instruction had significantly higher academic achievement than students in any of the other programs. They also had higher self-esteem and self-confidence. No other program had results that approached the positive impact of Direct Instruction. Subsequent research found that the DI students continued to outperform their peers and were more likely to finish high school and pursue higher education. The Perry High/Scope approach is on the graph above. It is the ‘Cognitive Curriculum’ approach (second from the end). You can see it wasn’t quite as successful…

So which preschool method was the winner? The answer might seem obvious from the table above but you would be mistaken. The approach that now dominates is the High/Scope Perry preschool approach – and this is to some extent on the basis of two very small and problematic studies.

In my last blog I outlined the problems with these two small studies from 40-50 years ago by Schweinhart and Weikart that examined the impact of their Perry High/Scope preschool methods on the participants into adulthood. I began to explain the staggering influence over education policy these two studies have had. To find out the details of these two studies click back to my last blog but, to give you the gist, here is Kozloff’s summary of some of the problems with the way these studies have been interpreted:

 What is “…just plain bizarre, is that [Schweinhart and Weikart] barely entertain the possibility that: (1) a dozen years of school experience; (2) area of residence; (3) family background; (4) the influence of gangs; and (5) differential economic opportunity, had anything to do with adolescent development and adult behavior.”

In this post I will look at the impact of these studies on early years education.

First: These studies have been crucial in building a case for the importance of preschool education in the early years of childhood.

In 2004 the National Audit Office commissioned a summary of the evidence on the impact of early years’ provision on young children, with emphasis given to children from disadvantaged backgrounds. Such a paper offers a good review of the key research literature that has been influencing public policy.

The evidence of the biggest ever educational experiment “Project Follow Through” does not feature, but both tiny Perry preschool studies feature heavily in the report. The second study of 123 subjects is one of six randomised controlled trials cited to provide evidence of the effectiveness of preschool programmes for disadvantaged children. These trials were all small scale and to an extent had contradictory findings. Some found reductions in antisocial behaviour but not academic gains and others had the opposite findings. None of the trials seemed to offer better evidence than the problematic second Perry study of 123 subjects.

The report then goes on to look at the evidence of preschool having an impact on the general population, rather than studies that only focus on children who are highly disadvantaged. Perhaps it would be reasonable to argue that the majority of this research had positive findings, either for social development, academic development or both. However, there were still many contradictory findings and what seemed to be quite low effect sizes. Often, the preschool methods examined provide limited academic advantage but show positive social effects later in life. Having looked at the literature, I do begin to wonder if the commenter on this American website has a point:

“As the authors note, it is indeed quite a puzzle how pre-school education could possibly not show positive effects during schooling, yet have dramatically positive effects in adulthood. But if you look at the studies that find no effect during early schooling, you’ll find them very dense with objective facts such as testing results, etc. But if you look at those handful of studies purporting to show dramatic adulthood outcomes, you don’t find a lot of such data. In fact these latter studies aren’t scientific – they’re advocacy. They didn’t come up with rigorously selected criteria prior to pre-school to evaluate the outcomes, but instead retrospectively identified metrics to compare the control groups, leaving much room for post-hoc cherry-picking. The most parsimonious explanation of the paradox is that the adulthood-effects studies are flawed, and aren’t actually showing any real positive outcomes from pre-school.”

I’d need to do much more research to comment further. What is very interesting to me is that there is no doubt the much-publicised benefits of preschool education are built on shakier foundations than advocates would like policy makers to think. If you are interested in forming an opinion, this post, this and this post, and this riposte are a great starting point.

There is a very concerning reliance on the Schweinhart and Weikart Perry preschool studies in the National Audit Office report.

1. The shockingly ‘dodgy’ first study (see my previous post) is relied upon to define ‘high quality’ child care.

I’ll explain further. One noticeable feature of the research on preschool effectiveness is the reliance on the idea that the reason some studies showed no effect was because the preschool programme was not ‘high quality’. On one level that is sensible as there must be huge variation in the quality of preschool provision but it is also a way of arguing that we should dismiss all studies with weak or no effects, presuming they are not ‘high quality’. I had noticed that the term ‘high quality’ is used frequently in the literature and repeatedly by policy makers and so I was interested to see how researchers had reached a decision on what constituted ‘high quality’. This is what the National Audit Office report had to say (I’ll highlight the key passage but thought I should include the full extract):

“In pre-school education (3+ years), quality is most often associated with the concept of developmentally appropriate practice. Bryant et al. (1994) report on several studies that illustrate the relationship between developmentally appropriate practice and child outcomes. The High/Scope study [Schweinhart’s and Weikart’s] shows that children who attend a developmentally appropriate, child-centred programme are better adjusted socially than similar children who attend a teacher-directed programme implementing a direct-instruction curriculum (Schweinhart, Weikart, and Larner 1986). In North Carolina, Bryant, Peisner-Feinberg, and Clifford (1993) found that children’s communication and language development were positively associated with appropriate care giving. Burts et al. (1992) and Hart & Todd (1995) show that children’s attendance in developmentally appropriate kindergartens is associated with fewer stress behaviours. Educational content is also important for this age group. Jowett & Sylva (1986) found nursery education graduates did better in primary school than playgroup graduates, suggesting the value of an educationally orientated pre-school. The research demonstrates that the following aspects of pre-school quality are most important for enhancing children’s development: Well-trained staff who are committed to their work with children, facilities that are safe and sanitary and accessible to parents, ratios and group sizes that allow staff to interact appropriately with children, supervision that maintains consistency, staff development that ensures continuity, stability and improving quality and a developmentally appropriate curriculum with educational content.”

  • Oh my goodness! How can the study referred to possibly support the weight being placed on it (see first half of my previous post)? The National Audit Office report writer considers it a central plank in research used to define ‘high quality’ child care when it is hopelessly flawed.


  • Not only this, there is good research demonstrating the academic advantage of preschool approaches that directly contradict the Perry preschool methods. Why does the definition of ‘high quality’ actually exclude these successful alternatives? The Perry Preschool method endorsed ‘developmentally appropriate practice’, meaning a ‘child led’ experiential curriculum rather than teacher-led instruction. The National Audit Office report actually mentions the success of a very teacher-led approach, used widely in France:

“Studies of children in the French Ecoles Maternelle programme (Bergmann 1996) show that this programme enhances performance in the school system for children from all social classes and that the earlier the children entered the pre-school program, the better their outcome.”

The early results of project Follow Through (with 9000 children assigned to nine early childhood curricula) showed that disadvantaged children who received Direct Instruction (anathema to those advocating child led approaches) went from the 20th to about the 50th percentile on the Metropolitan Achievement Test. Children who received the Perry High/Scope curriculum did not do as well. They fell from the 20th percentile to the 11th percentile.

2. It is a concern that the second Schweinhart and Weikart study is used to demonstrate the cost effectiveness of preschool education in the National Audit Office report.

“The Perry Pre-school Project is the most cited study in this area and its benefits are well reported (e.g. Schweinhart et al. 1993). The cost-benefit analysis for this project (Barnett 1996) is worthy of some consideration as its findings are extensively used for justifying expenditure on pre-school education and care.”

This is troubling given the small size and context of the original study. It can’t possibly support these inferences.

Second: This research on the impact of education in children’s early years has been used to justify our statutory Early Years Foundation Stage.

The EPPE study was an enormously significant longitudinal study funded by the DfES from 1997 to 2004, with findings in support of our ‘child led’ EYFS curriculum. It actually cites the first highly flawed Schweinhart and Weikart study where, among other problems, findings were the result of changes in the outcomes of two or three people. This is what it says of a study that should never have been taken seriously:

“Previous Research on the Effectiveness of Pre-School Education and Care: The vast majority of longitudinal research on early education has been carried out in the U.S. Two of the studies cited most often are the Abecedarian Project and the Perry Pre-school Programme (Ramey & Ramey, 1998; Schweinhart and Weikart, 1997). Both used randomised control trial methods to demonstrate the lasting effects of high quality early intervention. These landmark studies, begun in the 1970s, have been followed by further small scale ‘experiments’ (see the Early Headstart, Love et al., 2001) and larger cohort studies (See Brooks-Gunn, 2003; Melhuish, 2004a, for reviews). This huge body of literature points to the many positive effects of centre-based care and education.”

The EPPE refers to the first Perry study as having been ‘admired for decades for its internal validity.’ I am not sure the writer has actually looked at the study. Amusingly, the EPPE report writer seems bemused when their own findings contradict those of the Perry study:

“It appears therefore that the beneficial impact of pre-school on cognitive attainment is more long lasting than that on social behaviour. Social/behavioural outcomes may be more influenced than cognitive outcomes by the primary school peer group. Still this finding is at odds with the Perry Pre-school study, which indicated that the social outcomes of pre-school were more salient than the cognitive ones by adolescence. Data from the EPPE continuation study, will follow children into adolescence, to shed light on this.”

The influence the Perry study has had on understanding of what might be meant by ‘high quality’ provision is quite concerning given these contradictory findings. Our national EYFS curriculum presumes ‘high quality’ provision means:

“…all areas of learning [are] to be delivered through planned, purposeful play, with a balance of adult-led and child-initiated activities… Professionals should therefore adopt a flexible, fluid approach to teaching, based on the level of development of each child. Research also confirms that the quality of teaching is a key factor in a child’s learning in the early years. High quality teaching entails high levels of interaction, warmth, trust, creativity and sensitivity. Practitioners and children work together to clarify an idea, solve a problem, express opinions and develop narratives.”

However this is problematic:

1. We know the first flawed Schweinhart and Weikart Perry study contributed towards this idea of ‘high quality’ provision but it did not actually improve academic outcomes for its young participants. Evidence such as Follow Through (from the same era), which did improve academic outcomes, is not even mentioned. The Perry programme was viewed as worthwhile because it was believed this intervention limited adult anti-social behaviour.

“The Perry program initially boosted IQs. However, this effect faded within a few years after the end of the two-year program, with no statistically significant effect remaining for males, and only a borderline significant effect remaining for females.”

2. I believe good parenting makes a difference to children and so, although the studies seem unconvincing, it is not outside the realms of possibility that the committed teachers involved in the second Schweinhart and Weikart study had some positive impact on their pupils. They were a highly committed team of extremely well qualified teachers. They must have involved themselves deeply in the lives of their very disadvantaged, very low IQ pupils, given they made 90-minute visits every week to their homes as well as teaching them. Such a scheme is not really replicable on a wider scale, though.

Also, given the range of possible benefits the children experienced it is quite a leap to pinpoint the child-led learning as a crucial factor and suggest it provides a model of ‘high quality’ pre-schooling for all children today.

However, some sort of model is what the Perry preschool at Ypsilanti, Michigan seems to be.

3. The use of these studies to justify ‘developmentally appropriate’, ‘child led’ practices is concerning. If children are not highly disadvantaged or at any great risk of engaging in felonies (most children) it is hard to see how these studies can justify the use of such approaches. This is particularly so given the success of other methods. [The results of project Follow Through (with 9000 children assigned to nine early childhood curricula) showed that all groups made greater academic gains than those using the Perry High/Scope approach.]

Some of the many subsequent studies on the effectiveness of preschool education record academic gains and others don’t. Some record social gains and others don’t. However, partly thanks to the Schweinhart and Weikart studies, the importance of ‘high quality’ preschool education is a mantra repeated by all politicians. The Perry approach has become a model for ‘high quality’ preschool education around the world and currently about 30 percent of all Head Start centres in America offer a version of the Perry curriculum (ICPSR 2010). For those, like myself, concerned that ‘child-led’ approaches are not the most efficacious, the impact of these studies is an enormous cause for concern.

So why are two small, context-dependent, flawed studies still widely cited? Why are the results of the largest ever educational experiment from the same era ignored in early years research literature? There is only one possible explanation. The small studies said what educationalists wanted to hear, that a child led curriculum could be proved to affect life outcomes. The enormous Project Follow Through had more uncomfortable findings – so it was ignored.

I examine the evidence base for what is deemed ‘high quality’ pre school education here:


The Hydra


or ‘Weikart and Schweinhart’s [Perry] High/Scope Preschool Curriculum comparison Study Through Age 23’ and Lifetime Effects: The HighScope Perry Preschool Study Through Age 40 (2005)

A few years ago I read a serious book on early reading instruction that stated, without question, that there was strong scientific evidence that direct instruction methods, used even at nursery level, lead to increased criminality among adults. It seemed implausible. How could the teaching style of your nursery school, in some cases over only one year, have such a significant impact that it could be measurable 20 years later? However, I then came across the claim in numerous respectable publications. Greg Ashman drew my attention to another in Psychology Today recently. Initially I felt I had to accept the claims – but decided I should investigate properly first. I soon discovered that all these claims pretty much originate

‘… from about half a dozen articles in the series written by Schweinhart and Weikart, in which they claim to compare (and apparently believe that they demonstrate the superiority of) their High/Scope pre-school curriculum with (1) a preschool that used Direct Instruction in reading and language for about an hour a day for one year, and (2) a traditional nursery school’ [Kozloff, ‘DI creates felons, but literate ones’, contribution to the DI Listserve, University of Oregon, 31 December 2011.]

In this post I will explain how I discovered that it is no exaggeration to state that the influence of this and another related study on education policy around the world has been enormous. You will find this study as a central plank of evidence in swathes of papers on a diverse range of education related issues.

What was the apparently markedly superior Perry High/Scope approach?

The curriculum was based on the principle of active participatory learning, in which children and adults are treated as equal partners in the learning process, and children engage with objects, people, events, and ideas. Abilities to plan, execute, and evaluate tasks were fostered, as were social skills, including cooperation with others and resolution of interpersonal conflicts. The Perry curriculum has been interpreted as implementing the theories of Vygotsky (1986) in teaching self-control and sociability. [Heckman et al 2013]

What was the quality of the research in this study?

I soon discovered that to say this Schweinhart and Weikart study is problematic is an almighty understatement – as outlined by Bereiter, Kozloff (thanks to Greg Ashman for this) and Engelmann.

    • In this study there were 18 subjects who had been taught using Direct Instruction at their preschool for about an hour a day, 14 who had been taught using High/Scope child led methods, and 16 who had attended a ‘traditional’ preschool. All were low IQ.
    • These were about a third of the original subjects, the ones who could be traced 20 years later.
    • Far more of the adults who had been in the Direct Instruction preschool had been reared by single, working mothers whose income was about half that of households in the High/Scope group.
    • Eight of the 18 original Direct Instruction group had only one year of preschool while all the High/Scope subjects had two years of preschool.
    • The gender balance was greatly different, with nearly two thirds of the High/Scope group’s participants female. This was not accounted for in the study.
    • Attendance at the local high school was 84% for the DI group and 64% for the High/Scope group – and again this was not considered.
    • Differences between groups actually amount to differences in the activities of only one or two persons.
    • The early results of project Follow Through (with 9000 children assigned to nine early childhood curricula) showed that disadvantaged children who received Direct Instruction went from the 20th to about the 50th percentile on the Metropolitan Achievement Test. Children who received the Perry High/Scope curriculum did not do as well. They fell from the 20th percentile to the 11th.
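The arithmetic behind those last points is worth pausing on. Here is a minimal sketch – using only the three group sizes reported above, with everything else being simple arithmetic rather than study data – of how much any reported percentage moves when a single participant’s outcome changes:

```python
# In groups this small, each individual shifts any reported
# percentage by several points on their own.
group_sizes = {"Direct Instruction": 18, "High/Scope": 14, "Traditional": 16}

for name, n in group_sizes.items():
    per_person = 100 / n  # percentage points attributable to one subject
    print(f"{name}: n={n}, one person = {per_person:.1f} percentage points")
```

A reported gap of ten or fifteen percentage points between these groups can therefore rest on the behaviour of just two or three individuals – which is exactly the “one or two persons” point above.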

As Kozloff explains, what is “…just plain bizarre, is that these two writers barely entertain the possibility that: (1) a dozen years of school experience; (2) area of residence; (3) family background; (4) the influence of gangs; and (5) differential economic opportunity, had anything to do with adolescent development and adult behavior.”

In fact a much larger study then found no correlation between preschool methods and criminality. It seems ridiculous that the outcomes of 18 twenty-three-year-olds should ever have been taken very seriously… However, this and another highly problematic study by Schweinhart and Weikart (outlined below) have been cited 2,444 times.

It sounds mad and I can’t quite understand it myself, but you will find these studies as a central plank of evidence in swathes of papers on a diverse range of education-related issues:

Other areas where the Schweinhart and Weikart papers have had influence:

Character Education

I was preparing for a panel debate on character education recently and started to look into the evidence in favour of teaching character attributes as a skill. Character can mean so many things that there is actually a vast body of relevant research but I began with a very glossy publication by the OECD. On p40 I found a Schweinhart and Weikart study was being used as evidence. Apparently their Perry preschool programme:

“significantly enhanced adult outcomes including education, employment, earnings, marriage, health, and participation in healthy behaviors, and reduced participation in crime” [Heckman et al 2013].

The OECD report claimed that the evidence of this study provides:

“some of the most compelling evidence that non-cognitive skills can be boosted in ways that produce adult success… Arthur Jensen’s (1969) discussion of IQ fadeout in Head Start and other compensatory programmes promoted the widespread embrace of the notion that intervention efforts are ineffective and that intelligence is genetically determined. His uncritical reliance on intelligence test scores illustrates the fallacy of relying on mono-dimensional measurements of human skills. The Perry intervention provides an effective rebuttal to these arguments. The programme greatly improved outcomes for both participating boys and girls, resulting in a statistically significant rate of return around 7%-10% per annum for both genders (see Heckman et al., 2010a).”

What was the approach of this second Schweinhart and Weikart study?

This study had 123 participants, all low IQ, divided between a treatment group receiving ‘Active Participatory Learning’ (as outlined above) in classes of 13 students with two highly qualified teachers for each group, and a control group which did not receive any pre-schooling. “Sessions lasted 2.5 hours and were held five days a week during the school year. Teachers in the program, all of whom had bachelor’s degrees (or higher) in education, made weekly 1.5-hour home visits to treatment group mothers with the aim of involving them in the socio-emotional development of their children. The control group had no contact with the Perry program other than through annual testing and assessment (Weikart, Bond, and McNeil 1978).”

Concern 1: The research findings regarding delinquency have been disconfirmed

Crime reduction in adult life is argued by Heckman (one of the OECD report authors) to be one of the major benefits of the Perry programme. However, there have been other studies suggesting the Perry programme was not especially effective. For example, this from Mills, Cole, Jenkins and Dale:

“In a previous study of the differential effects of contrasting early intervention programs on later social behavior (Mills, Cole, Jenkins, & Dale, 2002), we found no differences in self-report of juvenile delinquency at age 15 for children enrolled in direct instruction and child-directed models. These results disconfirmed the conclusion of Schweinhart, Weikart, and Larner (1986b) that direct instruction was linked to higher rates of juvenile delinquency and other social differences. Our previous study was limited to self-report of juvenile delinquency, a very coarse measure of social development, in an attempt to replicate the key finding of Schweinhart et al. (1986b). In the present study, we examine additional measures of social development, which might be more sensitive to subtle program differences, including school satisfaction, loneliness, and depression. We administered a battery of social development measures to 174 children at age 15 who had been randomly assigned at preschool age to the two early childhood models. We found no differences on any social outcome for program type. Across a wide range of social behaviors at age 15, there is no evidence that type of early intervention program differentially influences subsequent adolescent social behavior.”

Concern 2: Small sample size

Heckman, an author of the OECD report, has become a leading player in character education research. Fascinatingly, this is on the back of his examination of this study. In his paper on the work he has done with Schweinhart’s and Weikart’s data he acknowledges:

“The small sample size of the Perry experiment (123 participants) has led some researchers to question the validity and relevance of its findings (e.g., Herrnstein and Murray 1994; Hanushek and Lindseth 2009). Heckman et al. (2010a) use a method of exact inference that is valid in small samples. They find that Perry treatment effects remain statistically significant even after accounting for multiple hypothesis testing and compromised randomization.”

However, for the biggest effects claimed, differences between groups amount to the differences between the activities of about 20 people. I can’t fathom how this sort of data can possibly provide some of “the most compelling evidence that non-cognitive skills can be boosted in ways that produce adult success.” Heckman is a Nobel prize-winning economist. It would certainly take a kind of brilliance beyond any I can imagine to be able to justify the enormous influence this study has had over education and public policy.
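Heckman’s appeal to ‘exact inference’ is worth unpacking a little. The sketch below is a standard two-sided Fisher exact test – a textbook implementation, not Heckman’s actual method – run on purely hypothetical outcome counts for groups of roughly the Perry sample’s size (58 treatment, 65 control), to show how sensitive small-sample results are to the behaviour of a handful of individuals:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def p_table(x):
        # Probability of x in the top-left cell under the hypergeometric null
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + 1e-12)

# Hypothetical counts for groups the size of the Perry sample
# (58 treatment, 65 control) - made up purely for illustration:
print(fisher_exact_p(20, 38, 28, 37))  # 20/58 vs 28/65 with some outcome
# Move four individuals between cells and compare the p-value:
print(fisher_exact_p(24, 34, 24, 41))
```

With 123 participants, changing the recorded outcome of a few people visibly moves the result – which is why differences amounting to the activities of ‘about 20 people’ are such thin evidence, whatever exact method is applied to them.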

Concern 3: Scaling up

A critic of the continued reliance on this study by education policy makers is Russ Whitehurst, who argues that this research was:

“From a time when very little of today’s safety net for the poor was in place… Further, [Perry was a] small single-site program run by [its] developers. Concluding that findings from these studies demonstrate that current and contemplated state pre-k programs will have similar effects is akin to believing that an expansion of the number of U.S. post offices today will spur economic development because there is some evidence that constructing post offices 50 years ago had that effect.”

There are many studies that examine the impact of preschool education, but the likes of Heckman continue to rely upon the outcomes of a small, single-site ‘boutique’ programme from 40-50 years ago. Heckman himself notes that the positive effects of the Perry program have become a cornerstone of the argument for preschool programs (e.g., Obama 2013). “Currently, about 30 percent of all Head Start centers nationwide [across America] offer a version of the Perry curriculum (ICPSR 2010).”

My next post examines the second study in more detail and, among other things, I will consider the influence of this study on public policy regarding preschool education.

n.b. This post has been updated to make it clear that there are two separate studies being referenced.

Can character be taught?

Yesterday I took part in a panel discussion at the “Character v Knowledge” event organised by the East London Science School and The Education Foundation. Below is the transcript of my brief opening talk:

If Ignatius Loyola did say the famous Jesuit maxim, ‘Give me the child until he is seven and I will give you the man’, I don’t think it was an idle boast but I am sceptical that many of the schools that claim to change character, to psychologically engineer our children’s attributes, are achieving anything of the sort.

Character, in this context, seems to mean a jumbled mix of values, virtues, skills and attributes. If the goal of character education is to inculcate virtues for their own sake, then why are successful programmes judged through improved exam results? Doing the right thing doesn’t necessarily make one successful; it is an end in itself. In practice, the rhetoric of character education seems more instrumentalist than virtuous, a training in skills that provide the means to achieve exam success and other personal gains.

Character is a skill according to many keen promoters of character education but if so then why is the problem of transfer largely ignored? Take being a ‘loving person’. The fact I am loving with my children doesn’t necessarily mean I will show love to my neighbour and certainly not that I will love studying geography. We use one word ‘loving’ but it means different things in different contexts. How much more transferable between domains are the ‘skills’ we have named curiosity and resilience?

My school’s rugby sevens team has just won the national trophy. Despite the character traits their many hours of training must have inculcated, why is it that I can only guess which students in my classes were in the team by their muscular bulk, not their approach to cooperative work? If sport does inculcate useful attributes, it seems transfer isn’t easy or guaranteed.

Such questions should be enough to give pause for thought – even without considering how unsuccessful large scale attempts at character education have been to date. In East Germany from 1958 to 1976 the full power of the state got behind the inculcation of a ‘socialist personality’, with attributes such as ‘mutual help and comradely cooperation’. There is some hubris in believing you can engineer a better society. Our own government’s attempts have been no more successful. Look at the dismal impact of PSHE lessons, and the failure of SEAL.

Hope springs eternal and over the last few years many schools, influenced by Dweck, have opted for exhortation with general maxims or for providing helpful prompts when setting tasks. They hope that reasoning with students will encourage them to apply suggested principles. Schools seem less willing to allow children to actually live with the consequences of failure or to compel children to behave in ways that could one day become good habits. Sure, efforts that have a narrow enough focus, for example on persuading children to work harder in school, should have some positive impact – but has character changed?

Increasing self-esteem was the last big thing but we now know that often our efforts did more harm than good. What if too much perseverance stops people behaving pragmatically, thinking of clever short cuts? What if significant perseverance is learnt through serious failure? What if our unskilled attempts at amateur cognitive therapy go wrong?

It is possible to mould values and change children’s habits over the span of their childhood, in reaction to the myriad situations that arise naturally, through a judicious mix of exhortation, example and, crucially, compulsion. That is what traditional parenting and schooling do, and it is all we really know will work until we can realistically claim to have a formula for creating good character. To quote Roger Scruton, “…wisdom is seldom contained in a single head, and is more likely to be enshrined in customs that have stood the test of time than in schemes of radicals and activists.”



Why not assess loving kindness and compassion?

It was inevitable. Given the buzz around character education, mindfulness, mindset and metacognitive strategies etc it was only a matter of time before teachers started trying to assess these sorts of attributes. I came across this sincere attempt to chart progress in areas such as mindfulness and also loving kindness and compassion.

I have reproduced the blog’s suggested progression in ‘loving kindness and compassion’ below:

Their journey so far: Loving kindness and compassion

Level 6: Regularly enjoys giving and receiving acts of loving kindness. Is regularly compassionate towards others and looks to help people in distress. Looks after the vulnerable in the school and looks to help them by talking and playing with them.
Level 5: Is beginning to see how acts of kindness are beneficial to the giver and receiver. Beginning to understand the concept that we all suffer and that we shouldn’t look to add to people’s suffering.
Level 4: Is beginning to see how acts of kindness can be beneficial to others.
Level 3: Can be kind to themselves but not always show compassion or kindness towards others.
Level 2: Finds it both hard to give and receive acts of kindness.
Level 1: Finds it hard to be positive about themselves or others.

Quite aside from whether it is the role of schools to prioritise the psychological manipulation of their pupils over academic goals, so many obvious questions appear ignored in the construction of this chart.

  • Do we really think kindness is a skill that can be taught?
  • If we really do can it be assessed effectively? How do we know?
  • Do people make any form of linear progress in behaviours?
  • Who decides what ‘better’ means and why?
  • Can a mark scheme actually show progression in kindness?
  • Is it not more than a little problematic that on this mark scheme the same person could sometimes be level 1 and at other times level 6?

Surely it is demonstrably foolish to set about assessing a desirable attribute without properly considering these questions? Isn’t it obvious that you can’t just pluck a series of statements out of the air that seem to you to show progression and claim they do and that you can use them to assess? How could the writer ever have thought this was anything other than nonsense?

Ah… Hold on a moment…

I didn’t include the above table to pillory this blogger for his sincere efforts to spread loving kindness. Why should he think there is any problem with his approach, given the assessment levels he has been using as a teacher?

Here we have the old National Curriculum science levels. I have taken excerpts from the levels for ‘energy, forces and space’:

Level 1 Pupils communicate observations of changes in light, sound or movement that result from actions
Level 2 Pupils know about a range of physical phenomena and recognise and describe similarities and differences associated with them.
Level 3 Pupils use their knowledge and understanding of physical phenomena to link cause and effect in simple explanations
Level 4 Pupils describe some processes and phenomena related to energy, forces and space, drawing on scientific knowledge and understanding and using appropriate terminology… They recognise that evidence can support or refute scientific ideas… They recognise some applications and implications of science
Level 5 Pupils describe processes and phenomena… drawing on abstract ideas and using appropriate terminology… They explain processes and phenomena, in more than one step or using a model… They apply and use knowledge and understanding in familiar contexts…. They recognise that both evidence and creative thinking contribute to the development of scientific ideas… They describe applications and implications of science


  • Do we really think things such as ‘recognising similarity and difference’ are generic skills, readily transferable to whatever material is being learnt?
  • If we really do can it be assessed effectively? How do we know?
  • Do people make any form of linear progress in recognising similarity and difference?
  • Who decides what ‘better’ means and why? Can a mark scheme actually show progression in recognising similarity and difference?
  • Is it not more than a little problematic that on this mark scheme, depending on the material taught,  the same person could sometimes be level 1 and at other times level 6?

Apparently you are ‘able to recognise similarity and difference’ from level 2. Level 4 is when you get the skill of ‘recognising some applications and implications of science’, and by level 5 you can explain these. My six-year-old’s science school report suggests he needs:

‘…to use his observations to make a simple conclusion’.

Ah yes, that ‘using observations to make a simple conclusion skill’. Isn’t that the skill he used as a new-born baby when he decided he wanted to be with mummy because she had the milk? What level did that make him?

Surely it is demonstrably foolish to set about assessing a desirable attribute without properly considering these questions? Isn’t it obvious that you can’t just pluck a series of statements out of the air that seem to you to show progression and claim they do and that you can use them to assess? How could the writer ever have thought this was anything other than nonsense?

Why am I going over old ground? Levels are gone (at least from some schools) and new, supposedly better forms of assessment, endorsed by the Department for Education, are being substituted. Let’s look at one of these winners of ‘Assessment Innovation Fund’ money from the DfE…

Concept: Causation in history. Learning for Progress: Key Stage 4 History

Creating: Students organise and represent information in a new / different way. (Action words: plan, invent, design, develop, construct, compose.) Demonstrate their understanding of the past through developed, reasoned and well substantiated explanations of relevant causes, consequences and changes.

Evaluating: Students judge the quality / usefulness of information sources, making decisions based upon agreed criteria. (Action words: assess, justify, prioritise, judge, decide / choose, recommend.) Demonstrate their understanding of the past through reasoned and well-justified explanations of relevant causes, consequences and changes.

Analysing: Students break down information sources into key parts, finding a range of differing evidence. (Action words: compare / contrast, examine, investigate, categorise, classify, sort.) Demonstrate their understanding of the past through developed and reasoned explanations of relevant causes, consequences and changes.

Applying: Students begin to solve problems / answer questions by using learned information in different situations. (Action words: use, complete, examine, illustrate, solve, apply.) Their descriptions are accurate and their explanations show understanding of relevant causes, consequences and changes.

Understanding: Students begin to solve problems / answer questions by using learned information in different situations. (Action words: use, complete, examine, illustrate, solve, apply.) Demonstrate their understanding of the past through description of reasons, results and changes in relation to the events, people and issues studied.

Remembering: Students begin to solve problems / answer questions by using learned information in different situations. (Action words: use, complete, examine, illustrate, solve, apply.) Demonstrate their understanding of the past through description of reasons, results and changes in relation to the events, people and issues studied.

This is better than the old NC levels because there is an emphasis on the idea that the degree of skill will depend on the events, people and issues studied. But then again…

  • On what grounds do we assume that analysing is a lower-level skill than creating? This is not a minor niggle. If we can’t actually show this (and we can’t) it undermines the whole premise of the assessment structure.
  • The structure implies that during KS4 a student will go higher up the assessment ladder as they do more topics. Will they? How do we know?
  • Historians spend quite some time simply writing descriptions. Are they operating at a lower level than a KS3 student who has reached the top ‘creating’ level on the ladder or can writing a description actually be quite hard?

The demands of comparative accountability require state schools to use ‘progress’ measures, but such measures will always be flawed.

Surely it is demonstrably foolish to set about assessing a desirable attribute without properly considering these questions above? Isn’t it obvious that you can’t just pluck a series of statements out of the air that seem to you to show progression and claim they do and that you can use them to assess? How could the writer ever have thought this was anything other than nonsense?

How can I really check that my year 9 history students have made ‘progress’ over time in some generic sense that doesn’t hinge on whether they have learnt the latest material they have been taught? That apparent ‘progress’ will evaporate if students make less effort on the next topic (or if my teaching is poor). The following links are to blogs that explore the reasons levels are problematic and suggest alternative ways forward: see here, here and here. The idea that a child is making ‘progress’, rather than simply learning more stuff, can work better in subjects where the content is more hierarchical, such as maths and early reading, although even there it can encourage short-termism and remains problematic because models of progress are often flawed.

While the education establishment continues to show distaste for the idea that education is about learning a body of knowledge, we will not have decent assessment. The idea of actually comparing schools by checking how many students in a year group can explain Hooke’s Law, or a myriad of other facts, seems almost absurd in the current climate (although it is what GCSEs do). However, what is more absurd is the alternative.

Meaningful assessment involves checking how well students have learnt the specific material you have taught them; the difficulty of the task will depend on how challenging students find that material.

Just three special steps are all that you need!

What’s going to work? TEEEAMWORK!

Can we fix it? Yes we can!

Three special steps are all that you need!

What do you need when you don’t know where to go?

If your kids are a similar age to mine and their favourite channel was also Nick Jr, these exhortations will be so horribly familiar that you may not be grateful for the reminder of those exhaustion-befuddled times. Heaven knows just how many times my kids, slouched, hypnotised and inactive on the sofa, have been exhorted to ‘use their imagination’. Much like adults who vacantly watch Saturday Kitchen, getting up only to make themselves some toast for lunch, our children are generally entertained but unmoved by the character/behavioural education so carefully packaged for them. How do I know it all hasn’t worked? Simply because, given the heavy indoctrination sessions the average three year old sits through every day, if the lessons worked our reception classes would be full of cooperative, caring, team-working, problem-solving giants of imagination. In actual fact, I’ve never heard a KS1 teacher recommend more TV as the route to attaining these attributes and dispositions. Odd, that.

It is odd how back to front it all is. The sort of teachers who believe hands-on experience is essential to learning will only TELL young kids what desirable behaviour is. They do then, unlike with telly lessons, give possible opportunities for those behaviours to be practised, and may try to prompt them, but they believe kids’ behaviour will only alter if the child is ‘ready’ or able. These teachers tend to be less keen on REQUIRING that behaviour to ensure it is experienced.

I agree that whatever of our general behaviour is mouldable is shaped by experience. However, generally speaking, those experiences were NOT optional. We often learn from the ‘school of hard knocks’, but in our society we shrink from exposing our children to anything that could be considered distressing, fearing it will damage the child or harm motivation. Engelmann sees things differently. He describes how, when a child learns to walk, the ground is entirely unforgiving. Again and again the child falls, but the undistorted feedback the ground provides means learning is fast. We shrink from allowing our children such experiences in their education. We seem to hope that we can cheat: if we just TELL our children the desirability of resilience, if we make it their decision, we don’t have to upset them. We fear that any non-voluntary behaviour demotivates.

For a long time (it feels like forever) my six year old has been learning number bonds. The odd time, I have told him WHY he does long lists of calculations. I’ve even pointed out to him that his progress has been due to hard work. However, I have never focused on teaching him motivation and then allowed his progress to depend on his own motivation. Why not? Because he is only six, and I don’t want my six year old to be burdened with responsibility for, or expected to be reliably capable of, self-motivated hard work. I certainly don’t want his progress to depend on this.

Recently my son has been doing the ‘Big Brainz’ games on the computer to build fluency in the four operations. He has got a bit distressed and very frustrated when he has failed a level because he does not remember the answer to questions such as 13−4 or 3×7. Contrary to the impression I must give, I can be a soft-hearted soul. I didn’t like to see his upset, and felt that helping him with answers wouldn’t hurt too much. Actually, my kindness just meant he struggled with the next level, as the learning on the last was not secure. So recently I’ve been really strict with myself and not helped. I’ve even ignored his sobs that he doesn’t want to play the game any more and required that he continue. Was he suffering significant distress? Well, he had forgotten his concern within a minute and become re-immersed in the game. At six, he sobs when he is told the television is being turned off or that I want him to eat some vegetables.

Has he been put off maths because of my callous, uncaring drive to hothouse him for 20 minutes a day? Of course he flippin’ hasn’t! If only I had had a camera to capture his super-cute (to his mother) victory wiggle dance when he finally conquered the third level of multiplication he had been repeatedly stuck on. I think he felt he now ruled the world! As adults, our job is surely not to cocoon our children from distress or to allow struggle only when it is self-imposed. We must protect children from excessive distress, but we seem to have a very low threshold for judging that, and thus we prevent our children from experiencing the very lessons we value.

My son has also learnt more, faster, by being required to struggle. My daughters’ KS2 teachers have found their capacity to work steadily through large amounts of maths work quite remarkable, and they insist maths is their favourite subject because they enjoy being good at it. This illustrates to me that our reluctance to REQUIRE our children to struggle holds them back and lowers their own ‘pain thresholds’ when it comes to hard work (by which I mean spending a few minutes doing something they didn’t fancy doing). To be honest, I am just not sure how transferable my son’s newly learnt resilience will be to different contexts. However, I am convinced that while our attempts at character education are big on exhortation and decry non-negotiable experience, we’ll have little more success changing behaviour long term than Nick Jr or Saturday Kitchen.