Part 2: Early years assessment is not reliable or valid and thus not helpful

This is the second post on early years assessment. The first is here.

Imagine the government decided they wanted children to be taught to be more loving. Perhaps the powers that be could decide to make teaching how to love statutory and tell teachers they should measure each child’s growing capacity to love.

Typical scene in the EYFS classroom – a teacher recording observational assessment. 

There would be serious problems with trying to teach and assess this behaviour:

Definition: What is love? Does the word actually mean the same thing in different contexts? When I talk about ‘loving history’, am I describing the same thing (or ‘construct’) as when I ‘love my child’?

Transfer: Is ‘love’ something that universalises between contexts? For example, if you get better at loving your sibling, will that transfer to a love of friends, of school, or of learning geography?

Teaching: Do we know how to teach people to love in schools? Are we even certain it’s possible to teach it?

Progress: How does one get better at loving? Is progress linear? Might it just develop naturally?

Assessment: If ‘loving skills’ actually exist can they be effectively measured?

Loving – a universalising trait that can be taught?

The assumption that we can teach children to ‘love’ in one context and they’ll exercise ‘love’ in another might seem outlandish but, as I will explain, the writers of early years assessment fell into just such an error in the Early Years Foundation Stage framework and assessment profile.

In my last post I explained how the priority given to assessment in authentic environments has come at the cost of reliability, meaning that valid conclusions cannot be drawn from Early Years Foundation Stage Profile assessment data. There are, however, other problems with assessment in the early years…

Problems of ‘validity’ and ‘construct validity’

Construct validity: the degree to which a test measures what it claims, or purports, to be measuring.

Validity: the degree to which inferences can be drawn from an assessment about what students can do in other situations, at other times and in other contexts.

If we think we are measuring ‘love’ but it doesn’t really exist as a single skill that can be developed then our assessment is not valid. The inferences we draw from that assessment about student behaviour would also be invalid.

Let’s relate this to the EYFS assessment profile.

Problems with the EYFS Profile ‘characteristics of effective learning’

The EYFS Profile Guide requires practitioners to comment on a child’s skills and abilities in relation to three ‘constructs’ labelled as ‘characteristics of effective learning’: playing and exploring, active learning (motivation), and creating and thinking critically.

We can take one of these characteristics of effective learning to illustrate a serious problem with the validity of the assessment. While a child might well demonstrate creativity and critical thinking (the third characteristic listed), it is now well established that such behaviours are NOT skills or abilities that can be learnt in one context and transferred to another, entirely different context. They don’t universalise any more than ‘loving’ does. In fact, the capacity to be creative or to think critically depends on specific knowledge of the issue in question. Many children can think very critically about football, but that apparent ability evaporates when they are faced with some maths. You think critically in maths because you know a lot about solving similar maths problems, and this capacity won’t make you think any more critically when solving something different, like a word puzzle or a detective mystery.

Creating and thinking critically are NOT skills or abilities that can be learnt in one context and then applied to another

Creating and thinking critically are not ‘constructs’ which can be taught and assessed in isolation, so no valid general inference about these behaviours can be observed and reported as a ‘characteristic of learning’. If you wish a child to display critical thinking, you should teach them lots of relevant knowledge about the specific material you would like them to think critically about.

In fact, what is known about traits such as critical thinking suggests that they are ‘biologically primary’ and don’t even need to be learned [see an accessible explanation here].

Moving on to another characteristic of effective learning: active learning, or motivation. This presupposes both that ‘motivation’ is a universalising trait and that we are confident we know how to inculcate it. In fact, as with critical thinking, it is perfectly possible to be involved and willing to concentrate in some activities (computer games) but not others (writing).

There has been high-profile research on motivation, particularly Dweck’s work on growth mindset and Angela Duckworth’s on grit. Duckworth has created a test that she argues demonstrates that adult subjects possess a universalising trait she calls ‘Grit’. But even this world expert concedes that we do not know how to teach grit, and she rejects the use of her Grit scale for high-stakes tests. Regarding growth mindset, serious doubts have been raised about failures to replicate Dweck’s research findings and about statistically insignificant results that have been used to support it.

Despite serious questions around the teaching of motivation, the EYFS Profile ‘characteristics of learning’ presume that this is a trait that can be inculcated in pre-schoolers and, without solid research evidence, that it can be reliably assessed.

For the final characteristic of effective learning, playing and exploring: of course children learn when playing. This does not mean the behaviours to be assessed under this heading (‘finding out and exploring’, ‘using what they know in play’ or ‘being willing to have a go’) are any more universalising as traits, or any less dependent on context, than the other characteristics discussed. It cannot simply be presumed that they are.

Problems with the ‘Early Learning Goals’

At the end of reception, each child’s level of development is assessed against the 17 EYFS Profile ‘Early Learning Goals’. In my previous post I discussed the problems with the reliability of this assessment. We also see the problem of construct validity in many of the assumptions within the Early Learning Goals. Some goals are clearly not constructs in their own right, and others may well not be; serious questions need to be asked about whether they are universalising traits or actually context-dependent behaviours.

For example, ELG 2 is ‘understanding’. Understanding is not a generic skill; it is dependent on domain-specific knowledge. True, a child does need to know the meaning of the words ‘how’ and ‘why’, which are highlighted in the assessment. But while understanding is a goal of education, it can’t be assessed generically: you have to understand something, and understanding one thing does not mean you will understand something else. The same is true for ‘being imaginative’ (ELG 17).

An example of evidence of ELG 2, understanding, in the EYFS profile exemplification materials.

Are ELG 1 ‘listening and attention’ or ELG 16 ‘exploring and using media and materials’ actually universalising constructs? I rarely see qualitative and observational early years research that even questions whether these early learning goals are universalising traits, let alone looks seriously at whether they can be assessed. This is despite decades of research in cognitive psychology leading to a settled consensus that challenges many of the unquestioned constructs underpinning EYFS assessment.

It is well known that traits such as understanding, creativity, critical thinking don’t universalise. Why, in early years education, are these bogus forms of assessment not only used uncritically but allowed to dominate the precious time when vulnerable children could be benefiting from valuable teacher attention?

n.b. I have deliberately limited my discussion to a critique using general principles of assessment rather than arguments that would need to be based on experience or practice.

Early years assessment is not reliable or valid and thus not helpful

The academic year my daughter was three she attended two different nursery settings. She took away two quite different EYFS assessments at the end of the year, one from each setting. The disagreement between them was not a one-off mistake, nor due to incompetence, but inevitable: EYFS assessment does not meet the basic requirements of effective assessment, that it should be reliable and valid*.

We have well-researched principles to guide educational assessment, and these principles can and should be applied to the ‘Early Years Foundation Stage Profile’. This is the statutory assessment used nationally to assess the learning of children up to the age of 5. The purpose of the EYFS assessment profile is summative:

‘To provide an accurate national data set relating to levels of child development at the end of EYFS’

It is also used to ‘accurately inform parents about their child’s development’. The EYFS profile is not fit for these purposes and its weaknesses are exposed when it is judged using standard principles of assessment design.

EYFS profiles are created by teachers when children are 5 to report on their progress against 17 early learning goals and describe the ‘characteristics of their learning’. The assessment is through teacher observation. The profile guidance stresses that,

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration taken from EYFS assessment exemplification materials for reading.

Thus the EYFS Profile exemplification materials for literacy and maths only give examples of assessment through teacher observations made while children are engaged in activities they have chosen to play (child-initiated activities). This is a very different approach from the subsequent assessment of children throughout their later schooling, which is based on tests created by adults. The EYFS Profile writers no doubt wanted to avoid what Wiliam and Black (1996) call the ‘distortions and undesirable consequences’ created by formal testing.

Reaching valid conclusions in formal testing requires:

  1.    Standard conditions – so there is reassurance that all children receive the same level of help.
  2.    A range of difficulty in the items used for testing – carefully chosen test items will discriminate between the proficiency of different children.
  3.    Careful selection of content from the domain to be covered – to ensure the items are representative enough to allow an inference about the whole domain. (Koretz, pp. 23-28)

The EYFS profile is specifically designed to avoid the distortions created by such restrictions that lead to an artificial test environment very different from the real life situations in which learning will need to be ultimately used. However, as I explain below, in so doing the profile loses necessary reliability to the extent that teacher observations cannot support valid inferences.

This is because, when assessing summatively, the priority is to create a shared meaning about how pupils will perform beyond school and in comparison with their peers nationally (Koretz, 2008). As Wiliam and Black (1996) explain, ‘the considerable distortions and undesirable consequences [of formal testing] are often justified by the need to create consistency of interpretation.’ This is why GCSE exams are not currently sat in authentic contexts, with clipboard-carrying teachers (as in EYFS) observing children in attempted simulations of real-life situations. Teacher observation can be very useful for an individual teacher when assessing formatively (deciding what a child needs to learn next), but the challenges of obtaining a reliable shared meaning nationally that stop observational forms of assessment being used for GCSEs do not just disappear because the children involved are very young.

Problems of reliability

Reliability: Little inconsistency between one measurement and the next (Koretz, 2008)

Assessing child initiated activities and the problem of reliability:

The variation in my daughter’s two assessments was unsurprising given that…

  • Valid summative conclusions require ‘standardised conditions of assessment’ between settings, and this is not possible when observing child-initiated play.
  • Nor is it even possible to create comparable tasks, ranging in difficulty, that all the children in one setting will attempt.
  • The teacher cannot be sure their observations effectively identify progress in each separate area, as they have to make do with whatever children choose to do.
  • These limitations make it hard to standardise between children even within one setting, and make it unsurprising that the two nurseries had built different profiles of my daughter.
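The inconsistency described above can be made concrete. When two settings rate the same children against the same goal, the agreement between their judgements can be quantified; a standard statistic for this is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch follows – the ratings and setting names are invented purely for illustration, not taken from any real EYFS data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected from each rater's label frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # observed agreement: proportion of children given the same rating
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement from each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# hypothetical judgements for six children from two nursery settings
setting_1 = ["expected", "expected", "emerging", "expected", "emerging", "expected"]
setting_2 = ["expected", "emerging", "emerging", "expected", "expected", "expected"]
print(round(cohens_kappa(setting_1, setting_2), 2))  # → 0.25
```

A kappa near 1 would indicate the two settings largely agree beyond chance; a value near 0, as here, indicates their ratings agree little more than random guessing would – the kind of disagreement the bullet points above predict for unstandardised observational assessment.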

The EYFS Profile Guide does instruct that practitioners ‘make sure the child has the opportunity to demonstrate what they know, understand and can do’ and does not preclude all adult initiated activities from assessment. However, the exemplification materials only reference child initiated activity and, of course, the guide instructs practitioners that

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration from EYFS assessment exemplification materials for writing. Note these do not have examples of assessment from written tasks a teacher has asked children to undertake – ONLY writing voluntarily undertaken by the child during play.

Assessing adult initiated activities and the problem of reliability

Even when some children are engaged in an activity initiated or prompted by an adult:

  • The setting cannot ensure the conditions of the activity have been standardised; for example, it isn’t possible to predict how a child will choose to approach a number game set up for them to play.
  • It’s not practically possible to ensure the same task has been given to all children under the same conditions, so as to discriminate meaningfully between them.

Assessment using ‘a range of perspectives’ and the problem of reliability

The EYFS profile handbook suggests that:

‘Accurate assessment will depend on contributions from a range of perspectives…Practitioners should involve children fully in their own assessment by encouraging them to communicate about and review their own learning…. Assessments which don’t include the parents’ contribution give an incomplete picture of the child’s learning and development.’

A parent’s contribution taken from EYFS assessment exemplification materials for number

Given the difficulty one teacher will have observing all aspects of 30 children’s development, it is unsurprising that the profile guide stresses the importance of contributions from others to increase the validity of inferences. However, it is incorrect to claim that the input of the child or of parents will make the assessment more accurate for summative purposes. With this feedback, the conditions, difficulty and specifics of the content will not have been considered, creating unavoidable inconsistency.

Using child-led activities to assess literacy and numeracy and the problem of reliability

The reading assessment for one of my daughters seemed oddly low. The reception teacher explained that while she knew my daughter could read at a higher level, the local authority guidance on the EYFS Profile said her judgement must be based on ‘naturalistic’ behaviour. She had to observe my daughter (one of 30) voluntarily going to the book corner, choosing to read aloud to herself at the requisite level and volunteering sensible comments on her reading.


Illustration taken from EYFS assessment exemplification materials for reading. Note these do not include examples of assessment from reading a teacher has asked children to undertake – ONLY reading voluntarily undertaken by the child during play.

The determination to privilege assessment of naturalistic behaviour is understandable when assessing how well a child interacts with their peers. However, the reliability sacrificed in the process can’t be justified when assessing literacy or maths. The success of explicit testing in these areas suggests they do not need the same naturalistic criteria for a valid inference to be made from the assessment.

Are teachers meant to interpret the profile guidance in this way? The guidance is unclear, but while the exemplification materials include only examples of naturalistic observational assessment, we are unlikely to acquire accurate assessments of reading, writing and mathematical ability from EYFS profiles.

Five-year-olds should not sit test papers in formal exam conditions, but this does not mean that observation in naturalistic settings (whether adult- or child-initiated) is the only reasonable or the most reliable option. The inherent unreliability of observational assessment means results can’t support the inferences required for such summative assessment to be a meaningful exercise. It cannot, as intended, ‘provide an accurate national data set relating to levels of child development at the end of EYFS’ or ‘accurately inform parents about their child’s development’.

In my next post I explore the problems with the validity of our national early years assessment.

 

*n.b. I have deliberately limited my discussion to a critique using assessment theory rather than arguments that would need to be based on experience or practice.

References

Koretz, D. (2008). Measuring Up: What Educational Testing Really Tells Us. Cambridge, MA: Harvard University Press.

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile. Retrieved from https://www.gov.uk/government/publications/early-years-foundation-stage-profile-handbook

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile: exemplification materials. Retrieved from https://www.gov.uk/government/publications/eyfs-profile-exemplication-materials

Wiliam, D., & Black, P. (1996). Meanings and consequences: a basis for distinguishing formative and summative functions of assessment? British Educational Research Journal, 22(5), 537-548.