Early years assessment is not reliable or valid and thus not helpful

In the academic year my daughter was three, she attended two different nursery settings. She took away two quite different EYFS assessments, one from each setting, at the end of the year. The disagreement between these was not a one-off mistake or due to incompetence but inevitable, because EYFS assessment does not meet the basic requirements of effective assessment – that it should be reliable and valid*.

We have very well researched principles to guide educational assessment, and these principles can and should be applied to the ‘Early Years Foundation Stage Profile’. This is the statutory assessment used nationally to assess the learning of children up to the age of 5. The purpose of the EYFS assessment profile is summative:

‘To provide an accurate national data set relating to levels of child development at the end of EYFS’

It is also used to ‘accurately inform parents about their child’s development’. The EYFS profile is not fit for these purposes and its weaknesses are exposed when it is judged using standard principles of assessment design.

EYFS profiles are created by teachers when children are 5 to report on their progress against 17 early learning goals and describe the ‘characteristics of their learning’. The assessment is through teacher observation. The profile guidance stresses that,

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration is taken from EYFS assessment exemplification materials for reading

Thus the EYFS Profile exemplification materials for literacy and maths only give examples of assessment through teacher observations when children are engaged in activities they have chosen to play (child initiated activities). This is a very different approach to subsequent assessment of children throughout their later schooling, which is based on tests created by adults. The EYFS profile writers no doubt wanted to avoid what Wiliam and Black (1996) call the ‘distortions and undesirable consequences’ created by formal testing.

Reaching valid conclusions in formal testing requires:

  1.    Standard conditions – means there is reassurance that all children receive the same level of help
  2.    A range of difficulty in items used for testing – carefully chosen test items will discriminate between the proficiency of different children
  3.    Careful selection of content – from the domain to be covered to ensure they are representative enough to allow for an inference about the domain. (Koretz pp23-28)

The EYFS profile is specifically designed to avoid the distortions created by such restrictions that lead to an artificial test environment very different from the real life situations in which learning will need to be ultimately used. However, as I explain below, in so doing the profile loses necessary reliability to the extent that teacher observations cannot support valid inferences.

This is because when assessing summatively the priority is to create a shared meaning about how pupils will perform beyond school and in comparison with their peers nationally (Koretz 2008). As Wiliam and Black (1996) explain, ‘the considerable distortions and undesirable consequences [of formal testing] are often justified by the need to create consistency of interpretation.’ This is why GCSE exams are not currently sat in authentic contexts with teachers with clipboards (as in EYFS) observing children in attempted simulations of real life contexts. Using teacher observation can be very useful for an individual teacher when assessing formatively (deciding what a child needs to learn next) but the challenges of obtaining a reliable shared meaning nationally that stop observational forms of assessment being used for GCSEs do not just disappear because the children involved are very young.

Problems of reliability

Reliability: Little inconsistency between one measurement and the next (Koretz, 2008)

Assessing child initiated activities and the problem of reliability:

The variation in my daughter’s two assessments was unsurprising given that…

  • Valid summative conclusions require ‘standardised conditions of assessment’ between settings and this is not possible when observing child initiated play.
  • Nor is it possible to even create comparative tasks ranging in difficulty that all the children in one setting will attempt.
  • The teacher cannot be sure their observations effectively identify progress in each separate area as they have to make do with whatever children choose to do.
  • These limitations make it hard to standardise between children even within one setting and unsurprising that the two nurseries had built different profiles of my daughter.

The EYFS Profile Guide does instruct that practitioners ‘make sure the child has the opportunity to demonstrate what they know, understand and can do’ and does not preclude all adult initiated activities from assessment. However, the exemplification materials only reference child initiated activity and, of course, the guide instructs practitioners that

‘…to accurately assess these characteristics, practitioners need to observe learning which children have initiated rather than focusing on what children do when prompted.’

Illustration from EYFS assessment exemplification materials for writing. Note these do not have examples of assessment from written tasks a teacher has asked children to undertake – ONLY writing voluntarily undertaken by the child during play.

Assessing adult initiated activities and the problem of reliability

Even when some children are engaged in an activity initiated or prompted by an adult

  • The setting cannot ensure the conditions of the activity have been standardised, for example it isn’t possible to predict how a child will choose to approach a number game set up for them to play.
  • It’s not practically possible to ensure the same task has been given to all children in the same conditions to discriminate meaningfully between them.

Assessment using ‘a range of perspectives’ and the problem of reliability

The EYFS profile handbook suggests that:

‘Accurate assessment will depend on contributions from a range of perspectives…Practitioners should involve children fully in their own assessment by encouraging them to communicate about and review their own learning…. Assessments which don’t include the parents’ contribution give an incomplete picture of the child’s learning and development.’

A parent’s contribution taken from EYFS assessment exemplification materials for number

Given the difficulty one teacher will have observing all aspects of 30 children’s development, it is unsurprising that the profile guide stresses the importance of contributions from others to increase the validity of inferences. However, it is incorrect to claim the input of the child or of parents will make the assessment more accurate for summative purposes. With this feedback, the conditions, difficulty and specifics of the content will not have been considered, creating unavoidable inconsistency.

Using child-led activities to assess literacy and numeracy and the problem of reliability

The reading assessment for one of my daughters seemed oddly low. The reception teacher explained that while she knew my daughter could read at a higher level, the local authority guidance on the EYFS profile said her judgement must be based on ‘naturalistic’ behaviour. She had to observe my daughter (one of 30) voluntarily going to the book corner, choosing to read out loud to herself at the requisite level and volunteering sensible comments on her reading.


Illustration is taken from EYFS assessment exemplification materials for reading. Note these do not have examples of assessment from reading a teacher has asked children to undertake – ONLY reading voluntarily undertaken by the child during play.

The determination to preference assessment of naturalistic behaviour is understandable when assessing how well a child can interact with their peers. However, the reliability sacrificed in the process can’t be justified when assessing literacy or maths. The success of explicit testing of these areas suggests they do not need the same naturalistic criteria to ensure a valid inference can be made from the assessment.

Are teachers meant to interpret the profile guidance in this way? The profile is unclear but while the exemplification materials only include examples of naturalistic observational assessment we are unlikely to acquire accurate assessments of reading, writing and mathematical ability from EYFS profiles.

Five year olds should not sit test papers in formal exam conditions, but this does not mean only observation in naturalistic settings (whether adult or child initiated) is reasonable or the most reliable option. The inherent unreliability of observational assessment means results can’t support the inferences required for such summative assessment to be a meaningful exercise. It cannot, as intended, ‘provide an accurate national data set relating to levels of child development at the end of EYFS’ or ‘accurately inform parents about their child’s development’.

In my next post I explore the problems with the validity of our national early years assessment.


*n.b. I have deliberately limited my discussion to a critique using assessment theory rather than arguments that would need to be based on experience or practice.


Koretz, D. (2008). Measuring Up. Cambridge, Massachusetts: Harvard University Press.

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile. Retrieved from https://www.gov.uk/government/publications/early-years-foundation-stage-profile-handbook

Standards and Testing Agency. (2016). Early Years Foundation Stage Profile: exemplification materials. Retrieved from https://www.gov.uk/government/publications/eyfs-profile-exemplication-materials

Wiliam, D., & Black, P. (1996). Meanings and consequences: a basis for distinguishing formative and summative functions of assessment? BERJ, 537-548.

35 thoughts on “Early years assessment is not reliable or valid and thus not helpful”

  1. Anyone who was unfamiliar with the ideological world of primary education would rub their eyes in disbelief after reading the official guidance on EYFS assessments. Never mind the reliability; it’s hard to say exactly what’s being assessed. Any test or assessment which measures complex behaviours tells us very little about individual strengths and weaknesses, or in fact what children have learned and what they haven’t. Wiliam and Black notwithstanding, it’s very easy to test simple behaviours, such as knowledge of GPCs and blending skills or whether a child can count and add single-digit numbers. Five-year-olds are not too young for such simple assessments, which have the advantage of being easily administered. Even more to the point, they serve to focus teachers’ efforts on concrete and do-able tasks.

    I’m hoping that Katherine Birbalsingh succeeds in her aim of starting an all-through 5-18 school, where I am sure we will find out how much more children learn when teachers aren’t waiting around to observe ‘child-initiated’ activities. Am I the only one who finds the notion of learning through ‘child-initiated’ play a bit creepy? Of course adults should cast an eye over young children’s play to ensure that no one is getting bullied or hurt, and adults should play games with children or even just mess around with them. It’s natural, even with a lot of animals. But the idea of teachers watching kids’ spontaneous play and then trying to nudge their play in a desired direction just seems a bit pervy to me.

    1. Hello Tom
      A number of us have been discussing your comments and would like some clarification / elaboration on why you consider such interaction with children to be ‘pervy’. Additionally, we are interested in what is meant by ‘messing around’ with children like animals do.
      Thank you in advance.

  2. It’s not pervy at all with really young children, two year olds for example. That’s what parents naturally do with their young children. The problem with the EYFS framework being used in schools is that it is birth to five, and not so useful for children at the older range. It’s hopeless for summative assessment. We ignore most of the strictures about stuff being child initiated for assessment purposes; it’s unreliable, inefficient and pointless. A couple of reception children did the year 1 end of year maths assessment last year and did very well. Big argument with moderator who said we couldn’t use it as evidence that these children were ‘exceeding.’ Complete nonsense!

    1. Madness that you weren’t allowed to use a maths test. Particularly an issue in maths with the presumption that, what is by nature a study of the abstract, can only be assessed through concrete uses. Grasp of the abstract is often necessary before application is possible to the contextualised problem.

  3. Great post. I am hoping that the assessment exemplification is only of one kind (child-led, play-based observation) because the other kind (teacher-led) doesn’t really exist at the moment! The worry is how, if you were in charge of EYFS, to change the rhetoric to be more proactive about providing teaching and learning for disadvantaged children without falling foul of the authorities’ requirements to make everything play-based.

  4. Snap! My daughter also attended two pre-school settings and so received two EYFS assessments of her learning. Neither bore any resemblance to the other, or indeed to my view of her capabilities. My view is that this is of little consequence in pre-school settings, where the only thing that really matters is the identification of serious learning/communication/social difficulties. But it continued into Reception year, where their baseline ‘assessment’ deemed she knew no phonic sounds and so she spent a term sitting on a table learning how to say ‘mmmmmm’. Meanwhile at home she had long ago learnt all her phonic sounds and was happily blending to read simple books.

    1. It is worrying. I have heard of data out there on the disparity between EYFS profile and later reading assessments. The pre-school assessment would be harmless if it didn’t take up so much teacher time. I worry about the opportunity cost of all that teacher time following children around with a camera and Post-it notes or an iPad, and the hours spent analysing and collating the lot.

  5. Any assessment can only be on what is exposed. As we know, most of our thoughts will never come to light, as most of our life is secret and lived within our heads. Hence many children can be assessed as below and later in life flourish as talented, and vice versa. The accent should always be on feeding the pig and not weighing it. Not literally, of course, as being vegetarian I love pigs.

  6. One of the problems we have in dealing with this kind of pernicious nonsense is Article 12 of the UN Convention on the Rights of the Child. In 2009, the UN committee overseeing its implementation offered this guidance:

    “Research shows that the child is able to form views from the youngest age, even when she or he may be unable to express them verbally. Consequently, full implementation of article 12 requires recognition of, and respect for, non-verbal forms of communication including play, body language, facial expressions, and drawing and painting, through which very young children demonstrate understanding, choices and preferences.”

    Rest assured that the DfE takes Article 12 very seriously. After all, it’s another empire-building opportunity for early-years professionals.

    1. I had no idea of this. Surely though we can recognise and respect non-verbal communication without building a nonsensical assessment system around it?

      1. We can, and do. To be fair to Article 12, it is mostly aimed at younger children who are pre-language, i.e. birth – 2 years, rather than children in YR, or 3 – 5, which accounts for the vast majority of children in EYFS settings.

      2. Although the passage I quoted obviously refers to very young children, Article one states:
        “The Convention defines a ‘child’ as a person below the age of 18, unless the laws of a particular country set the legal age for adulthood younger. The Committee on the Rights of the Child, the monitoring body for the Convention, has encouraged States to review the age of majority if it is set below 18 and to increase the level of protection for all children under 18.”

      3. Still, article 12 does specifically refer to the ‘youngest children’. I’m assuming that this doesn’t mean 18 year olds

      4. Article 12 established the right of all children to participate in making decisions that affect them. All schools in England are required to reflect this in their education plan, and I can assure you from personal experience that the DfE are quite zealous in the matter.

  7. This is an interesting piece from a specialist EYFS assessment perspective and I share some of your concerns around consistency. (As a point of information, a 3 year old should not be assessed under the EYFSP as it is solely for the end of YR. Why the setting referred to advice received from the LA on assessing a child of this age using it is a mystery). It is also a misinterpretation that all information needs to be gleaned from a ‘naturalistic’ context that is described. The ‘practitioner led observational assessment’ used for the EExBA Baseline made this clear and that some aspects of assessment generally require teacher interaction to glean the necessary information and I would argue that this is congruent with the statutory requirements for assessment outlined in the EYFS framework.
    I would argue that it is not entirely true that the principles of reliability that you cite cannot be met within this approach, and with effective and comprehensive moderation and a range of exemplification (and I accept your point that this often over focusses on Child-initiated activities) consistency can be supported and secured. Effective EYFS assessment requires the professional responsibility to use a flexible and responsive approach to collect the necessary information.
    However, as with many forays into critiquing EYFS assessment, there is an assumption that ‘testing’ in whatever form is more reliable. Although this may be true for older children, I am afraid that the evidence for this kind of approach for children aged B – 5 suggests that this is not the case. A good place to start might be with Meisels, S.J. 1993. “Remaking the classroom Assessment with the work sampling system.” Young Children. 48(5):34-40.
    Testing young children in the way advocated by CEM and NfER relies on making huge assumptions about proxies for learning and knowledge that are impossible to prove, nor does it take into account the nature of children at this age, the range of perceptions they hold, the impact of previous experiences and the emotional dimensions of the child at the point that the test takes place and how this might affect their response. Elaine Mason’s work on the previous Baseline models (1998) demonstrated the inconsistent results children attained depending on the day of the week and the time of day. This was why the original FSP and its later manifestations used the observational model as their core.
    I’d be more than happy to have a broader face to face discussion if you are interested in exploring the challenges and issues of assessment in the EYFS. As I stated, I do share some of your concerns from what you have described, but I also believe that there is a lot of misunderstanding of what makes for effective EYFS assessment – from both ‘sides’ – and in order to get it right; which is really important, there needs to be a genuine discussion.

    1. Thank you very much for your comment. I’ll reply properly later when I have more time but just to clarify. My daughter’s reading assessment was at the end of reception. In case anyone else is confused by this I will make this clear in the post. I also explain in the post that it is not the case that all information has to be gleaned from a naturalistic context but the strong priority on this is clear.

      1. Regarding the key points you raise. There is a debate to be had about using baseline testing of the sort offered by CEM and NfER and the onus is on those developing models of baseline testing to demonstrate their fitness for purpose. From what I’ve read it seems likely they can provide meaningful data at a cohort level but can’t be used to track individual progress but I haven’t researched this in detail and I’m quite open to be persuaded either way.
        However, I can’t see how the perceived weaknesses of one model can justify continuing with the EYFS profile or using another observational approach because these are clearly not fit for purpose – as I explain. Moderation and exemplification simply can’t overcome the issues I have raised over reliability (let alone the other issues that I will cover in subsequent posts). Exemplification and moderation aren’t a substitute for standardised conditions and can’t make up for gaps in what has been observed.

      2. I am still intrigued by the assertion that observational assessment is not ‘fit for purpose’ and the test based approaches, in your opinion, are. There is no evidence that supports this view and, for example, the EExBA data for 400k children was consistent overall. I agree that observational assessment can be variable as it is teacher judgement, but this is possible to address. However, test based assessment contains systemic assumptions and cognitive biases that render it unreliable and invalid. I am not convinced by the results of test data that it does provide the information you state. CEM’s analysis of their own data indicates that it is a poor indicator of later outcomes, even for a cohort, with something like a 50% chance of accuracy (I may be wrong about the figure but I believe it is in that region).

      3. I am not really trying to make an argument for CEM or any other baseline. Any possible failings of one approach don’t justify another. I am explaining why the EYFS profile is not fit for purpose because it does not allow the creation of a ‘common meaning’.

      4. There is a ‘common meaning’ which is expressed through the guidance, the exemplification and through rigorous moderation which takes place continually. The ‘common meaning’ in the EYFSP and EExBA is different to the one expressed through test data but is nevertheless valid and fit for purpose. Your critique focuses on what you believe to be the inconsistency of data from observational assessment; the ‘shared meaning’ is clear and very separate to this.

      5. Moderation cannot solve the problem of non standardised conditions. The term ‘shared meaning’ as used by Koretz requires:
        1. Standard conditions – means there is reassurance that all children receive the same level of help
        2. A range of difficulty in items used for testing – carefully chosen test items will discriminate between the proficiency of different children
        3. Careful selection of content – from the domain to be covered to ensure they are representative enough to allow for an inference about the domain. (Koretz pp23-28)
        These conditions are required for reliable summative assessment and the EYFS Profile is unable to fulfil these criteria.

      6. 1. Standard conditions – means there is reassurance that all children receive the same level of help
        2. A range of difficulty in items used for testing – carefully chosen test items will discriminate between the proficiency of different children
        3. Careful selection of content – from the domain to be covered to ensure they are representative enough to allow for an inference about the domain. (Koretz pp23-28)
        These conditions are required for reliable summative assessment and the EYFS Profile is unable to fulfil these criteria.

        Absolutely not true
        1. Standard conditions, although different, (and remember we are dealing with 4 and 5 year olds not 11 – 16 year olds) can still be consistent, and the level of ‘support’ will be the same when making the assessment. Part of the surety of this is that, ironically, it will mostly be child-initiated
        2. The profile does make a range of judgements; essentially within, below and above national expectations. The exemplification clarifies this
        3. The criteria consist of the content that defines the child’s knowledge in the specific area and are therefore representative.

        And forgive me, but I have never heard of Koretz. I may be wrong, but this could be because Koretz does not have an ECE focus. On the flipside, are you familiar with the work of Carr, Edgington, Hutchins, Pascal, Leavers, (and me)?

  8. As an EYFSP moderator I can confirm that there is no requirement for all or most of evidence to be ‘naturalistic’ or child-initiated. In reality few schools are able to provide a majority of child-initiated evidence for EYFSP, particularly for Number, Reading & Writing. As an EY teacher I would say there is good reason for this, and many children need strong adult guidance to learn the skills and knowledge before they are confident to explore and apply academic areas by themselves. Indeed, the EYFS covers the birth to 5 age range, and clearly states that there should be an increasing balance of adult-led, whole class teaching towards the end of the EYFS in preparation for formal learning.
    Where there is little evidence of independent behaviours, it would be usual to make reference to it in the moderation notes to the head that this is so, and to make sure this informs Y1 teachers and planning, as children may only be able to demonstrate knowledge & skills with adult guidance and support.
    The issues with teaching and learning in reception have more to do with a dearth of very poor synthetic phonics teaching in schools, and in YR in particular. The EYFSP Reading and writing goals are explicitly about decoding, blending and segmenting, but all too often are marred by poor interpretation and top-down literacy, reading recovery benchmarking and lack of decodable schemes, that completely ignore the clear expectations of the Profile. This is a problem with schools not the Profile.
    It should be remembered that EYFS is what it says – a foundation. It should also be noted that a great many children enter reception without the foundations for learning, and must be given broad and balanced provision to ensure they have the opportunity to develop the communication & language, social and emotional, and physical skills needed to be able to have equitable access to year 1 learning.

    1. Thanks for this. I pretty much agree with all your sentiments especially regarding reading. The comments on here indicate that there is real variation in assessment practices. It is very important that the EYFS profile and exemplification materials are revised in line with your much more sensible approach. The interpretations being so wildly variable simply adds to the unreliability as assessments between schools in different areas aren’t therefore comparable. It would be marvellous if this also led to an end to the crazy ‘evidencing’ that reception teachers are required to spend their time doing.

  9. EYFS or any assessments of any young children are an assessment of that child’s progress at that time when being observed, not a test. Testing is never mentioned in any good practice early years documentation. Data is purely the notes that professionals use to provide evidence of progress made by an individual and not in any way similar to GCSE data or other.

    I have never read such nonsense. Clearly, you need to widen your knowledge of early childhood development before you are in any way qualified to write a blog based on this.

    This is evidence to support the notion that the internet is full of nonsense!

    1. As I explain in my post the EYFS Profile is a statutory assessment used nationally to assess the learning of children up to the age of 5. The purpose of the EYFS assessment profile is ‘to provide an accurate national data set relating to levels of child development at the end of EYFS’. It is also used to ‘accurately inform parents about their child’s development’. As I explain in the post the sort of data you refer to is not fit for this purpose.

      1. Information from the EYFSP is used to inform parents of developments, achievements and attainment that children have made during YR. Your post is an analysis of observational assessment not how it is used for parents. It is absolutely fit for that specific purpose

  10. I don’t see anything here that can’t be addressed via moderation and training. The EYFSP is an end of stage summary and it isn’t the only thing in education that is assessed by teachers… Even now, year six have teacher assessment for writing, for instance, and this will always be slightly subjective, as is all assessment that is not based on absolute rights and wrongs. That’s why A-Level papers can be remarked and graded differently.

    The reason that the majority of judgements in the EYFS are based on child initiated learning is because that is how young children (and, I’d argue, all children) learn best. If they spend most of their time learning that way, it makes sense to assess them that way. As they move up the school this isn’t the case (it should be, in my opinion) but this doesn’t mean we should change the way reception works to prepare them for it. That’s like trying to teach a newborn to walk, because they’ll have to do it later on. Yes, but they aren’t yet ready. We need to stop basing our ideas of Early Years learning and assessment on what the rest of the school wants or does and work for the best interest of our children.

    1. Thanks for your comment. Moderation and training do not solve the problem of ensuring ‘a shared meaning’ – they are not able to do so. The reasons why EYFS uses child initiated assessment are well known but unfortunately the more naturalistic the assessment the less control there is over the conditions of that assessment. This is why the profile doesn’t meet basic criteria for reliability and validity.

  11. With all due respect Heather – and I do genuinely respect and appreciate your interest in EYFS assessment – this isn’t the experience of undertaking EY assessment nor my experience of being national lead for the EYFSP from 2005 – 2011 (I guess I should have mentioned that). What you describe as ‘naturalistic’ assessment has always been a key part of ECE pedagogy and assessment but that doesn’t mean it is less reliable. The moderation procedures and exemplification that have been developed to support EYFSP have greatly strengthened this and, notwithstanding the concerns you have – and I share some of them – the ‘shared meaning’ is a strong core of the EYFSP. I wonder if your own experiences with your own children have coloured this; I can assure you what you have described is not universal by any means; and as I have suggested you would be welcome to meet at the EEx centre in London and explore the aspects of this more fully.

  12. The purpose of moderation is to ensure that judgements are consistent, so that knowledge demonstrated by the child is similarly assessed wherever they are, and in whatever context. The focus is on demonstrating the knowledge being assessed, and yes, the context may be different. I appreciate that this is very different from test based models but we are dealing with children at a different level of development and experiences and so the process may reflect that. For example, one child may demonstrate the knowledge of number by counting shells in a bucket, another may demonstrate this by counting the number of times a ball bounces. The conditions are technically different but the aspect being assessed is the same and can be moderated to be consistent.

  13. From his university page:
    Daniel Koretz is an expert on educational assessment and testing policy. A primary focus of his work has been the impact of high-stakes testing.

    I suspect this means that he does not have a particularly ECE focus or specialism. It can be dangerous to generalise. Imagine if 16 year olds were assessed by observing them playing in the sand area; it wouldn’t really work would it?
