KS3 Assessment. 8 steps towards a workable system.

[Image: part of our KS3 Assessment Guide]

We are about to launch our KS3 assessment system.  I’ve shared the full details in this post.  We’ve arrived at this model after considering all the following issues/questions/factors:

1.  Accept the reasons that NC Levels became a broken system. 

This has been covered by lots of people in great detail, but here’s a quick summary:

  • Prose level descriptors were virtually impossible to apply consistently or meaningfully.
  • Levels originated to define broad range attainment standards at the end of a Key Stage; they never worked as a ladder of progress, certainly not in sub-level form. They also did not hold water in relation to specific elements of content – bits of science or maths for example.  Content X = Level Y is a nonsense because all knowledge has varying degrees of depth and complexity to which it can be learned.   It pains me that in my school students had ‘6b’ written on a piece of work in some subjects. As if that could be known.
  • The illusion of progress from 6c to 6b to 6a etc was divorced from assessment processes robust enough to measure to that degree of precision; a lot of that was driven by data tracking systems and teachers putting their finger in the air.  Variation within and between subjects was huge – no amount of moderation could safely line up a 6b in Geography, French and Maths within schools or between schools.
  • A very high proportion of discourse with parents and students and between teachers became focused on the numbers rather than the learning.

And no, something is not necessarily better than nothing.

2. Think about the way teachers in each subject evaluate standards and describe how to improve. 

Every subject has distinctive features. To improve a piece of writing in English or History, you need to focus on some specific aspects in the context of the particular writing task alongside some general features that apply to all writing.  It’s too complicated to generalise into a neat list; the feedback is too specific to each student.  Grading the work can’t possibly communicate the areas for improvement.  Here, the assessment in the micro is the key and students can only focus on a few things to practise and improve at any one time.  General descriptors of writing only make sense in relation to specific examples.  The same issues hold in Art.

In Maths and Science, you cover lots of bits of content and some repeated skills each of which can only be measured in relation to specific questions.  Tests with a certain range of questions are the most commonsense way to gauge progress.  Of course, some students express ideas verbally too and that will also inform teachers’ judgements. A score of 80% tells you something; a score of 40% tells you something – if you know the test. However, it’s the specifics of which questions were right/wrong that contain the learning.

So – it’s complicated.  It’s all in the detail. Any KS3 assessment system needs to ensure maximum focus remains on the detail.  Aggregating things up to a simple measure is always going to be flawed.

3. Balance the micro and the macro: keep in mind the need for a system that actually helps students to learn but is also manageable in scale.

I’ve seen some systems being developed that have become heavily bureaucratic with lots of data entry required against prose descriptors and ‘can do’ statements.   It’s tempting to think we could track progress along the axis of ‘solving algebraic expressions’ or ‘writing to persuade’ in a helpful and meaningful way. However, once you add up all the different axes required, you’re talking about a massive system with thousands of data points.  I’d suggest there is a limit that is quickly reached before this is unsustainable.   (I remember being presented with 17 different tick-sheets for every student in my class in the early 90s – when Science had 17 attainment targets in the National Curriculum.  I left them all blank, knowing the system would die.  It died in weeks.)

I think it all needs to be much more organic than that; lived, not tracked.  Our Assignments are meant to be an organic tracking system to be annotated by students and teachers in books.  We want the micro learning to be prominent but not cumbersome.  I wouldn’t dream of putting all the assignments online – no-one has time to track at that level of granularity.

4. Don’t do ‘can do’ statements.

I know this is a popular path that different schools have gone down but I think it’s a mistake.  As Daisy Christodoulou and others have shown, ‘I can do X’ only makes sense in relation to a specific set of questions.

For example, ‘I can explain why my heart beats faster during exercise’ or ‘I can use the past tense’ are statements that can never be ticked off securely. They depend on the degree of depth, the level of complexity, the context, the extent to which they are sustained.  Even if the ticking off is staged (approaching, securing, mastering etc) it’s the same problem.

Not only are ‘Can do’ statements flawed in judging their completion, they also fall foul of Issue 3 – it’s going to be a massively cumbersome system. You may as well give out copies of the GCSE specification and tick bits off.

5. Embrace the value of formative tests.

I think we all need to re-write the popular slogan: weighing the pig doesn’t fatten it.  Actually, as we should now know, weighing the pig does actually fatten it if we are talking about testing within a learning process, not just at the end. (Once again, this re-write is borrowed from Daisy C and derives from work by Robert Bjork and others.  Testing fuels learning – it’s a fact.) Test scores from tests that focus on specific elements of learning are a very efficient and effective way to determine the depth of learning and to gauge progress.  There is no value in then turning each test into some pseudo-scale ( 35/60 is a 5c?). The point of a test is that it tells you where learning and teaching are stronger and weaker, student by student and at whole-class level.  Tests are about the micro, not the macro.
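To make the micro view concrete, here is a minimal sketch in Python of how one test can be read question by question rather than as a single number. It assumes marks are recorded per question; the student names, question labels, marks and function names are all invented for illustration and are not part of our actual system.

```python
# Hypothetical data: 1 = question answered correctly, 0 = not.
# Names, question labels and marks are invented for illustration only.
results = {
    "Student A": {"Q1": 1, "Q2": 1, "Q3": 0, "Q4": 1},
    "Student B": {"Q1": 1, "Q2": 0, "Q3": 0, "Q4": 1},
    "Student C": {"Q1": 1, "Q2": 1, "Q3": 0, "Q4": 0},
}


def student_percentages(results):
    """Headline score: overall percentage for each student."""
    return {
        name: 100 * sum(marks.values()) / len(marks)
        for name, marks in results.items()
    }


def questions_missed(results):
    """The detail: which questions each student got wrong."""
    return {
        name: [q for q, mark in marks.items() if mark == 0]
        for name, marks in results.items()
    }


def class_success_by_question(results):
    """Whole-class view: fraction of students answering each question correctly."""
    questions = next(iter(results.values())).keys()
    return {
        q: sum(marks[q] for marks in results.values()) / len(results)
        for q in questions
    }


print(student_percentages(results))
print(questions_missed(results))
print(class_success_by_question(results))
```

In this made-up class, Students B and C both score 50% but missed different questions, so the same headline number hides different next steps. Q3 was missed by everyone, which points at the teaching of that topic rather than at any individual student.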

6. Accept that benchmarking is also needed:

The need to benchmark is a strong desire for all concerned:  How good is good? In addition to the micro detail of what has been learned successfully, we need a simple indicator to link that to broader brush standards.  I suppose this is what NC levels were intended for before they became corrupted.  For us, this is the last piece of our system to be introduced.  There are lots of issues to wrestle with:

  • Are you re-creating the flaws of levels with numbers dominating the discourse and false measures?
  • Are you reinforcing fixed mindset thinking by locking people into pathways?
  • Are you making false connections from the messiness of the micro of real learning to the neatness of a macro data point?
  • Are you risking introducing an institution-specific scale that can’t be moderated or linked to national standards?
  • Are you using bell-curve markers (such as GCSE grades) as a ladder? This is always misguided.

We think our system can work.  We’re looking at starting points, focusing on progress and we’re pegging it to GCSE grades.  This is risky because no-one actually knows what the 1-9 scale looks like in practice; not yet. However, if we refer to it as a rough scale and keep stressing the approximate benchmarking, it will help to contextualise the details.  If you know you’re working at grade 7 in bold terms, it puts the teacher feedback into context.  But, it’s not a ladder – that’s a vital difference. Progress must be defined within the terms of the details of the subject.

For reference to old GCSE grades: 9~A**, 8 ~A*, 7~A, 6~B, 5~C/B, 4~C, 3~D, 2~EFG

It may be harder to sell the idea that each number is a range, not a point on a scale – but that’s what we’re saying:

[Image: table, not reproduced here]

7. Remember the assessment uncertainty principle.

The Assessment Uncertainty Principle is one of my favourite posts.  We must not allow the illusion of fine-tuned assessment to be created by sub-steps and fine-grading.  Assessment is fuzzy and anything we do that suggests otherwise needs to be recognised and handled with care.  Is a student on Grade 7 in our system necessarily achieving at a higher level than someone awarded Grade 6?  Well, no.  We’re simply projecting teachers’ judgements and offering a best guess based on our estimates to give a rough idea.  Within the detail, a test score of 63% in Maths is going to be more meaningful, but actually, even then, only when we look at which questions the student got wrong and why.

8. Learn by doing

Personally, I reject the complaint that we should have continued with levels until a better system was devised.  I also reject the idea that schools should have been handed a National Assessment System on a plate by the DfE.  This process has been difficult but also invigorating and necessary. For the first time IN DECADES, teachers have had to think for themselves about what assessment should look like.  This is Michael Gove’s greatest gift – even if he did it rather by accident.  Nature abhors a vacuum – and so do teachers! The sharing and debate around this has been a highlight of the last couple of years.

I doubt very much that the system we’re using will be the same in three years’ time.  But how will we know if it works unless we give it a go?  We’ve put a lot of thought into it, rejected other models for good reasons, and ended up with something with a good chance of succeeding; something logical, manageable and well-reasoned.  I’m happy with that.  Let’s see what happens!


  1. What you say intuitively makes a lot of sense. I have been teaching for 30 years and have seen quite a few different tracking systems in the schools I have worked in. What you are implementing highlights the links between KS3 and KS4, which levels obscured, creating an artificial disjoint in pupils’ tracked progress. The school I am at has gone down the flightpaths route and this concerns me, as it instantly puts ceilings and limits on students. I teach set 4 in year 10 and I feel uncomfortable having to tell them their flightpath. You can imagine what happened in the lesson when I told some of them they are on a 2/3 path. I look forward to seeing how your system develops. It is what I would have implemented if my role was different.




  2. I support both your thinking and approach and I know that you posted this to get “feed forward” so here goes.

    “Testing fuels learning” but only when the tests are well designed and what is required of them is decided on before they are designed and implemented. I still find it difficult to convert assessed learning into a score and then reverse engineer it into learning evaluation. Students are apt to focus on the score and not on evaluating their learning.
    An example of design thinking when developing an assessment.
    1) Do I want to have evidence that students know something?
    2) Do I want to know if students can apply knowledge?
    3) Do I want to determine to what extent a student understands something?

    Weighing the pig may fatten it but what if you are not really interested in how fat the pig is but instead in how healthy it is?

    Testing will fuel learning but only if the outcome of the testing facilitates and focuses on an evaluation of the learning in order to plan future learning. It should answer three questions.
    1) What am I secure in?
    2) What do I need to revise?
    3) What do I need to review?

    Point 6 raises in me a few questions and as you point out there are a lot of issues to wrestle with. You say “The need to benchmark is a strong desire for all concerned”, okay but when?

    Why do we need “a simple indicator to link that [what has been learnt] to broader brush standards”? Making things simpler and broader surely only decreases any validity or accuracy and in doing so makes the process redundant. Imagine only having 100 kg markings on your pig weighing scale!

    I think #6 is the one that will ultimately form the pivot point of any system you put into place. Does “working at”, in broad terms, mean that I understand some, most or all of what I have been taught or expected to learn? Is it enough for me to settle with a 7? Is it good enough for me, or to get me where I want to go? You say “Progress must be defined within the terms of the details of the subject”; does a 7 do that?



    • These are really good challenges. A 7 doesn’t tell you anything about the detail. It just tells you what your bell curve position is which contextualises the rest. Feedback from formative tests and other pieces of work is critical. Our challenge will be to keep the 1-9 in the background. Our discourse will always be about what has been learned and how to improve – partly because it will be hard to put the numbers up.


  3. I wonder, thinking as a learner, if my position on a bell curve will contextualise my learning. Will I understand it and will it provide the encouragement and motivation to do my very best and to deal with disappointment?

    I know it will be difficult to keep the numbers in the background. In my own experience when I used a secure/revise and review assessment all I kept getting asked was “What level/grade/mark did I get?” (I think I have already provided you with a link to the relevant article, if not let me know). It is a little easier though if you declare how work will be assessed before assessing and how that assessment will be used.

    One final point I would like to make is about the innate promise, through expectation, of any system to be valid and accurate when the final assessment is out of your hands. Ultimately it will all lead to GCSE grades and we know performance in these is not just a case of some algorithm of past performance, some linear trail or curve that will deliver what you expect when you expect it. Having this conversation with learners is vital and reinforces your emphasis on learning. It is also vital to coach learners to face such assessments with confidence and an ability to contextualise the outcomes. I know of learners who have done as asked, they have been compliant and worked hard only to fail to reach expectations. I know the impact this has on self belief and I know the coping mechanisms they need to pick themselves up. Ultimately this is the true responsibility of any assessment system we put in place in schools for we are dealing with the well being of learners. Perhaps it is our ultimate responsibility.



    • Our experience last year has been that, in the absence of any external benchmarking, there’s a tendency to revert to self-referential internal benchmarking by default. That risks a lowering of standards; there’s a general softening of expectations. It’s much tougher to face the challenge of meeting external standards – which is why we’ve pitched our scale at a demanding level, to make sure students are not under any illusions. We don’t have a complex indicator of external benchmarks – but some exist within subjects. For example, there are nationally standardised maths tests for KS3 students. Parents and learners find the jargon of assessment very confusing and a simple code is a way to help them engage. If you think your child is ‘top end’ and doing well, it’s helpful to know if we’re talking about 7s or 9s. Similarly with students who don’t know what standards are and think they’re doing ok but, actually, are still operating around 3 or 4. They need to know that, I think; it helps them put the imperative to continue striving to improve in context. Too many of my students are deluded; they think they’ll be ok but actually they’re behind the curve; even if they’re progressing to some degree, it’s not enough. So, I think we need the micro and the macro together. Last year we only had the micro – it wasn’t enough. Thanks for the feedback and the challenge. Tom


      • Happy to help you reflect and to fine tune your strategies Tom. If you ever want to engage me in a professional capacity to challenge and support you, your staff and students then I am just up the road in Northampton and happy to visit. My e-mail is kevin@ace-d.co.uk
        BTW The formula for the number of guitars needed at any one time is the number of guitars owned plus one.


  4. One of the things we are looking at in our system is how we can work with partner schools (with different systems) to help us moderate our judgments of standard at KS3.

    We might think that a piece of work, a test, a portfolio or whatever is of a very high quality for a Year 7 student, but how do we know if our Art staff have been at our school for over 10 years? The world may have moved on and we may not have realised. Or we may have become so accustomed to the standard of work we see that we do not fully appreciate its greatness in the wider context.


  5. Thank you again for the 8 steps to a workable system.
    I am wondering what is being reported to parents and to students.
    How is the attainment grade going to be reported to parents? Will it be a numerical score (78%) or a numerical GCSE grade (9-1)?

    Will the progress grade that is being reported alongside this be in the format E, G, S, P scale?

    Thank you again for sharing your work on this.


  6. A very good case is made for the removal of levels. An even better one is made by the expert panel reviewing the curriculum (2011) and the Assessment Commission’s report (2015). However, what happens next is an absolute made-up fairytale – reminding me of the children’s story The Emperor’s New Clothes. Sorry, this is just a revamp of the old National Curriculum levels. Flight paths were a phenomenon developed when levels were central to the system and FFT estimates. This system was flawed, which is the reason why legislation removed levels from the system. Companies were tasked to develop a more equitable system measure. Progress 8 is not based on the use of levels, so to return to flight paths based on your own 9-1 labelling is like trying to plug an analogue TV into a digital feed and expecting it to work!
    “There is overwhelming evidence that levels needed to go and the Commission strongly endorses the decision to remove them. However, the system has been so conditioned by levels that there is considerable challenge in moving away from them. We have been concerned by evidence that some schools are trying to recreate levels based on the new national curriculum. Unless this is addressed, we run the risk of failing to put in place the conditions for a higher-attaining, higher-equity system.” (Government Commission on Assessing without Levels 2015).
    Re 9-1 this does not work for all subjects and the outgoing Secretary of State did inform schools that subjects did not all have to conform to exactly the same system of assessing. For example physical education core PE is 100% practical. The GCSE qualification starting in Sep 2016 is 60% assessed by exam and 40% NEA. The two don’t equate – analogue and digital again! To indicate that a year 7 will get a 1 or a 2 therefore is a waste of teacher admin time and increases workload especially at a time when workload is an issue.
    The commission recommends expressing outcomes in curriculum terms and the CIF (Ofsted 2015) even uses the term ‘assessment information’ in place of ‘data’. Sean Harford and John Mackintosh then appear in two videos sharing a message with schools, saying that inspectors do not want to see data spreadsheets developed from the use of numbers; rather, they wish to see how schools use assessment information.
    One of the criticisms of levelling was that it labelled differential performance and did not encourage a growth mindset. There is a huge rhetoric-reality gap between stating ‘we want to encourage a growth mindset’ and then using an approach that doesn’t encourage it! Learning isn’t linear and therefore a measure that is linear and hierarchical is not fit for purpose!
    In terms of Mastery learning the profession has somewhat misunderstood the use of the term. Mastery is the expected inclusive standard. Many schools use the terms emergent, expected and exceeding. Mastery is expected for all – and everything we know about childhood growth and development and the performance of other high performing jurisdictions indicate that we can expect mastery for all against the new standards unless children are SEN or disabled. Yes, work back from these thresholds, but there is no linear route to them, therefore we cannot capture this new progress using meaningless labelling. FFT started in 2001 when 55 LAs were on board. In 2004 all LAs were on board. Type D Estimates (95% accurate in Eng & Maths within one estimated grade) didn’t emerge until there was sufficient ‘data’ in the system some 8 years after descriptive statistics were introduced. In ‘foundation’ subjects this was as low as only 70% accurate within one estimated grade. SATs are tests in the core subjects – is it any wonder that in other subjects estimates were 30% inaccurate? Any statistician will tell you that using data that is 30% inaccurate is a total waste of time…. Yet here we go again with a re-creation of a previously data-obsessed, admin-rich, standards-plateauing system which the removal of levels tried to avoid!


      • Hi Tom
        Progress expressed in curriculum outcome terms referenced to the new national curriculum standards. Certainly not an obsession to convert every bit of progress a learner makes into a meaningless number or grade and adding to meaningless teacher administration.
        Levels have gone as you rightly acknowledge but if the narrative for their removal was truly understood then schools would not be using 9-1 which replicates the use of labelling 1-8 plus exceptional performance with levels however you dress it up.


  7. Hello Tom

    Thank you for sharing your thoughts and your school’s approach to this difficult problem of Assessment Without Levels (AWL). We wrestled with these problems in working with schools in the development of what has become the ‘KS3 Curriculum Design and Assessment System’ within the 4Matrix application. I am not writing this to plug our product, simply to contrast your thinking with ours.

    The most useful reference to guide one’s thoughts on this subject is the report of the ‘Commission for Assessment Without Levels’. Your article echoes many of its comments about the weaknesses of using assessment systems which are based on using Levels.

    The Commission report provides an important pointer to finding a solution to the AWL problem in describing the notion of ‘mastery’. The significant point about mastery is that it assumes that all pupils could master the elements of a subject if the teaching was closely geared to pupils’ learning. i.e. that through feedback a teacher could revisit a particular idea or concept for pupils that had not mastered it on the first visit. The report further suggests identifying those ‘key concepts and big ideas’ in each subject which deal with the fundamentals of the subject which need to be mastered to be fluent in understanding the essentials of the subject.

    The heart of the approach that your school is evolving is described in the table in section 6 of your article. In essence it is an approach which looks at prior attainment and predicts the most likely GCSE grade. It will then categorise pupils in relation to whether they are above or below target.
    This would be different to a mastery approach. The approach described in the table is based on the idea that we should expect pupils to attain a GCSE grade which is indicated by prior attainment, and by implication that being securely on a low target is OK for pupils with low prior attainment. This is a fairly common feature of many schools’ approaches to the problem of Assessment Without Levels, and I would argue that we should move on from this view to one which assumes and encourages success for all.

    The identification of criteria used to make judgments about pupil progress will form the new currency for an AWL approach. Your article describes how tests, assignments, questioning and formative assessment can be used. You say that it shouldn’t be cumbersome, or based on ‘can do’ statements, or prose indicators. Identifying what these criteria actually are for each subject will provide the content needed for an assessment system. This was the most significant problem that we looked at in our work on this issue.

    The solution that we came up with was actually quite simple. We thought about how teaching has always worked. We wish a pupil to learn something and we teach it. We then use a range of feedback mechanisms to see if the pupil has ‘got it’. The criterion therefore is ‘that which we want a pupil to learn’, i.e. the Learning Objective for each unit of work. Across a year or a key stage there would be as many component Learning Objectives as were needed for the pupil to master each of the Subject Content statements (Attainment Targets) for any given subject.

    The approach therefore is based on monitoring the Mastery of pupils’ acquisition of Learning Objectives for each subject. These Learning Objectives would be identified by subject leaders as they plan each unit of work – which is why we have included the important planning stage within our approach.

    Having decided on an approach to AWL, the next problem which will be faced is how to turn the chosen approach into a ‘system’, by which I am referring to the practicalities of collecting a lot of data across many subjects, and processing this to provide information for school leaders, and reports to pupils.
    An IT system will be needed to do this. Many schools are using spreadsheets, but specifically designed content-management systems which provide the flexibility to support any approach, are likely to provide a better solution.

    I have written about these issues at http://www.mikebostock.com. A presentation on the system that we have developed can be seen at http://www.4matrix.org/ssat


    • Whilst ‘mastery for all’ is a laudable goal we must accept that students arrive to us with different prior knowledge and attainment. For some getting to ‘mastery’ on a topic/skill will be simple – indeed they may already arrive at secondary having mastered that topic/skill – whereas others have a much longer journey to get there.

      So if we aspire to mastery, but also that all students are challenged, the learning objectives may need to be different for different students even within the same subject and year group.

      In developing our ‘system’ we had significant consultation with students, parents and staff within school. We developed core principles from each group and used those to help design a system that met the needs of our community in our context.

      What was striking was the desire from parents and students alike for absolute ‘honesty’ in the reporting. They felt that levels hid how well (or otherwise) they were doing compared to ‘what is expected for their age’ yet also wanted absolute clarity about whether the student was making ‘good progress’ from their starting point towards ultimate success or were they lagging behind.

      We ended up with a system (that isn’t perfect – but is getting better with each tweak) where:

      a) students receive a ‘benchmark’ (as opposed to target) based on their KS2 results. These are set using a combination of FFT20 and staff judgement.

      Parents and students are told that this is to be used as a guide for how well students are progressing – if their attainment is above the benchmark then they are accelerating away from similar students nationally; on benchmark they are making progress quicker than similar students nationally (since FFT20 is above national averages); and if below benchmark then they are not progressing so well.

      We did have a long debate about whether the benchmarks should be FFT 50 (ie. accelerating away; progressing as expected; slipping behind) but felt that FFT20 better suited our context.

      b) each subject determines in advance the key learning objectives for the unit of work (2 units or 3 over the course of the year based on how many lessons/ week the subject has).

      In each unit the students take an assessment. The format of that assessment is determined by the subject – it could be a test, portfolio, a combination of several smaller tests/tasks (eg. in MFL speaking, listening, reading and writing)

      The assessment is ‘graded’ against ‘national expectations for age’ (which we initially used levels to help us determine and have since moved on and have external sample moderation arrangements with some neighbouring schools).

      c) the reports to parents identify any particular strengths and areas for development from the key learning objectives for that unit and we are increasingly providing additional learning resources for those areas on our VLE.

