How can we measure and report progress meaningfully?

As we continue to develop our system-wide thinking about assessment, it’s important that teachers and leaders understand the underlying concepts we’re dealing with.  In order to motivate and challenge all students, it makes good sense to try to distinguish between attainment and progress.  This allows us to give value to students making strides with their learning regardless of their starting point.  Schools have made valiant efforts to develop assessment language and processes to measure progress and to report this to parents.  Not everyone can get the top marks but everyone can make progress.  That’s the idea.  But does it work?

The idea of progress only works if we’re clear about what it means – and only if we give it the weight the concept can sustain.

If we have something absolute, like the time it takes to run a 5K race or how far we can jump in the long jump, progress is measurable: we measure it on the scale of the time or distance we use for the thing itself.  If I’m trying to lose weight, I can weigh myself at various points and use that to gauge a sense of progress.

However, many of the measures we use in education are only meaningful in relation to the performance of the cohort.  This was true of levels and is true of GCSE grades 1-9.  It’s also true, by definition, of standardised test scores.  Essentially these are all bell-curve position markers.  If we are average, at the 50th percentile, and our idea of progress is to move to, say, the 60th percentile, we’ve made progress but only in relation to everyone else.

This is currently the dominant model, embodied in the idea of flightpaths, comparisons against FFT targets and the Progress 8 calculation.  There are several major conceptual issues with it:

Firstly, it’s a zero-sum game: children either all make no progress (and keep their relative bell-curve position) or some children make progress at the expense of others.  For a given school cohort, it may be possible for everyone to progress compared to the national background – but only if other children elsewhere are falling behind.  We can’t have more than 50% of children in the top half…  sad but true.

The second issue is the problem of measurement.  What does it mean to make progress in art or geography?  You can gradually get better at painting and drawing; you can get better at writing and know more facts – but you can’t measure those things on a scale.  There is no ‘scale of geography’ to move along.  Any time you try to generate a standards ladder to climb, you are essentially describing a series of age-related bell-curves; you only have relative bell-curve positions to guide you.  The notion of whether someone is making ‘expected progress’, or is exceeding or performing below expectations, is a human judgement about how the quality of their work or the general extent of their knowledge is improving against a sense of what children of that age are usually capable of and how quickly children typically improve.  Any scale is an invention guided entirely by human judgement.

When teachers say someone entering a school at 11 is ‘on track’, the track in question is usually a projection from KS2 to KS4 outcomes based on historical links between the two different bell-curves.  However, there are no actual instruments available to measure progress along this track.  Anything we use is an educated estimate; a rough idea; an approximate indication.  In most subjects we don’t have a good set of data to describe the national bell-curve at the beginning, never mind over time – not until GCSEs are taken.  Systems that spew out ‘target grade 6.2’ in Year 8 are giving an illusion of accuracy (decimal places) that is entirely false.  Someone in Y7 given a projection of 5.3 against an FFT target of 6 might be considered to be making insufficient progress – but the numbers are all invented.

This excellent blog by Matthew Benyohai spells out the difference between progress and attainment – check it out: https://medium.com/@mrbenyohai/the-difference-between-measuring-progress-and-attainment-7269a41cdd8  It highlights the unevenness of progress in relation to different areas of knowledge and supports the idea that discrete measures of attainment on specific assessments are the only really meaningful concept we’ve got.

See also this superb gallery of ‘progress nonsense’ from Matthew.

Thirdly, there is an illusion of comparability.  Despite the almost total absence of inter-departmental moderation or technical standardisation processes, schools routinely generate data sets that invite direct comparison between subjects.  We’re asked to accept that a 6 in Geography represents a similar standard to a 6 in Art; that a student with a 5 in Art is doing worse than their 6 in Geography suggests.  This then leads to parallel assertions about progress.  All of this is invented.  It is literally not meaningful to compare the degree of progress made in, say, Year 9 Art with the degree of progress in Year 9 Geography when these judgements are made in entirely different disciplines by different people – unless both sets of teachers have very good mechanisms for comparing school outcomes at KS3 to nationally norm-referenced standards.  The best we can hope for is that each teacher’s professional judgement of a meaningful notion of progress makes some kind of sense within the parameters of the subject; to compare beyond this is guessing.  Guessing is fine if we acknowledge it; it’s not fine if we pretend we’ve measured something.

A fourth issue is the ‘so what?’ question: this flightpath model of linear bell-curve to bell-curve progress doesn’t tell anyone what to do to secure improvement, because the bell-curving, standardising system strips away the raw information that might actually be useful.  Even if, on average, there are general links between bell-curve positions and the specific knowledge and skills students have, you can’t know this about any individual.  Attempts to connect statement banks of ‘can do’ statements to flightpaths are massively flawed – valiant maybe – nice try – but ultimately false.

So what can we do? 

For me, we need to face the simple reality that there are two broad categories of authentic assessment of attainment, and this is what we should measure and report:

  • Difficulty model assessments that use tests, yielding test scores.  These can be reported in raw form (eg marks out of 45), converted to percentages, or standardised around a certain mean – like 0, 10, 50 or 100.
  • Quality model assessments that require teachers to make comparative judgements between samples of work and/or to reference them against success criteria or exemplar work.  Outcomes can be turned into grades or marks (determined by mark schemes) or standardised scores.  (There is also the nomoremarking.com comparative judgement score concept.)
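To make the ‘standardised around a certain mean’ idea concrete, here is a minimal sketch in Python.  The raw marks and the target scale (mean 100, SD 15) are illustrative assumptions, not a prescribed method:

```python
from statistics import mean, pstdev

def standardise(raw_marks, target_mean=100, target_sd=15):
    """Rescale raw test marks so the cohort has the chosen mean and SD.

    Note: a standardised score is a bell-curve position marker -- it is
    only meaningful relative to the cohort it was calculated from.
    """
    m = mean(raw_marks)
    sd = pstdev(raw_marks)
    return [target_mean + target_sd * (x - m) / sd for x in raw_marks]

# Illustrative raw marks out of 45 for a small class
marks = [12, 25, 30, 33, 45]
print([round(s, 1) for s in standardise(marks)])
```

Notice that the output says nothing about what any student actually knows; it only fixes each student’s position relative to the cohort – which is precisely the limitation discussed above.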

Instead of spuriously converting raw scores or marks from tests and assessments into guessed bell-curved 1-9 grades, we should simply record and report them as they are.  These could be averaged across several tests or kept as a list of individual scores.  Tell it as it is.  However, as I’ve discussed elsewhere, scores only make sense as indicators of standards in comparison to something.  A safe bet is to use the school or class cohort average.  (Or, better still, the graphic representations Matthew Benyohai uses in his blog.)  So Geography: 78%; class average 65%.  Art: 70%; class average 74%.  This is authentic and meaningful.

Progress should be viewed in two ways:

  • Movement through a curriculum: if a student gets 70% on a series of tests of different topics, they are still making progress; they are learning more.  If they take the same or very similar tests repeatedly, we’d expect scores to rise.  Again, they are learning more.  So, progress can be tracked by evaluating how far a student’s knowledge extends in relation to the whole curriculum.  That will be represented through a series of discrete, localised data points that have specific content-driven significance (eg topic test scores in maths or history).

  • Informed professional judgement: rather than some spurious algorithm, teachers should be entrusted to report a judgement about the degree of progress a student is making through the curriculum in relation to their starting point.  ‘Expected progress’ isn’t a data point or a measure.  It’s a judgement, taking into account all the attainment information – from entry through all the assessments to date – comparing a student’s performance with current and past cohorts.  It’s evaluative, not quantifiable.

Reporting to parents then becomes a combination of three elements:

  • My child’s performance is X:  report raw assessment info.
  • This compares to the cohort/class in this way – eg report class average Y.
  • In the teacher’s judgement, taking account of input and outcome information, they are making Z progress.  Use a comment bank, a simple free-text comment, or a set of codes.
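As a hypothetical sketch, the three elements could be assembled into a single report entry like this (the function and parameter names are my own invention for illustration, not part of any proposed system):

```python
from statistics import mean

def report_line(subject, score, cohort_scores, progress_judgement):
    """Combine raw attainment (X), a cohort comparison (Y) and the
    teacher's non-numerical progress judgement (Z) into one report entry."""
    avg = mean(cohort_scores)
    return (f"{subject}: {score}% (class average {avg:.0f}%). "
            f"Teacher judgement: {progress_judgement}.")

print(report_line("Geography", 78, [58, 65, 72],
                  "making strong progress from their starting point"))
```

The point of the structure is that the first two elements are measured and the third is explicitly a judgement – nothing is converted into an invented grade scale.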

In this way, it doesn’t matter if the information varies between subjects – it will.  There are no tricky boundaries to navigate between ‘expected’, ‘developing’ or ‘mastered’ – all of that spuriousness has gone.  Only when we’re closer to GCSEs would we worry about how this might pan out in 1-9 grade terms.  At KS3, we keep the door open.  Good progress from a high starting point might indicate higher projected grades – but we don’t focus on that, because we can’t know.

The fundamental principle is this: keep it real; only report what you know; and if it’s a teacher judgement with a wide margin of error, say so.

See also:

Understanding Assessment: A blog guide

21 comments

  1. Very interesting. We have lost the sense of teachers as “experts”, sadly. I often think of people who learn a musical instrument and take music exams. The “Grades” get a bit harder each time (can’t be “measured” numerically!), and anyone with musical sense can notice if a child is “making progress”. “Expert” musicians can judge children’s performances and create a number mark, but there is no numerical ruler for them to use! Progression is a process over time, that is perceived by other “expert” practitioners, not a “snapshot” assessment, so, by definition, an exam can’t really effectively measure “progress” in the same way that an exam can’t measure “maturity” or “love” – but we do know what “maturity” and “love” are, and that some people are more mature and more loving than others. Problems are exacerbated when snapshot exams, measuring “achievement”, are also expected to measure “progress” and are also expected to serve as “accountability” measures. The three things are all different, as any kind of thinking beyond the superficial makes clear. But “Progress8” pretends they are not different, hence the problems of over-emphasising it if you are really trying to help children make progress – paradoxically.


  2. I think these approaches are absolutely correct, but I’d be interested to find how you would further tackle the issue of comparability. You have commented that without some norm-referenced standards for KS3 outcomes we are relatively in the dark; but the notion that between KS2 and KS4 outcomes we cannot determine how student performance relates to other schools and cohorts except via teacher professional judgement, whilst true, should lead to some scrutiny of this process.

    I, like you, would like to rely on teachers’ professional judgement but also see a responsibility for ensuring a systematic method of teachers checking their assumptions.  So far my model would consist of:
    – First and foremost, scrutiny of the curriculum as the progression model.
    – Comparison of quality-model responses (e.g. comparative judgement) across schools
    – Running of standardised difficulty model assessments across schools (more applicable in subjects with more standardised curriculum content e.g. maths / science)
    – End of KS3 non-curricular linked assessments (e.g. GL assessments) as a broad calibration

    Are these practices similar to what you would propose?


    • Yes. I’ve covered this elsewhere but I think we should compile exemplars of standards for quality model assessments from across the country for benchmarking and use GL assessments eg in Y9. Need care to ensure that these tests actually match the taught curriculum. But this national cohort pegging only needs to sit in the background.


  3. Thanks for your very useful blogs Tom.

    After spending a while trying to think of a way in which we could use the GCSE scale to report data, I’ve come round to reporting normalised (probably 100,20) exam results.

    My problem: we are a new school with only 9 pupils in our oldest year group. Makes normalisation / comparison with the average not particularly meaningful. Also, we don’t have previous year group performance to give us an idea of expected progress.

    We are planning to use nomoremarking.com POP tests for English and Maths but not sure what to do for other subjects? Any ideas?


    • Thanks Luke.  I wouldn’t try to use a comparative method where there isn’t a natural baseline or background set of exemplars.  Just use the assessment difficulty as an indicator and accept the predictive uncertainty.  So set a test in History/Science or an assessment in Drama/Art etc – and, given your teachers’ knowledge of the curriculum, make informed judgements about whether a given level of performance seems strong – ‘on track’ – or not.  It’s as reliable as any other mechanism but without introducing anything spurious.  Nearer to GCSEs, you can pick up exemplar material etc that will help to hone the level of predictive accuracy.


  4. Tom, this is exactly how I want to report current attainment home to parents. However, some colleagues are fearful of disengagement from students who are below the average grade in the majority of their subjects. What are your thoughts on this? Thanks in advance.


    • It’s a good question. Truth is all systems do this implicitly if not explicitly. I think we just use progress and attitude comments to stress positive gains. Or you could remove the average marks and just report raw score with ‘and this represents’ X – insert comment about relative attainment/progress.


  5. […] Subject professionals discussing the building blocks, the approach to teaching and learning and the progression model for their subject.  The teaching happens, with plenty of formative assessment to inform teaching and learning.  How students are progressing is discussed by subject teachers, and adjustments are made to the next lesson, to the Scheme of Learning or to the explanation to a small group of students.  It is exactly these discussions that are fed back to SLT so that they can support interventions if needed, but also understand if students or one class are not progressing at the same rate as the rest of the year group.  These discussions also take place across the schools, with subject leads supporting and developing practice throughout the MAT.  Finally, an agreed summative assessment is taken so that schools can see how their students are progressing against a larger cohort.  Within a larger cohort, progress can be maintaining your position within the cohort as the whole group moves (see Tom Sherrington’s excellent explanation here). […]


  6. Reblogged this on Faith in Learning and commented:
    Saw the reference to this in Becky Allen’s post today on the IoE Blog about “poor attainment data coming too late,” and I commend this piece of really helpful thinking from Tom Sherrington.  If you are a school leader and you are interested in improving assessment in a school, read this!  If you are a Year 6 teacher or an assessment manager in a secondary school, read this and regain your sanity when school leaders are telling you to improve scores at children’s expense.  If you work for the DfE and think that norm-referenced bell-curves were ever going to help you lie more effectively to your political bosses, then read this.  Thank you Tom.


  7. I have been increasingly tempted to use comparative judgement after recommendations from colleagues. Once the papers are organised into ranked order, how can grades be awarded? Would you use the bell curve model to establish percentages of grades typically awarded (say based on previous years exams) and go from there?

