To put this post in context, I have previously written several posts about the nature of exam grading, the way OfQual operates, the bell curve of norm referencing and its role in setting standards, and the degree of error that is inherent in any assessment and grading process.
There are lots of contentious and/or technical issues packed into our examination system, and discussing them properly requires a fairly detailed understanding of how it works:
- The weight given to examination outcomes in determining students’ progression routes
- The weight given to exam results in the school accountability system and the weighting given to some subjects and particular grades within that system.
- The extent to which exams can measure the learning outcomes we see as important and desirable.
- The statistical nature of examination grading and the problem of grade boundaries: their inherent cliff-edge borderlines and the degree of (evidence-informed) subjectivity needed to decide where they should fall.
- The fundamental power of the bell-curve in informing our sense of standards – i.e. the fact that there really are no absolute standards that we give value to in practice. The conflict between wanting to get more students over the bar whilst also raising the bar – and the inherent reciprocal relationship these two goals have in the short term.
- The degree of expertise and technical experience needed to set assessments that are consistent over time and provide the appropriate level of challenge and opportunities for success for students across the attainment range.
- The scale of the challenge of setting and marking national exams that can deliver fair, consistent outcomes across multiple exam boards in any given year and between years.
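To make the cliff-edge point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a student's observed mark is their "true" mark plus normally distributed measurement error. The function name, the 3-mark error and the 60-mark boundary are made-up assumptions, not figures from any exam board.

```python
import math

def chance_of_reaching(true_mark, boundary, error_sd):
    """Probability that the observed mark is at or above the boundary.

    Hypothetical model: observed mark = true mark + normal error with
    standard deviation error_sd. Marks are treated as continuous.
    """
    z = (boundary - true_mark) / error_sd
    # P(observed >= boundary) = 1 - standard normal CDF at z
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# Three students whose "true" marks sit just below, on and just above a boundary of 60
for true_mark in (57, 60, 63):
    print(true_mark, round(chance_of_reaching(true_mark, 60, 3), 2))
```

Under these assumptions, a student sitting exactly on the line has a coin-flip chance of getting the grade, and even one whose true mark is three marks above it misses out roughly one time in six.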
In my experience, there are lots of commentators and teachers who do not fully comprehend the complexity of the system that they are critiquing. It’s so depressing that, every single year, we have people complaining about the unfairness of grade boundaries going up ‘robbing our children of the grades they deserve’ or bemoaning our slavish devotion to norm referencing – as if norm referencing were inherently wrong. Every time anyone makes these arguments, it tells me how little they really know about exams.
Here are some realities people need to face – especially if they are going to pass comment on the exam system:
Grade boundaries will always shift as a fundamental part of keeping the standard for each grade as consistent as possible over time. This process is difficult and inevitably involves some subjectivity, but it is informed by the analysis of a national data set. If you think you can scan a test paper, decide that it is the same standard as last year and, consequently, persuade yourself that the grade boundaries ought to be the same, you’re kidding yourself. You can only tell how difficult a paper is by looking at the results from all the students who sat it, or a reasonable sample of them. If a cohort of similar ability nationally gets slightly more marks on a similar paper, it means that the paper was slightly easier. That’s all. There is a lot of statistical noise around each boundary; some students just sneak in above it and some just fall below it. That’s how it works. It can seem harsh, or a lucky escape, but that’s how it has always been.
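The point about easier papers can be illustrated with a deliberately oversimplified Python sketch. It sets this year's boundary purely by matching the share of candidates who reached the grade last year (simple percentile matching). The real process also leans on examiner judgement and other cross-referencing evidence, so treat the function name and the invented marks as hypothetical; the sketch only isolates the statistical logic.

```python
def carry_forward_boundary(last_marks, last_boundary, this_marks):
    """Hypothetical sketch: choose this year's boundary so that the same share
    of candidates reaches the grade as last year (pure percentile matching)."""
    if not this_marks:
        raise ValueError("this year's marks are empty")
    reached = sum(1 for m in last_marks if m >= last_boundary)
    if reached == 0:
        raise ValueError("nobody reached last year's boundary")
    # Scale last year's count to this year's cohort size, rounding up
    # (integer arithmetic avoids floating-point surprises).
    k = -(-reached * len(this_marks) // len(last_marks))
    ranked = sorted(this_marks, reverse=True)
    # The k-th highest mark becomes the boundary; ties at that mark
    # mean slightly more than k candidates may reach the grade.
    return ranked[k - 1]

# Invented marks: a cohort of similar ability sits a slightly easier paper,
# so everyone scores three marks more than last year's cohort did.
last_year = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
this_year = [m + 3 for m in last_year]
print(carry_forward_boundary(last_year, 60, this_year))  # 63
```

The boundary rises from 60 to 63, yet exactly the same proportion of students gets the grade in both years.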
Grade inflation isn’t a good thing, because it undermines the credibility and value given to every child’s achievements. OfQual has been correct in trying to halt it since 2010. Michael Gove may take credit for a change in policy in this area, but the fact is that OfQual is a technocratic organisation; the annual grade-setting exercise is a highly technical one that politicians have absolutely no input into. There are no secret meetings where ‘they’ sit around trying to rob children of their life chances by putting the grade boundaries up in some kind of devilish conspiracy of hate. There are just lots of (probably fairly heated) meetings where professional exam setters and markers examine the spread of grades and, using a range of cross-referencing methods, try to set the boundaries in the most appropriate place so that, in their collective judgement, grades reflect the standards students have reached, are comparable over time and are consistent from exam board to exam board.
Of course, this all should have consequences for how we view exam outcomes – i.e. with a pinch of salt informed by a sensible understanding of the limits of their accuracy and absolute meaning. But let’s at least debate the issues with the level of technical understanding the issues deserve.