(Hindsight note: I was badly wrong! They stuffed up the algorithm big-time and I totally underestimated the legitimate sense of injustice people would feel after CAG adjustments of any kind. But this is what I thought before reality happened)
In the wake of the Scottish exam results being published, Nicola Sturgeon’s apology for ‘getting it wrong’ is adding fuel to the bonfire of sanity around this year’s exam results. One of the things that winds me up is that nobody is presenting an alternative that:
- provides a more fair or more accurate outcome.
- doesn’t require a time machine to go back to March to re-do the whole thing
- doesn’t make it worse
I fear that a lot of the emotive kicking and screaming is founded on a pretty shallow understanding of how grades and standards work in national examinations and a lack of understanding of how variable teacher assessments are in the absence of rigorous moderation processes.
It’s obviously a nightmare situation: back in March, with weeks to go, we had the decision to close schools and cancel exams. (We could argue about whether that was right – but what’s the point.) Then what? How do we award qualifications with some value? The decision was to use Centre Assessed Grades, alongside rank order in lieu of exams. (No better ideas were being proposed at this time.)
The key challenge here was to take school grades and ranks to form a national data-set from which grades with some kind of parity could be awarded. One part of this is the parity from year to year – and we could debate the issue of grade inflation. But another important part is the parity between all the centres just this year. There is a lot of very woolly emotive guff out there about ‘trusting teachers’ – as if the best solution would be to take the CAGs and just give them out.
Why is that a problem?
It’s a problem because teachers don’t just ‘know’ the standards for grades by some kind of professional wizardry. Even if teachers within a school have a good shared understanding of what they mean by a 4, a 5, a 6, a 7 – and the A*-Es for A level – there is no way to know that these standards are shared across schools. In the absence of extensive moderation through work sampling or common assessments, there is no reference point for teachers to use to set grade boundaries in common.
Luckily we have Ofqual – a non-political technocratic organisation staffed with nerdy number crunchers. There are no dark forces here. (I’m really not interested in people thinking it’s all some conspiracy. Giant yawn.) They have data sets for previous years and baseline data that can give us some idea of the typical spread of results that schools yield, relative to the national scale that determines the grades. In the absence of any new exam data, this data allows some kind of re-alignment of all the thousands of rankings and CAG estimates. Of course it’s an approximation – of course it’s not without flaws – but it’s not a scandal. It makes sense.
In the diagram, showing three schools with comprehensive cohorts, School A has assessed grades that are close to the national spread. This matches what is typical for that school’s cohorts. They don’t need much adjustment. School B’s grades are set higher than their typical cohorts – so they need to be adjusted downwards to align with the national pattern. The rank order is intact – this references students’ efforts and work; the adjustment is needed to give the school’s overall grade pattern parity with other schools’ grades. That is entirely fair in these circumstances, regardless of the level of deprivation in the area where Schools A and B are. (It’s the deprivation that’s unfair – not the statistical adjustments.)
School C’s grades are below the typical pattern – so, theoretically they would need an upwards adjustment. Of course there are probably almost no school Cs in reality. Why? Because teachers have a strong natural tendency to give the benefit of the doubt. Obviously. So, in the absence of school Cs, we get national grade inflation unless grades are adjusted. That might not be too big a problem if the inflation was consistent between schools. But if we want to assume students’ grades nationally are meaningful and broadly comparable, this national adjustment was necessary anyway.
(It turns out that for large cohort entries, the actual individual CAGs don’t help to set the position of the school’s scale against the national scale – the internal ranking sets the grade positions once the overall school scale has been adjusted, given a large, continuous grade spread. Again – no scandal; just a technical feature of the process – albeit not well-explained or publicised.)
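To make the rank-based idea concrete, here is a minimal sketch in Python. It is not Ofqual’s actual algorithm, which has not been set out in this post; the function name `grades_from_rank`, the student labels and the grade shares are all made up for illustration. It assumes the school’s adjusted grade distribution is already known, and simply hands out grades down the rank order until each grade’s share is used up. Note that the individual CAGs never enter the calculation, which is the point made above about large cohorts.

```python
from typing import Dict, List


def grades_from_rank(ranked_students: List[str],
                     grade_shares: Dict[str, float]) -> Dict[str, str]:
    """Assign grades by rank so the cohort matches a target distribution.

    ranked_students: the school's rank order, best student first.
    grade_shares: grade -> fraction of the cohort, listed from the highest
        grade to the lowest, with fractions summing to 1 (hypothetical data).
    """
    # Turn the shares into cumulative cut-offs, e.g. A: 0.2, B: 0.5, C: 1.0.
    cutoffs = []
    running_total = 0.0
    for grade, share in grade_shares.items():
        running_total += share
        cutoffs.append((grade, running_total))

    cohort_size = len(ranked_students)
    awarded = {}
    for position, student in enumerate(ranked_students):
        # Midpoint percentile of this student within the cohort (0 = top).
        percentile = (position + 0.5) / cohort_size
        # Default to the lowest grade, then take the first cut-off reached.
        awarded[student] = cutoffs[-1][0]
        for grade, cutoff in cutoffs:
            if percentile <= cutoff + 1e-9:
                awarded[student] = grade
                break
    return awarded


# Illustrative cohort of ten students and an invented target distribution.
students = [f"student_{i}" for i in range(1, 11)]
result = grades_from_rank(students, {"A": 0.2, "B": 0.3, "C": 0.5})
for name in students:
    print(name, result[name])
```

With these invented numbers the top two students in the rank order receive an A, the next three a B and the remaining five a C, whatever CAGs the school submitted for them.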
The ‘scandal’ stories about 40% of grades needing to be adjusted… well, it’s just not a scandal. That’s just how big a change is needed to create the parity we need to retain a set of outcomes with internal integrity. And what does this tell you about the scale of the drift that unmoderated teacher assessment creates? It’s huge. Maybe that’s the scandal – except it’s not; it’s just how assessment works. That’s why we have exams. That’s why Speaking and Listening assessments and a lot of coursework components were eventually corrupted beyond meaning.
Imagine the stats staff now dealing with Nicola Sturgeon’s political directive. What are they going to do? Bump the grades up a bit in the deprived areas to dress it up a bit? Shift everything up by 10%? Just give the teacher assessments? None of this is more fair, more objective…it’s just gerrymandering. Political.
I despair at the woeful levels of understanding of exam grading – this ludicrous obsession with ‘norm referencing’ being some kind of enemy. (And anyone who suggests ‘criterion referencing’ would be better or wants full blown teacher assessment instead of exams can’t really understand assessment – in my view.)
And still – all the emotive outrage-warriors are not offering a cogent valid alternative. Just outrage.
*Addendum: Something I forgot to add originally: My overall suggestion is that schools – and maybe even exam boards, perhaps on request – issue certificates with both CAGs and standardised grades. That could become the currency for Cohort 2020. Grades would be recognised as SA (CAG), e.g. 6 (7). That would be something people would put on their CVs etc. It just gives the factual information for people to interpret as they need: This is what the system gave me (this is what my school gave me).
This all makes sense, Tom, but what worries me is schools like mine. 53% have no KS2 data and are therefore subject to statistical change based on at least two cohorts they are not part of: those who do have KS2 data, and previous cohorts. We have been scrupulously fair in our CAGs, and honest, but will be impacted negatively because we are a school with many students who arrived in England in Y7, 8 or 9. It’s a nonsense.
The real scandal is trying to award grades in a year where there are no exams. We should have just awarded something like “pass/no pass”.
I can imagine the issue – it will be interesting to see how that pans out; it might not be as bad as you fear. But I don’t think there is a scandal. A pass/fail would have massive pressure on that boundary – same issues as with the other grade boundaries but worse – and no professional body was advocating that at the time of the consultation. General consensus was that this system was the best recovery model available.
Thanks for your interesting post. There is no doubt most of the grades will be sensible with this scheme.
The issue though, which you have to admit is a bad one, is that outliers in historically poorly performing centres will be moved down to a typical grade for a top performer at that school. Outliers, either as individuals or cohorts, will be punished purely because of the quality of their school. If a school doesn’t typically get top grades, outlying pupils will be dragged down by their centre, preventing them from getting the top grades. If a school doesn’t regularly get top grades, the pupils are receiving a grade sampled from the school distribution, rather than on their individual merits/efforts. The rarefied air of top grades might be very under-sampled at a particular school, making it difficult for people at the right hand edge of your Normal distribution to get accurate grades.
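A rough sketch of the ceiling effect described above, in Python. This is only an illustration of the concern, not a description of the model Ofqual actually used: it assumes a purely distribution-based scheme in which no student can be awarded a grade the school has not achieved in its reference years. The names `GRADE_ORDER`, `best_historic_grade` and `capped_grade`, and the counts in the example, are all hypothetical.

```python
from typing import Dict, Optional

# A-level grades from best to worst.
GRADE_ORDER = ["A*", "A", "B", "C", "D", "E", "U"]


def best_historic_grade(historic_counts: Dict[str, int]) -> Optional[str]:
    """Return the highest grade the school awarded in its reference years."""
    for grade in GRADE_ORDER:
        if historic_counts.get(grade, 0) > 0:
            return grade
    return None  # no history at all


def capped_grade(cag: str, historic_counts: Dict[str, int]) -> str:
    """Cap a teacher-assessed grade at the school's historic best grade.

    Assumption for illustration only: the scheme never awards a grade
    above anything seen in the school's past results.
    """
    ceiling = best_historic_grade(historic_counts)
    if ceiling is None:
        return cag
    # A lower index in GRADE_ORDER means a better grade.
    if GRADE_ORDER.index(cag) < GRADE_ORDER.index(ceiling):
        return ceiling
    return cag


# Invented history: this school has never awarded above a B.
history = {"B": 3, "C": 10, "D": 7}
print(capped_grade("A*", history))  # the outlier is pulled down to B
print(capped_grade("C", history))   # a typical student is unaffected
```

Under that assumption the outlying student loses two grades purely because of the school’s history, while a student in the middle of the distribution sees no change.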
This is unpalatable, for me, as this is at the core of social mobility. It is further entrenching disadvantage. Personally, this would have scuppered my own grades (I was at a failing school and got straight A’s – first person in a number of years).
People are surprising and capable of statistically remarkable things – I don’t think this approach allows for that. We have a moral duty, I think, to presume people, individually and collectively, can buck the trend. Perhaps having a non-standard set of stats for this one year wouldn’t be the worst outcome, and is perhaps more humane.
Hi Simon. I could accept all of that as problems with the system – but we’re still left with needing a better one. I don’t know what you’d have done instead. Tom
Tom, there are alternatives. For example, I submitted alternative proposals for standardisation as part of my original consultation response to Ofqual (https://constantinides.net/2020/04/16/award-of-gcses-and-a-levels-in-2020/). I think the idea of “fairness” depends on who we’re suggesting this system is being fair to. The system Ofqual has gone for is arguably fair to schools, but it’s a big jump from that to the suggestion that it’s fair to individual students.
You and your readers may also be interested in the views of the Royal Statistical Society on the statistical methods used and on transparency https://rss.org.uk/RSS/media/File-library/News/2020/06082020-RSS-EPAG-statement-on-grade-adjustment-2020-exams-in-UK-FINAL.pdf
I don’t know in detail how your method is different and why it’s necessarily more fair.
> Luckily we have Ofqual – a non-political technocratic organisation staffed with nerdy number crunchers
That’s not really what Ofqual is.
Each year there are lots of bureaucratic decisions to be made around exams: things like “should exam boards be required to offer Greek even though the number of candidates is small?” or “how much of the final grade for German should a speaking assessment make up?”. These questions aren’t especially important or interesting — there’s no difficult analysis involved and the consequences of getting them “wrong” are very minor — but they have to be made, and we need an organisation to do it. Ofqual is that organisation, and it seems to do a reasonable enough job. However, the kinds of analysis and decisions needed this year have serious consequences, and require expertise that Ofqual doesn’t have.
There are basically two sensible ways to make the kind of decisions that have been left to Ofqual this year. The first way is to appoint an expert (e.g. an eminent statistician). If you do this then you’ll end up with decisions that are technically defensible even if they’re not especially popular. The second way is to appoint someone with democratic accountability. If you do this then you may end up with a technically worse decision, but (due to democratic pressures) the outcome is likely to be more popular. Ofqual has neither expertise nor accountability, which is why the system they’ve come up with is neither defensible nor popular.
I think Ofqual and the exam boards have experts and statisticians aplenty.
Can you explain why you think that?
I met them.
The funny thing is that, with all those experts and statisticians, there doesn’t seem to be anyone at Ofqual who knows how to work a spreadsheet:
Thanks for the ‘hepi’ document, which suggests inexperience in presenting to non-specialists. The interim report published recently by Ofqual suggests both appropriate statistical development and accountability.
Teachers collectively seem unprofessional when politicisation occurs. Everyone wants to criticise but, as said, fails to provide robust alternatives.
The interim report doesn’t show either of those things.
Tom, you wrote, “(It’s the deprivation that’s unfair – not the statistical adjustments.)”, but it is more complicated than that. The deprivation might have two different effects: the first might make a school’s attainment level lower on average, the second might make that level more volatile from year to year (e.g. this year you have some brilliant students who go to a poor school).
When the statistical model identifies the lower average level of attainment of that poor school, it is not the model’s fault, as you said, it’s society’s fault. However, if the model fails to account for volatility of the attainment level at that school, it might fail the brilliant students who go there this year. And that would be the model’s fault, it is unfair to those students.
So the question is how well does the model account for the volatility in attainment levels? We don’t know, because Ofqual hasn’t told us. So it’s too early to conclude that the statistical model is not unfair.
Or indeed unfair.
I am glad that you agree that it is too early to conclude that the model is unfair or not unfair. If a system is potentially fair to students and potentially unfair, it is important to raise and address the concern that it is potentially unfair. It would be wrong to dismiss that concern as in your statement “(It’s the deprivation that’s unfair – not the statistical adjustments.)”
Just challenging the default assumption that it must be unfair.
Two wrongs don’t make a right. Challenging a wrong assumption by relying on the opposite assumption only polarises the debate. It’s fairer to recognise that we don’t know if downgrading 40% of the A-level CAGs is fair or unfair.
A vital piece of information is missing from the debate, which is why it is so muddy. That is: what is the statistical confidence that the entries that have been selected for downgrading are the right ones? Suppose you have 100 people driving on the motorway and your speed trap catches 41 people, then you fine 40 of them. Is that right or not? To answer that question, we need to know how trustworthy the speed trap is. If it gets the wrong car 25% of the time, you are fining 10 people who keep to the law, and letting 10 people who break the law get away with it. That is probably not acceptable in a democratic society. If it gets the wrong car 1% of the time, it is a lot more acceptable. So without the information I mentioned, everyone who takes sides, whichever side, is stabbing in the dark to various extents.
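The speed-trap arithmetic can be written out as a couple of lines of Python. This is just a sketch of the analogy: the function name `expected_wrong_fines` is made up, and it assumes, as the example does, that a single error rate applies to every fine and that each wrongly fined driver corresponds to one speeder who gets away with it.

```python
def expected_wrong_fines(fines_issued: int, error_rate: float) -> float:
    """Expected number of fines that land on the wrong driver.

    Assumes one error rate applies uniformly to every fine issued.
    """
    return fines_issued * error_rate


# 40 fines with a trap that picks the wrong car 25% of the time.
print(expected_wrong_fines(40, 0.25))  # 10.0

# The same 40 fines with a trap that is wrong only 1% of the time.
print(expected_wrong_fines(40, 0.01))
```

The same calculation applied to downgraded exam entries would need the one number that has not been published: the error rate.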
In particular, you cannot scientifically defend the numbers the model gives out when you don’t know the confidence level I mentioned. You don’t know the likely proportion of the downgrades that are wrong, so how do you know if that proportion is acceptable in a democratic society?
The other thing is, with the need for debate in a democratic society, Ofqual, who presumably have this information, must disclose it to the public. Given that it hasn’t done so, the muddiness of the debate is primarily due to its conduct.
> Can you explain why you think that?
I ask because (1) as I’ve already explained, there’s no reason why they would have such experts, given their usual remit and (2) it certainly doesn’t appear that way given the blunders and scrutiny-evasion we’ve seen so far
Tom, I agree with your suggestion “My overall suggestion is that schools – and maybe even exam boards, perhaps on request – issue certificates with both CAGs and standardised grades. That could become the currency for Cohort 2020. Grades would be recognised as SA (CAG). e.g. 6 ( 7). That would be something people would put on their CVs etc. It just gives the factual information for people to interpret as they need: This is what the system gave me (this is what my school gave me).”
What I would add to the certificate is the confidence level at which the “standardised” grade has been calculated. Eg, if it is 90% then it’s probably pretty good. If it’s 70% then less so. It would help education providers and employers to decide how much to trust the “standardised” grade.
This is an interesting discussion.
Looking at the disproportionate downgrading in Scotland of students in disadvantaged areas compared to affluent areas I wonder if this is due to the students in more affluent areas having more access to private tutors. This would contribute to better exam grades for schools in affluent areas in previous years. Consequently these schools will have higher average exam grades for previous years and so their students’ CAGs will be less likely to need downgrading.
Students in disadvantaged areas will have less access to private tutors. They are therefore less likely to have one to one coaching for the exam. This means that they will be less likely to achieve the highest grades. Consequently schools in disadvantaged areas will have lower average exam grades for previous years and so their students’ CAGs will be more likely to be downgraded.
Has this year’s standardisation process exposed the impact of private tuition on exam grades? There seems to be no official recognition of this additional support that many students in affluent areas have access to.
I would be interested in colleagues’ views on this.
I understand statistics and data. However, seeing a student downgraded from a D to a U, with no evidence of their performance, just to match a statistical model is not right. We have looked at the data we had available, used professional judgment as well as moderated data. They would have got a passing grade. The U is unfair to them.
Agree. I don’t think U is acceptable in these circumstances
Some students have done nothing (some schools have allowed students to do nothing); if awarded ‘U’, there is an opportunity to sit an exam.
A child in a previously poor performing school, with predicted grades and teacher assessments of C and below in each subject, gained 3 A grades!
No sense whatsoever
[…] thousands of students receiving A levels. Earlier this week I posted my thoughts about the system: Adjusting CAGs: no politics, no disgrace, no injustice. It’s just technical. Well – that headline didn’t turn out so well did it! The blog was my attempt to lay the […]
I’m sorry but your ‘despair’ at the lack of understanding as to how grading works doesn’t wash. It sounds very patronising. When you have a child such as my son, who was doing four A-levels and expected A* & A/B, and he’s downgraded in three subjects, you feel real despair. In one exam he was awarded a grade lower than any previous assessment in five terms. You don’t need to understand the intricacies of an exam system to know something has gone wrong. Two sets of results isn’t the answer. The answer is declaring the process void and accepting most of the teacher assessments.
My son had to be persuaded to apply to Cambridge as people from our area don’t usually aspire that high (his opinion). Luckily they changed their offer to unconditional but I want justice for my son and will continue to progress.