Educational Lab Rats: The Search for Evidence


The recent wave of blogs and Twitter exchanges that have focused on the evidence base that underpins educational policy and practice has been fascinating. I am one of many eagerly anticipating the ResearchED Conference organised by Tom Bennett at Dulwich College in September. The debate has been catalysed in part by the exuberant Ben Goldacre, author of Bad Science. This is a must-read book, and not just for its demolition of the credentials of TV diet magician Gillian McKeith (“…or to use her full medical title, Gillian McKeith”… a classic line!). If you’ve ever believed that those drops of Balsamum Peruvianum or alpine blackbird spit from the homeopathy cabinet made you better… well, sorry… you’ve been suckered by the voodoo. Except in one respect… the placebo effect.

Ben’s book explores this in detail. It is truly amazing. Proper scientific studies have shown how powerful the placebo effect is. Positive effects from taking neutral sugar pills are reported in trials in all kinds of medical scenarios. The size of the effect can depend on the colour of the pills and the manner in which the pills are prescribed. For example, if a doctor thinks the pills are likely to work or, conversely, are unlikely to work (pre-conditioned as part of a blind trial), this has a huge impact on the patient’s placebo response because of the consequent doctor-patient dialogue. If the effect is convincing enough, evidently even placebo knee operations can work! Homeopathy is essentially a giant exercise in distributing placebo pills and potions; people want to believe in them so they use them despite the fact that there is zero evidence for their efficacy that would survive a randomised controlled trial (RCT). The reason I find this interesting is that it highlights the complexity of the interaction between social/emotional effects and physical bio-chemical mechanisms even when evaluating a highly reproducible event: the taking of a pill.

Turning towards education, Ben Goldacre and others are suggesting that a more highly developed research culture would be a benefit to policy makers and practitioners. It is hard to argue with that. The big question, however, is ‘what kind of evidence do we need?’ What kind of research is needed in order to provide that evidence? Given that no two teacher-student interactions are the same and no learning process is entirely reproducible, how are we going to use research methodologies to the greatest effect? As I argue in The Data Delusion, even physical systems that appear simple (like dropping a bag of marbles) are actually too complex to be predictable as there are too many variables; we’re left with broad general patterns at best and a list of average effects (as with the Hattie effect sizes). If we throw in all the psychological factors that do their work in medical placebo effects, educational cause and effect is highly problematic from a research perspective. In looking for evidence, we must proceed cautiously with realistic expectations.

To explore this further, here are some scenarios:

1) MA Research: Dialogue as a precursor for writing.
A colleague, Emma, completed her MA in Education at Cambridge with a thesis based on the process of students engaging in extended dialogue prior to writing an analysis of a text. Her methodology section was fascinating in itself. There is a large body of literature surrounding the validity and limitations of a wide range of social science research methods. In developing our thinking in the current debate, we’d be wise to engage with it; this isn’t new ground. Emma’s work involved a series of detailed interviews with three of her students – an established method. This enabled her to examine the effects of the dialogic exchanges on the subsequent writing in some detail; this narrow but deep method yielded insights but not data. Rather than tables and graphs, the thesis contains transcripts of student-teacher dialogues and the interviews. It’s a supremely interesting piece of work from which other teachers in her department have benefitted.

Questions to consider:
Does Emma’s research provide evidence that this teaching method could be applied in another context?
To what extent would the findings be more valid or more insightful if extended to a large scale trial?
Is it necessary to quantify the impact of the process in order to have confidence that it works?
Emma is an inspirational teacher in any case. Would another teacher have had the same effect with the same method? Would we find a similar effect on average if 100 teachers tried it? If the results from 100 teachers were positive on average, would that mean that this method ‘works’? What proof would be needed? In practice is ‘insight’ all we need as opposed to ‘evidence’ given all the variables?

2) Observational Experience: Think Pair Share
My blog post about the effect of students discussing in pairs before giving answers as opposed to the default ‘hands up’ method is one of my most popular. In my judgement, based on years of experience of teaching and observing lessons taught by lots of different teachers, it is immensely powerful. However, despite claiming it to be ‘the washing hands of learning’, I’ve never actually measured the difference in student outcomes generated by the two methods. My convictions lie in observing the quality of class interactions and the verbal responses generated. The learning process seems significantly more positive and engaging for all and I suppose I’m making the assumption that higher quality interactions and answers lead to deeper understanding. But, on that point, I could well be wrong…after all, plenty of students appear to learn well in didactic university lectures.

To find out if my hunch is valid or the dubious quackery of a charlatan, it would certainly be possible to conduct a trial: several hundred students could be taught with TPS as the default questioning mode and several hundred others using “Hands Up”. Understanding of some specific content could be assessed before and after the trial and the results compared. What would this show? If the data showed general support for my experience-based hunch, I’d feel vindicated. It would suggest alignment between the observable interactions and measurable learning outcomes; all neat and tidy; q.e.d.

But what if the effect was small, neutral or, heaven forbid…negative? I’d have to re-evaluate my position and perhaps promote the idea a little less but I’d still use the method myself. Why? Mainly because no amount of data would override my sense that ‘Hands up’ is a poor process; I would argue that the testing process is too limiting; that the in-class interactions amount to more than that which can be meaningfully measured in a test; I’d impose my value system regardless. However, I could not promote TPS as a way to pass tests.

What does this say? It suggests that, in conducting research, we need to be clear about what outcomes we place value on. If an initiative cannot be shown to have a reproducible, quantifiable effect in a certain direction, deciding what to do becomes more concretely a matter of working from our values; our gut instincts, biases and prejudices. At a school level is it not valid to reach a consensus on what the value-system is? Beyond that? Probably not. The DfE can’t dictate the values at play in a classroom… even if it wanted to.
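For readers who want to see what the 'measurable' side of the Think Pair Share trial imagined in scenario 2 might look like, here is a minimal sketch in Python. It assumes each student has a pre-test and post-test score, and it compares the average gain of the two groups as a standardised effect size (Cohen's d, the general kind of figure behind Hattie-style effect sizes). The function names and all of the scores are hypothetical and made up purely for illustration; they are not results from any real trial.

```python
from math import sqrt
from statistics import mean, variance


def gains(pre, post):
    """Per-student gain: post-test score minus pre-test score."""
    return [after - before for before, after in zip(pre, post)]


def cohens_d(group_a, group_b):
    """Standardised mean difference between two groups (pooled standard deviation)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical, made-up scores for five students per group (illustration only).
tps_pre, tps_post = [40, 45, 50, 55, 60], [50, 47, 58, 59, 66]
hands_pre, hands_post = [42, 44, 51, 54, 59], [48, 44, 59, 56, 63]

# Effect size of TPS relative to "Hands Up", based on gain scores.
effect = cohens_d(gains(tps_pre, tps_post), gains(hands_pre, hands_post))
print(f"Effect size (TPS vs Hands Up): {effect:.2f}")
```

A single number like this is an average over every student and teacher involved, and it only reflects whatever the test happened to measure; it says nothing about the quality of the classroom interactions that motivated the trial in the first place.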

3) Action Research: Co-construction.

As I have described in the post Research as CPD: CPD as Research, action research is a routine feature of life at my school. Every teacher is involved in a project where they are trying to find out about the impact of a particular teaching method. The findings are shared as part of our in-house professional dialogue and some teachers are involved with the Cambridge CamSTAR group, disseminating their work more widely. Each project is small scale, often limited to one class. Our focus is based on developing insights.

Should we be trying to scale these projects up or attempting to run them as RCTs? I’m not sure. The main purpose for our action research is to find strategies that work for each teacher in their own context; sharing the findings provides a source of reflection for others. The whole process is highly motivating, including the collaborative aspect, and this in itself feels like an important ingredient in driving effective teaching and learning. There is often an organic consensus among a group of teachers about the value of a certain strategy but, at the same time, there are variations in the details of how each person implements the idea. It seems to me that an RCT or large scale trial would require a much tighter definition of the specific strategy in order for it to be valid.

For example, I have been working with a colleague to develop the idea of co-construction. We both use different methods within a common umbrella; this allows us to compare notes and learn from each other. However, we wouldn’t be able to state ‘co-construction works’ in any definitive sense. This is especially true because the value of the process is not in enhancing measurable content-based outcomes; it is in developing a range of other skills and aptitudes such as the confidence to plan and teach a lesson to your peers. I’m not sure we could scale up the trial without having to prescribe a lot of the elements of the process, thereby losing the spirit of it. At the same time, I wouldn’t ever insist that any teacher should adopt the method… I don’t have evidence to suggest it would work for them; all I can do is suggest it might be worth trying with as much enthusiasm as the idea deserves. Is that not still worthwhile? I think it is. Small scale action research is immensely powerful in creating a culture in which outstanding teachers thrive; knock the value of action research at your peril! Size isn’t everything and the results of a small local trial could well be more meaningful in that context than the transfer of findings from data-rich large-scale RCTs which average out the detail.

On the issue of measuring outcomes, as a cautionary aside, I often reflect on the sad fact that the very best exam results I’ve ever had for a class of my own came after we ran out of time and I taught the P3 GCSE module in about 15 lessons; we crammed, taught to the test in a mad panic and drilled on past papers. Bingo! A*s galore. Were they all that good at Physics? No. Were they well prepared for A level? No. Was it a good learning experience? No. But the data never lies! What this shows is that the testing process is limited and that surface recall gets you a long way; too far. We can’t always measure what we value and that is a key concern in conducting a trial.

With all that said, here are some examples of RCTs I’d like to see the results of:

  • Does teaching about particles, atoms and molecules before teaching about chemical reactions improve understanding? Logically it should…but does it? My hunch is that effective sequencing in the curriculum is an important area.
  • Does extensive use of mini-whiteboards in class discussion during lessons provide Maths teachers with as good or better understanding of students’ capabilities compared to marking books after work has been completed? Does it have any impact at all? (My bet is ‘yes’ but only where the teacher engages with the responses… Hmmm, how to control for that?)
  • How does the improvement in students’ writing following peer-assessment using a given technique compare to the impact of teacher assessment…controlling for time spent and other variables? Related to this: does a student’s writing improve if they have regular opportunities to peer assess the work of others?
  • If Year 7 was taught the identical scheme of work to Year 8, would they do just as well? (With obvious implications…)
  • If, in multiple trials, teams of three teachers taught parallel mixed ability groups and then the same teachers taught three tiered ability groups, competing to gain the highest progress score each time, which structure would yield the best outcomes?
  • If students with level 3 in English engage in paired reading with a Sixth Former for half an hour every day for two months, does it improve their reading age significantly?

If these trials and hundreds of others like them were conducted, we’d certainly be in a better position. However, in my opinion, even here we’d still be working in the territory of ‘insight’. The results might influence some changes in policy and practice but ultimately I suspect that any changes would always be primarily driven by socio-political values as teachers and politicians continue to cherry-pick the bits of evidence that suit them. We might be in a position to resist the imposition of national policies that we don’t agree with and that would be a good thing for sure. In the end I would also bet that the greatest gains to students come from the reflection/self-evaluation effect of teachers engaging in and with research processes in their local contexts, regardless of the outcomes of the trials themselves. It would take a mega-meta-RCT to prove that!



  1. Thanks for this very timely contribution to the conversation about research in education. I think that the surge in interest and comment in relation to the validity of any given educational intervention must come as a result of the underlying fatigue many feel after having spent a lot of energy barking up the wrong trees.

    New Zealand has a strong culture of ‘teaching as inquiry’ which is something I would like to see develop in the UK. It is very much along the lines of what you describe as “insights”. Teachers are encouraged to take an investigative approach to their practice; teaching with a set of clear intentions (once determining the specific needs of the cohort) and with a critical eye on impact.

    For this to become anything more than well-intentioned subjective reflection – which in my view is of great value in itself – many of us might benefit from some exposure to the world of randomised controlled trials, research methodology, notions of qualitative vs quantitative analysis etc. To me this is as much in order for the teaching fraternity to develop a more reasoned and confident answer each time it is asked to pursue yet another fad as it is to underscore our professional practice with formalised research evidence.

    I’ve offered to co-ordinate a research project leading up to the ResearchED conference via the #blogsync community. I think the first thing I’ll encourage the participants to do is read this article of yours. Another book that seems relevant to the current discussion is “Bad Education: Debunking Myths in Education” by Philip Adey and Justin Dillon which is written in the UK context and addresses an educational ‘myth’ in each chapter, each time evaluating the quality of the research evidence.

    Thanks for bringing some balance to the discussion, as always.



    • Excellent. As a primary science leader, I really feel teaching as inquiry is just as valid as learning as inquiry! Teachers need to be more scientific. As the science community says: “not by authority”, but evidential proof!


  2. An excellent post. There is a place for different types of research. I have found that the benefit of action research is often in the reflection and change of the teacher participant, rather than finding out a reproducible result which could be rolled out across myriad different classrooms. Indeed, it is the development of collegiality (through working together or sharing results) that is more useful than the results themselves.


  3. Really enjoyed reading this, Tom – thanks.

    After thirty years in teaching, the last ten as a head, I’m now doing a Professional Doctorate in Education, looking at the transition between deputy headship and headship in schools, and I’m finding that fascinating. Through my work at the university I have met several teachers at an early stage of their career who are studying for Masters degrees (often because their PGCE has given them so many credits towards a further degree). I’m impressed at how what they’re learning is feeding into their departments/schools in a constructive way. As you say, it’s the process that’s valuable, in addition to the results, which may be small scale but which are still hugely powerful for the individual teacher/researcher, and which have benefits for the professional community within which they’re working.

    This is certainly a far more powerful form of CPD than the CPD I experienced in the early years of my teaching career in the 1980s.


  4. A very interesting post, thank you. Like you I am a little sceptical about the recent rush towards RCTs as the magic pill for achieving valid and evidence-based decision making in education, for similar reasons to those above. There is something quite seductive about conclusions that are based in Big Data, and built from quantitative methodologies, and I think we have to also reflect on how our own academic training shapes our preconceptions towards social science methodologies. This applies particularly to those with a natural sciences background, but I have a humanities background and, when I was teaching, I was pretty dismissive of qualitative approaches until I actually started studying education as an academic discipline and realised how close-minded I had been.
    In the Education department at the Institute of Physics, for example, we have been having a very productive internal discussion about the strength of Hattie’s effect sizes; at first glance it seems pretty unquestionable, and you get numbers too! …this number is bigger than this number, so it must be better. But as others have commented/blogged, the provenance for these numbers is not always well understood and when you dig into how they are generated as many questions emerge as through any other methodology. And you can get some counter-intuitive results here too: for example, Hattie seems to suggest that a teacher’s level of subject knowledge is of minimal effect, which I don’t buy at all.
    As you (and Carol) say, action research is a very useful way to generate discussion amongst teachers and generate a culture of reflective practice and teachers’ ownership of their own professional development. However I do think that a better awareness of some of the features of qualitative methodologies and their strengths and weaknesses could develop these discussions and give teachers a clearer vocabulary. Like your colleague with her Masters I have just written an extensive piece on the methodology for my own research… I’m not saying all teachers should do Masters or (in my case) a PhD – although I do regret the loss of the “all-masters profession” ambition of a few years ago, but that’s another issue – but it would be of benefit to have greater awareness of using case studies as research, of using discourse analysis when analysing speech, even at a fairly simple level.
    Big Data and RCT-based work I think is best suited to Big Issues, such as the efficacy of school governance systems or national-level curricula. What works at classroom level is so complex and dependent on the individuals involved, most significantly the teacher, that a case study approach or similar is more appropriate and, ultimately, more valid.
    Looking forward to continuing the discussion at Dulwich…


  5. Hi Tom, another thoughtful blog and great that you are involved with the ResEd conference. I agree with your outcomes focus, but then this is what my business is about, trying to emphasise the connection between evidence and outcomes in education. I also agree that you need to look at the real benefits of getting a grade on an exam, as this doesn’t always mean what it implies. Like others I am a bit sceptical on the application of RCTs to education (but on the other hand impressed by the data that has been produced!) and agree with comments about the value of case studies, especially within a localised context. The risk is that policy makers pluck a particularly favourable case study out of the air and try to apply it everywhere nationally or regionally. As with many areas in education the devil is in the detail and ultimately schools and colleges will have their own ways of tackling innovation and evidence-informed approaches that are appropriate to their circumstances. Localised strategic education partnerships (LSEPs) are one model that could be developed to engage with this appropriately in a defined area. See: .


  6. Really interesting post. As usual! Having just embarked on an MA myself, I find that the amount of evidence and research already out there is pretty staggering, if only it was more easily available. The cost of journals and the access that teachers have to the writing out there should not be a barrier.


  7. Hi Tom

    I really enjoyed this post. I have just finished a masters course through the University of Cambridge Faculty of Education. Prior to taking part in the course I felt as if I was becoming a little stale. The opportunity to learn was possibly the most important part of the course to me as an individual as it gave a sense of purpose to my role in school.

    The paradigm that I used for my study is known as ‘Teacher-Led Development Work’. This is a concept founded by David Frost and Amanda Roberts. The main principle behind the paradigm is that teachers at all levels should be able to lead. Its key features relate to ‘Distribution of leadership’ from the bottom up. Everyone can lead, in effect.

    The process starts by clarifying your own values and what is important to you as an individual. This then allows a concern to be formed which then allows the research to begin. I was encouraged to leave out numerical data and concentrate on qualitative forms of enquiry. I used interviews with transcription as my main method.

    I have found that I have become interested in my concern on such a deep level that I was upset when the course finished. I now have a completely different outlook towards my role. I am also able to contribute to discussion on a far greater level than ever before.

    The TLDW method worked for me. It gave me independence but allowed me to drive through my leadership challenge. The key to having a successful experience is the backing of supportive staff and professional mentors. You really have to be backed by the head teacher so that the project is supported.

    In my case I explored the concept of resilience amongst a group of students with emotional well-being concerns. I introduced the UKRP to these students. I recorded my findings to form the foundation of my project.

    It is still something that I really care about now. Overall the opportunity to learn again has reinvented how I am viewed by my colleagues with a significant improvement to my own self-efficacy.

