Saturday, 30 August 2025

Deconstructing "Authentic" Assessment

Introduction 

​I was recently asked for my opinion about on-site, situated assessment—the type of assessment its disciples call "authentic".

This post argues that the concept of "authentic" assessment is a philosophical contradiction, a contradiction that becomes clear when we understand that any assessment, simulated or otherwise, can have no reality beyond its own constructed text. I'll explore how this fundamental insight, rooted in the work of Jacques Derrida, reveals a deep flaw in the "authentic" assessment approach to educational evaluation, a flaw that is both philosophical and profoundly human.

Derrida's famous aphorism, ‘il n'y a pas de hors-texte’ (‘there is nothing outside the text’), is usually applied to questions in literature and philosophy. I argue that educational assessments operate in this way. The moment we bring a third-party observer into the workplace, the work is immediately reframed. A rubric is set, a time limit is imposed, and a specific task is assigned; in that moment, the real world ceases to be the subject. The subject becomes a constructed simulation, a text to be read and evaluated.

​This dynamic creates a profound identity crisis for the apprentice, whose professional being is destabilized by a new, spectral identity as a “candidate”. Derrida's philosophy, in the reading of Dooley and Kavanagh, is fundamentally concerned with the related questions of memory and identity. It is a philosophical landscape haunted by ghosts and spectres, traces, ashes and mourning. 

The destabilization of the apprentice's identity, this haunting, rooted in Derrida's concept of hauntology, is summoned by the assessor’s gaze. It is a presence that problematizes the apprentice’s being, ensuring that the individual is no longer a single, coherent being but a paradoxical blend of two conflicting identities, the candidate/apprentice. While the assessor's gaze reads a performance, the master electrician's gaze, in complete contrast, is a pedagogical one, and this distinction is where we find the true essence of authenticity—not in a test, but in a tradition.

The Apprentice Electrician as Assessor's "Text"

When assessing a candidate in a simulated environment, say, on their ability to test a lighting system, the specific test results don't matter in the same way that the results matter on-site. What matters in the assessment are the procedures and actions that the candidate performs. The candidate, in the simulated assessment, is not testing the circuit, but is instead performing through their actions that they know how to test the circuit. And in this regard, their actions are as much a simulation of testing as the environment is itself simulated. Their performance is judged on criteria like the settings of the test meter, and whether the test is carried out from the correct part of the circuit. They are, in other words, assessed according to the assessment rubric, and not on whether the installation wiring is up to scratch. The entire scenario is a contrived representation of reality, designed for the sole purpose of evaluation. 

This arrangement ensures that the environment is safe for the candidate to perform and, more importantly, to make mistakes without putting either themselves or the assessor at risk. The assessment can reliably be taken at any centre with any assessor because the simulation is reproducible. The assessment takes place in a controlled environment free from the noise and distractions of a construction site. All the real-world messiness of an electrical installation on a site is cleared away to ensure that the simulated assessment only assesses the candidate's performance, ensuring that the assessment has validity. Taking place in a test centre, the assessment is practicable and manageable in terms of bookings and staffing levels. And with all those controls in place, the assessor can concentrate on reading the candidate's actions as if they were reading a text. The candidate's actions are not real work but are instead signs to be interpreted by the assessor. There is no reality to the candidate's performance outside the text of the assessment rubric.

This philosophical error is somewhat paradoxical because the flaw in the “authentic” assessment is that it lacks genuine authenticity. The candidate still has to meet constructed assessment criteria, and it is the assessor’s reading of what is observed, and not the electrician's, that decides how the candidate has performed. It is, in other words, no more "authentic" than the simulation.

Perhaps even more than that, because there is a profound irony here. By its very nature, a situated, so-called "authentic" assessment takes place in the workplace. It is a disruption to the quotidian, the ordinary, the expected. It is a pre-planned, pre-scribed event that interrupts the flow of the workplace and imposes an external frame of scrutiny onto an organic process. To quote Derrida's friend, Emmanuel Lévinas, the workplace suffers 'an abrupt invasion' such that the space the assessor enters ceases to be: the workplace ceases to be a workplace. And, in this respect, there is nothing more inauthentic in the workplace than an "authentic" assessment.

The Haunted Identity of the Apprentice

​Why summon hauntology to discuss authenticity in assessment? By melding haunting with ontology Derrida reconceptualises ghosts as spectres who disturb notions of time and history, blurring distinctions between the present, past, and future. Spectres also problematize being and non-being—they are paradoxically present but absent, unsettling conventional categories of what is real and what is not.

Thus begins Carmen Vallis’ paper which uses Derrida's concept of hauntology to trouble the idea of "authentic" assessment, framing it as a ‘spectre of lost futures’.

The apprentice is a professional-in-training. On-site, they are what they seem to be: their identity is set in a practical, day-to-day sense; it is a shared, recognized identity within the community of practice. But the moment they enter an assessment, their identity as an apprentice is joined by a new, paradoxical identity: that of the “candidate”.

This new identity is a kind of spectre; present, but not fully real, one that, as Vallis' paper points out, "problematises being and non-being". The apprentice is still an apprentice, but their professional being is now haunted by this other non-being of candidate, both, as Dooley and Kavanagh put it, ‘belonging and non-belonging, reducible to neither one nor the other’. We can formulate this dyad as the candidate/apprentice. The single word "apprentice" unable to exist without the trace of “candidate”. Both in a state of constant deferral.

In this haunting, the identity of the individual becomes fundamentally non-coherent. The apprentice's being is defined by their work, their relationship with the master electrician, their place within the grading structure. The candidate's identity, however, is inseparable from the assessment's rubric, the assessor's gaze, and the need to perform for a grade. These two identities exist simultaneously but can never truly merge or cohere into a unified whole.

Denial of the Spectre

This is where a more profound philosophical error lies. The advocates of "authentic" assessment seek to deny the existence of this spectral identity altogether. Like Hamlet asking Horatio, "Do you see it?" the answer that advocates of "authentic" assessment want to hear is "no," because that denial allows them to claim that the assessment is a pure, unmediated reflection of the apprentice's true being.

This haunting occurs irrespective of the simulated or situated nature of the assessment. The assessor's gaze is what summons this spectre. The assessor isn't just observing the candidate/apprentice's actions; they are reading them through the lens of an educational text. This gaze transforms the apprentice's professional identity into an academic one, a paradox that exists only within the contrived space of the assessment. Derrida's theory of hauntology lifts the visor of this ghostly presence to reveal that the "authentic" assessment is no more real than a simulated one.

It is the same with "authentic" assessment advocates who try to solve the problem of non-coherence with a separate, flawed act of coherence: equating the "authentic" with the real. ​A critical point to understand here is that a simulated assessment cannot be said to lack authenticity because it can never possess it. It is not a poor imitation of the real world; it is an entirely separate system with its own logic and its own text. A simulation's purpose is not to replicate reality but to create a controlled environment where a specific, pre-determined set of skills can be observed and evaluated against a specific rubric. The simulation is what it is—a text of a test—and pretending it is anything other than what it is in order to criticize what it cannot have is an act of bad educational faith.

But "authentic" assessment does more than simply deny the presence of the spectre. It seeks to magic its paradoxes away using the power of words. Its very name an incantation, a prayer, a spell. By calling itself "authentic", the term attempts a deliberate act of concealment. The name itself is a kind of declarative magic, seeking to make real what is only an illusion. It is a linguistic sleight of hand that tries to erase the very spectre it summoned. By naming the assessment as authentic, we are led to believe that its reality is a given and that it has an outside to which it can be compared. But this is a flawed premise. The name is not a description of a quality; it is a declaration of a belief, an attempt to make a phantom cohere through an act of language.

The Electrician's Gaze: A Pedagogical Act

The gaze of “the other” signifies an ethical and existential encounter with a perspective beyond one's own. While the educational assessor's gaze haunts the apprentice by transforming them into a candidate, the master electrician's gaze does different work. 

When I was an apprentice, I remember wanting to see my journeyman smile, acknowledge my work and realize that I respected him and his teaching enough to want to reproduce it. I wanted to see him see that I was learning and doing well. But the smile that I wanted to see wasn't just a mark of approval; it was a confirmation that I was becoming successfully initiated into a community of practice. The apprentice wants to see the journeyman's smile because that acknowledgment is a two-way sign of respect, and a confirmation that they are successfully learning to reproduce the master's craft.

​This gaze is part of a shared text—the tradition of the trade itself. It is not there to judge against a rubric, but to guide, teach, and provide a model. This pedagogical gaze is a gaze of mentorship and acknowledgment. The apprentice is not performing for a grade; they are participating in a conversation that has been going on for generations. This is the un-haunted form of assessment, where the performance is not judged against a pre-written text, but is becoming part of a living one. And whilst all assessment is a form of reading, the master electrician's gaze is one of mentorship and tradition, whilst the educational assessor's is one of cold, bureaucratic scrutiny. What the assessor sees, all that the assessor sees, is the performance of the haunted candidate/apprentice. It is disingenuous to regard these identities, these differences, to be coherent or reconcilable.

Conclusion 

As soon as an assessment rubric is in play the apprentice is haunted by the spectre of being a candidate. This happens irrespective of whether the apprentice is in a college classroom, an assessment centre, or if an assessor assesses them on-site. There is no escaping the fact that all assessments are essentially inauthentic. We cannot escape the reality that any assessment, simulated or otherwise, is a constructed system. The frame is inseparable from the assessor's gaze. When that frame is established, there is no outside of it—and huis clos, no way out. An assessment is not merely an observation; it is a complex, closed system of evaluation. To argue otherwise is at best naive and, at worst, dishonest.

References

Mark Dooley, 'The Surprising Conservatism of Jacques Derrida', The European Conservative, https://europeanconservative.com/articles/essay/the-surprising-conservatism-of-jacques-derrida/, accessed 20th August 2025

Mark Dooley and Liam Kavanagh (2007) The Philosophy of Derrida, Abingdon, Routledge

Carmen Vallis (2025) ‘Authentic assessment in higher education: the spectre of lost futures’, Teaching in Higher Education, 30:3, 744-751, DOI: 10.1080/13562517.2024.2362217

Episode #190 (26 December 2022) - Deconstructing Derrida: A Dialogue with Peter Salmon by Converging Dialogues on Audible. https://www.audible.co.uk/pd/B0BR4CTH5Q?source_code=ORGOR69210072400FU, accessed 21st August 2025

Peter Salmon (2021) An Event, Perhaps: A Biography of Jacques Derrida, London, Verso

Image Credits

Gisela Giardino, Derrida at Jorge Luis Borges´ home in Buenos Aires, 1995, uploaded to Wikimedia Commons on 12th October 2004, https://commons.wikimedia.org/wiki/File:Jaques_Derrida_(cropped).jpg, CC-A-SA 2.0 Generic, (accessed 25th August 2025)


Hamlet, Horatio, Marcellus and the Ghost (Shakespeare, Hamlet, Act 1, Scene 4), print, Robert Thew, after Henry Fuseli, uploaded to Wikimedia Commons on 11th July 2017, This file was donated to Wikimedia Commons as part of a project by the Metropolitan Museum of Art, CC0 1.0, https://share.google/WTpf9JyXlEcycvQQq, accessed 26th August 2024

Sunday, 10 August 2025

From Classrooms to Courtrooms: On the Horizon of Trust in Online Assessment

What are the barriers to greater adoption of online and on-screen assessment in high-stakes, sessional qualifications in England?

That's the question Ofqual sought to address in its December 2020 report into the issue. The report separates the most significant barriers into three groups: those associated with IT provision in schools and colleges; implementation challenges; and challenges maintaining fairness.

This post is focused on a feature of the implementation challenges, and in particular, the concern in stakeholder groups essential to the success of any deployment and in broader public opinion.

As part of its attempt to answer the question at the top of this post, Ofqual convened a workshop in January 2020, bringing together a cross section of well-informed stakeholders. These stakeholders represented teachers, school and college leaders, technology providers, awarding organisations, industry bodies, government, and Ofqual. Each of these groups undeniably holds valuable perspectives on the practicalities and challenges of online assessment. All of those representatives were rightly invited, included, listened to, and valued. But there's a glaring omission. An exclusion from the list of the invited participants: parents.

Ofqual


Despite the profound impact online assessment might have on their children's educational practices and their own role in supporting learning, parents or parent group representatives were conspicuously absent from this crucial discussion. This wasn't merely an oversight; it was an exclusionary choice, symptomatic of a deeper, systemic distrust of parents within the wider educational system. Indeed, while other stakeholders were invited to a direct, interactive workshop, Ofqual's approach to understanding parental perception instead relied on a YouGov survey, the 'Perceptions Survey Wave 18'.

This represents a clear two-tier approach to information gathering: on one hand, a networked, collaborative discussion among educational professionals; on the other, a third party's arm's-length survey for parents, effectively avoiding any direct engagement with the wider public. This isn't just about the absence of any specific 'parent group'; rather, it indicates how Ofqual utilized its established network to exert power over the discussion surrounding online assessment, instead of seeking to establish its authority on the issue via genuine dialogue with all affected stakeholders. There's reassurance for the institution in being able to say that they have consulted the "experts," but this very claim undermines its own validity, as it hasn't taken the crucial parental opinion into account. Indeed, there's a profound irony here: the very act of using these selected "experts" to establish the boundaries of the problem and its solutions becomes a technique for presenting the solution as a fait accompli. Once the limits have been set by this inner circle, any further involvement—perhaps, even with parents at a later time—is reduced to merely tweaking an already decided-upon solution. In their attempt to solidify their authority through a narrow consensus, these "experts" unwittingly undermine the broader legitimacy that genuine, inclusive dialogue would have provided. This approach goes further than just an oversight; it presents an ethical concern.

As Roger Scruton articulated in a different context, there is a critical difference between using people as a 'means to an end' and genuinely valuing their expertise to serve 'civil society as the end'. By resorting to an arm's-length survey rather than direct, collaborative engagement, Ofqual risked treating parents as mere data points for a pre-determined outcome, rather than as integral partners whose unique expertise could genuinely shape the most effective and trusted path forward for online assessment. Parents are not simply bystanders of educational policy; they are integral partners, offering unique insights into student well-being, home learning environments, and the real-world impact of pedagogical shifts. To exclude them from a direct discussion on 'barriers to online assessment' suggests a fundamental undervaluation of their perspective, treating them as external actors rather than essential contributors.


There's a further irony here: the workshop appears to have sought a purely technocratic understanding of the challenges in introducing online assessment for high-stakes, sessional qualifications. Yet, in its very pursuit of removing these technocratic barriers, Ofqual inadvertently imposed a significant barrier on participation itself, overlooking the crucial human and trust dimensions of implementation. My own experience, having previously highlighted a similar snub given to parents in the Scottish educational sphere, only reinforces the pervasive nature of this institutional reticence to engage directly with parent groups.



Moreover, while the Ofqual report does acknowledge 'public concern' on page 16, it notably fails to delve into, or even acknowledge, a critical aspect of this concern: the widespread public distrust in large-scale IT implementations and solutions. This omission is particularly striking given the recent, devastating public experience with such systems, a distrust that has permeated public consciousness with profound implications for how new technologies are perceived and adopted.

Once you realize that parents had been excluded from this crucial workshop – a direct, collaborative forum – and relegated to an impersonal survey, it's hard to shake the feeling that Ofqual was, perhaps unwittingly, setting itself up for a repeat of history. This approach carries the distinct risk of mirroring the very cycle of denials, avoidance of responsibility, and disregard for the truth that the Horizon scandal so chillingly epitomises.

Horizon


Nowhere has this erosion of trust been more acutely felt than in the ongoing Post Office Horizon IT scandal. The Post Office, as Nick Wallis notes, was historically "the first Government agency" and, until recently, "the main physical interface between the British state and its citizens". It was more than just a business; it was woven into the fabric of British life, carrying with it deeply romantic associations depicted across poetry, literature, film, and television – from the gentle world of Postman Pat to the profound works of Benjamin Britten and W.H. Auden. These powerful cultural symbols, Wallis explains, represented core "notions of efficiency, stability, security" and, crucially, "trust" [pp. 18-19].

This deep-seated public trust, cultivated over centuries, was catastrophically shattered by the Horizon IT system. Innocent postmasters, whose lives were intrinsically linked to this symbol of stability and trust, found themselves accused of theft and fraud owing to a faulty computer system.

But the breakdown of trust extended even deeper. As Wallis also reveals, the very foundation of the relationship between the Post Office and its subpostmasters was one of profound "trust" [Nick Wallis, p. 27]. Subpostmasters operated with significant autonomy, managing their branches, handling cash, and acting as pillars of their local communities. This professional relationship, built on mutual reliance and good faith, was systematically betrayed by an institution that prioritised a flawed IT system over the integrity of its own people. The enduring lack of justice for these individuals does nothing to mend this fundamental breach, preventing any true closure and hindering the possibility of rebuilding trust from the ground up.

This multi-faceted collapse of trust within the Post Office saga resonates with the earlier example of parental exclusion in educational policy. In both instances, institutions appear to operate from a position of inherent distrust towards those they are meant to serve or collaborate with – whether it's parents in the educational sphere or dedicated subpostmasters on the frontline. This pattern of systemic distrust, and the devastating consequences it brings, is the unseen thread connecting classrooms to courtrooms, underpinning a dialogic breakdown of trust. Institutions, in their perceived need to exert power or control, demonstrate a distrust of the public they serve. This, in turn, fosters public cynicism and a lack of faith, trapping both parties in mutually fulfilling negative feedback loops that undermine genuine progress and societal cohesion. This repeated failure of leadership to take seriously the concerns of ordinary people undermines trust in our institutions and has brought about a national crisis of institutional distrust.

Conclusion


Whilst the two cases are nowhere near morally equivalent in their scale of direct harm, they share a common, worrying thread. Jordan Peterson, in Beyond Order, exhorts us not to denigrate institutions. We should respect them, preserve them, work within them trusting that they will devote themselves to producing something of value beyond the insurance of their own survival. However, as the Post Office's betrayal of its subpostmasters demonstrates, institutions can all too easily undermine their own worth through insularity, an aloof disregard for ordinary people's voices, and an arrogant assumption that they know best. When institutions like Ofqual use an arm's-length survey for parents while hosting networked, collaborative workshops for others to discuss a question of a large-scale IT proposal, they fundamentally betray the moral authority of the people they serve.

Rebuilding trust, then, isn't just about transparency or process; it's about a fundamental recognition of the people's inherent moral right to be heard, respected, and served justly. Only when institutions truly embody the spirit of genuine dialogue and accountability can trust begin to be restored, moving us away from courtrooms and towards a public sphere where all voices, and not just a clique with the correct opinions, are genuinely valued.

Saturday, 9 August 2025

Good for the Goose? Norman Tebbit's Other Test

When I was a boy in the early 1980s, Norman Tebbit was an almost permanent feature on the news and airwaves. He was, from 1981 to 1987, a member of Margaret Thatcher’s cabinet. He was considered by his critics to be uncompromising, unforgiving, and unyielding; but for those same qualities he was as loved as he was loathed.



If there's one single test associated with Norman Tebbit it is undoubtedly his infamous 1990 cricket test. Coined by him, it arose because of the perceived lack of loyalty shown by South Asian and Caribbean immigrants for the English cricket team. Tebbit controversially suggested that the test should be applied to the children of these immigrants to assess whether they had genuinely integrated into this country.

However, in this post, I'm going to focus on an entirely different test: one that Tebbit failed twice, and one which, I hope, illuminates the assessment principle of reliability and one of the responses to unreliability; generosity.

Upwardly Mobile


I'm going to begin by relating a story that Tebbit tells in his 1989 autobiography Upwardly Mobile. The date is 1953, and Tebbit was flying high as an officer in the Royal Auxiliary Air Force. He was an ambitious young man and he wanted a job as a pilot for the British Overseas Airways Corporation (BOAC).

At the time, BOAC policy was to train its pilots to be multi-skilled; able to fly the plane and if necessary navigate it as well. Consequently, to become a fully-trained pilot Tebbit had to become a fully-trained navigator. The technical examination for the navigation licence was, he tells us, of a very high academic standard with, all told, twenty weeks of intensive instruction and many hours of swotting at home.

Thankfully, he passed all the components of the exam; all, that is, except one: the signals practical. That’s not a bad outcome; he’d learned the necessary discipline of self-study, and now all he had to do was focus his attention on the remaining, outstanding component. The requirement of the failed assessment was to send and receive six Morse code words per minute sent by an Aldis lamp or flashed by airfield beacons. Tebbit says that he found the six words per minute requirement difficult enough without the added difficulties of the media. And so, he failed his second attempt.

Not surprisingly, he was nervous when he went for his second re-sit. He took his seat in a large hall with forty other candidates, and to his great surprise the examiner called him out to the front.

—You know the rules, don’t you?
—Yes Sir.
—If you fail any subject three times, including this practical one, you have to retake the complete examination.
—Yes Sir, I do.
—Then you mustn’t fail, must you?
—No Sir, except, you see Sir, I’m not much good at this.
—I know that, so if you miss anything give me a nod and I’ll repeat it - oh and if all else fails I’ve put you next to an RAF signaller!

Needless to say, he passed. 
Or did he? 
Did he pass, or was he passed?

Unreliability and Generosity


Let’s consider the helpful examiner’s actions. How would you describe what he did? Cheating? Forgiving? A lowering of the academic standard? A helpful hand-up to someone who was obviously capable of flying a plane? Does the answer depend on your political prejudices?

Reliability is defined by the Cambridge Assessment Network as the extent to which the results of an assessment are consistent and replicable. So if an assessment is highly reliable it means that if a student took a different version of the test, or if a different examiner marked the test, they would get exactly the same result. I’ve written more about reliability here.

However, when assessors consider an assessment to be unreliable, generosity can be used as a compensatory technique. Did the examiner think it ridiculous that pilots should be fully-trained navigators as well? Or, did the examiner disagree with the three-strike rule? Did he think it overly harsh, an inefficient waste of everyone’s time?

On using Generosity to Combat Unreliability


Tom Benton in his Cambridge Assessment Network paper On using Generosity to Combat Unreliability begins from an acceptance that no assessment system is perfect and that by focusing on the risk for individual students, we might logically decide how much generosity is required. In particular, Benton examines how to adjust assessment grading when reliability decreases, such as during unforeseen events like exam cancellations. (The COVID lockdown carry-on is the obvious context). The article proposes different strategies for setting grade boundaries, including maintaining the original grade distribution, ensuring no student is disadvantaged by being awarded a lower grade than deserved, or maximizing overall classification accuracy. It introduces concepts like "true scores" versus "observed scores" and quantifies the impact of various reliability levels on misclassification rates across different grades. It can all get statistically complex but ultimately, Benton argues that a logical application of "benefit of the doubt" can lead to justifiable changes in grade distributions during periods of lower assessment reliability.
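For readers who like to see the arithmetic, the trade-off Benton describes can be sketched as a small simulation. This is a minimal illustration under my own assumptions, not Benton's method or data: true scores are normally distributed with a mean of 50 and a standard deviation of 10, the pass mark is 50, and observed scores are true scores plus random error, sized so that reliability equals the ratio of true-score variance to observed-score variance. The function name misclassification_rates and the boundary_shift parameter are hypothetical, chosen only for this sketch.

```python
import math
import random


def misclassification_rates(reliability, boundary_shift=0.0, n=20000, seed=1):
    """Simulate pass/fail decisions from noisy scores.

    Returns (under_graded, over_graded) as shares of all students:
    under_graded: true score meets the pass mark, but the student fails.
    over_graded:  true score is below the pass mark, but the student passes.

    All numbers here (mean 50, SD 10, pass mark 50) are illustrative
    assumptions, not figures taken from Benton's paper.
    """
    rng = random.Random(seed)
    mean, true_sd, pass_mark = 50.0, 10.0, 50.0
    # Reliability = var(true) / (var(true) + var(error)), solved for the error SD.
    error_sd = true_sd * math.sqrt((1.0 - reliability) / reliability)
    # "Generosity" is modelled as lowering the boundary applied to observed scores.
    boundary = pass_mark - boundary_shift

    under_graded = 0
    over_graded = 0
    for _ in range(n):
        true_score = rng.gauss(mean, true_sd)
        observed = true_score + rng.gauss(0.0, error_sd)
        deserves_pass = true_score >= pass_mark
        awarded_pass = observed >= boundary
        if deserves_pass and not awarded_pass:
            under_graded += 1
        elif awarded_pass and not deserves_pass:
            over_graded += 1
    return under_graded / n, over_graded / n


if __name__ == "__main__":
    for rel in (0.9, 0.6):
        for shift in (0.0, 5.0):
            under, over = misclassification_rates(rel, boundary_shift=shift)
            print(f"reliability={rel} shift={shift}: "
                  f"under-graded={under:.3f}, over-graded={over:.3f}")
```

In this sketch, the share of deserving students who fail grows as reliability drops, and lowering the boundary by a few marks shrinks that share at the price of passing more students whose true score is below the mark. That is the "benefit of the doubt" in miniature: generosity does not remove error, it chooses who bears it.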

This positive light on generosity has implications for how it is managed. Should it be left to individual examiners, as it seems in the case of the signals practical resit, or should institutional authorities transparently bake generosity into their assessment strategy?

One might easily think that that question would depend upon the social significance of the assessment. Is it easier to tolerate generosity in a low-stakes school assessment but not in assessment taken by an apprentice electrician if the outcome is that he might electrocute himself, or worse, others, as a consequence of being passed rather than passing? It’s probably worth remembering that Tebbit’s examination was to allow him to fly passenger jets. There are no easy or obvious answers.

‘You're certainly relatively competent’: assessor bias due to recent experiences


One further interesting area of possibility is to be found in Yeates et al's academic paper, ‘You're certainly relatively competent’: assessor bias due to recent experiences. This research paper sheds an interesting light on the psychology of how assessors make poor judgements.

The study's context is medical education, where inter-rater score variability is a known challenge. The researchers conducted an experiment with consultant doctors assessing videos of medical trainees. They split the assessors into two groups: one viewed trainee performances in descending order of proficiency (good, then borderline, then poor), and the other viewed them in ascending order (poor, then borderline, then good).

The study found significant "contrast effects" in assessors' judgement. This means that assessors who had recently seen better performances tended to give lower scores to subsequent candidates, while those who had recently seen poorer performances tended to give higher scores to subsequent candidates. The effect was found in perceptual judgements and in judgements about abstract concepts alike. In essence, differences between the current candidate and recently seen candidates were overemphasized, leading to scores that unduly diverged.

Yeates et al identify a number of key findings that could shed some light on the examiner's actions. The most relevant for this post include:

Normative vs. Criterion-Referenced: The findings suggest that assessors often use "normative" (comparative to others) rather than purely "criterion-referenced" (against a fixed standard) decision-making, and that these internal "norms" are easily influenced by recent experiences.

Lack of Insight: Interestingly, the assessors' confidence in their ratings did not correlate with their susceptibility to this poor judgement, suggesting a lack of insight into how their judgement might be influenced by prior candidates.

Within their “lack of insight” finding, there might be further possibilities such as: 

Lack of Professional Reflection: Assessors, like professionals in many fields, often operate under time constraints and may not be explicitly trained in, or given structured opportunities for, deep, critical self-reflection on their assessment practices. If they lack the "tools" (e.g., frameworks for detecting bias, consistent feedback on their own ratings) or the "time" (owing to workload pressures) to engage in meaningful reflection, then their lack of insight into their own bias is less about malicious intent and more about systemic limitations. In my experience, while reflection is encouraged in principle, the practical support, training, and protected time for truly meaningful self-assessment and debriefing are often scarce. This points to a need for better professional development, support, and a culture that values reflective practice in assessment. If assessors aren't equipped to critically analyze their own judgements and the factors influencing them, unconscious biases are much more likely to persist unchecked.

Dunning-Kruger Effect: The Dunning-Kruger effect describes how people with low ability at a task often overestimate their competence, precisely because they lack the metacognitive ability to recognize their own errors. Conversely, highly competent individuals may underestimate their relative competence. The Yeates et al. finding that assessors lacked insight into their susceptibility to bias, whatever their level of confidence, is consistent with the Dunning-Kruger phenomenon. If assessors are unconsciously incompetent at detecting or mitigating their own bias, they may confidently believe they are fair and objective even when they are not. This "unrealized or unacknowledged" lack of competence in mitigating bias could be a significant factor.

Ultimately, Yeates et al. conclude that these cognitive predilections can significantly influence assessors' judgements in ways that are unfair to candidates. There is a question here begging to be asked: unfair to which candidates?

Whilst Benton introduces "generosity" as a logical response to unreliability, the Yeates et al. paper delves into the subconscious biases that could contribute to an examiner's subjective decisions. In this new light, the decision of Tebbit's examiner might not have been a deliberate act of "generosity" or "cheating" in the conventional sense, but perhaps an unconscious "contrast effect." If the examiner had just come from assessing candidates who performed significantly worse, Tebbit's "not quite good enough" performance might have seemed "relatively competent" in contrast.

Or had he, perhaps, seen Tebbit’s scores for the other components? Were Tebbit's scores perhaps higher than his peers', and did the examiner, through experience, recognize Tebbit’s potential? Under those circumstances, did he think that the assessment unfairly disadvantaged a perfectly good pilot? We, of course, cannot know. But it is not insignificant that these questions come so easily to mind.

This latter possibility raises a further intriguing question about the assessor's decision-making process: could a similar psychological effect arise when an assessor has seen the same candidate perform strongly in two or three components of an assessment, and so influence their judgement of a weaker performance in another component? In other words, is it possible that the examiner, knowing Tebbit's stellar performance in previous sections, was unconsciously swayed to pass him on the final component? It is a fascinating area we'll delve into in a future post.
 

Unfinished Business


Nearly forty years after his BOAC assessment, Tebbit left front-line politics. In the final chapter of his subsequent 1991 book Unfinished Business, Tebbit set out what a post-Thatcher Tebbit government would have looked like. I agree with much of it. But something very particular caught my attention.

Tebbit correctly blames the regrettable state of the contemporary education system on the socialist reforms of the 1960s. Without lapsing into nostalgic sentimentality, Tebbit praises the one-time tripartite structure of grammar, technical and secondary modern schools which, each in its own way, served the best interests of its pupils and their country, coalescing around achievement and meritocracy. The socialist ethos sought, and for that matter still seeks, to blur the distinction between failure and success. The quicker pupils have to be held back to accommodate the pace of the slowest. What could not be achieved by the many was put beyond the reach of any. He then continues:

Despite the overwhelming evidence of falling standards a quite contrary picture is displayed by the architects of these disasters who point to statistics of ever-increasing examination success. Closer examination shows that the biggest examination cheats are not students but examiners, teachers and educationalists who simply water down standards to ensure that the appropriate quota of examinees achieve success.

That's how I remember him. Totally unafraid of left-wing intellectualism, he doesn't just pour scorn on their false gods, he pulls them down and tramples on them unmercifully. When will we see his like again? And yet. What are we to conclude about that hand-up given to him in an assessment that he simply wasn't good enough to pass on his own merit?

One shallow, simplistic solution would be to point a finger at him and cry out “hypocrite”. It would be an answer that tells us nothing. Let us, however, aim for something a little more complex. In their paper, Yeates et al. identify unfairness as one of the consequences of assessor bias. Helping Tebbit to pass let him think that he had passed on his own merit. It allowed him to forget, consciously or unconsciously, the hand-up. This absolves Tebbit of the charge of hypocrisy, but it doesn't get him off the hook: his critique is made from a position of unrealized privilege.

And finally, if we consider the assessor's action as a form of lie, we can see how the lie distorted Tebbit's reality. The very act designed to help him inadvertently harmed his self-perception. It is a tragedy of good intentions. This, then, is the painful irony of the examiner's actions: that the person treated most unfairly was Tebbit himself.


References


‘Reliability’, Assessment 101 Glossary: 101 words and phrases for assessment professionals (n.d.), The Assessment Network, University of Cambridge, p. 10

Benton, T. (2021). On using generosity to combat unreliability. Research Matters: A Cambridge Assessment publication, 31, 22-41

Tebbit, N.
- (1989) Upwardly Mobile, London, Futura Publications 
- (1991) Unfinished Business, London, Weidenfeld & Nicolson 
