Intelligent people whose opinions I respect are having thoughts about LLMs that I find…less persuasive. Corey Robin reported how he’d asked his daughter to run some of his essay questions for students through Chat-GPT; the initial results were superficially plausible but bland and vacuous, as one would expect, but with additional prompts and tweaks the results were pretty well indistinguishable from a really good paper. As he notes, a student doesn’t have to be able to write a good paper, just tell the difference between a good paper and a less good one. The outcome: he’s switching to in-class midterms and finals for the first time in thirty years. And then Dan Davies popped up in response to ask provocatively whether this is really just like the arrival of the pocket calculator making skills of complex mental arithmetic redundant; is it really worth all this fuss if the machine can do it more reliably and vastly quicker?
For me – and this is probably a VERY age-specific experience – the advent of the pocket calculator as an acceptable thing to take into an exam hall was never about mental arithmetic, as those skills were long since ingrained, but about the replacement of four-figure tables for working out cosines, logs and the like. I did actually still have to use those in anger once – I cannot remember which paper it was; Advanced Maths O-Level seems most likely – as the batteries in my calculator ran out, and in those days they still kept a stack of four-figure tables handy; and I got an A, so that was fine. These days I imagine they keep a stack of spare batteries.
One point here – and this is where I think Dan is wrong – is that the pocket calculator does exactly the same calculations and looking-up of cosines that I did, only faster and to a far higher level of precision (more than four figures…). It’s a direct replacement, a superior replacement; the skill it renders obsolete in an exam situation is that of using four-figure tables to look up cosines, which is a skill that had also been rendered obsolete everywhere else. When all the lights go out, of course, you may be glad of people like me who could if necessary accurately calculate angles without the use of electricity, but for the moment it is singularly useless.
With LLMs, it is not at all obvious that the skills of processing, analysing and interpreting large quantities of complex textual information and developing original theses and arguments on this basis have suddenly ceased to be useful for human beings. Further, the LLM is not replicating the human thought process, even if it produces something that looks like the results of such a thought process; it is weighing up the probability of different arrangements of words, with no understanding of their content or relation to any sort of reality. Given its ability to use an unbelievably vast corpus of data for this purpose, perhaps this might yield some interesting results now and again, and certainly it’s getting better at mimicry. But Corey is absolutely right that if students can produce good-seeming critical interpretations by prompting Chat-GPT, the essay ceases to be a reliable means of testing their abilities actually to analyse and interpret material critically.
Where he goes wrong is in switching to in-class exams, as the ability to perform well in those is only partially correlated with the ability to analyse and interpret material critically, as opposed to short-term memory retention and bullshit – which is an area where LLMs might as well be brought in as replacements. There is surely a strong case to be made that, while skill in prompting Chat-GPT to generate convincing simulacra of human thought is definitely not the same as skill in actual critical thought, it’s a more useful skill than being able to scribble a plausible essay from memory in an hour or so.
I spent some time this week adding a section on ‘Use of Generative AI’ to my module handbooks for next year, given that the university’s guidance amounts to a vague and rather desperate ‘ask your module director if you’re allowed to use gAI for different purposes, reference this properly if you do, but never submit gAI-generated content as your own work’. Possibly the aim is to make the process of working out what would constitute gAI-free content if gAI has been used in the process of research and writing so burdensome that students will steer clear; ditto the requirement to provide full information about all the prompts used and all the outputs. That’s certainly a more viable approach than attempting to bring any sort of disciplinary case on the basis of such guidelines, even if the phrase ‘academic offence’ is also bandied about.
I can’t help reverting to the guidance of the maths lessons of my youth: Show Your Working. A simple answer, whether produced through mental arithmetic or a calculator or intuition, is simply right or wrong; if you set out the steps in the calculation, up to the point where you indicate what you asked the calculator to calculate, it’s possible to see where you went wrong, and to gain some credit for thinking along the right lines even if the final answer is wrong. The humanities equivalent: the final essay in itself is not the thing we are interested in, let alone its conclusion, but the essay as evidence for the thought processes that generated it – the breadth of research and critical understanding of it, the evaluation and interpretation of different sorts of evidence, the logical construction of argument and so forth.
There may indeed be an appropriate place for LLMs within this process, analogous to the use of a pocket calculator for the final calculation; I haven’t yet seen a persuasive idea for where this might be, but I don’t rule it out on principle. The crucial point is that, for the purposes of this form of assessment, its role does need to be manifest, rather than the ‘black box’ approach of an essay generated by gAI. Perhaps we need to think of it in the same terms as an argument based on pure intuition or unexamined assumptions – equally ‘black box’ processes from the perspective of the reader, that need to be elaborated and explained to be at all persuasive. This may suggest that the polished essay is actually a problem – we need to be shown more of the nuts and bolts. But unseen exams are never the answer; they’re even more ‘black box’-y bullshit, just less polished…
In the last couple of years before I baled out I’d stopped enjoying teaching, but I always enjoyed marking. Glad I’m not there to have that taken away from me!
Yes, the “pocket calculator” analogy is tempting but actually it’s way out. (Apart from anything else, I remember when you could tell who was serious about Maths A Level by how fancy their slide rule was…) I did a “social science methods” course as part of my doctorate which actually involved buying a calculator, but purely on a “then press the ‘chi-square’ button” basis, i.e. to substitute for the kind of calculations nobody would actually do.
The odd thing about AI and essay-writing, at least from a what-we-no-longer-call-post-92 perspective, is that what AI consistently does well is the one thing that no students can do, or virtually none: an essay as polished and fluent as the prose that AI typically produces would immediately ring alarm bells, and indeed frequently did. Perhaps you should bring in in-class tests, but use them purely as source data on the student’s actual writing style; you could use the kind of random topics that prefects legendarily used to set as punishments (“two sides on The Inside Of A Ping-Pong Ball!”).
Either I’m just too young for slide rules, or I did the wrong sort of Maths A-Level (Maths & Stats); either way, I remember how disappointed my father – who did a Maths degree – was that I didn’t feel any need to inherit his beloved slide rule.
Interesting idea about the in-class tests, but I can’t see that getting through any scrutiny committee, and in any case I don’t think writing under timed conditions would necessarily provide useful data about style. Yes, perfect grammar is always a warning sign, let alone perfect US grammar and spelling, but it’s probably possible to prompt Chat-GPT to use lots of comma splices…