As the great philosopher Thucydides once said, “The society that separates its scholars from its warriors will have its thinking done by cowards and its fighting by fools.” This quote has been interpreted in many different ways over the years, but I believe it is still applicable to modern society.
At its core, this quote is a reminder that we must not allow our society to become divided between those who think and those who act. It is essential that those who think and those who act are working together in order to achieve the best possible outcome. This is true in any field, whether it is business, politics, or the military.
In the business world, for example, it is important that those in charge of making decisions also understand the implications of those decisions. This means that they must understand the risks and rewards associated with each decision, as well as the potential consequences of those decisions. Without this knowledge, decisions may be made without considering the long-term effects, leading to disastrous results.
In politics, it is essential that those in power understand the ramifications of their policies and how they will affect the citizens they are meant to serve. Without this knowledge, policies may be implemented without considering the long-term implications, leading to harmful effects on the population.
In the military, it is essential that commanders understand the implications of their orders. Without this knowledge, orders may be given without considering the long-term effects, leading to disastrous results. Ultimately, this quote serves as a reminder that knowledge and action must be combined in order for society to function properly. Those who think and those who act must work together in order to achieve the desired outcome. When this is not done, it leads to disastrous results.
It is up to us to ensure that our society does not become divided between those who think and those who act.
Well, unless there is a dramatic uptick in my viewing stats in the next week, showing that this actually IS what the public wants, I don’t think ChatGPT is going to be taking over writing duties for this blog any time soon. This is very much my ‘tell everyone you’ve been kidnapped or replaced with a badly-briefed shapeshifting alien without telling everyone you’ve been kidnapped or replaced with a badly-briefed shapeshifting alien’ message vibe – if you’re new to the blog, the opening paragraphs above are just as much Not Me as the following quoted passages. It’s a little disappointing – the adverts for post-writing apps that now keep popping up on my Twitter feed promised so much! I mean, I even instructed the AI to write in my personal style, but this is just generic and boring [ignores heckles].
And of course the opening paragraph exemplifies the problem that has already been pointed out by numerous people, that the system has absolutely no conception of accuracy, just an idea of how frequently statements are made on the internet: lots of people say the wrong things (how to treat a seizure; what Thucydides allegedly said), and the AI simply repeats them. This was all too evident in my first attempt at engaging with the thing, on the issue of whether Thucydides was in any sense a ‘scientific’ historian (this being the sort of thing that students regularly write essays about):
At this point it simply stopped responding to me, having presumably gone off in a fit of pique at the accusation of logical inconsistency. Which might almost make you think it was intelligent, but more likely I needed to adjust the character limit or the penalty for repetition.
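For anyone curious what those knobs actually are: in the GPT-3 API of the period they surface as max_tokens (the length cap, measured in tokens rather than characters) and frequency_penalty. Here is a minimal, purely illustrative sketch using the pre-1.0 openai Python library – the model name, prompt and parameter values are my assumptions for the sake of example, not a record of what I actually ran:

```python
# Purely illustrative: pre-1.0 openai Python library; model name and
# parameter values are assumptions, not a record of an actual session.
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Was Thucydides in any sense a 'scientific' historian?",
    max_tokens=512,          # the 'character limit': too low, and output stops mid-thought
    frequency_penalty=0.5,   # the 'penalty for repetition': higher values discourage repeats
)
print(response.choices[0].text)
```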
Again, the AI’s output is just not very good: a string of truisms and banalities, strung together in a plausibly logical sequence but incapable of questioning the basic premise – Thucydides is widely referred to as a scientific historian, therefore he must be one. Not to mention the tendency to assert the existence of evidence without being capable of producing any, or the failure to recognise possible issues with the claim that Thucydides and Herodotus were definitely scientific historians but lots of modern historians are not.
In brief, an AI-generated humanities essay isn’t going to get a first-class mark any time soon. It doesn’t engage critically with secondary literature – at best, you can get it to include some vague references – or analyse different interpretations of a given piece of evidence. By design, it’s incapable of originality except by accident, selecting each successive statement on a probabilistic model rather than having even the aspiration to say something new or even interesting, let alone any idea of how to go about doing this. It certainly isn’t going to be engaging with the specialist detail of high-level seminar-based courses; its domain is the general survey course, whose contents are shaped less by the idiosyncrasies of the lecturer or the interests of the students than by a general consensus on what key topics need to be covered and how, and where a superficially plausible restatement of truisms and banalities may be perfectly passable.
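To make ‘selecting each successive statement on a probabilistic model’ concrete, here is a toy sketch of next-word sampling – the vocabulary and the probabilities are entirely invented for illustration, and real models work over vast vocabularies with learned probabilities:

```python
import random

# Toy next-token sampler: the model assigns a probability to each candidate
# continuation and one is drawn at random, weighted by those probabilities.
# Tokens and probabilities below are invented purely for illustration.
next_token_probs = {
    "widely": 0.45,      # the safe, frequent continuation
    "generally": 0.30,
    "sometimes": 0.20,
    "wrongly": 0.05,     # the 'original' continuation is possible, just improbable
}

def sample_next_token(probs):
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))
```

On this model, saying something genuinely new is only ever a low-probability draw.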
That, I think, is the real issue in the ‘is AI making university professors redundant?’ discourse of the last week or so. This stuff – and even a hypothetical future dramatically improved version of this stuff – isn’t going to compete with the output of our best students, so long as we set them tasks which require a degree of critical thought and individuality. If ChatGPT can turn in an excellent performance in a task, then more or less by definition that task must be some version of a straightforward summary of received wisdom where being derivative is not a problem. As David Andress remarked on the Twitter, ‘if a machine can write like a student, it’s a sign of how we have taught students to write like a machine’.
Well, except that, let’s be honest, ChatGPT does actually write rather better than some of our students; the content may be vacuous, but the sentences are grammatical, there is a clear logical connection from one statement to the next at least most of the time, each paragraph serves a clear function within the whole, and it ends up with a neat conclusion. It’s fluent bullshit – https://www.wired.co.uk/article/chatgpt-fluent-bs – but any substantial batch of student essays will include a sample of fluent bullshit, often less fluent than this, and probably rather less well researched.
The real issue is not how we stop students getting undeserved high marks for work done by AI; our marking criteria already give us the tools to do that, especially if we use just a little more imagination in setting assessment tasks. Rather, it’s where we set the baseline. Is a ChatGPT answer passable according to our current standards? If not, why not?
Write an essay of maximum 1000 words on the question of whether Roman imperialism was motivated solely by material gain. Include references to modern scholarship, evidence, and bibliography.
The debate of whether Roman imperialism was motivated solely by material gain is a complex and multifaceted one, and has been discussed by a number of prominent scholars in the field of ancient Roman history. While there is no clear consensus among scholars, there is evidence to suggest that material gain was an important factor in driving Roman imperialism, though not necessarily the sole motivation.
The most obvious and widely accepted evidence that material gain was one of the factors driving Roman imperialism is the vast wealth and resources that were acquired by the Roman Empire as a result of their imperial conquests. During the period of the Roman Republic, the Roman state acquired vast amounts of wealth and resources from the conquered territories, which were then used to finance further military campaigns and the growth of the Roman state.
In particular, the Roman Empire was able to use the wealth and resources acquired through its imperial conquests to finance its extensive public works programs and the construction of a large network of roads and other infrastructure, which had a significant impact on the development and growth of the Roman state. Additionally, the Roman Empire was able to use the wealth and resources acquired through its imperial conquests to finance its military campaigns and maintain its large standing army, which was an integral part of its imperial strategy.
While material gain was certainly an important factor in driving Roman imperialism, it is important to note that there were other factors at play as well. In particular, the Roman Empire was driven by a strong sense of patriotism and nationalism, which was often expressed in terms of a desire to expand the boundaries of the Roman state and bring peace and stability to the regions under its control.
Additionally, the Roman Empire was driven by a strong sense of religious piety, which was often expressed in terms of a desire to spread Roman religious beliefs and customs to the regions under its control. The Roman Empire also had a strong sense of moral righteousness, which was often expressed in terms of a desire to bring justice to the regions under its control.
Ultimately, while it is clear that material gain was an important factor in driving Roman imperialism, it is also important to note that it was not the sole factor. The Roman Empire was driven by a number of other motivations, including patriotism, nationalism, religious piety, and a sense of moral righteousness.
Therefore, while it is clear that material gain was an important factor in driving Roman imperialism, it is not necessarily the sole factor.
Bibliography
Hanson, W.S. (2004). Roman Imperialism and the Ideology of Domination. Journal of the History of Ideas, 65(2), 281-301.
Harris, W.V. (2005). Rome’s Imperial Economy: Twelve Essays. Oxford: Oxford University Press.
Mattingly, D. (2006). Imperialism, Power, and Identity: Experiencing the Roman Empire. Princeton: Princeton University Press.
Saller, R. (2007). Patriarchy, Property, and Death in the Roman Family. Cambridge: Cambridge University Press.
Note that I ran two versions of this; the first time I didn’t specify the need to engage with modern scholarship, and after looking at the results I thought that was slightly unfair, hence the re-run. The first output was actually superior insofar as it quoted (albeit without giving proper references) a number of ancient sources and even talked about material evidence (though at the level of: coins have been found at Roman sites so clearly they were interested in material gain).
So… Here are the marking criteria relevant to the pass/fail boundary.

The essay has certainly ‘understood’ the question on the face of it, albeit that its answer begs a fair number of questions: pass. There is only one glaring error, the claim that the Romans sought to spread their religious beliefs and to bring justice to regions under their control, and that could just about be defended as a dubious opinion rather than a factual error: marginal. It lacks detail – but, if this were a student essay, I think I’d be inclined to say that it is too general rather than conclude that the author has little actual knowledge; the style makes it sound as if the author actually possesses quite a lot of knowledge: pass. Definite black mark for lack of evidence – this is all unsupported assertion, so fail. No footnotes or references, but the bibliography is good, albeit I have no idea what Saller is doing on there: marginal. Coherent structure even if the analysis isn’t developed; clearly and accurately written: pass.
On balance, this looks like a pass to me, with some arguably 2.2 elements, even if not a very good pass overall. The crucial point, I think, is that the confident style gives the impression that the author does possess substantial relevant knowledge of the subject, and understands important aspects in general terms. They don’t, of course; they haven’t necessarily consulted any of the books in the bibliography, but just cut’n’pasted them from elsewhere and at least had the sense to make the formatting consistent.
Given the process that produced this essay, it is always likely to score well on relevance, on coherence and on clarity of writing and presentation; it makes confident assertions, and provides a reasonable bibliography when told to. The lack of detail, critical analysis, consideration of evidence and referencing will hold it back every time – but the main grounds for a fail mark would be the sense that this is actually just ‘plausible bullshit’ based on no real research, knowledge or understanding. This does not dramatically differentiate it from a certain proportion of student essays, including those that end up getting the benefit of the doubt.
Put another way: if I didn’t know that this was the product of ChatGPT, it’s the sort of essay I would dearly like to fail on the basis of suspicion that the student is bullshitting and doesn’t really understand anything – that line about the Roman Empire’s ‘moral righteousness’ would drive me up the wall – but would pass on the basis that it’s ticked enough of the marking criteria.
Does this matter? Rationally, if a student is content to scrape through their degree with marginal passes, whether because they’ve done no work and just strung together some dubious assertions or because they’ve asked an AI to do this for them, then maybe it’s their problem; a dodgy third-class degree will get them only so far in life. It’s probably much more effort than it’s worth to try to fail them anyway.
On the other hand, I’ve never claimed to be wholly rational. There is a substantial part of me that will spend an hour or more googling suspiciously well-crafted phrases from an essay even when Turnitin claims there is no trace of plagiarism, because *I* care. History cares. Plausible bullshit should be crushed without mercy, even if it takes much longer than just giving a pass mark and moving on. Actually this is less about the AI than about the students who know and understand just as little but can’t even write it clearly…
More constructively, and sensibly: this needs to be about process. It’s not the one-off mark, it’s the lessons the student draws from the experience and the feedback – and if that is ‘Yes! I can get away with this!’, or any sense that the key to success is plausible bullshitting, then we do have a problem. The idea of general survey courses and relatively bland essay assessments in the early years of a programme is that these build the foundations for more advanced and detailed studies in later years. *We* can see how these AI outputs aren’t very good; how do we make sure that our students are trained in the same analytical skills, and can evaluate their own work better?
Well, one possibility is to get the AI to help us. One way – perhaps the only way – for students to develop their critical skills is to practise them on real material, but that can be tricky: real published material can be too good and hence intimidating; peer review never works because, understandably, they pull their punches; and they all hate it. Examples of past work can be useful – but my experience here is mixed, as there tend to be too many different things going on at once, whereas what we want to focus on is taking apart the argument.
And suddenly we have a means of generating arguments that are eminently worth a good kicking – superficially plausible, actually vacuous – without the distractions of poor grammar, spelling or expression. And, leaving aside its reaction to being caught out in self-contradiction and inconsistency, the author doesn’t mind; possibly it even welcomes the opportunity to improve.
I was all set to try this out today with my Greek Political Thought class, as the assessment involves revising a draft essay in response to feedback, so this could be a way to organise discussion about processes of revision. But only three students turned up, and we talked about a load of other stuff instead. I haven’t decided whether to make the experiment on Thursday, with my much larger – well, theoretically much larger; second-to-last day of term with rail strikes and possible snow doesn’t promise much – class, but if I do, I’ll report back here.
And, look, I have got a blog post out of ChatGPT, as have a load of other people – just not quite in the way they imagined.
Update: twelve hours later, the potential of Greek Political Thought -> ChatGPT dawns on me. It’s been a long term… How could I have missed this opportunity?
Appendix: my other attempt at asking ChatGPT to write blog posts for me – in this case I didn’t ask it to try to imitate my style, but this actually manages to sound a little more like me than the one quoted above where I did…
Write a blog post about how people misquote Thucydides.
Misquoting Thucydides is a common occurrence in the world of ancient Greek literature. The Athenian historian and general who wrote the famous work, The History of the Peloponnesian War, has had his words misused, misinterpreted, and misquoted countless times throughout history.
One of the most common misquotes of Thucydides is “The strong do what they can and the weak suffer what they must.” While this phrase is often attributed to Thucydides, it is actually a misquote of a line from his work. The actual line is “The strong do what they will, and the weak suffer what they must.” This small change makes a big difference as the misquote implies that the strong will always dominate the weak, whereas the original line implies that the weak may have no choice but to accept the power of the strong.
Another misquoted line from Thucydides is “In a democracy, the people get the government they deserve.” Again, this is a misquote of the original line, which is “In a democracy, the government is responsible to the people.” This misquote implies that people are responsible for the government they get, whereas the original line implies that the government is responsible to the people.
Finally, another misquoted line from Thucydides is “The only good is knowledge and the only evil is ignorance.” This misquote implies that knowledge is the only good and that ignorance is the only evil. The original line is “The only good is knowledge and the only evil is not to know.” This line implies that knowledge is good and that not knowing is the only evil.
Misquoting Thucydides is a common occurrence due to the complexity of his language and the fact that many of his lines are open to interpretation. It is important to remember the original words of Thucydides when quoting him in order to give his work the credit it deserves.
Yeah, right. It is to be noted that one of the three examples given is seriously arguable (not to mention the AI’s conviction that there is A correct version of Thucydides in English), while the other two have nothing at all to do with Thucydides – but it insists on correcting them with reference to Thucydides’ text anyway…
Early attempts at having computers generate text often resulted in a “word salad”. It seems as if GPT-3 results in a sentence salad: fluent, grammatical and quite convincing if you aren’t really paying attention. Computer-generated music likewise.
There is a substantial part of me that will spend an hour or more googling suspiciously well-crafted phrases from an essay even when Turnitin claims there is no trace of plagiarism, because *I* care.
Damn right.
It’s not the one-off mark, it’s the lessons the student draws from the experience and the feedback – and if that is ‘Yes! I can get away with this!’, or any sense that the key to success is plausible bullshitting, then we do have a problem.
…and that’s *why* it’s right.
I had a weird one a few months ago, in those innocent days before ChatGPT:
Note that what initially caught my eye was that the essay as written just didn’t make sense – the cut-and-paste (or translate-and-cut-and-paste in this case) had been very poorly mortared over. The real killer app would be something that borrowed chunks of text and then wrote the connective passages with ChatGPT’s Reader’s Digest fluency.