To the Editor:
In a recent column (“Anatomy of an AI Essay,” Inside Higher Ed, July 2, 2024), Elizabeth Steere described an analysis of AI-generated responses to essay prompts from her courses. While this analysis is valuable, its framing may give false confidence to instructors trying to determine whether a student’s work was AI-generated.
To Dr. Steere’s credit, the column itself doesn’t explicitly suggest that readers use the report to determine whether a specific student assignment was AI-authored. Moreover, in another recent column (“The Trouble With AI Writing Detection,” Inside Higher Ed, October 18, 2023), Dr. Steere discusses the perils of false plagiarism or AI-use allegations, and notes that her role is not to “play plagiarism police.” While the new and earlier columns don’t directly contradict one another, readers might come away from the newer work with the misguided idea that, armed with a catalog of red flags, they can catch dishonest students presenting AI-authored work as their own. I want to emphasize that my critique is not about the information Dr. Steere presents; rather, it seeks to discourage hypothetical future misuse of that work.
So, why might readers misuse this catalog of AI red flags? I think there are several intertwined issues.
First, Dr. Steere writes: “I took note of the traits of AI essays that differentiated them from what I’ve come to expect from their human-composed counterparts.” It sounds like she enumerated AI hallmarks and then compared their frequency in the AI essays to the ways she recalls her human students writing in response to similar prompts. This kind of comparison risks confirmation bias, as mistaken beliefs about how often humans use these hallmarks could distort memory. A stronger approach would entail direct quantitative comparison of AI to human writing. Ideally, such an analysis would lead to a clear decision rule for categorizing writing as AI-authored or human-authored, and the rule would be tested on novel writing samples.
Second, even if the cataloged red flags can indicate whether essays were written by AI or instead by Dr. Steere’s human students, it’s not clear whether these inferences generalize to other groups of students, types of writing assignment, or scholarly disciplines. Students with different training and experiences often write in very different ways. One reason that automated AI detectors have largely fallen by the wayside is that they’re more likely to flag students writing in a second language as cheating. Arguably, much of academic training consists of socializing students in discipline-specific methods of scholarly communication.
The generalization concern is not trivial, especially if the readers of Inside Higher Ed (faculty from across academic disciplines) try to use Dr. Steere’s analysis in evaluating students. To illustrate this, consider what might happen if I used the red flags to identify cheaters in my psychology research methods course.
My students are asked to follow the conventions of APA style, which can lead to awkward constructions and tortured phrases, including the avoidance of first person and the use of passive voice in many contexts. As in many journal articles, sections of their papers are list-like, often repetitive, and include formulaic beginnings and endings to paragraphs. While it’s not what I ask of them, in an effort to sound “more scientific,” many students use “big words” they don’t need. As students struggle to read and interpret the primary scientific literature, they often appear to be confidently wrong and rely on analogies and metaphors to understand and communicate what they’ve read. Once they do grasp a new concept, they often speak hyperbolically, in absolute terms, or as if their newfound knowledge sweeps across all contexts instead of being narrowly applicable.
All these traits are red flags identified in Dr. Steere’s analysis. I would speculate that the corpora on which frequently used AI models have been trained include much scientific writing, which would mean that the very hallmarks of cheating with AI could also be the hallmarks of successfully learning a discipline-specific writing style. We should be cautious in generalizing heuristics for distinguishing AI and human work across contexts.
Finally, reliable group differences may not be informative about individual outcomes (one of many everyday statistical concerns illustrated here). For example, I know that men are taller than women, on average. But if I’m told that someone is 5’8”, I can’t say with any degree of confidence whether that person is a man or a woman. This is because, while summary measures of men’s and women’s heights are different, there is much overlap in the variability around those summary measures. Given 100 people standing 5’8”, it’s likely that more are men than women, but I would not want to reason from this information about the sex or gender of an individual. Similarly, the AI red flags described by Dr. Steere might turn out to be sufficient to support a statement like “many students in my class of 100 must have used AI,” but that doesn’t mean we have actionable evidence about any one student’s work.
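A rough calculation can make the overlap in the height example concrete. The sketch below assumes that heights in each group follow a normal distribution, with means of 69 and 64 inches, a shared standard deviation of 3 inches, and equal numbers of men and women. These parameters and the function names are illustrative assumptions chosen as round numbers, not measured data, and the real distributions will differ.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))


# ASSUMED, illustrative parameters in inches (round numbers, not survey data).
MEN_MEAN = 69.0
WOMEN_MEAN = 64.0
SHARED_SD = 3.0


def prob_man_given_height(height, prior_man=0.5):
    """Bayes' rule: P(man | height) under the assumed normal distributions."""
    weight_man = normal_pdf(height, MEN_MEAN, SHARED_SD) * prior_man
    weight_woman = normal_pdf(height, WOMEN_MEAN, SHARED_SD) * (1 - prior_man)
    return weight_man / (weight_man + weight_woman)


p = prob_man_given_height(68.0)  # 5'8" is 68 inches
print(f"P(man | height = 68 in) = {p:.2f}")
```

Under these assumed numbers the probability comes out to roughly 0.70: in a group of 100 people standing 5’8”, about 70 would be men, yet a guess about any one of them would be wrong about three times in ten. A different choice of means, spread, or group sizes would shift the figure, but the gap between a group-level pattern and an individual-level conclusion would remain.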
Dr. Steere’s columns have sought to help us through an academic crisis. I think her work is valuable. As we all struggle to deal with AI in the classroom, many of us have grasped for any possible lifeline. I’m concerned that this desperation could lead some to misuse Dr. Steere’s analysis. OpenAI shut down its own AI detection tool because it couldn’t reliably detect cheating. Without strong evidence, we must not delude ourselves into thinking that our own heuristics are any better.
–Benjamin J. Tamber-Rosenau
Assistant professor of psychology, University of Houston