For example, since OpenAI’s chatbot ChatGPT was launched in November, students have already started using it to cheat by having it write their essays for them. News website CNET has used ChatGPT to write articles, only to have to issue corrections amid accusations of plagiarism. But there is a promising way to spot AI-generated text: watermarks, hidden patterns embedded into these systems before they’re released that let us identify the text they produce.
In studies, these watermarks have already shown that they can identify AI-generated text with near certainty. One such watermark, developed by a team at the University of Maryland, was able to spot text created by Meta’s open-source language model OPT-6.7B using a detection algorithm the researchers built. The work is described in a paper that has yet to be peer reviewed, and the code will be available for free around February 15.
AI language models work by predicting and generating one word at a time. After each word, the watermarking algorithm pseudorandomly divides the language model’s vocabulary into words on a “greenlist” and a “redlist,” and then nudges the model to choose words from the greenlist. Because the split is seeded on the word just generated, anyone who knows the scheme can re-create the same lists later and check a passage against them.
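To make that concrete, here is a minimal sketch in Python of how such a scheme could work. It is not the Maryland team’s code: the hash-based seeding, the GREEN_FRACTION and GREEN_BIAS constants, and the function names are all illustrative assumptions.

```python
import hashlib
import random

GREEN_FRACTION = 0.5  # assumed share of the vocabulary marked "green"
GREEN_BIAS = 2.0      # assumed score boost given to greenlisted words

def greenlist(prev_word: str, vocabulary: list[str]) -> set[str]:
    """Pseudorandomly split the vocabulary, seeded on the previous word,
    so a detector that knows the scheme can re-create the same split."""
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(vocabulary)  # fixed order so the shuffle is reproducible
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * GREEN_FRACTION)
    return set(shuffled[:cutoff])

def biased_scores(scores: dict[str, float], green: set[str]) -> dict[str, float]:
    """Nudge generation toward the greenlist by boosting those words'
    scores before the model samples its next word."""
    return {word: s + (GREEN_BIAS if word in green else 0.0)
            for word, s in scores.items()}
```

The key design point is that the split is pseudorandom rather than truly random: the same previous word always yields the same greenlist, which is what makes detection possible after the fact.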
The more greenlisted words in a passage, the more likely it is that the text was generated by a machine; text written by a person tends to contain a more random mix of words. For example, given the word “beautiful,” the watermarking algorithm could classify the word “flower” as green and “orchid” as red. The AI model running the watermarking algorithm would then be more likely to use the word “flower” than “orchid,” explains Tom Goldstein, an assistant professor at the University of Maryland, who was involved in the research.
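Detection then reduces to counting. The sketch below, which reuses the greenlist function from the earlier sketch, re-derives each word’s list from the word before it and measures how far the green-word count sits above chance; the statistical framing is in the spirit of the paper’s approach, but this code is an illustrative assumption, not the team’s actual detector.

```python
import math

def greenlist_z_score(words: list[str], vocabulary: list[str]) -> float:
    """Count how many words fall on the greenlist seeded by their
    predecessor, then return a z-score: roughly 0 for a random (human)
    mix of words, and far higher for watermarked machine output."""
    hits = sum(1 for prev, cur in zip(words, words[1:])
               if cur in greenlist(prev, vocabulary))
    n = len(words) - 1  # words that have a known predecessor
    expected = GREEN_FRACTION * n
    stddev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / stddev
```

A long human-written passage should land near a z-score of zero, while watermarked text scores far above it, which is what allows a detector to flag machine output with near certainty.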