ChatGPT maker OpenAI has apparently been sitting on a way to watermark AI-generated text for over a year, according to the Wall Street Journal. Such technology would be a boon to teachers, researchers, and pretty much anyone who wants to ensure what they’re reading came from a real person, but the company is reportedly worried the tool could be circumvented and could hurt its business.
“Technology that can detect text written by artificial intelligence with 99.9% certainty has been debated internally for two years,” WSJ writes of OpenAI, adding that the tool is essentially ready to go and would work by adjusting the model to follow a detectable pattern in its word choices. The outlet says OpenAI doesn’t believe watermarking would reduce the quality of ChatGPT’s output, but the company is worried about driving users away: nearly 30 percent of surveyed ChatGPT users reportedly told OpenAI they’d use the chatbot less if its text were watermarked.
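OpenAI hasn’t published how its watermark works, but that description matches a family of schemes from published research on statistical text watermarking, in which the model is quietly nudged to prefer a pseudorandomly chosen “green” subset of tokens at each step. Here’s a minimal sketch of that idea in Python, with invented names and toy scores standing in for a real model’s logits; it illustrates the general technique, not OpenAI’s actual implementation:

```python
import hashlib

def in_green_list(prev_token: str, token: str) -> bool:
    """Pseudorandomly mark roughly half of all (previous token, next token)
    pairs as 'green', keyed only on the tokens themselves."""
    return hashlib.sha256(f"{prev_token}|{token}".encode()).digest()[0] % 2 == 0

def sample_watermarked(prev_token: str, scores: dict[str, float],
                       bias: float = 2.0) -> str:
    """Pick the best-scoring next token after quietly nudging green tokens
    upward. In a real LM, 'scores' would be the model's logits over its
    whole vocabulary; here they're toy values."""
    return max(scores, key=lambda t:
               scores[t] + (bias if in_green_list(prev_token, t) else 0.0))

# With near-tied candidates, the nudge steers the choice to a green token
# without making the sentence read any differently.
print(sample_watermarked("the", {"model": 1.0, "text": 0.9, "signal": 0.8}))
```

Over hundreds of tokens, watermarked text lands on green pairs far more often than the roughly 50 percent that chance would predict, a pattern a human reader can’t see but a detector can measure.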
In response to WSJ’s report, OpenAI published a blog post confirming that it has been studying text watermarking internally. While the technique “has been highly accurate and even effective against localized tampering, such as paraphrasing,” the company says it holds up poorly against text that has been translated or reworded using another model. It’s also vulnerable to tricks like asking the model to insert junk characters between words and then deleting them, which OpenAI says makes the watermark “trivial” for bad actors to circumvent.
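That fragility falls out of the math: a detector only counts how often adjacent token pairs land in the green set, so anything that rewrites the tokens, whether translation, paraphrase, or the insert-then-delete-characters trick, washes the bias back toward chance. A toy detector (using the same hypothetical green/red split as the sketch above, not OpenAI’s actual detector) shows the effect:

```python
import hashlib
import math
import random

def in_green_list(prev_token: str, token: str) -> bool:
    """Same hypothetical split as above: hash the (previous, next) token
    pair and call roughly half of all pairs 'green'."""
    return hashlib.sha256(f"{prev_token}|{token}".encode()).digest()[0] % 2 == 0

def detection_z_score(tokens: list[str]) -> float:
    """How far the green-pair count sits above the ~50% that unwatermarked
    text hits by chance. Large positive values suggest a watermark."""
    n = len(tokens) - 1
    greens = sum(in_green_list(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

vocab = ["the", "a", "model", "text", "writes", "reads", "clear", "subtle",
         "pattern", "signal", "word", "choice", "makes", "every", "here"]
rng = random.Random(0)

# A watermark-respecting generator: always continue with a green token.
tokens = ["the"]
for _ in range(300):
    greens = [t for t in vocab if in_green_list(tokens[-1], t)]
    tokens.append(rng.choice(greens or vocab))
print(f"watermarked: z = {detection_z_score(tokens):.1f}")    # ~17: unmistakable

# 'Rewording': swap out ~60% of tokens, as a paraphrase or translation would.
reworded = [t if rng.random() > 0.6 else rng.choice(vocab) for t in tokens]
print(f"reworded:    z = {detection_z_score(reworded):.1f}")  # falls sharply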
As an alternative, the company is looking into using metadata to mark AI-generated text. Because the metadata would be cryptographically signed, OpenAI says the approach produces “no false positives”: a check either verifies the signature or it doesn’t, with no statistical guesswork. The company already uses metadata to mark AI-generated images, an example it detailed in its blog post.
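The trade-off is easy to see in miniature: a signed metadata record can only ever say “yes, this was generated” when its signature verifies, so it can’t falsely accuse anyone, but stripping it (say, by copy-pasting the bare text) leaves nothing to detect. Here’s an invented, minimal version of such a scheme; the signing key, record format, and model name are all made up for illustration and aren’t OpenAI’s:

```python
import hashlib
import hmac
import json

# Hypothetical provenance format, not OpenAI's: the generator signs a
# record describing the text, and only the key holder can mint records.
SIGNING_KEY = b"secret key held by the AI provider"  # invented for this sketch

def attach_provenance(text: str) -> dict:
    """Build a metadata record binding a 'generated by' claim to this
    exact text via an HMAC signature."""
    record = {
        "generator": "example-model",  # hypothetical model name
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_provenance(text: str, record: dict) -> bool:
    """Either the signature verifies (the text is definitely labeled
    AI-generated) or it doesn't; there's no statistical middle ground."""
    claim = {k: v for k, v in record.items() if k != "signature"}
    if claim.get("sha256") != hashlib.sha256(text.encode()).hexdigest():
        return False
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))

text = "This paragraph came out of a model."
record = attach_provenance(text)
print(verify_provenance(text, record))            # True: signed and intact
print(verify_provenance("Edited text.", record))  # False, never a false positive
```

A real deployment would use public-key signatures, as the C2PA standard does for images, so anyone could verify without holding the secret. The weakness is the flip side of the “no false positives” guarantee: the absence of metadata proves nothing, since the label travels with the file only as long as nobody removes it.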
Whatever method the company settles on, the demand for one to arrive sooner rather than later is clear. According to a survey cited by WSJ, “people worldwide supported the idea of an AI detection tool by a margin of four to one.”
Google, meanwhile, already watermarks AI-generated text with its SynthID system, as the company explained at this year’s Google I/O.