
Is ChatGPT stealing from The New York Times?

Courtesy of Midjourney
Contributing Writer
https://x.com/ScottNover
https://www.linkedin.com/in/scottnover/

We told you 2024 would be the year of “copyright clarity,” and while some legal disputes were already winding their way through the US courts, a whopper dropped on Dec. 31.

Just hours before the Big Apple’s ball dropped, The New York Times filed a lawsuit against the buzziest AI startup in the world, OpenAI, and its lead investor, Microsoft.


In its 69-page complaint filed in federal court in Manhattan, The New York Times alleged that OpenAI illegally trained its large language models on the Gray Lady’s copyrighted stories. The paper claims that OpenAI violated its copyright when it ingested the stories and that the company continues to do so with the information its models spit out.

The copying was so brazen, the lawsuit says, that the AI products powered by OpenAI’s large language model, GPT-4, can replicate full — or nearly full — versions of Times articles if prompted, undermining the paper’s subscription business. That includes OpenAI’s popular chatbot ChatGPT, as well as Microsoft’s Bing Chat and Copilot products.

What the Times has to prove

Lawyers for the Times need to first demonstrate that the paper has a valid copyright and, second, that the defendants violated it.

“Facts aren’t copyrightable,” says Kristelia Garcia, an intellectual property law professor at Georgetown University, noting that while an organization’s exact wording in covering a news event is copyrightable, the underlying event it's covering is not. Additionally, “there is a fair use exception for ‘newsworthy’ use of copyrighted work,” she says, a tenet that affords protection to anyone reporting the news.

The fair use doctrine – the main legal principle in question – is what allows you to parody a popular song or quote a novel in a critical review. Generally, the courts have ruled that to qualify as fair use, a work must be “transformative” and not compete commercially against the original work.

In the suit, the Times says that there’s nothing transformative about how OpenAI and Microsoft are using Times stories. Instead, it claims that the “GenAI models compete with and closely mimic the inputs used to train them,” and that the defendants owe the Times “billions of dollars in statutory and actual damages.”

The view from OpenAI

OpenAI, which had been engaged in deep discussions over the matter with the Times, was caught off guard by the legal move.

“Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development,” it said in a statement after the lawsuit was filed, noting subsequently that the lawsuit was “without merit.”

The company has been riding the success of its industry-standard AI tools, chiefly the chatbot ChatGPT, toward an anticipated valuation north of $100 billion, and many users are excited about the much-hyped launch of GPT-5.

But copyright law is one snag threatening to upend OpenAI’s skyward business, and Sam Altman knows it. That’s why he and his colleagues have already started paying media companies for the right to license their content. According to recent reports, OpenAI is offering media outlets payments in the range of $1 million to $5 million annually, not the “billions” that the Times says it’s owed.

AI firms have already been hit with copyright suits from famous authors and artists who say the companies trained models to imitate their styles, but the Times lawsuit goes further, alleging straight-up copying in both the input and the output.

What’s likely to come next?

The New York Times was able to effectively manipulate ChatGPT to spit out its articles nearly verbatim: In its complaint, it shows that it asked the chatbot to deliver a Times story one paragraph at a time.

When we at GZERO tried this, the chatbot no longer accepted this method, telling us: “I apologize for any inconvenience, but I can't provide verbatim copyrighted text from The New York Times or any other external source.” But it also said, “I can offer a brief summary or answer questions related to the article's content.” It’s unclear whether OpenAI made a change in response to the lawsuit.

Garcia thinks that the Times has a good case as long as it can demonstrate that “OpenAI ingested Article X and then spit out Article Y that shared 500 to 650 identical words.” But, ultimately, she said she’d be surprised if the case ever goes to trial — a process that would take years.

It’s much more likely, she thinks, that the Times is seeking a substantial settlement that pays what it sees as fair value for its journalism.

An adverse decision in court could be a deep threat to the AI business model as a whole: If a judge deems that the training process itself infringes on copyright, the ruling could change the trajectory of this innovative new technology.