OpenAI accidentally erased potential evidence in New York Times copyright lawsuit

Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly using their articles without permission to train its AI models, say OpenAI engineers accidentally deleted data that could be significant to the case.

Earlier this fall, OpenAI agreed to provide two virtual machines (VMs) – software-based emulations of computers that run within another computer's operating system, commonly used for testing, data backups, and running applications – so that lawyers for The Times and Daily News could search OpenAI's AI training sets for their copyrighted material.

According to a recently filed letter, the publishers' lawyers and the experts they hired have spent more than 150 hours since the start of November combing through OpenAI's training data.

However, on November 14, OpenAI engineers allegedly erased all of the publishers' search data stored on one of the VMs, according to the letter, which was filed in the U.S. District Court for the Southern District of New York.

OpenAI attempted to recover the data and was mostly successful. However, because the folder structure and file names were irretrievably lost, the recovered data “cannot be used to establish where the news plaintiffs’ copied articles were utilized to develop [OpenAI’s] models,” the letter states.

As a result, the lawyers for The Times and Daily News say they were forced to recreate their work from scratch, at significant cost in both time and computing power. They learned only recently that the recovered data is unusable, meaning an entire week of work by their experts and lawyers must be redone.

The plaintiffs' lawyers have made clear that they have no reason to believe the deletion was intentional. But they argue the incident underscores that OpenAI is in the best position to search its own datasets for potentially infringing content.

An OpenAI spokesperson declined to comment. OpenAI's lawyers later responded to the letter, denying that any evidence was deleted intentionally and suggesting instead that the plaintiffs were to blame for a system misconfiguration that led to the technical issue.

OpenAI has maintained in this case and others that training models on publicly available data – including articles from The Times and Daily News – is fair use. In other words, OpenAI contends it doesn't have to license or pay for the examples it trains on, even if it makes money from the resulting models.

That said, OpenAI has signed licensing deals with a growing number of news publishers, including the Associated Press, Business Insider owner Axel Springer, the Financial Times, People parent company Dotdash Meredith, and News Corp. OpenAI hasn't disclosed the terms of these agreements.

OpenAI has neither confirmed nor denied whether it trained its AI systems using any specific copyrighted works without authorization.

Update: OpenAI’s response to the allegations has been included.
