The New York Times Sues OpenAI and Microsoft for Copyright Infringement

The New York Times has filed a lawsuit against OpenAI and Microsoft, alleging that the companies trained their AI models on millions of copyrighted Times articles without permission

The New York Times (NYT), the prestigious newspaper known for its award-winning journalism, has taken legal action against OpenAI and its investor, Microsoft. The Times alleges that these companies violated copyright law by using millions of NYT articles, without consent, to train their generative AI models, including OpenAI’s popular ChatGPT and Microsoft’s Copilot.

In its lawsuit, filed in the Federal District Court in Manhattan, The Times demands that the AI models and training data containing the unauthorized material be destroyed. Additionally, The Times seeks “billions of dollars in statutory and actual damages” for the unlawful use of its valuable content.

The Times emphasizes the importance of independent journalism and warns that if news organizations cannot protect their work, there will be a significant loss for society. The complaint asserts that the actions of OpenAI and Microsoft undermine journalism and hinder the production of insightful news.

In response to the lawsuit, an OpenAI spokesperson stated that they respect the rights of content creators and have been engaged in productive conversations with The New York Times. OpenAI hopes to find a mutually beneficial resolution, as they have done with other publishers.

Generative AI models learn by analyzing examples and creating new content, including essays, code, emails, and articles. Companies like OpenAI collect massive amounts of data from the web, some of which could be subject to copyright restrictions. While vendors argue that fair use doctrine protects their web-scraping practices, copyright holders, including news organizations, disagree.
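To illustrate the idea of learning patterns from examples, here is a toy character-level Markov chain. This is a deliberately simplified sketch, not how modern transformer models like ChatGPT actually work, but it captures the core intuition: the model records which characters tend to follow which contexts in its training text, then generates new text by sampling from those learned patterns. The corpus string is an invented example.

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=40, order=2):
    """Emit new text by repeatedly sampling a character seen after the current context."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:
            break  # context never seen in training; stop generating
        out += random.choice(choices)
    return out

# Invented example corpus; real models train on billions of documents.
corpus = "the times reported the story. the times published the article."
model = build_model(corpus)
print(generate(model, "th"))
```

A notable property of this toy, shared by large models, is that when the training data contains a pattern only once, the model can only reproduce it verbatim, which previews the regurgitation issue discussed below.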

This conflict between vendors and copyright holders has resulted in numerous legal battles. For instance, actress Sarah Silverman and several prominent novelists have accused Meta and OpenAI of using their work without permission. Moreover, programmers have filed a lawsuit against Microsoft, OpenAI, and GitHub, alleging that Copilot was developed using their protected code.

The New York Times’ lawsuit represents a notable case in the ongoing legal tussle between generative AI vendors and copyright holders. The lawsuit sheds light on the potential harm caused to The Times’ brand reputation by AI models generating false or made-up information. The complaint highlights instances where Microsoft’s Bing Chat (now Copilot) provided incorrect information supposedly sourced from The Times.

The lawsuit also raises concerns about these AI models competing with news publishers by providing exclusive information without requiring a subscription. This practice diminishes revenue opportunities, as the AI-generated content lacks citation, monetization, and affiliate links used by The Times to generate commissions.

Generative AI models sometimes regurgitate training data, essentially reproducing content verbatim. In one case, OpenAI inadvertently enabled users of ChatGPT to bypass paywalls and access restricted news content.
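One way to quantify this kind of regurgitation, and an approach broadly similar to what copyright plaintiffs rely on, is to measure how many word sequences in a model's output also appear verbatim in a source text. The sketch below is a hypothetical illustration, not a method from the lawsuit; the sample strings are invented.

```python
def ngrams(text, n=8):
    """All n-word sequences in `text`, as a set of strings."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output, source, n=8):
    """Fraction of n-word sequences in `output` copied verbatim from `source`."""
    out_grams = ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & ngrams(source, n)) / len(out_grams)

# Invented example: a short "article" and a model "output" that partially copies it.
article = "the times reported that the program faces new legal scrutiny"
output = "critics say the times reported that the program faces major challenges"
print(round(verbatim_overlap(output, article, n=5), 2))  # → 0.43
```

Real analyses use longer n-grams (copying eight or more consecutive words is unlikely by chance) and normalize punctuation and case before comparing.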

The New York Times accuses OpenAI and Microsoft of exploiting their investment in journalism without providing any compensation. The complaint suggests that these companies are building news publisher competitors, effectively stealing away audiences and undermining The Times’ business.

This dispute reflects a broader issue that publishers face with companies like Google. Recently, publishers filed a lawsuit against Google, alleging that its AI experiments, such as its Bard chatbot and Search Generative Experience, divert traffic and ad revenue from publishers through anticompetitive practices. A study conducted by The Atlantic found that AI-powered search engines could answer user queries without requiring them to click through to publisher websites, potentially diminishing publisher traffic by up to 40%.

Although publishers have legitimate concerns, the outcome of these lawsuits remains uncertain. Heather Meeker, an expert in intellectual property, compared prompting a generative AI model to copy content to using a word processor to cut and paste: the tool itself is neutral. Meeker believes most lawsuits like this may fail because responsibility ultimately rests with the user who deliberately induces the model to reproduce protected content.

Instead of engaging in legal battles, some news outlets have chosen to collaborate with generative AI vendors by entering licensing agreements. For example, the Associated Press and Axel Springer have negotiated deals with OpenAI.

The New York Times attempted to reach a licensing agreement with Microsoft and OpenAI in April. However, the negotiations broke down, leading to the current lawsuit.

This lawsuit serves as a wake-up call, highlighting the need for clear guidelines and regulations regarding the use of copyrighted content in training AI models. Balancing the benefits of generative AI with the protection of intellectual property poses significant challenges and requires collaboration between vendors, creators, and lawmakers.

Q&A Content

Q: Why is The New York Times suing OpenAI and Microsoft specifically? A: The New York Times alleges that OpenAI and Microsoft have used millions of NYT articles without permission to train their generative AI models, causing potential harm to The Times’ brand and revenue.

Q: What are generative AI models, and how do they learn? A: Generative AI models learn by analyzing examples and creating new content. These models, like OpenAI’s ChatGPT and Microsoft’s Copilot, can generate essays, code, emails, and articles by emulating patterns from diverse training data.

Q: What is fair use doctrine, and how does it relate to this lawsuit? A: Fair use doctrine provides a legal defense that allows limited use of copyrighted material without permission. However, its interpretation varies, leading to disagreements between generative AI vendors and copyright holders on what qualifies as fair use.

Q: How do generative AI models regurgitate training data, and why is it a concern? A: Generative AI models sometimes produce content that replicates or closely resembles their training data, reproducing phrases or sentences verbatim. This regurgitation can be problematic when it leads to the dissemination of false or inaccurate information.

Q: What impact could this lawsuit have on the news subscription business? A: If AI models like ChatGPT and Copilot generate information that is usually accessible only through a subscription, it could undermine the value proposition of news subscriptions. Customers may rely on AI-generated content instead, potentially leading to a decrease in subscriptions and revenue for news publishers.

Q: How are other publishers addressing the use of AI models without permission? A: Some publishers, like the Associated Press and Axel Springer, have chosen to collaborate with generative AI vendors by entering licensing agreements. This approach allows publishers to protect their content while still benefiting from AI technology.

To conclude, the lawsuit filed by The New York Times against OpenAI and Microsoft brings attention to the complex issues surrounding the use of copyrighted material in training generative AI models. The outcome of this case will have implications for both content creators and AI vendors, highlighting the need for clearer guidelines and cooperation in this evolving field. As the legal battles continue, it is essential to find a balance between fostering innovation and respecting intellectual property.

🔗 Relevant Links:
– OpenAI
– Microsoft
– The New York Times
– Associated Press Partnership with OpenAI
– Axel Springer Licensing Agreement
– The Atlantic Study on AI Impact