The debate over the use of copyrighted content to train AI models is taking place in the press, in the courts, and in comments submitted to the Copyright Office for its comprehensive study. Read on for an explanation of the key cases and underlying concepts at issue, as well as some steps you can take to request that your content be excluded.
What is Generative AI?
Generative AI refers to artificial intelligence systems that can create content. Generative AI systems, like ChatGPT and Midjourney, rely on models, such as large language models (LLMs), that are trained on large sets of data, including content found online. These systems use complex algorithms to perform tasks such as writing blog posts, creating images and creating videos.
Is Training Generative AI Copyright Infringement?
Many artists, authors and other copyright owners are concerned that generative AI may harm their businesses by replacing or devaluing their work. Lawsuits filed by Richard Kadrey, the Authors Guild, Thomson Reuters and The New York Times allege copyright infringement against generative AI companies. Summaries of these cases follow:
Kadrey v. Meta (N.D. Cal. No. 23cv03417)
Richard Kadrey, Sarah Silverman and other authors filed a class action suit claiming Meta infringed their copyrights by training its LLM, known as “LLaMA,” using illegal copies of their books. The complaint alleged that the LLaMA models “are themselves infringing derivative works” and asserted claims including direct and vicarious copyright infringement. Meta moved to dismiss all claims except the claim that copying the books for purposes of training constituted direct copyright infringement. The court granted the motion, finding the allegation that the models themselves are infringing to be “nonsensical,” and holding that there could be no vicarious infringement without allegations that the output was directly infringing. The plaintiffs filed an amended complaint in December and the case is entering the discovery phase.
Authors Guild v. OpenAI (S.D.N.Y. No. 23cv8282)
The Authors Guild and a group of authors filed a class-action lawsuit against OpenAI and Microsoft for copyright infringement. The plaintiffs assert that the defendants copied their works wholesale, fed them into their AI systems, and used them to train the systems so that anyone can create derivative works that infringe the originals. The plaintiffs filed an amended complaint in December, claiming direct, vicarious and contributory infringement.
Thomson Reuters v. Ross (D. Del. No. 20cv613)
In one of the earliest filed AI suits, Thomson Reuters sued Ross, claiming infringement of Westlaw content. To develop its own natural language legal search engine, Ross engaged LegalEase to create about 25,000 memos consisting of questions and answers taken from legal opinions. Thomson Reuters claimed that the questions were essentially Westlaw headnotes. The parties filed cross-motions for summary judgment. In a September opinion, the court largely denied the motions because too many facts remained in dispute. Addressing Ross’s fair use defense, the court found it could not decide whether Ross’s use was “intermediate copying” without knowing the precise nature of Ross’s actions. The court held that it would be transformative intermediate copying if Ross’s AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes. But if Ross used the untransformed text of headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney-editors, it would be infringement. The case is currently scheduled for trial in May.
The New York Times v. Microsoft (S.D.N.Y. No. 23cv11195)
In late December, The New York Times filed suit against Microsoft and OpenAI claiming infringement of millions of articles. The Times’ complaint alleges that ChatGPT’s output infringed the content of its articles, and includes side-by-side comparisons to highlight the similarity. The complaint also alleges that OpenAI infringed by using the Times’ material to train ChatGPT models. The case is in its preliminary stages.
Copyright Office Initiative
In early 2023, the Copyright Office launched an initiative to study the legal and policy issues surrounding AI. The Office held numerous listening sessions and is currently reviewing over 10,000 comments submitted in response to its Notice of Inquiry.
Can I Block My Content from the AI Bots?
A website owner can include a robots.txt file on its site requesting that AI bots exclude the site or portions of it when crawling the web. Google and OpenAI have announced that they will obey such requests, though there is no technical requirement they do so. Given that compliance is voluntary, this is only a partially effective solution.
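For site owners who want to see what such a request looks like, the following is a minimal sketch, not a definitive configuration. It assumes the crawler names GPTBot (the token OpenAI has published for its crawler) and Google-Extended (the token Google has published for controlling AI training use); those names do not come from the cases above, and the example.com URL and the is_allowed helper are purely illustrative. The short Python script uses the standard library's robots.txt parser to show how a compliant crawler would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content. The user-agent tokens are assumptions
# based on what OpenAI and Google have published; verify current names
# in each company's documentation before relying on them.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page address used only for this demonstration.
SAMPLE_URL = "https://www.example.com/articles/post.html"


def is_allowed(robots_txt, user_agent, url):
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, SAMPLE_URL))
```

In this sketch the two AI-related crawlers are asked to stay out of the entire site, while ordinary search indexing remains permitted. The file only expresses a request; it does nothing to stop a crawler that chooses to ignore it.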
AI is the latest in a long series of technological developments that challenge the metes and bounds of copyright law. We intend to keep abreast of developments in the field and provide updates.
Feel free to contact us if you have questions, would like to discuss these issues or would like to join our mailing list.
Nancy J. Mertzel
Mertzel Law PLLC
1204 Broadway, 4th Floor
New York, NY, 10001