Plot Twist: Understanding the Authors Guild v. OpenAI Inc Complaint

By: Stella Haynes Kiehn

While George R.R. Martin remains hard at work on the final installments in the critically acclaimed A Song of Ice and Fire series, one tenacious fan used ChatGPT in July 2023 to finish the series in a fraction of the time. For fans who don’t feel the urge to read speculations, ChatGPT can also write entirely new stories set in the style and world of any author whose work exists on the internet. These AI-generated novels are only part of a growing dispute over the use of copyrighted works to train large language models (LLMs) such as ChatGPT. Now, the authors are looking to reverse the narrative. In a complaint filed on Sept. 19, 2023, in the Southern District of New York, the Authors Guild, the nation’s oldest and largest organization of writers, is suing OpenAI, the maker of ChatGPT, in a class action lawsuit.

The complaint alleges that OpenAI used the authors’ voices, characters, stories, etc. to train ChatGPT, which in turn allowed users to create unauthorized sequels and derivatives of their copyrighted works. Plaintiffs argue that OpenAI should have first obtained a licensing agreement to use their copyrighted works. Plaintiffs also seek a permanent injunction against OpenAI to prevent the alleged harms from recurring. All authors seek damages for the lost opportunity to license their works, and for the “market usurpation defendants have enabled by making Plaintiffs unwilling accomplices in their own replacement.” The named plaintiffs include David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, Christina Baker Kline, Maya Shanbhag Lang, Victor LaValle, George R.R. Martin, Jodi Picoult, Douglas Preston, Roxana Robinson, George Saunders, Scott Turow, and Rachel Vail.

At issue legally is Title 17 of the United States Code. The complaint brings claims under 17 U.S.C. §501 for direct, vicarious, and contributory copyright infringement. 17 U.S.C. §501 allows copyright owners to sue for enforcement of their exclusive rights, and 17 U.S.C. §106 sets forth a list of those exclusive rights. Different plaintiffs allege different infringements of the §106 rights, but all plaintiffs commonly allege that ChatGPT’s ability to produce derivative works infringed on their copyrighted materials. Under 17 U.S.C. §101, “a ‘derivative work’ is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.” For instance, Plaintiff Martin alleges that “when prompted, ChatGPT generated an infringing, unauthorized, and detailed outline for an alternate sequel to A Clash of Kings, one of the Martin Infringed Works, and titled the infringing and unauthorized derivative ‘A Dance With Shadows,’ using the same characters from Martin’s existing books in the series A Song of Ice and Fire.” Under U.S. copyright law, only a copyright owner may prepare, or authorize someone else to prepare, a derivative work from the copyrighted material. All plaintiffs also commonly allege that ChatGPT was able to recite verbatim passages from the copyrighted works. Under 17 U.S.C. §106, only copyright owners can create, or license the ability to create, direct copies of their work. However, it should be noted that by the time the complaint was filed, ChatGPT no longer provided direct excerpts from copyrighted works.

Screen capture of the blog author’s attempt to engage ChatGPT with copyrighted content.

Although it is certain that ChatGPT has produced infringing work, how do we know that OpenAI knowingly trained ChatGPT on copyrighted materials? The complaint provides multiple reasons. First, until very recently, “ChatGPT could be prompted to return quotations of text from copyrighted books with a good degree of accuracy, suggesting that the underlying LLM must have ingested these books in their entirety during its ‘training.’” Second, the presence of derivative works on the internet created by ChatGPT suggests that it had access to the source materials. For example, author Jane Friedman discovered “a cache of garbage books” written under her name for sale on Amazon that were created by ChatGPT. Third, it is likely that OpenAI trained ChatGPT on copyrighted items because, as one group of AI researchers has observed, books are essential for chatbot development; “[b]ooks are a rich source of both fine-grained information, how a character, an object or a scene evolve through a story.” Finally, OpenAI has essentially confirmed this to be true, stating that its “training” data is “derived from existing publicly accessible ‘corpora’ … of data that include copyrighted works.”

OpenAI maintains that its actions were lawful. In a blog post responding to the filing of The New York Times v. OpenAI, a suit about using New York Times copyrighted materials to train ChatGPT, OpenAI argues that “based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.” The Library Copyright Alliance (LCA) also supports a fair use argument, pointing to the history of courts applying the US Copyright Act to AI. Fair use is a legal doctrine that permits the unlicensed use of copyright-protected works in certain circumstances, such as for parody, comment, or criticism. The LCA focuses on “the precedent established in Authors Guild v. HathiTrust and upheld in Authors Guild v. Google.” Notably, in Authors Guild v. Google, the US Court of Appeals for the Second Circuit held that Google’s mass digitization of a large volume of in-copyright books to distill and reveal new information about them was a fair use. The LCA argues that “while these cases did not concern generative AI, they did involve machine learning” and the fair use precedents could summarily be applied.

Despite the plaintiffs’ complaints about how AI is currently being used, the complaint specifically states that the “plaintiffs don’t object to the development of generative AI.” The complaint simply asserts that “defendants had no right to develop their AI technologies with unpermitted use of the authors’ copyrighted works. Defendants could have ‘trained’ their large language models on works in the public domain or paid a reasonable licensing fee to use copyrighted works.” In fact, the complaint specifically recognizes that OpenAI’s chief executive Sam Altman has told Congress that he shares Plaintiffs’ concerns. According to Altman, “Ensuring that the creator economy continues to be vibrant is an important priority for OpenAI. … OpenAI does not want to replace creators.”

Unfortunately for any curious Game of Thrones fans out there, the ChatGPT-produced novels have been removed due to the pending lawsuit. However, as AI chatbots grow more powerful, and users grow more adept, further lawsuits similar to this one will undoubtedly follow. The plaintiffs amended the complaint in December 2023 to include Microsoft, an investor in OpenAI. The complaint now alleges that “OpenAI’s ‘training’ its LLMs could not have happened without Microsoft’s financial and technical support arising from OpenAI’s use of their copyrighted works to train ChatGPT.” The inclusion of Microsoft, a powerful defendant, against the Authors Guild, a prominent litigant, creates the potential for this case to have far-reaching ramifications for both copyright holders and AI developers.
