Plot Twist: Understanding the Authors Guild v. OpenAI Inc Complaint

By: Stella Haynes Kiehn

While George R.R. Martin remains hard at work on the final installments in the critically acclaimed A Song of Ice and Fire series, in July 2023 one tenacious fan used ChatGPT to finish the series in a fraction of the time. For fans who don't feel the urge to read such speculation, ChatGPT can also write entirely new stories set in the style and world of any author whose work exists on the internet. These AI-generated novels are only part of a growing controversy over the use of copyrighted works to train large language models (LLMs) such as ChatGPT. Now, the authors are looking to reverse the narrative. In a complaint filed on Sept. 19, 2023, in the Southern District of New York, the Authors Guild, the nation's oldest and largest organization of writers, is suing OpenAI, the maker of ChatGPT, in a class action lawsuit.

The complaint alleges that OpenAI used the authors’ voices, characters, stories, etc. to train ChatGPT, which in turn allowed users to create unauthorized sequels and derivatives of their copyrighted works. Plaintiffs argue that OpenAI should have first obtained a licensing agreement to use their copyrighted works. Plaintiffs also seek a permanent injunction against OpenAI to prevent the alleged harms from reoccurring. All authors seek damages for the lost opportunity to license their works, and for the “market usurpation defendants have enabled by making Plaintiffs unwilling accomplices in their own replacement.” The named plaintiffs include David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, Christina Baker Kline, Maya Shanbhag Lang, Victor LaValle, George R.R. Martin, Jodi Picoult, Douglas Preston, Roxana Robinson, George Saunders, Scott Turow, and Rachel Vail.

At issue legally is Title 17 of the United States Code. The complaint brings claims under 17 U.S.C. §501 for direct, vicarious, and contributory copyright infringement. Section 501 allows copyright owners to sue to enforce their exclusive rights, and 17 U.S.C. §106 sets forth the exclusive rights belonging to copyright owners. Different plaintiffs allege different infringements of their §106 rights, but all plaintiffs commonly allege that ChatGPT's ability to produce derivative works infringes their copyrighted materials. Under 17 U.S.C. §101, "a 'derivative work' is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted."

For instance, Plaintiff Martin alleges that "when prompted, ChatGPT generated an infringing, unauthorized, and detailed outline for an alternate sequel to A Clash of Kings, one of the Martin Infringed Works, and titled the infringing and unauthorized derivative 'A Dance With Shadows,' using the same characters from Martin's existing books in the series A Song of Ice and Fire." Under U.S. copyright law, only a copyright owner may prepare, or authorize someone else to prepare, a derivative work from the copyrighted material. All plaintiffs also commonly allege that ChatGPT can recite verbatim passages from their copyrighted works. Under 17 U.S.C. §106, only copyright owners can create, or license the ability to create, direct copies of their work. It should be noted, however, that by the time the complaint was filed, ChatGPT would no longer provide direct excerpts from copyrighted works.

Screen capture of the blog author’s attempt to engage ChatGPT with copyrighted content.

Although ChatGPT has clearly produced output drawn from these works, how do we know that OpenAI knowingly trained ChatGPT on copyrighted materials? The complaint provides multiple reasons. First, until very recently, "ChatGPT could be prompted to return quotations of text from copyrighted books with a good degree of accuracy, suggesting that the underlying LLM must have ingested these books in their entirety during its 'training.'" Second, the presence on the internet of derivative works created by ChatGPT suggests that it had access to the source materials. For example, author Jane Friedman discovered "a cache of garbage books" written under her name and for sale on Amazon that were created by ChatGPT. Third, it is likely that OpenAI trained ChatGPT on copyrighted works because, as one group of AI researchers has observed, books are essential for chatbot development: they are "a rich source of both fine-grained information," such as how a character, an object, or a scene evolves through a story. Finally, OpenAI has essentially confirmed this to be true, stating that its "training" data is "derived from existing publicly accessible 'corpora' … of data that include copyrighted works."
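
The memorization inference above can be illustrated with a short probe. The sketch below is a hypothetical example, not the plaintiffs' methodology or OpenAI's tooling: it asks a model to continue a passage the tester already possesses and scores how closely the generated text matches the real continuation, with a score near 1.0 suggesting near-verbatim recall of training data. The openai SDK usage and model name are assumptions, and, as the screenshot above suggests, current versions of ChatGPT may simply refuse such requests for copyrighted text.

```python
# Hypothetical memorization probe -- an illustration of the complaint's reasoning,
# not the plaintiffs' methodology or OpenAI's tooling. Assumes the openai Python
# SDK (v1+) with an OPENAI_API_KEY in the environment; the model name is a placeholder.
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()

def continuation_similarity(prefix: str, true_continuation: str,
                            model: str = "gpt-4o-mini") -> float:
    """Ask the model to continue a passage, then score how closely its output
    matches the real continuation (1.0 = near-verbatim reproduction)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Continue this passage word for word: {prefix}"}],
        max_tokens=100,
        temperature=0,  # keep the output as deterministic as possible
    )
    generated = response.choices[0].message.content or ""
    return SequenceMatcher(None, generated.strip(), true_continuation.strip()).ratio()

# Example with placeholder public-domain text (not a copyrighted excerpt):
# score = continuation_similarity("It was the best of times,",
#                                 "it was the worst of times, it was the age of wisdom,")
# print(f"similarity: {score:.2f}")
```

A refusal or a low score proves nothing by itself; the complaint treats this kind of behavior only as circumstantial evidence that the works were ingested during training.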

OpenAI maintains that its actions were lawful. In a blog post responding to the filing of The New York Times v. OpenAI, a suit over the use of the New York Times's copyrighted materials to train ChatGPT, OpenAI argues that, "based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use." The Library Copyright Alliance (LCA) also supports a fair use argument, pointing to the history of courts applying the U.S. Copyright Act to AI. Fair use is a legal doctrine that permits the unlicensed use of copyright-protected works in certain circumstances, such as parody, comment, or criticism. The LCA focuses on "the precedent established in Authors Guild v. HathiTrust and upheld in Authors Guild v. Google." Notably, in Authors Guild v. Google, the U.S. Court of Appeals for the Second Circuit held that Google's mass digitization of a large volume of in-copyright books to distill and reveal new information about them was a fair use. The LCA argues that "while these cases did not concern generative AI, they did involve machine learning," and that the fair use precedents could similarly be applied.

Despite the plaintiffs' concerns about how AI is currently being used, the complaint specifically states that the plaintiffs "don't object to the development of generative AI." The complaint simply asserts that "defendants had no right to develop their AI technologies with unpermitted use of the authors' copyrighted works," and that defendants could have "'trained' their large language models on works in the public domain or paid a reasonable licensing fee to use copyrighted works." In fact, the complaint specifically recognizes that OpenAI's chief executive, Sam Altman, has told Congress that he shares the plaintiffs' concerns. According to Altman, "Ensuring that the creator economy continues to be vibrant is an important priority for OpenAI. … OpenAI does not want to replace creators."

Unfortunately for any curious Game of Thrones fans out there, the ChatGPT-produced novels have been removed due to the pending lawsuit. However, as AI chatbots grow more powerful, and users grow more adept, further lawsuits similar to this one will undoubtedly follow. The plaintiffs amended the complaint in December 2023 to add Microsoft, a major investor in OpenAI, as a defendant. The amended complaint alleges that OpenAI's "training" of its LLMs "could not have happened without Microsoft's financial and technical support," tying Microsoft to the harms arising from OpenAI's use of the plaintiffs' copyrighted works to train ChatGPT. The inclusion of Microsoft, a powerful defendant, opposite the Authors Guild, a prominent litigant, creates the potential for this case to have far-reaching ramifications for both copyright holders and AI developers.

Talking to Machines – The Legal Implications of ChatGPT

By: Stephanie Ngo

Chat Generative Pre-trained Transformer, known as ChatGPT, was launched on November 30, 2022. The program has since taken the world by storm with its articulate answers and detailed responses to a multitude of questions. A quick Google search for "chat gpt" returns approximately 171 million results. Within the first five days of its launch, more than a million people had signed up to test the chatbot, according to OpenAI's president, Greg Brockman. But with new technology come legal issues that require legal solutions. As ChatGPT continues to grow in popularity, it is now more important than ever to discuss how such a smart system could affect the legal field.

What is Artificial Intelligence? 

Artificial intelligence (AI), per John McCarthy, a world-renowned computer scientist at Stanford University, is "the science and engineering of making intelligent machines, especially intelligent computer programs, that can be used to understand human intelligence." The first successful AI program was written in 1951 to play a game of checkers, but the idea of "robots" taking on human-like characteristics has been traced back even earlier. Recently, it has been predicted that AI, although prominent now, will permeate the daily lives of individuals by 2025 and seep into various business sectors. Today, the buzz around AI stems from the rapid growth of emerging technologies and from how AI can be integrated with current technology to power products like self-driving cars, electronic medical records, and personal assistants. Many are familiar with "Siri," and consumers' expectation that Siri will soon become all-knowing continues to push the field of AI to develop at such a fast pace.

What is ChatGPT? 

ChatGPT is a chatbot that uses a large language model trained by OpenAI. OpenAI is an AI research and deployment company, founded in 2015, dedicated to ensuring that artificial intelligence benefits all of humanity. ChatGPT was trained on data such as books and other written materials to generate natural, conversational responses, as if a human had written the reply. Chatbots are not a recent invention. In 2019, Salesforce reported that twenty-three percent of service organizations used AI chatbots. In 2021, Salesforce reported that the figure was closer to thirty-eight percent, a sixty-seven percent increase since its 2018 report. Their effectiveness, however, left many consumers wishing for a faster, smarter way of getting accurate answers.
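
For readers curious what a "conversational" exchange looks like in practice, below is a minimal sketch of a multi-turn chat sent to an OpenAI model through its Python SDK. It is an illustration only: it assumes the v1+ openai package and an API key in the environment, the model name is a placeholder, and it does not depict how the ChatGPT web interface works internally.

```python
# Minimal sketch of a multi-turn chat exchange -- assumes the openai Python SDK (v1+)
# and an OPENAI_API_KEY environment variable; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

# A conversation is just an ordered list of role/content messages; resending the
# whole history with each request is what lets the model reply "in context."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain fair use in one sentence."},
]

reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
assistant_text = reply.choices[0].message.content
print(assistant_text)

# Append the assistant's reply and the next user turn to continue the conversation.
messages.append({"role": "assistant", "content": assistant_text})
messages.append({"role": "user", "content": "How might that doctrine apply to AI training data?"})
reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)
```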

In comes ChatGPT, which has been hailed as the "best artificial intelligence chatbot ever released to the general public" by New York Times technology columnist Kevin Roose. ChatGPT's ability to answer extremely convoluted questions, explain scientific concepts, or even debug large amounts of code is indicative of just how far chatbots have advanced since their creation. Prior to ChatGPT, answers from chatbots were taken with a grain of salt because of their inaccurate, roundabout responses, which were likely generated from templates. ChatGPT, while still imperfect and slightly outdated (its knowledge is limited to data from 2021 and earlier), is being used in ways that some argue could impact many different occupations and render certain inventions obsolete.

The Legal Issues with ChatGPT

ChatGPT has widespread applicability and has been touted as a potential rival to Google. Since the beta launch in November 2022, there have been countless stories from people in various occupations about ChatGPT's different use cases. Teachers can use ChatGPT to draft quiz questions. Job seekers can use it to draft and revise cover letters and resumes. Doctors have used the chatbot to diagnose a patient, write letters to insurance companies, and even assist with certain medical examinations.

On the other hand, ChatGPT has its downsides. One of the main arguments against ChatGPT is that the chatbot's responses are so natural that students may use it to shirk their homework or plagiarize. To combat the issues of academic dishonesty and misinformation, OpenAI has begun work on accompanying software, training a classifier to distinguish between AI-written text and human-written text. OpenAI has noted that, while not wholly reliable, the classifier should become more reliable the longer it is trained.
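
OpenAI has not published how its classifier works, but the general idea of telling AI-written text from human-written text can be sketched with ordinary machine-learning tools. The toy example below uses scikit-learn rather than anything from OpenAI, and its handful of labeled samples is invented for demonstration; a real detector would need vastly more data and more robust features.

```python
# Toy illustration of an AI-text vs. human-text classifier -- not OpenAI's classifier.
# Requires scikit-learn; the tiny labeled dataset here is invented for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training samples: 1 = AI-written, 0 = human-written.
texts = [
    "As an AI language model, I can provide a concise summary of the topic.",
    "In conclusion, there are several key factors to consider in this regard.",
    "honestly i just rewrote that paragraph three times and it still reads weird",
    "We argued about the ending over coffee until the shop kicked us out.",
]
labels = [1, 1, 0, 0]

# TF-IDF features plus logistic regression: a standard baseline text classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

sample = "Overall, this essay demonstrates a clear understanding of the subject matter."
prob_ai = model.predict_proba([sample])[0][1]
print(f"Estimated probability the sample is AI-written: {prob_ai:.2f}")
```

Even far better-resourced detectors of this kind carry meaningful error rates, which is consistent with the caveat above that OpenAI's classifier is not wholly reliable.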

Another argument that has arisen involves intellectual property issues. Is the material that ChatGPT produces legal to use? In a similar situation, a different artificial intelligence program, Stable Diffusion, was trained to replicate an artist’s style of illustration and create new artwork based upon the user’s prompt. The artist was concerned that the program’s creations would be associated with her name because the training used her artwork.

Because this technology is so new, case law addressing this specific issue is limited. In January 2023, Getty Images, a popular stock photo company, commenced legal proceedings against Stability AI, the creator of Stable Diffusion, in the High Court of Justice in London, claiming Stability AI had infringed intellectual property rights in content owned or represented by Getty Images, absent a license and to the detriment of the content creators. A group of artists has also filed a class-action lawsuit against companies offering AI art tools, including Stability AI, alleging violations of the rights of millions of artists. As for ChatGPT itself, when asked about any potential legal issues, the chatbot stated that "there should not be any legal issues" as long as it is used in accordance with the terms and conditions set by the company and with any appropriate permissions and licenses.

Last, but certainly not least, ChatGPT is unable to assess whether it is compliant with the protection of personal data under state privacy laws or the European Union's General Data Protection Regulation (GDPR). The GDPR is known by many as the gold standard of privacy regulations, and ChatGPT's lack of demonstrated compliance with it, or with any privacy law, could have serious consequences if a user feeds ChatGPT sensitive information. OpenAI's privacy policy does state that the company may collect the information a user communicates to the service, so it is important for anyone using ChatGPT to pause and think about the impact of sharing information with the chatbot before proceeding. As ChatGPT improves and advances, the legal implications are likely to only grow in turn.