July 10, 2023

Mona Awad Among Authors Suing OpenAI For Unlawfully Ingesting Their Work


Several authors, such as Mona Awad, Paul Tremblay, Christopher Golden, Richard Kadrey, and Sarah Silverman, have filed lawsuits against OpenAI for unlawfully “ingesting” their books.

The authors claim OpenAI breached copyright law by training ChatGPT on their novels without their permission.

How ChatGPT ‘learns.’

ChatGPT, OpenAI’s artificial intelligence chatbot, skyrocketed in popularity following its release late last year. People mainly use it to compose essays, write emails, craft stories, and get concise, conversational answers to questions on an extensive range of topics.

Generative AI models like ChatGPT are trained on vast swathes of publicly available data, including websites, books, news articles, and more.

Awad and Tremblay have brought a class action lawsuit against OpenAI, claiming their books were used to train ChatGPT without their permission. As evidence, they point to the chatbot generating “very accurate summaries” of their works.

Did ChatGPT ingest Mona Awad’s books?

Awad, a Canada-born author whose books include 13 Ways of Looking at a Fat Girl, and Tremblay, who wrote The Cabin at the End of the World, filed the complaint in a federal court in San Francisco last week.

Awad and Tremblay included sample summaries generated by ChatGPT, which they say show that OpenAI used their copyrighted books to train the chatbot.

Although ChatGPT has raised concerns before, this is the first copyright lawsuit OpenAI has faced over the chatbot.

The authors’ lawyers, Joseph Saveri and Matthew Butterick, told the Guardian that the complaint states OpenAI “unfairly” profited from “stolen writing and ideas” and calls for monetary damages on behalf of all US-based authors whose works were allegedly used to train ChatGPT.

However, Andres Guadamuz, a reader in intellectual property law at the University of Sussex, said it may be challenging to prove that authors have suffered financial losses specifically because of ChatGPT.

ChatGPT might work the same even if it had not ingested the books, he said, because it is trained on a wealth of internet information, including internet users discussing the books.

Silverman’s legal papers show that when asked to summarise her memoir, the AI engine can detail its content. The complaints make the same claim about Golden’s book Ararat and Kadrey’s book Sandman Slim.

Is there a solution?

OpenAI, Saveri and Butterick claim, has become increasingly secretive about its training data.

Papers released in ChatGPT’s early days gave an idea of the size of the internet-based books corpora it used as training material, referred to as Books2.

The lawyers suggest that this dataset is estimated to contain 294,000 titles and could have been drawn from shadow libraries such as Library Genesis and Z-Library.


Since the chatbot’s launch, the publishing industry has been constantly discussing how to protect its authors from copyright infringement.

The Society of Authors’ chief executive, Nicola Solomon, told the Bookseller trade magazine that the organization was “very pleased” to see authors suing OpenAI.

The authors’ lawyers noted it is “ironic” that AI tools rely on human data.

“Their systems depend entirely on human creativity. If they bankrupt human creators, they will soon bankrupt themselves,” they said.

Sara Keenan

Tech Reporter at POCIT. Following her master's degree in journalism, Sara cultivated a deep passion for writing and driving positive change for Black and Brown individuals across all areas of life. This passion expanded to include the experiences of Black and Brown people in tech thanks to her internship experience as an editorial assistant at a tech startup.