Report: Meta Used Pirated Books to Train AI Model


Meta used pirated books by authors including James Patterson, Stephen King, and Zadie Smith to train an artificial intelligence program, according to a report in the Atlantic.

In the article, journalist and computer programmer Alex Reisner writes that he obtained a set of data that Meta—the tech company behind Facebook and Instagram—used to train LLaMA, a large language model that can generate text based on prompts.

The dataset, Reisner writes, shows that more than 170,000 books were used to train LLaMA, as well as the BloombergGPT model. Reisner wrote “a series of programs” to extract ISBNs—numerical book identifiers—from the dataset.

He found that the dataset contained books written by several notable authors, including Rebecca Solnit, George Saunders, and Elena Ferrante. Some writers had multiple books included in the dataset: Haruki Murakami, Jennifer Egan, bell hooks, Margaret Atwood, and L. Ron Hubbard.

The report comes weeks after three authors—Sarah Silverman, Richard Kadrey, and Christopher Golden—filed a lawsuit against Meta and OpenAI, alleging that their books were used to train AI models without their permission. Days after that, thousands of writers published an open letter to several tech company CEOs demanding that the companies get authors’ permission before using their books to train the models.

In his report, Reisner writes that book piracy has “come to seem natural” because of the Internet. “Yet the culture of piracy has, until now, facilitated mostly personal use by individual people,” he writes. “The exploitation of pirated books for profit, with the goal of replacing the writers whose work was taken—this is a different and disturbing trend.”

Michael Schaub, a journalist and regular contributor to NPR, lives near Austin, Texas.