Industry & Advocacy News
This week, many authors discovered that their books were used without permission to train AI systems. Here’s what you need to know if your books are in the Books3 dataset, as well as actions you can take now to speak out in defense of your rights.
September 27, 2023
If you’re an author, you may have recently discovered in The Atlantic that your published book was included in a dataset of books used to train artificial intelligence systems without your permission. (Search the dataset here.) We thank journalist and tech consultant Alex Reisner for breaking the story, which is incredibly important and has informed thousands of concerned authors around the world.
This can be an unsettling revelation, raising concerns about copyright, compensation, and the future implications of AI. Here’s what you need to know if your work has been used to “train” AI without permission:
The Books3 dataset contains 183,000 books, downloaded from pirate sources. We know that companies like Meta (creators of LLaMA), EleutherAI, and Bloomberg have used it to train their language models. OpenAI has not disclosed training information about GPT-3.5 or GPT-4, the models underlying ChatGPT, so we don’t know whether it also used Books3. Whether or not GPT was trained on Books3, the class action lawsuits against OpenAI should uncover more information about the datasets OpenAI used, which we believe also include books obtained from pirate sources.
In addition to the recent lawsuit in which the Authors Guild is a named plaintiff, there are other author class action suits pending against OpenAI, Meta, and Google. You don’t need to be a named plaintiff in any of these lawsuits to participate because the respective named plaintiffs represent their entire class. Even if you don’t fall within one or more classes, an outcome in favor of authors should benefit you by clarifying that books need to be licensed when used to “train” generative AI.
Our lawyers at Lieff Cabraser and Cowan, DeBaets, Abrahams & Sheppard are not adding named plaintiffs to serve as class representatives in the lawsuit at this time. But because this is a class action, you are covered if you meet the class definition laid out in the complaint (PDF), assuming the court certifies the class. For specific questions about the lawsuit, contact the lawyers here.
If you are not covered by the class in the ongoing suits, know that the Authors Guild is still pursuing protections and compensation for all writers, from poets to memoirists to biographers to translators. This lawsuit is only the first step.
Litigation can take a long time, but there are other important actions you can take now to speak out in defense of your rights:
Having your book used by AI can be discouraging, but don’t feel powerless. Take action to protect your rights, join forces with other authors, and push the industry toward a fairer system of transparency and compensation. With collective action, we can shape an AI future that respects authorship and protects the profession at large.