Self-published and indie authors can help protect against unauthorized training uses by including a short "no AI training" notice in their books. Also, anyone publishing online can block OpenAI's web crawler from accessing their website.
September 8, 2023
This post was originally published on September 8, 2023, and updated on August 13, 2024.
Text-based generative AI technologies require the ingestion of vast amounts of data, including tens or hundreds of thousands of books. So far, AI companies have not sought permission from, or offered compensation to, the authors whose books and other writings are used to train generative AI technologies, a practice that the Authors Guild strongly opposes. To make matters worse, all of the large book datasets used for AI training that we are aware of were compiled from ebook piracy sites, precisely because no legal databases of books exist on the open internet and AI developers have not obtained licenses to use books for training.
The Authors Guild has been working on advancing legislative and regulatory changes that would require AI companies to seek permission before using books and other written works in the training and development of generative AI systems. But until legal safeguards are instituted, authors can take some steps to try to protect themselves from unauthorized training uses.
In addition to the steps below, make sure to set up Google alerts for each of your books and to send takedown notices whenever you see unauthorized copies online.
We recommend that you include a short notice on the cover or copyright page of your books, or at the top of posted articles or stories. Even though copyright law gives authors exclusive rights to decide if others can use their works, many tech companies, including AI developers, believe everything on the web is free to use. They will, however, observe opt-outs when pressed. Below is an example of such a notice:
NO AI TRAINING: Without in any way limiting the author’s [and publisher’s] exclusive rights under copyright, any use of this publication to “train” generative artificial intelligence (AI) technologies to generate text is expressly prohibited. The author reserves all rights to license uses of this work for generative AI training and development of machine learning language models.
You can protect yourself from some unauthorized crawling and scraping of text from your website by adding or editing a file called robots.txt, which contains instructions for web crawlers. OpenAI has said that it will not crawl sites that restrict its crawler, GPTBot, in robots.txt. Guides on adding or updating a robots.txt file are available online, and OpenAI publishes details on how to disallow GPTBot from accessing your site.
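As a rough illustration, the sketch below shows the two-line robots.txt rule that asks GPTBot to stay away from an entire site, and checks it with Python's standard urllib.robotparser module. The helper name is_allowed and the example.com address are hypothetical, chosen only for this example. Keep in mind that robots.txt is a request that crawlers may or may not honor, not a technical barrier.

```python
from urllib.robotparser import RobotFileParser

# The contents of a robots.txt file that disallows GPTBot site-wide.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder; substitute a page from your own site.
print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/my-story"))        # False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/my-story"))  # True
```

Crawlers that are not named in the file remain unrestricted, which is why the second check still passes.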
You can also block other AI bots from crawling your website by adding them to your site’s robots.txt. A running list of AI bots is maintained online. Finally, you can use Google’s robots.txt tester tool to verify that the updated file matches the documentation.
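Blocking several bots follows the same pattern, with one User-agent group per crawler. The sketch below generates those groups from a list and confirms which bots end up blocked. The names in AI_BOTS are illustrative assumptions and are not taken from this post, so check a maintained list for the current ones. The functions build_rules and blocked_bots are hypothetical helpers written for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative crawler names only (an assumption, not an exhaustive or current list).
AI_BOTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_rules(bots):
    """Build robots.txt text with one site-wide Disallow group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in bots]
    return "\n".join(groups)


def blocked_bots(robots_txt, bots, url="https://example.com/"):
    """Return the bots from the list that the rules forbid from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


rules = build_rules(AI_BOTS)
print(rules)                        # text to paste into robots.txt
print(blocked_bots(rules, AI_BOTS)) # every listed bot should appear here
```

A local check like this only confirms that the file says what you intended. It does not show whether a particular crawler actually obeys it.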