Industry & Advocacy News

Practical Tips for Authors to Protect Their Works from AI Use

Self-published and indie authors can help protect against unauthorized training uses by including a short "no AI training" notice in their books. Also, anyone publishing online can block OpenAI's web crawler from accessing their website.

September 8, 2023

This post was originally published on September 8, 2023, and updated on August 13, 2024.

Text-based generative AI technologies require the ingestion of vast amounts of data, including tens or hundreds of thousands of books. AI companies, so far, have not sought permission from or offered compensation to the authors whose books and other writings are being used to train generative AI technologies, a practice that the Authors Guild strongly opposes. To make matters worse, all of the large book AI training datasets that we are aware of were compiled from ebook piracy sites, precisely because there are no legal databases of books on the open internet and AI developers have not obtained licenses to use books for training.

The Authors Guild has been working on advancing legislative and regulatory changes that would require AI companies to seek permission before using books and other written works in the training and development of generative AI systems. But until legal safeguards are instituted, authors can take some steps to try to protect themselves from unauthorized training uses.

In addition to the steps below, make sure to set up Google Alerts for each of your books and to send takedown notices whenever you see unauthorized copies online.

Add a Copyright Notice to Your Works

We recommend that you include a short notice on the cover or copyright page of your books, or at the top of posted articles or stories. Even though copyright law gives authors exclusive rights to decide if others can use their works, many tech companies, including AI developers, believe everything on the web is free to use. They will, however, observe opt-outs when pressed. Below is an example of such a notice:

NO AI TRAINING: Without in any way limiting the author’s [and publisher’s] exclusive rights under copyright, any use of this publication to “train” generative artificial intelligence (AI) technologies to generate text is expressly prohibited. The author reserves all rights to license uses of this work for generative AI training and development of machine learning language models.

Block AI Bots from Crawling Your Website

You can protect yourself from some unauthorized crawling and scraping of text from your website by adding or editing a file called robots.txt with instructions for web crawlers. OpenAI has said that it will not crawl sites that restrict its crawler, GPTBot, in robots.txt. You can learn how to add or update your robots.txt file here, and OpenAI has details on how to disallow GPTBot from accessing your site.
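As a minimal sketch, a robots.txt file that asks GPTBot to stay away from an entire site could contain the two lines below. The file normally sits at the root of the site (for example, at the hypothetical address https://example.com/robots.txt). The rule only works for crawlers that choose to honor robots.txt.

```text
User-agent: GPTBot
Disallow: /
```

To restrict only part of a site, the single slash can be replaced with a path such as /stories/ (a hypothetical directory name used here for illustration).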

You can also block other AI bots from crawling your website by adding them to your site’s robots.txt. This useful website maintains a running list of AI bots. Finally, you can use Google’s robots.txt tester tool to verify that the updated file matches the documentation.
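For authors comfortable running a short script, the rules can also be checked locally. The sketch below uses Python's standard urllib.robotparser module to confirm that a robots.txt blocks GPTBot while leaving other crawlers alone. The file contents, the example.com address, and the SomeOtherBot name are assumptions made for illustration, not values taken from any official documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block GPTBot everywhere,
# and leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder page address; substitute a real page from your own site.
    page = "https://example.com/stories/chapter-one.html"
    for agent in ("GPTBot", "SomeOtherBot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {status}")
```

This checks only what the file says. It cannot confirm that a given crawler actually obeys the rules.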
