Industry & Advocacy News
Self-published and indie authors can help protect against unauthorized training uses by including a short "no AI training" notice in their books. Also, anyone publishing online can block OpenAI's web crawler from accessing their website.
September 8, 2023
Text-based generative AI technologies require the ingestion of vast amounts of data, including tens or hundreds of thousands of books. AI companies, so far, have not sought permission or offered compensation to the authors whose books and other writings are being used in training generative AI technologies, a practice that the Authors Guild strenuously opposes. To make matters worse, all of the large book AI training datasets that we are aware of were compiled from ebook piracy sites, precisely because there are no legal databases of books on the open internet and AI developers have not obtained licenses to use books for training.
The Authors Guild has been working on advancing legislative and regulatory changes that would require AI companies to seek permission before using books and other written works in the training and development of generative AI systems. But until legal safeguards are instituted, authors can take some steps to try to protect themselves from unauthorized training uses.
First, make sure to set up Google alerts for each of your books and to send takedown notices whenever you see unauthorized copies online. In addition, we recommend that you include a short notice on the cover or copyright page of your books, or at the top of posted articles or stories. Even though copyright law gives authors exclusive rights to decide if others can use their works, many tech companies, including AI developers, believe everything on the web is free to use. They will, however, observe opt-outs when pressed. Below is an example of such a notice:
NO AI TRAINING: Without in any way limiting the author’s [and publisher’s] exclusive rights under copyright, any use of this publication to “train” generative artificial intelligence (AI) technologies to generate text is expressly prohibited. The author reserves all rights to license uses of this work for generative AI training and development of machine learning language models.
In addition, you can protect yourself from some unauthorized crawling and scraping of text from your website by adding or editing a file called robots.txt with instructions for web crawlers. OpenAI has said that it will not crawl sites that restrict its crawler, GPTBot, in robots.txt. You can learn how to add or update your robots.txt file here, and OpenAI has details on how to disallow GPTBot from accessing your site. Finally, you can use Google’s robots.txt tester tool to verify that the updated file matches the documentation.
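As a rough illustration of the robots.txt step, the sketch below embeds the two-line rule that OpenAI's documentation gives for disallowing GPTBot across an entire site, then checks it with Python's standard-library robots.txt parser. The helper name is_allowed, the example.com address, and the "SomeOtherBot" user agent are placeholders made up for this example, not part of any official tool. Treat it as a quick local sanity check under those assumptions rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule OpenAI documents for blocking GPTBot from a whole site.
# Save these two lines in the robots.txt file at the root of your website.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for this example; it only interprets the rules
    locally and does not contact any website.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder address; substitute a page from your own site.
    page = "https://example.com/stories/chapter-1.html"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Other crawler allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Because the rule names only GPTBot, other crawlers are unaffected by it; blocking additional crawlers would require adding a separate User-agent group for each one.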