
HarperCollins has asked thousands of authors for permission to include their nonfiction books in an AI license with an unnamed AI company. The offer has caused some confusion in the press and on social media.

As a preliminary matter, it is important to understand that the licensed use of books must replace AI companies’ current unlicensed, uncontrolled, and infringing use. Moving to a regime of licensed AI use gives authors the power to say “no,” or to insist on limits on output uses and to be compensated. The flagrant illegality and unfairness of the current regime, in which AI companies use copyright-protected materials without authorization and without compensating authors, has prompted a number of lawsuits, including one brought by the Authors Guild and a number of other authors. (For clarification, the HarperCollins initiative could not and does not impact the compensation and damages authors are entitled to receive in these lawsuits.)

The Authors Guild thus appreciates fair initiatives that move us toward licensing solutions. HarperCollins is requesting express permission from each author individually, and we understand it has incorporated agent and author feedback into the addendums with the authors. Unlike some of the academic presses that have licensed AI rights to books, HarperCollins acknowledges that the use is outside of the original publishing agreement by seeking authorization through a separate agreement providing flow-through payments (so that the authors’ shares are not applied against their advances).

Typical trade publishing agreements, like HarperCollins’ standard boilerplate, grant rights for publication in various book and excerpt forms and reserve all other rights to the author. Publishers must therefore obtain permission from authors before including their books in AI licensing deals. We commend HarperCollins and others such as Cambridge University Press and McFarland for recognizing this important principle.

The HarperCollins licensing deal provides for a $5,000 fee per title, which will be split 50-50 between an author who chooses to participate and HarperCollins, or $2,500 each. We believe that a 50-50 split for a mere AI training license gives far too much to the publisher. These rights belong to the author as they are not book or excerpt rights; it is the authors’ expression that produces value in AI licensing. Even when the publisher is serving as the licensor on behalf of its authors, the authors should receive most of the revenue, less only the equivalent of an agent’s fee and whatever is needed to compensate the publisher for additional labor or rights, such as creating the files that are licensed and providing metadata. That amount is to be negotiated between the publisher and the author or their agent. In this case, we are cognizant of the resources and time that HarperCollins is investing in building an ethical licensing framework.

Lastly, HarperCollins’ arrangement includes certain “guardrails” against users of the AI system generating outputs that could harm the value of the books, including limiting outputs to no more than 200 consecutive words and/or 5 percent of a book’s text. Other protections include a pledge by the AI licensee not to scrape text from piracy websites, an illegal practice that harms authors, and to take action against infringement. These kinds of limitations and conditions on the use of the licensed material are crucial in AI training licenses to prevent the AI from stealing training data and generating harmful outputs.
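
To make the output limit concrete, here is a minimal Python sketch of how such a check might work. It is an illustration only, not a description of how HarperCollins or the licensee actually enforces the guardrails. It assumes that “and/or” means an output fails if its longest verbatim run of words from the book exceeds either cap, and the function names and word-splitting approach are hypothetical.

```python
from difflib import SequenceMatcher

# Caps taken from the arrangement as described above.
MAX_CONSECUTIVE_WORDS = 200
MAX_SHARE_OF_BOOK = 0.05  # 5 percent of the book's text


def longest_verbatim_run(book_text: str, output_text: str) -> int:
    """Return the length, in words, of the longest passage the output
    copies word for word from the book (hypothetical helper)."""
    book_words = book_text.split()
    output_words = output_text.split()
    # autojunk=False so common words are not ignored in long texts.
    matcher = SequenceMatcher(None, book_words, output_words, autojunk=False)
    match = matcher.find_longest_match(0, len(book_words), 0, len(output_words))
    return match.size


def within_guardrails(book_text: str, output_text: str) -> bool:
    """Assumed reading of "200 consecutive words and/or 5 percent":
    the longest verbatim run must stay within both caps."""
    book_length = len(book_text.split())
    run = longest_verbatim_run(book_text, output_text)
    return run <= MAX_CONSECUTIVE_WORDS and run <= MAX_SHARE_OF_BOOK * book_length
```

Under this reading, the 200-word cap is the binding limit for any book longer than 4,000 words, while the 5 percent cap matters only for very short works.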

The Authors Guild believes that licensing that gives authors control over whether and how their works are used for AI training is part of a solution to AI companies’ ongoing, flagrant theft of books and journalism. In addition to litigation, licensing is a way to enforce copyright against this theft, to allow authors to say “no,” and to bring control over uses back to the authors and their partners.