
AG on Meta AI Ruling: Meta Gets a Technical Win, but the Law Favors Authors

On June 25, a federal court in California issued its decision in Kadrey v. Meta, one of the cases in which authors have asserted copyright claims against an AI company for unauthorized copying of their works for training. The court ruled in favor of Meta, holding on summary judgment that Meta’s copying of books to train its AI was fair use. But Meta won only on technical grounds, because of deficiencies in the plaintiffs’ evidence, not on the merits of the law. The court explained that the market harm caused by Meta’s LLMs will almost certainly diminish the value of the authors’ works and the copyright incentives to write new books, and that market harm is the most important fair use factor, yet it found that the plaintiffs failed to provide sufficient evidence of that harm. The court was careful to note that its decision was based on the limited record in this specific case and that it found many of Meta’s fair use arguments entirely unpersuasive, as the Guild and other creators have been arguing for years.

Judge Chhabria in essence found that using copyrighted works for AI training without a license is not fair use, but granted Meta’s motion for summary judgment on the plaintiffs’ claims because of deficiencies in the record developed by the plaintiffs in this particular case.

Importantly, the court properly rejected what it called the “ridiculous” notion that “adverse copyright rulings would stop this technology in its tracks.” Just as the Guild has consistently argued, Judge Chhabria recognized that “[t]hese products are expected to generate billions, even trillions, of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”

The ruling comes just two days after another judge’s ruling in a separate case against Anthropic (see the Guild’s analysis of that case here).

Background

The lawsuit was brought by 13 well-known and best-selling authors, including Sarah Silverman, Junot Díaz, and Andrew Sean Greer, who allege that Meta Platforms, Inc. infringed their copyrights by using unauthorized copies of their books to train its large language models (LLMs), including the LLaMA series.

The court found that Meta acquired at least 666 copies of the plaintiffs’ books among the millions of books and articles it downloaded from the notorious and criminal pirate sites Library Genesis, Z-Library, and Anna’s Archive. Although Meta initially sought to license books from publishers, it ultimately abandoned those efforts and turned to these pirate sources. Meta’s engineers torrented hundreds of terabytes of data from these libraries and incorporated them into the training corpus for LLaMA 1, 2, and later models. Meta was also found to have distributed the copies it torrented by “seeding” them to other users to speed up its downloads. Meta’s distribution of the pirated datasets through leeching or seeding during torrenting remains a live issue in the case and was not resolved in this decision.

As the court noted, Meta had a strong incentive to use books as training material because of their high quality and structural coherence, observing that “while a variety of text is necessary for training, books make for especially valuable training data” in improving the LLMs’ performance. Internal Meta communications showed repeated discussions about the value of books and the urgency to acquire more for training, with one employee saying that the “best resources we can think of are definitely books.”

The Court’s Decision

In ruling on the motion, the court addressed the four factors that courts are required to consider in fair use cases: 1) the purpose and character of the use (including whether the use is commercial and whether it is “transformative”), 2) the nature of the copyrighted works, 3) the amount and substantiality of the portion used, and 4) the effect of the use upon the potential market for or value of the work. The court correctly noted that factor 4 is “undoubtedly the single most important element of fair use.” And crucially, it agreed with what the Guild has been saying for years on this factor—that flooding the market with AI-generated works will undermine the incentives that copyright is intended to protect.

Market Harm

Meta and other AI companies have argued that market dilution is not a type of market harm that is relevant under the fourth factor. “But that can’t be right,” Judge Chhabria said.

Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. . . . So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.

This is a critical judicial recognition that the overall harm to the creative professions caused by unauthorized training can and must be considered as part of the fair use analysis.

Here, the court found that the 13 plaintiffs failed to provide sufficient evidence of this type of market harm. It appears that Judge Chhabria wanted the plaintiffs to submit an additional expert report or other empirical evidence showing how Llama has harmed their markets or may do so in the future. Though a report had been submitted, it did not satisfy the court. Judge Chhabria made clear that if the record had been stronger, the outcome would have been different:

[I]n the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these thirteen authors—not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.

We were also pleased that Judge Chhabria rejected another well-worn argument by AI companies—one that the court in the Anthropic case erroneously accepted. AI companies frequently argue that market dilution isn’t relevant to fair use because copying works for training is no different from using them to teach humans, who can then use that knowledge to create their own works. Judge Chhabria squarely refuted that flawed comparison:

[W]hen it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.

Unfortunately, the court failed to acknowledge one of the important elements of harm to the market: the loss of licensing income, in this case the income authors would earn from licensing their works for AI training. The court summarily dismissed this type of harm, even though it is one that courts have consistently recognized. Of course, authors are entitled to demand licensing income for the use of their works. That is how copyright works: authors hold rights that give them the ability to license their works, and licensing is how they make money, which gives them the incentive to create in the first place. Moreover, the market for licensing works for AI training already exists across multiple creative industries and is growing by the day (as evidenced by licensing platforms like Created by Humans and many others).

Purpose and Character of the Use

The judge also made some puzzling findings under factor 1. First, unlike the judge in the Anthropic case, he incorrectly declined to treat the mass-scale downloading of books from pirate websites as a separate act of copying requiring its own fair use analysis, concluding instead that “[b]ecause Meta’s ultimate use of the plaintiffs’ books was transformative, so too was Meta’s downloading of those books.” But the Supreme Court has made clear that courts should consider each act of infringement separately.

Second, and relatedly, the court did not adequately consider the illegal source of these copies in considering the “purpose and character” of Meta’s use. Here again, the issue seems to have been a supposed lack of evidence: the court faulted the plaintiffs for not having evidence about how “Meta’s act of downloading propped up these libraries or perpetuated their unlawful activities.” But it is self-evident that downloading millions of books from criminal pirate websites in order to build a multi-billion-dollar AI industry is going to create massive incentives for those sites to continue their activities, and for other pirates to join the fray. The court seems to have created an evidentiary standard that may be all but impossible for many plaintiffs to meet. We expect these rulings to be challenged, and ultimately reversed, on appeal.

A Technical, Not Legal, Win for Meta

While we expect that the decision will be described in the media and elsewhere as a victory for Meta, it is important to remember that the court reached its decision on evidentiary grounds, not because it accepted Meta’s fair use position. The court made clear that other cases should come out differently:

Courts can’t stick their heads in the sand to an obvious way that a new technology might severely harm the incentive to create, just because the issue has not come up before. Indeed, it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor—and thus win the fair use question overall—in cases like this.

Read the full decision here (PDF).