How Will Authorship Be Defined in an AI Future?

If we want to ensure that our literature and arts continue to reflect our current experiences and our imagined ones, we need to ensure that human creators are compensated and their work is protected.

By Mary Rasenberger

This article originally appeared in the Summer/Fall 2022 issue of the Authors Guild Bulletin.

There’s no question that the rise of AI technology capable of “writing” poses risks to the future of writing as a profession, as it does for many other professions. For the last couple of years, the Authors Guild has been advocating for changes to copyright law that we believe will be necessary to ensure that the writing profession remains vibrant and to prevent AI from whittling away the market for human-written books, journalism, and other forms of human art.

Already, AI has been used to generate formulaic news articles, such as sports results and financial reports; corporate texts (as described in “AI for Writers”); and even novels. The 2016 Japanese novel The Day a Computer Writes a Novel, generated by an AI system trained on sentences, words, and structure written by its designers, was a finalist for a Japanese literary award. Although we have no proof of it, it would also appear that some of the unreadable book “summaries” and the genre fiction created by cutting and pasting text from multiple books, both of which can be found on Amazon, are AI generated. GPT-3, a text-generating AI tool, has proven capable of writing fairly sophisticated material, including text-based adventure games and even an article for The Guardian on why AI is harmless to humans.

To understand the legal issues and the challenges AI brings to copyright law, it helps to understand how AI creates — or, more accurately, autogenerates — text and other artwork through machine learning. AI does not “create” out of thin air. Computers are given a massive number of prior works — of the same type and style as the works they are intended to create (e.g., text, music, images) — and those works are broken down into data using certain rules and parameters from which the computers “learn.” The computers are thus “trained” by reading the data and using statistical algorithms to recognize patterns and relationships in the data and make successful predictions when queried. For example, an AI system that writes articles on the results of sports games is trained on a data set of prior sports reports and certain rules for what data points are essential and in what order. With slightly more sophisticated technology, AI can, for instance, write in a Kafkaesque style. In 2021, Stephen Marche wrote in The New Yorker about his experience using Sudowrite, an artificial-intelligence application based on GPT-3, to write a believable follow-on paragraph of The Metamorphosis.
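The “train on prior works, then predict” loop described above can be illustrated with a deliberately tiny sketch. This is not how GPT-3 or any production system is built — real models learn billions of parameters rather than word-pair counts, and the training snippet here is invented for illustration — but it shows, in miniature, how “new” text can be generated purely by replaying statistical patterns extracted from prior text:

```python
import random
from collections import defaultdict

# A toy training corpus standing in for the "data set of prior sports
# reports" mentioned above (invented text, for illustration only).
training_text = (
    "the referee blew the whistle and the home team scored "
    "the home team won the match and the crowd cheered"
)

def train_bigrams(text):
    """Learn which words were observed to follow each word."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length=8, seed=0):
    """Generate text by repeatedly sampling a statistically likely next word."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        if word not in followers:  # no observed continuation; stop
            break
        word = rng.choice(followers[word])
        output.append(word)
    return " ".join(output)

model = train_bigrams(training_text)
print(generate(model, "the"))
```

Every sentence this sketch emits is stitched entirely from word transitions seen in its training data — a concrete, if crude, instance of “extracting statistical patterns from large data sets.”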

Examples of image-based generative creation include ING and Microsoft’s 2016 project, The Next Rembrandt, which used machine learning to create a 3D-printed “Rembrandt” painting based on 346 Rembrandt works broken down into 168,263 high-resolution painting fragments. Using this database of images, the system made instantaneous mathematical calculations, or predictions, by finding patterns in the material it was trained on, within rules and parameters set by the trainers. The result was an impressive look-alike of a Rembrandt painting. More recently, Jason Allen, a tabletop game maker, used Midjourney, an AI text-to-image program, to create Théâtre D’opéra Spatial, a work that took first place in the emerging digital artists category at the 2022 Colorado State Fair. In each case, the new “works” were generated only after massive numbers of existing works were copied and fed into the systems — and in the case of Midjourney, most of the millions of images scraped from the internet are protected by copyright.

These AI-created “new” works, as such, are not really new or truly creative. Rather, the AI system ingests existing works that have been broken down into data points, and then rehashes them — albeit in a very complex way. As Dr. Alison Gopnik, a professor of psychology and part of the AI research group at the University of California, Berkeley, puts it: “We might call it ‘artificial intelligence,’ but a better name might be ‘extracting statistical patterns from large data sets.’”

While some might argue that the human brain creates in that same way, and certainly there are examples of books, songs, and so on that merely rehash what already exists, it is clear that humans bring something more to what we would consider truly creative works — their experience, emotion, imagination, and intuition. Human art evolves because it strives to reflect the stories of our time and place. Computer art will always stand still without new human art to train on.

So when we talk of AI taking over human-created arts and literature — as some predict — more is at stake than just loss of jobs. Jason Allen told The New York Times that he empathized with trained artists’ fears about losing their work, but said, “Art is dead, dude. It’s over, A.I. won. Humans lost.” But AI technology — at least the kind that exists today — cannot replace true art, and I think we can all agree that a world without the arts, which help move us forward as a society, is not one that we aspire to.

To ensure that our society continues to support human creativity and that AI does not destroy human-created arts, we need to preserve a well-functioning copyright system. The Guild recommends, first, that copyright law prevent the unauthorized copying of mass amounts of in-copyright works to create competing works, and second, that when AI infringes, an identifiable person or entity can be held liable.

As a point of clarification, when an artist’s or a writer’s work is copied to train AI to create a competing work, the output — the new work generated by the AI — is highly unlikely to infringe any preexisting work, including any work that it was trained on. For instance, if ING and Microsoft had copied all of Andy Warhol’s work to train AI to create a new “Warhol,” the result would not infringe any one Warhol work. That is because a work can only infringe another individual work, not a style, and only when it is copied from the original work and is “substantially similar” to that particular work — after all the nonprotectable elements (ideas, facts, common elements or tropes, and style) are filtered out. In the same way, the chances that a romance novel created by an AI machine trained on romance novels will infringe any particular novel are so minuscule as to be practically nil.

The only identifiable act of infringement, then, is the copying done at the front end, to train the AI. Each of the works that is copied to train the AI is infringed unless permission was granted or an exception applies. Fifteen or so years ago, copyright attorneys by and large would have agreed that such copying was clearly infringing, but fair use case law has since broadened, and as we saw with Authors Guild v. Google, copying vast numbers of works (books in that case) may be deemed fair use if the court finds that the use is “transformative” and the value of the works is not unduly harmed. In the Google case, the court decided that the copying was fair use because it found that the primary end use was book search, and that book search was a transformative fair use of the books copied that did not interfere with the markets for the books. (We disagreed with the latter finding, but that is another story.)

We would vehemently argue that Google does not apply to copying literary works to produce new, similar works because, unlike the use at issue in the Second Circuit’s finding in Authors Guild v. Google, AI-generated works will compete directly with the works copied and, as such, would harm the market for the original works, especially once such copying becomes widespread and unrestricted (which the current legal standard permits). As of now, if such a case were brought, we have no assurance that a court would agree. Under existing law, a court could potentially insist on viewing the harm very narrowly, looking only at the effect of the AI-generated title on the market for each individual title that the AI was trained on, where it might be hard to prove harm. The court might conclude that the AI-created book did not affect the market for any particular book it ingested any more than those books harmed each other’s markets. Moreover, we are seeing a move toward creating broad copyright exceptions for mass “data” copying for AI training and research purposes in several other countries, including the U.K. Indeed, we are currently preparing evidence to submit to the U.K. Intellectual Property Office to protest its recent recommendation for a broad copyright exception that would allow the copying of “data” (including books and other copyrightable works that are distilled to data for these purposes) for any AI purpose.

Accordingly, we have argued in various government and other forums that it is important that copyright law be amended to expressly state that AI cannot be trained on copyrighted works to create competing works without the authorization of the copyright owners.

Our second concern is that AI machines will almost certainly soon be capable of infringing on their own, even instructing other AI machines to copy works without any direct human involvement. One could easily imagine an AI system that, without human direction, creates a secondary AI tool that scours the internet for certain types of works and then makes copies of them available without authorization, or one that cuts and pastes from online material to create new material. A relatively recent court-made doctrine requires that infringing conduct be volitional to support direct liability, holding that merely creating and owning a machine capable of infringement is not enough. And it could be very difficult to prove secondary liability, given that the person who created the first AI system might not have directed the AI to create an infringing machine, benefited from the infringement, or controlled it. As such, it is highly possible that there would be no person or entity who could be sued for the AI’s infringement.

We need new rules to avoid both of these likely scenarios. We have submitted public comments to the U.S. Patent and Trademark Office and the U.S. Copyright Office on these and related issues, and we are participating in their roundtables on the subject. I was a key drafter of the American Bar Association’s comments to the U.S. government and the World Intellectual Property Organization (WIPO), cautioning copyright policymakers against AI’s potential to upend the market for human-authored creative works. The principal points in our various submissions are that copyright law needs to ensure that authors and other creators are compensated for the use of their work by AI; that those who train AI machines to create new works get permission from the copyright owners to use their works; and that we revise the law to identify the person or entity — namely, the creator or user of the AI — who is liable when the AI commits copyright infringement.

We are confronting serious policy issues about the future of creativity: Do we want humans or AI creating our literature and other arts? As we have argued in multiple policy forums, if we want to ensure that our literature and arts continue to reflect our current experiences and our imagined ones, we need to ensure that human creators are compensated and their work is protected. AI cannot feel, think, or empathize. It lacks the essential human faculties that move the arts forward. While it is remarkable that engineers could create a “new” Rembrandt that so closely resembles an authentic one, we do not need new Rembrandts; we need new art and literature to reflect where we humans are now, and where we might be going.

—Mary Rasenberger