The Copyright Dilemma in Generative AI: An Unfinished Battle

JJohn July 25, 2023 1:52 PM

As generative AI companies face increasing scrutiny over using copyrighted works for training their models, the inadequacy of current copyright laws in addressing issues of consent, credit, and compensation becomes evident. High-profile lawsuits and open letters from authors and artists alike highlight the need for a comprehensive legal approach, challenging existing notions of fair use and originality in the age of AI.

A Call to AI Giants: Consent, Credit, and Compensation

The world of generative AI has recently been stirred by a wave of copyright controversies. An open letter from the Authors Guild - backed by more than 9,000 writers, including George Saunders and Margaret Atwood - to the leaders of AI giants such as Alphabet, OpenAI, Meta, and Microsoft demands consent, credit, and compensation for the use of copyrighted materials in training AI models. This plea is not an isolated incident; it is part of a growing movement among creatives who have noticed uncanny resemblances between their work and the output of large language models (LLMs). The movement has led to calls for transparency from AI companies and compensation for those whose works were used, highlighting how unclear the relationship between AI and copyright remains.

There's a growing sentiment that pinning all hopes on copyright law to address the intricate issues posed by generative AI is misguided. The lawsuit filed by comedian Sarah Silverman against OpenAI is a case in point: she alleges the company used her works without permission to train its ChatGPT model. But the concept of AI 'copying' protected works is far from straightforward. AI models like GPT-3.5 and GPT-4 do not 'copy' data in a conventional sense; instead, they 'digest' training data and learn to predict the most likely next word in a sequence. This difference in how AI interacts with data creates a copyright conundrum that cannot be reduced to simple 'copying'.

Fair Use and AI: An Unsettled Debate

The concept of 'fair use' is at the heart of these discussions. Parallels are drawn between AI's learning process and how a search engine builds its index - learning from data, rather than copying it. Some believe this learning process falls under the umbrella of fair use, a provision of US copyright law that typically permits unlicensed use of copyrighted works for purposes such as scholarship and research. However, the jury is still out on whether a machine can create a derivative work, as the US Copyright Office maintains that only humans can produce 'works'. This lingering question further complicates the AI-copyright debate.

Many legal experts argue that the secrecy surrounding the datasets that companies such as OpenAI use to train their AI models could serve as a compelling argument against them. Lawsuits brought against these companies contend that vast datasets, such as Books2, must inevitably contain pirated material because of their sheer size. This adds a new layer of complexity to the debate: the question of whether the copyrighted works were obtained legally. That question has not yet been definitively addressed in court.

Finding Balance: Copyright, Creativity, and AI Ethics

A growing concern among artists and tech enthusiasts is that a more stringent interpretation of copyright infringement, aimed at reining in generative AI, could inadvertently stifle creativity. The question is, should an individual be compensated if their works, which they spent years creating, are used for commercial purposes by AI? This question, among others, is likely to compel AI companies to take proactive steps to prevent lawsuits. These measures might include obtaining licensing agreements to use copyrighted works or asking artists to allow their work to be used as training data, steps that would take both sides into uncharted waters in the realm of copyright and AI.
