Sunday, January 25, 2026

DP26003 LLMs and the Original Author Copyright V01 250126

In "How AI Ate the World", Chris Stokel-Walker addresses Large Language Models (LLMs) and their relationship with authors' works, specifically the phenomenon of machines "extracting" or replicating copyrighted material across languages, through the lens of copyright, training data, and the loss of human agency.

While the book covers many aspects of AI, its treatment of LLMs extracting or "remixing" original works into translated versions can be summarized through three key themes:

1. The "Ingestion" of the Digital Commons

Stokel-Walker describes how LLMs were built by scraping massive amounts of text from the internet and digitized libraries (the "Shadow Libraries").

The Extraction Process: He explains that AI doesn't just "read" books; it deconstructs them into mathematical vectors. This allows the AI to "extract" the style, plot, and even specific phrasing of an author and reproduce it in any language the model has been trained on.

The Author's Loss: For an author, this means their work is being used to create "synthetic competitors": translated versions of their own ideas that they did not authorize and for which they receive no royalties.
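The idea of deconstructing text into "mathematical vectors" can be illustrated with a toy sketch. The Python below is not how a real LLM works: production models use learned embeddings with many dimensions, whereas this sketch swaps in plain word counts and cosine similarity. The sample sentences and the function names (tokenize, to_vector, cosine_similarity) are invented for this illustration and do not come from the book.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine_similarity(a: list[int], b: list[int]) -> float:
    """Return the cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample sentences, invented for this sketch.
original = "The old lighthouse keeper watched the storm roll in"
imitation = "The lighthouse keeper watched as the storm rolled in"
unrelated = "Quarterly revenue grew faster than analysts expected"

# The shared vocabulary defines one vector dimension per distinct word.
vocabulary = sorted(set(tokenize(" ".join([original, imitation, unrelated]))))

vec_original = to_vector(original, vocabulary)
vec_imitation = to_vector(imitation, vocabulary)
vec_unrelated = to_vector(unrelated, vocabulary)

sim_imitation = cosine_similarity(vec_original, vec_imitation)
sim_unrelated = cosine_similarity(vec_original, vec_unrelated)

print(f"original vs imitation: {sim_imitation:.3f}")
print(f"original vs unrelated: {sim_unrelated:.3f}")
```

Even in this crude form, the imitation scores much closer to the original than the unrelated sentence does, which hints at why a numerical representation makes an author's phrasing measurable and, in far more sophisticated systems, reproducible.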

2. Machine Translation as a "Double-Edged Sword"

The book traces the history of machine translation from the clunky machines of the Cold War to modern LLMs.

The Benefit: AI allows a book to be translated into dozens of languages instantly, potentially opening up global markets for indie authors.  

The Extraction Issue: Stokel-Walker highlights the "flattening" of language. When an AI extracts a work to translate it, it often misses the cultural nuance, sarcasm, and "soul" of the original author. It creates a "hollowed-out" copy that can flood markets, devaluing the original human-translated work.  

3. The Legal "Enclosure" of Literature

Stokel-Walker discusses the legal battles (such as those involving the Authors Guild) where writers are fighting back against their work being used as "training fuel."

Copyright Infringement: The book explores the argument that if an AI can "extract" enough of an author's unique voice to produce a translated sequel or a similar book, it has effectively stolen the "code" of that person's creativity.

Derivative Works: He raises the concern that these AI-generated translations are technically "derivative works." In traditional publishing, an author owns the translation rights; in the "AI-eaten" world, those rights are being sidestepped by users who turn to LLMs instead of going through the traditional gatekeepers.

The "Content Crisis"

The core of Stokel-Walker's argument is that we are moving toward a "Content Crisis" where the sheer volume of AI-extracted and translated works will make it impossible for human authors to be discovered. He warns that if we allow LLMs to freely extract and remix the digital commons of literature, we risk a future where "new" books are just translated mashups of everything that came before them.

