The Piracy Pivot: How AI Copyright Battles Are Shifting From 'Fair Use' to Data Provenance

The Real Winner Isn't Fair Use—It's Data Accountability

For months, headlines celebrated when courts ruled that AI training was "exceedingly transformative" and qualified as fair use. But those victories masked an uncomfortable truth: while fair use held up, AI companies remained on the hook for potentially billions in damages for downloading pirated works. Anthropic ultimately settled for $1.5 billion, roughly $3,000 per title downloaded from pirate libraries.

This is the pivot reshaping AI copyright litigation in 2026.

The Thomson Reuters Roadmap Wasn't About Generative AI

Thomson Reuters sued Ross Intelligence over its use of Westlaw headnotes to train an AI legal research tool, and the court granted summary judgment, finding the headnotes original and protected and rejecting Ross's fair use defense. But here's what matters: the court found Ross's use commercial and not transformative because Ross used Thomson Reuters's copyrighted material as training data to build a directly competing product.

Yet the judge emphasized that "only non-generative AI is before me today," signaling courts may take a different approach in generative AI cases. The case is currently on appeal before the Third Circuit—and this will likely be the first appeal of a fair-use decision in an AI copyright case.

Where The Real Liability Lives: Data Source Matters More Than Use

In 2026, courts are focusing less on whether the training itself is transformative and more on how the data was gathered: whether it was pirated or obtained in violation of contractual agreements.

In the authors' case against Meta (Kadrey v. Meta), the question of whether Meta seeded pirated copies is proceeding. If the court finds Meta distributed massive quantities of pirated works, it could face staggering damages similar to Anthropic's, and potentially another large settlement.

This distinction is critical. In the Anthropic case, the judge ruled that training on pirated copies was legal, but that pirating the books in the first place was not—a distinction some authors rejected, arguing AI companies shouldn't extinguish high-value claims at bargain rates.

The Copyright Office Already Took a Side

The U.S. Copyright Office stated that making commercial use of "vast troves" of copyrighted works to produce expressive content that competes with original works, especially where access was illegal, "goes beyond established fair use boundaries."

New Suits Target The Piracy Question Directly

Six authors, including two-time Pulitzer Prize winner John Carreyrou, filed individual lawsuits against Anthropic, OpenAI, Google, Meta, xAI, and Perplexity AI, seeking the Copyright Act's maximum statutory damages of $150,000 per title. The suits note that the Anthropic settlement's $3,000 per work is just 2% of that $150,000 statutory ceiling; with six defendants, the plaintiffs are seeking up to $900,000 in total per work.

Meanwhile, Penguin Random House has filed suit against OpenAI in Germany, alleging ChatGPT infringed the company's copyrights in a popular German children's book series. The suit, filed last week, targets OpenAI's Irish subsidiary.

The Music Angle: Licensing Becomes Proof of Liability

Universal Music Publishing Group, Concord Music Group, and ABKCO Music filed a $3.1 billion lawsuit against Anthropic on January 28, 2026, alleging Anthropic built Claude on pirated works obtained via torrenting. BMG Rights Management followed with its own suit on March 18, alleging Anthropic used lyrics by Bruno Mars, the Rolling Stones, and other performers to train its language models.

Yet Universal Music Group settled with Udio, entering into license agreements covering UMG's recorded music and publishing catalogues. The parties are launching a new subscription service in 2026 for generative AI trained on fully authorized and licensed music.

The contrast is stark: licensing now works as evidence that models can be trained legally—and therefore, that unlicensed training was a choice, not a necessity.

What This Means: The Data Provenance Era

As of March 9, 2026, OpenAI had been ordered to produce an additional 78 million and 10 million output logs, on top of a 20-million-log sample already ordered. Expect 2026 to bring sharper challenges to fair-use defenses, aggressive plaintiff strategies to unlock proprietary training information through discovery, and a new wave of class certification battles.

The shift is clear: courts are no longer asking "Is AI training inherently fair use?" They're asking "Where did you get this data, and did you have the right to use it that way?"

The current legal framework, developed for a pre-AI world, is being tested in unprecedented ways. But the outcome isn't about redefining fair use for AI. It's about enforcing accountability for how AI is built.

That's a fight the technology industry can't win with broad fair-use arguments alone.