Matt Topic is no stranger to taking on powerful people. Before representing newsrooms like The Intercept, Raw Story, and AlterNet in a landmark lawsuit against OpenAI, he was a civil litigation lawyer whose work ranged from police brutality cases to representing inventors seeking to protect their patents.
While Topic launched his lawsuit two months after a similar copyright case by the New York Times, his approach is different. Crucially, if he’s successful, it could set a precedent that protects every online publisher—and their archives.
Legacy publications like the New York Times, which still has a print product, protect their copyright by sending copies of their physical newspaper to the U.S. Copyright Office and paying a single fee once a month, Topic explains. The Times’s copyright lawsuit relies on the protection granted by filings at the office. But until recently, it was more challenging for digital publications, by far the dominant way people consume news, to register their copyright in the same way because they had to pay to register each individual article. This was a costly endeavor for an organization that publishes frequently. This August, the Copyright Office changed its rules to allow online publishers to bulk register content, but publishers who decide to register past articles, which may already appear in OpenAI’s datasets, may not be able to retroactively claim all the benefits afforded by registration.
Loevy & Loevy’s lawsuit for these online publishers takes another path, relying on the Digital Millennium Copyright Act, an early internet-era law meant to protect rights owners from having their content proliferate online without their permission. (Some may be familiar with the law from the blue-screened anti-piracy warnings on DVDs in the early 2000s.)
The law specifically protects people from having identifying information, such as the author’s name or the work’s title, stripped from reproductions of their work, as Loevy & Loevy alleges OpenAI did before inputting this material into its training sets. But identifying the data hasn’t been easy. The training sets for large language models are huge and difficult to parse, Topic says. (OpenAI did not respond to a request for comment.)
But the stakes, Topic reiterates, are high. “AI is already displacing human journalism with AI garbage. That’s bad for society, democracy, and news publishers: it’s an existential threat to the news business,” he says.