Will Sarah Andersen et al. vs. Stability AI, Midjourney, and DeviantArt find that the defendants violated copyright by training Stable Diffusion on images scraped from the Internet?

A class action lawsuit has been filed against Stability AI, Midjourney, and DeviantArt. Claim 1 deals with direct copyright infringement, including:

160. Defendants directly infringed Plaintiffs’ and the Class’s rights because they have:

a. reproduced one or more of the Works in violation of 17 U.S.C. § 106(1);

b. prepared Derivative Works based upon one or more of the Works in violation of 17 U.S.C. § 106(2);

c. distributed copies of one or more of the Works to the public in violation of 17 U.S.C. § 106(3);

d. performed one or more of the Works publicly in violation of 17 U.S.C. § 106(4); and/or

e. displayed one or more of the Works publicly in violation of 17 U.S.C. § 106(5).

While Claim 2 also deals with copyright violation (of a more indirect nature), this market will resolve solely on the status of the first claim.

If the suit is settled out of court, or is dismissed for procedural reasons, this market will resolve N/A.

predicts NO

@mkualquiera That certainly gives more credence to the compression argument.

Manifold in the wild: A Tweet by Eric Adler

Attn: weird copyright friends Want to bet on the outcome of the Anderson v. Stability case about AI copyright infringement? https://manifold.markets/Imuli/will-sarah-anderson-et-al-vs-stabil @copyrightlately @edleeprof @marklemley @brianlfrye @RickSandersLaw

Market for the suit generally.

They have nothing. I extensively researched copyright law and text-to-image software for a video, and it's basically impossible for them to win.
1- Stability AI operates in the UK. In 2022, the UK passed laws exempting data mining from copyright, EVEN for copyrighted works.


2- The argument about how text-to-image software operates claims that it is similar to a collage. This is wrong.

This video outlines the argument quite well. Unless an image is made through img2img, it doesn't qualify as a derivative work.


3- Authors Guild, Inc. v. Google, Inc. set the precedent that non-commercial use of copyrighted work is fair use, and the training data that Stable Diffusion uses, from LAION and Common Crawl, falls under fair use. The claimants are arguing that Stability AI is the one doing the data mining, which is not even the case, so the case falls flat on its face. And even if they argue that DeviantArt is violating their terms of service, collecting data for non-commercial purposes is in line with the precedent set by the Google case and doesn't require the permission of copyright owners.

On all possible levels, they cannot win.

predicts YES

@06a8 thanks for the analysis! Question about (1), though: if Stability was sued in the US, would the US care that the scraping was legal under UK law? How's that work?

predicts NO

@AaronKaufman No. In hiQ Labs v. LinkedIn, the United States Court of Appeals for the Ninth Circuit examined the legal implications of data scraping. It determined that hiQ Labs, a data analytics firm, had the right to gather information from LinkedIn's public profiles. LinkedIn had sought to stop hiQ Labs from scraping data from its platform, citing potential violations of the Computer Fraud and Abuse Act (CFAA) and the Digital Millennium Copyright Act (DMCA). However, the court found that the CFAA did not protect LinkedIn's public profiles because they were not password-protected or otherwise restricted from the public. hiQ Labs' scraping of LinkedIn's data was not a violation of the CFAA or the DMCA. In 2022, the court upheld the district court's decision allowing hiQ Labs to continue scraping LinkedIn's data, citing the Supreme Court’s “gate-up, gate-down” analogy and ruling that “the concept of ‘without authorization’ does not apply to public websites.”

This case established the precedent that scraping publicly accessible data on the internet is legal and does not require permission, unless an account is required to access the private data. On top of this, every country classifies data scraping as legal when done for non-commercial purposes. Since LAION is a non-profit organization that does not sell its dataset but makes it freely available to the public, it is seen as falling under the protection of fair use and as copyright exempt.

Stability AI and Midjourney likely use this dataset, but the dataset itself does not contain copyrighted images; it merely LINKS to copyrighted images on websites across the internet. The lawsuit against Stable Diffusion assumes that these massive datasets contain copyrighted images, but they do not. They contain only links paired with the associated alt-text from each image's page. This means the datasets contain only links to publicly available images, which the AI scans on the internet without needing to copy or download them. By doing so, Stability AI avoids any copyright breach, because the AI observes the image, identifies it along with its alt-text, and remembers the stylistic patterns by diffusing it, analogous to a person observing an artwork and remembering it. It is irrational to consider this data-scraping AI-training system a breach of copyright because, applied consistently, that reasoning would be equivalent to arguing that a human observing an artwork is committing copyright infringement. Since copyright infringement only occurs when someone copies, reproduces, or uses a copyrighted work without the copyright holder's permission, any class action lawsuit that demands compensation is unlikely to succeed: the copyrighted content was only looked at by the AI model, it was neither copied nor stored in its data bank in a way that could be classified as a collage, and it was available on a public website, which makes it subject to legal data scraping.

Essentially, even if Stability AI were operating in the US, its use would be legal because data scraping is legal for non-commercial purposes. LAION is a non-profit organization, and its dataset doesn't contain copyrighted images but links to copyrighted images on public websites, which are legal to access. Being in the UK gives Stability AI even further protection. Failing to recognize that the LAION dataset doesn't contain copyrighted images is one of the biggest blunders people make when examining the case.

@06a8 Consider the situation of source code - it is unambiguously created by humans and falls under copyright.

Object code is also widely held to be under copyright, by virtue of the source code being copyrighted, even though it is machine generated and you cannot perfectly reconstruct the source code from the object code. If I compile some source code that I don't have the right to distribute and distribute the object code instead, I have still violated copyright law.

I think one of the ideas in the lawsuit is analogous to: the source images are the source code, the model is the object code, and the stable diffusion code is a virtual machine.

@Imuli Of course, as you say, scraping the data is definitely not a violation.

Manifold in the wild: A Tweet by ִ

Ah, someone already made a bet. I'm going to bet no lol. https://manifold.markets/Imuli/will-sarah-anderson-et-al-vs-stabil

The best argument they've got is that the AI is a "derivative work", in that its ability to reproduce an artist's style is correlated with how many of the original (copyrighted) works it's been trained on. But that's an entirely novel application of copyright law that strays considerably from precedential holdings. I suppose they might have tried to shop for the one judge on the circuit who might consider it, but I suspect that Stability and crew can apply enough weight on their side to deflect such an attempt.

Generally speaking, I'd expect this to go nowhere. I'd put maybe a 10% chance on it settling to save the defendants some cash, but their case is strong enough that I'd consider it much more likely they just take it to court and crush the plaintiffs on the merits.
