Will the GitHub Copilot Litigation Succeed?

A law firm has filed a class action suit against GitHub, OpenAI, and Microsoft over the GitHub Copilot code completion software: https://githubcopilotlitigation.com The key question is whether training a model and using it requires a license to the training data.

By "Copilot" here I mean the current version of Copilot and any successors that are also trained without licenses to the underlying data.

This question resolves YES if, after appeals, the final ruling is that Copilot in normal operation requires a license to the training data, or if GitHub voluntarily withdraws Copilot. It resolves NO if the ruling is that it does not require a license, or if the suit is dropped. It resolves N/A if the court doesn't rule on this claim (e.g., dismissal for lack of standing) or if the suit is still unresolved in five years. If the suit is settled, it resolves YES if Copilot stops operating within six months of the settlement, and NO if it doesn't.

predicts NO

There was an initial motion to dismiss, on which the court ruled on 2023-05-11 (judgement):

Plaintiffs’ claims for violation of Sections 1202(a) and 1202(b)(2) of the DMCA, tortious interference in a contractual relationship, fraud, false designation of origin, unjust enrichment, unfair competition, breach of the GitHub Privacy Policy and Terms of Service, violation of the CCPA, and negligence are dismissed with leave to amend. Plaintiffs’ claims for civil conspiracy and declaratory relief are dismissed with prejudice.

This leaves the key claim, breach of license, intact ("Defendants’ motions to dismiss Plaintiffs’ claim for breach of license is denied.")

Looking through the docket, the plaintiffs submitted an amended complaint on 2023-06-08, and GitHub et al. submitted an updated request for dismissal on 2023-06-29. Maybe the best summary right now is filing #116 from 2023-07-05, the "joint case management statement"? It says the parties are considering settlement, and that if the case does go to trial, the plaintiffs propose a trial date of 2025-09-29 and the defendants 2026-02-04.

I'm not very clued in here, but I feel like "Copilot" might turn out to be ambiguous: are you specifically referring to the current version of Copilot, or to potential future versions as well? If the current version only, and the current version gets replaced (will we even know if that happens?), we might not get a ruling on it.

predicts NO

@philh I would include future iterations, rebrandings, etc., if they were also trained on data without a license. The main claim of the suit is that training requires copyright licenses to the data, and that's the reason I'm interested in how this suit goes.

predicts YES

@JeffKaufman What if the suit settles and OpenAI/Microsoft train a new Copilot on just code that they have the license to use (MIT, internal Microsoft source code, etc.)? I don't know if there's enough free data out there for that to work, but it's at least plausible.

predicts YES

@philh I have this prediction (might make a market on it); I'm also interested in how this resolves if my prediction is true.

predicts NO

@Gabrielle @dp Switching from "Copilot trained on everything public" to "Copilot trained only on licensed code" would be pretty strong evidence that they thought they were going to lose the suit. There is some chance, though, that it wouldn't hurt performance much and would therefore be worth doing for the PR, or to avoid even a small risk of losing the suit. But I'm leaning towards updating the terms to say that if they redo Copilot to use only licensed training data, this resolves YES. Thoughts?

predicts YES

@JeffKaufman That is consistent with how I interpret the market description, but it doesn't completely agree with the title, because the title uses the loaded word "succeed".

My gut feeling is that the Copilot lawsuit is motivated by the product being detrimental to open-source communities. Copilot committing copyright infringement is not a central threat to that cause; Copilot existing and understanding open-source APIs without contributing back is. See https://githubcopilotinvestigation.com/#what-does-copilot-mean-for-open-source-communities.

But I might be wrong here, who knows.

predicts YES

@JeffKaufman I think that would be the correct change, since being trained only on licensed code is the legal outcome the lawsuit is going for (even if the people behind the lawsuit would rather Copilot not exist at all).

ah! well in that case, gimme my yes shares back, website buttons!

predicts NO

@Gabrielle Edited!

predicts NO

@dp I don't think that's the main motivation, or what I mostly see people upset about? For example, code written with Copilot can still be (and often is) released as open source.

@JeffKaufman Am I right to understand that "licensed" here must mean something other than the license contained in the GitHub ToS that you give them whenever uploading code to GitHub, which authorises them to host the code and use it to improve their products and features? One possible argument in this case is "even if it needs to be licensed, we've always had one".

predicts NO

@jbeshir This raises big questions about whether people who weren't the original authors of code they put on GitHub have been purporting to license it to GitHub under terms they were never authorised to grant, and I'm not sure what it would mean for this market if the result was a process for reporting and removing or excluding these cases.

predicts NO

@jbeshir since, as you say, it's common for people to upload code to GitHub where the only license rights the uploader has come from the associated open source license, I wouldn't expect that argument to work

predicts NO

@JeffKaufman I think it might, in part because of the DMCA. How would you resolve if the ToS license was held to be sufficient by default, with an invitation to use e.g. the DMCA process wherever your code was uploaded to GitHub without your permission to license it under the ToS?

I think it's actually pretty plausible, so it's relevant to how I'd trade here. Zooming out to hosting websites in general: they can do a lot with the content you upload to them, not least redistribute it further, and they enjoy protections that insulate them against copyright claims arising from an uploader who didn't actually have permission to authorise the website to do things with the uploaded content.

It's not clear to me that a licensing process for this would necessarily look different to the licensing process for hosting content.

predicts NO

@jbeshir This is also why GitHub is allowed to do anything with code I upload without a license file at all, and their permission to use it to improve services is included in the same document as their permission to host it at all.

https://fossa.com/blog/analyzing-legal-implications-github-copilot/ includes a take from an IP lawyer along these lines for any training data on GitHub itself.

uh, whoops. @yaboi, if you comment, I'll tip. I misread the criteria; I doubt Copilot will need to stop operating, merely be retrained with license tracing (a capability not yet fully built).