OpenAI has taken the position that training a large language model (LLM) on copyrighted works falls within the "fair use" doctrine. Those who own the copyrights on these works disagree, and several court cases are in progress.
This question will resolve to "yes" if any US jurisdiction rules against OpenAI, requiring them to license some amount of their training data. It will also resolve to "yes" if OpenAI reaches an agreement, of their own accord, wherein they agree to pay for some amount of their training data.
An out-of-court settlement with an intellectual property owner is a precedent of a sort, but calling it "licensing" clearly stretches the meaning of the word; it will not count as a "yes".
📢Can you chime in, I see you were active in discussion on Dec 9th in this market.
📝Can this Resolve?
Resolves YES @cmiles74, see the discussion below. OpenAI has reached an agreement with Axel Springer which (among other things) includes payment for use of data in training.
https://openai.com/blog/axel-springer-partnership
This is confirmed by the new agreement between OpenAI and German publishing house Axel Springer.
@e_gle oooh, let's look further to see if the licensing deal included anything about the articles being used in training data, but from what they're including in the article, this isn't about training data. It sounds like run-time access to the data.
@chrisjbillington https://on.ft.com/3GHVzEU
This article provides additional detail including this quote indicating past data will be used for training.
"Axel Springer will receive a one-off payment for its historical content that will be used to train the AI technology for the first time, but the larger fee will be paid under an annual licence agreement that will allow OpenAI to access more up-to-date information."
@e_gle @cmiles74 fwiw I don't think this should have counted, because the question was whether or not OpenAI would be forced to license training data, not whether it would voluntarily do so. AFAIK this wasn't a settlement from a lawsuit or anything, just a voluntary agreement, and given that OpenAI is (according to the FT) paying more for up-to-date data than historical data, it sounds like the purpose of this agreement is more to get up-to-date data than to license data they've already trained on.
@PlasmaPower the details of the market state it will resolve yes if OpenAI signs an agreement "of its own accord"
@e_gle Ah I missed that, I thought it was just if they lost or settled a lawsuit. Makes sense then, I'll read markets more carefully in the future :)
@cmiles74 would it count if such a thing happened before market creation? Or only things from after Oct 8th when you created this market?
@chrisjbillington We should withhold judgement until the suit brought by the Authors Guild wraps up. It's a big case and I think everyone is hoping it brings clarity to this issue.
@cmiles74 that doesn't really answer my question - if e.g. OpenAI signed a licensing agreement (unrelated to this suit) with someone prior to when you created this market, would that count?
@chrisjbillington Do you have an article or something about this agreement? This sounds really interesting!
@cmiles74 Can you tell me whether such a thing would count for a YES resolution, if it existed, before I tell you whether I think such a thing exists or before I go looking for one?
e.g: "OpenAI signs licensing deal with NYT, will pay undisclosed sum for use of copyrighted NYT articles".
Made-up example. Would that count?
@chrisjbillington My knowledge is limited, I am not an expert in this field. I did some research before creating this market. In my opinion, if there was such a licensing agreement it was likely too narrow, hence the lawsuits from the Authors Guild and the block of nonfiction authors.
No, an earlier licensing agreement will not count.
@cmiles74 Great, thanks!
This is the one I found, it was from before market creation:
https://apnews.com/article/openai-chatgpt-associated-press-ap-f86f84c5bcc2f3b98074b38521f5f75a
ChatGPT-maker OpenAI signs deal with AP to license news stories
ChatGPT-maker OpenAI and The Associated Press said Thursday that they’ve made a deal for the artificial intelligence company to license AP’s archive of news stories.
“The arrangement sees OpenAI licensing part of AP’s text archive, while AP will leverage OpenAI’s technology and product expertise,” the two organizations said in a joint statement.
Financial terms of the deal were not disclosed.
@chrisjbillington Thank you, Chris! This is a really interesting case, particularly where they imply a two-way deal: OpenAI gets access to the AP news archive and the AP appears to get some level of access to OpenAI products.
Is the close date of the market fixed, or does it extend as long as there are ongoing lawsuits? I'd be inclined to bet NO for this year, but YES on a longer horizon.
there are lots of these https://www.courtlistener.com/?type=r&q=OPENAI&type=r&order_by=score%20desc
@firstuserhere what does this mean? I have been seeing MANA figures next to questions and I am not sure I understand what this means. Can you explain please?
@andyou Good question! This question only applies to the US jurisdiction and refers to any licensing agreement. That is, if OpenAI comes to any agreement with Hachette Book Group where they pay Hachette money for licensing any amount of Hachette's intellectual property then this would count as being "forced to license".
@osmarks If OpenAI settles with a holder of intellectual property through a financial settlement, that will suffice as a "yes" for this question. I hope it doesn't turn out this way!
In my opinion the real aim is to figure out if training data needs to be licensed or if it falls under the fair use doctrine. If OpenAI settles with an IP holder, that will be unfortunate as it will not provide a clear precedent for other companies in this space. Even so, I believe that if OpenAI settles with one or more intellectual property holders, that will indicate to the market that some payment needs to be provided to IP holders in order to train on their data. IP holders will expect some kind of payment.
What do you think, @osmarks?
@cmiles74 I haven't yet determined what I think will actually happen, though I would be somewhat annoyed if it did require licensing.
@osmarks I was thinking more along the lines of what's your opinion on how this question should resolve if they settle with an IP holder out of court. 😉
@osmarks I'm going to leave out-of-court settlement as a "yes" for now and think about it some, I'll update the question description this afternoon. I suspect that if Hachette manages to squeeze dollars out of OpenAI, they will be banging on Google's door next. But it's somewhat muddy and not as clear as a court decision.
@cmiles74 Agreed, a settlement is not a clear precedent and is definitely not "licensing" anything.