Will OpenAI pay scientific publishers for content by EOY 2025?

Will OpenAI make a deal to pay a major science publisher such as Elsevier for explicit access / training rights on their papers, before the end of 2025?

Closes at end of 2026 to allow for a one-year lag in when we learn about it. Resolves YES if such a deal has already taken place.

Vide this minor difference of guesses between myself (Yudkowsky) and Sabine Hossenfelder.

bought Ṁ50 NO

Is a settlement in response to an Elsevier lawsuit for $undisclosed a "deal"?

i would argue that it should be if the settlement includes a prospective licensing deal

Yeah, I'd say that if it includes a licensing deal then sure.

i hold this at 80-90%

  • openai will want as much data as possible, with good metadata and annotation, which supports them making deals

  • openai has already made data deals with other publishers (reddit, etc)

  • openai will make these deals no matter what to avoid content liability (just potentially at lower prices)

  • openai will benefit from delegating the task of keeping the pile updated to someone else

  • even a small number of unique items could justify the deal at the right price

  • new managers they hire will be more used to purchasing rather than building or collecting

against this is the idea that they really care about saving the money because they could get the data elsewhere, but i think this matters less in a $100bn organization. the other possibility is that they have a different organizational focus and this is too hard to do right now

opened a Ṁ50,000 NO at 75% order

I haven't recently looked in on what sort of torrents exist there, or how hard it would be to throw it somewhere in Common Crawl and make it look like an accident. But if they're sufficiently scared of legal consequences to not do that, I'd guess they won't pay either.

If you specifically want a somewhat comprehensive collection of new papers published since sci-hub stopped uploading new papers, I'm not sure there is one. (If there is, someone please tell me!!!)

@jacksonpolack depends a little on the field, but many authors upload their papers to preprint servers such as arXiv and HAL

bought Ṁ20 NO

Would there be an incentive to do this rather than training on arXiv preprints or emailing the authors of desired papers en masse?

Was initially thinking that scraping papers is bad optics, but actually it might be worse to do public deals with scientific journals.

(Sabine Hossenfelder's tweet sums up how media would probably treat deals like this, i.e. "look at these journals selling researchers' hard work and not giving them anything")

Good point, I wasn't even considering optics motivations

@GeorgeIngebretsen i sold my position after reading ur comment

@GeorgeIngebretsen What? OAI does not seem to care too much about optics.

As far as I can tell, they correctly rate such small factors like “makes deals with sci journals” as v unimportant compared to making their product that much better.

It may seem important to you, but to the vast majority of people, it won’t make them more or less outraged at “Big Tech”.

Great point. I guess even the New York Times lawsuit didn't seem to matter much? Not to mention how much more coverage there will be as models become more capable.

I agree training data optics are just a drop in the bucket (and if oai actually builds what it's set out to build, eventually a completely insignificant detail). My og comment was more about trying to figure out the sign of the optics of paying for journal data. Though, after some thought, I realize optics are a negligible factor in whether oai would make that deal.