Will xAI build an LLM as powerful as GPT-3.5?
Resolved YES on May 19

This market will resolve YES if, by the end of June 2024, Elon Musk's xAI announces that they have a language model at least as powerful as GPT-3.5 or Claude.

By default, I will use the Arena Elo rating to decide whether a model meets the bar. If there is no such rating, I will use other benchmarks (e.g., MMLU) or my subjective impression. If there is a lot of disagreement, I will resolve NA.
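The fallback order described above can be sketched roughly as follows. This is only an illustration of the stated rule, not the market creator's actual procedure: the function name `resolve_market`, its parameters, and the `"SUBJECTIVE"` return value are all hypothetical, and it assumes the caller has already picked a single baseline score for "GPT-3.5 or Claude" and that xAI announced a model before the deadline.

```python
from typing import Optional


def resolve_market(
    xai_elo: Optional[float],
    baseline_elo: Optional[float],
    xai_mmlu: Optional[float] = None,
    baseline_mmlu: Optional[float] = None,
    heavy_disagreement: bool = False,
) -> str:
    """Hypothetical sketch of the resolution rule; all names are assumptions.

    Returns "YES", "NO", "NA", or "SUBJECTIVE" (meaning no benchmark was
    available and the creator's own impression would have to decide).
    """
    # "If there is a lot of disagreement, I will resolve NA."
    if heavy_disagreement:
        return "NA"

    # Default criterion: Arena Elo, when both ratings exist.
    if xai_elo is not None and baseline_elo is not None:
        return "YES" if xai_elo >= baseline_elo else "NO"

    # Fallback criterion: another benchmark such as MMLU.
    if xai_mmlu is not None and baseline_mmlu is not None:
        return "YES" if xai_mmlu >= baseline_mmlu else "NO"

    # Nothing measurable: left to subjective judgment.
    return "SUBJECTIVE"


# Made-up numbers, purely for illustration (not real scores).
print(resolve_market(None, None, xai_mmlu=0.75, baseline_mmlu=0.70))
```

Note that "at least as powerful" is modelled as `>=`, so a tie resolves YES in this sketch.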


Anyone objecting to resolving YES? It seems like it has a good MMLU score, and it doesn't look like it'll appear on the LMSYS Chatbot Arena Leaderboard anytime soon.

sold Ṁ85 NO

@JonasVollmer I was the biggest NO holder; this seems fair

predictedNO

Why do you think it still hasn't been added to the arena? It also hasn't received an independent MMLU evaluation side by side with GPT-3.5, and they've already added the latest Mistral model to the arena even though it came out months later. I'm still a large NO holder (disclosure).

predictedYES

@benshindel Yeah IDK, seems weird

predictedYES

Any reasons against resolving this YES, based on all the benchmarks?

predictedNO

@JonasVollmer It's still not on Chatbot Arena, which is the preferred benchmark. Shouldn't you wait until the market close in case it gets uploaded there? Chatbot Arena scores can be quite different from other benchmarks. I was betting on that.

predictedYES

@Shump Ok, will hold off on resolving YES based on this!

predictedNO

It's apparently more capable by MMLU/GSM8k/MATH/HumanEval, although those are not directly related to how much people like it, which is what the Arena score measures.

Oops

GPT-3.5 isn't really state of the art anymore. There are open source models that beat it on most metrics.

What has xAI done, besides be announced?

predictedNO

@dominic I been saying this

@benshindel Building products & doing stuff takes time. Consider: there were about 2½ years between GPT-3 and ChatGPT.

@dominic lol looks like this was not a great take

predictedNO

Another question: Llama 2 seems lower on the leaderboard than GPT-3.5.

Why is the title in disagreement with the description?

predictedYES

@BenjaminShindel updated the description to remove Llama 2 (thought this would be most fair to you given that you're the largest NO holder)

@JonasVollmer People subjectively prefer Llama 2 over GPT-3.5 by far.

Try out https://llmboxing.com/

predictedYES

@firstuserhere it does worse on the benchmarks I linked to. Not sure why

predictedNO

@JonasVollmer Thx! Although tbh it wouldn’t impact my betting that much as I mostly just think it’s <75% likely they’ll have developed any public LLM at all by June

predictedNO

Is there any evidence that xAI has even begun to train LLMs or that they plan on doing so in the next 9 months?

Subjective quality or on benchmarks, or on leaderboards? Or a general qualitative answer?

If the latter, you may wanna frame the market similar to Peter Wildeford's following ones:

predictedYES

@firstuserhere Added: "By default, I will use the Arena Elo rating to decide whether a model meets the bar. If there is no such rating, I will use other benchmarks (e.g., MMLU) or my subjective impression."

hm, i mean given the list of people + dan advising it, most likely a strong yes, and since timelines are pretty quick june is actually a reasonably solid estimate

i think the hardest roadblock would be time for training + finding good enough data. it could also be the case that they don't actually go towards LMs immediately, which seems pretty low probability (although i would be interested in seeing whether they did some autoformalization stuff especially)

predictedNO

@astyerche now realizing this says as powerful and not more powerful oof

© Manifold Markets, Inc.