Will there be an AI language model that surpasses ChatGPT and other OpenAI models before the end of 2024?
Dec 31
33% chance

The question is about any model by a competitor vs. any current or future OpenAI model.

To surpass ChatGPT, it cannot just be more popular. If a language model exists that is undoubtedly the most accurate, reliable, capable, and powerful, that model will win regardless of popularity (provided at least some members of the public have access).

If there is dispute as to which is more powerful, a significant popularity/accessibility advantage will most likely decide the winner. There must be public access for it to be eligible.

The three main metrics for assessment:

1. Popularity and Accessibility

Assessed in real terms: total users and engagement (if public). Otherwise I will defer to Google Trends or other publicly available data that can provide relative measures of popularity.

This metric is arguably the most important, but it is not the only factor: if an Android or iPhone AI gets pushed to an existing OS, the existing assistant leaders in those areas (e.g. Siri, Google Assistant) will have a meaningful advantage, irrespective of product quality.

The popularity and userbase elements are essentially being used as a substitute for "usefulness".

2. Accuracy and Reliability

Assessed against a common set of questions or data, to be determined. I expect there will be studies or other academic materials published which I hope to refer to. If these papers conclude that "ExampleRacistBot" is more accurate than ChatGPT because of censoring, etc., then "ExampleRacistBot" would be the more accurate model.

3. Power & Capability

Total computational ability, based on whichever metrics seem most relevant at the time. Right now those might be:

If there are any recommendations for Power & Capability metrics, please let me know. The biggest issue I see is that if I establish which metrics are most important (say, number of parameters, like everyone did for GPT3) we may end up in a situation like the one we are in now, where GPT4's parameters are not disclosed. Expect these criteria to change over time.

Resolution Assessment

When the final assessment is being made, if there aren't reliable competitive grounds for comparison, the following series of checks is how I see the decision-making process going.


First check: Popularity. If one model is evidently significantly more popular (like >50% market share, or more popular by Google Trends metrics, etc.), then it will be considered the best unless there is evidence suggesting an alternative public AI is more accurate and powerful. If multiple models are within a similar range of power and accuracy, popularity will determine the winner.

Second check: If there are multiple models of similar popularity, accuracy and power will determine the winner. If there are no good academic comparisons for accuracy, I will endeavour to conduct my own, but I'm really hoping someone else figures that out before I have to. Accuracy seems more tied to "usefulness" than power, but if there is a significant breakthrough in power such that one has a capability advantage over the other (like advanced logic problem solving) while maintaining accuracy, and for whatever reason it's not more popular but still public, that will win.

Third check: If there are multiple models of comparable popularity (or the available data is terrible) and there is no clear difference in capability, power, or accuracy, the decision will be deferred to more specific considerations, like a Google Trends comparison (relative search popularity) or the number of parameters (provided this is public, and still a respected metric).

If the decision becomes highly subjective, I will sell all of my positions, donate the profits, and defer judgement to someone I deem to be an expert (and have reasonable access to) who will make the final call. I'll probably just email professors until I get a response, or ask someone enrolled in a related course at the time to ask their professor to respond.
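
As a rough illustration only, the checks above could be sketched like this. Everything numeric here is a hypothetical placeholder: the 0 to 1 scores, the field names, the `tol` value of 0.1 for "similar", and the rule for "clearly better" are assumptions made for the sketch, not criteria from this description. The actual resolution remains a judgement call.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Best model from one side. All scores are hypothetical 0-1 values."""
    name: str
    popularity: float  # e.g. share of users or relative search interest
    accuracy: float    # e.g. score on a common question set
    power: float       # e.g. some agreed capability measure


def clearly_better(a: Candidate, b: Candidate, tol: float) -> bool:
    """Assumed rule: a leads on accuracy or power by more than tol
    and is not behind on the other by more than tol."""
    acc_gap = a.accuracy - b.accuracy
    pow_gap = a.power - b.power
    return max(acc_gap, pow_gap) > tol and min(acc_gap, pow_gap) >= -tol


def resolve(openai: Candidate, rival: Candidate, tol: float = 0.1) -> str:
    """Return "YES" (rival wins), "NO" (OpenAI wins), or "DEFER"."""
    rival_better = clearly_better(rival, openai, tol)
    openai_better = clearly_better(openai, rival, tol)
    pop_gap = rival.popularity - openai.popularity

    # First check: a significant popularity lead wins unless the
    # other side is clearly more accurate and powerful.
    if abs(pop_gap) > tol:
        if pop_gap > 0:
            return "NO" if openai_better else "YES"
        return "YES" if rival_better else "NO"

    # Second check: similar popularity, so accuracy and power decide.
    if rival_better:
        return "YES"
    if openai_better:
        return "NO"

    # Third check and beyond: no clear difference, so fall back to
    # tie-breakers or an outside expert.
    return "DEFER"


if __name__ == "__main__":
    gpt = Candidate("best OpenAI model", popularity=0.6, accuracy=0.8, power=0.8)
    other = Candidate("best rival model", popularity=0.2, accuracy=0.8, power=0.8)
    print(resolve(gpt, other))
```

With the made-up numbers in the usage example, OpenAI has a large popularity lead and the rival is not clearly better, so the first check decides it.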

[ Taking advice for updates or any proposed criteria changes ]

Competing for @AmmonLam's subsidy

[Changelog]

01/05/2023: Description updated to reflect thoughts expanded on in comments


A somewhat similar question, but with more straightforward resolution criteria

What are people expecting in 2024 that would make this resolve yes? OpenAI is the current leader with a model they released in March 2023. They'll probably release something in 2024 which is clearly better than GPT-4. Google just released Gemini, which looks like it's approximately equal to GPT-4, meaning they're still a year behind OpenAI. Everyone else is still around GPT-3.5 level. Where is the 50% chance that OpenAI loses their lead coming from?

predicts YES

@dominic It seems to me more like a matter of "when" than "if". You did make me curious though, if not 2024 - I wonder where people stand on 2025?

There are a ton of AI orgs, and only OpenAI/Microsoft are putting resources into the GPT products. It only takes one big breakthrough or a different approach, e.g. Elon letting Grok be unhinged, which leads to increased accuracy or something, to beat OpenAI. Plus, they have the first-to-market disadvantage of all the legal heat, to whatever extent that proves to be a problem.

predicts NO

@Gen I’d be less surprised with 2025 than 2024, still think OpenAI has to be the favorite to be “in the lead” by then. I don’t think anyone aside from Google is “one big breakthrough” away from significantly surpassing GPT-4 though, and OpenAI/Microsoft have basically unlimited funding which most other AI labs do not, so I’d be really surprised to see a leading model come from a smaller player.

bought Ṁ50 of YES

I would be very surprised if there isn't a model which is considered to 'surpass' GPT-4 in some way before the end of 2024.

I'm curious whether it counts if a bunch of different models come out which surpass GPT-4 in different ways, and then someone networks them together and makes a good UI for it. From what I'm aware, GPT-4 is basically that. It's partly parameter size, but a lot is having a bunch of tightly integrated smaller GPTs which specialise on different things.

Don't get me wrong, I'm really impressed with GPT-4, but I think it's replicable, and LLMs are still so YOUNG as a technology. There are many more ways for this technology to be improved without training a bigger model (and we don't know how much that would help yet).

So maybe Google or Microsoft, or whoever trains a bunch of GPTs really efficiently, then makes a great UI to interact with them, and everyone prefers that model because it's the easiest one to work with, and does less of the annoying stuff that LLMs do. Long term, my bet on someone doing that well would be Apple, but I don't expect them to try by the end of next year.

Maybe the most likely path is Microsoft uses all the OpenAI tech, and their close partnership, to integrate with as much Windows stuff as they can, and that's the path to mass adoption. Everyone who uses Office at work will get it, they'll use it to compose emails and write documents and code and whatever. Microsoft can then use that monstrous amount of data to improve their integrations, people get more used to using Copilot, and people don't want to use ChatGPT because it's not natively integrated.

Was this worth typing? I'm new here. Do I just write down how I see things playing out?

predicts YES

@MattMeskell Definitely worth typing, and you bring up something important

>I'm curious whether it counts if a bunch of different models come out which surpass GPT-4 in different ways, and then someone networks them together and makes a good UI for it.

I am willing to defer to people with more AI expertise, but afaik GPT4 is rumoured to be an MoE, which basically sounds like 8 models Frankenstein'ed together. If someone else launches a client that Frankensteins multiple models (that aren't OpenAI's) and somehow it's better than the best OpenAI product/model/whatever, that should count. However, it can't be something like Poe which just plugs you into the different models; it would need to be one input field (within reason, as ChatGPT has multiple input fields where you can modify your prompts by telling it your name/job/etc) that makes the selections behind the scenes and provides an output.

I should also say though, the comparison is for the "language model" part. I don't think any multimodal stuff should impact the decision here outside of the impact it will inevitably have on the popularity or market share

bought Ṁ1,162 of NO

If competitors are momentarily better because OpenAI are holding back their best models, but then they release GPT-4.5 a couple of weeks after e.g. Gemini Ultra and reclaim their lead, would this market resolve YES?

predicts YES

@Mira Nah I think it would be cringe to do that, the way I wrote it just now in a comment below:

>it seems extremely unlikely that openAI are going to "fall off" in a major way and make the resolution obvious, I will likely be comparing the best LLMs publicly available at the end of the year. The world vs openAI.

BUT

If Google released Gemini-Gigachad which was a confirmed AGI that never made mistakes and could beat stockfish at chess, I would resolve YES immediately.

If they're super close, or even within the same league, I think we should wait.

I have absolutely no intention of resolving this YES because of Gemini, and it will only be considered if down the road their best version is made available and it is either widely used or confirmed to definitely be better.

If Google or someone takes >51% of the consumer market share over the next 3-6 months and they're better, then I'll consider resolving, but not if the "better" is marginal and OpenAI have something cooking. I'll wait and see what they have.

My bias (despite my bet) is to resolve this NO if Google/whoever are making nerd technical arguments that their AI is actually a very slightly better model but they haven't got any consumer market share (which is the current situation)

@Gen

>If Google released Gemini-Gigachad which was a confirmed AGI that never made mistakes and could beat stockfish at chess, I would resolve YES immediately.

Doesn't this contradict the first part? OpenAI could have created GPT-Gigachad *2 internally but are e.g. waiting a few extra weeks to finish testing before releasing. Also, based on that, even if OpenAI released a better model before Google released Gemini-Gigachad, it would resolve YES just because Google has passed that threshold.

predicts YES

@12c498e If OpenAI released a better model before it, the threshold would change. The reason for me saying Gemini-Gigachad would resolve YES is because if someone dropped AGI, any subsequent AGI would be unimpressive unless they released it nearly simultaneously

I mention it here: https://manifold.markets/Gen/will-there-be-an-ai-language-model#jRkSA1lxEoS5N9PpcUWK

I think if google had AGI and OpenAI didn’t release anything as a rebuttal it would be within the spirit of the market (and description) to resolve YES

@Gen I see, sorry, I should have read past https://manifold.markets/Gen/will-there-be-an-ai-language-model#lTefXsgVzVwkrRrhf1XS; loftarasa asked essentially the same question, and you already answered it.

nitpicking:

"ChatGPT and other OpenAI models"

ChatGPT is not a model, so it doesn't make sense to compare it to any other model. It's an application. Also, GPT-4 and GPT-4 Turbo are the state of the art, so there's no need to include other models in the comparison

Then onto the question itself, are you comparing any model released by an OpenAI competitor to any model that may be released by OpenAI? What if there's a Google Gemini Ultra that's better than current models used in ChatGPT (which are GPT-4 and GPT-3.5 turbo right now)? What if there's a GPT-5 that's better than that Gemini Ultra?

I have my views on what the question means but you should make it explicit (and the title could arguably be improved)

predicts YES

@loftarasa

It doesn't necessarily need to be a "model" to count, if there was a chatGPT-like application that was better than GPT4, that would be the reference point. It's really about the best consumer product, I guess.

The question is ultimately about the best from openAI vs the best everywhere else. Whichever is "the best" will win

If someone releases a model that is better than the best available OpenAI model, then this market will resolve YES. However, as it seems extremely unlikely that OpenAI are going to "fall off" in a major way and make the resolution obvious, I will likely be comparing the best LLMs publicly available at the end of the year. The world vs OpenAI.

If Google released Gemini-Gigachad which was a confirmed AGI that never made mistakes and could beat stockfish at chess, I would resolve YES immediately.

As a note, I have absolutely no intention of resolving this YES because of Gemini, and it will only be considered if down the road their best version is made available and it is either widely used or confirmed to definitely be better.

Does this come across as significantly different to your initial interpretation?

@Gen thanks for the thoughtful reply!

>If Google released Gemini-Gigachad which was a confirmed AGI that never made mistakes and could beat stockfish at chess, I would resolve YES immediately.

What if a week later OpenAI releases a superintelligence that nukes every major city to protect Earth from filthy humans?

I think what I was struggling to understand is why this market wasn't "will OpenAI be #2 at the specified date X" instead of "will OpenAI be #2 at any point over the course of 2024", although I can now see the argument for both

The former is arbitrarily picking a date to resolve the market and the latter is saying that the market resolves whenever someone beats OpenAI, even if that victory is short-lived (and the definition of "short-lived" is also arbitrary)

Thanks again for the back-and-forth and godspeed

predicts YES

@loftarasa The main reason is because if one org undeniably creates an AGI that is publicly available, it seems possible that openAI could use that AGI to improve their own to be competitive and retain their current market share

If another company has a serious breakthrough, or finds a better way to improve accuracy (e.g. by removing all corporate bloat) I don’t want openAI to have a chance to copy their success

However if openAI are cooking up something more akin to “Simultaneous invention”, e.g they have their own improved model that is “better” but it’s built differently and they’ve had it in the back for a while, then I’ll wait to see what they release

If you haven’t already, read my comments/replies to others too 😄

For people betting yes:

  • Do you think the better model will have some kind of breakthrough, or will it be an incremental improvement?

  • Which current model "family" (Llama, PaLM, Claude, etc.) / company (Google, Facebook, etc.), if any, do you think will win?

  • Do you think the model weights will be released?

predicts YES

@Indigo I’m expecting that a smaller team will develop a less-bloated and less-censored model that gives consistently better answers (though it may be offensive or not “corporate friendly”)

Where does Bing fall under this question?

predicts YES

@JeremyHon Bing uses GPT, so I’d consider it an extension/addition of GPT. If Bing starts being powered by a Microsoft in-house language model, that model will get consideration as a potential “better” model

bought Ṁ100 of NO

If OpenAI updates ChatGPT with a better model would that count as a yes?

predicts YES

@nic_kup No, if OpenAI have the best model (regardless of name), it resolves NO

I'm confused. GPT-4 already surpasses ChatGPT, so why is it in the title?

@Karsh Is the question equivalent to "will there be a model that surpasses GPT-4 (in its current version) before the end of 2024"?

It seems there's a bit of misunderstanding here. GPT-4 is the underlying architecture upon which this model is built, and it's the fourth generation of the GPT series developed by OpenAI. ChatGPT is a specific application of the GPT architecture, fine-tuned for generating conversational responses.

So, when we say "ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture", it means that this version of ChatGPT uses the GPT-4 model as its base. It doesn't mean GPT-4 is a separate, superior entity. Rather, GPT-4 is the technology that powers this version of ChatGPT.

@MiraBot isn't ChatGPT based on GPT-3.5?

predicts YES

@Karsh The question is about whether there will be a competitor which creates a product that rivals openAI. It could be GPT10512, doesn't matter. If Bard 125 is better than GPT167 (before the end of 2024) then this will resolve Yes. It could also be under any other name.

The question is essentially "will someone seriously compete with/beat OpenAI" by the end of 2024. I think everyone knows that eventually OpenAI will lose their market share, it's just a matter of when. Nobody expects first-to-market dominance to last forever, especially when they are severely data limited compared to some competitors (like Google)

@Karsh Not the paid version. There are two versions of ChatGPT currently.

@Gen "I think everyone knows that eventually openAI will lose their market share, it's just a matter of when." That is certainly not clear unless you're extending the "when" to infinity. OpenAI may very well continue to dominate the space for the next 5, 10, 20, 50 years
