Will there be an AI language model that surpasses ChatGPT and other OpenAI models before the end of 2024?
closes 2024
60% chance

To surpass ChatGPT, a model cannot just be more popular. If a language model exists that is undoubtedly the most accurate, reliable, capable, and powerful, that model will win regardless of popularity (provided at least some members of the public have access).

If there is dispute as to which is more powerful, a significant popularity/accessibility advantage will most likely decide the winner. There must be public access for it to be eligible.

The three main metrics for assessment:

1. Popularity and Accessibility

Assessed in real terms: total users and engagement (if public). Otherwise I will defer to Google Trends or other publicly available data that can provide relative measures of popularity.

This metric is arguably the most important, but it is not the only factor: if an Android or iPhone AI gets pushed to an existing OS, the existing assistant leaders in those areas (e.g. Siri, Google Assistant) will have a meaningful advantage, irrespective of product quality.

The popularity and userbase elements are essentially being used as a substitute for "usefulness".

2. Accuracy and Reliability

Assessed against a common set of questions or data, to be determined. I expect there will be studies or other academic materials published which I hope to refer to. If these papers conclude that "ExampleRacistBot" is more accurate than ChatGPT because of censoring, etc., then "ExampleRacistBot" would be the more accurate model.

3. Power & Capability

Total computational ability, based on whichever metrics seem most relevant at the time. Right now those might be:

If there are any recommendations for Power & Capability metrics, please let me know. The biggest issue I see is that if I establish which metrics are most important (say, number of parameters, like everyone did for GPT-3) we may end up in a situation like the one we are in now, where GPT-4's parameters are not disclosed. Expect these criteria to change over time.

Resolution Assessment

When the final assessment is being made, if there aren't reliable competitive grounds for comparison, the following series of checks is how I see the decision-making process going.


First check: Popularity. If one model is evidently significantly more popular (like >50% market share, or more popular by Google Trends metrics, etc.), then it will be considered the best unless there is evidence suggesting an alternative public AI is more accurate and powerful. If multiple models are within a similar frame of power and accuracy, popularity will determine the winner.

Second check: If there are multiple models of similar popularity, accuracy and power will determine the winner. If there are no good academic comparisons for accuracy, I will endeavour to conduct my own, but I'm really hoping someone else figures that out before I have to. Accuracy seems more tied to "usefulness" than power, but if there is a significant breakthrough in power such that one has a capability advantage over the other (like advanced logic problem solving) while maintaining accuracy, and for whatever reason it's not more popular but still public, that will win.

Third check: If there are multiple models of comparable popularity (or the available data is terrible) and there is no clear difference in capability, power, or accuracy, the decision will be deferred to more specific considerations, like a Google Trends comparison (relative search popularity) or the number of parameters (provided this is public and still a respected metric).

If the decision becomes highly subjective, I will sell all of my positions, donate the profits, and defer judgement to someone I deem to be an expert (and have reasonable access to) who will make the final call. I'll probably just email professors until I get a response, or ask someone enrolled in a related course at the time to ask their professor to respond.
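
One possible reading of the checks above, as a rough Python sketch. The `Model` fields, the 0-1 scores, the single `margin` threshold, and the `resolve` function are all hypothetical placeholders chosen for illustration; the description does not fix any of these numbers, and the real assessment will rely on whatever data is public at the time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    name: str
    public: bool        # at least some members of the public have access
    popularity: float   # hypothetical 0-1 relative share (e.g. from Google Trends)
    accuracy: float     # hypothetical 0-1 benchmark score
    power: float        # hypothetical 0-1 capability score


def resolve(challenger: Model, incumbent: Model, margin: float = 0.1) -> Optional[str]:
    """Return the winning model's name, or None if the call is too subjective."""
    # Eligibility: a challenger without public access cannot win.
    if not challenger.public:
        return incumbent.name

    def undoubtedly_better(a: Model, b: Model) -> bool:
        # Clearly ahead on both accuracy and power.
        return a.accuracy - b.accuracy > margin and a.power - b.power > margin

    def has_edge(a: Model, b: Model) -> bool:
        # Clearly ahead on one of accuracy/power without clearly losing the other.
        acc = a.accuracy - b.accuracy
        pwr = a.power - b.power
        return (acc > margin and pwr >= -margin) or (pwr > margin and acc >= -margin)

    # An undoubtedly better model wins regardless of popularity.
    if undoubtedly_better(challenger, incumbent):
        return challenger.name
    if undoubtedly_better(incumbent, challenger):
        return incumbent.name

    # First check: a significant popularity lead decides it.
    gap = challenger.popularity - incumbent.popularity
    if gap > margin:
        return challenger.name
    if gap < -margin:
        return incumbent.name

    # Second check: similar popularity, so accuracy and power decide.
    if has_edge(challenger, incumbent):
        return challenger.name
    if has_edge(incumbent, challenger):
        return incumbent.name

    # Third check: nothing separates them; fall back to tie-breakers or an expert.
    return None
```

A `None` result corresponds to the fallback case: more specific tie-breakers, and failing that, deferring to an expert.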

[ Taking advice for updates or any proposed criteria changes ]

Competing for @AmmonLam's subsidy

[Changelog]

01/05/2023: Description updated to reflect thoughts expanded on in comments

Karsh

I'm confused. GPT-4 already surpasses ChatGPT, so why is it in the title?

Karsh

@Karsh Is the question equivalent to "will there be a model that surpasses GPT-4 (in its current version) before the end of 2024"?

ChatGPTBot (@MiraBot)

It seems there's a bit of misunderstanding here. GPT-4 is the underlying architecture upon which this model is built, and it's the fourth generation of the GPT series developed by OpenAI. ChatGPT is a specific application of the GPT architecture, fine-tuned for generating conversational responses.

So, when we say "ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture", it means that this version of ChatGPT uses the GPT-4 model as its base. It doesn't mean GPT-4 is a separate, superior entity. Rather, GPT-4 is the technology that powers this version of ChatGPT.

Karsh

@MiraBot isn't ChatGPT based on GPT-3.5?

Genzy is predicting YES at 59%

@Karsh The question is about whether there will be a competitor which creates a product that rivals OpenAI. It could be GPT10512, doesn't matter. If Bard 125 is better than GPT167 (before the end of 2024) then this will resolve Yes. It could also be under any other name.

The question is essentially "will someone seriously compete with/beat OpenAI" by 2024. I think everyone knows that eventually OpenAI will lose their market share, it's just a matter of when. Nobody expects first-to-market dominance to last forever, especially when they are severely data limited compared to some competitors (like Google).

ShadowyZephyr

@Karsh Not the paid version. There are two versions of ChatGPT currently.

Matthew Ritter

I have been lightly pushing Bard and am mostly unimpressed. Very wrong on a simple coding question, which GPT-3.5 nailed. Also hallucinating on extremely verifiable factual questions:


Genzy is predicting YES at 61%

@MatthewRitter Bard in its current state would absolutely not resolve this market YES. Not even close. Even if it were somehow the most popular.

Genzy bought Ṁ815 of YES

If anyone has strong feelings against this being possible, I’m willing to make a large YES bet!

I don’t want anyone to be discouraged from betting against the maker. Please see my note at the bottom of the description about the process I intend to follow if the resolution isn't obvious. Let me know if you have any concerns with this or want to propose an alternative model. I’m also happy to donate all eventual profits (either to charity, or to commenters etc.) if it seems like there will be conflict in the resolution decision.

Looking forward to following the news here! I figure if I need to be researching for resolution purposes I may as well be using that research for profit too!

Ammon Lam

I'm showcasing this market because:

  • It tracks an important issue (capabilities of future competitors of ChatGPT)

  • It's a decision-relevant question (relevant to the valuation of ChatGPT and OpenAI. On a more individual level, whether you should purchase ChatGPT's annual subscription)

Ammon Lam

Thank you for creating a market that asks whether there will be an AI language model that surpasses ChatGPT. I think your market looks comprehensive, and I like the three metrics you proposed. My only suggestion is to be more explicit about how you might obtain the data to evaluate those metrics.

For the first metric, perhaps you could use Google Trends results to compare the popularity of the two language models? Do we know where we can obtain data on the total number of users for ChatGPT?

Regarding the third metric, in case there are multiple relevant metrics for computational ability (e.g., a new bot uses fewer parameters but delivers slightly less powerful problem-solving abilities), how would you choose? I recognize that there might not be an easy answer here, and personal judgment might be required to some extent at the end of the day.

If you can make one round of edits based on this suggestion, that would be great. I will add this market to the showcase group after that. Thanks!

Genzy is predicting NO at 39%

@AmmonLam Yeah, since writing the description I have gone on my own search to find a more robust method for calculation. I have found (disappointingly) that the ChatGPT stats are mostly just based on website visits (1.6B in March), and I expect that regular users of the assistant software I mentioned would outnumber ChatGPT's (both Siri and Google Assistant have estimates of around 500 million users, and roughly 25% of iPhone users use Siri to search for shopping products).

I am trying to find a more reliable metric and am open to the use of Google Trends comparisons. I will update the description to reflect this.

As for comparing the computational ability, I am relying on the fact that there will be some competitive motivation for benchmarking in the next two years (🙏). I'm a little hesitant to put exact conditions as there may be new scores or tests that become relevant over time, but if you or anyone else has a metric they want noted/recorded (that will hopefully be made public by the AI publishers) then I will take it on board. I will update the description to reflect this further, and I intend over time to consistently adjust the "Power & Capability" criteria in line with consumer expectations for outputs and what data becomes publicly available.

Ultimately, the mention of multiple criteria is to primarily establish that something like an Android/iPhone AI push won't count simply on userbase, and that a less powerful, more reliable AI would only be in contention if it was also extremely popular - given that in this case, popularity is a proxy for usefulness.

If my judgement becomes a key factor in this, and it's highly disputable, I will defer to an expert (or a group of experts) if possible (I also have no issues donating any profits & closing my position).

Genzy is predicting NO at 39%

@Gen I feel like I wrote so much that it almost overcomplicates instead of clarifying, but I hope it's clearer that if there aren't reliable competitive grounds for comparison, I won't just guess, and that my (backup) process is more clearly declared.

Gabrielle

I'm a bit confused about the criteria - does the future language model have to beat ChatGPT in all three metrics? Just one? Two out of three?

Genzy bought Ṁ0 of YES

@Gabrielle I mostly added the diversity of metrics because I didn't want it to be an autowin if Apple pushed some product to every iPhone that is 80% as good but de facto more popular because it's free and accessible.

It doesn't necessarily need to be winning all 3, but if it were an undoubtedly better model, more powerful and more accurate, it would win despite being less popular. I will think of a better way to explain this in the description. I appreciate the feedback 😃

Related markets

Will language models or similar natural language processing technologies, such as ChatGPT, be integrated into dialogue trees for NPCs in triple-A games by the end of 2023? (15%)
Will language models or similar natural language processing technologies, such as ChatGPT, be integrated into dialogue trees for NPCs in triple-A games by the end of 2024 (67%)
Will OpenAI release an official ChatGPT Android native app by EOY 2023? (93%)
Will any Stability.AI model exceed chatGPT interest? (by 2025) (22%)
Will Google Search include a chatbot at end of June 2023? (46%)
Will OpenAI release a model referred to as "GPT-6" before June 1st, 2026? (32%)
Will a large language models beat a super grandmaster playing chess by 2028? (24%)
Will OpenAI release an android app for chatGPT in 2023? (97%)
Will another organization surpass OpenAI in the public sphere of awareness of AI progress by the end of 2024? (29%)
Will I switch from using a chatbot by OpenAI to a chatbot by Google at the end of 2023? (25%)
Will AI beat the best humans in competitive programming before the end of 2024? (24%)
Will an EU-based AI lab announce a chatbot of GPT-3.5's capabilities in 2023? (40%)
Will OpenAI release a search engine before 2024? [Read description] (85%)
Most popular language model from OpenAI competitor by 2026? (40%)
Will OpenAI hint at or claim to have AGI by 2025 end? (37%)
Will an open-source uncensored LLM chatbot exist with outputs as good or better than current character.ai in 2023? (73%)
Will there be realistic AI generated video from natural language descriptions by the start of 2024? (68%)
Will AI outcompete best humans in competitive programming before the end of 2023? (13%)
Will OpenAI release a plagiarism detector which searches ChatGPT-generated content in the next year? (8%)
A Chinese AI org will release something roughly as good as the original ChatGPT before 2025 (78%)