To surpass ChatGPT, a model cannot merely be more popular. If a language model exists that is undoubtedly the most accurate, reliable, capable, and powerful, that model will win regardless of popularity (provided at least some members of the public have access).
If there is dispute as to which is more powerful, a significant popularity/accessibility advantage will most likely decide the winner. A model must be publicly accessible to be eligible.
The three main metrics for assessment:
1. Popularity and Accessibility
Assessed in real terms: total users and engagement (if public); otherwise I will defer to Google Trends or other publicly available data that can provide relative measures of popularity.
This metric is arguably the most important, but it is not the only factor: if an Android or iPhone AI gets pushed to the existing OS, the existing assistant leaders in those areas (e.g. Siri, Google Assistant) will have a meaningful advantage, irrespective of product quality.
The popularity and userbase elements are essentially being used as a substitute for "usefulness".
2. Accuracy and Reliability
Assessed against a common set of questions or data, to be determined. I expect there will be studies or other academic materials published which I hope to refer to. If these papers conclude that "ExampleRacistBot" is more accurate than ChatGPT because of censoring, etc., then "ExampleRacistBot" would be the more accurate model.
3. Power & Capability
Total computational ability, based on whichever metrics seem most relevant at the time. Right now those might be:
- Number of parameters
- Trainability (an example of an ongoing trainability exercise with GPT-4: /Mira/will-a-prompt-that-enables-gpt4-to)
If there are any recommendations for Power & Capability metrics, please let me know. The biggest issue I see is that if I establish which metrics are most important (say, number of parameters, as everyone did for GPT-3), we may end up in a situation like the current one, where the GPT-4 parameter count is not disclosed. Expect these criteria to change over time.
When the final assessment is being made, if there aren't reliable competitive grounds for comparison, the following series of checks is how I see the decision-making process going.
First check: popularity. If one model is evidently and significantly more popular (e.g. >50% market share, or more popular by Google Trends metrics), it will be considered the best unless there is evidence suggesting an alternative public AI is more accurate and powerful. If multiple models are within a similar frame of power and accuracy, popularity will determine the winner.
Second check: if there are multiple models of similar popularity, accuracy and power will determine the winner. If there are no good academic comparisons for accuracy, I will endeavour to conduct my own, but I'm really hoping someone else figures that out before I have to. Accuracy seems more tied to "usefulness" than power, but if there is a significant breakthrough in power such that one model has a capability advantage over the other (like advanced logic problem solving) while maintaining accuracy, it will win even if, for whatever reason, it is less popular (so long as it is still public).
Third check: if there are multiple models of comparable popularity (or only terrible data is available) and there is no clear difference in capability, power, or accuracy, the decision will be deferred to more specific considerations, like a Google Trends comparison (relative search popularity) or the number of parameters (provided this is public and still a respected metric).
If the decision becomes highly subjective, I will sell all of my positions, donate the profits, and defer judgement to someone I deem to be an expert (and have reasonable access to), who will make the final call. I'll probably just email professors until I get a response, or ask someone enrolled in a related course at the time to ask their professor to respond.
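The series of checks above can be sketched as a simple decision procedure. This is purely a hypothetical illustration: the `Model` fields, the scoring, and the 10% margin threshold are my own assumptions, not part of the market's formal criteria.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Model:
    name: str
    public: bool          # must be publicly accessible to be eligible
    market_share: float   # 0.0-1.0, or a Google Trends proxy
    accuracy: float       # benchmark score, higher is better
    power: float          # capability score, higher is better

def pick_winner(a: Model, b: Model, margin: float = 0.1) -> Optional[Model]:
    """Sketch of the three-check resolution process. Returns None when
    the decision would be highly subjective (deferred to an expert)."""
    eligible = [m for m in (a, b) if m.public]
    if len(eligible) == 1:
        return eligible[0]
    if not eligible:
        return None
    # First check: a clear popularity lead decides, unless the less
    # popular model is evidently more accurate AND more powerful.
    lead, trail = sorted(eligible, key=lambda m: m.market_share, reverse=True)
    if lead.market_share - trail.market_share > margin:
        if trail.accuracy > lead.accuracy and trail.power > lead.power:
            return trail
        return lead
    # Second check: similar popularity -> accuracy and power decide.
    if abs(a.accuracy - b.accuracy) > margin or abs(a.power - b.power) > margin:
        return max(eligible, key=lambda m: m.accuracy + m.power)
    # Third check: no clear difference -> fall back to narrower
    # tie-breakers (Trends data, parameter counts); modelled here as
    # deferring to an expert.
    return None
```

The `margin` parameter stands in for the fuzzy "significantly more popular" / "similar frame" judgements in the prose; in practice that call is qualitative, not a fixed number.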
[ Taking advice for updates or any proposed criteria changes ]
Competing for @AmmonLam's subsidy
01/05/2023: Description updated to reflect thoughts expanded on in comments
I'm confused. GPT-4 already surpasses ChatGPT, so why is it in the title?
It seems there's a bit of misunderstanding here. GPT-4 is the underlying architecture upon which this model is built, and it's the fourth generation of the GPT series developed by OpenAI. ChatGPT is a specific application of the GPT architecture, fine-tuned for generating conversational responses.
So, when we say "ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture", it means that this version of ChatGPT uses the GPT-4 model as its base. It doesn't mean GPT-4 is a separate, superior entity. Rather, GPT-4 is the technology that powers this version of ChatGPT.
@Karsh The question is about whether a competitor will create a product that rivals OpenAI's. It could be GPT10512; it doesn't matter. If Bard 125 is better than GPT167 (before the end of 2024), this will resolve YES. It could also be under any other name.
The question is essentially "will someone seriously compete with/beat OpenAI" by 2024. I think everyone knows that eventually OpenAI will lose their market share; it's just a matter of when. Nobody expects the first-to-market dominance to last forever, especially when they are severely data-limited compared to some competitors (like Google).
@Karsh Not the paid version. There are two versions of ChatGPT currently.
I have been lightly testing Bard and am mostly unimpressed. It was very wrong on a simple coding question which GPT-3.5 nailed, and it was also hallucinating on extremely verifiable factual questions:
@MatthewRitter Bard in its current state would absolutely not resolve this market YES. Not even close, even if it were somehow the most popular.
If anyone has strong feelings against this being possible, I’m willing to make a large YES bet!
I don’t want anyone to be discouraged from betting against the maker. Please see my note at the bottom of the description about the process I intend to follow if the resolution is not obvious. Let me know if you have any concerns with this or want to propose an alternative model. I’m also happy to donate all eventual profits (either to charity, or to commenters, etc.) if it seems like there will be conflict in the resolution decision.
Looking forward to following the news here! I figure if I need to be researching for resolution purposes, I may as well be using that research for profit too!
I'm showcasing this market because:
It tracks an important issue (capabilities of future competitors of ChatGPT)
It's a decision-relevant question (relevant to the valuation of ChatGPT and OpenAI; on a more individual level, relevant to whether you should purchase ChatGPT's annual subscription)
Thank you for creating a market that asks whether there will be an AI language model that surpasses ChatGPT. I think your market looks comprehensive, and I like the three metrics you proposed. My only suggestion is to make it more explicit on how you might obtain the data to evaluate those metrics.
For the first metric, perhaps you could use Google Trends results to compare the popularity of the two language models? Do we know where we can obtain data on the total number of users for ChatGPT?
Regarding the third metric, in case there are multiple relevant metrics for computational ability (e.g., a new bot uses fewer parameters but delivers slightly less powerful problem-solving abilities), how would you choose? I recognize that there might not be an easy answer here, and personal judgment might be required to some extent at the end of the day.
If you can make one round of edits based on these suggestions, that would be great. I will add this market to the showcase group after that. Thanks!
@AmmonLam Yeah, since writing the description I have gone on my own search for a more robust method of calculation. I have found (disappointingly) that the ChatGPT stats are mostly just based on website visits (1.6B in March), and I expect that regular users of the assistant software I mentioned would outnumber GPT's (both Siri and Google Assistant have estimates of around 500 million users, and roughly 25% of iPhone users use Siri to search for shopping products).
I am trying to find a more reliable metric and am open to the use of Google Trends comparisons. I will update the description to reflect this.
As for comparing computational ability, I am relying on there being some competitive motivation for benchmarking in the next two years (🙏). I'm a little hesitant to set exact conditions, as there may be new scores or tests that become relevant over time, but if you or anyone else has a metric they want noted/recorded (that will hopefully be made public by the AI publishers), I will take it on board. I will update the description to reflect this further, and I intend to consistently adjust the "Power & Capability" criteria over time in line with consumer expectations for outputs and what data becomes publicly available.
Ultimately, the mention of multiple criteria is primarily to establish that something like an Android/iPhone AI push won't count simply on userbase, and that a less powerful, more reliable AI would only be in contention if it were also extremely popular, given that in this case popularity is a proxy for usefulness.
If my judgement becomes a key factor and the call is highly disputable, I will defer to an expert (or a group of experts) if possible (I also have no issue donating any profits and closing my position).
I'm a bit confused about the criteria - does the future language model have to beat ChatGPT in all three metrics? Just one? Two out of three?
@Gabrielle I mostly added the diversity of metrics because I didn't want it to be an automatic win if Apple pushed some product to every iPhone that is 80% as good but de facto more popular because it's free and accessible.
It doesn't necessarily need to win all three, but if a model were undoubtedly better, more powerful, and more accurate, it would win despite being less popular. I will think of a better way to explain this in the description. I appreciate the feedback 😃