Will there be an AI language model that strongly surpasses ChatGPT and other OpenAI models before the end of 2024?

The question compares any model from a competitor against any current or future OpenAI model.

To surpass ChatGPT, a model must do more than be more popular. If a language model exists that is undoubtedly the most accurate, reliable, capable, and powerful, it will win regardless of popularity (provided at least some members of the public have access to it).

If it is disputed which model is more powerful, a significant popularity/accessibility advantage will decide the winner. A model must be publicly accessible to be eligible.

The three main metrics for assessment are explained below:

1. Popularity and Accessibility

Assessed in real terms: total users and engagement (if public); otherwise I will defer to Google Trends or other publicly available data that can provide relative measures of popularity.

This metric is arguably the most important, but it is not the only factor: if an Android or iPhone AI is pushed to an existing OS, the incumbent assistants on those platforms (e.g. Siri, Google Assistant) will have a meaningful advantage irrespective of product quality.

The popularity and user-base elements are essentially being used as a proxy for "usefulness".

2. Accuracy and Reliability

Assessed against a common set of questions or data, to be determined. I expect studies or other academic material to be published that I can refer to. If these papers conclude that "ExampleRacistBot" is more accurate than ChatGPT because of censoring, etc., then "ExampleRacistBot" would count as the more accurate model.

3. Power & Capability

Total computational ability, based on whichever metrics seem most relevant at the time. Right now those might be:

Number of Parameters
Tokens/words processed
Problem-solving abilities
Trainability (an example of an ongoing trainability exercise with GPT4: /Mira/will-a-prompt-that-enables-gpt4-to

If you have any recommendations for Power & Capability metrics, please let me know. The biggest issue I see is that if I establish which metrics are most important (say, number of parameters, as everyone did for GPT-3), we may end up in the situation we are in now, where GPT-4's parameter count is not disclosed. Expect these criteria to change over time.

Resolution Assessment

When the final assessment is made, if there are no reliable competitive grounds for comparison, I expect the decision-making process to follow the series of checks below.

First check: Popularity. If one model is evidently and significantly more popular (e.g. >50% market share, or more popular by Google Trends metrics), it will be considered the best unless there is evidence that an alternative public AI is more accurate and powerful. If multiple models are of similar power and accuracy, popularity will determine the winner.

Second check: If multiple models are of similar popularity, accuracy and power will determine the winner. If there are no good academic comparisons of accuracy, I will endeavour to conduct my own, but I'm really hoping someone else figures that out before I have to. Accuracy seems more closely tied to "usefulness" than power is, but if a significant breakthrough in power gives one model a capability advantage over the other (such as advanced logical problem solving) while maintaining accuracy, that model will win, even if for whatever reason it is not more popular, as long as it is public.

Third check: If multiple models are of comparable popularity (or the available data is poor) and there is no clear difference in capability, power, or accuracy, the decision will be deferred to more specific considerations, such as a Google Trends comparison (relative search popularity) or the number of parameters (provided this is public and still a respected metric).

If the decision becomes highly subjective, I will defer judgement to someone I deem an expert (and have reasonable access to), who will make the final call. I'll probably just email professors until I get a response, or ask someone enrolled in a related course at the time to ask their professor to respond.
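The checks above can be read as a decision procedure. The Python sketch below encodes one possible reading of it, purely for illustration: the numeric scores, the 5% "similarity" tolerance, and the names `Model`, `similar`, `clearly_better`, and `resolve` are hypothetical assumptions, not part of the resolution criteria. Parameter counts are left out because they may not be disclosed, and a return value of `None` stands for "defer to an expert".

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical tolerance: scores within 5% of each other count as "similar".
SIMILAR_TOL = 0.05


@dataclass
class Model:
    name: str
    public: bool         # public access is required for eligibility
    market_share: float  # fraction of users, 0.0-1.0 (hypothetical input)
    accuracy: float      # score on a shared question set, higher is better
    power: float         # capability score, higher is better
    trends: float        # relative Google Trends interest


def similar(a: float, b: float, tol: float = SIMILAR_TOL) -> bool:
    """True if a and b differ by at most tol, relative to the larger value."""
    return abs(a - b) <= tol * max(abs(a), abs(b), 1e-9)


def clearly_better(a: Model, b: Model) -> bool:
    """True if a beats b on both accuracy and power by more than the tolerance."""
    return (a.accuracy > b.accuracy and not similar(a.accuracy, b.accuracy)
            and a.power > b.power and not similar(a.power, b.power))


def resolve(models: List[Model]) -> Optional[str]:
    """Return the winning model's name, or None to defer to an expert."""
    eligible = [m for m in models if m.public]
    if not eligible:
        return None
    if len(eligible) == 1:
        return eligible[0].name

    # Overriding rule: a model clearly better than every rival wins regardless of popularity.
    for m in eligible:
        if all(clearly_better(m, o) for o in eligible if o is not m):
            return m.name

    # First check: a >50% market-share leader wins unless a rival is clearly better.
    leader = max(eligible, key=lambda m: m.market_share)
    if leader.market_share > 0.5 and not any(
            clearly_better(o, leader) for o in eligible if o is not leader):
        return leader.name

    # Second check: accuracy decides; if accuracy is similar, power decides.
    for score in (lambda m: m.accuracy, lambda m: m.power):
        ranked = sorted(eligible, key=score, reverse=True)
        if not similar(score(ranked[0]), score(ranked[1])):
            return ranked[0].name

    # Third check: fall back to relative search popularity.
    ranked = sorted(eligible, key=lambda m: m.trends, reverse=True)
    if not similar(ranked[0].trends, ranked[1].trends):
        return ranked[0].name

    # Highly subjective: defer the final call to an expert.
    return None
```

For example, `resolve([Model("A", True, 0.6, 0.80, 0.80, 70), Model("B", True, 0.3, 0.81, 0.79, 30)])` returns `"A"`: B's small accuracy edge falls within the tolerance, so A's >50% share decides. One simplification here is that a clear accuracy lead in the second check wins outright, with power only breaking accuracy ties.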

[ Taking advice for updates or any proposed criteria changes ]

Competing for @AmmonLam's subsidy

[Changelog]

01/05/2023: Description updated to reflect thoughts expanded on in comments
