Will an EU-based AI lab announce a chatbot of GPT-3.5's capabilities in 2023?

1kṀ5287

resolved Dec 13

Resolved

YES

Will resolve positively if a company registered in an EU country announces a self-developed LLM-based chatbot of roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.

Europe

Language Models

🏅 Top traders

#	Name	Total profit
1		Ṁ147
2		Ṁ145
3		Ṁ128
4		Ṁ125
5		Ṁ122

predictedYES

Haven't done much research, might be convinced otherwise. Looks like Mistral's new model is better than GPT-3.5 https://community.openai.com/t/mistral-medium-versus-gpt-3-5-turbo/556898

predictedNO

@konstan I would unresolve and wait for clear results, e.g. until it appears here: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

predictedNO

This has some more benchmark data: https://huggingface.co/blog/mixtral

predictedNO

Looks like it resolves YES based on MMLU and MT-Bench, so never mind my earlier comment!

Does London based Stability count as EU?

predictedYES

@CromlynGames no. London is not part of the EU anymore, unfortunately.

“roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.”

So which is it? These bots all have different capabilities. GPT has the best logic/reasoning, Claude is best at creative tasks, and Bard is significantly worse than the other two.

And if the model is private, made before 2023, but gets released after, how does it resolve? Luminous-world is a potential candidate for this

predictedNO

@ShadowyZephyr Either counts

predictedNO

@ShadowyZephyr Known as of Dec 31

@konstan And what about the first question? WHAT capabilities, exactly? If you're talking about any benchmark, luminous-supreme should be sufficient to resolve this to YES, because the company behind it published benchmarks showing it matching text-davinci-003 (similar performance to ChatGPT), although I personally find it way less competent than these.

predictedNO

@ShadowyZephyr I think it's going to be a mix of benchmarks and me just reading what others think about the models

predictedNO

@ShadowyZephyr If you're not convinced by luminous-supreme, I likely won't be either.

predictedNO

@Tegwick I mean developing the base model from scratch, just like Google's Bard or OpenAI's GPT-3. Finetuning does not count.

What do you mean by "modified engines with training"?

@KonstantinPilz By the engine I mean the code for the algorithms that train the model and that retrieve and tune requests, as well as context setting, API design, etc., without the actual weights. And I would still find building from scratch hard to define. My best effort to make building from scratch a decidable criterion is having a code base which does not reuse stuff from another model, but even then you may or may not allow use of common libraries. And if it uses fully or partly the same algorithms, would one call it "developed from scratch" because the code is still different?

predictedNO

@Tegwick Fair!
Let's say developed from scratch means they trained it. Fair to use already-existing code (though, afaik, neither Google nor OpenAI nor Anthropic has published it). It's about whether or not they are capable of a large-scale, long training run required to obtain one of those models.

@Tegwick If this includes retraining modified open-source models (right now that mostly means LLaMA-derived ones), I think it's substantially undervalued.

Can you confirm that a research institution doing that would qualify?

Relevant Google leak: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

@JustNo The above comment/question was intended for @Konstantin

predictedNO

@JustNo I'm interested in whether an EU-based lab has the ability to train something like GPT-3.5 from scratch. So no, this doesn't count (and afaik LLaMA-based models aren't at 3.5's capabilities yet)

@konstan Vicuna is quite close. I personally would put it under gpt-3.5-turbo, but there are some questions that it's better at.

Which companies or research facilities might do that? What candidates are there? What does self-developed mean? I guess using an existing model/engine and training it does not count. But what about modified engines with training? There is no specific criterion that I can easily use to decide how self-developed should be defined.

Britain is not in the EU
