Will an EU-based AI lab announce a chatbot of GPT-3.5's capabilities in 2023?
Resolved YES on Dec 13

Will resolve positively if a company registered in an EU country announces a self-developed LLM-based chatbot of roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.

🏅 Top traders

#  Name  Total profit
1        Ṁ147
2        Ṁ145
3        Ṁ128
4        Ṁ125
5        Ṁ122
predicted YES

Haven't done much research, might be convinced otherwise. Looks like Mistral's new model is better than GPT-3.5 https://community.openai.com/t/mistral-medium-versus-gpt-3-5-turbo/556898

predicted NO

@konstan I would unresolve and wait for clear results, e.g. until it appears here: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

predicted NO

This has some more benchmark data: https://huggingface.co/blog/mixtral

predicted NO

Looks like it resolves YES based on MMLU and MT-Bench, so never mind my earlier comment!

Does London-based Stability count as EU?

predicted YES

@CromlynGames no. London is not part of the EU anymore, unfortunately.

“roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.”

So which is it? These bots all have different capabilities. GPT has the best logic/reasoning, Claude is best at creative tasks, and Bard is significantly worse than the other two.

And if the model is private, made before 2023, but gets released after, how does it resolve? Luminous-world is a potential candidate for this

predicted NO

@ShadowyZephyr Either counts

predicted NO

@ShadowyZephyr Known as of Dec 31

@konstan And what about the first question? WHAT capabilities, exactly? If you're talking about any benchmark, luminous-supreme should be sufficient to resolve this to YES, because the company behind it published benchmarks showing it matching text-davinci-003 (similar performance to ChatGPT), although I personally find it way less competent than these.

predicted NO

@ShadowyZephyr I think it's going to be a mix of benchmarks and me just reading what others think about the models

predicted NO

@ShadowyZephyr If you're not convinced by luminous-supreme, I likely won't be either.

predicted NO

@Tegwick I mean developing the base model from scratch, just like Google's Bard or OpenAI's GPT-3. Finetuning does not count.

What do you mean by "modified engines with training"?

@KonstantinPilz By the engine I mean the code for the algorithms that train the model and that retrieve and tune requests, as well as context setting, API design, etc., without the actual weights. And I would still find building from scratch hard to define. My best effort to make building from scratch a decidable criterion is having a code base which does not reuse stuff from another model, but even then you may or may not allow use of common libraries. And if it is fully or in part the same algorithms, would one call it "developed from scratch" because the code is still different?

predicted NO

@Tegwick Fair!
Let's say developed from scratch means they trained it. Fair to use already-existing code (though, afaik, neither Google nor OpenAI nor Anthropic has published it). It's about whether or not they are capable of a large-scale, long training run required to obtain one of those models.

@Tegwick If this includes retraining modified open-source (right now that mostly means LLaMA-derived) models, I think it's substantially undervalued.

Can you confirm that a research institution doing that would qualify?

Relevant Google leak: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

@JustNo above comment/question intended for @Konstantin

predicted NO

@JustNo I'm interested in whether an EU-based lab has the ability to train something like GPT-3.5 from scratch. So no, this doesn't count (and afaik LLaMA-based models aren't at 3.5's capabilities yet)

@konstan Vicuna is quite close. I personally would put it under gpt-3.5-turbo, but there are some questions that it's better at.

Which companies or research facilities might do that? What candidates are there? What does self-developed mean? I guess using an existing model/engine and training it does not count. But what about modified engines with training? There is no specific criterion that I can easily use to decide how self-developed should be defined.

Britain is not in the EU.
