Will an EU-based AI lab announce a chatbot of GPT-3.5's capabilities in 2023?
closes Dec 31

Will resolve positively if a company registered in an EU country announces a self-developed LLM-based chatbot of roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.

ShadowyZephyr

“roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.”

So which is it? These bots all have different capabilities. GPT has the best logic/reasoning, Claude is best at creative tasks, and Bard is significantly worse than the other two.

And if the model is private, made before 2023, but gets released after, how does it resolve? Luminous-world is a potential candidate for this.

5 replies
konstan
Konstantin predicts NO

@ShadowyZephyr Either counts

konstan
Konstantin predicts NO

@ShadowyZephyr Known as of Dec 31

ShadowyZephyr

@konstan And what about the first question? WHAT capabilities, exactly? If you're talking about any benchmark, luminous-supreme should be sufficient to resolve this to YES, because the company behind it published benchmarks showing it matching text-davinci-003 (similar performance to ChatGPT), although I personally find it way less competent than these.

konstan
Konstantin predicts NO

@ShadowyZephyr I think it's going to be a mix of benchmarks and me just reading what others think about the models

konstan
Konstantin predicts NO

@ShadowyZephyr If you're not convinced by luminous-supreme, I likely won't be either.

konstan
Konstantin predicts NO

@Tegwick I mean developing the base model from scratch, just like Google's Bard or OpenAI's GPT-3. Finetuning does not count.

What do you mean by "modified engines with training"?

6 replies
Tegwick

@KonstantinPilz By the engine I mean the code for the training algorithms and for retrieving and tuning requests, as well as context setting, API design, etc., without the actual weights. And I would still find building from scratch hard to define. My best effort to make building from scratch a decidable criterion is having a code base which does not reuse stuff from another model, but even then you may or may not allow the use of common libraries. And if the algorithms are fully or partly the same, would one call it "developed from scratch" because the code is still different?

konstan
Konstantin predicts NO

@Tegwick Fair!
Let's say developed from scratch means they trained it. Fair to use already-existing code (though, afaik, neither Google nor OpenAI nor Anthropic has published it). It's about whether or not they are capable of a large-scale, long training run required to obtain one of those models.

JustNo

@Tegwick If this includes retraining modified open-source models (right now that mostly means LLaMA-derived ones), I think it's substantially undervalued.

Can you confirm that a research institution doing that would qualify?

Relevant Google leak: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

JustNo

@JustNo The above comment/question was intended for @Konstantin

konstan
Konstantin predicts NO

@JustNo I'm interested in whether an EU-based lab has the ability to train something like GPT-3.5 from scratch. So no, this doesn't count (and afaik LLaMA-based models aren't at 3.5's capabilities yet)

ShadowyZephyr

@konstan Vicuna is quite close. I personally would put it under gpt-3.5-turbo, but there are some questions that it's better at.

Tegwick

Which companies or research facilities might do that? What candidates are there? What does self-developed mean? I guess using an existing model/engine and training it does not count. But what about modified engines with training? There is no specific criterion that I can easily use to decide how self-developed should be defined.

MarkIngraham
Mark Ingraham bought Ṁ5 of NO

Britain is not in the EU