
Will resolve positively if a company registered in an EU country announces a self-developed LLM-based chatbot with roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.

“roughly the capabilities of ChatGPT, 2022 version (OpenAI), Bard (Google), or Claude (Anthropic) by the end of 2023.”
So which is it? These bots all have different capabilities. GPT has the best logic/reasoning, Claude is best at creative tasks, and Bard is significantly worse than the other two.
And if the model is private, built before 2023, but only released after, how does it resolve? Luminous-world is a potential candidate for this.

@konstan And what about the first question? WHAT capabilities, exactly? If you're talking about any benchmark, luminous-supreme should be sufficient to resolve this to YES, because the company behind it published benchmarks showing it matching text-davinci-003 (similar performance to ChatGPT), although I personally find it far less competent than those models.

@ShadowyZephyr I think it's going to be a mix of benchmarks and me just reading what others think about the models.
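
For illustration, the benchmark half of that could look something like the minimal sketch below. Every task name, reference score, tolerance, and candidate score here is a hypothetical placeholder, not a real result for any model.

```python
# Hypothetical sketch of a benchmark-based resolution check.
# All numbers are placeholders, not real benchmark results.
REFERENCE = {"MMLU": 0.70, "HellaSwag": 0.85, "ARC": 0.85}  # "ChatGPT-class" scores (made up)
TOLERANCE = 0.05  # slack for "roughly the capabilities of"

def roughly_matches(candidate: dict[str, float]) -> bool:
    """True if the candidate is within TOLERANCE of the reference on every task."""
    return all(candidate.get(task, 0.0) >= score - TOLERANCE
               for task, score in REFERENCE.items())

# A hypothetical EU candidate's reported scores:
print(roughly_matches({"MMLU": 0.68, "HellaSwag": 0.82, "ARC": 0.86}))  # True
```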

@ShadowyZephyr If you're not convinced by luminous-supreme, I likely won't be either.

What companies or research facilities might do that? What candidates are there? And what does "self-developed" mean? I guess using an existing model/engine and training it does not count, but what about modified engines with training? There is no specific criterion I can easily use to decide how "self-developed" should be defined.

@Tegwick I mean developing the base model from scratch, just like Google did with Bard or OpenAI with GPT-3. Fine-tuning does not count.
What do you mean by "modified engines with training"?
@KonstantinPilz By "engine" I mean the code without the actual weights: the training algorithms, the logic for retrieving and tuning requests, context setting, API design, and so on. I still find "building from scratch" hard to define. My best attempt at a decidable criterion is a code base that does not reuse code from another model, but even then you may or may not allow the use of common libraries. And if the algorithms are fully or partly the same, would one still call it "developed from scratch" just because the code is different?

@Tegwick Fair!
Let's say "developed from scratch" means they trained it themselves. It's fair to use already-existing code (though, afaik, neither Google nor OpenAI nor Anthropic has published theirs). It's about whether or not they are capable of the large-scale, long training run required to obtain one of those models.
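
To make the "trained it themselves" criterion concrete, here is a minimal sketch of the distinction in Hugging Face transformers terms. This is my framing of the criterion, not an official test; the model names are illustrative only.

```python
# Sketch: "from scratch" vs. fine-tuning, using the Hugging Face
# transformers API. Model names are illustrative only.
from transformers import AutoConfig, AutoModelForCausalLM

# Would count: reusing a published architecture (here GPT-2's config)
# but initializing weights randomly and running the full training run yourself.
config = AutoConfig.from_pretrained("gpt2")
scratch_model = AutoModelForCausalLM.from_config(config)  # random weights

# Would NOT count: starting from someone else's trained weights and fine-tuning them.
base_for_finetuning = AutoModelForCausalLM.from_pretrained("gpt2")  # pretrained weights
```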

@Tegwick If this includes retraining modified open-source models (right now that mostly means LLaMA-derived ones), I think it's substantially undervalued.
Can you confirm that a research institution doing that would qualify?
Relevant Google leak: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

@JustNo I'm interested in whether an EU-based lab has the ability to train something like GPT-3.5 from scratch. So no, this doesn't count (and, afaik, LLaMA-based models aren't at 3.5's capabilities yet).

@konstan Vicuna is quite close. I personally would put it under gpt-3.5-turbo, but there are some questions that it's better at.