e.g. WinoGrande >= 87.5%
GPT4 but 98% cheaper: "Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost." https://arxiv.org/abs/2305.05176
@b9cd It has nothing to do with the question. It's not even an LLM! It's almost like saying that I can access GPT-4 API from my phone.
@qumeric An LLM cascade is still an LLM. An LLM with prompt adaptation is an LLM. An LLM that uses some amount of stored answers from another LLM, or one fine-tuned on the outputs of another LLM, is still an LLM. You can use these tricks to improve the quality of your model while using fewer parameters, right? I'm not saying this is a direct solution to the question, but it definitely seems like related research.
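For what it's worth, the cascade idea from the FrugalGPT paper is simple enough to sketch: query cheap models first and only fall back to an expensive one when a scoring function judges the cheap answer unreliable. Everything below (model names, costs, the scorer) is a hypothetical stand-in, not a real API:

```python
# Illustrative LLM cascade sketch: cheapest model first, escalate only
# when a score function deems the answer unreliable. All generate/score
# functions here are toy stand-ins for real model calls.

def cascade(prompt, models, score, threshold=0.8):
    """models: list of (name, cost_per_call, generate_fn), cheapest first."""
    total_cost = 0.0
    for name, cost, generate in models:
        answer = generate(prompt)
        total_cost += cost
        if score(prompt, answer) >= threshold:
            return answer, name, total_cost
    # Nothing passed the threshold; return the last (strongest) answer.
    return answer, name, total_cost

# Toy stand-ins: a weak cheap model and a strong expensive one.
def cheap_model(prompt):
    return "maybe"

def strong_model(prompt):
    return "42"

def toy_scorer(prompt, answer):
    # A real cascade would use a learned quality/confidence scorer.
    return 0.95 if answer == "42" else 0.3

answer, used, cost = cascade(
    "What is 6 * 7?",
    [("cheap", 0.001, cheap_model), ("strong", 0.06, strong_model)],
    toy_scorer,
)
print(used, answer, round(cost, 3))  # → strong 42 0.061
```

The cost savings come from the (assumed) fact that most prompts are answered acceptably by the cheap tier, so the expensive model is only billed for the hard residue.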
@b9cd I mean it's not a new LLM, it's a technique which uses existing LLMs.
I don't see how the cost savings (which are the most impressive part) are very relevant here; they don't really change anything regarding models running on an RTX 3090.
Things like CoT or even an LLM cascade may be relevant, but I'm not sure. Would this question resolve as yes if we find some way to e.g. augment prompts that makes LLaMA-30B as capable as GPT-4 without prompt augmentation?
I agree it's somewhat related research, but seems like weak evidence to me. Interesting paper nonetheless.
@WieDan GPT-3 and GPT-4 are 2.5 years apart. Also, even gpt-3.5-turbo is 30 times cheaper per token than GPT-4.
@qumeric This progress was made in the space of 5 weeks. We're at the start of a Cambrian explosion here.
@WieDan Alphabet had GPT-3 level models in early 2021, possibly even 2020. Alphabet still has no GPT-4 level model and is probably not going to have one this year, especially prior to the resolution date. Today is Google I/O though, so there is a chance I will be surprised, but it's thin, and even if it does happen, it would still only increase my odds here from 2% to perhaps 5%.
Nobody except OpenAI has GPT-4 level models, closed or open source, whether run on a 3090 or on a huge cluster.
What progress do you mean, really? The main improvement is llama.cpp and similar stuff, which is just a bunch of optimization tricks that have already mostly reached their limits. LLaMA itself is not even that great a model; it's a bit worse than gpt-3.5-turbo IMO.
It is likely that GPT-4 level models in the next few years are simply not going to fit into 24GB even with 4bit quantization. The absolute limit is around 40B parameters, maybe 50B.
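The ~40-50B figure follows from simple arithmetic; here is a back-of-envelope sketch, where the overhead allowance for KV cache, activations, and dequantization buffers is my own rough assumption:

```python
# Rough upper bound on parameter count that fits in a given VRAM budget
# at a given quantization bit-width. The overhead_gb figure (KV cache,
# activations, buffers) is an assumed ballpark, not a measured value.
def max_params_billions(vram_gb, bits_per_param, overhead_gb=4.0):
    usable_bytes = (vram_gb - overhead_gb) * 1024**3
    bytes_per_param = bits_per_param / 8
    return usable_bytes / bytes_per_param / 1e9

# RTX 3090: 24 GB VRAM, 4-bit quantization.
print(round(max_params_billions(24, 4), 1))  # → 42.9
```

With a smaller overhead allowance the bound creeps toward 50B; with fp16 instead of 4-bit it drops to roughly 10B, which is why quantization matters so much for this question.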
@qumeric Regarding LLaMA quality, you're probably right. As a demonstration of that, I also saw this small set of logic-related questions https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit#gid=719051075 where all popular open source LLMs fail compared to ChatGPT.
@cherrvak I assume "run on a 3090" means "run on a consumer PC with a single 3090 and a CPU, primarily using the GPU".