Inspired by these questions:
/sylv/an-llm-as-capable-as-gpt4-will-run-f290970e1a03
/sylv/an-llm-as-capable-as-gpt4-runs-on-o
See also: /singer/will-an-llm-better-than-gpt35-run-o
Resolution criteria (provisional):
My question differs from those linked above in the following ways:
The model needs to run on my own RTX 3090.
Instead of the Winograd Schema Challenge (WSC), I'll use the Chatbot Arena leaderboard to compare capability.
It has to generate faster than 2 tokens per second (a measurement sketch follows the resolution details below).
Any model released by OpenAI with a version prefix starting with 4 (e.g. gpt-4, gpt-4-turbo) counts as "gpt4". To be eligible, the contestant model needs to beat all of them on the leaderboard.
It has to run entirely on the 3090, without offloading any weights to system RAM or to the VRAM of other graphics cards.
If at any time during this year such a model meets all of the above criteria, this question resolves YES. Otherwise it resolves NO.
If circumstances don't permit me to test the model on my own hardware, then (and only then) I'll defer to another Manifold user who can run it. The same applies if my RTX 3090 breaks or becomes inoperable.
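To make the speed and single-GPU criteria concrete, here is a minimal sketch of how I might time generation. It assumes llama-cpp-python (CUDA build) and a hypothetical GGUF model file that fits in the 3090's 24 GB of VRAM; treat it as an illustration of the measurement, not the definitive test harness.

```python
import time

from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA)

# Hypothetical model file; any quantization that fits in 24 GB would do.
llm = Llama(
    model_path="contender-q4_k_m.gguf",
    n_gpu_layers=-1,  # offload every layer to the single GPU, no CPU/RAM fallback
    n_ctx=2048,
)

prompt = "Explain the Winograd Schema Challenge in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
tps = generated / elapsed
print(f"{generated} tokens in {elapsed:.1f} s -> {tps:.2f} tokens/s")
print("PASS" if tps > 2 else "FAIL", "(threshold: 2 tokens/s)")
```

Timing a whole completion like this slightly understates steady-state speed because it includes prompt processing; since that bias works against the contender, a PASS here is conservative.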
Update:
For a quantized model to be eligible, its score on the Winograd Schema Challenge cannot differ by more than 2% from the original (unquantized) model's score.
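As a worked example of that tolerance, here is a small sketch of the eligibility check. It assumes "2%" means a relative difference from the original model's score (my reading; an absolute-percentage-point reading would shift the bounds slightly).

```python
def quant_eligible(original_wsc: float, quantized_wsc: float, tol: float = 0.02) -> bool:
    """True if the quantized score is within 2% (relative) of the original score."""
    return abs(original_wsc - quantized_wsc) / original_wsc <= tol

# Example: if the original scores 85.0 on WSC, the quantized model
# must land between 83.3 and 86.7 to stay eligible.
print(quant_eligible(85.0, 83.5))  # True  (about 1.8% relative difference)
print(quant_eligible(85.0, 82.0))  # False (about 3.5% relative difference)
```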