"Consumer hardware" is defined as costing no more than $3,000 USD for everything that goes inside the case (not including peripherals).
For "GPT-4-equivalent model," I'll go with whatever the popular consensus indicates the top benchmarks (up to three) are for measuring performance. The model's scores should be within 10% of GPT-4's. In the absence of suitable benchmarks, I'll make an educated guess at resolution time after consulting experts on the subject.
All that's necessary is for the model to run inference; it doesn't matter how long output takes to generate, so long as you can type in a prompt and get a reply in less than 24 hours. So if GPT-4's weights are released, someone shrinks that model down to run on consumer hardware and gets any output at all in less than a day, the output meets the benchmarks, and it's not yet 2025, this market resolves YES.
@LarsDoucet You can currently buy an M1 MacBook Pro with 64 GB of RAM for under $3000. Would this market resolve as "yes" if such a laptop runs any model at a reasonable speed that scores higher than the first version of GPT-4 on Chatbot Arena rankings?
@Soli Almost certainly yes, would just need to take a close look at said benchmarks. But yeah that’s the basic idea
@LarsDoucet Nice, thank you for the quick response. The rankings can be found under the Leaderboards tab.
@LarsDoucet Then this should be resolved YES. We already have Command R+ and Llama-3-70B outperforming the first release of GPT-4, and they run on $3,000 hardware reasonably fast (far faster than 24 hours for one query, even at fp16). Even in the "Hard Prompts" category, Llama-3-70B already outperforms the first GPT-4 release according to LMSYS.
@notune If, however, we go by the MMLU score, then I guess we'd need to wait for Llama-3-400B.
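As context for the fp16 claim above, here is a back-of-the-envelope sketch of the weight-memory arithmetic (assuming 2 bytes per weight at fp16 and ignoring KV-cache and activation overhead): at fp16 a 70B-parameter model far exceeds 64 GB of RAM and would need offloading or streaming, while 4-bit quantization fits comfortably.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory (decimal GB) needed to hold the model weights alone.

    Ignores KV cache and activation overhead, so treat the result
    as a lower bound on real-world memory use.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B-parameter model at common precisions
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
```

Under these assumptions the estimates come out to roughly 140 GB at fp16, 70 GB at 8-bit, and 35 GB at 4-bit, which is why only the quantized variants fit on a 64 GB machine without spilling to disk.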
@RemNi If you're talking about performance we'll default to whatever GPT-4 was capable of as of the posting of this market. So if "GPT-4" gets 100x better and they keep the same label, that doesn't change the benchmark we care about.
If you're asking whether a minimized GPT-4 (with the same capabilities it had historically) shrunk onto consumer hardware counts: yes, as long as that model meets the capability requirements, the market would resolve YES.