When I ask Gemini or the free version of ChatGPT:
"What weight more: six kg of feather or one kg of steel?"
They say the steel weighs more. Will we have a free, general-purpose chatbot that can answer this correctly by the end of 2024?
I will try 10 variations on Gemini, ChatGPT, and any chatbot suggested in the comments. This resolves YES if any of them answers all 10 correctly.
The answers (each in a new chat):
Q: What weight more, 2kg of feathers or 1kg of steel?
A: Two kilograms of feathers weigh more than one kilogram of steel. Weight is determined by the mass, and in this case, 2 kilograms is greater than 1 kilogram, regardless of the material.
Q: What weight more, 5 kg of feathers or 6 kg of steel?
A: Six kilograms of steel weigh more than five kilograms of feathers. This is because 6 kg is greater than 5 kg, regardless of the material being compared.
Q: Which of the following has the biggest weight: 1 kilogramme of steel or 1 kilogramme of feathers?
A: Both 1 kilogram of steel and 1 kilogram of feathers have the same weight. A kilogram is a unit of mass, so regardless of the material, 1 kilogram of anything weighs the same: 1 kilogram.
Q: What weight more, 100kg of feathers or 1kg of steel?
A: 100 kg of feathers weighs significantly more than 1 kg of steel. Despite the differences in material, weight is a measure of mass, and 100 kg is much greater than 1 kg.
Q: Does 1kg of steel weights more than 2 kg of feathers?
A: No, 1 kilogram of steel does not weigh more than 2 kilograms of feathers. Weight is a measure of mass, and 1 kilogram is always less than 2 kilograms, regardless of the material.
Q: What weight more, 2 kg of steel or 1 kg of feathers?
A: Two kilograms of steel weighs more than one kilogram of feathers. The comparison is straightforward because 2 kg is greater than 1 kg, regardless of the material being compared.
Q: What weight more, 3 pound of steel or 4 pound of sand?
A: Four pounds of sand weighs more than three pounds of steel. Weight is determined by the mass of the material, so regardless of the materials being compared, 4 pounds will always be heavier than 3 pounds.
Q: What weight more, 7 kg of steel or 7 kg of feathers?
A: Seven kilograms of steel and seven kilograms of feathers both weigh exactly the same—7 kilograms. The difference between them is their volume and density, not their weight. Steel is much denser than feathers, so 7 kilograms of steel will take up much less space than 7 kilograms of feathers.
Q: What weight more, 6 kg of steel or 5 kg of steel?
A: 6 kg of steel weighs more than 5 kg of steel
Q: What weight more, 2 kg of feathers or 1 kg of steel?
A: Two kilograms of feathers weigh more than one kilogram of steel. This is because two kilograms is simply twice the mass of one kilogram, regardless of the material.
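For reference, the expected answer to each variation can be computed mechanically: convert both quantities to the same unit and compare them, ignoring the material entirely. Below is a minimal Python sketch, assuming the questions have already been parsed by hand into (label, quantity, unit) tuples. The names `heavier`, `KG_PER_UNIT`, and `ANSWER_KEY` are hypothetical and not part of any tool used in this thread.

```python
from fractions import Fraction

# Assumption: 1 lb = 0.45359237 kg exactly (international avoirdupois pound).
KG_PER_UNIT = {"kg": Fraction(1), "lb": Fraction("0.45359237")}


def heavier(a, b):
    """Each argument is a hypothetical (label, quantity, unit) tuple.

    Only the mass converted to kg is compared; the material never matters.
    Returns the label of the heavier item, or "same" if the masses are equal.
    """
    mass_a = Fraction(a[1]) * KG_PER_UNIT[a[2]]
    mass_b = Fraction(b[1]) * KG_PER_UNIT[b[2]]
    if mass_a == mass_b:
        return "same"
    return a[0] if mass_a > mass_b else b[0]


# The original prompt plus the variations recorded above (duplicate dropped).
ANSWER_KEY = [
    (("6 kg of feathers", 6, "kg"), ("1 kg of steel", 1, "kg")),  # original prompt
    (("2 kg of feathers", 2, "kg"), ("1 kg of steel", 1, "kg")),
    (("5 kg of feathers", 5, "kg"), ("6 kg of steel", 6, "kg")),
    (("1 kg of steel", 1, "kg"), ("1 kg of feathers", 1, "kg")),
    (("100 kg of feathers", 100, "kg"), ("1 kg of steel", 1, "kg")),
    (("1 kg of steel", 1, "kg"), ("2 kg of feathers", 2, "kg")),  # asked as yes/no: expected "No"
    (("2 kg of steel", 2, "kg"), ("1 kg of feathers", 1, "kg")),
    (("3 lb of steel", 3, "lb"), ("4 lb of sand", 4, "lb")),
    (("7 kg of steel", 7, "kg"), ("7 kg of feathers", 7, "kg")),
    (("6 kg of steel", 6, "kg"), ("5 kg of steel", 5, "kg")),
]

if __name__ == "__main__":
    for a, b in ANSWER_KEY:
        print(f"{a[0]} vs {b[0]}: {heavier(a, b)}")
```

Exact fractions are used so that equal masses (the 1 kg vs 1 kg and 7 kg vs 7 kg traps) compare as exactly equal, with no floating-point surprises when mixing units.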
Really nice work from OpenAI there.
Anyone want to bet No, @cos @MADGAMBLER6969? I've got some limit orders up.
@Daniel_MC Nah, I'm good. It resolves by 2025, and although I'm fairly confident the odds currently favor my side, I'd rather bet against the market on markets trading above 50-60%.
@Daniel_MC Yup, I'm willing to bet No at around 65%, but I don't stockpile mana, so if you set limit orders I'll buy in tomorrow.
Resolving Yes would take some breakthrough (chain-of-thought (CoT), a new architecture, or more scale): tokenization is a total mess, and free models necessarily have to be fairly small.
Testing on Perplexity AI, just to get a feel for it:
Wrong https://www.perplexity.ai/search/What-weight-more-reGr9UGyTlKWmov8MvJ4EQ
Over-smart https://www.perplexity.ai/search/What-weight-more-0EYTTxxeQ9ukzK6d61RDLA
Correct https://www.perplexity.ai/search/What-weight-more-XYQf3iXpRauZwFd_g2M6ew
Totally wrong https://www.perplexity.ai/search/What-weight-more-jvgkGMykQeiXguX9h.3cWw
Wrong https://www.perplexity.ai/search/What-weight-more-V79_URmBSJ2dIAMpTGF_wA
Correct but mentioned gold for some reason https://www.perplexity.ai/search/What-weight-more-5L3M_SeRQtSPwMMQzFRcKg
Correct https://www.perplexity.ai/search/What-weight-more-Kccs58PtQbm0zYZGJjDokw
Wrong https://www.perplexity.ai/search/What-weight-more-OAkomP._RW2E5LIbDlvJJQ
It made mixed statements about whether 1 kg equals 1 kg and once mentioned an irrelevant material, which supports the conclusion that it's not there yet.
@0482 I tested your 10 variations on a set of free and paid models; only GPT-4 passes. Even Claude 3 Opus fails in many cases.
@0482 Can I get a vibe check on how strict "reliably" is? For example, what if a free model gets 8/10 questions right, or struggles on some of the oddly formatted ones?
@AnilJason By "reliably" I meant 100% correct. I can tolerate some confusion along the way as long as there is a final answer and it is correct.
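Read together with the resolution criteria, "reliably" amounts to a strict all-or-nothing check per chatbot. A minimal Python sketch, assuming each chatbot's graded results are recorded as a list of 10 booleans (True meaning the final answer was correct, with confusion along the way allowed); the function name `resolves_yes` and this data layout are hypothetical.

```python
def resolves_yes(results):
    """results: hypothetical dict mapping chatbot name -> list of 10 booleans.

    YES if at least one chatbot got every one of the 10 variations right.
    """
    return any(len(graded) == 10 and all(graded) for graded in results.values())


# Usage: a model scoring 8/10 does not count, but any single perfect model does.
print(resolves_yes({"Gemini": [True] * 8 + [False] * 2}))
print(resolves_yes({"Gemini": [False] * 10, "ChatGPT": [True] * 10}))
```

Under this rule, the 8/10 case asked about above would not be enough on its own.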
@singer I think the implication here is that it’s a chatbot that is generally available to the public (i.e. doesn’t require you to download the weights and run the LLM yourself).
@singer That's the point of the general-purpose requirement, as opposed to a model that was "just trained to pass this test".
@0482 If it were fine-tuned, it would still be general-purpose, unless the fine-tuning was botched and ruined everything else it had learned. We generally still consider human students "general-purpose" even after they've studied for a test. The resolution criteria say that you'll ask variations, presumably to avoid memorization.