Will we have a free chatbot that can reliably solve "what weight more" questions by end of 2024?
Resolved YES on May 22

When I ask Gemini or ChatGPT free version:

"What weight more: six kg of feather or one kg of steel?"

They say the steel weighs more. Will we have a free general-purpose chatbot that can answer this correctly by EOY 2024?

I will try 10 variations on Gemini, ChatGPT and any chatbot suggested in the comments. The market resolves to YES if any of them answers all 10 correctly.
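
The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the stated criterion, not a script the market creator is known to have used; the function name `resolves_yes`, the chatbot labels, and the sample results are all hypothetical.

```python
def resolves_yes(results: dict[str, list[bool]], required: int = 10) -> bool:
    """Return True if at least one chatbot answered every variation correctly.

    `results` maps a chatbot name to one boolean per variation asked
    (True means the final answer was correct). `required` is the number
    of variations, assumed here to be the 10 named in the description.
    """
    return any(
        len(answers) == required and all(answers)
        for answers in results.values()
    )


# Hypothetical results for illustration only (not actual test data).
example = {
    "chatbot_a": [True] * 10,               # all 10 correct
    "chatbot_b": [True] * 8 + [False] * 2,  # 8/10 is not enough
}
print(resolves_yes(example))  # prints True
```

Under this reading, a single chatbot with a perfect 10/10 run is sufficient, and a chatbot scoring 8/10 does not count.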

The answers (each in a new chat):

Q: What weight more, 2kg of feathers or 1kg of steel?

A: Two kilograms of feathers weigh more than one kilogram of steel. Weight is determined by the mass, and in this case, 2 kilograms is greater than 1 kilogram, regardless of the material.

Q: What weight more, 5 kg of feathers or 6 kg of steel?

A: Six kilograms of steel weigh more than five kilograms of feathers. This is because 6 kg is greater than 5 kg, regardless of the material being compared.

Q: Which of the following has the biggest weight: 1 kilogramme of steel or 1 kilogramme of feathers?

A: Both 1 kilogram of steel and 1 kilogram of feathers have the same weight. A kilogram is a unit of mass, so regardless of the material, 1 kilogram of anything weighs the same: 1 kilogram.

Q: What weight more, 100kg of feathers or 1kg of steel?

A: 100 kg of feathers weighs significantly more than 1 kg of steel. Despite the differences in material, weight is a measure of mass, and 100 kg is much greater than 1 kg.

Q: Does 1kg of steel weights more than 2 kg of feathers?

A: No, 1 kilogram of steel does not weigh more than 2 kilograms of feathers. Weight is a measure of mass, and 1 kilogram is always less than 2 kilograms, regardless of the material.

Q: What weight more, 2 kg of steel or 1 kg of feathers?

A: Two kilograms of steel weighs more than one kilogram of feathers. The comparison is straightforward because 2 kg is greater than 1 kg, regardless of the material being compared.

Q: What weight more, 3 pound of steel or 4 pound of sand?

A: Four pounds of sand weighs more than three pounds of steel. Weight is determined by the mass of the material, so regardless of the materials being compared, 4 pounds will always be heavier than 3 pounds.

Q: What weight more, 7 kg of steel or 7 kg of feathers?

A: Seven kilograms of steel and seven kilograms of feathers both weigh exactly the same—7 kilograms. The difference between them is their volume and density, not their weight. Steel is much denser than feathers, so 7 kilograms of steel will take up much less space than 7 kilograms of feathers.

Q: What weight more, 6 kg of steel or 5 kg of steel?

A: 6 kg of steel weighs more than 5 kg of steel

Q: What weight more, 2 kg of feathers or 1 kg of steel?

A: Two kilograms of feathers weigh more than one kilogram of steel. This is because two kilograms is simply twice the mass of one kilogram, regardless of the material.

Really nice work from OpenAI there.

ChatGPT with GPT4o solves this consistently. Resolving to YES.

@0482 Can you post the 10 questions you tested, along with the results?

bought Ṁ100 YES

Chat GPT 4o?

@Daniel_MC did you try it with GPT-4o? I'll give it a shot but want to know what your experience was.

bought Ṁ1,000 YES

@0482 tried a few of the examples Perplexity got wrong, and GPT-4o got them right

Llama 3 70B is very close.

opened a Ṁ100 YES at 50% order

Anyone want to bet No? @cos @MADGAMBLER6969 ? Got some limit orders up.

@Daniel_MC nah, I'm good. It resolves by 2025, and despite being quite confident that the odds of it resolving in my favor are good for now, I would rather bet against the market on markets trading higher than 60-50%.

I'm happy with 65% if that's better for you

@Daniel_MC Yup, I am willing to bet No around 65%, but I don't stockpile mana so if you set limit orders I will buy it tomorrow.

There would need to be some breakthroughs for the question to resolve Yes (without CoT, new architecture or scale). Tokenization is a total mess, and the free models necessarily need to be pretty small.

@0482 Tested your 10 variations on a set of free/paid models; only GPT-4 passes. Even Claude 3 Opus fails in many cases.

@0482 Can I get a vibe check on the tolerance of "reliably"? Like, what if a free model gets 8/10 questions right, or struggles on some of the oddly formatted ones?

@AnilJason by "reliably" I meant 100% correct. I can tolerate some confusion along the way as long as there is a final answer and it is correct.

Perplexity.ai can answer correctly r̶e̶l̶i̶a̶b̶l̶y̶

@0482 Are we getting the same results?

@AnilJason oh I overlooked it. Will verify later with 10 variations to see reliability

Can someone tell me if any of the paid models answer this correctly? My bet depends on this.

@SlipperySloe I tested it 10 times with chatgpt4 and it got it right every time.

bought Ṁ10 YES

What's to stop someone from finetuning an open source model on these types of questions just for this market?

@singer I think the implication here is that it’s a chatbot that is generally available to the public (i.e. doesn’t require you to download the weights and run the LLM yourself).

@singer that's the point of the general-purpose limitation, as opposed to "just trained it to pass this test"

@0482 if it was finetuned, it would still be general purpose, unless the finetuning was botched and ruined everything else it had learned. we generally still consider human students "general purpose" even after they've studied for a test. the resolution criteria says that you'll ask variations, presumably to avoid memorization.

@Santiago "generally available to the public" in the form of a website can be very easily arranged

@singer well if someone does this, it may be fair to resolve to YES.
