Hardest = the question o3 is most likely to get egregiously wrong, compared to a random person who hears the riddle on the street.
1. Stack of sandwiches:
> Alice has a stack of 5 ham sandwiches with no condiments. She takes her walking stick and uses duct tape to attach the bottom of her walking stick to the top surface (note: just the top surface!) of the top sandwich. She then carefully lifts up her walking stick and leaves the room with it, going into a new room. How many complete sandwiches are in the original room and how many in the new room?
---
2. Juggling balls with ladder:
> A juggler throws a solid blue ball a meter in the air and then a solid purple ball (of the same size) two meters in the air. She then climbs to the top of a tall ladder carefully, balancing a yellow balloon on her head. Where is the purple ball most likely now, in relation to the blue ball?
---
3. Foot race with tower detour:
> Jeff, Jo and Jim are in a 200m men's race, starting from the same position. When the race starts, Jeff, 63, slowly counts from -10 to 10 (but forgets a number) before staggering over the 200m finish line. Jo, 69, hurriedly diverts up the stairs of his local residential tower, stops for a couple seconds to admire the city skyscraper roofs in the mist below (note how tall this implies the tower is), before racing to finish the 200m. Exhausted Jim, 80, gets through reading a long tweet, waving to a fan and thinking about his dinner before walking over the 200m finish line. Who likely finished last?
---
4. Bricks vs feathers:
> Which is heavier: 20 pounds of bricks or 20 feathers?
---
5. Frying ice cubes:
> Beth places four whole ice cubes in a frying pan at the start of the first minute, then five at the start of the second minute and some more at the start of the third minute, but none in the fourth minute. If the average number of ice cubes per minute placed in the pan while it was frying a crispy egg was five, how many whole ice cubes can be found in the pan at the end of the third minute?
___________
This poll will help resolve the market below.
Once we pick the hardest question, we'll ask it to o3 10 times. Then we'll ask it to 10 random people on the street. We'll see who gets it more egregiously wrong and resolve the market below accordingly.
https://manifold.markets/dreev/does-chatgpt-o3-make-egregious-erro?r=U2ltb25lUm9tZW8