What capabilities can I elicit from the "im-a-good-gpt2-chatbot" models while they are available?
62%: The model will suggest a bet on a market on Manifold that ultimately ends up being profitable
33%: The model will suggest a CSS stylesheet for Manifold that you will judge nicer than the default appearance (if the model is told CSS class names)

Two new models called "im-a-good-gpt2-chatbot" and "im-also-a-good-gpt2-chatbot" appeared on Chatbot Arena today: https://chat.lmsys.org/

If you add an answer to this market, I will try to get the model to do what the answer describes. The model gets a single prompt (possibly containing many messages), and I will make around 5 attempts to get the model to attempt the task before resolving NO.

The model will write a correct mathematical proof in Lean that validates with no errors on the first try.
bought Ṁ10 The model will write... NO

FWIW, if you don't want to install Lean locally (which can be a bit of a pain in the ass), you can paste the generated proof into https://live.lean-lang.org/ instead.
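For illustration only (not part of the original thread): a minimal Lean 4 snippet of the kind that should check with no errors when pasted into the web editor. It relies only on Lean's core library; the theorem name `add_comm_example` is hypothetical.

```lean
-- Hypothetical example theorem: commutativity of addition on Nat,
-- proved by applying the core library lemma `Nat.add_comm`.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, closed by the `decide` tactic.
example : 2 + 2 = 4 := by decide
```

If the editor reports no errors, the proof validates; any red squiggle means it would not count as passing "on the first try".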

@duck_master I found this, but I'm not sure if it's correct: https://leanprover.github.io/tutorial/

@SaviorofPlant That might also work. Haven't tried it though

@duck_master They give different errors. I'll give the model the benefit of the doubt and resolve YES if it can generate code that either of them validates.

The model will do better on a coding task of my choice than Claude 3 Opus

I was initially impressed here: the response was better structured and seemed to use better coding practices. Only when I tried to run the code did I discover that the model had ignored details in the prompt.

The model will suggest a bet on a market on Manifold that ultimately ends up being profitable
bought Ṁ10 The model will sugge... NO

