What capabilities can I elicit from the "im-a-good-gpt2-chatbot" models while they are available? [Add answers]
resolved Sep 19
Resolved YES: The model will suggest a bet on a market on Manifold that ultimately ends up being profitable
Resolved NO: The model will crack a joke that I actually find funny
Resolved NO: The model will do better on a coding task of my choice than Claude 3 Opus
Resolved NO: The model will correctly predict Bitcoin price within ±5000 on April 30, 2024 based on articles from March 2024
Resolved NO: The model will write a correct mathematical proof in Lean that validates with no errors on the first try
Resolved NO: The model will suggest a CSS stylesheet for Manifold that you will judge nicer than the default appearance (if the model is told CSS class names)
Resolved NO: The model will correctly identify that it is gpt2-chatbot when told about the release

Two new models called "im-a-good-gpt2-chatbot" and "im-also-a-good-gpt2-chatbot" appeared on Chatbot Arena today: https://chat.lmsys.org/

If you create an answer in this market, I will attempt to get the model to do whatever it says. The model gets a single prompt (potentially containing many messages), and I will make around 5 tries to get the model to attempt the task before resolving NO.


@SaviorofPlant what did the stylesheet the model suggested look like?

bought Ṁ10 NO

FWIW if you don't want to install Lean locally (which can be a bit of a pain in the ass), you can paste the generated proof into https://live.lean-lang.org/ instead

@duck_master I found this, not sure if it's correct: https://leanprover.github.io/tutorial/

@SaviorofPlant That might also work. Haven't tried it though

@duck_master They give different errors. I'll give the model the benefit of the doubt and resolve YES if it can generate code that either of them validates

Was initially impressed here - the response was better structured and seemed like it was using better coding practices. It was only when I tried to run the code that I discovered that the model had ignored details in the prompt.

bought Ṁ10 NO

😐

sold Ṁ10 NO

@SaviorofPlant Joke's on me...

© Manifold Markets, Inc.