prompt:
"A frog riding on top of a bird"
I will run the prompt 10 times. If it produces the image correctly at least 50% of the time, this market will resolve to YES; otherwise NO.
Unfortunately, it's not easy to define the judging criteria rigorously, so I will rely on my own observation to decide whether a picture really shows a frog riding a bird. I will also take input from @StrayClimb prior to resolution.
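The resolution rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each of the 10 runs is reduced to a single yes/no judgment, "50% of the time" is read as "at least 50%", and the function name `resolve_market` and the sample judgments are hypothetical, not part of the market.

```python
def resolve_market(judgments, threshold=0.5):
    """Return "YES" if the share of correct runs meets the threshold, else "NO".

    judgments: list of booleans, one per run (True = judged correct).
    threshold: assumed to mean "at least this fraction of runs".
    """
    if not judgments:
        raise ValueError("need at least one judged run")
    success_rate = sum(judgments) / len(judgments)
    return "YES" if success_rate >= threshold else "NO"


# Hypothetical judgments for 10 runs (True = the picture shows a frog riding a bird).
example_runs = [True, True, False, True, False, True, True, False, True, True]
print(resolve_market(example_runs))
```

How a run that returns several images collapses into one judgment is left to the judge; the sketch only covers the counting step.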
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ85 |
2 | | Ṁ25 |
3 | | Ṁ2 |
4 | | Ṁ1 |
5 | | Ṁ1 |
Very good success rate.
Attempt 1: at least 2 of the 4 generated images are clearly correct, arguably 3.
Attempt 2: images 1, 2, and 4 are correct; image 3 is not.
Attempt 3: images 1, 2, and 3 are correct; image 4 is also arguably correct.
Attempt 4: correct in images 1, 2, and 3.
The rest of the attempts are left as an exercise to the reader.
@chrisjbillington Second and third attempts, this time with the exact prompt specified in this market: 11/12 correct so far.


@chrisjbillington 23/24 correct across 6 fresh Bing Chat sessions, each generating 4 images per prompt.
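As a rough sanity check on what a hit rate like that would imply for the market, here is a small binomial sketch. It assumes, purely for illustration, that each of the 10 resolution runs is an independent yes/no trial with success probability equal to the per-image rate of 23/24, and that YES needs at least 5 correct runs; the helper name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Per-image hit rate taken from the 23/24 tally above (treated as a per-run rate,
# which is an assumption: the market judges runs, not individual images).
p_hat = 23 / 24
print(f"P(at least 5 of 10 correct) = {prob_at_least(5, 10, p_hat):.6f}")
```

The independence assumption is generous, and the judge's standard may be stricter than the one used in these tallies, so treat the number as an upper-end estimate.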
@firstuserhere There is this:
"to be clear yall it’s not 100% accurate every time, obviously selecting some of the best. the point is that all of this is possible within 2-3 generations due to gpt-4 pair prompting with you until it gets it right."
To me that sounds like a pretty good hit rate, but it's not super clear.
Cool market. I'd like to see a lot more Midjourney or DALL-E 3 tests: can it add, and so on. You could also use it to generate observations, for example by having it generate, say, 100 men and 100 women and then checking whether the average heights match the human distributions, and then doing the same thing for "person in 1870". What I'm really saying is that, at the limit, the best image generation requires full AGI.