On 8/1/2023 I will use the best available paid model of Midjourney (top paid or best plan available under $100/month) to run this prompt 5 times: "a picture of a large overgrown concrete building with a large neon sign that says Pizza on top"
This will produce 20 images (5 mosaics of 4).
If the word Pizza is spelled correctly in at least 10/20 of the resulting images, the claim resolves YES. If not, it's NO. If the claim is tested successfully before this final test, it can immediately resolve YES, but I won't test it more than once per day.
Below is a sample of the images for that prompt today, 3/16/2023
Only the lower right image from the last of the 5 would count, so in total today's score would be 1/20.
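For anyone who wants to check the arithmetic, here is a minimal sketch of the resolution rule as I read it from the description above. The function names and the list-of-booleans input are my own illustration, not anything Midjourney or the market provides; judging whether each image spells "Pizza" correctly is still done by eye.

```python
# Hypothetical helper names; only the numbers (5 runs, 4 images each,
# threshold of 10 correct) come from the market description.
RUNS = 5
IMAGES_PER_RUN = 4
THRESHOLD = 10


def score(correct_flags):
    """Return the score as a 'correct/total' string, e.g. '1/20'."""
    return f"{sum(correct_flags)}/{len(correct_flags)}"


def resolves_yes(correct_flags):
    """correct_flags holds one bool per image: True if 'Pizza' is spelled right."""
    expected = RUNS * IMAGES_PER_RUN
    if len(correct_flags) != expected:
        raise ValueError(f"expected {expected} judged images, got {len(correct_flags)}")
    return sum(correct_flags) >= THRESHOLD


# The 3/16/2023 sample: only one image (lower right of the last mosaic) counts.
todays_sample = [False] * 19 + [True]
print(score(todays_sample), "->", "YES" if resolves_yes(todays_sample) else "NO")
```

With the sample above this gives a score of 1/20, which is well short of the 10 needed for YES.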
Better start talking about seeds now, to avoid cherry picking. Let's use seeds 20230801, 20230802, and perhaps some preselected from your submissions in comments here? We should lock that in. It won't damage credibility, since it's very likely that seed interpretation will totally change between Midjourney versions. That's something we can verify as well to eliminate claims of bias.
@StrayClimb I like how it's kind of groping its way towards signs that literally say "pizza on top".
Hmm. If an image contains multiple attempted instances of the word "pizza," and some but not all are spelled correctly, how does that image get counted for purposes of the resolution criteria?
@NLeseul I'll attempt to identify the largest sign on top of the building and if that is spelled right, it counts.
@StrayClimb Makes sense to me.
How would you adjudicate that on the first image above, out of curiosity? The most prominent sign in the composition seems to be the big "PIZZZA" in the middle, but the "∩IZ∩" at the bottom is larger, and the signs closest to the top of the building are the two small pink ones.
@NLeseul true. The first image is tough. The neon sign on the top is spelled right, and is the highest but not the largest.
Thinking about the exact claim and prompt definition perhaps we can key in on the clearly highest text and use that to judge?
Note that DeepFloyd destroys this test and would greatly surpass 50%. So I'm expecting that there will be a clear yes or no. That said, I'd like to work with participants to find an agreeable way to resolve.
FYI: Stable Diffusion XL on Clipdrop is already doing it right most of the time:
Consider voting UP my Midjourney stock!
@StrayClimb For this one I just prompted 'neon sign saying pizza'. Anyway, it's getting there with the text. I expect Midjourney to do even better soon.
If the word Pizza is spelled correctly in at least 10/20 of the resulting images, the claim resolves YES.
@firstuserhere The market resolves true if a version of MJ comes out which succeeds at least 10/20 times. My reference to 1/20 was a revision of the description of the current state of the art, based on the images in the description: 1/20 of those would be considered a success today.