The following words will be used:
['apple', 'cloud', 'stone', 'river', 'brush', 'flame', 'grass', 'pizza', 'metal', 'sugar']
The following prompt will be used:
I will run a prompt: "picture of a large overgrown concrete building with a large neon sign that says WORD on top"
where WORD will be replaced by a word from the list above.
I will run the prompt 10 times for each word. If there are 4 images per prompt, that gives 40 images per word, or 400 images across all 10 words.
For a word to be called "CORRECTLY SPELLED", DALLE-3 must score >=50% accuracy on that word's images.
For this market to resolve to YES, DALLE-3 must get >50% of the words CORRECTLY SPELLED.
I will ignore case, i.e. upper case, lower case, and mixed-case spellings are all acceptable.
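The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual grading procedure: the function names, the `IMAGES_PER_WORD` constant (which assumes 4 images per prompt), and the example counts are all hypothetical, and judging whether a given image spells the word correctly is still done by eye.

```python
from typing import Dict

# Assumption: 10 prompts per word x 4 images per prompt = 40 images per word.
IMAGES_PER_WORD = 40


def spelling_matches(target: str, rendered: str) -> bool:
    """Case-insensitive comparison: 'APPLE', 'apple' and 'ApPlE' all match."""
    return target.casefold() == rendered.casefold()


def word_correctly_spelled(correct_images: int,
                           total_images: int = IMAGES_PER_WORD) -> bool:
    """A word is CORRECTLY SPELLED if >= 50% of its images are correct."""
    return 2 * correct_images >= total_images


def market_resolves_yes(correct_by_word: Dict[str, int],
                        total_images: int = IMAGES_PER_WORD) -> bool:
    """YES if strictly more than 50% of the words are CORRECTLY SPELLED."""
    passed = sum(
        word_correctly_spelled(count, total_images)
        for count in correct_by_word.values()
    )
    return 2 * passed > len(correct_by_word)


# Hypothetical tallies: three words fail completely, seven get 24/40 (60%).
words = ['apple', 'cloud', 'stone', 'river', 'brush',
         'flame', 'grass', 'pizza', 'metal', 'sugar']
example_counts = {w: 0 for w in words[:3]}
example_counts.update({w: 24 for w in words[3:]})
example_result = market_resolves_yes(example_counts)
```

With these made-up tallies, 7 of 10 words clear the per-word bar, so the market would resolve YES even though only 168 of 400 images (42%) are correct overall. The per-word threshold is inclusive (20 of 40 passes) while the across-words threshold is strict (5 of 10 words is not enough).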
@EliLifland Also, unlikely to make a difference, but >=20 rather than >20 seems more consistent with the original >=5/10.
@EliLifland Yes, that was the original criterion, and I won't change it. I edited the comment above to say >50%.
@firstuserhere Wait so it has to get >50% of all images correct across all prompts? This seems pretty different from having to get >=50% of images right for >50% of the words
@EliLifland For example, if it totally fails at 3 words and gets 60% right for the other 7, previously that would have been a yes but now it’s a no?
I think it should say >50% accuracy for the total rather than >=50%. I believe it used to be >= for each word but > for counting scores across words.
@EliLifland Yeah, I will edit it just now to make that clear. Thanks again for pointing it out!
@firstuserhere Yeah much better. I think the only remaining confusing part is you say you will run 9 prompts for the other 9 words, which sounds like one prompt per word. Maybe edit that part to make it clear you will run 10 prompts for each word, or just say you will do the same thing for each other word