Turns out that DALL-E is very bad at doing so.
Any general-purpose image-generation AI is allowed (DALL-E 3, Midjourney, etc.). Prompt engineering is allowed. To qualify, the AI and prompt must have a success rate of at least 5 in 20 images when tested.
To be considered a success, an image must contain:
An 8x8 checkered board, with all squares colored correctly.
All chess pieces in their correct starting positions. The chess pieces must be clearly identifiable as their correct type (e.g. a rook must clearly look like a rook).
No extra chess pieces
Images must be generated from a prompt only.
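For anyone wanting to score a batch of images consistently, the criteria above can be sketched as a small checker. This is only an illustration under stated assumptions: it presumes each image has already been transcribed (by a person or a vision model) into an 8x8 grid of piece codes plus an 8x8 grid of square colors, and it reads "colored correctly" as the usual convention that a1 is dark. The function names are hypothetical, and pieces drawn off the board are not covered.

```python
# Hypothetical checker; assumes an upstream step has transcribed the image
# into piece codes (uppercase = white, lowercase = black, "." = empty).
BACK_RANK = "RNBQKBNR"

def starting_position():
    """Standard starting position as 8 strings, rank 8 first."""
    return [
        BACK_RANK.lower(),
        "p" * 8,
        "." * 8, "." * 8, "." * 8, "." * 8,
        "P" * 8,
        BACK_RANK,
    ]

def square_is_dark(row, col):
    """Row 0 is rank 8, col 0 is file a, so a1 (row 7, col 0) is dark."""
    return (row + col) % 2 == 1

def image_is_success(pieces, dark_squares):
    """pieces: 8 strings of 8 chars; dark_squares: 8x8 booleans from the image."""
    if len(pieces) != 8 or any(len(rank) != 8 for rank in pieces):
        return False
    colors_ok = all(
        dark_squares[r][c] == square_is_dark(r, c)
        for r in range(8) for c in range(8)
    )
    # An exact match also rules out extra pieces on the board.
    return colors_ok and list(pieces) == starting_position()

def qualifies(results, needed=5, trials=20):
    """results: one boolean per generated image; needs 5 successes in 20."""
    return len(results) == trials and sum(results) >= needed
```

Judging whether a piece "clearly looks like" its type still needs a human eye; the sketch only checks placement and square colors once that judgment has been made.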
@ProjectVictory lumalabs. I used an iterative version of their new model (re-prompted dozens of times until the output was perfect).
@ProjectVictory It would be trivial to create an API that did this automatically, in essence making a much-improved model.
Still, this was cherry picked. The king/queen is still the hardest part.
@Hazel Did you use a fixed series of prompts? If not, how would you make an API that does this automatically?
@MaxMorehead yes, same prompt over and over. If I wanted to make some mana, I could easily build this before the end of the year. It’s easy to repro.
@Hazel oh, would have taken a couple more iterations, to get the queen in the right place. I’m a callable human in the loop lol.
To be clear, you have to verify it's correct?
@Shump How would this resolve if it's possible to build a scaffolded system that generates a chessboard (e.g. calling DALL-E multiple times and using GPT-4o to verify whether the image is correct)? Would it change if there are more purpose-built parts to the scaffolding (e.g. taking subsets of the image and using specific prompts to verify those)?
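A rough sketch of the kind of scaffold being asked about, for concreteness. The `generate` and `verify` callables are placeholders I'm assuming (one would wrap an image model, the other a vision model or a human check); neither is a real API, and nothing here says whether such a setup would count for resolution.

```python
from typing import Callable, Optional, Tuple, TypeVar

Image = TypeVar("Image")

def generate_until_verified(
    generate: Callable[[str], Image],   # hypothetical wrapper around an image model
    verify: Callable[[Image], bool],    # hypothetical checker (vision model or human)
    prompt: str,
    max_attempts: int = 50,
) -> Optional[Tuple[Image, int]]:
    """Re-run the same prompt until the verifier accepts an image.

    Returns (image, attempts_used), or None if every attempt was rejected.
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt)
        if verify(image):
            return image, attempt
    return None
```

Note that the loop only selects among outputs; the per-image success rate of the underlying model is unchanged, which is the quantity the 5-in-20 criterion measures.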
@TobiasWegener
I think we are getting pretty close with Flux
Problems:
there seems to be a rug, and both sides are white.
The figures seem quite good now.
@ProjectVictory yeah, you are right, and there's the strange line in front of the queen. A lot of small mistakes. Interesting how hard it is to see many of them.
I've been trying to cue the model into producing a diagram, since that's presumably easier, but it's not quite getting there. I think the problem is very similar to producing text, if you think of chess pieces as symbols and chess boards as phrases.
@Cosmic1 do we know if GPT-4 is using a new image generator? afaict it's the same interface to DALL-E 3 as before. There is no new image endpoint available via the OpenAI API.
@diadematus It’s literally not wrong. “We’re announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.”
https://openai.com/index/hello-gpt-4o/
“…generates any combination of text, audio, and image outputs.”
@Hazel how does GPT-4o being able to "reason across ...vision" resolve "generate correct images of a chess game" as YES?
@Cosmic1 In ChatGPT you have to be careful that GPT-4o doesn't use the Python code interpreter. With that it can easily generate a perfect image, but that is not what the question asks for.
@Cosmic1 Yes, GPT-4o counts, with or without DALL-E. As long as it is a general-purpose generative model that makes images from text only, it counts. Images generated from code don't count, as they're not generated by the AI.
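To illustrate why code-generated images are excluded, here is a minimal sketch of the sort of script a code interpreter could write. It places every piece deterministically, so the result is correct by construction rather than by the image model's ability. The function name, colors, and sizes are arbitrary choices of mine, not anything from the thread.

```python
# Starting position, rank 8 first; uppercase = white, lowercase = black.
START = [
    "rnbqkbnr", "pppppppp",
    "........", "........", "........", "........",
    "PPPPPPPP", "RNBQKBNR",
]
# Unicode chess glyphs in K, Q, R, B, N, P order for white, then black.
GLYPHS = dict(zip("KQRBNPkqrbnp", "♔♕♖♗♘♙♚♛♜♝♞♟"))

def render_start_svg(size=40):
    """Return an SVG string of the starting position (a1 is a dark square)."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{8 * size}" height="{8 * size}">'
    ]
    for row, rank in enumerate(START):
        for col, piece in enumerate(rank):
            fill = "#769656" if (row + col) % 2 else "#eeeed2"
            parts.append(
                f'<rect x="{col * size}" y="{row * size}" '
                f'width="{size}" height="{size}" fill="{fill}"/>'
            )
            if piece != ".":
                parts.append(
                    f'<text x="{col * size + size / 2}" y="{row * size + size * 0.75}" '
                    f'font-size="{size * 0.8}" text-anchor="middle">{GLYPHS[piece]}</text>'
                )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a `.svg` file gives a viewable board; no prompt-to-pixels generation is involved, which is the distinction the ruling above draws.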