Just a fun question to measure agentic capabilities and also to see what people would say to something like this.
Good quality is somewhat subjective, but probably "at the level of current (mid-2024) text-to-image models", so DALL·E 3, Flux, Midjourney, SD3, etc.
A standard drawing program is Photoshop, GIMP, Krita, or anything else human artists use.
The agent must use simulated mouse movements and clicks, keyboard inputs, and/or drawing tablet strokes to create the image. It should use its vision model to see the canvas.
It should not use the AI features of the painting software, should such features exist.
A YouTube video of an agent drawing it would suffice; so would a paper where something like this was described.
@paleink would this be enough to fulfill the market?
I think this doesn't really capture the spirit of what is being asked, which is whether an AI would be able to draw in a human-like way. Humans don't draw by filling in pixel by pixel.
@coproduct I'd imagine it wouldn't fit the spirit (and also nobody would bother building it because it would be kind of stupid and pointless), but where's the dividing line?
The next step up from the pixel-by-pixel method would be to use a round brush instead and have an old-fashioned algorithm that takes an image file produced by a generator and draws, say, a thousand colored circles in Photoshop in a way that best approximates the image. This is a bit closer to how humans do it, but still pretty boring.
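A minimal sketch of that circle-approximation idea, as greedy hill climbing on a tiny grayscale grid (everything here is a toy stand-in: pure Python lists instead of a real image, and no actual Photoshop automation):

```python
import random

SIZE = 16  # tiny grayscale canvas, pixel values 0.0-1.0


def blank():
    return [[0.0] * SIZE for _ in range(SIZE)]


def make_target():
    """Hypothetical 'generated image': a bright disc on a dark background."""
    img = blank()
    for y in range(SIZE):
        for x in range(SIZE):
            if (x - 8) ** 2 + (y - 8) ** 2 <= 25:
                img[y][x] = 1.0
    return img


def draw_circle(img, cx, cy, r, val):
    """Return a copy of img with a filled circle of intensity val."""
    out = [row[:] for row in img]
    for y in range(SIZE):
        for x in range(SIZE):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                out[y][x] = val
    return out


def mse(a, b):
    """Mean squared error between two canvases."""
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / SIZE ** 2


def approximate(target, steps=2000, seed=0):
    """Greedy hill climbing: keep a random circle only if the error drops."""
    rng = random.Random(seed)
    canvas = blank()
    err = mse(canvas, target)
    for _ in range(steps):
        cand = draw_circle(canvas,
                           rng.randrange(SIZE), rng.randrange(SIZE),
                           rng.randrange(1, 6), rng.random())
        cand_err = mse(cand, target)
        if cand_err < err:
            canvas, err = cand, cand_err
    return canvas, err
```

With enough steps (and, in a real version, circles of varying color on an actual bitmap), the canvas converges toward the target; the "drawing program" part would just replay the accepted circles as brush clicks.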
The next step after that: what if you use a generator to cook up an image, give your agent access to some basic Photoshop actions (make a brushstroke, sample a color, slap on a new layer), and then train it to perform a limited number of those actions so that the end result approximates the generated image.
The next step after that, and the most interesting way, would be to have no image generator at all, only a classifier, and train an agent to do stuff in Photoshop so that the end result gets classified as, say, "a beautiful painting of a cow" with a high score.
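The classifier-driven variant is the same loop, but with the pixel-distance objective swapped for a score to maximize. A toy sketch (the "classifier" here is a made-up brightness heuristic standing in for a real image classifier, and the "action" is a made-up dab; none of this is a real API):

```python
import random

SIZE = 16  # tiny grayscale canvas, pixel values 0.0-1.0


def toy_classifier(canvas):
    """Stand-in for a real classifier: rewards bright pixels near the
    center, penalizes paint elsewhere. A real setup would score the
    canvas against a text prompt instead."""
    score = 0.0
    for y in range(SIZE):
        for x in range(SIZE):
            weight = 1.0 if abs(x - 8) + abs(y - 8) <= 6 else -0.5
            score += weight * canvas[y][x]
    return score


def random_stroke(canvas, rng):
    """One hypothetical 'Photoshop action': a short horizontal dab."""
    out = [row[:] for row in canvas]
    y = rng.randrange(SIZE)
    x0 = rng.randrange(SIZE)
    for x in range(x0, min(x0 + rng.randrange(1, 5), SIZE)):
        out[y][x] = rng.random()
    return out


def paint_for_classifier(steps=1000, seed=0):
    """Keep only the strokes that raise the classifier's score."""
    rng = random.Random(seed)
    canvas = [[0.0] * SIZE for _ in range(SIZE)]
    score = toy_classifier(canvas)
    for _ in range(steps):
        cand = random_stroke(canvas, rng)
        cand_score = toy_classifier(cand)
        if cand_score > score:
            canvas, score = cand, cand_score
    return canvas, score
```

The interesting (and expensive) part is replacing this blind trial-and-error with a trained policy, so the agent picks strokes deliberately rather than by rejection sampling.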
All of this sounds pretty doable (but increasingly non-trivial and expensive) and the question is mostly - will anybody bother to do something like that?
@coproduct yes i don't think it would fit spirit-wise
but i do not know how to write it into the description precisely enough that people would agree it is not ambiguous. this market is not about whether it can use a text-to-image model and then put the resulting image into photoshop; it is about creating drawings in a human fashion (as was done up until ~2021 or so)
Tasty_y has an interesting concept about training a classifier. when i was making this question, i was thinking of using only vision models as input to the agent, with the agent deciding by itself whether or not the drawing it made qualifies as the final product yet
and i agree that this is mostly about "will someone spend time and money on creating this", but on the other hand, quite a lot of questions about ai capabilities are, with metaculus' "(strong) agi when?" question being the most famous example (although with the caveat that the creators of the system can state that it should be enough to fulfill all requirements)
@coproduct okay, i will
looks like someone else has already made a market with a similar spirit (https://manifold.markets/NoUsernameSelected/will-ai-be-able-to-create-art-in-a-0472aaec7c83), but with a 30-minute time limit, a slightly sooner deadline, and much less strict requirements on image quality