Current image models are terrible at this. When I first get access to DALL-E 3, I will give it 10 basic numerical prompts. For at least 90% of those prompts (9 of the 10), at least 50% of the generated images must show the correct number of objects.
If an object is partially cut off by the edge of the image, it counts towards the total as long as I can clearly tell that it's the correct type of object. If not all of the objects are clearly visible, I'll count the image as correct if it looks like something a reasonable human artist would be likely to draw from the same image description. It's OK if it gets other details of the image wrong; it only needs to get the number of objects correct.
Here are the specific prompts:
9 balls lying on a field
A flock of 40 pigeons
25 eggs in a kitchen
A 7-armed starfish
A football team of 11 players
4 light bulbs on the ceiling
A pack of 8 wolves
A houseplant with exactly 30 leaves
A rectangular grid of 100 dots
An archipelago of 15 islands
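
To make the two thresholds concrete, here is a minimal sketch of how the rule above could be tallied. The function names are hypothetical. The number of images generated per prompt is not stated above, so the four images per prompt in the example are purely an assumption. Whether each image shows the correct count is still a manual judgment call; the code only does the arithmetic.

```python
# Sketch only: helper names are hypothetical, and the number of images
# per prompt is an assumption (the description does not specify it).

def prompt_passes(image_is_correct):
    """True if at least 50% of a prompt's images show the correct count."""
    total = len(image_is_correct)
    if total == 0:
        return False  # no images generated, so nothing can count as correct
    # Integer comparison avoids floating-point edge cases at exactly 50%.
    return 2 * sum(image_is_correct) >= total


def criterion_met(results):
    """True if at least 90% of the prompts pass the per-prompt check."""
    if not results:
        return False
    passed = sum(prompt_passes(flags) for flags in results.values())
    return 100 * passed >= 90 * len(results)


if __name__ == "__main__":
    # Made-up judgments: 9 prompts with 2 of 4 images correct, 1 prompt with 1 of 4.
    example = {f"prompt {i}": [True, True, False, False] for i in range(9)}
    example["prompt 9"] = [True, False, False, False]
    print(criterion_met(example))
```

With 10 prompts, the 90% bar means a single prompt can fail entirely and the criterion is still met; a second failing prompt sinks it.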
That's more than 40.

@firstuserhere For high counts of people, will DALLE-3 be able to keep them from morphing together?
@firstuserhere As high as I can before it becomes difficult to tell how many objects are actually there.