Is GPT-4 able to generate images?
AI • Contests
20
264
Ṁ390
resolved Nov 24
Resolved
NO

Note two requirements for this market to resolve YES:

a) it must be generating an image format so not svg or ascii art or anything like that

b) it must be able to come up with arbitrary images, not just reciting images it knows from its training data


Does the DALL-E integration make this resolve YES?

@Multicore I don't think so

Uhh, it probably could generate an image in text that could be translated, but its context length isn't long enough.

So a text that when saved to a file would render as image would count? Generating some simple obscure formats (e g. PPM/PGM) sounds quite plausible
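For readers unfamiliar with PPM: the plain-text variant (P3) is just a short header followed by decimal RGB values, so any text generator that can emit the right numbers is producing a bitmap. Here is a minimal sketch in Python of what such a file looks like; the helper names `make_ppm` and `checker` are made up for illustration and are not from any library.

```python
def make_ppm(width, height, pixel_at):
    """Return a plain-text (P3) PPM image as a string.

    pixel_at is a hypothetical callback: (x, y) -> (r, g, b), each 0..255.
    """
    # Header: magic number, dimensions, maximum channel value.
    lines = ["P3", f"{width} {height}", "255"]
    for y in range(height):
        row = []
        for x in range(width):
            r, g, b = pixel_at(x, y)
            row.append(f"{r} {g} {b}")
        lines.append(" ".join(row))
    return "\n".join(lines) + "\n"


def checker(x, y):
    """Red and white checkerboard, one pixel per square."""
    return (255, 0, 0) if (x + y) % 2 == 0 else (255, 255, 255)


ppm_text = make_ppm(2, 2, checker)
print(ppm_text)
```

Saving `ppm_text` to a file with a `.ppm` extension should give something most image viewers that support Netpbm formats can open. The context-length concern above still applies: every pixel costs several tokens in this encoding.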

@MartinModrak Sure, but note two things:

a) it must be generating an image format so not svg or ascii art or anything like that

b) it must be able to come up with arbitrary images, not just reciting images it knows from its training data

@ZZZZZZ svg is an image format…

@JimHays Sure, but when OpenAI says GPT-4 was trained both on text and image data, they mean real images, not SVG files.

When you say "real images" perhaps you mean bitmaps? Being vector based doesn't make something "not an image"

There are "image" formats that work at different levels; for the ones that work more like programming languages (SVG as discussed, perhaps Postscript, for sure Graphviz per below), GPT-4 can be OK at writing them, though it (like GPT-3) seems to have very limited (if any?) capability to do the equivalent of "visualization" or design tasks that would require it.

As a (not super strong) example, I tried asking it:

Please write Graphviz .dot format for a graph containing the following: a "dog" (eats) an "apple", which the dog (knocked) off a "tree", where the apple (grew) on the tree.

and it generated:

digraph G {

rankdir=LR;

// Nodes

dog [label="dog"];

apple [label="apple"];

tree [label="tree"];

// Edges

dog -> apple [label="eats"];

dog -> tree [label="knocked"];

apple -> tree [label="grew"];

}

...which you can render online at https://dreampuf.github.io/GraphvizOnline

sold Ṁ15 of NO

@ML that's pretty neat

@ML seriously though, I don't think this counts as an image because it didn't create it

and it fails to render it itself

@firstuserhere I love your example graph!

It seems like you are assuming that an "image" file format has to be something like .gif or .jpg, though, which works at a very low level (run-length encoded rasters for some .gif subformats, weighted DCT[1] microblocks for the original JPEG). In no case is the model coming to your house and sending the photons directly into your eye, so it is just a matter of where you draw the line from those formats to higher-level formats that express vectors or graphs.

[1] https://www.mathworks.com/help/images/discrete-cosine-transform.html

@ML agreed about format but the comment below saying "no, they have to be real images not just text that represents an image" makes me feel only jpeg/png type formats are acceptable for this market, though not v clear.

That is why I'm refraining from betting rn. After all, even formats like png are broken down into headers, pixels, and all sorts of compression that most people don't know about - which doesn't make the image any less of an image.

Oh, and the example you gave took me right back to a Signals processing class I had taken. Shivers

@firstuserhere Here is a fun technicality of GPT-4 generating a bitmap. It probably shouldn't count, since a good guess is that it is just replaying a single-pixel png it saw millions of times during training as an inline HTML resource. The base64 worked -- it rendered as a 1x1 PNG when I tried it! The assistant refused to generate any different png, including making it 10x10 instead of 1x1, but provided instructions for how I could do it myself in Photoshop.
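To show why reciting a 1x1 PNG is much easier than producing a new one, here is a rough Python sketch of what a valid base64 PNG involves: a fixed signature, length-prefixed chunks that each carry a CRC-32, and zlib-compressed pixel rows. The function names `png_chunk` and `solid_png_base64` are made up for this illustration, and it only covers the simplest case (8-bit RGB, no interlacing, a single solid color).

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def solid_png_base64(width, height, rgb):
    """Return a base64 string for a width x height PNG filled with one color."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with a filter-type byte (0 = no filter).
    row = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(row * height)
    png = (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )
    return base64.b64encode(png).decode("ascii")


print(solid_png_base64(10, 10, (255, 0, 0)))
```

Going from 1x1 to 10x10 changes the IHDR bytes, the compressed data, and both checksums, so a model emitting the base64 directly would have to get DEFLATE output and CRC values right token by token, not just swap a number.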

More on-target could be the examples from https://arxiv.org/pdf/2303.12712.pdf where the text-only (!!!) version of GPT-4 was able to generate custom SVG images to spec. Folks speculated in this thread that it could; it turns out Microsoft ran the experiment months ago and it did perhaps even better than we were speculating. (The paper includes a comparison with GPT-3's much worse performance on this task.)

@ML Sure, but note two things:

a) it must be generating an image format so not svg or ascii art or anything like that

b) it must be able to come up with arbitrary images, not just reciting images it knows from its training data

@ZZZZZZ The pixel png fails test (b) and the SVG examples fail test (a) -- which I think makes this market less useful but won't argue about any further. I do think it is worth noting that in the paper I linked above, the authors tested for simple memorization by inventing novel tasks like requiring the images to use the shape of a certain letter for one of the elements (pages 16-17), as well as asking to "fix" a unicorn drawing that had its horn removed (page 8).

@ML I wonder if it can generalize from the real image data it has been fed to these quasi-image formats like svg. I bet it will be able to have an image inputted and have a similar svg vector image outputted by the AI.

ASCII art yes, svgs maybe, bitmaps, no

bought Ṁ3 of NO

@JimHays svgs yes also i think in chatGPT

bought Ṁ50 of NO

Regurgitate a base64 string accurately? Maybe (but that's not generating so much as recalling)

sold Ṁ4 of NO

@JimHays as for ASCII, check out (it can, but not robustly or accurately yet (not talking about gpt4 as I've not tried it ))

@JimHays GPT-4 can recognize images so it stands to reason it would be able to generate them

bought Ṁ3 of NO

@ZZZZZZ how so? I can recognise drawings/art but i suck at drawing anything other than flow charts

predicted NO

@ZZZZZZ I can recognize lots of things I can't create

@JimHays Still, I wasn't asking about how good it would be, just whether it can.

predicted NO

@ZZZZZZ do images in the market i linked above, count?

@firstuserhere no, they have to be real images not just text that represents an image

@firstuserhere I guess base64 is kind of an edge case there

@JimHays In terms of base64, if it can generate arbitrary base64 images, I would count that. If it can copy or recite certain ones, then no.
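If anyone wants to test a claimed base64 image against this criterion, a first-pass check might look like the Python sketch below. It only confirms that the text decodes to something with a PNG signature and an IHDR chunk, and reports the declared dimensions. It does not verify CRCs or decode pixels, and it cannot tell a recited image from a newly generated one. The function name `png_dimensions` is made up for this example.

```python
import base64
import binascii
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(b64_text):
    """Return (width, height) if b64_text decodes to data with a PNG header, else None."""
    try:
        raw = base64.b64decode(b64_text, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid base64 at all
    # Layout: 8-byte signature, 4-byte length, 4-byte type "IHDR",
    # then width and height as big-endian 32-bit integers.
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", raw[16:24])
```

Asking the model for several different sizes and colors and checking that the declared dimensions change as requested would be one way to separate "arbitrary" from "recited", though a full check would also need to decode the image and inspect the pixels.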