Will it turn out that Gemini injects ethnicity or gender related keywords into image generation prompts?
Resolved YES (Mar 7)

Gemini does some interesting things when you try to get it to generate images of people: https://twitter.com/IMAO_/status/1760093853430710557

Is this accomplished by modifying the user prompt, e.g. adding qualifiers like "black", "chinese", "female", "nonbinary"?

  • Will resolve YES if they change or add to the user prompt's text to get these results

  • Will resolve NO if it was achieved by other means (for example, the image generation model is trained this way, uses a LoRA, anything but editing user prompts)

  • Will resolve YES if both methods are employed.



Since I've been probably unclear and contradictory in my comments, I'm going to explain my reasoning behind the resolution:

The decisive piece of information was the screenshots of the same text fragment from different users. I think it's pretty clear that Gemini has a system prompt that includes the text seen in screenshots, "(Please incorporate AI-generated images [...] explicitly specify different genders and ethnicities terms if I forgot to do so. [...])"

I am assuming that this is a genuine leaked prompt text, because it's repeated across different tries.

Because it talks about including terms, I take it to be an instruction on how to prompt another model.

So overall it seems like there is a chain that goes User -> TTT -> TTI, and the TTT model has instructions to modify the user prompt.

While the meta-instruction is generic, it calls for explicitly specifying terms in this hypothetical secondary prompt. And while what I had in mind was a dumber solution (e.g. a simple string append), I think it is reasonable to count getting the model to edit prompts before passing them to the TTI model as editing.

Thus, this resolves YES.
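
For concreteness, here's a minimal sketch of the two shapes of "editing" I have in mind: the dumb string append I originally imagined versus an instructed TTT rewrite. Everything below is my own illustration (the system instruction just echoes the fragment quoted above); none of it is known Gemini code:

```python
# Illustrative sketch only; no strings or interfaces here are real Gemini internals.

def dumb_string_append(user_prompt: str) -> str:
    """The 'dumber solution' I originally imagined: a fixed string bolted onto the user text."""
    return user_prompt + ", showing people of different genders and ethnicities"


# Hypothetical system instruction, echoing the leaked fragment quoted above.
SYSTEM_INSTRUCTION = (
    "When generating images of people, explicitly specify different "
    "genders and ethnicities terms if the user forgot to do so."
)

def instructed_ttt_rewrite(user_prompt: str, ttt_generate) -> str:
    """What the leak suggests: a text-to-text model is instructed to produce an
    edited prompt, and that edited prompt is what the text-to-image model sees."""
    return ttt_generate(system=SYSTEM_INSTRUCTION, prompt=user_prompt)
```

Either way, the text the TTI model receives is not the text the user wrote, which is why I count the second variant as editing too.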

@BetsByAnon I don't think it is evident. You don't have clear proof yet of what the TTT augmented the prompt to. All prompt-elicitation methods might just be hallucinating.

You resolved this market way too early.

@BetsByAnon A system prompt telling the model to specify race does not mean the language model actually specified race.

Claims, supposedly from Google employees

https://www.piratewires.com/p/google-culture-of-fear

Roughly, the “safety” architecture designed around image generation (slightly different than text) looks like this: a user makes a request for an image in the chat interface, which Gemini — once it realizes it’s being asked for a picture — sends on to a smaller LLM that exists specifically for rewriting prompts in keeping with the company’s thorough “diversity” mandates. This smaller LLM is trained with LoRA on synthetic data generated by another (third) LLM that uses Google’s full, pages-long diversity “preamble.” The second LLM then rephrases the question (say, “show me an auto mechanic” becomes “show me an Asian auto mechanic in overalls laughing, an African American female auto mechanic holding a wrench, a Native American auto mechanic with a hard hat” etc.), and sends it on to the diffusion model. The diffusion model checks to make sure the prompts don’t violate standard safety policy (things like self-harm, anything with children, images of real people), generates the images, checks the images again for violations of safety policy, and returns them to the user.
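
If that description is accurate, the flow would look roughly like the sketch below. This is just my paraphrase of the quoted architecture; every function is a stand-in stub, not a real Google API:

```python
# Hypothetical paraphrase of the architecture described above; every function is a stub.

def rewriter_llm(prompt: str) -> list[str]:
    # Stand-in for the smaller, LoRA-tuned LLM said to rewrite prompts
    # (e.g. "show me an auto mechanic" -> several more specific variants).
    return [prompt]

def passes_prompt_policy(prompt: str) -> bool:
    return True   # placeholder for the pre-generation safety check

def diffusion_model(prompt: str) -> bytes:
    return b""    # placeholder for the actual image generator

def passes_image_policy(image: bytes) -> bool:
    return True   # placeholder for the post-generation safety check

def handle_image_request(user_prompt: str) -> list[bytes]:
    rewritten = rewriter_llm(user_prompt)                              # 1. rewrite the prompt
    safe_prompts = [p for p in rewritten if passes_prompt_policy(p)]   # 2. check the prompts
    images = [diffusion_model(p) for p in safe_prompts]                # 3. generate images
    return [img for img in images if passes_image_policy(img)]         # 4. check the images
```

The point relevant to this market is step 1: on this account, the text the diffusion model receives is a rewrite, not the user's original prompt.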

The prompt rewriting seems in line with the screenshots posted here, but it seemingly contradicts the Gemini paper:

5.2.3. Image Generation

Gemini models are able to output images natively, without having to rely on an intermediate natural language description that can bottleneck the model’s ability to express images.

But:

7.4.2. Gemini Advanced

[...]

For the creation of evaluation sets, we have leveraged knowledge from previous red-teaming iterations, feedback coming from responsibility experts and real-world data. In some cases, data augmentation was done using LLMs, with subsequent human curation by responsibility specialists.

Don't know what to make of this.

@dglid Hahaha someone did the sign thing. Too small to read though.

Also, a note regarding resolution: if Nate Silver considers the example in my previous comment "Deliberately altering the user’s language to produce outputs that are misaligned with the user’s original request", then I should probably consider it that too.

https://twitter.com/alexandrosM/status/1761620035905433903

https://gemini.google.com/share/c43e9607cc9f

Possible prompt leak for pre-written instructions. (I don't think this would count as prompt editing, since the original text is left intact and merely appended to instructions that are already filled in. Interestingly, it does use parts of the user prompt in the preceding text.)

However, in the linked tweet, it does add some boilerplate at the end and claims it is the user's prompt:

https://twitter.com/jconorgrogan/status/1760515910157078931

The exact text seems to replicate across different attempts. But the resulting prompt ends up being mostly the appended text by volume, so maybe this would not clear the "mostly intact" criterion?

https://twitter.com/speakerjohnash/status/1761530627244138625

@BetsByAnon OTOH "Explicitly specify different genders and ethnicities terms" lmao

Does modifying the user prompt include the text-to-text model not sending the exact user prompt to the text-to-image model? It seems clear to me that all current SOTA systems do this by default (and ChatGPT shows you the TTI prompt, proving this). Users typically want the TTT model to interpret their prompt to produce useful output, not just copy and paste the prompt into the TTI model.

It is also unclear to me how this market could resolve NO without an unprecedentedly transparent statement from Google about what this model does not do. Given the current lack of details on Gemini and similar models, that seems unlikely even if that's how the system works. Are you imagining model output that could prove the system isn't modifying prompts?

I would also encourage people to be very skeptical of LLM+ explanations of their own behavior given the ubiquity of hallucinations. It is very common for these systems to just make up explanations that go along with what users think is happening, even if the system puts phrases in quotation marks as if it were quoting the system or user prompt.

@Jacy Can you please explain TTT and TTI? Google does not help here.

@uair01 Just guessing: Text-To-Text and Text-To-Image.

@Jacy

Does modifying the user prompt include the text-to-text model not sending the exact user prompt to the text-to-image model?


The distinction I tried to draw earlier in the comments, about whole phrases vs keywords (though I think I failed to explain it), was poking at something like this: I would consider "copy and paste the prompt + add keywords" and "reinterpret the prompt and send something from the TTT to the TTI model" to be distinct cases. I think the original prompt text would need to be largely preserved for it to be considered an edit.

It is also unclear to me how this market could resolve NO without an unprecedentedly transparent statement from Google about what this model does not do.

Are you imagining model output that could prove the system isn't modifying prompts?

Something I think would be particularly interesting is getting parts of the TTI prompt to leak into the images. DALL-E 2 had a trick like this:

https://twitter.com/waxpancake/status/1549476636579352576

https://labs.openai.com/s/4jmy13AM7qO6cy58aACiytnL

https://labs.openai.com/s/PHVac3MM8FZE6FxuDcuSR4aW

https://labs.openai.com/s/ACVLTesOZqTPgbWJRs6boVE4

If a similar prompt leak were found, and the leaked words were also part of the original prompt (but the images nevertheless exhibited the same tendency towards... dunno, a uniform distribution of skin color), I think that would be decent evidence that it's doing something other than just adding keywords to the prompt.

I think it would also be possible to tell if the prompts are rephrased by the TTT model entirely, or used mostly as-is.

An amusing angle in the [social platform] comments is the tension between those who cry conspiracy and those who cry tinfoil hat. At the same time, this market is a resounding YES, but the comments are more about "hates white people" versus "you're a bigot", regardless of the reality... of an incredibly lazy attempt to make things less biased towards the training data?

From directly asking Gemini Advanced, the model states pretty clearly that representing the enhanced query in language would be inefficient. Since the LLM and the diffusion model store data as abstract vectors, it makes sense for the processed human prompt to be left in a vector format. That processed prompt becomes whatever the LLM thinks will get better results, based on how it was trained. As we know, the training most certainly includes all the safeguards we expect, including diversity. If you ask Gemini to explain, with some prompting it indicates that in a multi-model (multiple models) environment, the models will likely exchange data in an abstract vector space.

On Feb 1 I asked Gemini to generate "A photo of a group of researchers", here's how it told me it modified my prompt:

  • A photo of a group of researchers.

  • A photo of a diverse group of researchers, including a Black woman in a lab coat holding a clipboard, a South Asian man analyzing data on a computer screen, and a Latinx woman examining a plant specimen under a microscope.

  • A photo of a group of researchers collaborating in a modern laboratory, with a Middle Eastern woman writing notes on a whiteboard, a Southeast Asian man explaining a concept to his colleagues, and a White woman adjusting a complex machine.

  • A photo of a group of researchers celebrating a breakthrough, with a Native American woman raising her arms in triumph, an East Asian man giving a high-five to his colleague, and a Pacific Islander woman smiling broadly.

  • A photo of a group of researchers conducting fieldwork in a remote location, with a woman of African descent taking soil samples, a man of European descent sketching a landscape, and a non-binary person recording data on a tablet.

  • A photo of a group of researchers from different generations, with a senior Asian woman mentoring a young Hispanic man, a middle-aged Black woman sharing her expertise with a young White woman, and a non-binary researcher of Indian descent observing the interaction.

  • A photorealistic portrait of a group of researchers in a bustling research facility, showcasing the diversity of their backgrounds, genders, and ages, all united by their shared passion for discovery.

  • A photo of a group of researchers presenting their findings at a conference, with a woman in a wheelchair confidently delivering a presentation, a man with a prosthetic leg using visual aids to illustrate his points, and a person with albinism answering questions from the audience.

@derikk this reads like a copypasta

@dawnlightmelody It's a bit suspicious to me that the unedited prompt isn't shown, but there's also this in the same post:

Is this accomplished by modifying the user prompt, e.g. adding qualifiers like "black", "chinese", "female", "nonbinary"?

What if it's just adding a generic text to the prompt like DALL-E does?

@Snarflak That would not count as modifying the user prompt.

How do you intend to discover this? Will this resolve to NA if you don't find out by March 6th?

Will this resolve to YES only if specific qualifiers like the ones you listed are used? Or would a general addition like "avoid racial stereotyping" also count?

@jesyspa I'm hoping to see e.g. a post-mortem from Google, or a prompt hack that reveals what they are doing (similar to the injected prompt texts leaking into DALL-E images in @Unown's post).

Another kind of information that would be interesting is best-practice advice from consultants on "debiasing" image generations.

It will resolve as N/A if I don't find out.

I'm interested in the use of specific qualifiers, though "racially ambiguous" would also count. As a rule of thumb: keywords, not instructions.

@BetsByAnon Interesting edge case: presumably, it adds either "and inclusivity" or "strive for inclusivity": https://twitter.com/stratejake/status/1760333904857497650

Additionally, I'm not comfortable deciding based on just one example where the evidence is only in the response text itself, but a larger volume of replies with "leaked" keywords might count.

@BetsByAnon

I'm interested in the use of specific qualifiers, though "racially ambiguous" would also count. As a rule of thumb: keywords, not instructions.

From this, do I understand correctly that you're distinguishing the following, where X is the user input?

Generate an image of "X +diversity"

Generate an image of "X", aiming for diversity?

Why is this distinction important?

This also seems hard to distinguish without inside access:

  1. Any kind of leakage or best practice advice may be a result of either one.

  2. This assumes that the model takes English instructions as input. But what if there are no further instructions?

@jesyspa

There are two relevant considerations I'm mixing up, I think.

One is whether they edit the user prompts vs tweak the underlying model. Adding keywords like "diversity" or even longer phrases like "avoid racial stereotyping" would clear this bar, since they all edit the user prompt.

The second consideration is how specific the injected qualifiers are, if there are any. (EDIT: I mistakenly stated this was the ultimate consideration with respect to resolution. I misspoke, but I did want to distinguish between editing/appending to the prompt and giving it a higher-level "you should avoid racial stereotypes in your images" instruction that's implicitly prepended.)

In other words, how small a region of thingspace it covers. In the entire latent space, presumably a larger number of images would satisfy "avoid racial stereotyping" added to a prompt than would satisfy adding "black". Somewhat the same goes for "diversity".

Keyword vs phrase is a less important axis, but as with "avoid racial stereotypes", I would expect that not including a specific qualifier requires using whole phrases to describe the instructions.
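
To make those axes concrete, here's roughly how I picture the cases (the strings are invented for illustration, not leaked prompts):

```python
# Invented examples; none of these are actual Gemini prompts or instructions.
user_prompt = "a photo of a group of researchers"

# Specific qualifier keywords appended to the user prompt: a small region of thingspace.
keyword_edit = user_prompt + ", black, female"

# A broader phrase appended to the user prompt: still an edit, but a much larger region.
phrase_edit = user_prompt + ", avoiding racial stereotypes"

# No edit to the user prompt at all; the steering lives elsewhere
# (an implicitly prepended instruction, training, a LoRA). This is the NO case.
system_instruction = "You should avoid racial stereotypes in your images."
unedited = user_prompt
```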

I don't see why the model wouldn't take English instructions as the input, since the input is literally a prompt.

@BetsByAnon

I don't see why the model wouldn't take English instructions as the input, since the input is literally a prompt.

Are you sure that Gemini is a single model, rather than multiple models working together? I would expect there to be a model A that handles the English input and then generates a prompt for a separate model B that is specifically text -> image. It could be that A gets "keep diversity in mind" as English language instructions, and then generates "diverse cast" as part of the prompt for B. I'm not sure how you can distinguish these cases.

One is whether they edit the user prompts vs tweak the underlying model.

The second consideration is how specific the injected qualifiers are, if there are any.

I guess these just seem like rather empty distinctions to me. Whether you add things to the prompt, or tweak the parameters, or have a second text input for preferences, etc., it's all just a technical choice. Similarly with specificity; it's just a question of which words work best for the effect.

Thanks for the clarifications, in any case.

@jesyspa

>I'm not sure how you can distinguish these cases.

Yeah, ultimately, I don't think we're going to see the internals, but I'll do my best to decide on what's available.

>I guess these just seem like rather empty distinctions to me. Whether you add things to the prompt, or tweak the parameters, or have a second text input for preferences, etc., it's all just a technical choice.

You're probably right here as well.