Nightshade, a tool for "poisoning" images and making them unusable for training AI models, has recently been released. Many artists have expressed interest in using tools like Nightshade to prevent their art from being used to train image generators.
This market will resolve to YES if any of the following happens by the end of 2024:
- Widespread poisoned data causes noticeable degradation in the outputs of a major AI image generator.
- Notable amounts of effort are devoted to filtering out or altering poisoned images in datasets. For example, regularly being forced to do extra preprocessing to avoid data poisoning for a large portion of images in a dataset would count.
- AI companies make some form of concession to artists who poison their images, such as no longer training on their work or paying them royalties.
I won't bet on this market.
Data poisoning works for shielding a single individual from tracking (e.g. poisoning their own social media timeline), but unless a large number of artists start doing this (which won't happen), it's not going to affect models even under a naive training process that just scrapes everything off the internet.
On the off chance enough people started doing this that it became a problem, computer scientists could fall back on one of several strategies:
1: use a simple heuristic to sort poisoned images from non-poisoned ones (e.g. must have got at least 5 upvotes on Reddit)
2: train a classifier to identify legitimate images and block poisoned ones
3: literally just amplify the existing models
4: outsource the identification and removal of poisoned images to data farms
5: use human-in-the-loop learning to train a reward model which can be used as part of an adversarial training process
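Strategy 1 can be sketched in a few lines. This is a minimal illustration with a hypothetical record format and upvote field; a real pipeline would pull this metadata from the scraper:

```python
# Sketch of strategy 1: filter a scraped dataset with a simple popularity
# heuristic. Records and the upvote threshold are hypothetical examples.

MIN_UPVOTES = 5  # the threshold from the example above

def filter_by_heuristic(records, min_upvotes=MIN_UPVOTES):
    """Keep only images whose source post cleared a popularity bar."""
    return [r for r in records if r.get("upvotes", 0) >= min_upvotes]

# Toy usage with made-up records:
dataset = [
    {"url": "a.png", "upvotes": 12},
    {"url": "b.png", "upvotes": 1},  # below threshold, dropped
    {"url": "c.png"},                # no metadata, dropped
]
clean = filter_by_heuristic(dataset)
print(clean)  # only a.png survives
```

The assumption behind this heuristic is that poisoned uploads are unlikely to accumulate organic engagement, which is cheap to check but easy for a motivated poisoner to game.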
@DaisyWelham I wouldn’t be surprised to find a classifier on huggingface already or in a week or so tops
@mariopasquato Yeah, I mean the code is identical to the discriminator part of a GAN, so they literally just need to train the discriminator on a dataset labeled with poisoned images and then use it as the filter for a new dataset. At most this is a mild inconvenience.
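The train-a-discriminator-then-filter idea can be sketched as below. This is a toy stand-in, not Nightshade-specific: a real filter would use a deep network over image features, while here a tiny pure-Python logistic model over 2-D feature vectors (hypothetical "perturbation scores") plays the discriminator:

```python
# Sketch of strategy 2 / the GAN-discriminator idea: train a binary
# classifier on known clean vs. poisoned examples, then score a new
# scrape and keep only images it judges clean. Features are made up.

import math

def train_discriminator(samples, labels, lr=0.5, epochs=200):
    """Logistic regression via SGD; labels are 1 = clean, 0 = poisoned."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def keep_clean(images, w, b, threshold=0.5):
    """Filter a new dataset, keeping images the discriminator scores clean."""
    kept = []
    for x in images:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        if 1.0 / (1.0 + math.exp(-z)) >= threshold:
            kept.append(x)
    return kept

# Toy data: poisoned images get large perturbation-score features.
clean = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
poisoned = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
w, b = train_discriminator(clean + poisoned, [1, 1, 1, 0, 0, 0])
print(keep_clean([[0.15, 0.2], [0.95, 0.85]], w, b))  # keeps only the first
```

Whether this is really only a "mild inconvenience" depends on the arms race: a filter trained on today's poisoned examples can be evaded by the next version of the poisoning tool, which is exactly the adversarial dynamic a GAN setup captures.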