Will there be evidence of large scale data pollution operations by the end of 2025?
Resolved YES on Jan 5

Considering that:


a) data pollution (large scale injection of AI generated data into the information space) and subsequent model collapse have been identified [1] as potential threats [2] for future LLMs and
b) advanced AI models will impact the geopolitical power distribution [3] and therefore be increasingly subject to geostrategic contention [4],

Do you believe that by the end of 2025, there will be evidence of large scale organized data pollution operations by state or non-state actors, taking or having taken place, with the implicit or explicit goal of degrading the performance of future LLMs?

Resolution:

This market will resolve as YES if at any point before 01/01/2026 credible information will emerge that a deliberate data pollution operation by any actor (state/non-state) for any reason (geopolitical contestation, ideology, terrorism, lulz) has taken place.

Caveat: the operation must be/have been significant enough to warrant mention by a reputable news source (e.g. the NYT, WSJ, WP, BBC etc.), a government communication, a peer-reviewed scientific publication, a reputable threat intelligence service provider and/or other reputable sources not covered in this list.

I reserve the right to final judgement.



While I am inclined to resolve YES, I also understand the case for resolving NO. I found it hard to come to a decision I could stand by, so I asked a bunch of AI models by simply pasting the market description. They would resolve the market as follows:

Free ChatGPT model: NO

Grok Expert: YES
Gemini 3 Pro: YES

Claude Sonnet 3.5: YES

I therefore resolve it as YES and take home as a lesson that I won't create any further markets which cannot be resolved through clear binary outcomes determined by someone else.

@Symmetry What's the credible information that a deliberate data pollution operation happened, on which the market was resolved as YES?

@Symmetry Also, why are you inclined to resolve YES?

Inclined to resolve YES because of the following article. Any disagreement? @traders

https://manifold.markets/post/will-there-be-evidence-of-large-sca?r=U3ltbWV0cnk

@Symmetry I'll stick to my guns personally and take the hit

@Symmetry I do. Kindly read the Washington Post article. There is no credible information of intentional data pollution mentioned in it.

The talk seems mostly speculative imo. Now, I'm not disagreeing that data pollution is real, but the article itself is basically saying "propaganda exists, we feed everything to AI, AI can't tell the difference, therefore this is data pollution."

There is no... proof?

The post does not prove Russia has intentionally “poisoned” LLMs at scale.

What it shows is:

  1. Russian/pro-Kremlin networks flood the web with low-quality content (documented).

  2. LLMs can echo dominant web content in data-poor areas.

But several researchers (e.g. HKS Misinformation Review) argue this is better explained by data voids and weak source curation, not deliberate, reliable AI grooming.

So it could be intentional data pollution, but claims of demonstrated, large-scale AI manipulation are currently assumptions.

In other words:

This article doesn't provide credible information that a deliberate data pollution operation did indeed happen.

@theScalper @mods

Hello! Sorry for the inconvenience.

I just noticed that the market creator might have missed my comment. Furthermore, they resolved the market by asking 4 different AI models, pasting only the market criteria and not an article that provides credible information of deliberate data pollution.

I understand that the market creator reserves the right to final judgement. But not providing an article that shows at least SOMEWHAT credible information, and resolving using ONLY 4 AI models, goes against the market's criteria for resolving YES.

This market will resolve as YES if at any point before 01/01/2026 _credible_ information will emerge that a _deliberate_ data pollution operation by any actor (state/non-state) for any reason (geopolitical contestation, ideology, terrorism, lulz) has taken place.

For me, the issue is whether the goal is really to cause performance degradation, or rather to inject specific results without overall degradation.

Which page?

@ItsMe you can search for Nightshade which is emphasized in the paper as requiring very little sample share to have its intended negative effect.

@Panfilo Wouldn't that be the opposite, surgical rather than large scale?

@JussiVilleHeiskanen No, it's effective on a large scale with a small "dose." I don't think this market is about how efficient the pollution is, and I certainly don't think it's about only inefficient pollution, but I could be wrong!

@Panfilo "large scale organized data pollution operations" -- from the description. A single site isn't that organized an operation.

@JussiVilleHeiskanen Then was there a reason you don't think the WaPo Russia story in the further back comments counts?

I presume I am missing something, but I just can't seem to get it😭

Regardless of whether one believes that there is data pollution or not, this market is about the credible mention of such an operation by a reputable source. And so my question is, why are the odds so high?

opened a Ṁ2,500 YES order at 51%

@theScalper Bet more, comrade!

bought Ṁ95 YES

If there were large-scale text data pollution operations, you'd expect some metric like this to be going down or at least stagnating, not going up.

@Dulaman Those models are scaling up faster to overcome those obstacles.

Those things do happen, but they are kept out of the media to avoid the Streisand effect.

© Manifold Markets, Inc.