Help me find a good and easy Spam filter solution for Manifold
Ṁ1,935 / 2,000 bounty left

Look at manifold.markets/admin/reports - the vast majority of reported content is automated advertising from newly created accounts. Often in foreign languages, low quality, SEO-bait crap. AI and traditional spam filtering should be really good at catching this stuff.

I'm looking for

  • relatively cheap to use and integrate (otherwise @SirSalty will just keep doing it by hand for now)

  • false positive and false negative rates low enough that it's actually useful

    • I just want to auto filter spam, not other content. Other stuff still needs a human touch I think.

  • you can propose alternate solutions but keep in mind we want the experience for new users to still be good

to be clear, I'm okay with you guys promoting your own startup / soundcloud because you are real and make real questions. The crux is the constant creation of new accounts and the extremely poor quality.

Rewards

20 mana for good comments, 2000 for being the first to link me to something good that actually works, and fractional rewards for fractional help.

+Ṁ20

Can you implement a very simple "if multiple reports within some frequency, hide from front page for some duration until an admin can review" approach? Seems like it could be a good way to leverage superusers to clean up the experience for less-engaged/newer users?
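
A minimal sketch of that rule, assuming an in-memory report log (the threshold, window, and field names here are made up for illustration):

from collections import defaultdict
from datetime import datetime, timedelta

REPORT_THRESHOLD = 3
WINDOW = timedelta(hours=24)

reports = defaultdict(list)  # market id -> recent report timestamps
hidden = set()               # markets pulled from the front page pending review

def record_report(market_id, now=None):
    now = now or datetime.utcnow()
    # keep only reports inside the rolling window
    reports[market_id] = [t for t in reports[market_id] if now - t < WINDOW]
    reports[market_id].append(now)
    if len(reports[market_id]) >= REPORT_THRESHOLD:
        hidden.add(market_id)  # stays hidden until an admin reviews it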

+Ṁ20

I work at a T&S (trust & safety) company, and while this isn't exactly my area of expertise, what I would recommend right now is to establish some rules that can be automated. Then backtest those rules to see whether they are reliable enough and don't catch too many false positives. An example could be:
User is less than 1 day old & has no bets & market contains a link.
You can use information about their email, IP, device, etc. to find groups of those entities that are suspicious and auto-block those as well.
You should also consider whether you want to simply block these users from creating markets, or to shadowban them, which would make it significantly more difficult for spammers to circumvent prevention measures. You can just hide the markets that fit these criteria from showing up on the front page and in searches without giving the user an error.
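
A minimal sketch of that first rule; the `user` and `market` dicts and their field names are assumptions, not the actual Manifold schema:

from datetime import datetime, timedelta
import re

LINK_RE = re.compile(r"https?://")

def looks_like_spam(user, market, now=None):
    # rule: account under 1 day old, zero bets, and an offsite link
    now = now or datetime.utcnow()
    return (
        now - user["created_at"] < timedelta(days=1)
        and user["bet_count"] == 0
        and bool(LINK_RE.search(market["description"]))
    )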

I also recommend you stay away from regex filters or naive ML solutions, as these can have a high false positive rate. Stick to understandable rules, and when the time is right to develop a full spam solution, look for a vendor that can help.

+Ṁ5

Why not just use gpt-3.5-turbo with an "is this comment spam? Respond with only Yes or No." prompt and logit bias to force yes/no answers? You'd also want to decrease temperature. At the current commenting rates I imagine it'll cost <$3/day.

Say 20k comments/day with 100 tokens on average. At $0.0015/1K prompt tokens that's $3/day.

You'd need to check that it works as a spam filter first though and sanity check my $3/day figure with better numbers. Also maybe you can do way better than 3.5 with some other solution.
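
An untested sketch using the OpenAI Python SDK and tiktoken; the bias value of 100 and the exact prompt are guesses you'd want to validate against real reported content:

import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# look up the token ids for "Yes" and "No" so decoding can be biased toward them
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
yes_id = enc.encode("Yes")[0]
no_id = enc.encode("No")[0]

def is_spam(comment):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        max_tokens=1,  # one token is enough for "Yes" or "No"
        logit_bias={str(yes_id): 100, str(no_id): 100},
        messages=[
            {"role": "system",
             "content": "Is this comment spam? Respond with only Yes or No."},
            {"role": "user", "content": comment},
        ],
    )
    return resp.choices[0].message.content.strip() == "Yes"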

I don't have links to out-of-the-box solutions, but I do have an identification strategy based on current spam trends (a rough scoring sketch follows these lists):

  • offsite link in description, usually just a single one

  • no trades in user history

  • no manually added topics (unsure if that can be detected in creation flow)

these spam questions also often have the following in common:

  • no trades from other users

  • no (or very very few) question marks in title or description

  • user has only one or two created questions

  • reads as generated marketing copy (anti-AI detection tools might be useful), and only presents a single perspective
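
These signals compose naturally into a score. A rough sketch; every field name on `market` is an assumption, and the point values and threshold are arbitrary:

import re

LINK_RE = re.compile(r"https?://\S+")

def spam_score(market):
    # one or two points per signal from the lists above
    score = 0
    text = market["title"] + " " + market["description"]
    if len(LINK_RE.findall(market["description"])) == 1:
        score += 2  # exactly one offsite link in the description
    if market["creator_trade_count"] == 0:
        score += 2  # creator has never traded
    if not market["manual_topics"]:
        score += 1  # no manually added topics
    if market["trader_count"] == 0:
        score += 1  # no trades from other users
    if "?" not in text:
        score += 1  # no question marks anywhere
    if market["creator_market_count"] <= 2:
        score += 1  # creator has only one or two questions
    return score  # e.g. queue for review at score >= 5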

You just have to know how to respond to the spambots to shut them up. The spammers are filled with lies, but they get real quiet when you start telling the truth.

here's some Python code I just wrote on the spot:

import requests

# hardcoded list of bad words seen in recent spam titles
bad_words = ["erectile dysfunction", "nhà cái", "cược"]

# fetch recent markets from the public API
r = requests.get("https://api.manifold.markets/v0/markets")
r.raise_for_status()
result = r.json()

# flag any market whose title contains a bad word
for market in result:
    title = market["question"].lower()
    for term in bad_words:
        if term in title:
            print(market["url"])

I know some forums I frequent block new users from posting external links for a while. While this would degrade the new-user experience somewhat, I don't think it breaks any usage of the site. It would also remove a lot of the incentive behind spamming.

I'll also +1 the suggestion of multiple reports -> blacklisted, requiring admin approval to be visible again.

The quick and easy way to do it, imo, would probably just be to have a list of regex/keyword based “rules” for auto-classifying content that David can add to. That’s simpler than a classifier and probably gets at what you want with “David’s time, but scaled” anyway. Plus it sets you up for more complicated rules later. That way whenever he spots a pattern that’s a drain on his time, he adds a rule, and he’s responsible for maintaining high precision rules.
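
Something like a labeled pattern list would be enough to start; the rules below are placeholders, not suggestions for the actual list:

import re

# each rule is a label plus a pattern an admin can extend
RULES = [
    ("vietnamese-betting", re.compile(r"nhà cái|cược", re.IGNORECASE)),
    ("pharma-spam", re.compile(r"erectile dysfunction|viagra", re.IGNORECASE)),
]

def matching_rules(text):
    return [name for name, pattern in RULES if pattern.search(text)]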

There is a more general setup that is analogous to @DaveK's comment that is used at several social media companies. It might be too much effort to set up here, but you define signals (at manifold scale queries are probably fine) on spam-relevant behavior like account age, report history, posting frequency, admin action history, etc. You then define enforcement endpoints (rate limits, bans, checkpoint approvals, etc). From there you can set up any "rules" you want based on your experience with the data and track the precision/recall achieved by each rule.
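
A sketch of that shape, with signals as a plain dict and rules mapping conditions to enforcement actions (all signal names and thresholds here are invented for illustration):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # evaluated against a dict of signals
    action: str                        # e.g. "rate_limit", "checkpoint", "ban"

RULES = [
    Rule("new-account-link-burst",
         lambda s: s["account_age_days"] < 1 and s["markets_last_hour"] > 3,
         "rate_limit"),
    Rule("repeat-offender",
         lambda s: s["prior_admin_actions"] >= 2,
         "checkpoint"),
]

def enforce(signals):
    # log (rule name, outcome) pairs to track each rule's precision/recall
    return [(r.name, r.action) for r in RULES if r.condition(signals)]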

I think this is best used in combination with content classification. There are several companies that do this as a service (ActiveFence, Toloka). I have no experience with these in particular and integrating could be more trouble than a quick gpt model or whatever, but I suspect manifold T&S is going to be an ongoing battle as spammers tend to adversarially adapt to stuff if they find the target otherwise fruitful, so it could be worth it in the long term to invest in a 3rd party handling it.

More tangential, but the folks at Cinder (cinder.co) are top notch if you're interested in what they provide for trust & safety.

Just ideas here, not a full solution.

Obviously, there's some low-hanging fruit for at least filtering these users based on account characteristics (e.g. not having bet at all, only having 1 market, etc.). But you probably can't use those signals in isolation. (And also, the spammers could easily change their strategy to bypass that.)

If you want to implement your own ML thing (based on the title and description of the markets) for spam detection, you could look into Bag of Words (simple) or TF-IDF (slightly more complicated). I imagine these would have pretty high false positive and false negative rates if used in isolation, though.

The biggest thing that comes to mind is using an LLM. (I also don't have any experience with LLM APIs, so these are purely theoretical.)

ML ideas from easiest to implement to hardest:
- Bag of Words spam detection. Probably pretty high false positive and false negative rates, but very easy to implement.
- TF-IDF for spam detection. It's like a slightly more advanced version of Bag of Words, but without an LLM it's probably the best you can do as far as ML goes (a minimal sketch follows this list). (Also, it's just an interesting concept if you want to read more: https://www.sciencedirect.com/science/article/pii/S1877050919318617. The paper claims 97.5% accuracy.)
- Use the API for an out-of-the-box LLM (e.g. GPT-4) for sentiment analysis. Ads generally have positive sentiment, whereas most markets posted by actual users have a neutral or uncertain sentiment. (I tried asking whether a message was spam, but the LLM seems much less sure when asked about spam than when asked for sentiment.)
- Fine-tune an LLM for checking for spam (using the data you currently have). I imagine the outputs would have very high accuracy, and it should also be relatively cheap with LoRA. I don't know what the pricing is for those APIs though, since I've never used them.
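
For the TF-IDF route, scikit-learn makes this a few lines. A toy sketch; the two training examples are stand-ins for a real labeled set pulled from the existing report queue:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy labels; in practice the report queue provides the training data
texts = [
    "Best betting site, huge bonus, click here now",
    "Will the Fed cut rates before June?",
]
labels = [1, 0]  # 1 = spam, 0 = legitimate

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["huge bonus, best site, click here"]))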