Will a major AI lab claim to use activation steering in its main chat assistant by EOY 2025?

resolved Jan 3

Resolved

Also includes methods inspired by activation steering, as long as they don't use any gradient descent step.

Only includes announcements about the main chat assistant (e.g. Claude, ChatGPT, Bard, ...) of a major AI lab (OpenAI, Google DeepMind, Anthropic, Meta, Inflection or Mistral).

Does not include fine-tuning API endpoints.

Market context

Technical AI Timelines

🏅 Top traders

#	Trader	Total profit
1		Ṁ48
2		Ṁ36
3		Ṁ19
4		Ṁ16
5		Ṁ11

2 Comments

14 Holders

21 Trades

Anthropic found two features (auto-labeled "Neutrality and impartiality" and "Multiple perspectives and balance") that improve BBQ benchmark scores.

According to Nathan Labenz on the Future of Life Institute Podcast, Anthropic is piloting custom activation steering in limited beta (make-your-own Golden-Gate-Claude).