(5000M subsidy) Will I (porby) think "goal agnosticism" as a concept is still relevant/useful at the end of 2024?

I currently think goal agnostic systems, particularly a subset of predictors, have really nice foundational properties that give us a path to practically usable extreme capability without autodoom.

Some (beefy) background:
FAQ: What the heck is goal agnosticism? — LessWrong

Using predictors in corrigible systems — LessWrong

Resolves yes if on January 1, 2025:

I still agree with the core arguments underlying goal agnosticism, how it can be used, and how it is likely to scale.
I still think that AI research is on a path that makes roughly goal agnostic foundations a reasonable expectation: not guaranteed, but >15%-ish chance. (Current estimate: ~87%)

Note that resolving yes does not require that I am still working on things related to goal agnosticism.

Some example ways this could resolve no:

An experiment shows that a simple, current-style autoregressive single-token predictive loss over a reasonably broad training distribution still allows unconditional preferences over world states. For example, a model "wanting to predict well" rather than just "predicting well" could lead to locally loss-increasing steganography.
The industry finds an easier path to extreme capability that doesn't lend itself to goal agnosticism. For example, if someone manages to make end-to-end reinforcement learning on a sparse, distant reward (no predictive world model helping out, no reward shaping, etc.) work reliably and for 10,000x less compute than an equivalent predictor-backed system, I'd probably be forced to downgrade the probability of goal agnostic systems a lot. Also, we'd probably explode.
I somehow become convinced that the fuzzier parts, like the degree to which we can reliably aim a strong system at useful things, differ from what I thought in a way that makes the approach useless.
