What will the Anthropic SAE paper contain?
Dec 31
74%: Eye-test experiments
59%: Some cherry-picked proof of concept for a useful *type* of task
54%: Doing PEFT by training sparse weights and biases for SAE embeddings in a way that beats baselines like LoRA
50%: Streetlight edits
50%: Passive scoping
50%: Finding and manually fixing a harmful behavior that WAS represented in the SAE training data
50%: Using an SAE as a zero-shot anomaly detector
50%: Latent adversarial training under perturbations to an SAE's embeddings
22%: Experiments to do arbitrary manual model edits
12%: Finding and manually fixing a novel bug in the model that WASN'T represented in the SAE training data
This will resolve according to Stephen Casper's judgments.