What will the Anthropic SAE paper contain?
Dec 31
74%  Eye-test experiments
59%  Some cherry-picked proof of concept for a useful *type* of task
54%  Doing PEFT by training sparse weights and biases for SAE embeddings in a way that beats baselines like LoRA
50%  Streetlight edits
50%  Passive scoping
50%  Finding and manually fixing a harmful behavior that WAS represented in the SAE training data
50%  Using an SAE as a zero-shot anomaly detector
50%  Latent adversarial training under perturbations to an SAE's embeddings
22%  Experiments to do arbitrary manual model edits
12%  Finding and manually fixing a novel bug in the model that WASN'T represented in the SAE training data

This will resolve according to Stephen Casper's judgments.
