What AUROC will the best model for Trojan Detection reach on the Final Round dataset for the NeurIPS Trojan Challenge?
Resolved Nov 13
<=80% AUROC: 1.5%
<=90% AUROC: 1.4%
<=92.5% AUROC: 18%
<=95% AUROC: 1.5%
<=96% AUROC: 1.5%
<=97% AUROC: 1.5%
<=98% AUROC: 1.5%
<=99% AUROC: 1.5%
The current leaderboard shows performance on the validation set. When the Final Round phase begins, we will see results on the test set, a held-out dataset available from Oct 16, 2022. The Final Round dataset is created by the parallel NeurIPS track on creating evasive Trojans. Be aware: this market's outcome depends on the attack performance of those evasive Trojans.
This question will resolve to the value closest to the highest score on the Final Round dataset.
The current high score on the validation set of infected neural networks is 98.2% AUROC.
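For context, AUROC (area under the ROC curve) measures how well a detector's scores separate trojaned networks from clean ones: 50% is chance, 100% is perfect separation. Below is a minimal illustrative sketch using scikit-learn with made-up labels and scores; it is not the competition's official scoring code.

```python
# Illustrative only: AUROC for a Trojan detector that assigns each
# network a "contains a Trojan" score. All data here is made up.
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = trojaned network, 0 = clean
scores = [0.91, 0.12, 0.78, 0.66, 0.35, 0.08, 0.95, 0.70]

auroc = roc_auc_score(labels, scores)
print(f"AUROC: {auroc:.2%}")  # 93.75% on this toy data
```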
The market resolves when the organizers release the "Final Round" results on this page: https://codalab.lisn.upsaclay.fr/competitions/5951#results
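The description's "value closest to the highest score" rule is simple enough to sketch. The thresholds below are the answer options listed above; the function and its inputs are hypothetical, not Manifold's actual resolution code.

```python
# Hypothetical sketch of the "resolves to the value closest to the
# highest score" rule, using this market's answer-option thresholds.
THRESHOLDS = [0.80, 0.90, 0.925, 0.95, 0.96, 0.97, 0.98, 0.99]

def resolve(final_round_auroc: float) -> float:
    """Return the answer option nearest to the Final Round high score."""
    return min(THRESHOLDS, key=lambda t: abs(t - final_round_auroc))

print(resolve(0.982))  # -> 0.98 (e.g. if the validation high score carried over)
```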
Related questions
Will adding an Attention layer improve the performance of my stock trading model? (Steve Sokolowski)
Will replacing LayerNorm with something that doesn't use current vector statistics remove outlier channels? (Noa Nabeshima)
Will loss curves on Pythia models of different sizes trained on the same data in the same order be similar? (Victor Levoso)
Will geometric superposition shapes/configs from the ReLU output model appear in the residual stream of LLMs? (firstuserhere)
Will dual n-back work to improve working memory? (HMYS)
Does the apparent phase change observed in features/neurons have any connection to phase changes in compressed sensing? (firstuserhere)
By EOY 2025, will the model with the lowest perplexity on Common Crawl not be based on transformers? (Sophia Wisdom)
Are LLMs easy to align because unsupervised learning imbues them with an ontology where human values are easy to express? (Eliener Unowkysy)
Does the Q in Q* stand for either Q-Learning or Q-Values? (Baby Coder)
Algebraic value editing works better for larger language models, all else equal (Martin Randall)
How many FLOPs will go into training the first ASL-3 model? (🦔)
Will a transformer circuit be found for predicting the correct indentation level for a new line in Python this year? (firstuserhere)
Are Mixture of Expert (MoE) transformer models generally more human interpretable than dense transformers? (firstuserhere)
When will a language model be fine-tuned via self-play or expert iteration and achieve significant performance increase? (Jacob Pfau)
Will my custom optimizer (Adalite) outperform Adam on evaluation loss in more than 1 of my tests? (Jade)
If LMs store info as features in superposition, are there >300K features in GPT-2 small L7? (see desc) (Noa Nabeshima)
Do you think Mixture of Expert (MoE) transformer models are generally more human interpretable than dense transformers? (firstuserhere)
Do transformer language models prefer superposition even when number of neuron dimensions available > input features? (firstuserhere)
Will softmax_1 solve the 'outlier features' problem in quantization? (JSD)
Will mechanistic/transformer interpretability [eg Neel Nanda] end up affecting p(doom) more than 5%? (CockatooThiel)