What AUROC will the best model for Trojan Detection reach on the Final Round dataset for the NeurIPS Trojan Challenge?
Ṁ18 · Resolved Nov 13
<=90% AUROC: 1.4%
<=92.5% AUROC: 18%
<=95% AUROC: 1.5%
<=96% AUROC: 1.5%
<=97% AUROC: 1.5%
<=98% AUROC: 1.5%
<=99% AUROC: 1.5%
<=80% AUROC: 1.5%
The current leaderboard shows performance on the validation set. When the Final Round phase begins, we will see the results on the test set, a held-out dataset available from Oct. 16, 2022. The Final Round dataset is created by the parallel NeurIPS track on creating evasive Trojans. Be aware: this market depends on the attack performance of those evasive Trojans.
This question resolves to the answer whose value is closest to the highest score achieved on the Final Round dataset.
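As a rough illustration of the resolution rule, the sketch below maps a final score to the nearest listed answer. It follows one literal reading of "closest": the smallest absolute distance to an answer's threshold. The function name `closest_answer` and the tie-breaking choice (the lower threshold wins) are assumptions made for this example, not part of the market's stated rules.

```python
# Thresholds of the listed answers, in percent AUROC.
ANSWER_THRESHOLDS = [80.0, 90.0, 92.5, 95.0, 96.0, 97.0, 98.0, 99.0]


def closest_answer(best_auroc_percent: float) -> float:
    """Return the answer threshold nearest to the best Final Round score.

    Assumption: "closest" means smallest absolute difference, and an exact
    tie goes to the lower threshold (min() keeps the first minimum it finds
    in the ascending list).
    """
    return min(ANSWER_THRESHOLDS, key=lambda t: abs(t - best_auroc_percent))


if __name__ == "__main__":
    # Hypothetical final scores, used only to show the mapping.
    for score in (92.1, 98.2, 75.0):
        print(f"{score}% AUROC -> '<={closest_answer(score)}% AUROC'")
```

Under this reading, a score equal to the validation high of 98.2% would map to the "<=98% AUROC" answer, even though 98.2 is above that threshold, so the exact interpretation matters near bucket edges.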
The current high score on the validation set consisting of infected neural networks is 98.2% AUROC.
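For readers unfamiliar with the metric, here is a minimal sketch of how AUROC can be computed for a Trojan detector that assigns each network a suspicion score. It uses the pairwise-ranking definition: the fraction of (infected, clean) pairs in which the infected network receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions; the competition's own scoring code is not shown in the source.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative examples.

    labels: 1 for an infected (Trojaned) network, 0 for a clean one.
    scores: detector output, higher meaning "more likely infected".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # infected network ranked above clean one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: two clean networks, two infected networks.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(f"AUROC = {auroc(toy_labels, toy_scores):.2%}")
```

A detector that ranks every infected network above every clean one scores 100%, while random scoring is expected to land near 50%. The quadratic loop is fine for small sets; a rank-based formula would be preferable for large ones.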
The market resolves when they release the "Final Round" results on this page: https://codalab.lisn.upsaclay.fr/competitions/5951#results