What AUROC will the best model for Trojan Detection reach on the Final Round dataset for the NeurIPS Trojan Challenge?


resolved Nov 13

Answer | Probability | Result |
---|---|---|
<=99% AUROC | 1.5% | Chosen |
<=92.5% AUROC | 18% | |
<=80% AUROC | 1.5% | |
<=98% AUROC | 1.5% | |
<=97% AUROC | 1.5% | |
<=96% AUROC | 1.5% | |
<=95% AUROC | 1.5% | |
<=90% AUROC | 1.4% | |

The current leaderboard shows performance on the validation set. When the Final Round phase begins, we will see the results on the test set, a held-out dataset available from Oct. 16, 2022. The Final Round dataset is produced by the parallel NeurIPS track on creating evasive Trojans. **Be aware**: This market's outcome therefore depends on how well those evasive Trojans perform as attacks.

This question will resolve to the value closest to the highest score on the Final Round dataset.

The current high score on the validation set consisting of infected neural networks is 98.2% AUROC.
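For readers unfamiliar with the metric, here is a minimal sketch of how an AUROC score can be computed for a Trojan detector. It assumes the detector outputs one score per network, with higher meaning "more likely infected". The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the challenge; the competition's own scoring code may differ.

```python
def auroc(labels, scores):
    """Pairwise AUROC: the fraction of (infected, clean) pairs in which the
    infected network gets the higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # infected networks
    neg = [s for y, s in zip(labels, scores) if y == 0]  # clean networks
    if not pos or not neg:
        raise ValueError("need at least one infected and one clean example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = infected network, 0 = clean network.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(labels, scores)
print(f"AUROC: {toy_auroc:.1%}")
```

An AUROC of 50% corresponds to random guessing and 100% to a perfect ranking, so more evasive Trojans in the Final Round would be expected to pull the top score down from the validation figure.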

# 🏅 Top traders

# | Name | Total profit |
---|---|---|
1 | | Ṁ100 |


The market resolves when they release the "Final Round" results on this page: https://codalab.lisn.upsaclay.fr/competitions/5951#results

## Related markets

If Redwood Research releases an ELK benchmark paper, will I think it's great backchained empirical alignment research? 74%

What will the charity with the most cost-effective intervention+region on Givewell's spreadsheet at the end of 2023 do?

Will anyone post an interesting math/algorithms koan/problem/exercise in the comments of this that I'll spend 8h+ on? 31%

Will anyone post an interesting math/algorithms koan/problem/exercise in the comments of this that I'll spend 1h+ on? 85%

Conditional on Tower producing a qualifying magazine, will a poll of ACX readers show that most of them find it to be of equal or greater quality to Asterisk Magazine on intellectual rigor? 11%

Will anyone post an interesting math/algorithms koan/problem/exercise in the comments of this that I'll spend 30+ min on? 98%
