This question will resolve to the minimum Brier score achieved on the leaderboard by a qualified submission. The calibrated random baseline scores 85; lower scores are better. See an example of the Brier score in action.

See the competition page.

For true/false and multiple-choice questions, we evaluate models using the Brier score, which is divided by 2 to normalize it to the 0%–100% range. For numerical questions, we use the L1 distance, bounded between 0% and 100%. We denote these question types as T/F, MCQ, and Numerical, respectively. To evaluate aggregate performance, we use a combined metric (T/F + MCQ + Numerical), which has a lower bound of 0%; a score of 0% indicates perfect prediction on all three question types. For more details, please check out the Autocast paper.
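The scoring described above can be sketched as follows. This is a minimal illustration, not the official evaluation code: the function names, the assumption that numerical predictions and outcomes are scaled to [0, 1], and the simple summation of the three per-type scores are all assumptions; see the Autocast paper for the exact definitions.

```python
import numpy as np

def brier_score_pct(probs, outcome_index):
    """Normalized Brier score for one T/F or MCQ question.

    `probs` gives the predicted probability of each option. The raw
    multi-class Brier score lies in [0, 2], so dividing by 2 maps it
    to [0, 1]; reported here as a percentage (0% = perfect).
    """
    probs = np.asarray(probs, dtype=float)
    target = np.zeros_like(probs)
    target[outcome_index] = 1.0  # one-hot encoding of the realized outcome
    return 100.0 * np.sum((probs - target) ** 2) / 2.0

def l1_score_pct(pred, outcome):
    """Bounded L1 distance for a numerical question.

    Assumes predictions and outcomes are rescaled to [0, 1], so the
    distance is bounded; reported as a percentage (0% = perfect).
    """
    return 100.0 * min(abs(pred - outcome), 1.0)

# Uninformed guesses illustrate why 0% means perfect prediction
# and why random baselines score well above it.
tf = brier_score_pct([0.5, 0.5], 1)    # uniform T/F guess -> 25.0
mcq = brier_score_pct([0.25] * 4, 2)   # uniform 4-option MCQ guess -> 37.5
num = l1_score_pct(0.4, 0.7)           # numerical miss of 0.3 -> 30.0
combined = tf + mcq + num              # combined metric (assumed: simple sum)
```

A perfect forecaster would score 0% on each component, giving a combined score of 0%, matching the lower bound stated above.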