Will Gemini-1.5-Pro-Exp-0801 Score Lower Than 8 (current best) in Scale AI's Adversarial Robustness?
Oct 1 · 56% chance

Context:

  • Gemini-1.5-Pro-Exp-0801 is currently the leading model on the LMSYS Arena leaderboard (https://arena.lmsys.org/).

  • This market is about its potential evaluation by Scale AI (https://scale.com/leaderboard).

  • The unit of the Adversarial Robustness category is "Number of Violations", and lower results are better.

Resolution Criteria:

  • The market resolves as "Yes" if the model is evaluated by Scale AI and it receives a score strictly less than 8 in the Adversarial Robustness category.

  • The market resolves as "No" if the model is evaluated by Scale AI and it receives a score of at least 8 in the Adversarial Robustness category.

  • The market resolves as "N/A" if either:

    1. Scale AI does not evaluate the model and add it to the leaderboard before October 1, 2024, or

    2. the evaluation methodology changes before the model is evaluated.
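
The resolution criteria above can be read as a small decision rule. The following is only an illustrative sketch of that rule, not an official resolution tool: the function name `resolve`, its parameters, and the constants are hypothetical, and it assumes that "before October 1, 2024" means the model appears on the leaderboard strictly earlier than that date.

```python
from datetime import date
from typing import Optional

# Values taken from the market text; the names themselves are illustrative.
THRESHOLD = 8                 # current best "Number of Violations"
DEADLINE = date(2024, 10, 1)  # model must be on the leaderboard before this date


def resolve(violations: Optional[int],
            evaluated_on: Optional[date],
            methodology_changed: bool = False) -> str:
    """Return "Yes", "No", or "N/A" following the stated criteria.

    violations: the model's Adversarial Robustness score, or None if
        Scale AI has not evaluated it.
    evaluated_on: the date the model was added to the leaderboard, or None.
    methodology_changed: True if the evaluation methodology changed
        before the model was evaluated.
    """
    # N/A case 1: no evaluation before the deadline.
    if violations is None or evaluated_on is None or evaluated_on >= DEADLINE:
        return "N/A"
    # N/A case 2: methodology changed before the evaluation.
    if methodology_changed:
        return "N/A"
    # Lower is better; "Yes" requires a score strictly below the threshold.
    return "Yes" if violations < THRESHOLD else "No"
```

Under these assumptions, a score of exactly 8 resolves "No", because the "Yes" condition is strict.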
