Is Intology’s Locus result on RE-Bench real?
Dec 21
3% chance

Ways this would be real:

  • The result is independently replicated.

  • The model is clearly shown to be a strong state of the art (SOTA) performer on SWE tasks similar to RE-Bench.

Ways this would not be real:

  • They announce that the reported score was partly caused by an error in their setup, or by extensive reward hacking by their model (i.e., it ‘cheated’).

  • The result is independently replicated and the score is nowhere near human level.

If none of these occur, the market resolves to the consensus of credible people as of, say, February 2025.