Is Intology’s Locus result on RE-Bench real?
1k Ṁ6550 Dec 21
3% chance
Ways this would be real:
The result is independently replicated
Model is clearly found to be strong SOTA at SWE tasks similar to RE-Bench
Ways this would not be real:
They announce that the reported score was caused in part by an error in their setup or by extensive reward hacking by their model (it ‘cheated’)
An independent replication is carried out and the score is nowhere near human level
Failing these, resolves to the consensus of credible people, let’s say in Feb 2025.
This question is managed and resolved by Manifold.