@Incompleteusern Big labs might just not bother to run it. They might see it as saturated, like, 'We already know these models are superhuman at this task. This isn't going to make us stand out from other labs. Let's not rub it in the faces of the humans doing the competition.'
@comicstosteal i guess that could happen but i don't think this is that realistic of a scenario; even then, it's also somewhat plausible that someone independently runs a harness or whatever using big lab models.
@Incompleteusern the resolution rules are a bit specific. Probably no big AI company cares about this market enough to pay attention to the rules, so we have to hope somebody just happens to comply.
@DottedCalculator Yeah, and I think it may well be fine this year, but 95% seems a little high.
@placebo_username which part of the criteria do you think won’t be satisfied? they all seem reasonable for any perfect score model to satisfy
@DottedCalculator >5% probability of publication date after Aug 21 or more than 9 hours of compute used.
@placebo_username the 95% includes the 4.5 hour per day time limit. This is an official IMO time limit, so anyone who submits for IMO will explicitly follow it, provided their model can get a 42 within this time, which I think is very likely. There is also no reason to wait a month. Last year, the results were published immediately after IMO.
@0xseraphim someone claimed they can do P6 from last year which was an especially hard combinatorics problem so prolly they can
@0xseraphim jump from gold to perfect score depends on the highly varying difficulty of P6. Last year the gap was v big compared to the avg year
@0xseraphim for human contestants this depends heavily on individual variance and test variance. many people with up to 4 gold medals have 0 or 1 perfect scores.
@Bayesian "someone claimed they can do P6 from last year which was an especially hard combinatorics problem so prolly they can" they can get 1-3 points on it I think
https://x.com/wtgowers/status/2057175729008153069?s=20
https://openai.com/index/model-disproves-discrete-geometry-conjecture/
"AI has now solved a major open problem -- one of the best known Erdos problems called the unit distance problem, one of Erdos's favourite questions and one that many mathematicians had tried."
@Lorenzo I’m poor. 7000 is already by far the largest position I’ve had in a real market bet (other than my friend’s mit market I lost 13k on)
@DottedCalculator the conditions on the market include that it'll be solved within human time constraints, which hopefully it can be.
@comicstosteal tbh I've been under the impression that these time constraints don't matter as much since fundamentally LLMs are just a lot faster than humans. maybe I'm wrong though
@Incompleteusern more importantly, my sense is that they can run hundreds of instances per problem in parallel and it would still qualify. solutions to these problems are guaranteed to be reasonably short, so the relative value of sequential time vs parallel time seems much lower than in other cases


