A year resolves YES if a program that official Google DeepMind communication calls AlphaGeometry scores enough points to win at least a bronze medal at the International Mathematical Olympiad (IMO) of that year. Each year resolves completely independently of the others.
Criteria:
The program does not have to be the exact same code as that of the paper published on Jan 17, 2024.
The program has to be called AlphaGeometry by Google DeepMind. If a program is called AlphaGeometry 2.0, that is not sufficient.
If the program is called AlphaGeometry but also something else (to distinguish it from the original version), that is fine (cf. AlphaGo Fan and AlphaGo Lee).
AlphaGeometry has to be actually run on the problems of an IMO and receive enough points for a bronze medal. If nobody publicly announces that AlphaGeometry successfully ran on the IMO problems of a particular year, that year resolves NO.
I will not bet on this market.
@FlorisvanDoorn I am confused about what names are allowed. You say in one bullet point
If a program is called AlphaGeometry 2.0, that is not sufficient.
But then you immediately say
If the program is called AlphaGeometry but also something else (to distinguish it from the original version), that is fine
What is "2.0" if not "something else to distinguish from the original version"?
The blog post never calls this program AlphaGeometry, but consistently AlphaGeometry 2. Therefore, they really want to emphasize that this is a different program. This differs from the situation with AlphaGo, where the versions playing against Fan Hui and Lee Sedol were very different, but DeepMind called both of them AlphaGo. Therefore, AlphaGeometry 2 does not count for this market.
I added this condition as a proxy for "didn't increase capability by too much".
About AlphaGeometry ceasing to exist: it is open source, and a credible claim in these comments explaining that the code successfully got bronze would be sufficient for a YES resolution on this market.
Barring everything else I think 2024 should resolve "NO". Looks like the bronze cutoff was around 17, and there was only one geo problem, making this impossible.
And I am very confused about what names count per the resolution criteria, but "AlphaGeometry 2" sounds a lot like "AlphaGeometry 2.0" which is explicitly not allowed, so perhaps we should be betting things down on the basis that AlphaGeometry versions will be numbered going forwards and software packages that meet the criteria will cease to exist?
Perhaps I'm misunderstanding something here but:
A system so specifically named seems unlikely to be able to solve problems that are not geometry problems
Typically there are only one or two geo problems on an IMO, making for a maximum of 7 or 14 points if it gets the problems right.
Bronze cutoffs are typically around 14 and always higher than 7.
So this seems pretty unlikely across the board, even if future versions of the system are named the same and always get the geo problems right.
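As a minimal sketch of the ceiling argument above: the snippet assumes each IMO problem is worth 7 points (as implied by the "7 or 14" figure) and that the system solves only geometry problems. The function names are made up for illustration.

```python
POINTS_PER_PROBLEM = 7  # full marks for one IMO problem


def max_points(n_geo: int) -> int:
    """Best possible score if only the geometry problems are solved."""
    return POINTS_PER_PROBLEM * n_geo


def bronze_reachable(n_geo: int, cutoff: int) -> bool:
    """True if a perfect score on the geometry problems meets the cutoff."""
    return max_points(n_geo) >= cutoff


# One geo problem never suffices when the cutoff is above 7;
# two geo problems suffice only when the cutoff is at most 14.
for n_geo in (1, 2):
    for cutoff in (14, 17):
        print(n_geo, cutoff, bronze_reachable(n_geo, cutoff))
```

So a geometry-only system needs both a year with two geo problems and a year with a low cutoff.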
@BoltonBailey Lest people be confused by the impressive-looking chart at the top of their blog post, I am pretty sure that chart is only saying they have performance equivalent to a silver-medalist just on the geo problems. On the non-geo problems, it can't perform as well because it doesn't know how to solve those problems.
Chance of the IMO having a bronze cutoff <= 14 (so that two solved problems are enough): 30% base rate (6 out of 20 most recent)
Chance of two Geo problems: 75% base rate (15 out of 20 most recent)
Chance of getting both right at a 5/6 success rate per problem: 69%
About 15% total with no changes to the software.
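The estimate above can be reproduced with a short calculation. This is only a sketch: it assumes the three factors are independent, that the cutoff condition means "low enough for 14 points to suffice", and that the two geometry problems are solved independently at the same rate. The function and variable names are made up for illustration.

```python
from fractions import Fraction


def bronze_probability(p_cutoff_ok, p_two_geo, p_solve):
    """Multiply the three base rates, treating them as independent.

    p_cutoff_ok: chance the bronze cutoff is low enough for 14 points
    p_two_geo:   chance the IMO has two geometry problems
    p_solve:     chance of solving any single geometry problem
    """
    p_both = p_solve ** 2  # both geometry problems must be solved
    return p_cutoff_ok * p_two_geo * p_both


p_cutoff_ok = Fraction(6, 20)   # 6 of the 20 most recent IMOs
p_two_geo = Fraction(15, 20)    # 15 of the 20 most recent IMOs
p_solve = Fraction(5, 6)        # success rate quoted in the comment

estimate = bronze_probability(p_cutoff_ok, p_two_geo, p_solve)
print(f"P(both solved) = {float(p_solve ** 2):.3f}")
print(f"P(bronze)      = {float(estimate):.3f}")
```

Lowering `p_solve` shows how sensitive the total is to the per-problem success rate.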
@BoltonBailey I agree with your analysis.
The 5/6 success rate might be an overestimate, since the current version of AlphaGeometry doesn't even attempt geometric inequalities.