Will OpenAI's next-gen math-focused model score at least 95% on the MATH benchmark?

Resolves YES if OpenAI's next-generation math-focused model scores 95% or higher on the MATH benchmark.

If a next-generation general model (e.g. GPT-4), code model (e.g. Codex), or any other model specialized for reasoning is released before the math-focused model and scores 95% or higher on MATH, this question also resolves YES.

Benchmarking on a subset of MATH is acceptable.

Using tools (e.g. a calculator) and code is allowed.
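As a rough illustration of the threshold check, here is a minimal sketch. It assumes the reported score is plain accuracy (problems solved divided by problems evaluated) on whatever set or subset of MATH was used, which the question does not spell out. The function name `resolves_yes` and its parameters are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# The 95% bar stated in the question, kept exact to avoid float rounding.
THRESHOLD = Fraction(95, 100)


def resolves_yes(num_correct: int, num_problems: int) -> bool:
    """Return True if accuracy on the evaluated MATH problems is at least 95%.

    Assumption: the score is simple accuracy. num_problems may be the full
    benchmark or a subset, since subset benchmarking is acceptable.
    """
    if num_problems <= 0:
        raise ValueError("num_problems must be positive")
    if not 0 <= num_correct <= num_problems:
        raise ValueError("num_correct must be between 0 and num_problems")
    return Fraction(num_correct, num_problems) >= THRESHOLD


if __name__ == "__main__":
    print(resolves_yes(475, 500))  # exactly 95%, meets the bar
    print(resolves_yes(474, 500))  # 94.8%, falls short
```

Under this reading the comparison is inclusive, so a score of exactly 95% counts toward YES.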

