https://docs.google.com/document/d/1jV9F4H-UM7V76cj59CA9_WLzL6zHM_SC5cAbBE614vk/edit?tab=t.0
From the Astral Codex Ten post https://www.astralcodexten.com/p/open-thread-365
If OpenAI gets the data sent to them directly via email, file sharing, etc. (e.g. the way the FrontierMath data was sent), that counts even if they pinky promise not to train on it.
If the data gets submitted to an OpenAI endpoint, that counts if and only if we find evidence that OpenAI is hunting for benchmark submissions by sifting through their API logs, or training on API logs generally.
60% is chosen in case METR declares "Don't worry, we held out a test set".
(This is slightly different from FrontierMath, since this job posting solicits answers, not questions. Just flagging that I am aware of that.)