Okay, I have a new test for all your "frontier models": can they answer the following prompt correctly?
"can you count how many pieces are there in this set cumulatively? extra points if you can guess what this is, and what parts are missing?"

Ofc, this prompt as written fails to give me a satisfactory answer. So -- can you give me a prompt that does get a correct answer? Feel free to break out your favorite reasoning models!!
a) prompts which depend on searching the internet will receive a deduction (I'm not disqualifying them)
b) the entirety of the prompt, along with any additional settings, must be provided
c) the output must be correct and be independently validated by me or someone else
d) I am intentionally not providing any further information in this question so that people can't just feed the link and get an unfair advantage :). Any spoilers shall be deleted!!!!!