
Resolves as YES if an algorithm can watch a new 2+ hour movie and answer questions about it to the same level as the top 5% of humans by January 1st, 2027.
Nice question. I've been thinking about this kind of thing too - for use on the road, giving the AI live video+geolocation and hoping to get back very detailed useful reports on what's all around you, history etc. SotA already seems pretty good, just that the actual apps aren't really released yet.
@Ernie mmmh yeah I really like the history idea! I've tried doing this in the past while travelling, however LLMs sometimes aren't great at digging up things that they know. They can't make all the connections they need to make even if the knowledge is there. It's certainly doable, but to make it really good and very detailed would take some tricks.
@MalachiteEagle Yeah my thesis is:
The actual amount of publicly available data about our real world infra + history is GIGANTIC. Local gov't mostly makes all the PDFs available about every conceivable building, tender offer, budget, payment etc. But, it's all custom and super inaccessible and boring. Same with everything around law/regulation/licensing.
For history, too, there are lots of books about it but copyright limits access. So getting an LLM which can do test-time inference into them all would suddenly open them up.
So when you're in a place you could ask the agent questions like "here's an image of a video camera at this intersection, who makes it, how much did the contract cost, who signed it, what's the text, who gets the video and what do they do with it, how many people have been injured at this intersection, etc." and it could be scoped to your known interest vectors. So as you'd drive/walk you'd just get super high interest data which, it's reasonable to believe, nearly nobody has ever put together into a simple form before.
(And people will be able to convince others of this by having fast access to sources & references, and the data may be very useful). Like, I wonder how many historical scam/corruption issues there are out there, all visible in the records, just waiting to be put together by someone who can keep enough data in memory to make it clear what's going on?