This is one of the first predictions in Scott Alexander et al.'s AI 2027. I wrote about this in my AGI Friday newsletter.
For the purposes of this market, this resolves according to whether I personally find such agents worth using and paying for on at least a weekly basis.
Ask clarifying questions (and suggest things I should be using such agents for)!
So far, my experiments with OpenAI's Operator have not exactly been a success.
Some things I've used it for:
1) I wanted all the recordings of Feynman's lectures (hosted here: https://www.feynmanlectures.caltech.edu/recordings.html). The website is an absolute mess; it takes 5 clicks to download each one and go to the next. I had the agent inspect the page and build a scraper, and it worked beautifully.
2) I may allegedly know someone who uses a pirating site for watching the NFL. It's full of awful clickjacking and every dark pattern under the sun. I had the agent inspect the website and create a Chrome extension which disables all the shenanigans.
3) Planning a vacation: it did decently but not perfectly. It was helpful, though.
@traders I'm inclined to resolve this NO but I do want to hear if you think I'm being dumb by not putting agents to more use than I do. Also if you want to cry foul that coding assistants like Codebuff and Claude Code should count as using agents, speak now or forever hold your peace.
@dreev I bet a small amount of YES with the understanding that the market was subjective to you. I'm fine to take that L. I also predicted a bit more agent use this year than we got, so I think I was meaningfully incorrect.
@Haiku Yeah, I think it's getting there but isn't reliable enough for me to bother with it for everyday (or everyweek) things. Like just now I asked it to try playing http://digit.party and ChatGPT, Claude, and Gemini are all hopeless, though I've seen ChatGPT occasionally succeed at similar games.
Digit.party is actually dirt simple; you literally can just click in each of the 25 squares and that counts as successfully playing it. ChatGPT is the only one that could even start playing it.
PS: Wait, ChatGPT does seem able to get through a game, eventually. It flaked out a couple times and I restarted it. It was excruciating to watch. Even setting aside any attempt at strategy, it just couldn't tell where it was clicking and seemed to get through by luck and brute force.
And then when it finally finished and clicked the share button, it was stymied for a long time trying to get the clipboard contents back to me. But ultimately it got there, after about an hour (!).
PPS: In Gemini's case it understood the game perfectly and wrote its own version and played that, without me asking it to. Claude implemented a playable version when I asked. All insanely impressive but in terms of going out on the web and using a keyboard and mouse agentically, it seems like it's at most at an elementary school level in certain critical ways (despite being superhuman in other ways). So not smart enough to use regularly yet.
I think the first bullet item in the Random Roundup of https://agifriday.substack.com/p/boombust is a small positive update here.
@MRME It's highly ambiguous! Anyone have positive or negative examples of things they've tried? I talked about an example in an AGI Friday recently.
