Will I find practical use for agents like OpenAI's Operator in 2025?


Jan 12

12% chance


This is one of the first predictions in Scott Alexander et al.'s AI 2027. I wrote about this in my AGI Friday newsletter.

For the purposes of this market, this resolves according to whether I personally find such agents worth using and paying for on at least a weekly basis.

Ask clarifying questions (and suggest things I should be using such agents for)!

So far, my experiments with OpenAI's Operator have not exactly been a success.



Some things I've used it for:

1) I wanted all the recordings of Feynman's lectures (hosted here: https://www.feynmanlectures.caltech.edu/recordings.html). The website is an absolute mess; it takes 5 clicks to download each one and go to the next. I had the agent inspect the page and build a scraper, and it worked beautifully.

2) I may allegedly know someone who uses a pirating site for watching the NFL. It's full of awful clickjacking and every dark pattern under the sun. I had the agent inspect the website and create a Chrome extension that disables all the shenanigans.

3) Planning a vacation: it did decently but not perfectly. It was helpful, though.

I've found them to be quite useful
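For anyone wondering what a scraper like the one in item 1 might involve, here is a minimal sketch, not the code the agent actually produced. It assumes the recordings are exposed as plain links or audio tags in the page's HTML and that they end in a common audio extension; both are guesses, and a page that builds its links with JavaScript would need a different approach. The names `LinkCollector`, `find_recordings`, and `AUDIO_EXTENSIONS` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: recordings are served as files with one of these extensions.
AUDIO_EXTENSIONS = (".mp3", ".m4a", ".ogg", ".wav")


class LinkCollector(HTMLParser):
    """Collects absolute URLs of audio files referenced in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only look at tags that commonly point at downloadable media.
        if tag not in ("a", "audio", "source"):
            return
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page URL.
                url = urljoin(self.base_url, value)
                is_audio = url.lower().endswith(AUDIO_EXTENSIONS)
                if is_audio and url not in self.links:
                    self.links.append(url)


def find_recordings(html, base_url):
    """Return audio URLs found in html, in page order, without duplicates."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Calling `find_recordings(page_html, page_url)` on the downloaded page text would give a list of URLs that could then be fetched one by one, which replaces the click-through steps with a single loop.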


I recently tried to get Comet to buy a LEGO electric motor and it tried to buy a real motor instead.

@traders I'm inclined to resolve this NO but I do want to hear if you think I'm being dumb by not putting agents to more use than I do. Also if you want to cry foul that coding assistants like Codebuff and Claude Code should count as using agents, speak now or forever hold your peace.

@dreev I bet a small amount of YES with the understanding that the market was subjective to you. I'm fine to take that L. I also predicted a bit more agent use this year than we got, so I think I was meaningfully incorrect.

@Haiku Yeah, I think it's getting there but isn't reliable enough for me to bother with it for everyday (or everyweek) things. Like just now I asked it to try playing http://digit.party and ChatGPT, Claude, and Gemini are all hopeless, though I've seen ChatGPT occasionally succeed at similar games.

Digit.party is actually dirt simple; you literally can just click in each of the 25 squares and that counts as successfully playing it. ChatGPT is the only one that could even start playing it.

PS: Wait, ChatGPT does seem able to get through a game, eventually. It flaked out a couple of times and I restarted it. It was excruciating to watch. Even setting aside any attempt at strategy, it just couldn't tell where it was clicking and seemed to get through by luck and brute force.

And then when it finally finished and clicked the share button, it was stymied for a long time trying to get the clipboard contents back to me. But ultimately it got there, after about an hour (!).

PPS: In Gemini's case it understood the game perfectly and wrote its own version and played that, without me asking it to. Claude implemented a playable version when I asked. All insanely impressive but in terms of going out on the web and using a keyboard and mouse agentically, it seems like it's at most at an elementary school level in certain critical ways (despite being superhuman in other ways). So not smart enough to use regularly yet.

I think the first bullet item in the Random Roundup of https://agifriday.substack.com/p/boombust is a small positive update here.


@dreev what are your thoughts 6 months later?

@MRME It's highly ambiguous! Anyone have positive or negative examples of things they've tried? I talked about an example in an AGI Friday recently.

I'm not betting so far, which may be easier. I can just resolve according to my own best judgment. If I can't resist betting, I'll say more about how I'd deal with my conflict of interest. So far I'm pretty clueless and don't feel I have a better probability than this market is coming up with.
