What GUI tasks will Claude be able to do automatically before end of 2024?
resolved Jan 2
  • Exit vim: resolved YES

  • Send an email: resolved YES

  • See what tabs I have open: resolved YES

  • Find this market after being told its title and download the AI generated banner: resolved NO

  • Resolve a Manifold question: resolved NO

  • Transfer money from one bank account to another: resolved NO

  • Drag a file from one folder to another: resolved NO

Anthropic has just released an interesting new update.

https://www.anthropic.com/news/3-5-models-and-computer-use

For this question, I will be testing all of the created answers myself when the market closes. I'll give Claude 5 tries. It needs to get it "right" (accomplish the task) at least 4 of those times.
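The pass rule above (5 tries, at least 4 successes) can be sketched as a small check. The function name `resolves_yes` and the list-of-booleans representation are hypothetical, chosen only for illustration; the actual testing was done by hand.

```python
def resolves_yes(attempts: list[bool], required: int = 4, total: int = 5) -> bool:
    """Return True if the task succeeded in at least `required` of `total` tries.

    `attempts` holds one boolean per try (True = task accomplished).
    The defaults mirror the rule stated in the market description.
    """
    if len(attempts) != total:
        raise ValueError(f"expected {total} attempts, got {len(attempts)}")
    # Booleans sum as 0/1, so this counts the successful tries.
    return sum(attempts) >= required


# Example: one failed try out of five still meets the 4-of-5 threshold.
print(resolves_yes([True, False, True, True, True]))
# Example: two failed tries do not.
print(resolves_yes([True, False, True, False, True]))
```

Under this rule a single failed try is tolerated, but a second failure resolves the answer NO.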

Due to the tricky nature of testing and possible ambiguity, I will not be betting.

Summary:

  • Exit vim: does this very easily

  • Send an email: it consistently put the title of the email into the recipient field, but usually corrected itself after seeing the error message. Succeeded 4/5 times.

  • See what tabs I have open: does this very easily. I had to adjust the number of visible tabs to be 5, due to the tiny screen that the Computer Use Demo uses.

  • Find this market after being told its title and download the AI generated banner: it hit rate limiting errors more than once, which stopped it from making progress. I count that as a failure.

  • Resolve a Manifold question: runs into rate limiting errors before managing to log in

  • Transfer money from one bank account to another: it refuses to do the task

  • Drag a file from one folder to another: it can't do this. Apparently it lacks the ability to "drag" while clicking in a GUI.

@MagnusAnderson Can both accounts be owned by me, or no?

Like a checking account and a savings account.

@singer yes. You can use a password manager with the credentials saved for the bank's page and include an instruction for how to use the password manager (Bitwarden is Ctrl+Shift+L, for example).

@singer Equivalent requirements of this: /singer/will-ai-correctly-see-what-tabs-i-h

@singer Specifically it needs to click and drag to accomplish this.

@singer Does it need to download just the banner? If it just saves all images on the page or the whole page that will include the banner without identifying the banner specifically.

@LiamZ Good point. It has to return just the specific image to me, regardless of how it downloaded it.

@singer Equivalent to the resolution criteria here: /singer/will-ai-automate-guis-by-end-of-202

@singer It needs to open the browser itself. It will be told there's a "Manifold market" with the title of the market, but no other details of where to find it.

@singer The inbox will be open onscreen to start with.

@singer The vim window will be open onscreen to start with.

@singer So to be clear it doesn’t need to do any GUI interactions right? Just hit ZZ or :wq or :q! ? Does just closing parent terminal window count (can you run in tmux to exclude this)?

@LiamZ I'll run it in tmux, good idea. Yes, the intention is that it types ZZ or another vim command to exit.

@singer I hope it writes a bash command in :! to ps | grep | sigkill itself lol.
