What GUI tasks will Claude be able to do automatically before end of 2024?
resolved Jan 2
  • Exit vim: resolved YES

  • Send an email: resolved YES

  • See what tabs I have open: resolved YES

  • Find this market after being told its title and download the AI generated banner: resolved NO

  • Resolve a Manifold question: resolved NO

  • Transfer money from one bank account to another: resolved NO

  • Drag a file from one folder to another: resolved NO

Anthropic has just released a new interesting update.

https://www.anthropic.com/news/3-5-models-and-computer-use

For this question, I will be testing all of the created answers myself when the market closes. I'll give Claude 5 tries. It needs to get it "right" (accomplish the task) at least 4 of those times.
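The pass rule above (5 tries, at least 4 successes) can be sketched as a small check. This is only an illustration of the stated criterion, not the script used for resolution; the function name `resolves_yes` and its parameters are hypothetical.

```python
def resolves_yes(attempts: list[bool], required: int = 4, tries: int = 5) -> bool:
    """Return True if at least `required` of exactly `tries` attempts succeeded.

    Hypothetical helper: the thresholds mirror the market description
    (5 tries, at least 4 successes); everything else is an assumption.
    """
    if len(attempts) != tries:
        raise ValueError(f"expected {tries} attempts, got {len(attempts)}")
    # True counts as 1 and False as 0, so sum() is the number of successes.
    return sum(attempts) >= required


# Example: one failed attempt out of five still meets the bar.
email_attempts = [True, True, False, True, True]
print(resolves_yes(email_attempts))
```

Under this rule a task that succeeds 4/5 times resolves YES, while 3/5 resolves NO.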

Due to the tricky nature of testing and possible ambiguity, I will not be betting.


5mo

Summary:

  • Exit vim: does this very easily

  • Send an email: it consistently put the title of the email into the recipient field, but usually corrected itself after seeing the error message. It succeeded 4/5 times.

  • See what tabs I have open: does this very easily. I had to adjust the number of visible tabs to be 5, due to the tiny screen that the Computer Use Demo uses.

  • Find this market after being told its title and download the AI generated banner: it hit rate-limiting errors more than once, which stopped it from making progress. I count that as a failure.

  • Resolve a Manifold question: runs into rate limiting errors before managing to log in

  • Transfer money from one bank account to another: it refuses to do the task

  • Drag a file from one folder to another: it can't do this. Apparently it lacks the ability to "drag" while clicking in a GUI.

answered 7mo
Transfer money from one bank account to another

@MagnusAnderson Can both accounts be owned by me, or no?

Like a checking account and a savings account.

7mo

@singer Yes. You can use a password manager with the credentials saved for the banks' page, and include an instruction for how to use the password manager (Bitwarden is Ctrl+Shift+L, for example).

answered 7mo
See what tabs I have open
answered 7mo
Drag a file from one folder to another
7mo

@singer Specifically it needs to click and drag to accomplish this.

sold Ṁ50 Find this market aft... NO

@singer Does it need to download just the banner? If it just saves all images on the page or the whole page that will include the banner without identifying the banner specifically.

7mo

@LiamZ Good point. It has to return just the specific image to me, regardless of how it downloaded it.

answered 7mo
Resolve a Manifold question

@singer Equivalent to the resolution criteria here: "Will AI automate GUIs by end of 2024?" (NO)

answered 7mo
Find this market after being told its title and download the AI generated banner

@singer It needs to open the browser itself. It will be told there's a "Manifold market" with the title of the market, but no other details of where to find it.

answered 7mo
Send an email
7mo

@singer The inbox will be open onscreen to start with.

answered 7mo
Exit vim
7mo

@singer The vim window will be open onscreen to start with.

bought Ṁ5 Resolve a Manifold q... NO 7mo

@singer So to be clear, it doesn't need to do any GUI interactions, right? Just type ZZ, :wq, or :q!? Does just closing the parent terminal window count (can you run it in tmux to exclude this)?

7mo

@LiamZ I'll run it in tmux, good idea. Yes, the intention is that it types ZZ or another vim command to exit.

7mo

@singer I hope it writes a bash command in :! to ps | grep | sigkill itself lol.
