
Anthropic has just released an interesting new update:
https://www.anthropic.com/news/3-5-models-and-computer-use
For this question, I will be testing all of the created answers myself when the market closes. I'll give Claude 5 tries. It needs to get it "right" (accomplish the task) at least 4 of those times.
Due to the tricky nature of testing and possible ambiguity, I will not be betting.
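For anyone who wants the pass rule spelled out, here is a minimal sketch of how I read it: 5 tries per task, and the task counts as accomplished only if at least 4 of them succeed. The function name `task_passes` and its parameters are made up for illustration; the actual testing is done by hand, not by a script.

```python
def task_passes(attempts: list[bool], tries: int = 5, required: int = 4) -> bool:
    """Apply the stated rule: exactly `tries` attempts, at least `required` successes.

    `attempts` holds one boolean per try (True = Claude accomplished the task).
    The name and signature are hypothetical, used only to illustrate the rule.
    """
    if len(attempts) != tries:
        raise ValueError(f"expected {tries} attempts, got {len(attempts)}")
    # True counts as 1 and False as 0, so sum() gives the number of successes.
    return sum(attempts) >= required


# One failed try out of five still passes; two failed tries do not.
print(task_passes([False, True, True, True, True]))
print(task_passes([False, True, False, True, True]))
```

Under this reading, a task with one early mistake that Claude recovers from on the other four tries still passes, while two failed tries out of five do not.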
Summary:
Exit vim: does this very easily
Send an email: it consistently put the title of the email into the recipient field, but usually corrected itself after seeing the error message. Succeeded 4/5 times.
See what tabs I have open: does this very easily. I had to adjust the number of visible tabs to be 5, due to the tiny screen that the Computer Use Demo uses.
Find this market after being told its title and download the AI-generated banner: it hit rate-limiting errors more than once, which stopped it from making progress. I count that as a failure.
Resolve a Manifold question: runs into rate-limiting errors before managing to log in
Transfer money from one bank account to another: it refuses to do the task
Drag a file from one folder to another: it can't do this; apparently it lacks the ability to "drag" while clicking in a GUI
@MagnusAnderson Can both accounts be owned by me, or no?
Like a checking account and a savings account?
@singer Yes. You can use a password manager with the credentials saved for the banks' page, and include an instruction for how to use the password manager (Bitwarden is Ctrl+Shift+L, for example).
@singer Does it need to download just the banner? If it just saves all images on the page, or the whole page, that will include the banner without identifying the banner specifically.
@LiamZ Good point. It has to return just the specific image to me, regardless of how it downloaded it.
@singer It needs to open the browser itself. It will be told there's a "Manifold market" with the title of the market, but no other details of where to find it.
@singer So, to be clear, it doesn’t need to do any GUI interactions, right? Just hit ZZ, :wq, or :q!? Does just closing the parent terminal window count (can you run it in tmux to exclude this)?
@LiamZ I'll run it in tmux, good idea. Yes, the intention is that it types ZZ or another vim command to exit.