Current AI agents (circa Jan 2024) are quite bad at clicking, reading screenshots, and interpreting the layout of webpages and GUIs. This is expected to change in the near future, with AI becoming capable enough to navigate an arbitrary GUI about as well as a human.
Example of an early system of this type: https://github.com/OthersideAI/self-operating-computer/tree/main?tab=readme-ov-file#demo
Resolution criteria:
This question resolves YES if, the day after 2024 ends, I can direct an AI agent to resolve this market as YES using only voice commands while blindfolded. It resolves NO if this takes over 30 minutes.
Update:
There are no restrictions on whether the AI agent is free, open source, proprietary, local, remote, etcetera.
Update:
If someone else on Manifold can demonstrate an AI agent resolving a Manifold market as YES (while following the same restrictions that I would have followed), then I'll resolve this one as YES too. This is in case I'm not able to get access to the AI agent myself for testing.
Update:
The agent will need to be able to open a web browser and login to Manifold on its own.
Update 2025-02-01 (PST) (AI summary of creator comment): Additional Resolution Criteria:
The AI agent must not require modification with custom code (e.g., writing scripts).
@traders I'm resolving NO. I dislike this resolution because the bottleneck to an AI resolving the market YES is mainly just my interface to the AI. Someone could probably rig up the Claude Computer Use demo with a voice interface and iterate on it until it was able to navigate and click successfully for this market. If you interpreted the market that way, feel free to DM me and I can refund you.
@singer I agree with your resolution; I could always have built tools to do so, but that's not automation in the spirit of the question.
@singer I'm happy enough with the resolution. I traded YES on the impression that the fundamental capabilities are sufficient and that such an agent could exist.
This is the sort of situation where someone sufficiently invested can add the necessary scaffolding to resolve the market in line with those fundamental capabilities, if they're right that they're there. Since nobody managed that ahead of the deadline, resolving NO seems like the only correct option.
@traders I expected there would be a voice-driven agent by now, like the Claude Computer Use demo but with voice chat. As far as I'm aware, no such thing exists yet. I did try to resolve a Manifold market using that same demo yesterday, but it couldn't do it (it crashes due to rate-limiting errors, which it has no built-in way of recovering from). I'll resolve this market as NO in two days, unless somebody can suggest an AI agent to use with a voice interface (and which doesn't require me modifying it with custom code, e.g. writing some script).
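For context, the recovery the demo lacks is just retry-with-backoff around its model requests. A minimal sketch of that idea (the `api_call` callable is a hypothetical stand-in for the demo's request to the model API, not part of the demo itself):

```python
import random
import time

def call_with_backoff(api_call, max_retries=5):
    """Retry a rate-limited call with exponential backoff.

    `api_call` is a hypothetical zero-argument callable standing in for
    the demo's request to the model API.
    """
    for attempt in range(max_retries):
        try:
            return api_call()
        except Exception as err:  # in practice, catch the SDK's rate-limit error
            if attempt == max_retries - 1:
                raise
            delay = 2 ** attempt + random.uniform(0, 1)
            print(f"Rate limited ({err}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

Of course, bolting this on would count as modifying the agent with custom code, which is exactly what I'm trying to avoid.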
@singer should be (😉) fairly easy if you use the github repo “computer use out of the box (ootb)” + gemini screen sharing
@nosuch computer_use_ootb looks interesting. But as far as creating an AI system to resolve the market, it's too late for that. I wish that when I made this market I hadn't emphasized the need for a voice interface, which I didn't expect would become the main bottleneck.
@singer even though it can't control all interfaces, as it is only a demo, maybe: https://youtu.be/L-GLo-1IR_k with the GitHub repo https://github.com/HumeAI/voice-computer-use
https://github.com/AmberSahdev/Open-Interface/blob/main/MEDIA.md#demos looks promising, but might be a bit tricky blindfolded. You have to press submit for your voice command. Maybe this can be achieved with keyboard shortcuts: alt+tab to switch back to the app, then pressing tab a few times to reach the submit button (a rough sketch of those key presses is below). I would argue that the real input is still only voice commands. However, there seems to be no audio feedback without scripting, which would make it impractical.
Disclaimer: I have just looked at the demos and didn't try them myself.
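A rough sketch of what those key presses could look like (hypothetical, using pyautogui; the number of tab presses is a guess and depends on the app's layout):

```python
import time

import pyautogui  # pip install pyautogui

# Switch back to the Open-Interface window (assumes it was the
# previously focused application).
pyautogui.hotkey('alt', 'tab')
time.sleep(0.5)

# Move focus through the form to the submit button, then activate it.
# Three presses is only an assumption about the UI layout.
pyautogui.press('tab', presses=3, interval=0.2)
pyautogui.press('enter')
```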
@Topology These are excellent demos I didn't know about and I appreciate you taking the time to find them. I can't get the first one to work, and the second one is excluded using the (frankly ad-hoc) interpretation of the criteria I've decided to go with, which is me not needing to modify the AI just for the sake of this market. The reason I'm going with this interpretation is that I could spend 4-5 hours myself creating a jury-rigged demo to specifically resolve this market, but it wouldn't prove anything.
@MaxMorehead No, and I'm not aware of any agent that actually has both a voice interface and computer use ability. If you know one, please tell me.
@singer I bet NO on this market and think it should resolve NO (I sold due to the chance that I've missed something or the chance of misresolution), so I'm not the one to ask. I don't believe it's possible to do this, and I think it's incumbent on the YES holders to clearly demonstrate that it is.
https://docs.anthropic.com/en/docs/build-with-claude/computer-use
Let's go!
I wonder if it works well enough
@singer After testing it for a bit, I think it's somewhat likely that this market will resolve YES. The key thing for this question is that I can direct the AI while it's working (vocally, while blindfolded).
"OpenAI Shifts AI Battleground to Software That Operates Devices, Automates Tasks" [spamwalled; don't bother clicking]
OpenAI is developing a form of agent software to automate complex tasks by effectively taking over a customer’s device. The customer could then ask the ChatGPT agent to transfer data from a document to a spreadsheet for analysis, for instance, or to automatically fill out expense reports and enter them in accounting software. Those kinds of requests would trigger the agent to perform the clicks, cursor movements, text typing and other actions humans take as they work with different apps, according to a person with knowledge of the effort.
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks (jykoh.com)
GPT-4V has a success rate of only 16.37% on web tasks, whereas human-level performance is 88.70%. Not sure whether resolving this market is one of the easier tasks, but it seems we have a way to go before AI achieves human-level web browsing.