If @Mira works on AGI, how far will I get? (2024)
resolved Apr 24

  • Solve easy Sudoku puzzles: Resolved N/A
  • Operate a robot through a maze: Resolved N/A
  • Solve arithmetic word problems: Resolved N/A
  • Beat Super Mario Bros: Resolved N/A
  • Mine diamond in Minecraft: Resolved N/A
  • Solve Project Euler problems: Resolved N/A
  • Invent sorting algorithms: Resolved N/A
  • Read the Manifold Markets documentation and place 1 bet on any market: Resolved N/A
  • Hold an audio conversation with me: Resolved N/A
  • Hold a video conversation with me: Resolved N/A
  • Beat Factorio: Resolved N/A

This is part of a series of markets about things I could do in 2024. See also: /Mira/which-of-mira-s-cool-ideas-will-mir

Summary

An AGI is supposed to be able to do everything a human can do. Most humans can't do everything a human can do, and so do not qualify as general intelligences.

If @Mira works on a generalized agent that can consume and produce audio, text, image, and video, what types of things will it be able to do?

To count, tasks must be:

  • Learned. The agent must be initialized to a generic state (random or zero state).

  • Learned ex nihilo. No language models that consume the entire internet. No cloning of human behavior to play a video game. If it's going to "invent sorting algorithms", that means being given an "is sorted?" predicate and having to learn the algorithm by querying True/False on test cases (see the sketch after this list).

    • For tasks like "can hold a conversation", it will have to learn human language, so there is some predictive cloning necessary. It's okay to learn from human data in the sense of observing it, but not to train a human-designed architecture with a human-designed dataset.

    • If it independently rederives the Transformer architecture and consumes the entire internet, that would count. But not if I hand-write a Transformer-based network and train it against the internet myself.

  • Tasks more than benchmarks: I want to be able to make a YES/NO decision on these. While I might still resolve PROB, the task itself should naturally be YES/NO. "Scoring really high on the SAT" is not interesting because it is a test of memorization; "Beating Factorio", when the game must be learned from pixels, with no built-in ability to read the text, and with mid-range planning required, shows intelligence just from the problem.

  • Cheap to actually test. A real AGI should do expensive things too, but do you really need "can fabricate a CPU from silicon?" as your task when you could have "can execute a simple place & route of a 32-bit adder", "given an oracle for chemical reactions and an environment for placing atoms, can infer the boron-doping process used to create semiconductors", [repeat 10x]?

  • Answers do not need to be precise. "Hold a conversation with me" is something akin to a Turing Test, but a much weaker standard.
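
As a concrete illustration of the "is sorted?" setup above (the sketch referenced in the second bullet), here is roughly what the agent's entire interface to the sorting task could look like: a black-box predicate it can query, plus random test cases. The names and the permutation check are my own additions; the market only specifies the True/False predicate.

```python
from collections import Counter
import random

def is_sorted(xs):
    """The black-box predicate: the only feedback signal the market specifies."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def score_candidate(sort_fn, trials=100, seed=0):
    """Score a candidate program purely from True/False queries on random test cases."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        case = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 16))]
        out = sort_fn(list(case))
        # The permutation check is an extra assumption on my part; without it,
        # always returning [] would trivially satisfy is_sorted.
        if is_sorted(out) and Counter(out) == Counter(case):
            passed += 1
    return passed / trials
```

A reference like `score_candidate(sorted)` scores 1.0; the agent itself never sees a sorted output, only the pass/fail signal.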

I've left the question open so anyone can add their own interesting tasks that an "AGI" should be able to do. I may edit an answer if it doesn't meet these standards, or N/A it if it's a joke or an unsalvageable answer.

I would be overjoyed to do "program induction" (solving Sudoku or inventing sorting algorithms) ex nihilo. Everything else is more of a solicitation for ideas.
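
In the same spirit, "solve easy Sudoku puzzles" only needs a cheap validity predicate as its environment. A minimal sketch of such an oracle, in my own framing (it only checks that a completed grid is consistent; checking agreement with the puzzle's givens is omitted):

```python
def sudoku_solved(grid):
    """True/False oracle for a completed 9x9 grid of ints 1-9."""
    def ok(group):
        return sorted(group) == list(range(1, 10))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[r][c]
              for r in range(br, br + 3)
              for c in range(bc, bc + 3)]
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(ok(g) for g in rows + cols + boxes)
```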

Market Mechanics

Trigger condition: @Mira writes in the comments that such a project has started.

Each option resolves NA if the trigger condition is not met, or if @Mira chooses to cancel it as being poorly-written.


Canceling a bunch of low-volume personal markets due to the pivot.

They would likely have all resolved NO because the projects, while potentially able to solve some of these, don't match the constraints.

This market is very impressively optimistic. I'm not sure I would give any individual on Earth a 15% probability of beating Factorio from screen pixels over a one-year period under the stated requirements.

If @Mira is Gwern it’s a hard “yes” for me, but otherwise…?

Have you made any verifiable info about your identity or background public so that we could make an informed bet on this market?


What kind of CPU and GPU resources would you have access to?

Asking because most of these are, or at least smell like, DeepMind-style MCTS agent goals, but that kind of approach apparently requires an ungodly amount of compute for both the NN and the non-NN (gym/environment) parts.

Personal anecdote: after DM's matrix-multiply paper, I tried implementing a generic "code in assembly for a small virtual machine" type RL agent, with a modest goal (find a program in that VM's assembly that would compute a polynomial function given input in registers), and most of the compute went into actually evaluating the programs. The VM was CPU-implemented and branch-y, so very meh perf. Getting the agent to emit sensical assembly (no faulting on memory, no infinite loops...) required a lot of data, including feeding in "synthetic examples" of generated correct loops and memory-access instructions. I did not get the agent to do anything useful toward the goal. I now fully appreciate LeCun's judgment on RL: very data intensive, and the agent's view is a tiny sliver of the world (win/loss, or in this case deviation from the expected output), even though world modeling alleviated this a bit.
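
For a sense of why "most of the compute went into actually evaluating the programs", here is a deliberately tiny sketch of that kind of setup. This is not the commenter's actual code; the 3-register VM, its opcodes, and the polynomial target are hypothetical stand-ins. Each candidate has to be interpreted instruction by instruction on the CPU, with a step cap to catch the infinite loops mentioned above, so the environment rather than the network tends to dominate wall-clock time.

```python
def run_vm(program, r0, max_steps=256):
    """Interpret a toy 3-register VM; returns r2, or None on a fault or step-limit hit."""
    regs = [r0, 0, 0]
    pc, steps = 0, 0
    while 0 <= pc < len(program):
        steps += 1
        if steps > max_steps:      # catches infinite loops
            return None
        op, a, b = program[pc]
        if op == "add":
            regs[a] += regs[b]
        elif op == "mul":
            regs[a] *= regs[b]
        elif op == "set":
            regs[a] = b
        elif op == "jnz":          # jump to instruction b if register a is nonzero
            if regs[a] != 0:
                pc = b
                continue
        else:
            return None            # unknown opcode = fault
        pc += 1
    return regs[2]

def reward(program, target=lambda x: 3 * x * x + 1, inputs=range(-5, 6)):
    """Negative squared error against a target polynomial; the RL signal is this thin."""
    err = 0
    for x in inputs:
        out = run_vm(program, x)
        if out is None:
            return float("-inf")
        err += (out - target(x)) ** 2
    return -err
```

Any search loop over candidate programs spends nearly all of its time inside `run_vm`, which is the evaluation bottleneck described above.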

Aniwei, enough blog posting: of all ideas currently in the Mira-doing-thing-verse, this sounds like the one where you'd have the most edge, so, ganbatte!

@CamillePerrin Google DeepMind has a lot of stuff that could qualify. I've definitely cloned a few of their papers before, so the ideas have likely rubbed off. I was big into RL and evolutionary algorithms before LLMs took off and I had to learn more about those.

I have a different architecture I'm thinking of though, that nobody else is doing. It's not going to look similar to Google's.