This is part of a series of markets about things I could do in 2024. See also: /Mira/which-of-mira-s-cool-ideas-will-mir
Summary
An AGI is supposed to be able to do everything a human can do. Most humans can't do everything a human can do, and so do not qualify as general intelligences.
If @Mira works on a generalized agent that can consume and produce audio, text, image, and video, what types of things will it be able to do?
To count, tasks must be:
Learned. The agent must be initialized to a generic state (random or zero).
Learned ex-nihilo. No language models that consume the entire internet. No cloning of human behavior to play a video game. If it's going to "invent sorting algorithms", that means being given an "is sorted?" predicate and having to learn the algorithm from querying True/False on test cases.
For tasks like "can hold a conversation", it will have to learn human language, so there is some predictive cloning necessary. It's okay to learn from human data in the sense of observing it, but not to train a human-designed architecture with a human-designed dataset.
If it independently rederives the Transformer architecture and consumes the entire internet, that would count. But not if I handwrite a Transformer-based network and train it against the internet myself.
Tasks more so than benchmarks: I want to be able to make a YES/NO decision on these. While I might still resolve PROB, the task itself should naturally be YES/NO. "Scoring really high on the SAT" is not interesting because it is a test of memorization; "Beating Factorio", where the game must be learned from pixels, the agent can't inherently know how to read the text, and mid-range planning is required, shows intelligence just from the problem.
Cheap to actually test. A real AGI should do expensive things too, but do you really need "can fabricate a CPU from silicon?" as your task when you could have "can execute a simple place & route of a 32-bit adder", "given an oracle for chemical reactions, and an environment for placing atoms, can infer the boron-doping process to create semiconductors", [repeat 10x].
Answers do not need to be precise. "Hold a conversation with me" is something akin to a Turing Test, but a much weaker standard.
I've left the question open so anyone can add their own interesting tasks that an "AGI" should be able to do. I may edit an answer if it doesn't meet these standards, or NA it if it's a joke or an unsalvageable answer.
I would be overjoyed to do "program induction" (solving Sudoku or inventing sorting algorithms) ex-nihilo. Everything else is more of a solicitation for ideas.
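To make the "invent sorting algorithms from an 'is sorted?' predicate" task concrete, here is a minimal sketch of what the oracle-only interface could look like. This is my illustrative toy, not any planned architecture: the `is_sorted` oracle and the brute-force `oracle_only_sort` learner are hypothetical names, and a real agent would have to induce a general sorting *program* from such queries rather than brute-force each input.

```python
from itertools import permutations

def is_sorted(xs):
    """Black-box oracle: the only True/False signal the learner gets."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def oracle_only_sort(xs):
    """Toy 'learner': finds a sorted arrangement purely by querying the
    oracle on candidate outputs. Exponential in len(xs), but it never
    inspects element values directly, only oracle verdicts."""
    for perm in permutations(xs):
        if is_sorted(perm):
            return list(perm)
```

The point of the task is that the easy part shown here (querying the oracle) is all the agent is given; discovering an efficient algorithm that generalizes beyond the queried test cases is the actual intelligence being measured.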
Market Mechanics
Trigger condition: @Mira writes in the comments that such a project has started.
Each option resolves NA if the trigger condition is not met, or if @Mira chooses to cancel it as being poorly-written.
Related questions
If @Mira is Gwern it’s a hard “yes” for me, but otherwise…?
Have you made any verifiable info about your identity or background public so that we could make an informed bet on this market?
What kind of CPU and GPU resources would you have access to?
Asking because most of these are, or at least smell like, DeepMind-style MCTS agent goals, but this kind of approach apparently requires an ungodly amount of compute for both the NN and non-NN (gym/environment) parts.
Personal anecdote: after DM's matrix-multiply paper, I tried implementing a generic "code in assembly for a small virtual machine" type RL agent, with a modest goal (find a program in that VM's assembly that would compute a polynomial function given input in registers), and most of the compute went into actually evaluating the programs. The VM was CPU-implemented and branch-y, so very meh perf. Getting the agent to emit sensible assembly (no faulting on memory, no infinite loops...) required a lot of data, including feeding "synthetic examples" of generated correct loops and memory-access instructions. I did not get the agent to do anything useful toward the goal. I now fully appreciate LeCun's judgment on RL: very data intensive, and the agent's view is a tiny sliver of the world (win/loss, or in this case deviation from expected output), even though world modeling alleviated this a bit.
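To illustrate why program evaluation dominated the compute in a setup like the one described above, here is a minimal register-machine interpreter. This is a hypothetical toy ISA of my own invention, not the commenter's actual VM: every candidate program the agent emits has to be stepped through a loop like this, which is branch-heavy, unvectorizable, and needs an explicit step cap to survive the agent's infinite loops.

```python
def run_vm(program, registers, max_steps=1000):
    """Interpret a list of (op, a, b, dst) instructions over registers.
    Hypothetical 3-operand ISA: add/mul read regs a and b into dst;
    jnz jumps to instruction index b when reg a is nonzero."""
    regs = list(registers)
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:  # cap guards infinite loops
        op, a, b, dst = program[pc]
        steps += 1
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        elif op == "jnz":  # data-dependent branch: the perf killer
            if regs[a] != 0:
                pc = b
                continue
        pc += 1
    return regs
```

For example, evaluating a candidate for "x**2 + x with x in register 0" means running `run_vm([("mul", 0, 0, 1), ("add", 0, 1, 1)], [x, 0])` and comparing register 1 against the expected output, once per test case per candidate.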
Aniwei, enough blog posting: of all ideas currently in the Mira-doing-thing-verse, this sounds like the one where you'd have the most edge, so, ganbatte!
@CamillePerrin Google DeepMind has a lot of stuff that could qualify. I've definitely cloned a few of their papers before, so the ideas have likely rubbed off. I was big into RL and evolutionary algorithms before LLMs took off and I had to learn more about those.
I have a different architecture I'm thinking of though, that nobody else is doing. It's not going to look similar to Google's.