Attention + Diffusion + Search + RL = AGI?
25% chance

Resolves as YES if, within 18 months of the first AGI being revealed, there is strong evidence of an AGI model combining these four components:

  • Attention: The model contains an attention mechanism related to the one in the original transformer, and/or uses softmax over products of embeddings for some form of retrieval (see the attention sketch after this list).

  • Diffusion: The model learns some form of denoising objective related to the one used for DDPM (see the denoising sketch after this list).

  • Search: The model has a search mechanism that lets it explore an open-ended set of candidate responses during inference, and compare and evaluate those candidates before responding (see the best-of-n sketch after this list). The search mechanism also lets it explore memories and datasets.

  • Reinforcement Learning: The model uses reinforcement learning (e.g. RLHF) during training and/or inference (see the policy-gradient sketch after this list).
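
To make the attention criterion concrete, here is a minimal NumPy sketch of scaled dot-product attention, where softmax over products of query/key embeddings produces retrieval weights; `attention`, `Q`, `K`, and `V` are illustrative names, not taken from any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scores are products of query/key embeddings; softmax turns them
    # into retrieval weights used to mix the value vectors.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V
```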
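
For the diffusion criterion, a sketch of the simplified DDPM noise-prediction objective: corrupt clean data with Gaussian noise at a random step and score the model on recovering that noise. `denoiser` is a placeholder for whatever network is trained, and `alpha_bar` is the usual cumulative noise schedule:

```python
import numpy as np

def ddpm_loss(denoiser, x0, alpha_bar, rng):
    # Pick a random diffusion step, corrupt x0 at that step's noise
    # level, and measure how well the model predicts the added noise.
    t = rng.integers(len(alpha_bar))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - denoiser(x_t, t)) ** 2)
```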
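
For the search criterion, the simplest instance is best-of-n sampling: generate several candidate responses, evaluate each, and keep the winner. `generate` and `score` are hypothetical stand-ins for a sampler and an evaluator (e.g. a learned verifier or reward model):

```python
def best_of_n(generate, score, prompt, n=8):
    # Explore an open-ended set of candidate responses at inference
    # time, then compare and evaluate them before responding.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```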
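
For the reinforcement-learning criterion, a sketch of the REINFORCE-style objective that RLHF-type pipelines build on; `log_probs` and `rewards` are assumed to come from the current policy and a preference-trained reward model respectively:

```python
import numpy as np

def reinforce_loss(log_probs, rewards):
    # Responses scored above the batch average have their
    # log-probabilities pushed up, the rest pushed down.
    advantages = rewards - rewards.mean()
    return -np.mean(log_probs * advantages)
```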

The AGI in question need not be the first one revealed. For this question to resolve as YES, it must be primarily defined by these four components/techniques. It can include other components, but those should play limited roles within the model's training/inference processes.


An AGI with a "memory module" relying primarily on another technique (one that does not fit into these four categories) would not qualify. An AGI that solves long-term memory by, for example, scaling up long-context attention combined with search would qualify. In the context of this question, RAG approaches broadly qualify as attention/search-based techniques (see the retrieval sketch below).
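
As an illustration of why RAG sits naturally in the attention/search bucket, here is a toy retrieval step that ranks documents by embedding dot products, the same machinery the attention bullet describes; all names here are hypothetical:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    # Rank documents by the dot product of their embeddings with the
    # query embedding; the caller would prepend the top k to the prompt.
    top = np.argsort(doc_vecs @ query_vec)[::-1][:k]
    return [docs[i] for i in top]
```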

If, 18 months after the first AGI, there is weak evidence that such a model exists but no strong evidence, this question resolves as N/A. It also resolves as N/A if there is strong evidence that a model with these components exists at that time, but it is unclear whether its other components play a critical role.

If, 18 months after the first AGI is revealed, there is no evidence that such a model exists, this question resolves as NO.

Related questions:

/RemNi/attention-diffusion-search-agi

/RemNi/attention-diffusion-search-rl-agi (this question)


I think this is overly complicated.

All these things are just shortcuts to reduce the computational power required. I believe it will eventually be found that, given the same data and enough neurons, the architecture is largely irrelevant. Just as it no longer really matters what device or method you use to render a game because there's enough computational power available, in 10 years it won't matter what is used to train an LLM, because there will be enough computational power to try any of a number of architectures that basically converge to whatever the data would converge to.

@SteveSokolowski Maybe! But this question isn't necessarily about that, it specifically refers to a class of model that may or may not exist within 18 months of the first AGI, not the ultimate form these models may eventually take.

@RemNi Isn't it already proven that AGI can be created with something like:

x = Dense(10 ** 20)(inputs)

x = Dense(1, ...)(x)

and that's it?

@SteveSokolowski Not saying it can't be done, but that certainly has not been demonstrated yet.
