Attention + Diffusion + Search = AGI?
19% chance

Resolves as YES if, within 18 months of the first AGI being revealed, there is strong evidence of an AGI model combining these three components:

  • Attention: The model contains an attention mechanism related to the one in the original transformer, and/or uses a softmax over products of embeddings for some form of retrieval.

  • Diffusion: The model learns some form of denoising objective related to the one used for DDPM.

  • Search: The model has a search mechanism that lets it explore an open-ended set of responses at inference time, with the ability to compare and evaluate candidate solutions before responding. The search mechanism also lets it explore memories and datasets. (Minimal illustrative sketches of each component follow this list.)
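
A minimal, illustrative sketch of the "Attention" component, assuming nothing beyond the original transformer's scaled dot-product attention: a softmax over products of query/key embeddings acts as a soft retrieval step. It is not taken from any particular AGI system.

```python
# Illustrative only: scaled dot-product attention, i.e. a softmax over
# products of embeddings used as soft retrieval (as in the original transformer).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """queries: (n, d); keys, values: (m, d) -> (n, d) retrieved mixture."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])  # products of embeddings
    weights = softmax(scores, axis=-1)                    # soft retrieval weights
    return weights @ values                               # weighted lookup of values

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
print(attention(q, k, v).shape)  # (2, 8)
```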
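
Likewise, a sketch of the DDPM-style denoising objective referenced under "Diffusion": a model is trained to predict the noise that was added to clean data at a random timestep. The noise schedule and the dummy noise-prediction model below are assumptions made for illustration, not a real training setup.

```python
# Illustrative only: the DDPM-style denoising (noise-prediction) objective.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factors

def ddpm_loss(eps_model, x0):
    t = rng.integers(0, T)                           # random diffusion step
    eps = rng.normal(size=x0.shape)                  # Gaussian noise
    a = alphas_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps   # noised sample
    eps_hat = eps_model(x_t, t)                      # model's noise prediction
    return np.mean((eps_hat - eps) ** 2)             # denoising loss

# Dummy "model" that predicts zero noise, just to show the training signal.
dummy_eps_model = lambda x_t, t: np.zeros_like(x_t)
print(ddpm_loss(dummy_eps_model, rng.normal(size=(4, 16))))
```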
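
Finally, a sketch of the "Search" component in the sense used here: propose an open-ended set of candidate responses at inference time, compare and evaluate them, and answer with the best one. The propose and evaluate functions are hypothetical placeholders; a real system might sample candidates from the model and score them with a learned verifier or against retrieved memories/datasets.

```python
# Illustrative only: inference-time search over candidate responses.
import random

def propose(prompt, n):
    # Placeholder generator; a real model would sample diverse responses.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def evaluate(candidate):
    # Placeholder scorer; a real system might use a learned value model
    # or verification against memories and datasets.
    return random.random()

def search_respond(prompt, n_candidates=8):
    candidates = propose(prompt, n_candidates)
    scored = [(evaluate(c), c) for c in candidates]  # evaluate each candidate
    scored.sort(reverse=True)                        # compare / rank candidates
    return scored[0][1]                              # respond with the best one

print(search_respond("What is 2 + 2?"))
```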

The AGI in question need not be the first one revealed. For this question to resolve as YES, the model must be primarily defined by these three components/techniques. It can include other components, but those should play only limited roles in the model's training and inference processes.

An AGI with a "memory module" that relies primarily on another technique (one that does not fit into these three categories) would not qualify. An AGI that solves long-term memory by, for example, scaling up long-context attention combined with search would qualify. In the context of this question, RAG approaches broadly count as attention/search-based techniques.

If, 18 months after the first AGI, there is only weak evidence that such a model exists (but no strong evidence), this question resolves as N/A. If there is strong evidence that a model with these components exists at that time, but it is unclear whether its other components play a critical role, this question also resolves as N/A.

If, 18 months after the first AGI is revealed, there is no evidence that such a model exists, then this question resolves as NO.

Related questions:

/RemNi/attention-diffusion-search-agi (this question)

/RemNi/attention-diffusion-search-rl-agi


You're missing some sort of memory component.

@patrik maybe! I think it's possible that memory could be solved by scaling up long-context attention and training the model to search efficiently within a dataset, without a fancy "memory module" that leverages a technique other than the three listed in this question.

@patrik I've updated the description to be a bit clearer on that point. I think it's definitely possible that we'll get a memory module that relies on a technique other than these three. Thanks for the feedback!

I can see attention and search factoring into this, but how would diffusion help?

@singer as of 2024, diffusion appears to be significant for predicting complex structured data

@RemNi I'd be curious to hear examples, since I'm only aware of it being used for image generation.

@singer AlphaFold 3
