
The Forward-Forward Algorithm is a new type of neural network training algorithm that's supposed to replace backpropagation. It was proposed by Geoffrey Hinton in late 2022.
Link to the paper: https://www.cs.toronto.edu/~hinton/FFA13.pdf
Link to a Twitter thread with a less technical explanation: https://twitter.com/martin_gorner/status/1599755684941557761
Link to the leaderboard: https://paperswithcode.com/sota/image-classification-on-imagenet
If at the end of 2023 there is at least one entry in the leaderboard that uses the Forward-Forward Algorithm (FFA), the market resolves "Yes". The entry doesn't have to use FFA exclusively; it can use a mix of FFA and backpropagation.
The 65% accuracy threshold is low; it's there to prevent proof-of-concept entries that aren't competitive with backprop at all.
If an entry uses barely any FFA but technically qualifies, I'll ask for help in resolving the market.
🏅 Top traders
# | Name | Total profit
---|---|---
1 | | Ṁ232
2 | | Ṁ172
3 | | Ṁ32
4 | | Ṁ17
5 | | Ṁ15
> supposed to replace backpropagation
Even the paper doesn't make that claim. Hinton explicitly says:
> [the forward-forward algorithm] is unlikely to replace backpropagation for applications where power is not an issue. The exciting exploration of the abilities of very large models trained on very large datasets will continue to use backpropagation. The two areas in which the forward-forward algorithm may be superior to backpropagation are as a model of learning in cortex and as a way of making use of very low-power analog hardware without resorting to reinforcement learning.
@jonsimon from this quote I assume that for applications where power is an issue, it may replace backpropagation.
I used the phrasing here to simplify the explanation, omitting "... for specific applications in specific conditions".
The paper mentions potential "mixed" models that may use both FFA and backprop, and those would also resolve this market as "yes", while not replacing backprop completely.
@l8doku What is the nature of these mixed models? And how much heavy lifting is backprop doing in them?
Can someone help me out with the paper: why would an early layer learn to recognize features that are useful for high-level features in a later layer?
If my confusion is justified, then FF alone can't scale beyond simple problems (such as MNIST).
I also have no clue what's going on with Figure 3 / the time-extended backwards RNN thing.
@citrinitas I am not sure it does. It is not clear to me that this mechanism converges anywhere, let alone converges to something we care about.
@citrinitas Figure 3 and the RNN thing address exactly the problem you describe. In the recurrent version, an early layer receives both the earlier (lower-level) layer's and the later (higher-level) layer's activations as input, so earlier layers can learn from the activity of higher layers. Updating all layers simultaneously would be too hard (or impossible?), so each layer updates its own weights while treating the activities of the layers directly below and above it as frozen. The layers therefore update iteratively, which can be described as a recurrent net unrolled over time steps: the previous step's activities are frozen, and the current step's are being updated.
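For anyone who finds the prose hard to follow, here is a minimal sketch of that recurrent scheme in PyTorch. It is my own illustration, not code from the paper: the layer sizes, the softplus goodness loss, the threshold value, and the synchronous update loop are all assumptions. It shows the two key points: each layer sees normalized activity from both the layer below and the layer above (taken from the previous time step), and each layer optimizes only its own local goodness objective, with everything else detached.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

THETA = 2.0  # goodness threshold (illustrative value, not the paper's)

def goodness(h):
    # "Goodness" of a layer's activity: sum of squared activations.
    return h.pow(2).sum(dim=1)

def normalize(h, eps=1e-6):
    # Normalize activity length before passing it on, so the next layer
    # cannot read goodness directly off the vector norm.
    return h / (h.norm(dim=1, keepdim=True) + eps)

class RecurrentFFLayer(nn.Module):
    """A hidden layer that receives bottom-up and top-down activity."""

    def __init__(self, below_dim, above_dim, dim, lr=0.03):
        super().__init__()
        self.bottom_up = nn.Linear(below_dim, dim)
        self.top_down = nn.Linear(above_dim, dim)
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, h_below, h_above):
        return F.relu(self.bottom_up(h_below) + self.top_down(h_above))

    def local_step(self, below_pos, above_pos, below_neg, above_neg):
        # Neighbouring layers' activities arrive detached (frozen), so the
        # gradient of this local loss only updates this layer's weights.
        h_pos = self.forward(below_pos.detach(), above_pos.detach())
        h_neg = self.forward(below_neg.detach(), above_neg.detach())
        loss = (F.softplus(THETA - goodness(h_pos)) +       # push positive goodness up
                F.softplus(goodness(h_neg) - THETA)).mean() # push negative goodness down
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return normalize(h_pos).detach(), normalize(h_neg).detach()

# A small two-hidden-layer stack with random placeholder data.
batch, in_dim, hid, top_dim = 8, 784, 500, 10
x_pos = torch.randn(batch, in_dim)   # "positive" data (e.g. image with the correct label embedded)
x_neg = torch.randn(batch, in_dim)   # "negative" data (e.g. image with a wrong label embedded)
top = torch.zeros(batch, top_dim)    # stand-in for top-down input to the last hidden layer

layer1 = RecurrentFFLayer(in_dim, hid, hid)
layer2 = RecurrentFFLayer(hid, top_dim, hid)

# Activities from the previous time step, initialised to zero.
h1_pos = h1_neg = torch.zeros(batch, hid)
h2_pos = h2_neg = torch.zeros(batch, hid)

for t in range(10):  # a few synchronous update steps
    new_h1 = layer1.local_step(x_pos, h2_pos, x_neg, h2_neg)  # uses layer 2's OLD activity
    new_h2 = layer2.local_step(h1_pos, top, h1_neg, top)      # uses layer 1's OLD activity
    (h1_pos, h1_neg), (h2_pos, h2_neg) = new_h1, new_h2
```

The `.detach()` calls are what make this forward-only: `loss.backward()` inside `local_step` only reaches that one layer's weights, so no error signal is ever propagated through the rest of the network.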
I'm trying to measure the popularity of FFA with this market. Hinton's previous ideas (CapsNets and GLOM) are interesting but aren't used in practice: CapsNets have trouble scaling, and GLOM is more of a general concept, as far as I understand.
In contrast, FFA seems like it should be easier to scale. It can be one of the building blocks of a complex network that also uses backpropagation and other existing algorithms. So I think FFA is more likely to be used in practice than CapsNets or GLOM.
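To make the "mixed model" idea concrete, here is a rough sketch of one hybrid setup. It is my own assumption about how such a mix could look (sizes, losses, and the training schedule are placeholders), not a specific published model: hidden layers trained greedily with a local Forward-Forward goodness objective, plus a linear softmax readout on top of their frozen activities trained with ordinary backpropagation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

THETA = 2.0  # goodness threshold (illustrative value)

def goodness(h):
    return h.pow(2).sum(dim=1)            # sum of squared activations

def normalize(h, eps=1e-6):
    return h / (h.norm(dim=1, keepdim=True) + eps)

class FFLayer(nn.Module):
    """A layer trained only on its local Forward-Forward goodness objective."""

    def __init__(self, in_dim, out_dim, lr=0.03):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.opt = torch.optim.Adam(self.fc.parameters(), lr=lr)

    def forward(self, x):
        return F.relu(self.fc(x))

    def train_local(self, x_pos, x_neg):
        # Inputs are detached, so no gradient crosses layer boundaries.
        h_pos, h_neg = self.forward(x_pos.detach()), self.forward(x_neg.detach())
        loss = (F.softplus(THETA - goodness(h_pos)) +
                F.softplus(goodness(h_neg) - THETA)).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return normalize(h_pos).detach(), normalize(h_neg).detach()

# 1) FF part: greedy, layer-local training on placeholder positive/negative data.
layers = [FFLayer(784, 500), FFLayer(500, 500)]
x_pos, x_neg = torch.randn(64, 784), torch.randn(64, 784)   # placeholder data
for _ in range(100):
    h_pos, h_neg = x_pos, x_neg
    for layer in layers:
        h_pos, h_neg = layer.train_local(h_pos, h_neg)

# 2) Backprop part: a linear softmax readout trained on the frozen FF features.
labels = torch.randint(0, 10, (64,))                        # placeholder labels
with torch.no_grad():
    h1 = normalize(layers[0](x_pos))
    h2 = normalize(layers[1](h1))
feats = torch.cat([h1, h2], dim=1)

readout = nn.Linear(feats.shape[1], 10)
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
for _ in range(100):
    loss = F.cross_entropy(readout(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this toy version backprop only fits the readout; a real mixed model could split the work differently, which is what the "how much heavy lifting" question above is getting at.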