Will adding an Attention layer improve the performance of my stock trading model?
Resolved NO on Dec 4.

Let's see if Manifold users can accurately predict how AI models should be trained to save electricity.

I have been writing a 55M-parameter stock and cryptocurrency trading model. A test model with only 4M parameters has already been trained and is useful in trading. Now I have bought more graphics cards to make use of all the data I have.

The large model is to be trained on 120 million sequences of bars of OHLCV data (about 5 TB), with an additional sixth feature ("imputed"), which is 1 if a bar is missing from the data or is an outlier, and 0 otherwise.
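
For context, here is a minimal sketch of how such an "imputed" flag could be derived from raw OHLCV bars. The column names, the rolling window, and the outlier threshold are my own assumptions, not necessarily what the model actually uses:

    import numpy as np
    import pandas as pd

    def add_imputed_flag(bars: pd.DataFrame, z_thresh: float = 8.0) -> pd.DataFrame:
        """Mark bars that are missing (NaN OHLCV) or extreme outliers.

        Assumes columns ['open', 'high', 'low', 'close', 'volume'] on a
        regular candle-time index, with gaps already reindexed in as NaN rows.
        """
        cols = ["open", "high", "low", "close", "volume"]
        missing = bars[cols].isna().any(axis=1)

        # Simple outlier test: bar-to-bar log return far outside its rolling spread.
        log_ret = np.log(bars["close"]).diff()
        z = (log_ret - log_ret.rolling(500).mean()) / log_ret.rolling(500).std()
        outlier = z.abs() > z_thresh

        bars = bars.copy()
        bars["imputed"] = (missing | outlier).astype(np.float32)
        return bars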

The number of bars in each sequence is greater than 1000, and the candle time is five minutes or greater. The output is the predicted price several bars in the future. Standard techniques like scaling and Dropout are used. I won't reveal what is at the beginning of the model. Here are the middle and end:

  • Secret layers at the beginning of the model

  • Several LSTM layers

  • The proposed code (or not; see the fuller sketch after this list):

    x = LayerNormalization()(x)
    x = Attention()([x, x])
    x = LayerNormalization()(x)
  • One more LSTM layer

  • Multiple Dense layers
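
To make the ordering above concrete, here is a rough Keras sketch of the middle and end of the model. The layer sizes, dropout rate, sequence length, and the placeholder standing in for the secret initial layers are all my own assumptions; only the overall structure follows the description:

    from tensorflow.keras import layers, Model

    SEQ_LEN = 1024      # assumption: each sequence has >1000 bars
    N_FEATURES = 6      # OHLCV plus the "imputed" flag

    def build_model(use_attention: bool) -> Model:
        inputs = layers.Input(shape=(SEQ_LEN, N_FEATURES))
        x = inputs                      # placeholder for the secret initial layers

        # Several LSTM layers (sizes and dropout are placeholders)
        for units in (256, 256, 128):
            x = layers.LSTM(units, return_sequences=True)(x)
            x = layers.Dropout(0.2)(x)

        if use_attention:
            # The proposed block: LayerNorm -> self-attention -> LayerNorm
            x = layers.LayerNormalization()(x)
            x = layers.Attention()([x, x])      # query = value = x (self-attention)
            x = layers.LayerNormalization()(x)

        # One more LSTM layer, then Dense layers down to the predicted price
        x = layers.LSTM(128)(x)
        x = layers.Dense(64, activation="relu")(x)
        outputs = layers.Dense(1)(x)
        return Model(inputs, outputs)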

I will train each model for one epoch (about 144 hours total) on three RTX 4090 graphics cards. One run will use the model without the proposed code, and the second will include it.
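
For reference, the usual way to spread a single Keras training run across three local GPUs is tf.distribute.MirroredStrategy. The optimizer, the train_dataset pipeline, and the build_model constructor below are placeholders (the latter from the sketch above), not the actual training configuration:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()   # uses all visible GPUs, here three 4090s
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    with strategy.scope():
        model = build_model(use_attention=True)   # or False for the baseline run
        model.compile(optimizer="adam", loss=tf.keras.losses.Huber())

    # One pass (epoch) over the ~120M sequences, streamed from disk as a tf.data pipeline.
    model.fit(train_dataset, epochs=1)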

If the model with this code achieves a lower Huber loss than the model without it, this market will resolve to YES. If an error occurs during training that cannot be resolved without significantly changing the model architecture, or if training cannot be completed due to a physical problem such as a card failure, the market will resolve to N/A. Otherwise, it will resolve to NO.
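
The resolution check itself then reduces to comparing two scalars. A sketch, assuming both runs are compiled with the Huber loss and evaluated on the same held-out set (the market text does not say which data split is used; model and dataset names here are hypothetical):

    loss_without = model_without_attention.evaluate(eval_dataset, verbose=0)
    loss_with = model_with_attention.evaluate(eval_dataset, verbose=0)

    print("Resolves YES" if loss_with < loss_without else "Resolves NO")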
