Projection of LSTM layer vs GRU layer

6 views (last 30 days)
Silvia on 28 May 2024
Commented: Silvia on 10 Jun 2024
I am training two RNNs, one with an LSTM layer and the other with a GRU layer. The two architectures are the following:
numFeatures = 1;
numHiddenUnits = 32;
layersLSTM = [
sequenceInputLayer(numFeatures)
lstmLayer(numHiddenUnits, OutputMode="sequence")
fullyConnectedLayer(numFeatures)
];
layersGRU = [
sequenceInputLayer(numFeatures)
gruLayer(numHiddenUnits, OutputMode="sequence")
fullyConnectedLayer(numFeatures)
];
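For context, this is a minimal sketch of what the projected counterparts of these two architectures might look like. The projector sizes here are assumptions taken from the percentages discussed in this question, not values from the original training script:

```matlab
% Sketch only: projected variants of the layers above.
% Projector sizes follow the 75%/25% split discussed in this question.
numFeatures = 1;
numHiddenUnits = 32;

layersLSTMProj = [
    sequenceInputLayer(numFeatures)
    lstmProjectedLayer(numHiddenUnits, ...
        round(0.75*numHiddenUnits), ...       % OutputProjectorSize = 24
        max(1, round(0.25*numFeatures)), ...  % InputProjectorSize (floored at 1)
        OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];

layersGRUProj = [
    sequenceInputLayer(numFeatures)
    gruProjectedLayer(numHiddenUnits, ...
        round(0.25*numHiddenUnits), ...       % OutputProjectorSize = 8
        max(1, round(0.75*numFeatures)), ...  % InputProjectorSize (floored at 1)
        OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];
```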
Using the GRU architecture and training the projected model, the validation RMSE and loss do not track the training RMSE and loss, as shown in the image below:
This is the first time this has happened. I have never had this problem with the LSTM network (either with the lstmLayer or with the lstmProjectedLayer), nor when training the GRU model without projection; in those cases the validation metrics tracked the training metrics properly. What could be causing this?
I also have a second question:
Following the two MATLAB examples, I set the OutputProjectorSize and InputProjectorSize parameters to:
  • 75% of the number of hidden units and 25% of the input size, respectively, for the LSTM
  • 25% of the number of hidden units and 75% of the input size, respectively, for the GRU
So for the GRU it is the opposite. Is there a reason behind this choice?
Thank you in advance!

Answers (1)

Maksym Tymchenko on 3 Jun 2024
I am glad to see that you are using our new projection features.
I'll start by answering the second question.
From what I see, both examples are using the exact same definition for OutputProjectorSize and InputProjectorSize in the section "Compare Network Projection Sizes":
  • An output projector size of 25% of the number of hidden units.
  • An input projector size of 75% of the input size.
These are reasonable sizes to choose because they result in the lstmProjectedLayer having fewer learnable parameters than an lstmLayer with the same number of hidden units. Note that it is possible to choose values that result in a projected layer being larger than the original layer. To avoid this, use the function compressNetworkUsingProjection, which determines these parameter sizes automatically based on the specified amount of compression.
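As a hedged sketch of that workflow (the variable names `net` and `XTrain` are placeholders, not from the original post):

```matlab
% Sketch: let compressNetworkUsingProjection pick the projector sizes.
% `net` is a trained dlnetwork; `XTrain` is representative input data
% used to analyze the layer activations (placeholder names).
netProjected = compressNetworkUsingProjection(net, XTrain, ...
    LearnablesReductionGoal=0.25);  % aim to remove ~25% of learnables
```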
Alternatively, if you want to create the projected layers from scratch, follow the Tips in the description of the OutputProjectorSize and InputProjectorSize parameters. These say that, to ensure that the projected layer requires fewer learnable parameters than the corresponding non-projected layer:
  1. For an lstmProjectedLayer: set the OutputProjectorSize property to a value less than 4*NumHiddenUnits/5, and set the InputProjectorSize property to a value less than 4*NumHiddenUnits*inputSize/(4*NumHiddenUnits+inputSize)
  2. For a gruProjectedLayer: set the OutputProjectorSize property to a value less than 3*NumHiddenUnits/4, and set the InputProjectorSize property to a value less than 3*NumHiddenUnits*inputSize/(3*NumHiddenUnits+inputSize)
These formulas can be derived by expressing the total number of learnable parameters as a function of the number of hidden units and the input size. For more information, see the algorithms section of the pages lstmProjectedLayer and gruProjectedLayer.
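To make these bounds concrete, here is the arithmetic for the architecture in this question (numHiddenUnits = 32, inputSize = 1). Note that with a single input feature, both input-projector bounds fall below 1, so for this particular network only the output projector can actually reduce the parameter count:

```matlab
% Evaluate the projector-size upper bounds from the Tips above
% for the network in this question.
numHiddenUnits = 32;
inputSize = 1;

% lstmProjectedLayer bounds
lstmOutMax = 4*numHiddenUnits/5                                       % 25.6
lstmInMax  = 4*numHiddenUnits*inputSize/(4*numHiddenUnits+inputSize)  % 128/129, about 0.99

% gruProjectedLayer bounds
gruOutMax = 3*numHiddenUnits/4                                        % 24
gruInMax  = 3*numHiddenUnits*inputSize/(3*numHiddenUnits+inputSize)   % 96/97, about 0.99
```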
Regarding your first question, I would need the full reproduction steps, including the script and dataset used, in order to investigate what the issue is. Please feel free to share these as an attachment to this post. Or alternatively, you can open a technical support request with the reproduction steps.
  1 comment
Silvia on 10 Jun 2024
Thank you for the detailed explanations and the interesting insight into the compressNetworkUsingProjection function!
Unfortunately, as far as the codes and datasets are concerned, I cannot share anything for reasons of data privacy.
But thank you again for your help!
Silvia



Release

R2024a
