Low-frequency response from LSTM model

5 views (last 30 days)
Shubham Baisthakur on 6 Nov 2023
Commented: Shubham Baisthakur on 17 Nov 2023
How can I get a low-frequency output from an LSTM network? Below is the time-history response of my input features, which have a relatively low-frequency component.
My LSTM network architecture is as follows:
layers = [
    sequenceInputLayer(size(X_train{1}, 1))                    % input features (F)
    lstmLayer(x.num_hidden_units_1, 'OutputMode', 'sequence')
    tanhLayer
    dropoutLayer(0.05)
    lstmLayer(x.num_hidden_units_2, 'OutputMode', 'sequence')
    dropoutLayer(0.05)
    tanhLayer
    fullyConnectedLayer(x.num_layers_ffnn)
    tanhLayer
    fullyConnectedLayer(1)
    ];
During training, my network predictions are plotted against the target output as follows.
The network output has a very high-frequency component on the validation data; however, when the model is used to predict the test data, it gives a flat line.
My two major concerns are:
1) Why is the LSTM network giving a high-frequency output even when the input features have a relatively low frequency? (A spectrum comparison is sketched below.)
2) If the model produces a high-frequency output during training, why does it give a flat line during testing?
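To quantify the first concern, this is roughly how I compare the frequency content of an input feature and the prediction. This is a minimal sketch on synthetic stand-in data; fs, x and yPred are placeholders for my actual sampling rate, input feature and network output:
fs = 100;                                  % assumed sampling rate in Hz (placeholder)
t  = 0:1/fs:10 - 1/fs;                     % 10 s of data
x     = sin(2*pi*0.5*t);                   % stand-in: low-frequency input feature
yPred = x + 0.2*sin(2*pi*20*t);            % stand-in: prediction with a high-frequency component
N = numel(x);
f = (0:floor(N/2)) * fs / N;               % one-sided frequency axis
Xmag = abs(fft(x - mean(x)));              % remove the DC offset before the FFT
Ymag = abs(fft(yPred - mean(yPred)));
plot(f, Xmag(1:numel(f)), f, Ymag(1:numel(f)))
legend("Input feature", "LSTM prediction")
xlabel("Frequency (Hz)"), ylabel("|FFT|")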

Answers (1)

Debraj Maji on 17 Nov 2023
I see that you are trying to understand why your LSTM network produces a high-frequency output on the training data even though the input features have a low frequency.
The model might have overfitted to the training data and captured noise or specific patterns that are not generalizable. This can result in high-frequency outputs on the training set, but when applied to unseen data, the model fails to generalize, leading to a flat line.
LSTMs are designed to overcome the limitations of traditional RNN-based architectures, as they can capture long-term dependencies in sequential data. They are not inherently bad with low-frequency data; in your case, however, the network is unable to capture the underlying pattern in the sequence due to the nature of the input features. The high-frequency pattern in the output is essentially noise introduced by inaccurate predictions.
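A quick way to check this is to low-pass filter the prediction: if the high-frequency component is additive noise, a moving-average smooth should recover the low-frequency trend. A minimal sketch, using a synthetic stand-in for your prediction:
t       = 0:0.01:10;
yPred   = sin(2*pi*0.5*t) + 0.2*randn(size(t));    % stand-in: low-frequency signal plus noise
ySmooth = smoothdata(yPred, "movmean", 25);        % 25-sample moving average
plot(t, yPred, t, ySmooth)
legend("Raw prediction", "Smoothed prediction")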
Possible ways to mitigate this are:
  • Increase the amount of training data.
  • Apply feature engineering and feature scaling (see the sketch after this list).
  • Experiment with different initializations, learning rates, or optimization algorithms to stabilize training. Monitoring the training and validation loss curves can provide insight into model stability.
  • Systematically tune hyperparameters using techniques like grid search or random search to find the most suitable values for your specific problem.
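For the feature-scaling point, here is a minimal sketch of per-feature z-score normalization for cell-array sequence data (variable names follow your code; computing the statistics on the training set only is my assumption):
allX = cat(2, X_train{:});                 % F-by-(total time steps)
mu   = mean(allX, 2);                      % per-feature mean
sig  = std(allX, 0, 2);                    % per-feature standard deviation
sig(sig == 0) = 1;                         % guard against constant features
zscoreSeq = @(s) (s - mu) ./ sig;          % implicit expansion over time steps
X_train = cellfun(zscoreSeq, X_train, 'UniformOutput', false);
X_val   = cellfun(zscoreSeq, X_val,   'UniformOutput', false);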
For more information on fine-tuning an LSTM, you can refer to the following documentation: https://in.mathworks.com/help/deeplearning/ug/long-short-term-memory-networks.html
  1 comment
Shubham Baisthakur on 17 Nov 2023
Hello @Debraj Maji, I don't think this is a case of overfitting to the training data, because even the training data does not have such a high-frequency component. I have already applied the mitigations you suggested but couldn't get rid of the high-frequency output.
I think the error could be in the way I am defining the custom training loop, because the model tends to perform quite well when it is trained using the "trainnet" function (a minimal trainnet call is sketched after the code for comparison).
I am attaching the code for the custom training loop; can you spot any obvious errors?
function [val_loss, net] = LSTM_NetworkOptimization_CustomLoop(x, X_train, Y_train, X_val, Y_val, is_training, isCustomTraining)

% Define hyperparameters
if size(X_train, 1) > 1
    batchSize = x.batch_size;
else
    batchSize = 1;
end
sequenceLength = x.sequence_length;
if is_training
    numEpochs = 400; % Adjust as needed
    minEpochs = 300;
else
    numEpochs = 300;
end

% Check if a GPU is available
useGPU = canUseGPU();

% Create the LSTM model
layers = [
    sequenceInputLayer(size(X_train{1}, 1))                    % input features (F)
    lstmLayer(x.num_hidden_units_1, 'OutputMode', 'sequence')
    tanhLayer
    dropoutLayer(x.drop_out_rate*0.5)
    lstmLayer(x.num_hidden_units_2, 'OutputMode', 'sequence')
    dropoutLayer(x.drop_out_rate*0.5)
    tanhLayer
    fullyConnectedLayer(x.num_layers_ffnn)
    tanhLayer
    fullyConnectedLayer(1)
    ];
net = dlnetwork(layers);

% Training counters
iteration = 0;
epoch = 0;

% Adam optimizer state
averageGrad = [];
averageSqGrad = [];

% Initialize the training progress monitor
if is_training
    monitor = trainingProgressMonitor(Info="Epoch", XLabel="Iteration");
    monitor.Metrics = ["Loss", "Data_Loss", "Var_Loss", "Validation_Loss", "Frequency_Loss", "Power_Loss", "Energy_Loss"];
    groupSubPlot(monitor, "Loss-Components", ["Loss", "Data_Loss", "Var_Loss", "Frequency_Loss", "Power_Loss", "Energy_Loss"]);
    groupSubPlot(monitor, "Validation-Loss", "Validation_Loss");
end

% Track consecutive iterations with an (almost) unchanged validation loss
consecutive_unchanged_loss = 0;
previous_val_loss = inf;

% Network training
while epoch < numEpochs && consecutive_unchanged_loss < 10
    epoch = epoch + 1;

    % Shuffle input and target data with the same permutation
    idx = randperm(numel(Y_train));
    X_epoch = X_train(idx);
    Y_epoch = Y_train(idx);

    for i = 1:batchSize:numel(Y_train)
        % Prepare the mini-batch
        startIndex = i;
        endIndex = min(i + batchSize - 1, numel(Y_train));

        % Pad or truncate sequences to match the specified sequence length
        X_miniBatch = padOrTruncate(X_epoch(startIndex:endIndex), sequenceLength, useGPU);
        Y_miniBatch = padOrTruncate(Y_epoch(startIndex:endIndex), sequenceLength, useGPU);

        % Validation batch; note this indexes the validation set with the
        % training mini-batch indices
        Xepoch_val = padOrTruncate(X_val(startIndex:endIndex), length(X_val{1}), useGPU);
        Yepoch_val = padOrTruncate(Y_val(startIndex:endIndex), length(Y_val{1}), useGPU);

        X = X_miniBatch;
        T = Y_miniBatch;

        % Evaluate the model loss and gradients using dlfeval and the modelLoss function
        isValidation = false;
        [loss, data_loss, var_loss, freq_loss, power_loss, energy_loss, gradients] = ...
            dlfeval(@modelLoss_LSTM, net, X, T, isValidation, numEpochs);

        % Update the network parameters using the Adam optimizer
        iteration = iteration + 1;
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, averageGrad, averageSqGrad, iteration);

        % Compute the validation loss
        isValidation = true;
        val_loss = dlfeval(@modelLoss_LSTM, net, Xepoch_val, Yepoch_val, isValidation, numEpochs);

        % Early stopping: count how often the validation loss is unchanged
        if is_training && epoch > minEpochs
            if abs(val_loss - previous_val_loss) < 1e-6
                consecutive_unchanged_loss = consecutive_unchanged_loss + 1;
            else
                consecutive_unchanged_loss = 0;
            end
            previous_val_loss = val_loss;
        end

        % Update the training progress monitor
        if is_training
            recordMetrics(monitor, iteration, Loss=extractdata(loss), ...
                Data_Loss=extractdata(data_loss), Var_Loss=extractdata(var_loss), ...
                Validation_Loss=val_loss, Frequency_Loss=freq_loss, ...
                Power_Loss=power_loss, Energy_Loss=energy_loss);
            updateInfo(monitor, Epoch=epoch + " of " + numEpochs);
            monitor.Progress = 100 * iteration / (numEpochs * numel(Y_train));
        end
    end
end
end

% Helper function to pad or truncate sequences to targetLength time steps
function paddedSequence = padOrTruncate(sequence, targetLength, useGPU)
paddedSequence = cell(size(sequence));
for i = 1:numel(sequence)
    if size(sequence{i}, 2) < targetLength
        % Zero-pad short sequences on the right
        padding = zeros(size(sequence{i}, 1), targetLength - size(sequence{i}, 2));
        paddedSequence{i} = [sequence{i}, padding];
    elseif size(sequence{i}, 2) > targetLength
        % Truncate long sequences
        paddedSequence{i} = sequence{i}(:, 1:targetLength);
    else
        paddedSequence{i} = sequence{i};
    end
    paddedSequence{i} = dlarray(paddedSequence{i}, "CT");
    if useGPU
        paddedSequence{i} = gpuArray(paddedSequence{i});
    end
end
end
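For comparison, this is roughly how I train the same layers with "trainnet"; the options shown here are placeholders rather than my exact settings:
options = trainingOptions("adam", ...
    MaxEpochs=400, ...
    MiniBatchSize=32, ...
    Shuffle="every-epoch", ...
    ValidationData={X_val, Y_val}, ...
    Plots="training-progress");
net = trainnet(X_train, Y_train, layers, "mse", options);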

