Bayesian Optimization: How should we parameterize hidden units for changing number of layers (depth) of a BiLSTM network using bayesopt?
Hi there,
I have been trying to use Bayesian optimization to tune the hyperparameters of my BiLSTM code. (I hope this code helps others in the community, because I have seen unanswered MATLAB questions about Bayesian optimization for LSTM networks, which are similar to BiLSTM networks.)
One of the parameters I am varying is the depth of the BiLSTM network, but I think I should also find the best number of hidden units for each layer.
As you can see in the code, the maximum number of layers I want to test is 10, so I created HiddenUnits_1 through HiddenUnits_10 under optimVars. However, the number of hidden-unit variables that actually matter depends on the number of layers in the network. For example, if a network with 5 BiLSTM layers is being evaluated, only 5 hidden-unit variables are needed (HiddenUnits_1 through HiddenUnits_5), and the remaining ones (HiddenUnits_6 through HiddenUnits_10) should not exist for that particular "experiment". The code runs successfully, but it optimizes all 10 hidden-unit variables even when the number of layers is smaller. Is there a way to avoid optimizing unnecessary variables in this case (that is, to ignore HiddenUnits_6 through HiddenUnits_10 when the point being evaluated has only 5 layers)?
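One possible way to express this dependence is a conditional constraint. The sketch below assumes the 'ConditionalVariableFcn' option of bayesopt is available in your release; the function name deactivateUnusedUnits and the hard-coded maxDepth are hypothetical and not part of the code in this post. It sets every hidden-unit variable beyond the sampled depth to NaN, which is how bayesopt is told that a variable is irrelevant for a given point.

```matlab
% Hypothetical helper (an assumption, not part of the original code).
% X is a table with one row per candidate point and one column per
% optimizable variable. The returned table has the hidden-unit variables
% of layers that do not exist for that row set to NaN.
function Xnew = deactivateUnusedUnits(X)
    maxDepth = 10;                        % assumed: upper bound of SectionDepth
    Xnew = X;
    for k = 2:maxDepth
        name = sprintf('HiddenUnits_%d', k);
        unused = Xnew.SectionDepth < k;   % rows whose network has fewer than k layers
        Xnew.(name)(unused) = NaN;        % mark the variable as irrelevant for those rows
    end
end
```

It would be passed to the optimizer as an extra name-value pair, for example bayesopt(ObjFcn,optimVars,'ConditionalVariableFcn',@deactivateUnusedUnits, ...). The objective function then receives NaN for the unused variables, which is harmless here because bilstmBlock only reads the first SectionDepth entries.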
A related, slightly off-topic question: is there a way to optimize these hidden units as an array or a cell array? That is, can I define a cell array to be optimized in which each cell is one of the hidden-unit variables (HiddenUnits_1 through HiddenUnits_10)? I ask because I could then modify the code to read the hidden units from the cell array automatically, without naming each hidden-unit variable separately, and I believe I could make their number depend on the number of BiLSTM layers (I have not tried this yet).
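As far as I know, each optimizableVariable describes a single scalar, so the sketch below does not optimize an array directly. Instead it builds the scalar variables in a loop and gathers the active ones into a vector inside the objective function. The helper names makeOptimVars and activeHiddenUnits are hypothetical; the variable names and ranges are the ones used in this post.

```matlab
% Hypothetical helper: build the variable list without typing each entry.
function optimVars = makeOptimVars(maxDepth)
    optimVars = [
        optimizableVariable('SectionDepth',[1 maxDepth],'Type','integer')
        optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')];
    for k = 1:maxDepth
        name = sprintf('HiddenUnits_%d', k);
        optimVars = [optimVars; optimizableVariable(name,[1 200],'Type','integer')]; %#ok<AGROW>
    end
end

% Hypothetical helper: optVars is the one-row table that bayesopt passes to
% the objective function. Returns the hidden-unit sizes of the layers that
% exist for this point as a 1-by-SectionDepth row vector.
function numHiddenUnits = activeHiddenUnits(optVars)
    depth = optVars.SectionDepth;
    names = arrayfun(@(k) sprintf('HiddenUnits_%d', k), 1:depth, 'UniformOutput', false);
    numHiddenUnits = optVars{1, names};
end
```

With these, bilstmBlock could take the vector returned by activeHiddenUnits(optVars) as a single argument and loop over numel(numHiddenUnits) instead of receiving ten separate inputs.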
Thank you. Any help or suggestions are appreciated.
Here is the code I have written for it:
%% Bayesian Optimization
optimVars = [
    optimizableVariable('SectionDepth',[1 10],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
    optimizableVariable('HiddenUnits_1',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_2',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_3',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_4',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_5',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_6',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_7',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_8',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_9',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_10',[1 200],'Type','integer')];

ObjFcn = makeObjFcn(Noisy_XTrain_PLE,Noisy_YTrain_PLE,PLE_Predictions_40_train,PLE_Predictions_40_test,mu_PLE,std_PLE);

% Perform Bayesian optimization by minimizing the error on the validation set.
% A minimum of 30 runs is suggested for Bayesian optimization (more can lead to better results).
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObjectiveEvaluations',30, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'UseParallel',false);

% Look up the file name of the best network found during optimization and load it
bestIdx = BayesObject.IndexOfMinimumTrace(end);
fileName = BayesObject.UserDataTrace{bestIdx};
savedStruct = load(fileName);

% Print the training and validation errors
TrainError = savedStruct.TotaltrainingError
valError = savedStruct.TotalvalError

%% Define the objective function for optimization
function ObjFcn = makeObjFcn(XTrain,YTrain,PLE_Predictions_training,PLE_Predictions_test,mu_PLE,std_PLE)
    ObjFcn = @valErrorFun;

    function [TotalvalError,cons,fileName] = valErrorFun(optVars)
        % Cell arrays that store the per-sequence validation and training errors
        valError = cell(510,1);
        TrainingError = cell(510,1);

        % Random seed
        seed = 100;
        rng(seed);

        % Input - Output features
        numFeatures = 1;
        numResponses = 1;

        % Hyperparameters
        miniBatchSize = 1;
        %numHiddenUnits = 50;
        x = 0; % Mean of the normal distribution used by the weight initializers
        y = 1; % Standard deviation of the normal distribution used by the weight initializers
        maxEpochs = 1;

        % Layer structure
        layers = [
            sequenceInputLayer(numFeatures)
            bilstmBlock(optVars.SectionDepth,optVars.HiddenUnits_1,optVars.HiddenUnits_2,optVars.HiddenUnits_3,optVars.HiddenUnits_4,optVars.HiddenUnits_5,optVars.HiddenUnits_6,optVars.HiddenUnits_7,optVars.HiddenUnits_8,optVars.HiddenUnits_9,optVars.HiddenUnits_10,x,y) % Function
            dropoutLayer(0)
            % Add the fully connected layer and the final regression layer.
            fullyConnectedLayer(numResponses,'BiasInitializer','ones','WeightsInitializer',@(sz) normrnd(x,y,sz))
            regressionLayer];

        % Training options
        options = trainingOptions('adam', ...
            'InitialLearnRate',optVars.InitialLearnRate, ...
            'GradientThreshold',1, ...
            'MaxEpochs',maxEpochs, ...
            'ExecutionEnvironment','gpu', ...
            'LearnRateSchedule','piecewise', ...
            'LearnRateDropPeriod',125, ...
            'LearnRateDropFactor',1, ...
            'MiniBatchSize',miniBatchSize, ...
            'Shuffle','never', ...
            'Verbose',false, ...
            'Plots','training-progress');

        % Train network
        net = trainNetwork(XTrain,YTrain,layers,options);

        % Forecast future values
        for i = 450:510
            net = resetState(net); % Testing this reset option
            [net,XPred] = predictAndUpdateState(net,XTrain(i,:),'MiniBatchSize',1);
            Ending = cellfun(@(x) x(end),YTrain(i,:),'UniformOutput',false);

            % Update the state again on the last point of YTrain to get the next state update
            [net,YPred] = predictAndUpdateState(net,Ending,'MiniBatchSize',1);

            % Repeat predictAndUpdateState in a loop to get the next time steps (forecast into the future)
            for j = 2:40 % Need to change this to account for remaining months for each well
                [net,YPred(:,j)] = predictAndUpdateState(net,YPred(:,j-1),'MiniBatchSize',1,'ExecutionEnvironment','gpu');
            end

            % Convert cell to matrix, since the number of predictions is the same
            % (not the total amount for each well, but the next 5 years, for example)
            YPred_new = cell2mat(YPred);
            mu_3 = cell2mat(mu_PLE);
            std_3 = cell2mat(std_PLE);
            De_normalized_YPred = YPred_new.*std_3(i,:) + mu_3(i,:);
            De_normalized_Xpred = cellfun(@(x,y,z) x.*y + z,std_PLE(i,1),XPred,mu_PLE(i,1),'UniformOutput',false);

            % Test PLE
            PLE_test = cell2mat(PLE_Predictions_test(i,1));

            % Training PLE
            PLE_Predictions_train = cellfun(@(x) x(:,end-1),PLE_Predictions_training,'UniformOutput',false);
            PLE_train = cell2mat(PLE_Predictions_train(i,1));

            valError{i,1} = mean((PLE_test(1,1:40) - De_normalized_YPred).^2);
            TrainingError{i,1} = mean((PLE_train(1,:) - cell2mat(De_normalized_Xpred(:))).^2);
        end

        TotaltrainingError = sum([TrainingError{:}]);
        TotalvalError = sum([valError{:}]);

        % Save the trained network and its errors; the file name is returned as user data
        fileName = num2str(TotaltrainingError) + "_" + num2str(TotalvalError) + ".mat";
        save(fileName,'net','TotalvalError','TotaltrainingError','options','layers')

        % Constraints
        cons = [];
    end
end

%% Define a function for creating deeper networks
function layersan = bilstmBlock(numBiLSTMLayers,HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5,HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10,x,y)
    numHiddenUnits = [HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5,HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10];
    layersan = [];
    for i = 1:numBiLSTMLayers
        layers = bilstmLayer(numHiddenUnits(1,i),'BiasInitializer','ones','OutputMode','sequence','InputWeightsInitializer',@(sz) normrnd(x,y,sz),'RecurrentWeightsInitializer',@(sz) normrnd(x,y,sz));
        layersan = [layersan; layers];
    end
end
