
1D-CNN: Replicating SeriesNetwork results using dlNetwork

12 views (last 30 days)
Ioannis Tsitsimpelis on 23 Feb 2024
Answered: Avadhoot on 20 Mar 2024
Hi,
I have trained a 1D-CNN with sequence data of varying length, but I would like finer control, so I have been trying to replicate my results with a custom training loop. However, my training loop won't work unless I pad each batch before making it a dlarray, and I don't want to pad or do anything else that would distort my data. Could anyone advise how a mini-batch containing vectors of varying size is handled by trainNetwork, and how I could do the same in the training loop?
Otherwise, to replicate the results both ways, could I perhaps implement a dynamic batching method common to both? See Snippets 1 and 2 for reference.
% Snippet 1
rng(369,'twister')
load train_data.mat
load train_data_id.mat
data = train_data;
data_id = train_data_id;
num_outputs = numel(categories(data_id))
% Split train data to train-test-eval
[idxTrain,idxTest,idxVal] = trainingPartitions(size(data,2), [0.7 0.15 0.15]);
XTrain = data(idxTrain);
TTrain = data_id(idxTrain);
XTest = data(idxTest);
TTest = data_id(idxTest);
XVal = data(idxVal);
TVal = data_id(idxVal);
layers = [
sequenceInputLayer(1, 'MinLength',500)
convolution1dLayer(20,32)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
convolution1dLayer(20,64)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
globalMaxPooling1dLayer
fullyConnectedLayer(9)
softmaxLayer
classificationLayer];
options = trainingOptions('rmsprop', ...
'MaxEpochs',75, ...
'Shuffle','every-epoch', ...
'Verbose',false, ...
'Plots','training-progress',...
'MiniBatchSize',8);
% Train network
[net, info] = trainNetwork(XTrain,TTrain,layers,options);
% Snippet 2, based on the MathWorks custom training loop documentation
rng(369,'twister')
load train_data.mat
load train_data_id.mat
data = train_data;
data_id = train_data_id;
% Count number of labels
numClasses = numel(categories(data_id));
classes = categories(data_id);
% Split train data to train-test-eval
[idxTrain,idxTest,idxVal] = trainingPartitions(size(data,2), [0.7 0.15 0.15]);
XTrain = data(idxTrain);
TTrain = data_id(idxTrain);
XTest = data(idxTest);
TTest = data_id(idxTest);
XVal = data(idxVal);
TVal = data_id(idxVal);
% Specify network architecture
layers = [
sequenceInputLayer(1, 'MinLength',500)
convolution1dLayer(20,32)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
convolution1dLayer(20,64)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
globalMaxPooling1dLayer
fullyConnectedLayer(9)
softmaxLayer];
% Create a dlnetwork object from the layer array
net = dlnetwork(layers)
% Specify the options to use during training
numEpochs = 75;
miniBatchSize = 8;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
% Initialize the squared average gradients
averageSqGrad = [];
% Calculate the total number of iterations for the training progress monitor
numIterations = numEpochs * numIterationsPerEpoch;
% Initialize the TrainingProgressMonitor object. Because the timer starts
% when you create the monitor object, make sure that you create the object
% close to the training loop
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
% Train the model using a custom training loop. For each epoch, shuffle the
% data and loop over mini-batches of data. Update the network parameters
% using the rmspropupdate function. At the end of each iteration,
% display the training progress
iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
epoch = epoch + 1;
% Shuffle data
idx = randperm(numel(XTrain));
XTrain = XTrain(idx);
TTrain = TTrain(idx);
i = 0;
while i < numIterationsPerEpoch && ~monitor.Stop
i = i + 1;
iteration = iteration + 1;
% Read mini batch of data and convert the labels to dummy variables
idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
X = XTrain(idx);
T = zeros(numClasses,miniBatchSize,"single");
for c = 1:numClasses
T(c,TTrain(idx)==classes(c)) = 1;
end
% Determine max length for padding within this mini-batch
maxLength = max(cellfun(@length, X));
% Pad sequences with trailing zeros to the same length
% (padarray requires Image Processing Toolbox; trailing-zero concatenation also works)
paddedX = cellfun(@(seq) padarray(seq, [0, maxLength - length(seq)], ...
'post'), X, 'UniformOutput', false);
paddedMatrix = cat(3, paddedX{:}); % 1-by-maxLength-by-miniBatchSize, assuming row-vector sequences
dlX = dlarray(single(paddedMatrix), 'CTB'); % C-by-T-by-B, matching sequenceInputLayer(1)
% Convert mini-batch of data to a dlarray.
% If training on a GPU, then convert data to a gpuArray.
if canUseGPU
dlX = gpuArray(dlX);
end
% Evaluate the model loss and gradients using dlfeval and the
% modelLoss function.
[loss,gradients] = dlfeval(@modelLoss,net,dlX,T);
% Update the network parameters using the RMSProp optimizer.
[net,averageSqGrad] = rmspropupdate(net,gradients,averageSqGrad);
% Update the training progress monitor.
recordMetrics(monitor,iteration,Loss=loss);
updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
monitor.Progress = 100 * iteration/numIterations;
end
end
function [loss,gradients] = modelLoss(net,dlX,T)
Y = forward(net,dlX);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,net.Learnables);
end

Answers (1)

Avadhoot on 20 Mar 2024
I understand that you are using padding because your data contains sequences of variable length, but you want to keep your model free from biases introduced by the padding. Regarding your question about the "trainNetwork" function: it also pads internally while training the network. When you use a custom training loop, however, you need to perform the padding yourself.
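As an aside, the amount, value, and direction of the padding that "trainNetwork" applies can be controlled through the sequence options of "trainingOptions". A minimal sketch with illustrative values (not taken from your snippets):
% Illustrative only: these name-value pairs control how trainNetwork
% pads (or truncates) the variable-length sequences within each mini-batch
options = trainingOptions('rmsprop', ...
'MiniBatchSize',8, ...
'SequenceLength','longest', ... % pad up to the longest sequence in each mini-batch
'SequencePaddingValue',0, ... % value used for the padded elements
'SequencePaddingDirection','right'); % append the padding at the end of each sequence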
There are two ways to go about it; both are listed below:
1) Dynamic Batching:
As you have already mentioned in your question, dynamic batching can be used in both cases. While it will not remove the need for padding completely, it keeps the padding to a minimum. To implement dynamic batching, sort your data by sequence length and then group sequences of similar length into the same mini-batches. This is a relatively easy step; a sketch is shown below.
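A minimal sketch of this, assuming (as in your snippets) that XTrain is a cell array of 1-by-T row-vector sequences and TTrain holds the matching categorical labels; the variable names are illustrative:
% Sort by length so that each mini-batch contains sequences of similar size
seqLengths = cellfun(@length, XTrain);
[~,sortIdx] = sort(seqLengths);
XSorted = XTrain(sortIdx);
TSorted = TTrain(sortIdx);
miniBatchSize = 8;
numBatches = floor(numel(XSorted)/miniBatchSize);
batchOrder = randperm(numBatches); % shuffle the order of batches, not of individual sequences
for b = batchOrder
    idx = (b-1)*miniBatchSize+1:b*miniBatchSize;
    Xb = XSorted(idx);
    Tb = TSorted(idx);
    % Pad only up to the longest sequence in this mini-batch
    Lmax = max(cellfun(@length, Xb));
    Xpad = cellfun(@(s) [s, zeros(1, Lmax-length(s), 'like', s)], Xb, 'UniformOutput', false);
    dlX = dlarray(single(cat(3, Xpad{:})), 'CTB'); % 1-by-Lmax-by-miniBatchSize
    % ... build the one-hot targets from Tb and evaluate loss/gradients as in Snippet 2 ...
end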
2) Custom padding and masking:
This approach can be used if dynamic batching is not feasible. Here you implement custom padding and then introduce a masking layer in the model so that the padding is ignored. This way your model is unaffected by the padding and no biases are introduced.
Here is a simplified way to implement this:
  1. Pad Sequences: Pad the sequences to match the length of the longest sequence before converting them to "dlarray". You have already performed this in Snippet 2.
  2. Implement masking: You need to introduce a masking layer into your network. MATLAB does not have a built-in masking layer, so you will have to write your own layer with appropriate forward and backward functions for the masking operation. Alternatively, you can manually apply a mask to the output of the network before calculating the loss so that the padded values are ignored; care must be taken to ensure that the gradients are computed correctly. A sketch of this second option is shown after this list.
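A minimal sketch of the second option (manually masking the loss), assuming a sequence-to-sequence output of size numClasses-by-miniBatchSize-by-numTimeSteps, i.e. a network without the global pooling layer, and a logical mask that is true for real time steps and false for padded ones; the function and variable names are illustrative:
function [loss,gradients] = maskedModelLoss(net,dlX,T,mask)
% T: one-hot targets, numClasses-by-miniBatchSize-by-numTimeSteps
% mask: 1-by-miniBatchSize-by-numTimeSteps, true where the time step is not padding
Y = stripdims(forward(net,dlX)); % numClasses-by-B-by-T in canonical 'CBT' order
stepLoss = -sum(T .* log(Y + eps('single')), 1); % per-time-step cross-entropy
stepLoss = stepLoss .* single(mask); % zero out the padded time steps
loss = sum(stepLoss(:)) / max(sum(mask(:)),1); % average over the real time steps only
gradients = dlgradient(loss,net.Learnables);
end
The mask can be built inside the batching loop from the unpadded sequence lengths, and the function is evaluated with dlfeval exactly like modelLoss in Snippet 2. Note that the network in your snippets pools over time with globalMaxPooling1dLayer, so its loss is already one value per observation; a per-time-step mask like this only applies if you switch to a sequence-to-sequence output.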
In conclusion, this approach will be more difficult to implement, but it achieves what you intended: the model will be free from the effects of padding. Otherwise, you have the option of dynamic batching, which keeps the padding to a minimum.
I hope this helps.

Version: R2023b
