Update: Confirmed the same behavior exists using older trainNetwork with Apple Accelerate BLAS, and using trainNetwork with OpenBLAS 0.3.24
Trainnet with parallel-CPU mode giving incorrect results
I'm using trainnet to train a convolutional regression network to find the X-Y centroid of a subtle gradient region in an input image. The training data consist of paired 130x326 grayscale images and ground-truth output coordinates. Both the RMSE and the loss reach very small values (e.g. 10^-3) after a few minutes of training on a small dataset. The network gives the expected results when trained in single-CPU mode, but when trained in parallel-CPU mode, the predictions are significantly off.
To debug, I scaled back to a very simple network, disabled input normalization, and trained with only two data points, fully expecting the network to memorize the training data perfectly. After single-CPU training, the network yields perfect predictions on the training data, as expected. After parallel-CPU training, it does not predict the training data correctly. I added a more verbose loss function and confirmed that the losses reported during training are consistent with the (Y,T) pairs, and that the T values are being read correctly from the training data.
It seems that the final network returned in parallel-CPU mode does not correctly capture the results of the training.
I'm running R2024a on a MacBook Pro (M2 Max), using Apple Accelerate BLAS. (The default BLAS persistently crashed in parallel mode with trainnet.)
Code snippet below...
layers = [
    imageInputLayer([130 326 1],"Name","imageinput","Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2],"Name","conv_1")
    maxPooling2dLayer([2 2],"Name","maxpool_4")
    batchNormalizationLayer
    reluLayer("Name","relu_1")
    convolution2dLayer([2 2],16,"Name","conv_2")
    fullyConnectedLayer(2,"Name","fc")];

% Assumption: net is not defined in the original snippet; presumably it is built from layers.
net = dlnetwork(layers);

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','parallel', ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'OutputNetwork','last-iteration');

FOVCnet = trainnet(trainingData,net,@modelLoss,opts);

function loss = modelLoss(Y,T)
    % Verbose loss: semicolons are omitted on purpose so that the
    % predictions, targets, and loss are printed on every call.
    Y
    T
    loss = mse(Y,T)
end
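The single-CPU versus parallel check described in the post can be sketched roughly as follows. This is only an illustration under stated assumptions: the arrays X and T are random placeholders rather than the original data, the built-in "mse" loss stands in for the custom modelLoss, the data are passed as in-memory arrays instead of a datastore, and the "parallel" run needs Parallel Computing Toolbox. The learning rate and epoch count are copied from the post, so the placeholder data may not be memorized the way the real data were; the point is only the side-by-side comparison.

```matlab
% Placeholder two-point dataset (hypothetical values, not the original data).
rng(0);
X = rand(130,326,1,2,"single");   % two 130x326 grayscale images (H x W x C x N)
T = single([0.2 0.8; 0.6 0.4]);   % one [x y] target per row (N x 2)

% Simple network from the post.
layers = [
    imageInputLayer([130 326 1],"Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2])
    maxPooling2dLayer([2 2])
    batchNormalizationLayer
    reluLayer
    convolution2dLayer([2 2],16)
    fullyConnectedLayer(2)];
net = dlnetwork(layers);          % both runs start from these same initial weights

envs = ["cpu" "parallel"];        % single-CPU run, then parallel run
Y = cell(1,numel(envs));
for k = 1:numel(envs)
    opts = trainingOptions("sgdm", ...
        "InitialLearnRate",1e-7, ...
        "MaxEpochs",1000, ...
        "Verbose",false, ...
        "ExecutionEnvironment",envs(k), ...
        "OutputNetwork","last-iteration");
    trained = trainnet(X,T,net,"mse",opts);
    Y{k} = minibatchpredict(trained,X);   % predictions on the training images
end

% Compare each run against the targets and against each other.
fprintf("max |Y_cpu - T|          = %g\n", max(abs(Y{1} - T),[],"all"));
fprintf("max |Y_parallel - T|     = %g\n", max(abs(Y{2} - T),[],"all"));
fprintf("max |Y_cpu - Y_parallel| = %g\n", max(abs(Y{1} - Y{2}),[],"all"));
```

If the last number is large while the reported training losses of both runs are small, that would point at the returned network rather than at the training itself.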
3 comments
Matt J
on 25 May 2024
We can't run the code without trainingData. Please attach your two data point test case in a .mat file (as an arrayDatastore).
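One way to package such a test case might look like the sketch below. The file name twoPointCase.mat and the placeholder arrays are hypothetical, and the target layout (one 1x2 row per read) is an assumption; it should match whatever the original trainingData returns.

```matlab
% Placeholder data (hypothetical); substitute the two real images and targets.
X = rand(130,326,1,2,"single");   % two grayscale images (H x W x C x N)
T = single([0.2 0.8; 0.6 0.4]);   % one [x y] target per row (N x 2)

dsX = arrayDatastore(X,"IterationDimension",4);   % one image per read
dsT = arrayDatastore(T);                          % one target row per read (assumed layout)
trainingData = combine(dsX,dsT);                  % each read returns {image, target}

save("twoPointCase.mat","trainingData");          % hypothetical file name

% Round-trip check: reload and read the first observation.
S = load("twoPointCase.mat");
sample = read(S.trainingData);
disp(size(sample{1}));
disp(sample{2});
```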
Answers (0)