Gradient computation: Custom Deep Learning Training

Question


Hi,

I would like to set up an RNN regression problem. I have two input sequences, X1 and X2, and the model should predict an output sequence Y. I only have data for X1, X2 and Y; each is a 1x1000 cell array whose elements are 1x100 doubles.

However, the direct output of my RNN model should be a sequence F. The derivative of F with respect to X1 is my Y, and that is what I want to train on. How can I achieve this?
Also, I don't quite understand how to interpret the hiddenState returned by gru, since the hidden state is not a sequence. Is Z the appropriate sequence to use instead?

I tried the following in model.m, but I don't know whether it is right. Can dlgradient be used for sequences at all, or is there another way?

dF_dX1 = dlgradient(sum(F,"all"),{X1});
Y = dF_dX1;

Unfortunately, the loss function does not decrease during training. Where is the error?

I would be grateful for any help.

Greetings

Here are model.m and modelLoss.m:

function [loss,gradients] = modelLoss(parameters,X1,X2,T)
    X = cat(1,X1,X2);
    [Y, F,Z,hiddenState] = model(parameters,X,sequenceLengthsSource,1);
    T = dlarray(T);
    loss = mse(Y,T,Dataformat="CBT");
    % Update gradients.
    gradients = dlgradient(loss,parameters);
end

function [Y,F,Z,hiddenState] = model(parameters,X)
    X1 = X(1,:,:);
    X2 = X(2,:,:);

    % GRU.
    inputWeights = parameters.pinn.gru.InputWeights;
    recurrentWeights = parameters.pinn.gru.RecurrentWeights;
    bias = parameters.pinn.gru.Bias;
    numHiddenUnits = size(recurrentWeights, 2);
    initialHiddenState = dlarray(zeros([numHiddenUnits 1]));
    [Z,hiddenState] = gru(X,initialHiddenState,inputWeights,recurrentWeights,bias,DataFormat="CBT");
    Z = tanh(Z);

    % Fully connect.
    weights = parameters.pinn.fc.Weights;
    bias = parameters.pinn.fc.Bias;
    Z = fullyconnect(Z,weights,bias,DataFormat="CBT");
    weights = parameters.pinn.fc2.Weights;
    bias = parameters.pinn.fc2.Bias;
    Z = fullyconnect(Z,weights,bias,DataFormat="CBT");

    % Concatenate Z and X1 along the first dimension.
    Z_X1 = cat(1, Z, X1);

    % Fully connect.
    weights = parameters.pinn.fc3.Weights;
    bias = parameters.pinn.fc3.Bias;
    F = fullyconnect(Z_X1,weights,bias,DataFormat="CBT");
    F = tanh(F);

    % Fully connect.
    weights = parameters.pinn.fc4.Weights;
    bias = parameters.pinn.fc4.Bias;
    F = fullyconnect(F,weights,bias,DataFormat="CBT");

    dF_deps = dlgradient(sum(F,"all"),{eps});
    Y = dF_deps;
end

1 comment

Ben on 23 Jun 2023

Could you clarify what you are trying to compute with dF_deps = dlgradient(sum(F,"all"),{eps})?

To use dlgradient you must call dlgradient(Y,X), where X is a dlarray and Y is a dlarray that has been computed from X using dlarray operations (including a dlnetwork). Both must be computed inside a function, such as modelLoss, that is called by dlfeval.

Note that Y has to be scalar - typically a loss - but X doesn't have to be, so you can compute gradients of scalars with respect to sequence variables.
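
As a minimal sketch of the two points above (not part of the original post), the following differentiates a scalar with respect to a whole sequence inside dlfeval. The toy sizes and the quadratic function are assumptions chosen only for illustration.

```matlab
% Toy sequence (assumed sizes): 2 channels, 1 observation, 5 time steps.
X = dlarray(rand(2,1,5));

% dlgradient has to run inside a function that is evaluated by dlfeval.
% The differentiated value sum(x.^2,"all") is a scalar; x is a sequence.
dLdX = dlfeval(@(x) dlgradient(sum(x.^2,"all"),x),X);

% The gradient has the same size as X; for this toy function it is 2*X.
disp(size(dLdX))
```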

Since you're writing X1 = X(1,:,:) I'll note something that may seem counter-intuitive. If you compute something like L = sum(X,"all") then dlgradient(L,X) is non-zero, but dlgradient(L,X1) will be zero. This happens because X1 = X(1,:,:) is seen as "computed from X". Instead of computing dLdX1 = dlgradient(L,X1) you should compute dLdX = dlgradient(L,X) and dLdX1 = dLdX(1,:,:).
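
A small sketch of the recommended pattern, differentiating with respect to the full X and slicing afterwards. The function toyF is a hypothetical stand-in for the model output F, chosen so that the derivative with respect to the first row is easy to check by hand.

```matlab
% Toy input (assumed sizes): rows 1 and 2 play the roles of X1 and X2.
X = dlarray(rand(2,1,5));

% Hypothetical stand-in for F: depends on both rows of X.
toyF = @(x) 3*x(1,:,:) + x(2,:,:).^2;

% Differentiate with respect to the full input X inside dlfeval ...
dFdX = dlfeval(@(x) dlgradient(sum(toyF(x),"all"),x),X);

% ... and only then pick out the row that corresponds to X1.
dFdX1 = dFdX(1,:,:);
```

If a derivative like this is itself part of a loss that is differentiated again with respect to the parameters, the dlgradient documentation describes an EnableHigherDerivatives option for the inner call. Whether it is needed and supported for a given model and release is worth checking there.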

Regarding the outputs of gru: the first output is the gru output at each sequence element of the input sequence, so Z is the sequence to pass on to later layers. The second output is the hidden state, which you typically need only if you are going to call gru again on a further part of the same input sequence. For example, if you have an input sequence X with sequence length size(X,3)==200, you could compute gru on X(:,:,1:100) using the initial hidden state, then on X(:,:,101:200) using the hidden state output from the first call to gru.
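
The chunked evaluation described above could look roughly like this. All sizes and the random weights are assumptions made for illustration; they are not the parameters from the question.

```matlab
% Assumed toy sizes: 2 input channels, 4 hidden units, 200 time steps.
numFeatures = 2;
numHiddenUnits = 4;
X = dlarray(rand(numFeatures,1,200));

% Random GRU parameters (illustrative only); gru stacks three gates row-wise.
W  = dlarray(0.1*randn(3*numHiddenUnits,numFeatures));
R  = dlarray(0.1*randn(3*numHiddenUnits,numHiddenUnits));
b  = dlarray(zeros(3*numHiddenUnits,1));
H0 = dlarray(zeros(numHiddenUnits,1));

% Whole sequence in one call.
[Zfull,Hfull] = gru(X,H0,W,R,b,DataFormat="CBT");

% Same sequence in two chunks, passing the hidden state along.
[Z1,H1] = gru(X(:,:,1:100),H0,W,R,b,DataFormat="CBT");
[Z2,H2] = gru(X(:,:,101:200),H1,W,R,b,DataFormat="CBT");

% The chunked outputs, concatenated in time, should agree with Zfull
% up to floating-point error, and H2 should agree with Hfull.
Zchunked = cat(3,Z1,Z2);
```

In this sketch the first output holds one column of hidden units per time step, while the second output holds only the state after the last step, which is why it is not a sequence.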

Hope that helps,

Ben
