How to change a subset of ANN weights while keeping the other weights unchanged?
jason
on 31 Oct 2012
Answered: Jai
on 7 Jul 2016
Hello folks,
I am using the Neural Network Toolbox (R2012a) in my project. I have created a feedforward net with 2 layers (following the User's Guide convention, the inputs are not counted as a layer), and I want to update some of the input weights in IW{1,1} while keeping the other input weights in IW{1,1} and the first-to-second-layer weights (LW{2,1}) fixed. In short, I want to change a subset of IW{1,1} while keeping all the other weights fixed. I will refer to this as my optimal goal.
If the optimal goal is impossible, a sub-optimal goal is also acceptable: update the entire IW{1,1} and keep the whole LW{2,1} fixed.
I have already figured out how to achieve the sub-optimal goal. My solution is to use 'adapt' and set the learning rate to 0 for LW{2,1}. But I do not like this solution, since 'adapt' is an over-simplified function that lacks the parameters and features (e.g. min_grad, plotperform) of the other training functions/algorithms (e.g. trainlm, traingd). This makes it harder to control the training process and check the results.
So, first, I want to know whether there is a way to achieve the optimal goal rather than the sub-optimal one.
Second, if the optimal goal is not possible (short of writing everything from scratch), I wonder whether I can achieve the sub-optimal goal with one of the training functions instead of 'adapt'. I have already looked through 'trainlm' and 'traingd', but I do not think they help with either of my goals.
I would really appreciate any help with this issue.
Jason Lee
Accepted Answer
Greg Heath
on 9 Nov 2012
First, let me clarify my train of thought. I was comparing training continuously using net.trainParam.epochs = 100 with training 10 consecutive times in a loop using net.trainParam.epochs = 10 (or, say, 100 consecutive times in a loop using net.trainParam.epochs = 1). To eliminate complications, do not train with a validation set; for example, train candidates using net.divideFcn = ''. Then use a holdout validation set to choose the best designs.
There is a way to obtain the same result (I am pretty sure that I did it years ago with the 2004 MATLAB 6.5 version of NEWFF). Given the same initial weights at epoch 0, the results will be the same at epoch 10. However, when the second example starts the 11th epoch, it has to call TRAIN again, and when TRAIN starts again it is not in the same state that it would have been in at the 11th epoch of the continuous training example.
The task then is to quantify the state of TRAIN at the close of epoch 10 and to guarantee that it is in that state after it is called at the beginning of epoch 11.
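One hedged way to carry part of that state across calls, assuming the default trainlm training function, is to feed the last Levenberg-Marquardt damping value back in through net.trainParam.mu instead of letting it reset to its default. This is only a sketch: whether tr.mu(end) is exactly the value the continuous run would use at the next epoch, and whether mu is the only state that matters, are assumptions to verify by comparing final weights against a continuous run.

```matlab
[x, t] = simplefit_dataset;             % example data shipped with the toolbox
rng(0)
net = fitnet(4);                        % 1-4-1 net, trainlm by default
net.divideFcn = '';                     % no validation stopping, no random split
net.trainParam.showWindow = false;
net = configure(net, x, t);

net.trainParam.epochs = 10;
for k = 1:10
    [net, tr] = train(net, x, t);
    % Assumption: reuse the last damping value instead of restarting from the default
    net.trainParam.mu = tr.mu(end);
end
```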
Extending this strategy you can interrupt training at any time and assign your specified weights. However, now I understand that you would like some of those weights to remain fixed throughout further training.
Currently, the only way to do that is to keep reassigning the same fixed weights throughout training. Whether the assignments should be made every epoch or every few epochs would have to be determined by trial and error.
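A minimal sketch of that repeated-assignment idea, assuming fitnet with its default trainlm and one hidden layer as in the question: the whole LW{2,1} is frozen with its learn flag, and a hypothetical mask of IW{1,1} elements is restored after every one-epoch call to train. The mask, the hidden-layer size and the number of epochs are placeholders. Because each call to train restarts its internal state, the result will not match an uninterrupted run.

```matlab
[x, t] = simplefit_dataset;
rng(0)
net = fitnet(4);                          % 1-4-1 net
net.divideFcn = '';                       % train on all data (no early stopping)
net.trainParam.showWindow = false;
net = configure(net, x, t);               % weights exist only after configure

net.layerWeights{2,1}.learn = false;      % freeze the whole LW{2,1}
LW0 = net.LW{2,1};                        % kept only to check that it stays fixed

IW0 = net.IW{1,1};                        % reference values to hold fixed
fixedMask = false(size(IW0));
fixedMask([1 3]) = true;                  % hypothetical: freeze elements 1 and 3

net.trainParam.epochs = 1;
for k = 1:200
    net = train(net, x, t);
    IW = net.IW{1,1};
    IW(fixedMask) = IW0(fixedMask);       % undo any change to the frozen elements
    net.IW{1,1} = IW;
end
```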
I have performed 40 experiments using MATLAB's simplefit_dataset. There were 10 random weight initializations of 1-4-1 nets for each of the following 4 scenarios:
1. NEWFIT (calls NEWFF) continuous training with the default net.trainParam.epochs = 1000
2. NEWFIT WHILE-LOOP training with net.trainParam.epochs = 1
3. FITNET (calls FEEDFORWARDNET) continuous.
4. FITNET WHILE-LOOP
The 4 MSE results for each of the 10 random weight initializations were in agreement. However, I have not yet compared final weights.
In order to further understand the problem I may obtain 1-3-1 designs to get a wider scatter of results.
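Scenarios 3 and 4 can be sketched as below for a single random initialization, assuming simplefit_dataset and a 1-4-1 fitnet. The fixed count of 1000 one-epoch calls (with no early exit) and the use of getwb to compare final weights are illustrative choices, not the script behind the reported results.

```matlab
[x, t] = simplefit_dataset;
rng(0)
net0 = fitnet(4);                         % 1-4-1 net, trainlm by default
net0.divideFcn = '';                      % no data division, so runs are comparable
net0.trainParam.showWindow = false;
net0 = configure(net0, x, t);             % fixes the random initial weights

% Scenario 3: continuous training
netC = net0;
netC.trainParam.epochs = 1000;
netC = train(netC, x, t);

% Scenario 4: interrupted training, one epoch per call
netL = net0;
netL.trainParam.epochs = 1;
for k = 1:1000
    netL = train(netL, x, t);
end

mseC = perform(netC, t, netC(x));
mseL = perform(netL, t, netL(x));
dWB  = getwb(netC) - getwb(netL);         % weight and bias differences
fprintf('MSE continuous %.3g, interrupted %.3g, max |dWB| %.3g\n', ...
        mseC, mseL, max(abs(dWB)));
```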
Hope this helps.
Thank you for formally accepting my answer.
Greg
More Answers (7)
Jai
on 7 Jul 2016
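You can use net.biases{i}.learn = 0 and net.inputWeights{i,j}.learn = 0 to fix some of the weights. Note that each flag applies to a whole bias vector or weight matrix, not to individual elements.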
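Applied to the question's sub-optimal goal, a sketch might look like this, assuming a fitnet with one hidden layer and its default trainlm. The layer-weight counterpart, net.layerWeights{2,1}.learn, is used to freeze LW{2,1}; freezing the output bias as well is an optional, illustrative choice.

```matlab
[x, t] = simplefit_dataset;
net = fitnet(4);
net.trainParam.showWindow = false;
net = configure(net, x, t);

net.layerWeights{2,1}.learn = false;   % keep LW{2,1} fixed
net.biases{2}.learn = false;           % keep the output bias fixed too (optional)

LW0 = net.LW{2,1};
b20 = net.b{2};
[net, tr] = train(net, x, t);          % trainlm still updates IW{1,1} and b{1}
```

Because train is used instead of adapt, the usual trainParam settings (such as min_grad) and plots such as plotperform(tr) remain available.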
Greg Heath
on 1 Nov 2012
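You can directly assign any combination of weights that you want after calling the obsolete functions newpr, newfit or newff. However, if you use the updated functions patternnet, fitnet or feedforwardnet, you have to call configure, init or train first. For a net with one hidden layer:
net = configure(net, x, t);   % needed first for patternnet/fitnet/feedforwardnet
net.IW{1,1} = IW;             % input-to-hidden weights
net.LW{2,1} = LW;             % hidden-to-output weights
net.b{1} = b1;                % hidden-layer biases
net.b{2} = b2;                % output-layer bias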
Hope this helps.
Thank you for formally accepting my answer.
Greg
jason
on 6 Nov 2012
Greg Heath
on 7 Nov 2012
PATTERNNET was explicitly designed for classification and pattern recognition.
FITNET was explicitly designed for regression and curve fitting.
BOTH call FEEDFORWARDNET.
If you compare source codes via
type fitnet
type feedforwardnet
you will see that the only difference is that fitnet automatically uses PLOTFIT whereas feedforwardnet does not.
So, if you want to use feedforwardnet, you have to explicitly call plotfit afterward as demonstrated in
help plotfit
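For example, assuming simplefit_dataset and the default hidden-layer size of 10, training with feedforwardnet and then plotting the fit explicitly could look like this:

```matlab
[x, t] = simplefit_dataset;
net = feedforwardnet(10);          % one hidden layer of 10 nodes
net.trainParam.showWindow = false;
net = train(net, x, t);
plotfit(net, x, t)                 % fitnet would make this plot available automatically
```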
Greg
Greg Heath
on 8 Nov 2012
I guess I do not understand exactly what you want to do. My original point was that if you use one of the obsolete functions, you can change the value of any combination of weights before or during training and then continue training.
However, if you use one of the current functions,
1. You have to use configure or init if you want to use a specific subset of initial weights. Direct assignment is not allowed before configure, init or train is called.
2. If you want to interrupt training, set a specific subset of weights and then continue training, training will not continue smoothly from where you interrupted. Instead, your training parameters will be automatically reinitialized.
To make it clearer, suppose you wanted to interrupt training, then do nothing before continuing to train. You will end up with a different result than if you trained continuously. In particular
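net0 = net;                     % keep a copy of the initial weights
net.trainParam.epochs = 10;     % 10 calls of 10 epochs each
rng(0)
for i = 1:10
    [net, tr] = train(net, x, t);
end
will have a different result than
net = net0;                     % restart from the same initial weights
net.trainParam.epochs = 100;    % one call of 100 epochs
rng(0)
[net, tr] = train(net, x, t);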
If you can figure out how to obtain the same results, it is worth starting a new thread to share the discovery.
Greg
Greg Heath
on 10 Nov 2012
I don't know how many times you want to change the first layer weights during training. However, if you have 2 hidden layers and want to fix the first layer of weights, you can switch between that net and a double net configuration:
1. Use [x;t] to train net1 I-H1-H2-O
h1 = ...
h2 = ...
y = ...
2. Initialize net2 I-H1 and net3 H1-H2-O with weights from net1
3. Use net2 to create the new input matrix h1 = tansig(b1+IW*x)
4. Use [h1;t] to train net3. Since it has a hidden layer, it is a universal approximator.
5. Use the weights from net3 to initialize the last 2 layers of net1
6. etc
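The steps above can be sketched as follows. This is a hedged sketch that assumes simplefit_dataset, hypothetical sizes H1 = H2 = 4, and that the default input/output normalization is switched off (processFcns = {}) so that the explicit formula h1 = tansig(b1 + IW*x) matches what net1 computes internally and the trained net3 weights can be copied back unchanged. As a simplification, net2 is not built as a separate network; its fixed first layer is evaluated directly.

```matlab
[x, t] = simplefit_dataset;
H1 = 4;  H2 = 4;                               % hypothetical hidden-layer sizes

% Step 1: full I-H1-H2-O net (normalization off so raw weights apply to x)
net1 = feedforwardnet([H1 H2]);
net1.inputs{1}.processFcns  = {};
net1.outputs{3}.processFcns = {};
net1.divideFcn = '';
net1.trainParam.showWindow = false;
net1 = train(net1, x, t);

% Steps 2-3: evaluate the fixed first layer (the role of net2) directly
h1 = tansig(bsxfun(@plus, net1.IW{1,1}*x, net1.b{1}));

% Step 4: net3 is H1-H2-O, initialized from the last two layers of net1
net3 = feedforwardnet(H2);
net3.inputs{1}.processFcns  = {};
net3.outputs{2}.processFcns = {};
net3.divideFcn = '';
net3.trainParam.showWindow = false;
net3 = configure(net3, h1, t);
net3.IW{1,1} = net1.LW{2,1};   net3.b{1} = net1.b{2};
net3.LW{2,1} = net1.LW{3,2};   net3.b{2} = net1.b{3};
net3 = train(net3, h1, t);

% Step 5: copy the retrained layers back; net1.IW{1,1} and net1.b{1} are untouched
net1.LW{2,1} = net3.IW{1,1};   net1.b{2} = net3.b{1};
net1.LW{3,2} = net3.LW{2,1};   net1.b{3} = net3.b{2};
```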
The fact that retraining net1 and/or net3 reinitializes the state of TRAIN is not a problem.
If your data set is not large, your toughest problem may be choosing a suitable pair of values for the numbers of hidden nodes H1 and H2, to avoid overtraining an overfit net (one whose number of training equations is not sufficiently larger than its number of unknown weights).
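A quick way to check that condition is to count both sides. The weight count below assumes fully connected layers with one bias per node; the sizes and the number of training samples are hypothetical.

```matlab
% Hypothetical I-H1-H2-O sizes and number of training samples
I = 1;  H1 = 4;  H2 = 4;  O = 1;
Ntrn = 100;

Ntrneq = Ntrn*O;                             % number of training equations
Nw = (I+1)*H1 + (H1+1)*H2 + (H2+1)*O;        % weights + biases to estimate
fprintf('Ntrneq = %d, Nw = %d, ratio = %.2f\n', Ntrneq, Nw, Ntrneq/Nw);
```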
Hope this helps.
Thank you for formally accepting my answer.
Greg
Greg Heath
on 13 Nov 2012
>Thank you, Greg. This is a good trick, but it cannot be used in the 2 cases below.
Easily Modified:
>1. If I do not want to fix the first layer weights completely, but only some of them. That is, I also want to fix some elements of IW{1,1} while changing other elements of IW{1,1} and elements of LW.
This can be accomplished by changing the weights of the first net and generating a new h1.
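For example, with hypothetical values for a 2-input, 3-node first layer (the net2 part), moving only one element of IW and regenerating h1 might look like this. tanh is used because it is mathematically identical to tansig, which keeps the snippet free of toolbox calls.

```matlab
% Hypothetical first-layer weights (I = 2 inputs, H1 = 3 hidden nodes)
IW = [0.5 -0.2; 1.1 0.3; -0.7 0.8];
b1 = [0.1; -0.2; 0.05];
x  = [0.1 0.4 -0.3 0.9 -0.8; 1.0 -0.5 0.2 0.0 0.6];   % 5 example samples

IWnew = IW;
IWnew(2,1) = 0.9;                         % change only the elements you want to move
h1 = tanh(bsxfun(@plus, IWnew*x, b1));    % mathematically equal to tansig(b1 + IW*x)
% h1 then becomes the new input matrix for retraining net3 on [h1; t]
```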
>2. If I want to do the opposite, that is, fix the layer weights LW while changing the input weights IW{1,1}.
In this case you can use pseudo-inversion to obtain h2 from b2+LW2*h2 = t
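A minimal sketch of that pseudo-inversion, with hypothetical fixed output weights for a layer with 3 hidden nodes and one output: pinv gives the minimum-norm hidden outputs h2 that reproduce the targets exactly, which could then presumably serve as targets for training the remaining input-side layers. With a tansig hidden layer, h2 is only reachable if it lies inside (-1, 1).

```matlab
% Hypothetical fixed output layer: H2 = 3 hidden nodes, 1 output
LW2 = [0.5 -1.2 0.8];
b2  = 0.1;
t   = [-0.5 -0.2 0.0 0.3 0.6];          % 5 example targets

% Minimum-norm hidden outputs h2 satisfying b2 + LW2*h2 = t
h2 = pinv(LW2) * (t - b2);              % 3-by-5

% A tansig layer can only produce values strictly inside (-1, 1)
reachable = all(abs(h2(:)) < 1);
```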
Greg Heath
on 13 Nov 2012
I have performed 40 experiments using MATLAB's simplefit_dataset. There were 10 random weight initializations of 1-4-1 nets for each of the following 4 scenarios:
1. NEWFIT (calls NEWFF) continuous training with the default net.trainParam.epochs = 1000
2. NEWFIT WHILE-LOOP training with net.trainParam.epochs = 1
3. FITNET (calls FEEDFORWARDNET) continuous.
4. FITNET WHILE-LOOP
The 4 MSE results for each of the 10 random weight initializations were in agreement.
This is because the first 9 initializations achieved the training goal of R2trna >= 0.99, where R2trna is the adjusted coefficient of determination (AKA degree-of-freedom adjusted R^2; see Wikipedia). The last initialization terminated before reaching the goal because the specified minimum gradient of MSE (1e-10) was reached.
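For a single output, the plain and adjusted R^2 can be sketched as below, assuming the usual form in which SSE is divided by its degrees of freedom N - Nw (Nw = number of estimated weights and biases) and the target sum of squares by N - 1. The data and Nw are hypothetical, and the exact normalization used for R2trna in these experiments may differ.

```matlab
% Hypothetical single-output example
t  = [1 2 3 4 5];            % targets
y  = [1.1 1.9 3.2 3.8 5.0];  % network outputs
Nw = 2;                      % number of estimated weights and biases

N   = numel(t);
SSE = sum((t - y).^2);
SST = sum((t - mean(t)).^2);

R2  = 1 - SSE/SST;                              % plain coefficient of determination
R2a = 1 - (SSE/(N - Nw)) / (SST/(N - 1));       % degree-of-freedom adjusted
```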
However, when the weights of the continuous and interrupted training designs are compared, only 50% of the designs achieved the same weights.
I do not intend to investigate why the other 50% did not, beyond looking at the 1-3-1 case, where many of the designs did not achieve the training goal.
Greg Heath
on 13 Nov 2012
These are the results using FITNET for the 1-3-1 design.
1. All 20 cases terminated via tr.stop = 'Minimum gradient reached.' before achieving the goal of R2trna >= 0.99.
2. Continuous training took 6.8 sec, interrupted training took 32.0 sec
3. The differences in tr.mu were either 1e-3 or 0.9999e-3.
4. The differences in R2trn and R2trna were less than 1e-8.
5. The differences in R2val and R2tst were less than 1e-5.
6. The differences in number of epochs were
dNepochs = -5 0 -5 -4 0 -3 -3 0 -6 -5
7. Nevertheless, in both cases runs 2-5 and 7-9 obtained the EXACT same set of weights. In run 1 there was a sign change in IW(2), b1(2) and LW(2), which caused no change in the output because the hidden-node activation is an odd function (tansig(-n) = -tansig(n)). Adjusting for these 3 sign changes (*), the differences between the continuous and interrupted training weight estimates were
dWB(:, [1 2 6 10]) =

Run:        1   [2-5,7-9]        6       10
===========================================
       0.0017    -0.0000         0  -0.0007
      *0.0001    -0.0000    0.0001  -0.0022
      -0.0074     0.0414    1.0558  -0.0015
       0.0014     0.0000         0   0.0003
      *0.0000     0.0000   -0.0000  -0.0011
      -0.0071     0.0387    0.7752  -0.0013
       0.0002     0.0000    0.2949  -0.0000
      *0.0000    -0.0000    0.0000  -0.0000
       0.0000    -0.0002   -0.5898   0.0001
      -0.0002    -0.0002   -0.2949   0.0001
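The sign-change equivalence noted in item 7 can be checked numerically. The sketch below uses hypothetical 1-3-1 weights and tanh (mathematically identical to tansig) for the hidden layer with a linear output; flipping the signs of the second hidden node's input weight, bias and output weight leaves the output unchanged.

```matlab
% Hypothetical 1-3-1 weights (tanh hidden layer, linear output)
IW = [1.5; -0.7; 2.0];   b1 = [0.2; -1.0; 0.5];
LW = [0.8 -1.3 0.4];     b2 = 0.1;
x  = linspace(-1, 1, 7);
y  = b2 + LW*tanh(bsxfun(@plus, IW*x, b1));

% Flip the signs of IW(2), b1(2) and LW(2), as in run 1
IWf = IW;  IWf(2) = -IWf(2);
b1f = b1;  b1f(2) = -b1f(2);
LWf = LW;  LWf(2) = -LWf(2);
yf = b2 + LWf*tanh(bsxfun(@plus, IWf*x, b1f));

maxDiff = max(abs(y - yf));   % zero up to rounding, since tanh(-n) = -tanh(n)
```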