When used in DDPG actor and critic networks, a 1×1 convolution does NOT behave like a fully connected layer (even though it should). Why is that?
Suppose that (for some valid reason) my data is arranged along the third dimension, say
x = rand(1,1,m);
I need to process this data using a deep learning network. For the sake of example, let us assume the network has just two layers. I input the data using an image input layer:
imageInputLayer([1,1,m], "Normalization", "none");
I then use a 1 by 1 convolution layer
convolution2dLayer([1,1], n, "Stride", [1,1]);
This should behave like a fully connected layer, but it doesn't! Probably because the data is in the third dimension? In fact, even using a fully connected layer in this case doesn't behave like a fully connected layer. That is, the following layer gives unexpected results:
fullyConnectedLayer(n);
Only when the data is along the first or second dimension does the fully connected layer behave properly. But in that case I can't use a convolution layer, which defeats the purpose.
Note: I need the first and second dimensions for other purposes... that's why I'm arranging my data along the third dimension in this example.
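For what it's worth, the equivalence itself is easy to check numerically outside the RL toolbox: on a 1-by-1-by-m input, a 1-by-1 convolution with n filters computes the same affine map W*x + b as a fully connected layer with n outputs. Here's a rough sketch of that check (assuming a recent Deep Learning Toolbox with dlarray/dlconv; the variable names are just for illustration):
m = 10; n = 5;
x = rand(1, 1, m, 'single');
W = randn(n, m, 'single');                   % fully connected weights
b = randn(n, 1, 'single');
yFC = W*x(:) + b;                            % what fullyConnectedLayer(n) computes
convW = reshape(W.', [1, 1, m, n]);          % same numbers, 1x1xmxn conv layout
convB = reshape(b, [1, 1, n]);
yConv = dlconv(dlarray(x, 'SSC'), convW, convB);
max(abs(yFC - squeeze(extractdata(yConv))))  % ~0, so the two maps are identical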
====================
Update: Here's a minimal working example that you can run and see for yourself (just an example... not a real application).
It's a DDPG agent trying to learn to replicate the state. In other words, the agent should learn to output an action vector as close as possible to the observation vector (which is very trivial). In order to do so, I've set the reward to be the negative norm of the difference between action and state (or observation). That is:
Reward = -norm(Action(:) - obj.State(:));
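(Just to illustrate: with State = [0.2 -0.5 0.1] and Action = [0 0 0], the reward is -norm([0.2 -0.5 0.1]) ≈ -0.55, and it reaches its maximum of 0 exactly when the action equals the state.)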
This is the complete definition of the environment:
classdef env < rl.env.MATLABEnvironment
    %% Properties
    properties
        State
    end
    %% Methods
    methods
        function obj = env(dim)
            % Initialize observation settings
            ObservationInfo = rlNumericSpec(dim);
            ObservationInfo.LowerLimit = -ones(ObservationInfo.Dimension);
            ObservationInfo.UpperLimit = +ones(ObservationInfo.Dimension);
            % -------------------------------------------------------------
            % Initialize action settings
            ActionInfo = rlNumericSpec(dim);
            ActionInfo.LowerLimit = -ones(ActionInfo.Dimension);
            ActionInfo.UpperLimit = +ones(ActionInfo.Dimension);
            % -------------------------------------------------------------
            obj = obj@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo); % This line implements built-in functions of the RL env
        end
        function [Observation, Reward, IsDone, LoggedSignals] = step(obj, Action)
            Reward = -norm(Action(:) - obj.State(:));
            Observation = reset(obj);
            IsDone = false;
            LoggedSignals = [];
        end
        function InitialObservation = reset(obj)
            % Reset environment to initial state and output initial observation
            obsInfo = getObservationInfo(obj);
            obj.State = 2*rand(obsInfo.Dimension) - 1;
            InitialObservation = obj.State;
        end
    end
end
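As a quick sanity check of this environment, something like the following should work (validateEnvironment is the RL Toolbox's built-in consistency check; the rest is just an illustrative call, not part of the training code below):
dim = [1, 1, 10];                          % data along the third dimension (k = 3)
envObj = env(dim);
obs0 = reset(envObj);                      % 1x1x10 observation, values in [-1, 1]
[obs, r, done] = step(envObj, zeros(dim)); % reward is -norm(obj.State(:)) here
validateEnvironment(envObj);               % errors out if specs and step/reset disagree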
And here's a test function which defines a simple actor network and a simple critic network and trains the agent:
function test
k = 3; % k can be set to 1, 2, or 3
dim = ones(1,3);
dim(k) = 10;
%% Create Environment
envObj = env(dim);
obsInfo = getObservationInfo(envObj);
actInfo = getActionInfo(envObj);
% -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
%% Create Actor
actorNetwork = [
    imageInputLayer(dim, 'Name', 'observation', 'Normalization', 'none')
    fullyConnectedLayer(100, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(max(dim), 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')];
actorOpts = rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, 'Observation', {'observation'}, 'Action', {'ActorTanh'}, actorOpts);
% -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
%% Create Critic
statePath = [
    imageInputLayer(dim, 'Name', 'observation', 'Normalization', 'none')
    fullyConnectedLayer(100, 'Name', 'CriticStateFC2')];
actionPath = [
    imageInputLayer(dim, 'Name', 'action', 'Normalization', 'none')
    fullyConnectedLayer(100, 'Name', 'CriticActionFC1', 'BiasLearnRateFactor', 0)];
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork, 'CriticStateFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'CriticActionFC1', 'add/in2');
criticOpts = rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'observation'},'Action',{'action'},criticOpts);
% -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
%% Create Agent
agentOpts = rlDDPGAgentOptions();
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOpts);
% -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
%% Training
trainOpts = rlTrainingOptions(...
    "MaxEpisodes", 1000, ...
    "MaxStepsPerEpisode", 100, ...
    "ScoreAveragingWindowLength", 10, ...
    "Verbose", true, ...
    "Plots", "training-progress", ...
    "StopTrainingCriteria", "AverageReward", ...
    "StopTrainingValue", -0.01, ...
    "SaveAgentCriteria", "EpisodeReward", ...
    "SaveAgentValue", -0.1);
train(agent, envObj, trainOpts); % Train the agent.
% -------------------------------------------------------------------------
end
You can choose to put the random data along dimension 1, 2, or 3 by setting k to one of the three in the first line of the test function. Since a fully connected layer is used, the results should not differ. But here's a plot for k = 1 or 2:
As can be seen, in this case the agent is successfully reducing the cost (increasing the reward).
Now, let k = 3 and see the result:
As can be seen, in this case, the agent is simply meandering around the initial value.
This example uses fully connected layers. But when k = 3, you can replace the fully connected layers with convolutional layers and you get the same results. For example,
fullyConnectedLayer(100, 'Name', 'ActorFC1');
would be replaced by
convolution2dLayer([1,1], 100, 'Name', 'ActorFC1');
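so that, for the k = 3 case, the whole actor network would look roughly like this (a sketch only; the critic's fully connected layers would be swapped in the same way):
actorNetwork = [
    imageInputLayer(dim, 'Name', 'observation', 'Normalization', 'none')
    convolution2dLayer([1,1], 100, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    convolution2dLayer([1,1], max(dim), 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')];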
====================