Multi-action agent programming in reinforcement learning
Please, how can I program or represent a multi-action agent in reinforcement learning (DQN)? I can construct the agent, but I do not know how to represent the action (three decisions at every stage of learning) in the step function. The action has three decisions: charging the battery, operating the first generator, and operating the second generator. The first part of the code below shows how I construct the environment; in the second part I ask how I can add these actions to my step function.
Thank you in advance.
First part
clc
ObservationInfo = rlNumericSpec([4 1]);
ObservationInfo.Name = 'EnergSolar States';
ObservationInfo.Description = 'T,SOC,SOF,Temp';
ActionInfo = rlFiniteSetSpec({[-1 0 0],[-1 1 0],[-1 0 1],[-1 1 1],[0 0 0],[0 1 0],[0 0 1],[0 1 1],[1 0 0],[1 1 0],[1 0 1],[1 1 1]});
ActionInfo.Name = 'EnergSolar Action';
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunctionfuel','myResetFunctionfuel');
obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
% Critic network: observation (state) path
statePath = [
imageInputLayer([4 1 1], 'Normalization', 'none', 'Name', 'state')
fullyConnectedLayer(200, 'Name', 'CriticStateFC1')
reluLayer('Name', 'CriticRelu1')
fullyConnectedLayer(200, 'Name', 'CriticStateFC2')];
% Critic network: action path
actionPath = [
imageInputLayer([1 3 1], 'Normalization', 'none', 'Name', 'action')
fullyConnectedLayer(200, 'Name', 'CriticActionFC1')];
% Common path: merge state and action features into a scalar Q-value
commonPath = [
additionLayer(2,'Name', 'add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1, 'Name', 'output')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOpts = rlRepresentationOptions('LearnRate',0.002,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'action'},criticOpts);
agentOpts = rlDQNAgentOptions(...
'UseDoubleDQN',false, ...
'TargetUpdateMethod',"periodic", ...
'TargetUpdateFrequency',4, ...
'ExperienceBufferLength',100000, ...
'DiscountFactor',0.99, ...
'MiniBatchSize',1000);%500 to 1000
agent = rlDQNAgent(critic,agentOpts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 1000, ...
'MaxStepsPerEpisode', 500, ...
'Verbose', false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeReward',...
'StopTrainingValue',0,...
'ScoreAveragingWindowLength',5);
trainingStats = train(agent,env,trainOpts);
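As a side note, the 12 vectors passed to rlFiniteSetSpec above are the Cartesian product {-1, 0, 1} x {0, 1} x {0, 1} (battery command x first generator x second generator). The same cell array can be built programmatically, which avoids typos if the per-decision sets ever change; a minimal sketch:

```matlab
% Build the 12 action vectors as the Cartesian product
% {-1,0,1} (battery) x {0,1} (generator 1) x {0,1} (generator 2).
[b, g1, g2] = ndgrid([-1 0 1], [0 1], [0 1]);
combos   = [b(:), g1(:), g2(:)];       % 12x3 matrix, one action per row
elements = num2cell(combos, 2).';      % 1x12 cell array of 1x3 row vectors
ActionInfo = rlFiniteSetSpec(elements);
```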
Second part
% Balance equation. Action1, Action2, Action3 are the three components of
% the selected action vector (battery, generator 1, generator 2).
Pg = PL - Ppv - bpr*Action1;
if (Pg > Z)
    if (Pg - Z <= 150)
        PDG1 = Pg - Z;
        PDG2 = 0;
        F = A*PDG1 + B*Pr1;
        Pg = Z;
    else
        if (Pg - Z < 350)
            PDG2 = Pg - Z;
            F = A*PDG2 + B*Pr2;
            PDG1 = 0;
            Pg = Z;
        elseif (Pg - Z < 500)
            PDG2 = 350;
            PDG1 = (Pg - Z - PDG2)*Action2;
            F = A*(PDG1 + PDG2) + B*(Pr1*Action2 + Pr2*Action3);
            Pg = Pg - Z - PDG1 - PDG2;
        end
    end
end
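One way to handle the three decisions, sketched under the assumption that the step function follows the rlFunctionEnv signature `[NextObs,Reward,IsDone,LoggedSignals] = myStepFunctionfuel(Action,LoggedSignals)`: the selected Action arrives as a single element of the finite set, i.e. a 1x3 vector, so its components can be unpacked at the top of the function. The reward and termination logic below are placeholders, not part of the original question:

```matlab
% Sketch of the step function (assumed rlFunctionEnv signature).
% Action is one element of the finite set, i.e. a 1x3 row vector.
function [NextObs, Reward, IsDone, LoggedSignals] = myStepFunctionfuel(Action, LoggedSignals)
    Action1 = Action(1);   % battery: -1 discharge, 0 idle, 1 charge (assumed meaning)
    Action2 = Action(2);   % first generator: 0 off, 1 on
    Action3 = Action(3);   % second generator: 0 off, 1 on

    % ... balance equation and dispatch logic from the second part go here,
    % using Action1, Action2 and Action3 ...

    NextObs = LoggedSignals.State;   % [T; SOC; SOF; Temp], updated by the logic above
    Reward  = 0;                     % placeholder, e.g. -F (negative fuel cost)
    IsDone  = false;                 % placeholder termination condition
end
```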
Answers (1)
Emmanouil Tzorakoleftherakis
on 13 Jul 2020
0 votes
This example shows how to create an environment with multiple discrete actions. Hope that helps
3 comments
Nabil Jalil Aklo
on 13 Jul 2020
Emmanouil Tzorakoleftherakis
on 14 Jul 2020
All the elements are in ActionInfo.Elements. Is that what you need?
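To illustrate the comment above: ActionInfo.Elements holds the full list of action vectors that the agent can choose from. A quick check, assuming the environment defined in the first part:

```matlab
% Inspect the discrete action set of the environment (sketch).
actInfo    = getActionInfo(env);
allActions = actInfo.Elements;     % the twelve 1x3 action vectors
numActions = numel(allActions);    % 12 possible combinations
```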
Nabil Jalil Aklo
on 14 Jul 2020