Time-varying policy function
Hi,
I am wondering if it is possible to have time-varying (non-stationary) policy functions in the reinforcement learning toolbox.
For example, say my episode lasts three periods (t=1,2,3). I would then have a set of three policy functions, one per period, where each policy is some neural network structure indexed by a general vector of parameters ϑ, which will ultimately depend on the time period.
Is that possible to do with the toolbox?
Thank you so much!
Answers (1)
Emmanouil Tzorakoleftherakis
on 25 May 2023
0 votes
Why don't you just train 3 separate policies and pick and choose as needed?
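The "separate policies" idea can be sketched roughly as follows. This is plain Python, not the Reinforcement Learning Toolbox API: `make_linear_policy` and `TimeVaryingPolicy` are hypothetical names, and simple linear maps stand in for three separately trained networks.

```python
from typing import Callable, Dict, Sequence

Policy = Callable[[Sequence[float]], float]


def make_linear_policy(weights: Sequence[float]) -> Policy:
    """Stand-in for a trained network: action = dot(weights, observation)."""
    def policy(obs: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(weights, obs))
    return policy


class TimeVaryingPolicy:
    """Holds one policy per period and dispatches on the period index t."""

    def __init__(self, policies: Dict[int, Policy]) -> None:
        self.policies = policies

    def act(self, t: int, obs: Sequence[float]) -> float:
        if t not in self.policies:
            raise KeyError(f"no policy registered for period {t}")
        return self.policies[t](obs)


# Hypothetical weights; in practice each entry would be a separately trained policy.
pi = TimeVaryingPolicy({
    1: make_linear_policy([1.0, 0.0]),
    2: make_linear_policy([0.0, 1.0]),
    3: make_linear_policy([1.0, 1.0]),
})
```

Calling `pi.act(t, obs)` with the current period then picks the matching policy, so the overall decision rule is non-stationary even though each component policy is fixed.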
4 comments
Matheus Silva
on 25 May 2023
Edited: Matheus Silva
on 25 May 2023
Emmanouil Tzorakoleftherakis
on 25 May 2023
I could be misunderstanding, but assuming the first period has no dependencies, you train that policy first. Then you use the trained policy to train your second-period policy, and so on.
Matheus Silva
on 28 May 2023
Edited: Matheus Silva
on 28 May 2023
Emmanouil Tzorakoleftherakis
on 30 May 2023
Honestly, I think your best bet would be to use the same policy throughout, but maybe use an input signal to the neural net to indicate which period you are in based on your state.
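One common way to give a single network such a period signal is to append a one-hot encoding of the period to the observation. The sketch below is plain Python under that assumption, not toolbox code: `augment_with_period` and `shared_policy` are hypothetical names, and a single linear layer with made-up weights stands in for the shared network.

```python
from typing import List, Sequence


def augment_with_period(obs: Sequence[float], t: int, num_periods: int = 3) -> List[float]:
    """Append a one-hot encoding of period t (1-based) to the observation."""
    if not 1 <= t <= num_periods:
        raise ValueError(f"period {t} is outside 1..{num_periods}")
    one_hot = [0.0] * num_periods
    one_hot[t - 1] = 1.0
    return list(obs) + one_hot


def shared_policy(aug_obs: Sequence[float], weights: Sequence[float]) -> float:
    """Stand-in for one shared network: a single linear layer."""
    return sum(w * x for w, x in zip(weights, aug_obs))


# Two state weights followed by three per-period weights (hypothetical values).
weights = [1.0, 1.0, 10.0, 20.0, 30.0]
state = [0.5, 0.25]
actions = [shared_policy(augment_with_period(state, t), weights) for t in (1, 2, 3)]
```

Here the same state produces a different action in each period, because the weights attached to the one-hot inputs act as period-specific biases. In the toolbox setting this would mean enlarging the observation by the number of periods and having the environment supply the indicator.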
Another option, which is similar to what I mentioned earlier, is to train 3 different policies. To work around the period dependencies, you can place the RL policy block inside a triggered subsystem and only enable the subsystem for training when the system is in the appropriate period. Do that for each policy and then you can switch between the 3 as needed. See here
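The gating idea, where each policy only learns while its own period is active, can be illustrated with a toy sketch. This is plain Python and does not model Simulink or the triggered subsystem itself: the per-period targets in `GOALS`, the quadratic reward, and the function names are all assumptions made for illustration, and each "policy" is reduced to a single scalar action.

```python
from typing import Dict

# Hypothetical per-period target actions defining a toy reward.
GOALS: Dict[int, float] = {1: 0.5, 2: -1.0, 3: 2.0}


def reward_gradient(action: float, t: int) -> float:
    """Gradient of the toy reward -(action - GOALS[t])**2 with respect to the action."""
    return -2.0 * (action - GOALS[t])


def train_one_period(params: Dict[int, float], active: int,
                     steps: int = 200, lr: float = 0.1) -> None:
    """Gradient ascent on the active period only; other entries are never touched."""
    for _ in range(steps):
        for t in (1, 2, 3):      # one pass through the three-period episode
            if t != active:      # gate: no learning outside the active period
                continue
            params[t] += lr * reward_gradient(params[t], t)


# Train the three scalar "policies" one after another, as suggested in the thread.
params = {1: 0.0, 2: 0.0, 3: 0.0}
for period in (1, 2, 3):
    train_one_period(params, period)
```

After the loop each entry of `params` has moved close to its own target, while during any single call only the active period's parameter changes. The toy reward has no coupling between periods, so it shows the gating mechanism only, not the cross-period dependencies discussed above.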

