How to format sequences to store in experience buffer for DRQN?

Question

Imola Fodor on 27 Feb 2024

https://la.mathworks.com/matlabcentral/answers/2087601-how-to-format-sequences-to-store-in-experience-buffer-for-drqn

Commented: Imola Fodor on 4 Jun 2024

For DRQN (Deep Recurrent Q-Learning) in a POMDP, entire sequences, rather than individual transitions, need to be stored in the replay buffer. How should the data be constructed for the agent.ExperienceBuffer object? For the Observation element, for example, I have tried both a 1x1 cell containing a numChannels x sequenceLength matrix and a numChannels x sequenceLength cell array. The idea was to then sample minibatches of sequences instead of minibatches of transitions.

With either attempt I get the error:

Error using rl.replay.rlReplayMemory/validateExperience
Observation dimensions must match the dimensions specified in the corresponding specifications.

More specifically, when debugging I see that in the first case (1x1 cell) the code crashes at:

for obsCh = 1:numObsChannels
    % Each stored observation channel must have exactly the spec's dimensions
    if ~all(size(NewObs{obsCh}) == obj.InternalReplayMemory_.ObservationDimension{obsCh})
        error(message('rl:general:errIncorrectObservationDim'));
    end
end

And in the second case at:

if numObsChannels ~= numel(NewObs)
    error(message('rl:general:errIncorrectObservationDim'));
end

In MATLAB it is possible to create a DQN agent with recurrent layers, so there must be some way to store these sequences.

Thank you,

Imola

Answer 1

Shubham on 29 May 2024

https://la.mathworks.com/matlabcentral/answers/2087601-how-to-format-sequences-to-store-in-experience-buffer-for-drqn#answer_1464681

Hi Imola,

To handle sequences in the replay buffer for Deep Recurrent Q-Networks (DRQN) within a Partially Observable Markov Decision Process (POMDP) setting in MATLAB, you need to structure your observations and experiences in a way that aligns with the expected format of the rl.ExperienceBuffer or any custom replay buffer you're implementing. The error you're encountering is due to a mismatch in the dimensions of the observations you're trying to store versus what the replay memory expects based on the observation space specifications.

Here's how you can approach this:

1. Observation and Action Space Specification

First, ensure that your observation and action spaces are correctly specified to accommodate sequences. For a DRQN, the observation space must account for the sequence length as part of its dimensionality if you're not using a 1x1 cell to encapsulate the entire sequence.

2. Storing Sequences

When storing sequences, the key is to maintain consistency in how observations are represented. If your environment's observation for a single timestep is a vector of size [numChannels, 1], then for a sequence of length sequenceLength, you'd typically have an observation of size [numChannels, sequenceLength].
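As a minimal illustration of the [numChannels, sequenceLength] layout described above, this plain MATLAB sketch stacks per-timestep column vectors into one sequence matrix. The sizes (numChannels = 3, sequenceLength = 5) and the placeholder values are assumptions for illustration only.

```matlab
% Assumed example sizes: 3 features per timestep, 5 timesteps per sequence
numChannels    = 3;
sequenceLength = 5;

% Per-timestep observations as numChannels x 1 column vectors
% (placeholder values: every entry at timestep t equals t)
stepObs = cell(1, sequenceLength);
for t = 1:sequenceLength
    stepObs{t} = t * ones(numChannels, 1);
end

% Concatenate along the 2nd dimension: column t holds timestep t
seqObs = cat(2, stepObs{:});   % size is numChannels x sequenceLength
```

Note that the replay memory compares each stored observation against the dimensions declared in the observation specification, so a matrix of this size is only accepted if the specification was itself declared with these dimensions.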

However, MATLAB's RL framework expects each observation to be encapsulated in a cell array where each cell corresponds to one observation "channel", that is, one element of the observation specification, and the contents of each cell must have exactly the dimensions declared in that specification. For sequence data, you need to ensure that the entire sequence for a single channel is contained within a single cell, and that its dimensions match what the environment and agent expect.
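For comparison, here is a minimal sketch of one experience whose cells match their specifications, assuming Reinforcement Learning Toolbox and its rlReplayMemory object (the object named in the question's error). The specifications (4 features per step, actions {-1, 1}), the buffer length and the placeholder values are assumptions. Each experience here holds a single timestep, so every observation cell has exactly the dimensions of its spec.

```matlab
% Requires Reinforcement Learning Toolbox.
% Assumed specs: one observation channel with 4 features per timestep,
% and a discrete action set {-1, 1}; substitute your environment's specs.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);
buffer  = rlReplayMemory(obsInfo, actInfo, 10000);

% One experience describes one timestep. Observation and NextObservation
% are cell arrays with one cell per observation spec (here: one), and
% each cell has exactly the dimensions of that spec, i.e. 4x1.
experience.Observation     = {rand(4, 1)};
experience.Action          = {1};
experience.Reward          = 0.5;
experience.NextObservation = {rand(4, 1)};
experience.IsDone          = 0;

% Sizes match the spec, so the dimension checks quoted in the question pass
append(buffer, experience);
```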

3. Correct Approach for Sequences

Given the errors you're encountering, let's clarify the correct approach:

For a 1x1 Cell Approach: If you're trying to encapsulate the entire sequence in a 1x1 cell, ensure that the cell contains a matrix where each column represents a timestep, and the rows represent different features or channels of the observation. This approach might require custom handling in your experience replay mechanism to correctly sample and utilize these sequences.
For a Cell Array Directly Matching numChannels x sequenceLength: This seems to be a misunderstanding. If you're using a cell array where each cell is supposed to represent a channel over the sequence, ensure that each cell actually contains a vector representing the sequence for that channel. The correct dimensionality for a cell array storing sequences would be [1, numChannels], where each cell contains a vector of length sequenceLength, not [numChannels, sequenceLength].
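The two layouts above can be compared directly in plain MATLAB. This sketch uses hypothetical sizes and placeholder data and builds both layouts from the same numChannels x sequenceLength matrix.

```matlab
% Assumed example sizes and placeholder data
numChannels    = 3;
sequenceLength = 5;
seqObs = reshape(1:numChannels*sequenceLength, numChannels, sequenceLength);

% Layout A: a 1x1 cell wrapping the whole numChannels x sequenceLength matrix
layoutA = {seqObs};

% Layout B: a 1 x numChannels cell; cell c holds channel c as a
% 1 x sequenceLength row vector (num2cell groups along dimension 2,
% giving a numChannels x 1 cell that is then transposed)
layoutB = num2cell(seqObs, 2).';

% size(layoutA{1}) is [numChannels sequenceLength]
% size(layoutB) is [1 numChannels]; size(layoutB{c}) is [1 sequenceLength]
```

Neither layout has the numChannels x 1 size of a per-step specification such as rlNumericSpec([numChannels 1]), which is why the size comparison quoted in the question fails for the 1x1 cell. When the specification array declares a single channel, numel(layoutB) = numChannels also trips the numel check.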

4. Sampling Mini-batches

When sampling mini-batches of sequences, you must ensure that each sampled experience contains the full sequence as required for the DRQN's input. This might involve custom modifications to the sampling logic to ensure that sequences are kept intact and not broken up.
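If you do write custom sampling logic, one way to keep sequences intact is to store per-step data in time order and sample only windows that do not cross an episode boundary. The sketch below is plain MATLAB, independent of the toolbox; obsStore, isDone and all sizes are hypothetical placeholders.

```matlab
% Hypothetical custom sampler (plain MATLAB, no toolbox objects).
% obsStore holds per-step observations column-wise, in time order;
% isDone(k) is true when step k is the last step of an episode.
numChannels = 4;
numSteps    = 100;
seqLen      = 8;
batchSize   = 16;

obsStore = rand(numChannels, numSteps);   % placeholder data
isDone   = false(1, numSteps);
isDone([40 100]) = true;                  % episodes: steps 1-40 and 41-100

% A window s..s+seqLen-1 stays within one episode only if no episode
% ends before its last step, i.e. isDone is false on s..s+seqLen-2.
starts      = 1:(numSteps - seqLen + 1);
valid       = arrayfun(@(s) ~any(isDone(s:s+seqLen-2)), starts);
validStarts = starts(valid);

% Sample window starts uniformly (with replacement) and copy each window
pick  = validStarts(randi(numel(validStarts), 1, batchSize));
batch = zeros(numChannels, seqLen, batchSize);
for b = 1:batchSize
    batch(:, :, b) = obsStore(:, pick(b):pick(b)+seqLen-1);
end
% batch is numChannels x seqLen x batchSize; batch(:,:,b) is one sequence
```

Storing each timestep once and slicing windows at sample time avoids keeping duplicate copies of overlapping sequences in memory.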

5. Debugging Tips

Check Dimensionality at Every Step: Print out the dimensions of your observations at various points (creation, before storing, and during retrieval) to ensure they match expectations.
Align with Agent Specifications: Double-check the agent's expected input dimensions, especially if you're using recurrent layers, to ensure compatibility.
Custom Replay Buffer: If the built-in rl.ExperienceBuffer doesn't meet your needs for sequence handling, consider implementing a custom replay buffer that explicitly supports sequences in the way you require.
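Before writing a custom buffer, it may be worth checking whether the built-in memory already samples sequences. The Reinforcement Learning Toolbox documentation lists a SequenceLength name-value argument for the sample function of rlReplayMemory, which returns up to that many consecutive experiences per batch element and stops a sequence at an experience with nonzero IsDone. The sketch below assumes a release that supports this argument; the specs, episode length and batch sizes are placeholders.

```matlab
% Requires Reinforcement Learning Toolbox. Assumes a release in which
% sample(buffer, batchSize, 'SequenceLength', L) is supported.
obsInfo = rlNumericSpec([4 1]);          % assumed: 4 features per step
actInfo = rlFiniteSetSpec([-1 1]);       % assumed: two discrete actions
buffer  = rlReplayMemory(obsInfo, actInfo, 10000);

% Store 3 placeholder episodes of 50 steps, one timestep per experience
for ep = 1:3
    obs = rand(4, 1);
    for t = 1:50
        nextObs = rand(4, 1);
        transition.Observation     = {obs};
        transition.Action          = {1};
        transition.Reward          = 0;
        transition.NextObservation = {nextObs};
        transition.IsDone          = double(t == 50);  % flag episode end
        append(buffer, transition);
        obs = nextObs;
    end
end

% Request 8 batch elements of up to 10 consecutive experiences each
batch = sample(buffer, 8, 'SequenceLength', 10);
disp(size(batch.Observation{1}))
```

For a DQN agent with a recurrent critic, rlDQNAgentOptions also has a SequenceLength option, which must be greater than 1 for recurrent networks; with it set, the agent samples sequences itself and experiences can be appended one timestep at a time. Check the exact layout of the sampled arrays, and the optional padding-mask output for sequences cut short by IsDone, against the documentation for your release.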

Remember, the key to successfully implementing DRQN in MATLAB is ensuring that your observation sequences are correctly formatted and that your replay buffer is capable of handling, storing, and sampling these sequences in a way that aligns with the expected input structure of your recurrent neural network.

Imola Fodor on 4 Jun 2024

Hello Shubham, this answer is very long, but unfortunately I don't see any concrete solution. Can you point me to some documentation where I can read about "...each observation to be encapsulated in a cell array where each cell corresponds to one "channel" or dimension of the observation space. For sequence data, ..."? Also, I see statements such as "This might involve custom modifications to the sampling logic" or "This approach might require custom handling in your experience replay mechanism"...
