rlHindsightPrioritizedReplayMemory
Description
An off-policy reinforcement learning agent stores experiences in a circular experience buffer.
During training the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:
- S is the current observation of the environment. 
- A is the action taken by the agent. 
- R is the reward for taking action A. 
- S' is the next observation after taking action A. 
- D is the is-done signal after taking action A. 
The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.
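For illustration, one such experience can be held in a MATLAB structure with one field per tuple element. The field names below follow the pattern used when appending experiences to a replay memory, but treat them as an assumption and check the append reference page for the exact layout.

% Illustrative sketch of a single experience (S,A,R,S',D).
% Field names are assumed; verify against the append documentation.
exp.Observation     = {rand(4,1)};   % S  - current observation (one cell per observation channel)
exp.Action          = {rand(1,1)};   % A  - action taken by the agent
exp.Reward          = -1;            % R  - reward for taking action A
exp.NextObservation = {rand(4,1)};   % S' - next observation after taking action A
exp.IsDone          = 0;             % D  - is-done signal (0 means the episode continues)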
By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. For goal-conditioned tasks, where the observation includes both the goal and a goal measurement, you can use an rlHindsightReplayMemory object.
rlHindsightReplayMemory objects uniformly sample experiences from the buffer. To use prioritized nonuniform sampling, which can improve sample efficiency, use an rlHindsightPrioritizedReplayMemory object.
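As background, prioritized sampling in the sense of Schaul et al. [1] draws transition i with a probability that increases with its priority p_i (typically derived from the temporal-difference error) and corrects the resulting sampling bias with importance-sampling weights. The formulas below state that general scheme, with the standard exponents alpha and beta from [1]; they describe the algorithm from the reference, not necessarily every implementation detail of this object.

P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad w_i = \left(\frac{1}{N}\cdot\frac{1}{P(i)}\right)^{\beta}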
A hindsight replay memory experience buffer:
- Generates additional experiences by replacing goals with goal measurements 
- Improves sample efficiency for tasks with sparse rewards 
- Requires a ground-truth reward function and is-done function (see the sketch below) 
- Is not necessary when you have a well-shaped reward function 
For more information on hindsight experience replay and prioritized sampling, see Algorithms.
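For example, a sparse-reward reaching task might define the ground-truth reward and is-done functions as below. The three-argument (observation, action, next observation) signature and the cell-array observation layout are assumptions made for illustration; consult the RewardFcn and IsDoneFcn property descriptions for the exact interface.

% Sketch of ground-truth reward and is-done functions for a sparse-reward
% reaching task. The assumed signature is fcn(observation,action,nextObservation);
% the first two inputs are unused here.
function reward = goalReward(~,~,nextObs)
    goalMeasurement = nextObs{1};   % assumed channel 1: goal measurement
    goal            = nextObs{2};   % assumed channel 2: goal
    % Sparse reward: 0 when the measurement reaches the goal, -1 otherwise
    if norm(goalMeasurement - goal) < 0.05
        reward = 0;
    else
        reward = -1;
    end
end

function isDone = goalIsDone(~,~,nextObs)
    goalMeasurement = nextObs{1};
    goal            = nextObs{2};
    % Terminate the episode once the goal measurement reaches the goal
    isDone = norm(goalMeasurement - goal) < 0.05;
end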
Creation
Syntax
Description
buffer = rlHindsightPrioritizedReplayMemory(obsInfo,actInfo,rewardFcn,isDoneFcn,goalConditionInfo) creates a hindsight prioritized replay memory experience buffer compatible with the observation and action specifications in obsInfo and actInfo, respectively. This syntax sets the RewardFcn, IsDoneFcn, and GoalConditionInfo properties.
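A minimal construction sketch follows, assuming a two-channel observation (goal measurement plus goal) and the helper functions sketched in the Description. The goalConditionInfo value is a placeholder; check the GoalConditionInfo property for its exact format.

% Observation: channel 1 is the goal measurement, channel 2 is the goal (assumed layout)
obsInfo = [rlNumericSpec([3 1]) rlNumericSpec([3 1])];
actInfo = rlNumericSpec([2 1]);

rewardFcn = @goalReward;   % ground-truth reward function (sketched above)
isDoneFcn = @goalIsDone;   % ground-truth is-done function (sketched above)

% Placeholder goal-condition information linking the goal-measurement channel
% to the goal channel; the exact encoding is documented under GoalConditionInfo.
goalConditionInfo = {[1 1 2 1]};

buffer = rlHindsightPrioritizedReplayMemory( ...
    obsInfo,actInfo,rewardFcn,isDoneFcn,goalConditionInfo);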
Input Arguments
Properties
Object Functions
| Function | Description |
| --- | --- |
| append | Append experiences to replay memory buffer |
| sample | Sample experiences from replay memory buffer |
| resize | Resize replay memory experience buffer |
| reset | Reset environment, agent, experience buffer, or policy object |
| allExperiences | Return all experiences in replay memory buffer |
| validateExperience | Validate experiences for replay memory |
| generateHindsightExperiences | Generate hindsight experiences from hindsight experience replay buffer |
| getActionInfo | Obtain action data specifications from reinforcement learning environment, agent, or experience buffer |
| getObservationInfo | Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer |
Examples
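A short usage sketch (not from the original page): append a goal-conditioned experience to the buffer created above, then sample a prioritized mini-batch. The experience field names and the single-output sample call are assumptions; see the append and sample reference pages for the exact interfaces.

% Store one goal-conditioned experience (field names assumed, see append).
exp.Observation     = {rand(3,1), rand(3,1)};   % {goal measurement, goal}
exp.Action          = {rand(2,1)};
exp.Reward          = -1;
exp.NextObservation = {rand(3,1), rand(3,1)};
exp.IsDone          = 0;
append(buffer,exp);

% Once enough experiences have been collected, draw a prioritized mini-batch.
miniBatch = sample(buffer,64);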
Limitations
- Hindsight prioritized experience replay does not support agents that use recurrent neural networks. 
Algorithms
References
[1] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized Experience Replay." arXiv:1511.05952 [cs], February 25, 2016. https://arxiv.org/abs/1511.05952.
[2] Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. "Hindsight Experience Replay." 31st Conference on Neural Information Processing Systems (NIPS 2017). Long Beach, CA, USA, 2017.
Version History
Introduced in R2023a