Error in creating a custom environment in deep reinforcement learning code

I am building a movie recommender system using deep reinforcement learning, and I hit an error while creating the environment for the RL problem. I couldn't find a generic example of how to create a custom RL environment in MATLAB; all the environments I found are predefined. Please assist, as I keep getting the following error:
Error using rl.env.MATLABEnvironment/validateEnvironment
Environment 'ObservationInfo' does not match observation output from step function. Check the data type,
dimensions, and range.
Error in rl.env.rlFunctionEnv (line 74)
Error in rlFunctionEnv (line 45)
env = rl.env.rlFunctionEnv(varargin{:});
Error in untitled9 (line 22)
env = rlFunctionEnv(observationInfo, actionInfo, @(Action,LoggedSignals) myStepFunction(Action,LoggedSignals,ratings), @myResetFunction);
I have written the following code so far.
clear all
% Load the MovieLens dataset
ratings = readtable('ml-latest-small/ratings.csv', 'VariableNamingRule', 'preserve');
opts = detectImportOptions('ml-latest-small/movies.csv');
movies = readtable('ml-latest-small/movies.csv', opts);
% Preprocess the data to create the state space and reward function
numMovies = height(movies); % number of movies
numGenres = 20; % number of movie genres
numRatings = 5; % number of possible movie ratings
numUsers = max(ratings.userId); % number of users
stateSize = numMovies + numGenres + numRatings + numUsers;
observationInfo = rlNumericSpec([stateSize 1]);
observationInfo.Name = 'observation';
% Define the action space
actionInfo = rlFiniteSetSpec([1:numMovies]);
actionInfo.Name = 'action';
% Define the environment
env = rlFunctionEnv(observationInfo, actionInfo, @(state,action) myStepFunction(state,action,ratings), @myResetFunction);
% Define the DQN agent
numHiddenUnits = 64;
statePath = [ imageInputLayer([stateSize 1 1],'Normalization','none','Name','observation')
dqn = rlDQNAgent(statePath,actionInfo,'UseDoubleDQN',true);
% Train the agent
maxEpisodes = 100;
maxSteps = 10;
trainOpts = rlTrainingOptions('MaxEpisodes',maxEpisodes,'MaxStepsPerEpisode',maxSteps);
trainingStats = train(dqn,env,trainOpts);
function next_state = myStepFunction(state, action, ratings)
    % This function takes the current state, an action, and a matrix of ratings
    % as input and returns the next state.
    % Calculate the new state based on the action
    new_state = [state(2:end); action];
    % Return the new state
    next_state = new_state;
end

function [initial_state, LoggedSignal] = myResetFunction()
    % This function returns the initial state for the movie recommendation system.
    % Load the movie ratings dataset
    %ratings = readmatrix('ml-latest-small/ratings.csv');
    num_movies = 9742;
    % Initialize the state with no movie ratings
    initial_state = zeros(num_movies + 20 + 5 + 610, 1);
    % Set the first element of the state to a random integer between 1 and the number of movies
    LoggedSignal.State = randi(num_movies);
    % Assign the random movie selection to the first element of the state vector
    initial_state(1) = LoggedSignal.State;
end

function reward = myRewardFunction(state, action, ratings)
    % This function takes the current state, an action, and a matrix of ratings
    % as input and returns the reward.
    % Calculate the reward based on the ratings
    reward = ratings(state(1), action);
end
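
For comparison, here is a minimal sketch of the function signatures that rlFunctionEnv documents. The step function receives (Action, LoggedSignals) and returns the next observation, a reward, an is-done flag, and the updated logged signals. The reset function returns the initial observation and the logged signals. Every observation has to match the dimensions given to rlNumericSpec, here a column vector of length stateSize. The names sketchStep and sketchReset, the tiny sizes, the zero reward, and the never-terminating episode are assumptions made for illustration, not part of the code above, and the sketch runs without the Reinforcement Learning Toolbox so the shapes can be inspected directly.

```matlab
% Sketch only: small hypothetical sizes so the shapes are easy to inspect.
stateSize = 6;   % hypothetical; the post uses numMovies + numGenres + numRatings + numUsers
numMovies = 4;   % hypothetical

% Call reset once, then step once, the way an environment check would.
[obs, logged] = sketchReset(stateSize, numMovies);
[nextObs, reward, isDone, logged] = sketchStep(2, logged);

% Both observations must have exactly the dimensions of the observation spec.
assert(isequal(size(obs), [stateSize 1]));
assert(isequal(size(nextObs), [stateSize 1]));

function [initialObs, loggedSignals] = sketchReset(stateSize, numMovies)
    % Reset: return the initial observation and the logged signals.
    initialObs = zeros(stateSize, 1);
    initialObs(1) = randi(numMovies);
    % Keep the full state vector so the step function can read it back.
    loggedSignals.State = initialObs;
end

function [nextObs, reward, isDone, loggedSignals] = sketchStep(action, loggedSignals)
    % Step: drop the oldest entry of the stored state and append the action.
    state = loggedSignals.State;
    nextObs = [state(2:end); action];
    reward = 0;       % placeholder (assumption); a real reward would look up a rating
    isDone = false;   % placeholder (assumption); no termination rule is defined here
    loggedSignals.State = nextObs;
end
```

With the toolbox installed, functions of this shape could presumably be passed as rlFunctionEnv(observationInfo, actionInfo, @sketchStep, @() sketchReset(stateSize, numMovies)), using an anonymous function to supply the extra reset arguments. Storing the whole state vector in LoggedSignals.State, rather than a single scalar, is what lets the step function rebuild an observation of the right length.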

Emmanouil Tzorakoleftherakis
on 24 Apr 2023

