Hi,
I use the MATLAB RL toolbox to solve a graph theory problem. In short, it's about finding the best order in which cars should pass a crossing, assuming they drive automated and we know all the relevant parameters (where they come from, where they want to go, speed). From these parameters we built an adjacency matrix, and now we want to find the order with the lowest total time.
I made an environment using rlFunctionEnv, including working step and reset functions. If a restriction is broken, i.e. if a car that is currently not in the first row is chosen or if a car is chosen more than once, I give a penalty (a negative reward). Otherwise, I give a positive reward that depends on the value in the adjacency matrix. And if all cars have been put in an order without any car being picked more than once, there is a big reward for success at the end.
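To make the reward scheme concrete, here is a simplified sketch of the logic in Python (not my actual MATLAB step function). The constants `PENALTY` and `SUCCESS_BONUS`, the mapping `1 / (1 + t)` from a matrix entry to a positive reward, the helper names, and the rule that an episode always lasts as many steps as there are cars are all assumptions for illustration.

```python
# Hypothetical reward constants; the real values are not part of this sketch.
PENALTY = -10.0
SUCCESS_BONUS = 100.0


def is_front(car, lanes, picked):
    """A car is in the first row if every car ahead of it in its lane was picked."""
    for lane in lanes:
        if car in lane:
            return all(c in picked for c in lane[:lane.index(car)])
    return False


def step(action, state, times, lanes):
    """Apply one pick and return (reward, done, new_state)."""
    picked = state["picked"]
    steps = state["steps"] + 1
    n_cars = len(times)
    if action in picked or not is_front(action, lanes, picked):
        # Restriction broken: penalty, and the order is left unchanged.
        reward = PENALTY
        new_picked = picked
    else:
        # Valid pick: shorter transition time gives a larger positive reward.
        t = times[picked[-1]][action] if picked else 0.0
        reward = 1.0 / (1.0 + t)
        new_picked = picked + [action]
    # Assumption: an episode lasts exactly n_cars steps.
    done = steps == n_cars
    if done and len(new_picked) == n_cars:
        reward += SUCCESS_BONUS
    return reward, done, {"picked": new_picked, "steps": steps}


def run_episode(actions, times, lanes):
    """Replay a fixed action sequence and collect one reward per step."""
    state = {"picked": [], "steps": 0}
    rewards = []
    for action in actions:
        reward, done, state = step(action, state, times, lanes)
        rewards.append(reward)
        if done:
            break
    return rewards
```

Under these assumptions an episode with n cars always yields exactly n rewards, and repeatedly choosing the same car yields one small positive reward followed by penalties.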
Then I used the RL Designer app to create an agent (e.g. a DQN agent).
- My problem now is that the rewards apparently are not fully passed on to the app. When I check the dashboard in the app, it always shows one step fewer per episode than it should (e.g. with 6 cars, it shows only 5 steps per episode), and the rewards do not match what my code is supposed to calculate. When I manually save the reward to a CSV file from the environment code at the end of an episode, the values there are correct. So the code seems to work, it just doesn't behave the same in the app: the reward of one step is always missing. I think that's a big problem, because I assume the reward shown there is what the agent's learning is ultimately based on. So my question is: does anybody know why this happens and how to solve it?
- Another problem is that after some time the training converges, but not to the highest values. Instead it converges to almost the worst values, e.g. to a very negative overall reward that results from always choosing the same car. I tried all kinds of variations of epsilon (and its decay) and of the learning rate. How can that be solved?
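To make the exploration part of the second question concrete, here is a rough Python sketch (not toolbox code) of how an epsilon-greedy schedule behaves if epsilon is multiplied by `(1 - decay)` after every training step until it reaches a minimum. The multiplicative rule and the clamping at the minimum are my assumption about how the decay works, and the function names are made up for illustration.

```python
import math


def epsilon_schedule(eps0, decay, eps_min, n_steps):
    """Return the epsilon used at each of n_steps training steps.

    Assumes a multiplicative decay per step, clamped at eps_min.
    """
    eps = eps0
    schedule = []
    for _ in range(n_steps):
        schedule.append(eps)
        if eps > eps_min:
            eps = max(eps_min, eps * (1.0 - decay))
    return schedule


def steps_to_min(eps0, decay, eps_min):
    """Smallest number of decay updates after which epsilon has reached eps_min."""
    return math.ceil(math.log(eps_min / eps0) / math.log(1.0 - decay))
```

With only a handful of steps per episode, `steps_to_min` divided by the episode length gives a rough idea of how many episodes the agent still explores before it acts almost greedily.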
I hope my problems are understandable. I'm especially interested in ideas for the first one, as it seems to be specific to the RL Designer app and I can't find hints about it anywhere else.
Thanks a lot in advance! If something wasn't clear, just ask :)