Process Control with Reinforcement Learning - MATLAB

      Process Control with Reinforcement Learning

      Multiple-input, multiple-output (MIMO) processes are a feature of almost all chemical plants. The design of robust control strategies is critical for maintaining consistent product quality, ensuring safe operations, minimizing downtime, and generating profit. The design process typically involves the comparative evaluation of alternative control loop configurations for interacting process units, applying domain expertise, and using techniques such as relative gain array and decouplers. How about using reinforcement learning (RL)?

      This video shows an example that introduces the elements of RL. It also provides an overview that describes a MIMO process control design problem and demonstrates how you can use RL to generate a design solution. See how the RL results compare with those derived from a traditional design approach and discover further possibilities for using RL in broader applications.

      Published: 6 Jan 2021

      Hi. I'm James Cross. Thank you for joining our presentation today. In this talk, we'll explore the application of artificial intelligence technology to a process control design problem. In our presentation, we'll focus on a simple multiple input, multiple output, or MIMO, system.

      The system is a mixer consisting of two inlet streams, one cold and one hot, and a single outlet stream with two characteristics, total flowrate and temperature. The fluid level in the tank is allowed to vary, which introduces nonlinearity and a bit more complexity. The objective is to control the flowrates of the two inlet streams to achieve the setpoints for flowrate and temperature.

      Here's a high-level mathematical model for the system. Fundamentally, there are two coupled differential equations. In all our work, we non-dimensionalize the equations for simplicity. You can think of cold as a temperature of 0 and hot as a temperature of 1.
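
      For reference, one plausible non-dimensional form of those two coupled balances, assuming a well-mixed tank (the exact equations shown in the video may differ), is

      \[ \frac{dV}{dt} = F_c + F_h - F_{\mathrm{out}}, \qquad \frac{d(VT)}{dt} = F_c T_c + F_h T_h - F_{\mathrm{out}}\,T, \]

      where \(V\) is the liquid volume in the tank, \(F_c\) and \(F_h\) are the cold and hot inlet flowrates, and \(T\) is the outlet temperature. With \(T_c = 0\) and \(T_h = 1\), the second equation reduces to \(d(VT)/dt = F_h - F_{\mathrm{out}}\,T\); the product \(VT\) is what couples the equations and makes the system nonlinear.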

      The mixer model was implemented in Simulink. All of our work is conducted using simulation. Let's take a look. We're not going to go through the details of this, but you can see the details of the mathematical model of the mixer here. And this is what the model is solving for.

      To proceed, we need to decide how to implement the controls. MIMO systems have multiple control loop options. Here are two alternatives for a two-input, two-output system like our mixer.

      In configuration A, inlet flow 1 is modulated to control the outlet flowrate, and inlet flow 2 is modulated to achieve the setpoint temperature. In configuration B, the pairing is swapped, so it's the opposite. Which one would be better for the mixer?

      This is the decision that's typically faced in control design problems, and there are several tools you can use to analyze the situation and make a decision. A common one is the relative gain array. We've done this analysis for the mixer and come up with a matrix that looks like this, showing the impact of each of the manipulated variables on the output variables.

      The problem, as you might observe when you start to quantify this for different temperatures, is that the relative gain array shows couplings between the manipulated variables and the output variables that change depending on the temperature, leaving us in the unfortunate situation that the control loop architecture is indeterminate. This is a problem. But to proceed with our analysis, we need to make a choice, so we choose configuration A.
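
      To make that temperature dependence concrete, here is a small MATLAB sketch of a relative gain array calculation, assuming steady-state gains derived from the simple non-dimensional mixing relations Fout = Fc + Fh and T = Fh/Fout; this gain matrix is an assumption for illustration, not the one shown in the video.

```matlab
% Illustrative RGA calculation for the mixer. Inputs u = [Fc; Fh] (cold and hot
% inlet flows), outputs y = [Fout; T] (outlet flowrate and temperature).
Fout = 1;                              % non-dimensional outlet flowrate
for T = [0.1 0.5 0.9]                  % near-cold, intermediate, near-hot operating points
    K = [ 1,        1;                 % dFout/dFc, dFout/dFh
         -T/Fout,  (1-T)/Fout ];       % dT/dFc,    dT/dFh
    RGA = K .* inv(K).';               % relative gain array: K .* (K^-1)'
    fprintf('T = %.1f: lambda11 = %.2f\n', T, RGA(1,1));
end
% lambda11 slides from roughly 1 toward roughly 0 as the outlet goes from cold
% to hot, so the preferred input-output pairing flips with operating temperature.
```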

      Next, we build the controls model by adding certain elements to the mixer model that we showed two slides ago, which you can see here.

      Here's what's been added: blocks to specify the setpoints for flowrate and temperature; calculations of the error, that is, the deviation of the actual values from their setpoints; proportional-integral (PI) control elements that execute the control; saturation blocks that limit the values the flows can take, since a flow can't be negative and can't exceed its physical limit; and, finally, scopes so we can inspect the results.
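
      For readers who prefer code to block diagrams, here is a minimal MATLAB sketch of one such loop (error calculation, PI control law, and saturation) closed around a simple first-order stand-in plant; the gains, limits, and plant are illustrative assumptions, not the mixer model from the video.

```matlab
% Minimal discrete-time PI loop with output saturation (illustrative values).
Kp = 2; Ki = 0.5; Ts = 0.1;          % PI gains and sample time
uMin = 0; uMax = 1;                  % saturation limits on the inlet flow
setpoint = 0.6;                      % desired outlet flowrate
y = 0; integralError = 0;            % plant output and integrator state

N = 200; yLog = zeros(1, N);
for k = 1:N
    err = setpoint - y;                        % deviation from setpoint
    integralError = integralError + Ts*err;    % integrate the error
    u = Kp*err + Ki*integralError;             % PI control law
    u = min(max(u, uMin), uMax);               % saturation block
    y = y + Ts*(-y + u);                       % simple first-order stand-in plant
    yLog(k) = y;
end
plot((1:N)*Ts, yLog), xlabel('time'), ylabel('outlet flowrate')
```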

      Now let's run a simulation. Here we are in the Simulink environment. You can see the plant model that we've been discussing. Let's consider an example where we ask the controller to increase flowrate and decrease temperature. I've already preloaded the setpoint blocks and the plant model with the initial conditions. So we're ready to run.

      Let's run and generate some output, which you'll see appearing at the right. You can see the top two plots are the manipulated variables, flow 1 and flow 2. And you can see how they are modulated by the controller to achieve the setpoints shown in blue for both the flowrate and the temperature. The control is performing as expected.

      Now, let's look at a couple of cases that were anticipated by the relative gain array analysis. We're going to consider a scenario where we simply want to increase flowrate and hold the temperature fixed. We're going to simulate the scenario for two cases, one where the temperature is almost pure cold and one where it's almost pure hot.

      Here are the results for the cold case. You can see that the setpoint is achieved in a reasonable amount of time and quite smoothly. Now, if we look at the hot case, it's somewhat different. You can see that while it looks like it settles out initially for the flowrate, there's a bump that occurs quite a bit later. And you can see that the temperature is significantly disrupted early on and then actually struggles to achieve the setpoint until this much later time. As expected from the relative gain array analysis, the results contrast strongly, which is our tee-up to consider a different approach.

      We're going to talk about reinforcement learning. What is it? Reinforcement learning is a technique that enables a computer to learn a policy for making a series of decisions to perform a task with the objective of maximizing the cumulative reward. Note the emphasis on cumulative. Maximizing the cumulative reward may entail incurring losses from certain individual decisions along the way.
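
      In symbols, the quantity being maximized is the return, the cumulative (typically discounted) reward

      \[ G = \sum_{t=0}^{T} \gamma^{\,t} r_t, \qquad 0 < \gamma \le 1, \]

      where \(r_t\) is the reward received at step \(t\) and \(\gamma\) is a discount factor. Because it is \(G\) rather than each individual \(r_t\) that is maximized, the agent can accept a poor reward now in exchange for larger rewards later.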

      Here's the general construct. An agent takes observations, or measurements, from the environment. It digests those observations, or calculates something based on them, to decide what action to take. The action is then applied to the environment, which produces a response, in a loop much like a control system.

      However, we add a reinforcement learning algorithm so that as we accumulate experience in a trial-and-error fashion, the policy is adapted and the agent presumably does better and better. What are some examples? Did you see the movie Ford v Ferrari?

      There's the notion of the perfect lap. If you had a model of a car on a racetrack, how would you compute that? This is one such technique. Or more relevant to our conference, how would you maximize profits from a batch operation? What is the perfect recipe?

      Now we're going to take a small diversion and look at a demo that we've created at MathWorks, which teaches a bipedal robot how to walk. I'm going to do the voiceover here quickly while we play it. These are only brief excerpts from the actual video. What you can see is a model of a physical system on the right, with lots of actions that can be taken to adjust the various joints.

      And there are the observations that result from taking those actions, in other words, the movement of the robot. To design a control system, you have to make decisions about lots of things. This is a complex system with 31 observations.

      As an alternative, in this demo, a reinforcement learning agent was used, with neural networks for the actor and critic models, to train the robot how to walk. And you're going to see right here how it did at an early stage, with a reward function that's not fully mature.

      And you can see the robot is struggling. It's kind of dragging its right leg like a zombie. It's not doing things symmetrically, and it's deviating from the goal line. Now we update the reward function, and you can see that actually, the robot learns how to walk and stay on target. If you'd like to watch the entire video that explains this demonstration, you can click on the link provided.

      So what is the reinforcement learning workflow? We have to set this up and adapt it for our mixer problem. Here are the six stages. First is the environment. We've already built a process model, so we're all set there.

      Next is the reward function. This is quite simple for the controls problem: we reward the agent when it achieves the setpoints and have the reward decrease monotonically with the error as it moves away from them. There's more complexity in the policy, which we code as a neural network. If you don't know what that is, it's a highly configurable mathematical function that's used extensively in artificial intelligence applications.
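
      As a concrete illustration, a reward with that shape could be coded as a small MATLAB function. The quadratic form and the weights here are assumptions, not necessarily the reward used in the video.

```matlab
function r = mixerReward(flowError, tempError)
% Illustrative reward: largest (zero) when both errors are zero and
% monotonically decreasing as either error grows. Weights are assumptions.
wF = 1;   % weight on the flowrate error
wT = 1;   % weight on the temperature error
r = -(wF*flowError.^2 + wT*tempError.^2);
end
```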

      And there are a lot of tools for designing and training them. Then we need the agent. Here we have to estimate cumulative rewards as we go, based on the experience that we're acquiring, and we use a neural network for that function as well. We also choose a method for updating the policy, known as the deep deterministic policy gradient (DDPG) approach, and you can see it referred to here.
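
      In its standard textbook form (not specific to this demo), DDPG maintains a critic network \(Q(s,a)\), a deterministic actor network \(\mu(s)\), and slowly updated target copies \(Q'\) and \(\mu'\). For each experience \((s, a, r, s')\), the critic is trained toward the target

      \[ y = r + \gamma\, Q'\!\big(s', \mu'(s')\big), \]

      by minimizing \(\big(Q(s,a) - y\big)^2\), and the actor is updated by gradient ascent on \(Q\big(s, \mu(s)\big)\), so the policy moves toward actions the critic rates more highly.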

      Then we have to train. We need to run simulations and accumulate experience, calculate rewards, and see how our policy did. If it didn't do well, we make corrections to it to make it do better.

      Lastly, if training is successful and we obtain a result, a validated trained policy that works in our test cases, we can deploy it into production or into a specific application. So, in this case, it's a trained neural network that's then released for use.

      Now let's talk about how we adapted the reinforcement learning framework and put all those elements together in the model. Here's the model. It won't look quite as familiar as the one you saw before. So the PI controllers that were in the model before have been removed.

      You can see an observation block, which simply collates the errors in flow and temperature, as well as the flow and temperature themselves, and sends those to the agent. There's the reward block, whose functionality we talked about on the last slide. And there's the agent block, which is where the magic happens: that's where the actions are decided by the complex neural network that's being trained as we accumulate knowledge.

      The mixer model is the same as before. So the reinforcement learning model and learning environment are now complete. But how do we do the training? We wrote a MATLAB script to do that. It builds the neural networks, it randomly creates scenarios, in other words, combinations of initial conditions and setpoints, and it supervises the training.
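
      A minimal sketch of what such a script can look like with MATLAB's Reinforcement Learning Toolbox is shown below. The model name, agent block path, signal dimensions, training numbers, and the randomizeScenario helper are illustrative assumptions, not the actual script from the video.

```matlab
mdl = 'mixer_rl';                                   % hypothetical Simulink model name
obsInfo = rlNumericSpec([4 1]);                     % [flow error; temp error; flow; temp]
actInfo = rlNumericSpec([2 1], ...                  % the two inlet flowrates
    'LowerLimit', 0, 'UpperLimit', 1);

% Environment wraps the Simulink model around its RL Agent block
env = rlSimulinkEnv(mdl, [mdl '/RL Agent'], obsInfo, actInfo);
env.ResetFcn = @(in) randomizeScenario(in);         % hypothetical helper: random initial conditions and setpoints

% DDPG agent with default actor and critic networks built by the toolbox
agent = rlDDPGAgent(obsInfo, actInfo);
agent.AgentOptions.SampleTime = 0.1;
agent.AgentOptions.DiscountFactor = 0.99;

% Supervise training: run episodes until the average reward is high enough
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', 200, ...
    'ScoreAveragingWindowLength', 50, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 150);
trainingStats = train(agent, env, trainOpts);
```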

      Let's take a look at what it looks like when the reinforcement learning algorithm and policy are being trained. You can see simulations occurring in the right pane of the window. You can see at the top that they're running from the start time to the end time and accumulating knowledge. These plots show the first and second inlet flows, and then the outlet flowrate and outlet temperature, as driven by the reinforcement learning controller.

      Each run is an episode, which is plotted here on the x-axis. What you see in blue are the rewards for each episode that's run. What you see in the orange is a moving average of the reward, and you can see that it's increasing. So the learning is happening as desired. And the performance is increasing accordingly.

      Now that we have a trained reinforcement learning controller, let's look at its results on the exact same test that we ran with the PI controller. Remember, all we did was ask for the flowrate to be increased while holding temperature constant. At the cold condition, I'll first note that the time scale of this plot is quite a bit smaller than it was for the PI control, about three time units. You can see that the flow quickly achieves the setpoint and the temperature achieves its setpoint just shortly thereafter.

      Now, look at the case where it was hot. You can see that the reinforcement learning algorithm again creates a response and control action that achieves the setpoints in about the same time frame. So what we have here is that, in contrast to conventional control design, the reinforcement learning approach is loop agnostic. This is a big consideration.

      So this brings us to the final slide of our talk. Traditional MIMO control design requires loop selection. You remember the guidance we took from the RGA at the beginning was indifferent. It couldn't decide. It was indeterminate. And that's only for a 2-input, 2-output system.

      Many plants will have many more inputs and outputs, as will any other interesting systems being controlled, so this problem becomes very, very complex. Tuning the controllers requires significant domain expertise. There are a lot of interactions and a lot of instabilities that have to be addressed.

      There are different values that have to be scheduled, and this requires significant experience. And as we saw in this very simple case, one design may not cover the entire operating space. Had we made those temperatures in the simulation absolute, it wouldn't even have been able to control successfully.

      So reinforcement learning offers an alternative. It's inherently MIMO. It's a box that takes multiple inputs and produces multiple outputs, each one a control signal. So we call it loop agnostic. There's no loop design required.

      It does require some AI know-how, building neural networks. There are many tools that allow you to do that simply, but doing it well may be considered analogous to having controls domain expertise in the first place. Still, we're very happy that, in this very simple case study, it was used successfully to produce meaningful and good results.

      If you're interested in learning more about this topic or the applications of artificial intelligence in the process industries, I'd welcome you to contact me directly or see one of the resources that I've listed here. So with that, I'll conclude the presentation. I want to thank you very much for your attention, and I'd be happy to take any questions.
