    Anomaly Detection for Industrial Processes and Machinery with MATLAB

    Overview

    Many industries are looking to AI to deliver increased efficiency and improve product quality by automating production process monitoring and maintenance scheduling. Even when production lines are instrumented with sensors as part of digital transformation, engineering teams often lack the specialized skills required for predictive maintenance and advanced process analytics. This webinar will demonstrate statistical and machine learning techniques in MATLAB on real-world datasets to monitor manufacturing processes and detect equipment anomalies.

    Highlights

    • Preprocessing sensor data
    • Identifying condition indicators
    • Using machine learning and deep learning to develop anomaly detection algorithms
    • Operationalizing algorithms on embedded systems and IT/OT systems

    About the Presenter

    Timothy Kyung is an Application Engineer at MathWorks supporting the Government and Defense Industry with technical expertise in application deployment, interfacing with third party software, and parallelization. He holds a B.S. and M.S. in Mechanical Engineering with a focus in robotics from Carnegie Mellon University. 

    Recorded: 24 Aug 2022

    Hello, my name is Timothy, and I'm an Application Engineer at MathWorks. A little about myself before we begin: I've been with MathWorks for about three years. Before that, I was working on my master's in mechanical engineering at Carnegie Mellon, focusing on research in legged locomotion and active prostheses.

    Now, through my research and time at Carnegie Mellon, I had the chance to explore some different artificial intelligence applications. And so today, I'm here to talk to you about how we can use MATLAB to develop AI models to detect anomalies in our signals. Now, the signals that we're going to explore today come from two different examples. The first example focuses on an industrial process of electrolytic copper production, while our second example focuses on data taken from sensors on an industrial machine.

    Now, before we go further in our presentation, let's take a step back and look at an example of the end goal that we're going to work towards. So here I have an application that's been packaged with some of the approaches that we're going to explore today. The data come from our copper production process that we'll explore later in example one.

    We can see the data that we're working with, representing different levels of impurities measured through a copper production process such as nickel, silver, lead, and so on. Here we can see a comparison of different approaches against the ground truth value shown in the plot on the bottom. In this application, we explore the results of using a control chart, a one class SVM, and an isolation forest.

    Now, if those terms are a little new to you, don't worry. We're going to explore them in the approaches we'll see later in this presentation. Now, by using these different solutions, we can compare the results and come up with the best approach for detecting anomalous signals for any particular situation.

    OK, now that we have an idea of what we're going to cover, let's get back into the presentation. First, let's talk about why someone might want to do anomaly detection and some of the benefits that it comes with. One benefit we can explore through anomaly detection is process monitoring, where we can manage our process and ensure optimal resource use.

    Another is improving the quality of our products as anomaly detection allows us to diagnose defects that may occur during our production line. Finally, knowing when an anomaly might occur can be useful in helping us schedule maintenance so as to reduce downtime, since in this case, we're going to know when our equipment needs to be down. These are just some of the benefits that anomaly detection can provide for our work.

    So now that we've discussed the why for anomaly detection, let's get into the what. And we'll start off by asking, what exactly is an anomaly? Well, an anomaly is any behavior that deviates from normal. Some types of anomalies can be easy to identify, such as point anomalies that stick out from a particular sensor reading. Others can be a bit more difficult, such as collective anomalies across many signals.

    So maybe one particular anomaly wouldn't cause us any concern, but groups or patterns of those anomalies might be what we define as a problem. So you can see here how the dependence on multiple signals might cause an issue when we try to identify these anomalies down the line. Lastly, we can have visual anomalies in images. Now, while we can detect them visually, training a deep learning network to do the same can be much more complicated.

    And so for this demo, we're going to cover point anomalies and collective anomalies, but not visual anomalies. Again, these aren't all the anomaly types, just a couple that you might face in either your process or machinery. So now that we've defined what an anomaly is, let's ask the question: is something different happening? Answering this question can be pretty difficult sometimes. And we'll look at a couple of examples here of how that can be the case.

    Our first example shows us some time series data. Here we have two different signals that, honestly, look quite similar. And if I posed a question to you, which signal is the anomalous signal and which signal is a normal signal? How would you answer? It's pretty confusing.

    So I'll give you the answer and tell you that the blue signal is the normal one and the black signal is anomalous. So do you think now you could tell me what the differences were? Right. This question itself is pretty difficult to answer. And so that leads to a further question: how are you going to train an algorithm to tell the difference?

    Well, let's transform our signal in this case from the time domain to the frequency domain. Doing this shows us some differences in where our peaks occur. And so in this case, we might be able to feed the power spectrum of our signal in to help train our anomaly detection algorithm.
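
    As a quick illustration, here is a minimal sketch of that transformation, assuming a signal vector x sampled at a rate fs; both are stand-ins here, not the webinar's data.

        fs = 1000;                                  % assumed sample rate in Hz
        t = (0:1/fs:1-1/fs)';
        x = sin(2*pi*60*t) + 0.1*randn(size(t));    % stand-in signal
        [p, f] = pspectrum(x, fs);                  % estimate the power spectrum
        plot(f, pow2db(p))
        xlabel('Frequency (Hz)'), ylabel('Power (dB)')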

    But now, let's consider a case where we have a group of signals from a three-channel accelerometer. Again, the question is, which is the anomalous data? The blue on the left, or the red on the right? We can begin to see here how our data doesn't always tell us where an anomaly is, and how we might need to massage the data so that we can get meaningful information to train our algorithms.

    Here, we might need to extract multiple features from the data and explore their statistical distributions to really get an idea of the differences between the signals. And this is why anomaly detection problems can really benefit from machine learning approaches. Going through just these two examples, a question begins to form in our heads. How do we know what an anomaly looks like? And that's the challenge that we're going to face today in this webinar.

    Oftentimes, the data we collect really is mostly normal, with very few anomalies. If most of your data is anomalous, you might need a different webinar. But in this case, we have to work with what we have, and that's healthy data. And so we can begin by creating models based on healthy or normal data.

    And when you compare your data to these models based on the normal data, anything that sticks out we could probably label as an anomaly, since it doesn't fit the healthy data that we trained on. So what is normal? Well, normal is really defined by you. An anomaly is going to be any data that represents behavior that we weren't expecting.

    So anomalies aren't always a problem, just behavior that we're not expecting. So if we look at some of the approaches we can take towards anomaly detection, we can see that we have a lot. We can start off with some basic statistics approaches, such as a control chart, or go into different machine learning approaches based on whether we have labeled or unlabeled data of what's healthy and what isn't.

    Earlier in that example of our application, we went over control charts, a one-class SVM, and an isolation forest. During our examples today, we're going to cover a simple statistics approach with control charts and some unsupervised learning approaches using auto-encoders and distance-based methods. So let's go into a typical workflow of anomaly detection.

    Here, we'll first start off with our data and pre-process it so it's in a usable format. This might include dealing with missing data, outliers, noise, really anything that might skew our results. From there, we can begin to extract some features from the data to feed into our machine learning model and train it.
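
    Before moving on, here is a minimal preprocessing sketch of the kind described above, assuming the raw readings live in a table T; the variable name and the specific methods are placeholders, not the webinar's actual settings.

        T = fillmissing(T, 'linear');                 % interpolate missing samples
        T = filloutliers(T, 'clip', 'movmedian', 5);  % clip point outliers toward a moving median
        T = smoothdata(T, 'gaussian', 7);             % smooth out high-frequency noise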

    Remember our two examples earlier. Using just the raw data, we had a tough time determining when an anomaly was occurring. And this is why feature extraction is so critical in developing our detection algorithms. Lastly, we'll deploy our algorithms to any system that we work with so that we can repeat this process, but with the real time anomaly detection algorithm working in the loop. This is going to allow us to realize those benefits that we explored earlier of process monitoring, better quality of products, and more predictable maintenance schedules.

    So in this first example, we'll cover how we can use anomaly detection to help in monitoring our process and improving the quality of our product. So let's try this out in the example here. Here we have data collected from electrolytic copper production. And we're looking at different impurities generated from our process, such as silver, nickel, lead, and so on.

    We took these measurements from random samples of our production line, twice a day, every day for a year. We also have a ground truth, a total analysis index, that tells us how impure the copper was overall. Now, we're going to try to detect some common anomalies found in a real-world process. The anomalies that we're going to detect here can be a single impurity, which is fairly easy, or groups of impurities, which are a bit more challenging to detect.

    Now, I understand not everyone watching may be involved in copper production, but don't worry. The techniques that we'll go over today can apply to measurements taken really in any industrial process-- acidity, temperature, pressure, you name it. OK. So here's the example that we're going to go over today to detect the anomalies in our copper production process.

    Let's first look at our data. Remember, the first step of our workflow was to acquire data. And so here, we'll read in our measured sample data that we took. Now, normally you might get your data from a process historian or a database, or you might even stream in the data directly. But in this case, we'll use a saved data set.

    Our data set, again, has two samples taken randomly, post production, every day for about a year. So in this case, rows 1 and 2 represent the first day of our sample collection, rows 3 and 4 the second day, and so on. To get a sense of what our data looks like, we have a lot of options that we can use. But in this case, we'll use the stackedplot function to plot all of our variables on top of each other.
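
    In code, that step might look like the following sketch; the file name here is a placeholder, not the webinar's actual data set.

        T = readtable('copperImpurities.csv');   % hypothetical file name
        stackedplot(T)                           % one stacked axis per impurity variable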

    Now, let's open this plot up so we can better interact with our data. Doing this allows us to better see if there are any peaks occurring, indicating to us that the measured level of a certain metal is high. Now, looking at our plot, it's pretty difficult to get a telling idea of how anomalous our data is just from the raw data.

    We can see how easy it is to spot individual peaks. But trying to look at peaks across different groups becomes a bit of a challenge. One metal might have a peak where another metal doesn't, and trying to keep track of this across all the different groups that we have only adds to the difficulty. So to overcome this, let's apply some basic statistics, such as finding the mean and standard deviation, to better paint the picture of what our data is telling us.

    So I've gone ahead and done this here using a box plot. Using this box plot, we can begin to define boundaries on a percentage basis of our data. Now, if we needed to define boundaries for groups of sensors, we might want to normalize our data. Normalizing our data helps us establish a common scale across all the groups without distorting the difference in range of values. This helps to reduce bias introduced by any particular data set.
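
    A minimal sketch of that normalization, assuming the impurity table T from before, might look like this:

        Tn = normalize(T);    % z-score each variable onto a common scale
        boxplot(Tn{:,:}, 'Labels', Tn.Properties.VariableNames)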

    And we can see how normalizing our data set gives us a different visualization, and a different understanding of how the variables compare to each other, than the previous box plot. So now, let's apply a simple statistics approach to try to detect anomalies within our data. Here we have a control chart, which allows us to put an upper and lower control limit onto the data that we've selected.

    The limits that we see here in red are set at three standard errors away from our center line here in green. And the violations past the threshold are circled in red. Now, this control chart shows us a measure of just our silver impurities. But let's see what happens when we apply this approach to all of our variables.
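
    For reference, a single-variable control chart like this one can be produced with the controlchart function from Statistics and Machine Learning Toolbox; the variable name below is a stand-in for the silver measurements.

        controlchart(silver, 'charttype', 'i')   % individuals chart with 3-sigma control limits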

    Now, we could try to spot anomalies across all of our data or even groups of data using this method, but you can see from just this plot alone how this might start to get messy really quickly. And we also need to ask now, does a single control chart violation represent an anomaly in the entire process? If not, how many violations do we need? Two, three, maybe all of them?

    There might even be places where no single impurity is above a threshold, but collectively there is a problem. And so this is when we might want to switch to our machine learning approach. For anomaly detection, we can use a class of machine learning methods that only require training on normal data. And this is pretty practical because we often go to great lengths to keep our processes running normally. So you may not actually collect very many anomalies.

    Who knew that having such a good process might make things difficult for us? But don't worry, we'll go over a couple of methods that can get us through this challenge. The first method that we'll explore is a one-class SVM. An SVM, or Support Vector Machine, maximizes the margin between the normal data and anything else.

    We can do this by using the fitcsvm function and applying it to our data set with an anomaly frequency rate of 2%. Now, this number kind of came out of the blue, so I'll explain how we got it. First, before we can do anomaly detection, we need to understand a bit about how our process operates. And while we may not know exactly how often anomalies will occur in the future, we might have access to a sample of data that showed an approximate anomaly rate.

    Now, this is something that we can iterate on in the future as we gain more knowledge about the process. But for now, we'll stick to 2%. Down here, I've plotted four of our metals and drawn lines where my model detected anomalies occurring. We can notice some points have some pretty obvious spikes across all of the metals, which is no real surprise as to why they were selected.

    However, we can also see some points where only some of our recorded metals had spikes, showing us that these machine learning models can pick up on trends that our human eyes can't perceive. Now, this was the one-class SVM. But we do have similar approaches that we can apply.
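
    Before we look at the next one, here is a minimal sketch of how the one-class SVM fit described above might look, assuming a numeric feature matrix X of mostly normal measurements (a hypothetical name); the 2% anomaly rate goes in through the OutlierFraction option.

        rate = 0.02;                                  % assumed anomaly frequency
        mdl = fitcsvm(X, ones(size(X,1),1), ...       % a single class: train on normal data only
            'KernelScale', 'auto', 'Standardize', true, ...
            'OutlierFraction', rate);
        [~, score] = predict(mdl, X);
        isAnomaly = score < 0;                        % negative scores fall outside the learned boundary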

    The second approach that we're going to explore is the isolation forest. An isolation forest separates each training sample into its own leaf, and the anomaly score of a sample is based on the number of decision splits needed to isolate it.
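
    Statistics and Machine Learning Toolbox (R2021b and later) provides this directly through the iforest function; a minimal sketch on the same assumed feature matrix X:

        [forest, tf, scores] = iforest(X, 'ContaminationFraction', 0.02);
        find(tf)    % row indices flagged as anomalies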

    Now that we've gone over some of the approaches to anomaly detection, we can begin to compare them to each other to see how well they perform. What I've done here is that comparison of our different approaches on our selenium measurements for a specified range of time. And so here we can see how the one-class SVM picked up an anomaly. But we can also see our isolation forest picked up the same anomaly, plus some others. And we can even see further down that our control chart picked up a few more than our isolation forest and our one-class SVM.

    Because we have a measure of how impure the overall product was with this total analysis index, we can actually use this to compare the performance of our different approaches and see which approach performed most accurately. Now, if we define an anomaly as any point reaching a 5 or greater in the total analysis index, we can see how the control chart was able to detect the most points accurately. However, if we were to raise our threshold value just a little more, the control chart may be a bit too sensitive.

    And so in this case, we might want to go with our isolation forest approach, which gives us a balance of sensitivity and accuracy. Using our best approach, we can begin to apply anomaly detection to see just when our process might have gone awry. Now, let's go back and recap what we went over.

    In this demo, we went through data of impurities collected from samples in electrolytic copper production. Using this data, we used a box plot and control chart to apply a simple statistics approach to anomaly detection. We then switched to a machine learning approach, through a one-class SVM and isolation forest, to explore other methods to detect the anomalies. And in this case, because we had a measure of our total impurities through our total analysis index, we could come to some conclusion about which model best suited our needs.

    But what happens if the issue is not in a process, but rather in our machinery? Let's explore this question in our next example. In example two, we're going to look at a welding machine. Specifically, we've gathered accelerometer data in the x, y, and z-axes from points before and after maintenance.

    Our goal now is to come up with a method or model that can detect the anomalies before a machine might need maintenance. In this case, we know when maintenance was performed, so we can split our data into the before and after categories. In this instance, we'll trust that any data collected after maintenance is healthy. But we can't always assume that the data collected before maintenance was unhealthy.

    Therefore, we're going to develop our model using only our healthy data again. So let's get back into MATLAB and get started. Again, remember from our workflow, the first step is to acquire our data. Here we're going to load in our data set, and we can see exactly what our data looks like.

    We have three channels that represent our different axes, and a label of whether the data was collected before or after maintenance. Overall, we have 40 different time series signals that we've collected, 20 coming from before maintenance and 20 from after. Now that we know how our data is structured, let's visualize it.

    In this graph, we can see two different curves for each channel, before maintenance in orange and after maintenance in blue. Already, we can note some visual differences between the two states of before and after, but it's pretty difficult to get a sense of how to define these differences. Like we did in the previous example, we're going to train an algorithm to answer this question for us.

    We will only rely on the normal data, in this case, the data right after maintenance. As we mentioned earlier, in a real-world application, most of your data will probably be normal. In the previous example, we mainly used the raw data. But here, we're going to extract some features from the data.

    There are a few reasons why you might want to do this. Perhaps you want to reduce the size of your data. But more likely, it's to extract the information that's better at separating normal from anomalous data. So feature extraction can be a pretty iterative process that requires some understanding of your machine. And here you can see on the screen just how involved the process can be.

    But in MATLAB, we have this nice handy app called the Diagnostic Feature Designer. And what's nice about this app is that it allows us to interactively extract a number of commonly used features in predictive maintenance, which can save us a lot of time. Now, let's go into the apps gallery at the top here and open up the app.

    Like most MATLAB apps, we're going to go from left to right, starting with a new session. Now, I'm going to choose our training data here and I'll begin by selecting or deselecting the different signals that I want to work with. Once your data has been imported, let's choose channel 1 and begin by graphing our signal using the signal trace.

    We can further separate out this plot by choosing to group by labels. And if I want to further explore my data, I can do so using the panner tool and focus in on different areas. I can even pan through different regions that might be of interest to me. But let's select the whole signal again and begin our feature extraction.

    Now we can see here with a simple click of a button, MATLAB has generated a wide range of features that it can extract from our data. And we still have other categories that we can choose from. Now, I'll go ahead and keep all of these selected, but I'll show you in just a moment how we can narrow down this list to the most relevant features that can help differentiate between normal and abnormal data.

    Now, let's hit Apply. And then we'll see that this feature table of all the features that we generated populates. Now, we also have some histograms that show us the distribution of these features. But I'm more interested now in the ranking of my features. And we can do this by going back to the Feature Designer tab, where we can see the Rank Features option.

    And you might have guessed it, this is what we're going to actually use to find those relevant features. Here we can see our features sorted by importance using a T-test. Now with what we've done so far, I'm pretty happy with the results. I was able to generate quite a large number of features that I can use. And I was also able to narrow down the list to something that's a little bit more applicable and practical to use.

    So now let's choose to export these features and use them to develop our model. Now, I can export the features directly to the MATLAB workspace or I can generate a function for these features. This option here is going to help us with the last portion of our anomaly detection workflow, the deploy and integrate portion. So let's choose it.

    Now, MATLAB is going to ask us a couple more questions, such as which feature table to export, which ranking algorithm to use, and how many features we want MATLAB to extract. Let's choose four and hit OK. Now, here we can see the generated function that MATLAB has created for us. And all this code that we just created with a click of a button is going to reproduce all the work that we just did in the app, allowing us to apply the same feature extraction in the future to any new data that we might feed into it.

    This is going to open the door for us to let us play with smaller sets of data and expand out to larger sets of data. Or, going back to our workflow, it will help us with the deployment by letting us generate C or C++ code from this MATLAB code, allowing us to deploy the feature extraction onto an embedded device. Now, let's go back to the example and see which features we chose.

    We can see that the app chose four distinct sets of features for each channel that we have. And now, if we go into our next script, we can see that the actual data set that we're going to be using is much larger than what we worked with in the app. Here you can see that the data set is actually over 17,000 signals. And so again, that's the benefit I mentioned earlier: using this generated function, we're able to take a small subset of our data, extract relevant features from it, generate the code required to extract those features, and apply that same function to our larger set of data.
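
    Reusing the exported function is then a one-liner; both names below are placeholders for whatever the app and your script actually call them.

        featureTable = diagnosticFeatures(largeDataSet);   % hypothetical function and variable names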

    This allows us to save a lot of time in our data analytics work. Once we have extracted our relevant features and loaded them into our workspace, we need to split our data into training and testing sets. In this case, because we have a large amount of data, we can approach our anomaly detection algorithm using a deep learning method. We'll use an auto-encoder deep learning network to create our anomaly detection algorithm.

    Now before going further, I want to spend some time talking about how an auto-encoder works. An auto-encoder network outputs a reconstruction of a given input. And this is done by using two smaller networks-- an encoder that learns the set of features from the input data and a decoder that's used to reconstruct the data based on those features.

    Here we see an example of this. Now, going back to our example, we can see the steps in setting up and training our auto-encoder network. Ordinarily, we would use the trainNetwork function to train our deep learning network. But because deep learning networks take a pretty long time to train, I'm just going to load in the saved network that I have pre-trained.
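
    For reference, a minimal auto-encoder of this kind might be set up as follows, assuming a feature matrix XTrain (observations by features) of healthy data only; the layer sizes here are illustrative, not the webinar's.

        numFeatures = size(XTrain, 2);
        layers = [
            featureInputLayer(numFeatures)
            fullyConnectedLayer(8)              % encoder: compress to a small feature set
            reluLayer
            fullyConnectedLayer(numFeatures)    % decoder: reconstruct the input
            regressionLayer];
        opts = trainingOptions('adam', 'MaxEpochs', 50, 'Verbose', false);
        net = trainNetwork(XTrain, XTrain, layers, opts);   % the target is the input itself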

    Now, let's take a look at the performance of our model. On the top, we can see that the error is high, which means that the model doesn't reconstruct those signals well. On the bottom, however, the error is low. So it's clear that the model performs better on the data right after maintenance.

    And this makes sense. Since I trained the model on the healthy data, whatever it reconstructs is going to look like healthy data. But what level of error means there's an anomaly? Let's define the threshold for what we will qualify as an anomaly. To do this, I'll define an anomaly as a point that has a reconstruction error greater than some value times the mean.

    To make this process a bit more interactive, I've gone ahead and put in a slider. Doing this allows me to play with different threshold values until I find the one that gives me the highest validation accuracy. And what we'll see is that a threshold value of 0.5 gives me a validation accuracy of over 99%, which is acceptable for me.
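
    A minimal sketch of that thresholding, assuming the trained net from above plus validation features XVal and true labels isAnomalyTrue (hypothetical names):

        errTrain = mean((predict(net, XTrain) - XTrain).^2, 2);   % per-sample reconstruction error
        thresh   = 0.5 * mean(errTrain);                          % 0.5 was the value found with the slider
        errVal   = mean((predict(net, XVal) - XVal).^2, 2);
        predicted = errVal > thresh;                              % flag high-error points as anomalies
        accuracy  = mean(predicted == isAnomalyTrue)              % validation accuracy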

    Now, we've done a lot of work with this auto-encoder, but again, we won't always have this much data to work with. In this example, we simulate that situation by taking a smaller subset of our whole data set again. Now, at this point, with all the different approaches that we've taken, a question starts to form: which approach should I take given the situation that I'm in? And this is a very good question to ask. And while it can be confusing to answer, this diagram here gives us some insight into some possible solutions.

    Here we can see what model best complements the situation we might be in. For a large amount of data that we can only classify as healthy, approaches such as thresholding, distance-based methods, auto-encoders, or dynamic system models work well. When we have labels of what's healthy and what's not, a supervised learning approach will be best suited. When we have a smaller amount of data, going back to our previously explored unsupervised machine learning approaches, such as the one-class SVM or isolation forest, may work best.

    Now, the rest of this demo applies the one-class SVM and isolation forest to our smaller data set. Now, let's go over the results to see how well they performed. Here we can see the results of the one-class SVM in a confusion matrix, comparing the predicted class against the true class. We can see with the one-class SVM that all true anomalies are predicted, with only a small number of normal points being mistaken as anomalies. Here we can see the results of the isolation forest in its confusion matrix.

    Now, if we wanted to, we could even go further and tweak some hyperparameters to try to improve the performance of either model. But even with what we have, we can see that the performance of our machine learning models is pretty high. You can still get good results on a smaller data set using these machine learning approaches to anomaly detection, versus using a deep learning approach with an auto-encoder like we saw earlier. However, remember your mileage may vary with the approach that you take. Using a combination of different methods and comparing your results will allow you to craft the best model for any use case you might have.

    Now, let's go back and recap what we went over. In this example, we went through accelerometer data collected from a welding machine at different points in time, either before or after maintenance. Using this data, we explored a deep learning approach using an auto-encoder. We also simulated a situation where we had a much smaller subset of data and talked about multiple machine learning approaches that we could take. We explored these methods to help develop a model to identify anomalies that occur before maintenance. This way, we can predict when a machine might need maintenance and plan for it in the future.

    Now, we touched upon the last step of our workflow, deploy and integrate, briefly in our last example. Using MATLAB's built-in capabilities along with MATLAB Coder, you can automatically translate high-level MATLAB code into low-level code. We can generate C or C++ code from automatically generated MATLAB functions to deploy onto embedded systems.
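
    As a sketch, generating C code from the app-generated feature function might look like the following; the function name and input size are placeholders.

        % Generate C library code for the (hypothetical) feature extraction function,
        % assuming it takes a 1000-by-3 numeric array of accelerometer samples.
        codegen diagnosticFeatures -args {zeros(1000,3)} -config:lib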

    We also saw in the beginning how we can package our solutions into an application, which we can also deploy onto some cloud platform. These solutions will allow us to integrate our anomaly detection algorithms into environments where you're pulling in data live, letting you detect anomalies as they're happening. Now, I know we covered quite a lot. But the basic summary of this webinar is that MATLAB enables you to quickly create, test, and implement anomaly detection programs.

    We went over how we can use MATLAB to detect anomalies in our process with our copper production example. And we used our welding machine example to show how anomaly detection can be used in machinery to help predict when maintenance might be needed. Now, if you wanted more help on your specific anomaly detection applications, MathWorks has a wide range of options from hands-off options, such as training, to involved solutions, such as consulting. We offer many different services to get you ramped up and ready to detect anomalies in your work. Now, that will conclude this webinar on anomaly detection for industrial processes and machinery with MATLAB. Thank you so much and have a wonderful rest of your day.