Asset Health Monitoring and Predictive Maintenance of Electric Utility Equipment using Artificial Intelligence - MATLAB

    Asset Health Monitoring and Predictive Maintenance of Electric Utility Equipment using Artificial Intelligence

    Overview

    The use of AI techniques on time-series data is growing in popularity across the electric utilities sector for asset management, demand response, outage management, customer services, energy storage, renewable resources, and many other areas in the power generation and delivery system.

    In this webinar, we will present a case study on “Identifying Risk in Underground Utility Cable Systems using Machine Learning and Deep Learning”. Predictive maintenance begins with understanding how cable system failures occur. Analyzing and interpreting results from partial discharge (PD) measurements taken in the field can be a complex task for humans. Machine learning and deep learning algorithms are used to automatically identify and categorize markers of defects contained in the PD measurements. These algorithms are used to categorize different defect types by their risk of going to failure soon. Differentiating cables with “high to low risk defects” from those that are “defect free” enables predictive maintenance. Examples of identified defects will be presented.

    You will also learn how to apply AI using MATLAB® for asset condition monitoring, and find out about tools and fundamental approaches for developing advanced predictive models on time series data. Using a real-world fault dataset, we will show two approaches to building deep learning models using convolutional neural networks and recurrent neural networks, and finally deploy the models on edge devices or the cloud.

    Specific Topics include:

    • Acquiring, creating and annotating faulty datasets
    • Applying time-frequency transformations, extracting established signal features, and automating deep feature extraction using Invariant Scattering Convolutional Networks
    • Building and comparing Deep Learning models with CNNs and LSTMs
    • Generating stand-alone CUDA code to deploy models on edge devices
    • Deploying and integration on the cloud

    About the Presenter

    1.      Shishir Shekhar

    With over a decade of engineering and management experience in the Power & Energy industry, Shishir Shekhar is responsible for the Worldwide Utilities & Energy business segment at MathWorks Inc. Shishir leads the global Market & Strategy functions supporting the breadth of the MathWorks product line, spanning Artificial Intelligence, Advanced Simulations, Cloud Computing, IoT, and Big Data. Shishir advises leading Electric Utilities, Renewable Energy, and Power Systems Automation organizations globally on adopting and implementing Digital Technologies and Solutions for their Grid Modernization and Digitization Initiatives.

    Before joining MathWorks Inc, Shishir was Engineering Lead, New Initiatives (Grid Modernization and Digitization) at National Grid USA, where he led innovation programs and pilot demonstration projects on developing Digital Solutions for Asset Management, Grid Analytics, etc. Shishir also led large, multimillion-dollar infrastructure projects such as the integration of wind, solar, and energy storage technologies on electric T&D systems, and the planning and commissioning of high-voltage overhead and underground transmission networks.

    Shishir is a Senior Member of IEEE and member of CIGRE.

    Shishir holds a Master’s Degree in Business Management from Harvard University, USA and a Master’s Degree in Electrical Power Engineering from Northeastern University, USA. Shishir was a Research Associate at Massachusetts Institute of Technology (MIT) USA, where his research focus was in the area of Smart Grids, studying Economics of Energy Storage and Wind Technologies for Improving Grid Reliability and identifying the use of storage for new revenue streams in energy and power markets across USA. Shishir received his Bachelor’s Degree in Electronics and Communication Engineering from SRM University, India and was awarded Notable Alumni of the university in 2019.

    2.      Akhilesh Mishra

    Akhilesh Mishra is an Application Engineer for the Electric Utility and Healthcare industries at MathWorks. He specializes in signal/data processing, artificial intelligence, and GPU computing workflows. He has been with MathWorks since 2016. Akhilesh holds an M.S. degree from the University of Kansas, where he was the signal processing lead in a group working on radar and sonar systems for sounding the ice sheets of Greenland and Antarctica to study global sea-level rise.

    Recorded: 29 Oct 2020

    Hello, everyone. My name is Shishir Shekhar, and I manage the worldwide utilities and energy business segment at MathWorks. I'm joined by Akhilesh Mishra, who is a senior applications engineer and works closely with utilities and energy industry customers in North America. Today, we will present on how you can develop asset health monitoring and predictive maintenance solutions to monitor your high-voltage utility equipment using artificial intelligence techniques.

    This is the agenda that we have for you today. This webinar series has two parts. Today, we'll be focusing on part 1, which is about health monitoring using AI techniques. In part 2, we will talk about how you can develop predictive maintenance algorithms using artificial intelligence and the Digital Twin approach.

    Each year, millions of people and thousands of businesses are impacted by underground cable system and transformer failures. Power transformers and underground cables are among the most expensive and most critical equipment in the electricity network. This equipment is subjected to high electrical and mechanical stress, which can lead to failures. Electric utilities and manufacturers are looking for innovative techniques that enable low-cost condition monitoring and predictive maintenance.

    We'll talk about, first, how transformers generally fail. And here, I'm referring to power transformers. Insulation degradation in bushings, windings, and oil is influenced by various electrical, thermal, mechanical, and environmental stress factors over many years in service. While there are other techniques, such as dissolved gas analysis and frequency response analysis, partial discharge activity is a reliable indicator of insulation condition in transformer bushings and windings, as it is often a sign of an insulation defect that can potentially cause failures.

    So how do we detect partial discharge? Deterioration of insulation leads to the formation of a void. This void could be a gas bubble in the transformer oil or a crack or cavity in the paper insulation.

    The void is characterized by a dielectric constant that is much lower than that of the surrounding insulating medium. Under a high electric field that is sufficient to overcome the breakdown field of the void, the void simply collapses and generates a discharge. A discharge is associated with the release of a small amount of heat energy. It also triggers some chemical reactions. It emits an electromagnetic pulse and generates a small pressure wave in the insulating region.

    Now, all of these provide opportunities for us to detect the presence of partial discharge. What is most interesting about the acoustic method of detecting partial discharge? It is very advantageous because the acoustic signals propagate nicely in the oil medium until they hit the transformer tank walls. This is very similar to the ripples generated in a water tank when we throw a stone into it. The partial discharge signals and other measured parameters can be captured, and AI techniques can be applied to detect faults and predict failures of transformers.

    Next we will look at how cables fail. The cables can fail from any combination of electrical, mechanical, and thermal factors. Understanding how cables fail enables a predictive maintenance approach.

    Typical cable failure mechanisms are well understood. 99% of solid dielectric cable failures are associated with partial discharge. Differentiating cables with high to low risk defects, along with those that are defect-free, enables predictive maintenance of cable systems. Now, current testing methods can lead to challenges and frustration for engineers in electric utilities and renewable energy companies. And we'll talk about some of these challenges in the next slide.

    Most of the testing for cables and transformers is done offline. The cables and transformers have to be taken out of service, and partial discharge signals are analyzed manually. Analyzing and interpreting results of partial discharge measurements can be a complex task for humans. Different engineers may interpret partial discharge signals differently. When you're dealing with hundreds of miles of cables and thousands of transformers, analyzing these partial discharge signals can be very time consuming, expensive, and very difficult to scale.

    Now, there are different types of maintenance approaches that companies take. The three types of maintenance approaches are reactive, preventive, and predictive. In the reactive approach, companies do maintenance only once there is a problem.

    In the preventive approach, maintenance is done at regular intervals. However, it leads to unnecessary maintenance and wasteful use of crews. And it may not eliminate all the failures.

    In the predictive approach, you forecast when problems will arise. It can be challenging to make accurate forecasts for complex equipment, but we'll show you how you can use artificial intelligence techniques, and also complement them with simulation techniques, to develop predictive maintenance algorithms. As I mentioned, the reactive approach causes high and unexpected costs, loss of service, deployment of emergency crews, worsening reliability indices, and potential power quality issues.

    Now I'll present a case study on artificial intelligence techniques that were used to develop asset condition monitoring solutions for underground cable systems. This project was jointly done between MathWorks and IMCORP. In this case study, I will demonstrate how machine learning and deep learning approaches were used to categorize risk in cables.

    The main difference between machine learning and deep learning is that machine learning typically involves manual feature extraction, and deep learning does not. Deep learning techniques keep improving as you feed them more data, whereas machine learning performance tends to saturate.

    Many people in the utility industry ask, what kind of data should we have before we can even use machine learning or deep learning techniques? Well, it doesn't really matter. You can use different types of data sets.

    Although, depending on the type of data, you might use a different approach or a different technique. For example, if you're working with numeric data, you generally use a machine learning or LSTM approach. If you have time series data-- that is, data from your sensors-- or if you have text data-- that is, data from the maintenance logs kept by your linemen-- you can actually use either convolutional neural networks or LSTMs-- that is, long short-term memory networks. If you're looking at image data-- you have images of assets or you have images of vegetation data-- you generally want to use convolutional neural network techniques.

    Here we see time series data from partial discharge measurements. This time series data consists of partial discharge signals, noise signals-- which are non-partial discharge signals-- and location-specific effects on the partial discharge. By using machine learning and deep learning techniques, interpretation can be taken out of human hands for a reliable prediction.

    As I already mentioned, different types of learning approaches lend themselves to different applications. In this case, we have chosen two approaches when it comes to partial discharge signal waveforms: a machine learning approach and a deep learning approach. The partial discharge signals that I showed earlier are digitized time series signals. They lend themselves very well to machine learning models, where extracted features of the time series are processed, or to a deep learning approach-- the long short-term memory approach-- where the waveforms are classified based on the features in the time series data.

    In this example, I would like to demonstrate that a machine learning approach based on a boosted tree methodology was very successful on a data pool of 353,000 labeled waveforms. The waveforms were labeled by human expert analysts, though not one by one, as you can imagine. We found out that out of 43 extracted features, only eight were necessary to reach an overall accuracy of 94%.
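
    As a rough illustration of this kind of workflow (not the actual IMCORP pipeline), a boosted-tree classifier in MATLAB can be trained on a table of extracted waveform features. The variable names featureTable and labels below are hypothetical placeholders, and the parameter values are assumptions.

        % featureTable: hypothetical table of predictors, one row per PD waveform
        % labels:       hypothetical categorical vector, e.g. "PD" vs "non-PD"
        mdl = fitcensemble(featureTable, labels, ...
            'Method', 'AdaBoostM1', ...       % boosted decision trees
            'NumLearningCycles', 100);

        % Rank feature importance to see how few features carry most of the signal
        imp = predictorImportance(mdl);
        [~, order] = sort(imp, 'descend');

        % Estimate accuracy with 5-fold cross-validation
        cvmdl    = crossval(mdl, 'KFold', 5);
        accuracy = 1 - kfoldLoss(cvmdl);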

    So is 94% a success? We wanted to find out how much closer to 100% we can go. And I will give you a little update in a couple of slides about it.

    The next approach that we applied to our challenge was the deep learning approach on the partial discharge time series data. We used a long short-term memory approach, where we used the time domain signals as a feature and added other features on top. This time, we employed the model on a pool of about 900,000 human-labeled signal waveforms, and we found an overall validation accuracy of 93%.

    Deep learning is more advisable when you have large amounts of data, so we processed more data. This study was performed on a larger pool-- the roughly 900,000 labeled waveforms I just mentioned. So 93% is not bad, either. But the question is, why aren't we getting closer to 100%?

    So here are the reasons. When you look at this curve, probability is plotted against the partial discharge signals, which have been sorted by probability-- from 1, being a true partial discharge, down to 0, the very clear non-partial discharge signals.

    And you see that you can categorize that pool of data into three main buckets. One is the clear partial discharge. The other one is a clear non-partial discharge. And in the middle, there is a transition which you would have wished would be a little steeper.

    These are ambiguous waveforms. So when we went back and looked-- especially at the ambiguous pool-- we found out that some of those signals actually have features that make them look either way. And this depends, more or less, on the signal data quality.

    So we know the inputs have to be of a specific quality if you would like to classify very precisely. We can either do better training in that zone, or we can clean up our signals to begin with-- that is a signal capture and signal processing issue. Or we can apply different models to that specific zone in order to reach higher validation accuracy.
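
    For illustration, assuming the classifier returns a per-waveform probability score (the scores vector and the thresholds below are hypothetical), the three buckets described above could be separated with simple cut-offs:

        % scores: hypothetical N-by-1 vector of predicted probabilities
        % (1 = clear partial discharge, 0 = clear non-partial discharge)
        hiThresh = 0.9;   % assumed cut-off for "clear PD"
        loThresh = 0.1;   % assumed cut-off for "clear non-PD"

        clearPD    = find(scores >= hiThresh);
        clearNonPD = find(scores <= loThresh);
        ambiguous  = find(scores > loThresh & scores < hiThresh);

        % The ambiguous bucket is the zone to retrain, re-measure,
        % or route to a different model, as discussed above.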

    This specific slide shows you how an LSTM-- long short-term memory-- network is trained inside MATLAB. I told you about this pool of 900,000 labeled waveforms. When actually selecting data for training, there are considerations about the ratio between positive and negative examples, the number of epochs you want to run, the number of iterations, and the specific data conditioning before you start training a model.

    In this case, the model training takes several hours, sometimes even several days. And we found out that it is important to condition the data and to find the right training method for the LSTM network. As I mentioned before, 93% or 94% accuracy is pretty good. But we would like to improve that, so that the model is almost as good as a human.

    Another approach that helps us to learn some information is the convolutional neural network approach. Convolutional neural networks lend themselves very well to pattern recognition or feature recognition on images. So partial discharge data can also be converted into plots where the partial discharge magnitude in picocoulombs is plotted against the phase of the excitation voltage.

    So those are the two pictures you see at the bottom of this presentation. Those actually have significance. Those pictures can reveal the nature of the partial discharge defect in a cable system. In particular, some partial discharge defects are associated with an electrical tree, which is a micro-defect in the cable insulation. Those electrical tree defects carry a much higher risk of failure than the other types of defects.

    Utilities are interested in finding the high-risk defects that eventually make a cable fail. In this case, we found that the convolutional neural network was trained very successfully to identify those partial discharge defects related to electrical treeing. That allowed us and enabled us to devise a partial discharge severity factor that is now serving as a risk categorization for a specific partial discharge defect.

    This is done automatically by the machine. So the human will be alerted that there might be defects that are riskier than others. It helps us flag specific defects in the cable that need much more urgent attention than other types of defects. So it allows operational people-- at cable network owners and operators-- to categorize, classify, and prioritize repairs of their underground cable systems.

    Here are just a few examples of those electrical tree type defects that our machine has found automatically for us. And sure enough, when we went back and looked a little closer into the data, we found that these would be the ones that the cable network owner should take care of first. And this is all retained in a database that automatically categorizes and prioritizes the risk of partial discharge defects going to failure.

    The remaining useful life is a very important question that needs to be answered. Especially when you maintain and operate underground cable systems or transformers, the costs are very high when you need to replace this equipment on the field. So the question that always comes up and everyone would like to answer is, how much remaining useful life is in my equipment?

    The remaining useful life is not just a question of time to failure. It's also a question of when, economically, it is not viable anymore to operate an underground cable system. So typically-- and that's why there is a curve that we would like to explain briefly-- a cable failure develops over time. And long before the cable actually fails, that defect is already active in the cable system.

    What is plotted here on the x-axis is time. And on the y-axis is the partial discharge inception voltage of a defect. The higher that voltage, the more likely it is that only a transient overvoltage event-- a temporally very short event-- can ignite this partial discharge defect and make it active for a very short time.

    So there are a few things that can actually make a defect active. For example, lightning. If lightning strikes close to a cable or hits, actually, a jumper that is connected to a cable, that cable will see an overvoltage. And the already existing partial discharge defect is becoming active for a short amount of time for this event, and continues to grow and degrade the insulating material.

    Thumping is actually a methodology where artificial overvoltage or high-voltage pulses are injected into an underground power cable to find a failure. Thumping is not only helping you to find an existing failure-- it also continues to degrade existing defects that are elsewhere in the cable, as the overvoltage acts like a switching event. Switching of cables obviously generates transient overvoltages for a short amount of time.

    So every time a partial discharge defect is active in a cable, the partial discharge inception voltage decreases. Eventually, it will decrease to the level of the operating voltage. At the operating voltage, which is always on, the partial discharge will be continuously active and very quickly go to failure.

    So somewhere in between-- between the blue and the yellow-shaded regions-- the condition-based assessment-- the partial discharge assessment-- has been performed. If we have a point-in-time measurement of the condition across the cable, then, based on that one point in time, it should be possible-- and it is possible-- to give predictions about how much longer it will take before the cable goes to failure. We will get into what you would need to know to make this kind of prediction. In part 2 of the series, we will demonstrate how you can calculate the remaining useful life of utility equipment such as underground cables.

    As I just mentioned, the remaining useful life depends on many variables for underground power cable systems and transformers. It may depend on transient overvoltages-- as I have just explained-- thumping, or other artificially generated overvoltages; the operation of the asset-- that is, the power flow, the current, the temperature of the equipment; and also the age and other intrinsic asset data.

    For instance, how old is the transformer or the cable system? Who is the manufacturer? What kind of jacket does the cable have? Or does it even have a jacket?

    So when you look at the very last layer on the slide, you will see a whole layer of different information that is being used to predict the remaining useful life of a cable or other utility assets. There is other data as well: when and where lightning strikes happen, geographical information, how likely it is that this cable or transformer will be flooded. All of this information flows into the model that helps you predict how long before the utility equipment actually fails in the field.

    So in summary, transformer and underground cable failures can be predicted. Predictive maintenance begins with understanding how equipment failure actually occurs. Machine learning and deep learning algorithms are used to automatically identify and categorize risk markers of defects contained in the partial discharge measurements. These algorithms are used to categorize different defect types by their risk of going to failure soon. Differentiating equipment with high to low risk defects, along with those that are defect-free, enables predictive maintenance.

    In the next video, my colleague Akhilesh will show you how you can develop predictive maintenance and asset health monitoring solutions using machine learning and deep learning techniques. Due to the NDA between MathWorks and IMCORP, we cannot use the data that was used for the case study. In this example, we will use EKG signals instead of partial discharge signals to demonstrate the machine learning and deep learning workflow that I just showed you in the case study.

    Hello. Welcome to part 2 of the webinar on asset health monitoring and predictive maintenance of electric utility equipment using AI. My name is Akhilesh, and I am a senior application engineer based out of the Plano, Texas office. In this session, I will walk you through two example workflows, demonstrating how to train the deep learning models that Shishir showed in the previous talk, using an images approach as well as a signals approach.

    So let's look closely at the AI workflow. Usually, to build AI models, there are three steps involved. The first step is preparing the data. You might have to collect a lot of data from the field, or maybe you have historian servers storing data from the past, which can be leveraged to do some ground truth labeling and prepare it for training purposes.

    The next step is training the model, which involves building and training the model, and even accelerating the training on hardware platforms such as GPUs or the cloud. Now, this training step, I would like to mention, is kind of the easy step right now, because platforms like MATLAB-- or even the open-source solutions out there-- are mature enough to provide the infrastructure to quickly build, prototype, train, and test models.

    But once we have built the models, the final step is to deploy the system for our application. Deployment can be two things. One is deploying on hardware platforms like embedded devices, so that we can take our devices into the field, get the data in real time, and get failure-analysis inferences right there.

    But then, the second type of deployment-- which is actually becoming very popular these days-- is enterprise deployment, where your application lives on the cloud. And the end user might be connected to the cloud through an app.

    And to give you an example in this asset health monitoring context, you can take pictures with your iOS device, those pictures are processed in the cloud with the deep learning model, and you get the inference in real time. But then the operator, who is sitting at a different remote site, can also access the same data and the same results and take some corrective actions.

    Anyways, these three steps do involve going back and forth over many iterations, until we get more refinement. And even after we deploy the system, we might have to go back and make some more tuning adjustments when we realize that we might have missed a case. It is an ongoing process. And the good news is, with MATLAB, it's not just training the model-- from end to end, all the way from accessing data to deploying, our tools help you cover the whole workflow so that you can become successful in training, building, and deploying your AI models for real-time inference in a very short amount of time.

    So in the next examples, I'll walk you through the workflow of building such failure-analysis predictive models. In this case, we'll be using a representative data set-- ECG signals with three different classes. One class is the healthy class, which corresponds to the normal sinus rhythm. The other two classes are the faulty classes, which correspond to arrhythmia and congestive heart failure, respectively.

    Now, the reason why we cannot work on the IMCORP data is the nondisclosure agreement, so we were not able to use their data. But the ECG data is good enough to demonstrate the workflow of how they built their image-based and signal-based models.

    So let's dive into the first part, which is the images approach. With signals, a very popular technique is to convert the signals into time-frequency images and then train what we call convolutional neural networks, or CNNs. Now, this approach is very popular because, in the deep learning world, CNNs are readily available in the public domain. We can reuse a lot of the pretrained CNNs quickly, test them out on a new data set by converting our signals to images, and build some predictive models.

    Now, converting signals to images can help us leverage these CNN workflows, quickly train models, and get some good initial results. This approach serves as a good starting point for signal applications. But the question arises: which time-frequency representation should we choose?

    Now, in MATLAB, we do have a lot of different time-frequency representation techniques, such as the spectrogram or modified spectrograms like the Fourier synchrosqueezed transform, the Hilbert-Huang transform, the Wigner-Ville transform, or the continuous wavelet transform using wavelets. So there are quite a few techniques to choose from.

    You can go to the MathWorks page for the time-frequency gallery. It walks us through all the different time-frequency methods available and the advantages and disadvantages of using one method over another. And at the same time, if we scroll down, there are specific examples and use cases for each of these methods.

    So we do have a lot of representations to choose from. But for the EKG signals, one of the characteristic properties is that there are features which change very quickly in time. For those high-frequency features that change over a very short duration, the technique that captures those transients with high fidelity is the approach we would like to work with.

    And as it turns out, frequency domain methods like the spectrogram-- FFT-based methods-- are not a very good choice, because they don't give high enough resolution. But very quickly, we can convert the signals with the continuous wavelet transform (CWT). And what we see is that when we convert to the CWT, the resolution of each of these pulses is pretty high in both time and frequency.

    Each of the pulses is separated out-- the heartbeats are separated in time. And if we look at an individual pulse, we see that all the frequency variations of a single pulse are beautifully captured in the frequency domain. So the bottom line is, the better the resolution we get, the better the AI model we are going to train from those images.
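
    As a quick, hedged sketch of that comparison (the sampling rate fs and the variable ecgSignal are placeholders, not values from the webinar), both representations can be plotted directly in MATLAB:

        fs = 128;                                  % assumed sampling rate in Hz
        % ecgSignal: one of the 162 ECG records (placeholder variable)

        figure
        pspectrum(ecgSignal, fs, 'spectrogram');   % FFT-based view: coarser resolution
        figure
        cwt(ecgSignal, fs);                        % scalogram: sharper time-frequency detail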

    So now that we've established the method to convert signals to a time-frequency representation using wavelets, what is the next step? Like I said in the beginning, it is very simple. All you need to do is train.

    Now that we have chosen our technique to convert our signals to time frequency images, we can build deep learning models at this phase, either from scratch or-- like I said before-- using pretrained models out there. Now, in the community, there are very standard pretrained models, like AlexNet, VGG-16, GoogLeNet, SqueezeNet.

    And if you see this link over here on the MathWorks page, we have already included all of these models, and we give you a comparison of which model to work with depending upon your needs. Suppose you're looking for very high accuracy and you do not worry too much about inference time-- then you can go with the higher-end models, like Inception-v3, DenseNet, or NASNet.

    But if the real-time inference is of critical importance and the inference speed needs to be high, then you can work with models like ResNet 18 or VGG-16 and still get some decent accuracy out of it. All of these models are very popular in the public domain. But we do not restrict ourselves with the models which we see on the screen over here.

    As of today, MATLAB works with the ONNX framework, which allows us to import models from any platform. For popular platforms like TensorFlow, Caffe2, PyTorch, and MXNet-- any deep learning model you find that resembles your needs or your workflows-- you can convert it to the ONNX format, which is an abbreviation for Open Neural Network Exchange. And you can bring those models into MATLAB and start working with them.

    Now, the advantage of this approach is that as of today, you can log into a GitHub page, and let's say you find some nice paper research article in the community. And in the GitHub, they already have given you a model. Say it was developed in PyTorch.

    And you don't have to go and learn PyTorch to work with that model. You can directly convert it to ONNX, bring it into MATLAB, and leverage the nice time-frequency, signal processing, or deployment capabilities, and work with that model in MATLAB itself. That being said, we also allow exporting of deep learning models from MATLAB to other frameworks through the same channel. So if you have to share a model with your colleagues, feel free to do that too.
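
    Assuming the ONNX converter support package is installed, the round trip looks roughly like this (the file names are placeholders):

        % Import a model that was exported from PyTorch, TensorFlow, etc.
        net = importONNXNetwork('pretrainedModel.onnx', ...
            'OutputLayerType', 'classification');

        % ... retrain or use the network in MATLAB ...

        % Export a MATLAB network back to ONNX to share with colleagues
        exportONNXNetwork(net, 'myTrainedModel.onnx');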

    So the approach used with the pretrained models is often referred to as transfer learning. And what, really, we do is, we take these deep learning networks-- like ResNet 50, VGG-16, or whatever it is-- modify a few layers, and then retrain on the new images. And very quickly, within a few lines of code, we can get some new inferences out of those models.

    So let's go ahead and jump to MATLAB and see a live example of this workflow using ECG data set. So this is my MATLAB over here. And I'll start with loading of the data.

    Like I said, there are 162 samples we'll be working with. And each of the EKG samples already has a pre-assigned label-- ARR for arrhythmia and CHF for congestive heart failure, which are the two failure modes, and NSR for normal sinus rhythm. And if you notice, each signal is 65,000 samples long.

    So we can quickly go ahead. For the time-frequency approach, I would like to showcase, at this point, a very useful app in MATLAB, the Signal Analyzer, which allows us to visualize signals quickly and do some analysis on them. Let's take an arrhythmia sample for demonstration purposes here.

    So, very quickly, on this app, I've got a small fragment of one of the arrhythmia samples. And if you see over here, we can do some pre-processing if we have to-- like high-pass or low-pass filtering-- and get the results on the screen. But in this case, what I'd like to do is convert it into time-frequency images.

    And like I had mentioned before, if I convert it into spectrogram, the resolution is not that good. And we see that these peaks are not segregated out in time or frequency. But the other option I have is to convert it quickly into scalogram.

    And what I see is, all of a sudden, each of these pulses is separated out in time. And each pulse retains the frequency information of the record. So this is the approach that will definitely give us better results than the prior one when we train AI models.

    And the Signal Analyzer app gives us additional features, like zooming in on a certain region really quickly, analyzing a region of interest, and creating samples by extracting the region of interest we would like to work with. At the same time, for anything we do in the app, we can generate a script.

    Let's say we generated a scalogram. We want to see the script. And it would automatically create the script for you, which will call the CWT function to generate the scalogram for the given signal. And it generated all the code automatically.

    So in the following section, what I did was take that script and modify it a little bit. In this case, I'm doing the same CWT: I create a filter bank and compute the CWT, which is the wavelet transform.

    And over here, I have another function which helps me write out all the converted time-frequency images for these 162 samples. I save them in my data folder over here under ARR, CHF, and NSR in the respective folder names. And if I look over here, I have got all these JPEGs-- the ARR, CHF, and NSR signals converted into images-- which I'll be using for the deep learning training.
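
    The helper code is roughly along these lines-- a sketch, where sig and fs are placeholders for one record and its sampling rate, the filter-bank settings are assumptions, and the image size is chosen to match the AlexNet input used later:

        fb = cwtfilterbank('SignalLength', numel(sig), ...
            'SamplingFrequency', fs, 'VoicesPerOctave', 12);

        cfs = abs(fb.wt(sig));                              % CWT coefficients
        im  = ind2rgb(im2uint8(rescale(cfs)), jet(128));    % scalogram as an RGB image
        im  = imresize(im, [227 227]);                      % AlexNet input size

        imwrite(im, fullfile('data', 'ARR', 'ARR_1.jpg'));  % assumed folder layout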

    So that being said, now let's go ahead and start training the models. And you'll see that it's such an easy step. The first step is to create an image datastore with the given file path, so that during training I can provide MATLAB with the path to where all these images are. I also split these images, using this function, into 80% for training purposes, and I save 20% for testing of my network to get an accuracy number toward the end.

    Now, note that I set IncludeSubfolders to true and the label source to the folder names. What this datastore allows me to do is automatically associate the labels to each of those images from the folder names. If I look at this datastore object real quick, you'll see 162 paths to the files where my images are. But I also have this Labels variable, and you can see that it holds the label names, which are derived from the folder names.
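
    In code form, that datastore setup is essentially the following (the folder name 'data' is an assumption):

        imds = imageDatastore('data', ...
            'IncludeSubfolders', true, ...
            'LabelSource', 'foldernames');     % labels come from the ARR/CHF/NSR folders

        % 80% of each class for training, 20% held out for testing
        [trainImgs, testImgs] = splitEachLabel(imds, 0.8, 'randomized');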

    So now we are set with the images. So the data preparation, the step number one is completed at this stage. And the next step is to train a pretrained network, AlexNet.

    Now, AlexNet-- real quick, let me just run this section. AlexNet, like I said, is a pretrained network. It is available in the public domain, and it's probably one of the simplest deep learning networks. It is almost fair to say that this is the "hello, world" example of deep learning.

    Now, with MATLAB, one of the flexibilities we have is another app, called the Deep Network Designer app. If you go to the Apps section, you'll see that the Deep Network Designer app is located under Machine Learning and Deep Learning. And you can work entirely in this app to develop and train the model.

    So what I'll do over here is open AlexNet. If I click Show More, I can also start with a blank network. But the app gives me a graphical interface to build, develop, and train models.

    So let me just quickly open AlexNet. And you'll see that in this canvas layout, all the layers of the AlexNet are already imported. Let me just Zoom in.

    So the first layer is the image input layer, which takes in an image of size 227 by 227 by 3, followed by all these convolution layers, ReLU layers, and so on, which transform the image from one layer to the next. Finally, if we go to the last layer, we see that it is a classification layer whose number of outputs is 1,000. And if we look closely, it's not showing all of them.

    But the classes this AlexNet was trained on are real-world objects like goldfish, great white sharks, tortoises, and so on. So in transfer learning, what we do is take this AlexNet-- because it's already got these nice filters for classifying real-world objects-- but we retrain it to classify our ECG images.

    And the way it's done is, you go all the way to the final layers. You can delete the final two or three layers. In this case, I'll just delete the fully connected, softmax, and classification layers and replace them with new learnable layers, which I can easily choose from the library on the left side, which has all the layers available for me.

    So this one is the fully connected layer, and in this case, I can specify a property like the output size. For the original AlexNet it was 1,000; for my network it will be 3, because we have three classes. Then I have the other layers, such as the softmax layer, which is over here.

    And the final layer is the classification layer, which is over here. So now I can connect them. And there's another nice tool in this app called Analyze, which can be used to analyze the new network. This tool gives you a sanity check that the changes you made to the network did not introduce any errors or warnings. And it is good.

    And if we scroll all the way down, the fully connected output is now 1-by-1-by-3-- for the three classes. And we should be good to go for our training step. Once we've done that in the Deep Network Designer app, the second step is to define the data.

    Now, I can import data directly from an image folder, or I can import a datastore from the workspace, which is what I'm going to use. For the training data, I'll say that it is my training images over here.

    And test validation, I would not worry about validation at this point. So what I'll say is none. And let's go ahead and do the import.

    I have saved the test images for later. Now, you can see an overview of a few of the images. So the data is all good. Now let's go to the training step.

    In this one, we just have to define the training options. For instance, which solver am I using? I'm going to use the sgdm solver.

    And then, what's my initial learning rate? In this case, I'll just put a learning rate of 0.001. I leave all the other parameters at their default values, but you can change all of these hyperparameters, as they are called.

    Another parameter which is important is this execution environment. If you already have GPUs or multiple GPUs, you can choose that. But if you do not have a GPU-- like, I'm running on a laptop right now-- you can just leave it to auto. And it will pick the most efficient resource available for training.

    Now, note that we have this parallel option. What it means is, if you have clusters or you're training on the cloud, MATLAB will directly connect to those clusters and scale the training to make it a lot faster. So I'll use these options for training, and all I do is hit Train.
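
    Written out as a script, the options described here look roughly like this (the epoch and batch-size values are assumptions; only the solver and learning rate come from the demo):

        opts = trainingOptions('sgdm', ...
            'InitialLearnRate', 0.001, ...
            'MaxEpochs', 10, ...                  % assumed value
            'MiniBatchSize', 16, ...              % assumed value
            'ExecutionEnvironment', 'auto', ...   % picks GPU/CPU automatically
            'Plots', 'training-progress');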

    So you notice that for constructing the deep learning network, loading the data, and implementing the training, I did not have to give any command at all. Not even a single line of code. It is that easy.

    So the training has started, and I'll give it a few seconds to train. Now we see that our network is fully trained, and it took just three minutes and 31 seconds to run through this training process.

    But now, since we have this trained network, let's just export it out-- the trained network-- and try to test it. So one of the ways is, I can export it out and test it, which I'll do right now. So it exported trainedNetwork_1 to the workspace.

    But note that I can also generate code for the training. So if I have to do the retraining several times, I don't have to go through the app workflow every time-- I have automated code for the whole training step.

    So in this case, it created the code for setting the training options. And then this is the entire code for generating the AlexNet. Note that it generated the full code for the AlexNet, for all the layers.

    But the last three layers are the ones I had modified-- the fully connected layer and so on-- and it also called the trainNetwork command. And this is what I have over here in my code as well: I load in AlexNet and replace the layers.

    I don't have to reprogram all the layers. I can just replace the final three layers. And I can train with the training option, then trainNetwork, whatnot.
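
    The generated script follows the standard transfer-learning pattern, roughly like this (reusing the trainImgs datastore and the opts training options sketched above):

        net    = alexnet;                 % pretrained network (support package)
        layers = net.Layers;

        % Replace the final learnable and output layers for the 3 ECG classes
        layers(end-2) = fullyConnectedLayer(3);
        layers(end)   = classificationLayer;

        trainedNet = trainNetwork(trainImgs, layers, opts);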

    So that being said, when I evaluate this trained model on a test set-- the last time I ran it, I saved the network as ECGNet, and I can rerun that right now. The last time, when I ran it on the test data set, it gave me an accuracy of 96%.

    And on that same data from last time, I ran the classify command on the trained network. But let's go ahead and rename this new network that I just trained from the app-- this trainedNetwork_1-- as ECGNet, and rerun this section. All right.

    So that means the network we just trained is going to run on the test image set and generate the result. And we notice that this time we got a little less accuracy, 93%-- maybe one extra misclassification. But that's OK, because our test set is quite small.
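
    The test-set evaluation itself is just a couple of lines (ECGNet is the exported network, testImgs the held-out datastore from earlier):

        predLabels = classify(ECGNet, testImgs);           % run on held-out images
        accuracy   = mean(predLabels == testImgs.Labels)   % fraction classified correctly

        confusionchart(testImgs.Labels, predLabels);       % see which classes get confused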

    So anyways, we just saw how to train a deep learning model using the CNN approach. To summarize, we converted signals to time-frequency images and used them to retrain AlexNet. And even with the simple AlexNet, we were getting an accuracy greater than 90%, which is fairly good.

    So that being said, let's go and see the second approach, which is training deep learning models on the signals directly. Now, what are the practical challenges of training with CNNs? In the beginning, I did say you can get some quick results.

    But there are some disadvantages. Number one, CNNs are often criticized for being non-transparent, because they are very deep and their predictions are hard to interpret. For instance, in the first run, when I showed you the example, I got a 96% accuracy. But the second time, with a randomized test data set, I got 93% accuracy-- maybe one extra misclassification. And really, I cannot explain why the deep learning model was behaving like that. It's kind of hard to interpret the model.

    And then, at the same time, another thing is, CNNs are quite deep. So if you're working with real-time inference-- even on a GPU-- the inference speed can be a limitation for certain applications. And at the same time, CNNs do require a lot of data. And sometimes, available data may be limited.

    So what's the solution? The solution is to train directly on the time series data, because we see that the signal is pretty long-- 65,000 samples. Can we do that? The answer is yes.

    Deep learning does provide an architecture for this, with what we call recurrent neural networks. One of the recurrent neural networks we use is called the LSTM-- short for long short-term memory network. We can directly feed sequence input data-- like time series data-- into these layers and then get a classification, or even a regression, which is predicting a future value. So how is a long short-term memory network different?

    It works with time series signals because it not only has a nonlinear transformation, but it also has a memory, or state, associated with it. As new data comes in, the state updates, and for future predictions it uses the past data samples.

    So that's the memory element, and that's why it's known as a long short-term memory network. But we are faced with another challenge in this application.

    The ECG signals right now are 65,000 samples long. And if I directly throw them into LSTMs-- because they change so rapidly in time, with these high-frequency QRS complexes-- what happens when I attempt the deep learning training looks something like this: the network never learns. It gets all confused. So what is the solution?

    The solution is to reduce the dimensionality of the signal and then extract the relevant information-- the relevant features that impact the training process, the ones that dictate the differentiation of ARR from CHF and NSR-- and use them for training. That way, I get a much higher-fidelity network. The solution MathWorks offers is a set of feature-extraction tools. And one of the automatic feature-extraction tools, which captures those nuances between the different signals while reducing the feature size, is called the wavelet scattering network.

    And wavelet scattering is similar to a CNN in the sense that it has multiple layers-- a convolution layer, a ReLU-like nonlinearity layer, and a pooling layer. But in this case, the convolution is performed with a known, fixed wavelet filter bank, and the nonlinearity layer is where we do the scaling.

    The nonlinearity and the averaging operations result in the scattering coefficients, which are what we extract after each layer-- or level, as they call it-- and use as the features to train our classifiers. Now, wavelet scattering is becoming more and more popular in the community, and it's being featured in a lot of different publications.

    And it's often an excellent framework for automatically extracting relevant and compact features without you being an expert in the domain. It helps you extract the relevant features without even getting into the details of how the signal is changing. And it's all automatic.

    And actually, with this wavelet scattering method, you are able to train a successful deep learning network. And let's go ahead and see that example, live. So again, switching back to MATLAB.

    Let's go to this LSTM exercise over here. We're still working with the same ECG data set. But for the wavelet scattering features, I would like to show you: if I take one record and extract the wavelet scattering features using this featureMatrix command, my original signal-- which is 65,000 samples long-- gets reduced to a size of 499 by 8. That's roughly a 95% reduction in size for these features.
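
    A minimal sketch of that feature extraction, with the scattering-network parameters assumed and sig standing in for one ECG record:

        sf = waveletScattering('SignalLength', numel(sig), ...   % match the record length
            'SamplingFrequency', 128);                           % assumed sampling rate

        feat = featureMatrix(sf, sig);   % reduces one long record to a small matrix
        size(feat)                       % on the order of 499-by-8, as described above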

    And you can also visualize these features, although they will not make much sense-- say, the level 1 or level 2 features for, let's say, CHF. We can plot a scattergram for all the features, and it's hard to understand what that scattergram means for level 1 or level 2. But the one thing to take away is that it has captured the small, subtle nuances which differentiate ARR from CHF or NSR, and those will be used in the training process.

    So in the next step, I have already partitioned the data randomly into training and testing sets-- 113 records for training, 49 for testing-- and I separate them out and extract the scattering features. Note that I'm using a parfor loop, so I will be using multiple cores on my machine to do the feature extraction.
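
    That parallel feature-extraction loop is essentially the following (trainData as a cell array of signals is an assumption about how the records are stored; sf is the scattering network from above):

        scatTrain = cell(numel(trainData), 1);
        parfor k = 1:numel(trainData)
            % each worker computes scattering features for one record
            scatTrain{k} = featureMatrix(sf, trainData{k});
        end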

    And all my features are extracted now and stored in train and test variables. Now I can train an LSTM with these features.

    Now, one thing to note over here is the architecture. This time, I won't do it in the Deep Network Designer app-- I already have the code, so I can just use it in this case. But the workflow is going to be the same.

    Now, the number of hidden units is a property available in the LSTM layer when I add it. And it can drastically change how my network performs-- how it inferences and how much accuracy I'm getting.

    We can vary this number of hidden units; let me just start with a value and with this sequenceInputLayer(inputSize). Note that this input size is 499, because we have a 499-by-8 feature set-- so we're going to input 499 features at each time step.
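
    Put together, the layer stack being described is roughly as follows (the number of hidden units is an assumed value-- it is exactly the knob being varied in this experiment):

        inputSize      = 499;   % scattering features per time step (499-by-8 matrix per record)
        numHiddenUnits = 512;   % assumed value; vary this and retrain

        layers = [ ...
            sequenceInputLayer(inputSize)
            lstmLayer(numHiddenUnits, 'OutputMode', 'last')
            fullyConnectedLayer(3)          % ARR, CHF, NSR
            softmaxLayer
            classificationLayer];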

    And I have these training options and this trainNetwork command over here. So let's go ahead and quickly train the network. And remember that I've reduced the size of the data set-- earlier, for the CNN, it took me three minutes to train.

    But now the data has been reduced. You can see how fast the training is progressing with that small data set. It will take me just a few seconds to train the entire network on the training data.

    And note that I'm not even using a GPU. It trained on a single CPU in 22 seconds. That's all it took.

    And if I go in and test this new network by calling the classify function-- last time when I called it, the accuracy was 90%. But let's go ahead and test it this time. We get only two misclassifications-- like last time with the CNN-- a high accuracy of 96%.

    So let me do another experiment. See how quickly I can change this number of hidden units? Let's say I increase it so that it is almost equal to the number of features I'm training on.

    And let's go ahead and rerun the script: set the training options and retrain the network with the new number of hidden units. Again, it won't take very long-- just a few seconds. Because there are now more learnable parameters with these additional hidden units, it's taking a bit longer than before, which is OK.

    So it has reached 100% accuracy in 41 seconds. And let's go ahead and test this network right now with that same test data set. And wow, I get even better accuracy. Only one misclassification.

    So you see, not only is playing with it so easy, but you can also see the impact it has on accuracy. Retraining multiple times and tuning these kinds of parameters is really a relevant task when training deep learning networks.

    And as of today-- I won't get into the details-- but we also have this Experiment Manager app, which allows you to set up training sessions for multiple hyperparameters. You can use your GPUs or multiple CPU cores for training, set up, say, 100 different scenarios with different hyperparameters or hidden units in the LSTM, and start the training. Maybe leave your training running overnight.

    And what it will do is compile and save all the results.

    So folks, this is the Experiment Manager documentation page. When we run a network with Experiment Manager, it sets up all the experiments with the training, performs multiple iterations, and saves all the results. Once we come back, we can see which result is the best one and go with that hyperparameter for our network.

    Anyways, that being said, we did see that on a single CPU we could efficiently train an LSTM and get a high accuracy. But the enabler for that LSTM was feature extraction, which we did using the wavelet scattering network.

    So, what's more? Right now, we only covered two use cases of the entire AI pipeline for this EKG classifier, or fault detection system: classification using CNNs and classification using LSTMs. But there are other aspects-- we can also bring in machine learning models, which we did not get into the details of, in a very simple way using our tool sets.

    But then, we also have additional tools to help you do automated exploration and annotation labeling of the signals. We also did not cover further techniques to improve accuracy, such as hyperparameter optimization using Bayesian methods. And deployment is another strong piece, where you can directly generate code from your model to put it on your hardware device or the cloud.

    So if you're interested in any part of this AI workflow-- or maybe even the entire AI pipeline-- I would like to mention that MathWorks is your AI partner. Any time you have data, or you are willing to work on building an AI classification or regression system, do not hesitate to reach out to us. Through your local contacts, your sales or application engineers, we are more than happy to team up with you, look into your data, give you insights, provide you with the right set of tools, empower you, and even build proof-of-concepts. We'll work with you on projects hand-in-hand and help you enable the AI workflow you are looking to achieve.

    So with that, I would like to thank you, everybody, for taking the time today. It was a pleasure presenting to you all, and I wish you all the very best in working with your AI applications.