Energy Speaker Series - Module 2: Utility Asset Condition Monitoring and Predictive Maintenance using Machine Learning and Artificial Intelligence - MATLAB

      Energy Speaker Series - Module 2: Utility Asset Condition Monitoring and Predictive Maintenance using Machine Learning and Artificial Intelligence

      Niels Jessen, RWE Renewables
      Patrice Brunelle, Hydro Quebec
      Steffen Ziegler, IMCORP

      Session 2.1 Application of Artificial Neural Networks for Condition Monitoring of Wind Turbine Main Bearings – Niels Jessen, RWE Renewables

      In a wind turbine, a failure of an important component, such as a main bearing, can lead to long-lasting downtimes and thus to a corresponding energy loss. In offshore wind energy, the problem is even more serious as maintenance work is not always possible due to adverse weather conditions and must be planned in advance. In order to save operational expenditure, wind farm operators are required to implement a maintenance strategy that enables them to predict a component’s failure as early as possible.

      RWE Renewables GmbH has developed an ANN-based tool that predicts the temperature of undamaged main bearings based on a selection of SCADA signals. Anomalies are detected when the actual bearing temperature deviates from the predicted temperature. The tool was shown to be successful in detecting issues up to nine months before failure.

      Session 2.2 Grid Fault Location Detection Using System Simulation and Machine Learning - Patrice Brunelle, Hydro Quebec; Graham Dudgeon, MathWorks

      MathWorks and Hydro-Québec explore how both system simulation and machine learning can be used to develop algorithms that can detect the location of faults on electric grids using voltage sag measurements. System simulation is used to generate synthesized fault data that covers a broader operating envelope than measured data alone. The synthesized data is then used to train machine-learning classification algorithms. You’ll learn how the performance of classification algorithms may be used to provide further insight into the physical behavior of the system and any limitations associated with training data. You’ll also see how recommendations can be made from this insight to enhance system measurements and training data sets to improve overall classification accuracy.

      Session 2.3 Signal Waveform Classification in Partial Discharge Applications for Underground Power Cables - Steffen Ziegler, IMCORP

      Underground distribution cable system failures can be predicted! Predictive maintenance begins with understanding how cable system failures occur. Analyzing and interpreting results from partial discharge (PD) measurements taken in the field can be a complex task for humans. Machine learning and deep learning algorithms are used to automatically identify and categorize markers of defects contained in the PD measurements. These algorithms categorize different defect types by their risk of soon leading to failure. Differentiating cables with “high to low risk defects” along with those that are “defect free” enables predictive maintenance. Examples of identified defects will be presented.

      About the Presenters

      Niels Jessen, Wind Turbine Performance Analyst, RWE Renewables, Germany

      Niels Jessen studied mechanical engineering with a focus on sustainable energy at Hamburg University of Applied Sciences (HAW Hamburg). He wrote his master's thesis on the application of artificial neural networks for condition monitoring of offshore wind turbines. Since 2019, he has worked as a Wind Turbine Performance Analyst at RWE Renewables.

      Patrice Brunelle, Scientist, Hydro Quebec, Canada 

      Patrice Brunelle is a Scientist with the Research Center of Hydro-Québec. Over the last 20 years his work has centered on power systems and power electronics, and he has been involved in the development of Simscape Electrical Specialized Power Systems (formerly the Power System Blockset, later SimPowerSystems). Patrice holds a B.Sc. degree in Génie Unifié from the Université du Québec à Chicoutimi, Chicoutimi, Quebec, Canada, and an M.Sc. degree in electrical engineering (1994) from Université Laval, Ste-Foy, Canada.

      Graham Dudgeon, Principal Product Manager, MathWorks Inc, USA

      Graham Dudgeon is the Principal Product Manager for Electrical Technology at MathWorks. Over the last two decades Graham has supported several industries in the electrification area, including aerospace, defense, automotive, industrial automation, medical devices, and power & utilities. Graham's technical experience covers transmission & distribution, grid integration, renewable energy, power conversion, motors & drives, microgrids, electric aircraft, electric ship, and electric vehicle, with an emphasis on system modeling and simulation, control design, real-time simulation, machine learning, and data analytics.

      Steffen Ziegler, Director, Signal Analysis & Artificial Intelligence, IMCORP, USA

      Steffen Ziegler completed his Master of Science degree in Electrical Engineering at the Karlsruhe Institute of Technology, Germany. Mr. Ziegler has worked for IMCORP since 1999 and is currently the Director for Signal Analysis and Artificial Intelligence. He has specialized in the field of digital signal processing applications and machine learning and deep learning applications for underground power system cables. He is a member of the IEEE Power & Energy Society and contributes as a working group member at Insulated Conductors Committee (ICC) meetings. He is also a member of the VDE in Germany. Since 2015, Mr. Ziegler has been a member of the Industrial Advisory Board of the ECECS department at the University of New Haven in Connecticut.

      Recorded: 19 Nov 2020

      Welcome, everyone, to today's presentation. My name is Niels Jessen. I'm a wind turbine performance analyst at RWE Renewables, and today I'm going to give a presentation about the application of artificial neural networks for condition monitoring of wind turbine main bearings. So what's on the agenda? First of all, I will give a small introduction to wind energy and main bearings, and I will try to explain what the problem was that we were facing when we were trying to implement condition monitoring for the wind turbine's main bearings. Second will be the method we used to solve this problem, three will be the results, and then number four, a quick summary of the whole presentation.

      Let's start with the introduction. So first of all, I want to give you an overview about wind turbines in general. So what you can see here is a picture of a wind turbine, or more precisely, of the nacelle of a wind turbine that contains all the important main components of a wind turbine. And I will quickly run through the numbers.

      So number one over here would be the generator, number two is the gearbox, number three in the back of the nacelle is the transformer. Number four over here are the rotor blades. Number five is the pitch system, which basically rotates the rotor blades. Number six over here is the azimuth system, which rotates the nacelle. Then number seven is the main bearing, and we'll talk a bit more about this one in a minute.

      But first of all, we go to number eight, which is the nacelle crane. And then number nine is the lightning protection over here. And number 10 is the tower down here. But now back to the nacelle-- to the main bearing, number seven. Yeah, you can tell it's located in between the rotor and the gearbox. And its main task is really to absorb all the axial and radial loads that the wind puts on the rotor, so they don't end up in the gearbox and damage something there.

      And yeah, considering the size of the wind turbine and the rotor, in general, you can imagine that the main bearing needs to deal with great force there. And because of that, it also is very prone to damage. So what's the problem with main bearings and main bearing failures in general?

      First of all, as I said before, the main bearings take great loads and are therefore prone to damage. And if one of these main bearings fails, it can cause long downtimes, because the main bearing needs to be replaced, and this is quite a complex and very time-consuming process, really. And the turbine can't produce any energy in that period. So this is something the wind farm operator wants to avoid at all costs.

      Then furthermore, a replacement is very expensive in general because the main bearing itself is very expensive. And one more point is that if the main bearing runs to failure, it can damage other surrounding components, which causes additional costs as well. So a wind farm operator wants to avoid all of this at all costs.

      Then the solution to this is really that the wind farm operator implements a predictive maintenance strategy, which requires some condition monitoring as well. Because-- yeah, the wind farm operator wants to monitor all the components, wants to see, continuously, if they are OK or not, so they can schedule maintenance if required to avoid all these downtimes. And why the condition monitoring was a problem for the main bearing, I will tell you in a minute.

      And with this, we come to the second point on the agenda, the method we used to solve the problem we had with the condition monitoring of the main bearing. So we wanted to implement condition monitoring for the main bearing temperature. And our presumption was that, in general, damage in the main bearing should lead to increased heat generation due to the increased friction in the main bearing, and that this should make the main bearing temperature a good indicator for the health of the main bearing.

      The problem with the main bearing temperature is that it depends on many factors, as for example the ambient temperature, but also the operational state of the turbine. So you can imagine that a turbine running in the hot summer months at full generation would have much higher main bearing temperatures than a turbine that is running in winter at lower power levels.

      And this makes it hard to implement a simple statistical temperature analysis for this one. Because if you would just define one upper limit for a sample, you would get mostly results-- well, mostly alarms for hot summer months at high power generation, which is not what you want, because these would mostly be false alarms. And if you lower the limit, then you would get even more false alarms, really. And yeah, we've tried this and it was not really successful.

      A different way to solve this kind of problem would be to build a physical model that would take the underlying mathematical and physical relationships into account, like heat transfer, for example. But this would be difficult to implement for the wind farm operator because we don't have the detailed information about the main bearing. This knowledge is mostly accessible to the main bearing manufacturer, so this wasn't really an option for us, because we would have to do some extensive reverse engineering, really, to get all the information we needed to build such a model.

      So our solution, then, was to model the main bearing temperature with an Artificial Neural Network, or in short, an ANN, because we knew that ANNs are particularly useful when the underlying mathematical relationships of a system are not known in detail, which is the case for us, but large amounts of operational data are available. And yeah, both of these apply to the main bearing of a wind turbine. I've already talked about how we don't have all the detailed information about the main bearing, but I haven't talked about why we have so much data. And I will talk about that on the next slide.

      So what data did we use and why do we have so much of it? We use SCADA data, which is short for Supervisory Control And Data Acquisition. And in general, this is a computer system for monitoring and controlling technical processes. Nowadays, each wind turbine is remotely monitored by a SCADA system. And the system records all kinds of operating data and collects it. And this data can then be called up by the wind farm operator if needed.

      In general, the SCADA system records operating and environmental conditions of wind turbines in 10-minute intervals. So for each 10-minute interval, we would get the average value, the minimum value, and the maximum value, and also the standard deviation of that 10-minute period. And this can be all kinds of data, like temperatures, for example, also pressure data, electrical quantities, rotor speeds, wind speeds, really, all kinds of data. And this data is recorded by default and is available to the wind farm operator. And this makes it, for us, very reasonable to use this data for condition monitoring purposes.
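
      For illustration only, here is a minimal MATLAB sketch of how raw measurements could be aggregated into such 10-minute SCADA statistics. The variables t and bearingTemp are hypothetical raw measurement vectors, not part of any actual SCADA system.

      % Illustrative sketch only: aggregate a raw temperature signal into
      % the 10-minute statistics a SCADA system typically stores.
      TT    = timetable(t, bearingTemp);               % hypothetical raw data
      avg10 = retime(TT, 'regular', 'mean', 'TimeStep', minutes(10));
      min10 = retime(TT, 'regular', 'min',  'TimeStep', minutes(10));
      max10 = retime(TT, 'regular', 'max',  'TimeStep', minutes(10));
      std10 = retime(TT, 'regular', @std,   'TimeStep', minutes(10));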

      So next, I want to talk about how we used this SCADA data to train an artificial neural network to do what we want it to do, which is basically modeling the main bearing temperature. So first of all, we used only data of turbines with healthy main bearings. And as you can see on the right, over here in this box, it says MathWorks algorithm, and this MathWorks algorithm basically does the training for us. But we have to deliver certain inputs to that algorithm so it can create an ANN model for us.

      So the inputs we delivered were really the SCADA input signals that have an influence on the main bearing temperature in some way, which can be, for example, the rotor speed or the ambient temperature as well. And we also need to provide the targets, which is basically what we want the artificial neural network to predict later. So for each set of input signals, we basically have a corresponding target as well. And then the algorithm does its job, does the training, and in the end, we have an ANN model that can predict the SCADA main bearing temperature based on certain SCADA inputs.
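
      As a rough sketch of this training step (not RWE's actual tool, and the MathWorks algorithm itself is not shown in the talk), a shallow feedforward network could be fit in MATLAB like this; X, T, and Xnew are hypothetical arrays.

      % Hedged sketch: train a shallow network on healthy-turbine SCADA data.
      % X holds the input signals (one row per signal, one column per
      % 10-minute sample); T holds the measured main bearing temperature
      % for the same samples.
      net = feedforwardnet(10);          % one hidden layer, 10 neurons
      net = train(net, X, T);            % supervised training on healthy data
      TbearingEst = net(Xnew);           % temperature estimate for new inputs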

      So then on this slide, I want to talk about how we use that trained ANN to detect damages in the main bearing. So over here, you can see, in this box, the trained ANN, which was the result of the slide before. And now we just deliver the SCADA inputs, so the same quantities we've used before to train the model, but from a different period. And now, we expect the trained ANN to use the SCADA inputs to give us main bearing temperature estimations, which are basically how the temperature should normally be for these SCADA inputs.

      And then in the next step, these main bearing temperature estimations are compared to the SCADA main bearing temperature. And I call this here a deviation analysis with predefined limits. And how we do this is, basically, that we just calculate the SCADA main bearing temperature minus the main bearing temperature estimations. And from that, we basically get a deviation distribution, which is what you can see in this plot down here. It shows the frequency distribution of the deviation, or the residual, as it's called here.

      And this distribution should usually be symmetrically distributed around zero. And we would, now, only look at the positive deviations, because, the way we calculated it, if the SCADA main bearing temperature is unusually high, we would get greater positive deviations. So if there is a problem with the main bearing, we'd expect this distribution to shift to the right, right?

      And so we can detect any issues with the main bearing, we'd define some limits. So in yellow, you would see here the warning limit, and then the alarm limit in red as well. Yeah, and these are basically just based on experience. And when we have a shift in the distribution due to damage of the main bearing, these limits should trigger alarms and we would get a notice that there's a problem.
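
      A minimal sketch of this deviation analysis, continuing the hypothetical variables above; the limit values here are purely illustrative, since the talk derives turbine-individual limits from experience.

      % residual > 0 means the measured SCADA temperature is above the ANN
      % estimate, which is the direction that indicates possible damage.
      residual   = TbearingScada - TbearingEst;
      warnLimit  = 1.0;                  % degC, illustrative only
      alarmLimit = 2.0;                  % degC, illustrative only
      histogram(residual)                % roughly symmetric about 0 when healthy
      warn  = residual > warnLimit;      % flag samples above the warning limit
      alarm = residual > alarmLimit;     % flag samples above the alarm limit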

      So with that, we go to number three on the agenda, which is the results. So we'll have a look at the results the tool we've developed is generating for us. First of all, I want to give you a quick overview of what the results of the tool we've developed look like. On the left side of the slide, you can see the results for an intact main bearing, and on the right side, you can see the results for a damaged main bearing. Let me just quickly explain the plots to you.

      So first of all on the x-axis, you have the days of the month. And on the y-axis, you can see the main bearing temperature. And then the upper plot shows in blue the SCADA main bearing temperature, and in red or orange, we can see the model output of the ANN. The lower plot, then, shows the deviation between these two lines we've seen in the upper plot. So basically, every time the blue line is higher than the red line, we would see the deviation down here in form of blue peaks.

      I mean, as I said earlier, we only look at the positive deviations because we want to detect unusually high SCADA temperatures. And I also talked about the warning limits before and the alarm limits as well. And you can see these in this plot as well. So in yellow would be the warning limit and red would be the alarm limit that would trigger warnings and alarms for us. And I think you can clearly see that in the damaged main bearing, there's a lot more going on in this deviation plot on the right side.

      If you would look on the left, we have deviations of up to or slightly higher than 0.5 degrees Celsius. And on the right side, we have deviations of up to three degrees Celsius. And this would clearly trigger an alarm for us, while the other wouldn't. And then, before you ask, you can tell that the warning limits and the alarm limits are different in these two plots, and this is because we defined turbine-individual warning and alarm limits. This is basically because if you use the ANN to make predictions for each individual turbine, it can perform differently for each turbine, and we took that into account by defining individual alarm and warning limits.

      Then on the next slide, we can see the comparison to the conventional method, as I call it here. And the conventional method, in this case, is basically just-- yeah, you can see the months down there. And then you can see, on the y-axis, the main bearing temperature. And what you can see in the plot is basically just the monthly average temperature. And you can see four lines, which are just to reflect the different operational levels the turbine can be in. So probably, the upper line would be full generation and the lower line would be very low generation. And you can see there's a difference in the temperatures for the different operational states.

      And then on the right side of the plot, there are defined upper and lower limits, actually. And so with this, I basically just wanted to show you why the ANN temperature analysis performed so much better. Because if you look on the left again, the upper plot shows an intact main bearing and the lower plot shows a damaged main bearing. And I just picked two months that I thought would have similar temperatures in these two plots. And just also consider that, actually, none of these triggered any alarms, because none of these reached the upper limit.

      So as I said, I picked one month, which is December for the upper one and October for the damaged main bearing, and I did the ANN temperature analysis for these periods. And you can clearly see that while the conventional analysis didn't find anything in there, the ANN temperature analysis was able to detect the error-- or the damage, actually-- quite clearly.

      So when we're looking at the potential for early detection, the SCADA-based ANN model that we've developed performed very well as well because we were able to detect an impending main bearing failure about eight to nine months prior to the failure or the replacement, in this case, because we usually won't run the main bearing to a failure for the reasons I've discussed earlier.

      And the model we've developed also worked very well, because we were able to detect that there's something wrong with the main bearing continuously, from the first detection until the replacement. So we could constantly see that there's something wrong, which wasn't the case with the conventional method we've used earlier, really. Because with that one, we maybe just got one alarm in this period, and then it remained to be seen whether this was just a false alarm, as usual, or an actual problem with the main bearing.

      But this wasn't the case with the ANN model, so this worked very well. And yeah, this gave us plenty of time to plan the maintenance, to plan the replacement, whatever's necessary-- to order a new main bearing, for example, so we have it in stock and the whole thing, the replacement, can go much quicker, so we save downtime.

      And then we come to the last point on the agenda, which is the summary, where I'm just going to tell you what I've told you in this presentation real quick. So in summary, our aim was to implement a condition monitoring solution for the main bearing of the wind turbine. Our approach was to use the main bearing temperature as a health indicator for the main bearing.

      The problem we had with that was that even in a healthy turbine, the main bearing temperature varies in a wide range during normal turbine operation. Our solution to that was to use an artificial neural network, and also use the large amount of operational SCADA data that we had, and implement a deviation analysis to combine all of this. And the result of that was a SCADA-based ANN model that can detect impending main bearing failures about eight to nine months prior to the failure.

      Hello, everyone. My name is Graham Dudgeon, and I am Principal Product Manager for Electrical Technology at MathWorks. I'm joined today by Patrice Brunelle, who is Principal Scientist at Hydro-Québec's Research Institute, IREQ. Hi Patrice, how are you, my friend?

      I'm doing great, thanks Graham. I'm really happy to talk with you today.

      In this presentation, Patrice and I will be talking about how both system simulation and machine learning can be used to develop algorithms that can detect the location of faults on electric grids using voltage sag measurements. Patrice will kick us off by providing some background on fault location detection, and we'll discuss some of Hydro-Québec's initiatives in this area. Patrice will then describe the system under study, and talk about configuring the simulation model to generate fault data at multiple locations.

      Patrice will then pass back to me, and I will discuss the use of Classification Learner, which is part of the Statistics and Machine Learning Toolbox, to train and evaluate machine learning algorithms. Next, I'll take a look at how the results of a classification algorithm can help guide us to make recommendations on what additional measurements may be needed to improve overall results. I will then explore how reduced data sets affect the accuracy of classification algorithms. This helps provide guidance on how much data is needed to provide accurate classification and what limitations may exist if we are training with reduced data sets. We'll then end with a summary.

      I would like to start by setting some context behind fault location detection. Clearly, being able to precisely determine the location of a fault is of a high operational value. With the precise location, system operators can take definitive action, and maintenance crew can be more efficiently dispatched. Hydro-Québec has a long history of developing advanced fault location and condition-based maintenance capabilities.

      One example of this is the MILES project, MILES standing for Maintenance and Investigation on Lines. With MILES, voltage measurements were made at key locations, and an algorithm was developed that would triangulate a fault location based only on these measurements. The image we see on the right shows an example of the MILES fault locator estimating an actual fault.

      The MILES algorithms are based on power system engineering theory, and Hydro-Québec, like many utilities around the world, is exploring where machine learning may provide complementary capability and enhancement for operational monitoring, which is why we were excited to explore this capability with MathWorks on a representative problem.

      The system under investigation is a radial distribution network, which is representative of the system used for the MILES project. The simulation has been validated against the real system, and so we have a high level of confidence that the simulated fault responses will be representative of actual responses.

      For this study, there are five voltage measurement locations, indicated in green here, and we chose 38 different locations at which to apply faults. For each fault location, we used 288 combinations of phase and neutral resistance. This was done to generate more than 10,000 fault scenarios: specifically, 38 fault locations with 288 scenarios per location comes to more than 10,000 scenarios.

      Why did we aim for such a large number of scenarios? Well, for machine learning, typically the more data, the better, and 10,000 seems like a reasonable target, although we can, of course, generate more if needed. I should also note that we generated normal data, meaning data from simulations where no fault was applied and we changed only the load values, using a normal distribution profile on each load. I'll now switch to MATLAB to show you the model and script we are using to generate the simulation data.

      Let's first look at the simulation model that we developed in Simscape Electrical Specialized Power Systems. The distribution network is connected to the grid through a distribution transformer. And there is a 15-kilometer three-phase line that I split into three portions of five kilometers each. And also, there is a branch, a two-phase branch, there. So I labeled all these blocks L1, L2, L3, and L4. There are also some single-phase distribution feeders connected to the network. I have six of them, labeled L1ph1 and so on.

      We also have bus bars to measure the voltages and currents at five locations in the model. They are labeled B1, B2, B3, B4. We collect all the signals in here. And for each fault, we are measuring the positive, negative, and zero sequence components. We collect all the signals in one output. Let's have a quick look under the three-phase system. You see that I split it into sections one kilometer apart using this block, and this is where we've specified the line parameters. And this allows me to have access to five or six points on the line, so I'll be able to put a fault in there.

      The purpose is to have a fault block that I can program for all fault types and fault impedance values. So I'll be able to set up this block, and then the script will move it along the line to all the locations we just saw before. So it will allow me to apply the fault at the 38 locations I mentioned earlier. And of course, the same applies for the single-phase lines.

      Now, let's have a look at the script we used to generate the simulation data. Let me go full screen, here we go. OK, this is where I specify general parameters and where I define the type of fault. For now, I'll program only an AB-to-ground fault with the dedicated parameters. Here, I'm listing all the 10 lines I have in my model where I will apply the faults, using special labels that we can use later on.

      Here, this is where we can specify the lines. So now, for this demonstration, let's say that we will use only a couple of them, the single-phase line and the two-phase line, just to show you the principle. After that, this is where, for each line in the list, I will add the fault block to it. And I will do some settings on it depending on whether we are doing a three-phase fault or a single-phase fault; I have to set up the block accordingly.

      Then if we go down: for each section, each insertion point, I'll add a line. So I'll connect the fault block to the location where I would like to apply the fault. And for each fault location, I will apply a bunch of Rphase and Rneutral values. So this will give me a lot of faults, typical faults, at this location.
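
      The loop Patrice describes might look roughly like this. This is a hedged sketch, not Hydro-Québec's actual script: faultLocations, connectFault, and packResults are hypothetical placeholders, and the fault block's parameter names are assumptions.

      % For each fault location, sweep the phase and neutral resistances and
      % log one simulation per combination.
      Rphase   = logspace(-1, 2, 16);    % illustrative resistance grids
      Rneutral = logspace(-1, 2, 18);    % 16 x 18 = 288 combinations
      results  = table();
      for k = 1:numel(faultLocations)    % 38 locations in the study
          connectFault(mdl, faultLocations(k));   % wire fault block to section
          for rp = Rphase
              for rn = Rneutral
                  set_param([mdl '/Fault'], ...
                      'FaultResistance',  num2str(rp), ...
                      'GroundResistance', num2str(rn));
                  out = sim(mdl);                 % run one fault scenario
                  results = [results; packResults(out, k, rp, rn)]; %#ok<AGROW>
              end
          end
      end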

      So in the next step-- let's see here, let's just simulate one, just to show you the principle; we'll go faster through the simulation. The next step is to launch the simulation, and after the simulation, we save the simulation data in a table, so it will be available afterwards. Now, let's start-- whoops, there is a stop. Let's continue. You should see the fault block appearing in this subsystem. This is the L1 line. Here we go, it is connected to the first point where I would like to apply the fault.

      The model is compiling, simulating, and then I'm going to the next section: do the same settings, apply the fault, collect the simulation data, et cetera. Let it run for the rest of the line. We are now starting simulation number four.

      And once we are done with this L1 line, it closes automatically and then opens the single-phase line. Same thing here: I add the block and apply the fault at the first position. Then simulate, get the results, go to the next. There will be two more simulations, this one and then the last one. So let's now go to the MATLAB command window. You can see here the fault locations and the table of data generated.

      Let's have a quick look at the simulation results. For example, the first one I did, the first fault, on the bus B1, here the ABC phase magnitudes, just to show you the sequence parameters I'm computing and getting. And the same for the last simulation I did at the very same bus. I'll now pass back to Graham, who will discuss using the machine learning tools.

      Thank you, Patrice. So once the fault data is generated, we then organize it in a MATLAB table. The table includes sequence data for each bus voltage measurement and also the fault classification. The example we see here shows only a few data points for illustrative purposes. We have bus 1 sequence data for magnitude and angle, and also the fault classification. For this example, we generated data for over 10,000 scenarios.

      The Classification Learner is a user interface that comes with the Statistics and Machine Learning Toolbox. So I'm going to open up the Classification Learner in a moment, and I can show you some of its capabilities. I would note that I'm not going to give a comprehensive overview, so following what I show, if you would like more information, I would encourage you to refer to the documentation.

      The first thing I'm going to do is load the data set and invoke the Classification Learner. I'm using projects here so I can organize my files and create shortcuts to help me better manage my workflows. So I'm going to click Get Average Training Data. What that will do is load the data and then invoke the Classification Learner. If you would like more information on projects, please refer to the documentation. I will just expand the Classification Learner to the full screen.

      In the Classification Learner, I first start a new session and load data from the workspace. In this case, my data is in the MATLAB table T. Now, it was the only variable in the workspace, so it automatically picked that one up. If you have multiple data set variables in your workspace, you would select one. In this case, I don't have to do that. You'll also see that the data has been parsed automatically, and the fault column is essentially categorical, with 39 unique classifications. I'll remind you there are 38 fault locations and also one normal classification.

      So because it's essentially categorical, the Classification Learner has automatically picked up fault as the response data. And the other variables within the data table T are chosen as predictors. Now of course, you have control over this. If the Classification Learner doesn't pick up the right information, you can select appropriately. But in this case, it does exactly what I want it to do.

      What we now do is select what we want to do with validation. There are two options: cross-validation, which separates the data into a training and testing set using statistical methods, or holdout validation, which will put aside a certain percentage of the data for testing and then use the remaining data for training. We'll stick with the default setting, which is to use cross-validation with five folds. We then click Start Session. You can see that we have defaulted to a scatter plot, which in this case is showing bus 1 voltage magnitude for the positive sequence versus bus 1 voltage magnitude for the negative sequence.
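
      For reference, the same setup can be reproduced programmatically; this is a sketch (the app can also generate equivalent training code for you), and the table name T and the KNN settings are illustrative.

      % Train a KNN classifier and estimate accuracy with 5-fold cross-validation.
      mdl = fitcknn(T, 'fault', 'NumNeighbors', 10);   % illustrative settings
      cv  = crossval(mdl, 'KFold', 5);                 % 5-fold cross-validation
      acc = 1 - kfoldLoss(cv);                         % estimated accuracy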

      There are a couple of observations I would like to make here. First, normal operation, where no fault is applied but where we are varying load values, is seen to be very clean. It's actually this small region here down at the bottom right. If I just scroll down on our classes, we see normal. It's the red one. If I hover over, then we'll actually get some information on the data points that are selected. So you can see here, class normal.

      So we can see that normal behavior is very clean and that we have a tight distribution and we do not see any overlap with any of the fault conditions. We would expect normal operation to be readily classified in this case. The second observation is that while we can see a pattern on the fault data, we also see overlap of data points meaning that classification through traditional engineering analysis would be challenging.

      Let us now present this data to machine learning algorithms and see what we can achieve. The place I always start is to select All Quick-To-Train. What this will do is select a number of machine learning algorithms which, for the data set I'm presenting, will train in a relatively quick amount of time. If I then select Train, this will automatically invoke a parallel pool if you have Parallel Computing Toolbox installed, which will allow the training algorithms to benefit from multiple cores.

      And we can see, now, we have a number of different algorithms going through the training process. So we'll just let a few of those go through. As you can see, as they are finishing, the accuracy is coming up, and the best model is going to be highlighted by the white box. So right now, we have an accuracy of 67.9%. We will just let this run a little bit more to see if we can do better. Fine KNN is at 75.9%. 80.6% on the medium KNN, so that's pretty good.

      So we'll just let the other ones finish training. That's our best one so far. So what does the 80.6% mean? To gain more insight on this number, we can view the confusion matrix. So we go here and select Confusion Matrix. The confusion matrix shows us how the training data is performing on the trained classifier.

      We see that we have true class versus predicted class. If we had a perfect classifier, we would see only diagonal entries on this matrix, and we would also be a little bit skeptical of the results. Perfect classification of training data may mean you have overfit the classification algorithm or that you have some data quality issues.

      In this case, we can see that we have some areas with a distinct off-diagonal pattern in the classification. Looking at the numbers associated with the training sets, we can see that we've got a large number associated here with L1ph3 and L1ph2, and also here with L1ph4 and L1ph5. So the classifier is struggling to distinguish the faults on lines L1ph2 and L1ph3, and is also struggling to classify the faults on L1ph4 and L1ph5.
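
      Outside the app, the same inspection can be done with cross-validated predictions; this sketch continues the hypothetical variables from the earlier snippet.

      % Compare true and predicted fault classes; forked-line confusions show
      % up as off-diagonal blocks in the chart.
      pred = kfoldPredict(cv);
      confusionchart(T.fault, pred);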

      This issue forces us to go back to the physical system and determine whether there are physical characteristics that are contributing to this result. So let us consider what's happening with our system. The voltage measurements we are taking are all upstream of forked lines. The forked lines have equivalent electrical characteristics. This means that if a fault occurs on a fork, say at location F1 in this illustrative example, then the voltage measurement indicated by Vmeas, while it can detect the fault, cannot distinguish whether the fault is at location F1 or F2.

      Let's look at the system model again so I can show you the forked lines. OK, so we are having the issue with L1ph2 and L1ph3. If I just go under L1ph2, you see that we have four segments here. And if we go to L1ph3, we have two segments. But the lines fork at this node, which is here. So we do indeed have a fork with the same electrical characteristics, and hence this is why we are having difficulty with the classification on L1ph2 and L1ph3. The same goes for L1ph4 and L1ph5; we have the same setup in this case.

      So we'll just zoom in a little bit more on those areas of the confusion matrix where we were having the difficulty. We see we have significant off-diagonal classification, which is erroneous because of the forked lines. So what can we do to improve this situation? One solution is to make additional voltage measurements at the end of a fork. Note that in general, we need y minus 1 additional measurements, where y is the number of forks. And so with two forks, which is the situation we have on our system, we need only one additional measurement.

      So we updated the simulation model to include additional measurements: in this case, a measurement on L1ph2 to help distinguish L1ph2 and L1ph3 faults, and a measurement on L1ph4 to help distinguish L1ph4 and L1ph5 faults. So we'll now load up the new data set with the additional measurements and retrain the classification algorithms. We are now training on the new data set. Now remember, the last time, when we did not have these additional measurements and the forked lines were an issue, the best result we had on All Quick-To-Train was 80.6%. So let's just let this go through and we'll see what we can achieve.

      75% so far on the fine tree. We'll just give it a few more seconds to let one or two more train up. 91.9%, so we're already getting a better response. But the proof is in the pudding; we'll have to look at the confusion matrix to see if we have helped resolve the particular issue we had.

      So let me select either fine KNN or cosine KNN; they have equal accuracy. They may have slightly different results, but I'll just choose one to take a look at here. We will look at the confusion matrix. Actually, let me try that again. We actually have three with the same results, so I'll now choose the weighted KNN. I'll just select that, and we'll take a look at the confusion matrix.

      So we can now see-- you may remember that we had a significantly larger off-diagonal component here when we were looking at L1ph2 and L1ph3-- that we have significantly better classification than we had before. So the introduction of those additional measurements on L1ph2 and L1ph4 has helped us achieve a greater level of accuracy. And that also helps build our confidence that the effect we were seeing was indeed caused by the forked lines.

      So I'll just make a couple more points here. I've only used All Quick-To-Train, but with the Classification Learner, you have access to a broader range of models, and it may well be that you want to take a look at support vector machines. I typically use quadratic support vector machines because I find them to be more accurate, but they will take longer to train. So I'm not training one here in this presentation, because it does take a longer amount of time. But typically, you would see more accuracy with that.

      Another point is that when you do have a trained model, you can press Export Model, and then you can select a name for your model and just click OK. I'll then go to the MATLAB workspace. So you can see here that we have the trained model in the workspace, and it also shows you how to call it within the MATLAB workspace.
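
      Once exported, the model struct carries a predictFcn handle, which is how the Classification Learner exposes it; the names trainedModel and Tnew here are hypothetical.

      % Classify new measurements with the exported model. Tnew is a table
      % with the same predictor columns used for training.
      yfit = trainedModel.predictFcn(Tnew);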

      So for the subsequent results I'm going to show in this presentation, I'm using MATLAB scripts to be able to do it. I'm not going to show you the MATLAB scripts, as they're just lines of code; I'd rather focus on the results in this presentation. But we can provide scripts to those who want to take a closer look at these workflows.

      We'll now consider training using only edge cases. The reason we're doing this is to give us some insight on the type of data we need to successfully train a classification algorithm. Particularly, can we achieve accurate results on reduced data sets?

      There are three cases we'll consider. First, training on fault data gathered only from the first line sections. Second, training on fault data gathered only on the last sections, and third, training on fault data gathered on both the first and last sections.

      We can see from the confusion matrices shown here that we get very accurate results for classification on the data provided. This is to be expected. The question is how will the classifiers respond when fault data from other sections are passed through these models? We'll take a look at a couple of lines to explore what happens with this particular system. Let me first orient you on the confusion matrices you're seeing.

      So let's focus on the results on the right, trained on the first section. This means we do not have predicted classes for anything other than section 1, which is why, if you look at the columns here, you'll see section 2, section 3, and section 4 are empty. That's to be expected, because we did not train on those. So given this, a result on the diagonal is best, because we have data for that scenario. If we classify in the green box, that means we've identified the correct line.

      So for example, we look at L1ph4 section 2. That is a true class that was not trained on, since it's not first-section data. It's been identified as L1ph4 section 1. The line is the same, hence the green box. And that's the best we can do for sections we have not trained with: to at least have them classified on the correct line. Anything outside the green box means that we haven't identified the correct line.

      So we can see, by looking at those three different edge cases, that we do not get satisfactory results. For example, the behavior of faults on section 1 of L1ph4 does not contain sufficient information to extrapolate so that a fault on another section of L1ph4 can be identified as belonging to that line. Here's another example, L1ph6. This line has only two sections, and we see that while training on the last section, the middle response here, yields accurate results for identifying the correct line, training on the first section is not accurate.

      So when we look at these results and also other results which I'm not showing here, we conclude that we need a broad range of fault scenarios across every line section in order to accurately classify the fault locations. This perhaps comes as no surprise, but the remaining question is what level of granularity do we need on the line sections to achieve acceptable levels of accuracy? This question is outside of the scope of this presentation but can certainly be explored through the generation of synthesized data from simulation models.

      So in conclusion, the results of this study are encouraging. We've shown that classification machine learning algorithms can be used to classify fault locations with a relatively high degree of accuracy. We saw that forked lines are problematic for upstream measurements, and so in this case we recommend additional measurements at the end of a fork. By making these additional measurements, we are able to achieve much better accuracy on fault location classification.

      We also took a look at training on reduced data sets. And we found, in this example, that training only on the first and last sections is insufficient to locate the correct line with an acceptable degree of accuracy. What that means is that a broad range of synthesized data is necessary to effectively train machine learning algorithms. Thank you.

      Hello, my name is Steffen Ziegler. I'm the director for signal analysis in artificial intelligence at IMCORP. Today, I am giving you an update about signal waveform classification in partial discharge applications for underground power cable systems.

      So in this abstract, I would like for you to understand that underground distribution cable system failures can be predicted. Each year, millions of people and thousands of businesses are impacted by underground cable system failures. For over 40 years, cable and accessory manufacturers have used offline 50 and 60 hertz partial discharge testing with specific measuring sensitivity levels, measured in picocoulombs, as a quality control standard.

      Partial discharge is a phenomenon that occurs inside the cable insulation and cable accessories, such as splices and terminations, long before a cable system failure occurs. Measuring partial discharge on-site in the field is actually a standard process to assess the condition of the cable insulation and the workmanship of the cable system installation.

      How do we get to predictive maintenance? Well first, by identifying cable defects before failure; and for that, we need to understand how cable system failures occur. By cable system we include, obviously, the cable itself and the accessories, such as splices and terminations. Secondly, differentiating cables with high-to-low risk defects, along with those that are defect-free, enables predictive cable system maintenance.

      Over 99% of solid dielectric cable system failures are associated with partial discharge. Most of the cables that have been installed underground in the past 30 or 40 years are actually solid dielectric cable systems, which means they're made out of either cross-linked polyethylene or EPR, which is ethylene propylene rubber. The IMCORP database contains information, including a vast amount of signal waveforms, from over 210,000 tests, which is approximately 250 million feet of tested conductor length.

      Next, I'm going into different approaches for maintaining cable system assets, and I'll start here with the reactive approach. So why is predictive maintenance a desirable approach? Well, let's look at the reactive approach first. The cable system failure causes an unpredicted service interruption. What you see here is an actual cable failure. So the result of it is loss of service, which impacts the customers, and emergency and trouble crew mobilization, usually at unexpected or really inconvenient times, at night or on weekends, which is an O&M expense.

      Reliability indices are impacted-- SAIDI, SAIFI, CAIDI, or MAIFI-- and worst of all is the potential collateral damage and safety issues. Just one example: think of the manholes that could fly up in the air during such a cable failure and then come back down on the street again. So let's get to the types of maintenance. The first type, the reactive type of maintenance, I've already explained on the previous slide.

      The second type of maintenance is called preventive maintenance, meaning someone does maintenance at a regular rate. The problem is that unnecessary maintenance can be wasteful. Plus, we do not know the time interval that is optimal to do the maintenance, and it may not eliminate all failures. You might be able to do maintenance at regular intervals and catch the problem, or you come too late and the outage has already occurred. Then, the most desirable type of maintenance is predictive maintenance: forecasting when problems will arise. The problem with that approach is it's difficult to make accurate forecasts for complex equipment.

      On this slide, I'd like to explain how preventive maintenance can be applied to partial discharge defect measurements. So I will explain how a defect evolves and eventually becomes a failure. The defect alone is not the real danger, it is the failure that occurs from that defect.

      So on this graph, you see on the x-axis the lifetime variable, which is pretty much time. And on the y-axis is the condition of a partial discharge defect. And the condition, in this case, is measured by PDIV, which stands for partial discharge inception voltage. It's the voltage that makes a defect that's dormant in the cable insulation active.

      So as you can see here, the cable is operated at operating voltage, here denoted with U0; 1 times U0 is the operating voltage, where the red dashed line is. All the other voltages that go above here are not the operating voltage; those are actually transient overvoltages. So what is a transient overvoltage?

      Well, it's a voltage other than the operating voltage, superimposed on the operating voltage for a few microseconds or milliseconds. One example is lightning: a lightning strike can elevate the voltage on a cable for a few microseconds or milliseconds to levels up to two and a half, or three, or three and a half times the operating voltage, depending on how the cable system is being secured.

      The blue color on top here indicates it's a rare occurrence to have transient overvoltages at this level. The red color means the occurrence of transient overvoltages is very frequent. So let's look at a dormant defect in the cable insulation. A transient overvoltage that's produced by a lightning strike reaches that particular level. The dormant partial discharge defect becomes active, and because it's active, it continues to grow a tiny bit.

      So the defect continues to grow a few more micrometers in the insulation for a short amount of time. The next time a transient overvoltage occurs, the level doesn't have to be as high anymore in order to activate the defect again, and so on. The more often a defect has been activated, the lower the partial discharge inception voltage becomes.

      And so on and so forth; over time that curve continues to drop, and it continues to drop faster as we get into the area of high occurrences of transient overvoltages. So these transient overvoltages could come from switching, like when you switch a cable in and out, which naturally occurs when this has to be done for load flow considerations; from lightning strikes; or from a device that cable network owners use to find cable defects with, which also produces a pulse-type overvoltage that may activate dormant defects.

      So at one point in time, the condition-based assessment as we do it is performed, and we find out there are partial discharge sites in a cable. And now the question is, after this condition-based assessment at that point in time is over, what will happen to those partial discharge defects?

      Now, interpreting partial discharge related signals. These defects are very small, as I mentioned before, micrometers to a few millimeters in range. So you can imagine the electromagnetic signals they emit when they're active are also of a very small order: a few millivolts, sometimes even in the upper hundreds of microvolts. So when somebody looks at those signals and tries to interpret them, it can become a complex task.

      But the task is complicated even more when the partial discharge signals that are being looked for are superimposed with other, noise-type signals. So interpretation depends on human decision-making, and obviously, human decisions vary between different humans. We don't want this decision-making to vary. Then also, a scalable operation cannot depend on a human process for interpretation. Obviously, there is a time constraint, and fatigue will make the human decision-making vary.

      So it is very desirable that a system can continuously learn and become better at interpreting. A system that can continuously adapt to changing inputs and can also track data trends would be a very desirable outcome. So the goal is developing models, machine learning models or deep learning models, that can actually predict whether a signal is partial discharge or is not partial discharge, and classify the signals accordingly.

      On this slide, we see how an ideal model should behave for us. This slide is a confusion matrix, which has, on the top left, the true positive partial discharge class and, on the lower right, the true negative class, which is, let's say, noise signals or other signals that are not related to partial discharge defects. Those would be the only two classes that a great model, an absolutely great model, would classify signals into. But no system is perfect, and we know there will be two other classes.

      So on the top right, we see the false positive class. This is the class that is being considered by an algorithm to be partial discharge but in actuality, it is not. And then we see, on the lower left, the false negative class, which is the class that an algorithm actually classifies as not partial discharge but in actuality, it is.

      So a false positive class can be tolerated if the occurrence level is low. The complication with the false positive class is that it makes the review, or the supervision, of the positive class a little more difficult, because that class is then not made up only of true positive partial discharges, but also of false positives, which are actually non-partial-discharge signals.

      The false negative class is a little more tricky, because that means we have lost some of the actual true partial discharges and categorized them in the negative class, so they're not being considered anymore. So we would like this class to be as small as possible. In other words, the recall rate should be close to one. So the true positives and the true negatives should cluster well, and that means the model would have an accuracy close to one, or 100%.
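
      In terms of the confusion-matrix counts, the metrics mentioned here are the standard ones (a general formulation, not specific to IMCORP's models):

      % TP, TN, FP, FN are the four cell counts of the binary confusion matrix.
      recall    = TP / (TP + FN);        % should be close to 1 (few lost PDs)
      precision = TP / (TP + FP);        % penalizes false positives
      accuracy  = (TP + TN) / (TP + TN + FP + FN);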

      On this slide, I'd like to explain what two major approaches can be considered for this task. So machine learning is one consideration, deep learning another one. Let me quickly talk about machine learning. So machine learning can generally be classified into two main classes.

      One is unsupervised learning, where no labeled data is available. And the other class is supervised learning, where labels for the features are actually available. So in supervised learning with machine learning, you can classify, which is what we want to do. Or you can even do regression analysis, for instance, if you want to go further and assign probabilities to certain class types.

      And then the next approach is deep learning. Deep learning, as opposed to machine learning, does not involve feature extraction. So that's a big advantage if you consider deep learning. Because in machine learning, feature extraction is necessary, and that depends on how we choose features and how we define features. In deep learning, that task is not necessary. So the next few slides are actually talking about a deep learning approach: an approach where no feature extraction was defined, but the algorithm internally dealt with all of the features that were internally derived.

      Let's look quickly at two deep learning workflows for signal applications. On top, you see the convolutional neural network application, where signal waveforms can be transformed into images, for instance via a time-frequency transformation. And then those images, for the different signals, are pushed through the network with different layers, where each layer is composed of a convolution layer, a rectified linear unit, and a pooling layer, and eventually comes to a fully connected layer at the end that supports classification.

      So this would be the CNN, or Convolutional Neural Network, approach. The other approach for deep learning could be a Long Short-Term Memory, or LSTM, network for time-series data. In the next slides, I'll explain how the LSTM approach was used to classify partial discharge signal waveforms.
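
      A minimal sketch of such an LSTM classifier in MATLAB; the layer sizes are illustrative, not IMCORP's production network.

      % Variable-length, single-channel waveforms in, two classes out
      % (PD vs. not PD).
      layers = [
          sequenceInputLayer(1)                   % one voltage channel
          lstmLayer(100, 'OutputMode', 'last')    % summarize the whole sequence
          fullyConnectedLayer(2)                  % two classes: PD / not PD
          softmaxLayer
          classificationLayer];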

      So we have chosen a recurrent neural network, or an LSTM-based network, to apply to the partial discharge signal waveforms for classification. There were a few challenges that we needed to consider. One was different-length time series for different lengths of cables: the longer the cable being tested, the longer the time series of a partial discharge becomes. So since we're trying to classify valid partial discharge signals from short and long cables at the same time, we needed to prepare the data in a specific way.

      Another challenge was the dynamic range: the dynamic range of the time series spans about three orders of magnitude. We see signals that are a few millivolts in magnitude and signals that might be up to a volt in magnitude. And then, also, the labels for the features might have some noise in them. Even though they were assigned by humans, humans make mistakes too, and we might not be able to totally rely on the labels as a ground truth.

      It is very difficult to find out what a deep learning network actually does and how it makes decisions. It's almost like a black box. However, there are a few tools that give you some insight into the decision-making of a deep learning network. In this case, we see, on top, a partial discharge waveform. And at the bottom, we see a hidden unit of the RNN network, which tries to visualize how the decision has been made on the signal. In this case, the outcome was one, which means it was an actual partial discharge signal.
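
      One way to get this kind of insight, sketched here under the assumption that the trained network net contains an LSTM layer named 'lstm', is to extract that layer's hidden-unit activations for a given waveform x and plot them alongside the signal:

          % Inspect hidden-unit activations of a trained LSTM for one waveform x
          act = activations(net, x, 'lstm');   % hidden states over the time steps
          subplot(2,1,1); plot(x);             % the PD waveform itself
          subplot(2,1,2); plot(act.');         % each hidden unit's response over time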

      On this slide, I'm going to present the end-to-end workflow for our LSTM training for PD classification. It all starts with the data acquisition. The acquisition is performed on a cable in the field, where a digitizer is collecting the signal waveforms. Next is data pre-processing. The data cannot just be taken as is; it needs to be pre-processed, and then we can start transforming the data. So everything in this box here, in that square, is on the software side.

      We need to transform the data, we need to choose the right hyperparameters as an initial guess, and then we start tuning those hyperparameters, maybe by using grid search. We do model training of the chosen LSTM model. We do batch inference to make sure the model is generalizing well. And then we do model evaluation. So different models have been chosen, and different parameter settings have been chosen. On some of the next slides, we show which models actually performed best.
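
      A minimal sketch of such a grid search, with hypothetical candidate values and a hypothetical helper evaluateModel standing in for the train-and-validate step:

          % Hypothetical grid search over two LSTM hyperparameters
          hiddenUnits = [50 100 200];          % assumed candidate values
          learnRates  = [1e-3 1e-2];
          best = struct('acc', 0);
          for h = hiddenUnits
              for lr = learnRates
                  acc = evaluateModel(h, lr);  % user-defined: train, then validate
                  if acc > best.acc
                      best = struct('acc', acc, 'hidden', h, 'lr', lr);
                  end
              end
          end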

      As I mentioned before, there were some challenges to consider for our specific case. Here, I want to show the workflow for model training for the LSTM network. As you can imagine, different cable lengths produce different lengths of time series. So first, we had to do data pre-processing and augmentation. Then you have to do the correct setting of training options for the network, and then sort the signal waveforms.

      As you can see, the different batches here, illustrated in the different blocks, contain signal waveforms of different lengths, those thin blue lines. Each batch, or mini-batch, here has to have one unified length. That means there needs to be some padding for all of the signals.

      Now, we can do that in an unsorted way, or we can sort the signals so the mini-batches contain waveforms of similar length. With sorted sequences, it is much easier to pad each signal with only the necessary amount of slack. So correct sorting is a very important step, followed by normalizing the sequence data.
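
      A minimal sketch of this sort-before-padding idea, assuming the waveforms are stored in a cell array XTrain with matching labels YTrain:

          % Sort sequences by length so each mini-batch pads with minimal slack
          seqLen = cellfun(@(x) size(x,2), XTrain);  % length of each waveform
          [~, idx] = sort(seqLen);
          XTrain = XTrain(idx);
          YTrain = YTrain(idx);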

      As I mentioned before, the dynamic range spans about three orders of magnitude, so we need to normalize the sequence data for the network to actually be able to converge and classify correctly. Also important is finding the right mini-batch size. If it's too small, the model might not converge well; if it's too large, memory issues on a typical computer might kick in. It's also important to shuffle the data. And finally, the network is trained.
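
      These choices map onto MATLAB's trainingOptions; the values below are illustrative assumptions, and note that with pre-sorted sequences, per-epoch shuffling is typically disabled so the sorting is preserved:

          % Normalize each sequence (dynamic range spans ~3 orders of magnitude)
          mu = mean([XTrain{:}], 2);
          sg = std([XTrain{:}], 0, 2);
          XTrain = cellfun(@(x) (x - mu) ./ sg, XTrain, 'UniformOutput', false);

          % Hypothetical training options reflecting the choices above
          options = trainingOptions('adam', ...
              'MiniBatchSize', 128, ...        % assumed; balances convergence and memory
              'SequenceLength', 'longest', ... % pad each mini-batch to its longest member
              'Shuffle', 'never', ...          % keep the length-sorted order intact
              'Plots', 'training-progress');
          net = trainNetwork(XTrain, YTrain, layers, options); % layers: e.g., the LSTM stack above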

      So here is the training and validation data. We used 59,964 labeled sequences. The sequence length in sampling points varies from 200 to 800 points, and the mean sequence length is 381 sampling points, as illustrated here in this histogram: the x-axis is the sequence length in sampling points and the y-axis is the number of records available. When you integrate this out, you will find that you have 59,964 sequences.
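
      A quick sketch of reproducing such a histogram, again assuming the sequences sit in a cell array XTrain:

          % Distribution of sequence lengths across the labeled data set
          seqLen = cellfun(@(x) size(x,2), XTrain);
          histogram(seqLen)
          xlabel('Sequence length (sampling points)')
          ylabel('Number of records')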

      On this slide, I'd like to explain how we arrived at the best models. We started with ten-plus models, chose initial hyperparameters for them, and then optimized and tuned those hyperparameters to the best levels. At the very end, four top models with the best hyperparameter and model parameter settings were left.

      And all of those remaining four top models had very similar performance. On this slide, you see the MATLAB Deep Learning Toolbox interface that allows you to see how a model is being trained, what the progress is, the accuracy, and also the loss function over the number of iterations, or the number of epochs, as you can see printed here.

      So what was the outcome of those models? What is the confusion matrix for binary classification that the best model produced? Again, we had 59,964 total signal waveforms, and it's a binomial classification. We see that we achieved an accuracy of 89.77%.

      The recall rate is actually 80.71%, which corresponds to a false negative ratio, taken over all signals, of about 8.49%. We'd like to continue to train models where that rate becomes lower and the recall goes close to one. We have a false positive rate that is pretty low, 1.75%.
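
      Read this way, the figures are internally consistent: the misclassified share is 8.49% + 1.75% = 10.24% of all 59,964 waveforms, leaving an accuracy of 100% - 10.24% = 89.76%, which matches the reported 89.77% up to rounding.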

      The precision here is 95.32%, and that's very satisfactory, because it means we're not mixing unnecessary non-partial discharge signals into the true partial discharge signal pool. So for now, we say that's an acceptable model for PD classification. It's not a model that can be left alone to make all the decisions, but it's a great model to take most of the work away from a human and do all the legwork, while the human acts as a supervisor of the process.

      On this slide, I'd like to show how the predictive LSTM model can be enhanced. Here's a graph that shows, on the x-axis, the number of training instances and, on the y-axis, a probability defined between zero and one. As you can see here, the left side is green; that's the partial discharge class. The shaded right side is the non-partial discharge class. And then we have the gray zone in the middle, pretty much where the probability of an instance drops below 80% but is still above 20%.

      Ideally, that transition would be much steeper and much shorter over the instances. So why is it that there is some larger "gray zone," quote unquote? We have two options: we can move the decision threshold for the binary classification either to the right or to the left. If we move the decision threshold to the left, that means most of the signals being classified as partial discharge are actually partial discharge.

      But we might push a lot of partial discharge signals into the non-partial discharge pool, and vice versa if the decision threshold is pushed to the right. So we can continue to train our models to be better, we can improve the incoming signal data quality, and we can maybe run a second model that is specifically trained in this zone. What is necessary for that, obviously, is correct labeling. And as I mentioned before, there's some uncertainty as to whether all the labels we were given for this problem are correct.
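
      A minimal sketch of moving that decision threshold, assuming predict returns per-class scores for the trained network net and that column 1 (an assumption) holds the partial discharge probability:

          % Shift the binary decision threshold away from the default 0.5
          scores = predict(net, XVal);         % per-class probabilities, N-by-2
          t = 0.8;                             % assumed stricter PD threshold
          isPD = scores(:,1) >= t;             % column 1 assumed to be the PD class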

      On this slide, I'd like to talk about enhancing a predictive model. There are a few challenges that we need to consider. First of all, some signal waveforms appear to be ambiguous, the ones in the gray zone from the previous slide. So there's either an improvement necessary in the instrument, or in how the signals are being collected from a hardware standpoint. Then the data pre-processing: maybe some of the signal features can be enhanced by digital signal processing. And the noise in the labels, meaning that the confidence in the labels might not be 100%, has to be considered as well.

      So we want to try different methodologies in the ambiguous area; that's one of the next steps we're going to perform. Then we'll move the decision thresholds and see what impact that has on the outcome of the model, on the confusion matrix. And multiple, competing models can actually run in parallel to give binomial outputs; then we can use a voting classifier to find out which outcome is the most likely one that should be trusted. Those are all future enhancements that we are going to work on.
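
      A minimal sketch of such a majority vote across parallel binary models, where the three hypothetical 0/1 prediction vectors are assumed to come from independently trained models:

          % Majority vote across three hypothetical binary classifiers
          votes = [pred1, pred2, pred3];       % N-by-3 matrix of 0/1 predictions
          finalIsPD = sum(votes, 2) >= 2;      % call it PD if at least two models agree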

      On this slide, I'd like to present a different model approach, one that is based on machine learning. As I mentioned on some of the previous slides, for machine learning, feature extraction is actually necessary and has to be defined by humans; in the deep learning approach, feature extraction is not necessary.

      In this example, which is a different example from a different data pool, we used approximately 350,000 labeled waveform records. A feature extraction unit was designed to pull 43 features from each instance, but only eight were chosen as the predictive, or independent, variables.

      We used boosted trees, an AdaBoost ensemble, and trained the machine learning model. Here you can see the confusion matrices. The overall accuracy came out to be about 94.2%. That's actually a very good outcome, and we think that the deep learning approach and the machine learning approach could run in parallel as competing models to determine whether a partial discharge signal falls into the true or the false category.
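
      A minimal sketch of training such an ensemble with MATLAB's Statistics and Machine Learning Toolbox, assuming X is a table of the eight selected features and Y holds the binary labels:

          % AdaBoost ensemble of decision trees on extracted PD features
          t = templateTree('MaxNumSplits', 20);  % assumed depth control for weak learners
          mdl = fitcensemble(X, Y, 'Method', 'AdaBoostM1', 'Learners', t);
          cm = confusionmat(Y, predict(mdl, X)); % confusion matrix for the trained model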

      So before we come to the end of this presentation, I would like to offer a couple of conclusions. First of all, the signal classification for partial discharge time series has been successfully implemented in the daily analysis process. The enabling tools are from MATLAB: we used the Deep Learning Toolbox, the Signal Processing Toolbox, and the Statistics and Machine Learning Toolbox. We have machine learning models and deep learning models available, and they have been implemented in the process of classifying partial discharge.

      So what does this lead to? What is the benefit of all of this? The average time saving for one analysis record is 51.7%: one analysis record used to take, on average, 15.7 minutes, and it came down to 7.6 minutes by applying the automated classification. This is directly translatable into cost savings and is also very important for resource and task optimization. The human becomes a supervisor of the process rather than performing the process themselves.

      Another benefit we didn't even think of at the beginning of this endeavor was quality outlier detection. As these models were developed, we found that we now have the opportunity to detect anomalies or outliers and deal with them. Sometimes these anomalies and outliers had specific root causes that we could investigate in order to improve the systems.

      So tracking variables and alerting users when outliers occur is very important; this prevents data quality issues. These models also calculate key performance metrics that are visible in a web-based dashboard. Thank you very much for your attention. I'm now ready to take your questions and try to answer them.
