Description

Signal Analysis and Feature Extraction for AI with Wavelets

Overview

Building AI models with signal and time-series data has become very popular for advanced applications in predictive maintenance and health monitoring, automated driving systems, financial portfolio management, biomedical systems, and many others. Robust signal analysis, preprocessing and feature extraction techniques are critical to building these models.

Analyzing physiological, speech, vibration, and other non-stationary signals with traditional Fourier-based signal processing techniques can be challenging. Wavelet-based techniques can help address the limitations of these techniques and help build better AI models.

In this session, through detailed examples, you will learn how to perform:

Wavelet analysis with MATLAB apps, without needing to be an expert
Data cleaning and preprocessing with wavelet-based signal filtering
Feature extraction from signal data for machine learning and deep learning workflows, using multiresolution analysis and wavelet scattering

About the Presenter

Esha Shah is a Product Manager at MathWorks focusing on Signal Processing Toolbox and Wavelet Toolbox. She supports MATLAB users working on advanced signal processing and AI workflows. Before joining MathWorks, she received her Master’s in Engineering Management from Dartmouth College and her Bachelor’s in Electronics and Telecommunication Engineering from Pune University, India.

Recorded: 22 Sep 2021

Full Transcript

Hi, everyone. Welcome to the session on Signal Analysis and Feature Extraction for AI using Wavelets. So we will start by discussing the importance of signal processing when you are building AI solutions for signal and time series data, and then I will demonstrate a few key wavelet techniques using application examples.

And we will take a look at how to perform wavelet analysis in MATLAB. Since the use of sensors has become so widespread and there have been so many advances in AI research, more and more AI applications are being developed in different industries like biomedical signal analysis, wireless communications, predictive maintenance, health monitoring, and so many others. And the development of these different AI solutions generally follows this workflow or some variation of it.

The first step is collecting and preparing the data; cleaning the data, preprocessing it, and extracting features are all part of this step. The next step is using the data features to train and test AI models. And then finally, the models are deployed in the field.

Now the pre-processing step can include things like resampling the data or dealing with missing data samples, removing noise, filtering data, finding or removing outliers, et cetera. And feature extraction is the process of extracting the key information from the raw data and using that to train models instead of the raw data.

Now, features can be any type of descriptor of the data. Time-domain features can be patterns or peaks in the signals, and frequency-domain features can be spectral or bandwidth measurements. Time-frequency maps are another popular way to represent data.
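The sketch below illustrates the time- and frequency-domain descriptors mentioned here. It computes an RMS value, a peak count, and a dominant frequency for a synthetic tone. It is written in Python with only the standard library, not MATLAB. The signal, the 100 Hz sampling rate, and the 0.5 peak threshold are illustrative assumptions. The naive DFT is only suitable for short signals.

```python
import math

def rms(x):
    """Root-mean-square amplitude: a simple time-domain feature."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def count_peaks(x, threshold):
    """Count local maxima above a threshold: a crude time-domain peak feature."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if x[i] > threshold and x[i] >= x[i - 1] and x[i] > x[i + 1]
    )

def dominant_frequency(x, fs):
    """Frequency in Hz of the largest DFT magnitude, ignoring DC (naive O(N^2) DFT)."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n

# Illustrative signal: 2 s of a 5 Hz tone sampled at 100 Hz.
fs = 100.0
x = [math.sin(2 * math.pi * 5.0 * i / fs) for i in range(200)]

features = {
    "rms": rms(x),
    "peaks": count_peaks(x, threshold=0.5),
    "dominant_hz": dominant_frequency(x, fs),
}
print(features)
```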

And there are other, domain-specific techniques for extracting features, depending on your application and data type. All of these feature extraction and preprocessing techniques can be performed in MATLAB.

Now, it's important to understand why we perform preprocessing and feature extraction. There are a lot of advantages to doing this. One is that less data is needed for training. Often, if you use raw data directly to train networks, you need very large data sets to get good network performance.

However, feature extraction and data preprocessing can help reduce this requirement, so you can get good network results even with less data. This means that for biomedical, radar, and other applications where data collection is challenging, you can still build good models.

The next is that the size of the data going into the AI model can be reduced. Because features are extracted instead of using raw data, much less data is sent to the network for training. This is very advantageous when you want to deploy your network to small embedded devices, where memory and computational power are constrained.

Another key advantage is improved network accuracy. So signal and time series data can have a lot of variability and dimensionality. And the network can only improve to a certain extent even with training and more data. In this case extracting features and using those for training can have a significant impact on network performance.

Another advantage is that less complex models can be used in some applications. For example, instead of needing deep networks, you could get away with simpler machine learning models. Again, this can be useful when deploying to smaller embedded devices, and it also provides better interpretability into model behavior.

There are several other advantages, but this helps show the value of preprocessing and feature extraction. Now, wavelet techniques can be especially useful in different applications, as we will see. If you're not familiar with wavelets, a wavelet is a waveform of limited duration that has an average value of zero.

Similar to how Fourier analysis decomposes signals into their sinusoidal components, wavelet analysis decomposes signals into wavelet components. However, unlike infinite sine waves, wavelets are localized in both time and frequency.

So how is this useful? Real-world signals are generally non-stationary and contain both slowly varying trends and transients. Wavelets are particularly good at representing such data: by scaling and shifting the wavelets, both the trends and the transients in the data can be captured.
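This Python sketch shows the two defining properties: zero average value, and a localization that moves as the wavelet is scaled and shifted. It uses the Mexican hat (Ricker) wavelet as an example mother wavelet. The 1/sqrt(a) normalization, the integration grid, and the chosen scales and shifts are assumptions for illustration.

```python
import math

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, unnormalized: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def scaled_shifted(t, a, b):
    """psi_{a,b}(t) = psi((t - b) / a) / sqrt(a): scale a stretches the wavelet, shift b moves it."""
    return mexican_hat((t - b) / a) / math.sqrt(a)

dt = 0.01
grid = [-50.0 + i * dt for i in range(10001)]  # t from -50 to 50

results = []
for a, b in [(0.5, -20.0), (2.0, 10.0), (4.0, 0.0)]:
    w = [scaled_shifted(t, a, b) for t in grid]
    area = sum(w) * dt  # Riemann-sum estimate of the integral (the "average value")
    energy = [v * v for v in w]
    near = sum(e for t, e in zip(grid, energy) if abs(t - b) <= 5 * a)
    results.append((area, near / sum(energy)))
    print(f"a={a}, b={b}: area={area:.1e}, energy within 5a of b: {near / sum(energy):.6f}")
```

For each (a, b) pair, the numerical area is essentially zero. Nearly all of the wavelet's energy lies within five scale units of b, which is the localization that an infinite sine wave lacks.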

This makes wavelets a very useful tool for AI applications like anomaly detection, health monitoring, biomedical signal analysis, audio analysis, financial analysis, and others, where data is noisy and non-stationary.

These are the examples that we will look at. First, we will talk about time-frequency analysis with wavelets for feature extraction and apply it to the problem of classifying ECG signals. Then we will look at wavelet denoising and see how to analyze spectroscopy data with this technique.

Next, we will look at multiresolution analysis (MRA) and use it to automate the labeling of earthquake signals. Finally, we will explore wavelet scattering, an automated feature extraction technique, and use it for a fault detection problem.

So let's start with the continuous wavelet transform. Time-frequency analysis is needed because time-domain signals do not directly show frequency content, and power spectra, while they show which frequency components are present, do not show where in the signal those components occur. Joint time-frequency analysis provides localized frequency information, which is useful when the data has sharp peaks or transients.

The short-time Fourier transform (STFT) is a popular technique for time-frequency analysis, and it creates a Fourier-based time-frequency map called a spectrogram. The STFT tiles the time-frequency plane with windows of fixed size. This type of tiling is difficult to use when a signal has both high- and low-frequency components, because the resolution is limited by the window size.

The continuous wavelet transform (CWT), on the other hand, creates a map called a scalogram. This transform is useful because wavelets have variable-sized windows. Because of this, low frequencies can be captured with better frequency resolution, and high frequencies can be captured with better time resolution.
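The difference in tiling can be sketched with a little arithmetic. The Python sketch below is an illustration, not how MATLAB computes either transform. An STFT with a fixed 256-sample window at 1 kHz has the same duration and bin spacing at every frequency. A simplified constant-Q model of a wavelet with a fixed number of cycles (6 here, an arbitrary choice) gets shorter in time and wider in frequency as frequency increases.

```python
def stft_tile(window_len, fs):
    """STFT tile: lasts N/fs seconds with bin spacing fs/N Hz, at every frequency."""
    return window_len / fs, fs / window_len

def wavelet_tile(freq_hz, cycles=6.0):
    """Simplified constant-Q model: a wavelet with a fixed number of cycles lasts
    about cycles/f seconds and spans about f/cycles Hz."""
    return cycles / freq_hz, freq_hz / cycles

fs = 1000.0        # sampling rate in Hz (illustrative)
window_len = 256   # STFT window length in samples (illustrative)

table = []
for f in (10.0, 100.0, 400.0):
    (st, sf), (wt, wf) = stft_tile(window_len, fs), wavelet_tile(f)
    table.append((f, (st, sf), (wt, wf)))
    print(f"{f:5.0f} Hz | STFT {st * 1e3:6.1f} ms, {sf:5.2f} Hz | "
          f"wavelet {wt * 1e3:6.1f} ms, {wf:6.2f} Hz")
```

In both models, duration times bandwidth is constant. Only the wavelet tiles change shape with frequency.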

So what does this mean? Here I have a synthetic hyperbolic chirp signal with two components whose frequencies increase rapidly. If I wanted to analyze this signal in the time-frequency plane, I could use the short-time Fourier transform to create a spectrogram.

Here you can see the two components of the signal. While you can tell that there are two separate components, they are not very well separated in this representation. And even when I vary the time resolution, the two components do not become any clearer.

Instead, if I use the continuous wavelet transform to create a scalogram, the two components are clearly separated, and I can track both of them as their frequencies increase. This is why scalograms work well as features for training AI networks: data with quickly varying frequencies, or with a mix of high- and low-frequency components, is represented very clearly.
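A scalogram is the magnitude of the CWT coefficients over time and scale. The sketch below computes a small one in plain Python using an L1-normalized Mexican hat (Ricker) wavelet, chosen only because it is easy to write down. It is not MATLAB's cwt implementation. The test signal (8 Hz, then 40 Hz, sampled at 400 Hz) and the list of analysis frequencies are assumptions. Each row's wavelet width is set so that its spectrum peaks at the listed frequency.

```python
import math

def ricker(u):
    """Mexican hat (Ricker) mother wavelet with unit width."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)

def sigma_for(freq_hz):
    """Width (s) whose Ricker spectrum peaks at freq_hz: f_peak = sqrt(2) / (2 * pi * sigma)."""
    return math.sqrt(2.0) / (2.0 * math.pi * freq_hz)

def cwt_row(x, fs, sigma):
    """|W(sigma, b)| for every sample b, using an L1-normalized Ricker wavelet.

    W(sigma, b) = sum_n x[n] * psi((t_n - t_b) / sigma) / sigma * dt, with the
    wavelet truncated at +/- 5 sigma and zeros assumed outside the signal.
    """
    dt = 1.0 / fs
    half = int(5 * sigma * fs) + 1
    row = []
    for b in range(len(x)):
        acc = 0.0
        for n in range(max(0, b - half), min(len(x), b + half + 1)):
            acc += x[n] * ricker((n - b) * dt / sigma)
        row.append(abs(acc) * dt / sigma)
    return row

fs = 400.0
n = 400  # 1 s of data
# Test signal: an 8 Hz tone in the first half, a 40 Hz tone in the second half.
x = [math.sin(2 * math.pi * (8.0 if i < n // 2 else 40.0) * i / fs) for i in range(n)]

# Scalogram rows keyed by analysis frequency in Hz.
freqs = [4.0, 8.0, 16.0, 24.0, 32.0, 40.0, 48.0, 64.0]
scalogram = {f: cwt_row(x, fs, sigma_for(f)) for f in freqs}

def mean_magnitude(f, start_s, stop_s):
    """Average |W| for analysis frequency f between start_s and stop_s seconds."""
    seg = scalogram[f][int(start_s * fs):int(stop_s * fs)]
    return sum(seg) / len(seg)

first = max(freqs, key=lambda f: mean_magnitude(f, 0.15, 0.35))
second = max(freqs, key=lambda f: mean_magnitude(f, 0.65, 0.85))
print(first, second)
```

In each half of the signal, the row with the largest average magnitude is the one matched to the tone actually present there. A single power spectrum cannot provide this kind of time-localized frequency information.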

So let's take a look at the ECG classification problem. Here, we have a data set of 162 ECG records that belong to three classes. ECG, or electrocardiogram, signals are a popular type of biomedical signal that captures the electrical activity of the human heart. Analysis of these signals is often used to detect and diagnose illnesses.

What we are going to do is take the raw signals, use the CWT to create scalogram images for them, and use these representations as input to a convolutional neural network (CNN) that will then perform the classification.

Let's go into MATLAB and take a closer look. We start by loading in our ECG data here, and then we can move to Signal Analyzer. The Signal Analyzer app is a great starting point when you want to explore your data, study the signals in the time and frequency domains, or try out a few different preprocessing techniques.

I'm going to take a normal ECG signal sample and plot it here. Then I can go to the Time-Frequency tab and generate the spectrogram for my signal. You can see that the high-frequency components, these peaks called the QRS complex, are captured, but they are not particularly well localized in time; there is a lot of smearing here.

Now, I can create the scalogram view, and this is definitely a clearer representation. There are low-frequency components here that often exist in biomedical signals because of the patient's muscle movement or breathing while the data was being captured.

We can see those in the data, and we can also see the high-frequency components, which are well localized in time. So we know that the CWT is giving us a good representation of our data. Now we'll move back to our script.

What we are going to do now is apply the CWT filter bank to all of our data. I'm using this script to do that, and I've already run it ahead of time to save time. As you can see, it takes in the data, creates the scalogram image representations, and saves them.

Once that is done, I can use an image datastore to bring in my data. The image datastore provides a good way to manage image data, and it is useful because I can then divide my data into training and test sets. So next, I'm going to divide the data into training and test images, using 80% of the data for training and the remaining 20% for testing. Then we move to the next step.

Now, this is an important step. Here, I'm bringing in a pretrained model. A pretrained model is a model that has already been trained on a very large data set and works well for a range of applications.

Here, I'm using AlexNet, a popular pretrained network that has been trained to classify images into 1,000 different classes. What I'm going to be doing here is transfer learning.

I bring in this pretrained network, and instead of retraining all of its layers, we only replace and retrain the last few layers of the network. This also speeds up the training process significantly.

So I have my classification layer and I've added in my training classes. And I've modified the last three layers. And my network is now ready. To learn more about transfer learning, take a look at our deep learning documentation.

Next, I'm going to set my training options and run the training. As it runs, you can see the accuracy being plotted here: the accuracy keeps improving and the loss keeps dropping. I've sped up the training here so you can see it quickly.

But you can see that it took less than five minutes to train the network, even though I'm not using a GPU for training. It's quick because I'm able to use transfer learning here. And again, this is an advantage of using image-based representations of signals: you can leverage the AI research being done for images as well.

Now that my network is trained, I'm going to evaluate the model by using it to classify my test data. You can see that the accuracy is about 97%, with just one misclassification, so the network has done a pretty good job.

We were able to get this result with a relatively small data set of just 162 images, and the overall performance of the network is pretty solid. This is how the CWT is often used as a feature extraction tool for AI.

Here we used biomedical signals, but this can easily be extended to other types of signals. In fact, we have an example that shows the same workflow for wireless communication signals as well, and we have heard of customers applying it to many different types of data.

Now let's go back to the slides and move on to wavelet denoising. Removing noise from signals before using them for AI can significantly improve results, which makes sense: if the data is less corrupted, model training will be more effective.

Wavelet denoising is particularly useful when the signal-to-noise ratio of the captured signal is low and when the noise and the signal occupy the same frequency bands. In that case, linear filtering does not really work, because it cannot separate noise from signal where their Fourier spectra overlap.

Wavelet denoising is based on the concept that wavelet coefficients represent signals very sparsely. For this signal, the final-scale wavelet coefficients have just 10 nonzero values, and the entire signal can be recreated from just these coefficients.
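The sparsity idea can be seen with the simplest wavelet, the Haar wavelet. The Python sketch below is an illustration, not the wavelet or signal from the slide. It decomposes a 64-sample piecewise-constant signal over six levels and counts the nonzero coefficients. It then checks that the inverse transform rebuilds the signal from them. The signal and the helper names `haar_dwt` and `haar_idwt` are assumptions made for this example.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT; len(x) must be divisible by 2**levels.

    Returns (approximation, [d1, d2, ..., d_levels]) with d1 the finest scale.
    """
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, starting from the coarsest level."""
    for d in reversed(details):
        out = []
        for a, c in zip(approx, d):
            out.extend([(a + c) / SQRT2, (a - c) / SQRT2])
        approx = out
    return approx

# Piecewise-constant test signal: 64 samples with two jumps.
x = [0.0] * 20 + [4.0] * 25 + [-2.0] * 19
approx, details = haar_dwt(x, levels=6)

coeffs = approx + [c for d in details for c in d]
nonzero = sum(1 for c in coeffs if abs(c) > 1e-9)
rebuilt = haar_idwt(approx, details)
print(f"{nonzero} of {len(coeffs)} coefficients are nonzero")
```

Only a handful of the 64 coefficients are nonzero: those near the two jumps, plus the final approximation. Even so, the inverse transform reproduces the signal to floating-point precision.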

What this means is that the wavelet transform concentrates signal features in a few large-magnitude wavelet coefficients. The smaller wavelet coefficients typically represent noise, and you can shrink or remove them without affecting signal quality. Here, you can see the original noisy signal on the left.

What you see on top are the wavelet coefficients of this noisy signal. The coefficients marked in red are the larger-magnitude coefficients that the wavelet denoising process recognizes as signal, and all of the other, smaller coefficients are zeroed out. What you end up with are the denoised wavelet coefficients.

Statistical thresholding is used to zero out these other coefficients. Once you have the denoised coefficients, you apply the inverse wavelet transform to get back the denoised signal. That is how the wavelet denoising process works.
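The threshold-then-invert procedure can be sketched in a few lines of Python. This is one classic variant, not necessarily what MATLAB's denoising tools use by default. It uses:

- a Haar wavelet,
- a noise estimate equal to the median absolute finest-scale coefficient divided by 0.6745,
- the universal threshold sigma * sqrt(2 ln N),
- soft thresholding of the detail coefficients only.

The test signal, noise level, random seed, and number of levels are assumptions.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT: (approximation, [d1 (finest), ..., d_levels])."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt, coarsest level first."""
    for d in reversed(details):
        out = []
        for a, c in zip(approx, d):
            out.extend([(a + c) / SQRT2, (a - c) / SQRT2])
        approx = out
    return approx

def soft(c, lam):
    """Soft thresholding: zero coefficients below lam, shrink the rest toward zero by lam."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

def wavelet_denoise(x, levels=5):
    approx, details = haar_dwt(x, levels)
    # Noise level from the finest-scale details: median absolute value / 0.6745.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    lam = sigma * math.sqrt(2.0 * math.log(len(x)))  # universal threshold
    shrunk = [[soft(c, lam) for c in d] for d in details]
    return haar_idwt(approx, shrunk)  # approximation coefficients kept as-is

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

clean = [0.0] * 64 + [3.0] * 36 + [-1.0] * 60 + [2.0] * 40 + [0.0] * 56  # 256 samples
rng = random.Random(0)
noisy = [v + rng.gauss(0.0, 0.5) for v in clean]
denoised = wavelet_denoise(noisy)
print(f"MSE noisy: {mse(noisy, clean):.4f}, denoised: {mse(denoised, clean):.4f}")
```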

Let's look at this in action in MATLAB. Here, we want to denoise some NMR spectroscopy data. NMR spectroscopy is a technique that uses nuclear magnetic resonance to analyze and better understand the chemical and physical properties of matter.

Let's plot the original signal here. This signal is not time-based; it is a spectral signal, but we can still denoise it the same way we would approach a time-domain signal. You can see that it definitely appears noisy, but there are also sharp transients and sharp peaks in this signal.

Now, it's important, particularly in spectroscopy, to maintain these peaks, because they provide information and insight into certain properties of the matter being studied. So we want to keep these peaks while also removing the noise.

Typically, if we use any form of smoothing, we can remove noise, but we also end up losing the information in these sharp peaks. Here, I've applied a moving average filter, and that's exactly the problem I run into: the noise in this area has definitely been reduced, but the sharp peaks are also lost.
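The peak-flattening effect of a moving average is easy to reproduce. In this Python sketch, a narrow Gaussian peak stands in for a spectral line; it is made up and is not the NMR data. The peak is smoothed with a 5-point centered moving average. Its height drops from 1 to under 0.7 because the filter averages the peak with its lower neighbors.

```python
import math

def moving_average(x, width):
    """Centered moving average of odd width; the window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

# A narrow peak of height 1 (Gaussian, standard deviation 1.5 samples).
peak = [math.exp(-(k * k) / (2 * 1.5 ** 2)) for k in range(-10, 11)]
smoothed = moving_average(peak, 5)
print(f"peak height before: {max(peak):.3f}, after 5-point moving average: {max(smoothed):.3f}")
```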

I can try a different type of filter. Here, I've tried the Savitzky-Golay filter. When I apply it to my data, the peaks are maintained, but I am still not getting rid of enough of the noise. Basically, I'm having to make a trade-off between removing noise and keeping my peaks, which is why I'm now going to turn to wavelet denoising.
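A Savitzky-Golay filter instead fits a low-order polynomial in each window, which is why it preserves peaks better. The Python sketch below applies the standard 5-point quadratic smoothing weights (-3, 12, 17, 12, -3)/35 to a made-up Gaussian peak. Its edge handling, which leaves the outer two samples untouched, is a simplification. Because the fit is polynomial, a quadratic passes through unchanged, and the narrow peak keeps more than 90% of its height.

```python
import math

# Savitzky-Golay weights for a 5-point window and a quadratic (or cubic) local fit.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def savgol5(x):
    """Apply the 5-point Savitzky-Golay smoother; the 2 samples at each edge are left unchanged."""
    out = list(x)
    for i in range(2, len(x) - 2):
        out[i] = sum(w * x[i + j - 2] for j, w in enumerate(SG5))
    return out

peak = [math.exp(-(k * k) / (2 * 1.5 ** 2)) for k in range(-10, 11)]
sg_peak = savgol5(peak)

# A local polynomial fit passes low-order polynomials through unchanged.
quad = [0.5 * k * k - 2.0 * k + 3.0 for k in range(10)]
quad_err = max(abs(a - b) for a, b in zip(quad, savgol5(quad)))
print(f"peak height after Savitzky-Golay: {max(sg_peak):.3f}, quadratic error: {quad_err:.1e}")
```

The price is weaker noise suppression. For white noise, this filter scales the noise variance by the sum of its squared weights, 595/1225 (about 0.49). A 5-point moving average scales it by 0.2.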

I'm going to open the Wavelet Signal Denoiser app. This app provides a quick way to denoise your signal, study it, and try out different techniques, and once you're happy with the result, you can either generate a script or export the signal itself.

We'll first import the noisy NMR signal. When you import a signal into the Wavelet Signal Denoiser, it automatically denoises it with the default settings. Then you can change the wavelet or any other parameter; here, I'm going to change the level from 7 to 4.

After setting these parameters, click Denoise to run the denoising again. Now, the denoised signal matches the original signal in terms of the peaks, but some of the noise that was clearly present in the original signal has also been removed.

You can also take a look at the coefficients. If you want to see the different scales and the denoised coefficients, you can move to that tab. Here, we can see that at the finest scale, just one coefficient is kept while denoising the signal. Even at the other scales, very few coefficients are kept, and most of the other coefficients, which represent noise, are discarded.

Once this is done, I can export the result to the workspace, and I can generate a MATLAB script. That's useful because if I want to apply the same denoising to multiple signals, I can do so fairly easily with the script; I don't always have to be inside the app.

Here I have the function that the app uses, and I'm running it to show the final result compared to the Savitzky-Golay and moving mean filters. You can see that wavelet denoising does a good job here.

So wavelet denoising can help improve data quality. It has been applied successfully to data before it is input to AI models, in many different applications like financial market analysis, stock trading, and audio denoising.

Now, let's go to Multiresolution Analysis and see how this can be used. So signals often consist of multiple physically meaningful components. Quite often, you may want to study one or more of these components in isolation.

So multiresolution analysis refers to breaking up a signal into such components. What you can see over here is the result of a wavelet multiresolution analysis. And right at the bottom is the original data. And on top you can see the different components that are created using the wavelet MRA technique.

And the great part about this kind of a decomposition is that when you add up all of these different components, you get back exactly the original signal. So if I added up all of these components back in, I would get back my original signal. And all of these components have the same time scale as the original data.
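The perfect-reconstruction property can be checked with a toy wavelet MRA. The Python sketch below builds one component per detail level, plus one approximation component. Each component comes from inverting a Haar transform with all other bands zeroed, so it has the same length as the signal, and the components sum back to the signal. The Haar wavelet, the synthetic signal, and the helper names are assumptions for illustration; MATLAB's MRA tools can use other wavelets and decomposition methods.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT: (approximation, [d1 (finest), ..., d_levels])."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt, coarsest level first."""
    for d in reversed(details):
        out = []
        for a, c in zip(approx, d):
            out.extend([(a + c) / SQRT2, (a - c) / SQRT2])
        approx = out
    return approx

def haar_mra(x, levels):
    """Return [D1, ..., D_levels, A_levels], each component as long as x.

    Each component is the inverse transform of one band alone (all other bands zeroed).
    """
    approx, details = haar_dwt(x, levels)
    zero_details = [[0.0] * len(d) for d in details]
    components = []
    for j in range(levels):
        only_j = [d if i == j else z for i, (d, z) in enumerate(zip(details, zero_details))]
        components.append(haar_idwt([0.0] * len(approx), only_j))
    components.append(haar_idwt(approx, zero_details))
    return components

# Illustrative signal: slow trend + oscillation + fast ripple, 64 samples.
x = [0.05 * i + math.sin(2 * math.pi * i / 16) + 0.3 * math.sin(2 * math.pi * i / 3) for i in range(64)]
components = haar_mra(x, levels=4)
total = [sum(vals) for vals in zip(*components)]
print(len(components), "components; max reconstruction error:",
      max(abs(a - b) for a, b in zip(x, total)))
```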

So effectively, multiresolution analysis is letting you perform time-frequency analysis without having to leave the time domain. And we'll take a look at how this is effective in our example. So the example that we are looking at over here is automated labeling of earthquake signals.

This application was inspired by a research paper written by authors from Shell, in which they captured earthquake signals using an array of geophones and then used a recurrent neural network (RNN) to label the P- and S-waves in the signals.

P-waves are the first waves that appear in earthquake signals, and the S-waves, which arrive later, are generally larger in magnitude. Here, in red, you can see the labeled arrival of the P-wave, and in yellow, the arrival of the S-wave.

Identifying these waves is useful for analyzing and studying earthquake signals further. For this example, I didn't use the original data set; instead, I used earthquake signals obtained from an event in Japan. Our aim is to create an AI model that can label the P- and S-waves.

When we tried the same approach taken in the paper, which was to send labeled data to a recurrent neural network, train that network, and then use it to label unseen data, the accuracy was very low. This was because there are a lot of challenges with this particular data.

One is that labeling the data, even the original data that you need for training, is challenging. It is difficult to interpret this time-domain signal, figure out exactly where the P-wave and the S-wave are, and label them. Seismic signals are also non-stationary, with features changing quickly in time, so it's not easy to label these signals directly using an RNN.

The other reason is that the RNN does not train well on this raw data. What you can see here is the result of the training: the accuracy just oscillates between 100% and 0%, and the network is learning nothing at all.

So let's see if we can use signal processing to improve these results. First, we used the continuous wavelet transform to create a scalogram and study the signal in more detail. This is the original signal, and here we can see the scalogram.

In the scalogram, you can clearly see a bright spot at a slightly lower frequency, and then this bright yellow spot. The bright yellow spot is in fact the S-wave, and the slightly dimmer component is the P-wave. You can also see all of the other components, which are noise, captured as well.

So here, we can already see that the multi-scale approach of wavelets is working well to capture the data. But now here, instead of using the scalogram, we'll use the multiresolution analysis to extract out the components that we are interested in.

Let's jump into MATLAB and see multiresolution analysis in action. I'm going to load in the Kobe signal. Once my signal is loaded, I'll open the Signal Multiresolution Analyzer app and import the Kobe signal.

Again, like the denoiser, you can see that it's automatically decomposed with the default settings. And you can see over here the energy distribution in all of these layers. And you can also see the reconstruction with the selected components.

What this means is that if I selected all of these components, I would get back the original signal. But I can choose to leave out some of the components when reconstructing my signal. If some components contain mostly noise, I can leave them out. I can also remove the approximation component, which captures the trend. So it is possible to do detrending and noise removal with multiresolution analysis as well.
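Selective reconstruction amounts to adding up only some of the MRA components. The self-contained Python sketch below shows three choices, using a Haar wavelet, a synthetic signal, and helper names that are all illustrative assumptions:

- Keeping every component returns the signal.
- Dropping the finest detail D1 removes the sample-to-sample ripple.
- Dropping the approximation A removes the slowest part of the signal.

With six levels on 64 samples, A is simply the signal mean, so dropping it only removes the offset. With fewer levels, A would hold a smoother trend.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT: (approximation, [d1 (finest), ..., d_levels])."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt, coarsest level first."""
    for d in reversed(details):
        out = []
        for a, c in zip(approx, d):
            out.extend([(a + c) / SQRT2, (a - c) / SQRT2])
        approx = out
    return approx

def haar_mra(x, levels):
    """Components named D1..D<levels> and A, each as long as x; they sum to x."""
    approx, details = haar_dwt(x, levels)
    zeros = [[0.0] * len(d) for d in details]
    comps = {}
    for j in range(levels):
        band = [details[i] if i == j else zeros[i] for i in range(levels)]
        comps[f"D{j + 1}"] = haar_idwt([0.0] * len(approx), band)
    comps["A"] = haar_idwt(approx, zeros)
    return comps

def reconstruct(comps, keep):
    """Add up only the chosen components (e.g. leave out noisy levels or the trend)."""
    return [sum(vals) for vals in zip(*(comps[name] for name in keep))]

# Illustrative signal: offset + slow trend + oscillation + alternating ripple.
x = [2.0 + 0.05 * i + math.sin(2 * math.pi * i / 16) + 0.2 * (-1) ** i for i in range(64)]
comps = haar_mra(x, levels=6)

everything = reconstruct(comps, ["D1", "D2", "D3", "D4", "D5", "D6", "A"])
without_a = reconstruct(comps, ["D1", "D2", "D3", "D4", "D5", "D6"])   # drop the approximation
without_d1 = reconstruct(comps, ["D2", "D3", "D4", "D5", "D6", "A"])   # drop the finest detail
```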

In this case, I am going to change the wavelet to one that I know works well for my particular data. We have a lot of documentation on how to select the right wavelet and the right level for your particular application, though the default settings work well in most cases.

Here, when I select level 3, level 5, and the approximation band, what I end up with is the P-wave of the earthquake signal. If I instead select only level 4, I am able to capture exactly the S-wave. Now that I know these levels work well to extract the P-wave and the S-wave, I can export the MATLAB script and apply it to all of my data. Using multiresolution analysis, I'm able to extract specifically the components I was interested in.

And let's take a look at what the reconstructed signals look like. So over here, I have the reconstructed P-wave and S-wave. And you can see over here that when I create the scalogram for these individual components, I can see the components very clearly. And a lot of the noise that was present in the original signal has been removed as well, because I've selected specific components.

Now I'll go ahead and train my model. Here, instead of using the original time-domain signal, I'm going to use the MRA components that I extracted, inputting the P-wave and S-wave components separately to train the LSTM network. This is the training progress recorded for the network: the recurrent network now learns really well and is able to label the data with much higher accuracy.

So now let's move on to wavelet scattering. Now, convolutional neural networks have proven to be excellent feature extractors for signal and image data. And the reason they work so well is because they perform the series of mathematical operations of convolution followed by the ReLU or the nonlinear layer, followed by pooling.

These layers are repeated to create a deep network, followed by fully connected and classification layers, and the weights of these layers are learned during training. When researchers looked at the weights of trained networks, they found that they resemble wavelet filters. So the thought was: if the trained layers resemble wavelet filters, why not start with wavelets instead?
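The three operations named here (convolution, ReLU, pooling) look like this in a minimal one-dimensional Python sketch. The input and the two-tap kernel are made-up values; in a real CNN the kernel weights are learned. Most deep learning frameworks implement "convolution" as cross-correlation, which is what this sketch computes.

```python
def conv1d_valid(x, kernel):
    """'Valid' 1-D cross-correlation, the operation CNN convolution layers compute."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(v):
    """Elementwise nonlinearity max(0, v)."""
    return [max(0.0, a) for a in v]

def max_pool(v, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

x = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, -2.0, 0.0, 0.0]
kernel = [1.0, -1.0]  # hypothetical weights; a real network learns these
features = max_pool(relu(conv1d_valid(x, kernel)), 3)
print(features)
```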

And this is exactly what wavelet scattering is. It is a framework that performs convolution, a nonlinearity, and averaging, but its filters are fixed wavelets instead of being learned as in a convolutional neural network. These operations perform the same function of extracting key features from raw data.

The advantage of wavelet scattering is that, because these filters are fixed wavelets rather than learned, you can use it even with less data, which would not be the case with convolutional neural networks. The features extracted by the wavelet scattering framework can then be input to a deep learning or machine learning model for the classification task.

The wavelet scattering framework looks like this: feature extraction is done in stages, and at each layer, wavelet coefficients are captured for all of the different scattering paths. In the end, you have a matrix of features that is input to the machine learning model.
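A toy version of this layered structure can be written in plain Python. The sketch below is not MATLAB's waveletScattering implementation. It uses complex Gabor wavelets of a few arbitrary widths, the modulus as the nonlinearity, and a simple box average over 64-sample windows in place of a proper lowpass filter. The output is a paths-by-windows feature matrix with:

- one zeroth-order path,
- one first-order path per wavelet,
- second-order paths only toward coarser wavelets.

The widths, the center-frequency parameter xi = 5, the window size, and the test signal are assumptions.

```python
import cmath
import math

def gabor(s, xi=5.0):
    """Complex Gabor wavelet: Gaussian of width s samples times exp(i * xi * n / s).

    Taps are divided by the envelope sum; with xi = 5 the DC response is about
    exp(-12.5), so the wavelet is effectively zero-mean.
    """
    half = int(4 * s)
    env_sum = sum(math.exp(-(n * n) / (2.0 * s * s)) for n in range(-half, half + 1))
    return [math.exp(-(n * n) / (2.0 * s * s)) * cmath.exp(1j * xi * n / s) / env_sum
            for n in range(-half, half + 1)]

def conv_same(x, taps):
    """Convolution with output length len(x) (zero padding), taps centered on each sample."""
    half = len(taps) // 2
    out = []
    for i in range(len(x)):
        acc = 0j
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(x):
                acc += t * x[j]
        out.append(acc)
    return out

def average(u, window):
    """Lowpass stand-in: mean over non-overlapping windows (len(u) divisible by window)."""
    return [sum(u[i:i + window]) / window for i in range(0, len(u), window)]

def scatter(x, widths=(2, 4, 8, 16), window=64):
    """Toy 2-layer scattering: rows are paths, columns are time windows."""
    rows = [average(x, window)]  # zeroth order: plain average
    u1 = {s: [abs(v) for v in conv_same(x, gabor(s))] for s in widths}
    rows += [average(u1[s], window) for s in widths]  # first order: |x * psi| averaged
    for i, s1 in enumerate(widths):
        for s2 in widths[i + 1:]:  # second order only toward coarser wavelets
            u2 = [abs(v) for v in conv_same(u1[s1], gabor(s2))]
            rows.append(average(u2, window))
    return rows

# Amplitude-modulated tone (period 8 samples, envelope period 64), 256 samples.
x = [(1 + 0.5 * math.sin(2 * math.pi * n / 64)) * math.sin(2 * math.pi * n / 8) for n in range(256)]
S = scatter(x)
S_shift = scatter([0.0, 0.0] + x[:-2])  # same signal delayed by 2 samples

num = math.sqrt(sum((a - b) ** 2 for ra, rb in zip(S, S_shift) for a, b in zip(ra, rb)))
den = math.sqrt(sum(a * a for row in S for a in row))
print(f"{len(S)} paths x {len(S[0])} windows; relative change under shift: {num / den:.3f}")
```

Delaying the input by two samples changes the features very little relative to their size. This is a small-scale illustration of the stability that the averaging step provides.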

Now, let's look at how wavelet scattering can be applied in MATLAB. The example we are going to look at is fault detection in air compressors. We have audio recordings from air compressors, and the recordings belong to eight possible classes: one is the healthy state, and the other seven are faulty states, depending on which part of the air compressor is faulty.

Let's jump into MATLAB and take a look at the script. First, we'll load our data. We'll use an audio datastore here because we have recorded sounds from the compressors. Again, datastores are a good way to manage your data, especially data that doesn't fit in memory.

Let's take a quick look at our data set. We have 225 signals of each class. We'll split the data into training and test sets, with 80% of the data used for training and 20% held out for testing. We can see that the training and test data are split evenly across all of the different labels.
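A per-class (stratified) split like the one described can be sketched as follows. Indices are shuffled and cut 80/20 within each class, so every label keeps the same proportion in both sets. The Python code, the seed, and the placeholder class names are assumptions; the MATLAB example does this with its own datastore functions.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab][:]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

# 8 classes x 225 recordings each, as in the example (class names here are placeholders).
labels = [f"class_{c}" for c in range(8) for _ in range(225)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training and", len(test_idx), "test recordings")
```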

Let's visualize the time-domain signals for all eight classes. You can see that it's difficult to tell these signals apart by looking directly at the time-domain signals; the healthy signal looks no different to me from the flywheel signal.

This again emphasizes why some form of feature extraction is going to be helpful before inputting the data into the network. Each signal here has 50,000 samples. We'll now create our scattering framework and extract the scattering features for one sample signal.

When creating the scattering framework, we specify the signal length, the sampling frequency, and the invariance scale, which is the only parameter we need to choose for wavelet scattering. Then we look at the feature matrix that's generated.

Again, the original signal had 50,000 samples, and the feature matrix has 25 time windows across 330 scattering paths. So already we have a sixfold reduction from the original data to our features. We further subsample it by reducing the number of time windows to 5.
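The reduction factors follow directly from the numbers given in this example: 50,000 samples per signal, 25 time windows by 330 scattering paths, and then 5 time windows.

```latex
\frac{50\,000}{25 \times 330} = \frac{50\,000}{8\,250} \approx 6.06,
\qquad
5 \times 330 = 1\,650,
\qquad
\frac{50\,000}{1\,650} \approx 30.3
```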

Here, we perform the wavelet scattering process for the training and test data, and we reshape the data into the right format for our SVM model. Each signal has a 5-by-330 feature matrix, and each of the five time windows is put in as a separate observation.

So where we originally had one signal, we now have five observations of 330 features each, and we extend the labels to match. Then we move on to the SVM classification. Here, we've created a cubic polynomial SVM classifier, trained it on the features extracted from the training data, and performed cross-validation. The validation accuracy is nearly 100%.
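The reshaping and label extension described here can be sketched as below. Each signal's 5-by-330 matrix becomes five rows of 330 features, and the signal's label is repeated once per row. The Python helper `windows_to_rows` and the constant-valued stand-in features are hypothetical; only the 5 and 330 come from the example.

```python
def windows_to_rows(feature_matrices, labels):
    """Flatten per-signal (windows x paths) feature matrices into one row per window.

    Each signal's label is repeated once per window so rows and labels stay aligned.
    """
    rows, row_labels = [], []
    for matrix, label in zip(feature_matrices, labels):
        for window in matrix:
            rows.append(list(window))
            row_labels.append(label)
    return rows, row_labels

# Hypothetical stand-in features: 3 signals, each 5 time windows x 330 scattering paths.
n_windows, n_paths = 5, 330
matrices = [[[float(sig)] * n_paths for _ in range(n_windows)] for sig in range(3)]
X, y = windows_to_rows(matrices, ["Healthy", "Flywheel", "Healthy"])
print(len(X), "rows of", len(X[0]), "features;", y[:6])
```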

One more thing we do before we have the final result is a majority vote. As we saw, there are five windows for each signal, and we should use all five of them to decide the class label. So that's exactly what we do: we use the majority vote helper function that we've written to calculate the final class labels and the final cross-validation accuracy.
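A majority vote over each signal's windows can be sketched like this. The Python function below is a hypothetical stand-in, not the helper function from the example. It assumes the per-window predictions are ordered signal by signal. It breaks ties in favor of the label seen first, which is one arbitrary choice among several.

```python
from collections import Counter

def majority_vote(window_predictions, n_windows):
    """Combine consecutive per-window predictions into one label per signal.

    Counter.most_common orders equal counts by first occurrence, so ties go to
    the label that appears first among that signal's windows.
    """
    labels = []
    for start in range(0, len(window_predictions), n_windows):
        votes = Counter(window_predictions[start:start + n_windows])
        labels.append(votes.most_common(1)[0][0])
    return labels

window_preds = [
    "Healthy", "Healthy", "Flywheel", "Healthy", "Healthy",      # signal 1
    "Flywheel", "Flywheel", "Healthy", "Flywheel", "Healthy",    # signal 2
]
signal_labels = majority_vote(window_preds, n_windows=5)
print(signal_labels)
```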

Once this is done, we apply the same steps to the test data, and the test accuracy comes out to 100% as well. The confusion matrix shows the same thing: every observation in every class is classified correctly.
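A confusion matrix counts (true class, predicted class) pairs, and accuracy is the fraction of counts on its diagonal. The Python sketch below uses a made-up four-observation example with two of the class names from this data set. It is not output from the webinar.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows = true class, columns = predicted class, entries = counts."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m

def accuracy(m):
    """Fraction of observations on the diagonal."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

classes = ["Healthy", "Flywheel"]
truth = ["Healthy", "Healthy", "Flywheel", "Flywheel"]
pred = ["Healthy", "Flywheel", "Flywheel", "Flywheel"]
cm = confusion_matrix(truth, pred, classes)
print(cm, accuracy(cm))
```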

So, using wavelet scattering, we reduced the data going into the model from 50,000 samples per signal to 1,650 features (5 time windows × 330 scattering paths). We were then able to use a cubic SVM model to perform the classification, and we got 100% accuracy.

So you can see that the wavelet scattering technique was highly effective in pulling out the required features even while reducing the data down significantly. So wavelet scattering can be a really good feature extractor when you are not familiar with your data.

We've seen wavelet scattering perform as well as, or even outperform, deep learning models in many different applications, and our documentation has many more examples of wavelet scattering applied to other areas.

So that brings us to the end of our presentation. And we've talked about some of the key wavelet techniques over here, but there are other techniques and other methods that we haven't covered today. And please take a look at the documentation to learn more about these.

We also have a lot of application examples that you can use to get started. You can find documentation for all of the techniques mentioned today, and we have excellent videos and webinars as well.