A Deep Dive into Deep Learning Modeling: Advanced Neural Networks, Including Variational Autoencoders
Overview
In this session, we will take a deeper dive into designing, customizing, and training advanced neural networks. We will demonstrate MATLAB's extended deep learning framework, which enables you to implement advanced network architectures such as generative adversarial networks (GANs), variational autoencoders (VAEs), or Siamese networks.
Highlights
- Comparing basic and advanced network architectures to determine the right architecture for your needs
- Generating realistic synthetic image data with GANs
- Implementing generalized research models in MATLAB
About the Presenter
Rishu Gupta is a senior application engineer at MathWorks India. He primarily focuses on image processing, computer vision, and deep learning applications, and has over nine years of experience working on applications related to visual content. He previously worked as a scientist in the research and development unit at LG Soft India. He has published and reviewed papers in multiple peer-reviewed conferences and journals. Rishu holds a bachelor's degree in electronics and communication engineering from BIET Jhansi; a master's in visual contents from Dongseo University, South Korea, where he worked on applications of computer vision; and a Ph.D. in electrical engineering from Universiti Teknologi PETRONAS, Malaysia, with a focus on biomedical image processing for ultrasound images.
Recorded: 3 Aug 2021
Hello, everybody. Good afternoon, and a very warm welcome to the Deep Learning Webinar Series. My name is Rishu Gupta, and I work as a senior application engineer at MathWorks. My focus areas are artificial intelligence and automated driving. Thank you very much for joining today's session.
Let me briefly cover what the Deep Learning Webinar Series is about for the folks who are new to the series. As part of this series, we are hosting four sessions. The first session was around automated labeling and iterative learning, followed by a couple of sessions on designing experiments and advanced neural network architectures. The fourth is around automatic CUDA code generation and deployment on embedded targets.
So today's session is around advanced deep neural networks, which is a follow-up of the previous session we did on designing experiments. Now, to quickly go through the agenda for today's discussion: I'll do a very brief recap of the previous sessions. Then I'll go into a comparison of framework features. We have been talking about the extended framework in prior discussions; today I'll talk about the extended framework and the differences between the framework features.
Then I'll go into modeling advanced deep neural network architectures. I'll talk about GANs and how you can model a GAN inside MATLAB. And at the end, I'll talk about how MathWorks can help you in your deep learning journey, and the different kinds of support available from MathWorks' side.
Now, without any further delay, let me go into our first bullet, which is the recap of the previous sessions. If you look at the deep learning workflow, it can be divided into three buckets: data preparation, AI modeling, and deployment. Our webinar series is also organized around these three buckets: the first session was around data, the second and third are around AI modeling, and the fourth is around deployment.
In the first session, on data preparation, we focused on the kinds of data you might work with while doing deep learning, and on how you can label those data types. When you are working on deep learning applications, there can be many different kinds of data, depending on the application.
You could be working on image data or video data. You could be working on signal, audio, or speech data. You could be working on numeric data coming in from different sensors. You could be working on NLP problems, where you might encounter text data. Or you could be working on automated driving scenarios, where you may have point cloud data.
Now, since the deep learning we are discussing here is a supervised learning technique, the first thing you need is labels associated with these different kinds of data. To address that particular challenge, we introduced different apps and showed you workflows for accelerating the labeling task for a deep learning application: the imageLabeler, groundTruthLabeler, lidarLabeler, videoLabeler, audioLabeler, and signalLabeler apps. Depending on what kind of data and application you are working on, you can choose any of these apps.
For example, the groundTruthLabeler app gives you the capability to import lidar point clouds as well as video data. Once you have imported that synchronized data inside the app, you can do visualization, annotation, and automation of the annotation, and also export the labels for training your AI model.
During that entire process of automated labeling, we understood that labeling is very tedious and time-consuming. However, we tried to show you a different way to work on the labeling problem itself. It's not just about sitting and labeling your images; data labeling is more of an algorithmic development task.
We understood it is tedious and time-consuming. But we also understood that it is absolutely necessary if you want to do deep learning and get good robustness and accuracy.
We introduced all these different apps, and in the process we extended the labeler capabilities to incorporate custom classes. To automate the annotation, we also introduced automation tools, which help you incorporate your own masks or algorithms. We showed multiple options for preprocessing to make your labeling task easier, and we also incorporated AI techniques, such as machine learning and deep learning models, into the labeler apps to automate the labeling for you.
So in a nutshell, in the first session we discussed how you can automate one of the most significant challenges around deep learning, which is annotation.
Now, the second and third sessions are more around AI modeling. In the second session, we talked about how you can model your neural network architecture inside MATLAB, and how you can accelerate the hyperparameter tuning of your AI models.
In that process, I introduced two apps: the Deep Network Designer app and the Experiment Manager app. Both apps can be invoked from the apps panel in the MATLAB toolstrip. With the Deep Network Designer, we showed how you can choose from a comprehensive list of pretrained models to get started. You could be doing transfer learning, or you could be modifying those pretrained neural network architectures.
We also showed how you can easily design, analyze, and train your own neural network architecture within the Deep Network Designer app. Not only can you modify or build your own network architecture, you can also import data for classification or semantic segmentation use cases. From there, you can train the neural network inside the Deep Network Designer while visualizing the training progress plot in parallel, which helps you understand whether the network is progressing: whether accuracy is increasing and loss is decreasing.
The second thing we introduced was the Experiment Manager. Experiment Manager complements Deep Network Designer and allows you to sweep over multiple hyperparameter combinations while training your AI model. The hyperparameters could be different network architectures you may want to try, different kinds of data, or training hyperparameters like momentum and learning rate. There can be a lot of things you want to try to converge to the accuracy or robustness you want.
We showed how you can set up your experiment in Experiment Manager, quickly start experimenting with different hyperparameters, and converge to the accuracy you want. From the list of networks you have tried, you can sort, filter, and understand, based on the metrics you have defined, which neural network makes the most sense for you. You can also automate the hyperparameter search with the help of Bayesian optimization.
The best part is that in AI, a lot comes down to reproducibility: replicating the research and tracking the results. Once you are done with the entire experimentation, Experiment Manager allows you to export the trained network or export all the experiment settings for later visualization or for sharing with multiple team members.
Now, for today's discussion, we are going to talk about more advanced neural network architectures. In the previous sessions we were using Deep Network Designer and Experiment Manager; today we will see what the workflow looks like if you want to build more complex deep neural network architectures. In the process, I will showcase one such architecture: GANs.
And here, if you see this particular demo, what is happening is that as the training progresses, the generative adversarial network is learning to produce images with the help of the training it has been given.
Now, before going into GANs, let me quickly start with the evolution of deep learning in MATLAB, and specifically the neural network capability in MATLAB. MATLAB has had the capability to train neural network architectures since 1995 or '96, when we introduced the Neural Network Toolbox.
Then, in 2014, the VGG group at Oxford released the MatConvNet library, which allows users to work on convolutional neural network architectures using MATLAB. MATLAB then officially introduced the Deep Learning Toolbox capabilities in 2016. And since this is a prolific and rapidly advancing field, a lot of additions and developments have happened since then in the Deep Learning Toolbox.
Starting from 2016 until today, in 2021, we have added a lot of capabilities for doing deep learning on your audio data, text data, video data, signals, or any other kind of data you may want to work on. In 2019, we introduced the extended framework, which we are going to talk about today. The extended framework allows you to build custom training loops, incorporate custom layers, perform automatic differentiation, use weight-sharing models, and also work with big-image capabilities.
There are more than 200 examples available for you to get started quickly. Today's session will focus on some of these capabilities and on how we can leverage automatic differentiation, custom training loops, weight sharing, and big-image capabilities to build more advanced deep neural network architectures.
Now, let's go into a little more detail about today's discussion. Until R2019a, MATLAB supported simple convolutional neural networks, long short-term memory (LSTM) network architectures, and the combination of both, which is called C-LSTM. The conventional neural network architectures, like ANNs, were also supported.
But deep learning is a very fast-evolving field, and this is not all; there is a lot more that you need to do and can do with deep learning. To capture all those requirements, we have also started focusing on the extended framework, which allows users to build all these deep neural network architectures.
Until R2019a, in MATLAB you could quickly build series network architectures, directed acyclic graph (DAG) network architectures, or recurrent neural network architectures, which could be LSTM- or GRU-based. From R2019b onward, we have also incorporated the extended framework, a more advanced framework, by introducing capabilities for generative modeling, like GANs, wherein you can use GANs to create a lot more data for your application, and capabilities for models like multiple-input, multiple-output networks. In many cases you may not be working on one type of data; you may be working on video data as well as sequence data at the same time, and you need to leverage deep learning for those models.
The extended framework also gives you much higher fidelity, functionality, and flexibility to do your own arbitrary functional programming for neural networks, the way you conventionally work on any algorithm development inside MATLAB. You can create your own functions, build your own custom training loops, call those functions, and do the training of your neural network architecture.
You can completely customize the training as per your requirements: as per your layer architecture, as per your need for iterating over multiple epochs. With the introduction of this extended framework, one feature that got enabled was shared weights. Shared weights is a feature you may need when you are working on Siamese-style neural network architectures, which are used primarily for comparing images. Another is automatic differentiation, which is absolutely necessary for backpropagation inside a deep neural network architecture.
The next thing that is enabled is flexible training structures, wherein you can build custom training loops and do the training as per your requirements. There are a lot of layers already supported in MATLAB, and in our last discussion we saw all those different layers as well. However, there could be use cases where you may want to build your own custom layer implementations. You can do that too: build your own custom layers, incorporate those custom layers within the deep neural network architecture, build your own custom training loops, and train your own neural network.
Now, we'll talk about all of these different features. But before going deeper, I want to give you a little more insight into what a deep learning training loop may look like. This is not going to the mathematical level of deep learning training; rather, it's a walk through what the deep learning training process, or custom training, may look like. While you are training any deep neural network architecture, the first thing required is the data. One full pass over the entire training data set is called an epoch, and you split that data into smaller chunks called minibatches. I would want you to memorize some of these terminologies, which are very common in deep learning. If not, you can just capture a snapshot of this particular slide. We will be sharing the recording with all of you, but for reference in the later part of the talk, you can capture the snapshot.
So the first thing we have is the data set, one pass over which is an epoch. We divide the data set into smaller chunks called minibatches. For each minibatch, we do preprocessing if that is required for your training. We pass the preprocessed input data through the network we have created; that forward pass is also called inference. Once we have the output from the forward pass, we can compare that prediction with the ground truth we have, using an objective function, which is generally called a loss function.
Once we have that comparison, we backpropagate it. We compute the gradients of the loss with respect to all the weights and biases, all the parameters inside the neural network architecture. We backpropagate those gradients to the individual layers and correct the network parameters so that the loss is minimized.
So once we compare the losses, we backpropagate them by computing gradients and continuously keep updating the weights and biases, the parameters of the model. These two steps, steps five and six, are the backbone of deep learning, which is also called backpropagation.
Once we have backpropagated through the entire network and updated the weights and biases, the network starts to learn the use case. You can also visualize the process and track any relevant metrics, like accuracy or loss, to understand whether the training is progressing and the network is learning to perform the application you want it to perform.
Now, capture a snapshot, or just go through this slide, because this is what we will be referring to in the next few slides. A few more terminologies before we go ahead. To enable the extended framework, we have introduced some functions, or constructs. What are those constructs? dlarray is the data container that holds the data on which you want to perform the training. dlnetwork is the network container itself; it contains the neural network architecture, or the layer graph, that you have created. dlfeval evaluates the deep learning model or function and performs backpropagation; within dlfeval you call the dlgradient function, which computes the gradients using the automatic differentiation capability.
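To make these constructs concrete, here is a minimal sketch of one custom training loop, assuming a dlnetwork called dlnet, scalar hyperparameters learnRate and momentum, a modelGradients function like the one sketched a little later in this section, and a hypothetical helper getMiniBatch that returns one minibatch of images X and one-hot targets T:

    % One custom training loop built from the extended-framework constructs.
    vel = [];                                      % SGDM velocity state
    for epoch = 1:numEpochs
        for i = 1:numIterationsPerEpoch
            [X, T] = getMiniBatch(i);              % hypothetical minibatch helper
            dlX = dlarray(single(X), 'SSCB');      % data container: spatial, spatial, channel, batch
            % dlfeval evaluates modelGradients with automatic differentiation enabled.
            [gradients, loss] = dlfeval(@modelGradients, dlnet, dlX, T);
            % One optimizer step: update the weights and biases.
            [dlnet, vel] = sgdmupdate(dlnet, gradients, vel, learnRate, momentum);
        end
    end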
Once we have the gradients computed, we use dlupdate, or an optimizer-specific function like sgdmupdate, to perform one more step and update the weights and biases of the individual layers. Right. Now, before going any further, I would request you to go to the polling window and answer one question for me: what kind of neural network architectures are you currently working on?
Are you working on transformer models for text data? Are you working on GANs, generative adversarial networks, for synthetic data generation? That data could be images, videos, signals, or from any other domain.
Or are you working on Siamese neural network architectures for comparing different images or workflows? Or on attention mechanisms, like an image captioning kind of problem? Or on other use cases? The poll question will be up there for a few seconds, but in the interest of time, I will move ahead.
And please help us by filling in the poll question. This will help us connect better and plan many more sessions dedicated to your use cases. Now, let's go deeper into MATLAB and start to look at this extended framework. To do that, we will again pick up the problem we discussed last time: MNIST. For the folks who are new, MNIST is a data set of handwritten numerals from zero to nine, and the problem is identifying those handwritten numerals. For that, we have 60,000 training images and 10,000 test images.
Now let me talk a little about the extended framework itself. The first thing, which we also saw in the last discussion, is the simple approach, wherein we load the data, create the network architecture, and train it. Let me go inside MATLAB and show you some of those network architectures.
Now, here is a comparison between the simple approach and the dlnetwork, or layer graph, approach. In the simple approach, we loaded the data, prepared it for training, and visualized it. We created a simple neural network architecture, and if you observe here, we have an image input layer, convolution layer, ReLU layer, pooling layer, fully connected layer, softmax layer, and classification layer.
These are the kinds of layers you would want when creating a simple neural network architecture. Once you have the network architecture created, you set up the training options: which optimizer you want to use, whether you want the training progress plot to be visible, what the learning rate is, and a few other parameters.
Once you are done with the training options, the next thing you do is call the trainNetwork function and train the network. Nothing else. Then, once the network is trained, you pass the trained network along with the test data to classify, and test the accuracy of the trained network. A very simple approach.
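As a reference point, here is a minimal sketch of that simple approach on a digits data set (digitTrain4DArrayData ships with Deep Learning Toolbox; the layer sizes here are illustrative):

    [XTrain, YTrain] = digitTrain4DArrayData;     % 28x28 grayscale digit images

    layers = [
        imageInputLayer([28 28 1])
        convolution2dLayer(3, 8, 'Padding', 'same')
        reluLayer
        maxPooling2dLayer(2, 'Stride', 2)
        fullyConnectedLayer(10)
        softmaxLayer
        classificationLayer];                     % the loss lives inside this layer

    options = trainingOptions('sgdm', ...
        'InitialLearnRate', 0.01, ...
        'MaxEpochs', 4, ...
        'Plots', 'training-progress');

    net = trainNetwork(XTrain, YTrain, layers, options);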
With the dlnetwork kind of approach, some things remain the same. The first step is loading the data; that is absolutely necessary irrespective of what kind of deep learning application I'm working on. Once I have the data, I can visualize it, and I can create the network architecture.
If you observe closely, when we create the network architecture with the dlnetwork API, I am not incorporating the loss function. The classification layer, which holds the loss function, is what I am skipping in this create-network step. The reason is that when working on these advanced deep neural network architectures, the loss function plays a key role.
We will see this when we talk about GANs. In some use cases the loss functions can be tricky, and a custom loss can let us train several different neural networks simultaneously. So for now, take it that in the extended framework, or dlnetwork approach, we define the layer architecture but do not add the loss function to it.
Once we have this, we use the layerGraph function to create the layer graph for us and the dlnetwork API to stitch all of that together and create the network. That covers creating the network itself. On the hyperparameters side, in the simple approach you have one function, trainingOptions, where you define all the hyperparameters. In the extended approach, you can go in and define many more parameters than in the simple approach.
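A minimal sketch of that step might look like this, with the classification layer dropped and layer names added so the graph can be stitched (the sizes are again illustrative):

    layers = [
        imageInputLayer([28 28 1], 'Normalization', 'none', 'Name', 'in')
        convolution2dLayer(3, 8, 'Padding', 'same', 'Name', 'conv')
        reluLayer('Name', 'relu')
        maxPooling2dLayer(2, 'Stride', 2, 'Name', 'pool')
        fullyConnectedLayer(10, 'Name', 'fc')
        softmaxLayer('Name', 'sm')];              % no classification layer: the loss moves into our own code

    lgraph = layerGraph(layers);                  % create the layer graph
    dlnet = dlnetwork(lgraph);                    % stitch it into the dlnetwork container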
The idea of having the simple approach, the extended approach, and the functional approach is to give you increasing levels of fidelity, flexibility, and control while working on deep learning. The next thing you did in the simple framework was trainNetwork. In the extended framework, this step gets a little more involved, and what we do here is exactly what we talked about on that one slide.
For each epoch, we divide the data into a number of minibatches, so we have a number of iterations per epoch, and we process each minibatch with preprocessing if required. dlarray is where we put the data on which we want to train. Then there is dlfeval, which calls the model gradients function to compute gradients for us. If I go inside the model gradients function, you can see that we do the forward pass, which is the inference; we get the predictions; and we build the loss function here, which is simple cross-entropy.
Once we have the loss computed, we compute the gradient of the loss function with the help of automatic differentiation for all the learnable parameters. We pass those gradients and learnable parameters back and do an sgdmupdate, depending on which solver we have. If you have SGDM, you do sgdmupdate; if you have Adam, you do adamupdate; if you have RMSProp, you do rmspropupdate.
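Putting that together, a minimal sketch of the modelGradients function called by dlfeval could look like this (assuming dlX is a formatted dlarray and T holds one-hot targets):

    function [gradients, loss] = modelGradients(dlnet, dlX, T)
        dlYPred = forward(dlnet, dlX);            % forward pass (inference)
        loss = crossentropy(dlYPred, T);          % the loss we now define ourselves
        % Automatic differentiation: gradients of the loss with respect to
        % every learnable parameter in the network.
        gradients = dlgradient(loss, dlnet.Learnables);
    end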
This is what our custom training loop may look like. What we have done here is give you the flexibility to build your own custom loss function, which enables you to work on more advanced neural network architectures, like GANs. Another approach that we have introduced is called the functional approach, which gives you even more control over how you define your custom training loops and your neural network architecture itself.
The data remains the same, and the visualization remains the same, as compared to the extended framework approach. Let me just pull up the extended framework here. So: the load-data step remains the same, and the visualization remains the same. I'll skip the create-network piece for just a second and go to the hyperparameters. The hyperparameter section remains the same whether you go for the extended framework or the functional approach.
However, with the functional approach, you have more flexibility over how you choose the parameters of individual layers, how you initialize the parameters within the convolutional layer, and what the weights and biases should look like. The custom training loop looks identical to the extended framework, or dlnetwork API, approach: you have the epochs, the minibatches, dlarray to hold the data for training, then dlfeval doing the automatic differentiation to compute gradients, and then the sgdmupdate.
One thing I want to talk about a little is the network. When you create the network in the functional approach, you no longer use the layers that we have defined internally. Instead, you implement each layer's functionality in the form of a function, which gives you full flexibility over how your custom training looks and how the individual layers in the deep neural network architecture look.
If you see here, the network we are creating has a convolution layer, to which I pass the parameters and initializations that I defined above. I have a ReLU layer, pooling layer, fully connected layer, and softmax layer, and these are the layers I have on the left as well. One thing you don't have here is the image input layer, and the reason is that you pass the input directly into the model, and it flows through the forward pass of this functional model.
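A minimal sketch of such a model function, assuming a parameters struct whose fields hold the weights and biases we initialized ourselves, might look like this:

    function dlY = model(parameters, dlX)
        % No image input layer: dlX is passed straight into the first operation.
        dlY = dlconv(dlX, parameters.conv.Weights, parameters.conv.Bias, ...
            'Padding', 'same');                   % convolution layer as a function
        dlY = relu(dlY);                          % ReLU layer as a function
        dlY = maxpool(dlY, 2, 'Stride', 2);       % pooling layer as a function
        dlY = fullyconnect(dlY, parameters.fc.Weights, parameters.fc.Bias);
        dlY = softmax(dlY);                       % softmax layer as a function
    end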
Again, the model gradients function we defined initially remains the same. We compute the output with the help of the model, pass the output to a loss function, which is cross-entropy here, compare it with the ideal ground truth, and compute the loss. Then, as before, we propagate those gradients with the help of sgdmupdate to the individual layers, which allows our deep neural network architecture to learn.
Now, this may seem a little complicated. But the entire intent of having these different approaches is to let you pick the abstraction level at which you want to work while doing deep learning. If you are focusing on the application level, you may choose the simplest approach. If you need more flexibility, are working on advanced architectures, and want to model advanced loss functions, go with the dlnetwork approach. And if you are doing more research-oriented work, where you may want to create your own layer architectures and want a functional API for the neural network architecture, go with the functional approach.
You can probably draw a parallel comparison with some other frameworks available as well, like TensorFlow and PyTorch. Depending on how much flexibility, fidelity, and control you want over your deep neural network architecture and custom training loops, you are free to choose between the different approaches, and all three are extensively supported.
Now I'll go back to my presentation slides. What we saw was the simple approach, and we talked about the dlnetwork approach, where you do not define the loss function in the network; instead, you define your own loss function and use custom training loops for training the neural network architecture.
We also talked about the functional form, called the model function approach, wherein you load the data and then create a functional form of your neural network architecture.
You define the parameters and hyperparameters you need in your deep neural network architecture, and you can initialize them the way you want. You define your loss function and model gradients as in the previous approach, and you define the custom training loops and train your neural network architecture.
All three approaches are provided to give you different levels of flexibility and control. Now, let me leverage some of these approaches and build a deep neural network architecture: a GAN. With the help of this extended framework, you can work on GANs, generative adversarial networks, which can help you create more data when you do not have sufficient data, or help you solve some challenges around semantic segmentation, which I'm going to talk about in an upcoming slide.
You can also work on variational autoencoders, which become extremely useful when you are working on anomaly detection problems, image denoising, or image super-resolution. Or you may be working on Siamese network kinds of problems, where you want to do comparison, or build a face identification tool: workflows that are more about comparing different networks or images and doing weight sharing.
Last but not least, you could build attention mechanism network architectures. If you look at this attention mechanism example, it is doing image captioning after looking at individual components inside an image. If you look at this particular image and its caption, it very precisely says a dog sitting on some grass. It is very detailed.
When you are working on these advanced neural network architectures, you will need to involve some of these extended APIs or functional APIs. Now, let me go into detail and talk a little more about what a GAN may look like. With a GAN kind of neural network architecture, we can think of the generator as a counterfeiter trying to make fake data, and the discriminator network as a policeman trying to allow only the legal data through and catch the counterfeit data. This is a game, a zero-sum game: to succeed, the counterfeiter must learn to make data that is indistinguishable from genuine data, and the generator network must learn to create samples that are drawn from the same distribution as the training data.
It is called an adversarial network because the generator and discriminator are fighting with each other: the generator generates fake images, and the discriminator tries to identify those fake images. There can be many types of GANs, depending on the kind of data or application you are working on: images, videos, or sounds. To train a GAN, what is required is some structural information or a random noise input.
There can be many different kinds of generator network architectures and many different kinds of discriminator network architectures. Basically, the generator and discriminator are nothing but deep neural network architectures themselves.
If we look at a visualization of a GAN, going a little more into detail: we pass a random latent vector, a noise vector, to the generator, which produces fake images for us. That fake image is fed into the discriminator network, which is another convolutional neural network architecture. At the same time, the discriminator also receives data from the actual real data set, and it has to figure out whether the image it is getting right now comes from the real data or is fake data created by the generator. We know the labels, whether each image is real or fake, in advance.
Right. So we can have the loss function identify whether the data is coming from the real data set or from the generator. With the help of the loss function, we compute the loss, and we backpropagate that training loss through both networks to improve their accuracy.
The output of the loss function is used to update the weights of both networks simultaneously, and the loss function for each network is constructed in such a way that they compete with each other. Now let me talk a little more about the loss function itself.
As for the discriminator loss function: while the discriminator is trained, it classifies both real data and the fake data from the generator. It penalizes itself for misclassifying a real instance as fake, or a fake instance created by the generator as real, by maximizing the function below. In this discriminator loss, the first component, log D(x), refers to the probability that the discriminator rightly classifies the real image.
The other component, log(1 - D(G(z))), helps it correctly label the fake images that come from the generator. Now, for the generator loss function: while the generator is trained, it samples random noise and produces an output from that noise. That output then goes through the discriminator and gets classified as either real or fake, based on the ability of the discriminator to tell one from the other.
The generator loss is then calculated from the discriminator's classification: the generator gets rewarded if it successfully fools the discriminator, and gets penalized otherwise. The generator loss can be written in terms of the same logarithmic term, log(1 - D(G(z))), which the generator tries to minimize.
So the two networks compete with each other: the discriminator tries to discriminate between real and fake data, while the generator tries to create data that fools the discriminator. This maximization and minimization of the loss functions converges at a saddle point: it doesn't go asymptotically to either extreme. It finds a balance wherein the discriminator is right only about half the time about whether an image is real or fake, and the generator, at the same time, provides images that the discriminator has difficulty identifying.
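In the standard notation of the original GAN formulation, these two competing losses combine into one minimax value function, where D is the discriminator, G the generator, and z the latent noise vector:

    \min_G \max_D V(D, G) =
        \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
      + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

At the saddle point, the discriminator outputs D(x) close to 1/2, which is exactly the "right half the time" balance described above.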
Once you have trained your GAN, you can take the generator network from the GAN and use it to create more and more data. Once the GAN is trained, you no longer need the discriminator. All you need is the generator network, into which you can feed your test input, a noise vector or some structural information, and it will create the data for you.
A little extension of that, and the example we are going to touch on today, is something called conditional GANs. Conditional GANs are again adversarial networks, but they have the advantage of having labels during the training process. The workflow we just saw was unsupervised; in this case, it is supervised, and we have training labels associated with the data.
In the case of conditional GANs, we provide some extra information, a label y, to both the generator and the discriminator, and both are conditioned on it. We can perform the conditioning by feeding the labels to both the discriminator and the generator as an additional input layer.
In this way, GANs can not only create data for you; they can create data conditioned on some label. Let me show that to you with a real example. When we go into the conditional GAN use case, the workflow remains the same; here is the architecture for the generator network with the additional input of labels. The first thing I need to do is configure my project. The next thing is loading the data, which is something I do irrespective of what kind of neural network architecture I'm working on.
Once I have my data set up, the next thing I'm going to do is define my generator network. If you look at this generator network, it is nothing but a deep neural network architecture with a lot of transposed convolution layers, because what we feed to this generator network is noise, an array of random values, and we also feed labels to it.
It does transposed convolutions across multiple layers and provides an image as the output to us. That is the generator network architecture. Let me go ahead and create it. If you see here, I'm using the dlnetwork API to create the network architecture.
We have some simple functions, like addLayers and connectLayers, to build the network architecture. I'll also show you the discriminator network. The discriminator takes an image and the associated labels as input, and gives you a binary output telling whether the image is a fake image provided by the generator or a real image from the data set.
If you look at the discriminator network, it is again the same API I'm invoking here: creating my layer graph architecture using some functions that quickly stitch my neural network architecture together, and then calling dlnetwork to create the network. Now, we have talked about Deep Network Designer and we have talked about Experiment Manager; all of those tools can be very useful here as well.
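To make the pattern concrete, here is a simplified sketch of a two-input generator stitched together with addLayers and connectLayers (the layer names and sizes are illustrative, and the label-embedding branch in the actual demo is more elaborate):

    numLatentInputs = 100;                        % size of the random latent vector (assumption)
    numClasses = 5;                               % e.g., five flower classes (assumption)

    lgraphG = layerGraph([
        imageInputLayer([1 1 numLatentInputs], 'Normalization', 'none', 'Name', 'noise')
        concatenationLayer(3, 2, 'Name', 'cat')   % joins the noise and label branches
        transposedConv2dLayer(4, 256, 'Name', 'tconv1')   % upsample toward an image
        reluLayer('Name', 'relu1')
        transposedConv2dLayer(4, 3, 'Stride', 2, 'Name', 'tconv2')
        tanhLayer('Name', 'tanh')]);

    % Second input branch: the class label as a 1-by-1 one-hot feature map.
    lgraphG = addLayers(lgraphG, ...
        imageInputLayer([1 1 numClasses], 'Normalization', 'none', 'Name', 'labels'));
    lgraphG = connectLayers(lgraphG, 'labels', 'cat/in2');

    dlnetGenerator = dlnetwork(lgraphG);          % two-input generator network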
Now, if I import the network from my workspace, I can see both the discriminator and the generator. Let me visualize the generator first. If I visualize the generator, you can see there are a lot of layers within it, all stitched together, and there are multiple inputs. All of these come together to give me an output in the form of an image.
The other thing I want to show you is the discriminator network, which you can also visualize inside the Deep Network Designer. I can click on OK, and I can click Replace. Now that the discriminator is also imported inside the Network Designer, I can visualize what my discriminator looks like. The discriminator takes in the image input layer and the labels, and then has other layers, like convolution, ReLU, and batch normalization, to produce the final outcome: whether the image comes from the real data or the fake data.
I can also click on the Analyze button, which analyzes both the discriminator and the generator for correctness before training. Here you will see two errors, but you can ignore them, because they come from the last layer: we don't have the loss function yet, which we still need to build. Other than that, you can visualize the entire neural network architecture and check its readiness to train.
Once you are done setting up your generator and discriminator network architectures, the next thing you need to do is set up your loss function and the custom training loop. Before setting those up, you define your training options. There are multiple training options that you may want to define.
All of these are the different training options we are defining, and depending on how much fidelity and flexibility you want, you can choose those parameters. I'm also defining the input noise that I want to feed, in the form of a dlarray, to my network architecture.
Right. And now here is the custom training loop, and it is doing exactly the same thing: the number of epochs, the number of iterations over the minibatch data; then it brings in the input latent noise vector that I want, converting it all to a dlarray. And in this piece here, I'm calling dlfeval, which internally calls the model gradients function. Let me go inside the model gradients function and show you what it does.
The model gradients function calculates the loss for both the discriminator and the generator. What we have done here is implement exactly the loss functions we discussed: for the discriminator, the mean of the log-probability assigned to the real images plus the mean of the log of one minus the probability assigned to the generated images; and for the generator, the corresponding term on the generated images. This is the loss function we discussed earlier.
Then we compute the overall GAN loss and pass it to dlgradient for automatic differentiation. Once we have the gradients computed, we go back and use sgdmupdate, or whichever optimizer or solver you want to use, to quickly update the neural network architecture.
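A hedged sketch of that model gradients function for the GAN might look like this (the names are illustrative, and the demo's actual function also threads the label inputs through both networks):

    function [gradG, gradD, lossG, lossD] = ganModelGradients( ...
            dlnetG, dlnetD, dlXReal, dlZ)
        dlXFake = forward(dlnetG, dlZ);           % generator: noise -> fake images

        probReal = sigmoid(forward(dlnetD, dlXReal));   % D's belief that real images are real
        probFake = sigmoid(forward(dlnetD, dlXFake));   % D's belief that fake images are real

        % The discriminator and generator losses discussed above.
        lossD = -mean(log(probReal)) - mean(log(1 - probFake));
        lossG = -mean(log(probFake));

        % Automatic differentiation for both networks; RetainData lets us
        % reuse the same trace for the second gradient call.
        gradG = dlgradient(lossG, dlnetG.Learnables, 'RetainData', true);
        gradD = dlgradient(lossD, dlnetD.Learnables);
    end

In the loop, each set of gradients would then go to its own update call, for example adamupdate applied separately to the generator and the discriminator.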
Right. For every iteration over the minibatches, you continue to do that, and once the network has converged to the accuracy and loss you are looking for, you will have your trained neural network architecture.
Now, let me show you what the generated images may look like. Just a minute; I need to set up the training options. I can run these training options, and then I can quickly generate images. What I'm also doing in this particular iteration is loading the trained GAN network, so that I can see what my generated images look like when conditioned on a label.
We go back here, just click on this. I have my execution environment. I will quickly go back, and now I can choose any of the labels and generate data for that label.
So here, what I'm getting is the class daisy, but it is not the data we may want to generate. What I can do now is load the trained GAN, and once I have loaded it, I can choose any of these labels and create more realistic-looking data. So now I condition it on dandelion, and the data that gets generated is flower images of dandelions. I click on roses, and the data that gets generated is of roses.
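As a hedged sketch of this sampling step (dlnetGenerator, numLatentInputs, and numClasses are the illustrative names from the generator sketch above), conditioning on a class could look like:

    dlZ = dlarray(randn(1, 1, numLatentInputs, 16, 'single'), 'SSCB');  % 16 noise vectors

    classIdx = 2;                                 % e.g., "dandelion" (assumed class order)
    labels = zeros(1, 1, numClasses, 16, 'single');
    labels(1, 1, classIdx, :) = 1;                % one-hot condition for every sample
    dlLabels = dlarray(labels, 'SSCB');

    dlXGenerated = predict(dlnetGenerator, dlZ, dlLabels);   % generator inference only
    imshow(imtile(extractdata(dlXGenerated)), []);           % tile and display the results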
Now, let me go back to my slides. With conditional GANs, you can create more and more data, and at the same time have control over what that data looks like. GANs are very popular and can be used for many different kinds of use cases.
Right. And this is what the training progress plot looked like when I trained my conditional GAN. If you look at the scores here, neither converges to an asymptote; they converge toward 0.4, a midway point. Both adversaries are competing with each other to get to a value that makes sense for both the generator and the discriminator.
Now, we have talked about GANs, and it makes sense to talk a little more about different use cases of GANs. GANs can be used for applications like domain transfer, bridging the gap between simulation and the real world, and validating the performance of a pretrained object detector or semantic segmentation model. Let me give you some context. In this particular example, we trained our deep neural network architecture, an unsupervised GAN, to convert day to dusk. With a few training images coming from the CamVid data set, we were able to do that. Now, with the help of the network we trained, we are able to convert dusk images to day images, and the same network can be used to convert day images to dusk images.
This can become significantly useful when I want to create data for one scenario and I do not have the data from another scenario. Say I've collected a lot of data for the day scenario, but I don't have data for night or early morning scenarios. I can use some of these techniques to create more data and perform domain transfer, or domain translation, to get data for my AI modeling use cases.
Another thing I want to highlight: consider a situation where you want to test the pretrained YOLO v4 object detection model shipping with the MathWorks GitHub repository. As the model is trained on a larger data set, the model's performance is quite good as long as the illumination conditions are good. However, the model's performance is poor for dusk images, which are another important part of your test data set.
So the question becomes: how can we improve the performance of the detector? You might think we could collect more data and label it. But how about we generate more data instead of collecting it? We can use GANs to automatically transform a portion of the well-illuminated images into dusk images, and then use those images to train the AI model and get a more robust YOLO v4 model doing the inference for us.
In another use case, the study referenced here shows the performance improvement for various automotive perception tasks when trained using GAN-generated data. Blue represents models trained using only real data, red represents a combination of real and simulation data, and green represents a combination of real and GAN-generated data.
Models trained with GAN data consistently improve performance and therefore help in bridging the domain gap between the two different data sets, the real and the simulated driving data sets, which can be significant in different studies. One more use case for GANs that I'm going to talk about: the workflows we have discussed so far are more about generating data that can be used for training computer vision tasks. But let's look at how a GAN can be used directly for solving a specific vision task, such as semantic segmentation.
A common problem in deep learning is that we have images but we don't have labels. We can use a labeled data set that has a similar representation to our own data set; it's not our data set exactly, but it has a similar representation. For that purpose, we can use a semi-supervised GAN approach and perform semantic segmentation.
The semantic segmentation network is trained to accurately segment the simulation data set and then added to a GAN as the generator network for fine-tuning on real data sets with respect to the labels. Here you can see the outcomes of a semantic segmentation network trained using GANs. For these complex neural network architectures, we have a lot of documented examples available.
Full workflows are provided that you can quickly use to establish your own workflows and use cases, whether we are talking about conditional GANs, UNIT GANs, image translation, or leveraging GANs for semantic segmentation. All of those are provided as documented examples from MathWorks. You can quickly start with some of those examples and build your own applications on top of them.
Now, before going any further in our discussion, I would request your input on a couple of poll questions. First: what is your current challenge while working with artificial intelligence? Is it data? Is it model architecture? Is it hyperparameter tuning? Or is it deployment, be it embedded on ECUs or on clouds? Also, would you like to have subsequent technical discussions with us about your application and use case?
In the interest of time, I will go ahead. However, please help us with the poll questions. This will help us support you better and provide more rigorous and thorough content as per your requirements.
What we have covered so far is a comparison of the frameworks and modeling advanced neural network architectures. Now let me talk a little more about how MATLAB can support you in your deep learning journey. First of all, we have been recognized by Gartner as a leader in data science and machine learning platforms, and we stand ahead of platforms like Google and Microsoft in this journey. The reason we are there is the completeness of our vision for data science and AI: you can leverage the entire workflow end to end, whether you are doing data preparation, AI modeling, system design, or taking your designed systems to embedded or cloud deployments.
How MATLAB can support you: we have a lot of free training courses available to get you started quickly on artificial intelligence. For deep learning and machine learning, there are quick Onramp courses to get you started, which are absolutely free. If you are working on image processing or signal processing, there are a lot of other Onramp courses available as well.
To take you further, there are detailed courses around deep learning and machine learning. These are more detailed two-day courses that will help you build deeper expertise in doing deep learning with MATLAB.
MathWorks can also help you with consulting. The philosophy of MathWorks consulting is to maximize your return on investment: it helps you reduce the overall learning curve, helps with quicker implementation of your POCs or objectives, and helps you avoid common mistakes that you might make as a beginner.
As part of this consulting, we are also very open and transparent in transferring knowledge about what we develop for you. This concludes today's session.
I'm looking forward to our next session on embedded deployment, which is also the last session of this deep learning webinar series. That session is on 11th August, around automatic CUDA code generation and embedded deployment. Thank you all for joining, and enjoy deep learning. See you in the next session. Thank you very much.