A Practical Guide to Deep Learning: From Data to Deployment

Chapter 5: Testing and Deploying Deep Learning Models

Getting a model that can classify your data is not the end of the deep learning workflow. You also need confidence that the model will work on unseen data and that it will interact as expected with the other system components. This chapter covers testing the model within the full system and deploying it onto a target device that requires certain performance characteristics.


Integrating and Testing the Network with Other System Logic

In the previous chapter, a network was trained to recognize high fives in three-axis acceleration data. The network, however, is just part of the overall logic that is needed for a successful high-five counter. At a minimum, the system also needs logic that will:

  • Read the acceleration from the sensor
  • Preprocess the sensor data into a scalogram
  • Pass the scalogram into the network
  • Count labeled high fives (with logic to make sure that a single high-five motion isn’t counted twice as the pattern streaks across the scalogram)
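
The double-counting guard in the last item can be sketched in a few lines. This is a minimal illustration, not the Simulink logic shown in the figure: the function name `count_high_fives` and the `quiet_frames` parameter are hypothetical, and it assumes the network emits one true/false label per scalogram. A detection is counted only on the first positive label of a streak, and the counter re-arms only after a run of consecutive negative labels.

```python
def count_high_fives(labels, quiet_frames=2):
    """Count high fives in a stream of per-scalogram labels.

    labels: iterable of truthy/falsy values, one per scalogram, truthy when
            the network labels that scalogram as a high five.
    quiet_frames: hypothetical tuning value; the number of consecutive
            negative labels required before a new high five can be counted.
    """
    count = 0
    armed = True   # True when the next positive label should be counted
    quiet = 0      # consecutive negative labels seen so far
    for is_high_five in labels:
        if is_high_five:
            quiet = 0
            if armed:
                count += 1
                armed = False  # ignore the rest of this streak
        else:
            quiet += 1
            if quiet >= quiet_frames:
                armed = True   # the motion has passed; allow a new count
    return count


# Two motions, each streaking across several consecutive scalograms.
stream = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(count_high_fives(stream))
```

Requiring more than one quiet frame also keeps a single dropped label in the middle of a streak from splitting one high five into two.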

Top: A representation of the high-five counter logic in Simulink. Bottom: The top plot shows data read in from the accelerometer, the middle line (green) indicates the scalograms labeled as high fives, and the bottom plot shows the number of high fives counted.

It is reasonable to want to know whether the network and the other system logic work, and this is where testing becomes important. For the high-five counter, you know the network alone functions in some fashion because the training process used 40 validation images to assess the model accuracy, and the network mislabeled only one of the 40.


This is a good start, but even though you have some confidence that this model works for the 40 validation images, you don’t necessarily know that it works on images that haven’t been seen yet. Therefore, in addition to the validation data set that is used during training, often you will also have a test data set that is used to ensure the network accuracy is acceptable across the entire solution space.
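
To put a number on how much 39 correct labels out of 40 actually tells you, one option is a confidence interval on the accuracy. The sketch below uses the Wilson score interval, a standard statistical tool that the original workflow does not mention; it is offered only as an illustration, and it assumes the validation images are independent samples from the same population the network will see in the field, which is exactly what a separate test set is meant to check.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z**2 / (4 * total**2)
    ) / denom
    return center - half_width, center + half_width


# 39 of the 40 validation images were labeled correctly.
low, high = wilson_interval(39, 40)
print(f"observed accuracy: {39 / 40:.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
```

With only 40 samples the interval is wide, roughly 87% up to just under 100%, so the observed 97.5% is compatible with a network that misses more than one high five in ten.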

In the case of the high-five counter, 40 images were enough to validate that the network was converging during training, but they were not enough to cover all of the possible arm motions the network might see in real life. Additional testing is needed before deploying this network into the field. Then, after you test the system and are confident in its implementation, you could use Simulink Coder™ to build embedded C code and deploy all of this logic, including the deep neural network, to the device worn on the arm.

Part of testing might be to systematically try all of the different arm motions that a watch might experience, both high-five and non-high-five motions. You would want to enlist as wide a variety of people as possible to build up a representative training set and capture the entire solution space. For every instance where the system misclassified the user motion, you would save that data and add it to the training data set to retrain and refine the network.

It’s important to note that you’re never guaranteed to have a perfectly functioning network by sampling a subset of the entire solution space, but you are increasing the range over which you are confident that it will perform. This is a standard approach for deep neural networks. Right now, there is no good systematic way to verify deep neural networks, so you usually rely on sampling methods like Monte Carlo approaches to gain confidence in the network over the entire solution space.
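
A sampling-based test harness can be sketched as follows. Everything here is a stand-in: `monte_carlo_accuracy`, `sample_case`, and `toy_system` are hypothetical names, the "system" is a simple threshold rather than a neural network, and the test cases are random numbers rather than recorded arm motions. The point is only the shape of the loop: draw a random case, run the full system on it, compare against the known answer, and keep every failure.

```python
import random


def monte_carlo_accuracy(system, sample_case, n_trials, seed=0):
    """Estimate accuracy by running the system on randomly sampled cases.

    system: callable mapping an input to a predicted label.
    sample_case: callable taking a random.Random and returning
                 (input, expected_label).
    Returns the estimated accuracy and the list of failing cases.
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    failures = []
    for _ in range(n_trials):
        inputs, expected = sample_case(rng)
        if system(inputs) != expected:
            failures.append((inputs, expected))
    accuracy = 1 - len(failures) / n_trials
    return accuracy, failures


def sample_case(rng):
    # Toy test case: a made-up peak acceleration and a made-up ground truth.
    peak = rng.uniform(0.0, 4.0)
    return peak, peak > 2.0


def toy_system(peak):
    # Deliberately imperfect stand-in for the trained network plus logic.
    return peak > 2.2


accuracy, failures = monte_carlo_accuracy(toy_system, sample_case, 10_000)
print(f"estimated accuracy: {accuracy:.3f}, failures saved: {len(failures)}")
```

The saved failures play the role of the misclassified motions described earlier: they are the cases worth adding to the training data before retraining.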

Testing with a Monte Carlo approach will most likely apply to your project as well, whether you’re looking for material defects, picking out verbal commands in audio, or classifying RF modulation schemes. You’re going to integrate the trained neural network into your full system and test it in a variety of situations.

Importantly, no matter how many different tests you run, there will always be sections of the solution space that haven’t been tested.


Monte Carlo approach to testing the entire solution space. The network is tested for some solutions, but most solutions are not tested.

This is where synthesized data can be so powerful. Chapter 2 outlined how to synthesize RF data for training a network. Similarly, you can use synthesized data to generate millions of different test cases and produce a much denser sampling of the solution space than physical testing allows. The denser the sampling, the more confidence you can have in the system.


Monte Carlo approach to testing the entire solution space. With synthesized test data, more of the solution space can be covered quickly.

Synthesizing data would be difficult with the high-five project because it would be hard to accurately model all of the acceleration patterns that are possible with arm motions and know which of those motions are high fives. This is why for some projects it is easier to physically test the network than synthesize the test data.

Regardless of whether you can synthesize test data or not, you’re going to want to ultimately test the system in the real world on the real hardware.


Deploying the Network

It doesn’t matter if your code works in simulation if it doesn’t work on the target hardware.

Part of making sure that your code will run on the target hardware is evaluating the size of the network and its execution speed. If your network is too large, or if it takes too long to execute, you could start with a smaller pretrained network. Would your project that uses GoogLeNet (about 7 million parameters) work just as well if you used transfer learning with SqueezeNet (about 1 million parameters)?
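
A quick back-of-the-envelope calculation shows what those parameter counts mean for storage. The sketch below uses the rounded counts quoted above and counts only the stored weights; it ignores activations, code size, and any framework overhead, so treat the results as a lower bound rather than a measured footprint. The helper name `weight_memory_mb` is hypothetical.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"double": 8, "single": 4, "int8": 1}


def weight_memory_mb(num_params, dtype="single"):
    """Approximate storage for the weights alone, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Rounded parameter counts from the text.
for name, params in [("GoogLeNet", 7_000_000), ("SqueezeNet", 1_000_000)]:
    size = weight_memory_mb(params, "single")
    print(f"{name}: about {size:.0f} MB of single-precision weights")
```

At single precision this works out to about 28 MB versus about 4 MB, which can be the difference between fitting on a small embedded target and not fitting at all.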


If the network is still too large, instead of searching for even smaller pretrained networks, you could try reducing the size of your network by pruning or quantizing it.

Pruning is removing some of the parameters in the network that don’t contribute much to classifying your particular data.
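
One common heuristic for deciding which parameters "don’t contribute much" is magnitude pruning: treat the weights closest to zero as the least important and set them to zero. The text does not say which pruning method is intended, so the sketch below is an assumption chosen for simplicity, and `prune_by_magnitude` is a hypothetical helper operating on a flat list of weights rather than a real network.

```python
def prune_by_magnitude(weights, fraction):
    """Return a copy of weights with the smallest-magnitude fraction zeroed."""
    n_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.01, 0.3, 0.02, -0.6, 0.005]
pruned = prune_by_magnitude(weights, 0.5)
print(pruned)
```

Zeroing weights does not shrink the network by itself; the savings come when the zeros are stored sparsely or when whole filters or neurons are removed. In practice, the pruned network is usually retested, and often retrained, to confirm that accuracy has held up.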

Quantizing means converting the single- or double-precision weights and biases in your network to 8-bit scaled integer data types. The idea is that you can still get the same performance from the network without having to use high-precision data types. See “What Is int8 Quantization and Why Is It Popular for Deep Neural Networks?” for more information.
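
The idea of a scaled 8-bit integer can be shown with a small numeric sketch. This is a simplified symmetric scheme with a single scale factor for the whole list, written for illustration; it is not a description of what any particular quantization tool does internally, and the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-128, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Choose the scale so the largest magnitude maps to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.82, -0.41, 0.05, -1.27, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each weight now needs one byte instead of the four bytes of a single-precision value, a 75% reduction in storage per weight, and the round-trip error is bounded by half of the scale factor.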

To give you a sense of what reducing the network could look like in one instance, consider the result of using the Deep Network Quantizer app to quantize the high-five network to 8-bit scaled integers. It took a few minutes to run, and afterward the network was compressed by 75% with no measurable impact on its accuracy.


Hopefully, you can see that with a pretrained network, transfer learning, pruning, and quantizing, you might be able to get to a model that is of sufficient size and efficiency for your application.

If the network is still too large, the last option is to build your own network architecture from scratch. This option requires the most training data and the most training time since at the beginning the network has no concept of anything, so it has to learn everything.

The other downside is that it takes a good understanding of different network architectures to create an efficient one from scratch.


The Takeaway

When considering deep learning for your application, you need to think about the architecture of your network, access to training data, whether to use simulations, and how to gain confidence in your network and the system as a whole.  

There is no one answer for every project, but hopefully, you can start to see the benefits and possibilities for deep learning. Perhaps you have an engineering problem you’re working on right now in which the solution comes down to being able to detect and label complex patterns in data. If so, deep learning is an approach that you might want to consider as part of your studies. It might be easier than you think.