Generate int8 Code for Deep Learning Networks
Deep learning uses neural network architectures that contain many processing layers, including convolutional layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB® use single-precision floating-point data types. Even networks that are small in size require a considerable amount of memory and hardware to perform these floating-point arithmetic operations. These restrictions can inhibit deployment of deep learning models to devices that have low computational power and limited memory resources. By using a lower precision to store the weights and activations, you can reduce the memory requirements of the network.
You can use Deep Learning Toolbox™ in tandem with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Then, you can use MATLAB Coder™ to generate optimized code for the network.
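For illustration, a minimal calibration sketch follows, assuming a pretrained network net and a calibration datastore calDS (both placeholder names):

quantObj = dlquantizer(net,'ExecutionEnvironment','CPU'); % CPU environment, as required for code generation
calResults = calibrate(quantObj,calDS);                   % exercise the network to collect dynamic ranges
save('dlquantizerObjectMatFile.mat','quantObj');          % MAT-file to pass to the code generation configuration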
ARM Cortex-A Processors
The generated code takes advantage of ARM® processor SIMD by using the ARM Compute Library. The generated code can be integrated into your project as source code, static or dynamic libraries, or executables that you can deploy to a variety of ARM Cortex-A CPU platforms, such as Raspberry Pi®. To deploy the code on ARM Cortex-A processors, you must use ARM Compute Library version 20.02.1.
Supported Layers and Classes
You can generate C++ code for these layers that uses the ARM Compute Library and performs inference computations in 8-bit integers:
- 2-D average pooling layer (averagePooling2dLayer (Deep Learning Toolbox))
- 2-D convolution layer (convolution2dLayer (Deep Learning Toolbox))
- Fully connected layer (fullyConnectedLayer (Deep Learning Toolbox))
- 2-D grouped convolution layer (groupedConvolution2dLayer (Deep Learning Toolbox)). The value of the NumGroups input argument must be equal to 2.
- Max pooling layer (maxPooling2dLayer (Deep Learning Toolbox))
- Rectified Linear Unit (ReLU) layer (reluLayer (Deep Learning Toolbox))
- Input and output layers
C++ code generation for such deep learning networks supports DAGNetwork (Deep Learning Toolbox), dlnetwork (Deep Learning Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), and SeriesNetwork (Deep Learning Toolbox) objects.
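In the code generation workflow, you typically load the network object inside an entry-point function by using coder.loadDeepLearningNetwork. A minimal sketch follows; the function name net_predict and the MAT-file 'mynet.mat' are placeholder names:

function out = net_predict(in)
%#codegen
% Load the network once and reuse it across calls.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('mynet.mat'); % placeholder MAT-file containing the trained network
end
out = predict(net,in);
end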
Generating Code
To generate code that performs inference computations in 8-bit integers, in your coder.ARMNEONConfig object dlcfg, set these additional properties:
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
dlcfg.DataType = 'int8';
Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to ARM Compute. Then set the Data type and Calibration result file path parameters.
Here, 'dlquantizerObjectMatFile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
Otherwise, follow the steps described in Code Generation for Deep Learning Networks with ARM Compute Library.
For an example, see Generate INT8 Code for Deep Learning Network on Raspberry Pi.
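As a condensed illustration of the steps above, one possible configuration sketch is shown below. The entry-point function net_predict and the input size are placeholder assumptions; set ArmArchitecture to match your target hardware:

cfg = coder.config('lib');
cfg.TargetLang = 'C++';
dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.ArmComputeVersion = '20.02.1';  % required library version for int8 code
dlcfg.ArmArchitecture = 'armv7';      % placeholder; use 'armv8' for 64-bit targets
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';
dlcfg.DataType = 'int8';
cfg.DeepLearningConfig = dlcfg;
codegen -config cfg net_predict -args {ones(224,224,3,'single')}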
ARM Cortex-M Processors
The generated code takes advantage of the CMSIS-NN library version 5.7.0 and can be integrated into your project as a static library that you can deploy to a variety of ARM Cortex-M CPU platforms.
Supported Layers and Classes
The code generated for the fullyConnectedLayer (Deep Learning Toolbox) object, which represents a fully connected layer, uses the CMSIS-NN library and performs inference computations in 8-bit integers.
Your deep learning network can also contain the following layers. The generated code performs computations for these layers in 32-bit floating-point type.

- lstmLayer (Deep Learning Toolbox) object, which represents a long short-term memory layer. The value of SequenceLength that you pass to predict must be a compile-time constant.
- softmaxLayer (Deep Learning Toolbox) object, which represents a softmax layer.
- Input and output layers.
C code generation for such deep learning networks supports SeriesNetwork (Deep Learning Toolbox) objects and DAGNetwork (Deep Learning Toolbox) objects that can be converted to SeriesNetwork objects.
Generating Code
To generate code that performs inference computations in 8-bit integers by using the CMSIS-NN library, in your coder.CMSISNNConfig object dlcfg, set the CalibrationResultFile property:
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to CMSIS-NN. Then set the Calibration result file path parameter.
Here, 'dlquantizerObjectMatFile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
For an example, see Generate INT8 Code for Deep Learning Network on Cortex-M Target.
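A corresponding configuration sketch for a Cortex-M target might look like this, again with a placeholder entry-point function and input size. The sketch omits hardware-specific settings, such as a coder.hardware configuration for your board:

cfg = coder.config('lib');
cfg.TargetLang = 'C';
dlcfg = coder.DeepLearningConfig('cmsis-nn');
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';
cfg.DeepLearningConfig = dlcfg;
codegen -config cfg net_predict -args {ones(1,100,'single')}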
See Also
Apps
- Deep Network Quantizer (Deep Learning Toolbox)
Functions
dlquantizer (Deep Learning Toolbox) | dlquantizationOptions (Deep Learning Toolbox) | calibrate (Deep Learning Toolbox) | validate (Deep Learning Toolbox) | coder.loadDeepLearningNetwork | codegen