Generate int8 Code for Deep Learning Networks
Deep learning uses neural network architectures that contain many processing layers, including convolutional layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB® use single-precision floating-point data types. Even networks that are small in size require a considerable amount of memory and hardware to perform these floating-point arithmetic operations. These restrictions can inhibit deployment of deep learning models to devices that have low computational power and limited memory resources. By using a lower precision to store the weights and activations, you can reduce the memory requirements of the network.
You can use Deep Learning Toolbox™ in tandem with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Then, you can use MATLAB Coder™ to generate optimized code for the network.
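For illustration, a minimal calibration sketch follows, assuming a pretrained network net and a calibration datastore calDS (both placeholder names):

quantObj = dlquantizer(net,'ExecutionEnvironment','CPU'); % CPU environment, as required for code generation
calResults = calibrate(quantObj,calDS);                   % exercise the network to collect dynamic ranges
save('dlquantizerObjectMatFile.mat','quantObj');          % MAT-file to pass to the code generation configuration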
ARM Cortex-A Processors
The generated code takes advantage of ARM® processor SIMD by using the ARM Compute Library. The generated code can be integrated into your project as source code, static or dynamic libraries, or executables that you can deploy to a variety of ARM Cortex-A CPU platforms, such as Raspberry Pi®. To deploy the code on ARM Cortex-A processors, you must use ARM Compute Library version 20.02.1.
Supported Layers and Classes
You can generate C++ code for these layers that uses the ARM Compute Library and performs inference computations in 8-bit integers:
- 2-D average pooling layer (averagePooling2dLayer (Deep Learning Toolbox))
- 2-D convolution layer (convolution2dLayer (Deep Learning Toolbox))
- Fully connected layer (fullyConnectedLayer (Deep Learning Toolbox))
- 2-D grouped convolution layer (groupedConvolution2dLayer (Deep Learning Toolbox)). The value of the NumGroups input argument must be equal to 2.
- Max pooling layer (maxPooling2dLayer (Deep Learning Toolbox))
- Rectified Linear Unit (ReLU) layer (reluLayer (Deep Learning Toolbox))
- Input and output layers
C++ code generation for such deep learning networks supports DAGNetwork (Deep Learning Toolbox), dlnetwork (Deep Learning Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), and SeriesNetwork (Deep Learning Toolbox) objects.
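In the code generation workflow, you typically load the network object inside an entry-point function by using coder.loadDeepLearningNetwork. A minimal sketch follows; the function name net_predict and the MAT-file 'mynet.mat' are placeholder names:

function out = net_predict(in)
%#codegen
% Load the network once and reuse it across calls.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('mynet.mat'); % placeholder MAT-file containing the trained network
end
out = predict(net,in);
end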
Generating Code
To generate code that performs inference computations in 8-bit integers, in your coder.ARMNEONConfig object dlcfg, set these additional properties:
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
dlcfg.DataType = 'int8';
Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to ARM Compute. Then set the Data type and Calibration result file path parameters.
Here, 'dlquantizerObjectMatFile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
Otherwise, follow the steps described in Code Generation for Deep Learning Networks with ARM Compute Library.
For an example, see Generate INT8 Code for Deep Learning Network on Raspberry Pi.
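As a condensed illustration of the steps above, one possible configuration sketch is shown below. The entry-point function net_predict and the input size are placeholder assumptions; set ArmArchitecture to match your target hardware:

cfg = coder.config('lib');
cfg.TargetLang = 'C++';
dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.ArmComputeVersion = '20.02.1';  % required library version for int8 code
dlcfg.ArmArchitecture = 'armv7';      % placeholder; use 'armv8' for 64-bit targets
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';
dlcfg.DataType = 'int8';
cfg.DeepLearningConfig = dlcfg;
codegen -config cfg net_predict -args {ones(224,224,3,'single')}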
ARM Cortex-M Processors
The generated code takes advantage of the CMSIS-NN library version 5.7.0 and can be integrated into your project as a static library that you can deploy to a variety of ARM Cortex-M CPU platforms.
Supported Layers and Classes
The code generated for the fullyConnectedLayer (Deep Learning Toolbox) object, which represents a fully connected layer, uses the CMSIS-NN library and performs inference computations in 8-bit integers.
Your deep learning network can also contain the following layers. The generated code performs computations for these layers in 32-bit floating-point type.

- lstmLayer (Deep Learning Toolbox) object, which represents a long short-term memory layer. The value of SequenceLength that you pass to predict must be a compile-time constant.
- softmaxLayer (Deep Learning Toolbox) object, which represents a softmax layer.
- Input and output layers.
C code generation for such deep learning networks supports SeriesNetwork (Deep Learning Toolbox) objects and DAGNetwork (Deep Learning Toolbox) objects that can be converted to SeriesNetwork objects.
Generating Code
To generate code that performs inference computations in 8-bit integers by using the CMSIS-NN library, in your coder.CMSISNNConfig object dlcfg, set the CalibrationResultFile property:
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to CMSIS-NN. Then set the Calibration result file path parameter.
Here, 'dlquantizerObjectMatFile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
For an example, see Generate INT8 Code for Deep Learning Network on Cortex-M Target.
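A corresponding configuration sketch for a Cortex-M target might look like this, again with a placeholder entry-point function and input size. The sketch omits hardware-specific settings, such as a coder.hardware configuration for your board:

cfg = coder.config('lib');
cfg.TargetLang = 'C';
dlcfg = coder.DeepLearningConfig('cmsis-nn');
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';
cfg.DeepLearningConfig = dlcfg;
codegen -config cfg net_predict -args {ones(1,100,'single')}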
See Also
Apps
- Deep Network Quantizer (Deep Learning Toolbox)
Functions
dlquantizer (Deep Learning Toolbox) | dlquantizationOptions (Deep Learning Toolbox) | calibrate (Deep Learning Toolbox) | validate (Deep Learning Toolbox) | coder.loadDeepLearningNetwork | codegen