validate
Description
valResults = validate(quantObj,valData) quantizes the weights, biases, and activations in the convolution layers of the network, and validates the network specified by the dlquantizer object, quantObj, using the data specified by valData.
valResults = validate(quantObj,valData,quantOpts) quantizes and validates the network with additional options specified by quantOpts.
This function requires Deep Learning Toolbox Model Quantization Library. To learn about the products required to quantize a deep neural network, see Quantization Workflow Prerequisites.
Examples
Quantize a Neural Network for GPU Target
This example shows how to quantize learnable parameters in the convolution layers of a neural network for a GPU target and explore the behavior of the quantized network. You quantize the squeezenet neural network after retraining it to classify new images. Quantization reduces the memory required for the network by approximately 75% without affecting the accuracy of the network on the validation data.
Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.
load squeezedlnetmerch
net
net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.
Define calibration and validation data to use for quantization.
The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
In this example, use the images in the MerchData data set. Split the data into calibration and validation data sets. Then, define augmentedImageDatastore objects to resize the data for the network.
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);
Create a dlquantizer object and specify the network to quantize.
dlquantObj = dlquantizer(net);
Specify the GPU target and the metric function to use for validation.
quantOpts = dlquantizationOptions(Target='gpu');
quantOpts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}
quantOpts = 
  dlquantizationOptions with properties:

   Validation Metric Info
      MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}

   Validation Environment Info
         Target: 'gpu'
      Bitstream: ''
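The hAccuracy helper is not listed in this section. As a rough illustration of what a Top-1 accuracy metric computes, the following sketch uses made-up scores and labels. The class names, scores, and ground truth are hypothetical, and this is not the helper's actual implementation.

```matlab
% Hypothetical stand-in data: 4 observations, 3 classes.
% Each row of scores holds the class scores for one observation.
scores = [0.7 0.2 0.1;
          0.1 0.8 0.1;
          0.3 0.3 0.4;
          0.6 0.3 0.1];
classNames = categorical(["cap"; "cube"; "torch"]);          % assumed class order
trueLabels = categorical(["cap"; "cube"; "torch"; "cube"]);  % assumed ground truth

% Top-1 prediction: index of the largest score in each row.
[~, predictedIdx] = max(scores, [], 2);
predictedLabels = classNames(predictedIdx);

% Fraction of observations whose top-1 prediction matches the ground truth.
accuracy = mean(predictedLabels == trueLabels);
fprintf('Top-1 accuracy: %.2f\n', accuracy);
```

A metric function passed through MetricFcn would wrap a computation like this around the prediction scores it receives.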
Use the calibrate function to exercise the network with sample inputs and collect range information. The function collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table in which each row contains range information for a learnable parameter or activation of the optimized network.
calResults = calibrate(dlquantObj, aug_calData)
calResults=120×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ ____________________ ________________________ _________ ________
{'conv1_Weights' } {'conv1' } "Weights" -0.91985 0.88489
{'conv1_Bias' } {'conv1' } "Bias" -0.07925 0.26343
{'fire2-squeeze1x1_Weights'} {'fire2-squeeze1x1'} "Weights" -1.38 1.2477
{'fire2-squeeze1x1_Bias' } {'fire2-squeeze1x1'} "Bias" -0.11641 0.24273
{'fire2-expand1x1_Weights' } {'fire2-expand1x1' } "Weights" -0.7406 0.90982
{'fire2-expand1x1_Bias' } {'fire2-expand1x1' } "Bias" -0.060056 0.14602
{'fire2-expand3x3_Weights' } {'fire2-expand3x3' } "Weights" -0.74397 0.66905
{'fire2-expand3x3_Bias' } {'fire2-expand3x3' } "Bias" -0.051778 0.074239
{'fire3-squeeze1x1_Weights'} {'fire3-squeeze1x1'} "Weights" -0.7712 0.68917
{'fire3-squeeze1x1_Bias' } {'fire3-squeeze1x1'} "Bias" -0.10138 0.32675
{'fire3-expand1x1_Weights' } {'fire3-expand1x1' } "Weights" -0.72035 0.9743
{'fire3-expand1x1_Bias' } {'fire3-expand1x1' } "Bias" -0.067029 0.30425
{'fire3-expand3x3_Weights' } {'fire3-expand3x3' } "Weights" -0.61443 0.7741
{'fire3-expand3x3_Bias' } {'fire3-expand3x3' } "Bias" -0.053613 0.10329
{'fire4-squeeze1x1_Weights'} {'fire4-squeeze1x1'} "Weights" -0.7422 1.0877
{'fire4-squeeze1x1_Bias' } {'fire4-squeeze1x1'} "Bias" -0.10885 0.13881
⋮
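To see why these ranges matter, consider how a range could be turned into an 8-bit scale factor. The sketch below assumes symmetric int8 quantization with a power-of-two scale, which is one common scheme and not necessarily what the toolbox does internally. It uses the conv1_Weights range from the table above. The sample values are arbitrary.

```matlab
% Range taken from the conv1_Weights row of the calibration table.
minValue = -0.91985;
maxValue =  0.88489;

% Assumption: symmetric int8 quantization with a power-of-two scale factor.
% This is an illustrative scheme, not the toolbox's documented algorithm.
maxAbs = max(abs([minValue maxValue]));
% Smallest exponent such that maxAbs still fits within 127 * 2^exponent.
exponent = ceil(log2(maxAbs / 127));
scale = 2^exponent;

% Quantize a few sample values inside the range, then map them back.
samples = [minValue, -0.25, 0, 0.1, maxValue];
quantized = int8(round(samples / scale));   % int8() saturates at -128 and 127
dequantized = double(quantized) * scale;

% Worst-case round-trip error over the samples.
maxError = max(abs(samples - dequantized));
fprintf('scale = 2^%d, max round-trip error = %.5f\n', exponent, maxError);
```

Under this scheme, a wider calibrated range forces a larger scale and therefore coarser steps, which is why calibration data should be representative of real inputs.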
Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.
valResults = validate(dlquantObj, aug_valData, quantOpts)
valResults = struct with fields:
NumSamples: 20
MetricResults: [1×1 struct]
Statistics: [2×2 table]
Examine the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
valResults.Statistics
ans=2×2 table
NetworkImplementation LearnableParameterMemory(bytes)
_____________________ _______________________________
{'Floating-Point'} 2.9003e+06
{'Quantized' } 7.3393e+05
In this example, quantization reduced the memory required for the network by approximately 75% and did not affect the accuracy of the network on the validation data.
The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
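The reduction of roughly 75% can be checked directly from the two LearnableParameterMemory(bytes) values reported in valResults.Statistics. The variable names in this short calculation are illustrative.

```matlab
% Values copied from the valResults.Statistics table (bytes).
floatingPointBytes = 2.9003e+06;
quantizedBytes     = 7.3393e+05;

% Relative memory saving of the quantized network.
reductionPercent = 100 * (1 - quantizedBytes / floatingPointBytes);
fprintf('Learnable parameter memory reduced by %.1f%%\n', reductionPercent);
```

The result is close to the 75% saving expected when 4-byte single-precision values are replaced by 1-byte integers.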
Quantize Network for FPGA Deployment
This example uses:
- Deep Learning HDL Toolbox
- Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
- Deep Learning Toolbox
- Deep Learning Toolbox Model Quantization Library
- MATLAB Coder Interface for Deep Learning
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Quantization Library and Deep Learning HDL Toolbox to deploy the int8 network to a target FPGA board.
For this example, you need:
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model Quantization Library
Deep Learning HDL Toolbox Support Package for Xilinx® FPGA and SoC Devices
MATLAB® Coder™ Interface for Deep Learning.
Load Pretrained Network
Load the pretrained LogoNet network and analyze the network architecture.
snet = getLogoNetwork;
deepNetworkDesigner(snet);
Set random number generator for reproducibility.
rng(0);
Load Data
This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an imageDatastore object and split it evenly into calibration and validation data.
curDir = pwd;
unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
Generate Calibration Result File for the Network
Create a dlquantizer object and specify the network to quantize. Specify the execution environment as FPGA.
dlQuantObj = dlquantizer(snet,'ExecutionEnvironment',"FPGA");
Use the calibrate function to exercise the network with sample inputs and collect the range information. The calibrate function collects the dynamic ranges of the weights and biases. The function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.
calibrate(dlQuantObj,calibrationData)
ans=35×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ __________________ ________________________ ___________ __________
{'conv_1_Weights' } {'conv_1' } "Weights" -0.048978 0.039352
{'conv_1_Bias' } {'conv_1' } "Bias" 0.99996 1.0028
{'conv_2_Weights' } {'conv_2' } "Weights" -0.055518 0.061901
{'conv_2_Bias' } {'conv_2' } "Bias" -0.00061171 0.00227
{'conv_3_Weights' } {'conv_3' } "Weights" -0.045942 0.046927
{'conv_3_Bias' } {'conv_3' } "Bias" -0.0013998 0.0015218
{'conv_4_Weights' } {'conv_4' } "Weights" -0.045967 0.051
{'conv_4_Bias' } {'conv_4' } "Bias" -0.00164 0.0037892
{'fc_1_Weights' } {'fc_1' } "Weights" -0.051394 0.054344
{'fc_1_Bias' } {'fc_1' } "Bias" -0.00052319 0.00084454
{'fc_2_Weights' } {'fc_2' } "Weights" -0.05016 0.051557
{'fc_2_Bias' } {'fc_2' } "Bias" -0.0017564 0.0018502
{'fc_3_Weights' } {'fc_3' } "Weights" -0.050706 0.04678
{'fc_3_Bias' } {'fc_3' } "Bias" -0.02951 0.024855
{'imageinput' } {'imageinput'} "Activations" 0 255
{'imageinput_normalization'} {'imageinput'} "Activations" -139.34 198.72
⋮
Create Target Object
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx Vivado® Design Suite 2022.1. To set the Xilinx Vivado toolpath, enter:
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');
To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','10.10.10.15');
Alternatively, you can also use the JTAG interface.
% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');
Create dlquantizationOptions Object
Create a dlquantizationOptions object. Specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.
options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('Target','host');
To use a custom metric function, specify the metric function in the dlquantizationOptions object.
options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)}, ...
    'Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)})
Validate Quantized Network
Use the validate function to quantize the learnable parameters in the convolution layers of the network. The validate function simulates the quantized network in MATLAB. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the single-data-type network object to the results of the quantized network object.
prediction_emulation = dlQuantObj.validate(validationData,options_emulation)
prediction_emulation = struct with fields:
NumSamples: 160
MetricResults: [1×1 struct]
Statistics: []
For validation on an FPGA, the validate function:
- Programs the FPGA board by using the output of the compile method and the programming file
- Downloads the network weights and biases
- Compares the performance of the network before and after quantization
prediction_FPGA = dlQuantObj.validate(validationData,options_FPGA)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### The network includes the following layers:
     1   'imageinput'    Image Input             227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations  (SW Layer)
     2   'conv_1'        2-D Convolution         96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0]  (HW Layer)
     3   'relu_1'        ReLU                    ReLU  (HW Layer)
     4   'maxpool_1'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
     5   'conv_2'        2-D Convolution         128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0]  (HW Layer)
     6   'relu_2'        ReLU                    ReLU  (HW Layer)
     7   'maxpool_2'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
     8   'conv_3'        2-D Convolution         384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0]  (HW Layer)
     9   'relu_3'        ReLU                    ReLU  (HW Layer)
    10   'maxpool_3'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
    11   'conv_4'        2-D Convolution         128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0]  (HW Layer)
    12   'relu_4'        ReLU                    ReLU  (HW Layer)
    13   'maxpool_4'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
    14   'fc_1'          Fully Connected         2048 fully connected layer  (HW Layer)
    15   'relu_5'        ReLU                    ReLU  (HW Layer)
    16   'fc_2'          Fully Connected         2048 fully connected layer  (HW Layer)
    17   'relu_6'        ReLU                    ReLU  (HW Layer)
    18   'fc_3'          Fully Connected         32 fully connected layer  (HW Layer)
    19   'softmax'       Softmax                 softmax  (SW Layer)
    20   'classoutput'   Classification Output   crossentropyex with 'adidas' and 31 other classes  (SW Layer)

### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>relu_4 ...
### Compiling layer group: conv_1>>relu_4 ... complete.
### Compiling layer group: maxpool_4 ...
### Compiling layer group: maxpool_4 ... complete.
### Compiling layer group: fc_1>>fc_3 ...
### Compiling layer group: fc_1>>fc_3 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "11.9 MB"
    "OutputResultOffset"        "0x00be0000"     "128.0 kB"
    "SchedulerDataOffset"       "0x00c00000"     "128.0 kB"
    "SystemBufferOffset"        "0x00c20000"     "9.9 MB"
    "InstructionDataOffset"     "0x01600000"     "4.6 MB"
    "ConvWeightDataOffset"      "0x01aa0000"     "8.2 MB"
    "FCWeightDataOffset"        "0x022e0000"     "10.4 MB"
    "EndOffset"                 "0x02d40000"     "Total: 45.2 MB"

### Network compilation complete.

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activation.
⋮
### Finished writing input activations.
### Running single input activation.
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
     1   'imageinput'    Image Input             227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations  (SW Layer)
     2   'conv_1'        2-D Convolution         96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0]  (HW Layer)
     3   'relu_1'        ReLU                    ReLU  (HW Layer)
     4   'maxpool_1'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
     5   'conv_2'        2-D Convolution         128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0]  (HW Layer)
     6   'relu_2'        ReLU                    ReLU  (HW Layer)
     7   'maxpool_2'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
     8   'conv_3'        2-D Convolution         384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0]  (HW Layer)
     9   'relu_3'        ReLU                    ReLU  (HW Layer)
    10   'maxpool_3'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
    11   'conv_4'        2-D Convolution         128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0]  (HW Layer)
    12   'relu_4'        ReLU                    ReLU  (HW Layer)
    13   'maxpool_4'     2-D Max Pooling         3×3 max pooling with stride [2 2] and padding [0 0 0 0]  (HW Layer)
    14   'fc_1'          Fully Connected         2048 fully connected layer  (HW Layer)
    15   'relu_5'        ReLU                    ReLU  (HW Layer)
    16   'fc_2'          Fully Connected         2048 fully connected layer  (HW Layer)
    17   'relu_6'        ReLU                    ReLU  (HW Layer)
    18   'fc_3'          Fully Connected         32 fully connected layer  (HW Layer)
    19   'softmax'       Softmax                 softmax  (SW Layer)
    20   'classoutput'   Classification Output   crossentropyex with 'adidas' and 31 other classes  (SW Layer)

### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
              Deep Learning Processor Estimator Performance Results

                     LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                          -------------              -------------          ---------     ---------     ---------
Network                     39136574                   0.17789                  1         39136574         5.6
    imageinput_norm           216472                   0.00098
    conv_1                   6832680                   0.03106
    maxpool_1                3705912                   0.01685
    conv_2                  10454501                   0.04752
    maxpool_2                1173810                   0.00534
    conv_3                   9364533                   0.04257
    maxpool_3                1229970                   0.00559
    conv_4                   1759348                   0.00800
    maxpool_4                  24450                   0.00011
    fc_1                     2651288                   0.01205
    fc_2                     1696632                   0.00771
    fc_3                       26978                   0.00012
 * The clock frequency of the DL processor is: 220MHz

### Finished writing input activations.
### Running single input activation.
prediction_FPGA = struct with fields:
NumSamples: 160
MetricResults: [1×1 struct]
Statistics: [2×7 table]
View Performance of Quantized Neural Network
Display the accuracy of the quantized network.
prediction_emulation.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 0.9875
{'Quantized' } 0.9875
prediction_FPGA.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 0.9875
{'Quantized' } 0.9875
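When validation runs as part of a scripted workflow, it can be useful to turn the two rows of this table into a single pass/fail check. The following sketch rebuilds the table by hand from the values shown above so that it runs without hardware. The threshold maxAllowedDrop is a hypothetical acceptance criterion, not a toolbox setting.

```matlab
% Hand-built copy of the MetricResults.Result table shown above.
result = table({'Floating-Point'; 'Quantized'}, [0.9875; 0.9875], ...
    'VariableNames', {'NetworkImplementation', 'MetricOutput'});

% Pick out the metric value for each network implementation.
isQuantized = strcmp(result.NetworkImplementation, 'Quantized');
floatMetric = result.MetricOutput(~isQuantized);
quantMetric = result.MetricOutput(isQuantized);

% Hypothetical acceptance threshold; choose one that suits your application.
maxAllowedDrop = 0.01;
metricDrop = floatMetric - quantMetric;
passed = metricDrop <= maxAllowedDrop;
fprintf('Metric drop: %.4f (passed = %d)\n', metricDrop, passed);
```

With real results, replace the hand-built table with prediction_emulation.MetricResults.Result or prediction_FPGA.MetricResults.Result.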
Display the performance of the quantized network in frames per second.
prediction_FPGA.Statistics
ans=2×7 table
    NetworkImplementation    FramesPerSecond    Number of Threads (Convolution)    Number of Threads (Fully Connected)    LUT Utilization (%)    BlockRAM Utilization (%)    DSP Utilization (%)
    _____________________    _______________    _______________________________    ___________________________________    ___________________    ________________________    ___________________
    {'Floating-Point'}           5.6213                       16                                    4                           93.198                   63.925                    15.595
    {'Quantized'     }           19.433                       64                                   16                            62.31                    50.11                    32.103
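The frame rates in this table can be related to the estimator output printed during validation. The short calculation below converts the reported network latency in cycles to seconds and frames per second at the reported 220 MHz clock, then computes the speedup of the quantized network. All numbers are copied from the output above and the variable names are illustrative.

```matlab
% Values copied from the estimator output and the Statistics table.
latencyCycles = 39136574;   % LastFrameLatency(cycles) for the whole network
clockHz       = 220e6;      % reported DL processor clock frequency
fpsFloat      = 5.6213;     % FramesPerSecond, floating-point row
fpsQuantized  = 19.433;     % FramesPerSecond, quantized row

% Latency in seconds and the frame rate it implies.
latencySeconds = latencyCycles / clockHz;
fpsFromLatency = 1 / latencySeconds;

% Throughput gain of the quantized network over the floating-point one.
speedup = fpsQuantized / fpsFloat;
fprintf('Latency %.5f s, %.2f frames/s; quantized speedup %.2fx\n', ...
    latencySeconds, fpsFromLatency, speedup);
```

The frame rate derived from the estimator latency agrees with the floating-point row of the table, and the quantized network is roughly 3.5 times faster in this run.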
Quantize a Neural Network for CPU Target
This example uses:
- Deep Learning Toolbox
- Deep Learning Toolbox Model Quantization Library
- MATLAB Coder
- MATLAB Support Package for Raspberry Pi Hardware
- Embedded Coder
- MATLAB Coder Interface for Deep Learning
This example shows how to quantize and validate a neural network for a CPU target. This workflow is similar to the workflow for other execution environments, but before validating you must establish a raspi connection and specify it as the target by using dlquantizationOptions.
First, load your network. This example uses the pretrained network squeezenet.
load squeezedlnetmerch
net
net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.
Then define your calibration and validation data, aug_calData and aug_valData, respectively.
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227],calData);
aug_valData = augmentedImageDatastore([227 227],valData);
Create the dlquantizer
object and specify a CPU execution environment.
dq = dlquantizer(net,'ExecutionEnvironment','CPU')
dq = dlquantizer with properties: NetworkObject: [1×1 dlnetwork] ExecutionEnvironment: 'CPU'
Calibrate the network.
calResults = calibrate(dq,aug_calData,'UseGPU','off')
calResults=120×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
__________________________ ____________________ ________________________ _________ ________
"conv1_Weights" {'conv1' } "Weights" -0.91985 0.88489
"conv1_Bias" {'conv1' } "Bias" -0.07925 0.26343
"fire2-squeeze1x1_Weights" {'fire2-squeeze1x1'} "Weights" -1.38 1.2477
"fire2-squeeze1x1_Bias" {'fire2-squeeze1x1'} "Bias" -0.11641 0.24273
"fire2-expand1x1_Weights" {'fire2-expand1x1' } "Weights" -0.7406 0.90982
"fire2-expand1x1_Bias" {'fire2-expand1x1' } "Bias" -0.060056 0.14602
"fire2-expand3x3_Weights" {'fire2-expand3x3' } "Weights" -0.74397 0.66905
"fire2-expand3x3_Bias" {'fire2-expand3x3' } "Bias" -0.051778 0.074239
"fire3-squeeze1x1_Weights" {'fire3-squeeze1x1'} "Weights" -0.7712 0.68917
"fire3-squeeze1x1_Bias" {'fire3-squeeze1x1'} "Bias" -0.10138 0.32675
"fire3-expand1x1_Weights" {'fire3-expand1x1' } "Weights" -0.72035 0.9743
"fire3-expand1x1_Bias" {'fire3-expand1x1' } "Bias" -0.067029 0.30425
"fire3-expand3x3_Weights" {'fire3-expand3x3' } "Weights" -0.61443 0.7741
"fire3-expand3x3_Bias" {'fire3-expand3x3' } "Bias" -0.053613 0.10329
"fire4-squeeze1x1_Weights" {'fire4-squeeze1x1'} "Weights" -0.7422 1.0877
"fire4-squeeze1x1_Bias" {'fire4-squeeze1x1'} "Bias" -0.10885 0.13881
⋮
Use the MATLAB Support Package for Raspberry Pi Hardware function, raspi, to create a connection to the Raspberry Pi. In the following code, replace:
- raspiname with the name or address of your Raspberry Pi
- username with your user name
- password with your password
% r = raspi('raspiname','username','password')
For example,
r = raspi('gpucoder-raspberrypi-8','pi','matlab')
r = raspi with properties: DeviceAddress: 'gpucoder-raspberrypi-8' Port: 18734 BoardName: 'Raspberry Pi 3 Model B+' AvailableLEDs: {'led0'} AvailableDigitalPins: [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27] AvailableSPIChannels: {} AvailableI2CBuses: {} AvailableWebcams: {} I2CBusSpeed: AvailableCANInterfaces: {} Supported peripherals
Specify the raspi object as the target for the quantized network.
opts = dlquantizationOptions('Target',r);
opts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}
opts = dlquantizationOptions with properties: Validation Metric Info MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]} Validation Environment Info Target: [1×1 raspi] Bitstream: ''
Validate the quantized network with the validate
function.
valResults = validate(dq,aug_valData,opts)
### Starting application: 'codegen/lib/validate_predict_int8/pil/validate_predict_int8.elf' To terminate execution: clear validate_predict_int8_pil ### Launching application validate_predict_int8.elf... ### Host application produced the following standard output (stdout) and standard error (stderr) messages:
valResults = struct with fields:
NumSamples: 20
MetricResults: [1×1 struct]
Statistics: []
Examine the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
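The MetricFcn handle in this example calls hAccuracy, which is not defined in the steps above. The following is a minimal sketch of such a helper, not the shipped implementation. It assumes that validate passes the metric function an N-by-K matrix of prediction scores (one row per observation, columns ordered like classes) and that readall on the augmented datastore returns a table whose response variable holds the ground-truth labels. The name top1AccuracySketch is hypothetical and is not a toolbox function.

```matlab
function accuracy = hAccuracy(predictionScores, net, dataStore, classes)
% Sketch of a top-1 accuracy metric suitable for use as a MetricFcn.
% Assumption: predictionScores is N-by-K, one row of class scores per image.
% net is unused here; it is kept only to match the call in the example.

% Assumption: readall returns a table whose 'response' variable holds labels.
data = readall(dataStore);
groundTruth = data.response;

accuracy = top1AccuracySketch(predictionScores, groundTruth, classes);
end

function accuracy = top1AccuracySketch(scores, groundTruth, classes)
% Hypothetical helper: fraction of rows whose highest score matches the label.
[~, idx] = max(scores, [], 2);                  % best-scoring class per row
predicted = categorical(classes(idx), classes); % map indices to class names
accuracy = mean(predicted(:) == groundTruth(:));
end
```

Keeping the comparison in a separate helper makes the metric easy to check on a few hand-written score rows before running validation on the target.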
Quantize YOLO v3 Object Detector
This example shows how to quantize a yolov3ObjectDetector
(Computer Vision Toolbox) object using preprocessed calibration and validation data.
First, download a pretrained YOLO v3 object detector.
detector = downloadPretrainedNetwork();
This example uses a small labeled data set that contains one or two labeled instances of a vehicle. Many of these images come from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona and used with permission.
Unzip the vehicle images and load the vehicle ground truth data.
unzip vehicleDatasetImages.zip
data = load('vehicleDatasetGroundTruth.mat');
vehicleDataset = data.vehicleDataset;
Add the full path to the local vehicle data folder.
vehicleDataset.imageFilename = fullfile(pwd, vehicleDataset.imageFilename);
Create an imageDatastore
for loading the images and a boxLabelDatastore
(Computer Vision Toolbox) for the ground truth bounding boxes.
imds = imageDatastore(vehicleDataset.imageFilename);
blds = boxLabelDatastore(vehicleDataset(:,2));
Use the combine
function to combine both the datastores into a CombinedDatastore
.
combinedDS = combine(imds, blds);
Split the data into calibration and validation data.
calData = combinedDS.subset(1:32);
valData = combinedDS.subset(33:64);
Use the preprocess
(Computer Vision Toolbox) method of yolov3ObjectDetector
(Computer Vision Toolbox) object with transform
function to prepare the data for calibration and validation.
The transform
function returns a TransformedDatastore
object.
processedCalData = transform(calData, @(data)preprocess(detector,data));
processedValData = transform(valData, @(data)preprocess(detector,data));
Create the dlquantizer object. When you use the MATLAB execution environment, quantization is performed using the fi fixed-point data type, which requires a Fixed-Point Designer™ license.
dq = dlquantizer(detector, 'ExecutionEnvironment', 'MATLAB');
Calibrate the network.
calResults = calibrate(dq, processedCalData,'UseGPU','off')
calResults=135×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ ____________________ ________________________ _________ ________
{'conv1_Weights' } {'conv1' } "Weights" -0.92189 0.85687
{'conv1_Bias' } {'conv1' } "Bias" -0.096271 0.26628
{'fire2-squeeze1x1_Weights'} {'fire2-squeeze1x1'} "Weights" -1.3751 1.2444
{'fire2-squeeze1x1_Bias' } {'fire2-squeeze1x1'} "Bias" -0.12068 0.23104
{'fire2-expand1x1_Weights' } {'fire2-expand1x1' } "Weights" -0.75275 0.91615
{'fire2-expand1x1_Bias' } {'fire2-expand1x1' } "Bias" -0.059252 0.14035
{'fire2-expand3x3_Weights' } {'fire2-expand3x3' } "Weights" -0.75271 0.6774
{'fire2-expand3x3_Bias' } {'fire2-expand3x3' } "Bias" -0.062214 0.088242
{'fire3-squeeze1x1_Weights'} {'fire3-squeeze1x1'} "Weights" -0.7586 0.68772
{'fire3-squeeze1x1_Bias' } {'fire3-squeeze1x1'} "Bias" -0.10206 0.31645
{'fire3-expand1x1_Weights' } {'fire3-expand1x1' } "Weights" -0.71566 0.97678
{'fire3-expand1x1_Bias' } {'fire3-expand1x1' } "Bias" -0.069313 0.32881
{'fire3-expand3x3_Weights' } {'fire3-expand3x3' } "Weights" -0.60079 0.77642
{'fire3-expand3x3_Bias' } {'fire3-expand3x3' } "Bias" -0.058045 0.11229
{'fire4-squeeze1x1_Weights'} {'fire4-squeeze1x1'} "Weights" -0.738 1.0805
{'fire4-squeeze1x1_Bias' } {'fire4-squeeze1x1'} "Bias" -0.11189 0.13698
⋮
Validate the quantized network with the validate
function.
valResults = validate(dq, processedValData)
valResults = struct with fields:
NumSamples: 32
MetricResults: [1×1 struct]
Statistics: []
function detector = downloadPretrainedNetwork()
    pretrainedURL = 'https://ssd.mathworks.com/supportfiles/vision/data/yolov3SqueezeNetVehicleExample_21aSPKG.zip';
    websave('yolov3SqueezeNetVehicleExample_21aSPKG.zip', pretrainedURL);
    unzip('yolov3SqueezeNetVehicleExample_21aSPKG.zip');
    pretrained = load("yolov3SqueezeNetVehicleExample_21aSPKG.mat");
    detector = pretrained.detector;
end
Validate Quantized Network on FPGA Using Custom Bitstream
This example uses:
- Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
- Deep Learning HDL Toolbox
- Deep Learning Toolbox Model Quantization Library
Validate a dlquantizer
object on a target FPGA board using a custom bitstream, and compare the results of validation using two custom int8
bitstreams with different thread counts. In this example, you will quantize a pretrained network, generate custom bitstreams, and validate the quantized network using the custom bitstreams.
Quantize Pretrained Network
Load the pretrained digits network.
snet = getDigitsNetwork;
Load image data for quantization and create calibration and validation datastores. For more information on the data used in this example, see Data Sets for Deep Learning.
dataFolder = fullfile(toolboxdir('nnet'),'nndemos','nndatasets','DigitDataset');
imds = imageDatastore(dataFolder, 'IncludeSubfolders',true,'LabelSource','foldernames');
[calData,valData] = splitEachLabel(imds,0.7,'randomized');
calData_subset = calData.subset(1:20);
valData_subset = valData.subset(1:6);
Quantize the network using a dlquantizer
object. Specify FPGA as the execution environment.
dq = dlquantizer(snet,'ExecutionEnvironment','FPGA');
dq.calibrate(calData_subset);
To validate the network on a target FPGA board, specify a dlhdl.Target
object. This example uses a Xilinx™ ZCU102 ZU9EG device.
hTarget = dlhdl.Target('Xilinx','Interface','JTAG');
Generate Custom Bitstreams
To compare the performance of custom bitstreams, generate two bitstreams with different configurations. The bitstreams used in this example are customized to show the performance and resource utilization difference between int8
bitstreams with different processor thread counts for the convolution and fully connected modules on the Xilinx™ ZCU102 ZU9EG device.
Generating a bitstream can take several hours. Before generating a bitstream, you can use the optimizeConfigurationForNetwork
method to modify the processor configuration to meet the requirements of your network and target device. For a list of existing bitstreams, see Use Deep Learning on FPGA Bitstreams.
Use a dlhdl.ProcessorConfig
object to specify the processor parameters for your custom bitstream. For a quantized network, specify the processor data type as 'int8'
. For an int8
processor, the default values assigned to ConvThreadNumber
and FCThreadNumber
are 16 and 4, respectively. Generate the bitstream using the dlhdl.buildProcessor
function. For more information about how to generate a custom bitstream, see Generate Custom Bitstream.
hPCNew = dlhdl.ProcessorConfig
hPCNew.ProcessorDataType = 'int8';
dlhdl.buildProcessor(hPCNew);
Save the generated bitstream as 'custom_int8.bit'
. After saving the generated bitstream, use the same dlhdl.ProcessorConfig
object to generate a second bitstream. Increase the ConvThreadNumber
to 64 and FCThreadNumber
to 16.
hPCNew.setModuleProperty('conv','ConvThreadNumber',64);
hPCNew.setModuleProperty('fc','FCThreadNumber',16);
dlhdl.buildProcessor(hPCNew);
Save the new generated bitstream as 'custom_int8_incThread.bit'
.
Validate Using Generated Bitstreams
Validate the quantized network on the target device using the first generated bitstream, 'custom_int8.bit'
. Specify the bitstream to use for validation using a dlquantizationOptions
object. If the bitstream is not in your working directory, specify the full path to the file.
dlquantOpts_custom_int8 = dlquantizationOptions('Bitstream','custom_int8.bit','Target',hTarget);
valResults_custom_int8 = dq.validate(valData_subset,dlquantOpts_custom_int8);
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream custom_int8.bit. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### The network includes the following layers: 1 'imageinput' Image Input 28×28×1 images with 'zerocenter' normalization (SW Layer) 2 'conv_1' 2-D Convolution 8 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 3 'relu_1' ReLU ReLU (HW Layer) 4 'maxpool_1' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'conv_2' 2-D Convolution 16 3×3×8 convolutions with stride [1 1] and padding 'same' (HW Layer) 6 'relu_2' ReLU ReLU (HW Layer) 7 'maxpool_2' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 8 'conv_3' 2-D Convolution 32 3×3×16 convolutions with stride [1 1] and padding 'same' (HW Layer) 9 'relu_3' ReLU ReLU (HW Layer) 10 'fc' Fully Connected 10 fully connected layer (HW Layer) 11 'softmax' Softmax softmax (SW Layer) 12 'classoutput' Classification Output crossentropyex with '0' and 9 other classes (SW Layer) ### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv_1>>maxpool_2 ... ### Compiling layer group: conv_1>>maxpool_2 ... complete. ### Compiling layer group: conv_3>>relu_3 ... ### Compiling layer group: conv_3>>relu_3 ... complete. ### Compiling layer group: fc ... ### Compiling layer group: fc ... complete. 
### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ _________________ "InputDataOffset" "0x00000000" "184.0 kB" "OutputResultOffset" "0x0002e000" "4.0 kB" "SchedulerDataOffset" "0x0002f000" "8.0 kB" "SystemBufferOffset" "0x00031000" "36.0 kB" "InstructionDataOffset" "0x0003a000" "16.0 kB" "ConvWeightDataOffset" "0x0003e000" "8.0 kB" "FCWeightDataOffset" "0x00040000" "28.0 kB" "EndOffset" "0x00047000" "Total: 284.0 kB" ### Network compilation complete. ### Programming FPGA Bitstream using JTAG... ### Programming the FPGA bitstream has been completed successfully. ### Loading weights to Conv Processor. ### Conv Weights loaded. Current time is 16-Jan-2024 15:08:59 ### Loading weights to FC Processor. ### FC Weights loaded. Current time is 16-Jan-2024 15:08:59 ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware. 
### The network includes the following layers: 1 'imageinput' Image Input 28×28×1 images with 'zerocenter' normalization (SW Layer) 2 'conv_1' 2-D Convolution 8 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 3 'relu_1' ReLU ReLU (HW Layer) 4 'maxpool_1' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'conv_2' 2-D Convolution 16 3×3×8 convolutions with stride [1 1] and padding 'same' (HW Layer) 6 'relu_2' ReLU ReLU (HW Layer) 7 'maxpool_2' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 8 'conv_3' 2-D Convolution 32 3×3×16 convolutions with stride [1 1] and padding 'same' (HW Layer) 9 'relu_3' ReLU ReLU (HW Layer) 10 'fc' Fully Connected 10 fully connected layer (HW Layer) 11 'softmax' Softmax softmax (SW Layer) 12 'classoutput' Classification Output crossentropyex with '0' and 9 other classes (SW Layer) ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. Deep Learning Processor Estimator Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 22272 0.00011 1 2e+04 8979.7 imageinput_norm 4236 0.00002 conv_1 4494 0.00002 maxpool_1 2999 0.00001 conv_2 2455 0.00001 maxpool_2 2388 0.00001 conv_3 2354 0.00001 fc 3346 0.00002 * The clock frequency of the DL processor is: 200MHz ### Finished writing input activations. ### Running single input activation.
Validate the quantized network on the target device using the second generated bitstream, 'custom_int8_incThread.bit'
.
dlquantOpts_custom_incThread = dlquantizationOptions('Bitstream','custom_int8_incThread.bit','Target',hTarget);
valResults_custom_incThread = dq.validate(valData_subset,dlquantOpts_custom_incThread);
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream custom_int8_incThread.bit. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### The network includes the following layers: 1 'imageinput' Image Input 28×28×1 images with 'zerocenter' normalization (SW Layer) 2 'conv_1' 2-D Convolution 8 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 3 'relu_1' ReLU ReLU (HW Layer) 4 'maxpool_1' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'conv_2' 2-D Convolution 16 3×3×8 convolutions with stride [1 1] and padding 'same' (HW Layer) 6 'relu_2' ReLU ReLU (HW Layer) 7 'maxpool_2' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 8 'conv_3' 2-D Convolution 32 3×3×16 convolutions with stride [1 1] and padding 'same' (HW Layer) 9 'relu_3' ReLU ReLU (HW Layer) 10 'fc' Fully Connected 10 fully connected layer (HW Layer) 11 'softmax' Softmax softmax (SW Layer) 12 'classoutput' Classification Output crossentropyex with '0' and 9 other classes (SW Layer) ### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv_1>>maxpool_2 ... ### Compiling layer group: conv_1>>maxpool_2 ... complete. ### Compiling layer group: conv_3>>relu_3 ... ### Compiling layer group: conv_3>>relu_3 ... complete. ### Compiling layer group: fc ... ### Compiling layer group: fc ... complete. 
### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ _________________ "InputDataOffset" "0x00000000" "92.0 kB" "OutputResultOffset" "0x00017000" "4.0 kB" "SchedulerDataOffset" "0x00018000" "36.0 kB" "SystemBufferOffset" "0x00021000" "36.0 kB" "InstructionDataOffset" "0x0002a000" "28.0 kB" "ConvWeightDataOffset" "0x00031000" "8.0 kB" "FCWeightDataOffset" "0x00033000" "20.0 kB" "EndOffset" "0x00038000" "Total: 224.0 kB" ### Network compilation complete. ### Programming FPGA Bitstream using JTAG... ### Programming the FPGA bitstream has been completed successfully. ### Loading weights to Conv Processor. ### Conv Weights loaded. Current time is 16-Jan-2024 15:10:57 ### Loading weights to FC Processor. ### FC Weights loaded. Current time is 16-Jan-2024 15:10:57 ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Finished writing input activations. ### Running single input activation. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware. 
### The network includes the following layers: 1 'imageinput' Image Input 28×28×1 images with 'zerocenter' normalization (SW Layer) 2 'conv_1' 2-D Convolution 8 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 3 'relu_1' ReLU ReLU (HW Layer) 4 'maxpool_1' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'conv_2' 2-D Convolution 16 3×3×8 convolutions with stride [1 1] and padding 'same' (HW Layer) 6 'relu_2' ReLU ReLU (HW Layer) 7 'maxpool_2' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 8 'conv_3' 2-D Convolution 32 3×3×16 convolutions with stride [1 1] and padding 'same' (HW Layer) 9 'relu_3' ReLU ReLU (HW Layer) 10 'fc' Fully Connected 10 fully connected layer (HW Layer) 11 'softmax' Softmax softmax (SW Layer) 12 'classoutput' Classification Output crossentropyex with '0' and 9 other classes (SW Layer) ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. Deep Learning Processor Estimator Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 41020 0.00021 1 41020 4875.7 imageinput_norm 4236 0.00002 conv_1 6683 0.00003 maxpool_1 5804 0.00003 conv_2 5509 0.00003 maxpool_2 4582 0.00002 conv_3 5905 0.00003 fc 8301 0.00004 * The clock frequency of the DL processor is: 200MHz ### Finished writing input activations. ### Running single input activation.
Compare Validation Output
Compare the validation results from both bitstreams. For these bitstream configurations, increasing the number of threads used for convolution and fully connected layers increases the number of frames per second as well as the resource utilization. For more information on how to optimize your processor configuration based on the resource requirements of your hardware, see Estimate Resource Utilization for Custom Processor Configuration.
valResults_custom_int8.Statistics
ans=2×7 table
NetworkImplementation FramesPerSecond Number of Threads (Convolution) Number of Threads (Fully Connected) LUT Utilization (%) BlockRAM Utilization (%) DSP Utilization (%)
_____________________ _______________ _______________________________ ___________________________________ ___________________ ________________________ ___________________
{'Floating-Point'} 4875.6704 16 4 78.8523788674839 55.7565789473684 15.4365079365079
{'Quantized' } 6418.8972 16 4 36.685274372446 47.7521929824561 10.5952380952381
valResults_custom_incThread.Statistics
ans=2×7 table
NetworkImplementation FramesPerSecond Number of Threads (Convolution) Number of Threads (Fully Connected) LUT Utilization (%) BlockRAM Utilization (%) DSP Utilization (%)
_____________________ _______________ _______________________________ ___________________________________ ___________________ ________________________ ___________________
{'Floating-Point'} 8979.6835 64 16 264.974970811442 61.4583333333333 52.5793650793651
{'Quantized' } 12126.3566 64 16 61.4561441914769 49.671052631579 32.0238095238095
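To make the comparison concrete, the throughput gain can be expressed as a ratio of the quantized frame rates. The sketch below copies the two 'Quantized' FramesPerSecond values from the tables above as literals so that it runs without hardware; the variable names fpsDefault and fpsIncThread are illustrative only. On a real run, the same values would presumably be read from the FramesPerSecond variable of each Statistics table.

```matlab
% FramesPerSecond for the 'Quantized' row, copied from the two tables above.
fpsDefault   = 6418.8972;   % custom_int8.bit (16 conv threads, 4 FC threads)
fpsIncThread = 12126.3566;  % custom_int8_incThread.bit (64 and 16 threads)

% Ratio of quantized throughput between the two bitstreams.
speedup = fpsIncThread / fpsDefault;
fprintf('Throughput ratio (incThread / default): %.2f\n', speedup);
```

For these values, quadrupling the thread counts gives a little under twice the frame rate, while LUT and DSP utilization for the quantized implementation rise from about 37% to 61% and from about 11% to 32%.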
Input Arguments
quantObj
— Network to quantize
dlquantizer
object
Network to quantize, specified as a dlquantizer
object.
valData
— Data to use for validation of quantized network
imageDatastore
object | augmentedImageDatastore
object | pixelLabelImageDatastore
object | CombinedDatastore
object | TransformedDatastore
object
Data to use for validation of quantized network, specified as an imageDatastore
object, an augmentedImageDatastore
object, a pixelLabelImageDatastore
(Computer Vision Toolbox) object, a CombinedDatastore
object,
or a TransformedDatastore
object.
You must preprocess the data used for validation of a quantized yolov3ObjectDetector
(Computer Vision Toolbox) object using the preprocess
(Computer Vision Toolbox)
function. For an example of using preprocessed data for validation of a
yolov3ObjectDetector
, see Quantize YOLO v3 Object Detector.
validate
accepts a CombinedDatastore
or
TransformedDatastore
object as input data for validating quantized
yolov3ObjectDetector
and yolov4ObjectDetector
objects. The CombinedDatastore
and
TransformedDatastore
used for validation must contain an
imageDatastore
or augmentedImageDatastore
as the
first datastore and a boxLabelDatastore
as the second datastore. For
more information on valid datastores, see Prepare Data for Quantizing Networks.
quantOpts
— Options for quantizing network
dlquantizationOptions
object
Options for quantizing the network, specified as a dlquantizationOptions
object.
Output Arguments
valResults
— Performance of quantized network
struct
Performance of quantized network, returned as a struct. The struct contains these fields.
- NumSamples — The number of sample inputs used to validate the network, specified by valData.
- MetricResults — Struct containing results of the metric function defined in the dlquantizationOptions object. When more than one metric function is specified in the dlquantizationOptions object, MetricResults is an array of structs. MetricResults contains these fields:

Field | Description |
---|---|
MetricFunction | Metric function used to determine the performance of the quantized network, specified in the dlquantizationOptions object. |
Result | Table indicating the results of the metric function before and after quantization. The first row in the table, 'Floating-Point', contains information for the original floating-point implementation. The second row, 'Quantized', contains information for the quantized implementation. The output of the metric function is displayed in the MetricOutput column. |

- Statistics — Table indicating the learnable parameter memory used, in bytes, by the original floating-point implementation of the network and the quantized implementation.
When the ExecutionEnvironment for the dlquantizer object is set to FPGA, the Statistics table indicates these values for the original floating-point and quantized network implementations:
  - Frames per second
  - Number of convolution threads
  - Number of fully connected threads
  - Lookup table (LUT) resource utilization percentage
  - Block RAM resource utilization percentage
  - DSP resource utilization percentage
The Statistics table is empty when the Target property of dlquantizationOptions is set to 'host'.
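As an illustration of how these fields are typically read, the sketch below builds a stand-in for valResults from the values shown in the CPU example (20 samples, metric output of 1 for both implementations) and computes the change in the metric after quantization. The stand-in omits the MetricFunction field and is an assumption made so the snippet runs without a target; on a real run, validate returns the struct.

```matlab
% Stand-in for valResults, mirroring the fields and values shown in the
% CPU example; replace with the output of validate on a real run.
resultTable = table({'Floating-Point'; 'Quantized'}, [1; 1], ...
    'VariableNames', {'NetworkImplementation', 'MetricOutput'});
valResults = struct('NumSamples', 20, ...
    'MetricResults', struct('Result', resultTable), ...
    'Statistics', []);

% Row 1 is the floating-point network, row 2 the quantized network.
metric = valResults.MetricResults(1).Result.MetricOutput;
metricChange = metric(2) - metric(1);
fprintf('Samples: %d, metric change after quantization: %g\n', ...
    valResults.NumSamples, metricChange);
```

Indexing MetricResults(1) keeps the same code working when several metric functions are specified and MetricResults is an array of structs.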
Limitations
Validation on target hardware for CPU, FPGA, and GPU execution environments is not supported in MATLAB® Online™. For FPGA and GPU execution environments, validation can be performed through emulation on the MATLAB Online host. GPU validation can also be performed if GPU support has been added to your MATLAB Online Server™ cluster. For more information on GPU support for MATLAB Online, see Configure GPU Support in MATLAB Online Server (MATLAB Online Server).
Algorithms
The validate
function determines the default metric function to use
for the validation based on the type of network that is being quantized.
Type of Network | Metric Function |
---|---|
Classification | Top-1 Accuracy — Accuracy of the network |
Object Detection | Average Precision — Average precision over all detection results. See evaluateObjectDetection (Computer Vision Toolbox). |
Regression | MSE — Mean squared error of the network |
Semantic Segmentation | evaluateSemanticSegmentation (Computer Vision Toolbox) — Evaluate semantic segmentation data set against ground truth |
Single Shot Detector (SSD) | WeightedIOU — Average IoU of each class, weighted by the number of pixels in that class |
Version History
Introduced in R2020a
R2022a: Validate the performance of quantized network for CPU target
You can now use the dlquantizer
object and the
validate
function to quantize a network and generate code for CPU
targets.