Custom Deep Learning Processor Generation to Meet Performance Requirements
This example shows how to create a custom processor configuration and estimate the performance of a pretrained series network. You can then modify parameters of the custom processor configuration and re-estimate the performance. Once the estimate meets your performance requirements, you can generate a custom bitstream from the custom processor configuration.
Prerequisites
Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model Quantization Library
MATLAB Coder Interface for Deep Learning Libraries
Load Pretrained Series Network
To load the pretrained series network LogoNet, enter:
snet = getLogoNetwork;
Create Custom Processor Configuration
To create a custom processor configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.
hPC = dlhdl.ProcessorConfig;
hPC.TargetFrequency = 220;
hPC
hPC =
  Processing Module "conv"
    ModuleGeneration: 'on'
    LRNBlockGeneration: 'on'
    ConvThreadNumber: 16
    InputMemorySize: [227 227 3]
    OutputMemorySize: [227 227 3]
    FeatureSizeLimit: 2048
  Processing Module "fc"
    ModuleGeneration: 'on'
    SoftmaxBlockGeneration: 'off'
    FCThreadNumber: 4
    InputMemorySize: 25088
    OutputMemorySize: 4096
  Processing Module "adder"
    ModuleGeneration: 'on'
    InputMemorySize: 40
    OutputMemorySize: 40
  Processor Top Level Properties
    RunTimeControl: 'register'
    InputDataInterface: 'External Memory'
    OutputDataInterface: 'External Memory'
    ProcessorDataType: 'single'
  System Level Properties
    TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
    TargetFrequency: 220
    SynthesisTool: 'Xilinx Vivado'
    ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
    SynthesisToolChipFamily: 'Zynq UltraScale+'
    SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
    SynthesisToolPackageName: ''
    SynthesisToolSpeedValue: ''
Estimate LogoNet Performance
To estimate the performance of the LogoNet series network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).
hPC.estimatePerformance(snet)
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

Deep Learning Processor Estimator Performance Results

                    LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                    ------------------------   -------------------------   ---------   -------------   --------
Network                             39926006                     0.18148           1        39926006        5.5
    ____conv_1                       6825671                     0.03103
    ____maxpool_1                    3755088                     0.01707
    ____conv_2                      10440701                     0.04746
    ____maxpool_2                    1447840                     0.00658
    ____conv_3                       9405685                     0.04275
    ____maxpool_3                    1765856                     0.00803
    ____conv_4                       1819636                     0.00827
    ____maxpool_4                      28098                     0.00013
    ____fc_1                         2651288                     0.01205
    ____fc_2                         1696632                     0.00771
    ____fc_3                           89511                     0.00041
 * The clock frequency of the DL processor is: 220MHz
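The latency and throughput columns of the estimator output can be cross-checked by hand. The following is a minimal sketch, assuming that for a single frame the latency in seconds is the cycle count divided by the clock frequency and that Frames/s is its reciprocal. The variable names are illustrative and are not part of the toolbox.

```matlab
% Values copied from the "Network" row of the estimator output.
latencyCycles = 39926006;   % LastFrameLatency(cycles)
clockHz = 220e6;            % DL processor clock frequency, 220 MHz

% Assumption: single-frame latency = cycles / clock, throughput = 1 / latency.
latencySeconds = latencyCycles / clockHz;
framesPerSecond = 1 / latencySeconds;

fprintf('Latency: %.5f s, throughput: %.1f Frames/s\n', ...
    latencySeconds, framesPerSecond);
```

Under this assumption the computed values round to the 0.18148 seconds and 5.5 Frames/s reported in the Network row.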
The estimated frames per second is 5.5 Frames/s. To improve the network performance, modify the processor data type, the convolution module thread number, and the fully connected module thread number of the custom processor configuration. You can also raise the target frequency. For more information about these processor parameters, see getModuleProperty and setModuleProperty.
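To see where the cycles go before choosing which parameters to change, you can total the per-layer latencies from the estimator output for the default configuration. This is a rough sketch that groups layers by their name prefix only. It makes no claim about which processor module executes each layer, and the variable names are illustrative.

```matlab
% Per-layer cycle counts copied from the estimator output at 220 MHz.
layerNames = {'conv_1','maxpool_1','conv_2','maxpool_2','conv_3', ...
              'maxpool_3','conv_4','maxpool_4','fc_1','fc_2','fc_3'};
layerCycles = [6825671 3755088 10440701 1447840 9405685 ...
               1765856 1819636 28098 2651288 1696632 89511];

% The layer cycles add up to the Network row of the estimator output.
totalCycles = sum(layerCycles);

% Group layers by name prefix and compute each group's share of the total.
isConv = startsWith(layerNames, 'conv');
isPool = startsWith(layerNames, 'maxpool');
isFC   = startsWith(layerNames, 'fc');

convShare = 100 * sum(layerCycles(isConv)) / totalCycles;
poolShare = 100 * sum(layerCycles(isPool)) / totalCycles;
fcShare   = 100 * sum(layerCycles(isFC))   / totalCycles;

fprintf('conv: %.1f%%, maxpool: %.1f%%, fc: %.1f%%\n', ...
    convShare, poolShare, fcShare);
```

With these numbers the convolution layers account for roughly 71 percent of the estimated cycles, which suggests that the convolution thread number is the first parameter worth increasing.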
Create Modified Custom Processor Configuration
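To create a modified custom processor configuration, create a new dlhdl.ProcessorConfig object and change its parameters. This configuration raises the target frequency to 300 MHz, sets the processor data type to int8, and increases the convolution and fully connected thread numbers to 64 and 16. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.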
hPCNew = dlhdl.ProcessorConfig;
hPCNew.TargetFrequency = 300;
hPCNew.ProcessorDataType = 'int8';
hPCNew.setModuleProperty('conv', 'ConvThreadNumber', 64);
hPCNew.setModuleProperty('fc', 'FCThreadNumber', 16);
hPCNew
hPCNew =
  Processing Module "conv"
    ModuleGeneration: 'on'
    LRNBlockGeneration: 'on'
    ConvThreadNumber: 64
    InputMemorySize: [227 227 3]
    OutputMemorySize: [227 227 3]
    FeatureSizeLimit: 2048
  Processing Module "fc"
    ModuleGeneration: 'on'
    SoftmaxBlockGeneration: 'off'
    FCThreadNumber: 16
    InputMemorySize: 25088
    OutputMemorySize: 4096
  Processing Module "adder"
    ModuleGeneration: 'on'
    InputMemorySize: 40
    OutputMemorySize: 40
  Processor Top Level Properties
    RunTimeControl: 'register'
    InputDataInterface: 'External Memory'
    OutputDataInterface: 'External Memory'
    ProcessorDataType: 'int8'
  System Level Properties
    TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
    TargetFrequency: 300
    SynthesisTool: 'Xilinx Vivado'
    ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
    SynthesisToolChipFamily: 'Zynq UltraScale+'
    SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
    SynthesisToolPackageName: ''
    SynthesisToolSpeedValue: ''
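Instead of reading the full display, you can query individual settings with getModuleProperty and check them programmatically. The snippet below is a small sketch that assumes Deep Learning HDL Toolbox is installed. The variable names convThreads and fcThreads are illustrative.

```matlab
% Rebuild the modified configuration so that this snippet is self-contained.
hPCNew = dlhdl.ProcessorConfig;
hPCNew.TargetFrequency = 300;
hPCNew.ProcessorDataType = 'int8';
hPCNew.setModuleProperty('conv', 'ConvThreadNumber', 64);
hPCNew.setModuleProperty('fc', 'FCThreadNumber', 16);

% Read the module properties back and confirm that they were applied.
convThreads = hPCNew.getModuleProperty('conv', 'ConvThreadNumber');
fcThreads = hPCNew.getModuleProperty('fc', 'FCThreadNumber');

assert(convThreads == 64, 'ConvThreadNumber was not applied.');
assert(fcThreads == 16, 'FCThreadNumber was not applied.');
```

A check like this is useful in scripts that try several configurations, because a mistyped module or property name would otherwise go unnoticed until the estimate is run.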
Quantize LogoNet Series Network
To quantize the LogoNet network, enter:
dlquantObj = dlquantizer(snet,'ExecutionEnvironment','FPGA');
Image = imageDatastore('heineken.png','Labels','Heineken');
dlquantObj.calibrate(Image);
Estimate LogoNet Performance
To estimate the performance of the quantized LogoNet series network on the modified processor configuration, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).
hPCNew.estimatePerformance(dlquantObj)
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

Deep Learning Processor Estimator Performance Results

                    LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                    ------------------------   -------------------------   ---------   -------------   --------
Network                             14306694                     0.04769           1        14306694       21.0
    ____conv_1                       3477191                     0.01159
    ____maxpool_1                    1876680                     0.00626
    ____conv_2                       2932291                     0.00977
    ____maxpool_2                     723536                     0.00241
    ____conv_3                       2611391                     0.00870
    ____maxpool_3                     882544                     0.00294
    ____conv_4                        641788                     0.00214
    ____maxpool_4                      14025                     0.00005
    ____fc_1                          665265                     0.00222
    ____fc_2                          425425                     0.00142
    ____fc_3                           56558                     0.00019
 * The clock frequency of the DL processor is: 300MHz
The estimated frames per second is 21.0 Frames/s.
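To quantify the improvement, you can compare the Network rows of the two estimates. The sketch below assumes that single-frame throughput equals the clock frequency divided by the cycle count, and it splits the overall speedup into the part that comes from fewer cycles and the part that comes from the higher clock. The variable names are illustrative.

```matlab
% Network rows copied from the two estimator outputs.
baseCycles = 39926006;  baseClockHz = 220e6;  % single, 16 conv / 4 fc threads
newCycles  = 14306694;  newClockHz  = 300e6;  % int8, 64 conv / 16 fc threads

% Assumption: Frames/s for a single frame = clock frequency / cycles.
baseFps = baseClockHz / baseCycles;
newFps  = newClockHz / newCycles;

% Split the speedup into cycle reduction and clock gain.
speedup = newFps / baseFps;
cycleReduction = baseCycles / newCycles;
clockGain = newClockHz / baseClockHz;

fprintf('%.1f -> %.1f Frames/s (%.2fx = %.2fx cycles * %.2fx clock)\n', ...
    baseFps, newFps, speedup, cycleReduction, clockGain);
```

With these inputs the overall speedup is roughly 3.8x, most of which comes from the reduced cycle count rather than from the higher clock frequency.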
Generate Custom Processor and Bitstream
Use the new custom processor configuration to build and generate a custom processor and bitstream. Use the custom bitstream to deploy the LogoNet network to your target FPGA board.
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2019.2\bin\vivado.bat');
% dlhdl.buildProcessor(hPCNew);
To learn how to use the generated bitstream file, see Generate Custom Bitstream.
The generated bitstream in this example is similar to the zcu102_int8 bitstream. To deploy the quantized LogoNet network using the zcu102_int8 bitstream, see Obtain Prediction Results for Quantized LogoNet Network.