Generate Custom Bitstream to Meet Custom Deep Learning Network Requirements
Deploy a custom network that has only layers with the convolution module output format, or only layers with the fully connected module output format, by generating a resource-optimized custom bitstream that satisfies your performance and resource requirements. A bitstream generated by using the default deep learning processor configuration contains the convolution (conv), fully connected (fc), and adder modules. This default bitstream can exceed your resource utilization budget, which can drive up costs. To generate a bitstream that contains only the modules that your custom deep learning network requires, modify the deep learning processor configuration by using the setModuleProperty function of the dlhdl.ProcessorConfig object.
In this example, you have a network that has only layers with the fully connected module output format. Generate a custom bitstream that consists of only the fully connected module by removing the convolution and adder modules from the deep learning processor configuration. To remove the convolution and adder modules, use one of these methods:
- Turn off the ModuleGeneration property for the individual modules in the deep learning processor configuration (see the sketch after this list).
- Use the optimizeConfigurationForNetwork function. The function takes the deep learning network object as the input and returns an optimized custom deep learning processor configuration.
Rapidly verify the resource utilization of the optimized deep learning processor configuration by using the estimateResources function.
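Before you turn modules off, you can check which processing modules the current configuration generates. The lines below are a minimal sketch, not part of the example workflow; they only query a default configuration and assume the module names shown in the configuration display ('conv', 'fc', and 'custom').
hPC_check = dlhdl.ProcessorConfig;                                           % default configuration
convGeneration   = hPC_check.getModuleProperty('conv','ModuleGeneration')    % 'on' by default
fcGeneration     = hPC_check.getModuleProperty('fc','ModuleGeneration')      % 'on' by default
customGeneration = hPC_check.getModuleProperty('custom','ModuleGeneration')  % 'on' by default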
Set Up Synthesis Tool Path
To set up the Xilinx® Vivado® tool path, enter:
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');
Create Custom Processor Configuration
Create a custom processor configuration. Save the configuration to hPC.
hPC = dlhdl.ProcessorConfig
hPC =
Processing Module "conv"
ModuleGeneration: 'on'
LRNBlockGeneration: 'off'
SegmentationBlockGeneration: 'on'
ConvThreadNumber: 16
InputMemorySize: [227 227 3]
OutputMemorySize: [227 227 3]
FeatureSizeLimit: 2048
Processing Module "fc"
ModuleGeneration: 'on'
SoftmaxBlockGeneration: 'off'
FCThreadNumber: 4
InputMemorySize: 25088
OutputMemorySize: 4096
Processing Module "custom"
ModuleGeneration: 'on'
Addition: 'on'
MishLayer: 'off'
Multiplication: 'on'
Resize2D: 'off'
Sigmoid: 'off'
SwishLayer: 'off'
TanhLayer: 'off'
InputMemorySize: 40
OutputMemorySize: 120
Processor Top Level Properties
RunTimeControl: 'register'
RunTimeStatus: 'register'
InputStreamControl: 'register'
OutputStreamControl: 'register'
SetupControl: 'register'
ProcessorDataType: 'single'
System Level Properties
TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
TargetFrequency: 200
SynthesisTool: 'Xilinx Vivado'
ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
SynthesisToolChipFamily: 'Zynq UltraScale+'
SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
SynthesisToolPackageName: ''
SynthesisToolSpeedValue: ''
Optimize Processor Configuration for a Custom Fully Connected (FC) Layer-Only Network
To optimize your processor configuration, create a custom network that contains only fully connected layers. Call the custom network fcnet.
layers = [ ...
    imageInputLayer([28 28 3],'Normalization','none','Name','input')
    fullyConnectedLayer(10,'Name','fc')];
layers(2).Weights = rand(10,28*28*3);
layers(2).Bias = rand(10,1);
fcnet = dlnetwork(layers);
plot(fcnet);

Retrieve the resource utilization for the default custom processor configuration by using estimateResources. Retrieve the performance for the custom network fcnet by using estimatePerformance.
hPC.estimateResources
Deep Learning Processor Estimator Resource Results
DSPs Block RAM* LUTs(CLB/ALUT)
------------- ------------- -------------
Available 2520 912 274080
------------- ------------- -------------
DL_Processor 389( 16%) 508( 56%) 216119( 79%)
* Block RAM represents Block RAM tiles in Xilinx devices and Block RAM bits in Intel devices
hPC.estimatePerformance(fcnet)
### An output layer called 'Output1_fc' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### The network includes the following layers:
1 'input' Image Input 28×28×3 images (SW Layer)
2 'fc' Fully Connected 10 fully connected layer (HW Layer)
3 'Output1_fc' Regression Output mean-squared-error (SW Layer)
### Notice: The layer 'input' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'Output1_fc' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
Deep Learning Processor Estimator Performance Results
LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s
------------- ------------- --------- --------- ---------
Network 16127 0.00008 1 16127 12401.6
fc 16127 0.00008
* The clock frequency of the DL processor is: 200MHz
The target device resource counts are:
Digital signal processor (DSP) slice count — 240
Block random access memory (BRAM) count — 128
The estimated performance is 12401.6 frames per second (FPS). The estimated resource use counts are:
Digital signal processor (DSP) slice count — 389
Block random access memory (BRAM) count — 508
The estimated DSP slice count and BRAM count exceed the target device resource budget. To reduce resource use, customize the bitstream by customizing the processor configuration.
Customize Processor Configuration by Using ModuleGeneration Property
Create a deep learning processor configuration object. Save it to hPC_moduleoff. Turn off the convolution and adder modules in the custom deep learning processor configuration.
hPC_moduleoff = dlhdl.ProcessorConfig;
hPC_moduleoff.setModuleProperty('conv','ModuleGeneration','off');
hPC_moduleoff.setModuleProperty('adder','ModuleGeneration','off');
Retrieve the resource utilization for the modified processor configuration by using estimateResources. Retrieve the performance for the custom network fcnet by using estimatePerformance.
hPC_moduleoff.estimateResources
Deep Learning Processor Estimator Resource Results
DSPs Block RAM* LUTs(CLB/ALUT)
------------- ------------- -------------
Available 2520 912 274080
------------- ------------- -------------
DL_Processor 17( 1%) 44( 5%) 25760( 10%)
* Block RAM represents Block RAM tiles in Xilinx devices and Block RAM bits in Intel devices
hPC_moduleoff.estimatePerformance(fcnet)
### An output layer called 'Output1_fc' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### The network includes the following layers:
1 'input' Image Input 28×28×3 images (SW Layer)
2 'fc' Fully Connected 10 fully connected layer (HW Layer)
3 'Output1_fc' Regression Output mean-squared-error (SW Layer)
### Notice: The layer 'input' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'Output1_fc' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
Deep Learning Processor Estimator Performance Results
LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s
------------- ------------- --------- --------- ---------
Network 16127 0.00008 1 16127 12401.6
fc 16127 0.00008
* The clock frequency of the DL processor is: 200MHz
The target device resource counts are:
Digital signal processor (DSP) slice count — 240
Block random access memory (BRAM) count — 128
The estimated performance is 12401.6 frames per second (FPS). The estimated resource use counts are:
Digital signal processor (DSP) slice count — 17
Block random access memory (BRAM) count — 44
The estimated resource use of the customized bitstream is within the target device resource budget, and the estimated performance meets the target network performance.
Customize Processor Configuration by Using optimizeConfigurationForNetwork
Create a deep learning processor configuration object. Save it to hPC_optimized. Generate an optimized deep learning processor configuration by using the optimizeConfigurationForNetwork function.
hPC_optimized = dlhdl.ProcessorConfig;
hPC_optimized.optimizeConfigurationForNetwork(fcnet);
### Optimizing processor configuration for deep learning network...
### An output layer called 'Output1_fc' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Note: Processing module "conv" property "ModuleGeneration" changed from "true" to "false".
### Note: Processing module "fc" property "InputMemorySize" changed from "25088" to "2352".
### Note: Processing module "fc" property "OutputMemorySize" changed from "4096" to "128".
### Note: Processing module "custom" property "ModuleGeneration" changed from "true" to "false".
Processing Module "conv"
ModuleGeneration: 'off'
Processing Module "fc"
ModuleGeneration: 'on'
SoftmaxBlockGeneration: 'off'
FCThreadNumber: 4
InputMemorySize: 2352
OutputMemorySize: 128
Processing Module "custom"
ModuleGeneration: 'off'
Processor Top Level Properties
RunTimeControl: 'register'
RunTimeStatus: 'register'
InputStreamControl: 'register'
OutputStreamControl: 'register'
SetupControl: 'register'
ProcessorDataType: 'single'
System Level Properties
TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
TargetFrequency: 200
SynthesisTool: 'Xilinx Vivado'
ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
SynthesisToolChipFamily: 'Zynq UltraScale+'
SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
SynthesisToolPackageName: ''
SynthesisToolSpeedValue: ''
### Optimizing processor configuration for deep learning network complete.
Retrieve the resource utilization for the optimized processor configuration by using estimateResources. Retrieve the performance for the custom network fcnet by using estimatePerformance.
hPC_optimized.estimateResources
Deep Learning Processor Estimator Resource Results
DSPs Block RAM* LUTs(CLB/ALUT)
------------- ------------- -------------
Available 2520 912 274080
------------- ------------- -------------
DL_Processor 17( 1%) 20( 3%) 25760( 10%)
* Block RAM represents Block RAM tiles in Xilinx devices and Block RAM bits in Intel devices
hPC_optimized.estimatePerformance(fcnet)
### An output layer called 'Output1_fc' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### The network includes the following layers:
1 'input' Image Input 28×28×3 images (SW Layer)
2 'fc' Fully Connected 10 fully connected layer (HW Layer)
3 'Output1_fc' Regression Output mean-squared-error (SW Layer)
### Notice: The layer 'input' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'Output1_fc' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
Deep Learning Processor Estimator Performance Results
LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s
------------- ------------- --------- --------- ---------
Network 16127 0.00008 1 16127 12401.6
fc 16127 0.00008
* The clock frequency of the DL processor is: 200MHz
The target device resource counts are:
Digital signal processor (DSP) slice count — 240
Block random access memory (BRAM) count — 128
The estimated performance is 12401.6 frames per second (FPS). The estimated resource use counts are:
Digital signal processor (DSP) slice count — 17
Block random access memory (BRAM) count — 20
The estimated resource use of the customized bitstream is within the target device resource budget, and the estimated performance meets the target network performance.
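If the estimated frame rate did not meet your performance requirement, you could also raise the processor clock target before re-estimating. The lines below are a hedged sketch, not part of the shipped example; 300 MHz is an assumed value and must be a frequency that your target board supports.
hPC_optimized.TargetFrequency = 300;    % assumed target clock, in MHz
hPC_optimized.estimatePerformance(fcnet)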
Generate Custom Bitstream
Generate a custom bitstream using the processor configuration that matches your performance and resource requirements.
To deploy fcnet by using the bitstream generated from the configuration with the ModuleGeneration property turned off, uncomment this line of code:
% dlhdl.buildProcessor(hPC_moduleoff)
To deploy fcnet by using the bitstream generated from the configuration optimized with the optimizeConfigurationForNetwork function, uncomment this line of code:
% dlhdl.buildProcessor(hPC_optimized)
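After dlhdl.buildProcessor finishes, you can program the board and run fcnet by using a dlhdl.Workflow object. The lines below are a hedged sketch rather than part of this example: the bitstream file name ('dlprocessor.bit'), the Ethernet interface, and the random input image are assumptions that depend on your build output and board setup.
% Deployment sketch (assumptions: bitstream file name, board interface, input data).
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');   % or 'JTAG', depending on your setup
hW = dlhdl.Workflow('Network',fcnet, ...
    'Bitstream','dlprocessor.bit', ...                     % assumed name of the generated bitstream file
    'Target',hTarget);
hW.compile;                                                % generate instructions and weight data
hW.deploy;                                                 % program the FPGA and load the network
inputImg = rand(28,28,3,'single');                         % placeholder input image
prediction = hW.predict(inputImg,'Profile','on');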
See Also
dlhdl.ProcessorConfig | getModuleProperty | setModuleProperty | estimatePerformance | estimateResources