Debug YOLO v2 Vehicle Detector on FPGA
This example shows how to debug hardware by visualizing signals from a vehicle detector design deployed on the AMD® Zynq® UltraScale+™ MPSoC ZCU102 board. You use FPGA data capture and AXI manager features of the HDL Verifier™ Support Package for AMD FPGA and SoC Devices software to set triggers and capture the signals of interest. The Deploy and Verify YOLO v2 Vehicle Detector on FPGA example shows how to deploy a vehicle detector design on an FPGA. In this example, you integrate FPGA data capture and AXI manager features into this design to debug and visualize its functionality.
Introduction
Debugging designs, especially those deployed to the FPGA, can be a difficult task without a proper set of tools. FPGA data capture and AXI manager offer many capabilities to easily debug designs deployed to an FPGA. In this example, you focus on the Preprocessing module of the design. You analyze several scenarios where proper debugging is required to ensure the application behaves correctly. The scenarios are:
Handshaking between the Preprocessing DUT and the deep learning (DL) IP core. This scenario shows how to use FPGA data capture and AXI manager features to visualize the handshaking events between the Preprocessing DUT and the DL IP in the Logic Analyzer (DSP System Toolbox). You use FPGA data capture to tap the handshaking signals between the Preprocessing DUT and the DL IP from the FPGA.
Functionality of the Resize subsystem. This scenario shows how to add debug hooks to the model and use them for debugging and verification.
Handshaking between the Preprocessing DUT and the DDR memory. This scenario shows how to visualize the handshaking events between the Preprocessing DUT and the DDR memory in the Logic Analyzer. You use FPGA data capture to tap the handshaking signals between the Preprocessing DUT and the DDR memory from the FPGA.
Add Debug Hooks and Test Points in Model
To capture signal data using FPGA data capture, configure the signal as a test point. For more information, see Configure Signals as Test Points (Simulink). Configure all the signals described in this section as test points. Use the Bus Selector (Simulink) block to extract signals from a bus and then add test points. To calculate the valid pixel flow through the Resize subsystem, add debugging logic using counters within the YOLOv2PreprocessAlgorithm model. Use the helperConfigAndAddTestPoints function to automate adding the counters and test points to the YOLOv2PreprocessAlgorithm and DLHandshakeLogicExtMem models. The helperConfigAndAddTestPoints function creates four models: YOLOv2PreprocessTbDebug, YOLOv2PreprocessDUTDebug, YOLOv2PreprocessAlgoDebug, and DLHandshakeLogicDebug. These four models contain all the required test points and debug hooks.
This figure shows the signals that are configured as test points in the YOLOv2PreprocessAlgoDebug model.
This figure shows the signals that are configured as test points in the DLHandshakeLogicDebug model.
Use the Simulink.BlockDiagram.arrangeSystem (Simulink) function to improve the layout of the model.
Integrate FPGA Data Capture and AXI Manager in HDL Workflow Advisor
To generate IP core files for a DL processor, follow the steps in the Configure Deep Learning Processor and Generate IP Core section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. Use the helperUpdateHDLWorkflowAdvisor function to automate configuring the HDL Workflow Advisor settings and generate the bitstream. You must provide the complete path to the DL IP core files. Set the buffer size for FPGA data capture IP to 16384 and the maximum sequence depth to 7.
pathToDLIPFiles = 'F:\dlhdl_prj\ipcore\dlprocessor_v1_0';
modelWithTestPoints = {'YOLOv2PreprocessTbDebug','YOLOv2PreprocessDUTDebug', ...
    'YOLOv2PreprocessAlgoDebug','DLHandshakeLogicDebug'};
helperUpdateHDLWorkflowAdvisor(pathToDLIPFiles,modelWithTestPoints,'16384','7')
Follow these steps to perform this task manually.
Start the targeting workflow by right-clicking the YOLO v2 Preprocess DUT Subsystem subsystem in the YOLOv2PreprocessTbDebug model and selecting HDL Code > HDL Workflow Advisor.
In step 1.1, select IP Core Generation and set Target platform to Xilinx Zynq Ultrascale+ MPSoC ZCU102 Evaluation Kit.
In step 1.2, set Reference design to Deep Learning with Preprocessing Interface. The DL Processor IP name and the DL Processor IP location fields specify the name and location of the generated deep learning processor IP core, respectively. These details are fetched from the IP core report. Set Insert AXI manager to JTAG.
In step 1.3, enable the Enable HDL DUT output port generation for test points setting to update the interface table with all the test points as output ports for the generated DUT. Map the target platform interfaces to the input and output ports of the DUT. For the required interface mapping, see step 1.3 in the Generate and Deploy Bitstream to FPGA section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. This table shows the interface mapping for test points. To capture and visualize the trigger signals in the Logic Analyzer, map the trigger signals to Trigger and Data instead of Trigger. For more information, see Use As (HDL Verifier).
Perform steps 1.4 to 3.1 as shown in the Generate and Deploy Bitstream to FPGA section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example.
In step 3.2, set FPGA data capture buffer size to 16384 and FPGA data capture maximum sequence depth to 7. Select Include capture condition logic in FPGA data capture to enable the capture control logic option in the generated FPGA data capture component.
In step 4.3, generate the bitstream. The HDL Workflow Advisor generates the block_design_wrapper.bit bitstream file in the hdl_prj\vivado_ip_prj\vivado_prj.runs\impl_1 folder.
Handshaking Between Preprocessing DUT and Deep Learning IP Core
The DL IP core expects the preprocessed data to be at a specific address in the DDR memory and to have a specific size. The handshaking between the Preprocessing DUT and the DL IP core is to convey the expected address and size to the Preprocessing DUT. The handshaking comprises these steps:
The Preprocessing DUT drives the rd_addr, rd_len, and rd_avalid control signals in the AXIReadCtrlOutDL bus.
The DL IP core samples these control signals and responds to the Preprocessing DUT by sending the data at the rd_addr location through the AXIReadDataDL signal. The DL IP core also drives the corresponding control signals, rd_dvalid and rd_aready, in the AXIReadCtrlInDL bus.
This process continues for three different addresses corresponding to the InputValid (x"354"), InputAddr (x"358"), and InputSize (x"35C") signals. The IP core generation report for the DL IP contains the addresses for these registers.
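As a quick sanity check on the three register offsets quoted above, the following sketch converts them to decimal and computes the spacing between them. The variable names are illustrative only, and the offsets are taken from the text; confirm them against the IP core generation report for your own build.

```matlab
% Register offsets quoted in the text (hexadecimal), in the order listed.
regNames   = {'InputValid','InputAddr','InputSize'};   % illustrative labels
regOffsets = hex2dec({'354','358','35C'});

% Spacing between consecutive registers, in address units.
spacing = diff(regOffsets);

for k = 1:numel(regNames)
    fprintf('%-10s offset 0x%03X (%d decimal)\n', ...
        regNames{k}, regOffsets(k), regOffsets(k));
end
```

The offsets are 4 apart, so the three reads of the handshake walk through consecutive word-aligned locations.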
Signals Required for Debugging
The DLHandshakeLogicExtMem model contains these signals.
rd_addr --- Address location in the DL IP from which the Preprocessing DUT fetches the required information during handshaking.
rd_len --- Size of data, in bytes, to read from the DL IP starting from the rd_addr address location.
rd_avalid --- Indication of whether the data in the rd_addr and rd_len signals of the same bus is valid.
Data_From_DL --- Information based on the control information the DL IP receives from the Preprocessing DUT in the AXIReadCtrlOutDL bus. The DL IP sends appropriate information on this signal.
rd_dvalid --- Control signal that forms part of the AXIReadCtrlInDL bus. This signal validates the data in the AXIReadDataDL signal.
inputAddr_from_DL --- Output of the Read DL Registers subsystem. The Preprocessing DUT places the preprocessed data in the DDR memory at this address.
inputSize_from_DL --- Output of the Read DL Registers subsystem. This output is the size of the data that the Preprocessing DUT places in the DDR memory.
inputValid_from_DL --- Output of the Read DL Registers subsystem. This signal validates the data in the inputAddr_from_DL and inputSize_from_DL signals.
Timing Diagram
This timing diagram shows the sequence of events for this scenario.
Trigger Conditions in FPGA Data Capture
A successful handshaking between the Preprocessing DUT and DL IP comprises seven events. These events act as sequential triggers in the FPGA Data Capture tool to capture the data.
Configure these settings in the FPGA Data Capture tool:
Set Number of capture windows to 1 to indicate that handshaking events happen only at the beginning of preprocessing. The signal data corresponding to the entire sample depth can be captured in a single window once these trigger conditions are satisfied.
Set Number of trigger stages to 7 to indicate that the handshaking comprises seven events.
Set Trigger Position to a small value close to zero. If you set this option to 0, you cannot visualize these events because the tool captures signal data only after this trigger.
Repeat the Trigger Stage 1 and Trigger Stage 2 sequences three times.
Use a trigger time out to ensure that Trigger Stage 7 happens within one clock cycle of Trigger Stage 6. Trigger Stage 7 corresponds to a rising edge on the inpValid_from_DL signal.
Set Capture mode to On Trigger.
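To make the idea of sequential trigger stages concrete, here is a small software model of a seven-stage trigger running on synthetic one-bit signals. It is only a sketch: the text does not spell out which signal each stage watches, so the assignment below (stages 1 to 6 alternate between rising edges on rd_avalid and rd_dvalid, stage 7 is a rising edge on the input-valid signal) is an assumption, and the waveforms are invented for illustration. The FPGA Data Capture tool evaluates the real conditions in hardware.

```matlab
% Synthetic one-bit signals, one sample per clock cycle (illustrative data).
rd_avalid  = [0 1 0 0 0 1 0 0 0 1 0 0 0 0];
rd_dvalid  = [0 0 0 1 0 0 0 1 0 0 0 1 0 0];
inputValid = [0 0 0 0 0 0 0 0 0 0 0 0 1 1];

% Rising edge at sample n: previous sample low, current sample high.
rising = @(s) [false, s(2:end) == 1 & s(1:end-1) == 0];
edges  = [rising(rd_avalid); rising(rd_dvalid); rising(inputValid)];

% Assumed mapping: row of "edges" that each of the seven stages watches.
stageSignal = [1 2 1 2 1 2 3];
stageHit    = zeros(1, numel(stageSignal));  % sample at which each stage fired

stage = 1;
for n = 1:size(edges, 2)
    % Advance at most one stage per clock cycle, strictly in order.
    if stage <= numel(stageSignal) && edges(stageSignal(stage), n)
        stageHit(stage) = n;
        stage = stage + 1;
    end
end

triggered     = stage > numel(stageSignal);      % all seven stages seen
withinTimeout = stageHit(7) - stageHit(6) <= 1;  % stage 7 one cycle after 6
```

With this data the stages fire at samples 2, 4, 6, 8, 10, 12, and 13, so the final stage follows stage 6 within one cycle, which is the situation the trigger time out is meant to enforce.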
Visualize Captured Data in Logic Analyzer
This timing diagram shows that the handshaking between the Preprocessing DUT and the DL IP behaves as expected.
Functionality of Resize Subsystem
In this scenario, the focus is to verify the behavior of the Resize subsystem. The input image to the Resize subsystem is of size 224-by-340 (76,160 pixels). The output image of the Resize subsystem is of size 128-by-128 (16,384 pixels). You can use the FPGA data capture feature to count the total number of output pixels from the Resize subsystem and capture the resized image data to find any errors within the logic. Simulink™ does not support renaming of the output of a Bus Selector block. To rename the signal, use the model components contained in the green boxes in this image.
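The pixel counts quoted above give the values the two debug counters are expected to reach after one frame. The following sketch computes them and compares the output count with the 16384-sample capture buffer configured in step 3.2. The variable names are illustrative, not taken from the example models.

```matlab
% Image dimensions from the text (rows-by-columns).
inputSize  = [224 340];
outputSize = [128 128];

% Expected final counter values after one full frame.
expectedInputPixels  = prod(inputSize);    % valid pixels entering Resize
expectedOutputPixels = prod(outputSize);   % valid pixels leaving Resize

% Buffer size set for the FPGA data capture IP in step 3.2.
captureBufferSize = 16384;
fitsInOneCapture  = expectedOutputPixels <= captureBufferSize;

fprintf('Input pixels: %d, output pixels: %d, fits in buffer: %d\n', ...
    expectedInputPixels, expectedOutputPixels, fitsInOneCapture);
```

One resized frame fills the buffer exactly, which is why capturing only the valid samples matters in this scenario: any invalid samples stored in the buffer would push part of the frame out of the capture.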
Signals Required for Debugging
The YOLOv2PreprocessAlgoDebug model contains these signals.
Input_Pix_Valid --- Control signal that is a part of the pixelcontrol bus input of the Resize subsystem. This signal validates the pixel data in the Inp_Pixel_Data signal.
Input_Pix_Cnt --- Output of the HDL Counter block, which counts the number of valid pixels that you pass as input to the Resize subsystem. The model uses the Input_Pix_Valid signal to enable this counter.
Resized_Pix_Data --- Output signal of the Resize subsystem. This signal contains the pixel data corresponding to the resized image.
Resized_Pix_Valid --- Control signal that is a part of the pixelcontrol bus output of the Resize subsystem. This signal validates the pixel data in the Resized_Pix_Data signal.
Resized_Pix_Cnt --- Output of the HDL Counter block, which counts the number of valid pixels returned by the Resize subsystem. The model uses the Resized_Pix_Valid signal to enable this counter.
Timing Diagram
Validate the output pixel data using the Resized_Pix_Valid signal. Whenever this signal goes high, the Resize subsystem sends the valid output data, as this timing diagram shows. The Input_Pix_Cnt and Resized_Pix_Cnt signals indicate the number of valid pixels entering and emerging from the Resize subsystem, respectively.
Trigger Conditions in FPGA Data Capture
To capture the valid resized pixel data, use the capture condition logic in the FPGA Data Capture tool.
Configure these settings in the FPGA Data Capture tool:
Select Enable the capture control logic in the Capture Condition tab.
Use the Resized_Pix_Valid signal in the capture condition logic to ensure that the tool captures the data only when this signal goes high.
Select Immediately in the capture mode dropdown menu to enable immediate capture. This option is suitable for scenarios in which no specific triggers determine when the tool captures data.
Visualize Captured Data in Logic Analyzer
This timing diagram shows the resized pixel data and the pixel counts captured by the FPGA Data Capture tool. The tp_Resized_Pix_Valid signal is always high, unlike in the equivalent model simulations using Simulink software. This difference arises because the capture condition makes the FPGA Data Capture tool capture data only when tp_Resized_Pix_Valid is high.
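The effect of a capture condition can be mimicked in software with a logical mask. This sketch uses made-up sample values and illustrative variable names. It keeps only the samples for which the valid signal is high, which is why the valid signal itself appears constantly high in the captured data.

```matlab
% Made-up stream: valid flag and pixel data, one sample per clock cycle.
valid = logical([0 1 1 0 0 1 0 1]);
data  = uint8([0 10 20 0 0 30 0 40]);

% A capture condition stores a sample only when the condition is true.
capturedData  = data(valid);
capturedValid = valid(valid);   % every stored sample has valid == true

disp(capturedData)
```

The gaps between valid samples disappear from the capture, so the captured waveform is a compacted view of the stream rather than a cycle-accurate one.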
The FPGA Data Capture tool creates the dataCaptureOut structure in the MATLAB® workspace after it captures data. Visualize the resized image by extracting and concatenating the RGB image data from dataCaptureOut.
RData = reshape(dataCaptureOut.tp_Resized_Pix_Data_0,128,128);
GData = reshape(dataCaptureOut.tp_Resized_Pix_Data_1,128,128);
BData = reshape(dataCaptureOut.tp_Resized_Pix_Data_2,128,128);
resizedImage = cat(3,RData',GData',BData');
imshow(resizedImage)
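The transposes in the code above are needed because reshape fills a matrix column by column. The sketch below shows the same pattern on a tiny 2-by-3 image, assuming the captured pixels arrive in raster order (row by row), as is typical for a streaming pixel interface. The values and variable names are illustrative.

```matlab
rows = 2;
cols = 3;
original = [1 2 3; 4 5 6];

% Raster-order stream: first row, then second row.
stream = reshape(original', 1, []);      % 1 2 3 4 5 6

% reshape is column-major, so fill a cols-by-rows matrix and transpose.
rebuilt = reshape(stream, cols, rows)';

% Reshaping straight to rows-by-cols scrambles the pixels.
naive = reshape(stream, rows, cols);     % [1 3 5; 2 4 6]
```

For the square 128-by-128 image in this example, the two dimensions passed to reshape are equal, so the pattern reduces to a reshape followed by a transpose.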
Scenario 3: Handshaking Between Preprocessing DUT and DDR Memory
After the Preprocessing DUT resizes and normalizes the input image, it places the preprocessed image data in the DDR memory at the address it receives from the DL IP. The handshaking process comprises these steps:
The Preprocessing DUT drives the wr_addr, wr_len, and wr_valid control signals in the AXIWriteCtrlOutDDR bus. The DUT also sends the preprocessed signal data through the AXIWriteDataDDR signal.
The DDR memory samples these control signals and the preprocessed pixel data received from the Preprocessing DUT.
Once all the data is placed in the DDR memory, the DDR memory acknowledges the Preprocessing DUT with a pulse on the wr_complete signal in the AXIWriteCtrlInDDR bus.
Signals Required for Debugging
The DLHandshakeLogicDebug model contains these signals.
wr_addr --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal is the address in the DDR memory at which the Preprocessing DUT places the data.
wr_len --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal is the size of data, in bytes, that the Preprocessing DUT places in the DDR memory starting from the wr_addr address location.
wr_valid --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal validates the data in the wr_addr and wr_len signals of the same bus.
wr_complete --- Control signal that is a part of the AXIWriteCtrlInDDR bus. This signal is the acknowledgement sent from the DDR memory to the Preprocessing DUT containing an indication of the status of the data.
writeDone --- Output of the Write To DDR subsystem. This signal indicates whether the data transfer to the DDR memory is successful and triggers the DL IP to start reading that data from the DDR memory for further processing.
Timing Diagram
After the final rising edge on the wr_valid control signal occurs, the DDR memory sends a pulse on the wr_complete signal as an acknowledgement, and a pulse appears on the writeDone internal signal. This timing diagram shows the sequence of events for this scenario.
Trigger Conditions in FPGA Data Capture
Configure these settings in the FPGA Data Capture tool:
Set Number of capture windows to 1 because these handshaking events happen towards the end of the transaction between the Preprocessing DUT and the DDR memory. After these trigger conditions are satisfied, the signal data corresponding to the entire sample depth can be captured in a single window.
Set Number of trigger stages to 2 because this handshaking event comprises three events, of which two events occur simultaneously.
Set the Trigger position option close to the end of the handshake to ensure that the Logic Analyzer displays the complete handshake.
Set Capture mode to On Trigger.
Trigger Stage 1 corresponds to a rising edge on the wr_valid signal from the DDR memory.
The Trigger Condition 2 section captures an expected pulse on the wr_complete and writeDone signals. This stage uses logical and comparison operators.
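The two-stage trigger can be modeled in software to show how a comparison on each signal combines with a logical operator. This is a sketch on invented waveforms. It assumes that stage 2 requires both wr_complete and writeDone to equal 1 in the same cycle (an AND combination), which is one reading of the description above and not a confirmed tool setting.

```matlab
% Invented one-bit waveforms, one sample per clock cycle.
wr_valid    = [0 1 1 0 1 1 0 0 0 0];
wr_complete = [0 0 0 0 0 0 0 1 0 0];
writeDone   = [0 0 0 0 0 0 0 1 0 0];

% Stage 1: first rising edge on wr_valid.
stage1 = find([false, diff(wr_valid) == 1], 1);

% Stage 2 (assumed): both signals compare equal to 1, combined with AND.
stage2Cond = (wr_complete == 1) & (writeDone == 1);

% Stage 2 is only evaluated after stage 1 has fired.
stage2 = stage1 + find(stage2Cond(stage1+1:end), 1);

fprintf('Stage 1 at sample %d, stage 2 at sample %d\n', stage1, stage2);
```

Because the two pulses coincide, a single stage with a combined condition covers both events, which matches the choice of two trigger stages for three events.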
Visualize Captured Data in Logic Analyzer
This timing diagram confirms that the handshaking between the Preprocessing DUT and the DDR memory happens as expected.
Use FPGA Data Capture and AXI Manager Features Simultaneously
As described in Design Considerations for Data Capture (HDL Verifier), to use AXI manager and FPGA data capture features simultaneously, set the capture mode of FPGA data capture to nonblocking. Create an FPGADataCapture object in non-blocking mode and launch the FPGA Data Capture tool.
cd(fullfile('hdl_prj','ipcore','YOLOV2Pre_cs_ipv4_v1_0','fpga_data_capture'))
fpgadc = FPGADataCapture;
fpgadc.CaptureMode = 'nonblocking';
launchApp(fpgadc);
You must configure a few registers before sending a video frame as an input to the model. Set the DUTProcStart register of the Preprocessing DUT to 1. You can use AXI manager for this task. The YOLOv2DeployAndVerifyDetector function, which is attached to the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example, contains all the steps in the Verify Deployed YOLO v2 Vehicle Detector Using MATLAB section. The YOLOv2DeployAndVerifyDetector function uses the writePort function to configure all the control registers. To use the AXI manager instead of writePort to configure the DUTProcStart register, use the helperUpdateYOLOv2DeployAndVerifyDetector function.
The helperUpdateYOLOv2DeployAndVerifyDetector function creates the DebugYOLOv2VehicleDetector function, which is a modified version of the YOLOv2DeployAndVerifyDetector function and contains an AXI manager object. The helperUpdateYOLOv2DeployAndVerifyDetector function adds this code to the DebugYOLOv2VehicleDetector function, which you can use to access the AXI manager feature.
Create an AXI manager object.
h = aximanager('AMD');
Use the writememory function to write 1 into the DUTProcStart register. You can find the address for this register in the IP core generation report.
writememory(h, '0xA0040100',1);
Release the JTAG cable resource after writing into the DUTProcStart register to ensure that FPGA data capture can use the same JTAG interface to capture the data.
release(h)
To capture the required data corresponding to different scenarios, configure the FPGA Data Capture tool with the appropriate trigger conditions. This diagram shows the data capture process:
Configure the FPGA Data Capture tool with the trigger conditions and then click the Capture Data button to start the data capture process. The tool captures the data when it observes triggers.
Enter the command DebugYOLOv2VehicleDetector(hSOC) to start the workflow comprising all the steps from configuring the registers to reading back the processed data to MATLAB. Because you start the FPGA Data Capture tool before this step, the FPGA Data Capture tool detects all the events.
The AXI manager configures the DUTProcStart control register while the FPGA Data Capture tool waits for the trigger condition to be satisfied. You can simultaneously use both of these tools to capture all the required data.
See Also
Deploy and Verify YOLO v2 Vehicle Detector on FPGA
Related Topics
- Design Considerations for Data Capture (HDL Verifier)
- Set Up AXI Manager (HDL Verifier)
- Target Deep Learning Processor and Image Preprocessing to FPGA (SoC Blockset)