Accelerate NR LDPC Decoder for Streaming Data Using FPGA-in-the-Loop
This example shows how to accelerate simulation of the NR LDPC Decoder from Simulink® by using FPGA-in-the-loop (FIL) in free-running mode for 5G NR streaming data. You can use this mode to accelerate other Monte Carlo simulations. The example also compares acceleration performance between FIL in free-running mode and FIL in lockstep mode.
Requirements
SoC development board with an Ethernet or USB Ethernet interface. For supported boards, see Supported FPGA Devices for FPGA Verification.
FPGA design software, Vivado® Design Suite or Quartus® Prime Pro software, with a supported version listed in FPGA Verification Requirements.
Hardware Setup app to configure the board. You can use either the PS Ethernet interface or the USB Ethernet interface.
Simulate HDL Behavioral Model of LDPC Decoder
Generate Stimulus for LDPC Decoder
To configure parameters and generate stimulus for simulation, run the following script in MATLAB®.
exampleWorkingFolder = pwd;
project = openProject("ldpc_stimulus_scripts_prj");
numFrames = 100;
noisevar = 1.2;
% Initialize model parameters
init_ldpc_param;
% Generate input data for simulation model
[decInFrames,txBits] = generate_ldpc_data(numFrames,noisevar,bgn,liftingSize);
generate_simulink_input;
% Navigate back to example working folder
cd(exampleWorkingFolder);
Simulate HDL Model
To simulate the HDL behavioral model, run this script in MATLAB.
modelName = 'NRLDPCDecoderHDL';
open_system(modelName);
% Simulate
tstart = tic;
simout = sim(modelName);
telapsed = toc(tstart);
% Compare decoder output with input bits
validIndex = find(simout.ctrlOut.valid.Data==1);
rxBits = reshape(simout.sampleOut.Data(1,1,validIndex),[],1);
numError = sum(rxBits~=txBits);
% Print result (the trailing space keeps the two sentences separated)
fprintf(['Simulation of %s took %.2f seconds. Total number of frames = %d. ' ...
    'Number of error bits = %d.\n'],modelName,telapsed,numFrames,numError);
Open the Logic Analyzer and inspect the input and output data for the device under test (DUT). For the given configuration, the LDPC decoder takes 70,898 cycles to process one frame of 10,400 samples. Therefore, for each frame, Simulink sends 81,298 samples to the LDPC decoder: the 10,400 valid samples followed by 70,898 samples of gap data. The gap data consists of invalid samples that fill the stream while the decoder processes the frame.
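The per-frame sample count can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two figures quoted above; the variable names are hypothetical and are not part of the example project.

```matlab
% Figures quoted in the text for this configuration
validSamples = 10400;   % valid input samples per frame
gapSamples   = 70898;   % cycles the decoder needs to process one frame

% Samples that Simulink sends per frame (valid data plus gap data)
samplesPerFrame = validSamples + gapSamples;

% Fraction of the transmitted samples that carry valid data
validFraction = validSamples/samplesPerFrame;

fprintf('Samples per frame = %d, valid fraction = %.1f%%\n', ...
    samplesPerFrame,100*validFraction);
```

Only about one sample in eight carries valid data, which is why sending valid data alone can shorten the run considerably.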
Accelerate Simulation Using Free-Running FIL
From the prior simulation, you can see that the HDL implementation of the NR LDPC Decoder block can take a lot of time to simulate. One way to accelerate this process is to use FIL simulation.
FIL simulation offers two modes for synchronizing MATLAB and the FPGA. Lockstep mode ensures cycle-accurate simulation by gating the DUT clock to keep the DUT synchronized with MATLAB or Simulink. Free-running mode, in contrast, lets the DUT run asynchronously from MATLAB. This mode benefits applications that do not require synchronization with MATLAB, because the DUT can run much faster when its clock is not gated.
In contrast to Simulink, which sends invalid gap data to the NR LDPC Decoder block, MATLAB transmits only valid data signals in free-running mode. This mode enables the FPGA to process the input data with an independently operating clock, significantly enhancing the acceleration.
To generate the FIL FPGA project, first generate the HDL code.
makehdl('NRLDPCDecoderHDL/HDL Algorithm');
Open the FPGA-in-the-Loop Wizard.
filWizard
Select the supported board and interface. For this example, set Board Name to AMD Zynq UltraScale+ RFSoC ZCU111 Evaluation Kit.
Set FPGA-in-the-Loop Connection to Ethernet. Then, under MATLAB/FPGA Synchronization Mode, select Free-running FPGA.
Add the generated HDL files and select HDL_Algorithm.vhd as the top-level file.
Under DUT I/O ports, map the ports as the following figure shows. Because the values on the bgn and liftingsizein ports stay constant while datain is streamed, mark them as Control data so that you can set them before writing datain.
Set the output data types as the following figure shows.
Finish the remaining steps to generate the Vivado project and MATLAB class files. At the end, a separate terminal window opens to complete synthesis and bitstream generation.
Run the following script to send the inputs to the LDPC decoder on the FPGA and read the decoded data back into MATLAB using the FIL free-running mode.
% Create FIL object
filObj = HDL_Algorithm_fil;
filObj.IPAddress = '10.10.10.15'; % change IP
filObj.ReadFrameLength = encInframeLen; % process frame by frame
% Program FPGA
filObj.programFPGA;
% Declare and pre-allocate output variables
ce_out = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
dataOut = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_start_out = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_end_out = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
% Generate input data for free-running FIL
ctrlin_start = [true; false(decInFrameLen-1,1)];
ctrlin_end = [false(decInFrameLen-1,1); true];
tstart = tic;
% Write configuration data
filObj.writePort('liftingsizein',uint16(liftingSize));
filObj.writePort('bgn',logical(bgn));
% Write streaming data and read the decoded data
for i = 1:numFrames
    filObj.writePort('datain',decInFrames{i}, ...
        'ctrlin_start',ctrlin_start, ...
        'ctrlin_end',ctrlin_end);
    [ce_out{i},dataOut{i},ctrlout_start_out{i},ctrlout_end_out{i}] = ...
        filObj.readPort("ce_out","dataout","ctrlout_start","ctrlout_end");
end
telapsed = toc(tstart);
% Compute error bits
filnumErrors = sum(cell2mat(dataOut')~=txBits);
% Print result
fprintf('Free Running FIL Simulation took %.2f seconds. Total number of frames = %d. Number of error bits = %d.\n', telapsed, numFrames,filnumErrors);
% Release object
filObj.release;
Compare Simulation with Lockstep FIL
In lockstep mode, as in the Simulink model, MATLAB transmits 81,298 samples per frame, because the DUT clock is gated to keep the DUT synchronized with MATLAB.
To generate the FIL FPGA project in lockstep mode, use the FIL Wizard. For this example, set Board Name to AMD Zynq UltraScale+ RFSoC ZCU111 Evaluation Kit. Set FPGA-in-the-Loop Connection to Ethernet. Then, under MATLAB/FPGA Synchronization Mode, select Lockstep.
To generate the Vivado project and MATLAB class files, see System Object Generation with the FIL Wizard.
Run the following script to send the inputs into the LDPC decoder on the FPGA and read the decoded data back to MATLAB using the lockstep mode.
% Create FIL object
filObj = HDL_Algorithm_fil_locked;
filObj.IPAddress = '10.10.10.15'; % set IPAddress
% Program FPGA
filObj.programFPGA;
% Declare and pre-allocate output variables
ce_out = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
dataOut = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
liftingsizeout = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
nextframe = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
validout = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_start_out = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_end_out = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
% Generate input data for lockstep FIL model
sampleInLocked = cell(1, numFrames);
gapdata = zeros(decframeGap+encInframeLen,1);
% Add gap data to the input data as the DUT has latency
for ii = 1:numFrames
    sampleInLocked{ii} = fi([decInFrames{ii};gapdata],1,4,0);
end
ctrlin_start = [true; false(decframeLength-1,1)];
ctrlin_end = [false(decInFrameLen-1,1); true; false(decframeGap+encInframeLen,1)];
validIn = [true(decInFrameLen,1); false(decframeGap+encInframeLen,1)];
bgnIn = logical(ones(decframeLength,1).*bgn);
liftingSizeIn = uint16(ones(decframeLength,1).*liftingSize);
tstart = tic;
% Write testing data to FPGA and read result
for i = 1:numFrames
    [ce_out{i},dataOut{i},ctrlout_start_out{i},ctrlout_end_out{i}, ...
        validout{i},liftingsizeout{i},nextframe{i}] = filObj.step( ...
        sampleInLocked{i},ctrlin_start,ctrlin_end, ...
        validIn,bgnIn,liftingSizeIn);
end
telapsed = toc(tstart);
% Compute error bits
validIndex = find(cell2mat(validout')==1);
dataOut_concat = cell2mat(dataOut');
numErrors = sum(dataOut_concat(validIndex)~=txBits);
% Print result
fprintf(['Lockstep FIL Simulation took %.2f seconds. Total number of frames = %d. ' ...
    'Number of error bits = %d.\n'], telapsed, numFrames,numErrors);
% Release object
filObj.release;
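The scripts in this example report a raw count of error bits. If you want to compare runs with different frame counts, you might normalize that count to a bit error rate. The following is a minimal sketch on hypothetical toy vectors (txBitsToy and rxBitsToy are stand-ins, not outputs of the example), using the same comparison the scripts apply to txBits.

```matlab
% Hypothetical toy data standing in for the transmitted and decoded bits
txBitsToy = logical([1 0 1 1 0 0 1 0]).';
rxBitsToy = logical([1 0 0 1 0 0 1 1]).';

% Count mismatches in the same way the example scripts do
numErrToy = sum(rxBitsToy ~= txBitsToy);

% Normalize by the number of transmitted bits to get a bit error rate
berToy = numErrToy/numel(txBitsToy);

fprintf('Error bits = %d, bit error rate = %.3f\n',numErrToy,berToy);
```

With the workspace variables from the scripts above, the same expression would be numErrors/numel(txBits), assuming txBits holds every transmitted bit.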
Conclusion
The table below summarizes the performance of the different simulation modes while processing 1,000 frames. Free-running FIL reduces the simulation time to 10 seconds, a 460x improvement over the Simulink behavioral model and roughly 13x faster than lockstep FIL. This speedup comes primarily from the smaller data volume in free-running mode, which transfers only valid data and omits the gap samples.
For applications that do not have a strict requirement for cycle-accurate simulation, free-running mode provides notable efficiency and substantial time savings. For applications that demand cycle-accurate simulation, use lockstep mode. To understand which mode is suitable for your application, see Introduction to Free-Running Mode.
Simulation Mode | Time Taken (in seconds) | Performance Improvement over Simulink Model |
Simulink behavioral model | 4,602 | 1x |
Lockstep FIL | 132 | 35x |
Free-running FIL | 10 | 460x |
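The improvement factors in the table follow directly from the measured times. This short sketch recomputes them; the variable names are hypothetical, and the times are the ones reported in the table.

```matlab
% Simulation times from the table, in seconds
modes   = ["Simulink behavioral model","Lockstep FIL","Free-running FIL"];
timeSec = [4602 132 10];

% Improvement of each mode relative to the Simulink behavioral model
speedup = timeSec(1)./timeSec;

for k = 1:numel(modes)
    fprintf('%-26s %6d s  %5.0fx\n',modes(k),timeSec(k),speedup(k));
end

% Improvement of free-running FIL relative to lockstep FIL
freeVsLockstep = timeSec(2)/timeSec(3);
fprintf('Free-running vs. lockstep: %.1fx\n',freeVsLockstep);
```

These timings depend on the board, connection, and host machine, so treat the ratios as indicative of this setup only.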