Accelerate NR LDPC Decoder for Streaming Data Using FPGA-in-the-Loop

This example shows how to accelerate simulation of the NR LDPC Decoder from Simulink® by using FPGA-in-the-loop (FIL) in free-running mode for 5G NR streaming data. You can use this mode to accelerate other Monte Carlo simulations. The example also compares acceleration performance between FIL in free-running mode and FIL in lockstep mode.

Requirements

  • SoC development board with an Ethernet or USB Ethernet interface. For supported boards, see Supported FPGA Devices for FPGA Verification.

  • FPGA design software, Vivado® Design Suite or Quartus® Prime Pro software, with a supported version listed in FPGA Verification Requirements.

  • Hardware Setup app to configure the board. You can use either the PS Ethernet interface or the USB Ethernet interface.

Simulate HDL Behavioral Model of LDPC Decoder

Generate Stimulus for LDPC Decoder

To configure parameters and generate stimulus for simulation, run the following script in MATLAB®.

exampleWorkingFolder = pwd;

project = openProject("ldpc_stimulus_scripts_prj");

numFrames = 100;
noisevar  = 1.2;   

% Initialize model parameters
init_ldpc_param;

% Generate input data for simulation model
[decInFrames,txBits] = generate_ldpc_data(numFrames,noisevar,bgn,liftingSize);
generate_simulink_input;

% Navigate back to Example working folder
cd(exampleWorkingFolder);

Simulate HDL Model

To simulate the HDL behavioral model, run this script in MATLAB.

modelName = 'NRLDPCDecoderHDL';
open_system(modelName);

% Simulate
tstart = tic;
simout = sim(modelName);
telapsed = toc(tstart);

% Compare decoder output with input bits
validIndex = find(simout.ctrlOut.valid.Data == 1);
rxBits = reshape(simout.sampleOut.Data(1,1,validIndex),[],1);
numError = sum(rxBits ~= txBits);
fprintf(['Simulation of %s took %.2f seconds. Total number of frames = %d. ' ...
    'Number of error bits = %d.\n'],modelName,telapsed,numFrames,numError);

Open the Logic Analyzer and inspect the input and output data for the device under test (DUT). For the given configuration, the LDPC decoder takes 70,898 cycles to process one frame of 10,400 samples. Therefore, for each frame, Simulink sends 81,298 samples to the LDPC decoder: the 10,400 valid samples followed by 70,898 samples of gap data. The gap data consists of invalid samples that fill the cycles the decoder needs to process the frame.

Logic Analyzer showing the dataIn signal with 10,400 samples of valid data and 70,898 samples of gap data.
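The per-frame sample count quoted above can be reproduced with a few lines of MATLAB. This is a minimal sketch that uses only the two numbers stated in the text; the variable names are chosen here for illustration and are not part of the example project.

```matlab
% Values quoted in the text for this configuration
validSamples = 10400;   % valid input samples per frame
gapSamples   = 70898;   % cycles the decoder needs to process one frame

% Samples sent per frame when gap data is included
samplesPerFrame = validSamples + gapSamples;

% Fraction of the transferred samples that carry valid data
validFraction = validSamples/samplesPerFrame;

fprintf('Samples per frame = %d, valid fraction = %.1f%%\n', ...
    samplesPerFrame,100*validFraction);
```

Under these numbers, only about one sample in eight carries valid data, which is why removing the gap data matters for acceleration.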

Accelerate Simulation Using Free-Running FIL

From the prior simulation, you can see that the HDL implementation of the NR LDPC Decoder block can take a lot of time to simulate. One way to accelerate this process is to use FIL simulation.

FIL simulation offers two modes for synchronizing MATLAB and the FPGA. Lockstep mode ensures cycle-accurate simulation by gating the DUT clock to keep it synchronized with MATLAB or Simulink. Free-running mode instead lets the DUT run asynchronously from MATLAB. This mode is particularly beneficial for applications that do not require MATLAB synchronization, because the DUT can run at a much higher speed.

In contrast to Simulink, which sends invalid gap data to the NR LDPC Decoder block, MATLAB transmits only valid data signals in free-running mode. This mode enables the FPGA to process the input data with an independently operating clock, significantly enhancing the acceleration.

To generate the FIL FPGA project, first generate the HDL code.

makehdl('NRLDPCDecoderHDL/HDL Algorithm');

Open the FPGA-in-the-Loop Wizard.

filWizard

Select the supported board and interface. For this example, set Board Name to AMD Zynq UltraScale+ RFSoC ZCU111 Evaluation Kit. Set FPGA-in-the-Loop Connection to Ethernet. Then, under MATLAB/FPGA Synchronization Mode, select Free-running FPGA.

FPGA-in-the-Loop Wizard open on the FIL Options pane, with Board Name set to AMD Zynq UltraScale+ RFSoC ZCU111 Evaluation Kit, FPGA-in-the-Loop Connection set to Ethernet, and Free-running FPGA mode selected.

Add the generated HDL files and select HDL_Algorithm.vhd as the top-level file.

FPGA-in-the-Loop Wizard open on the Source Files pane, with HDL_Algorithm.vhd selected as the top-level file.

Under DUT I/O ports, map the ports as the following figure shows. Because the bgn and liftingsizein ports stay constant while datain is streamed, mark them as Control data so that you can set them once before writing datain.

FPGA-in-the-Loop Wizard open on the DUT I/O Ports pane, with the Automatically generate I/O port name, direction and width from top-level module parameter selected. The bgn, liftingsizein, liftingsizeout, and nextframe ports are set to the Control data port type, which differs from the default settings.

Set the output data types as the following figure shows.

FPGA-in-the-Loop Wizard open on the Output Types pane, with all outputs set to the Logical data type, except for liftingsizeout, which is set to the Unsigned Integer data type.

Finish the remaining steps to generate the Vivado project and MATLAB class files. At the end, a separate terminal window opens to complete synthesis and bitstream generation.

Run the following script to send the inputs to the LDPC decoder on the FPGA and read the decoded data back into MATLAB using the FIL free-running mode.

% Create FIL object
filObj = HDL_Algorithm_fil;
filObj.IPAddress = '10.10.10.15'; % replace with the IP address of your board
filObj.ReadFrameLength = encInframeLen; % process frame by frame

% Program FPGA
filObj.programFPGA;

% Declare and pre-allocate output variables
ce_out= cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
dataOut = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_start_out = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_end_out = cellfun(@(x) zeros(encInframeLen,1), cell(1, numFrames), 'UniformOutput', false);

% Generate input data for free-running FIL
ctrlin_start=[true; false(decInFrameLen-1,1)];
ctrlin_end = [false(decInFrameLen-1,1);true];

tstart = tic;
% Write configuration data
filObj.writePort('liftingsizein',uint16(liftingSize));
filObj.writePort('bgn',logical(bgn));

% Write streaming data and read the decoded data
for i = 1:numFrames
    filObj.writePort('datain',decInFrames{i}, ...
        'ctrlin_start',ctrlin_start, ...
        'ctrlin_end',ctrlin_end);
    [ce_out{i},dataOut{i},ctrlout_start_out{i},ctrlout_end_out{i}] = ...
        filObj.readPort("ce_out","dataout","ctrlout_start","ctrlout_end");
end
telapsed = toc(tstart);

% Compute error bits
filnumErrors = sum(cell2mat(dataOut') ~= txBits);

% Print result
fprintf(['Free-running FIL simulation took %.2f seconds. Total number of frames = %d. ' ...
    'Number of error bits = %d.\n'],telapsed,numFrames,filnumErrors);

% Release object
filObj.release;
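The error count in the script relies on concatenating the per-frame cell outputs into one column vector that lines up with txBits. The following sketch illustrates that step on small synthetic data, with a size check added before the comparison. All names ending in Demo and the frame sizes are hypothetical and chosen for illustration only; they do not come from the example project.

```matlab
% Hypothetical small sizes for illustration only
numFramesDemo = 3;
frameLenDemo  = 4;

% Reference bits for all frames as one column vector
txBitsDemo = logical([1 0 1 1  0 0 1 0  1 1 0 1]');

% Per-frame outputs stored as a 1-by-N cell array of column vectors,
% mirroring how the script stores dataOut
dataOutDemo = mat2cell(txBitsDemo,repmat(frameLenDemo,1,numFramesDemo),1)';
dataOutDemo{2}(3) = ~dataOutDemo{2}(3);   % inject one bit error in frame 2

% Transpose to N-by-1 so that cell2mat stacks the frames vertically
rxBitsDemo = cell2mat(dataOutDemo');

% Guard against a length mismatch before comparing element by element
assert(isequal(size(rxBitsDemo),size(txBitsDemo)),'Output length mismatch');
numErrDemo = sum(rxBitsDemo ~= txBitsDemo);
fprintf('Number of error bits = %d.\n',numErrDemo);
```

A size check of this kind can help distinguish a frame-length configuration problem from genuine decoding errors.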

Compare Simulation with Lockstep FIL

In lockstep mode, as in the Simulink model, MATLAB transmits 81,298 samples per frame, because the DUT clock is gated to stay synchronized with MATLAB.

To generate the FIL FPGA project in lockstep mode, use the FIL Wizard. For this example, set Board Name to AMD Zynq UltraScale+ RFSoC ZCU111 Evaluation Kit. Set FPGA-in-the-Loop Connection to Ethernet. Then, under MATLAB/FPGA Synchronization Mode, select Lockstep.

FPGA-in-the-Loop Wizard open on the FIL Options pane, with Board Name set to AMD Zynq UltraScale+ RFSoC ZCU111 Evaluation Kit, FPGA-in-the-Loop Connection set to Ethernet, and Lockstep mode selected.

To generate the Vivado project and MATLAB class files, see System Object Generation with the FIL Wizard.

Run the following script to send the inputs into the LDPC decoder on the FPGA and read the decoded data back to MATLAB using the lockstep mode.

% Create FIL object
filObj = HDL_Algorithm_fil_locked;
filObj.IPAddress = '10.10.10.15'; % set IPAddress

% Program FPGA
filObj.programFPGA;

% Declare and pre-allocate output variables
ce_out= cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
dataOut = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
liftingsizeout = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
nextframe = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
validout = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_start_out = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);
ctrlout_end_out = cellfun(@(x) zeros(decframeLength,1), cell(1, numFrames), 'UniformOutput', false);

% Generate input data for lockstep FIL model
sampleInLocked = cell(1, numFrames);
gapdata = zeros(decframeGap+encInframeLen,1);

% Add gap data to the input data as the DUT has latency
for ii=1:numFrames
    sampleInLocked{ii} = fi([decInFrames{ii};gapdata],1,4,0);
end
ctrlin_start=[true; false(decframeLength-1,1)];
ctrlin_end = [false(decInFrameLen-1,1);true;false(decframeGap+encInframeLen,1)];
validIn = [true(decInFrameLen,1); false(decframeGap+encInframeLen,1)];
bgnIn = logical(ones(decframeLength,1).*bgn);
liftingSizeIn = uint16(ones(decframeLength,1).*liftingSize);

tstart=tic;
% Write testing data to FPGA and read result
for i = 1:numFrames
    [ce_out{i},dataOut{i},ctrlout_start_out{i},ctrlout_end_out{i}, ...
        validout{i},liftingsizeout{i},nextframe{i}]= filObj.step( ...
        sampleInLocked{i},ctrlin_start,ctrlin_end, ...
        validIn,bgnIn,liftingSizeIn);
end
telapsed=toc(tstart);

% Compute error bits
validIndex= find(cell2mat(validout')==1);
dataOut_concat =cell2mat(dataOut');
numErrors=sum(dataOut_concat(validIndex)~=txBits);

% Print result
fprintf('Lockstep FIL simulation took %.2f seconds. Total number of frames = %d. Number of error bits = %d.\n', telapsed, numFrames,numErrors);

% Release object
filObj.release;
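Because lockstep mode returns one output sample for every input sample, the decoded bits must be picked out of the padded output by using the valid signal. The sketch below shows that selection on small synthetic data, using a logical mask in place of find. The names ending in Demo and the data values are hypothetical and for illustration only.

```matlab
% Hypothetical per-frame outputs: 2 frames, 6 samples each, 3 valid per frame
dataOutDemo  = {logical([0 0 1 0 1 0]'), logical([0 1 1 0 0 0]')};
validOutDemo = {logical([0 0 1 1 1 0]'), logical([0 1 1 1 0 0]')};

% Stack the frames vertically, as the lockstep script does
allOutDemo   = cell2mat(dataOutDemo');
validMaskDemo = cell2mat(validOutDemo');

% Keep only the samples flagged as valid
rxBitsDemo = allOutDemo(validMaskDemo);

fprintf('Kept %d of %d output samples.\n',numel(rxBitsDemo),numel(allOutDemo));
```

Indexing with the logical mask selects the same elements as indexing with find(mask) and avoids building the intermediate index vector.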

Conclusion

The table below summarizes the performance of the different simulation modes while processing 1,000 frames. Free-running FIL reduces the simulation time to 10 seconds, a 460x improvement over the Simulink behavioral model, and lockstep FIL reduces it to 132 seconds, a 35x improvement. The additional speedup of free-running mode comes primarily from the reduced data size, because this mode transfers only valid data and no gap data.

For applications that do not have a strict requirement for cycle-accurate simulation, free-running mode provides notable efficiency and substantial time savings. For applications that demand cycle-accurate simulation, use lockstep mode. To understand which mode is suitable for your application, see Introduction to Free-Running Mode.

Simulation Mode              Time Taken (in seconds)    Performance Improvement over Simulink Model
Simulink behavioral model    4,602                      1x
Lockstep FIL                 132                        35x
Free-running FIL             10                         460x
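The improvement factors in the table follow directly from the measured times. As a minimal sketch, the following MATLAB lines recompute them from the three times reported above; the variable names are chosen here for illustration.

```matlab
% Simulation times from the table, in seconds
modes    = ["Simulink behavioral model","Lockstep FIL","Free-running FIL"];
simTimes = [4602 132 10];

% Improvement relative to the Simulink behavioral model
speedup = simTimes(1)./simTimes;

for k = 1:numel(modes)
    fprintf('%-26s %6d s  %4.0fx\n',modes(k),simTimes(k),speedup(k));
end
```

Rounded to the nearest integer, the factors are 1x, 35x, and 460x, which matches the table.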
