
Streaming Data from Software to Hardware

This example shows how to design the data-path from an embedded processor to hardware logic (FPGA) using SoC Blockset™. Design and simulate the entire application, comprising the FPGA and processor algorithms, the memory interface, and task scheduling, to meet the system requirements. Then validate the design on hardware by generating code from the model and implementing it on a System-on-Chip (SoC) device.

Supported hardware platforms:

  • Xilinx® Zynq® ZC706 evaluation kit

  • Xilinx Zynq UltraScale™+ MPSoC ZCU102 Evaluation Kit

  • Xilinx Zynq UltraScale™+ RFSoC ZCU111 Evaluation Kit

  • ZedBoard™ Zynq-7000 Development Board

  • Altera® Cyclone® V SoC development kit

  • Altera Arria® 10 SoC development kit

Design Task and System Requirements

In this example, the embedded processor sends test data, either a low-frequency or a high-frequency sinusoid, to the FPGA. The FPGA algorithm detects the frequency of the signal by filtering and lights up a light-emitting diode (LED) to indicate the detection. The data-path is modeled similarly to the Streaming Data from Hardware to Software example, but the data flow is reversed.

The application has these performance requirements.

  • Throughput: 10e6 samples per second

  • Maximum latency: 10 ms

  • Data streaming: Continuous

Design Using SoC Blockset

Create the SoC model soc_swhw_stream_top by using the Stream from Processor to FPGA template. The top model includes the FPGA model soc_swhw_stream_fpga and the processor model soc_swhw_stream_proc, instantiated as model references. The top model also includes a Software to AXI4-Stream block that models the shared external memory between the FPGA and the processor.

Design to Meet Latency Requirement: Begin with a few potential frame sizes and calculate the frame period for each frame size in Table-1. The frame period is the time between two consecutive frames streamed from the processor to the FPGA. For this example, the FPGA output sample time is 1/10e6, or 1e-7 seconds, because the FPGA algorithm runs at 10 MHz. The frame period is calculated as

$\mathit{FramePeriod} = \mathit{FrameSize} \times \mathit{FPGAOutputSampleTime}$
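As a rough illustration of this formula, the following MATLAB sketch computes the frame period for a few frame sizes. The frame sizes are hypothetical placeholders, not the values from Table-1.

```matlab
% Sketch: frame period for a few candidate frame sizes.
% The frame sizes below are hypothetical, not the Table-1 values.
fpgaSampleTime = 1/10e6;                      % FPGA algorithm runs at 10 MHz -> 1e-7 s
frameSize      = [1000 4000 25000];           % hypothetical frame sizes (samples)
framePeriod    = frameSize * fpgaSampleTime;  % FramePeriod = FrameSize * FPGAOutputSampleTime
disp(framePeriod)                             % approximately 1e-4, 4e-4, and 2.5e-3 s
```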

The memory latency is the time that samples spend waiting in the queue of frame buffers and in the FPGA FIFO. Size the FPGA FIFO so that it holds the same amount of data as one frame buffer; the FIFO then contributes one additional frame period of latency, which is the + 1 term below. To stay within the maximum latency requirement, calculate the number of frame buffers for each frame size such that:

$(\mathit{NumFrameBuffers} + 1) \times \mathit{FramePeriod} \le \mathit{MaxLatency}$

The maximum latency allowed for this example is 10 ms. Calculate the maximum number of frame buffers for each case in Table-1. Because the number of buffers is derived from the maximum latency, every case meets the latency requirement.
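The following sketch applies the latency inequality to find the largest number of frame buffers for each frame size. It multiplies both sides by the 10 MHz sample rate and works in samples rather than seconds, so that floating-point rounding cannot push an exact boundary case below an integer before the floor. The frame sizes are hypothetical, not the Table-1 values.

```matlab
% Sketch: maximum number of frame buffers allowed by the latency budget.
fs         = 10e6;                      % FPGA sample rate (samples/s)
maxLatency = 10e-3;                     % maximum latency requirement (s)
frameSize  = [1000 4000 25000 50000];   % hypothetical frame sizes (samples)

% (NumFrameBuffers + 1) * FrameSize / fs <= MaxLatency
%   <=>  (NumFrameBuffers + 1) * FrameSize <= MaxLatency * fs
latencySamples = round(maxLatency * fs);            % 100000 samples
maxBuffers = floor(latencySamples ./ frameSize) - 1;
disp(maxBuffers)
```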

Memory architecture constraints dictate the allowed range for the number of buffers. The software Direct Memory Access (DMA) driver allows at most 64 frame buffers. The minimum number of frame buffers is 3. While the processor writes one frame buffer, the FPGA reads from another frame buffer. Therefore, the number of frame buffers must satisfy:

$3 \le \mathit{NumFrameBuffers} \le 64$

Cases #5 and #6 violate the minimum buffer requirement.
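This sketch combines a latency-derived buffer count with the allowed range of 3 to 64 buffers. It assumes that, when the latency budget permits more than 64 buffers, you use the DMA driver limit of 64, which only reduces latency further. The input buffer counts are hypothetical.

```matlab
% Sketch: clamp the latency-derived buffer count to the DMA driver range.
minBuffers    = 3;              % minimum number of frame buffers
maxDmaBuffers = 64;             % maximum allowed by the software DMA driver
maxBuffers    = [99 24 3 1];    % hypothetical latency-derived maximums

numBuffers = min(maxBuffers, maxDmaBuffers);   % never exceed the driver limit
meetsRange = numBuffers >= minBuffers;         % false -> case violates the minimum
disp([numBuffers; meetsRange])
```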

Design to Meet Throughput Requirement: On average, the software processing must complete within one frame period. Otherwise, the software task does not generate data fast enough for the FPGA to consume, which violates the throughput requirement. That is:

$\mathit{FramePeriod} > \mathit{MeanTaskDuration}$

There are various ways to obtain the mean task duration for each frame size of your algorithm; these are covered in the Task Execution example. Table-2 lists the mean task durations for various frame sizes. Cases #1 and #2 violate the throughput requirement because their mean task durations are greater than the calculated frame periods.
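The throughput check is a per-case comparison of the frame period against the mean task duration. In this sketch, both the frame sizes and the mean task durations are hypothetical placeholders for values you measure yourself; they are not the Table-2 values.

```matlab
% Sketch: throughput check FramePeriod > MeanTaskDuration for each case.
fpgaSampleTime   = 1/10e6;               % 1e-7 s at 10 MHz
frameSize        = [1000 4000 25000];    % hypothetical frame sizes (samples)
meanTaskDuration = [2e-4 3e-4 1e-3];     % hypothetical mean task durations (s)

framePeriod     = frameSize * fpgaSampleTime;      % about [1e-4 4e-4 2.5e-3] s
meetsThroughput = framePeriod > meanTaskDuration;  % false -> software too slow
disp(meetsThroughput)
```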

Design to Meet Data Continuity Requirement: To meet the data continuity requirement, fill the frame buffers in memory before starting to stream the data (priming). When temporary disruptions occur due to processor execution, the data is still available from the previously filled frame buffers. Priming is implemented by software logic in the soc_swhw_stream_proc/Writer/Priming subsystem, which generates a streamEnable command for the FPGA to start streaming data once the memory is almost full.

Task durations can vary for many reasons, such as different code execution paths and variation in OS switching time, so the software task might not deliver data to the FPGA through shared memory on time. This can break data continuity. Specify the mean task execution duration and its statistical distribution in the mask of the Task Manager block, and then simulate to verify whether this requirement is met.
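To build intuition for why priming helps, the following sketch is a simplified, deterministic model of the shared frame buffers; it is not how the Task Manager block simulates task durations. It assumes that time is measured in frame periods, that all buffers are full when streaming starts, that the FPGA reads one frame per period and frees that buffer at the end of the period, and that the software writes one frame per task, waiting whenever no buffer is free. With these hypothetical task durations, a single slow task causes an underflow with 3 buffers but not with 5.

```matlab
% Sketch: does a burst of slow software tasks drain the primed buffers?
% Assumptions (not taken from the example model):
%   * time is in units of one frame period, T = 1
%   * all P buffers are primed (full) when streaming starts at t = 0
%   * the FPGA reads frame j during [j-1, j) and frees its buffer at t = j
%   * the software writes frames back to back and waits for a free buffer
T = 1;
taskDuration    = 0.9 * ones(1, 20);   % hypothetical task durations (in T)
taskDuration(8) = 3.5;                 % hypothetical one-off slow task
bufferOptions   = [3 5];               % numbers of frame buffers to compare
underflow       = false(size(bufferOptions));

for b = 1:numel(bufferOptions)
    P = bufferOptions(b);
    numFrames = P + numel(taskDuration);
    writeDone = zeros(1, numFrames);   % frames 1..P are primed: ready at t = 0
    for k = P+1:numFrames
        bufferFree   = (k - P) * T;                   % buffer of frame k-P is released
        startTime    = max(writeDone(k-1), bufferFree);
        writeDone(k) = startTime + taskDuration(k - P);
        if writeDone(k) > (k - 1) * T                 % FPGA needs frame k at t = k-1
            underflow(b) = true;                      % data continuity is lost
        end
    end
end
disp([bufferOptions; underflow])
```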

By default, the model is configured with the case #3 parameters. Simulate the top model, and then click Data Inspector on the Simulation tab. Add the bufAvail signals to the top view. In this case, the available software buffer signal does not drop to zero, and the validDropLED in the top model does not light up, indicating that the data is streamed continuously.

Configure the model for case #4 by running this command, and then simulate the model again.

soc_swhw_stream_set_parameters(4); % row # 4

In this case, the available software buffers drop to zero, and the validDropLED in the top model lights up.

Case #4 violates the data continuity requirement. Case #3 is therefore the optimal case, meeting all of the design requirements. Table-3 shows the updated results.

Before deploying the model, run the soc_swhw_stream_set_parameters(3) command to restore the case #3 parameters.

Implement and Run Model on Hardware

These products are required for this section:

  • HDL Coder™

  • Embedded Coder®

  • SoC Blockset Support Package for Xilinx Devices, or SoC Blockset Support Package for Intel Devices

For more information about support packages, see SoC Blockset Supported Hardware.

To implement the model on a supported SoC board, use the SoC Builder tool. By default, the model is configured for, and therefore implemented on, the Xilinx Zynq ZC706 evaluation kit. To open SoC Builder, click Configure, Build, & Deploy in the toolstrip, and then follow these steps:

  1. On the Setup screen, select Build model. Click Next.

  2. On the Select Build Action screen, select Build and load for external mode. Click Next.

  3. On the Select Project Folder screen, specify the project folder. Click Next.

  4. On the Review Hardware Mapping screen, click Next.

  5. On the Review Memory Map screen, view the memory map by clicking View/Edit. Click Next.

  6. On the Validate Model screen, check the compatibility of the model for implementation by clicking Validate. Click Next.

  7. On the Build Model screen, begin building the model by clicking Build. An external shell opens when FPGA synthesis begins. Click Next.

  8. On the Connect Hardware screen, test the connectivity of the host computer with the SoC board by clicking Test Connection. To go to the Run Application screen, click Next.

FPGA synthesis often takes more than 30 minutes to complete. To save time, you can use the provided pregenerated bitstream by following these steps:

  • Close the external shell to terminate synthesis.

  • Copy the pregenerated bitstream to your project folder by running this copyfile command.

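% Copy the pregenerated ZC706 bitstream into the project folder.
% The destination './soc_prj' must match the folder specified in SoC Builder.
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot, ...
 'toolbox','soc','supportpackages','xilinxsoc','xilinxsocexamples', ...
 'bitstreams','soc_swhw_stream_top-zc706.bit'),'./soc_prj');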

  • Click Load and Run to load the pregenerated bitstream and open the generated software model soc_swhw_stream_top_sw.

After loading the bitstream, run the generated software model soc_swhw_stream_top_sw in external mode by clicking Monitor and Tune in the toolstrip. LED2 on the board lights up, indicating that the FPGA has detected the high-frequency signal. To change the frequency of the sinusoid at run time, replace the SourceSelector terminator block with a Constant block, and then run the model again in external mode. Change the constant value from 0 to 1 to switch the signal from high frequency to low frequency.

Implementation on other boards: To implement the model on a supported board other than the ZC706, first configure the model for that board, and then set the example parameters as follows.

  • In the Simulink® toolstrip, on the System on Chip tab, open the Configuration Parameters window by clicking Hardware Settings.

  • In the Configuration Parameters window, in Hardware Implementation, select your board from the Hardware board drop-down list for both the top model and the processor model.

  • In the Hardware board settings section, expand Target hardware resources. Under Groups, click FPGA design (top level). Specify IP core clock frequency (MHz) as 10.

Next, open SoC Builder and follow the same steps as for the Xilinx Zynq ZC706 board. Modify the copyfile command to copy the bitstream for your board. For the Altera Arria® 10 SoC and Altera Cyclone® V SoC development kits, use the copyfile command below, changing the file name to match your board. For the Altera Arria® 10 SoC development kit, copy both the '.periph.rbf' and the '.core.rbf' files.

% Cyclone V SoC bitstream; change the file name to match your board.
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot, ...
 'toolbox','soc','supportpackages','intelsoc','intelsocexamples', ...
 'bitstreams','soc_swhw_stream_top-c5soc.rbf'),'./soc_prj');

The following pregenerated bitstream files are available:

  • 'soc_swhw_stream_top-zc706.bit'

  • 'soc_swhw_stream_top-zedboard.bit'

  • 'soc_swhw_stream_top-zcu102.bit'

  • 'soc_swhw_stream_top-XilinxZynqUltraScale_RFSoCZCU111EvaluationKit.bit'

  • 'soc_swhw_stream_top-c5soc.rbf'

  • 'soc_swhw_stream_top-a10soc.periph.rbf'

  • 'soc_swhw_stream_top-a10soc.core.rbf'
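If you script the bitstream copy for several boards, a sketch like the following builds the support package paths from the file names listed above. The board tags (for example, 'a10soc') are taken from those file names and are a hypothetical way of identifying your board. The copy step runs only when you set doCopy to true on a machine with the support package installed, and it copies into './soc_prj' as in the commands above.

```matlab
% Sketch: build the relative paths of the pregenerated bitstream(s) for one board.
% boardTag is a hypothetical identifier taken from the bitstream file names.
boardTag = 'a10soc';
doCopy   = false;   % set to true only where the support package is installed

% Intel (Altera) boards use the intelsoc folders; the other boards use xilinxsoc.
if any(strcmp(boardTag, {'c5soc', 'a10soc'}))
    vendorDirs = {'intelsoc', 'intelsocexamples'};
else
    vendorDirs = {'xilinxsoc', 'xilinxsocexamples'};
end

% Arria 10 needs two .rbf files, Cyclone V one .rbf, Xilinx boards one .bit.
switch boardTag
    case 'a10soc'
        files = {'soc_swhw_stream_top-a10soc.periph.rbf', ...
                 'soc_swhw_stream_top-a10soc.core.rbf'};
    case 'c5soc'
        files = {'soc_swhw_stream_top-c5soc.rbf'};
    otherwise
        files = {['soc_swhw_stream_top-' boardTag '.bit']};
end

% Paths relative to the support package root.
relPaths = cell(size(files));
for k = 1:numel(files)
    relPaths{k} = fullfile('toolbox', 'soc', 'supportpackages', ...
        vendorDirs{:}, 'bitstreams', files{k});
end

if doCopy
    for k = 1:numel(relPaths)
        copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot, ...
            relPaths{k}), './soc_prj');
    end
end
disp(relPaths')
```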

In summary, this example showed how to design the data-path from processor to FPGA for continuous streaming. You designed and modeled the behavior using SoC Blockset and went through the workflow required to implement it on an SoC device.

See Also