Video length is 23:03

MATLAB to FPGA in 5 Steps

Engineers use MATLAB® to develop algorithms for applications such as signal processing, wireless communication, and image/video processing. To develop a proof-of-concept, engineers need to implement their design on FPGA evaluation or prototyping boards. Using the HDL Coder™ workflow, discover the key steps necessary to convert and evolve a MATLAB algorithm into readable and optimized HDL code that can be implemented on FPGAs.

Published: 1 Jul 2021

Hello everyone and welcome to this MathWorks webinar, 'MATLAB to FPGA in 5 Steps', an introduction to the HDL Coder workflow.

I am Raghu Sivakumar, part of the product marketing team in Hyderabad, India, responsible for MathWorks solutions for FPGA, ASIC and SoC development.

A quick overview of the webinar's agenda: I will talk about the motivation for the webinar, then take a Pulse Detector algorithm in MATLAB as an example, and then demonstrate the design workflow.

In this webinar I will start with the Pulse Detector Algorithm in MATLAB and by the end of the webinar, we will have a well-defined readable HDL code, with its inputs and outputs defined and its datatype defined in Fixed point. And to get to this code, I did not have to write any of the HDL. In this webinar I will show how in incremental steps I went from MATLAB to HDL Code.  

This brings me back nicely to the motivation of this webinar. We have many customers working on applications such as signal processing, wireless communication and image/video processing who work predominantly in MATLAB. They ask us how they can take their MATLAB algorithms onto an FPGA evaluation board. They either have little knowledge of FPGA programming or their hardware design engineers are too busy. If this situation relates to you, I would request you to spend the next 30 minutes watching this webinar, MATLAB to FPGA in 5 Steps.

Let's now take the Pulse Detector algorithm in MATLAB. I chose the pulse detector as it's a commonly used algorithm in many applications, and it is familiar to a wide audience. The key takeaway is the steps rather than the algorithm; the algorithm could be a filter, a chirp signal, OFDM, etc.

In the pulse detector algorithm in MATLAB, I define the pulse and insert it in the transmit signal. To model a real-world receive signal, I add noise to the transmit signal. Now, to find the pulse, I pass the receive signal through a filter with its coefficients defined, and to detect the pulse I pass the filter output through a max function. So as you can see, in MATLAB I was able to work on the entire frame. But in hardware this frame will be an incoming stream of data on which operations are performed, while managing the timing of parallel paths. Simulink is good for this. We can visualize parallel paths, it has a built-in sense of timing, and we can visualize data type propagation through the operations.
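As a rough illustration of the frame-based steps just described (define a pulse, insert it into a transmit frame, add noise, filter, take the max), here is a minimal sketch. It is written in Python rather than MATLAB so that it is self-contained, and it is not the code shown in the webinar: the pulse values, frame length, pulse location, noise level and the use of a matched filter are all assumptions made for the example.

```python
import random

def make_rx_signal(pulse, frame_len, pulse_loc, noise_std, seed=0):
    """Insert the pulse into an all-zero frame and add complex Gaussian noise."""
    rng = random.Random(seed)
    tx = [0j] * frame_len
    for k, p in enumerate(pulse):
        tx[pulse_loc + k] = p
    return [s + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) for s in tx]

def matched_filter(rx, pulse):
    """FIR filter whose coefficients are the conjugated, time-reversed pulse."""
    coeffs = [p.conjugate() for p in reversed(pulse)]
    out = []
    for n in range(len(rx)):
        acc = 0j
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * rx[n - k]
        out.append(acc)
    return out

def detect_pulse(filtered):
    """Frame-based detection: global maximum of the magnitude over the whole frame."""
    mags = [abs(v) for v in filtered]
    peak = max(mags)
    return mags.index(peak), peak

# Hypothetical example values, chosen only for illustration.
pulse = [1+1j, 1-1j, -1+1j, 1+1j, -1-1j, 1+1j, 1-1j, 1+1j]
rx = make_rx_signal(pulse, frame_len=200, pulse_loc=60, noise_std=0.05)
loc, peak = detect_pulse(matched_filter(rx, pulse))
# The filter output peaks at the last sample of the pulse: 60 + len(pulse) - 1.
```

Note that `detect_pulse` needs the whole frame before it can answer, which is exactly the property that does not carry over to streaming hardware.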

Both MATLAB and Simulink work seamlessly together and you can even include MATLAB code within Simulink. This I will demonstrate during the webinar.  

The design workflow we recommend to our customers is to take advantage of both the textual programming of MATLAB and the visual modelling of Simulink, building a time-based system in a simulation environment.

Using MATLAB, we will create our Golden Reference, the part of the algorithm that we want implemented on the hardware. This will be our first step. In step 2 we will model a sample-based hardware implementation in Simulink. In step 3 we will optimize our hardware for speed and resource usage. In step 4 we will quantize the data types to fixed point. And finally, in step 5, we will generate the HDL code. At each step we will verify functionality and correctness against the MATLAB Golden Reference.

In step 2 we will model our peak detector in the Simulink environment.  

The frame-based MATLAB golden algorithm will be converted into a sample-based Simulink model.

In MATLAB we used a filter and passed the signal through the max function to detect the peak. In Simulink we will design our model to work on the incoming stream of data and detect the peak by sliding a window over the last 11 samples. To detect the peak, we will use three stages: first filtering the incoming stream, second computing the max function, and third detecting the local peak.

Let’s take a look at the demo.

We will make use of Simulink's HDL optimized library, with over 250 blocks, to model the hardware implementation.

The 'Signal From Workspace' block is used to create the stream of data. This sample data is then sent through a Discrete FIR Filter block. The block parameters are the variable values from the MATLAB algorithm. We are simply re-using the work we did in MATLAB.

In Simulink one can visualize the structure and the flow of data as it moves through the model. We will log specific signals that we will later use to verify the Simulink model against the MATLAB golden reference.

In MATLAB we were able to find the global max of the entire signal, and the function uses complex operations such as square root. In hardware, as the signal is streaming in, we will calculate the local max. In order to save resources, rather than using complex operations like square root, we will design it to sum the squares of the real and imaginary parts of the signal.
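The reason the square root can be dropped is that squaring preserves ordering for non-negative magnitudes, so the sample with the largest magnitude is also the sample with the largest magnitude squared. A small Python sketch (the sample values are made up for illustration):

```python
def magnitude_squared(z):
    """Sum of the squared real and imaginary parts; no square root needed."""
    return z.real * z.real + z.imag * z.imag

# Hypothetical complex samples.
samples = [3+4j, 1-1j, -6+2j, 0.5j]

# Index of the largest sample, found with and without the square root.
by_abs = max(range(len(samples)), key=lambda i: abs(samples[i]))
by_sq = max(range(len(samples)), key=lambda i: magnitude_squared(samples[i]))
```

Both searches pick the same index, but the second needs only two multipliers and an adder per sample. Any detection threshold then has to be expressed in squared units as well.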

Next, we calculate the max of the most recent 11 samples. This is easily done in MATLAB, and we will use the MATLAB Function block to include MATLAB code in the model. The peak is found by checking if the middle sample is larger than the others and also greater than a minimum threshold.

This method enables detection of the pulse immediately, as this processes the streamed signal as it arrives. 
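The sliding-window check can be sketched as follows. This is a Python illustration of the logic described, not the MATLAB Function block code from the demo; the class name, the threshold value and the test stream are assumptions.

```python
from collections import deque

WINDOW_LEN = 11          # window length mentioned in the webinar
MID = WINDOW_LEN // 2    # index of the middle sample in the window

class LocalPeakDetector:
    """Streaming peak detector: one call to step() per incoming sample."""

    def __init__(self, threshold):
        self.threshold = threshold
        # The window starts filled with zeros, like registers after reset.
        self.window = deque([0.0] * WINDOW_LEN, maxlen=WINDOW_LEN)

    def step(self, sample):
        """Shift in one sample; return (detected, middle_sample_value)."""
        self.window.append(sample)
        mid = self.window[MID]
        larger_than_rest = all(
            mid > v for i, v in enumerate(self.window) if i != MID
        )
        return (larger_than_rest and mid > self.threshold), mid

# Hypothetical stream with a single spike at sample index 8.
stream = [0.1] * 20
stream[8] = 5.0
detector = LocalPeakDetector(threshold=1.0)
hits = [n for n, s in enumerate(stream) if detector.step(s)[0]]
```

In this sketch the detector fires when the spike reaches the middle of the window, that is, once the 5 samples after it have arrived, without waiting for the rest of the frame.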

The peak is held by the delay enabled block and displayed when the detected signal becomes high.

In order to verify if this Simulink model functions as per our MATLAB Golden reference, we will run a comparison between the logged signals of the Simulink Design, against the golden reference. Throughout this process, we will employ a MATLAB test bench script for the verification purpose.

The location of the peak detected in MATLAB and in Simulink is the same, and the correlation error is in the range of -17 eps.
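The comparison itself can be as simple as checking the peak location and the largest sample-by-sample difference. The Python sketch below shows one way to do that; the function name, the explicit `latency` argument and the data are assumptions, not the test bench script used in the demo.

```python
def compare_to_reference(reference, logged, latency=0):
    """Compare a logged streaming signal against a frame-based reference.

    `latency` is the number of samples by which the streamed output lags
    the reference (an assumption of this sketch; 0 if they are aligned).
    Returns (peak_locations_match, max_abs_error).
    """
    aligned = logged[latency:latency + len(reference)]
    if len(aligned) != len(reference):
        raise ValueError("logged signal is too short for the given latency")
    max_err = max(abs(r - s) for r, s in zip(reference, aligned))
    ref_loc = max(range(len(reference)), key=lambda i: abs(reference[i]))
    sim_loc = max(range(len(aligned)), key=lambda i: abs(aligned[i]))
    return ref_loc == sim_loc, max_err

# Hypothetical data: the logged signal is the reference delayed by two samples.
reference = [0.1, 0.2, 3.0, 0.2, 0.1]
logged = [0.0, 0.0] + reference + [0.0]
same_loc, err = compare_to_reference(reference, logged, latency=2)
```

Keeping this check in one script and re-running it after every change is what makes the later steps safe to take.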

With this step, we have modelled the hardware implementation in Simulink, and established that both environments MATLAB and Simulink work seamlessly together, as we even included MATLAB code inside Simulink.

The takeaways from this step -

  1. By using Simulink's 250 and more HDL optimized blocks, we created the hardware implementation of the peak detector.
  2. We were able to visualize the data flow through the different operations.
  3. We logged the Simulink signals of interest, to be used later for verifying against the Golden Reference.
  4. We included MATLAB code in the Simulink environment.

Having modelled our hardware implementation in Simulink, we will now focus on optimizing our architecture, making it an efficient and optimized hardware design.

We will first prepare our model for HDL Code generation, then use optimization techniques to control speed and area tradeoffs in hardware design. Additionally, we will add a signal validity check to the hardware design.  

Running the hdlsetup command calls a series of instructions which run in the background and configure several model parameters. This sets the model solver to a discrete fixed-step solver, so each sample tick corresponds to one clock cycle of the FPGA. This determines how fast data gets clocked and the timing of signals through parallel paths. The sample time of each block is represented in color; red depicts the fastest sample rate.

The pulse detection implementation is grouped in a subsystem which we will call Pulse Detector. This subsystem, also called the Device Under Test (DUT), will be implemented on the hardware target (HDL will be generated from this part of the Simulink model). We can now focus on the individual blocks inside this Pulse Detector subsystem to optimize the hardware.

We will replace the Discrete FIR Filter block with the hardware-optimized Discrete FIR Filter HDL Optimized block. This block offers filter architectures and pipeline register placement that are designed to make efficient use of DSP block resources. Simulink provides a range of such HDL optimized blocks for hardware design.

The HDL optimized filter block has a built-in signal validity check option. Using data valid checks is a good practice and is commonly used in hardware designs that interface with a non-continuous data source. For this demo I have set the valid signal input of the filter to true for the receive signal.

Among the factors that determine how fast the clock can run on the FPGA is how much computation needs to be done in a given clock cycle. Inserting pipeline registers along the paths shortens them and improves clock speed. HDL Coder offers a variety of ways to insert pipeline stages during code generation to shorten paths so you can run at a higher frequency. Its delay balancing feature inserts matching pipeline stages on the parallel paths. I will illustrate here by inserting them all manually, which also gives me the ability to simulate this behavior at a high level.

To visualize the effect of pipelines we will insert them manually on the parallel paths and simulate the hardware design model. The parallel paths are the inputs to the FIR filter block, the outputs from the Local Peak subsystem, and the filter_valid signal path.

We will make sure the changes made to the Simulink model have not affected its functionality by verifying the logged signals against the Golden Reference. Running the verification test, we can see that the peak location from MATLAB and from Simulink is the same, and also that the correlation error is in the -17 eps range.
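Why the parallel paths need matching delays can be seen in a small simulation. In this Python sketch each pipeline register is modelled as a one-tick delay; the data path (a squaring operation followed by two registers), the valid path and all values are hypothetical, chosen only to show the effect.

```python
from collections import deque

class DelayLine:
    """N pipeline registers in series: the output is the input from N ticks ago."""

    def __init__(self, n, init=0):
        self.regs = deque([init] * n, maxlen=n) if n > 0 else None

    def step(self, x):
        if self.regs is None:      # zero registers: pass straight through
            return x
        out = self.regs[0]         # oldest value leaves the pipeline
        self.regs.append(x)        # newest value enters, oldest is dropped
        return out

def run(data, valid, data_delay, valid_delay):
    """Clock data and valid through their own paths; keep data when valid is high."""
    d_pipe = DelayLine(data_delay)
    v_pipe = DelayLine(valid_delay, init=False)
    captured = []
    for x, v in zip(data, valid):
        y = d_pipe.step(x * x)     # pipelined arithmetic on the data path
        if v_pipe.step(v):         # valid travels on a parallel path
            captured.append(y)
    return captured

data = [1, 2, 3, 4, 5, 0, 0]
valid = [True] * 5 + [False] * 2

balanced = run(data, valid, data_delay=2, valid_delay=2)
unbalanced = run(data, valid, data_delay=2, valid_delay=0)
```

With matching delays the valid flag lines up with the results; without them the flag marks stale register contents and misses the last two results.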

The key takeaways from Step 3

  1. We used the hdlsetup command to prepare the model for code generation, and learnt how Simulink's solver relates to the FPGA's clock rate.
  2. We used hardware-efficient blocks and added a valid signal check. We used pipelining methods to improve the clock rate, a key factor in how fast the design can run.
  3. Also, verifying our hardware optimization before datatype quantization ensures that quantization noise does not hide design errors.

So far, we have converted our frame-based MATLAB algorithm to a sample-based Simulink model and demonstrated how the MATLAB and Simulink environments work seamlessly together. We took the sample-based Simulink model, optimized it for code generation and added details for efficient use of hardware resources.

In step 4 we will quantize the data types to fixed point and verify it against the Golden Reference.

In MATLAB the data type defaults to 64-bit double-precision floating point. In digital hardware, data is usually represented in fixed point, as it is resource efficient and also reduces power consumption. In MATLAB a fixed-point type is described as signed or unsigned, followed by the word length, which is made up of the integer and fraction parts.
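To make the signed/unsigned, word length and fraction length description concrete, here is a small Python sketch of quantizing a floating-point value to a fixed-point grid. The function name, the rounding mode (Python's round-half-to-even) and the saturating overflow behaviour are choices made for this sketch; they are not necessarily the settings used in the demo.

```python
def quantize(value, signed, word_length, fraction_length):
    """Map a float to the nearest representable fixed-point value.

    The stored integer uses `word_length` bits in total, of which
    `fraction_length` sit to the right of the binary point. Values
    outside the representable range saturate instead of wrapping.
    """
    scale = 2 ** fraction_length
    if signed:
        lo, hi = -(2 ** (word_length - 1)), 2 ** (word_length - 1) - 1
    else:
        lo, hi = 0, 2 ** word_length - 1
    stored = round(value * scale)        # nearest integer, ties to even
    stored = max(lo, min(hi, stored))    # saturate on overflow
    return stored / scale

# Signed, 16-bit word, 14 fraction bits: range [-2, 2), resolution 2**-14.
exact = quantize(0.5, True, 16, 14)       # representable, so unchanged
rounded = quantize(0.7, True, 16, 14)     # off by at most half an LSB
saturated = quantize(3.0, True, 16, 14)   # clipped to the largest value
```

The trade-off is visible directly: more fraction bits give finer resolution, more integer bits give more range, and the word length caps the sum of the two.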

So in this section we will take advantage of Simulink's ability to visualize data type propagation through operations, and use Data Type Conversion blocks to convert the data into fixed point. In this demo, I have predefined my fixed-point data variables, ensuring the entire range of the receive signal is represented. We will enter the fixed-point variables in the model using the Data Type Conversion blocks. We will update the model to visualize the word length of the fixed-point data through the operations, to show that Simulink automatically propagates fixed point and maintains precision. When we reach the multiply operation, we recommend converting the signal to an 18-bit word length, and similarly converting the word length to 18 bits after the add operation. The reason we do that is so the multiply and add operations get mapped to a single DSP block on the FPGA hardware.

For this demonstration it was possible to define the fixed-point variables and convert the data types by hand, but for large and complex models, we can use the Fixed-Point Designer tool. The tool collects the range of data by running the model and then proposes fixed-point data types; we can apply the proposed values or set our own. Finally, we can run the model, see the effects of quantization, and go back and make changes where necessary. This allows us to re-run the simulation with multiple quantization options, and provides visual information on data overflow and underflow.

With the Device Under Test now quantized to fixed point, we will verify that the changes made give us the desired results as per the Golden Reference. Running the verification test, quantization errors are highlighted as a warning. This is expected behavior due to saturation and scaling of the data types. The whole idea of quantization for hardware design is to find the balance between efficiency and accuracy. While the location of the peak from MATLAB and Simulink is the same, the correlation error has increased due to quantization.
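The bit growth being managed here can be worked out with simple arithmetic. The Python sketch below uses the usual full-precision rules for signed fixed-point types (a product's word and fraction lengths add; a sum needs one extra carry bit) and then narrows the result back to 18 bits by dropping fraction bits. The 16-bit input type is hypothetical, and the helper names are made up for this sketch.

```python
def product_type(wl_a, fl_a, wl_b, fl_b):
    """Full-precision product: word lengths add and fraction lengths add."""
    return wl_a + wl_b, fl_a + fl_b

def sum_type(wl_a, fl_a, wl_b, fl_b):
    """Full-precision sum: align the binary points, then add one carry bit."""
    fl = max(fl_a, fl_b)
    int_bits = max(wl_a - fl_a, wl_b - fl_b)
    return int_bits + fl + 1, fl

def narrow(wl, fl, target_wl):
    """Keep all integer bits and drop low-order fraction bits to fit target_wl."""
    int_bits = wl - fl
    if int_bits > target_wl:
        raise ValueError("target word length cannot hold the integer part")
    return target_wl, target_wl - int_bits

# Hypothetical input type: signed, 16-bit word, 14 fraction bits.
wl, fl = 16, 14
prod = product_type(wl, fl, wl, fl)    # (32, 28) after the multiply
prod18 = narrow(*prod, 18)             # (18, 14) after conversion
total = sum_type(*prod18, *prod18)     # (19, 14) after the add
total18 = narrow(*total, 18)           # (18, 13) after conversion
```

Left alone, the word length would keep growing at every operation; converting after the multiply and after the add keeps both results at 18 bits at the cost of some fraction bits.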
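The idea of collecting ranges and proposing types from them can be sketched in a few lines. This Python example is only an illustration of the principle and makes no claim about how the tool computes its proposals; the function names, the fixed word length and the sample values are assumptions.

```python
import math

def collect_range(samples):
    """Record the smallest and largest value seen during a simulation run."""
    return min(samples), max(samples)

def propose_type(sim_min, sim_max, word_length):
    """Propose (signed, word_length, fraction_length) covering the observed range.

    Integer bits are sized for the largest magnitude seen, a sign bit is
    added if any value was negative, and the remaining bits become
    fraction bits.
    """
    signed = sim_min < 0
    largest = max(abs(sim_min), abs(sim_max))
    int_bits = 0 if largest < 1 else math.floor(math.log2(largest)) + 1
    if signed:
        int_bits += 1
    return signed, word_length, word_length - int_bits

# Hypothetical logged values from one simulation run.
logged = [0.25, -1.7, 1.2, 0.9, -0.4]
lo, hi = collect_range(logged)
proposal = propose_type(lo, hi, word_length=16)
```

A proposal based on one run is only as good as the stimulus that produced it, which is why the result is simulated again and checked for overflow before it is accepted.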

In step 4 we covered the following topics…

  1. We saw Simulink's ability to propagate and maintain data type precision through the operations.
  2. We managed bit growth, converting the word length to 18 bits so that multiply-add operations get mapped to a single DSP block.
  3. We looked at Fixed-Point Designer, a graphical user interface tool that collects the data range, proposes fixed-point types, and lets you simulate and compare results.
  4. And we verified the quantized model against the golden reference.

In the final step, step 5, we will generate and synthesize our HDL code. For demo purposes we will choose the Xilinx Zynq device, and in order to synthesize after generating the HDL code, we include the Xilinx Vivado synthesis tool on the MATLAB path.

In this section, we will see the capabilities of the HDL Workflow Advisor and also demonstrate reports that trace between the generated code and the Simulink model.

To start, we will run through a sequence of checks using the HDL Code Advisor. This checks the model, port and block settings to ensure compatibility for code generation. Additionally, it enables you to check compatibility with native floating point and with industry standards. As these two are not applicable here, we shall skip them. If there are errors when running these checks, you can either directly make the recommended changes in the model or choose 'Modify Settings' to make the changes automatically.

Having checked our model, we will use the HDL Workflow Advisor, which will take us through a series of tasks and settings selections. Here we will enter details of our target device, device package, target frequency and the synthesis tool. In the HDL code generation settings, under the optimizations section, we opt for adaptive pipelining, and we select the options needed to get reports and to choose the clock settings.

Having made our selections, we can now generate the HDL code by running the task 'Generate RTL Code and Testbench'. When all the tasks have completed successfully you have generated HDL code which is well structured, commented and readable. The reports that we opted for provide us with information on resource utilization, including estimates of multiplier and DSP block utilization. The optimization report links to where additional registers were inserted in the final model; for example, in the compute power subsystem block, you can see HDL Coder has automatically added pipeline registers in order to improve the overall clock speed. The traceability works both ways: one can either use the generated HDL code to navigate to the Simulink model, or start from a block in the model and trace to its HDL code. For this demonstration I have chosen the multiply block, and I am showing you how I can trace it back from the code to the model and from the model back to the code.

With the generated HDL code, you can synthesize using the HDL Workflow Advisor, which supports third-party synthesis tools, or else you can use the synthesis tool of your choice. Using the HDL Workflow Advisor, the tool creates the necessary files and folders, runs the synthesis in the background and, in addition, provides reports on the critical timing path in the model. This design workflow enables users to explore the design at a higher abstraction level and leads to quickly finding the best hardware architecture to meet their goals.

One such customer is Orolia, a company which provides positioning, navigation and timing (PNT) solutions for defense and space applications. They had to develop receiver hardware for their second-generation beacons. Prior to this they had utilized analog designs and had little experience with designing and implementing a digital receiver. Using MATLAB and Simulink they designed a specialized SDR and implemented the design on an Analog Devices RF system on module. HDL Coder was used to generate synthesizable HDL. This enabled the engineers to focus on designing intelligent algorithms, and by taking this design approach the engineers were able to shorten development by 8 months and reduce the FPGA implementation time by 50%.

With HDL Coder, we can connect our MATLAB algorithm and system design to FPGA prototyping hardware. With HDL Coder you are generating synthesizable VHDL or Verilog from MATLAB functions, Simulink models and Stateflow charts. Taking the time to build the sample-based model in Simulink, adding hardware architecture, converting to fixed point and verifying each of the steps against the golden reference lets you control resource utilization and target your algorithms to an FPGA in systematic steps. The HDL code that is generated is target-independent and portable, so if you have to try the code on another device you can.

I hope what we discussed and demonstrated today was of interest to all. To find out more, use the following resource links, which provide information on topics such as verifying the code and fixed-point conversion. I also encourage you to try the HDL self-guided tutorial available on MathWorks File Exchange. If you are interested in evaluating the HDL Coder tool and want to get started, visit our Getting Started page or get in touch with your MathWorks sales contacts.

Thank you all for listening, and we can now spend the rest of the time answering questions regarding the topics discussed.