The GPU Coder™ Support Package for NVIDIA® GPUs uses the GPU Coder product to generate CUDA® code (kernels) from a MATLAB® algorithm. These kernels run on any CUDA-enabled GPU platform. The support package automates the deployment of the generated CUDA code on GPU hardware platforms such as Jetson or DRIVE.
In this tutorial, you learn how to:
Prepare your MATLAB code for CUDA code generation by using the coder.gpu.kernelfun pragma.
Connect to the NVIDIA target board.
Generate and deploy CUDA executable on the target board.
Run the executable on the board and verify the results.
NVIDIA DRIVE or Jetson embedded platform.
Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).
NVIDIA CUDA toolkit installed on the board.
Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards.
GPU Coder for CUDA code generation. For help on getting started with GPU Coder, see Get Started with GPU Coder (GPU Coder).
NVIDIA CUDA toolkit on the host.
Environment variables on the host for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Environment Variables (GPU Coder).
This tutorial uses a simple vector addition example to demonstrate the build and
deployment workflow on NVIDIA GPUs. Create a MATLAB function
myAdd.m that acts as the
entry-point for code generation. Alternatively, you can use the
files provided in the Getting Started with the GPU Coder Support Package for NVIDIA GPUs example. The easiest way to
generate CUDA kernels for your MATLAB algorithm is to place the
coder.gpu.kernelfun pragma in the entry-point function. When GPU Coder encounters the
kernelfun pragma, it attempts to parallelize all
the computation within this function and then maps it to the GPU.
function out = myAdd(inp1,inp2) %#codegen
coder.gpu.kernelfun();
out = inp1 + inp2;
end
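Before generating code, it can help to confirm that the entry-point function behaves as expected in MATLAB. The following is a minimal sketch, assuming myAdd.m is on the MATLAB path and GPU Coder is installed so that the pragma resolves. The input range 0:99 is an assumption chosen to match the simulation call used in the verification step.

```matlab
% Host-side sanity check of the entry-point function (no GPU required).
% Assumption: myAdd.m is on the MATLAB path.
inp = 0:99;                 % same inputs as the later simulation call
out = myAdd(inp, inp);

% Element-wise addition of a vector with itself doubles every element.
assert(isequal(out, 2*inp), 'myAdd returned unexpected values');
```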
The support package software uses an SSH connection over TCP/IP to execute commands required for building and running the generated CUDA code on the DRIVE or Jetson platforms. Connect the target platform to the same network as the host computer. Alternatively, you can use an Ethernet crossover cable to connect the board directly to the host computer. Refer to the NVIDIA documentation on how to set up and configure your board.
To communicate with the NVIDIA hardware, you must create a live hardware connection object by using the jetson or drive function.
To create a live hardware connection object, provide the host name or IP address, user name,
and password of the target board. For example, to create a live object for Jetson hardware:
hwobj = jetson('192.168.1.15','ubuntu','ubuntu');
The software checks the hardware, the compiler tools and libraries, and the IO server installation, and gathers information on the peripherals connected to the target. This information is displayed in the MATLAB Command Window.
Checking for CUDA availability on the Target...
Checking for 'nvcc' in the target system path...
Checking for cuDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for prerequisite libraries is complete.
Gathering hardware details...
Checking for third-party library availability on the Target...
Gathering hardware details is complete.
 Board name         : NVIDIA Jetson TX2
 CUDA Version       : 10.0
 cuDNN Version      : 7.5
 TensorRT Version   : 5.1
 GStreamer Version  : 1.14.5
 V4L2 Version       : 1.14.2-1
 SDL Version        : 1.2
 Available Webcams  : Microsoft® LifeCam Cinema(TM)
 Available GPUs     : NVIDIA Tegra X2
Alternatively, to create a live object for DRIVE hardware:
hwobj = drive('126.96.36.199','nvidia','nvidia');
If the connection fails, a diagnostic error message is reported in the MATLAB Command Window. The most likely cause of a failed connection is an incorrect IP address or host name.
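In a script, a failed connection can be caught so that later steps do not run against a missing hardware object. The following is a minimal sketch that uses standard MATLAB try/catch. The IP address and credentials are the placeholder values from the Jetson example above and must be replaced with those of your board.

```matlab
% Attempt the connection and report the diagnostic message on failure.
% Assumption: address, user name, and password are placeholders.
try
    hwobj = jetson('192.168.1.15','ubuntu','ubuntu');
catch err
    % A wrong IP address or host name is the most likely cause.
    fprintf('Connection failed: %s\n', err.message);
    return
end
```

The `return` statement stops the script early. Inside a function, you could rethrow the error instead.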
To generate a CUDA executable that can be deployed to an NVIDIA target, create a custom main wrapper file
main.cu and its
associated header file
main.h. The main file calls the entry-point
function in the generated code. The main file passes a vector containing the first 100
natural numbers to the entry-point function and writes the results to the
myAdd.bin binary file.
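To make the expected contents of the binary file concrete, the output can be emulated on the host. This is only an illustrative sketch and not the main file itself. It assumes that both inputs are the vector 0:99 (matching the simulation call used in the verification step) and that the results are stored as raw doubles. The file name myAdd_ref.bin is hypothetical.

```matlab
% Host-side emulation of the data the main file is described as writing.
% Assumptions: both inputs are 0:99 and results are raw doubles.
inp = 0:99;
expected = inp + inp;

refFile = fullfile(tempdir, 'myAdd_ref.bin');   % hypothetical file name
fid = fopen(refFile, 'w');
fwrite(fid, expected, 'double');
fclose(fid);

% Read the file back the same way the verification step reads myAdd.bin.
fid = fopen(refFile, 'r');
readBack = fread(fid, 'double');   % fread returns a 100-by-1 column vector
fclose(fid);

assert(isequal(readBack', expected));
```

Because fread returns a column vector, the result is transposed before it is compared with the row vector computed in MATLAB.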
Create a GPU code configuration object for generating an executable. Use the
coder.hardware function to create a configuration object for the DRIVE or Jetson platform and assign it to the
Hardware property of the code configuration object
cfg. Use the
BuildDir property to specify the folder
for performing the remote build process on the target. If the specified build folder does not
exist on the target, the software creates a folder with the given name. If no value is assigned to
cfg.Hardware.BuildDir, the remote build process happens in
the last specified build folder. If there is no stored build folder value, the build process
takes place in the home folder.
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.CustomSource = fullfile('main.cu');
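For a DRIVE target, the configuration would look much the same. The following sketch assumes that the board name string accepted by coder.hardware for DRIVE platforms is 'NVIDIA Drive'. Check the support package documentation for your release before relying on it.

```matlab
% Configuration sketch for a DRIVE target.
% Assumption: 'NVIDIA Drive' is the board name for DRIVE platforms.
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Drive');
cfg.Hardware.BuildDir = '~/remoteBuildDir';   % remote build folder on the target
cfg.CustomSource = fullfile('main.cu');       % custom main wrapper file
```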
The GPU code configuration object uses the default compute capability value specified by
coder.gpuConfig. To use the complete set of features supported by your
CUDA GPU and to reduce numerical mismatches, set the
ComputeCapability property of the code configuration object to match
your GPU specifications. You can use the
GPUInfo property of the
hardware connection object to get the compute capability value for the GPU on your target board.
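As an illustration, the compute capability could be set as shown below. This sketch assumes that the property is reached through cfg.GpuConfig.ComputeCapability and that the target is the Tegra X2 GPU shown in the connection log, which has compute capability 6.2. The layout of GPUInfo is not shown in this tutorial, so the sketch only displays it for manual inspection.

```matlab
% Inspect the GPU details reported by the hardware connection object.
disp(hwobj.GPUInfo);

% Assumption: the target GPU is a Tegra X2 (compute capability 6.2).
% Replace the value with the one reported for your own board.
cfg.GpuConfig.ComputeCapability = '6.2';
```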
To generate CUDA code, use the
codegen command and pass the GPU code configuration object along with the size
of the inputs for the
myAdd entry-point function. After the code
generation takes place on the host, the generated files are copied over and built on the target.
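A call along the following lines would perform this step. It is a sketch that assumes both inputs are 1-by-100 double row vectors, as suggested by the 100-element vector that the main file passes to the entry-point function. The example values 1:100 only tell codegen the input type and size.

```matlab
% Generate CUDA code for myAdd and build the executable on the target.
% Assumption: both inputs are 1-by-100 double row vectors.
codegen('-config', cfg, 'myAdd', '-args', {1:100, 1:100});
```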
To run the executable on the target hardware, use the runApplication
method of the hardware object. In the MATLAB Command Window, enter:
pid = runApplication(hwobj,'myAdd');
### Launching the executable on the target...
Executable launched successfully with process ID 26432.
Displaying the simple runtime log for the executable...
Copy the output bin file
myAdd.bin to the MATLAB environment on the host and compare the computed results with the simulation
results from MATLAB.
outputFile = [hwobj.workspaceDir '/myAdd.bin'];
getFile(hwobj,outputFile);

% Simulation result from MATLAB.
simOut = myAdd(0:99,0:99);

% Read the copied result binary file from the target in MATLAB.
fId = fopen('myAdd.bin','r');
tOut = fread(fId,'double');
fclose(fId);

% Compare the target output with the simulation output.
diff = simOut - tOut';
fprintf('Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: %f\n', max(abs(diff(:))));
Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: 0.000000