Main Content

Call Custom CUDA Kernels from the Generated Code

From within your MATLAB® code, you can directly call external CUDA® kernels, also called custom code or legacy code. To call CUDA kernels, use coder.ceval. The code generator integrates your CUDA kernel into the CUDA code generated from MATLAB. Integrate code when there are external libraries, optimized code, or object files developed using CUDA that you want to use with your generated code.

The external CUDA kernel must use the __global__ qualifier to execute the function (kernels) on the GPU device and to call the function from the host or from the device. Functions with the __device__ qualifier are called device functions. The device functions are different from global functions in that they can only be called from other device or global functions. For information on integrating custom device functions, see Call Custom CUDA Device Function from the Generated Code.

Note

Use coder.ceval only in MATLAB code intended for code generation. coder.ceval generates an error in uncompiled MATLAB code. To determine if a MATLAB function is executing in MATLAB, use coder.target. If the function is executing in MATLAB, call the MATLAB version of the CUDA kernel.

Call Custom CUDA Kernel

This example shows how to integrate a simple CUDA kernel with MATLAB code by using coder.ceval. Consider the MATLAB function, saxpy:

type saxpy.m
function y = saxpy(a,x,y)
    y = a*x + y;
end

For this example, suppose that you want to implement the a x plus y operation by using external CUDA kernel. Consider the CUDA kernel, saxpy_kernel, implemented in the file saxpy.cu:

type saxpy.cu
#include "saxpy.h"

__global__
void saxpy_kernel(uint32_T n, real32_T a, real32_T *x, real32_T *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

To integrate saxpy_kernel with your MATLAB code, you need a header file that contains the function prototype. See the file saxpy.h:

type saxpy.h
#ifndef real32_T 
#define real32_T float
#define uint32_T unsigned int
#endif

#define saxpy(grid,block,n,a,x,y) saxpy_kernel<<<grid,block>>>(n,a,x,y)

__global__ void saxpy_kernel(uint32_T n, real32_T a, real32_T *x, real32_T *y);

This example generates CUDA MEX, uint32_T and real32_T are custom types used in the generated MEX code. The code generator produces data types in CUDA code that correspond to the data types that you use in your MATLAB code. The data types that are generated depend on the target platform and compiler. The code generator can produce either built-in CUDA/C++ data types, such as short, long, int, and so on, or custom data types defined by using typedef statements. By default, the code generator produces built-in types for standalone code (lib, dll, or exe) and custom types for MEX code. For more information, see Mapping MATLAB Types to Types in Generated Code.

Entry-Point Function

Use the coder.ceval command to call the CUDA kernel in the saxpyRef.m entry-point function. Use coder.ceval only in MATLAB code intended for code generation. The coder.rref and coder.ref commands instruct the code generator to pass pointers to the arrays, rather than copy them.

type saxpyRef.m
function y = saxpyRef(a,x,y)
%   saxpyRef Entry-point function for computing single-precision 
%   (A*X) Plus Y

%   Copyright 2022 The MathWorks, Inc.
coder.gpu.kernelfun;

if coder.target('MATLAB')
    y = a*x + y;
else
    coder.ceval('saxpy', uint32(floor((numel(x)+255)/256)), ...
        uint32(256),uint32(numel(x)), single(a), ...
        coder.rref(x,'gpu'),coder.ref(y,'gpu'));
end
end

Generate CUDA Code

To generate CUDA code, create a GPU code configuration object. Specify the location of the custom CUDA files by setting custom code properties on the configuration object. For more information, see Configure Build for External C/C++ Code.

cfg = coder.gpuConfig("mex");
cfg.GenerateReport = true;
cfg.CustomSource = "saxpy.cu";
cfg.CustomInclude = pwd;
cfg.CustomSourceCode = '#include "saxpy.h"';

aType = coder.newtype('single', [1 1], [0 0]);
xType = coder.newtype('single', [4096 256], [0 0]);
yType = coder.newtype('single', [4096 256], [0 0]);
inputArgs = {aType,xType,yType};

codegen -config cfg saxpyRef -args inputArgs
Code generation successful: View report

Generated Code

To compare your generated CUDA code to the original MATLAB code, open the CUDA file, saxyRef.cu in the work\codegen\mex\saxpyRef folder.

#include "saxpy.h"
// Function Definitions
void saxpyRef(real32_T a, const real32_T x[1048576], real32_T y[1048576])
{
  real32_T(*gpu_x)[1048576];
  real32_T(*gpu_y)[1048576];
  cudaMalloc(&gpu_y, 4194304UL);
  cudaMalloc(&gpu_x, 4194304UL);
  //    saxpyRef Entry-point function for computing single-precision (A*X) Plus
  //    Y
  //    Copyright 2022 The MathWorks, Inc.
  cudaMemcpy(*gpu_x, x, 4194304UL, cudaMemcpyHostToDevice);
  cudaMemcpy(*gpu_y, y, 4194304UL, cudaMemcpyHostToDevice);
  saxpy(4096U, 256U, 1048576U, a, &(*gpu_x)[0], &(*gpu_y)[0]);
  cudaMemcpy(y, *gpu_y, 4194304UL, cudaMemcpyDeviceToHost);
  cudaFree(*gpu_x);
  cudaFree(*gpu_y);
}

Run Generated MEX

Run the generated MEX with random inputs and compare the results with MATLAB simulation.

a = single(15);
x = randi(10,4096,256,'single');
y = zeros(4096,256,'single');

yMATLAB = saxpyRef(a,x,y);
yGPU = saxpyRef_mex(a,x,y);

if (yGPU - yMATLAB == 0)
    fprintf('\nMATLAB simulation matches GPU execution.\n');
end
MATLAB simulation matches GPU execution.

See Also

Functions

Objects

Related Topics