
coder.gpu.constantMemory

Pragma that maps a variable to the constant memory on GPU

Description

coder.gpu.constantMemory(v) maps the variable v to the constant memory space on the GPU device. Place this pragma within a parallelizable loop. If GPU Coder™ generates a kernel for the loop, it loads v into a device constant memory variable and replaces every access to v within the kernel with an access to that constant memory variable. Within the kernel, the variable v must be read-only; otherwise, GPU Coder ignores this pragma. Use this pragma when every thread accesses every element of the parameter array or matrix.

This function is a code generation function. It has no effect in MATLAB®.


Examples


This example shows how to map an input to the constant memory space on the GPU by using the coder.gpu.constantMemory pragma.

Write an entry-point function myFun that accepts two inputs: a, of size 256x256, and a constant k, of size 1x3. The function has a nested for-loop that adds the three elements of k to each element of a. To create a kernel, place the coder.gpu.kernel() pragma immediately before the nested for-loop. The coder.gpu.constantMemory(k) pragma places the read-only input k into the constant memory of the GPU.

function b = myFun(a,k)
  b = coder.nullcopy(zeros(size(a)));
  coder.gpu.kernel();
    for j = 1:256
      for i = 1:256
        coder.gpu.constantMemory(k);  
        b(i,j) = a(i,j) + k(1) + k(2) + k(3);
      end
    end
end
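
Because the pragmas have no effect in MATLAB, myFun can be checked in MATLAB before generating any code. The following is a minimal sketch, assuming that myFun.m is on the MATLAB path and that GPU Coder is installed so that the coder.gpu functions resolve. The input values are illustrative and are not part of the original example.

```matlab
% Illustrative inputs: a is 256x256 double, k is 1x3 double.
a = ones(256);
k = [1 2 3];

% Run the entry-point function directly in MATLAB.
b = myFun(a,k);

% Each element of b should equal the matching element of a plus k(1)+k(2)+k(3).
expected = a + sum(k);
assert(isequal(b, expected));
```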

Create a configuration object for MEX code generation.

cfg = coder.gpuConfig('mex');

Define a cell array input that declares the size and data type of the inputs a and k to the function myFun.

input = {ones(256),ones(1,3)}

Generate a MEX function myFun_mex by using the -config, -args, and -report options to specify the configuration, provide the input arguments, and generate a code generation report.

codegen -config cfg -args input -report myFun
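
After code generation succeeds, the MEX function can be compared against the MATLAB function. This is a sketch only, assuming that a CUDA-capable GPU is available, that codegen produced myFun_mex in the current folder, and that myFun.m is on the path. The test values are illustrative; integer-valued doubles are used so that the comparison is exact.

```matlab
% Inputs must match the size and type declared with -args:
% a is 256x256 double, k is 1x3 double.
a = magic(256);
k = [1 2 3];

bMatlab = myFun(a,k);      % runs in MATLAB, pragmas have no effect
bMex    = myFun_mex(a,k);  % runs the generated CUDA code

assert(isequal(bMex, bMatlab));
```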

In the report, on the C code tab, click myFun.cu.

The read-only variable k is declared as const_k by using the __constant__ qualifier as shown in the code snippet.

/* Variable Definitions */
__constant__ real_T const_k[3];

The cudaMemcpyToSymbol call copies the value of k from the host to the device constant memory variable const_k.

  cudaMemcpyToSymbol(const_k, k, 24U, 0U, cudaMemcpyHostToDevice);
  cudaMemcpy(gpu_a, a, 524288U, cudaMemcpyHostToDevice);
  myFun_kernel1<<<dim3(128U, 1U, 1U), dim3(512U, 1U, 1U)>>>(gpu_a, gpu_b);
  cudaMemcpy(b, gpu_b, 524288U, cudaMemcpyDeviceToHost);  
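
The byte counts and launch dimensions in this snippet follow from the input sizes. The arithmetic can be sketched in MATLAB as follows, assuming that real_T is an 8-byte double, which matches the double inputs declared with -args. The variable names are illustrative.

```matlab
bytesPerDouble = 8;                    % assumed size of real_T

kBytes = 3 * bytesPerDouble;           % k is 1x3
aBytes = 256 * 256 * bytesPerDouble;   % a and b are 256x256

numBlocks       = 128;                 % dim3(128U, 1U, 1U)
threadsPerBlock = 512;                 % dim3(512U, 1U, 1U)
numThreads      = numBlocks * threadsPerBlock;

assert(kBytes == 24);                  % third argument of cudaMemcpyToSymbol
assert(aBytes == 524288);              % third argument of both cudaMemcpy calls
assert(numThreads == 256 * 256);       % one thread per element of a
```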

The kernel body reads the constant memory variable const_k and adds its three elements to each element of a.

static __global__ __launch_bounds__(512, 1) void myFun_kernel1(const real_T *a,
 real_T *b)
{
  int32_T i;
  int32_T j;
  int32_T threadIdX;
  threadIdX = (int32_T)(blockDim.x * blockIdx.x + threadIdx.x);
  i = threadIdX / 256;
  j = threadIdX - i * 256;
  if ((!(j >= 256)) && (!(i >= 256))) {
    b[i + (j << 8)] = ((a[i + (j << 8)] + const_k[0]) + const_k[1]) + const_k[2];
  }
}
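
The index arithmetic in the kernel can be replayed in MATLAB to check that the 65536 threads touch each element of b exactly once. This is a sketch of the mapping only; iIdx, jIdx, and lin are illustrative names, and floor stands in for the integer division in the generated code, which is equivalent here because the thread indices are nonnegative.

```matlab
t = 0:(128*512 - 1);        % zero-based global thread indices

iIdx = floor(t / 256);      % i = threadIdX / 256
jIdx = t - iIdx * 256;      % j = threadIdX - i * 256
lin  = iIdx + jIdx * 256;   % zero-based offset i + (j << 8)

% Every offset from 0 to 65535 appears exactly once.
assert(isequal(sort(lin), 0:(256*256 - 1)));
```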

Input Arguments


v: The name of the variable to map to the constant memory space on the GPU device.

Version History

Introduced in R2017b
