mex cuda code for matrix multiplication

Question

0 votos

Hello,

I made a simple mex cuda code to calculate multiplication of two matrices of size NxN but never get the same results as in matlab command

C = A*B except for B is a diagonal matrix. Just wondering if I was wrong at any point? The code is attached. Thanks.

#include "mex.h"

#include "cuda_runtime.h"

// CUDA kernel for matrix multiplication

__global__ void matrixMultiplication(const double* A, const double* B, double* C, int N) {

int row = blockIdx.y * blockDim.y + threadIdx.y;

int col = blockIdx.x * blockDim.x + threadIdx.x;

if (row < N && col < N) {

double sum = 0;

for (int i = 0; i < N; i++) {

sum += A[row * N + i] * B[i * N + col];

}

C[row * N + col] = sum;

}

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {

/* Declare all variables */

const double* A;

const double* B;

double* C;

double* d_A = nullptr;

double* d_B = nullptr;

double* d_C = nullptr;

int N;

A = mxGetPr(prhs[0]);

B = mxGetPr(prhs[1]); // Use prhs[1] for matrix B

N = mxGetN(prhs[0]); // Get the number of columns of matrix A

/* Initialize output array */

plhs[0] = mxCreateDoubleMatrix((mwSize)N, (mwSize)N, mxREAL);

C = mxGetPr(plhs[0]);

// Allocate device memory for matrices A, B, and C

cudaMalloc((void**)&d_A, N * N * sizeof(double));

cudaMalloc((void**)&d_B, N * N * sizeof(double));

cudaMalloc((void**)&d_C, N * N * sizeof(double));

// Copy matrices A and B from host to device

cudaMemcpy(d_A, A, N * N * sizeof(double), cudaMemcpyHostToDevice);

cudaMemcpy(d_B, B, N * N * sizeof(double), cudaMemcpyHostToDevice);

// Define grid and block dimensions

dim3 dimGrid(128, 128);

dim3 dimBlock(1, 1);

// dim3 dimGrid((N + 15) / 16, (N + 15) / 16); // Adjust block size as needed

// dim3 dimBlock(16, 16);

// Launch the CUDA kernel

matrixMultiplication << <dimGrid, dimBlock >> > (d_A, d_B, d_C, N);

cudaDeviceSynchronize();

// Copy the result matrix C from device to host

cudaMemcpy(C, d_C, N * N * sizeof(double), cudaMemcpyDeviceToHost);

// Free device memory

cudaFree(d_A);

cudaFree(d_B);

cudaFree(d_C);

}

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

Varun el 30 de Oct. de 2023

Abrir en MATLAB Online

0 votos

Hi Chieh-Hsun,

Looks like you are trying to calculate matrix multiplication of matrices A and B using "mex" CUDA code but the results are not matching when calculated using MATLAB command window as "C=A*B".

There are two orderings in which a 2D matrix can be flatten into 1D matrix:

Row-major wise: In row-major order, you traverse the 2D matrix row by row, collecting the elements one row at a time. You start with the first row, then move to the second row, and so on until you've visited all the rows.
Column-major wise: In column-major order, you traverse the 2D matrix column by column, collecting the elements one column at a time. You start with the first column, then move to the second column, and so on until you've visited all the columns.

For Example, consider the following matrix:

[[1, 2, 3],

[4, 5, 6],

[7, 8, 9]]

Row-major order: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Column-major order: [1, 4, 7, 2, 5, 8, 3, 6, 9]

So, the CUDA code which you have shared uses row-major wise sorting to flatten 2d matrix "A" and "B" into 1d array. It then calculates matrix multiplication on these 1d arrays and stores the result into new 1d array "C", again it is in row-major order.

But the "mexFunction", by default converts these 2d matrices into 1d arrays using column-major sorting.

Here 1d arrays "A" and "B" is in column-major sorting which was expected to be in row-major sorting.

A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);

And, finally the 1d array "C" is returned to the MATLAB as 2d array and conversion is based on the assumption that "C" is column-major 1d array.

C = mxGetPr(plhs[0]);

To resolve the above issues related to array orderings in the "mex" CUDA code, you can just pass transpose of the matrices "A" and "B" i.e. " A' " and " B' " to the CUDA code and again do transpose of the returned matrix in MATLAB after calling the CUDA code. Please refer to the following code snippets:

C = CUDA_MATRIX_MUL(A', B')
C = C'
%% C is the final answer

Hope it helps.

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Iniciar sesión para comentar.

mex cuda code for matrix multiplication

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Más respuestas (0)

Categorías

Etiquetas

Community Treasure Hunt

mex cuda code for matrix multiplication

0 comentarios Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

0 comentarios Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Más respuestas (0)

Categorías

Etiquetas

Ver también

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos