Speeding up code which integrates a 2D gpuArray matrix using Simpson's Rule

1 view (last 30 days)
Thomas Barrett
Thomas Barrett on 10 Feb 2021
Edited: Thomas Barrett on 11 Feb 2021
I have a (real) 2D gpuArray, which I am using as part of a larger code, and now am trying to also integrate the array using the Composite Simpson Rule inside my main loop (several 10000 iterations at least). A MWE looks like the following:
%%%%%%%%%%%%%%%%%% MAIN CODE %%%%%%%%%%%%%%%%%%
Ny = 501; % Dimensions of matrix M
Nx = 501; %
dx = 0.1; % Grid spacings
dy = 0.2; %
M = rand(Ny, Nx, 'gpuArray'); % Initialise a matrix
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D(M, dx, dy, Nx, Ny); % Now integrate M
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%% INTEGRATOR %%%%%%%%%%%%%%%%%%
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.
% Integrate along x direction (vertically) --> IX is a vector afterwards
sX = sum( F(:,1:2:Nx-2) + 4*F(:,2:2:(Nx-1)) + F(:,3:2:Nx) , 2);
IX = dx/3 * sX;
% Integrate along y direction --> I is a scalar afterwards
sY = sum( IX(1:2:Ny-2) + 4*IX(2:2:(Ny-1)) + IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The operation of performing the integration is around 850 µs, which is currently a significant part of my code. This was measured using
f = @() simpsons_integration_2D(M, dx, dy, Nx, Ny);
t = gputimeit(f)
Is there a way to reduce the execution time for integrating the gpuArray matrix?
(The graphics card is the Nvidia Quadro P4000)

Accepted Answer

Jan
Jan on 10 Feb 2021
Edited: Jan on 10 Feb 2021
These are faster at least on CPU - I do not have graphics hardware in my laptop which supports GPU arrays:
function I = simpsons_integration_2D_v2(F, dx, dy, Nx, Ny)
% Avoid summation of large intermediate arrays:
sX = F(:,1) + 2 * sum(F(:, 3:2:(Nx-2)), 2) + ...
4 * sum(F(:, 2:2:(Nx-1)), 2) + F(:,Nx);
sY = sX(1) + 2 * sum(sX(3:2:(Ny-2))) + ...
4 * sum(sX(2:2:(Ny-1))) + sX(Ny);
I = dx * dy / 9 * sY;
end
function I = simpsons_integration_2D_v3(F, dx, dy, Nx, Ny)
% Perform summation as dot product:
v = [1, repmat([4, 2], 1, (Nx - 3) / 2), 4, 1];
sX = F * v';
if Nx ~= Ny
v = [1, repmat([4, 2], 1, (Ny - 3) / 2), 4, 1];
end
sY = v * sX;
I = dx * dy / 9 * sY;
end
Compared timeit output on my mobil i5:
0.00111591015 original
0.00066621015 without large intermediate array
0.00012221015 as matrix multiplication
So this is faster on my old 2 core i5 than on your GPU. Maybe this gives a speedup on your GPU also. I assume, in version v3 the vector v must be created on the GPU also.
  3 Comments
Thomas Barrett
Thomas Barrett on 11 Feb 2021
Hi @Jan. Thank you. Along with the help of rahnema1 over on SE (https://stackoverflow.com/q/66140278/12921500), we arrived at a quick method by expanding the computation into a dot product. Since none of Nx, Ny, dx or dy are changing in the main loop, much of this can be precalculated into a single vector Simp_coeff as follows:
function Simp_Coeffs = generate_coeffs_for_SimpsonIntegration(dx, dy, Nx, Ny)
% Generate coefficients for integrating a 2D function with Nx columns and Ny rows,
% and grid spacings dx and dy, using Simpson's rule.
% If Nx and Ny are odd the integral can be calculated directly using Simpson's
% composite rule. If Nx/Ny are even, the composite Simpson's rule is used for the
% first (n-3) = even strips, and the Simpson's 3/8 rule is used for the final three strips.
% Credit for this goes here: https://scicomp.stackexchange.com/a/25669/38005
% and here: https://stackoverflow.com/q/66140278/12921500
if rem(Nx,2) % If number of columns is odd
SimpVec1 = (dx/3)*[1, repmat([4, 2], 1, (Nx-3)/2), 4, 1];
else % If number of columns is even
SimpVec1 = dx*[1/3, repmat([4, 2]/3, 1, (Nx-6)/2), 4/3, [17/3 9 9 3]/8 ];
end
if rem(Ny,2) % If number of rows is odd
SimpVec2 = (dy/3)*[1, repmat([4, 2], 1, (Ny-3)/2), 4, 1];
else % If number of rows is even
SimpVec2 = dy*[1/3, repmat([4, 2]/3, 1, (Ny-6)/2), 4/3, [17/3 9 9 3]/8 ];
end
% Convert from matrix to vector, ready to perform dot product later
Simp_Coeffs = reshape(SimpVec1 .* SimpVec2.', 1, []);
end
where I have now accounted for the possibility of an even number of points being supplied (Simpson's composite rule only works for odd number of points usually).
The integrator function now looks like:
function I = simpsons_integration_2D_dotProduct(F, Simp_coeffs)
% Perform integration using Simpson's composite rule as dot product
I = Simp_coeffs * F(:);
end
and the main code looks like:
Simp_Coeffs = generate_coeffs_for_SimpsonIntegration(dx, dy, Nx, Ny);
Simp_Coeffs = gpuArray(Simp_Coeffs); % Move coefficients to GPU
%%% Main Loop %%%
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D_dotProduct(F, Simp_coeffs); % Now integrate M
end
%%%%%%%%%%%%%%%%%
This reduces the time taken to 77 µs down from 230 µs on my GPU (again using gputimeit, averaged over 1000 runs), which is more than a factor of x11 speedup over the 850 µs of my code in the OP.
Thank you very much, I have learned a lot with this exercise.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by