Why is MEX parfor slower than MEX for?

I am starting to work with the Parallel Computing Toolbox and have just constructed an FIR filter example to compare for and parfor:
coefs = [-0.00393617608745112 -5.95945405003999e-05...]   % size 1x10498
values = [30.3750000000000 30.3760000000000...]   % size 1x131000
tic;
outVal = FIRMP(coefs,values);
%outVal = FIRMP_mex(coefs,values);
time = toc;
with the function FIRMP:
function [result] = FIRMP(coefs, values)
    coefLen = length(coefs);
    valLen = length(values);
    result = zeros(size(values));
    for I = 1 : valLen - coefLen   % "parfor" in the parallel variant
        suma = 0;
        for J = 1 : coefLen
            suma = suma + coefs(J)*values(I + J);
        end
        result(I) = suma;
    end
end
I used 4 threads and got these results:
for : time= 13.5s
parfor: time = 5.5s
That is as expected, but if I generate a C++ MEX file (MATLAB Coder) and run it again, the results change:
for : time = 3.1s
parfor: time = 4.3s
Why is 'parfor' in the C++ MEX slower than 'for'?

Accepted Answer

Ryan Livingston on 29 Aug 2018
Edited: Ryan Livingston on 29 Aug 2018
When I try your example on Linux (Debian 9) using GCC I see a good speedup with parfor in generated MEX:
for : time = 1.3s
parfor : time = 0.4s
On Windows 10 using Microsoft Visual Studio 2017, I see a much more modest speedup:
for : time = 1.3s
parfor : time = 1.0s
What compiler and OS are you using?
One thing that may be happening with certain compilers is that each parfor loop iteration is very fast. When this is the case, the overhead of managing threads can dominate the loop execution time and cancel out any gain from parallelism.
The Coder documentation covers this in some detail, as does the MATLAB parfor documentation.
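If per-iteration thread overhead does turn out to be the bottleneck, one common remedy is to make each parfor iteration do more work. The following is a minimal sketch of that idea and is not taken from the original post: FIRMP_blocked and its numBlocks argument are hypothetical names, and the indexing follows the original FIRMP. It is written for plain MATLAB; whether MATLAB Coder accepts the cell-array output inside parfor has not been checked here.

```matlab
function result = FIRMP_blocked(coefs, values, numBlocks)
% Hypothetical variant of FIRMP: each parfor iteration computes a whole
% block of outputs, so any per-iteration scheduling cost is amortized.
coefLen = length(coefs);
valLen = length(values);
nOut = valLen - coefLen;              % number of outputs the original loop fills
blockLen = ceil(nOut / numBlocks);    % outputs handled per parfor iteration
parts = cell(1, numBlocks);           % sliced output, one cell per block
parfor B = 1 : numBlocks
    first = (B - 1) * blockLen + 1;
    last = min(B * blockLen, nOut);
    part = zeros(1, max(last - first + 1, 0));
    for I = first : last
        suma = 0;
        for J = 1 : coefLen
            suma = suma + coefs(J) * values(I + J);
        end
        part(I - first + 1) = suma;
    end
    parts{B} = part;
end
result = zeros(size(values));
result(1 : nOut) = [parts{:}];        % stitch the blocks back together
end
```

For example, FIRMP_blocked(coefs, values, 4) would use one block per thread on a 4-thread setup.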

6 Comments

Josef Shrbeny on 30 Aug 2018
I am using Windows 8.1 and Microsoft Visual Studio 2015 compiler
Good to know. I tried in R2016b with a few other MEX compilers and here's what I saw:
Microsoft Visual Studio 2015: Similar times to what I posted above
MinGW: The version of MinGW supported in R2016b does not support OpenMP
Intel C++ Composer XE 2013 with Microsoft Visual Studio 2013 (C):
for : time = 0.5-0.6s
parfor : time = 1.6s
So it looks like the compiler makes a good bit of difference for your example with OpenMP.
Possibly of interest, I also tried a newer MinGW in MATLAB R2018a which supports OpenMP and saw slightly more improvement over Visual Studio:
for : time = 1.4s-1.5s
parfor : time = 0.6s-0.7s
Upgrading to R2018a and using MinGW could be an option to get more of a speedup.
Thanks Ryan, you are right, it looks like it is caused by the compiler.
Interesting example. I rewrote the code this way:
function [result] = FIRMP(coefs, val)
    coefLen = length(coefs);
    valLen = length(val);
    result = zeros(size(val));
    values = zeros(1, coefLen + valLen);
    values(coefLen : coefLen + valLen - 1) = val;
    parfor I = 1 : valLen
        result(I) = reshape(values(I : I + coefLen - 1), 1, []) * reshape(coefs, [], 1);
    end
end
It takes 4.9 s (C++ MEX). If I make one small change (line 6 of the function):
function [result] = FIRMP(coefs, val)
    coefLen = length(coefs);
    valLen = length(val);
    result = zeros(size(val));
    values = zeros(1, coefLen + valLen);
    values(coefLen + 1 : coefLen + valLen) = val;
    parfor I = 1 : valLen
        result(I) = reshape(values(I : I + coefLen - 1), 1, []) * reshape(coefs, [], 1);
    end
end
it takes only 1.4 s (C++ MEX). But that is still slower than running the MATLAB code directly without generating a C++ MEX (0.9 s).
Ryan Livingston on 31 Aug 2018
That's a surprising difference for those two examples. I'm not able to see such a difference when I try those examples. If you have specific reproduction steps for those (with codegen commands, how to run the MEX files, and timing commands) and are willing to share them with MathWorks Technical Support, our team would like to take a further look into this.
The fact that the generated code performs about the same as MATLAB isn't too surprising in this case. The operations you are performing (reshape, indexing, mtimes, etc.) are all built-in operations in MATLAB, so they are already compiled and hand-optimized. When such operations dominate the runtime of your MATLAB code, there's no real expectation that the generated MEX would be faster.
That's covered in the documentation, which also includes some performance tips.
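To illustrate the point about built-in operations, the explicit double loop of the original FIRMP can be expressed with the built-in conv. This is only a sketch under stated assumptions: FIRMP_conv is a hypothetical name, the inputs are assumed to be row vectors, and the output keeps the original loop's indexing (values(I + J)), which is why the first 'valid' sample is dropped.

```matlab
function result = FIRMP_conv(coefs, values)
% Hypothetical vectorized counterpart of the original FIRMP loop.
% Assumes coefs and values are row vectors.
coefLen = length(coefs);
valLen = length(values);
nOut = valLen - coefLen;                      % outputs filled by the original loop
result = zeros(size(values));
% Correlating with coefs equals convolving with the reversed coefficients.
c = conv(values, coefs(end:-1:1), 'valid');   % length nOut + 1
% c(k) = sum_J coefs(J)*values(k + J - 1), so the loop's result(I) is c(I + 1).
result(1 : nOut) = c(2 : end);
end
```

Because conv may accumulate the products in a different order than the loop, the two results can differ in the last few bits for non-integer data.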
Josef Shrbeny on 2 Sep 2018
Ryan, using bounded variable-size variables (1 x :n) instead of unbounded ones (1 x :inf), for both the inputs and the local variables, solved the problem.
Now the 'parfor' C++ MEX is about 3x faster than the 'for' C++ MEX (4 threads). Thank you for all your help. You gave me very useful tips and links.
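For readers wondering what such a size specification might look like, here is a hedged sketch using coder.typeof. The upper bounds are assumptions taken from the vector lengths quoted in the question, and the exact commands Josef used are not shown in the thread.

```matlab
% Bounded variable-size row vectors (upper bounds are assumptions based on
% the sizes quoted in the question).
coefType = coder.typeof(0, [1 10498], [false true]);    % 1 x :10498 double
valType = coder.typeof(0, [1 131000], [false true]);    % 1 x :131000 double

% For comparison, an unbounded specification would be:
% valTypeUnbounded = coder.typeof(0, [1 Inf]);          % 1 x :inf double

% Generate the MEX function for FIRMP with the bounded input types.
codegen FIRMP -args {coefType,valType}
```

Local variables can be bounded in a similar way with coder.varsize inside the function.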
Ryan Livingston on 3 Sep 2018
You're welcome Josef. Glad to hear you found a solution.

Asked: 29 Aug 2018

Commented: 3 Sep 2018
