Simple parfor loop slow

Manuel Santana on 29 Aug 2024
Commented: Sam Marshalik on 31 Aug 2024
In my code I am running a parfor loop that has to compute many matrix inversions of a moderately sized matrix. Below is some code that captures the essence of my code.
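nps = 10; foo = zeros(nps,1);
u = rand(1,6000,'like',1i); v = rand(6000,1,'like',1i);   % complex row and column vectors
mat = rand(6000,6000,'like',1i);                          % complex 6000-by-6000 matrix
% Parallel version, limited to at most 10 workers
tic
parfor (ii = 1:nps,10)
    foo(ii) = u * (mat \ v);
end
partime = toc
% Serial version of the same loop
tic
for ii = 1:nps
    foo(ii) = u * (mat \ v);
end
sertime = toc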
For some reason the parfor loop is slower than the serial loop. For example, with nps = 10 I get sertime = 11.2867, partime = 20.7321. If I increase to nps = 100, then sertime = 111.8209, partime = 126.8961. Note that I am running this code on a cluster using MATLAB Parallel Server with a Slurm profile and 10 workers (making more threads available to each worker didn't help either).
Any thoughts on why the parfor loop doesn't provide the speedup expected?
As a side note, in my actual code the matrix changes every loop iteration, but the above code still captures the behavior I cannot explain.

Accepted Answer

Sam Marshalik on 31 Aug 2024
I don't think ThreadPool will help here. I ran the code in a Process pool and a Thread pool and the runtime was somewhat similar. I also double checked how much data is being sent between the MATLAB client and workers and it is not a lot:
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________
    1           576198886.00                 614.00
    2           576198886.00                 614.00
    3           576198886.00                 614.00
    4           576198886.00                 614.00
    5           576198886.00                 614.00
    6           576199557.00                1081.00
    7           576199557.00                1081.00
    8           576198886.00                 614.00
    Total      4609592430.00                5846.00
ThreadPool can certainly help when working with large data, but I do not think it is the culprit here.
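The column names in the table above match what ticBytes and tocBytes report, so here is a minimal sketch of collecting such numbers, assuming that is how they were measured. The sizes are reduced (n = 500 instead of 6000) and the pool is whatever gcp returns; both are assumptions for illustration. For reference, a 6000-by-6000 complex double matrix occupies 36e6 * 16 = 576,000,000 bytes, which is consistent with the roughly 576 MB sent to each worker.

```matlab
% Sketch: measure client <-> worker data transfer for a parfor loop.
% n is reduced so the example runs quickly; the thread used n = 6000.
n = 500; nps = 8;
u = rand(1,n,'like',1i); v = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);
foo = zeros(nps,1,'like',1i);

pool = gcp;                  % reuse the current pool, or start the default one
ticBytes(pool);              % start counting transferred bytes
parfor ii = 1:nps
    foo(ii) = u * (mat \ v); % mat, u and v are broadcast to every worker
end
xfer = tocBytes(pool);       % one row per worker: [bytes sent, bytes received]
disp(xfer)
```

Calling tocBytes(pool) without an output argument prints a table like the one above instead of returning a matrix.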
I think the culprit is multi-threading. Running the code serially took me 33 seconds, and running it on a single worker with no multi-threading took 78 seconds. This means the serial run benefits from multi-threading behind the scenes, which a worker limited to one thread does not get.
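One way to see this effect without a pool is to time the same solve in the client with the default thread count and then with a single computational thread. This is only a rough sketch with a smaller matrix (n = 1000 is an assumption to keep it quick), and the ratio will depend on the machine.

```matlab
% Sketch: compare a multi-threaded and a single-threaded solve in the client.
n = 1000;                              % smaller than the thread's 6000
mat = rand(n,n,'like',1i);
v   = rand(n,1,'like',1i);

tic; x = mat \ v; tMulti = toc;        % default number of computational threads

oldN = maxNumCompThreads(1);           % restrict to one thread, keep the old setting
tic; x1 = mat \ v; tSingle = toc;
maxNumCompThreads(oldN);               % restore the previous setting

fprintf('default threads (%d): %.3f s, one thread: %.3f s\n', ...
        oldN, tMulti, tSingle);
```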
I think you had the right idea of giving your parallel workers access to more threads. For example, in serial the code took 33 seconds. I then started a single worker and gave it access to 8 threads and that ran in 38 seconds (5 seconds for overhead is reasonable). I think as the problem scales up and you can have more workers with more threads you will get more of a benefit from MATLAB Parallel Server.
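A minimal sketch of giving workers more threads through the cluster object's NumThreads property follows. It assumes the cluster profile of interest is the default profile (otherwise pass its name to parcluster), and the worker and thread counts are kept small for illustration; the thread used 10 workers, and 8 threads in the single-worker test. Depending on how the Slurm profile is set up, the scheduler may also need to allocate that many cores per worker, which is outside this sketch.

```matlab
% Sketch: start a pool whose workers each get several computational threads.
nWorkers         = 2;        % illustrative; the question used 10
threadsPerWorker = 2;        % illustrative; the answer tried 8

delete(gcp('nocreate'));     % close any pool that is already open
c = parcluster;              % default profile (assumption); or parcluster('yourProfile')
c.NumThreads = threadsPerWorker;   % threads per worker (default is 1)
pool = parpool(c, nWorkers);

% Ask each worker how many computational threads it will use.
spmd
    nThreads = maxNumCompThreads;
end
seen = [nThreads{:}];        % copy out of the Composite before the pool closes
disp(seen)

delete(pool);
```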
P.S. You may want to explore using sliced input variables as your data gets larger, so that you send chunks of data to the workers instead of the entire matrix/array.
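As a sketch of what slicing could look like when the matrix changes every iteration (as the question mentions for the real code), the per-iteration matrices and right-hand sides can be stacked and indexed only by the loop variable. The names mats and V and the small sizes are hypothetical, not taken from the original code.

```matlab
% Sketch: sliced inputs, so each worker receives only the slices it needs.
n = 200; nps = 8;                    % small sizes for illustration
u    = rand(1,n,'like',1i);          % small, so broadcasting it is cheap
mats = rand(n,n,nps,'like',1i);      % hypothetical: one matrix per iteration
V    = rand(n,nps,'like',1i);        % hypothetical: one right-hand side per iteration
foo  = zeros(nps,1,'like',1i);

parfor ii = 1:nps
    A = mats(:,:,ii);                % sliced: indexed only by the loop variable
    b = V(:,ii);                     % sliced
    foo(ii) = u * (A \ b);           % foo is a sliced output
end
```

Because mats and V are indexed with ii in a fixed position, parfor treats them as sliced variables rather than broadcasting the whole arrays to every worker.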

More Answers (1)

Ronit on 30 Aug 2024
Hello Manuel,
Since you are working with large complex data and 10 MATLAB workers, the data must be copied to each of the workers, and the results must be copied back. This takes time.
I would suggest that you set up your workers to be threads, not separate processes. That way they use shared memory and the data doesn't need copying. You can do this with parpool("threads"). This can significantly reduce the execution time of the parfor loop.
Please refer to the documentation page "Run MATLAB Functions in Thread-Based Environment" for more information.
I hope it helps with your query!
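A minimal sketch of the thread-pool variant follows, with reduced sizes as an assumption to keep it quick. Note that a thread-based pool runs on the local machine only, so it does not use the workers of a MATLAB Parallel Server cluster.

```matlab
% Sketch: run the same loop on a thread-based pool (local machine only).
n = 500; nps = 8;                    % reduced sizes for illustration
u = rand(1,n,'like',1i); v = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);
foo = zeros(nps,1,'like',1i);

delete(gcp('nocreate'));             % only one pool can be open at a time
pool = parpool("threads");           % thread workers share memory with the client

tic
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);
end
threadtime = toc

delete(pool);
```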
3 Comments
Manuel Santana on 30 Aug 2024
Great, thanks! I found that more threads and scaling the problem up did help improve the runtime as I expected. If you repost this reply as an answer I will accept it.
Sam Marshalik on 31 Aug 2024
All set :)
