'parfor' seems to assign jobs for different loop variables to the cores beforehand. How can I assign jobs dynamically to speed up the calculation?

I am using parfor to speed up a calculation.
Lyaps = zeros(nsample,1);
flags = false(nsample,1);
parfor k = 1:nsample
    [Lyaps(k), flags(k)] = SympLargestLyap(SamplePoints(:,k), SympFx, opts);
end
For different k, the time for SympLargestLyap(SamplePoints(:,k),SympFx,opts) to complete can vary a lot.
I find that when the calculation for most k's has completed, i.e. when
n = length(find(Lyaps==0));
gives an n much smaller than nsample, the calculation slows down. The remaining unfinished k's are those for which
SympLargestLyap(SamplePoints(:,k),SympFx,opts)
takes a long time.
However, when the calculation for most k's had completed and I checked my CPU, only three or four cores were busy, even though the parallel pool has 32 workers. It seems the remaining roughly 20 cores have finished their assigned jobs and are sitting idle.
Can I assign jobs to each core dynamically, so that most of them do not sit idle like this?

Accepted Answer

Walter Roberson on 15 Dec 2024
You can create a parforOptions object specifying RangePartitionMethod "fixed" and SubrangeSize 1, and pass that parforOptions object to parfor.
This tells parfor to allocate only a single index to each worker at a time, with the worker going back to ask for the next available iteration after finishing the one it has.
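As a minimal sketch of how that could look with your loop (the options variable is named pfOpts here to avoid clashing with your existing opts argument, and gcp is used to grab the current pool; adjust both to your setup):
% Assumes a parallel pool with your 32 workers is already running
pool = gcp;                                     % handle to the current pool
pfOpts = parforOptions(pool, ...
    "RangePartitionMethod", "fixed", ...        % fixed-size subranges instead of "auto"
    "SubrangeSize", 1);                         % hand out one iteration at a time

Lyaps = zeros(nsample,1);
flags = false(nsample,1);
parfor (k = 1:nsample, pfOpts)                  % pass the options object to parfor
    [Lyaps(k), flags(k)] = SympLargestLyap(SamplePoints(:,k), SympFx, opts);
end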
Normally parfor assigns chunks of indices to each worker, with the first chunk accounting for roughly 2/3 of the iterations, the second chunk accounting for roughly 20% of the iterations, and the remaining 10% assigned as single iterations.
When all of the iterations take roughly the same time, the auto method works fine with minimal waiting at the end.
However, there is always the possibility that, by chance, one of the initial chunks happens to contain a group of iterations that take much longer than average. For example, you might be running through files where most are mostly empty, and the few more substantial files happen to be clustered near each other; the worker that got that range of indices would then take a long time while all the other workers finish quickly.
With SubrangeSize set to 1, you might still end up with individual iterations that take a long time, but you will not end up in a situation where multiple long iterations are stuck on the same worker.
