I am developing an application that MUST take advantage of parallelization and should ideally offer real-time updates after each iteration, which makes parfeval preferable. I believe the algorithm that I have developed is highly parallelizable (see attached for the performance of 'WT_Ex_2_b' as a function of the number of cores used in the parfeval function). From 1 to 8 cores, the speedup factor agrees with theoretical expectation (Amdahl's Law with p=0.95); however, the performance of my application saturates at 8 cores. This led me to create a dummy function (see attached script) to compare the performance of parfor and parfeval as a function of the number of cores. I discovered that the parfor version behaves quite similarly to theoretical expectation (Amdahl's Law, also with p=0.95), whereas the parfeval version continues to show strange saturation behavior, even for the dummy function. Notice how the speedup factor improves with the number of cores up to 12 cores, then suddenly no further improvement is observed. I have attached the script in case you want to reproduce this behavior on your end.
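For reference, the theoretical curve mentioned here is presumably the standard form of Amdahl's Law, S(n) = 1 / ((1 - p) + p/n), where p is the parallel fraction and n the number of cores. A minimal sketch of evaluating it for p = 0.95 follows; the list of core counts is an assumption chosen for illustration, not taken from the attached script.

```matlab
% Amdahl's Law: theoretical speedup for parallel fraction p on n cores.
p = 0.95;                        % parallel fraction quoted in the question
n = [1 2 4 8 12 16];             % core counts (illustrative assumption)
S = 1 ./ ((1 - p) + p ./ n);     % elementwise speedup for each core count

for k = 1:numel(n)
    fprintf('n = %2d  ->  S = %.3f\n', n(k), S(k));
end
```

With p = 0.95 this gives roughly 5.93 at 8 cores and 7.74 at 12 cores, with an upper bound of 1/(1 - p) = 20 as n grows, so the theoretical curve itself does not flatten at 8 or 12 cores.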

Is there a fundamental limitation on the number of cores the parfeval function can leverage? Or is there an obvious mistake in the way I am using the parfeval function? Why does the performance of the dummy algorithm suddenly saturate at 12 cores? Any recommendations on how to use the parfeval function so that it performs as well as parfor?

I would like to emphasize that I have already developed my application to use parfeval, so converting to parfor would be time-consuming and prevent me from utilizing the update-after-iteration feature of parfeval.

Thank you for your help on this critical matter.

Edric Ellis
on 1 Jul 2020

The main difference between parfor and parfeval is that in the parfeval case, you are responsible for scheduling the work on the workers. parfor has an advantage over parfeval in that it knows how many loop iterations there are, so it schedules a fixed number of chunks of work per worker (see the documentation for parforOptions; the chunks are referred to as "sub-ranges"). So, in your case, parfeval will incur more overhead, since each parfeval request is sent on its own to a worker, whereas parfor groups things together. Grouping will generally be more efficient when each request takes about as long as the overhead of making a single remote request.
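One way to picture the grouping described above is to batch many short tasks into a small number of parfeval requests by hand. The following is only a sketch under stated assumptions: shortTask and runBatch are hypothetical placeholders, and the choice of four batches per worker is an arbitrary heuristic, not what parfor does internally.

```matlab
% Sketch: send a few large parfeval requests instead of one per task.
% Requires Parallel Computing Toolbox. shortTask and runBatch are
% hypothetical stand-ins for the real per-iteration work.
shortTask = @(i) i.^2;                       % placeholder for a short task
runBatch  = @(idx) arrayfun(shortTask, idx); % runs one batch on a worker

numTasks   = 1000;
pool       = gcp();                          % start or reuse a pool
numBatches = min(numTasks, 4 * pool.NumWorkers);   % assumed heuristic
edges      = round(linspace(0, numTasks, numBatches + 1));

% Submit one request per batch of task indices.
futures(1:numBatches) = parallel.FevalFuture;
for b = 1:numBatches
    idx = (edges(b) + 1):edges(b + 1);
    futures(b) = parfeval(pool, runBatch, 1, idx);
end

% Collect batches as they finish; the client can update after each one.
results = zeros(1, numTasks);
for k = 1:numBatches
    [b, batchOut] = fetchNext(futures);      % b is the index of the finished future
    idx = (edges(b) + 1):edges(b + 1);
    results(idx) = batchOut;
    fprintf('Batch %d of %d finished\n', k, numBatches);
end
```

Larger batches reduce per-request overhead but make the client-side updates coarser, so the batch count is a trade-off to tune for the actual task duration.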

So, parfeval doesn't have a fundamental limitation in this regard, but you might need to amalgamate your requests if they are too short to match parfor performance. Another option might be to use parfor together with DataQueue, which would let you perform updates at the client after each parfor iteration completes.
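A minimal sketch of the parfor plus DataQueue option might look like this. The loop body is a placeholder for the real work, and the callback simply prints a message; in a real application it could update a plot or a progress bar instead.

```matlab
% Sketch: per-iteration updates on the client from inside a parfor loop.
% Requires Parallel Computing Toolbox. The squared value is a placeholder.
N = 20;
q = parallel.pool.DataQueue;

% The callback runs on the client each time a worker sends a message.
afterEach(q, @(i) fprintf('Iteration %d finished\n', i));

results = zeros(1, N);
parfor i = 1:N
    results(i) = i^2;   % placeholder for the real computation
    send(q, i);         % notify the client that iteration i is done
end
```

Messages arrive in the order the iterations complete, which is generally not the loop order, so the callback should not assume that i increases monotonically.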

Edric Ellis
on 8 Jul 2020

As N exceeds the number of workers in the pool, the parfor machinery breaks the loop up into "subranges" to send to the workers. So, each worker will get a subrange of a number of loop iterations. objList here is a cell array, so I don't see how elements of that can interact with each other unless they're created that way. The other suspect here is the "broadcast" variable W. This will get copied once to each worker, and then the same instance will be used for multiple "subranges". So, if this has handle behaviour, that might explain it. Here's the sort of thing I'm thinking of.

h = containers.Map();
out = cell(1, 10);
parfor i = 1:10
    % Following line is to fool parfor into letting me modify 'h'
    hh = h;
    % Check the current contents of 'h'
    out{i} = hh.keys();
    % Modify 'h'
    hh(string(i)) = magic(i);
end
