parfeval with status check using timer class

Stephane Eisen on 5 Apr 2021
I am running a large design parameter sweep (~350k cases) using parfeval. The simulation is implemented in the standard way, as a function handle that is passed to parfeval along with the required input data (parfeval(p,@simulation,1,inputData)). For some of the input parameter combinations the simulation does not converge, so I created a timer that polls the futures to see whether they have started and, if so, how long they have been running. If the run time exceeds a preset value, I cancel the future.
In my first implementation, I created all ~350k futures and then polled them on a preset interval using a single timer. However, the polling was extremely slow (I presume because I was polling all ~350k futures every time), so I tried arranging the futures in a matrix and creating one timer object per column, passing each timer only its own column of futures. This results in a much smaller batch of futures to poll per timer (2500 vs. 350k).
With the small data set I used for testing (1000 futures, 10 timers) this seemed to work fine. When I expanded this to the full data set of 350k futures, I ended up getting the following error: "java.lang.OutOfMemoryError: GC overhead limit exceeded", even though my machine was only using 130 GB of the 188 GB available.
With this in mind, I have the following questions:
  1. What is the source of the java.lang.OutOfMemoryError: GC overhead limit exceeded?
  2. Is there a limit to the size of the futures vector/matrix I should create?
  3. Is there a more efficient way to poll the execution time of each future?
Any suggestions to create a more robust implementation are also appreciated.
futureRows = 2500;
futureCols = 140;
F(futureRows,futureCols) = parallel.FevalFuture;
for pIdx = 1:size(F,2)
    for fIdx = 1:size(F,1)
        linIdx = sub2ind(size(F),fIdx,pIdx);
        F(fIdx,pIdx) = parfeval(p,@simulation,1,inputData(linIdx));
    end
    futureTimer(pIdx) = parfevalTimer(F(:,pIdx));
end
The parfevalTimer function looks like this:
function t = parfevalTimer(F,timeout,refreshRate)
    arguments
        F {mustBeA(F,'parallel.FevalFuture')}
        timeout {mustBeNumeric} = 600 % Default 600 second runtime limit
        refreshRate {mustBeNumeric} = 600 % Default 600 second refresh rate
    end
    timeout = seconds(timeout); % Convert to Duration class
    % Prepare timer object
    t = timer('ExecutionMode','fixedSpacing');
    t.UserData = struct('Futures',F,'Timeout',timeout);
    t.TimerFcn = @(~,~)checkParpool(t);
    t.Period = refreshRate;
    t.StopFcn = @timerCleanup;
    start(t)
end

function timerCleanup(s,~)
    disp('Stopping Timer')
    delete(s)
end

function checkParpool(t)
    F = t.UserData.Futures;
    emptyLocal = NaT('TimeZone','local');
    state = {F.State};
    nCompleted = sum(ismember(state,'finished'));
    if nCompleted == numel(F)
        % All futures are done: stop the timer (StopFcn deletes it) and skip the remaining checks
        stop(t);
        return
    end
    idxRunning = find(ismember(state,'running'));
    hasTimeoutFcn = @(start,finish)~isnat(start)&isnat(finish)&(datetime('now','TimeZone','local')-start)>t.UserData.Timeout; % Determine if task has timed out
    % Replace missing start/finish times with NaT so they can be compared
    start = {F(idxRunning).StartDateTime};
    [start{cellfun(@isempty,start)}] = deal(emptyLocal);
    finish = {F(idxRunning).FinishDateTime};
    [finish{cellfun(@isempty,finish)}] = deal(emptyLocal);
    idxTerminate = cellfun(@(Ts,Tf)hasTimeoutFcn(Ts,Tf),start,finish);
    if any(idxTerminate)
        % Cancel every running future that has exceeded the timeout
        cancel(F(idxRunning(idxTerminate)));
        disp('Number of cases that timed out in the timer period: ' + string(sum(idxTerminate)));
    end
end

Answers (0)

Release

R2020b
