Random Parfor iterations missing

Has anyone else encountered something like this?
I'm running a designed experiment. There are 33 treatments in the design, so the simulation runs each one and writes out a file with 33 lines, one per design point. The parfor loop is at the treatment level, as each treatment level runs independently of the others.
This whole process is then iterated to provide replications at each design point.
I just ran 600 replications overnight to generate data for variance characterization. When I imported the data, I found that 8 of my 600 output files had only 32 lines instead of 33. Each one was missing one treatment level. Only one treatment (21) was missing twice, and there's no pattern that I can discern:
Run --- Missing Treatment
272 --- 10
273 --- 3
278 --- 1
319 --- 21
329 --- 20
367 --- 21
424 --- 19
487 --- 8
Just wondering if anyone else had encountered something similar...and if they maybe found a way to mitigate it.

7 comments

José-Luis on 16 Jul 2014
Sounds like a race condition. What function writes the offending lines?
Sean de Wolski on 16 Jul 2014
What happens if you run a regular for-loop over the failed iterations? Do you get 33 treatments?
Jeremy on 16 Jul 2014
Edited: Jeremy on 16 Jul 2014
The general structure is
DPSimDriver
  1. Load design
  2. parfor i = 1:size(design) call DPSim(design parameters)
The last thing DPSim does is
FileIDResults = -1;
while FileIDResults == -1
FileIDResults = fopen('project_results.txt','a');
end
fprintf(....);
fclose('all');
return
So each iteration should wait until it can grab the file and write out its results...or have I missed something? Those are the only file operations...
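One way to sidestep contention for the shared file is to have each iteration return its result line and let the client write the file once, after the parfor finishes. The following is a minimal sketch, assuming DPSim can be changed to return a string instead of writing it. `simulate` is a hypothetical stand-in for DPSim, and the line format is invented for illustration. As far as I know, a successful fopen in append mode does not imply exclusive access to the file, so the while loop above may not serialize the workers the way it is intended to.

```matlab
% Hypothetical stand-in for DPSim: returns one formatted result line
% instead of appending to the shared file itself.
simulate = @(k) sprintf('%d\t%.6f', k, k^2);

nTreatments = 33;                  % design size from the post
lines = cell(nTreatments, 1);      % sliced output, one slot per treatment

parfor i = 1:nTreatments
    % feval is used because parfor would otherwise try to treat
    % simulate(i) as indexing into a sliced variable.
    lines{i} = feval(simulate, i); % each iteration fills only its own slot
end

% Single writer: the client appends every line after the loop is done.
fid = fopen('project_results.txt', 'a');
if fid == -1
    error('Could not open project_results.txt for appending.');
end
fprintf(fid, '%s\n', lines{:});    % the format is reused for each cell
fclose(fid);                       % close only this file
```

With this layout no two workers ever touch the file, and an empty slot in `lines` points directly at the treatment that produced no output.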
Sean de Wolski on 16 Jul 2014
The fclose('all') worries me. I don't think that it should have an effect but I can't say for sure. fclose(fid) is a much smaller hammer approach that should be fine.
And what about just running a regular for-loop over the simulations that failed to see if they fail again serially?
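The narrower close suggested here could look like the sketch below. It also keeps the return values of fprintf and fclose so that a failed write leaves a trace. `resultLine` and its format are hypothetical placeholders for whatever DPSim actually writes.

```matlab
resultLine = sprintf('%d\t%.6f', 10, 3.14);   % hypothetical result line

fid = fopen('project_results.txt', 'a');
if fid == -1
    error('Could not open project_results.txt for appending.');
end

nBytes = fprintf(fid, '%s\n', resultLine);    % number of bytes written
status = fclose(fid);                         % closes this handle only; 0 on success

if nBytes ~= numel(resultLine) + 1 || status ~= 0
    warning('Result line may not have been written completely.');
end
```

Logging such a warning together with the run and treatment number would at least show whether the missing lines coincide with a failed write or close.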
Jeremy on 16 Jul 2014
I've only just discovered the issue, so I haven't tried to run a regular for loop instead of a parfor. It took all night to run with 4 workers....
Sean de Wolski on 16 Jul 2014
Just rerun the failed iterations to see if it's something with them specifically.
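A serial rerun of only the affected cases might be sketched as follows. The run and treatment pairs are copied from the table in the question. `runTreatment` is a hypothetical stand-in for the real DPSim call, which is not shown in the thread.

```matlab
% [run, missing treatment] pairs from the table in the question
failed = [272 10; 273 3; 278 1; 319 21; 329 20; 367 21; 424 19; 487 8];

% Hypothetical stand-in for DPSim; replace with the real call.
runTreatment = @(t) sprintf('%d\t%.6f', t, t^2);

produced = false(size(failed, 1), 1);
for k = 1:size(failed, 1)              % plain for-loop, one case at a time
    out = runTreatment(failed(k, 2));
    produced(k) = ~isempty(out);       % did this treatment yield a result line?
end

fprintf('%d of %d reruns produced output\n', nnz(produced), numel(produced));
```

If every rerun produces output, the treatments themselves are unlikely to be the problem, which would point back at the concurrent file access.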
Jeremy on 16 Jul 2014
I can certainly do that...but I'd like to figure out what the root cause was. It was a pain in the butt to sift through 600 sets of output to identify the ones that were missing elements and I'd rather not have to do that again.
Plus, I can't see how it would be something with those specific iterations. For example, the first one...run number 272 didn't have a line of output for the 10th design point. That design point ran just fine 599 other times....
Puzzling....
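Sifting through the output by hand can be avoided with a short check script. The sketch below makes several assumptions that are not in the thread: one file per run, a hypothetical `run_0001.txt` naming scheme, and the treatment number in the first column of each line. It first builds two small demo files in a temporary folder so that it runs on its own.

```matlab
nTreatments = 33;                       % design size from the post
outDir = tempname;                      % demo folder (assumption)
mkdir(outDir);

% Demo data: run 1 is complete, run 2 lacks treatment 10.
for r = 1:2
    fid = fopen(fullfile(outDir, sprintf('run_%04d.txt', r)), 'w');
    for t = 1:nTreatments
        if r == 2 && t == 10
            continue;                   % simulate the dropped line
        end
        fprintf(fid, '%d\t%.6f\n', t, rand);
    end
    fclose(fid);
end

% Scan every run file and report treatments with no line.
files = dir(fullfile(outDir, 'run_*.txt'));
incomplete = zeros(0, 2);               % rows of [file index, missing treatment]
for k = 1:numel(files)
    data = load(fullfile(outDir, files(k).name), '-ascii');
    missing = setdiff(1:nTreatments, data(:, 1)');
    for m = missing
        fprintf('%s is missing treatment %d\n', files(k).name, m);
        incomplete(end + 1, :) = [k, m]; %#ok<SAGROW>
    end
end
```

Pointing `outDir` at the real output folder and adjusting the file pattern would list every incomplete run in one pass.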

Answers (0)

Asked: 16 Jul 2014

Commented: 16 Jul 2014