
Best read-only data strategy for parfor

1 view (last 30 days)
Robin on 18 Oct 2012
Hi,
I am using parfor on a grid with 60 workers.
I have some data which will be used read-only within the parfor loop.
I see two options: load it on the machine I am submitting from, so it is serialized and sent across the network (dedicated gigE for the cluster), or load it from disk within the loop.
Can anyone comment on which of these might be the better strategy for different data sizes? The data compresses very well, so it is about 20 MB on disk but more than 1 GB in memory once loaded. How does the speed of loading and uncompressing compare to serialization?
If I have it loaded on the submission machine, is MATLAB clever enough to serialize and send it once to each worker, or will it resend it on every iteration? Obviously, loading from a file would happen on every iteration.
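To make the two options concrete, here is a rough sketch (loadHugeData and processChunk are placeholder names for my actual code):

```matlab
% Option 1: load once on the client; 'data' becomes a broadcast
% variable that parfor serializes and ships over the network.
data = loadHugeData();              % ~1 GB in memory
parfor ii = 1:N
    out(ii) = processChunk( data, ii );
end

% Option 2: load from disk inside the loop, once per iteration.
parfor ii = 1:N
    data = loadHugeData();          % ~20 MB read + decompress, every iteration
    out(ii) = processChunk( data, ii );
end
```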
Any advice appreciated

Answers (1)

Edric Ellis on 18 Oct 2012
I would recommend trying my Worker Object Wrapper. It's designed for just this sort of situation. In your case, you should put the files in a location available to the workers, and have them load the data using something like this:
w = WorkerObjectWrapper( @loadHugeData );
The object 'w' is then effectively a handle to the data. When you pass this into a PARFOR loop, the workers can then access the underlying data, like so:
parfor ii = 1:N
    doSomethingWith( w.Value );
end
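In case it helps, the loader function for this pattern might look something like the following sketch (the path and variable name are placeholders; the only requirements are that the function is on the workers' path and reads from a location every worker can see):

```matlab
function data = loadHugeData()
% Runs on each worker; reads from shared storage visible to the cluster.
s = load( '/shared/cluster/path/bigdata.mat' );  % placeholder path
data = s.bigdata;                                % placeholder variable name
end
```

The wrapper evaluates this once per worker when it is constructed, so each worker pays the load-and-decompress cost once, rather than once per iteration or once per megabyte over the wire.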
