Are accumarray/cellfun/arrayfun/etc. multithreaded?

Eric Sampson on 19 Apr 2013
Commented: Jan on 22 Jan 2018
As per the subject line, does anyone know? If not, could TMW add this without a lot of effort?

Answers (2)

Jan on 21 Apr 2013
Edited: Jan on 21 Apr 2013
It is not trivial to add multi-threading to e.g. cellfun. Its source code was shipped with older Matlab versions, and multi-threading can be added easily for the built-in commands (the ones specified as strings, which are unfortunately marked as "Backward Compatibility" only, although they are very efficient). But for the non-built-in case, the called functions could have side effects, e.g. persistent counters, output to files, etc. Distributing the job across different threads would then cause serious errors. Example:
x = {1,2; 3,4}
cellfun(@(c) fprintf('%d ', c), x)
With multi-threading the result is no longer well-defined.
How could cellfun (etc.) decide whether there are dependencies between the function calls? As long as independence is not guaranteed, automatic multi-threading would be a bug.
  4 comments
Alexander Laut on 22 Jan 2018
I believe I understand your point, given your example of printing the contents of the cell. I am curious, however, whether your concern is that the outputs would appear in some arbitrary order, or that they would conflict in a critical way (specifically for your example).
If backwards compatibility is the main issue then so be it, but I think it would still be a nice feature to include an optional flag that allows cellfun to run in parallel, even if only at your own risk. The 'uniformoutput',false flag already seems like an option that may have been added to deal with unpredictable functions.
Jan on 22 Jan 2018
@Alexander Laut: The "uniformoutput" flag is not meant for unpredictable functions, but for the case where the outputs of the function cannot be concatenated into an array.
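A minimal sketch of that distinction, using a made-up cell array c: with scalar outputs cellfun can concatenate the results into an array, while outputs of varying size have to be collected in a cell by setting 'UniformOutput' to false.

```matlab
% Hypothetical input, for illustration only.
c = {'a', 'bcd', 'ef'};

% Each call returns a scalar, so the results fit into a numeric array.
len = cellfun(@numel, c);

% upper returns char vectors of different lengths, which cannot be
% concatenated element by element, so the results are returned as a cell.
up = cellfun(@upper, c, 'UniformOutput', false);
```

Calling cellfun(@upper, c) without the flag fails here, because 'BCD' is not a scalar.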
MathWorks will surely not introduce a flag that is to be used at your own risk only. But you can do this yourself if you have an old version of Matlab, e.g. R13, which shipped with many C sources such as cellfun.c.
A multi-threaded cellfun would have to call Matlab multiple times concurrently, but Matlab is not thread-safe. Even if you only call functions from the C MEX API, like mxCreateNumericMatrix, from several threads, a crash is guaranteed.
But even if a future version of Matlab is thread-safe, there is still the problem of choosing a suitable number of threads: MathWorks decided to multi-thread sum when it is applied to a vector with more than 88,999 elements. This is a double-edged decision, because floating-point summation is not a stable operation and depends on the order of the operands (example: 1e17 + 1 - 1e17 returns 0, but 1e17 - 1e17 + 1 yields 1). In R2009a the results a and b of
x = rand(1, 89000);
a = sum(x)
b = sum(x)
could differ randomly due to rounding errors. The sum was calculated in 2 threads and the result depended on which thread finished first. This was fixed in later versions, but ever since sum became multi-threaded, the result for huge vectors has depended on the number of cores used.
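The order dependence can be reproduced without any threads. The following is only a sketch: the split of x into two halves is an assumption that mimics a two-thread summation, not the actual implementation of sum.

```matlab
% Floating-point addition is not associative: doubles near 1e17 are
% spaced 16 apart, so adding 1 to 1e17 is lost to rounding.
a = (1e17 + 1) - 1e17;     % the 1 is absorbed before the subtraction
b = (1e17 - 1e17) + 1;     % the large terms cancel first

% Summing two halves separately (as two threads might) and combining
% them can round differently from one pass over the whole vector.
x  = rand(1, 89000);
s1 = sum(x(1:44500)) + sum(x(44501:end));
s2 = sum(x);
d  = abs(s1 - s2);         % tiny, and may or may not be exactly zero

fprintf('a = %g, b = %g, |s1 - s2| = %g\n', a, b, d);
```

Here a is 0 and b is 1, while d is at most a few rounding errors in size and varies with the random data.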
But back to the problem: sum has the perfect property that you can predict how much processing time it needs. MathWorks decided to set the limit at 89000, because starting a thread is very expensive, so for short vectors the single-threaded version is faster. But what would you do in cellfun? How could the function decide how many threads to start? There is no way to estimate how much time the called function needs, or whether it needs the same time for the different cell elements. There are methods to control this dynamically, but they are expensive.
If you have a huge data set stored in a cell and want to apply a function that can be distributed to multiple threads, use the method suggested by Nimrod: run a parfor loop and check carefully whether it is faster than a single-threaded cellfun approach.



Nimrod on 6 Sep 2016
I found an easy solution.
You have to divide your original cell array into several smaller cell arrays (stored in an outer cell array), let's say 32.
Then you iterate over these chunks with parfor and use cell2mat to put everything back together.
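The steps above might look roughly like the sketch below. The names data, fcn, nChunks, edges, chunks and results are hypothetical, and fcn is assumed to be free of side effects and to return a scalar per cell. parfor needs Parallel Computing Toolbox to actually run in parallel; without it, the loop should simply run serially.

```matlab
% Hypothetical data and function, for illustration only.
data = num2cell(1:1000);        % 1x1000 cell array
fcn  = @(v) v.^2;               % assumed independent per element

% Split the cell array into nChunks pieces of nearly equal size.
nChunks = 32;
n       = numel(data);
edges   = round(linspace(0, n, nChunks + 1));   % chunk boundaries
chunks  = cell(1, nChunks);
for k = 1:nChunks
    chunks{k} = data(edges(k) + 1 : edges(k + 1));
end

% Process each chunk with an ordinary cellfun inside parfor.
results = cell(1, nChunks);
parfor k = 1:nChunks
    results{k} = cellfun(fcn, chunks{k});       % numeric row per chunk
end

% Concatenate the per-chunk rows back into a single row vector.
out = cell2mat(results);
```

Whether this beats a plain cellfun depends on how expensive fcn is compared with the overhead of sending the chunks to the workers, so it is worth timing both variants on the real data.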
