Tall vs distributed array

I see that we have tall and distributed arrays.
Tall divides data into chunks.
Distributed also divides data into chunks!
What's the difference here?
And how is either of these connected to parallel computing?

Answers (1)

Edric Ellis on 14 May 2018

0 votes

Both tall and distributed arrays are designed for processing large amounts of data, but they have somewhat different capabilities.
Distributed arrays are spread across the memory of several MATLAB worker processes, so the largest distributed array you can create is limited by the total amount of physical memory available to those workers. Distributed arrays are also more oriented towards dense and sparse linear algebra. They require Parallel Computing Toolbox, and are most effective when used with MATLAB Distributed Computing Server (which lets you distribute the data across multiple machines).
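As a rough illustration of the linear algebra use case, here is a minimal sketch. It assumes Parallel Computing Toolbox is installed and that a default parallel pool can be started; the matrix size and variable names are arbitrary choices for the example.

```matlab
% Sketch only: requires Parallel Computing Toolbox. Size and names are illustrative.
gcp;                               % start (or reuse) the default parallel pool

A = rand(2000, 'distributed');     % 2000-by-2000 matrix held in the workers' memory
b = sum(A, 2);                     % right-hand side, also distributed; exact solution is all ones
x = A \ b;                         % dense linear solve carried out on the workers

xClient = gather(x);               % copy the (small) result back to the client
maxErr  = max(abs(xClient - 1));   % deviation from the known solution
```

Everything up to gather stays in worker memory, which is why the total memory across the workers is what bounds the array size.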
The data for tall arrays exists on disk, so their size is not limited by the amount of memory you have available. However, as the name implies, tall arrays can be large only in the first dimension. Tall arrays are more geared towards data analytics. They ship with MATLAB itself, but there is enhanced support in both Parallel Computing Toolbox (which enables parallel processing on a single computer) and MATLAB Distributed Computing Server (which enables parallel processing across a cluster, including Hadoop/Spark clusters).
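For comparison, a minimal tall array sketch that runs in MATLAB on its own. The file name and the variable name Value are hypothetical; the example writes a small CSV first only so that it is self-contained, whereas real data would already be sitting on disk and could be far larger than memory.

```matlab
% Sketch only: the file name and the variable name "Value" are made up for illustration.
file = fullfile(tempdir, 'tall_demo.csv');
writetable(table((1:1000)', 'VariableNames', {'Value'}), file);

ds = tabularTextDatastore(file);   % describes the data on disk; nothing is loaded yet
t  = tall(ds);                     % tall table backed by the datastore

m = mean(t.Value);                 % deferred: this only records the computation
m = gather(m);                     % now the file is read in chunks and the mean evaluated
```

Only the final scalar is brought into memory. If Parallel Computing Toolbox is installed, the same code can evaluate the chunks on parallel workers.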

3 comments

Pey on 14 May 2018
Thanks Edric for the reply.
The data are always stored on disk! I guess it's a matter of how you read from and write to disk and communicate with CPU and GPU.
Both also work better with the two products, Parallel Computing Toolbox and MATLAB Distributed Computing Server. Distributed arrays require Parallel Computing Toolbox and tall arrays don't. (One difference here, but why? Mathematical, technical, or marketing?)
So if we remove several dimensions of a distributed array so that it has only one large dimension, it turns into a tall array, right? I don't see the point of having two types of arrays. Why don't we have only distributed arrays?
Not knowing the details of how you handle memory and reads/writes, tall arrays seem redundant to me.
Edric Ellis on 15 May 2018
The fundamental difference is where the data is held once you've created the array. Distributed arrays are more restricted in size because the contents are always in memory, but they are more capable. Tall arrays can be much larger, as long as you have the disk space.
Pey on 15 May 2018
Thanks. So if I understood correctly, can I summarize it in this way?


Asked: Pey on 11 May 2018

Commented: Pey on 15 May 2018
