Create and Use Distributed Arrays
If your data is currently in the memory of your local machine, you can use the distributed function to distribute an existing array from the client workspace to the workers of a parallel pool. Distributed arrays use the combined memory of multiple workers in a parallel pool to store the elements of an array. You operate on the entire array as a single entity; the workers, however, operate only on their part of the array and automatically transfer data between themselves when necessary. For alternative ways of partitioning data, see Distributing Arrays to Parallel Workers.
You can use distributed arrays to scale up your big data computation. Consider
distributed arrays when you have access to a cluster, as you can combine the memory of
multiple machines in your cluster.
A distributed array is a single variable, split over multiple workers in your parallel pool. You can work with this variable as one single entity, without having to worry about its distributed nature. To explore the functionality available for distributed arrays in the Parallel Computing Toolbox™, see Run MATLAB Functions with Distributed Arrays.
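As a minimal sketch of working with a distributed array as one entity, the snippet below sums the columns of a distributed matrix and brings the result back to the client with gather. The variable names and the matrix size are illustrative assumptions, and the code assumes a parallel pool is open or can be started automatically.

```matlab
% Illustrative data: an ordinary array in the client workspace.
A = rand(1000);

% Spread the array over the workers of the current parallel pool.
D = distributed(A);

% Operate on D as if it were a regular array; the result stays distributed.
colSums = sum(D,1);

% Copy the (small) result back into client memory.
result = gather(colSums);
```

Calling gather only on the final, reduced result keeps the large intermediate data on the workers.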
When you create a distributed array, you cannot control the details of the distribution. On the other hand, codistributed arrays allow you to control all aspects of distribution, including dimensions and partitions. In the following, you learn how to create both distributed and codistributed arrays.
Creating Distributed Arrays
You can create a distributed array in different ways:
Use the distributed function to distribute an existing array from the client workspace to the workers of a parallel pool.
You can directly construct a distributed array on the workers. You do not need to first create the array in the client, so that client workspace memory requirements are reduced. The functions available include eye(___,"distributed"), rand(___,"distributed"), etc. For a full list, see the Alternative Functionality section of the distributed object reference page.
To create a codistributed array inside an spmd statement, see Single Program Multiple Data (spmd). Then access it as a distributed array outside the spmd statement. This lets you use distribution schemes other than the default.
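The second option, constructing the array directly on the workers, might look like the following sketch. The sizes and variable names are illustrative assumptions, and a parallel pool is assumed to be open or to start automatically.

```matlab
% Build arrays directly on the workers; no full copy is created on the client first.
R  = rand(1000,"distributed");    % 1000-by-1000 uniformly distributed random numbers
I4 = eye(4,"distributed");        % 4-by-4 identity matrix
Z0 = zeros(2,3,"distributed");    % 2-by-3 array of zeros

% isdistributed reports whether a variable is a distributed array.
tf = isdistributed(R);
```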
In this example, you create an array in the client workspace, then turn it into a distributed array.
Create a new pool if you do not have an existing one open.
parpool("Processes",4)
Create a magic 4-by-4 matrix on the client and distribute the matrix to the workers. View the results on the client and display information about the variables.
A = magic(4);
B = distributed(A);
B
whos
  Name      Size      Bytes  Class          Attributes

  A         4x4         128  double
  B         4x4         128  distributed
You have created B as a distributed array, split over the workers in your parallel pool. This is shown in the figure below. The distributed array is ready for further computations.
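To see how the array is split, one option is to inspect each worker's piece inside an spmd statement, where a distributed array is seen as a codistributed array. This is only a sketch: the first two lines repeat the example so that the snippet runs on its own, localB and nLocal are illustrative names, and the exact split depends on the default distribution scheme and the number of workers in your pool.

```matlab
A = magic(4);
B = distributed(A);

spmd
    % Inside spmd, B is a codistributed array; getLocalPart returns the
    % portion stored on this worker as an ordinary array.
    localB = getLocalPart(B);
    nLocal = numel(localB);
    fprintf("Worker %d holds a %d-by-%d block\n", ...
        spmdIndex, size(localB,1), size(localB,2));
end

% gather reassembles the full array in the client workspace.
C = gather(B);
```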
Close the pool after you have finished using the distributed array.
delete(gcp)
Creating Codistributed Arrays
Unlike distributed arrays, codistributed arrays allow you to control all aspects of distribution, including dimensions and partitions. You can create a codistributed array in different ways:
Partitioning a Larger Array — Start with a large array that is replicated on all workers, and partition it so that the pieces are distributed across the workers. This is most useful when you have sufficient memory to store the initial replicated array.
Building from Smaller Arrays — Start with smaller replicated arrays stored on each worker, and combine them so that each array becomes a segment of a larger codistributed array. This method reduces memory requirements as it lets you build a codistributed array from smaller pieces.
Using MATLAB Constructor Functions — Use any of the MATLAB® constructor functions like rand or zeros with a codistributor object argument. These functions offer a quick means of constructing a codistributed array of any size in just one step.
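The first two approaches can be sketched as follows. The array sizes, the variable names, and the choice of a two-column block per worker are illustrative assumptions, and a parallel pool is assumed to be open or to start automatically when spmd runs.

```matlab
spmd
    % Partitioning a larger array: A is replicated on every worker,
    % then split across the workers with the default distribution.
    A = magic(8);
    P = codistributed(A);

    % Building from smaller arrays: each worker supplies a 3-by-2 block,
    % and the blocks are joined along the second dimension.
    localPart = spmdIndex * ones(3,2);
    codistr = codistributor1d(2, 2*ones(1,spmdSize), [3, 2*spmdSize]);
    Bld = codistributed.build(localPart, codistr);
end

% Outside spmd, P and Bld are accessed as distributed arrays.
size(Bld)
```

In the build step, the partition vector and global size passed to codistributor1d must agree with the size of the local part on every worker.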
In this example, you create a codistributed array inside an spmd statement, using a nondefault distribution scheme.
Create a parallel pool with two workers.
parpool("Processes",2)
In an spmd statement, define a 1-D distribution along the third dimension, with 4 parts on worker 1 and 12 parts on worker 2. Then create a 3-by-3-by-16 array of zeros with that distribution and add each worker's index (spmdIndex) to the array. View the resulting array on the client and display information about the variables.
spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist);
    Z = Z + spmdIndex;
end
Z
whos
  Name        Size        Bytes  Class          Attributes

  Z           3x3x16       1152  distributed
  codist      1x2           265  Composite
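One way to confirm that the nondefault scheme took effect is to look at the size of each worker's local part and at the partition stored in the codistributor. The sketch below repeats the array creation so that it runs on its own; it assumes a pool with exactly two workers, as in the example, and localSize, c, and part are illustrative names.

```matlab
spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist) + spmdIndex;

    % Size of the piece of Z stored on this worker.
    localSize = size(getLocalPart(Z));

    % The codistributor records the distribution dimension and partition.
    c = getCodistributor(Z);
    part = c.Partition;
end

% localSize and part are Composite objects on the client; index by worker.
localSize{1}   % worker 1 holds 4 of the 16 pages along dimension 3
localSize{2}   % worker 2 holds the remaining 12 pages
part{1}
```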
Close the pool after you have finished using the codistributed array.
delete(gcp)
For more details on codistributed arrays, see Working with Codistributed Arrays.