Define parallel execution environment for mapreduce and tall arrays
mapreducer defines the execution environment for
mapreduce or tall arrays. Use the
mapreducer function to change the execution
environment to use a different cluster or to switch between serial and
parallel development.
The default execution environment uses either the local MATLAB® session, or a parallel pool if you
have Parallel Computing Toolbox™. If you have Parallel Computing Toolbox installed, when you use the
tall or mapreduce functions, MATLAB automatically starts a parallel pool of workers,
unless you have changed the default preferences. By default, a parallel pool
uses local workers, typically one worker for each core in your machine. If
you turn off the Automatically create a parallel pool option, then you must explicitly start a pool if you want to
use parallel resources. See Specify Your Parallel Preferences.
When working with tall arrays, use
mapreducer to set
the execution environment prior to creating the tall array. Tall arrays are
bound to the current global execution environment when they are constructed.
If you subsequently change the global execution environment, then the tall
array is invalid, and you must recreate it.
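The ordering described above can be sketched as follows. This is a minimal illustration, not part of the original page: the variable names t and smallest and the random test data are assumptions chosen for the example.

```matlab
% Set the execution environment first. mapreducer(0) runs calculations
% in the local MATLAB client session.
mapreducer(0);

% Create the tall array afterwards, so that it is bound to the
% environment chosen above. The data here is illustrative only.
t = tall(rand(1000,1));

% Evaluate a deferred calculation in that environment.
smallest = gather(min(t));
```

Setting the environment before the call to tall avoids having to recreate the array later.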
In MATLAB, you do not
need to specify configuration settings using
mapreducer, because mapreduce algorithms and tall array
calculations automatically run in the local MATLAB session only. If
you also have Parallel Computing Toolbox, then you can use the additional
mapreducer configuration options listed on
this page for running in parallel. If you have MATLAB
Compiler™, then you can use
mapreducer configuration options for
running in deployed environments.
mapreducer with no input arguments creates a new
mapreducer execution environment with all the
defaults and sets this to be the current
mapreduce or tall array execution environment. You can use
gcmr to get the current mapreducer configuration.
If you have default preferences (Automatically create a parallel pool is enabled), and you have not opened a parallel pool, then
mapreducer opens a pool using the default cluster profile, sets
gcmr to a mapreducer based on this pool, and returns this mapreducer.
If you have opened a parallel pool, then mapreducer sets
gcmr to a mapreducer based on the current pool and returns this mapreducer.
If you have disabled Automatically create a parallel pool, and you have not opened a parallel pool, then mapreducer sets
gcmr to a mapreducer based on the local MATLAB session, and
mapreducer returns this mapreducer.
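As a small sketch of the no-argument behavior described above, the following creates the default environment and then queries it with gcmr. The variable names mr and current are illustrative. Which kind of mapreducer you get back depends on your preferences and on whether a pool is open, so no particular output is shown.

```matlab
% Create the default execution environment and make it current.
% Depending on your preferences, this may open a parallel pool.
mr = mapreducer;

% Query the current global execution environment.
current = gcmr;

% Show which kind of environment is active (session-based or pool-based).
disp(class(current))
```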
mapreducer(0) specifies that
calculations run in the MATLAB client session without using any parallel
resources.
mapreducer(mr) sets the global
execution environment for
mapreduce or tall arrays,
using a previously created MapReducer object,
mr, if its
ObjectVisibility property is 'On'.
mr = mapreducer(___)
returns a MapReducer object to specify the execution environment. You can
define several MapReducer objects, which enables you to swap execution
environments by passing one as an input argument to
mapreduce or mapreducer.
mr = mapreducer(___,'ObjectVisibility','Off')
hides the visibility of the MapReducer object, mr, using
any of the previous syntaxes. Use this syntax to create new MapReducer
objects without affecting the global execution environment of
mapreduce.
Develop in Serial and Then Use Local Workers or Cluster
If you want to develop in serial and not use local workers or your specified cluster, enter:
>> mapreducer(0);
If you use mapreducer to change the execution environment after creating a tall array, then the tall array is invalid and you must recreate it. To use local workers or your specified cluster again, enter:
>> mapreducer(gcp);
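The serial-then-parallel workflow in this example might look like the sketch below. It assumes Parallel Computing Toolbox is installed, which the call to gcp requires. The variable names and the random data are illustrative and not taken from this page.

```matlab
% Develop in serial: run in the local MATLAB client session.
mapreducer(0);
t = tall(rand(1000,1));        % bound to the serial environment
serialMean = gather(mean(t));

% Switch to the current parallel pool (gcp starts one if none is open
% and automatic pool creation is enabled).
mapreducer(gcp);

% The earlier tall array is no longer valid, so recreate it before use.
t = tall(rand(1000,1));
parallelMean = gather(mean(t));
```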
mapreducer with Automatically Create a Parallel Pool Switched Off
If you have turned off the Automatically create a parallel pool option, then you must explicitly start a pool if you want to use parallel resources. See Specify Your Parallel Preferences for details.
The following code shows how you can use mapreducer
without input arguments to set the execution environment to your local
MATLAB session and then specify a
local parallel pool:
>> mapreducer
>> parpool('local',1);
Starting parallel pool (parpool) using the 'local' profile ...
connected to 1 workers.
Evaluating tall expression using the Local MATLAB Session:
Evaluation completed in 0 sec
ans =
   5.2238e-04
poolobj — Pool for parallel execution
gcp (default) | parallel.Pool object
Pool for parallel execution, specified as a parallel.Pool object.
Example: poolobj = gcp
hadoopCluster — Hadoop cluster for parallel execution
Hadoop cluster for parallel execution, specified as a parallel.cluster.Hadoop object.
mr — Execution environment for
mapreduce and tall arrays
Execution environment for
mapreduce and tall
arrays, returned as a MapReducer object.
If the ObjectVisibility property of
mr is set to 'On', then
mr defines the default execution environment for
mapreduce algorithms and tall array calculations.
If the ObjectVisibility property is
'Off', you can pass
mr as an
input argument to
mapreduce to explicitly specify the
execution environment for that particular call.
You can define several MapReducer objects, which enables you to swap
execution environments by passing one as an input argument to
mapreduce or mapreducer.
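To illustrate passing a hidden MapReducer object to a single mapreduce call, here is a hedged sketch. It assumes the sample file airlinesmall.csv that ships with MATLAB is available. The names serialMr, maxDelayMapper, and maxDelayReducer are hypothetical and were chosen for this example.

```matlab
% Datastore over a sample file (assumed to be on the MATLAB path).
ds = tabularTextDatastore('airlinesmall.csv', ...
    'TreatAsMissing','NA','SelectedVariableNames','ArrDelay');

% Hidden serial mapreducer: does not change the global environment.
serialMr = mapreducer(0,'ObjectVisibility','Off');

% Use the hidden object for this call only.
result = mapreduce(ds,@maxDelayMapper,@maxDelayReducer,serialMr);
out = readall(result);

function maxDelayMapper(data,~,intermKVStore)
    % Emit the maximum arrival delay found in this block of data.
    add(intermKVStore,'PartialMax',max(data.ArrDelay));
end

function maxDelayReducer(~,intermValIter,outKVStore)
    % Combine the per-block maxima into one overall maximum.
    maxVal = -Inf;
    while hasnext(intermValIter)
        maxVal = max(getnext(intermValIter),maxVal);
    end
    add(outKVStore,'MaxArrDelay',maxVal);
end
```

Because serialMr is hidden, other mapreduce and tall array calculations keep using whatever gcmr reports.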
One of the benefits of developing your algorithms with tall arrays is that you
only need to write the code once. You can develop your code locally, then use
mapreducer to scale up and take advantage of the
capabilities offered by Parallel Computing Toolbox, MATLAB
Parallel Server™, or MATLAB
Compiler, without needing to rewrite your algorithm.