Reduction Operations Supported for Automatic Parallelization of for-loops
The code generator automatically parallelizes for-loops by converting implicit and explicit sequential for-loop code blocks into parallelized code blocks. Parallelizing a section of code might significantly improve the execution speed of the generated code. See How parfor-Loops Improve Execution Speed.
Parallelize for-loops Performing Reduction Operations
You can parallelize for-loops performing reduction operations by using the configuration option Optimize reductions.
To enable automatic parallelization of these for-loops:
Open the MATLAB® Coder™ app.
On the Generate Code page, click More Settings.
On the Speed tab, select the Enable automatic parallelization and Optimize reductions check boxes.
Optimize reductions is also enabled if you set the Leverage target hardware instruction set extensions parameter to an instruction set that your processor supports.
To enable the configuration option OptimizeReductions by using the command-line interface, run these commands.
cfg = coder.config('lib');
cfg.EnableAutoParallelization = true;
cfg.OptimizeReductions = true;
For example, write a MATLAB function arraySum that adds selected elements of the array in1 to the reduction variable sum and returns the result, combined with the mean of c, in out.
function out = arraySum(in1,a,b)
    sum = 0;
    c = zeros(numel(in1),1);
    for i2 = 1:numel(in1)
        if i2 > in1(i2)
            sum = sum + in1(i2);
            c(i2) = a(i2) + b(i2);
        end
    end
    out = sum + mean(c);
end
At the MATLAB command line, run this codegen command.
arr = 1:1000;
codegen arraySum -config cfg -args {arr,arr,arr} -report
Code generation successful: View report
Open the code generation report to see the parallelized for-loop that performs the addition operation.
sum = 0.0;
#pragma omp parallel num_threads(omp_get_max_threads()) private(sumPrime, d)
{
  sumPrime = 0.0;
#pragma omp for nowait
  for (i2 = 0; i2 < 1000; i2++) {
    c[i2] = 0.0;
    d = in1[i2];
    if ((double)i2 + 1.0 > d) {
      sumPrime += d;
      c[i2] = a[i2] + b[i2];
    }
  }
  omp_set_nest_lock(&autoparExample_nestLockGlobal);
  {
    sum += sumPrime;
  }
  omp_unset_nest_lock(&autoparExample_nestLockGlobal);
}
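The generated code above follows a partial-sum pattern: each thread accumulates into its own private variable, and the private values are merged into the shared result under a lock. As a rough hand-written illustration of the same idea, the sketch below uses the OpenMP reduction clause instead of an explicit lock. The function name conditional_sum is hypothetical and is not produced by the code generator. The sketch assumes a compiler with OpenMP support (for example, the -fopenmp flag in GCC); without it, the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical hand-written counterpart of the generated reduction loop.
 * An element of in1 is added to the sum only when its 1-based index
 * is greater than its value, mirroring the condition in arraySum.
 */
double conditional_sum(const double *in1, int n)
{
    assert(n == 0 || in1 != NULL);

    double sum = 0.0;

    /* reduction(+ : sum) gives every thread a private copy of sum that
     * starts at 0 and adds the private copies together after the loop.
     * The guard keeps the code warning-free when OpenMP is disabled. */
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : sum)
#endif
    for (int i2 = 0; i2 < n; i2++) {
        double d = in1[i2];
        if ((double)i2 + 1.0 > d) {
            sum += d;
        }
    }
    return sum;
}
```

For the input {0, 5, 1, 10, 2}, only the first, third, and fifth elements satisfy the condition, so the function returns 3.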
MATLAB Functions Supported for Reduction Operations
A reduction operation reduces specific dimensions of an input to a scalar value. A reduction operation must be associative and commutative. This table lists the MATLAB functions that are supported as reduction operations and are parallelized in generated code, where X is the reduction variable and expr is a MATLAB expression. The reduction variable X can appear on both sides of an assignment statement.
MATLAB Function | Usage Notes |
---|---|
plus | |
minus | |
times | |
max | |
min | |
sum | |
prod | |
or | |
and | |
bitand | |
bitor | |
bitxor | |
Note
The Support nonfinite numbers (SupportNonFinite) property supports code generation only for standalone libraries (lib, dll) and executables.
The following example shows a typical usage of a reduction variable X.
X = 0; % Initialize X
for i = 1:n
    X = X + d(i);
end
This loop is equivalent to the following, where you calculate each d(i) in a different iteration.
X = X + d(1) + ... + d(n)
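Because addition is associative and commutative, the terms of this sum can be grouped into partial sums and combined afterward, which is what makes the loop a candidate for parallel execution. The C sketch below illustrates the regrouping with plain integers that do not overflow. The function names sequential_sum and chunked_sum are hypothetical, and the chunks run one after another here purely to show that the grouping does not change the result.

```c
#include <assert.h>

/* Sequential reduction: X = X + d[i] for every element. */
long long sequential_sum(const int *d, int n)
{
    assert(n >= 0);
    long long x = 0;
    for (int i = 0; i < n; i++) {
        x += d[i];
    }
    return x;
}

/* The same reduction split into contiguous chunks. Each chunk builds
 * its own partial sum, and the partial sums are combined at the end,
 * as separate threads would do in a parallel run. */
long long chunked_sum(const int *d, int n, int chunks)
{
    assert(n >= 0 && chunks > 0);
    long long x = 0;
    for (int c = 0; c < chunks; c++) {
        int begin = (int)((long long)n * c / chunks);
        int end = (int)((long long)n * (c + 1) / chunks);
        long long partial = 0;
        for (int i = begin; i < end; i++) {
            partial += d[i];
        }
        x += partial;
    }
    return x;
}
```

For the integers 1 through 10, both functions return 55 regardless of the number of chunks.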
Handling Overflow in Automatic Parallelization of for-loops
When you enable automatic parallelization of for-loops and reduction optimization, overflow might cause the output of the generated parallel C/C++ code to differ from the output of the sequential MATLAB code. Therefore, when there is a possibility of such overflow, the code generator does not parallelize the loop.
The table shows the MATLAB functions where significant overflow can occur, along with their corresponding workarounds.
MATLAB Function | Description | Workaround |
---|---|---|
Integer overflow (see the integerOverflow example after this table) | Automatic parallelization of reduction-based for-loops performing arithmetic operations on integers is not supported when saturation on integer overflow is enabled. During parallel execution, the reduction operations are distributed among multiple threads. When the partial results are accumulated at the end, the results might be non-deterministic. Therefore, the code generator does not automatically parallelize the loop. | If appropriate for your application, disable the Saturate on integer overflow option. |
Example of integer overflow with saturation:
function out = integerOverflow(in)
    out = int8(0);
    for i = 1:numel(in)
        out = out + in(i);
    end
end
integerOverflow(int8(1:100))
ans =
  int8
   127
With saturation, grouping the same operations differently changes the result:
(126 - 125) + 122 = 1 + 122 = 123
(126 + 122) - 125 = 127 (saturation) - 125 = 2
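The regrouping arithmetic above can be reproduced outside MATLAB. The following C sketch mimics int8 saturation with hypothetical helper functions (sat_i8, sat_add_i8, sat_sub_i8, and sat_sum_i8 are illustrative names, not part of the generated code or of any library) and shows why a saturating reduction cannot be safely split into partial results.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wider intermediate result to the int8 range, as a
 * saturating integer operation does. */
static int8_t sat_i8(int v)
{
    if (v > INT8_MAX) {
        return INT8_MAX;
    }
    if (v < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)v;
}

/* Saturating int8 addition and subtraction. */
int8_t sat_add_i8(int8_t a, int8_t b) { return sat_i8((int)a + (int)b); }
int8_t sat_sub_i8(int8_t a, int8_t b) { return sat_i8((int)a - (int)b); }

/* Sequential saturating sum, analogous to the integerOverflow loop. */
int8_t sat_sum_i8(const int8_t *in, int n)
{
    assert(n >= 0);
    int8_t out = 0;
    for (int i = 0; i < n; i++) {
        out = sat_add_i8(out, in[i]);
    }
    return out;
}
```

With these helpers, sat_add_i8(sat_sub_i8(126, 125), 122) evaluates to 123, whereas sat_sub_i8(sat_add_i8(126, 122), 125) evaluates to 2, because the intermediate sum saturates at 127. Summing the values 1 through 100 with sat_sum_i8 yields 127.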
Usage Notes and Limitations
for-loops containing calls to C/C++ functions using coder.ceval are not automatically parallelized.
Bitwise reduction operations (bitand, bitor, and bitxor) are only supported for integer data types.
Custom reduction operations such as a = foo(a,b) are not supported for automatic parallelization of for-loops.
Reduction operations on floating-point numbers are only approximately associative. To get deterministic behavior of a parallel execution, the reduction operations involved must be associative. To be associative, a function f must satisfy the following for all a, b, and c.
f(a,f(b,c)) = f(f(a,b),c)
When working with floating-point numbers, different parallel executions of a loop might produce results with different round-off errors. If such round-off errors are unacceptable to your application, use the pragma coder.loop.parallelize('never') to instruct the code generator to not automatically parallelize specific for-loops. For more information on potential differences during code generation, see Differences Between Generated Code and MATLAB Code.
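To see why floating-point addition is only approximately associative, consider the two groupings f(f(a,b),c) and f(a,f(b,c)) where f is addition. The C sketch below evaluates both with hypothetical helper functions sum_left and sum_right. It assumes IEEE 754 double precision; the volatile temporaries are there only to force each intermediate result to be rounded to double.

```c
#include <assert.h>
#include <math.h>

/* Left grouping: (a + b) + c */
double sum_left(double a, double b, double c)
{
    assert(isfinite(a) && isfinite(b) && isfinite(c));
    volatile double t = a + b; /* rounded to double before the next add */
    return t + c;
}

/* Right grouping: a + (b + c) */
double sum_right(double a, double b, double c)
{
    assert(isfinite(a) && isfinite(b) && isfinite(c));
    volatile double t = b + c; /* rounded to double before the next add */
    return a + t;
}
```

For a = 1e16, b = -1e16, and c = 1.0, the left grouping returns 1.0. The right grouping returns 0.0, because 1.0 is smaller than the spacing between adjacent doubles near 1e16 and is lost when it is added to -1e16. A parallel reduction that groups terms differently from run to run can therefore produce slightly different sums.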