why is acumarray much slower calculating means than sum?

8 visualizaciones (últimos 30 días)

Mostrar comentarios más antiguos

dleal el 9 de Ag. de 2022

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/1776175-why-is-acumarray-much-slower-calculating-means-than-sum

Comentada: Bruno Luong el 10 de Nov. de 2022

Respuesta aceptada: Bruno Luong

Abrir en MATLAB Online

Hi all,

why is acumarray much slower calculating means than sum?

I understand averages are slightly more complex, but I wouldn't expect 20x slower. Here's some code:

ind = randi([1,100],1000000,1);
dat = randn(1e6,1);
f_mean = @() accumarray(ind,dat,[],@mean);
f_sum = @() accumarray(ind,dat,[],@sum);
>> timeit(f_mean)
ans =
0.0562
>> timeit(f_sum)
ans =
0.0028

If I benchmark taking averages and sums using the following code, I get approximately only twice as slow using averages vs sum:

tic;
for jj = 1:1000
    x = randn(100,1);
    mean(x);
end
toc
Elapsed time is 0.005451 seconds.
tic;
for jj = 1:1000
    x = randn(100,1);
    sum(x);
end
toc
Elapsed time is 0.002414 seconds.

2 comentarios
Mostrar NingunoOcultar Ninguno

Walter Roberson el 9 de Ag. de 2022

Abrir en MATLAB Online

You are right, it is slower by a fair bit.

ind = randi([1,100],1000000,1);

dat = randn(1e6,1);

f_mean = @() accumarray(ind,dat,[],@mean);

f_sum = @() accumarray(ind,dat,[],@sum);

N = 50;

tm = zeros(N,1); ts = zeros(N,1);

for K = 1 : N; t0 = tic; f_mean(); tend = toc(t0); tm(K) = tend; end

for K = 1 : N; t0 = tic; f_sum(); tend = toc(t0); ts(K) = tend; end

plot([tm, ts]);

legend({'mean', 'sum'})

mean(tm) ./ mean(ts)

ans = 16.7634

dleal el 9 de Ag. de 2022

thanks for your input Walter!

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Respuesta aceptada

Bruno Luong el 9 de Ag. de 2022

2
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/1776175-why-is-acumarray-much-slower-calculating-means-than-sum#answer_1023495

Editada: Bruno Luong el 9 de Ag. de 2022

Abrir en MATLAB Online

The default behavior of accumarray is sum and it's low-level coded.

When you pass user-defined function MATLAB will call the function and the overhead is significan't slower.

i=randi(10,1e6,1);
v=rand(size(i));
tic; accumarray(i,v); toc
Elapsed time is 0.008750 seconds.
tic; accumarray(i,v,[],@sum); toc % smart parsing
Elapsed time is 0.007060 seconds.
tic; accumarray(i,v,[],@(x) sum(x)); toc % % smart parsing stops here
Elapsed time is 0.090754 seconds.

The faster way to compute mean is

tic; meanv = accumarray(i,v)./accumarray(i,1); toc
Elapsed time is 0.007861 seconds.
tic; meanv = accumarray(i,v,[],@mean); toc
Elapsed time is 0.087376 seconds.

2 comentarios
Mostrar NingunoOcultar Ninguno

Christine Tobler el 10 de Nov. de 2022

Abrir en MATLAB Online

That's it exactly. Specifically, @sum, @min and @max are special-cased in accumarray nowadays.

These are able to be faster than an unkonwn function handle for one thing because we have built them in explicitly, but also because we don't need to know all the elements in a bin ahead of time. We can do something like this:

ind = [1;2;2];
var = [2;3;5];
out = zeros(max(ind), 1);
for i=1:length(ind)
    out(ind(i)) = out(ind(i)) + var(i);
end

This wouldn't work for a function handle like @sort, where we need to first find all the elements in a bin and then apply the function handle.

@mean is an interesting in-between case, where we can't use the same technique as for the three function handles above, but could still get some performance from a special-case threatment. We have added it to our tracker of potential enhancements.

Bruno Luong el 10 de Nov. de 2022

IMO one thing that is missing in basic MATLAB is argmin/argmax on finite set, meaning function that returns the second output of min/max functions as main output.

I could argue the same argmin/argmax as accumarray special treated function would be useful (it should returns the row index of the first argument of accumarray, not the index within the group)

Iniciar sesión para comentar.

Más respuestas (1)

Matt J el 9 de Ag. de 2022

2
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/1776175-why-is-acumarray-much-slower-calculating-means-than-sum#answer_1023485

Editada: Matt J el 9 de Ag. de 2022

Abrir en MATLAB Online

I suspect it is because, when you pass in @sum, accumarray is smart enough to recognize that it can use its default settings, which are implemented in a less generic and well-optimized way.The timing comparisons below support this.

Note, in any case, that the speed differences have nothing to do with the complexities of the summation and mean operations themselves. When we specify summation using an anonymous function, we get the same slow speed as with @mean.

ind = randi([1,100],1000000,1);
dat = randn(1e6,1);
f_mean = @() accumarray(ind,dat,[],@mean);
f_sum = @() accumarray(ind,dat,[],@sum);
f_sumAnon = @() accumarray(ind,dat,[],@(x) sum(x));
f_sumDefault = @() accumarray(ind,dat);
 timeit(f_mean)
ans = 0.0790
 timeit(f_sumAnon)
ans = 0.0781
 timeit(f_sumDefault)
ans = 0.0033
 timeit(f_sum)
ans = 0.0033

2 comentarios
Mostrar NingunoOcultar Ninguno

Matt J el 9 de Ag. de 2022

Editada: Matt J el 9 de Ag. de 2022

Abrir en MATLAB Online

If you wish to do a more optimized group-wise mean, you can implement it this way:

ind = randi([1,100],1000000,1);
dat = randn(1e6,1);
f_mean = @() accumarray(ind,dat)./accumarray(ind,1);
timeit(f_mean)
ans = 0.0067

dleal el 9 de Ag. de 2022

thanks for your answer Matt! very helpful

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Categorías

MATLAB Environment and Settings Startup and Shutdown

Más información sobre Startup and Shutdown en Help Center y File Exchange.

Etiquetas

Productos

MATLAB

Versión

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Translated by