Accumarray for two functions?

I'm trying to compute both the mean and SD of a set of values by group, and I can either do it with for-loop:
M = zeros(1,ngroup);
SD = zeros(1,ngroup);
for i = 1:ngroup
M(i) = mean(data(ind==i));
SD(i) = std(data(ind==i));
end
Or, alternatively use `accumarray` twice.
M = accumarray(ind,data,[],@mean);
SD = accumarray(ind,data,[],@std);
But is there a way to just use accumarray once and compute both quantities? Since accumarray is faster than for-loop, but calling it twice will be slow. Is it possible to do something like:
[M, SD] = accumarray(ind,data,[],{@mean,@std})

 Respuesta aceptada

Matt J
Matt J el 19 de Oct. de 2022
Editada: Matt J el 19 de Oct. de 2022
Since accumarray is faster than for-loop, but calling it twice will be slow.
I would argue that 3 calls to accumarray would be the most optimal.
data = rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.499977 seconds.
tic;
Out = accumarray(ind, data, [], @(x){{mean(x) std(x)}});
toc;
Elapsed time is 0.235066 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.047581 seconds.

4 comentarios

BRIAN XU
BRIAN XU el 19 de Oct. de 2022
Wow that's interesting, many thanks!
Bruno Luong
Bruno Luong el 19 de Oct. de 2022
Accumarray is optimized when doing sum.
Using function handle has great penalty.
Matt J
Matt J el 19 de Oct. de 2022
Editada: Matt J el 19 de Oct. de 2022
I should mention though that there might be some sacrifice in numerical accuracy using the expansion,
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
The subtraction of S2 and 2*S.*M can give large floating point residuals.
data = 10000+rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.574901 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.048677 seconds.
errorMean=norm(M-M0)/norm(M0)
errorMean = 4.4567e-15
errorSTD=norm(SD-SD0)/norm(SD0)
errorSTD = 5.3453e-06
Bruno Luong
Bruno Luong el 19 de Oct. de 2022
Small simplification
N=accumarray(ind,1);
S = accumarray(ind,data);
M = S./N;
S2=accumarray(ind,data.^2);
SD = sqrt((S2 - S.*M)./(N-1));

Iniciar sesión para comentar.

Más respuestas (1)

Star Strider
Star Strider el 19 de Oct. de 2022
The accumarray approach is certainly possible —
v = randn(250,1)
v = 250×1
0.0501 -0.2469 1.0253 1.1484 -0.9196 0.4947 -0.7221 1.2164 -0.3162 0.0461
Out = accumarray(ones(size(v)), v, [], @(x){{mean(x) std(x)}})
Out = 1×1 cell array
{1×2 cell}
meanv = Out{:}{1}
meanv = -0.0333
stdv = Out{:}{2}
stdv = 0.9419
Make appropriate changes to work with your data.
.

2 comentarios

BRIAN XU
BRIAN XU el 19 de Oct. de 2022
that's a nice idea! thank you!
Star Strider
Star Strider el 19 de Oct. de 2022
My pleasure!

Iniciar sesión para comentar.

Categorías

Más información sobre Performance and Memory en Centro de ayuda y File Exchange.

Preguntada:

el 19 de Oct. de 2022

Comentada:

el 19 de Oct. de 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by