Help with optimising my code?

Hello everyone,
Apologies for the vague title, but I think I'm missing exactly the knowledge that would let me point at the topic...
I am trying to aggregate the values of a large matrix yMat into a single vector yOut, grouping them according to binned values from a secondary matrix xMat. Starting with
[N,bin] = histc(xMat,edges); *
Then
for k=1:numel(edges)-1
yOut(k)=mean(yMat(bin==k));
end
xMat/yMat vary in size, but their numel can reach a couple tens of millions, and I need to go through hundreds of them. I picked a relatively large example: the histc line takes 0.3 s, while the loop takes over 4 s when numel(edges) = 100. Ideally I would like a much finer grain, too, with numel(edges) closer to 1000...
I am pretty confident I do this the very wrong way, for loops never being a good sign, but I can't think of a faster way to do this. Any help appreciated!
*I know histc is a legacy function, but I'm stuck with R2011b right now, and I don't think this impacts the issue

Accepted Answer

per isakson
per isakson on 11 Feb 2018
Edited: per isakson on 11 Feb 2018

1 vote

xMat = randn( 12, 1);
edges = ( - 4 : 1 : 4 );
yMat = ( 1 : 12 )';
[ N, bin ] = histc( xMat, edges );
y1 = accumarray( bin+1, yMat, [], @mean );
for k=1:numel(edges)-1
y2(k)=mean(yMat(bin==k));
end
Comparison:
>> y1'
ans =
0 11.0000 8.5000 6.6667 4.0000 7.0000
>> y2
y2 =
NaN 11.0000 8.5000 6.6667 4.0000 7.0000 NaN NaN
Btw: y2 should have been preallocated.
In my example, elements of bin can get the value zero (histc returns a bin index of 0 for out-of-range values). Hence bin+1 in the call to accumarray.
However, the speed-up on R2016a is definitely not impressive.

4 comments

Rene Higuita
Rene Higuita on 11 Feb 2018
Quick tests look promising, thanks!
Quick note on performance, in case it helps anyone: the @mean function handle in accumarray does not seem very fast. I've had better results so far by omitting the function argument (i.e. using the default sum) and dividing by the bin counts, which I conveniently already have from histc:
y3 = accumarray( bin, yMat);
y3=y3./N(1:end-1);
On a numel(xMat)=1e8 example, y1 (accumarray with @mean argument) took 7.1s, y2 (for loop) 8.6s and y3 (accumarray with default sum) 1.8s.
The real differences in speed kick in when numel(edges) goes up, though, with the for loop taking by far the largest hit, as expected. With numel(edges) = 803:
y1 in 17.4 s; y2 in close to 300 s; y3 in 1.4 s (!)
per isakson
per isakson on 11 Feb 2018
The difference between "y1" and "y3" is remarkable. There is potential for optimization. I'll try to remember that.
Rene Higuita
Rene Higuita on 11 Feb 2018
Edited: Rene Higuita on 12 Feb 2018
Thanks for your edits. Since I'm discovering these functions too, here are some quick comments on them:
You avoid zeros in bin by adding -Inf and +Inf to edges (histc yields a bin index of zero for out-of-range values).
Here's the more challenging code I used
tic;
xMat = randn( 1e8,1);
edges = [ -Inf ( - 4 : .01 : 4 ) Inf];
yMat = randn(size(xMat));
allocTime=toc;
tic;
[ N, bin ] = histc( xMat, edges );
histcTime=toc;
tic;
y1 = accumarray( bin, yMat, [], @mean );
y1time=toc;
tic;
y3 = accumarray( bin, yMat );
y3 = y3./N(1:end-1);
y3time=toc;
tic;
y2=zeros(numel(edges)-1,1);
for k=1:numel(edges)-1
y2(k)=mean(yMat(bin==k));
end
y2time=toc;
%checking relative error levels
maxErr21=max(abs((y2-y1)./y1));
maxErr32=max(abs((y3-y2)./y2));
maxErr13=max(abs((y1-y3)./y3));
fprintf(['--- Numel(xMat)=%g, numel(edges)=%d ---\n'...
'Creating vectors took %0.1fs, histc %0.1fs\n'...
'Method y1: %0.1fs y2: %0.1fs y3: %0.1fs\n\n'...
'Largest relative differences between:\n y1 and y2 : %0.2g'...
' -- y2 and y3 : %0.2g -- y3 and y1 :%0.2g\n\n'],...
numel(xMat),numel(edges),allocTime,histcTime,...
y1time,y2time,y3time,maxErr21,maxErr32,maxErr13);
And the fprintf output at the end, on a 5-year-old laptop with R2011b:
--- Numel(xMat)=1e+008, numel(edges)=803 ---
Creating vectors took 5.1s, histc 9.5s
Method y1: 18.8s y2: 292.5s y3: 1.5s
Largest relative differences between:
y1 and y2 : 1.6e-012 -- y2 and y3 : 1e-012 -- y3 and y1 :2.4e-012
On a more recent desktop and R2017a:
--- Numel(xMat)=1e+08, numel(edges)=803 ---
Creating vectors took 2.2s, histc 4.7s
Method y1: 10.4s y2: 117.6s y3: 0.7s
Largest relative differences between:
y1 and y2 : 2.1e-12 -- y2 and y3 : 1.8e-12 -- y3 and y1 :2.4e-12
So no dramatic changes in the ratios between R2011b and R2017a; accumarray with the default sum remains the fastest by far.
Regarding the Undocumented Matlab post: I can't really compare, because accumarray only takes function handles for that argument, unlike cellfun, which also accepts a list of built-in function names (and handles those much more efficiently than function handles, which is the main point of that post). @sum is supposed to be the default, but I suspect the default is implemented as a built-in rather than as an actual call to sum.
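For reference, the cellfun distinction mentioned above: the string form names one of cellfun's built-in operations, while the handle form goes through MATLAB's function-call machinery for every cell. A minimal sketch (timings will vary by machine and release):

```matlab
c = num2cell(rand(1, 1e5));              % a cell array of scalars
tic; n1 = cellfun('length', c); t1 = toc; % built-in string form (fast path)
tic; n2 = cellfun(@length,  c); t2 = toc; % function-handle form (slower)
isequal(n1, n2)                           % same result either way
```

accumarray exposes no equivalent string shortcut, which is why the "omit the function and divide by the counts" trick above is the way to get the fast path there.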
Thanks for your help and interest! This will definitely take my running times down to acceptable levels.
per isakson
per isakson on 12 Feb 2018
Edited: per isakson on 12 Feb 2018
Back in 2009 Yair Altman described a similar case, in which function handles cause a significant performance penalty, cellfun – undocumented performance boost. This has not changed in recent Matlab releases.


More Answers (1)

Walter Roberson
Walter Roberson on 11 Feb 2018

1 vote

Try accumarray: use the bin number as subs, put yMat in the values slot, pass the empty array for the size, and use @mean as the function.
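That call, sketched on a tiny example (assuming bin from histc contains no zeros, i.e. the edges span all of the data):

```matlab
xMat  = [0.2; 1.7; 0.9; 2.4];   % example data to be binned
yMat  = [10; 20; 30; 40];       % values to average per bin
edges = 0:1:3;                  % bin edges covering all of xMat
[~, bin] = histc(xMat, edges);  % bin(i) = interval index of xMat(i)
% subs = bin, vals = yMat, sz = [] (inferred), fun = @mean per bin
yOut = accumarray(bin, yMat, [], @mean);   % yOut = [20; 20; 40]
```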

1 comment

Rene Higuita
Rene Higuita on 11 Feb 2018
Edited: Rene Higuita on 11 Feb 2018
You had the answer as well. I accepted Per's because he gives more details, and his examples will probably help others more.
But thanks a lot!

