Finding the max number of non-zero repeating elements for each row in a large matrix
6 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Marin Genov
el 28 de Jul. de 2022
Comentada: Marin Genov
el 29 de Jul. de 2022
I have a (large) matrix of type 'uint8' or 'uint16', for (a small) example
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
One can make the following assumptions on A:
- size(A,2) is relatively small, e.g.
- size(A,1) is big, i.e. a few hundred thousands of rows, possibly even a few millions;
- the rows of A are sorted in an ascending manner;
- the elements of A are bounded above by a constant M that is sufficiently smaller than the maximum the type allows, e.g. for 'uint8' and for 'uint16'.
I would like to compute a vector
b = [2;
3;
0;
3];
assigning to each row of A the max number of repetitions of the non-zero elements contained in that row. Now, the way I do this currently is the following vectorized code running on my GPU:
a = permute(unique(A(A~=0)),[3 2 1]); % determine the unique non-zero values of A;
b = max(sum(uint8(bsxfun(@eq,A,a)),2,'native'),[],3); % compare A with a and count each value, then take max.
But if the matrix A is large and has a lot of variance (hence so does a), there appears to be a lot of unnecessary computation. So I was wondering if there is a faster way to do this since I have to do it for a lot of such large matrices.
0 comentarios
Respuesta aceptada
Matt J
el 28 de Jul. de 2022
Editada: Matt J
el 28 de Jul. de 2022
size(A,2) is relatively small,
That being the case, I'd just use a loop:
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
[m,n]=size(A);
count=zeros(m,1);
b=-inf(m,1);
Alast=A(:,1);
for i=1:size(A,2)
Ai=A(:,i);
increm=(Ai==Alast & Alast~=0);
count(increm)=count(increm)+1;
reset=~increm;
count(reset)=(Ai(reset)~=0);
b=max(b,count);
Alast=Ai;
end
b
2 comentarios
Más respuestas (2)
Bruno Luong
el 29 de Jul. de 2022
Editada: Bruno Luong
el 29 de Jul. de 2022
This code would not requires too much RAM beside a copy of A transposed.
It somesort of vectorized runlength encoding. A little bit cryptic code I must admit.
The for-loop posted by Matt is the best IMO
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
[m,n] = size(A);
B = [A.'; Inf(1,m)];
[c,r] = find([ones(1,m); diff(B)]);
Aij = B(c+(r-1)*(n+1));
runlength = diff(c).*(diff(r)==0);
runvalue = Aij(1:end-1);
lgtnz = runlength.*(runvalue>0); % does count the length of sequence of 0s
maxlgt = accumarray(r(1:end-1), lgtnz, [m,1], @max)
0 comentarios
Matt J
el 28 de Jul. de 2022
Editada: Matt J
el 28 de Jul. de 2022
This should be fine for the uint8 case. For the uint16 case, it might strain RAM.
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
N=size(A,1);
M=max(A(:));
[I,J,S]=find(A);
b=max( accumarray([I,S],1,[N,M]) ,[],2)
2 comentarios
Bruno Luong
el 29 de Jul. de 2022
Editada: Bruno Luong
el 29 de Jul. de 2022
I think because
[I,S]
is converted to the lower class of I and S, which is uint8/uint16. Thus I is truncated to intmax of the class.
you can fix by replace with
[I,double(S)]
Ver también
Categorías
Más información sobre Logical en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!