Fastest way to find unique cells in a logical cell array

28 visualizaciones (últimos 30 días)
Haider Ali
Haider Ali el 7 de Mzo. de 2020
Comentada: Haider Ali el 8 de Mzo. de 2020
Hello all,
I have a large array of cells (currently 1x4096 but likely to far bigger than that) with each cell of 6x6 logical values. Each cells contains 6 values that are 1 and rest are 0s. Only the placement of these 1s and 0s is different.
I want to find the fastest way to find unique cells in this array.
Currently I am using a for loop for finding unique cells like this:
for idx2 = 1:length(A)-1
for k = idx2 + 1 : length(A)
if(A{idx2} == A{k})
matching_indx(cnt) = uint16(k);
cnt=cnt+1;
end
end
end
But its really slow and does not scale well to larger cell arrays with larger values in each cell (8x8).
What would be the fastest way to achieve this?
Also what would be the most memory efficient way for these type of values? Would uisng sparse help in saving memory AND computations?
Regards
  2 comentarios
Ameer Hamza
Ameer Hamza el 7 de Mzo. de 2020
Can you attach a sample dataset so that we can test a solution and compare the efficiency?
Haider Ali
Haider Ali el 7 de Mzo. de 2020
@Ameer Hamza, yes sure.

Iniciar sesión para comentar.

Respuesta aceptada

Guillaume
Guillaume el 7 de Mzo. de 2020
Editada: Guillaume el 7 de Mzo. de 2020
Why are you using a cell array to start with? A 6x6x4096 matrix would be more efficient in term of speed and memory. It also makes finding the histogram of your 6x6 array dead easy:
%converting cell array into N x N x numel(A) matrix
m = cat(3, A{:}); %better variable names than A and m required!
%finding unique NxN matrices and their count:
masrows = reshape(m, [], size(m, 3)).'; %a numel(A) x (NxN) matrix
[uniquem, ~, id] = unique(masrows, 'rows'); %find unique matrices and assign corresponding ID to each row
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []); %reshape into N x N x numuniquematrices
count = accumarray(id, 1); %one of the many ways to calculate histograms
If you're happy to store the matrices as a count x N x N array instead of a N x N x count array, you would avoid the two transpose above which would give you a speed gain.
Also, in matlab it's better to avoid using integer types unless you really need to.
  8 comentarios
Ameer Hamza
Ameer Hamza el 7 de Mzo. de 2020
Editada: Ameer Hamza el 7 de Mzo. de 2020
@Haider Ali, I am not sure about speed gain. Remember in my code above I artificially increased the size of A
A = [A A A A A A]; % to increase the size of the dataset
If you use A, then the speed gain is close to 2 on my system too.
I could not think of an efficient way in MATLAB to typecast a row of your matrix A into uint64 without slowing down the above codes. I think @Guillaume suggests to change the generation of matrix A such that it saves values as uint64.
Another way to overcome the limited precision of double is to partition your matrix into sub-matrices. As we know that the double will work up to a 7x7 matrix, i.e., its binary representation has 49 digits. Now consider the matrix has a dimension of 8x8, i.e., 64 digits. We can partition it into [1x49 1x15]. So change the lines in my code as follow,
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:64)*2.^(14:-1:0)'];
Similarly for 9x9 matrix
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:81)*2.^(31:-1:0)'];
and for 10x10 matrix
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:98)*2.^(48:-1:0)' masrows(:, 99:100)*2.^(1:-1:0)'];
I hope the pattern is clear. You can easily write a statement to automate the above line depending on the dimension of matrix A.
The gain in speed, for the case of 10x10 matrix, is shown by the following script
% generating a pseudo dataset with 128 unique matrices
rng(0);
A = rand(10, 10, 128) > 0.5;
A = A(:,:,randi(128, 1, 4096));
A = mat2cell(A, 10, 10, ones(1,4096)); % convert to cell cell array to be compatilbe with functions
% artifically augment dataset
A = [A A A A A A];
[unique1,count1] = fun1(A);
[unique2,count2] = fun2(A);
isequal1 = isequal(unique1, unique2)
isequal2 = isequal(count1, count2)
t1 = timeit(@() fun1(A))
t2 = timeit(@() fun2(A))
function [uniquem, count] = fun1(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
[uniquem, ~, id] = unique(masrows, 'rows');
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
function [uniquem, count] = fun2(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:98)*2.^(48:-1:0)' masrows(:, 99:100)*2.^(1:-1:0)'];
[~, ia, id] = unique(binaryVal, 'rows');
uniquem = reshape(masrows(ia,:).', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
The result is
isequal1 =
logical
1
isequal2 =
logical
1
t1 =
0.0864
t2 =
0.0162
About 5.5x speed gain. Note that this method works for arbitrary dimension matrix.
Haider Ali
Haider Ali el 8 de Mzo. de 2020
@Ameer Hamza, thank you! You have been really helpful.

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Sparse Matrices en Help Center y File Exchange.

Etiquetas

Productos


Versión

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by