fast cell2mat with padding or numerical equivalent of pad function that works very fast on cellarray of variable length uint8 vectors
Hi Folks,
I found a similar question already asked, but I need a very fast solution that specifically converts a cell array of variable-length uint8 vectors into a uint8 matrix, with 0s padded onto the end of the shorter vectors. For characters, the very fast solution is simply the pad function. For example, given the cell array c:
for i=1:256 c{i,1}=char(uint8(0:i-1)); end
I can pad the cells with whatever character (say '@') using
padded_c=pad(c,'@');
and then convert it really fast to a matrix by:
matrix_c=reshape([padded_c{:}],[],numel(c));
which is, by the way, much faster than the cell2mat I could have used. The important part here is that I did not have to know or specify the maximum length in the cell array: the pad function figures it out nicely.
What I want is a similarly performing function that works on cell arrays of uint8s, which occupy half the space of chars. The cell arrays are huge and could hold hundreds of millions of uint8 vectors, so even checking the maximum length with cellfun(@numel,c) is very costly.
I need this for DNA sequence analysis. The input is typically text whose lines use only a few characters, arranged in sequences of up to 160 or so. To save space I convert the sequences from lines of char text to a cell array of uint8s, and then I need a fast cell2mat with padding. Obviously I could process them as chars using the approach above and convert to uint8 at the end, but this significantly limits the size of the data I can process in RAM in one go.
I was thinking that perhaps I could stay on a single uint8 vector from the start (without placing separate lines into cells) and then somehow inject uint8(0) after the end of the shorter lines, but I could not see a fast(er) way to do it other than copying across to another empty uint8 matrix. Any ideas?
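For reference, one way the padding could be done directly on uint8 cells is sketched below. It is not taken from the thread: it assumes the cells hold row vectors, that one pass over the cell array to get the lengths is acceptable, and that a logical mask the same size as the output fits in memory. The data in `c` is made up for illustration.

```matlab
% Hypothetical test data: a column cell array of uint8 row vectors
% with lengths 1..256 and nonzero values, so padding is easy to spot.
c = cell(256, 1);
for i = 1:256
    c{i} = uint8(mod(0:i-1, 4) + 1);
end

% One pass to get the length of every vector.
len  = cellfun('prodofsize', c);
maxL = max(len);
n    = numel(c);

% Preallocate the zero-padded result: one sequence per column.
M = zeros(maxL, n, 'uint8');

% mask(r,k) is true where row r lies inside sequence k.
% Uses implicit expansion (R2016b or later).
mask = (1:maxL).' <= len(:).';

% Logical indexing fills in column-major order, which matches the
% order of the concatenated vectors.
M(mask) = [c{:}];
```

The result has the same layout as `matrix_c` above, with `uint8(0)` in place of the pad character. The mask temporarily costs as much memory as `M` itself, so for very large inputs it may be worth running this over blocks of cells. If the data is kept as one flat uint8 vector instead, the same masked assignment should apply, with `len` taken from the distances between line breaks rather than from `cellfun`.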
3 comments
Jan on 13 Oct 2022
Edited: Jan on 13 Oct 2022
@dymitr ruta: Now the input is a "read vector" and can be char or uint8. In the question you mentioned a cell array. I thought of posting a C-Mex function, but as long as the type of the input is not clear, this would be a waste of time in 50% of the cases.
So please post a small example of the input data and the wanted output. It matters if you want the row or column order.
A hint: cellfun(@numel,c) is slower than cellfun('prodofsize',c).
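A rough way to check the hint on your own machine is sketched below. The cell array is hypothetical test data and the measured times will depend on the MATLAB release and hardware, so no figures are quoted here. `'prodofsize'` is one of the legacy function names that `cellfun` accepts as a character vector.

```matlab
% Hypothetical test data: 1e5 uint8 row vectors of length 1..160.
c = cell(1e5, 1);
for i = 1:numel(c)
    c{i} = zeros(1, mod(i, 160) + 1, 'uint8');
end

% Both calls return the number of elements in each cell.
lenHandle = cellfun(@numel, c);
lenLegacy = cellfun('prodofsize', c);
sameResult = isequal(lenHandle, lenLegacy);

% Time each variant with timeit.
tHandle = timeit(@() cellfun(@numel, c));
tLegacy = timeit(@() cellfun('prodofsize', c));
fprintf('function handle: %.4f s, ''prodofsize'': %.4f s\n', tHandle, tLegacy);
```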
Answers (0)