Expand cells that are nested inside another cell

I have a cell array data like the one in the picture: each cell contains a cell array of strings.
I am unpacking all the cells this way:
counter=0;
for ind=1:length(data)
    tmp=cell2str(data{ind,1});
    for k=1:size(tmp,1)
        counter=counter+1;
        tmp2=textscan(tmp(k,:),'%s%s%s%s%s%[^\n\r]','Delimiter', ' ');
        for j=1:6
            if isempty(tmp2{j})==0
                Raw(counter,j)=tmp2{j};
            end
        end
        clear tmp2 j
    end
    clear k tmp
end
The result is correct, but is there a better/faster way to do it?
For example using parfor, or other techniques?
Thank you in advance.

9 comments

dpb on 9 Jan 2019
Attach a small sample dataset. And what do you want the result to be?
NirE on 9 Jan 2019
Edited: NirE on 9 Jan 2019
I am attaching a small piece of the data: 10 rows, but the full data set has 38956 rows.
The length of the cells inside the main cell can differ. I hope that I am clear enough.
You can just run the script that I wrote; it works, but very slowly.
Is there a way to use parfor or something else to speed up the process?
Luna on 9 Jan 2019
cell2str does not work for me. Which version are you using?
NirE on 9 Jan 2019
MATLAB R2017b
dpb on 9 Jan 2019
Edited: Stephen23 on 9 Jan 2019
Must be in some toolbox, then; it's not in base R2017b...
Again, what's the desired output? That doesn't seem to make sense reading the code; your format statement has 5 strings but some "records" have many more fields than that...
Well, that's not it either; maybe a FEX submission? A search of the online help doesn't find it, either.
Not knowing what, precisely, the cell2str function actually returns, it's hard to guess exactly what the result that "works" really is without more effort than I have time to spend... help us help you.
Luna on 9 Jan 2019
What size should your output be? Are you expecting a 64x1 cell array?
NirE on 9 Jan 2019
The cell2str function just returns a string vector with as many lines as there were in the cell.
As for the format, it is exactly what I want: 6 parts of different lengths.
I hope that helps.
Jan on 9 Jan 2019
Note: Omit the useless clear commands. They only waste time here.
dpb on 9 Jan 2019
Well, no... helping would be to show us what you really, really want instead of just describing it in a way we can't reproduce.
Where did you find the function? SHOW us!!!


Accepted Answer

Jan on 9 Jan 2019
Start with a pre-allocation:
Len = cellfun('prodofsize', data);
Raw = cell(sum(Len), 6);
c = 0;
for ind = 1:numel(data)
    tmp = data{ind};
    for k = 1:numel(tmp)
        c = c + 1;
        tmp2 = strsplit(tmp{k}, ' ');
        for j = 1:numel(tmp2)
            Raw{c, j} = tmp2{j};
        end
    end
end
I cannot open your MAT file currently, so I am guessing what it might contain. I also guessed that cell2str can be avoided by scanning the cell elements directly, and I assume that Raw should be a cell array. All these assumptions can be wrong. If you post a small input as code together with the wanted output, less guessing is required.
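To make the guessing concrete, here is a minimal sketch of the loop above run on a tiny stand-in input. The variable data below is hypothetical (the real MAT file is not reproduced here); its lines are modelled on the rows shown elsewhere in this thread. It suggests that strsplit splits the formatted date into separate tokens, so Raw grows past the 6 pre-allocated columns, which would be consistent with the 64x9 size Luna mentions.

```matlab
% Hypothetical stand-in for the poster's data: a column cell array whose
% cells each hold a column cell array of character vectors.
data = { ...
    {'1 EventDataLogNewFile DataEventTime TypeSecondsSinceEpoch 1546725641'; ...
     '1 EventDataLogNewFile DataEventTime TypeFormattedDate Sun Jan 6 00:00:41 2019'}; ...
    {'4 EventREAD DataReading TypeUnitLessNumber 34729'}};

Len = cellfun('prodofsize', data);   % number of lines in each inner cell
Raw = cell(sum(Len), 6);             % pre-allocate one row per line
c = 0;
for ind = 1:numel(data)
    tmp = data{ind};
    for k = 1:numel(tmp)
        c = c + 1;
        tmp2 = strsplit(tmp{k}, ' ');      % one token per space-separated field
        Raw(c, 1:numel(tmp2)) = tmp2;      % Raw widens if a line has more than 6 tokens
    end
end
disp(size(Raw))
```

If exactly 6 columns are wanted, as with the original textscan format, the trailing tokens would have to be joined back together after splitting.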

5 comments

NirE on 10 Jan 2019
Your code runs about 3 times faster than mine, thanks a lot.
Can you just explain what 'prodofsize' does?
Jan on 10 Jan 2019
Edited: Jan on 10 Jan 2019
@Nir Eliezer: 'prodofsize' is explained in the documentation: doc cellfun. It is equivalent to:
Len = cellfun(@numel, data);
but much faster. If you provide a function handle to cellfun, it calls the MATLAB engine for each element of the cell, while strings like 'length', 'prodofsize' and 'isclass' access the cell elements directly inside the cellfun core. Although the speed of cellfun might be negligible in your case, it is good programming practice to use the most efficient code.
By the way, 'numel' would be much nicer than 'prodofsize'.
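The equivalence and the speed difference can be checked with a short sketch like the following. The test data is hypothetical, and the timings depend on the machine and MATLAB release, so no numbers are claimed here.

```matlab
% Hypothetical nested cell array with inner cells of different lengths.
data = {{'a'; 'b'; 'c'}; {'d'}; {}};

lenLegacy = cellfun('prodofsize', data);   % legacy string option
lenHandle = cellfun(@numel, data);         % function-handle form
assert(isequal(lenLegacy, lenHandle));     % both count the elements per cell

% Time both forms on a larger array (3e5 cells).
big = repmat(data, 1e5, 1);
tLegacy = timeit(@() cellfun('prodofsize', big));
tHandle = timeit(@() cellfun(@numel, big));
fprintf('prodofsize: %.4f s, @numel: %.4f s\n', tLegacy, tHandle);
```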
Maybe this is slightly faster:
Len = cellfun('prodofsize', data);
Raw = cell(sum(Len), 6);
c = 0;
for ind = 1:numel(data)
    tmp = data{ind};
    for k = 1:Len(ind)
        c = c + 1;
        tmp2 = strsplit(tmp{k}, ' ');
        Raw(c, 1:numel(tmp2)) = tmp2;
    end
end
NirE on 21 Jan 2019
Jan, one more question: is there a way to parallelize your piece of code so that I could use parfor?
Jan on 22 Jan 2019
Yes, a parallelization should be very straightforward. Did you try it?
NirE on 22 Jan 2019
I will try it and tell you whether it speeds things up or not.
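No parfor version was posted in the thread, so the following is only a sketch of one way it might look, not Jan's code. It assumes the Parallel Computing Toolbox is available (without it, parfor runs as an ordinary loop), uses hypothetical sample data, and introduces the names nCols and blocks for illustration. Each iteration fills its own block and the blocks are stacked afterwards, because a shared running counter like c cannot be used inside parfor. Tokens beyond the fifth are joined back into the sixth column, an assumption meant to mimic the original textscan format.

```matlab
% Hypothetical sample data in the same nested layout as in the question.
data = { ...
    {'1 EventDataLogNewFile DataEventTime TypeSecondsSinceEpoch 1546725641'; ...
     '1 EventDataLogNewFile DataEventTime TypeFormattedDate Sun Jan 6 00:00:41 2019'}; ...
    {'4 EventREAD DataReading TypeUnitLessNumber 34729'}};

nCols  = 6;                          % fixed output width (assumption)
blocks = cell(numel(data), 1);       % one result block per outer cell
parfor ind = 1:numel(data)
    tmp   = data{ind};
    block = cell(numel(tmp), nCols);
    for k = 1:numel(tmp)
        parts = strsplit(tmp{k}, ' ');
        if numel(parts) > nCols
            % Join the surplus tokens back into the last column.
            parts = [parts(1:nCols-1), {strjoin(parts(nCols:end), ' ')}];
        end
        block(k, 1:numel(parts)) = parts;
    end
    blocks{ind} = block;             % sliced output, one slot per iteration
end
Raw = vertcat(blocks{:});            % stack the blocks in their original order
```

Since the work per line is small, the overhead of starting workers and transferring data may outweigh the gain, so it is worth timing both versions on the real data.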


More Answers (2)

dpb on 9 Jan 2019
Edited: dpb on 9 Jan 2019
OK, I overlooked the regular expression in the format string that sucks up all of those extra blanks at the end of the odd-man-out records...
To dereference the cell content in each cell requires two levels since textscan isn't cell-string aware. split doesn't cut it here because there's not a unique delimiter that defines the fields desired; hence the above...
You can try the following and see if the lack of preallocation shows up as a performance hit with the size; oftentimes it'll fool you and not be too bad...
fnTS=@(s) textscan(s,'%s%s%s%s%s%[^\n\r]','Delimiter', ' ');
res=[];
for i=1:length(data)
    res=[res;cellfun(fnTS,data{i},'uni',0)];
end
res=cat(1,res{:});
The above yields a 64x6 cell array...
I'd have to think of the bestest way to be able to build the array directly w/o the intermediary second cell array to not be dynamically catenating the output.
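One possible way to avoid growing res inside the loop, sketched here rather than taken from the thread, is to stack all lines into one column first and then call textscan once per line. This assumes every inner cell is a column cell array of character vectors, which is the same assumption the vertcat(data{:}) call in Luna's answer makes. The sample data and the name allLines are hypothetical.

```matlab
% Hypothetical sample data in the same nested layout as in the question.
data = { ...
    {'1 EventDataLogNewFile DataEventTime TypeSecondsSinceEpoch 1546725641'; ...
     '1 EventDataLogNewFile DataEventTime TypeFormattedDate Sun Jan 6 00:00:41 2019'}; ...
    {'4 EventREAD DataReading TypeUnitLessNumber 34729'}};

fnTS = @(s) textscan(s, '%s%s%s%s%s%[^\n\r]', 'Delimiter', ' ');

allLines = vertcat(data{:});                            % all lines in one column cell
res = cellfun(fnTS, allLines, 'UniformOutput', false);  % one 1x6 cell per line
res = cat(1, res{:});                                   % stack into an N-by-6 cell array
disp(size(res))
```

The intermediate cell array from cellfun is still created, but it is allocated once at its final size instead of being re-concatenated on every pass.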
ADDENDUM:
res(cellfun(@isempty,res))={''};
>> string(res)
ans =
11×6 string array
"1" "EventDataLogNewFile" "DataEventTime" "TypeSecondsSinceEpoch" "1546725641" ""
"1" "EventDataLogNewFile" "DataEventTime" "TypeFormattedDate" "Sun" "Jan 6 00:00:41 2019"
"1" "EventDataLogNewFile" "DataReportingSubsystem" "TypeString" "datalogger" ""
"1" "EventDataLogNewFile" "DataInstrumentID" "TypeString" "00:01:05:19:CF:30" ""
"1" "EventDataLogNewFile" "DataEntityName" "TypeString" "mc16" ""
"4" "EventREAD" "DataEventTime" "TypeSecondsSinceEpoch" "1546725657" ""
"4" "EventREAD" "DataEventTime" "TypeFormattedDate" "Sun" "Jan 6 00:00:57 2019"
"4" "EventREAD" "DataReportingSubsystem" "TypeString" "pc" ""
"4" "EventREAD" "DataEntityID" "TypeString" "Dev_CLPC_PressureGauge1" ""
"4" "EventREAD" "DataReading" "TypeUnitLessNumber" "34729" ""
"4" "EventREAD" "DataEventDuration" "TypeSec" "0.000686859" ""
>>
for just doing the first two elements in the for...end loop instead of all for brevity.
Luna on 9 Jan 2019
I was assuming the same 64x9 cell. Here is my solution, which gives the same result as Jan's:
cellArray = cellfun(@(x) strsplit(x(:,:),' '), vertcat(data{:}), 'UniformOutput',false);
for i = 1:numel(cellArray)
    for j = 1:numel(cellArray{i})
        raw{i,j} = cellArray{i}{j} ;
    end
end


Release: R2017b

Asked: 9 Jan 2019

Commented: 22 Jan 2019
