- "number written in front of timestep to": is that the number to the right of the string timestep?
- What do xlswrite and fprintf have to do with the question?
Extracting the lines of interest from a text file into a matrix
Homayoon
on 11 Jun 2015
Edited: per isakson
on 22 Jun 2015
Dear All, I have tried for about two hours but could not figure out what the problem with the code is. Sorry for reposting it to the forum!
I have a huge text file in the following format:
*********************************
timestep 455
No_Specs 3
H2 49
H2O2 1
O2 49
*********************************
timestep 460
No_Specs 3
H2 49
H2O2 1
O2 49
*********************************
timestep 465
No_Specs 2
H2 50
O2 50
*********************************
As you can see, the text file consists of many blocks (loops), each 4-10 lines long. What I want is simply to put the number after timestep in the first column of a matrix. I also need to find 'HO2 ' (the trailing space is needed to avoid confusion) in each block and put the number after it in the second column of that matrix. Obviously, if a block has no 'HO2 ' line, the corresponding entry in that row is zero.
Here is the code:
fid=fopen('fic.txt');
l=fgetl(fid);
k=1;
while ischar(l)
r{k}=l;
k=k+1;
l=fgetl(fid);
end
fclose(fid);
idx=find(~cellfun(@isempty,regexp(r,'(?=timestep).+')));
a=regexp(r(idx),'\d+','match');
b=str2double([a{:}]);
ii=diff([idx numel(r)+1])-1;
for k=1:numel(b);
s=r(idx(k)+1:ii(k));
jj=find(~cellfun(@isempty,regexp(s,'(?=HO2 ).+')));
c=regexp(s(jj),'\d+','match');
if isempty(c)
f(k)=0;
else
f(k)=str2double(c{1});
end
end
M=[b' f']
The problem with the code is that the elements of the second column are all zero! I hope you can help me. I appreciate your help! Best
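A likely cause, offered as a sketch rather than a confirmed diagnosis: `ii` holds the number of lines in each block, but `r(idx(k)+1:ii(k))` uses it as an end index, so `s` is empty (or covers the wrong lines) for nearly every block and `f(k)` is set to 0. A second problem would appear once a block is read correctly: `'\d+'` applied to a line like `HO2 7` first matches the `2` in `HO2`. The version below fixes both. The hard-coded `r`, including its `HO2 7` line, is made up for illustration and stands in for the lines read by the `fgetl` loop; anchoring the HO2 pattern at the start of the line is an assumption about the file format.

```matlab
% Made-up sample standing in for the row cell array r built by the fgetl
% loop in the question; the 'HO2 7' line is invented so there is a hit
r = {'*****', 'timestep 455', 'No_Specs 3', 'H2 49', 'HO2 7', 'O2 49', ...
     '*****', 'timestep 460', 'No_Specs 2', 'H2 50', 'O2 50', '*****'};

idx = find(~cellfun(@isempty, regexp(r, '(?=timestep).+')));
a = regexp(r(idx), '\d+', 'match');
b = str2double([a{:}]);
ii = diff([idx numel(r)+1]) - 1;   % number of lines following each timestep line

f = zeros(size(b));                % stays 0 when a block has no HO2 line
for k = 1:numel(b)
    % Fix 1: the block ends at idx(k)+ii(k); ii(k) alone is a length, not an index
    s = r(idx(k)+1 : idx(k)+ii(k));
    % Fix 2: capture the number that follows the species name, instead of
    % letting '\d+' pick up the '2' inside 'HO2'
    tok = regexp(s, '^\s*HO2\s+(\d+)', 'tokens', 'once');
    hit = find(~cellfun(@isempty, tok), 1);
    if ~isempty(hit)
        f(k) = str2double(tok{hit}{1});
    end
end
M = [b' f']
```

With this sample, `M` is `[455 7; 460 0]`.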
2 comments
per isakson
on 11 Jun 2015
Edited: per isakson
on 11 Jun 2015
Why don't you provide a sample of how you want the result?
Accepted Answer
per isakson
on 12 Jun 2015
Edited: per isakson
on 22 Jun 2015
An alternative approach: the function cssm reads the entire content of the text file into a structure array, which is then used for reporting.
Regarding "a huge text file": this approach requires that the text of the file, together with the structure, fits in memory.
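If the file is too large for that, a different technique (not part of this answer) is to stream it line by line with fgetl and keep only the result matrix in memory. The sketch below assumes every block starts with a `timestep` line and that each species name begins its line. To make it runnable on its own, it first writes a small made-up sample file to a temporary location; point `fname` at your own file instead.

```matlab
% For illustration only: write a small sample file with an invented HO2 line.
% Replace fname with the path of your own file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '*****', 'timestep 455', 'No_Specs 2', 'H2 49', 'HO2 7', ...
        '*****', 'timestep 460', 'No_Specs 1', 'H2 50', '*****');
fclose(fid);

% Stream the file: one row per block, [timestep, HO2 count]
fid = fopen(fname);
assert(fid > 0, 'could not open the input file');
M = zeros(0, 2);
l = fgetl(fid);
while ischar(l)
    t = regexp(l, '^\s*timestep\s+(\d+)', 'tokens', 'once');
    if ~isempty(t)
        M(end+1, :) = [str2double(t{1}), 0];   % 0 until an HO2 line is seen
    else
        h = regexp(l, '^\s*HO2\s+(\d+)', 'tokens', 'once');
        if ~isempty(h) && ~isempty(M)
            M(end, 2) = str2double(h{1});      % belongs to the latest block
        end
    end
    l = fgetl(fid);
end
fclose(fid);
delete(fname);   % remove the temporary sample file
```

Growing `M` row by row keeps the sketch short; for very many blocks, preallocating in chunks would be faster.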
>> out = cssm()
out =
3x1 struct array with fields:
H2
H2O2
No_Specs
O2
timestep
>> for jj = 1 : 3, fprintf( '%8d%8d\n', out(jj).timestep, out(jj).H2O2 ), end
455 1
460 1
465 0
>> permute( [ out.timestep; out.H2O2 ], [2,1] )
ans =
455 1
460 1
465 0
where
function out = cssm()
str = fileread( 'H2O2.txt' );
section_separator = '[\*]{30,}'; % a row of at least 30 "*"
cac = strsplit( str, section_separator ...
, 'DelimiterType', 'RegularExpression' );
cac( cellfun( @isempty, cac ) ) = [];
len = length( cac );
names = create_list_of_names_( cac );
out = initiate_structure_( len, names, 0 );
for jj = 1 : len
out(jj) = parse_one_section_( cac{jj}, out(jj) );
end
end
function sas = parse_one_section_( str, sas )
% Copy the "name value" pairs of one section into the fields of sas
cac = textscan( str, '%s%f' );
for jj = 1 : length( cac{1,1} )
sas.( cac{1,1}{jj} ) = cac{1,2}(jj);
end
end
function cac = create_list_of_names_( sections )
% Collect the unique names (first column) that occur in any section
str = cat( 2, sections{:} );
cac = textscan( str, '%s%*f' );
cac = permute( unique( cac{1} ), [2,1] );
end
function sas = initiate_structure_( len, names, val )
% Build a len-by-1 struct array with one field per name, every value set to val
cell_values = num2cell( val( ones(len,length(names)) ) );
sas = cell2struct( cell_values, names, 2 );
end
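A caveat when using `out` for the question's HO2 column: `create_list_of_names_` collects only the names that occur somewhere in the file, so a species missing from some sections gets the value 0 from `initiate_structure_`, but a species that never occurs (like HO2 in the sample) has no field at all, and `out.HO2` would raise an error. A minimal guard with `isfield` is sketched below on a hand-built stand-in for the struct returned by `cssm()`; the variable `species` is introduced here for illustration.

```matlab
% Hand-built stand-in for out = cssm() on the sample file (only the two
% fields used here; the real struct also has H2, No_Specs and O2)
out = struct('timestep', {455; 460; 465}, 'H2O2', {1; 1; 0});

species = 'HO2';    % the species the question asks about
t = [out.timestep]';
if isfield(out, species)
    M = [t, [out.(species)]'];
else
    M = [t, zeros(numel(out), 1)];   % species never occurs in the file
end
disp(M)
```

Setting `species = 'H2O2'` instead reproduces the two columns printed above.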
More Answers (1)
Guillaume
on 11 Jun 2015
Edited: Guillaume
on 11 Jun 2015
You're way overcomplicating it:
content = fileread('fic.txt'); % read the whole file at once
tsteps = regexp(content, 'timestep\s+(\d+)\s+[^*]*?(HO2\s+\d+|\*)', 'tokens');
out = cell2mat(cellfun(@(tstep) [str2double(tstep{1}) str2double(regexp(tstep{2}, '\d+', 'match', 'once'))], tsteps', 'UniformOutput', false))
The first regular expression captures the first number after 'timestep', then matches anything but '*' until it finds either 'HO2' followed by a number, or a '*'. The 'HO2' with its number, or the '*', is the second capture. (Unfortunately you can't capture just the number, due to a limitation of MATLAB's regular expression engine: you can't have a capture within a non-capturing group.) In the end, you get one cell per timestep, each containing a 1x2 cell array whose first cell is the timestep and whose second cell is the 'HO2' line if present, or '*' if not.
The second regular expression extracts the number from the 'HO2' line and passes it to str2double (along with the timestep). If there's no 'HO2' line, regexp returns an empty string, which str2double converts to NaN.
Note that your example does not have an HO2 line!
2 comments
Guillaume
on 11 Jun 2015
Oh! Of course, it's capturing the '2' of 'HO2'. Just replace the second regular expression with:
regexp(tstep{2}, '(?<=HO2\s+)\d+', 'match', 'once')
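Combining this fix with the question's requirement of 0 (rather than NaN) for blocks without an 'HO2' line, a loop-based variant of the second step might look like the sketch below. It swaps the lookbehind for a token capture, which is a choice made here rather than part of the answer; `content` is a made-up stand-in for `fileread('fic.txt')` with one invented `HO2 7` line.

```matlab
% Made-up stand-in for fileread('fic.txt'), with an invented HO2 line
content = sprintf('%s\n', '*****', 'timestep 455', 'No_Specs 3', 'H2 49', ...
    'HO2 7', 'O2 49', '*****', 'timestep 460', 'No_Specs 2', 'H2 50', ...
    'O2 50', '*****');

% Same first regular expression as in the answer
tsteps = regexp(content, 'timestep\s+(\d+)\s+[^*]*?(HO2\s+\d+|\*)', 'tokens');

out = zeros(numel(tsteps), 2);
for k = 1:numel(tsteps)
    out(k, 1) = str2double(tsteps{k}{1});
    % take the number after 'HO2' (not the '2' inside 'HO2'); a '*' capture
    % means the block has no HO2 line, so the entry stays 0 instead of NaN
    num = regexp(tsteps{k}{2}, 'HO2\s+(\d+)', 'tokens', 'once');
    if ~isempty(num)
        out(k, 2) = str2double(num{1});
    end
end
disp(out)
```

Alternatively, keep the one-liner and set `out(isnan(out(:,2)), 2) = 0;` afterwards.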