How to convert a line of text into numbers

Wisam on 21 Sep 2014
Commented: Wisam on 22 Sep 2014
I am trying to read this text and put it in a vector. Some of the elements must be repeated according to the number before the * symbol; for example, the first five elements should have a value of 10, and so on:
5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57
perm_i=[];
fid=fopen(file_name_out);
textscan(fid, '%s', 1, 'delimiter', '\n', 'headerlines', row_permi_start-1);
for j=1:row_permi_end-row_permi_start
    c=textscan(fid, '%s', 1, 'delimiter', '\n');
    astring=cell2mat(c{1});
    ind1=find(astring=='*');
    ind_temp=[];
    if ~isempty(ind1)
        for k=1:length(ind1)
            indspace=find(astring==' ');
            indspace1=indspace(indspace<ind1(k));
            display (indspace);
            if isempty(indspace1)
                indspace1=0;
            else
                indspace1=indspace1(end);
            end
            display (indspace1);
            num_loc(k)=length(indspace1)+1;
            indspace1=indspace1(end);
            display (indspace1);
            num_1(k)=str2double(astring(indspace1+1:ind1(k)-1))-1;
            ind_temp=[ind_temp,indspace1+1:ind1(k)];
            display (num_loc);
        end
        astring(ind_temp)=[];
    end
    acell=textscan(astring,'%f');
    var_temp=acell{1,1};
    if ~isempty(ind1)
        var_temp_1=var_temp;
        for k=1:length(ind1)
            var_temp(num_loc(k)+num_1(k) :end+num_1(k))=var_temp(num_loc(k):end);
            var_temp(num_loc(k)+1:num_loc(k)+num_1(k))=var_temp(num_loc(k));
            display (var_temp);
            num_loc=num_loc+num_1(k);
        end
    end
end
2 comments
John on 21 Sep 2014
I have not tried the above solutions/suggestions, but this is a natural job for regular expressions. MATLAB, the most versatile numerical computing package, provides extensive regular expression (regex) functionality. It does not have the utility of Perl, but there are enough regex varieties in MATLAB to collapse those loops into a few lines of regex code.
To get you started on regex in MATLAB:
Some of the regex functions you will likely have to use to craft a concise solution: regexp, regexprep
You will have to do a bit of reading and practising to get the hang of it. To give you an idea of how regex can help you parse and manipulate the string, consider these few lines of code, which give you the starting indices of the tokens (whether or not they have a multiplier prepended) that you would probably want to manipulate:
myString = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57'
regexQuery = '((\d+)\*)?\d+(\.\d+)?'
indices = regexp(myString, regexQuery)
% indices = 1 6 14 19 27 35 41 49 57 63 71
The elements of indices point to the starting indices of the tokens you would be interested in. To achieve the effect of repeating numbers prepended with multipliers, you would have to look into the more advanced features of 'regexprep'.
These, and not code that ordinarily parses string tokens, are more likely to give you graceful solutions that are maintainable and readable.
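As a rough sketch of how the same query could go one step further, the 'match' option of regexp returns the matched groups themselves rather than their starting indices. Each group can then be expanded, here with sscanf, which is my own choice for the illustration and not something used elsewhere in this thread. The sample string is shortened.

```matlab
myString = '5*10 6*10.65 4.82 12.62';            % shortened sample input
regexQuery = '((\d+)\*)?\d+(\.\d+)?';            % same query as above
groups = regexp(myString, regexQuery, 'match');  % cell array of matched groups

v = [];
for k = 1:numel(groups)
    % '%f*%f' reads "count*value"; a plain value yields a single number
    parts = sscanf(groups{k}, '%f*%f');
    if numel(parts) == 2
        v = [v repmat(parts(2), 1, parts(1))]; %#ok<AGROW>
    else
        v = [v parts]; %#ok<AGROW>
    end
end
```

Growing v inside the loop is fine for a single line of text; for very long inputs, preallocating or collecting the pieces in a cell array would be the usual refinement.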
You may find MATLAB's string functions useful as well:
Wisam on 22 Sep 2014
I appreciate your support, thanks


Accepted Answer

Guillaume on 21 Sep 2014
Edited: Guillaume on 22 Sep 2014
I've not looked at your code (which is badly formatted), but to convert your example into a vector of numbers I would do:
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
v = [];
for group = strsplit(str) %split string at spaces into groups
    groupparts = strsplit(group{1}, '*'); %split group at * (if no *, no split)
    if numel(groupparts) == 1
        v = [v str2double(groupparts{1})]; %plain value, appears once
    else
        v = [v repmat(str2double(groupparts{2}), 1, str2double(groupparts{1}))]; %count*value
    end
end
Or, as I said in my comment to John's answer, if you want to use a regexprep one-liner:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
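To make the one-liner easier to follow, here is a sketch that splits it into two steps on a shortened input. The variable name expanded is only for illustration. The replacement string is a dynamic expression: for every count*value group, MATLAB evaluates the repmat call with $1 and $2 substituted as text, so the intermediate result is still a character string.

```matlab
str = '3*10 4.82 2*10.65';   % shortened sample input

% Step 1: rewrite every "count*value" group as the value repeated count times.
% $1 is the count and $2 the value, both as text; '' '' is an escaped ' '.
expanded = regexprep(str, '([^ ]+)\*([^ ]+)', ...
    '${repmat([$2 '' ''], 1, str2double($1))}');

% Step 2: convert the space-separated text into a numeric row vector.
v = str2num(expanded);
```

Displaying expanded before the conversion is a handy way to check that the pattern caught every group.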

More Answers (1)

John on 21 Sep 2014
Edited: John on 21 Sep 2014
As mentioned before, regular expressions provide more intuitive solutions (once you get the hang of the basics). This short snippet below, which returns the answer as a numeric vector, seems to work:
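input = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
% The lookahead (?=\*) lets 'pre' match only digits that are followed by '*',
% so a plain value such as 12.62 is not split into 1 x 2.62.
regexQuery = '(?<pre>\d+(?=\*))?(\*)?(?<post>\d+(\.\d+)?)';
matches = regexp(input, regexQuery, 'names');
res = '';
for i = 1:size(matches, 2)
    if isempty(matches(i).pre)
        matches(i).pre = '1'; % no multiplier: the value appears once
    end
    res = [res repmat([' ' matches(i).post ' '], [1 str2double(matches(i).pre)])];
end
res = str2num(res)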
It uses regexp once and the results of that in a simple loop that concatenates the nascent string. And I would consider this a crude solution (if it actually works :-) ) with a lot of superfluous code. My guess is that exploiting named captures and the command substitution functionality in regexprep could collapse all that into 2 or 3 commands.
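A possible way to collapse the loop, offered as a sketch and not a tested replacement: pull the named captures out of the struct array in one go and let repelem do the repetition. This assumes R2015a or later for repelem, and the pattern uses a lookahead so that the pre field is only filled when a '*' follows the digits. The sample string is shortened.

```matlab
str = '5*10 6*10.65 4.82 12.62';   % shortened sample input

% 'pre' matches digits only when a '*' follows; 'post' is the value itself.
m = regexp(str, '(?<pre>\d+(?=\*))?\*?(?<post>\d+(?:\.\d+)?)', 'names');

counts = str2double({m.pre});      % NaN where no multiplier was captured
counts(isnan(counts)) = 1;         % a bare value appears once
values = str2double({m.post});
v = repelem(values, counts);       % repelem needs R2015a or later
```

On older releases, a loop with repmat over counts and values would do the same job.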
1 comment
Guillaume on 22 Sep 2014
Edited: Guillaume on 22 Sep 2014
I would argue that regular expressions are overkill in this case, considering you only need two strsplit calls: one to break the string at every space and one to break each of those pieces at the '*'.
You could indeed do it with a single-line regexprep, but this involves a dynamic regular expression replacement string, which is not particularly cheap in terms of computation time (and not particularly easy to comprehend). For the record, the one-liner is:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
Edit: On the other hand, the regexprep one-liner is much faster than my strsplit solution.
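For anyone who wants to check the speed claim on their own machine, here is a minimal timing sketch using tic/toc. The repeated input string is made up for the test, and the numbers will vary with the MATLAB release and hardware, so treat the result as indicative only.

```matlab
% Build a longer input by repeating a short pattern (illustrative data only).
str = strtrim(repmat('5*10 6*10.65 4.82 ', 1, 500));

% strsplit-based loop
tic
v1 = [];
for group = strsplit(str)
    parts = strsplit(group{1}, '*');
    if numel(parts) == 1
        v1 = [v1 str2double(parts{1})]; %#ok<AGROW>
    else
        v1 = [v1 repmat(str2double(parts{2}), 1, str2double(parts{1}))]; %#ok<AGROW>
    end
end
tSplit = toc;

% regexprep one-liner with a dynamic replacement expression
tic
v2 = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', ...
    '${repmat([$2 '' ''], 1, str2double($1))}'));
tRegexprep = toc;

fprintf('strsplit loop: %.4f s, regexprep: %.4f s\n', tSplit, tRegexprep);
```

A single tic/toc run is noisy; repeating each measurement a few times, or wrapping each variant in a function and using timeit, would give steadier figures.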
