Removing specific characters from string in nested cells
Bob Thompson
on 13 Jun 2018
Commented: Stephen23
on 30 Dec 2022
I have a series of strings contained within a nested cell array (because regexp loves to nest cells), and I would like to remove every character that is not numeric or white space, namely asterisks, so that I can convert the strings to doubles.
I'm looking for the least painful way of removing these special characters from all of the strings. I do not have a sample file to attach, sorry, but I have described the shape of a sample array below.
X == 1x1 cell
X{1} == 1x1 cell (because regexp can't help itself apparently)
X{1}{1} = {'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '};
12 comments
Stephen23
on 15 Jun 2018
@Bob Nbob: you are right, it does not appear in the Mfile help. I notice that many other useful regular expression features also do not appear in the Mfile help: notably missing are dynamic expressions, lookaround operators, and named capture.
Both the inbuilt help and the page I linked to give a very useful introduction, and explain all features of regular expressions in MATLAB:
doc regexp
doc('Regular Expressions')
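As a rough illustration of two of the features mentioned above (named capture and a lookaround operator), here is a minimal sketch. The sample string is invented for the example and is not taken from the original file.

```matlab
% Illustrative sample string (an assumption, not from the original data).
str = 'COLUMN3= 5.78*';

% Named capture: (?<name>expr) returns a struct with one field per name.
names = regexp(str, '(?<label>[A-Z]+\d)=\s(?<value>\d+\.\d+)', 'names');

% Lookahead: match a number only when an asterisk follows it.
% The asterisk itself is not part of the match.
flagged = regexp(str, '\d+\.\d+(?=\*)', 'match');
```

Here names.label and names.value hold the column label and the number as text, and flagged holds only the number that carries an asterisk.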
Accepted Answer
Paolo
on 15 Jun 2018
Edited: Paolo
on 15 Jun 2018
Perhaps this can easily be achieved in two steps. For your input:
1 ****TABLE1****
COLUMN1= 1.12, 2.23, 3.34, 4.45, 5.56, 6.67,
COLUMN2= 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
COLUMN3= 1.23, 0.34, 3.45, 5.78*, 6.54*, 8.23,
1 ****TABLE2****
Step 1. Remove the unwanted characters.
data = fileread('CORR.txt');
expression_sub = '(?<=\d\.\d*\*?)([\*\.,])';
data = regexprep(data,expression_sub,'');
data no longer contains those characters. It is now:
' 1 ****TABLE1****
COLUMN1= 1.12 2.23 3.34 4.45 5.56 6.67
COLUMN2= 0.00 0.00 0.00 0.00 0.00 0.00
COLUMN3= 1.23 0.34 3.45 5.78 6.54 8.23
1 ****TABLE2****
'
Step 2. Match your data. The expression is greedy and will match as many digit, full stop, digits combinations as it can, so you don't need to repmat your expression as you showed.
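% [13] matches COLUMN1 or COLUMN3; the decimal point is escaped so it is literal.
expression_match = '(?<=COLUMN[13]=\s)(\d+\.?\d*\s)*';
% Step 1 stored its result in data, so match against data here.
[tokens,match] = regexp(data,expression_match,'tokens','match');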
Then convert the result to numeric arrays in MATLAB:
column1 = str2double(strsplit(cell2mat(tokens{1}),' '));
column3 = str2double(strsplit(cell2mat(tokens{2}),' '));
column1 =
1.1200 2.2300 3.3400 4.4500 5.5600 6.6700
column3 =
1.2300 0.3400 3.4500 5.7800 6.5400 8.2300
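For future readers, a hedged variant of the conversion step: the sketch below reads the 'match' output rather than 'tokens', which avoids depending on how tokens are reported for a repeated group. The sample text is a shortened stand-in for the cleaned data (its exact whitespace is an assumption), and the pattern is written in the spirit of expression_match rather than copied from it.

```matlab
% Shortened stand-in for the cleaned file contents (assumed layout).
data = sprintf(['COLUMN1= 1.12 2.23 3.34\n', ...
                'COLUMN2= 0.00 0.00 0.00\n', ...
                'COLUMN3= 1.23 0.34 3.45\n']);

% Match the whole run of numbers that follows COLUMN1= or COLUMN3=.
expression_match = '(?<=COLUMN[13]=\s)(\d+\.?\d*\s)+';
match = regexp(data, expression_match, 'match');

% strtrim removes the trailing whitespace so that strsplit
% does not produce an empty last element (which would become NaN).
column1 = str2double(strsplit(strtrim(match{1})));
column3 = str2double(strsplit(strtrim(match{2})));
```

With no delimiter argument, strsplit splits on whitespace, so the same code should cope with tabs or repeated spaces between values.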
More Answers (1)
George Abrahams
on 30 Dec 2022
The others are right to fix the root problem causing the tricky nested cell array. Having said that, for future reference, my deepreplace function on File Exchange / GitHub would have done exactly what you requested.
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};
% Remove any character except for digits (0-9) and period (.)
match = regexpPattern('[^\d.]');
x = deepreplace(x,match,'');
% x = 1×1 cell array
% {1×1 cell}
% x{1} = 1×1 cell array
% {5×1 cell}
% x{1}{1} = 5×1 cell array
% {'1234.'}
% {'12.' }
% {'1234.'}
% {'123.' }
% {'321.' }
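If installing a File Exchange function is not an option, something similar can be sketched with built-in functions only. This is a minimal sketch that assumes the nesting consists of scalar wrapper cells around a single cell array of character vectors, as in the question. It unwraps the nesting instead of preserving it, so it is not equivalent to deepreplace.

```matlab
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};

% Peel off scalar wrapper cells until the cell array of text is reached.
inner = x;
while iscell(inner) && isscalar(inner) && iscell(inner{1})
    inner = inner{1};
end

% regexprep accepts a cell array of character vectors directly.
% Remove any character except digits (0-9) and period (.).
cleaned = regexprep(inner, '[^\d.]', '');

% Convert the cleaned text to a numeric column vector.
values = str2double(cleaned);
```

The result in cleaned is a 5×1 cell array of text and values is the matching 5×1 double.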
0 comments