Splitting Characters in a Cell Array

Hi All,
I am trying to split the contents of a cell array into separate parts. I tried converting to a string and using strsplit, but I am not getting the results I want because of the data type.
I came across the cellfun command, but I am not sure how to apply it here.
Here is what I have:
'P245/65R17 105S'
'P265/70R16 111S'
'P275/55R20 111H'
'285/60R18 120H'
'P235/70R17 108S'
What I need:
'P245/' '65' 'R' '17' '105' 'S'
'P265/' '70' 'R' '16' '111' 'S'
'P275/' '55' 'R' '20' '111' 'H'
'285/' '60' 'R' '18' '120' 'H'
'P235/' '70' 'R' '17' '108' 'S'
Thanks in advance!

Accepted Answer

Jan on 11 Nov 2015
Data = {'P245/65R17 105S'; ...
        'P265/70R16 111S'; ...
        'P275/55R20 111H'; ...
        '285/60R18 120H'; ...
        'P235/70R17 108S'};
n = numel(Data);
Result = cell(n, 6);
for k = 1:n
    S = Data{k};
    p = strfind(S, '/');
    % 'P245/65R17 105S'
    % 'P245/' '65' 'R' '17' '105' 'S'
    Result(k, :) = {S(1:p), S(p+1:p+2), S(p+3), S(p+4:p+5), S(p+7:p+9), S(p+10)};
end
Does this help already? Or are there strings that do not match this pattern? If so, you could also search for the space, use the length of the strings, or apply some other rule.
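The idea of searching for the space as well could be sketched as follows. This is only one possible reading of the suggestion, assuming every string has the form prefix, slash, digits, one letter, digits, one space, then a code ending in a single letter. The variable names `slashPos`, `spacePos`, `middle`, and `letterPos` are made up for illustration.

```matlab
Data = {'P245/65R17 105S'; '285/60R18 120H'};
n = numel(Data);
Result = cell(n, 6);
for k = 1:n
    S = Data{k};
    slashPos = find(S == '/', 1);          % position of the first slash
    spacePos = find(S == ' ', 1);          % position of the first space
    middle = S(slashPos+1:spacePos-1);     % e.g. '65R17'
    letterPos = find(isletter(middle), 1); % the letter between the two numbers
    Result(k, :) = {S(1:slashPos), middle(1:letterPos-1), middle(letterPos), ...
                    middle(letterPos+1:end), S(spacePos+1:end-1), S(end)};
end
```

Because nothing here depends on fixed field widths, numbers of different lengths are handled, but a string such as '275/50R17 ST' would still be split in a way that is probably not wanted ('S' and 'T' end up in separate cells).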

5 Comments

Hi Jan,
Thanks, the for loop never occurred to me as an option. It works up to a certain iteration, then I get the following error:
Index exceeds matrix dimensions
I believe this is because the data varies a lot more than just the P265 and 285 forms.
Here's the complete cell array that accounts for all variable types in my data:
'P265/70R16 111S'
'P275/55R20 111H'
'285/60R18 120H'
'P235/70R17 108S'
'275/50R17 ST'
'LT245/70R17 128/112R'
I tried adding another strfind to the for loop, but it did not work:
n = numel(Data);
Result = cell(n, 6);
for k = 1:n
S = Data{k};
p = strfind(S, '/');
q = strfind(S, 'ST');
% 'P245/65R17 105S'
% 'P245/' '65' 'R17' '105' 'S'
Result(k, :) = {S(1:p), S(p+1:p+2), S(p+3:p+5), S(p+5:q), S(q+1)};
end
I am sure the modified Result has some errors.
Perhaps an if statement within the for loop that includes a strfind for 'ST' and '128/112' would work? But I am not sure how to do that while keeping the array in the exact same order.
Guillaume on 12 Nov 2015
regexp will process your whole cell array at once without the need of a loop. It is also easy (much easier than coding it yourself) to cope with patterns of different lengths. And if a string does not conform to the pattern, such as your shorter string, it simply returns an empty cell for that string instead of terminating the program with an error as your hand coded loop would do (your shorter string will result in an index exceeds matrix dimension error).
The regular expression I posted can easily be modified to cope with the two new patterns you've added, but you need to explain how these two patterns are to be split.
If you really do want to go down the route of writing your own parser, then you need a lot more than a strfind and indexing to make your parser robust.
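The empty-cell behaviour described above can be used to find the strings that did not conform. The following is a minimal sketch using the pattern from the regexp answer in this thread; the names `pattern`, `noMatch`, `unmatched`, and `matched` are illustrative only.

```matlab
data = {'P265/70R16 111S'; '275/50R17 ST'; 'LT245/70R17 128/112R'};
pattern = '(.+/)(\d+)([A-Z])(\d+) (\d+)([A-Z])';
tok = regexp(data, pattern, 'tokens', 'once');

% Strings that do not fit the pattern come back as empty cells
noMatch = cellfun(@isempty, tok);
unmatched = data(noMatch);         % the strings that still need handling
matched = vertcat(tok{~noMatch});  % one row of tokens per conforming string
```

Checking `noMatch` after the call makes it visible which rows were skipped, instead of letting them disappear silently.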
Jan on 12 Nov 2015
I agree that regexp has several advantages. The robustness of the parsing depends critically on the exact definition of the wanted result. While regexp returns empty fields for non-matching strings, the loop might stop directly with an error. Both methods, manual parsing and regexp, are only as robust as the checks applied to their results.
What is the wanted result for:
'275/50R17 ST'
'LT245/70R17 128/112R'
Thanks everyone for all the useful input! Guillaume, regexp actually gave me my desired output. Made an edit that gave me the following:
split = regexp(size,'([P-T])(\d+)(\D+)(\d+)([A-Z])(\d+) (\d+)([A-Z])', 'tokens', 'once');
% 'P' '265' '/' '70' 'R' '16' '111' 'S'
However, it ends up skipping the rows that do not follow that exact character configuration (i.e. skips something like 185/65R15 or LT245/70R17 128/112R) and continues iterating.
The end result I would like is the following:
'P' '265' '/' '70' 'R' '16' '111' 'S'
'275' '/' '50' 'R' '17' 'ST'
'LT' '245' '/' '70' 'R' '17' '128/112' 'R'
Jan on 13 Nov 2015
@Aldrich: The result shown cannot be represented as a single cell array in MATLAB. If it is stored as a cell string, the missing elements must be at least [], because an array must have the same number of elements in every row.
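To illustrate the point about rows of equal length, here is a small sketch that pads shorter rows with [] so they fit into one rectangular cell array. The variable `rows` and its contents are hand-written for this example, not output from any code in the thread.

```matlab
rows = {{'P', '265', '/', '70', 'R', '16', '111', 'S'}, ...
        {'275', '/', '50', 'R', '17', 'ST'}};

width = max(cellfun(@numel, rows));   % length of the longest row
Result = cell(numel(rows), width);    % every cell starts out as []
for k = 1:numel(rows)
    % Fill the leading columns; trailing cells stay []
    Result(k, 1:numel(rows{k})) = rows{k};
end
```

This pads on the right only, so the columns of the shorter row no longer line up with the meaning of the columns in the longer one. Whether that is acceptable depends on how the result is used afterwards.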


More Answers (1)

Guillaume on 11 Nov 2015
Edited: Guillaume on 11 Nov 2015
Use the power of Regular Expressions. It's a daunting language at first but it's very powerful:
data = {'P245/65R17 105S';
'P265/70R16 111S';
'P275/55R20 111H';
'285/60R18 120H';
'P235/70R17 108S'};
splitdata = regexp(data, '(.+/)(\d+)([A-Z])(\d+) (\d+)([A-Z])', 'tokens', 'once');
splitdata = vertcat(splitdata{:})
The regular expression is divided into tokens (the parenthesised groups in the regex):
  • the 1st token is one or more (the +) character (the .) followed by '/'
  • the 2nd token is one or more (the +) digit (the \d)
  • the 3rd token is a single character between A and Z (the [A-Z])
  • 4th token, see 2nd
  • it then matches a space which is not part of any token
  • 5th token, see 2nd
  • 6th token, see 3rd
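As a variation on the token breakdown above, the same pattern can be written with named tokens, so that each piece is retrieved by name rather than by position. This is a sketch on a single string; the field names (`prefix`, `num1`, `letter1`, and so on) are arbitrary placeholders chosen for this example.

```matlab
str = 'P245/65R17 105S';

% Same six tokens as above, each given a name with (?<name>...)
pattern = ['(?<prefix>.+/)(?<num1>\d+)(?<letter1>[A-Z])' ...
           '(?<num2>\d+) (?<num3>\d+)(?<letter2>[A-Z])'];

parts = regexp(str, pattern, 'names', 'once');  % scalar struct, one field per token
prefix = parts.prefix;
lastLetter = parts.letter2;
```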

3 Comments

Star Strider on 12 Nov 2015
Great documentation of the regexp call!
Aldrich To on 12 Nov 2015
Yes, it is!
A regex that would most likely work with all your cases would be
regexp(data, '([A-Z]*)(\d+)(/)(\d+)([A-Z])(\d+) (\d+(/\d+)?)?([A-Z])', 'tokens')
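In the pattern above, the final ([A-Z]) captures a single letter, so a two-letter suffix such as 'ST' would only be captured in part. One possible variation, offered as a sketch rather than as what the poster intended, anchors the pattern to the whole string, allows one or more trailing letters, and uses non-capturing groups (?:...) so that every matching row yields exactly eight tokens.

```matlab
data = {'P265/70R16 111S'; '275/50R17 ST'; 'LT245/70R17 128/112R'};

% Token 7 always takes part in the match but may be empty;
% the (?:...) groups only group and do not create extra tokens.
pattern = '^([A-Z]*)(\d+)(/)(\d+)([A-Z])(\d+) ((?:\d+(?:/\d+)?)?)([A-Z]+)$';

tok = regexp(data, pattern, 'tokens', 'once');
splitdata = vertcat(tok{:});   % one row of eight tokens per matching string
```

Missing parts, such as the prefix of '275/50R17 ST', come back as empty character vectors, which keeps the result rectangular. Strings that do not match at all would still drop out of `splitdata`, so the row count should be checked against `numel(data)`.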

Asked: 11 Nov 2015

Commented: 13 Nov 2015
