Splitting table text column with multiple delimiters based on parantheses.

Question

greenyellow22 el 26 de Mayo de 2022

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/1727700-splitting-table-text-column-with-multiple-delimiters-based-on-parantheses

Comentada: Stephen23 el 23 de Jun. de 2022

Hey! I have a table with multiple columns and one of them is in text form (called 'Status', for cell examples see below). I would like to split this colum into two based on the parantheses, so that I have one column with the text (e.g. User has collected a REGULAR coin at/ Access to new world) and one with the numbers (e.g. -5.5, 10.9, 90.1).

oldtable.Status =
Player collected a YELLOW coin at (-5.5, 10.9, 90.1)
Access to new world 
New world entered
Player collected a GREEN coin at (6.6, 8.7, 74.9)
...
Player collected a GREEN coin at (4.7, 9.7, 71.7)    

First, I tried simply the split function and got an error message:

newtable = split(oldtable.Status)
Error using split
Element 2 of the text contains 3 delimiters while the previous elements have 6. All elements must contain the same number of delimiters.

And also tried the multiple delimierts function:

newtable = split(oldtable.Status["("])
Invalid expression. When calling a function or indexing a variable, use parentheses. Otherwise, check for mismatched delimiters.

Thank you in advance! :-)

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Stephen23 el 27 de Mayo de 2022

1
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/1727700-splitting-table-text-column-with-multiple-delimiters-based-on-parantheses#answer_972810

Abrir en MATLAB Online

S = ["Player collected a YELLOW coin at (-5.5, 10.9, 90.1)";"Access to new world";"New world entered";"Player collected a GREEN coin at (6.6, 8.7, 74.9)"]
S = 4×1 string array
    "Player collected a YELLOW coin at (-5.5, 10.9, 90.1)"
    "Access to new world"
    "New world entered"
    "Player collected a GREEN coin at (6.6, 8.7, 74.9)"
T = regexp(S,'^(.*?)(\(.*\))?$','once','tokens');
T = vertcat(T{:})
T = 4×2 string array
    "Player collected a YELLOW coin at "    "(-5.5, 10.9, 90.1)"
    "Access to new world"                   ""                  
    "New world entered"                     ""                  
    "Player collected a GREEN coin at "     "(6.6, 8.7, 74.9)"  

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

greenyellow22 el 23 de Jun. de 2022

Hey Stephen23! Could you explain me how you came up with the expression '^(.*?)($.*$)?$' for filtering out the coordinates? I would like to slightly change that expression in order to also filter out coordinates that are not in the end of the string but also in the middle (e.g. 'Player took 1 damage at (86.8, 59.9, -22.6) and was knocked back with booster: 14 away from enemy: True'). Thanks in advance!

Stephen23 el 23 de Jun. de 2022

Abrir en MATLAB Online

% '^(.*?)(\(.*\))?$'
%  ^                  match the start of the string
%   ^   ^             define group 1
%    ^^^              match any character zero or more times, lazy
%        ^      ^     define group 2
%         ^^  ^^      match literal parentheses
%           ^^        match any character zero or more times, greedy
%                ^    make group 2 optional
%                 ^   match end of the string

The regular expression basically works by trying to match the literal parentheses, and if that fails, the lazy match will catch everything. It expects the parentheses to be at the end of the text.

If you want to obtain and "filter out" the coordinates anwhere in the text, perhaps:

S = ["Player collected a YELLOW coin at (-5.5, 10.9, 90.1)";"Access to new world";"New world entered";"Player collected a GREEN coin at (6.6, 8.7, 74.9)";"Player took 1 damage at (86.8, 59.9, -22.6) and was knocked back with booster: 14 away from enemy: True"]
S = 5×1 string array
    "Player collected a YELLOW coin at (-5.5, 10.9, 90.1)"
    "Access to new world"
    "New world entered"
    "Player collected a GREEN coin at (6.6, 8.7, 74.9)"
    "Player took 1 damage at (86.8, 59.9, -22.6) and was knocked back with booster: 14 away from enemy: True"
R = '([^\(]+)(\(.+\))?(.*)';
T = regexp(S,R,'tokens','once');
T = vertcat(T{:})
T = 5×3 string array
    "Player collected a YELLOW coin at "    "(-5.5, 10.9, 90.1)"     ""                                                            
    "Access to new world"                   ""                       ""                                                            
    "New world entered"                     ""                       ""                                                            
    "Player collected a GREEN coin at "     "(6.6, 8.7, 74.9)"       ""                                                            
    "Player took 1 damage at "              "(86.8, 59.9, -22.6)"    " and was knocked back with booster: 14 away from enemy: True"

Iniciar sesión para comentar.

Answer 2

dpb el 27 de Mayo de 2022

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/1727700-splitting-table-text-column-with-multiple-delimiters-based-on-parantheses#answer_972800

Abrir en MATLAB Online

A rudeness about way the string functions are implemented -- they won't return variable numbers of elements that would be extremely handy -- without more exotic things like regexp, the workaround is something like

>> Status =["Player collected a YELLOW coin at (-5.5, 10.9, 90.1)"
"Access to new world "
"New world entered"
"Player collected a GREEN coin at (6.6, 8.7, 74.9)"];
>> ix=contains(Status,'(');
>> numStr(ix,1)=extractBetween(Status(ix),'(',')')
numStr = 
  4×4 string array
    "-5.5, 10.9, 90.1"    <missing>    <missing>    "6.6, 8.7, 74.9"
    <missing>             <missing>    <missing>    <missing>       
    <missing>             <missing>    <missing>    <missing>       
    "6.6, 8.7, 74.9"      <missing>    <missing>    <missing>       
>> 

The above just pasted in your displayed text and made it into something readable the simplest way possible -- each line is an array element. To duplicate your exact case would need to know the precise format of the input, but the above is how to work with the higher-leve builtin functions to get around their penchant to have everything excessively uniform.

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Splitting table text column with multiple delimiters based on parantheses.

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Más respuestas (1)

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

Splitting table text column with multiple delimiters based on parantheses.

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

4 comentarios Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Más respuestas (1)

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos