How to parse an Nx1 string array without looping through N

Question


I have an Nx1 string array, and I can't figure out how to extract 6 chunks of text out of it and into an Nx6 cell array. The text elements are numbers, but it's simplest to not treat them as numbers at this juncture.

Here is a toy version of the string array, together with code that correctly parses out the necessary elements of CCYYMMDD and hhmm from the first element of the string array:

stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
charLaunch = textscan(stringFile(1),'%*18c %2c %2c %2c %2c %*c %2c %2c');

charLaunch =

1×6 cell array

{'20'} {'02'} {'04'} {'28'} {'18'} {'48'}

However, both

charLaunchAll = textscan(stringFile,'%*18c %2c %2c %2c %2c %*c %2c %2c');

and

charLaunchAll = cell(5,6);
charLaunchAll = textscan(stringFile(:),'%*18c %2c %2c %2c %2c %*c %2c %2c');

generate the same error message:

Error using textscan

First input must be a valid file-id or non-empty character vector.

Is there a way to extract these pieces of text out of every array member without building a loop?


Answer 1 (accepted answer)

Stephen23 on 23 Apr 2020

Edited: Stephen23 on 23 Apr 2020


Using one simple regular expression:

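C = {...
    'nsasondewnpnC1.b1.20020428.184800.cdf'; ...
    'nsasondewnpnC1.b1.20020428.220500.cdf'; ...
    'nsasondewnpnC1.b1.20020428.235900.cdf'; ...
    'nsasondewnpnC1.b1.20020429.013100.cdf'; ...
    'nsasondewnpnC1.b1.20020429.182500.cdf'};
% Match every non-overlapping pair of digits ("C1" and "b1" have single digits, so they are skipped)
out = regexp(C,'\d{2}','match');
% Stack the per-name matches into a 5x7 cell array: CC YY MM DD hh mm ss
out = vertcat(out{:});
out = out(:,1:6) % drop the seconds column to get the Nx6 result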

I used a cell array of character vectors, but it will also work for a string array.
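The pattern '\d{2}' picks up every pair of digits in the name, including the seconds. If you want exactly the six fields and nothing else, a sketch using capture tokens anchored on the fixed ".CCYYMMDD.hhmmss.cdf" tail could look like the following. The pattern is my own and assumes every name ends in exactly that layout; 'tokens' together with 'once' makes regexp return one 1x6 cell of tokens per name.

```matlab
C = {'nsasondewnpnC1.b1.20020428.184800.cdf'; ...
     'nsasondewnpnC1.b1.20020428.220500.cdf'; ...
     'nsasondewnpnC1.b1.20020428.235900.cdf'; ...
     'nsasondewnpnC1.b1.20020429.013100.cdf'; ...
     'nsasondewnpnC1.b1.20020429.182500.cdf'};

% Six capture groups: CC YY MM DD after one dot, hh mm after the next dot.
% The seconds are matched by the trailing \d{2} but not captured.
% Assumption: every name ends in .CCYYMMDD.hhmmss.cdf
pat = '\.(\d{2})(\d{2})(\d{2})(\d{2})\.(\d{2})(\d{2})\d{2}\.cdf$';

% With 'once', each cell of tok holds a single 1x6 cell of tokens
tok = regexp(C, pat, 'tokens', 'once');

% Stack the 1x6 rows into an Nx6 cell array of 2-character vectors
out = vertcat(tok{:})
```

Because the trailing part of the pattern is anchored with $, a digit pair elsewhere in the prefix can never be captured by mistake.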


Stephen23 on 23 Apr 2020

Edited: Stephen23 on 23 Apr 2020


"... why textscan will work with a single element of a string array, but not with an entire array of strings?"

Because low-level string parsing functions parse one string element or one character vector, and textscan is ultimately just a fancy wrapper for low-level operations.

You might think of a string array as one thing, but really it is a container array of multiple character vectors, i.e. it contains lots of individual, separate character vectors, which are stored separately. Not so different from a cell array, really (search this forum for more accurate and detailed discussions on how string arrays are actually implemented).
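Because a string array is a container of separate character vectors, one loop-free workaround for this particular data is to let char pack all elements into a single character matrix and then index fixed columns. This is a sketch that assumes every file name has exactly the toy layout (18-character prefix, CCYYMMDD in columns 19-26, hhmm in columns 28-31); names of a different length would be space-padded by char and the fixed columns would no longer line up.

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];

% char copies each string element into one row of an N-by-37 char matrix
c = char(stringFile);

% Columns 19:26 hold CCYYMMDD, column 27 is the dot, 28:31 hold hhmm
chunks = c(:, [19:26, 28:31]);            % N-by-12 char matrix

% Cut every row into six 2-character pieces -> N-by-6 cell array
N = size(chunks, 1);
charLaunchAll = mat2cell(chunks, ones(N, 1), 2 * ones(1, 6))
```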

Parsing a string array introduces ambiguities: e.g. what is the end-of-line (EOL) character? textscan relies on identifying that character... but parsing a string array would (possibly, see below) require having no EOL character at all, and instead treating each string element as being de facto delimited by some character (in which case you can trivially do this yourself, as I did in my last comment). You might think it is obvious that each string element should be treated as one line, but computers do not understand "obvious", they understand instructions in the form of code. Consider how this 2x1 string array should be parsed:

str = ["1";"2\n3"] % \n = newline

Which of these should textscan(str,'%f') return?

[1;2;3]: all values; both the newline AND the boundary between string elements act as a de facto EOL.
[1;2]: the newline causes parsing to finish.
[1]: the second element does not parse.
{[1];[2;3]}: the output is not of the class requested, and the cell contents can have an arbitrary size.
error: the second element throws an error.

If you say the first is the correct behavior, what about the next user who expects one of the other behaviors?
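The \n in that example is shorthand: MATLAB does not expand escape sequences inside double-quoted string literals, so to actually build the array you have to insert the newline character explicitly. A minimal sketch (the names rejected, a and b are just for illustration) showing that textscan rejects the whole 2x1 array but accepts each scalar element on its own:

```matlab
% Build the 2x1 string array with a real newline in the second element
str = ["1"; "2" + newline + "3"];

% The whole array is not a valid text input for textscan
try
    textscan(str, '%f');
    rejected = false;
catch
    rejected = true;   % same "valid file-id or non-empty character vector" error
end

% Each scalar element is parsed separately
a = textscan(str(1), '%f');   % a{1} holds 1
b = textscan(str(2), '%f');   % b{1} holds [2; 3]: the newline just ends a line
```

Parsing the elements one at a time corresponds to the {[1];[2;3]} option in the list above, which is one reason textscan cannot simply pick a behavior for you.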

Note also that text files consist of one long character vector (people think of them as having "lines", but really they are just one long character vector interspersed with newline characters), and low-level file parsing functions also parse just that one character vector.

Leslie on 23 Apr 2020

Edited: Leslie on 23 Apr 2020

OK, thanks. I'd noticed that what I was trying to do "all at once" would have worked if I'd been reading a file and could have searched for the newline character, but didn't (or couldn't) carry that all the way forward to understanding how the string array was being stored. It just never occurred to me to do something like "ignore through the 'cdf' at the end of the string", which is an analog to the documentation's example of "ignore the rest of the line".


Answer 2

Mohammad Sami on 23 Apr 2020


Since every file name follows the same pattern, you can put the constant portions into the format specification and convert the strings directly to datetime as follows.

stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
% the constant portion of your string is enclosed in 'single quotes';
d = datetime(stringFile,'InputFormat',fmt);
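Getting the 2-digit text chunks back out of the datetime array does not require a loop either. A sketch, assuming char(d, fmt) is used to render every element with one fixed-width format and mat2cell then cuts each row into six 2-character pieces; the output format 'yyyyMMddHHmm' is my choice to mirror CCYYMMDD hhmm, and any name that failed to parse (NaT) would break the fixed width.

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
d = datetime(stringFile, 'InputFormat', fmt);

% Render every datetime as one 12-character row: CCYYMMDDhhmm
c = char(d, 'yyyyMMddHHmm');              % N-by-12 char matrix

% Cut each row into six 2-character chunks -> N-by-6 cell array
chunks = mat2cell(c, ones(size(c, 1), 1), 2 * ones(1, 6))
```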


Leslie on 23 Apr 2020

Thanks, interesting usage that I didn't know about.

But I don't really want it in datetime format; I'd like the 2-digit text chunks. If I've got to clutter up my code with sending it to datetime & back, I might as well write the stupid loop. (I'm not meaning to be cranky at you; I'm just cranky that I spent a few hours today poring over documentation and Answers to do something that it seems I ought to be able to do!)
