Looking for a regular expression to parse sparse data

Hi,
I have a sparse mass matrix exported from ANSYS, and the data looks as follows:
[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08
I'm looking to put this into a dense matrix, but it would be good enough to store all the items in an N-by-3 cell array with columns x, y, data, where the regular expression reads to the end of the file.
I would then search the cell array for the largest indices (x, y), initialize a matrix of that size, and copy the data over from the cell array to the matrix.
Is this possible?

Accepted Answer

This uses one regexp call to split the data at each opening bracket, reads the pieces with sscanf, and then uses the reshape function in the ‘Out’ assignment so that each entry occupies one column. It may not be exactly what you intended (I doubt that is possible), however it has the virtue of producing the desired result:
M = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08';
V = regexp(M, '\[', 'split');
R = sscanf([V{:}], '%d,%d]: %f');
Out = reshape(R, 3, []);
with:
FirstFiveColumns = Out(:,1:5)
producing:
FirstFiveColumns =
1 1 1 1 1
1 4 7 10 13
1.157e-07 2.332e-08 2.146e-08 5.835e-08 4.043e-08
with ‘x’ being the first row, ‘y’ being the second row, and the floating-point variables (I have no idea what they represent) the third row.
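If a pure regular-expression route is preferred, as the question asked, capture groups can pull out the three fields directly. The following is only a sketch under stated assumptions: the pattern `pat` and the variable names are illustrative and not part of the answer above, and the sample string holds just three entries taken from the question's data.

```matlab
% Short sample in the same format as the question (three entries only)
M = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 25]:-3.475e-08';

% Three capture groups: x index, y index, signed value in e-notation
pat = '\[\s*(\d+)\s*,\s*(\d+)\s*\]:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)';
tok = regexp(M, pat, 'tokens');   % 1-by-N cell; each element is a 1-by-3 cell of char vectors

C = vertcat(tok{:});              % N-by-3 cell array of char vectors: x, y, data
vals = str2double(C);             % N-by-3 numeric matrix with the same layout
```

Here `C` is the N-by-3 cell array described in the question, and `vals` is its numeric equivalent. For very large inputs the sscanf approach is likely to be faster, since it avoids creating one cell per field.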

6 Comments

Stephen23 on 14 Nov 2020
Edited: Stephen23 on 14 Nov 2020
Without regexp or reshape, sscanf can parse it directly:
format long
str = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08';
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 52×3
1.000000000000000 1.000000000000000 0.000000115700000
1.000000000000000 4.000000000000000 0.000000023320000
1.000000000000000 7.000000000000000 0.000000021460000
1.000000000000000 10.000000000000000 0.000000058350000
1.000000000000000 13.000000000000000 0.000000040430000
1.000000000000000 16.000000000000000 0.000000010110000
1.000000000000000 19.000000000000000 0.000000008211000
1.000000000000000 22.000000000000000 0.000000025900000
1.000000000000000 25.000000000000000 -0.000000034750000
1.000000000000000 28.000000000000000 -0.000000028540000
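Once the triplets are in an N-by-3 numeric matrix, the matrix the question asks for can be assembled without searching for the largest indices by hand. A minimal sketch, assuming the [x, y, data] column layout shown above; the three-row `mat` here is a stand-in built from the first entries of the question's data, and `S` and `D` are illustrative names.

```matlab
% Stand-in triplets in [x, y, data] layout (first three entries of the question's data)
mat = [1 1 1.157e-07; 1 4 2.332e-08; 1 7 2.146e-08];

% sparse(i, j, v) sizes the result from max(i) and max(j) automatically
S = sparse(mat(:,1), mat(:,2), mat(:,3));

% Convert to a dense matrix only if the full size fits in memory
D = full(S);
```

Note that sparse sums the values of any repeated (x, y) pair, which may or may not be what is wanted if the export contains duplicates.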
Tyler on 3 Jan 2021
Hi, both of the answers above work if I have the data in a 'string'. However, if I import from a text file with Mfile = fileread('brg1_m.dat'); it comes in as a 1x270000000 character vector. I wasn't sure whether the size of the vector was the issue, so I used just the first 1000 characters, and it still won't work.
Is there a way to convert a character vector into a string? I am using R2016b.
Thanks a lot!
It would be easier to attempt to solve this if ‘brg1_m.dat’ were uploaded so we could work with it. There may be better ways to import it.
With respect to compatibility, the detectImportOptions function could be important here, and since it was introduced in R2016b, you should have it.
Be sure to download and install any Updates if available (I don’t remember what version/release those began with) so that you have the most current version of R2016b.
"both of the answers above work if i have the data in a 'string'. However... it comes in as a 1x270000000 character vector. ... it still wont work."
I very much doubt that it would make any difference.
The code in my comment already uses a character vector, not a string. Using the equivalent string would give exactly the same output, because sscanf accepts either a character vector or a string scalar; it makes zero difference. Let's try it:
Character vector:
str = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08'; % char vector
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 2×3
1.0000 1.0000 0.0000
1.0000 4.0000 0.0000
String:
str = "[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08"; % string (double-quoted literals require R2017a or later)
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 2×3
1.0000 1.0000 0.0000
1.0000 4.0000 0.0000
Most likely your character vector does not have the exact format that you showed us in your original question, e.g. it contains some leading characters or non-printing characters, or differs in some other way. Both Star Strider's and my code rely on the input having the exact format that you showed in your question.
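One way to check for such a format mismatch is to request the extra outputs of sscanf, which report how many values were read and where scanning stopped. This is only a sketch: the header text in `str` is made up for illustration, and the actual contents of the file are not known here.

```matlab
% Hypothetical input: one made-up header line ahead of the data
str = sprintf('MASS MATRIX\n[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08');

% n = number of values read, errmsg = reason for stopping,
% nextindex = position in str where scanning stopped
[A, n, errmsg, nextindex] = sscanf(str, '[%d,%d]:%f ', [3, Inf]);

fprintf('values read: %d, stopped at character %d\n', n, nextindex);
disp(errmsg)

% Character codes of the first few characters reveal non-printing characters
disp(double(str(1:10)))
```

If `n` is zero or smaller than expected, inspecting the text around `nextindex` usually shows what the format specification failed to match.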
Tyler on 3 Jan 2021
Thank you, this is correct. There was one line of header in the file.
Thanks so much!
As always, my pleasure!
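For a file with a single header line, as turned out to be the case in this thread, one possible approach is to drop everything up to the first newline before calling sscanf. The sketch below is self-contained and rests on assumptions: it writes its own small stand-in file with a made-up header, because the real ‘brg1_m.dat’ was never posted, and the variable names are illustrative.

```matlab
% Create a small stand-in file; the header text is made up
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'HEADER LINE\n[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08\n[ 2, 2]: 5.625e-08\n');
fclose(fid);

txt = fileread(fname);                 % whole file as one character vector
nl = find(txt == sprintf('\n'), 1);    % position of the first newline (end of the header)
body = txt(nl+1:end);                  % everything after the header line

% Same format specification as in the comments above; result is N-by-3: [x, y, data]
mat = sscanf(body, '[%d,%d]:%f ', [3, Inf]).';

delete(fname);                         % remove the stand-in file
```

The trailing space in the format specification matches any whitespace, including newlines, so entries spread over several lines are read the same way as entries on one line.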


Release: R2016b
Asked: 13 Nov 2020
Commented: 4 Jan 2021
