Question about reading in text files: alternatives

Brian on 21 Feb 2013
Hello, thanks for reading this.
I wrote a reader for importing ANSYS mesh files, but in my opinion it's a bit inelegant. What I do is read the file, store every line as a string, and then parse through the stored lines for identifiers (like point and connectivity information). It works, but it is slow. Any file around 1 MB loads slowly, and larger files take disproportionately longer.
Is there a better way of doing this? I currently open the files and parse every line into a string with the commands:
function [Points, vFaceMx] = getPointsAndFacesforMESH(fileName)
wb2 = waitbar(0,'Loading Mesh');
filename = fileName;
% First pass: count the lines
fid = fopen(filename, 'rt');
nLines = 0;
while (fgets(fid) ~= -1),
    nLines = nLines+1;
end
fclose(fid);
% Second pass: write all lines as strings into the char matrix A
fid = fopen(filename, 'rt');
A = [];
ct = 0;
while feof(fid) == 0
    tline = fgetl(fid);
    A_c = size(A, 2);
    t_c = size(tline, 2);
    % Pad whichever is narrower so the widths match
    if A_c > t_c
        tline = [tline, NaN(size(tline, 1), A_c-t_c)];
    end
    if A_c < t_c
        A = [A, NaN(size(A, 1), t_c-A_c)];
    end
    A = [A; tline];   % A grows by one row per line read
end
fclose(fid);
And from there, I parse through using strcmp commands. I load the data I want into data arrays of strings, then I use sscanf commands to bring it back into numerical data.
Any advice would be appreciated.
  6 comments
Cedric on 25 Feb 2013
Edited: Cedric on 25 Feb 2013
As mentioned above, the best way to discuss the method is certainly to paste part of the file (e.g. the first 20-40 rows) below the original question. When you have a text file, what you read is most often strings, so there is no need to perform a translation to string. If you look at the class of tline right after the call to fgetl(), you will see that it is char. The only thing that you need to do in principle is to parse the lines that you read and extract their content as string/integer/double/etc. There are several ways to achieve this. As mentioned, for most simple cases where lines have a simple, regular structure, fscanf()/sscanf() will be fine; for more complicated cases, regular expressions [regexp()] are usually an invaluable tool when available.
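A minimal sketch of the two approaches, using lines copied from the sample mesh file in the accepted answer. The variable names are made up for illustration, and treating the header fields as hexadecimal is an assumption based on the letters a-f that appear in the sample.

```matlab
% Lines read with fgetl() are already char, so they can be parsed directly.

% Regular, purely numeric line: sscanf() is enough.
coordLine = '6.6666666667e-001 3.3333333333e-001';
xy = sscanf(coordLine, '%f %f').';       % 1x2 double row vector

% Less regular line: capture the section index and the header with
% regexp(), then convert the captured tokens.
headerLine = '(13(4 a c 14 0)( 2 1 9 0 9 2 9 a 0 8 2 a 2 0 7 ))';
tok = regexp(headerLine, '^\((\d+)\s*\(([0-9a-fA-F ]+)\)', 'tokens', 'once');
sectionId = str2double(tok{1});          % leading section index
% Assumption: header fields are hexadecimal numbers separated by spaces.
fields = hex2dec(strsplit(strtrim(tok{2}), ' ')).';
```

sscanf() suits the coordinate lines because every field has the same type; the regular expression is only needed where parentheses and mixed content get in the way.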
Morteza on 25 Feb 2013
Edited: Morteza on 25 Feb 2013
str2doubleq.cpp
This function is really fast at converting string data to numerical data. You can download it here and use it according to its description.


Accepted Answer

per isakson on 25 Feb 2013
Edited: per isakson on 25 Feb 2013
Some comments:
  • I assume it is a text file that resembles the example below
  • I guess that line breaks are not really significant
  • the first while-loop counts the lines; is that needed?
  • in the second while-loop A grows on every iteration, which is bad for performance
  • the lines are padded with char(0); a space, char(32), is "more standard"
  • I assume your file fits in memory (RAM)
  • the example code below with textscan returns A1, which is identical to the A returned by getPointsAndFacesforMESH, with the exception of the padding with char(32).
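% filespec is the path to the mesh file
% 1) Read the whole file into one string
tic,
str = fileread( filespec );
et = toc;
% 2) Read the file line by line into a cell array, then convert to a char matrix
tic,
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A1 = char( cac{1} );
et = [ et, toc ];
% 3) The original reader, for comparison
tic,
[ A2, ~ ] = getPointsAndFacesforMESH( filespec );
et = [ et, toc ];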
Sample text file
(0 "GAMBIT to Fluent File")
(0 "Dimension:") (2 2)
(10 (0 1 10 1 2)) (10 (1 1 10 1 2)(
0.0000000000e+000 1.0000000000e+000
1.0000000000e+000 1.0000000000e+000
0.0000000000e+000 0.0000000000e+000
1.0000000000e+000 0.0000000000e+000
1.0000000000e+000 3.3333333333e-001
1.0000000000e+000 6.6666666667e-001
0.0000000000e+000 6.6666666667e-001
0.0000000000e+000 3.3333333333e-001
3.3333333333e-001 1.0000000000e+000
6.6666666667e-001 1.0000000000e+000
3.3333333333e-001 0.0000000000e+000
6.6666666667e-001 0.0000000000e+000
6.6666666667e-001 3.3333333333e-001
6.6666666667e-001 6.6666666667e-001
3.3333333333e-001 3.3333333333e-001
3.3333333333e-001 6.6666666667e-001 ))
(0 "Faces:") (13(0 1 18 0))
(13(3 1 9 3 0)
( 2 1 7 9 0 2 7 8 6 0 2 8 3 3 0 2 3 b 3 0 2 b c 2 0 2 c ... 6 4 0 2 6 2 7 0 ))
(13(4 a c 14 0)( 2 1 9 0 9 2 9 a 0 8 2 a 2 0 7 ))
(13(6 d 18 2 0)
( 2 d c 1 2 2 5 d 1 4 2 f b 2 3 2 d f 2 5 2 f 8 3 6 2 e ... 7 8 2 9 10 8 9 ))
(0 "Cells:") (12 (0 1 9 0)) (12 (2 1 9 1 3))
(0 "Zones:") (45 (2 fluid fluid)())
(45 (3 wall new_wall.4)())
(45 (4 mass-flow-inlet wall.4)())
(45 (6 interior default-interior)())
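Once the whole file is in a single string, a block such as the point coordinates can be pulled out without touching individual lines. The following is only a sketch: it assumes the coordinates sit between a "(10 (...)(" header and the next "))", as in the sample above, and that the mesh is two-dimensional. The string str stands in for the output of fileread() and holds a shortened copy of the sample so that the snippet runs on its own.

```matlab
% Shortened stand-in for str = fileread( filespec );
str = sprintf('%s\n', ...
    '(0 "Dimension:") (2 2)', ...
    '(10 (0 1 10 1 2)) (10 (1 1 10 1 2)(', ...
    '0.0000000000e+000 1.0000000000e+000', ...
    '1.0000000000e+000 1.0000000000e+000', ...
    '3.3333333333e-001 6.6666666667e-001 ))', ...
    '(0 "Faces:") (13(0 1 18 0))');

% Capture everything between "(10 (<header>)(" and the closing "))".
% [^()]* also spans line breaks, so the block may cover many lines.
block = regexp(str, '\(10\s*\([^()]*\)\s*\(([^()]*)\)\)', 'tokens', 'once');

% Convert the captured text to numbers in one call.
% Assumption: two coordinates per point (2-D mesh).
Points = sscanf(block{1}, '%f', [2, Inf]).';   % one row per point
```

A single sscanf() call over the whole block avoids both the per-line loop and the growing array.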
  3 comments
Brian on 25 Feb 2013
Wow, I just tried this, and it's amazing how much faster this is. Thanks a lot. I'm going to look more into these lines in my own time:
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A = char( cac{1} );
because these seem to contain all the magic. The bottleneck in my code is now the visualization of the mesh, which is to be expected of MATLAB.
Thanks a lot!
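For anyone else looking into those four lines, here is a small sketch that runs them on a temporary file and shows what each step produces. The scratch file and its three lines are made up for illustration; only the fopen/textscan/fclose/char sequence comes from the answer.

```matlab
% Write a small scratch file (hypothetical content, for illustration only).
filespec = [tempname, '.txt'];
fid = fopen(filespec, 'w');
fprintf(fid, '%s\n', '(0 "Dimension:") (2 2)', '0.0 1.0', ...
    '(45 (3 wall new_wall.4)())');
fclose(fid);

% The four lines from the answer.
fid = fopen(filespec, 'r');
cac = textscan(fid, '%[^\n]');  % %[^\n] reads characters up to the next newline
fclose(fid);
A = char(cac{1});               % char() pads shorter rows with spaces

delete(filespec);               % clean up the scratch file

rows = cac{1};                  % cell array, one char row vector per line
```

textscan reads the whole file in one call and returns the lines in a cell array, so nothing grows inside a MATLAB loop; char() then builds the padded matrix in a single step.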
per isakson on 26 Feb 2013
Edited: per isakson on 26 Feb 2013
"A in the second while-loop A is growing," [sic]
Search for "preallocating memory" in the help. Doc says:
Preallocating Memory
Repeatedly expanding the size of an array over time, (for example, adding more elements to it each time through a programming loop), can adversely affect the performance of your program. This is because
  • MATLAB has to spend time allocating more memory each time you increase the size of the array.
  • This newly allocated memory is likely to be noncontiguous, thus slowing down any operations that MATLAB needs to perform on the array.
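As a rough illustration of the point in the documentation, the sketch below builds the same array twice, once by growing it inside the loop and once by writing into a preallocated array. The size n and the variable names are arbitrary choices for the example, and the timings will differ from machine to machine.

```matlab
n = 2e4;                        % arbitrary number of rows for the comparison

% Growing: the array is reallocated and copied on every pass.
tic
grown = [];
for k = 1:n
    grown = [grown; k, k^2];    %#ok<AGROW>
end
tGrown = toc;

% Preallocating: allocate once, then write in place.
tic
prealloc = zeros(n, 2);
for k = 1:n
    prealloc(k, :) = [k, k^2];
end
tPrealloc = toc;

fprintf('growing: %.3f s, preallocated: %.3f s\n', tGrown, tPrealloc);
```

Both loops produce the same array; only the allocation pattern differs.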
Enough RAM
When working with files, it makes a big difference whether the file fits in the system cache. See the Windows Task Manager.
