Question about reading in text files: alternatives
Hello, thanks for reading this.
I wrote a reader for importing ANSYS mesh files, but in my opinion it's a bit inelegant. What I do is read the file, store all lines as strings, and then parse through them for identifiers (like point and connectivity information). It works, but it is slow. Any file around 1 MB loads slowly, and for anything larger the load time grows much faster than the file size.
Is there a better way of doing this? I currently open the files and parse every line into a string with the commands:
function [Points, vFaceMx] = getPointsAndFacesforMESH(fileName)
wb2 = waitbar(0,'Loading Mesh');
filename=fileName;
fid = fopen(filename, 'rt');
nLines = 0;
while (fgets(fid) ~= -1),
nLines = nLines+1;
end
fclose(fid);
fid = fopen(filename, 'rt');
A=[];
ct = 0;
%%Write all lines as strings
while feof(fid) == 0
tline = fgetl(fid);
A_c=size(A, 2);
t_c=size(tline, 2);
if A_c > t_c
tline=[tline, NaN(size(tline, 1), A_c-t_c)];
end
if A_c < t_c
A=[A, NaN(size(A, 1), t_c-A_c)];
end
A = [A; tline];
end
fclose(fid);
And from there, I parse through using strcmp commands. I load the data I want into data arrays of strings, then I use sscanf commands to bring it back into numerical data.
Any advice would be appreciated.
6 comments
Cedric
on 25 Feb 2013
Edited: Cedric
on 25 Feb 2013
As mentioned above, the best way to discuss the method is certainly to paste part of the file (e.g. the first 20-40 rows) below the original question. When you have a text file, what you read is most often strings, so there is no need to perform a translation to string. If you look at the class of tline right after the call to fgetl(), you will see that it is char. The only thing that you need to do in principle is parse the lines that you read and extract their content as string/integer/double/etc. There are several ways to achieve this. As mentioned, for most simple cases where lines have a simple, regular structure, fscanf()/sscanf() will be fine; for more complicated cases, regular expressions [regexp()] are usually an invaluable tool when available.
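A minimal sketch of the two parsing routes mentioned above, assuming lines shaped like those in this kind of mesh file. The two example lines and the variable names are hypothetical, and reading the header fields as hexadecimal is an assumption based on the hex-looking indices in the sample data:

```matlab
% Hypothetical example lines, modeled on the mesh format discussed here.
headerLine = '(10 (1 1 10 1 2)(';
coordLine  = '3.3333333333e-001 1.0000000000e+000';

% Regular structure: sscanf converts whitespace-separated numbers directly.
xy = sscanf(coordLine, '%f').';      % 1-by-2 double row vector

% Less regular structure: regexp tokens pull the fields out of the header.
tok = regexp(headerLine, '^\((\w+)\s+\((\w+)\s+(\w+)\s+(\w+)', 'tokens', 'once');
sectionId  = str2double(tok{1});     % section identifier, e.g. 10 for points
firstIndex = hex2dec(tok{3});        % assumption: indices are hexadecimal
lastIndex  = hex2dec(tok{4});
```

Neither call needs a conversion step beforehand, because the line returned by fgetl() is already char.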
Accepted answer
per isakson
on 25 Feb 2013
Edited: per isakson
on 25 Feb 2013
Some comments:
- I assume it is a text file that resembles the example below
- I guess that line-breaks are not really significant
- the first while-loop counts the lines - is that needed?
- in the second while-loop A is growing, which is bad for performance
- the lines are padded with char(0) - space char(32) is "more standard"
- I assume your file fits in memory (ram)
- the example code below with textscan returns A1, which is identical to the A built by getPointsAndFacesforMESH, except that it is padded with char(32).
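% filespec is the path to the mesh file
% 1) read the whole file into one char row vector
tic,
str = fileread( filespec );
et = toc;
% 2) read all lines with textscan, then pad them into a char matrix
tic,
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A1 = char( cac{1} );   % char() pads shorter lines with spaces
et = [ et, toc ];
% 3) for comparison, the original line-by-line reader
tic,
[ A2, ~ ] = getPointsAndFacesforMESH( filespec );
et = [ et, toc ];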
Sample text file
(0 "GAMBIT to Fluent File")
(0 "Dimension:") (2 2)
(10 (0 1 10 1 2)) (10 (1 1 10 1 2)(
0.0000000000e+000 1.0000000000e+000
1.0000000000e+000 1.0000000000e+000
0.0000000000e+000 0.0000000000e+000
1.0000000000e+000 0.0000000000e+000
1.0000000000e+000 3.3333333333e-001
1.0000000000e+000 6.6666666667e-001
0.0000000000e+000 6.6666666667e-001
0.0000000000e+000 3.3333333333e-001
3.3333333333e-001 1.0000000000e+000
6.6666666667e-001 1.0000000000e+000
3.3333333333e-001 0.0000000000e+000
6.6666666667e-001 0.0000000000e+000
6.6666666667e-001 3.3333333333e-001
6.6666666667e-001 6.6666666667e-001
3.3333333333e-001 3.3333333333e-001
3.3333333333e-001 6.6666666667e-001 ))
(0 "Faces:") (13(0 1 18 0))
(13(3 1 9 3 0)
( 2 1 7 9 0 2 7 8 6 0 2 8 3 3 0 2 3 b 3 0 2 b c 2 0 2 c ... 6 4 0 2 6 2 7 0 ))
(13(4 a c 14 0)( 2 1 9 0 9 2 9 a 0 8 2 a 2 0 7 ))
(13(6 d 18 2 0)
( 2 d c 1 2 2 5 d 1 4 2 f b 2 3 2 d f 2 5 2 f 8 3 6 2 e ... 7 8 2 9 10 8 9 ))
(0 "Cells:") (12 (0 1 9 0)) (12 (2 1 9 1 3))
(0 "Zones:") (45 (2 fluid fluid)())
(45 (3 wall new_wall.4)())
(45 (4 mass-flow-inlet wall.4)())
(45 (6 interior default-interior)())
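If line breaks are indeed not significant, the points section can be converted in one step from the text returned by fileread. The following is a rough sketch under that assumption; the shortened four-point section, the regular expression, and the two-column (2-D) layout are illustrative choices and are not taken from the original reader:

```matlab
% Stand-in for str = fileread(filespec): a shortened, hypothetical points section.
str = sprintf(['(10 (1 1 4 1 2)(\n', ...
               '0.0000000000e+000 1.0000000000e+000\n', ...
               '1.0000000000e+000 1.0000000000e+000\n', ...
               '0.0000000000e+000 0.0000000000e+000\n', ...
               '1.0000000000e+000 0.0000000000e+000 ))\n']);

% Capture the text between the "(" that follows the section header
% and the closing "))".
body = regexp(str, '\(10\s+\([^()]*\)\s*\(([^()]*)\)\s*\)', 'tokens', 'once');

% sscanf treats line breaks as whitespace, so one call converts everything.
Points = sscanf(body{1}, '%f', [2, Inf]).';   % one row per point: [x y]
```

The pattern skips a declaration such as `(10 (0 1 10 1 2))`, because no second opening parenthesis follows its header.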
3 comments
per isakson
on 26 Feb 2013
Edited: per isakson
on 26 Feb 2013
"A in the second while-loop A is growing," [sic]
Search for "preallocating memory" in the help. Doc says:
Preallocating Memory
Repeatedly expanding the size of an array over time, (for example, adding more elements to it each time through a programming loop), can adversely affect the performance of your program. This is because
- MATLAB has to spend time allocating more memory each time you increase the size of the array.
- This newly allocated memory is likely to be noncontiguous, thus slowing down any operations that MATLAB needs to perform on the array.
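One way to apply this to a line-by-line reader is to count the lines first, size a cell array once, and pad to a char matrix only at the end. This is a sketch with a hypothetical temporary file and variable names, not the original poster's function:

```matlab
% Write a small temporary file so the sketch is self-contained.
fname = [tempname, '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', '(0 "Dimension:") (2 2)', '0.0 1.0', '1.0 1.0');
fclose(fid);

% First pass: count lines so the container can be sized once.
fid = fopen(fname, 'rt');
nLines = 0;
while ischar(fgetl(fid))
    nLines = nLines + 1;
end
frewind(fid);

% Second pass: fill the preallocated cell array; nothing grows in the loop.
txt = cell(nLines, 1);
for k = 1:nLines
    txt{k} = fgetl(fid);
end
fclose(fid);
delete(fname);

% Pad to a rectangular char matrix in a single step (pads with spaces).
A = char(txt);
```

Here the counting pass has a purpose, since its result sets the size of the cell array before any line is stored.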
enough RAM
When working with files, it makes a big difference whether the file fits in the system cache. See the Windows Task Manager.