Extract values from a text document

Sebastian on 12 Apr 2012
Hi all,
I'd like to scan a text document and collect all values that follow a certain string. The file consists of a header section (numeric values and character strings) followed by numeric values in matrix form. The header recurs at irregular intervals, not after a fixed number of rows.
To be clearer, here is an example of the text file I want to process:
ITEM: TIMESTEP
1
ITEM: NUMBER OF ATOMS
1000
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
[1......
2......
.......
999....
1000...]
ITEM: TIMESTEP
2
ITEM: NUMBER OF ATOMS
1005
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
[1......
2......
.......
1004...
1005...]
... and so on...
I'd like to extract the number of atoms at each timestep. That is, I want to create an array that stores all the values following the string
"ITEM: NUMBER OF ATOMS"
in the text document (in the example, the values 1000 and 1005).
How can I do that?
Thanks very much for your help! Regards,
Sebastian

Answers (1)

Ken Atwell on 12 Apr 2012
For a custom file type like this, I would use a regular expression (the MATLAB function regexp) to scan the file. regexp can be a little daunting to the uninitiated, so here is a little code to get you started.
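%% Read the data file
f = fopen('atomdata.txt');
t = fread(f, 'char=>char');   % read the whole file as a column of characters
t = t';                       % transpose to a row vector, as regexp expects
fclose(f);
%% Scan for atom counts
% \W+ skips the line break after the marker; ([0-9]+) captures the count
numAtoms = regexp(t, 'ITEM: NUMBER OF ATOMS\W+([0-9]+)', 'tokens')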
This will give you a cell array with one nested cell per match, each holding the matched digits as text. You can flatten it with [numAtoms{:}] and convert the result to double using str2double or similar.
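As a minimal sketch of that conversion step: with the 'tokens' option, regexp returns one nested cell per match, so the cells need flattening before str2double is applied. The sample text below is a shortened, made-up stand-in for the file contents (an assumption for illustration); with a real file, t would come from the read step above. The pattern uses \s+ and \d+ in place of \W+ and [0-9]+, which match the same text here.

```matlab
% Shortened stand-in for the file contents (illustrative data, not a real file).
t = sprintf(['ITEM: TIMESTEP\n1\nITEM: NUMBER OF ATOMS\n1000\n', ...
             'ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n', ...
             'ITEM: TIMESTEP\n2\nITEM: NUMBER OF ATOMS\n1005\n']);

% Each match yields a 1x1 cell holding the captured digits as text.
tokens = regexp(t, 'ITEM: NUMBER OF ATOMS\s+(\d+)', 'tokens');

% Flatten the nested cells, then convert the text to doubles.
numAtoms = str2double([tokens{:}]);
```

Here numAtoms ends up as a numeric row vector with one entry per header block.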
2 Comments
Sebastian on 12 Apr 2012
Hm, that sounds very complicated.
I think there should be an easier solution.
Let me try once more to explain. My problem is not the logic of the method for retrieving the values after the string "ITEM: NUMBER OF ATOMS". My problem is rather how to deal with the text file format...
Here is an example text file:
ITEM: TIMESTEP
10
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
20
ITEM: NUMBER OF ATOMS
5
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
30
ITEM: NUMBER OF ATOMS
4
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
40
ITEM: NUMBER OF ATOMS
7
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
50
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
From this file I want to create an array that holds the values
array = [3,5,4,7,2]
To do that, I think I need some kind of method that first picks the value in line 4 (which is 3). From that value one can calculate the position of the next one: line 4 + 8 header lines + 3 lines of values = line 15, so the next value is 5, and so on...
I need a method along these lines:
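array = [];   % start empty so that no dummy 0 ends up in the result
for i = 1:5
    % 4 = line of the first count, 8 = header lines per block, sum(array) = value lines so far
    array = [array; (value of line(4+(i-1)*8+sum(array)))]
end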
OK, but how should I process that text file?
I think I could do it with a lot of dlmread commands, but that would be very costly if the files become very large...
Do you have another hint for me?
Thanks and kind regards,
Sebastian
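The line-offset idea above can be sketched by splitting the text into lines once and then jumping from one count to the next. This is only a sketch under the assumptions stated in the comment: every block has exactly 8 header lines and one data row per atom. The sample text is a shortened stand-in built in code, and the names header, row, txt, fileLines and counts are hypothetical.

```matlab
% Shortened stand-in for the file: two blocks with 3 and 2 atoms (illustrative data).
header = @(step, n) sprintf(['ITEM: TIMESTEP\n%d\nITEM: NUMBER OF ATOMS\n%d\n', ...
                             'ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n'], step, n);
row = sprintf('1 2 3 4 5 6 7 8 9\n');
txt = [header(10, 3), repmat(row, 1, 3), header(20, 2), repmat(row, 1, 2)];

% Split into lines once; with a real file, txt could come from fileread.
fileLines = regexp(strtrim(txt), '\r?\n', 'split');

counts = [];
idx = 4;                                  % the first count sits on line 4
while idx <= numel(fileLines)
    n = str2double(fileLines{idx});
    counts(end+1) = n;                    %#ok<AGROW>
    idx = idx + 8 + n;                    % next count: 8 header lines plus n data rows later
end
```

This only visits one line per block, but it breaks silently if a block ever deviates from the assumed layout, whereas searching for the marker string does not depend on the row counts.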
Ken Atwell on 12 Apr 2012
You can use fgetl in a loop to read the file line by line, looking for "NUMBER OF ATOMS", knowing that the following line is the piece of data you are looking for.
I still contend that regexp will get you what you're looking for, probably in one line of code and certainly without a loop.
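A minimal sketch of the fgetl loop described above, for comparison with the regexp approach. It writes a small sample file first so that it is self-contained; the file contents and the variable names (fname, tline, numAtoms) are assumptions for illustration, and a real script would open the existing data file instead.

```matlab
% Write a small sample file so the sketch is self-contained (illustrative data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ITEM: TIMESTEP\n10\nITEM: NUMBER OF ATOMS\n3\n');
fprintf(fid, 'ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n');
fprintf(fid, '1 2 3\n1 2 3\n1 2 3\n');
fprintf(fid, 'ITEM: TIMESTEP\n20\nITEM: NUMBER OF ATOMS\n5\n');
fclose(fid);

% Read line by line; the line after each marker holds the atom count.
numAtoms = [];
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                       % fgetl returns -1 at end of file
    if strcmp(strtrim(tline), 'ITEM: NUMBER OF ATOMS')
        nextLine = fgetl(fid);            % the count is on the following line
        if ischar(nextLine)
            numAtoms(end+1) = str2double(nextLine); %#ok<AGROW>
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                            % remove the temporary sample file
```

The loop never holds more than one line in memory, which may matter for very large files; the regexp version reads the whole file at once but needs no loop.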
