MATLAB Answers

Reading specific data from formatted txt files - looks very difficult

djr on 2 Aug 2014
Commented: djr on 5 Aug 2014
Hi,
I have 15 ASCII files. The file names are 1948_1950, 1950_1955, 1955_1960, 1960_1965, ..., 2010_2014 (all files except the first and the last have a 5-year span in the name). The 15th file is Kos.txt, which has only dates and hours from 1948 to 2010 (but not all dates in that period). I've attached the 1948_1950.txt and Kos.txt files.
You'll see that files with year in their name have year and time next to the word 'CENTERS' when you open them. So, the first file has 480101 0000 indicating date Jan 01, 1949 (date format is 'yymmdd') at hour 00. About 40 lines below is the following line:
"k io lon lat f c dp rd zs up vp lonv latv".
I need the data that are below that line; in this case it would be:
"1 11 63.14 50.20 ..." etc.
If you go through the file you'll see that pattern. Date and time are next to the word 'CENTERS', and about 40 lines below are the data that I need (always below the line that starts with "k io lon lat ...").
However, there are three additional problems:
  1. I need this information only for the dates and hours specified in the Kos.txt file
  2. The date formats in Kos.txt and in the files with years in their name are not the same
  3. The date format in the files with years in their name is 'yymmdd', but for the years 2000-2010 the leading zeros are missing: Jan 1st, 2000 appears as 101 instead of 000101.
I know that this might be very challenging, but I would very much appreciate help.
Thanks in advance, Djordje

  4 Comments

djr on 3 Aug 2014
This is the last piece of the problem that I have. I tried, but not sure how to eliminate the lines that I don't need.
per isakson on 3 Aug 2014
Did you specify
  • in what form you want the result
  • the date format used in Kos.txt
dpb on 3 Aug 2014
As I showed you before, the easiest will likely be to read the whole file into memory and then select those wanted (or eliminate the unwanted).
The rest is pretty much as IA says: just more or less trivial grunt work of counting lines, creating format strings and using textscan and/or other I/O functions.
I don't see a piece of the puzzle that hasn't been addressed in one of the previous postings other than perhaps finding the given line. That's pretty much either
a) use a fixed headerlines count if the offset is fixed or
b) read line-by-line until you find the string. That is indeed pretty simple...
while ~feof(fid)
    l = fgetl(fid);
    if strfind(l,'a unique pattern in the target string'), break, end
end
If you need to find the number of lines down to the given one the first time, so that you can use headerlines later for multiple sections at a fixed (but initially unknown) separation, then just add a counter to the loop.
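A minimal sketch of the counter idea, assuming the target line contains a pattern that occurs nowhere else in the header. The demo file, the variable names (fname, nHeader, found) and the four-column format are made up for illustration; the real files have 13 columns.

```matlab
% Hypothetical demo file so that the sketch is self-contained.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line 1\nheader line 2\nk io lon lat\n1 11 63.14 50.20\n');
fclose(fid);

% Count the lines up to and including the one that holds the pattern.
pattern = 'k io lon lat';   % assumed to be unique within the header
nHeader = 0;
found = false;
fid = fopen(fname, 'r');
while ~feof(fid)
    l = fgetl(fid);
    nHeader = nHeader + 1;
    if ischar(l) && ~isempty(strfind(l, pattern))
        found = true;
        break
    end
end
fclose(fid);

% The count can now be reused as the 'HeaderLines' value for textscan.
fid = fopen(fname, 'r');
data = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
fclose(fid);
delete(fname);
```

If the separation between sections really is fixed, the count only has to be determined once and can then be reused for every section.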


Accepted Answer

per isakson on 3 Aug 2014
Edited: per isakson on 3 Aug 2014
I disagree, it's not that simple. Ok, it depends.
I've chosen to divide the task into two steps
  1. Read the data-file and put the required data into a containers.Map object. The object may be saved to a mat-file. More data can be added to the object later. There are methods with which one may inspect data interactively.
  2. Loop over the "keys" of the key-file and print result to the screen. It's a demo after all.
Questions on performance and memory usage are postponed.
Error handling and more remains, e.g. testing and documentation.
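For readers new to containers.Map, a small sketch of the operations mentioned in step 1 (adding entries, inspecting, saving to a mat-file). The keys and numbers are invented for illustration, and the mat-file is written to a temporary location.

```matlab
% Keys and values are made up for illustration.
lib = containers.Map('KeyType', 'char', 'ValueType', 'any');
lib('19490101T1200') = [1 0 59.6 43.6];
lib('19490102T0000') = [];              % a time step without data

% Inspect the object interactively.
allKeys  = keys(lib);                   % cell array with all keys
nEntries = lib.Count;                   % number of stored entries
hasKey   = isKey(lib, '19490101T1200'); % true if the key is present

% More data can be added later.
lib('19490103T0600') = [1 11 63.14 50.20; 2 11 70.00 48.50];

% Save the object to a mat-file and load it back.
matname = [tempname, '.mat'];           % temporary file, hypothetical location
save(matname, 'lib');
s = load(matname);
delete(matname);
```

Using 'any' as the value type is what allows each key to hold a matrix of a different size, or an empty array.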
Demo:
>> specific_data
Key: 19490101T0000, Data:
Key: 19490101T0600, Data:
Key: 19490101T1200, Data:
1.0e+03 *
Columns 1 through 9
0.0010 0 0.0596 0.0436 1.0379 -0.0002 0.0072 0.0093 0.1614
Columns 10 through 13
0.0006 0.0004 0.0560 0.0483
Key: 19490101T1800, Data:
Key: 19490102T0000, Data:
....
where
function specific_data
    key_filespec = 'h:\m\cssm\Kos.txt';
    met_filespec = 'h:\m\cssm\1948_1950.txt';
    % Read the data-file into a Map keyed by 'yyyymmddTHHMM' strings
    lib = containers.Map( 'KeyType', 'char', 'ValueType', 'any' );
    lib = met2lib( met_filespec, lib );
    % Read the wanted dates and hours from the key-file
    fid = fopen( key_filespec );
    cac = textscan( fid, '%s' );
    fclose(fid);
    for kk = 1 : length( cac{1} )
        % Convert the key-file date format to the format of the Map keys
        key = datestr( datevec( cac{1}(kk), 'ddmmyyyyHH' ) ...
                     , 'yyyymmddTHHMM' );
        if not(isrow( key ))
            keyboard    % drop into the debugger on an unexpected key
        end
        fprintf( '\nKey: %s, Data: \n', key )
        if isKey( lib, key )
            disp( lib( key ) )
        end
        pause(0.1)
    end
end
and
function lib = met2lib( filespec, lib )
    str = fileread( filespec );
    % Split the file into blocks, one per 'CENTRES:' header
    cac = strtrim( strsplit( str, 'CENTRES:' ) );
    cac(1) = [];    % drop the text before the first header
    for bb = 1 : length( cac )
        block_str = cac{bb};
        % Left-pad date and time with zeros to the width of 'yymmdd HHMM'
        datetime_str = repmat( '0', 1, 11 );
        str = strtrim( block_str(1:12) );
        datetime_str( end-length(str)+1 : end ) = str;
        % Pivot year 1940: two-digit years are read as 1940-2039
        timekey = datestr( datevec(datetime_str,'yymmdd HHMM',1940)...
                         , 'yyyymmddTHHMM' );
        colhead_xpr ...
        = 'k\s+io\s+lon\s+lat\s+f\s+c\s+dp\s+rd\s+zs\s+up\s+vp\s+lonv\s+latv\s+';
        % Everything after the column-header line is the numeric data
        str = regexp( block_str, ['(?<=',colhead_xpr,').+$'], 'match' );
        if not( isempty( str ) )
            num_val = str2num( str{:} );
        else
            num_val = [];
        end
        lib( timekey ) = num_val;
    end
end
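The date handling can be tried in isolation. The sketch below uses made-up input strings to show how the zero-padding restores '101' to '000101' and how the pivot year 1940 resolves two-digit years. The 'ddmmyyyyHH' format for Kos.txt is taken from specific_data above; the sample string for it is an assumption.

```matlab
% Date and time as they might appear in a 2000-2010 file (leading zeros lost).
raw = '101 0600';
padded = repmat('0', 1, 11);              % width of 'yymmdd HHMM'
padded(end-length(raw)+1 : end) = raw;    % right-align the text in the zeros

% Pivot year 1940: two-digit years are read as 1940-2039.
key2000 = datestr(datevec(padded, 'yymmdd HHMM', 1940), 'yyyymmddTHHMM');
key1948 = datestr(datevec('480101 0000', 'yymmdd HHMM', 1940), 'yyyymmddTHHMM');

% A Kos.txt style entry (assumed sample) converted to the same key format.
keyKos = datestr(datevec('0101194912', 'ddmmyyyyHH'), 'yyyymmddTHHMM');
```

Because both file types end up with keys in the same 'yyyymmddTHHMM' format, the lookup with isKey is a plain string comparison.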

  5 Comments

djr on 5 Aug 2014
Hi... I played a little bit with this code. It looks like it's a very sophisticated way of dealing with data sets. How can I now export the data to an array or to something that I can actually see? I figured out I could use:
valueSet = values(lib, {'key'}),
e.g. key = 19491020T0600. But it gives me a specification, like [3x13 double].
Also, is there a way to make it automatic for all the files that I have, or do I need to run it separately for each file? Thanks
per isakson on 5 Aug 2014
  • "But it gives me a specification, like [3x13 double]" This comment indicates that you badly need to do some getting-started exercises with the Matlab Desktop before you start experimenting with deeply nested cell arrays.
  • "I figured out" The MathWorks forbid me to use the acronym, RTFM. Even after 20+ years with Matlab I read the on-line help all the time.
  • "is there a way to make it automatic for all files" Yes, my code is the start of something automatic. But since you did not indicate how you will use the data, I just dumped it on the screen.
  • "very sophisticated" I tried to structure the code somewhat and I use regular expressions. Structured programming was regarded as sophisticated in the late seventies. My use of regular expressions might be sophisticated in the Matlab world.
  • My lib is way better than your eval.
/ not so humble
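On the [3x13 double] display: values returns a cell array, so the matrix has to be taken out of the cell, or the Map can be indexed directly with the key. A minimal sketch, with a made-up 3x3 matrix standing in for the real 3x13 data and a temporary output file. The commented loop at the end assumes met2lib from the answer is on the path and that the file list is adapted to the real names.

```matlab
% Stand-in for the lib built by met2lib; the key and values are made up.
lib = containers.Map('KeyType', 'char', 'ValueType', 'any');
lib('19491020T0600') = magic(3);

key = '19491020T0600';
A = lib(key);               % the numeric matrix itself
c = values(lib, {key});     % 1x1 cell array holding the matrix
B = c{1};                   % the same matrix, taken out of the cell

% Writing the matrix to a text file makes it viewable outside Matlab.
outname = [tempname, '.txt'];   % temporary file, hypothetical location
dlmwrite(outname, A, 'delimiter', '\t');
type(outname)
delete(outname);

% Filling one lib from several files (not run here):
% files = {'1948_1950.txt', '1950_1955.txt'};
% for ff = 1 : numel(files)
%     lib = met2lib(files{ff}, lib);
% end
```

Since met2lib takes the Map as an input and returns it, calling it in a loop keeps adding the blocks of each file to the same object.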
djr on 5 Aug 2014
Sorry if you are offended. As I said before, I just started using Matlab and I have to do this asap. I know that most of my questions are maybe even stupid but I have like 2 weeks to finish this and 2 weeks of Matlab experience so far.
Thanks... P.S. It's a way better...
