How to read grid data from a text file?
Hi, I have a text file (attached) that contains ozone data, and I am not able to read it because it is not in a regular format. Only the latitude is given (59.5S to 59.5N in 1.00 degree steps), and for each latitude all of the ozone data follow: 288 data points, one per longitude (179.375W to 179.375E in 1.25 degree steps). The problem is that all of the data are stored as strings of characters that have to be split after every 3 digits. There are also stray spaces in the middle of the data, which have to be dealt with first, otherwise the data will not split into the correct 3-digit groups.
Later I will use inpolygon to extract the data for a specific region, but first I need to read this text file and get the data out.
I hope that makes sense.
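For the later inpolygon step mentioned in the question, a minimal sketch could look like the following. It assumes the grid has already been parsed into a 120x288 matrix; the `ozone` values here are random placeholders and the polygon vertices are hypothetical, chosen only for illustration.

```matlab
% Synthetic stand-in for the parsed grid: 120 latitudes x 288 longitudes.
lat = -59.5:1:59.5;                      % 120 latitude centres
lon = -179.375:1.25:179.375;             % 288 longitude centres
ozone = rand(numel(lat), numel(lon));    % placeholder values, not real data

% Hypothetical region of interest as polygon vertices (lon, lat).
polyLon = [60 100 100 60];
polyLat = [ 5   5  40 40];

% Grid-point coordinates with the same shape as ozone.
[LON, LAT] = meshgrid(lon, lat);

% Logical mask of the grid points inside (or on the edge of) the polygon.
in = inpolygon(LON, LAT, polyLon, polyLat);

regionValues = ozone(in);                % column vector of values in the region
regionMean   = mean(regionValues);
```

Because `in` has the same size as `ozone`, it can be used directly as a logical index, so no loop over grid points is needed.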
Cedric
on 23 Sep 2017
Does this format have a name? Is it the original format in which the data is distributed?
Accepted Answer
More Answers (2)
- Read the file as a block of cellstr and convert it to a character array
- Reshape the 12x75 char array into a single 1x900 record: line=reshape(blk.',1,[]);
- Select the first 288*3 = 864 characters: c=line(1:864);
- Replace any blanks with '0': c=strrep(c,' ','0');
- Convert the 3-digit fields: dat=sscanf(c,'%3d');
- Go on to the next block
Thanks to Cedric for pointing out my weak eyes... :)
file=textread('tropo.txt','%s','delimiter','\n','whitespace','','headerlines',3); % file as cellstr array
L=length(file);                  % number of lines/records in file
data=zeros(L/12,288);            % preallocate for resulting data
j=0;                             % counter for data blocks
for i=1:12:L                     % loop over blocks of 12 records
  blk=char(file(i:i+11));        % retrieve a block, convert to character array
  blk(:,1)=[];                   % remove the leading blank on every row
  line=reshape(blk.',1,[]);      % recast the 12x75 block as one 1x900 record
  line=line(1:864);              % truncate to the 288*3 data characters
  line=strrep(line,' ','0');     % replace blanks with leading 0
  j=j+1;                         % increment counter
  data(j,:)=sscanf(line,'%3d');  % convert to numeric
end
results in a double array containing the data...
From the first block I tested at command line--
>> whos data
  Name        Size            Bytes  Class     Attributes
  data      288x1              2304  double
>>
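The loop body above can be checked without the attached file. The following is a minimal sketch on one synthetic block, assuming the layout described in this thread (12 rows, one leading blank per row, 75 data characters per row, 864 meaningful characters); the values in `vals` are made up for the test.

```matlab
% Build one synthetic 12-row block from known values.
vals = mod((1:288) * 37, 1000);          % synthetic values, some below 100
txt  = sprintf('%3d', vals);             % 864 characters, blank-padded fields
txt(end+1:900) = ' ';                    % pad to 12 rows of 75 characters
blk  = [repmat(' ', 12, 1), reshape(txt, 75, 12).'];  % add the leading blank

% Parse the block the same way as the loop body above.
blk(:, 1) = [];                          % drop the leading blank column
rec = reshape(blk.', 1, []);             % one 900-character record
rec = strrep(rec(1:864), ' ', '0');      % keep the data, zero-fill the blanks
parsed = sscanf(rec, '%3d').';           % 1x288 row of doubles

assert(isequal(parsed, vals))            % round trip recovers the input
```

In the real file the last row of each block also carries the latitude label, which is why the record is truncated to 864 characters before conversion.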
Hi dpb, what is wrong is that the first space on each row is kept (and these spaces do not stand in for zeros):
>> line
line =
272261205193193211193200204227268294316326298236262382369158256314279182191 305336...
^ ^
Also, you pick up an extra space with your indexing that you don't see in the output, so you have 12 extra spaces for the 12 lines, plus a last one, which leads to the 13 that you find:
>> ['"', line(11*76+1:11*76+1+40), '"']
ans =
" 151155265249234198234225231216214241156 "
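The effect of keeping the row-leading blank can be reproduced on a tiny made-up example. This is only a sketch with two short synthetic rows (three fields each), not the real 75-character rows:

```matlab
vals = [272 261 205 305 336 41];          % synthetic values, 3 per row
rows = [' 272261205'; ' 305336 41'];      % each row starts with a blank

% Without removing column 1, the row-leading blanks land inside the record.
bad = reshape(rows.', 1, []);             % ' 272261205 305336 41'
bad = strrep(bad, ' ', '0');              % '02722612050305336041'
badVals = sscanf(bad, '%3d').';           % fields are shifted

% Removing the first column before reshaping restores the alignment.
good = rows(:, 2:end);
good = strrep(reshape(good.', 1, []), ' ', '0');   % '272261205305336041'
goodVals = sscanf(good, '%3d').';
```

Only the blank in ' 41' is genuine padding of a value; the blank at the start of each row is a separator and must be dropped rather than turned into a '0'.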
dpb
on 23 Sep 2017
Old eyes failed me... I had mistakenly thought char() had gotten rid of the leading space, but it hadn't... thanks.
Cedric
on 23 Sep 2017
My maybe younger eyes failed me too. I had to get tricked a couple times before I realized!
Guillaume
on 23 Sep 2017
Whoever created that format should be very ashamed. It's a pain to parse.
This is a start. I still need to figure out why I've got 292 columns instead of 288, but I've got to go.
filecontent = fileread('L3_tropo_ozone_column_jan14.txt'); %read it all
filecontent(ismember(filecontent, [10, 13])) = []; %remove line returns
longdesc = regexp(filecontent, 'Longitudes:\s*(\d+)\D+(\d+(\.\d+)?)([EW])\D+(\d+(\.\d+)?)([EW])', 'tokens', 'once'); %longitude description
longnumbers = str2double(longdesc([1 2 4]));
longnumbers(2:3) = longnumbers(2:3) .* (-1).^ strcmp(longdesc([3 5]), 'W'); %change sign for W
longitudes = linspace(longnumbers(2), longnumbers(3), longnumbers(1));
pointlats = regexp(filecontent, '\s+([0-9 ]+)lat\s*=\s*(-?\d+(\.\d+)?)', 'tokens'); %extract point strings and latitude
pointlats = vertcat(pointlats{:});
latitudes = str2double(pointlats(:, 2));
points = regexprep(pointlats(:, 1), '\s', '0'); %replace spaces with 0
points = regexp(points, '\d{3}', 'match'); %split in group of three
points = str2double(vertcat(points{:}));
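As a sanity check on the axes and on the column count, here is a short sketch that uses only the numbers quoted in this thread. The link between the 292 columns and the row-leading blanks is an assumption based on Cedric's observation about one extra space per row, not something confirmed against the file.

```matlab
% Axes implied by the header description quoted in the question.
nLon = 288;
longitudes = linspace(-179.375, 179.375, nLon);   % 1.25 degree steps
latitudes  = -59.5:1:59.5;                        % 1.00 degree steps

lonStepError = max(abs(diff(longitudes) - 1.25)); % ~0 if consistent
nLat = numel(latitudes);                          % 120

% Width bookkeeping for one latitude block of 12 rows (assumed layout).
dataChars    = nLon * 3;                 % 864 characters of real data
leadingBlank = 12;                       % one leading blank per row
extraFields  = leadingBlank / 3;         % 4 spurious 3-character fields
nFieldsSeen  = (dataChars + leadingBlank) / 3;    % 292
```

If that assumption holds, stripping the first character of every row before joining the rows would bring the count back to 288.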
Cedric
on 23 Sep 2017
"Whoever created that format should be very ashamed. It's a pain to parse."
Agreed!!
pruth
on 23 Sep 2017
dpb
on 23 Sep 2017
"...why I've got 292 columns instead of 288"
'Cuz the file content doesn't match the description, is why. Where that came from we can't tell, but it's just inconsistent.
I suspect it was written as a stream file and that the formatting as it stands is the product of having been opened in a text editor. But even that doesn't seem to explain what appears to be a valid data stream that doesn't have the correct total number of characters.
The really nasty element is what appears to be '%3d' formatting instead of '%03d', hence the embedded blanks. Given C's penchant for "eating" blanks on input even with fixed-width format strings, one would think the author wouldn't have done that. Fortran FORMAT would make it simpler, but we still don't have the right number of characters to start with here...
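The difference between the two formats is easy to see on three made-up values. This sketch assumes MATLAB's sscanf follows the C behaviour described above, skipping leading white space before the field width is applied:

```matlab
vals = [5 123 42];                 % synthetic values for illustration

padded = sprintf('%3d',  vals);    % '  5123 42'  (blank-padded)
zeroed = sprintf('%03d', vals);    % '005123042'  (zero-padded)

% The blank-padded string is not read back as three 3-digit fields,
% because the blanks are skipped rather than counted.
fromPadded = sscanf(padded, '%3d').';
fromZeroed = sscanf(zeroed, '%3d').';   % [5 123 42]

% Zero-filling the blanks first makes the fixed-width read safe.
fromFixed = sscanf(strrep(padded, ' ', '0'), '%3d').';
```

This is why both answers in this thread replace blanks with '0' before splitting the record into 3-character fields.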
Cedric
on 23 Sep 2017
The format is consistent (see my comment under your answer). What is annoying is that it is designed partly around "machine" constraints and partly to look "cute" to the human eye when opened in a text editor.
dpb
on 23 Sep 2017
Wonder why put the leading blank in there, though...that really is the only really bad part; the rest is pretty easy to deal with but that makes for special-casing. Oh, the no leading zero in the format is also pretty ugly; almost forgot that! :)