
Use textscan on a subset of a large ASCII file

nori on 16 May 2011
Hello,
I am trying to use the textscan function to open a large ASCII file (2.5 GB) and break it up into smaller files.
The big file is the CRU TS 3.1 world monthly temperature data from Jan 1901 to Dec 2009, where each month is stored as a 360x720 grid. Multiply that by 1308 (the number of months in the date range) and that is my big ASCII file.
My problem is that I cannot find any documentation on how to use textscan to scan through the original file one specified block (360x720) at a time.
The help does mention the possibility of opening a large file and reading it in subsets, but the examples do this with a fixed number of characters. Since this data has a range of 0-255, I can't set a fixed number of characters for each line.
FYI, I am able to use textscan on smaller files and get the results I want, but only by reading the entire file, not a subset of the data.
Is textscan able to do what I'm hoping for, or is there another function? I searched and couldn't find anything suitable.
Any help would be greatly appreciated.
Thanks.
n

Answers (2)

Walter Roberson on 16 May 2011
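monthnumber = 17; % for example; the first month is 1
fid = fopen('YourDataFile.txt','rt');
% One format specifier per column. The 1 keeps the format a single row vector
% (repmat('%g',720) would build a 720-by-1440 char matrix instead).
% Read 360 rows after skipping the 360 rows of each earlier month.
monthcell = textscan(fid, repmat('%g',1,720), 360, 'HeaderLines', 360*(monthnumber-1), 'CollectOutput', 1);
fclose(fid);
Your data would then be the array monthcell{1}.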
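Since the goal is to split the whole file, one possible extension is to keep the file open and call textscan repeatedly, because each call resumes where the previous one stopped. The following is only a minimal sketch under stated assumptions: the values are whitespace-delimited with one grid row per line and no header lines, the tiny demo file and the month_%04d.txt output names are made up for illustration, and '%f' is used as the floating-point specifier.

```matlab
% Build a tiny stand-in file: 2 "months" of 3 rows x 4 columns each.
% (The real file would use nRows = 360 and nCols = 720.)
nRows = 3; nCols = 4; nMonths = 2;
demo = reshape(1:nRows*nCols*nMonths, nCols, [])';   % 6x4, counting along rows
inFile = fullfile(tempdir, 'demo_stack.txt');        % hypothetical file name
dlmwrite(inFile, demo, 'delimiter', ' ');

fmt = repmat('%f', 1, nCols);     % one %f per column, as a single row vector
fid = fopen(inFile, 'rt');
nWritten = 0;
while true
    % Apply the row format nRows times; the next call continues from here.
    block = textscan(fid, fmt, nRows, 'CollectOutput', 1);
    data = block{1};
    if size(data, 1) < nRows      % empty or incomplete block: end of file
        break
    end
    nWritten = nWritten + 1;
    outFile = fullfile(tempdir, sprintf('month_%04d.txt', nWritten));
    dlmwrite(outFile, data, 'delimiter', ' ');
end
fclose(fid);
```

Reading sequentially avoids rescanning the skipped rows for every month, which is what a growing 'HeaderLines' count would do if the file were reopened for each month.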
  1 comment
nori on 17 May 2011
Hi Walter,
Thanks for the quick reply.
I tried to understand and apply your solution, but I am confused about a few things, and when I applied it to my dataset I got empty cell arrays back.
First off, I don't understand the repmat('%g',720) part of the code. Isn't repmat for replicating a matrix, which would give the same matrix repeated over and over? And could you elaborate on the '%g'? I couldn't find what that specifier represents.
I understand the rest of the code: use the HeaderLines option to skip the earlier datasets, 360 rows each. Brilliant.
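On the repmat question: a format string is itself a 1-by-2 char array, so replicating it along one row produces one specifier per column. With a single size argument, repmat(A,n) tiles n-by-n, which yields a char matrix rather than a format string and may be why the call returned nothing useful. The sketch below is an illustration only; it reads from an in-memory string instead of a file and uses '%f', the floating-point specifier listed in the textscan documentation, rather than '%g'.

```matlab
fmt3 = repmat('%f', 1, 3);      % row vector '%f%f%f': one specifier per column
sq   = repmat('%f', 3);         % 3-by-6 char matrix, not a usable format string

% textscan also accepts text directly, which makes the format easy to try out.
txt = sprintf('1 2 3\n4 5 6\n7 8 9\n');
C = textscan(txt, fmt3, 2, 'CollectOutput', 1);   % apply the row format twice
firstTwoRows = C{1};            % 2-by-3 numeric array holding the first two lines
```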
Looking at your code, I got the idea that I could read in the number of cells in each dataset (360*720) and use that to limit each month. If that worked, I would then try your HeaderLines idea.
My idea sort of worked, but when I convert the .asc to an image, I get two copies of the image rotated 90 degrees, so the south pole points west and the north pole points east, and the image is doubled.
According to the metadata from CRU, the dataset is definitely 720x360. Given what I am seeing, I would expect to have a 1440x360 matrix, but that is not the case.
To see the image, I have to add this header to the top of the file before importing it into ArcGIS:
ncols 720
nrows 360
xllcorner -180
yllcorner -89.9999999999999
cellsize 0.5
NODATA_value -999
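Adding that header by hand can be avoided by writing it from MATLAB before the grid values. This is a minimal sketch, assuming the header values quoted above; the output file name and the small stand-in matrix are hypothetical, and ncols and nrows are taken from the matrix being written.

```matlab
data = [1 2 3; 4 5 6];                      % stand-in for one 360x720 month
outFile = fullfile(tempdir, 'month_demo.asc');   % hypothetical file name

% Write the six header lines first.
fid = fopen(outFile, 'wt');
fprintf(fid, 'ncols %d\n', size(data, 2));
fprintf(fid, 'nrows %d\n', size(data, 1));
fprintf(fid, 'xllcorner %d\n', -180);
fprintf(fid, 'yllcorner %.13f\n', -89.9999999999999);
fprintf(fid, 'cellsize %g\n', 0.5);
fprintf(fid, 'NODATA_value %d\n', -999);
fclose(fid);

% Then append the grid, one matrix row per line, space-delimited.
dlmwrite(outFile, data, '-append', 'delimiter', ' ');
```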
The code below is what I tried; it produced the doubled image (rotated 90 degrees).
file = ('cru_ts_3_10.1901.2009.tmn.dat');
r = 720;
c = 360;
fid = fopen(file,'r');
a = r*c;
m = textscan(fid,'%d16',a);
b = m{1,1};
c = reshape(b, c, r);
fclose(fid);
dlmwrite('out.txt', c, 'delimiter', ' ');
Now here's the really strange part of this problem.
If I transpose the final result, c = reshape(b, c, r)';
and use the exact same header from above, I get a single image with everything looking just fine.
However, if I open the file and count the rows and columns, it has 720 rows and 360 columns (which is what I expect), yet ArcGIS displays the image properly using the header of 720 columns and 360 rows.
Now I am definitely confused.
Sorry for the long-winded reply, but it's been a number of days and I'm going bananas!
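One likely explanation is element order. textscan returns the values in the order they appear in the file (along each row), while reshape fills its output down the columns. Reshaping straight to rows-by-columns therefore interleaves the data; reshaping to columns-by-rows and transposing restores the layout. The transposed 720-row file probably still displays correctly because, when written row by row, its values come out in the original file order, and the ASCII grid reader appears to rely on ncols rather than on line breaks to decide where a row ends (an assumption about the reader, not something confirmed in this thread). A small sketch with a hypothetical 2x3 grid:

```matlab
nRows = 2; nCols = 3;
onDisk = [1 2 3; 4 5 6];              % the grid as laid out in the file
b = reshape(onDisk', [], 1);          % values in file reading order: 1 2 3 4 5 6

wrong = reshape(b, nRows, nCols);     % fills down columns: [1 3 5; 2 4 6]
right = reshape(b, nCols, nRows)';    % recovers the layout: [1 2 3; 4 5 6]

% The transposed variant from the comment above has swapped dimensions...
swapped = reshape(b, nRows, nCols)';  % 3x2: [1 2; 3 4; 5 6]
% ...but written row by row it streams the values in the original file order.
streamed = reshape(swapped', 1, []);  % 1 2 3 4 5 6
```

For the real data this would correspond to reshape(b, 720, 360)', which gives a 360-row by 720-column matrix that matches the header.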



nori on 22 May 2011
Not a MATLAB solution, but it works.
I downloaded 7-Zip and used its split file utility.
Worked like a charm.
