Reading text files in Matlab with inconsistent whitespace and character array lengths

I am having difficulty reading text files with inconsistent whitespace and character array lengths. There are no delimiters.
Below is an example of the text file that I am trying to read:
%%dummy_example.txt
USC00011084201408PRCP 0 7 0 7 0 7 13 7
USC00011084201408WT03-9999 -9999 -9999 5 7
USC00011084201409TMAX 350 H 350 H 344 H 67 7
USC00011084201409TMIN 217 H 217 H 228 H 167 7
USC00011084201409TOBS 217 H 256 H 233 H 72 7
USC00011084201409PRCP 0 H 0 H 0 H 117 7
Each line holds one variable (element) for one weather station and one month, and columns 2 and 3 hold the variable value and the character flags for the first day of the month. The pattern repeats for the entire month; only the first 4 days are shown here (the lines cover August and September 2014). The data source specifies the character positions as follows:
% ID 1-11 Character
% YEAR 12-15 Integer
% MONTH 16-17 Integer
% ELEMENT 18-21 Character
% VALUE1 22-26 Integer
% MFLAG1 27-27 Character
% QFLAG1 28-28 Character
% SFLAG1 29-29 Character
% VALUE2 30-34 Integer
% MFLAG2 35-35 Character
% QFLAG2 36-36 Character
% SFLAG2 37-37 Character
% . . .
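The `. . .` row implies that the four per-day fields repeat in 8-character blocks, so the columns for day d follow a fixed offset (VALUEd spans 22+8(d-1) to 26+8(d-1), and so on). A minimal MATLAB sketch, assuming that pattern holds for every day of the month; `dayCols` is a hypothetical helper name, not part of the data source's documentation:

```matlab
% Hypothetical helper: column ranges of day d, assuming each day
% occupies an 8-character block that starts at column 22.
dayCols = @(d) struct( ...
    'value', (22:26) + 8*(d - 1), ...   % VALUEd
    'mflag', 27 + 8*(d - 1), ...        % MFLAGd
    'qflag', 28 + 8*(d - 1), ...        % QFLAGd
    'sflag', 29 + 8*(d - 1));           % SFLAGd

c2 = dayCols(2);
disp(c2.value)   % 30 31 32 33 34, matching VALUE2 30-34
disp(c2.sflag)   % 37, matching SFLAG2 37-37
```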
The inconsistent whitespace and lack of delimiters make it extremely difficult for me to read the data consistently. The goal is to parse the variable values into a data matrix so that I can determine the value of each variable for each day of the month. Below is my m-file.
%%Begin m-file
% Read Dummy Text
clc
clear
fid=fopen('dummy_example.txt');
num_fmt=('%11s %6d %4s %5d %3s %5d %3s %5d %3s %5d %3s');
C=textscan(fid,num_fmt);
fclose(fid);
Stn=C{1};
Dates=C{2};
Variable=C{3};
Data(:,1)=C{4};
Data(:,2)=C{6};
Data(:,3)=C{8};
Data(:,4)=C{10};
disp(Data)
Any help would be greatly appreciated. Thank you.

Accepted Answer

Jan on 9 May 2017
Edited: Jan on 9 May 2017
Data = fileread('dummy_example.txt');
DataC = strsplit(Data, '\n');
DataC(cellfun('isempty', DataC)) = [];
nLine = numel(DataC);
ID = cell(1, nLine);
Year = zeros(1, nLine);
Month = zeros(1, nLine);
Element = cell(1, nLine);
...
for iLine = 1:nLine
S = DataC{iLine};
ID{iLine} = S(1:11); % ID 1-11 Character
Year(iLine) = sscanf(S(12:15), '%d'); % YEAR 12-15 Integer
Month(iLine) = sscanf(S(16:17), '%d'); % MONTH 16-17 Integer
Element{iLine} = S(18:21); % ELEMENT 18-21 Character
...
end
This is less elegant than textscan, but this simple method lets you parse each line exactly as defined in the table.
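Because every field sits at fixed columns, the per-line loop can also be vectorized: pad all lines into one character matrix with `char` and slice whole columns at once. A minimal sketch, assuming the file really follows the column table (the example as displayed in the question does not line up with those columns, so it cannot be used directly). The two sample lines are hypothetical, built with `sprintf` to match the table; placing the 'H' flag in the SFLAG slot is an assumption. In practice the lines would come from `fileread`/`strsplit` as in the answer above.

```matlab
% Hypothetical fixed-width sample lines (2 day blocks only).
fmtLine = '%s%4d%02d%s%5d%s%5d%s';
sampleLines = { ...
    sprintf(fmtLine, 'USC00011084', 2014, 9, 'TMAX', 350, '  H', 344, '  H'), ...
    sprintf(fmtLine, 'USC00011084', 2014, 9, 'TMIN', 217, '  H', 228, '  H')};

M = char(sampleLines);                  % rows padded to equal width
nRows = size(M, 1);
nDays = floor((size(M, 2) - 21) / 8);   % complete 8-character day blocks

ID      = cellstr(M(:, 1:11));                 % ID      1-11
Year    = str2double(cellstr(M(:, 12:15)));    % YEAR    12-15
Month   = str2double(cellstr(M(:, 16:17)));    % MONTH   16-17
Element = cellstr(M(:, 18:21));                % ELEMENT 18-21

Value = zeros(nRows, nDays);            % one column per day
SFlag = repmat(' ', nRows, nDays);      % one flag character per day
for d = 1:nDays
    c0 = 22 + 8*(d - 1);                % first column of VALUEd
    Value(:, d) = str2double(cellstr(M(:, c0:c0+4)));
    SFlag(:, d) = M(:, c0 + 7);         % SFLAGd
end
```

Only the day loop remains, and it runs once per day rather than once per line, so its cost does not grow with the file size.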
2 Comments
J Travis Hunsucker on 9 May 2017
Thank you very much! I cannot preallocate the numeric arrays (Year, Month, etc.) without getting the following error:
Cell contents assignment to a non-cell array object.
Error in Read_Dummy_Text (line 44)
Year{iLine} = sscanf(S(12:15), '%d');
However, it works great if I remove the pre-allocation and use the cell2mat function after the loop. Thank you very much for the help.
Jan on 9 May 2017
This was a typo, sorry. Use round parentheses instead of curly braces:
Year(iLine) = sscanf(S(12:15), '%d');
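To show why the braces fail, here is a minimal sketch of the two container types involved: `zeros` preallocates a numeric array, whose elements are assigned with round parentheses, while `cell` preallocates a cell array, whose contents are assigned with curly braces. The variable names are illustrative only.

```matlab
% Numeric array: preallocate with zeros, assign elements with ()
Year = zeros(1, 3);
Year(2) = sscanf('2014', '%d');

% Cell array: preallocate with cell, assign contents with {}
Element = cell(1, 3);
Element{2} = 'TMAX';

% Brace assignment into a numeric array raises an error
% (the exact message differs between MATLAB releases)
braceFailed = false;
try
    Year{3} = 2015;
catch
    braceFailed = true;
end
disp(braceFailed)   % displays 1
```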


More Answers (1)

Guillaume on 9 May 2017
With textscan, you need to specify 'Whitespace', '' to prevent MATLAB from interpreting the spaces as separators. It is then easy to scan the file in a single call:
fid = fopen('dummy_example.txt', 'rt');
header = '%11s %4d %2d %4s';
valflag = '%5d %1s %1s %1s';
C = textscan(fid, [header, valflag, valflag, valflag, valflag], 'Whitespace', '');
fclose(fid);
% Conversion to a table for ease of use:
varnames = compose({'Value%d', 'MFlag%d', 'QFlag%d', 'SFlag%d'}, (1:4)')'
T = table(C{:}, 'VariableNames', [{'Id'; 'Year'; 'Month'; 'Element'}; varnames(:)])
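The answer reads four day blocks; for a full month the format string and variable names can be generated rather than typed out. A minimal sketch, assuming every line carries 31 day blocks (`nDays = 31` is an assumption, not stated in the question). It builds the names with `ndgrid`/`arrayfun` instead of `compose`, producing the same Value1, MFlag1, QFlag1, SFlag1, Value2, ... order:

```matlab
% Assumption: every data line holds nDays fixed-width day blocks.
nDays   = 31;
header  = '%11s %4d %2d %4s';
valflag = '%5d %1s %1s %1s';
fmt     = [header, repmat(valflag, 1, nDays)];

% Variable names in file order; the prefix index varies fastest.
prefixes = {'Value', 'MFlag', 'QFlag', 'SFlag'};
[p, d]   = ndgrid(1:numel(prefixes), 1:nDays);
varnames = arrayfun(@(i, k) sprintf('%s%d', prefixes{i}, k), p(:), d(:), ...
                    'UniformOutput', false);
names = [{'Id'; 'Year'; 'Month'; 'Element'}; varnames];

% Then, as in the answer above:
% C = textscan(fid, fmt, 'Whitespace', '');
% T = table(C{:}, 'VariableNames', names);
```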
