Import data with double and strings
20 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Diego Fonseca
el 24 de Sept. de 2019
Comentada: Diego Fonseca
el 24 de Sept. de 2019
Hello everybody,
I have a file (*.txt as attached). This file contains data perfilated with an equipment, displayed in a first column of date/time and four columns of numeric values (depth and water quality). Each profile is preceeded by a 6 lines heading (text data) and a given file may contain many profiles (always preceeded by a 6 lines heading).
Example:
Sensor Data Columns
Sensor Statistics (sensor #, Description, Statistic)
1 Depth Mean
2 OBS-1 (NTU) Mean
4 Temperature (deg C) Mean
5 Conductivity (mS/cm) Mean
16:15:35.0 07/15/2019 9.92 2422.9 28.77 6.73
16:15:35.5 07/15/2019 9.92 2422.9 28.77 6.73
16:15:36.0 07/15/2019 9.95 2422.9 28.77 6.72
16:15:36.5 07/15/2019 9.98 2422.9 28.77 6.73
16:15:37.0 07/15/2019 10.01 2422.9 28.77 6.72
16:15:37.5 07/15/2019 10.03 2422.9 28.77 6.74
16:15:38.0 07/15/2019 10.02 2422.9 28.77 6.74
The problem is that there are 5 spaces separating date/time and the column showing the depths, but only four spaces for depths greater than 9.9m. When I use:
OBS = importdata(AM2_15_1a.txt,' ',6);
It creates a strcut separating text and numeric data, which is exactly what I want, except that for depths > 9.9m, the lines are read as 0.1 (instead of 10.1) and the remaining "1" is put together with the date column which bugs the analysis.
As I have many files with many lines, is there any solution for fixing this problem? I tried pasting data to excel and then back to the log file; changing to csv and other types in order to standardise the columns separators, but when I do that, "importdata" doesn't work (it doesn't create a struct, but a cell containing only the heading.
Any sugestion? Thanks in advance
Best regards!
2 comentarios
Respuesta aceptada
Adam Danz
el 24 de Sept. de 2019
Editada: Adam Danz
el 24 de Sept. de 2019
This solution uses fileread() to read in the entire file and then splits the text by line using strsplit().
It then identifies the start of each profile by looking for the string 'Sensor Data Columns' and then marks the start and end of each segment of data based on those line numbers and the 6 total header lines per profile.
It then loops through each profile and read in the datetime values and the 4 measurements and stores them in tables saved in the cell array "T".
See comments for alternative approaches.
% set parameters
file = 'AM2_15_1a.txt'; %file (better if it includes full path)
profileStartKey = 'Sensor Data Columns'; %text that indicates start of key
nHeaderRows = 6; %number of header rows for each profile
% Get start/end line number for each profile
text = strtrim(strsplit(fileread(file),'\n')'); %separate entire file by line
text(cellfun(@isempty,text)) = []; % remove empty lines
profileStartLineNum = find(~cellfun(@isempty,regexp(text,sprintf('^%s',profileStartKey)))); %lines where header starts
startLineNum = profileStartLineNum + nHeaderRows; %lines where data starts
endLineNum = [profileStartLineNum(2:end)-1;numel(text)]; %lines where data ends
% Loop through each profile and create a table for each profile, stored in "T"
T = cell(numel(startLineNum),1);
for i = 1:numel(startLineNum)
profile = text(startLineNum(i):endLineNum(i)); %block of profile data (cell of chars)
dataStr = cellfun(@strsplit,profile,'UniformOutput',false); %char array broken up into sub-cells
% Extract datetime and measurements for each line within the profile
[dtTemp,dataTemp] = cellfun(@(x)deal(datetime([x{1},' ',x{2}],'InputFormat','HH:mm:ss.s MM/dd/yyyy'),...
str2double(x(3:end))),dataStr,'UniformOutput',false);
T{i} = [cell2table(dtTemp,'VariableNames',{'Date'}),array2table(cell2mat(dataTemp),'VariableNames',{'m1','m2','m3','m4'})];
end
T{n} is the table from profile #n.
5 comentarios
Adam Danz
el 24 de Sept. de 2019
Editada: Adam Danz
el 24 de Sept. de 2019
Glad I could help! In case you started implementing the answer before my final edit, I recommend using the answer in its current form where tables are produced and saved within a cell array. It's by far the cleanest and simplest approach.
Strtrim() is used to get rid of the carriage return at the end of each line which is added by strsplit().
Más respuestas (0)
Ver también
Categorías
Más información sobre Text Data Preparation en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!