Multiple selection of data (dates,mon​ths,days,h​ours)

9 visualizaciones (últimos 30 días)
Sofia Fourkioti
Sofia Fourkioti el 4 de Mayo de 2020
Comentada: Sofia Fourkioti el 11 de Mayo de 2020
Hi! I have a csv file and I want to extract a range of data. The form of the data is the following(see comments). As I am a newbie in Matlab, I don't know how to use correctly the "for"loop. My aim is to select data by the year,month,day,hour and create a new table. The selected data must have the following form;
Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0
2005-01-01 00:00:00.0/2005-01-01 01:00:00.0
2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 .....]
I tried to split the dates using the T=split(t,{'-','T','/'}); and I also selected only the start dates (only the part before the " / "sign in the example ["2004-01-01 00:00:00.0"]) as the end dates aren't necessary, but once again my problem is that I have a difficultly to extract the data.
Please Help !!!!
  1 comentario
Sofia Fourkioti
Sofia Fourkioti el 4 de Mayo de 2020
Editada: Sofia Fourkioti el 4 de Mayo de 2020
I forgot to mention that my data is from 2004-02-01 to 2020-01-31, including everyday hourly values . I want to group the values as specific year,specific month, specific day for the hour range of 0-23.
Thanks!

Iniciar sesión para comentar.

Respuesta aceptada

Adam Danz
Adam Danz el 4 de Mayo de 2020
Editada: Adam Danz el 4 de Mayo de 2020
Follow these steps to prepare in your data more efficiently and in a user-friendly format.
Use readtable to read in the data.
filePath = 'C:\Users\name\Documents\MATLAB'; % Path to your csv file
fileName = 'κοζάνη.csv';
file = fullfile(filePath, fileName);
opts = detectImportOptions(file);
T = readtable(file,opts);
Let's look at T using head() which shows us the first few rows
>> head(T)
ans =
8×11 table
x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI ClearSkyDHI ClearSkyBNI GHI BHI DHI BNI Reliability
_______________________________________________ ______ ___________ ___________ ___________ ___________ ______ ______ ______ ______ ___________
{'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659 1.2983 16.531 1.789 0.4964 1.2926 14.457 1
{'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474 33.111 516.95 97.043 61.139 35.905 408.39 1
{'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68 56.764 799.67 268.38 203.61 64.77 710.04 1
Notice that the x_ObservationPeriod show what appears to be a START / STOP period of observation but they are in character format. Yuk!
Split the x_ObservationPeriod into start / stop times and convert to datetime format.
This will take a few seconds because you've got 140256 rows of data.
% Split the start/stop times in character format
startStop = cellfun(@(c)strsplit(c, '/'), T.x_ObservationPeriod,'UniformOutput',false);
% Convert to table (2 columns)
startStopTbl = array2table(vertcat(startStop{:}));
% Convert values from char to datetime
startStopTblDt = varfun(@(v)datetime(v, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.S'), startStopTbl);
% Name the columns
startStopTblDt.Properties.VariableNames = {'ObservationStart', 'ObservationStop'};
Let's look at startStopTblDt
>> head(startStopTblDt)
ans =
8×2 table
ObservationStart ObservationStop
____________________ ____________________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00
Much better format. You can update the output format using this example,
startStopTblDt.ObservationStart.Format = 'MMM dd, yyyy HH:mm:ss';
Combine the two tables
T = [startStopTblDt, T];
View the results (first 10 rows, 6 columns)
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
If you want to remove the x_ObservationPeriod column,
T.x_ObservationPeriod = [];
Add row numbers to the table that correspond to row number in csv file
rowNums = (0:size(T,1)-1) + opts.DataLines(1);
T.Properties.RowNames = compose('%d',rowNums);
View results
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
44 01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
45 01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
46 01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
47 01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
48 01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
49 01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
50 01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
51 01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
52 01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
53 01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
Now you can see that the first row of data comes from line 44 of the csv file.
  12 comentarios
Adam Danz
Adam Danz el 6 de Mayo de 2020
Editada: Adam Danz el 6 de Mayo de 2020
"Do you think that it could be a better solution"
No, because I don't think groupsummary is better. Retime is more flexible and it's designed to work with timetables. With groupsummary, you would need to create a grouping variable for every type of average (yearly, monthly, etc). Retime does that for you.
You can follow examples for groupsummary in the documentation.
Example
>> T.month = month(T.ObservationStart);
>> groupsummary(T(:,{'TOA','month'}), 'month', 'mean')
% Result (first 3 rows)
% ans =
% 12×3 table
% month GroupCount mean_TOA
% _____ __________ ________
% 1 11904 175.52
% 2 10848 237.48
% 3 11904 321.15
But this requires you to add an additional column ('month') to an already large table and the data in that column is redundant.
Sofia Fourkioti
Sofia Fourkioti el 11 de Mayo de 2020
Thank you for your help!!

Iniciar sesión para comentar.

Más respuestas (1)

dpb
dpb el 4 de Mayo de 2020
Attach text that folks can do something with instead of pictures...or just attach the .csv file itself.
Use split to break the two times apart going into datetime
>> datetime(split(TSTR,'/').','InputFormat','uuuu-MM-dd''T''HH:mm:ss.S')
ans =
1×2 datetime array
01-Jan-2004 00:00:00 01-Jan-2004 01:00:00
>>
where I just copied your first line above into TSTR, split it into the two sections and returned that as the 2-column cell array to pass to datetime.
Once you have datetimes, put those into a table or timetable and use the selection on those datetime values.
  1 comentario
Sofia Fourkioti
Sofia Fourkioti el 4 de Mayo de 2020
Editada: Sofia Fourkioti el 4 de Mayo de 2020
Thank you!I have attached my .csv in the comments.
I have already split my data, but this is not my problem. Is there a way to locate this data and find their exact position in the .csv?

Iniciar sesión para comentar.

Categorías

Más información sobre Timetables en Help Center y File Exchange.

Etiquetas

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by