I need to break down the flow data into groups of data with months for every year.
3 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Harjas
el 20 de Jun. de 2022
Comentada: Cris LaPierre
el 26 de Jun. de 2022
For example: All the entries for the month of Jan for all the years (suppose 2010-2022) are stored in one group. Similarly, all the entries for the month of Feb for all the years.(suppose 2010-2022) are stored in second group and so on. My end goal is to calculate the percentile of monthly data for all the years.
Date Flow
12/1/2010 100
12/2/2010 200
12/3/2010 150
12/4/2010 176
.....
.....
12/1/2022 145
I tried to create the vlookup function to create the matrix but i wasn't able to. So, I decided to group all months together and then perform operations on them.
0 comentarios
Respuesta aceptada
Cris LaPierre
el 20 de Jun. de 2022
6 comentarios
Cris LaPierre
el 25 de Jun. de 2022
I think the best approach here is to create a second table that is a subset of the full data set, but just contains the date ranges of interest. Then you can use groupsummary to compute the desired stats grouped by year. I don't know what you want to use for percentile, so I use 42.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1043800/Example.xlsx',...
'ReadVariableNames',false)
% Use 'day of year' to identify dates within the desired range for all years
drng = day(datetime(2004,9,27):caldays(1):datetime(2004,10,27),'dayofyear');
d = day(T.Var1,'dayofyear');
% Create subtable
idx= ismember(d,drng);
Tsub = T(idx,:)
% compute stats for each year
Tstats = groupsummary(Tsub,'Var1','year',{@(A) prctile(A,42),'max','min'},'Var2');
Tstats.Properties.VariableNames = ["Tyr","GroupCnt","Tmin","Tmax","Tpct"]
Cris LaPierre
el 26 de Jun. de 2022
Leap years are going to cause some issues with my previous answer. Unfortunately, I couldn't find a slick way to do this, so here's a brute force approach.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1043800/Example.xlsx',...
'ReadVariableNames',false);
% Define range of interest
startD = datetime(2004,9,27);
stopD = datetime(2004,10,27);
% find data for all years between the indicated month/days
y = unique(year(T.Var1));
idx = zeros(height(T),1);
for i = 1:length(y)
idx = idx + isbetween(T.Var1,datetime(y(i),month(startD),day(startD)),...
datetime(y(i),month(stopD),day(stopD)));
end
Tsub = T(logical(idx),:)
% Compute summary stats
Tstats = groupsummary(Tsub,'Var1','year',{@(A) prctile(A,42),'max','min'},'Var2');
Tstats.Properties.VariableNames = ["Tyr","GroupCnt","Tmin","Tmax","Tpct"]
Más respuestas (0)
Ver también
Categorías
Más información sobre Data Preprocessing en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!