How can I efficiently down-sample data that is non-uniformly spaced?
7 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Martin Hoecker
el 17 de Sept. de 2012
Comentada: Paul Quelet
el 15 de Oct. de 2014
I am working with long time-series. My data acquisition system takes one point per second for months - except when it hangs for a few minutes or hours, then there will be no data for this time interval.
I want to down-sample this data to say one point per 60 seconds. However, because there can be "gaps" in the data, I am struggling to find efficient code for this!
I tried the following approach for a typical array (called "data") that holds 0.5 million rows and two columns: The first column is the time, the second column is the actual data.
start_time = data(1,1);
end_time = data(end,1);
time_step = 60/(3600*24);
total_time = end_time - start_time;
% Prepare the downsampled data array.
downsampled_data = zeros(floor(total_time/time_step),2);
tic
for i = 1:length(downsampled_data)
% For each time intervall, find all points in this intervall, and
% average over them
downsampled_data(i,1) = start_time + (i-0.5)*time_step;
downsampled_data(i,2) = mean(data(data > start_time + (i-1)*time_step &...
data < start_time + i*time_step,2));
end
toc
As a final step I would have to fish out those points where there is no data... However, the above code takes about 60 seconds to run for .5 million points - and I need it to run in less than 10 minutes for an array of 5 million points. Can you guys think of a way of speeding it up?
1 comentario
Paul Quelet
el 15 de Oct. de 2014
Thank you for posting this algorithm.
The code is very helpful in something that I am doing. However, when I used it for my dataset, I ended up with some values on the exact hour with 00:00 for the minutes and seconds, which became NaN . In any case, I would revise the above algorithm to include the equal to condition, in my case at the bottom time:
mean(data(data >= start_time + (i-1)*time_step &...
data < start_time + i*time_step,2));
Many thanks again for the algorithm Martin and fast solution Andrei.
Respuesta aceptada
Andrei Bobrov
el 17 de Sept. de 2012
[Y, M, D, H, MN] = datevec(data(:,1));
[c,c,c] = unique([Y, M, D, H, MN],'rows');
out = accumarray(c,data(:,2),[],@mean);
Más respuestas (0)
Ver también
Categorías
Más información sobre Multirate Signal Processing en Help Center y File Exchange.
Productos
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!