MATLAB Answers

0

Processing a HUGE number of timestamps

Asked by Matlab2010 on 13 Nov 2014
Latest activity Answered by Jan
on 16 Nov 2014
I have a cell array whose elements are time stamps in the format "Mon Apr 01 20:00:00 BST 2013". I have a very large number of these vectors. At the moment, I loop through each value in the vector and apply the below function. This loop taking up 99% of my processing time.
How can I remove the loop?
thanks
function myTimeOut = st_timestampConvert(myTime)
year = strtrim(myTime(end-4:end));
month = strtrim(myTime(5:8));
day = strtrim(myTime(9:10));
time = strtrim(myTime(11:19));
timezone = strtrim(myTime(20:23));
myTimeOut = convert_to_UTC(myTimeOut, timezone); %time zone conversion
myTimeOut = datenum([day '-' month '-' year ' ' time], 'dd-mmm-yyyy HH:MM:SS');
end

  0 Comments

Sign in to comment.

Tags

3 Answers

Answer by Guillaume
on 14 Nov 2014
 Accepted Answer

Without R2014b datetime, you can use regexprep to rearrange the bits of the string you want before calling datenum. It's many orders of magnitude faster than a loop, cellfun, or strsplit.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');
On my machine, to process 100k dates, the above two lines takes 2.1 seconds , most of it taken by the datenum operation. The regexp line is only 0.3 seconds.
There remains the problem of the time zone adjustment (which I believe should have come after the conversion to datenum in your example). Your convert_to_UTC is not part of matlab. Hopefully it can operate on cell arrays as well. Thus to extract the timezone:
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');
tout = convert_to_UTC(tout, tzones); %Will this work?

  0 Comments

Sign in to comment.


Answer by Peter Perkins
on 13 Nov 2014

none, I don't know if you have access to R2014b. If you do, consider using the new datetime data type. On a not so fast PC, parsing 100000 strings like yours, with time zones, takes a bit over 2s. 'BST' presents a potential issue, because it might mean any number of things. In the UK, it means "British Summer Time", and the following parses the strings using that locale. Hope this helps.
% construct 100k strings
d = datetime(2013,4,1,20,0,0,'TimeZone','Europe/London') + days(randn(100000,1));
s = cellstr(d,'eee MMM dd HH:mm:ss z yyyy','en_UK');
ans =
Tue Apr 02 14:33:32 BST 2013
% parse those strings
tic
d1 = datetime(s,'Format','eee MMM dd HH:mm:ss z yyyy','TimeZone', ...
'Europe/London','Locale','en_UK');
toc

  1 Comment

Peter.
I am using win7 and 2014A.
I have N cell arrays, each with T elements. N = 0.5M, T = 0.25M.
I have a PC with 244GB RAM and 32 Cores. I use parfor for such jobs.
BST does mean British summer time. Though I deal with all time zone conversions myself. You can assume each N is in the same time zone, thus, the time zone conversion can be done as a single vectorized operation.
The issue is how to extract from this cell and get into datenum without using a for loop.
thank you

Sign in to comment.


Answer by Jan
on 16 Nov 2014

I prefer Guillaume's version, but it is not "magnitudes" faster than a loop approach:
function DOut = ConvertCellDate(DIn)
DOut = zeros(size(DIn));
for k = 1:numel(DIn)
Dx = double(DIn{k} - '0'); % For faster conversion of numbers
month = (strfind('JanFebMarAprMayJunJulAugSepOctNovDec', DIn{k}(5:7)) + 2) / 3;
year = Dx(25) * 1000 + Dx(26) * 100 + Dx(27) * 10 + Dx(28);
DOut(k) = datenummx(year, month, Dx(9) * 10 + Dx(10), ...
Dx(12) * 10 + Dx(13), Dx(15) * 10 + Dx(16), ...
Dx(18) * 10 + Dx(19));
end
Phew, this looks cruel and is not smart to debug. But it takes 2.3 sec on my Matlab 2011b/Win7/32 system, while Guillaume's method takes 1.7 sec.

  0 Comments

Sign in to comment.