Join several timetables with different variables and times and duplicates
Mostrar comentarios más antiguos
I'm trying to read a bunch of csv files that each have one variable timeseries. Many files can hold the same variable and there are overlaps in the timestamps (duplicates).
My problem is when trying to merge these csv files, which are read to timetables, into a single timetable.
Example code:
times_TT_A = [datetime(2019,01,01) datetime(2019,01,02) datetime(2019,01,03)]';
data_TT_A = [1 2 3]';
TT_A = timetable(times_TT_A, data_TT_A,'VariableNames',{'Variable_ONE'});
times_TT_B = [datetime(2019,01,01) datetime(2019,01,02)]';
data_TT_B = [10 20]';
TT_B = timetable(times_TT_B, data_TT_B,'VariableNames',{'Variable_TWO'});
times_TT_C = [datetime(2019,01,02) datetime(2019,01,03)]';
data_TT_C = [20 30]';
TT_C = timetable(times_TT_C, data_TT_C,'VariableNames',{'Variable_TWO'});
So here there are two different variables and one variable, Variable_TWO, has overlap in timestamps.
I would like to have this result:
times_TT_A Variable_ONE Variable_TWO
__________ ____________ _________________
2019-01-01 1 10
2019-01-02 2 20
2019-01-03 3 30
I don't mind a solution with duplicate entries, I can sort those out later, for example this is also fine:
times_TT_A Variable_ONE Variable_TWO
__________ ____________ _________________
2019-01-01 1 10
2019-01-02 2 20
2019-01-03 3 NaN
2019-01-02 NaN 20
2019-01-03 NaN 30
The csv files are read in a loop and should then one by one be added to the common timetable like this in pseudocode
timetable_alldata = timetable()
for csvfile in csvfiles
timetable_csv = readtable(csvfile)
timetable_alldata = join(timetable_alldata, timetable_csv)
end
This is then the same as calling this using the example data - here using outerjoin:
outerjoin(outerjoin(TT_A,TT_B), TT_C)
ans =
3×3 timetable
times_TT_A Variable_ONE Variable_TWO_left Variable_TWO_TT_C
__________ ____________ _________________ _________________
2019-01-01 1 10 NaN
2019-01-02 2 20 20
2019-01-03 3 NaN 30
I have tried stacking ([A; B]), vertcat, outerjoin and synchronize...but I seem to be stuck here.
Any suggestions how to solve this?
3 comentarios
Lei Hou
el 25 de Feb. de 2020
Hi,
Do you want to do the variable name match automatically between tables? Are the variable names from your csv files are random?
Lei Hou
el 25 de Feb. de 2020
I have another clarification question. TT_B and TT_C both have the same row: 02-Jan-2019 20. If there are duplicated rows like this, do you want only keep one row?
bk
el 25 de Feb. de 2020
Respuesta aceptada
Más respuestas (1)
Guillaume
el 26 de Feb. de 2020
With your example tables, this is trivially achieved with:
synchronize(TT_A, [TT_B; TT_C])
3 comentarios
bk
el 26 de Feb. de 2020
Lei Hou
el 26 de Feb. de 2020
Try me solution of using outerjoin, it doesn't require priori knowledge of the variables in the CSV file and their order. Please let me know if my solution doesn't work for you. I'm happy to help you.
Guillaume
el 26 de Feb. de 2020
I must admit I didn't read the whole question. Yes, this won't work for what you want. If Lei's solution doesn't work, then you can always do it the 'old fashioned way' (before timetables and tables existed) with intersect and co. A bit more work but still fairly simple. I can work out the code if it's needed.
Categorías
Más información sobre Logical en Centro de ayuda y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!