Alternative to outerjoin for large table concatenation?

Jeff on 26 Mar 2014
Hi MATLAB'ers,
I am looking for tips on how to speed up vertical concatenation of tables. Specifically, I need an efficient way to identify the columns missing from each table and append them before concatenation.
In my case, I have 2000 tables stored in a cell array and ~15000 unique columns (VariableNames). Each table has a randomly ordered subset of between 10 and 3000 of these columns and, importantly, a column containing unique sample identifiers.
One way of tackling this is to simply call outerjoin in a loop. outerjoin normalizes the column headers for you, but it spends an inordinate amount of time in the joinInnerOuter.m and defaultarrayLike.m subfunctions.
MergeTable = AllTablesCellArray{1};
for nt = 2:length(AllTablesCellArray)
    % Join on the shared variables; MergeKeys combines each key pair into a single column
    MergeTable = outerjoin(MergeTable,AllTablesCellArray{nt},'MergeKeys',true);
end
Another strategy is to pad each table with its missing columns and then call vertcat. This works, but it is slower than the outerjoin loop. On the other hand, the padding loop can be run as a parfor loop.
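% Collect the variable names of every table and the union across all tables
AllVars = cellfun(@(x) x.Properties.VariableNames,AllTablesCellArray,'UniformOutput',false);
UniqueVars = unique([AllVars{:}],'stable');
for nt = 1:length(AllTablesCellArray)
    % Variables present somewhere in the collection but absent from this table
    MissingVars = UniqueVars(~ismember(UniqueVars,AllVars{nt}))';
    if ~isempty(MissingVars)
        % Pad the table with empty-string columns for the missing variables
        AllTablesCellArray{nt}{:,MissingVars} = repmat({''},height(AllTablesCellArray{nt}),length(MissingVars));
    end
end
% Every table now has the same set of variables, so they can be stacked
MergeTable = vertcat(AllTablesCellArray{:});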
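To make the padding strategy concrete, here is a minimal sketch on two toy tables. The table contents and the names T1, T2, Tables, GeneA and GeneB are made up for illustration and are not my real data. It adds the missing columns one at a time with dot-assignment and reorders the columns explicitly before stacking.

```matlab
% Toy data (hypothetical): two tables with different, differently ordered columns
T1 = table({'s1';'s2'}, {'a';'b'}, 'VariableNames', {'SampleID','GeneA'});
T2 = table({'x';'y'}, {'s3';'s4'}, 'VariableNames', {'GeneB','SampleID'});
Tables = {T1, T2};

% Union of all variable names, in order of first appearance
AllVars = cellfun(@(t) t.Properties.VariableNames, Tables, 'UniformOutput', false);
UniqueVars = unique([AllVars{:}], 'stable');

for nt = 1:numel(Tables)
    % Variables that this table lacks
    MissingVars = setdiff(UniqueVars, AllVars{nt}, 'stable');
    for k = 1:numel(MissingVars)
        % Add one empty-string column per missing variable
        Tables{nt}.(MissingVars{k}) = repmat({''}, height(Tables{nt}), 1);
    end
    % Put the columns of every table in the same order
    Tables{nt} = Tables{nt}(:, UniqueVars);
end

MergeTable = vertcat(Tables{:});
```

For the toy input, MergeTable has four rows and the columns SampleID, GeneA and GeneB, with empty strings where a table had no data for a column.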
I am hoping that my brain is just fried and that I am missing something obvious. I would prefer to avoid converting each table to a cell array, although that may turn out to be a good way to go.
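For reference, a rough sketch of what the cell-array route might look like. It assumes every column holds strings (as the {''} padding above implies) and uses the same made-up toy tables, so it is an untested idea at full scale rather than a known-good solution. The idea is to preallocate one large cell array, drop each table's contents into the matching rows and columns, and build the table once at the end.

```matlab
% Toy data (hypothetical): same layout as the real problem, just tiny
T1 = table({'s1';'s2'}, {'a';'b'}, 'VariableNames', {'SampleID','GeneA'});
T2 = table({'x';'y'}, {'s3';'s4'}, 'VariableNames', {'GeneB','SampleID'});
Tables = {T1, T2};

AllVars = cellfun(@(t) t.Properties.VariableNames, Tables, 'UniformOutput', false);
UniqueVars = unique([AllVars{:}], 'stable');

% Row range that each table occupies in the merged result
Heights = cellfun(@height, Tables);
RowEnd = cumsum(Heights);
RowStart = RowEnd - Heights + 1;

% Preallocate the merged cell array, filled with empty strings
Merged = repmat({''}, sum(Heights), numel(UniqueVars));

for nt = 1:numel(Tables)
    % Column position of each of this table's variables in the merged layout
    [~, ColIdx] = ismember(AllVars{nt}, UniqueVars);
    Merged(RowStart(nt):RowEnd(nt), ColIdx) = table2cell(Tables{nt});
end

% Convert to a table once, at the very end
MergeTable = cell2table(Merged, 'VariableNames', UniqueVars);
```

This swaps the per-table padding for a single ismember lookup per table. Whether table2cell and cell2table cost less than the padding at 2000 tables by ~15000 columns is something I have not measured.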
Thanks!
-Jeff

Answers (0)
