fitglm for multi-dimensional/time series data

Niko Busch on 27 May 2019
Answered: Jaynik on 24 Jul 2024
Hi everyone,
How can I use fitglm when my response variable has more than one dimension? Specifically, my response variable is time series data with dimensions n x p, where n is the number of observations and p is the number of time points. I would like to avoid looping over the time series and running a separate fitglm for each time point. Is this possible?
To be clear, the result should be a time series of coefficient estimates of the univariate GLM -- I am not trying to fit a multi-variate model. Also, I would prefer to use fitglm by providing it with a modelspec in Wilkinson notation, as in the example below.
%% Generate time series data of two experimental conditions.
clear; clc
npoints = 1000;
ntrials = 20;
t = (1:npoints)/1000;
signal = (1-cos(2*pi*t))/2;
data1 = repmat(1.0*signal, [ntrials, 1])+ 0.1*randn(ntrials, npoints);
data2 = repmat(0.6*signal, [ntrials, 1])+ 0.1*randn(ntrials, npoints);
data_all = [data1; data2];
cond = [ones(ntrials,1); 2*ones(ntrials,1)];
figure; plot(t, data1, 'r', t, data2, 'k')
%% The following works, but only for a single point of the time series.
modelspec = 'Var2 ~ cond';
tbl = table(cond, data_all(:,npoints/2));
mdl = fitglm(tbl,modelspec,'Distribution','normal')
%% Looping over data points is time consuming.
coeffs = zeros(2, npoints); % preallocate: intercept and slope for each time point
tic
for i = 1:npoints
tbl = table(cond, data_all(:,i));
mdl = fitglm(tbl,modelspec,'Distribution','normal');
coeffs(:,i) = mdl.Coefficients.Estimate;
end
toc
figure; plot(t, coeffs)
%% fitglm does not accept the full time series at once.
tbl = table(cond, data_all);
mdl = fitglm(tbl,modelspec,'Distribution','normal')
%% Different notation, but same problem.
mdl = fitglm(cond, data_all, 'Distribution','normal')

Answers (1)

Jaynik on 24 Jul 2024
Hi Niko,
The fitglm function in MATLAB is designed to work with a univariate response variable. For a multivariate response variable (such as a time series), you would typically need to fit a separate model for each time point, as is done in your loop.
One way to speed this up is to use MATLAB's parallel computing tools, such as parfor instead of for. The iterations then run in parallel, which can speed up the computation considerably on a multi-core processor.
Here is how the code can be modified:
if isempty(gcp('nocreate'))
parpool
end
tic
coeffs = zeros(2,npoints);
parfor i = 1:npoints
tbl = table(cond, data_all(:,i));
mdl = fitglm(tbl,modelspec,'Distribution','normal');
coeffs(:,i) = mdl.Coefficients.Estimate;
end
toc
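Please note that running parfor in parallel requires the Parallel Computing Toolbox; without it, the loop runs serially. Under the default preferences parfor starts a pool automatically, so the explicit parpool call above is optional, but it keeps the pool startup time out of the tic/toc measurement.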
Although this approach speeds up the computation, it still fits a separate GLM for each time point. For a large number of time points, consider an approach that estimates all time points at once. However, this would likely mean moving away from fitglm itself.
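For the specific case in the question ('Distribution','normal' with the default identity link), the GLM coefficient estimates coincide with ordinary least squares estimates, so one possible way to get the whole time series of coefficients is a single matrix solve. The following is only a minimal sketch under that assumption: it does not use Wilkinson notation, the design matrix X is built by hand to mirror 'Var2 ~ cond', and names such as coeffs_ls, se and tstat are illustrative rather than anything fitglm returns. It does not carry over to other distributions or link functions.

```matlab
% Sketch only: valid for 'Distribution','normal' with the identity link,
% where GLM coefficient estimates equal ordinary least squares estimates.
rng(0);                                   % reproducible noise (assumption, not in the original)
npoints = 1000;
ntrials = 20;
t = (1:npoints)/1000;
signal = (1-cos(2*pi*t))/2;
data_all = [repmat(1.0*signal, [ntrials, 1]); repmat(0.6*signal, [ntrials, 1])] ...
           + 0.1*randn(2*ntrials, npoints);
cond = [ones(ntrials,1); 2*ones(ntrials,1)];

% Design matrix mirroring 'Var2 ~ cond': intercept column plus cond as a numeric predictor.
X = [ones(size(cond)), cond];

% One least-squares solve for all time points; each column of coeffs_ls
% holds [intercept; slope] for one time point.
coeffs_ls = X \ data_all;                 % 2 x npoints

% Optional: standard errors and t-statistics for every time point.
resid  = data_all - X*coeffs_ls;
dfe    = size(X,1) - size(X,2);           % residual degrees of freedom
sigma2 = sum(resid.^2, 1) / dfe;          % 1 x npoints residual variances
se     = sqrt(diag(inv(X'*X)) * sigma2);  % 2 x npoints standard errors
tstat  = coeffs_ls ./ se;

% Spot check against fitglm at one time point
% (requires the Statistics and Machine Learning Toolbox).
k = npoints/2;
mdl = fitglm(table(cond, data_all(:,k)), 'Var2 ~ cond', 'Distribution', 'normal');
maxdiff = max(abs(mdl.Coefficients.Estimate - coeffs_ls(:,k)));  % should be close to zero

figure; plot(t, coeffs_ls)
```

Since the solve touches the design matrix only once, it avoids the per-iteration table construction and model object overhead of the loop. If the full fitglm output (deviance, diagnostics, and so on) is needed at every time point, the loop or parfor version remains the way to get it.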
Hope this helps!
