Machine Learning: Use cross-validation between time series
Hello,
I am working on the following task:
Given: About 30 measurement series. Each series contains measurements up to the failure of a system. I have divided the data points of each lifetime into 5 classes (A = data points in 0-20% of the lifetime, B = 20-40% of the lifetime, ... E = 80-100% of the lifetime).
Goal: I want to determine which state a given system is in.
Solution: I have used the function "fitcauto" to train many classification algorithms and to choose the best one. However, there is a problem: the function uses cross-validation, which divides the input data into training and validation data, and this division cuts across the measurement series. As a result, data points from the same series end up in both the training data and the validation data. That makes the task too easy, because the model only has to interpolate the missing sections of series it has already seen. When it is later given a completely new series, it will perform very badly. The solution I want to try is to do the cross-validation at the level of the measurement series, so that all data points of one series are either in the training data or in the validation data.
Question: Is this type of cross-validation possible in MATLAB, especially with the "fitcauto" function? If yes, how? If no, is there an alternative MATLAB function?
1 comment
  Magsud Hasanov
 on 22 Jul 2022
Hi Paul,
I am also working on time series forecasting and have been looking for a MATLAB cross-validation implementation as well.
Hope we'll find the answer.
All the best,
Magsud
Answers (1)
  Ayush Aniket
 on 11 Jun 2025
You can perform grouped cross-validation with the cvpartition function by partitioning the measurement series instead of the individual data points. This ensures that all data points from a single measurement series stay together in either the training set or the validation set. Refer to the following documentation and the code snippet below: https://www.mathworks.com/help/stats/cvpartition.html#mw_9d9b6de7-30dc-4a1c-9349-370602efa9f2
% Assume 'SeriesID' is a column vector (one entry per row of X) indicating the measurement series
K = 10; % Number of folds
seriesGroups = unique(SeriesID); % Unique measurement series
cvp = cvpartition(numel(seriesGroups), 'KFold', K); % Partition the series, not the data points
accuracy = zeros(K, 1); % Preallocate the per-fold accuracy
% Prepare training and test sets based on the grouped partition
for i_fold = 1:K
    testSeries = seriesGroups(test(cvp, i_fold)); % Series held out in this fold
    trainSeries = seriesGroups(training(cvp, i_fold)); % Series used for training
    % Select data points belonging to the respective series
    trainIdx = ismember(SeriesID, trainSeries);
    testIdx = ismember(SeriesID, testSeries);
    trainX = X(trainIdx, :);
    trainY = Y(trainIdx);
    testX = X(testIdx, :);
    testY = Y(testIdx);
    % Train model using fitcauto
    % (by default fitcauto still validates internally on trainX, and that inner split is not grouped by series)
    trainedModel = fitcauto(trainX, trainY);
    % Evaluate model on the held-out series
    predictions = predict(trainedModel, testX);
    % '==' assumes numeric or categorical labels; use strcmp for cell arrays of character vectors
    accuracy(i_fold) = mean(predictions == testY);
end
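As a quick sanity check of the series-level split, the following self-contained sketch runs the same scheme on synthetic data and asserts that no series ends up in both sets. The data sizes, K = 5, the synthetic features, and the use of fitctree as a fast stand-in for fitcauto are all assumptions made for illustration, not part of the original problem.

```matlab
rng(1);                                    % reproducible synthetic data
nSeries = 30;  nPerSeries = 40;            % assumed sizes, for illustration only
SeriesID = repelem((1:nSeries)', nPerSeries);
lifeFrac = repmat((1:nPerSeries)' / nPerSeries, nSeries, 1);
% Two synthetic features: a noisy copy of the lifetime fraction and pure noise
X = [lifeFrac + 0.1 * randn(size(lifeFrac)), randn(numel(lifeFrac), 1)];
Y = discretize(lifeFrac, 0:0.2:1);         % numeric class labels 1..5

K = 5;
seriesGroups = unique(SeriesID);
cvp = cvpartition(numel(seriesGroups), 'KFold', K);   % partition the series

accuracy = zeros(K, 1);
for i_fold = 1:K
    testIdx  = ismember(SeriesID, seriesGroups(test(cvp, i_fold)));
    trainIdx = ~testIdx;                   % every other series is used for training

    % No series may contribute rows to both sets
    assert(isempty(intersect(SeriesID(trainIdx), SeriesID(testIdx))));

    mdl = fitctree(X(trainIdx, :), Y(trainIdx));      % stand-in for fitcauto
    accuracy(i_fold) = mean(predict(mdl, X(testIdx, :)) == Y(testIdx));
end
fprintf('Grouped CV accuracy: %.3f +/- %.3f\n', mean(accuracy), std(accuracy));
```

Reporting the mean and spread over folds gives an estimate of how the model might behave on a series it has never seen, which is the situation described in the question.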
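Additionally, if fitcauto does not suit your workflow, you can train a fixed model type inside the same loop, for example with fitcecoc (multiclass SVM) or fitcensemble (ensemble learning), in place of the fitcauto call. Note that cvp above partitions the series, not the individual data points, so it cannot be passed as the 'CVPartition' argument together with trainX and trainY. Instead, fit on the training subset of each fold:
trainedModel = fitcecoc(trainX, trainY);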
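The snippets above assume that X, Y, and SeriesID already exist. As a minimal sketch of how they might be built from the data described in the question, the following code stacks several series, records which series each row came from, and bins the lifetime fraction into the classes A to E. The cell array seriesData, its sizes, and the random values are hypothetical placeholders.

```matlab
% Hypothetical input: one matrix per measurement series
% (rows = data points in time order up to failure, columns = features)
rng(0);                                    % reproducible placeholder data
seriesData = {rand(50, 3), rand(80, 3), rand(65, 3)};

X = [];  SeriesID = [];  lifeFrac = [];
for s = 1:numel(seriesData)
    n = size(seriesData{s}, 1);
    X        = [X; seriesData{s}];             %#ok<AGROW> stacked features
    SeriesID = [SeriesID; repmat(s, n, 1)];    %#ok<AGROW> series index per row
    lifeFrac = [lifeFrac; (1:n)' / n];         %#ok<AGROW> fraction of lifetime in (0, 1]
end

% Bin the lifetime fraction into five classes: A = 0-20 %, ..., E = 80-100 %
Y = discretize(lifeFrac, 0:0.2:1, 'categorical', {'A', 'B', 'C', 'D', 'E'});
```

With these three variables in the workspace, the grouped loop can be run unchanged, since categorical labels support the == comparison used for the accuracy.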