Understanding MATLAB's built-in SVM cross-validation with fitcsvm

Carlos Mendoza on 30 Aug 2020
Commented: Xingwang Yong on 3 Oct 2020
I have a dataset of 53 trials and I want to do leave-one-out cross-validation of a binary classifier. I tried to do the cross-validation of an SVM explicitly, with this code:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'BoxConstraint', 0.046125, 'ClassNames', class_names};
n_trials = 53;
SVMModel = cell(n_trials, 1);
estimated_labels = cell(n_trials, 1);
for i_trial = 1:n_trials
    %% Train on all trials except the current one
    train_set_indices = [1:i_trial-1 i_trial+1:n_trials];
    SVMModel{i_trial} = fitcsvm(input_data(train_set_indices, :), ...
        true_labels(train_set_indices), SVM_params{:});
    %% Predict the left-out trial
    [estimated_labels(i_trial), score] = predict(SVMModel{i_trial}, ...
        input_data(i_trial, :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
which gives me a class_error of 0.4151.
However, if I try MATLAB's built-in SVM cross-validation:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names};
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
then CSVM.kfoldLoss equals 0.3208. Why the difference? What am I doing wrong in my explicit cross-validation?
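One way to rule out a loss-definition mismatch is to request the misclassification rate explicitly and compare it against a manual count built from kfoldPredict. A minimal sketch, assuming the input_data, true_labels, and class_names variables from the question:

```matlab
% Sketch: check that kfoldLoss really is the misclassification rate.
% kfoldLoss defaults to 'LossFun', 'classiferror' for classification
% models, but requesting it explicitly makes the comparison unambiguous.
CSVM = fitcsvm(input_data, true_labels, ...
    'KernelFunction', 'linear', 'Standardize', true, ...
    'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names);

lossBuiltin = kfoldLoss(CSVM, 'LossFun', 'classiferror');

% Manual version of the same quantity, using the cross-validated labels:
predLabels = kfoldPredict(CSVM);
lossManual = mean(~strcmp(predLabels, true_labels));
% lossBuiltin and lossManual should agree.
```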
I did the same exercise with 'Standardize', false and 'KernelScale', 987.8107 (optimized hyperparameters), and the difference is more dramatic: class_error = 0.4528, while CSVM.kfoldLoss = 0.
Finally, I would also like to know what the training and validation sets were for each of the trained models in CSVM.Trained. I would like to call predict on each trained model with its left-out sample (trial) and compare the result with CSVM.kfoldPredict.
Update 1: I found that c.training and c.test return the indices of the training and test sets. However, this code
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, 'CVPartition', c, ...
    'BoxConstraint', BoxConstraint, 'ClassNames', class_names};
estimated_labels = cell(1, 53);
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
for ii = 1:53
    estimated_labels(ii) = predict(CSVM.Trained{ii}, input_data(c.test(ii), :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
gives me class_error = 0.5849, which is different from CSVM.kfoldLoss (0.3208). Why the difference? Is this the right way to double-check the cross-validation?
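One likely pitfall in this kind of fold-by-fold check is alignment: fold k's model CSVM.Trained{k} must be evaluated on the observation selected by test(c, k), and its prediction must be compared against the true label of that same observation, not against true_labels(k). A minimal sketch of an aligned check, assuming the input_data, true_labels, class_names, and BoxConstraint variables from above:

```matlab
% Sketch: reproduce the cross-validated predictions fold by fold,
% keeping predictions aligned with the trials they belong to.
n_trials = size(input_data, 1);
c = cvpartition(n_trials, 'Leaveout');   % reuse this same partition everywhere
CSVM = fitcsvm(input_data, true_labels, ...
    'KernelFunction', 'linear', 'Standardize', true, ...
    'CVPartition', c, 'BoxConstraint', BoxConstraint, 'ClassNames', class_names);

estimated_labels = cell(n_trials, 1);
for k = 1:c.NumTestSets
    testIdx = find(test(c, k));          % index of the trial left out in fold k
    % Store the prediction at the trial's own position, so it lines up
    % with true_labels regardless of fold ordering:
    estimated_labels(testIdx) = predict(CSVM.Trained{k}, input_data(testIdx, :));
end
class_error = mean(~strcmp(true_labels(:), estimated_labels));
% class_error should now agree with kfoldLoss(CSVM) and the labels
% should match kfoldPredict(CSVM).
```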
Update 2: I attached the data.
Thanks!
  2 comments
Image Analyst on 31 Aug 2020
No answers probably because you forgot to attach your data.
Carlos Mendoza on 31 Aug 2020
I didn't forget; I thought that the code would be enough. My mistake.


Answers (1)

Xingwang Yong on 29 Sep 2020
Maybe kfoldLoss uses a different definition of loss than yours. Your definition is 1-accuracy.
https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedkernel.kfoldloss.html?s_tid=srchtitle
  2 comments
Xingwang Yong on 3 Oct 2020
class_error = error_count / n_trials
            = (n_trials - correct_count) / n_trials
            = 1 - correct_count / n_trials
            = 1 - accuracy
That is your definition of loss.



Version

R2019b
