Monte Carlo repetitions with customized partitions

Question

0 votos

k = 5; %number of partitions
c = cvpartition(Labels{2},"KFold", k ,"Stratify",true);
test_idx = test(c,"all");
for ii = 1:5
    %%%% Divide into train and test set via logical indexing (5columns = 5
    %%%% partitions. Label 1 & 3 are always used for testing
    testIndices(:,ii) = logical([ones(numel(Labels{1}),1); test_idx(:,ii); ones(numel(Labels{3}),1)]);
end
c = cvpartition("CustomPartition",testIndices);

I want to customize partitions for cross-validation, but with some of the samples to be tested for in each partition. Is there a way to do it?

I tried using cvpartition, but I can either customize the partitions and get the Error: "Each observation must be present in one test set."

Or I use monte carlo repetitions which allows for samples to be used more than once as testing set, but then I cant customize the sets anymore.

I'm thankful for any hint.

9 comentarios
Mostrar 7 comentarios más antiguos Ocultar 7 comentarios más antiguos

Tobias Rieker el 3 de Abr. de 2024

Editada: Tobias Rieker el 3 de Abr. de 2024

Abrir en MATLAB Online

I tried it out, but still doesnt work:

SFS_xtrain is the data to be partioned (KFold = 5) Then every fold the data to be only tested is concatenated (=SFS_xAlwaysIn...)

I have no idea why the arrays are not consistent.

k = 5;
c = cvpartition(SFS_ytrain,"KFold", k ,"Stratify",true);
opts = statset("UseParallel",true);
fun = @(XTrain,yTrain,XTest,yTest) errorFun(XTrain,yTrain, [XTest; SFS_xAlwaysIn], [yTest; SFS_yAlwaysIn]); %%%% FOR CV
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c,"nfeatures",nfeatures,"options",opts);
function error = errorFun(XTrain,yTrain,XTest,yTest)
% Create the model with the learning method of your choice
classifier = fitcdiscr(XTrain,yTrain);
% Calculate the number of test observations misclassified
ypred = predict(classifier,XTest);
error = nnz(ypred ~= yTest);
end
______
Error using crossval>evalFun
The function '@(XTrain,yTrain,XTest,yTest)errorFun(XTrain,yTrain,[XTest;SFS_xAlwaysIn],[yTest;SFS_yAlwaysIn])' generated
the following error:
Dimensions of arrays being concatenated are not consistent.
Error in crossval>getFuncVal (line 509)
funResult = evalFun(funorStr,arg(:));
Error in crossval (line 355)
funResult = getFuncVal(1, nData, cvp, data, funorStr, []);
Error in sequentialfs>callfun (line 500)
funResult = crossval(fun,x,other_data{:},...
    
Error in sequentialfs (line 368)
crit(k) = callfun(fun,x,other_data,cv,mcreps,ParOptions);

Tobias Rieker el 3 de Abr. de 2024

Indeed SFS_xAlwaysIn has more columns than XTest. The reason is that Sequentialfs chooses 1,2,3...n columns (features) of the data to be tested against the testing data. This choosing of one of the features doesnt happen with the concatenated testing data SFS_xAlwaysIn though as it is "externally added". Is there a way to automatically chose the same features for SFS_xAlwaysIn?

Thank you for your big help so far. It is highly appreciated

Harald el 3 de Abr. de 2024

Duh... that makes sense. I suppose it will take some fiddling to address this.

I would try this strategy:

To be able to determine which columns were chosen, add fake data 1:numColumns to on top of the x-values and some nonsense value that does not appear in your y-values on top of the y-values that you supply to sequentialfs.
Identify which of the y-values passed to the function (either yTrain or yTest) contains the nonsense value. Extract the corresponding row of x-values from xTrain or xTest. This will tell you which columns were sent into the function.
Extract the corresponding columns from SFS_xAlwaysIn and add it to the test data. Be sure to remove the fake data of the first step.

I expect this to be somewhat tricky and would be happy to try to help, but would really need some sample data for SFS_xtrain and SFS_ytrain to play with. Perhaps I should be able to infer this, but I am not even sure of the data type of SFS_ytrain.

Best wishes,

Harald

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

Harald el 4 de Abr. de 2024

Abrir en MATLAB Online

0 votos

I have now tried the approach discussed in the comments with sample data based on fisheriris.mat.

%% Sample data
load fisheriris.mat
species = categorical(species);
% Shuffle data
order = randperm(length(species));
meas = meas(order,:);
species = species(order,:);
SFS_xtrain = meas(1:130,:);
SFS_ytrain = species(1:130);
SFS_xAlwaysIn = meas(131:end,:);
SFS_yAlwaysIn = species(131:end);
%% Add fake data
SFS_xtrain = [1:size(SFS_xtrain, 2); SFS_xtrain];
SFS_ytrain = ["nonsense"; SFS_ytrain];
%% Your code (for now without setting "nfeatures" and "options")
k = 5;
c = cvpartition(SFS_ytrain,"KFold", k ,"Stratify",true);
% opts = statset("UseParallel",true);
fun = @(XTrain,yTrain,XTest,yTest) callErrorFun(XTrain,yTrain, XTest, yTest, SFS_xAlwaysIn, SFS_yAlwaysIn);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c);
%% A helper function
function err = callErrorFun(XTrain,yTrain, XTest, yTest, SFS_xAlwaysIn, SFS_yAlwaysIn)
if sum(yTrain == "nonsense") == 1
    idx = yTrain == "nonsense";
    columns = XTrain(idx, :);
    XTrain(idx,:) = [];
    yTrain(idx) = [];
elseif sum(yTest == "nonsense") == 1
    idx = yTest == "nonsense";
    columns = XTest(idx, :);
    XTest(idx,:) = [];
    yTest(idx) = [];
else
    error("Something unexpected happened. Revisit the approach...")
end
XTrain = [XTrain; SFS_xAlwaysIn(:, columns)];
yTrain = [yTrain; SFS_yAlwaysIn];
err = errorFun(XTrain,yTrain,XTest,yTest);
end
%% Your function
function error = errorFun(XTrain,yTrain,XTest,yTest)
% Create the model with the learning method of your choice
classifier = fitcdiscr(XTrain,yTrain);
% Calculate the number of test observations misclassified
ypred = predict(classifier,XTest);
error = nnz(ypred ~= yTest);
end

I hope you'll find this to be helpful.

Best wishes,

Harald

2 comentarios
Mostrar Ninguno Ocultar Ninguno

Tobias Rieker el 4 de Abr. de 2024

Editada: Tobias Rieker el 4 de Abr. de 2024

Abrir en MATLAB Online

Thanks to your above meantioned idea I have now figured it out. Thank you!

This is my approach:

% add fake data ontop of columns 
num_col = 1:numel(EMG_chanels_remaining);  %EMG_chanels_remaining = number of features
SFS_xtrain = [num_col;SFS_xtrain];
SFS_ytrain = [categorical(1);SFS_ytrain];
fun = @(XTrain,yTrain,XTest,yTest) errorFun(XTrain,yTrain, XTest, SFS_xAlwaysIn, yTest,SFS_yAlwaysIn); %%%% FOR CV
% CVpartition Object
k = 10;
c = cvpartition(SFS_ytrain,"KFold", k ,"Stratify",true);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c,"nfeatures",nfeatures,"options",opts);

function error = errorFun(XTrain,yTrain, XTest, SFS_xAlwaysIn, yTest,SFS_yAlwaysIn)
%Find where fake data is & extract columns of first row = the features
%included in SFS
if ismember(yTrain(1,:),categorical(1:256))
    Ch_count = XTrain(1,:);
    XTrain(1,:) = [];
    yTrain(1,:) = [];
else
    Ch_count = XTest(1,:);
    XTest(1,:) = [];
    yTest(1,:) = [];
end
%add data only to be tested with according columns(features)
XTrain = [XTrain;SFS_xAlwaysIn(:,Ch_count)];
yTrain = [yTrain;SFS_yAlwaysIn];
classifier = fitcdiscr(XTrain,yTrain);
% Calculate the number of test observations misclassified
ypred = predict(classifier,XTest);
error = nnz(ypred ~= yTest);
end

Harald el 4 de Abr. de 2024

Glad it's working for you! If you found the answer to be helpful, please consider "accept"-ing it.

Best wishes,

Harald

Iniciar sesión para comentar.

Monte Carlo repetitions with customized partitions

9 comentarios
Mostrar 7 comentarios más antiguos Ocultar 7 comentarios más antiguos

Respuesta aceptada

2 comentarios
Mostrar Ninguno Ocultar Ninguno

Más respuestas (0)

Categorías

Etiquetas

Community Treasure Hunt

Monte Carlo repetitions with customized partitions

9 comentarios Mostrar 7 comentarios más antiguos Ocultar 7 comentarios más antiguos

Respuesta aceptada

2 comentarios Mostrar Ninguno Ocultar Ninguno

Más respuestas (0)

Categorías

Etiquetas

Ver también

Community Treasure Hunt

9 comentarios
Mostrar 7 comentarios más antiguos Ocultar 7 comentarios más antiguos

2 comentarios
Mostrar Ninguno Ocultar Ninguno