Monte Carlo repetitions with customized partitions
k = 5; % number of partitions
c = cvpartition(Labels{2},"KFold",k,"Stratify",true);
test_idx = test(c,"all");
% Preallocate: one logical column of test flags per partition
testIndices = false(numel(Labels{1}) + size(test_idx,1) + numel(Labels{3}), k);
for ii = 1:k
% Divide into training and test sets via logical indexing (k columns = k partitions).
% Labels{1} and Labels{3} are always used for testing.
testIndices(:,ii) = [true(numel(Labels{1}),1); test_idx(:,ii); true(numel(Labels{3}),1)];
end
c = cvpartition("CustomPartition",testIndices);
I want to customize the partitions for cross-validation so that some of the samples are part of the test set in every partition. Is there a way to do this?
I tried using cvpartition, but I can either customize the partitions, which gives the error "Each observation must be present in one test set.", or use Monte Carlo repetitions, which allow samples to be used in more than one test set but no longer let me customize the sets.
I'm thankful for any hint.
9 comments
Harald
on 1 Apr 2024
Hi,
I am not sure if I understand the question correctly: do you want to use some samples always for testing, but never for training? In that case, I'd use cvpartition on all samples except those and would manually add the samples back in that are always supposed to be used for testing. The setdiff function may be helpful for the "all except" part.
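A minimal sketch of this idea, assuming numeric labels and made-up data (the variable names y, alwaysTest, cvIdx, testMask and trainMask are only for illustration and do not come from the original code). It partitions everything except the always-test samples and then adds those samples back to every test fold by hand:

```matlab
% Hypothetical labels: 24 samples that are cross-validated, 6 that are always tested.
y = [repmat([1; 2], 12, 1); 3 * ones(6, 1)];
alwaysTest = (25:30)';                        % assumed indices of the always-test samples
cvIdx = setdiff((1:numel(y))', alwaysTest);   % "all except" those samples

k = 5;
rng(0);                                       % reproducible folds
c = cvpartition(y(cvIdx), "KFold", k, "Stratify", true);

testMask = false(numel(y), k);                % one column of test flags per fold
for ii = 1:k
    testMask(cvIdx(test(c, ii)), ii) = true;  % fold-specific test samples
    testMask(alwaysTest, ii) = true;          % added back manually in every fold
end
trainMask = ~testMask;                        % always-test samples never train
```

trainMask(:,ii) and testMask(:,ii) can then index the full data set in a manual cross-validation loop, which avoids the requirement that a custom cvpartition places each observation in exactly one test set.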
If you get error messages, please always also include the specific code that generated the error message.
Best wishes,
Harald
Tobias Rieker
on 2 Apr 2024
Harald
on 3 Apr 2024
I am assuming that by the "Sequential Feature Selection" function, you mean sequentialfs.
Not sure if that will work and behave as you intend, but consider creating the cvpartition object only from Labels{2} and then customizing it the way you want and adding the additional test samples in when defining the function for criterion selection. Assuming your current function is called myFun, you could then use an anonymous function handle to include the samples that need to always be included:
fun = @(XTrain,yTrain,XTest,yTest) myFun(XTrain,yTrain, [XTest; XAlwaysIn], [yTest; yAlwaysIn])
Best wishes,
Harald
Tobias Rieker
on 3 Apr 2024
Edited: Tobias Rieker
on 3 Apr 2024
Harald
on 3 Apr 2024
Suggestion based on this:
fun = @(XTrain,yTrain,XTest,yTest) errorFun(XTrain,yTrain, [XTest; XAlwaysIn], [yTest; yAlwaysIn]);
[toKeep, ranking] = sequentialfs(fun,X,y,"cv",c,"nfeatures",nfeatures,"options",opts);
Basically, the anonymous function handle serves to modify the inputs to your function.
In the call to sequentialfs, I believe you need to pass all X and y values, not just the training data. Otherwise, sequentialfs will not have access to the test data. It might be that you are already doing this and that I was just misled by the choice of variable names XTrain and yTrain in the call to sequentialfs.
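To illustrate how the anonymous function handle modifies the inputs, here is a small sketch that does not call sequentialfs at all. The criterion countTest and all data values are hypothetical stand-ins; the point is only that XAlwaysIn and yAlwaysIn are captured when the handle is created and appended to whatever test data is passed in later:

```matlab
% Hypothetical always-test samples, captured when the handle is created.
XAlwaysIn = [5 6];
yAlwaysIn = 1;

% Stand-in criterion (assumption): reports how many test rows it receives.
countTest = @(XTrain, yTrain, XTest, yTest) size(XTest, 1);

% Wrapper with the four-input signature that sequentialfs expects.
fun = @(XTrain, yTrain, XTest, yTest) ...
    countTest(XTrain, yTrain, [XTest; XAlwaysIn], [yTest; yAlwaysIn]);

% One fold-specific test row goes in, two rows reach the criterion.
nTestRows = fun([1 2; 3 4], [0; 1], [7 8], 0);
```

Note that the vertical concatenation only works if XTest and XAlwaysIn have the same number of columns.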
Best wishes,
Harald
Tobias Rieker
on 3 Apr 2024
Edited: Tobias Rieker
on 3 Apr 2024
Without access to your complete code and data, it is hard for me to tell what is going on.
My guess is that XTest and SFS_xAlwaysIn have different numbers of columns, or that the same is true for YTest and SFS_yAlwaysIn.
For easier debugging, set UseParallel to false for the moment. Then, set a breakpoint in the anonymous function to view the dimensions of the variables, see
For further assistance, please provide a fully reproducible example, including sample data.
Best wishes,
Harald
Tobias Rieker
on 3 Apr 2024
Harald
on 3 Apr 2024
Duh... that makes sense. I suppose it will take some fiddling to address this.
I would try this strategy:
- To be able to determine which columns were chosen, add a row of fake data 1:numColumns on top of the x-values, and add some nonsense value that does not appear in your y-values on top of the y-values that you supply to sequentialfs.
- Identify which of the y-values passed to the function (either yTrain or yTest) contains the nonsense value. Extract the corresponding row of x-values from xTrain or xTest. This will tell you which columns were sent into the function.
- Extract the corresponding columns from SFS_xAlwaysIn and add them to the test data. Be sure to remove the fake data from the first step.
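The steps above could look roughly like the following sketch. It assumes numeric labels, a made-up sentinel value marker, and hand-built inputs that imitate what the criterion function might receive when columns 2 and 4 are being evaluated; none of these values come from the original data, and in practice the logic would live inside the criterion function:

```matlab
% Hypothetical always-test data with 4 feature columns.
XAlwaysIn = [10 20 30 40; 11 21 31 41];
yAlwaysIn = [1; 2];
marker = -999;                  % assumed sentinel that never occurs in the real labels

% Imitated inputs: the marker row [2 4] happens to be in the training fold.
XTrain = [2 4; 0.5 0.7; 0.1 0.9];
yTrain = [marker; 1; 2];
XTest  = [0.3 0.8];
yTest  = 1;

% Step 2: find the marker row and read the column indices from it.
inTrain = (yTrain == marker);
inTest  = (yTest == marker);
if any(inTrain)
    cols = XTrain(inTrain, :);
else
    cols = XTest(inTest, :);
end

% Step 3: remove the fake row, then append the matching always-test columns.
XTrain(inTrain, :) = [];  yTrain(inTrain) = [];
XTest(inTest, :)   = [];  yTest(inTest)   = [];
XTest = [XTest; XAlwaysIn(:, cols)];
yTest = [yTest; yAlwaysIn];
```

The partition passed to sequentialfs would also have to account for the extra marker row, which this sketch does not address.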
I expect this to be somewhat tricky and would be happy to try to help, but would really need some sample data for SFS_xtrain and SFS_ytrain to play with. Perhaps I should be able to infer this, but I am not even sure of the data type of SFS_ytrain.
Best wishes,
Harald