Lasso/Elastic Net feature selection with kFold crossvalidation
I want to understand how Lasso/Elastic Net regression selects the final features when using kFold cross-validation and using the function: [B, stats] = lasso(featData, classData, 'CV', 10) (from the Statistics & ML toolbox).
In my understanding, if the model is trained 10 times on different subsets of the total sample, this may result in different features selected/penalized in every fold. However, the cross-validated model output does not provide any insight on the variability of those features across different folds. Is the best model simply chosen among all folds and applied to the entire training set? Or are features averaged/weighted based on their stability across folds?
There was a related question previously, but nobody ever answered it:
https://www.mathworks.com/matlabcentral/answers/125357-understanding-k-fold-cross-validation
Thanks for your help!
1 Comment
Tyson
on 23 Jul 2018
This is an important thread. We are also looking for clarification on this exact question. We do not find any info about the beta values for the k-folds in the FitInfo, only a single set of beta values for each lambda. Exactly how were these betas determined?
Answers (1)
Bernhard Suhm
on 22 Apr 2018
Cross-validation applies only to assessing model performance. As described in the doc, with k-fold the average error across the k different partitions is reported. The model itself is trained on the complete data set that you provide to the training function, in this case lasso.
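As a rough illustration of the point above, here is a minimal sketch on synthetic data (the data, variable names, and coefficients are made up for this example). It assumes the documented behavior that B holds one coefficient column per Lambda value, and that with 'CV' the FitInfo struct also carries the cross-validated MSE per Lambda together with the IndexMinMSE and Index1SE indices, so the folds are used to pick a Lambda rather than to produce per-fold coefficients.

```matlab
% Hypothetical synthetic data: 100 observations, 8 predictors,
% of which only predictors 1 and 4 actually drive the response.
rng(0);
X = randn(100, 8);
y = 3*X(:,1) - 2*X(:,4) + 0.5*randn(100, 1);

% B is p-by-L: one column of coefficients per Lambda value.
% With 'CV', FitInfo also holds the cross-validated MSE for each Lambda.
[B, FitInfo] = lasso(X, y, 'CV', 10);

idxMin = FitInfo.IndexMinMSE;   % Lambda with the minimum cross-validated MSE
idx1SE = FitInfo.Index1SE;      % largest Lambda within one SE of that minimum

% Features with nonzero coefficients at each of the two candidate Lambdas.
selectedMin = find(B(:, idxMin) ~= 0);
selected1SE = find(B(:, idx1SE) ~= 0);
```

Choosing the Index1SE column instead of the IndexMinMSE column usually gives a sparser model at a small cost in estimated error; which one to use is a judgment call.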
3 Comments
Bernhard Suhm
on 30 Apr 2018
You are right, and I asked internally for additional clarification. If you use the kfold argument, you don't get a "final" model back with features weighted or averaged, but pointers to all k models, whose coefficients (or selected features) may differ slightly. If they do differ, that is a sign those features aren't very strong, so you wouldn't want them in your final model. You can get additional information on the various fitted models in the FitInfo field of the output object, but you have to analyze the variability across the different fits yourself. Alternatively, you can retrain the model without k-fold, which will give you the best features using the complete data set.
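One way to analyze that variability yourself is to refit lasso on each training fold and count how often every feature is selected. The sketch below is only an illustration under stated assumptions: the synthetic data, the fixed penalty lambda = 0.1, and the variable names are hypothetical, and the loop is a manual procedure, not a description of what lasso does internally.

```matlab
% Hypothetical synthetic data: only predictors 1 and 4 are informative.
rng(0);
X = randn(100, 8);
y = 3*X(:,1) - 2*X(:,4) + 0.5*randn(100, 1);

k = 10;
lambda = 0.1;   % assumed fixed penalty, chosen only for illustration
cvp = cvpartition(size(X, 1), 'KFold', k);

% selected(j, i) is true if feature j has a nonzero coefficient in fold i.
selected = false(size(X, 2), k);
for i = 1:k
    trainIdx = training(cvp, i);                       % logical index of fold i's training rows
    Bfold = lasso(X(trainIdx, :), y(trainIdx), 'Lambda', lambda);
    selected(:, i) = Bfold ~= 0;
end

% Fraction of folds in which each feature was selected (1 = always, 0 = never).
selectionFreq = mean(selected, 2);
```

Features with a selection frequency near 1 are stable across folds; features that appear in only a few folds are candidates to leave out of the final model.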