Would kfold loss values vary if cross validation is performed after model training?
Charles Bergen
on 9 May 2025
Edited: the cyclist on 10 May 2025
I am concerned about differences in cross-validated (CV) predictions (kfoldPredict) for regression bagged ensembles (fitrensemble) when CV is performed after the model has been trained. If I understand this correctly, a fitrensemble model trained without CV has access to all of the data, so its trees will have node split values different from those of the trees grown by fitrensemble with CV on, where each fold is trained on only a subset of the observations. Differences in these split values would then lead to an overall difference in the possible predictions of the trees in the two models.
I guess this boils down to: do crossval and the subsequent kfoldLoss or kfoldPredict calls (really, any of the CV prediction functions) account for these differences when supplied a model that did not perform initial cross-validation?
If there is an error in my thinking, please let me know.
I have tried to illustrate my question with the example below.
% No initial CV
Mdl = fitrensemble(looperValues(:,1:cherrios), allratios2, ...
    'Learners',t,'Weights',W1,'Method','Bag', ...
    'NumLearningCycles',numblearningcyc,'Options',statset('UseParallel',true));
Mdl_CV_After_Training = crossval(Mdl,'KFold',10);
Mdl_CV_After_Training_kfold_predictions = kfoldPredict(Mdl_CV_After_Training)
VS
% Yes initial CV
Mdl_With_CV = fitrensemble(looperValues(:,1:cherrios), allratios2, ...
    'Learners',t,'CrossVal','on','Weights',W1,'Method','Bag', ...
    'NumLearningCycles',numblearningcyc,'Options',statset('UseParallel',true));
Mdl_Yes_CV_kfold_predictions = kfoldPredict(Mdl_With_CV)
% Would Mdl_CV_After_Training_kfold_predictions == Mdl_Yes_CV_kfold_predictions?
0 comments
Accepted Answer
the cyclist
on 9 May 2025
The predictions will be identical, as long as you use the same fold assignments:
% Set seed, for reproducibility
rng default
% Simulate some data
N = 100;
X = randn(N,3);
y = sum(X+0.5*randn(N,1),2);
% Define a partition (which will be used for both models)
p = cvpartition(N,'KFold',10);
% Train one model using cross-validation during training
mdl_1 = fitrensemble(X,y,'CrossVal','on','CVPartition',p);
% Train a second model without using cross-validation during training, but apply it afterward
mdl_2 = fitrensemble(X,y);
mdl2_cv = crossval(mdl_2,'CVPartition',p);
% Make the k-fold predictions
y1 = kfoldPredict(mdl_1);
y2 = kfoldPredict(mdl2_cv);
% See if they are equal -- THEY ARE!
isequal(y1,y2)
If you do not make sure the two models use exactly the same fold assignments, the predictions will not be identical, but they will be statistically equivalent.
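As a minimal sketch of that second point (reusing X, y, and mdl_2 from the code above; pA, pB, cvA, and cvB are new names introduced here just for illustration): predictions from two different partitions disagree observation by observation, while the two k-fold losses land close together.
% Two different randomly drawn fold assignments over the same data
pA = cvpartition(N,'KFold',10);
pB = cvpartition(N,'KFold',10);
% Cross-validate the already-trained model with each partition
cvA = crossval(mdl_2,'CVPartition',pA);
cvB = crossval(mdl_2,'CVPartition',pB);
% The per-observation predictions differ (different fold assignments) ...
isequal(kfoldPredict(cvA),kfoldPredict(cvB))
% ... but the two k-fold losses (MSE) should agree to within sampling error
[kfoldLoss(cvA), kfoldLoss(cvB)]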
3 comments
the cyclist
on 9 May 2025
Edited: the cyclist on 10 May 2025
To make an analogy ...
If you used
N = 1000;
x1 = randn(N,1);
x2 = randn(N,1);
to draw two samples of (pseudo)randomly generated values from a normal distribution, you would not expect those samples to be identical unless you set the seed before each draw, so that you get the same sequence both times. However, you would expect the two samples to have the same statistical properties (the same to within sampling error): the same mean, standard deviation, etc.
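For instance, a quick sketch (using N from above): resetting the seed before each draw produces identical samples:
rng(0); x1 = randn(N,1); % fix the seed, then draw
rng(0); x2 = randn(N,1); % same seed, so the same sequence
isequal(x1,x2)           % returns true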
Similarly, I would not expect your predictions to be identical, but I would expect all of their statistical properties to be the same to within sampling error.
More Answers (0)