How to replicate Regression Learner app-based training using a MATLAB script?

47 views (last 30 days)
Quazi Hussain
Quazi Hussain on 15 Aug 2025 at 18:29
Edited: dpb on 18 Aug 2025 at 17:38
I have trained an ML model in the Regression Learner app using the optimizable GPR model with the default settings, such as 5-fold cross-validation and 30 iterations. Now I am trying to do the same with a MATLAB script, using the following, where X holds the regressors and Y is the response variable.
>> ML_mdl=fitrgp(X,Y,'OptimizeHyperparameters','all','HyperparameterOptimizationOptions',struct('KFold',5))
Are the two resulting models more or less equivalent? I know there will be some difference due to the probabilistic nature of the algorithm. When I test it on the entire training set, the R-squared value is practically 1.0. Is it overfitting even with k-fold cross-validation? The prediction on the unseen test set is not that good. Any suggestions?
  3 comments
dpb
dpb on 16 Aug 2025 at 19:29
Edited: dpb on 17 Aug 2025 at 16:56
"Is it overfitting even with k-fold cross-validation? The prediction on the unseen test set is not that good. Any suggestions?"
Possibly. It depends on how much data you've got, although the other possibility is that the test dataset simply differs from the dataset used for training.
Without the data to look at, we're shooting in the dark.
As an aside, regarding @Umar's comment "slightly different": a recent thread here in the forum illustrated that the randomized selection of the training dataset occasionally produced a grossly different result from the same overall dataset. That indicated that there were subsets of the total dataset with markedly different characteristics than other random subsets.

One cannot naively assume that recalculating with a different training subset will always produce similar model estimates; that will be true only if all random subsets of the overall data are similar to each other in their pertinent characteristics. In particular, different models are sensitive to different things; for example, some may be very susceptible to outliers, in which case a single training set that happens to pick up an outlier may result in a very different model from a training set without any such extreme values.

Unfortunately, "it all depends", and about the only way to know with such algorithms is to run them a number of times and observe just how stable (or unstable) the results are.
OLS, on the other hand, uses the entire dataset and so is deterministic, although again the results may be affected by the presence of outliers, and how strongly still depends on the particular model chosen.
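That stability check can be sketched as follows. This is only an illustration, assuming X and Y as in the question; a fixed kernel is used here (an assumption, not the optimized one) so that each refit is fast enough to repeat:

```matlab
% Sketch: refit with several seeds and compare the 5-fold CV loss to see
% how stable (or unstable) the result is for this particular dataset.
seeds = 1:5;
cvLoss = zeros(size(seeds));
for k = 1:numel(seeds)
    rng(seeds(k));                                   % different seed each run
    mdl = fitrgp(X, Y, 'KernelFunction', 'matern52'); % fixed kernel for speed
    cv  = crossval(mdl, 'KFold', 5);                 % 5-fold partition of X,Y
    cvLoss(k) = kfoldLoss(cv);                       % mean squared error
end
disp(cvLoss)  % a large spread across seeds indicates dataset sensitivity
```

If the losses vary widely from seed to seed, that is a sign the random subsets of the data differ markedly in character, as described above.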
Umar
Umar on 17 Aug 2025 at 17:49

@dpb - You're absolutely right, I oversimplified that. The "slightly different" comment assumes well-behaved data, but as you point out, some datasets can produce dramatically different models depending on the random subset selection.

For @Quazi Hussain's case, this variability could actually explain the overfitting issue. If CV folds are inconsistent due to data heterogeneity, the hyperparameter optimization might be fitting noise rather than signal.

Good suggestion to run multiple times with different seeds to check stability - high variability would indicate the dataset sensitivity you mentioned.

Thanks for the clarification.

Sign in to comment.

Accepted Answer

dpb
dpb on 15 Aug 2025 at 20:50
To replicate the fit, generate the training function from the Learner app.
To produce identical results, set the random number seed before performing the fit in both the Learner app and at the command line.
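A minimal sketch of the command-line side, assuming the app was left at its defaults (5-fold validation, 30 objective evaluations):

```matlab
% Sketch: seed the generator, then run the same optimizable-GPR fit the
% app performs. The same seed must also be set before the fit in the app.
rng(1);
ML_mdl = fitrgp(X, Y, ...
    'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', struct( ...
        'KFold', 5, ...                      % app default: 5-fold validation
        'MaxObjectiveEvaluations', 30));     % app default: 30 iterations
```

Note that with the same seed and the same options, the Bayesian optimization should follow the same sequence of evaluations in both environments.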
  2 comentarios
Quazi Hussain
Quazi Hussain on 18 Aug 2025 at 13:43
In the script, I can set the random number generator to a seed, say 1, by calling rng(1) right before the fit command. How do I do that in Regression Learner? Do I do that in the MATLAB Command Window prior to invoking the app?
>> rng(1)
>> regressionLearner
or is there somewhere in the app settings I can do that? Thanks.
dpb
dpb on 18 Aug 2025 at 14:28
Edited: dpb on 18 Aug 2025 at 17:38
Yes, set it in the MATLAB Command Window prior to invoking(*) the app; the random number generator stream is global in MATLAB, so it will pick up from the last invocation/reset.
This means, of course, that you can't call anything else that generates another random number between setting the seed and running the fit, or the two runs will not start from the same point in the stream.
It probably would not be a bad enhancement request to ask for a way to set the seed inside the app to facilitate such use.
ADDENDUM:
(*) Actually, you should be able to just go to the command line while in the app and reset the seed there. That is easy enough to check: set the seed, run a fit, then reset the seed to the same value and verify that the fit is replicated.
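The replication check from the command line alone can be sketched like this (assuming X and Y as in the question; with no intervening random-number calls, the two fits consume the same stream):

```matlab
% Sketch: two fits from the same seed should select the same model.
rng(1);
mdl1 = fitrgp(X, Y, 'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', struct('KFold', 5));

rng(1);  % reset to the same seed before the second fit
mdl2 = fitrgp(X, Y, 'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', struct('KFold', 5));

% Identical seeds and options should yield identical kernel parameters
isequal(mdl1.KernelInformation.KernelParameters, ...
        mdl2.KernelInformation.KernelParameters)
```

The same idea applies between the app and the command line: reset the seed immediately before each fit and the optimization should retrace the same steps.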

Sign in to comment.

More Answers (0)

Categories

More about Support Vector Machine Regression in Help Center and File Exchange.

Release

R2023b
