Cross-validate Gaussian process regression model
cvMdl = crossval(gprMdl)
cvMdl = crossval(gprMdl,Name,Value)

cvMdl = crossval(gprMdl) returns the partitioned model, cvMdl, built from the Gaussian process regression (GPR) model, gprMdl, using 10-fold cross-validation. cvMdl is a RegressionPartitionedModel object, and gprMdl is a RegressionGP (full) object.

cvMdl = crossval(gprMdl,Name,Value) returns the partitioned model, cvMdl, with additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of folds or the fraction of the data to use for testing.
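For example, the two syntaxes can be sketched as follows (a minimal sketch assuming the Statistics and Machine Learning Toolbox; X and y are placeholder training data, not defined on this page):

```matlab
% Fit a full GPR model, then cross-validate it (sketch).
gprMdl = fitrgp(X,y);                 % RegressionGP (full) object
cvMdl  = crossval(gprMdl);            % 10-fold cross-validation by default
cvMdl5 = crossval(gprMdl,'KFold',5);  % same, with 5 folds instead
L = kfoldLoss(cvMdl);                 % cross-validated loss (default: MSE)
```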
gprMdl — Gaussian process regression model
Gaussian process regression model, specified as a RegressionGP (full) object. You cannot call crossval on a compact regression object.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'CVPartition' — Random partition for a k-fold cross-validation
Random partition for a k-fold cross-validation, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition object.
'Holdout' — Fraction of data to use for testing
Fraction of the data to use for testing in holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value, p, in the range from 0 to 1. If you specify 'Holdout',p, then the software:
1. Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data.
2. Stores the compact, trained model in cvgprMdl.Trained.
Example: 'Holdout',0.3 uses 30% of the data for testing and 70% of the data for training.
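A holdout split can be sketched as follows (assuming a fitted RegressionGP model gprMdl; with holdout validation, Trained contains a single compact model):

```matlab
% Hold out 30% of the data for validation (sketch).
cvMdl = crossval(gprMdl,'Holdout',0.3);
trainedMdl = cvMdl.Trained{1};   % compact model trained on the other 70%
L = kfoldLoss(cvMdl);            % loss measured on the held-out 30%
```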
'KFold' — Number of folds
Number of folds to use in the cross-validated GPR model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software:
1. Randomly partitions the data into k sets.
2. For each set, reserves the set as test data, and trains the model using the other k – 1 sets.
3. Stores the k compact, trained models in the cells of a k-by-1 cell array in cvgprMdl.Trained.
Example: 'KFold',5 uses 5 folds in cross-validation. That is, for each fold, it uses that fold as test data, and trains the model on the remaining 4 folds.
'Leaveout' — Indicator for leave-one-out cross-validation
Indicator for leave-one-out cross-validation, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. The default is 'off'. If you specify 'Leaveout','on', then, for each of the n observations, the software:
1. Reserves the observation as test data, and trains the model using the other n – 1 observations.
2. Stores the n compact, trained models in the cells of an n-by-1 cell array in cvgprMdl.Trained.
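For example (sketch, assuming a fitted model gprMdl trained on n observations; leave-one-out is equivalent to n-fold cross-validation with folds of size 1):

```matlab
% Leave-one-out cross-validation (sketch).
cvMdl = crossval(gprMdl,'Leaveout','on');
nModels = numel(cvMdl.Trained);  % n compact models, one per held-out observation
L = kfoldLoss(cvMdl);            % average loss over the n single-observation folds
```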
cvgprMdl — Partitioned Gaussian process regression model
Partitioned Gaussian process regression model, returned as a RegressionPartitionedModel object.
The dataset has 506 observations. The first 13 columns contain the predictor values and the last column contains the response values. The goal is to predict the median value of owner-occupied homes in suburban Boston as a function of 13 predictors.
Load the data and define the response vector and the predictor matrix.
load('housing.data');
X = housing(:,1:13);
y = housing(:,end);
Fit a GPR model using the squared exponential kernel function with separate length scale for each predictor. Standardize the predictor variables.
gprMdl = fitrgp(X,y,'KernelFunction','ardsquaredexponential','Standardize',1);
Create a cross-validation partition for data using predictor 4 as a grouping variable.
rng('default') % For reproducibility
cvp = cvpartition(X(:,4),'kfold',10);
Create a 10-fold cross-validated model using the partition defined in cvp.
cvgprMdl = crossval(gprMdl,'CVPartition',cvp);
Compute the regression loss for in-fold observations using models trained on out-of-fold observations.
L = kfoldLoss(cvgprMdl)
L = 9.5299
Predict the response for in-fold observations, that is, observations not used for training the corresponding model.
ypred = kfoldPredict(cvgprMdl);
For every fold,
kfoldPredict predicts responses for observations in that fold using the models trained on out-of-fold observations.
Plot the actual responses and prediction data.
plot(y,'r.');
hold on;
plot(ypred,'b--.');
axis([0 510 -15 65]);
legend('True response','GPR prediction','Location','Best');
hold off;
Read the data into a table.
tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
The dataset has 4177 observations. The goal is to predict the age of abalone from 8 physical measurements.
Fit a GPR model using the subset of regressors (
sr) method for parameter estimation and fully independent conditional (
fic) method for prediction. Standardize the predictors and use a squared exponential kernel function with a separate length scale for each predictor.
gprMdl = fitrgp(tbl,tbl(:,end),'KernelFunction','ardsquaredexponential',...
    'FitMethod','sr','PredictMethod','fic','Standardize',1);
Cross-validate the model using 4-fold cross-validation. This partitions the data into 4 sets. For each set, crossval uses that set (25% of the data) as the test data, and trains the model on the remaining 3 sets (75% of the data).
rng('default') % For reproducibility
cvgprMdl = crossval(gprMdl,'KFold',4);
Compute the loss over individual folds.
L = kfoldLoss(cvgprMdl,'mode','individual')
L =
    4.3669    4.6896    4.0565    4.3162
Compute the average cross-validated loss over all folds. The default loss is the mean squared error.
L2 = kfoldLoss(cvgprMdl)
L2 = 4.3573
This is equal to the mean loss over individual folds.
mse = mean(L)
mse = 4.3573
You can use only one of these name-value pair arguments at a time.
You cannot compute the prediction intervals for a cross-validated model.
Alternatively, you can train a cross-validated model using the related name-value pair arguments in fitrgp.
If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross-validate the GPR model.
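The fit-time alternative can be sketched as follows (assuming fitrgp's cross-validation name-value pairs such as 'KFold'; X and y are placeholder data, not defined on this page):

```matlab
% Train a cross-validated GPR model directly in the call to fitrgp (sketch).
cvgprMdl = fitrgp(X,y,'Standardize',1,'KFold',5);  % returns a partitioned model
L = kfoldLoss(cvgprMdl);                           % cross-validated loss
```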
Harrison, D., and D. L. Rubinfeld. "Hedonic Prices and the Demand for Clean Air." J. Environ. Economics & Management. Vol. 5, 1978, pp. 81-102.
 Nash, W.J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48, 1994.
 Waugh, S. "Extending and Benchmarking Cascade-Correlation: Extensions to the Cascade-Correlation Architecture and Benchmarking of Feed-forward Supervised Artificial Neural Networks." University of Tasmania Department of Computer Science thesis, 1995.
 Lichman, M. UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.