How to stop xgboost_train function from overwriting MATLAB's random number generator?
STY1980
on 14 Feb 2025
Commented: STY1980
on 20 Feb 2025
I am trying to solve a regression problem using the following functions for XGBoost in MATLAB:
As is common in machine learning problems, I repeat the analysis several times with randomly selected training and test data to get a better understanding of the variance error. However, the xgboost_train function seems to be "overwriting" the state of MATLAB's random number generator in some way. After xgboost_train has run inside a loop, commands such as randi or randperm return the same values on every iteration, which produces identical training and test sets. I have never observed a similar issue with any other ML algorithm. The following code may help to explain my point. If xgboost_train (and xgboost_test) are commented out, randperm, randi and rand generate different random numbers on each repeat, as expected. If those functions are active, the generated numbers stay constant and do not change at all.
By the way, I am using 'Accuracy' as the evaluation criterion (MSE and MAE didn't work for me). I have also changed params.objective to 'reg:squarederror' in the xgboost_train function.
clear; close all; clc
%% Loading carsmall data, removing NaN data, defining features (X) and output/target (Y) arrays
rawData = load('carsmall');
nn = 0;
Y = zeros(1,1);
X = zeros(1,6);
for ii = 1:length(rawData.MPG)
    if ~isnan(rawData.MPG(ii))
        nn = nn + 1;
        Y(nn) = rawData.MPG(ii);
        X(nn,1) = rawData.Cylinders(ii);
        X(nn,2) = rawData.Displacement(ii);
        X(nn,3) = rawData.Horsepower(ii);
        X(nn,4) = rawData.Weight(ii);
        X(nn,5) = rawData.Acceleration(ii);
        X(nn,6) = rawData.Model_Year(ii);
    end
end
Y = transpose(Y);
%% Running xgboost_train and xgboost_test functions
for ii = 1:1:5
    rate = 0.8;
    n = length(Y);
    r = randperm(n);                  % random index
    randi(1000)                       % printed only to check whether the numbers change between repeats
    rand                              % printed only to check whether the numbers change between repeats
    ntrain = round(rate*n);           % number of training samples
    Xtrain = X(r(1:ntrain),:);        % training set
    Ytrain = Y(r(1:ntrain),:);        % observed training variable
    Xtest = X(r(ntrain+1:end),:);     % test set
    Ytest = Y(r(ntrain+1:end),:);     % observed test variable
    model_filename = '';
    model = xgboost_train(Xtrain,Ytrain,[],999,'Accuracy',model_filename); % comment this line out and randi, randperm and rand generate different numbers on each repeat
    model_filename = 'D:\xgboost_model.xgb';
    loadmodel = 0;
    Yp = xgboost_test(Xtest,model,loadmodel); % this line must be commented out as well if model = xgboost_train(...) has been commented out
end
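The symptom can be reproduced without XGBoost. The following is a minimal sketch, assuming a hypothetical stand-in function fakeTrain (not the real xgboost_train) that resets the global generator to a fixed seed on every call. It uses rng(0) in place of whatever the real function does internally, and it needs R2016b or later because the local function sits at the end of a script.

```matlab
% Minimal reproduction without XGBoost.
% fakeTrain is a hypothetical stand-in, not the real xgboost_train.
rng('default');                  % known starting state for the demo
draws = zeros(1, 5);
for k = 1:5
    draws(k) = randi(1000);      % drawn before the "training" call, as randperm is in the loop above
    fakeTrain();                 % resets the global generator on every iteration
end
disp(draws)                      % entries 2 to 5 are identical

function fakeTrain()
    rng(0);                      % fixed seed, resets the global stream
    rand(10, 1);                 % consume a few numbers after reseeding
end
```

Because the generator is in exactly the same state each time fakeTrain returns, every draw made after the first call repeats. Only the draw made before the first call can differ.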
Accepted Answer
Darshak
on 17 Feb 2025
Hello,
I faced a similar issue while running a model I found on File Exchange, where I wanted to do random cross-validation. When I inspected the model function code, I found that a seed was explicitly set for the random number generator, so the same random numbers were generated on every execution. I found similar lines of code in the “xgboost_train.m” file.
The lines I am referring to are as follows:
rand('state', 0); u1 = rand(size(Xtrain,1),1); cvind = sortrows([u1 , cvind],1); cvind = cvind(:,2:end); clear u1
and,
rand('state', 0); u1 = rand(size(Xtrain,1),1); cvind = sortrows([u1 , cvind],1); cvind = cvind(:,2:end); clear u1
As you can see, the seed is set to a fixed value here. You can replace these lines with the following code so that different random numbers are generated on every execution:
rng('shuffle'); u1 = rand(size(Xtrain,1),1); cvind = sortrows([u1 , cvind],1); cvind = cvind(:,2:end); clear u1
and,
rng('shuffle'); u1 = rand(size(Xtrain,1),1); cvind = sortrows([u1 , cvind],1); cvind = cvind(:,2:end); clear u1
This should resolve the issue. For more information on the rng function and on controlling random number generation, see the MATLAB documentation.
I hope this helps with your doubt.
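If editing xgboost_train.m is not an option, another approach is to leave the file alone and protect the caller's random stream around the call. The following is a minimal sketch, assuming a hypothetical stand-in noisyTrain that reseeds the global generator the way the lines above do. The names noisyTrain, callerState and splits are illustrative only. Naming the generator ('twister') when seeding is a precaution in case an earlier rand('state', 0) call left the session using a legacy generator.

```matlab
% Keep the caller's random stream intact around a call that reseeds it.
% noisyTrain is a hypothetical stand-in for xgboost_train.
rng('shuffle', 'twister');       % seed once from the clock, before the loop
n = 20;
splits = zeros(5, n);
for k = 1:5
    splits(k, :) = randperm(n);  % split indices for this repeat
    callerState = rng;           % snapshot of the global generator settings
    noisyTrain();                % may reseed the global generator internally
    rng(callerState);            % restore, so the next randperm continues the sequence
end
disp(size(unique(splits, 'rows'), 1))   % number of distinct permutations drawn

function noisyTrain()
    rng(0);                      % fixed seed, standing in for the reseeding inside the real function
    rand(10, 1);
end
```

With the real function, the noisyTrain call would be the xgboost_train line from the question. This sketch is untested against the actual xgboost_train.m. A further option inside the function itself would be to draw u1 from a private RandStream object, so the global generator is never touched.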