Regression tree and prediction equation

Danish Nasir on 3 Nov 2022
Commented: the cyclist on 5 Nov 2022
Suppose I have three independent variables A, B and C and a dependent variable T. A is discrete, while B and C are continuous; the output T is also continuous. In this situation we need to build a regression tree. How can we generate a prediction equation for such a regression tree in MATLAB?
E.g.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
5 comments
Danish Nasir on 4 Nov 2022
Yes, each variable should have the same length. Say it is 6 for each variable (take the first 6 values). I want to treat A as a categorical variable. When one of the inputs is categorical, we can't use multiple regression and must use a regression tree instead. How can I predict T using MATLAB?
dpb on 4 Nov 2022
As noted above, MATLAB's fitlm knows how to handle categorical variables automagically.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
tABC=array2table([A;B(1:numel(A));C(1:numel(A));T(1:numel(A))].','VariableNames',{'A','B','C','T'})
tABC = 6×4 table
     A      B      C     T 
    ___    ____    _    ___
     50    0.45    3    1.2
     75    0.55    4    1.8
    100    0.75    5    2.1
    125     0.8    6    2.3
    150     0.9    7    2.5
    175       1    8    2.7
mdl=fitlm(tABC,'categorical',{'A'})
Warning: Regression design matrix is rank deficient to within machine precision.
mdl =
Linear regression model:
    T ~ 1 + A + B + C

Estimated Coefficients:
                   Estimate    SE    tStat    pValue
                   ________    __    _____    ______
    (Intercept)      0.3       0      Inf      NaN
    A_75             0.3       0      Inf      NaN
    A_100            0.3       0      Inf      NaN
    A_125            0.2       0      Inf      NaN
    A_150            0.1       0      Inf      NaN
    A_175              0       0      NaN      NaN
    B                  0       0      NaN      NaN
    C                0.3       0      Inf      NaN

Number of observations: 6, Error degrees of freedom: 0
R-squared: 1, Adjusted R-Squared: NaN
F-statistic vs. constant model: NaN, p-value = NaN
While it runs, the toy dataset is deficient: B and C are almost exact linear functions of A (the correlation between A and C is 1.0000), so only one of the three predictors is estimable. Observe:
corrcoef(tABC{:,:})
ans = 4×4
    1.0000    0.9876    1.0000    0.9694
    0.9876    1.0000    0.9876    0.9770
    1.0000    0.9876    1.0000    0.9694
    0.9694    0.9770    0.9694    1.0000
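
The same dependency can also be checked directly rather than through correlations. The following is only a sketch: it treats A as numeric (not categorical, as in the fitlm call), and the relation C = A/25 + 1 is read off the six toy values, not taken from any real data.

```matlab
% Toy data from the thread (first six values of each variable)
A = [50 75 100 125 150 175]';
B = [0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';

% Design matrix with an intercept column and A treated as numeric
X = [ones(size(A)) A B C];

% A rank below the number of columns means some columns are linear
% combinations of the others
rankX = rank(X);
nCols = size(X,2);
fprintf('rank = %d, columns = %d\n', rankX, nCols);

% Largest deviation of C from the assumed relation C = A/25 + 1;
% a value of zero (up to round-off) means C carries no information beyond A
maxDev = max(abs(C - (A/25 + 1)));
fprintf('max |C - (A/25 + 1)| = %g\n', maxDev);
```

If the rank is lower than the column count, the regression coefficients cannot all be estimated separately, which matches the rank-deficiency warning above.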


Accepted Answer

the cyclist on 5 Nov 2022
This model is probably nonsense, because of the linear dependencies that @dpb points out. But perhaps your real data will yield a useful model. (Note that I transposed all your variables before putting them in a table.)
A = [ 50 75 100 125 150 175 ]';
Acat = categorical(A);
B = [ 0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';
T= [ 1.2 1.8 2.1 2.3 2.5 2.7]';
tbl = table(Acat,B,C,T);
mdl=fitrtree(tbl,"T ~ Acat + B + C")
mdl =
  RegressionTree
           PredictorNames: {'Acat'  'B'  'C'}
             ResponseName: 'T'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
          NumObservations: 6

  Properties, Methods
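
On the original question of a "prediction equation": a regression tree does not produce a single closed-form formula. The fitted model is a set of split rules with a constant predicted value in each leaf. A minimal sketch of inspecting those rules on the toy data follows; note that with only six rows and the default fitrtree settings the tree may well consist of a single root node, so the output here is likely to be trivial.

```matlab
% Refit the toy model from the answer
A = [50 75 100 125 150 175]';
Acat = categorical(A);
B = [0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';
T = [1.2 1.8 2.1 2.3 2.5 2.7]';
tbl = table(Acat,B,C,T);
mdl = fitrtree(tbl,"T ~ Acat + B + C");

% Print the split rules and leaf values as text
view(mdl)

% Number of nodes in the fitted tree (1 means no split was made)
numNodes = mdl.NumNodes;
fprintf('Tree has %d node(s)\n', numNodes);
```

With a larger data set, the printed rules (for example "if B < some threshold then node 2, else node 3") are the closest thing to a prediction equation that a tree provides.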
2 comments
Danish Nasir on 5 Nov 2022
Yes, each variable in the data set has 400 elements. A (the categorical variable) takes the 6 values mentioned above, repeated. B ranges from 0.5 to 2, C from 3 to 23, and T from 2 to 7.
A=400x1, B=400x1, C=400x1, T=400x1
Now I need a model that can predict T using a regression tree in MATLAB.
the cyclist on 5 Nov 2022
The model the way I specified it should do what you want. You can then use that model's predict method to predict T for new values.
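
As a rough illustration of that workflow at the size described (400 rows), here is a sketch on synthetic data. Everything about the data is an assumption made for the example: the values are randomly generated within the stated ranges, the formula for T is invented, and newTbl holds hypothetical new observations. Only fitrtree and predict are used as in the answer above.

```matlab
rng(0);                                  % reproducible synthetic data
n = 400;
levels = [50 75 100 125 150 175];        % the six values of A

% Synthetic predictors within the ranges given in the thread
A = levels(randi(numel(levels), n, 1));
A = A(:);                                % force a column vector
B = 0.5 + 1.5*rand(n,1);                 % 0.5 to 2
C = 3 + 20*rand(n,1);                    % 3 to 23

% Invented response, constructed only so that T stays between 2 and 7
T = 2 + (A - 50)/125 + 2*(B - 0.5)/1.5 + 2*(C - 3)/20;

% Categorical A with an explicit, shared set of categories
Acat = categorical(A, levels);
tbl = table(Acat, B, C, T);
mdl = fitrtree(tbl, "T ~ Acat + B + C");

% Hypothetical new observations, using the same variable names and categories
newTbl = table(categorical([75; 150], levels), [0.6; 1.4], [5; 18], ...
    'VariableNames', {'Acat','B','C'});
Tpred = predict(mdl, newTbl);
disp(Tpred)
```

The new table needs the same predictor names as the training table, and building the categorical variable with the same set of levels keeps its categories consistent between training and prediction.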



Release

R2022a
