How to understand the fsrftest function in Regression Learner App

ako on 4 Apr 2023
Answered: Drew on 5 Apr 2023
In the documentation, the algorithm description for the fsrftest function, which Regression Learner uses to rank features by importance, reads as follows:
Univariate Feature Ranking Using F-Tests
  • fsrftest examines the importance of each predictor individually using an F-test. Each F-test tests the hypothesis that the response values grouped by predictor variable values are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. A small p-value of the test statistic indicates that the corresponding predictor is important.
  • The output scores is –log(p). Therefore, a large score value indicates that the corresponding predictor is important. If a p-value is smaller than eps(0), then the output is Inf.
  • fsrftest examines a continuous variable after binning, or discretizing, the variable. You can specify the number of bins using the 'NumBins' name-value pair argument
So, how should I understand the statement "In regression tasks, the null hypothesis (H0) of the F-test is that at least one parameter to be fitted is zero"? Does it conflict with the statement in the documentation?

Accepted Answer

Drew on 5 Apr 2023
The short answer is that there is no conflict.
fsrftest is as described in the documentation at https://www.mathworks.com/help/stats/fsrftest.html, which is referenced from the Regression Learner documentation https://www.mathworks.com/help/stats/feature-selection-and-feature-transformation-using-regression-learner-app.html. Note that fsrftest is a univariate feature ranking for regression using F-tests which examines the importance of each predictor individually.
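To make the "one predictor at a time" idea concrete, here is a minimal sketch on the Fisher iris data. The by-hand part is only a rough analogue: the equal-width binning and the choice of 10 bins are my assumptions, and fsrftest's internal binning may differ, so the manual score need not match the fsrftest score exactly.

```matlab
% Rank three numeric iris predictors of petal width with fsrftest.
load fisheriris                      % provides meas (150x4) and species
X = meas(:, 1:3);                    % SepalLength, SepalWidth, PetalLength
Y = meas(:, 4);                      % PetalWidth (response)

[idx, scores] = fsrftest(X, Y);      % idx lists predictors from most to least important
pValues = exp(-scores);              % scores = -log(p), so invert to recover p

% Rough by-hand analogue for one predictor.
% ASSUMPTION: equal-width bins; not necessarily what fsrftest does internally.
numBins = 10;                        % illustrative choice
x = X(:, 3);                         % PetalLength
edges = linspace(min(x), max(x), numBins + 1);
groups = discretize(x, edges);       % bin index for each observation
pManual = anova1(Y, groups, 'off');  % one-way ANOVA F-test across the bins
scoreManual = -log(pManual);         % same transformation as the fsrftest score
```

Each score comes from its own test, so no predictor's score depends on which other predictors are in the data.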
For a more general look at F-tests for linear models, see MATLAB documentation https://www.mathworks.com/help/stats/f-statistic-and-t-statistic.html. On that documentation page, there is a concept of the F-test for an entire linear model. My guess is that the final statement you quoted (or perhaps slightly misquoted?) is related to an F-test for an entire linear model, or something similar. For example, at https://online.stat.psu.edu/stat501/lesson/6/6.4, you can see the final statement "In general, to test that all of the slope parameters in a multiple linear regression model are 0, we use the overall F-test reported in the analysis of variance table."
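Written out side by side, the two null hypotheses look like this. The notation (group means \mu_g, slope parameters \beta_i, k bins, p slopes, n observations) is mine, not taken from the documentation. Note that "at least one" belongs to the alternative hypothesis of the overall F-test, which is possibly where the quoted statement went astray.

```latex
% fsrftest: one test per predictor x_j, after binning x_j into k groups
H_0^{(j)}:\ \mu_1 = \mu_2 = \dots = \mu_k
\quad\text{vs.}\quad
H_1^{(j)}:\ \text{the } \mu_g \text{ are not all equal}

% Overall F-test for a linear model with p slope parameters
H_0:\ \beta_1 = \beta_2 = \dots = \beta_p = 0
\quad\text{vs.}\quad
H_1:\ \beta_i \neq 0 \text{ for at least one } i

% Test statistic for the overall F-test
F = \frac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)}
```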
To see the overall F-test for a model created in Regression Learner, export the model and then examine it using the steps shown in the MATLAB documentation at https://www.mathworks.com/help/stats/f-statistic-and-t-statistic.html. In the example below, the F-statistic for the overall model versus a constant model is 595, as shown at the bottom of the first output. The output of the anova command shows the same statistic with less rounding: 594.88.
% trainedModel is a linear model, built on Fisher iris data, which was
% exported from Regression Learner
>> trainedModel.LinearModel
ans =
Linear regression model:
    PetalWidth ~ 1 + SepalLength + SepalWidth + PetalLength + Species

Estimated Coefficients:
                          Estimate        SE         tStat       pValue
                          _________    ________    _______    __________
    (Intercept)            -0.47314     0.17659    -2.6793      0.008237
    SepalLength           -0.092934    0.044585    -2.0844      0.038888
    SepalWidth               0.2422    0.047757     5.0716    1.1956e-06
    PetalLength              0.2422    0.048842     4.9589    1.9687e-06
    Species_versicolor      0.64811     0.12314     5.2631    5.0406e-07
    Species_virginica        1.0464     0.16548     6.3232    3.0296e-09

Number of observations: 150, Error degrees of freedom: 144
Root Mean Squared Error: 0.167
R-squared: 0.954, Adjusted R-Squared: 0.952
F-statistic vs. constant model: 595, p-value = 3.03e-94
%% For some other details, use the anova command
>> anova(trainedModel.LinearModel,'summary')
ans =
  5×5 table
                     SumSq     DF      MeanSq       F         pValue
                     ______    ___    ________    ______    __________
    Total             86.57    149     0.58101
    Model            82.572      5      16.514    594.88    3.0343e-94
    Residual         3.9976    144    0.027761
    . Lack of fit    3.8826    138    0.028135    1.4679       0.33497
    . Pure error      0.115      6    0.019167
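If you do not have the exported model at hand, the following sketch fits what I assume is the same model directly with fitlm and recomputes the overall F-statistic from the sums of squares. The variable names given to the table columns are my own choice, and the result should be close to 594.88 only if the exported model was trained on all 150 observations with the same formula.

```matlab
% Fit the same formula directly (assumed equivalent to the exported model).
load fisheriris                      % provides meas (150x4) and species
tbl = array2table(meas, 'VariableNames', ...
    {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'});
tbl.Species = categorical(species);

mdl = fitlm(tbl, 'PetalWidth ~ 1 + SepalLength + SepalWidth + PetalLength + Species');

% Overall F-test: all coefficients except the intercept are zero.
[p, F] = coefTest(mdl);

% Same statistic by hand: model mean square over residual mean square.
dfModel = mdl.NumEstimatedCoefficients - 1;   % 5 slope parameters
Fmanual = (mdl.SSR / dfModel) / mdl.MSE;

% Same statistic read from the ANOVA summary table.
a = anova(mdl, 'summary');
Fanova = a{'Model', 'F'};
```

All three values (F, Fmanual, Fanova) are the same quantity. It corresponds to the Model row of the table above, where 16.514 / 0.027761 gives about 594.9.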
