Add terms to generalized linear regression model
Create a generalized linear regression model using one predictor, and then add another predictor.
Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
Create a generalized linear regression model of Poisson data. Include only the first predictor in the model.
mdl = fitglm(X,y,'y ~ x1','Distribution','poisson')
mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x1
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE         tStat     pValue
                   ________    _________    ______    ______
    (Intercept)     2.7784      0.014043    197.85      0
    x1              1.1732     0.0033653     348.6      0

100 observations, 98 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.25e+05, p-value = 0
Add the second predictor to the model.
mdl1 = addTerms(mdl,'x2')
mdl1 = 
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE         tStat     pValue
                   ________    _________    ______    ______
    (Intercept)     1.0405      0.022122    47.034      0
    x1              0.9968      0.003362    296.49      0
    x2               1.987     0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
terms— Terms to add to regression model
Terms to add to the regression model mdl, specified as one of the following:
Character vector or string scalar formula in Wilkinson Notation representing one or more terms. The variable names in the formula must be valid MATLAB® identifiers.
Terms matrix T of size t-by-p, where t is the number of terms and p is the number of predictor variables in mdl. The value of T(i,j) is the exponent of variable j in term i.
For example, suppose mdl has three variables A, B, and C in that order. Each row of T represents one term:
[0 0 0]: Constant term or intercept
[0 1 0]: B; equivalently, A^0 * B^1 * C^0
[1 0 1]: A*C
[2 0 0]: A^2
[0 1 2]: B*(C^2)
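The row-to-term mapping can be checked with a few lines of plain MATLAB. The following is a minimal sketch, not part of addTerms itself: the variable names A, B, and C and the matrix T are the ones from the example above, while the labels variable and the decoding loop are illustrative helpers introduced here.

```matlab
% Decode each row of a terms matrix into a readable term label.
% vars and T follow the example in the text; labels is a hypothetical helper.
vars = {'A','B','C'};
T = [0 0 0; 0 1 0; 1 0 1; 2 0 0; 0 1 2];

labels = cell(size(T,1),1);
for i = 1:size(T,1)
    parts = {};
    for j = 1:size(T,2)
        if T(i,j) == 1
            parts{end+1} = vars{j};                        % first-order factor
        elseif T(i,j) > 1
            parts{end+1} = sprintf('%s^%d',vars{j},T(i,j)); % higher power
        end
    end
    if isempty(parts)
        labels{i} = '1';                                   % all-zero row: intercept
    else
        labels{i} = strjoin(parts,'*');
    end
end
disp(labels)
```

Each row produces one label, with an all-zero row mapping to the constant term.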
addTerms treats a group of indicator variables for a
categorical predictor as a single variable. Therefore, you cannot specify an
indicator variable to add to the model. If you specify a categorical
predictor to add to the model,
addTerms adds a group of
indicator variables for the predictor in one step.
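As a sketch of the formula form of terms, the following reuses the sample data from the example at the top of this page and adds two terms in one call: the main effect x2 and the interaction x1:x2. The name mdl2 is illustrative, and the assumption here is that a formula listing several terms separated by + is accepted, as the phrase "one or more terms" suggests.

```matlab
% Recreate the sample data from the example above.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);

% Start from a model that contains only x1.
mdl = fitglm(X,y,'y ~ x1','Distribution','poisson');

% Add the main effect x2 and the x1-by-x2 interaction in a single call.
% mdl2 is a hypothetical name; mdl itself is left unchanged.
mdl2 = addTerms(mdl,'x2 + x1:x2');

% List the coefficient names of the refitted model.
disp(mdl2.CoefficientNames)
```

Inspecting CoefficientNames is a quick way to confirm which terms the refitted model contains.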
NewMdl— Generalized linear regression model with additional terms
Generalized linear regression model with additional terms, returned as a GeneralizedLinearModel object. NewMdl is a newly fitted model that uses the input data and settings in mdl with the additional terms specified in terms.
To overwrite the input argument mdl, assign the newly fitted model to mdl:
mdl = addTerms(mdl,terms);
addTerms treats a categorical predictor as follows:
A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is categorical, then you can check the order of categories by using categories and reorder the categories by using reordercats to customize the reference level.
addTerms treats the group of L – 1 indicator variables as a single variable. If you want to treat
the indicator variables as distinct predictor variables, create indicator
variables manually by using
dummyvar. Then use the
indicator variables, except the one corresponding to the reference level of the
categorical variable, when you fit a model. For the categorical predictor
X, if you specify all columns of
dummyvar(X) and an intercept term as predictors, then the
design matrix becomes rank deficient.
Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor.
Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels.
You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
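The reference-level and rank-deficiency points above can be illustrated with a small sketch. The data below is made up for illustration, and the variable names (c, D, Xfull, Xref, refLevel) are hypothetical. The sketch uses categories and reordercats to choose the reference level, then uses dummyvar to compare a design matrix that keeps all L indicator columns with one that drops the reference column.

```matlab
% Hypothetical categorical predictor with L = 3 levels (made-up data).
c = categorical({'low';'high';'mid';'low';'high';'mid'});
categories(c)                 % display the current category order

% Put 'low' first so that it becomes the reference level.
c = reordercats(c,{'low','mid','high'});
refLevel = categories(c);
refLevel = refLevel{1};

% One indicator column per category, in category order.
D = dummyvar(c);

% Intercept plus all L indicators: the indicator columns sum to the
% intercept column, so the design matrix is rank deficient.
Xfull = [ones(numel(c),1) D];

% Intercept plus L - 1 indicators (reference column dropped).
Xref = [ones(numel(c),1) D(:,2:end)];

fprintf('Reference level: %s\n',refLevel)
fprintf('All indicators:  %d columns, rank %d\n',size(Xfull,2),rank(Xfull))
fprintf('Drop reference:  %d columns, rank %d\n',size(Xref,2),rank(Xref))
```

With all indicators included, the rank is lower than the number of columns; dropping the reference column restores full column rank.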