Implement Bayesian Linear Regression

Econometrics Toolbox™ includes a self-contained framework that allows you to implement Bayesian linear regression. The framework contains two groups of prior models for the regression coefficients β and the disturbance variance σ²:

Standard Bayesian linear regression prior models — The five prior model objects in this group range from the simple conjugate normal-inverse-gamma prior model through flexible prior models specified by draws from the prior distributions or a custom function. Although standard prior models can serve several purposes, they are best suited for posterior estimation, simulation (from the joint or, for most models, conditional posterior), and forecasting from the posterior predictive distribution.
Bayesian prior models for predictor variable selection — The models in this group can perform Bayesian lasso regression or stochastic search variable selection (SSVS). They are best suited for posterior estimation, during which the predictor selection algorithm occurs. The resulting posterior model is represented by draws from the Gibbs sampler (an empirical model object), and the estimation summary contains the results from the predictor selection algorithm. The estimation procedure does not remove insignificant or redundant variables for you, but the coefficients of a well-tuned model are close to zero. Therefore, as with standard models, you can simulate draws from the posterior distribution or forecast from the posterior predictive distribution without having to remove any variables.

The general workflow for estimating features of the posterior distributions, and then forecasting responses given new predictor data, is:

Use bayeslm to create a prior model object that represents your assumptions about the distribution or a prior model object that is suited for predictor selection.
Pass the prior model object and the predictor and response data to the estimate function. By default, estimate returns a model object that represents the posterior distribution.
Pass the model object representing the posterior distribution to forecast.

For details on the standard Bayesian linear regression workflow, see Workflow for Standard Bayesian Linear Regression Models, and for Bayesian predictor selection, see Workflow for Bayesian Predictor Selection.

Workflow for Standard Bayesian Linear Regression Models

The following procedure describes the workflow for estimating features of the posterior distributions, and then forecasting responses given predictor data.

Choose a joint prior distribution for (β,σ²). Then, using bayeslm, create the Bayesian linear regression model object that completely specifies your beliefs about the joint prior distribution. This table contains the available prior model objects.

Model Object	Joint Prior Distribution of (β,σ²)	When to Create
`conjugateblm`	π(β\|σ²) is Gaussian with mean `Mu` and covariance σ²`V`. π(σ²) is inverse gamma with shape `A` and scale `B`.	Create this model object when all of the following are true: You are fairly confident that the parameters have the corresponding joint prior, and that β depends on σ². You want to incorporate your prior knowledge of the prior mean and covariance of β and the shape and scale of σ². You want analytical forms for the marginal and conditional posteriors, which are normal-inverse-gamma conjugate distributions for both types.
`semiconjugateblm`	π(β) is Gaussian with mean `Mu` and covariance `V`. π(σ²) is inverse gamma with shape `A` and scale `B`. β and σ² are independent.	Create this model object when all of the following are true: You are fairly confident that the parameters have the corresponding joint prior, and that β and σ² are independent. You want to incorporate your prior knowledge of the prior mean and covariance of β and the shape and scale of σ². You want analytical forms for the conditional posteriors, which are normal-inverse-gamma conjugate conditional distributions.
`diffuseblm`	$π (β, σ^{2}) \propto \frac{1}{σ^{2}} .$	Create this model object when all of the following are true: You want the posterior to be influenced more by the information in the data than by the prior. The joint prior distribution is inversely proportional to σ² (Jeffreys noninformative prior [2]). You want analytical forms for the marginal and conditional posteriors, which are the normal-inverse-gamma conjugate distributions for both types.
`customblm`	A function handle of a custom function that computes the log of the joint prior distribution.	Create this model object when you want to specify the log of the joint prior distribution. This specification allows for maximum flexibility.

Given the data, estimate features of the posterior distributions. The functions to use in this step depend on your analysis goals.

Function Goals

Function	Goals
`estimate`	Obtain a posterior model object for forecasting. Posterior model objects include: Estimates of the mean and covariance matrix of the marginal posterior π(β\|y,x) and the mean and variance of π(σ²\|y,x). Marginal posteriors and their parameter values. Analytical solutions are available for `conjugateblm` and `diffuseblm` prior models. For all other prior models, `estimate` must use Monte Carlo sampling. 95% equitailed, credible intervals. For nonanalytical posteriors, 95% equitailed, credible intervals are the 0.025-quantile and the 0.975-quantile of the retained Monte Carlo sample. Estimate the mean and covariance of the conditional distribution π(β\|σ²,y,x), that is, implement linear regression with σ² held fixed. Update an existing posterior distribution based on new data.
`simulate`	Approximate the expected value of a function of the parameters with respect to the joint posterior π(β,σ²\|y,x). That is, draw multiple samples of (β,σ²) from their joint posterior, apply a function to each draw, and then compute the average of the transformed draws. Draw from the conditional posterior distributions π(β\|σ²,y,x) and π(σ²\|β,y,x). This selection is convenient for running a Markov chain Monte Carlo (MCMC) sampler, such as a Gibbs sampler.

estimate

Obtain a posterior model object for forecasting. Posterior model objects include:
- Estimates of the mean and covariance matrix of the marginal posterior π(β|y,x) and the mean and variance of π(σ²|y,x).
- Marginal posteriors and their parameter values. Analytical solutions are available for conjugateblm and diffuseblm prior models. For all other prior models, estimate must use Monte Carlo sampling.
- 95% equitailed, credible intervals. For nonanalytical posteriors, 95% equitailed, credible intervals are the 0.025-quantile and the 0.975-quantile of the retained Monte Carlo sample.
Estimate the mean and covariance of the conditional distribution π(β|σ²,y,x), that is, implement linear regression with σ² held fixed.
Update an existing posterior distribution based on new data.

simulate

Approximate the expected value of a function of the parameters with respect to the joint posterior π(β,σ²|y,x). That is, draw multiple samples of (β,σ²) from their joint posterior, apply a function to each draw, and then compute the average of the transformed draws.
Draw from the conditional posterior distributions π(β|σ²,y,x) and π(σ²|β,y,x). This selection is convenient for running a Markov chain Monte Carlo (MCMC) sampler, such as a Gibbs sampler.

If you have a custom prior model (customblm object), then choose a Markov chain Monte Carlo (MCMC) sampler when you call estimate or simulate. This table contains a list of supported MCMC samplers. After choosing a sampler, try the default tuning parameter values first.

MCMC Sampler Specify Using Description

Hamiltonian Monte Carlo (HMC)

MCMC Sampler	Specify Using	Description
Hamiltonian Monte Carlo (HMC)	`'Sampler',"hmc"`	Because the HMC sampler tunes itself, and resulting samples mix well and converge to their stationary distribution more quickly, try this sampler first. To increase sampling speed, supply the gradient of the log PDF for all or some of the parameters.
Random walk Metropolis	`'Sampler',"metropolis"`	If the sample size is reasonably large and the prior does not dominate the likelihood, then try this sampler. Supported proposal distributions are multivariate normal and multivariate t distributions. Tuning parameters include the distribution, its scale matrix, and its degrees of freedom.
Slice	`'Sampler',"slice"` (default)	To achieve adequate mixing and convergence, carefully tune the typical sampling-interval width. Values are application dependent.

'Sampler',"hmc"

Because the HMC sampler tunes itself, and resulting samples mix well and converge to their stationary distribution more quickly, try this sampler first.

To increase sampling speed, supply the gradient of the log PDF for all or some of the parameters.

Random walk Metropolis

'Sampler',"metropolis"

If the sample size is reasonably large and the prior does not dominate the likelihood, then try this sampler.

Supported proposal distributions are multivariate normal and multivariate t distributions.

Tuning parameters include the distribution, its scale matrix, and its degrees of freedom.

Slice 'Sampler',"slice" (default) To achieve adequate mixing and convergence, carefully tune the typical sampling-interval width. Values are application dependent.

After estimating a nonanalytical posterior by using an MCMC sampler, inspect the posterior or conditional posterior draws for adequate mixing. For more details, see Posterior Estimation and Simulation Diagnostics.

If the quality of the samples is not satisfactory, then create a sampler options structure by using sampleroptions, which allows you to specify the tuning parameter values that are appropriate for the sampler. For example, to specify a random walk Metropolis sampler that uses a multivariate t proposal distribution with 5 degrees of freedom, enter:

options = sampleroptions('Sampler',"metropolis",'Distribution',"mvt",...
     'DegreeOfFreedom',5)

After you create the sampler options structure, specify it when you call estimate or simulate by using the 'Options' name-value pair argument.

Forecast responses given the new predictor data using forecast. The forecast function constructs forecasts from the posterior predictive distribution. Analytical posterior predictive distributions are available for conjugateblm and diffuseblm prior models. For all other prior models, forecast resorts to Monte Carlo sampling. As with estimation and simulation, you can choose an MCMC sampler for customblm models. If forecast uses an MCMC sampler, you should inspect the posterior or conditional posterior draws for adequate mixing.

Workflow for Bayesian Predictor Selection

The following procedure describes the workflow for performing Bayesian predictor selection for linear regression models, and then forecasting responses given predictor data.

Econometrics Toolbox supports two Bayesian predictor selection algorithms: Bayesian lasso regression and SSVS. Choose a predictor selection algorithm, which implies a joint prior distribution for (β,σ²). Then, using bayeslm, create the Bayesian linear regression prior model object that performs the selected predictor selection algorithm, and optionally specify the tuning parameter value. This table contains the available prior model objects for predictor selection. For details on the forms of the prior distributions, see Posterior Estimation and Inference.

Model Object Predictor Selection Algorithm Tuning Parameter When to Create

Model Object	Predictor Selection Algorithm	Tuning Parameter	When to Create
`mixconjugateblm`	SSVS [1]	Gaussian mixture variances, specified by the `'V'` name-value pair argument. Specify a two-column positive matrix. The first column contains the variances for the variable-inclusion component. Specify relatively large values. Values attribute higher probability to coefficients further away from 0. The second column contains the variances for the variable-exclusion component. Specify relatively small values. Values attribute higher probability to coefficients close to 0.	The prior variance of β is a function of σ². You want posterior estimates of the probability that a variable is included in the model.
`mixsemiconjugateblm`	SSVS	Same as `mixconjugateblm`	β and σ² are independent, a priori. You want posterior estimates of the probability that a variable is included in the model.
`lassoblm`	Bayesian lasso regression [3]	λ, specified by the `'Lambda'` name-value pair argument. You can supply a positive scalar or vector. Larger values indicate more regularization, which implies that the prior of β is dense around zero. If predictor variables differ in scale by orders of magnitude, then supply a vector of shrinkages, where elements correspond to coefficients.	You want to force insignificant and redundant predictor variables to have coefficients with posterior mode of zero and fairly tight 95% posterior credible intervals around that mode.

mixconjugateblm

SSVS [1]

Gaussian mixture variances, specified by the 'V' name-value pair argument. Specify a two-column positive matrix.
The first column contains the variances for the variable-inclusion component.
- Specify relatively large values.
- Values attribute higher probability to coefficients further away from 0.
The second column contains the variances for the variable-exclusion component.
- Specify relatively small values.
- Values attribute higher probability to coefficients close to 0.

The prior variance of β is a function of σ².
You want posterior estimates of the probability that a variable is included in the model.

mixsemiconjugateblm

SSVS

Same as mixconjugateblm

β and σ² are independent, a priori.
You want posterior estimates of the probability that a variable is included in the model.

lassoblm

Bayesian lasso regression [3]

λ, specified by the 'Lambda' name-value pair argument. You can supply a positive scalar or vector.
Larger values indicate more regularization, which implies that the prior of β is dense around zero.
If predictor variables differ in scale by orders of magnitude, then supply a vector of shrinkages, where elements correspond to coefficients.

You want to force insignificant and redundant predictor variables to have coefficients with posterior mode of zero and fairly tight 95% posterior credible intervals around that mode.

Because predictor selection algorithms can be sensitive to differing scales of the predictor variables (Bayesian lasso regression, in particular), determine the scale of the predictors by passing the data to boxplot, or by estimating their means and standard deviations by using mean and std, respectively.
For SSVS prior models, plot the prior distributions of the coefficients by using plot. The plots give you an idea about how the densities of the two Gaussian components balance. You can adjust the variances of a coefficient by using dot notation. For example, to attribute prior variances of 1000 and 0.1 to the variable-inclusion and variable-exclusion components, respectively, for coefficient j, enter:
```
PriorMdl.V(j,:) = [1000 0.1];
```
For more details on specifying V, see [1].
For lasso prior models, determine a regularization path, that is, a series of values for λ to iterate through during posterior estimation. Values are data dependent.
- If the predictor variables have similar scales, specify a left boundary that forces most variables into the model (that is, attribute almost no penalty), specify a right boundary that forces almost all coefficients to 0, and specify enough values in between the boundaries to search the space adequately. By default, the software attributes a shrinkage of 0.01 to the intercept and 1 to all coefficients.
- If the scales of the predictor variables differ by orders of magnitude, then you can rescale or standardize the data. However, these actions make interpreting the coefficients difficult. Instead, specify a regularization path and use it to create a matrix of repeated rows. For those variables whose scales are small or large, multiply the corresponding rows by the appropriate order of magnitude to magnify the penalty for small coefficients or reduce the penalty for large coefficients.
  For example, suppose a regression model has three predictors. The first two predictors have similar scales, but the third predictor has a scale that is 3 orders larger. Suppose the 100-element regularization path is in the 1-by-100 vector Lambda. To create a matrix of shrinkage values, enter the following code:
  p = 3 % Number of predictors numcoeffs = p + 1; % Number of coefficients, intercept and predictors LambdaMat = repmat(Lambda,4,1); LambdaMat(4,:) = 1e3*LambdaMat(4,:);
  For more on specifying λ, see [3].
In a for loop, pass the predictor and response data to estimate to estimate features of the posterior distributions for each value of the tuning parameter. For each iteration, store the posterior model object and estimation summary table in a cell array. estimate uses the Gibbs sampler to sample successively from the full conditionals (see Analytically Tractable Posteriors). You can change aspects of the Gibbs sampler, such as the thinning factor, using the available options.
Posterior models (empiricalblm model objects) store the draws from the full conditionals, among other things.
Estimation summary tables are MATLAB^® tables that include:
- Estimates of the mean (Mean) and covariance matrix (Covariances) of the marginal posterior π(β|y,x), and the mean and variance of π(σ²|y,x).
- 95% equitailed, credible intervals (CI95), which are the 0.025-quantile and the 0.975-quantile of the retained Monte Carlo sample.
- Posterior variable-inclusion probabilities (Regime) for SSVS prior models only. Values below a threshold that you determine indicate that the corresponding predictor is insignificant or redundant.
Although you estimate multiple posterior models, a good practice is to inspect the posterior draws for adequate mixing, particularly for models estimated using the boundaries of the tuning parameter values. For more details, see Posterior Estimation and Simulation Diagnostics.
Determine the best-performing posterior model. Two examples are:
- Choose the simplest model that minimizes the mean squared error (MSE, the means of Sigma2) among all posterior models. This method is simple, but resulting models might not generalize well.
- Choose the simplest model that minimizes forecast MSE.
  1. Partition the data into estimation and forecast samples.
  2. For all selected tuning parameter values, estimate the posterior distribution by using the estimation sample data.
  3. Compute the forecast MSE using the forecast sample and responses forecasted by using the posterior predictive distribution.
Forecast responses given new predictor data using forecast. The forecast function constructs forecasts from the posterior predictive distribution by implementing a Gibbs sampler. Inspect the draws for adequate mixing.

References

[1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889.

[2] Marin, J. M., and C. P. Robert. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. New York: Springer Science+Business Media, LLC, 2007.

[3] Park, T., and G. Casella. "The Bayesian Lasso." Journal of the American Statistical Association. Vol. 103, No. 482, 2008, pp. 681–686.

Implement Bayesian Linear Regression

Workflow for Standard Bayesian Linear Regression Models

Workflow for Bayesian Predictor Selection

References

See Also

Functions

Objects

Topics