ecmmvnrmle

Multivariate normal regression with missing data

Syntax

[Param,Covar] = ecmmvnrmle(Data,Design)

[Param,Covar,Resid,Info] = ecmmvnrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)

Description

[Param,Covar] = ecmmvnrmle(Data,Design) estimates a multivariate normal regression model with missing data. The model has the form

$\mathit{Data}_{k} \sim N(\mathit{Design}_{k} \times \mathit{Parameters},\ \mathit{Covariance})$

for samples k = 1, ... , NUMSAMPLES.


[Param,Covar,Resid,Info] = ecmmvnrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat) adds optional arguments for MaxIterations, TolParam, TolObj, Param0, Covar0, and CovarFormat.
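
As a minimal sketch of this second syntax, the call below passes every optional argument explicitly in positional order. MaxIterations, TolParam, TolObj, Param0, and Covar0 are set to their documented defaults, and only CovarFormat differs. The synthetic data, the random seed, and the variable names X and Y are illustrative assumptions and are not part of the shipped example.

```matlab
% Synthetic single-series regression with a few missing observations
% (illustrative data only).
rng(0);                                          % reproducible random numbers
NumSamples = 200;
X = [ones(NumSamples,1), randn(NumSamples,1)];   % NUMSAMPLES-by-NUMPARAMS design
Y = X*[0.5; 2.0] + 0.1*randn(NumSamples,1);      % NUMSAMPLES-by-1 data
Y(5:10) = NaN;                                   % mark some samples as missing

% Optional arguments in positional order:
% MaxIterations, TolParam, TolObj, Param0, Covar0, CovarFormat
[Param, Covar, Resid, Info] = ecmmvnrmle(Y, X, 100, 1.0e-8, 1.0e-12, [], [], 'diagonal');
```

With a single series, Covar is a scalar, so 'diagonal' and 'full' describe the same matrix here; the argument is shown only to illustrate its position in the call.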


Examples


Compute Multivariate Normal Regression With Missing Data


This example shows how to estimate a multivariate normal regression model with missing data.

First, load dates, total returns, and ticker symbols for the twelve stocks from the MAT-file.

load CAPMuniverse
whos Assets Data Dates

  Name           Size             Bytes  Class     Attributes

  Assets         1x14              1792  cell                
  Data        1471x14            164752  double              
  Dates       1471x1              11768  double

Dates = datetime(Dates,'ConvertFrom','datenum');

The assets in the model have the following symbols, where the last two series are proxies for the market and the riskless asset.

Assets(1:14)

ans = 1x14 cell
    {'AAPL'}    {'AMZN'}    {'CSCO'}    {'DELL'}    {'EBAY'}    {'GOOG'}    {'HPQ'}    {'IBM'}    {'INTC'}    {'MSFT'}    {'ORCL'}    {'YHOO'}    {'MARKET'}    {'CASH'}

The data covers the period from January 1, 2000 to November 7, 2005 with daily total returns. Two stocks in this universe have missing values that are represented by NaNs. One of the two stocks had an IPO during this period and, consequently, has significantly less data than the other stocks.

Compute separate regressions for each stock, where the stocks with missing data have estimates that reflect their reduced observability.

[NumSamples, NumSeries] = size(Data);
NumAssets = NumSeries - 2;

StartDate = Dates(1);
EndDate = Dates(end);

% Preallocate one entry per asset (NumAssets is a scalar count)
Alpha = NaN(1, NumAssets);
Beta = NaN(1, NumAssets);
Sigma = NaN(1, NumAssets);
StdAlpha = NaN(1, NumAssets);
StdBeta = NaN(1, NumAssets);
StdSigma = NaN(1, NumAssets);
for i = 1:NumAssets
	% Set up separate asset data and design matrices
	TestData = zeros(NumSamples,1);
	TestDesign = zeros(NumSamples,2);

	TestData(:) = Data(:,i) - Data(:,14);
	TestDesign(:,1) = 1.0;
	TestDesign(:,2) = Data(:,13) - Data(:,14);

	% Estimate the multivariate normal regression for each asset separately.
	[Param, Covar] = ecmmvnrmle(TestData, TestDesign)
    
end

Param = 2×1

    0.0012
    1.2294

Covar = 
0.0010

Param = 2×1

    0.0006
    1.3661

Covar = 
0.0020

Param = 2×1

   -0.0002
    1.5653

Covar = 
8.8911e-04

Param = 2×1

   -0.0000
    1.2594

Covar = 
6.4996e-04

Param = 2×1

    0.0014
    1.3441

Covar = 
0.0014

Param = 2×1

    0.0046
    0.3742

Covar = 
6.3272e-04

Param = 2×1

    0.0001
    1.3745

Covar = 
6.5040e-04

Param = 2×1

   -0.0000
    1.0807

Covar = 
2.8562e-04

Param = 2×1

    0.0001
    1.6002

Covar = 
6.9146e-04

Param = 2×1

   -0.0002
    1.1765

Covar = 
3.7138e-04

Param = 2×1

    0.0000
    1.5010

Covar = 
0.0010

Param = 2×1

    0.0001
    1.6543

Covar = 
0.0015
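
The loop above displays each estimate but does not store it. A possible variant is sketched here, under the assumption that Alpha, Beta, and Sigma are meant to hold the intercept, the slope on the market excess return, and the residual standard deviation for each asset. The Results table and its column names are illustrative choices.

```matlab
load CAPMuniverse
[NumSamples, NumSeries] = size(Data);
NumAssets = NumSeries - 2;

Alpha = NaN(1, NumAssets);
Beta  = NaN(1, NumAssets);
Sigma = NaN(1, NumAssets);

for i = 1:NumAssets
    % Excess return of asset i and of the market over the riskless asset
    TestData   = Data(:,i) - Data(:,14);
    TestDesign = [ones(NumSamples,1), Data(:,13) - Data(:,14)];

    [Param, Covar] = ecmmvnrmle(TestData, TestDesign);

    Alpha(i) = Param(1);       % intercept
    Beta(i)  = Param(2);       % slope on the market excess return
    Sigma(i) = sqrt(Covar);    % residual standard deviation (Covar is 1-by-1 here)
end

% Collect the estimates in a table, one row per asset
Results = table(Assets(1:NumAssets)', Alpha', Beta', Sigma', ...
    'VariableNames', {'Asset','Alpha','Beta','Sigma'})
```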

Input Arguments


`Data` — Data
matrix

Data, specified as a NUMSAMPLES-by-NUMSERIES matrix with NUMSAMPLES samples of a NUMSERIES-dimensional random vector. Missing values are indicated by NaNs. Only samples that are entirely NaNs are ignored. (To ignore samples with at least one NaN, use mvnrmle.)
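
The difference between the two NaN rules can be made concrete with base MATLAB indexing. The small matrix below is made-up data, and the logical masks only illustrate which rows each rule would skip; they are not the functions' internal code.

```matlab
% Made-up 4-by-2 data matrix with missing values
Data = [ 0.01   0.02
         NaN    0.03
         NaN    NaN
         0.04   NaN ];

AllMissing = all(isnan(Data), 2);   % rows that are entirely NaN
AnyMissing = any(isnan(Data), 2);   % rows with at least one NaN

RowsAllMissing = find(AllMissing)'  % only these rows are ignored by ecmmvnrmle
RowsAnyMissing = find(AnyMissing)'  % these rows are ignored by mvnrmle
```

Here only row 3 is entirely NaN, while rows 2, 3, and 4 each contain at least one NaN.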

Data Types: double

`Design` — Design model
matrix | cell array

Design model, specified as a matrix or a cell array that handles two model structures:

If NUMSERIES = 1, Design is a NUMSAMPLES-by-NUMPARAMS matrix with known values. This structure is the standard form for regression on a single series.
If NUMSERIES ≥ 1, Design is a cell array. The cell array contains either one or NUMSAMPLES cells. Each cell contains a NUMSERIES-by-NUMPARAMS matrix of known values.
If Design has a single cell, the same Design matrix is assumed for every sample. If Design has more than one cell, each cell contains the Design matrix for the corresponding sample.
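
Both cell-array layouts can be sketched for a two-series model as follows. The parameterizations (intercepts only in the first layout, an intercept and a slope per series in the second) and the regressor x are assumptions chosen for illustration.

```matlab
NumSamples = 5;
NumSeries  = 2;
x = (1:NumSamples)';                 % made-up regressor, one value per sample

% Layout 1: a single cell, reused for every sample.
% Each series has only its own intercept, so NUMPARAMS = 2.
DesignCommon = {eye(NumSeries)};

% Layout 2: one cell per sample. Each cell is NUMSERIES-by-NUMPARAMS,
% with an intercept and a slope on x(k) for each series (NUMPARAMS = 4).
DesignPerSample = cell(NumSamples, 1);
for k = 1:NumSamples
    DesignPerSample{k} = kron([1, x(k)], eye(NumSeries));
end

size(DesignPerSample{3})             % NUMSERIES-by-NUMPARAMS
```

In the second layout the parameter vector is ordered as both intercepts followed by both slopes, which is a consequence of the kron construction used in this sketch.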

Data Types: double | cell

`MaxIterations` — Maximum number of iterations for the estimation algorithm
`100` (default) | numeric

(Optional) Maximum number of iterations for the estimation algorithm, specified as a numeric.

Data Types: double

`TolParam` — Convergence tolerance for estimation algorithm based on changes in model parameter estimates
`1.0e-8` (default) | numeric

(Optional) Convergence tolerance for estimation algorithm based on changes in model parameter estimates, specified as a numeric. The convergence test for changes in model parameters is

$\| \mathit{Param}_{k} - \mathit{Param}_{k-1} \| < \mathit{TolParam} \times (1 + \| \mathit{Param}_{k} \|)$

where Param is the vector of parameter estimates and k = 2, 3, ... is the iteration index. Convergence is assumed when both the TolParam and TolObj conditions are satisfied. If both TolParam ≤ 0 and TolObj ≤ 0, the algorithm runs for the maximum number of iterations (MaxIterations), regardless of the results of the convergence tests.

Data Types: double

`TolObj` — Convergence tolerance for estimation algorithm based on changes in objective function
`1.0e-12` (default) | numeric

(Optional) Convergence tolerance for estimation algorithm based on changes in the objective function, specified as a numeric. The convergence test for changes in the objective function is

$| \mathit{Obj}_{k} - \mathit{Obj}_{k-1} | < \mathit{TolObj} \times (1 + | \mathit{Obj}_{k} |)$

for iteration k = 2, 3, ... . Convergence is assumed when both the TolParam and TolObj conditions are satisfied. If both TolParam ≤ 0 and TolObj ≤ 0, the algorithm runs for the maximum number of iterations (MaxIterations), regardless of the results of the convergence tests.
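
The two stopping conditions can be written out directly. The anonymous functions below are a sketch of the tests as stated in this section, not the toolbox's internal code. Their names, the use of the 2-norm for the parameter test, and the sample iterates are assumptions made for illustration.

```matlab
% Relative convergence tests as stated above (sketch, not toolbox code)
paramConverged = @(Pk, Pprev, TolParam) norm(Pk - Pprev) < TolParam*(1 + norm(Pk));
objConverged   = @(Ok, Oprev, TolObj)   abs(Ok - Oprev)  < TolObj*(1 + abs(Ok));

% Made-up iterates for illustration
Pprev = [0.0012; 1.2294];
Pk    = Pprev + 1e-10;
Oprev = -3000.0;
Ok    = Oprev + 1e-6;

TolParam = 1.0e-8;     % documented default
TolObj   = 1.0e-12;    % documented default

okParam   = paramConverged(Pk, Pprev, TolParam);
okObj     = objConverged(Ok, Oprev, TolObj);
converged = okParam && okObj     % both tests must pass
```

With these made-up values the parameter test passes but the objective test does not, so the combined condition is false and the iteration would continue.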

Data Types: double

`Param0` — Estimate for the parameters of regression model
`[]` (default) | vector

(Optional) Estimate for the parameters of the regression model, specified as a NUMPARAMS-by-1 column vector.

Data Types: double

`Covar0` — Estimate for the covariance matrix of regression residuals
`[]` (default) | matrix

(Optional) Estimate for the covariance matrix of the regression residuals, specified as a NUMSERIES-by-NUMSERIES matrix.

Data Types: double

`CovarFormat` — Format for the covariance matrix
`'full'` (default) | character vector

(Optional) Format for the covariance matrix, specified as a character vector. The choices are:

'full' — Compute the full covariance matrix.
'diagonal' — Force the covariance matrix to be a diagonal matrix.

Data Types: char

Output Arguments


`Param` — Estimates for parameters of the regression model
vector

Estimates for the parameters of the regression model, returned as a NUMPARAMS-by-1 column vector.

`Covar` — Estimates for the covariance of regression model's residuals
matrix

Estimates for the covariance of the regression model's residuals, returned as a NUMSERIES-by-NUMSERIES matrix.

`Resid` — Residuals from regression
matrix

Residuals from the regression, returned as a NUMSAMPLES-by-NUMSERIES matrix. For any missing values in Data, the corresponding residual is the difference between the conditionally imputed value for Data and the model, that is, the imputed residual.

Note

The covariance estimate Covar cannot be derived from the residuals.

`Info` — Additional information from regression
structure

Additional information from the regression, returned as a structure. The structure has these fields:

Info.Obj — A variable-extent column vector, with no more than MaxIterations elements, that contains the value of the objective function at each iteration of the estimation algorithm. The last value in this vector, Obj(end), is the terminal estimate of the objective function. For maximum likelihood estimation, the objective function is the log-likelihood function.
Info.PrevParameters — NUMPARAMS-by-1 column vector of estimates for the model parameters from the iteration just prior to the terminal iteration.
Info.PrevCovariance — NUMSERIES-by-NUMSERIES matrix of estimates for the covariance parameters from the iteration just prior to the terminal iteration.
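
A short sketch of how these fields might be inspected follows. The two-series synthetic data, the seed, and the derived names NumIter, FinalObj, and LastStep are illustrative assumptions. The design is a single cell holding an identity matrix, so the two parameters are the series means.

```matlab
rng(1);                                      % reproducible random numbers
NumSamples = 100;
Y = randn(NumSamples,2)*[1 0.3; 0 1] + repmat([1, -0.5], NumSamples, 1);
Y(1:20,2) = NaN;                             % second series partially missing

% Same 2-by-2 design for every sample (single-cell form)
[Param, Covar, ~, Info] = ecmmvnrmle(Y, {eye(2)});

NumIter  = numel(Info.Obj);      % at most MaxIterations (100 by default)
FinalObj = Info.Obj(end);        % terminal value of the objective function

% Change in the parameter estimates over the last iteration
LastStep = norm(Param - Info.PrevParameters);
```

A small LastStep together with a flat tail in Info.Obj suggests that the algorithm stopped because the tolerances were met rather than because it ran out of iterations.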

References

[1] Little, Roderick J. A. and Donald B. Rubin. Statistical Analysis with Missing Data. 2nd Edition. John Wiley & Sons, Inc., 2002.

[2] Meng, Xiao-Li and Donald B. Rubin. “Maximum Likelihood Estimation via the ECM Algorithm.” Biometrika. Vol. 80, No. 2, 1993, pp. 267–278.

[3] Sexton, Joe and Anders Rygh Swensen. “ECM Algorithms that Converge at the Rate of EM.” Biometrika. Vol. 87, No. 3, 2000, pp. 651–662.

[4] Dempster, A. P., N. M. Laird, and Donald B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B, Vol. 39, No. 1, 1977, pp. 1–37.

Version History

Introduced in R2006a
