Capital Asset Pricing Model with Missing Data

This example illustrates implementation of the Capital Asset Pricing Model (CAPM) in the presence of missing data.

The Capital Asset Pricing Model

The Capital Asset Pricing Model (CAPM) is a venerable but often-maligned tool to characterize comovements between asset and market prices. Although many issues arise in its implementation and interpretation, one problem that practitioners face is to estimate the coefficients of the CAPM with incomplete stock price data.

Given a host of assumptions that can be found in the references (see Sharpe [3], Lintner [2], Jarrow [1], and Sharpe et al. [4]), the CAPM concludes that asset returns have a linear relationship with market returns. Specifically, given the return of all stocks that constitute a market denoted as M and the return of a riskless asset denoted as C, the CAPM states that the return of each asset R(i) in the market has the expectational form

E[R(i)] = C + b(i) * (E[M] - C)

for assets i = 1, ... , n, where b(i) is a parameter that specifies the degree of comovement between a given asset and the underlying market. In other words, the expected return of each asset is equal to the return on a riskless asset plus a risk-adjusted expected market return net of riskless asset returns. The parameters b(1), ... , b(n) are called asset betas.

Note that the beta of an asset has the form

b(i) = cov(R(i),M)/var(M)

which is the covariance between asset and market returns divided by the variance of market returns. If an asset has a beta equal to 1, the asset is said to move with the market; if an asset has a beta greater than 1, the asset is said to be more volatile than the market; and if an asset has a beta less than 1, the asset is said to be less volatile than the market.
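As a rough illustration of the beta formula, the following sketch computes a beta from simulated returns in two ways: as covariance over variance, and as the slope of a least-squares line. All numbers and variable names here (trueBeta, betaCov, betaOls) are hypothetical and are not taken from CAPMuniverse; no random seed is set, so the printed values change from run to run.

```matlab
% Hypothetical simulated returns; none of these numbers come from CAPMuniverse.
m = 1000;                               % number of samples
M = 0.0005 + 0.01 * randn(m,1);         % simulated market returns
trueBeta = 1.3;                         % assumed beta used to simulate the asset
R = trueBeta * M + 0.02 * randn(m,1);   % simulated asset returns

% Beta as covariance over variance, following b(i) = cov(R(i),M)/var(M).
C = cov([R, M]);                        % 2-by-2 sample covariance matrix
betaCov = C(1,2) / C(2,2);              % C(2,2) is the sample variance of M

% The same number is the slope of a least-squares line of R on M.
X = [ones(m,1), M];
coef = X \ R;
betaOls = coef(2);

fprintf('beta (cov/var) = %.4f, beta (OLS slope) = %.4f\n', betaCov, betaOls);
```

The two values agree up to rounding because the least-squares slope is algebraically the sample covariance divided by the sample variance.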

Estimation of the CAPM

The standard form of the CAPM model for estimation is a linear model with additional parameters for each asset to characterize residual errors. For each of n assets with m samples of observed asset returns R(k, i), market returns M(k), and riskless asset returns C(k), the estimation model has the form

R(k,i) = a(i) + C(k) + b(i) * (M(k) - C(k)) + V(k,i)

for samples k = 1, ... , m and assets i = 1, ... , n, where a(i) is a parameter that specifies the non-systematic return of an asset, b(i) is the asset beta, and V(k,i) is the residual error for each asset with associated random variable V(i).
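As a quick consistency check, taking expectations of both sides of the estimation model shows how it relates to the CAPM. This is only a short derivation sketch. It assumes that the residual has zero mean, and it treats the riskless return as a random variable with mean E[C] for generality.

```latex
\begin{align*}
E[R(i)] &= a(i) + E[C] + b(i)\,\bigl(E[M] - E[C]\bigr) + E[V(i)] \\
        &= a(i) + E[C] + b(i)\,\bigl(E[M] - E[C]\bigr).
\end{align*}
```

With a(i) = 0 and a constant riskless return C, the last line reduces to the CAPM relationship E[R(i)] = C + b(i) * (E[M] - C).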

The parameters a(1), ... , a(n) are called asset alphas. The strict form of the CAPM specifies that alphas must be zero and that deviations from zero are the result of temporary disequilibria. In practice, however, assets may have non-zero alphas, and much of active investment management is devoted to the search for assets with exploitable non-zero alphas.

To allow for the possibility of non-zero alphas, the estimation model generally seeks to estimate alphas and to perform tests to determine if the alphas are statistically equal to zero.
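For a single asset with complete data, the estimation model can be sketched with ordinary least squares on excess returns. Note that this is plain OLS with classical standard errors, not the ECM algorithm used by ecmmvnrmle, and that the simulated data and names (Cash, Market, Asset, tStat) are hypothetical.

```matlab
% Hypothetical simulated data for one asset with no missing values.
m = 500;
Cash = 0.0001 * ones(m,1);                  % assumed constant riskless return
Market = 0.0004 + 0.01 * randn(m,1);        % simulated market returns
Asset = Cash + 1.2 * (Market - Cash) + 0.015 * randn(m,1);  % true alpha is 0

% Excess returns and design matrix [1, M - C].
y = Asset - Cash;
X = [ones(m,1), Market - Cash];

% Least-squares estimates of [alpha; beta] and the residuals.
coef = X \ y;
resid = y - X * coef;

% Classical OLS standard errors and t-statistics.
s2 = (resid' * resid) / (m - 2);            % unbiased residual variance
covCoef = s2 * ((X' * X) \ eye(2));         % covariance of the estimates
stdErr = sqrt(diag(covCoef));
tStat = coef ./ stdErr;

fprintf('alpha = %.5f (t = %.2f), beta = %.4f (t = %.2f)\n', ...
    coef(1), tStat(1), coef(2), tStat(2));
```

A small absolute t-statistic for alpha is consistent with a zero alpha, while a large one for beta indicates that the market term explains a meaningful part of the asset's excess return.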

The residual errors V(i) are assumed to have moments

E[V(i)] = 0

and

E[V(i) * V(j)] = S(i,j)

for assets i,j = 1, ... , n, where the parameters S(1,1), ... , S(n,n) are called residual or non-systematic variances/covariances.

The square root of the residual variance of each asset, i.e., sqrt(S(i,i)) for i = 1, ... , n, is said to be the residual or non-systematic risk of the asset since it characterizes the residual variation in asset prices that cannot be explained by variations in market prices.
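The residual covariance matrix and the residual risks can be sketched for several assets at once when no data is missing. The simulated data, the choice of three assets, and the division by m (a maximum likelihood style normalization) are assumptions made for this illustration.

```matlab
% Hypothetical simulated excess returns for three assets, no missing values.
m = 400;
n = 3;
x = 0.01 * randn(m,1);                   % simulated excess market returns
trueBeta = [0.8, 1.0, 1.5];              % assumed betas for the simulation
Y = x * trueBeta + 0.02 * randn(m,n);    % simulated excess asset returns

% Regress every asset on the same design matrix [1, x].
X = [ones(m,1), x];
B = X \ Y;                               % 2-by-n: row 1 alphas, row 2 betas
V = Y - X * B;                           % m-by-n matrix of residuals

% Residual covariance S(i,j) and residual risk sqrt(S(i,i)).
S = (V' * V) / m;                        % assumed normalization by m
residRisk = sqrt(diag(S))';

disp(S);
disp(residRisk);
```

The diagonal of S holds the residual variances, and the off-diagonal entries measure how strongly the unexplained parts of two assets move together.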

Estimation with Missing Data

Although betas can be estimated for companies with sufficiently long histories of asset returns, it is extremely difficult to estimate betas for recent IPOs. However, if a collection of sufficiently-observable companies exists that can be expected to have some degree of correlation with the new company's stock price movements, for example, companies within the same industry as the new company, then it is possible to obtain imputed estimates for new company betas with the missing-data regression routines in the Financial Toolbox™.

Separate Estimation of Some Technology Stock Betas

To illustrate how to use the missing-data regression routines, we will estimate betas for twelve technology stocks, where one stock (GOOG) is an IPO.

First, load dates, total returns, and ticker symbols for the twelve stocks from the MAT-file CAPMuniverse.

load CAPMuniverse
whos Assets Data Dates

  Name           Size             Bytes  Class     Attributes

  Assets         1x14              1792  cell                
  Data        1471x14            164752  double              
  Dates       1471x1              11768  double

Dates = datetime(Dates,'ConvertFrom','datenum');

The assets in the model have the following symbols, where the last two series are proxies for the market and the riskless asset.

Assets(1:7)

ans = 1×7 cell
    {'AAPL'}    {'AMZN'}    {'CSCO'}    {'DELL'}    {'EBAY'}    {'GOOG'}    {'HPQ'}

Assets(8:14)

ans = 1×7 cell
    {'IBM'}    {'INTC'}    {'MSFT'}    {'ORCL'}    {'YHOO'}    {'MARKET'}    {'CASH'}

The data covers the period from January 1, 2000 to November 7, 2005 with daily total returns; the first observation is dated January 3, 2000. Two stocks in this universe have missing values that are represented by NaNs. One of the two stocks had an IPO during this period and, consequently, has significantly less data than the other stocks.
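One way to see how much data each series is missing is to count NaNs per column and locate the first valid observation. The sketch below uses a small hypothetical return matrix and hypothetical series names rather than the contents of CAPMuniverse.

```matlab
% Hypothetical 6-by-3 return matrix; NaN marks a missing observation.
Returns = [  NaN   0.01   0.00
             NaN  -0.02   0.01
             NaN   0.00  -0.01
            0.03    NaN   0.02
           -0.01   0.01   0.00
            0.02   0.00   0.01 ];
Names = {'NEWCO', 'OLDCO', 'MARKET'};    % hypothetical series names

% Count missing values in each column.
numMissing = sum(isnan(Returns), 1);

% Find the row of the first non-missing observation in each column.
firstObs = zeros(1, numel(Names));
for j = 1:numel(Names)
    firstObs(j) = find(~isnan(Returns(:,j)), 1, 'first');
end

for j = 1:numel(Names)
    fprintf('%6s: %d missing, first observation in row %d\n', ...
        Names{j}, numMissing(j), firstObs(j));
end
```

A series with a long leading run of NaNs, like the first column here, matches the pattern expected for a stock that started trading partway through the sample.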

The first step is to compute separate regressions for each stock, where the stocks with missing data have estimates that reflect their reduced observability.

[NumSamples, NumSeries] = size(Data);
NumAssets = NumSeries - 2;

StartDate = Dates(1);
EndDate = Dates(end);

% Preallocate one entry per asset.
Alpha = NaN(1, NumAssets);
Beta = NaN(1, NumAssets);
Sigma = NaN(1, NumAssets);
StdAlpha = NaN(1, NumAssets);
StdBeta = NaN(1, NumAssets);
StdSigma = NaN(1, NumAssets);
for i = 1:NumAssets
	% Set up separate asset data and design matrices.
	TestData = zeros(NumSamples,1);
	TestDesign = zeros(NumSamples,2);

	TestData(:) = Data(:,i) - Data(:,14);
	TestDesign(:,1) = 1.0;
	TestDesign(:,2) = Data(:,13) - Data(:,14);

	% Estimate the CAPM for each asset separately.
	[Param, Covar] = ecmmvnrmle(TestData, TestDesign);

	% Estimate the ideal standard errors for covariance parameters.
	[~, StdCovar] = ecmmvnrstd(TestData, TestDesign, Covar, 'fisher');
	
	% Estimate the sample standard errors for model parameters.
	StdParam = ecmmvnrstd(TestData, TestDesign, Covar, 'hessian');

	% Set up results for the output.
	Alpha(i) = Param(1);
	Beta(i) = Param(2);
	Sigma(i) = sqrt(Covar);

	StdAlpha(i) = StdParam(1);
	StdBeta(i) = StdParam(2);
	StdSigma(i) = sqrt(StdCovar);
end

displaySummary('Separate', StartDate, EndDate, NumAssets, Assets, Alpha, StdAlpha, Beta, StdBeta, Sigma, StdSigma)

Separate regression with daily total return data from 03-Jan-2000 to 07-Nov-2005 ...
       Alpha                Beta                 Sigma               
  ---- -------------------- -------------------- --------------------
  AAPL    0.0012 (  1.3882)    1.2294 ( 17.1839)    0.0322 (  0.0062)
  AMZN    0.0006 (  0.5326)    1.3661 ( 13.6579)    0.0449 (  0.0086)
  CSCO   -0.0002 (  0.2878)    1.5653 ( 23.6085)    0.0298 (  0.0057)
  DELL   -0.0000 (  0.0368)    1.2594 ( 22.2164)    0.0255 (  0.0049)
  EBAY    0.0014 (  1.4326)    1.3441 ( 16.0732)    0.0376 (  0.0072)
  GOOG    0.0046 (  3.2107)    0.3742 (  1.7328)    0.0252 (  0.0071)
   HPQ    0.0001 (  0.1747)    1.3745 ( 24.2390)    0.0255 (  0.0049)
   IBM   -0.0000 (  0.0312)    1.0807 ( 28.7576)    0.0169 (  0.0032)
  INTC    0.0001 (  0.1608)    1.6002 ( 27.3684)    0.0263 (  0.0050)
  MSFT   -0.0002 (  0.4871)    1.1765 ( 27.4554)    0.0193 (  0.0037)
  ORCL    0.0000 (  0.0389)    1.5010 ( 21.1855)    0.0319 (  0.0061)
  YHOO    0.0001 (  0.1282)    1.6543 ( 19.3838)    0.0384 (  0.0074)

The Alpha column contains alpha estimates for each stock, which are near zero as expected. In addition, the t-statistics (enclosed in parentheses) are generally too small to support the hypothesis that the alphas are nonzero at the 99.5% level of significance.

The Beta column contains beta estimates for each stock that also have t-statistics enclosed in parentheses. For all stocks but GOOG, the hypothesis that the betas are nonzero is accepted at the 99.5% level of significance. It would seem, however, that GOOG does not have enough data to obtain a meaningful estimate for beta since its t-statistic would imply rejection of the hypothesis of a nonzero beta.

The Sigma column contains residual standard deviations, that is, estimates for non-systematic risks. Instead of t-statistics, the associated standard errors for the residual standard deviations are enclosed in parentheses.
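To make the reading of the t-statistics concrete, the following sketch converts two beta t-statistics from the table above into approximate p-values. It assumes a two-sided test with a total tail probability of 0.005 and a normal approximation to the t-distribution, which is reasonable only for large samples; the variable names are hypothetical.

```matlab
% Beta t-statistics copied from the separate-regression table above.
tAAPL = 17.1839;
tGOOG = 1.7328;

% Assumption: two-sided test, total tail probability 0.005, normal approximation.
tailProb = 0.005;
zCrit = sqrt(2) * erfcinv(tailProb);        % two-sided critical value

% Two-sided p-value for a standard normal variable: P(|Z| > t) = erfc(t/sqrt(2)).
pAAPL = erfc(abs(tAAPL) / sqrt(2));
pGOOG = erfc(abs(tGOOG) / sqrt(2));

fprintf('critical value = %.3f\n', zCrit);
fprintf('AAPL: p = %.3g, significant = %d\n', pAAPL, pAAPL < tailProb);
fprintf('GOOG: p = %.3g, significant = %d\n', pGOOG, pGOOG < tailProb);
```

Under these assumptions, a t-statistic has to exceed roughly 2.8 in absolute value to be significant, which the AAPL beta does and the GOOG beta from the separate regression does not.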

Grouped Estimation of Some Technology Stock Betas

To estimate stock betas for all twelve stocks, set up a joint regression model that groups all twelve stocks within a single design (since each stock has the same design matrix, this model is actually an example of seemingly-unrelated regression). The function to estimate model parameters is ecmmvnrmle and the function to estimate standard errors is ecmmvnrstd.
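The block structure of the grouped design can be sketched with a Kronecker product: for each sample, every asset gets its own row containing an intercept and the excess market return, placed in that asset's pair of columns. The asset count and market returns below are hypothetical, and the kron form is just a compact alternative to filling the matrix element by element.

```matlab
% Hypothetical setup: three assets, two samples of excess market returns.
n = 3;
x = [0.010; -0.004];

% One n-by-2n design matrix per sample; asset i uses columns 2i-1 and 2i.
Design = cell(numel(x), 1);
for k = 1:numel(x)
    Design{k} = kron(eye(n), [1, x(k)]);
end

% Build the first sample's matrix element by element for comparison.
LoopDesign = zeros(n, 2*n);
for i = 1:n
    LoopDesign(i, 2*i - 1) = 1.0;
    LoopDesign(i, 2*i) = x(1);
end
sameAsLoop = isequal(Design{1}, LoopDesign);

disp(Design{1});
fprintf('kron and loop versions agree: %d\n', sameAsLoop);
```

Because each row touches only its own asset's alpha and beta, the assets are linked solely through the residual covariance matrix, which is what lets the stocks with complete data inform the estimates for stocks with missing data.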

Since GOOG has a significant number of missing values, a direct use of the missing-data function ecmmvnrmle takes 482 iterations to converge, which can take a long time to compute. For the sake of brevity, the parameter and covariance estimates after the first 480 iterations are stored in a MAT-file (CAPMgroupparam) and are used as initial estimates to compute stock betas.

load CAPMgroupparam
whos Param0 Covar0

  Name         Size            Bytes  Class     Attributes

  Covar0      12x12             1152  double              
  Param0      24x1               192  double

Now estimate the parameters for the collection of twelve stocks.

NumParams = 2 * NumAssets;

% Set up the grouped asset data and design matrices.
TestData = zeros(NumSamples, NumAssets);
TestDesign = cell(NumSamples, 1);
Design = zeros(NumAssets, NumParams);

for k = 1:NumSamples
	for i = 1:NumAssets
		TestData(k,i) = Data(k,i) - Data(k,14);
		Design(i,2*i - 1) = 1.0;
		Design(i,2*i) = Data(k,13) - Data(k,14);
	end
	TestDesign{k} = Design;
end

% Estimate the CAPM for all assets together with initial parameter estimates.
[Param, Covar] = ecmmvnrmle(TestData, TestDesign, [], [], [], Param0, Covar0);

% Estimate the ideal standard errors for covariance parameters.
[~, StdCovar] = ecmmvnrstd(TestData, TestDesign, Covar, 'fisher');

% Estimate the sample standard errors for model parameters.
StdParam = ecmmvnrstd(TestData, TestDesign, Covar, 'hessian');

% Set up results for the output.
Alpha = Param(1:2:end-1);
Beta = Param(2:2:end);
Sigma = sqrt(diag(Covar));

StdAlpha = StdParam(1:2:end-1);
StdBeta = StdParam(2:2:end);
StdSigma = sqrt(diag(StdCovar));

displaySummary('Grouped', StartDate, EndDate, NumAssets, Assets, Alpha, StdAlpha, Beta, StdBeta, Sigma, StdSigma)

Grouped regression with daily total return data from 03-Jan-2000 to 07-Nov-2005 ...
       Alpha                Beta                 Sigma               
  ---- -------------------- -------------------- --------------------
  AAPL    0.0012 (  1.3882)    1.2294 ( 17.1839)    0.0322 (  0.0062)
  AMZN    0.0007 (  0.6086)    1.3673 ( 13.6427)    0.0450 (  0.0086)
  CSCO   -0.0002 (  0.2878)    1.5653 ( 23.6085)    0.0298 (  0.0057)
  DELL   -0.0000 (  0.0368)    1.2594 ( 22.2164)    0.0255 (  0.0049)
  EBAY    0.0014 (  1.4326)    1.3441 ( 16.0732)    0.0376 (  0.0072)
  GOOG    0.0041 (  2.8907)    0.6173 (  3.1100)    0.0337 (  0.0065)
   HPQ    0.0001 (  0.1747)    1.3745 ( 24.2390)    0.0255 (  0.0049)
   IBM   -0.0000 (  0.0312)    1.0807 ( 28.7576)    0.0169 (  0.0032)
  INTC    0.0001 (  0.1608)    1.6002 ( 27.3684)    0.0263 (  0.0050)
  MSFT   -0.0002 (  0.4871)    1.1765 ( 27.4554)    0.0193 (  0.0037)
  ORCL    0.0000 (  0.0389)    1.5010 ( 21.1855)    0.0319 (  0.0061)
  YHOO    0.0001 (  0.1282)    1.6543 ( 19.3838)    0.0384 (  0.0074)

Although the results for complete-data stocks are the same, notice that the beta estimates for AMZN and GOOG (which are the two stocks with missing values) are different from the estimates derived for each stock separately. Since AMZN has few missing values, the differences in the estimates are small. With GOOG, however, the differences are more pronounced.

The t-statistic for the beta estimate of GOOG is now significant at the 99.5% level of significance. Note, however, that the t-statistics for beta estimates are based on standard errors from the sample Hessian which, in contrast to the Fisher information matrix, accounts for the increased uncertainty in an estimate due to missing values. If the t-statistic is obtained from the more optimistic Fisher information matrix, the t-statistic for GOOG is 8.25. Thus, despite the increase in uncertainty due to missing data, GOOG nonetheless has a statistically-significant estimate for beta.

Finally, note that the beta estimate for GOOG is 0.62, a value that may require some explanation. Whereas the market has been volatile over this period with sideways price movements, GOOG has steadily appreciated in value. Consequently, it is less correlated with the market, which, in turn, implies that it is less volatile than the market with a beta less than 1.

References

[1] R. A. Jarrow. Finance Theory. Prentice-Hall, Inc., 1988.

[2] J. Lintner. "The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets." Review of Economics and Statistics. Vol. 47, No. 1, 1965, pp. 13-37.

[3] W. F. Sharpe. "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." Journal of Finance. Vol. 19, 1964, pp. 425-442.

[4] W. F. Sharpe, G. J. Alexander, and J. V. Bailey. Investments. 6th ed., Prentice-Hall, Inc., 1999.

Utility Functions

function displaySummary(regressionType, StartDate, EndDate, NumAssets, Assets, Alpha, StdAlpha, Beta, StdBeta, Sigma, StdSigma)
fprintf(1,'%s regression with daily total return data from %s to %s ...\n', ...
	regressionType, string(StartDate),string(EndDate));
fprintf(1,'  %4s %-20s %-20s %-20s\n',' ','Alpha','Beta','Sigma');
fprintf(1,'  ---- -------------------- -------------------- --------------------\n');

for i = 1:NumAssets
	fprintf('  %4s %9.4f (%8.4f) %9.4f (%8.4f) %9.4f (%8.4f)\n', ...
		Assets{i},Alpha(i),abs(Alpha(i)/StdAlpha(i)), ...
		Beta(i),abs(Beta(i)/StdBeta(i)),Sigma(i),StdSigma(i));
end

end