Mercer's theorem and the Karhunen-Loeve expansion

Toby Driscoll, 20th December 2011

Contents

(Chebfun example stats/MercerKarhunenLoeve.m)

Mercer's theorem is a continuous analog of the singular-value or eigenvalue decomposition of a symmetric positive definite matrix. One of its main applications is to find convenient ways to express stochastic processes, via the Karhunen-Loeve expansion .

LW = 'linewidth'; FS = 'fontsize'; MS = 'markersize';

Mercer's theorem

Suppose K(s,t) is a symmetric (K(t,s)=K(s,t)), continuous, and nonnegative definite kernel function on [a,b]x[a,b]. Then Mercer's theorem asserts that there is an orthonormal set of eigenfunctions {psi_j(x)} and eigenvalues {lambda_j} such that

K(s,t) = sum(  lambda_j psi_j(s) psi_j(t),  j=1..infinity  ),

where the values and functions satisfy the integral eigenvalue equation

lambda_j psi_j(s) = int(  K(s,t) psi_j(t), t=a..b  ).

For example, suppose we have an exponentially decaying kernel:

K = @(s,t) exp(-abs(s-t));

We can create the integral operator and find the leading terms of its Mercer decomposition numerically.

F = fred( K, domain([-1,1]) );
[Psi,Lambda] = eigs(F,20,'lm');
plot(Psi(:,[1 2 5 10]),LW,2), title('Four Mercer eigenfunctions') The eigenfunctions returned by eigs are orthonormal.

Psi(:,1:6)'*Psi(:,1:6)
ans =
Columns 1 through 3
1.000000000000000   0.000000000000002   0.000000026818899
0.000000000000002   1.000000000000000   0.000000000000001
0.000000026818899   0.000000000000001   1.000000000000000
-0.000000000000000   0.000000032098278  -0.000000000000003
0.000000094733929   0.000000000000001   0.000000011521606
0.000000000000001   0.000000083028756  -0.000000000000001
Columns 4 through 6
-0.000000000000000   0.000000094733929   0.000000000000001
0.000000032098278   0.000000000000001   0.000000083028756
-0.000000000000003   0.000000011521606  -0.000000000000001
1.000000000000000  -0.000000000000001   0.000000018925274
-0.000000000000001   0.999999999999999   0.000000000000002
0.000000018925274   0.000000000000002   1.000000000000000

The truncation of the Mercer sum does lead to an underestimate of the values of the kernel K(s,t). For our example, we should get K(s,s)=1, but we get noticeably less.

Psi(0,:)*Lambda*Psi(0,:)'
Psi(0.95,:)*Lambda*Psi(0.95,:)'
ans =
0.979857329683123
ans =
0.982464694653203

In fact, the eigenvalues decrease only like O(1/n^2), which makes the pointwise convergence in the number of terms rather slow.

diff(log(diag(Lambda)))' ./ diff(log((1:20)))
ans =
Columns 1 through 3
-1.555646283238670  -2.248663437902514  -2.362606105193702
Columns 4 through 6
-2.344013276487799  -2.305388107579013  -2.268319088803652
Columns 7 through 9
-2.236247834937635  -2.208919255360176  -2.185442602570997
Columns 10 through 12
-2.164965338901021  -2.146795988026114  -2.130397815739643
Columns 13 through 15
-2.115357876991441  -2.101357150110396  -2.088146837397857
Columns 16 through 18
-2.075530637487911  -2.063351724075133  -2.051483201102641
Column 19
-2.039821069702878

Karhunen-Loeve expansion

Now suppose that X(t,omega) is a stochastic process for t in some interval [a,b] and omega in some probability space. The process is often characterized by its mean, mu(t), and its covariance, K(s,t), the expected value of (X(s)-mu(s))*(X(t)-mu(t)). Using Mercer's theorem on K, we can express the process by the K-L expansion

X(t,omega) = mu(t) + sum( sqrt(lambda_j) psi_j(t) Z_j(omega), j=1..inf )

where lambda_j and psi_j are Mercer eigenmodes for K, and the Z_j are uncorrelated and of unit variance.

K-L is a generalization of the singular value decomposition of a matrix, which can be written as a sum of outer products of vectors. The covariance K plays the role of the Gram matrix inner products (in probability) of "columns" of the process for different values of s and t. A number of SVD results have K-L analogs, most notably that the best approximation of the process results from truncating the expansion, if the eigenvalues are arranged in nonincreasing order.

Because the Z_j in the expansion are uncorrelated, the variance of X is just the sum of the eigenvalues. This is the trace of K, which is the integral of K(s,s); in this case, the result is 2. But we can also calculuate the variance in a truncation of the expansion by summing only some of the eigenvalues. For example, suppose the process X has the exponential covariance in K above. The eigenvalues show that 95% of the variance in the process is captured by the first 10 K-L modes:

lambda = diag(Lambda);
sum(lambda(1:10)) / 2
ans =
0.957887547062563

We can find realizations of X by selecting the random parameters Z_j in the expansion.

Z = randn(10,400);
X = Psi(:,1:10)*(sqrt(Lambda(1:10,1:10))*Z);
plot(X(:,1:40))
mu = sum(X,2)/400;
hold on, plot(mu,'k',LW,3)
title('Random realizations, and the mean') We should get roughly the original covariance function back.

[S,T] = meshgrid(-1:.05:1);
C = cov( X(-1:.05:1,:)' );  % covariance at discrete locations
clf, mesh(S,T,C)
hold on, plot3(S,T,K(S,T),'k.',MS,10) If we shorten the correlation length of the process relative to the domain (i.e., more randomness), the amount of variance captured by the first 10 modes will decrease.

K = @(s,t) exp(-4*abs(s-t));       % decrease correlation faster, then...
F = fred( K, domain([-1,1]) );
lambda = eigs(F,24,'lm');
sum(lambda(1:10)) / 2          % ... a smaller fraction is captured
ans =
0.835854015697241

References:

 D. Xu, Numerical Methods for Stochastic Computations, Princeton University Press, 2010.