# AR Order Selection with Partial Autocorrelation Sequence

This example shows how to assess the order of an autoregressive model using the partial autocorrelation sequence. For a stationary time series with values $X(1),X(2),X(3),\dots,X(k+1)$, the partial autocorrelation sequence at lag $k$ is the correlation between $X(1)$ and $X(k+1)$ after regressing both $X(1)$ and $X(k+1)$ on the intervening observations $X(2),X(3),\dots,X(k)$. For a moving average (MA) process, you can use the autocorrelation sequence to assess the order, because the autocorrelation of an MA($q$) process is zero beyond lag $q$. For an autoregressive (AR) or autoregressive moving average (ARMA) process, however, the autocorrelation sequence does not cut off at a particular lag, so it does not help in order selection. To select the model order of an AR process, this example:

• Simulates a realization of an AR(2) process.

• Graphically explores the correlation between lagged values of the time series.

• Examines the sample autocorrelation sequence of the time series.

• Fits an AR(15) model to the time series by solving the Yule-Walker equations (aryule).

• Uses the reflection coefficients returned by aryule to compute the partial autocorrelation sequence.

• Examines the partial autocorrelation sequence to select the model order.
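You can also check the regression-based definition of the partial autocorrelation directly. The following is a minimal sketch that uses only base MATLAB (no toolbox functions) and assumes a zero-mean series. It simulates the same AR(2) process that this example defines later, and the names `pacfReg` and `maxLag` are illustrative only. For an AR(2) process, the values at lags 1 and 2 should be clearly nonzero, and the values at higher lags should be close to zero.

```matlab
% Minimal sketch: lag-k partial autocorrelation computed from its definition.
% Assumes a zero-mean series; uses only base MATLAB functions.
rng default
x = filter(1,[1 1.5 0.75],randn(1000,1));   % AR(2) series used in this example
x = x - mean(x);                            % remove the sample mean

N = numel(x);
maxLag = 5;                                 % illustrative choice
pacfReg = zeros(maxLag,1);
for k = 1:maxLag
    first = x(1:N-k);                       % role of X(1): samples at time n
    last  = x(k+1:N);                       % role of X(k+1): samples at time n+k
    if k > 1
        % Intervening observations X(2),...,X(k): column j holds time n+j
        M = zeros(N-k,k-1);
        for j = 1:k-1
            M(:,j) = x(1+j:N-k+j);
        end
        first = first - M*(M\first);        % residual after least-squares fit
        last  = last  - M*(M\last);
    end
    c = corrcoef(first,last);               % correlation of the residuals
    pacfReg(k) = c(1,2);
end
disp(pacfReg.')
```

At lag 1 there are no intervening observations, so the value is simply the lag-1 sample correlation.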

Consider the AR(2) process defined by

$X\left(n\right)+1.5X\left(n-1\right)+0.75X\left(n-2\right)=\epsilon \left(n\right),$

where $\epsilon \left(n\right)$ is an $N\left(0,1\right)$ Gaussian white noise process. Simulate a 1000-sample time series from the AR(2) process defined by the difference equation. Set the random number generator to the default settings for reproducible results.

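```matlab
A = [1 1.5 0.75];                % coefficients of X(n) + 1.5X(n-1) + 0.75X(n-2)
rng default                      % reproducible random numbers
x = filter(1,A,randn(1000,1));   % 1000 samples of the AR(2) process
```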

View the frequency response of the AR(2) process.

```matlab
freqz(1,A)
```

The magnitude response is largest at high frequencies, so in this case the AR(2) process acts like a highpass filter.
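The poles of $1/A(z)$ show why. The following minimal sketch uses base MATLAB `roots`. Both poles have magnitude $\sqrt{0.75}\approx 0.866$, which places them inside the unit circle, so the process is stationary. Their angles are $\pm 5\pi/6$ rad/sample, close to the Nyquist frequency $\pi$, so the response emphasizes high frequencies.

```matlab
% Poles of the AR(2) model 1/A(z) with A(z) = 1 + 1.5z^-1 + 0.75z^-2
A = [1 1.5 0.75];
p = roots(A);                       % roots of z^2 + 1.5z + 0.75

poleRadius = abs(p)                 % both equal sqrt(0.75), about 0.866
poleAngle  = angle(p)/pi            % +/- 5/6, in units of pi rad/sample
isStationary = all(abs(p) < 1)      % true: all poles inside the unit circle
```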

Graphically examine the correlation in x by producing scatter plots of $X(k+1)$ vs. $X(1)$ for $k=1,2,3,4$. Each panel pairs every sample with the sample $k$ steps later, so the four panels correspond to lags 1 through 4.

```matlab
figure
for k = 1:4
    subplot(2,2,k)
    % Pair each sample with the sample k steps later (lag k)
    plot(x(1:end-k),x(k+1:end),'*')
    xlabel('X_1')
    ylabel(['X_' int2str(k+1)])
    grid
end
```

In the scatter plots, you see a linear relationship between $X(1)$ and $X(2)$ and between $X(1)$ and $X(3)$, but not between $X(1)$ and either $X(4)$ or $X(5)$.

The points in the top-row scatter plots fall approximately on a line, with a negative slope in the top-left panel and a positive slope in the top-right panel. The scatter plots in the bottom two panels do not show any apparent linear relationship.

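The negative correlation between $X(1)$ and $X(2)$ and the positive correlation between $X(1)$ and $X(3)$ are explained by the highpass-filter behavior of the AR(2) process. Because the series is dominated by high-frequency content, it tends to change sign from one sample to the next. Samples one step apart therefore tend to have opposite signs, and samples two steps apart tend to have the same sign.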
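You can also put numbers on the scatter plots by computing the sample correlation coefficient for each panel. The following minimal sketch applies base MATLAB `corrcoef` to the same simulated series. The variable name `lagCorr` is illustrative. For this process, the theoretical correlations at lags 1 through 4 are about $-0.86$, $0.54$, $-0.16$, and $-0.16$, so the sample values at lags 1 and 2 should dominate.

```matlab
% Sample correlation between X(n) and X(n+k) for the four scatter-plot lags
rng default
A = [1 1.5 0.75];
x = filter(1,A,randn(1000,1));

lagCorr = zeros(1,4);
for k = 1:4
    c = corrcoef(x(1:end-k),x(k+1:end));   % same pairs as subplot k
    lagCorr(k) = c(1,2);
end
disp(lagCorr)
```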

Find the sample autocorrelation sequence up to lag 50 and plot the result.

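```matlab
% Normalized sample autocorrelation for lags -50 to 50 (lag 0 equals 1)
[xc,lags] = xcorr(x,50,'coeff');

% Plot only the nonnegative lags (elements 51 to 101)
figure
stem(lags(51:end),xc(51:end),'filled')
xlabel('Lag')
ylabel('ACF')
title('Sample Autocorrelation Sequence')
grid
```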
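With the `'coeff'` option, `xcorr` scales the raw correlation so that the value at lag 0 equals 1. It does not subtract the mean. The following minimal sketch reproduces the nonnegative lags of that normalization in base MATLAB, using the same simulated series. The name `acfManual` is illustrative.

```matlab
% Nonnegative-lag sample autocorrelation, normalized so that lag 0 equals 1
rng default
x = filter(1,[1 1.5 0.75],randn(1000,1));

maxLag = 50;
N = numel(x);
acfManual = zeros(maxLag+1,1);
for m = 0:maxLag
    % Raw (mean not removed) correlation of x with itself shifted by m samples
    acfManual(m+1) = sum(x(1+m:N).*x(1:N-m));
end
acfManual = acfManual/acfManual(1);   % 'coeff'-style scaling
```

If Signal Processing Toolbox is available, `max(abs(acfManual - xc(51:end)))` should be at the level of round-off error.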

The sample autocorrelation sequence shows a negative value at lag 1 and a positive value at lag 2, as the scatter plots suggest. However, the sequence does not drop to zero after a particular lag, so you cannot determine the appropriate order for the AR model from the sample autocorrelation sequence.
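The theoretical autocorrelation of the AR(2) process shows why. Write the model as $X(n) = -1.5X(n-1) - 0.75X(n-2) + \epsilon(n)$. The Yule-Walker equations then give $\rho(1) = -1.5/(1+0.75)$ and $\rho(k) = -1.5\rho(k-1) - 0.75\rho(k-2)$ for $k \ge 2$. This is a damped oscillation that never cuts off at a fixed lag. The following minimal sketch evaluates the recursion in base MATLAB.

```matlab
% Theoretical ACF of X(n) + 1.5X(n-1) + 0.75X(n-2) = e(n) via Yule-Walker
a1 = 1.5; a2 = 0.75;          % coefficients of A(z) = [1 a1 a2]
maxLag = 20;
rho = zeros(maxLag+1,1);      % rho(k+1) holds the autocorrelation at lag k
rho(1) = 1;                   % lag 0
rho(2) = -a1/(1+a2);          % lag 1
for k = 2:maxLag
    rho(k+1) = -a1*rho(k) - a2*rho(k-1);
end
% Lags 0-5, approximately: 1, -0.857, 0.536, -0.161, -0.161, 0.362
disp([(0:5).' rho(1:6)])
```

The lag-5 value of about 0.36 shows that, for an AR process, substantial correlation can remain well beyond the model order.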

Fit an AR(15) model using aryule and return the sequence of reflection coefficients. The negative of the reflection coefficients is the partial autocorrelation sequence.

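```matlab
% arcoefs: AR coefficients, E: noise variance estimate, K: reflection coefficients
[arcoefs,E,K] = aryule(x,15);
pacf = -K;   % partial autocorrelation sequence for lags 1 to 15
```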
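aryule solves the Yule-Walker equations with the Levinson-Durbin recursion, and the reflection coefficients are a by-product of that recursion. The following is a minimal sketch of the recursion in base MATLAB. It is not aryule's source code. It assumes the biased autocorrelation estimate of the series as given, with no mean removal, so its output may differ slightly from aryule's if aryule preprocesses the data differently. The names `pacfLD`, `r`, `a`, and `k` are illustrative.

```matlab
% Levinson-Durbin recursion on the biased autocorrelation estimate.
% Sign convention: A(z) = 1 + a(2)z^-1 + ... ; the reflection coefficient k(m)
% is the last coefficient of the order-m polynomial, and pacf = -k.
rng default
x = filter(1,[1 1.5 0.75],randn(1000,1));

p = 15;
N = numel(x);
r = zeros(p+1,1);
for m = 0:p
    r(m+1) = sum(x(1+m:N).*x(1:N-m))/N;   % biased estimate at lag m
end

a = 1;            % order-0 prediction-error filter
E = r(1);         % order-0 prediction-error power
k = zeros(p,1);
for m = 1:p
    km = -(a*r(m+1:-1:2))/E;       % reflection coefficient of order m
    a  = [a 0] + km*[0 fliplr(a)]; % order-m prediction-error filter
    E  = E*(1 - km^2);             % updated prediction-error power
    k(m) = km;
end
pacfLD = -k;
disp(pacfLD(1:4).')
disp(a(1:3))      % leading coefficients, close to [1 1.5 0.75]
```

With Signal Processing Toolbox, you can compare `pacfLD` with `pacf` from the code above. Any differences should be small.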

Plot the partial autocorrelation sequence along with the large-sample 95% confidence bounds. If the data are generated by an autoregressive process of order $p$, the values of the sample partial autocorrelation sequence for lags greater than $p$ approximately follow a $N(0,1/N)$ distribution, where $N$ is the length of the time series. For a 95% confidence interval, the critical value is $\sqrt{2}\,\mathrm{erf}^{-1}(0.95)\approx 1.96$, so the confidence bounds are $\pm 1.96/\sqrt{N}$.
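The critical value follows from the standard normal distribution. For $Z \sim N(0,1)$, $P(|Z| < z) = \mathrm{erf}(z/\sqrt{2})$. Setting this probability to 0.95 gives $z = \sqrt{2}\,\mathrm{erf}^{-1}(0.95)$. The following minimal sketch computes the critical value and the bounds for this example's series length in base MATLAB.

```matlab
% 95% large-sample bounds for the sample PACF beyond the true AR order
N = 1000;                         % length of the simulated series
z = sqrt(2)*erfinv(0.95);         % two-sided 95% critical value, about 1.96
conf = z/sqrt(N);                 % bounds are -conf and +conf, about 0.062
fprintf('z = %.4f, conf = %.4f\n', z, conf)
```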

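```matlab
figure
stem(pacf,'filled')
xlabel('Lag')
ylabel('Partial ACF')
title('Partial Autocorrelation Sequence')
xlim([1 15])

% Large-sample 95% bounds: +/- 1.96/sqrt(N) with N = 1000
conf = sqrt(2)*erfinv(0.95)/sqrt(1000);
hold on
% Horizontal lines at -conf and +conf across the current x-limits
plot(xlim,[1 1]'*[-conf conf],'r')
hold off
grid
```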

The only values of the partial autocorrelation sequence outside the 95% confidence bounds occur at lags 1 and 2. This indicates that the correct model order for the AR process is 2.
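You can turn the visual rule into a simple heuristic: choose the largest lag whose partial autocorrelation magnitude exceeds the bound. The following minimal sketch recomputes the partial autocorrelation sequence with aryule (Signal Processing Toolbox) exactly as above. The name `pHat` is illustrative. Each lag beyond the true order has about a 5% chance of crossing the 95% bound by chance, so this rule can overestimate the order. Inspect the plot as well.

```matlab
% Heuristic order choice: largest lag whose sample PACF lies outside +/-conf
rng default
x = filter(1,[1 1.5 0.75],randn(1000,1));
[~,~,K] = aryule(x,15);             % reflection coefficients
pacf = -K;
conf = sqrt(2)*erfinv(0.95)/sqrt(numel(x));

pHat = find(abs(pacf) > conf,1,'last');   % last lag outside the bounds
if isempty(pHat)
    pHat = 0;                       % no significant lag: series looks like white noise
end
fprintf('Selected AR order: %d\n',pHat)
```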

In this example, you generated the time series to simulate an AR(2) process, so the partial autocorrelation sequence only confirms what you already knew. In practice, you have only the observed time series, without any prior information about the model order. In that realistic scenario, the partial autocorrelation sequence is an important tool for selecting the order of a stationary autoregressive time series.