Data Transformations

Why Transform?

You can transform time series to:

Isolate temporal components of interest.
Remove the effect of nuisance components (like seasonality).
Make a series stationary.
Reduce spurious regression effects.
Stabilize variability that grows with the level of the series.
Make two or more time series more directly comparable.

You can choose among many data transformations to address these (and other) aims.

For example, you can use decomposition methods to describe and estimate time series components. Seasonal adjustment is a decomposition method you can use to remove a nuisance seasonal component.

Detrending and differencing are transformations you can use to address nonstationarity due to a trending mean. Differencing can also help remove spurious regression effects, which arise when series with stochastic trends are regressed on one another.

In general, if you apply a data transformation before modeling your data, you must back-transform the model forecasts to return to the original scale. This step is not necessary in Econometrics Toolbox™ if you are modeling difference-stationary data: use arima to model integrated series that have not been differenced beforehand. A key advantage of this approach is that arima returns forecasts on the original scale.
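To make the back-transformation concrete, the following minimal Python sketch integrates forecasts of first differences back to the level scale. It is an illustration only and does not use or reproduce arima; the helper name undifference and the numbers are hypothetical.

```python
from itertools import accumulate

def undifference(last_level, diff_forecasts):
    """Integrate forecasts of first differences back to the level scale.

    last_level     -- last observed value of the original series
    diff_forecasts -- forecasts of y_t - y_{t-1} for the following periods
    """
    # Each forecast level is the previous level plus the forecast change.
    # accumulate(..., initial=...) needs Python 3.8 or later.
    return list(accumulate(diff_forecasts, initial=last_level))[1:]

# Hypothetical numbers: last observed level and three forecast changes.
level_forecasts = undifference(100.0, [2.0, -1.0, 3.0])
print(level_forecasts)  # [102.0, 101.0, 104.0]
```

For a series differenced more than once, the same cumulative sum would be applied once per order of differencing, each time starting from the appropriate last observed value.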

Detrending

Some nonstationary series can be modeled as the sum of a deterministic trend and a stationary stochastic process. That is, you can write the series $y_{t}$ as

$y_{t} = μ_{t} + ε_{t},$

where $ε_{t}$ is a stationary stochastic process with mean zero.

The deterministic trend, $μ_{t}$, can have multiple components, such as nonseasonal and seasonal components. You can detrend (or decompose) the data to identify and estimate its various components. The detrending process proceeds as follows:

Estimate the deterministic trend component.
Remove the trend from the original data.
(Optional) Model the remaining residual series with an appropriate stationary stochastic process.

Several techniques are available for estimating the trend component. You can estimate it parametrically using least squares, nonparametrically using filters (moving averages), or a combination of both.
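As an illustration of the parametric route, the following Python sketch estimates a linear trend by ordinary least squares and subtracts it from the data. It assumes the trend is a straight line in the time index t = 0, 1, ..., n - 1; the function names and the data are hypothetical, and the sketch is not a toolbox function.

```python
def fit_linear_trend(y):
    """Least-squares fit of y_t = a + b*t for t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def detrend(y):
    """Return the fitted trend and the detrended residual series."""
    intercept, slope = fit_linear_trend(y)          # step 1: estimate the trend
    trend = [intercept + slope * t for t in range(len(y))]
    residual = [v - m for v, m in zip(y, trend)]    # step 2: remove the trend
    return trend, residual

# Hypothetical data: the line 5 + 2t plus a small alternating disturbance.
disturbance = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
y = [5 + 2 * t + e for t, e in enumerate(disturbance)]
trend, residual = detrend(y)
print(residual)  # what is left for an optional stationary model (step 3)
```

Because the fit includes an intercept, the residuals sum to zero up to rounding error, which matches the mean-zero assumption on the stochastic component.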

Detrending yields estimates of all trend and stochastic components, which might be desirable. However, estimating trend components can require making additional assumptions, performing extra steps, and estimating additional parameters.

Differencing

Differencing is an alternative transformation for removing a mean trend from a nonstationary series. It is advocated in the Box-Jenkins approach to model specification [1]. According to this methodology, the first step in building a model is to difference your data until it looks stationary. Differencing is appropriate for removing stochastic trends (e.g., random walks).

Define the first difference as

$Δ y_{t} = y_{t} - y_{t - 1},$

where Δ is called the differencing operator. In lag operator notation, where $L^{i} y_{t} = y_{t - i},$

$Δ y_{t} = (1 - L) y_{t} .$

You can create lag operator polynomial objects using LagOp.

Similarly, define the second difference as

$Δ^{2} y_{t} = {(1 - L)}^{2} y_{t} = (y_{t} - y_{t - 1}) - (y_{t - 1} - y_{t - 2}) = y_{t} - 2 y_{t - 1} + y_{t - 2} .$

Like taking derivatives, taking a first difference makes a linear trend constant, taking a second difference makes a quadratic trend constant, and so on for higher-degree polynomials. Many complex stochastic trends can also be eliminated by taking relatively low-order differences. Taking D differences makes a process with D unit roots stationary.
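The analogy with derivatives can be checked on small deterministic examples. The following Python sketch applies repeated first differences to a linear and a quadratic trend; the helper name difference and the data are hypothetical.

```python
def difference(y, order=1):
    """Apply the first-difference operator (1 - L) to y, `order` times."""
    for _ in range(order):
        # Pair each value with its successor and subtract.
        y = [current - previous for previous, current in zip(y, y[1:])]
    return y

linear = [3 + 2 * t for t in range(6)]   # 3, 5, 7, 9, 11, 13
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

print(difference(linear))                # [2, 2, 2, 2, 2]
print(difference(quadratic, order=2))    # [2, 2, 2, 2]
```

Each round of differencing shortens the series by one observation, so a series differenced D times has D fewer observations than the original.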

For series with seasonal periodicity, seasonal differencing can address seasonal unit roots. For data with periodicity s (e.g., quarterly data have s = 4 and monthly data have s = 12), the seasonal differencing operator is defined as

$Δ_{s} y_{t} = (1 - L^{s}) y_{t} = y_{t} - y_{t - s} .$
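As a small illustration, the following Python sketch applies the seasonal differencing operator with s = 4 to made-up quarterly data. The helper name seasonal_difference and the data are hypothetical.

```python
def seasonal_difference(y, s):
    """Apply (1 - L^s) to y: subtract the value observed s periods earlier."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

# Hypothetical quarterly data: a fixed within-year pattern plus a level
# that rises by 1 each year (three years, 12 observations).
pattern = [10, 20, 15, 5]
y = [pattern[t % 4] + t // 4 for t in range(12)]

print(seasonal_difference(y, 4))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The repeating quarterly pattern cancels and only the year-over-year change remains. The first s observations are lost in the process.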

Using a differencing transformation eliminates the intermediate estimation steps required for detrending. However, this means you can’t obtain separate estimates of the trend and stochastic components.

Log Transformations

For a series with exponential growth and variance that grows with the level of the series, a log transformation can help linearize and stabilize the series. If your time series contains zero or negative values, add a constant large enough to make all observations greater than zero before taking the log transformation.

In some application areas, working with differenced, logged series is the norm. For example, the first differences of a logged time series,

$Δ \log y_{t} = \log y_{t} - \log y_{t - 1},$

are approximately the rates of change of the series.
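The approximation follows from the first-order expansion $\log (1 + x) ≈ x$ for small $x$. Writing the period-to-period rate of change as $x = (y_{t} - y_{t - 1}) / y_{t - 1}$,

$Δ \log y_{t} = \log \frac{y_{t}}{y_{t - 1}} = \log \left( 1 + \frac{y_{t} - y_{t - 1}}{y_{t - 1}} \right) ≈ \frac{y_{t} - y_{t - 1}}{y_{t - 1}} .$

The leading error term is $x^{2} / 2$, so the approximation is close when changes are small. For example, a 1% increase gives $\log 1.01 ≈ 0.00995$.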

Prices, Returns, and Compounding

The rates of change of a price series are called returns. Whereas a price series does not typically fluctuate around a constant level, the corresponding returns series often looks stationary. For this reason, many applications use returns series instead of price series.

Denote successive price observations made at times $t$ and $t + 1$ as $y_{t}$ and $y_{t + 1}$, respectively. The continuously compounded returns series is the transformed series

$r_{t} = \log \frac{y_{t + 1}}{y_{t}} = \log y_{t + 1} - \log y_{t} .$

This is the first difference of the log price series, and is sometimes called the log return.

An alternative transformation for price series is simple returns,

$r_{t} = \frac{y_{t + 1} - y_{t}}{y_{t}} = \frac{y_{t + 1}}{y_{t}} - 1.$

For series with relatively high frequency (e.g., daily or weekly observations), the difference between the two transformations is small. Econometrics Toolbox has price2ret for converting price series to returns series (with either continuous or simple compounding), and ret2price for the inverse operation.
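The two definitions can be compared numerically. The following Python sketch computes both kinds of returns from a short made-up price series and rebuilds the prices from the log returns. It is not an implementation of price2ret or ret2price; the function names and the prices are hypothetical.

```python
import math

def log_returns(prices):
    """Continuously compounded returns: log(y_{t+1} / y_t)."""
    return [math.log(nxt / cur) for cur, nxt in zip(prices, prices[1:])]

def simple_returns(prices):
    """Simple returns: y_{t+1} / y_t - 1."""
    return [nxt / cur - 1 for cur, nxt in zip(prices, prices[1:])]

def prices_from_log_returns(start_price, returns):
    """Invert log_returns, given the first price."""
    prices = [start_price]
    for r in returns:
        prices.append(prices[-1] * math.exp(r))
    return prices

# Hypothetical prices with small period-to-period changes.
prices = [100.0, 101.0, 99.5, 100.2]
r_log = log_returns(prices)
r_simple = simple_returns(prices)

# Absolute gap between the two definitions at each time step.
gaps = [abs(a - b) for a, b in zip(r_log, r_simple)]
rebuilt = prices_from_log_returns(prices[0], r_log)
print(r_log, r_simple, gaps, rebuilt, sep="\n")
```

With changes of about 1% per period, the gaps are on the order of the squared return, and the rebuilt prices match the originals up to rounding error. Note that the inverse operation needs the starting price, which the returns alone do not contain.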

References

[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.