
## Overview of Expected Shortfall Backtesting

Expected Shortfall (ES) is the expected loss on days when there is a Value-at-Risk (VaR) failure. If the VaR is 10 million and the ES is 12 million, then the expected loss tomorrow, if it happens to be a very bad day, is 20% higher than the VaR. ES is sometimes called Conditional Value-at-Risk (CVaR), Tail Value-at-Risk (TVaR), Tail Conditional Expectation (TCE), or Conditional Tail Expectation (CTE).

There are many approaches to estimating VaR and ES, and they may lead to different VaR and ES estimates. How can one determine if models are accurately estimating the risk on a daily basis? How can one evaluate which model performs better? The varbacktest tools help validate the performance of VaR models with regard to estimated VaR values. The esbacktest, esbacktestbysim, and esbacktestbyde tools extend these capabilities to evaluate VaR models with regard to estimated ES values.

For VaR backtesting, the possibilities every day are two: either there is a VaR failure or not. If the VaR confidence level is 95%, VaR failures should happen approximately 5% of the time. To backtest VaR, you only need to know whether the VaR was exceeded (VaR failure) or not on each day of the test window and the VaR confidence level. Risk Management Toolbox™ VaR backtesting tools support “frequency” (assess the proportion of failures) and “independence” (assess independence across time) tests, and these tests work with the binary sequence of "failure" or "no-failure" results over the test window.
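As a rough illustration of the binary sequence that VaR backtests work with, the following Python sketch builds the failure indicator and the observed failure rate. The function name `var_failures` and the data are made up for illustration and are not part of the toolbox; the sketch assumes losses are negative outcomes and VaR is reported as a positive number.

```python
def var_failures(outcomes, var_estimates):
    """Return the 0/1 VaR failure indicator for each period.

    outcomes      -- portfolio returns or P&L (losses are negative)
    var_estimates -- VaR estimates, reported as positive numbers
    """
    return [1 if x < -v else 0 for x, v in zip(outcomes, var_estimates)]


# Made-up data: five periods with a constant VaR of 2%.
outcomes = [0.004, -0.025, 0.011, -0.007, -0.031]
var_estimates = [0.02] * 5

failures = var_failures(outcomes, var_estimates)
failure_rate = sum(failures) / len(failures)

print(failures)      # binary sequence used by frequency and independence tests
print(failure_rate)  # compare against 1 - VaR confidence level
```

A frequency test compares `failure_rate` with the expected rate (5% for a 95% VaR), while an independence test looks at how the ones are spread through `failures`.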

For expected shortfall (ES), the possibilities every day are infinite: the VaR may be exceeded by 1%, or by 10%, or by 150%, and so on. For example, suppose there are three VaR failures in the test window.

On failure days, the VaR is exceeded on average by 39%, but the estimated ES exceeds VaR by an average of 27%. How can you tell if 39% is significantly larger than 27%? Knowing the VaR confidence level is not enough; you must also know how likely the different exceedances over the VaR are according to the VaR model. In other words, you need some distribution information about what happens beyond the VaR according to your model assumptions. For thin-tail VaR models, 39% vs. 27% may be a large difference. However, for a heavy-tail VaR model, where a severity of twice the VaR has a non-trivial probability of happening, 39% vs. 27% over the three failure dates may not be a red flag.

A key difference between VaR backtesting and ES backtesting is that most ES backtesting methods require information about the distribution of the returns on each day, or at least the distribution of the tails beyond the VaR. One exception is the “unconditional” test (see unconditionalNormal and unconditionalT), where you can get approximate test results without providing the distribution information. This is important in practice, because the “unconditional” test is much simpler to use and can be used in principle for any VaR or ES model. The trade-off is that the approximate results may be inaccurate, especially in borderline accept-or-reject cases, or for certain types of distributions.

The toolbox supports the following table-based tests for expected shortfall backtesting, based on the unconditional Acerbi-Szekely test, using the esbacktest object: unconditionalNormal and unconditionalT.

The toolbox supports the following Acerbi-Szekely simulation-based tests for expected shortfall backtesting using the esbacktestbysim object: conditional, unconditional, and quantile.

For the Acerbi-Szekely simulation-based tests, you must provide the model distribution information as part of the inputs to esbacktestbysim.

The toolbox also supports the following Du and Escanciano tests for expected shortfall backtesting using the esbacktestbyde object: unconditionalDE and conditionalDE.

For the Du and Escanciano simulation-based tests, you must provide the model distribution information as part of the inputs to esbacktestbyde.

### Conditional Test by Acerbi and Szekely

The conditional test statistic by Acerbi and Szekely is based on the conditional relationship

$E{S}_{t}=-{E}_{t}\left[{X}_{t}|{X}_{t}<-Va{R}_{t}\right]$

where

$X_t$ is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.

$VaR_t$ is the estimated VaR for period t.

$ES_t$ is the estimated expected shortfall for period t.

The number of failures is defined as

$NumFailures=\sum _{t=1}^{N}{I}_{t}$

where

N is the number of periods in the test window (t = 1,…,N).

$I_t$ is the VaR failure indicator on period t, with a value of 1 if $X_t < -VaR_t$, and 0 otherwise.

The conditional test statistic is defined as

${Z}_{cond}=\frac{1}{NumFailures}\sum _{t=1}^{N}\frac{{X}_{t}{I}_{t}}{E{S}_{t}}+1$

The conditional test has two parts. A VaR backtest must be run for the number of failures (NumFailures), and a standalone conditional test is performed for the conditional test statistic $Z_{cond}$. The conditional test accepts the model only when both the VaR test and the standalone conditional test accept the model. For more information, see conditional.
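The conditional statistic can be sketched in a few lines of Python. This is an illustration only, not the toolbox implementation: it assumes the Acerbi-Szekely form Z_cond = (1/NumFailures) Σ X_t I_t / ES_t + 1, and the function name `z_conditional` and the data are hypothetical.

```python
def z_conditional(outcomes, var_estimates, es_estimates):
    """Conditional test statistic, assuming
    Z_cond = (1 / NumFailures) * sum(X_t * I_t / ES_t) + 1.
    VaR and ES are positive numbers; losses are negative outcomes.
    """
    indicators = [1 if x < -v else 0 for x, v in zip(outcomes, var_estimates)]
    num_failures = sum(indicators)
    if num_failures == 0:
        raise ValueError("Z_cond is undefined when there are no VaR failures")
    total = sum(x * i / es
                for x, i, es in zip(outcomes, indicators, es_estimates))
    return total / num_failures + 1


# Made-up data: constant VaR of 2% and ES of 2.5% over five periods.
outcomes = [0.004, -0.025, 0.011, -0.007, -0.031]
z_cond = z_conditional(outcomes, [0.02] * 5, [0.025] * 5)
print(z_cond)
```

Under a correct model the statistic is expected to be close to zero. In this example the two failure-day losses average 2.8%, above the 2.5% ES, so the statistic is negative.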

### Unconditional Test by Acerbi and Szekely

The unconditional test statistic by Acerbi and Szekely is based on the unconditional relationship,

$E{S}_{t}=-{E}_{t}\left[\frac{{X}_{t}{I}_{t}}{{p}_{VaR}}\right]$

where

$X_t$ is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.

$p_{VaR}$ is the probability of VaR failure, defined as 1 - VaR level.

$ES_t$ is the estimated expected shortfall for period t.

$I_t$ is the VaR failure indicator on period t, with a value of 1 if $X_t < -VaR_t$, and 0 otherwise.

The unconditional test statistic is defined as

${Z}_{uncond}=\frac{1}{N{p}_{VaR}}\sum _{t=1}^{N}\frac{{X}_{t}{I}_{t}}{E{S}_{t}}+1$

The critical values for the unconditional test statistic are stable across a range of distributions, which is the basis for the table-based tests. The esbacktest class runs the unconditional test against precomputed critical values under two distributional assumptions, namely, normal distribution (thin tails, see unconditionalNormal), and t distribution with 3 degrees of freedom (heavy tails, see unconditionalT).
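For comparison with the conditional statistic, here is a minimal Python sketch of the unconditional statistic. It assumes the Acerbi-Szekely form Z_uncond = (1/(N p_VaR)) Σ X_t I_t / ES_t + 1; the function name `z_unconditional` and the data are hypothetical, and the sketch does not reproduce the precomputed critical-value tables.

```python
def z_unconditional(outcomes, var_estimates, es_estimates, var_level=0.95):
    """Unconditional test statistic, assuming
    Z_uncond = (1 / (N * p_VaR)) * sum(X_t * I_t / ES_t) + 1,
    where p_VaR = 1 - var_level.
    """
    n = len(outcomes)
    p_var = 1 - var_level
    total = sum(x / es
                for x, v, es in zip(outcomes, var_estimates, es_estimates)
                if x < -v)  # only failure periods contribute (I_t = 1)
    return total / (n * p_var) + 1


# Made-up data: 20 periods, one failure of -3% against an ES of 2.5%.
outcomes = [0.01] * 19 + [-0.03]
z_uncond = z_unconditional(outcomes, [0.02] * 20, [0.025] * 20)
print(z_uncond)
```

Unlike the conditional statistic, the normalization uses the expected number of failures N p_VaR rather than the observed number, so the statistic reacts both to too many failures and to failures that are too severe.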

### Quantile Test by Acerbi and Szekely

A sample estimator of the expected shortfall for a sample $Y_1,…,Y_N$ is:

$\stackrel{⌢}{ES}\left(Y\right)=-\frac{1}{⌊N{p}_{VaR}⌋}\sum _{i=1}^{⌊N{p}_{VaR}⌋}{Y}_{\left[i\right]}$

where

N is the number of periods in the test window (t = 1,…,N).

$p_{VaR}$ is the probability of VaR failure, defined as 1 - VaR level.

$Y_{[1]},…,Y_{[N]}$ are the sorted sample values (from smallest to largest), and $⌊N{p}_{VaR}⌋$ is the largest integer less than or equal to $N p_{VaR}$.
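The sample estimator averages the ⌊N p_VaR⌋ smallest values and flips the sign. A minimal Python sketch follows; the function name `sample_es` and the data are made up, and the small tolerance added before taking the floor is an implementation choice of this sketch to guard against floating-point round-off.

```python
import math


def sample_es(sample, p_var):
    """Sample ES estimator: minus the average of the floor(N * p_var)
    smallest values in the sample."""
    # The tolerance avoids cases such as 100 * 0.29 evaluating just below 29.
    k = math.floor(len(sample) * p_var + 1e-9)
    if k < 1:
        raise ValueError("sample too small for this p_var")
    ordered = sorted(sample)  # Y[1] <= Y[2] <= ... <= Y[N]
    return -sum(ordered[:k]) / k


# Made-up sample of 20 outcomes with p_VaR = 10%, so k = 2.
sample = [0.01] * 18 + [-0.03, -0.05]
es_hat = sample_es(sample, 0.10)
print(es_hat)
```

Here the two worst outcomes are -5% and -3%, so the estimator is their average loss.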

To compute the quantile test statistic, a sample of size N is created at each time t as follows. First, convert the portfolio outcomes $X_t$ to ranks ${U}_{1}={P}_{1}\left({X}_{1}\right),...,{U}_{N}={P}_{N}\left({X}_{N}\right)$ using the cumulative distribution function $P_t$. If the distribution assumptions are correct, the rank values $U_1,…,U_N$ are uniformly distributed in the interval (0,1). Then at each time t:

1. Invert the ranks U = (U1,…,UN) to get N quantiles ${P}_{t}^{-1}\left(U\right)=\left({P}_{t}^{-1}\left({U}_{1}\right),...,{P}_{t}^{-1}\left({U}_{N}\right)\right)$.

2. Compute the sample estimator $\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(U\right)\right)$.

3. Compute the expected value of the sample estimator $E\left[\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(V\right)\right)\right]$

where V = (V1,…,VN) is a sample of N independent uniform random variables in the interval (0,1). This can be computed analytically.

The quantile test statistic by Acerbi and Szekely is defined as

${Z}_{quantile}=-\frac{1}{N}\sum _{t=1}^{N}\frac{\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(U\right)\right)}{E\left[\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(V\right)\right)\right]}+1$

The denominator inside the sum can be computed analytically as

$E\left[\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(V\right)\right)\right]=-\frac{N}{⌊N{p}_{VaR}⌋}{\int }_{0}^{1}{I}_{1-p}\left(N-⌊N{p}_{VaR}⌋,⌊N{p}_{VaR}⌋\right){P}_{t}^{-1}\left(p\right)dp$

where $I_x(z,w)$ is the regularized incomplete beta function. For more information, see betainc and quantile.

### ES Backtest Using Du-Escanciano Method

For each day, the Du-Escanciano model assumes a distribution for the returns. For example, if you have a normal distribution with a conditional variance of 1.5%, there is a corresponding cumulative distribution function Pt. By mapping the returns Xt with the distribution Pt, you get the “mapped returns” series Ut, also known as the "ranks" series, which by construction has values between 0 and 1 (see column 2 in the following table). Let α be the complement of the VaR level — for example, if the VaR level is 95%, α is 5%. If the mapped return Ut is smaller than α, then there is a VaR “violation” or VaR “failure.” This is equivalent to observing a return Xt smaller than the negative of the VaR value for that day, since, by construction, the negative of the VaR value gets mapped to α. Therefore, you can compare Ut against α without even knowing the VaR value. The series of VaR failures is denoted by ht and it is a series of 0's and 1's stored in column 3 in the following table. Finally, column 4 in the following table contains the “cumulative violations” series, denoted by Ht. This is the severity of the mapped VaR violations on days on which the VaR is violated. For example, if the mapped return Ut is 1% and α is 5%, Ht is 4%. Ht is defined as zero if there are no VaR violations.

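| Xt | Ut = Pt(Xt) | ht = Ut < α | Ht = (α - Ut) * ht |
|---|---|---|---|
| 0.00208 | 0.5799 | 0 | 0 |
| -0.01073 | 0.1554 | 0 | 0 |
| -0.00825 | 0.2159 | 0 | 0 |
| -0.02967 | 0.0073 | 1 | 0.0427 |
| 0.01242 | 0.8745 | 0 | 0 |
| ... | ... | ... | ... |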
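The last two columns of the table follow mechanically from the ranks. The Python sketch below takes the ranks Ut from the table rows as given (it does not recompute them from the returns, since the per-day distributions Pt are not listed) and assumes α = 5%.

```python
alpha = 0.05  # complement of a 95% VaR level

# Ranks U_t copied from the table rows.
ranks = [0.5799, 0.1554, 0.2159, 0.0073, 0.8745]

# h_t: 1 when the rank falls below alpha (VaR violation), 0 otherwise.
violations = [1 if u < alpha else 0 for u in ranks]

# H_t: distance of the rank below alpha on violation days, 0 otherwise.
cumulative = [(alpha - u) * h for u, h in zip(ranks, violations)]

for u, h, big_h in zip(ranks, violations, cumulative):
    print(u, h, round(big_h, 4))
```

Only the fourth row has a rank below α, so it is the only row with a nonzero violation indicator and severity.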

Given the violations series ht and the cumulative violations series Ht, the Du-Escanciano (DE) tests are summarized as:

| Du-Escanciano Test | VaR Test | ES Test |
|---|---|---|
| Unconditional | Mean of ht | Mean of Ht |
| Conditional | Autocorrelation of ht | Autocorrelation of Ht |

The DE VaR tests assess the mean value and the autocorrelation of the ht series, and the resulting tests overlap with known VaR tests. For example, the mean of ht is expected to match α. In other words, the proportion of time the VaR is violated is expected to match the complement of the VaR confidence level. This test is supported in the varbacktest class with the proportion of failures (pof) test (finite sample) and the binomial (bin) test (large-sample approximation). In turn, the conditional VaR test measures if there is a time pattern in the sequence of VaR failures (back-to-back failures, and so on). The conditional coverage independence (cci) test in the varbacktest class tests for one-lag independence. The time between failures independence (tbfi) test in the varbacktest class also assesses time independence for VaR models.

The esbacktestbyde class supports the DE ES tests. The DE ES tests assess the mean value and the autocorrelation of the Ht series. For the unconditional test (unconditionalDE), the expected value is α/2 — for example, the average value in the bottom 5% of a uniform (0,1) distribution is 2.5%. The conditional test (conditionalDE) assesses not only if a failure occurs but also if the failure severity is correlated to previous failure occurrences and their severities.

The test statistic for the unconditional DE ES test is

${U}_{ES}=\frac{1}{N}{\sum }_{t=1}^{N}{H}_{t}$

If the number of observations is large, the test statistic is distributed as

${U}_{ES}\underset{dist}{\to }N\left(\frac{\alpha }{2},\frac{\alpha \left(1/3-\alpha /4\right)}{N}\right)={P}_{U}$

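where N(μ,σ2) is the normal distribution with mean μ and variance σ2. These moments hold when the cumulative violations are scaled by 1/α, that is, (α - Ut) * ht / α, which is the definition used by Du and Escanciano [3]; for the unscaled values (α - Ut) * ht, the mean is α²/2 rather than α/2.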

The unconditional DE ES test is a two-sided test that checks if the test statistic is close to the expected value of α/2. From the limiting distribution, a confidence interval is derived. Finite-sample confidence intervals are estimated through simulation.
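A large-sample version of the unconditional DE ES test can be sketched with the Python standard library. This is an illustration, not the unconditionalDE implementation: the function name `unconditional_de` is hypothetical, the finite-sample simulation is omitted, and the sketch assumes the cumulative violations are scaled by 1/α, which is the scaling under which the mean α/2 and variance α(1/3 - α/4)/N apply.

```python
import math
from statistics import NormalDist


def unconditional_de(ranks, alpha):
    """Return (U_ES, two-sided large-sample p-value).

    Assumes scaled cumulative violations H_t = (alpha - U_t) / alpha when
    U_t < alpha and 0 otherwise, so that E[H_t] = alpha / 2 under the model.
    """
    n = len(ranks)
    h_scaled = [(alpha - u) / alpha if u < alpha else 0.0 for u in ranks]
    u_es = sum(h_scaled) / n
    limit = NormalDist(mu=alpha / 2,
                       sigma=math.sqrt(alpha * (1 / 3 - alpha / 4) / n))
    cdf = limit.cdf(u_es)
    p_value = 2 * min(cdf, 1 - cdf)
    return u_es, p_value


# Made-up ranks: 200 evenly spaced values in (0, 1), an idealized uniform sample.
ranks = [(i + 0.5) / 200 for i in range(200)]
u_es, p_value = unconditional_de(ranks, alpha=0.05)
print(u_es, p_value)
```

With perfectly uniform ranks the statistic sits at α/2 = 2.5% and the p-value is close to 1, so the test would not reject. Ranks bunched near zero push the statistic up and the p-value down.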

The test statistic for the conditional DE ES test is derived in several steps. First, define the autocovariance for lag j:

${\gamma }_{j}=\frac{1}{N-j}{\sum }_{t=j+1}^{N}\left({H}_{t}-\alpha /2\right)\left({H}_{t-j}-\alpha /2\right)$

The autocorrelation for lag j is then

${\rho }_{j}=\frac{{\gamma }_{j}}{{\gamma }_{0}}$

The test statistic for m lags is then

${C}_{ES}\left(m\right)=N{\sum }_{j=1}^{m}{\rho }_{j}^{2}$

If the number of observations is large, the test statistic is distributed as a chi-square distribution with m degrees of freedom:

${C}_{ES}\left(m\right)\underset{dist}{\to }{\chi }_{m}^{2}$

The conditional DE ES test is a one-sided test to determine if the conditional DE ES test statistic is much larger than zero. If so, there is evidence of autocorrelation. Large-sample critical values are computed from the limiting distribution. Finite-sample critical values are estimated through simulation.
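The three steps (autocovariance, autocorrelation, sum of squared autocorrelations) translate directly into code. The Python sketch below is illustrative only; the function name `conditional_de` and the two series are made up, and the series are treated as cumulative violations already on the scale for which α/2 is the expected value.

```python
def conditional_de(h, alpha, num_lags=1):
    """C_ES(m) = N * sum of squared lag-j autocorrelations, j = 1..m,
    with the series centered at alpha / 2 as in the formulas above."""
    n = len(h)
    centered = [x - alpha / 2 for x in h]

    def gamma(j):
        # Lag-j autocovariance: average over the N - j available pairs.
        return sum(centered[t] * centered[t - j] for t in range(j, n)) / (n - j)

    gamma0 = gamma(0)
    if gamma0 == 0:
        raise ValueError("series has zero variance around alpha / 2")
    return n * sum((gamma(j) / gamma0) ** 2 for j in range(1, num_lags + 1))


alpha = 0.05
# Made-up series of length 100 with five violations of equal severity.
spread = [0.5 if t in (10, 30, 50, 70, 90) else 0.0 for t in range(100)]
clustered = [0.5 if 50 <= t <= 54 else 0.0 for t in range(100)]

spread_stat = conditional_de(spread, alpha)
clustered_stat = conditional_de(clustered, alpha)
print(spread_stat, clustered_stat)
```

For one lag, the large-sample 95% critical value of a chi-square distribution with 1 degree of freedom is about 3.841. The spread-out violations stay well below it, while the back-to-back violations produce a statistic far above it.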

### Comparison of ES Backtesting Methods

The backtesting tools supported by Risk Management Toolbox have the following requirements and features.

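| Backtesting Tool | PortfolioData Required | VaRData Required | ESData Required | VaRLevel Required[a] | PortfolioID and VaRID Supported | Distribution Information Required | Supports Multiple Models[b] | Supports Multiple VaRLevels |
|---|---|---|---|---|---|---|---|---|
| varbacktest | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| esbacktest | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| esbacktestbysim | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| esbacktestbyde | Yes | No | No | Yes | Yes | Yes | No | Yes |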

[a] VaRLevel is an optional name-value pair argument with a default value of 95%. It is recommended to set the VaRLevel when creating the backtesting object.

[b] For example, you can backtest a normal and a t model in the same object with varbacktest, but you need two separate instances of the esbacktestbyde class to backtest them.

Risk Management Toolbox supports the following backtesting tools and their associated tests.

| Test Type | Test Name | Tests for | Risk Measure | Critical Value Computation | Use Object | Use Function |
|---|---|---|---|---|---|---|
| Basel | Traffic light | Frequency | VaR | Exact finite-sample (binomial) | varbacktest | tl |
| Various | Binomial | Frequency | VaR | Large-sample normal approximation | varbacktest | bin |
| Kupiec | Proportion of failures | Frequency | VaR | Exact finite-sample (log likelihood) | varbacktest | pof |
| Kupiec | Time until first failure | Independence | VaR | Exact finite-sample (log likelihood) | varbacktest | tuff |
| Christoffersen | Conditional coverage, mixed | Frequency and independence | VaR | Exact finite-sample (log likelihood) | varbacktest | cc |
| Christoffersen | Conditional coverage, independence | Independence | VaR | Exact finite-sample (log likelihood) | varbacktest | cci |
| Haas | Mixed Kupiec test | Frequency and independence | VaR | Exact finite-sample (log likelihood) | varbacktest | tbf |
| Haas | Independence (time between failures) | Independence | VaR | Exact finite-sample (log likelihood) | varbacktest | tbfi |
| Acerbi-Szekely | "Test 2" or unconditional | Severity | ES | Tables of presimulated critical values, under normal and t distribution | esbacktest | unconditionalNormal, unconditionalT |
| Acerbi-Szekely | "Test 1" or conditional | Severity | ES | Finite-sample simulation | esbacktestbysim | conditional |
| Acerbi-Szekely | "Test 2" or unconditional | Severity | ES | Finite-sample simulation | esbacktestbysim | unconditional |
| Acerbi-Szekely | "Test 3" or ranks (quantile) | Severity | ES | Finite-sample simulation | esbacktestbysim | quantile |
| Du-Escanciano | Unconditional | Severity | ES | Large-sample approximation and finite-sample simulation | esbacktestbyde | unconditionalDE |
| Du-Escanciano | Conditional | Independence | ES | Large-sample approximation and finite-sample simulation | esbacktestbyde | conditionalDE |

## References

[1] Basel Committee on Banking Supervision. Supervisory Framework for the Use of “Backtesting” in Conjunction with the Internal Models Approach to Market Risk Capital Requirements. January 1996. https://www.bis.org/publ/bcbs22.htm.

[2] Acerbi, C., and B. Szekely. Backtesting Expected Shortfall. MSCI Inc. December 2014.

[3] Du, Z., and J. C. Escanciano. "Backtesting Expected Shortfall: Accounting for Tail Risk." Management Science. Vol. 63, Issue 4, April 2017.