## Overview of Expected Shortfall Backtesting

Expected Shortfall (ES) is the expected loss on days when there is a Value-at-Risk (VaR) failure. If the VaR is 10 million and the ES is 12 million, then the expected loss tomorrow, if it happens to be a very bad day, is 20% higher than the VaR. ES is sometimes called Conditional Value-at-Risk (CVaR), Tail Value-at-Risk (TVaR), Tail Conditional Expectation (TCE), or Conditional Tail Expectation (CTE).

There are many approaches to estimating VaR and ES, and they may lead to different VaR and ES estimates. How can one determine if models are accurately estimating the risk on a daily basis? How can one evaluate which model performs better? The `varbacktest` tools help validate the performance of VaR models with regard to estimated VaR values. The `esbacktest`, `esbacktestbysim`, and `esbacktestbyde` tools extend these capabilities to evaluate VaR models with regard to estimated ES values.

For VaR backtesting, the possibilities every day are two: either there is a VaR failure or not. If the VaR confidence level is 95%, VaR failures should happen approximately 5% of the time. To backtest VaR, you only need to know whether the VaR was exceeded (VaR failure) or not on each day of the test window and the VaR confidence level. Risk Management Toolbox™ VaR backtesting tools support “frequency” (assess the proportion of failures) and “independence” (assess independence across time) tests, and these tests work with the binary sequence of "failure" or "no-failure" results over the test window.
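The binary sequence that these VaR tests consume can be sketched in a few lines. The following is an illustrative Python sketch, not the toolbox implementation; the function name `var_failures` and the data are hypothetical, and VaR is assumed to be reported as a positive number so that a failure means the outcome fell below the negative of the VaR.

```python
def var_failures(outcomes, var_values):
    """Return the 0/1 VaR failure indicator for each period.

    Assumes VaR is reported as a positive number, so a failure is an
    outcome below -VaR.
    """
    return [1 if x < -v else 0 for x, v in zip(outcomes, var_values)]


# Hypothetical portfolio returns and a constant VaR of 2%.
outcomes = [0.004, -0.021, 0.010, -0.003, -0.035,
            0.007, 0.001, -0.012, 0.015, -0.002]
var_values = [0.020] * len(outcomes)

failures = var_failures(outcomes, var_values)
failure_rate = sum(failures) / len(failures)
print(failures, failure_rate)
```

With this made-up data, failures occur in the second and fifth periods, so the observed failure rate is 0.2. A frequency test would compare that rate with one minus the VaR confidence level.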

For expected shortfall (ES), the possibilities every day are infinite: The VaR may be exceeded by 1%, or by 10%, or by 150%, and so on. For example, suppose that there are three VaR failures in the test window.

On failure days, the VaR is exceeded on average by 39%, but the estimated ES exceeds VaR by an average of 27%. How can you tell if 39% is significantly larger than 27%? Knowing the VaR confidence level is not enough; you must also know how likely the different exceedances over the VaR are according to the VaR model. In other words, you need some distribution information about what happens beyond the VaR according to your model assumptions. For thin-tail VaR models, 39% vs. 27% may be a large difference. However, for a heavy-tail VaR model, where a severity of twice the VaR has a non-trivial probability of happening, 39% vs. 27% over the three failure dates may not be a red flag.

A key difference between VaR backtesting and ES backtesting is that most ES backtesting methods require information about the distribution of the returns on each day, or at least the distribution of the tails beyond the VaR. One exception is the “unconditional” test (see `unconditionalNormal` and `unconditionalT`) where you can get approximate test results without providing the distribution information. This is important in practice, because the “unconditional” test is much simpler to use and can be used in principle for any VaR or ES model. The trade-off is that the approximate results may be inaccurate, especially in borderline accept or reject cases, or for certain types of distributions.

The toolbox supports the following table-based tests for expected shortfall backtesting, which apply the unconditional Acerbi-Szekely test using the `esbacktest` object: `unconditionalNormal` and `unconditionalT`.

ES backtests are necessarily approximated in that they are sensitive to errors in the predicted VaR. However, the minimally biased test has only a small sensitivity to VaR errors and the sensitivity is prudential, in the sense that VaR errors lead to a more punitive ES test. See Acerbi-Szekely (2017 and 2019) for details. When distribution information is available, the minimally biased test (`minBiasRelative` or `minBiasAbsolute`) is recommended.

The toolbox supports the following Acerbi-Szekely simulation-based tests for expected shortfall backtesting using the `esbacktestbysim` object: `conditional`, `unconditional`, `quantile`, `minBiasRelative`, and `minBiasAbsolute`.

For the Acerbi-Szekely simulation-based tests, you must provide the model distribution information as part of the inputs to `esbacktestbysim`.

The toolbox also supports the following Du and Escanciano tests for expected shortfall backtesting using the `esbacktestbyde` object: `unconditionalDE` and `conditionalDE`.

For the Du and Escanciano simulation-based tests, you must provide the model distribution information as part of the inputs to `esbacktestbyde`.

### Conditional Test by Acerbi and Szekely

The conditional test statistic by Acerbi and Szekely is based on the conditional relationship

`$E{S}_{t}=-{E}_{t}\left[{X}_{t}|{X}_{t}<-Va{R}_{t}\right]$`

where

$X_t$ is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period $t$.

$VaR_t$ is the estimated VaR for period $t$.

$ES_t$ is the estimated expected shortfall for period $t$.

The number of failures is defined as

`$NumFailures=\sum _{t=1}^{N}{I}_{t}$`

where

$N$ is the number of periods in the test window ($t = 1,\dots,N$).

$I_t$ is the VaR failure indicator on period $t$, with a value of 1 if $X_t < -VaR_t$, and 0 otherwise.

The conditional test statistic is defined as

`${Z}_{cond}=\frac{1}{NumFailures}\sum _{t=1}^{N}\frac{{X}_{t}{I}_{t}}{E{S}_{t}}+1$`

The conditional test has two parts. A VaR backtest must be run for the number of failures (`NumFailures`), and a standalone conditional test is performed for the conditional test statistic $Z_{cond}$. The conditional test accepts the model only when both the VaR test and the standalone conditional test accept the model. For more information, see `conditional`.
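As a rough illustration of the statistic itself (not of the simulation that produces its critical values), the following Python sketch evaluates $Z_{cond}$ directly from the formula above. The function name `z_conditional` and the data are hypothetical, and VaR and ES are assumed to be reported as positive numbers.

```python
def z_conditional(outcomes, var_values, es_values):
    """Evaluate the conditional statistic Z_cond and the failure count.

    Assumes VaR and ES are positive numbers and that a failure is an
    outcome below -VaR.
    """
    failures = [1 if x < -v else 0 for x, v in zip(outcomes, var_values)]
    num_failures = sum(failures)
    if num_failures == 0:
        raise ValueError("Z_cond is undefined when there are no VaR failures")
    total = sum(x * i / es for x, i, es in zip(outcomes, failures, es_values))
    return total / num_failures + 1, num_failures


# Hypothetical data: two failures, with losses of 3% and 5%.
outcomes = [0.01, -0.03, 0.00, -0.05, 0.02]
var_values = [0.020] * 5
es_values = [0.025] * 5

z_cond, num_failures = z_conditional(outcomes, var_values, es_values)
print(z_cond, num_failures)
```

Here the average loss on failure days (4%) is larger than the estimated ES (2.5%), so the statistic works out to about -0.6. Values near zero are what a well-specified ES model is expected to produce; how negative is too negative depends on the simulated distribution of the statistic.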

### Unconditional Test by Acerbi and Szekely

The unconditional test statistic by Acerbi and Szekely is based on the unconditional relationship,

`$E{S}_{t}=-{E}_{t}\left[\frac{{X}_{t}{I}_{t}}{{p}_{VaR}}\right]$`

where

$X_t$ is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period $t$.

$p_{VaR}$ is the probability of VaR failure, defined as 1 − VaR level.

$ES_t$ is the estimated expected shortfall for period $t$.

$I_t$ is the VaR failure indicator on period $t$, with a value of 1 if $X_t < -VaR_t$, and 0 otherwise.

The unconditional test statistic is defined as

`${Z}_{uncond}=\frac{1}{N{p}_{VaR}}\sum _{t=1}^{N}\frac{{X}_{t}{I}_{t}}{E{S}_{t}}+1$`

The critical values for the unconditional test statistic are stable across a range of distributions, which is the basis for the table-based tests. The `esbacktest` class runs the unconditional test against precomputed critical values under two distributional assumptions, namely, normal distribution (thin tails, see `unconditionalNormal`), and t distribution with 3 degrees of freedom (heavy tails, see `unconditionalT`).
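The unconditional statistic differs from the conditional one only in its normalization: it divides by the expected number of failures $N p_{VaR}$ rather than the observed number. A minimal Python sketch, with a hypothetical function name and made-up data, and with VaR and ES assumed positive:

```python
def z_unconditional(outcomes, var_values, es_values, var_level):
    """Evaluate the unconditional statistic Z_uncond.

    Assumes VaR and ES are positive numbers; p_var = 1 - var_level.
    """
    p_var = 1 - var_level
    n = len(outcomes)
    # Only failure periods (outcome below -VaR) contribute to the sum.
    total = sum(x / es
                for x, v, es in zip(outcomes, var_values, es_values)
                if x < -v)
    return total / (n * p_var) + 1


# Hypothetical window of 20 periods at a 90% VaR level (2 failures expected).
outcomes = [0.01] * 20
outcomes[4] = -0.03
outcomes[13] = -0.05
var_values = [0.020] * 20
es_values = [0.025] * 20

z_uncond = z_unconditional(outcomes, var_values, es_values, var_level=0.90)
print(z_uncond)
```

In this example the observed number of failures equals the expected number, so the result (about -0.6) coincides with what the conditional formula gives for the same two losses. With more failures than expected, the unconditional statistic becomes more negative even if each individual loss is unchanged.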

### Quantile Test by Acerbi and Szekely

A sample estimator of the expected shortfall for a sample $Y_1,\dots,Y_N$ is:

`$\stackrel{⌢}{ES}\left(Y\right)=-\frac{1}{⌊N{p}_{VaR}⌋}\sum _{i=1}^{⌊N{p}_{VaR}⌋}{Y}_{\left[i\right]}$`

where

$N$ is the number of periods in the test window ($t = 1,\dots,N$).

$p_{VaR}$ is the probability of VaR failure, defined as 1 − VaR level.

$Y_{[1]},\dots,Y_{[N]}$ are the sorted sample values (from smallest to largest), and $⌊N{p}_{VaR}⌋$ is the largest integer less than or equal to $N p_{VaR}$.
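The sample estimator is simply the negated average of the $⌊N{p}_{VaR}⌋$ smallest values. A short Python sketch follows; the function name `sample_es` and the data are hypothetical, and the small tolerance added before taking the floor is an implementation choice to guard against floating-point round-off in $N p_{VaR}$.

```python
import math


def sample_es(sample, p_var):
    """Negated mean of the floor(N * p_var) smallest sample values."""
    n = len(sample)
    # The tolerance protects against products such as 1.9999999999999996.
    k = math.floor(n * p_var + 1e-9)
    if k < 1:
        raise ValueError("sample too small for this p_var")
    tail = sorted(sample)[:k]
    return -sum(tail) / k


# Hypothetical sample of 10 outcomes with p_var = 0.2, so k = 2.
sample = [0.02, -0.04, 0.01, -0.01, 0.03, -0.06, 0.00, 0.015, -0.02, 0.005]
es_hat = sample_es(sample, p_var=0.2)
print(es_hat)
```

The two smallest values are -0.06 and -0.04, so the estimate is about 0.05.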

To compute the quantile test statistic, a sample of size $N$ is created at each time $t$ as follows. First, convert the portfolio outcomes $X_t$ to ranks ${U}_{1}={P}_{1}\left({X}_{1}\right),...,{U}_{N}={P}_{N}\left({X}_{N}\right)$ using the cumulative distribution function $P_t$. If the distribution assumptions are correct, the rank values $U_1,\dots,U_N$ are uniformly distributed in the interval (0,1). Then at each time $t$:

1. Invert the ranks $U = (U_1,\dots,U_N)$ to get $N$ quantiles ${P}_{t}^{-1}\left(U\right)=\left({P}_{t}^{-1}\left({U}_{1}\right),...,{P}_{t}^{-1}\left({U}_{N}\right)\right)$.

2. Compute the sample estimator $\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(U\right)\right)$.

3. Compute the expected value of the sample estimator $E\left[\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(V\right)\right)\right]$

where $V = (V_1,\dots,V_N)$ is a sample of $N$ independent uniform random variables in the interval (0,1). This can be computed analytically.

The quantile test statistic by Acerbi and Szekely is defined as

`${Z}_{quantile}=-\frac{1}{N}\sum _{t=1}^{N}\frac{\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(U\right)\right)}{E\left[\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(V\right)\right)\right]}+1$`

The denominator inside the sum can be computed analytically as

`$E\left[\stackrel{⌢}{ES}\left({P}_{t}^{-1}\left(V\right)\right)\right]=-\frac{N}{⌊N{p}_{VaR}⌋}{\int }_{0}^{1}{I}_{1-p}\left(N-⌊N{p}_{VaR}⌋,⌊N{p}_{VaR}⌋\right){P}_{t}^{-1}\left(p\right)dp$`

where $I_x(z,w)$ is the regularized incomplete beta function. For more information, see `betainc` and `quantile`.
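The analytic denominator can be checked numerically. The sketch below is a rough Python illustration, not the toolbox's method: it evaluates the integral with a midpoint rule and, because both arguments of the incomplete beta function are integers here, it uses the binomial-tail identity $I_x(a,b) = P(\mathrm{Bin}(a+b-1, x) \ge a)$ instead of a library routine. The function names, the step count, and the example distribution are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def reg_inc_beta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b,
    via the identity I_x(a, b) = P(Bin(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(a, n + 1))


def expected_sample_es(inv_cdf, n, k, steps=5000):
    """Midpoint-rule approximation of
    -(N/k) * integral_0^1 I_{1-p}(N - k, k) * P^{-1}(p) dp,
    where k = floor(N * p_VaR) and inv_cdf is P^{-1}."""
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        p = (s + 0.5) * h  # midpoints avoid the endpoints p = 0 and p = 1
        total += reg_inc_beta_int(1 - p, n - k, k) * inv_cdf(p)
    return -(n / k) * total * h


# Sanity check with a uniform(0, 1) model, where P^{-1}(p) = p and the
# i-th order statistic has mean i / (N + 1).
uniform_value = expected_sample_es(lambda p: p, n=10, k=2)

# Hypothetical normal model with zero mean and 1% standard deviation.
normal_value = expected_sample_es(NormalDist(0.0, 0.01).inv_cdf, n=250, k=12)
print(uniform_value, normal_value)
```

For the uniform check, the expected value of the estimator is $-(k+1)/(2(N+1))$, which is $-3/22$ for $N = 10$ and $k = 2$, and the numerical result should agree closely. For heavy-tailed models the integrand grows quickly near $p = 0$, so a plain midpoint rule may need many more steps or a better quadrature.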

### Minimally Biased Test by Acerbi and Szekely

The minimally biased test statistic by Acerbi and Szekely is based on the following representation of the VaR and ES (see Acerbi and Szekely 2017 and 2019 for details and also Rockafellar and Uryasev 2002, and Acerbi and Tasche 2002):

`$\begin{array}{l}E{S}_{\alpha }={\mathrm{min}}_{v}E\left[v+\frac{1}{\alpha }{\left(X+v\right)}_{-}\right]\\ Va{R}_{\alpha }=\mathrm{arg}{\mathrm{min}}_{v}E\left[v+\frac{1}{\alpha }{\left(X+v\right)}_{-}\right]\end{array}$`

where

$X$ is the portfolio outcome.

$(x)_{-}$ is the negative part function, defined as $(x)_{-} = \max(0,-x)$.

$\alpha$ is 1 − VaR level.

The test statistic has an absolute version and a relative version. The absolute version of the minimally biased test statistic is given by

`${Z}_{minbias}^{abs}=\frac{1}{N}\sum _{t=1}^{N}\left(E{S}_{t}-Va{R}_{t}-\frac{1}{{p}_{VaR}}{\left({X}_{t}+Va{R}_{t}\right)}_{-}\right)$`

where

$X_t$ is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period $t$.

$VaR_t$ is the estimated VaR for period $t$.

$ES_t$ is the estimated expected shortfall for period $t$.

$p_{VaR}$ is the probability of VaR failure, defined as 1 − VaR level.

$N$ is the number of periods in the test window ($t = 1,\dots,N$).

$(x)_{-}$ is the negative part function, defined as $(x)_{-} = \max(0,-x)$.

The relative version of the minimally biased test statistic is given by

`${Z}_{minbias}^{rel}=\frac{1}{N}\sum _{t=1}^{N}\frac{1}{E{S}_{t}}\left(E{S}_{t}-Va{R}_{t}-\frac{1}{{p}_{VaR}}{\left({X}_{t}+Va{R}_{t}\right)}_{-}\right)$`

ES backtests are necessarily approximated in that they are sensitive to errors in the predicted VaR. However, the minimally biased test has only a small sensitivity to VaR errors and the sensitivity is prudential, in the sense that VaR errors lead to a more punitive ES test. See Acerbi-Szekely (2017 and 2019) for details. When distribution information is available, the minimally biased test is recommended. For more information, see `minBiasRelative` and `minBiasAbsolute`.
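Both versions of the minimally biased statistic share the same per-period term, so they can be evaluated together. The following Python sketch is illustrative only; the function names and data are hypothetical, VaR and ES are assumed positive, and the critical values (which the toolbox obtains by simulation) are not computed here.

```python
def neg_part(x):
    """Negative part function: (x)_- = max(0, -x)."""
    return max(0.0, -x)


def z_minbias(outcomes, var_values, es_values, var_level):
    """Return (absolute, relative) minimally biased statistics.

    Assumes VaR and ES are positive numbers; p_var = 1 - var_level.
    """
    p_var = 1 - var_level
    n = len(outcomes)
    terms = [es - v - neg_part(x + v) / p_var
             for x, v, es in zip(outcomes, var_values, es_values)]
    z_abs = sum(terms) / n
    z_rel = sum(term / es for term, es in zip(terms, es_values)) / n
    return z_abs, z_rel


# Hypothetical window of 20 periods at a 90% VaR level with two failures.
outcomes = [0.01] * 20
outcomes[4] = -0.03
outcomes[13] = -0.05
var_values = [0.020] * 20
es_values = [0.025] * 20

z_abs, z_rel = z_minbias(outcomes, var_values, es_values, var_level=0.90)
print(z_abs, z_rel)
```

With this data the absolute statistic is about -0.015 (in the units of the outcomes) and the relative statistic is about -0.6. The relative version scales each term by $ES_t$, which makes results comparable across periods or portfolios with very different risk levels.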

### ES Backtest Using Du-Escanciano Method

For each day, the Du-Escanciano model assumes a distribution for the returns. For example, if you have a normal distribution with a conditional variance of 1.5%, there is a corresponding cumulative distribution function $P_t$. By mapping the returns $X_t$ with the distribution $P_t$, you get the “mapped returns” series $U_t$, also known as the "ranks" series, which by construction has values between 0 and 1 (see column 2 in the following table).

Let $\alpha$ be the complement of the VaR level; for example, if the VaR level is 95%, $\alpha$ is 5%. If the mapped return $U_t$ is smaller than $\alpha$, then there is a VaR “violation” or VaR “failure.” This is equivalent to observing a return $X_t$ smaller than the negative of the VaR value for that day, since, by construction, the negative of the VaR value gets mapped to $\alpha$. Therefore, you can compare $U_t$ against $\alpha$ without even knowing the VaR value. The series of VaR failures is denoted by $h_t$; it is a series of 0s and 1s, stored in column 3 of the following table.

Finally, column 4 of the following table contains the “cumulative violations” series, denoted by $H_t$. This is the severity of the mapped VaR violations on days on which the VaR is violated. For example, if the mapped return $U_t$ is 1% and $\alpha$ is 5%, $H_t$ is 4%. $H_t$ is defined as zero on days with no VaR violation.

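| $X_t$ | $U_t = P_t(X_t)$ | $h_t = U_t < \alpha$ | $H_t = (\alpha - U_t) \cdot h_t$ |
|---|---|---|---|
| 0.00208 | 0.5799 | 0 | 0 |
| -0.01073 | 0.1554 | 0 | 0 |
| -0.00825 | 0.2159 | 0 | 0 |
| -0.02967 | 0.0073 | 1 | 0.0427 |
| 0.01242 | 0.8745 | 0 | 0 |
| ... | ... | ... | ... |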
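The columns of the table can be reproduced with a few lines of Python. This is an illustrative sketch: the function name `violation_series` is hypothetical, the ranks are the five values shown in column 2, and the normal model used for the mapping check (zero mean, 1.2% standard deviation) is an assumption chosen only to show that the negative of the VaR maps to $\alpha$.

```python
from statistics import NormalDist


def violation_series(ranks, alpha):
    """Return (h, H): the 0/1 violation series and the severity series
    H_t = alpha - U_t on violation days, 0 otherwise."""
    h = [1 if u < alpha else 0 for u in ranks]
    H = [alpha - u if u < alpha else 0.0 for u in ranks]
    return h, H


alpha = 0.05
ranks = [0.5799, 0.1554, 0.2159, 0.0073, 0.8745]  # column 2 of the table
h, H = violation_series(ranks, alpha)
print(h, H)

# Mapping check under a hypothetical normal model for one day:
# the negative of the VaR is mapped to alpha by the model CDF.
model = NormalDist(mu=0.0, sigma=0.012)
var_value = -model.inv_cdf(alpha)       # VaR as a positive number
mapped_var = model.cdf(-var_value)      # should be (numerically) alpha
print(var_value, mapped_var)
```

Only the fourth day is a violation, with a severity of about 0.0427, matching columns 3 and 4. The mapping check illustrates why comparing $U_t$ with $\alpha$ is equivalent to comparing $X_t$ with the negative of the VaR.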

Given the violations series $h_t$ and the cumulative violations series $H_t$, the Du-Escanciano (DE) tests are summarized as:

| Du-Escanciano Test | VaR Test | ES Test |
|---|---|---|
| Unconditional | Mean of $h_t$ | Mean of $H_t$ |
| Conditional | Autocorrelation of $h_t$ | Autocorrelation of $H_t$ |

The DE VaR tests assess the mean value and the autocorrelation of the $h_t$ series, and the resulting tests overlap with known VaR tests. For example, the mean of $h_t$ is expected to match $\alpha$. In other words, the proportion of time the VaR is violated is expected to match one minus the VaR confidence level. This test is supported in the `varbacktest` class with the proportion of failures (`pof`) test (finite sample) and the binomial (`bin`) test (large-sample approximation). In turn, the conditional VaR test measures if there is a time pattern in the sequence of VaR failures (back-to-back failures, and so on). The conditional coverage independence (`cci`) test in the `varbacktest` class tests for one-lag independence. The time between failures independence (`tbfi`) test in the `varbacktest` class also assesses time independence for VaR models.

The `esbacktestbyde` class supports the DE ES tests. The DE ES tests assess the mean value and the autocorrelation of the $H_t$ series. For the unconditional test (`unconditionalDE`), the expected value is $\alpha/2$; for example, the average value in the bottom 5% of a uniform (0,1) distribution is 2.5%. The conditional test (`conditionalDE`) assesses not only if a failure occurs but also if the failure severity is correlated to previous failure occurrences and their severities.

The test statistic for the unconditional DE ES test is

`${U}_{ES}=\frac{1}{N}{\sum }_{t=1}^{N}{H}_{t}$`

If the number of observations is large, the test statistic is distributed as

`${U}_{ES}\underset{dist}{\to }N\left(\frac{\alpha }{2},\frac{\alpha \left(1/3-\alpha /4\right)}{N}\right)={P}_{U}$`

where $N(\mu,\sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$.

The unconditional DE ES test is a two-sided test that checks if the test statistic is close to the expected value of $\alpha/2$. From the limiting distribution, a confidence interval is derived. Finite-sample confidence intervals are estimated through simulation.
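The large-sample version of this test can be sketched in Python as follows. This is an illustration under stated assumptions, not the `unconditionalDE` implementation: the function name is hypothetical, and the severities are scaled by $1/\alpha$, that is, $H_t = (\alpha - U_t)/\alpha$ on violation days, which is the scaling in Du and Escanciano's paper under which the mean $\alpha/2$ and the variance $\alpha(1/3 - \alpha/4)/N$ hold. The finite-sample (simulated) version of the test is not covered.

```python
import math
from statistics import NormalDist


def unconditional_de_large_sample(ranks, alpha):
    """Return (U_ES, two-sided p-value) from the normal approximation.

    Assumption: H_t = (alpha - U_t) / alpha on violation days, 0 otherwise.
    """
    n = len(ranks)
    H = [(alpha - u) / alpha if u < alpha else 0.0 for u in ranks]
    u_es = sum(H) / n
    std = math.sqrt(alpha * (1 / 3 - alpha / 4) / n)
    z = (u_es - alpha / 2) / std
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_es, p_value


# Hypothetical ranks on an evenly spaced grid in (0, 1), which mimics
# ranks from a correctly specified model.
n = 1000
ranks = [(i - 0.5) / n for i in range(1, n + 1)]
u_es, p_value = unconditional_de_large_sample(ranks, alpha=0.05)
print(u_es, p_value)
```

For the evenly spaced ranks the statistic lands essentially on $\alpha/2 = 0.025$, so the p-value is close to 1 and the model would not be rejected. Ranks that pile up near zero push the statistic above $\alpha/2$ and shrink the p-value.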

The test statistic for the conditional DE ES test is derived in several steps. First, define the autocovariance for lag j:

`${\gamma }_{j}=\frac{1}{N-j}{\sum }_{t=j+1}^{N}\left({H}_{t}-\alpha /2\right)\left({H}_{t-j}-\alpha /2\right)$`

The autocorrelation for lag j is then

`${\rho }_{j}=\frac{{\gamma }_{j}}{{\gamma }_{0}}$`

The test statistic for m lags is then

`${C}_{ES}\left(m\right)=N{\sum }_{j=1}^{m}{\rho }_{j}^{2}$`

If the number of observations is large, the test statistic is distributed as a chi-square distribution with m degrees of freedom:

`${C}_{ES}\left(m\right)\underset{dist}{\to }{\chi }_{m}^{2}$`

The conditional DE ES test is a one-sided test to determine if the conditional DE ES test statistic is much larger than zero. If so, there is evidence of autocorrelation. Large-sample critical values are computed from the limiting distribution. Finite-sample critical values are estimated through simulation.
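The three steps above (autocovariance, autocorrelation, and the sum over $m$ lags) translate directly into code. The Python sketch below computes only the statistic; the function name and the severity series are hypothetical, the input is assumed to be on the scale for which $\alpha/2$ is the expected value, and obtaining a p-value would additionally require a chi-square distribution function or the simulation mentioned above.

```python
def conditional_de_statistic(H, alpha, m):
    """Return (C_ES(m), [rho_1, ..., rho_m]) for a severity series H."""
    n = len(H)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < len(H)")
    centered = [value - alpha / 2 for value in H]

    def autocov(j):
        # gamma_j = 1/(N - j) * sum_{t=j+1}^{N} (H_t - a/2)(H_{t-j} - a/2)
        return sum(centered[t] * centered[t - j] for t in range(j, n)) / (n - j)

    gamma0 = autocov(0)
    if gamma0 == 0:
        raise ValueError("gamma_0 is zero; autocorrelations are undefined")
    rhos = [autocov(j) / gamma0 for j in range(1, m + 1)]
    return n * sum(rho * rho for rho in rhos), rhos


# Hypothetical severity series with two back-to-back violations.
H = [0.0] * 20
H[5], H[6] = 0.6, 0.8
c_es, rhos = conditional_de_statistic(H, alpha=0.05, m=2)
print(c_es, rhos)
```

Clustered violations such as the back-to-back pair above raise the lag-1 autocorrelation and therefore the statistic, which is the pattern this test is designed to flag.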

### Comparison of ES Backtesting Methods

The backtesting tools supported by Risk Management Toolbox have the following requirements and features.

| Backtesting Tool | `PortfolioData` Required | `VaRData` Required | `ESData` Required | `VaRLevel` Required (a) | `PortfolioID` and `VaRID` Supported | `Distribution` Information Required | Supports Multiple Models (b) | Supports Multiple `VaRLevel`s |
|---|---|---|---|---|---|---|---|---|
| `varbacktest` | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| `esbacktest` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| `esbacktestbysim` | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| `esbacktestbyde` | Yes | No | No | Yes | Yes | Yes | No | Yes |

(a) `VaRLevel` is an optional name-value pair argument with a default value of 95%. It is recommended to set the `VaRLevel` when creating the backtesting object.

(b) For example, you can backtest a `normal` and a `t` model in the same object with `varbacktest`, but you need two separate instances of the `esbacktestbyde` class to backtest them.

Risk Management Toolbox supports the following backtesting tools and their associated tests.

| Test Type | Test Name | Tests for | Risk Measure | Critical Value Computation | Use Object | Use Function |
|---|---|---|---|---|---|---|
| Basel | Traffic light | Frequency | VaR | Exact finite-sample (binomial) | `varbacktest` | `tl` |
| Various | Binomial | Frequency | VaR | Large-sample normal approximation | `varbacktest` | `bin` |
| Kupiec | Proportion of failures | Frequency | VaR | Exact finite-sample (log likelihood) | `varbacktest` | `pof` |
| Kupiec | Time until first failure | Independence | VaR | Exact finite-sample (log likelihood) | `varbacktest` | `tuff` |
| Christoffersen | Conditional coverage, mixed | Frequency and independence | VaR | Exact finite-sample (log likelihood) | `varbacktest` | `cc` |
| Christoffersen | Conditional coverage, independence | Independence | VaR | Exact finite-sample (log likelihood) | `varbacktest` | `cci` |
| Haas | Mixed Kupiec test | Frequency and independence | VaR | Exact finite-sample (log likelihood) | `varbacktest` | `tbf` |
| Haas | Independence (time between failures) | Independence | VaR | Exact finite-sample (log likelihood) | `varbacktest` | `tbfi` |
| Acerbi-Szekely | "Test 2" or unconditional | Severity | ES | Tables of presimulated critical values, under normal and t distribution | `esbacktest` | `unconditionalNormal` and `unconditionalT` |
| Acerbi-Szekely | "Test 1" or conditional | Severity | ES | Finite-sample simulation | `esbacktestbysim` | `conditional` |
| Acerbi-Szekely | "Test 2" or unconditional | Severity | ES | Finite-sample simulation | `esbacktestbysim` | `unconditional` |
| Acerbi-Szekely | "Test 3" or ranks (quantile) | Severity | ES | Finite-sample simulation | `esbacktestbysim` | `quantile` |
| Acerbi-Szekely | Minimally Biased, relative version | Severity | ES | Finite-sample simulation | `esbacktestbysim` | `minBiasRelative` |
| Acerbi-Szekely | Minimally Biased, absolute version | Severity | ES | Finite-sample simulation | `esbacktestbysim` | `minBiasAbsolute` |
| Du-Escanciano | Unconditional | Severity | ES | Large-sample approximation and finite-sample simulation | `esbacktestbyde` | `unconditionalDE` |
| Du-Escanciano | Conditional | Independence | ES | Large-sample approximation and finite-sample simulation | `esbacktestbyde` | `conditionalDE` |

## References

[1] Basel Committee on Banking Supervision. Supervisory Framework for the Use of “Backtesting” in Conjunction with the Internal Models Approach to Market Risk Capital Requirements. January 1996. https://www.bis.org/publ/bcbs22.htm.

[2] Acerbi, C., and B. Szekely. Backtesting Expected Shortfall. MSCI Inc. December 2014.

[3] Acerbi, C., and B. Szekely. "General Properties of Backtestable Statistics." SSRN Electronic Journal. January, 2017.

[4] Acerbi, C., and B. Szekely. "The Minimally Biased Backtest for ES." Risk. September, 2019.

[5] Acerbi, C. and D. Tasche. “On the Coherence of Expected Shortfall.” Journal of Banking and Finance. Vol. 26, 2002, pp. 1487-1503.

[6] Du, Z., and J. C. Escanciano. "Backtesting Expected Shortfall: Accounting for Tail Risk." Management Science. Vol. 63, Issue 4, April 2017.

[7] Rockafellar, R. T. and S. Uryasev. "Conditional Value-at-Risk for General Loss Distributions." Journal of Banking and Finance. Vol. 26, 2002, pp. 1443-1471.