Overview of Expected Shortfall Backtesting

Expected Shortfall (ES) is the expected loss on days when there is a Value-at-Risk (VaR) failure. If the VaR is 10 million and the ES is 12 million, then the expected loss tomorrow, if it happens to be a very bad day, is 20% higher than the VaR. ES is sometimes called Conditional Value-at-Risk (CVaR), Tail Value-at-Risk (TVaR), Tail Conditional Expectation (TCE), or Conditional Tail Expectation (CTE).

There are many approaches to estimating VaR and ES, and they may lead to different VaR and ES estimates. How can one determine if models are accurately estimating the risk on a daily basis? How can one evaluate which model performs better? The varbacktest tools help validate the performance of VaR models with regard to estimated VaR values. The esbacktest, esbacktestbysim, and esbacktestbyde tools extend these capabilities to evaluate VaR models with regard to estimated ES values.

For VaR backtesting, the possibilities every day are two: either there is a VaR failure or not. If the VaR confidence level is 95%, VaR failures should happen approximately 5% of the time. To backtest VaR, you only need to know whether the VaR was exceeded (VaR failure) or not on each day of the test window and the VaR confidence level. Risk Management Toolbox™ VaR backtesting tools support “frequency” (assess the proportion of failures) and “independence” (assess independence across time) tests, and these tests work with the binary sequence of "failure" or "no-failure" results over the test window.
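The binary failure sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `var_failures` and all of the data are hypothetical, outcomes are returns (losses are negative), and VaR estimates are reported as positive numbers. It is not the toolbox implementation.

```python
def var_failures(outcomes, var_estimates):
    """Return the 0/1 failure sequence: 1 when the outcome falls below -VaR."""
    return [1 if x < -v else 0 for x, v in zip(outcomes, var_estimates)]


# Hypothetical data: daily returns and positive VaR estimates at the 95% level.
returns = [0.004, -0.021, 0.010, -0.003, -0.018, 0.007, -0.001, 0.002]
var_95 = [0.015, 0.015, 0.016, 0.016, 0.016, 0.017, 0.017, 0.017]
var_level = 0.95

failures = var_failures(returns, var_95)
num_failures = sum(failures)

# A frequency test compares the observed count with N * (1 - VaR level).
expected_failures = len(returns) * (1 - var_level)
observed_rate = num_failures / len(returns)

print(failures)
print(num_failures, expected_failures, observed_rate)
```

A frequency test works only with `num_failures` and the window length, and an independence test works only with the pattern of ones in `failures`. Neither needs the size of the losses.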

For expected shortfall (ES), the possibilities every day are infinite: the VaR may be exceeded by 1%, or by 10%, or by 150%, and so on. For example, consider a test window with three VaR failures.

In this example, on failure days the VaR is exceeded on average by 39%, but the estimated ES exceeds VaR by an average of 27%. How can you tell if 39% is significantly larger than 27%? Knowing the VaR confidence level is not enough; you must also know how likely the different exceedances over the VaR are according to the VaR model. In other words, you need some distribution information about what happens beyond the VaR according to your model assumptions. For thin-tail VaR models, 39% vs. 27% may be a large difference. However, for a heavy-tail VaR model, where a severity of twice the VaR has a nontrivial probability of happening, 39% vs. 27% over the three failure dates may not be a red flag.

A key difference between VaR backtesting and ES backtesting is that most ES backtesting methods require information about the distribution of the returns on each day, or at least the distribution of the tails beyond the VaR. One exception is the "unconditional" test (see unconditionalNormal and unconditionalT), where you can get approximate test results without providing the distribution information. This is important in practice, because the "unconditional" test is much simpler to use and can be used in principle for any VaR or ES model. The trade-off is that the approximate results may be inaccurate, especially in borderline accept or reject cases, or for certain types of distributions.

The toolbox supports the following table-based tests for the unconditional Acerbi-Szekely test for expected shortfall backtesting, using the esbacktest object: unconditionalNormal and unconditionalT.

ES backtests are necessarily approximate because they are sensitive to errors in the predicted VaR. However, the minimally biased test has only a small sensitivity to VaR errors, and that sensitivity is prudential in the sense that VaR errors lead to a more punitive ES test. See Acerbi and Szekely (2017 and 2019) for details. When distribution information is available, the minimally biased test (minBiasRelative or minBiasAbsolute) is recommended.

The toolbox supports the following Acerbi-Szekely simulation-based tests for expected shortfall backtesting using the esbacktestbysim object: conditional, unconditional, quantile, minBiasRelative, and minBiasAbsolute.

For the Acerbi-Szekely simulation-based tests, you must provide the model distribution information as part of the inputs to esbacktestbysim.

The toolbox also supports the following Du and Escanciano tests for expected shortfall backtesting using the esbacktestbyde object: unconditionalDE and conditionalDE.

For the Du and Escanciano simulation-based tests, you must provide the model distribution information as part of the inputs to esbacktestbyde.

Conditional Test by Acerbi and Szekely

The conditional test statistic by Acerbi and Szekely is based on the conditional relationship

$\mathrm{ES}_t = -E_t[X_t \mid X_t < -\mathrm{VaR}_t]$

where

X_t is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.

VaR_t is the estimated VaR for period t.

ES_t is the estimated expected shortfall for period t.

The number of failures is defined as

$\mathrm{NumFailures} = \sum_{t=1}^{N} I_t$

where

N is the number of periods in the test window (t = 1,…,N).

I_t is the VaR failure indicator on period t with a value of 1 if X_t < -VaR_t, and 0 otherwise.

The conditional test statistic is defined as

$Z_{\mathrm{cond}} = \frac{1}{\mathrm{NumFailures}} \sum_{t=1}^{N} \frac{X_t I_t}{\mathrm{ES}_t} + 1$

The conditional test has two parts. A VaR backtest must be run for the number of failures (NumFailures), and a standalone conditional test is performed for the conditional test statistic Z_cond. The conditional test accepts the model only when both the VaR test and the standalone conditional test accept the model. For more information, see conditional.
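As a minimal sketch of the statistic only (not of the significance test, which the toolbox obtains by simulation), Z_cond can be computed as follows. The function name `z_conditional` and the data are hypothetical. Outcomes are returns, and VaR and ES estimates are positive numbers.

```python
def z_conditional(outcomes, var_estimates, es_estimates):
    """Z_cond = (1 / NumFailures) * sum(X_t * I_t / ES_t) + 1."""
    num_failures = 0
    total = 0.0
    for x, var, es in zip(outcomes, var_estimates, es_estimates):
        if x < -var:  # VaR failure indicator I_t = 1
            num_failures += 1
            total += x / es
    if num_failures == 0:
        raise ValueError("Z_cond is undefined when there are no VaR failures")
    return total / num_failures + 1.0


# Hypothetical data with two VaR failures (the second and fifth days).
returns = [0.004, -0.021, 0.010, -0.003, -0.018, 0.007, -0.001, 0.002]
var_95 = [0.015, 0.015, 0.016, 0.016, 0.016, 0.017, 0.017, 0.017]
es_95 = [0.020, 0.020, 0.021, 0.021, 0.021, 0.022, 0.022, 0.022]

z_cond = z_conditional(returns, var_95, es_95)
print(z_cond)
```

Because the sum is divided by the number of failures, the statistic measures only the average severity relative to ES on failure days. This is why the conditional test pairs it with a separate VaR backtest on the number of failures. Values clearly below zero indicate losses on failure days that are larger than the ES estimates.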

Unconditional Test by Acerbi and Szekely

The unconditional test statistic by Acerbi and Szekely is based on the unconditional relationship,

$\mathrm{ES}_t = -E_t\left[\frac{X_t I_t}{p_{\mathrm{VaR}}}\right]$

where

X_t is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.

p_VaR is the probability of VaR failure, defined as 1 - VaR level.

ES_t is the estimated expected shortfall for period t.

I_t is the VaR failure indicator on period t with a value of 1 if X_t < -VaR_t, and 0 otherwise.

The unconditional test statistic is defined as

$Z_{\mathrm{uncond}} = \frac{1}{N p_{\mathrm{VaR}}} \sum_{t=1}^{N} \frac{X_t I_t}{\mathrm{ES}_t} + 1$

The critical values for the unconditional test statistic are stable across a range of distributions, which is the basis for the table-based tests. The esbacktest class runs the unconditional test against precomputed critical values under two distributional assumptions, namely, normal distribution (thin tails, see unconditionalNormal), and t distribution with 3 degrees of freedom (heavy tails, see unconditionalT).
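The unconditional statistic differs from the conditional one only in its normalization: it divides by the expected number of failures N p_VaR rather than by the observed number. A minimal sketch, with a hypothetical function name and hypothetical data, follows. It computes the statistic only and does not look up critical values.

```python
def z_unconditional(outcomes, var_estimates, es_estimates, p_var):
    """Z_uncond = (1 / (N * p_VaR)) * sum(X_t * I_t / ES_t) + 1."""
    n = len(outcomes)
    total = 0.0
    for x, var, es in zip(outcomes, var_estimates, es_estimates):
        if x < -var:  # VaR failure indicator I_t = 1
            total += x / es
    return total / (n * p_var) + 1.0


# Hypothetical data: 8 days, 95% VaR level, two VaR failures.
returns = [0.004, -0.021, 0.010, -0.003, -0.018, 0.007, -0.001, 0.002]
var_95 = [0.015, 0.015, 0.016, 0.016, 0.016, 0.017, 0.017, 0.017]
es_95 = [0.020, 0.020, 0.021, 0.021, 0.021, 0.022, 0.022, 0.022]

z_uncond = z_unconditional(returns, var_95, es_95, p_var=0.05)
print(z_uncond)
```

In this toy window there are two failures where only 0.4 are expected, so the statistic is strongly negative even though each loss is close to its ES estimate. The unconditional statistic therefore reacts to both the frequency and the severity of failures.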

Quantile Test by Acerbi and Szekely

A sample estimator of the expected shortfall for a sample Y₁,…,Y_N is:

$\widehat{\mathrm{ES}}(Y) = -\frac{1}{\lfloor N p_{\mathrm{VaR}} \rfloor} \sum_{i=1}^{\lfloor N p_{\mathrm{VaR}} \rfloor} Y_{[i]}$

where

N is the number of periods in the test window (t = 1,…,N).

p_VaR is the probability of VaR failure, defined as 1 - VaR level.

Y_[1],…,Y_[N] are the sorted sample values (from smallest to largest), and $\lfloor N p_{\mathrm{VaR}} \rfloor$ is the largest integer less than or equal to N p_VaR.

To compute the quantile test statistic, a sample of size N is created at each time t as follows. First, convert the portfolio outcomes X_t to ranks $U_1 = P_1(X_1), \ldots, U_N = P_N(X_N)$ using the cumulative distribution function P_t. If the distribution assumptions are correct, the rank values U₁,…,U_N are uniformly distributed in the interval (0,1). Then at each time t:

Invert the ranks U = (U₁,…,U_N) to get N quantiles $P_t^{-1}(U) = (P_t^{-1}(U_1), \ldots, P_t^{-1}(U_N))$.
Compute the sample estimator $\widehat{\mathrm{ES}}(P_t^{-1}(U))$.
Compute the expected value of the sample estimator $E[\widehat{\mathrm{ES}}(P_t^{-1}(V))]$, where V = (V₁,…,V_N) is a sample of N independent uniform random variables in the interval (0,1). This can be computed analytically.

The quantile test statistic by Acerbi and Szekely is defined as

$Z_{\mathrm{quantile}} = -\frac{1}{N} \sum_{t=1}^{N} \frac{\widehat{\mathrm{ES}}(P_t^{-1}(U))}{E[\widehat{\mathrm{ES}}(P_t^{-1}(V))]} + 1$

The denominator inside the sum can be computed analytically as

$E[\widehat{\mathrm{ES}}(P_t^{-1}(V))] = -\frac{N}{\lfloor N p_{\mathrm{VaR}} \rfloor} \int_0^1 I_{1-p}(N - \lfloor N p_{\mathrm{VaR}} \rfloor, \lfloor N p_{\mathrm{VaR}} \rfloor) P_t^{-1}(p)\,dp$

where I_x(z,w) is the regularized incomplete beta function. For more information, see betainc and quantile.
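The rank transform and the sample estimator in this section can be sketched as follows. The sketch assumes a zero-mean normal model with a hypothetical volatility for each day, purely for illustration. All names and data are made up, and the analytic expected value in the denominator (the incomplete beta integral) is not computed here.

```python
import math
from statistics import NormalDist


def sample_es(sample, p_var):
    """Sample ES estimator: minus the mean of the floor(N * p_var) smallest values."""
    k = math.floor(len(sample) * p_var)
    if k < 1:
        raise ValueError("need at least one observation in the tail")
    tail = sorted(sample)[:k]
    return -sum(tail) / k


# Hypothetical inputs: 8 outcomes, a zero-mean normal model with its own
# volatility each day, and p_var = 0.25 so that floor(8 * 0.25) = 2.
outcomes = [0.010, -0.030, 0.020, -0.050, 0.000, 0.015, -0.010, 0.005]
sigmas = [0.020, 0.020, 0.025, 0.025, 0.020, 0.020, 0.015, 0.015]
p_var = 0.25

# Ranks U_t = P_t(X_t), using each day's own model distribution.
ranks = [NormalDist(0.0, s).cdf(x) for x, s in zip(outcomes, sigmas)]

# For one fixed day t: map all ranks through that day's inverse CDF and
# apply the sample estimator to the resulting quantiles.
t = 0
quantiles_t = [NormalDist(0.0, sigmas[t]).inv_cdf(u) for u in ranks]
es_hat_t = sample_es(quantiles_t, p_var)

print(sample_es(outcomes, p_var))
print(es_hat_t)
```

Mapping through the ranks rescales every outcome to the distribution assumed for day t. This is what lets one sample of size N be compared against each day's model even when the model changes from day to day.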

Minimally Biased Test by Acerbi and Szekely

The minimally biased test statistic by Acerbi and Szekely is based on the following representation of the VaR and ES (see Acerbi and Szekely 2017 and 2019 for details and also Rockafellar and Uryasev 2002, and Acerbi and Tasche 2002):

$\begin{array}{l} \mathrm{ES}_α = \min_v E\left[v + \frac{1}{α}(X + v)_-\right] \\ \mathrm{VaR}_α = \arg\min_v E\left[v + \frac{1}{α}(X + v)_-\right] \end{array}$

where

X is the portfolio outcome.

(x)_- is the negative part function defined as (x)_- = max(0,-x).

α is 1 - VaR level.

The test statistic has an absolute version and a relative version. The absolute version of the minimally biased test statistic is given by

$Z_{\mathrm{minbias}}^{\mathrm{abs}} = \frac{1}{N} \sum_{t=1}^{N} \left(\mathrm{ES}_t - \mathrm{VaR}_t - \frac{1}{p_{\mathrm{VaR}}}(X_t + \mathrm{VaR}_t)_-\right)$

where

X_t is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.

VaR_t is the estimated VaR for period t.

ES_t is the estimated expected shortfall for period t.

p_VaR is the probability of VaR failure, defined as 1 - VaR level.

N is the number of periods in the test window (t = 1,…,N).

(x)_- is the negative part function defined as (x)_- = max(0,-x).

The relative version of the minimally biased test statistic is given by

$Z_{\mathrm{minbias}}^{\mathrm{rel}} = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{\mathrm{ES}_t} \left(\mathrm{ES}_t - \mathrm{VaR}_t - \frac{1}{p_{\mathrm{VaR}}}(X_t + \mathrm{VaR}_t)_-\right)$

ES backtests are necessarily approximate because they are sensitive to errors in the predicted VaR. However, the minimally biased test has only a small sensitivity to VaR errors, and that sensitivity is prudential in the sense that VaR errors lead to a more punitive ES test. See Acerbi and Szekely (2017 and 2019) for details. When distribution information is available, the minimally biased test is recommended. For more information, see minBiasRelative and minBiasAbsolute.
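Both versions of the minimally biased statistic can be sketched with one function. The names `negative_part` and `z_minbias` and the data are hypothetical. The code evaluates the two formulas in this section and nothing more; in particular it does not simulate critical values.

```python
def negative_part(x):
    """Negative part function: max(0, -x)."""
    return max(0.0, -x)


def z_minbias(outcomes, var_estimates, es_estimates, p_var, relative=False):
    """Mean of ES_t - VaR_t - (X_t + VaR_t)_- / p_VaR, optionally divided by ES_t."""
    total = 0.0
    for x, var, es in zip(outcomes, var_estimates, es_estimates):
        term = es - var - negative_part(x + var) / p_var
        total += term / es if relative else term
    return total / len(outcomes)


# Hypothetical data: 8 days, 95% VaR level, two VaR failures.
returns = [0.004, -0.021, 0.010, -0.003, -0.018, 0.007, -0.001, 0.002]
var_95 = [0.015, 0.015, 0.016, 0.016, 0.016, 0.017, 0.017, 0.017]
es_95 = [0.020, 0.020, 0.021, 0.021, 0.021, 0.022, 0.022, 0.022]

z_abs = z_minbias(returns, var_95, es_95, p_var=0.05)
z_rel = z_minbias(returns, var_95, es_95, p_var=0.05, relative=True)
print(z_abs, z_rel)
```

On days without a VaR failure the term reduces to ES_t - VaR_t, which is positive. On failure days the excess loss beyond VaR is scaled up by 1/p_VaR and subtracted. The absolute version is in the units of the data, whereas the relative version is unitless.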

ES Backtest Using Du-Escanciano Method

For each day, the Du-Escanciano model assumes a distribution for the returns. For example, if you have a normal distribution with a conditional variance of 1.5%, there is a corresponding cumulative distribution function P_t. By mapping the returns X_t with the distribution P_t, you get the "mapped returns" series U_t, also known as the "ranks" series, which by construction has values between 0 and 1 (see column 2 in the following table). Let α be the complement of the VaR level; for example, if the VaR level is 95%, α is 5%. If the mapped return U_t is smaller than α, then there is a VaR "violation" or VaR "failure." This is equivalent to observing a return X_t smaller than the negative of the VaR value for that day, since, by construction, the negative of the VaR value gets mapped to α. Therefore, you can compare U_t against α without even knowing the VaR value. The series of VaR failures is denoted by h_t and it is a series of 0's and 1's stored in column 3 in the following table. Finally, column 4 in the following table contains the "cumulative violations" series, denoted by H_t. This is the severity of the mapped VaR violations on days on which the VaR is violated, measured as a fraction of α. For example, if the mapped return U_t is 1% and α is 5%, H_t is (5% - 1%)/5% = 0.8. H_t is defined as zero on days with no VaR violation. The division by α is what makes the expected value of H_t equal to α/2, the value used by the unconditional test described later in this section.

X_t	U_t = P_t(X_t)	h_t = U_t < α	H_t = (α - U_t) * h_t / α
0.00208	0.5799	0	0
-0.01073	0.1554	0	0
-0.00825	0.2159	0	0
-0.02967	0.0073	1	0.8540
0.01242	0.8745	0	0
...	...	...	...
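The construction of h_t and H_t from the ranks can be sketched as follows, using the five ranks shown in column 2 of the table and α = 5%. The function name `violations` is hypothetical. It returns the violation severity either as the raw gap α - U_t or as that gap divided by α. The α-scaled form is the one for which the mean α/2 and the variance α(1/3 - α/4) quoted for the unconditional test hold.

```python
def violations(ranks, alpha, scale_by_alpha=True):
    """Return (h, H): 0/1 VaR violations and their severities.

    With scale_by_alpha=True the severity is (alpha - U_t) / alpha,
    otherwise it is the raw gap alpha - U_t. It is 0 on non-violation days.
    """
    h = [1 if u < alpha else 0 for u in ranks]
    scale = alpha if scale_by_alpha else 1.0
    severities = [(alpha - u) / scale if flag else 0.0 for u, flag in zip(ranks, h)]
    return h, severities


ranks = [0.5799, 0.1554, 0.2159, 0.0073, 0.8745]  # column 2 of the table
alpha = 0.05

h, severity_scaled = violations(ranks, alpha)
_, severity_raw = violations(ranks, alpha, scale_by_alpha=False)

print(h)
print(severity_raw)
print(severity_scaled)
```

Only the fourth day is a violation. Note that no VaR or ES values are needed, only the model distribution that produced the ranks.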

Given the violations series h_t and the cumulative violations series H_t, the Du-Escanciano (DE) tests are summarized as:

Du-Escanciano Test	VaR Test	ES Test
Unconditional	Mean of h_t	Mean of H_t
Conditional	Autocorrelation of h_t	Autocorrelation of H_t

The DE VaR tests assess the mean value and the autocorrelation of the h_t series, and the resulting tests overlap with known VaR tests. For example, the mean of h_t is expected to match α. In other words, the proportion of time the VaR is violated is expected to match one minus the VaR confidence level. This test is supported in the varbacktest class with the proportion of failures (pof) test (finite sample) and the binomial (bin) test (large-sample approximation). In turn, the conditional VaR test measures if there is a time pattern in the sequence of VaR failures (back-to-back failures, and so on). The conditional coverage independence (cci) test in the varbacktest class tests for one-lag independence. The time between failures independence (tbfi) test in the varbacktest class also assesses time independence for VaR models.

The esbacktestbyde class supports the DE ES tests. The DE ES tests assess the mean value and the autocorrelation of the H_t series. For the unconditional test (unconditionalDE), the expected value is α/2 — for example, the average value in the bottom 5% of a uniform (0,1) distribution is 2.5%. The conditional test (conditionalDE) assesses not only if a failure occurs but also if the failure severity is correlated to previous failure occurrences and their severities.
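The expected value α/2, and the variance that appears in the limiting distribution of the unconditional test, can be checked with a short calculation. The derivation assumes that H_t is scaled by α, that is, H_t = (α - U_t) h_t / α, and that U_t is uniform on (0,1) under the model. Both are stated here as assumptions of this sketch.

$E[H_t] = \frac{1}{α} \int_0^{α} (α - u)\,du = \frac{1}{α} \cdot \frac{α^2}{2} = \frac{α}{2}$

$E[H_t^2] = \frac{1}{α^2} \int_0^{α} (α - u)^2\,du = \frac{1}{α^2} \cdot \frac{α^3}{3} = \frac{α}{3}$

$\mathrm{Var}(H_t) = \frac{α}{3} - \frac{α^2}{4} = α\left(\frac{1}{3} - \frac{α}{4}\right)$

Dividing this variance by N gives the variance of the mean of N independent values of H_t.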

The test statistic for the unconditional DE ES test is

$U_{\mathrm{ES}} = \frac{1}{N} \sum_{t=1}^{N} H_t$

If the number of observations is large, the test statistic is distributed as

$U_{\mathrm{ES}} \underset{\mathrm{dist}}{\to} N\left(\frac{α}{2}, \frac{α(1/3 - α/4)}{N}\right) = P_U$

where N(μ,σ²) is the normal distribution with mean μ and variance σ².

The unconditional DE ES test is a two-sided test that checks if the test statistic is close to the expected value of α/2. Large-sample confidence intervals are derived from the limiting distribution. Finite-sample confidence intervals are estimated through simulation.
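A minimal sketch of the large-sample version of this test follows. It assumes the α-scaled cumulative violations H_t = (α - U_t) h_t / α, uses simulated uniform ranks as hypothetical input, and relies only on the normal limit stated above. It makes no attempt to reproduce the finite-sample simulation, and the function name is made up.

```python
import math
import random
from statistics import NormalDist


def unconditional_de_large_sample(ranks, alpha):
    """Return (U_ES, z-score, two-sided p-value) under the normal limit."""
    n = len(ranks)
    cumulative_violations = [(alpha - u) / alpha if u < alpha else 0.0 for u in ranks]
    u_es = sum(cumulative_violations) / n
    mean = alpha / 2
    std = math.sqrt(alpha * (1 / 3 - alpha / 4) / n)
    z_score = (u_es - mean) / std
    p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))
    return u_es, z_score, p_value


# Hypothetical input: 1000 ranks drawn from a correctly specified model.
rng = random.Random(0)
ranks = [rng.random() for _ in range(1000)]

u_es, z_score, p_value = unconditional_de_large_sample(ranks, alpha=0.05)
print(u_es, z_score, p_value)
```

Because the test is two-sided, a statistic far above α/2 (losses too severe or too frequent) and a statistic far below α/2 (a model that is too conservative) both produce small p-values.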

The test statistic for the conditional DE ES test is derived in several steps. First, define the autocovariance for lag j:

$γ_{j} = \frac{1}{N - j} \sum_{t = j + 1}^{N} (H_{t} - α / 2) (H_{t - j} - α / 2)$

The autocorrelation for lag j is then

$ρ_{j} = \frac{γ_{j}}{γ_{0}}$

The test statistic for m lags is then

$C_{\mathrm{ES}}(m) = N \sum_{j=1}^{m} ρ_j^2$

If the number of observations is large, the test statistic is distributed as a chi-square distribution with m degrees of freedom:

$C_{\mathrm{ES}}(m) \underset{\mathrm{dist}}{\to} χ_m^2$

The conditional DE ES test is a one-sided test to determine if the conditional DE ES test statistic is much larger than zero. If so, there is evidence of autocorrelation. Large-sample critical values are computed from the limiting distribution. Finite-sample critical values are estimated through simulation.
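The autocovariance, autocorrelation, and test statistic defined above translate directly into code. The sketch below uses hypothetical names and two made-up H_t series, one with isolated failures and one with back-to-back failures. It computes the statistic only and leaves the comparison with a chi-square or simulated critical value to the reader.

```python
def conditional_de_statistic(cumulative_violations, alpha, num_lags=1):
    """C_ES(m) = N * sum of squared autocorrelations of H_t for lags 1..m."""
    n = len(cumulative_violations)
    centered = [h - alpha / 2 for h in cumulative_violations]

    def autocovariance(lag):
        # 0-based form of the sum over t = lag + 1, ..., N.
        products = (centered[t] * centered[t - lag] for t in range(lag, n))
        return sum(products) / (n - lag)

    gamma_0 = autocovariance(0)
    if gamma_0 == 0:
        raise ValueError("the series has no variation around alpha / 2")
    rhos = [autocovariance(lag) / gamma_0 for lag in range(1, num_lags + 1)]
    return n * sum(rho * rho for rho in rhos)


def series_with_failures(n, failure_days, severity=0.5):
    """Hypothetical H_t series: `severity` on failure days, 0 elsewhere."""
    return [severity if t in failure_days else 0.0 for t in range(n)]


alpha = 0.05
spread = series_with_failures(40, {5, 25})    # isolated failures
clustered = series_with_failures(40, {5, 6})  # back-to-back failures

c_spread = conditional_de_statistic(spread, alpha)
c_clustered = conditional_de_statistic(clustered, alpha)
print(c_spread, c_clustered)
```

Both series have the same number and severity of failures, so an unconditional test cannot tell them apart. In this toy input only the back-to-back pattern produces a large one-lag statistic.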

Comparison of ES Backtesting Methods

The backtesting tools supported by Risk Management Toolbox have the following requirements and features.

Backtesting Tool	`PortfolioData` Required	`VaRData` Required	`ESData` Required	`VaRLevel` Required^a	`PortfolioID` and `VaRID` Supported	`Distribution` Information Required	Supports Multiple Models^b	Supports Multiple `VaRLevel`s
`varbacktest`	Yes	Yes	No	Yes	Yes	No	Yes	Yes
`esbacktest`	Yes	Yes	Yes	Yes	Yes	No	Yes	Yes
`esbacktestbysim`	Yes	Yes	Yes	Yes	Yes	Yes	No	Yes
`esbacktestbyde`	Yes	No	No	Yes	Yes	Yes	No	Yes
^a `VaRLevel` is an optional name-value pair argument with a default value of 95%. It is recommended to set the `VaRLevel` when creating the backtesting object.
^b For example, you can backtest a `normal` and a `t` model in the same object with `varbacktest`, but you need two separate instances of the `esbacktestbyde` class to backtest them.

Risk Management Toolbox supports the following backtesting tools and their associated tests.

Test Type	Test Name	Tests for	Risk Measure	Critical Value Computation	Use Object	Use Function
Basel	Traffic light	Frequency	VaR	Exact finite-sample (binomial)	`varbacktest`	`tl`
Various	Binomial	Frequency	VaR	Large-sample normal approximation	`varbacktest`	`bin`
Kupiec	Proportion of failures	Frequency	VaR	Exact finite-sample (log likelihood)	`varbacktest`	`pof`
Kupiec	Time until first failure	Independence	VaR	Exact finite-sample (log likelihood)	`varbacktest`	`tuff`
Christoffersen	Conditional coverage, mixed	Frequency and independence	VaR	Exact finite-sample (log likelihood)	`varbacktest`	`cc`
Christoffersen	Conditional coverage, independence	Independence	VaR	Exact finite-sample (log likelihood)	`varbacktest`	`cci`
Haas	Mixed Kupiec test	Frequency and independence	VaR	Exact finite-sample (log likelihood)	`varbacktest`	`tbf`
Haas	Independence (time between failures)	Independence	VaR	Exact finite-sample (log likelihood)	`varbacktest`	`tbfi`
Acerbi-Szekely	"Test 2" or unconditional	Severity	ES	Tables of presimulated critical values, under normal and t distribution	`esbacktest`	`unconditionalNormal` and `unconditionalT`
Acerbi-Szekely	"Test 1" or conditional	Severity	ES	Finite-sample simulation	`esbacktestbysim`	`conditional`
Acerbi-Szekely	"Test 2" or unconditional	Severity	ES	Finite-sample simulation	`esbacktestbysim`	`unconditional`
Acerbi-Szekely	"Test 3" or ranks (quantile)	Severity	ES	Finite-sample simulation	`esbacktestbysim`	`quantile`
Acerbi-Szekely	Minimally Biased, relative version	Severity	ES	Finite-sample simulation	`esbacktestbysim`	`minBiasRelative`
Acerbi-Szekely	Minimally Biased, absolute version	Severity	ES	Finite-sample simulation	`esbacktestbysim`	`minBiasAbsolute`
Du-Escanciano	Unconditional	Severity	ES	Large-sample approximation and finite-sample simulation	`esbacktestbyde`	`unconditionalDE`
Du-Escanciano	Conditional	Independence	ES	Large-sample approximation and finite-sample simulation	`esbacktestbyde`	`conditionalDE`

References

[1] Basel Committee on Banking Supervision. Supervisory Framework for the Use of “Backtesting” in Conjunction with the Internal Models Approach to Market Risk Capital Requirements. January 1996. https://www.bis.org/publ/bcbs22.htm.

[2] Acerbi, C., and B. Szekely. Backtesting Expected Shortfall. MSCI Inc. December 2014.

[3] Acerbi, C., and B. Szekely. "General Properties of Backtestable Statistics." SSRN Electronic Journal. January, 2017.

[4] Acerbi, C., and B. Szekely. "The Minimally Biased Backtest for ES." Risk. September, 2019.

[5] Acerbi, C., and D. Tasche. "On the Coherence of Expected Shortfall." Journal of Banking and Finance. Vol. 26, 2002, pp. 1487-1503.

[6] Du, Z., and J. C. Escanciano. "Backtesting Expected Shortfall: Accounting for Tail Risk." Management Science. Vol. 63, Issue 4, April 2017.

[7] Rockafellar, R. T. and S. Uryasev. "Conditional Value-at-Risk for General Loss Distributions." Journal of Banking and Finance. Vol. 26, 2002, pp. 1443-1471.