Expected Shortfall (ES) is the expected loss on days when there is a Value-at-Risk (VaR) failure. If the VaR is 10 million and the ES is 12 million, the expected loss tomorrow, if it happens to be a very bad day, is 20% higher than the VaR. ES is sometimes called Conditional Value-at-Risk (CVaR), Tail Value-at-Risk (TVaR), Tail Conditional Expectation (TCE), or Conditional Tail Expectation (CTE).
There are many approaches to estimating VaR and ES, and they may lead to different VaR
and ES estimates. How can one determine if models are accurately estimating the risk on
a daily basis? How can one evaluate which model performs better? The varbacktest tools help validate the performance of VaR models with regard to estimated VaR values. The esbacktest, esbacktestbysim, and esbacktestbyde tools extend these capabilities to evaluate VaR models with regard to estimated ES values.
For VaR backtesting, there are two possibilities every day: either there is a VaR failure or there is not. If the VaR confidence level is 95%, VaR failures should happen approximately 5% of the time. To backtest VaR, you only need to know the VaR confidence level and whether the VaR was exceeded (VaR failure) or not on each day of the test window. Risk Management Toolbox™ VaR backtesting tools support “frequency” (assess the proportion of failures) and “independence” (assess independence across time) tests, and these tests work with the binary sequence of “failure” or “no failure” results over the test window.
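The frequency idea can be sketched in a few lines of plain Python. This is only an illustration of the large-sample binomial check, not the toolbox API: the function name `binomial_frequency_test` and the data are hypothetical, and the sketch assumes VaR is reported as a positive number, so a failure occurs when the outcome falls below the negative of the VaR.

```python
from statistics import NormalDist

def binomial_frequency_test(outcomes, var, var_level):
    """Build the failure sequence and the z-score of the failure count."""
    # 1 when the outcome is worse than the negative of the VaR, 0 otherwise.
    failures = [1 if x < -v else 0 for x, v in zip(outcomes, var)]
    n = len(failures)
    p = 1.0 - var_level                      # expected failure probability
    num_failures = sum(failures)
    # Large-sample normal approximation to the binomial count.
    z_score = (num_failures - n * p) / (n * p * (1.0 - p)) ** 0.5
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_score)))
    return failures, num_failures, z_score, p_value

# Hypothetical data: 100 days, constant VaR of 10, nine losses beyond the VaR.
outcomes = [1.0] * 91 + [-12.0] * 9
var = [10.0] * 100
failures, num_failures, z_score, p_value = binomial_frequency_test(outcomes, var, 0.95)
print(num_failures, round(z_score, 3), round(p_value, 3))
```

Only the 0/1 sequence and the VaR level enter the calculation; the size of each loss beyond the VaR plays no role.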
For expected shortfall (ES), the possibilities every day are infinite: the VaR may be exceeded by 1%, or by 10%, or by 150%, and so on. For example, consider a test window with three VaR failures.
On failure days, the VaR is exceeded on average by 39%, but the estimated ES exceeds VaR by an average of 27%. How can you tell if 39% is significantly larger than 27%? Knowing the VaR confidence level is not enough; you must also know how likely the different exceedances over the VaR are according to the VaR model. In other words, you need some distribution information about what happens beyond the VaR according to your model assumptions. For thin-tail VaR models, 39% vs. 27% may be a large difference. However, for a heavy-tail VaR model, where a severity of twice the VaR has a nontrivial probability of happening, 39% vs. 27% over the three failure dates may not be a red flag.
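To make the role of the tail distribution concrete, the following sketch compares the ratio ES/VaR at the 95% level for a standard normal model and for a t distribution with 3 degrees of freedom. It is an illustration only: the helper names are hypothetical, the closed-form t(3) density, distribution function, and ES expression are standard textbook results rather than anything taken from the toolbox, and the t quantile is found by simple bisection.

```python
import math
from statistics import NormalDist

def normal_es_over_var(alpha):
    """ES/VaR for a standard normal model at tail probability alpha."""
    q = NormalDist().inv_cdf(alpha)          # negative quantile, VaR = -q
    es = NormalDist().pdf(q) / alpha
    return es / (-q)

def t3_cdf(x):
    """Closed-form CDF of the t distribution with 3 degrees of freedom."""
    s = x / math.sqrt(3.0)
    return 0.5 + (s / (1.0 + s * s) + math.atan(s)) / math.pi

def t3_pdf(x):
    """Density of the t distribution with 3 degrees of freedom."""
    return 2.0 / (math.pi * math.sqrt(3.0) * (1.0 + x * x / 3.0) ** 2)

def t3_quantile(alpha):
    """Quantile by bisection on the CDF."""
    lo, hi = -1000.0, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t3_es_over_var(alpha):
    """ES/VaR for a t(3) model; ES = pdf(q)/alpha * (nu + q^2)/(nu - 1), nu = 3."""
    q = t3_quantile(alpha)
    es = t3_pdf(q) / alpha * (3.0 + q * q) / 2.0
    return es / (-q)

ratio_normal = normal_es_over_var(0.05)
ratio_t3 = t3_es_over_var(0.05)
print(round(ratio_normal, 3), round(ratio_t3, 3))
```

Under these assumptions the ES exceeds the VaR by roughly 25% for the normal model and by roughly 65% for the t(3) model, so the same observed average exceedance can be alarming under one model and unremarkable under the other.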
A key difference between VaR backtesting and ES backtesting is that most ES backtesting methods require information about the distribution of the returns on each day, or at least the distribution of the tails beyond the VaR. One exception is the “unconditional” test (see unconditionalNormal and unconditionalT), where you can get approximate test results without providing the distribution information. This is important in practice, because the “unconditional” test is much simpler to use and can be used in principle for any VaR or ES model. The tradeoff is that the approximate results may be inaccurate, especially in borderline accept-or-reject cases, or for certain types of distributions.
The toolbox supports the following table-based tests for expected shortfall backtesting, which run the unconditional Acerbi-Szekely test using the esbacktest object: unconditionalNormal and unconditionalT.
The toolbox supports the following Acerbi-Szekely simulation-based tests for expected shortfall backtesting using the esbacktestbysim object: conditional, unconditional, and quantile.
For the Acerbi-Szekely simulation-based tests, you must provide the model distribution information as part of the inputs to esbacktestbysim.
The toolbox also supports the following Du and Escanciano tests for expected shortfall backtesting using the esbacktestbyde object: unconditionalDE and conditionalDE.
For the Du and Escanciano simulation-based tests, you must provide the model distribution information as part of the inputs to esbacktestbyde.
The conditional test statistic by Acerbi and Szekely is based on the conditional relationship
$$ES_{t}=-E_{t}\left[X_{t}\mid X_{t}<-VaR_{t}\right]$$
where
X_{t} is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.
VaR_{t} is the estimated VaR for period t.
ES_{t} is the estimated expected shortfall for period t.
The number of failures is defined as
$$NumFailures=\sum_{t=1}^{N}I_{t}$$
where
N is the number of periods in the test window (t = 1,…,N).
I_{t} is the VaR failure indicator on period t with a value of 1 if X_{t} < -VaR_{t}, and 0 otherwise.
The conditional test statistic is defined as
$$Z_{cond}=\frac{1}{NumFailures}\sum_{t=1}^{N}\frac{X_{t}I_{t}}{ES_{t}}+1$$
The conditional test has two parts. A VaR backtest must be run for the number of failures (NumFailures), and a standalone conditional test is performed for the conditional test statistic Z_{cond}. The conditional test accepts the model only when both the VaR test and the standalone conditional test accept the model. For more information, see conditional.
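As a rough illustration of the conditional statistic, the sketch below averages X_t/ES_t over the failure days and adds 1, which is the form of Z_cond given by Acerbi and Szekely. The function name and the data are hypothetical, VaR and ES are assumed to be reported as positive numbers, and the sketch only computes the statistic; it does not simulate critical values as esbacktestbysim does.

```python
def conditional_statistic(outcomes, var, es):
    """Return (NumFailures, Z_cond) for outcome, VaR, and ES series."""
    # Keep X_t / ES_t only on days when the outcome breaches the negative VaR.
    ratios = [x / e for x, v, e in zip(outcomes, var, es) if x < -v]
    num_failures = len(ratios)
    if num_failures == 0:
        return 0, float("nan")               # statistic undefined without failures
    return num_failures, sum(ratios) / num_failures + 1.0

# Hypothetical data in millions: VaR of 10, ES of 12, three failures.
outcomes = [1.0] * 97 + [-11.0, -14.0, -17.0]
var = [10.0] * 100
es = [12.0] * 100
num_failures, z_cond = conditional_statistic(outcomes, var, es)
print(num_failures, z_cond)
```

The statistic is close to zero when losses on failure days average out to the estimated ES, and negative when the realized tail losses are larger than the ES predicts, as in this example, where the average failure-day loss is 14 against an ES of 12.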
The unconditional test statistic by Acerbi and Szekely is based on the unconditional relationship
$$ES_{t}=-E_{t}\left[\frac{X_{t}I_{t}}{p_{VaR}}\right]$$
where
X_{t} is the portfolio outcome, that is, the portfolio return or portfolio profit and loss for period t.
p_{VaR} is the probability of VaR failure, defined as 1 - VaR level.
ES_{t} is the estimated expected shortfall for period t.
I_{t} is the VaR failure indicator on period t with a value of 1 if X_{t} < -VaR_{t}, and 0 otherwise.
The unconditional test statistic is defined as
$$Z_{uncond}=\frac{1}{Np_{VaR}}\sum_{t=1}^{N}\frac{X_{t}I_{t}}{ES_{t}}+1$$
The critical values for the unconditional test statistic are stable across a range of distributions, which is the basis for the table-based tests. The esbacktest class runs the unconditional test against precomputed critical values under two distributional assumptions, namely, normal distribution (thin tails, see unconditionalNormal), and t distribution with 3 degrees of freedom (heavy tails, see unconditionalT).
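The unconditional statistic differs from the conditional one only in the normalization: the sum of X_t/ES_t over failure days is divided by the expected number of failures, N times p_VaR, instead of the observed number. The sketch below is a hypothetical illustration with made-up data and a made-up function name; it assumes positive VaR and ES values and does not look up any critical values.

```python
def unconditional_statistic(outcomes, var, es, var_level):
    """Acerbi-Szekely unconditional statistic for positive VaR and ES series."""
    p_var = 1.0 - var_level                  # probability of a VaR failure
    n = len(outcomes)
    # Sum X_t / ES_t over the days on which the VaR is breached.
    total = sum(x / e for x, v, e in zip(outcomes, var, es) if x < -v)
    return total / (n * p_var) + 1.0

# Hypothetical data in millions: 100 days, VaR of 10, ES of 12, three failures.
outcomes = [1.0] * 97 + [-11.0, -14.0, -17.0]
var = [10.0] * 100
es = [12.0] * 100
z_uncond = unconditional_statistic(outcomes, var, es, 0.95)
print(round(z_uncond, 4))
```

Because the divisor is the expected failure count, the statistic reacts to both the number and the size of the failures. In this example there are three failures where five are expected, so the statistic is positive even though each failure is larger than the ES.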
A sample estimator of the expected shortfall for a sample Y_{1},…,Y_{N} is:
$$\widehat{ES}(Y)=-\frac{1}{\lfloor Np_{VaR}\rfloor}\sum_{i=1}^{\lfloor Np_{VaR}\rfloor}Y_{[i]}$$
where
N is the number of periods in the test window (t = 1,…,N).
p_{VaR} is the probability of VaR failure, defined as 1 - VaR level.
Y_{[1]},…,Y_{[N]} are the sorted sample values (from smallest to largest), and $$\lfloor Np_{VaR}\rfloor$$ is the largest integer less than or equal to Np_{VaR}.
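The sample estimator amounts to averaging the floor(N p_VaR) smallest values and flipping the sign. A minimal sketch follows, with a hypothetical function name and made-up returns; the small tolerance added before taking the floor is an implementation choice of this sketch to guard against floating-point round-off in N times p_VaR, not part of the definition.

```python
import math

def sample_es(sample, var_level):
    """Negative mean of the floor(N * p_VaR) smallest sample values."""
    p_var = 1.0 - var_level
    # Tolerance so that, for example, 20 * (1 - 0.9) floors to 2 rather than 1.
    k = math.floor(len(sample) * p_var + 1e-9)
    if k < 1:
        raise ValueError("sample too small for this VaR level")
    tail = sorted(sample)[:k]                # k smallest values
    return -sum(tail) / k

# Hypothetical returns: 20 observations, 90% VaR level, so the tail has 2 values.
returns = [0.01, -0.02, 0.03, -0.05, 0.00, 0.02, -0.01, 0.04, -0.03, 0.01,
           0.02, -0.04, 0.00, 0.01, -0.01, 0.03, 0.02, -0.02, 0.01, 0.00]
es_hat = sample_es(returns, 0.90)
print(es_hat)
```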
To compute the quantile test statistic, a sample of size N is created at each time t as follows. First, convert the portfolio outcomes X_{t} to ranks $$U_{1}=P_{1}(X_{1}),\ldots,U_{N}=P_{N}(X_{N})$$ using the cumulative distribution function P_{t}. If the distribution assumptions are correct, the rank values U_{1},…,U_{N} are uniformly distributed in the interval (0,1). Then at each time t:
Invert the ranks U = (U_{1},…,U_{N}) to get N quantiles $$P_{t}^{-1}(U)=(P_{t}^{-1}(U_{1}),\ldots,P_{t}^{-1}(U_{N}))$$.
Compute the sample estimator $$\widehat{ES}(P_{t}^{-1}(U))$$.
Compute the expected value of the sample estimator $$E\left[\widehat{ES}(P_{t}^{-1}(V))\right]$$, where V = (V_{1},…,V_{N}) is a sample of N independent uniform random variables in the interval (0,1). This can be computed analytically.
The quantile test statistic by Acerbi and Szekely is defined as
$$Z_{quantile}=-\frac{1}{N}\sum_{t=1}^{N}\frac{\widehat{ES}(P_{t}^{-1}(U))}{E\left[\widehat{ES}(P_{t}^{-1}(V))\right]}+1$$
The denominator inside the sum can be computed analytically as
$$E\left[\widehat{ES}(P_{t}^{-1}(V))\right]=-\frac{N}{\lfloor Np_{VaR}\rfloor}\int_{0}^{1}I_{1-p}\left(N-\lfloor Np_{VaR}\rfloor,\lfloor Np_{VaR}\rfloor\right)P_{t}^{-1}(p)\,dp$$
where I_{x}(z,w) is the regularized incomplete beta function. For more information, see betainc and quantile.
For each day, the Du-Escanciano model assumes a distribution for the returns. For example, if you have a normal distribution with a conditional variance of 1.5%, there is a corresponding cumulative distribution function P_{t}. By mapping the returns X_{t} with the distribution P_{t}, you get the “mapped returns” series U_{t}, also known as the “ranks” series, which by construction has values between 0 and 1 (see column 2 in the following table). Let α be the complement of the VaR level; for example, if the VaR level is 95%, α is 5%. If the mapped return U_{t} is smaller than α, then there is a VaR “violation” or VaR “failure.” This is equivalent to observing a return X_{t} smaller than the negative of the VaR value for that day, since, by construction, the negative of the VaR value gets mapped to α. Therefore, you can compare U_{t} against α without even knowing the VaR value. The series of VaR failures is denoted by h_{t}, and it is a series of 0s and 1s stored in column 3 in the following table. Finally, column 4 in the following table contains the “cumulative violations” series, denoted by H_{t}. This is the severity of the mapped VaR violations on days on which the VaR is violated. For example, if the mapped return U_{t} is 1% and α is 5%, H_{t} is 4%. H_{t} is defined as zero if there are no VaR violations.
X_{t}  U_{t} = P_{t}(X_{t})  h_{t} = U_{t} < α  H_{t} = (α - U_{t}) * h_{t} 

0.00208  0.5799  0  0 
-0.01073  0.1554  0  0 
-0.00825  0.2159  0  0 
-0.02967  0.0073  1  0.0427 
0.01242  0.8745  0  0 
...  ...  ...  ... 
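Columns 3 and 4 of the table can be reproduced from column 2 with a few lines of Python. This is a sketch of the definitions in the table header only, with a hypothetical function name; it starts from the mapped returns U_t, since computing them requires the model distribution P_t for each day.

```python
def de_series(ranks, alpha):
    """Return the violation series h_t and cumulative violation series H_t."""
    h = [1 if u < alpha else 0 for u in ranks]
    # Severity of the mapped violation, zero on days without a violation.
    H = [(alpha - u) if u < alpha else 0.0 for u in ranks]
    return h, H

# Mapped returns from column 2 of the table, with a 95% VaR level.
ranks = [0.5799, 0.1554, 0.2159, 0.0073, 0.8745]
h, H = de_series(ranks, alpha=0.05)
print(h)
print([round(v, 4) for v in H])
```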
Given the violations series h_{t} and the cumulative violations series H_{t}, the Du-Escanciano (DE) tests are summarized as:
Du-Escanciano Test  VaR Test  ES Test 

Unconditional  Mean of h_{t}  Mean of H_{t} 
Conditional  Autocorrelation of h_{t}  Autocorrelation of H_{t} 
The DE VaR tests assess the mean value and the autocorrelation of the h_{t} series, and the resulting tests overlap with known VaR tests. For example, the mean of h_{t} is expected to match α. In other words, the proportion of time the VaR is violated is expected to match the complement of the VaR confidence level. This test is supported in the varbacktest class with the proportion of failures (pof) test (finite sample) and the binomial (bin) test (large-sample approximation). In turn, the conditional VaR test measures if there is a time pattern in the sequence of VaR failures (back-to-back failures, and so on). The conditional coverage independence (cci) test in the varbacktest class tests for one-lag independence. The time between failures independence (tbfi) test in the varbacktest class also assesses time independence for VaR models.
The esbacktestbyde class supports the DE ES tests. The DE ES tests assess the mean value and the autocorrelation of the H_{t} series. For the unconditional test (unconditionalDE), the expected value is α/2; for example, the average value in the bottom 5% of a uniform (0,1) distribution is 2.5%. The conditional test (conditionalDE) assesses not only if a failure occurs but also if the failure severity is correlated to previous failure occurrences and their severities.
The test statistic for the unconditional DE ES test is
$${U}_{ES}=\frac{1}{N}{\displaystyle {\sum}_{t=1}^{N}{H}_{t}}$$
If the number of observations is large, the test statistic is distributed as
$$U_{ES}\underset{dist}{\to}N\left(\frac{\alpha}{2},\frac{\alpha(1/3-\alpha/4)}{N}\right)=P_{U}$$
where N(μ,σ^{2}) is the normal distribution with mean μ and variance σ^{2}.
The unconditional DE ES test is a two-sided test that checks if the test statistic is close to the expected value of α/2. From the limiting distribution, a confidence interval is derived. Finite-sample confidence intervals are estimated through simulation.
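The large-sample version of the unconditional test can be sketched as follows. One assumption is worth stating loudly: for a uniform rank U_t, the quantity (α - U_t) on violation days has mean α²/2, so this sketch divides each cumulative violation by α, which is the scaling under which the mean is α/2 and the variance is α(1/3 - α/4), matching the limiting distribution above. The function name and the evenly spaced ranks are hypothetical, and the finite-sample simulation is not reproduced.

```python
from statistics import NormalDist

def unconditional_de(ranks, alpha):
    """Large-sample unconditional DE ES statistic and two-sided p-value."""
    n = len(ranks)
    # ASSUMPTION: cumulative violations scaled by alpha so that E[H_t] = alpha / 2.
    H = [(alpha - u) / alpha if u < alpha else 0.0 for u in ranks]
    u_es = sum(H) / n
    mean = alpha / 2.0
    std = (alpha * (1.0 / 3.0 - alpha / 4.0) / n) ** 0.5
    z = (u_es - mean) / std
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_es, p_value

# Hypothetical ranks spread evenly over (0,1), as expected for a correct model.
n_obs = 1000
ranks = [(i - 0.5) / n_obs for i in range(1, n_obs + 1)]
u_es, p_value = unconditional_de(ranks, alpha=0.05)
print(round(u_es, 4), round(p_value, 4))
```

With perfectly uniform ranks the statistic sits at α/2 and the test has no reason to reject; ranks that pile up near zero push the statistic above α/2 and the p-value toward zero.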
The test statistic for the conditional DE ES test is derived in several steps. First, define the autocovariance for lag j:
$$\gamma_{j}=\frac{1}{N-j}\sum_{t=j+1}^{N}(H_{t}-\alpha/2)(H_{t-j}-\alpha/2)$$
The autocorrelation for lag j is then
$${\rho}_{j}=\frac{{\gamma}_{j}}{{\gamma}_{0}}$$
The test statistic for m lags is then
$${C}_{ES}(m)=N{\displaystyle {\sum}_{j=1}^{m}{\rho}_{j}^{2}}$$
If the number of observations is large, the test statistic is distributed as a chi-square distribution with m degrees of freedom:
$${C}_{ES}(m)\underset{dist}{\to}{\chi}_{m}^{2}$$
The conditional DE ES test is a one-sided test to determine if the conditional DE ES test statistic is much larger than zero. If so, there is evidence of autocorrelation. Large-sample critical values are computed from the limiting distribution. Finite-sample critical values are estimated through simulation.
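The steps for the conditional statistic (autocovariance, autocorrelation, sum of squared autocorrelations scaled by N) can be sketched directly. The function name and both input series are hypothetical, and the series are assumed to be on the scale for which α/2 is the expected value of H_t. The comparison value 3.84 is the usual 95% quantile of a chi-square distribution with 1 degree of freedom, used here only as a large-sample reference point.

```python
def conditional_de_statistic(H, alpha, num_lags=1):
    """N times the sum of squared autocorrelations of H_t, centered at alpha / 2."""
    n = len(H)
    centered = [value - alpha / 2.0 for value in H]

    def autocovariance(j):
        # Average of centered products over the n - j available pairs.
        return sum(centered[t] * centered[t - j] for t in range(j, n)) / (n - j)

    gamma0 = autocovariance(0)
    rhos = [autocovariance(j) / gamma0 for j in range(1, num_lags + 1)]
    return n * sum(rho * rho for rho in rhos)

# Hypothetical series with ten violations of equal severity over 200 days.
alpha = 0.05
clustered = [0.5 if 100 <= t < 110 else 0.0 for t in range(200)]  # back-to-back
spread = [0.5 if t % 20 == 0 else 0.0 for t in range(200)]        # evenly spaced
c_clustered = conditional_de_statistic(clustered, alpha, num_lags=1)
c_spread = conditional_de_statistic(spread, alpha, num_lags=1)
print(round(c_clustered, 2), round(c_spread, 2))
```

Both series have the same number and size of violations, so an unconditional test cannot tell them apart; only the clustered series produces a statistic far above the chi-square reference value.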
The backtesting tools supported by Risk Management Toolbox have the following requirements and features.
Backtesting Tool  PortfolioData Required  VaRData Required  ESData Required  VaRLevel Required^{[a]}  PortfolioID and VaRID Supported  Distribution Information Required  Supports Multiple Models^{[b]}  Supports Multiple VaRLevels 

varbacktest  Yes  Yes  No  Yes  Yes  No  Yes  Yes 
esbacktest  Yes  Yes  Yes  Yes  Yes  No  Yes  Yes 
esbacktestbysim  Yes  Yes  Yes  Yes  Yes  Yes  No  Yes 
esbacktestbyde  Yes  No  No  Yes  Yes  Yes  No  Yes 
^{[a]} ^{[b]} For example, you can backtest a

Risk Management Toolbox supports the following backtesting tools and their associated tests.
Test Type  Test Name  Tests for  Risk Measure  Critical Value Computation  Use Object  Use Function 

Basel  Traffic light  Frequency  VaR  Exact finite-sample (binomial)  varbacktest  tl 
Various  Binomial  Frequency  VaR  Large-sample normal approximation  varbacktest  bin 
Kupiec  Proportion of failures  Frequency  VaR  Exact finite-sample (log likelihood)  varbacktest  pof 
Kupiec  Time until first failure  Independence  VaR  Exact finite-sample (log likelihood)  varbacktest  tuff 
Christoffersen  Conditional coverage, mixed  Frequency and independence  VaR  Exact finite-sample (log likelihood)  varbacktest  cc 
Christoffersen  Conditional coverage, independence  Independence  VaR  Exact finite-sample (log likelihood)  varbacktest  cci 
Haas  Mixed Kupiec test  Frequency and independence  VaR  Exact finite-sample (log likelihood)  varbacktest  tbf 
Haas  Independence (time between failures)  Independence  VaR  Exact finite-sample (log likelihood)  varbacktest  tbfi 
Acerbi-Szekely  "Test 2" or unconditional  Severity  ES  Tables of presimulated critical values, under normal and t distribution  esbacktest  unconditionalNormal, unconditionalT 
Acerbi-Szekely  "Test 1" or conditional  Severity  ES  Finite-sample simulation  esbacktestbysim  conditional 
Acerbi-Szekely  "Test 2" or unconditional  Severity  ES  Finite-sample simulation  esbacktestbysim  unconditional 
Acerbi-Szekely  "Test 3" or ranks (quantile)  Severity  ES  Finite-sample simulation  esbacktestbysim  quantile 
Du-Escanciano  Unconditional  Severity  ES  Large-sample approximation and finite-sample simulation  esbacktestbyde  unconditionalDE 
Du-Escanciano  Conditional  Independence  ES  Large-sample approximation and finite-sample simulation  esbacktestbyde  conditionalDE 
[1] Basel Committee on Banking Supervision. Supervisory Framework for the Use of “Backtesting” in Conjunction with the Internal Models Approach to Market Risk Capital Requirements. January 1996. https://www.bis.org/publ/bcbs22.htm.
[2] Acerbi, C., and B. Szekely. Backtesting Expected Shortfall. MSCI Inc. December 2014.
[3] Du, Z., and J. C. Escanciano. "Backtesting Expected Shortfall: Accounting for Tail Risk." Management Science. Vol. 63, Issue 4, April 2017.
See also: esbacktest, esbacktestbyde, esbacktestbysim, varbacktest