How to plot confidence bounds for a theoretical cumulative distribution function?

I understand how to plot upper and lower confidence bounds for an empirical cumulative distribution function using the ecdf function.
But how can I plot upper and lower confidence bounds for a theoretical cumulative distribution function, for example the Theoretical CDF in the plot shown below? (copied from: https://www.mathworks.com/help/stats/cdfplot.html)

6 Comments

What do you mean by confidence bounds on the theoretical CDF? The theoretical CDF isn't estimated, so how can it have confidence bounds?
By "theoretical" I mean the parametric probability distribution (e.g. Weibull) that is fitted to the observed data, such that a hypothesis test would accept the distribution as a good predictor of how the population data is distributed.
So I would expect there to be confidence intervals around that distribution, given that there is also a confidence interval around the observed data.
Paul on 6 Aug 2021
Edited: Paul on 6 Aug 2021
I see. I was confused because "the parametric probability distribution" is not the same thing as the "Theoretical CDF" in the plot from that link.
If you're interested in fitting a PDF to the data, check out the function fitdist(), which returns confidence intervals for the distribution parameters (at least for some of the distributions).
Sorry for the confusion - but would you know how to calculate the confidence bounds for such a "parametric probability distribution"? Thank you.
Thank you, Paul. The paramci() function, when applied to the probability distribution object returned by fitdist(), provides the confidence intervals for the distribution parameters. Can these be used to calculate the entire upper and lower confidence bounds "around" the probability distribution? Is there a generic formula for that, or does the formula depend on the type of probability distribution?
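For readers following along, the fitdist()/paramci() combination mentioned in this comment can be sketched as below. The sample is synthetic and the Weibull parameters used to generate it are assumptions chosen only for illustration; the calls require the Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: the data are synthetic, purely for illustration.
rng(1);                               % reproducible random numbers
data = wblrnd(2, 1.5, 100, 1);        % hypothetical sample (scale 2, shape 1.5)

pd = fitdist(data, 'Weibull');        % fitted Weibull distribution object
ci = paramci(pd);                     % 95% CIs by default; row 1 = lower, row 2 = upper

% Columns of ci follow the parameter order of the distribution (A, then B).
fprintf('Scale A: %.3f  [%.3f, %.3f]\n', pd.A, ci(1,1), ci(2,1));
fprintf('Shape B: %.3f  [%.3f, %.3f]\n', pd.B, ci(1,2), ci(2,2));
```

These intervals describe the uncertainty in each parameter separately; on their own they do not define a band around the fitted CDF.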
I agree with your question to @Jeff Miller, and thanks for explaining it like that. Indeed, I want to be able to calculate the upper and lower confidence bounds for various types of probability distributions, depending on which distribution the hypothesis tests indicate fits the observed data best (highest p-value).
I don't know if there is a way to do what you want; but I'm far from an expert on such things. Having said that, my intuition is that fitting a distribution is an exercise in estimating the parameters of the distribution, and that's why fitdist only returns the CI's around the parameters, i.e., it's only those parameters that are being estimated. In contrast, the ecdf() function is estimating something at each value in the xdata, so there it seems reasonable to come up with a CI around each estimate, which is what the dotted curves are in that plot. I'll be interested to see other answers to your question.
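The pointwise bounds that ecdf() estimates at each data value, as described in this comment, can be obtained and plotted roughly as follows. This is a minimal sketch on synthetic data (the sample and its generating parameters are assumptions); the bounds are NaN wherever ecdf() cannot compute them, which is why the check below skips them.

```matlab
% Sketch only: synthetic data, purely for illustration.
rng(1);
data = wblrnd(2, 1.5, 100, 1);        % hypothetical sample

% flo and fup are the lower and upper 95% confidence bounds (default Alpha = 0.05).
[f, x, flo, fup] = ecdf(data);

stairs(x, f, 'b-'); hold on
stairs(x, flo, 'b:');
stairs(x, fup, 'b:');
hold off
xlabel('x'); ylabel('F(x)');
legend('Empirical CDF', 'Lower bound', 'Upper bound', 'Location', 'southeast');

% Wherever the bounds are defined, they bracket the estimate.
ok = ~isnan(flo) & ~isnan(fup);
```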


Answers (1)

For a given X value, the theoretical cumulative probability is p = F(X). Suppose you have a sample of N observations and you let k be the number of observations <= X. Then k is (by definition) binomial(N,p) with the known N and that theoretical p. Using that binomial distribution, you can get upper and lower limits on the observed k (e.g., with a normal approximation to the binomial). Then divide those upper and lower limits on k by N and you will have upper and lower limits on the observed cumulative proportion at that X value.
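The normal approximation mentioned above can be sketched for a single X value as follows. The values of N and p are assumptions chosen only for illustration; under the approximation, k has mean N*p and standard deviation sqrt(N*p*(1-p)).

```matlab
% Sketch only: N and p are illustrative assumptions.
N     = 50;       % sample size
p     = 0.3;      % theoretical F(X) at the chosen X
alpha = 0.05;     % 95% two-sided limits

z      = norminv(1 - alpha/2);        % about 1.96 for alpha = 0.05
sdK    = sqrt(N * p * (1 - p));       % standard deviation of k
lowerK = N*p - z*sdK;                 % limits on the count k
upperK = N*p + z*sdK;

% Divide by N to move from counts to proportions, clamping to [0, 1].
lowerP = max(lowerK / N, 0);
upperP = min(upperK / N, 1);
halfWidth = z * sqrt(p * (1 - p) / N);
```

The approximation is usually considered poor when N*p or N*(1-p) is small, which is where the exact binomial quantiles are the safer choice.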

4 Comments

Thank you. Can you point me to the required Matlab functions to calculate the upper and lower limits in line with your explanation?
It's a little bit tricky because the binomial is discrete. But say you want a 95% interval (i.e., 2.5% below the lower cutoff and 2.5% above the upper cutoff). Then at a given X with its associated p,
lower_k = binoinv(0.025,N,p) - 1; % -1 because of discreteness
upper_k = binoinv(0.975,N,p);     % smallest k whose cumulative probability is >= 0.975
lower_k will have less than 0.025 of the probability at or below it, and upper_k will have at most 0.025 of the probability above it.
Then divide these k bounds by N to get probability bounds.
HTH
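Applied over a grid of X values, the binoinv() recipe above could look like the sketch below. The Weibull parameters, the sample size, and the grid are assumptions for illustration only, and the distribution is treated as fully known (not fitted). The clamp at zero is an addition to keep the lower count non-negative.

```matlab
% Sketch only: a known Weibull with assumed parameters, not a fitted one.
N     = 100;                  % sample size
alpha = 0.05;                 % 95% two-sided bounds
A = 2; B = 1.5;               % assumed known scale and shape

xg = linspace(0.05, 6, 200);  % grid of X values
p  = wblcdf(xg, A, B);        % theoretical F(X) at each grid point

% Binomial quantiles at each p; binoinv accepts an array of probabilities.
lowerK = max(binoinv(alpha/2, N, p) - 1, 0);   % -1 because of discreteness
upperK = binoinv(1 - alpha/2, N, p);

lowerP = lowerK / N;          % convert counts to proportions
upperP = upperK / N;

plot(xg, p, 'k-', xg, lowerP, 'r--', xg, upperP, 'r--');
xlabel('x'); ylabel('F(x)');
legend('Theoretical CDF', 'Lower bound', 'Upper bound', 'Location', 'southeast');
```

The resulting curves are step-like because the counts are integers; they describe where an empirical CDF of N observations would be expected to fall if the theoretical distribution is correct.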
Based on the comment chain in the question, it sounds like @Eric-Jan Scharlee wants to fit a specified CDF (e.g., Weibull) to some data, and then show a confidence interval around the fitted distribution. Is that the process you're describing?
Thanks for the clarification, @Paul. I misunderstood the original question as pertaining to a known theoretical distribution (i.e., with known parameter values). No, the process I am describing does not apply to fitted distributions.
@Eric-Jan Scharlee As Paul says, CIs around parameter estimates are standard, but CIs around a fitted CDF are not. I am not even sure how that would be defined. I suppose you could generate a range of CDFs with different combinations of parameter values within the parameter CIs and take the extremes of those (at each X) as some kind of theoretical CI, but that's really ad hoc. To me, it seems better to focus on the CIs from the ecdf.
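The ad hoc envelope described in this comment could be sketched as follows for a Weibull fit. The data are synthetic and the grid resolution is arbitrary (both are assumptions); as the comment says, the result is not a formally defined confidence band, only the pointwise extremes of CDFs whose parameters lie inside the parameter CIs.

```matlab
% Sketch only: synthetic data and an arbitrary parameter grid.
rng(1);
data = wblrnd(2, 1.5, 100, 1);            % hypothetical sample

pd = fitdist(data, 'Weibull');
ci = paramci(pd);                         % row 1 = lower, row 2 = upper; columns A, B

xg    = linspace(0.01, max(data), 200);   % X values at which to evaluate the CDFs
aVals = linspace(ci(1,1), ci(2,1), 15);   % scale values spanning the CI
bVals = linspace(ci(1,2), ci(2,2), 15);   % shape values spanning the CI
[Agrid, Bgrid] = meshgrid(aVals, bVals);

% One CDF per parameter combination, stored row by row.
F = zeros(numel(Agrid), numel(xg));
for i = 1:numel(Agrid)
    F(i, :) = wblcdf(xg, Agrid(i), Bgrid(i));
end

lowerEnv = min(F, [], 1);                 % pointwise extremes at each X
upperEnv = max(F, [], 1);
fitted   = cdf(pd, xg);

plot(xg, fitted, 'k-', xg, lowerEnv, 'r--', xg, upperEnv, 'r--');
xlabel('x'); ylabel('F(x)');
legend('Fitted CDF', 'Lower envelope', 'Upper envelope', 'Location', 'southeast');
```

Because the rectangle of parameter combinations ignores any correlation between the parameter estimates, the envelope should be read as a rough visual aid rather than a band with a stated coverage level.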


Asked: 5 Aug 2021
Commented: 6 Aug 2021
