Fit a statistical distribution to truncated data
Sim
on 19 Jun 2023
I have a "truncated dataset" and need to infer the distribution that most likely generated it. Although I only have the "truncated dataset" rather than the "full dataset", I think the best-fitting distribution should be the one that describes the "full dataset". It would look something like the blue line in this plot:
Do you have any comments, suggestions, or ideas on how to obtain that blue line?
When I tried to reproduce the blue line from the figure mentioned above with the fitdist function, i.e. the best-fitting distribution as if I had the "full dataset", I was not successful. Below is a comparison of fitdist applied to the "full dataset" and to the "truncated dataset", both drawn from the same "origin", i.e. makedist('Normal','mu',3).
% (1) from a normal probability distribution, i.e. "makedist('Normal','mu',3)",
% create:
% (i) a "full dataset" and
% (ii) a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_full = random(pd,10000,1);
data_trunc = random(t,10000,1);
% (2) fit the normal distribution to
% (i) the "full dataset"
% (ii) the set of "truncated data"
pd_fit_full = fitdist(data_full,'normal');
pd_fit_trunc = fitdist(data_trunc,'normal');
% (3) plot
% (i.a) the "histogram of the full dataset" (from the "full dataset")
% (i.b) the density function corresponding to the distribution that fits the "full dataset"
% (ii.a) the "truncated histogram" (from the "truncated data")
% (ii.b) the density function corresponding to the distribution that fits the "truncated histogram"
xgrid = linspace(0,100,1000)';
hold on
histogram(data_full,100,'Normalization','pdf','facecolor','red')
line(xgrid,pdf(pd_fit_full,xgrid),'Linewidth',2,'color','red')
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,pdf(pd_fit_trunc,xgrid),'Linewidth',2,'color','blue')
hold off
xlim([0 10])
Accepted Answer
Jeff Miller
on 20 Jun 2023
If you would like to fit a variety of truncated distributions in addition to the normal, you might find Cupid helpful. For instance, here is an example with a 2-parameter Weibull:
pd = makedist('Weibull','a',3,'b',5);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
% Lower cutoff of 3 is known. Start with
% any reasonable guesses for the Weibull parameters--here, 2 & 2.
fittedDist = TruncatedXlow(Weibull2(2,2),3);
% Now estimate the Weibull parameters by maximum likelihood,
% allowing for the truncation.
fittedDist.EstML(data_trunc);
xgrid = linspace(0,100,1000)';
figure
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
xlim([2.5 6])
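For the normal case in the question, the same idea (maximum likelihood with the density renormalized over the observed range) can also be sketched without Cupid. The snippet below is only an illustration under stated assumptions: the cutoff a = 3 is known, normpdf and normcdf from the Statistics and Machine Learning Toolbox are available, and the names nll, p0, pHat, muHat, sigmaHat and fullPdf are chosen here and do not come from the thread. sigma is optimized on the log scale so that fminsearch needs no bounds.

```matlab
rng(0);                                   % fixed seed, only for reproducibility
a  = 3;                                   % known lower truncation point (assumed)
pd = makedist('Normal','mu',3);           % same "origin" as in the question (sigma defaults to 1)
x  = random(truncate(pd,a,inf),10000,1);  % truncated sample

% Negative log-likelihood of a normal truncated below at a.
% p(1) = mu, p(2) = log(sigma), so sigma stays positive without bounds.
nll = @(p) -sum(log(normpdf(x,p(1),exp(p(2))))) ...
           + numel(x)*log(normcdf(a,p(1),exp(p(2)),'upper'));

p0   = [mean(x), log(std(x))];            % naive moments as starting values
pHat = fminsearch(nll,p0);                % minimize the negative log-likelihood
muHat    = pHat(1);
sigmaHat = exp(pHat(2));

% Density of the underlying (untruncated) normal on a plotting grid
xgrid   = linspace(0,10,1000)';
fullPdf = normpdf(xgrid,muHat,sigmaHat);
```

Plotting fullPdf against xgrid gives the density of the underlying, untruncated normal, which is the kind of curve the question asks for. Because mu and sigma are strongly correlated when only one tail is observed, the estimates are noisier than a fit to the full dataset would be.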
More Answers (2)
Torsten
on 19 Jun 2023
Edited: Torsten
on 19 Jun 2023
Why should it be justified to fit a normal distribution to data drawn from a truncated normal?
pd_fit_trunc = fitdist(data_trunc,'normal');
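A rough calculation shows how far off the plain normal fit is in this example. With the cutoff at the mean, the truncated sample is a shifted half-normal, and a normal fit essentially returns the sample mean and standard deviation. Assuming sigma = 1 (the makedist default when only 'mu' is given), one would expect approximately:

```latex
\[
\mathbb{E}[X \mid X \ge \mu] = \mu + \sigma\sqrt{2/\pi} \approx 3.80,
\qquad
\operatorname{sd}(X \mid X \ge \mu) = \sigma\sqrt{1 - 2/\pi} \approx 0.60,
\]
```

rather than the generating values mu = 3 and sigma = 1.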
First complete the data set "data_trunc" by reflecting it about x = 3, so that the combined sample is normally distributed. Then you can fit a normal distribution to it:
% (1) from a normal probability distribution, i.e. "makedist('Normal','mu',3)",
% create:
% (i) a "full dataset" and
% (ii) a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_full = random(pd,10000,1);
data_trunc = random(t,10000,1);
% reflect the truncated sample about x = 3 and append the mirrored points
data_trunc = [data_trunc;-(data_trunc-3)+3];
% (2) fit the normal distribution to
% (i) the "full dataset"
% (ii) the set of "truncated data"
pd_fit_full = fitdist(data_full,'normal');
pd_fit_trunc = fitdist(data_trunc,'normal');
% (3) plot
% (i.a) the "histogram of the full dataset" (from the "full dataset")
% (i.b) the density function corresponding to the distribution that fits the "full dataset"
% (ii.a) the "truncated histogram" (from the "truncated data")
% (ii.b) the density function corresponding to the distribution that fits the "truncated histogram"
xgrid = linspace(0,100,1000)';
hold on
histogram(data_full,100,'Normalization','pdf','facecolor','red')
line(xgrid,pdf(pd_fit_full,xgrid),'Linewidth',2,'color','red')
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,pdf(pd_fit_trunc,xgrid),'Linewidth',2,'color','blue')
hold off
xlim([0 10])
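The reflection step relies on the symmetry of the normal density about its mean, so it applies only because the cutoff here coincides with mu = 3. A short sketch of the reasoning, with the notation assumed here: X is the truncated variable, phi is the standard normal density, and -(data_trunc-3)+3 in the code corresponds to 2*mu - X.

```latex
\begin{align*}
f_X(x) &= \frac{2}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)\mathbf{1}\{x \ge \mu\},
&
f_{2\mu - X}(x) &= \frac{2}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)\mathbf{1}\{x \le \mu\},
\\[4pt]
\tfrac12 f_X(x) + \tfrac12 f_{2\mu - X}(x) &= \frac{1}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)
&&\text{for all } x.
\end{align*}
```

The pooled sample is an equal mixture of the two halves and therefore follows the original normal density. For a cutoff a different from mu, the mirrored sample is no longer normal, and a likelihood that accounts for the truncation is needed instead.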
the cyclist
on 19 Jun 2023
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
[norm_trunc, phat, phat_ci] = fitdist_ntrunc(data_trunc, [3, Inf]);
xgrid = linspace(0,100,1000)';
figure
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,norm_trunc(xgrid,phat(1),phat(2)),'Linewidth',2,'color','red')
xlim([0 10])