Automatically fitting distribution to histogram

Hello
I have a histogram plot of one feature (machine learning). That means the x-axis shows the values the feature can take and the y-axis shows the number of occurrences.
Is it possible in MATLAB to automatically fit a probability distribution to this histogram if I don't know which type of distribution it is (normal distribution, geometric distribution, etc.)? That is, MATLAB should figure out which distribution it is and give me the optimal parameters.
The problem is that I have a lot of features and manually inspecting the features takes too much time.

Accepted Answer

Image Analyst on 29 Feb 2016

0 votes

You can try all the distributions that fitdist() offers you and find which one has the lowest MSE or MAD.
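
One way to carry out this suggestion is sketched below. It is only an illustration: the synthetic data in `x`, the hand-picked candidate list, and the choice to measure the error between the normalized histogram heights and the fitted PDF at the bin centers are all assumptions, not part of the original answer. It needs the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: fit several candidate distributions and rank them by their
% error against the normalized histogram.
rng(0);                                   % reproducible synthetic data
x = random('Gamma', 2, 3, [1000 1]);      % assumption: stand-in for one feature

% Candidate names are an assumption; extend with any name fitdist accepts.
candidates = {'Normal', 'Lognormal', 'Gamma', 'Weibull', 'Exponential'};

% Empirical density: bar heights scaled so the histogram area is 1.
[heights, edges] = histcounts(x, 'Normalization', 'pdf');
centers = (edges(1:end-1) + edges(2:end)) / 2;

mse = nan(numel(candidates), 1);
medAbsErr = nan(numel(candidates), 1);
for k = 1:numel(candidates)
    try
        pd = fitdist(x, candidates{k});        % maximum likelihood fit
        err = pdf(pd, centers) - heights;      % fitted density minus histogram
        mse(k) = mean(err.^2);
        medAbsErr(k) = median(abs(err));
    catch
        % Fit failed (e.g. data outside the distribution's support); keep NaN.
    end
end

[~, best] = min(mse);                          % min ignores NaN entries
fprintf('Lowest MSE: %s\n', candidates{best});
results = table(candidates(:), mse, medAbsErr, ...
    'VariableNames', {'Distribution', 'MSE', 'MedianAbsError'});
disp(results);
```

If only the histogram values and counts are available rather than the raw samples, fitdist also accepts a 'Frequency' name-value argument, so the counts can probably be passed in directly.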

7 Comments

Sepp on 29 Feb 2016
Thanks for the input. Would you recommend MSE or MAD (I think there are also others)?
Second, how do you calculate MSE or MAD for a probability distribution?
Third, is there an option to automatically grab all distributions, or do I have to specify them manually in fitdist?
Sepp on 29 Feb 2016
I have found the following MATLAB tool which does the job: http://blogs.mathworks.com/pick/2012/02/10/finding-the-best/
Looks awesome, but unfortunately it only supports parametric models. :(
Sepp on 29 Feb 2016
Edited: Sepp on 29 Feb 2016
I'm now a bit confused. I have read that I have to normalize my histogram so that I see empirical probabilities instead of counts. Is this necessary if I want to try all possible parametric distributions and pick the best one? How can this normalization be done in MATLAB?
Steven Lord on 29 Feb 2016
If you're using HISTCOUNTS or HISTOGRAM, see the Normalization option.
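
A small sketch of what the Normalization option does with histcounts; the toy data and the bin edges are assumptions chosen only for illustration.

```matlab
x = [1 1 2 2 2 3 3 4 5 5];        % assumption: toy feature values
edges = 0.5:1:5.5;                % unit-width bins centered on 1..5

counts = histcounts(x, edges);                                  % raw counts
prob   = histcounts(x, edges, 'Normalization', 'probability');  % heights sum to 1
dens   = histcounts(x, edges, 'Normalization', 'pdf');          % area sums to 1

fprintf('sum of probabilities: %g\n', sum(prob));
fprintf('area under pdf bars:  %g\n', sum(dens .* diff(edges)));
```

The same name-value pair works when plotting, e.g. histogram(x, 'Normalization', 'pdf').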
Image Analyst on 29 Feb 2016
As for MSE versus MAD, my statistician and I prefer the Median Absolute Deviation over Mean Squared Error, RMSE, or the Mean (Average) Absolute Deviation. It seems closer to what people expect and is less affected by how large an individual deviation is. With RMSE, AAD, or especially MSE, a single really big outlier can throw the metric way off from what it would be if just that one point were ignored. The Median Absolute Error is well behaved and tolerates outliers without making the metric go haywire.
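
A toy illustration of that robustness claim, assuming made-up residuals and taking "median absolute error" to mean the median of the absolute residuals:

```matlab
err = [0.1 -0.2 0.15 -0.1 0.05];   % assumption: toy residuals (fit minus data)
errOutlier = [err 10];             % same residuals plus one large outlier

mseClean = mean(err.^2);
mseOut   = mean(errOutlier.^2);
medClean = median(abs(err));
medOut   = median(abs(errOutlier));

fprintf('MSE:              %.4f -> %.4f\n', mseClean, mseOut);
fprintf('Median abs error: %.4f -> %.4f\n', medClean, medOut);
```

The single outlier multiplies the MSE by several orders of magnitude, while the median absolute error barely moves.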
Sepp on 29 Feb 2016
Edited: Sepp on 29 Feb 2016
Thanks a lot. Is normalization required before searching for the best-fitting distribution?
And one last small question: if I want to create a histogram of values (that is, the empirical distribution), how should I choose the bin width?
Image Analyst on 29 Feb 2016
Edited: Image Analyst on 29 Feb 2016
No, only if you want the area under the curve to be 1, as it is for a regular probability density function.
The bin width depends on what kind of resolution you want in the x direction. You might not want to do so many bins that each bin has only 1 or 0 counts in it, but other than that, it's up to you.
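
For comparison, here is a sketch of a few ways to set the bins with histcounts. The synthetic data and the width of 0.5 are assumptions, and the Freedman-Diaconis rule ('fd') is shown only as one built-in automatic option, not as a recommendation from this thread.

```matlab
rng(1);
x = randn(500, 1);                 % assumption: synthetic feature values

[~, edgesAuto] = histcounts(x);                          % default automatic binning
[~, edgesFD]   = histcounts(x, 'BinMethod', 'fd');       % Freedman-Diaconis rule
[nFixed, edgesFixed] = histcounts(x, 'BinWidth', 0.5);   % explicit bin width

fprintf('auto: %d bins, fd: %d bins, width 0.5: %d bins\n', ...
    numel(edgesAuto) - 1, numel(edgesFD) - 1, numel(edgesFixed) - 1);

% Check the rule of thumb above: how many bins hold only 0 or 1 counts?
fprintf('bins with 0 or 1 counts at width 0.5: %d\n', sum(nFixed <= 1));
```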


More Answers (0)

Asked: 29 Feb 2016

Edited: 29 Feb 2016
