Minimize error between data distribution and expected distribution
Hi all,
I have three sets of data, which are expected to behave as follows:
1) 1st data-block to approach a Gaussian distribution with mu = 0 and sigma = 1;
2) 2nd data-block to approach a Gaussian distribution with mu = 0 and sigma = .8;
3) 3rd data-block to approach a Gaussian distribution with mu = 0 and sigma = .5;
Each data-block has only a limited number of representations (generally between 2048 and 8192) and, because of some filtering effects introduced by the specific code I use, it will not exactly match the corresponding expected distribution.
The point is that, whatever manipulation this implies, I want each data-block to minimize the discrepancy between its actual and its expected distribution. Note that I won't increase the number of representations, for reasons I will not explain in detail.
Generally, the first data-block, compared with the standard Gaussian distribution, looks like the following figure:

I was thinking to use lsqcurvefit for this purpose.
What would you suggest?
Answers (1)
Wouter
on 20 Mar 2013
Do you know this function?
histfit
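For reference, a minimal sketch of how histfit might be used here. The data is synthetic (randn) because the actual data-block is not shown, and the bin count of 50 is an arbitrary choice. histfit and fitdist both need the Statistics Toolbox.

```matlab
% Synthetic stand-in for the first data-block (assumption: real data not shown).
rng(0);                          % fixed seed so the sketch is repeatable
x = randn(4096, 1);              % nominally mu = 0, sigma = 1

nbins = 50;                      % arbitrary bin count, for illustration only
histfit(x, nbins, 'normal');     % histogram (bars) with the fitted normal density (line)

pd = fitdist(x, 'Normal');       % same normal fit, returned as a distribution object
fprintf('fitted mu = %.4f, sigma = %.4f\n', pd.mu, pd.sigma);
```

Note that the line drawn by histfit is the normal fitted to the data, not the expected N(0, 1) itself, so the two can differ slightly.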
6 comments
PEF
on 20 Mar 2013
Wouter
on 20 Mar 2013
Still not very clear to me; it sounds a bit counterintuitive.
1) you create your own data
2) your data has noise
3) your data does not meet your expectations (i.e. it is not a perfect Gaussian distribution)
4) you want to change your data?
As for the blue bars not matching the red line exactly: this is always the case with noisy data, right? You could try using wider bins in your histogram to visually smooth the mismatches out, but they will still be there.
PS: Sorry if I misunderstand your question :) It is rather confusing.
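To illustrate the point about bin width, here is a rough sketch that measures the bar-versus-curve mismatch for a few bin counts. The data is synthetic, the bin counts (200, 50, 20) and the range of plus or minus 4 sigma are arbitrary assumptions, and histcounts needs R2014b or later.

```matlab
% Synthetic stand-in data (assumption: the real data-block is not available).
rng(1);
sigma = 1;
x = sigma * randn(4096, 1);

binCounts = [200 50 20];                 % from narrow bins to wide bins
rmsErr = zeros(size(binCounts));
for k = 1:numel(binCounts)
    edges = linspace(-4*sigma, 4*sigma, binCounts(k) + 1);
    % Bar heights scaled as a density so they are comparable to normpdf.
    h = histcounts(x, edges, 'Normalization', 'pdf');
    centers = (edges(1:end-1) + edges(2:end)) / 2;
    % RMS gap between the bars and the expected Gaussian at the bin centers.
    rmsErr(k) = sqrt(mean((h - normpdf(centers, 0, sigma)).^2));
end
disp([binCounts; rmsErr]);               % first row: bin counts, second row: RMS gap
```

Wider bins average over more points, so the bars scatter less around the curve, but the data itself is unchanged.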
PEF
on 20 Mar 2013
Shashank Prasanna
on 21 Mar 2013
I agree with Wouter: it does seem counter-intuitive to modify your data to fit an ideal distribution.
In most situations, the reason you fit a distribution is so that you can generate random samples from it, and the fitted distribution is the closest Gaussian you can get.
If you can provide some details about what kind of filtering you are doing that induces the non-normality, there may be a solution that is better suited.
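As a hedged sketch of that workflow: fit a normal distribution, check the data against the expected one, and draw new samples from the fit. The data here is synthetic and stands in for the second data-block (expected sigma = 0.8); the use of kstest as the check is my own choice, not something stated in the question. fitdist, makedist, kstest and random are Statistics Toolbox functions.

```matlab
% Synthetic stand-in for the second data-block (assumption: real data not shown).
rng(2);
sigmaExpected = 0.8;
x = sigmaExpected * randn(2048, 1);

pd = fitdist(x, 'Normal');               % closest Gaussian in the maximum-likelihood sense
target = makedist('Normal', 'mu', 0, 'sigma', sigmaExpected);

% One-sample Kolmogorov-Smirnov test against the expected distribution.
% h = 0 means the test does not reject it at the default 5% level.
[h, p] = kstest(x, 'CDF', target);
fprintf('fitted sigma = %.4f, KS reject = %d, p = %.3f\n', pd.sigma, h, p);

xNew = random(pd, 2048, 1);              % fresh samples drawn from the fitted distribution
```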
You could try to change individual datapoints after your filtering in order to update your data; this will change the blue bars. For example, find a blue bar that is too high and change one of its datapoints to a value that lies in a blue bar that is too low (compared to the red line). This does, however, change your data and will render step 2)treat_with_piece_of_code useless.
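If changing the data really is acceptable, a blunter alternative to moving points bar by bar is a rank-based remapping: keep the ordering of the samples but replace each value with the matching quantile of the expected Gaussian. This is a swapped-in technique, not something from the original question. The data below is synthetic, the added distortion is made up for illustration, and the same caveat applies: it overwrites whatever the filtering did.

```matlab
% Synthetic stand-in for the third data-block with a made-up mild distortion.
rng(3);
sigma = 0.5;
n = 2048;
x = sigma * randn(n, 1) + 0.05 * rand(n, 1);

% Rank of every sample (1 = smallest).
[~, order] = sort(x);
ranks = zeros(n, 1);
ranks(order) = (1:n)';

% Map ranks to probabilities strictly inside (0, 1), then to Gaussian quantiles.
p = (ranks - 0.5) / n;
xAdj = norminv(p, 0, sigma);             % same ordering as x, Gaussian-shaped values

fprintf('std before = %.4f, std after = %.4f\n', std(x), std(xAdj));
```

The remapped block follows the expected shape almost exactly, but only the rank order of the original samples survives.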
However, it makes more sense to find a better fit to the histogram, i.e. to change the red line. lsqcurvefit would only be useful if you want to update the red line (the fit).
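A minimal sketch of that use of lsqcurvefit, fitting the mean and sigma of a Gaussian density to the bar heights. The data is synthetic, and the 40 bins, the starting guess and the lower bound on sigma are assumptions. lsqcurvefit and optimoptions come from the Optimization Toolbox; histcounts needs R2014b or later.

```matlab
% Synthetic stand-in for the first data-block (assumption: real data not shown).
rng(4);
x = randn(4096, 1);

% Histogram heights scaled as a density, evaluated at the bin centers.
edges = linspace(-4, 4, 41);
heights = histcounts(x, edges, 'Normalization', 'pdf');
centers = (edges(1:end-1) + edges(2:end)) / 2;

% Gaussian density with parameters p = [mu, sigma].
gauss = @(p, t) exp(-(t - p(1)).^2 ./ (2 * p(2)^2)) ./ (p(2) * sqrt(2*pi));

p0 = [0 1];                              % starting guess
lb = [-Inf 1e-3];                        % keep sigma positive
ub = [Inf Inf];
opts = optimoptions('lsqcurvefit', 'Display', 'off');
pFit = lsqcurvefit(gauss, p0, centers, heights, lb, ub, opts);

fprintf('least-squares mu = %.4f, sigma = %.4f\n', pFit(1), pFit(2));
```

This moves the red line toward the bars; the bars themselves stay where they are.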
PEF
on 21 Mar 2013