Minimize error between data distribution and expected distribution

1 visualización (últimos 30 días)

Mostrar comentarios más antiguos

PEF el 20 de Mzo. de 2013

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/67928-minimize-error-between-data-distribution-and-expected-distribution

Hi all,

I have a 3 set of data which are expected to:

1) 1st data-block to approach a Gaussian distribution with mu = 0 and sigma = 1;

2) 2nd data-block to approach a Gaussian distribution with mu = 0 and sigma = .8;

3) 3rd data-block to approach a Gaussian distribution with mu = 0 and sigma = .5;

Each data-block has only a limited number of representations (generally between 2048 and 8192) and because of some filter effects drawn by the specific code I use, they will not exactly match the corresponding expected distribution.

The point is that, although what it implies in terms of manipulation, I want each data-block to minimize the discrepancy between actual and expected distribution. It's to be remarked that I won't increase the number of representations, due to some need I will not explain in detail.

Generally, the first data-block, respect to the normal Gaussian distribution, looks like the followinf figure:

I was thinking to use lsqcurvefit for this purpose.

What would you suggest?

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Respuestas (1)

Wouter el 20 de Mzo. de 2013

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/67928-minimize-error-between-data-distribution-and-expected-distribution#answer_79324

Abrir en MATLAB Online

Do you know this function:

histfit

6 comentarios
Mostrar 4 comentarios más antiguosOcultar 4 comentarios más antiguos

Wouter el 21 de Mzo. de 2013

Editada: Wouter el 21 de Mzo. de 2013

You could try to change individual datapoints after your filteringset in order to update your datapoints; this will change the blue bars. For example; find a blue bar that is too high; change one of those datapoints into a value which lies in a blue bar that too low (compared to the red line). This does however changes your data and will render step 2)treat_with_piece_of_code useless.

However it makes more sense to find a better fit to the histogram; i.e. change the red line. Lsqcurvefit would only be useful if you would like to update the red line (fit)

PEF el 21 de Mzo. de 2013

Abrir en MATLAB Online

I think that you started to get the point :)

The major concern is that I don't want to find the best fit to the data, but the best data fitting the standard normal distribution: for some reasons I need my data to fit gaussian distribution with mean 0 and sigma 1.

At the moment I'm proceeding this way:

 data = randn(4096,1);
 [f_p,m_p] = hist(data,128);
 f_p = f_p/trapz(m_p,f_p);
 x_th = min(data):.001:max(data);
 y_th = normpdf(x_th,0,1);
 f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');
 figure(1)
 bar(m_p,f_p)
 hold on
 plot(x_th,y_th,'r','LineWidth',2.5)
 grid on
 hold off
 figure(2)
 bar(m_p,f_p_th)
 hold on
 plot(x_th,y_th,'r','LineWidth',2.5)
 grid on
 hold off

Now, I would proceed with calculating a scaling factor

sf = abs(f_p_th,f_p);

and I consequently scale the data in accordance to the scale factor of the corresponding bin; for example:

if data(1) falls within bin(1) --> scale with sf(1) and so on.

I do think that my question is no counter-intuitive, it's only reversing the standard procedure of fitting a distribution to a given set of data.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Categorías

AI and Statistics Statistics and Machine Learning Toolbox Probability Distributions Continuous Distributions Triangular Distribution

Más información sobre Triangular Distribution en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by