normalizing a normal distribution

Question

Maksym Zawrotny on 8 Jan 2019


https://la.mathworks.com/matlabcentral/answers/438819-normalizing-a-normal-distribution

Edited: John D'Errico on 18 Mar 2023

I just need to plot a Gaussian distribution given a mean (mu) and standard deviation (sigma). I used:

gauss1 = normpdf(x, mu, sigma)

But its output is not normalized. I would like it to be normalized as a probability density function. I need to plot it next to a histogram normalized by:

histogram(dataSerie, 'Normalization', 'probability')

After using normpdf it looked like this (the plot is not reproduced here):

It is clear that the integral from 0 to 1 is not equal to 1, so it is not a probability density function.


Answer 1

John D'Errico on 18 Mar 2023


https://la.mathworks.com/matlabcentral/answers/438819-normalizing-a-normal-distribution#answer_1195965

Edited: John D'Errico on 18 Mar 2023


You don't show the actual mean and standard deviation used for that plot, so I'll make a wild guess.

mu = 0.55;

S = 0.18;

fplot(@(x) normpdf(x,mu,S),[0,1])

That seems pretty close to the plot shown. You want to use a TRUNCATED normal distribution, here truncated to the interval [0,1]. The simplest way to achieve what you want is to use the truncate function, but that would not give any real understanding of what should be done.

help truncate
--- help for prob.NormalDistribution/truncate ---

 TRUNCATE Truncate probability distribution to an interval.
     T = TRUNCATE(P,LOWER,UPPER) takes a probability distribution P and
     returns another probability distribution T that represents P
     truncated to the interval with lower limit LOWER and upper limit
     UPPER. The pdf of T is zero outside the interval. Inside the
     interval it is equal to the pdf of P, but divided by the probability
     assigned to that interval by P.
 
     Example: Create normal distribution truncated to the interval [-2,2].
         p = makedist('normal')
         q = truncate(p,-2,2)
 
     See also Truncation, IsTruncated.

Help for prob.NormalDistribution/truncate is inherited from superclass prob.TruncatableDistribution

    Documentation for prob.NormalDistribution/truncate
       doc prob.NormalDistribution/truncate
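
As a minimal sketch of the truncate route mentioned above, assuming the Statistics and Machine Learning Toolbox is available and reusing the guessed values of mu and S, the truncated distribution object can be evaluated directly with pdf. The variable names pd, t, y and areaCheck are only illustrative.

```matlab
% Sketch: build the truncated normal directly with makedist and truncate.
mu = 0.55;                                  % guessed mean (assumption from above)
S  = 0.18;                                  % guessed standard deviation (assumption)
pd = makedist('Normal','mu',mu,'sigma',S);  % untruncated normal distribution
t  = truncate(pd,0,1);                      % same distribution, truncated to [0,1]

x = linspace(0,1,201);
y = pdf(t,x);                               % pdf of t, already rescaled on [0,1]
plot(x,y)

% The area under the truncated pdf on [0,1] should be very close to 1.
areaCheck = integral(@(x) pdf(t,x),0,1);
```

The rest of this answer does the same rescaling by hand, which shows where the scale factor comes from.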

First, what is the area under that curve? The very best way is to use normcdf. So we could do this:

format long g
CDF01 = normcdf([0,1],mu,S)
CDF01 = 1×2
       0.00112321990250323         0.993790334674224

So a normal distribution with those parameters has an area of 0.0011 below x==0, and an area of 0.9938 below x==1. The area between those two points must be the difference.

A01 = diff(CDF01)
A01 = 
         0.992667114771721

Now we can normalize the TRUNCATED normal PDF by dividing by A01. It is very close to 1, because that PDF has very little mass outside of those limits.

fplot(@(x) normpdf(x,mu,S)/A01,[0,1])

Not very different, I know. I only had to divide by 0.9927, so I doubt you would see the difference.

Could I have normalized the PDF in a different way? OF COURSE! For example, I could have computed that area using integral.

A01num = integral(@(x) normpdf(x,mu,S),0,1)
A01num = 
         0.992667114771721

As you can see, integral agrees completely. (Off-topic: Be careful though, as integrating normal PDFs can be a problem some of the time. This is perhaps the most common error people seem to make when computing an integral. If the standard deviation of the normal is very small, so the normal distribution almost looks like a Dirac delta, then integral can get that answer wrong.)

But as you should see, the area under that curve is the same as what normcdf said it should be.
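
To illustrate the off-topic caution above, here is a hedged sketch with a hypothetical, very small standard deviation (Snarrow is my own choice, not a value from the question). The plain call to integral may or may not find the spike, so no result is claimed for it. Passing the 'Waypoints' option near the mean is one way to point the integrator at the peak, and normcdf avoids the problem entirely.

```matlab
% Sketch: integrating a very narrow normal pdf over [0,1].
mu      = 0.55;
Snarrow = 1e-6;                       % hypothetical tiny sigma, for illustration only
f = @(x) normpdf(x,mu,Snarrow);

Aplain = integral(f,0,1);             % may miss the spike; result not guaranteed

% Waypoints force the integrator to sample around the peak.
Away = integral(f,0,1,'Waypoints',mu + Snarrow*[-10 0 10]);

% normcdf does not rely on numerical quadrature of the pdf at all.
Acdf = diff(normcdf([0,1],mu,Snarrow));
```

With the waypoints in place, Away and Acdf should both be essentially 1, since almost all of the mass lies inside [0,1].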

Can we sample from this truncated distribution? Of course. Again, it would have been simplest to use truncate.
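
A minimal sketch of that truncate-based route, assuming the Statistics and Machine Learning Toolbox and the guessed mu and S from above. The seed and the sample count are arbitrary choices for illustration.

```matlab
% Sketch: sample directly from the truncated distribution object.
rng(1)                                % fixed seed, only so the sketch is repeatable
mu = 0.55;                            % guessed mean (assumption)
S  = 0.18;                            % guessed standard deviation (assumption)
t  = truncate(makedist('Normal','mu',mu,'sigma',S),0,1);

Xt = random(t,1,1e5);                 % 1-by-1e5 samples, all inside [0,1]
histogram(Xt,'Normalization','pdf')
```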

One way to do it by hand would be to generate uniformly distributed random numbers in the interval between the two CDF values:

CDF01
CDF01 = 1×2
       0.00112321990250323         0.993790334674224

Now invert those values through the untruncated (and unscaled) normal CDF. This inverse procedure is perhaps the most common way to sample from a distribution. It applies as long as the inverse CDF is available as a function, and it is sufficiently fast to compute.

R = rand(1,1e7)*diff(CDF01) + CDF01(1);

X = norminv(R,mu,S);

histogram(X,'normalization','pdf')

hold on

fplot(@(x) normpdf(x,mu,S)/A01,[0,1])

hold off

legend('Histogram of 1e7 samples','Scaled, truncated pdf')

As you can see, the histogram overlays on top of the line in red perfectly.

The other common method for sampling from such a distribution would be a rejection scheme. Since there is so little mass outside of those limits for this particular problem, rejection would also be quite efficient. You simply sample from the untruncated normal distribution, then toss away any samples that lie outside of [0,1]. Your rejection rate here would be well under 1% of the samples.
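
The rejection scheme just described can be sketched as follows, again assuming the guessed mu and S. The names Xall, keep, Xrej and rejectRate are illustrative, and the seed is fixed only for repeatability.

```matlab
% Sketch: rejection sampling from the normal truncated to [0,1].
rng(1)                                % fixed seed, only so the sketch is repeatable
mu = 0.55;                            % guessed mean (assumption)
S  = 0.18;                            % guessed standard deviation (assumption)
N  = 1e6;                             % number of candidate samples

Xall = mu + S*randn(1,N);             % samples from the untruncated normal
keep = Xall >= 0 & Xall <= 1;         % accept only samples inside [0,1]
Xrej = Xall(keep);

% Fraction thrown away; expected to be near 1 - A01, roughly 0.73%.
rejectRate = 1 - numel(Xrej)/N;
```

Note that the number of accepted samples is random. If an exact count is needed, one option is to oversample slightly and keep the first required number of accepted values.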
