normalizing a normal distribution

Maksym Zawrotny on 8 Jan 2019
Edited: John D'Errico on 18 Mar 2023
I just need to plot a Gaussian distribution given the mean (mu) and standard deviation (sigma). I used:
gauss1 = normpdf(x, mu, sigma)
But its output is not normalized. I would like it to be normalized as a probability density function. I need to plot it next to a histogram that I normalized with:
histogram(dataSerie, 'Normalization', 'probability')
After using normpdf it looked like:
It is clear that the integral from 0 to 1 is not equal to 1, so it is not a probability density function.

Answers (1)

John D'Errico on 18 Mar 2023
Edited: John D'Errico on 18 Mar 2023
You don't show the actual mean and standard deviation used for that plot, so I'll make a wild guess.
mu = 0.55;
S = 0.18;
fplot(@(x) normpdf(x,mu,S),[0,1])
That seems pretty close to the plot shown. You want to use a TRUNCATED normal distribution, truncated to the interval [0,1]. The simplest way to achieve what you want is to use the truncate function, but that would not give any real understanding of what should be done.
help truncate
--- help for prob.NormalDistribution/truncate ---

  TRUNCATE Truncate probability distribution to an interval.
     T = TRUNCATE(P,LOWER,UPPER) takes a probability distribution P and
     returns another probability distribution T that represents P
     truncated to the interval with lower limit LOWER and upper limit
     UPPER. The pdf of T is zero outside the interval. Inside the
     interval it is equal to the pdf of P, but divided by the probability
     assigned to that interval by P.

     Example: Create normal distribution truncated to the interval [-2,2].
        p = makedist('normal')
        q = truncate(p,-2,2)

     See also Truncation, IsTruncated.

  Help for prob.NormalDistribution/truncate is inherited from superclass prob.TruncatableDistribution

  Documentation for prob.NormalDistribution/truncate
     doc prob.NormalDistribution/truncate
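For reference, the truncate route might look like the sketch below. It assumes the Statistics and Machine Learning Toolbox is available and reuses the guessed values of mu and S from this answer; the variable names pd, t, and areaT are introduced here only for illustration.

```matlab
% Guessed parameters from this answer (not the asker's actual values)
mu = 0.55;
S  = 0.18;

% Untruncated normal distribution object
pd = makedist('Normal','mu',mu,'sigma',S);

% Truncate to [0,1]; the pdf is rescaled so it integrates to 1 on that interval
t = truncate(pd,0,1);

% Evaluate and plot the truncated pdf
x = linspace(0,1,201);
y = pdf(t,x);
plot(x,y)

% Sanity check: area under the truncated pdf over [0,1]
areaT = integral(@(x) pdf(t,x),0,1);
```

The value of areaT should come out very close to 1, which is the property the question asks for.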
First, what is the area under that curve? The very best way is to use normcdf. So we could do this:
format long g
CDF01 = normcdf([0,1],mu,S)
CDF01 = 1×2
0.00112321990250323 0.993790334674224
So a normal distribution with those parameters has an area of 0.0011 below x==0, and it has an area of 0.9938, below x==1. So the area between those two points must be the difference.
A01 = diff(CDF01)
A01 =
0.992667114771721
Now we can normalize the TRUNCATED normal PDF by dividing by A01. It is very close to 1, because that PDF has very little mass outside of those limits.
fplot(@(x) normpdf(x,mu,S)/A01,[0,1])
Not very different, I know. I only had to divide by 0.9927, so I doubt you would see the difference.
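Written out, the scaling used above corresponds to the density below. The symbols a, b, f_T, f, and F are notation introduced here: a = 0 and b = 1 are the truncation limits, f is the untruncated normal PDF (normpdf), and F is the untruncated normal CDF (normcdf), so the denominator is A01.

```latex
f_T(x) =
\begin{cases}
\dfrac{f(x;\mu,\sigma)}{F(b;\mu,\sigma) - F(a;\mu,\sigma)}, & a \le x \le b, \\[1ex]
0, & \text{otherwise.}
\end{cases}
```

With the guessed parameters, the denominator is approximately 0.9927, which is why the rescaled curve is hard to tell apart from the original.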
Could I have normalized the PDF in a different way? OF COURSE! For example, I could have computed that area using integral.
A01num = integral(@(x) normpdf(x,mu,S),0,1)
A01num =
0.992667114771721
As you can see, integral agrees completely. (Off-topic: Be careful though, as integrating normal PDFs can be a problem some of the time. This is perhaps the most common error people seem to make when computing an integral. If the standard deviation of the normal is very small, so that the PDF looks almost like a Dirac delta, then integral can get that answer wrong.)
But as you should see, the area under that curve is the same as what normcdf said it should be.
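The narrow-spike caveat can be sketched as follows. The parameters here are hypothetical, chosen only to make the PDF extremely narrow; they are not from the question. One possible workaround, shown as an assumption rather than the only fix, is to integrate just the region within a few standard deviations of the mean.

```matlab
% Hypothetical parameters chosen to illustrate the pitfall
mu = 0.5;
S  = 1e-6;

% Reference value from the CDF; essentially 1 because all the mass lies in [0,1]
Acdf = diff(normcdf([0,1],mu,S));

% Naive call: the integrand is a very narrow spike, so the quadrature
% nodes may never land on it and the result can be badly wrong
Anaive = integral(@(x) normpdf(x,mu,S),0,1);

% Workaround: integrate only where the mass is, within 10 standard
% deviations of the mean, clipped to the original limits
lo = max(0, mu - 10*S);
hi = min(1, mu + 10*S);
Alocal = integral(@(x) normpdf(x,mu,S),lo,hi);

fprintf('normcdf: %.12f  naive: %.12g  local: %.12f\n', Acdf, Anaive, Alocal)
```

When normcdf is available it remains the safer choice, since it does not depend on where the quadrature happens to sample the integrand.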
Can we sample from this truncated distribution? Of course. Again, it would have been simplest to use truncate.
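A minimal sketch of the truncate route for sampling, assuming the Statistics and Machine Learning Toolbox and the guessed mu and S from this answer. The sample count and the names t and X are choices made here for illustration.

```matlab
% Guessed parameters from this answer
mu = 0.55;
S  = 0.18;

% Normal distribution truncated to [0,1]
t = truncate(makedist('Normal','mu',mu,'sigma',S),0,1);

% Draw samples directly from the truncated distribution object
X = random(t,1,1e5);

% Compare the samples against the truncated pdf
histogram(X,'Normalization','pdf')
hold on
fplot(@(x) pdf(t,x),[0,1])
hold off
```

This hides the mechanics, which is why the manual approaches are worth walking through as well.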
One way would be to generate uniformly distributed random numbers in the interval
CDF01
CDF01 = 1×2
0.00112321990250323 0.993790334674224
Now invert those values through the untruncated (and unscaled) normal CDF. This inverse procedure is perhaps the most common way to sample from a distribution. It applies as long as the inverse CDF is available as a function, and it is sufficiently fast to compute.
R = rand(1,1e7)*diff(CDF01) + CDF01(1);
X = norminv(R,mu,S);
histogram(X,'normalization','pdf')
hold on
fplot(@(x) normpdf(x,mu,S)/A01,[0,1])
hold off
legend('Histogram of 1e7 samples','Scaled, truncated pdf')
As you can see, the histogram overlays on top of the line in red perfectly.
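The overlay matches because 'Normalization','pdf' scales the bars so that their total area is 1. The question instead used 'probability', where the bar heights sum to 1, so a density curve will not line up with those bars unless it is multiplied by the bin width. The sketch below assumes uniform bin widths and uses generated stand-in data, since the asker's dataSerie is not shown.

```matlab
% Guessed parameters from this answer
mu  = 0.55;
S   = 0.18;
A01 = diff(normcdf([0,1],mu,S));

% Stand-in data (hypothetical): inverse-CDF samples from the truncated normal
U = rand(1,1e5)*A01 + normcdf(0,mu,S);
dataSerie = norminv(U,mu,S);

% 'probability' normalization: bar heights sum to 1
h = histogram(dataSerie,'Normalization','probability');
binW = h.BinWidth;

% Scale the density by the bin width so it is comparable to the bar heights
hold on
fplot(@(x) normpdf(x,mu,S)/A01*binW,[0,1])
hold off

% Numeric check of the two normalizations, without graphics
p = histcounts(dataSerie,'Normalization','probability');
[d,edges] = histcounts(dataSerie,'Normalization','pdf');
sumP  = sum(p);               % total of the bar heights
areaD = sum(d.*diff(edges));  % total bar area
```

Switching the histogram to 'pdf' normalization, as done above, avoids the bin-width factor altogether.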
The other common method for sampling from such a distribution would be a rejection scheme. Since there is so little mass outside of those limits for this particular problem, rejection would also be quite efficient. You simply sample from the untruncated normal distribution, then toss away any samples that lie outside of [0,1]. Your rejection rate here would be well under 1% of the samples.
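The rejection scheme could be sketched like this, again with the guessed mu and S. The names N, Xall, keep, and acceptRate are introduced here for illustration.

```matlab
% Guessed parameters from this answer
mu = 0.55;
S  = 0.18;
N  = 1e6;

% Sample from the untruncated normal distribution
Xall = mu + S*randn(1,N);

% Keep only the samples that fall inside [0,1]
keep = Xall >= 0 & Xall <= 1;
X = Xall(keep);

% The acceptance rate should be close to the area A01 computed from normcdf
acceptRate = numel(X)/N;
A01 = diff(normcdf([0,1],mu,S));
fprintf('accepted %.4f of samples, expected about %.4f\n', acceptRate, A01)
```

Note that this returns slightly fewer than N samples. If an exact count is needed, one option is to oversample and keep the first N accepted values.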

Release

R2017b
