Function 'pdf' doesn't return pdf values
I have a problem with the function pdf. I have this code:
estim_KDE = fitdist(data, 'kernel');
x = low:(abs(low-high)/(obs-1)):high;
y = pdf(estim_KDE,x);
plot(x,y,'r'), xlabel('xxx'), ylabel('yyy'),...
title('title'), legend('xyz');
but the function pdf returns values that make no sense to me: they are neither between 0 and 1, nor numbers between 0 and 1 multiplied by the length of x (I expected one of these two from pdf). For example, with length(x) = 1000 or more, it gives me numbers like 20.something or 5.something. This happens for every distribution I have tried (always fitted with fitdist). I only discovered the problem because I plotted a histogram of the frequencies against the kernel density estimate.
Can someone help me, please?
Answers (2)
John D'Errico
on 6 Feb 2015
Edited: John D'Errico
on 6 Feb 2015
I think you are under a common misconception about the PDF of a random variable. My guess is that the letter P in PDF is what confuses people; and yes, it is called a Probability Density Function.
The thing is, it does not actually return a probability. Consider a PDF with a very narrow spread. Here, a Gaussian with mean 0 and std deviation of 0.001.
normpdf(0,0,.001)
ans =
398.94
See that the PDF at 0 is 398.94, vastly larger than 1.
What matters is that the PDF integrates to 1 over its domain.
It is the CDF that actually returns something you can interpret as a probability. You can also integrate the PDF over an interval to compute a probability, but that is exactly what differences of the CDF give you.
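As a rough check of the two points above, here is a minimal sketch using the same narrow Gaussian. The integration limits and the interval [a, b] are arbitrary choices made for illustration, and normpdf and normcdf assume the Statistics and Machine Learning Toolbox is installed.

```matlab
mu = 0; sigma = 0.001;               % same narrow Gaussian as above

% The density at the mean is far larger than 1 ...
peak = normpdf(mu, mu, sigma);

% ... yet the area under the curve is 1. Limits of +/- 10 sigma are an
% arbitrary choice that captures essentially all of the mass.
area = integral(@(t) normpdf(t, mu, sigma), mu - 10*sigma, mu + 10*sigma);

% A probability comes from the CDF: P(a <= X <= b) = F(b) - F(a).
a = -sigma; b = sigma;               % illustrative interval, one sigma either side
p = normcdf(b, mu, sigma) - normcdf(a, mu, sigma);

fprintf('peak = %.2f, area = %.6f, P(a<=X<=b) = %.4f\n', peak, area, p);
```

The value of area should come out very close to 1, while p is a genuine probability between 0 and 1, even though peak is in the hundreds.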
4 Comments
John D'Errico
on 10 Feb 2015
A plot of the PDF IS a graph of the relative frequency, to the extent that this makes any sense. Why do you care about the y-axis scaling? If that is what bothers you, then just turn off the y-axis labels.
The fact is, you CAN create a histogram of the frequency in each "bin". You would do this either by integrating the PDF over that sub-interval, or by subtracting successive values of the CDF, to get the relative fraction that would fall in that bin.
If you used a tiny enough bin interval, then the curve would look very nice and smooth. But the probability of a point falling in any single such tiny bin would be vanishingly small. So the y-axis scaling would be all tiny numbers. This reflects the fact that any single number has probability ZERO of arising.
So, just plot the PDF, and don't worry about the y-axis, or turn it off completely.
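The bin-probability idea described above can be sketched as follows. The data here is synthetic (the asker's data is not shown in the thread), the bin count of 20 is arbitrary, and the 'Normalization','pdf' option of histogram is used so that the bars share the y-scale of the density curve; treat this as an illustration under those assumptions rather than the one right way to do it.

```matlab
rng(0);                                   % reproducible synthetic data (assumption)
data = 0.05 * randn(1000, 1);             % stand-in for the asker's data

estim_KDE = fitdist(data, 'kernel');      % kernel density estimate, as in the question
edges = linspace(min(data), max(data), 21);   % 20 bins; the count is arbitrary

% Probability of each bin = difference of successive CDF values.
binProb = diff(cdf(estim_KDE, edges));

% Dividing by the bin width converts a bin probability back to a density,
% which is why density values can exceed 1 when the bins are narrow.
binWidth = diff(edges);
binDensity = binProb ./ binWidth;

% Overlay: a histogram normalized as a density shares the PDF's y-scale.
x = linspace(min(data), max(data), 500);
histogram(data, edges, 'Normalization', 'pdf'); hold on
plot(x, pdf(estim_KDE, x), 'r', 'LineWidth', 1.5); hold off
xlabel('x'); ylabel('density'); legend('histogram (pdf)', 'kernel estimate');
```

Every entry of binProb lies between 0 and 1, whereas binDensity (like the output of pdf) is a probability per unit of x and is not bounded by 1.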
Rob Keeton
on 3 Sep 2019
Multiply by the bandwidth of the pdf.
y = pdf(estim_KDE,x)*estim_KDE.BandWidth;
