Numerical integration and NaN

I want to calculate the following integral:
integral( pdf(data_one) / pdf(data_two) )
I managed to do this, but I had to resort to a hack to avoid divisions by zero. This hack works but creates deviations that annoy me - what is the proper way to do this?
data_axis = 1:5;
data_one = [1,2,3,4,5];
data_two = [1,1,1,1,5];
% Calculate probability densities
pdf_one = histc(data_one,data_axis); % [0.2,0.2,0.2,0.2,0.2]
pdf_two = histc(data_two,data_axis); % [0.8,0.0,0.0,0.0,0.2]
% HACK: Histc may have generated 0-buckets
%pdf_one = (pdf_one + 0.1);
%pdf_two = (pdf_two + 0.1);
% Normalize the result
pdf_one = pdf_one./sum(pdf_one);
pdf_two = pdf_two./sum(pdf_two);
complicated_integral = trapz(pdf_one./pdf_two);

Respuestas (1)

Walter Roberson
Walter Roberson el 11 de Mayo de 2011

1 voto

Probability Density is not defined for discrete distributions.
Not every probability distribution has a density function: the distributions of discrete random variables do not; nor does the Cantor distribution, even though it has no discrete component, i.e., does not assign positive probability to any individual point.
and
If a probability distribution admits a density, then the probability of every one-point set {a} is zero; the same holds for finite and countable sets.
This implies that you are asking to calculate something which is not defined, or else that your calculation is incorrect, perhaps attempting to sample the counts at a point rather than over a range.

5 comentarios

James
James el 12 de Mayo de 2011
Fair enough.
The code I posted above is severely shortened to not confuse the reader. The actual data I use is much larger so that calculating a pdf should be reasonably exact. But I get your point.
But how should I tackle this problem instead? I need to calculate the probabilty distribution of two finite datasets and then calculate the integral as shown above. I am unsure what to do here instead of histc and trapz.
Walter Roberson
Walter Roberson el 12 de Mayo de 2011
Standard "probability density function" is not defined for a finite dataset.
You are trying to form an integral. Integration is an operation that inherently applies only to infinite sets. If you have a function that is only defined at discrete places, then you either need to use some other kind of calculation entirely or else replace the integral by a discrete approximation such as trapezoid calculation.
James
James el 18 de Mayo de 2011
Thank you for your reply Walter.
But am I not doing exactly what you are suggesting by using trapz()? That is where my problem stems from.
Richard Crozier
Richard Crozier el 18 de Mayo de 2011
If you know what the underlying form of the pdf of the data from some scientific knowledge of the system from which you have sampled the data, could you not fit the distribution to the datasets and integrate the resulting continuous functions? Of course the accuracy of this is doubtful for some distributions, e.g. an exponential distribution.
James
James el 18 de Mayo de 2011
That is a nice idea which I will have to implement anyways for other parts of the project.
Though I fear that in this case, it would ultimately not solve my problem. My main problem currently comes from the fact that I am doing this:
pdf_one./pdf_two
= [0.2,0.2,0.2,0.2,0.2] ./ [0.8,0.0,0.0,0.0,0.2]
= [0.2500, Inf, Inf, Inf, 1.0000]
And then try to trapezoid-integrate over it, which is pretty stupid. Whenever there is no probability of a value appearing in said pdf_two (=0), I am diving by zero.

Iniciar sesión para comentar.

Categorías

Preguntada:

el 11 de Mayo de 2011

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by