How to efficiently generate a random integer within a range from an arbitrary probability distribution

Question

0 votos

I need to generate a random integer within a range from an arbitrary probability distribution, within a loop of 100000 iterations. My implementation works, but I am not sure it is mathematically clean, and it takes forever:

pdf = [ 0.9 0.3 0.003 0.1 0.07 0.0005 0.003 0.15 0.009 0.08 ]; % discrete prob distrib function
cdf = cumsum(pdf); % cumulative distribution function
cdf = cdf / max(cdf); cdf(1) = 0; % normalization
index = ceil(interp1(cdf, [1:numel(pdf)], rand(1)))

Notice that the pdf above is just an example: my actual case is a vector of about 500 numbers.

Here is a different solution, which seems mathematically cleaner, but does not work for my overall problem, and is just as slow as above:

pdf = [ 0.9 0.3 0.003 0.1 0.07 0.0005 0.003 0.15 0.009 0.08 ]; % discrete prob distrib function
cdf = cumsum(pdf); % cumulative distribution function
cdf = cdf - min(cdf); cdf = cdf / max(cdf); % normalization
index = round(interp1(cdf, [1:numel(pdf)], rand(1)))

Is there a more efficient/correct way to do this?

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

the cyclist el 8 de En. de 2017

Editada: the cyclist el 8 de En. de 2017

Abrir en MATLAB Online

1 voto

I think that

index = sum(rand()>cdf)+1;

will be much faster than using interp1 as you do, and will give the same result.

4 comentarios
Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

the cyclist el 8 de En. de 2017

I double-checked by generating 100,000 samples, and the results seem right.

the cyclist el 8 de En. de 2017

Abrir en MATLAB Online

Regarding speed ...

I get roughly equivalent results between my solution and randsample. Note that both of these solutions can be vectorized:

index = sum(rand(N,1)>cdf,2)';

and

index = randsample(1:numel(pdf),N,true,pdf);

These will both generate N = 100,000 samples in a few milliseconds.

I did not try to vectorize your solution, but as it currently stands, it took 5 seconds to generate that many.

Iniciar sesión para comentar.

Answer 2

the cyclist el 8 de En. de 2017

1 voto

Do you have the Statistics and Machine Learning Toolbox? If so, you can do this with the randsample command.

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Paolo Binetti el 8 de En. de 2017

Thank you. No, I do not have this toolbox but I'll see if I can get it.

Iniciar sesión para comentar.

How to efficiently generate a random integer within a range from an arbitrary probability distribution

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

4 comentarios
Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

Más respuestas (1)

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Categorías

Etiquetas

Community Treasure Hunt

How to efficiently generate a random integer within a range from an arbitrary probability distribution

0 comentarios Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

4 comentarios Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

Más respuestas (1)

1 comentario Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Categorías

Etiquetas

Ver también

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

4 comentarios
Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos