Distribution sampling

Lakshman Dontha on 8 Jul 2011
I have 2 million samples of three parameters (a, b, c). The parameters are correlated with each other, and each has a different distribution (neither Gaussian nor logarithmic). I now need to draw 60,000 samples from this data that keep the same correlation and the same distributions. Is there a particular method anyone can suggest?

Answers (1)

Doug Eastman on 8 Jul 2011
I'm not a statistics expert, but a uniformly random subset of the data (sampled without replacement) should approximately preserve the distribution and correlation of the original, and the match gets closer as the subset grows. Here's a way to take a random subset of n elements of an array A:
idx = randperm(numel(A));   % random ordering of all element indices
subset = A(idx(1:n));       % first n of them: n distinct random elements
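When only n indices are needed, randperm also accepts a second argument: randperm(N,k) returns k unique integers selected randomly from 1 to N, so there is no need to take the first n of a full permutation. A minimal sketch; the data and sizes below are illustrative assumptions, not the asker's data:

```matlab
% Illustrative data: 2 million values (linear indexing works on any array)
A = rand(2e6, 1);
n = 60000;                     % number of samples to keep

idx = randperm(numel(A), n);   % n unique random indices from 1:numel(A)
subset = A(idx);               % random subset, drawn without replacement

fprintf('kept %d of %d values\n', numel(subset), numel(A));
```

Both forms give a subset drawn without replacement; the two-argument form just skips the intermediate full permutation.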
Here's an example showing the preserved distribution:
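N = 100000;
n = 10000;
x = randn(N,1)*3 + 12;      % normal, mean 12, std 3
y = randn(N,1)*2 + 2;       % normal, mean 2, std 2
A = [x; y];                 % stacked into one bimodal sample of 2*N values
idx = randperm(numel(A));
subset = A(idx(1:n));
histogram(A, 100, 'Normalization', 'pdf');       % full data
figure
histogram(subset, 100, 'Normalization', 'pdf');  % random subset, same scale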
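The histograms are a visual check. A rough numeric check is to compare summary statistics and the fraction of values falling in each bin, using the same bin edges for both. A minimal sketch that regenerates the example data with a fixed seed (histcounts with the 'Normalization' option is assumed to be available in your release):

```matlab
rng(0);                          % fixed seed so the run is repeatable
N = 100000;
n = 10000;
A = [randn(N,1)*3 + 12; randn(N,1)*2 + 2];   % same two-component data as above
idx = randperm(numel(A), n);
subset = A(idx);

% Compare summary statistics of the full data and the subset
fprintf('mean: %.3f vs %.3f\n', mean(A), mean(subset));
fprintf('std:  %.3f vs %.3f\n', std(A), std(subset));

% Compare the fraction of values per bin, using identical bin edges
edges = linspace(min(A), max(A), 101);       % 100 bins spanning all of A
pA = histcounts(A, edges, 'Normalization', 'probability');
pS = histcounts(subset, edges, 'Normalization', 'probability');
maxBinDiff = max(abs(pA - pS));
fprintf('largest per-bin difference: %.4f\n', maxBinDiff);
```

Small differences are expected: the subset is a random sample, so it matches the full data closely rather than exactly.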
2 comments
Lakshman Dontha on 10 Jul 2011
Thanks, but will it work if A is an m-by-n matrix?
Doug Eastman on 11 Jul 2011
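Sorry, I fixed a typo above. The linear-indexing version (a single index per element) runs on an array of any size, but it selects individual elements rather than samples: on a matrix whose rows are samples, it returns values drawn from all columns at once, so the a, b and c values of one sample are no longer kept together and the correlation between them is lost. To keep each sample intact, index whole rows instead.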
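A small illustration of the difference, using a hypothetical 3-by-2 matrix whose rows are samples: linear indices count down the columns, so they can pick values belonging to different samples, while row indices keep each sample's values together.

```matlab
% Three samples (rows) of two parameters (columns)
A = [1 10;
     2 20;
     3 30];

% Linear indexing walks down the columns: A(:) is [1; 2; 3; 10; 20; 30]
linearIdx = [4 2];
picked = A(linearIdx);        % [10 2]: values from two different samples

% Row indexing keeps each sample's parameters paired
rowIdx = [3 1];
pickedRows = A(rowIdx, :);    % [3 30; 1 10]: samples 3 and 1, intact

disp(picked)
disp(pickedRows)
```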
For example, if A is 1000-by-3 and you want a 100-by-3 subset (100 of the 1000 original samples), you would do:
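n = 100;                      % number of rows (samples) to keep
idx = randperm(size(A,1));    % random ordering of the row indices
subset = A(idx(1:n),:);       % n random rows, all columns kept together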
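A sketch of the whole workflow for the original question, assuming the data are stored as a 2,000,000-by-3 matrix with one (a,b,c) sample per row. The data generated here are a hypothetical stand-in (skewed, correlated, non-Gaussian columns), not the asker's data; replace A with the real matrix. The check compares correlation matrices with corrcoef and the column means:

```matlab
rng(1);                          % fixed seed so the run is repeatable
N = 2e6;                         % size of the original data set
n = 60000;                       % number of samples to keep

% Hypothetical stand-in for the real data: three correlated, non-Gaussian
% columns (a, b, c), one sample per row. Replace A with the real matrix.
z = randn(N, 2);
a = exp(0.5*z(:,1));             % log-normal, right-skewed
b = z(:,1) + 0.5*rand(N,1);      % shares z(:,1) with a, so correlated with it
c = abs(z(:,2)) + 0.3*z(:,1);    % folded normal plus a share of z(:,1)
A = [a b c];

rowIdx = randperm(N, n);         % n distinct row indices, chosen uniformly
subset = A(rowIdx, :);           % whole rows, so (a,b,c) stay paired

% Compare correlation matrices and column means of the full data and subset
R_full = corrcoef(A);
R_sub  = corrcoef(subset);
fprintf('max correlation difference: %.4f\n', max(abs(R_full(:) - R_sub(:))));
fprintf('max mean difference:        %.4f\n', max(abs(mean(A) - mean(subset))));
```

With 60,000 rows the sampling error in each correlation coefficient and mean is small but not zero, so the subset reproduces the original closely rather than exactly.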
