randperm non uniformly distributed
AbioEngineer
on 15 Aug 2019
Commented: AbioEngineer
on 15 Aug 2019
I want to sample from the integers 1 through 56 without replacement. Neither randperm nor datasample with 'Replace',false gives a uniformly distributed set when I iterate many times. Why is the last bin in the histogram double the size of the rest?
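% Draw 10000 samples of 6 distinct integers from 1:56 with each method
perms = zeros(10000,6);
samps = zeros(10000,6);
[rp, cp] = size(perms);
for p = 1:rp
    permstemp = randperm(56,6);   % 6 unique integers from 1:56
    perms(p,:) = permstemp;
end
[rs, cs] = size(samps);
for s = 1:rs
    sampstemp = datasample(1:56,6,'Replace',false);   % needs Statistics and Machine Learning Toolbox
    samps(s,:) = sampstemp;
end
% Plot each result in its own figure so the second does not overwrite the first
histogram(perms(1:end))
figure
histogram(samps(1:end))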
Accepted Answer
John D'Errico
on 15 Aug 2019
Sigh. This is NOT a question of non-uniformity. Just a question of not understanding how to recognize non-uniformity, and partially how to understand a histogram.
If you create a histogram with too few bins, SOME bins will end up collecting more than one distinct value.
It turns out that histogram decided to use bin edges of 1:56 here, so the last bin got used for twice as many samples.
Note the difference between these two calls to histogram:
histogram(perms(1:end))
histogram(perms(1:end),1:56)
histogram(perms(1:end),1:57)
The first two produce the same results. So it appears the default for the bin edges was 1:56. However, when I gave it another bin up to 57, all things appear normal.
So what happens when I have bin edges 1:56? There are integer events at 56, and some at 55. So that last bin held all events that were either 55 OR 56, whereas bin number 1 held only the events that were strictly a 1. When I gave it one more bin to use for the histogram, things were fine.
So before you claim non-uniformity, think about whether the test you are using that asserts non-uniformity might be flawed.
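The same comparison can be checked numerically rather than by eye. The following is only a sketch, assuming a MATLAB release that provides histcounts; the variable names (x, c56, c57) and the fixed seed are illustrative choices, not part of the original question.

```matlab
% Sketch: compare the two explicit edge vectors numerically.
rng(0);                          % fixed seed, chosen only for repeatability
x = zeros(10000,6);
for k = 1:size(x,1)
    x(k,:) = randperm(56,6);     % 6 distinct integers from 1:56
end
x = x(:);

c56 = histcounts(x, 1:56);       % 55 bins; the last bin is [55,56]
c57 = histcounts(x, 1:57);       % 56 bins; the last bin is [56,57]

% The last bin for edges 1:56 holds every 55 and every 56 ...
lastBinMatches = (c56(end) == c57(55) + c57(56));
% ... while the first 54 bins are identical in both calls.
otherBinsMatch = isequal(c56(1:54), c57(1:54));

fprintf('edges 1:56, last bin: %d\n', c56(end));
fprintf('edges 1:57, bins for 55 and 56: %d and %d\n', c57(55), c57(56));
```

If both flags come out true, the "doubled" last bar is just the sum of two ordinary bars, which is what the explanation above predicts.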
3 Comments
Steven Lord
on 15 Aug 2019
John is correct. As stated in the histogram documentation page, "Each bin includes the left edge, but does not include the right edge, except for the last bin which includes both edges."
Before John added that last bin edge at 57, the last bin was [55, 56] and the next-to-last bin was [54, 55). So the last bin counted two distinct values from the data.
After John added that last bin edge at 57, the last bin is [56, 57] and the next-to-last bin is [55, 56). Each of the last two bins now counts only one distinct value from the data.
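A tiny example makes the edge rule concrete, and also shows one way to sidestep it for integer data. This is a hedged sketch: the data vector is made up for illustration, and it assumes the 'BinMethod','integers' option of histcounts and histogram, which centers bins on integers so that no data value falls on a bin edge.

```matlab
data = [54 55 55 56 56 56];      % one 54, two 55s, three 56s (made-up data)

% Explicit edges ending at the largest value: bins [53,54), [54,55), [55,56]
cEdges = histcounts(data, 53:56);    % 55 and 56 share the closed last bin

% Integer-centered bins: edges sit on half-integers, one bin per value
cInt = histcounts(data, 'BinMethod', 'integers');

disp(cEdges)
disp(cInt)

% The same option can be passed when plotting:
% histogram(data, 'BinMethod', 'integers')
```

With the explicit edges, the two 55s and three 56s are merged into one bin of five; with integer-centered bins each value gets its own count.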