- Create an x vector and a vector of bin edges, so that the count in each cell comes out to the values in your vector "Sample". Use those as inputs to chi2gof().
- Compute the chi-squared test statistic yourself and compare it to a critical value, using the correct degrees of freedom.
using chi2gof to determine sample representativeness
Hello,
I am trying to use the chi2gof function to test whether the collected sample data is representative of the population data. Say we have 8 bins, with a population count and a sample count for each bin. Is this the correct way to do the test?
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
[h,p,k]=chi2gof(Sample,'Expected',Population);
Answers (1)
William Rose
on 26 Oct 2022
Your code does not work because chi2gof expects a vector x containing the observed values of the variable, not the count of how many observations fall in each cell, which is what you have provided.
There are (at least) 2 solutions.
Furthermore, cells with 0 expected value cause the calculation of the chi-squared statistic to blow up, because each term divides by the expected count. Cells with fewer than 4-5 expected should be combined as needed, until all cells have at least 4-5 expected. Therefore combine cells 4-8 into a single cell:
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
pop2 = [996, 749, 370, sum(Population(4:8))]
sample2 = [647, 486, 100, sum(Sample(4:8))]
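The combining step can also be written as a loop rather than by hand. The following is only a sketch: the threshold minExpected and the names popMerged and sampleMerged are made up for illustration, and it assumes the sparse cells sit at the tail, as they do in this data. It folds the last cell into its neighbour until the last expected count reaches the threshold, which here stops at five cells instead of the four used in this answer. Either grouping satisfies the rule of thumb.

```matlab
% Sketch: merge trailing cells until the last expected count is large enough.
% minExpected, popMerged and sampleMerged are illustrative names, not part of chi2gof.
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample     = [647, 486, 100, 22, 0, 0, 0, 0];
minExpected = 5;                 % assumed threshold (rule of thumb: 4-5)

popMerged = Population;
sampleMerged = Sample;
% Fold the last cell into its left neighbour while the last cell is too small.
while numel(popMerged) > 1 && popMerged(end) < minExpected
    popMerged(end-1)    = popMerged(end-1) + popMerged(end);
    sampleMerged(end-1) = sampleMerged(end-1) + sampleMerged(end);
    popMerged(end)    = [];
    sampleMerged(end) = [];
end
disp(popMerged)      % 996 749 370 53 13
disp(sampleMerged)   % 647 486 100 22 0
```

The observed and expected vectors are merged together so that their cells stay aligned.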
Now let's try method 1 above:
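x=[];  % x gets one entry per observation: the index of the cell it falls in
for i=1:length(sample2), x=[x,i*ones(1,sample2(i))]; end
edges=.5+(0:length(sample2));  % edges 0.5, 1.5, ... so that cell i contains only the value i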
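As a possible alternative to the loop, the same x vector can be built in one call and checked before it is handed to chi2gof(). This is a sketch that assumes a MATLAB release with repelem and histcounts (both are base MATLAB functions in recent releases); the variable counts is introduced here only for the check.

```matlab
sample2 = [647, 486, 100, 22];            % cell counts after combining cells 4-8

% repelem repeats the cell index i exactly sample2(i) times.
x = repelem(1:numel(sample2), sample2);
edges = 0.5 + (0:numel(sample2));         % cell i lies between i-0.5 and i+0.5

% Sanity check: binning x with these edges should reproduce sample2.
counts = histcounts(x, edges);
assert(isequal(counts, sample2));
```

If the assertion passes, x and edges describe exactly the counts in sample2.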
Now do the chi2 test using chi2gof(). The third output, k, is a structure with the test details (chi2stat, df, edges, O, E), so we inspect it to make sure the observed values ("O") are what we want them to be.
[h,p,k]=chi2gof(x,'Expected',pop2,'Edges',edges)
The observed vector "O" has the values in the "sample2" vector. That means our x vector and the edges vector worked as desired.
h=1 means the null hypothesis (which is that the sample data matches the population) is rejected.
The low p value means it is highly improbable to get the observed data from this population.
Method 2: Compute the chi2 test statistic ourselves, then compare it to the critical value with the correct degrees of freedom.
chi2stat=sum((sample2-pop2).^2./pop2)
df=length(pop2)-1; pcrit=.05; chi2crit=chi2inv(1-pcrit,df);  % upper-tail critical value at significance level pcrit
h2=chi2stat>chi2crit; p2=1-chi2cdf(chi2stat,df);
fprintf('h=%d, p=%.3f\n',h2,p2);
The chi-squared statistic, h, and p match the test statistic, h, and p we found above with Method 1.
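One point worth checking: Pearson's statistic follows a chi-squared distribution with (number of cells - 1) degrees of freedom only when the expected counts add up to the observed total. Here pop2 sums to 2181 while sample2 sums to 1255. A minimal sketch of a variant, assuming the population counts are meant only as proportions, rescales them to the sample size first. The names expected and alpha are introduced here for illustration.

```matlab
pop2    = [996, 749, 370, 66];    % population counts, cells 4-8 combined
sample2 = [647, 486, 100, 22];    % sample counts, cells 4-8 combined

% Assumption: the population is used only for its proportions,
% so the expected counts are rescaled to sum to the sample size.
expected = pop2 / sum(pop2) * sum(sample2);

chi2stat = sum((sample2 - expected).^2 ./ expected);
df = numel(sample2) - 1;
alpha = 0.05;                              % significance level
chi2crit = chi2inv(1 - alpha, df);         % upper-tail critical value
h = chi2stat > chi2crit;                   % true means reject the null hypothesis
p = chi2cdf(chi2stat, df, 'upper');        % upper-tail p-value
fprintf('chi2=%.2f, crit=%.2f, h=%d, p=%.3g\n', chi2stat, chi2crit, h, p);
```

The same rescaled vector could be passed as the 'Expected' argument in Method 1. For this data the null hypothesis is still rejected, although the test statistic is much smaller than with the unscaled counts.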