Statistical test for difference between means

Hi,
I have two mean values and standard errors (se) of two unknown distributions.
I would like to check whether the difference between these two means is statistically significant.
mean1=1.2545, se1=0.0145
mean2=1.6913, se2=0.0172
Thanks,
Aviram.

Accepted Answer

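mean1=1.2545; se1=0.0145;
mean2=1.6913; se2=0.0172;
rho = 0.01;   % significance level
% two-sided tail probability of the difference, using the smaller standard error
s = 2*normcdf(-abs(mean1-mean2), 0, min(se1,se2))
s = 2.3405e-199
s < rho       % true when the difference is significant at level rho
ans = logical
1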

7 Comments

Aviram Shemen on 25 Dec 2020
Thank you! This is very helpful!
Could you please help me understand why this is the way to do it?
Is there a name for this test? What are the assumptions?
Thanks,
Aviram.
Suppose that mean1 <= mean2 . Then
normcdf(mean1, mean2, se2)
gives the probability that a draw from N(mean2, se2) is <= mean1. In turn, you can recenter that as
N(0, se2) <= mean1 - mean2
Keep in mind the assumption that mean1 <= mean2, so mean1 - mean2 is non-positive. The question becomes: "Given standard deviation #2, what is the cumulative probability of being (mean2 - mean1) to the left of the center?" Equal means would give a difference of 0, which is a cdf of 0.5.
Now suppose instead that mean2 <= mean1. You could look at the upper-tail probability that you are (mean1 - mean2) to the right of the center given that standard deviation, but if you try to calculate the cdf up to that upper bound, you are likely to get a value too close to 1 to be useful. So it is easier to reflect it down to the other side and ask for the cdf that far to the left. The distance to the center is abs(mean1-mean2), and you want that distance to the left of center, so -abs(mean1-mean2), tested against a normal distribution with mean 0 and the second standard deviation.
And since you have quietly moved from a two-tailed calculation to a one-tailed one, you need to double the probability: if the difference in means were 0, the doubled probability would come to 1.0. You want the test to succeed when the probability is < 0.05 or < 0.01, that is, one chance in 20 or one chance in 100 that the means are at least that far apart.
Thus you have 2*normcdf(-abs(mean1-mean2), 0, se2)
Then you flip the two around to find the probability that sample 2 is at least as far away based on se1, and by symmetry you have 2*normcdf(-abs(mean1-mean2), 0, se1) for that.
You now want to choose the more extreme of those two, and the more extreme one is always the case with the smaller standard deviation. That brings us to
2*normcdf(-abs(mean1-mean2), 0, min(se1,se2))
A higher value means it is more likely that the two means are the same; a lower value means it is less likely, that is, more significant that they are different. So you want to test s < rho.
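The longer form described above can be written out and compared with the short form. This is only a sketch using the numbers from the question; the variable names pSe1, pSe2, pLong and pShort are made up for illustration, and normcdf requires the Statistics and Machine Learning Toolbox.

```matlab
mean1 = 1.2545; se1 = 0.0145;
mean2 = 1.6913; se2 = 0.0172;

d = abs(mean1 - mean2);            % distance between the two means

% Long form: doubled lower-tail probability under each standard error
pSe1 = 2*normcdf(-d, 0, se1);
pSe2 = 2*normcdf(-d, 0, se2);
pLong = min(pSe1, pSe2);           % keep the more extreme (smaller) value

% Short form: use the smaller standard error directly
pShort = 2*normcdf(-d, 0, min(se1, se2));

sameResult = isequal(pLong, pShort)   % logical 1 when the two forms agree
```

Because the tail probability shrinks as the standard error shrinks, taking the minimum of the two probabilities and taking the minimum of the two standard errors select the same case.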
Aviram Shemen on 25 Dec 2020
Got it! Thank you very much!
Is there a name for this procedure/test? For example, in an article, how should I describe the way the statistical significance was calculated?
Walter Roberson on 25 Dec 2020
I just made it up based on the definitions. I did write it out in longer form and tested and verified that my short-form version gave the same result.
Aviram Shemen on 27 Dec 2020
Thanks! I appreciate your help!
mean1=1.2545; se1=0.0145;
mean2=1.6913; se2=0.0172;
rho = 0.01;
minse = min(se1,se2);
s = 2*normcdf(-abs(mean1-mean2)/minse, 0, 1)
s = 2.3405e-199
A small modification to the above that might make it marginally clearer in terms of common statistics: we effectively normalize back to mean 0 and standard deviation 1, and then it is just a standard normal cdf calculation.
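As a cross-check on the normalized form, the standard normal cdf can be expressed with the base MATLAB function erfc, since normcdf(z) equals 0.5*erfc(-z/sqrt(2)). The sketch below uses that identity so it runs without the Statistics and Machine Learning Toolbox; the variable names z and p are illustrative.

```matlab
mean1 = 1.2545; se1 = 0.0145;
mean2 = 1.6913; se2 = 0.0172;

minse = min(se1, se2);
z = abs(mean1 - mean2) / minse;    % distance in units of the smaller standard error

% 2*normcdf(-z, 0, 1) written with erfc: 2 * 0.5 * erfc(z/sqrt(2))
p = erfc(z / sqrt(2))
```

The result should agree with the s = 2.3405e-199 shown above to the displayed precision.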
Aviram Shemen on 28 Dec 2020
Thanks!


More Answers (1)

Image Analyst on 28 Dec 2020

0 votes

Wouldn't you use ttest()?

11 Comments

mean1=1.2545; se1=0.0145;
mean2=1.6913; se2=0.0172;
ttest(mean1, mean2)
ans = NaN
Aviram Shemen on 28 Dec 2020
To use ttest you need to have a distribution. In this case I don't have the full distribution.
Image Analyst on 28 Dec 2020
How were the means and standard errors computed then?
Aviram Shemen on 28 Dec 2020
I'm using coxphfit, and its output is the mean value for the coefficient beta, and its se.
Image Analyst on 28 Dec 2020
Strange how you got it to work with no input data though.
Aviram Shemen on 29 Dec 2020
For coxphfit I have input data. However, its output comes only in a summarized form, as beta and se.
Image Analyst on 29 Dec 2020
What happened to the original input data you had? How did it go missing?
Paul on 29 Dec 2020
Like Image Analyst, I also think that a t-test can be used to test for the difference between means (so can a z-test in some circumstances). But there are different types of t-tests, and the correct one to use depends on your assumptions about the underlying populations, the number of samples from those populations, and the alternative hypothesis, none of which have been explicitly stated in the original question.
I had never heard of coxphfit, but I took a quick look and I'm still not clear on what the outputs mean in the context of this question. Is the se1 in the question the stats.se field? What is mean1? Is it b (stats.beta)? Is stats.beta the sample mean of some random sample of some population?
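For reference, a z-test on two estimates is commonly written with the standard error of the difference, sqrt(se1^2 + se2^2), rather than the smaller of the two standard errors. This is a different calculation from the accepted answer, and it assumes that the two estimates are independent and approximately normal, which nothing in this thread establishes. A minimal sketch with the numbers from the question (seDiff, z and p are illustrative names; erfc is used so that no toolbox is needed):

```matlab
mean1 = 1.2545; se1 = 0.0145;
mean2 = 1.6913; se2 = 0.0172;

% Standard error of the difference, ASSUMING independent estimates
seDiff = sqrt(se1^2 + se2^2);

z = (mean1 - mean2) / seDiff;      % z statistic for H0: no difference

% Two-sided p-value; erfc(abs(z)/sqrt(2)) equals 2*normcdf(-abs(z))
p = erfc(abs(z) / sqrt(2))
```

With these numbers the z statistic is about -19.4, so the difference is still significant at the 0.01 level under those assumptions.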
Aviram Shemen on 29 Dec 2020
Hi,
I didn't provide any information about coxphfit in my original question since I didn't think it was relevant.
Anyway, let's say I have two covariates for which I want to calculate the hazard ratio, HR = exp(stats.beta), relative to some baseline H0.
The output of coxphfit is two betas, one for each covariate.
In my original question,
mean1=stats.beta(1);
mean2=stats.beta(2);
se1=stats.se(1);
se2=stats.se(2);
As far as I understand, coxphfit calculates stats.beta by maximizing a likelihood, not by averaging over some distribution. Thus, as an output of coxphfit I get only "summarized" data.
My original data is irrelevant for the comparison between stats.beta(1) and stats.beta(2), so I'm left with only the "summarized" data to establish the statistical significance of the difference between stats.beta(1) and stats.beta(2).
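Because both coefficients come from the same fit, they may be correlated. One common way to compare them is a Wald test that uses the estimated covariance matrix, which coxphfit returns in stats.covb. The sketch below is only an illustration: the data are synthetic and hypothetical (the thread does not include the real inputs), the names trueBeta, seDiff, z and p are made up, and coxphfit, exprnd and normcdf require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical synthetic data; not the data from the question.
rng(0);                                  % reproducible random numbers
n = 500;
X = randn(n, 2);                         % two made-up covariates
trueBeta = [1.2; 1.7];                   % made-up coefficients
T = exprnd(exp(-X*trueBeta));            % exponential survival times, hazard exp(X*beta)

[b, ~, ~, stats] = coxphfit(X, T);       % fit the Cox model (no censoring here)

% Wald test of H0: beta(1) == beta(2), including the covariance term
C = stats.covb;                          % estimated covariance matrix of b
d = b(1) - b(2);
seDiff = sqrt(C(1,1) + C(2,2) - 2*C(1,2));
z = d / seDiff;
p = 2*normcdf(-abs(z))                   % two-sided p-value
```

If the off-diagonal entry of stats.covb is close to zero, seDiff reduces to sqrt(se1^2 + se2^2).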
Paul on 29 Dec 2020
Edited: Paul on 29 Dec 2020
You've posed an interesting question, but I'm afraid that I can't be of any more help.
Aviram Shemen on 29 Dec 2020
Thanks!



Asked: 24 Dec 2020

Commented: 29 Dec 2020
