Puzzling behavior of ranksum

Question

Paul on 3 May 2017

https://la.mathworks.com/matlabcentral/answers/338558-puzzling-behavior-of-ranksum

Commented: Star Strider on 24 Jul 2017

I performed a ranksum test on two vectors of 80 and 88 entries, both with a median of 0 and otherwise fairly similar in every respect. I assumed ranksum would tell me the difference between the two vectors was not significant, but surprisingly it returned p < 0.05. I started playing around to understand the output better and came across the following puzzling behavior of ranksum:

As I appended the same number of 0s (5, 10, 20, 50, ...) to the end of both vectors and reran the ranksum test, the p-value it returned became smaller. The more 0s I added to both vectors, the smaller the p-value. This seemed strange to me, because by adding identical entries to both vectors all sample statistics should converge, right? And the more similar the sample statistics, the more likely the samples were drawn from the same distribution?

I have been reading quite a lot about the Wilcoxon rank-sum test but have not come across an explanation for this behavior. I'm not a statistician and I'm at my wits' end here. If anybody could tell me what I'm missing, it would be greatly appreciated!

Best, Paul


Answer 1

Star Strider on 3 May 2017

https://la.mathworks.com/matlabcentral/answers/338558-puzzling-behavior-of-ranksum#answer_265516

I’ve not used the Wilcoxon Rank-Sum test in a while, but as I recall (and a brief review just now supports, at least as I read it), the p-value is the probability that the two medians are different (or one greater or less than the other in a one-tailed test). So a low p-value would be interpreted to mean that the probability of different medians is low, and a high p-value the probability of different medians is high.

This is counter-intuitive with respect to the interpretation of the t-test, for example, where a low p-value indicates a low probability that the means are the same, and a high p-value a high probability that the means are the same.

I would be interested to read others’ interpretations and clarifications.
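One way to see how the ranksum p-value behaves is to simulate it. The sketch below is only an illustration under assumed settings (normal data, 30 observations per sample, a median shift of 1 and 2000 repetitions, all arbitrary choices) and needs the Statistics and Machine Learning Toolbox for ranksum. It counts how often p < 0.05 when both samples come from the same distribution, and how often when their medians differ.

```matlab
% Simulation sketch: how often does ranksum return p < 0.05 when the two
% samples share a distribution, versus when their medians differ?
% Requires ranksum (Statistics and Machine Learning Toolbox).
rng(0);                 % fixed seed so the run is repeatable
nTrials = 2000;         % number of simulated experiments (arbitrary)
n       = 30;           % observations per sample (arbitrary)
shift   = 1;            % median difference in the second scenario (arbitrary)

pSame  = zeros(nTrials,1);
pShift = zeros(nTrials,1);
for k = 1:nTrials
    x = randn(n,1);
    pSame(k)  = ranksum(x, randn(n,1));          % same distribution
    pShift(k) = ranksum(x, randn(n,1) + shift);  % medians differ by 'shift'
end

fracSame  = mean(pSame  < 0.05)   % expected to be close to 0.05
fracShift = mean(pShift < 0.05)   % expected to be much larger
```

With a shared distribution the fraction should land near 0.05; with a real shift it is far higher, so in this setup small p-values turn up mostly when the medians really do differ.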

4 comments

Star Strider on 4 May 2017


My pleasure.

I was in the middle of replying to this earlier when Firefox crashed, taking MATLAB with it.

I didn’t try to reproduce your results yesterday, but I did today, and in R2017a I got results that differ from yours. Mine apparently are the probability of a different median. (My Answer yesterday remains the same because it is consistent with my nonparametric statistics references.)

My results are actually complementary to yours. I am including my data so you can compare my results with yours.

You might consider contacting MathWorks about this if your results differ from mine.

Example:

% v_mtx1 = randi(10,10,2) returned the following values in the original run;
% they are written out here so the example can be reproduced exactly.
v_mtx1 = [4     1
          9     1
          6     6
          6     8
         10    10
          3     2
          8     6
          8     5
          4     1
          6     4];
z_v = zeros(20,1);
v_mtx2 = [v_mtx1; z_v z_v];     % append 20 zeros to both columns
[p1,h1,stats1] = ranksum(v_mtx1(:,1),v_mtx1(:,2))
[p2,h2,stats2] = ranksum(v_mtx2(:,1),v_mtx2(:,2))

Output:

p1 =
      0.14583
h1 =
   0
stats1 = 
       zval: 1.4544
    ranksum: 124.5
p2 =
      0.73758
h2 =
   0
stats2 = 
       zval: 0.33505
    ranksum: 934.5

May the Fourth be with you!
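The rank sums in the output above can be checked by hand. The sketch below is a hand-rolled check in base MATLAB, not the ranksum implementation: it counts pairs to get the Mann-Whitney statistic U, which relates to the reported rank sum W of the first column through W = U + nx(nx+1)/2, and it reuses the v_mtx1 values listed above. The names countU, W and dev are introduced here for illustration only.

```matlab
% Hand-rolled check in base MATLAB (not the ranksum implementation).
% countU counts pairs with x > y and adds 1/2 for each tied pair (Mann-Whitney U).
countU = @(x,y) sum(sum(x(:) > y(:).')) + 0.5*sum(sum(x(:) == y(:).'));

v_mtx1 = [4 1; 9 1; 6 6; 6 8; 10 10; 3 2; 8 6; 8 5; 4 1; 6 4];  % values listed above
v_mtx2 = [v_mtx1; zeros(20,2)];                                 % 20 zeros in each column

data = {v_mtx1, v_mtx2};
W   = zeros(1,2);   % rank sum of the first column, comparable to stats.ranksum
dev = zeros(1,2);   % U minus its null-hypothesis mean nx*ny/2
for k = 1:2
    x  = data{k}(:,1);
    y  = data{k}(:,2);
    nx = numel(x);
    ny = numel(y);
    U  = countU(x, y);
    W(k)   = U + nx*(nx+1)/2;
    dev(k) = U - nx*ny/2;
end
W     % [124.5 934.5], matching stats1.ranksum and stats2.ranksum
dev   % [19.5 19.5]
```

Because the padding zeros sit below every original value and both columns gain 20 of them, U increases by exactly as much as its null mean nx*ny/2 does, so dev stays at 19.5. The null variance, however, grows with the sample size, which is consistent with zval dropping from 1.4544 to 0.33505 and p rising.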

Paul on 24 Jul 2017

Edited: Paul on 24 Jul 2017


Hi Star Strider,

Sorry for the long radio silence. I moved on to do different things for a while but now I'm back with the same problem. First let me thank you again for your answer and code.

When I run your code (which is basically what I suggested you try), I indeed get the same sensible results from ranksum that you do: adding zeros to two identically sized vectors and then testing their difference with ranksum returns a larger p-value than testing without the 0-padding.

However, when I start with two vectors of dissimilar length, the p-value decreases with 0-padding. If you're still interested in humoring me, here's the code I used:

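% v1 = randi(30,30,1) returned the following values in the original run
% (written out here, transposed for brevity, so the example is reproducible).
v1 = [27 17  2  3 17  8 29  8 29 18  8 14 25  4 11 ...
       7  2 28 22 30 12 22  8 20  7  3 24 24 29  4]';
v2 = v1(5:end-5);       % elements 5 through 25 of v1 (21 values):
                        % [17 8 29 8 29 18 8 14 25 4 11 7 2 28 22 30 12 22 8 20 7]'
v3 = [v1;zeros(10,1)];  % pad both vectors with the same number of zeros
v4 = [v2;zeros(10,1)];
p1 = ranksum(v1,v2)
p2 = ranksum(v3,v4)

Output:

p1 =
    0.8402
p2 =
    0.6941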
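Pair counting also accounts for the unequal-length case. If every original value is positive, each zero appended to v1 ranks below all of v2 (adding 0 to U instead of the ny/2 expected under the null), each zero appended to v2 ranks below all of v1 (adding nx instead of nx/2), and the zero-zero ties contribute exactly their expected k^2/2. The net effect is that U changes relative to its null mean by k(nx - ny)/2, which is zero only when the two vectors have the same length. The sketch below, again hand-rolled in base MATLAB with illustrative names (countU, devBefore, devAfter, predictedShift, zeroShare), checks this for the vectors above (k = 10, nx = 30, ny = 21).

```matlab
% Hand-rolled check in base MATLAB (not the ranksum implementation).
% countU counts pairs with x > y and adds 1/2 for each tied pair (Mann-Whitney U).
countU = @(x,y) sum(sum(x(:) > y(:).')) + 0.5*sum(sum(x(:) == y(:).'));

v1 = [27 17  2  3 17  8 29  8 29 18  8 14 25  4 11 ...
       7  2 28 22 30 12 22  8 20  7  3 24 24 29  4]';   % values listed above
v2 = v1(5:end-5);
k  = 10;                        % zeros appended to each vector
v3 = [v1; zeros(k,1)];
v4 = [v2; zeros(k,1)];

devBefore = countU(v1,v2) - numel(v1)*numel(v2)/2    % -11
devAfter  = countU(v3,v4) - numel(v3)*numel(v4)/2    % 34
predictedShift = k*(numel(v1) - numel(v2))/2         % 45 = devAfter - devBefore
zeroShare = [k/numel(v3), k/numel(v4)]               % 10/40 vs 10/31
```

U - nx*ny/2 goes from -11 to 34, a shift of 45, which is consistent with p falling from 0.8402 to 0.6941. Another way to see it: appending the same number of zeros gives the two vectors different proportions of zeros (10/40 versus 10/31), so the padded samples differ more than the originals did.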

Star Strider on 24 Jul 2017

My pleasure.

As I read it, larger vectors result in larger values for the test statistic, so a lower p-value. More data (with the same distribution) would reduce the p-value with every statistical test I’m familiar with.
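Whether more data lowers the p-value depends on whether the two distributions actually differ. The simulation sketch below (normal data, sample sizes of 20 and 200, a shift of 0.3 and 1000 repetitions, all arbitrary choices; ranksum requires the Statistics and Machine Learning Toolbox) compares the median p-value at both sizes, first with both samples drawn from one distribution and then with a real shift between them.

```matlab
% Simulation sketch: median ranksum p-value at two sample sizes, with and
% without a real difference between the distributions.
% Requires ranksum (Statistics and Machine Learning Toolbox).
rng(1);                       % fixed seed so the run is repeatable
nTrials = 1000;               % repetitions per setting (arbitrary)
sizes   = [20 200];           % observations per sample (arbitrary)
shift   = 0.3;                % median difference in the second scenario (arbitrary)

medSame  = zeros(size(sizes));
medShift = zeros(size(sizes));
for i = 1:numel(sizes)
    n = sizes(i);
    pSame  = zeros(nTrials,1);
    pShift = zeros(nTrials,1);
    for k = 1:nTrials
        pSame(k)  = ranksum(randn(n,1), randn(n,1));          % same distribution
        pShift(k) = ranksum(randn(n,1), randn(n,1) + shift);  % shifted median
    end
    medSame(i)  = median(pSame);
    medShift(i) = median(pShift);
end
medSame     % stays near 0.5 at both sample sizes
medShift    % falls as the sample size grows
```

When there is no real difference, the p-values are spread roughly uniformly between 0 and 1 at any sample size, so their median stays near 0.5; only with a genuine shift do larger samples drive p down.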
