Unique function not deleting duplicate rows.
Mostrar comentarios más antiguos
attached my matrix "M" and here is my code.
[trash,idx] = unique(M,'rows');
pleb=M(idx,:)
gg=sort(pleb)
When inspecting gg we see that there are still duplicate rows.
I've also tried to do it in different ways, for example;
[~, III, ~] = unique(M,'first','rows'); %removing double points
III = sort(III);
pleb = M(III,:);
gg=sort(pleb);
But they either delete non duplicate data, or delete too few data.
What am I doing wrong?
Respuesta aceptada
Más respuestas (3)
Titus Edelhofer
el 4 de Mayo de 2015
Hi Luc,
I don't see duplicate data, but the data change sign ...? Take last 4 rows of pleb and it's
19.4558 -4.1355 -2.0906
19.4558 -4.1355 2.0906
19.4558 4.1355 -2.0906
19.4558 4.1355 2.0906
Look similar but all 4 are completely different - as long as -2.0906 is different from 2.0906 ;-).
Similar for the other "4-row-blocks".
When you take the abs then the story is different,
Titus
3 comentarios
luc
el 4 de Mayo de 2015
@luc: There is no reason why those rows would be removed, as
- all rows of M are already unique
- sort(M) sorts each column independently, so there is no reason why these rows should be unique (or removed) either.
You need to actually describe what you are trying to achieve.
Titus Edelhofer
el 4 de Mayo de 2015
Editada: Titus Edelhofer
el 4 de Mayo de 2015
Indeed. As I wrote as comment, if you would sort keeping rows as rows, i.e., using
sortrows(M)
then you would see, that there are no duplicate rows.
John D'Errico
el 4 de Mayo de 2015
Editada: John D'Errico
el 4 de Mayo de 2015
There are NO equal rows. I checked. They are different in sign. There are no rows that are even that close to each other, although the nearest neighbor is not uniformly close.
The check that I made was to find the point for each row that was closest in distance. I.e., the nearest neighbor. There ARE no essentially zero distances.
The overall closest pair of points are 1.7291 units apart.
Mu = unique(M,'rows');
D = ipdm(Mu,'subset','smallestfew','limit',1)
D =
(87,95) 1.7291
D = ipdm(Mu,'subset','nearest')
D =
(2,1) 4.1811
(1,2) 4.1811
(13,3) 4.1811
(14,4) 4.1811
(6,5) 4.1811
(5,6) 4.1811
(8,7) 4.1811
(7,8) 4.1811
(15,9) 4.1811
(16,10) 4.1811
(17,11) 4.1811
(18,12) 4.1811
(3,13) 4.1811
(4,14) 4.1811
(9,15) 4.1811
(10,16) 4.1811
(11,17) 4.1811
(12,18) 4.1811
(26,25) 4.1811
(25,26) 4.1811
(28,27) 4.1811
(27,28) 4.1811
(35,29) 4.1811
(36,30) 4.1811
(37,31) 4.1811
(38,32) 4.1811
(19,33) 4.1811
(20,34) 4.1811
(29,35) 4.1811
(30,36) 4.1811
(31,37) 4.1811
(32,38) 4.1811
(21,39) 4.1811
(22,40) 4.1811
(23,41) 4.1811
(24,42) 4.1811
(33,47) 4.1811
(53,47) 3.3826
(55,47) 3.3826
(34,48) 4.1811
(54,48) 3.3826
(56,48) 3.3826
(43,49) 4.1811
(44,50) 4.1811
(45,51) 4.1811
(46,52) 4.1811
(39,53) 4.1811
(47,53) 3.3826
(40,54) 4.1811
(48,54) 3.3826
(41,55) 4.1811
(42,56) 4.1811
(58,57) 4.1811
(57,58) 4.1811
(60,59) 4.1811
(59,60) 4.1811
(69,61) 4.1811
(70,62) 4.1811
(71,63) 4.1811
(72,64) 4.1811
(49,65) 4.1811
(91,65) 3.3826
(50,66) 4.1811
(92,66) 3.3826
(51,67) 4.1811
(93,67) 3.3826
(52,68) 4.1811
(94,68) 3.3826
(61,69) 4.1811
(62,70) 4.1811
(63,71) 4.1811
(64,72) 4.1811
(79,77) 1.7291
(81,77) 1.7291
(80,78) 1.7291
(82,78) 1.7291
(77,79) 1.7291
(78,80) 1.7291
(73,83) 4.1811
(74,84) 4.1811
(75,85) 4.1811
(76,86) 4.1811
(95,87) 1.7291
(96,88) 1.7291
(97,89) 1.7291
(98,90) 1.7291
(65,91) 3.3826
(83,91) 4.1811
(105,91) 3.3826
(66,92) 3.3826
(84,92) 4.1811
(106,92) 3.3826
(67,93) 3.3826
(85,93) 4.1811
(107,93) 3.3826
(68,94) 3.3826
(86,94) 4.1811
(108,94) 3.3826
(87,95) 1.7291
(101,95) 1.7291
(88,96) 1.7291
(102,96) 1.7291
(89,97) 1.7291
(103,97) 1.7291
(90,98) 1.7291
(104,98) 1.7291
(99,101) 4.1811
(100,102) 4.1811
(109,105) 4.1811
(110,106) 4.1811
(111,107) 4.1811
(112,108) 4.1811
(113,109) 4.1811
(114,110) 4.1811
(115,111) 4.1811
(116,112) 4.1811
(121,117) 4.1811
(122,118) 4.1811
(123,119) 4.1811
(124,120) 4.1811
(117,121) 4.1811
(118,122) 4.1811
(119,123) 4.1811
(120,124) 4.1811
(126,125) 4.1811
(125,126) 4.1811
(133,127) 1.7291
(134,128) 1.7291
(135,129) 1.7291
(136,130) 1.7291
(132,131) 4.1811
(131,132) 4.1811
(127,133) 1.7291
(128,134) 1.7291
(129,135) 1.7291
(130,136) 1.7291
(139,137) 1.7291
(140,138) 1.7291
(137,139) 1.7291
(138,140) 1.7291
(145,141) 4.1811
(149,141) 3.3826
(146,142) 4.1811
(150,142) 3.3826
(147,143) 4.1811
(151,143) 3.3826
(148,144) 4.1811
(152,144) 3.3826
(153,145) 4.1811
(154,146) 4.1811
(155,147) 4.1811
(156,148) 4.1811
(141,149) 3.3826
(165,149) 4.1811
(142,150) 3.3826
(166,150) 4.1811
(143,151) 3.3826
(167,151) 4.1811
(144,152) 3.3826
(168,152) 4.1811
(159,157) 3.3826
(177,157) 4.1811
(160,158) 3.3826
(178,158) 4.1811
(157,159) 3.3826
(179,159) 4.1811
(158,160) 3.3826
(180,160) 4.1811
(169,161) 4.1811
(170,162) 4.1811
(171,163) 4.1811
(172,164) 4.1811
(181,165) 4.1811
(182,166) 4.1811
(183,167) 4.1811
(184,168) 4.1811
(161,169) 4.1811
(162,170) 4.1811
(163,171) 4.1811
(164,172) 4.1811
(174,173) 4.1811
(173,174) 4.1811
(176,175) 4.1811
(175,176) 4.1811
(189,177) 4.1811
(190,178) 4.1811
(191,179) 4.1811
(192,180) 4.1811
(193,185) 4.1811
(194,186) 4.1811
(195,187) 4.1811
(196,188) 4.1811
(185,193) 4.1811
(186,194) 4.1811
(187,195) 4.1811
(188,196) 4.1811
(198,197) 4.1811
(197,198) 4.1811
(200,199) 4.1811
(199,200) 4.1811
(205,201) 4.1811
(206,202) 4.1811
(207,203) 4.1811
(208,204) 4.1811
(201,205) 4.1811
(202,206) 4.1811
(203,207) 4.1811
(204,208) 4.1811
(210,209) 4.1811
(209,210) 4.1811
(212,211) 4.1811
(211,212) 4.1811
6 comentarios
Sean de Wolski
el 4 de Mayo de 2015
John D'Errico
el 4 de Mayo de 2015
Ooh! Neat. Uniquetol is a function that has long been needed.
Stephen23
el 4 de Mayo de 2015
Hmmm... but how does uniquetol resolve the "transitivity problem" described here?:
The algorithm basically groups values together based on their sort-order, which means, as the docs state, different outputs are possible depending on this sorting. So it is not exactly robust, but I guess it is as good as we can expect unless someone solves the "transitivity problem".
luc
el 4 de Mayo de 2015
Sean de Wolski
el 4 de Mayo de 2015
First, your screenshot is too small to see.
Second, here's a good exercise to explain the small differences in floating point: Run this:
>> format hex
Then rerun the command. See! They're different, even if just by a little.
luc
el 4 de Mayo de 2015
Robert
el 17 de Oct. de 2018
0 votos
If anyone encounters truly duplicate rows in the output of unique like I did, this may be caused by NaN in your data being treated as distinct values. See this question for more info.
Categorías
Más información sobre Logical en Centro de ayuda y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!