Removing Short Runs from Binary Data
I have a large string of binary data of the form:
A = [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0]
Within the data, if I have a group of 0s with an occasional 1, I want to convert that 1 to a zero. Similarly for a group of 1s with an occasional 0.
As a rule, I want to reset runs of 1s or 0s that are shorter than 3 consecutive values in length to the value of the surrounding elements.
So 0,0,0,1,0,0,0 would become 0,0,0,0,0,0,0
I'd also like to convert something like 1,1,1,0,0,1,0,1,1 to all 1s.
Any suggestions on how to do this? Thanks in advance.
2 Comments
Jacob Wood
on 19 Feb 2020
How important is speed here? Would you prefer a readable for-loop solution or a one-liner?
Guillaume
on 19 Feb 2020
Jim McIntyre's comment mistakenly posted as an answer moved here:
Obviously a one-liner would be better, but a for-loop solution is probably okay.
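For reference, a readable loop-based version could look like the following. This is only a sketch and is not taken from any of the answers. The names minRun, starts and runLen are hypothetical. It assumes that the first and last runs stay untouched because they have only one neighbour, and that short runs are processed left to right, each taking the already updated value just to its left.

```matlab
% Sketch only: flip interior runs shorter than minRun (hypothetical name).
A = [0 0 0 1 0 0 0 1 1 1 1 1 0 0 1 1 1];
minRun = 3;

starts = [1, find(diff(A) ~= 0) + 1];    % index where each run begins
runLen = diff([starts, numel(A) + 1]);   % number of elements in each run

B = A;
% Assumption: the first and last runs are left alone (only one neighbour).
for k = 2:numel(starts) - 1
    if runLen(k) < minRun
        idx = starts(k) : starts(k) + runLen(k) - 1;
        % Assumption: copy the already-updated value just left of the run.
        B(idx) = B(starts(k) - 1);
    end
end
disp(B)
```

Under this left-to-right rule, the second example from the question, [1 1 1 0 0 1 0 1 1], comes out as all 1s. A different tie-breaking rule would give a different result for adjacent short runs.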
Accepted Answer
More Answers (2)
Guillaume
on 19 Feb 2020
The desired one-liner:
%demo data
A = [1 1 1 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 0 0]
%should result in
% [1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
% Replace a run of one or two 0s (or 1s) surrounded by the opposite value with a run of that opposite value
double(regexprep(char(A), '(.)((??@char(1-$1)){1,2})\1', '$1${char(1-$2)}$1'))
However, note that the behaviour may not be what you expect when consecutive runs of 0s and 1s are both shorter than 3 elements, as in your second example [1,1,1,0,0,1,0,1,1]. Why should the 0s be replaced by 1s rather than the single 1 by a 0?
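The ambiguity can be made concrete with a small sketch. It does not use the regexprep expression above. Instead it applies plain strrep in two assumed orders to the second example from the question; the variable names zerosFirst and onesFirst are hypothetical.

```matlab
% Sketch only: the order in which short runs are flipped changes the result.
A = [1 1 1 0 0 1 0 1 1];
s = char(A + '0');                        % '111001011'

% Variant 1 (assumed order): flip the short runs of 0s first
zerosFirst = strrep(strrep(s, '1001', '1111'), '101', '111') - '0';

% Variant 2 (assumed order): flip the isolated 1 first
onesFirst = strrep(s, '010', '000') - '0';

disp(zerosFirst)   % 1 1 1 1 1 1 1 1 1
disp(onesFirst)    % 1 1 1 0 0 0 0 1 1
```

Both outputs contain no run shorter than 3, so the rule as stated does not pick one over the other.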
1 Comment
Jim McIntyre
on 19 Feb 2020
Jacob Wood
on 19 Feb 2020
I've got a silly one-liner:
A = [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0];
% Runs of one or two values bounded by the opposite value are flipped (runs shorter than 3)
A_converted = replace(sprintf('%d', A),{'010','0110','101','1001'},{'000','0000','111','1111'}) - '0';
1 Comment
You could just do char(A + '0') to construct the char vector instead of using sprintf.
This is arguably clearer than my regexprep solution. However, the regexprep expression can easily be extended to an arbitrary maximum run length (simply replace the 2 in {1,2} by whatever maximum run length is desired), whereas the replace call would get a bit unwieldy.
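As a sketch of that extension, the upper bound can be pulled out into a variable and spliced into the expression. The name maxRun is hypothetical; the expression itself is the one from the regexprep answer, and the data is its demo vector.

```matlab
% Sketch only: same regexprep expression with the upper bound as a variable.
A = [1 1 1 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 0 0];
maxRun = 2;   % hypothetical name: longest run that still gets flipped

% Build '(.)((??@char(1-$1)){1,maxRun})\1' with the chosen bound
expr = ['(.)((??@char(1-$1)){1,', num2str(maxRun), '})\1'];
B = double(regexprep(char(A), expr, '$1${char(1-$2)}$1'));
disp(B)
```

With maxRun = 2 this is the original expression, so it should reproduce the result quoted for the demo data in that answer.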