Delete rows with bad data and surrounding rows
I would like to delete rows that contain ones, since ones indicate bad data (inclusion criterion 1). Moreover, I would like to remove rows that are surrounded by rows with bad data. The aim is to include rows only if they belong to a set of at least 3 consecutive good (all-zero) rows (inclusion criterion 2). I created a matrix B to explain my question:
B = [0 1 0 0 1 0 1;
0 0 0 0 0 0 0;
0 1 0 0 1 0 1;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 1 0 1 1 0 1;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 0 1 0;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 1 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
1 0 0 0 1 0 0;
1 0 1 1 1 0 1];
In this 19x7 matrix, rows 1, 3, 4, 6, 9, 10, 14, 18 and 19 would be deleted by inclusion criterion 1. So far my loop (for multiple matrices like B) handles this. For inclusion criterion 2, rows 2, 5, 7 and 8 must be deleted as well, since they are not part of a set of 3 or more all-zero rows. For inclusion criterion 2 I have to create an if structure in my existing loop.
% find or strcmp to look for the rows
% todelete = [] to eliminate these rows
How can I delete rows that contain ones OR (||) belong to a set of fewer than 3 consecutive all-zero rows?
2 comments
madhan ravi
on 26 Jul 2019
Would you mind showing what your expected result should look like?
Accepted Answer
Jon
on 26 Jul 2019
Here's another approach
% script to clean data
B = [0 1 0 0 1 0 1;
0 0 0 0 0 0 0;
0 1 0 0 1 0 1;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 1 0 1 1 0 1;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 0 1 0;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 1 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
1 0 0 0 1 0 0;
1 0 1 1 1 0 1];
D = rand(size(B)); % data matrix to be cleaned
% assign parameters
minRun = 3; % minimum number of adjacent rows to be considered good data
% make vector with ones for the good rows (rows with only zeros)
iGood = ~any(B,2);
% now mark the locations where each run of ones (good rows) and each run
% of zeros (bad rows) starts and ends
% use diff to create jumps at transitions, pad with 1's to ensure jump at start and end
isJump = [1; diff(iGood(:))~=0; 1];
% find location of jumps
jmpIdx = find(isJump);
% find run lengths of zeros, and ones
n = diff(jmpIdx);
% n has the lengths of runs of zeros, and runs of ones interleaved, but we
% need to find out whether it starts with the zeros, or starts with the
% ones
if iGood(1) == 1
    % starts with ones
    offset = 0;
else
    % starts with zeros
    offset = 1;
end
% in preparation for using repelem, build a vector with alternating
% values of zero and run lengths
run = zeros(size(n)); % initialize and preallocate
iStart = 1 + offset; % element where first run of ones starts
run(iStart:2:end) = n(iStart:2:end);
% assign the run lengths corresponding to each row
runLength = repelem(run,n);
% only keep rows in B that are members of sufficiently wide (run length) peaks
idxClean = 1:size(B,1);
idxClean = idxClean(runLength >= minRun);
Bclean = B(idxClean,:);
% also probably want to clean some other matrix based upon status of B
Dclean = D(idxClean,:)
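To see what the intermediate vectors in the script above hold, here is a minimal sketch that traces the same steps on a short made-up vector. The 6-element iGood and the names runLen, firstGood and keepRows are illustrative assumptions, not part of the original answer (runLen is used instead of run only because run is also the name of a built-in MATLAB function).

```matlab
% Hypothetical 6-row example: rows 2-4 form the only run of 3 good rows
iGood  = logical([0; 1; 1; 1; 0; 1]);
minRun = 3;

% Jumps at every transition, plus one at the start and one past the end
isJump = [1; diff(double(iGood)) ~= 0; 1];     % [1;1;0;0;1;1;1]
jmpIdx = find(isJump);                         % [1;2;5;6;7]
n      = diff(jmpIdx);                         % run lengths: [1;3;1;1]

% Runs alternate bad/good; the first good run is run 2 when row 1 is bad
firstGood = 1 + ~iGood(1);                     % 2 here

% Keep the lengths of the good runs, zero out the bad runs
runLen = zeros(size(n));
runLen(firstGood:2:end) = n(firstGood:2:end);  % [0;3;0;1]

% Spread each run's length over the rows it covers
runLength = repelem(runLen, n);                % [0;3;3;3;0;1]
keepRows  = find(runLength >= minRun);         % [2;3;4]
```

Row 6 is good but sits in a run of length 1, so it is dropped along with the bad rows.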
4 comments
Guillaume
on 30 Jul 2019
"I'm not sure how MATLAB handles these type of regional differences"
More often than not: badly, unfortunately.
More Answers (3)
Guillaume
on 26 Jul 2019
Edited: Guillaume
on 26 Jul 2019
First, the easiest and fastest way to implement criterion 1 is:
todelete = any(B, 2);
For criterion 2, since you just want to look on either side, you can just shift up or down the above vector:
todeleteall = todelete | [false; todelete(1:end-1)] | [todelete(2:end); false];
B(todeleteall, :) = []
Another way of implementing criterion 2, particularly if you want a window larger than one row on each side, is with a convolution:
halfwindow = 1; %up or down
todeleteall = conv(todelete, ones(2*halfwindow+1, 1), 'same') > 0;
B(todeleteall, :) = []
Edit: or, as shown by Andrei, you could also use imdilate. There are many ways to implement criterion 2; movsum is another one, and it lets you use different numbers of good rows before and after.
Edit 2: As the cyclist's comment points out, the above is not quite right; see the later comment for the actual solution.
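To make the caveat in the second edit concrete, here is a minimal sketch comparing the neighbour-shift version with a two-pass variant. The two-pass variant is an assumption of this sketch and does not come from the thread, as are the names bad, good, win, centre, keptShift and keptRun. The flag vector is what any(B, 2) evaluates to for the example B, written out so the block runs on its own.

```matlab
% Row flags that any(B, 2) gives for the example B (true = row contains a one)
bad  = logical([1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1])';
good = ~bad;

% Neighbour-shift version: drops every row that touches a bad row,
% so only the middle row of each run of 3 good rows survives
todeleteall = bad | [false; bad(1:end-1)] | [bad(2:end); false];
keptShift   = find(~todeleteall);                         % [12;16]

% Two-pass sketch (assumption, not from the thread):
% pass 1 marks rows that are the centre of a full window of good rows,
% pass 2 grows each centre back out by halfwindow rows on each side
halfwindow = 1;
win     = ones(2*halfwindow + 1, 1);
centre  = conv(double(good), win, 'same') == numel(win);
keptRun = find(conv(double(centre), win, 'same') > 0);    % [11;12;13;15;16;17]
```

Because the window is symmetric, this sketch only covers odd minimum run lengths (3 for halfwindow = 1, 5 for halfwindow = 2, and so on).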
11 comments
Guillaume
on 1 Aug 2019
The above can easily be changed to apply to just certain columns. If the criterion is that good rows have 0s in columns 1, 4, 5, 6, 9, 10 and 11, then
todelete = any(B(:, [1, 4, 5, 6, 9, 10, 11]), 2);
and then, as it got buried in all the comments, the simplest way to apply criterion 2 is:
startrun = strfind(todelete', [0, 0, 0]); %need 3 consecutive zeros
tokeep = unique(startrun + [0; 1; 2]);
B = B(tokeep, :)
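As a rough illustration, the same strfind idea can be written with the run length as a parameter. The minRun variable is borrowed from the accepted answer's naming and is an assumption here, as is the explicit conversion to double. As far as I know, the strfind documentation only describes text inputs, so its use on numeric vectors relies on behaviour the thread takes for granted. The flag vector is what any(B, 2) evaluates to for the example B.

```matlab
% Row flags that any(B, 2) gives for the example B (true = row contains a one)
todelete = logical([1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1])';
minRun   = 3;   % hypothetical parameter: minimum number of consecutive good rows

% Start index of every (possibly overlapping) stretch of minRun zeros
startrun = strfind(double(todelete'), zeros(1, minRun));   % [11 15]

% Expand each start into the rows it covers (implicit expansion, R2016b+),
% then collapse duplicates from overlapping matches
tokeep = unique(startrun + (0:minRun-1)');                  % [11;12;13;15;16;17]
```

With minRun = 3 this reproduces the hard-coded [0, 0, 0] and [0; 1; 2] above; a longer good run simply produces several overlapping matches that unique merges.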
Andrei Bobrov
on 26 Jul 2019
Edited: Andrei Bobrov
on 26 Jul 2019
ii holds the row indices with valid data (imdilate is a function from the Image Processing Toolbox).
ii = find(~imdilate(any(B,2),[1;1;1]));
Another variant:
lo = any(B,2) == 0;
ii_valid = unique(strfind(lo(:)',ones(1,3)) + (0:2)');
the cyclist
on 26 Jul 2019
Edited: the cyclist
on 26 Jul 2019
Here is one way.
N = size(B,2); % number of columns in B, needed for the padding rows below
Bm2 = [ones(2,N); B(1:end-2,:)];
Bm1 = [ones(1,N); B(1:end-1,:)];
Bp1 = [B(2:end,:); ones(1,N)];
Bp2 = [B(3:end,:); ones(2,N)];
v = not(any(B, 2));
vm2 = not(any(Bm2,2));
vm1 = not(any(Bm1,2));
vp1 = not(any(Bp1,2));
vp2 = not(any(Bp2,2));
valid = (vm2 & vm1 & v) | (vm1 & v & vp1) | (v & vp1 & vp2);
The output variable valid is a logical vector with "true" at each valid row. Use
find(valid)
to get the indices of the valid rows.
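A slightly shorter sketch of the same idea shifts the row flags instead of building shifted copies of B, so the column count is not needed. This is a variation under the assumption that padding with false (treated as "not good") is equivalent to padding B with rows of ones; the name validRows is illustrative, and the flag vector is what any(B, 2) evaluates to for the example B.

```matlab
% Row flags that any(B, 2) gives for the example B (true = row contains a one)
bad = logical([1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1])';

v   = ~bad;                            % true for all-zero rows
vm2 = [false; false; v(1:end-2)];      % flag of the row two above
vm1 = [false; v(1:end-1)];             % flag of the row one above
vp1 = [v(2:end); false];               % flag of the row one below
vp2 = [v(3:end); false; false];        % flag of the row two below

% A row is valid if it is the last, middle, or first of 3 good rows
valid     = (vm2 & vm1 & v) | (vm1 & v & vp1) | (v & vp1 & vp2);
validRows = find(valid);               % [11;12;13;15;16;17]
```

The logical vector can also be used directly as an index, for example B(valid, :), without calling find first.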