Moving window with cumulative sum less than a ref value and excluding certain windows
2 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
RANJITH REDDY KALLURI
el 12 de En. de 2017
I have an array of 36000 cells, i want to break those arrays into windows of cumulative sum less than a reference value. and exclude few windows based on other parameters respective to this array based on index. For example A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]; similarly other arrays of same length. Now window 1 should start with 1st value and end at sum less than 2 ( window{1} = [0.5 0.6 .45], sum = 1.55) second window has to start from 2nd value ( window{2} = [0.6 .45 .56 .056 .16], sum = 1.826). Similarly i need all possible windows for 36000 cells, while excluding those where value/sum of other arrays/windows of respective index. Index of these windows is also needed. I here wanted a moving window. Eg Index of my windows would be like [ 1 3] , [2 6], [3 9], [4 10], [5 11], [6 11], [ 7 11], [ 8 11], [9 11], [10, 11].
- If i don't wont to consider values in index 3,4, 8, i want to exclude windows which include these indexes.
- If i want to exclude windows whose corresponding array average is less than certain value. Ex: I've another B whose size is same as A. after breaking array A into windows, average of array B windows is less than a value (20) has to be excluded.
3 comentarios
Image Analyst
el 16 de En. de 2017
RANJITH, please attach some data. I'd like to know if your data is really in cells like you said, or if it's a regular double array. Cell arrays are very slow and inefficient memory hogs - the drawbacks to their being really flexible as to what kind of data they can hold. If you're using cells then I encourage you to see if you can use a standard numerical array, which will speed things up. I can't tell if you're using cells (like MATLAB definition, not Excel's definition) or standard arrays because you're using both braces (like cell arrays use) and brackets (like standard arrays use). But maybe you're using the term cell like Excel uses it - this would be "element" in MATLAB's terminology - I don't know which. Plus, attaching data would help out the people who are trying to solve your question.
Respuesta aceptada
John BG
el 23 de En. de 2017
Editada: John BG
el 23 de En. de 2017
Ranji
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
% masking
mask=[3 4 8];
d=0;
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d);
B(d,:)=[];
reasons why I ask you to consider accepting my answer:
1. my code works, no errors like Jan Simon's Out(end,:)=[36800 36799]
2. my excluding mask section doesn't crash, like Simon's.
3. from 1.7 seconds to 0.5 seconds .. if it's really about cutting down time, C should be the language to use.
4. Ranji, If it's really about speed you may want to compile it, which will require the coder to get it all through C before building the .exe and then functions cumsum and find are going to add many more lines and end up taking more time than the few lines I have supplied.
5. the warning error that Jan Simon claims does not show up in MATLAB version R2016, it may happen in his MATLAB version R2009 but not in up to date MATLAB version.
6. his recommendation ' .. This is always a bad programming practize .. ' is the typical pointless remark that does not contribute to help solve the question.
1 comentario
Jan
el 25 de En. de 2017
Editada: Jan
el 26 de En. de 2017
Shouldn't the last line be [36800, 36800]? [36799, 36799] ignores the last element.
As said before: Your code crashes, if the input contains trailing zeros:
while A(n2)==0
n2=n2+1;
end
From 1.7 to 0.5 seconds only by using pre-allocation and omit the zeros? This is a nice speed up. Feel free to insert the pre-allocation lines in your code to increase its quality.
The editor of R2016b shows the message in the lines "B=[B;n1 n2-1];" and "d=[d k];" as the orange marks in ths MLint-warning column on the rigt side:
The variable 'B' appears to change size on every loop iteration.
Consider pre-allocation for speed.
When the code for the masking is included also, your code needs 8.46 sec, and my suggestion 0.28 sec. The severe drawback of a missing pre-allocation is the exponential growing of the runtime: If the input array A has the double size, your code needs 55.3 sec already compared to 0.6 of mine. As you can see, pre-allocation is fundamental in programming and it does matter the solution here also. For more details search for "Schlemiel the Painter's algorithm", e.g. https://en.wikipedia.org/wiki/Joel_Spolsky:
In software development, a Shlemiel the painter's algorithm
(sometimes, Shlemiel the painter algorithm) is a method that is
inefficient because the programmer has overlooked some
fundamental issues at the very lowest levels of software design.
Therefore your statement "typical pointless remark" means, that you did not get the point.
Más respuestas (5)
Andrei Bobrov
el 16 de En. de 2017
Editada: Andrei Bobrov
el 19 de En. de 2017
I'm corrected, thank you Jan Simon.
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]' ;
out = zeros(numel(A),2);
n = numel(A);
t = 2;
for ii = 1:n
z = cumsum(A(ii:n)) < t;
out(ii,:) = [ii, find(z,1,'last') + ii - 1] ;
end
5 comentarios
Andrei Bobrov
el 19 de En. de 2017
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]' ;
t = 2;
tav = 0.1473;
n = numel(A);
wind = cell(n,1);
out = zeros(n,2);
lo = true(n,1);
for ii = 1:n
jj = ii:n;
tt = cumsum(A(ii:n));
z = tt < t;
iend = find(z,1,'last');
lo(ii) = tt(iend)./jj(iend) >= tav;
out(ii,:) = [ii,iend+ii-1];
wind{ii} = A([ii:iend+ii-1]);
end
wind = wind(lo);
out = out(lo,:);
John BG
el 12 de En. de 2017
Editada: John BG
el 19 de En. de 2017
Let me split the question into 2:
1. generating window indices
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]
B={0}
for k=1:1:numel(A)
s=0;n1=k ;n2=k
while s<2
s=s+sum(A([n1:n2]));
if n2<numel(A)
n2=n2+1;
elseif n2>=numel(A)
n2=numel(A);
end
end
if n2<numel(A)
B=[B [n1 n2-1 ]];
elseif n2>=numel(A)
B=[B [n1 n2]];
end
end
L=zeros(size(B,2)-2,2);
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:);
end
L contains the sought indices
L =
1 3
2 4
3 5
4 7
5 11
6 11
7 11
8 11
9 11
10 11
2.
Adding selectivity.
At this point I'm not sure whether you want to exclude any window that has in example index=3 then L would have rows 1 and 3 removed, or you also would like to have any window that implicitly has index 3, which then would mean that L row 2 would also be excluded
case 1: only excluding indices in given index, example 3:
[i j]=find(L==3)
L(i,:)=[]
L =
2 4
4 7
5 11
6 11
7 11
8 11
9 11
10 11
case 2: excluding any window directly or implicitly referring a given index:
L=zeros(size(B,2)-2,2) % restore L to initial result
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
d=0
for k=1:1:size(L,1)
if intersect([L(k,1):1:L(k,2)],3)
d=[d k];
end
end
d(1)=[];
L(d,:)=[]
L =
4 7
5 11
6 11
7 11
8 11
9 11
10 11
case 1, using a mask:
mask=[3 4 8]
L=zeros(size(B,2)-2,2) % restore L to initial result
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(L==mask(k))
q=[q i(:)']
end
q(1)=[]
q=sort(q)
L(q,:)=[]
L =
5 11
6 11
7 11
9 11
10 11
case 2, using a mask:
L=zeros(size(B,2)-2,2) % restore L
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(L,1)
if intersect([L(k,1):1:L(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
L(d,:)=[]
L =
9 11
10 11
if you find these lines useful would you please mark my answer as Accepted Answer?
To any other reader, if you find this answer of any help please click on the thumbs-up vote link,
thanks in advance for time and attention
John BG
5 comentarios
John BG
el 19 de En. de 2017
Editada: John BG
el 21 de En. de 2017
Ranji
I have corrected the sum and managed to reduce run time below 2 seconds, please have a look and let me know if now my answer qualifies as Accepted Answer:
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
toc
Elapsed time is 1.740256 seconds.
B
=
421 4623
422 4623
423 4623
424 4623
425 4623
426 4623
427 4623
428 4623
429 4623
430 4623
..
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
please find attach the resulting variable (without masking [3 4 8] that you ask in 2nd part of your question) in attached file result1.mat variable name B, type double, size 21593x2.
Adding selectivity
1. only excluding indices contained in mask
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(B==mask(k));
q=[q i(:)'];
end
q(1)=[];
q=sort(q);
B(q,:)=[];
B
2. excluding any window directly or implicitly referring any given index in mask
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
B(d,:)=[]
Ranji
Because you probably want a longer mask I have only attached the initial selection in attached file result1.mat
Since my answer solves your question much faster than any other provided here, would you please be so kind to consider marking my answer as Accepted Answer?
thanks for time and attention, awaiting answer
John BG
John Chilleri
el 12 de En. de 2017
Hello,
I tossed this together real quick so there might be unforeseen errors, but this should work:
% Given A
window = cell(36000,1);
window_index = cell(36000,1);
for i = 1:36000
count = i;
while (sum(A(i:count) < 2))
if (count == 36000)
count = count+1;
break;
end
count = count + 1;
end
window{i} = A(i:count-1);
window_index{i} = i:count-1;
end
% Deleting windows containing undesired indices
Undesired = [];
for i = 1:36000
for j = 1:length(Undesired)
if (sum(window_index{i} == Undesired(j)) == 0)
window{i} = [];
window_index{i} = [];
break;
end
end
end
Note that you need to give A and fill in Undesired. Otherwise it will make the windows empty that have undesirable indices, so if you access them after "removal" they'll be empty.
Hope this helps!
1 comentario
John BG
el 23 de En. de 2017
Editada: John BG
el 23 de En. de 2017
Ranji
I have corrected the sum and managed to reduce run time below 2 seconds, please have a look and let me know if now my answer qualifies as Accepted Answer:
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
toc
Elapsed time is 1.740256 seconds.
B
=
421 4623
422 4623
423 4623
424 4623
425 4623
426 4623
427 4623
428 4623
429 4623
430 4623
..
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
.
please find attach the resulting variable (without masking [3 4 8] that you ask in 2nd part of your question) in attached file result1.mat variable name B, type double, size 21593x2.
Adding selectivity
1. only excluding indices contained in mask
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(B==mask(k));
q=[q i(:)'];
end
q(1)=[];
q=sort(q);
B(q,:)=[];
B
2. excluding any window directly or implicitly referring any given index in mask
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
B(d,:)=[]
.
Ranji
Because you probably want a longer mask I have only attached the initial selection in attached file result1.mat
Since my answer solves your question much faster than any other provided here, would you please be so kind to consider marking my answer as Accepted Answer?
thanks for time and attention, awaiting answer
John BG
1 comentario
Jan
el 23 de En. de 2017
Editada: Jan
el 23 de En. de 2017
@John BG: Matlab's editor shows an important message for your code: In "B=[B;n1 n2-1]" the array grows iteratively. This is always a bad programming practize. Use a pre-allocation instead:
% Before the loop:
B = zeros(numA, 2);
iB = 0;
% Instead of B=[B;n1 n2-1]:
iB = iB + 1;
B(iB, :) = [n1, n2-1];
% After the loop:
B = B(1:iB, :);
This reduces the runtime on my machine from 2.86 to 1.34 seconds.
The "while A(n2)==0" will fail if A contains trailing zeros.
At least under the old Matlab R2009a I can use currently, creating "A" dynamically by "load Array.mat" increases the runtime by a factor of 50. This allows the JIT acceleration to operate efficiently:
FileData = load('C:\Local\Array.mat');
A = FileData.A;
But this will most likely not matter the code of the OP, because the data are provided as file for the discussion in the forum only.
When the data contain a lot of zeros, removing them will accelerate the processing:
Index = find(A);
A = A(Index);
... as above
B = Index(B); % Get original indices back
This gives an additional boost of 40% on my machine. See my answer.
Jan
el 23 de En. de 2017
Editada: Jan
el 25 de En. de 2017
The part for finding the intervals:
FileData = load('C:\Local\Array.mat');
A = FileData.A;
Index = find(A);
AA = A(Index); % Ignore zeros in cumulative sum
A2 = cumsum(AA);
numAA = numel(AA);
Out = zeros(numAA, 2); % Pre-allocation
t0 = 38.19;
for n1 = 1:numAA
n2 = n1;
while A2(n2) - A2(n1) < t0 && n2 < numAA
n2 = n2 + 1;
end
Out(n1, 1) = Index(n1);
Out(n1, 2) = Index(n2) - 1;
end
Out(numAA, 2) = numel(A); % Fix last element
[EDITED] [EDITED2: Bugfix: "B"->"Out"]
exclude = [3, 4, 8];
mask = false(size(Out, 1), 1);
for k = 1:numel(exclude)
mask = mask | (Out(:, 1) >= exclude(k) & exclude(k) <= Out(:, 2));
end
Out(mask, :) = [];
4 comentarios
John BG
el 23 de En. de 2017
the last line of the answer of Jan Simon is not valid.
Out([end-10:end],:)
ans =
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
36800 36799
it should be
B([end-10:end],:)
ans =
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
and the Jan Simon's [EDITED] section to apply the mask crashes:
exclude = [3, 4, 8];
mask = false(1, size(Out, 1));
for k = 1:numel(exclude)
mask = mask | (B(:, 1) >= exclude(k) & exclude(k) <= B(:, 2));
end
B(mask, :) = [];
Error using |
Matrix dimensions must agree.
Ver también
Categorías
Más información sobre Logical en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!