Find contiguous values in vector that sum to specified value

Hi all,
I have a column vector e.g. V=[1 0 2 2 1 1]
I need to look within this vector and find a run of contiguous values which sum to a specific value (e.g. 5). If such a value can be found, I want the code to simply return the number of contiguous values it needed to add up to reach that target value.
So in the example above, I can see that I can add [1 0 2 2] to get 5, so that's 4 contiguous values summed, I can also see I can get 5 by adding [2 2 1] (3 values summed).
One way of doing this is applying a moving window which goes through the data summing the contents of the window. Every time the sum of the window equals my target value, that counts as a hit. The window then increases in size (1:length(V)).
The big problem I can see with this is computationally this is likely to be painfully slow because my length(V)=10^6. It's a million window sizes, and up to a million window shifts for each window
I wondered if anyone out there can think of a better way of achieving my aim?

1 comentario

I have two questions to ask which are important in choosing an efficient algorithm. 1) Are all your V values integers as in your example or is there a concern with round-off errors in checking sums? 2) Are all the values non-negative?

Iniciar sesión para comentar.

 Respuesta aceptada

Image Analyst
Image Analyst el 21 de En. de 2013
Editada: Image Analyst el 21 de En. de 2013
You can use conv() to do the sums very efficiently. Try this:
tic
% Make a vector a million long.
v = randi(3, 1, 1000000);
for windowSize = 1 : 11;
sums = conv(v, ones(1, windowSize), 'same');
indexes = find(sums == 5);
fprintf('Number of 5s with window size = %d = %d\n', windowSize, length(indexes));
end
toc;
In the command window:
Number of 5s with window size = 1 = 0
Number of 5s with window size = 2 = 222880
Number of 5s with window size = 3 = 222507
Number of 5s with window size = 4 = 49205
Number of 5s with window size = 5 = 4065
Number of 5s with window size = 6 = 1
Number of 5s with window size = 7 = 0
Number of 5s with window size = 8 = 0
Number of 5s with window size = 9 = 0
Number of 5s with window size = 10 = 0
Number of 5s with window size = 11 = 0
Elapsed time is 0.343283 seconds.
Note, that some stretches of a window size might be the same stretch as a longer window if zeros are included. For example a window size of 2 with [2 3] and a window size of 3 with [2 3 0] and a window size of 4 with [2 3 0 0], etc. If you want to exclude longer window sizes from looking at the same stretch then you have to zero out the array where the smaller window was. Counting the lengths of the stretches is pretty easy if you have the Image Processing Toolbox. Just label and call regionprops at the end of each iteration. If you have that toolbox and can't figure it out, let me know. There may also be overlaps, for example in the sequence [2 3 2] you can get a sum of 5 at two positions, [2 3] and [3 2] so the 3 is in an overlap region.

8 comentarios

Swisslog
Swisslog el 21 de En. de 2013
Editada: Swisslog el 21 de En. de 2013
You are correct, the 0s are an issue. I apologize for not for-seeing that. Ideally I would prefer it if the code ignored any window that began or ended with a zero (it's ok for the window to contain 0s). I don't have the image processing toolbox, so certainly if you can see a quick way of addressing this then do let me know.
To answer the questions, all values in the real data are actually non-integers, I have actually just adapted the code to search for a range of values with tight constraints, e.g.:
indexes = find(sums>3 & sums<6);
There are no negative values to worry about
Image Analyst
Image Analyst el 21 de En. de 2013
Editada: Image Analyst el 21 de En. de 2013
If you don't have integers, then it's not a problem. If you have integers you can just get rid of zeros altogether and that shouldn't affect the result. Oh wait, yes it will, because of the overlap issue I mentioned. But you have floating point values so you most likely don't have an exact zero. So the number of elements to sum to the value is just the window size, so from my example, 222880 summed to 5 with 2 elements, 222507 summed to 5 with 3 elements, and so on.
What is the purpose of this procedure anyway? Why do you want to do it?
The 0s are a problem alas. They are indeed exact 0s
It's kind of hard to explain the purpose - essentially, each element in the original vector is a rate (thickness/time). By summing the rates, one can calculate the thickness over various timespans. I want to turn that problem around and thus calculate the range of timespans for a given thickness...don't know if that helps!
Well if you have [0 2 3 0] so you'd get two hits [0 2 3] and [2 3 0], which hit do you want to count? The first one only?
Neither sorry: with [0 2 3 0] a window length of 3 would return nothing as any window length of 3 applied to the vector would either begin or end with a 0. Does that make sense?
Swisslog
Swisslog el 21 de En. de 2013
Editada: Swisslog el 21 de En. de 2013
OK, think I may have cracked this:
for i = 1:9;
sums = conv(test, ones(1, i), 'valid');
P0=test==0;
ignore = or(P0(1:end-i+1), P0(i:end));
sums(ignore) = [];
indexes = find(sums>3 & sums<5);
Output(i)=length(indexes);
end
Will test a bit more, but if anyone sees any glaring issues then do chip in
Matt J
Matt J el 21 de En. de 2013
Editada: Matt J el 21 de En. de 2013
A more efficient way of doing this
indexes = find(sums>3 & sums<5);
Output(i)=length(indexes);
is the 1-liner
Output(i) = nnz(sums>3 & sums<5);
See also my Answer, however. Repeated convolution on increasingly larger windows surely performs many redundant summations as compared to working with cumsum(V). Also, discarding so many convolution points might be less efficient than what IPDM does.
Thanks Matt, I will investigate. Thanks Image Analyst as well :-)

Iniciar sesión para comentar.

Más respuestas (1)

Matt J
Matt J el 21 de En. de 2013
Editada: Matt J el 21 de En. de 2013
Another approach might be to apply the IPDM tool
to the vector
Vc=cumsum(V(:));
Basically, every difference
Vc(i)-Vc(j)
corresponds to a sum over a particular window. The IPDM tool will compute these differences, but gives various options to restrict the search efficiently to certain component differences of interest. In your case, you might use
out = ipdm(Vc,'Subset','Maximum',...
'Limit',targetvalue+tolerance,...
'Result','Structure');
idx= (out.distance <= targetvalue-tolerance )&...
(out.rowindex > out.columnindex);
out.rowindex(idx)=[];
out.columnindex(idx)=[];
WindowWidths = out.columnindex-out.rowindex+1;

Categorías

Etiquetas

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by