Select neighbours of a vector efficiently

a = 1:10;
k = 4;
I have to loop through the elements of a and pick the k - 4 - neighbors equally distributed to each side (excluding the actual element I am on - n -) except when I get close to the boundaries:
a n neigh
1 * 2 3 4 5
2 * 1 3 4 5
3 * 1 2 4 5
4 * 2 3 5 6
5 * 3 4 6 7
6 * 4 5 7 8
7 * 5 6 8 9
8 * 6 7 9 10
9 * 6 7 8 10
10 * 6 7 8 9
My dummy algorithm:
nmA = numel(a);
for n = a
lo = n-k/2;
up = n+k/2;
if lo < 1
up = up + 1-lo;
lo = 1;
elseif up > nmA
lo = lo - up + nmA;
up = nmA;
end
out = setdiff(lo:up,n)
end
Any help for an effcient algorithm? (filters, maybe?)
I am retrieving the neighbours because I need to calculate the trimmed mean and std on the neighbours.
(NOTE: my real a can be as much as 3e6 and k = 60, always even)
Thanks in advance
---------------------------------------------------------------------------------------------------------------------------------------------------------
THE FINAL ALGORITHM Just for fun... and thanks to everybody.
I have a dataset which contains intraday financial data. The first part of the dataset has 2 612 670 observations spread among 1061 days. (I have 10 parts)
For each day I have to calculate the moving mean/std with particular conditions for the beginnig/end of the day.
Firstly, I tried to loop per day but the sole indexing of obs belonging to a day was 75% of total computational time (> 22 sec).
In the end I chose the vectorised solution proposed by Jan and applied the vectorization concept to eliminate the day by day loop.
Now, for k = 20 it takes < 3 sec!
% Load part of the dataset (first column serial dates, second column prices)
tmp = load([R.d 'dataset.mat'],fields{part});
k = 20;
k2 = k / 2;
% Last observation for each day and 0
last(1,1,:) = uint32([0
find(diff(fix(tmp.(fields{part})(:,1))))
size(tmp.t200301(:,1),1)]);
% Keep only the prices (free some memory)
tmp = tmp.t200301(:,2);
% Begin of the day
Ini = repmat(uint32(1:k+1).',1,k2);
Ini(1:k+2:k*k2) = [];
Ini = reshape(Ini, k, k2);
Ini = bsxfun(@plus,Ini,last(1:end-1));
% End of day
Fin = bsxfun(@minus, last(2:end), Ini(k:-1:1,k2:-1:1,1))+1;
% Number of observartions per day
numobs = diff(last);
% Create middle and concatenate with Ini and Fin - Loop or out of memory
last = cat(3,Last,size(tmp,1));
pos = uint32([1:k2, k2+2:k+1].');
mu = zeros(last(end),1);
sigma = mu;
for n = uint16(1:numel(numobs))
neigh = sort(tmp([Ini(:,:,n),...
bsxfun(@plus, pos, uint32(0:numobs(n)-k-1)),...
Fin(:,:,n)]));
mu(last(n)+1:last(n+1)) = mean(neigh(2:end-1,:));
sigma(last(n)+1:last(n+1)) = std(neigh(2:end-1,:));
end

2 comentarios

Jan
Jan el 22 de Jul. de 2011
Do you want to create the [out] vector dynamically, or do you need a complete matrix containing the index vectors as lines?
Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
I will start testing the solutions asap. I will see if it is faster to store the indices and call mean and std once.

Iniciar sesión para comentar.

 Respuesta aceptada

Jan
Jan el 22 de Jul. de 2011
If the indices are wanted as array:
k2 = k / 2;
Ini = transpose(repmat(1:k+1, k2, 1));
Ini(1:k+2:k*k2) = [];
Ini = transpose(reshape(Ini, k, k2));
Mid = bsxfun(@plus, [1:k2, k2+2:k+1], transpose(0:nA-k-1));
Fin = nA + 1 - Ini(k2:-1:1, k:-1:1);
B = cat(1, Ini, Mid, Fin);

3 comentarios

Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
Why do you use transpose instead of .' (any particular reason)?
Andrei Bobrov
Andrei Bobrov el 22 de Jul. de 2011
+1
Jan
Jan el 23 de Jul. de 2011
@Oleg: I'm using an old laptop with a 13'' LCD, which had been brighter in its youth. While these tiny specles . and ' look like a fly had left its excreta on my monitor, the massive "transpose" is hard to misinterprete.
And there was a discussion in CSSM about "''x'''.'" compared to "'''x''.'''" - or equivalent. It was rather confusing and I started to use "transpose" and "['string', char(39), 'string']" whenever I post in a forum.

Iniciar sesión para comentar.

Más respuestas (3)

Jan
Jan el 22 de Jul. de 2011
Some ideas:
  • "for n = 1:numel(a)" is usually faster than "for n = a"
  • For indexing integer types are faster than DOUBLEs. I use DOUBLEs in my example for better readability.
  • SETDIFF is expensive due to sorting.
  • Instead of check for exceptions inside the code, you can hard-code the exceptions by creating 3 loops:
nA = numel(a);
k2 = k / 2;
v = 1:k+1;
for n = 1:k2 % Initial part
out = v(v ~= n);
end
v = [1:k2, k2+2:k+1];
for n = k2+1:nA-k2
out = v;
v = v + 1;
end
v = nA-k:nA;
for n = nA-k2+1:nA % Final part
out = v(v ~= n);
end

5 comentarios

Titus Edelhofer
Titus Edelhofer el 22 de Jul. de 2011
Hi Jan,
I already once see you writing that using integers as indices is faster. Somehow I was not able to reproduce this... Do you have some example that shows this?
Titus
Jan
Jan el 22 de Jul. de 2011
@Titus: See http://www.mathworks.com/matlabcentral/answers/8461-how-to-make-a-double-summation-in-matlab-with-vectorized-loops . But there seems to be some interaction with the JIT acceleration: Very simple loops do not profit from the integers. But is is worth to compare the runtime for real world programs.
Jan
Jan el 22 de Jul. de 2011
@Titus: Same speed:
x = rand(10000, 1);
tic; a = 1; b = 2000; for i = 1:1000; v = x(a:b); end, toc
tic; a = uint32(1); b = uint32(2000); for i = 1:1000; v = x(a:b); end, toc
Different speed:
tic; ind = 1:2000; tic;for i = 1:1000; v = x(ind); end, toc
tic; ind = uint32(1):uint32(2000); for i = 1:1000; v = x(ind); end, toc
Titus Edelhofer
Titus Edelhofer el 22 de Jul. de 2011
Thanks! Now I understand. It's the indexing that is faster. I once tried this but replaced the loop counter by integers but it got slower this way. Thanks again! Titus
Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
Implementing with preallocation to test against your vectorized solution.

Iniciar sesión para comentar.

Andrei Bobrov
Andrei Bobrov el 22 de Jul. de 2011
Hi Oleg! My version.
a = 1:10;
k = 4;
k2 = k/2;
nA = numel(a);
idx = cumsum([[1:k2,k2+2:k+1];ones(nA-k-1,k)]);
% OR idx = bsxfun(@plus,[1:k/2,k/2+2:k+1],(0:nA-k-1)');
a1 =bsxfun(@plus,1:k+1,[0;0]);
a2 =bsxfun(@plus,nA-(0:k),[0;0]);
ii = (1:k2).^2;
a1(ii)=0;
a2(ii)=0;
aa = [a1;a2]';
aout = reshape(aa(aa~=0),k,[])';
out = a([aout(1:k2,:);idx;aout(end-(k2-1:-1:0),end:-1:1)]);

9 comentarios

Jan
Jan el 22 de Jul. de 2011
@Andrei: Oleg asked for a=1:3e6 and k=60. Then the result needs 1.44GB, but your intermediate arrays will use additional temporary memory.
Jan
Jan el 22 de Jul. de 2011
@Andrei: Oleg asked for a=1:3e6 and k=60. Then the result needs 1.44GB, but your intermediate arrays will use additional temporary memory.
a1=bsxfun(@plus,1:k+1,[0;0]) ?! Why not [1:k+1; 1:k+1] or repmat(1:k+1, 2, 1)?
Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
k2 = k/2;
nA = numel(data); % 1081
idx = cumsum([[1:k2,k2+2:k+1]; ones(nA-k-1,k)]);
a1 = [1:k+1; 1:k+1];
a2 = [n-(0:k); n-(0:k)];
ii = (1:k2).^2;
a1(ii)=0;
a2(ii)=0;
aa = [a1;a2].';
aout = reshape(aa(aa~=0),k+1,[]).';
% Errors here
out = a([aout(1:k2,:); idx; aout(end-(k2-1:-1:0),end:-1:1)]);
Andrei Bobrov
Andrei Bobrov el 22 de Jul. de 2011
In row : you - aout = reshape(aa(aa~=0),k+1,[]).';
I - aout = reshape(aa(aa~=0),k,[]).';
Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
If it's not k+1 it erros there.
Andrei Bobrov
Andrei Bobrov el 22 de Jul. de 2011
Oleg. <http://forum.exponenta.ru/download.php?id=5931>
Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
Ok, sorry was testing too many things together. What's n here then?
Andrei Bobrov
Andrei Bobrov el 22 de Jul. de 2011
Oleg. Sorry, n = numel(a). Сorrected.
Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
I completely abandoned the day by day loop and cannot verify your version anymore, although I got it working.

Iniciar sesión para comentar.

Sean de Wolski
Sean de Wolski el 22 de Jul. de 2011
I would probably try to do it with a convolution. A 'valid' one with a :
kernel = ([1;1;0;1;1]./4); %a is a column vector (example for mean)
std as a function of two convolutions is also doable. Then do the boundaries manually with a for-loop.

1 comentario

Oleg Komarov
Oleg Komarov el 22 de Jul. de 2011
Nice idea, will try to substitute the mid part proposed by Jan with the convolution.

Iniciar sesión para comentar.

Categorías

Preguntada:

el 21 de Jul. de 2011

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by