Find not fast enough - is there a speedier solution for large matrices?
My code works, but given the size of my data matrices it is too slow, even on a fairly heavy-duty machine. I'm sure the MATLAB community has a good, quick fix for my woes. It currently takes about 0.3 s per iteration, so we are talking days or weeks of computing to run my code. I think my main problem lies in the use of the function 'find', and I need a more elegant solution, perhaps vectorizing or using the Parallel Computing Toolbox (available, but new to me).
Thanks in advance!
The problem:
I have 36 sampling dates and a large matrix (1942242 × 2) of x,y sample coordinates ('locmat'). My code, pasted below, reads in a three-column matrix for each sample date in turn. These matrices have lengths similar to, but different from, 'locmat' and consist of x,y coordinate data (read into 'xydat') and a data measurement at that x,y location (read into 'fetcol'). Every coordinate in 'xydat' has an exact match in 'locmat', but the rows are ordered differently depending on the sample date, and not every coordinate in 'locmat' appears in 'xydat'. I am trying to map the data in the sample files onto 'locmat' by x,y location, producing a single 1942242 × 36 matrix called 'fetmat'. Any coordinate with no data on a given date is stored as -999.
Code:
nosamp = 36;
fetpath = 'C:\Data\dat_text\';
locfnam = 'C:\Data\srchmat\locmat.csv';
locmat = csvread(locfnam);               % reference x,y coordinates (1942242 x 2)
fn = dir(fetpath);
ns = {fn.name};
ns = sort(ns);
ns = char(ns(3:end));                    % drop the '.' and '..' directory entries
fetmat = zeros(length(locmat),nosamp);
for q = 1:size(ns,1);                    % one file per sampling date
    fnam = ns(q,:);
    filename = fullfile(fetpath, fnam);
    fetdat = csvread(filename, 1,2);     % skip the header row and first two columns
    xydat = fetdat(:,2:3);               % x,y coordinates for this date
    fetcol = fetdat(:,1);                % measurement at each x,y
    clear fetdat;
    for s = 1:length(locmat);            % each pass compares against all of locmat
        xysrch = locmat(s,:);
        xyrep = repmat(xysrch,length(locmat),1);
        ids = find(locmat == xyrep) ;
        if isempty(ids)
            fetmat(s,q) = -999;
        else
            fetmat(s,q) = fetcol(ids(1));
        end
    end
end
Comment by Roger Stafford, 20 Dec 2012:
I don't entirely understand your code. Despite the statement that every coordinate in 'xydat' has a match in 'locmat', there is no reference to 'xydat' within your for-loops. Instead, you seem to be searching for duplicates in 'locmat' itself. Perhaps I haven't understood your description correctly.
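The comparison inside the inner loop can be checked on a few hypothetical toy rows. `locmat == xyrep` compares element by element, so `find` returns linear indices of any single matching x or y value, not of matching rows; requiring both coordinates to agree needs `all(..., 2)`. This is a small illustration only, and even the row-wise version is still a full scan per lookup.

```matlab
% Hypothetical toy coordinates (not the real data)
locmat = [1 2; 3 1; 1 5];
xysrch = [1 2];
xyrep  = repmat(xysrch, size(locmat, 1), 1);

% Elementwise: true wherever an individual x OR y value matches;
% find returns column-major linear indices into the 3x2 matrix
ids_elem = find(locmat == xyrep);          % -> [1; 3; 4]

% Row-wise: both x AND y must match in the same row
ids_row = find(all(locmat == xyrep, 2));   % -> 1
```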
However, I can make a general comment about the use of the 'find' function. When you have a long list that must be searched repeatedly for specific items, it is best to avoid 'find' if you possibly can. If you sort the list instead, there are much faster ways of finding a match. With 'locmat' at 1,942,242 rows, a binary search needs only about log2(1,942,242) ≈ 21 comparisons per lookup rather than 1,942,242. I am fairly sure the MATLAB function 'ismember' uses just such a method to find the elements of one set that lie in another. Of course, you are apparently trying to match a pair of values, x and y, but I am sure there is a way to apply the binary search technique here as well.
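A minimal sketch of binary search over (x, y) pairs, assuming the reference coordinates are sorted once with `sortrows` (by x, then by y) so that rows can be compared lexicographically. The `keys` and `queries` arrays are hypothetical toy data. The hand-written loop is only meant to show the halving idea; in MATLAB itself the built-in `ismember(..., 'rows')` is the practical way to do many lookups at once.

```matlab
% Hypothetical toy reference coordinates, sorted once by x, then y
keys = sortrows([3 1; 1 2; 2 5; 1 1; 4 0]);   % -> [1 1; 1 2; 2 5; 3 1; 4 0]
queries = [2 5; 4 0; 9 9];                    % pairs to look up
result = zeros(size(queries, 1), 1);          % row in keys, or 0 if absent

for q = 1:size(queries, 1)
    x = queries(q, 1);
    y = queries(q, 2);
    lo = 1;
    hi = size(keys, 1);
    while lo <= hi
        mid = floor((lo + hi) / 2);
        kx = keys(mid, 1);
        ky = keys(mid, 2);
        if kx == x && ky == y
            result(q) = mid;                  % exact (x, y) match
            break;                            % leaves the while loop only
        elseif kx < x || (kx == x && ky < y)
            lo = mid + 1;                     % target sorts after keys(mid,:)
        else
            hi = mid - 1;                     % target sorts before keys(mid,:)
        end
    end
end
% result -> [3; 5; 0]
```

Each lookup halves the search range, so it takes at most ceil(log2(n + 1)) passes of the while loop; for n = 1,942,242 that is 21.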
You don't want to scan 'locmat' from one end to the other 1,942,242 × 36 times. That's 1,942,242 × 1,942,242 × 36 ≈ 1.36 × 10^14, over 100 trillion comparisons!
Accepted answer by Matt J, 20 Dec 2012 (edited by Matt J, 20 Dec 2012):
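for q = 1:size(ns,1)
    fnam = ns(q,:);
    filename = fullfile(fetpath, fnam);
    fetdat = csvread(filename, 1,2);
    xydat = fetdat(:,2:3);
    fetcol = fetdat(:,1);
    clear fetdat;
    % tf(s) is true if locmat(s,:) occurs in xydat; loc(s) is the matching row of xydat
    [tf, loc] = ismember(locmat, xydat, 'rows');
    col = -999*ones(size(locmat,1), 1);   % -999 = no data on this date
    col(tf) = fetcol(loc(tf));            % copy the measurement for each matched row
    fetmat(:,q) = col;
end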
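On a toy example with hypothetical values, the two outputs of `ismember(..., 'rows')` line up as follows: the first says whether each row of `locmat` occurs in `xydat`, and the second gives the row of `xydat` where it occurs (0 if it does not). Indexing `fetcol` with the second output is what turns row matches into measurements, with -999 left everywhere else.

```matlab
% Hypothetical toy data: 4 reference coordinates, 3 sampled on one date
locmat = [0 0; 0 1; 1 0; 1 1];
xydat  = [1 1; 0 0; 1 0];        % same points in a different order; (0,1) missing
fetcol = [7.5; 2.0; 3.25];       % measurement at each row of xydat

[tf, loc] = ismember(locmat, xydat, 'rows');
% tf  -> [1; 0; 1; 1]   is each locmat row present in xydat?
% loc -> [2; 0; 3; 1]   row of xydat where it was found, 0 if absent

col = -999 * ones(size(locmat, 1), 1);
col(tf) = fetcol(loc(tf));       % only matched rows receive a measurement
% col -> [2.0; -999; 3.25; 7.5]
```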