Fast data structure for tabu list?
7 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
I am looking for a fast way to implement a tabu list where each element in the list is a vector of integers.
I am writing some optimization code where evaluating the quality of a given candidate solution is expensive so I want to record that I have evaluated a given candidate before so as to avoid doing it a second time.
I am currently doing something very simple:
ismember(candidateSolution,tabuList,'rows')
where the tabuList is a matrix where each row corresponds to a previously evaluated candidateSolution. This works well enough for small problem instances but starts to bog down when the number of rows in tabuList gets into the tens of thousands.
Presumably a more sophisticated data structure would improve performance, but I'm not sure what to try. Maybe a sparse matrix?
2 comentarios
James Tursa
el 16 de Sept. de 2017
How many elements per row? How many total candidate solutions are we talking about? What are the ranges of the integer values? Are you willing to consider mex functions?
Respuestas (4)
Walter Roberson
el 16 de Sept. de 2017
2 comentarios
Image Analyst
el 16 de Sept. de 2017
I didn't know about this. thanks, it could be useful. Is it old? The documentation does not say when it was introduced.
I was going to suggest a table. Could containers be an alternative to a table? And iskey() an alternative to ismember()?
John D'Errico
el 15 de Sept. de 2017
Sparse matrices are NOT the answer here. In fact, that would be an actively terrible reason to use a sparse matrix.
You might consider the memoize function, which allows MATLAB to recognize that it has seen a set of inputs to a specific function.
2 comentarios
John D'Errico
el 16 de Sept. de 2017
When an old release is an issue, it is always a good idea to tell us. Saves us wasting time answering, and you wasting time to tell us.
Jan
el 16 de Sept. de 2017
Editada: Jan
el 16 de Sept. de 2017
You can use a hash, e.g. obtained by FEX: DataHash or FEX: GetMD5. The latter is faster, but the hashing will not be the bottleneck of the code.
tabuList = {};
hash8 = GetMD5(candidateSolution, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if any(strcmp(hash, tabuList))
% Existing already
else
% New data, process...
tabuList{end+1} = hash;
end
This can be improved by pre-allocating the tabuList or perhaps by a binary search. But usually strcmp is fast, because it stops the comparison at the first not matching character.
[EDITED] The comparison is slightly faster with using all 16 bits of the CHAR type. Now this takes 37 seconds on my R2016b/64 system for 40'000 candidates with about 10'000 repeated keys. The main time is spent in any(strcmp), such that the drawback of the omitted pre-allocation is less important that the advantage of having a shorter tabuList. With a dedicated C-Mex function this can be accelerated:
tic;
tabuList = cell(1, 4e4); % Pre-allocate
iList = 0;
for k = 1:4e4
c = randi([1, 4], 1, 8);
% hash = GetMD5(c, 'Binary', 'base64');
hash8 = GetMD5(c, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if anyStrcmp8(hash, tabuList))
% Existing already
else
% New data, process...
% Add hash to the list:
iList = iList + 1;
tabuList{iList} = hash;
end
end
disp(iList)
toc
And the Mex function anyStrcmp8.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint16_T *S, *CS;
size_t iC, nC, nS;
const mxArray *C, *aC;
// Get inputs:
S = (uint16_T *) mxGetData(prhs[0]);
nS = mxGetNumberOfElements(prhs[0]);
C = prhs[1];
nC = mxGetNumberOfElements(C);
// Loop over cell:
for (iC = 0; iC < nC; iC++) {
aC = mxGetCell(C, iC);
if (aC == NULL) { // Undeclared element reached:
plhs[0] = mxCreateLogicalScalar(false);
return;
}
CS = (uint16_T *) mxGetData(aC);
if (memcmp(S, CS, 8 * sizeof(uint16_T)) == 0) {
// Matching element found:
plhs[0] = mxCreateLogicalScalar(true);
return;
}
}
// No element found
plhs[0] = mxCreateLogicalScalar(false);
return;
}
This needs 13.6 sec for 40'000 keys with 25% repetitions.
0 comentarios
Jan
el 16 de Sept. de 2017
It is much faster to store the hash keys in an UINT64 matrix than in a cell string:
tic;
nKey = 4e4;
List = zeros(2, nKey, 'uint64');
iList = 0;
for k = 1:nKey
c = randi([1, 4], 1, 8);
hash = typecast(GetMD5(c, 'Binary', 'uint8'), 'uint64').';
if ~SearchHashKey(hash, List, iList)
% New data, process...
% Append hash to the list:
iList = iList + 1;
List(:, iList) = hash;
end
end
disp(iList);
toc
And the C-Mex function SearchHashKey.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint64_T *V, *W;
size_t iW, nW;
if (nrhs != 3) {
mexErrMsgTxt("SearchHashKey.mex: 3 inputs required.");
}
if (!mxIsUint64(prhs[0]) || !mxIsUint64(prhs[1])) {
mexErrMsgTxt("SearchHashKey.mex: Inputs must be UINT64.");
}
V = (uint64_T *) mxGetData(prhs[0]);
W = (uint64_T *) mxGetData(prhs[1]);
nW = (size_t) mxGetScalar(prhs[2]);
if (mxGetNumberOfElements(prhs[0]) != 2 ||
mxGetNumberOfElements(prhs[1]) < 2 * nW) {
mexErrMsgTxt("SearchHashKey.mex: Inputs have bad sizes.");
}
for (iW = 0; iW < nW; iW++) {
if (V[0] == W[0] && V[1] == W[1]) {
plhs[0] = mxCreateLogicalScalar(true);
return;
}
W += 2;
}
plhs[0] = mxCreateLogicalScalar(false);
return;
}
This takes 1.3 seconds for 40'000 elements only.
0 comentarios
Ver también
Categorías
Más información sobre Resizing and Reshaping Matrices en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!