# Fast data structure for tabu list?

5 views (last 30 days)
Paul Berglund on 15 Sep 2017
Answered: Jan on 16 Sep 2017
I am looking for a fast way to implement a tabu list where each element in the list is a vector of integers.
I am writing some optimization code where evaluating the quality of a given candidate solution is expensive so I want to record that I have evaluated a given candidate before so as to avoid doing it a second time.
I am currently doing something very simple:
ismember(candidateSolution,tabuList,'rows')
where the tabuList is a matrix where each row corresponds to a previously evaluated candidateSolution. This works well enough for small problem instances but starts to bog down when the number of rows in tabuList gets into the tens of thousands.
Presumably a more sophisticated data structure would improve performance, but I'm not sure what to try. Maybe a sparse matrix?

James Tursa on 16 Sep 2017
How many elements per row? How many total candidate solutions are we talking about? What are the ranges of the integer values? Are you willing to consider mex functions?
Paul Berglund on 16 Sep 2017
This will vary, but as an example the one I'm having trouble with right now has 9 elements in a row. The integer values will always be under 2000, and in the current example they are all under 100.
The number of candidate solutions is large, but the number I need to keep track of is in the hundreds of thousands. Big but not huge.
I think I'll only try mex if I've exhausted other reasonable options.

Walter Roberson on 16 Sep 2017
Use containers.Map . Test with iskey()

Image Analyst on 16 Sep 2017
I didn't know about this. thanks, it could be useful. Is it old? The documentation does not say when it was introduced.
I was going to suggest a table. Could containers be an alternative to a table? And iskey() an alternative to ismember()?
Walter Roberson on 16 Sep 2017
R2008b according to the release notes.

John D'Errico on 15 Sep 2017
Sparse matrices are NOT the answer here. In fact, that would be an actively terrible reason to use a sparse matrix.
You might consider the memoize function, which allows MATLAB to recognize that it has seen a set of inputs to a specific function.

Paul Berglund on 15 Sep 2017
Alas my MATLAB version is old and does not have memoize.
John D'Errico on 16 Sep 2017
When an old release is an issue, it is always a good idea to tell us. Saves us wasting time answering, and you wasting time to tell us.

Jan on 16 Sep 2017
Edited: Jan on 16 Sep 2017
You can use a hash, e.g. obtained by FEX: DataHash or FEX: GetMD5. The latter is faster, but the hashing will not be the bottleneck of the code.
tabuList = {};
hash8 = GetMD5(candidateSolution, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if any(strcmp(hash, tabuList))
else
% New data, process...
tabuList{end+1} = hash;
end
This can be improved by pre-allocating the tabuList or perhaps by a binary search. But usually strcmp is fast, because it stops the comparison at the first not matching character.
[EDITED] The comparison is slightly faster with using all 16 bits of the CHAR type. Now this takes 37 seconds on my R2016b/64 system for 40'000 candidates with about 10'000 repeated keys. The main time is spent in any(strcmp), such that the drawback of the omitted pre-allocation is less important that the advantage of having a shorter tabuList. With a dedicated C-Mex function this can be accelerated:
tic;
tabuList = cell(1, 4e4); % Pre-allocate
iList = 0;
for k = 1:4e4
c = randi([1, 4], 1, 8);
% hash = GetMD5(c, 'Binary', 'base64');
hash8 = GetMD5(c, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if anyStrcmp8(hash, tabuList))
else
% New data, process...
% Add hash to the list:
iList = iList + 1;
tabuList{iList} = hash;
end
end
disp(iList)
toc
And the Mex function anyStrcmp8.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint16_T *S, *CS;
size_t iC, nC, nS;
const mxArray *C, *aC;
// Get inputs:
S = (uint16_T *) mxGetData(prhs);
nS = mxGetNumberOfElements(prhs);
C = prhs;
nC = mxGetNumberOfElements(C);
// Loop over cell:
for (iC = 0; iC < nC; iC++) {
aC = mxGetCell(C, iC);
if (aC == NULL) { // Undeclared element reached:
plhs = mxCreateLogicalScalar(false);
return;
}
CS = (uint16_T *) mxGetData(aC);
if (memcmp(S, CS, 8 * sizeof(uint16_T)) == 0) {
// Matching element found:
plhs = mxCreateLogicalScalar(true);
return;
}
}
// No element found
plhs = mxCreateLogicalScalar(false);
return;
}
This needs 13.6 sec for 40'000 keys with 25% repetitions.

Jan on 16 Sep 2017
It is much faster to store the hash keys in an UINT64 matrix than in a cell string:
tic;
nKey = 4e4;
List = zeros(2, nKey, 'uint64');
iList = 0;
for k = 1:nKey
c = randi([1, 4], 1, 8);
hash = typecast(GetMD5(c, 'Binary', 'uint8'), 'uint64').';
if ~SearchHashKey(hash, List, iList)
% New data, process...
% Append hash to the list:
iList = iList + 1;
List(:, iList) = hash;
end
end
disp(iList);
toc
And the C-Mex function SearchHashKey.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint64_T *V, *W;
size_t iW, nW;
if (nrhs != 3) {
mexErrMsgTxt("SearchHashKey.mex: 3 inputs required.");
}
if (!mxIsUint64(prhs) || !mxIsUint64(prhs)) {
mexErrMsgTxt("SearchHashKey.mex: Inputs must be UINT64.");
}
V = (uint64_T *) mxGetData(prhs);
W = (uint64_T *) mxGetData(prhs);
nW = (size_t) mxGetScalar(prhs);
if (mxGetNumberOfElements(prhs) != 2 ||
mxGetNumberOfElements(prhs) < 2 * nW) {
}
for (iW = 0; iW < nW; iW++) {
if (V == W && V == W) {
plhs = mxCreateLogicalScalar(true);
return;
}
W += 2;
}
plhs = mxCreateLogicalScalar(false);
return;
}
This takes 1.3 seconds for 40'000 elements only.