MATLAB: Find the indices of the strings in a cell array whose characters are all contained in a given string (without repetition)
Matthew Worker
on 13 Oct 2013
Commented: Cedric
on 13 Oct 2013
I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
I want to obtain:
idx = [2, 3, 6];
I have written some very long code that:
1. finds the elements whose length is not greater than length(str);
2. removes the elements containing characters not included in str;
3. finally, for each remaining element, checks the characters one by one.
Essentially, it is almost brute-force code and it runs very slowly. I wonder if there is a simple way to do this fast.
Accepted Answer
Mohsen Nosratinia
on 13 Oct 2013
You can sort the characters of each string and then match the sorted strings against a regular expression. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
u = unique(str);
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
s = cellfun(@sort, dic, 'uni', 0);
idx = find(~cellfun('isempty', regexp(s, t)));
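To make the steps above easier to follow, here is a minimal sketch of the same idea with the intermediate values spelled out. It is a restatement under assumptions, not the answerer's exact code: the counts are computed with sum instead of histc, the variable names n and pat are made up for illustration, and str is assumed to contain only letters (characters that are special in regular expressions, such as '.' or '*', would need escaping).

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};

u = unique(str);                       % distinct characters, sorted: 'actz'
n = arrayfun(@(c) sum(str == c), u);   % occurrences of each one: [2 1 1 1]

% One '<char>{0,<count>}' group per distinct character, anchored at both ends.
% sprintf walks the 2-by-N matrix column by column: code, count, code, count, ...
pat = ['^' sprintf('%c{0,%d}', [double(u); n]) '$'];

% Sort the characters of every candidate so they appear in the same order
% as the groups in the pattern, e.g. 'zac' becomes 'acz'.
s = cellfun(@sort, dic, 'UniformOutput', false);

% Keep the candidates whose sorted form matches the pattern.
idx = find(~cellfun('isempty', regexp(s, pat, 'once')));
```

Sorting is what makes a single pattern sufficient: once the characters are in code order, each group only has to bound how many times its character appears in a row.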
More Answers (3)
Cedric
on 13 Oct 2013
Edited: Cedric
on 13 Oct 2013
Here is another solution, just for the fun of it:
>> spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
>> str_spec = spectrum(str) ;
Then
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6
>> dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz', 'aaaz'} ;
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6 8
where we see that 'aaz' was taken into account but not 'aaaz', as it has more a's than str.
N/A
on 13 Oct 2013
Cedric, when I try your code it returns only 2.
When I use my real cell of strings, it returns this error:
??? Error using ==> accumarray
First input SUBS must contain positive integer subscripts.
Error in ==> @(s)accumarray(s.'-64,ones(size(s)),[58,1])
Error in ==> @(s)all(spectrum(s)<=str_spec)
I am trying to understand why this is happening.
Cedric
on 13 Oct 2013
Edited: Cedric
on 13 Oct 2013
Does a copy/paste of the following return only 2?
str = 'actaz' ;
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'} ;
spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
str_spec = spectrum(str) ;
find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
It shouldn't.
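In your real cell array of strings, do you have special characters, digits or spaces? I wrote this solution assuming that there would be only letters (lower or upper case); any character with a code of 64 or less produces a non-positive subscript, which is what triggers the accumarray error. If there are spaces and digits, please use the updated version of spectrum: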
spectrum = @(s) accumarray(s.'-31, ones(size(s)), [91,1]) ;
If there can be any special character, just use
spectrum = @(s) accumarray(s.'-0, ones(size(s)), [256,1]) ;
PS: s.'-0 could be replaced by double(s).', but I wanted to keep the previous structure for it not to be even more confusing ;-)
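For reference, a minimal sketch of the double(s) variant mentioned in the PS, run on a list that also contains a space and a digit. The extra test strings 'a z' and 'a1' are made up for illustration, and the sketch assumes non-empty strings with character codes between 1 and 256.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', ...
       'aaz', 'aaaz', 'a z', 'a1'};

% Count how often each character code occurs in s.
% double(s(:)) gives a numeric column of subscripts; the scalar 1 is
% accumulated once per subscript.
spectrum = @(s) accumarray(double(s(:)), 1, [256, 1]);
str_spec = spectrum(str);

% A candidate is kept when none of its character counts exceeds the
% corresponding count in str.
idx = find(cellfun(@(s) all(spectrum(s) <= str_spec), dic));
```

With this list the space and the digit no longer raise an error; 'a z' and 'a1' are simply rejected because str contains neither character.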
Jos (10584)
on 13 Oct 2013
Take a look at my function MATCHROW:
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
indices = find(cellfun(@(x) matchrow(str,x),dic))
The warning it issues (when numel(str) > numel(dic{k})) can be ignored. The function can be found here: http://www.mathworks.com/matlabcentral/fileexchange/14520
Azzi Abdelmalek
on 13 Oct 2013
Edited: Azzi Abdelmalek
on 13 Oct 2013
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
idx1=find(cellfun(@numel,dic)<=numel(str))
dic1=dic(idx1)
idx2=cellfun(@(x) numel(unique(x))==numel(x),dic1)
idx3=idx1(idx2)
dic3=dic1(idx2)
idx4= cellfun(@(x) all(ismember(x,str)),dic3)
indices=idx3(idx4)
N/A
on 13 Oct 2013
Dear Azzi,
Your solution is amazing and very elegant, although it also appears to be a bit slow (but faster than the code I am currently using).
There is only one problem. Suppose that
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
the solution should be
indices = [2 3 6 8];
because 'a' appears twice in str. I realize I should have made clear that a character can appear at most n times in an element of dic if it appears n times in str, i.e., characters cannot be repeated more times than they are repeated in str.
Do you think your code can be adjusted to this condition?
Anyway, thank you a lot for your prompt answer!
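One possible way to adapt the filtering approach to this condition is sketched below. It is not taken from the original answer: the helper name fits is hypothetical, and the uniqueness test is replaced by a per-character count comparison against str.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};

% Hypothetical helper: true if every character of x occurs in x no more
% often than it occurs in str (a character absent from str has count 0).
fits = @(x) all(arrayfun(@(c) sum(x == c) <= sum(str == c), unique(x)));

% Cheap length pre-filter first, then the per-character count check.
idx1 = find(cellfun(@numel, dic) <= numel(str));
idx2 = cellfun(fits, dic(idx1));
indices = idx1(idx2);
```

The count comparison also covers the membership test, since a character missing from str has a count of zero there, so the separate ismember step is no longer needed.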