Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)

4 visualizaciones (últimos 30 días)
I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
I want to obtain:
idx = [2, 3, 6];
I have written a very long code that:
1. finds the elements with length not greater than length(str);
2. removes the elements with characters not included in str;
3. finally, for each remaining element, checks the characters one by one
Essentially, it's an almost brute force code and runs very slowly. I wonder if there is a simple way to do it fast.

Respuesta aceptada

Mohsen Nosratinia
Mohsen Nosratinia el 13 de Oct. de 2013
You can sort the strings and then match them using regular expression. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
u = unique(str);
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
s = cellfun(@sort, dic, 'uni', 0);
idx = find(~cellfun('isempty', regexp(s, t)));

Más respuestas (3)

Cedric
Cedric el 13 de Oct. de 2013
Editada: Cedric el 13 de Oct. de 2013
Here is another solution, for the fun of it ..
>> spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
>> str_spec = spectrum(str) ;
Then
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6
>> dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz', 'aaaz'} ;
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6 8
where we see that 'aaz' was taken into account but not 'aaaz' as is has more a's than str.
  2 comentarios
N/A
N/A el 13 de Oct. de 2013
Cedric, when I try your code it returns only 2.
When I use my real cell of strings, it returns this error:
??? Error using ==> accumarray
First input SUBS must contain positive integer subscripts.
Error in ==> @(s)accumarray(s.'-64,ones(size(s)),[58,1])
Error in ==> @(s)all(spectrum(s)<=str_spec)
I am trying to understand why this is happening.
Cedric
Cedric el 13 de Oct. de 2013
Editada: Cedric el 13 de Oct. de 2013
A copy/paste with the following returns only 2?
str = 'actaz' ;
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'} ;
spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
str_spec = spectrum(str) ;
find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
It shouldn't.
In your real cell of strings, do you have special characters, numbers or spaces? I wrote this solution thinking that there would be only letters (lower or upper case). If there are spaces and digits, please use the update version of spectrum:
spectrum = @(s) accumarray(s.'-31, ones(size(s)), [91,1]) ;
If there can be any special character, just use
spectrum = @(s) accumarray(s.'-0, ones(size(s)), [256,1]) ;
PS: s.'-0 could be replaced by double(s).', but I wanted to keep the previous structure for it not to be even more confusing ;-)

Iniciar sesión para comentar.


Jos (10584)
Jos (10584) el 13 de Oct. de 2013
Take a look at my function MATCHROW:
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
indices = find(cellfun(@(x) matchrow(str,x),dic))
The warning it issues (when numel(str) > numel(dic{k}) ), can be ignored. The function can be found here: http://www.mathworks.com/matlabcentral/fileexchange/14520

Azzi Abdelmalek
Azzi Abdelmalek el 13 de Oct. de 2013
Editada: Azzi Abdelmalek el 13 de Oct. de 2013
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
idx1=find(cellfun(@numel,dic)<=numel(str))
dic1=dic(idx1)
idx2=cellfun(@(x) numel(unique(x))==numel(x),dic1)
idx3=idx1(idx2)
dic3=dic1(idx2)
idx4= cellfun(@(x) all(ismember(x,str)),dic3)
indices=idx3(idx4)
  2 comentarios
N/A
N/A el 13 de Oct. de 2013
Dear Azzi,
your solution is amazing and very elegant, although apparently it is also a bit slow (but faster than the code that I am currently using).
There is only one problem. Suppose that
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
the solution should be
indices = [2 3 6 8];
because 'a' appears twice in str. I realize I should have made clear that characters can appear at most n times in dic if they appear n times in str - i.e characters cannot be repeated more times than they are repeated in str.
Do you think your code can be adjusted to this condition?
Anyway, thank you a lot for your prompt answer!

Iniciar sesión para comentar.

Categorías

Más información sobre Cell Arrays en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by