How to search for multiple words anywhere in a sentence?
I want to search for three words: "battery", "power", and "failure". All three must appear in the sentence, in any order, for the cell to be copied.
I tried:
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:); %save rows which didn't contain
But this matches any cell that contains at least one of the three words.
How can I find only the cells that contain all three words, in any order?
Answers (3)
the cyclist
19 Sep 2015
0 votes
The most straightforward way, it seems to me, is to do the regexp search three times, once for each word, and then copy the cells where all three match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match like you have done.
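The "search once per word, then combine" idea can be sketched as follows. This is only an illustration under stated assumptions: `sampleCells` is made-up stand-in data (not the asker's `alldata`), and the variable names `isText`, `hasAll`, and `rowsWithAll` are hypothetical.

```matlab
% Sketch only: sampleCells is made-up stand-in data, not the asker's alldata.
sampleCells = {'Battery power failure reported', 42; ...
               'power failure only',             'failure of battery power'; ...
               [],                               'battery ok'};
words = {'battery', 'power', 'failure'};

% Only character cells can be searched.
isText = cellfun('isclass', sampleCells, 'char');

% Start from "every text cell qualifies" and AND in one search per word.
hasAll = isText;
for k = 1:numel(words)
    hit = ~cellfun('isempty', regexpi(sampleCells(isText), words{k}, 'once'));
    hasAll(isText) = hasAll(isText) & hit;
end

% Keep the rows in which at least one cell contains all three words.
rowsWithAll = any(hasAll, 2);
```

With the asker's data, `sampleCells` would presumably be `alldata(:,126:130)`, and `rowsWithAll` would take the place of `any(idx,2)`.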
per isakson
19 Sep 2015
Edited: per isakson
20 Sep 2015
Try this
sentence_1 = 'abc battery def power ghi failure';
typo_str_1 = 'abc battery def power ghi faiXure';
sentence_2 = 'Battery def power ghi failure.';
typo_str_2 = 'abc Xbattery def power ghi failure';
words = {'battery','power','failure'};
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words );
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words );
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words );
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words );
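Each of `is1` to `is4` is a 1-by-3 logical vector with one entry per word, so a sentence qualifies only when all entries are true. A minimal sketch of that last step, assuming a hypothetical anonymous helper named `hasAllWords` (not part of the code above):

```matlab
words = {'battery', 'power', 'failure'};

% Hypothetical helper: true when every word occurs as a whole word in str.
hasAllWords = @(str) all(cellfun( ...
    @(w) ~isempty(regexpi(str, ['\<', w, '\>'], 'once')), words));

r1 = hasAllWords('abc battery def power ghi failure');   % all three words present
r2 = hasAllWords('abc battery def power ghi faiXure');   % "failure" is misspelled
r3 = hasAllWords('Battery def power ghi failure.');      % case differs, still a match
r4 = hasAllWords('abc Xbattery def power ghi failure');  % "Xbattery" is not a whole word
```

Under these assumptions `r1` and `r3` come out true, while `r2` and `r4` come out false.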
 
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
1 0 0 1 0 0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
function has_all_three = cssm( N )
sentence_1 = 'Abc battery def power ghi failure.';
typo_str_1 = 'Abc battery def power ghi faiXure.';
multistr_1 = 'Abc battery def power ghi battery.';
sentence_2 = 'Battery def failure ghi power jkl.';
typo_str_2 = 'Abc Xbattery def power ghi failure';
multistr_2 = 'Abc power def power ghi power jkl.';
%
test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
%
text_corp = repmat( test_sentences, [N,1] );
tic
% Group the alternatives so that \< and \> apply to every word, not only the first and last.
cac = regexpi( text_corp, '\<(battery|power|failure)\>', 'match' );
has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
toc
end
12 Comments
Amr Hashem
19 Sep 2015
per isakson
19 Sep 2015
Edited: per isakson
19 Sep 2015
"... but thats not what i want"
Then you need to better explain what you want. And also explain why my hint isn't useful to you.
Amr Hashem
19 Sep 2015
John D'Errico
19 Sep 2015
Because he wants a magic solution.
Amr Hashem
19 Sep 2015
Edited: Amr Hashem
19 Sep 2015
per isakson
19 Sep 2015
Edited: per isakson
19 Sep 2015
The task is: "search for three words "Battery, power, failure" the three must exist in the sentence in any order". Is that correct?
"I have about (57000*6 cell)" How is that cell array related to alldata(:,126:130)? With one sentence per cell, you would have about 0.342 million sentences(?). What is an acceptable execution time?
"I only need to modify this line:" You need at least to explain what you expect the line to do! Why should I guess?
"I only want to solve this problem" What problem? Why only? What makes you think it is even possible to accomplish the task with code along the lines you propose? I don't think it is possible!
btw: should "Xbattery" match "battery"?
Amr Hashem
19 Sep 2015
per isakson
20 Sep 2015
"I am now asking is it possible to modify the code or not? "   I repeat: I don't think it is possible!
per isakson
20 Sep 2015
Edited: per isakson
20 Sep 2015
Three words in any order is a tough job for regexp.   "to do the regexp search three times, once for each word"   is a sound approach and I cannot understand why you dismissed it.
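For comparison, a single-pattern variant is sometimes written with lookahead assertions, one per word. The following is only a sketch under assumptions: the pattern `pat` and the test strings in `samples` are made up for illustration and have not been run against the asker's data. The trailing `.` is there because MATLAB's regexp ignores zero-length matches by default.

```matlab
% Assumed pattern: one lookahead per word, all anchored at the start of the text.
% The final '.' consumes one character so that the match is not zero-length.
pat = '^(?=.*\<battery\>)(?=.*\<power\>)(?=.*\<failure\>).';

% Made-up test strings.
samples = {'Abc battery def power ghi failure.', ...
           'Failure of power, then battery.', ...
           'Abc battery def power ghi faiXure.', ...
           'Abc power def power ghi power jkl.'};

hasAll = ~cellfun('isempty', regexpi(samples, pat, 'once'));
```

Whether this is faster or clearer than three separate searches is not established here; running one search per word remains the simpler approach to reason about.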
per isakson
20 Sep 2015
I added new code to my answer.
Amr Hashem
20 Sep 2015
Amr Hashem
20 Sep 2015
1 Comment
Cedric
22 Sep 2015
This can be simplified as developed in my answer. I move it below as a comment:
Here is an alternate solution:
keywords = {'battery', 'power', 'failure'} ;
allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
'V_batterypowerfailure', 'I_atterypowerfailure'; ...
'I_batterypowerfailre', 'V_batterypowerfailure'} ;
ids = 1 : numel( allCells ) ;
for k = 1 : numel( keywords )
isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
ids = ids(isFound) ;
end
validCells = allCells(ids) ;
You'll notice that it works on a pool of cells which reduces with the keyword index (as when a keyword is not found, there is no point in testing the others). I started valid entries of the dummy data set with V_ and invalid entries with I_ to simplify the final check.
If you need a case-insensitive solution, replace
strfind( allCells(ids), keywords{k} )
with
regexpi( allCells(ids), keywords{k}, 'once' )
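Put together, the case-insensitive variant of the loop might look like the sketch below. The mixed-case entries in `allCells` are made up for illustration, following the same `V_`/`I_` convention for valid and invalid entries.

```matlab
keywords = {'battery', 'power', 'failure'};

% Made-up data: V_ marks entries that should pass, I_ entries that should not.
allCells = {'V_BatteryPowerFailure', 'I_batterypwerfailure'; ...
            'V_batterypowerfailure', 'I_atteryPOWERfailure'};

ids = 1:numel(allCells);   % linear indices of the cells still in the pool
for k = 1:numel(keywords)
    % regexpi ignores case; 'once' stops at the first match in each cell.
    isFound = ~cellfun('isempty', regexpi(allCells(ids), keywords{k}, 'once'));
    ids = ids(isFound);    % drop the cells that lack this keyword
end
validCells = allCells(ids);
```

Because `ids` holds linear indices, `validCells` comes back as a row of the surviving cells rather than in the original 2-D layout.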