Expected Behavior for contains()?

I am seeing odd behavior from the contains() function and I'm not sure if it's a bug or if it's working as expected.
I have two cell arrays of chars: rFootNames is 44x1 and eligNames is 33x1.
rFootNames={'RDP1';'RFCC';'RFM1';'RFM5';'RHeelCirc1';'RHeelCirc2';'RHeelCirc3';'RHeelCirc4';'RHeelCirc5';'RHeelCirc6';'RHeelCirc7';'RHeelCirc8';'RHeelCirc9';...
'RHeelCirc10';'RHeelCirc11';'RHeelCirc12';'RHeelCirc13';'RHeelCirc14';'RHeelCirc15';'RHeelCirc16';'RHeelCirc17';'RHeelCirc18';'RHeelCirc19';...
'RHeelCirc20';'RForeCirc1';'RForeCirc2';'RForeCirc3';'RForeCirc4';'RForeCirc5';'RForeCirc6';'RForeCirc7';'RForeCirc8';'RForeCirc9';'RForeCirc10';...
'RForeCirc11';'RForeCirc12';'RForeCirc13';'RForeCirc14';'RForeCirc15';'RForeCirc16';'RForeCirc17';'RForeCirc18';'RForeCirc19';'RForeCirc20'};
eligNames={'LFCC';'RDP1';'RFM1';'RFM5';'LHeelCirc8';'LHeelCirc9';'LHeelCirc10';'LHeelCirc11';'LHeelCirc12';'LHeelCirc13';'LHeelCirc14';'RHeelCirc1';...
'RHeelCirc2';'RHeelCirc3';'RHeelCirc4';'RHeelCirc5';'RHeelCirc6';'RHeelCirc19';'RHeelCirc20';'RForeCirc1';'RForeCirc2';'RForeCirc3';'RForeCirc4';'RForeCirc5';...
'RForeCirc12';'RForeCirc13';'RForeCirc14';'RForeCirc15';'RForeCirc16';'RForeCirc17';'RForeCirc18';'RForeCirc19';'RForeCirc20'};
When I run
rIdx=contains(rFootNames,eligNames);
the rIdx variable is the same length as rFootNames, as expected, but
sum(rIdx)
ans = 36
comes out to be 36, which is larger than the length of the "eligNames" pat argument, which I thought would be impossible! I think that the correct answer here is 25, which is the number returned by
sum(ismember(rFootNames,eligNames))
ans = 25
How can there be more matches found than elements in "eligNames" when using contains()? Is this expected behavior? What am I missing? Thanks.

Accepted Answer

Stephen on 27 Oct 2021
Edited: Stephen on 27 Oct 2021
As its documentation explains, CONTAINS returns logical TRUE wherever pat (the second input) is found anywhere within str (the first input). The pattern only needs to match part of the string, i.e. the string contains the pattern. This is easy to demonstrate using a small subset of your data:
rFootNames = {'RHeelCirc1';'RHeelCirc10'};
eligNames = {'RHeelCirc1'};
rIdx = contains(rFootNames,eligNames)
rIdx = 2×1 logical array
   1
   1
From your question, you apparently did not expect two matches, but this output is correct because "RHeelCirc10" contains the pattern "RHeelCirc1". That is exactly what CONTAINS does: it reports that both "RHeelCirc1" and "RHeelCirc10" contain the pattern "RHeelCirc1", so both are correct matches. The fact that one of them also matches the pattern exactly is irrelevant to CONTAINS. Your comparison against the output of ISMEMBER assumes exact matching, which is why the two counts differ.
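To see where the pattern sits inside each string, the same two-element subset can be passed to strfind. This is only an illustrative sketch; the variable names str, pat, startIdx and hasPat are made up for the example.

```matlab
% Small subset taken from the question
str = {'RHeelCirc1'; 'RHeelCirc10'};
pat = 'RHeelCirc1';

% strfind returns the start index of every occurrence of pat in each
% string (an empty array where there is no occurrence)
startIdx = strfind(str, pat);            % cell array, one entry per string

% A string "contains" the pattern when at least one occurrence exists
hasPat = ~cellfun(@isempty, startIdx);   % same result as contains(str, pat)
```

Both strings report an occurrence starting at character 1, which is why CONTAINS flags both of them.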
"I think that the correct answer here is 25"
Nope, 25 is the number of exact matches of the entire strings (what ISMEMBER checks), not the number of strings that contain a pattern as a substring (what CONTAINS checks).
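If exact, whole-string matching is what is wanted, a minimal sketch on the same subset could look like the following. The variable names are illustrative, and matches is only available in R2019b or later.

```matlab
str = {'RHeelCirc1'; 'RHeelCirc10'};
pat = {'RHeelCirc1'};

% ismember compares whole strings, so 'RHeelCirc10' is not a match
exactIdx = ismember(str, pat);

% matches (R2019b or later) applies the same whole-string test
exactIdx2 = matches(str, pat);
```

Either call flags only the first element, unlike CONTAINS, which flags both.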
"How can there be more matches found than elements in "eligNames" when using contains()?"
Because the pattern "RHeelCirc1" is contained within "RHeelCirc1" and also within "RHeelCirc10" and also within "RHeelCirc11" and also within "RHeelCirc12" and also within "RHeelCirc13" ... etc. etc. etc.
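The difference between the two counts can be listed directly. The sketch below rebuilds the two lists from the question with a small hypothetical helper (mk, not part of the original post) and then extracts the names that CONTAINS flags but ISMEMBER does not.

```matlab
% Hypothetical helper: builds names such as 'RHeelCirc7' from a prefix and numbers
mk = @(prefix, nums) arrayfun(@(k) sprintf('%s%d', prefix, k), nums(:), ...
    'UniformOutput', false);

% Same contents and order as the lists in the question
rFootNames = [{'RDP1';'RFCC';'RFM1';'RFM5'}; mk('RHeelCirc',1:20); mk('RForeCirc',1:20)];
eligNames  = [{'LFCC';'RDP1';'RFM1';'RFM5'}; mk('LHeelCirc',8:14); ...
    mk('RHeelCirc',[1:6 19 20]); mk('RForeCirc',[1:5 12:20])];

subIdx   = contains(rFootNames, eligNames);   % substring matches
exactIdx = ismember(rFootNames, eligNames);   % whole-string matches

% Names matched only because a shorter pattern is a substring of them
extra = rFootNames(subIdx & ~exactIdx);
```

Here extra holds the 11 names RHeelCirc10 to RHeelCirc18, RForeCirc10 and RForeCirc11, which accounts for the gap between 36 and 25.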
"Is this expected behavior?"
Yes.

More Answers (1)

Image Analyst on 27 Oct 2021
Edited: Image Analyst on 27 Oct 2021
Because some of the names in eligNames occur in more than one location of rFootNames, they are counted more than once.
  1 Comment
Mitchell Tillman on 27 Oct 2021
I don't think that there are any duplicates in either variable, because both
isequal(sort(rFootNames),sort(unique(rFootNames)))
and
isequal(sort(eligNames),sort(unique(eligNames)))
return true.
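The absence of duplicates and the multiple hits are compatible. As a rough sketch (the four-name subset and the variable names are chosen only for illustration), counting hits per pattern shows that one pattern can match several distinct names:

```matlab
% Subset of names from the question, with no duplicates
str = {'RHeelCirc1'; 'RHeelCirc10'; 'RHeelCirc11'; 'RHeelCirc2'};
pat = {'RHeelCirc1'; 'RHeelCirc2'};

% True when every name occurs exactly once
noDuplicates = numel(unique(str)) == numel(str);

% Number of names in str that contain each individual pattern
hitsPerPattern = cellfun(@(p) nnz(contains(str, p)), pat);
```

In this subset 'RHeelCirc1' is found in three different names and 'RHeelCirc2' in one, even though no name is repeated.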


Release

R2020b
