Improving efficiency of a char array function
2 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Paolo Binetti
el 8 de En. de 2017
Editada: Paolo Binetti
el 9 de En. de 2017
I have built a function to be run 100000 times in a loop. Its input, "motifs", is a full 20x15 char array, containing only four types of characters, 'A', 'C', 'G', 'T'. I am wondering if my implementation, below, can be improved of if it is pretty much as fast as it gets:
% Find consensus string and score of a motifs array
function [score, count, consensus] = scoremotifs_2(motifs)
count = [ sum(motifs == 'A',1) ;
sum(motifs == 'C',1) ;
sum(motifs == 'G',1) ;
sum(motifs == 'T',1) ];
[count_max, consensus_num] = max(count);
consensus(consensus_num == 1) = 'A';
consensus(consensus_num == 2) = 'C';
consensus(consensus_num == 3) = 'G';
consensus(consensus_num == 4) = 'T';
score = sum(sum(motifs ~= consensus));
3 comentarios
the cyclist
el 8 de En. de 2017
Editada: the cyclist
el 8 de En. de 2017
One thing to be aware of ...
It looks to me like your algorithm will be biased to overcount 'A' relative to the other elements, because in the case of a tied number of counts, consensus_num_ is going to choose the first index. (Similarly, 'C' will be more likely than 'G' and 'T', etc.)
Respuesta aceptada
the cyclist
el 8 de En. de 2017
Editada: the cyclist
el 8 de En. de 2017
This is starting to get a little obfuscated, but it's significantly faster:
list ='ACGT';
count = sum(motifs == reshape(list,[1 1 4]));
[count_max, consensus_num] = max(count,[],3);
consensus = list(consensus_num);
score = sum(sum(motifs ~= consensus));
The key algorithmic difference here is that I am able to compare against all four elements 'ACGT' in parallel, by permuting them into a third dimension.
You need R2016b or later in order for MATLAB to make the implicit dimension expansion to compare the 20x15x1 array against the 1x1x4 array. If you have an older version, you will explicitly need to use repmat.
2 comentarios
the cyclist
el 8 de En. de 2017
Also, because you are calling this function lots of time, you might want to do something like defining
list = 'ACGT';
plist = reshape(list,[1 1 4]);
in the calling function, and pass them in as argument, rather than needing to do that reshape every time. But I think that will be only a tiny speedup, if any.
Más respuestas (1)
the cyclist
el 8 de En. de 2017
It's somewhat faster to calculate consensus like this:
list = 'ACGT';
consensus = list(consensus_num);
0 comentarios
Ver también
Categorías
Más información sobre Particle & Nuclear Physics en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!