String Replacing for a DNA sequence

Reshma Ravi
Reshma Ravi on 2 Jun 2017
Commented: Reshma Ravi on 2 Jun 2017
I want to extract the rows of a table whose count is greater than 1. The first column contains strings and the second column contains their counts. For example, given table A:
AAGC 1
GCCU 2
AGCU 2
CCGU 1
the desired output is:
GCCU 2
AGCU 2
  2 comments
Andrei Bobrov
Andrei Bobrov on 2 Jun 2017
Edited: Andrei Bobrov on 2 Jun 2017
Please give an example with the starting sequence and the finished result.
Jan
Jan on 2 Jun 2017
What exactly is a "repeated substring"?


Accepted Answer

Andrei Bobrov
Andrei Bobrov on 2 Jun 2017
Edited: Andrei Bobrov on 2 Jun 2017
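A = {'AAGC', 1; 'GCCU', 2; 'AGCU', 2; 'CCGU', 1};
% Convert the cell array to a table with named columns
T = cell2table(A, 'VariableNames', {'DNA', 'count'});
% Keep only the rows whose count exceeds 1
Tout = T(T.count > 1, :);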
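If the counts are not available yet, they could be derived from a list of substrings before filtering. The following is only a sketch under the assumption that the substrings already sit in a cell array of character vectors; the variable `subs` and its contents are hypothetical and not taken from the question.

```matlab
% Hypothetical input: 4-letter substrings already extracted from a sequence.
subs = {'AAGC', 'GCCU', 'AGCU', 'GCCU', 'CCGU', 'AGCU'};

% unique returns the distinct strings (sorted) and, for every element
% of subs, the index of its distinct string.
[names, ~, idx] = unique(subs);

% accumarray adds 1 for each occurrence of an index, i.e. it counts them.
counts = accumarray(idx(:), 1);

% Build the two-column table and keep the rows with count > 1.
T = table(names(:), counts, 'VariableNames', {'DNA', 'count'});
Tout = T(T.count > 1, :);
```

Note that `unique` sorts its output, so the rows of `Tout` appear in alphabetical order rather than in order of first occurrence.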

More Answers (1)

Jan
Jan on 2 Jun 2017
Imagine that you have worked out how to get a cell string containing the substrings:
C = {'GTTA', 'TTAG', 'TAGC', 'GTTA', 'GTTA', 'GTTA', 'TTAG'};
Now find the repeated strings:
% repeated(k) is true when C{k} equals its successor C{k+1}
repeated = strcmp(C(1:end-1), C(2:end));
Unfortunately, the description is not clear:
"if GTTA is repeated 4 times then replace it with another non terminal for example, A or something like that."
Do you want to replace each repeated string with the character 'A', or all 4 repetitions with a single 'A'? This might be:
% Replaces every element that equals its successor; the last element of each run is left unchanged
C(repeated) = {'A'};
or, to get the run lengths first (RunLength is a File Exchange submission, not a built-in function):
[B, N, Index] = RunLength(repeated);
Since I am not sure what you are asking for, I will not spend more time on creating an explicit answer. But you can try it on your own.
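For readers without the File Exchange `RunLength` function, the run lengths can also be obtained with built-in functions only. This is a minimal sketch, assuming the second interpretation (each run of two or more consecutive identical substrings collapses to a single 'A'); the names `sameAsPrev`, `runStart`, `startIdx`, `runLen` and `runVal` are made up for illustration.

```matlab
C = {'GTTA', 'TTAG', 'TAGC', 'GTTA', 'GTTA', 'GTTA', 'TTAG'};

% sameAsPrev(k) is true when C{k+1} equals C{k}; it has numel(C)-1 elements.
sameAsPrev = strcmp(C(1:end-1), C(2:end));

% A run starts at the first element and wherever an element differs
% from its predecessor.
runStart = [true, ~sameAsPrev];
startIdx = find(runStart);

% Run length = distance from one run start to the next (or to the end).
runLen = diff([startIdx, numel(C) + 1]);
runVal = C(startIdx);

% Assumed interpretation: collapse every run of 2 or more copies to one 'A'.
runVal(runLen > 1) = {'A'};
```

This only detects consecutive repetitions. The first 'GTTA' is separated from the later run by other substrings, so it forms its own run of length 1 and is kept.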
