I want to convert a character series into a numerical series using a for loop

I have a character sequence stored in the variable DNA_SEQS = 'AGGTAT.....'. The sequence consists of four types of character, 'A', 'C', 'T' and 'G', so I have used a switch-case to generate the numerical sequence. The code I have written is:
seqs = fastaread('AF0071891.fasta');
DNA_SEQS = seqs.Sequence;
len = length(DNA_SEQS);
for j = 1:5
x = [];
a = DNA_SEQS(j);
switch a
case 'A'
v = 0;
case 'C'
v = 1;
case 'G'
v = 2;
case 'T'
v = 3;
end
x(j+1) = [x(j) v];
end
With this code I expected to get a numerical array like [0,2,2,3,0], but I got the error: Index exceeds matrix dimensions.
Please help

Accepted Answer

dpb on 8 Jun 2022
Edited: dpb on 8 Jun 2022
for j = 1:5
x = [];
a = DNA_SEQS(j);
...
You wipe out whatever you stored in x every time you start through the loop again...don't do that!!! :)
x = [];
for j = 1:5
a = DNA_SEQS(j);
...
instead, although you should
  1. preallocate and assign into the array
  2. size x() based on the length of the string rather than hardcoding the loop count
N=strlength(DNA_SEQS);
x=zeros(1,N);
for j = 1:N
a = DNA_SEQS(j);
...
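Whatever the loop body contains, the final assignment has to store exactly one value per iteration; the line x(j+1) = [x(j) v] from the question tries to put two values into a single element. A minimal sketch of the complete loop, assuming DNA_SEQS is a char row vector (the sample string stands in for seqs.Sequence, and the otherwise branch is an added assumption, not part of the original code):

```matlab
DNA_SEQS = 'AGGTAT';          % sample char row vector (stand-in for seqs.Sequence)
N = numel(DNA_SEQS);          % number of characters in the sequence
x = zeros(1, N);              % preallocate once, outside the loop
for j = 1:N
    switch DNA_SEQS(j)
        case 'A'
            v = 0;
        case 'C'
            v = 1;
        case 'G'
            v = 2;
        case 'T'
            v = 3;
        otherwise
            v = NaN;          % assumption: flag any unexpected character
    end
    x(j) = v;                 % store one scalar at position j
end
disp(x)
```

Assigning to x(j) rather than growing the array keeps the result the same length as the input.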
However, in MATLAB you don't need a loop; use a lookup table instead. One way (not necessarily the fastest, but pretty easy to code) would be
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS));
This would return for your sample above...
>> DNA_SEQS = 'AGGTAT';
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS))
DNA_VALS =
0 2 2 3 0 3
>>
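Another way to build the lookup table is to index a plain vector by character code. This is only a sketch under the assumption that every character in the sequence is 7-bit ASCII; the name lut is hypothetical. Unlike interp1, which interpolates between the table points (a 'B' would come back as 0.5), unmapped characters here come back as NaN:

```matlab
DNA_SEQS = 'AGGTAT';
lut = nan(1, 128);                 % one slot per character code; NaN marks unmapped characters
lut(double('ACGT')) = 0:3;         % codes 65, 67, 71, 84 map to 0, 1, 2, 3
DNA_VALS = lut(double(DNA_SEQS));  % index the table with the character codes
disp(DNA_VALS)
```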
  1 Comment
S Kar on 8 Jun 2022
Thank you but still got this error using the first method:
In an assignment A(:) = B, the number of elements in A and B must be the same.
The second method is working fine


More Answers (1)

DGM on 8 Jun 2022
You can use ismember():
thisstr = 'AGGATATC';
charmap = 'ACGT';
[~,idx] = ismember(thisstr,charmap);
idx = idx-1
idx = 1×8
0 2 2 0 3 0 3 1
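One thing to keep in mind with this approach: ismember() returns index 0 for any character that is not in charmap, so after subtracting 1 those positions become -1. A small sketch, assuming you also want to accept lowercase input (the sample string with 'N' and lowercase letters is hypothetical):

```matlab
thisstr = 'AGNTac';                  % hypothetical input: 'N' is not in the map, 'a' and 'c' are lowercase
charmap = 'ACGT';
[found, idx] = ismember(upper(thisstr), charmap);
vals = idx - 1;                      % -1 marks characters that are not in charmap
badpos = find(~found);               % positions of the unmapped characters
disp(vals)
disp(badpos)
```

Checking found (or looking for -1) before using the values avoids silently carrying invalid codes forward.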
  4 Comments
dpb on 8 Jun 2022
For exactly the reason I outlined above as a possibility -- it isn't a char() array --
>> DNA_SEQS='AGGTAT'; % assign as char() string (and array of char())
>> N=strlength(DNA_SEQS) % strlength() is same as size(x,2) here...
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % works fine for a char() array with () addressing
A
G
G
T
A
T
>> DNA_SEQS = cellstr('AGGTAT'); % redefine as a cellstr() instead...
>> N=strlength(DNA_SEQS) % strlength knows about what is in the cell
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % but it fails as you see...
{'AGGTAT'}
Index exceeds the number of array elements (1).
>>
WHY!!!???
>> size(DNA_SEQS) % because now the cellstr is a 1x1 CELL array, NOT 1x6 char() array...
ans =
1 1
>>
How to make work???
"Use the curlies, Luke!!!"
>> for i=1:N,disp(DNA_SEQS{1}(i));end
A
G
G
T
A
T
>>
NB: note the use of {1} above to "dereference" the cell array back to the char() array inside it -- the subsequent "smooth" parentheses (i) then pick the ith element from that vector, just as they did directly when it was "only" a char() array, not a char() array in a cell.
Strings behave similarly to cellstr(); you have to use {} (the "curlies") to reach the individual characters that make up a string array element.
See cellstr and links there for addressing cell strings and cells in general.
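If you are not sure which of the three types you were handed, one option is to convert to char() up front so that plain () indexing works afterwards. A sketch, assuming a single sequence per variable and a MATLAB release that supports double-quoted string literals; the names inputs and vals are hypothetical:

```matlab
inputs = {'AGGTAT', cellstr('AGGTAT'), "AGGTAT"};  % char vector, 1x1 cellstr, string scalar
vals = cell(1, numel(inputs));
for k = 1:numel(inputs)
    seq = char(inputs{k});               % char row vector in all three cases
    [~, idx] = ismember(seq, 'ACGT');    % same mapping as in the answer above
    vals{k} = idx - 1;
    disp(vals{k})
end
```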
S Kar on 8 Jun 2022
Thank you so much for the elaboration.
