getting the nth term out of a sequence

Question

SANGBIN LEE on 29 Feb 2024

https://la.mathworks.com/matlabcentral/answers/2088736-getting-the-nth-term-out-of-a-sequence

Edited: John D'Errico on 29 Feb 2024

% Define the input and output file names
inputFileName = 'KIF11.txt';
outputFileName = 'CDS.txt';
% Read the sequence from the input file
fid = fopen(inputFileName, 'r');
sequence = fscanf(fid, '%c');
fclose(fid);
% Define the start and end positions of the CDS
cdsStart = 155;
cdsEnd = 3358;
% Extract the CDS from the sequence
cdsSequence = sequence(cdsStart:cdsEnd);
% Write the CDS sequence to a new file
fid = fopen(outputFileName, 'w');
fprintf(fid, '%s', cdsSequence);
fclose(fid);

I have the code above, which is supposed to extract the 155th through the 3358th characters of my text file. When I run it, however, the output runs from the 153rd to the 3356th character instead. Is something wrong with the code?

3 comments

SANGBIN LEE on 29 Feb 2024

KIF11.txt

thank you

Walter Roberson on 29 Feb 2024

sequence = fscanf(fid, '%c');

Beware: the characters returned in sequence include any end-of-line characters present in the file (carriage return, line feed, or both). Linear indexing into it is unreliable unless you know which terminators are present, because each one shifts the position of every character after it.
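A minimal sketch of how one might check for this, using a short made-up string in place of the real file contents. The variable names numCR and numLF are only illustrative, and the synthetic string assumes CR+LF line endings, which may not match the actual file.

```matlab
% Synthetic stand-in for what fscanf(fid, '%c') returns:
% two 4-base lines, each ended by CR (13) followed by LF (10).
sequence = ['ACGT' char(13) char(10) 'TTGA' char(13) char(10)];

numCR = nnz(sequence == char(13));   % count carriage returns (\r)
numLF = nnz(sequence == char(10));   % count line feeds (\n)

fprintf('CR: %d, LF: %d, total length: %d\n', numCR, numLF, numel(sequence));
```

If either count is nonzero, positions in sequence no longer correspond one-to-one to positions in the biological sequence.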

Answer 1

Dyuman Joshi on 29 Feb 2024

https://la.mathworks.com/matlabcentral/answers/2088736-getting-the-nth-term-out-of-a-sequence#answer_1419076

Edited: Dyuman Joshi on 29 Feb 2024

KIF11.txt

As @Walter has warned, a carriage return character (\r) is being read along with the data:

% Define the input and output file names
inputFileName = 'KIF11.txt';
outputFileName = 'CDS.txt';
% Read the sequence from the input file
fid = fopen(inputFileName, 'r');
sequence = fscanf(fid, '%c');
fclose(fid);
size(sequence)
ans = 1×2
           1        3736
% Expected: the last character of the 1st line and the first character of the 2nd line.
% The output does not match that: the second character is not a base.
y = sequence(70:71)
y = 
    'T
     '
double(y)
ans = 1×2
    84    13

Alternatively, you can use textscan here, which, as the output shows, drops the line terminators:

Fid = fopen(inputFileName, 'r');
out = textscan(Fid, '%c')
out = 1×1 cell array
    {3682×1 char}
fclose(Fid);
seq = out{1};
y = seq(70:71)
y = 2×1 char array
    'T'
    'G'

% Define the start and end positions of the CDS
cdsStart = 155;
cdsEnd = 3358;
% Extract the CDS from seq (the textscan result, which has no line terminators)
cdsSequence = seq(cdsStart:cdsEnd);
% Write the CDS sequence to a new file
fid = fopen(outputFileName, 'w');
fprintf(fid, '%s', cdsSequence);
fclose(fid);

1 comment

John D'Errico on 29 Feb 2024

Edited: John D'Errico on 29 Feb 2024

+1. I was going to point this out:

find(~ismember(sequence,'CAGT'))
ans =
Columns 1 through 8
71         142         213         284         355         426         497         568
Columns 9 through 16
639         710         781         852         923         994        1065        1136
Columns 17 through 24
1207        1278        1349        1420        1491        1562        1633        1704
Columns 25 through 32
1775        1846        1917        1988        2059        2130        2201        2272
Columns 33 through 40
2343        2414        2485        2556        2627        2698        2769        2840
Columns 41 through 48
2911        2982        3053        3124        3195        3266        3337        3408
Columns 49 through 54
3479        3550        3621        3692        3735        3736

So there are two invisible characters before index 155, at positions 71 and 142. They fall exactly where the carriage returns at the ends of the 70-character lines lie. That explains why the extracted range looks like it is off by exactly 2 characters.
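The offset can be counted directly. The following is a rough sketch on a made-up input of three 70-character lines, each ended by a carriage return; the names line70, bad, and offset are hypothetical, and the real file is of course longer.

```matlab
% Synthetic stand-in: three 70-base lines, each followed by a carriage return.
line70 = repmat('ACGT', 1, 18);            % 72 characters
line70 = line70(1:70);                     % trim to 70 characters
sequence = [line70 char(13) line70 char(13) line70 char(13)];

bad = find(~ismember(sequence, 'CAGT'));   % positions of non-base characters
offset = nnz(bad < 155);                   % how many of them precede index 155

fprintf('Raw index 155 is base number %d\n', 155 - offset);
```

With terminators at 71 and 142, the count is 2, so raw index 155 lands on base 153, which is the behavior described in the question.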

So if you delete those elements first, an index into the repaired string will work.
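One possible way to do that, shown as a sketch on a made-up input rather than on KIF11.txt. The names raw and clean and the short end position are assumptions for illustration only.

```matlab
% Synthetic stand-in for the raw file contents: 70-base lines ended by CR.
line70 = repmat('ACGT', 1, 18);
line70 = line70(1:70);
raw = [line70 char(13) line70 char(13) line70 char(13)];

% Keep only the four base letters; CR, LF and anything else is dropped.
clean = raw(ismember(raw, 'ACGT'));

% Indices now count bases only, so 155 really means the 155th base.
cdsStart = 155;
cdsEnd   = 160;            % hypothetical short range for this toy input
cdsSequence = clean(cdsStart:cdsEnd);
disp(cdsSequence)
```

If the file may legitimately contain other symbols (for example N or lowercase letters), filtering with raw(~isspace(raw)) instead would remove only the whitespace and line terminators.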
