Hi guys, I want to read a text file line by line and remove the lines which have NA and the duplicated columns


chocho on 15 Feb 2017


https://la.mathworks.com/matlabcentral/answers/325176-hi-guys-i-want-to-read-a-text-file-line-by-line-and-remove-the-lines-which-have-na-and-the-duplica

Edited: Walter Roberson on 20 Feb 2017

Accepted Answer: dpb

COADREAD_methylation.txt


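d = fopen('COADREAD_methylation.txt','r');
all_lines = {};                  % cell array with one entry per line of the file
this_line = fgetl(d);
while ischar(this_line)          % fgetl returns the numeric value -1 at end of file
    % C = textscan(d, '%f%s');
    all_lines = [all_lines; {this_line}];
    this_line = fgetl(d);
end
fclose(d);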

1 Comment

Stephen23 on 17 Feb 2017

Edited: Stephen23 on 17 Feb 2017

Accepted Answer

dpb on 15 Feb 2017

Edited: dpb on 16 Feb 2017

Well, 'NA' is easy, not sure what defines the repeated columns; not enough time at present to try to parse that input file to figure out what is/isn't unique without a description being supplied...

fid = fopen('COADREAD_methylation.txt','r');
data={};
while ~feof(fid)
  l=fgetl(fid);
  if isempty(strfind(l,'NA')), data=[data;{l}]; end
end
fid=fclose(fid);

If the presence of 'NA' is all that's needed to get all the offending records, then you're done; otherwise need more details on how to tell so folks here don't have to try to work it out on their own.
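One caveat with strfind(l,'NA'): it matches those two letters anywhere in the line, so a record whose gene symbol merely contains "NA" would be dropped as well. Below is a minimal sketch of a stricter test. It assumes the file is tab-delimited and that a missing value is a field that is exactly NA; the three sample records, and the gene name NAT1 in the last one, are made up for illustration and are not taken from the attached file.

```matlab
tab = sprintf('\t');
% Hypothetical tab-delimited sample records (not from the real file)
recs = { ...
    ['cg00000292' tab '0.51' tab 'ATP2A1']; ...
    ['cg00006414' tab 'NA'   tab 'ZNF425']; ...
    ['cg99999999' tab '0.42' tab 'NAT1']};     % 'NA' appears only inside a name

keep = true(size(recs));
for k = 1:numel(recs)
    fields  = regexp(recs{k}, '\t', 'split');  % split the record into its fields
    keep(k) = ~any(strcmp(fields, 'NA'));      % drop only if a whole field is NA
end
data = recs(keep);                             % records 1 and 3 survive
```

With the plain strfind test the third record would be discarded too, since 'NAT1' contains 'NA'.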

13 Comments

chocho on 20 Feb 2017

Edited: Walter Roberson on 20 Feb 2017

Hi friend, I want the code to follow this format.

Note: I want to read every line and check whether it contains NA. If it does, drop it and move on to the next line. If it does not, check the columns of that line, and where a column contains ';', split that column and make 2 rows.

fid = fopen('COADREAD_methylation.txt','r');
data = {};
while ~feof(fid)
    l = fgetl(fid);                              % get the next line
    if ~ischar(l), break; end                    % fgetl returns -1 at end of file
    if ~isempty(strfind(l,'NA')), continue; end  % line has NA: drop it and read the next line
    cols = regexp(l,'\t','split');               % split the line into its tab-delimited columns
    % look for the first column that contains ';'
    k = find(~cellfun(@isempty, strfind(cols,';')), 1);
    if isempty(k)
        data = [data; {l}];                      % no ';' anywhere: save the line as it is
    else
        % split that column on ';' and write one row per value
        parts = regexp(strrep(cols{k},'"',''), ';', 'split');
        for p = 1:numel(parts)
            row = cols;
            row{k} = parts{p};
            data = [data; {strjoin(row, sprintf('\t'))}];
        end
    end
end
fid = fclose(fid);

chocho on 20 Feb 2017

Edited: Walter Roberson on 20 Feb 2017

inputs:

Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
Composite Element REF  Beta_value  Gene_Symbol  Chromosome  Genomic_Coordinate  Beta_value  Gene_Symbol
cg00000292  0.511852232819811  ATP2A1  16  28890100  0.787687855895422  ATP2A1
cg00002426  0.519102187746053  SLMAP  3  57743543  0.932889308560864  SLMAP
cg00006414  NA  "ZNF425;ZNF398"  7  148822837  NA  "ZNF425;ZNF398"
cg00008493  0.987979722052904  "COX8C;KIAA1409"  14  93813777  0.986128428295584  "COX8C;KIAA1409"
cg00011459  0.922491239231445  "TMEM186;PMM2"  16  8890425  0.961124285303233  "TMEM186;PMM2"

outputs:

Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
cg00000292  0.511852232819811  ATP2A1  0.787687855895422
cg00002426  0.519102187746053  SLMAP  0.932889308560864
cg00008493  0.987979722052904  COX8C  0.986128428295584
cg00008493  0.987979722052904  KIAA1409  0.986128428295584
cg00011459  0.922491239231445  TMEM186  0.961124285303233
cg00011459  0.922491239231445  PMM2  0.961124285303233

Appreciate your help!
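A minimal sketch of one way to get from the inputs to the outputs listed above, shown on three of the data rows. It rests on assumptions that the thread does not confirm: the fields are tab-delimited, the columns wanted are 1, 2, 3 and 6 (inferred from the outputs), and the two header rows are handled separately. The variable names keepCols and geneCol are illustrative only.

```matlab
tab = sprintf('\t');
% Three data rows from the inputs above, rebuilt as tab-delimited text
recs = { ...
    strjoin({'cg00000292','0.511852232819811','ATP2A1','16','28890100','0.787687855895422','ATP2A1'}, tab); ...
    strjoin({'cg00006414','NA','"ZNF425;ZNF398"','7','148822837','NA','"ZNF425;ZNF398"'}, tab); ...
    strjoin({'cg00008493','0.987979722052904','"COX8C;KIAA1409"','14','93813777','0.986128428295584','"COX8C;KIAA1409"'}, tab)};

keepCols = [1 2 3 6];   % assumption: the columns that appear in the outputs
geneCol  = 3;           % position of Gene_Symbol within keepCols
out = {};
for k = 1:numel(recs)
    fields = regexp(recs{k}, '\t', 'split');
    if any(strcmp(fields, 'NA')), continue; end   % drop records with a missing value
    fields = fields(keepCols);                    % drop the unwanted and repeated columns
    % strip the quotes, then split the gene symbols on ';'
    genes = regexp(strrep(fields{geneCol}, '"', ''), ';', 'split');
    for g = 1:numel(genes)                        % one output row per gene symbol
        row = fields;
        row{geneCol} = genes{g};
        out = [out; {strjoin(row, tab)}];
    end
end
fprintf('%s\n', out{:});
```

For these three rows, out holds one row for cg00000292 and two rows for cg00008493 (COX8C and KIAA1409); the cg00006414 row is dropped because of its NA fields. To run this on the real file, the same loop body could be applied to each line returned by fgetl.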
