importdata does not import the whole .txt file
Luuk van Oosten
on 10 Jul 2014
Edited: per isakson
on 15 Jul 2014
I'm encountering a problem importing a .txt file containing mass spectrometry data. Somewhere along the way, MATLAB just stops importing the remaining part of my .txt file (over 777,000 lines in total).
The data I'm trying to import describes scan events from the mass spectrometer. The header of each event (from 'BEGIN IONS' to 'SCANS=scannumber') describes certain properties of the scan event. The numbers between 'SCANS=scannumber' and 'END IONS' describe the spectrum (the actual data, but worthless without the header): the first column holds m/z values, the second ion intensities.
My data looks like this (this is one scan event):
BEGIN IONS
TITLE=Spectrum2667 scans: 5993,
PEPMASS=897.52844 17418.17383
CHARGE=2+
RTINSECONDS=3127
SCANS=5993
176.86790 128.299
181.97141 139.498
221.90227 139.841
341.23862 982.842
END IONS
I want to extract certain scan events based on their scan number; another script tells me which ones to extract from this file. But MATLAB is not importing all of my scans, so I am missing some of the data (and my other script gives an error, because the scan number it is looking for is not present).
I have just over 6000 of these scans in one .txt file. For some reason, MATLAB stops somewhere near the end of my file: at a certain scan event, it stops halfway through the list describing the spectrum. The code I use to import the data is:
List(:,1) = importdata('MyData.txt');
Because I just need a list of all the scan events, which I write to a new file after extracting the ones I want, it doesn't matter whether the file is imported as two columns, with the header split off, etc.; I just want the complete list, all the way to the end of my .txt file.
I've looked in my .txt file, but there is no difference in spacing or tab formatting at this particular line.
If someone could help me solve my problem, I would be very happy.
Here is a Dropbox link to the file, since it was too large to attach: https://www.dropbox.com/s/ijp5mvtvrm0ob9w/140708_LO_03_140710112412.txt
Sara
on 11 Jul 2014
There is an error when clicking on the file. Have you tried cutting out only the last entry and seeing whether the code fails on it? Maybe it's not about the number of elements, but I'm really just guessing here.
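Cutting out the last entry can be done without editing the file by hand. The following is a minimal sketch, assuming each event starts with a `BEGIN IONS` line as in the question; the tiny stand-in file written at the top and the temporary file names are placeholders for the real data. It copies everything from the last `BEGIN IONS` line to the end into its own file, which can then be passed to `importdata` on its own.

```matlab
% Write a small stand-in file with two scan events (replace with the real file)
inFile  = [tempname '.txt'];
outFile = [tempname '.txt'];
fid = fopen(inFile, 'w');
fprintf(fid, 'BEGIN IONS\nSCANS=5992\n176.86790 128.299\nEND IONS\n');
fprintf(fid, 'BEGIN IONS\nSCANS=5993\n181.97141 139.498\nEND IONS\n');
fclose(fid);

% Read all lines; splitting on \r?\n handles both LF and CRLF line endings
allLines = regexp(fileread(inFile), '\r?\n', 'split');
if isempty(allLines{end}), allLines(end) = []; end  % empty piece after final newline

% Locate the start of the last event and copy from there to the end
starts = find(strcmp(strtrim(allLines), 'BEGIN IONS'));
lastEvent = allLines(starts(end):end);
fid = fopen(outFile, 'w');
fprintf(fid, '%s\n', lastEvent{:});
fclose(fid);
```

If `importdata('...')` on `outFile` returns the whole last event, the problem is more likely related to the size or position in the full file than to that entry's contents.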
Accepted Answer
Sara
on 11 Jul 2014
I don't know what is wrong with importdata, but this version works. The size of k was based on your file; it may need to be changed if you use a different file.
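% Preallocate generously; MATLAB grows k automatically if the file is longer
k = cell(1332160,1);
j = 0;
fid = fopen('140708_LO_03_140710112412.txt','r');
if fid == -1, error('Could not open the file.'), end
while true
    t = fgetl(fid);             % next line, without the newline character
    if ~ischar(t), break, end   % fgetl returns -1 at the end of the file
    j = j + 1;
    k{j} = t;
end
fclose(fid);                    % release the file handle
% Keep lines 8 to j-2: drops the first 7 and last 2 lines of this file,
% as well as the unused preallocated cells
k = k(8:j-2);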
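With every line held in `k`, the extraction step described in the question (keep only the scan events whose `SCANS=` number is requested, then write them to a new file) could look like the sketch below. It is an illustration, not part of this answer: the hand-built `k`, the `wanted` list and the output file are hypothetical stand-ins, and it assumes every `BEGIN IONS` line is followed by a matching `END IONS` line.

```matlab
% Stand-in for the cell array k produced by the loop above
k = {'BEGIN IONS'; 'SCANS=5993'; '176.86790 128.299'; 'END IONS'; ...
     'BEGIN IONS'; 'SCANS=6001'; '221.90227 139.841'; 'END IONS'};
wanted = [5993 7000];            % hypothetical scan numbers from the other script

trimmed = strtrim(k);
starts  = find(strcmp(trimmed, 'BEGIN IONS'));
stops   = find(strcmp(trimmed, 'END IONS'));   % assumed to pair up with starts

keep  = false(size(k));
found = [];
for n = 1:numel(starts)
    block = starts(n):stops(n);
    idx = find(strncmp(trimmed(block), 'SCANS=', 6), 1);  % first SCANS= line
    if isempty(idx), continue, end
    scan = str2double(trimmed{block(idx)}(7:end));
    if ismember(scan, wanted)
        keep(block) = true;
        found(end+1) = scan; %#ok<AGROW>
    end
end
missing = setdiff(wanted, found);   % requested scans that are not in the file

% Write the selected events, line by line, to a new file
outFile = [tempname '.txt'];
fid = fopen(outFile, 'w');
fprintf(fid, '%s\n', k{keep});
fclose(fid);
```

Checking `missing` before the other script runs would turn the "scan number not present" error into an explicit list.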
Sara
on 15 Jul 2014
I thought you didn't need that part :) And the number was totally arbitrary, just a big one.
More Answers (3)
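Since the preallocated size is just a large guess, one alternative that needs no size at all is to read the whole file with `fileread` and split it into lines. A minimal sketch, assuming the file fits comfortably in memory; the small stand-in file (with Windows line endings) replaces the real one.

```matlab
% Stand-in file (replace with '140708_LO_03_140710112412.txt')
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'BEGIN IONS\r\nSCANS=5993\r\n176.86790 128.299\r\nEND IONS\r\n');
fclose(fid);

% Split on LF or CRLF so Windows line endings leave no stray \r characters;
% the transpose gives a column cell array like k in the loop version
k = regexp(fileread(fname), '\r?\n', 'split')';
% A trailing newline produces one empty final element; remove it
if ~isempty(k) && isempty(k{end})
    k(end) = [];
end
```

The resulting `k` can then be trimmed in the same way as in the accepted answer.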
per isakson
on 15 Jul 2014
Edited: per isakson
on 15 Jul 2014
"For some reason, MATLAB stops somewhere near the end of my file."
MATLAB has no high-level function that reads and parses a text file like yours, i.e. a file with repeated headers and blocks of data.
 
"[...]the actual data, but worthless without the header" .
I have a function, read_blocks_of_numerical_data, that reads only the actual data.
>> g=read_blocks_of_numerical_data('140708_LO_03_140710112412.txt',50);
>> whos g
  Name      Size            Bytes  Class    Attributes
  g         1x2142       21279808  cell
>> g{1234}
ans =
1.0e+06 *
0.0001 0.0023
0.0001 0.0022
0.0001 0.0022
.......
I attached the m-file. Somebody else might want to try it.
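The attached m-file is not reproduced here. The sketch below only illustrates the general idea of keeping just the numerical blocks; it is a hypothetical stand-in, not per isakson's read_blocks_of_numerical_data, and the variable names are illustrative. It treats every line that `sscanf` reads as exactly two numbers as spectrum data and closes the current block whenever a non-numeric line follows such a run.

```matlab
% Stand-in lines; in practice these would be read from the file (e.g. via fgetl)
txtLines = {'BEGIN IONS', 'TITLE=Spectrum2667 scans: 5993,', ...
            'PEPMASS=897.52844 17418.17383', 'CHARGE=2+', 'RTINSECONDS=3127', ...
            'SCANS=5993', '176.86790 128.299', '181.97141 139.498', 'END IONS', ...
            'BEGIN IONS', 'SCANS=6001', '221.90227 139.841', 'END IONS'};

blocks  = {};      % one n-by-2 matrix per run of numeric lines
current = [];      % rows of the block currently being collected
for n = 1:numel(txtLines)
    v = sscanf(txtLines{n}, '%f');   % header lines start with text, so this is empty
    if numel(v) == 2
        current(end+1, :) = v'; %#ok<AGROW>
    elseif ~isempty(current)
        blocks{end+1} = current; %#ok<AGROW>
        current = [];
    end
end
if ~isempty(current)               % the file may end inside a block
    blocks{end+1} = current;
end
```

Note that a header line consisting of exactly two bare numbers would be mistaken for data; in the format shown in the question every header line starts with a keyword, so that does not happen.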
Cedric
on 15 Jul 2014
Edited: Cedric
on 15 Jul 2014
Here is an alternative approach based on regular expressions:
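% Read the whole file into one character vector
content = fileread( '140708_LO_03_140710112412.txt' ) ;
% One match per scan event; the named tokens become struct fields.
% Note: [^E]* captures everything up to the "E" of "END IONS", so it assumes
% the spectrum contains no uppercase E (e.g. no 1.2E+03 exponent notation).
pattern = ['TITLE=(?<title>[^\r\n]*)\s*', ...
           'PEPMASS=(?<pepmass>[^\r\n]*)\s*', ...
           'CHARGE=(?<charge>[^\r\n]*)\s*', ...
           'RTINSECONDS=(?<rtinseconds>\d*)\s*', ...
           'SCANS=(?<scans>\d*)\s*', ...
           '(?<spectrum>[^E]*)'] ;
data = regexp( content, pattern, 'names' ) ;
% Convert the captured text to numbers; each spectrum becomes an n-by-2 matrix
for k = 1 : numel( data )
    data(k).pepmass     = sscanf( data(k).pepmass, '%f' )' ;
    data(k).rtinseconds = sscanf( data(k).rtinseconds, '%d' ) ;
    data(k).scans       = sscanf( data(k).scans, '%d' ) ;
    data(k).spectrum    = sscanf( data(k).spectrum, '%f', [2, Inf] )' ;
end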
Running this, you get, e.g.:
>> data
data =
1x2142 struct array with fields:
title
pepmass
charge
rtinseconds
scans
spectrum
>> data(1000)
ans =
title: 'Spectrum1000 scans: 3128,'
pepmass: [630.9374 2.4366e+05]
charge: '7+'
rtinseconds: 1987
scans: 3128
spectrum: [885x2 double]
>> select = [data.scans] > 5200 ;
>> data(select)
ans =
1x6 struct array with fields:
title
pepmass
charge
rtinseconds
scans
spectrum
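Since the question's other script fails when a requested scan number is absent, it may help to select by an explicit list with `ismember` instead of a threshold, and to report missing numbers with `setdiff` first. A minimal sketch on a hand-built stand-in for `data` (only the `scans` and `spectrum` fields are filled in, and the `wanted` list is hypothetical):

```matlab
% Stand-in for the struct array returned by the regexp approach above
data = struct('scans',    {3128, 5993, 6001}, ...
              'spectrum', {[1 2; 3 4], [176.8679 128.299], [221.90227 139.841]});
wanted = [5993 6001 9999];        % hypothetical list from the other script

allScans = [data.scans];
select   = ismember(allScans, wanted);
subset   = data(select);          % struct array of the requested scan events

% Report requested scan numbers that do not occur in the file
missing = setdiff(wanted, allScans);
if ~isempty(missing)
    warning('Scan numbers not found: %s', mat2str(missing));
end
```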
Sanket Mishra
on 10 Jul 2014
Put the importdata call in a try/catch block and look at the exception that gets displayed. This might help you.
try
    List = importdata('MyData.txt');
catch ex
    disp(ex.message);   % show the error message, if importdata throws
end
I would suggest using textscan instead of importdata; it is better suited to your workflow. See the MATLAB documentation of textscan for details.
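As an illustration of the textscan suggestion (not code from this answer), `textscan` can turn the spectrum lines of one scan event into two numeric columns. The sketch assumes those lines have already been isolated as a single character vector; when reading a file directly, `textscan` stops at the first field it cannot convert, such as the `END IONS` line.

```matlab
% Spectrum lines of the scan event shown in the question, as one char vector
blockText = sprintf(['176.86790 128.299\n181.97141 139.498\n', ...
                     '221.90227 139.841\n341.23862 982.842\n']);

% Read two floating-point columns: C{1} holds m/z values, C{2} intensities
C = textscan(blockText, '%f %f');
spectrum = [C{1}, C{2}];
```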