Textscan to read CSV File (Mixed Format)
8 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
I am trying to read multiple large CSV files (200,000 X 200 each) into MATLAB 2013a 64bit which contains numbers and strings. The order of numbers and strings is not fixed between files. I am using "Textscan" function. If I use the "%f" format then "textscan" stops execution as soon as it encounters the string so instead I ended up using "%s". But, using "str2double" to convert to numbers takes a long time. "str2double" also converts any non-numeric entry in the data to NaNs and I like using it but the overhead is too much.
I am wondering if anyone else has found a work around for this. I would ideally prefer using "textscan" since the code queries user to specify the line at which column names are in the CSV file and also some lines are commented in the CSV file and textscan is ideal for this file type. But any ideas are appreciated.
Thanks
1 comentario
Respuestas (2)
Image Analyst
el 12 de Dic. de 2014
Apparently you need to inspect the line of text first to determine if it's a comment or not. So use fgetl() to read a string from the file. Then call strtrim() to get rid of any leading whitespace. Then check if the first character in non-numeric, a % symbol, or whatever else indecates a comment in your file. Then use sscanf() or textscan() to extract the numbers into a numerical array.
1 comentario
Star Strider
el 12 de Dic. de 2014
Please post one of your files or a representative sample of one that demonstrates the problem.
As an interim possibility, see if this approach works:
fidi = fopen('filename.csv');
d1 = textscan(fidi, '%f %f', 'Delimiter','\n', 'HeaderLines',11, 'CollectOutput',1);
fseek(fidi,0,0);
d2 = textscan(fidi, '%f %f', 'Delimiter','\n', 'HeaderLines',11, 'CollectOutput',1);
You may have to put that in a loop and subscript ‘d1’ and ‘d2’ and others appropriately. (This is archive code that worked for the file it was intended to read.) You will have to adapt it to your file structure, including an initial 'HeaderLines' option if necessary.
The code reads starting with the initial string and the numerical data following until it gets to the next string (here assumed to have the same structure) where it stops. The fseek function automatically re-starts textscan reading the file at the correct position at the second textscan call. It then reads that string and the subsequent data until it gets to the end.
2 comentarios
Star Strider
el 12 de Dic. de 2014
You would likely need to read them in with different format strings for each file. The first one might be:
'%s%f%f%f'
the second:
'%f%f%s%f%s'
reading the time in as a string, since it probably is. I see no way around that otherwise. You could parse the inputs to store only the numeric data and ignore the strings after you read each file.
Ver también
Categorías
Más información sobre Text Files en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!