Running out of memory

Hi All,
I have a couple of data text files which I am extracting data of interest from.
The total text data files size is about 549 MB, which I am writing into a single text file and it comes to about 500 MB since I am disregarding some information in the text files.
When I try to write to read and write the output text file into an excel file;
1. The process consumes all my RAM (8G) and computer freezes
2. I got this error at the end
Error using xlswrite (line 219)
The specified data range is invalid or too large to write to the specified file format. Try writing to an XLSX file and use Excel A1
notation for the range argument, for example, A1:D4.
Error in Data_Overall_Ext_Dir (line 29)
xlswrite (output, data);
Is there any way I can handle this?
With limited number of files, the script works well but as the number of files (data size) increases, I have this issue.
Kindly help.

7 comentarios

Walter Roberson
Walter Roberson el 6 de Feb. de 2017
Are you writing to a csv file? If so then do you need a text header line? Is there any text involved anywhere other a possible header line?
Theo Score
Theo Score el 7 de Feb. de 2017
Hi Walter,
This is the script;
workDir = 'C:\Users\Theo_Score\Desktop\COM_CFDEK';
data = fullfile(workDir, '*.data');
files = dir(data);
fileout = fullfile(workDir, 'Merged.txt');
fout = fopen (fileout, 'w');
for k = 1000:1000:100000
cntfiles = fullfile (workDir, ['dump' num2str(k) '.data']);
fin = fopen (cntfiles);
linecount =0;
while ~feof(fin)
linecount = linecount + 1;
tline = fgetl(fin);
if linecount >= 10 % Number of header lines being skipped
fprintf (fout, '%s\r\n', tline);
end
end
fclose(fin);
end
fclose(fout);
fileout = fullfile(workDir, 'Merged.txt');
fid = fopen(fileout,'r') ;
S = textscan(fid,'%f','delimiter','\n') ;
fclose(fid) ;
data = reshape(S{1},19,[])' ;
[dummy, Index] = sort (data(:,1));
data = data (Index,:);
% seperate the data
output = fullfile (workDir, 'Results.xlsx');
xlswrite (output, data);
So in short, I am merging the data into one single .txt file then read the .txt file into .xlxs file.
There is no text or headers in the merged .txt file as well the intended .xlxs file.
Thank you for your further help on how I can better this script.
Best regards.
Theo Score
Theo Score el 7 de Feb. de 2017
Editada: Theo Score el 7 de Feb. de 2017
Hi Walter,
In addition to the above, this is the sample of the first 5 lines of the merged data text file.
263 1 -0.0160595 0.00244881 -0.0962486 0 0 0 0.000427309 -0.00108329 -0.000464858 6.63484e-05 0.000120976 4.62236e-05 0.0989527 0.155754 -0.150322 0.0025 3
2969 2 -0.0111157 0.00246386 -0.0969006 0 0 0 0.000116854 -0.000384438 -0.000182779 5.28212e-05 0.000124234 -4.93361e-05 0.0977772 -0.0942376 0.117209 0.0025 3
308 1 -0.00618838 0.00247852 -0.0973835 0 0 0 0.00042683 -0.000342466 -0.000143103 0.000127069 0.000298262 -7.95949e-07 0.0695372 0.181823 -0.14406 0.0025 3
342 1 -0.00123651 0.00243941 -0.0975912 0 0 0 0.000330124 -0.000473911 -0.000260137 0.000105378 4.63017e-05 3.24658e-05 0.0806383 0.0642646 -0.0783811 0.0025 2
3110 2 0.00367546 0.00307267 -0.0974889 0 0 0 0.00022631 -0.000392997 -0.000104105 8.40109e-05 -2.8533e-05 5.60855e-05 0.0235912 -0.076895 0.215312 0.0025 3
In total its a 3,846,269 x 19 data matrix. The total size of the text file is 553 MB and I realise, I cant even copy and paste the data into Excel.
Thank you again.
Theo Score
Theo Score el 7 de Feb. de 2017
Hi All,
Further to the above, I just checked and think my problem lies on the fact that Excel has a maximum worksheet size of 1,048,576 rows by 16,384 columns.
Here is the issue. The Merged.txt is 'junk' file with all the data in a haphazard manner. The data is now sorted according to the ascending order of the first element of the column.
I will appreciate help on how this can be handled. Possibility is a script to that can have the data in multiple Excel worksheet files.
Thank you again.
Walter Roberson
Walter Roberson el 7 de Feb. de 2017
Sorting on the first column is not difficult. The maximum number of rows is a problem.
Given that you need to plot it using an an external tool (possibly Excel itself), what are the requirements of that graphing tool? For example if you need all of the points to be available at once then you aren't going to get anywhere because you just have too many rows for that.
I could make suggestions about dividing them to put them onto multiple sheets, or about putting them into multiple columns side by side, but unless there are key values in the data to break at, I would be guessing about where it would be acceptable to put the breaks. We cannot, though, just put all the columns side by side: you have 1000 files and 19 columns each would exceed the 16384 limit.
Theo Score
Theo Score el 8 de Feb. de 2017
Editada: Theo Score el 8 de Feb. de 2017
Hi Walter,
Thank you for your response. I thought of a work around the problem. I will use Linux sort/ split commands in Linux but I need to get an extra component to my data. This script:
workDir = 'C:\Users\Theo_Score\Desktop\COM_CFDEK';
data = fullfile(workDir, '*.data');
files = dir(data);
fileout = fullfile(workDir, 'Merged.txt');
fout = fopen (fileout, 'w');
for k = 1000:1000:100000
cntfiles = fullfile (workDir, ['dump' num2str(k) '.data']);
fin = fopen (cntfiles);
linecount =0;
while ~feof(fin)
linecount = linecount + 1;
tline = fgetl(fin);
if linecount >= 10 % Number of header lines being skipped
fprintf (fout, '%s\r\n', tline);
end
end
fclose(fin);
end
Was skipping 9 lines of the .data files. I need some help to modify this script so that it reads line 2 (which is time step, say 1000) as shown in the first 9 lines I was previously skipping.
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ATOMS
26
ITEM: BOX BOUNDS ff ff ff
-0.1 0.1
0 0.4
-0.1 0.1
id type x y z ix iy iz vx vy vz fx fy fz omegax omegay omegaz radius
I want to multiply 1000*0.00001 and return the value in the first column and the previous first column becomes the second column.
The first 5 lines from the previous script would be like
1 1 -0.0506691 0.301248 -0.0540098 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
7 1 0.0119942 0.300662 -0.0584242 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
6 1 0.0589997 0.30511 -0.0540171 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
3 1 -0.0512266 0.330591 -0.0441473 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
16 2 -0.0118166 0.320646 -0.046286 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
After the modification, I would want an output like
0.01 1 1 -0.0506691 0.301248 -0.0540098 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
0.01 7 1 0.0119942 0.300662 -0.0584242 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
0.01 6 1 0.0589997 0.30511 -0.0540171 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
0.01 3 1 -0.0512266 0.330591 -0.0441473 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
0.01 16 2 -0.0118166 0.320646 -0.046286 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0
I would appreciate help on this.
Walter Roberson
Walter Roberson el 8 de Feb. de 2017
I would probably toss it all into perl...

Iniciar sesión para comentar.

Respuestas (1)

cr
cr el 6 de Feb. de 2017

0 votos

If xls format is not critical requirement, you may want to use fileIO operations and produce a csv (or some text delimited) format. Once a file text file is successfully generated it can be converted to xls using MS Excel.

1 comentario

Theo Score
Theo Score el 7 de Feb. de 2017
Editada: Theo Score el 7 de Feb. de 2017
Hi cr,
I have embedded the script on this thread on a response to Walter.
Essentially, I need to plot graphs with the information I get so I need the data to be in .xlsx. Further comment is also given in that response.
I appreciate your further help.
Best regards.

Iniciar sesión para comentar.

Categorías

Más información sobre Get Started with MATLAB en Centro de ayuda y File Exchange.

Etiquetas

Aún no se han introducido etiquetas.

Preguntada:

el 6 de Feb. de 2017

Comentada:

el 8 de Feb. de 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by