Using matfile to partially extract data still loads entire file into memory

Hello,
I have a .mat file saved in the -v7.3 format. The content of the file is a large cell array. I am using (row,:) indexing to retrieve a single row:
obj = matfile('File.mat');
Data = obj.CellArray(RowNum,:);
I looked into the memory usage. The command works: I get the requested row out of the cell array.
However, it takes the same amount of time as loading the whole .mat file into the workspace, and it uses the same amount of memory. From the MATLAB help files I understood that this syntax was designed to load only part of a file into memory. Am I doing something incorrectly, or does the feature not work the way I hoped it would?
Thank you for your help.
  3 comments
Justin Brooks on 8 Apr 2021
One issue I have is that our data is actually housed in tables, so I put each table in an individual cell, since table isn't supported in the matfile construct. I could have each table column be its own column in a cell array, or could even use a nested cell array (thinking out loud here). But even then, I don't think I'd be gaining anything. The point of this was just to see if I could load part of a variable without having to load the entire variable into memory, but alas, if it isn't to be, it isn't to be.
I did try pulling a single column of the cell array instead of all the columns. There was a slight speedup, using tic/toc to measure time, but nothing significant. Memory usage again didn't indicate anything less than a full file load.
I've also looked into tall arrays and datastores, but that's not germane to this discussion.
dpb on 8 Apr 2021
I guess with cell arrays I'm not terribly surprised that it might have to dereference them to get stuff out, so while what it returns is only what was asked for, it took the same or more effort to produce than a straight load followed by clearing what you don't want.
In straight arrays, the direct location can be computed and memcpy() invoked on a buffer, so things can be streamlined. I've no idea what the actual memory structure of cell arrays is, having never poked around in the innards, but there's a whole lot of overhead associated with them, and tables add yet another layer on top.
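For comparison, the plain numeric case can be checked directly. The following is only a minimal sketch: the file name NumericTest.mat, the variable A, and the array size are hypothetical choices for illustration, and the timings will vary by machine and MATLAB release.

```matlab
% Hypothetical file/variable names; sizes chosen only for illustration.
A = rand(2000, 2000);                      % plain numeric matrix
save('NumericTest.mat', 'A', '-v7.3');     % matfile partial loading needs -v7.3

tic;
S = load('NumericTest.mat');               % load the whole variable
tFull = toc;

obj = matfile('NumericTest.mat', 'Writable', false);
tic;
col = obj.A(:, 20);                        % request a single column only
tPart = toc;

assert(isequal(col, S.A(:, 20)));          % both routes return the same data
fprintf('full load: %.3f s, one column: %.3f s\n', tFull, tPart);
```

Running the same comparison with a cell array in place of A shows whether the overhead described above is specific to cell arrays on a given setup.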


Answers (2)

Matt J on 9 Apr 2021
Edited: Matt J on 9 Apr 2021
We can run a test right here. The one below suggests there is some benefit, though perhaps not as much benefit as I would have expected given the size of the data being loaded. You're sure the format of your File.mat is v7.3?
CellArray(1:100,1:100)={rand(50)};
save -v7.3 File CellArray
tic;
L=load('File');
toc
Elapsed time is 2.161086 seconds.
obj=matfile('File.mat','Writable',false);
tic
obj.CellArray(20,:);
toc
Elapsed time is 0.587641 seconds.
  1 comment
Justin Brooks on 9 Apr 2021
I ran the following as my test:
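fid = fopen('TestPartialLoadCell.mat');
txt = char(fread(fid,[1,116],'*char')); % the MAT-file header text occupies the first 116 bytes
fclose(fid);                            % release the file handle
txt = [txt,0];
txt = txt(1:find(txt==0,1,'first')-1);  % trim at the first null character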
That came back with:
'MATLAB 7.3 MAT-file, Platform: PCWIN64, Created on: Thu Apr 8 08:50:48 2021 HDF5 schema 1.00'
I then ran the following code on a partial cell array (a reduced number of rows from the true data set, but the same data):
tic;
load('TestPartialLoadCell.mat')
toc
size(EventCell)
clear all;
tic;
obj = matfile('TestPartialLoadCell.mat')
toc
tic;
Test = obj.EventCell(46,:);
toc
The output to console was:
1.871945 seconds % Loading whole file into the workspace
80,3 % Size of the cell array
1.824823 seconds % Creating matfile object
3.131165 seconds % Pulling single row
I also did the same test with the entire data file just to see. Output to console was:
280.921421 seconds % Loading whole file into the workspace
5421,3 % Size of the cell array
279.005530 seconds % Creating matfile object
624.437779 seconds % Pulling single row
I was also watching the memory usage. It was all over the place, but each operation peaked at the same amount of memory. After creating the object and pulling a single row, the memory did go back down, but the whole point of the exercise was to reduce memory usage and time for an end user.
So I just don't think it's going to work for my application, which is fine. There are other ways to accomplish the same task; they just aren't as "fancy".
Thanks!



Stephen23 on 9 Apr 2021
Edited: Stephen23 on 9 Apr 2021
Transpose the cell array (when it is created), so that you are accessing a contiguous part of the cell array:
Data = obj.CellArray(:,ColNum);
% ^^^^^^^^ first index is colon!
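A minimal sketch of that suggestion, assuming the file can be re-saved with the transposed layout. The names FileT.mat and CellArrayT and the cell sizes are hypothetical, and whether this actually reduces load time or memory for cell arrays is not confirmed in this thread.

```matlab
% Hypothetical file/variable names; sizes chosen only for illustration.
CellArray = cell(100, 100);
CellArray(:) = {rand(10)};                 % fill every cell with a small matrix

CellArrayT = CellArray.';                  % original rows become columns
save('FileT.mat', 'CellArrayT', '-v7.3');  % matfile partial loading needs -v7.3

obj = matfile('FileT.mat', 'Writable', false);
RowNum = 20;                               % row of the ORIGINAL layout to fetch
Col = obj.CellArrayT(:, RowNum);           % first index is colon
Data = Col.';                              % back to 1-by-100, like CellArray(RowNum,:)

assert(isequal(Data, CellArray(RowNum, :)));
```

The final transpose is only there so that downstream code expecting a 1-by-N row keeps working unchanged.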

Release

R2020b
