What are pros and cons of matfile vs memmapfile for partial loading of large data?

Question

Kouichi C. Nakamura on 6 Aug 2019

https://la.mathworks.com/matlabcentral/answers/475093-what-are-pros-and-cons-of-matfile-vs-memmapfile-for-partial-loading-of-large-data

Commented: Kouichi C. Nakamura on 12 Aug 2019

MATLAB offers ways to access a fraction of data without loading a whole file, and they can be really useful particularly when you need to handle large data.

One way is to use matfile for *.mat files written with save. Another is to use memmapfile for binary files (typically *.bin or *.dat files written with fwrite).

Let us suppose that we'd like to save the following variables into a file:

A = randi([intmin('int16'),intmax('int16')], 1000000,1, 'int16'); % 1000000x1, int16
Fs = 1024     % 1x1, double
scale = 1.20  % 1x1, double
offset = 0.05 % 1x1, double
title = 'EEG' % 1x3, char (note: this name shadows the built-in title function)

Which of the matfile and memmapfile approaches do you think is better? What are the pros and cons? Please give me insights on this matter.

matfile and MAT files

If we save the variables above into myData.mat, partial loading of A can be achieved with matfile as below:

save('myData.mat','A','Fs','scale','offset','title','-v7.3') % use -v7.3 flag for partial loading
mat = matfile('myData.mat') % 8 bytes
A(10) == mat.A(10)

ans =

logical

1
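The same idea extends from one element to a whole slice. The following is a minimal sketch, assuming a temporary file and a smaller value range than in the question; the names fname, chunk, nRows and nCols are introduced here for illustration only. It indexes with explicit subscripts for both dimensions, which is the conservative way to address a matfile variable.

```matlab
% Sketch: read one slice of A from a -v7.3 MAT-file without loading all of A.
% fname, chunk, nRows, nCols and same are illustrative names, not from the question.
fname = [tempname '.mat'];
A = int16(randi([-1000 1000], 1000000, 1));   % 1000000x1, int16
save(fname, 'A', '-v7.3');                     % -v7.3 is needed for partial loading

mat = matfile(fname);
[nRows, nCols] = size(mat, 'A');   % query the size without loading A
chunk = mat.A(1001:2000, 1);       % reads only these 1000 elements from disk
same = isequal(chunk, A(1001:2000));

delete(fname)                      % clean up the temporary file
```

Asking for the size through size(mat, 'A') rather than size(mat.A) matters, because the latter loads the entire variable first.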

Pros?

The variables stored in the *.mat file can be easily distinguished by the property names of mat.
You don't have to worry much about the number of bytes used by each variable.

Cons?

If MATLAB becomes unavailable, access to the data might be hard.
If the MAT-file format changes in the future, the data might become unreadable (although MathWorks is likely to try to keep backward compatibility).
Because matfile has limitations for table, structure, cell, sparse array, and custom class data types, you need to save the bulky variables as simple MATLAB arrays.

memmapfile and binary files

If we save the variables above into myData.bin, partial loading of A can be achieved with memmapfile as below:

% Write the header fields, then the samples, to a raw binary file
fid = fopen('myData.bin','w','n','UTF-8')
fwrite(fid,Fs,'double')
fwrite(fid,scale,'double')
fwrite(fid,offset,'double')
fwrite(fid,title,'char')
fwrite(fid,A,'int16')
fclose(fid)
% Read the header fields back sequentially, in the order they were written
fid = fopen('myData.bin','r','n','UTF-8')
Fs_ = fread(fid,[1 1],'double')        % 8 bytes
scale_ = fread(fid,[1 1],'double')     % 8 bytes
offset_ = fread(fid,[1 1],'double')    % 8 bytes
title_ = char(fread(fid,[1 3],'char')) % 3 bytes in file (although 6 bytes in Workspace)
fclose(fid)
% Map only the int16 samples, skipping the 27-byte header
m = memmapfile('myData.bin','Format',...
    'int16','Offset', 8+8+8+3) % 192 bytes
A(10) == m.Data(10)

ans =

logical

1

Pros?

Because the data is stored in bare binary files, it's guaranteed that you can open the file in the future, as long as the metadata (data types, sizes, and byte offsets) is stored somewhere else.
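One way to keep that metadata "somewhere else" is a small text sidecar file. The sketch below is only an illustration under assumptions: the JSON format, the file name, and the field names dataType, dataOffsetBytes and numSamples are all hypothetical choices, not part of the question.

```matlab
% Sketch: store the layout of the binary file in a JSON sidecar.
% All field names and the sidecar file itself are assumptions of this example.
meta = struct('Fs', 1024, 'scale', 1.20, 'offset', 0.05, 'title', 'EEG', ...
    'dataType', 'int16', 'dataOffsetBytes', 8+8+8+3, 'numSamples', 1000000);

metaFile = [tempname '.json'];
fid = fopen(metaFile, 'w');
fprintf(fid, '%s', jsonencode(meta));   % plain text, readable without MATLAB
fclose(fid);

meta2 = jsondecode(fileread(metaFile)); % read the layout back as a struct
delete(metaFile)

% With the layout known, the samples could then be mapped like this:
% m = memmapfile('myData.bin', 'Format', meta2.dataType, ...
%     'Offset', meta2.dataOffsetBytes);
```

Because the sidecar is plain text, the byte layout stays recoverable from any language or even by eye.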

Cons?

Writing and reading with the low-level functions fwrite and fread, as well as with memmapfile, are A LOT more laborious than using the high-level operations save and matfile.
You cannot access variables in the file by their names.
You need to know the exact number of bytes used by each variable in the file.
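The naming issue can be eased somewhat, because memmapfile also accepts a cell array as Format that assigns a name to each region of the file. The following is a minimal sketch, assuming the same byte layout as above (three doubles, three characters, then the int16 samples) but written to a temporary file; the field names header, title and A are choices made for this example. The byte layout still has to be known in advance.

```matlab
% Sketch: map header and samples as named fields of one memmapfile object.
% fname and the field names 'header', 'title', 'A' are illustrative choices.
fname = [tempname '.bin'];
A = int16(randi([-1000 1000], 1000000, 1));

fid = fopen(fname, 'w');
fwrite(fid, [1024 1.20 0.05], 'double');   % Fs, scale, offset: 24 bytes
fwrite(fid, uint8('EEG'), 'uint8');        % title: 3 bytes
fwrite(fid, A, 'int16');                   % samples: 2 bytes each
fclose(fid);

m = memmapfile(fname, 'Format', { ...
    'double', [1 3],        'header'; ...
    'uint8',  [1 3],        'title';  ...
    'int16',  [numel(A) 1], 'A'}, 'Repeat', 1);

Fs_    = m.Data.header(1);          % first header value
title_ = char(m.Data.title);        % bytes converted back to characters
same   = isequal(m.Data.A(10), A(10));

clear m          % release the mapping before deleting the file
delete(fname)
```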

6 comments (4 older comments not shown)

Athul Prakash on 12 Aug 2019

Looks like everything's been covered already in your question and by others.

A few more subtle points which may be useful...

1) As you mentioned, the memmapfile approach involves a lot of low-level processing of data.

2) However, it can result in more efficient code, since MATLAB treats the data in the file just like any other variable; fread and fwrite are not required.

3) memmapfile can also be faster at read/write, since it uses the OS's virtual memory capabilities to perform file I/O instead of the I/O buffers allotted to the MATLAB process.

4) For partial loading with the matfile method, you need to use MAT-file version 7.3, which is a relatively new format. Hence, your code might not be compatible with older versions. Of course, no such limitations exist for the memmapfile method.

5) You should really look at using a datastore. Datastores are built to handle large data one chunk at a time, and they are also the foundation for many big-data processing tools in MATLAB, such as mapreduce and tall arrays.
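As a rough illustration of point 5, here is a minimal sketch using fileDatastore with load as the read function. It assumes the data has been split across several small MAT-files; the folder, the file names and the running total are made up for this example. Note that this reads one whole file at a time, rather than part of a single file as matfile and memmapfile do.

```matlab
% Sketch: process a collection of MAT-files one file at a time.
% The folder, file names and variable names are assumptions of this example.
folder = tempname;
mkdir(folder);
expected = 0;
for k = 1:3
    A = int16(randi([-1000 1000], 1000, 1));
    expected = expected + sum(double(A));
    save(fullfile(folder, sprintf('part%d.mat', k)), 'A');
end

fds = fileDatastore(folder, 'ReadFcn', @load, 'FileExtensions', '.mat');
total = 0;
while hasdata(fds)
    s = read(fds);                      % struct returned by load for one file
    total = total + sum(double(s.A));   % only one file is in memory at a time
end

rmdir(folder, 's');                     % remove the temporary folder
```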

Hope it helps!

Kouichi C. Nakamura on 12 Aug 2019

Thanks. I like all the points you raised. Yeah, I started learning about datastore and tall arrays.
