Save using -append behaves differently when replacing objects vs replacing arrays

Hi all,
I'm trying to save some variables to a .mat file, appending to that file if the variable is new, overwriting if it's already there. I've found different behaviour if the variable is an object vs if it's a simple array. Here's a demonstration:
Firstly, arrays
p = rand(1000,3);
save('test.mat','p')
whos '-file' test.mat
d1 = dir('test.mat');
p = rand(10,3);
save('test.mat','p','-append')
whos '-file' test.mat
d2 = dir('test.mat');
fprintf('Saved .mat file:\n1000 pt triangulation: %d bytes\n 10 pt triangulation: %d bytes.\n',...
d1.bytes, d2.bytes)
Results in:
Saved .mat file:
1000 pt triangulation: 22907 bytes
10 pt triangulation: 430 bytes.
Now, objects
p = delaunayTriangulation(rand(1000,3));
save('test.mat','p')
whos '-file' test.mat
d1 = dir('test.mat');
p = delaunayTriangulation(rand(10,3));
save('test.mat','p','-append')
whos '-file' test.mat
d2 = dir('test.mat');
fprintf('Saved .mat file:\n1000 pt triangulation: %d bytes\n 10 pt triangulation: %d bytes.\n',...
d1.bytes, d2.bytes)
Results in:
Saved .mat file:
1000 pt triangulation: 72833 bytes
10 pt triangulation: 73319 bytes.
As you can see, the first -append with arrays did exactly as expected. The large variable p was overwritten by the smaller variable with the same name. The resulting filesize was reduced (as is expected).
The second -append worked differently. Here, it seems that the original large p object stayed in the file, even after it was replaced with a much smaller object with the same name. Overall, the file size actually increased. This is surprising. Is this a bug? A feature with dubious utility? It's an annoyance at the moment. The only workaround I've found so far is to reload the whole contents of a .mat file into memory every time I want to -append a replacement to ANY of the object variables in that file. This would be fine with small files, but I'm dealing with files of a few hundred MB containing a suite of 100 MB objects, so it's really cumbersome to reload everything when I only need to save one.
Thanks, Sven.

Accepted Answer

Walter Roberson on 3 Feb 2016
save -append has always been defined to patch out the old data and add the new data to the end of the .mat file without necessarily reducing the file size at all. It is possible that there is an optimization for the case of a single numeric array or the case of a numeric array that happens to be the last thing in the file, but that has never been guaranteed. At no time has save -append been defined as needing to remove the old data and "dropping down" whatever follows to fill the hole. The -append flag is for fast updating of a .mat file, not for space efficiency.
If you need to update variables inside a .mat file, you should consider using a -v7.3 file and matfile().
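A minimal sketch of that suggestion, assuming a scratch file named matfile_demo.mat in tempdir and two made-up variables a and b (none of these names come from the thread). It only shows how one variable can be replaced through matfile() without loading the others; it makes no claim about whether the file shrinks afterwards.

```matlab
% Sketch: replace a single variable in a -v7.3 MAT-file via matfile().
% The file name and the variables a and b are hypothetical.
fname = fullfile(tempdir, 'matfile_demo.mat');
a = rand(1000,3);
b = magic(4);
save(fname, 'a', 'b', '-v7.3');          % -v7.3 is the format matfile can access partially

m = matfile(fname, 'Writable', true);    % open for writing; nothing is loaded yet
m.b = zeros(2,2);                        % overwrite only b; a is never read into memory

whos('-file', fname)                     % list what the file now contains
info = dir(fname);
fprintf('File size after update: %d bytes\n', info.bytes);
```

Comparing info.bytes before and after the assignment is a simple way to check how the file size behaves for your own variable types.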
2 Comments
Sven on 9 Feb 2016 (edited 9 Feb 2016)
Fair enough. I would call this then a feature with dubious utility.
For anyone looking to do what I was trying to do - the only way to replace saved objects and not continually increase file size (in R2015b or earlier) is to do away with "-append" completely:
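tmp = load('filename.mat');            % load every variable into one struct
tmp.var = newVar;                      % some object you want to save as "var"
save('filename.mat','-struct','tmp')   % rewrite the file; the struct's NAME is passed as a string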
It's not clean and has obvious overhead (not really a viable system if you're saving thousands of small variables into one large file), but at least the above does an actual replace.
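The load/modify/save pattern above could be wrapped in a small helper. This is only a sketch: replaceInMatFile is a hypothetical name, not a MATLAB function, and it assumes the whole file fits in memory and that saving in the default MAT-file version is acceptable (a file originally saved with -v7.3 would be rewritten in the default format).

```matlab
function replaceInMatFile(fname, varName, value)
% replaceInMatFile  Hypothetical helper: rewrite a MAT-file with one variable replaced.
%   fname   - path to the .mat file (created if it does not exist yet)
%   varName - name of the variable to add or replace (must be a valid identifier)
%   value   - new contents for that variable
tmp = struct();
if exist(fname, 'file') == 2
    tmp = load(fname);              % pull every existing variable into a struct
end
tmp.(varName) = value;              % add or overwrite the one field
save(fname, '-struct', 'tmp');      % write each field back as its own variable
end
```

A call such as replaceInMatFile('filename.mat', 'var', newVar) then replaces var while keeping the other variables, at the cost of reading and rewriting the entire file each time.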
For reference, using -v7.3 and calling matfile() rather than save() actually has the same inconsistent behaviour... overwriting a saved variable with a smaller one of the same name reduces overall .mat file size if the variable is a simple array, but not if the variable is an object.
The documentation for save doesn't mention this one way or the other - it would be great if it just acknowledged it with something like "using -append to save objects will always result in increased filesize as new data is appended to the end of the file irrespective of whether variables with identical names are being replaced." I know... clear as mud anyway.
Simon Matte on 16 Sep 2020
I have the same issue.
Bless you for coming back to this and re-summarizing the situation.
I'm glad I'm not the only one to run into this problem and be confused by the misleading documentation...
