Deletion of cell array element performance issue - alternative or solutions?

I have a struct array with 7 equal-sized arrays and one scalar value. One of the arrays is a cell array, as each element can contain a varying number of elements.
I have done performance profiling and found that deleting an element of the cell array is consuming 90% of the effort of my element-removing function. e.g. cell_array(index) = []; The function is called so often that it is consuming the majority of the time for my code.
Why is removing an element from a cell array so much more expensive than from a regular array?
Is there a more efficient way to achieve my goal of removing an element?
Is there an alternative way to store a varying-length element within an object that is otherwise a struct array that is more efficient for deletion operations?

8 comentarios

Removing a single element from an array (always?) involves re-aranging the memory used by the array. A possible solution would be to remove multiple elements at once, by first collecting all the indices. Or may be you do not need to remove them but just keep track of which elements you do not need for further processing by means of a logical array?
What is the size of the cell array?
I would assume that removing element from a cell array involves behind the scene:
  1. Allocation of a new cell array smaller than the original one
  2. Copy of all the elements of the cell array bar the one removed. Note that the elements of the cell array are pointers to the memory location of the matrices/objects contained in each cell. These matrices/objects are not copied so their size does not matter.
  3. If the content of the cell is not shared by any other variable, then release the memory used by that content.
Step 1 and 2 are the same that would occur for a matrix. Step 3 is in addition.
The array sizes are on the order of 100,000 elements. It must be the memory freeing that is the slowdown.
Or not. if I take the cell, shift everything down and move it to the end, so that the reference still exists, it's just as slow as deletion alone.
Nicholas Bauer
Nicholas Bauer el 13 de Oct. de 2017
Editada: Nicholas Bauer el 13 de Oct. de 2017
In fact, the profiler suggests that the shifting operation is the most expensive part. Could it be that internally, Matlab doesn't actually remove an element from a normal array, but just applies an indexing to make it appear as if it had removed an element from the middle? And that for cell arrays, it actually does shift and remove?
Well, it's all conjectures since mathworks does not document the memory management of matlab. I would assume that for both cell arrays and normal arrays a deletion or insertion involves a complete copy of the array content, 64-bit pointer vs 64-bit double so both should take the same time. I don't think that matlab could take any shortcut for a normal array such as maintaining a table of removed indices as that would break things such as reshape and access through mex.
Possibly, when matlab copy/move elements of a cell array, there's more than just copying a pointer. Maybe matlab also updates/checks the reference count of the pointers.
For reference: comparison between array and cell arrays (R2017b matlab online)
%Deletion
a = 1:10000;
c = num2cell(a);
tic; for i = numel(a):-1:1, a(randi(i)) = []; end; toc
tic; for i = numel(c):-1:1, c(randi(i)) = []; end; toc
Elapsed time is 0.230962 seconds. %Array
Elapsed time is 0.885017 seconds. %Cell array
So roughly 4 times slower when deleting elements from a cell array
%Shifting
a = 1:10000;
c = num2cell(a);
tic; for step = 1:10000; idx = randi(numel(a)); a = [a(1:idx-1), a(idx+1:end), a(idx)]; end; toc
tic; for step = 1:10000; idx = randi(numel(c)); c = [c(1:idx-1), c(idx+1:end), c(idx)]; end; toc
Elapsed time is 0.271223 seconds. %Array
Elapsed time is 3.486516 seconds. %Cell array
Shockingly slow with a cell array even though it should be the same amount of memory moved. Not sure what is going on.
@Nicholas Bauer: resizing data arrays in a loop is invariably a bad idea, there are plenty of discussions on this forum about why resizing matrices in a loop is slow. You would be much better off simply storing a logical mask of the relevant rows.

Iniciar sesión para comentar.

Respuestas (3)

Guillaume
Guillaume el 13 de Oct. de 2017
Cell array and matrices are very efficient for indexing (O(1) complexity) but innefficient for insertions and deletions (O(n) complexity).
The only container included in matlab that is efficient for insertions and deletions is containers.Map. Depending how it's implemented in matlab (unfortunately this is not documented), time complexity should be O(1) or O(log(n)). However, indexing takes longer.
There are other containers that have different trade-offs between insertion/deletion, indexing, search, etc. But none of these are implemented in matlab. I believe that there's an implementation of linked lists on the FileExchange (insertions/deletions in O(1)).

4 comentarios

In theory the complexity of inserting and deleting elements is O(n), but Matlab reduces this. A re-allocation of a smaller arrays does not release the free'd memory immediately. It seems like growing arrays iteratively is improved also by the JIT. Both methods are not documented.
But without doubt the complexity is in the magnitude of n and much larger than 1, such that growing and shrinking of arrays is very expensive. +1
I understand this in general, but if a Cell array is just pointers to arrays (presumably 32 or 64 bit addresses?) and the data I'm storing in the array are integers (presumably also 32 or 64 bits?), why is the latter extremely fast and the former very slow? It must be doing something more complex with cell arrays.
I think perhaps Guillaume's #3 is the difference then, freeing memory.
Nevermind, it isn't-- see comment on question.

Iniciar sesión para comentar.

James Tursa
James Tursa el 16 de Oct. de 2017
Editada: James Tursa el 16 de Oct. de 2017
Keep in mind that deleting an element of a double array means:
- Allocating/copying the data (64-bit doubles) to new memory & free'ing old memory
Whereas deleting an element of a cell array means:
- Allocating/copying the data (32-bit or 64-bit pointers) to new memory & free'ing old memory, PLUS ...
- Checking for any sharing (data, reference, etc) of the cell element being deleted
- Unsharing the cell element if it is sharing in any manner
- Freeing the data of the cell element itself
- Free'ing the mxArray variable header of the cell element
So, because of that mxArray header stuff for the cell array variable pointers, a lot more is going on with the MATLAB memory manager than just allocating/copying the pointers in the data area. And free'ing memory does not necessarily mean that this memory gets released back to the OS. MATLAB may hang onto it in the background, keeping it in reserve for additional anticipated allocations in the future. As Jan points out, the details of all of this memory manager stuff are not published, so really I have no clue what I would expect for timing for this operation.
It would be interesting to see if one could force those deletions to be simply reference counter decrements instead of mxArray header free'ing. E.g., if you weren't concerned about the memory implications, something like this might speed things up:
- Create a new cell array where the elements are all reference copies of the original cell array
- This can easily be done in a mex routine (Maybe also at the m-file level but I would have to think about it first)
- Then delete the cell elements from the original. These deletions would result ONLY in a reference count decrement in the mxArray header ... no headers would need to be free'd so no memory manager stuff needed for this particular operation.
Then maybe the timing would reduce down to what the double array takes. But this is all just a conjecture on my part. Also, the downside is of course that all of the original cell elements remain in memory. But if you care mainly about the speed and not the memory then this approach might work.
My solution, for the moment, is to not remove any elements from the array, just keep track of validity, the valid and total count of elements, and when needed take the valid subset.

Categorías

Productos

Preguntada:

el 13 de Oct. de 2017

Editada:

el 16 de Oct. de 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by