Performance comparison among Struct Array, Cell Array and Table

I am unsure when to use which. There are three common ways to store data in MATLAB: 1. cell arrays; 2. tables; 3. struct arrays.
I searched online for performance comparisons among the three: structs seem to be the fastest, but it is still not clear to me when to use which.
Can someone give me a general idea of how these MATLAB data structures compare in performance?
Thanks :)

3 comments

How are you accessing the data downstream in your code? Can you give a short example of how you would be accessing your data using the three methods you mention? I.e., some code or pseudo-code showing what you are thinking about implementing.
"There are three common way to store data in MATLAB: 1. Cell array; 2. Tables; 3. Struct arrays."
You only list container classes. What about the simpler ways of storing data: the numeric array (single, double, uint*, and int*), the character array, and the logical array? These are faster to access than the ones that you list. Is there a reason why you do not list them?
I found this super helpful because of the discussion it generated. Thanks, Kat Lee!


Answers (3)

Bruno Luong on 2 Nov 2018
My general rules of thumb:
  • Simple (numeric) arrays are the fastest.
  • Use a cell array if you have no choice (mixed classes or non-uniform sizes) and don't care about how to "name" the elements.
  • The next recommendation is a struct of arrays and/or cell arrays, which gives you meaningful field names and flexible data exchange.
  • Avoid arrays of structs at all costs for a large number of records (say > 10); sooner or later this carries a big speed penalty. I can't remember the last time I used one, probably in my youth, and I never did it again.
  • A table is a sort of object-oriented layer built on top of cell arrays; personally, I have never felt a need to use it. I recognize it's very attractive for people who like Excel spreadsheets. ;-)
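A short sketch of the first three rules of thumb, using made-up data: a plain numeric array for uniform numbers, a cell array only where the elements differ in class or size, and a scalar struct whose fields are whole vectors so names come for free without giving up vectorized operations.

```matlab
% Plain numeric array: contiguous in memory, fastest for vectorized math
v = rand(1000, 1);
m = mean(v);

% Cell array: needed when elements differ in class or size
c = {1:3, 'text', magic(4)};
lens = cellfun(@numel, c);   % number of elements in each cell: [3 4 16]

% Struct of arrays: meaningful field names, each field a whole vector
s.time  = (0:999)';
s.value = v;
avg = mean(s.value);         % same vectorized call as on the plain array
```

Because each field of `s` holds one contiguous vector, operating on `s.value` costs essentially the same as operating on `v` directly.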

5 comments

Kat Lee on 2 Nov 2018 (edited 2 Nov 2018)
Thank you so much, @Bruno, for such a specific answer.
According to your answer, the speed ranking will be cell > struct > table, am I right?
Could you explain "Avoid at all cost array of structs for large number of records (said > 10)" a bit more? I feel that > 10 will be very easy to reach, since I am dealing with large numbers of records most of the time.
Bruno Luong on 2 Nov 2018 (edited 2 Nov 2018)
Struct of arrays versus array of structs:
% struct of arrays, recommended
>> s = struct('x', 1:100, 'value', sin(1:100))
s =
  struct with fields:
        x: [1×100 double]
    value: [1×100 double]
% array of structs, not recommended
>> s = struct('x', num2cell(1:100), 'value', num2cell(sin(1:100)))
s =
  1×100 struct array with fields:
    x
    value
The latter will be very slow and impractical to process in MATLAB, since the data are scattered all over memory. Just avoid structuring your data the second way, unlike in languages such as C/C++, where such a data structure is perfectly efficient to handle.
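A rough way to see the difference for yourself is to sum one field in both layouts. This is a sketch with an arbitrary size `n`; absolute timings depend on the machine and MATLAB release, so only the relative ordering is of interest. The loop variant shows the element-by-element pattern that array-of-structs code tends to fall into.

```matlab
n = 1e5;
soa = struct('x', 1:n, 'value', sin(1:n));                     % scalar struct of vectors
aos = struct('x', num2cell(1:n), 'value', num2cell(sin(1:n))); % 1-by-n struct array

% Struct of arrays: the field is already one contiguous vector
tic; totalSoa = sum(soa.value); tSoa = toc;

% Array of structs: [aos.value] first gathers n scattered scalars into a new vector
tic; totalAos = sum([aos.value]); tAos = toc;

% Array of structs processed one record at a time
tic;
totalLoop = 0;
for k = 1:n
    totalLoop = totalLoop + aos(k).value;
end
tLoop = toc;

fprintf('struct of arrays: %.4g s\narray of structs: %.4g s\nloop: %.4g s\n', ...
        tSoa, tAos, tLoop);
```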
Bruno, tables are NOT built on top of cell, at least not in the way that you probably mean. Compare the memory requirements:
>> x = randn(1000000,10);
>> t = array2table(x);
>> c = num2cell(x);
>> whos x t c
  Name          Size                 Bytes  Class     Attributes

  c       1000000x10            1200000000  cell
  t       1000000x10              80003090  table
  x       1000000x10              80000000  double
A struct array has a memory footprint similar to a cell array, while a scalar struct of vectors has a footprint similar to a table.
Performance-wise, a double array wins. A cell array or a struct array will likely need a loop, since there's no simple way to get a contiguous vector of values corresponding to one of x's columns. A table and a scalar struct of vectors will both have good performance for vectorized operations.
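To make the column-access point concrete, here is a small sketch that pulls the same column out of a double array, a table, a scalar struct of column vectors, and a cell array. The variable and field names `x1`, `x2`, `x3` are chosen here for illustration. Only the cell version has to build a new vector out of many separate scalars.

```matlab
x = randn(1000, 3);
t = array2table(x, 'VariableNames', {'x1', 'x2', 'x3'});
s = struct('x1', x(:, 1), 'x2', x(:, 2), 'x3', x(:, 3));  % scalar struct of column vectors
c = num2cell(x);                                         % 1000-by-3 cell, one scalar per cell

colArray  = x(:, 2);      % slice of the contiguous double array
colTable  = t.x2;         % whole table variable, already a column vector
colStruct = s.x2;         % whole field, already a column vector
colCell   = [c{:, 2}]';   % gathers 1000 separate scalars into a new vector

doubled = 2 * colTable;   % vectorized operation, no loop needed
```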
Peter, tables are built on top of cell arrays. Your example is misleading, since you're comparing two very different things. Your cell array c is literally a 1000000-by-10 cell array with one scalar per cell. Your table t is built on top of a 1-by-10 cell array, where each column of x is placed in its own cell. This is how tables work: each "variable" in the table language is placed in its own cell. The table t is hence roughly equivalent to the cell array num2cell(x, 1).
Notice Peter's phrase, "at least not in the way that you probably mean."
In particular, many people tend to think that a table with N rows and V variables is stored as an N by V cell array, but instead it is stored as a struct that contains a 1 x V cell array each entry of which is an object with N rows.
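The per-variable storage is visible from the outside: each table variable comes back as one whole N-row array of its own class, not as N separate cells. A small sketch with made-up data:

```matlab
% Hypothetical table: N = 5 rows, V = 2 variables of different classes
t = table((1:5)', {'a'; 'b'; 'c'; 'd'; 'e'}, 'VariableNames', {'id', 'name'});

class(t.id)     % 'double': one 5-by-1 numeric column
class(t.name)   % 'cell': one 5-by-1 cell array of char vectors
size(t)         % 5 rows by 2 variables
```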


Matt J on 1 Nov 2018 (edited 1 Nov 2018)
They should all be about the same speed. If speed matters and the data is large, however, you shouldn't be using any of these. You should be storing data in numeric arrays instead. That way the data will be held contiguously in RAM and accessing it will be very fast.

5 comments

My expectation is that a cell array would be slightly faster than a struct, since struct access involves a symbol lookup whereas cell access is just following pointers. Either one should be faster than a table, since table objects carry overhead for object processing.
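One way to check that expectation on your own MATLAB version is a micro-benchmark with `timeit`, reading one element through each container. This is a sketch only: the container contents are arbitrary, single-element reads are so fast that `timeit` may warn about accuracy, and the results will differ across releases.

```matlab
v = rand(1000, 1);
c = {v};                                   % cell holding the vector
s = struct('data', v);                     % scalar struct holding the vector
t = table(v, 'VariableNames', {'data'});   % one-variable table

tCell   = timeit(@() c{1}(500));
tStruct = timeit(@() s.data(500));
tTable  = timeit(@() t.data(500));

fprintf('cell: %.3g s, struct: %.3g s, table: %.3g s\n', tCell, tStruct, tTable);
```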
Kat Lee's comment moved here
Thank you for answering my question. In my case, storing the data in a numeric array won't work, since I also need a field name associated with each number.
Also, I do see timing differences among these three when I run my scripts.
Matt J on 1 Nov 2018 (edited 1 Nov 2018)
"Thank you for answering my question, for my case, store in numeric array won't be applicable for me since I also need the fieldname to associate with number."
But it is better to do this
s.name=rand(10000,1);
than it is to do this,
z=num2cell(rand(10000,1));
[s(1:10000).name]=z{:};
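If some other code already hands you the second form, it can be converted to the first once, up front. A sketch, reusing the array-of-structs construction above under the hypothetical name `aos`; `struct2table` is shown as an alternative when there are several fields.

```matlab
% Array of structs, built the same way as above
z = num2cell(rand(10000, 1));
[aos(1:10000).name] = z{:};

% Gather the scattered field values into one contiguous column vector
soa.name = [aos.name]';

% Alternative for many fields at once: one table variable per field
t = struct2table(aos(:));
```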
Stephen23 on 6 Nov 2018 (edited 6 Nov 2018)
"store in numeric array won't be applicable for me since I also need the fieldname to associate with number."
That is a very poor reason not to use numeric arrays, especially if you then ask about efficiency accessing data!
Simply keep an array of text data (e.g. cell array of char vectors, string array) and a corresponding array of numeric data (any numeric class). This will make your data processing much simpler and more efficient than messing about with numeric data pointlessly split up into a cell array.
A table might be a good solution (it effectively does the same thing).
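A minimal sketch of the parallel-arrays approach, with hypothetical names and values: a string array of labels and a numeric column in the same order, followed by the same data wrapped in a table.

```matlab
names  = ["alpha"; "beta"; "gamma"];   % string array of labels
values = [1.5; 2.25; 3.0];             % numeric column in the same order

v = values(names == "beta");           % look up a value by name with logical indexing

t = table(names, values, 'VariableNames', {'name', 'value'});
vt = t.value(t.name == "beta");        % same lookup through the table
scaled = t.value * 10;                 % vectorized over the whole numeric variable
```

The numbers stay in one contiguous numeric array in both versions; the text only serves as an index into it.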
Especially when one can put the numerical array inside a struct with a meaningful fieldname.
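A sketch of that idea with hypothetical field names: each field holds a whole numeric vector, and dynamic field names let you select a field from a name stored in a variable.

```matlab
data = struct();                      % scalar struct, one field per quantity
data.pressure    = rand(100, 1);
data.temperature = rand(100, 1);

field = 'pressure';
p = data.(field);                     % dynamic field name returns the whole vector
meanByField = structfun(@mean, data); % one mean per field, as a column vector
fields = fieldnames(data);            % {'pressure'; 'temperature'}
```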


Peter Perkins on 6 Nov 2018
Kat, there's no way you are gonna get a useful answer without providing more information. The best representation of your data is gonna depend on your data and what you are doing, and how you plan on writing your code. Without knowing that, any answer is just guessing.

5 comments

Thanks, Peter, for pointing this out. The thing is that we have several different cases of data that need to be stored in MATLAB. One type is stored as a table: the table has 3 columns for attribute names and multiple rows corresponding to values that have been calculated. In another case, we have been using a cell array of size 2 x N (10 < N < 30) to store some data: the first row holds names (char) and the second row holds numbers. We then apply interpolation based on this cell "table" to get more rows. For these two cases, which data structure would be the most efficient to use?
You are probably going to be very unhappy with cell, if you mean what it sounds like you mean.
My advice is to start with either numeric arrays or tables. Only go to something else if you run into trouble.
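Applying that advice to the 2-by-N cell case might look like the sketch below. The cell contents are made up, and the interpolation positions are an assumption: the thread does not say what the numbers are sampled against, so positions 1..N are used here. The names become a separate cell column, and the numbers become a plain numeric vector that `interp1` can work on directly.

```matlab
% Hypothetical 2-by-N cell: row 1 holds names (char), row 2 holds scalar numbers
c = {'a', 'b', 'c', 'd'; 1, 4, 9, 16};

names  = c(1, :);                 % 1-by-N cell array of char vectors
values = cell2mat(c(2, :));       % 1-by-N double row vector

% Assumption: the numbers are samples at positions 1..N
x  = 1:numel(values);
xq = 1:0.5:numel(values);         % denser grid ("more rows")
vq = interp1(x, values, xq);      % linear interpolation (the default method)

% The same data as a table, one row per name
t = table(names', values', 'VariableNames', {'name', 'value'});
```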
I really doubt the performance of tables, since the table performance I have seen before was not very good.
Matt J on 6 Nov 2018 (edited 6 Nov 2018)
"I really doubt Table's performance, since what I see before Table's performance is not very good"
That's not a good reason in and of itself to doubt the performance of tables. The person who was demonstrating their performance to you may have been an inexperienced programmer who didn't use properly vectorized methods to get the best performance.
Also, MathWorks has improved table() performance over the years.
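A sketch of the kind of difference "properly vectorized" makes, with an arbitrary table size: indexing into the table once per row inside a loop pays the table indexing overhead on every iteration, while extracting the variable once and operating on the whole vector pays it only once. Timings vary by machine and release.

```matlab
n = 1e4;
t = table((1:n)', rand(n, 1), 'VariableNames', {'id', 'value'});

% Slow pattern: table indexing on every iteration
tic;
totalLoop = 0;
for k = 1:n
    totalLoop = totalLoop + t.value(k);
end
tLoop = toc;

% Vectorized pattern: extract the whole variable once, then operate on it
tic;
totalVec = sum(t.value);
tVec = toc;

fprintf('row-by-row loop: %.4g s, vectorized: %.4g s\n', tLoop, tVec);
```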



Asked: 1 Nov 2018

Last commented: 18 Mar 2022
