MATLAB Answers

Why does converting a table to a struct increase memory usage by 15x?

Asked by Brian Kardon on 28 Oct 2019
Latest activity: commented on by Peter Perkins on 1 Nov 2019
I'm reading tabular data from a file using the "readtable" function - each file has 54 fields and 1000 rows. It takes up 250 kB on disk, and 450 kB in memory as a table. Then, when I try to convert the table to a struct using the "table2struct" function, the resulting struct takes up 6.5 MB!!! Why does converting from a table to a struct result in a 15x increase in memory usage? I have several thousand of these files to manipulate, so 450 kB per file is fine, but 6.5 MB makes MATLAB run out of memory! No good.
Here's some output to verify my assertions:
>> t = readtable('example_file.dat');
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved in the
VariableDescriptions property.
>> t.Properties
ans =
  struct with fields:
             Description: ''
                UserData: []
          DimensionNames: {'Row' 'Variables'}
           VariableNames: {1×54 cell}
    VariableDescriptions: {1×54 cell}
           VariableUnits: {}
                RowNames: {}
>> size(t)
ans =
1000 54
>> ts = table2struct(t);
>> size(ts)
ans =
1000 1
>> whos t ts
  Name         Size            Bytes  Class     Attributes
  t         1000x54           457776  table
  ts        1000x1           6483456  struct
Why does converting from table to struct waste so much memory, and how can I fix it?
Thanks in advance for any help!
PS: For some reason this form won't allow me to select a release - I'm using MATLAB R2017a.

  2 Comments

Walter Roberson
28 Oct 2019
A table stores the datatype information once per variable (and more for variables that are cell arrays).
A struct array stores the datatype information once per field per struct array element, so that overhead is repeated for every row.
"PS: For some reason this form won't allow me to select a release - I'm using MATLAB R2017a."
Select the product first; the release dropdown should then populate.
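The whos figures in this thread are consistent with a simple model of that per-element overhead. Below is a back-of-the-envelope sketch, assuming every variable is a plain double column, and assuming whos charges about 112 bytes of header per stored value and 64 bytes per field name. Those two constants are inferred from the numbers posted here, not taken from documentation, and they may differ between releases.

```matlab
% Rough memory model for a struct built from a 1000x54 all-double table.
% ASSUMPTIONS (inferred from the whos output in this thread, not documented):
%   headerBytes - overhead charged per stored value
%   nameBytes   - storage charged per field name
nRows       = 1000;
nFields     = 54;
headerBytes = 112;
nameBytes   = 64;
dataBytes   = 8;    % one double

% Struct array: every element of every field is its own value with its own header.
structArrayBytes = nRows*nFields*(headerBytes + dataBytes) + nFields*nameBytes;

% Scalar struct: one header per field, each field holding a 1000x1 double.
scalarStructBytes = nFields*(headerBytes + nRows*dataBytes) + nFields*nameBytes;

fprintf('struct array : %d bytes\n', structArrayBytes);   % 6483456, the value whos reported for ts
fprintf('scalar struct: %d bytes\n', scalarStructBytes);  % 441504
```

Under these assumptions the struct array pays the header 54000 times, while the scalar struct pays it only 54 times, which puts it in the same range as the 457776 bytes reported for the table.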

1 Answer

Answer by per isakson on 28 Oct 2019
Edited by per isakson on 28 Oct 2019

What kind of structure do you expect? table2struct can create two kinds.
  • struct array with one struct for each row of the table.
  • scalar struct with each column of the table stored as one field value
Try
t = readtable('example_file.dat');
struct_scalar = table2struct( t, 'ToScalar', true );
struct_array = table2struct(t);
whos
which returns
  Name                Size            Bytes  Class     Attributes
  struct_array       12x1             15040  struct
  struct_scalar       1x1              2720  struct
  t                  12x10             4068  table
where example_file.dat contains
f00 f01 f02 f03 f04 f05 f06 f07 f08 f09
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
I assume that you expected table2struct(t) to create a scalar structure.
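If you don't have the file at hand, the same comparison can be reproduced with synthetic data. This is only a sketch: the random 1000x54 table is a stand-in for the asker's file (which may contain non-numeric columns), and the names sArray and sScalar are made up for illustration.

```matlab
% Synthetic stand-in for the asker's data: 1000 rows, 54 numeric variables.
% array2table names the variables Var1 ... Var54 by default.
t = array2table(rand(1000, 54));

sArray  = table2struct(t);                    % 1000x1 struct array, one element per row
sScalar = table2struct(t, 'ToScalar', true);  % 1x1 struct, one 1000x1 field per variable

% Collect the sizes programmatically instead of reading the whos printout.
info = whos('t', 'sArray', 'sScalar');
for k = 1:numel(info)
    fprintf('%-8s %10d bytes\n', info(k).name, info(k).bytes);
end

% The same value is reachable in all three containers:
a = t.Var1(5);
b = sArray(5).Var1;
c = sScalar.Var1(5);
```

The scalar struct should come out close to the table in size, while the struct array should be many times larger. The price of the scalar form is that a "row" is no longer a single element; you index into each field instead.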

  4 Comments

Brian, you say "have to stick with tables". Is that causing you headaches? I'm genuinely asking; if you can say why you wanted to convert to a struct array, that would be helpful.
Peter,
Actually, I'm starting to think using tables probably won't cause me headaches.
I'm more familiar with structs than tables, and I was under the impression that a table in MATLAB was a "higher level" object that would incur more overhead during manipulation and processing than a struct. I was also under the impression that the set of functions for manipulating structs was more plentiful than the set of functions for tables. Perhaps both of those assumptions are incorrect, and tables will be a good choice for my primary data structure!
Peter Perkins
1 Nov 2019
It all depends on what you are doing.
If you are using a struct array (as opposed to a scalar struct each of whose fields is itself a vector), a table is a clear winner memory-wise. This makes a big difference as your data size gets larger.
Tables allow you to easily slice your data in two directions. A struct array lets you slice along the "array" dimensions, but not so easily along the "fields" dimension. And tables support operations like joins and sorting and unique and others that struct arrays don't. So syntactically, I think you will be happier with tables.
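To make that concrete, here is a small sketch of slicing in both directions plus sorting, unique, and a join. The tables and their variable names (id, name, value, weight) are invented for illustration only.

```matlab
% Toy table with a duplicated row (hypothetical data).
t = table([3;1;2;1], {'c';'a';'b';'a'}, [30;10;20;10], ...
    'VariableNames', {'id', 'name', 'value'});

rowSlice = t(2:3, :);               % slice along rows
varSlice = t(:, {'id', 'value'});   % slice along variables ("fields")

sorted   = sortrows(t, 'id');       % sort rows by the id variable
distinct = unique(t);               % drop the duplicated row

% Join against a second table; by default the key is the shared variable, here id.
other  = table([1;2;3], [0.1;0.2;0.3], 'VariableNames', {'id', 'weight'});
joined = innerjoin(t, other);
```

With a struct array, the row slice is easy (s(2:3)), but the variable slice, the sort, and the join would each need hand-written code.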
Performance-wise, it depends on what you are doing. Subscripted assignment and reference on tables are usually what people flag, but those have been getting faster over the last couple of releases (and that will continue). I think you want to go for ease of use and move away from tables only if you have real performance issues. And even then, it's usually possible to vectorize your code, or to "hoist" a few variables out of the table for a short scope in your code.
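A minimal sketch of what "hoisting" and vectorizing can look like, using a made-up table with variables x and y. Whether either makes a measurable difference depends on your release and your data, so time it before committing.

```matlab
% Hypothetical table; the variable names are for illustration only.
t = array2table(rand(1000, 3), 'VariableNames', {'x', 'y', 'z'});
n = height(t);

% Hoisted: pull the columns out once, loop over plain arrays,
% then write the result back to the table in a single assignment.
x = t.x;
y = t.y;
xy = zeros(n, 1);
for i = 1:n
    xy(i) = x(i) * y(i);
end
t.xy = xy;

% Vectorized: no loop at all, one table reference per variable.
t.xyVec = t.x .* t.y;
```

Both versions avoid subscripting into the table inside the loop (t.x(i) and t.xy(i) = ...), which is the pattern that tends to be slow.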
