Error converting Python DataFrame to table

I have used the following commands to load a Python .pkl file:
fid = py.open("data.pkl");
data = py.pickle.load(fid);
T = table(data);
This loads a pandas DataFrame object. Newer versions of MATLAB can convert this object to a table using the table command, which I tried, but I encountered the error below:
Error using py.pandas.DataFrame/table
Dimensions of the key and value must be the same, or the value must be scalar.
What does this error mean? I'm guessing it's because the DataFrame object in the .pkl contains a couple of nested fields. Most of the fields are simply 1xN numeric vectors, but a couple are 1xN objects that have their own fields.
How can I convert this DataFrame object to something usable in MATLAB? I was given this data file and did not generate it, and I am much more proficient in MATLAB than in Python, so I would rather solve this within MATLAB than create a Python script or change how the file is created.

Accepted Answer

Umar
about 17 hours ago
Edited: about 17 hours ago

1 vote

Hi @David, thanks for writing in. You've actually already diagnosed this correctly, so let me confirm it and get you moving. The table() conversion (available since R2024a) only handles one level of DataFrame nesting. The 1xN object columns in your file that carry their own sub-fields push past that limit, and that is exactly what throws the dimension mismatch. Your flat numeric columns are completely fine; it's only the nested ones tripping it up.

Before anything else, run this to see what you're working with:

fid = py.open("data.pkl", "rb");
data = py.pickle.load(fid);
py.print(data.dtypes);
py.print(data.head(int32(3)));

For your flat columns, pull them out directly:

flat_cols = {"col1", "col2", "col3"};
arrays = cellfun(@(c) double(data{c}.values), flat_cols, 'UniformOutput', false);
T = array2table(cell2mat(arrays), 'VariableNames', flat_cols);

For the nested ones, you don't need a separate Python script. Call pandas.json_normalize inline from MATLAB; it flattens nested fields into dot-separated columns (e.g. sensor.value becomes a normal flat column), and after that table() will convert without issue:

records = data.to_dict("records");
flat_data = py.pandas.json_normalize(records);
T = table(flat_data);

If that still gives you trouble, take a look at the PandasToMatlab utility on File Exchange (https://www.mathworks.com/matlabcentral/fileexchange/111770-pandastomatlab). The df2t() function there handles more edge cases than the built-in path and works entirely in memory. Full type-conversion details are in the docs if you want to check what maps to what: https://www.mathworks.com/help/matlab/matlab_external/python-pandas-dataframes.html

Hope this helps!
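In case it helps to see the flattening outside MATLAB first, here is a minimal pure-pandas sketch of what json_normalize does to one level of nesting. The column names (time, sensor) are invented for illustration, not from your actual file:

```python
import pandas as pd

# A toy DataFrame with one flat column and one nested (dict-valued) column,
# mimicking the structure that trips up MATLAB's table() conversion.
df = pd.DataFrame({
    "time": [0.0, 1.0],
    "sensor": [{"value": 3.5, "unit": "V"}, {"value": 4.1, "unit": "V"}],
})

# to_dict("records") turns each row into a dict; json_normalize then
# expands nested dicts into dot-separated columns like "sensor.value".
flat = pd.json_normalize(df.to_dict("records"))
print(list(flat.columns))  # ['time', 'sensor.value', 'sensor.unit']
```

Every column in the result is a plain 1-D column, which is exactly the shape that table() can handle.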

2 comments

David K
about 5 hours ago
This is very helpful, thank you! I was able to use the json_normalize method to make it easily convertible to a table.
A couple of other things: would you mind formatting your answer? It is very difficult to read as is. Also, your line

arrays = cellfun(@(c) double(data{c}.values), flat_cols, 'UniformOutput', false);

gives the error "Brace indexing is not supported for variables of this type."
Umar
about 1 hour ago
Glad json_normalize worked out, David! Apologies for the messy formatting; here is a cleaner version of the full answer.

Step 1: Inspect what you're working with:

fid = py.open("data.pkl", "rb");
data = py.pickle.load(fid);
py.print(data.dtypes);
py.print(data.head(int32(3)));

Step 2: For flat numeric columns only:

flat_cols = {"col1", "col2", "col3"};
arrays = cell(1, numel(flat_cols));
for i = 1:numel(flat_cols)
    arrays{i} = double(data{py.str(flat_cols{i})}.values);
end
T = array2table(cell2mat(arrays), "VariableNames", flat_cols);

Step 3: For nested columns (the one that solved your problem):

records = data.to_dict("records");
flat_data = py.pandas.json_normalize(records);
T = table(flat_data);

On the brace-indexing error: {} is a MATLAB cell-array operation, and a py.pandas.DataFrame is not a cell array, so MATLAB rejects it. The fix is wrapping the column name in py.str() so MATLAB passes a proper Python string key to the DataFrame. I've also swapped cellfun for a plain loop since it is more reliable across MATLAB versions. That said, since json_normalize already handles both flat and nested columns in one shot, you probably won't need the flat-column path at all. Hope that clears things up!
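If you ever want to sanity-check the whole pipeline outside MATLAB, the pickle-load-then-flatten sequence can be reproduced in a few lines of plain Python. This is a sketch with an invented nested column (meta.id) and a temporary file standing in for your data.pkl:

```python
import os
import pickle
import tempfile

import pandas as pd

# Build a DataFrame with one flat and one nested column, then pickle it,
# mimicking how a file like data.pkl could have been produced.
df = pd.DataFrame({"t": [1, 2], "meta": [{"id": 7}, {"id": 8}]})
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as fid:
    pickle.dump(df, fid)

# Load it back (the Python-side equivalent of py.pickle.load in MATLAB)
# and flatten the nested column before any table conversion.
with open(path, "rb") as fid:
    loaded = pickle.load(fid)
flat = pd.json_normalize(loaded.to_dict("records"))
print(flat.columns.tolist())  # ['t', 'meta.id']
```

Note the "rb" mode on open: pickle files are binary, which is also why the MATLAB snippet passes "rb" to py.open.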


More Answers (0)

Version

R2025b

Asked:

21 Apr 2026 at 18:28

Commented:

about 4 hours ago
