Borrar filtros
Borrar filtros

Using groupsummary with Floating Number of Columns to Analyze

2 visualizaciones (últimos 30 días)
Hello,
I'm planning on using the groupsummary function to analyze different groups of data at various timesteps (columns).
The number of timesteps will vary, so I don't want to hard code in a set number.
My first issue that I'm having came whenever I changed the name of the columns that contain the data I want to analyze.
After importing into MATLAB, the column headers were originally just numbers (0, 1, 2, and 3), so MATLAB gave it generic 'VarName3', 'VarName4', 'VarName5', and 'VarName6'. (Columns 1 and 2 aren't needed)
I changed these to be the original numbers by extracting data from the Excel file and converting the numbers to strings (using num2str).
When the column headers were the VarName, the groupsummary function worked (code below):
G = groupsummary(table,'groupnames',"sum",{'VarName3','VarName4'});
But now that I changed it to be reflective of the timestep in quesiton, it gives the error 'Invalid data variable'.
G = groupsummary(table,'groupnames',"sum",{'0','1'});
I'm not sure why just changing the column headers would cause an error.
If anyone has any insight, I would appreciate it.
My second question is how to have the number of columns to analyze be dependent on the imported table?
I was thinking something like this, where the first column to analyze is always the 3rd column, and the very last column is never analyzed, and N is the total number of columns.
for k=3:N-1
G = groupsummary(table,'groupnames',"sum",table(:,k));
end
As I understand it, the part in the function where you state which columns you want to analyze must be written as a string value, so something in the apostrophe.
With that in mind, I'm not sure how to make it float, since, at least as far as I'm aware, you can't put variable (k) in apostrophes.
Any help is appreciate, let me know if you'd like more detail, or I can split this question up into two separate posts.
Thanks.
  2 comentarios
Cris LaPierre
Cris LaPierre el 25 de Abr. de 2023
Please attach your file, or save your variables to a mat file and attach that to your post using the paperclip icon.
Jon
Jon el 25 de Abr. de 2023
Sure thing, I will attach it shortly after I post this.
This is a simplified version of my full table.
Basically the Last column is a copy of the first column but without the number attached to it (done in MATLAB usually).
The 2nd column is not imported (I put in an arbitrary Area).
Columns 3-6 are the ones in quesiton.
Here there are four columns, but what I want to do is have my code work for any number of columns.
So if N is the width of the table, I will want to analyze via groupsummary columns 3 through N-1 always based on the groupnames column.

Iniciar sesión para comentar.

Respuesta aceptada

Cris LaPierre
Cris LaPierre el 25 de Abr. de 2023
There is nothing wrong with your first code snippet. The error is simply telling you that your table does not have a variable (column) named '0' or '1'. Check that you have properly renamed the table variables.
Health = {'Fair';'Poor';'Excellent';'Good';'Fair'};
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T1 = table(Health,Height,Weight,BloodPressure,'VariableNames',{'groupnames','0','1','2'});
G = groupsummary(T1,'groupnames',"sum",{'0','1'})
G = 4×4 table
groupnames GroupCount sum_0 sum_1 _____________ __________ _____ _____ {'Excellent'} 1 64 131 {'Fair' } 2 135 295 {'Good' } 1 67 133 {'Poor' } 1 69 163
For your second question, the code you are proposing won't work well. You'd only ever end up with the final results, as each loop will overwrite the previous one. There is also no need to put this in a loop. Just use the colon operator and width as shown here:
G = groupsummary(T1,'groupnames',"sum",2:width(T1)-1)
G = 4×4 table
groupnames GroupCount sum_0 sum_1 _____________ __________ _____ _____ {'Excellent'} 1 64 131 {'Fair' } 2 135 295 {'Good' } 1 67 133 {'Poor' } 1 69 163
  4 comentarios
Jon
Jon el 25 de Abr. de 2023
Very nice, thanks. I'm assuming that 'PreserveVariableNames' function keeps the name of the headers regardless of the form? I think that was the issue I was running into.
Cris LaPierre
Cris LaPierre el 25 de Abr. de 2023
Editada: Cris LaPierre el 25 de Abr. de 2023
It looks like that is an undocumented syntax. Use this instead: "VariableNamingRule","preserve".
I also prefer strings over character arrays, so I would ultimately do this.
data = readtable('table.xlsx','ReadVariableNames',true,'VariableNamingRule','preserve',...
'TextType','string')
G1 = groupsummary(data,'groupnames',"sum",{'0','1'})
G2 = groupsummary(data,'groupnames',"sum",3:width(data)-1)

Iniciar sesión para comentar.

Más respuestas (0)

Productos


Versión

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by