aggregate data of a dataset

Accepted Answer

Sindar on 16 Feb 2020

0 votes

Check out splitapply. You may need to change the format of your data, but it does exactly what you want:
G = findgroups(ds.Seat);
mean_dist = splitapply(@mean,ds.score,G);
Switching to tables is probably a good idea:
ds = readtable("datasetT.csv");

17 comments

Megan on 16 Feb 2020
Is there not an aggregate function like in R?
Sindar on 16 Feb 2020
I'm not familiar with R, but (based on a little googling of R's aggregate function) it looks like splitapply does basically the same thing, just with a little less in the way of wrapping. Look at the documentation for examples.
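For readers coming from R: groupsummary (available for tables since R2018a) may feel closer to aggregate, because it returns a labeled table in a single call. A minimal sketch, assuming a table with columns named Seat and score as in this thread; the values are made up for illustration:

```matlab
% Made-up stand-in data; only the column names Seat and score come from the thread.
Seat  = [2; 2; 4; 4; 4];
score = [3; 5; 4; 5; 6];
ds = table(Seat, score);

% One row per distinct Seat, with the group size and the mean of score.
S = groupsummary(ds, 'Seat', 'mean', 'score');
disp(S)   % variables: Seat, GroupCount, mean_score
```

splitapply remains the more flexible choice when the function to apply is not one of the built-in summary methods.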
Megan on 16 Feb 2020
Thanks Sindar
Megan on 16 Feb 2020
@Sindar It doesn't work. I just get 2 rows with NaN :/
Megan on 16 Feb 2020
Okay, I used dataset2table. That worked out, and now I have a table, but splitapply still didn't work.
Do you know why? Now I know it's not because of the dataset type.
Sindar on 16 Feb 2020
Most likely, you have NaNs in your data. Sounds like you'll need to do some extra work (but this will help in the future). First, try using the import tool: https://www.mathworks.com/help/matlab/ref/importtool-app.html
This should allow you to figure out why readtable isn't working. Once everything looks good, you can generate code using the arrow just under "import selection".
Then, look here for how to handle the missing data that produced those NaNs. Some of it can be done during import, too. https://www.mathworks.com/help/matlab/data_analysis/missing-data-in-matlab.html
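If the NaN results come from NaN entries in the score column, one option that keeps every row is to have the mean skip them. A sketch with made-up values, assuming the columns are named Seat and score:

```matlab
% Made-up stand-in data with one NaN score in the first seat group.
Seat  = [2; 2; 2; 4; 4];
score = [3; NaN; 5; 4; 6];
ds = table(Seat, score);

G = findgroups(ds.Seat);

% A plain mean returns NaN for any group that contains a NaN.
plain_mean = splitapply(@mean, ds.score, G);

% Wrapping mean with the 'omitnan' flag ignores NaN entries within each group.
nan_safe = splitapply(@(x) mean(x, 'omitnan'), ds.score, G);
```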
Sindar on 16 Feb 2020
Try this to replace any missing values with 0:
fillmissing(ds,'constant',0)
Megan on 16 Feb 2020
Did you try it out with my table?
Megan on 16 Feb 2020
I didn't get your last comment :(
Megan on 16 Feb 2020
You can look at my table. I don't have missing values.
Sindar on 16 Feb 2020
Edited: Sindar on 16 Feb 2020
I hadn't tried before, but this works:
ds=readtable('datasetT.xlsx');
G = findgroups(ds.Seat);
mean_dist = splitapply(@mean,ds.score,G);
mean_dist =
3.4286
3.7576
There don't seem to be any missing values or issues with readtable
Megan on 16 Feb 2020
Oh okay I found it
Megan on 16 Feb 2020
Edited: Megan on 16 Feb 2020
I added a short version of my dataset here. In my original one I have NaN values... Sorry :/
Megan on 16 Feb 2020
fillmissing(ds,'constant',0)
This is not working.
Error using fillmissing/checkArrayType (line 522)
Invalid fill constant type.
Error in fillmissing/fillTableVar (line 166)
[intConstVj,extMethodVj] = checkArrayType(Avj,intMethod,intConstVj,extMethodVj,x,true);
Error in fillmissing/fillTable (line 144)
B.(vj) = fillTableVar(indVj,A.(vj),intMethod,intConst,extMethod,x,useJthFillConstant,useJthExtrapConstant);
Error in fillmissing (line 127)
B = fillTable(A,intM,intConstOrWinSize,extM,x,dataVars);
Sindar on 16 Feb 2020
Sorry, I haven't actually used fillmissing much, so I'm not sure what's up. Regardless, I realized removing rows with missing entries is probably better for your purpose:
ds=readtable('datasetT.xlsx');
clean_ds = rmmissing(ds);
G = findgroups(clean_ds.Seat);
mean_dist = splitapply(@mean,clean_ds.score,G);
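On the fillmissing error above: a likely cause (an assumption, since the full table is not shown here) is that the table contains non-numeric variables, for which the numeric constant 0 is not a valid fill value. Restricting the fill to numeric variables with the 'DataVariables' option is one way around that. A sketch with made-up data; the Name column is hypothetical:

```matlab
% Made-up stand-in data; Name is a hypothetical text column, Seat and score follow the thread.
Name  = {'a'; 'b'; 'c'; 'd'};
Seat  = [2; 2; 4; 4];
score = [3; NaN; 4; 6];
ds = table(Name, Seat, score);

% Fill only the numeric variables; text variables are left untouched.
filled = fillmissing(ds, 'constant', 0, 'DataVariables', @isnumeric);
```

Note that filling with 0 pulls the group means down, which is one reason removing the incomplete rows is usually the better fit for this question.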
Megan on 16 Feb 2020
That worked out well. Thanks!!!
One last question: now I have two rows with mean values.
How can I know which row is which seat number?
Sindar on 16 Feb 2020
Look at the second output from findgroups:
[G,G_seat] = findgroups(clean_ds.Seat);
At the end, you can make a summary table:
sum_table = table(G_seat,mean_dist)
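Putting the pieces of this thread together, a self-contained sketch might look like the following. The data are made up, and the Name column is hypothetical; it is there only to show that rmmissing(ds) without options drops a row when any variable is missing, whereas the 'DataVariables' option limits the check to the columns that matter:

```matlab
% Made-up stand-in data; row 2 has a missing Name, row 4 has a missing score.
Name  = {'a'; ''; 'c'; 'd'; 'e'};
Seat  = [2; 2; 4; 4; 4];
score = [3; 5; 4; NaN; 6];
ds = table(Name, Seat, score);

% Remove only the rows where Seat or score is missing (row 4 here).
clean_ds = rmmissing(ds, 'DataVariables', {'Seat', 'score'});

% G holds the group number of each row; G_seat holds the seat value of each group.
[G, G_seat] = findgroups(clean_ds.Seat);
mean_dist = splitapply(@mean, clean_ds.score, G);

% Row k of sum_table pairs the k-th seat value with its mean score.
sum_table = table(G_seat, mean_dist)
```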


More Answers (0)

Asked: 16 Feb 2020

Edited: 19 Feb 2020
