Plot Mean over Box Charts using Positional and Color Grouping Variables

I'd like to plot the mean over each box in a box chart that uses positional and color grouping variables.
Example:
tbl = readtable('TemperatureData.csv');
monthOrder = {'January','February','March','April','May','June','July', ...
'August','September','October','November''December'};
tbl.Month = categorical(tbl.Month,monthOrder);
boxchart(tbl.Month,tbl.TemperatureF,'GroupByColor',tbl.Year)
ylabel('Temperature (F)')
legend
hold on
meanTemperatureF = groupsummary(tbl.TemperatureF,{tbl.Month, tbl.Year}, 'mean');
plot(meanTemperatureF,'o')
But the plotted means do not line up with the boxes.
How can I fix this?
dpb on 16 Nov 2021
Well, you've got twelve months but for some reason only 11 categories on the boxplot axis and there are only 10+7 actual boxes by the time you've done the grouping by year.
Then you just plotted the means for an indeterminate number of months versus their ordinal positions.
You need to find/create the correct categorical variable value for each of those means that is associated with the position of the appropriate box -- and then since you have used grouping, that may still not quite line up.
Never done such an exercise before and suspect highly unlikely anybody on Answers has, either; attach your data as a .mat file so somebody can reproduce the result you have and easily explore how to make the necessary adjustments to axis values.
"Help us help you!"


Accepted Answer

Dave B on 16 Nov 2021
Edited: Dave B on 16 Nov 2021
Boxchart makes this really difficult! The location of the categories is well defined, but the offset (while easy to calculate) isn't included anywhere. Fortunately, you can plot the numeric equivalent of a categorical, and it's easy to convert.
Some bits of your code didn't quite line up for me (e.g. how you're calling groupsummary), so I used a dataset I happened to have around with very similar data. I plotted the mean on each box; I'm not sure if you were instead thinking of one mean per month, which would be much easier!
Note that I used the more robust 'ruler2num' to convert month names to their numeric values, but in reality the locations are just the category number, so the month number.
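As a small illustration of that last point, here is a minimal sketch (the category list and values are made up for the example) showing that converting a categorical to double gives the category number, which is the position the note above refers to:

```matlab
% Hypothetical three-month category list, in calendar order
monthOrder = {'January','February','March'};
c = categorical({'March','January'}, monthOrder);

% double() on a categorical returns each element's category number
idx = double(c);   % March is category 3, January is category 1
```

The full example follows.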
tbl = readtable('natick weather 2003-2014.csv');
tbl.Year=tbl.DATE.Year;
tbl=tbl(ismember(tbl.Year,[2004,2008,2012]),:);
%monthOrder = {'January','February','March','April','May','June','July', ...
% 'August','September','October','November', 'December'};
% alternate move:
monthOrder = month(datetime(2010,1:12,1),'name');
tbl.Month = categorical(month(tbl.DATE,'name') ,monthOrder);
meantemp = groupsummary(tbl,{'Month' 'Year'},'mean','TMAX');
%%
bc=boxchart(tbl.Month,tbl.TMAX,'GroupByColor',tbl.DATE.Year);
ylabel('Temperature (F)')
legend
hold on
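xax = get(gca,'XAxis');   % ruler used to map categories to numeric positions
% One offset per color group, centered on zero
offset = (1:numel(bc))/numel(bc);
offset = offset - mean(offset);
for i = 1:numel(bc)
    % Rows of the summary table that belong to this box chart's year
    ind = string(meantemp.Year) == string(bc(i).DisplayName);
    % Numeric category position plus this group's offset
    x = ruler2num(meantemp.Month(ind),xax) + offset(i);
    y = meantemp.mean_TMAX(ind);
    % SeriesIndex matches the marker color to the corresponding box
    plot(x,y,'x','LineWidth',2,'DisplayName',"mean(" + bc(i).DisplayName + ")",'SeriesIndex',i)
end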
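To make the offset arithmetic above concrete, here is a minimal sketch with assumed numbers (three color groups, and February as the second category). It only traces the calculation used in the loop; it does not confirm how boxchart lays out boxes internally in every release.

```matlab
% Assumed for illustration: three GroupByColor levels
nGroups = 3;

% Same two steps as in the answer: evenly spaced, then centered on zero
offset = (1:nGroups)/nGroups;     % 1/3, 2/3, 1
offset = offset - mean(offset);   % -1/3, 0, 1/3

% Assumed category number (e.g. February is the 2nd category)
catIndex = 2;

% x positions at which the three group means would be plotted
xMeans = catIndex + offset;       % 5/3, 2, 7/3
```

With three groups, the middle group's mean sits exactly on the category position and the other two are shifted by one third of a category on either side.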
dpb on 19 Nov 2021
Thanks for the feedback...part of my purpose in Answers (besides being entertainment/stimulation after giving up the consulting gig) is that it gives the opportunity to raise these kinds of user pain points.
I know I tend to carp on a lot of details and may take a thread off on a side journey but I always try to make sure the OPs Q? is answered best as can on the way. :)
But, I think having these related types of similar cases raised hopefully will continue to raise the consciousness of the development team -- I know it probably isn't true, but it seems to me as a longer-time user a trend towards releasing features that are not yet really ready and that there is far less consistency across the base product and toolboxes than before. It seems as though there isn't an overall corporate-wide oversight that really enforces syntax rules/documentation to try to maintain that cohesive nature but that the various toolboxes are almost totally separate products.
I understand the difficulties; the shift from purely procedural coding style of the original MATLAB to object-oriented/class-based methods is a major dichotomy and schism to breach. I don't have the answer (so to speak?), but believe there needs to be more effort into the area during the initial design of new functions/features/toolboxes to try to minimize these differences going forward.

Release

R2021a
