Tiling stacks of boxplots. Each stack contains 5 boxplots

George
George on 30 Nov 2024 at 16:21
Commented: dpb on 6 Dec 2024 at 17:19
I have generated five boxplots stacked side by side using subplot(1,5,1) ... subplot(1,5,5). I'm attaching a figure for easy reference. I'd like to tile a second row of five similar subplots under this one. I've tried various solutions to no avail. The closest to my requirements is the boxplotGroup function, but I'm still unable to get what I want.
I'd be grateful for any help.
  1 comment
George
George on 1 Dec 2024 at 21:40
Moved: dpb on 1 Dec 2024 at 22:37
I'm attaching the Excel file with the data as per your request.
For reference: headings starting with g1 refer to "labile", g2 refer to "stable" (columns 1 to 10). Correspondingly: g1c for "labile" and g2c for "stable" (columns 11-20). Data in columns 1-10 come from one experiment and for columns 11-20 from a second one.
Again, thank you for your patience and suggestions.


Accepted Answer

dpb
dpb on 2 Dec 2024 at 0:11
Edited: dpb on 2 Dec 2024 at 17:23
This duplicates the prior work with a table instead; the previous still is correct for the one vector case, but having the data as the struct allows one the flexibility to get ahold of the field names and then move the metadata out of the variable names and into the data where it belongs...
S=load('datasets'); % read as a struct; can handle variable names that way programmatically
N=max(structfun(@numel,S)); % find longest vector
S=structfun(@(v)[v;nan(N-numel(v),1)],S,'uni',0); % and pad to that size
experiment=contains(fieldnames(S),'c_')+1; % set the experiment number from "c" id in name
experiment=cell2mat(arrayfun(@(e)repmat(e,N,1),experiment,'uni',0));
type=startsWith(fieldnames(S),'g2'); % and the cell type
type=cell2mat(arrayfun(@(t)repmat(t,N,1),type,'uni',0));
type=categorical(type,unique(type),{'labile','stable'});
class=extractAfter(fieldnames(S),'_'); % and the classification
class=arrayfun(@(c)repmat(c,N,1),class,'uni',0);
class=categorical(cat(1,class{:}));
observation=cell2mat(cellfun(@(f)S.(f),fieldnames(S),'uni',0));
tData=table(experiment,type,class,observation); % and turn into a table
head(tData)
    experiment     type     class     observation
    __________    ______    ______    ___________

        1         labile    acidic      11.597
        1         labile    acidic       11.94
        1         labile    acidic      11.028
        1         labile    acidic      11.753
        1         labile    acidic      11.864
        1         labile    acidic      12.079
        1         labile    acidic       11.26
        1         labile    acidic      12.181
groupsummary(tData,{'experiment','type','class'},'all')
ans = 20x16 table
experiment  type    class      GroupCount  mean_observation  sum_observation  min_observation  max_observation  range_observation  median_observation  mode_observation  var_observation  std_observation  nummissing_observation  nnz_observation  numunique_observation
__________  ______  _________  __________  ________________  _______________  _______________  _______________  _________________  __________________  ________________  _______________  _______________  ______________________  _______________  _____________________
1           labile  acidic     245         11.954            2881             8.4507           14.444           5.9937             12.024              11.824            0.72844          0.85349          4                       241              198
1           labile  aliphatic  245         28.582            6888.3           25.2             33.712           8.5121             28.571              26.052            3.3274           1.8241           4                       241              214
1           labile  charged    245         26.751            6447.1           22.177           31.434           9.2576             26.679              26.253            1.981            1.4075           4                       241              206
1           labile  npolar     245         54.873            13224            51.073           59.557           8.4844             54.902              55.4              1.6953           1.302            4                       241              203
1           labile  polar      245         45.122            10874            40.443           48.927           8.4844             45.098              44.6              1.7028           1.3049           4                       241              203
1           stable  acidic     245         11.475            2811.5           8.2              14.112           5.9119             11.569              11.776            1.0054           1.0027           0                       245              208
1           stable  aliphatic  245         29.657            7265.9           25.528           34.898           9.3701             29.762              27.6              2.9134           1.7069           0                       245              226
1           stable  charged    245         26.079            6389.3           21.792           29.803           8.0103             26.052              25.149            2.3286           1.526            0                       245              220
1           stable  npolar     245         55.621            13627            51.196           60.196           8.9999             55.666              55.709            3.1817           1.7837           0                       245              227
1           stable  polar      245         44.379            10873            39.804           48.804           8.9999             44.334              44.291            3.1829           1.7841           0                       245              227
2           labile  acidic     245         11.78             1354.6           8.4507           13.878           5.4276             11.753              12.5              0.70648          0.84052          130                     115              110
2           labile  aliphatic  245         28.717            3302.5           25.421           33.531           8.11               28.63               27.921            2.868            1.6935           130                     115              109
2           labile  charged    245         26.503            3047.9           22.485           30.303           7.8178             26.471              25.941            1.972            1.4043           130                     115              111
2           labile  npolar     245         54.979            6322.6           52.523           59.557           7.034              54.91               54.028            1.7804           1.3343           130                     115              111
2           labile  polar      245         45.021            5177.4           40.443           47.477           7.034              45.09               43.843            1.7804           1.3343           130                     115              111
2           stable  acidic     245         11.292            1795.4           8.2              14.111           5.9115             11.446              9.5918            1.023            1.0115           86                      159              143
(display truncated after 16 of the 20 rows)
Removing metadata from variable names and converting to a table makes further analyses much simpler and also makes the data easier to present...
As for the boxplots, within the vectors, the previous code would work just fine; with the above table, varfun with the grouping variables would work as well.
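To illustrate the varfun remark, here is a minimal sketch of a grouped summary on a long-format table. The table tDemo and its values are made up for the example (it only mimics the layout of tData), and the median is just one possible function to apply.

```matlab
% Hypothetical long-format table standing in for tData (values are made up)
rng(0)                                            % reproducible dummy data
experiment  = repelem([1;2],20);                  % 2 experiments, 20 rows each
type        = categorical(repmat(repelem(["labile";"stable"],10),2,1));
observation = 10 + randn(40,1);
observation(5) = NaN;                             % mimic one padded/missing value
tDemo = table(experiment,type,observation);

% Median per experiment/type group, ignoring the NaN padding
medFun = @(x) median(x,'omitnan');
tMed = varfun(medFun,tDemo,'InputVariables','observation', ...
              'GroupingVariables',{'experiment','type'});
disp(tMed)
```

The result has one row per experiment/type combination, with the group size in GroupCount and the statistic in Fun_observation.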
For the prior result, then
j=0;
for e=unique(tData.experiment).'
    ix=tData.experiment==e;
    for c=categories(tData.class).'
        j=j+1;
        hAx=subplot(2,5,j);
        iy=ix & tData.class==c;
        boxplot(tData.observation(iy),tData.type(iy))
        hAx.XAxis.TickLabelRotation=0;
    end
end
looks about right with the same issue that the online platform does something funky with the first axes on each row.
Again, the above makes the previous presumption that you wanted all of them in one figure...
ADDENDUM
Nota Bene: the orientation of the vectors in the for...end loops; MATLAB iterates over the items in the list by column, so must ensure those are row vectors--hence the transpose.
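A small sketch of the orientation point, using a made-up three-element vector: looping over a column vector runs the body once with the whole column, while looping over its transpose runs it once per element.

```matlab
v = [1;2;3];            % column vector (made-up values)
nCol = 0;
for k = v               % body runs once; k is the whole 3x1 column
    nCol = nCol + 1;
end
nRow = 0;
for k = v.'             % body runs three times; k is a scalar on each pass
    nRow = nRow + 1;
end
fprintf('column: %d pass, row: %d passes\n',nCol,nRow)   % column: 1 pass, row: 3 passes
```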
ADDENDUM SECOND
"the previous still is correct for the one vector case,"
Nota Bene: To use the observation variable as the vector, remember that it is now padded to full length, so the indexing runs over N elements per group, not over the variable lengths used in the prior examples. Alternatively, pull the data from the struct without padding to a common length; the prior logic will then work as given, provided you compute the length vector L from the actual data instead of making up random lengths as I did in the example...
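One possible way to get the unpadded form is sketched below. The struct here is hypothetical (two short made-up vectors with field names in the style of the thread); with the real data, S would come from load('datasets') before any padding.

```matlab
% Hypothetical struct; field names mimic the thread, values are made up
S = struct('g1_acidic',[11.6;11.9;11.0],'g2_acidic',[11.5;11.2]);
v = struct2cell(S);                    % one cell per field, original lengths kept
L = cellfun(@numel,v);                 % actual length of each vector, here [3;2]
y = vertcat(v{:});                     % concatenated data, numel(y) == sum(L)
g = repelem((1:numel(v)).',L);         % group number for every element of y
g = categorical(g,1:numel(v),fieldnames(S));   % label groups with the field names
```

With y and g built this way, boxplot(y,g) or the length-based indexing from the earlier example should apply without any NaN padding.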
ADDENDUM THIRD
Forcibly setting the XAxis.TickLabelRotation property back to 0 fixes the issue with the first axes on the two rows.
  5 comments
George
George on 6 Dec 2024 at 17:11
The subplots should look like:
g1_genLen/g2_genLen g1_intrLen/g2_intrLen g1_numExon/g2_numExon
g1c_genLen/g2c_genLen g1c_intrLen/g2c_intrLen g1c_numExon/g2c_numExon
dpb
dpb on 6 Dec 2024 at 17:19
Well, then you've got to generalize the tiling shape based on the size of the input data, as I had originally done, instead of using a fixed 2x5 layout. Nothing in the requirements suggested there would be anything but 20 sets, so I reverted to the hardcoded arrangement.
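A minimal sketch of deriving the layout from the data rather than hardcoding it. Everything in tDemo is made up: it assumes two experiments and the three class names from the comment above, and it uses boxchart (base MATLAB, R2020a or later) so the sketch does not depend on the Statistics Toolbox; boxplot could be swapped back in.

```matlab
rng(1)                                            % reproducible made-up data
classNames = ["genLen","intrLen","numExon"];      % class names from the comment above
[eG,cG,tG] = ndgrid(1:2,1:3,1:2);                 % every experiment/class/type combination
nRep = 30;                                        % observations per combination (arbitrary)
experiment  = repelem(eG(:),nRep);
class       = categorical(repelem(classNames(cG(:)).',nRep));
type        = categorical(repelem(tG(:),nRep),1:2,{'labile','stable'});
observation = randn(numel(experiment),1);
tDemo = table(experiment,type,class,observation);

exps    = unique(tDemo.experiment).';             % row vectors so the loops iterate per item
classes = categories(tDemo.class).';
nRow = numel(exps);                               % one row of axes per experiment
nCol = numel(classes);                            % one column of axes per class
fig = figure('Visible','off');                    % drop 'Visible','off' to see the plot
j = 0;
for e = exps
    for c = classes
        j = j + 1;
        ax = subplot(nRow,nCol,j);
        iy = tDemo.experiment==e & tDemo.class==c{1};
        boxchart(ax,tDemo.type(iy),tDemo.observation(iy))
        title(ax,sprintf('exp %d: %s',e,c{1}))
    end
end
```

Here the grid comes out as 2x3; with the original twenty variables the same code would give 2x5.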


More Answers (3)

the cyclist
the cyclist on 30 Nov 2024 at 17:25
Would subplot(2,5,1) ... subplot(2,5,10) do what you want?
Also, I would highly recommend using tiledlayout over subplot. It takes a bit of getting used to, if you have been using subplot for a while, but in the long run it is much better.
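For comparison with subplot, a rough sketch of the same 2x5 arrangement with tiledlayout (R2019b or later). The data are random placeholders and boxchart (R2020a or later) stands in for boxplot; the spacing options and labels are only illustrative choices.

```matlab
rng(2)                                   % reproducible made-up data
y = randn(50,10) + (1:10);               % one column per tile
fig = figure('Visible','off');           % drop 'Visible','off' to see the plot
tl = tiledlayout(fig,2,5,'TileSpacing','compact','Padding','compact');
for k = 1:10
    ax = nexttile(tl);                   % tiles fill row by row, like subplot(2,5,k)
    boxchart(ax,y(:,k))
    title(ax,sprintf('set %d',k))
end
title(tl,'Two rows of five box charts')  % one title shared by the whole layout
ylabel(tl,'value')                       % one y label shared by all tiles
```

The shared title and axis label, plus the spacing control, are the kind of features that take extra work with subplot.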

dpb
dpb on 30 Nov 2024 at 17:38
Edited: dpb on 30 Nov 2024 at 20:10
That would just double the number of rows in the subplot (or tiledlayout) arrangement...
y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size
g={'labile','stable'};
M=2;
W=width(y);
N=W/numel(g)/M;
for i=1:2:W
    j=floor(i/2)+1;
    subplot(M,N,j)
    boxplot(y(:,i:i+1),g)
end
The funky label orientation only shows up on this platform, the axes are all consistent on desktop...
Using tiledlayout instead of subplot would give you some additional features, and boxchart might be worth looking into...
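As one illustration of what boxchart offers, the sketch below puts all five classes in a single axes and separates labile from stable by color with the 'GroupByColor' option (boxchart requires R2020a or later). The class names follow the thread; the observations are made up.

```matlab
rng(3)                                            % reproducible made-up data
classNames = ["acidic","aliphatic","charged","npolar","polar"];
n = 40;                                           % observations per class/type pair (arbitrary)
class = categorical(repelem(classNames.',2*n));
type  = categorical(repmat(repelem(["labile";"stable"],n),5,1));
observation = randn(10*n,1) + double(class);      % shift each class so the boxes differ
fig = figure('Visible','off');                    % drop 'Visible','off' to see the plot
b = boxchart(class,observation,'GroupByColor',type);
legend('Location','best')
ylabel('observation')
```

This gives paired boxes per class in one axes, which may or may not suit the presentation better than one axes per pair.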
If you're looking for more sophisticated look, you could probably put each pair in a panel that would separate the two visually...I've never messed with them, so will leave as "exercise for Student"...
  8 comments
dpb
dpb on 1 Dec 2024 at 20:39
Edited: dpb on 1 Dec 2024 at 20:40
Without the actual data file it's not possible to diagnose what you did wrong, but the example code I posted works provided the data are defined as a column vector; if you convert to array format including the NaN padding, then you obviously can't index into it as if it were a 1D vector without padding.
Look at the examples above more closely; look specifically at size(y) in
L=randi([100 250],20,1); % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1); % means
SD=randi([ 3 20],20,1); % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
...
you'll find that out...well, let's just go for it...
whos g y
  Name         Size            Bytes  Class          Attributes

  g         3290x1              3556  categorical
  y         3290x1             26320  double
You notice these are 1D vectors with subsections of the arbitrary length of the various pieces-parts that were defined by L
N.'
ans = 1×10
442 368 326 252 314 249 366 379 328 266
L.'
ans = 1×20
195 247 198 170 140 186 124 128 112 202 140 109 248 118 179 200 101 227 156 110
sum(L)
ans = 3290
and you will note that sum(L) == numel(y). Ergo, linear indexing into the vector by the length given by N is the correct indexing in that case; it would NOT be so if had a full-length, augmented array in which each N would be the same and equal to the max(L).
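A short sketch of that indexing, assuming made-up segment lengths: with the pieces concatenated end to end, the start and end of each piece follow from a cumulative sum of the lengths.

```matlab
L  = [3;5;2;4];              % made-up segment lengths
y  = (1:sum(L)).';           % concatenated data vector, numel(y) == sum(L)
i2 = cumsum(L);              % last index of each segment
i1 = i2 - L + 1;             % first index of each segment
seg3 = y(i1(3):i2(3));       % third segment is elements 9 and 10
disp(seg3.')
```

In a padded array every segment would have the same length max(L), so these indices would no longer line up with the data.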
As for the figures, of course if you create a new figure you get two figures, not multiple subplots in one. Also, if you tell subplot() to divide the figure into two rows of N axes each, each axes will be roughly half as tall as when you put only one row of axes in the figure. Read the doc and look at the examples for subplot to see what it does.
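The height difference can be checked directly from the axes Position property; this is only a quick sketch on an empty, invisible figure, and the exact values depend on the default margins.

```matlab
fig = figure('Visible','off');
ax1 = subplot(1,5,1);
h1  = ax1.Position(4);       % normalized axes height with one row of axes
clf(fig)                     % start over in the same figure
ax2 = subplot(2,5,1);
h2  = ax2.Position(4);       % normalized axes height with two rows of axes
fprintf('height ratio (two rows / one row): %.2f\n',h2/h1)
```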
It all depends upon what your output format is desired to be -- do you want all 10 in one figure as we've been presuming from the wording of the initial question or as two separate figures with only five on each?
Again, if you still can't figure out what is different in what I've showed you; then attach the data file; making up data to try to match is, as always fraught with misunderstandings when the actual starting point isn't the same and we presume the poster can relate the examples to their situation.
dpb
dpb on 1 Dec 2024 at 20:57
T = readtable('test.xlsx'); % imports table with NaN for missing values
T = table2array(T); % size(T) = 247 20
M=2; W = width(T);
M=2; W=width(y); N=W/numel(g)/M;
for i=1:2:W
    j=floor(i/2)+1;
    subplot(M,N,j)
    boxplot(y(:,[i:i+1]),g)
end
In the above code snippet, g and y are undefined; they did NOT come from reading the test.xlsx file with the preceding code. Ergo, there's no telling what they really were, so it is not surprising that the result doesn't match expectations.
Only if we know the content of the data file itself can we make any judgements on how it should be treated and, "burned once, twice't shy", I'm not going to make any assumptions this time about what it actually looks like. Attach the file...



dpb
dpb on 1 Dec 2024 at 22:42
Edited: dpb on 1 Dec 2024 at 22:52
tT=readtable('test.xlsx');
whos tT
  Name        Size             Bytes  Class    Attributes

  tT        247x20             45459  table
[head(tT,4); tail(tT,4)]
ans = 8x20 table
Columns 1 through 10
g1_aliphatic  g2_aliphatic  g1_acidic     g2_acidic     g1_charged    g2_charged    g1_polar      g2_polar      g1_npolar     g2_npolar
____________  ____________  __________    __________    __________    __________    __________    __________    __________    __________
2.7186e+05    3.1731e+05    1.1597e+05    1.1731e+05    2.7376e+05    2.8654e+05    4.7719e+05    4.4231e+05    5.2281e+05    5.5769e+05
2.8358e+05    2.6946e+05    1.194e+05     1.1776e+05    2.7052e+05    2.6148e+05    4.6455e+05    4.6108e+05    5.3545e+05    5.3892e+05
2.5421e+05    2.8911e+05    1.1028e+05    1.1683e+05    2.4673e+05    2.5545e+05    4.7477e+05    4.4554e+05    5.2523e+05    5.5446e+05
2.8486e+05    3.0691e+05    1.1753e+05    1.0976e+05    2.7092e+05    2.8658e+05    4.4821e+05    4.7968e+05    5.5179e+05    5.2032e+05
NaN           2.9231e+05    NaN           1.1429e+05    NaN           2.4396e+05    NaN           4.5934e+05    NaN           5.4066e+05
NaN           3.0485e+05    NaN           1.165e+05     NaN           2.6019e+05    NaN           4.1165e+05    NaN           5.8835e+05
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN

Columns 11 through 20
g1c_aliphatic  g2c_aliphatic  g1c_acidic     g2c_acidic     g1c_charged    g2c_charged    g1c_polar      g2c_polar      g1c_npolar     g2c_npolar
_____________  _____________  __________     __________     ___________    ___________    __________     __________     __________     __________
2.5421e+05     3.1731e+05     1.1028e+05     1.1731e+05     2.4673e+05     2.8654e+05     4.7477e+05     4.4231e+05     5.2523e+05     5.5769e+05
2.8486e+05     2.8911e+05     1.1753e+05     1.1683e+05     2.7092e+05     2.5545e+05     4.4821e+05     4.4554e+05     5.5179e+05     5.5446e+05
2.58e+05       3.0691e+05     1.1864e+05     1.0976e+05     2.7119e+05     2.8658e+05     4.7269e+05     4.7968e+05     5.2731e+05     5.2032e+05
2.5545e+05     3.3676e+05     1.2079e+05     92402          2.6733e+05     2.2587e+05     4.6733e+05     4.271e+05      5.3267e+05     5.729e+05
NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN
NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN
NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN
NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN            NaN
sum(~isfinite(tT{:,:}))
ans = 1×20
6 2 6 2 6 2 6 2 6 2 132 88 132 88 132 88 132 88 132 88
Those data look nothing like the prior examples -- it's clear why the numbers were large now; that's what's in the file.
How were the percentage numbers shown before generated?
  1 comment
George
George on 1 Dec 2024 at 23:52
It seems that Excel did one of its tricks and I was fool enough not to check. For some reason Excel has problems with decimals, etc. Anyway, it's not a cheap excuse and I'm sorry about this. I'm attaching a MATLAB file containing the data of the two datasets as individual vectors (datasets.mat). Thanks again.

