Question on how to delete the data (outliers) in the boxplot

Luís Barbosa on 28 Aug 2020
Answered: Baldvin about 19 hours ago
Hello everyone.
I need to know how to eliminate the representation of outliers (> 95%) in a boxplot representation.
Any suggestion?
Thank you
4 comments
Adam Danz on 28 Aug 2020
Edited: Adam Danz on 28 Aug 2020
Nice! So, you wanted to remove the outlier markers completely.
Just be careful in how you interpret those plots. Outliers are often very informative, and it should be indicated somewhere that they were removed from the visualization.
Binbin Qi on 28 Aug 2020
I think you can use
rmoutliers(A,'percentiles',threshold)
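A minimal sketch of that suggestion, assuming the goal is to drop values above the 95th percentile before plotting. The sample vector A and the [0 95] threshold are illustrative and not from the original post; threshold is a two-element vector of lower and upper percentiles.

```matlab
rng default                      % for reproducibility
A = [randn(100,1); 8; 9];        % illustrative data with two extreme values
% Keep values between the 0th and 95th percentiles; TF flags the removed elements
[B, TF] = rmoutliers(A, 'percentiles', [0 95]);
fprintf('%d of %d values removed\n', nnz(TF), numel(A));
boxplot(B)                       % boxplot of the trimmed data
```

Note that this removes data rather than hiding markers, and boxplot recomputes the quartiles from B, so it may still flag some of the remaining points as outliers.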


Accepted Answer

Adam Danz on 5 Jan 2021
Edited: Adam Danz on 5 Jan 2021
> how to eliminate the representation of outliers in a boxplot representation.
Removing outliers from the raw data
By default, boxplot marks as outliers any data points that lie more than 1.5*IQR above the 75th percentile or below the 25th percentile (the edges of the box), where IQR is the interquartile range, computed by iqr().
If the goal is to remove outliers from the raw data based on this definition, you can replace their values with NaNs to preserve the size and shape of the variable using,
rng default % For reproducibility
x = [randn(25,4);rand(2,4)-6;rand(2,4)+6];
x = reshape(x(randperm(numel(x))),size(x)); % shuffles all elements of x; for demo purposes only
isout = isoutlier(x,'quartiles');
xClean = x;
xClean(isout) = NaN;
However, this won't necessarily remove all outlier markers from the plot: the quartiles and IQRs of the cleaned data have changed, so a point that was not an outlier before may be one now.
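To check for this effect, one can re-run the same quartile test on each cleaned column. This is only a sketch on the demo data above; nNew is a hypothetical name, and NaNs are dropped explicitly so the result does not depend on how isoutlier treats missing values.

```matlab
rng default % For reproducibility
x = [randn(25,4);rand(2,4)-6;rand(2,4)+6];
isout = isoutlier(x,'quartiles');
xClean = x;
xClean(isout) = NaN;
% Count points that are outliers relative to the recomputed quartiles
nNew = zeros(1,size(xClean,2));
for k = 1:size(xClean,2)
    col = xClean(~isnan(xClean(:,k)),k);      % drop the NaN placeholders
    nNew(k) = nnz(isoutlier(col,'quartiles'));
end
disp(nNew) % nonzero entries mean boxplot(xClean) would likely still draw markers
```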
Removing outlier markers from the boxplot
If the goal is to remove outlier markers from the plot, produce the boxplots using an empty outlier marker style as Luís Barbosa suggested above.
% Create data
rng default % For reproducibility
x = [randn(25,4);rand(2,4)-6;rand(2,4)+6];
x = reshape(x(randperm(numel(x))),size(x)); % shuffles all elements of x; for demo purposes only
% Plot with and without outlier markers
figure()
ax(1) = subplot(1,2,1);
boxplot(ax(1), x)
title(ax(1), 'With outlier markers')
grid(ax(1),'on')
ax(2) = subplot(1,2,2);
boxplot(ax(2), x, 'symbol', '')
title(ax(2), 'Without outlier markers')
grid(ax(2),'on')
However, removing outlier markers should usually be avoided and can be very deceptive. It's easy to view a figure at some point in the future and forget that outliers were removed. Outliers can be very informative and are often just as important as the median and IQR. Therefore, it should be indicated somewhere on the figure that they were removed from the visualization.
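One possible way to record the removal on the figure itself is sketched below. It draws the markers first, counts them, hides them, and writes the count into the title. It assumes boxplot tags its outlier lines as 'Outliers' and stores NaN for boxes without outliers; nHidden is a hypothetical name.

```matlab
rng default % For reproducibility
x = [randn(25,4);rand(2,4)-6;rand(2,4)+6];
figure()
ax = axes();
boxplot(ax, x)                        % draw with default outlier markers first
o = findobj(ax, 'Tag', 'Outliers');   % outlier line objects created by boxplot
% Count plotted outliers (boxes without outliers are assumed to hold NaN)
nHidden = 0;
for k = 1:numel(o)
    nHidden = nHidden + nnz(~isnan(o(k).YData));
end
set(o, 'Visible', 'off')              % hide the markers
title(ax, {'Without outlier markers', ...
    sprintf('(%d outlier markers hidden)', nHidden)})
```

Hiding the markers after plotting, rather than passing an empty 'symbol', keeps the count tied to exactly what boxplot would have drawn.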

More Answers (1)

Baldvin about 15 hours ago
As per boxplot's documentation, each aspect of a boxplot is tagged and we can find them with findobj:
ax=axes;
boxplot(ax,[-3,repelem(3:8,10),14]) % Should contain two outliers at y=-3 and y=14.
% Obtain graphics handle for outliers:
o=findobj(ax.Children,'Tag','Outliers');
% We can now change all aspects of the markers:
set(o,'Marker','h','MarkerSize',12)
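The same handle can also be used to read back which values were flagged, which is a less destructive alternative to hiding them. A short sketch, assuming the marker coordinates are stored in the line's YData; outlierValues is a hypothetical name.

```matlab
figure
ax = axes;
boxplot(ax, [-3, repelem(3:8,10), 14])
o = findobj(ax.Children, 'Tag', 'Outliers');
% YData holds the y-coordinates of the outlier markers, i.e. the flagged values
outlierValues = sort(o.YData);
disp(outlierValues) % should list the two outliers noted above, -3 and 14
```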
We could now make the outliers invisible or delete them from the plot, but I encourage folks to read Adam's post above!
% set(o,'Marker','none') % Misleading
% o.Visible = 'off' % Misleading
% delete(o) % Danger, danger!
Let me repeat Adam's words of wisdom that "... removing outlier markers should usually be avoided and can be very deceptive."
