alternative rounding method when using histcounts
My understanding is that when using histcounts, MATLAB uses the "round up" method when a value lies exactly on a bin edge.
Is it possible to implement a different rounding strategy, e.g. banker's rounding/Gaussian rounding? For example, 4.5 would round to 4, not to 5.
It doesn't appear to be an optional parameter I can pass to the histcounts function.
Guillaume on 14 Apr 2020
"histcounts makes a rounding decision when a value is exactly on a bin edge."
No, you misunderstand how histcounts works. As I said, it doesn't do any rounding.
As documented, the value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1), except for the last bin, where the right edge is also part of the bin. So, yes, if you have an edge at 2.5, the value 2.5 will be part of the bin that starts at 2.5; no rounding is involved.
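A minimal check of that rule, using a made-up vector whose values all lie exactly on edges (x, edges and counts here are illustrative names, not from the question):

```matlab
% Illustrative values that sit exactly on bin edges
x = [2, 2.5, 3];
edges = [2, 2.5, 3];            % two bins: [2, 2.5) and [2.5, 3]
counts = histcounts(x, edges);  % returns [1 2]
% 2   -> first bin (a bin includes its left edge)
% 2.5 -> second bin (it is the left edge of the second bin, not the right edge of the first)
% 3   -> second bin (the last bin also includes its right edge)
```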
Now, I thought there was a way to reverse this so that it's the right edge that is included in bin k instead of the left edge, but I was surprised to find that this option ('IncludedEdge') is only available for discretize, which is very similar in some ways.
So, if you want 2.5 to be included in the [2, 2.5] bin, you have two options:
1. Change your bin definition so that the right edge is not 2.5 but the next representable number up, which is 2.5 + eps(2.5):
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
edges(2:end-1) = edges(2:end-1) + eps(edges(2:end-1)); % move each interior edge (the right edge of every bin but the last) up to the next representable double
counts = histcounts(yourvector, edges); % use histcounts as normal
2. Do an indirect trip through discretize:
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
bin = discretize(yourvector, edges, 'IncludedEdge', 'right'); % bin index of each value (NaN if outside the edges)
newedges = 1:numel(edges); % integer edges, so that bin index k falls in the kth histcounts bin
result = histcounts(bin, newedges); % add your usual histcounts options; works as long as 'Normalization' doesn't rely on bin width (i.e. not 'pdf' or 'countdensity')
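To see that the two options give the same counts, here is a small comparison on an illustrative vector x that stands in for yourvector (the data are made up, and several values deliberately sit on edges; shifted, counts0, counts1 and counts2 are also just illustrative names):

```matlab
x = [0, 0.5, 1, 2.5, 2.7, 3];           % illustrative data, several values on edges
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];

% Default histcounts, for comparison (each bin includes its left edge)
counts0 = histcounts(x, edges);          % [1 1 1 0 0 3]

% Option 1: nudge the interior edges up to the next representable double
shifted = edges;
shifted(2:end-1) = shifted(2:end-1) + eps(shifted(2:end-1));
counts1 = histcounts(x, shifted);        % [2 1 0 0 1 2]

% Option 2: right-inclusive bin indices from discretize, then count the indices
bin = discretize(x, edges, 'IncludedEdge', 'right');
counts2 = histcounts(bin, 1:numel(edges));  % [2 1 0 0 1 2]
```

The default counts0 puts 0.5, 1 and 2.5 in the bin to their right, while both workarounds put them in the bin to their left, which is the behaviour described in this answer.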
Steven Lord on 14 Apr 2020
If a value in your data exactly matches one of the elements of the edges vector, that value is counted in the bin to the right of that edge, i.e. the bin for which it is the left edge (unless it matches the last element of the edges vector, in which case it's in the last bin). From the histcounts documentation page:
"[N,edges] = histcounts(X,edges) sorts X into bins with the bin edges specified by the vector, edges. The value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1). The last bin also includes the right bin edge, so that it contains X(i) if edges(end-1) ≤ X(i) ≤ edges(end)."
Bins other than the last contain their left edge but not their right, and the last bin contains both edges.
There's no option to change which edge each bin contains (to make the first bin contain both its edges and make all others contain their right edge but not their left.) The discretize function has an option that does this, so asking for a similar option in histcounts and related functions seems to me like a reasonable enhancement request for you to file with Technical Support.
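Until such an option exists, one more workaround (not from either answer, so treat it as a sketch) is to mirror the problem: negate the data and the edges, call histcounts, and flip the result. Negating a double is exact, so a value on an edge stays exactly on the corresponding mirrored edge, and the left-inclusive rule applied to the mirrored data becomes the right-inclusive rule described above, with the first bin containing both its edges. The names x, countsRight, countsCheck and countsLeft are illustrative.

```matlab
x = [0.5, 1, 1.2, 2.5, 3];               % illustrative data, several values on edges
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];

% Mirrored bin [-edges(k+1), -edges(k)) of -x corresponds to (edges(k), edges(k+1)] of x,
% and the closed last bin of the mirrored histogram becomes the closed first bin here.
countsRight = fliplr(histcounts(-x, -fliplr(edges)));   % [1 1 1 0 1 1]

% Cross-check with the discretize route from the first answer
bin = discretize(x, edges, 'IncludedEdge', 'right');
countsCheck = histcounts(bin, 1:numel(edges));          % [1 1 1 0 1 1]

% Default behaviour, for contrast (each bin includes its left edge)
countsLeft = histcounts(x, edges);                      % [0 1 2 0 0 2]
```

Because the mirrored bins keep their original widths, non-cumulative normalizations such as 'probability', 'pdf' and 'countdensity' should carry over after the fliplr. For cumulative ones ('cumcount', 'cdf') the running sum would go the wrong way, so it is safer to compute plain counts and accumulate them after flipping.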