I don't understand the behavior of discretize
Vittorio Picco
on 1 Sep 2022
Commented: Vittorio Picco
on 1 Sep 2022
I can't understand what discretize does.
Example 1:
[a1, e1] = discretize(1:100,2);
I expect this to create 2 uniform bins, so the edges would be 0, 50, 100. Because the default rule for filling the bins uses < instead of <=, I get the first 49 points in bin 1 and the last 51 points in bin 2. That's what I see in a1 and e1, and it makes sense to me. (Although I understand the logic, uniform bins to me would mean that both bins should have 50 elements, but OK.)
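The fill rule described in Example 1 can be checked with explicit edges. This is a minimal sketch and not part of the original post; the variable names are illustrative. It assumes the documented default, in which each bin includes its left edge and only the last bin also includes its right edge, and it uses the 'IncludedEdge' option of discretize to flip that rule.

```matlab
x = 1:100;
edges = [0 50 100];

% Default rule: bin k holds edges(k) <= x < edges(k+1);
% the last bin also includes its right edge.
binsLeft = discretize(x, edges);
countsLeft = accumarray(binsLeft(:), 1).'     % 49 and 51 points

% 'IncludedEdge','right': bin k holds edges(k) < x <= edges(k+1);
% the first bin also includes its left edge.
binsRight = discretize(x, edges, 'IncludedEdge', 'right');
countsRight = accumarray(binsRight(:), 1).'   % 50 and 50 points
```

With the right edge included, the value 50 moves from bin 2 to bin 1, which is the 50/50 split the parenthetical remark asks for. The bin widths are the same in both cases.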
Example 2:
[a2, e2] = discretize(1:101,2);
The edges returned are 0, 60, 120. The first 59 points end up in bin 1, the remaining 42 in bin 2. This makes no sense to me. The calculated edges make no sense, and the output is clearly bins of non-uniform width. The same output is returned in R2021a.
I must have some fundamental misunderstanding of what is happening.
Accepted Answer
Walter Roberson
on 1 Sep 2022
discretize() invokes matlab.internal.math.binpicker()
... which places the bins at "nice" locations, involving multiples of 10.
This is not documented.
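If the "nice" edge placement is not wanted, one option is to pass explicit edges instead of a bin count. The following is a minimal sketch under that assumption, not a description of what binpicker does internally; nBins and the other variable names are illustrative.

```matlab
x = 1:101;
nBins = 2;

% Uniform-width edges spanning exactly the data range: [1 51 101]
edges = linspace(min(x), max(x), nBins + 1);

bins = discretize(x, edges);
counts = accumarray(bins(:), 1).'   % 50 and 51 points
```

The bins are still defined by width, so the counts only come out nearly equal here because the data are evenly spaced.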
More Answers (2)
Bruno Luong
on 1 Sep 2022
Edited: Bruno Luong
on 1 Sep 2022
The doc says:
"discretize divides the data into N bins of uniform width, choosing the bin edges to be "nice" numbers that overlap the range of the data."
Good luck finding an exact specification of "nice". I guess the purpose is that when you bin the data and then plot it with bar, the bars line up with the round numbers and tick marks on the x-axis.
Steven Lord
on 1 Sep 2022
"The calculated edges make no sense, and the output is clearly bins of non-uniform width."
No, in that example the bins are uniformly 60 units wide. Non-uniform bins would be a case like the following:
h = histogram(1:101, [0 50 101]);
h.BinWidth
E = h.BinEdges
theBinWidths = diff(E) % Different widths
It seems that your expectation of what is "uniform" is related to the number of points in the bin, and that is not the definition of "uniform" used by histogram, histcounts, or discretize. Their definition of "uniform" uses the distance between edges.
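The difference between the two meanings of "uniform" can be seen by computing both quantities for the data in the question. This is a minimal sketch with illustrative variable names; the specific edges mentioned in the comments are the ones reported in the question, not values guaranteed here.

```matlab
x = 1:101;
[bins, edges] = discretize(x, 2);

% Distance between consecutive edges: equal for uniform-width bins
% (the question reports edges 0, 60, 120, so widths of 60 and 60).
widths = diff(edges)

% Number of points that fall in each bin: not required to be equal
% (the question reports 59 and 42).
counts = accumarray(bins(:), 1).'
```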
By your definition of "uniform" you could easily encounter a situation where it's impossible to create uniform bins. The obvious case is where the number of bins does not divide the number of points (for example dividing 101 points between 2 bins) but another simple case involves binning 4 points into 2 uniform bins.
[counts, edges] = histcounts([1 1 1 2], 2)
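For comparison, binning by equal count rather than equal width can be sketched by ranking the data. This is an illustrative assumption about what "equal count" might mean, not something discretize, histcounts, or histogram does; the helper name equalCountBins is hypothetical.

```matlab
% Hypothetical helper: assign bins by rank so that counts are as
% equal as possible. sort is stable, so ties keep their input order.
equalCountBins = @(x, nBins) ...
    ceil(subsasgn(zeros(size(x)), ...
         substruct('()', {sortIndex(x)}), 1:numel(x)) * nBins / numel(x));

b1 = equalCountBins(1:101, 2);
counts1 = accumarray(b1(:), 1).'   % 50 and 51: cannot be made equal

b2 = equalCountBins([1 1 1 2], 2)  % [1 1 2 2]: one of the 1s lands in bin 2

% Local helper returning the permutation produced by sort.
function idx = sortIndex(x)
    [~, idx] = sort(x);
end
```

In a script file the local function must appear at the end, as shown. The second call illustrates the problem described above: equal counts force identical values into different bins.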