I don't understand the behavior of discretize

Vittorio Picco on 1 Sep 2022
Commented: Vittorio Picco on 1 Sep 2022
I can't understand what discretize does.
Example 1:
[a1, e1] = discretize(1:100,2);
I expect this to create 2 uniform bins, so the edges would be 0, 50, 100. Because the default rule for filling the bins uses < rather than <= on the right edge, the first 49 points go into bin 1 and the last 51 points into bin 2. That's what I see in a1 and e1, and it makes sense to me. (Although I understand the logic, uniform bins to me would mean that both bins have 50 elements, but OK.)
Example 2:
[a2, e2] = discretize(1:101,2);
The edges returned are 0, 60, 120. The first 59 points end up in bin 1, the remaining 42 in bin 2. This makes no sense to me. The calculated edges make no sense, and the output is clearly bins of non-uniform width. The same output is returned in R2021a.
I must have some fundamental misunderstanding of what is happening.
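The counts reported in the two examples can be reproduced without relying on the automatic edge selection by passing the reported edges explicitly. This is a minimal sketch, assuming the edges quoted in the question (0, 50, 100 and 0, 60, 120); the variable names are illustrative. It relies on the documented default that each bin includes its left edge and that the last bin also includes its right edge:

```matlab
% Edges as reported in the question (passed explicitly here, not recomputed).
edges1 = [0 50 100];
edges2 = [0 60 120];

% Default rule: bin k holds edges(k) <= x < edges(k+1);
% the last bin also includes its right edge.
a1 = discretize(1:100, edges1);
a2 = discretize(1:101, edges2);

% Count the members of each bin.
counts1 = accumarray(a1(:), 1).';   % 49 51
counts2 = accumarray(a2(:), 1).';   % 59 42

% Both sets of bins have uniform width, even though the counts differ.
widths1 = diff(edges1);             % 50 50
widths2 = diff(edges2);             % 60 60
```

In both cases the value sitting exactly on the interior edge (50 or 60) goes to the second bin, which is why the first bin ends up one point short of the edge value.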
3 Comments
Vittorio Picco on 1 Sep 2022
Yes, if I define the edges manually it makes sense, but why does the syntax discretize(1:101,2) produce edges 0, 60 and 120 in the first place?
Torsten on 1 Sep 2022
Edited: Torsten on 1 Sep 2022
"Yes, if I define the edges manually it makes sense, but why does the syntax discretize(1:101,2) produce edges 0, 60 and 120 in the first place?"
Why not? You are able to control the edges, so just do it.
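Following that suggestion, one way to get equal-width bins that span exactly the data range is to build the edges with linspace. This is only a sketch, assuming the data 1:101 from the question; nBins and the other variable names are illustrative:

```matlab
x = 1:101;
nBins = 2;

% nBins + 1 equally spaced edges from the smallest to the largest value.
edges = linspace(min(x), max(x), nBins + 1);   % 1 51 101

% Bin the data with the explicit edges and count the members of each bin.
bins = discretize(x, edges);
counts = accumarray(bins(:), 1).';             % 50 51
```

With these edges the value 51 sits on the interior edge and, under the default rule, goes to the second bin, giving the 50/51 split.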


Accepted Answer

Walter Roberson on 1 Sep 2022
discretize() invokes matlab.internal.math.binpicker()
... which places the bins at "nice" locations, involving multiples of 10.
This is not documented.

More Answers (2)

Bruno Luong on 1 Sep 2022
Edited: Bruno Luong on 1 Sep 2022
The doc says:
"discretize divides the data into N bins of uniform width, choosing the bin edges to be "nice" numbers that overlap the range of the data."
Good luck finding an exact specification of "nice". I guess the purpose is that, when you bin the data and then plot it with bar, the bars line up with the digits and x-ticks of the x-axis.
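Whatever "nice" means exactly, the two properties stated in the quoted sentence (uniform width, edges covering the data range) can be checked on the returned edges. This is a hedged sketch with illustrative variable names; it does not assume any particular edge values, although the thread reports 0, 60, 120 for this input:

```matlab
x = 1:101;
[bins, edges] = discretize(x, 2);

widths = diff(edges);

% Uniform width: all bin widths agree (up to round-off).
isUniform = all(abs(widths - widths(1)) <= 10 * eps(max(abs(edges))));

% The edges cover the full range of the data.
coversData = edges(1) <= min(x) && edges(end) >= max(x);
```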

Steven Lord on 1 Sep 2022
"The calculated edges make no sense, and the output is clearly bins of non-uniform width."
No, in that example the bins are uniformly 60 units wide. Non-uniform bins would be a case like the following:
h = histogram(1:101, [0 50 101]);
h.BinWidth
ans = 'nonuniform'
E = h.BinEdges
E = 1×3
0 50 101
theBinWidths = diff(E) % Different widths
theBinWidths = 1×2
50 51
It seems that your expectation of what is "uniform" is related to the number of points in the bin, and that is not the definition of "uniform" used by histogram, histcounts, or discretize. Their definition of "uniform" uses the distance between edges.
By your definition of "uniform" you could easily encounter a situation where it's impossible to create uniform bins. The obvious case is where the number of bins does not divide the number of points (for example dividing 101 points between 2 bins) but another simple case involves binning 4 points into 2 uniform bins.
[counts, edges] = histcounts([1 1 1 2], 2)
counts = 1×2
3 1
edges = 1×3
1.0000 1.5000 2.0000
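For comparison, bins that are "uniform" in the sense of holding nearly equal numbers of points can be built from the sorted position of each point rather than from its value. This is not what discretize, histcounts, or histogram do; it is a minimal sketch with illustrative names, and it shows the cost described above: with repeated values, equal points can land in different bins.

```matlab
x = 1:101;
nBins = 2;

% Position of each element in sorted order (1 = smallest).
[~, order] = sort(x);
ranks = zeros(size(x));
ranks(order) = 1:numel(x);

% Assign bins by position, so bin counts differ by at most one.
binByCount = ceil(ranks * nBins / numel(x));
counts = accumarray(binByCount(:), 1).';        % 50 51

% With ties, identical values are split across bins.
y = [1 1 1 2];
[~, orderY] = sort(y);
ranksY = zeros(size(y));
ranksY(orderY) = 1:numel(y);
binY = ceil(ranksY * nBins / numel(y));         % 1 1 2 2
```

For [1 1 1 2] the third 1 is placed in bin 2 alongside the 2, so the counts are equal only because identical values were separated.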
1 Comment
Vittorio Picco on 1 Sep 2022
Yes, you are right, I used the word "uniform" a bit freely. It's just so unintuitive: split 101 in 2, who would pick 59 and 42? If MATLAB had picked 50 and 51 I don't think I would have ever asked the question...

Release

R2022a
