# histcounts

Histogram bin counts

## Syntax

`[N,edges] = histcounts(X)`
`[N,edges] = histcounts(X,nbins)`
`[N,edges] = histcounts(X,edges)`
`[N,edges,bin] = histcounts(___)`
`N = histcounts(C)`
`N = histcounts(C,Categories)`
`[N,Categories] = histcounts(___)`
`[___] = histcounts(___,Name,Value)`

## Description

`[N,edges] = histcounts(X)` partitions the `X` values into bins and returns the bin counts and the bin edges. The `histcounts` function uses an automatic binning algorithm that returns uniform bins chosen to cover the range of elements in `X` and reveal the underlying shape of the distribution.

`[N,edges] = histcounts(X,nbins)` uses a number of bins specified by the scalar, `nbins`.

`[N,edges] = histcounts(X,edges)` sorts `X` into bins with the bin edges specified by the vector, `edges`.

`[N,edges,bin] = histcounts(___)` also returns an index array, `bin`, using any of the previous syntaxes. `bin` is an array of the same size as `X` whose elements are the bin indices for the corresponding elements in `X`. The number of elements in the `k`th bin is `nnz(bin==k)`, which is the same as `N(k)`.

`N = histcounts(C)`, where `C` is a categorical array, returns a vector, `N`, that indicates the number of elements in `C` whose value is equal to each of `C`’s categories. `N` has one element for each category in `C`.

`N = histcounts(C,Categories)` counts only the elements in `C` whose value is equal to the subset of categories specified by `Categories`.

`[N,Categories] = histcounts(___)` also returns the categories that correspond to each count in `N` using either of the previous syntaxes for categorical arrays.

`[___] = histcounts(___,Name,Value)` specifies additional parameters using one or more name-value arguments. For example, you can specify `'BinWidth'` and a scalar to adjust the width of the bins for numeric data. For categorical data, you can specify `'Normalization'` and one of `'count'`, `'countdensity'`, `'probability'`, `'pdf'`, `'cumcount'`, or `'cdf'`.

## Examples

Distribute 100 random values into bins. `histcounts` automatically chooses an appropriate bin width to reveal the underlying distribution of the data.

```matlab
X = randn(100,1);
[N,edges] = histcounts(X)
```
```
N = 1×7

     2    17    28    32    16     3     2

edges = 1×8

    -3    -2    -1     0     1     2     3     4
```

Distribute 10 numbers into 6 equally spaced bins.

```matlab
X = [2 3 5 7 11 13 17 19 23 29];
[N,edges] = histcounts(X,6)
```
```
N = 1×6

     2     2     2     2     1     1

edges = 1×7

         0    4.9000    9.8000   14.7000   19.6000   24.5000   29.4000
```

Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin, and the last element is the right edge of the last bin.

```matlab
X = randn(1000,1);
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts(X,edges)
```
```
N = 1×10

     0    24   149   142   195   200   154   111    25     0
```

Distribute all of the prime numbers less than 100 into bins. Specify `'Normalization'` as `'probability'` to normalize the bin counts so that `sum(N)` is `1`. That is, each bin count represents the probability that an observation falls within that bin.

```matlab
X = primes(100);
[N,edges] = histcounts(X, 'Normalization', 'probability')
```
```
N = 1×4

    0.4000    0.2800    0.2800    0.0400

edges = 1×5

     0    30    60    90   120
```

Distribute 100 random integers between -5 and 5 into bins, and specify `'BinMethod'` as `'integers'` to use unit-width bins centered on integers. Specify a third output for `histcounts` to return a vector representing the bin indices of the data.

```matlab
X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');
```

Find the bin count for the third bin by counting the occurrences of the number `3` in the bin index vector, `bin`. The result is the same as `N(3)`.

```matlab
count = nnz(bin==3)
```
```
count = 8
```

Create a categorical vector that represents votes. The categories in the vector are `'yes'`, `'no'`, or `'undecided'`.

```matlab
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})
```
```
C = 1x27 categorical
     no no yes yes yes no no no no undecided undecided yes no no no yes no yes no yes no no no yes yes yes yes
```

Determine the number of elements that fall into each category.

```matlab
[N,Categories] = histcounts(C)
```
```
N = 1×3

    11    14     2

Categories = 1x3 cell
    {'yes'}    {'no'}    {'undecided'}
```

## Input Arguments

`X`: Data to distribute among bins, specified as a vector, matrix, or multidimensional array. If `X` is not a vector, then `histcounts` treats it as a single column vector, `X(:)`.

`histcounts` ignores all `NaN` values. Similarly, `histcounts` ignores `Inf` and `-Inf` values unless the bin edges explicitly specify `Inf` or `-Inf` as a bin edge.

Data Types: `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `logical` | `datetime` | `duration`
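As a minimal sketch of the `NaN` and `Inf` handling described above, the following compares finite edges with edges that explicitly include `-Inf` and `Inf`. The data vector and edge values are illustrative assumptions, not taken from this reference page.

```matlab
% Illustrative data: three finite values plus NaN, Inf, and -Inf.
X = [1 2 NaN Inf -Inf 3];

% Finite edges: NaN, Inf, and -Inf are all left out of the counts.
Nfinite = histcounts(X,[0 2 4]);

% Edges that name -Inf and Inf explicitly: the infinite values now fall
% into the first and last bins. NaN is still ignored.
Ninf = histcounts(X,[-Inf 0 2 4 Inf]);

disp(sum(Nfinite))   % how many elements were binned with finite edges
disp(sum(Ninf))      % how many elements were binned with infinite edges
```

Comparing the two sums against `numel(X)` is a quick way to check how many elements were dropped.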

`C`: Categorical data, specified as a categorical array. `histcounts` ignores undefined categorical values.

Data Types: `categorical`
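The following sketch illustrates how undefined elements are skipped. It assumes a small hypothetical array whose value set deliberately omits one of the input values, so that element becomes undefined.

```matlab
% The value set {'a','b'} omits 'c', so the third element of C is
% <undefined> (illustrative data only).
C = categorical({'a','b','c','a'},{'a','b'});

% Counts are returned for the categories 'a' and 'b' only.
N = histcounts(C);

% Number of elements that were not counted because they are undefined.
nSkipped = numel(C) - sum(N);
disp(nSkipped)
```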

`nbins`: Number of bins, specified as a positive integer. If you do not specify `nbins`, then `histcounts` automatically calculates how many bins to use based on the values in `X`.

Example: `[N,edges] = histcounts(X,15)` uses 15 bins.

`edges`: Bin edges, specified as a vector. The first element specifies the leading edge of the first bin, and the last element specifies the trailing edge of the last bin. Each bin includes its leading edge but excludes its trailing edge, except for the last bin, which includes both edges.

For datetime and duration data, `edges` must be a datetime or duration vector in monotonically increasing order.
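To make the edge rule concrete, here is a small sketch with hypothetical data in which several values sit exactly on a bin edge. Only the last bin keeps values equal to its trailing edge.

```matlab
% Illustrative data with values lying exactly on the edges 2 and 3.
X = [1 2 2 3 3 3];
edges = [1 2 3];

N = histcounts(X,edges);

% Bin 1 holds values with 1 <= x < 2, so the 2s are not counted here.
% Bin 2 is the last bin and holds values with 2 <= x <= 3.
disp(N)
```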

`Categories`: Categories included in count, specified as a string vector, cell vector of character vectors, `pattern` scalar, or categorical vector. By default, `histcounts` uses a bin for each category in categorical array `C`. Use `Categories` to specify a unique subset of the categories instead.

Example: `h = histcounts(C,["Large","Small"])` counts only the categorical data in the categories `Large` and `Small`.

Example: `h = histcounts(C,"Y" + wildcardPattern)` counts categorical data in all the categories whose names begin with the letter `Y`.

Data Types: `string` | `cell` | `pattern` | `categorical`
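A short sketch of counting a subset of categories, using a hypothetical size array. The category names mirror the example above, but the data itself is an assumption made for illustration.

```matlab
% Illustrative categorical data with three categories.
C = categorical(["Large","Small","Medium","Large"]);

% Count only the elements in the categories Large and Small.
[N,Categories] = histcounts(C,["Large","Small"]);

% Elements in the category Medium are left out of the counts.
disp(N)
disp(Categories)
```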

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `[N,edges] = histcounts(X,'Normalization','probability')` normalizes the bin counts in `N`, such that `sum(N)` is 1.
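The two calling styles mentioned above can be compared directly. This sketch assumes R2021a or later, since the `Name=Value` form is not available in earlier releases.

```matlab
X = primes(100);

% Comma-separated form with a quoted name: works in all releases.
N1 = histcounts(X,'Normalization','probability');

% Name=Value form: requires R2021a or later.
N2 = histcounts(X,Normalization="probability");

% Both calls should return the same normalized counts.
disp(isequal(N1,N2))
```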

`BinWidth`: Width of bins, specified as a positive scalar. If you specify `BinWidth`, then `histcounts` can use a maximum of 65,536 bins (or $2^{16}$). If the specified bin width requires more bins, then `histcounts` uses a larger bin width corresponding to the maximum number of bins.

• For `datetime` and `duration` data, `BinWidth` can be a scalar duration or calendar duration.

• If you specify `BinWidth` with `BinMethod`, `NumBins`, or `BinEdges`, `histcounts` only honors the last parameter.

• This option does not apply to categorical data.

Example: `histcounts(X,'BinWidth',5)` uses bins with a width of 5.
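As a rough check of the `BinWidth` behavior, the sketch below inspects the spacing of the returned edges for a small hypothetical data set. The exact edge positions are chosen by `histcounts`, so the code examines only the widths and the coverage.

```matlab
% Illustrative data spanning less than 20 units.
X = [1 4 6 12 18];

[N,edges] = histcounts(X,'BinWidth',5);

% Every bin should be 5 units wide, and every element should be counted.
widths = diff(edges);
disp(widths)
disp(sum(N) == numel(X))
```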

`BinEdges`: Edges of bins, specified as a numeric vector. The first element specifies the leading edge of the first bin, and the last element specifies the trailing edge of the last bin. Each bin includes its leading edge but excludes its trailing edge, except for the last bin, which includes both edges.

If you do not specify the bin edges, then `histcounts` automatically determines the bin edges.

• If you specify `BinEdges` with `BinMethod`, `BinWidth`, `NumBins`, or `BinLimits`, `histcounts` honors only `BinEdges`, and `BinEdges` must be specified last.

• This option does not apply to categorical data.

`BinLimits`: Bin limits, specified as a two-element vector, `[bmin,bmax]`. The first element indicates the first bin edge. The second element indicates the last bin edge.

With this option, `histcounts` bins only the data that falls within the bin limits, inclusive: `X>=bmin & X<=bmax`.

This option does not apply to categorical data.

Example: `histcounts(X,'BinLimits',[1,10])` bins only the values in `X` that are between `1` and `10` inclusive.
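The inclusive limits can be checked with a small sketch. The data is hypothetical and places one value on each limit and one value just outside each limit.

```matlab
% Illustrative data: 0 and 11 lie outside the limits, 1 and 10 lie on them.
X = [0 1 5 10 11];

[N,edges] = histcounts(X,'BinLimits',[1,10]);

% Only the values satisfying X>=1 & X<=10 are binned.
nInside = nnz(X >= 1 & X <= 10);
disp(sum(N) == nInside)

% The outer edges match the requested limits.
disp([edges(1) edges(end)])
```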

`BinMethod`: Binning algorithm, specified as one of the values in this table.

| Value | Description |
| --- | --- |
| `'auto'` | The default `'auto'` algorithm chooses a bin width to cover the data range and reveal the shape of the underlying distribution. |
| `'scott'` | Scott’s rule is optimal if the data is close to being normally distributed. This rule is appropriate for most other distributions, as well. It uses a bin width of `3.5*std(X(:))*numel(X)^(-1/3)`. |
| `'fd'` | The Freedman-Diaconis rule is less sensitive to outliers in the data, and might be more suitable for data with heavy-tailed distributions. It uses a bin width of `2*iqr(X(:))*numel(X)^(-1/3)`. |
| `'integers'` | The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To avoid accidentally creating too many bins, you can use this rule to create a limit of 65,536 bins ($2^{16}$). If the data range is greater than 65,536, then the integer rule uses wider bins instead. `'integers'` does not support datetime or duration data. |
| `'sturges'` | Sturges’ rule is popular due to its simplicity. It chooses the number of bins to be `ceil(1 + log2(numel(X)))`. |
| `'sqrt'` | The Square Root rule is widely used in other software packages. It chooses the number of bins to be `ceil(sqrt(numel(X)))`. |

`histcounts` adjusts the number of bins slightly so that the bin edges fall on "nice" numbers, rather than using these exact formulas.
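The formulas in the table can be evaluated by hand and compared with what `histcounts` returns. This is a sketch under stated assumptions: the sample is hypothetical, and the seed passed to `rng` is arbitrary. Because `histcounts` adjusts the bins to land on "nice" edges, the returned values may differ slightly from the raw formulas.

```matlab
rng(0);                 % arbitrary seed, for repeatability only
X = randn(1000,1);      % illustrative sample
n = numel(X);

% Raw values from the formulas in the table.
scottWidth  = 3.5*std(X(:))*n^(-1/3);
sturgesBins = ceil(1 + log2(n));
sqrtBins    = ceil(sqrt(n));

% What histcounts actually uses for the same methods.
[~,edgesScott] = histcounts(X,'BinMethod','scott');
Nsturges = histcounts(X,'BinMethod','sturges');
Nsqrt    = histcounts(X,'BinMethod','sqrt');

fprintf('Scott width: formula %.4f, used %.4f\n', ...
    scottWidth, edgesScott(2) - edgesScott(1));
fprintf('Sturges bins: formula %d, used %d\n', sturgesBins, numel(Nsturges));
fprintf('Sqrt bins: formula %d, used %d\n', sqrtBins, numel(Nsqrt));
```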

For `datetime` or `duration` data, specify the bin width as one of these units of time.

| Value | Description | Data Type |
| --- | --- | --- |
| `"second"` | Each bin is 1 second. | `datetime` and `duration` |
| `"minute"` | Each bin is 1 minute. | `datetime` and `duration` |
| `"hour"` | Each bin is 1 hour. | `datetime` and `duration` |
| `"day"` | Each bin is 1 calendar day. This value accounts for daylight saving time shifts. | `datetime` and `duration` |
| `"week"` | Each bin is 1 calendar week. | `datetime` only |
| `"month"` | Each bin is 1 calendar month. | `datetime` only |
| `"quarter"` | Each bin is 1 calendar quarter. | `datetime` only |
| `"year"` | Each bin is 1 calendar year. This value accounts for leap days. | `datetime` and `duration` |
| `"decade"` | Each bin is 1 decade (10 calendar years). | `datetime` only |
| `"century"` | Each bin is 1 century (100 calendar years). | `datetime` only |

• If you specify `BinMethod` for `datetime` or `duration` data, then `histcounts` can use a maximum of 65,536 bins (or $2^{16}$). If the specified bin duration requires more bins, then `histcounts` uses a larger bin width corresponding to the maximum number of bins.

• If you specify `BinLimits`, `NumBins`, `BinEdges`, or `BinWidth`, then `BinMethod` is set to `'manual'`.

• If you specify `BinMethod` with `BinWidth`, `NumBins` or `BinEdges`, `histcounts` only honors the last parameter.

• This option does not apply to categorical data.

Example: `histcounts(X,'BinMethod','integers')` centers the bins on integers.
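For `datetime` data, a unit of time can be passed as the `BinMethod` value. The sketch below uses hypothetical timestamps spread over three calendar days. The exact edges are chosen by `histcounts`, so the code only inspects them rather than assuming their values.

```matlab
% Illustrative timestamps: two on the first day, then one on each of the
% next two days.
t = datetime(2024,1,1) + hours([1 2 30 50]);

% Use one bin per calendar day.
[N,edges] = histcounts(t,'BinMethod','day');

disp(N)        % counts per day
disp(edges)    % bin edges, returned as datetime values
```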

`Normalization`: Type of normalization, specified as one of the values in this table. For each bin `i`:

• ${v}_{i}$ is the bin value.

• ${c}_{i}$ is the number of elements in the bin.

• ${w}_{i}$ is the width of the bin.

• $N$ is the number of elements in the input data. This value can be greater than the binned data if the data contains missing values, such as `NaN`, or if some of the data lies outside the bin limits.

| Value | Bin Values | Notes |
| --- | --- | --- |
| `'count'` (default) | $v_i = c_i$ | Count or frequency of observations. Sum of bin values is at most `numel(X)`, or `sum(ismember(X(:),Categories))` for categorical data. The sum is less than this only when some of the input data is not included in the bins. |
| `'probability'` | $v_i = \frac{c_i}{N}$ | Relative probability. The number of elements in each bin relative to the total number of elements in the input data is at most 1. |
| `'percentage'` | $v_i = 100 \cdot \frac{c_i}{N}$ | Relative percentage. `'percentage'` does not support categorical data. The percentage of elements in each bin is at most 100. |
| `'countdensity'` | $v_i = \frac{c_i}{w_i}$ | Count or frequency scaled by width of bin. For categorical data, this is the same as `'count'`. `'countdensity'` does not support `datetime` or `duration` data. The sum of the bin areas is at most `numel(X)`. |
| `'cumcount'` | $v_i = \sum_{j=1}^{i} c_j$ | Cumulative count, or the number of observations in each bin and all previous bins. `N(end)` is at most `numel(X)`, or `sum(ismember(X(:),Categories))` for categorical data. |
| `'pdf'` | $v_i = \frac{c_i}{N \cdot w_i}$ | Probability density function estimate. For categorical data, this is the same as `'probability'`. `'pdf'` does not support `datetime` or `duration` data. The sum of the bin areas is at most `1`. |
| `'cdf'` | $v_i = \sum_{j=1}^{i} \frac{c_j}{N}$ | Cumulative distribution function estimate. The count of each bin is equal to the cumulative relative number of observations in the bin and all previous bins. `N(end)` is at most 1. |

Example: `histcounts(X,'Normalization','pdf')` bins the data using an estimate of the probability density function.
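The normalization formulas can be reproduced from the raw counts. The sketch below uses a hypothetical data set in which every element falls inside the bins and none is missing, so $N$ equals `numel(X)`. It then compares hand-computed `'pdf'` and `'cdf'` values with the built-in results.

```matlab
% Illustrative data and nonuniform edges. All elements fall inside the bins.
X = [1 2 2 3 5 8];
edges = [0 2 4 10];

c = histcounts(X,edges);    % raw counts c_i
w = diff(edges);            % bin widths w_i
n = numel(X);               % N in the formulas above

% Hand-computed normalizations.
pdfManual = c ./ (n*w);     % v_i = c_i / (N * w_i)
cdfManual = cumsum(c) / n;  % v_i = sum of c_j / N for j = 1..i

% Built-in normalizations for comparison.
pdfBuiltin = histcounts(X,edges,'Normalization','pdf');
cdfBuiltin = histcounts(X,edges,'Normalization','cdf');

disp(max(abs(pdfManual - pdfBuiltin)))   % expected to be near zero
disp(max(abs(cdfManual - cdfBuiltin)))   % expected to be near zero
disp(sum(pdfBuiltin .* w))               % total area under the pdf estimate
```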

`NumBins`: Number of bins, specified as a positive integer. If you do not specify `NumBins`, then `histcounts` automatically calculates how many bins to use based on the input data.

• If you specify `NumBins` with `BinMethod`, `BinWidth` or `BinEdges`, `histcounts` only honors the last parameter.

• This option does not apply to categorical data.

## Output Arguments

`N`: Bin counts, returned as a row vector.

`edges`: Bin edges, returned as a vector. The first element is the leading edge of the first bin. The last element is the trailing edge of the last bin.

`bin`: Bin indices, returned as an array of the same size as `X`. Each element in `bin` describes which numbered bin contains the corresponding element in `X`.

A value of `0` in `bin` indicates an element which does not belong to any of the bins (for example, a `NaN` value).
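A brief sketch of the zero index, using hypothetical data that contains one `NaN` and one value outside the bin edges:

```matlab
% Illustrative data: NaN is missing, and 20 lies beyond the last edge.
X = [1 NaN 5 20];

[N,~,bin] = histcounts(X,[0 2 10]);

% Elements that belong to no bin get index 0.
disp(bin)

% For each bin k, nnz(bin==k) matches N(k).
disp([nnz(bin==1) nnz(bin==2)])
```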

`Categories`: Categories included in count, returned as a cell vector of character vectors. `Categories` contains the categories in `C` that correspond to each count in `N`.

## Tips

• The behavior of `histcounts` is similar to that of the `discretize` function. Use `histcounts` to find the number of elements in each bin. On the other hand, use `discretize` to find which bin each element belongs to (without counting).
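The relationship between the two functions can be seen in a short sketch with hypothetical data. For elements inside the bins, the third output of `histcounts` and the output of `discretize` agree. For an element outside the bins, `histcounts` reports `0`, while `discretize` returns `NaN`.

```matlab
% Illustrative data: 9 lies beyond the last edge.
X = [1 3 7 9];
edges = [0 2 4 8];

[N,~,bin] = histcounts(X,edges);   % counts plus bin indices
Y = discretize(X,edges);           % bin indices only

disp(N)
disp(bin)   % 0 marks the element outside the bins
disp(Y)     % NaN marks the element outside the bins
```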

## Version History

Introduced in R2014b