# normalize

Normalize data

## Syntax

``N = normalize(A)``
``N = normalize(A,dim)``
``N = normalize(___,method)``
``N = normalize(___,method,methodtype)``
``N = normalize(___,'DataVariables',datavars)``

## Description


`N = normalize(A)` returns the vectorwise z-score of the data in `A` with center 0 and standard deviation 1.

• If `A` is a vector, then `normalize` operates on the entire vector.

• If `A` is a matrix, table, or timetable, then `normalize` operates on each column of data separately.

• If `A` is a multidimensional array, then `normalize` operates along the first array dimension whose size does not equal 1.


`N = normalize(A,dim)` returns the z-score along dimension `dim`. For example, `normalize(A,2)` normalizes each row.


`N = normalize(___,method)` specifies a normalization method for either of the previous syntaxes. For example, `normalize(A,'norm')` normalizes the data in `A` by the Euclidean norm (2-norm).


`N = normalize(___,method,methodtype)` specifies the type of normalization for the given `method`. For example, `normalize(A,'norm',Inf)` normalizes the data in `A` using the infinity norm.


`N = normalize(___,'DataVariables',datavars)` specifies variables to operate on when the input data is in a table or timetable.

## Examples


Normalize data in a vector and matrix by computing the z-score.

Create a vector `v` and compute the z-score, normalizing the data to have mean 0 and standard deviation 1.

```
v = 1:5;
N = normalize(v)
```

```
N = 1×5

   -1.2649   -0.6325         0    0.6325    1.2649
```

Create a matrix `B` and compute the z-score for each column. Then, normalize each row.

```
B = magic(3)
```

```
B = 3×3

     8     1     6
     3     5     7
     4     9     2
```

```
N1 = normalize(B)
```

```
N1 = 3×3

    1.1339   -1.0000    0.3780
   -0.7559         0    0.7559
   -0.3780    1.0000   -1.1339
```

```
N2 = normalize(B,2)
```

```
N2 = 3×3

    0.8321   -1.1094    0.2774
   -1.0000         0    1.0000
   -0.2774    1.1094   -0.8321
```
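The column-wise and row-wise results can be cross-checked by hand. The following is a minimal sketch, assuming a MATLAB release with implicit expansion (R2016b or later); the names `Z1` and `Z2` are introduced here for illustration and are not part of `normalize`.

```matlab
% Manual z-score of magic(3) along columns (dim 1) and rows (dim 2).
B = magic(3);

% Column-wise: subtract each column mean, divide by each column's sample
% standard deviation (std with weight 0 normalizes by N-1).
Z1 = (B - mean(B,1)) ./ std(B,0,1);

% Row-wise: the same computation along dimension 2.
Z2 = (B - mean(B,2)) ./ std(B,0,2);

disp(Z1)
disp(Z2)
```

To four decimal places, `Z1` and `Z2` should agree with the displayed `N1` and `N2`.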

Scale a vector `A` by its standard deviation.

```
A = 1:5;
Ns = normalize(A,'scale')
```

```
Ns = 1×5

    0.6325    1.2649    1.8974    2.5298    3.1623
```

Scale `A` so that its range is in the interval [0,1].

```
Nr = normalize(A,'range')
```

```
Nr = 1×5

         0    0.2500    0.5000    0.7500    1.0000
```
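Both rescalings reduce to elementwise arithmetic. As a rough sketch using only base functions (the variable names `Ns_manual` and `Nr_manual` are illustrative), the same values can be reproduced as follows.

```matlab
A = 1:5;

% 'scale': divide by the sample standard deviation.
Ns_manual = A ./ std(A);

% 'range': shift so the minimum is 0, then divide by the spread so the
% maximum is 1.
Nr_manual = (A - min(A)) ./ (max(A) - min(A));

disp(Ns_manual)
disp(Nr_manual)
```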

Create a vector `A` and normalize it by its 1-norm.

```
A = 1:5;
Np = normalize(A,'norm',1)
```

```
Np = 1×5

    0.0667    0.1333    0.2000    0.2667    0.3333
```

Center the data in `A` so that it has mean 0.

```
Nc = normalize(A,'center','mean')
```

```
Nc = 1×5

    -2    -1     0     1     2
```
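A quick sanity check on these two results: data normalized by its 1-norm has absolute values that sum to 1, and mean-centered data sums to 0. The sketch below reproduces both by hand with base functions; `Np_manual` and `Nc_manual` are illustrative names.

```matlab
A = 1:5;

% 1-norm normalization: divide by the sum of absolute values (15 here).
Np_manual = A ./ norm(A,1);

% Mean centering: subtract the mean (3 here).
Nc_manual = A - mean(A);

disp(sum(abs(Np_manual)))   % expected to be 1, up to rounding
disp(sum(Nc_manual))        % expected to be 0
```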

Create a table containing height information for five people.

```
LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'};
Height = [71;69;64;67;64];
T = table(LastName,Height)
```

```
T=5×2 table
    LastName     Height
    _________    ______

    'Sanchez'      71
    'Johnson'      69
    'Lee'          64
    'Diaz'         67
    'Brown'        64
```

Normalize the height data by the maximum height.

```
N = normalize(T,'norm',Inf,'DataVariables','Height')
```

```
N=5×2 table
    LastName     Height
    _________    _______

    'Sanchez'          1
    'Johnson'    0.97183
    'Lee'        0.90141
    'Diaz'       0.94366
    'Brown'      0.90141
```
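Because the infinity norm of a vector is its largest absolute value, this normalization divides every height by the tallest one (71). A minimal sketch on the raw column, with `Nh` as an illustrative name:

```matlab
Height = [71;69;64;67;64];

% norm(Height,Inf) returns max(abs(Height)), which is 71 for this data.
Nh = Height ./ norm(Height,Inf);

disp(Nh)
```

The entries of `Nh` should match the `Height` column of `N` to the displayed precision.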

## Input Arguments


`A`: Input data, specified as a scalar, vector, matrix, multidimensional array, table, or timetable.

If `A` is a numeric array and has type `single`, then the output also has type `single`. Otherwise, the output has type `double`.

`normalize` ignores `NaN` values in `A`.

Data Types: `double` | `single` | `table` | `timetable`
Complex Number Support: Yes

`dim`: Dimension to operate along, specified as a positive integer scalar.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`method`: Normalization method, specified as one of the following options:

| Method | Description |
|---|---|
| `'zscore'` | z-score with mean 0 and standard deviation 1 |
| `'norm'` | 2-norm |
| `'scale'` | Scale by standard deviation |
| `'range'` | Scale range of data to [0,1] |
| `'center'` | Center data to have mean 0 |
| `'medianiqr'` | Center and scale data to have median 0 and interquartile range 1 |

`methodtype`: Method type, specified as a scalar, a 2-element row vector, or a type name, depending on the specified method:

| Method | Method Type Options | Description |
|---|---|---|
| `'zscore'` | `'std'` (default) | Center and scale to have mean 0 and standard deviation 1 |
| `'zscore'` | `'robust'` | Center and scale to have median 0 and median absolute deviation 1 |
| `'norm'` | Positive numeric scalar (default is 2) | p-norm |
| `'norm'` | `Inf` | Infinity norm |
| `'scale'` | `'std'` (default) | Scale by standard deviation |
| `'scale'` | `'mad'` | Scale by median absolute deviation |
| `'scale'` | `'first'` | Scale by first element of data |
| `'scale'` | `'iqr'` | Scale data by interquartile range |
| `'scale'` | Numeric scalar | Scale data by numeric value |
| `'range'` | 2-element row vector (default is [0 1]) | Interval of the form `[a b]` where `a < b` |
| `'center'` | `'mean'` | Center to have mean 0 |
| `'center'` | `'median'` | Center to have median 0 |
| `'center'` | Numeric scalar | Shift center by numeric value |
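A few of these method and method-type pairs are sketched below on a small vector. The data and the output variable names are illustrative; the commented values follow directly from the descriptions above, and no output is stated for the `'robust'` call.

```matlab
A = [2 4 6 8 10];

% 'range' with an interval [a b]: rescale so min maps to a, max to b.
Nr = normalize(A,'range',[-1 1]);     % -1  -0.5  0  0.5  1

% 'scale' with 'first': divide by the first element (2 here).
Nf = normalize(A,'scale','first');    % 1  2  3  4  5

% 'center' with 'median': subtract the median (6 here).
Nm = normalize(A,'center','median');  % -4  -2  0  2  4

% 'zscore' with 'robust': center by the median, scale by the median
% absolute deviation.
Nz = normalize(A,'zscore','robust');
```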

`'DataVariables'`: Table variables, specified as the comma-separated pair consisting of `'DataVariables'` and a scalar, vector, cell array, function handle, or table `vartype` subscript. The `'DataVariables'` value indicates which variables of the input table to operate on, and can be one of the following:

• A character vector or scalar string specifying a single table variable name

• A cell array of character vectors or string array where each element is a table variable name

• A vector of table variable indices

• A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

• A function handle that takes a table variable as input and returns a logical scalar

• A table `vartype` subscript

Example: `'Age'`

Example: `{'Height','Weight'}`

Example: `@isnumeric`

Example: `vartype('numeric')`
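The different selector forms are interchangeable when they pick out the same variables. The sketch below assumes a small table in which the `Weight` values are hypothetical, added only so that more than one numeric variable is present.

```matlab
LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'};
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];   % hypothetical values for illustration
T = table(LastName,Height,Weight);

% Select every numeric variable with a function handle.
N1 = normalize(T,'norm',Inf,'DataVariables',@isnumeric);

% Select the same variables by name.
N2 = normalize(T,'norm',Inf,'DataVariables',{'Height','Weight'});

% Select them with a vartype subscript.
N3 = normalize(T,'norm',Inf,'DataVariables',vartype('numeric'));

% LastName is not selected, so it passes through unchanged.
disp(isequal(N1.LastName,T.LastName))
```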

## More About

### Z-Score

For a random variable X with mean μ and standard deviation σ, the z-score of a value x is $z=\frac{\left(x-\mu \right)}{\sigma }.$ For sample data with mean $\overline{X}$ and standard deviation S, the z-score of a data point x is $z=\frac{\left(x-\overline{X}\right)}{S}.$

z-scores measure the distance of a data point from the mean in terms of the standard deviation. The standardized data set has mean 0 and standard deviation 1, and retains the shape properties of the original data set (same skewness and kurtosis).
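As a worked illustration of the sample formula, take the data $1, 2, 3, 4, 5$ and assume the sample standard deviation uses the $N-1$ denominator. Then

$$\overline{X}=3,\qquad S=\sqrt{\frac{(-2)^{2}+(-1)^{2}+0^{2}+1^{2}+2^{2}}{5-1}}=\sqrt{2.5}\approx 1.5811,$$

and the z-scores of the first two data points are

$$z_{1}=\frac{1-3}{1.5811}\approx -1.2649,\qquad z_{2}=\frac{2-3}{1.5811}\approx -0.6325.$$

By symmetry the remaining values are $0$, $0.6325$, and $1.2649$.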

### P-Norm

The general definition for the p-norm of a vector v that has N elements is

$\|v\|_{p}=\left[\sum_{k=1}^{N}|v_{k}|^{p}\right]^{1/p},$

where p is any positive real value, `Inf`, or `-Inf`. Some common values of p are:

• If p is 1, then the resulting 1-norm is the sum of the absolute values of the vector elements.

• If p is 2, then the resulting 2-norm gives the vector magnitude or Euclidean length of the vector.

• If p is `Inf`, then $\|v\|_{\infty}=\max_{k}\left(|v_{k}|\right)$.
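The definition can be checked numerically against the built-in `norm` function. In this sketch, `pnorm` is a hypothetical helper written directly from the formula above, valid for finite positive p.

```matlab
v = [3 -4];

% Hypothetical helper: p-norm straight from the definition.
pnorm = @(v,p) sum(abs(v).^p)^(1/p);

n1   = pnorm(v,1);    % sum of absolute values: 7
n2   = pnorm(v,2);    % Euclidean length: 5
nInf = max(abs(v));   % infinity norm: 4

% Compare with the built-in norm function.
disp([n1 norm(v,1); n2 norm(v,2); nInf norm(v,Inf)])
```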

### Interquartile Range

The interquartile range (IQR) of a data set describes the range of the middle 50% of values when the values are sorted. If the median of the data is Q2, the median of the lower half of the data is Q1, and the median of the upper half of the data is Q3, then IQR = Q3 - Q1.

The IQR is generally preferred over looking at the full range of the data when the data contains outliers (very large or very small values) because the IQR excludes the largest 25% and smallest 25% of values in the data.
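The following sketch applies the halves definition above to a small data set containing one outlier. It is an illustration of that definition only: built-in quartile routines can use interpolation rules that give different values on other data, and for an odd number of points this sketch leaves the middle element out of both halves.

```matlab
x = sort([7 1 100 3 5 2 6 4]);     % 1 2 3 4 5 6 7 100
half = floor(numel(x)/2);

Q1 = median(x(1:half));            % median of lower half: 2.5
Q3 = median(x(end-half+1:end));    % median of upper half: 6.5

iqrValue  = Q3 - Q1;               % 4
fullRange = max(x) - min(x);       % 99, dominated by the outlier

disp([iqrValue fullRange])
```

The single value 100 stretches the full range to 99 while the IQR stays at 4.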

### Median Absolute Deviation

The median absolute deviation (MAD) of a data set is the median value of the absolute deviations from the median $\tilde{X}$ of the data: $\text{MAD}=\text{median}\left(|x-\tilde{X}|\right)$. Therefore, the MAD describes the variability of the data in relation to the median.

The MAD is generally preferred over using the standard deviation of the data when the data contains outliers (very large or very small values) because the standard deviation squares deviations from the mean, giving outliers an unduly large impact. Conversely, the deviations of a small number of outliers do not affect the value of the MAD.
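To see the difference in sensitivity, the sketch below computes the MAD directly from its definition for a data set with and without a single outlier. The anonymous function `madFromDef` is a hypothetical helper built from base functions, not a toolbox routine.

```matlab
% Hypothetical helper: unscaled MAD straight from the definition.
madFromDef = @(x) median(abs(x - median(x)));

clean   = [1 2 3 4 5];
outlier = [1 2 3 4 100];

madClean   = madFromDef(clean);     % median of [2 1 0 1 2]  = 1
madOutlier = madFromDef(outlier);   % median of [2 1 0 1 97] = 1

stdClean   = std(clean);            % about 1.58
stdOutlier = std(outlier);          % much larger, driven by the outlier

disp([madClean madOutlier; stdClean stdOutlier])
```

Replacing 5 with 100 leaves the MAD at 1, while the standard deviation grows by more than an order of magnitude.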