# varfun

Apply function to table or timetable variables

## Syntax

``B = varfun(func,A)``
``B = varfun(func,A,Name,Value)``

## Description

example

````B = varfun(func,A)` applies the function `func` separately to each variable of the table or timetable `A` and returns the results in the table or timetable `B`.The function `func` must take one input argument and return arrays with the same number of rows each time it is called. The `i`th value in the output argument, `B{:,i}`, is equal to `func(A{:,i})`.```

example

````B = varfun(func,A,Name,Value)` applies the function `func` separately to each variable of the table or timetable `A` with additional options specified by one or more `Name,Value` arguments.For example, you can use the `"GroupingVariables"` name-value argument to carry out calculations on groups of data within table variables. For more information about calculations on groups of data, see Calculations on Groups of Data.```

## Examples

collapse all

Apply an element-wise function to the variables of a table.

Create a table containing numeric variables.

`A = table([10.71;-2.05;-0.35;-0.82;1.57],[9.23;3.12;-1.18;0.23;16.41])`
```A=5×2 table Var1 Var2 _____ _____ 10.71 9.23 -2.05 3.12 -0.35 -1.18 -0.82 0.23 1.57 16.41 ```

Round the numeric values in `A` by using the `round` function. To specify a function as an input argument to `varfun`, use the `@` symbol. The variables of the output table `B` have names based on the function and the variable names from `A`.

`B = varfun(@round,A)`
```B=5×2 table round_Var1 round_Var2 __________ __________ 11 9 -2 3 0 -1 -1 0 2 16 ```

You can apply a function, such as `sum` or `max`, that reduces table variables along the first dimension. For example, use `varfun` to calculate the mean of each variable in a table.

Create a table that contains numeric variables.

`A = table([0.71;-2.05;-0.35;-0.82;1.57],[0.23;0.12;-0.18;0.23;0.41])`
```A=5×2 table Var1 Var2 _____ _____ 0.71 0.23 -2.05 0.12 -0.35 -0.18 -0.82 0.23 1.57 0.41 ```

Specify `func` as the `mean` function. To specify `func` as a function handle, use the `@` symbol.

`func = @mean`
```func = function_handle with value: @mean ```

Apply the function to all the variables of table `A`.

`B = varfun(func,A)`
```B=1×2 table mean_Var1 mean_Var2 _________ _________ -0.188 0.162 ```

`B` is a table that contains the mean value from each variable.

To have `varfun` return a numeric vector instead of a table, specify `"OutputFormat"` as `"uniform"`. To use the `"uniform"` output format, `func` must always return a scalar.

`B = varfun(func,A,"OutputFormat","uniform")`
```B = 1×2 -0.1880 0.1620 ```

Create a table that has numeric variables with data and a nonnumeric variable that is a grouping variable. Then perform a calculation on each group within the numeric variables.

Read data from a CSV (comma-separated values) file, `testScores.csv`, into a table by using the `readtable` function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, `School`, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert `School` to a `categorical` variable.

```scores = readtable("testScores.csv","TextType","string"); scores.School = categorical(scores.School)```
```scores=10×5 table LastName School Test1 Test2 Test3 __________ __________ _____ _____ _____ "Jeong" XYZ School 90 87 93 "Collins" XYZ School 87 85 83 "Torres" XYZ School 86 85 88 "Phillips" ABC School 75 80 72 "Ling" ABC School 89 86 87 "Ramirez" ABC School 96 92 98 "Lee" XYZ School 78 75 77 "Walker" ABC School 91 94 92 "Garcia" ABC School 86 83 85 "Chang" XYZ School 79 76 82 ```

Calculate the mean score for each test by school. The variables `Test1`, `Test2`, and `Test3` are the numeric variables that contain data. The `School` variable is a grouping variable. When you specify a grouping variable, its unique values specify groups that corresponding values in the data variables belong to.

```vars = ["Test1","Test2","Test3"]; meanScoresBySchool = varfun(@mean, ... scores, ... "InputVariables",vars, ... "GroupingVariables","School")```
```meanScoresBySchool=2×5 table School GroupCount mean_Test1 mean_Test2 mean_Test3 __________ __________ __________ __________ __________ ABC School 5 87.4 87 86.8 XYZ School 5 84 81.6 84.6 ```

The output `meanScoresBySchool` also contains a variable called `GroupCount` to indicate the number of entries from the input table `scores` in that group.

Create a timetable containing sample data. The row times of the timetable can also define groups because row times can be duplicates.

```Timestamps = datetime(2023,1,1)+days([0 1 1 2 3 3])'; A = timetable(Timestamps, ... [0.71;-2.05;-0.35;-0.82;1.57;0.09], ... [0.23;0.12;-0.18;0.23;0.41;0.02], ... 'VariableNames',["x","y"])```
```A=6×2 timetable Timestamps x y ___________ _____ _____ 01-Jan-2023 0.71 0.23 02-Jan-2023 -2.05 0.12 02-Jan-2023 -0.35 -0.18 03-Jan-2023 -0.82 0.23 04-Jan-2023 1.57 0.41 04-Jan-2023 0.09 0.02 ```

Compute the mean values of the variables in the timetable. Specify the vector of row times as the grouping variable. The output `B` is a timetable because the input `A` is a timetable. When you specify the vector of row times as the grouping variable, you cannot specify any variable as another grouping variable.

`B = varfun(@mean,A,"GroupingVariables","Timestamps")`
```B=4×3 timetable Timestamps GroupCount mean_x mean_y ___________ __________ ______ ______ 01-Jan-2023 1 0.71 0.23 02-Jan-2023 2 -1.2 -0.03 03-Jan-2023 1 -0.82 0.23 04-Jan-2023 2 0.83 0.215 ```

## Input Arguments

collapse all

Function, specified as a function handle. You can specify a handle for an existing function, define the function in a file, or specify it as an anonymous function. If `func` corresponds to more than one function file (that is, if `func` represents a set of overloaded functions), MATLAB® determines which function to call based on the class of the input arguments.

Use the `"OutputFormat","cell"` name-value argument, if the function `func` take one input argument and returns arrays with a different numbers of rows each time it is called. Otherwise, `func` must return arrays with the same number of rows.

Example: `func = @sum;` calculates the sum of an input.

Example: `func = @(x) x.^2;` calculates the square of each element of an input.

Input table, specified as a table or a timetable.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `InputVariables=["Var2","Var3"]` uses only the variables named `Var2` and `Var3` in `A` as the inputs to `func`.

Specifiers for selecting variables of `A` to pass to `func`, specified as `"InputVariables"` and a positive integer, vector of positive integers, string array, character vector, cell array of character vectors, `pattern` scalar, logical vector, or a function handle.

If you specify `"InputVariables"` as a function handle, then it must return a logical scalar, and `varfun` passes only the variables in `A` where the function returns `1` (`true`).

Specifiers for selecting variables of `A` to be grouping variables, specified as `"GroupingVariables"` and a positive integer, vector of positive integers, string array, character vector, cell array of character vectors, `pattern` scalar, or logical vector.

The unique values in the grouping variables specify groups. Rows in `A` where the grouping variables have the same values belong to the same groups. `varfun` applies `func` to each group of rows within each of the remaining variables of `A`, rather than to entire variables. For more information on calculations using grouping variables, see Calculations on Groups of Data

Grouping variables can have any of the data types listed in the table.

Values That Specify Groups

Data Type of Grouping Variable

Numbers

Numeric or logical vector

Text

String array or cell array of character vectors

Dates and times

`datetime`, `duration`, or `calendarDuration` vector

Categories

`categorical` vector

Bins

Vector of binned values, created by binning a continuous distribution of numeric, `datetime`, or `duration` values

If any grouping variable contains `NaN`s or missing values (such as `NaT`s, undefined `categorical` values, or missing strings), then the corresponding rows do not belong to any group, and are excluded from the output.

Row labels can be grouping variables. You can group on row labels alone, on one or more variables in `A`, or on row labels and variables together.

• If `A` is a table, then the labels are row names.

• If `A` is a timetable, then the labels are row times.

The output, `B`, has one row for each group of rows from the input, `A`. If `B` is a table or timetable, then `B` has:

• Variables corresponding to the input table variables that `func` was applied to.

• Variables corresponding to the grouping variables.

• A new variable, `GroupCount`, whose values are the number of rows of the input `A` that are in each group.

Note: If `B` is a timetable, then `B` also has:

• Row times, where the first row time from each group of rows in `A` is the corresponding row time in `B`. To return `B` as a table without row times, specify `"OutputFormat"` as `"table"`.

Format of `B`, specified as `"OutputFormat"` and either the value of `"auto"`, `"table"`, `"timetable"`, `"uniform"`, or `"cell"`.

 `"auto"` (default) (since R2023a) `varfun` returns an output whose data type matches the data type of the input `A`. `"table"` `varfun` returns a table with one variable for each variable in `A` (or each variable specified with `"InputVariables"`). For grouped calculations, `B` also contains the grouping variables and a new `GroupCount` variable.`"table"` allows you to use a function that returns values of different sizes or data types for the different variables in `A`. However, for ungrouped calculations, `func` must return arrays with the same number of rows each time it is called. For grouped calculations, `func` must return values with the same number of rows each time it is called for a given group.If `A` is a table, then this is the default output format. `"timetable"` `varfun` returns a timetable with one variable for each variable in `A` (or each variable specified with `"InputVariables"`). For grouped calculations, `B` also contains the grouping variables and a new `GroupCount` variable.`varfun` creates the row times of `B` from the row times of `A`. If the row times assigned to `B` do not make sense in the context of the calculations performed using `func`, then specify the output format as `"OutputFormat","table"`.If `A` is a timetable, then this is the default output format. `"uniform"` `varfun` concatenates the output values into a vector. `func` must return a scalar with the same data type each time it is called. `"cell"` `varfun` returns the output as a cell array. `"cell"` allows you to use a function that returns values of different sizes or data types.

Function to call if `func` fails, specified as `"ErrorHandler"` and a function handle. Define this function so that it rethrows the error or returns valid outputs for function `func`.

MATLAB calls the specified error-handling function with two input arguments:

• A structure with these fields:

 `identifier` Error identifier. `message` Error message text. `index` Index of the variable for which the error occurred. `name` Name of the variable for which the error occurred.
• The set of input arguments to function `func` at the time of the error.

For example,

```function [A, B] = errorFunc(S, varargin) warning(S.identifier, S.message) A = NaN; B = NaN;```

## Output Arguments

collapse all

Output values, returned as a table, timetable, cell array, or vector.

If `B` is a table or timetable, then it can store metadata such as descriptions, variable units, variable names, and row names. For more information, see the Properties sections of `table` or `timetable`.

collapse all

### Calculations on Groups of Data

In data analysis, you commonly perform calculations on groups of data. For such calculations, you split one or more data variables into groups of data, perform a calculation on each group, and combine the results into one or more output variables. You can specify the groups using one or more grouping variables. The unique values in the grouping variables define the groups that the corresponding values of the data variables belong to.

For example, the diagram shows a simple grouped calculation that splits a 6-by-1 numeric vector into two groups of data, calculates the mean of each group, and then combines the outputs into a 2-by-1 numeric vector. The 6-by-1 grouping variable has two unique values, `AB` and `XYZ`.

You can specify grouping variables that have numbers, text, dates and times, categories, or bins.

## Version History

Introduced in R2013b

expand all