Main Content

Calculations on Data in Tables

This example shows how to perform calculations on data in tables. Tables can contain both numeric and nonnumeric data. After you identify the table variables that contain numeric data, you can access the data in those variables by using either curly braces or dot notation. Then you can perform arithmetic operations or call functions on the numeric data and assign the result back into the table, all in one line of code. You also can use the rowfun function for calculations across the rows of a table and the varfun function for calculations along the variables. If your table has groups of data within it, you can use the groupsummary, rowfun, and varfun functions to perform calculations for each group in the table.

Read Sample Data into Table

Read data from a CSV (comma-separated values) file, testScores.csv, into a table by using the readtable function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, School, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert School to a categorical variable.

scores = readtable("testScores.csv","TextType","string");
scores.School = categorical(scores.School)
scores=10×5 table
     LastName       School      Test1    Test2    Test3
    __________    __________    _____    _____    _____

    "Jeong"       XYZ School     90       87       93  
    "Collins"     XYZ School     87       85       83  
    "Torres"      XYZ School     86       85       88  
    "Phillips"    ABC School     75       80       72  
    "Ling"        ABC School     89       86       87  
    "Ramirez"     ABC School     96       92       98  
    "Lee"         XYZ School     78       75       77  
    "Walker"      ABC School     91       94       92  
    "Garcia"      ABC School     86       83       85  
    "Chang"       XYZ School     79       76       82  

Create Subtable with Numeric Data

Create a subtable that has only the numeric data. Because the first two variables have nonnumeric data, one straightforward way is to index into the table specifying the other variables.

numericScores = scores(:,3:end)
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

     90       87       93  
     87       85       83  
     86       85       88  
     75       80       72  
     89       86       87  
     96       92       98  
     78       75       77  
     91       94       92  
     86       83       85  
     79       76       82  

As an alternative, you can specify variables by data type using the vartype function. This function is useful when you have a large table with many variables that have different data types. It returns a subscript that you can use to specify table variables.

numericVars = vartype("numeric")
numericVars = 
	table vartype subscript:

		Select table variables matching the type 'numeric'

	See Access Data in a Table.

numericScores = scores(:,numericVars)
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

     90       87       93  
     87       85       83  
     86       85       88  
     75       80       72  
     89       86       87  
     96       92       98  
     78       75       77  
     91       94       92  
     86       83       85  
     79       76       82  

Perform Arithmetic and Call Functions on Table Data

The test scores are based on a 100-point scale. To convert them to scores on a different scale, multiply the scores by the same factor.

However, you cannot use the same syntax that you would use for a numeric array. Instead, extract the data into an array, scale it, and assign it back into the table. In this way, you can perform arithmetic operations or call functions on the contents of a table in one line of code.

To extract the contents of a table, use curly braces. For example, this syntax specifies all rows and all table variables of numericScores and returns the result as a numeric matrix. As long as the table variables have compatible data types, you can concatenate their contents in this way.

A = numericScores{:,:}
A = 10×3

    90    87    93
    87    85    83
    86    85    88
    75    80    72
    89    86    87
    96    92    98
    78    75    77
    91    94    92
    86    83    85
    79    76    82

If a matrix has the same size as a table and a compatible data type, you can assign the matrix to the contents of the table.

Scale the numeric data so the test scores are on a 25-point scale, and assign it back into the table.

numericScores{:,:} = numericScores{:,:} * 0.25
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

     22.5    21.75    23.25
    21.75    21.25    20.75
     21.5    21.25       22
    18.75       20       18
    22.25     21.5    21.75
       24       23     24.5
     19.5    18.75    19.25
    22.75     23.5       23
     21.5    20.75    21.25
    19.75       19     20.5

As an alternative to curly braces, you can use T.Variables to specify all table variables. T.Variables represents the same thing as T{:,:} syntax.

For example, subtract the minimum value within each table variable from that variable.

numericScores.Variables = numericScores.Variables - min(numericScores.Variables)
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

    3.75        3     5.25 
       3      2.5     2.75 
    2.75      2.5        4 
       0     1.25        0 
     3.5     2.75     3.75 
    5.25     4.25      6.5 
    0.75        0     1.25 
       4     4.75        5 
    2.75        2     3.25 
       1     0.25      2.5 

You can perform calculations on one variable at a time by using dot notation and variable names. For example, add a correction worth five points to the last set of scores in Test3. The other table variables are unaffected by this operation.

numericScores.Test3 = numericScores.Test3 + 5
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

    3.75        3     10.25
       3      2.5      7.75
    2.75      2.5         9
       0     1.25         5
     3.5     2.75      8.75
    5.25     4.25      11.5
    0.75        0      6.25
       4     4.75        10
    2.75        2      8.25
       1     0.25       7.5

Calculate Across Each Table Row

Find the mean, minimum, and maximum values of the test scores for each student. Calculate these values across each row. Assign them to the original table as new table variables.

One simple, useful way is to extract the data into a matrix, call a function on it, and then assign the output to a new table variable. For example, calculate the mean test scores across each row. Then add them to scores in a new table variable, TestMean. Use curly braces to extract the numeric data from Test1, Test2, and Test3 into a matrix. To calculate the mean across rows, specify the dimension as 2 when you call mean.

vars = ["Test1","Test2","Test3"];
scores.TestMean = mean(scores{:,vars},2)
scores=10×6 table
     LastName       School      Test1    Test2    Test3    TestMean
    __________    __________    _____    _____    _____    ________

    "Jeong"       XYZ School     90       87       93           90 
    "Collins"     XYZ School     87       85       83           85 
    "Torres"      XYZ School     86       85       88       86.333 
    "Phillips"    ABC School     75       80       72       75.667 
    "Ling"        ABC School     89       86       87       87.333 
    "Ramirez"     ABC School     96       92       98       95.333 
    "Lee"         XYZ School     78       75       77       76.667 
    "Walker"      ABC School     91       94       92       92.333 
    "Garcia"      ABC School     86       83       85       84.667 
    "Chang"       XYZ School     79       76       82           79 

Another way to perform calculations across rows is to use the rowfun function. You do not need to extract data from the table when using rowfun. Instead, pass the table and a function to apply to the data as input arguments to rowfun. While the syntax is a little more complex, rowfun can be useful when the function that you apply takes multiple input arguments or returns multiple output arguments.

For example, use the bounds function to find the minimum and maximum test scores. The bounds function returns two output arguments, so apply it to scores by using rowfun. The output of rowfun is a new table that has TestMin and TestMax variables. In this case, also specify "SeparateInputs" as false so that values across each row are combined into a vector before being passed to bounds.

minmaxTest = rowfun(@bounds, ...
                    scores, ...
                    "InputVariables",vars, ...
                    "OutputVariableNames",["TestMin","TestMax"], ...
                    "SeparateInputs",false)
minmaxTest=10×2 table
    TestMin    TestMax
    _______    _______

      87         93   
      83         87   
      85         88   
      72         80   
      86         89   
      92         98   
      75         78   
      91         94   
      83         86   
      76         82   

Concatenate scores and minmaxTest so that these values are in one table.

scores = [scores minmaxTest]
scores=10×8 table
     LastName       School      Test1    Test2    Test3    TestMean    TestMin    TestMax
    __________    __________    _____    _____    _____    ________    _______    _______

    "Jeong"       XYZ School     90       87       93           90       87         93   
    "Collins"     XYZ School     87       85       83           85       83         87   
    "Torres"      XYZ School     86       85       88       86.333       85         88   
    "Phillips"    ABC School     75       80       72       75.667       72         80   
    "Ling"        ABC School     89       86       87       87.333       86         89   
    "Ramirez"     ABC School     96       92       98       95.333       92         98   
    "Lee"         XYZ School     78       75       77       76.667       75         78   
    "Walker"      ABC School     91       94       92       92.333       91         94   
    "Garcia"      ABC School     86       83       85       84.667       83         86   
    "Chang"       XYZ School     79       76       82           79       76         82   

Calculate Along Each Table Variable

Find the mean score for each test. Calculate these values along the table variables.

The simplest way is to use mean. First use curly braces to extract the numeric data from Test1, Test2, and Test3 into a matrix. Then call mean to calculate the mean of each column of the matrix. The output is a numeric vector where each element is the mean of a table variable.

vars = ["Test1","Test2","Test3"];
meanOfEachTest = mean(scores{:,vars})
meanOfEachTest = 1×3

   85.7000   84.3000   85.7000

Another way to perform calculations along table variables is to use the varfun function. You do not need to extract data from the table when using varfun. Instead, pass the table and a function to apply to the data as input arguments to varfun.

Calculate the mean scores using varfun. The output is a new table with meaningful names for the table variables.

meanOfEachTest = varfun(@mean, ...
                        scores, ...
                        "InputVariables",vars)
meanOfEachTest=1×3 table
    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________

       85.7          84.3          85.7   

Calculate Using Groups of Data Within Table

If your table has one or more grouping variables, then you can perform calculations on groups of data within the table. You can use the values in a grouping variable to specify the groups that the rows belong to.

For example, the School variable in scores has two values, ABC School and XYZ School. You can think of these two values as categories that denote groups of data in scores. In this case, you can perform calculations by school.

To apply a function and use grouping variables, you can use the varfun function. You can specify a function, such as mean, and then use varfun to apply it to each table variable that you specify. When you also specify grouping variables, varfun applies the function to each group within each table variable.

Calculate the mean score for each test by school.

vars = ["Test1","Test2","Test3"];
meanScoresBySchool = varfun(@mean, ...
                            scores, ...
                            "InputVariables",vars, ...
                            "GroupingVariables","School")
meanScoresBySchool=2×5 table
      School      GroupCount    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________    __________    __________

    ABC School        5            87.4            87          86.8   
    XYZ School        5              84          81.6          84.6   

Starting in R2018a, you also can use the groupsummary function to perform calculations on groups of data in each table variable.

meanScoresBySchool = groupsummary(scores,"School","mean",vars)
meanScoresBySchool=2×5 table
      School      GroupCount    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________    __________    __________

    ABC School        5            87.4            87          86.8   
    XYZ School        5              84          81.6          84.6   

The syntax for groupsummary is a bit simpler. Also, you can use groupsummary to specify multiple methods at once. For example, find both the minimum and maximum scores of each test by school.

minmaxBySchool = groupsummary(scores,"School",["min","max"],vars)
minmaxBySchool=2×8 table
      School      GroupCount    min_Test1    max_Test1    min_Test2    max_Test2    min_Test3    max_Test3
    __________    __________    _________    _________    _________    _________    _________    _________

    ABC School        5            75           96           80           94           72           98    
    XYZ School        5            78           90           75           87           77           93    

To use all the predefined methods of groupsummary, specify "all" as the method. Calculate all statistics on the mean test score by school.

allStatsBySchool = groupsummary(scores,"School","all","TestMean")
allStatsBySchool=2×13 table
      School      GroupCount    mean_TestMean    sum_TestMean    min_TestMean    max_TestMean    range_TestMean    median_TestMean    mode_TestMean    var_TestMean    std_TestMean    nummissing_TestMean    nnz_TestMean
    __________    __________    _____________    ____________    ____________    ____________    ______________    _______________    _____________    ____________    ____________    ___________________    ____________

    ABC School        5            87.067           435.33          75.667          95.333           19.667            87.333            75.667           57.967          7.6136                0                  5      
    XYZ School        5              83.4              417          76.667              90           13.333                85            76.667           29.856           5.464                0                  5      

Sometimes you might want to find a particular value in one table variable and then find the corresponding value in another table variable. In such cases use rowfun.

For example, find the student in each school who had the highest mean test score. The attached supporting function, findNameAtMax, returns both the highest score and the name of the student who had that score. Use rowfun to apply findNameAtMax to each group of students. The rowfun function is suitable because findNameAtMax has multiple input arguments (last names and test scores) and also returns multiple output arguments.

maxScoresBySchool = rowfun(@findNameAtMax, ...
                           scores, ...
                           "InputVariables",["LastName","TestMean"], ...
                           "GroupingVariables","School", ...
                           "OutputVariableNames",["max_TestMean","LastName"])
maxScoresBySchool=2×4 table
      School      GroupCount    max_TestMean    LastName 
    __________    __________    ____________    _________

    ABC School        5            95.333       "Ramirez"
    XYZ School        5                90       "Jeong"  

Supporting Function

function [maxValue,lastName] = findNameAtMax(names,values)
    % Return maximum value and the last name 
    % from the row at which the maximum value occurred
    [maxValue,maxIndex] = max(values);
    lastName = names(maxIndex);
end

See Also

| | | | |

Related Topics