Tutorial on Support Vector Machines and using them in MATLAB
A support vector machine (SVM) is a popular machine learning technique that delivers highly accurate, compact models. The learning algorithm finds the decision boundary that minimizes classification errors, and kernel functions transform the feature space to help separate classes that are not linearly separable.
Learn how support vector machines work and how kernel transformations increase the separability of classes. Also learn how to train SVMs interactively in MATLAB.
Published: 6 Apr 2021
Humans are great at making classifications; we do this daily based on available observations and past experiences. But what if we need to make classifications on complex systems, or make them thousands of times a second, and, of course, we want them to be reliably accurate? Well, we can use machine learning: algorithms that give computers the ability to learn from data and make decisions.
There are many different models. We'll look at a popular technique known as support vector machines, or SVMs. SVMs are supervised binary classifiers, which means they use labeled data to learn how to sort observations into two classes. During the training process, the SVM learns how to best separate the data. It does this by finding a boundary that maximizes the margin between the two classes.
In this two-dimensional case, the boundary is just a line. The data points that define the boundary are known as support vectors; hence, support vector machine. You can imagine expanding this into higher dimensions as we add more parameters to our model. In three dimensions, the boundary between classes is a plane, and beyond that, it's a hyperplane. Most real data won't be linearly separable.
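The linear case described above can be sketched in a few lines of MATLAB. This is an illustrative example, not the video's code; the data is synthetic, and `fitcsvm` from Statistics and Machine Learning Toolbox does the training.

```matlab
% Sketch: train a linear SVM on synthetic 2-D data and inspect its support vectors.
% Variable names are illustrative; requires Statistics and Machine Learning Toolbox.
rng(1);                                   % reproducible random data
X = [randn(50,2) + 2; randn(50,2) - 2];   % two separable clusters
y = [ones(50,1); -ones(50,1)];            % class labels

mdl = fitcsvm(X, y, 'KernelFunction', 'linear');

% The rows of mdl.SupportVectors are the points that define the margin
disp(mdl.SupportVectors)
```

Only a handful of the 100 training points end up as support vectors; those alone determine the boundary, which is what keeps SVM models compact.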
How would an SVM handle a data set that looked like this? There is no linear boundary that separates this data. SVMs can transform the data into higher dimensions to find an optimal separating hyperplane. This transformation is defined by a kernel function, and there are many types of kernel functions, giving us a systematic way to find the best transformation for our data.
It's obvious that a circle can be drawn to separate the data, so we'll use the equation of a circle as our kernel function. With the data centered around the origin, the transformation z = x² + y² lifts the data into three dimensions, so we can separate the two classes with a plane. SVMs are popular and widely used to solve classification problems. They are available alongside all other popular machine learning algorithms in the Classification Learner and Regression Learner apps in MATLAB.
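The circle-kernel idea above can be visualized with a short sketch (illustrative synthetic data; only base MATLAB plotting plus the lift z = x² + y² is used):

```matlab
% Sketch: lift circularly separated 2-D data into 3-D with z = x^2 + y^2.
rng(1);
r = [0.5*rand(100,1); 1.5 + rand(100,1)]; % inner class inside radius 1, outer outside
t = 2*pi*rand(200,1);                     % random angles
X = [r.*cos(t), r.*sin(t)];               % points on concentric rings
z = X(:,1).^2 + X(:,2).^2;                % the kernel-style lift

% In (x, y, z), the plane z = 1 now separates the two classes
scatter3(X(:,1), X(:,2), z, 20, double(z > 1), 'filled')
```

After the lift, every inner-class point sits below z = 1 and every outer-class point sits above it, so a flat plane is all the SVM needs.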
Let's walk through a practical example using SVMs in MATLAB. First, we'll load an example data set built into MATLAB. This data set classifies human activity into one of five categories: sitting, standing, walking, running, or dancing. Previously, we talked about SVMs being binary classifiers, but multiple SVMs can be combined to solve multi-class problems.
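One way to combine binary SVMs for a multi-class problem, sketched here with the built-in `humanactivity` data set, is `fitcecoc`, which by default trains one SVM per pair of classes (the subsampling is just to keep the sketch quick, not part of the video's workflow):

```matlab
% Sketch: multi-class classification from binary SVMs via error-correcting
% output codes; requires Statistics and Machine Learning Toolbox.
load humanactivity                  % built-in: feat (predictors), actid (activity labels)
idx = randperm(numel(actid), 2000); % subsample so the sketch trains quickly
t = templateSVM('KernelFunction', 'gaussian');
mdl = fitcecoc(feat(idx,:), actid(idx), 'Learners', t);
resubLoss(mdl)                      % fraction of training observations misclassified
```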
Let's open the Classification Learner app, which enables you to interactively explore supervised machine learning using various classifiers. Our input data has 60 different features, which are statistics of accelerometer data from a test subject. Then, we'll choose the response variable and start the session.
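The app can also be launched from the command line; a minimal sketch with the same data:

```matlab
% Sketch: open the Classification Learner app programmatically.
load humanactivity        % feat: 60 accelerometer-derived features; actid: activity labels
classificationLearner     % in New Session, choose feat as the data and actid as the response
```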
In the app, we can view the data set and the activity classes as a scatter plot and change the predictors we are viewing. We can choose the type of model we want to train, and these are the types of SVM available. We can train different SVMs in parallel, so let's train all the SVMs and use the one with the best results.
Here, we can adjust the features of the model that we are training. To make the decision boundaries easier to visualize, we'll use principal component analysis, or PCA, to reduce the features to just two dimensions. The SVM with the highest accuracy uses a fine Gaussian kernel function. We can view the points that the model correctly and incorrectly classifies, as well as a confusion matrix for more detail on each class.
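The same PCA-then-SVM workflow can be approximated in code. A sketch under stated assumptions: the app's fine Gaussian preset uses a kernel scale of sqrt(P)/4 for P predictors (here P = 2), and the subsampling is only to keep the example fast.

```matlab
% Sketch: PCA down to two components, then a fine Gaussian multi-class SVM.
load humanactivity
[~, score] = pca(zscore(feat), 'NumComponents', 2);   % standardize, project to 2-D
idx = randperm(numel(actid), 2000);                   % subsample for speed
t = templateSVM('KernelFunction', 'gaussian', 'KernelScale', sqrt(2)/4);
mdl = fitcecoc(score(idx,:), actid(idx), 'Learners', t);
confusionchart(actid(idx), resubPredict(mdl))         % per-class detail, as in the app
```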
Let's export the model to the workspace and explore it a bit further. Using a documentation example from the Statistics and Machine Learning Toolbox, you can now visualize the decision boundaries that show how the model classifies new observations. Thanks for watching. For more information on SVMs, refer to the links below.
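One common way to visualize decision boundaries for an exported model is to classify a dense grid of points and color them by predicted class. This is a hedged sketch: `trainedModel` is the struct Classification Learner exports (its `predictFcn` field classifies new observations), and it assumes a model trained directly on two predictors; axis limits and names are illustrative.

```matlab
% Sketch: decision regions of an exported two-predictor model.
[x1, x2] = meshgrid(linspace(-4, 4, 200));       % dense grid over the feature plane
gridPoints = [x1(:), x2(:)];
labels = trainedModel.predictFcn(gridPoints);    % trainedModel comes from the app's Export Model
gscatter(gridPoints(:,1), gridPoints(:,2), labels)  % color each grid point by predicted class
```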