Training and Validating Object Detectors - MATLAB & Simulink

    Training and Validating Object Detectors

    From the series: Perception

    After generating ground truth data in part 1, Sebastian Castro and Connell D’Souza of MathWorks go over the workflow for using this labeled data to train and evaluate an aggregate channel features object detector. This is done using built-in MATLAB® training functions.

    Sebastian and Connell show you how to use built-in functions to create training and evaluation datasets from labeled ground truth data. Once created, the training dataset is used to train an object detector using a single line function.

The trained detector is then used on an independent video stream to identify the objects of interest. The results are compared against the ground truth for this independent video stream to evaluate the trained detector. Sebastian and Connell also discuss metrics for object detector evaluation. Download all the files used in this video from MATLAB Central's File Exchange.


    Published: 18 Oct 2018

Hello, everyone, and welcome to the MATLAB and Simulink Robotics Arena. In this video, we're going to continue talking about Ground Truth for object detection. And just as in the previous video, we've got Connell here with us. Hey, Connell.

    Hey, Sebastian. How's it going?

    Pretty good. How about you?

    Pretty good, man.

    Awesome. So in the last part, we talked about what Ground Truth was and how we collected it in MATLAB. So in this video, I think we're going to talk about using this Ground Truth, which we'll recap for you, to actually train some kind of object detector. So why don't you run us through what we're going to see today?

    All right, thanks, Sebastian. We're going to do a quick recap of part 1. And we're going to talk about how we label the Ground Truth a little bit. And then, finally, we're going to get into the meat of today's topic, which is training the object detectors using this Ground Truth data.

And then we talk a little bit about how you can evaluate the object detectors that you just trained. And then we do key takeaways as usual and point you guys to resources.

So without wasting too much time, let's jump right into the recap of part 1. You probably remember this slide from part 1, where we sort of showed you how to create Ground Truth. And we told you that Ground Truth is created by manually labeling images. And we used the Ground Truth Labeler app for that, which is part of the Automated Driving System Toolbox.

So moving on, this is the workflow that's involved in using this Ground Truth data within MATLAB to train detectors. As you remember from the last video, we created a Ground Truth object, which, again, is a special MATLAB data object that stores Ground Truth data. And I sort of showed you what this data object contains.

So if you remember, in the previous video, we had a couple of different labels. We had a red buoy and a green buoy. And so the next thing you want to do is isolate these labels, so you can train individual detectors to detect either one of those objects.

Once you've isolated the labels, this creates another Ground Truth object. But this new Ground Truth object only has labeled data for one particular kind of label.

The next step is to actually create a training data set. Now, MATLAB provides you with a function that can help you do that: the objectDetectorTrainingData function. It takes in this new Ground Truth object that you have created, which has data from only one particular label, and it creates a table that has a bunch of sampled images from that video file, as well as the bounding box locations of the object in each image.

The next step after that is to actually use a function called trainACFObjectDetector. ACF stands for aggregate channel features; it's a particular kind of machine learning detector. We have these kinds of functions for a few other detectors as well. I know we have them for R-CNN and Faster R-CNN, and you can take a look at the documentation to figure that out. And finally, you use the detect function to actually use the detector to identify things in an image or a video stream.
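As a rough sketch, that four-step workflow might look like this in MATLAB (the file name, variable names, and label string here are illustrative assumptions, not necessarily the exact ones from the demo):

```matlab
% Load the ground truth object exported from the Ground Truth Labeler app
data = load('GroundTruthTraining.mat');   % assumed file name
gTruth = data.gTruth;                     % assumed variable name

% 1. Isolate a single label of interest into a new ground truth object
buoyTruth = selectLabels(gTruth, 'bigRedBuoy');   % assumed label name

% 2. Sample images and bounding boxes into a training data table
trainingData = objectDetectorTrainingData(buoyTruth);

% 3. Train an aggregate channel features (ACF) object detector
detector = trainACFObjectDetector(trainingData);

% 4. Use the trained detector on a new image
[bboxes, scores] = detect(detector, imread('testFrame.png'));  % assumed image
```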

    Right. So then, I guess the workflow kind of stays the same, but you can then try different detectors as part of that piece--

    Correct, yeah. So it's--

    --with different options and then kind of figure out how well each of those performs. So, yeah.

Absolutely. So you can change one line of code and basically try a bunch of different detectors. I'm going to jump quickly into MATLAB and show you how this workflow is executed. So in MATLAB, I have this live script set up that'll help us show this workflow.

And we can talk about the script a little bit. But the first thing that I'm going to do is load the Ground Truth data. Now, if you remember, this GroundTruthTraining.mat file was the one that we created at the end of our last video, where I had the completely labeled Ground Truth session, exported the labels to the MATLAB workspace, and saved them as a MAT file.

So the first thing I'm going to do is just go and load this Ground Truth object. And if I go to the workspace, I can see this Ground Truth object that has popped up here.

The second step that we discussed earlier was to actually isolate the labels. So I'm going to use the selectLabels function, and I'm going to isolate all the labels that have the name "big red buoy" in them. So basically, we're going to train a detector that can identify big red buoys.

So I'm just going to run this section. And what this section is also doing-- so this little piece of code here is basically just creating a directory called training data. And I'm also going to be extracting a subset of this Ground Truth data into a training data set.

And I use the objectDetectorTrainingData function for it. And as you can see, I'm passing in this Ground Truth object. And I have a few other options in there. So the sampling factor basically defines how often an image is sampled from your Ground Truth data set.

So you're taking a couple of images from your original data, such as the video, and kind of mapping them to that Ground Truth you extracted, to just get a set of images and their labels to then pass to your detector.

Correct. Exactly, exactly. So as you can see, it sort of tells me what it's done: it has written 103 images from the original video that we used to this training detector folder. So if we go in here, we see this training detector folder that has all the images in there.
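For reference, a hedged sketch of that call, continuing the variable names from the earlier sketch (the sampling factor and folder name are assumptions):

```matlab
% Sample every 5th labeled frame and write the extracted images to a folder;
% the returned table holds image file names plus bounding box locations
trainingData = objectDetectorTrainingData(buoyTruth, ...
    'SamplingFactor', 5, ...           % assumed value: keep every 5th frame
    'WriteLocation', 'TrainingData');  % assumed folder for the sampled images
```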

    Anyway, so going back, now that we have extracted this data set, if you remember, the next step in the workflow was to actually use some of the training functions that we have. Now, I'm going to be using an ACF detector, which is, as I mentioned earlier, an aggregate channel features object detector.

And again, all I'm doing is passing the training data table that we just created from the earlier function call into this function. So let me go and hit Run. Now, the ACF detector sort of trains in stages, and I've set this to train for five stages.

There are options that you can manipulate; five stages worked best for this particular use case. But again, it's up to you to figure out what works best for you. And as you can see, MATLAB tells me what it's doing. So it's currently in stage 1, and it's telling me the number of weak learners and some other metrics from the training.
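A minimal sketch of this training call, together with the save step Connell mentions next (the detector file name is an assumption):

```matlab
% Train an ACF detector for five boosting stages, then save it for reuse
detector = trainACFObjectDetector(trainingData, 'NumStages', 5);
save('acfBuoyDetector.mat', 'detector');  % assumed file name
```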

As you can see, in about 120 seconds, we've trained this detector. And the rest of the code is just saving this detector to a MAT file. So now let's go and see how well this detector actually works. So I have another script out here.

OK, so in this new script that we have here, we're going to run this detector on an independent video stream. So I'm going to hit Run, and then I can walk through the code a little bit.

But as you can see, it's loading in the detector and then just looping through the video using this detect function. And as you can see from the output on the right, it's actually doing fairly well. For what was a 17-second video clip, I think we were able to extract Ground Truth from a smallish data set and detect these buoys in a completely independent video.

    Right. So if you look at the output of that detect function of the object detector, now because you're doing object detection, you have the bounding box or the location on the image of your object. But then you also have a confidence score that you want to be high to basically determine whether it's an object. So you might have others that are lower confidence.

Yeah. And I mean, if you see, there's a small part of the code that's actually selecting the most confident detection.

    I see.

So although it will detect this multiple times, I'm only visualizing the one that's most confident-- the one the detector's most confident of.

    And sometimes you can also filter. For example, just show the ones that are above a certain score.
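A hedged sketch of what that detection loop might look like, keeping only the highest-scoring detection in each frame (the video file name is an assumption):

```matlab
% Load the saved detector and step through an independent video
loaded = load('acfBuoyDetector.mat');
detector = loaded.detector;
video = VideoReader('independentTest.mp4');  % assumed file name

while hasFrame(video)
    frame = readFrame(video);
    [bboxes, scores] = detect(detector, frame);
    if ~isempty(scores)
        % Keep only the most confident detection; alternatively, keep all
        % detections with scores above some threshold
        [maxScore, idx] = max(scores);
        frame = insertObjectAnnotation(frame, 'rectangle', ...
            bboxes(idx, :), maxScore);
    end
    imshow(frame); drawnow;
end
```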

Yep. Another interesting thing to note is we are detecting things that have a single color. The reason why-- so when you look at this video, it's also got a small red buoy. And it's also picking that up.

But as you can see, the confidence scores are a lot lower. And that's why you need Ground Truth, right? Because a simple color thresholding algorithm would have picked up the smaller buoy as a buoy as well. So this is just a more efficient way of dealing with that.

    Right. So now, what's good is that we've done this testing on an independent source. We've kind of looked at it visually. But I think we've got other ways to just make sure that we can quantify the performance of your detectors, yeah?

    Yes. So before we do that, I'm just going to hop back into the presentation real quick, and just a bit of a recap on what we've just done. And then I'll show you how to do evaluation stuff after that.

So going back into the presentation, the three important functions that we used were the selectLabels function, which isolates the labels of interest-- so we chose the big red buoy label. The second one is objectDetectorTrainingData. Now, this one, as you recall, samples a certain number of training images from your Ground Truth data set. And lastly, we used trainACFObjectDetector.

    And as you can see, we've got functions for the R-CNN, fast R-CNN, and faster R-CNN detectors as well. So you can try this script out with these different functions. And you can see how well they do.
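If you want to try those deep learning variants, note that their training functions take a network and training options in addition to the training data table; a rough sketch under those assumptions (the base network and option values here are placeholders, not recommendations):

```matlab
% Deep learning counterparts to trainACFObjectDetector; each needs a
% pretrained network (or layer array) plus training options
net = alexnet;  % placeholder base network (requires its support package)
options = trainingOptions('sgdm', 'MaxEpochs', 10);  % placeholder options
rcnnDetector = trainRCNNObjectDetector(trainingData, net, options);
% See also: trainFastRCNNObjectDetector, trainFasterRCNNObjectDetector
```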

    The next part of this video is actually evaluating these object detectors. Now, we've trained the object detector. The next thing we want to do is see how well it does against some Ground Truth data source.

So what I did is I took the second video that we had-- the independent set-- put that through the Ground Truth Labeler app, and labeled some images there. And then I, again, extracted that Ground Truth data object.

And remember, this is an independent data set. You don't want to evaluate your detector on the same data set that you trained it on, because that will show you 100% results most of the time. And that's not good, because that's what they call overfitting. So I've taken this independent data set and created an evalData table now.

The next step in the process is to use the detector that we trained earlier and extract the results. Now, we have to save these results in a particular way. And if you take a look at the script-- these scripts will be up on File Exchange, so you can download them and take a look at them.

But if you take a look at the script, we actually construct a structure that holds the results in a particular way. And then we provide a couple of functions that you can use to evaluate this. Those functions take in the evalData table and the results of your detector, and you can calculate things like precision and miss rate and stuff like that.

So let's go to MATLAB and see how we can do this. So now I have a third script that I've written, called Evaluate Object Detector, and it's loading in the Ground Truth data. Now, one thing that you need to remember is that this Ground Truth set that I'm loading in is actually the Ground Truth generated from that independent video set-- from the second video.

So I'm just going to run this real quick and see what happens. So it's loaded in the data set. It's creating an evaluation data folder in my current folder. And then it's constructing an image data store. Now, this image data store is something that you guys should probably take a look at in the documentation.

It's a data store for image data, and you can sort of segregate images based on labels. So it's really good for dealing with large chunks of data-- the moment you have, like, 10,000-plus images. And hopefully, you can use it for your deep learning activities.
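As a minimal sketch, creating a datastore over that evaluation folder might look like this (the folder name is an assumption):

```matlab
% Point a datastore at the folder of extracted evaluation images; files are
% read lazily on demand, which scales well to very large image collections
imds = imageDatastore('EvaluationData');  % assumed folder name
firstImage = readimage(imds, 1);          % read one image on demand
```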

But as you can see, it has extracted 100 images. It's going to create the data store now. And this was the structure I spoke about-- creating a results structure that can store results from our detector in a particular format that's accepted by those evaluation functions.

    And so as we scroll down again, it's running this detect function on this new evaluation data set that we've created. Notice that it's reading in images from the IMDS, which is the Image Data Store.

OK. And as we scroll down, it's here where I'm using those two functions-- evaluateDetectionPrecision and evaluateDetectionMissRate. And if I scroll a little more, I can see these two plots that are generated.
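Pieced together, that evaluation step might look like the following sketch, assuming evalData is a table like the one objectDetectorTrainingData returns (image file names in the first column, bounding boxes in the second):

```matlab
% Run the detector over every evaluation image, collecting boxes and scores
% in the two-column table format the evaluation functions accept
numImages = numel(imds.Files);
results = table('Size', [numImages 2], ...
    'VariableTypes', {'cell', 'cell'}, ...
    'VariableNames', {'Boxes', 'Scores'});

for k = 1:numImages
    img = readimage(imds, k);
    [bboxes, scores] = detect(detector, img);
    results.Boxes{k} = bboxes;
    results.Scores{k} = scores;
end

% Compare detections against the ground truth boxes (second column of
% evalData); the default overlap threshold is 0.5
[ap, recall, precision] = evaluateDetectionPrecision(results, evalData(:, 2));
[logAvgMissRate, fppi, missRate] = evaluateDetectionMissRate(results, evalData(:, 2));

% The two plots discussed below
figure; plot(recall, precision); xlabel('Recall'); ylabel('Precision');
figure; loglog(fppi, missRate); xlabel('False positives per image'); ylabel('Miss rate');
```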

Let me pop these out for a quick second. These are two pretty famous plots in object detector evaluation. One is the average precision, and the other one is the log average miss rate. So again, precision, as the name suggests, is how well your detector has done-- how many correct hits it got when compared to your Ground Truth data.

And the log average miss rate plots the number of false positives per image against the miss rate. One value that I want to highlight here is this thing called overlap, which I'm highlighting on my screen right here. The overlap is a measure of how much the two bounding boxes line up between your Ground Truth and the results of your detector.

Now, if you want to be really tough on your detector and you want to say, "I want the bounding boxes to line up exactly," you'd set this overlap to 1. The default value is 0.5.

Now, as you see here, with this overlap value of 0.5, it says that my detector is actually very, very precise-- almost 90% precise on average. Let's see what happens the moment I increase this value to, say, 0.6.
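For reference, the overlap requirement is an optional third argument to both evaluation functions; reusing the names from the sketch above:

```matlab
% Require a stricter 0.6 overlap instead of the default 0.5
ap06 = evaluateDetectionPrecision(results, evalData(:, 2), 0.6);
```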

And I'm just going to run this section again. And you see the average precision coming out to be 0.7 now. Again, it's a very relative metric; you have to figure out what you're comfortable with in the detector that you're designing. But yeah, these are a couple of evaluation functions that we provide.

Let's jump back to the presentation for a quick second, and we can do some recap and some key takeaways.

Right. So for a simple classification problem, you really only have to worry about, for a particular image or data source: was the label correct? Was it a true positive, false positive, or false negative? But now we've thrown that overlap into this.

    Yes.

So given a certain overlap threshold, that's basically the area where the Ground Truth and the actual detection overlap. And you've got these metrics.
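That overlap is an intersection-over-union of the two boxes, and MATLAB exposes the same measure directly through bboxOverlapRatio; a small sketch with made-up boxes:

```matlab
% Two [x y width height] boxes: a ground truth box and a detection
gtBox  = [100 100 50 50];
detBox = [110 105 50 50];
iou = bboxOverlapRatio(gtBox, detBox)  % intersection area / union area
```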

Yeah. So just to do a little bit of a recap: precision, as the formula on the screen suggests, is the ratio of your true positives to the sum of all the positive detections. Recall, on the other hand, is what's also called your detection rate, and it's the percentage of correct detections.

So you have the ratio of your true positives to the sum of the true positives and false negatives. And then the miss rate is also called the false negative rate, and it's a measure of the likelihood that a target will be missed.
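In symbols, with TP, FP, and FN the counts of true positives, false positives, and false negatives, the three metrics just described are:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad \text{miss rate} = \frac{FN}{TP + FN} = 1 - \text{recall}$$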

All right, so moving on to some key takeaways. As we've shown you, there are training functions available for both machine learning and deep learning based detectors, and you can take a look at our documentation to find out more about these.

One thing that I want to emphasize is that you should be using an independent data set to evaluate your detectors, to prevent things like overfitting and having your detector not work when it's supposed to work, which is not fun. And lastly, the metrics for evaluation need to be understood and chosen based on your use case. So like we mentioned, that overlap is a very relative value. You've got to figure out what overlap threshold you're OK with for your use case and take it from there.

All right. So thanks for running us through that. We just went all the way from getting some labeled Ground Truth data, as well as a completely independent data source for evaluation, to training an object detector, and looking at different ways to quantify how well it performed.

    So as always, if you want to reach out to us, you can get us via email, via Facebook. And check out the other resources below. We'd like to hear what you're using some of these detectors for. So do reach out. And thanks for watching.