Autonomous Navigation, Part 3: Understanding SLAM Using Pose Graph Optimization
From the series: Navegación autónoma
This video provides some intuition around Pose Graph Optimization - a popular framework for solving the simultaneous localization and mapping (SLAM) problem in autonomous navigation.
We’ll cover why uncertainty in a vehicle’s sensors and state estimation makes building a map of the environment difficult and how pose graph optimization can deal with it. We’ll also briefly cover occupancy grid maps as one way to represent the environment model.
In the last video, we talked about how we could estimate an autonomous vehicle’s pose, that’s its location and orientation, if we already had a model or a map of the environment. In this video, we’re not going to have a map ahead of time. We’re going to have to build one while simultaneously figuring out where the vehicle is within in that map using a process called SLAM; Simultaneous Localization and Mapping.
There are many different SLAM algorithms, but they can mostly be classified into two groups; filtering and smoothing. Filtering, like the extended Kalman filter or the particle filter, models the problem as an on-line state estimation where the robot state (and maybe part of the environment) is updated on-the-go as new measurements become available.
On the other hand, smoothing techniques estimate the full robot trajectories from the complete set of measurements, not just the new measurements. Understanding these methods can be more intuitive if we address them under a graph-based formulation. In fact, in recent years, one particular framework, pose graph optimization (or more generically, factor graph optimization) has become the de facto standard for most modern SLAM software solutions (like g2o or GTSAM). So, in this video, we are going to focus on understanding what pose graph optimization is and why it works. So, I hope you stick around for it. I’m Brian, and welcome to MATLAB Tech Talk.
Let’s set up the problem. We have an autonomous vehicle, a robot, that has the ability to move through an environment. We’ve given it lidar to sense the distances and angles to nearby obstacles and we’ve given it a way to dead reckon its relative position over time using odometry. In this case, it uses wheel encoders to count the number of rotations each wheel makes as it drives and from that it estimates how far it has gone and how it has turned since its last known position.
So, the question is, how can we use these sensors to understand what the environment map looks like and figure out the robot’s pose over time? Well, let’s start with a simple mapping problem.
Here, I’ve placed the robot in a rectangular room with a circular obstacle in the corner. We can see this map and see where the robot it, but it has no idea of either. To it, it’s just in an unknown void. Let’s start with a very ideal situation, one in which there is no uncertainty or errors in the lidar or odometry measurements, they’re just perfect. In this situation, developing a map of the environment is relatively trivial. The robot could take a lidar measurement and we’d have confidence that what it measured is the real obstacle location. It could save that off in a global map and continue on. After driving some distance, which again we know precisely because of the perfect odometry, it takes another measurement. The two measurements will align nicely in the global map since there are no errors in our system and in this way we can drive all around the environment and create a perfect map, all while knowing exactly where we are at all times.
Of course, this scenario isn’t realistic. There is error in both the lidar measurement and the odometry and so there is some uncertainty in the estimated robot pose and in the distances to the measured obstacles. So, let’s try mapping this room one more time, but this time we’ll say the lidar is still pretty accurate, but there is a large uncertainty in the odometry measurements. Maybe one of the wheels keeps slipping so the robot thinks it’s moving in a different direction than it actually is.
So, let’s see how this works out for our robot. We get an initial lidar measurement, and we assume that there is an obstacle there, just like before, and then we drive a bit. Except, this time, the estimated pose is different than the real pose. So, the lidar returns some relative distances to the wall that the real robot sees, but we can only assume it is relative to the estimated robot pose since that is all we know.
This places the two measured obstacles in different frames and not a global environment frame and so they don’t line up. If we continue this process as we drive the robot around the room, the error in odometry causes our estimated pose to deviate more from the real pose and what we’re left with in the end, is this map of obstacles that don’t resemble the real environment. So, this is one of the reasons why we can’t just take measurements of the environment and stick the results into a map. Uncertainty in our system is going to mess it up. But this is where SLAM algorithms can help us.
Now, there are several different ways to approach solving the SLAM problem. As I said at the beginning, in this video we’re going to focus on Pose Graph Optimization. I want to provide a little intuition behind the algorithm and give you a feeling for how it works without a bunch of math, but if you want to learn more about the mathematics, I’ve left some great resources in the description that go through it in detail.
Just like before, we have our real and estimated robot poses, which at the beginning lie right on top of each other in the real environment map. But now, on the right, there is also this blank area. And this is where we are going to build up the pose graph.
The first thing our robot does is take a measurement of the environment. This measurement is associated with the current estimated robot pose and we can add both to the pose graph. So, essentially what we’re doing is saying that this pose is defined as an X and Y location and a rotation angle and along with it we save off the the distances and angles to the sensed obstacles. Now, there is is also uncertainties associated with this pose entry, but I'll get to that in a second.
Alright, we have our first entry in the post graph, let’s get the second. Again, the robot drives for a little bit and our estimated pose starts to deviate from the real pose. Another measurement of the environment, which gets associated with the new estimated pose and this combination is now saved in the pose graph.
So, we have two poses, each with their own local estimate of where the obstacles are. And even though we don’t know where these two poses are in the environment, we do have an idea of about how far apart they are from each other. In our case, this came from counting the wheel rotations, but the idea behind this SLAM algorithm isn’t tied to a specific set of sensors. The relative pose distances could have come from another internal measurement source like an IMU, or we could have figured out how far the robot moved from other external sources like GPS or visible odometry. The point is that we have some best guess as to how far these two poses are from each other, as well as a measure of how confident we are in that guess.
In this way, there is a constraint that we can place on the relative distance between these two poses. Ideally, they would stay exactly this far apart, since that’s our best estimate, but due to our uncertainty in the odometry process, maybe we’d be better off if we moved these two poses around relative to each other. For example, it would be nice if there was a mechanism that would be able move the two poses around to align the currently misaligned obstacles. Well, there is! And the constraints are the key to doing so.
Now, to visualize a constraint, I think it’s helpful to imagine a rubber bar connecting the two poses. The nominal length of the bar is how far apart we estimate them to be. With no external forces on these poses, the bar will just keep them at this fixed distance. However, if I hold one pose fixed and move the other pose closer to or further from it, it will compress or stretch the rubber bar and there will be a force that wants to restore the poses back to their nominal distance. The strength of the rubber bar depends on how confident we are in the distance estimate. If we have more confidence, or we have a really good odometry process, then this is a really strong bar that makes it difficult to change this nominal distance. If we have almost no confidence, then this rubber bar is very weak and provides almost no restoring force.
So, in pose graph nomenclature, we’d say the poses are the nodes of the graph and the constraint, or the rubber bar, is an edge. Of course, this constraint acts in all three pose dimensions, both X and Y, and in rotation, always trying to bring it back to its estimated distance. Alright, there’s not much we can do with two nodes and one edge, so let’s keep going.
Just like before, the robot keeps driving along and measuring the environment. And after a measurement we can take the estimated pose, along with its measurement of the local obstacles, and place it in our pose graph and add a constraint. So now we have three nodes and two edges. And we can continue this, filling out our pose graph one pose at a time until we have something that looks like the incorrect map that we built earlier, with the exception that there are constraints connecting all of the poses.
Now, we still can’t do much with this information, because you can imagine that all of the bars are at their nominal length and everything wants to stay right where it is. However, we’re at a point where something interesting can happen. Our first pose and the current pose are both observing the same feature in the environment. This means that we can build a new edge, a new constraint between these two nodes. We just need to understand how these two features align to figure out where the two poses have to be relative to each other. And, in this example, they would have to be in the exact same location.
So, now we can add a new constraint that connects the first and last poses and closes the loop. This bar has a nominal length of 0, since these two poses want to be right on top of each other. And the strength of this rubber bar depends on how confident you are in your external measurements and your ability to associate the two sets of data as the same feature. If you’re really confident, like we are in this example, this bar is extremely strong, it really wants to pull these two poses together.
So, hopefully you can start to see that a loop closure is the thing that makes all of this work. Without a closure, all of the constraints were just happily being met. But with a loop closure, we have a way to inject tension into this graph. The blue bar wants to pull these two nodes together and the purple bars all want to keep the relative distances the way they are. Allowing this graph to settle to equilibrium, or to balance all of the forces that each of the constraints are imposing is the optimization part of pose graph optimization.
And what’s really cool about this is that by optimizing the pose graph, not only do we have a better estimate of the current pose and a better model of the environment, but we also have a better estimate of where the robot was in the past, since all of the past poses were updated as well. So, we got a lot of value from this one loop closure. Of course, we don’t have a perfect model of the environment, or the robots poses, but that’s ok because we can continue to improve our estimate by making more loop closures.
The robot can continue to drive, adding new poses from odometry, and new loop closures whenever it’s able to determine a relationship using external data. For this example, let’s assume we were able to relate these three poses together even though this area of the room is relatively featureless.
At this point, all of the previous constraints are the same as they were before, and until we added these loop closures the graph was in equilibrium. But now that we’ve added new loop closures, this has created more tension, which wants tugs and pushes on all of the poses a little bit more, and so we need to rerun the optimization again and allow the graph to re-settle.
In this way, you can see how after each loop closure the graph will continue to improve its global model of the environment and the past robots poses.
That is, as long as the two poses that you connect are truly looking at the same environment feature. If you make an incorrect association, and you build an edge with a constraint that is not real, you can imagine how that would persists in your graph and continue to skew your optimization process every time you run it. It would always be pulling those nodes in a direction away from truth. So, it is incredibly important that you are confident in any loop closures that you make. In fact, it is better to miss a real loop closure than it is to add a false one. You can always drive around some more and make another closure in the future, it’s harder to remove the bad ones once they’re in.
Now, it’s important to note that a loop closure requires an absolute measurement of the environment, like we got with lidar, or you could get with a visible camera, or some other environment sensor. You can’t close the loop using a relative sensor like an IMU or the wheel encoders because there is no way to relate that information back to an older pose. You may think you’re near a past node, but without checking the environment you can’t know for sure.
Ok, so we have our pose graph and we’ve optimized it, now how do we build a map of the environment from all of these data points? Well, there’s several different ways to represent an environment model but I’m going to quickly introduce the binary occupancy grid. The idea behind it is that the environment is broken up into a grid of cells. If you believe the cell it occupied, you set the value to a 1, and not occupied you set it to a 0. Here’s I’m assuming the entire environment is empty, and filling in the cells where I believe there is an obstacle. You could also assume the opposite, where the cells are all occupied and you set them to a zero when your sensor indicates that there’s nothing there. You can see that with just 1’s and 0’s I we’ve generated a pretty accurate model of the rectangular room and part of the circular obstacle. There are some holes and we can fill them in by driving around more, but overall it looks ok.
Now, there are also probabilistic occupancy grids where the cell doesn’t have to be fully occupied or not and instead there is a probability that it’s occupied between 0 and 1. With this model, you have full confidence in black and white cells and everywhere else is some shade of gray depending on your uncertainty.
And there we have it, SLAM using pose graph optimization. Hopefully now you can see how a map of the environment and the pose of the robot can be determined simultaneously with this method. And now that we have an occupancy grid map, we can start the process of planning a future trajectory through this environment. The robot no longer has to wander around aimlessly, but can use this map to plan where it wants to go. And we’re going to cover some ways to approach planning in the next video.
But, before I end this one, I want to remind you that a great way to learn this stuff is to just practice it and try it out on your own. There is a MATLAB example that uses the navigation toolbox called Implement SLAM with Lidar Scans that builds up an occupancy grid map of an environment using just Lidar, no relative odometry process required. Essentially, the environment has enough unique features in it that the lidar measurement of every pose can be related back to other past poses just through data association - or feature matching. It’s worth checking out and playing around with. I’ve linked to another video below that walks through this exact example if you want more information on it.
Ok, that’s where I’ll end this video. If you don’t want to miss any other future Tech Talk videos don’t forget to subscribe to this channel. And you want to check out my channel, control system lectures, I cover more control theory topics there as well. Thanks watching, and I’ll see you next time.
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.