Traditional blob analysis is not efficient on FPGAs due to the multipass nature of the algorithm. For real-time video, I would instead recommend an approach similar to the one in the example "Pothole Detection" in Vision HDL Toolbox.
If you look at that model, you will see it uses several preprocessing steps: bilateral filtering, Sobel edge detection, masking, and morphological closing before trying to find the objects. Then, in the block called Centroid31, a 31x31 region of the binary image is used to calculate the centroid and the total active area. The following block, DetectAndHold, uses the area metric to decide whether the 31x31 region is both above the user's threshold and above the previously found maximum area. In this way, the example finds the single largest area above the user's threshold. The X,Y location from the centroid is then sent to another block that overlays a marker and text on the output video.
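To make the Centroid31/DetectAndHold behavior concrete, here is a minimal Python sketch of the same logic (not FPGA code, and the function names are my own, not the block names in the model): compute the centroid and active-pixel count of each binary 31x31 window, then hold only the centroid whose area is both above the threshold and above the running maximum.

```python
import numpy as np

def centroid_and_area(window):
    """Centroid (x, y) and active-pixel count of a binary window."""
    ys, xs = np.nonzero(window)
    area = xs.size
    if area == 0:
        return (0.0, 0.0), 0
    return (xs.mean(), ys.mean()), area

def detect_and_hold(regions, threshold):
    """Keep the centroid of the largest region whose area exceeds
    the threshold, mimicking the DetectAndHold decision.

    regions: list of (window, (x0, y0)) pairs, where (x0, y0) is the
    window's top-left corner in frame coordinates.
    """
    best_area, best_centroid = 0, None
    for window, (x0, y0) in regions:
        (cx, cy), area = centroid_and_area(window)
        if area > threshold and area > best_area:
            best_area = area
            best_centroid = (x0 + cx, y0 + cy)  # map back to frame coords
    return best_centroid, best_area
```

In hardware this runs streaming, one window per pixel clock, but the compare-and-hold decision is the same.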
For your application, you will want to reset the maximum found area when the detected area stays low, or after some period of time or number of video lines between blobs. Another approach would be to use non-maximal suppression on the calculated area, as shown in the example "FAST Corner Detection." This will remove most, but not all, of the over-detected areas.
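The non-maximal suppression idea can be sketched as follows (again a Python illustration under my own naming, not the Simulink implementation): treat the per-position area metric as a stream and keep a value only if it is the maximum within a small neighborhood. Plateaus of equal values survive, which is why this removes most, but not all, duplicate detections.

```python
def nms_1d(values, radius):
    """Suppress area values that are not the local maximum
    within +/- radius positions; zeros mark suppressed spots."""
    out = []
    for i, v in enumerate(values):
        lo = max(0, i - radius)
        neighborhood = values[lo:i + radius + 1]
        out.append(v if v > 0 and v == max(neighborhood) else 0)
    return out
```

In the FAST Corner Detection example the same idea is applied over a 2-D window of the metric, but the streaming structure is identical.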