Background estimation is a common problem in image processing that can, in some cases, be solved with a simple method. Imagine a highway with cars passing by: most of the time the background (road, trees, traffic signs) is visible, with an occasional “disturbance” in the form of a vehicle. Such outliers are usually removed easily with a median filter.
To find the background image, we calculate the median of each pixel over a sliding window, e.g., the last 50 frames. This can be done by stacking the frames into a 3D array and computing the median along the time axis for each pixel. However, most computer vision libraries already provide a ready-made method, such as rank_image in Halcon.
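A rough sketch of the stacking approach, assuming NumPy rather than Halcon; the function name and the tiny toy frames are illustrative only:

```python
import numpy as np

def median_background(frames):
    """Estimate the background as the per-pixel median of a frame stack."""
    stack = np.stack(frames, axis=0)          # shape: (n_frames, H, W)
    return np.median(stack, axis=0).astype(np.uint8)

# toy example: five "frames" of a constant road, one with a passing "car"
road = np.full((4, 4), 120, dtype=np.uint8)
frames = [road.copy() for _ in range(5)]
frames[2][1:3, 1:3] = 30                      # dark vehicle in one frame only
bg = median_background(frames)
print(np.array_equal(bg, road))               # True: the outlier frame is rejected
```

Since the vehicle appears in only one of the five frames, the median ignores it completely.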
The results are shown in the first video. The upper-left corner shows the original video, the upper right the basic median method, the bottom left the improved method discussed in the next paragraph, and the bottom right the absolute difference between the original and the improved method.
For the most part, the results are impressive, but if you look closely at the latter part of the video, some artifacts appear at the top of the road, because the background there is rarely free of cars. The method doesn’t work for heavily congested roads, at least not for the whole image. A slight improvement can be achieved.
Instead of adding every pixel to our buffer (image stack), let’s add only those whose intensity changed by less than some threshold X (an arbitrary value, e.g., 50) between neighboring frames. In addition, we initialize the buffer with random numbers between 0 and 255 instead of actual pixel values. This keeps the distribution even until pixels with similar values arrive; in the original algorithm, the intensities could cluster around the colors of the vehicles before the road became free. The resulting salt-and-pepper-like noise can be removed with a simple median filter, but it was not applied in the video; whether to use it depends on the application. The difference between the original and the improved algorithm is hardly visible in the first video (upper-right and bottom-left corners), but if you pay attention to the middle lane in the next video, the differences are quite visible. As noted, the algorithm still does not work for the congested lanes.
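A minimal sketch of this selective update, again assuming NumPy; the class name, the ring-buffer layout, and the toy frames are all illustrative, not the code behind the videos:

```python
import numpy as np

class SelectiveMedianBackground:
    """Per-pixel ring buffer updated only where the frame-to-frame
    intensity change stays below a threshold."""

    def __init__(self, shape, depth=50, threshold=50, seed=0):
        rng = np.random.default_rng(seed)
        # start from uniform noise, so the median is meaningless
        # until enough stable samples have been written
        self.buffer = rng.integers(0, 256, size=(depth,) + shape, dtype=np.uint8)
        self.idx = np.zeros(shape, dtype=np.intp)   # per-pixel write position
        self.depth = depth
        self.threshold = threshold
        self.prev = None

    def update(self, frame):
        if self.prev is not None:
            diff = np.abs(frame.astype(np.int16) - self.prev.astype(np.int16))
            stable = diff < self.threshold
            rows, cols = np.nonzero(stable)
            # write only the stable pixels into their ring-buffer slots
            self.buffer[self.idx[stable], rows, cols] = frame[stable]
            self.idx[stable] = (self.idx[stable] + 1) % self.depth
        self.prev = frame.copy()

    def background(self):
        return np.median(self.buffer, axis=0).astype(np.uint8)

# toy run: a constant "road" quickly overwrites the random initialization
road = np.full((4, 4), 120, dtype=np.uint8)
bg_model = SelectiveMedianBackground(road.shape, depth=8, threshold=50)
for _ in range(10):
    bg_model.update(road)
print(np.array_equal(bg_model.background(), road))  # True
```

Pixels occluded by moving vehicles change a lot between frames, so they never enter the buffer, while the random initialization keeps any single early value from dominating the median.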
To emphasize this difference, the third video shows histograms of a single pixel in the middle lane, together with a jumping red line representing the median of the stacked images in our chosen window. The median of the improved algorithm stabilizes much faster, at a value approximately equal to the expected background intensity (the road).
There is one more interesting phenomenon in the third video: instead of one normal distribution in the histogram, we see two distinct peaks, and the median does not end up in the middle of the larger one. A possible solution is to take the most common element in the histogram (the mode) instead of the median, but that will be the topic of another article.
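To illustrate why the mode can behave better here, consider a hypothetical pixel whose samples form two peaks; all intensity values below are made up for the example and are not taken from the videos:

```python
import numpy as np

# hypothetical samples for one pixel: a smaller peak near 60 and a
# larger, slightly spread background peak near 120
samples = np.array(
    [55, 56, 57, 58, 59] + [60] * 10 + list(range(61, 70))        # smaller peak
    + [100, 105, 110, 115] + [120] * 12 + list(range(121, 131)),  # larger peak
    dtype=np.uint8,
)

median = np.median(samples)                                  # falls between the peaks
mode = int(np.argmax(np.bincount(samples, minlength=256)))   # most common intensity
print(median, mode)  # 102.5 120
```

The mode lands on the center of the larger peak, while the median is dragged into the valley between the two peaks, just like the red line in the video.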