Background subtraction: ViBe

Wednesday, 18 September 2013 14:28 Stefano Tommesani

The ViBe algorithm

In the paper "ViBe: A universal background subtraction algorithm for video sequences", Olivier Barnich and Marc Van Droogenbroeck introduce several innovative mechanisms for motion detection, described in the following sections.

Pixel model and classification process

Let us denote by v(x) the value, in a given Euclidean color space, taken by the pixel located at x in the image, and by v_i a background sample value with index i. Each background pixel x is modeled by a collection of N background sample values
M(x) = {v_1, v_2, ..., v_N}
taken in previous frames. To classify a pixel value v(x) according to its corresponding model M(x), we compare it to the closest values within the set of samples by defining a sphere S_R(v(x)) of radius R centered on v(x). The pixel value v(x) is then classified as background if the cardinality of the intersection of this sphere with the collection of model samples M(x) is larger than or equal to a given threshold. The classification of a pixel value v(x) therefore involves the computation of N distances between v(x) and the model samples, and N comparisons against the distance threshold R.
The accuracy of the ViBe model is determined by two parameters only: the radius R of the sphere and the minimal cardinality. Experiments have shown that a unique radius R of 20 (for monochromatic images) and a cardinality of 2 are appropriate. There is no need to adapt these parameters during the background subtraction nor to change them for different pixel locations. 
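As an illustration, the classification test fits in a few lines of C++. This is only a minimal sketch for monochromatic pixels, with an illustrative per-pixel model layout (the paper's default of N = 20 samples is assumed); it is not the actual ViBe implementation:

#include <cstdlib>

const int N = 20;           // number of samples per pixel model
const int R = 20;           // matching radius for monochromatic values
const int MIN_MATCHES = 2;  // minimal cardinality

// Returns true (background) if v matches at least MIN_MATCHES of the N model samples.
bool is_background(unsigned char v, const unsigned char samples[])
{
    int matches = 0;
    for (int i = 0; i < N && matches < MIN_MATCHES; ++i)
        if (std::abs((int)v - (int)samples[i]) <= R)  // 1-D Euclidean distance
            ++matches;
    return matches >= MIN_MATCHES;
}

Note that the loop can stop as soon as two matches are found, which keeps the average cost per pixel well below N comparisons.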

Background model initialization from a single frame

Many popular techniques described in the literature need a sequence of several dozen frames to initialize their models; such an approach makes sense from a statistical point of view, as it seems necessary to gather a significant amount of data in order to estimate the temporal distribution of the background pixels. But many applications cannot afford this learning period, as they may need a usable segmentation from the very first frames, or an instant re-initialization of the model after a sudden illumination change.

A more convenient solution is a technique that initializes the background model from a single frame, so that the response to sudden illumination changes is straightforward: the existing background model is discarded and a new model is initialized instantaneously. What's more, being able to provide a reliable foreground segmentation as early as the second frame of a sequence has obvious benefits for short sequences in video-surveillance. Since there is no temporal information in a single frame, ViBe assumes that neighboring pixels share a similar temporal distribution, and populates each pixel model with values found in the spatial neighborhood of that pixel. The neighborhood must be large enough to contain a sufficient number of different samples, while keeping in mind that the statistical correlation between values at different locations decreases as the size of the neighborhood increases.
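A minimal sketch of this initialization strategy, with the same illustrative model layout as above (the 3x3 window used here is one reasonable choice of small spatial neighborhood, not SDK code):

#include <algorithm>
#include <cstdlib>
#include <vector>
#include <opencv2/core/core.hpp>

const int N = 20;  // samples per pixel model

// model: rows * cols * N samples, filled from a single grayscale frame (CV_8UC1).
// Each sample is copied from a pixel picked at random in the 3x3 neighborhood.
void init_from_single_frame(const cv::Mat &first_frame, std::vector<unsigned char> &model)
{
    const int h = first_frame.rows, w = first_frame.cols;
    model.resize((size_t)h * w * N);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int i = 0; i < N; ++i)
            {
                // pick a random neighbor, clamping coordinates at the image borders
                const int ny = std::min(std::max(y + rand() % 3 - 1, 0), h - 1);
                const int nx = std::min(std::max(x + rand() % 3 - 1, 0), w - 1);
                model[((size_t)y * w + x) * N + i] = first_frame.at<unsigned char>(ny, nx);
            }
}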
The only drawback is that the presence of a moving object in the first frame will introduce an artifact called a ghost (that is, a set of connected points, detected as in motion but not corresponding to any real moving object). In this particular case, the ghost is caused by the unfortunate initialization of pixel models with samples coming from the moving object. In subsequent frames, the object moves and uncovers the real background, which will be learned progressively through the regular model update process, making the ghost fade over time. ViBe's update process ensures both a fast model recovery in the presence of a ghost and a slow incorporation of real moving objects into the background model. 

Updating the background model over time

The classification step of ViBe compares the current pixel value v(x) at time t directly to the samples contained in the background model of the previous frame, M(x) at time t - 1. But which samples have to be memorized by the model, and for how long? The classical approach to updating the background history is to discard and replace old values after a number of frames or after a given period of time.
Whether or not to include foreground pixel values in the model is a question that arises for any sample-based background subtraction method: the model must be updated somehow, otherwise it will not adapt to changing conditions. It comes down to a choice between a conservative and a blind update scheme:

  1. a conservative scheme never includes a sample classified as foreground in the background model; it produces sharp detections, but a wrong classification may lead to deadlock situations and everlasting ghosts
  2. a blind scheme inserts any value into the model, whether classified as background or foreground; it is immune to deadlocks, but it slowly absorbs into the background objects that stop moving

The ViBe update method incorporates three important components (a sketch of the resulting update step follows the list):

  1. a memoryless update policy: the sample to be replaced in the model is chosen at random instead of being the oldest one, so that the expected lifespan of samples decays smoothly over time
  2. random time subsampling: the model of a background pixel is updated only with a probability of 1/16, which extends the time window covered by a model of fixed size
  3. spatial propagation: when a pixel model is updated, the current value is also inserted into the model of a randomly chosen neighboring pixel, allowing the background to spread, for example behind a ghost
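A minimal sketch of these three mechanisms together, reusing the illustrative per-pixel layout from the previous snippets (this is not the ViBe SDK code):

#include <cstdlib>

const int N = 20;                   // samples per pixel model
const int SUBSAMPLING_FACTOR = 16;  // on average, a model is updated once every 16 frames

// Called only for pixels classified as background (conservative update).
// samples: model of the current pixel; neighbor_samples: model of a randomly chosen neighbor.
void update_pixel_model(unsigned char v, unsigned char samples[], unsigned char neighbor_samples[])
{
    // random time subsampling: update with probability 1/16
    if (rand() % SUBSAMPLING_FACTOR == 0)
        samples[rand() % N] = v;            // memoryless policy: replace a random sample
    // spatial propagation: also seed the neighbor's model with probability 1/16
    if (rand() % SUBSAMPLING_FACTOR == 0)
        neighbor_samples[rand() % N] = v;
}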

It is worth summing up the strengths of the ViBe algorithm: it is initialized from a single frame, it recovers quickly from ghosts thanks to spatial propagation while remaining conservative about real moving objects, and it has a low computational and memory cost.

[Image: ghost artifact fading over time]

Performance analysis

After this long introduction to the ViBe algorithm, it is time to see how it behaves on the test sequence used in this series of articles about background segmentation.

The first test adds a slight blur filter as pre-processing, and a 5x5 median filter as post-processing to remove noise from the resulting mask, as done with the other non-parametric algorithms.

This is the code fragment that receives an OpenCV image, applies the filters and returns the processing results inside an OpenCV mask:

void ViBeBGS::process(const cv::Mat &img_input, cv::Mat &img_output)
{
    if (img_input.empty())
        return;

    // convert the input to grayscale, as ViBe works on single-channel images
    // (gray_input_image is a class member, reused across calls)
    if (img_input.channels() == 3)
        cv::cvtColor(img_input, gray_input_image, CV_BGR2GRAY);
    else if (img_input.channels() == 1)
        img_input.copyTo(gray_input_image);

    // pre-processing: slight Gaussian blur to reduce sensor noise
    cv::Mat filtered_input_image;
    cv::GaussianBlur(gray_input_image, filtered_input_image, cv::Size(5, 5), 1.5);
    // run ViBe on the filtered frame
    detector->Update(filtered_input_image.cols, filtered_input_image.rows, filtered_input_image.step,
                     ViBe::PixelFormat_Gray8, filtered_input_image.data);
    // wrap the computed mask buffer in a cv::Mat header (no copy is made)
    cv::Mat mask(detector->GetComputedMaskHeight(), detector->GetComputedMaskWidth(), CV_8UC1,
                 const_cast<void*>(detector->GetComputedMaskBuffer()), detector->GetComputedMaskStride());
    // post-processing: 5x5 median filter to remove impulse noise from the mask
    cv::medianBlur(mask, img_output, 5);
}

The ViBe SDK is extremely easy to use: after creating an instance of the ViBe::ViBeDetector class, the user passes a grayscale image to the Update() method and gets the resulting mask by calling the GetComputedMaskBuffer() method. Moving buffers in and out of OpenCV is effortless, as is adding pre- and post-processing filters based on OpenCV image primitives.
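Putting it all together, a hypothetical driver loop could look as follows; ViBeBGS is the wrapper class shown above (assumed to create its ViBe::ViBeDetector instance in the constructor), and the video path is just an example:

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("test_sequence.avi");  // example input file
    if (!cap.isOpened())
        return -1;

    ViBeBGS bgs;
    cv::Mat frame, mask;
    while (cap.read(frame))
    {
        bgs.process(frame, mask);         // pre-filter, ViBe, post-filter
        cv::imshow("foreground mask", mask);
        if (cv::waitKey(1) == 27)         // press ESC to stop
            break;
    }
    return 0;
}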

The ViBe authors claim that it has a low computational cost and is suitable for embedded implementations. Benchmarking the code above results in a median elapsed time of 2357 microseconds per frame (2735 microseconds per frame on average), well below the computational load of other algorithms offering the same level of performance.
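For reference, figures like these can be collected with OpenCV's tick counter; a minimal sketch, assuming per-frame samples are gathered around the process() call in a capture loop like the one above:

#include <opencv2/core/core.hpp>
#include <algorithm>
#include <cstdio>
#include <vector>

// Per-frame sample, taken around the process() call:
//   int64 t0 = cv::getTickCount();
//   bgs.process(frame, mask);
//   timings_us.push_back((cv::getTickCount() - t0) * 1e6 / cv::getTickFrequency());

// Reports the median and mean of the collected per-frame times (in microseconds).
void report_timings(std::vector<double> timings_us)
{
    if (timings_us.empty())
        return;
    std::sort(timings_us.begin(), timings_us.end());
    const double median = timings_us[timings_us.size() / 2];
    double mean = 0.0;
    for (size_t i = 0; i < timings_us.size(); ++i)
        mean += timings_us[i];
    mean /= (double)timings_us.size();
    std::printf("median: %.0f us/frame - mean: %.0f us/frame\n", median, mean);
}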

Still, we can remove both the pre- and post-filtering, to check whether the claims about resilience against noise hold true and how much of the computational load is due to the ViBe algorithm alone.

This is the modified code, calling just the ViBe API:

void ViBeBGS::process(const cv::Mat &img_input, cv::Mat &img_output)
{
    if (img_input.empty())
        return;

    if (img_input.channels() == 3)
        cv::cvtColor(img_input, gray_input_image, CV_BGR2GRAY);
    else if (img_input.channels() == 1)
        img_input.copyTo(gray_input_image);

    // run ViBe directly on the unfiltered grayscale frame
    detector->Update(gray_input_image.cols, gray_input_image.rows, gray_input_image.step,
                     ViBe::PixelFormat_Gray8, gray_input_image.data);
    // note: the output Mat aliases the detector's internal mask buffer, no copy is made
    img_output = cv::Mat(detector->GetComputedMaskHeight(), detector->GetComputedMaskWidth(), CV_8UC1,
                         const_cast<void*>(detector->GetComputedMaskBuffer()), detector->GetComputedMaskStride());
}

Without pre- and post-processing, the computational load drops to 1506 microseconds per frame (2010 microseconds on average), slower only than basic methods such as Frame Difference and Median.

[Image: benchmark of ViBe processing times against the other tested algorithms]

Still, a median filter is a rather rough post-processing step, and some blob-oriented processing goes a long way toward reducing false alarms and properly outlining moving objects. The following code fragment adds two more post-processing steps:

  1. blobs are extracted from the filtered mask; blobs with an area smaller than 20 pixels are dropped, as they are probably due to noise or to misclassified parts of bigger moving objects, and the remaining blobs are redrawn as filled areas to eliminate internal holes
  2. a 7x7 morphological closing is applied to further reduce partially closed holes in the blobs
void ViBeBGS::process(const cv::Mat &img_input, cv::Mat &img_output, cv::Mat &img_bgmodel)
{
    if (img_input.empty())
        return;

    if (img_input.channels() == 3)
        cv::cvtColor(img_input, gray_input_image, CV_BGR2GRAY);
    else if (img_input.channels() == 1)
        img_input.copyTo(gray_input_image);

    cv::Mat filtered_input_image;
    cv::GaussianBlur(gray_input_image, filtered_input_image, cv::Size(5, 5), 1.5);
    detector->Update(filtered_input_image.cols, filtered_input_image.rows, filtered_input_image.step,
                     ViBe::PixelFormat_Gray8, filtered_input_image.data);
    cv::Mat mask(detector->GetComputedMaskHeight(), detector->GetComputedMaskWidth(), CV_8UC1,
                 const_cast<void*>(detector->GetComputedMaskBuffer()), detector->GetComputedMaskStride());
    // median filtering
    cv::medianBlur(mask, mask, 5);

    // find blobs as the external contours of the filtered mask
    std::vector<std::vector<cv::Point> > v;
    std::vector<cv::Vec4i> hierarchy;
    cv::findContours(mask, v, hierarchy, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    mask = cv::Scalar(0);  // clear the mask before redrawing the blobs
    for (size_t i = 0; i < v.size(); ++i)
    {
        // drop smaller blobs, probably due to noise
        if (cv::contourArea(v[i]) < 20)
            continue;
        // redraw the blob as a filled area to remove internal holes
        cv::drawContours(mask, v, (int)i, cv::Scalar(255), CV_FILLED, 8, hierarchy, 0, cv::Point());
    }

    // morphological closing to reduce partially closed holes
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(7, 7));
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, element);

    mask.copyTo(img_output);
}

The result of this enhanced post-processing phase is highlighted by the following video: blobs, mostly free of internal holes, are more uniform and closely follow the outline of moving objects, and there are fewer false alarms due to waving trees.

To further test the base ViBe algorithm and the additional filtering, I used a subset of video sequences from the changedetection.net website (for a description of the various categories of videos, please see the table below in the ViBe+ description). For each sequence, the output of the bare ViBe algorithm is shown next to the post-processed version. Since ViBe is a non-parametric algorithm, no optimization was applied to any video sequence, and all results were achieved with standard settings. Various parameters could be tweaked in the post-processing phase, e.g. the size of the morphology element or the minimum blob size, leading to reduced errors, but in order to provide a baseline performance evaluation the post-processing code was always the one shown above. The tested sequences, and the most notable results, are listed below:

Highway - category: Baseline

Pedestrians - category: Baseline

Office - category: Baseline

Canoe - category: Dynamic Background - please refer to the description of ViBe+ below to see how it improves the handling of blinking pixels

Fountain 1 - category: Dynamic Background

Fountain 2 - category: Dynamic Background - the post-processing filters are effective at removing the impulse noise in the output mask caused by spilling water

Parking - category: Intermittent Object Motion - in this sequence the moving objects are quite small, and the default post-processing parameters hide the motion of walking people, as the minimum blob size is too high

Streetlight - category: Intermittent Object Motion

Sofa - category: Intermittent Object Motion - a clear example of how ViBe avoids inserting foreground objects into the background model, so that it can quickly recover when these objects are taken away

Backdoor - category: Shadow

Bus station - category: Shadow

People in shade - category: Shadow

How ViBe+ improves on ViBe

ViBe+, described in the paper "Background Subtraction: Experiments and Improvements for ViBe" by M. Van Droogenbroeck and O. Paquot, improves on several aspects of the ViBe algorithm; in particular, it detects blinking pixels, i.e. pixels that keep switching between background and foreground in areas of dynamic background, and excludes them from the update of the background model.

[Image: blinking pixels detected by ViBe+]
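To illustrate the idea behind blinking-pixel handling: a per-pixel "blinking level" can be raised whenever the classification of a pixel flips between consecutive frames, and pixels with a high level kept out of the model update. The sketch below only conveys the concept; its constants are illustrative, not the values used by ViBe+:

#include <algorithm>

// prev_fg / curr_fg: classification of this pixel in the previous and current frame.
// level: per-pixel blinking level, persisted across frames (starts at 0).
// Returns true if the pixel may be used to update the background model.
bool allow_model_update(bool prev_fg, bool curr_fg, int &level)
{
    if (prev_fg != curr_fg)                 // the pixel "blinks"
        level = std::min(level + 15, 150);  // raise the blinking level quickly
    else if (level > 0)
        --level;                            // let stable pixels recover slowly
    return level < 30;                      // exclude heavily blinking pixels
}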

All these improvements from ViBe to ViBe+ lead to the following results on the public dataset provided on the http://www.changedetection.net web site. The dataset contains 31 video sequences, grouped in 6 categories: baseline, dynamic background, camera jitter, intermittent object motion, shadow, and thermal.

[Image: comparison of ViBe and ViBe+ results on the changedetection.net dataset]

Baseline: this category contains four videos, two indoor and two outdoor. These videos represent a mixture of mild challenges typical of the next four categories. Some videos have subtle background motion, others have isolated shadows, some have an abandoned object and others have pedestrians that stop for a short while and then move away. These videos are fairly easy, but not trivial, to process, and are provided mainly as reference.
Dynamic Background: there are six videos in this category depicting outdoor scenes with strong (parasitic) background motion. Two videos represent boats on shimmering water, two videos show cars passing next to a fountain, and the last two depict pedestrians, cars and trucks passing in front of a tree shaken by the wind.
Camera Jitter: this category contains one indoor and three outdoor videos captured by unstable (e.g., vibrating) cameras. The jitter magnitude varies from one video to another.
Shadow: this category consists of two indoor and four outdoor videos exhibiting strong as well as faint shadows. Some shadows are fairly narrow while others occupy most of the scene. Also, some shadows are cast by moving objects while others are cast by trees and buildings.
Intermittent Object Motion: this category contains six videos with scenarios known for causing "ghosting" artifacts in the detected motion, i.e., objects move, then stop for a short while, after which they start moving again. Some videos include still objects that suddenly start moving, e.g., a parked vehicle driving away, and also abandoned objects. This category is intended for testing how various algorithms adapt to background changes.
Thermal: in this category, five videos (three outdoor and two indoor) have been captured by far-infrared cameras. These videos contain typical thermal artifacts such as heat stamps (e.g., bright spots left on a seat after a person gets up and leaves), heat reflection on floors and windows, and camouflage effects, when a moving object has the same temperature as the surrounding regions.

 

[Table: results of ViBe and ViBe+ on the changedetection.net dataset, part 1]

[Table: results of ViBe and ViBe+ on the changedetection.net dataset, part 2]

 

I would like to thank Prof. M. Van Droogenbroeck for taking the time to explain the details of ViBe and ViBe+, and for letting me access and test the ViBe SDK. ViBe is a patented technology and requires a license for commercial usage. For more information about ViBe, please refer to the ViBe website at www.vibeinmotion.com.

The description of the ViBe algorithm is extracted from the following paper: O. Barnich and M. Van Droogenbroeck. ViBe: A universal background subtraction algorithm for video sequences. IEEE Transactions on Image Processing, 20(6):1709-1724, June 2011.

The description of the ViBe+ algorithm is extracted from the following paper: M. Van Droogenbroeck and O. Paquot. Background Subtraction: Experiments and Improvements for ViBe. Change Detection Workshop (CDW), Providence, Rhode Island, June 2012.

Both papers can be downloaded at the following web address: www.vibeinmotion.com/Product/References.aspx
