Hand Gesture Recognition - 1: Background Subtraction

The aim here is to segment the hand region in a video sequence.
Three assumptions are made: the video capture device is stationary,
the background is relatively static, and the hand motion is the
dominant motion in the scene.
The first step in hand gesture recognition is hand region segmentation,
which is performed here using the codebook (CB) background subtraction
technique. This method is computationally efficient, robust to
illumination changes, and can segment the hand region against complex
backgrounds.
The CB algorithm adopts a quantization/clustering technique to model
the background. The samples at each pixel are clustered into a set of
codewords, and the background is encoded on a pixel-by-pixel basis.
Let X be the training sequence for a single pixel, consisting of N RGB
(or YUV or HSV) vectors: X = {x1, x2, ..., xN}.

The training pixels are obtained from the video sequence by capturing
frames for a few seconds while there are no foreground objects in the
scene.

Let C = {c1, c2, ..., cL} represent the codebook for the pixel,
consisting of L codewords.
Each pixel has a different codebook size, depending on its sample
variation.

Each codeword c_i, i = 1, ..., L, consists of a color vector v_i and a
6-tuple aux_i = <Imin_i, Imax_i, f_i, λ_i, p_i, q_i>.
The tuple aux_i contains intensity (brightness) bounds and temporal
variables, described below.

Imin and Imax are the minimum and maximum brightness values recorded
for the codeword.
f is the frequency with which the codeword has occurred.
λ is the maximum interval during the training period in which the
codeword has not occurred (the maximum negative run-length).
p and q are the first and last access times at which the codeword has
occurred.
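
As a rough sketch (the struct and field names below are illustrative,
not taken from the paper or from OpenCV), a codeword can be represented
in C as:

/* One codeword of a pixel's codebook, following Kim et al. */
typedef struct CodeWord {
    float v[3];        /* mean color vector (RGB or YUV) */
    float Imin, Imax;  /* min/max brightness seen for this codeword */
    int   f;           /* frequency: number of times the codeword occurred */
    int   lambda;      /* maximum negative run-length (MNRL) */
    int   p, q;        /* first and last access times */
} CodeWord;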
During the training period, each sample x taken at time t is compared
to the current codebook to determine which codeword, if any, it matches.
If two codewords both provide good matches, the color distortion measure
is used to resolve the ambiguity.
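
The color distortion of Kim et al. measures how far the pixel's color
vector lies from the line through the origin and the codeword's mean
color, which makes the match tolerant to brightness changes along that
line. A minimal C sketch (the function name is ours):

#include <math.h>

/* Distance of color x from the axis spanned by codeword mean v:
   colordist = sqrt(||x||^2 - <x,v>^2 / ||v||^2). */
float colordist( const float x[3], const float v[3] )
{
    float xx = x[0]*x[0] + x[1]*x[1] + x[2]*x[2];
    float vv = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
    float xv = x[0]*v[0] + x[1]*v[1] + x[2]*v[2];
    float p2 = (vv > 0.f) ? (xv * xv) / vv : 0.f;  /* squared projection */
    return sqrtf( fmaxf(xx - p2, 0.f) );
}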

During the training period, a run length is maintained of how long each
codeword has not re-occurred. A codeword that goes unmatched for a long
stretch was most likely generated by a foreground object or by noise,
and is eliminated. The maximum allowed run length is configured by the
parameter λ; a large value of λ for a codeword indicates a foreground
event that was stationary for only a small period.
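
In the paper, this elimination step keeps only codewords whose maximum
negative run-length is at most half the number of training frames N.
A one-line sketch using the CodeWord struct above:

/* Temporal filtering after training: keep a codeword only if it
   re-occurred at least every N/2 frames (Kim et al.). */
int keep_codeword( const CodeWord *cw, int N )
{
    return cw->lambda <= N / 2;
}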

The frequency of occurrence f of each codeword is also recorded. A large
value of f over a period T, together with a small value of λ, indicates
a foreground object that was stationary during the time period T.

A small value of f may indicate noise or a very fast-moving object, and
will be accompanied by a large value of λ.

A small value of λ together with a small value of f indicates a periodic
event occurring in the background: the frequency of occurrence is small,
but the pixel value recurs repeatedly.

Consider the background values p1, p2, p3 of a particular pixel in the
video sequence. The variation in brightness across p1, p2, p3 is
reflected in the variation of the codewords corresponding to those
samples. A low and a high bound can be assigned to the pixel values,
defining the allowable range of values that correspond to the same
codeword. This range is called [I_low, I_high], and all pixel values
lying within it are assigned the same codeword.
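
In Kim et al., these bounds are derived from the brightness extremes
recorded in the codeword, using two parameters alpha (< 1) and beta
(> 1), with typical values around alpha = 0.5 and beta = 1.2. A sketch
(the helper name is ours):

/* Brightness range accepted by a codeword: I_low = alpha * Imax,
   I_high = min(beta * Imax, Imin / alpha)  (Kim et al.). */
void brightness_bounds( const CodeWord *cw, float alpha, float beta,
                        float *I_low, float *I_high )
{
    float hi  = beta * cw->Imax;
    float cap = cw->Imin / alpha;
    *I_low  = alpha * cw->Imax;
    *I_high = (hi < cap) ? hi : cap;
}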
During training, the intensity value of an incoming pixel such as p1 is
compared with the entries in the codebook. If the intensity lies within
the range [I_low, I_high] of some entry, the run length of
non-occurrence for that entry is reset to zero and its frequency count
is incremented. In case of ambiguity between two codewords, the color
distortion measure is used to determine which codeword the pixel is
assigned to.
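
When a sample matches a codeword at time t, the paper updates that
codeword in place; a sketch of the bookkeeping (helper name is ours):

/* Update a matched codeword at time t: blend the mean color, widen
   the brightness range, bump the frequency, and extend the MNRL if
   the gap since the last access was the longest so far (Kim et al.). */
void update_codeword( CodeWord *cw, const float x[3], float I, int t )
{
    int k;
    for( k = 0; k < 3; k++ )
        cw->v[k] = (cw->f * cw->v[k] + x[k]) / (cw->f + 1);
    if( I < cw->Imin ) cw->Imin = I;
    if( I > cw->Imax ) cw->Imax = I;
    if( t - cw->q > cw->lambda ) cw->lambda = t - cw->q;
    cw->f += 1;
    cw->q  = t;
}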

We assume that the training phase has been completed and we have
a codebook associated with each pixel.

For the given pixel, we search its codebook for a codeword that matches
it, i.e. one for which the pixel intensity lies within the bounds
[I_low, I_high] and the color distortion is within the threshold.
If none of the entries in the codebook satisfies these conditions, the
pixel is classified as foreground; if a matching entry is found, it is
classified as a background pixel.
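
Putting the pieces together, the per-pixel foreground test looks roughly
like this (using the sketches above; epsilon is the color distortion
threshold):

/* Background subtraction for one pixel: background if any codeword
   matches in both color and brightness, foreground otherwise. */
int is_background( const float x[3], float I,
                   const CodeWord *book, int L,
                   float epsilon, float alpha, float beta )
{
    int i;
    for( i = 0; i < L; i++ )
    {
        float I_low, I_high;
        brightness_bounds( &book[i], alpha, beta, &I_low, &I_high );
        if( colordist( x, book[i].v ) <= epsilon &&
            I >= I_low && I <= I_high )
            return 1;   /* matched: background */
    }
    return 0;           /* no match: foreground */
}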

Implementation Details
An implementation of this algorithm is available in OpenCV (legacy C API).

CvBGCodeBookModel* model = cvCreateBGCodeBookModel(); - This call
creates the background model.

The fields model->modMin and model->modMax appear to set the tolerances
corresponding to I_low and I_high, and model->cbBounds appears to set
how far a pixel value may deviate from a codeword during learning and
still be considered as belonging to it.
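
The bgfg_codebook.cpp sample sets these per-channel (YUV) thresholds
right after creating the model; the values below are the sample's
defaults:

CvBGCodeBookModel* model = cvCreateBGCodeBookModel();
/* Per-channel thresholds in YUV order (defaults from the sample). */
model->modMin[0] = 3;  model->modMax[0] = 10;
model->modMin[1] = model->modMin[2] = 3;
model->modMax[1] = model->modMax[2] = 10;
model->cbBounds[0] = model->cbBounds[1] = model->cbBounds[2] = 10;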

For the first 100 frames, the method
cvBGCodeBookUpdate( model, yuvImage );
is called to update the background model.

Once the learning has been completed,
cvBGCodeBookClearStale( model, model->t/2 );
is called, where model->t is the total number of frames seen during the
training phase. This call removes codewords that were not seen for more
than 50% of the training period, i.e. codewords produced by
non-stationary pixels.
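
The learning stage therefore looks roughly like this in the sample
(variable names follow the sample; frames are converted to YUV first):

cvCvtColor( rawImage, yuvImage, CV_BGR2YCrCb );    /* codebook works in YUV */
if( nframes - 1 < nframesToLearnBG )
    cvBGCodeBookUpdate( model, yuvImage );         /* accumulate codewords */
if( nframes - 1 == nframesToLearnBG )
    cvBGCodeBookClearStale( model, model->t / 2 ); /* drop stale codewords */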

Once the background model has been obtained,

cvBGCodeBookDiff( model, yuvImage, ImaskCodeBook );
cvSegmentFGMask( ImaskCodeBookCC );

can be called to obtain the foreground image mask in ImaskCodeBook,
where ImaskCodeBookCC is a copy of that mask cleaned up into connected
components by cvSegmentFGMask.
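
In the sample, the two calls are connected by copying the raw mask
before the connected-component cleanup:

/* Per-frame segmentation after learning (as in bgfg_codebook.cpp). */
cvBGCodeBookDiff( model, yuvImage, ImaskCodeBook );  /* raw FG mask */
cvCopy( ImaskCodeBook, ImaskCodeBookCC );
cvSegmentFGMask( ImaskCodeBookCC );                  /* denoised blobs */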

The above code was written using the OpenCV C API with the Qt SDK on
Ubuntu 12.04.

Change the path of the OpenCV libraries in the project's .pro file
according to your environment.

Change nframesToLearnBG, or pass -nframes 100 as an argument, to
configure the number of background learning frames.

Disadvantages
The model performs poorly in the presence of shadows, e.g. in poorly lit
large indoor rooms where shadows may fall on the background.

Enhancements
The same model can be made adaptive by performing updates and cleaning
of the model periodically.
This feature will be explored in future blog posts.

Example code is available in OpenCV at samples/c/bgfg_codebook.cpp.
The code used here is a modified version of that sample, adding
largest-blob selection and ROI selection.

The example program can be found at the link below:
http://sdrv.ms/Kk3pRW

References:
1. Kyungnam Kim, Thanarat H. Chalidabhongse, David Harwood, and Larry
Davis. "Background Modeling and Subtraction by Codebook Construction."
Computer Vision Lab, University of Maryland, College Park, MD 20742,
USA; Faculty of Information Technology, King Mongkut's Institute of
Technology, Thailand.
2. Gary Bradski and Adrian Kaehler. Learning OpenCV: Computer Vision
with the OpenCV Library.