
Person Tracking in Video Surveillance
Systems Using Kalman Filter

C. Suliman1, C. Cruceru1, G. Macesanu1, F. Moldoveanu1
1Transylvania University of Brasov
caius.suliman@unitbv.ro, [anonimizat], gigel.macesanu@unitbv.ro, [anonimizat]

Abstract - In this paper we develop a Simulink-based model for monitoring a contact in a video surveillance sequence. To correctly identify a contact in a surveillance video, we use the Horn-Schunck optical flow algorithm. The position and the behavior of the correctly detected contact are monitored with the help of the traditional Kalman filter. We compare the results obtained from the optical flow with those obtained from the Kalman filter, and we show the correct functionality of the Kalman filter based tracking. The tests were performed on video data acquired with a fixed camera. The tested algorithm has shown promising results.
I. INTRODUCTION
The problem of using vision to track and understand the behavior of humans is a very important one. Its main applications are in the areas of human-robot interaction [7], robot learning, and video surveillance.
Here we focus our attention on video surveillance systems. Providing a high level of security in public places is an extremely complex challenge. A number of technologies can be applied to various aspects of security, including biometric systems, screening systems, and video surveillance systems. Nowadays video surveillance systems act as large-scale video recorders, analog or digital. These systems serve two main purposes: to provide a human operator with images for detecting and reacting to potential threats, and to record footage for future investigative purposes.
From the perspective of real-time detection, it is well known that a human operator's visual attention drops below acceptable levels, even if the operator is trained in the task of visual monitoring. Video analysis technologies can be applied to develop smart surveillance systems that aid the operator in the detection and investigation tasks.
For surveillance applications, the tracking problem is a fundamental component. In video surveillance, one of the most widely used methods for tracking contacts is the particle filter [8][10][11][13]. Another well-known method in the research community is the traditional Kalman filter [9]. In many cases the use of this type of filter is sufficient, owing to the controlled indoor and outdoor environments used in the studies.
Many papers in the literature detail methods that track single persons only [6][10], but there are also many authors that describe methods for the detection and tracking of multiple persons [2][3][5][11]. Most of these methods are tested in indoor environments [1][3][8][13] as well as outdoor environments [2][5][8][9], where they are applied to track groups.
The objective of this paper is the development of a video surveillance system capable of tracking a person in an outdoor environment. In Section II we describe the structure of the proposed video surveillance system. In Section III we present the method used for contact detection and the method used for the extraction of useful data from the video feed. Section IV describes the Kalman filter algorithm applied in our case. In Sections V and VI we present the results obtained from the Simulink model's simulation, the conclusions drawn from this study, and possible future developments.
II. SURVEILLANCE SYSTEM STRUCTURE
In this paper we examine the feasibility of using the optical flow algorithm in conjunction with the Kalman filter algorithm [9][12] for tracking a contact in a surveillance scene. In order to create an algorithm that is able to track a contact in a scene, three different large-scale tasks must be accomplished (see Figure 1). First, the algorithm needs to take an incoming surveillance video signal and segment it into a stream of frames in which contacts are distinguished from the background of the scene. The next step is the tracking of the contact throughout the video sequence. Finally, the resulting track must be processed in order to analyze the contact's behavior.

Figure 1. The surveillance system structure.

For the segmentation process of the incoming video signal, the optical flow algorithm developed by Horn and Schunck was used [4]. The optical flow algorithm approximates the movement of the contact in the current frame relative to the previous frame. By determining the motion of objects, one can distinguish between the contact and the background of the scene. After careful tuning and processing, the output of the segmentation process is passed to the Kalman filter algorithm for further processing.
The Kalman filter is a recursive, adaptive filter that operates in the state space. It is well known for its ability to track objects in a timely and accurate manner. The tracking algorithm developed in this paper is able to process one contact at a time.
III. OPTICAL FLOW ANALYSIS
One of the important blocks presented in the above scheme is the so-called optical flow analysis block. Its main purpose is to determine the existence of possible contacts in the incoming video signal and to process them in such a manner that the Kalman filter will be able to track them with minimal error.
Figure 2 presents the main components of the optical flow analysis block.

Figure 2. The optical flow analysis block.

In the following we describe the functionality of each component block.
A. Segmentation
In our case, the term segmentation describes the process through which the video signal passes to become a series of binary images. At the output of this sub-block, each of the resulting binary images contains black and white areas. The black areas correspond to the portions of the frame where no motion was detected, and the white areas correspond to the portions where motion was detected.
The surveillance system was developed in Matlab's Simulink. At first, the incoming video signal is coded in the RGB color space. Because we use the optical flow to detect motion, the video signal needs to be converted to the intensity color space (see Figure 4). To estimate the optical flow between two images we use the algorithm developed by Horn and Schunck. In our case this algorithm computes the optical flow between the current frame and the previous one; this is one of the tunable parameters used in our experiments. Another important tunable parameter used in the optical flow estimation is the smoothness factor. This parameter is defined as a constraint which controls how smoothly the velocity field of the brightness pattern varies throughout the image. The Horn-Schunck algorithm quantifies the smoothness of the velocity field using the magnitude of the gradient of the optical flow velocity, defined as:

\[
\left(\frac{\partial u}{\partial x}\right)^{2} + \left(\frac{\partial u}{\partial y}\right)^{2}
\quad \text{and} \quad
\left(\frac{\partial v}{\partial x}\right)^{2} + \left(\frac{\partial v}{\partial y}\right)^{2},
\qquad (1)
\]
where $u$ and $v$ are the velocity vectors corresponding to the
optical flow. A small value for this gradient indicates that the vector field is very smooth; a higher value indicates the contrary. A smooth vector field tends to zero out regions where no motion is detected, leaving only limited areas of non-zero vectors. In the Simulink model of our surveillance system, the smoothness factor is inversely proportional to the magnitude of the velocity gradients. Our experiments showed that the optimal value for the smoothness factor is 0.6.

Figure 3. The segmentation process.

Before the processed video signal exits the segmentation sub-block, it is compared with a threshold in order to keep only the regions of interest from the video feed.
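To make this step concrete, the following is a minimal sketch, not the authors' Simulink implementation, of Horn-Schunck flow estimation followed by thresholding of the flow magnitude into a binary motion mask. The iteration count, derivative kernels, and threshold value are illustrative assumptions; only the smoothness factor of 0.6 comes from the text.

```python
import numpy as np
from scipy.ndimage import convolve

# Averaging kernel used for the neighbourhood mean of the flow field.
AVG_KERNEL = np.array([[1/12, 1/6, 1/12],
                       [1/6,  0.0, 1/6],
                       [1/12, 1/6, 1/12]])

def horn_schunck(prev, curr, alpha=0.6, n_iter=50):
    """Minimal Horn-Schunck optical flow between two grayscale frames.

    alpha is the smoothness weight (0.6 follows the value reported above);
    n_iter is an assumed iteration count.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)

    # Spatial and temporal image derivatives (simple finite differences).
    kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) * 0.25
    ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) * 0.25
    kt = np.ones((2, 2)) * 0.25
    fx = convolve(prev, kx) + convolve(curr, kx)
    fy = convolve(prev, ky) + convolve(curr, ky)
    ft = convolve(curr, kt) - convolve(prev, kt)

    u = np.zeros_like(prev)
    v = np.zeros_like(prev)
    for _ in range(n_iter):
        u_avg = convolve(u, AVG_KERNEL)
        v_avg = convolve(v, AVG_KERNEL)
        # Jacobi-style update from the Horn-Schunck Euler-Lagrange equations.
        num = fx * u_avg + fy * v_avg + ft
        den = alpha ** 2 + fx ** 2 + fy ** 2
        u = u_avg - fx * num / den
        v = v_avg - fy * num / den
    return u, v

def segment_motion(prev, curr, threshold=0.5):
    """Binary mask: white where the flow magnitude exceeds the threshold."""
    u, v = horn_schunck(prev, curr)
    return np.hypot(u, v) > threshold
```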
B. Median Filtering
One of the biggest problems of the optical flow is that it is very sensitive to changes in illumination and to the quality of the video. This sensitivity leads to erroneous blobs appearing in individual frames. If these blobs are large, that is, if they approach the average size of a real person, they can create problems for the subsequent morphological operations.

Figure 4. Median filtering.

One of the main reasons for choosing the median filter is that most of these abnormalities, the erroneous blobs, appear in single frames and do not appear again for several more frames. The median filter is used to decrease the effect of these abnormalities while still preserving the information of the correctly detected contacts.
In Figure 3 it can be seen that the segmented image contains many abnormalities. After the median filtering, many of these abnormalities are gone (see Figure 4).
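As a rough illustration of this sub-block, the sketch below applies a two-dimensional median filter to the binary mask produced by the segmentation step; the 3x3 neighbourhood size is an assumption, not a value reported in the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def suppress_spurious_blobs(mask: np.ndarray, size: int = 3) -> np.ndarray:
    """Median-filter the binary motion mask to remove isolated erroneous blobs.

    Blobs occupying only a few pixels are voted out by the median, while
    larger, persistent blobs survive largely unchanged.
    """
    return median_filter(mask.astype(np.uint8), size=size).astype(bool)
```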
C. Morphological Operations
The morphological operations process the video signal coming from the output of the median filtering sub-block in such a way that all erroneous blobs residing in the image are eliminated and only the correctly detected blob is maintained and classified as a real contact. The main morphological operations used in this study are erosion and dilation. Optimal erosion is achieved when the structuring element keeps at least a remnant of a blob for all correct contacts. If we use a sub-optimal structuring element for the erosion and dilation operations, a valid contact could be lost completely, or an erroneous blob could be tracked. Both of these errors can produce significant barriers to optimal dilation and to Kalman filter tracking. Optimal dilation is obtained when the structuring element merges all remnants of a single blob into one contact. If a sub-optimal structuring element is used for dilation, one contact could be viewed as multiple contacts or multiple contacts could be viewed as one contact. After an optimal structuring element for erosion was determined, each frame was eroded using the chosen structuring element. Determination of the optimal structuring element for dilation was similar to that for erosion. Each frame was dilated with a square structuring element.
An infinite number of possibilities exist for the size and shape of structuring elements. Depending on the data used, the size and shape of the optimal structuring element can vary significantly.

Figure 5. Morphological operations.

Comparing the morphologically altered image to the image resulting after the median filtering (see Figure 5) shows that the erosion operation removed the remaining erroneous blobs residing in the image, thus deciding that they were not contacts. The subsequent dilation created a solid blob out of the area where motion was detected, and this blob will be tracked as a contact.
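A possible rendering of the erosion and dilation step in Python, using square structuring elements as described above; the element sizes are illustrative assumptions since, as noted, their optimal values depend on the data.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def clean_mask(mask: np.ndarray, erode_size: int = 3, dilate_size: int = 7) -> np.ndarray:
    """Erode then dilate the binary mask with square structuring elements."""
    erode_se = np.ones((erode_size, erode_size), dtype=bool)
    dilate_se = np.ones((dilate_size, dilate_size), dtype=bool)
    eroded = binary_erosion(mask, structure=erode_se)    # remove remaining spurious blobs
    return binary_dilation(eroded, structure=dilate_se)  # merge remnants into one solid blob
```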
D. Blob Analysis
The main functionality of the blob analysis sub-block is to determine the minimum size of a blob and the maximum number of blobs that will be used in the Kalman tracking process and in the visualization step. By setting the minimum blob size we obtain a new level of protection against abnormalities, since a blob must reach this size in order to be correctly tracked; any blob that does not fulfill this condition will not be tracked.
The other tunable parameter of the blob analysis sub-block, the maximum number of blobs, sets the number of Kalman filters to be used in the tracking process. In our case this parameter was set to 1.
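A sketch of the blob analysis sub-block under the same assumptions: connected components are labelled, blobs below a minimum area are rejected, and at most max_blobs bounding boxes are passed on. The area threshold of 200 pixels is a hypothetical value; the maximum of one blob matches the setting used here.

```python
import numpy as np
from scipy.ndimage import label, find_objects

def blob_analysis(mask: np.ndarray, min_blob_area: int = 200, max_blobs: int = 1):
    """Return (area, bounding box, centroid) for the largest valid blobs."""
    labeled, _ = label(mask)
    blobs = []
    for idx, sl in enumerate(find_objects(labeled), start=1):
        area = int((labeled[sl] == idx).sum())
        if area < min_blob_area:
            continue  # too small to be a real contact
        y0, y1 = sl[0].start, sl[0].stop
        x0, x1 = sl[1].start, sl[1].stop
        centroid = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
        blobs.append((area, (x0, y0, x1, y1), centroid))
    # Keep only the largest blobs, up to the configured maximum (1 in our case).
    blobs.sort(key=lambda b: b[0], reverse=True)
    return blobs[:max_blobs]
```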
IV. KALMAN FILTERING
Filtering is a widely used technique in engineering and embedded systems. A good filtering algorithm can reduce the noise in signals while retaining the useful information. The Kalman filter is a mathematical tool that can estimate the variables of a wide range of processes; it estimates the states of linear systems. This type of filter works very well in practice, which is why it is often implemented in embedded control systems where an accurate estimate of the process variables is needed.
The discrete Kalman filter is characterized by both a process model and a measurement equation.
The process model is based on the assumption that the present state, $x_k$, can be related to the past state, $x_{k-1}$, as follows:

\[
x_k = \Phi_k x_{k-1} + w_k, \qquad (2)
\]

where $w_k$ is assumed to be a discrete, white, zero-mean process noise with known covariance matrix $Q_k$, and $\Phi_k$ represents the state transition matrix, which determines the relationship between the present state and the previous one.
In our case we try to track the state of a contact based on its last known state. Here, the state vector consists of a two-dimensional position expressed in Cartesian coordinates, a two-dimensional velocity, and a two-dimensional acceleration. By assuming a constant acceleration, the state transition matrix can be determined from the basic kinematic equations as follows:

\[
s_k = s_{k-1} + v_{k-1} t + \tfrac{1}{2} a_{k-1} t^2, \qquad (3)
\]
\[
v_k = v_{k-1} + a_{k-1} t, \qquad (4)
\]
\[
a_k = a_{k-1}, \qquad (5)
\]

where $s$ is the contact's position, $v$ is its velocity, $a$ is the contact's acceleration, and $t$ is the sampling period. In matrix form, the above equations can be written as:
\[
\begin{bmatrix} s_{x,k} \\ s_{y,k} \\ v_{x,k} \\ v_{y,k} \\ a_{x,k} \\ a_{y,k} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 1 & 0 & 0.5 & 0 \\
0 & 1 & 0 & 1 & 0 & 0.5 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} s_{x,k-1} \\ s_{y,k-1} \\ v_{x,k-1} \\ v_{y,k-1} \\ a_{x,k-1} \\ a_{y,k-1} \end{bmatrix}.
\qquad (6)
\]

Here, the subscripts $x$ and $y$ refer to the direction of the contact's position, velocity, and acceleration in the two-dimensional plane. The value of the sampling period is set to 1. From the above equation, the state transition matrix, $\Phi_k$, is:

\[
\Phi_k =
\begin{bmatrix}
1 & 0 & 1 & 0 & 0.5 & 0 \\
0 & 1 & 0 & 1 & 0 & 0.5 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\qquad (7)
\]

The measurement equation is defined as:

\[
z_k = H_k x_k + v_k, \qquad (8)
\]

where $z_k$ represents the measurement vector and $v_k$ is assumed to be a discrete, white, zero-mean measurement noise with known covariance matrix $R_k$. The matrix $H_k$ describes the relationship between the measurement vector, $z_k$, and the state vector, $x_k$. Given that the state vector is of length six and the measurement vector is of length two, the matrix $H_k$ must be of size two by six:

\[
H_k =
\begin{bmatrix}
1 & 0 & 1 & 0 & 0.5 & 0 \\
0 & 1 & 0 & 1 & 0 & 0.5
\end{bmatrix}.
\qquad (9)
\]

From the process model and the measurement equation it follows that the Kalman filter attempts to improve the prior state estimate using the incoming measurement, which has been corrupted by noise. This improvement is achieved by linearly blending the prior state estimate, $\hat{x}_k^-$, with the noisy measurement, $z_k$:

\[
\hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H_k \hat{x}_k^- \right). \qquad (10)
\]

Here $\hat{x}_k^-$ denotes the a priori estimate and $K_k$ is known as the blending factor. The minimum mean squared error of the estimate is obtained when the blending factor takes the value of the Kalman gain:

\[
K_k = P_k^- H_k^T \left( H_k P_k^- H_k^T + R_k \right)^{-1}, \qquad (11)
\]

where $P_k$ is known as the state covariance matrix. Generally the state covariance matrix is a diagonal matrix. The state covariance matrix is determined from the a priori state covariance matrix as follows:

\[
P_k = \left( I - K_k H_k \right) P_k^-. \qquad (12)
\]
After the Kalman gain has been computed, and the state and state error covariance matrices have been updated, the Kalman filter makes projections for the next value of $k$. These projections will be used as the a priori estimates during the processing of the next frame of data:

\[
\hat{x}_{k+1}^- = \Phi_k \hat{x}_k, \qquad (13)
\]
\[
P_{k+1}^- = \Phi_k P_k \Phi_k^T + Q_k. \qquad (14)
\]

The above equations are the projection equations for the state estimate and for the state covariance matrix. Figure 6 presents the Kalman filter in diagram form.
The main role of the Kalman filtering block is to assign a tracking filter to each of the measurements entering the system from the optical flow analysis block. For an easy implementation of the Kalman filter in Simulink, we wrote an embedded Matlab function. This method is often used when the function that needs to be implemented is easier to express in Matlab's symbolic language than in Simulink's graphical language.
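The embedded Matlab function itself is not reproduced here; the following Python/NumPy sketch implements the same discrete Kalman filter, Eqs. (10)-(14), with the state transition and measurement matrices of Eqs. (7) and (9). The noise covariances Q and R are assumed diagonal with hypothetical magnitudes, since their numerical values are not given in the text.

```python
import numpy as np

# State: [s_x, s_y, v_x, v_y, a_x, a_y], sampling period t = 1.
PHI = np.array([[1, 0, 1, 0, 0.5, 0  ],
                [0, 1, 0, 1, 0,   0.5],
                [0, 0, 1, 0, 1,   0  ],
                [0, 0, 0, 1, 0,   1  ],
                [0, 0, 0, 0, 1,   0  ],
                [0, 0, 0, 0, 0,   1  ]], dtype=float)   # Eq. (7)
H = np.array([[1, 0, 1, 0, 0.5, 0  ],
              [0, 1, 0, 1, 0,   0.5]], dtype=float)     # Eq. (9)

class ContactTracker:
    """Discrete Kalman filter for one contact (assumed Q, R values)."""

    def __init__(self, q: float = 0.01, r: float = 1.0):
        self.x = np.zeros(6)        # a priori state estimate
        self.P = np.eye(6)          # a priori state covariance
        self.Q = q * np.eye(6)      # process noise covariance (assumed)
        self.R = r * np.eye(2)      # measurement noise covariance (assumed)

    def update(self, z: np.ndarray) -> np.ndarray:
        """One measurement update followed by the projection to frame k+1."""
        # Kalman gain, Eq. (11).
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        # Blend the prior estimate with the noisy measurement, Eq. (10).
        x_post = self.x + K @ (z - H @ self.x)
        # Update the state covariance, Eq. (12).
        P_post = (np.eye(6) - K @ H) @ self.P
        # Project ahead for the next frame, Eqs. (13)-(14).
        self.x = PHI @ x_post
        self.P = PHI @ P_post @ PHI.T + self.Q
        return x_post[:2]           # estimated (x, y) position of the contact

# Usage: feed the blob centroid from each frame as the measurement z, e.g.
# tracker = ContactTracker(); position = tracker.update(np.array([120.0, 85.0]))
```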

Figure 6. The Kalman filter algorithm.
V. POST PROCESSING
The last block that we will discuss is the post processing/video output block. This block was used to process the output from the optical flow analysis block and the output of the Kalman filtering block.
The post processing block is composed of three video output sub-blocks. The first sub-block is used only to view the original video signal.
The second sub-block is used to visualize the resulting signal from the optical flow analysis block and to allow the user to verify the correct functionality of the optical flow analysis block. This sub-block is in direct connection with the blob analysis sub-block, which produces the coordinates for a bounding box. This bounding box is a rectangle drawn around each correctly detected blob. The user is able to watch in real time which contact in the video feed is being sent to the Kalman filtering block. If rectangles do not surround the correctly detected contacts in an image, the optical flow analysis block is not working properly. Figure 7 presents the resulting output of the optical flow video viewer sub-block. We present only five frames from the sequence, which was captured at 17.5 FPS. It can be clearly seen that a contact was detected and a bounding box was correctly superimposed on the contact.

Figure 7. Output sample from the optical flow video viewer.

The last sub-block discussed in this section is the Kalman filtering video viewer. This sub-block is in direct connection with the Kalman filtering block, which produces at its output a matrix containing the position of the detected contact. The output is used by the Kalman filtering video viewer to draw markers in the video; these markers are represented here by red circles. The user is able to see in real time whether the detected contact is correctly tracked by the Kalman filter. If a marker does not follow a contact consistently, we can say that the Kalman filtering block is not working properly. Figure 8a presents five frames resulting from the Kalman filtering video viewer, taken as in the previous case from the 17.5 FPS sequence. We can clearly see that the marker is correctly tracking the detected contact, thus confirming that the Kalman filter is working properly. The Kalman filter is even capable of tracking a contact that leaves the camera's field of view for some time and of correctly reassigning the marker when the contact reenters the field of view (see Figure 8b).

Figure 8. Output sample from the Kalman filtering video viewer. (a) Tracking a contact that passes through the camera's field of view; (b) contact exiting and reentering the camera's field of view.
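For completeness, a small sketch of the overlay step performed by the two viewers described above: a rectangle around the blob reported by the blob analysis and a red circular marker at the Kalman-filtered position. The use of OpenCV drawing calls and the green rectangle colour are assumptions; in the paper these viewers are Simulink video output blocks.

```python
import cv2
import numpy as np

def draw_overlays(frame: np.ndarray, bbox, kalman_xy) -> np.ndarray:
    """Draw the blob bounding box and the Kalman position marker on a BGR frame.

    bbox is (x0, y0, x1, y1) from the blob analysis; kalman_xy is the filtered
    (x, y) position of the contact.
    """
    out = frame.copy()
    x0, y0, x1, y1 = map(int, bbox)
    cv2.rectangle(out, (x0, y0), (x1, y1), (0, 255, 0), 2)  # optical flow viewer box
    cx, cy = map(int, kalman_xy)
    cv2.circle(out, (cx, cy), 6, (0, 0, 255), -1)           # red Kalman marker
    return out
```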

VI. CONCLUSIONS AND FUTURE WORK
There are two main factors that affect the tracking problem: the accuracy with which contacts passing through the scene are distinguished, and the speed with which the video feed is processed in real time. In this paper we have shown that, with the help of the optical flow and Kalman filter algorithms, it is possible to detect and track a person passing through a scene.
The video signal used in our experiments is provided by a Linksys WVC200 PTZ IP video camera at a resolution of 240×320. The entire experiment was conducted on an Intel Core2Duo T9300 computer with 4 GB of RAM.
From the optical flow analysis used in our research we have deduced that there is an inevitable trade-off between accuracy and processing speed. To accurately distinguish a contact that passes through a scene, the computational time of the optical flow algorithm must be increased. If this increase in processing time is too large, the algorithm will not operate in real time.
The Kalman filter algorithm presented in this research was able to correctly process a contact and to correctly assign a filter to the processed contact. After reviewing the results, we concluded that the algorithm performed quite well, showing a moderate consistency in tracking. Given its success with the data used in our experiments, any inconsistencies in the tracking process can be traced back to fluctuations in the performance of the optical flow algorithm.
Future research in the area of surveillance systems should focus on two directions. First, research should be conducted to determine an objective measure of performance for the optical flow algorithm and to see whether other existing algorithms are better suited to accurate, real-time processing of a video signal. Second, research should focus on determining contact behavior, and on areas such as merging contacts into groups or dividing groups into separate contacts.
In future research we will try to implement the presented Kalman filter algorithm in a system that is capable of tracking multiple persons.
ACKNOWLEDGMENT
This paper is supported by the Sectoral Operational Programme Human Resources Development (SOP HRD), financed by the European Social Fund and by the Romanian Government under contract number POSDRU/6/1.5/S/6.
REFERENCES
[1] M. A. Ali, S. Indupalli, B. Boufama, "Tracking Multiple People for Video Surveillance," First Intern. Workshop on Video Processing for Security, June 2006.
[2] B. Benfold, I. Reid, "Guiding Visual Surveillance by Tracking Human Attention," Proc. of the 20th British Machine Vision Conf., September 2009.
[3] L.M. Fuentes, S.A. Velastin, "From tracking to advanced surveillance," Proc. of the Intern. Conf. on Image Processing, vol. 3, pp. 121-124, September 2003.
[4] B.K.P. Horn, B.G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[5] C. C. Hsieh, S. S. Hsu, "A Simple and Fast Surveillance System for Human Tracking and Behavior Analysis," Proc. of the 3rd Intern. IEEE Conf. on Signal-Image Technologies and Internet-Based Systems, pp. 812-818, December 2007.
[6] F. Jean, R. Bergevin, A.B. Albu, "Body tracking in human walk from monocular video sequences," Proc. of the 2nd Canadian Conf. on Computer and Robot Vision, pp. 144-151, May 2005.
[7] N. Koenig, "Toward real-time human detection and tracking in diverse environments," Proc. of the 6th IEEE Intern. Conf. on Development and Learning, pp. 94-98, July 2007.
[8] S. Kong, M.K. Bhuyan, C. Sanderson, B.C. Lovell, "Tracking of Persons for Video Surveillance of Unattended Environments," Proc. of the 19th Intern. Conf. on Pattern Recognition, pp. 1-4, December 2008.
[9] W. Niu, L. Jiao, D. Han, Y. F. Wang, "Real-time multiperson tracking in video surveillance," Proc. of the 4th Pacific Rim Conf. on Multimedia, vol. 2, pp. 1144-1148, December 2003.
[10] A.W. Senior, G. Potamianos, S. Chu, Z. Zhang, and A. Hampapur, "A comparison of multicamera person-tracking algorithms," Proc. IEEE Int. Workshop on Visual Surveillance, May 2006.
[11] J. Wang, Y. Yin, H. Man, "Multiple Human Tracking Using Particle Filter with Gaussian Process Dynamical Model," EURASIP Journal on Image and Video Processing, vol. 2008, Article ID 969456, 10 pages, 2008.
[12] G. Welch, G. Bishop, "An Introduction to the Kalman Filter," Technical Report TR95-041, University of North Carolina, 2006.
[13] J. Yao, J.M. Odobez, "Multi-Camera 3D Person Tracking with Particle Filter in a Surveillance Environment," Proc. of the 16th European Signal Processing Conf., August 2008.
