The main innovation brought by Viola and Jones was not to
analyze the image itself directly, but only certain rectangular
"features" in it. These features were inspired by an analogy
with the analysis of complex waveforms using the orthogonal
system of Haar functions, and are known as "Haar-like
features" after the mathematician Alfréd Haar.
First, if a color image is analyzed, it is converted to
grayscale, keeping only the brightness/luminance levels; the
color information is discarded.
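The conversion to gray levels can be sketched in a few lines; the luminance weights below are the standard ones OpenCV applies in cv2.cvtColor with COLOR_BGR2GRAY, and the tiny test image is only illustrative:

```python
import numpy as np

# Standard ITU-R BT.601 luminance weights, the same ones OpenCV uses in
# cv2.cvtColor(img, cv2.COLOR_BGR2GRAY); OpenCV stores channels as BGR.
def bgr_to_gray(img):
    b = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    r = img[..., 2].astype(np.float64)
    return np.round(0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)

img = np.zeros((2, 2, 3), dtype=np.uint8)   # tiny placeholder image
img[0, 0] = (255, 255, 255)                 # one white pixel
gray = bgr_to_gray(img)
print(gray[0, 0], gray.shape)               # 255 (2, 2)
```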
Fig. 10. Haar classifier structure[12]
Finally, Viola and Jones use 38 levels for their cascade
of classifiers [9], with the first five levels using 2, 10, 25,
25 and 50 features respectively, and more than 6,000 features
over all levels combined. The number of features for each
level was determined empirically, by trial and error for the
first few levels. The criteria were to keep false positives
below a certain threshold at each level while maintaining a
high rate of correct recognition (a very low rate of false
negatives). Features were added to each level in stages until
the proposed performance criteria were met at that level.
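The per-level criteria compound across the cascade: if every level keeps a detection rate d and a false-positive rate f, a K-level cascade achieves roughly d^K and f^K overall. A toy calculation (the rates below are illustrative, not the values Viola and Jones report):

```python
# Per-level targets multiply through an attentional cascade: K levels with
# detection rate d and false-positive rate f per level give d**K and f**K
# overall. Illustrative numbers only.
K = 38          # number of cascade levels
d = 0.999       # per-level detection rate (very low false-negative rate)
f = 0.7         # per-level false-positive rate

overall_d = d ** K
overall_f = f ** K
print(round(overall_d, 3))        # ~0.963: still a high detection rate
print(overall_f < 1e-5)           # True: false positives virtually eliminated
```

This is why each level only needs to be modestly selective: even a per-level false-positive rate of 70% shrinks to a negligible overall rate after 38 levels.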
Figure 10 describes the structure used to generate a
Haar-type classifier: it requires a set of positive images,
which depict the object to be identified, and a set of
negative images, which must have a totally different structure
from the positive ones. For example, to generate a Haar
classifier for traffic signs, the positive set consisted of
pictures of the respective traffic sign, while the negative
set consisted of images of what would appear in the background
of signs, that is, images of nature and the city.
Fig. 11. Selecting region of interest for positive set in MatLab
To generate the classifier I used Matlab's Computer Vision
System Toolbox package. It allowed uploading pictures for the
positive set and selecting the region of interest within them.
It is desirable to use a larger number of pictures for the
positive set, as this produces a higher detection rate.
After the regions of interest have been selected, Matlab
provides a table of the pictures used and their respective
regions of interest, to be used later when generating the
classifier. A large set of negative image samples must also
be provided, from which the function generates negative
patterns. To achieve the highest possible detection accuracy,
the number of stages should be as large as possible: with
each stage, the detector removes more false positives and
the correct recognition rate improves. [12]
Negative samples are not specified explicitly. Instead, the
trainCascadeObjectDetector function automatically generates
negative samples from the user-supplied negative images, which
must not contain objects of interest.
Before training each new stage, the function runs the
detector consisting of the stages trained so far on the
negative images. Any object detected in these images is a
false positive, and these false positives are used as negative
samples for the next stage. In this way, each new stage of the
cascade is trained to correct the mistakes made by earlier
stages. [12]
Sometimes classifier training may terminate early: for
example, training may stop after seven stages even if we set
the number of stages to 20, because the function could not
generate enough negative samples. If we then run the function
again with the number of stages set to seven, we do not get
the same result, because the positive and negative samples for
each stage are recalculated for the new number of stages.
Upon completion of training, MATLAB returns an XML file
containing the trained cascade parameters used by the
detection algorithm. This file is loaded into the program as
a cv2.CascadeClassifier object, with the path of the generated
".xml" file passed as a parameter.
C. Testing the generated classifier
After generating the classifier, it was loaded on a Rasp-
berry Pi and tested in a traffic sign detection application.
Fig. 12. Selecting region of interest for positive set in MatLab
The number of frames per second in this case, while using
only one classifier to detect a single region of interest,
was around 2 FPS.
D. Testing with multiple classifiers
After the test with one classifier, I decided to try to
detect more traffic signs using several Haar cascade
classifiers, each one trained independently with a different
set of positive samples but the same set of negative samples.
Fig. 13. Selecting region of interest for positive set in MatLab
In this test, the window size was set to a lower value than
in the first test. Even so, the program managed to detect both
regions of interest at the same frame rate.
Further tests showed that the more cascade classifiers are
loaded, the lower the frame rate gets. Another test for this
case used six different classifiers to detect six regions of
interest; it managed to detect all six properly, but at a cost
of 0.5 frames per second.
This is due to the low processing power of the Raspberry
Pi 2 B+ and the large resolution of the input image. It can
be solved, or at least improved, by using multi-threading to
read the input stream from the camera in a separate
thread/core and keep the camera input on a high-FPS stream.
By using threads to improve the FPS, the I/O latency
decreases as the FPS increases.
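The threaded-capture idea can be sketched as below; read_frame is a stub standing in for cv2.VideoCapture.read(), and the class is a simplified version of the pattern implemented by imutils' VideoStream:

```python
import itertools
import threading
import time

# A daemon thread keeps polling the camera so the main loop never blocks on
# I/O; read() just hands back the most recent frame. The frame source here is
# a stub (a counter) standing in for cv2.VideoCapture.read().
class ThreadedStream:
    def __init__(self, read_frame):
        self.read_frame = read_frame
        self.frame = None
        self.stopped = False
        threading.Thread(target=self._update, daemon=True).start()

    def _update(self):
        while not self.stopped:
            self.frame = self.read_frame()   # the blocking I/O lives here

    def read(self):
        return self.frame                    # returns instantly, no blocking

    def stop(self):
        self.stopped = True

frames = itertools.count()                   # endless fake frame source
stream = ThreadedStream(lambda: next(frames))
time.sleep(0.05)                             # give the thread time to run
print(stream.read() is not None)             # True
stream.stop()
```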
E. Performance benchmark for OpenCV
To conclude this experiment, we have to test and see the
actual results; for testing I used the imutils library for
Python. [13]
$ python fps_demo.py
[INFO] sampling frames from webcam...
[INFO] elapsed time: 7.31
[INFO] approx. FPS: 13.68
[INFO] sampling THREADED frames from webcam...
[INFO] elapsed time: 3.79
[INFO] approx. FPS: 26.12
As we can see from this experiment, the initial frame rate,
which was around 14 frames per second, increased by 90.93%
simply by using threaded I/O for the camera input stream.
In our project we also want to use the Raspberry Pi's
screen to show the output of the image processing, so the
cv2.imshow function will be used. This changes the overall
behavior of the program, since cv2.imshow is just another
data stream, in this case a video stream, a form of I/O;
only instead of reading a frame from the video stream, it
sends a frame out to the display.
To test this, we can run the benchmark again with --display 1:
$ python fps_demo.py --display 1
[INFO] sampling frames from webcam...
[INFO] elapsed time: 10.17
[INFO] approx. FPS: 9.83
[INFO] sampling THREADED frames from webcam...
[INFO] elapsed time: 9.65
[INFO] approx. FPS: 10.37
In this case we got only a 5.49% increase in frames per
second, far from the 90.93% increase in the first test.
However, this example also used a cv2.waitKey(1) call,
which is necessary for the frame to be displayed on the
screen.
Overall, the cv2.imshow function is recommended for
debugging the program, but if the final product does not
require it, there is no reason to include it, as it will
decrease the FPS.
F. Testing ORB: Feature matching
"This algorithm was brought up by Ethan Rublee, Vincent
Rabaud, Kurt Konolige and Gary R. Bradski in their paper
ORB: An efficient alternative to SIFT or SURF in 2011."[14]
ORB (Oriented FAST and Rotated BRIEF) is a feature matching
algorithm in OpenCV that uses FAST to detect stable keypoints,
selects the strongest ones, finds their orientation, and
computes descriptors using BRIEF (the coordinates, in pairs or
k-tuples, are rotated according to the measured
orientation). [14]
Fig. 14. ORB – 50 matches
One advantage of ORB is the efficiency of detection and
description on standard CPUs, which makes it a good choice
for the Raspberry Pi's low processing capability. On average,
the ORB detector matches up to 23.50 features in a run time of
15.33 ms, which is good performance and efficiency compared
with other feature detectors/matchers. [15]
In Figure 14 the STOP sign was loaded as an ORB object,
with the optional parameter nfeatures, which specifies the
number of features to detect, set to 50.
For feature detection, ORB approximates the angle in
increments of 2π/30 (12 degrees) and builds a lookup table of
precomputed BRIEF patterns according to the orientation of the
keypoints. For any feature set of n binary tests at locations
(x_i, y_i), a 2×n matrix is defined.
Fig. 15. Brute-Force Matching with ORB Descriptors
In Figure 15 we created a BFMatcher (brute-force
descriptor matcher) object to use for feature matching, where
crossCheck was set to True for better results. Then the
cv2.drawMatches function takes the first picture, set as the
query image, with its ORB keypoints and descriptors; the
second picture, as the training image, with its ORB keypoints
and descriptors; and the number of matches, which for this
example was set to 50, because that is the number of features
that have to match in the application used further on. It then
draws the matches found between the keypoints of one image and
the other.
G. Using Haar cascade with ORB feature matching
The OpenCV Haar feature-based cascade classifier works well
for detecting the object of interest in a simple scene, but it
can produce many false positives if we lower the minNeighbors
parameter (which specifies how many neighbors each candidate
rectangle should have in order to be retained), thus making
the object detection less strict.
1) Case 1: Lower minNeighbors parameter: In this case we
get many false positives, since we are less strict about the
detection of the region of interest, regardless of how well
and how many times the cascade was trained, or how many
positive images were used for it.
2) Case 2: Higher minNeighbors parameter: With this
approach, the number of false positives is lower, but at the
cost of not detecting the positive region of interest at some
points, for example when it is slightly rotated or at certain
distances from the camera. This problem is called a false
negative: the object is within range and clearly visible, but
the program fails to detect the region.
3) Case 3: Lower minNeighbors parameter + ORB: The
solution was to leave the minNeighbors parameter low enough
to detect the objects of interest, along with other
false-positive matches, and then process each detected region
with the ORB feature matching method. Each region is inspected
to make sure it is actually a STOP sign.
There are many techniques available in OpenCV for inspecting
objects (regions), but in this project I used the methods
tested above.
As for the principle of operation, the program first reads
the input stream from the camera. The next step is to load
the Haar cascade into a classifier function
(cv2.CascadeClassifier), as well as to load a picture of the
clean STOP sign into an ORB function, to be used later as a
model for matching features against the regions first detected
by the Haar classifier. Subsequently, we run the classifier on
the input stream from the camera by calling
detectMultiScale on the variable holding the cascade
classifier, with the image parameter (a matrix of type CV_8U
containing an image) set to the camera input stream. Note that
the input stream is converted to grayscale, color being
disregarded, since Haar classifiers work only with grayscale
features (see Fig. 8). The scaleFactor, set to 1.1, specifies
how much the image size is reduced at each image scale;
experiments concluded that 1.1 is the optimal value, as it
detects the object at any scale/distance. The last parameter
of the detection call is minNeighbors, set to 10, which
represents the number of neighbors each candidate rectangle
must have in order to be retained. Next, the program finds
the keypoints and descriptors of the STOP sign loaded as the
feature matching comparison image, passed as an InputArray to
the detectAndCompute method of the ORB object created
earlier. After this preliminary setup, the program loops
through all the objects detected by the Haar cascade
classifier: it extracts each object from the input frame,
finds the keypoints and descriptors of the candidate region,
matches the descriptors, checks whether the threshold for
matched features is met (in this project we used
MATCH_THRESHOLD = 50), and if it is, draws a rectangular box
at its coordinates. [16]
VI. RESULTS
OpenCV 3.2.0 has many dependencies, and it was not easy to
install on an embedded Linux machine like the Raspberry Pi;
however, after building and compiling it, the results were as
expected. The difficulty of recognizing road signs comes from
outdoor lighting conditions, reflections and shadows,
obstacles such as trees, buildings and vehicles, and the
blurring caused by the movement of the vehicle on which the
camera is mounted. Still, the frame rate has been improved by
using a threaded program, and the false-positive appearances
have been removed, or at least lowered to a very low level;
even if the traffic sign has an obstacle in front of it, as
long as the classifier and feature matching algorithm still
manage to match enough features, it will still be detected.
A second problem is the current drawn by the Raspberry Pi,
which needs a considerable amount of current to function;
moreover, the webcam also requires some current, making the
Raspberry a heavy consumer.

As for heat, the board does not go over 50°C, and the
processor load stays around 50%.
VII. CONCLUSIONS
This project has in mind an inexpensive solution for a
mobile platform capable of autonomy by identifying traffic
signs with the lowest false-positive rate. The STOP sign used
in this paper is just an example; the approach can cover more
traffic signs. Note that a Haar cascade can be trained using
all the traffic signs one desires, and the signs can then be
distinguished with the feature matching approach.
For this build, the Linux operating system offers numerous
advantages, besides the fact that it lives on a memory card
and can easily be switched just by changing the card in the
Raspberry, giving the user the ability to use a different
operating system (or, in this case, a Linux with different
configurations). The Python language was used because it works
well with the Raspberry and is easy to write and debug.
Another advantage of using Python was that the program could
be written and tested on a PC, then moved to the Raspberry,
where it would work the same.
VIII. FUTURE WORK
This project focused on the computer vision of the robot,
making it detect objects of interest with no false-positive
appearances and improving its detection speed. The whole
concept of robots with computer vision focuses on achieving
tasks, so that will be the next step for this project: making
the robot identify objects, pick them up with the gripper, and
sort them or place them in marked areas.
REFERENCES
[1] ***, https://ro.wikipedia.org/wiki/Robot
[2] ***, https://ro.wikipedia.org/wiki/Robot_industrial
[3] Alexandru Năstase, "MECANICA ROBOȚILOR. Mecanisme manipulatoare
seriale", Galați, 2012, pp. 6-8
[4] Milan Sonka, Vaclav Hlavac, Roger Boyle, Image Processing, Analy-
sis, and Machine Vision, 3rd ed.ISBN: 10: 0-495-24438-4 ISBN: 13:
978-0-495-24428-7
[5] Mohammed Reyad AbuQassem, Simulation and Interfacing of 5 DOF
Educational Robot Arm, Islamic University of Gaza, June 2010
[6] Țârliman Doina, Cercetări privind sistemele de prehensiune
ale roboților industriali acționate cu ajutorul mușchilor
pneumatici, Brașov, 2014.
[7] ***, http://www.roroid.ro/un-pic-mai-multe-despre-comunicarea-
seriala/
[8] Gigel Măceșanu, Tiberiu T. Cociaș, Sisteme de Vedere Artificială,
Editura Universității TRANSILVANIA din Brașov, 2016
[9] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted
Cascade of Simple Features," Proc. of the 2001 IEEE Computer
Society Conf. on Computer Vision and Pattern Recognition, Vol. 1,
Kauai, USA, 2001, pp. 511–518.
[10] R. Lienhart and J. Maydt, “An Extended Set of Haar-like Features for
Rapid Object Detection,” Proc. of the 2002 Inter. Conf. on In Image
Processing ICIP, New York, USA, 2002, pp. 900–903.
[11] Alexandru Mihail ITU, GRASPING STRATEGIES CONTRIBU-
TIONS IN REAL AND VIRTUAL ENVIRONMENTS USING A
THREE FINGERED ANTHROPOMORPHIC GRIPPER, Brașov,
2010
[12] MatLab, http://www.mathworks.com/help/vision/ug/train-a-cascade-
object-detector.html
[13] http://www.pyimagesearch.com/2015/12/21/increasing-webcam-fps-
with-python-and-opencv/
[14] Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski, "ORB:
An Efficient Alternative to SIFT or SURF," ICCV 2011, pp. 2564–2571.
[15] Frazer K. Noble, Comparison of OpenCV's Feature Detectors and
Feature Matchers
[16] Ross D Milligan, Road sign detection using OpenCV ORB, rdmilli-
gan.wordpress.com
