Chapter 1: Introduction
As human beings we encounter many faces in our day to day lives. Whenever we meet somebody new, we remember the particular facial features of that person, rather than the whole face. This helps us recognize faces easily and naturally. The process is carried out automatically by our brain, and it is still largely mysterious to us. The shape of the face, the distance between the eyes, the shape of the nose and mouth, and the relationships between them all help us to distinguish between different faces. If we correctly extract these features, face recognition becomes easy; however, when hairstyle, glasses or image noise change, face recognition becomes harder.
The earliest works on the subject of face recognition were made in the 1950s in psychology. They came attached to other issues such as facial expressions, the interpretation of emotion and the perception of gestures, and engineers only started showing interest in the subject in the 1960s. One of the first institutes researching the subject was Panoramic Research Inc., founded by Woodrow W. Bledsoe. Much of its research was carried out for the U.S. Department of Defense and related agencies, and because of the secrecy surrounding the work only a few papers were published.
Later, private companies and universities became involved in the research and many new approaches were designed and implemented. Sirovich and Kirby efficiently represented human faces using principal component analysis. M.A. Turk and Alex P. Pentland developed a near real-time eigenfaces system for face recognition using the Euclidean distance. Some tried representing a human face as a set of geometric parameters, while others used eigenfaces to locate, track and classify a subject's face. The resulting algorithms are usually grouped into categories such as geometric/template based, piecemeal/holistic, appearance-based/model-based and template/statistical/neural network approaches.
Face recognition is considered one of the most relevant applications of image analysis. Although humans are good at recognizing faces, we cannot deal with a large number of unknown faces, which is why at some point computers will overcome human limitations in face recognition. Many different domains use and are extremely interested in face recognition[23]. These domains include video surveillance, human-machine interaction, photo cameras, virtual reality and even law enforcement. Because of this wide domain of interest, the research is pushed towards a multitude of interested disciplines, including pattern recognition, neural networks, computer graphics, image processing and psychology.
In recent years facial recognition has become a blooming area of research and development, attracting disciplines such as computer vision, neuroscience, psychology and medicine. With the fast development of technology, face recognition has become a reality and it is demonstrated in real-life applications. This fast development can be attributed to two major factors[1]: the active development of new facial recognition algorithms and the availability of large databases of facial images. Such huge databases can nowadays be obtained from law enforcement agencies like the FBI, passport office databases, visas and driver's license photos.
Facial recognition is most useful in domains where high security is needed. Some of these domains are information security (access security, data privacy, user authentication solutions for medical records and databases), access management (secure access management, permission-based systems) and biometrics (person identification, automated identity verification), just to mention a few. Major companies are starting to use these technologies in new, ground-breaking products: Microsoft's Kinect for Xbox (formerly Project Natal) and Sony's PlayStation Eye both use facial recognition. This opens up new ways for the user to interact with the machine. Manufacturers like Toyota are also interested in this technology; they are developing driver drowsiness detectors to increase the safety of their cars.
During the summer of 2011, following a couple of months of prior research, we had the honor of being invited to Debrecen, Hungary, to present a paper at the ETDK on the subject of computer vision systems. During our stay we were invited to visit the National Instruments factory; the company is a market leader in PC-based industrial measurement and automation. After our visit we were given the opportunity to assemble and design a software package for a vision system using some of their more significant products: the NI SB-RIO 9631 prototype robot, the CVS-1454 Compact Vision System and the Basler ScA-640 camera. The NI SB-RIO 9631 prototype robot is based on an FPGA, making it easy to interface with other peripheral devices and allowing high-speed, real-time data processing and transfer. It is a robot designed and built by the engineers at National Instruments for educational purposes and to better illustrate the capabilities of some of their top-of-the-line products. The CVS-1454 Compact Vision System is another outstanding tool from National Instruments. It is, as its name suggests, a compact and rugged vision system designed to withstand even the harshest environments encountered in vision-guided robotics, industrial inspection and OEM vision applications. It uses IEEE 1394 technology, which makes it easy to interface with off-the-shelf cameras. One major advantage is that up to three cameras can be connected to one CVS module, allowing multiple operations to be performed at once. By attaching it to the robot, we can create a fast and reliable high-resolution image processing system. Another essential part of the system is the Basler ScA-640 camera, which provides the good-quality, real-time image stream for the CVS module. The Compact Vision System communicates with the SB-RIO 9631 FPGA through its I/O ports, updating the robot with the results of the image processing. By interconnecting these three devices we can create a high-speed, high-quality image processing system with which fast and accurate robot control can be obtained in a relatively short time. After assembling the hardware, interconnecting all the necessary I/O ports and connecting the camera to the Compact Vision System, we were able to start the development of the software package.
The software itself was developed in the NI LabVIEW programming environment, a highly productive development environment used mostly by engineers and scientists for graphical programming and unprecedented hardware integration to rapidly design and control systems[9]. Using this platform, engineers are able to test systems from small to large and to evaluate and optimize their performance so that they reach their maximum potential. National Instruments LabVIEW is a graphical development environment used to design scalable measurement and control systems[25]. It gained popularity because of its simplicity, high processing speed and flexibility. Thanks to its patented dataflow programming, National Instruments managed to reduce development time while at the same time delivering the flexibility of a powerful programming language. The many modules offered allow us to extend the capabilities of LabVIEW. Using the LabVIEW Real-Time module, the user is able to create reliable and deterministic applications. The LabVIEW FPGA module extends these capabilities so that custom hardware can be designed using NI reconfigurable I/O devices. With the help of either module we are able to compile and download the created LabVIEW code to a dedicated real-time or FPGA target. During our work we made extensive use of two important LabVIEW modules, the Robotics and the Vision modules. Without them the development of the software package would have been much harder, if not impossible. LabVIEW offers a software development platform for anything from autonomous vehicles to robotic arms. The NI Robotics module makes this possible with its built-in connectivity to sensors and actuators and its fundamental algorithms for intelligent operation and robust perception. It is also a valued teaching tool: "it enables students to design and prototype complex robots faster than text-based tools" (Dr. Dennis Hong, Virginia Tech)[9].
We chose to design our software package in LabVIEW for a variety of reasons. It gave us a great opportunity to explore a new way of programming: a graphical language based on blocks that contain parts of code. By logically connecting these together, we can get a fast and reliable system. With the help of National Instruments Debrecen and the University of Debrecen, we were provided with a student license for LabVIEW, which enabled us to use and exploit all of the software's remarkable capabilities. Since LabVIEW is also designed by National Instruments, like the robot and the Compact Vision System, it was an obvious choice for achieving maximum speed and reliability and for minimizing compatibility and communication issues.
By interconnecting the above-mentioned components, a high-speed, high-quality image processing system is obtained, with the help of which complex real-time robot control can be achieved in a relatively short time. After presenting a short history of robotic vision, some state-of-the-art face recognition algorithms, the hardware modules and some basics of the LabVIEW programming environment, the main objective of this paper is to introduce the development of a robot control system based on real-time image processing and to act as a guide for further development and research.
Chapter 2: State of the Art in Face Recognition
Face recognition is considered one of the most relevant applications of image analysis. Although humans are good at recognizing faces, we cannot deal with a large number of unknown faces, which is why at some point computers will overcome human limitations in face recognition. Many different domains use and are extremely interested in face recognition (FR), as we can see in Figure 1. These domains include video surveillance, human-machine interaction, photo cameras, virtual reality and even law enforcement. Because of this wide domain of interest, the research is pushed towards a multitude of interested disciplines, including pattern recognition, neural networks, computer graphics, image processing and psychology.
2.1 History and Development of Face Recognition
The earliest works on the subject were made in the 1950s in psychology[10]. They came attached to other issues like facial expressions, the interpretation of emotion or the perception of gestures. Engineers only started showing interest in FR in the 1960s. One of the first researchers in the domain was Woodrow W. Bledsoe[3], who co-founded Panoramic Research Inc. in California. The majority of the research done by this company was for the U.S. Department of Defense and related agencies. He and his partners, Helen Chan and Charles Bisson, worked on using computers to recognize human faces, but because of the secrecy of the research few papers were published by them. They managed to implement a semi-automated system: some facial coordinates were selected by a human operator and this information was used by the computer for recognition. They immediately ran into the problems that FR still suffers from today: variations in illumination, head rotation, facial expressions and ageing. At Bell Laboratories, A. Jay Goldstein, Leon D. Harmon and Ann B. Lesk described a vector containing 21 subjective features, such as ear protrusion, eyebrow weight or nose length, as the basis for face recognition using pattern classification techniques. Other approaches tried to define the face as a set of geometric parameters and to perform pattern recognition based on these parameters.
The first person to implement a fully automated FR system was Kanade in 1973. He designed an FR program that ran on a computer built especially for this purpose. It was able to extract 16 facial parameters automatically and achieved a correct identification rate between 45% and 75%[19]. He also demonstrated that the system gave better results when irrelevant features were not used. Afterwards a variety of different approaches followed, but most of them continued the previous tendencies. Mark Nixon presented a geometric measurement of eye spacing. The template matching approach was improved using strategies such as "deformable templates". Other researchers tried to build face recognition systems using neural networks. The eigenface approach would become dominant in future face recognition systems[8]. The first mention of this technique was made by L. Sirovich and M. Kirby in 1987. The principle behind their method was Principal Component Analysis. The basic idea was to represent an image in a lower dimension without losing too much information, and then to be able to reconstruct it. This work became the basis for many new face recognition systems. In 1992 Matthew Turk and Alex Pentland from MIT used eigenfaces to locate, track and even classify a subject's head.
Figure 1 – Applications of face recognition [3]
Face recognition has drawn increasing attention over the years, and because of this many new approaches have been taken. Some of the most relevant are Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Linear Discriminant Analysis (LDA). Nowadays many different industries, such as the entertainment business, use FR in their products.
2.2 The Basic Structure of a Face Recognition System
In any face recognition task, the input is always an image or a video stream and the output is an identification or verification of the subject or subjects appearing in the input. Face recognition is often considered a three-step process, as shown in Figure 2; in this case the face detection and feature extraction phases can run simultaneously.
Figure 2 – Structure of a generic face recognition system[3]
Face detection is the process of extracting the face from the rest of the scene, allowing the system to identify a certain region as a face. This can be used in many applications like face tracking, pose estimation or compression. Feature extraction is the extraction of relevant facial features from the data. These features can be face regions, variations or angles that are relevant; this is the phase where facial feature tracking and emotion recognition are done. In the final phase the system recognizes the face from a database. This phase involves a comparison method, a classification algorithm and an accuracy measure. In real-life applications these phases can be merged or new ones can be added based on the system's needs. Because of this we can see many different approaches to face recognition.
2.3 State of the Art Face Detection Algorithms
Modern face recognition applications sometimes do not even require face detection, because the images stored in the database are already normalized. However, the conventional input of a computer system may contain several faces, so the detection phase is mandatory. This is also the case when developing automated tracking systems. When building a face recognition system we can encounter some well-known challenges. These usually arise because the input images are captured in an uncontrolled environment, such as by a surveillance system. The main factors that cause these challenges are[18]:
Pose variation: in real-life applications it is not possible to use images shot from only one angle. This makes recognition harder; moreover, pose variations degrade the performance of the system.
Feature occlusion: the presence of elements like beards, hats, glasses introduces variability.
Facial expression: facial features can become distorted because of different gestures.
Imaging conditions: the use of different cameras and the ambient conditions can also affect the quality of the image.
Another potential problem is connected to face location. Face location is a simplified version of face detection. Methods like finding the head boundaries were first developed for this task and were later exported to more complicated problems. Facial feature location involves locating important facial features like the nose, eyes and ears. Face tracking is a related problem that often builds on face detection.
2.3.1 Problems Encountered, Sources of Error:
The concept of face detection encompasses many sub-problems. Depending on the system, faces can be detected and located at the same time, or the detection can be performed first and, if positive, the face is then located.
Figure 3: Face detection process[3]
Most recognition systems share some common steps; one of these is data dimension reduction, which is done to help the system achieve a faster response time. In some cases pre-processing is possible in order to make the input meet the algorithm's prerequisites. The next step usually involves the extraction of facial features or measurements. Some algorithms even include a learning routine and add new data to their models.
2.3.2 Common Approaches to Face Detection
A clear grouping criterion has not yet been established. There are many ways in which the methods can be grouped, and some of the groups may overlap. In this section two criteria are presented: one differentiates between different scenarios, the other divides the detection algorithms into four categories[15].
Detection depending on the scenario:
Controlled environment: this may be considered the most straightforward case. The input images are taken in a controlled environment with the same lighting, background, etc.
Color images: skin color can be used to detect faces. This can be hard because different lighting can change the apparent color of the skin, and human skin color varies from white to black. Even so, there have been many attempts to build FR systems based on it.
Images in motion: using real-time video as input gives us the opportunity to use motion for face detection. It also allows a different approach, eye blink detection, which has many uses aside from face detection.
Detection methods divided into categories:
One well-accepted classification was presented by Yang, Kriegman and Ahuja[3]. The methods are divided into four categories, but these categories may overlap; because of this, an algorithm can belong to two or more categories:
Knowledge-based methods: rule-based methods that encode our knowledge of human faces.
Feature-invariant methods: these try to find invariant features of a face, features that do not depend on the angle or position of the face.
Template matching: it compares the input image with stored patterns of face features.
Appearance-based methods: a template-matching-based approach whose pattern database is learned from a set of training images.
By examining these methods in more detail we can clearly see the advantages and disadvantages that they offer. Knowing these, we can objectively decide which one is best suited to our system's needs, thereby maximizing its performance and speed.
b.1 Knowledge-based method:
These are rule-based methods. They try to translate the knowledge that we have about faces into rules such as: the face has two symmetrical eyes, and the eye area is darker than the chin. These methods are not the most reliable: they give false positives if the rules are too general and false negatives if the rules are too detailed. Because of this the approach is very limited and is unable to find faces in complex images.
b.2 Feature-invariant methods:
Han, Liao, Yu and Chen tried a new approach in 1997[11], which involved finding invariant features to use in face detection. The method consists of several steps. First, the eye-analogue pixels are found and unwanted pixels are removed from the image. Each eye-analogue segment is considered a candidate for one of the eyes, and a set of rules determines the potential pairs of eyes. Afterwards the face area is calculated as a rectangle, the faces are normalized to a fixed size and orientation, and finally a cost function is applied to make the final selection. This method achieved a 94% detection rate, even in pictures with many faces, but there were problems when the person was wearing glasses. To overcome this, there are algorithms that detect face-like textures or the color of human skin. The RGB and HSV color spaces can be used together to detect faces; the authors use the following parameters:
0.4 ≤ r ≤ 0.6,  0.22 ≤ g ≤ 0.33,  r > g > (1 − r)/2
0 ≤ H ≤ 0.2,  0.3 ≤ S ≤ 0.7,  0.22 ≤ V ≤ 0.8
Formula 1
Both of these conditions can be and are used to detect skin color pixels. We should mention that these alone are not enough to build a good face detection algorithm because skin color can change in different light conditions. Therefore they need to be used alongside other methods.
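As an illustration, the normalized-RGB part of Formula 1 can be applied pixel-wise to produce a mask of skin-candidate pixels. The following NumPy sketch is my own, not the authors' implementation; the function name and the handling of the HSV rule are assumptions.

    import numpy as np

    def skin_candidate_mask(rgb_image):
        # rgb_image: uint8 array of shape (H, W, 3).
        # Returns a boolean mask of pixels satisfying the normalized-RGB
        # thresholds of Formula 1; the HSV rule could be applied analogously
        # and combined with this mask.
        img = rgb_image.astype(np.float64)
        total = img.sum(axis=2) + 1e-6           # avoid division by zero
        r = img[..., 0] / total                  # normalized red
        g = img[..., 1] / total                  # normalized green
        return ((r >= 0.4) & (r <= 0.6) &
                (g >= 0.22) & (g <= 0.33) &
                (r > g) & (g > (1.0 - r) / 2.0))

Pixels passing the test are only skin candidates; as noted above, the mask has to be combined with other cues before it can be used for reliable detection.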
b.3 Template matching:
These methods try to define a face as a function. A face can be divided into smaller parts like the eyes, face contour, nose and mouth. Unfortunately these methods are limited to faces that are frontal and unoccluded. Some templates use the relation between face regions in terms of brightness and darkness; these are afterwards compared with the input images to detect faces. This approach is fairly easy to implement, but it is inadequate for robust face detection: it is not capable of precise detection when there are variations in pose, scale and shape.
b.4 Appearance-based methods:
Templates are learned from examples in the images. Generally they are based on statistical analysis and machine learning. Sometimes they define discriminant functions between face and non-face. The most relevant tools are presented here:
Eigenface-based: it uses Principal Component Analysis to represent faces efficiently. It was developed by Kirby and Sirovich[12] and its goal is to represent faces in a new coordinate system; they referred to the vectors that make up this coordinate system as eigenpictures.
Distribution-based: the idea is to gather a sufficiently large number of samples into a class that covers all the variations we wish to handle. An appropriate feature space is chosen, then the system matches candidate pictures against the distribution-based canonical face model.
Neural Networks: they are an efficient way of detecting faces. At the beginning, researchers used neural networks to learn face and non-face patterns; the hardest part was representing the "images not containing faces" class. Others tried using discriminant functions to classify patterns using distance measurements, and yet others tried to find an optimal boundary between face and non-face pictures using a constrained generative model.
Sparse Network of Winnows: this method is based on two linear units, also referred to as target nodes. One represents the face pattern, the other the non-face pattern. The SNoW has an incrementally learned feature space; new labeled cases serve as a positive example for one target and as a negative one for the other.
Naïve Bayes Classifiers: this object recognition algorithm models and estimates a Bayesian classifier. The probability of a face being present is computed using the frequency of occurrence of a series of patterns. This showed good results in frontal detection, but such classifiers are mostly used as complementary parts of other algorithms.
Hidden Markov Model: the challenge that they faced while building this statistical model was to build a proper HMM, so that the output probability can be trusted. The facial features were used as the states of the model, which were often defined as pixel strips. The probabilistic transitions between states are usually boundaries in these pixel strips.
Information-Theoretical Approach: contextual constraints of face patterns and correlated features can be modeled using Markov Random Fields(MRF). It maximizes the discrimination between classes using the Kullback-Leibler divergence.
2.3.3 Theory and Different Modalities of Face Tracking
For face recognition systems that take a video sequence as input, face tracking is very useful. Face tracking is basically a motion estimation problem. There are a number of ways to track a face, such as head tracking, feature tracking, image-based tracking and model-based tracking. We can classify these algorithms as[13]:
Head Tracking/Individual feature tracking: In this case a head can be tracked as a whole or just a couple of features can be tracked.
2D/3D: two-dimensional systems track a face and output an image-space location of the face. In 3D systems a 3D model of the face is built; this approach allows pose and orientation variations to be estimated.
In basic face tracking we seek to locate an image inside a picture, then compute the difference between frames and update the face's position. There are many challenges we need to face when tracking, such as partial occlusions, illumination changes, computational speed and facial deformation. The state vector of a face includes the center position, the size of the rectangle containing the face, the average color of the face area and their first derivatives. New face candidates are evaluated using a Kalman estimator. In tracking mode, if a face is not new, the image from the previous frame is used as a template. Using the Kalman[3] estimate we predict the position of the face, and the face region is searched for by the SSD algorithm using the template. When the SSD finds the region, the color information is fed into the Kalman estimator to confine the face region exactly. After this, the state vector is updated.
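To make the SSD step concrete, the sketch below performs a brute-force sum-of-squared-differences search for the face template around its previous position. It is only an illustration under my own assumptions (grayscale frames, a fixed search radius); in a full tracker the result would be fed into the Kalman estimator described above.

    import numpy as np

    def ssd_search(frame, template, prev_x, prev_y, radius=20):
        # frame: 2-D grayscale image; template: 2-D patch of the tracked face.
        # Slides the template over a small window around (prev_x, prev_y) and
        # returns the position with the smallest sum of squared differences.
        th, tw = template.shape
        h, w = frame.shape
        tmpl = template.astype(np.float64)
        best, best_pos = np.inf, (prev_x, prev_y)
        for y in range(max(0, prev_y - radius), min(h - th, prev_y + radius) + 1):
            for x in range(max(0, prev_x - radius), min(w - tw, prev_x + radius) + 1):
                patch = frame[y:y + th, x:x + tw].astype(np.float64)
                ssd = np.sum((patch - tmpl) ** 2)
                if ssd < best:
                    best, best_pos = ssd, (x, y)
        return best_pos, best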
2.3.4 Theory and Different Modalities of Feature Extraction
We can consider feature extraction as the extraction of relevant information from an image. The process must be efficient in terms of time and memory usage, and its output should be optimized for classification. Feature extraction comprises several smaller steps: dimensionality reduction, feature extraction and feature selection. Dimensionality reduction is very important because the performance of the classifiers depends on the number of images, the number of features and the classifier complexity. Added features might even degrade the performance of the classification algorithm, a behaviour known as the "peaking phenomenon". To avoid this, a common rule of thumb is to use at least ten times as many training samples per class as the number of features. There is a difference between feature extraction and feature selection. Feature extraction algorithms, as their name says, extract features from the data: they transform the data in order to select the proper subspace of the original feature space. Feature selection algorithms select the best subset of the input feature set. In the majority of cases feature extraction is performed before feature selection.
There are many feature extraction algorithms, but not all of them are used in face recognition; many of them were simply adopted to help in facial recognition. Figure 4 shows some feature extraction algorithms that are used in face recognition.
Figure 4: Feature extraction algorithms[3]
Feature selection algorithms aim to select the subset of the extracted features that causes the smallest classification error. Some of the more successful approaches to this problem are based on branch and bound algorithms. In Figure 5 we can see some of these methods and their capabilities. This cannot be considered a complete list, because more feature selection algorithms have been proposed recently. Researchers are striving for a satisfactory algorithm, not necessarily an optimal one; some use a resemblance coefficient or a satisfaction rate as a criterion, or quantum genetic algorithms.
Figure 5: Feature selection methods[3]
2.3.5 Theory and Different Modalities of Face Recognition
Computer vision, optics, pattern recognition, neural networks, machine learning and psychology are just a few of the domains that influence the evolution of facial recognition. The steps of face recognition can sometimes overlap or even change depending on the consulted bibliography. These factors make it hard to arrive at a unified face recognition algorithm and classification scheme[3].
Geometric/Template Based approaches
A face recognition algorithm can be classified as geometry-based or template-based. The latter methods compare the input with a set of templates. Geometry (feature-based) methods perform face recognition by analyzing facial features and their geometric relationships. Some algorithms use both at the same time; for example, a 3D morphable model approach can use feature points or textures as well as PCA to build a recognition system.
Piecemeal/Holistic approaches
Some algorithms try to identify faces from little information, they process facial features independently. The relation of the features to the whole face is not taken into account. Early research used this very often, trying to find the most relevant features. Some tried to use the eyes or a combination of features. Also some Hidden Markov Models can fall into this category. Nowadays most algorithms follow a holistic approach.
Appearance-based/Model-based approaches
All facial recognition methods can be divided into these two categories. Appearance-based methods represent a face in terms of several raw intensity images; statistical techniques are used to derive a feature space from the image distribution. The model-based approach tries to model a human face: a new sample is fitted to the model, and the parameters of the fitted model are used to recognize the face. Appearance-based methods can be linear or non-linear, and model-based methods can be 2D or 3D. Linear appearance-based methods perform a linear dimension reduction: the face vectors are projected onto the basis vectors. Non-linear appearance-based methods are more complex; they use, for example, Kernel PCA. Model-based methods, as mentioned before, try to build a model of a human face and allow identification even when pose changes occur. 3D models are more complex, as they capture the three-dimensional structure of the face; an example is Elastic Bunch Graph Matching.
Template/statistical/neural network approaches
According to a study by Jain and colleagues, face recognition methods can be grouped into three main groups:
Template matching: patterns are represented by models, samples, pixels, curves or textures. They are usually used together with a correlation or distance measure. In Figure 6 we can see the structure of a template-based algorithm.
Statistical approach: Patterns represented as features. The recognition function is a discriminant function.
Neural networks: The representation may vary; there is a network function at some point.
Figure 6 – Template matching algorithm diagram[3]
2.4 The Evaluation of State-of-the-Art Algorithms for Remote Face Recognition:
Face recognition has undergone huge progress and gained a great deal of attention during the last couple of decades. At the moment most facial recognition databases use images captured at close range and in a controlled environment, which does not correspond to many real-world applications. Images can change due to poor illumination, distance, blur and many other variables. In order to obtain better results, the first step consists of building a robust database in which many images are taken from a long distance and in an outdoor environment. The images are then individually cropped and labeled according to illumination quality (from good to really bad), pose (frontal and non-frontal), and blur versus non-blur. Two state-of-the-art FR algorithms were evaluated with this remote database: a baseline algorithm and a recently developed algorithm based on sparse representation. The following observations were made: the performance of the algorithms improves as the number of gallery images becomes higher, and it varies with the number of images available and their quality. It is advisable to design a quality metric for rejecting very bad quality images.
2.4.1 Remote Face Database Description
The distance from which the images were taken varies from 5 m to 250 m, under different scenarios. The faces were manually cropped from the background. The resulting database contains images of 17 different individuals, totaling 2106 images. All images have the same size and format: 120×120 png. Many of the faces are in frontal view. After labeling the images according to the above-mentioned criteria, it was concluded that there are 688 clear images, 58 partially occluded images, 37 severely occluded images, 570 images with medium blur, 245 with severe blur, and 244 taken in poor illumination conditions.
Figure 7 – Sample images from the remote database:
a) clear, b) clear, c) partially occluded, d) partially occluded, e) pose variation, f) pose variation, g) poorly illuminated, h) severely occluded, i) severely blurred[13]
The remaining images have more than one condition present; these images were discarded and not used in the following experiment. In Figure 7 we can see a few images from the database and their classifications. It is noticeable that in some cases the faces are hard to recognize even for humans.
2.4.2 Algorithms and Experiments:
In this section two state-of-the-art algorithms are evaluated using the previously created face database. Their results are compared in each scenario, thus highlighting their efficiency. After the results are compared and analyzed, we will have a clearer overview of the performance of the two algorithms, which will help us choose the one most suitable for our system's requirements. As mentioned above, the quality of the images is variable; they can contain one or more disturbing factors.
Experiments with Baseline Algorithms:
In this particular case, clear images from the database were used as gallery images. The number of gallery images is gradually increased from one to fifteen per subject. The gallery images are chosen randomly each time, the experiment is repeated five times, and the average is taken as the final recognition result.
Baseline Algorithm
The baseline recognition algorithm involves Kernel Principal Component Analysis (KPCA)[13], Linear Discriminant Analysis (LDA)[13] and a Support Vector Machine (SVM)[13]. LDA is used for feature extraction and dimensionality reduction in pattern recognition tasks. The basic principle it involves is the maximization of the between-class distance and the minimization of the within-class distance. KPCA is used to make the within-class matrix nonsingular. LDA has a high failure rate when the number of samples is low; because of this, Regularized Discriminant Analysis (RDA) is used to eliminate this effect.
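Purely as an illustration, the baseline chain can be sketched with off-the-shelf scikit-learn components; the parameter values below are my own assumptions, not the settings used in the cited study.

    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import KernelPCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.svm import SVC

    # KPCA -> (regularized) LDA -> SVM, mirroring the baseline described above.
    baseline = make_pipeline(
        KernelPCA(n_components=100, kernel="rbf"),                     # non-linear projection
        LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto"),  # RDA-like regularization
        SVC(kernel="linear"),                                          # final classifier
    )

    # X_train would hold the flattened gallery face images and y_train the
    # subject labels (hypothetical variable names):
    # baseline.fit(X_train, y_train)
    # predictions = baseline.predict(X_test)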
Handling illumination Variations
Illumination is a very important factor that can distort images. In order to minimize its effect, albedo estimates were used. Albedo is the fraction of the light that a surface reflects when illuminated; in this case it was calculated using a minimum mean square error criterion. Figure 8 shows the effect of the albedo estimate on two images taken from 50 m.
Figure 8 – Results of albedo estimation. Left: original image,
right: Estimated albedo image[13]
Experimental Results
In the first experiment all the remaining clear images, except the gallery images, were selected. Both albedo maps and intensity images were used as inputs. The results of this experiment are shown in Figure 9, with the parameters of KPCA, LDA and SVM well tuned. It is noticeable that intensity images perform better than the albedo maps. One reason may be that the images in the database are not fully frontal: the albedo estimation requires the two images to be as closely aligned as possible. Extreme illumination conditions can also prevent us from obtaining good initial albedo estimates.
Figure 9 – Face recognition using albedo maps and intensities in
the baseline algorithm (Comparison)[13]
Next, the test images were changed to poorly illuminated, medium blurred, severely blurred, partially occluded and severely occluded ones, with clear images still present. The results of this experiment can be seen in Figure 10. With the albedo map approach the recognition rate drops; as we can see in the figure, it is lowest when the number of gallery images per subject is between 5 and 10. As the number of gallery images per subject increases, the recognition rate also rises, reaching 0.9[13].
Figure 10 – Performance of baseline with varying test image conditions[13]
It can be seen that the condition of the test images severely influences the recognition, as the image condition decreases so does the performance of the system. The worst are the occluded and severely blurred images. In the case of severely occluded images the recognition rate drastically drops, we can notice that the images containing these defects have a much smaller recognition rate than images containing any other defects[13].
Experiments Using Sparse Representation
This algorithm uses a modified BPDN (Basis Pursuit DeNoising) algorithm to obtain a sparser coefficient vector representing the test image. The SCI (Sparsity Concentration Index) is calculated for each image, and images with values below a certain threshold are discarded. For the experiment 14 subjects were used, each with 10 clear gallery images[13]; the test images were clear, blurred, poorly illuminated or occluded. In Figure 11 we can see the results compared with those of the baseline algorithm. When no rejection is allowed, the recognition rate of the sparse representation method is low. As the SCI threshold is increased, more bad quality images are rejected and the accuracy increases. The rejection rates are 6%, 25.11%, 38.46% and 17.33% when the images are clear, poorly lighted, occluded and blurred, respectively. The sparse representation-based algorithm has a clear advantage over the baseline algorithm when occlusion occurs in the test image.
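For reference, the SCI used for rejection can be computed as below. The formula follows the common definition from the sparse-representation face recognition literature; the function and variable names are mine, and the exact variant used in the cited study may differ.

    import numpy as np

    def sparsity_concentration_index(coeffs, labels):
        # SCI(x) = (k * max_i ||delta_i(x)||_1 / ||x||_1 - 1) / (k - 1),
        # where k is the number of classes and delta_i keeps only the
        # coefficients belonging to class i.  SCI = 1 means all the energy is
        # concentrated in one class, SCI = 0 means it is spread evenly.
        coeffs = np.asarray(coeffs, dtype=np.float64)
        labels = np.asarray(labels)
        classes = np.unique(labels)
        k = len(classes)
        total = np.abs(coeffs).sum() + 1e-12
        per_class = np.array([np.abs(coeffs[labels == c]).sum() for c in classes])
        return (k * per_class.max() / total - 1.0) / (k - 1.0)

    # Test images whose SCI falls below a chosen threshold would be rejected.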
Figure 11 – Comparison between sparse representation and baseline algorithms: clear, poorly lighted, occluded and blurred stand for the condition of the test image[13]
In the case of sparse representation with no rejection, the performance is the worst: it has the lowest recognition rate for all four types of images. On the other hand, sparse representation with rejection has a better recognition rate than the baseline method for occluded and blurred images; the difference is most striking for occluded images. For clear and poorly lighted images the baseline offers the best results, but not by much[13]; in these two cases both the baseline and the sparse representation achieve a good recognition rate.
Addition of Degraded Images to the Gallery
In this experiment only blurred, poorly illuminated and occluded images were used as test images, and the corresponding types of images were also added to the gallery set. In order to allow comparison with the previous experiments, the 140 clear images were first kept in the gallery, and one third of the test images were moved into the gallery for each case. The images from the previous experiment were also divided in two for each case: half were used as gallery images, half as test images. The result of this experiment can be seen in Figure 12.
Figure 12 – C, M, D represent using all clear images, a mixture of clear and degraded images, and all degraded images as gallery images[13]
From the table in Figure 12 we can see that the results are lowest when using occluded images; this tendency could also be noticed in the previous experiments. Poor lighting conditions had the least effect on the recognition rate: when using clear or a mixture of clear and degraded gallery images, the recognition rate remained the highest for this condition. The experiment shows that, when recognizing degraded images, adding the corresponding type of variation to the gallery can improve the performance.
2.4.3 Conclusion
In this study a remote face database was built, and the performance of different state-of-the-art facial recognition algorithms was tested on it. From the results we can clearly deduce that the recognition rate drops as the quality of the face images decreases. We also obtain valuable information about how the gallery images affect the recognition rate: if we use good quality gallery images, the recognition rate is the highest in both cases. In most cases the baseline algorithm offers the best results. Both methods have their advantages and drawbacks, and both can achieve a high recognition rate; based on these experimental results the user has an easier job selecting the one which offers the best performance for his system. The evaluation reported here can provide guidance for further research in this domain.
2.5 Face Recognition Based on PCA Algorithms:
As human beings we encounter many faces in our day to day lives. Whenever we meet somebody new, we remember the particular facial features of that person, rather than the whole face. This helps us recognize faces easily and naturally. This process is done automatically by our brain, and it is still mysterious to us. The shape of the face, the distance between the eyes, the shape of the nose and mouth, and the relationships between them all help us to distinguish between different faces. If we correctly extract these features, face recognition becomes easy; however, changes such as a different hairstyle, glasses or image noise make face recognition harder. Our face is full of information that we could use for detection, but using all of it would make the process significantly slower and less efficient. Facial recognition is most useful in areas where high security is needed, like airports, military bases, government offices, etc. Sirovich and Kirby efficiently represented human faces using principal component analysis, and M.A. Turk and Alex P. Pentland developed a near real-time eigenfaces system for face recognition using the Euclidean distance. In recent years facial recognition has become a blooming area of research and development, attracting fields such as computer vision, neuroscience, psychology and medicine. With the fast development of technology, face recognition has become a reality and is demonstrated in real-life applications. This fast development can be attributed to two major factors: the active development of new facial recognition algorithms and the availability of large databases of facial images, which can nowadays be obtained from law enforcement agencies like the FBI, passport office databases, visas and driver's license photos.
In this first face recognition method we use Principal Component Analysis for feature extraction and Neural Networks for recognition. We can distinguish two main tasks: face verification and face recognition. In the verification phase, the system knows the claimed identity of the user beforehand and has to verify it, i.e. decide whether the user is an impostor or not. In face recognition the identity is not known in advance; the system has to decide which of the images in the database bears the nearest resemblance to the input. This particular face recognition process consists of two phases, as shown in Figure 13: the enrollment phase and the recognition/verification phase. The system is made up of several modules: Image Acquisition, Face Detection, Training, Recognition and Verification.
2.5.1 Structure of a Face Recognition System
Below we will study the structure of this face recognition system more in detail, explaining in part what each phase does and how it helps and affects the outcome of the recognition process.
Figure 13 – Block diagram for face recognition system[14]
Enrollment phase: images are taken using a webcam or any other type of camera and stored in a database. The next step is to detect and train on the image; by training we mean the geometric and photometric normalization of the image. Using several techniques, the features of the face are extracted. These features, along with the image, are then stored in a database.
Recognition/Verification phase: at this point the user's face is acquired again, and the system uses it either to identify who the user is or to determine whether the user really is who he claims to be. During identification the system compares the acquired template to all the users in the database; during verification only the templates belonging to the claimed identity are checked. As we can see in Figure 13, the recognition/verification phase consists of several modules: image acquisition, face detection and face recognition/verification.
b.1 The image acquisition/face detection module:
This module is used to seek out and extract the particular area of the image that contains the face. In the next steps of the process, the image is resized and corrected geometrically, and the background and redundant data are eliminated.
b.2 Face recognition/verification module:
It is made up of preprocessing, feature extraction and classification sub-modules. As input we use the face image from a camera or a database. During feature extraction, the normalized image is represented as feature vectors. The result of the classification for recognition purposes is determined by matching the client index with the client identity in the database. Preprocessing reduces or eliminates some of the variations due to illumination and enhances and normalizes the image for better recognition; the robustness of the facial recognition greatly depends on this step. Histogram equalization is the most common histogram normalization or grey-level transform. Its purpose is to produce an image with equally distributed brightness levels over the whole brightness scale, and it is usually used on images that are too dark or too bright. As a result, some important features may become visible.
b.3 Steps of histogram normalization:
1. For an N × M image of G gray-levels, create two arrays H and T of length G, initialized with 0 values.
2. Form the image histogram: scan every pixel and increment the relevant member of H – if pixel X has intensity p, perform H[p] = H[p] + 1. (1)
3. Form the cumulative image histogram, using the same array H to store the result:
H[0] = H[0]
H[p] = H[p − 1] + H[p], for p = 1, …, G − 1.
4. Set
T[p] = round( (G − 1) / (M·N) · H[p] ) (2) [14]
5. Rescan the image and write an output image with gray-levels q, setting q = T[p].
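The procedure above translates directly into code. The following Python/NumPy sketch of the five steps is my own illustration, not the cited implementation.

    import numpy as np

    def histogram_equalize(image, G=256):
        # image: 2-D array of integer gray-levels in [0, G-1].
        N, M = image.shape
        H = np.zeros(G, dtype=np.int64)
        # Step 2: form the image histogram.
        for p in image.ravel():
            H[p] += 1
        # Step 3: cumulative histogram, stored in the same array.
        for p in range(1, G):
            H[p] += H[p - 1]
        # Step 4: build the look-up table T.
        T = np.round((G - 1) / (M * N) * H).astype(image.dtype)
        # Step 5: rescan the image, q = T[p].
        return T[image]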
Feature extraction: feature extraction is used to obtain the facial feature vectors, the information that represents the face. Here we use PCA (Principal Component Analysis). PCA is based on an information theory approach: it extracts the most relevant information in an image and encodes it as efficiently as possible. The classical representation of a face image is obtained by projecting it onto the coordinate system defined by the principal components. The projection of face images into the principal component subspace achieves information compression, decorrelation and dimensionality reduction, which facilitate decision making. In mathematical terms, the principal components of the distribution of faces, i.e. the eigenvectors of the covariance matrix of the set of face images, are sought by treating an image as a vector in a very high dimensional face space. We apply PCA to this database and obtain the unique feature vectors using the following method. Suppose there are P patterns and each pattern has t training images of m × n configuration.
• The database is rearranged in the form of a matrix where each column represents an image.
• The covariance matrix of the data is computed, together with its eigenvalues and eigenvectors.
• A feature vector for each image is then computed; this feature vector represents the signature of the image. The signature matrix for the whole database is then computed.
• The Euclidean distance between the feature vector of the image to be recognized and all the signatures in the database is computed.
• The image is identified as the one whose signature gives the least distance to the signature of the image to be recognized, as illustrated in the sketch below.
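A compact NumPy sketch of this eigenface procedure is given below. It follows the steps listed above under my own naming, using the classic small-matrix trick for the covariance eigenvectors; it is illustrative, not the exact implementation used in the cited work.

    import numpy as np

    def train_eigenfaces(images, num_components=20):
        # images: array of shape (num_images, m*n), one flattened face per row.
        mean_face = images.mean(axis=0)
        A = images - mean_face                      # centered data, rows = images
        # Eigenvectors of the small matrix A A^T give the eigenfaces cheaply.
        eigvals, eigvecs = np.linalg.eigh(A @ A.T)
        order = np.argsort(eigvals)[::-1][:num_components]
        eigenfaces = A.T @ eigvecs[:, order]        # back-project to image space
        eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
        signatures = A @ eigenfaces                 # feature vector (signature) per image
        return mean_face, eigenfaces, signatures

    def recognize(test_image, mean_face, eigenfaces, signatures):
        # Return the index of the gallery image with the smallest
        # Euclidean distance between signatures.
        s = (test_image - mean_face) @ eigenfaces
        return int(np.argmin(np.linalg.norm(signatures - s, axis=1)))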
Classification: it maps the feature space of the test data to a discrete set of labeled data that serves as a template. Three techniques are predominantly used: Neural Networks, Normalized Correlation and Euclidean Distance.
Neural Network: a machine learning algorithm used for various pattern classification problems, with good generalization and learning ability. Neural networks are basically built up of three layers: the input layer, responsible for feeding the information into the network; the hidden layer, which may consist of one or two layers (sufficient even for solving difficult problems) and is responsible for processing the data and learning; and the output layer, which passes the output to a comparator that compares it with a predefined target. Neural networks require training to improve; the better the network, the less time it needs. The face network used here consists of 448 input nodes, 12 hidden nodes and 1 output node. The architecture of a neural network is illustrated in Figure 14.
Figure 14: Neural Networks Architecture[14]
With the back-propagation method, using 20 to 40 hidden neurons gives 100% recognition accuracy in very little time. If we increase the number of neurons from 40 to 60, the recognition accuracy remains 100%, but the time increases. When we increase the number of neurons above 65, the accuracy starts to decrease, and after 75 it reaches 0. Figure 16 contains a comparative analysis of different neural networks over various step sizes of the learning rate; we can notice that the back-propagation algorithm is more sensitive to a variable learning rate. With the increase of the step size, the back-propagation neural network becomes unstable.
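As a rough stand-in for such a back-propagation network, the scikit-learn classifier below takes the 448-dimensional PCA feature vector mentioned above and uses a single hidden layer; the hyper-parameters are only my assumptions, not the settings used in the cited experiments.

    from sklearn.neural_network import MLPClassifier

    # Back-propagation network sketch: 448 inputs, one hidden layer, one class decision.
    net = MLPClassifier(hidden_layer_sizes=(30,),   # 20-40 hidden neurons reportedly work best
                        activation="logistic",
                        solver="sgd",
                        learning_rate_init=0.01,
                        max_iter=2000)

    # net.fit(train_features, train_labels)    # train_features: (n_samples, 448), hypothetical names
    # predicted = net.predict(test_features)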
Euclidean Distance (E.D.): Formula 2 is the nearest mean classifier used as the decision rule. Writing x for the feature vector of the test image and t_c for the stored template of the claimed client c, the distance is
d(x, t_c) = ||x − t_c|| (Formula 2)
where the claimed client is accepted if d(x, t_c) is below the threshold and rejected otherwise.
Normalized correlation (N.C.): it is based on the correlation score[14]
c(x, t_c) = (x · t_c) / (||x|| ||t_c||) (Formula 3)
where the claimed identity is accepted if c(x, t_c) exceeds the threshold.
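The two decision rules translate into a few lines of code. The sketch below is a minimal illustration of Formulas 2 and 3 under my own function names; the thresholds would be tuned on a validation set.

    import numpy as np

    def accept_euclidean(x, template, threshold):
        # Formula 2: nearest-mean rule - accept if the distance is below the threshold.
        return np.linalg.norm(x - template) < threshold

    def accept_normalized_correlation(x, template, threshold):
        # Formula 3: accept if the normalized correlation score exceeds the threshold.
        score = np.dot(x, template) / (np.linalg.norm(x) * np.linalg.norm(template))
        return score > threshold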
2.5.2 Experiment and Results:
The purpose of this experiment was to evaluate the speed and performance of the face recognition system when photometric normalization techniques, such as histogram equalization, are applied to the images. The facial images used were frontal images taken from a local database: 20 individuals, each with ten images. For verification purposes two measures were used: the false acceptance rate (FAR) and the false rejection rate (FRR). FAR counts the cases when an impostor tries to gain access and succeeds; FRR counts the cases when a legitimate client tries to gain access and is rejected. They are given by FAR = IA/I and FRR = CR/C, where IA is the number of impostor accesses accepted, I is the number of impostor trials, CR is the number of clients rejected and C is the number of client trials[18].
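Computing these two error rates is straightforward; the small helper below (my own naming) simply applies the definitions.

    def far_frr(impostors_accepted, impostor_trials, clients_rejected, client_trials):
        # FAR = IA / I and FRR = CR / C, exactly as defined above.
        return impostors_accepted / impostor_trials, clients_rejected / client_trials

    # Example: 3 accepted impostors out of 200 impostor trials and 5 rejected
    # clients out of 200 client trials give FAR = 0.015 and FRR = 0.025.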
Face verification: in this experiment the verification performance of the system was studied using the original facial images. The results are shown in Figure 15. Even though the E.D. classifier has the lowest HTER in some cases, the N.N. classifier gives the best results with the PCA feature extractor. The N.N. has a higher average, with 5.69%, but the highest value in this experiment was produced by the E.D. classifier, with 15%, which is double that of any other.
Figure 15 – Verification results using Original Image[14]
In the second experiment, histogram equalization is first applied to the face image. The results can be seen in Figure 16. From them we can clearly deduce that the N.C. classifier has the lowest HTER for both feature extractors, with only 6.025%. The verification results in the three cases changed completely when histogram equalization was used: N.C. has the lowest value with 6.025%, E.D. became the highest with 10.58%, and N.N. had an HTER of 7.56%. With histogram equalization none of the methods had very high HTER values.
Figure 16 – Verification Results using Histogram Equalization[14]
The third experiment shows the results when histogram equalization is applied to the face image; the experimental results can be seen in Figure 17. The N.N. classifier clearly has the lowest HTER, achieving only 3.92%. A slightly higher result was obtained by the N.C. classifier, with an average of 5.625%, while the highest values in this experiment were measured with the E.D. classifier, at 9.565%, making it the worst classifier in this particular case.
Figure 17 – Verification Results using Histogram Equalization[14]
Face recognition: for recognition purposes, the performance is evaluated based on recognition rate and accuracy. The results using the original images are shown in Figure 18. The E.D. classifier has the highest recognition rate for PCA, with 98.51%; N.C. also has a good recognition rate of 97.04%, while the N.N. has the worst recognition rate, with only 87.03%.
Figure 18: Recognition Results using Original Image[14]
2.5.3 Conclusion of the Experiment
This section has presented a face recognition system using PCA with neural networks in the context of face verification and face recognition, using photometric normalization for comparison. It is important to mention that these experiments were conducted by Taranpreet Singh Ruprah; I am only interpreting his results in order to illustrate the differences between the face recognition methods and their success rates. From the experimental results we can deduce that the N.N. classifier is, overall, the best choice for verification with PCA, while for recognition the E.D. classifier gives a high accuracy using the original image. With these results we can conclude that using histogram equalization techniques on the face image does not have a big effect on the performance of the system in a controlled environment. The results of each experiment can help us choose the best, most suitable classifier for our project, in order to optimize the performance and speed of our system. As we noticed, each classifier works best in a specific situation, so everyone should choose wisely the one best suited to their needs.
Chapter 3: Introduction into LabVIEW and Image Processing
Of our five senses, the one we rely on most is sight. It gives us almost 80% of the information we process during our daytime activities. It is also astonishing that nearly 75% of the sensory receptor cells in our body are found in the retinas of our eyes. Most of our daily decisions are based on our sense of sight, on what we can see and how we interpret the obtained data. Just as for us humans, sight has become an essential part of production in computer-controlled robotics. It has an ever increasing role in factory production and in scientific, medical and safety fields. Because of the need of computer-controlled robots to process and make decisions based on image data, the LabVIEW Vision Toolkit was created.
With the introduction of NI LabVIEW, a virtual instrumentation revolution began, and it is still going strong today. This rapid growth is partly due to the huge advances in personal computers and consumer electronics, which give the LabVIEW programming environment the CPU clock rate, RAM size, bus speed and disk size it needs to function efficiently. In the first steps of virtual instrumentation only electrical devices were connected to computers; later, the ability to connect measurement devices was added. The next huge step in the evolution of virtual instrumentation was the introduction of the first image acquisition hardware along with an image analysis library. This step is even more important for us because during our project we constantly use both. At the time of its introduction, image processing was still taking its first steps, because it required powerful computers and a lot of knowledge. Since then, due to the rapid advancement of personal computer technology, image processing is possible on most devices, and thanks to good quality webcams and the NI Vision Builder software it has become accessible to almost anybody. It is fast becoming an essential part of any virtual instrumentation. The LabVIEW programming environment is used for this type of work because it enables programmers to create simple and efficient image processing algorithms with the help of its many built-in functions.
Images are a way of recording and processing data in visual form. For most people images mean pictures, but this is not entirely true: in the broader sense of the word, images can represent any kind of 2D data. By digital image processing we refer to images being manipulated by digital means, in most cases with the help of computers. Naturally occurring images cannot be processed by computers directly, because a computer needs numerical data. This is why images need to be converted into numerical data, also referred to as digital images, thereby enabling computer manipulation. A digital image corresponds to an array of real or complex numbers represented by a finite number of bits, showing visual information in a discrete form. The conversion of an image into an appropriate digital form is called analog-to-digital conversion, which carries out sampling and quantization.
3.1 Types of Images:
The Vision Toolkit found in the LabVIEW programming environment can read and process raster images. A raster image is basically made up of cells called pixels, each of them containing a color or a greyscale intensity, as we can see in Figure 19. Most machine vision cameras capture images in this format; the information is then transmitted to the computer through a standard data bus or a frame grabber.
Original Image Pixelated Raster Image
Figure 19 – Original image and its representation in pixelated raster form[15]
A common misconception is that an image's type defines the image file's type, or vice versa. This is not true in all cases. It is true that certain image types work well with certain image file types, so it is important to take this into account.
The Vision Toolkit is designed to process three types of images:
Greyscale
Color
Complex
Grayscale images:
Greyscale images basically consist of x and y spatial coordinates and their intensity values. This type of image can be represented in a surface plot, where the z axis represents the light intensity, as shown in Figure 20. The intensity data is characterized by its depth. For a bit depth of x, the image can have 2^x levels, meaning that each pixel can take one of 2^x intensity values. In our case, the Vision Toolkit can manipulate grayscale images with a depth of 8, 16 and 32 bits. A higher bit depth image uses more memory, RAM as well as fixed storage. The memory required can be calculated with the help of this formula:
Required Memory = Resolution_x × Resolution_y × Bit Depth
Bit depths that are not represented in the list above are automatically converted into the next highest acceptable bit depth.
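As an illustration, the memory formula can be written as a short Python function (a sketch for illustration only, assuming the bit depth is given in bits and the result is wanted in bytes):

def required_memory_bytes(width, height, bit_depth):
    # Required Memory = Resolution_x * Resolution_y * Bit Depth (in bits),
    # divided by 8 to express the result in bytes.
    return width * height * bit_depth // 8

# A 640 x 480 image at 8-bit depth needs 307200 bytes (~300 KB);
# the same image at 16-bit depth needs twice as much.
print(required_memory_bytes(640, 480, 8))   # 307200
print(required_memory_bytes(640, 480, 16))  # 614400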
Original image Surface plot representation
Figure 20 – Image represented in surface plot form[15]
Color images:
Color images can be represented in two different ways: RGB, short for Red-Green-Blue, or HSL, short for Hue-Saturation-Luminance. The Vision Toolkit accepts both models.
Pixel Depth Channel Intensity Extremities
RGB 0 to 255
HSL 0 to 255
The α component represents the opacity of the image. Zero represents a clear pixel and 255 represents a fully opaque pixel. The Vision Toolkit ignores the α information.
In terms of memory usage, color images follow the same relationship as grayscale images.
Complex images:
They include real and imaginary components, which is where they derive their name. Complex image pixels are stored as 64-bit floating-point numbers, constructed from a 32-bit real part and a 32-bit imaginary part.
A complex image contains the frequency information of a grayscale image; because of this it can prove useful when you apply frequency-domain processing to the image data. Complex images are created by applying an FFT to a grayscale image.
3.2 Image Acquisition types
We can obtain images with the help of two main methods: loading an image file from disk, or directly, using a camera. Image acquisition can be considered a complex task in most cases, but with the help of NI MAX and the LabVIEW Vision Toolkit it becomes a straightforward and integrated task. The Vision Toolkit supports four image acquisition types: Snap, Grab, Sequence and StillColor.
Snap is the simplest way of image acquisition. It requires only three steps to execute: initialize, acquire and close. To initialize the IMAQ session, the interface name is input to IMAQ Init. Then the IMAQ session refnum is fed into a property node that determines the image type of the camera interface. Once the data space for the images is created, the system waits for the user to click Acquire or Quit. In case of Acquire, the next video frame from the camera is returned to Image Out. Finally, we use IMAQ Close to release the camera and interface resources.
Grab is a very fast way of image acquisition. The best way of acquiring and displaying live images is by using IMAQ Grab. First IMAQ Grab Setup is executed, then IMAQ Grab Acquire is used to grab the current image into the buffer. It has a Boolean input that defines whether the next image is to be acquired.
Sequence acquisition is useful in cases when you know how many images you want to acquire and at what rate. By using IMAQ Sequence we can control the number of frames and which frames we want to acquire. The rate at which the frames are acquired is called the frame rate:
Formula 4.
We can also use a Skip Table array input, which allows us to set which frames to acquire and which to skip.
3.3 Working with Images:
Almost all machine vision applications are based on image files; these images can be acquired, saved and then processed, or acquired, processed and then saved. In some cases the saving is optional.
The simplest way of reading an image from a storage device is by using IMAQ ReadFile, which can read BMP, TIFF, JPEG, PNG and AIPD file formats. First you have to create the image data space with IMAQ Create, after which the image file is opened and read. The block diagram for reading an image file is shown in Figure 21. In newer LabVIEW versions this can also be done with the Vision Acquisition block.
Or:
Figure 21 – Loading a standard image file[15]
We can also save the read image using the Vision Acquisition block. Using this one block we can set the frame rate, camera model and acquisition mode, provide different types of outputs and save video files, thus making the above mentioned block diagram replaceable with a single block. Of course, creating our own acquisition block diagram increases the image recognition speed and uses less memory, which is important in systems where memory and speed are of the essence.
3.3.1 Sampling:
Sampling is the process of measuring the value of the physical image at discrete intervals in space. The most common sampling grid used is the rectangular sampling grid. Each sample can be considered a small region of the physical image and is called a pixel, or picture element. Most commonly pixels are indexed by x and y coordinates, with the upper left corner being the origin. In Figure 22 we can see the process of sampling represented on a digital image.
Figure 22 – Sampling process represented on a digital image [15]
The horizontal and vertical sampling rates give the pixel dimensions of an image. A very common broadcast video standard is 640×480, meaning the image has 640 samples in the horizontal direction and 480 samples in the vertical direction. Digital cameras today enable the user to select which image dimensions they prefer.
3.3.2 Connectivity:
Image processing routines alter the value of the PUI (Pixel Under Investigation) with respect to its surrounding pixel values and perhaps the PUI itself; the way this neighborhood is defined is called connectivity. Most image processing routines are based on 4-, 6- or 8-pixel connectivity. Four-pixel connectivity concerns the image processing routine with the cardinal pixels; if the PUI is 23, the connectivity pattern is 13, 22, 24, 33, as represented in Table 1 below.
Table 1.
Six-pixel connectivity represents a hexagonal pixel layout, which can be simulated on a rectangular grid by shifting every second row by half a pixel.
Eight-pixel connectivity concerns image processing patterns with the cardinal and sub-cardinal pixels; when the PUI is 23, the connectivity pattern becomes 12, 13, 14, 22, 24, 32, 33 and 34, as seen in Table 2.
Table 2.
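As an illustration (a sketch in Python, not the Vision Toolkit implementation), the two rectangular-grid connectivity cases can be expressed as neighbour offsets around the PUI:

# Neighbour offsets (row, column) relative to the PUI for the two rectangular-grid cases.
FOUR_CONNECTED = [(-1, 0), (0, -1), (0, 1), (1, 0)]                      # cardinal neighbours
EIGHT_CONNECTED = FOUR_CONNECTED + [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # adds sub-cardinal neighbours

def neighbours(row, col, offsets):
    # Indices examined around a pixel under investigation at (row, col).
    return [(row + dr, col + dc) for dr, dc in offsets]

# For a PUI at row 2, column 3 (labelled 23 in the text), 4-connectivity
# yields (1,3), (2,2), (2,4) and (3,3), i.e. pixels 13, 22, 24 and 33.
print(neighbours(2, 3, FOUR_CONNECTED))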
Basic operations can be executed by performing pixel-by-pixel transformations on the intensities throughout the source images, and they function in two modes: with two source images, or with a source image and a constant operator. Most basic arithmetic and logical operations can be performed in this way.
3.3.3 Quantization:
By quantization we understand the replacement of the continuous values of the sampled image with a discrete set of quantization levels. The number of quantization levels employed governs the accuracy with which the image values are displayed. Quantization values are usually represented by integers ranging from 0 to L-1, with 0 representing black and L-1 representing white; the values in between represent different shades of gray and are usually referred to as greyscale. The number of bits used to define a pixel value is called the bit depth. In most cases a bit depth of 8 bits (256 levels) is used.
1bit/pixel 4bit/pixel 8bit/pixel
Figure 23 – The effect of different bit depths[15]
As you can see in Figure 23, 1 bit/pixel is not sufficient to display an image in detail, but with 4 bit/pixel and 8 bit/pixel representations the quality is acceptable.
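A minimal Python/NumPy sketch of such requantization (an illustration of the idea, not the Vision Toolkit routine) could look like this:

import numpy as np

def requantize(img_u8, bits):
    # Reduce an 8-bit grayscale image to 2**bits quantization levels
    # (e.g. bits = 1, 4 or 8, as in Figure 23).
    step = 256 // (2 ** bits)
    # Map each pixel to the centre of its quantization interval.
    return ((img_u8 // step) * step + step // 2).astype(np.uint8)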
3.3.4 Image Re-sampling:
By Re-sampling we understand the changing of the pixel dimensions of an image. The simplest way of achieving this is by inserting pixels, to increase the size of the image, or dropping pixels to decrease the size. There are other ways of inserting pixels in an appropriate way into the image so that the quality will not deteriorate. Some of these are bi-linear interpolation, cubic spline interpolation, quadric interpolation.
Images that do not contain formatting, but just the quantized pixels are called raw images. Images are stored and processed in raw form, so that pixels are encountered line by line.
3.3.5 Basic Image Processing:
Image processing refers to the procedure of manipulating images. Digital image processing covers a wide variety of techniques to modify the properties and the appearance of images. One of the most common operations is changing the location of the pixels in the image; an example of this is mirroring the pixels of an image about a line of symmetry. Another way of manipulating an image is by rotating it, in which case the pixels are rotated around a certain origin by a given rotation angle.
3.3.6 Arithmetic Image Processing
While basic image processing only changes the location of pixels, moving them to another position, there is another way to modify pictures: carrying out arithmetic operations on pixels. This makes it possible to add/subtract an integer to/from pixel values, or to multiply or divide image pixels by a constant value. In Figure 24 below we can see what effect adding 50 and 150 to an image has. The pictures appear brighter as the added amount increases, and pixel values that exceed 255 appear as white. In Figure 24 we can clearly see the changes generated by adding to the pixel values of the original image.
Original +50 to pixel value +150 to pixel value
Figure 24 – Example of adding 50 and 150 to the pixel value[15]
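A minimal NumPy sketch of this arithmetic operation (an illustration, not the LabVIEW implementation), with saturation at 255 so that overflowing pixels appear white:

import numpy as np

def add_constant(img_u8, value):
    # Add a constant to every pixel; results above 255 (or below 0) saturate.
    return np.clip(img_u8.astype(int) + value, 0, 255).astype(np.uint8)

# img can be any 8-bit grayscale array, here a random test image:
img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
brighter = add_constant(img, 50)
much_brighter = add_constant(img, 150)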
3.3.6 Image Enhancement:
Image enhancement is the process with the help of which we can emphasize certain image features, such as edge boundaries or contrast. By this we can improve the visual appearance or content of the image.
A simple but efficient class of image enhancement techniques is the so-called point operations. The class gets its name from the fact that each pixel is recalculated independently, according to a certain transformation. These transformations are also referred to as contrast enhancement, contrast stretching or grey-scale transformation techniques, and they usually involve adjusting the brightness and contrast of an image.
3.3.7 Linear Mapping(Linear point operations)
The overall brightness of a grey-scale image can be adjusted by adding a constant bias, b, to pixel values, as seen in Formula 4:
g(x,y) = f(x,y) + b
Formula 4.
The brightness of the image can be increased or decreased by modifying the value of b. In Figure 25 we can observe how modifying the brightness affects the original image.
Block Diagram Original Brightness +50 Brightness -50
Figure 25 – Block diagram and the effect of modifying the brightness value on the original image
The contrast of a grey-scale image can be adjusted by multiplying all pixel values by a constant gain, as seen in Formula 5:
g(x,y) = a·f(x,y)
Formula 5.
By increasing or decreasing the value of a, we can modify the contrast of the image, as shown in Figure 26.
Original image Pixel values multiplied by 2
Figure 26 – The effect of modifying the image contrast[15]
3.3.8 Negation:
Negation, also called inversion, is accomplished by applying a negative gain factor to an image. The typical way of getting a negative image is by subtracting each pixel from the maximum pixel value; in case of an 8-bit image, by using Formula 6:
g(x,y) = 255 − f(x,y)
Formula 6.
Original Negative image
Figure 27 – Image Negation[15]
In Figure 27 we can clearly see the effect of negation. It is similar to the negatives used in photographic film. In some situations the negation of an image can come in quite handy: it can highlight information that cannot be noticed in the original image.
3.3.9 Intensity Level Scaling:
Intensity level scaling can be used to segment certain grey-level regions from the rest of the image. This technique can prove useful when the ROI of an image lies within a particular grey-level range. One way of intensity level scaling is to segment a particular range, keep the original pixel values within it and set the rest to zero, which can easily be done with Formula 7.
Formula 7.
Another way, using Formula 8, is to segment a particular range, replace the pixel values within it by the highest value and set the rest to zero:
Formula 8.
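Since Formulas 7 and 8 are not reproduced here, the two scaling modes described above can be sketched in Python/NumPy as follows (an illustration under the stated assumptions, with lo and hi denoting the chosen grey-level range):

import numpy as np

def keep_range(img_u8, lo, hi):
    # Mode 1 (Formula 7): keep original pixel values inside [lo, hi], set the rest to zero.
    mask = (img_u8 >= lo) & (img_u8 <= hi)
    return np.where(mask, img_u8, 0).astype(np.uint8)

def binarize_range(img_u8, lo, hi):
    # Mode 2 (Formula 8): set pixels inside [lo, hi] to the maximum value (255), the rest to zero.
    mask = (img_u8 >= lo) & (img_u8 <= hi)
    return np.where(mask, 255, 0).astype(np.uint8)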
3.3.10 Histograms:
By an image histogram we understand the frequency of occurrence of the grey levels in an image. The histogram of an 8-bit image is a table with 256 entries that records the number of occurrences of each level in the image. From the histogram of an image we can deduce useful information about the importance and frequency of the different grey levels, which becomes especially useful when selecting a brightness and contrast procedure.
Image Image histogram
Figure 28 – Original image and image histogram
As you can see from Figure 28, most of the pixels are between the 60 and 180 levels and there are few pixels in the 0-50 range, so we do not have many very dark pixels. Changing the brightness of the image will affect the histogram in a predictable way: adding a constant bias to the original image will shift the histogram along the grey-level axis by the corresponding distance, without affecting the shape of the histogram. A non-linear mapping of the grey levels will usually stretch some parts of the histogram and compress others.
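Computing such a histogram is straightforward; a minimal NumPy sketch (not the IMAQ implementation) is:

import numpy as np

def grey_histogram(img_u8):
    # One counter per grey level (256 entries for an 8-bit image).
    return np.bincount(img_u8.ravel(), minlength=256)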
3.3.11 Histogram Equalization:
By allocating more grey levels to the parts of the histogram where most of the pixels lie, and fewer grey levels to the parts with a smaller number of pixels, an optimal contrast is achieved, as the contrast is varied according to the number of pixels within the image itself.
Histogram equalization of an 8-bit image can be performed by following these steps:
Define the scaling factor d=255/total number of pixels
Calculate the histogram of the image
Compute the first grey level mapping c[0]=d x histogram[0]
Compute for the remaining grey levels c[i]=c[i-1]+d x histogram[i]
Perform mapping for all pixels in the image g(x,y)=c[f(x,y)]
Histogram equalized image Histogram
Figure 29 – Image after histogram equalization and its corresponding histogram[15]
With the help of this procedure we can ensure that the distance between two mapping levels is proportional to the number of occurrences of the level: less frequent grey levels will be mapped closer together, while more frequent grey levels will be mapped further apart. In Figure 29 we can easily see the effect of histogram equalization on the original image.
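The five steps listed above translate almost directly into NumPy; the sketch below is only an illustration of that recipe (results are cast to 8-bit values), not the IMAQ routine:

import numpy as np

def equalize(img_u8):
    hist = np.bincount(img_u8.ravel(), minlength=256)  # step 2: histogram of the image
    d = 255.0 / img_u8.size                            # step 1: scaling factor
    c = np.cumsum(hist) * d                            # steps 3-4: cumulative grey-level mapping c[i]
    return c[img_u8].astype(np.uint8)                  # step 5: g(x,y) = c[f(x,y)]

The mapping c is simply the scaled cumulative histogram, which is why frequently occurring grey levels end up spread over more output levels.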
3.3.12 Convolution:
Convolution calculates the new value of a pixel as a weighted sum of the pixels in a certain neighborhood surrounding it. The weighting is defined by a convolution kernel: the neighborhood grey levels are multiplied by coefficients stored in a matrix (the convolution kernel), and the dimensions of the kernel define the neighborhood in which the convolution is calculated.
The convolution process for a kernel of size w × h and an input image denoted f is represented in Formula 9:
Formula 9.
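Since Formula 9 is not reproduced here, the idea can be sketched directly in Python/NumPy (a naive, illustrative implementation for odd-sized kernels, not the optimized IMAQ version):

import numpy as np

def convolve(f, kernel):
    # New pixel value = weighted sum of the neighbourhood defined by the kernel.
    k = np.flipud(np.fliplr(np.asarray(kernel, dtype=float)))  # flip for true convolution
    kh, kw = k.shape
    p = np.pad(f.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(f.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + f.shape[0], j:j + f.shape[1]]
    return out

# Example: 3 x 3 smoothing kernel (all weights 1/9).
img = np.random.randint(0, 256, (64, 64))
smoothed = convolve(img, np.full((3, 3), 1.0 / 9))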
3.3.13 Morphology:
When applied to vision, morphology refers to altering an image with the help of computer routines. In some cases it also refers to changing an image in steps, but this is not true in all cases. When the acquired images have noise, unwanted holes or small particles that we do not need, that is when we use morphology.
Most morphological operations are neighborhood based (the new value of a pixel is calculated from its neighboring pixels). We consider two basic types of morphological operations: erosion and dilation. Morphological operations behave similarly to filters, except that the kernels used depend on the original value of the PUI.
3.3.14 Dilation:
Dilation refers to the spatial expansion of an object in order to increase its size, thereby filling holes and connecting neighboring objects. Consider an image with the elements shown in Table 3:
Table 3.
If the PUI is 1, it keeps its value; if it is 0, then it becomes the logical OR of its cardinal neighbors:
Formula 10.
In the example in Figure 30 we can see how dilation works: any PUI that was originally 1 is retained, small holes are filled and boundaries are also retained.
Figure 30 – Original image and the effect of dilation[15]
3.3.15 Erosion:
Everybody knows the effects of erosion by water; erosion in image processing works similarly, taking away the edges of an image's features, thereby decreasing their size and opening up holes inside the objects. Once more we consider an image with the elements shown in Table 4:
Table 4.
In case of simple erosion the following rules are applied: if the PUI is 0, it keeps its value; if it is 1 and all of its cardinal neighbors are also 1, then we set the new pixel value to 1 (otherwise we set it to 0):
Formula 11.
Figure 31 – Original image and the effect of erosion[15]
In Figure 31 we can see how erosion affects the original image, showing the effect of Formula 11 on the original pixels. Just like water, the erosion algorithm takes away the pixels close to the boundaries of the image's features.
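The two 4-connectivity rules above can be sketched in NumPy as follows (an illustrative binary implementation, not the IMAQ one):

import numpy as np

def _cardinal(img):
    # Boolean views of the four cardinal neighbours of every pixel (borders padded with 0).
    p = np.pad(img.astype(bool), 1, constant_values=False)
    return p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]

def dilate4(img):
    # A 1-pixel keeps its value; a 0-pixel becomes the logical OR of its cardinal neighbours.
    up, down, left, right = _cardinal(img)
    return (img.astype(bool) | up | down | left | right).astype(np.uint8)

def erode4(img):
    # A 0-pixel keeps its value; a 1-pixel stays 1 only if all its cardinal neighbours are 1.
    up, down, left, right = _cardinal(img)
    return (img.astype(bool) & up & down & left & right).astype(np.uint8)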
3.3.16 Closing:
Closing refers to removing small holes and other unwanted background features from an image. To accomplish this, all we need to do is first dilate and then erode the source image, using Formulas 10 and 11:
Formula 12.
You might expect that performing two seemingly opposite morphological operations on a source image would leave the destination image identical to the source, but this is not the case: some of the original image data is already lost when the first operation is performed, so it cannot be recovered.
Figure 32 – Effect of Closing on a source image[15]
In Figure 32 we can see in detail the effect of dilation followed by erosion on a source image, and we can also notice that after the two operations have finished we do not get the original image back. The effect of Formula 12 on the starting image is clear: small holes were filled, as well as a small portion of the upper inlet.
3.3.17 Opening:
Opening can be thought of as expanding holes and other background features in an image. It requires the opposite sequence to closing: first erode the image and then dilate it, as shown in Formula 13.
Formula 13.
Figure 33 – Effect of Opening on a source image[15]
In Figure 33 you can clearly see the effect of the opening operation on the original image: erosion does its job first, and by then dilating the intermediate image we get the completely opened result.
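Expressed with the dilate4 and erode4 helpers sketched in the erosion and dilation sections above (an assumption of this illustration), closing and opening are simply the two compositions described by Formulas 12 and 13:

def close4(img):
    # Closing: dilation followed by erosion - fills small holes and gaps.
    return erode4(dilate4(img))

def open4(img):
    # Opening: erosion followed by dilation - enlarges holes and removes small particles.
    return dilate4(erode4(img))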
3.3.18 Particle Removal:
One of the most common uses of morphology in machine vision systems is particle removal. In some cases you may need to count the number of particles larger or smaller than a set point in a source image. In these cases a very powerful tool is IMAQ Filter Particle, with the help of which we can remove particles smaller than a predefined area or those that do not conform to a defined circularity.
3.3.19 Filling Particle Holes:
Once all the unwanted particles have been removed, the next useful step is to fill any holes in the image's objects. By this we obtain solid objects in the image, which makes it easier to count or measure them. IMAQ FillHoles accepts binary source images and changes the intensity value of any detected hole pixels to 1. Holes touching the edge of the image are not filled, because it cannot be determined whether they are holes or irregularly shaped objects.
3.3.20 IMAQ Danielsson
In many cases where high speed distance mapping is needed, IMAQ Distance is the ideal tool, but it is far from accurate. In cases where accuracy matters more than speed, we need to use IMAQ Danielsson. The Danielsson distance algorithm was designed by Per-Erik Danielsson and is based on the Euclidean distance map. The map values are coordinates giving the x and y distance to the nearest point on the boundary, and are stored as complex-valued pixels; the absolute value function is then used to transform these values into radial distances. When using IMAQ Danielsson, we do not need to specify whether we want to use square or hexagonal pixel frames.
3.3.21 Edge Detection:
Edge detection is a very important part of image processing and also of our project. Edge detection algorithms usually contain these three steps (a short sketch follows the list below):
Noise reduction – edge detection algorithms can be affected by noise, which is why some kind of noise reduction is recommended; most commonly low-pass filtering is used.
Edge enhancement – this emphasizes pixels that show a great change in local intensity. For this we use a high-pass filter, which acts strongly on the edges and weakly on other parts of the image.
Edge detection – this distinguishes meaningful edges from other points that have a nonzero filter response. Thresholding is often used for this kind of detection.
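A compact NumPy sketch of the three steps, using a 3×3 averaging filter for noise reduction and Sobel kernels for edge enhancement (illustrative choices; the threshold value is an assumption to be tuned to the scene):

import numpy as np

def filter2(img, k):
    # Simple 'same'-size neighbourhood filtering with an odd-sized kernel.
    kh, kw = k.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def detect_edges(gray, thresh=100.0):
    gray = gray.astype(float)
    blurred = filter2(gray, np.full((3, 3), 1.0 / 9))            # 1. noise reduction (low-pass)
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)   # Sobel kernels
    magnitude = np.hypot(filter2(blurred, sx), filter2(blurred, sx.T))  # 2. edge enhancement (high-pass)
    return (magnitude > thresh).astype(np.uint8)                 # 3. edge detection by thresholding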
3.4 Cameras:
The role of image acquisition should never be underestimated. Making the right hardware selection can greatly influence the system's performance and save time on programming. When choosing a camera you should always take your project's demands into consideration; with the right lens, lighting and camera setup you will have good quality images to work with. An electronic camera contains a sensor that maps an array of incident photons into an electronic signal.
There are three major types of camera:
3.4.1 Progressive Area Scan
Best used for fast-moving objects. Progressive area scan cameras operate by transferring an entire captured frame from the image sensor, and as long as the image acquisition is fast enough, the motion will be frozen and the image will be a true representation of the object. They gained popularity in computer-based applications because they eliminate many time-consuming processing steps.
3.4.2 Interlaced Area Scan
The standard interlaced technique is to transmit the picture in two parts, as shown in Figure 34, where we can see how the letter “a” is transmitted. The two fields are acquired at slightly different times; we cannot observe this with the naked eye because of the brain's ability to combine the interlaced fields into one continuous image, but a machine vision system can easily be confused by it.
Figure 34 – a. interlaced image, b. field A, c. Field B[15]
3.4.3 Line Scan
Scanners and line scan cameras use a sensor with CCD pixels arranged in a single row. As their name suggests, line scan cameras work with such a linear array of sensors that scans the scene and builds the image one row at a time, as shown in Figure 35, where we can see the scanning process.
Figure 35 – Line scan progression[15]
The resolution of a line scan system can be different for the two axes, because the resolution perpendicular to the array depends on how fast the scene is physically scanned, while the resolution along the linear array depends on the number of pixel sensors available.
Chapter 4: Hardware Development System and LabVIEW Programming Environment
National Instruments is a world leading manufacturer of industrial measurement and automation equipment, having sold its products to over 30,000 companies around the globe, and can be considered one of the leading providers in its domain. Fortunately, thanks to good relations with the University of Debrecen, we were given the opportunity not only to visit their factory but also to develop a software package for some of their more significant products. One of these products is the sbRIO-9631 prototype robot. This device can be used not only for educational purposes but also in development and research projects. The system itself is based on an FPGA processor, which makes it easy to interface with other peripheral devices and is capable of high speed, real-time data processing and transfer. The programming of the robot is done with the help of the LabVIEW Robotics program package, which is a module-based graphical programming environment. The CVS-1454 Compact Vision System is another outstanding product of National Instruments that can be attached to the robot, thus creating a system that is capable of high resolution image processing. The Basler ScA-640 camera is an essential part of the system, supplying the image information for the device. The image processing module communicates with the robot using its output ports, updating the robot with the results of the processed images. By interconnecting the above mentioned devices, a high speed and high quality image processing system is obtained, with the help of which complex real-time robot control can be achieved in a relatively short time.
The project can be broken down into two major parts: software and hardware. By combining these two parts we were able to create a system that is capable of real-time image processing, pattern matching and also basic facial recognition. After presenting the hardware and software components separately and in detail, we will illustrate and describe the functioning and results of the developed software package. These results were also presented to the director and engineers of National Instruments Debrecen, who awarded us third place in the National Instruments Virtual Implementation Foundations competition.
4.1 The sbRIO-9631 Prototype Robot:
The most important part of the hardware configuration is the NI LabVIEW Robotics Starter Kit, a mobile robotic platform containing sensors, motors and an NI Single-Board RIO that is responsible for the embedded hardware control. The simplicity and versatility of this robotic platform make it ideal for learning robotics and mechatronics concepts, or even for developing more sophisticated robot prototypes using LabVIEW Robotics. The robotic platform comes preprogrammed with an obstacle-avoidance program that executes a vector field histogram obstacle avoidance algorithm based on feedback from the included ultrasonic sensor. During our research we did not use this particular software, but it illustrates very well the capabilities of the robot and the software package. The NI LabVIEW Robotics Starter Kit features[31]:
Pitsco Education 12 VDC motors featuring 152 rpm and 300 oz-in torque
Optical quadrature encoders with 400 pulses per revolution
PING))) ultrasonic distance sensor for distance measurement between 2cm and 3m
PING))) mounting bracket for 180-degrees sweep of the environment
Pitsco Education TETRIC 4in. wheels
The NI sbRIO-9631 was designed especially for applications that require flexibility, high performance and even higher reliability. To achieve this, it is equipped with an industrial 266 MHz Freescale MPC5200 real-time processor that is used for deterministic real-time applications. To maximize performance, this real-time processor is combined via a high-speed internal PCI bus with an onboard reconfigurable Xilinx Spartan-3 field-programmable gate array (FPGA), which is connected directly to all onboard 3.3 V I/O. All analog and digital I/O ports have a direct connection to this FPGA. The board also contains 110 bidirectional digital lines, 32 16-bit analog inputs and four 16-bit analog outputs. In addition to all these facilities offered by the sbRIO-9631 board, the manufacturers also included three connectors for adding board-only versions of NI, third-party, or custom C Series I/O modules. As mentioned before, this board was designed for industrial use, so it can work under many conditions: it can operate with a 19 to 30 VDC power supply and it can withstand and operate within a -20 to 50 °C temperature range. Communication with the board is also done easily, with the help of the 10/100 Mbit/s Ethernet and serial ports, but you can also communicate with external devices and systems via TCP/IP, UDP, Modbus/TCP and serial protocols. Some of the most important features of the sbRIO-9631 are[28]:
Integrated real-time controller, reconfigurable FPGA, and I/O on a single board
1M gate Xilinx Spartan FPGA
266 MHz Freescale real-time processor
64 MB DRAM, 128 MB nonvolatile storage
RS232 serial port for peripheral devices
110 3.3 V (5 V tolerant/TTL compatible) digital I/O lines
32 single-ended/16 differential 16-bit analog input channels at 250kB/s
Four 16-bit analog output channels at 100kB/s
Low power consumption with single 19 to 30 VDC power supply input
Figure 37 – The NI sbRIO-9631 prototype robot
In Figure 37 we can see what the NI sbRIO-9631 prototype robot looks like out of the box. As we can see in the picture, the only sensor that helps its orientation in 2D space is the PING))) ultrasonic distance sensor. Later on, by adding the CVS module and the DAQ module, the capabilities of the robot are significantly extended. In Figure 40 we can observe the complete robot with the two mentioned modules connected to it, giving it the capability of real-time image and sound processing. To better understand the NI sbRIO-9631 prototype robot we can use the block diagram in Figure 38, on which we can clearly see the major components of the robot and how they are connected. By following the current in the diagram, I would like to present the basic parts of the robot and the connections between them.
Figure 38. sbRIO 9631 prototype robot components[19]
The TETRIC 12-Volt Battery Pack(9) supplies the power for the whole robot. This is followed immediately by a Fuse(14), which can interrupt the flow of current in case of any malfunction, a Connector(1), with the help of which we can easily disconnect the Battery Pack(9) from the rest of the system, and two switches(2,3) that can interrupt the current going to the whole system, MASTER(2), or just to the two MOTORS(3).
Figure 39. LabVIEW Robotics Starter Kit (Block Diagram)[17]
If both switches(2,3) are set to 1, thus letting the current through, power reaches the National Instruments 12 V–24 V DC-DC Converter(10), which converts the Battery Pack's 12 V DC into the 24 V DC needed to power the NI sbRIO-9631 FPGA, and the Dual DC Motor Controller(12), which makes it possible to control the speed, direction and torque of the two DC motors(4). These motors are also connected to two Optical Encoders, one for each motor, that transmit data about the movement of the wheels to the Breakout Board(11), so the position of the wheels and the number of complete turns they have made are known at all times. Also connected to the above mentioned Breakout Board(11) are the robot's “eyes and ears”, the PING))) ultrasonic sensor(6) and the servo motor(5) it is mounted on; the servo motor(5) helps the sensor cover a wider range by moving it. Finally, the National Instruments Breakout Board(7) is connected through the 3.3 V digital I/O ports of the sbRIO, thus supplying the FPGA with valuable data from the sensors.
4.2 The NI CVS-1454 Compact Vision System
National Instruments provides a wide variety of modules that make it possible to expand the already numerous capabilities of the sbRIO-9631 prototype robot. Using these modules we are capable of obtaining and processing a real-time image stream and using the robot's I/O ports to control its movement, thus creating a completely autonomous machine. For obtaining the image data, we use a high resolution digital camera, and the images are captured and processed in real time by the CVS's integrated processor[23]. The obtained information can be downloaded to the robot's main board, thus commanding it. The CVS-1454 is capable of processing the image information obtained from three different cameras, but in our experiment the use of one was fully sufficient. The full capabilities of the CVS system are presented below.
In our project, obtaining and processing real-time image data is an essential part. To fulfill this invaluable task we were provided with a CVS-1454 Compact Vision System and a high resolution digital camera. “The NI CVS-1450 Series devices are easy to use, distributed, real-time imaging systems that acquire, process and display images from IEEE 1394 cameras conforming to the IIDC 1394-based Digital Camera specifications”[16]. The system is also equipped with numerous I/O ports that can be, and were, used in our project for communicating with external devices (the sbRIO-9631), to configure and start inspections and also to display results. Communication and uploading of the bit-file to the CVS system is accomplished using an Ethernet cable, through which we can also display measurement results and status information on the computer. A big advantage of the system is that, once configured, it can run without being connected to the development computer.
The hardware configuration of the NI CVS-1454 is shown in Figure 40. and consists of a VGA connector(9), RS-232 serial port(10), 10/100 Ethernet connector(11), and three IEEE 1394a ports(5). The system was designed in such a way that it also contains LEDs(1,2) for communicating the status of the system and DIP switches(8) that help to specify the startup options. The I/O ports are of two kinds: triggers(3,4), that are TTL I/O ports and isolated I/O ports(6) for connecting to external devices as in our case the sb-RIO-9631 FPGA. The isolated I/O ports help prevent ground loops that can degrade signal integrity.
Figure 40. CVS-1454 front panel[16]
By combining these two state-of-the-art systems from National Instruments, the NI sbRIO-9631 prototype robot and the CVS-1454, as shown in Figure 40, we are able to create a reliable and robust real-time image processing system that is capable of controlling the robot. The communication between the two modules is implemented using the above mentioned I/O ports available on both devices. After the CVS processes the acquired image data, the results are sent to its output port, which in turn is connected to the input pin of the robot. The robot receives the output generated by the CVS module and moves according to it.
Figure 41. Complete vision hardware development system
In Figure 41 we can see the complete hardware development system with all its components in place: the microphone(1), not used in my project but used in a separate one, the DAQ module(2), also used in a parallel project, the CVS-1454(3) and the Basler camera(4). By connecting these modules we have been able to create a vision controlled mobile robot.
4.3 LabVIEW Programming Environment
After the short description of the hardware composition of our project given above, it is also important to mention its software programming. Both of the above mentioned components, the sbRIO-9631 and the CVS-1454, are programmable and in our case programmed with the help of the LabVIEW graphical programming environment. Although we use the same graphical environment to program both hardware components, the toolkits and the modules used are very different.
When programming the sbRIO-9631, the real-time processor runs the LabVIEW Real-Time Module on the Wind River VxWorks real-time operating system (RTOS), thus offering extreme reliability and determinism. The on-board FPGA can easily be reprogrammed using the LabVIEW FPGA Module, which offers “high speed control, custom I/O timing and inline signal processing”[17]. This task is also made easier by the built-in drivers and APIs for handling data transfer between the FPGA and the real-time processor.
The most frequently used module in programming the sbRIO-9631 is the LabVIEW Robotics Module. It plugs easily into the LabVIEW graphical programming environment and delivers an extensive array of robotics libraries that include:
Built-in connectivity to robotic sensors
Foundational algorithms for intelligent operations and robust perception
Motion functions for making your robot or vehicle move
Real-world application examples
Connectivity to third-party simulation environment
Forward and inverse kinematics
Libraries for protocols including I2C, SPI, PWM and JAUS
Interface to robotics software by MobileRobots and Skillgent[17]
The LabVIEW Robotics Module, as a software package of LabVIEW, offers tools for developing robots of all levels of complexity, covering a wide range from the simplest educational robots to the most complicated automation systems. It includes built-in drivers for the following sensors and cameras for the NI sbRIO and NI CompactRIO embedded platforms:
Hokuyo, SICK and Velodyne LIDAR
Sharp infrared sensor
Garmin, NavCom, and U-blox GPS
Crossbow, MicroStrain, and OceanServer inertial measurement units (IMU)
Devantech and MaxSonar sonar sensor
Basler and Axis IP cameras
Analog cameras with AD-1501 analog frame grabber from movMED[17]
4.4 Programming the CVS-1454
The second essential part of our mobile robot is the CVS-1454 Compact Vision System, which offers the power and flexibility needed for our project. For programming this device, just as in the case of the sbRIO-9631, we use the LabVIEW graphical programming environment, with the difference that we also use menu-driven environments such as Vision Builder, which is used for automated inspections. By using this we can significantly simplify the programming process by replacing complexity with an interactive development environment. It is suitable for gauging, part present/not present, alignment and optical character recognition operations. With the help of LabVIEW and Vision Builder we were able to easily develop our own image processing system, optimize the processing speed and memory usage, and develop a custom interface that helped us verify and display the results.
For programming the CVS-1454 module the LabVIEW programming environment is essential, but the LabVIEW Real-Time Module, the NI Vision Development Module and the Vision Acquisition Software are also needed. The LabVIEW Real-Time Module combines LabVIEW graphical programming with the power of Real-Time (RT) Series hardware, such as the NI CVS-1454 Series, enabling the user to build deterministic, real-time systems[16]. The RT target can run VIs without the user interface and can offer a stable platform for real-time VIs. Another essential part of programming the CVS system is the NI Vision Development Module. It can be considered an image acquisition, processing and analysis library of more than 270 functions that can be used for the following machine vision tasks:
Pattern Matching
Particle analysis
Gauging
Taking measurements
Grayscale, color, and binary image display
In our project this was a valuable tool, allowing us to acquire, display and store images. Aside from this, we had the opportunity to use it for solving the major image analysis and processing problems. It makes it possible to tackle complicated image processing problems easily, without much knowledge of the particular algorithm implementations. Included in the Vision Development Module is the Vision Assistant, an interactive prototyping tool for machine vision and scientific image development. During our research and implementation we used this tool many times because of its simplicity and reliability; it helps the user design a reliable and robust image processing system with little time and effort. Also included in the NI Vision Acquisition Software are the Measurement & Automation Explorer, also referred to as MAX, and the NI-IMAQdx driver software. Using MAX we can obtain the IP address of the CVS and also update the software on it.
After this short overview of the two major parts of the project, the sbRIO-9631 and the CVS-1454, from both hardware and software points of view, we can present the progress of our research project, the experimental results obtained and the aims of further research and development of the system.
Chapter 5: Color Tracking and Image Recognition Using LabVIEW
After spending a couple of days visiting the National Instruments factory, where we were fortunate enough to be shown the inner workings of one of the world's leading industrial automation factories, we were given the interesting and challenging task of creating a sound and face recognition software package using some of their more significant products. The NI factory provided us with all the necessary hardware and software needed to complete our project. These included some of their top-of-the-line products, like the sbRIO-9631 prototype robot, the CVS-1454 compact vision system and the LabVIEW graphical programming environment, with all the modules, libraries and drivers necessary for us.
5.1 Getting Started with sbRIO-9631 Prototype Robot and the CVS-1454 Module
As with many things in life, the first step is always research. We needed to read through the manuals of the hardware that was given to us. We studied thoroughly the documentation and manuals provided by NI for both the sbRIO-9631 prototype robot[17] and the CVS-1454 module[16]. We were especially focused on finding a solution for the communication between the two modules. The initial phase also included learning the basics of graphical programming[18] and how to use the Vision toolkit[16].
The next step, after learning about the structure and programming of the robot, was to “bring it to life”. For this we were provided with a very useful Getting Started Guide for the Robotics Prototyping Kit[19], which was an enormous help in finding out how to connect the robot to the PC and how to upload the first software onto it. Uploading the first software is made very simple with the help of the hardware setup wizard, which guides you through the whole process in a few easy-to-follow steps. These steps are also shown in Figure 42.
Figure 42. Hardware setup wizard
By following the steps from Step 1 to Step 5 we are able to complete the full hardware configuration of the prototype robot. The next step was to create a new robotics project; for this NI also offers an easy-to-use solution in the form of the Robotics Project Wizard, illustrated in Figure 43.
Figure 43. Create new LabVIEW Robotics Project
After choosing the Robotics Starter Kit and entering the correct IP address, which you get from the Measurement and Automation Explorer (MAX), you only need to name your new robotics project. After the project is created, we only need to press the RUN button, marked with an arrow in the top-right corner. After successfully uploading the first program, Roaming.vi, to the robot we were able to see how it moved and avoided objects in a room. After seeing the robot work and move in the 2D plane on its own, we needed to understand the software that was running on it, to gain a deeper insight into LabVIEW programming.
The LabVIEW project consisted of two important VIs: Roaming.vi and Starter Kit FPGA.vi. From these two VIs we only needed the part responsible for steering and driving the two motors; the rest of the robot's I/O, like data from the ultrasonic sensor or the optical encoders, was of no significant interest at the moment. After finding and isolating the blocks responsible for moving the robot in 2D space, we were able to move the robot without using the rest of its inputs and outputs. Starter Kit FPGA.vi contains the I/O block of the FPGA; it basically gets the data from all the sensors and sends it to Roaming.vi through read/write blocks. Here we can also find various sub-VIs that we do not use in our project, like the Encoder Loop and the Ultrasonic Distance Loop. Although we do not use many of the modules found here, it offers valuable insight into how LabVIEW and the sbRIO-9631 prototype robot's sensors and motors work.
5.2 Color Tracking
After reading many books and scientific papers on the subject of facial recognition and machine vision from authors like Christopher G. Relf[17], Gary W. Johnson[15], Iyad Aldasouqi[21] and Christopher A. Waring[20], we managed to acquire a basic knowledge of the subject. After installing LabVIEW onto our computer we could start experimenting with designing basic machine vision applications. For these first experiments we did not use the provided hardware; we only experimented with pictures from folders, which could be played one after another as a slide show and were perfectly suitable for experiments with basic color and shape recognition applications. The block diagram for a simple color tracking application can be seen in Figure 44.
Figure 44. Block diagram of a simple color tracking .vi
The operation of the block diagram is not complicated and is easy to understand. The whole application runs in a while loop, which makes it run continuously until the Stop button is pressed. As a first step, the images are taken and run continuously in the Vision Acquisition block(1). Here you have a few options to choose from, depending on your system's requirements. First we need to select the Acquisition Source, which can be Simulated (Folder of Images or AVI) or a Camera; in this first experiment we use Folder of Images. Next we are required to select the Acquisition Type, which can be Single Acquisition with processing, Continuous Acquisition with inline processing, Finite Acquisition with inline processing or Finite Acquisition with post processing. All of these are suited to different needs, but in our case Continuous Acquisition with inline processing offers the best results. As a next step we are asked to configure the Acquisition Settings, specifying the Image Path. As a last step we need to select the Controls/Indicators that we will be using for further processing. From here the images are sent to a display(7) and to the Vision Assistant(2) block, which does the processing of the images.
The Vision Assistant(2) block is responsible for processing the obtained images. Inside this block every sub-module has a different, well defined purpose and task, and they need to be placed in linear order from left to right, as portrayed in Figure 45. The first part of the script is always the Original Image, which contains the original sample image; the elements of this image will serve as samples for the actual image recognition.
Figure 45. Sub-modules of the Vision Assistant .vi
The Color Threshold helps us isolate a color of our choice, which we will be tracking further on, transforming all other colors into black. Using the next block, Advanced Morphology, we are able to modify the binary image, enabling the user to remove small objects and fill the small holes created due to lighting conditions. The Particle Analysis module helps in finding the object's center of mass; combining this module with the Set Coordinate System makes tracking the object much easier. In this case, where we need to track a circular object, the Find Circular Edge block needs to be used.
After the results of the processing are given to us through the outputs of the block, we are able to use this data to control the Case loop(4) responsible for creating the circular ROI, and a display that tells us the exact number of green circular objects found in the image. The input and the processed image are both displayed using Display Blocks(7), and there is a Loop Delay with a Slide Bar with which we can adjust the frame speed.
Figure 46. Color tracking
As we can see in Figure 46, with the help of this application we can recognize, track and build a ROI around the green, round objects. The Front Panel of the .vi displays both the original image and the processed image, making it extremely easy to verify the accuracy of the created color tracking .vi.
5.3 Retina Recognition
In our modern world we are becoming more and more reliant on fast and highly reliable personal identification, driven by various government and industrial applications. Passwords are already a thing of the past because of their high risk: they can be lost, stolen or simply divulged to the wrong person. Biometrics, and more specifically retina scanning, is one of the methods that can make personal access more secure. Due to the uniqueness of the iris pattern, many people argue that it is the most secure recognition method to date. John Daugman[23] had some incredible results on this subject using two-dimensional Gabor wavelets; the results were so good that most commercially available iris recognition products are based on them.
Iris recognition is a method of biometric personal identification based on high-resolution iris images of human eyes[24]. It is especially useful because of two special properties of the human iris: it varies very little over a human lifetime and it is genetically independent. The false acceptance rate in retina scans is 1 in 10^31[25].
The process can be broken down into a few major parts: acquisition, localization and processing. All of these steps have their own challenges that need to be tackled; already at the first step we are faced with the problem of acquiring images of suitable quality for the recognition. The simplest way to do this is to create a high resolution picture database that contains retina images.
After reading Mohinder Pal Joshi[22], R.S. Uppal[22] and Livjeet Kaur’s[22] paper entitled Development of Vision Based Face Detection System, and doing a little more research on the subject I came to the conclusion that retina scan with CVS and LabVIEW is truly possible, but it has its limitations.
Recognizing objects based on their color is an efficient way of recognition, but it has its limitations, for example if an object is made up of several colors or has an irregular shape. In real-life machine vision we encounter such cases many times, and in these cases Color Threshold and Find Circular Edge alone cannot help us. These are the cases in which Color Pattern Matching becomes a powerful and essential tool.
Like many of the above mentioned blocks, it can also be found inside the Vision Assistant. By using this block we can create or load a template, which we will search for in the original image. In addition it has a very useful setting that allows us to search for rotated patterns. By using Color Pattern Matching we could create a retina recognition application.
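The Color Pattern Matching block itself has no text-code form, but the underlying idea of sliding a template over the image and scoring the match can be sketched in Python with OpenCV (a rough stand-in, not the NI implementation; the file names and the 0.8 acceptance threshold are assumptions):

import cv2

# Hypothetical files: an eye image and a small patch of the retina used as template.
scene = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("retina_patch.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation: scores close to 1.0 indicate a strong match.
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)

if best_score > 0.8:
    print("Template found at", best_loc, "with score", round(best_score, 2))
else:
    print("No match")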
Figure 47. Retina recognition .vi
Just like fingerprints, retina patterns are unique to each person; no two humans have the same retina. This uniqueness offers a high reliability factor when comparing the templates to each person's retina. As we can see in Figure 47, we do not use the whole retina of the eye as a template, but only two small parts of it(1),(2). Only if both templates are successfully recognized in one image does the system confirm a match(3). By requiring two smaller parts of the retina to match, we significantly reduce the number of false acceptances.
5.4 Face Recognition
After these initial successful results in tracking different colors, we decided to try recognizing a given human face from a series of pictures containing different faces. In this experiment we combined two different approaches: isolating human skin based on color and hue/saturation level, and template matching. The structure of the block diagram is very similar to the one mentioned above; the major changes have been made in the Vision Assistant block, as we can see in Figure 48.
Figure 48 Template Matching and HUE select in Vision Assistant
The first block, as always, remains the original image(1), which we can also see in the top-left corner. This is followed by the Color Pattern Matching block, which allows us to create or load an image file that will be used as a template(2) for further recognition procedures. After the Color Threshold, Fill Holes and Remove Small Objects operations we are left with the parts of the image that contain human skin color(3). By arranging the blocks in the Vision Assistant in this order we are able to create a reliable face detection and recognition system, which offers good results, as we can see in Figure 49.
Figure 49. Face recognition results
As we can see clearly in the picture above, we managed to isolate the faces from the background and to select the face that we needed from all the rest. Template matching is a very useful tool for selecting the face that we need, but it has its limitations: if the input image differs significantly from the template, the matching loses accuracy.
5.5 Robot Controlled by Image Processing:
After these initial experiments on image processing we needed to take the next big step, to control the sbRIO-9631 prototype robot using the results obtained after the image processing. There were two major problems in achieving this, namely: how to send the result of the image processing phase to the robot, and how to accept and use these results in controlling the robot.
The solution to both these problems was to use the I/O ports of both devices. The results of the image processing were sent to the digital outputs of the CVS-1454, which has a 44-pin DSUB connector, as shown in Figure 50, containing general-purpose digital inputs and outputs. These include 2 TTL inputs, 8 TTL outputs, 12 isolated inputs and 4 isolated outputs.
Figure 50. 44 Pin DSUB Connector
We used pins 13 and 27, which are ISO Output 1 and ISO Output 2, both general-purpose output ports. To verify that we were able to send commands to the ports we created a simple .vi, shown in Figure 51, in which we sent 1 and 0 to the above mentioned ports.
Figure 51. CVS-1454 Digital Output Write Block Diagram
Using two push buttons(1), we were able to send a simple 1 or 0 to both output ports, with the help of Read/Write Blocks(2), that were configured accordingly. The results were verified using a simple LED connected to these ports and the GND, and also with a digital voltmeter.
After solving the question of how the two devices could communicate, and having the previously obtained knowledge of processing images we created a .vi that is able to send commands to control the sbRIO-9631 prototype robot. The block diagram can be seen in Figure 52.
Figure 52 CVS Image Processing Block Diagram
The image processing procedure is similar to the one explained in the previous chapter: it uses template matching in order to recognize the different directions the arrow is pointing to. The original image is obtained from the Basler camera through the Vision Acquisition block. After it is displayed, it is processed in two Vision Assistant blocks(2), each one searching for a different template. The results are then displayed(3) and compared. Based on the comparison results the outputs(4) become activated and command the robot. There are also two LED display buttons(5) that show the status of the outputs. Using this simple .vi we are capable of processing the obtained images and sending commands to the sbRIO-9631 FPGA. The completely assembled prototype robot with the CVS module connected to the input ports of the FPGA can be seen in Figure 53.
Figure 53. sbRIO-9631 prototype robot with the CVS-1454 block connected to it
The output results of the CVS module are sent to Port6/DIO8 and Port2/DIO5, represented by pins 39 and 43. The control of the mobile robot can be seen in Figure 54. Using the results of the CVS module we are able to control the direction of the robot. This is accomplished by changing the direction of movement of the two DC motors that drive the two sets of wheels responsible for the spatial movement of the robot.
Figure 54 sbRIO-9631 prototype robot motor control
Block Diagram
Port6/DIO8 and Port2/DIO5(1) get the image processing results from the CVS module. These become inputs for two case loops(2). Based on the results of these two case loops, the third case loop modifies the values sent to the motor control block. By modifying these values we are able to control the movement of the two DC motors, thus controlling the movement of the robot.
5.6 Experimental Results:
The main objective of our research efforts can be considered the development of a real-time application in the LabVIEW programming environment, capable of controlling the SbRIO-9631 prototype robot based on face and image recognition. By recognizing shapes and colors, the robot will be capable of moving in the 2D space. After the initial successes in connecting the hardware components, solving the communication problems and being able to write image processing applications in LabVIEW, we were ready to combine these elements, thus getting a valid image processing controlled robot.
In the initial stages of the experiment we focused on controlling the robot by uploading simple image processing .vi's to the CVS module and using their results to control the robot. The setup for such an experiment can be seen in Figure 53, where we control the robot using a red or green circle. The results of this simple experiment can be seen in the figure below.
Figure 55. Forward/Backward using Red/Green dots
As we can see in Figure 55, we are able to control the robot's movement (3) using the red and green dots (1) pointed at the camera (2). By moving the green dot in front of the camera, the CVS recognizes its color and sends a command to move the robot forward. By placing the red dot in front of the camera, the robot starts to move backwards. A hedged sketch of this color-based decision is given below.
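The following Python/OpenCV sketch illustrates this color-based control; the HSV ranges and the pixel-count threshold are illustrative assumptions rather than the values configured in the Vision Assistant block.

import cv2
import numpy as np

def dot_command(frame_bgr, min_pixels=500):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Assumed HSV ranges for a green and a red dot.
    green = cv2.inRange(hsv, np.array([40, 80, 80]), np.array([80, 255, 255]))
    red = cv2.inRange(hsv, np.array([0, 120, 80]), np.array([10, 255, 255]))
    green_px, red_px = cv2.countNonZero(green), cv2.countNonZero(red)
    if green_px > max(red_px, min_pixels):
        return "forward"    # green dot in view -> move forward
    if red_px > max(green_px, min_pixels):
        return "backward"   # red dot in view -> move backwards
    return "stop"           # neither color is clearly visible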
After establishing the basic forward/backward control of the robot, we developed a .vi that gave the robot more mobility, enabling it to also move to the left and right. In the Front_Panel window shown in Figure 56 we can easily distinguish the Go Right command. The positive recognition of the arrow's direction is shown with the help of the square ROI created around it and is also indicated by the green virtual LED.
Figure 56. Robot control using arrows
Based on the direction the arrow is pointing, the robot moves in different directions: changing the orientation of the arrow changes the movement direction of the robot. Using this simple method we are able to control the robot's movement in the 2D plane.
Figure 57. Robot control using arrows
As we can see in Figure 57, by rotating the arrow (2) in front of the camera (1) we are able to control the robot's movement (3). The images from the camera are processed by the CVS, which recognizes the orientation of the arrow and sends the appropriate signal to the FPGA that controls the two motors of the robot.
In these two cases, because of the simplicity of the images and colors, the recognition rate is very high, over 80%. Differences in lighting conditions and distance can affect the results, but only slightly, so these two modalities are reliable robot control methods.
In the case of robot control using facial recognition and head orientation, the results are not as accurate. Lighting conditions, distance and head orientation can significantly influence the experimental results. The Front_Panel of a head orientation recognition .vi can be seen in Figure 58. By moving the head to the left or right, the robot turns in the corresponding direction.
Figure 58. Left/Right turn by tracking head movement
As in the case above, a positive match is marked by creating a rectangular ROI around the face. For better results the original color image is converted to grayscale before processing. By doing so we are able not only to increase the speed of the process but also to achieve a higher recognition rate. By adjusting the template matching block's settings we can get a slightly higher recognition rate, but these settings need to be set up again every time the external conditions change. Even with the best settings we could find, we were not able to achieve a precise recognition rate: with this method we reached a maximum recognition rate of 63%. Further improvements can be made with a carefully controlled environment, better lighting and more accurate images. A hedged sketch of this grayscale matching step is given below.
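The sketch below illustrates this grayscale matching step in Python/OpenCV under the same caveat: the grayscale "head turned left" and "head turned right" templates and the 0.6 acceptance threshold are assumed values that, as noted, would have to be re-tuned whenever lighting or distance change.

import cv2

def head_turn(frame_bgr, tmpl_left_gray, tmpl_right_gray, threshold=0.6):
    # Convert to grayscale before matching, as in the VI.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    left = cv2.minMaxLoc(cv2.matchTemplate(gray, tmpl_left_gray, cv2.TM_CCOEFF_NORMED))[1]
    right = cv2.minMaxLoc(cv2.matchTemplate(gray, tmpl_right_gray, cv2.TM_CCOEFF_NORMED))[1]
    if max(left, right) < threshold:
        return "no match"   # below threshold: do not move the robot
    return "turn left" if left >= right else "turn right"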
Chapter 6: Conclusion
6.1 Accomplishments:
During this project we were able to accomplish all the major goals that were set for us. By achieving these goals we were able to create an Sb-RIO 9631 prototype robot that is connected to and controlled by a CVS programmed for face and image detection.
In this project the following goals have been achieved:
Researching the history and state of the art of image processing techniques, thereby gaining a deeper knowledge of image processing.
Mastering the LabVIEW programming environment and language.
Getting familiar with the Vision Acquisition and Vision Assistant blocks and all the functions that they contain.
Writing the Sb-RIO 9631 motor control Block Diagram.
Writing the first color recognition and tracking programs.
Achieving communication between the Sb-RIO 9631 and the CVS, thus being able to control the robot with the help of the image processing results sent from the CVS.
Developing image and face recognition Block Diagrams for the CVS module.
6.2 Experimental Results
After conducting numerous experiments under various conditions, we can definitely say that we have achieved a reasonably high success ratio in facial recognition and image recognition. We have been able to detect static images, which are easier to detect because they do not change with time, with an approximately 85% success ratio. As mentioned above and also shown in Figure 54, we were able to detect the different orientations of the arrow or the color of the ball, Figure 44, and to use this data to control the two DC motors on the robot.
While conducting face recognition tests using our software, we concluded that the error percentage during face detection is much higher. Because of the flexibility of the facial muscles and the constantly changing expressions on a human face, it was hard to positively identify the subject using just one sample template: even under ideal environmental conditions the success rate was only 80%, and under normal, non-laboratory conditions this number decreased further, to almost 60%. Using multiple templates, or even creating a template database for each subject, could significantly increase recognition rates, but it would also increase processing time and require more processing power and memory space, as sketched below.
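A minimal sketch of this multi-template idea, assuming grayscale templates and an arbitrary 0.7 acceptance threshold, is given below; the matching cost grows linearly with the number of templates stored per subject, which is the processing-time trade-off mentioned above.

import cv2

def best_subject_match(gray_frame, template_db, threshold=0.7):
    # template_db maps subject name -> list of grayscale template images.
    best_name, best_score = None, 0.0
    for name, templates in template_db.items():
        for tmpl in templates:
            result = cv2.matchTemplate(gray_frame, tmpl, cv2.TM_CCOEFF_NORMED)
            score = cv2.minMaxLoc(result)[1]
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score   # no subject matched confidently
    return best_name, best_score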
These results were achieved using built-in LabVIEW functions such as Circular Edge Detection, Fill Holes and Template Matching, which were designed for, and are most efficient in, industrial image processing. Even with these limitations we were able to achieve an acceptable recognition rate. In further research it would be possible to incorporate and test different image processing code in LabVIEW, and thus compare the results of different methods.
All the experimental results we achieved have been published in scientific papers (see Appendix 1) and also presented to the research teams at NI Debrecen. Based on our research on this subject we won 3rd place in the robotics contest organized by NI (see Appendix 2).
6.3 Future Improvements
With the first part of the project finished and being satisfied with the results we achieved, we have considered further research and development of the software package already created. We intend to follow the paths shown to us by the works of Richard Szeliski [26], Yao-Jiunn Chen and Yen-Chun Lin [27], and Helmut Seibert [28]. Using these valuable guides and the experience we have already gained in image recognition and LabVIEW programming, we will try to achieve higher speed and more reliable face detection, and to bring the false positive and false negative rates down to under 10%.
Further research also consists of implementing other face recognition methods and comparing their results to the results we have already documented. By doing so we can identify the best face and image recognition method and determine which method is best suited to which conditions. Also, by combining elements of different methods we would be able to achieve not only higher recognition speed but also more accurate results.
Even further research could include emotion detection based on facial expressions. This would give us valuable information regarding people's emotions, based on an algorithm that can analyze their facial expressions. For this, an extensive database of people in different emotional states is needed, so that this data can be compared to people's facial expressions in real time.
Bibliography
[1] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
[2] P. Jonathon Phillips and Hyeonjoon Moon. The FERET Evaluation Methodology for Face-Recognition Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, October 2000.
[3] Ion Marqués. Face Recognition Algorithms. Proyecto Fin de Carrera, June 16, 2010.
[4] Rosalyn R. Porle, Ali Chekima, Farrah Wong, and G. Sainarayanan. Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application. World Academy of Science, Engineering and Technology, 2009.
[5] R. Louban. Image Processing of Edge and Surface Defects Theoretical Basis of Adaptive Algorithms with Numerous Practical Applications, volume 123, chapter Edge Detection, Springer Berlin Heidelberg, 2009.
[6] S. K. Singh, D. S. Chauhan, M. Vatsa, and R. Singh. A robust skin color based face detection algorithm. Tamkang Journal of Science and Engineering, 6(4):227–234, 2003.
[7] M. C. Nechyba, L. Brandy, and H. Schneiderman. Lecture Notes in Computer Science, Face Detection and Tracking for the CLEAR 2007 Evaluation, pages 126–137. Springer Berlin Heidelberg, 2009.
[8] M.-H. Yang, D. Kriegman, and N. Ahuja. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34–58, January 2002
[9] http://sine.ni.com/np/app/main/p/docid/nav-104/lang/ro/fmid/1762/
[10] J. S. Bruner and R. Tagiuri. The perception of people. Handbook of Social Psychology, 2(17), 1954.
[11] C.-C. Han, H.-Y. M. Liao, K. chung Yu, and L.-H. Chen. Lecture Notes in Computer Science, volume 1311, chapter Fast face detection via morphology-based pre-processing, pages 469–476. Springer Berlin Heidelberg, 1997.
[12] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A-Optics, Image Science and Vision, 4(3):519–524, March 1987.
[13] Jie Ni and Rama Chellappa. Evaluation of state-of-the-art algorithms for remote face recognition. Department of Electrical and Computer Engineering and Center for Automation Research, University of Maryland, College Park, MD 20742, USA.
[14] Taranpreet Singh Ruprah. Face Recognition Based on PCA Algorithm, IET-DAVV, Indore, India, February 2012.
[15] Phillip A. Laplante. Image processing series, Pennsylvania Institute of Technology, 2008.
[16] National Instruments Corporation. NI CVS-1450 Series User Manual, 11500 North Mopac Expressway Austin, Texas, December 2004.
[17] National Instruments Corporation. NI LabVIEW Robotics Starter Kit, Robotics Platform for Teaching, Research, and Prototyping , 11500 North Mopac Expressway Austin, Texas, July 2012.
[18] Rick Bitter, Taqi Mohiuddin and Matt Nawrocki. LabVIEW Advanced Programming Techniques, Second Edition, Taylor and Francis Group, 2006.
[19] National Instruments Corporation. Getting Started Guide for the Robotics Prototyping Kit, 11500 North Mopac Expressway Austin, Texas, December 03, 2010.
[20] Christopher A. Waring and Xiuwen Liu. Face Detection Using Spectral Histograms and SVMs, Department of Computer Science, The Florida State University, Tallahassee, FL, 2008.
[21] Iyad Aldasouqi, and Mahmoud Hassan. Smart Human Face Detection System. International journal of computers, Issue 2, Volume 5, 2011.
[22] Mohinder Pal Joshi, R.S. Uppal, Livjeet Kaur. Development of Vision Based Iris Recognition System. International journal of advanced engineering sciences and technologies Vol No. 6, Issue No. 2, 2011.
[23] J. Daugman. Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 36, No. 7, July 1988.
[24] R. P. Wildes. Iris Recognition: An Emerging Biometric Technology. Proceedings of the IEEE, Vol. 85, pp. 1348–1363, Sept. 1997.
[25] J. Daugman. How Iris Recognition Works. IEEE Trans. on Circuits and Systems for Video Technology, Vol. 14, No. 1, January 2004.
[26] Richard Szeliski. Computer Vision: Algorithms and Applications. September 3, 2010.
[27] Yao-Jiunn Chen and Yen-Chun Lin. Simple Face-detection Algorithm Based on Minimum Facial Features. Mechanical & Systems Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan, ROC, 2007.
[28] Xuebing Zhou, Helmut Seibert, Christoph Busch and Wolfgang Funk. A 3D Face Recognition Algorithm Using Histogram-based Features. The Eurographics Association, 2008.
[29] National Instruments Corporation. Rugged Real-Time Compact Vision Systems. 11500 North Mopac Expressway Austin, Texas, 2004
[30] National Instruments Corporation. IMAQ, NI Vision Assistant Tutorial. 11500 North Mopac Expressway Austin, Texas, August 2004.
[31] Noor Faezah Binti Tumari. Developing a face detection system utilizing a USB webcam. Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, April 2008.
[32] Nie Nasuriana Binti Sidek. Developing a face recognition software. Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, April 2008.
[33] Dr. Alex See Kok Bin and Ang Jin Leong. Developing Face Recognition Software Using LabVIEW and a Vector Quantization Histogram.
[34] Alex See and K.Y. Tan. Using LabVIEW and NI-IMAQ Vision Software to Create a Robust Automated Face Detection System.
[35] Christopher G. Relf. Image Acquisition and Processing with LabVIEW. CRC Press LLC, 2004.
[36] http://sine.ni.com/ds/app/doc/p/id/ds-217/lang/ro
[37] http://www.ni.com/white-paper/10568/en/
[38] http://www.ni.com/vision/systems/cvs/
[39] http://sine.ni.com/nips/cds/view/p/lang/ro/nid/211839
[40] http://www.ni.com/robotics/
[41] http://romania.ni.com/
Appendix 1
Below is attached an article that was published based on our research on implementing image processing capabilities on the NI SBRIO-9631 prototype robot interfaced with the CVS-1454 Vision System. The article was published at the 17th "Building Services, Mechanical and Building Industry days" International Conference, held in Debrecen, Hungary, on 13–14 October 2011.
Appendix 2
Below is attached a copy of the diploma received by our team for achieving 3rd place in the Virtual Instrumentation competition organized by National Instruments Corporation on the 23rd of January in Debrecen, Hungary, with our project entitled Multimedia Control of the SbRIO-9631 Robot.