Facial Expression Recognition Utilizing Local Direction-based Robust Features and Deep Belief Network

Md. Zia Uddin and Mohammed Mehedi Hassan


Abstract—Emotional health plays a vital role in improving people's quality of life, especially for the elderly, for whom negative emotional states may lead to social or mental health problems. To help cope with emotional health problems caused by negative emotions in daily life, this work focuses on an efficient facial expression recognition (FER) system that can contribute to an emotional healthcare system in a smartly controlled environment. Facial expressions play a key role in daily communication by conveying our emotions, and recent years have witnessed a large body of research aimed at building reliable FER systems. Facial expression analysis from video data remains a challenging task in computer vision, image processing, and pattern recognition, and the accuracy of an FER system depends heavily on the extraction of robust features. In this work, a novel feature extraction method is proposed to extract prominent features from the human face. For person-independent expression recognition, depth video is used as input to the system, where the pixel intensities in each frame are distributed according to the distances from the camera. The proposed robust feature extraction process is named Local Directional Position Pattern (LDPP). In LDPP, after the local directional strengths of each pixel are extracted as in the typical Local Directional Pattern (LDP), the positions of the top directional strengths are encoded in binary together with their strength sign bits. Considering the top strength positions along with their signs allows LDPP to distinguish edge pixels that have bright and dark regions on opposite sides, since such pixels generate different patterns. Typical LDP only marks the directions with the top strengths, irrespective of their signs and position order (i.e., the top-strength directions are set to 1 and the rest to 0), and can therefore produce identical patterns in such cases. Hence, LDP fails to distinguish edge pixels with opposite bright and dark regions in some cases, a limitation that LDPP overcomes. Furthermore, the LDPP features are extended by Principal Component Analysis (PCA) and Generalized Discriminant Analysis (GDA) to obtain a more robust face feature representation for expressions. The resulting features are finally applied with a Deep Belief Network (DBN) for expression training and recognition.

Index Terms—DBN, Depth Image, GDA, LDP, LDPP, PCA.

Md. Zia Uddin is with the Department of Computer Education, Sungkyunkwan University, Seoul, Republic of Korea (e-mail: [anonimizat]).
Mohammed Mehedi Hassan is with the Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia (e-mail: [anonimizat]).

I. INTRODUCTION
Lately, human computer interaction (HCI) has attracted considerable research attention because of its practical applications in ubiquitous healthcare systems [1]. For instance, adopting HCI techniques for emotion recognition in a ubiquitous smart healthcare system allows the system to perceive people's facial expressions accurately and to react according to their emotions. For emotional healthcare in a smartly controlled environment, research on facial expression recognition (FER) from video is therefore receiving considerable attention from computer vision and image processing researchers these days [2].
For facial expression feature representation, Principal Component Analysis (PCA) has been applied by many researchers to extract global or average features from faces in different expressions [3]-[10]. As PCA cannot extract local face features, local feature extraction approaches such as Independent Component Analysis (ICA) were introduced and adopted by many researchers for statistically independent facial feature extraction [11]-[22]. In addition to ICA, Local Binary Patterns (LBP) have also been applied for local face feature extraction by some researchers [23]-[25]. One advantage of LBP over ICA is that LBP features are tolerant to illumination changes; besides, LBP features are computationally much cheaper than ICA. To improve on LBP, the local directional strengths of a pixel were exploited, and the Local Directional Pattern (LDP) was introduced to represent local face features based on the pixel's gradient information [26]. Once the directional strengths are obtained in LDP, the top strengths are considered, where the number of top directional strengths is determined empirically.
In this work, the conventional LDP is modified to make it more robust: after the directional strengths of a depth pixel are determined, the top directional strengths are obtained in descending order, and the corresponding strength sign bits are combined with the binary positions of the top directions. This approach is named the Local Directional Position Pattern (LDPP) here. In traditional LDP, the directions with the top edge strengths of a pixel are marked with binary bit 1 and the remaining directions with 0. It considers neither the strength signs nor the order of the top-strength directions, so two opposite kinds of edge pixels, with dark and bright regions interchanged, may produce the same pattern. The proposed LDPP can overcome this problem.

Basically, for an edge pixel, the dark region mostly yields a negative strength whereas the bright region yields a positive strength. As LDP does not consider the signs of the directional strengths, two edge pixels with opposite dark and bright regions may exchange the strength signs while keeping the same strength order, which generates the same LDP code for the two pixels even though they represent very different patterns. Besides, LDP assigns a flat bit 1 to the directions with the top strengths and 0 to the others, without considering the order of the strengths. In such cases LDP becomes a weak feature generator, whereas encoding the strength sign bits together with the binary positions of the top directional strengths resolves this issue and yields robust features, as in LDPP. Thus, in LDPP the top directional strength positions are encoded in binary with their sign bits, and an LDPP histogram is then generated to represent robust features for the whole face. To make the LDPP features more robust, Generalized Discriminant Analysis (GDA) is applied after PCA-based dimension reduction. GDA is considered an efficient tool to discriminate features from different classes [27]-[30].
Typical RGB videos are commonly used for face image extraction in computer vision-based FER work. One problem with RGB face-based approaches is that the face identity cannot be hidden in the images, which may raise privacy concerns for the users of the system. Another problem with RGB face images is that face parts cannot be differentiated by their distance from the camera. For instance, as the nose is always slightly ahead of the other face parts, distance-based information could help describe the face better during expressions. Hence, it is convenient to represent the features of face parts when the face pixels are distributed according to their distance from the camera. Depth cameras overcome these limitations of RGB cameras by providing depth information of the face parts based on their distance from the camera. This advantage of depth video allows more efficient expression recognition systems than RGB-based ones. As depth pixels do not reveal the real identity of the user, depth cameras can resolve privacy issues that RGB cameras cannot, and can therefore be applied regardless of the person's identity.
Due to the aforementioned advantages, depth images have become increasingly popular among researchers for computer vision and image processing applications such as human body activity analysis [31]-[42]. For instance, the authors in [31] used depth videos for human activity recognition. In [33], the authors obtained depth image-based surface histograms for activity analysis. In [42], the authors adopted a Maximum Entropy Markov Model (MEMM) for depth image-based action analysis. In addition to human actions, hand movements in depth videos have been studied by several researchers [44]-[52]. In [45], interacting hands were analyzed in depth videos using particle swarm optimization. In [47], the authors applied random forests to segment different hand parts from depth images. Using depth information, researchers have also focused on gesture recognition [53]-[62], head pose estimation [63]-[71], and face recognition [72]-[76]. For instance, in [66], the authors estimated head pose from depth images using an artificial neural network. In [71], the authors performed face recognition from low-quality depth images obtained from stereo cameras.
The Hidden Markov Model (HMM) is a well-known tool for time-sequential event analysis and has therefore been applied to FER in several research works [77], [78]. Nowadays, deep learning is the focus of most image processing and computer vision researchers [79], [80]. Among deep learning methods, the Deep Neural Network (DNN) has been used by many researchers, since it can also generate features from the raw input in addition to performing training and recognition, which is an advantage over other typical classifiers. Although DNNs have drawn considerable attention, they require very long training times [79]. Later, the DNN was modified and improved by Hinton et al. into the Deep Belief Network (DBN), which utilizes the Restricted Boltzmann Machine (RBM) for efficient training [80]. A DBN builds its network topology incrementally, layer by layer, where each layer is trained by means of an RBM and each RBM learns the probability distribution of a set of observations over the hidden units of its hidden layer [81]. Thus, DBNs are formed by stacking a set of RBMs, starting with the input layer and the first hidden layer. Once the first RBM is trained, its hidden layer becomes the visible layer of a second RBM, and the second RBM is trained. Once the second RBM is trained, its hidden layer becomes the visible layer of a third RBM, and so on. When this process is finished, the final hidden layer is connected to the output layer. A DNN can then be trained with gradient descent on the error from the target output. Thus, a DBN iteratively learns the weights of the network by composing the lower-order features of the preceding layers. As DBN appears to perform better than

Fig. 1. Basic architecture of the proposed FER system.

typical classifiers, it is adopted in this work for emotion recognition from facial expressions.

(a)

(b)
Fig. 2. (a) A depth image and (b) the corresponding pseudo-color image of a surprise expression.

(a)

(b)
Fig. 3. (a) Sample gray faces converted from RGB and (b) depth faces from a
happy facial expression.

A novel FER approach is proposed in this work using LDPP, PCA, GDA, and DBN on images from a depth video camera. The LDPP features are first extracted from the facial expression depth images, and PCA is then applied to them for dimension reduction. Furthermore, the face features are projected by GDA to make them more discriminative. Finally, the features are used to train a DBN that is later applied for recognition on the cloud. Fig. 1 depicts the basic architecture of the proposed FER system.
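To make the data flow of Fig. 1 concrete, the following minimal Python sketch chains the stages described above for one depth video clip. The helper extract_ldpp_features() and the fitted pca, gda, and dbn objects are illustrative placeholders (they are not part of the original system), so this is only a sketch of the pipeline, not the authors' implementation.

```python
import numpy as np

def recognize_expression(depth_frames, extract_ldpp_features, pca, gda, dbn):
    """Classify one depth video clip with the LDPP-PCA-GDA-DBN pipeline (Fig. 1).
    All callables passed in are assumed placeholders for the stages below."""
    per_frame = []
    for frame in depth_frames:
        a = extract_ldpp_features(frame)        # LDPP histogram features (Sec. II-A)
        c = pca.transform(a.reshape(1, -1))     # PCA dimension reduction (Sec. II-B)
        o = gda.transform(c)                    # GDA projection (Sec. II-C)
        per_frame.append(o.ravel())
    t = np.concatenate(per_frame)               # frame-wise augmentation, cf. Eq. (15)
    return dbn.predict(t.reshape(1, -1))        # DBN recognition (Sec. III)
```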
II. FEATURE EXTRACTION
The depth images are acquired by a depth camera [2] that simultaneously provides RGB and depth information for the objects captured in the scene. The depth sensor represents the range of every pixel in the scene as a gray-level intensity. Figs. 2(a) and (b) show a surprise depth image and the corresponding pseudo-color image, respectively. In the depth images, bright pixel values indicate near face parts and dark values indicate distant ones. The corresponding pseudo-color face image in Fig. 2(b) highlights the differences between face parts. Fig. 3 shows a sequence of gray and depth faces from a happy expression.
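As an illustration of this distance-based representation, the short numpy sketch below maps raw depth values of a face region to 8-bit gray levels so that nearer parts appear brighter. The working range and the linear mapping are assumptions for illustration, not the authors' exact conversion.

```python
import numpy as np

def depth_to_gray(depth_roi_mm, near_mm=400.0, far_mm=1200.0):
    """Map raw depth values (mm) of a face ROI to 8-bit gray levels:
    nearer parts bright, farther parts dark. The 400-1200 mm working
    range and the linear mapping are illustrative assumptions."""
    d = np.clip(depth_roi_mm.astype(np.float32), near_mm, far_mm)
    norm = (d - near_mm) / (far_mm - near_mm)        # 0 = nearest, 1 = farthest
    return ((1.0 - norm) * 255.0).astype(np.uint8)   # invert so near pixels are bright
```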
Fig. 4. Kirsch edge masks in eight directions.

Fig. 5. (a) Edge responses in eight directions around a pixel, (b) sign bits of the edge responses in the corresponding directions, and (c) rankings of the edge responses.
A. Local Directional Position Pattern (LDPP)
The Local Directional Position Pattern (LDPP) assigns an eight-bit binary code to each pixel of an input depth face. The pattern is computed by considering the positions of the top two edge strengths, together with their signs, among eight different directions. For each pixel in the image, the eight directional edge response values {D_i}, i = 0, 1, ..., 7, are calculated with the Kirsch masks shown in Fig. 4. After applying the masks, the directional positions are determined as

$D_i \in \{E, SE, S, SW, W, NW, N, NE\}, \quad i = 0, 1, \ldots, 7.$  (1)

Thus, the LDPP code for a pixel x is derived as

$LDPP(x) = \sum_{i=0}^{7} L_i \, 2^{i},$  (2)

$L = A \,\|\, K,$  (3)

$A = B(g) \,\|\, \mathrm{binary}(\mathrm{Arg}(D(g))),$  (4)

$K = B(e) \,\|\, \mathrm{binary}(\mathrm{Arg}(D(e))),$  (5)

$g = \operatorname{arg\,max}_{i}\,(R_0, R_1, R_2, \ldots, R_7),$  (6)

$e = \operatorname{arg\,max}_{i \neq g}\,(R_0, R_1, R_2, \ldots, R_7),$  (7)

where g denotes the direction of the highest edge response, e the direction of the second highest edge response, and R_i the ranking of the edge response in direction i. Fig. 5 depicts the edge responses, the sign bits of the edge responses, and the edge response rankings for the eight directions. The highest edge response is assigned the eighth rank, the second highest the seventh rank, and so on. Fig. 6 shows two examples of LDPP codes where typical LDP produces the same pattern for different edges but LDPP generates separate patterns. In the upper part of the figure, the highest edge response is 2422, so the first bit of the LDPP code is the sign bit of 2422, which is 0, and the following three bits are the binary position of the direction with the highest strength, i.e., 001, the binary of 1 from D1. The second highest edge response is -1578, so the fifth bit of the LDPP code is the sign bit of -1578, which is 1, followed by three bits giving the binary position of the direction with the second highest strength, i.e., 100, the binary of 4 from D4. Hence, the LDPP code of the upper pixel is 00011100 and that of the lower pixel is 10010100. On the other hand, considering the top five directional strengths, the LDP codes of both pixels are the same, i.e., 01110011. Similarly, for any number of top directional strengths, the two pixels must receive the same LDP code because their directional rankings are identical. Hence, LDPP represents more robust features than LDP.
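The following Python sketch illustrates the per-pixel coding described above using the standard Kirsch masks. The direction ordering, the ranking by absolute strength, and the bit layout follow our reading of Eq. (1) and the worked example, so they are assumptions rather than the authors' exact implementation; a conventional LDP code is included for comparison.

```python
import numpy as np

# Kirsch edge masks; the index order follows Eq. (1): E, SE, S, SW, W, NW, N, NE
# (the exact ordering used by the authors is an assumption).
KIRSCH = [np.array(m) for m in (
    [[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]],   # D0: East
    [[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]],   # D1: South-East
    [[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]],   # D2: South
    [[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]],   # D3: South-West
    [[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]],   # D4: West
    [[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]],   # D5: North-West
    [[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]],   # D6: North
    [[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]],   # D7: North-East
)]

def directional_responses(patch3x3):
    """Eight Kirsch responses D_i of the centre pixel of a 3x3 depth patch."""
    return np.array([int((m * patch3x3).sum()) for m in KIRSCH])

def ldpp_code(patch3x3):
    """8-bit LDPP code: [sign of top response | 3-bit position of top direction |
    sign of 2nd response | 3-bit position of 2nd direction]; 'top' is ranked by
    absolute edge strength (our reading of the worked example)."""
    d = directional_responses(patch3x3)
    order = np.argsort(-np.abs(d))               # directions ranked by strength
    code = 0
    for rank in range(2):                        # top two strength positions only
        i = int(order[rank])
        sign_bit = 1 if d[i] < 0 else 0          # negative response -> sign bit 1
        code = (code << 4) | (sign_bit << 3) | i
    return code                                  # value in [0, 255]

def ldp_code(patch3x3, k=3):
    """Conventional LDP for comparison: bit i is 1 if direction i is among the k
    strongest responses, ignoring signs and ordering (k=5 in the text's example)."""
    d = directional_responses(patch3x3)
    top = np.argsort(-np.abs(d))[:k]
    return sum(1 << int(i) for i in top)
```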
Thus, an image is transformed into the LDPP map using the LDPP code. The textural feature of the image is represented by the histogram of the LDPP map, whose s-th bin is defined as

$Z_s = \sum_{x,y} I\{LDPP(x, y) = s\}, \quad s = 0, 1, \ldots, n-1,$  (8)

where n is the number of LDPP histogram bins for an image I. The histogram of the LDPP map is then represented as

$H = (Z_0, Z_1, \ldots, Z_{n-1}).$  (9)

To describe the LDPP features, a depth silhouette image is divided into non-overlapping rectangular regions and the histogram is computed for each region, as shown in Fig. 7. The whole LDPP feature A is then expressed as a concatenated sequence of histograms

$A = (H_1, H_2, \ldots, H_g),$  (10)

where g denotes the number of non-overlapping regions in the image.
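A minimal sketch of Eqs. (8)-(10) follows, assuming a hypothetical 4x4 region grid and 256-bin histograms (the actual grid size and bin count are not specified here):

```python
import numpy as np

def ldpp_histogram_features(ldpp_map, grid=(4, 4), n_bins=256):
    """Concatenated per-region LDPP histograms, cf. Eqs. (8)-(10).
    The 4x4 grid and 256 bins are illustrative assumptions."""
    h, w = ldpp_map.shape
    rows, cols = grid
    feats = []
    for r in range(rows):
        for c in range(cols):
            region = ldpp_map[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]
            hist, _ = np.histogram(region, bins=n_bins, range=(0, n_bins))  # Z_s, Eq. (8)
            feats.append(hist)
    return np.concatenate(feats)   # A = (H_1, ..., H_g), Eq. (10)
```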

Fig. 7. A depth expression image is divided into small regions and the
regions’ LDPP histograms are concatenated to represent features for a face.

Fig. 6. Two examples for opposite edge pixels where LDP fails to separate them but LDPP does successfully.

Fig. 8. Top 150 eigenvalues after applying PCA on LDPP features.

B. Principal Component Analysis (PCA)
Once the locally salient LDPP features are obtained for all the training facial expression depth images, the feature dimension becomes high, and PCA is therefore adopted in this work for dimension reduction. PCA seeks the directions of maximum variation in the data. Considering J as the covariance matrix of the LDPP feature vectors, PCA on J finds the principal components with the highest variances. Thus, PCA on J can be described as

$Y = E^{T} J E,$  (11)

where E denotes the eigenvector matrix representing the principal components (PCs). In this work, we retained 150 PCs after applying PCA to J. Fig. 8 depicts the top 150 eigenvalues corresponding to the first 150 PCs once PCA is applied to the LDPP features. The eigenvalues indicate the importance of the corresponding PCs. It can be noticed in the figure that, after the first few positions, the eigenvalues decay towards zero, indicating that the chosen number of dimensions reduces the LDPP feature dimension well with negligible loss of the original information. Thus, the reduced-dimensional LDPP features after PCA can be expressed as

$C = AE.$  (12)
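A possible realisation of this step with scikit-learn, keeping 150 components as stated above, is sketched below; A_train and A_test stand for the concatenated LDPP histogram matrices and are assumed inputs, so this is one way to carry out the reduction rather than the authors' code.

```python
from sklearn.decomposition import PCA

def reduce_ldpp_features(A_train, A_test, n_components=150):
    """A_train / A_test: (num_images, ldpp_dim) LDPP histogram matrices.
    Keeps 150 principal components as in the text; assumes at least
    150 training images. One possible realisation, not the authors' code."""
    pca = PCA(n_components=n_components)
    C_train = pca.fit_transform(A_train)   # Eq. (12): project onto the top eigenvectors
    C_test = pca.transform(A_test)
    return pca, C_train, C_test
```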

C. Generalized Discriminant Analysis (GDA)
The final step of feature extraction from a depth image of a facial expression is to apply Generalized Discriminant Analysis (GDA) to make the features more robust. GDA is a generalized form of Linear Discriminant Analysis (LDA), which is essentially an eigenvalue decomposition problem that minimizes the within-class scatter and maximizes the between-class scatter. GDA first maps the inputs into a high-dimensional feature space and then solves the problem by applying the LDA method in that space. Thus, the fundamental idea of GDA is to map the training data into a high-dimensional feature space M by a nonlinear Gaussian kernel function and to apply LDA on M. Hence, the main goal of GDA is to maximize the following criterion:

$\Lambda_{GDA} = \dfrac{\Theta^{T} G_{B}\, \Theta}{\Theta^{T} G_{T}\, \Theta},$  (13)

where $G_B$ and $G_T$ are the between-class and total scatter matrices of the features. Finally, the PCA features C are projected onto the GDA feature space $\Theta_{GDA}$ as

$O = \Theta_{GDA}^{T}\, C.$  (14)
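Exact GDA is rarely available as a library routine, so the sketch below approximates it by an explicit RBF (Gaussian) kernel feature map followed by ordinary LDA; the kernel parameters are illustrative assumptions, and the pipeline is only one possible way to realise the projection of Eq. (14), not the authors' implementation.

```python
from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_gda_like(C_train, y_train, n_kernel_dims=300, gamma=0.01):
    """Approximate GDA: map the PCA features through an explicit RBF (Gaussian)
    kernel feature map, then apply LDA in that space. Parameter values are
    illustrative assumptions."""
    gda_like = make_pipeline(
        Nystroem(kernel="rbf", gamma=gamma,
                 n_components=min(n_kernel_dims, len(C_train))),
        LinearDiscriminantAnalysis(),
    )
    gda_like.fit(C_train, y_train)
    # gda_like.transform(C) then gives the discriminant projection, cf. Eq. (14)
    return gda_like
```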

Fig. 9 shows a 3-D plot of the GDA features of the training expression images, which exhibits good separation among the samples of different classes, indicating the robustness of GDA in this regard. The clusters consisting of samples from different classes are also marked by dotted ellipses in the figure. The LDPP-PCA-GDA features of the images in a video of length r are further augmented to feed into the DBN as

$T = [O_1 \,\|\, O_2 \,\|\, O_3 \,\|\, \ldots \,\|\, O_r].$  (15)

Fig. 9. 3-D plot of GDA features of depth faces from six expressions.
III. DBN FOR EXPRESSION MODELING
Training a DBN consists of two main phases: pre-training and fine-tuning. The pre-training phase is based on the Restricted Boltzmann Machine (RBM). Once the network is pre-trained, the weights of the network are adjusted by a fine-tuning algorithm. RBM is very useful for unsupervised learning and helps avoid getting stuck in poor local optima. One of the key benefits of using a DBN is its ability to extract and select prominent features from the input data. Each RBM layer is updated depending on the previous layer: once the first layer has computed its weight matrix, its outputs are taken as the input of the second layer, and so on. This process continues, training the RBMs one after another. Besides, the input representation is reduced layer by layer, so the activations at the hidden nodes of the last layer can be considered a feature vector for the current layer.

The algorithm of Contrastive Divergence-1 (CD-1) can be used to update the weight matrices layer by layer [81].

Fig. 10. Structure of the DBN used in this work with 100 input neurons, 80 neurons in hidden layer 1, 60 neurons in hidden layer 2, 20 neurons in hidden layer 3, and 6 output neurons.

Fig. 11. The pre-training and fine-tuning processes of the DBN used in this work.

Fig. 10 shows a sample DBN in which the three hidden layers contain different numbers of neurons: 100 neurons in the input layer, 80 in hidden layer 1, 60 in hidden layer 2, 20 in hidden layer 3, and 6 in the output layer, corresponding to the 6 classes (i.e., expressions) to be trained and recognized. The network is initialized with a greedy layer-wise training methodology. Once the weights of the first RBM are trained, h1 becomes fixed. The weights of the second RBM are then adjusted using the fixed h1, and the third RBM is trained with the help of the previous RBM. Training a typical RBM involves a few crucial steps. First, initialization is performed, where a bias vector P for the visible layer, a bias vector H for the hidden layer, and a weight matrix T are set to zero.

$h_1 = \begin{cases} 1, & f(H + p_1^{T} T) > r \\ 0, & \text{otherwise} \end{cases}$  (16)

$p_{recon} = \begin{cases} 1, & f(P + h_1 T) > r \\ 0, & \text{otherwise} \end{cases}$  (17)

$h_{recon} = f(H + p_{recon}^{T} T)$  (18)

Then, the binary state of the hidden layer $h_1$ is computed using (16). Next, the binary state of the visible layer $p_{recon}$ is reconstructed from the binary state of the hidden layer using (17). The hidden layer $h_{recon}$ is then re-computed from $p_{recon}$, where

$f(t) = 1 / (1 + \exp(-t)).$  (19)

The threshold value r is learnt with the weights to determine the output of the sigmoid function in the network, and the weight update is computed as

$\Delta T = (h_1\, p_1)/L - (h_{recon}\, p_{recon})/L,$  (20)

where L is the batch size. Finally, the updated weights are obtained by adding $\Delta T$ to the previous weights. These steps are repeated for all the batches. When the RBM training is done, a typical back-propagation algorithm is applied to adjust all parameters during fine-tuning. The pre-training and fine-tuning steps are shown in Fig. 11.
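The following numpy sketch summarises one CD-1 weight update (Eqs. (16)-(20)) and the greedy layer-wise stacking with the layer sizes of Fig. 10. It is a simplified illustration (the reconstruction step uses mean-field probabilities rather than sampled binary states, and fine-tuning by back-propagation is not shown), not the authors' implementation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))          # Eq. (19)

def cd1_epoch(V, T, P, H, lr=0.1, rng=None):
    """One Contrastive Divergence-1 pass over a batch V (L x visible_dim).
    T: weight matrix, P/H: visible/hidden biases; thresholding uses uniform r."""
    rng = rng or np.random.default_rng(0)
    L = V.shape[0]
    r = rng.random((L, T.shape[1]))
    h1 = (sigmoid(H + V @ T) > r).astype(float)            # Eq. (16)
    p_recon = sigmoid(P + h1 @ T.T)                        # Eq. (17), mean-field variant
    h_recon = sigmoid(H + p_recon @ T)                     # Eq. (18)
    dT = (V.T @ h1) / L - (p_recon.T @ h_recon) / L        # Eq. (20)
    return (T + lr * dT,
            P + lr * (V - p_recon).mean(axis=0),
            H + lr * (h1 - h_recon).mean(axis=0))

def pretrain_dbn(X, layer_sizes=(80, 60, 20), epochs=10):
    """Greedy layer-wise pre-training: each RBM's hidden activations feed the next.
    Layer sizes follow Fig. 10."""
    rng = np.random.default_rng(0)
    weights, data = [], X
    for n_hidden in layer_sizes:
        T = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
        P, H = np.zeros(data.shape[1]), np.zeros(n_hidden)
        for _ in range(epochs):
            T, P, H = cd1_epoch(data, T, P, H, rng=rng)
        weights.append((T, P, H))
        data = sigmoid(H + data @ T)          # hidden layer becomes the next visible layer
    return weights
```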
IV. EXPERIMENTS AND RESULTS
A depth database was built for this work containing six facial expressions: Anger, Happy, Sad, Surprise, Disgust, and Neutral. There were 40 videos for each expression, and each video consisted of 10 sequential frames. For the experiments, four-fold cross validation was applied to generate four groups of datasets, where for each fold 30 videos were used for training and the other 10 for testing, with no overlap between the training and testing videos. Hence, a total of 120 videos were used for training and 40 for testing.
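For illustration, the fold generation over the 40 clips of one expression can be sketched with scikit-learn as follows; the random seed and shuffling are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative 4-fold split over the 40 clips of one expression:
# 30 clips for training and 10 for testing in each fold, with no overlap.
clip_ids = np.arange(40)
folds = KFold(n_splits=4, shuffle=True, random_state=0)   # seed/shuffle assumed
for fold, (train_idx, test_idx) in enumerate(folds.split(clip_ids)):
    print(f"fold {fold}: {len(train_idx)} training clips, {len(test_idx)} test clips")
```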
The RGB camera-based FER results are shown in the confusion matrices of Table I-Table VII. The mean recognition rate using PCA with HMM on the RGB faces is 58%. PCA-LDA was then tried for FER and achieved a mean recognition rate of 61.50%. Applying ICA with HMM on the RGB facial expression images obtained a mean recognition rate of 80.50%. Furthermore, LBP was applied with HMM for FER and achieved a mean recognition rate of 81.25%. LDP was then tried and achieved 82.91%, better than the previous methods. Later, the LDPP features were combined with PCA and GDA and tried with HMMs, which achieved a mean recognition rate of 89.58%. Finally, the LDPP-PCA-GDA features were tried with DBN for better FER on the RGB faces and achieved a recognition rate of 92.50%, the highest among the RGB camera-based experiments.
The FER experiments were then extended to the depth camera-based ones, with results shown in the confusion matrices of Table VIII-Table XIV. We started with the global feature extraction method PCA with HMM, where the mean recognition rate was 62%. We then moved on to local face feature extraction from the depth images. The mean recognition rate using the ICA representation with HMM on the depth facial expression images is 83.50%, indicating better performance than PCA as well as PCA-LDA (i.e., 65%) on depth faces. Then, LBP was tried with HMM on the same database and achieved a mean recognition rate of 87.91%. LDP with HMM was then employed and achieved a better recognition rate than LBP, i.e., 89.16%. The proposed LDPP-PCA-GDA features were then applied with HMMs on the depth faces and achieved a mean recognition rate of 91.25%, the best among the HMM-based experiments. Finally, the proposed LDPP-PCA-GDA features were applied with DBN, which showed its superiority over the other methods by achieving the highest mean recognition rate (i.e., 96.67%).
TABLE I
EXPRESSION RECOGNITION USING RGB FACES USING PCA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 37.50 0 0 12.50 10 40
Happy 7.50 47.50 10 10 20 10
Sadness 0 17.50 70 12.50 0 0
Surprise 0 0 0 75 15 10
Neutral 0 5 25 10 60 0
Disgust 0 7.50 27.50 0 0 65
Mean 58

TABLE II
EXPRESSION RECOGNITION USING RGB FACES USING PCA-LDA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 50 0 20 0 0 30
Happy 5 55 10 10 15 0
Sadness 0 15 75 15 0 0
Surprise 0 0 0 72.50 7.50 20
Neutral 0 7.50 30 7.50 55 0
Disgust 0 10 20 0 0 70
Mean 61.50

TABLE III
EXPRESSION RECOGNITION USING RGB FACES USING ICA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 75 0 10 0 5 10
Happy 7.50 82.50 10 5 5
Sadness 0 7.5 82.50 10 0 0
Surprise 0 0 0 80 12.50 7.50
Neutral 0 0 7.50 10 82.50 0
Disgust 0 0 17.50 0 0 82.50
Mean 80.50

TABLE IV
EXPRESSION RECOGNITION USING RGB FACES USING LBP WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 82.50 0 7.50 0 0 10
Happy 5 80 10 10 0
Sadness 0 10 80 10 0 0
Surprise 0 0 0 80 12.50 7.50
Neutral 0 5 7.50 5 82.50 0
Disgust 0 2.50 10 10 0 77.50
Mean 81.25

TABLE V
EXPRESSION RECOGNITION USING RGB FACES USING LDP WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 85 0 5 0 0 10
Happy 0 82.50 7.50 10 0
Sadness 0 7.5 82.50 10 0 0
Surprise 0 0 0 82.50 10 7.50
Neutral 0 5 7.50 0 85 0
Disgust 0 0 10 10 0 80
Mean 82.91

TABLE VI
EXPRESSION RECOGNITION USING RGB FACES USING LDPP-PCA-GDA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 90 0 5 0 0 5
Happy 2.50 90 0 5 5
Sadness 0 2.50 87.50 10 0 0
Surprise 0 0 0 92.50 7.50 0
Neutral 0 5 10 5 90 0
Disgust 0 2.50 12.20 0 0 87.50
Mean 89.58

TABLE VII
EXPRESSION RECOGNITION USING RGB FACES WITH LDPP-PCA-GDA WITH DBN.
Anger Happy Sadness Surprise Neutral Disgust
Anger 90 0 0 0 0 10
Happy 2.50 92.50 0 0 5
Sadness 0 2.50 92.50 7.50 0 0
Surprise 0 0 0 95 5 0
Neutral 0 5 7.50 0 95 0
Disgust 0 0 10 0 0 90
Mean 92.50

TABLE VIII
EXPRESSION RECOGNITION USING DEPTH FACES WITH PCA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 50 0 20 0 15 15
Happy 7.50 52.50 10 10 20 10
Sadness 0 17.5 70 12.50 0 0
Surprise 0 0 0 80 15 5
Neutral 0 5 25 10 60 0
Disgust 0 5 27.50 0 0 62.50
Mean 62

TABLE IX
EXPRESSION RECOGNITION USING DEPTH FACES WITH PCA-LDA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 55 0 25 0 10 10
Happy 10 60 10 5 15 0
Sadness 0 17.5 75 7.50 0 0
Surprise 0 0 0 75 10 15
Neutral 0 10 27.50 0 62.50 0
Disgust 0 0 20 12.50 0 67.50
Mean 65

TABLE X
EXPRESSION RECOGNITION USING DEPTH FACES USING ICA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 80 0 10 0 0 10
Happy 0 85 10 5 0 0
Sadness 0 0 85 15 0 0
Surprise 0 0 0 82.50 17.50 0
Neutral 0 5 15 0 85 0
Disgust 0 0 15 2.50 0 82.50
Mean 83.50

TABLE XI
EXPRESSION RECOGNITION USING DEPTH FACES USING LBP WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 87.50 0 12.50 0 0 0
Happy 10 87.50 0 0 2.50 0
Sadness 0 0 85 0 15 0
Surprise 7.50 0 0 92.50 0 0
Neutral 0 2.50 10 0 87.50 0
Disgust 0 0 12.50 0 0 87.50
Mean 87.91

TABLE XII
EXPRESSION RECOGNITION USING DEPTH FACES USING LDP WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 87.50 0 10 0 0 2.50
Happy 10 90 0 0 0 0
Sadness 0 0 87.50 0 12.50 0
Surprise 7.50 0 0 92.50 0 0
Neutral 0 2.5 7.50 0 90 0
Disgust 0 0 12.5 0 0 87.50
Mean 89.16

TABLE XIII
EXPRESSION RECOGNITION USING DEPTH FACES USING LDPP-PCA-GDA WITH HMM.
Anger Happy Sadness Surprise Neutral Disgust
Anger 90 0 10 0 0 0
Happy 7.50 92.50 0 0 0 0
Sadness 0 0 90 10 0 0
Surprise 0 0 0 95 0 5
Neutral 0 0 10 0 90 0
Disgust 0 0 10 0 0 90
Mean 91.25

TABLE XIV
EXPRESSION RECOGNITION USING DEPTH FACES USING LDPP-PCA-GDA WITH DBN.
Anger Happy Sadness Surprise Neutral Disgust
Anger 95 0 0 0 0 5
Happy 0 97.50 0 2.50 0 0
Sadness 0 0 97.50 0 2.50 0
Surprise 0 2.50 0 97.50 0 0
Neutral 0 0 2.50 0 97.50 0
Disgust 5 0 0 0 0 95
Mean 96.67

V. CONCLUSION
Facial expression is the most natural way for humans to express emotion. A basic FER system consists of three main parts. The first part is face image preprocessing, which acquires the image and tries to improve the image quality after background elimination. The second part is feature processing, which tries to obtain distinguishable robust features for each expression so that the expressions can be represented as distinctly as possible. The third and final part is expression recognition, which recognizes facial expressions by applying the robust features to a strong pre-trained expression model. Face features are usually sensitive to noise and illumination changes caused by varying light sources, which often merges the face features of different classes in the feature space. Therefore, the performance of an FER system depends strongly on feature extraction. In this work, we have proposed a novel approach for emotion recognition from facial expression depth videos, in which a novel feature extraction method consisting of LDPP, PCA, and GDA has been investigated. The proposed method provides tolerance against illumination variation and tries to extract salient features by utilizing the prominent directional strengths of the pixels, where a couple of top-strength directional positions and the signs of the strengths are considered. Besides, the proposed features can overcome critical problems that cannot be resolved by traditional LDP feature extraction, such as generating different patterns for two edge pixels whose dark and bright sides are opposite to each other. One major advantage of the proposed FER system is the use of depth faces over RGB: the depth map can be employed without revealing the identity of the subject, as the depth face pixels are distributed with respect to the distance from the camera. Therefore, the original identity of a subject is hidden, which may resolve privacy issues related to obtaining the subjects' permission for the database. The robust LDPP-PCA-GDA features have been implemented with a state-of-the-art machine learning technique, the Deep Belief Network (DBN), for expression training and recognition. The proposed FER approach was compared with traditional approaches and showed its superiority over them. The proposed system could be adopted to contribute to any smartly controlled environment for video-based emotional healthcare.
REFERENCES
[1] P. Baxter and J. G. Trafton, "Cognitive Architectures for Human-Robot Interaction," in Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction (HRI '14), Bielefeld, Germany, pp. 504-505, 2014.
[2] S. Tadeusz, "Application of vision information to planning trajectories of Adept Six-300 robot," in Proceedings of the 21st International Conference on Methods and Models in Automation and Robotics (MMAR), Miedzyzdroje, Poland, pp. 1069-1075, 2016.
[3] D.-S. Kim, I.-J. Jeon, S.-Y. Lee, P.-K. Rhee, and D.-J. Chung, "Embedded Face Recognition based on Fast Genetic Algorithm for Intelligent Digital Photography," IEEE Transactions on Consumer Electronics, vol. 52, no. 3, pp. 726-734, 2006.
[4] C. Padgett and G. Cottrell, "Representing face images for emotion classification," Advances in Neural Information Processing Systems, vol. 9, Cambridge, MA: MIT Press, 1997.
