8th International Multidisciplinary Symposium [602948]
8th International Multidisciplinary Symposium
„Challenges and opportuniti es for sustainable development through quality and
innovation in engineering and research management ”
UNIVERSITARIA
SIMPRO 2018
A Combinatorial Gray Wolf Optimization Algorithm for Finding
2D Protein Conformation
Ioan Simaa *
a”Babes -Bolyai” University , str. M. Kogalniceanu, nr. 1, Cluj-Napoca, nr. 1 RO-400084 , Romania
Abstract
Protein folding problem is one of the exciting subjec ts in the bioinformatics field. It consists in finding t ertiary or quaternary
structure from an amino acid sequence. Gray Wolf Optimizatio n (GWO) algorithm is a new meta -euristic method dedicated to
solving optimization problems with continuous functions . In this paper, we propose a novel variant of Gray W olf Optimization
algorithm for finding the optimum of combinatorial function s (cGWO ). This variant include s crossover operation taken from
evolutionary methods. In our proposal, it is applied for solving p rotein folding i n HP 2D model, w hich is a combinatorial problem .
Furthermore , Protein Folding is NP -complete problem. Results of experiments have shown that the proposed cGWO has a weak
convergence . However, the best known energy in the literature was reac hed in this study . Energy conformations close to optimum are
obtained from the first iterati on if the size population is large enough .
Keywords: Grey Wolf Optimisation; Combinatorial optimization; Protein Folding; 2D HP Model; Crossover ; Bioinformatics ;
1. Introduction
The proteins are one of the most important constituents of living matter . They are epigenetic ex pression of the
cellular genome (Dinu et al, 2006) . They perform and control the essential functions in living cells and organisms.
Therefore , the knowledge o f the structure and their functions is essential to understanding the functioning of the living .
The t ertiary or quaternary structure (also called native conformation or folded protein) is difficult to predict because
the space of the possible c onformations is huge and complex. This is known as protein folding problem .
At the molecular level, the proteins are linear polymers formed from 20 types of α-amino acids (aa), called
proteinogenic amino acids . The length of the protein s varies from a few tens to a few thousan d, even tens of thousands
of aa . Thus , the molecular mass varies fro m hundreds to millions of u.a (Dalton) .
Amino acids have similar che mical structures. These differ by the degree of hydrophobicity, charge and size.
Levinthal outlines the principle ac cording to which the native conformation can be predicted only from their linear
sequences (Levinthal, 1969).
Over time, several models have been created for the representation of protein conformations. One of the simplest and
coarser models , but one that still captures the essence of the folding phenomenon , was proposed by Lau and Dill, 1989 .
It has two simple forms : square or 2D – used in th is paper – and cubic or 3D type, closer to the real protein.
To solve protein folding on this model, s o far , it has been proposed a lot of computationa l methods . In Unger and
Moult, 1993, it was applied a genetic algorithm (GA) with a form of simulated ann ealing (SA) and Turabieh used a
hybridized GA with a form of local search (LS) called Great Deluge Algorithm (Turabieh, 2016) .
It was also proposed an Ant Colony Optimization Algorithm (ACO) with the 2D and 3D Model (Shmygelska and
Hoos, 2003, 2005). In th e same way, it has been proposed a n improved hybrid algorithm which is a mix between
particle swarm optim izer (PSO) and tabu search (TS) and it is used for prediction protein structure for 3D HP lattic e
(Zhou et al, 2013) . In Czibula et al, 2011, it is fou nd a Q-learning approach for solving the 2D HP lattice.
The Grey Wolf Optimization ( GWO ), a new inspired of nature algorithm, was proposed by Mirjalili et al, 2014.
Initial, it was created for solving continuous functions . Madadi and Motlagh, 2014, used it for optimal tuning of
* Corresponding author. Tel.: +4-074-584-9607 .
E-mail address: sima.ioan@cs.ubbcluj.ro .
Applied mathematics, inf ormatics and physics
proportional -integral -derivative (PID) controller parameters for DC Motor drive to find the global o ptimum solution in
search space and Yassien applied the GWO to the 0/1 Knapsack Problem (Yassien et al, 2017) .
Lu et al, 2016, change d the GWO for the approach of combinatorial issues. This variant , named by them , MODGWO,
is a multi -objective discrete grey wolf optimizer used for scheduling problem in welding production. The authors
modified the search operator developed in originally G WO for solving continuous optimization problems, with a
mutation shift variant . Also, the GWO has been modified for approach the binary problem for feature selection (Emary
et al, 2016).
After this state of the art, we can say that the GWO algorithm has no t yet been applied for solving the protein folding
problem.
This paper is organized as below . After this introduction, the next Sec tion presents briefly the protein folding
problem. In Section 3 , we will describe in very short detail s the Grey Wolf Optimiz ation (GWO) algorithm for
continuous function. Then a modified GWO algorithm, adapted for combinatorial function, that we called cGWO , is
propo sed in th is paper for solving the 2D HP model. This model will be explained in Section 4. In Section 5 , we will
analyze the results of our experiments. The last section contains the conclusions and further work.
2. Protein folding problem
There are 20 proteins genic (protein -building) amino acids (aa), which differ one from another by the nature of side
chain R. Biochem ical notation uses three letters, while bioinformatics uses a single English letter for its representation.
The proteins (also called protein sequences) are linear chains of α -amino acids. The proteic macromolecule has four
levels of organization: primary structure, secondary structure, tertiary structure, and quaternary structure (Mihalaș,
2011) .
Primary structure is the amino -acid sequence, poly -peptide chain being represented as an ASCII string in
bioinformatics.
Secondary structure is made of by regula rly repeating local substructures (α -helix, β -sheet, turns) stabilized by
hydrogen bonds .
Tertiary structure is the overall shape of a single protein macromolecule, representing the spatial (3 -D) relationships
of the secondary structures to one another. A synonym term to tertiary structure is fold – it enables th e basic function of
a protein .
Quaternary structure is sometimes formed by several protein molecules (called protein subunits) which function as a
single protein complex.
Protein folding is the phys ical process through which their uni -dimensional primary structure becomes three –
dimensional tertiary (or quaternary) one, by means of a sequence of twists, bends, and bundles. Most proteins fold into
unique 3 -dimensional structures and the shape of the na tural folding process is called native conformation. In the case
of small proteins, the folding process acts spontaneously, both in vivo and in vit ro.
Native (or folded) protein conformation is difficult to predict.
Due to its numerous applications in gen etic engineering and medicine, the predic tion protein structure folded,
starting from its primary structure is one of the greatest c hallenges of current bioinformatics research.
To solve it, this problem has been addressed from the perspective of several a lgorithms applied on series simple or
more complex models.
3. HP 2D Lattice Model
One of the most popular considered models is HP model, which was proposed by Lau and Dill in 1989. This model
introduces a series of simplifications.
First, HP being a coarse mo del, the units that are represented on lattice are the amino acids, not atoms.
Second, the 20 proteinogenic amino acids are classified only into 2 classes: H (hydrophobic) and P (polar, or
hydrophilic). The H amino acids reject the water and P amino acids attract the water. Due to the fact that the globular
proteins are submerged in an aqueous internal environment of living organisms, they will form a hydrophobic core, as
can be seen in fig 3. Thus, it will have an alphabet composed by 2 letters, {H, P}, be tween which it can form up to 4
types of contacts: H..H, H..P, P..H and P..P.
The third, the space is discretized in very simple lattices. For bi -dimensional lattices, the simplest space is the square
and for tri -dimensional lattices a simple space is the cube. In these spaces, amino acids can be rotate with exactly 90
degrees. The encoding of directions of amino acids can be absolute or relative. In our work, we are focused on
experimenting in 2D space, where amino acids can be rotate with 90, 180 or 270 d egrees and it was used the absolute
encoding of directions. In this way, the directions are: Right – R(0), Down – D(1) Left – L (2), and Up – U(3). For 2D
HP model and absolute encoding, it will have an alphabet of directions composed by 4 letters: {L; R; U; D}.
SIMPRO 2018 : „Challenges and opportunities for sustainable development through quality and innovation in engineering and research
management”
3
The fourth, the energy of the HP model is a convention which reflects the fact that H aa to form a hydrophobic core.
Thus, for each pair of non -consecutive H (H..H) aa in the protein sequence, but which are adjacent on the lattices
(topological neig hbors), the energy of conformations decreases with a value, e, equal to -1. For the other three types of
contacts (H..P, P..H, P..P), the value of e is considered equal to 0. In fact, the energy of conformation is the number of
contacts between H…H aa that are not consecutive in sequence (see fig. 3 ).
These being given, the folding protein problem, in HP models, can be defined as a problem of combinatorial
optimization.
The HP sequence is given as: s=s 1s2s3..si..sn, where: n – no of aa;
PH si , ; the native conformation is the
conformation which has the minimum energy, cm, of sequence s, where
)(sCcm , and C(s) – the set of feasible
conformations of the sequence s.
The energy of conformation cm, E(c m), is:
)}( |)( min{)( sCccE cEm
, where E(c) – energy of conformation c (1)
E(c) is:
n
jisj siba cE
1,)(
(2)
where: asi, bsj = 1 if si is H aa, and asi, bsj =0, otherwise
and asi and bsj are topological neighbors, but are not sequence neighbors
4. Gray Wolf Optimization (GWO) Algorithm
Grey Wolf Optimization (GWO) is one of the newest meta -heuristics proposed by Mirjalili et al, 2014. It is a Swarm
Intelligence algorithm which imitate the social hierarchy and hunting behavior of grey wolves.
The grey wolves are divided into four categories: alpha -wolf (α) is the highest; beta -wolf (β) is the second, obedient
only to α -wolf; and delta -wolf (δ) is the third in hierarchy. The others are omega -wolves (ω). During the rush, ω -wolves
follow the three leaders.
Fig. 1. The social hierarchy of the grey wolves.
The encircling and hunting of the victim can be mathematically formulated by (3) -(6).
)( )( tXtXCDp
(3)
DAtX tX )( )1(
(4)
1 21 ra A
(5)
22r C
(6)
where the components of r1 and r2 are uniform random vectors in [0,1] and the components of vector a decrease
uniformly from 2 to 0 over iterations. Xp is the position vector of the prey, t is the c urrent iteration, and X represents the
position vector of a current grey wolf.
The standard GWO was created to find the optimum of continuous functions.
The prey represents the solution that should be found. As this solution is not known, it is considered that an
intermediate position of the first three wolves could be very close to the place of the prey (the optimal solution) .
Applied mathematics, inf ormatics and physics
Since the optimum position (prey) is not known, the optimization process consists in capturing the prey, which is
presumed to find in the positions of the three wolves dominated (α, β and δ). Finally, the optimal value is the best
position of the α -wolf.
The equations are:
X XC D 1
,
X XC D 2 ,
X XC D 3 (7)
DA X X 1 1
,
DA X X 2 2 ,
DA X X 3 3 (8)
33 2 1
)1(X X XXt
(9)
Advantages of algorithm:
Few parameters required by the algorithm
There is no need for mutation or crossover probabilities like Genetic Algorithms.
It is easier to implement.
The downside is that GWO was created to solve continuous optimization problems. As the Protein Folding Problem
is a combinatorial optimization problem, in this situation the GWO cannot be applied in its classical form.
5. Gray Wolf Optimization com binatorial (cGWO ) Algorithm
In order to be applied the GWO for combinatorial optimization problems, our work consist in adding a crossover
type, so that in this way the GWO can be applied to problems that involve discrete or combinatorial functions.
Thus, the modified search operator is replaced with crossover operation. From the GWO algorithm it is retained the
idea of going through the entire wolf population that makes crossover with one of three dominant wolves:
otherwiserndrnd
elseifif
tctc crossovertctc crossovertctc crossover
tc
iii
i 3/23/1
)),(),(()),(),(()),(),((
)1(
(10)
where ci(t+1) is the conformation resulting from the application of the crossover operator, ci(t) is the current wolf of
current iteration , and cα(t), cβ(t), cδ(t) is the conformation of the α -wolf, β -wolf, δ -wolf, respectively . Rnd is a uniform
random value betwe en [0,1].
In this work, we used single -point crossover. The place, where the string is cut, varies linearly with number of
iteration. In the first generations, the location where will be the cutting is close to 1, so that towards the end of the
algorithm i t reaches close to n -1, where n is the length of the protein sequence. Therefore, at the beginning is
encouraged exploration, but finally, exploitation. This is because, at first, the resulted conformations from the crossover
have big sequences from ω -wolves, and to the end, they have a large portions of one of three dominant wolves, which
are conformation with high energy.
For example, for a structure by 8 aa, in below image can see how the first half of α -wolf is concatenate with the
second half of the current wolf (an ω -wolf). Note that, not all conformations results crossover operation are feasible.
Fig. 2 . The crossover operation. (a) the resulti ng conformation is weaker than parents . (b) the resulting has o intermediate energy . The red is
Hydrophobic aa and the blue is Polar aa.
a b
SIMPRO 2018 : „Challenges and opportunities for sustainable development through quality and innovation in engineering and research
management”
5
In the above picture it can be seen that after crossover operation it is obtained the following cases:
the resulting conformation is weaker than the initial conformations . See fig. 2 (a).
the resulting conformation is better than the initial conformations
the energy of the resulting has an intermediate value between the energies of the initial conformations . See fig. 2 (b).
6. Experiments (Simulation results)
The aim of this paper was to experimentally evaluating in our proposed combinatorial Grey Wolf Optimization
algorithm (cGWO) . The cGWO was implemented using C# programming language and tested on one standard
benchmark HP protein . The para meters of the cGWO algorithm are: number of grey wolf (population size): 10 –
100,000; numb er of runs for every population size: 1 0, number of iterations: 1,000 .
The sequence tested in this paper is: HPHPPHHPHPPHPHHPPHPH , a sequnce whic h contains 20 aa . This sequence
is well -known structure. It was taken from Unger and Moult, 1993. In its case, it is known that there are 83,779,155
feasible conformations of wich only 4 conformations have the lowest energy level , -9. Note that an optimal
conformation is very difficult to achieve. The fraction of the optimal con formations from the total number of
conformations is extremely small, only 4.77 x 10-8.
Below, i n table are presented the best result from the 10th runs for every population size. It can be observed that our
approach has a weak converge nce and that the lowest energy was reached in t he case of big population size (the cGWO
is not able to increase signifiant the initial solutions). ). It can also be seen that the minimum energy was obtai ned in all
cases in the first 100 iterations. In the other 900 iterations, is not significant evolution which took place. From this, it
can be deduced that the size of the population is more important than the number of iteration. This means that the
cross over operator in this form is not the best solution.
In this experiments the α -wolf it is not conserved from one iteration to another.
Table 1 .Results of our proposed c GWO
Population
size Energy
from
first
iteration Best energy
(solution) No of iteration w here
the best energy was
reached
10 -3 -7 87
50 -4 -6 93
100 -4 -6 68
200 -4 -6 24
500 -5 -8 62
1,000 -6 -8 68
5,000 -8 -9 62
10,000 -7 -9 49
50,000 -7 -9 43
100,000 -8 -9 37
It can be seen that our algorithm is able to obtain the con formations with the best energy .
One of t he conformation found (energy is -9), is represented in fig below :
Fig. 3 . The best conformation finding the cGWO . The red is Hydrophobic aa and the blue is Polar aa. It can see that H aa forms a hydrophobic core.
In below figure is represented the convergence of cGWO for protein folding on 2D HP lattice for populationa size =
10,000 . In the same picture , it was represented only the first 100 iterations.
Since the first iteration a low energy conformation has been generate d (-7). For 42 iterations there was no evolution.
In iteration 43, a -wolf reaches the energy -8, so after another six iterations reaches the minimum energy ( -9)
Applied mathematics, inf ormatics and physics
Fig. 4 . The convergence of cGWO
7. Conclusion and further work
In this work, we proposed a Gray Wolf Optimization Algorithm for solving of protein folding problem on the bi –
dimensional Hydrophobic -Polar Model. Due to the fact that the GWO is a meta -euristic for finding the optimum for
continuous functions, and protein folding is a combinat orial funct ion, in this paper was presented a modified GWO
Algorithm ( cGWO ) adapted to search for solutions in combinatorial spaces . This includes the crossover, a known
evolutionary operator.
This algorithm is simple and shows a good result on the tested sequence on 2D HP lattice model . But the
convergence of the algortihm is weak.
The further work will consist in the improve ment of our algorithm including mutation and finding ideas for
increasing convergence. Then, our research can be extended to other large HP prot ein sequences and the 3D HP
Model s.
References
Czibula., G., Bocicor, M -I., Czibula, I -G., An experiment on protein structure prediction using Reinforcement Learning ,
Studia Univ. Informat ica, LVI ( 1), pp. 25 -34, (2011) .
Dinu, V., Truția E., Popa -Cristea E., Popescu A., Biochimie medicală – mic tratat , Ed. Medicală, București, (2006).
Emary, E., Zawbaa, H.M., Hassanien, A. E.,. Binary Grey Wolf Optimization Approaches for Feature Selection.
Neurocomputing. 172, pp. 371 -381, (2016) .
Lau K.F., Dill K.A., A lattice statistical mechanics model of the conformation and sequence space of proteins,
Macromolecules, . 22, pp. 3986 -3997, (1989).
Levinthal, C., How to fold graciously , Mössbaun Spectroscopy in Biological Systems Proceedin gs, Univ of Ilinois
Bulletin, pp. 22 -24, (1969).
Lu, C., Xiao, S., Li, X., Gao, L., An effective multi -objective discrete grey wolf optimizer for a real -world scheduling
problem in welding production . Adva nces in Engineering Software 99, 161–176, (2016)
Madadi, A., Motlagh, M.M ., Optimal control of DC motor using grey wolf optimizer algorithm. Tech J Eng Appl Sci. 4,
pp. 373-379, (2014).
Mihalaș, Gh -I., Tudor, A., Paralescu, S., Bioinformatica , Ed. Victor Babeș, Timișoara, (2011 ).
Mirjalili, S., Mirjalili, S.M., Lewis, A., Grey wolf optimizer. Advances in Engineering Software, 69, pp. 46–61, (2014 ).
Shmygelska, A., Hoos , H.H., An Improved Ant Colony Optimisation Algorithm for the 2D HP Protein Folding
Problem, In Springer Verlag, editor, In Proceedings of th e 16th Canadian Conference on Artificial Intelligence, pp.
400-417, (2003 ).
Shmygelska, A., Hoos , H.H., An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein
folding problem, BMC Bioinformatics, 6 ( 30), (2005 ).
Turabieh, H., A Hybrid Genetic Algorithm for 2DProtein Folding Simulations , International Journa l. of Computer
Applications, 139 (3), pp. 38 -43, (2016 ).
Unger, R., Moult, J., Genetic algorithms for protein folding simulations . Journal of Molecular Biology, 231, pp. 75 -81,
(1993).
Yassien, E., Masadeh, R., Alzaqebah, A., Shaheen, A., Grey Wolf Optimization Applied to the 0/1 Knapsack Problem .
International Journal of Computer Applications , 169, pp. 11-15, (2017).
Zhou, C., Hou, C., Zhang Q., Wei, X., Enhanced hybrid search al gorithm for protein structure prediction using the 3D –
HP lattice model , J of Molecular Modeling, 19 ( 9), pp 3883 –3891 , (2013 ).
Copyright Notice
© Licențiada.org respectă drepturile de proprietate intelectuală și așteaptă ca toți utilizatorii să facă același lucru. Dacă consideri că un conținut de pe site încalcă drepturile tale de autor, te rugăm să trimiți o notificare DMCA.
Acest articol: 8th International Multidisciplinary Symposium [602948] (ID: 602948)
Dacă considerați că acest conținut vă încalcă drepturile de autor, vă rugăm să depuneți o cerere pe pagina noastră Copyright Takedown.
