„BABEȘ-BOLYAI” UNIVERSITY CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE
RESEARCH PROJECT
Protein folding simulation with chaperones using
parallel MD algorithms
PhD supervisor
Prof. univ. dr. BAZIL PÂRV
PhD student: [anonymized] 2017
Introduction
This research plan describes the goals and activities of our research stage at the Center for
Bioinformatics, Chair of Computational Biology, Saarland University, Germany. Our general
objective is to study protein folding simulation using parallel MD algorithms and chaperones.
This material is organized as follows. After this short introduction, the first section briefly
presents the biochemical aspects of the protein folding process, while the second section is dedicated
to protein folding simulation, briefly describing the models and algorithms used so far.
The last section presents the proposed research plan.
1. Biochemical aspects
1.1. The Proteins
Proteins are the most important substances contained in a living cell. They are
macromolecules consisting of one or more chains of amino acids. There are 20 different amino acids,
and a protein molecule contains at least one linear chain of amino acid residues, called a
polypeptide.
Amino acids are organic compounds containing amine (-NH2) and carboxyl (-COOH)
functional groups, along with an organic constituent or side chain (R group) ([MIH11], [BED85]).
Their generic formula is:
H2N-CHR-COOH
There are 20 proteinogenic (protein-building) amino acids, which differ from one another in
the nature of the side chain R. Biochemical notation uses a three-letter code for each amino acid,
while bioinformatics uses a single letter of the English alphabet.
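As an illustration of the two notations, the sketch below maps the single-letter codes to the three-letter codes for the 20 proteinogenic amino acids. The abbreviations are the standard ones; the function names to_three_letter and is_valid_sequence are hypothetical helpers introduced only for this example.

```python
# Standard one-letter -> three-letter codes of the 20 proteinogenic amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def is_valid_sequence(sequence: str) -> bool:
    """True if the string is non-empty and uses only the 20 standard letters."""
    return bool(sequence) and all(r in ONE_TO_THREE for r in sequence.upper())


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence into hyphen-separated three-letter codes."""
    if not is_valid_sequence(sequence):
        raise ValueError("sequence contains non-standard residue letters")
    return "-".join(ONE_TO_THREE[r] for r in sequence.upper())


if __name__ == "__main__":
    print(to_three_letter("MKV"))
```

A polypeptide chain stored as a plain string such as "MKV" can thus be checked and converted to the biochemical notation without any external library.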
Protein structure. Proteins are linear polymers built from sequences of L-alpha-amino
acids ([BED85]) bound together by peptide links: -CO-NH-. The protein macromolecule has four
levels of organization: primary structure, secondary structure, tertiary structure, and quaternary
structure.
Primary structure is the amino acid sequence; in bioinformatics, the polypeptide chain is
represented as an ASCII string.
Secondary structure consists of regularly repeating local substructures (α-helix, β-sheet,
turns) stabilized by hydrogen bonds ([MIH11]).
Tertiary structure is the overall shape of a single protein macromolecule, representing the
spatial (3-D) relationships of the secondary structures to one another. A synonym for tertiary
structure is fold; it enables the basic function of a protein ([BED85]).
Quaternary structure is formed when several protein molecules (called protein
subunits) function as a single protein complex.
Protein functions. Proteins carry out the duties specified by the information encoded
in genes. Their functions are supported by chemical, physical, biological, and steric factors
([DIN06], [BED85]). Among the most important functions we mention their constructive role (cell
construction), enzymatic role (catalysis of chemical reactions), defence role (immunology), transport
role (ions and molecules), regulatory role (hormones), and contractile role (actin and myosin in
muscle fiber).
Protein life cycle. During its life cycle a protein can be in an unfolded state (primary structure),
in its native conformation (folded), or in a misfolded state; other possible states are degraded
(usually in apoptosis) or aggregated (in some critical diseases) ([MCC05], [MUN16]).
The protein life cycle begins with ribosomal synthesis, followed immediately by
folding. In its native conformation, the protein is able to perform its duties; the life cycle ends when
physiological (or pathological) degradation takes place. All folding processes during the whole life
cycle (folding, unfolding, misfolding, re-folding) are facilitated by chaperones, which also help
identify misfolded proteins and re-fold them correctly.
1.2. Protein folding process
Protein folding is the physical process through which the one-dimensional primary structure of a
protein becomes a three-dimensional tertiary (or quaternary) structure, by means of a sequence of
twists, cuts, bends, and bundles. Most proteins fold into unique three-dimensional structures, and
the shape produced by the natural folding process is called the native conformation. Knowledge of
this process is important for predicting protein structure and function, for drug design, and for
many other applications.
Small proteins fold spontaneously, both in vivo and in vitro.
The folding of bigger proteins, in vivo, is aided by chaperones.
The protein folding problem, defined by Anfinsen ([ANF73]), refers to (a) finding the native
(functional) conformation and (b) finding the folding pathway, by any means (physical, biochemical,
or computational). It is a computational challenge because the number of possible conformations and
folding pathways is huge.
There is no complete explanation of the folding process. There are many possible (and partial)
explanations, such as the thermodynamic hypothesis, Anfinsen's dogma, and the Levinthal paradox.
The thermodynamic hypothesis, introduced by Anfinsen ([ANF61]), states that, during folding,
proteins search for the most stable conformation, i.e. the one with minimal potential energy. This
hypothesis allows the use of quantitative techniques.
Anfinsen's dogma states that the tertiary structure depends only on the primary structure; in
other words, all information used by the folding process is contained in the primary structure. Most
simulations performed so far rely on this assumption. This means that S_T = f(S_P), where S_T is the
tertiary structure, S_P is the primary structure, and f is a bijection.
The Levinthal paradox ([LEV69]) observes that protein folding time in vivo is very short
(microseconds or milliseconds), despite the huge number of possible conformations. A possible
explanation, given by Levinthal, is that nature proceeds in folding by forming some local amino acid
sequences, which serve as nucleation points, intermediates, or partially folded transition states.
He also suggests that the native state can be stabilized at a higher energy when the minimal energy
cannot be reached for kinetic reasons (thus violating the thermodynamic hypothesis).
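The size of the conformational space behind the paradox can be made concrete with a short back-of-the-envelope calculation. The sketch below is only illustrative: the figures of 3 conformations per residue and 10^13 conformations sampled per second are commonly quoted assumptions, not values taken from this plan, and levinthal_estimate is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all).

    Both default parameters are illustrative assumptions, not measured values.
    """
    conformations = states_per_residue ** n_residues
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return conformations, years


if __name__ == "__main__":
    count, years = levinthal_estimate(100)
    print(f"{count:.2e} conformations, about {years:.1e} years")
```

Under these assumptions a chain of 100 residues would need on the order of 10^27 years for an exhaustive search, which is why a purely random search cannot explain folding times of microseconds or milliseconds.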
Factors influencing protein folding. There are two main categories: physico-chemical (the
orientation of the electron cloud, the nature of electronic bonds, the solvent, water or other) and
biological (endoplasmic reticulum ER, chaperones, prions, other proteins) ([DIL90], [HEL02]).
Chaperones. Not all proteins obey the minimal energy principle (thermodynamic hypothesis)
in their spontaneous folding process. Some of them have a complex or unstable conformation, and are
aided by chaperones (specialized proteins) to reach their native (correct) conformation ([MCC05]).
Chaperones participate in all phases of the protein life cycle, including partial or total unfolding,
and the aggregated or misfolded states ([MUN16]).
2. The state of the art in protein folding simulation
2.1. The importance of protein folding simulation
Any non-native conformation and any protein misfolding lead to loss of protein function
([TUR16]), causing Alzheimer's and Parkinson's diseases, some cancer types, or protein aggregation
([ENG07], [DYS16]).
Obtaining the native conformation by classical methods (NMR, X-ray diffraction, dual
polarisation interferometry) is very costly in terms of money and time. For this reason, the native
conformation is known so far for only some 100,000 proteins ([PDB]), a small part of the
set of all proteins in living cells. The alternative to classical methods is to obtain the
native conformation of proteins by computer simulation. Once this becomes effective, being quicker
and cheaper than classical approaches, it will allow us to find the tertiary structure and the folding
pathway, making it possible to identify the biological function of a protein and, perhaps, to discover
a new function, not known so far.
Even for proteins with known native conformation and biological function, the
simulation will be useful for cellular simulation.
After some 40 years of effort in protein folding simulation, despite some successes for small
proteins, a general solution is not available.
2.2. Models for protein folding simulation
In what follows we discuss some models that represent the polypeptide chain and are used in
simulation algorithms. We consider two classification criteria: model resolution and
potential energy function computation.
a) Model resolution criterion:
All-atom models: fine-grained models, representing all atoms in the polypeptide
chain and considering the force field generated by each atom. They need huge
computing resources, but provide very accurate results: native state and folding
pathway. They are used in folding simulation of small proteins.
Coarse-grained models: medium-grained models, representing only the alpha-carbon atoms
(polypeptide chain backbone) and the amino acid residues. Examples are: the coarse-grained
model from the Gdansk-Cornell group ([LIW11], [MAI10], [SIE17]), Frenet
frame, HP-SC ([LIM02], [BEN10]), off-lattice ([LU03]), diffusion models ([BES11]),
etc. The computational resources needed are smaller than for all-atom
models, with some reduction in precision. All models provide the native state, and some
of them provide a coarse version of the pathway.
Lattice models: low-resolution models based on some simplified hypotheses.
Examples are: HP ([DIL85], [LAU89]), FCC ([BAC06]), HPNX ([BAC99],
[MAN14]), etc. Compared to coarse-grained models, lattice models need fewer
computational resources. They provide the native conformation only.
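To make the lattice idea concrete, the following minimal sketch evaluates a conformation in the HP model as it is commonly formulated: residues are hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a lattice, and each pair of H residues that are lattice neighbours but not chain neighbours contributes -1 to the energy. The 2D square lattice, the U/D/L/R move encoding, and all function names are assumptions of this example, not details taken from the cited papers.

```python
from typing import List, Tuple

# Unit moves on a 2D square lattice (an assumption of this sketch).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_to_coords(moves: str) -> List[Tuple[int, int]]:
    """Turn a string of moves into lattice coordinates, starting at the origin."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(sequence: str, moves: str) -> int:
    """Energy of an HP conformation: -1 per non-consecutive H-H lattice contact."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold_to_coords(moves)
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {coord: i for i, coord in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every neighbouring pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # the chain ends touch: one H-H contact
```

In this formulation, finding the native conformation amounts to finding the self-avoiding walk with the lowest energy, which is what the search algorithms discussed later attempt to do.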
b) Potential energy function criterion:
Physics-based models: the potential energy is computed using physical laws only.
This requires complex computations and provides accurate results.
Knowledge-based models: use data from biochemical studies. Different techniques
fall into this category:
– Homology modeling: combines classical and computational techniques
(the alignment of amino acid sequences, modeling tools like Swiss-Model,
PyMOL, etc.) for predicting native structure, based on known
native conformations of some related species ([BIS11], [CHA14]).
– Threading: superposes the studied (target) amino acid sequence over
sequences taken from a database in order to compute a fitting (matching)
index ([HE13]).
– Minithreading (fragment coupling): the predicted structure is assembled
from a sequence of small fragments of known shape ([HE13], [KRU16]).
2.3. Techniques for protein folding simulation
Protein folding simulation started in the 1970s, using deterministic and nondeterministic
algorithms.
A deterministic algorithm (DA) always produces the same output for a given input. DAs
for protein folding are discussed below. In contrast, a nondeterministic algorithm (NA)
can produce different outputs on different runs for the same input. NAs are usually
used to find an approximate solution when the exact one is hard (or almost impossible) to obtain
for computational reasons.
Probabilistic (or randomized) algorithms are a subclass of NAs that depend on a random
number generator. All computational intelligence algorithms belong to this category: Genetic
Algorithms (GA) ([HOL75], [UNG93]), Neural Networks (NN) ([SAN13]), Ant Colony
Optimization (ACO) ([SHM03], [SHM05]), Particle Swarm Optimization (PSO) ([LIN11]),
Machine Learning ([CZI11a], [CZI11b], [CZI11c], [CZI11d], [CZI11e], [BOC13]), Constraint
Logic Programming ([DOV11]), etc.
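The difference between the two classes can be illustrated on a toy 2D HP lattice model. In the self-contained sketch below, exhaustive_search is deterministic (it enumerates every walk and always returns the true minimum energy), while random_search is a plain random-sampling procedure whose result depends on the seed of the random number generator. It is not one of the cited GA, ACO or PSO methods, and all names and the lattice encoding are assumptions of this example.

```python
import random
from itertools import product

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # unit moves on a 2D square lattice


def energy(sequence, steps):
    """HP contact energy of a walk, or None if the walk overlaps itself."""
    pos = [(0, 0)]
    for dx, dy in steps:
        pos.append((pos[-1][0] + dx, pos[-1][1] + dy))
    if len(set(pos)) != len(pos):
        return None
    total = 0
    for i in range(len(pos)):
        for j in range(i + 2, len(pos)):  # skip chain neighbours
            dist = abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1])
            if sequence[i] == "H" and sequence[j] == "H" and dist == 1:
                total -= 1
    return total


def exhaustive_search(sequence):
    """Deterministic: tries every walk, so it always returns the minimum energy."""
    best = 0  # the straight chain is always valid and has energy 0
    for steps in product(STEPS, repeat=len(sequence) - 1):
        e = energy(sequence, steps)
        if e is not None and e < best:
            best = e
    return best


def random_search(sequence, trials, seed):
    """Randomized: samples walks at random; the result may vary with the seed."""
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        steps = [rng.choice(STEPS) for _ in range(len(sequence) - 1)]
        e = energy(sequence, steps)
        if e is not None and e < best:
            best = e
    return best
```

The exhaustive version examines 4^(n-1) walks for a chain of n residues, so it is only usable for very short chains; the randomized version trades the guarantee of optimality for a bounded running time, which is the usual motivation for the probabilistic methods listed above.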
In what follows we refer to deterministic algorithms for protein folding simulation.
Molecular Dynamics (MD) simulations use all-atom models, as well as some coarse-grained
ones. They provide both the native state and the folding pathway. They use numerical integration
algorithms. In order to reduce the computational load, they use approximations in computing the energy
function ([MIA15]). Well-known software tools for MD simulations are: AMBER, CHARMM,
GROMACS, GROMOS, OPLS-AA, ROSETTA, UNRES ([LIW10], [LIW11]), STAPL ([THO05]),
SWISSPROT.
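The numerical integration step at the core of an MD simulation can be sketched with the velocity Verlet scheme, a widely used integrator for Newton's equations of motion. The example below is a minimal illustration in one dimension per particle, with a harmonic force standing in for a real force field; it is not the implementation used by any of the packages listed above, and the function names and parameters are assumptions of this sketch.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def velocity_verlet(
    positions: Vector,
    velocities: Vector,
    masses: Vector,
    force: Callable[[Vector], Vector],
    dt: float,
    n_steps: int,
) -> Tuple[Vector, Vector]:
    """Integrate the equations of motion with the velocity Verlet scheme."""
    x = list(positions)
    v = list(velocities)
    f = force(x)
    for _ in range(n_steps):
        # Half-step velocity update with the current forces.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # Full-step position update with the half-step velocities.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions, then finish the velocity update.
        f = force(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return x, v


def harmonic_force(x: Vector, k: float = 1.0) -> Vector:
    """Toy force field: independent harmonic springs, F = -k * x."""
    return [-k * xi for xi in x]


if __name__ == "__main__":
    x, v = velocity_verlet([1.0], [0.0], [1.0], harmonic_force, dt=0.01, n_steps=1000)
    total_energy = 0.5 * v[0] ** 2 + 0.5 * x[0] ** 2
    print(x[0], v[0], total_energy)
```

In a real simulation the force callback is by far the most expensive part, since it evaluates the interactions between all atoms at every step; this is the part that approximations and parallelization target.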
As far as we know, all MD simulations so far (with the notable exception of [FAN04] and
[FAN06]) do not take into account the role of chaperones. In the mentioned papers, the chaperone
action was mimicked by the solvent (water) from the environment.
3. Proposed work
3.1. Research objectives and activities
Our future research aims to combine (a) the use of parallel MD simulation algorithms on
all-atom models and (b) the action of chaperones in the folding process.
The table below contains the planned research objectives and activities during our research
stage.
Table. Research plan
Month | Objectives | Activities
Month 1 | 1. Documentation | 1.1. Investigation of MD algorithms for protein folding using all-atom models; 1.2. Experimenting with some MD tools (Q-HOP, Ball Software, GROMACS, STAPL, CHARMM, Abalone, UNRES, NAMD, I-TASSER, ROSETTA, SWISS-MODEL)
Month 2 | 2. Including Chaperone Action into some existing All-Atom Models (CAAAM) | 2.1. CAAAM using GroEL and GroES chaperones; 2.2. CAAAM using Hsp family of chaperones (Hsp60, Hsp10, etc.)
Month 3 | 3a. Experimental work using sequential MD algorithms on CAAAM; 3b. Improvement of CAAAM | 3.1. Experimenting with CAAAM using existing MD algorithms; 3.2. Improving the model(s) using simulation results; 3.3. Performance analysis
Months 4-5 | 4a. Experimental work using parallel MD algorithms on CAAAM; 4b. Improvement of CAAAM | 4.1. Parallelization of MD algorithms for CAAAM, using CUDA, MPI or OpenMP; 4.2. Experimental work using our own implementations and existing MD packages
Month 6 | Analysis and dissemination of results | 5.1. Writing a research report; 5.2. Writing at least one conference paper
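The parallelization planned in activity 4.1 typically relies on splitting the pairwise interaction computation among workers. The sketch below shows only this work decomposition, in Python with a thread pool, for a Lennard-Jones pair energy in reduced units; it is not a CUDA, MPI or OpenMP implementation, CPython threads give no real speed-up for this CPU-bound loop, and all function names and parameters are assumptions of this example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import dist


def lj_pair(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of one pair at distance r (reduced units assumed)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def partial_energy(coords, pairs):
    """Sum the pair energies over one chunk of index pairs."""
    return sum(lj_pair(dist(coords[i], coords[j])) for i, j in pairs)


def total_energy_serial(coords):
    """Reference version: one loop over all pairs."""
    return partial_energy(coords, combinations(range(len(coords)), 2))


def total_energy_parallel(coords, n_workers=4):
    """Split the pair list into interleaved chunks and reduce the partial sums."""
    pairs = list(combinations(range(len(coords)), 2))
    chunks = [pairs[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(partial_energy, [coords] * n_workers, chunks)
        return sum(partials)
```

The same pattern (independent partial sums followed by a reduction) carries over to MPI ranks, OpenMP threads or CUDA thread blocks, where the choice of decomposition and the cost of the reduction determine the achievable speed-up.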
References
[ANF61] Anfinsen, C. B., Haber, E., Sela, M., White, F. H. Jr., The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain, PNAS, Vol. 47, Nr. 9, 1961, pp. 1309-1314.
[ANF73] Anfinsen, C. B., Principles that govern the folding of protein chains, Science, Vol. 181, Nr. 4096, 1973, pp. 223-230.
[BAC99] Backofen, R., Will, S., Bornberg-Bauer, E., Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets, Bioinformatics, Vol. 15(3), 1999, pp. 234-242.
[BAC06] Backofen, R., Will, S., A constraint-based approach to fast and exact structure prediction in three-dimensional protein models, Constraints, Vol. 11(1), 2006, pp. 5-30.
[BED85] Bedeleanu, D. D., Manta, I., Biochimie medicală & farmaceutică, vol. I Biochimie structurală, Ed. Dacia, Cluj-Napoca, 1985.
[BEN10] Benitez, C. M. V., Lopes, H. S., Protein structure prediction with the 3D-HP side-chain model using a master-slave parallel genetic algorithm, J Braz Comput Soc, Vol. 16, 2010, pp. 69-78.
[BES11] Best, R. B., Hummer, G., Diffusion models of protein folding, Phys. Chem. Chem. Phys., Vol. 13, 2011, pp. 16902-16911.
[BIS11] Tastan Bishop, Ö., Kroon, M., Study of protein complexes via homology modeling, applied to cysteine proteases and their protein inhibitors, J Mol Model, Vol. 17, Nr. 12, 2011, pp. 3163-3172.
[BOC13] Bocicor, M. I., Machine Learning Models for Solving Problems in Bioinformatics, PhD Thesis Abstract, Cluj-Napoca, 2013.
[CHA14] Chai, H. H., Lim, D., Lee, S. W., Chai, H. Y., Jung, E., Homology Modeling Study of Bovine μ-Calpain Inhibitor-Binding Domains, Int. J. Mol. Sci., Vol. 15, 2014, pp. 7897-7938.
[CZI11a] Czibula, G., Bocicor, M. I., Czibula, I. G., A Distributed Reinforcement Learning Approach for Solving Optimization Problems, Recent Researches in Communications and IT, Proceedings of the 5th International Conference on Communications and Information Technology (CIT '11), Greece, 2011, pp. 25-30.
[CZI11b] Czibula, I. G., Czibula, G., Bocicor, M. I., A software framework for solving combinatorial optimization tasks, Studia Univ. Informatica, Vol. LVI, Nr. 3, pp. 3-8.
[CZI11c] Czibula, G., Bocicor, M. I., Czibula, I. G., An experiment on protein structure prediction using Reinforcement Learning, Studia Univ. Informatica, Vol. LVI, Nr. 1, pp. 25-34.
[CZI11d] Czibula, G., Bocicor, M. I., Czibula, I. G., A reinforcement learning model for solving the protein problem, Int. J. Comp. Tech., Vol. 2, Nr. 1, 2011, pp. 171-182.
[CZI11e] Czibula, G., Bocicor, M. I., Czibula, I. G., Solving the protein folding problem using a distributed Q-Learning approach, Int. J. Comp. Tech., Vol. 5, Nr. 3, 2011, pp. 404-413.
[DIL85] Dill, K. A., Theory for the Folding and Stability of Globular Proteins, Biochemistry, Vol. 24, 1985, pp. 1501-1509.
[DIL90] Dill, K. A., Dominant Forces in Protein Folding, Biochemistry, Vol. 29, Nr. 31, 1990, pp. 7133-7155.
[DIN06] Dinu, V., Truția, E., Popa-Cristea, E., Popescu, A., Biochimie medicală – mic tratat, Ed. Medicală, București, 2006.
[DOV11] Dovier, A., Recent constraint/logic programming based advances in the solution of the protein folding problem, Intelligenza Artificiale, Vol. 5, Nr. 1, 2011, pp. 113-117.
[DYS16] Dyson, F., Originile vieții, Ed. Humanitas, București, 2016.
[ENG07] Englander, S. W., Mayne, L., Krishna, M. M. G., Protein folding and misfolding: mechanism and principles, Quarterly Reviews of Biophysics, Vol. 40, Nr. 4, 2007, pp. 287-326.
[FAN04] Fan, H., Mark, A. E., Mimicking the action of folding chaperones in molecular dynamics simulations: Application to the refinement of homology-based protein structures, Protein Science, Vol. 13, 2004, pp. 992-999.
[FAN06] Fan, H., Mark, A. E., Mimicking the action of GroEL in molecular dynamics simulations: Application to the refinement of protein structures, Protein Science, Vol. 15, 2006, pp. 441-448.
[HE13] He, Y., Mozolewska, M. A., Krupa, P., Sieradzan, A. K., Wirecki, T. K., Liwo, A., Kachlishvili, K., Rackovsky, S., Jagieła, D., Slusarz, R., Czaplewski, C. R., Ołdziej, S., Scheraga, H. A., Lessons from application of the UNRES force field to predictions of structures of CASP10 targets, PNAS, Vol. 110, Nr. 37, 2013, pp. 14936-14941.
[HEL02] Helms, V., Attraction within the membrane - Forces behind transmembrane protein folding and supramolecular complex assembly, EMBO Reports, Vol. 3, Nr. 12, 2002, pp. 1133-1138.
[HOL75] Holland, J. H., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[KRU16] Krupa, P., Mozolewska, M. A., Wiśniewska, M., Yin, Y., He, Y., Sieradzan, A. K., Ganzynkowicz, R., Lipska, A. G., Karczyńska, A., Ślusarz, M., Ślusarz, R., Giełdoń, A., Czaplewski, C., Jagieła, D., Zaborowski, B., Scheraga, H. A., Liwo, A., Performance of protein-structure predictions with the physics-based UNRES force field in CASP11, Bioinformatics, Advance Access published July 4, 2016.
[LAU89] Lau, K. F., Dill, K. A., A lattice statistical mechanics model of the conformation and sequence space of proteins, Macromolecules, Vol. 22, 1989, pp. 3986-3997.
[LEV69] Levinthal, C., How to fold graciously, Mössbauer Spectroscopy in Biological Systems Proceedings, Univ. of Illinois Bulletin, 1969, pp. 22-24.
[LIN11] Lin, C.-J., Su, S.-C., Protein 3D HP Model Folding Simulation Using a Hybrid of Genetic Algorithm and Particle Swarm Optimization, International Journal of Fuzzy Systems, Vol. 13, Nr. 2, June 2011, pp. 140-147.
[LIM02] Li, M. S., Klimov, D. K., Thirumalai, D., Folding in lattice models with side chains, Comput Phys Communications, Vol. 147, Nr. 1, 2002, pp. 625-628.
[LIW10] Liwo, A., Ołdziej, S., Czaplewski, C., Kleinerman, D. S., Blood, P., Scheraga, H. A., Implementation of Molecular Dynamics and Its Extensions with the Coarse-Grained UNRES Force Field on Massively Parallel Systems: Toward Millisecond-Scale Simulations of Protein Structure, Dynamics, and Thermodynamics, J. Chem. Theory Comput., Vol. 6, 2010, pp. 890-909.
[LIW11] Liwo, A., He, Y., Scheraga, H. A., Coarse-grained force field: general folding theory, Phys. Chem. Chem. Phys., Vol. 13, 2011, pp. 16890-16901.
[LU03] Lu, B. Z., Wang, B. H., Chen, W. Z., Wang, C. X., A new computational approach for real protein folding prediction, Protein Engineering, Vol. 16, Nr. 9, 2003, pp. 659-663.
[MAI10] Maisuradze, G. G., Senet, P., Czaplewski, C., Liwo, A., Scheraga, H. A., Investigation of Protein Folding by Coarse-Grained Molecular Dynamics with the UNRES Force Field, J. Phys. Chem. A, Vol. 114, 2010, pp. 4471-4485.
[MAN14] Mann, M., Backofen, R., Exact methods for lattice protein models, Bio-Algorithms and Med-Systems, Vol. 10, Nr. 4, 2014, pp. 216-227.
[MCC05] McClellan, A. J., Tam, S., Kaganovich, D., Frydman, J., Protein quality control: chaperones culling corrupt conformations, Nature Cell Biology, Vol. 7, Nr. 8, 2005, pp. 736-741.
[MIA15] Miao, Y., Feixas, F., Eun, C., McCammon, A., Accelerated Molecular Dynamics Simulations of Protein Folding, Journal of Computational Chemistry, Vol. 36, 2015, pp. 1536-1549.
[MIH11] Mihalaș, Gh.-I., Tudor, A., Paralescu, S., Bioinformatica, Ed. Victor Babeș, Timișoara, 2011.
[MUN16] Munoz, V., Cerminara, M., When fast is better: protein folding fundamentals and mechanisms from ultrafast approaches, Biochem. J., Vol. 473, 2016, pp. 2545-2559.
[SAN13] Santos, J., Villot, P., Dieguez, M., Protein Folding with Cellular Automata in the 3D HP Model, Proceedings of the GECCO13, 2013, pp. 1595-1602.
[SHM03] Shmygelska, A., Hoos, H. H., An Improved Ant Colony Optimisation Algorithm for the 2D HP Protein Folding Problem, Proceedings of the 16th Canadian Conference on Artificial Intelligence, Springer Verlag, 2003, pp. 400-417.
[SHM05] Shmygelska, A., Hoos, H. H., An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem, BMC Bioinformatics, Vol. 6, Nr. 30, 2005.
[SIE17] Sieradzan, A. K., Jakubowski, R., Introduction of Steered Molecular Dynamics into UNRES Coarse-Grained Simulations Package, Journal of Computational Chemistry, 2016.
[THO05] Thomas, S., Tanase, G., Dale, L. K., Moreira, J. M., Rauchwerger, L., Amato, N. M., Parallel protein folding with STAPL, Concurrency Computat.: Pract. Exper., 2005.
[TUR16] Turabieh, H., A Hybrid Genetic Algorithm for 2D Protein Folding Simulations, International Journal of Computer Applications, Vol. 139(3), 2016, pp. 38-43.
[UNG93] Unger, R., Moult, J., Genetic algorithms for protein folding simulations, Journal of Molecular Biology, Vol. 231, 1993, pp. 75-81.
[PDB] http://www.wwpdb.org/
