Student : Sukjoon Yoon
Discriminating Homologous Templates from Non-homologous Templates in Protein Homology Modeling Using an Energy Function
A model of a protein of unknown structure can be constructed using its amino acid sequence and the backbone conformation of a protein of known structure. In many cases where the sequences have diverged greatly, the difficulty of predicting the conformation lies in recognizing the homology. Molecular mechanics energy evaluation is among the most accurate scoring functions for discriminating correct structures from incorrect conformations. Sahasrabudhe et al. evaluated the conformational energies of homology models to identify the most favorable template structure (1).
In this experiment, to confirm whether a model built from a homologous template really has the lowest energy, I evaluated the conformational energies of model structures calculated from several different templates, including both homologous proteins (good templates) and non-homologous proteins (bad templates). Sahasrabudhe et al. used the CONGEN program for homology modeling (1). CONGEN is considered very reliable because it uses a full energy function for the calculation (2). To speed up the calculation, I used DYANA for molecular dynamics and then used CONGEN for the minimization and final energy calculation steps.
Table 1 : Profiles of 14 protein molecules
The target and template proteins for homology modeling were selected from the SCOP data set. Fourteen proteins of similar sequence length were selected from four classes: α, β, α/β and α+β (Table 1). Each target protein has one good template, which belongs to the same fold and family as the target, and several bad templates, which belong to different folds or classes (Table 2). Six proteins (2cro, 2ezh, 1ftt, 3mef, 1ah9 and 1bu1) were used as targets, and for each target, 6 or 7 proteins were used as templates for calculating models.
Table 2. The template molecules used for each target molecule
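Class  Template  2CRO (α)  1FTT (α)  2EZH (α)  3MEF (β)  1AH9 (β)  1BU1 (β)
α      1R69      O         O         O         O         O         O
α      1HOM      O         O         O         O         O         O
β      1CSP      O         O         O         O         O         O
β      1AWX      -         O         O         -         -         O
α/β    1BTA      O         O         O         O         O         O
α+β    3IL8      O         O         O         O         O         O
α+β    1UBI      -         -         -         O         O         -
α+β    1TIN      O         O         O         -         -         -
O: template used for the target; -: template not used.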
Sequence alignments were carried out with CLUSTAL W. For each alignment, homologous distance constraints were generated with PDBSTAT (2). The default values for both the number of initial constraints (i.e., 10000) and the distance limit of constraints (i.e., 99.9) were used in this step. With these constraints, the 10 best structures were generated by simulated annealing with the DYANA molecular dynamics program. Minimization and the final energy calculation were done with CONGEN. For further evaluation, I selected the one structure out of the ten that showed the lowest energy minimum.
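The idea of deriving distance constraints from an alignment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the algorithm PDBSTAT actually implements: each residue is represented by a single coordinate, every pair of aligned residues yields one constraint, and the function names `aligned_pairs` and `distance_constraints` are hypothetical. The defaults mirror the values quoted above (10000 constraints, distance limit 99.9).

```python
import math
from itertools import combinations


def aligned_pairs(target_aln, template_aln):
    """Map target residue index -> template residue index for gap-free columns."""
    mapping = {}
    t_idx = p_idx = 0
    for t_char, p_char in zip(target_aln, template_aln):
        if t_char != "-" and p_char != "-":
            mapping[t_idx] = p_idx
        if t_char != "-":
            t_idx += 1
        if p_char != "-":
            p_idx += 1
    return mapping


def distance_constraints(mapping, template_coords,
                         max_distance=99.9, max_constraints=10000):
    """Copy template distances onto aligned target residue pairs.

    Hypothetical simplification: one (x, y, z) point per template residue.
    Returns a list of (target_i, target_j, distance) tuples.
    """
    constraints = []
    for i, j in combinations(sorted(mapping), 2):
        d = math.dist(template_coords[mapping[i]], template_coords[mapping[j]])
        if d <= max_distance:
            constraints.append((i, j, d))
            if len(constraints) == max_constraints:
                break
    return constraints


# Toy example: a 4-residue target aligned to a 4-residue template.
mapping = aligned_pairs("AC-DE", "ACGD-")
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
constraints = distance_constraints(mapping, coords)
```

Residues that fall opposite a gap receive no constraints in this sketch, which is one reason a poor alignment leaves parts of the model underdetermined.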
Results & Discussion
I calculated a total of 380 models in this experiment (Figure 1). For simulated annealing, DYANA was much faster (less than 1 minute on the PC farm) than CONGEN molecular dynamics because it uses its own target function of torsional energy instead of a full energy function. Each energy calculation by CONGEN took about 20 minutes of CPU time on the lion machine.
Figure 1. The number of structure calculations
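The total of 380 models is consistent with the template assignments in Table 2 and the 10 structures generated per alignment. The short Python check below assumes the template lists as I read them from Table 2 (the dictionary name `TEMPLATES_PER_TARGET` is only illustrative).

```python
# Templates used for each target, as read from Table 2 (an interpretation
# of the table; adjust if the original assignments differ).
TEMPLATES_PER_TARGET = {
    "2CRO": ["1R69", "1HOM", "1CSP", "1BTA", "3IL8", "1TIN"],
    "1FTT": ["1R69", "1HOM", "1CSP", "1AWX", "1BTA", "3IL8", "1TIN"],
    "2EZH": ["1R69", "1HOM", "1CSP", "1AWX", "1BTA", "3IL8", "1TIN"],
    "3MEF": ["1R69", "1HOM", "1CSP", "1BTA", "3IL8", "1UBI"],
    "1AH9": ["1R69", "1HOM", "1CSP", "1BTA", "3IL8", "1UBI"],
    "1BU1": ["1R69", "1HOM", "1CSP", "1AWX", "1BTA", "3IL8"],
}

STRUCTURES_PER_PAIR = 10  # 10 best structures kept per alignment

pair_count = sum(len(templates) for templates in TEMPLATES_PER_TARGET.values())
total_models = pair_count * STRUCTURES_PER_PAIR
print(pair_count, total_models)
```

With these assignments there are 38 target-template pairs, each target having 6 or 7 templates, which gives 380 structures in total.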
In the case of 2cro (Data-1), the sequence identity to the homologous template, 1r69, was 51%. The model built from the homologous template showed the lowest target function and VDW energy values. The CONGEN calculation also showed the lowest energy minimum when the homologous template, 1r69, was used.
1bu1 also had 51% sequence identity with its homologous template, 1awx (Data-2), but the energy difference between the good model from the homologous template and the bad models from non-homologous templates was not as clear as for 2cro. Two more cases (1ftt, 3mef) with high sequence identities to their homologous templates showed little difference in conformational energies between good and bad models (Data-3, Data-4).
When sequence identities with homologous templates were as low as those with non-homologous templates (Data-5, Data-6), conformational energies could not discriminate the good models at all.
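Since the discussion depends on sequence identity, a minimal sketch of how percent identity can be computed from a pairwise alignment may be useful. The definition used here (identical residues divided by the number of gap-free aligned columns) is an assumption; other denominators, such as the length of the shorter sequence, are also in common use, and the report does not state which one was applied.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment: 4 identical residues over 5 gap-free columns.
identity = percent_identity("ACDE-G", "ACEEFG")
print(identity)
```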
Data-1 : Models of 2CRO & Energy calculation
Data-2 : Models of 1BU1 & Energy calculation
Data-3 : Models of 3MEF & Energy calculation
Data-4 : Models of 1FTT & Energy calculation
Data-5 : Models of 1AH9 & Energy calculation
Data-6 : Models of 2EZH & Energy calculation
In this experiment, I selected 6 target molecules with different sequence identities to their homologous templates. Only when the sequence identity was over 50% could the conformational energies of the model structures distinguish homologous templates from non-homologous templates (Table 3). As sequence identity decreased, the RMSD values between the homologous models of the targets and the experimental structures of the targets increased. The energy difference between homologous and non-homologous models correlated strongly with the RMSD values. Therefore, to use energy functions as a reliable criterion regardless of the sequence identity between targets and homologous templates, highly similar structures (RMSD value of less than 1.0) need to be generated from homologous templates. In general, sequence alignment is one of the most important factors in homology modeling. CLUSTAL W is not a suitable tool for sequence alignment when sequence identities are less than 25%, so a better sequence alignment would be expected to lower the energies of the homologous models in Data-5 and Data-6.
Table 3. The evaluation of homology models
Target: 3mef 2cro 1bu1 1ftt 2ezh 1ah9
Energy difference (1)
(1) Energy difference = (a - b) / b, where a is the energy of the good model from the homologous template and b is the average energy of the bad models from non-homologous templates.
(2) Mean global heavy-atom RMSD (experimental target structure vs. calculated target structure).
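The energy difference defined in footnote 1 can be written as a small Python function. The function name and the energy values in the example are hypothetical and serve only to show the arithmetic; they are not results from this experiment.

```python
from statistics import mean


def relative_energy_difference(good_energy, bad_energies):
    """Return (a - b) / b, where a is the energy of the good model and
    b is the average energy of the bad models."""
    b = mean(bad_energies)
    if b == 0:
        raise ZeroDivisionError("average energy of bad models is zero")
    return (good_energy - b) / b


# Hypothetical energies (arbitrary units), for illustration only.
diff = relative_energy_difference(-1200.0, [-1000.0, -1100.0, -900.0])
print(round(diff, 3))
```

Note the sign convention: when all energies are negative, a good model that is lower in energy than the average bad model gives a positive value, because both the numerator and the denominator are negative.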
To speed up the calculation, I excluded explicit solvent from the energy calculations, but the solvation of exposed side chains by surrounding water molecules is actually very important. Molecular mechanics energies evaluated without a solvent environment are known to be unable, in some cases, to reliably distinguish correctly folded structures from incorrectly folded conformations (3,4). Further experiments that include solvent might therefore show a clearer difference in energies between good and bad models.
In addition, the number of homologous distance constraints used for structure generation was not held constant, and these constraints were excluded from the energy calculation. As a next step, further evaluation should be done with a consistent number of constraints in structure generation and with an energy calculation that includes a constraint term.
1. Sahasrabudhe, P.V. et al. (1998) Proteins: Structure, Function, and Genetics, 33, 558-566.
2. Li, H. et al. (1997) Protein Science, 6, 956-970.
3. Chiche, L. et al. (1990) Proc. Natl. Acad. Sci. USA, 87, 3240-3243.
4. Holm, L. & Sander, C. (1992) J. Mol. Biol., 225, 93-105.