Journal club list for Monday 1998-05-18


1. Protein Folding Simulation With Genetic Algorithm and Supersecondary Structure Constraints

Yan Cui, Run Sheng Chen, and Wing Hung Wong - PROTEINS: Structure, Function, and Genetics 31:247-257 (1998)

We describe an algorithm to compute native structures of proteins from their primary sequences. The novel aspects of this method are: 1) The hydrophobic potential was set to be proportional to the nonpolar solvent accessible surface. To make computation feasible, we developed a new algorithm to compute the solvent accessible surface areas rapidly. 2) The supersecondary structures of each protein were predicted and used as restraints during the conformation searching processes. This algorithm was applied to five proteins. The overall fold of these proteins can be computed from their sequences, with deviations from crystal structures of 1.48-4.48 Å for Cα atoms.

2. Are Knowledge-Based Potentials Derived From Protein Structure Sets Discriminative With Respect to Amino Acid Types?

Shamil R. Sunyaev, Frank Eisenhaber, Patrick Argos, Eugene N. Kuznetsov, and Vladimir G. Tumanyan - PROTEINS: Structure, Function, and Genetics 31:225-246 (1998)

The parametric description of residue environments through solvent accessibility, backbone conformation, or pairwise residue-residue distances is the key to the comparison between amino acid types at protein sequence positions and residue locations in structural templates (condition of protein sequence-structure match). For the first time, the research results presented in this study clarify and allow to quantify, on a rigorous statistical basis, to what extent the amino acid type-specific distributions of commonly used environment parameters are discriminative with respect to the 20 amino acid types. Relying on the Bahadur theory, we estimate the probability of error in a single sequence-structure alignment based on weak or absent discriminative power in a learning database of protein structure. We present the results for many residue environment variables and demonstrate that each fold description parameter is sensitive with respect to only a few amino acid types while indifferent to most of the other amino acid types. Even complex structural characteristics combining solvent-accessible surface area, backbone conformation, and pairwise distances distinguish only some amino acid types, whereas the others remain nondiscriminated. We find that the knowledge-based potentials currently in use treat especially Ala, Asp, Gln, His, Ser, Thr, and Tyr as essentially "average" amino acids. Thus, highly discriminative amino acid types define the alignment register in gapless sequence-structure alignments. The introduction of gaps leads to alignment ambiguities at sequence positions occupied by nondiscriminated amino acid types. Therefore, local sequence-structure alignments produced by techniques with gaps cannot be reliable. Conceptionally new and more sensitive environment parameters must be invented.

3. Influence of Protein Structure Databases on the Predictive Power of Statistical Pair Potentials

Emiko Furuichi and Patrice Koehl - PROTEINS: Structure, Function, and Genetics 31:139-149 (1998)

A long standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints as well as for theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have attracted considerable interests recently mainly because they include the essential features of protein structures as well as solvent effects at a low computing cost. However, the basis on which statistical potentials are derived have been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials, as well as to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residue close in space (i.e., with a cutoff of 8 Å). We have shown also that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing α-proteins will only perform best on α-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database.
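
As a reading aid for the discussion, here is a minimal sketch of how a contact-based statistical pair potential with a distance cutoff can be computed. It is not the authors' formulation: the function names, the one-point-per-residue representation, and the composition-based reference state are all assumptions chosen for illustration, using the common inverse-Boltzmann form -ln(observed/expected).

```python
import math
from collections import Counter
from itertools import combinations


def contact_counts(residues, cutoff=8.0):
    """Count residue-type pairs whose representative points lie within `cutoff` (Å).

    `residues` is a list of (type, (x, y, z)) tuples; one point per residue
    (for example the Cα position) is an assumption of this sketch.
    """
    counts = Counter()
    for (type_i, xyz_i), (type_j, xyz_j) in combinations(residues, 2):
        if math.dist(xyz_i, xyz_j) <= cutoff:
            counts[tuple(sorted((type_i, type_j)))] += 1
    return counts


def pair_potential(counts):
    """Return -ln(observed / expected) for every observed residue-type pair.

    The expected count assumes that contact partners pair up at random,
    weighted by how often each type takes part in any contact.
    """
    total = sum(counts.values())
    participation = Counter()
    for (a, b), n in counts.items():
        participation[a] += n
        participation[b] += n
    norm = sum(participation.values())
    potential = {}
    for (a, b), n in counts.items():
        freq_a = participation[a] / norm
        freq_b = participation[b] / norm
        # Unlike pairs can form in two orders (a, b) and (b, a).
        expected = total * freq_a * freq_b * (1 if a == b else 2)
        potential[(a, b)] = -math.log(n / expected)
    return potential
```

Because the counts come entirely from whichever structures are fed in, the resulting table inherits the composition of that database, which is the dependence the abstract warns about.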
 

4. Interaction of Transmembrane Helices by a Knobs-Into-Holes Packing Characteristic of Soluble Coiled Coils

Dieter Langosch and Jaap Heringa - PROTEINS: Structure, Function, and Genetics 31:150-159 (1998)

Membrane-embedded protein domains frequently exist as α-helical bundles, as exemplified by photosynthetic reaction centers, bacteriorhodopsin, and cytochrome C oxidase. The sidechain packing between their transmembrane helices was investigated by a nearest-neighbor analysis which identified sets of interfacial residues for each analyzed helix-helix interface. For the left-handed helix-helix pairs, the interfacial residues almost exclusively occupy positions a, d, e, or g within a heptad motif (abcdefg) which is repeated two to three times for each interacting helical surface. The connectivity between the interfacial residues of adjacent helices conforms to the knobs-into-holes type of sidechain packing known from soluble coiled coils. These results demonstrate on a quantitative basis that the geometry of sidechain packing is similar for left-handed helix-helix pairs embedded in membranes and coiled coils of soluble proteins. The transmembrane helix-helix interfaces studied are somewhat less compact and regular as compared to soluble coiled coils and tolerate all hydrophobic amino acid types to similar degrees. The results are discussed with respect to previous experimental findings which demonstrate that specific interactions between transmembrane helices are important for membrane protein folding and/or oligomerization.

5. Relationships Between Protein Sequence and Structure Patterns Based on Residue Contacts

Joachim Selbig and Patrick Argos - PROTEINS: Structure, Function, and Genetics 31:172-185 (1998)

The identification of correlations between sequence patterns and structural motifs is a prerequisite in the development of protein structure prediction methods. The prediction accuracy indicates whether these correlations are discerned. We present an approach to identify long-range relationships between sequence patterns and structural motifs by varying the granulation of the structure description. Since interaction among residues is a major determinant in protein folding, we consider contact environments formed by two triplets of three sequentially neighboring residues and described by vectors whose components express contact strengths on an atomic level. Through testing various classification schemes, including their resolution and optimizing parameters, discernible relationships between sequences and folds are explored. About ten structural contact states, together with information from noncontacting regions, could improve the accuracy of contact prediction.

6. Structure-Based Design of Model Proteins

Jayanth R. Banavar, Marek Cieplak, Amos Maritan, Gautham Nadig, Flavio Seno, and Saraswathi Vishveshwara - PROTEINS: Structure, Function, and Genetics 31:10-20 (1998)

A structure-based, sequence-design procedure is proposed in which one considers a set of decoy structures that compete significantly with the target structure in being low energy conformations. The decoy structures are chosen to have strong overlaps in contacts with the putative native state. The procedure allows the design of sequences with large and small stability gaps in a random-bond heteropolymer model in both two and three dimensions by an appropriate assignment of the contact energies to both the native and nonnative contacts. The design procedure is also successfully applied to the two-dimensional HP model.
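
For readers unfamiliar with the two-dimensional HP model mentioned at the end of the abstract, the sketch below scores a conformation on a square lattice. The abstract does not define the model; the code assumes the usual convention of -1 for every pair of H residues that are lattice neighbors but not chain neighbors, and the function name `hp_energy` is hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice.

    `sequence` is a string of 'H' and 'P'; `coords` is a list of (x, y)
    integer lattice sites, one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

In a design setting, a stability gap could be estimated by comparing this energy for the target conformation against the lowest energy among a set of decoy conformations for the same sequence.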

7. Prediction and Classification of Domain Structural Classes

Kuo-Chen Chou, Wei-Min Liu, Gerald M. Maggiora, and Chun-Ting Zhang - PROTEINS: Structure, Function, and Genetics 31:97-103 (1998)

Can the coupling effect among different amino acid components be used to improve the prediction of protein structural classes? The answer is yes according to the study by Chou and Zhang (Crit. Rev. Biochem. Mol. Biol. 30:275-349, 1995), but a completely opposite conclusion was drawn by Eisenhaber et al. when using a different dataset constructed by themselves (Proteins 25:169-179, 1996). To resolve such a perplexing problem, predictions were performed by various approaches for the datasets from an objective database, the SCOP database (Murzin, Brenner, Hubbard, and Chothia. J. Mol. Biol. 247:536-540, 1995). According to SCOP, the classification of structural classes for protein domains is based on the evolutionary relationship and on the principles that govern the 3D structure of proteins, and hence is more natural and reliable. The results from both resubstitution tests and jackknife tests indicate that the overall rates of correct prediction by the algorithm incorporated with the coupling effect among different amino acid components are significantly higher than those by the algorithms without using such an effect. It is elucidated through an analysis that the main reasons for Eisenhaber et al. to have reached an opposite conclusion are the result of (1) misusing the component-coupled algorithm, and (2) using a conceptually incorrect rule to classify protein structural classes. The formulation and analysis presented in this article are conducive to clarify these problems, helping correctly to apply the prediction algorithm and interpret the results.
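
The difference between the resubstitution and jackknife tests named in the abstract can be sketched as follows. The classifier here is a plain nearest-centroid rule on feature vectors (for example amino acid composition), chosen only to keep the example short; it is not the component-coupled algorithm, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(training, features):
    """Return the label whose class centroid is nearest to `features`."""
    by_label = {}
    for vector, label in training:
        by_label.setdefault(label, []).append(vector)
    return min(by_label, key=lambda lab: math.dist(centroid(by_label[lab]), features))


def resubstitution_rate(data):
    """Train on all samples, then test on those same samples."""
    return sum(predict(data, x) == y for x, y in data) / len(data)


def jackknife_rate(data):
    """Leave each sample out in turn, train on the rest, test on the one left out."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, x) == y
    return hits / len(data)
```

Resubstitution reuses the test sample during training and therefore tends to be optimistic, whereas the jackknife rate reflects performance on samples the classifier has not seen.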
 

8. Prediction of the Three-Dimensional Structure of Proteins Using the Electrostatic Screening Model and Hierarchic Condensation

Franc Avbelj and Ljudmila Fele - PROTEINS: Structure, Function, and Genetics 31:74-96 (1998)

We describe a method for predicting the three-dimensional (3-D) structure of proteins from their sequence alone. The method is based on the electrostatic screening model for the stability of the protein main-chain conformation. The free energy of a protein as a function of its conformation is obtained from the potentials of mean force analysis of high-resolution x-ray protein structures. The free energy function is simple and contains only 44 fitted coefficients. The minimization of the free energy is performed by the torsion space Monte Carlo procedure using the concept of hierarchic condensation. The Monte Carlo minimization procedure is applied to predict the secondary, super-secondary, and native 3-D structures of 12 proteins with 28-110 amino acids. The 3-D structures of the majority of local secondary and super-secondary structures are predicted accurately. This result suggests that control in forming the native-like local structure is distributed along the entire protein sequence. The native 3-D structure is predicted correctly for 3 of 12 proteins composed mainly from the α-helices. The method fails to predict the native 3-D structure of proteins with a predominantly β secondary structure. We suggest that the hierarchic condensation is not an appropriate procedure for simulating the folding of proteins made up primarily from β-strands. The method has been proved accurate in predicting the local secondary and super-secondary structures in the blind ab initio 3-D prediction experiment.