Journal club list for Monday 1998-05-18
1. Protein Folding Simulation With Genetic Algorithm
and Supersecondary Structure Constraints
Yan Cui, Run Sheng Chen, and Wing Hung Wong - PROTEINS: Structure,
Function, and Genetics 31:247-257 (1998)
We describe an algorithm to compute native structures of proteins from
their primary sequences. The novel aspects of this method are: 1) The hydrophobic
potential was set to be proportional to the nonpolar solvent accessible
surface. To make computation feasible, we developed a new algorithm to
compute the solvent accessible surface areas rapidly. 2) The supersecondary
structures of each protein were predicted and used as restraints during
the conformation searching processes. This algorithm was applied to five
proteins. The overall fold of these proteins can be computed from their
sequences, with deviations from crystal structures of 1.48–4.48
Å for Cα atoms.
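For discussion, a minimal sketch of a hydrophobic term proportional to nonpolar solvent accessible surface. It uses the standard Shrake-Rupley point-counting scheme, not the faster algorithm the authors developed (which the abstract does not describe). The probe radius, point count, the `gamma` scale factor, and all function names are assumptions made for illustration.

```python
import math

def sphere_points(n=200):
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def sasa(atoms, probe=1.4, n_points=200):
    """Per-atom solvent accessible area in square angstroms.

    atoms: list of (x, y, z, radius) tuples. Radii are caller-supplied;
    no particular radius set is implied by the paper.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                # A test point is buried if it lies inside a neighbour's
                # probe-expanded sphere.
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas

def hydrophobic_energy(atoms, nonpolar, gamma=1.0):
    """Energy proportional to the exposed area of atoms flagged nonpolar.

    gamma is a hypothetical proportionality constant.
    """
    areas = sasa(atoms)
    return gamma * sum(a for a, flag in zip(areas, nonpolar) if flag)
```

The triple loop is quadratic in the number of atoms for every test point, which is presumably the kind of cost that motivated a faster surface algorithm in a search that evaluates many conformations.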
2. Are Knowledge-Based Potentials Derived From Protein Structure Sets Discriminative
With Respect to Amino Acid Types?
Shamil R. Sunyaev, Frank Eisenhaber, Patrick Argos, Eugene N. Kuznetsov,
and Vladimir G. Tumanyan - PROTEINS: Structure, Function, and Genetics
31:225-246 (1998)
The parametric description of residue environments through
solvent accessibility, backbone conformation, or pairwise residue–residue
distances is the key to the comparison between amino acid types
at protein sequence positions and residue locations in structural templates
(condition of protein sequence–structure match). For the first
time, the research results presented in this study clarify and make it possible to
quantify, on a rigorous statistical basis, to what extent the amino acid
type-specific distributions of commonly used environment parameters are
discriminative with respect to the 20 amino acid types. Relying on the
Bahadur theory, we estimate the probability of error in a single-sequence–structure
alignment based on weak or absent discriminative power in a learning
database of protein structure. We present the results for many residue
environment variables and demonstrate that each fold description parameter
is sensitive with respect to only a few amino acid types while indifferent
to most of the other amino acid types. Even complex structural characteristics
combining solvent-accessible surface area, backbone conformation, and
pairwise distances distinguish only some amino acid types, whereas the
others remain nondiscriminated. We find that the knowledge-based potentials
currently in use treat especially Ala, Asp, Gln, His, Ser, Thr, and Tyr
as essentially "average" amino acids. Thus, highly discriminative amino
acid types define the alignment register in gapless sequence–structure
alignments. The introduction of gaps leads to alignment ambiguities at
sequence positions occupied by nondiscriminated amino acid types. Therefore,
local sequence–structure alignments produced by techniques with
gaps cannot be reliable. Conceptually new and more sensitive environment
parameters must be invented.
3. Influence of Protein Structure Databases on the Predictive Power of
Statistical Pair Potentials
Emiko Furuichi and Patrice Koehl - PROTEINS: Structure, Function, and
Genetics 31:139-149 (1998)
A long-standing goal in protein structure studies is
the development of reliable energy functions that can be used both to verify
protein models derived from experimental constraints and for theoretical
protein folding and inverse folding computer experiments. In that respect,
knowledge-based statistical pair potentials have attracted considerable
interest recently, mainly because they include the essential features of
protein structures as well as solvent effects at a low computing cost.
However, the basis on which statistical potentials are derived has been
questioned. In this paper, we investigate statistical pair potentials
derived from protein three-dimensional structures, addressing in particular
questions related to the form of these potentials, as well as to the content
of the database from which they are derived. We have shown that statistical
pair potentials depend on the size of the proteins included in the database,
and that this dependence can be reduced by considering only pairs of residues
close in space (i.e., with a cutoff of 8 Å). We have shown also
that statistical potentials carry a memory of the quality of the database
in terms of the amount and diversity of secondary structure it contains.
We find, for example, that potentials derived from a database containing
α-proteins will only perform best on α-proteins in fold recognition computer
experiments. We believe that this is an overall weakness of these potentials,
which must be kept in mind when constructing a database.
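To make the database dependence concrete, here is a minimal sketch of a contact-based statistical pair potential using the 8 Å cutoff mentioned above. It uses a generic inverse-Boltzmann form, e(a,b) = -ln(observed / expected), with the expected frequency taken from how often each residue type appears in any contact. The abstract does not give the paper's functional form or reference state, so both are assumptions here, as are the input layout and function name.

```python
import math
from collections import Counter
from itertools import combinations

def pair_potential(structures, cutoff=8.0):
    """Contact potential in kT units from a toy structure database.

    structures: list of proteins, each a list of (residue_type, (x, y, z)).
    Returns {(type_a, type_b): energy} for every pair type observed in contact.
    """
    pair_counts = Counter()
    for residues in structures:
        for (a, pa), (b, pb) in combinations(residues, 2):
            if math.dist(pa, pb) < cutoff:
                pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())

    # How often each residue type takes part in a contact (two ends per contact).
    type_counts = Counter()
    for (a, b), n in pair_counts.items():
        type_counts[a] += n
        type_counts[b] += n

    potential = {}
    for (a, b), n in pair_counts.items():
        fa = type_counts[a] / (2 * total)
        fb = type_counts[b] / (2 * total)
        # Random-mixing reference: unlike pairs can form in two orders.
        expected = fa * fb * (1 if a == b else 2)
        potential[(a, b)] = -math.log((n / total) / expected)
    return potential
```

Because every count comes from the database itself, swapping in a database dominated by one fold class changes both the observed and the expected frequencies, which is one way to see how such potentials can carry a memory of their training set.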
4. Interaction of Transmembrane Helices by a Knobs-Into-Holes Packing Characteristic
of Soluble Coiled Coils
Dieter Langosch and Jaap Heringa - PROTEINS: Structure, Function,
and Genetics 31:150-159 (1998)
Membrane-embedded protein domains frequently exist as
α-helical bundles, as exemplified by photosynthetic reaction centers,
bacteriorhodopsin, and cytochrome c oxidase. The sidechain packing between
their transmembrane helices was investigated by a nearest-neighbor analysis
which identified sets of interfacial residues for each analyzed helix–helix
interface. For the left-handed helix–helix pairs, the interfacial
residues almost exclusively occupy positions a, d, e, or g within a heptad
motif (abcdefg) which is repeated two to three times for each interacting
helical surface. The connectivity between the interfacial residues of
adjacent helices conforms to the knobs-into-holes type of sidechain packing
known from soluble coiled coils. These results demonstrate on a quantitative
basis that the geometry of sidechain packing is similar for left-handed
helix–helix pairs embedded in membranes and coiled coils of soluble
proteins. The transmembrane helix–helix interfaces studied are
somewhat less compact and regular as compared to soluble coiled coils and
tolerate all hydrophobic amino acid types to similar degrees. The results
are discussed with respect to previous experimental findings which demonstrate
that specific interactions between transmembrane helices are important
for membrane protein folding and/or oligomerization.
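The heptad bookkeeping in this abstract can be sketched in a few lines: walk along a helix, label each residue a through g, and keep the a, d, e, and g positions reported as interfacial. The function name and the assumption that the register of the first residue is known in advance are illustrative; the paper derives the interface from a nearest-neighbor analysis of structures, not from sequence position alone.

```python
HEPTAD = "abcdefg"
INTERFACE = set("adeg")  # positions reported as interfacial in the abstract

def interfacial_positions(sequence, start_register="a"):
    """Return (index, residue, register) for residues at a, d, e, or g.

    start_register is the heptad letter assigned to sequence[0]; it is an
    input assumption, not something this sketch can infer.
    """
    offset = HEPTAD.index(start_register)
    hits = []
    for i, residue in enumerate(sequence):
        register = HEPTAD[(i + offset) % 7]
        if register in INTERFACE:
            hits.append((i, residue, register))
    return hits
```

For a helix spanning two to three heptads (14 to 21 residues), this yields 8 to 12 candidate interface positions, four per repeat.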
5. Relationships Between Protein Sequence and Structure Patterns Based
on Residue Contacts
Joachim Selbig and Patrick Argos - PROTEINS: Structure, Function, and
Genetics 31:172-185 (1998)
The identification of correlations between sequence
patterns and structural motifs is a prerequisite in the development of
protein structure prediction methods. The prediction accuracy indicates
whether these correlations are discerned. We present an approach to identify
long-range relationships between sequence patterns and structural motifs
by varying the granulation of the structure description. Since interaction
among residues is a major determinant in protein folding, we consider contact
environments formed by two triplets of three sequentially neighboring residues
and described by vectors whose components express contact strengths on
an atomic level. Through testing various classification schemes, including
their resolution and optimizing parameters, discernible relationships
between sequences and folds are explored. About ten structural contact
states, together with information from noncontacting regions, could improve
the accuracy of contact prediction.
6. Structure-Based Design of Model Proteins
Jayanth R. Banavar, Marek Cieplak, Amos Maritan, Gautham Nadig, Flavio
Seno, and Saraswathi Vishveshwara - PROTEINS: Structure, Function, and
Genetics 31:10-20 (1998)
A structure-based, sequence-design procedure is proposed
in which one considers a set of decoy structures that compete significantly
with the target structure in being low energy conformations. The decoy
structures are chosen to have strong overlaps in contacts with the putative
native state. The procedure allows the design of sequences with large and
small stability gaps in a random-bond heteropolymer model in both two and
three dimensions by an appropriate assignment of the contact energies
to both the native and nonnative contacts. The design procedure is also
successfully applied to the two-dimensional HP model.
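As background for the two-dimensional HP model, a minimal sketch of the usual energy function: each pair of H residues on neighbouring lattice sites that are not adjacent along the chain contributes -1. The `stability_gap` helper, defined here as the lowest decoy energy minus the target energy, is an assumption for illustration; the paper's exact definition of the gap and its decoy selection are not given in the abstract.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the square lattice.

    sequence: string of 'H' and 'P'.
    coords: list of (x, y) lattice sites, one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

def stability_gap(sequence, target, decoys):
    """Lowest decoy energy minus target energy (positive favours the target)."""
    return min(hp_energy(sequence, d) for d in decoys) - hp_energy(sequence, target)
```

A design loop in this spirit would score candidate sequences by `stability_gap` against a decoy set that shares many contacts with the target, then keep sequences with the gap size it is looking for.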
7. Prediction and Classification of Domain Structural Classes
Kuo-Chen Chou, Wei-Min Liu, Gerald M. Maggiora, and Chun-Ting Zhang
- PROTEINS: Structure, Function, and Genetics 31:97-103 (1998)
Can the coupling effect among different amino acid components
be used to improve the prediction of protein structural classes? The answer
is yes according to the study by Chou and Zhang (Crit. Rev. Biochem. Mol.
Biol. 30:275-349, 1995), but a completely opposite conclusion
was drawn by Eisenhaber et al. when using a different dataset constructed
by themselves (Proteins 25:169-179, 1996). To resolve such a perplexing
problem, predictions were performed by various approaches for the datasets
from an objective database, the SCOP database (Murzin, Brenner, Hubbard,
and Chothia. J. Mol. Biol. 247:536-540, 1995). According to SCOP,
the classification of structural classes for protein domains is based
on the evolutionary relationship and on the principles that govern the
3D structure of proteins, and hence is more natural and reliable. The results
from both resubstitution tests and jackknife tests indicate that the overall
rates of correct prediction by the algorithm that incorporates the coupling
effect among different amino acid components are significantly higher
than those by the algorithms without using such an effect. An analysis
shows that the main reasons Eisenhaber et al. reached an opposite
conclusion are (1) misuse of the component-coupled
algorithm, and (2) use of a conceptually incorrect rule to classify protein
structural classes. The formulation and analysis presented in this article
help clarify these problems and make it easier to apply the
prediction algorithm and interpret the results correctly.
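Since the abstract leans on both resubstitution and jackknife tests, here is a minimal sketch of the difference between the two. The classifier is a plain nearest-centroid rule with Euclidean distance, swapped in as a stand-in; it is not the component-coupled algorithm, which additionally accounts for coupling among amino acid components. Function names and the data layout are assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def predict(train, x):
    """Label of the class whose centroid is closest to x.

    train: list of (vector, label) pairs, e.g. amino acid composition
    vectors labelled with a structural class.
    """
    by_class = {}
    for vector, label in train:
        by_class.setdefault(label, []).append(vector)
    best = None
    for label, vectors in by_class.items():
        c = centroid(vectors)
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if best is None or d < best[0]:
            best = (d, label)
    return best[1]

def resubstitution_rate(data):
    """Train on everything, then test on the same items."""
    return sum(predict(data, v) == y for v, y in data) / len(data)

def jackknife_rate(data):
    """Leave each item out in turn, train on the rest, test on the held-out item."""
    hits = 0
    for i, (v, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, v) == y
    return hits / len(data)
```

On the toy set `[([0.0], "a"), ([1.0], "a"), ([10.0], "b"), ([4.0], "b")]` the resubstitution rate is 1.0 while the jackknife rate is 0.75, because the borderline item at 4.0 is only classified correctly when it helps define its own class centroid. That optimism is why the jackknife figure is usually the one to compare across methods.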
8. Prediction of the Three-Dimensional Structure of Proteins Using the
Electrostatic Screening Model and Hierarchic Condensation
Franc Avbelj and Ljudmila Fele - PROTEINS: Structure, Function, and
Genetics 31:74-96 (1998)
We describe a method for predicting the three-dimensional
(3-D) structure of proteins from their sequence alone. The method is based
on the electrostatic screening model for the stability of the protein main-chain
conformation. The free energy of a protein as a function of its conformation
is obtained from the potentials of mean force analysis of high-resolution
x-ray protein structures. The free energy function is simple and contains
only 44 fitted coefficients. The minimization of the free energy is performed
by the torsion space Monte Carlo procedure using the concept of hierarchic
condensation. The Monte Carlo minimization procedure is applied to predict
the secondary, super-secondary, and native 3-D structures of 12 proteins
with 28–110 amino acids. The 3-D structures of the majority of
local secondary and super-secondary structures are predicted accurately.
This result suggests that control in forming the native-like local structure
is distributed along the entire protein sequence. The native 3-D structure
is predicted correctly for 3 of 12 proteins composed mainly of α-helices.
The method fails to predict the native 3-D structure of proteins with a
predominantly β secondary structure. We suggest that the hierarchic condensation
is not an appropriate procedure for simulating the folding of proteins
made up primarily of β-strands. The method has proved accurate in
predicting the local secondary and super-secondary structures in the blind
ab initio 3-D prediction experiment.
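For readers unfamiliar with torsion space Monte Carlo, a minimal sketch of a Metropolis search over a list of torsion angles. It shows only the generic accept/reject loop: the hierarchic condensation scheme and the 44-coefficient free energy function are not reproduced, and `toy_energy`, the step size, temperature, and seed are placeholders chosen for illustration.

```python
import math
import random

def toy_energy(angles, targets=(1.0, -2.0, 0.5)):
    """Placeholder energy: zero when every angle matches its target (radians)."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, targets))

def monte_carlo_minimize(energy, angles, steps=2000, step_size=0.3, kT=0.5, seed=0):
    """Metropolis search in torsion space; returns the best angles and energy seen."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))              # perturb one torsion at a time
        trial[k] += rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

A call such as `monte_carlo_minimize(toy_energy, [0.0, 0.0, 0.0])` returns the lowest-energy angle set visited. Working in torsion angles keeps bond lengths and bond angles fixed, so each move changes only the backbone degrees of freedom that matter for the fold.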