The LGscore and LGscore2 programs.

Here we present two programs that can be used to calculate the quality of a protein model using the LGscore (see the manuscript cited below for a description). Both programs are freely available under the GPL license.

To use the programs you need two structures of your protein (your model and the correct structure) in PDB format. They should have the same residue numbering. Only the C-alpha coordinates are needed. LGscore is run by typing "./LGscore file1.pdb file2.pdb" and LGscore2 by typing "./LGscore2 file1.pdb file2.pdb".
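
Since both files should share the same residue numbering, it can help to check this before running the programs. The following is a minimal Python sketch, not part of LGscore, that lists the C-alpha residue identifiers in two PDB files using the standard fixed-column ATOM record layout. The function names and the helper make_ca_line (which only builds a test record) are hypothetical.

```python
def ca_residue_ids(pdb_lines):
    """Return (chain, residue number, insertion code) for each C-alpha ATOM record."""
    ids = []
    for line in pdb_lines:
        # PDB fixed columns (1-based): atom name 13-16, chain ID 22,
        # residue number 23-26, insertion code 27.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            ids.append((line[21], int(line[22:26]), line[26:27].strip()))
    return ids


def shared_residues(model_lines, native_lines):
    """Residue identifiers that have a C-alpha atom in both structures."""
    return sorted(set(ca_residue_ids(model_lines)) & set(ca_residue_ids(native_lines)))


def make_ca_line(serial, chain, resseq, x=0.0, y=0.0, z=0.0):
    """Hypothetical helper: build one C-alpha ATOM record for illustration."""
    return "ATOM  {:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}".format(
        serial, " CA ", " ", "ALA", chain, resseq, " ", x, y, z)


if __name__ == "__main__":
    model = [make_ca_line(1, "A", 1), make_ca_line(2, "A", 2)]
    native = [make_ca_line(1, "A", 2), make_ca_line(2, "A", 3)]
    print(shared_residues(model, native))
```

With real files, the line lists would come from open("file1.pdb").readlines() and open("file2.pdb").readlines(). If the shared list is much shorter than either structure, the numbering probably differs between the two files.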

A typical output line looks like this:

SCORE:        0.88    28      0.9     485.2   1.636783e-04

This means that a fragment containing 88% of the second protein was found. This fragment was of length 28 and had an rmsd of 0.9. The score S was 485.2 and the P-value was 1.64e-4. The outputs of LGscore and LGscore2 have the same format. If you want to calculate LGscore2 it is best to run both LGscore and LGscore2 and take the best output, as in a few cases the structural alignment used by LGscore2 gives a worse result than LGscore. This can happen when there are large insertions or deletions.
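
The column layout described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than a tool shipped with LGscore: the field names are our own, and "best output" is assumed here to mean the result with the lower P-value.

```python
from typing import NamedTuple


class Score(NamedTuple):
    fraction: float  # fraction of the second protein in the fragment
    length: int      # fragment length in residues
    rmsd: float      # rmsd of the fragment
    s: float         # score S
    p_value: float   # P-value


def parse_score(line):
    """Parse one 'SCORE:' output line; extra trailing columns are ignored."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "SCORE:":
        raise ValueError("not a SCORE line: %r" % line)
    return Score(float(fields[1]), int(fields[2]), float(fields[3]),
                 float(fields[4]), float(fields[5]))


def best_of(a, b):
    """Assumption: the better result is the one with the lower P-value."""
    return a if a.p_value <= b.p_value else b


if __name__ == "__main__":
    lg1 = parse_score("SCORE:        0.88    28      0.9     485.2   1.636783e-04")
    # The second line is invented for illustration, not real LGscore2 output.
    lg2 = parse_score("SCORE:        0.50    16      1.2     210.0   3.0e-02")
    print(best_of(lg1, lg2))
```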

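To compile the program simply type "gcc -o rmsd rmsd.c -lz -lm" or something similar. The libraries are listed after the source file because many linkers resolve symbols in command-line order. Compiling with -Dnozlib turns off the ability to read gzipped files.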

Note: The LGscore is not really reliable for short fragments and the statistics ought to be re-fitted. We are working on that, but if you want to help you are welcome to contact me.

Please refer to this page until the manuscript is printed. If you download and use the program, please send me an email. I appreciate bug fixes and other comments.

Version 2.0

We have made a new version of LGscore available. This version has some major changes. This measure will be used

Changes in statistics

  1. The statistics have been recalculated. This has resulted in a better scoring system for short matches. Another result is that the P-values are not really comparable with version 1.0 of the LGscore. An old LGscore of 1e-3 corresponds approximately to a new LGscore of 10^-1.5 (about 0.03).
  2. A new quality measure (the Q-value) is calculated. This measure depends on the length of the target protein. It should be significantly better for measuring the quality of a model. More details will be made available later.

The output has one extra column, the Q-value. A good cutoff is a Q-value of approximately 2 to 5. A Q-value below 0 is not at all significant.

SCORE:        0.88    28      0.9     485.2   1.636783e-04  2.432082e-03
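
To illustrate how the extra column might be used, here is a small Python sketch that extracts the Q-value from a version 2.0 line and compares it with a cutoff. The function names and the category labels are hypothetical, and the default cutoff of 2.0 is simply the lower end of the range given above.

```python
def q_value(line):
    """Return the Q-value (seventh field) of a version 2.0 'SCORE:' line."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "SCORE:":
        raise ValueError("no Q-value column in: %r" % line)
    return float(fields[6])


def interpret_q(q, cutoff=2.0):
    """Rough reading of a Q-value; the cutoff is an assumption (2 to 5 suggested)."""
    if q < 0:
        return "not significant"
    if q >= cutoff:
        return "above cutoff"
    return "below cutoff"


if __name__ == "__main__":
    line = "SCORE:        0.88    28      0.9     485.2   1.636783e-04  2.432082e-03"
    q = q_value(line)
    print(q, interpret_q(q))
```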

Changes in the algorithm

  1. The algorithm has been sped up by terminating the inner loops if the P-value is larger than 0.5. The results correlate with the old results with a correlation coefficient better than 0.9999. On the LiveBench-2 dataset the new version is approximately 10 times faster than the original algorithm (the complexity should go from O(n^4) to O(n^2)). After some further optimizations, all searches where the rmsd is larger than 5(N+225)/225 are terminated, and all residues closer than 3 Å are automatically included in the next iteration.
  2. The speedup for LGscore2 is small, at best a factor of two, because the majority of the time is spent on the structural superposition.
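
The two pruning rules in item 1 can be sketched in Python as follows. This is only an illustration of the stated thresholds, not the actual code in LGscore. It assumes that N is the number of residues in the current fragment (the text does not define N) and that both coordinate lists are already superimposed and paired residue by residue. All function names are hypothetical.

```python
import math


def rmsd_cutoff(n):
    """Rmsd limit 5(N+225)/225; N is assumed to be the fragment length."""
    return 5.0 * (n + 225) / 225.0


def should_terminate(rmsd, n):
    """A search is abandoned once its rmsd exceeds the length-dependent limit."""
    return rmsd > rmsd_cutoff(n)


def close_residues(model_ca, native_ca, max_dist=3.0):
    """Indices of paired C-alpha atoms closer than max_dist (in Å).

    Assumes the two coordinate lists are already superimposed.
    """
    return [i for i, (a, b) in enumerate(zip(model_ca, native_ca))
            if math.dist(a, b) < max_dist]


if __name__ == "__main__":
    print(rmsd_cutoff(28))
    print(should_terminate(6.0, 28))
    print(close_residues([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (4, 0, 0)]))
```

Under this reading the limit grows slowly with fragment length, from 5 Å for very short fragments to 10 Å at 225 residues.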

Paper

If you use LGscore, please refer to the following paper: Susana Cristobal, Adam Zemla, Daniel Fischer, Leszek Rychlewski and Arne Elofsson (2001). "A study of quality measures for protein threading models." BMC Bioinformatics 2:5. The paper can be downloaded from here.

Supplementary material


Arne Elofsson
Last modified: Tue Oct 8 14:33:45 CEST 2002