Palign, pmembr etc.

INTRODUCTION

This is a set of programs for sequence searches. They can perform both dynamic programming and heuristic searches. Unfortunately, they are not very well documented.

LICENSE

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License. With the exception that, if you use this program in any scientific work, you have to explicitly state that you have used PALIGN and cite the relevant publication (depending on what you have used PALIGN for). My publication list can be found at http://www.sbc.su.se/~arne/papers/.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program (in the file gpl.txt); if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-13

For support, please email arne@sbc.su.se and put PALIGN-SUPPORT in the subject header.

BINARIES:

The tar distribution includes binaries for Linux (Red Hat 7.2) in linux/p*. Note that these are compiled for quite large sequences, so you need about 700 MB of RAM plus swap space.

COMPILATION

You might have to define the environment variable ARCH (either in your shell or in the Makefile) and possibly edit the file $ARCH/flags.mk, which sets some flags specific to your computer (this is not needed if you are using g77). Then all you should need to do is type:

> make all

This will place the executables in $ARCH/ (note that precompiled binaries for Red Hat 7.2 exist in the linux/ subdirectory). Development is done using gcc and Intel's Fortran compiler; if you have that compiler, the code will be significantly faster (binaries in linux.ifc/).
However, porting to any platform should be quite trivial.

PMEMBR

To use Pmembr (Hedman et al., 2002) you also need tmHMM (available from Anders Krogh) or another membrane prediction program that can produce similar output. The tmHMM output must then be converted to our database format with the plptofast.pl program. To run pmembr you need to:

(a) compile the program;
(b) run tmHMM on your sequence and on your database;
(c) convert your database to our internal format with plptofast.pl (the database can also be downloaded from http://www.sbc.su.se/~arne/pmembr/sprot.seq);
(d) run PSI-BLAST against SwissProt with these parameters:
> blastpgp -j 5 -h 1e-5 -m 1 -e 99 -v 10000 -F F
(e) run the search with this command:
> pmembr2 -prof FILE.plp FILE.psi - DATABASE

If you want to run faster, you can use our heuristic search method (pmembrh) instead of pmembr2.

DIFFERENT PROGRAMS:

There are three groups of programs:
- palign, palign2 and palignh are for searches of globular proteins
- pmembr, pmembr2 and pmembrh are for searches against membrane proteins
- palignp and palignp2 are for profile-profile alignments

Each group contains two or three programs:
- palign, pmembr and palignp just align two sequences
- palign2, pmembr2 and palignp2 search a sequence (or profile) against a database of sequences
- palignh and pmembrh are heuristic versions of palign2 and pmembr2 that use a FASTA-like algorithm. They are not perfect, but they work and are quite fast.

USAGE:

Type palign -h for some help on the different flags that can be used.
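The heuristic programs palignh and pmembrh are described above only as FASTA-like. As a rough illustration of what that means, here is a minimal C sketch of the first stage of a FASTA-style search: identical k-tuples (words of length k) shared by the query and a database sequence are located through a lookup table and counted per diagonal, so that the expensive alignment can later be restricted to the most promising diagonal. This is not code from palignh or pmembrh; the function names (encode_ktup, best_diagonal), the restriction to upper-case one-letter codes and the limit k <= 3 are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALPHA 26 /* assumption: sequences use upper-case one-letter codes A-Z */

/* Encode the k letters starting at s as a base-26 integer;
   returns -1 if any letter falls outside A-Z. */
static int encode_ktup(const char *s, int k)
{
    int code = 0;
    for (int i = 0; i < k; i++) {
        int c = s[i] - 'A';
        if (c < 0 || c >= ALPHA)
            return -1;
        code = code * ALPHA + c;
    }
    return code;
}

/* Hypothetical helper: count identical k-tuple hits per diagonal between
   query q and database sequence d. Returns the hit count on the best
   diagonal (or -1 if memory allocation fails) and stores that diagonal's
   offset (query position minus database position) in *best_diag. */
int best_diagonal(const char *q, const char *d, int k, int *best_diag)
{
    assert(k >= 1 && k <= 3); /* keeps the lookup table small (26^3 entries) */
    int n = (int)strlen(q), m = (int)strlen(d);
    *best_diag = 0;
    if (n < k || m < k)
        return 0;

    int tsize = 1;
    for (int i = 0; i < k; i++)
        tsize *= ALPHA;

    int ndiag = n + m - 1; /* offsets i - j run from -(m-1) to n-1 */
    int *head = malloc((size_t)tsize * sizeof *head);
    int *next = malloc((size_t)m * sizeof *next);
    int *hits = calloc((size_t)ndiag, sizeof *hits);
    if (!head || !next || !hits) {
        free(head); free(next); free(hits);
        return -1;
    }

    /* Lookup table: head[code] is the last database position holding that
       k-tuple, next[j] the previous position holding the same k-tuple. */
    for (int t = 0; t < tsize; t++)
        head[t] = -1;
    for (int j = 0; j + k <= m; j++) {
        int code = encode_ktup(d + j, k);
        if (code >= 0) {
            next[j] = head[code];
            head[code] = j;
        }
    }

    /* Each query k-tuple adds one hit to the diagonal of every identical
       database k-tuple. */
    for (int i = 0; i + k <= n; i++) {
        int code = encode_ktup(q + i, k);
        if (code < 0)
            continue;
        for (int j = head[code]; j != -1; j = next[j])
            hits[i - j + (m - 1)]++;
    }

    /* Keep the first diagonal with the highest hit count. */
    int best = 0;
    for (int t = 0; t < ndiag; t++)
        if (hits[t] > best) {
            best = hits[t];
            *best_diag = t - (m - 1);
        }

    free(head); free(next); free(hits);
    return best;
}
```

For example, best_diagonal("MKVLAAGIV", "XXMKVLAAGIV", 2, &diag) returns 8 and sets diag to -2, because all eight 2-tuples of the query lie on the diagonal where query position 0 pairs with database position 2. A complete FASTA-style method would go on to rescore the best diagonals with a substitution matrix and run a banded dynamic-programming alignment around them.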
Palign can be used for many types of alignment problems. It is certainly not always the fastest program, but it is very flexible and can:
- Align two sequences and search a database with a sequence
- Align a sequence and a profile
- Include and combine secondary structure information in the alignments
- Read sequences, profiles etc. in many formats
- Perform profile-profile searches/alignments using different algorithms

EXAMPLES:

Alignments:
- To align two sequences (and print the alignments):
> palign -ali seq1.seq seq2.seq
- To align a sequence against a PDB file and print the coordinates of the model into the file foo.pdb:
> palign -pdb foo.pdb seq1.seq seq2.pdb
- To use secondary structure information in the alignments:
> palign -seq seq1.seq seq1.ss - -prof seq2.seq seq2.ss -
- To align two profiles using log-average scoring:
> palignp -logaver -prof prof1.psi - -seq prof2.psi -
- To use the ProfNet method and print the coordinates of the model into a PDB file (model_of_file1.pdb in the command below):
  - -scalepsib transforms the values in the log-odds based PSI-BLAST profile vector.
  - -winsize and -win should be set to 1, and -lambda to 2, to use the correct scoring algorithm.
  - Here the GO/GE (gap opening/gap extension) and shift values are set to 0.3, 0.03 and -0.3.
  - NB: the order of the arguments is very important.
> palignp -lambda 2 -scalepsib -winsize 1 -win 1 -go 0.3 -ge 0.03 -shift -0.3 -readnn profnet_nin40_nhid29_out1_epochs700.ann -pdbfile model_of_file1.pdb -prof file2.pdb file2.psi - -seq file1.psi -

Searches:
- To search a sequence against a database (a list of sequence files):
> palign2 seq1.seq list.txt
- To do a profile-profile search:
> palignp2 seq1.psi list-psi.txt

PLATFORMS:

The programs are normally developed on Linux using either g77 or Intel's compilers. Some of the programs have been tested (compiled) on Alpha/AIX/SGI/Sun4 machines; this is not supported at all, but should probably work.
You need to either use the split.awk script or have a compiler that (a) includes cpp preprocessing and (b) accepts lines longer than 72 characters. The rest should be more or less standard Fortran 77. Look in */flags.mk for ideas on how to compile. Note that the profile-profile comparisons are written in C.