Requirements
============

To encode protein structures into matrices, you need the DSSP program
(http://swift.cmbi.ru.nl/gv/dssp/) to assign the secondary structure of
proteins, and the NumPy package for Python
(http://www.scipy.org/install.html) to calculate the vectors and
interaxial angles.

Installation of MOMA
====================

Linux

1. First, download the DSSP binary file from
   "http://swift.cmbi.ru.nl/gv/dssp/".

2. Then, open a terminal and create a symbolic link to DSSP, for example:

    $ ln -s ~/programs/dssp2.0.4/mkdssp /usr/bin/dssp

   or copy it directly to /usr/local/bin or /usr/bin/dssp with the cp
   command.

3. Next, download the latest MOMA version (MOMA_1.1.tar.gz).

4. Then, decompress and extract the archive (MOMA_1.1.tar.gz):

    $ tar -xzvf MOMA_1.1.tar.gz

5. Finally, open the "MOMA" folder and run the "make" command to compile
   the program that compares matrices:

    $ cd MOMA/
    $ make

Mac OS X

1. To compile DSSP on OS X, you should have the latest version of Apple's
   Xcode and its "Command Line Tools" on your system. Also make sure that
   "Homebrew" is installed
   (https://github.com/Homebrew/homebrew/tree/master/share/doc/homebrew#readme).
   Next, install the boost libraries using "Homebrew":

    $ brew install boost

2. Obtain the DSSP source code from
   "ftp://ftp.cmbi.ru.nl/pub/molbio/software/dssp-2/". We use the 2.0.4
   version.

3. Decompress the source code and change to its directory:

    $ tar zxvf dssp-2.0.4.tgz
    $ cd dssp-2.0.4

4. Run the "make" command in the dssp folder. This first attempt should
   fail with errors:

    $ make

5. Clean up the files left over from the previous make:

    $ make clean

6. There is now a new file, "make.config", in the current directory. Edit
   this file with a text editor so that it contains the lines:

    # Set local options for make here
    BOOST_LIB_SUFFIX = -mt
    BOOST_LIB_DIR = /usr/local/lib
    BOOST_INC_DIR = /usr/local/include/boost

7. Then, edit the file "makefile" and delete the word -static. Save the
   two files and recompile with:

    $ make

8. Next, create a symbolic link to the new mkdssp executable in the
   current directory, for example:

    $ ln -s ~/programs/dssp-2.0.4/mkdssp /usr/bin/dssp

9. Next, download the latest MOMA version (MOMA_1.1.tar.gz).

10. Then, decompress and extract the archive (MOMA_1.1.tar.gz):

    $ tar -xzvf MOMA_1.1.tar.gz

11. Finally, open the "MOMA" folder and run the "make" command to compile
    the program that compares matrices:

    $ cd MOMA/
    $ make

Usage
=====

1. Create a matrix from a protein structure (in PDB format):

    $ python tableauSSE.py examples/1BL0.pdb

2. Select a protein chain to encode in a matrix of secondary structure
   elements:

    $ python tableauSSE.py -c A examples/1BL0.pdb

   By default, the output matrix is written to a ".out" file, for example:
   "1bl0-20.out"

3. Define the distance cutoff in the matrix of secondary structure
   elements:

    $ python tableauSSE.py -d 25 examples/1BL0.pdb

4. Encode the PDB files within a folder into matrices:

    $ python tableauSSE.py -l examples/PDB/

   This command creates a matrix for each PDB file within the folder and
   creates a library of matrices outside the folder (named "PDB-20.out"
   if no other parameters are used).

5. Run the "compare" command on a pair of matrices to align them with the
   default parameters (g1:-4, g2:-4, C:45):

    $ ./compare examples/1bl0-20.out examples/1aih-20.out

6. Specify a combination of gap penalties (g1, g2) and C constant to align
   the matrices of secondary structure elements:

    $ ./compare examples/1bl0-20.out examples/1aih-20.out -5 -5 45

7. Compare a query matrix against a library of matrices:

    $ ./compare examples/1bl0-20.out examples/SCOP40-20.out 20

   This command shows a ranking of the top 20 results according to
   relative similarity (scores sorted in descending order). You can change
   this value to see more hits.

8. Superpose two PDB files with MOMA.

   Run with default parameters:

    $ python MOMA.py --cq A --ct A examples/1BL0.pdb examples/1AIH.pdb

   Further examples with explicit parameters:

    $ python MOMA.py --cq A --ct A --g1 -5 --g2 -5 -C 45 -D 25 examples/1BL0.pdb examples/1AIH.pdb
    $ python MOMA.py --cq A --ct A --g1 -4 --g2 -4 -C 90 examples/PDB/2BBM.pdb examples/PDB/1CFC.pdb
    $ python MOMA.py --cq A --ct A --g1 -5 --g2 -5 -C 45 examples/PDB/3L6D.pdb examples/PDB/2UYY.pdb
    $ python MOMA.py --cq A --ct A --g1 -5 --g2 -5 -C 45 examples/PDB/2QX5.pdb examples/PDB/2PM7.pdb
    $ python MOMA.py --cq A --ct A --g1 -10 --g2 -15 -C 45 examples/1CDG.pdb examples/1TIM.pdb

9. View the structural alignment with PyMOL:

    $ cd examples/1CDGA_1TIMA/
    $ pymol 1CDGA_1TIMA.py

Interpreting Output
===================

Comparing a pair of structures or matrices (log file format reported by MOMA.py or output of "compare" program)
---------------------------------------------------------------------------------------------------------------

    GA: -5 LA: -5 C: 45

    Query : 1 AA---AAAAA 7
    Target: 2 AABBAAAAAA 11

    Alignment Tableau (Query-Target)
    1BL0|A - 1AIH|A

    A1A2 3.9 2.4 4.7 0 A2A3 2.9 2.8 40 0 A3A7 11 8 A4A8 1.1 3.0 3.8 25 A5A9 2.9 1.6 57 22 A6A10 1.8 8 38 88 A7A11

    1BL0|A-1AIH|A score: 7.6 pval: 3.2e-07 Sr: 34.4 Cr: 68.8 le45: 9 gt45: 2 subn: 7 tn: 12 qn: 7 SAS_ss: 8.9

where:

    GA      Gap-opening penalty 1
    LA      Gap-opening penalty 2
    C       C constant
    score   Raw score
    pval    P-value of the reported submatrix. This value is generated
            with a binomial test that considers the number of aligned
            pairs of secondary structure elements (SSE) whose angular
            difference is less than 45 degrees and the number whose
            angular difference is more than 45 degrees. The p-value is
            significant if the majority of the secondary structure
            elements aligned between two protein structures have an
            angular difference below 45 degrees (such structures likely
            share the same fold assignment).
    Sr      Relative similarity score
    Cr      Relative overlap score
    le45    Number of aligned SSE pairs that have an angular difference
            of less than 45 degrees
    gt45    Number of aligned SSE pairs that have an angular difference
            of more than 45 degrees
    subn    Number of SSE matches found in the alignment of SSE strings
            (this value gives the size of the submatrix; it is reported
            as "sn" in the database output)
    qn      Number of SSE in the query protein
    tn      Number of SSE in the target protein
    SAS_ss  RMSD of the distance differences between the pairs of
            secondary structure elements aligned in the submatrix,
            divided by the total number of aligned pairs and multiplied
            by 100

Symbols on the main diagonal of the submatrix represent the pairs of
secondary structure elements aligned by MOMA. The values below and above
the main diagonal correspond to the angular differences and distances
reported for the aligned pairs of secondary structure elements.

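The summary line at the end of the pairwise output is a sequence of
"key: value" fields, so it can be read into a dictionary for further
processing. The following is a minimal sketch, not part of MOMA: the
function name parse_summary is hypothetical, and it assumes that the line
has exactly the layout of the example above (a pair label followed by
whitespace-separated "key: value" fields with numeric values).

```python
import re

# Summary line copied from the example output above.
SUMMARY = ("1BL0|A-1AIH|A score: 7.6 pval: 3.2e-07 Sr: 34.4 Cr: 68.8 "
           "le45: 9 gt45: 2 subn: 7 tn: 12 qn: 7 SAS_ss: 8.9")


def parse_summary(line):
    """Return (pair_label, fields) for one summary line.

    Hypothetical helper: assumes the first whitespace-separated token is
    the pair label and every following field looks like "key: number".
    """
    label, _, rest = line.strip().partition(" ")
    fields = {key: float(value)
              for key, value in re.findall(r"(\S+):\s*(\S+)", rest)}
    return label, fields


label, fields = parse_summary(SUMMARY)

# Fraction of aligned SSE pairs whose angular difference is below 45
# degrees, computed from the two counts that feed the binomial test.
aligned_pairs = fields["le45"] + fields["gt45"]
fraction_le45 = fields["le45"] / aligned_pairs

if __name__ == "__main__":
    print(label, fields["pval"], round(fraction_le45, 3))
```

The fraction computed at the end is only a quick indicator of how many
aligned pairs fall below the 45 degree threshold; it does not reproduce
the p-value that MOMA reports.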
Comparing a query matrix against a database of matrices (output of the "compare" program)
-----------------------------------------------------------------------------------------

    query  chain-q  target   chain-t  score  Sr     Cr     SAS_ss  sn  nq  nt
    1BL0   A        d1bl0a2  A        5.96   70.14  99.36   2.85   4   7   4
    1BL0   A        d1d5ya2  A        5.24   61.61  87.29   3.13   4   7   4
    1BL0   A        d2uubm1  M        4.61   54.24  76.85  12.55   6   7   6
    1BL0   A        d1vz0a1  A        6.10   50.82  55.44   5.05   6   7   7
    1BL0   A        d1exra_  A        4.35   48.33  62.14   7.97   6   7   7
    1BL0   A        d2iw5b1  B        4.11   48.33  68.46   9.12   4   7   4
    1BL0   A        d1x40a1  A        5.07   48.27  50.69   8.52   5   7   5
    1BL0   A        d1rr7a_  A        3.59   47.91  89.84   2.05   5   7   5
    1BL0   A        d2ao9a1  A        4.73   47.31  52.56  13.66   5   7   7
    1BL0   A        d3br0a1  A        3.48   46.46  87.11   6.58   4   7   4
    1BL0   A        d2i10a1  A        4.18   46.46  59.73  22.23   5   7   5
    1BL0   A        d2e1fa_  A        5.94   45.67  53.97  17.34   6   7   6
    1BL0   A        d2id6a1  A        3.43   45.67  85.63   7.83   4   7   4
    1BL0   A        d2hkua1  A        3.42   45.55  85.41   5.60   4   7   4
    1BL0   A        d1u9la_  A        4.78   45.52  47.79  12.55   5   7   5
    1BL0   A        d1y9qa1  A        5.28   44.02  48.02  19.10   6   7   6
    1BL0   A        d2cfoa2  A        6.98   43.60  63.42   3.63   7   7   10
    1BL0   A        d2a6ca1  A        3.68   43.31  61.36  17.67   4   7   4
    1BL0   A        d1x2ia1  A        4.33   43.30  48.11  18.29   5   7   5
    1BL0   A        d2d6ya1  A        3.68   43.28  61.31  17.39   4   7   4
    1BL0   A        d1h8ba_  A        3.64   42.87  60.73  10.77   4   7   4

where:

    score   Raw score obtained from the Gaussian function
    Sr      Relative similarity score
    Cr      Relative overlap score
    SAS_ss  RMSD of the distance differences between the pairs of
            secondary structure elements aligned in the submatrix,
            divided by the total number of aligned pairs and multiplied
            by 100
    sn      Size of the submatrix
    nq      Number of SSE in the query protein
    nt      Number of SSE in the target protein

Additional information
======================

Input format
------------

The "compare" program takes as input matrices in a format similar to
Forsyth notation (used to record the positions of pieces on a chess
board). In this representation, the angles and distances between each
pair of secondary structure elements (SSE) on and below the main diagonal
of the matrix appear as follows, for example:

    >1bl0|A A / RT|108.533|12.8 A / LS|-112.593|11.3 RT|93.627|12.1 A / LE|-55.761|17.4 RD|54.286|18.9 RD|54.366|nn A / RD|70.815|nn PD|21.168|nn RT|129.820|nn RT|134.768|12.6 A / LS|-134.865|nn LS|-106.829|nn PE|-27.889|nn LS|-100.541|11.0 RD|84.589|11.7 A / LS|-110.490|nn LE|-59.156|nn PD|11.049|nn LE|-50.890|13.7 RD|76.157|13.2 PD|17.143|10.5 A /

where the fields separated by vertical bars correspond to a discrete
representation of the angle (Lesk encoding), the angle value (in degrees)
and the distance (in angstroms) between each pair of secondary structure
elements in a structure. If the distance between two SSEs is greater than
the fixed cutoff value, it is labelled 'nn' to indicate that the elements
are not in contact. Slash marks '/' indicate the separation between rows
of the matrix.
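The row and field structure described above can be unpacked with a few
lines of Python. This is a minimal sketch, not the parser used by
"compare": the function name parse_matrix is hypothetical, and it assumes
that a record starts with a ">" header token, that tokens are separated
by whitespace, and that the last token of each row is the SSE type on the
main diagonal, as in the example.

```python
# Example record copied from the text above.
EXAMPLE = (
    ">1bl0|A A / RT|108.533|12.8 A / "
    "LS|-112.593|11.3 RT|93.627|12.1 A / "
    "LE|-55.761|17.4 RD|54.286|18.9 RD|54.366|nn A / "
    "RD|70.815|nn PD|21.168|nn RT|129.820|nn RT|134.768|12.6 A / "
    "LS|-134.865|nn LS|-106.829|nn PE|-27.889|nn LS|-100.541|11.0 "
    "RD|84.589|11.7 A / "
    "LS|-110.490|nn LE|-59.156|nn PD|11.049|nn LE|-50.890|13.7 "
    "RD|76.157|13.2 PD|17.143|10.5 A /"
)


def parse_matrix(text):
    """Parse one record into (header, sse_types, cells).

    cells maps (row, column) below the diagonal to a tuple
    (code, angle_in_degrees, distance_or_None); 'nn' becomes None.
    Hypothetical helper written against the example layout only.
    """
    tokens = text.split()
    if not tokens or not tokens[0].startswith(">"):
        raise ValueError("expected a header token starting with '>'")
    header = tokens[0][1:]

    # Split the remaining tokens into rows at each '/' separator.
    rows, current = [], []
    for token in tokens[1:]:
        if token == "/":
            rows.append(current)
            current = []
        else:
            current.append(token)
    if current:
        rows.append(current)

    sse_types, cells = [], {}
    for i, row in enumerate(rows):
        *pairs, diagonal = row  # last token is the diagonal SSE type
        sse_types.append(diagonal)
        for j, cell in enumerate(pairs):
            code, angle, distance = cell.split("|")
            dist = None if distance == "nn" else float(distance)
            cells[(i, j)] = (code, float(angle), dist)
    return header, sse_types, cells


header, sse_types, cells = parse_matrix(EXAMPLE)
contacts = sum(1 for cell in cells.values() if cell[2] is not None)

if __name__ == "__main__":
    print(header, len(sse_types), "SSEs,", contacts, "pairs in contact")
```

Storing only the cells below the diagonal mirrors the input format, which
lists each SSE pair once.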