INSTRUCTIONS

Please gunzip and untar the file 'model_assessment.tar.gz'. Eight executable
files and three library files will be extracted.

The following three library files are required in the local execution
directory:

  - 'GA341-NUMAA.pdf'
  - 'pair.de'
  - 'surf.de'

The programs must be run in the following order:

  1.- Any of these first, in any order:
        ene_pair_L
        ene_surf_L
        get_compactness_L
        get_length
        get_seq_ide_from_ali
  2.- Then (this program requires the output of the previous programs):
        get_zscore_L
  3.- Then:
        get_ga341
  4.- Finally:
        get_pG

How this software works:

The GA341 score is a function of three properties of the model: the combined
statistical potential z-score, the compactness, and the percentage sequence
identity of the alignment used to build the model.

The compactness is calculated directly from the 3D model (PDB file) using the
program 'get_compactness' (listed above as 'get_compactness_L'). The
compactness always has a value between 0.0 and 1.0. This program is run as
follows:

    get_compactness pdbfilename [protein_chain ID]

The run generates an output file called 'pdbfilename.compactness'.

The program 'get_seq_ide_from_ali' takes as input an alignment in PIR format
(the same format used by MODELLER) and computes the percentage sequence
identity. NOTE: this value must be expressed as a fraction between 0.0 and 1.0
and NOT as a percentage between 0.0 and 100.0. This program reports the value
in the correct range, but the value can also be obtained with any external
program of your preference.

Very important: if external software is used to obtain the percentage sequence
identity from the alignment, a file called 'pdbfilename.seq_ide' must be
created. This file must contain the percentage sequence identity (between 0.0
and 1.0). The program 'get_seq_ide_from_ali' creates this file automatically.
This program is run as follows:

    get_seq_ide_from_ali alignment_file_in_PIR_format

The alignment file in PIR format should be named 'pdbfilename.ali'.
In that case, the run generates an output file called 'pdbfilename.seq_ide'.
If the alignment file has a different name, the user script must create a file
with this name after running the program.

The programs 'ene_pair_L' and 'ene_surf_L' use statistical potentials for
pairwise distance-dependent energies ('pair.de') and solvent accessibility
energies ('surf.de') to calculate the total energy of the model and of 200
random models with the same residue composition. These programs must be
invoked as follows:

    ene_pair_L ./pair.de pdbfilename [protein_chain ID]

and

    ene_surf_L ./surf.de pdbfilename [protein_chain ID]

Running 'ene_pair_L' generates two output files:

  'pdbfilename.ene_pair.native'
      (the total pairwise energy of the protein)
  'pdbfilename.ene_pair.random'
      (the total pairwise energies of 200 random proteins)

Running 'ene_surf_L' generates two output files:

  'pdbfilename.ene_surf.native'
      (the total solvent accessibility energy of the protein)
  'pdbfilename.ene_surf.random'
      (the total solvent accessibility energies of 200 random proteins)

Once these programs have been run, the combined statistical potential z-score
can be calculated by using the program 'get_zscore_L'.
This program is run as follows:

    get_zscore_L pdbfilename [protein_chain ID]

Running this program generates three output files:

  'pdbfilename.pair.zscore'
      (the pairwise distance-dependent z-score)
  'pdbfilename.surf.zscore'
      (the solvent accessibility z-score)
  'pdbfilename.zscore'
      (the combined z-score)

At this point, the following should be available:

  1.- compactness of the model, in the range [0.0 : 1.0]
  2.- combined energy z-score of the model, in the range [-inf : +inf]
  3.- percentage sequence identity of the target-template alignment used to
      build the model, in the range [0.0 : 1.0]

The model assessment score called GA341, which is a function of these three
variables, can now be calculated by using the program 'get_ga341'. This
program is run as follows:

    get_ga341 pdbfilename

Running this program generates an output file called 'pdbfilename.ga341',
which contains the GA341 score in the range [0.0 : 1.0]. A score near 0.0
represents a bad or incorrect model, and a score near 1.0 represents a good or
correct model.

Then the 'get_pG' program can be run. It uses the GA341 score and the model
length to obtain the conditional probability that the model is good or
correct. This program needs a file called 'pdbfilename.length', which can be
generated by running the program 'get_length'. This program is run as follows:

    get_length pdbfilename

The 'get_pG' program takes as input the PDFs (probability density functions)
provided in the file 'GA341-NUMAA.pdf'.

Currently, a pG of 0.7 or higher is used to classify a model as good or
correct. A pG lower than 0.7 classifies the model as bad or incorrect.
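
The run order and the file conventions described above can be sketched as a
small Python driver. This is a minimal sketch under stated assumptions, not
part of the distributed package. The function names ('build_commands',
'write_seq_ide', 'classify_pg', 'run_pipeline') are hypothetical. The argument
passed to 'get_pG' is assumed to be the PDB file name, because this document
does not give its command line. The number format written to
'pdbfilename.seq_ide' is also an assumption. The executables are assumed to be
reachable through PATH.

```python
import subprocess
from pathlib import Path

# Assumption: the run-order list names this program 'get_compactness_L', while
# the usage text calls it 'get_compactness'. Adjust to match the extracted file.
COMPACTNESS_PROGRAM = "get_compactness_L"

# Cutoff stated in this document: pG >= 0.7 means a good or correct model.
PG_THRESHOLD = 0.7


def build_commands(pdb, chain=None):
    """Return the program invocations in the order this document requires."""
    chain_arg = [chain] if chain else []
    return [
        # Step 1: these five programs may run in any order.
        ["ene_pair_L", "./pair.de", pdb] + chain_arg,
        ["ene_surf_L", "./surf.de", pdb] + chain_arg,
        [COMPACTNESS_PROGRAM, pdb] + chain_arg,
        ["get_length", pdb],
        ["get_seq_ide_from_ali", pdb + ".ali"],
        # Step 2: needs the output of the programs above.
        ["get_zscore_L", pdb] + chain_arg,
        # Step 3: combines z-score, compactness and sequence identity.
        ["get_ga341", pdb],
        # Step 4: ASSUMED argument, the document gives no usage line for get_pG.
        ["get_pG", pdb],
    ]


def write_seq_ide(pdb, identity, directory="."):
    """Create 'pdbfilename.seq_ide' when identity comes from an external tool."""
    if not 0.0 <= identity <= 1.0:
        raise ValueError("sequence identity must be a fraction in [0.0, 1.0]")
    path = Path(directory) / (pdb + ".seq_ide")
    # Assumed format: a single number on one line.
    path.write_text(f"{identity:.4f}\n")
    return path


def classify_pg(pg):
    """Apply the 0.7 cutoff described in this document."""
    return "good" if pg >= PG_THRESHOLD else "bad"


def run_pipeline(pdb, chain=None):
    """Run every step in order and stop at the first failing program."""
    for command in build_commands(pdb, chain):
        subprocess.run(command, check=True)
```

For example, 'run_pipeline("model.pdb", "A")' would run all eight programs on
chain A of a hypothetical file 'model.pdb', provided that 'model.pdb.ali' and
the three library files are present in the current directory. If the sequence
identity comes from an external tool, 'write_seq_ide' can create the required
file, and the 'get_seq_ide_from_ali' entry can then be dropped from the
command list.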