I-Mutant

Last Update 29/09/04


 

I-Mutant: a tool for predicting protein stability upon mutation


Introduction

I-Mutant is a neural-network-based web server for the automatic prediction of protein stability changes upon single-site mutations. The tool was trained on a data set derived from ProTherm [1], presently the most comprehensive database of protein mutations. When trained and tested with a cross-validation procedure, I-Mutant correctly predicts whether a mutation stabilises or destabilises the protein structure in 80% of the cases (on the S1615 set, consisting of 1615 mutations). The web server also provides the free energy change predictions computed with the energy-based FOLD-X tool. By coupling the FOLD-X predictions with those of I-Mutant, and considering the reliability index of the latter, the joint method achieves an accuracy of 93% on one third of the database, making I-Mutant a valuable tool for protein design and mutation.


Results

The table below reports some parameters that score the performance of our method.

           Q2    P(+)  Q(+)  P(-)  Q(-)  C
I-Mutant   0.81  0.71  0.52  0.83  0.91  0.49


The overall accuracy Q2 is:

Q2=p/N

where p is the total number of correctly predicted mutations and N is the total number of mutations.
The correlation coefficient C is defined as:

C(s) = [p(s)n(s) - u(s)o(s)] / D


where D is the normalization factor

D = [(p(s)+u(s))(p(s)+o(s))(n(s)+u(s))(n(s)+o(s))]^(1/2)

for each class s (+ and -, for increasing and decreasing stability, respectively); p(s) and n(s) are the total numbers of correct predictions and correctly rejected assignments, respectively, and u(s) and o(s) are the numbers of under- and over-predictions.
The coverage for each class s is evaluated as:

Q(s)=p(s)/[ p(s)+u(s)]

where p(s) and u(s) are as defined above. The probability of correct predictions P(s) (or accuracy for s) is computed as:

P(s)=p(s) / [p(s) + o(s)]

where p(s) and o(s) are as previously defined; P(s) ranges from 0 to 1.
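As an illustration of how the indices above fit together, here is a minimal Python sketch that computes Q(s), P(s), C(s) and Q2 from the four counts of one class. The function name class_scores and the counts are hypothetical and chosen only for the example; they are not the I-Mutant results. The sketch does not guard against empty classes (zero denominators).

```python
from math import sqrt


def class_scores(p, n, u, o):
    """Return (Q, P, C) for one class s.

    p: correct predictions of s
    n: correctly rejected assignments of s
    u: under-predictions (members of s that were missed)
    o: over-predictions (assignments wrongly given to s)
    """
    coverage = p / (p + u)                           # Q(s)
    accuracy = p / (p + o)                           # P(s)
    d = sqrt((p + u) * (p + o) * (n + u) * (n + o))  # normalization factor D
    correlation = (p * n - u * o) / d                # C(s)
    return coverage, accuracy, correlation


# Hypothetical counts for the "+" class (illustration only).
p_plus, n_plus, u_plus, o_plus = 50, 180, 50, 20

q_plus, acc_plus, c_plus = class_scores(p_plus, n_plus, u_plus, o_plus)

# For the "-" class the roles swap: a correct "+" is a correct rejection
# of "-", and an over-prediction of "+" is an under-prediction of "-".
q_minus, acc_minus, c_minus = class_scores(n_plus, p_plus, o_plus, u_plus)

# Overall accuracy: all correct predictions over all predictions.
q2 = (p_plus + n_plus) / (p_plus + n_plus + u_plus + o_plus)

print(f"Q2={q2:.2f} P(+)={acc_plus:.2f} Q(+)={q_plus:.2f} "
      f"P(-)={acc_minus:.2f} Q(-)={q_minus:.2f} C={c_plus:.2f}")
```

With two classes, C(+) and C(-) coincide, which is consistent with the table reporting a single C value.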


Required Inputs

PDB code: PDB protein code [2]
Chain: Chain label. Default value: "_"
Position: PDB residue position
Temperature: Temperature in degrees Celsius [0-100]
pH: negative logarithm of H+ concentration [0-14]
FOLD-X: Post your query to FOLD-X Server
e-mail: Your e-mail address. The output of the program will be sent to this address
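The inputs listed above can be checked before a query is submitted. The following Python sketch is one possible way to do so; the class MutantQuery and its field names are hypothetical and are not part of the server, and the integer type for the position is an assumption. Only the ranges and the default chain label come from the list above.

```python
from dataclasses import dataclass


@dataclass
class MutantQuery:
    """Hypothetical container for one query (names are illustrative)."""
    pdb_code: str
    position: int           # PDB residue position (integer type is an assumption)
    temperature: float      # degrees Celsius, allowed range 0-100
    ph: float               # allowed range 0-14
    email: str
    chain: str = "_"        # default chain label given in the input list
    use_foldx: bool = False

    def __post_init__(self):
        # Reject values outside the ranges stated for the web form.
        if not 0 <= self.temperature <= 100:
            raise ValueError("temperature must be between 0 and 100 Celsius")
        if not 0 <= self.ph <= 14:
            raise ValueError("pH must be between 0 and 14")
        if not self.email:
            raise ValueError("an e-mail address is required")


# "1ABC" and the address are placeholders, not a real entry or account.
query = MutantQuery(pdb_code="1ABC", position=42, temperature=25.0,
                    ph=7.0, email="user@example.org")
print(query.chain)
```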

Outputs

The output consists of a table listing the sign of the predicted stability changes upon the 19 possible mutations for a given PDB position.
The RSA value (Relative Solvent Accessible Area) is calculated using the DSSP program [3]. It is obtained by dividing the accessible surface area computed by DSSP by the surface area of the corresponding amino acid [4].

The RI value (Reliability Index) is calculated from the output O of the neural network:

RI=20*abs(O-0.5)
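A short Python sketch of this formula follows. It assumes that the network output O lies between 0 and 1, which is not stated here; under that assumption RI runs from 0 (output at 0.5, least reliable) to 10 (output at 0 or 1, most reliable). The function name reliability_index is hypothetical.

```python
def reliability_index(output: float) -> float:
    """RI = 20 * |O - 0.5| for a network output O.

    Assumption: O is in [0, 1], so the result lies in [0, 10].
    """
    if not 0.0 <= output <= 1.0:
        raise ValueError("network output is assumed to lie in [0, 1]")
    return 20 * abs(output - 0.5)


# Outputs close to 0.5 give a low RI, outputs close to 0 or 1 a high RI.
for o in (0.5, 0.65, 0.9, 1.0):
    print(o, round(reliability_index(o), 2))
```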

If the FOLD-X option is selected, our program posts a query to the FOLD-X server [5] and sends by e-mail the values of DG (free energy variation) and DDG (change in free energy variation upon mutation), expressed in kcal/mol.

If FOLD-X does not answer because of server trouble, we return "data NA" instead of the DDG and DG values.

Errors may occur when PDB files contain broken chains or number their residues differently from what the user expects.





[1] Gromiha MM, An J, Kono H, Oobatake M, Uedaira H, Prabakaran P, Sarai A (2000). ProTherm, version 2.0: thermodynamic database for proteins and mutants. Nucleic Acids Res. 28, 283-285.

[2] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235-242.

[3] Kabsch W, Sander C (1983). Dictionary of protein secondary structure: pattern of hydrogen-bonded and geometrical features. Biopolymers. 22, 2577-2637.

[4] Chothia C (1976). The Nature of the Accessible and Buried Surfaces in Proteins. J. Mol. Biol. 105, 1-14.

[5] Guerois R, Nielsen JE, Serrano L (2002). Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. J. Mol. Biol. 320, 369-387.