wnprpc Example: water


Usage: water [options...] files...
 
Description: Produces a list of problematic water molecules for each
crystallographic PDB file in the 'files...' list. A water molecule is
considered 'potentially problematic' if

   - the closest distance to a protein hetero atom is > 3.6 Å or < 2.2 Å,
   - or the closest distance to a protein hetero atom is more than 0.3 Å
     shorter than the closest contact to a protein hetero atom in the
     asymmetric unit,
   - or the water molecule is in close contact (< 2.2 Å) with another
     water molecule.
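The three criteria can be written as one small predicate. This is a minimal sketch, not the script's own code: the name is_problematic and its three arguments (distance to the closest hetero atom in the crystal packing, the same distance restricted to the asymmetric unit, and distance to the closest other water oxygen, all in Å) are assumptions.

```python
# Hypothetical predicate for the 'potentially problematic' criteria.
# All distances are in Angstrom.
MIN_CONTACT = 2.2    # closer than this counts as a clash
MAX_CONTACT = 3.6    # farther than this: water has no hetero atom partner
PACKING_GAIN = 0.3   # allowed shortening of the contact by crystal packing


def is_problematic(d_any: float, d_asym: float, d_water: float) -> bool:
    """Return True if a water oxygen meets any of the three criteria."""
    # Criterion 1: closest hetero atom (any symmetry copy) too far or too close.
    if d_any > MAX_CONTACT or d_any < MIN_CONTACT:
        return True
    # Criterion 2: a symmetry mate is much closer than the best contact
    # inside the asymmetric unit.
    if d_asym - d_any > PACKING_GAIN:
        return True
    # Criterion 3: clash with another water oxygen.
    return d_water < MIN_CONTACT
```

For the first row of the sample output (2.52, 2.91, 3.15), the second criterion applies, because 2.91 - 2.52 = 0.39 Å exceeds 0.3 Å.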

Options:
  --version   show program's version number and exit
  -h, --help  show this help message and exit
  -a, --all   show all waters (default: show only problematic waters)
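The option list above matches what Python's standard optparse module generates: passing version= to OptionParser adds --version, and -h/--help is added by default. A minimal sketch of such a parser follows; build_parser is a hypothetical name and the version string is a placeholder, since the script's actual version is not given here.

```python
from optparse import OptionParser


def build_parser() -> OptionParser:
    """Build a parser with the options listed in the usage text."""
    parser = OptionParser(
        usage="%prog [options...] files...",
        version="%prog (version placeholder)",  # adds --version automatically
    )
    # -h/--help is provided by OptionParser itself.
    parser.add_option(
        "-a", "--all",
        action="store_true", dest="all", default=False,
        help="show all waters (default: show only problematic waters)",
    )
    return parser
```

For example, build_parser().parse_args(["-a", "6apr.pdb"]) yields options.all == True and the positional file list ["6apr.pdb"].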

Notes (line numbers refer to the script source):

Line 20:
from wnprpc import wnpRPCserver
The wnpRPCserver class is imported from the wnprpc Python module.

Lines 24-48:
def readFile(pdb):
Definition of a function that reads a file and returns its content as a string. If the file is gzip-compressed, it is decompressed on the fly. The 'pdb' argument can be an open file or a string; a string is interpreted as a filename (or as standard input, if the string is "-").
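A minimal Python 3 sketch of a helper with the behaviour described above. It is not the script's own readFile: the name read_file, detecting gzip data by its magic bytes (0x1f 0x8b) rather than by file extension, and decoding as Latin-1 are all assumptions.

```python
import gzip
import sys
from typing import IO, Union

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_file(pdb: Union[str, IO]) -> str:
    """Return the content of 'pdb' as a string, inflating gzip data."""
    if isinstance(pdb, str):
        if pdb == "-":
            data = sys.stdin.buffer.read()  # "-" means standard input
        else:
            with open(pdb, "rb") as fh:
                data = fh.read()
    else:
        data = pdb.read()  # an already-open file object
    if isinstance(data, str):
        return data  # text-mode file: nothing to decompress
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)  # inflate on the fly
    return data.decode("latin-1")  # assumption: PDB files are plain ASCII/Latin-1
```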

Lines 52-137:
def analyze(W,pdb,code):
Definition of the function that does the bulk of the work. The loadMolTypes method of the wnpRPCserver object W loads the content 'pdb' of a file read by the readFile function into a Wit!P session. The 'crystal' command is then sent to the Wit!P session using the cmd method of W, and an error message is issued if the command failed (s>0 in line 71).

Next, three atom sets are created in the Wit!P session: water (all oxygen atoms in water molecules), hetero (all hetero atoms, i.e. non-H, non-C atoms) and target (all hetero atoms that are not in the water set). The number of water oxygens is extracted from the response text generated by the commands that define the sets. If the water count is zero, an empty list is returned.

With the Wit!P session still in 'crystal' mode, the crystal packing is generated to ±0.75 unit cells in each direction around the center of the asymmetric unit, and the shortest distances between atoms in the water set (restricted to the asymmetric unit) and atoms in the target set are measured with a "measure autopair nearest" command. The response text from this command is converted into a pair of dictionaries (neighb[atomname]: name of the target atom closest to the water oxygen atomname; dist[atomname]: distance between atomname and neighb[atomname]). The list of water oxygen atom names is extracted from the neighb dictionary and sorted in ascending order of dist[atomname] (line 107). In the same way, dictionary pairs are generated for the shortest water (asymm. unit) / water distances (lines 111-119) and for the shortest water (asymm. unit) / target (asymm. unit) distances (lines 123-131).

Finally, the three dictionary pairs are converted into a list of 4-tuples, one for each water oxygen atom in the asymmetric unit: (atomname, (closest target atom, distance), (closest target atom in asymm. unit, distance), (closest water oxygen, distance)). This list is returned to the caller.
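The final merging step can be sketched in plain Python, independent of Wit!P. The name combine, the convention that each dictionary pair is passed as a (neighb, dist) tuple keyed by water oxygen name, and reporting a missing partner as ("-", inf) are assumptions, not details taken from the script.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[Dict[str, str], Dict[str, float]]  # (neighb, dist)
Contact = Tuple[str, float]
Entry = Tuple[str, Contact, Contact, Contact]


def _contact(neighb: Dict[str, str], dist: Dict[str, float], atom: str) -> Contact:
    # Look up one (neighbour, distance) pair; assumption: missing -> ("-", inf).
    return neighb.get(atom, "-"), dist.get(atom, math.inf)


def combine(packing: Pair, asym: Pair, water: Pair) -> List[Entry]:
    """Merge three dictionary pairs into one 4-tuple per water oxygen."""
    neighb, dist = packing
    # Water names come from the packing search, sorted by ascending distance.
    names = sorted(neighb, key=lambda atom: dist[atom])
    return [(atom,
             _contact(neighb, dist, atom),
             _contact(*asym, atom),
             _contact(*water, atom))
            for atom in names]
```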

Lines 141-157:
def distList(monitor):
Converts the output of "measure autopairs ..." into a pair of dictionaries. 'monitor' is a response text generated by the cmd method of a wnpRPCserver object. The distList function processes records with five fields, where the second and fourth fields are '--'. In these records, the first and third fields are atom names and the last field is a distance. The atom names are full names (including molecule name, symop, chain name and residue name, separated by '/'); a split-and-join operation removes the redundant molecule name.
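A sketch of a parser following these rules. The exact layout of the Wit!P response text is not shown here, so two points are assumptions: each record is one whitespace-separated line "atom1 -- atom2 -- distance", and the molecule name is the first '/'-separated component of a full atom name. The names dist_list and strip_molecule are hypothetical.

```python
from typing import Dict, Tuple


def strip_molecule(full_name: str) -> str:
    # Split on '/', drop the leading molecule name, join the remainder.
    return "/".join(full_name.split("/")[1:])


def dist_list(monitor: str) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Return (neighb, dist) dictionaries keyed by the first atom name."""
    neighb: Dict[str, str] = {}
    dist: Dict[str, float] = {}
    for line in monitor.splitlines():
        fields = line.split()
        # Only 5-field records with '--' in fields 2 and 4 are distance lines.
        if len(fields) != 5 or fields[1] != "--" or fields[3] != "--":
            continue
        atom = strip_molecule(fields[0])
        neighb[atom] = strip_molecule(fields[2])
        dist[atom] = float(fields[4])
    return neighb, dist
```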

Lines 161-200:
Execution of the script starts at line 161. After the command line is parsed with OptionParser (optparse module), a wnpRPCserver object W is created, with a Wit!P XML-RPC server listening on the first available port in the range 19910-19930. If W.server is None, the XML-RPC server could not be started, and the script terminates with an error message. The W.cmd method is then used to set a couple of "measure monitor" options in the Wit!P session, which the 'analyze' function defined in lines 52-137 needs in order to work properly. The script then loops through the filenames specified on the command line, reading and analyzing each file in turn. Problematic water molecules (all water molecules, if the --all option is used) are listed on standard output with the following seven items:
   - water oxygen atom name,
   - distance to the closest hetero atom (excl. water), name of that hetero atom,
   - distance to the closest hetero atom (excl. water) in the asymmetric unit, name of that hetero atom,
   - distance to the closest water oxygen atom, name of that water oxygen atom.
At the end of each loop iteration, the molecule is deleted from the Wit!P session.
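Each output line can be produced from one of the 4-tuples returned by analyze. The sketch below assumes that tuple layout; format_row is a hypothetical name, and the column widths were chosen to approximate the sample output, not taken from the script.

```python
from typing import Tuple

Contact = Tuple[str, float]  # (atom name, distance in Angstrom)


def format_row(entry: Tuple[str, Contact, Contact, Contact]) -> str:
    """Format one analyze() 4-tuple as the seven output items."""
    water, (hetero, d1), (hetero_asym, d2), (water_nb, d3) = entry
    # Each distance is printed before the name of the atom it refers to.
    return (f"{water:<16}{d1:6.2f}  {hetero:<24}{d2:6.2f}  "
            f"{hetero_asym:<24}{d3:6.2f}  {water_nb}")
```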


Sample output:

AW on camm7 836> ./water /db/pdb/6apr.pdb
Analysis of file /db/pdb/6apr.pdb (6apr)
number of water molecules: 222
A/E/HOH_591/O     2.52  C_912/E/GLY_101/O         2.91  A/E/SER_176/OG            3.15  C_912/E/HOH_501/O  
A/E/HOH_676/O     2.53  B_120/E/THR_184/OG1      19.07  A/E/LYS_108/NZ            3.42  C_022/E/HOH_537/O  
A/E/HOH_631/O     2.68  D_101/E/THR_184/OG1       4.12  A/E/ALA_237/O             2.89  A/E/HOH_638/O      
A/E/HOH_623/O     2.70  C_912/E/GLY_135/N         3.18  A/E/VAL_3/O               2.56  C_912/E/HOH_529/O  
A/E/HOH_542/O     2.73  C_912/E/VAL_136/O         5.02  A/E/GLY_2/N               2.73  A/E/HOH_623/O      
A/E/HOH_522/O     2.73  B_120/E/SER_233/OG        3.11  A/E/ALA_68/N              2.64  A/E/HOH_514/O      
A/E/HOH_630/O     2.76  B_129/E/LYS_108/NZ        3.13  A/E/THR_208/OG1           3.25  A/E/HOH_785/O      
A/E/HOH_624/O     2.84  B_120/E/ASN_229/N         4.34  A/E/ASN_61/O              2.62  A/E/HOH_683/O      
A/E/HOH_626/O     2.86  C_912/E/SER_145/OG        5.06  A/E/THR_177/OG1           2.77  A/E/HOH_695/O      
A/E/HOH_627/O     2.91  D_101/E/LEU_183/N         4.72  A/E/SER_207/OG            2.62  A/E/HOH_688/O      
A/E/HOH_538/O     2.91  C_922/E/SER_84/OG         4.04  A/E/GLY_283/N             2.71  A/E/HOH_547/O      
A/E/HOH_580/O     2.91  C_912/E/ASP_141/O         7.84  A/E/GLU_168/OE1           2.78  A/E/HOH_695/O      
A/E/HOH_514/O     2.92  B_120/E/SER_233/O         3.99  A/E/GLN_67/NE2            2.64  A/E/HOH_522/O      
A/E/HOH_585/O     3.01  B_120/E/ASP_214/N         6.02  A/E/ASN_61/ND2            3.32  A/E/HOH_791/O      
A/E/HOH_727/O     3.16  C_922/E/SER_74/OG         4.91  A/E/GLN_282/N             2.85  A/E/HOH_538/O      
A/E/HOH_792/O     3.21  D_101/E/THR_185/N         5.34  A/E/ALA_237/O             3.49  A/E/HOH_741/O      
A/E/HOH_537/O     3.32  D_101/E/SER_268/O         7.29  A/E/ARG_236/O             3.42  C_922/E/HOH_676/O  
A/E/HOH_811/O     3.58  B_120/E/SER_212/O         4.77  A/E/ASP_59/OD2            3.37  A/E/HOH_585/O      
A/E/HOH_598/O     3.58  B_129/E/GLY_27/O         12.27  A/E/ASN_229/ND2           5.89  B_129/E/HOH_784/O  
A/E/HOH_561/O     3.60  B_129/E/LYS_108/NZ        4.90  A/E/THR_206/O             2.59  A/E/HOH_826/O      
A/E/HOH_607/O     3.65  C_012/E/SER_268/N        11.59  A/E/GLN_67/OE1            4.28  B_120/E/HOH_537/O  
A/E/HOH_823/O     3.71  B_120/E/SER_212/OG        6.80  A/E/THR_46/O              4.25  B_120/E/HOH_787/O  
A/E/HOH_625/O     3.72  A/E/PRO_118/N             3.72  A/E/PRO_118/N             6.16  A/E/HOH_629/O      
A/E/HOH_672/O     4.03  A/E/VAL_315/O             4.03  A/E/VAL_315/O             4.33  A/E/HOH_739/O      
A/E/HOH_796/O     4.15  A/E/SER_113/N             4.15  A/E/SER_113/N             5.30  A/E/HOH_853/O      
A/E/HOH_695/O     4.34  C_912/E/ASP_141/O         5.50  A/E/LYS_178/NZ            2.77  A/E/HOH_626/O      
A/E/HOH_857/O     4.54  D_191/E/LYS_258/NZ       12.14  A/E/GLU_317/OE1           4.07  D_191/E/HOH_684/O  
A/E/HOH_812/O    11.11  D_291/E/ARG_192/NH1      26.49  A/E/ASN_265/OD1          14.18  C_012/E/HOH_609/O

A.Widmer, NIBR/CPC/CSG-SB