Wit!P uses "swiles" for the compact definition of molecular substructures, e.g. in the -atom matches and -atom fragment restrictors in atomselections, in the definition of automatic force field atomtype assignment rules and in the definition of atom and bond parameters of the MPEOE net atomic charge assignment method.
"swiles" is a linear chemical structure notation derived from SMILES (Simplified Molecular Input Line Entry System), developed by D. Weininger [1]. "swiles" differs from SMILES in some details, e.g. the "m" in "swiles" is upside down.
(1) Atoms. Atoms are represented by their atomic symbols enclosed in square brackets ([,]). The brackets may be omitted for neutral atoms of the subset of common elements, C,H,N,O,S,P,Cl,Br,F, and I. The symbols "R" and "X" may be used to specify "any atom", and "any except H", respectively. Aromatic atoms are specified by lower case atomic symbols (e.g. c,n,x). Atoms with formal charges are specified inside square brackets by their atomic symbol, followed by one or more of the symbols + or -, followed by an optional digit (e.g. [N+], [Ca-2], [Ca--]).
Examples (including some invalid swiles):
H      hydrogen
H+     invalid: charged symbols must be in brackets
[H+]   proton
[N+]   charged nitrogen
Si     invalid: elements other than C,H,N,O,S,P,Cl,Br,F, and I must be in brackets
[Si]   silicon
[n+]   aromatic nitrogen, charged
(2) Bonds. Bonds are represented by the symbols "-" (single), "=" (double), "#" (triple), ":" (aromatic) and "~" (any, not allowed in hard swiles). Single and aromatic bond symbols may usually be omitted: unspecified bond types are assumed to be aromatic for bonds between two aromatic atoms (lower case atom symbol) or single for all other bonds.
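The atom rules in (1) and the default bond types in (2) can be sketched in a few lines of Python. This is only an illustration, not Wit!P code: the names `is_valid_atom` and `default_bond` are hypothetical, and the sketch does not check whether a bracketed symbol is a real element or whether an aromatic form is chemically meaningful.

```python
import re

# Elements that may be written without brackets, as listed in (1).
COMMON_ELEMENTS = {"C", "H", "N", "O", "S", "P", "Cl", "Br", "F", "I"}
WILDCARDS = {"R", "X"}  # "any atom" and "any except H"

# Bracketed atom: symbol, then optionally one or more + or - signs
# followed by an optional digit, e.g. [N+], [Ca-2], [Ca--].
BRACKETED = re.compile(r"\[([A-Za-z][a-z]?)(?:(\++|-+)(\d)?)?\]")


def is_valid_atom(token: str) -> bool:
    """Return True if `token` is an acceptable single-atom swiles (sketch)."""
    if BRACKETED.fullmatch(token):
        return True
    if token in COMMON_ELEMENTS or token in WILDCARDS:
        return True
    # Aromatic atoms: a single lower-case letter such as c, n or x.
    return (
        len(token) == 1
        and token.islower()
        and token.upper() in (COMMON_ELEMENTS | WILDCARDS)
    )


def default_bond(atom_a: str, atom_b: str) -> str:
    """Bond type assumed when no bond symbol is written between two atoms."""
    first_a = atom_a.strip("[]")[0]
    first_b = atom_b.strip("[]")[0]
    # Aromatic only if both atoms are written in lower case, else single.
    return ":" if first_a.islower() and first_b.islower() else "-"


for token in ["H", "H+", "[H+]", "[N+]", "Si", "[Si]", "[n+]"]:
    print(token, "valid" if is_valid_atom(token) else "invalid")
```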
(3) Branches. Branches are specified by enclosures in parentheses.
The symbol for the connecting bond is specified as the first symbol inside
the opening parenthesis (if omitted, the same rules for default bond types
apply as for all other bonds). Of course, branches may be nested (a branch
within a branch) and stacked (several branches with a common root atom).
The following are valid swiles:
CCOH
CC(C)C(=O)[O-]
CCN(CC)CC
NC(H)(C(CC)C)C=O
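As a rough illustration of how nested and stacked branches resolve into bonds, the following Python sketch reads an acyclic swiles with a stack of branch root atoms. The function name `parse_acyclic` is hypothetical, ring closure digits are not handled, and the input is assumed to be well formed.

```python
def parse_acyclic(swiles: str):
    """Return (atoms, bonds) for an acyclic swiles; bonds are (a, b, symbol)."""
    atoms = []      # atom symbols in left-to-right order
    bonds = []      # (index_a, index_b, bond_symbol)
    roots = []      # root atoms of the branches that are currently open
    prev = None     # atom the next atom will be bonded to
    pending = None  # explicit bond symbol waiting for the next atom
    i = 0
    while i < len(swiles):
        ch = swiles[i]
        if ch == "(":
            roots.append(prev)        # remember the root, then enter the branch
        elif ch == ")":
            prev = roots.pop()        # return to the root atom of the branch
        elif ch in "-=#:~":
            pending = ch
        else:
            if ch == "[":             # bracketed atom, e.g. [O-]
                end = swiles.index("]", i)
                symbol = swiles[i + 1:end]
                i = end
            elif swiles[i:i + 2] in ("Cl", "Br"):
                symbol = swiles[i:i + 2]
                i += 1
            else:
                symbol = ch
            atoms.append(symbol)
            here = len(atoms) - 1
            if prev is not None:
                if pending is None:
                    # Default: aromatic between two lower-case atoms, else single.
                    both_aromatic = atoms[prev][0].islower() and symbol[0].islower()
                    pending = ":" if both_aromatic else "-"
                bonds.append((prev, here, pending))
            prev = here
            pending = None
        i += 1
    return atoms, bonds


atoms, bonds = parse_acyclic("CC(C)C(=O)[O-]")
print(atoms)
print(bonds)
```

In `NC(H)(C(CC)C)C=O` the second atom is the common root of two stacked branches, so it ends up with four bonds in the returned list.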
(4) Ring closure bonds. The elements that have been introduced so far (atoms, bonds and branches) are sufficient for the description of acyclic structures. For cyclic structures, a notation for ring closure bonds is needed. To derive a swiles notation for a cyclic structure, one bond in each ring is broken, leaving an acyclic structure for which the swiles can easily be generated using the elements introduced in (1) - (3). To add the ring closure bonds, append a single digit (1, 2, 3, ..., 9, 0) to the first atom of the bond (left to right order in the swiles) and a bond symbol (-, =, #, :, ~) followed by the same digit to the second atom of the bond. A single atom may carry several ring closure markers. Once a ring is closed (i.e. the ring closure bond has been added to the second atom of the bond) the corresponding ring closure digit may be reused, which makes it possible to specify cyclic structures with more than 10 rings. The default bond type rules apply to ring closure bonds, making it possible to omit the bond type symbol in most cases.
Example: LSD, cyclic structure with four independent rings.
Removal of four ring bonds (marked in red in the structural formula above) leaves an acyclic structure with the swiles
NC=CCCN(C)CC(C(=O)N(CC)CC)C=Ccccccc
After insertion of the ring closure bonds, this is transformed into
N1C=C2CC3N(C)CC(C(=O)N(CC)CC)C=C-3c4cccc-1c-2:4
Application of the default bond type convention gives the final swiles:
N1C=C2CC3N(C)CC(C(=O)N(CC)CC)C=C3c4cccc1c24
Note: since the ring bond labeled 4 is opened only after ring bond 3 is closed, the label 3 could have been "reused", leading to the equivalent swiles
N1C=C2CC3N(C)CC(C(=O)N(CC)CC)C=C3c3cccc1c23
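The pairing of ring closure digits, including reuse of a digit after its ring is closed, can be sketched as follows. This is an illustrative Python snippet, not Wit!P code: the function name `ring_closures` is hypothetical, and the four-label LSD string used for comparison is an assumption, written out by replacing the reused label 3 with 4 as the note describes.

```python
def ring_closures(swiles: str) -> list[tuple[int, int, str]]:
    """Return ring closure bonds as (first_atom, second_atom, bond_symbol)."""
    atoms = []        # atom symbols in left-to-right order
    open_rings = {}   # ring digit -> index of the atom that opened it
    pending = None    # bond symbol seen since the last atom or digit
    closures = []
    i = 0
    while i < len(swiles):
        ch = swiles[i]
        if ch == "[":                       # bracketed atom, e.g. [N+]
            end = swiles.index("]", i)
            atoms.append(swiles[i + 1:end])
            pending = None
            i = end + 1
            continue
        if ch.isalpha():
            two = swiles[i:i + 2]
            symbol = two if two in ("Cl", "Br") else ch
            atoms.append(symbol)
            pending = None
            i += len(symbol)
            continue
        if ch in "-=#:~":
            pending = ch
        elif ch.isdigit():
            here = len(atoms) - 1
            if ch in open_rings:
                # Second occurrence: close the ring and free the digit for reuse.
                first = open_rings.pop(ch)
                if pending is None:
                    aromatic = atoms[first][0].islower() and atoms[here][0].islower()
                    pending = ":" if aromatic else "-"
                closures.append((first, here, pending))
            else:
                open_rings[ch] = here       # first occurrence: open the ring
            pending = None
        i += 1                              # parentheses are skipped here
    return closures


reused = "N1C=C2CC3N(C)CC(C(=O)N(CC)CC)C=C3c3cccc1c23"
four_labels = "N1C=C2CC3N(C)CC(C(=O)N(CC)CC)C=C3c4cccc1c24"  # assumed variant
print(ring_closures(reused) == ring_closures(four_labels))
```

Because ring closures are paired purely by the order in which atoms appear, branches can be ignored when extracting them, which keeps the sketch short.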
A hard swiles matches a molecular substructure, if (and only if) there is a 1-1 correspondence between atoms in the substructure and atoms in the swiles such that:
Exceptions: R (X) atoms in the
swiles may correspond to any atom (any non-hydrogen atom) in the substructure.
R and X match substructure atoms of any charge, except if a formal charge
is specified for the swiles atom, in which case the swiles atom matches
only substructure atoms of the same formal charge. R and X swiles atoms
may match aromatic substructure atoms. Aromatic r and x swiles atoms are
similar to R and X, except that they match only aromatic substructure atoms.
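The wildcard rules above can be sketched as a small predicate. This Python snippet is only an illustration under stated assumptions: the name `wildcard_matches` and its argument layout are hypothetical, and a digit after the sign is assumed to give the magnitude of the charge (so `[X-2]` and `[X--]` both mean a charge of -2).

```python
import re

# Plain wildcard (R, X, r, x) or bracketed wildcard with an optional charge.
WILDCARD = re.compile(r"([RXrx])|\[([RXrx])(?:(\++|-+)(\d)?)?\]")


def wildcard_matches(pattern: str, element: str, charge: int, aromatic: bool) -> bool:
    """Test one substructure atom against an R/X/r/x swiles atom (sketch)."""
    m = WILDCARD.fullmatch(pattern)
    if m is None:
        raise ValueError(f"not a wildcard swiles atom: {pattern!r}")
    symbol = m.group(1) or m.group(2)
    signs, digit = m.group(3), m.group(4)

    # X and x match any atom except hydrogen; R and r match any atom.
    if symbol in "Xx" and element == "H":
        return False
    # Lower-case r and x match only aromatic atoms; R and X match either kind.
    if symbol.islower() and not aromatic:
        return False
    # A formal charge constrains the match only if the swiles atom specifies one.
    if signs:
        magnitude = int(digit) if digit else len(signs)
        required = magnitude if signs[0] == "+" else -magnitude
        return charge == required
    return True


print(wildcard_matches("X", "H", 0, False))
print(wildcard_matches("[R+]", "N", 1, False))
```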
Exception: substructure atoms
corresponding to R or X atoms in the swiles may have extra bonds which
are not single, and aromatic substructure atoms may have extra aromatic
bonds not mapped by a swiles bond.
A soft swiles matches a molecular substructure, if (and only if) there is a 1-1 correspondence between atoms in the substructure and atoms in the swiles such that:
Exception: R (X) atoms in the
swiles may correspond to any atom (any non-hydrogen atom) in the substructure.
A substructure swiles matches a molecular substructure, if (and only if) there is a 1-1 correspondence between atoms in the substructure and atoms in the swiles such that:
Exceptions: R (X) atoms in
the swiles may correspond to any atom (any non-hydrogen atom) in the
substructure.
1. D. Weininger:
SMILES, a Chemical Language and Information System. 1. Introduction
to Methodology and Encoding Rules.
J. Chem. Inf. Comput. Sci. 28: 31-36, 1988.