[Fsatom] Re: CML and macromolecules
Peter Murray-Rust
pm286@cam.ac.uk
Sun, 19 Oct 2003 21:43:14 +0100
At 21:16 19/10/2003 +0200, David wrote:
>Hi fsatoms,
Great to hear from you, David. I gather you were at the tutorial.
>I'm ready for the first discussion! How about residue information in
>CML? I ran babel on a pdb file and got only abbreviated atom names and
>coordinates.
In its original incarnation (ca. 1994, sic) CML had support for
macromolecules. The philosophy was based on PDB and SwissProt and included:
- <SEQUENCE> for the sequence
- <FEATURE> for all sequence/structure-based annotations
- the attributes "residue" and "atomType" on <atom>
Vestiges of these can still be found in CML V1.0, and I wrote PDB and
SwissProt readers.
However, at that stage (a) we needed to concentrate more on systematising
small molecules; (b) we thought that mmCIF-like approaches would come to
replace the PDB approach (BTW, I have been involved in the CIF effort, and
much of the CML philosophy steals from CIF); and (c) I would have had to maintain it.
mmCIF has gradually been introduced into macromolecular crystallography and
provides much more extensive support for things like:
- biological unit
- definitions of the chemical entities in the structure
- mapping between entities
- labelling of atoms, residues
mmCIF was the basis for the submission to the OMG Life Sciences program
(CML was the basis for the small-molecule part). We therefore felt that we
shouldn't muddy the waters by duplicating the mm effort.
However, I think the "PDB-like" approach is still strong, and I am happy
to extend CML to manage that aspect of macromolecules. I think it is *not*
useful for CML to try to model the protein hierarchy
(primary/secondary/supersecondary/tertiary/quaternary, etc.). However, it
could be useful to have a "flat-file" approach where each atom carries
PDB-like information on:
- its PDB atom type (CA, SG, etc.)
- its PDB number
- its residue type
- its chain identifier.
CML could carry this (and more), but would not support the explicit
hierarchy.
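As a rough illustration of where these fields come from, here is a minimal Python sketch that pulls them out of one fixed-column PDB ATOM/HETATM record. It assumes the standard PDB column layout (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain ID in 22, residue number in 23-26, element in 77-78). It also assumes "PDB number" means the atom serial number, since the example below pairs pdbNumber="23" with residue GLY13. `PdbAtom` and `parse_atom_record` are hypothetical names, not part of CML or any existing reader.

```python
from dataclasses import dataclass


@dataclass
class PdbAtom:
    serial: int    # PDB atom serial number (cols 7-11), assumed to be the "PDB number"
    name: str      # PDB atom type/name, e.g. "CA", "SG" (cols 13-16)
    res_name: str  # residue type, e.g. "GLY" (cols 18-20)
    chain: str     # chain identifier, e.g. "B" (col 22)
    res_seq: int   # residue sequence number (cols 23-26)
    x: float
    y: float
    z: float
    element: str   # element symbol (cols 77-78); empty if the line is shorter


def parse_atom_record(line: str) -> PdbAtom:
    """Parse one ATOM/HETATM line using the fixed PDB column layout (hypothetical helper)."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return PdbAtom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21:22].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


# Build an illustrative record with the standard column widths, then parse it.
line = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}          {:>2}"
).format("ATOM", 23, " CA ", "", "GLY", "B", 13, "", 1.23, 2.34, 3.45, 1.0, 20.0, "C")
atom = parse_atom_record(line)
print(atom)
```

Slicing rather than splitting on whitespace matters here, because PDB fields can run together when numbers are wide.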
The result might look like:
<atom elementType="C" cmlx:residue="GLY13" cmlx:pdbNumber="23"
cmlx:chain="B" x3="1.23".../>
where cmlx: is an extension CML namespace.
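A minimal sketch of emitting one such atom with Python's standard `xml.etree.ElementTree`. The `cmlx` namespace URI below is a hypothetical placeholder (no URI is defined here), `atom_element` is a hypothetical helper, and the y3/z3 values are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI for the cmlx extension namespace; not an official one.
CMLX_NS = "http://example.org/cmlx"
ET.register_namespace("cmlx", CMLX_NS)


def atom_element(element_type, residue, pdb_number, chain, x3, y3, z3):
    """Build a flat <atom> carrying PDB-like info as cmlx:* attributes."""
    atom = ET.Element("atom")
    atom.set("elementType", element_type)
    atom.set(f"{{{CMLX_NS}}}residue", residue)
    atom.set(f"{{{CMLX_NS}}}pdbNumber", str(pdb_number))
    atom.set(f"{{{CMLX_NS}}}chain", chain)
    atom.set("x3", str(x3))
    atom.set("y3", str(y3))
    atom.set("z3", str(z3))
    return atom


el = atom_element("C", "GLY13", 23, "B", 1.23, 4.56, 7.89)
print(ET.tostring(el, encoding="unicode"))
```

ElementTree adds the `xmlns:cmlx` declaration automatically when serialising, so the atom stays a single flat element with no residue or chain wrappers.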
This can be compacted to an array format like:
<atomArray elementType="C O N C C O N C C..."
cmlx:residue="GLY13 GLY13 GLY13 GLY13 ALA14..."
cmlx:pdbNumber="23 24 25 26..."/>
The array format can actually be more compact than PDB.
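To make the compaction concrete, here is a minimal Python sketch that packs per-atom values into a single `<atomArray>` with space-separated lists and compares its serialised length against the same data written as one `<atom>` per atom. The namespace URI, helper names and atom values are hypothetical and illustrative; the comparison is only against per-atom XML, not against a PDB file.

```python
import xml.etree.ElementTree as ET

CMLX_NS = "http://example.org/cmlx"  # hypothetical extension namespace URI
ET.register_namespace("cmlx", CMLX_NS)

# (elementType, residue, pdbNumber) per atom; illustrative values only
ATOMS = [
    ("N", "GLY13", 23),
    ("C", "GLY13", 24),
    ("C", "GLY13", 25),
    ("O", "GLY13", 26),
    ("N", "ALA14", 27),
    ("C", "ALA14", 28),
]


def atom_array(atoms):
    """Pack per-atom values into one <atomArray> with space-separated lists."""
    arr = ET.Element("atomArray")
    arr.set("elementType", " ".join(a[0] for a in atoms))
    arr.set(f"{{{CMLX_NS}}}residue", " ".join(a[1] for a in atoms))
    arr.set(f"{{{CMLX_NS}}}pdbNumber", " ".join(str(a[2]) for a in atoms))
    return arr


def per_atom(atoms):
    """The same data as one <atom> element per atom, for size comparison."""
    parent = ET.Element("atomArray")
    for el, res, num in atoms:
        atom = ET.SubElement(parent, "atom")
        atom.set("elementType", el)
        atom.set(f"{{{CMLX_NS}}}residue", res)
        atom.set(f"{{{CMLX_NS}}}pdbNumber", str(num))
    return parent


compact = ET.tostring(atom_array(ATOMS), encoding="unicode")
verbose = ET.tostring(per_atom(ATOMS), encoding="unicode")
print(len(compact), len(verbose))
```

The saving comes from writing each attribute name once per array instead of once per atom; the i-th token of every list belongs to the i-th atom.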
What CML will not support is an explicit hierarchy such as:
<protein>
  <biologicalUnit>
    <crystalUnit>
      <chain id="A">
        <residue>
          <atom>
          <atom>
      <chain id="B">
      etc.
CML *can* support nested molecules, so it is possible to write:
<molecule id="protein">
  <molecule id="chainA">
  <molecule id="chainB">
  <molecule id="ligand1">
  <molecule id="ligand2">
  ... and so on
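A minimal Python sketch of building such nested molecules from flat per-atom data, grouping atoms by chain into child `<molecule>` elements that each hold an `<atomArray>`. The grouping key, the `chainA`-style ids (taken from the sketch above), the helper name `nested_molecules` and the atom data are assumptions for illustration; ligands could be added as further children in the same way.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# (chain, elementType) per atom; illustrative values only
ATOMS = [
    ("A", "N"),
    ("A", "C"),
    ("B", "N"),
    ("B", "O"),
]


def nested_molecules(atoms, top_id="protein"):
    """Group atoms by chain into child <molecule> elements under one parent."""
    by_chain = defaultdict(list)
    for chain, element_type in atoms:
        by_chain[chain].append(element_type)
    top = ET.Element("molecule", id=top_id)
    for chain in sorted(by_chain):
        child = ET.SubElement(top, "molecule", id=f"chain{chain}")
        arr = ET.SubElement(child, "atomArray")
        arr.set("elementType", " ".join(by_chain[chain]))
    return top


top = nested_molecules(ATOMS)
print(ET.tostring(top, encoding="unicode"))
```

This keeps only one level of grouping (molecule within molecule); residues stay as flat per-atom labels rather than elements of their own.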
But if you describe in detail what you want, we can see whether it fits.
Best
Peter
>On the positive side, babel finds the chemical bonds and outputs them in
>the CML file. Are these 100 % reliable for correct structures?
>--
>David.
>________________________________________________________________________
>David van der Spoel, PhD, Assist. Prof., Molecular Biophysics group,
>Dept. of Cell and Molecular Biology, Uppsala University.
>Husargatan 3, Box 596, 75124 Uppsala, Sweden
>phone: 46 18 471 4205 fax: 46 18 511 755
>spoel@xray.bmc.uu.se spoel@gromacs.org http://xray.bmc.uu.se/~spoel
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++