[Fsatom] Re: CML and macromolecules
Peter Murray-Rust
pm286@cam.ac.uk
Mon, 20 Oct 2003 00:15:49 +0100
At 02:27 20/10/2003 +0200, David wrote:
>On Sun, 2003-10-19 at 22:43, Peter Murray-Rust wrote:
> >
> > However I think there is still a strong "PDB-like" approach and I am happy
> > to extend CML to manage that aspect of macromolecules. I think it's *not*
> > useful for CML to try to model protein hierarchy
> > (primary/secondary/supersecondary/tertiary/quaternary, etc.). However, it
> > could be useful to have a "flat-file" approach where the atoms had
> > PDB-like information on:
> > - their PDB type (CA, SG, etc.)
> > - their PDB number
> > - their residue type
> > - their chain identifier.
> >
> > CML could carry this (and more) but would not support the explicit
> > hierarchy.
> >
> > the result might look like:
> >
> > <atom elementType="C" cmlx:residue="GLY13" cmlx:pdbNumber="23"
> > cmlx:chain="B" x3="1.23".../>
> >
> > where cmlx: is an extension CML namespace.
> >
> > This can be compacted to an array format like:
> >
> > <atomArray elementType="C O N C C O N C C..."
> > cmlx:residue="GLY13 GLY13 GLY13 GLY13 ALA14..."
> > cmlx:pdbNumber="23 24 25 26..."/>
> >
>
>This looks indeed a lot more useful. The crux is very simple: my (and
>probably other people's) laziness... If I have to read CML *and* PDB, I
>have made my life more miserable instead of better! Therefore, for
>molecular modeling purposes, it would be crucial to have things like
>residues, chain identifiers, crystal/unit cell information, B-factors and
>occupancies.
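To make the flat-file idea concrete, here is a minimal Python sketch that maps one fixed-column PDB ATOM/HETATM record onto a single flat <atom> element carrying the kind of per-atom PDB information proposed above. It assumes the standard PDB column layout for ATOM records. The cmlx namespace URI (http://example.org/cmlx), the id scheme, and the attribute names cmlx:pdbType and cmlx:biso are hypothetical placeholders, not a sanctioned CML extension.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the proposed cmlx extension (not official).
CMLX_NS = "http://example.org/cmlx"
ET.register_namespace("cmlx", CMLX_NS)


def pdb_atom_to_cml(line: str) -> ET.Element:
    """Map one fixed-column PDB ATOM/HETATM record onto a flat <atom>."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    serial = line[6:11].strip()          # PDB atom serial number
    pdb_type = line[12:16].strip()       # PDB atom name, e.g. CA, SG
    res_name = line[17:20].strip()       # residue name, e.g. GLY
    chain = line[21:22].strip()          # chain identifier, e.g. B
    res_seq = line[22:26].strip()        # residue sequence number
    # Element symbol from columns 77-78; crude fallback to the atom name's
    # first letter for old files that omit it (wrong for e.g. CA = calcium).
    element = line[76:78].strip() or pdb_type[:1]
    cmlx = "{" + CMLX_NS + "}"
    return ET.Element("atom", {
        "id": "a" + serial,              # hypothetical id scheme
        "elementType": element.capitalize(),
        "x3": line[30:38].strip(),
        "y3": line[38:46].strip(),
        "z3": line[46:54].strip(),
        "occupancy": line[54:60].strip(),  # assumed native attribute name
        cmlx + "pdbType": pdb_type,        # hypothetical attribute name
        cmlx + "pdbNumber": serial,
        cmlx + "residue": res_name + res_seq,
        cmlx + "chain": chain,
        cmlx + "biso": line[60:66].strip(),  # hypothetical attribute name
    })
```

For example, ET.tostring(pdb_atom_to_cml(line)) serializes one record as a single <atom> with cmlx-prefixed attributes. Residue name and number are concatenated (GLY13) to match the example above; alternate locations and insertion codes are ignored in this sketch.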
Native CML supports occupancies. To support B-factors you could write:
<atom id="foo"... x3="1.23"...>
<scalar dictRef="iucr:_atom_site_B_iso">21.1</scalar>
</atom>
Or you could argue for:
<atom id="foo"... x3="1.23"... cmlx:biso="21.1"/>
(or its atomArray equivalent).
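The atomArray form stores one whitespace-separated list per attribute, so reading it back amounts to splitting each list and zipping the columns into per-atom records. A minimal Python sketch, assuming every attribute on the element is such a list and all lists have the same length; namespace prefixes are simply dropped from the keys (so cmlx:biso becomes biso), which assumes no two namespaces share a local attribute name.

```python
import xml.etree.ElementTree as ET


def expand_atom_array(atom_array: ET.Element) -> list[dict[str, str]]:
    """Turn an array-style <atomArray> into one dict per atom."""
    columns = {}
    for name, value in atom_array.attrib.items():
        local = name.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        columns[local] = value.split()
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"atomArray columns differ in length: {sorted(lengths)}")
    n_atoms = lengths.pop() if lengths else 0
    # Column i of every attribute list describes atom i.
    return [{key: values[i] for key, values in columns.items()}
            for i in range(n_atoms)]
```

Checking that all columns have equal length catches the most likely corruption in the compact form: one list silently missing an entry would otherwise shift every later atom's data.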
Note that the content version (<scalar>) allows for the e.s.d.s (estimated
standard deviations) that crystallographers report, while the attribute
version doesn't. The content version can be created by anyone; the cmlx
attributes have to be sanctioned as a CML extension, which is still at an
early stage. Note that all CML extensions will have dictionary entries.
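To illustrate the difference, the following Python sketch reads B_iso from either form. It assumes the e.s.d. is written in CIF-style parenthesized notation inside the <scalar> content (e.g. 21.1(3), meaning 21.1 ± 0.3), that cmlx:biso holds a plain number, and that CML elements are un-namespaced as in the examples above; how CML would actually encode the e.s.d. is not specified here, and the namespace URI is a hypothetical placeholder.

```python
import re
import xml.etree.ElementTree as ET

CMLX_NS = "http://example.org/cmlx"  # hypothetical extension namespace
BISO_DICTREF = "iucr:_atom_site_B_iso"

# A number optionally followed by an e.s.d. in parentheses, e.g. 21.1(3).
_CIF_NUMBER = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?\s*$")


def parse_cif_number(text: str):
    """Parse '21.1(3)' -> (21.1, 0.3); a bare '21.1' -> (21.1, None)."""
    match = _CIF_NUMBER.match(text)
    if not match:
        raise ValueError(f"not a CIF-style number: {text!r}")
    number, esd_digits = match.groups()
    if esd_digits is None:
        return float(number), None
    # The e.s.d. digits apply to the last decimal place(s) of the number.
    decimals = len(number.split(".")[1]) if "." in number else 0
    return float(number), int(esd_digits) / 10 ** decimals


def b_iso(atom: ET.Element):
    """Return (B_iso, esd) from a <scalar> child or the cmlx:biso attribute."""
    # Content version: assumes un-namespaced <scalar> children, as in the post.
    for scalar in atom.findall("scalar"):
        if scalar.get("dictRef") == BISO_DICTREF:
            return parse_cif_number(scalar.text or "")
    # Attribute version: a plain number, so there is nowhere to put an e.s.d.
    attr = atom.get("{" + CMLX_NS + "}biso")
    if attr is not None:
        return float(attr), None
    return None
```

Because the attribute path accepts only a plain float, a value such as "21.1(3)" in cmlx:biso raises ValueError, mirroring the point that the attribute version cannot carry the e.s.d.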
The main problem with extensions of either sort is that someone has to
write the code, and it probably won't be me! I am fairly committed to
creating code for all integral CML constructs, but not for extensions. The
danger is that extensions will get lost in conversions.
>The remainder (mainly REMARK stuff) is most likely less interesting. If
>CML provided the items listed above, we wouldn't have to worry about
>reading PDB files anymore, and we could even use the bond information
>that Babel provides.
I have provided only core functionality for Babel. I do not know what
additional data structures they provide for PDB-like elements. If it's easy,
I can probably hack it. I doubt they support B_iso. The problem with all
conversion programs is that you are constrained by the data structures they
provide. Not their fault...
P.
> > The array format can actually be more compact than PDB.
> >
> > What CML will not support is:
> >
> > <protein>
> > <biologicalUnit>
> > <crystalUnit>
> > <chain id="A">
> > <residue>
> > <atom>
> > <atom>
> > <chain id="B">
> > etc.
>But this can easily be added in a different schema using namespaces.
>
>
>--
>David.
>________________________________________________________________________
>David van der Spoel, PhD, Assist. Prof., Molecular Biophysics group,
>Dept. of Cell and Molecular Biology, Uppsala University.
>Husargatan 3, Box 596, 75124 Uppsala, Sweden
>phone: 46 18 471 4205 fax: 46 18 511 755
>spoel@xray.bmc.uu.se spoel@gromacs.org http://xray.bmc.uu.se/~spoel
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>_______________________________________________
>Fsatom mailing list
>Fsatom@www.tddft.org
>http://www.tddft.org/mailman/listinfo/fsatom