Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specification not in sync with avogadrolibs implementation #15

Open
awvwgk opened this issue Jul 26, 2022 · 16 comments
Open

Specification not in sync with avogadrolibs implementation #15

awvwgk opened this issue Jul 26, 2022 · 16 comments

Comments

@awvwgk
Copy link

awvwgk commented Jul 26, 2022

The specification provided for chemical JSON in this repository seems to behind the actually supported implementation in https://github.com/OpenChemistry/avogadrolibs/blob/master/avogadro/io/cjsonformat.cpp.

Where can implementors of chemical JSON find an up-to-date specification or is the implementation in avogadrolibs the source of truth for this format?

@ghutchis
Copy link
Member

Great question - I saw your earlier tag. Partly, this has been due to lack of time. (I'm behind on writing documentation for Avogadro 2 as well. 😄 ) Partly, it's been not knowing the best mechanism to encode the spec (e.g., using something formal like Pydantic.)

A somewhat more readable "ground truth" is in https://github.com/OpenChemistry/avogadro-cclib/blob/main/cclibScript.py which I'm working hard to keep that up-to-date and merge into cclib as well.

I've been discussing also with the cclib team on other attributes.

A few things that are definitely missing:

  • total charge / total spin
  • energies for multiple coordinate sets (e.g., conformers, IRC, optimization steps, etc.)
    • Right now this is going in "properties": { "energies": [ … ] } which makes some sense.
  • other kinds of molecular / atomic / bond properties (e.g., electrostatics, tensors, …)
  • calculation metadata (IIRC it's in the avogadro-cclib script, but not the C++ yet)

Right now, it supports IR / Raman, electronic, and NMR spectra, but only in a limited way because plotting has been somewhat complicated in Avogadro 2.

Suggestions / discussion are definitely welcome.

@ghutchis
Copy link
Member

The format is used by other code, particularly the MongoChem web components, but these are mostly extensions or reverse-engineering the work done for the web components.

@awvwgk
Copy link
Author

awvwgk commented Jul 26, 2022

I would prefer a spec to implement against instead of another implementation, but I'll work with what is available.

total charge / total spin

Only half of the formats xtb supports via mctc-lib have this (Turbomole coords, CTFiles like mol/sdf, QCSchema JSON, QChem molfiles, Gaussian external inputs), this is why we have overwrites via cmd-args or .CHRG / .UHF files, so not a big issue. Of course we loose the information when writing in the same format since xtb preserves the input format.

@berquist
Copy link
Contributor

using something formal like Pydantic

JSON Schema (either from the spec or as produced by Pydantic) has some annoying limitations, but overall my experience generating schemas and JSON blobs via Pydantic is great, and certainly better than hand-writing JSON.

@ghutchis
Copy link
Member

I would prefer a spec to implement against instead of another implementation, but I'll work with what is available.

In my experience, both a spec, and an implementation and/or live examples are critical, but I understand completely.

I just discovered https://jsontopydantic.com which should help writing a more formal spec and Pydantic model. Given my workload this week and a conference next week, this may take a while, but I'm happy to answer questions in the meantime.

@awvwgk
Copy link
Author

awvwgk commented Jul 26, 2022

I wonder what is supposed to happen if a chemical JSON such as

{"chemical json": 2, "chemicalJson": 1}

is provided, should the reader reject it or pass it like currently done in Avogadro 2? This behaviour is somewhat inconsistent with

{"chemical json": 1, "chemicalJson": 2}

which is going to fail due to the version number specified.

The specs don't mention that there are alternative keys and how an implementation should deal with duplicated entries.

@ghutchis
Copy link
Member

In general in JSON, non-matching keys are ignored. So a key of "chemical json" is irrelevant, because it's not "chemicalJson".

At the moment, there are not alternative keys and I'm not sure that would be a good idea.

@awvwgk
Copy link
Author

awvwgk commented Jul 26, 2022

At the moment, there are not alternative keys and I'm not sure that would be a good idea.

This is not true if I check the actual implementation at

https://github.com/OpenChemistry/avogadrolibs/blob/55dec1df36c574a456d18735032b8e29928a28b0/avogadro/io/cjsonformat.cpp#L95-L101

which allows both chemical json and chemicalJson as key for the version numbers, same for several other keys.

@ghutchis
Copy link
Member

Dealing with spaces in JSON keys is more cumbersome in JavaScript as well: https://stackoverflow.com/a/4968448/131896

molecule.atoms vs molecule["vibrational modes"]

@ghutchis
Copy link
Member

This is not true if I check the actual implementation

I think that's due to some unfortunate backwards compatibility. It's a bad idea and I don't recommend it.

As I mentioned, Avogadro now always writes Cartesian coordinates and cell vectors even with periodic systems. (Very old versions would convert to fractional and write parameters .. but you drop useful information.)

Yes, it's why a spec is needed vs. an implementation that tries to ensure backwards compatibility with not-quite compatible files.

@awvwgk
Copy link
Author

awvwgk commented Jul 26, 2022

Yes, it's why a spec is needed vs. an implementation that tries to ensure backwards compatibility with not-quite compatible files.

This is good to know, since I got the impression that I have the choice as implementor which key I can write, but have to always support both variants when reading. So the camelCase ones are the correct ones? (I should go change this than in my implementation)

@ghutchis
Copy link
Member

My general view on parsers is flexible on read, but correct on write. (This is motivated particularly by "PDB" files. 🙄

Let's also agree on properties => "totalCharge" and properties => "totalSpinMultiplicity" and I promise I'll get that into avogadrolibs tonight.

@awvwgk
Copy link
Author

awvwgk commented Jul 26, 2022

Let's also agree on properties => "totalCharge" and properties => "totalSpinMultiplicity" and I promise I'll get that into avogadrolibs tonight.

totalCharge and totalSpinMultiplicity are fine with me.

My general view on parsers is flexible on read, but correct on write.

I have to disagree on this one, every flexibility allowed in the parser will get used at some point by somebody and is than expected to be supported for all future. Even worse, future implementations are expected to support the same extensions (like this SDF stuff grimme-lab/mctc-lib#49). A parser should enforce the correctness of the format, rigorously.

@awvwgk
Copy link
Author

awvwgk commented Aug 1, 2022

Thanks for OpenChemistry/avogadrolibs#1058, however this doesn't solve the issue that avogadrolibs implements something else than documented here.

@ghutchis
Copy link
Member

ghutchis commented Aug 1, 2022

I'm aware - I'm at a conference this week, so my time to write a spec is limited. I will certainly write up a more complete current spec in the next few weeks.

@ghutchis
Copy link
Member

ghutchis commented Aug 4, 2022

Fortunately, it was a bit easier to push things into Pydantic than I worried. Here's a start, which will turn into a pull request.
https://gist.github.com/ghutchis/3a31e5b597dafb15de9ea048cad35268

Yes, some of these examples are really old and need to be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants