Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

canonical implementation #123

Open
rkurchin opened this issue Oct 23, 2024 · 5 comments
Open

canonical implementation #123

rkurchin opened this issue Oct 23, 2024 · 5 comments
Labels
question Further information is requested

Comments

@rkurchin
Copy link
Collaborator

A major discussion point at the workshop yesterday was the idea of having an "official" or "canonical" implementation of the AtomsBase interface (somewhat analogous to ASE.Atoms in Python). Advantages would include:

  • easy entry point for new users to get started and a concrete type to use in generic documentation about the interface
  • can hopefully reduce some unnecessary duplication of functionality
  • avoid confusion about whether the currently included (and arguably not great) prototype implementations are actually meant to be used for things...they could presumably just be moved to documentation as examples and not actually included

Some logistical points to discuss on this:

  • should this live in this repository? In this repository, but not exported by default? In a different repository?
  • It was also mentioned that if it lives in a different repo, we might discuss renaming this one to AtomsInterface or similar instead

Implementation discussion points:

  • needs to be type-stable
  • how flexible to additional properties do we want to be? (fully arbitrary makes type stability difficult, but probably we want some such flexibility)
  • should it be mutable or immutable? Or should there be two versions?
  • Should its length be fixed (i.e. should atoms be able to be added/removed after instantiation)?
  • Should people be able to directly modify fields or only via setter methods (to enable preservation of any internal consistency between fields, a la issues Ask had mentioned in ASE)
  • things to build off of could include the versions in DecoratedParticles or ExtXYZ

Going to explicitly ping @cortner, @jgreener64, and @jameskermode here as they either very actively participated in discussion or implemented something that came up in said discussion :) but anyone should chime in with thoughts!

@rkurchin rkurchin added the question Further information is requested label Oct 23, 2024
@jgreener64
Copy link
Collaborator

I'm pro a canonical implementation that covers most use cases. I would only want it to be in the interface repo if it had negligible include time though. I don't see the harm in it being in a separate package.

The challenge would be agreeing on some minimal required properties, and yes I think there needs to be some way to store arbitrary data. But there needs to be at least some required properties, otherwise we end up defining an empty struct.

@mfherbst
Copy link
Member

I also think a canonical implementation outside AtomsBase would be good. For me important would be:

  • Arbitrary properties
  • A good set of basic properties (e.g. like FlexibleSystem)
    I think DecoratedParticles might be a good start, actually.

My feeling is that convenience (e.g. mutability, annotation of bond properties etc.) and efficiency/suitability for AD might be contradicting. So maybe we should in the long run even have two implementations, one flexible, one fast ? That makes it easy to convert to/from these things whenever coding generic algorithms like GeometryOptimization does ? Would of course be good if this fragmentation can be avoided.

@jameskermode
Copy link

+1 for arbitrary properties. This has been critical to the success of ase.Atoms.

@cortner
Copy link
Member

cortner commented Oct 24, 2024

Apparently Ask is not that happy with it :)

@lmiq
Copy link

lmiq commented Nov 19, 2024

I'm somewhat late for the discussion, but maybe I have something to add. I have been recently implementing the reading o mmCIF files in PDBTools.jl because I needed to store and read data for simulation files of 70M atoms. For that, apart from having to optimize the reading of the file, I had to remove a custom dictionary from my Atom data structure, because itself was too large to be stored in memory for every atom, even if empty:

julia> d = Dict{Symbol,Any}()
Dict{Symbol, Any}()

julia> Base.summarysize(d) * 70 * 10^6 / 1024 / 1024
30441.2841796875

(that would be 30GB just to store empty dicts for custom properties).

The result was that my Atom type, which previously always contained an empty Dict, had to be become a parametric Atom{CustomType} which defaults to Atom{Nothing} where the custom field of the struct is instantiated to nothing. With that I can hold in memory the data for the 70M atoms.

As a side note, I kept my Atom implementation mutable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

6 participants