-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yaml format for CLI input #583
Conversation
openfecli/commands/input.yaml
Outdated
scorer: default_lomap_score | ||
|
||
# Question: | ||
# use settings here, to match Protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the "settings" nest, since a) allows for further expansion for non-settings keys if they are ever needed, b) it's just visually pleasing to look at.
Not a super strong vote though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on settings nesting
openfecli/commands/input.yaml
Outdated
# things behind a "method" key have to match up to a known thing in openfe | ||
|
||
network: | ||
method: plan_radial_network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these going to be required? Like will there be some defaults if it is blank, or do we require some (or all) to have values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything in the file is optional. It's just to augment the command line plan_rbfe_network
.
openfecli/commands/input.yaml
Outdated
# Question: | ||
# is mapper*s* always a list and mapper always a single item | ||
# or do we allow mapper to define either a list or single item behind it? | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is more than one mapper, how are things divided up? Like if there are 2 mappers, do we just create 2 mappings for each leg? or do you control which leg gets which mapper (if this is out of scope for what you are doing, then ignore).
I think I like:
mappers:
- method: LomapAtomMapper
settings:
timeout: 60
and
mappers:
- method: LomapAtomMapper
settings:
timeout: 60
- method: LomapAtomMapper++
settings:
timeout: 60
so that we either have a list or a single one, but it always uses the mappers
key even if there is one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way multiple mappers works is already define in the plan_radial_network
etc functions. see: https://github.com/OpenFreeEnergy/openfe/blob/main/openfe/setup/ligand_network_planning.py#L99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more aesthetically, is the spec that we allow Union[AtomMapper, list[AtomMapper]]
to mapper? This would match the above signature I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: this appears to be a format only for the basic RBFE/RHFE network planners. It looks like this design will not support more complicated alchemical network planners (e.g., something that combines RBFE and ABFE). This isn't necessarily a problem, but I'm just making sure we're conscious that this is solving a subset of the problem space, and we might need to revisit it (but that's an OpenFE 3.0 problem).
My main thought to consider for the future: A user can create a JSON Schema based on their installed plugins (once we have plugins) that represents the allowed input structure. This would be useful because it will allow in-editor autocomplete and validation (that's a vim plugin, but many text editors support this). So they get a bunch of usability for free, if we can output a JSON schema. Therefore I'd recommend approaches that are more friendly to JSON schema output.
openfecli/commands/input.yaml
Outdated
scorer: default_lomap_score | ||
|
||
# Question: | ||
# use settings here, to match Protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly don't remember which approach was easier for a JSON Schema. I think the not nested version is a little easier: you define a single subschema with a literal for method
and the appropriate list of settings keys. With nested settings, I think you have to define the subschema with the method
literal and point another subschema for the settings. That would make documentation, e.g., as generated by sphinx-jsonschema
, a little ugly. TBH, it is really just a question of which is easier with whatever tool we'll do to generate the JSON schema (do not go down the road of rolling your own).
openfecli/commands/input.yaml
Outdated
settings: | ||
timeout: 120 | ||
|
||
# OR not nesting settings, anything without the "method" key gets put into kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either one works from a JSON Schema standpoint. I'd vote to start by requiring it to be a list and add handling for non-list later, just as a matter of do the simple thing first.
Hello @richardjgowers! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2023-11-14 15:08:08 UTC |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #583 +/- ##
==========================================
- Coverage 92.21% 91.43% -0.78%
==========================================
Files 124 127 +3
Lines 8594 8594
==========================================
- Hits 7925 7858 -67
- Misses 669 736 +67 ☔ View full report in Codecov by Sentry. |
import warnings | ||
|
||
|
||
class MapperSelection(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwhswenson would defining pydantic models (which export to json schema) go towards making it easier for plugins to provide extensions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was definitely expecting to use pydantic (part of my "mistake I did in OPS CLI: I reinvented pydantic; let's not repeat that mistake").
That said, you'll need a model for, e.g, each AtomMapper, and then a separate model for MapperSelection
, which is just mappers: list[LomapMapperModel | KartografMapperModel | ...]
, where LomapMapperModel
looks like what you have now. (With the nested settings, you may need yet another model for, e.g., LomapMapperSettingsModel
; this is part of why that makes the schema messier.)
In the later version, (1) we'd see what we can do to dynamically create the LomapMapperModel
so devs don't have to repeat typing/docs; (2) we'd dynamically generate the MapperSelection
model based on what plugins existed.
currently does nothing
|
||
choices['mapper'] = cls(**opt.mapper.settings) | ||
if opt.network: | ||
network_choices = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwhswenson avert your eyes! This is the hacky job I think will initially serve us. I think having this dict-like thing being eventually much more clever is what you're envisaging, this is as far as I'm taking it.
hmm interesting, the failing test is on our pydantic 1 build. It's the normalisation to lower case |
Ok this is ready for review. To try this out, you can create a mapper:
method: LomapAtomMapper
settings:
time: 120
network:
method: generate_minimal_redundant_network
settings:
mst_num: 3 Then run: openfe plan-rbfe-network -p tyk2_protein.pdb -M tyk2_ligands.sdf -s settings.yaml And this should have created a simulation plan with minimal redundant network selected edges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a number of minor changes even if we want to limit it to this functionality; see individual comments. Big things that are still missing:
-
A way to document the available options. This was sort of the big sticking point when we discussed this. It is not solved in any way at all here. An undocumented settings file is not particularly useful to users.
-
There's an assumption here that all user-supplied values are YAML/JSON basic types. This is true for the limited subset of functionality included in this PR, but it is going to break down very quickly once we move to, say, adding support for protocol settings. From this PR, it is unclear what the intended direction forward on that is.
-
There's no validation of the user input in the
settings
subdicts. This means that something that could easily be a user-understandableTypeError
will lead to some unclear error at runtime. For example, give bad types to the LOMAP params and you end up with:ArgumentError("Python argument types in\n rdkit.Chem.rdFMCS.FindMCS(list)\ndid not match C++ signature:\n FindMCS(boost::python::api::object mols, RDKit::PyMCSParameters {lvalue} parameters)\n FindMCS(boost::python::api::object mols, bool maximizeBonds=True, double threshold=1.0, unsigned int timeout=3600, bool verbose=False, bool matchValences=False, bool ringMatchesRingOnly=False, bool completeRingsOnly=False, bool matchChiralTag=False, RDKit::AtomComparator atomCompare=rdkit.Chem.rdFMCS.AtomCompare.CompareElements, RDKit::BondComparator bondCompare=rdkit.Chem.rdFMCS.BondCompare.CompareOrder, RDKit::RingComparator ringCompare=rdkit.Chem.rdFMCS.RingCompare.IgnoreRingFusion, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > seedSmarts='')")
That will not be a useful error for users.
@YAML_OPTIONS.parameter( | ||
multiple=False, required=False, default=None, | ||
help=YAML_OPTIONS.kwargs["help"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be the same:
@YAML_OPTIONS.parameter( | |
multiple=False, required=False, default=None, | |
help=YAML_OPTIONS.kwargs["help"], | |
) | |
@YAML_OPTIONS.parameter() |
You don't need to explicitly specify default behavior. The purpose of other things changing help
(for example) is to re-use most of the behavior of the parameter (or even help string) and add something else to clarify the use in this specific case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm (we're?) not familiar with the defaults, so being explicit here is useful to know what's happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to actually be explicit, please list all standard defaults for click.Option and click.Parameter, except for those overridden in YAML_OPTIONS
. Personally, I think that's way too much information, so I don't think that being explicit is useful here.
In particular, please do not explicitly list anything that is just repeated from YAML_OPTIONS
, such as help=YAML_OPTIONS["help"]
. This is not informative to a reader. This got into our codebase because someone copied it from this early code, and decided they didn't need the modification to the help. But as you can see from that code, the purpose of including it is adjust the default help for the parameter.
if solvent is None: | ||
solvent = SolventComponent() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any code path in which solvent is not None
? The (implied) else
here is a meaningless code branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
roughing out space for future option here
mapper_obj = None | ||
mapping_scorer = None | ||
ligand_network_planner = None | ||
solvent = None | ||
|
||
if yaml_settings is not None: | ||
yaml_options = YAML_OPTIONS.get(yaml_settings) | ||
mapper_obj = yaml_options.get('mapper', None) | ||
ligand_network_planner = yaml_options.get('network', None) | ||
|
||
if mapper_obj is None: | ||
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False, max3d=1) | ||
if mapping_scorer is None: | ||
mapping_scorer = default_lomap_score | ||
if ligand_network_planner is None: | ||
ligand_network_planner = generate_minimal_spanning_network | ||
if solvent is None: | ||
solvent = SolventComponent() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See discussion of duplicate in plan_rbfe_network
.
if mapping_scorer is None: | ||
mapping_scorer = default_lomap_score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn't seem to be an option changing the scorer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is roughing out space for it in the future
@YAML_OPTIONS.parameter( | ||
multiple=False, required=False, default=None, | ||
help=YAML_OPTIONS.kwargs["help"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see duplicate in plan_rbfe_network
43ef282
to
0d57408
Compare
@dwhswenson your comments on (auto) documentation, non native yaml types and better validation/error messages are not in the MVP, so they're not going to make this PR |
""" | ||
raw = yaml.safe_load(contents) | ||
|
||
if False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be good to have and also helps with typos in keys
try: | ||
cls = mapper_choices[opt.mapper.method] | ||
except KeyError: | ||
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'") | |
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}' Allowed choices are {mapper_choices}") |
Or something like that
If there is a typo/extra key added in |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple thoughts, nothing blocking.
if opt and opt.mapper: | ||
mapper_choices = { | ||
'lomap': LomapAtomMapper, | ||
'lomapatommapper': LomapAtomMapper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage of having two choices here? Would it not be safer to just support the one thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just aliases to make finding options easier
|
||
# convert normalised inputs to objects | ||
if opt and opt.mapper: | ||
mapper_choices = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for here, but later - it'd be great if we could define this as a dictionary somewhere else, that way we can just import that (I'm thinking about expanding this to include perses, kartograf, etc..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also won't be known ahead of time, more likely this is more a EAFP situation
mapper_obj = cls(**opt.mapper.settings) | ||
else: | ||
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False, | ||
max3d=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate issue: @hannahbaumann did bring up the fact that max3d=1
is not the default for LomapAtomMapper - we don't know about element_charge
, but max3d
really should change to match this (it's real problematic otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should ideally be up to date with our best practices, so sounds like we can patch this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra = 'allow' | ||
anystr_lower = True | ||
|
||
method: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than validating later on, could we just use a validator here for the allowed method names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed this live, it won't be possible to have a list of possible methods a priori
? add saving of argv to alchemical network creation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an MVP, this is ready, I think we need to start using it raising issues as we find them
[skip ci]
addresses #579
draft of yaml format for CI
There's a couple of questions for how we structure things in here, but it's mostly been decided in various discussion