Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yaml format for CLI input #583

Merged
merged 18 commits into from
Nov 17, 2023
Merged

Yaml format for CLI input #583

merged 18 commits into from
Nov 17, 2023

Conversation

richardjgowers
Copy link
Contributor

[skip ci]

addresses #579

draft of yaml format for CI

There's a couple of questions for how we structure things in here, but it's mostly been decided in various discussion

draft of yaml format for CI
scorer: default_lomap_score

# Question:
# use settings here, to match Protocol
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the "settings" nest, since a) allows for further expansion for non-settings keys if they are ever needed, b) it's just visually pleasing to look at.

Not a super strong vote though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on settings nesting

# things behind a "method" key have to match up to a known thing in openfe

network:
method: plan_radial_network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these going to be required? Like will there be some defaults if it is blank, or do we require some (or all) to have values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything in the file is optional. It's just to augment the command line plan_rbfe_network.

# Question:
# is mapper*s* always a list and mapper always a single item
# or do we allow mapper to define either a list or single item behind it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is more than one mapper, how are things divided up? Like if there are 2 mappers, do we just create 2 mappings for each leg? or do you control which leg gets which mapper (if this is out of scope for what you are doing, then ignore).

I think I like:

mappers:
  - method: LomapAtomMapper
    settings:
      timeout: 60

and

mappers:
  - method: LomapAtomMapper
    settings:
      timeout: 60
  - method: LomapAtomMapper++
    settings:
      timeout: 60

so that we either have a list or a single one, but it always uses the mappers key even if there is one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way multiple mappers works is already define in the plan_radial_network etc functions. see: https://github.com/OpenFreeEnergy/openfe/blob/main/openfe/setup/ligand_network_planning.py#L99

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more aesthetically, is the spec that we allow Union[AtomMapper, list[AtomMapper]] to mapper? This would match the above signature I guess

Copy link
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: this appears to be a format only for the basic RBFE/RHFE network planners. It looks like this design will not support more complicated alchemical network planners (e.g., something that combines RBFE and ABFE). This isn't necessarily a problem, but I'm just making sure we're conscious that this is solving a subset of the problem space, and we might need to revisit it (but that's an OpenFE 3.0 problem).

My main thought to consider for the future: A user can create a JSON Schema based on their installed plugins (once we have plugins) that represents the allowed input structure. This would be useful because it will allow in-editor autocomplete and validation (that's a vim plugin, but many text editors support this). So they get a bunch of usability for free, if we can output a JSON schema. Therefore I'd recommend approaches that are more friendly to JSON schema output.

scorer: default_lomap_score

# Question:
# use settings here, to match Protocol
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't remember which approach was easier for a JSON Schema. I think the not nested version is a little easier: you define a single subschema with a literal for method and the appropriate list of settings keys. With nested settings, I think you have to define the subschema with the method literal and point another subschema for the settings. That would make documentation, e.g., as generated by sphinx-jsonschema, a little ugly. TBH, it is really just a question of which is easier with whatever tool we'll do to generate the JSON schema (do not go down the road of rolling your own).

settings:
timeout: 120

# OR not nesting settings, anything without the "method" key gets put into kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either one works from a JSON Schema standpoint. I'd vote to start by requiring it to be a list and add handling for non-list later, just as a matter of do the simple thing first.

@pep8speaks
Copy link

pep8speaks commented Oct 25, 2023

Hello @richardjgowers! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 76:80: E501 line too long (81 > 79 characters)

Line 84:80: E501 line too long (82 > 79 characters)
Line 97:80: E501 line too long (80 > 79 characters)
Line 138:80: E501 line too long (80 > 79 characters)
Line 148:80: E501 line too long (83 > 79 characters)
Line 150:80: E501 line too long (85 > 79 characters)
Line 157:80: E501 line too long (83 > 79 characters)

Line 169:80: E501 line too long (83 > 79 characters)

Line 122:80: E501 line too long (86 > 79 characters)

Line 29:1: E302 expected 2 blank lines, found 1

Comment last updated at 2023-11-14 15:08:08 UTC

@codecov
Copy link

codecov bot commented Oct 25, 2023

Codecov Report

Attention: 66 lines in your changes are missing coverage. Please review.

Comparison is base (7a39491) 92.21% compared to head (a1fb632) 91.43%.
Report is 4 commits behind head on main.

Files Patch % Lines
openfe/tests/protocols/test_openmm_afe_slow.py 25.39% 47 Missing ⚠️
...protocols/openmm_afe/equil_solvation_afe_method.py 60.00% 8 Missing ⚠️
openfe/protocols/openmm_rfe/equil_rfe_methods.py 57.14% 6 Missing ⚠️
openfecli/parameters/plan_network_options.py 93.75% 4 Missing ⚠️
openfe/protocols/openmm_afe/base.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #583      +/-   ##
==========================================
- Coverage   92.21%   91.43%   -0.78%     
==========================================
  Files         124      127       +3     
  Lines        8594     8594              
==========================================
- Hits         7925     7858      -67     
- Misses        669      736      +67     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

import warnings


class MapperSelection(BaseModel):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dwhswenson would defining pydantic models (which export to json schema) go towards making it easier for plugins to provide extensions?

Copy link
Member

@dwhswenson dwhswenson Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was definitely expecting to use pydantic (part of my "mistake I did in OPS CLI: I reinvented pydantic; let's not repeat that mistake").

That said, you'll need a model for, e.g, each AtomMapper, and then a separate model for MapperSelection, which is just mappers: list[LomapMapperModel | KartografMapperModel | ...], where LomapMapperModel looks like what you have now. (With the nested settings, you may need yet another model for, e.g., LomapMapperSettingsModel; this is part of why that makes the schema messier.)

In the later version, (1) we'd see what we can do to dynamically create the LomapMapperModel so devs don't have to repeat typing/docs; (2) we'd dynamically generate the MapperSelection model based on what plugins existed.


choices['mapper'] = cls(**opt.mapper.settings)
if opt.network:
network_choices = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dwhswenson avert your eyes! This is the hacky job I think will initially serve us. I think having this dict-like thing being eventually much more clever is what you're envisaging, this is as far as I'm taking it.

@richardjgowers
Copy link
Contributor Author

hmm interesting, the failing test is on our pydantic 1 build. It's the normalisation to lower case

@richardjgowers
Copy link
Contributor Author

Ok this is ready for review.

To try this out, you can create a settings.yaml file like below

mapper:
  method: LomapAtomMapper
  settings:
    time: 120

network:
  method: generate_minimal_redundant_network
  settings:
    mst_num: 3

Then run:

openfe plan-rbfe-network -p tyk2_protein.pdb -M tyk2_ligands.sdf -s settings.yaml

And this should have created a simulation plan with minimal redundant network selected edges

@richardjgowers richardjgowers changed the title Draft: Yaml format for CLI input Yaml format for CLI input Nov 1, 2023
Copy link
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a number of minor changes even if we want to limit it to this functionality; see individual comments. Big things that are still missing:

  • A way to document the available options. This was sort of the big sticking point when we discussed this. It is not solved in any way at all here. An undocumented settings file is not particularly useful to users.

  • There's an assumption here that all user-supplied values are YAML/JSON basic types. This is true for the limited subset of functionality included in this PR, but it is going to break down very quickly once we move to, say, adding support for protocol settings. From this PR, it is unclear what the intended direction forward on that is.

  • There's no validation of the user input in the settings subdicts. This means that something that could easily be a user-understandable TypeError will lead to some unclear error at runtime. For example, give bad types to the LOMAP params and you end up with:

    ArgumentError("Python argument types in\n rdkit.Chem.rdFMCS.FindMCS(list)\ndid not match C++ signature:\n FindMCS(boost::python::api::object mols, RDKit::PyMCSParameters {lvalue} parameters)\n FindMCS(boost::python::api::object mols, bool maximizeBonds=True, double threshold=1.0, unsigned int timeout=3600, bool verbose=False, bool matchValences=False, bool ringMatchesRingOnly=False, bool completeRingsOnly=False, bool matchChiralTag=False, RDKit::AtomComparator atomCompare=rdkit.Chem.rdFMCS.AtomCompare.CompareElements, RDKit::BondComparator bondCompare=rdkit.Chem.rdFMCS.BondCompare.CompareOrder, RDKit::RingComparator ringCompare=rdkit.Chem.rdFMCS.RingCompare.IgnoreRingFusion, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > seedSmarts='')")

    That will not be a useful error for users.

Comment on lines +81 to +84
@YAML_OPTIONS.parameter(
multiple=False, required=False, default=None,
help=YAML_OPTIONS.kwargs["help"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the same:

Suggested change
@YAML_OPTIONS.parameter(
multiple=False, required=False, default=None,
help=YAML_OPTIONS.kwargs["help"],
)
@YAML_OPTIONS.parameter()

You don't need to explicitly specify default behavior. The purpose of other things changing help (for example) is to re-use most of the behavior of the parameter (or even help string) and add something else to clarify the use in this specific case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm (we're?) not familiar with the defaults, so being explicit here is useful to know what's happening

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to actually be explicit, please list all standard defaults for click.Option and click.Parameter, except for those overridden in YAML_OPTIONS. Personally, I think that's way too much information, so I don't think that being explicit is useful here.

In particular, please do not explicitly list anything that is just repeated from YAML_OPTIONS, such as help=YAML_OPTIONS["help"]. This is not informative to a reader. This got into our codebase because someone copied it from this early code, and decided they didn't need the modification to the help. But as you can see from that code, the purpose of including it is adjust the default help for the parameter.

openfecli/commands/plan_rbfe_network.py Outdated Show resolved Hide resolved
Comment on lines 171 to 172
if solvent is None:
solvent = SolventComponent()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any code path in which solvent is not None? The (implied) else here is a meaningless code branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roughing out space for future option here

Comment on lines 127 to 144
mapper_obj = None
mapping_scorer = None
ligand_network_planner = None
solvent = None

if yaml_settings is not None:
yaml_options = YAML_OPTIONS.get(yaml_settings)
mapper_obj = yaml_options.get('mapper', None)
ligand_network_planner = yaml_options.get('network', None)

if mapper_obj is None:
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False, max3d=1)
if mapping_scorer is None:
mapping_scorer = default_lomap_score
if ligand_network_planner is None:
ligand_network_planner = generate_minimal_spanning_network
if solvent is None:
solvent = SolventComponent()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion of duplicate in plan_rbfe_network.

openfecli/parameters/plan_network_options.py Show resolved Hide resolved
Comment on lines 167 to 168
if mapping_scorer is None:
mapping_scorer = default_lomap_score
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be an option changing the scorer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is roughing out space for it in the future

Comment on lines +67 to +70
@YAML_OPTIONS.parameter(
multiple=False, required=False, default=None,
help=YAML_OPTIONS.kwargs["help"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see duplicate in plan_rbfe_network

@richardjgowers
Copy link
Contributor Author

@dwhswenson your comments on (auto) documentation, non native yaml types and better validation/error messages are not in the MVP, so they're not going to make this PR

"""
raw = yaml.safe_load(contents)

if False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be good to have and also helps with typos in keys

try:
cls = mapper_choices[opt.mapper.method]
except KeyError:
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add

Suggested change
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}' Allowed choices are {mapper_choices}")

Or something like that

@mikemhenry
Copy link
Contributor

If there is a typo/extra key added in mapper.settings the error message isn't awful and it looks like we just call lomap so I am not sure if we can improve the error message, but we could catch it and add a hint "check path_to_settings_file for a mistake" or something
TypeError: LomapAtomMapper.__init__() got an unexpected keyword argument 'shifts'

@mikemhenry
Copy link
Contributor

KeyError: "Bad mapper choice: 'kartografatommapper'"

KartografAtomMapper doesn't look like it works, but in this test it looks like it is an option? https://github.com/OpenFreeEnergy/openfe/pull/583/files#diff-486ee2538adfaddbe02f50989f8ac44fd18054d3fce3cd1a8dfc9cfcf3371b9dR24

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple thoughts, nothing blocking.

openfecli/commands/plan_rbfe_network.py Show resolved Hide resolved
if opt and opt.mapper:
mapper_choices = {
'lomap': LomapAtomMapper,
'lomapatommapper': LomapAtomMapper,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of having two choices here? Would it not be safer to just support the one thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just aliases to make finding options easier


# convert normalised inputs to objects
if opt and opt.mapper:
mapper_choices = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for here, but later - it'd be great if we could define this as a dictionary somewhere else, that way we can just import that (I'm thinking about expanding this to include perses, kartograf, etc..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also won't be known ahead of time, more likely this is more a EAFP situation

mapper_obj = cls(**opt.mapper.settings)
else:
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False,
max3d=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate issue: @hannahbaumann did bring up the fact that max3d=1 is not the default for LomapAtomMapper - we don't know about element_charge, but max3d really should change to match this (it's real problematic otherwise).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should ideally be up to date with our best practices, so sounds like we can patch this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra = 'allow'
anystr_lower = True

method: Optional[str] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than validating later on, could we just use a validator here for the allowed method names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed this live, it won't be possible to have a list of possible methods a priori

@richardjgowers
Copy link
Contributor Author

? add saving of argv to alchemical network creation

Copy link
Contributor

@mikemhenry mikemhenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an MVP, this is ready, I think we need to start using it raising issues as we find them

@richardjgowers richardjgowers merged commit c96cb57 into main Nov 17, 2023
6 of 8 checks passed
@richardjgowers richardjgowers deleted the cli_yaml_draft branch November 17, 2023 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants