Yaml format for CLI input #583

richardjgowers · 2023-10-24T16:55:51Z

[skip ci]

addresses #579

draft of yaml format for CI

There's a couple of questions for how we structure things in here, but it's mostly been decided in various discussion

draft of yaml format for CI

IAlibay · 2023-10-24T16:58:56Z

openfecli/commands/input.yaml

+  scorer: default_lomap_score
+
+# Question:
+# use settings here, to match Protocol


I like the "settings" nest, since a) allows for further expansion for non-settings keys if they are ever needed, b) it's just visually pleasing to look at.

Not a super strong vote though.

+1 on settings nesting

mikemhenry · 2023-10-24T17:09:08Z

openfecli/commands/input.yaml

+# things behind a "method" key have to match up to a known thing in openfe
+
+network:
+  method: plan_radial_network


Are these going to be required? Like will there be some defaults if it is blank, or do we require some (or all) to have values?

I think everything in the file is optional. It's just to augment the command line plan_rbfe_network.

mikemhenry · 2023-10-24T17:15:42Z

openfecli/commands/input.yaml

+# Question:
+# is mapper*s* always a list and mapper always a single item
+# or do we allow mapper to define either a list or single item behind it?
+


If there is more than one mapper, how are things divided up? Like if there are 2 mappers, do we just create 2 mappings for each leg? or do you control which leg gets which mapper (if this is out of scope for what you are doing, then ignore).

I think I like:

mappers: - method: LomapAtomMapper settings: timeout: 60

and

mappers: - method: LomapAtomMapper settings: timeout: 60 - method: LomapAtomMapper++ settings: timeout: 60

so that we either have a list or a single one, but it always uses the mappers key even if there is one

The way multiple mappers works is already define in the plan_radial_network etc functions. see: https://github.com/OpenFreeEnergy/openfe/blob/main/openfe/setup/ligand_network_planning.py#L99

It's more aesthetically, is the spec that we allow Union[AtomMapper, list[AtomMapper]] to mapper? This would match the above signature I guess

dwhswenson

To be clear: this appears to be a format only for the basic RBFE/RHFE network planners. It looks like this design will not support more complicated alchemical network planners (e.g., something that combines RBFE and ABFE). This isn't necessarily a problem, but I'm just making sure we're conscious that this is solving a subset of the problem space, and we might need to revisit it (but that's an OpenFE 3.0 problem).

My main thought to consider for the future: A user can create a JSON Schema based on their installed plugins (once we have plugins) that represents the allowed input structure. This would be useful because it will allow in-editor autocomplete and validation (that's a vim plugin, but many text editors support this). So they get a bunch of usability for free, if we can output a JSON schema. Therefore I'd recommend approaches that are more friendly to JSON schema output.

dwhswenson · 2023-10-24T17:17:35Z

openfecli/commands/input.yaml

+  scorer: default_lomap_score
+
+# Question:
+# use settings here, to match Protocol


I honestly don't remember which approach was easier for a JSON Schema. I think the not nested version is a little easier: you define a single subschema with a literal for method and the appropriate list of settings keys. With nested settings, I think you have to define the subschema with the method literal and point another subschema for the settings. That would make documentation, e.g., as generated by sphinx-jsonschema, a little ugly. TBH, it is really just a question of which is easier with whatever tool we'll do to generate the JSON schema (do not go down the road of rolling your own).

dwhswenson · 2023-10-24T17:17:40Z

openfecli/commands/input.yaml

+    settings:
+      timeout: 120
+
+# OR not nesting settings, anything without the "method" key gets put into kwargs


Either one works from a JSON Schema standpoint. I'd vote to start by requiring it to be a list and add handling for non-list later, just as a matter of do the simple thing first.

fixes #580

pep8speaks · 2023-10-25T19:58:04Z

Hello @richardjgowers! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file openfecli/commands/plan_rhfe_network.py:

Line 76:80: E501 line too long (81 > 79 characters)

In the file openfecli/parameters/plan_network_options.py:

Line 84:80: E501 line too long (82 > 79 characters)
Line 97:80: E501 line too long (80 > 79 characters)
Line 138:80: E501 line too long (80 > 79 characters)
Line 148:80: E501 line too long (83 > 79 characters)
Line 150:80: E501 line too long (85 > 79 characters)
Line 157:80: E501 line too long (83 > 79 characters)

In the file openfecli/tests/commands/test_plan_rbfe_network.py:

Line 169:80: E501 line too long (83 > 79 characters)

In the file openfecli/tests/commands/test_plan_rhfe_network.py:

Line 122:80: E501 line too long (86 > 79 characters)

In the file openfecli/tests/parameters/test_plan_network_options.py:

Line 29:1: E302 expected 2 blank lines, found 1

Comment last updated at 2023-11-14 15:08:08 UTC

codecov · 2023-10-25T20:03:58Z

Codecov Report

Attention: 66 lines in your changes are missing coverage. Please review.

Comparison is base (7a39491) 92.21% compared to head (a1fb632) 91.43%.
Report is 4 commits behind head on main.

Files	Patch %	Lines
openfe/tests/protocols/test_openmm_afe_slow.py	25.39%	47 Missing ⚠️
...protocols/openmm_afe/equil_solvation_afe_method.py	60.00%	8 Missing ⚠️
openfe/protocols/openmm_rfe/equil_rfe_methods.py	57.14%	6 Missing ⚠️
openfecli/parameters/plan_network_options.py	93.75%	4 Missing ⚠️
openfe/protocols/openmm_afe/base.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #583      +/-   ##
==========================================
- Coverage   92.21%   91.43%   -0.78%     
==========================================
  Files         124      127       +3     
  Lines        8594     8594              
==========================================
- Hits         7925     7858      -67     
- Misses        669      736      +67

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

richardjgowers · 2023-10-25T20:17:48Z

openfecli/commands/plan_network_schema.py

+import warnings
+
+
+class MapperSelection(BaseModel):


@dwhswenson would defining pydantic models (which export to json schema) go towards making it easier for plugins to provide extensions?

I was definitely expecting to use pydantic (part of my "mistake I did in OPS CLI: I reinvented pydantic; let's not repeat that mistake").

That said, you'll need a model for, e.g, each AtomMapper, and then a separate model for MapperSelection, which is just mappers: list[LomapMapperModel | KartografMapperModel | ...], where LomapMapperModel looks like what you have now. (With the nested settings, you may need yet another model for, e.g., LomapMapperSettingsModel; this is part of why that makes the schema messier.)

In the later version, (1) we'd see what we can do to dynamically create the LomapMapperModel so devs don't have to repeat typing/docs; (2) we'd dynamically generate the MapperSelection model based on what plugins existed.

currently does nothing

richardjgowers · 2023-10-26T17:07:42Z

openfecli/parameters/plan_network_options.py

+
+        choices['mapper'] = cls(**opt.mapper.settings)
+    if opt.network:
+        network_choices = {


@dwhswenson avert your eyes! This is the hacky job I think will initially serve us. I think having this dict-like thing being eventually much more clever is what you're envisaging, this is as far as I'm taking it.

richardjgowers · 2023-10-30T15:55:36Z

hmm interesting, the failing test is on our pydantic 1 build. It's the normalisation to lower case

richardjgowers · 2023-11-01T17:12:29Z

Ok this is ready for review.

To try this out, you can create a settings.yaml file like below

mapper:
  method: LomapAtomMapper
  settings:
    time: 120

network:
  method: generate_minimal_redundant_network
  settings:
    mst_num: 3

Then run:

openfe plan-rbfe-network -p tyk2_protein.pdb -M tyk2_ligands.sdf -s settings.yaml

And this should have created a simulation plan with minimal redundant network selected edges

dwhswenson

Needs a number of minor changes even if we want to limit it to this functionality; see individual comments. Big things that are still missing:

A way to document the available options. This was sort of the big sticking point when we discussed this. It is not solved in any way at all here. An undocumented settings file is not particularly useful to users.
There's an assumption here that all user-supplied values are YAML/JSON basic types. This is true for the limited subset of functionality included in this PR, but it is going to break down very quickly once we move to, say, adding support for protocol settings. From this PR, it is unclear what the intended direction forward on that is.
There's no validation of the user input in the settings subdicts. This means that something that could easily be a user-understandable TypeError will lead to some unclear error at runtime. For example, give bad types to the LOMAP params and you end up with:

ArgumentError("Python argument types in\n rdkit.Chem.rdFMCS.FindMCS(list)\ndid not match C++ signature:\n FindMCS(boost::python::api::object mols, RDKit::PyMCSParameters {lvalue} parameters)\n FindMCS(boost::python::api::object mols, bool maximizeBonds=True, double threshold=1.0, unsigned int timeout=3600, bool verbose=False, bool matchValences=False, bool ringMatchesRingOnly=False, bool completeRingsOnly=False, bool matchChiralTag=False, RDKit::AtomComparator atomCompare=rdkit.Chem.rdFMCS.AtomCompare.CompareElements, RDKit::BondComparator bondCompare=rdkit.Chem.rdFMCS.BondCompare.CompareOrder, RDKit::RingComparator ringCompare=rdkit.Chem.rdFMCS.RingCompare.IgnoreRingFusion, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > seedSmarts='')")

That will not be a useful error for users.

dwhswenson · 2023-11-08T16:06:46Z

openfecli/commands/plan_rbfe_network.py

+@YAML_OPTIONS.parameter(
+    multiple=False, required=False, default=None,
+    help=YAML_OPTIONS.kwargs["help"],
+)


I think this should be the same:

Suggested change

@YAML_OPTIONS.parameter(

multiple=False, required=False, default=None,

help=YAML_OPTIONS.kwargs["help"],

)

@YAML_OPTIONS.parameter()

You don't need to explicitly specify default behavior. The purpose of other things changing help (for example) is to re-use most of the behavior of the parameter (or even help string) and add something else to clarify the use in this specific case.

I'm (we're?) not familiar with the defaults, so being explicit here is useful to know what's happening

If you want to actually be explicit, please list all standard defaults for click.Option and click.Parameter, except for those overridden in YAML_OPTIONS. Personally, I think that's way too much information, so I don't think that being explicit is useful here.

In particular, please do not explicitly list anything that is just repeated from YAML_OPTIONS, such as help=YAML_OPTIONS["help"]. This is not informative to a reader. This got into our codebase because someone copied it from this early code, and decided they didn't need the modification to the help. But as you can see from that code, the purpose of including it is adjust the default help for the parameter.

openfecli/commands/plan_rbfe_network.py

dwhswenson · 2023-11-08T16:17:35Z

openfecli/commands/plan_rbfe_network.py

+    if solvent is None:
+        solvent = SolventComponent()


Is there any code path in which solvent is not None? The (implied) else here is a meaningless code branch.

roughing out space for future option here

dwhswenson · 2023-11-08T16:18:28Z

openfecli/commands/plan_rhfe_network.py

+    mapper_obj = None
+    mapping_scorer = None
+    ligand_network_planner = None
+    solvent = None
+
+    if yaml_settings is not None:
+        yaml_options = YAML_OPTIONS.get(yaml_settings)
+        mapper_obj = yaml_options.get('mapper', None)
+        ligand_network_planner = yaml_options.get('network', None)
+
+    if mapper_obj is None:
+        mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False, max3d=1)
+    if mapping_scorer is None:
+        mapping_scorer = default_lomap_score
+    if ligand_network_planner is None:
+        ligand_network_planner = generate_minimal_spanning_network
+    if solvent is None:
+        solvent = SolventComponent()


See discussion of duplicate in plan_rbfe_network.

openfecli/parameters/plan_network_options.py

dwhswenson · 2023-11-08T16:53:21Z

openfecli/commands/plan_rbfe_network.py

+    if mapping_scorer is None:
+        mapping_scorer = default_lomap_score


There doesn't seem to be an option changing the scorer here?

this is roughing out space for it in the future

dwhswenson · 2023-11-08T17:04:44Z

openfecli/commands/plan_rhfe_network.py

+@YAML_OPTIONS.parameter(
+    multiple=False, required=False, default=None,
+    help=YAML_OPTIONS.kwargs["help"],
+)


see duplicate in plan_rbfe_network

richardjgowers · 2023-11-14T11:53:10Z

@dwhswenson your comments on (auto) documentation, non native yaml types and better validation/error messages are not in the MVP, so they're not going to make this PR

mikemhenry · 2023-11-14T13:57:11Z

openfecli/parameters/plan_network_options.py

+    """
+    raw = yaml.safe_load(contents)
+
+    if False:


I think this would be good to have and also helps with typos in keys

mikemhenry · 2023-11-14T13:58:25Z

openfecli/parameters/plan_network_options.py

+        try:
+            cls = mapper_choices[opt.mapper.method]
+        except KeyError:
+            raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")


It would be nice to add

Suggested change

raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")

raise KeyError(f"Bad mapper choice: '{opt.mapper.method}' Allowed choices are {mapper_choices}")

Or something like that

mikemhenry · 2023-11-14T14:01:47Z

If there is a typo/extra key added in mapper.settings the error message isn't awful and it looks like we just call lomap so I am not sure if we can improve the error message, but we could catch it and add a hint "check path_to_settings_file for a mistake" or something
TypeError: LomapAtomMapper.__init__() got an unexpected keyword argument 'shifts'

mikemhenry · 2023-11-14T14:07:33Z

KeyError: "Bad mapper choice: 'kartografatommapper'"

KartografAtomMapper doesn't look like it works, but in this test it looks like it is an option? https://github.com/OpenFreeEnergy/openfe/pull/583/files#diff-486ee2538adfaddbe02f50989f8ac44fd18054d3fce3cd1a8dfc9cfcf3371b9dR24

IAlibay

Couple thoughts, nothing blocking.

openfecli/commands/plan_rbfe_network.py

IAlibay · 2023-11-14T14:40:47Z

openfecli/parameters/plan_network_options.py

+    if opt and opt.mapper:
+        mapper_choices = {
+            'lomap': LomapAtomMapper,
+            'lomapatommapper': LomapAtomMapper,


What is the advantage of having two choices here? Would it not be safer to just support the one thing?

just aliases to make finding options easier

IAlibay · 2023-11-14T14:41:57Z

openfecli/parameters/plan_network_options.py

+
+    # convert normalised inputs to objects
+    if opt and opt.mapper:
+        mapper_choices = {


Not for here, but later - it'd be great if we could define this as a dictionary somewhere else, that way we can just import that (I'm thinking about expanding this to include perses, kartograf, etc..)

also won't be known ahead of time, more likely this is more a EAFP situation

IAlibay · 2023-11-14T14:43:03Z

openfecli/parameters/plan_network_options.py

+        mapper_obj = cls(**opt.mapper.settings)
+    else:
+        mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False,
+                                     max3d=1)


Separate issue: @hannahbaumann did bring up the fact that max3d=1 is not the default for LomapAtomMapper - we don't know about element_charge, but max3d really should change to match this (it's real problematic otherwise).

this should ideally be up to date with our best practices, so sounds like we can patch this

see: OpenFreeEnergy/Lomap#45

IAlibay · 2023-11-14T14:43:54Z

openfecli/parameters/plan_network_options.py

+        extra = 'allow'
+        anystr_lower = True
+
+    method: Optional[str] = None


Rather than validating later on, could we just use a validator here for the allowed method names?

discussed this live, it won't be possible to have a list of possible methods a priori

richardjgowers · 2023-11-14T15:16:04Z

? add saving of argv to alchemical network creation

mikemhenry

As an MVP, this is ready, I think we need to start using it raising issues as we find them

[skip ci]

637c7f6

draft of yaml format for CI

IAlibay reviewed Oct 24, 2023

View reviewed changes

mikemhenry reviewed Oct 24, 2023

View reviewed changes

dwhswenson reviewed Oct 24, 2023

View reviewed changes

adds yaml reader for cli options input

21a0df6

fixes #580

richardjgowers commented Oct 25, 2023

View reviewed changes

richardjgowers added 6 commits October 26, 2023 12:31

change plan_rXfe_network_main functions to expect list of mappers

17e5026

move settings yaml parsing to be parameter

d4e3261

add YAML_OPTIONS as parameter

4ef7ab1

add YAML_OPTIONS to plan_rbfe_network

7ad79c7

currently does nothing

cli: add interpretation of settings yaml

53fc5b1

cli: enable settings yaml reading in plan_rbfe_network

675623c

richardjgowers commented Oct 26, 2023

View reviewed changes

richardjgowers added 4 commits October 30, 2023 09:51

cli: some extra documentation on load_yaml

900e43b

cli: fix tests to account for normalisation

4ec89e0

cli: add test for custom rbfe yaml usage

912720a

cli: update plan_rhfe to allow yaml options

5f5d619

richardjgowers added 2 commits October 31, 2023 13:02

cli: use v1/2 compatible Pydantic

90bfae8

cli: fixup pydantic v1/2 compat

f78762f

richardjgowers changed the title ~~Draft: Yaml format for CLI input~~ Yaml format for CLI input Nov 1, 2023

remove draft yaml idea

aab5620

richardjgowers mentioned this pull request Nov 3, 2023

Plan rbfe network radial #488

Closed

dwhswenson requested changes Nov 9, 2023

View reviewed changes

cli: centralise parsing of yaml options and defaults

0d57408

richardjgowers force-pushed the cli_yaml_draft branch from 43ef282 to 0d57408 Compare November 14, 2023 11:30

add pyyaml to deps

16cae4b

mikemhenry reviewed Nov 14, 2023

View reviewed changes

IAlibay reviewed Nov 14, 2023

View reviewed changes

Merge branch 'main' into cli_yaml_draft

a1fb632

mikemhenry approved these changes Nov 14, 2023

View reviewed changes

richardjgowers merged commit c96cb57 into main Nov 17, 2023
6 of 8 checks passed

richardjgowers deleted the cli_yaml_draft branch November 17, 2023 10:56

This was referenced Nov 20, 2023

cli: plan: yaml format definition #579

Closed

cli: variable ligand planning alg #581

Closed

cli: interchageable mappers #582

Closed

cli: internal tool visibility #590

Closed

		if mapping_scorer is None:
		mapping_scorer = default_lomap_score

	raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")
	raise KeyError(f"Bad mapper choice: '{opt.mapper.method}' Allowed choices are {mapper_choices}")

Yaml format for CLI input #583

Yaml format for CLI input #583

Conversation

richardjgowers commented Oct 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwhswenson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Oct 25, 2023 • edited Loading

Comment last updated at 2023-11-14 15:08:08 UTC

codecov bot commented Oct 25, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

dwhswenson Oct 25, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richardjgowers commented Oct 30, 2023

richardjgowers commented Nov 1, 2023

dwhswenson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richardjgowers commented Nov 14, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikemhenry commented Nov 14, 2023

mikemhenry commented Nov 14, 2023

IAlibay left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richardjgowers commented Nov 14, 2023

mikemhenry left a comment

Choose a reason for hiding this comment

pep8speaks commented Oct 25, 2023 •

edited

Loading

codecov bot commented Oct 25, 2023 •

edited

Loading

dwhswenson Oct 25, 2023 •

edited

Loading