Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yaml format for CLI input #583

Merged
merged 18 commits into from
Nov 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ dependencies:
- pytest-cov
- pytest-rerunfailures
- pydantic >1
- pyyaml
- coverage
- cinnabar ==0.3.0
- openff-toolkit>=0.13.0
Expand Down
37 changes: 18 additions & 19 deletions openfecli/commands/plan_rbfe_network.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from openfecli.utils import write, print_duration
from openfecli import OFECommandPlugin
from openfecli.parameters import (
MOL_DIR, PROTEIN, MAPPER, OUTPUT_DIR, COFACTORS,
MOL_DIR, PROTEIN, MAPPER, OUTPUT_DIR, COFACTORS, YAML_OPTIONS,
)
from openfecli.plan_alchemical_networks_utils import plan_alchemical_network_output

Expand All @@ -24,8 +24,8 @@ def plan_rbfe_network_main(

Parameters
----------
mapper : LigandAtomMapper
the mapper to use to generate the mapping
mapper : list[LigandAtomMapper]
list of mappers to use to generate the mapping
mapping_scorer : Callable
scorer, that evaluates the generated mappings
ligand_network_planner : Callable
Expand All @@ -51,7 +51,7 @@ def plan_rbfe_network_main(
)

network_planner = RBFEAlchemicalNetworkPlanner(
mappers=[mapper],
mappers=mapper,
mapping_scorer=mapping_scorer,
ligand_network_planner=ligand_network_planner,
)
Expand All @@ -78,13 +78,19 @@ def plan_rbfe_network_main(
@COFACTORS.parameter(
multiple=True, required=False, default=None, help=COFACTORS.kwargs["help"]
)
@YAML_OPTIONS.parameter(
multiple=False, required=False, default=None,
help=YAML_OPTIONS.kwargs["help"],
)
Comment on lines +81 to +84
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the same:

Suggested change
@YAML_OPTIONS.parameter(
multiple=False, required=False, default=None,
help=YAML_OPTIONS.kwargs["help"],
)
@YAML_OPTIONS.parameter()

You don't need to explicitly specify default behavior. The purpose of other things changing help (for example) is to re-use most of the behavior of the parameter (or even help string) and add something else to clarify the use in this specific case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm (we're?) not familiar with the defaults, so being explicit here is useful to know what's happening

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to actually be explicit, please list all standard defaults for click.Option and click.Parameter, except for those overridden in YAML_OPTIONS. Personally, I think that's way too much information, so I don't think that being explicit is useful here.

In particular, please do not explicitly list anything that is just repeated from YAML_OPTIONS, such as help=YAML_OPTIONS["help"]. This is not informative to a reader. This got into our codebase because someone copied it from this early code, and decided they didn't need the modification to the help. But as you can see from that code, the purpose of including it is adjust the default help for the parameter.

@OUTPUT_DIR.parameter(
help=OUTPUT_DIR.kwargs["help"] + " Defaults to `./alchemicalNetwork`.",
default="alchemicalNetwork",
)
@print_duration
def plan_rbfe_network(
molecules: List[str], protein: str, cofactors: tuple[str], output_dir: str
molecules: List[str], protein: str, cofactors: tuple[str],
yaml_settings: str,
output_dir: str,
):
"""
Plan a relative binding free energy network, saved as JSON files for
Expand Down Expand Up @@ -115,15 +121,6 @@ def plan_rbfe_network(

write("Parsing in Files: ")

from gufe import SolventComponent
from openfe.setup.atom_mapping.lomap_scorers import (
default_lomap_score,
)
from openfe.setup import LomapAtomMapper
from openfe.setup.ligand_network_planning import (
generate_minimal_spanning_network,
)

# INPUT
write("\tGot input: ")

Expand All @@ -142,27 +139,29 @@ def plan_rbfe_network(
cofactors = []
write("\t\tCofactors: " + str(cofactors))

solvent = SolventComponent()
yaml_options = YAML_OPTIONS.get(yaml_settings)
IAlibay marked this conversation as resolved.
Show resolved Hide resolved
mapper_obj = yaml_options.mapper
mapping_scorer = yaml_options.scorer
ligand_network_planner = yaml_options.ligand_network_planner
solvent = yaml_options.solvent

write("\t\tSolvent: " + str(solvent))
write("")

write("Using Options:")
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False, max3d=1)
write("\tMapper: " + str(mapper_obj))

# TODO: write nice parameter
mapping_scorer = default_lomap_score
write("\tMapping Scorer: " + str(mapping_scorer))

# TODO: write nice parameter
ligand_network_planner = generate_minimal_spanning_network
write("\tNetworker: " + str(ligand_network_planner))
write("")

# DO
write("Planning RBFE-Campaign:")
alchemical_network, ligand_network = plan_rbfe_network_main(
mapper=mapper_obj,
mapper=[mapper_obj],
mapping_scorer=mapping_scorer,
ligand_network_planner=ligand_network_planner,
small_molecules=small_molecules,
Expand Down
35 changes: 16 additions & 19 deletions openfecli/commands/plan_rhfe_network.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from openfecli.utils import write, print_duration
from openfecli import OFECommandPlugin
from openfecli.parameters import (
MOL_DIR, MAPPER, OUTPUT_DIR,
MOL_DIR, MAPPER, OUTPUT_DIR, YAML_OPTIONS,
)
from openfecli.plan_alchemical_networks_utils import plan_alchemical_network_output

Expand All @@ -21,8 +21,8 @@ def plan_rhfe_network_main(

Parameters
----------
mapper : LigandAtomMapper
the mapper to use to generate the mapping
mapper : list[LigandAtomMapper]
list of mappers to use to generate the mapping
mapping_scorer : Callable
scorer, that evaluates the generated mappings
ligand_network_planner : Callable
Expand All @@ -43,7 +43,7 @@ def plan_rhfe_network_main(
)

network_planner = RHFEAlchemicalNetworkPlanner(
mappers=[mapper],
mappers=mapper,
mapping_scorer=mapping_scorer,
ligand_network_planner=ligand_network_planner,
)
Expand All @@ -64,12 +64,16 @@ def plan_rhfe_network_main(
@MOL_DIR.parameter(
required=True, help=MOL_DIR.kwargs["help"] + " Any number of sdf paths."
)
@YAML_OPTIONS.parameter(
multiple=False, required=False, default=None,
help=YAML_OPTIONS.kwargs["help"],
)
Comment on lines +67 to +70
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see duplicate in plan_rbfe_network

@OUTPUT_DIR.parameter(
help=OUTPUT_DIR.kwargs["help"] + " Defaults to `./alchemicalNetwork`.",
default="alchemicalNetwork",
)
@print_duration
def plan_rhfe_network(molecules: List[str], output_dir: str):
def plan_rhfe_network(molecules: List[str], yaml_settings: str, output_dir: str):
"""
Plan a relative hydration free energy network, saved as JSON files for
the quickrun command.
Expand Down Expand Up @@ -102,15 +106,6 @@ def plan_rhfe_network(molecules: List[str], output_dir: str):

write("Parsing in Files: ")

from gufe import SolventComponent
from openfe.setup.atom_mapping.lomap_scorers import (
default_lomap_score,
)
from openfe.setup import LomapAtomMapper
from openfe.setup.ligand_network_planning import (
generate_minimal_spanning_network,
)

# INPUT
write("\tGot input: ")

Expand All @@ -120,27 +115,29 @@ def plan_rhfe_network(molecules: List[str], output_dir: str):
+ " ".join([str(sm) for sm in small_molecules])
)

solvent = SolventComponent()
yaml_options = YAML_OPTIONS.get(yaml_settings)
mapper_obj = yaml_options.mapper
mapping_scorer = yaml_options.scorer
ligand_network_planner = yaml_options.ligand_network_planner
solvent = yaml_options.solvent

write("\t\tSolvent: " + str(solvent))
write("")

write("Using Options:")
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False, max3d=1)
write("\tMapper: " + str(mapper_obj))

# TODO: write nice parameter
mapping_scorer = default_lomap_score
write("\tMapping Scorer: " + str(mapping_scorer))

# TODO: write nice parameter
ligand_network_planner = generate_minimal_spanning_network
write("\tNetworker: " + str(ligand_network_planner))
write("")

# DO
write("Planning RHFE-Campaign:")
alchemical_network, ligand_network = plan_rhfe_network_main(
mapper=mapper_obj,
mapper=[mapper_obj],
mapping_scorer=mapping_scorer,
ligand_network_planner=ligand_network_planner,
small_molecules=small_molecules,
Expand Down
1 change: 1 addition & 0 deletions openfecli/parameters/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,4 @@
from .output_dir import OUTPUT_DIR
from .protein import PROTEIN
from .molecules import MOL_DIR, COFACTORS
from .plan_network_options import YAML_OPTIONS
179 changes: 179 additions & 0 deletions openfecli/parameters/plan_network_options.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# This code is part of OpenFE and is licensed under the MIT license.
# For details, see https://github.com/OpenFreeEnergy/openfe
"""Pydantic models for the definition of advanced CLI options

"""
import click
from collections import namedtuple
try:
# todo; once we're fully v2, we can use ConfigDict not nested class
from pydantic.v1 import BaseModel # , ConfigDict
except ImportError:
from pydantic import BaseModel
from plugcli.params import Option
from typing import Any, Optional
import yaml
richardjgowers marked this conversation as resolved.
Show resolved Hide resolved
import warnings


PlanNetworkOptions = namedtuple('PlanNetworkOptions',
['mapper', 'scorer',
'ligand_network_planner', 'solvent'])


class MapperSelection(BaseModel):
# model_config = ConfigDict(extra='allow', str_to_lower=True)
class Config:
extra = 'allow'
anystr_lower = True

method: Optional[str] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than validating later on, could we just use a validator here for the allowed method names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed this live, it won't be possible to have a list of possible methods a priori

settings: dict[str, Any] = {}


class NetworkSelection(BaseModel):
# model_config = ConfigDict(extra='allow', str_to_lower=True)
class Config:
extra = 'allow'
anystr_lower = True

method: Optional[str] = None
settings: dict[str, Any] = {}


class CliYaml(BaseModel):
# model_config = ConfigDict(extra='allow')
class Config:
extra = 'allow'

mapper: Optional[MapperSelection] = None
network: Optional[NetworkSelection] = None


def parse_yaml_planner_options(contents: str) -> CliYaml:
"""Parse and minimally validate a user provided yaml

Parameters
----------
contents : str
raw yaml formatted input to parse

Returns
-------
options : CliOptions
will have keys for mapper and network topology choices

Raises
------
ValueError
for any malformed inputs
"""
raw = yaml.safe_load(contents)

if False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be good to have and also helps with typos in keys

# todo: warnings about extra fields we don't expect?
expected = {'mapper', 'network'}
for field in raw:
if field in expected:
continue
warnings.warn(f"Ignoring unexpected section: '{field}'")

return CliYaml(**raw)


def load_yaml_planner_options(path: Optional[str], context) -> PlanNetworkOptions:
"""Load cli options from yaml file path and resolve these to objects

Parameters
----------
path : str
path to the yaml file
context
unused

Returns
-------
PlanNetworkOptions : namedtuple
a namedtuple with fields 'mapper', 'scorer', 'network_planning_algorithm',
and 'solvent' fields.
these fields each hold appropriate objects ready for use
"""
from gufe import SolventComponent
from openfe.setup.ligand_network_planning import (
generate_radial_network,
generate_minimal_spanning_network,
generate_maximal_network,
generate_minimal_redundant_network,
)
from openfe.setup import (
LomapAtomMapper,
)
from openfe.setup.atom_mapping.lomap_scorers import (
default_lomap_score,
)
from functools import partial

if path is not None:
with open(path, 'r') as f:
raw = f.read()

# convert raw yaml to normalised pydantic model
opt = parse_yaml_planner_options(raw)
else:
opt = None

# convert normalised inputs to objects
if opt and opt.mapper:
mapper_choices = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for here, but later - it'd be great if we could define this as a dictionary somewhere else, that way we can just import that (I'm thinking about expanding this to include perses, kartograf, etc..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also won't be known ahead of time, more likely this is more a EAFP situation

'lomap': LomapAtomMapper,
'lomapatommapper': LomapAtomMapper,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of having two choices here? Would it not be safer to just support the one thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just aliases to make finding options easier

}

try:
cls = mapper_choices[opt.mapper.method]
except KeyError:
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add

Suggested change
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}'")
raise KeyError(f"Bad mapper choice: '{opt.mapper.method}' Allowed choices are {mapper_choices}")

Or something like that

mapper_obj = cls(**opt.mapper.settings)
else:
mapper_obj = LomapAtomMapper(time=20, threed=True, element_change=False,
max3d=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate issue: @hannahbaumann did bring up the fact that max3d=1 is not the default for LomapAtomMapper - we don't know about element_charge, but max3d really should change to match this (it's real problematic otherwise).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should ideally be up to date with our best practices, so sounds like we can patch this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


# todo: choice of scorer goes here
mapping_scorer = default_lomap_score

if opt and opt.network:
network_choices = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dwhswenson avert your eyes! This is the hacky job I think will initially serve us. I think having this dict-like thing being eventually much more clever is what you're envisaging, this is as far as I'm taking it.

'generate_radial_network': generate_radial_network,
'radial': generate_radial_network,
'generate_minimal_spanning_network': generate_minimal_spanning_network,
'mst': generate_minimal_spanning_network,
'generate_minimal_redundant_network': generate_minimal_redundant_network,
'generate_maximal_network': generate_maximal_network,
}

try:
func = network_choices[opt.network.method]
except KeyError:
raise KeyError(f"Bad network algorithm choice: '{opt.network.method}'")

ligand_network_planner = partial(func, **opt.network.settings)
else:
ligand_network_planner = generate_minimal_spanning_network

# todo: choice of solvent goes here
solvent = SolventComponent()

return PlanNetworkOptions(
mapper_obj,
mapping_scorer,
ligand_network_planner,
solvent,
)


YAML_OPTIONS = Option(
'-s', "--settings", "yaml_settings",
type=click.Path(exists=True, dir_okay=False),
help="Path to planning settings yaml file.",
getter=load_yaml_planner_options,
)
Loading