Looping operations and subgraph management #205

Open
eirrgang opened this issue Dec 6, 2018 · 11 comments
Labels
enhancement, gmxapi


eirrgang commented Dec 6, 2018

relates to #190
relates to #84

Control flow vs. data flow

Branching and conditional logic could be handled in gmxapi as data flow, either by

  1. allowing operations to rewrite the work graph,
  2. adding API operations to cancel operations or set their stop conditions, or
  3. wrapping operations in other operations (either intercepting their inputs and outputs to allow data flow conditions to be satisfied before they are eligible for execution, or through nested work graphs)

We already do a bit of (2) with the simulation stop condition that can be issued by an MD plugin. We have not needed to allow operations to rewrite their encapsulating work graph (1) yet, and to do so would make scheduling and execution graph management harder, so I think we should avoid that if we can. But allowing rewrite or optional execution of nested work graphs seems straightforward.

Looping operations, like for or while, map well to nested work graphs, conceptually. Since work graphs are representable in a data structure that is compatible with the generic params dictionary in workspec 1, we could put entire work graphs in the parameters for new control operations. The representation would need to be updated for workspec 2, but we should be able to design a user interface that stays the same.
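
As a sketch of what that could look like in a workspec-1-style data structure (the layout below only approximates workspec 1, and the 'while' operation and its params are hypothetical):

nested_body = {
    'version': 'gmxapi_workspec_0_1',
    'elements': {
        'modified_input': {'namespace': 'gmxapi', 'operation': 'modify_input', 'params': {}},
        'md': {'namespace': 'gmxapi', 'operation': 'md', 'depends': ['modified_input']}
    }
}

# A control operation whose generic params dictionary carries an entire
# nested work graph as plain data.
loop_element = {
    'namespace': 'gmxapi',
    'operation': 'while',
    'params': {
        'condition': 'is_converged',  # reference to a boolean output port
        'body': nested_body
    }
}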

eirrgang added the enhancement and gmxapi labels on Dec 6, 2018

eirrgang commented Dec 7, 2018

Peter had offered the following pseudocode.

init_subgraph(inputs=gmx.load_tpr([file1, file2]))
do {
    modified_inputs = gmx_alter_parms(inputs, ...)
    md = gmx.md_run(modified_inputs)
    cluster = gmx.command_line('cluster', inputs = ...)
}
while(test=gmx.external('is_converged', cluster), next_input=cluster.output.conformations)

I think the intention is to run a bunch of simulations, find the most distinct conformations, then use those (the '-clndx' or '-cl' option) to launch another set of simulations, repeating until something about the gmx cluster command converges (maybe the number of conformations, or a user-provided analysis of cluster characteristics such as RMSD eigenvectors).


eirrgang commented Dec 7, 2018

We can do clever tricks to make Python for loops turn into gmxapi operations, but it might not be worthwhile. do...while would not be possible to automatically translate into gmxapi operations for deferred execution. I think we can assume operations like gmx.for, gmx.foreach, gmx.while, and, possibly, gmx.do_while.
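One wrinkle: for and while are reserved words in Python, so attributes literally spelled gmx.for and gmx.while would need an alternate spelling (e.g. gmx.while_loop) or getattr() access. A minimal sketch of the eager-versus-deferred distinction, using gmx.while_loop as a hypothetical spelling and stand-in names for the user code:

# Native Python: the loop body executes immediately, so there is
# nothing left to defer or schedule.
while not is_converged():        # is_converged(), run_iteration(): user code
    run_iteration()

# Operation form: nothing executes until gmx.run(). The condition and
# the loop body are just nodes recorded in the work graph.
my_loop = gmx.while_loop(condition, subgraph)  # hypothetical spelling
gmx.run()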

Proposed syntax for Peter's example, assuming all of the frames from all of the trajectories are clustered. I don't have a clear notion of how an operation might change width during execution, so assume we run the same number of simulations in each loop, and that we have enough clusters to do so.

# Add a TPR-loading operation to the default work graph (initially empty) that produces 
# the standard simulation input data bundle (parameters, structure, topology, state)
initial_input = gmx.load_tpr([file1, file2])

# Get a placeholder object that can serve as a sub context / work graph owner,
# and which can be used in a control operation.
subgraph = gmx.subgraph(input={'conformation': initial_input})

# As an alternative to specifying the context or graph in each call, intuiting the graph,
# or requiring the user to manage globally switching, we could use a context manager.
with subgraph:
    modified_input = gmx.modify_input(input=initial_input, structure=subgraph.input.conformation)
    md = gmx.mdrun(input=modified_input)
    # Assume the existence of a more integrated gmx.trajcat operation
    cluster = gmx.command_line('gmx', 'cluster', input=gmx.reduce(gmx.trajcat, md.output.trajectory))
    condition = mymodule.cluster_analyzer(input=cluster.output.file["-ev"])
    subgraph.next_input.conformation = cluster.output.conformation

# In the default work graph, add a node that depends on `condition` and wraps subgraph.
# It makes sense that a convergence-checking operation is initialized such that
# `is_converged() == False`
my_loop = gmx.while(gmx.logical_not(condition.is_converged), subgraph)
gmx.run()

I think we should consider what this would look like in TensorFlow, though. The subgraph would be a "layer" and the inputs / next_input would be Variable(s). Along with some aspect of the cluster output, they could be among the special Variables that participate in automatic updating, learning, etc.
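
For reference, the closest TensorFlow construct is tf.while_loop, which takes the condition and body as callables over loop-carried variables (the example follows the TensorFlow documentation; the mapping onto gmxapi concepts is speculative):

import tensorflow as tf

# cond plays the role of gmx.logical_not(condition.is_converged);
# body plays the role of one pass through the subgraph, returning the
# next iteration's loop variables (cf. subgraph.next_input).
i = tf.constant(0)
cond = lambda i: tf.less(i, 10)
body = lambda i: (tf.add(i, 1),)
result = tf.while_loop(cond, body, [i])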


peterkasson commented Dec 7, 2018

Suggest a modification as follows:

subgraph = gmx.subgraph(input={'conformation': initial_input})

# As an alternative to specifying the context or graph in each call, intuiting the graph,
# or requiring the user to manage globally switching, we could use a context manager.
with subgraph:
    modified_input = gmx.modify_input(input=initial_input, structure=subgraph.input.conformation)
    md = gmx.mdrun(input=modified_input)
    # Assume the existence of a more integrated gmx.trajcat operation
    cluster = gmx.command_line('gmx', 'cluster', input=gmx.reduce(gmx.trajcat, md.output.trajectory))
    condition = mymodule.cluster_analyzer(input=cluster.output.file["-ev"])
    subgraph.output.conformation = cluster.output.conformation

# In the default work graph, add a node that depends on `condition` and wraps subgraph.
# It makes sense that a convergence-checking operation is initialized such that
# `is_converged() == False`
my_loop = gmx.while(gmx.logical_not(condition.is_converged), subgraph)
gmx.run()


eirrgang commented Dec 7, 2018

In the above comment from @peterkasson, it is up to the implementation of gmx.while to map the output of subgraph on one iteration to the input of subgraph on the next.
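
A plain-Python sketch of that contract (illustrative of what a gmx.while executor might do at run time, not user-facing API):

def execute_while(condition, subgraph, initial_state):
    """Run subgraph repeatedly, feeding each iteration's declared
    outputs back in as the next iteration's inputs."""
    state = initial_state
    while not condition(state):
        # subgraph.output from iteration N becomes subgraph.input for N+1.
        state = subgraph.run(input=state)
    return state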


jmhays commented Dec 10, 2018

Question about step number next:

Say we were doing adaptive MSMs, weighting by uncertainty rather than restarting uniformly across ensemble members. Then we would not say subgraph.output.conformation = cluster.output.conformation. Instead, the cluster_analyzer would need to return a gmxapi-recognizable conformation object? Am I understanding this properly?

eirrgang commented:

Clarification to @jmhays' comment: mymodule.cluster_analyzer is simultaneously the name of an operation that will appear in the work graph and something that can be imported at runtime to make that operation happen. In a Python module, that means it is a helper function that returns an object compatible with a gmxapi "Operation" (a concept we are still trying to specify clearly, but which includes a notion of named input and/or output "ports").

For convenience, I have been neglecting the additional layer of input or output attribute when it is redundant (e.g. gmx.fileio._SimulationInput), because it is easy to check for, say, object.output.parameters and then object.parameters. But maybe that is sloppy and unclear, in which case the example line my_loop = gmx.while(gmx.logical_not(condition.is_converged), subgraph) would more appropriately be my_loop = gmx.while(gmx.logical_not(condition.output.is_converged), subgraph).

Comment: I don't think we should allow an assignment of the form subgraph.output.conformation = cluster.output.conformation unless we have already declared that there is an output called conformation, and (for future compatibility) allowed a declaration of what sort of data lives there.

Maybe replace with subgraph.add_output(conformation=cluster.output.conformation) or subgraph.add_output({'conformation': cluster.output.conformation})
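
For example, a declare-then-bind version of the subgraph above might look like the following (the output declaration and add_output are the proposals from this comment, not an existing interface):

# Declare the output port (and eventually its data type) up front...
subgraph = gmx.subgraph(input={'conformation': initial_input},
                        output=['conformation'])

with subgraph:
    # ...so that binding to an undeclared output name can be rejected here.
    subgraph.add_output(conformation=cluster.output.conformation)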

eirrgang commented:

I think the answer to "Question about step number next" is that cluster_analyzer would be a helper function in a mymodule.py that produces an Operation (named "cluster_analyzer" in namespace "mdmodule") with an output named conformation, which is a gmxapi-compatible attribute in the Python world, meaning it can be used as an input to another operation or extract()ed in the Python script.

A raw implementation of such an Operation and helper function will be clearer as I finish up #85, but the plan ought to be to make it easy to generate appropriate wrappers. This is the set of updates we've been discussing for the C++ plugin development environment, plus probably a Python superclass or helper function that would let you declare (a) something stateful, (b) named input hooks, and (c) named output hooks, and connect them with whatever calculation you need.
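
A very rough sketch of the shape such a helper could take (the gmx.operation decorator and all port-declaration names are invented placeholders, pending #85):

import gmx  # assumes a future gmx.operation helper; purely illustrative

@gmx.operation(input_ports=['eigenvalue_file'],
               output_ports=['is_converged'])
class ClusterAnalyzer:
    def __init__(self, tolerance=0.01):
        self._tolerance = tolerance   # (a) something stateful

    def _analyze(self, eigenvalue_file):
        ...  # relative-entropy calculation would go here

    def run(self, eigenvalue_file):   # (b) named input hook
        relative_entropy = self._analyze(eigenvalue_file)
        # (c) named output hook, published to the work graph
        return {'is_converged': relative_entropy < self._tolerance}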

I'll follow up with a gist or something for a possible example.


jmhays commented Dec 10, 2018

Here Eric, let's try this. I'll write the code and you can tell me what I'm doing wrong.

import gmx
# other imports

class cluster_analyzer(gmx_analyzer):
    """Analyzes results of the gmx cluster command line tool."""

    def __init__(self, input: list):
        super().__init__()
        self.input = input  # List of files to do things with

    def other_function(self):
        return other_stuff

    def is_converged(self):
        """Calculates relative entropy of two MSMs."""
        # TODO: Relative entropy calc'n
        return relative_entropy < tolerance


eirrgang commented Dec 10, 2018

I assume there is a typo in your init definition.

I was thinking that mymodule.cluster_analyzer would have to be a function to have the flexible usage we want (both something clear to use in a Python script and an importable factory function to use in the implementation details of the Session launcher).

So the simplest change to what you've written would probably be to define a class and use a helper from the gmx package to make a compatible wrapper. Something like
https://gist.github.com/eirrgang/0d975eb279fce21f59aa29de2b1316f2

I don't know how quickly I can come up with a plausible from-scratch implementation, since I still haven't finished designing the solutions to #85 or #190.

The trick is that we need myplugin.cluster_analyzer(...) to produce an object with an is_converged attribute (or a nested output.is_converged attribute) that can serve as a placeholder when building a work graph, but which has a clear means of describing to gmxapi what happens when the work graph is executed. Ultimately, this means it can express a data type, a unique (deterministic) identity, and a mapping to a getter of some sort. For example, in #86 I added gmx.fileio.read_tpr, which provides runtime getters for the output data, but in #85 I need to add a gmx.read_tpr thingy that mirrors gmx.fileio.read_tpr for the work graph manipulation.
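
Those requirements amount to a small protocol, sketched here with invented names (nothing in gmxapi currently spells it this way):

class OutputPlaceholder:
    """Graph-building proxy for a result that does not exist yet.

    Carries enough information to build and schedule the work graph
    (a data type and a deterministic identity), plus a way to fetch
    the value once the graph has executed (a runtime getter).
    """
    def __init__(self, dtype, uid, getter):
        self.dtype = dtype      # declared type of the eventual data
        self.uid = uid          # unique, deterministic identity
        self._getter = getter   # bound to real data when the session launches

    def extract(self):
        return self._getter()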

eirrgang commented:

I don't think this is what you're looking for, but just as a reminder: the way gmxapi operations are currently defined and launched is by creating a gmx.workflow.WorkElement and by defining a corresponding operation in the Context itself. I've been working on making this more modular for a while, but it is an ongoing process. Right now, operations like 'md' and 'load_tpr' are defined in a map ( https://github.com/kassonlab/gmxapi/blob/master/src/gmx/context.py#L698 ) that is initialized with some static internal function mappings ( https://github.com/kassonlab/gmxapi/blob/master/src/gmx/context.py#L26 and https://github.com/kassonlab/gmxapi/blob/master/src/gmx/context.py#L59 ), is extensible with Context.add_operation, and has a protocol for automatic extensibility in Context.__enter__ ( https://github.com/kassonlab/gmxapi/blob/master/src/gmx/context.py#L761 ). Solidifying these protocols is the subject of several issues, particularly in
https://github.com/kassonlab/gmxapi/milestone/3

So that is how to add Python modules to gmxapi in 0.0.7, but it is not what I want for #190 or by the time #205 is resolved.
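
For concreteness, the 0.0.7-era registration described above looks roughly like the following (signatures are approximated from the linked context.py and workflow.py; check the source for the exact protocol):

import gmx
from gmx import workflow

# A WorkElement names its operation, namespace, dependencies, and params;
# the Context maps (namespace, operation) to a builder when it translates
# the work graph for execution.
element = workflow.WorkElement(namespace='myplugin',
                               operation='cluster_analyzer',
                               depends=('cluster',),
                               params={'tolerance': 0.01})
element.name = 'convergence_check'

# Given a Context instance `context` and a user-supplied callable
# `get_builder` that receives the element and returns whatever the
# Context uses to construct the graph node:
context.add_operation('myplugin', 'cluster_analyzer', get_builder)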

eirrgang commented:

Refinements:

We would like to use Python context managers (with blocks) as the primary shorthand for manipulating operations in the scope of a subgraph. We want to encourage or require syntax that clearly indicates this scope and limits access to subgraph components that are not explicitly exposed.

We can use TensorFlow Variable scoping as an example, but we might find more user-friendly syntax.

  • All operations should allow a user-provided label string to be specified for later retrieval from the graph or subgraph.
  • The Python context manager __exit__ can't del Python variables assigned in the with block, but we can get the same effect by making the reference raise an exception if accessed from outside of the graph. Note: del would undefine a named variable so that a later access raises NameError, which doesn't sound quite right for our case. I propose gmx.exceptions.ScopeError for messages like "Out of scope reference: object 'foo.spam' referenced while currently active subgraph is 'bar'" (see the sketch after this list).
  • We encourage access with explicit scoped naming rather than direct Python references captured from function return values. We can either use generated attributes (e.g. mysubgraph.namednode.output) or getters (like TensorFlow variables, e.g. mysubgraph.get('namednode').output).
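
A self-contained sketch of the scoping behavior proposed above (Subgraph, NodeRef, and ScopeError are illustrative, not gmxapi API):

class ScopeError(Exception):
    """Out-of-scope reference to a subgraph-internal object."""

class NodeRef:
    """Handle for an operation created inside a subgraph."""
    def __init__(self, graph, name):
        self._graph = graph
        self._name = name

    @property
    def output(self):
        # Instead of del-ing names at __exit__, poison later access.
        if not self._graph.active:
            raise ScopeError(
                "Out of scope reference: object '{}' referenced while "
                "subgraph '{}' is not active".format(self._name, self._graph.label))
        return object()  # placeholder for the node's output ports

class Subgraph:
    def __init__(self, label):
        self.label = label
        self.active = False
        self._nodes = {}

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, *exc_info):
        self.active = False

    def get(self, name):
        # Explicit scoped access, in the spirit of TensorFlow getters.
        return self._nodes[name]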

eirrgang added this to the gmxapi_workspec_0_2 milestone on Sep 30, 2019