diff --git a/docs/contributing.rst b/docs/contributing.rst index 2b3c4cfcbe..9d05d14f03 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -9,4 +9,5 @@ Documentation for maintaining and contributing to this project. projectstructure layers/python layers/bindings + layers/workspec_schema_0_2 diff --git a/docs/layers/workspec_schema_0_2.rst b/docs/layers/workspec_schema_0_2.rst index 98bf7af8ac..273bf666b1 100644 --- a/docs/layers/workspec_schema_0_2.rst +++ b/docs/layers/workspec_schema_0_2.rst @@ -2,6 +2,26 @@ Work specification schema ========================= +.. contents:: + :local: + +Changes in second version +========================= + +- Use the term "work graph" instead of "work specification". + ``gmx.workflow.WorkSpec`` is replaced with an interface for a view to a work graph owned + by a Context instance. +- Schema version string changes from ``gmxapi_workspec_0_1`` to ``gmxapi_graph_0_2`` +- ``gmx.workflow.WorkElement`` class is replaced with an interface definition + for an instance of an Operation. Users no longer create objects representing + work directly. +- Deprecate work graph operations ``gmxapi.md`` and ``gromacs.load_tpr`` +- User-provided ``name`` properties are replaced with two new properties: + - ``label`` optionally identifies the entity to the user. + - ``uid`` is a unique identifier that is deterministically generated by the API to + completely and verifiably characterize an entity in terms of its inputs to facilitate + reproducibility, optimization, and flexibility in graph manipulation. + Goals ===== @@ -38,7 +58,7 @@ Serialization format The work specification record is valid JSON serialized data, restricted to the latin-1 character set, encoded in utf-8. 
Uniqueness -========== +---------- Goal: results should be clearly mappable to the work specification that led to them, such that the same work could be repeated from scratch, interrupted, restarted, etcetera, in part or in whole, and verifiably produce the same results @@ -81,12 +101,11 @@ trajectory time. Other than the intentional ambiguity that could be introduced w paragraph, Heuristics -========== +---------- Dependency order affects order of instantiation and the direction of binding operations at session launch. -Rules of thumb -------------- +.. rubric:: Rules of thumb An element can not depend on another element that is not in the work specification. *Caveat: we probably need a special operation just to expose the results of a different work flow.* @@ -94,9 +113,606 @@ An element can not depend on another element that is not in the work specification Dependency direction affects sequencing of Director calls when launching a session, but also may be used at some point to manage checkpoints or data flow state checks at a higher level than the execution graph. +Namespaces +========== + +Python, work graph serialization spec, and extension modules +------------------------------------------------------------ + +This distinction needs clearer expression (maybe through Sphinx formatting), +but it is important to note that there are three different concepts implied by +the prefixes to names used here. + +Names starting with ``gmx.`` are symbols in the Python ``gmx`` package. +Names starting with ``gmxapi.`` are not Python names, but work graph operations +defined for gmxapi and implemented by a gmxapi-compatible execution Context. + +Names starting with ``gromacs.`` are also work graph operations, but are implemented +through GROMACS library bindings (currently ``gmx.core``, though this support should +probably be split into a separate module). +They are less firmly specified because they +are dependent on GROMACS terminology, conventions, and evolution.
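To make the three prefixes concrete, a Context might resolve a qualified operation name along these lines. This is a rough sketch for illustration only: ``resolve_operation`` and the registry layout are invented here, not part of the specification.

```python
# Hypothetical sketch of how a Context could dispatch the three name prefixes.
# Names and structures here are illustrative, not normative.
import importlib

# Operations in the 'gmxapi' (and, for now, 'gromacs') namespaces are provided
# by the execution Context itself rather than looked up by import.
_BUILTIN_NAMESPACES = {'gmxapi', 'gromacs'}

def resolve_operation(qualified_name, builtin_registry):
    """Split 'namespace.operation' and locate an implementation."""
    namespace, _, operation = qualified_name.rpartition('.')
    if namespace in _BUILTIN_NAMESPACES:
        # Built-in operations come from an internal map maintained by the Context.
        return builtin_registry[namespace][operation]
    # Extension operations use their importable module name as the namespace.
    module = importlib.import_module(namespace)
    return getattr(module, operation)

registry = {'gmxapi': {'md': object()}, 'gromacs': {'load_tpr': object()}}
md_impl = resolve_operation('gmxapi.md', registry)
```

Extension modules (``myplugin.myoperation``) would fall through to the import path, which is consistent with the module-name namespacing described below.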
+Operations +implemented by extension modules use a namespace equal to their importable module name. + +The Context implementation in the Python package implements the runtime aspects +of gmxapi operations in submodules of ``gmx``, named (hopefully conveniently) the +same as the work graph operation or ``gmx`` helper function. + +The procedural interface in the :py:mod:`gmx` module provides helper functions that produce handles to work graph +operations and simplify more involved API tasks. + +Operation and Object References +------------------------------- + +Entities in a work graph also have (somewhat) human-readable names with nested +scope indicated by ``.`` delimiters. Within the scope of a work node, namespaces +distinguish several types of interaction behavior. (See :ref:`grammar`.) +Within those scopes, operation definitions specify named "ports" that are +available for nodes of a given operation. +Port names and object types are defined in the API spec (for operations in the ``gmxapi`` +namespace) and expressed through the lower-level API. + +The ports for a work graph node are accessible by proxy in the Python interface, +using correspondingly named nested attributes of a Python reference to the node. + +Note: we need a unique identifier and a well-defined scheme for generating them so +that the API can determine data flow, tag artifacts, and detect complete or partially +complete work. It could be that we should separate work node 'name' into 'uid' +and 'label', where 'label' is a non-canonical and non-required part of a work +graph representation.
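The proxy-based port access described above could look roughly like the following. This is a hypothetical sketch: the class names and the dict-backed graph are invented for illustration and are not the actual ``gmx`` implementation.

```python
# Minimal sketch (not gmx code): a node proxy whose nested attributes forward
# port access to a work graph record owned by the Context.
class PortNamespace:
    """Expose one scope of a node's ports (e.g. 'input' or 'output') as attributes."""
    def __init__(self, graph, uid, scope):
        self._graph = graph
        self._uid = uid
        self._scope = scope

    def __getattr__(self, port):
        # Resolve "uid.scope.port" against the context-owned graph record.
        return self._graph[self._uid][self._scope][port]

class NodeProxy:
    """Python reference to a work graph node, identified by uid."""
    def __init__(self, graph, uid):
        self.input = PortNamespace(graph, uid, 'input')
        self.output = PortNamespace(graph, uid, 'output')

graph = {'read_tpr_1': {'input': {'params': ['topol.tpr']},
                        'output': {'structure': 'read_tpr_1.output.structure'}}}
node = NodeProxy(graph, 'read_tpr_1')
```

With this shape, ``node.output.structure`` yields a reference string that another node's input port can consume, matching the grammar for bare-string references.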
+ +Canonical work graph representation +=================================== + +*Define the deterministic way to identify a work graph and its artifacts for +persistence across interruptions and to avoid duplication of work...* + +Core API roles and collaborations +================================= + +Interfaces and/or Base classes +------------------------------ + +OperationFactory +~~~~~~~~~~~~~~~~ + +An OperationFactory receives a Context and Input, and returns an OperationHandle to the caller. + +Context +~~~~~~~ + +Variations include: + +* GraphContext that just builds a graph that can be serialized for deserialization by another context. +* LaunchContext that processes a graph to be run in appropriate OperationContexts. Produces a Session. +* OperationContext or ImmediateContext that immediately executes the implemented operation. + +NodeBuilder +~~~~~~~~~~~ + +``addResource()`` configures inputs, outputs, framework requirements, and factory functions. +``build()`` returns an OperationHandle. + +Operation +~~~~~~~~~ + +The OperationHandle returned by a factory may be an implementing object or some sort of wrapper or proxy object. +``output()`` provides getters for Results. + +Has a Runner behavior. + +Result +~~~~~~ + +gmxapi-typed output data. May be implemented with futures and/or proxies. Provides an +``extract()`` method to convert to a valid local owning data handle. + +Behaviors +--------- + +Launcher +~~~~~~~~ + +Callable that accepts a Session (Context) and produces a Runnable (Operation). +A special case of OperationDirector. + +Runner +~~~~~~ + +Takes input, runs, and returns a Runner that can be called in the same way. + +run() -> Runner +run(Runner::Resources) -> Runner + +OperationDirector +~~~~~~~~~~~~~~~~~ + +Takes Context and Input to produce a Runner. + +Uses a Context to get one or more NodeBuilders to configure the operation in a new context, +then returns a properly configured OperationHandle for the context.
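The Runner behavior above (each call returns the next Runner to invoke) can be illustrated with a toy chain. This is purely hypothetical: the real Runner would receive Session-provided resources rather than the placeholder used here.

```python
# Sketch of the Runner protocol: each call performs one unit of work and
# returns the Runner to call next, so a Context can drive execution without
# knowing the operation's internals. Illustrative only, not gmx code.
def make_counting_runner(n):
    """Build a chain of n runners; each call returns the next runner, or None."""
    def runner(resources=None):
        # 'resources' stands in for the Runner::Resources a Session would supply.
        if n <= 1:
            return None  # chain exhausted; nothing left to run
        return make_counting_runner(n - 1)
    return runner

runner = make_counting_runner(3)
steps = 0
while runner is not None:
    runner = runner()
    steps += 1
```

Driving the chain to exhaustion here takes exactly three calls, mirroring how a session loop would repeatedly invoke whatever Runner the previous call handed back.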
+ +Graph +~~~~~ + +Can produce NodeBuilders and store DirectorFactories. + +Can serialize / deserialize the workflow representation. + +* ``serialize()`` +* ``uid()`` +* ``newOperation()``: get a NodeBuilder +* ``node(uid)``: getter +* ``get_node_by_label(str)``: find uid +* iterator + +OutputProxy +~~~~~~~~~~~ + +Services requests for Results for an Operation's output nodes. + +Input +~~~~~ + +Input is highly dependent on the implementation of the operation and the context in which +it is executing. The important thing is that it is something that can be interpreted by a DirectorFactory. + +Arbitrary lists of arguments and keyword arguments can be accepted by a Python +module director to direct the construction of one or more graph nodes or to +get an immediately executed runner. + +GraphNode or serialized Operation Input is accepted by a LaunchContext or +DispatchingContext. + +A runner implementing the operation execution accepts Input in the form of +SessionResources. + Middleware API ============== +The work graph has a basic grammar and structure that maps well to basic data structures. +We use JSON for serialization of a Python dictionary. + +Integers and floating-point numbers are 64-bit. + +The JSON data should be utf-8 compatible, but note that JSON codecs probably map Unicode string +objects on the program side to un-annotated strings in the serialized data +(encoding is at the level of the entire byte stream). + +.. _grammar: + +Work graph grammar +------------------ + +Names (labels and UIDs) in the work graph are strings from the ASCII / Latin-1 character set. +Periods (``.``) have special meaning as delimiters. + +Bare string values are interpreted as references to other work graph entities. Strings +appearing in lists are interpreted as literal strings. + +Operations +---------- + +Each node in a work graph represents an instance of an Operation. +The API specifies operations that a gmxapi-compliant execution context *should* provide in +the ``gmxapi`` namespace.
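As an illustration of JSON serialization and deterministic identification, a node record can be reduced to canonical bytes and hashed. The hashing scheme below is a hypothetical example of how a uid could be derived from an entity's inputs, not the normative gmxapi algorithm.

```python
# Illustrative sketch: serialize a work graph node to canonical JSON and derive
# a deterministic uid by hashing the operation name together with its inputs.
import hashlib
import json

node = {
    'namespace': 'gmxapi',
    'operation': 'read_tpr',
    'input': {'params': ['topol.tpr']},
}

# Canonical form: sorted keys, no insignificant whitespace, utf-8 bytes.
canonical = json.dumps(node, sort_keys=True, separators=(',', ':')).encode('utf-8')
uid = hashlib.sha256(canonical).hexdigest()
```

Because the canonical form depends only on the node's operation and inputs, the same work always maps to the same uid, which is what lets an API detect complete or partially complete work and tag artifacts reproducibly.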
+ +All specified ``gmxapi`` operations are provided by the reference implementation in released +versions of the ``gmx`` package. ``gmx.context.Context`` also provides operations in the ``gromacs`` +namespace. This support will probably move to a separate module, but the ``gromacs`` namespace +is reserved and should not be reimplemented in external software. + +When translating a work graph for execution, the Context calls a factory function for each +operation to get a Director. A Python-based Context *should* consult an internal map for +factory functions for the ``gmxapi`` namespace. **TODO:** *How to handle errors? +We don't really want middleware clients to have to import ``gmx``, but how would a Python +script know what exception to catch? Errors need to be part of an API-specified result type +or protocol, and/or the exceptions need to be defined in the package implementing the context.* + + +For other namespaces, the namespace is imported as a Python module. + +The operation is an attribute in the namespace that + +.. versionadded:: 0.0 + + is callable with the signature ``operation(element)`` to get a Director + +.. versionchanged:: 0.1 + + has a ``_gmxapi_graph_director`` attribute to get a Director + +Helper +~~~~~~ + +Adds an operation instance to the work graph and returns a proxy object. +If the proxy object has ``input`` or ``output`` attributes, they should forward ``getattr`` +calls to the context... *TBD* + +The helper makes API calls to the default or provided Context and then asks the Context for +an object to return to the caller. Generally, this is a proxy Operation object, but when the +context is a local context in the process of launching a session, the object can be a +graph Director that can be used to finish configuring and launch the execution graph. + +.. rubric:: Signatures + +``myplugin.myoperation(arg0: WorkElement) -> gmx.Operation`` + +.. versionchanged:: 0.1 + + Operation helpers are no longer required to accept a ``gmx.workflow.WorkElement`` argument.
+ +``myplugin.myoperation(*args, input: inputobject, output: outputobject, **kwargs)`` + + inputobject : dict + Map of named input ports to typed gmxapi data, implicitly mappable Python objects, + or objects implementing the gmxapi Output interface. + +Some operations (``gmx.commandline_operation``) need to provide an ``output`` keyword argument to define +data types and/or placeholders (not represented in the work graph). + outputobject : dict + Map of named output ports to + +Additional ``args`` and ``kwargs`` may be used by the helper function to set up the work +graph node. Note that the context will not use them when launching the operation, though, +so .... + + +... Maybe let ``input`` and ``output`` kwargs be interpreted by the helper function, too, +and let the operation node input be completely specified by ``parameters``? + +``myplugin.myoperation(arg0: graph_ref, *args, parameters: inputobject, **kwargs)`` + +... I think we can go ahead and let ``gmx.Operation.input`` and ``gmx.Operation.output`` +implement ``__getitem__``... + +Implementation note: the input and output attributes can have a common implementation, +provided with Python descriptors. + +Servicing the proxy +~~~~~~~~~~~~~~~~~~~ + +When the Python client added the operation to the work graph, it used a helper function +to get a reference to an Operation proxy object. This object holds a weak reference to +the context and work graph to which it was added. + + +Factory +~~~~~~~ + +Gets a Director for session launch. + +Director +~~~~~~~~ + +Subscribable, to implement data dependencies. + +``build`` method adds ``launch`` and ``run`` objects to the execution graph. + +To do: change ``build`` to ``construct``. + +Session callable +~~~~~~~~~~~~~~~~ + +``gmxapi`` operations +--------------------- + +Operation namespace: gmxapi + + +.. rubric:: operation: make_input + +..
versionadded:: gmxapi_graph_0_2 + +Produced by :py:func:`gmx.make_input` + +* ``input`` ports + + - ``params`` + - ``structure`` + - ``topology`` + - ``state`` + +* ``output`` ports + + - ``params`` + - ``structure`` + - ``topology`` + - ``state`` + + +.. rubric:: operation: md + +.. versionadded:: gmxapi_workspec_0_1 + +.. deprecated:: gmxapi_graph_0_2 + +Produced by :py:func:`gmx.workflow.from_tpr` + +Ports: + +* ``params`` +* ``depends`` + + +.. rubric:: operation: modify_input + +.. versionadded:: gmxapi_graph_0_2 + +Produced by :py:func:`gmx.modify_input` + +* ``input`` ports + + - ``params`` + - ``structure`` + - ``topology`` + - ``state`` + +* ``output`` ports + + - ``params`` + - ``structure`` + - ``topology`` + - ``state`` + + +``gromacs`` operations +---------------------- + +Operation namespace: gromacs + + +.. rubric:: operation: load_tpr + +.. versionadded:: gmxapi_workspec_0_1 + +.. deprecated:: gmxapi_graph_0_2 + +Produced by :py:func:`gmx.workflow.from_tpr` + + +.. rubric:: operation: mdrun + +.. versionadded:: gmxapi_graph_0_2 + +Produced by :py:func:`gmx.mdrun` + +* ``input`` ports + + - ``params`` + - ``structure`` + - ``topology`` + - ``state`` + +* ``output`` ports + + - ``trajectory`` + - ``conformation`` + - ``state`` + +* ``interface`` ports + + - ``potential`` + + +.. rubric:: operation: read_tpr + +.. versionadded:: gmxapi_graph_0_2 + +Produced by :py:func:`gmx.read_tpr` + +* ``input`` ports + + - ``params`` takes a list of filenames + +* ``output`` ports + + - ``params`` + - ``structure`` + - ``topology`` + - ``state`` + + +Extension API +============= + +Extension modules provide a high-level interface to gmxapi operations with functions +that produce Operation objects. Operation objects maintain a weak reference to the +context and work graph to which they have been added so that they can provide a +consistent proxy interface to operation data. Several object properties provide +accessors that are forwarded to the context. + +.. 
These may seem like redundant scoping while operation instances are essentially + immutable, but with more graph manipulation functionality, we can make future + operation proxies more mutable. Also, we might add extra utilities or protocols + at some point, so we include the scoping from the beginning. + +``input`` contains the input ports of the operation. Allows a typed graph edge. Can +contain static information or a reference to another gmxapi object in the work graph. + +``output`` contains the output ports of the operation. Allows a typed graph edge. Can +contain static information or a reference to another gmxapi object in the work graph. + +``interface`` allows operation objects to bind lower-level interfaces at run time. + +Connections between ``input`` and ``output`` ports define graph edges that can be +checkpointed by the library with additional metadata. + +Python interface +================ + + +:py:func:`gmx.read_tpr` creates a node for a ``gromacs.read_tpr`` operation implemented +with :py:func:`gmx.fileio.read_tpr`. + +:py:func:`gmx.mdrun` creates a node for a ``gromacs.mdrun`` operation, implemented +with :py:func:`gmx.context._mdrun`. + +:py:func:`gmx.init_subgraph` + +:py:func:`gmx.while_loop` creates a node for a ``gmxapi.while_loop`` operation. + + +Work graph procedural interface +------------------------------- + +Python syntax available in the imported ``gmx`` module. + +.. py:function:: gmx.commandline_operation(executable, arguments=[], input=[], output=[]) + + .. versionadded:: 0.0.8 + + Run an executable (such as a command-line tool) as a work graph operation. + +.. py:function:: gmx.get_context(work=None) + :noindex: + + .. versionadded:: 0.0.4 + + Get a handle to an execution context that can be used to launch a session + (for the given work graph, if provided). + +.. py:function:: gmx.logical_not + + .. versionadded:: 0.1 + + Create a work graph operation that negates a boolean input value on its + output port. + +.. py:function:: gmx.make_input() + :noindex: + + .. versionadded:: 0.1 + +..
py:function:: gmx.mdrun() + + .. versionadded:: 0.0.8 + + Creates a node for a ``gromacs.mdrun`` operation, implemented + with :py:func:`gmx.context._mdrun`. + +.. py:function:: gmx.modify_input() + + .. versionadded:: 0.0.8 + + Creates a node for a ``gmxapi.modify_input`` operation. Initial implementation + uses ``gmx.fileio.read_tpr`` and ``gmx.fileio.write_tpr``. + +.. py:function:: gmx.read_tpr() + + .. versionadded:: 0.0.8 + + Creates a node for a ``gromacs.read_tpr`` operation implemented + with :py:func:`gmx.fileio.read_tpr`. + +.. py:function:: gmx.gather() + + .. versionadded:: 0.0.8 + +.. py:function:: gmx.reduce() + + .. versionadded:: 0.1 + + Previously only available as an ensemble operation with implicit reducing + mode of ``mean``. + +.. py:function:: gmx.run(work=None, **kwargs) + :noindex: + + Run the current work graph, or the work provided as an argument. + + .. versionchanged:: 0.0.8 + + ``**kwargs`` are passed to the gmxapi execution context. Refer to the + documentation for the Context for usage. (E.g. see :py:class:`gmx.context.Context`.) + +.. py:function:: gmx.init_subgraph() + + .. versionadded:: 0.1 + + Prepare a subgraph. Alternative name: ``gmx.subgraph``. + +.. py:function:: gmx.tool + + .. versionadded:: 0.1 + + Add a graph operation for one of the built-in tools, such as a GROMACS + analysis tool that would typically be invoked with a ``gmx toolname`` + command-line syntax. Improves interoperability of tools previously accessible + only through :py:func:`gmx.commandline_operation`. + +.. py:function:: gmx.while_loop() + + .. versionadded:: 0.1 + + Creates a node for a ``gmxapi.while_loop`` operation. + +Types +----- + +Python classes for gmxapi object and data types. + +.. py:class:: gmx.InputFile + + Proxy for + +.. py:class:: gmx.NDArray + + N-dimensional array of gmxapi data. + +..
py:class:: gmx.OutputFile + + Proxy for + +Additional classes and specified interfaces +------------------------------------------- + +We support Python duck-typing where possible, in that objects do not need to +inherit from a gmxapi-provided base class to be compatible with specified +gmxapi behaviors. This section describes the important attributes of specified +gmxapi interfaces. + +This section also notes: + +* classes in the reference implementation that implement specified interfaces +* utilities and helpers provided to support creating gmxapi-compatible wrappers + +.. rubric:: Operation + +Utilities +--------- + +.. py:function:: gmx.operation.make_operation(cls, input=[], output=[]) + + Generate a function object that can be used to manipulate the work graph + *and* to launch the custom-defined work graph operation. + + Example: https://github.com/kassonlab/gmxapi-scripts/blob/master/analysis.py + +Reference implementation +------------------------ + +The ``gmx`` Python package implements ``gmxapi`` operations in the ``gmx.context.Context`` +reference implementation to support top-level ``gmx`` functions using various +``gmx`` submodules. + +:py:func:`gmx.fileio.read_tpr` implements the ``gromacs.read_tpr`` operation. + +Specification module +==================== + +Documentation for the specification should be extracted from the package. +It will be migrated to :py:mod:`gmx._workspec_0_2`. + +The module (eventually) provides helpers, with utilities to validate API implementations / data structures / interfaces, +and (possibly) wrappers or factories. + +The remaining content in this document is automatically extracted from the +:py:mod:`gmx._workspec_0_2` module. The above content will be migrated into this +module shortly, but the intent is that the module will also contain syntax and +schema checkers.
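A toy sketch of the ``make_operation`` idea follows. It is hypothetical: the real helper in ``gmx.operation`` integrates with the work graph and launches work through a Context, whereas this sketch executes immediately to show the shape of the generated function object.

```python
# Illustrative sketch only (not the gmx.operation implementation): wrap a class
# with a run() method into a helper that binds named inputs and exposes named
# outputs on a simple result holder.
class SimpleOperation:
    """Minimal stand-in for an Operation proxy with an 'output' attribute."""
    def __init__(self, output):
        self.output = output

def make_operation(cls, input=(), output=()):
    def helper(**kwargs):
        instance = cls()
        # Bind declared input ports from keyword arguments.
        for port in input:
            setattr(instance, port, kwargs.get(port))
        results = instance.run()
        # Expose only the declared output ports.
        return SimpleOperation({port: results[port] for port in output})
    return helper

class Scale:
    """Example user-defined operation: multiply data by a factor."""
    def run(self):
        return {'scaled': [x * self.factor for x in self.data]}

scale = make_operation(Scale, input=['data', 'factor'], output=['scaled'])
op = scale(data=[1, 2, 3], factor=2)
```

In the real API the helper would add a node to the work graph and return a lazy proxy; here ``op.output`` is populated eagerly just to make the data flow visible.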
+ Specification ------------- diff --git a/src/gmx/operation.py b/src/gmx/operation.py new file mode 100644 index 0000000000..5b21f3b02b --- /dev/null +++ b/src/gmx/operation.py @@ -0,0 +1,52 @@ +""" +Python front-end to gmxapi work graph Operations +================================================ + +Reference https://github.com/kassonlab/gmxapi/issues/208 + +Consider an example work graph like the following. + + subgraph = gmx.subgraph(input={'conformation': initial_input}) + + # As an alternative to specifying the context or graph in each call, intuiting the graph, + # or requiring the user to manage global switching, we could use a context manager. + with subgraph: + modified_input = gmx.modify_input(input=initial_input, structure=subgraph.input.conformation) + md = gmx.mdrun(input=modified_input) + # Assume the existence of a more integrated gmx.trajcat operation + cluster = gmx.commandline_operation('gmx', ['cluster'], input=gmx.reduce(gmx.trajcat, md.output.trajectory)) + condition = mymodule.cluster_analyzer(input=cluster.output.file["-ev"]) + subgraph.output.conformation = cluster.output.conformation + + # In the default work graph, add a node that depends on `condition` and wraps subgraph. + # It makes sense that a convergence-checking operation is initialized such that + # `is_converged() == False` + my_loop = gmx.while_loop(gmx.logical_not(condition.output.is_converged), subgraph) + gmx.run() + +What is produced by mymodule.cluster_analyzer? + + +""" +# The object returned needs to be usable in the definition of a work graph, as well +# as in the running of it. +# +# Possibly the easiest way, and the way that would be most +# consistent with what we've done so far, would be for the `mymodule` Python module +# (optionally "package") to have a `cluster_analyzer` function with multiple +# signatures. A signature that takes a single WorkElement object produces a Director +# for the operation at launch time. This would be compatible with the 0.0.7 spec.
+# +# To allow the flexible usage described above, the object would need to have `input` +# and `output` properties that each provide properties that look like named gmxapi +# input or output data. Also see gist: https://gist.github.com/eirrgang/0d975eb279fce21f59aa29de2b1316f2 +# +# Side note: I'm not sure the objects need to have separate `input` and `output` +# properties. We don't really have a use case yet for assigning to `input` without +# using keyword arguments in a function, so maybe there is no `input` property and +# the `output` categorization is implicit for Operation properties. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals
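The descriptor-based common implementation for ``input`` and ``output`` mentioned in the side note above could be sketched as follows. This is an illustration only, with invented names; it is not the gmxapi implementation.

```python
# Sketch: 'input' and 'output' share one descriptor implementation that
# forwards port lookups to the work graph record held for the operation.
class PortDescriptor:
    """Descriptor providing an 'input' or 'output' port namespace on a proxy."""
    def __init__(self, scope):
        self.scope = scope

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Return a lightweight namespace backed by the node's graph record.
        ports = obj._graph[obj._uid][self.scope]
        return type('Ports', (), dict(ports))()

class Operation:
    """Proxy for a work graph node; ports are resolved through descriptors."""
    input = PortDescriptor('input')
    output = PortDescriptor('output')

    def __init__(self, graph, uid):
        self._graph = graph
        self._uid = uid

graph = {'md_0': {'input': {'params': 'read_tpr_0.output.params'},
                  'output': {'trajectory': 'md_0.output.trajectory'}}}
md = Operation(graph, 'md_0')
```

Because both attributes share the ``PortDescriptor`` class, any change to the forwarding protocol (e.g. weak references to the context, as discussed above) lands in one place.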