forked from kassonlab/gmxapi
Lay out major functionality and interface changes.
Showing 6 changed files with 599 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
========================= | ||
Work specification schema | ||
========================= | ||
|
||
Goals | ||
===== | ||
|
||
- Serializable representation of a molecular simulation and analysis workflow. | ||
- Simple enough to be robust to API updates and uncoupled from implementation details. | ||
- Complete enough to unambiguously direct translation of work to API calls. | ||
- Facilitate easy integration between independent but compatible implementation code in Python or C++. | ||
- Verifiable compatibility with a given API level. | ||
- Provide enough information to uniquely identify the "state" of deterministic inputs and outputs. | ||
|
||
The last point warrants further discussion. | ||
|
||
One point is that we need to be able to recover the state of an | ||
executing graph after an interruption, so we need to be able to identify whether work has been partially completed | ||
and how checkpoint data matches up between nodes, which may not all (at least initially) be on the same computing host. | ||
|
||
The other, related point is how to minimize duplicated data or computation. Due to numerical | ||
optimizations, molecular simulations with the exact same inputs and parameters may produce output that is not | ||
binary identical, but which should be treated as scientifically equivalent. We need to be able to identify equivalent | ||
rather than identical output. Input that draws from the results of a previous operation should be able to verify whether | ||
valid results for any identically specified operation exist, or how far such an operation has progressed. | ||
|
||
The degree of granularity and room for optimization we pursue affects the amount of data in the work specification, its | ||
human-readability / editability, and the amount of additional metadata that needs to be stored in association with a | ||
Session. | ||
|
||
If one element is added to the end of a work specification, results of the previous operations should not be invalidated. | ||
|
||
If an element at the beginning of a work specification is added or altered, "downstream" data should be easily invalidated. | ||
|
||
Serialization format | ||
==================== | ||
|
||
The work specification record is valid JSON serialized data, restricted to the latin-1 character set, encoded in utf-8. | ||
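
The serialization constraints above can be sketched with the Python standard library. This is a minimal illustration, not the gmxapi implementation: the function names and the layout of the ``spec`` dictionary are hypothetical.

```python
import json


def serialize_workspec(spec: dict) -> bytes:
    """Serialize a work specification to UTF-8 encoded JSON bytes.

    Raises ValueError if the JSON text contains characters outside latin-1.
    """
    # sort_keys gives a stable key order; ensure_ascii=False keeps non-ASCII
    # characters literal so the latin-1 restriction can be checked directly.
    text = json.dumps(spec, sort_keys=True, ensure_ascii=False)
    try:
        text.encode("latin-1")
    except UnicodeEncodeError as error:
        raise ValueError("work specification contains non-latin-1 characters") from error
    return text.encode("utf-8")


def deserialize_workspec(data: bytes) -> dict:
    """Inverse of serialize_workspec."""
    return json.loads(data.decode("utf-8"))


# Hypothetical record layout, for illustration only.
spec = {"elements": {"load": {"operation": "load_tpr", "params": {"input": "topol.tpr"}}}}
data = serialize_workspec(spec)
round_tripped = deserialize_workspec(data)
```

Sorting the keys is a design choice made here so that the same record always serializes to the same bytes; the schema text itself does not require it.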
|
||
Uniqueness | ||
========== | ||
|
||
Goal: results should be clearly mappable to the work specification that led to them, such that the same work could be | ||
repeated from scratch, interrupted, restarted, etc., in part or in whole, and verifiably produce the same results | ||
(which cannot be artificially attributed to a different work specification) without requiring recomputation of intermediate | ||
values that are available to the Context. | ||
|
||
The entire record, as well as each individual element, has a well-defined hash that can be used to compare work for | ||
functional equivalence. | ||
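
One possible way to obtain such hashes is to hash each element together with the hashes of the elements it depends on. The sketch below assumes a hypothetical element layout (``operation``, ``params``, ``depends``), an acyclic graph, and SHA-256 over canonical JSON; none of these choices are prescribed by the schema.

```python
import hashlib
import json


def _digest(payload) -> str:
    """SHA-256 of a canonical (sorted, compact) JSON rendering of payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def element_hash(name, elements, _cache=None):
    """Hash one element, folding in the hashes of its dependencies.

    Assumes the dependency graph is acyclic.
    """
    if _cache is None:
        _cache = {}
    if name not in _cache:
        element = elements[name]
        depends = sorted(element.get("depends", []))
        _cache[name] = _digest({
            "operation": element["operation"],
            "params": element.get("params", {}),
            "depends": [element_hash(dep, elements, _cache) for dep in depends],
        })
    return _cache[name]


def record_hash(elements):
    """Hash the whole record from its per-element hashes."""
    return _digest({name: element_hash(name, elements) for name in elements})


# Hypothetical work specification elements.
elements = {
    "load": {"operation": "load_tpr", "params": {"input": "topol.tpr"}},
    "md": {"operation": "md", "depends": ["load"]},
}
# Appending an element leaves the hashes of earlier elements unchanged.
extended = dict(elements, analysis={"operation": "analyze", "depends": ["md"]})
# Altering an upstream element changes the hash of everything downstream.
changed = dict(elements, load={"operation": "load_tpr", "params": {"input": "other.tpr"}})
```

With this construction, adding an element at the end does not invalidate earlier results, while a change at the beginning propagates to all "downstream" hashes, which matches the two properties stated under Goals.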
|
||
State is not contained in the work specification, but state is attributable to a work specification. | ||
|
||
If we can adequately normalize utf-8 Unicode string representation, we could checksum the full text, but this may be more | ||
work than defining a scheme for hashing specific data or letting each operation define its own comparator. | ||
|
||
Question: If an input value in a workflow is changed from a verifiably consistent result to an equivalent constant of a | ||
different "type", do we invalidate or preserve the downstream output validity? E.g. the work spec changes from | ||
"operationB.input = operationA.output" to "operationB.input = final_value(operationA)" | ||
|
||
The question is moot if we either only consider final values for terminating execution or if we know exactly how many | ||
iterations of sequenced output we will have, but that is not generally true. | ||
|
||
Maybe we can leave the answer to this question unspecified for now and prepare for implementation in either case by | ||
recording more disambiguating information in the work specification (such as checksum of locally available files) and | ||
recording initial, ongoing, and final state very granularly in the session metadata. It could be that this would be | ||
an optimization that is optionally implemented by the Context. | ||
|
||
It may be that we allow the user to decide what makes data unique. This would need to be very clearly documented, but | ||
it could be that provided parameters always become part of the unique ID and are always not-equal to unprovided/default | ||
values. Example: a ``load_tpr`` operation with a ``checksum`` parameter refers to a specific file and immediately | ||
produces a ``final`` output, but a ``load_tpr`` operation with a missing ``checksum`` parameter produces non-final | ||
output from whatever file is resolved for the operation at run time. | ||
|
||
It may also be that some data occurs as a "stream" that does not make an operation unique, such as log file output or | ||
trajectory output that the user wants to accumulate regardless of the data flow scheme; or as a "result" that indicates | ||
a clear state transition and marks specific, uniquely produced output, such as a regular sequence of 1000 trajectory | ||
frames over 1ns, or a converged observable. "result"s must be mapped to the representation of the | ||
workflow that produced them. To change a workflow without invalidating results might be possible with changes that do | ||
not affect the part of the workflow that fed those results, such as a change that only occurs after a certain point in | ||
trajectory time. Other than the intentional ambiguity that could be introduced with parameter semantics in the previous | ||
paragraph, | ||
|
||
Heuristics | ||
========== | ||
|
||
Dependency order affects order of instantiation and the direction of binding operations at session launch. | ||
|
||
Rules of thumb | ||
-------------- | ||
|
||
An element cannot depend on another element that is not in the work specification. | ||
*Caveat: we probably need a special operation just to expose the results of a different workflow.* | ||
|
||
Dependency direction affects sequencing of Director calls when launching a session, but also may be used at some point | ||
to manage checkpoints or data flow state checks at a higher level than the execution graph. | ||
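
As a rough illustration of these rules, the sketch below checks that every dependency names an element in the work specification and then derives an instantiation order with the standard library ``graphlib`` module (Python 3.9 or later). The ``depends`` key and the function name are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def instantiation_order(elements):
    """Return element names ordered so that dependencies come first.

    Raises KeyError if an element depends on a name that is not in the
    work specification, and graphlib.CycleError if dependencies are cyclic.
    """
    for name, element in elements.items():
        for dep in element.get("depends", []):
            if dep not in elements:
                raise KeyError(
                    f"{name!r} depends on {dep!r}, which is not in the work specification"
                )
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(element.get("depends", [])) for name, element in elements.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical elements, deliberately listed out of dependency order.
elements = {
    "analysis": {"depends": ["md"]},
    "md": {"depends": ["load"]},
    "load": {},
}
order = instantiation_order(elements)
```

For a graph with independent branches, several valid orders exist; a launcher that needs reproducible sequencing would have to pick a tie-breaking rule.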
|
||
Middleware API | ||
============== | ||
|
||
Specification | ||
------------- | ||
|
||
.. automodule:: gmx._workspec_0_2 | ||
:members: | ||
|
||
Helpers | ||
------- | ||
|
||
.. automodule:: gmx._workspec_0_2.util | ||
:members: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
==================== | ||
Release Notes: 0.0.8 | ||
==================== | ||
|
||
Feature additions | ||
================= | ||
|
||
Convenient access to trajectory data | ||
------------------------------------ | ||
|
||
In addition to operationally manipulating trajectory data handles between work | ||
elements, users are able to extract trajectory data output in the local environment. | ||
|
||
Trajectory data from static sources (e.g. files) can be read with a compatible | ||
numpy-friendly interface. | ||
|
||
Interfaces for accessing trajectory data will converge on compatibility with | ||
concurrent updates to the GROMACS Trajectory Analysis Framework. | ||
Supersedes https://gerrit.gromacs.org/c/6567/ | ||
|
||
See also `Session and client need access to trajectory step #56 <https://github.com/kassonlab/gmxapi/issues/56>`_ | ||
|
||
Override MDP options | ||
-------------------- | ||
|
||
Parameters may be specified as part of the work graph. Specified parameters | ||
override defaults or previously set values and become part of the unique | ||
identifying information for data in the execution graph. | ||
|
||
Generically, key-value entries compatible with the current MDP file format may | ||
be provided as part of a single parameters dictionary. Future work will provide | ||
better integration with the MDP options expression in GROMACS and allow for | ||
better detection of equivalent work graphs. | ||
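
The override behaviour can be pictured as a dictionary merge. The following is a minimal sketch with a hypothetical helper name, not the gmxapi interface; ``nsteps`` and ``dt`` are used only as familiar MDP keys.

```python
def apply_overrides(defaults, overrides):
    """Return a new parameter dict in which overrides replace defaults."""
    merged = dict(defaults)   # copy so the caller's defaults are untouched
    merged.update(overrides)  # explicitly specified values win
    return merged


defaults = {"nsteps": 5000, "dt": 0.002}
overrides = {"nsteps": 10000}
merged = apply_overrides(defaults, overrides)
```

Keeping ``overrides`` as a separate dictionary, rather than only the merged result, preserves the information about which values were explicitly specified, which is what would feed into the identifying information for the data.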
|
||
Parameters that can be specified with their own keyword arguments can provide | ||
constant data or can reference named outputs of gmxapi operations already in | ||
the work graph. | ||
|
||
Multiple simulations per work graph | ||
----------------------------------- | ||
|
||
gmxapi 0.0.7 requires a new work graph to be launched for each simulation | ||
operation. Updates to the WorkSpec, Context, and Session implementations allow | ||
multiple simulation nodes, not just parallel arrays of simulations. | ||
|
||
This functionality simultaneously | ||
|
||
* simplifies user management of data flow | ||
* separates the user from filesystem management | ||
|
||
See also `multiple MD elements in a single workflow #39 <https://github.com/kassonlab/gmxapi/issues/39>`_ | ||
|
||
More flexible asynchronous work | ||
------------------------------- | ||
|
||
Asynchronous elements of work may be run serially, if appropriate for the | ||
execution environment, even if the work is part of a trajectory ensemble. | ||
|
||
Session-level data flow is distinguished from lower-level data flow to allow | ||
interaction between operation nodes between updates to the execution graph state. | ||
This is a formalization of the distinction between (a) the plugin force-provider | ||
interface or simulation stop signal facility and (b) data edges on the execution | ||
graph. | ||
|
||
Named inputs and outputs in work graph | ||
-------------------------------------- | ||
|
||
Instead of automatic subscription between work graph nodes and dependent nodes, | ||
operations have named inputs and outputs that can be referenced in the params | ||
for other operations. | ||
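
A small sketch of how params might reference a named output of another operation. The ``OutputRef`` type, the ``resolve_params`` helper, and the layout of ``outputs`` are all hypothetical and are not taken from the gmxapi code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRef:
    """Reference to a named output of another element in the work graph."""
    element: str
    output: str


def resolve_params(params, outputs):
    """Replace OutputRef values in params with the data they refer to.

    outputs maps element name -> {output name -> value}.
    """
    resolved = {}
    for key, value in params.items():
        if isinstance(value, OutputRef):
            try:
                resolved[key] = outputs[value.element][value.output]
            except KeyError:
                raise KeyError(
                    f"no output {value.output!r} on element {value.element!r}"
                ) from None
        else:
            resolved[key] = value  # constant data passes through unchanged
    return resolved


# Hypothetical published outputs and a params dict mixing a reference and a constant.
outputs = {"md1": {"trajectory": "traj.trr"}}
params = {"input": OutputRef("md1", "trajectory"), "nsteps": 1000}
resolved = resolve_params(params, outputs)
```

Using a dedicated reference type rather than a dotted string avoids confusing references with ordinary values that happen to contain a period, such as file names.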
|
||
File utilities | ||
-------------- | ||
|
||
Outside of the work graph that is dispatched to run in a session, simple tools | ||
provide equivalent functionality to ``gmx`` command line tools to | ||
|
||
* build or modify run-input files (like ``grompp``, ``convert-tpr``, and such) | ||
* read file data (like ``gmx dump``) | ||
|
||
Better data flow | ||
---------------- | ||
|
||
See also `Tag artifacts #76 <https://github.com/kassonlab/gmxapi/issues/76>`_, | ||
`place external data object #96 <https://github.com/kassonlab/gmxapi/issues/96>`_, | ||
`reusable output node #40 <https://github.com/kassonlab/gmxapi/issues/40>`_ | ||
|
||
Procedural interface | ||
==================== | ||
|
||
``gmx.make_input()`` generates node(s) providing source(s) of | ||
|
||
* structure | ||
* topology | ||
* simulation parameters | ||
* generic data (catch-all options or data streams not specified in the API) | ||
|
||
Python object-oriented API | ||
========================== | ||
|
||
WorkElement objects are now views into WorkSpec work graph objects. | ||
|
||
WorkSpec objects contain the work graph and are owned by exactly one Context | ||
object. | ||
|
||
Though implementation classes exist in gmx.workflow, WorkElement and WorkSpec | ||
objects only need to implement a specified interface and do not need to be of | ||
any specific type. These interfaces are specified as part of `workspec 2`. | ||
|
||
See also `Add proxy access to data graph through WorkElement handles #94 <https://github.com/kassonlab/gmxapi/issues/94>`_ | ||
|
||
workspec 2 | ||
========== | ||
|
||
See :doc:`layers/workspec_schema_0_2` and | ||
`gmxapi_workspec_0_2 milestone <https://github.com/kassonlab/gmxapi/milestone/3>`_ | ||
|
||
See also `resolve protocol for API operation map #42 <https://github.com/kassonlab/gmxapi/issues/42>`_ | ||
|
||
C++ API | ||
======= | ||
|
||
Canonical gmxapi C++ API is now in GROMACS master. | ||
Pre-release and experimental features are still available through the kassonlab | ||
GitHub fork. | ||
The non-canonical nature of the fork is expressed by the presence of the CMake | ||
variable ``GMXAPI_EXPERIMENTAL``. | ||
|
||
Hierarchical object ownership | ||
----------------------------- | ||
|
||
gmxapi code must occur within the scope of a gmxapi::Context object lifetime. | ||
Allocated resources are owned by a Context or by objects ultimately owned by | ||
the Context. Work is launched in a Session, owned by the Context, which owns the | ||
objects performing actual computation in a configured execution environment. | ||
|
||
This means that gmxapi 0.0.8 necessarily enforces the proxy-object concept | ||
intended for gmxapi 1.0. Client code interacts with a work graph through a | ||
Context, and local objects are non-owning handles to resources owned and | ||
managed by the Context. | ||
|
||
This is also an inversion of the previous ownership model, in which ownership | ||
of resources was shared by the objects depending on those resources and object | ||
lifetimes were managed exclusively through reference-counting handles / smart | ||
pointers. Consequently, a handle to the Context, Session, or other resource | ||
owner must always be passed down into functions or shorter-lived objects that | ||
use the resources. | ||
|
||
See also `Context chain of responsibility <https://github.com/kassonlab/gmxapi/milestone/5>`_ | ||
|
||
Plugin development improvements | ||
=============================== | ||
|
||
Automatic Python interface generation | ||
------------------------------------- | ||
|
||
The developer no longer has to explicitly write a "builder." The operation | ||
launching protocol is managed with the help of included headers. | ||
|
||
Users no longer interact directly with gmx.workflow.WorkElement objects to | ||
interact with a plugin. Helper functions add operations to the work graph. | ||
Helper functions are automatically generated for plugins built on the provided | ||
sample code. | ||
|
||
See also `Remove boilerplate for plugin instantiation #78 <https://github.com/kassonlab/gmxapi/issues/78>`_ | ||
|
||
Templated registration of inputs and outputs | ||
-------------------------------------------- | ||
|
||
Reduced boilerplate, improved error checking, and compatibility with automatic | ||
workflow checkpointing. Input, output, and state data are managed by the | ||
framework. Instead of writing a class to contain a plugin's functions, the | ||
functions are written as free functions and use a SessionResources handle to | ||
interact with gmxapi and data on the execution graph. | ||
|
||
See also `clean up input parameter specification for plugins #47 <https://github.com/kassonlab/gmxapi/issues/47>`_ | ||
|
||
More templating to minimize implementation | ||
------------------------------------------ | ||
|
||
Plugin developers no longer implement an entire class, but only the functions | ||
they need. | ||
|
||
More call signatures are available for MD plugin operations to allow more | ||
intuitive implementation code. | ||
|
||
Input, output, and state data are no longer specified as class data members, but | ||
as resources to be managed through SessionResources. | ||
|
||
See also `restraint potential calculator inputs are confusing #140 <https://github.com/kassonlab/gmxapi/issues/140>`_ | ||
|
||
Integrated sample code | ||
---------------------- | ||
|
||
Sample MD plugin code is still provided as a standalone repository, but it is | ||
also included as a ``git`` *submodule* for convenience and to allow development | ||
documentation to be integrated with the primary ``gmx`` Python package documentation. | ||
|