General mapping of inputs and outputs for arbitrary tools. #190

Open
eirrgang opened this issue Nov 30, 2018 · 2 comments
Labels
enhancement gmxapi pertains to this repository and the Python module support

Comments

@eirrgang
Collaborator

eirrgang commented Nov 30, 2018

We need support for analysis tool chains.

Well before we have GROMACS core functionality to interact with the Trajectory Analysis Framework and GROMACS module registration, we can start with top-down access to command line interface (CLI) tools. In a first pass, we can enable CWL-like wrapping of CLI tools, with management for inputs and outputs. This gives access to tools invoked in the form gmx toolname -flag1 arg1 -flag2 arg2, and so inherently to any command line program with similar syntax. Then we can wrap the CLI code for gmx tools directly to allow named operations with better knowledge of available inputs and outputs. As an API is developed for registering inputs and outputs, GROMACS modules can be updated to have full gmxapi abstraction and liberation from strictly filesystem-based I/O.

The change sequence will look something like the following.

  1. Python wrapper for CLI programs: gmx.command_line() produces gmx.Operation objects that can be used in a work graph to invoke subprocesses. gmx.map() can generate appropriate graph topologies for, e.g., ArrayOperation or ensemble simulation inputs. The user expresses CLI flags as a dictionary (collections.OrderedDict) of key-value pairs and must express execution order with the usual work graph dependency annotation. (Python wrapper for CLI programs, #198)
  2. If a value is of type gmx.InputFile or gmx.OutputFile, the filename is extracted from the object. (If the value is the type object gmx.InputFile or gmx.OutputFile itself, a placeholder is created, for convenience.) If a gmx.OutputFile object has already been associated with the output of an operation when it is used again, it is transformed to a gmx.InputFile dependent on the previous operation. The object returned has input and output attributes that can be used to set or get references to the files. (InputFile and OutputFile placeholders for Operations, #203)
  3. Operation inputs and outputs are compatible with simulation operation inputs and outputs. (Make operation inputs and outputs compatible with simulation operation, #204)
  4. gmx.tool contains predefined operations for known GROMACS CLI tools. The interfaces are like that of the objects returned by gmx.command_line(), but their input and output file options are discoverable, with appropriate default values or automatic file argument management.
  5. Operations in gmx.tool are compiled from GROMACS source, directly, with modifications to support error handling, like redefining GMX_FATAL_ERROR and handling CLI argument requirements or validity.
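
Steps 1 and 2 could be sketched roughly as follows. This is not gmxapi code: InputFile, OutputFile, build_argv, and run_cli are hypothetical stand-ins, written only to illustrate how an ordered dictionary of flags with file-typed values might be flattened into a subprocess argument list.

```python
import subprocess
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class InputFile:
    """Hypothetical stand-in for gmx.InputFile: a file the tool reads."""
    filename: str


@dataclass(frozen=True)
class OutputFile:
    """Hypothetical stand-in for gmx.OutputFile: a file the tool writes."""
    filename: str


def build_argv(executable, flags):
    """Flatten an ordered mapping of CLI flags into an argv list.

    File-typed values contribute their filename; None marks a bare switch.
    """
    argv = list(executable)
    for flag, value in flags.items():
        argv.append(flag)
        if value is None:
            continue
        if isinstance(value, (InputFile, OutputFile)):
            value = value.filename
        argv.append(str(value))
    return argv


def run_cli(executable, flags):
    """Run the command as a subprocess; raises CalledProcessError on nonzero exit."""
    return subprocess.run(build_argv(executable, flags),
                          check=True, capture_output=True, text=True)


# Build (but do not run) a command line for an example gmx tool invocation.
flags = OrderedDict([
    ("-f", InputFile("traj.xtc")),
    ("-s", InputFile("topol.tpr")),
    ("-o", OutputFile("rmsd.xvg")),
])
argv = build_argv(["gmx", "rms"], flags)
```

Note that check=True only catches nonzero exit codes, so a wrapper along these lines would still need its own checks on the expected output files.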

TBD:

  • GROMACS tool failures are hard to detect in many cases. It will take a while to discover the different cases. One that I've noticed was a quiet no-op when a filename argument did not have the standard GROMACS file extension for files of that type.
  • As a separate task, we can look at conveniently converting known output (file) types to numpy arrays or whatever.
  • In the first round, we probably assume that array tasks are one-dimensional, requiring sets of array inputs or outputs to be the same width (or zero-dimensional, if broadcasting). We might want syntax and semantics for higher-dimensional array work, such as permuting to sweep multiple parameters.
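
To make the one-dimensional assumption concrete, here is a minimal sketch of how array inputs might be checked and broadcast. The helper name broadcast_inputs, and the convention that a list or tuple is an array input while any other value is zero-dimensional, are assumptions for illustration and not part of gmxapi.

```python
def broadcast_inputs(**inputs):
    """Expand keyword inputs into one dict of arguments per array element.

    Assumption: a list or tuple is a one-dimensional array input; anything
    else is zero-dimensional and is broadcast to every element.
    """
    widths = {len(v) for v in inputs.values() if isinstance(v, (list, tuple))}
    if len(widths) > 1:
        raise ValueError(f"array inputs must share one width, got {sorted(widths)}")
    width = widths.pop() if widths else 1
    return [
        {name: (value[i] if isinstance(value, (list, tuple)) else value)
         for name, value in inputs.items()}
        for i in range(width)
    ]


# Three trajectories analysed against the same structure file.
tasks = broadcast_inputs(
    trajectory=["run0.xtc", "run1.xtc", "run2.xtc"],
    structure="topol.tpr",
)
```

Inputs are paired element by element here. A parameter sweep over several inputs would need a different rule, such as a Cartesian product, which is the higher-dimensional case left open above.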
@eirrgang
Collaborator Author

eirrgang commented Dec 6, 2018

Resolving one of the "to do"s: with respect to the dimensionality of data versus graph edges, the currently proposed solution is to distinguish between a graph edge that is an ensemble or array of operations and data that is a sequence or array. More discussion and details are in issue #198.

@peterkasson
Collaborator

We discussed the idea of making map() implicit:
if an input is parallel, we automatically map the operation onto each input;
if we want to gather inputs, we call gmx.gather().
The alternative would be to call gmx.map(gmx.command_line(...), foo.outputs) explicitly on the output of some previous operation.
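
The two options might look like this in a toy model. Everything here is hypothetical: Parallel, apply, explicit_map, and gather are illustrative names and do not reflect the actual gmx.map() or gmx.gather() interfaces.

```python
class Parallel(list):
    """Hypothetical marker type: a list whose elements are ensemble members."""


def explicit_map(operation, inputs):
    """Explicit style: the caller asks for one invocation per input."""
    return Parallel(operation(x) for x in inputs)


def apply(operation, value):
    """Implicit style: map automatically when the input is parallel."""
    if isinstance(value, Parallel):
        return explicit_map(operation, value)
    return operation(value)


def gather(value):
    """Collapse a parallel value into ordinary data for a single consumer."""
    return list(value)


frames = Parallel(["a.xtc", "b.xtc"])
names = apply(str.upper, frames)    # parallel input: mapped over each member
count = apply(len, gather(names))   # gathered input: len sees the whole list
```

In the implicit style the type of the input decides whether the operation fans out, so gather() becomes the only call the user has to write explicitly.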
