General mapping of inputs and outputs for arbitrary tools. #190

eirrgang · 2018-11-30T11:30:32Z

We need support for analysis tool chains.

Well before we have GROMACS core functionality to interact with the Trajectory Analysis Framework and GROMACS module registration, we can start with top-down access to command line interface (CLI) tools. In a first pass, we can enable CWL-like (Common Workflow Language) wrapping of CLI tools, allowing access to tools invoked as gmx toolname -flag1 arg1 -flag2 arg2 (and, inherently, any command line program with similar syntax), with management of inputs and outputs. Then we can wrap the CLI code for gmx tools directly to allow named operations with better knowledge of the available inputs and outputs. As an API is developed for registering inputs and outputs, GROMACS modules can be updated to have full gmxapi abstraction and be freed from strictly filesystem-based I/O.

The change sequence will look something like the following.

Python wrapper for CLI programs: gmx.command_line() produces gmx.Operation objects that can be used in a work graph to invoke subprocesses. gmx.map() can generate appropriate graph topologies for e.g. ArrayOperation or ensemble simulation inputs. The user expresses CLI flags as a dictionary (collections.OrderedDict) of key-value pairs. The user must express execution order with the usual work graph dependency annotation. (See: Python wrapper for CLI programs #198)
If a value is of type gmx.InputFile or gmx.OutputFile, the filename is extracted from the object. (If the value is the type object gmx.InputFile or gmx.OutputFile itself, a placeholder is created, for convenience.) If a gmx.OutputFile object has already been associated with the output of an operation when it is used again, it is transformed into a gmx.InputFile dependent on the previous operation. The object returned has input and output attributes that can be used to set or get references to the files. (See: InputFile and OutputFile placeholders for Operations #203)
Operation inputs and outputs are compatible with simulation operation inputs and outputs. (See: Make operation inputs and outputs compatible with simulation operation #204)
gmx.tool contains predefined operations for known GROMACS CLI tools. The interfaces are like those of the objects returned by gmx.command_line(), but their input and output file options are discoverable, with appropriate default values or automatic file argument management.
Operations in gmx.tool are compiled directly from GROMACS source, with modifications to support error handling, such as redefining GMX_FATAL_ERROR and handling CLI argument requirements or validity.
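The first two steps can be illustrated with a minimal sketch, using only the Python standard library. It is not the gmxapi implementation: InputFile and OutputFile here are hypothetical stand-ins for the proposed gmx.InputFile and gmx.OutputFile, build_argv is an invented helper name, and the file names are made up. It only shows how an ordered dictionary of flags might be flattened into a subprocess argument list, with filenames extracted from file objects.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class InputFile:
    """Hypothetical stand-in for the proposed gmx.InputFile."""
    filename: str


@dataclass(frozen=True)
class OutputFile:
    """Hypothetical stand-in for the proposed gmx.OutputFile."""
    filename: str


def build_argv(executable, flags):
    """Flatten an ordered mapping of CLI flags into an argument list.

    File objects contribute their filename; any other value is
    converted with str(). Flag order follows the mapping order.
    """
    argv = list(executable)
    for flag, value in flags.items():
        argv.append(flag)
        if isinstance(value, (InputFile, OutputFile)):
            argv.append(value.filename)
        else:
            argv.append(str(value))
    return argv


# Illustrative flags for `gmx rms`; the file names are assumptions.
flags = OrderedDict([
    ("-f", InputFile("traj.xtc")),
    ("-s", InputFile("topol.tpr")),
    ("-o", OutputFile("rmsd.xvg")),
])
argv = build_argv(["gmx", "rms"], flags)
print(argv)
```

The resulting list could be handed to subprocess.run(argv, check=True) to invoke the tool. Tracking which operation produced each OutputFile, so that it can later be consumed as a dependent InputFile, is left out of this sketch.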

TBD:

GROMACS tool failures are hard to detect in many cases. It will take a while to discover the different cases. One that I've noticed was a quiet no-op when a filename argument did not have the standard GROMACS file extension for files of that type.
As a separate task, we can look at conveniently converting known output (file) types to numpy arrays or whatever.
In the first round, we probably assume that array tasks are one-dimensional, requiring sets of array inputs or outputs to be the same width (or zero-dimensional, if broadcasting). We might want syntax and semantics for higher-dimensional array work, such as permuting to sweep multiple parameters.
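As a rough illustration of the one-dimensional assumption, the sketch below expands a set of named inputs into one set of arguments per array member. The function name broadcast_inputs and the file names are hypothetical. Lists and tuples are treated as one-dimensional arrays that must share a width, and any other value is treated as zero-dimensional and broadcast to every member.

```python
def broadcast_inputs(**inputs):
    """Expand named inputs into one dict of arguments per array member.

    Lists and tuples count as one-dimensional arrays and must all have
    the same width. Every other value is zero-dimensional and is
    repeated (broadcast) for each member.
    """
    widths = {len(v) for v in inputs.values() if isinstance(v, (list, tuple))}
    if len(widths) > 1:
        raise ValueError(
            "array inputs must have the same width, got {}".format(sorted(widths))
        )
    # With no array inputs at all, there is a single task.
    width = widths.pop() if widths else 1
    return [
        {
            name: (value[i] if isinstance(value, (list, tuple)) else value)
            for name, value in inputs.items()
        }
        for i in range(width)
    ]


# One structure file is broadcast across three trajectories.
tasks = broadcast_inputs(
    structure="conf.gro",
    trajectory=["run0.xtc", "run1.xtc", "run2.xtc"],
)
for task in tasks:
    print(task)
```

Sweeping several parameters at once would need something beyond this, for example taking the product of the array inputs rather than pairing them element by element.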


eirrgang · 2018-12-06T13:22:46Z

Resolving one of the "to do"s: w.r.t. dimensionality of data versus graph edges, the currently proposed solution is to make a distinction between a graph edge that is an ensemble or array of operations and data that is a sequence or array. More discussion and details are in issue #198.

peterkasson · 2018-12-07T14:18:51Z

We discussed the idea of making map() implicit:
If an input is parallel, we automatically map the operation onto each input.
If we want to gather inputs, we call gmx.gather().
The alternative would be to call gmx.map(gmx.command_line(...), foo.outputs) on the output of some previous operation.
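The implicit-map idea could look roughly like the sketch below. It is only an illustration of the semantics under discussion, not gmxapi code: Ensemble, apply, and gather are hypothetical names, and plain Python callables stand in for operations. An Ensemble marks a parallel edge, while an ordinary list remains a single piece of data.

```python
class Ensemble:
    """Hypothetical marker for a parallel edge: one value per member."""

    def __init__(self, members):
        self.members = list(members)


def apply(operation, data):
    """Run an operation, mapping implicitly over parallel input.

    If data is an Ensemble, the operation is called once per member and
    the results form a new Ensemble. Otherwise it is called once.
    """
    if isinstance(data, Ensemble):
        return Ensemble(operation(member) for member in data.members)
    return operation(data)


def gather(data):
    """Collapse a parallel edge into one ordinary list."""
    if isinstance(data, Ensemble):
        return list(data.members)
    return [data]


# Parallel input: the operation is mapped onto each member.
trajectories = Ensemble(["a.xtc", "b.xtc"])
upper = apply(str.upper, trajectories)

# After gathering, the list is ordinary data, so the next operation
# is called once on the whole list rather than once per element.
names = gather(upper)
count = apply(len, names)
print(names, count)
```

With the explicit alternative, apply would never inspect its input and the caller would have to request the mapping for each parallel edge.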

This was referenced Dec 3, 2018

Implicit context stored in Python module variable #90

Closed

Python wrapper for CLI programs #198

Open

Python wrapper for CLI programs #200

Closed

InputFile and OutputFile placeholders for Operations #203

Open

eirrgang added the gmxapi label (pertains to this repository and the Python module support) Dec 6, 2018

eirrgang added the enhancement label Dec 6, 2018

This was referenced Dec 6, 2018

Make operation inputs and outputs compatible with simulation operation #204

Open

Looping operations and subgraph management #205

Open

support find_program(gmx) #207

Open

This was referenced Dec 7, 2018

Conventions and utilities for Python Operations #208

Open

Allow Context to automatically manipulate simulation input files to restore MD operation to a known state. #71

Open
