InputFile and OutputFile placeholders for Operations #203

Open
2 tasks
eirrgang opened this issue Dec 6, 2018 · 7 comments
Labels
gmxapi pertains to this repository and the Python module support task Task in support of a larger issue

Comments

@eirrgang
Collaborator

eirrgang commented Dec 6, 2018

subtask of #190
depends on #198

Manage input and output for operations with file-based I/O. Solve the problem for operations that wrap CLI tools in a manner that anticipates interoperation with more generic work graph nodes (e.g. simulation input/output).

If a value is of type gmx.InputFile or gmx.OutputFile, the filename is extracted from the object. (If the value is the type object gmx.InputFile or gmx.OutputFile itself, a placeholder is created, for convenience.) If a gmx.OutputFile object has already been associated with the output of an operation when it is used again, it is transformed to a gmx.InputFile dependent on the previous operation. The object returned has input and output attributes that can be used to set or get references to the files.

  • Manage input and output file descriptors: abstraction for "magic" arguments
  • Enforce data flow requirements: detect data dependencies and require that input data exists before allowing execution
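
A minimal sketch of the placeholder behavior described above, for discussion only. The classes here are hypothetical stand-ins, not the gmxapi implementation, and the helper names (resolve_output, resolve_input) and default file names are assumptions made for illustration.

```python
class InputFile:
    """Hypothetical stand-in for gmx.InputFile: a file an operation reads."""

    def __init__(self, filename, source=None):
        self.filename = filename
        self.source = source  # operation that produces this file, if any


class OutputFile:
    """Hypothetical stand-in for gmx.OutputFile: a file an operation writes."""

    def __init__(self, filename):
        self.filename = filename
        self.producer = None  # set once an operation claims this output


def resolve_output(value, operation, default_name="output.dat"):
    """Associate an output value with the operation that will write it."""
    if value is OutputFile:
        # The bare type object was passed: create a placeholder for convenience.
        value = OutputFile(default_name)
    if not isinstance(value, OutputFile):
        raise TypeError("expected OutputFile or the OutputFile type object")
    if value.producer is not None:
        raise ValueError("this OutputFile already belongs to another operation")
    value.producer = operation
    return value


def resolve_input(value, default_name="input.dat"):
    """Return an InputFile, converting an already-produced OutputFile."""
    if value is InputFile:
        return InputFile(default_name)
    if isinstance(value, OutputFile):
        if value.producer is None:
            # Data flow requirement: nothing produces this file yet.
            raise ValueError("OutputFile is not the output of any operation")
        return InputFile(value.filename, source=value.producer)
    if isinstance(value, InputFile):
        return value
    raise TypeError("expected InputFile, OutputFile, or the InputFile type object")
```

Recording the producer on the OutputFile is one way to detect data dependencies: an input derived from an output carries a reference to the operation that must run first.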
@eirrgang
Collaborator Author

eirrgang commented Dec 6, 2018

Values of the input and output dictionaries are inspected to see if they are objects representing files managed by gmxapi.
Otherwise, they must be convertible to strings to be added to the command line.
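
As a rough sketch of that inspection step (not the actual gmxapi code), the command line could be assembled as follows. ManagedFile and build_command are hypothetical names used only for illustration.

```python
class ManagedFile:
    """Hypothetical stand-in for a file object managed by gmxapi."""

    def __init__(self, filename):
        self.filename = filename


def build_command(executable, arguments, inputs, outputs):
    """Flatten the input and output dictionaries into an argument list."""
    command = [executable, *arguments]
    for mapping in (inputs, outputs):
        for flag, value in mapping.items():
            if isinstance(value, ManagedFile):
                # Managed file: extract the filename from the object.
                command.extend([flag, value.filename])
            else:
                # Anything else must be convertible to a string.
                command.extend([flag, str(value)])
    return command


command = build_command(
    "gmx",
    ["hbond"],
    inputs={"-f": "somefile.trr", "-s": "input.tpr"},
    outputs={"-num": ManagedFile("bynum.xvg"), "-ang": "hbang.xvg"},
)
```

The flags are passed through exactly as given, so no assumption is made about one-hyphen or two-hyphen conventions.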

GROMACS tools have been observed to do weird and inconsistent things when conventional
filename extensions are not used. In the first pass, at least, the user is required to provide a
sensible filename.

output_placeholder = gmx.OutputFile('bynum.xvg')
hbond = gmx.commandline_operation('gmx',
    arguments=['hbond'],
    input={
        '-f': 'somefile.trr',
        '-s': 'input.tpr',
        '-n': 'index.ndx' 
    },
    output={
        '-num': output_placeholder,
        '-ang': 'hbang.xvg'
    })

Note also that, in general, we can't intuit conventions for command line flag syntax (when to use one or two hyphens ('-')) so it's safest not to try. This makes the flags unsuitable for attribute names, so we will continue to use them in dictionary form after the object is created.

I'm not sure if it is a good idea to have the output property of an operation object implement getattr like a dictionary. For generality and consistency with other gmxapi operations, we should use a nested attribute. E.g.

>>> hbond.output.file_arg['-num']
gmx.OutputFile('bynum.xvg')
>>> str(hbond.output.file_arg['-num'])
'bynum.xvg'

We might not want to make it that easy to access the file name, though, to allow future abstraction (don't give gmx.OutputFile a __str__ method).

For example, consider a simulation that has just run.

Get a numpy array-like view into local data

xyz = md.output.conformation.extract().position[...]

Get the name of a file that is locally accessible

md.output.conformation.extract(savefile='output.gro')

In the long run, it is a robustness problem to encourage users to share file management responsibility with gmxapi. We can allow filename specification, but it is probably safer to have the user “extract” the file to a filename, possibly making a copy. It could be a symbolic link, because in the long term we expect to supplement file name information with a checksum hash and gmxapi will be able to detect if file content has changed unexpectedly.
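
One way the "extract" idea could look, as a sketch under stated assumptions: the function names, the use of SHA-256, and the fallback to a copy when symbolic links are unavailable are all illustrative choices, not a decided gmxapi design.

```python
import hashlib
import os
import shutil


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract(managed_path, savefile, recorded_checksum):
    """Expose a managed file under a user-chosen name.

    Refuses to proceed if the managed file no longer matches the checksum
    recorded when it was produced (hypothetical bookkeeping).
    """
    if file_checksum(managed_path) != recorded_checksum:
        raise RuntimeError("managed file content changed unexpectedly")
    if os.path.lexists(savefile):
        os.remove(savefile)
    try:
        os.symlink(os.path.abspath(managed_path), savefile)
    except OSError:
        # Symbolic links may be unavailable on some platforms: copy instead.
        shutil.copyfile(managed_path, savefile)
    return savefile
```

With the checksum recorded alongside the file name, a link is enough to hand the user a local name while the managed copy stays under gmxapi control.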

@eirrgang
Collaborator Author

eirrgang commented Dec 7, 2018

Do not specify static names. Default: manage file name. Allow helper to specify suffix when that doesn't work.
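
A possible shape for managed names with a user-supplied suffix, purely as a sketch. The helper name managed_filename and the naming scheme (operation label, flag, serial number) are assumptions for illustration.

```python
import itertools

_serial = itertools.count()  # process-wide counter keeps generated names unique


def managed_filename(operation, flag, suffix):
    """Generate a unique file name that still carries a conventional extension."""
    if not suffix.startswith("."):
        raise ValueError("suffix must look like a filename extension, e.g. '.xvg'")
    # Strip hyphens from the flag so the name contains no leading '-'.
    return "{}_{}_{}{}".format(operation, flag.lstrip("-"), next(_serial), suffix)
```

Keeping the suffix under user control preserves the conventional extension that the wrapped tool may rely on, while the rest of the name stays managed.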

@peterkasson
Collaborator

A couple suggestions:

hbond = gmx.commandline_operation('gmx',
    arguments=['hbond'],
    input={
        '-f': 'somefile.trr',
        '-s': 'input.tpr',
        '-n': 'index.ndx' 
    },
    output={
        '-num': gmx.output_placeholder('.xvg'),
        '-ang': gmx.output_placeholder('.xvg')
    })

and there could maybe also be default outputs?

@eirrgang
Collaborator Author

eirrgang commented Dec 7, 2018

Resolve ambiguity: when a file list is given to, say, an input file argument, assume it is implicitly a map operation. If it is a command that takes a list of filenames, require an explicit gather.
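
The rule could be sketched like this. Gather and expand_arguments are hypothetical names, and the real work graph would carry references rather than plain strings.

```python
class Gather:
    """Hypothetical marker: pass the whole list to a single command."""

    def __init__(self, items):
        self.items = list(items)


def expand_arguments(inputs):
    """Return one argument dictionary per command invocation.

    A bare list is treated as an implicit map (one invocation per element).
    A Gather is handed over intact as a list of filenames.
    """
    widths = {len(v) for v in inputs.values() if isinstance(v, (list, tuple))}
    if len(widths) > 1:
        raise ValueError("mapped inputs must all have the same length")
    width = widths.pop() if widths else 1

    jobs = []
    for index in range(width):
        job = {}
        for flag, value in inputs.items():
            if isinstance(value, Gather):
                job[flag] = list(value.items)
            elif isinstance(value, (list, tuple)):
                job[flag] = value[index]
            else:
                job[flag] = value  # scalar: shared by every invocation
        jobs.append(job)
    return jobs
```

Making gather explicit keeps the default unambiguous: a list always means "run once per element" unless the user says otherwise.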

@peterkasson
Collaborator

Suggest example for trjcat:

catfiles = gmx.commandline_operation('gmx',
    arguments=['trjcat'],
    input={
        '-f': gmx.gather(md.output.trajectory)
    },
    # output could be left implicit
    output={
        '-o': gmx.output_placeholder('.xtc')
    })

@peterkasson
Collaborator

So for mapped inputs:

hbond = gmx.commandline_operation('gmx',
    arguments=['hbond'],
    input={
        '-f': md.output.trajectory,
        '-s': grompp.output,
        '-n': select.output
    },
    output={
        '-num': gmx.output_placeholder('.xvg'),
        '-ang': gmx.output_placeholder('.xvg')
    })

@eirrgang
Collaborator Author

eirrgang commented Dec 7, 2018

Note that when we have a list of ensemble data, we could allow trjcat to map across the ensemble to combine a sequence of trajectory segments in each ensemble member.

2 participants