It is not straightforward to set an array of parameters across an array of plugins #202

Open
eirrgang opened this issue Dec 4, 2018 · 2 comments

Comments

eirrgang (Collaborator) commented Dec 4, 2018

When MD extension WorkElements are specified for ensembles or arrays of simulations, all ensemble members receive the same parameters.

If we need to solve this problem in the very short term, the easiest thing would be to let extension code process arrays using the session rank. We didn't settle on a solid way to provide the rank to operations during session launch, but it is available indirectly when constructing a launch() method in the builder closure. Rather than rely on that indirect access, though, we would probably make a new operation that provides ensemble arrays of data and let plugins subscribe to that operation.
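As a rough sketch of that rank-based stop-gap (every name here is a hypothetical stand-in for illustration, not current gmxapi API):

# Hypothetical sketch of the stop-gap: none of these names are current gmxapi API.
class StubContext:
    """Stand-in for the launch-time context, which knows this member's rank."""
    def __init__(self, rank):
        self.rank = rank


class EnsembleRestraintBuilder:
    """Builder whose launch() closure captures the per-rank slice of array parameters."""
    def __init__(self, params):
        # Parameter values that are lists hold one entry per ensemble member.
        self.params = params

    def build(self, context):
        rank = context.rank
        my_params = {key: (value[rank] if isinstance(value, list) else value)
                     for key, value in self.params.items()}

        def launch(session):
            # A real plugin would configure its restraint here; we just return the slice.
            return my_params

        return launch


# Three ensemble members, each receiving its own 'alpha'; 'nbins' is shared.
builder = EnsembleRestraintBuilder({'alpha': [0.1, 0.2, 0.3], 'nbins': 50})
print(builder.build(StubContext(rank=1))(None))  # {'alpha': 0.2, 'nbins': 50}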

If we can avoid a rushed solution, the problem should be addressed automatically by the dimensioned input/output types in the next-generation work graph scheme. Broadcasting would usually be possible implicitly, while reduction operations would reduce the dimensionality of data on graph edges.

Questions: is there necessarily a distinction regarding which dimension is added or removed, and are there different types of dimensions? E.g.

  • Does an N-member ensemble of simulations automatically map to/from the array dimension of any input/output? (Currently it does.)
  • Is only the "outermost" dimension available for broadcast/reduction? I.e. are multidimensional arrays explicitly nested one-dimensional arrays of one-dimensional arrays, or do we have a more numpy-like array slicing notion?
eirrgang added this to the gmxapi_workspec_0_2 milestone Dec 4, 2018
eirrgang added the bug label Dec 4, 2018
eirrgang (Collaborator, Author) commented

  • Mapping between operations of equal width is straightforward and implicit.
  • Broadcasting from single-valued to an ensemble of values is straightforward and implicit.
  • Gather is explicit with gmx.gather (TBD).
  • Reduce is just an operation whose output has different dimensionality than its input.
  • Scattering (or mapping a list of values) is implicit for the outermost dimension of list or array-like data.
  • For the purposes of implicit Scatter, gmx.MDArray objects are singular and are not decomposed by their outermost dimension. (A toy illustration of these rules follows this list.)
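
For instance, the rules above could be exercised with plain Python; these helper names are invented for illustration and are not gmxapi operations:

# Toy, non-gmxapi illustration of the dimensionality rules above.
def broadcast(value, width):
    """Singular -> ensemble: implicit for single-valued inputs."""
    return [value] * width

def scatter(values):
    """Implicit scatter: the outermost dimension maps one element to each member."""
    return [value for value in values]

def gather(member_values):
    """Explicit (cf. the proposed gmx.gather): collect one value per member into an array."""
    return list(member_values)

def mean_reduce(member_values):
    """A 'reduce' is just an operation whose output has lower dimensionality than its input."""
    return sum(member_values) / len(member_values)

ensemble = broadcast(1.5, 3)            # [1.5, 1.5, 1.5]
print(scatter([3.0, 2.0, 1.0]))         # one value per ensemble member
print(gather(ensemble))                 # a single array of per-member results
print(mean_reduce([1.0, 2.0, 3.0]))     # 2.0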

In the shortest term, we can let the work graph processing interpret a list of length 1 the same way it interprets a scalar, broadcasting the single element if appropriate.

Then, in the current schema, if you want to broadcast the array [0.0, 0.0, 0.0] to all members of an ensemble (say, to initialize some vector parameter), the parameter would simply be set as 'params': {'vector': [[0.0, 0.0, 0.0]]}

Then, to the API, 'params': {'parameter1': 1.0} is treated the same as 'params': {'parameter1': [1.0]}, but 'params': {'parameter1': [1.0, 2.0]} is treated differently than 'params': {'parameter1': [[1.0, 2.0]]}. Dimensions further "in" are handled with behavior defined by the operation.
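
As a concrete (non-normative) sketch of that interpretation rule, the work graph processing could normalize parameter values roughly like this; the helper name is invented for this comment:

# Invented helper, not gmxapi code: a scalar and a length-1 list are equivalent
# (broadcast), a list of length N maps one element per ensemble member, and
# anything "further in" is left to the operation to interpret.
def per_member_value(value, member, width):
    if not isinstance(value, list):
        value = [value]               # scalar behaves like a length-1 list
    if len(value) == 1:
        return value[0]               # broadcast the single element
    if len(value) == width:
        return value[member]          # map: one element per ensemble member
    raise ValueError("parameter width does not match ensemble width")

width = 3
print(per_member_value(1.0, 2, width))                # 1.0 on every member
print(per_member_value([1.0], 2, width))              # same as the scalar case
print(per_member_value([1.0, 2.0, 3.0], 2, width))    # 3.0 on member 2
print(per_member_value([[0.0, 0.0, 0.0]], 2, width))  # the whole vector, broadcast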

For compatibility with future semantics or schema, we can then let a Python gmx.MDArray type be a hint to the helper functions to write the work graph with the intended behavior. It's possible we will want to either deprecate bare scalar input values entirely (in the work graph schema) or remove the redundant representation of singular values through some other schema update.

eirrgang (Collaborator, Author) commented

After conversation with @peterkasson it sounds like the most useful and user-friendly thing we can do is to maximize the opportunity for implicit ensemble broadcast, map, and scatter by providing a gmx.NDArray type to explicitly mark data as singular. The N-Dimensional Array is one of the types we are beginning to specify for gmxapi anyway, so it is appropriate to create a Python interface now and allow its use similarly to gmx.InputFile and gmx.OutputFile in #203.

The gist:

gmx.NDArray is a class whose initializer mirrors the syntax of numpy.array, with a couple of additions. The initializer borrows the additional keyword argument shape as used in helpers like numpy.empty, numpy.ones, and numpy.zeros, but the shape keyword argument is not the first argument (the source data object is). This additionally allows us to broadcast the input data to the desired array shape.
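
A minimal sketch of what such an initializer might look like, assuming numpy handles the underlying storage and broadcasting (the real gmx.NDArray interface is still being specified):

import numpy

# Minimal sketch only; the real gmx.NDArray is still being specified.
class NDArray:
    """N-dimensional array whose initializer mirrors numpy.array plus a 'shape' hint."""
    def __init__(self, data, shape=None, dtype=float):
        values = numpy.array(data, dtype=dtype)
        if shape is not None:
            if isinstance(shape, int):
                # As with numpy, an integer shape means a 1-D array of that length.
                shape = (shape,)
            # Broadcast/recycle the source data to fill the requested shape (cf. numpy.resize).
            values = numpy.resize(values, shape)
        self.values = values
        self.shape = values.shape

With that sketch, NDArray(0., 10) and NDArray([0.], shape=(10,)) both yield a length-10 array of zeros, matching the cases in the example below.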

Example:

myoperation = someplugin.someoperation(
    input={
        'a': [3., 2., 1.],
        'b': [0.],
        'c': 0.0,
        'd': gmx.NDArray([0., 0., 0., 1.]),
        'e': gmx.NDArray([0.], shape=(10,)),
        'f': gmx.NDArray(0., shape=10),
        'g': gmx.NDArray(0., 10),
    }
)

The parameter a causes the values 3.0, 2.0, and 1.0 to be used on the first, second, and third members of an ensemble work graph.

The single value in parameter b is treated as if the user had written 'b': [0.] * 3, because an ensemble size greater than 1 is determined by examining the other parameters (here, a).

The scalar in c is treated as if it had been written [0.0] and is handled the same as in b.

The d parameter has been provided a singular value that will be broadcast as for b and c. The value is an array of length 4.

For e, the single-element list is broadcast into a new NDArray of length 10 (cf. numpy.resize).

For f, the value is broadcast into the new array as if it were a single-element list. As with numpy, an integer value for shape is taken to define the length of a one-dimensional array, as if it had been (10,).

For g, since shape is the second positional argument in the signature of gmx.NDArray.__init__, the keyword can be omitted, allowing syntax that looks just like one of the constructors of C++ std::vector<float>.
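
Putting the pieces together, here is a rough, purely illustrative sketch of how the helper functions might infer the ensemble width from plain lists while treating NDArray values as singular (a trivial stub stands in for gmx.NDArray):

# Purely illustrative; not gmxapi code.
class NDArray:
    """Stub for gmx.NDArray: marks array data as singular, so it is broadcast whole."""
    def __init__(self, data):
        self.values = list(data)


def ensemble_width(params):
    """Only plain Python lists contribute to the ensemble width (real code would
    also validate mismatched widths)."""
    widths = {len(v) for v in params.values() if isinstance(v, list)}
    widths.discard(1)
    return widths.pop() if widths else 1


def expand(params):
    """Produce one parameter dict per ensemble member, broadcasting singular values."""
    width = ensemble_width(params)
    members = []
    for rank in range(width):
        member = {}
        for key, value in params.items():
            if isinstance(value, NDArray):
                member[key] = value.values           # singular array: broadcast whole
            elif isinstance(value, list) and len(value) == width:
                member[key] = value[rank]            # scatter: one element per member
            elif isinstance(value, list):
                member[key] = value[0]               # length-1 list: broadcast
            else:
                member[key] = value                  # scalar: broadcast
        members.append(member)
    return members


params = {'a': [3., 2., 1.], 'b': [0.], 'c': 0.0, 'd': NDArray([0., 0., 0., 1.])}
for member in expand(params):
    print(member)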
