
Gradient Implementation #170

Open
sjperkins opened this issue Oct 10, 2016 · 2 comments

Comments

@sjperkins
Member

The intention is to leverage TensorFlow's gradient implementation, which computes symbolic partial derivatives of an output tensor with respect to a number of input tensors.

So for example, Montblanc's tensorflow implementation will be able to output the partial derivative of a chunk of model visibilities with respect to (lm, stokes, alpha, ebeam, etc...).

Question for the mathematicians: Is it possible to combine the partials to compute a derivative with respect to all input tensors? If not, it'll have to be brute-forced by computing visibilities for different parameter values and differencing.

  1. @ArunAniyan made the point that the partial derivatives are all separate because their quantities differ (coordinate, flux, etc.). It should be possible to combine partial derivatives when their quantities are the same.
  2. @bmerry points out that if the function is well behaved, you can take the dot product of the partials with some parameter offsets, i.e. a first-order expansion of the form Δf ≈ Σ_p (∂f/∂θ_p) Δθ_p
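The dot-product idea in point 2 can be checked numerically. A minimal sketch with NumPy, using a made-up scalar function standing in for a visibility quantity (the function and parameter values are purely illustrative, not Montblanc code):

```python
import numpy as np

# Illustrative scalar function of real parameters theta
def f(theta):
    return np.sum(theta**2 + np.sin(theta))

# Its analytic partial derivatives
def grad_f(theta):
    return 2 * theta + np.cos(theta)

theta = np.array([0.5, -1.2, 2.0])
delta = np.array([1e-4, -2e-4, 5e-5])   # small parameter offsets

# First-order prediction: dot product of the partials with the offsets
predicted = grad_f(theta) @ delta
actual = f(theta + delta) - f(theta)

# Agreement holds to second order in the offsets
assert abs(predicted - actual) < 1e-6
```

The error of the dot-product approximation shrinks quadratically with the offsets, which is what "well behaved" buys you here.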

Things to consider when implementing:

  1. Both computation and memory requirements will probably scale with P, the number of partial derivatives, so the memory budgeting mechanisms will need to take this into account.

/cc @marziarivi, @jtlz2, @SaiyanPrince and all Bayesian inference people, everywhere.

@marziarivi
Contributor

Not sure I understand the question...
For the chi-squared gradient, you can obviously combine the partial derivatives of the visibilities to compute it, because of the chain rule.
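The chain rule @marziarivi mentions can be sketched concretely. For a chi-squared over complex residuals, each real-parameter partial is 2 Re( Σ_k conj(r_k) ∂V_k/∂θ_p ). A NumPy sketch with a made-up linear visibility model (shapes and names are illustrative, not Montblanc API), checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(42)
nvis, npar = 8, 3

# Made-up observed visibilities and a linear toy model V(theta) = V0 + dV @ theta,
# whose Jacobian w.r.t. the real parameters theta is exactly dV.
V_obs = rng.normal(size=nvis) + 1j * rng.normal(size=nvis)
V0 = rng.normal(size=nvis) + 1j * rng.normal(size=nvis)
dV = rng.normal(size=(nvis, npar)) + 1j * rng.normal(size=(nvis, npar))

def V_mod(theta):
    return V0 + dV @ theta                      # complex model visibilities

def chi2(theta):
    return np.sum(np.abs(V_mod(theta) - V_obs) ** 2)

theta = rng.normal(size=npar)
r = V_mod(theta) - V_obs                        # complex residuals

# Chain rule: d(chi2)/dtheta_p = 2 Re( sum_k conj(r_k) dV_k/dtheta_p )
grad = 2.0 * np.real(dV.T @ r.conj())

# Central finite differences agree with the chain-rule gradient
eps = 1e-6
fd = np.array([
    (chi2(theta + eps * np.eye(npar)[p]) - chi2(theta - eps * np.eye(npar)[p])) / (2 * eps)
    for p in range(npar)
])
assert np.allclose(grad, fd, atol=1e-4)
```

This also illustrates why complex-valued outputs aren't a blocker for the chi-squared: the objective itself is real, so the complex visibility partials only ever enter through the real part of that product.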

@sjperkins
Member Author

After discussion with @landmanbester and some experiments this morning, it looks like the tensorflow function (tf.gradients) provides the Jacobian of some tensor with respect to input tensors. For example, this snippet computes y and the Jacobian of y w.r.t. x:

import tensorflow as tf

xshape = (4, 2)

x = tf.ones(shape=xshape, dtype=tf.float64)
y = tf.reduce_sum(x**2 + x)
grad = tf.gradients(y, x)

with tf.Session() as S:
    # Compute y and its symbolic gradient w.r.t. x
    _y, _grad = S.run([y, grad])
    # _grad[0] contains the partial of y w.r.t. x. Same shape as x.
    assert _grad[0].shape == xshape
    print(_y, _grad)

Then, to produce a gradient operator, one can flatten the Jacobian(s) out to produce a vector of parameters. So if one thinks of lm coordinates instead of x, one will have eight parameters (four l and four m coordinates).
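The flattening step can be sketched with NumPy. The (4, 2) lm shape below mirrors the xshape in the snippet above; everything else (names, update rule) is illustrative:

```python
import numpy as np

# Toy per-tensor gradient with the same shape as an lm coordinate tensor:
# 4 sources x 2 coordinates (l, m). Values are arbitrary placeholders.
lm_grad = np.arange(8.0).reshape(4, 2)

# Flatten into the single parameter vector an optimiser expects:
# eight parameters (four l and four m coordinates).
flat = lm_grad.ravel()
assert flat.shape == (8,)

# Several tensors' gradients would be concatenated the same way...
stokes_grad = np.ones(12.0 and 12).astype(np.float64)  # e.g. 4 sources x 3 params
param_vec = np.concatenate([lm_grad.ravel(), stokes_grad.ravel()])
assert param_vec.shape == (20,)

# ...and a flat update is mapped back via reshape
update = -0.1 * flat
assert update.reshape(lm_grad.shape).shape == (4, 2)
```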

Thoughts

  • Probably can't compute the gradient w.r.t. complex-valued tensors.
  • Need to discover whether one can compute the gradient of complex-valued tensors w.r.t. real tensors.
  • It's possible to define gradients for custom operators e.g. this
