
Gradient Implementation #170

Open
sjperkins opened this issue Oct 10, 2016 · 2 comments

Comments

@sjperkins
Member

The intention is to leverage TensorFlow's gradient implementation, which computes symbolic partial derivatives of an output tensor with respect to a number of input tensors.

So for example, Montblanc's tensorflow implementation will be able to output the partial derivative of a chunk of model visibilities with respect to (lm, stokes, alpha, ebeam, etc...).

Question for the mathematicians: Is it possible to combine the partials to compute a derivative with respect to all input tensors? If not, it'll have to be brute-forced by computing visibilities for different parameter values and differencing.

  1. @ArunAniyan made the point that the partial derivatives are all separate because their quantities differ (coordinate, flux, etc.). It should be possible to combine partial derivatives when their quantities are the same.
  2. @bmerry points out that if the function is well behaved, you can take the dot product of the partials with some parameter offsets, i.e. a first-order expansion of the form Δf ≈ Σ_p (∂f/∂θ_p) Δθ_p
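The dot-product idea in point 2 can be checked numerically. A minimal sketch with NumPy, using a made-up scalar function standing in for a visibility quantity (the function and parameter values are purely illustrative, not Montblanc code):

```python
import numpy as np

# Illustrative scalar function of real parameters theta
def f(theta):
    return np.sum(theta**2 + np.sin(theta))

# Its analytic partial derivatives
def grad_f(theta):
    return 2 * theta + np.cos(theta)

theta = np.array([0.5, -1.2, 2.0])
delta = np.array([1e-4, -2e-4, 5e-5])   # small parameter offsets

# First-order prediction: dot product of the partials with the offsets
predicted = grad_f(theta) @ delta
actual = f(theta + delta) - f(theta)

# Agreement holds to second order in the offsets
assert abs(predicted - actual) < 1e-6
```

The error of the dot-product approximation shrinks quadratically with the offsets, which is what "well behaved" buys you here.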

Things to consider when implementing:

  1. Both computation and memory requirements will probably scale with P, the number of partial derivatives, so the memory budgeting mechanisms will need to take this into account.

/cc @marziarivi, @jtlz2, @SaiyanPrince and all Bayesian inference people, everywhere.

@marziarivi
Contributor

Not sure I understand the question...
For the chi-squared gradient, you can obviously combine the partial derivatives of the visibilities to compute it, because of the chain rule.
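The chain rule @marziarivi mentions can be sketched concretely. For a chi-squared over complex residuals, each real-parameter partial is 2 Re( Σ_k conj(r_k) ∂V_k/∂θ_p ). A NumPy sketch with a made-up linear visibility model (shapes and names are illustrative, not Montblanc API), checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(42)
nvis, npar = 8, 3

# Made-up observed visibilities and a linear toy model V(theta) = V0 + dV @ theta,
# whose Jacobian w.r.t. the real parameters theta is exactly dV.
V_obs = rng.normal(size=nvis) + 1j * rng.normal(size=nvis)
V0 = rng.normal(size=nvis) + 1j * rng.normal(size=nvis)
dV = rng.normal(size=(nvis, npar)) + 1j * rng.normal(size=(nvis, npar))

def V_mod(theta):
    return V0 + dV @ theta                      # complex model visibilities

def chi2(theta):
    return np.sum(np.abs(V_mod(theta) - V_obs) ** 2)

theta = rng.normal(size=npar)
r = V_mod(theta) - V_obs                        # complex residuals

# Chain rule: d(chi2)/dtheta_p = 2 Re( sum_k conj(r_k) dV_k/dtheta_p )
grad = 2.0 * np.real(dV.T @ r.conj())

# Central finite differences agree with the chain-rule gradient
eps = 1e-6
fd = np.array([
    (chi2(theta + eps * np.eye(npar)[p]) - chi2(theta - eps * np.eye(npar)[p])) / (2 * eps)
    for p in range(npar)
])
assert np.allclose(grad, fd, atol=1e-4)
```

This also illustrates why complex-valued outputs aren't a blocker for the chi-squared: the objective itself is real, so the complex visibility partials only ever enter through the real part of that product.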

@sjperkins
Member Author

After discussion with @landmanbester and some experiments this morning, it looks like the tensorflow function (tf.gradients) provides the Jacobian of some tensor with respect to input tensors. For example, this snippet computes y and the Jacobian of y w.r.t. x:

import tensorflow as tf

xshape = (4, 2)

x = tf.ones(shape=xshape, dtype=tf.float64)
y = tf.reduce_sum(x**2 + x)
grad = tf.gradients(y, x)

with tf.Session() as S:
    # Compute y and its symbolic gradient w.r.t. x
    _y, _grad = S.run([y, grad])
    # _grad[0] contains the partial of y w.r.t. x. Same shape as x.
    assert _grad[0].shape == xshape
    print(_y, _grad)

Then, to produce a gradient operator, one can flatten the Jacobian(s) out to produce a vector of parameters. So if one thinks of lm coordinates instead of x, one will have eight parameters (four l and four m coordinates).
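The flattening step can be sketched with NumPy. The (4, 2) lm shape below mirrors the xshape in the snippet above; everything else (names, update rule) is illustrative:

```python
import numpy as np

# Toy per-tensor gradient with the same shape as an lm coordinate tensor:
# 4 sources x 2 coordinates (l, m). Values are arbitrary placeholders.
lm_grad = np.arange(8.0).reshape(4, 2)

# Flatten into the single parameter vector an optimiser expects:
# eight parameters (four l and four m coordinates).
flat = lm_grad.ravel()
assert flat.shape == (8,)

# Several tensors' gradients would be concatenated the same way...
stokes_grad = np.ones(12.0 and 12).astype(np.float64)  # e.g. 4 sources x 3 params
param_vec = np.concatenate([lm_grad.ravel(), stokes_grad.ravel()])
assert param_vec.shape == (20,)

# ...and a flat update is mapped back via reshape
update = -0.1 * flat
assert update.reshape(lm_grad.shape).shape == (4, 2)
```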

Thoughts

  • Probably can't compute the gradient w.r.t. complex-valued tensors.
  • Need to discover whether one can compute the gradient of complex-valued tensors w.r.t. real tensors.
  • It's possible to define gradients for custom operators e.g. this
