Manifold optimization #435
@@ -0,0 +1,6 @@
# Complex optimization
Optimization of functions defined on complex inputs (C^n to R) is supported by simply passing a complex `x0` as input. All zeroth- and first-order optimization algorithms are supported. For now, only explicit gradients are supported.

The gradient of a complex-to-real function is defined as the unique vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

The gradient of a C^n-to-R function is a C^n-to-C^n map. Even if it is differentiable when seen as a function from R^2n to R^2n, it is in general not complex-differentiable. For instance, take f(z) = Re(z)^2. Then g(z) = 2 Re(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a C^n-to-R function is in general not well-defined as an n x n complex matrix (only as a 2n x 2n real matrix), and second-order optimization algorithms are not directly applicable. To use second-order optimization, convert to real variables.
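To make the calling convention concrete, here is a small sketch (ours, not part of the diff) minimizing `f(z) = ||z - c||^2` with the explicit gradient `g(z) = 2(z - c)` in the convention above; all names are illustrative:
```julia
# Minimize f(z) = ||z - c||^2 over z in C^n; in the convention above,
# the gradient is g(z) = 2(z - c)
n = 4
c = randn(n) + im*randn(n)
f(z) = sum(abs2, z - c)
g!(stor, z) = (stor .= 2 .* (z .- c))
z0 = randn(n) + im*randn(n)   # a complex x0 enables complex optimization
Optim.optimize(f, g!, z0, Optim.ConjugateGradient())
```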
@@ -0,0 +1,37 @@
# Manifold optimization
Optim.jl supports the minimization of functions defined on Riemannian manifolds, i.e. with simple constraints such as normalization and orthogonality. The basic idea of such algorithms is to project back ("retract") each iterate of an unconstrained minimization method onto the manifold. This is used by passing a `manifold` keyword argument to the optimizer.

## Howto
Here is a simple test case where we minimize the Rayleigh quotient `<x, A x>` of a symmetric matrix `A` under the constraint `||x|| = 1`, finding an eigenvector associated with the lowest eigenvalue of `A`.
```julia
n = 10
A = Diagonal(linspace(1, 2, n))
f(x) = vecdot(x, A*x)/2
g(x) = A*x
g!(stor, x) = copy!(stor, g(x))
x0 = randn(n)

manif = Optim.Sphere()
Optim.optimize(f, g!, x0, Optim.ConjugateGradient(manifold=manif))
```
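At the minimum, `x` approaches (up to sign) the first coordinate vector, and `2*f(x)` approaches the smallest eigenvalue of `A`, which is `1` here.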

## Supported solvers and manifolds
All first-order optimization methods are supported.

The following manifolds are currently supported:
* Flat: Euclidean space, the default; standard unconstrained optimization.
* Sphere: the spherical constraint `||x|| = 1`.
* Stiefel: the Stiefel manifold of `N x n` matrices with orthonormal columns, i.e. `X'*X = I`.

The following meta-manifolds construct manifolds out of pre-existing ones:
* PowerManifold: identical copies of a specified manifold.
* ProductManifold: the product of two (potentially different) manifolds.

See `test/multivariate/manifolds.jl` for usage examples; a construction sketch is also shown below.
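As an illustration (ours, not from the diff; the argument layout follows the `PowerManifold` and `ProductManifold` constructors in the source below), the meta-manifolds might be constructed like this:
```julia
# Hypothetical sketch: 4 independent unit vectors of length 10, stored
# contiguously; inner_dims = (10,), outer_dims = (4,)
m_power = Optim.PowerManifold(Optim.Sphere(), (10,), (4,))

# Hypothetical sketch: one unit vector of length 10 followed by a 10x3
# Stiefel point, concatenated into a single flat vector
m_prod = Optim.ProductManifold(Optim.Sphere(), Optim.Stiefel(), (10,), (10, 3))
```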

Implementing a new manifold is as simple as adding the methods `retract!(M::YourManifold, x)` and `project_tangent!(M::YourManifold, g, x)` (see the sketch below). If you implement another manifold or optimization method, please contribute a PR!
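For instance, a hypothetical manifold of vectors of fixed norm `r` (a rescaled sphere, not part of this PR; `FixedNormSphere` is an illustrative name) could be implemented as:
```julia
import Optim: retract!, project_tangent!

# Hypothetical manifold {x : ||x|| = r}
struct FixedNormSphere <: Optim.Manifold
    r::Float64
end
# Map x back onto the manifold by rescaling it to norm r
retract!(M::FixedNormSphere, x) = (x .*= M.r / norm(x))
# Remove the radial component of g: the tangent space at x is {v : real(<x,v>) = 0}
project_tangent!(M::FixedNormSphere, g, x) =
    (g .-= (real(vecdot(x, g)) / real(vecdot(x, x))) .* x)
```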

## References
The Geometry of Algorithms with Orthogonality Constraints, Alan Edelman, Tomás A. Arias, Steven T. Smith, SIAM J. Matrix Anal. Appl., 20(2), 303–353, 1998.

Optimization Algorithms on Matrix Manifolds, P.-A. Absil, R. Mahony, R. Sepulchre, Princeton University Press, 2008.
@@ -0,0 +1,157 @@
# Manifold interface: every manifold (subtype of Manifold) defines the functions
# project_tangent!(m, g, x): project g onto the tangent space to m at x
# retract!(m, x): map x back to a point on the manifold m

# For mathematical references, see e.g.

# The Geometry of Algorithms with Orthogonality Constraints
# Alan Edelman, Tomás A. Arias, and Steven T. Smith
# SIAM J. Matrix Anal. Appl., 20(2), 303–353 (51 pages)

# Optimization Algorithms on Matrix Manifolds
# P.-A. Absil, R. Mahony, R. Sepulchre
# Princeton University Press, 2008

abstract type Manifold end

Review comment: I suggest adding docstrings, at least to all the new types.

# Fake objective function implementing a retraction: function values are
# computed at the retracted point, and gradients are projected onto the
# tangent space
mutable struct ManifoldObjective{T<:NLSolversBase.AbstractObjective} <: NLSolversBase.AbstractObjective
    manifold::Manifold
    inner_obj::T
end
iscomplex(obj::ManifoldObjective) = iscomplex(obj.inner_obj)
# TODO is it safe here to call retract! and change x?
function NLSolversBase.value!(obj::ManifoldObjective, x)
    xin = complex_to_real(obj, retract(obj.manifold, real_to_complex(obj, x)))
    value!(obj.inner_obj, xin)
end
function NLSolversBase.value(obj::ManifoldObjective)
    value(obj.inner_obj)
end
function NLSolversBase.gradient(obj::ManifoldObjective)
    gradient(obj.inner_obj)
end
function NLSolversBase.gradient(obj::ManifoldObjective, i::Int)
    gradient(obj.inner_obj, i)
end
function NLSolversBase.gradient!(obj::ManifoldObjective, x)
    xin = complex_to_real(obj, retract(obj.manifold, real_to_complex(obj, x)))
    gradient!(obj.inner_obj, xin)
    project_tangent!(obj.manifold, real_to_complex(obj, gradient(obj.inner_obj)), real_to_complex(obj, xin))
    return gradient(obj.inner_obj)
end
function NLSolversBase.value_gradient!(obj::ManifoldObjective, x)
    xin = complex_to_real(obj, retract(obj.manifold, real_to_complex(obj, x)))
    value_gradient!(obj.inner_obj, xin)
    project_tangent!(obj.manifold, real_to_complex(obj, gradient(obj.inner_obj)), real_to_complex(obj, xin))
    return value(obj.inner_obj)
end

# Fallbacks for out-of-place operations
project_tangent(M::Manifold, g, x) = project_tangent!(M, copy(g), x)
retract(M::Manifold, x) = retract!(M, copy(x))

# Flat manifold = {R,C}^n
# All the functions below are no-ops, so the code generated for the flat
# manifold should be exactly the same as with all the manifold machinery removed
struct Flat <: Manifold
end
retract(M::Flat, x) = x
retract!(M::Flat, x) = x
project_tangent(M::Flat, g, x) = g
project_tangent!(M::Flat, g, x) = g

# Sphere {x : ||x|| = 1}
struct Sphere <: Manifold
end
retract!(S::Sphere, x) = normalize!(x)
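# The tangent space at x is {v : real(<x, v>) = 0}; the projection below
# removes the radial component of g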
project_tangent!(S::Sphere, g, x) = (g .= g .- real(vecdot(x, g)) .* x)

# Stiefel manifold: N x n matrices with orthonormal columns, i.e. such that X'X = I
# Special cases: N x 1 = sphere, N x N = O(N) / U(N)
abstract type Stiefel <: Manifold end
Review comment: I know it's not done all over the code base, but a simple reference or explanation of what the "Stiefel manifold" is would be nice.

Review comment: That's an important but pretty special manifold, no? What is the justification for having it as part of Optim?

Author reply: Will document it. The main justification is that it is the one I need in my application ;-) More seriously, it's the basic manifold for this kind of algorithm: it was the original motivation for the theory, many other manifolds (sphere, O(n), U(n)) are special cases, it's probably the most used in applications (at least that I know of) outside of the sphere, and it's a good template for implementing other manifolds. There could be a Manifolds package living outside Optim, but it's a pretty short file, so I would think this is fine, and people implementing other manifolds can just PR on Optim?

Review comment: Good point about the special cases. OK, maybe leave this for now and discuss moving outside of Optim only when somebody complains.
# Two types of retraction: SVD is the most stable, CholQR the fastest
struct Stiefel_CholQR <: Stiefel end
struct Stiefel_SVD <: Stiefel end
function Stiefel(retraction=:SVD)
    if retraction == :CholQR
        Stiefel_CholQR()
    elseif retraction == :SVD
        Stiefel_SVD()
    else
        error("Unknown retraction $retraction; use :SVD or :CholQR")
    end
end

function retract!(S::Stiefel_SVD, X)
    U, _, V = svd(X)
    X .= U*V'
end
function retract!(S::Stiefel_CholQR, X)
    overlap = X'X
    X .= X/chol(overlap)
end
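# The tangent space at X consists of directions D with X'D + D'X = 0;
# subtracting X times the symmetric part of X'G projects G onto it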
project_tangent!(S::Stiefel, G, X) = (G .-= X*((X'G .+ G'X)./2))

# Multiple copies of the same manifold. Points are arrays of arbitrary
# dimensions; the first dimensions (given by inner_dims) index points of the
# inner manifold. E.g. a 2x2 grid of Stiefel manifolds of dimension N x n
# would be stored as an N x n x 2 x 2 array
struct PowerManifold <: Manifold
    inner_manifold::Manifold # type of embedded manifold
    inner_dims::Tuple # dimension of the embedded manifolds
    outer_dims::Tuple # number of embedded manifolds
end
function retract!(m::PowerManifold, x)
    for i = 1:prod(m.outer_dims)
        retract!(m.inner_manifold, get_inner(m, x, i))
    end
    x
end
function project_tangent!(m::PowerManifold, g, x)
    for i = 1:prod(m.outer_dims)
        project_tangent!(m.inner_manifold, get_inner(m, g, i), get_inner(m, x, i))
    end
    g
end
# View of the i-th inner point, reshaped to the dimensions of the inner manifold
@inline function get_inner(m::PowerManifold, x, i::Int)
    size_inner = prod(m.inner_dims)
    size_outer = prod(m.outer_dims)
    @assert 1 <= i <= size_outer
    return reshape(view(x, (i-1)*size_inner+1:i*size_inner), m.inner_dims)
end
@inline get_inner(m::PowerManifold, x, i::Tuple) = get_inner(m, x, sub2ind(m.outer_dims, i...))

# Product of two manifolds {P = (x1, x2), x1 ∈ m1, x2 ∈ m2}.
# P is stored as a flat 1D array, with x1 before x2 in memory
struct ProductManifold <: Manifold
    m1::Manifold
    m2::Manifold
    dims1::Tuple
    dims2::Tuple
end
function retract!(m::ProductManifold, x)
    retract!(m.m1, get_inner(m, x, 1))
    retract!(m.m2, get_inner(m, x, 2))
    x
end
function project_tangent!(m::ProductManifold, g, x)
    project_tangent!(m.m1, get_inner(m, g, 1), get_inner(m, x, 1))
    project_tangent!(m.m2, get_inner(m, g, 2), get_inner(m, x, 2))
    g
end
function get_inner(m::ProductManifold, x, i)
    N1 = prod(m.dims1)
    N2 = prod(m.dims2)
    @assert length(x) == N1+N2
    if i == 1
        return reshape(view(x, 1:N1), m.dims1)
    elseif i == 2
        return reshape(view(x, N1+1:N1+N2), m.dims2)
    else
        error("Only two components in a product manifold")
    end
end
Review comment: This is clear to me now. But the sentence still reads as if this were always true, while it is only true in this example (and many others). Maybe you could add the "rarely" somewhere? But I think my original point was valid: if g were complex-differentiable, then you could define the Hessian as a C^{n x n} matrix. You are just arguing that this is "rare", correct? But is it more "rare" than the objective f being complex-differentiable?
I have no experience with this at all, so I will take your word for it - I just want the documentation to be clear. If you can add a reference, that would help as well.
Author reply: I added an "in general".
I don't know how rare it is. Some simple functions do have a C^{n x n} Hessian, e.g. |z|^2 and z' A z; others don't. I have encountered two of the latter in practice: |z|^4 (the nonlinearity in the Gross-Pitaevskii equation) and Im(ln(z)) (used in the computation of Wannier functions).
I don't have a good reference for this. Even the definition of the gradient is somewhat non-standard (even though it's clearly the right thing to do for C^n-to-R optimization). Since half of quantum mechanics is minimizing complex-to-real energies (the other half being perturbation theory), the physics literature is littered with "dE/dz^*" with no details, so this might be covered in a physics textbook somewhere.
I don't have a good reference for this. Even the definition of the gradient is somewhat non-standard (even though it's cleary the right thing to do for C^n to R optimization). Since half of quantum mechanics is minimizing complex-to-real energies (the other half being perturbation), the physics literature is littered with "dE/dz^*" with no details, so this might be covered in a physics textbook somewhere.