Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed posterior for NormalWishart. #28

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

kaselby
Copy link
Contributor

@kaselby kaselby commented Apr 17, 2018

The posterior calculations for most of this package are based on the paper Conjugate Bayesian analysis of the Gaussian distribution by Kevin Murphy. This paper has several typos related to the Normal-Wishart distribution. The ConjugatePriors package fixes the typos related to the pdf of the Normal-Wishart (though I would note that right now the pdf function will not run without errors for this distribution, as it relies on a function called IsApproxSymmetric that does not appear to exist - but that is a separate issue), but it does not fix the typos in the posterior. The paper cites the posterior update for the scale matrix as being:
T_n = T_0 + S + k_0/k_n * n * (mu_0 - xhat)(mu0-xhat)'
The correct update is
inv(T_n) = inv(T_0) + S + k_0/k_n * n * (mu_0 - xhat)(mu0-xhat)'
See https://stats.stackexchange.com/questions/153241/derivation-of-normal-wishart-posterior/324925#324925 for details.

@kaselby
Copy link
Contributor Author

kaselby commented Apr 18, 2018

Just a comment that I have tested this with a MCMC sampler I am using. When I use the existing posterior update, the sampler does not converge, and gives nonsensical results. When I use the update I added here, it converges quickly to the expected answer.

@mschauer
Copy link
Member

mschauer commented May 2, 2018

I checked with https://en.wikipedia.org/wiki/Conjugate_prior#Continuous_distributions, so I agree that this is correct. Is there a way that we could interchange last Cholesky und inversion and return a Wishart which is parametrized differently?

@kaselby
Copy link
Contributor Author

kaselby commented May 2, 2018

After looking at this further, it seems like the T that is intended to be stored in the Normal-Wishart object here is the inverse scale matrix for the Wishart, rather than the scale matrix itself (at least judging by the implementation of the pdf and posterior functions). In this case, the existing posterior update and pdf calculation for the scale matrix should be retained, and they are correct as is. However, T should be inverted before passing it to the Wishart in the rand function. (Also, this should probably be better documented, as it is not the standard parametrization of the Normal-Wishart)

@mschauer
Copy link
Member

mschauer commented May 2, 2018

Great! This is the original implementation: https://github.com/JuliaStats/Distributions.jl/pull/140/files Maybe @nfoti remembers

@mschauer mschauer self-assigned this May 2, 2018
@kaselby
Copy link
Contributor Author

kaselby commented May 2, 2018

I believe I updated the PR to reflect the inverse scale matrix parametrization. It would be good to have someone look it over though to make sure I didn't miss anything. I believe what I have now is the same as the original formulation before my pull request, except for a change to the rand function.

@kaselby
Copy link
Contributor Author

kaselby commented May 5, 2018

After checking again, the pdf function actually did need to be changed for this parametrization. I also fixed a syntax error in logpdf, and commented out a line which called the "isapproxsymmetric" function, which is not a function that appears to exist as far as I can tell.

@kaselby
Copy link
Contributor Author

kaselby commented May 5, 2018

Apologies for making so many commits. I have now gone through and thoroughly tested this and am confident it is now right.

@mschauer
Copy link
Member

mschauer commented May 5, 2018

Don't worry. Can we cook up a test which shows that this is right and the version on master is wrong?

@kaselby
Copy link
Contributor Author

kaselby commented May 5, 2018

I am working on writing something up, I will hopefully post it later today.

@mschauer
Copy link
Member

mschauer commented May 6, 2018

One suggestion would be to use in test that the one-dimensional case falls back to the InverseGamma/Gamma distribution, so they should give corresponding results.

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to add some tests for this (especially since it seems like there are bugs that could have been caught with tests before...)

@@ -62,27 +60,28 @@ function logpdf(nw::NormalWishart, x::Vector{T}, Lam::Matrix{T}) where T<:Real
hp = 0.5 * p

# Normalization
logp = hp*(log(kappa) - Float64(log2π))
logp = hp*(log(kappa) - Float64(log(2*π)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why get rid of log2pi? It's a StatsFuns constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I think it was giving me errors when I tried to run it. Maybe I just wasn't importing the required package or something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's necessary to add StatsFuns (which also provides logmvgamma which is needed elsewhere).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logmvgamma is from Distributions. maybe the old lpgamma is a version from StatsFuns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logp -= hnu * logdet(Tchol)
logp -= hnu * p * log(2.)
logp -= lpgamma(p, hnu)
logp -= logmvgamma(p, hnu)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little disturbed that this hadn't been fixed until now......

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were other areas too where the current logpdf function crashes. I think somewhere in insupport there is a call to a function that doesn't exist. I didn't fix that in this pull request because it was a separate issue, and I forgot to make a new issue for it.

@@ -52,8 +52,6 @@ function logpdf(nw::NormalWishart, x::Vector{T}, Lam::Matrix{T}) where T<:Real
if !insupport(NormalWishart, x, Lam)
return -Inf
else
p = length(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to remove this? seems like it's needed below...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, you could use nw.dim

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was an accident

Lam = rand(Wishart(nw.nu, nw.Tchol))
Lsym = PDMat(Symmetric(inv(Lam) ./ nw.kappa))
mu = rand(MvNormal(nw.mu, Lsym))
V = PDMat(Symmetric(inv(nw.Tchol)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to enforce that these are Symmetric?

Copy link
Contributor Author

@kaselby kaselby May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Julia's matrix inversion can lead to some small inaccuracies (of order 1e-16 or 1e-18 or so) and thus inverse matrices that SHOULD be symmetric will be slightly asymmetric. This shouldn't be a problem, but the distributions we are passing these to require symmetric matrices, and enforce a check for symmetricity. Julia's base is_hermitian function does not have a relative tolerance, and thus without some way of enforcing that the inverted matrix is symmetric, this will crash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this somewhere, and this was the conclusion, but I cannot find it anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this might have something to do with the missing isApproxSymmetric function...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the symmetry checks are not in the ConjugatePriors package, they are in various places in Distributions. So even if we make our symmetry checks sensitive to a relative tolerance, the checks there won't be. To be honest I have no idea why Julia won't just add a relative tolerance to its is_hermitian function given that the symmetry errors are coming from Julia's own matrix inversion routines, but I was told that that isn't going to happen...

@kleinschmidt
Copy link
Member

Thanks for looking at this! I agree that this parametrization is very weird. Might also be worth adding a docstring so that' the parametrization is documented in the help file at least.

@kaselby
Copy link
Contributor Author

kaselby commented May 21, 2018

To be honest I am still not entirely sure how best to approach the problem of how to parametrize this - but no matter how we decide it is best to do it, the current implementation needs to be fixed. I apologize for not getting back to you on the testing issue, I have been very busy recently so haven't had time to look at it.

@kleinschmidt
Copy link
Member

I agree that the parametrization can be fixed later, but tests are critical before we merge. Even something as simple as calling the functions on a few values would have caught many of these bugs... Thanks for your work on this!

@kleinschmidt
Copy link
Member

kleinschmidt commented May 30, 2018

Also, i'm looking the Normal-Inverse-Wishart and seeing a lot of the same problems (with the old function names, checking for symmetry, etc.). Would you be willing to fix those as well? If not let's merge this after adding some minimal tests (at a minimum just to make sure this code can be executed without errors)

@kleinschmidt
Copy link
Member

kleinschmidt commented May 30, 2018

ah I found the source of that function (as well as another one that's used by NIW):

https://github.com/JuliaStats/Distributions.jl/blob/2fc223f72a2df4934c47e5701b9afc8119e0448f/src/utils.jl#L117-L137

@kaselby
Copy link
Contributor Author

kaselby commented May 30, 2018

I'll try to get to this today and look over everything again. Sorry I've been dragging my heels, I've just been caught up in other work.

@kleinschmidt
Copy link
Member

No worries! I'm in the same boat :)

@@ -40,7 +40,7 @@ end
function insupport(::Type{NormalWishart}, x::Vector{T}, Lam::Matrix{T}) where T<:Real
return (all(isfinite(x)) &&
size(Lam, 1) == size(Lam, 2) &&
isApproxSymmmetric(Lam) &&
#isApproxSymmmetric(Lam) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and hasCholesky) are in Distributions, so can be fixed by qualifying with Distributions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth moving them over here since I don't think they're used in Distributions anymore...

@kleinschmidt
Copy link
Member

@krylea I have a bit of time to work on this now actually; is it okay if I push more commits to this PR?

@kaselby
Copy link
Contributor Author

kaselby commented May 31, 2018

Go for it!

@kaselby
Copy link
Contributor Author

kaselby commented Jun 14, 2018

After taking another look, I believe the alternate parameterization we were considering here is actually the parameterization for the Normal INVERSE Wishart. I returned to the original parameterization, and added the inversions that were missing in the base code.

@mschauer
Copy link
Member

mschauer commented Sep 6, 2018

What is the status here?

@matbesancon
Copy link
Member

ping @krylea

@mschauer
Copy link
Member

@johnczito Would this fall into your area of expertise, be interesting for you?

@johnczito
Copy link
Member

johnczito commented Aug 16, 2019

Sure. I guess the first thing that comes to mind is that NormalWishart should have a field W that simply stores Wishart(nu, inv(Tchol)). Also, why shouldn't logpdf just be something like this?

function logpdf(nw::NormalWishart, x::Vector{T}, Lam::Matrix{T}) where T<:Real
    Lsym = PDMat(Symmetric(inv(Lam) ./ nw.kappa))
    logpdf(Wishart(nw.nu, inv(nw.Tchol)), Lam) + logpdf(MvNormal(nw.mu, Lsym), x)
end

@johnczito
Copy link
Member

^ That at least suggests a unit test you'd want to implement, in addition to comparing against the NormalGamma.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants