PDMat error when sampling from prior of model with LKJCholesky #2316

Open

simonsteiger opened this issue Sep 8, 2024 · 13 comments

@simonsteiger commented Sep 8, 2024

Hi,

I have encountered an error while trying to sample from the prior of a Turing model with an LKJCholesky prior. The error seems to stem from PDMats calling oneunit(::Type{Any}) at some point.

The code example below follows this blog post.

using Distributions, Turing, LinearAlgebra, PDMats

M = rand(MvNormal(zeros(3), I), 30)

@model function test(X, y)
    predictors, _ = size(X)
	
    Ω ~ LKJCholesky(predictors, 2.0)
    σ ~ Exponential(1)
    τ ~ filldist(truncated(Cauchy(0, 2); lower=0), predictors)
    γ ~ filldist(Normal(0, 2), predictors)

    Σ_L = Diagonal(τ) * Ω.L
    Σ = PDMat(Cholesky(Σ_L + 1e-6 * I))
    β = Σ * γ

    return y ~ arraydist([Normal(X[:, i] ⋅ β, σ) for i in 1:length(y)])  # ⋅ is LinearAlgebra's dot product
end

m1 = test(M[:, 2:end], M[:, 1])

sample(m1, Prior(), 1000)

The error raised after the last line is

MethodError: no method matching oneunit(::Type{Any})

with PDMat somewhere further down in the stack trace

PDMats.PDMat(fac::LinearAlgebra.Cholesky{Any, Matrix{Any}}) @ pdmat.jl:20

Swapping out the above model for one where Ω is distributed as LKJ no longer raises an error during prior sampling.

@model function test2(X, y)

    predictors, n = size(X)
	
    Ω ~ LKJ(predictors, 2.0)
    σ ~ Exponential(1)
    τ ~ filldist(truncated(Cauchy(0, 2); lower=0), predictors)
    γ ~ filldist(Normal(0, 2), predictors)

    Σ = Diagonal(τ) * Ω * Diagonal(τ)
    β = Σ * γ

    return y ~ arraydist([Normal(X[:, i] ⋅ β, σ) for i in 1:length(y)])
end

Note also that sampling the original model with sample(m1, NUTS(), 1000) works normally.

EDIT
I ran the above examples in Pluto with Distributions v0.25.111, Turing v0.33.3, and PDMats v0.11.31.

When running them in the REPL (Turing v0.28.3 here), I could sample from the prior of m1 without error.

mhauru self-assigned this Sep 9, 2024
@mhauru (Member) commented Sep 9, 2024

Thanks @simonsteiger, looking into this.

A smaller repro:

import DynamicPPL, Distributions, LinearAlgebra, PDMats

DynamicPPL.@model function test()
    Ω ~ Distributions.LKJCholesky(1, 2.0)
    Σ_L = LinearAlgebra.Diagonal([1.0]) * Ω.L
    PDMats.PDMat(LinearAlgebra.Cholesky(Σ_L))
end

m = test()
vi = DynamicPPL.VarInfo()
m(vi)  # first pass: populates vi
m(vi)  # second pass: errors, because stored values now come back with an abstract eltype

On the second pass the eltype of Ω is Real because of the UntypedVarInfo being used, and then apparently eltype(Diagonal([1.0]) * Real[1.0;;]) == Any, for reasons I don't understand.
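The widening itself reproduces in isolation (this is the behaviour at the time of writing):

julia> using LinearAlgebra

julia> eltype(Diagonal([1.0]) * Real[1.0;;])
Any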

@mhauru (Member) commented Sep 10, 2024

The issue isn't specific to the distribution; it will happen in any model that takes a matrix product of a random variable and calls PDMat on the result. Here's a model that also fails:

DynamicPPL.@model function test()
    x ~ Distributions.product_distribution([Distributions.Uniform();;])
    mat = x * [1.0;;]
    return PDMats.PDMat(mat)
end

m = test()
vi = DynamicPPL.VarInfo()
m(vi)
m(vi)

I think I understand the problem, but I'm unsure how to best fix it. In the above example, the second time we call m(vi) the UntypedVarInfo vi has a value for x that is internally stored as a Vector{Real}. When the model hits tilde_assume!! for x it fetches that value and reshapes it into a matrix. x * [1.0;;] widens the type into a Matrix{Any} (see Discourse) and then PDMat can't handle it any more.
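Per the stack trace in the original report, once the eltype has widened to Any the failure can be triggered directly from PDMats (a sketch using LinearAlgebra's three-argument Cholesky constructor):

julia> using LinearAlgebra, PDMats

julia> PDMat(Cholesky(Matrix{Any}(ones(1, 1)), 'L', 0))
ERROR: MethodError: no method matching oneunit(::Type{Any})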

Some options that I could think of but didn't like:

  1. Concretising the type returned by tilde_assume!! at some point, so that the Vector{Real} would be turned into Vector{Float64} if all values are floats. This feels ad hoc, and type unstable.
  2. Submitting a Base Julia PR to make it so that matrix products wouldn't widen the eltype unnecessarily. Not exactly a quick fix.

Any thoughts @yebai, @sunxd3, @willtebbutt?

@simonsteiger (Author)

Thank you for taking the time to look into this! :)

@yebai (Member) commented Sep 10, 2024

Cc @devmotion, who might have encountered this before.

@devmotion (Member)

> the UntypedVarInfo vi has a value for x that is internally stored as a Vector{Real}

Could this be changed?

In my opinion, matrix multiplication shouldn't promote containers of Reals to Any but I assume the best we can do on our side is to operate with concrete types as much as possible. If you end up with arrays of Any, then you are lost - any operation that requires something more concrete will fail, and most numerical algorithms require (IMO correctly) at least Numbers.

> so that the Vector{Real} would be turned into Vector{Float64} if all values are floats.

Such a Float64-specific heuristic seems problematic if the elements are, e.g., Duals or Ints. But it seems we could use the map(identity, x) trick. At least for the example above this seems to fix the problem:

julia> using Turing, PDMats

julia> @model function test()
           x ~ product_distribution([Distributions.Uniform();;])
           mat = (Base.isconcretetype(eltype(x)) ? x : map(identity, x)) * [1.0;;]  # note: check eltype(x), not the array itself
           return PDMat(mat)
       end;

julia> m = test();

julia> vi = DynamicPPL.VarInfo();

julia> m(vi)
1×1 PDMat{Float64, Matrix{Float64}}:
 0.871459128598002

julia> m(vi)
1×1 PDMat{Float64, Matrix{Float64}}:
 0.871459128598002

julia> m(vi)
1×1 PDMat{Float64, Matrix{Float64}}:
 0.871459128598002

Maybe we could make use of that internally?

@mhauru (Member) commented Sep 11, 2024

> In my opinion, matrix multiplication shouldn't promote containers of Reals to Any but I assume the best we can do on our side is to operate with concrete types as much as possible. If you end up with arrays of Any, then you are lost - any operation that requires something more concrete will fail, and most numerical algorithms require (IMO correctly) at least Numbers.

Fully agree with both points.

> But it seems we could use the map(identity, x) trick. [...] Maybe we could make use of that internally?

Yeah, I wonder about this as well. It could be part of the transform that turns VarInfo-internal values into the values the model sees. This might mess with type stability though: you'd have a VarInfo with eltype Real, and only at runtime would it be determined what type actually gets returned when you ask for a variable. Not sure how much that type instability matters, though, if it only occurs in cases where the eltype is abstract anyway.
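A minimal sketch of what that transform step could look like (maybe_concretize is a hypothetical name, not existing DynamicPPL API):

# Narrow an abstractly-typed array when handing it to the model;
# map(identity, x) rebuilds the array with the tightest common eltype,
# e.g. a Vector{Real} holding only floats becomes a Vector{Float64}.
maybe_concretize(x::AbstractArray) = isconcretetype(eltype(x)) ? x : map(identity, x)
maybe_concretize(x) = x  # non-array values pass through unchanged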

This touches on the whole TypedVarInfo/UntypedVarInfo distinction. I wondered about this while working on VarNamedVector: could we make it so that every time new values are pushed to a VarInfo, we check whether a) we need to widen the eltype of the VarInfo to accommodate the value, or b) we can narrow the eltype of the VarInfo, making it more concrete? You would still need the NamedTuple-keyed-by-varname-symbols trick to deal with variables having different types, but in many cases this might help us stick to concrete element types. It would result in type instability, but hopefully only in cases where we currently just resign ourselves to using an abstract eltype.
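A rough sketch of checks a) and b) for a plain Vector (hypothetical, not the actual VarNamedVector code):

# Push a value, widening the storage eltype only as far as needed (a),
# then re-narrowing it if all elements share a concrete type (b).
function push_concrete(storage::Vector, val)
    T = typejoin(eltype(storage), typeof(val))  # (a) widen if necessary
    out = convert(Vector{T}, storage)           # no copy when T is unchanged
    push!(out, val)
    return map(identity, out)                   # (b) narrow when possible
end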

Note that VarNamedVector as implemented in TuringLang/DynamicPPL.jl#555 does check a), but not check b). Also, I partially had something like the above paragraph in mind when I proposed TuringLang/DynamicPPL.jl#653.

cc @torfjelde too

@yebai (Member) commented Sep 11, 2024

One possible VarNamedVector optimisation is to have multiple value containers (e.g. a Tuple of Vectors) for different types, e.g. Bool and Float64. Such an optimisation combines the advantages of TypedVarInfo and SimpleVarInfo, but also avoids the compilation overhead of large NamedTuples (since we would have a NamedTuple key for each type, rather than for each symbol of model parameters).
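For illustration, the storage might look roughly like this (a hypothetical sketch, not a worked-out design):

# One vector per element type, keyed by type rather than by model symbol.
# A separate lookup (not shown) would map each VarName to its bucket and indices.
buckets = (floats = Float64[], bools = Bool[])
push!(buckets.floats, 0.5)   # continuous parameter values go here
push!(buckets.bools, true)   # Bool-valued parameter values go here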

@torfjelde (Member)

On my phone so will reply more extensively later, but I'd highly recommend against automatic expansion of containers. It adds a lot of complexity that can be captured in a simple "convert from abstract type to specialized" step after constructing the container.

@mhauru (Member) commented Oct 28, 2024

> On my phone so will reply more extensively later, but I'd highly recommend against automatic expansion of containers. It adds a lot of complexity that can be captured in a simple "convert from abstract type to specialized" step after constructing the container.

Wouldn't this complexity be nicely hidden away behind the VarNamedVector front, though? I think having a well-defined interface for VarNamedVector would allow us to do exactly these kinds of things. We would just call setindex!! or setindex_internal!! on a VarNamedVector, and whether that operation turned out to be a simple setindex! on a Vector, or some complicated business of checking what the most concrete common type is and reallocating memory for it, would be of no consequence to the caller. On the contrary, the caller would know that the type of the thing they get back is as concrete as possible, which makes handling it less complex for them.

@torfjelde (Member) commented Oct 28, 2024

Sure, but

  1. It will complicate the internal code.
  2. It creates further discrepancy between setindex! and setindex!!, since the former cannot handle this case.

IMO it doesn't seem worth it only to save us from doing a single line of code to convert from type "unstable" to "stable", given that right now there's really no end-user making use of VarNamedVector; it's only us 🤷
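For context, the one-line conversion being referred to is presumably the existing untyped-to-typed step, along the lines of:

vi = DynamicPPL.VarInfo()         # untyped on construction
model(vi)                         # first pass fills it with abstractly-typed storage
vi = DynamicPPL.TypedVarInfo(vi)  # the one-line "stabilisation" step, for any DynamicPPL model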

@mhauru (Member) commented Oct 29, 2024

> given that right now there's really no end-user making use of VarNamedVector; it's only us 🤷

I'm thinking of this for the future when VarNamedVector is the default VarInfo backend.

> IMO it doesn't seem worth it only to save us from doing a single line of code to convert from type "unstable" to "stable"

I think it saves us much more than one line of code. I find it quite confusing which operations return a TypedVarInfo and which an UntypedVarInfo, and when the conversion happens. I've also, for instance while working on the new Gibbs sampler modifications, hit several snags where something doesn't work because pushing new values into a TypedVarInfo is tricky. You can perform a series of very reasonable-seeming operations, pushing and changing values in a VarInfo, one of which suddenly fails with a strange error about mismatched types. This effectively exposes users of VarNamedVector/VarInfo (by which I don't necessarily mean end users of Turing.jl, but developers of other parts of the ecosystem) to implementation details they did not want to think about.

I dream of a world where

  1. We always use setindex!! rather than setindex!
  2. It always succeeds if the call is reasonable, regardless of e.g. what variables have or have not been added to the VarInfo before or whether the VarInfo has been through some "type stabilisation" process.
  3. The result is always a VarInfo with as concrete types as possible.
  4. Maybe, and this is a stretch goal that might fail, there is no UntypedVarInfo anymore. It's all a single type (maybe TypedVarInfo, maybe SimpleVarInfo, maybe something new). This would simplify varinfo.jl massively, and make the code much easier to reason about.

Code internal to VarNamedVector would get more complicated, yes, but there would be no more "leakage" of implementation details causing weird errors. It would just work. Contrast this with how dictionary types are implemented in many languages: underneath, there might be some very complex hash-table machinery going on that dynamically allocates more memory using some clever algorithm that predicts future usage and tries to achieve amortized O(1) operations and so on, but when I want to put stuff into my dictionary, or get stuff from my dictionary, I do not have to care about any of that. My mental model is just that it maps keys to values and does it fast, and that's all I need to know. I would like VarNamedVector/VarInfo to have a similarly straightforward and reliable interface.
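Concretely, points 1-3 would amount to usage like this (hypothetical behaviour, not what happens today):

vi = VarInfo()
vi = setindex!!(vi, 1.0, @varname(x))     # succeeds even though x is new
vi = setindex!!(vi, [1, 2], @varname(y))  # a differently-typed variable, still fine
# and the returned vi stores each variable with as concrete an eltype as possible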

@torfjelde (Member)

> I dream of a world where

Haha, fair enough 👍 I'm very much in agreement that it'll make things better if we can do it in a nice way :)

But there are also realistic scenarios where you don't actually want the TypedVarInfo representation, e.g. if you're working with a model with loads of different symbols. So I don't think we can avoid the conversion between a NamedTuple-backed VarInfo and one that isn't? So would we still keep the distinction between TypedVarInfo and UntypedVarInfo?

@mhauru (Member) commented Oct 30, 2024

> But there are also realistic scenarios where you don't actually want the TypedVarInfo representation, e.g. if you're working with a model with loads of different symbols. So I don't think we can avoid the conversion between a NamedTuple-backed VarInfo and one that isn't? So would we still keep the distinction between TypedVarInfo and UntypedVarInfo?

Yeah, this is the part where my stretch goal 4 might fail. I don't understand well enough when, and how badly, we want to avoid TypedVarInfo to really have an opinion on whether this can be avoided. I can imagine that having loads of symbols could hit compile times hard.

Another scenario that I fear a bit is that if we incrementally build up a TypedVarInfo during the first execution pass, we might do a lot of compiling for various types: first an empty TypedVarInfo, then one with one symbol, then one with two symbols, etc. Whereas currently we only compile for UntypedVarInfo and for the TypedVarInfo with all the variables.
