
Uninformative Error Message when Sample Size is 1 #67

Open
nsgrantham opened this issue Feb 16, 2022 · 5 comments

@nsgrantham

I am encountering an issue: if I pass a Turing model of type DynamicPPL.Model to psis_loo(), it returns values of -Inf or NaN.

However, when I pass it a log-likelihood function and the data as an array of tuples, it behaves "correctly" (in the sense that it returns finite values, but I have not independently verified that they are accurate).

The following is a reproducible example.

using Distributions
using DataFrames
using Turing
using ParetoSmooth

function generate_persons(; n_years=1000, max_age=65, n_births=20, age_of_marriage=18)
    # https://github.com/rmcelreath/rethinking/blob/master/R/sim_happiness.R

    invlogit(x) = 1 / (1 + exp(-x))
    age = fill(1, n_births)
    married = fill(false, n_births)
    happiness = collect(range(-2, 2, length=n_births))

    for _ in 1:n_years
        age .+= 1
        append!(age, fill(1, n_births))
        append!(married, fill(false, n_births))
        append!(happiness, collect(range(-2, 2, length=n_births)))

        for i in eachindex(age, married, happiness)
            if age[i] >= age_of_marriage && !married[i]
                married[i] = rand(Bernoulli(invlogit(happiness[i] - 4)))
            end
        end

        deaths = findall(age .> max_age)
        deleteat!(age, deaths)
        deleteat!(married, deaths)
        deleteat!(happiness, deaths)
    end

    DataFrame(
        age = age,
        married = convert.(Int, married),
        happiness = happiness
    )
end

persons = generate_persons()
adults = persons[persons.age .> 17, :]
min_age, max_age = extrema(adults.age)
adults.age = (adults.age .- min_age) ./ (max_age - min_age)

@model function adult_happiness(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    μ = α .+ β .* a
    h ~ MvNormal(μ, σ)
end

model = adult_happiness(adults.happiness, adults.age)
post = sample(model, NUTS(), 10_000)

Now, reading through the code in src/TuringHelpers.jl in this repo, it seems as though there's a method that supports Turing models directly. However, if I run

psis_loo(model, post)

then I get

┌ Warning: Some Pareto k values are extremely high (>1). PSIS will not produce consistent estimates.
└ @ ParetoSmooth ~/.julia/packages/ParetoSmooth/agZDP/src/InternalHelpers.jl:43
Results of PSIS-LOO-CV with 10000 Monte Carlo samples and 1 data points. Total Monte Carlo SE of NaN.
┌───────────┬───────┬──────────┬───────┬─────────┐
│           │ total │ se_total │  mean │ se_mean │
├───────────┼───────┼──────────┼───────┼─────────┤
│   cv_elpd │  -Inf │      NaN │  -Inf │     NaN │
│ naive_lpd │  -Inf │      NaN │  -Inf │     NaN │
│     p_eff │   NaN │      NaN │   NaN │     NaN │
└───────────┴───────┴──────────┴───────┴─────────┘

Instead, I can define the appropriate log-likelihood function and provide the data in an array of tuples:

function ll_fun(α, β, σ, data)
    h, a = data
    logpdf(Normal(α .+ β .* a, σ), h)
end

psis_loo(ll_fun, post, collect(zip(adults.happiness, adults.age)))

And this yields something that seems correct.

[ Info: No source provided for samples; variables are assumed to be from a Markov Chain. If the samples are independent, specify this with keyword argument `source=:other`.
Results of PSIS-LOO-CV with 10000 Monte Carlo samples and 960 data points. Total Monte Carlo SE of 0.019.
┌───────────┬──────────┬──────────┬───────┬─────────┐
│           │    total │ se_total │  mean │ se_mean │
├───────────┼──────────┼──────────┼───────┼─────────┤
│   cv_elpd │ -1551.04 │    13.81 │ -1.62 │    0.01 │
│ naive_lpd │ -1548.62 │    13.75 │ -1.61 │    0.01 │
│     p_eff │     2.42 │     0.08 │  0.00 │    0.00 │
└───────────┴──────────┴──────────┴───────┴─────────┘

I would expect these two approaches to yield the same result.

Am I simply using psis_loo() incorrectly when passing it a Turing model of type DynamicPPL.Model? The message "Results of PSIS-LOO-CV with 10000 Monte Carlo samples and 1 data points" seems to suggest that it does not recognize that model contains multiple data points.

For now I will continue to use the ll_fun approach, but wanted to call this out here in case there is something that needs a closer look in TuringHelpers.jl.

@ParadaCarleton
Member

The problem is that the Turing model has an error; it is using MvNormal to mean a sequence of independent, normally-distributed values.

However, MvNormal is meant for the case where each observation is a vector of correlated values. For example, if I want to predict each person's height and weight (which are correlated), I could say [height, weight] ~ MvNormal([6 ft, 150 lbs], Σ). As written, your model says "I have a single observation consisting of one really long vector of uncorrelated features." When that single observation is left out, the model breaks because it's being trained on a sample of 0 while trying to predict the whole sample.
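
For contrast, a model where MvNormal is genuinely the right tool might look something like this (a rough sketch only; the model name, priors, and data layout are made up for illustration):

using Turing, LinearAlgebra

# Illustrative sketch: each column of `y` holds one person's [height, weight] pair,
# and the covariance matrix Σ captures the correlation between the two.
@model function height_weight(y)
    μ ~ MvNormal(zeros(2), 100.0 * I)          # prior on the mean vector
    Σ ~ InverseWishart(4, Matrix(1.0I, 2, 2))  # prior on the 2×2 covariance
    for i in axes(y, 2)
        y[:, i] ~ MvNormal(μ, Σ)               # one correlated 2-vector per person
    end
end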

Rewriting the Turing model like this should fix the problem:

@model function adult_happiness(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    μ = α .+ β .* a
    @. h ~ Normal(μ, σ)
end

@ParadaCarleton
Member

(Don't worry, this is a very common mistake -- common enough that I should probably add an explanation when the model has a sample size of 1 😄)
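
A check along these lines would probably be enough (just a sketch, not actual ParetoSmooth code, and the function name is made up):

# Hypothetical helper: a sample size of 1 almost always means MvNormal was used
# where independent observations were intended.
function warn_if_single_datum(n_data::Integer)
    if n_data == 1
        @warn "Only one data point was detected. If your Turing model uses " *
              "`h ~ MvNormal(μ, σ)` for independent observations, consider " *
              "rewriting it as `@. h ~ Normal(μ, σ)`."
    end
    return nothing
end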

@ParadaCarleton ParadaCarleton changed the title psis_loo() does not behave as expected when given a Turing model Uninformative Error Message when Sample Size is 1 Feb 16, 2022
@nsgrantham
Author

Thanks for the quick response and the fix, @ParadaCarleton. I understand the source of the problem now. This is my first time encountering the @. macro; it's very useful.

I'm not sure I would say that the model has an error, however. Using the MvNormal distribution with a mean vector and a scalar standard deviation as a "shorthand" for a vector of independent Normal distributions is a convention that Turing's own documentation uses (see the Linear Regression tutorial). I also recall seeing somewhere (perhaps in an issue that I can't seem to find again) that using MvNormal in this way, rather than for i in 1:N; h[i] ~ Normal(...); end, is better for the sampler from a computational perspective.

In any case, I can see how using MvNormal in this way would confuse psis_loo() because it isn't clear from the code alone whether this is a single observation or multiple independent observations.
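
To make the comparison concrete, here are the three spellings side by side (a rough sketch with made-up model names; all three describe the same independent-Normal likelihood, though presumably only the last two let ParetoSmooth see one observation per element of h):

using Turing

@model function happiness_mvnormal(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    h ~ MvNormal(α .+ β .* a, σ)        # a single observation: the whole vector h
end

@model function happiness_broadcast(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    @. h ~ Normal(α + β * a, σ)         # one observation per element of h
end

@model function happiness_loop(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    for i in eachindex(h)
        h[i] ~ Normal(α + β * a[i], σ)  # same likelihood, written element by element
    end
end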

@nsgrantham
Author

And yes, changing h ~ MvNormal(μ, σ) to @. h ~ Normal(μ, σ) resolves the issue.

@ParadaCarleton
Member

I'm not sure I would say that the model has an error, however. Using the MvNormal distribution with a mean vector and a scalar standard deviation as a "shorthand" for a vector of independent Normal distributions is a convention that Turing's own documentation uses (see the Linear Regression tutorial).

I believe @torfjelde commented on this by saying:

IMO it does go against the semantic meaning of MvNormal: it represents a single multivariate random variable, not a collection of such. Hence if you want a collection of independent observations, then you should use .~. Note that both the LHS and RHS of .~ can be arrays, e.g. [1.0, 2.0] .~ [Normal(), Normal()] also works.

I also recall seeing somewhere (perhaps in an issue that I can't seem to find again) that using MvNormal in this way, rather than for i in 1:N; h[i] ~ Normal(...); end, is better for the sampler from a computational perspective.

Broadcasting (.~) is definitely faster than using a loop, but I think that broadcasting should be just as fast as using MvNormal. Don't quote me on that though.
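
If anyone wants to sanity-check that, here is a rough way to compare just the log-density evaluation cost (this assumes BenchmarkTools is installed and says nothing about gradient or sampler overhead):

using Distributions, LinearAlgebra, BenchmarkTools

μ = randn(1000)
y = randn(1000)
σ = 1.0

# Both expressions compute the same log density (up to floating-point error).
@btime sum(logpdf.(Normal.($μ, $σ), $y))     # vector of independent Normals
@btime logpdf(MvNormal($μ, $σ^2 * I), $y)    # one isotropic multivariate normal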

That being said, I'm working on a PR that will fix this by creating syntax that lets users unambiguously specify what they mean by an independent observation.
