Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Julia 1.11 #284

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Support for Julia 1.11 #284

wants to merge 5 commits into from

Conversation

miguelbiron
Copy link
Collaborator

No description provided.

@miguelbiron
Copy link
Collaborator Author

Hmm a brittle test is failing consistently across OSs

Ubuntu: https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420480756#step:8:1790
Macos: https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420481066#step:8:1789
Windows: https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420481366#step:8:1697

At least all these new values are consistent across platforms. Increasing rtol to 0.02 solves the issue so I'll do that.

@alexandrebouchard
Copy link
Member

Hmm a brittle test is failing consistently across OSs

Ubuntu: https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420480756#step:8:1790 Macos: https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420481066#step:8:1789 Windows: https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420481366#step:8:1697

At least all these new values are consistent across platforms. Increasing rtol to 0.02 solves the issue so I'll do that.

Interesting...

Why would the ESS change when we change the Julia version??

@miguelbiron
Copy link
Collaborator Author

Ok this one is kind of problematic: the new MemoryRef thing broke the parallelism invariance equality test for something inside the Replicas, not sure what exactly.

https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11296182614/job/31420480756#step:8:4670

It actually may be just the fact that vectors are now memoryref

@miguelbiron
Copy link
Collaborator Author

Why would the ESS change when we change the Julia version??

My guess is that they changed the ESS methodology in MCMCChains slightly, which coincided with this new julia version. Or it could be a change in any of the packages required by that (FFT stuff... who knows).

I guess that to rule out a package change instead of julia version, I could rerun the last test for main and see if that throws that error.

@alexandrebouchard
Copy link
Member

What is a MemoryRef?

@miguelbiron
Copy link
Collaborator Author

So I'm trying to learn right now but I read it in the NEWS.md file.

It actually sounds like a huge change to put in a minor release...

@alexandrebouchard
Copy link
Member

Ah, thank you, I didn't know about that change! Seems pretty major, but should be hidden from users like us, no? I am not sure I understand how it would be break Parallelism Invariance, but I am very curious!

@miguelbiron
Copy link
Collaborator Author

It's just that we probably need to add a custom rule like all the others we have. Here's the error

ref is different between a and b:
	a.f=MemoryRef{Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, ...

@miguelbiron
Copy link
Collaborator Author

Ok I'm gonna stop the tests for now, rerun the one in main to rule out pkg changes for ESS numbers, and diagnose the memoryref issue locally

@alexandrebouchard
Copy link
Member

Ah I see!! So some kind of change of the type hierarchy... agreed with above change

@alexandrebouchard
Copy link
Member

Thank you Miguel!!

@miguelbiron
Copy link
Collaborator Author

miguelbiron commented Oct 11, 2024

Ok so the fields in Arrays are totally different. They now have a ref field of type MemoryRef that contains the data

1.11

julia> v=zeros(2)
2-element Vector{Float64}:
 0.0
 0.0

julia> fieldnames(typeof(v))
(:ref, :size)

julia> v.ref
MemoryRef{Float64}(Ptr{Nothing} @0x0000757f5d276ff0, [0.0, 0.0])

1.10

julia> v=zeros(2)
2-element Vector{Float64}:
 0.0
 0.0

julia> fieldnames(typeof(v))
()

julia> v.ref
ERROR: type Array has no field ref
Stacktrace:
 [1] getproperty(x::Vector{Float64}, f::Symbol)
   @ Base ./Base.jl:37
 [2] top-level scope
   @ REPL[3]:1

@miguelbiron
Copy link
Collaborator Author

miguelbiron commented Oct 11, 2024

So it seems that the equality check passes for arrays with isbit eltype ; e.g.,

target = toy_mvn_target(2)
p1 = pigeons(; target, checkpoint = true)
p2 = PT("results/latest")
Pigeons.recursive_equal(p1.shared.tempering.schedule.grids, p2.shared.tempering.schedule.grids)

returns true, whereas this fails

julia> Pigeons.recursive_equal(p1.replicas, p2.replicas)
ref is different between a and b:
        a.f=MemoryRef{Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, Number, Mean{Float64, EqualWeight}}, log_sum_ratio::GroupBy{Tuple{Int64, Int64}, Number, Pigeons.LogSum{Float64}}, timing_extrema::NonReproducible{...}, allocation_extrema::NonReproducible{...}}}}(Ptr{Nothing} @0x0000757f5c18b880, Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, Number, Mean{Float64, EqualWeight}}, log_sum_ratio::GroupBy{Tuple{Int64, Int64}, Number, Pigeons.LogSum{Float64}}, timing_extrema::NonReproducible{...}, allocation_extrema::NonReproducible{...}}}[Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, Number, Mean{Float64, EqualWeight}}, log_sum_ratio::GroupBy{Tuple{Int64, Int64}, Number, Pigeons.LogSum{Float64}}, timing_extrema::NonReproducible{...}, allocation_extrema::NonReproducible{...}}}([-0.11479492957168834, -0.25615081178503446], 1, SplittableRandom(0xa4c50f5189f1f51a, 0xaabf13ae126f9159), (swap_acceptance_pr = GroupBy: Tuple{Int64, Int64} => Mean
, log_sum_ratio = GroupBy: Tuple{Int64, Int64} => LogSum
, timing_extrema = NonReproducible{...}(GroupBy: Symbol => Extrema
), allocation_extrema = NonReproducible{...}(GroupBy: Symbol => Extrema
)), 1)])
        b.f=MemoryRef{Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, Number, Mean{Float64, EqualWeight}}, log_sum_ratio::GroupBy{Tuple{Int64, Int64}, Number, Pigeons.LogSum{Float64}}, timing_extrema::NonReproducible{...}, allocation_extrema::NonReproducible{...}}}}(Ptr{Nothing} @0x0000757fc07a6420, Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, Number, Mean{Float64, EqualWeight}}, log_sum_ratio::GroupBy{Tuple{Int64, Int64}, Number, Pigeons.LogSum{Float64}}, timing_extrema::NonReproducible{...}, allocation_extrema::NonReproducible{...}}}[Pigeons.Replica{Vector{Float64}, @NamedTuple{swap_acceptance_pr::GroupBy{Tuple{Int64, Int64}, Number, Mean{Float64, EqualWeight}}, log_sum_ratio::GroupBy{Tuple{Int64, Int64}, Number, Pigeons.LogSum{Float64}}, timing_extrema::NonReproducible{...}, allocation_extrema::NonReproducible{...}}}([-0.11479492957168834, -0.25615081178503446], 1, SplittableRandom(0xa4c50f5189f1f51a, 0xaabf13ae126f9159), (swap_acceptance_pr = GroupBy: Tuple{Int64, Int64} => Mean
, log_sum_ratio = GroupBy: Tuple{Int64, Int64} => LogSum
, timing_extrema = NonReproducible{...}(GroupBy: Symbol => Extrema
), allocation_extrema = NonReproducible{...}(GroupBy: Symbol => Extrema
)), 1)])

throws the error.

@miguelbiron
Copy link
Collaborator Author

So it seems that the default == check implemented in Julia for Arrays does not look at the ref field. It does so if you ask ===

julia> p1.shared.tempering.schedule.grids == p2.shared.tempering.schedule.grids
true

julia> p1.shared.tempering.schedule.grids === p2.shared.tempering.schedule.grids
false

@miguelbiron
Copy link
Collaborator Author

Ok this explains the issue, because we do have a special rule for Vector{<:Replica} (and Vector{<:InterpolatedLogPotential} so that will fail too). Will remove these and look for a less heavy handed approach.

@miguelbiron
Copy link
Collaborator Author

miguelbiron commented Oct 11, 2024

BTW, the relaunched tests on main pass the ESS check (except for 2 that are actually running 1.11). The versions of external packages launched are exactly the same. Definitely the culprit is in stdlib, most likely LinearAlgebra since the default ESS method in MCMCChains is a straightforward autocovariance estimator

https://github.com/TuringLang/MCMCDiagnosticTools.jl/blob/04d44cbbfb84ab028165832ae5ecd083ac9cefb8/src/ess_rhat.jl#L148

@miguelbiron
Copy link
Collaborator Author

miguelbiron commented Oct 11, 2024

Ugh another numerical accuracy. This time not via ESS but log normalization constant

https://github.com/Julia-Tempering/Pigeons.jl/actions/runs/11297944298/job/31425868094#step:8:7690

Edit: Good news is that it is not the "truth" value that changed (that would've been catastrophic). They match exactly in 1.10 and 1.11.

@miguelbiron
Copy link
Collaborator Author

@alexandrebouchard it seems that all issues have been fixed. Are you ok with keeping the increased tolerances for those 2 tests?

@alexandrebouchard
Copy link
Member

Thanks a lot Miguel.

Quick question what is "Julia 1" in the CI? Maybe a more descriptive name would be better.

Not opposed to merging per se, but isn't it strange that numerical accuracy of the log normalization constant changes when doing a minor version increment though?

@miguelbiron
Copy link
Collaborator Author

Quick question what is "Julia 1" in the CI? Maybe a more descriptive name would be better.

It tells CI to use the latest released julia major version 1. Now that means 1.11.0. Last week it was 1.10.5.

isn't it strange that numerical accuracy of the log normalization constant changes when doing a minor version increment though?

100%. But I have no idea what we could do in this case. Thoughts?

@miguelbiron
Copy link
Collaborator Author

miguelbiron commented Oct 12, 2024

One thing that changed is the MersenneTwister implementation: from a fresh session

1.11

julia> using Random

julia> rng = MersenneTwister(321)
MersenneTwister(321)

julia> x = rand(rng, 2)
2-element Vector{Float64}:
 0.4643060308852154
 0.5086105036225046

1.10

julia> using Random

julia> rng = MersenneTwister(321)
MersenneTwister(321)

julia> x = rand(rng, 2)
2-element Vector{Float64}:
 0.6306801637635273
 0.9123961296861567

We do use this RNG in the tests, but in the particular ones that failed.

@miguelbiron
Copy link
Collaborator Author

AFAICT, the only thing in common that the two failed tests have is that they use AutoMALA with ForwardDiff as backend. And weirdly, there is an open issue in ForwardDiff about a weird interaction with RNGs, although I don't think it is pertinent to our situation.

@miguelbiron
Copy link
Collaborator Author

Double checking that it is the RNG implementation and not the rand*() functions that changed

Code:

using Random, SplittableRandoms
all_comb = Iterators.product(
    (rand, randn, randexp),
    (MersenneTwister(1), Xoshiro(1), SplittableRandom(1))
)
for (rfun, rng) in all_comb
    println("$rfun\t$(nameof(typeof(rng)))\t$(rand(rng))")
end

1.11

rand    MersenneTwister 0.09913970137863681
randn   MersenneTwister 0.7019797138879542
randexp MersenneTwister 0.503261785841856
rand    Xoshiro 0.07336635446929285
randn   Xoshiro 0.34924148955718615
randexp Xoshiro 0.6988266836914685
rand    SplittableRandom        0.10425610923818551
randn   SplittableRandom        0.8390896224437898
randexp SplittableRandom        0.3726627323565861

1.10

rand	MersenneTwister	0.23603334566204692
randn	MersenneTwister	0.34651701419196046
randexp	MersenneTwister	0.3127069683360675
rand	Xoshiro	0.07336635446929285
randn	Xoshiro	0.34924148955718615
randexp	Xoshiro	0.6988266836914685
rand	SplittableRandom	0.10425610923818551
randn	SplittableRandom	0.8390896224437898
randexp	SplittableRandom	0.3726627323565861

@miguelbiron
Copy link
Collaborator Author

@alexandrebouchard the failed tests in main are due to the slight change in results under Julia 1.11, not due to DynamicPPL version bump.

@miguelbiron miguelbiron changed the title Bump julia versions in CI Support for Julia 1.11 Oct 16, 2024
@miguelbiron
Copy link
Collaborator Author

The current failures are due to an issue with the resolution of the environment

JuliaLang/Pkg.jl#4052

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants