Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poisson sample() hangs when lambda is close to max of the float type. #1312

Closed
WarrenWeckesser opened this issue May 3, 2023 · 6 comments
Closed
Labels
A-degenerate Algorithm fails on some input X-bug Type: bug report

Comments

@WarrenWeckesser
Copy link
Collaborator

The follow programs appear to hang indefinitely in the call poisson.sample(&mut rng):

f32

use rand_distr::{Distribution, Poisson};

fn main() {
    let lambda = 1.0e37f32;
    let poisson = Poisson::<f32>::new(lambda).unwrap();
    let mut rng = rand::thread_rng();
    let k = poisson.sample(&mut rng);
    println!("{}", k);
}

f64

use rand_distr::{Distribution, Poisson};

fn main() {
    let lambda = 1.0e306f64;
    let poisson = Poisson::<f64>::new(lambda).unwrap();
    let mut rng = rand::thread_rng();
    let k = poisson.sample(&mut rng);
    println!("{}", k);
}

I ran into this issue while testing gh-1296.

@dhardy
Copy link
Member

dhardy commented May 3, 2023

Related to #1290?

@WarrenWeckesser
Copy link
Collaborator Author

The problem is that when $\lambda$ is sufficiently large, the value of magic_val computed here:

magic_val: lambda * log_lambda - crate::utils::log_gamma(F::one() + lambda),

will be nan, because both terms in that expression will overflow to inf, and inf - inf gives nan.

Then down at

// check with uniform random value - if below the threshold, we are within the target distribution
if rng.gen::<F>() <= check {
break;
}

check will be nan, so the comparison in the if statement will be false, and the loop will never break.

When $\lambda$ is that large, terms result * self.log_lambda and crate::utils::log_gamma(F::one() + result) from

let check = F::from(0.9).unwrap()
* (F::one() + comp_dev * comp_dev)
* (result * self.log_lambda
- crate::utils::log_gamma(F::one() + result)
- self.magic_val)
.exp();

can also overflow, because result will be $\lambda + c\sqrt{\lambda}$, where $c$ is not large. ($\sqrt{\lambda}$ is the standard deviation of the Poisson distribution.)

There is another problem with large $\lambda$, even if it isn't large enough to cause overflow: the terms being subtracted in the expressions shown above will be very close, resulting in significant loss of precision.

I've looked into modifying the calculation when $\lambda$ is large to use an asymptotic version of the check expression, but I don't have anything useful yet.

A different approach to avoiding the problem would be switch to a normal distribution when $\lambda$ is sufficiently large. Convential wisdom is that the normal distribution is a good approximation to the Poisson distribution when $\lambda$ is large, but it would be nice to have a more quantitative result on the quality of the approximation. (John D. Cook has a short blog post about it.)

@dhardy dhardy added the A-degenerate Algorithm fails on some input label Jul 26, 2024
@dhardy
Copy link
Member

dhardy commented Jul 28, 2024

Depending on how large a value we are talking about, another solution is simply to return an error in the constructor.

@dhardy dhardy mentioned this issue Jul 28, 2024
1 task
@benjamin-lieser
Copy link
Collaborator

benjamin-lieser commented Aug 7, 2024

I would go for using the normal approximation with lambda bigger than 1e30
The two cdf function have a maximum error of smaller than 1/sqrt(lambda) according to https://www.johndcook.com/blog/normal_approx_to_poisson/ so this is very well justified

numpy does reject lambda bigger than 1e18
R does work for all ranges (or breaks a bit later than ours in my testing). https://github.com/wch/r-source/blob/trunk/src/nmath/rpois.c

@benjamin-lieser
Copy link
Collaborator

If we don't care about higher precision than f64 it is actually save to just return lambda if lambda is >= 1e35
The standard deviation is then 1e17.5 which means for a significant digit on the scale of 1e35 it has to be more than 10 sigma away, which basically can't happen.

dhardy added a commit that referenced this issue Aug 12, 2024
dhardy pushed a commit that referenced this issue Oct 1, 2024
This addresses #1497 by adding
`Distribution<u64>`
It also solves #1312 by not
allowing `lambda` bigger than `1.844e19` (this also makes them always
fit into `u64`)
@benjamin-lieser
Copy link
Collaborator

Solved with #1498 by introducing maximum lambda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-degenerate Algorithm fails on some input X-bug Type: bug report
Projects
None yet
Development

No branches or pull requests

3 participants