The test at test-independent-fixed_design.R:141 is not stable #451

Closed
yihui opened this issue Aug 13, 2024 · 4 comments · Fixed by #455

Comments

@yihui
Contributor

yihui commented Aug 13, 2024

expect_equal(y$analysis$power, 0.9, tolerance = testthat_tolerance() * 1000)

It throws an error sporadically.

The tolerance was recently raised in e9317fd, but that may still not be enough.
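
For scale, a quick sketch of what tolerance = testthat_tolerance() * 1000 works out to (assuming testthat_tolerance() returns its documented value, sqrt(.Machine$double.eps)):

library(testthat)
testthat_tolerance()          # sqrt(.Machine$double.eps), roughly 1.5e-8
testthat_tolerance() * 1000   # roughly 1.5e-5
# With a relative tolerance of about 1.5e-5, the computed power has to land
# within roughly 0.9 * 1.5e-5, i.e. about 1.4e-5, of 0.9 for the check to pass.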

@yihui yihui changed the title The test at test-independent-fixed_design.R:141 is not stable on macOS The test at test-independent-fixed_design.R:141 is not stable Aug 13, 2024
@nanxstats
Collaborator

Apparently, I'm not the stats expert here, but @elong0527, how about we set tolerance = 1e-4 explicitly? That would mean a power of 90.009% is ok but 90.01% is not.
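
A minimal sketch of what that explicit check would look like (the numeric values below are illustrative, not taken from the test output):

library(testthat)
expect_equal(0.899991, 0.9, tolerance = 1e-4)    # relative difference ~1e-5, passes
# expect_equal(0.8990, 0.9, tolerance = 1e-4)    # relative difference ~1.1e-3, would fail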

@elong0527
Collaborator

I like an explicit criterion as well.

@LittleBeannie
Collaborator

Thanks for pointing it out, @yihui! I guess there are two issues leading to the instability of this test:

  1. A numerical issue. In lines 118-127, I calculate the sample size needed to get 90% power, which is around 415.7804. Then, in lines 130-139, I feed the sample size of 415.7804 back in via line 132, to check whether that sample size gives me 90% power or not.
    If you look into line 132, you will notice I did something like x/y*y. As we discussed before, this is an identity in exact arithmetic, but it does not hold exactly in floating-point computation on a laptop (see the sketch after this list). So y's sample size (line 130) is slightly different from 415.7804, which leads to a power slightly different from 90%. On my laptop, y's power is 0.899991, which fails when I run expect_equal(0.899991, 0.9).

  2. A random number issue. For the MaxCombo test, there are some random number issues, as Nan brought up in Add random seed for running the non-deterministic gs_power_combo()? #340.
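
Here is a minimal sketch of the x/y*y round-trip effect (the divisor and the numbers are illustrative, not taken from the test code):

x <- 415.7804
y <- 12
x / y * y == x        # not guaranteed to be TRUE in double precision
abs(x / y * y - x)    # either 0 or a tiny residual on the order of 1e-13
# A residual of that size in the sample size is enough to shift the computed
# power slightly away from 0.9, e.g. to something like 0.899991.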

@yihui
Contributor Author

yihui commented Aug 15, 2024

Got it. Then it seems we need to either raise the tolerance (as I did in PR #455) or set a seed. I noticed that we were using a higher tolerance (0.01) for other tests in https://github.com/Merck/gsDesign2/blob/main/tests/testthat/test-independent-fixed_design.R, so I wonder why this test was given a much smaller tolerance.
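
For the seed route, a minimal sketch of pinning the RNG around the non-deterministic piece (the seed value is arbitrary and the body below is only a placeholder, not the change made in #455):

library(withr)
# Fix the RNG state locally so the simulation-based MaxCombo computation
# gives the same result on every test run; the body stands in for the
# actual power calculation.
with_seed(2024, {
  rnorm(1)  # placeholder for the non-deterministic computation
})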
