No genlasso #595

dajmcdon · 2025-01-23T00:19:48Z

Checklist

Please:

Make sure this PR is against "dev", not "main" (unless this is a release
PR).
Request a review from one of the current main reviewers:
brookslogan, nmdefries.
Makes sure to bump the version number in DESCRIPTION. Always increment
the patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
Describe changes made in NEWS.md, making sure breaking changes
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
See DEVELOPMENT.md for more information on the development
process.

Change explanations for reviewer

This completes the process of removing the genlasso dependency (because it required a C compiler).

We replace the functionality with glmgen/trendfilter.
But list this in Suggests and test for it if that method is requested.
Also refactor growth_rate() to be a bit less clunky. It could (and often did) return vectors of unexpected lengths (due to NAs or duplicated x-values). This does not play nice with mutate() which was the main example use case.
Add tests for the (many) edge cases of argument interaction.

Note that this does entail breaking changes.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Resolves Use glmgen in growth rate function when ready #57
Closes Move genlasso to Suggests:/Enhances: #585

…into no-genlasso

Merge branch 'dev' into no-genlasso # Conflicts: # DESCRIPTION

dshemetov · 2025-01-24T20:57:42Z

/preview-docs

github-actions · 2025-01-24T21:04:24Z

🚀 Deployed on https://679400526c295d36604a015e--epiprocess.netlify.app

brookslogan · 2025-01-24T21:32:04Z

@dshemetov you got ahead of me, I was just wanting to test this out!! Very useful already.

@dajmcdon how compatible are genlasso and trendfilter supposed to be? There appear to be some changes from dev:

to this PR:

and trendfilter seems like it might be a bit under-regularized in this case.

dajmcdon · 2025-01-24T21:55:27Z

The implementation is completely different. genlasso is a path algorithm while trendfilter uses ADMM with warm-starts (From Ramdas+Tibshirani 2016). So the lambda's visited are likely quite different. The default here is to use the CV minimizer (in both cases). So they shouldn't be too far off, modulo the possible lambdas that could be used. We could switch to the 1se rule if we want more regularization.

brookslogan

Dumping some notes here. I plan to make at least some minor editing commits and maybe take another look / do actual testing when my mind's a bit more fresh.

R/growth_rate.R

NEWS.md

R/growth_rate.R

dshemetov

Installing trendfilter wasn't so bad after installing g++ on Ubuntu, so hopefully this alleviates installation issues, thanks!
Did some sanity checking of the code, as far as I could follow it (mostly the logic here, don't quite understand the external package parameters enough to comment on those)
Did some testing of the functions to make sure we're getting sensible values. Assuming those are fine, this looks good to me.

…rd error

brookslogan · 2025-01-28T18:24:50Z

When I run tests locally, I get a lot of warnings like:

subscript out of bounds (index 29 >= vector size 29)

and at some point something like "partial match of tol = to tolerance".

[Ah, now I can see both again. Here's an example of one:

── Warning (test-growth_rate.R:25:3): new setup args and warnings are as expected ───
subscript out of bounds (index 29 >= vector size 29)
Backtrace:
     ▆
  1. ├─testthat::expect_length(...) at test-growth_rate.R:25:3
  2. │ └─testthat::quasi_label(enquo(object), arg = "object") at testthat/R/expect-length.R:18:3
  3. │   └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
  4. └─epiprocess::growth_rate(y = c(1:20, NA, 22:30), method = "trend_filter")
  5.   ├─stats::predict(obj, newx = x0, which_lambda = which_lambda) at epiprocess/R/growth_rate.R:255:9
  6.   └─trendfilter:::predict.cv_trendfilter(obj, newx = x0, which_lambda = which_lambda)
  7.     ├─stats::predict(object$full_fit, newx, which_lambda, ...)
  8.     └─trendfilter:::predict.trendfilter(...)
  9.       └─base::apply(...)
 10.         └─trendfilter (local) FUN(newX[, i], ...)
 11.           └─dspline::dspline_interp(th, object$k, object$x, newx)
 12.             └─dspline:::rcpp_dspline_interp(v, k, xd, x, implicit)

and an example of the other:

── Warning (test-growth_rate.R:187:3): trendfilter growth_rate implementation ───────
partial match of 'tol' to 'tolerance'
Backtrace:
    ▆
 1. ├─testthat::expect_length(...) at test-growth_rate.R:187:3
 2. │ └─testthat::quasi_label(enquo(object), arg = "object") at testthat/R/expect-length.R:18:3
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
 4. └─epiprocess::growth_rate(y = z, method = "trend_filter", params = growth_rate_params(nfolds = 3))
 5.   └─trendfilter::cv_trendfilter(...) at epiprocess/R/growth_rate.R:246:9
 6.     └─trendfilter::trendfilter(...)
 7.       └─trendfilter:::admm_lambda_seq(...)

]

At least some from these parts of my .Rprofile

options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchAttr = TRUE,
  warnPartialMatchDollar = TRUE,
  useFancyQuotes = FALSE
)

## seems to work on (currently installed as of time of writing) 4.0.5 though doesn't seem to be documented in ?Logic like in some later version
## thought this was only about [[ not [ though
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_"=TRUE)

Are these epiprocess or upstream problems? Do they need addressed? The subscript out of bounds thing seems a bit fishy, the partial match more innocuous.

brookslogan · 2025-01-28T19:42:16Z

@dajmcdon I finished the rewords I was hoping to do + some of the minor tasks. There's still some lingering stuff that I'm hoping you'll be able to decide what to do with & handle.

(I've linted those overly long lines in another PR.)

This reverts commit 5612df0.

dajmcdon · 2025-01-29T06:11:35Z

OK. By my count, two outstanding issues (after the above commits)

The subscript out of bounds (index 29 >= vector size 29) warning. I can't reproduce this locally, but it happens in CI.
The question of "are the new results" too wiggly? I think this is up for debate. @lcbrooks what do you think about using lambda.1se as the default instead of the minimizer (it was the minimizer previously). I'm not sure what could account for the difference, though Ryan may know.

dajmcdon · 2025-01-29T21:31:25Z

@brookslogan following up on the point about underregularization. I did a bit of investigating. The previous implementation used the cv minimizer, but capped the number of steps in the solution path at 1000. Because of this, lambda_min == min(lambda) and min(lambda) from genlasso is much larger than min(lambda) from trendfilter. The 1000 steps doesn't "go down the path" as far as in the other implementation, visiting only smoother solutions. So it effectively forces more smoothness due to other algorithmic choices.

However, the trendfilter implementation also produces a questionable level of smoothness. CV there also has the property that lambda_min == min(lambda). So ideally, we would consider even smaller levels of regularization!

I'm not sure how much more to beat on this. We could do some other default choice, but I think this is a larger problem (one I've encountered before) that we shouldn't necessarily handle here. We can put it on the tooling agenda for next time we meet with Ryan.

x <- cases_deaths_subset |>
  filter(geo_value == "pa" & time_value >= "2020-06-01") |>
  select(tv = time_value, cases = cases_7d_av) |>
  arrange(tv) |>
  mutate(tv = as.numeric(tv))

# using internal defaults from current epiprocess implementation
gen <- genlasso::trendfilter(x$cases, ord = 3L, maxsteps = 1000L)
cvgen <- genlasso::cv.trendfilter(gen, k = 3)

tf <- trendfilter::cv_trendfilter(x$cases, x$tv, nfolds = 3L) # chosen to match

# both minimizers are at the minimum lambda
plot(cvgen)
plot(tf)

# convert to growth rates using lambda_min
genf <- gen$beta[,cvgen$i.min]
tff <- predict(tf, which_lambda = "lambda_min")
dgenf <- diff(genf) / diff(x$tv)
dgenf <- c(dgenf, dgenf[length(dgenf)])
dtff <- diff(tff) / diff(x$tv)
dtff <- c(dtff, dtff[length(dtff)])

plot(dgenf / genf, ty = "l", col = 1)
lines(dtff / tff, col = 2)

brookslogan · 2025-01-29T21:45:28Z

Nice catches! I do think the current plot looks pretty weird; part of that might be under-regularization; part might be the degree of the polynomial.

I could believe there is a rapid change around Dec & Jan that's oversmoothed in genlasso, but all these other little bumps seem suspect.
The small bumps may look particularly unnatural due to the degree of the spline, but we may sort of be constrained there by attempting apples-to-apples comparison with smoothing splines.

I think in the short term, given that upstream changes may need to be made, we should decide among:

A. tweaking this text to be more accurate:

In this particular example, the trend filtering estimates of growth rate appear to be much more stable than those from the smoothing spline, and also much more stable than the estimates from local relative changes and linear regressions.

B. maybe trying putting 1se as default, seeing if it makes things look better, and remembering to revisit if something changes upstream / the lambda range changes (which might not only change the "min" selection but also "1se"). Conceptually I like 1se's motivation as a default, but I'm afraid there will be even more weird situations than with min, & we're already hitting one with min. Rather than worry about them all now, we could just go with whatever makes this example look better & try to iterate if we/users encounter issues later.

C. both.

dajmcdon · 2025-01-29T22:09:34Z

I'm fine with A, but against B (and C).

Based on the above investigation, B won't have an effect here (it gives the same result as lambda min). CV is not a great estimator for this data, at least implemented as leave 1 fold out and folds are set as every vth observation. The defaults should remain.

We should be thinking of the previously "nice smooth result" as nothing more than an artefact that conveniently looked how we wanted. Along the lines of running gradient descent for only a few iterations and saying, "yeah, I stopped way early, but I like the answer!".

brookslogan · 2025-01-29T22:10:24Z

tests/testthat/test-growth_rate.R

+  )
+
+  # other spline args give output (correctness not checked)
+  z <- rnorm(30)


Think we might be running into some nondeterminism in tests, here & in another similar line in this file. Suggest updating these to something like

z <- withr::with_rng_version("3.5.0", withr::with_seed(7164574, rnorm(30)))

but this doesn't seem to trigger the index bounds warning; I'm going to browse around some seeds.

CI hits the bounds warning even earlier, with the test on line 23. There's no randomness there, so maybe better to poke at that one.

Following one of these deterministic problematic lines down to the first problematic interp call, we arrive at this example:

dspline:::rcpp_dspline_interp(v = c(0.112401954925076, 0.224803909850205, 0.337205864775255, 0.449607819700357, 0.562009774625446, 0.674411729550536, 0.786813684475624, 0.899215639400714, 1.0116175943258, 1.12401954925089, 1.23642150417598, 1.34882345910107, 1.46122541402616, 1.57362736895125, 1.68602932387634, 1.79843127880143, 1.91083323372652, 2.02323518865161, 2.13563714357669, 2.24803909850179, 2.47284300835196, 2.58524496327706, 2.69764691820214, 2.81004887312723, 2.92245082805232, 3.03485278297741, 3.14725473790251, 3.25965669282756, 3.37205864775269), k = 3L, xd = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30), x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30), implicit = TRUE)

(from something like

withr::with_options(list(warn = 2, warnPartialMatchDollar = FALSE), { trace(dspline:::rcpp_dspline_interp, tracer = quote({ .GlobalEnv[["problematic_args"]] <- as.list(environment()) })) try(growth_rate(y = c(1:20, NA, 22:30), method = "trend_filter")) untrace(dspline:::rcpp_dspline_interp) dput(problematic_args) })

)

I haven't done any tracing in dspline Rcpp code to figure out what's going on.

Though the culprit seems likely to be how NAs are being omitted before being passed to the rcpp function, supported by:

Warning message: In dspline:::rcpp_dspline_interp(v = c(0.112401954925076, 0.224803909850205, : subscript out of bounds (index 29 >= vector size 29)

29 seems to be a 0-based index here, and

problematic_args$xd %>% length() #> [1] 29 problematic_args$x %>% length() #> [1] 30

suggests that it might be assuming xd and x have the same length and trying to index xd all the way through length(x)

Based on my understanding of that function, as long as the xd contains the range of x there shouldn't be any problem. I can't see where the issue would be in the Cpp code. I'd suggest making an issue in dspline and tagging Ryan.

(Maybe I got mixed up about line numbers for the two warnings and there isn't a problem reproducing either warning. NA testing seems up above seems like much more of a prime suspect for the length check vs. some randomness down here being the an issue. On the fence about setting the RNG vs. not now; setting makes debugging easier, not setting makes detection more reliable in a sense, as we will be doing many draws over many test runs.)

In the process of filing issue upstream, I think I might have found a way to address: try implicit = FALSE.

Or alternatively by only querying non-design points with implicit = TRUE, then recombining with the design points manually. (Upstream issue here.)

dajmcdon and others added 14 commits December 11, 2024 13:07

rm genlasso

cc07ab0

fix trendfilter + growth_rate

975e783

add tests for parameter constructor and splines

a236706

pass minimal tests

967f16a

docs: document (GHA)

2f8b06f

style: styler (GHA)

b41b5fb

slight modifications to the vignette

bbf5a07

add install instructions to vignette

92f3eb0

Merge branch 'no-genlasso' of https://github.com/cmu-delphi/epiprocess …

a6e8785

…into no-genlasso

redocument

f2c9383

merge dev, bump version.

5a242a1

Merge branch 'dev' into no-genlasso # Conflicts: # DESCRIPTION

redocument

0e10f94

add tests, pass checks

5101627

style: styler (GHA)

6d7091c

dajmcdon requested review from dshemetov and brookslogan January 23, 2025 00:20

dajmcdon added 2 commits January 23, 2025 12:02

handle annoying lintr crud

61628fd

add missing fn for pkgdown

3b77ef6

Merge branch 'dev' into no-genlasso

bcdb216

docs: document (GHA)

8758128

brookslogan reviewed Jan 24, 2025

View reviewed changes

brookslogan self-requested a review January 24, 2025 23:17

dshemetov approved these changes Jan 25, 2025

View reviewed changes

brookslogan added 3 commits January 28, 2025 09:46

docs(growth_rate): reword + typo

5280faf

Use glmgen/trendfilter installation command as suggestion

0426ec4

Make growth_rate(method = "trendfilter") without {trendfilter} a ha…

393370f

…rd error

brookslogan and others added 2 commits January 28, 2025 10:06

refactor(growth_rate): preoptimize/normalize some ops

341be94

docs: document (GHA)

d10c440

brookslogan and others added 3 commits January 28, 2025 11:38

fix(growth_rate): single_lambda check

8c92285

Rename growth_rate_global_params() -> growth_rate_params()

b0d3236

docs: document (GHA)

2baa9e0

brookslogan and others added 7 commits January 28, 2025 13:03

Further NEWS rewording

730bc55

remove scaling by sd (now done internally by the method)

264fbcf

remove sorting of x0, not really needed

5a037bc

rm straggling sdy

2ce5daa

satisfy the linter

5612df0

fix rcmd check errors

85e6530

Revert "satisfy the linter"

d5c3498

This reverts commit 5612df0.

brookslogan reviewed Jan 29, 2025

View reviewed changes

brookslogan mentioned this pull request Jan 30, 2025

Better reactions to NAs in dspline_interp(), and implicit = TRUE generating garbage values glmgen/dspline#16

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No genlasso #595

No genlasso #595

dajmcdon commented Jan 23, 2025

dshemetov commented Jan 24, 2025

github-actions bot commented Jan 24, 2025

brookslogan commented Jan 24, 2025

dajmcdon commented Jan 24, 2025

brookslogan left a comment

dshemetov left a comment •

edited

Loading

brookslogan commented Jan 28, 2025 •

edited

Loading

brookslogan commented Jan 28, 2025 •

edited

Loading

dajmcdon commented Jan 29, 2025 •

edited by brookslogan

Loading

dajmcdon commented Jan 29, 2025

brookslogan commented Jan 29, 2025

dajmcdon commented Jan 29, 2025

brookslogan Jan 29, 2025 •

edited

Loading

dajmcdon Jan 29, 2025

brookslogan Jan 29, 2025 •

edited

Loading

brookslogan Jan 29, 2025 •

edited

Loading

dajmcdon Jan 29, 2025

brookslogan Jan 29, 2025

brookslogan Jan 30, 2025

brookslogan Jan 30, 2025

No genlasso #595

Are you sure you want to change the base?

No genlasso #595

Conversation

dajmcdon commented Jan 23, 2025

Checklist

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

dshemetov commented Jan 24, 2025

github-actions bot commented Jan 24, 2025

brookslogan commented Jan 24, 2025

dajmcdon commented Jan 24, 2025

brookslogan left a comment

Choose a reason for hiding this comment

dshemetov left a comment • edited Loading

Choose a reason for hiding this comment

brookslogan commented Jan 28, 2025 • edited Loading

brookslogan commented Jan 28, 2025 • edited Loading

dajmcdon commented Jan 29, 2025 • edited by brookslogan Loading

dajmcdon commented Jan 29, 2025

brookslogan commented Jan 29, 2025

dajmcdon commented Jan 29, 2025

brookslogan Jan 29, 2025 • edited Loading

Choose a reason for hiding this comment

dajmcdon Jan 29, 2025

Choose a reason for hiding this comment

brookslogan Jan 29, 2025 • edited Loading

Choose a reason for hiding this comment

brookslogan Jan 29, 2025 • edited Loading

Choose a reason for hiding this comment

dajmcdon Jan 29, 2025

Choose a reason for hiding this comment

brookslogan Jan 29, 2025

Choose a reason for hiding this comment

brookslogan Jan 30, 2025

Choose a reason for hiding this comment

brookslogan Jan 30, 2025

Choose a reason for hiding this comment

dshemetov left a comment •

edited

Loading

brookslogan commented Jan 28, 2025 •

edited

Loading

brookslogan commented Jan 28, 2025 •

edited

Loading

dajmcdon commented Jan 29, 2025 •

edited by brookslogan

Loading

brookslogan Jan 29, 2025 •

edited

Loading

brookslogan Jan 29, 2025 •

edited

Loading

brookslogan Jan 29, 2025 •

edited

Loading