Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use data.table for grouped sort in pw_info() #300

Merged
merged 1 commit into from
Dec 22, 2023

Conversation

jdblischak
Copy link
Collaborator

The profiling indicated that dplyr::group_by() and dplyr::arrange() were two of the biggest bottlenecks:

image

I confirmed these were the ones in pw_info():

gsDesign2/R/pw_info.R

Lines 207 to 210 in 6d4371a

ans <- ans %>%
group_by(time, stratum) %>%
arrange(t, .by_group = TRUE) %>%
ungroup()

In my previous PR #295, I had delayed converting this to data.table. Once I remembered that group_by() also orders the columns, I was able to figure it out. Here is example code to show that the two approaches are identical:

set.seed(1)
d <- data.frame(
  x = sample(letters[1:3], size = 100, replace = TRUE),
  y = sample(letters[4:5], size = 100, replace = TRUE),
  z = sample(1:5, size = 100, replace = TRUE)
)

library("dplyr")
x1 <- d %>% group_by(x, y) %>% arrange(z, .by_group = TRUE) %>% ungroup()
x1 <- as.data.frame(x1)

library("data.table")
x2 <- as.data.table(d)
setorder(x2, x, y)
x2 <- x2[order(z), .SD, by = .(x, y)]
setDF(x2)

all.equal(x1, x2)

In repeated benchmarking of gs_power_ahr(), this update removes reduces the runtime by 70-140 ms

I also removed an isolated dplyr::transmute() from gs_info_rd() and updated the documentation to reflect that now a data frame is returned

@jdblischak jdblischak self-assigned this Dec 19, 2023
@jdblischak
Copy link
Collaborator Author

xref: #219

Copy link
Collaborator

@keaven keaven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update

@nanxstats nanxstats merged commit 9ef6050 into Merck:main Dec 22, 2023
8 checks passed
@jdblischak jdblischak deleted the replace-grouped-sort branch December 22, 2023 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants