Revisions to prediction with lm_lin() #415

Open
mollyow opened this issue Jan 7, 2025 · 1 comment · May be fixed by #416

mollyow commented Jan 7, 2025

Currently, predict() does not work for lm_lin() models with multi-valued or factor (as.factor()) treatments. The problem lies in how predict.lm_robust() builds the Lin estimator's model matrix from new data. The treatment name saved in the lm_lin() model object is the original variable name, but that variable may have been expanded into multiple columns of the model matrix, so the two disagree when the treatment x covariate interactions are created. As a result, either the original variable name does not exist in the new model matrix, or the new-data model matrix does not have the right dimensions to be multiplied by the coefficients.

See here.

For example:

library(estimatr)
set.seed(60637)

N <- 40
dat <- data.frame(
  x = rnorm(N, mean = 2.3),
  x2 = rpois(N, lambda = 2),
  x3 = runif(N)
)

dat$y0 <- rnorm(N) + dat$x
dat$y1 <- dat$y0 + 0.35
dat$y2 <- dat$y0 + 0.55

dat$z_multi <- sample(0:2, size = nrow(dat), replace = TRUE)
dat$z_bin <- 1 * (dat$z_multi > 0)
dat$y <- (dat$z_multi == 0) * dat$y0 +
  (dat$z_multi == 1) * dat$y1 +
  (dat$z_multi == 2) * dat$y2

# Multi-valued numeric treatments with lm_lin; estimation works as expected
lmlin_mult <- lm_lin(y ~ z_multi, covariates = ~ x, data = dat)
# prediction does not
predict(lmlin_mult, newdata = dat)
# Error in X[, !beta_na, drop = FALSE] :
#   (subscript) logical subscript too long

# Binary factorial treatment with lm_lin; estimation works,
lmlin_bin_f <- lm_lin(y ~ as.factor(z_bin), covariates = ~ x + x2 + x3, data = dat)
# prediction breaks
predict(lmlin_bin_f, newdata = dat)
# Error in X[, treat_name] : subscript out of bounds
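
Both failure modes can be reproduced with base R alone, which may make the mismatch easier to see. The sketch below does not call any estimatr internals. The names `treat_name` and `beta_na` are borrowed from the error messages above as stand-ins, and the claim that they play the same role inside predict.lm_robust() is an assumption, not something verified against the package source.

```r
# Toy data; values are arbitrary and only used for illustration.
d <- data.frame(
  z_bin   = c(0, 1, 0, 1, 1, 0),
  z_multi = c(0, 1, 2, 0, 1, 2),
  x       = c(0.5, 1.2, 2.0, 3.1, 0.7, 1.9)
)

# 1) A factor term is expanded into indicator columns whose names
#    carry the level as a suffix, so the original term name is not
#    itself a column name.
X_fac <- model.matrix(~ as.factor(z_bin) + x, data = d)
colnames(X_fac)

treat_name <- "as.factor(z_bin)"   # stand-in for the stored treatment name
treat_name %in% colnames(X_fac)    # FALSE

# Indexing by the original name fails; capture the message instead of stopping.
msg_name <- tryCatch(
  X_fac[, treat_name],
  error = function(e) conditionMessage(e)
)

# 2) A numeric multi-valued treatment yields one column, while an
#    indicator coding of the same variable yields one column per
#    non-reference level, so the two matrices differ in width.
X_num <- model.matrix(~ z_multi + x, data = d)
X_ind <- model.matrix(~ factor(z_multi) + x, data = d)
c(numeric = ncol(X_num), indicators = ncol(X_ind))

# A logical mask sized for the wider matrix cannot subset the narrower one.
beta_na <- rep(FALSE, ncol(X_ind))  # stand-in for a per-coefficient NA mask
msg_dim <- tryCatch(
  X_num[, !beta_na, drop = FALSE],
  error = function(e) conditionMessage(e)
)
```

Here `msg_name` and `msg_dim` hold the captured error messages, which can be compared with the two errors reported above.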

More detail in gist here

A revision to handle setting up treatment columns in the new data could be implemented in get_X().
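
As a rough illustration of what such a revision might involve, the sketch below looks up the model-matrix columns that belong to a given formula term by using the `assign` attribute that model.matrix() attaches to its result. The helper `treatment_columns()` is hypothetical and is not part of estimatr, and this is not the approach taken in the linked pull request as far as this issue states.

```r
# Hypothetical helper (not an estimatr function): return the names of the
# model-matrix columns generated by one term of a formula.
treatment_columns <- function(formula, data, treat_term) {
  tt <- terms(formula)
  # Position of the treatment among the formula's term labels.
  term_idx <- match(treat_term, attr(tt, "term.labels"))
  if (is.na(term_idx)) {
    stop("Term '", treat_term, "' not found in formula")
  }
  X <- model.matrix(tt, data = data)
  # attr(X, "assign") maps each column to the index of the term that
  # produced it (0 for the intercept).
  colnames(X)[attr(X, "assign") == term_idx]
}

# Toy data; values are arbitrary.
d <- data.frame(
  z_multi = c(0, 1, 2, 0, 1, 2),
  x       = c(0.5, 1.2, 2.0, 3.1, 0.7, 1.9)
)

cols <- treatment_columns(~ factor(z_multi) + x, d, "factor(z_multi)")
cols
```

With the columns identified this way, interactions with centered covariates could be built per treatment column rather than by the original variable name. A real fix would presumably also need to reuse the factor levels seen at estimation time, since new data may not contain every treatment level.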

mollyow linked a pull request Jan 7, 2025 that will close this issue

mollyow commented Jan 7, 2025

Also, thank you all for making such a useful package!
