Scale invariance #42

nlapier2 · 2024-01-22T23:53:05Z

This is essentially the same as the following issue from mr.ash: stephenslab/mr.ash.alpha#7 (comment)

Using the same input data, I get very different results for

as.numeric(varbvs(X=x_betas, y=y_betas, Z=NULL, family='gaussian')$pip)

versus

as.numeric(varbvs(X=x_betas*0.01, y=y_betas*0.01, Z=NULL, family='gaussian')$pip)

The text was updated successfully, but these errors were encountered:

pcarbo · 2024-01-23T02:41:52Z

@nlapier2 It is clear this isn't scale invariant with the default settings, but perhaps with the right settings it will be scale invariant. Could you send me x_betas and y_betas to use as a test?

nlapier2 · 2024-01-23T02:46:55Z

I would suggest the ones that are generated using the code in the linked issue. If that doesn't work, let me know and I can generate some new ones.

pcarbo · 2024-01-24T17:13:01Z

@nlapier2 Try this:

set.seed(1)
eff_size <- 0.1
N <- 1e4
M <- 100
K <- 30
theta_gx <- matrix(rnorm(M*K),M,K)
theta_xy <- c(rep(eff_size,4),rep(0,K-4))
G <- matrix(rnorm(N*M),N,M)
X <- G %*% theta_gx + matrix(rnorm(N*K),N,K)
Y <- X %*% theta_xy + rnorm(N)

get_betas_stderrs <- function(G, X, Y) {
  num_snps = dim(G)[2]
  num_exposures = dim(X)[2]
  y_betas = integer(num_snps)
  y_stderrs = integer(num_snps)
  x_betas = matrix(0, num_exposures, num_snps)
  x_stderrs = matrix(0, num_exposures, num_snps)
  for (i in 1:num_snps) {
    Gi = G[, i]
    Gi = (Gi - mean(Gi)) / sd(Gi)  # standardize genotypes
    res_gy = summary(lm(Y ~ Gi))
    y_betas[i] = res_gy$coefficients[2,1]
    y_stderrs[i] = as.numeric(coef(res_gy)[, "Std. Error"][2])
    for (j in 1:num_exposures) {
      res_gx = summary(lm(X[, j] ~ Gi))
      x_betas[j,i] = res_gx$coefficients[2,1]
      x_stderrs[j,i] = as.numeric(coef(res_gx)[, "Std. Error"][2])
    }
  }
  x_betas = t(x_betas)
  x_stderrs = t(x_stderrs)
  return(list("x_betas" = x_betas, "x_stderrs" = x_stderrs, "y_betas" = y_betas,
              "y_stderrs" = y_stderrs))
}

set.seed(1)
fit1 <- varbvs(X = x_betas,y = y_betas,Z = NULL,family = "gaussian",
               sa = 1,update.sa = TRUE,update.sigma = TRUE,tol = 1e-8)
fit2 <- varbvs(X = 0.1*x_betas,y = 0.1*y_betas,Z = NULL,
               family = "gaussian",sa = 1,update.sa = TRUE,
               update.sigma = TRUE,tol = 1e-8)
plot(fit1$pip,fit2$pip,pch = 20,xlim = c(0,1),ylim = c(0,1))
abline(a = 0,b = 1,col = "magenta",lty = "dotted")

The results are more consistent but there are still clearly differences.

This particular data set seems to represent an "edge case" where varbvs, perhaps due to the way the model is parameterized, has difficult with the optimization of the model parameters.

My general recommendation would be to help varbvs when you call it by rescaling the data X, y so that, say, the column variances of X are closer to 1.

You should also compare to mr.ash to see if it does a better job in this case handling the rescaling.

nlapier2 changed the title ~~VarBVS -- scale invariance~~ Scale invariance Jan 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scale invariance #42

Scale invariance #42

nlapier2 commented Jan 22, 2024

pcarbo commented Jan 23, 2024

nlapier2 commented Jan 23, 2024

pcarbo commented Jan 24, 2024

Scale invariance #42

Scale invariance #42

Comments

nlapier2 commented Jan 22, 2024

pcarbo commented Jan 23, 2024

nlapier2 commented Jan 23, 2024

pcarbo commented Jan 24, 2024