Scale invariance #42
@nlapier2 It is clear this isn't scale invariant with the default settings, but perhaps with the right settings it will be. Could you send me the data?
I would suggest the ones that are generated using the code in the linked issue. If that doesn't work, let me know and I can generate some new ones.
@nlapier2 Try this:

library(varbvs)

# Simulate genotypes G, exposures X and outcome Y, in which only the
# first 4 of the K exposures have a nonzero effect on Y.
set.seed(1)
eff_size <- 0.1
N <- 1e4
M <- 100
K <- 30
theta_gx <- matrix(rnorm(M*K),M,K)
theta_xy <- c(rep(eff_size,4),rep(0,K-4))
G <- matrix(rnorm(N*M),N,M)
X <- G %*% theta_gx + matrix(rnorm(N*K),N,K)
Y <- drop(X %*% theta_xy + rnorm(N)) # drop() makes Y a plain vector for lm()
# Compute per-SNP summary statistics: effect estimates and standard
# errors from simple regressions of the outcome and each exposure on
# each (standardized) SNP.
get_betas_stderrs <- function(G, X, Y) {
  num_snps      = dim(G)[2]
  num_exposures = dim(X)[2]
  y_betas   = numeric(num_snps)
  y_stderrs = numeric(num_snps)
  x_betas   = matrix(0, num_exposures, num_snps)
  x_stderrs = matrix(0, num_exposures, num_snps)
  for (i in 1:num_snps) {
    Gi = G[, i]
    Gi = (Gi - mean(Gi)) / sd(Gi) # standardize genotypes
    # SNP-outcome regression.
    res_gy = summary(lm(Y ~ Gi))
    y_betas[i]   = res_gy$coefficients[2,1]
    y_stderrs[i] = as.numeric(coef(res_gy)[, "Std. Error"][2])
    # SNP-exposure regressions, one per exposure.
    for (j in 1:num_exposures) {
      res_gx = summary(lm(X[, j] ~ Gi))
      x_betas[j,i]   = res_gx$coefficients[2,1]
      x_stderrs[j,i] = as.numeric(coef(res_gx)[, "Std. Error"][2])
    }
  }
  # Return SNPs-by-exposures matrices of effect estimates and standard errors.
  x_betas   = t(x_betas)
  x_stderrs = t(x_stderrs)
  return(list("x_betas" = x_betas, "x_stderrs" = x_stderrs,
              "y_betas" = y_betas, "y_stderrs" = y_stderrs))
}
# Compute the summary statistics; x_betas and y_betas are not defined
# anywhere else, so this call is needed before fitting varbvs.
dat     <- get_betas_stderrs(G, X, Y)
x_betas <- dat$x_betas
y_betas <- dat$y_betas

set.seed(1)
fit1 <- varbvs(X = x_betas, y = y_betas, Z = NULL, family = "gaussian",
               sa = 1, update.sa = TRUE, update.sigma = TRUE, tol = 1e-8)
fit2 <- varbvs(X = 0.1*x_betas, y = 0.1*y_betas, Z = NULL,
               family = "gaussian", sa = 1, update.sa = TRUE,
               update.sigma = TRUE, tol = 1e-8)
plot(fit1$pip, fit2$pip, pch = 20, xlim = c(0,1), ylim = c(0,1))
abline(a = 0, b = 1, col = "magenta", lty = "dotted")

The results are more consistent, but there are still clearly differences. This particular data set seems to represent an "edge case" where varbvs, perhaps due to the way the model is parameterized, has difficulty with the optimization of the model parameters. My general recommendation would be to help varbvs when you call it by rescaling the data X, y so that, say, the column variances of X are closer to 1 (see the sketch below). You should also compare to …
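A minimal sketch of that rescaling suggestion (my own illustration, not part of the original comment): standardize the columns of x_betas and the y_betas vector before fitting, then compare the PIPs against fit1 above. The name fit3 and the use of scale() are just illustrative choices, not part of the thread.

x_scaled <- scale(x_betas)        # give each column of X mean 0 and s.d. 1
y_scaled <- drop(scale(y_betas))  # likewise for the outcome effect estimates
fit3 <- varbvs(X = x_scaled, y = y_scaled, Z = NULL, family = "gaussian",
               sa = 1, update.sa = TRUE, update.sigma = TRUE, tol = 1e-8)
plot(fit1$pip, fit3$pip, pch = 20, xlim = c(0,1), ylim = c(0,1))
abline(a = 0, b = 1, col = "magenta", lty = "dotted")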
This is essentially the same as the following issue from mr.ash: stephenslab/mr.ash.alpha#7 (comment)
Using the same input data, I get very different results for
as.numeric(varbvs(X=x_betas, y=y_betas, Z=NULL, family='gaussian')$pip)
versus
as.numeric(varbvs(X=x_betas*0.01, y=y_betas*0.01, Z=NULL, family='gaussian')$pip)
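To make the discrepancy concrete, here is a small check (my addition, assuming x_betas and y_betas are the summary statistics returned by get_betas_stderrs above); if the posterior inclusion probabilities were scale invariant, this number would be essentially zero:

pip1 <- as.numeric(varbvs(X = x_betas, y = y_betas, Z = NULL,
                          family = "gaussian")$pip)
pip2 <- as.numeric(varbvs(X = 0.01*x_betas, y = 0.01*y_betas, Z = NULL,
                          family = "gaussian")$pip)
max(abs(pip1 - pip2))  # near 0 only if the fit is scale invariant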