Skip to content

Commit

Permalink
bulk update
Browse files Browse the repository at this point in the history
  • Loading branch information
changwoo-lee committed Dec 25, 2023
1 parent c728c7e commit d55b04d
Show file tree
Hide file tree
Showing 21 changed files with 638 additions and 194 deletions.
Binary file added .DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: bspme
Type: Package
Title: Bayesian Spatial Measurement Error Models
Version: 0.1.0
Version: 0.2.0
Author: Changwoo Lee, Eun Sug Park
Maintainer: Changwoo Lee <[email protected]>
Description: Functions for fitting Bayesian linear and genearlized linear models in presence of measurement error of the covariate.
Expand Down
2 changes: 1 addition & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Generated by roxygen2: do not edit by hand

export(blm_me)
export(blinreg_me)
export(vecchia_cov)
importFrom(Matrix,Diagonal)
126 changes: 67 additions & 59 deletions R/blm_me.R → R/blinreg_me.R
Original file line number Diff line number Diff line change
@@ -1,13 +1,19 @@
#' Bayesian normal linear regression models with (spatially) correlated measurement errors
#' Bayesian normal linear regression models with correlated measurement errors
#'
#' This function implements the Bayesian normal linear regression model subject to measurement error with semiconjugate priors.
#'
#' Model: Y = beta0 + beta_x X + beta_z Z + epsilon, epsilon ~ N(0, sigma2y)
#' Measurement error: sigma_Y^2 ~ IG(0.01, 0.01), beta|sigma_Y^2 ~ N(0, sigma_Y^2*diag(var_beta))
#' This function implements the Bayesian normal linear regression model with correlated measurement error of covariate(s).
#' Denote \eqn{Y_i} be a continuous response, \eqn{X_i} be a \eqn{q\times 1} covariate of \eqn{i}th observation that is subject to measurement error,
#' and \eqn{Z_i} be a \eqn{p\times 1} covariate without measurement error.
#' The likelihood model is Bayesian normal linear regression,
#' \deqn{Y_i = \beta_0 + X_i^\top \beta_x + Z_i^\top \beta_z + \epsilon_i,\quad \epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2_Y), \quad i=1,\dots,n}
#' and correlated measurement error of \eqn{X_i, i=1,\dots,n} is incorporated into the model as a multivariate normal prior. For example when \eqn{q=1}, we have \eqn{n-}dimensional multivariate normal prior
#' \deqn{(X_1,\dots,X_n)\sim N_n(\mu_X, Q_X^{-1}).}
#' Also, we consider semiconjugate priors for regression coefficients and noise variance;
#' \deqn{\beta_0 \sim N(0, V_\beta), \quad \beta_{x,j} \stackrel{iid}{\sim} N(0, V_\beta), \quad \beta_{z,k} \stackrel{iid}{\sim} N(0, V_\beta), \quad \sigma_Y^2 \sim IG(a_Y, b_Y).}
#' The function \code{blinreg_me()} implements the Gibbs sampler for posterior inference. Most importantly, it allows sparse matrix input for \eqn{Q_X} for scalable computation.
#'
#' @param Y n by 1 matrix, response
#' @param X_mean n by 1 matrix or list of n by 1 matrices of length q, mean of X.
#' @param X_prec n by n matrix or list of n by n matrices of length q, precision matrix of X. Support sparse matrix format from Matrix package.
#' @param X_mean n by 1 matrix or list of n by 1 matrices of length q, mean of X \eqn{\mu_X}.
#' @param X_prec n by n matrix or list of n by n matrices of length q, precision matrix of X \eqn{Q_X}. Support sparse matrix format from Matrix package.
#' @param Z n by p matrix, covariates without measurement error
#' @param nburn integer, burn-in iteration
#' @param nthin integer, thin-in rate
Expand All @@ -22,12 +28,65 @@
#'
#' @examples
#'
#'\dontrun{
#' data(ozone)
#' data(health_sim)
#' library(bspme)
#' data(ozone)
#' data(health_sim)
#' library(fields)
#' # exposure mean and covariance at health subject locations with 06/18/1987 data (date id = 16)
#' # using Gaussian process with prior mean zero and exponential covariance kernel
#' # with fixed range 300 (in miles) and stdev 15 (in ppb)
#'
#' ozone16 = ozone[ozone$date_id==16,]
#'
#' Dxx = rdist.earth(cbind(ozone16$coords_lon, ozone16$coords_lat))
#' Dyy = rdist.earth(cbind(health_sim$coords_y_lon, health_sim$coords_y_lat))
#' Dxy = rdist.earth(cbind(ozone16$coords_lon, ozone16$coords_lat),
#' cbind(health_sim$coords_y_lon, health_sim$coords_y_lat))
#'
#' Kxx = Exponential(Dxx, range = 300, phi=15^2)
#' Kyy = Exponential(Dyy, range = 300, phi=15^2)
#' Kxy = Exponential(Dxy, range = 300, phi=15^2)
#'
#' X_mean = t(Kxy) %*% solve(Kxx, ozone16$ozone_ppb)
#' X_cov = Kyy - t(Kxy) %*% solve(Kxx, Kxy)
#'
#' # visualize
#' par(mfrow = c(1,3))
#' quilt.plot(cbind(ozone16$coords_lon, ozone16$coords_lat),
#' ozone16$ozone_ppb, main = "ozone measurements"); US(add= T)
#'
#' quilt.plot(cbind(health_sim$coords_y_lon, health_sim$coords_y_lat),
#' X_mean, main = "health subjects, mean of exposure"); US(add= T)
#' points(cbind(ozone16$coords_lon, ozone16$coords_lat), pch = 17)
#'
#' quilt.plot(cbind(health_sim$coords_y_lon, health_sim$coords_y_lat),
#' sqrt(diag(X_cov)), main = "health subjects, sd of exposure"); US(add= T)
#' points(cbind(ozone16$coords_lon, ozone16$coords_lat), pch = 17)
#'
#' # vecchia approximation
#' run_vecchia = vecchia_cov(X_cov, coords = cbind(health_sim$coords_y_lon, health_sim$coords_y_lat),
#' n.neighbors = 10)
#' Q_sparse = run_vecchia$Q
#' run_vecchia$cputime
#'
blm_me <- function(Y,
#' # fit the model
#' fit_me = blm_me(Y = health_sim$Y,
#' X_mean = X_mean,
#' X_prec = Q_sparse, # sparse precision matrix
#' Z = health_sim$Z,
#' nburn = 100,
#' nsave = 1000,
#' nthin = 1)
#' fit_me$cputime
#' summary(fit_me$posterior)
#' library(bayesplot)
#' bayesplot::mcmc_trace(fit_me$posterior)
#' }
#'
blinreg_me <- function(Y,
X_mean,
X_prec,
Z,
Expand Down Expand Up @@ -160,54 +219,3 @@ blm_me <- function(Y,
if(saveX) out$X_save = X_save
out
}


























# data(mtcars)
#
# fitme = linreg_me(Y = mtcars$mpg,
# X_mean = mtcars$wt,
# X_prec = diag(rep(10, 32)),
# Z = cbind(mtcars$am, mtcars$vs), var_beta = 10000, nsave = 1000, nthin = 10, saveX = T)
# fitme_conj = linreg_me_conj(Y = mtcars$mpg,
# X_mean = mtcars$wt,
# X_prec = Matrix(diag(rep(10, 32))),
# Z = cbind(mtcars$am, mtcars$vs), var_beta = 10000, nsave = 1000, nthin = 10, saveX = T)
#
# fitme2 = linreg_me(Y = mtcars$mpg,
# X_mean = list(wt = mtcars$wt, cyl = mtcars$cyl),
# X_prec = list(wt = diag(rep(10, 32)), cyl = diag(rep(10, 32))),
# Z = cbind(mtcars$am, mtcars$vs), var_beta = 10000, nsave = 1000, nthin = 10, saveX = T)

#
#
# summary(fit$posterior)
#
# summary(fitme$posterior)
#
# summary(fitme2$posterior)
# str(fitme2$X_save)
# plot(fitme2$posterior)
14 changes: 8 additions & 6 deletions R/bspme-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,15 @@ NULL

#' Dataset, simulated health data
#'
#' Simulated health data based on ozone exposures on 06/18/1987.
#' Simulated health data based on ozone exposures on 06/18/1987. For details, see \code{health_sim.R}.
#'
#' @format A data frame with 3000 rows and 4 variables:
#' @format A data frame with n = 3000 rows and 4 variables:
#' \describe{
#' \item{y}{numeric, simulated health outcome}
#' \item{coords_y_lon}{numeric, simulated health subject longitude}
#' \item{coords_y_lat}{numeric, simulated health subject latitude,}
#' \item{covariate}{numeric, covariate}
#' \item{Y}{n by 1 matrix, numeric, simulated continuous health outcome}
#' \item{Ybinary}{n by 1 matrix, numeric, simulated binary health outcome}
#' \item{coords_y_lon}{n by 1 matrix, numeric, simulated health subject longitude}
#' \item{coords_y_lat}{n by 1 matrix, numeric, simulated health subject latitude}
#' \item{Z}{n by 1 matrix, numeric, covariate}
#' \item{X_true}{n by 1 matrix, numeric, true ozone exposure used for simulation}
#' }
"health_sim"
5 changes: 3 additions & 2 deletions R/vecchia_cov.R
Original file line number Diff line number Diff line change
Expand Up @@ -82,9 +82,10 @@ vecchia_cov = function(Sigma, coords, n.neighbors, ord = NULL, KLdiv = FALSE){
A[i+1, Nlist[[i+1]]]= temp
D[i+1]= Sigma[i+1, i+1] - sum(Sigma[i+1, Nlist[[i+1]]]*temp)
}
D = Diagonal(n, x = D)
Q_reordered = Matrix::t((Matrix::Diagonal(n) - A))%*%solve(D)%*%(Matrix::Diagonal(n) - A)
D = Matrix::Diagonal(n, x = D)
Q_reordered = Matrix::t((Matrix::Diagonal(n) - A))%*%Matrix::solve(D)%*%(Matrix::Diagonal(n) - A)
Q = Q_reordered[order(ord),order(ord)]
Q = as(Q, "dgCMatrix")
end.time = Sys.time()

out = list()
Expand Down
33 changes: 23 additions & 10 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -22,10 +22,19 @@ knitr::opts_chunk$set(
<!-- badges: start -->
<!-- badges: end -->

**bspme** is an R package that provides a set of functions for **B**ayesian **sp**atially correlated **m**easurement **e**rror models, the Bayesian linear models with the presence of (spatailly) correlated measurement error of covariate(s). For illustration, consider a normal linear regression model with intercept, one covariate $X$ with measurement error, and one covariate $Z$ without measurement error:
\[
Y_i = \beta_0 + \beta_x X_i + \beta_z Z_i + \epsilon, \quad \epsilon \stackrel{iid}{\sim} N(0, \sigma^2_Y), \quad i=1,\dots,n,
\]
**bspme** is an R package that provides a set of functions for **B**ayesian **sp**atially correlated **m**easurement **e**rror models, the Bayesian linear models with the presence of (spatailly) correlated measurement error of covariate(s). For more details, please see the following paper:


> A scalable two-stage Bayesian approach accounting for exposure measurement error in
environmental epidemiology.
> Lee, C. J., Symanski, E., Rammah, A., Kang, D.H., Hopke, P., Park, E.S. (2023+). preprint (url here).

For illustration, consider a normal linear regression model with intercept, one covariate $X$ with measurement error, and one covariate $Z$ without measurement error:

$$
Y_i = \beta_0 + \beta_x X_i + \beta_z Z_i + \epsilon_i, \quad \epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2_Y), \quad i=1,\dots,n,
$$

For example in the context of environmental epidemiology, covariate $X_i$ can be an exposure to air pollution of subject $i$ at different locations, $Z_i$ can be demographic information, and $Y_i$ can be associated health outcome. Since exposure $X_i$ are not directly measured at health subject locations but are predicted from air pollution monitoring station locations, this induces spatially correlated measurement error in $X_i$. Also, the uncertainty information should be taken account, which depends on the proximity of the monitoring station to the subject location. One way to incorporate this information is to use a multivariate normal prior on the covariate $(X_1,\dots,X_n)$ (MVN prior approach) adopted by **bspme** package, the **B**ayesian **sp**atially correlated **m**easurement **e**rror models.

Expand All @@ -45,15 +54,19 @@ devtools::install_github("changwoo-lee/bspme")

| Function | Description |
| ---------------------- | -------------------------------------------------------------------------|
| `blm_me()` | #' Bayesian normal linear regression models with (spatially) correlated measurement errors |
| `bglm_me()` | #' Bayesian generalized linear regression models with (spatially) correlated measurement errors |
| `vecchia_cov()` | Vecchia approximation given a covariance matrix |
| `blinreg_me()` | Bayesian normal linear regression models with (spatially) correlated measurement errors |
| `blogireg_me()` | Bayesian generalized linear regression models with (spatially) correlated measurement errors |
| `vecchia_cov()` | Vecchia approximation given a covariance matrix |



## datasets

| Data call | Description |
| Dataset call | Description |
| ---------------------- | -------------------------------------------------------------------------|
| `data("ozone")` | ozone exposure data |
| `data("health_sim")` | Simulated health data |
| `data("ozone")` | 1987 midwest ozone exposure data |
| `data("health_sim")` | Simulated health data |

## Examples

Please see the vignette "Ozone exposure and simulated health data".
40 changes: 27 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,20 @@
**bspme** is an R package that provides a set of functions for
**B**ayesian **sp**atially correlated **m**easurement **e**rror models,
the Bayesian linear models with the presence of (spatailly) correlated
measurement error of covariate(s). For illustration, consider a normal
linear regression model with intercept, one covariate $X$ with
measurement error, and one covariate $Z$ without measurement error: $$
Y_i = \beta_0 + \beta_x X_i + \beta_z Z_i + \epsilon, \quad \epsilon \stackrel{iid}{\sim} N(0, \sigma^2_Y), \quad i=1,\dots,n,
measurement error of covariate(s). For more details, please see the
following paper:

> A scalable two-stage Bayesian approach accounting for exposure
> measurement error in environmental epidemiology. Lee, C. J., Symanski,
> E., Rammah, A., Kang, D.H., Hopke, P., Park, E.S. (2023+). preprint
> (url here).
For illustration, consider a normal linear regression model with
intercept, one covariate $X$ with measurement error, and one covariate
$Z$ without measurement error:

$$
Y_i = \beta_0 + \beta_x X_i + \beta_z Z_i + \epsilon_i, \quad \epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2_Y), \quad i=1,\dots,n,
$$

For example in the context of environmental epidemiology, covariate
Expand Down Expand Up @@ -51,15 +61,19 @@ devtools::install_github("changwoo-lee/bspme")

## Functionality

| Function | Description |
|-----------------|--------------------------------------------------------------------------------------------------|
| `blm_me()` | \#’ Bayesian normal linear regression models with (spatially) correlated measurement errors |
| `bglm_me()` | \#’ Bayesian generalized linear regression models with (spatially) correlated measurement errors |
| `vecchia_cov()` | Vecchia approximation given a covariance matrix |
| Function | Description |
|-----------------|----------------------------------------------------------------------------------------------|
| `blinreg_me()` | Bayesian normal linear regression models with (spatially) correlated measurement errors |
| `blogireg_me()` | Bayesian generalized linear regression models with (spatially) correlated measurement errors |
| `vecchia_cov()` | Vecchia approximation given a covariance matrix |

## datasets

| Data call | Description |
|----------------------|-----------------------|
| `data("ozone")` | ozone exposure data |
| `data("health_sim")` | Simulated health data |
| Dataset call | Description |
|----------------------|----------------------------------|
| `data("ozone")` | 1987 midwest ozone exposure data |
| `data("health_sim")` | Simulated health data |

## Examples

Please see the vignette “Ozone exposure and simulated health data”.
Loading

0 comments on commit d55b04d

Please sign in to comment.