Skip to content

Commit

Permalink
manual update
Browse files Browse the repository at this point in the history
  • Loading branch information
changwoo-lee committed Jan 12, 2024
1 parent 8319c1e commit 6043eab
Show file tree
Hide file tree
Showing 14 changed files with 162 additions and 153 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@ Version: 1.0.0
Authors@R: c(person("Changwoo", "Lee", role=c("aut", "cre"), email="[email protected]"), person("Eun Sug", "Park", role = c("aut")))
Author: Changwoo Lee[aut, cre], Eun Sug Park[aut]
Maintainer: Changwoo Lee <[email protected]>
Description: Scalable methods for fitting Bayesian linear and generalized linear models in the presence of spatial exposure measurement error, represented as a multivariate normal prior distribution.
These models typically arises from a two-stage Bayesian analysis of environmental exposures and health outcomes.
From a first-stage model, predictions of the covariate of interest (exposure) and their uncertainty information (typically contained in MCMC samples) are used to form a multivariate normal prior distribution for exposure in a second-stage regression model.
Description: Scalable methods for fitting Bayesian linear and generalized linear models in the presence of spatial exposure measurement error.
These models typically arise from a two-stage Bayesian analysis of environmental exposures and health outcomes.
From a first-stage model, predictions of the covariate of interest ("exposure") and their uncertainty information (typically contained in MCMC samples) are used to form a multivariate normal prior distribution for exposure in a second-stage regression model.
The package provides implementation of the methods used in Lee et al. (2024) <https://arxiv.org/abs/2401.00634>.
License: GPL (>= 3)
Encoding: UTF-8
Expand Down
2 changes: 0 additions & 2 deletions LICENSE

This file was deleted.

4 changes: 2 additions & 2 deletions R/bglm_me.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#' Bayesian generalized linear models with spatial exposure measurement error.
#'
#' This function fits a Bayesian generalized linear model in the presence of spatial exposure measurement error for covariate(s) \eqn{X}, represented as a multivariate normal prior.
#' This function fits a Bayesian generalized linear model in the presence of spatial exposure measurement error for covariate(s) \eqn{X}.
#' One of the most important features of this function is that it allows a sparse matrix input for the prior precision matrix of \eqn{X} for scalable computation.
#' As of version 1.0.0, only Bayesian logistic regression model is supported among GLMs, and
#' function \code{bglm_me()} runs a Gibbs sampler to carry out posterior inference using Polya-Gamma augmentation (Polson et al., 2013).
Expand Down Expand Up @@ -109,7 +109,7 @@
#' family = binomial(link = "logit"),
#' nburn = 5000,
#' nsave = 5000,
#' nthin = 5)
#' nthin = 1)
#' fit$cputime
#' summary(fit$posterior)
#' library(bayesplot)
Expand Down
71 changes: 36 additions & 35 deletions R/blm_me.R
Original file line number Diff line number Diff line change
@@ -1,30 +1,31 @@
#' Bayesian linear regression models with spatial exposure measurement error.
#'
#' This function fits Bayesian linear regression model in the presence of spatial exposure measurement error of covariate(s) \eqn{X}, represented as a multivariate normal prior.
#' One of the most important features of this function is that it allows sparse matrix input for the prior precision matrix of \eqn{X} for scalable computation.
#' Function \code{blm_me()} runs a Gibbs sampler to carry out posterior inference; see below "Details" section for the model description, and Lee et al. (2024) for an application example in environmental epidemiology.
#' This function fits a Bayesian linear regression model in the presence of spatial exposure measurement error for covariate(s) \eqn{X}.
#' One of the most important features of this function is that it allows a sparse matrix input for the prior precision matrix of \eqn{X} for scalable computation.
#' Function \code{blm_me()} runs a Gibbs sampler to carry out posterior inference; see below "Details" section for the model description, and Lee et al. (2024) for an application example in environmental epidemiology.
#'
#' Denote \eqn{Y_i} be a binary response, \eqn{X_i} be a \eqn{q\times 1} covariate that is subject to spatial exposure measurement error,
#' and \eqn{Z_i} be a \eqn{p\times 1} covariate without measurement error.
#' Consider normal linear regression model,
#' Denote \eqn{Y_i} be a continuous response, \eqn{X_i} be a \eqn{q\times 1} covariate vector that is subject to spatial exposure measurement error,
#' and \eqn{Z_i} be a \eqn{p\times 1} covariate vector without measurement error.
#' Consider a normal linear regression model,
#' \deqn{Y_i = \beta_0 + X_i^\top \beta_X + Z_i^\top \beta_Z + \epsilon_i,\quad \epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2_Y), \quad i=1,\dots,n,}
#' and spatial exposure measurement error of \eqn{X_i} for \eqn{i=1,\dots,n} is incorporated into the model as a multivariate normal prior.
#' For example when \eqn{q=1}, we have \eqn{n-}dimensional multivariate normal prior of \eqn{X = (X_1,\dots,X_n)^\top},
#' where spatial exposure measurement error of \eqn{X_i} for \eqn{i=1,\dots,n} is incorporated into the model using a multivariate normal prior.
#' For example when \eqn{q=1}, we have an \eqn{n-}dimensional multivariate normal prior of \eqn{X = (X_1,\dots,X_n)^\top},
#' \deqn{(X_1,\dots,X_n)\sim N_n(\mu_X, Q_X^{-1}).}
#' Most importantly, it allows sparse matrix input for the prior precision matrix \eqn{Q_X} for scalable computation, which could be obtained by Vecchia approximation.
#' Most importantly, it allows a sparse matrix input for the prior precision matrix \eqn{Q_X} for scalable computation, which could be obtained by Vecchia approximation.
#' When \eqn{q>1}, \eqn{q} independent \eqn{n-}dimensional multivariate normal priors are assumed.
#'
#' We consider semiconjugate priors for regression coefficients and error variance,
#' \deqn{\beta_0 \sim N(0, V_\beta), \quad \beta_{X,j} \stackrel{iid}{\sim} N(0, V_\beta), \quad \beta_{Z,k} \stackrel{iid}{\sim} N(0, V_\beta), \quad \sigma_Y^2 \sim IG(a_Y, b_Y).}
#' where \code{var_beta} corresponds to \eqn{V_\beta} and \code{a_Y}, \code{b_Y} corresponds to hyperparameters for \eqn{\sigma^2_Y}.
#' where \code{var_beta} corresponds to \eqn{V_\beta}, and \code{a_Y} and \code{b_Y} corresponds to hyperparameters of an inverse gamma prior for \eqn{\sigma^2_Y}.
#'
#' @param Y *vector&lt;int&gt;*, n by 1 continuous response vector.
#' @param X_mean *matrix&lt;num&gt;*, n by 1 prior mean matrix \eqn{\mu_X}. When there are q multiple exposures subject to measurement error, it can be a length q list of n by 1 matrices.
#' @param X_prec *matrix&lt;num&gt;*, n by 1 prior precision matrix \eqn{Q_X}, which allows sparse format from \link{Matrix} package. When there are q multiple exposures subject to measurement error, it can be a length q list of n by n matrices.
#' @param Z *matrix&lt;num&gt;*, n by p covariates that are not subject to measurement error.
#' @param nburn *integer*, burn-in iteration, default 1000.
#' @param nsave *integer*, number of posterior samples, default 1000. Total number of MCMC iteration is \code{nburn + nsave * nthin}.
#' @param nthin *integer*, thin-in rate, default 1.
#' @param prior *list*, list of prior parameters of regression model. Default is \code{list(var_beta = 100, a_Y = 0.01, b_Y = 0.01)}.
#' @param X_mean *vector&lt;num&gt;*, n by 1 prior mean vector \eqn{\mu_X}. When there are q multiple exposures subject to measurement error, it can be a length q list of n by 1 vectors.
#' @param X_prec *matrix&lt;num&gt;*, n by n prior precision matrix \eqn{Q_X}, which allows sparse format from \link{Matrix} package. When there are q multiple exposures subject to measurement error, it can be a length q list of n by n matrices.
#' @param Z *matrix&lt;num&gt;*, n by p matrix containing covariates that are not subject to measurement error.
#' @param nburn *integer*, number of burn-in iteration (default 5000).
#' @param nsave *integer*, number of posterior samples (default 5000). Total number of MCMC iteration is \code{nburn + nsave * nthin}.
#' @param nthin *integer*, thin-in rate (default 1).
#' @param prior *list*, list of prior parameters of the regression model. Default is \code{list(var_beta = 100, a_Y = 0.01, b_Y = 0.01)}.
#' @param saveX *logical*, default FALSE, whether save posterior samples of X (exposure).
#'
#' @return list of the following:
Expand All @@ -46,9 +47,9 @@
#' data(health_sim)
#' library(fields)
#' library(maps)
#' # Obtain the predicted exposure mean and covariance at health subject locations based on Jan 10, 2012 data
#' # using Gaussian process prior with mean zero and exponential covariance kernel
#' # with fixed range 5 (in miles) and stdev 0.5 (in ppbv)
#' # Obtain the predicted exposure mean and covariance at simulated health subject locations based on Jan 10, 2012 NO2 data
#' # using a Gaussian process prior with mean zero and exponential covariance kernel
#' # with fixed range 8 (in km) and standard deviation 1.
#'
#' # exposure data
#' data_jan10 = NO2_Jan2012[NO2_Jan2012$date == as.POSIXct("2012-01-10"),]
Expand All @@ -57,36 +58,36 @@
#' # health data
#' coords_health = cbind(health_sim$lon, health_sim$lat)
#'
#' distmat_xx <- rdist.earth(coords_monitor)
#' distmat_xy <- rdist.earth(coords_monitor, coords_health)
#' distmat_yy <- rdist.earth(coords_health)
#' distmat_xx <- rdist.earth(coords_monitor, miles = F)
#' distmat_xy <- rdist.earth(coords_monitor, coords_health, miles = F)
#' distmat_yy <- rdist.earth(coords_health, miles = F)
#'
#' a = 5; sigma = 1; # assume known
#' a = 8; sigma = 1; # assume known
#'
#' Sigmaxx = fields::Matern(distmat_xx, smoothness = 0.5, range = a, phi = sigma^2)
#' Sigmaxy = fields::Matern(distmat_xy, smoothness = 0.5, range = a, phi = sigma^2)
#' Sigmayy = fields::Matern(distmat_yy, smoothness = 0.5, range = a, phi = sigma^2)
#'
#' # posterior predictive mean and covariance of exposure at health data
#' # posterior predictive mean and covariance of exposure at health subject locations
#' X_mean <- t(Sigmaxy) %*% solve(Sigmaxx, data_jan10$lnNO2)
#' X_cov <- Sigmayy - t(Sigmaxy) %*% solve(Sigmaxx,Sigmaxy) # n_y by n_y
#'
#' # visualize
#' # monitoring station exposure data
#' quilt.plot(cbind(data_jan10$lon, data_jan10$lat),
#' data_jan10$lnNO2, main = "NO2 exposures (in log) at 21 stations",
#' data_jan10$lnNO2, main = "NO2 exposures (in log) at 21 monitoring stations",
#' xlab = "longitude", ylab= "latitude", xlim = c(-96.5, -94.5), ylim = c(29, 30.5))
#' maps::map("county", "Texas", add = T)
#'
#' # posterior predictive mean of exposure at health data
#' # posterior predictive mean of exposure at health subject locations
#' quilt.plot(cbind(health_sim$lon, health_sim$lat),
#' X_mean, main = "posterior predictive mean of exposure at health data locations",
#' X_mean, main = "posterior predictive mean of exposure at health subject locations",
#' xlab = "longitude", ylab= "latitude", xlim = c(-96.5, -94.5), ylim = c(29, 30.5))
#' maps::map("county", "Texas", add = T)
#'
#' # posterior predictive sd of exposure at health data
#' # posterior predictive sd of exposure at health subject locations
#' quilt.plot(cbind(health_sim$lon, health_sim$lat),
#' sqrt(diag(X_cov)), main = "posterior predictive sd of exposure at health data locations",
#' sqrt(diag(X_cov)), main = "posterior predictive sd of exposure at health subject locations",
#' xlab = "longitude", ylab= "latitude", xlim = c(-96.5, -94.5), ylim = c(29, 30.5))
#' maps::map("county", "Texas", add = T)
#'
Expand All @@ -102,9 +103,9 @@
#' X_mean = X_mean,
#' X_prec = Q_sparse, # sparse precision matrix
#' Z = health_sim$Z,
#' nburn = 1000,
#' nsave = 1000,
#' nthin = 5)
#' nburn = 5000,
#' nsave = 5000,
#' nthin = 1)
#' fit$cputime
#' summary(fit$posterior)
#' library(bayesplot)
Expand All @@ -115,8 +116,8 @@ blm_me <- function(Y,
X_mean,
X_prec,
Z,
nburn = 1000,
nsave = 1000,
nburn = 5000,
nsave = 5000,
nthin = 1,
prior = NULL,
saveX = FALSE){
Expand Down
34 changes: 19 additions & 15 deletions R/bspme-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,20 +7,24 @@ NULL


#' @name NO2_Jan2012
#' @title Daily average NO2 (nitrogen dioxide) measurements at Houston area
#' @title Daily average NO2 concentrations in the Harris County, Texas, in Jan 2012
#'
#' @description
#' This dataset is
#' This dataset contains daily average NO2 (nitrogen dioxide) concentrations measured at 21 monitoring sites around the Harris County, Texas, in January 2012.
#' The raw data were obtained from the Texas Commission on Environmental Quality (TCEQ); see, e.g., <https://www.tceq.texas.gov/cgi-bin/compliance/monops/daily_summary.pl>.
#' From the hourly average NO2 raw data, the daily average NO2 concentrations are calculated by taking the average of 24 hourly averages.
#' If there are any missing data, the daily average NO2 concentrations are imputed from the average of 3-nearest neighboring monitoring station that are non-missing. The imputation is done separately for each day.
#' Finally, the natural logarithm of daily average NO2 concentrations are calculated.
#'
#' @usage data(NO2_Jan2012)
#'
#' @format A data frame with 5963 rows and 6 variables:
#' @format A data frame with 651 (21 sites x 31 days) rows and 5 variables:
#' \describe{
#' \item{date}{POSIXct, date}
#' \item{site_name}{character, monitoring station name}
#' \item{lon}{numeric, longitude of monitoring station}
#' \item{lat}{numeric, latitude of monitoring station}
#' \item{lnNO2}{numeric, natural logarithm of daily average NO2 concentrations, in parts per billion by volume (ppbv)}
#' \item{date}{date in POSIXct format}
#' \item{site_name}{monitoring station name}
#' \item{lon}{monitoring station longitude}
#' \item{lat}{monitoring station latitude}
#' \item{lnNO2}{natural logarithm of daily average NO2 concentrations, in parts per billion by volume (ppbv)}
#' }
NULL

Expand All @@ -33,13 +37,13 @@ NULL
#'
#' @usage data(health_sim)
#'
#' @format A data frame with n = 3000 rows and 4 variables:
#' @format A data frame with n = 2000 rows and 6 variables:
#' \describe{
#' \item{Y}{n by 1 matrix, numeric, simulated continuous health outcome}
#' \item{Ybinary}{n by 1 matrix, numeric, simulated binary health outcome}
#' \item{lon}{n by 1 matrix, numeric, simulated health subject longitude}
#' \item{lat}{n by 1 matrix, numeric, simulated health subject latitude}
#' \item{Z}{n by 1 matrix, numeric, covariate}
#' \item{X_true}{n by 1 matrix, numeric, true ozone exposure used for simulation}
#' \item{Y}{simulated continuous health outcome}
#' \item{Ybinary}{simulated binary health outcome}
#' \item{lon}{simulated health subject longitude}
#' \item{lat}{simulated health subject latitude}
#' \item{Z}{simulated covariate (p=1) that is not subject to measurement error}
#' \item{X_true}{true ln(NO2) exposure used for simulating health outcome}
#' }
NULL
4 changes: 2 additions & 2 deletions R/vecchia_cov.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
#'
#' Given a multivariate normal (MVN) distribution with covariance matrix \eqn{\Sigma},
#' this function finds a sparse precision matrix (inverse covariance) \eqn{Q} based on the Vecchia approximation (Vecchia 1988, Katzfuss and Guinness 2021),
#' where \eqn{N(\mu, Q^{-1})} is the sparse MVN that approximates original MVN \eqn{N(\mu, \Sigma)}.
#' where \eqn{N(\mu, Q^{-1})} is the sparse MVN that approximates the original MVN \eqn{N(\mu, \Sigma)}.
#' The algorithm is based on the pseudocode 2 of Finley et al. (2019).
#'
#' @param Sigma *matrix&lt;num&gt;*, n by n covariance matrix
Expand All @@ -16,7 +16,7 @@
#' \item{Q}{The n by n sparse precision matrix in \link{Matrix} format}
#' \item{ord}{The ordering used for Vecchia approximation}
#' \item{cputime}{time taken to run Vecchia approximation}
#' \item{KLdiv}{ (if \code{KLdiv = TRUE}) KL divergence \eqn{D_{KL}(p || \tilde{p})} where \eqn{p} is multivariate normal with original covariance matrix and \eqn{\tilde{p}} is the approximated multivariate normal with sparse precision matrix.}
#' \item{KLdiv}{ (if \code{KLdiv = TRUE}) KL divergence \eqn{D_{KL}(p || \tilde{p})} where \eqn{p} is the multivariate normal with original covariance matrix and \eqn{\tilde{p}} is the approximated multivariate normal with a sparse precision matrix.}
#' }
#'
#' @references Vecchia, A. V. (1988). Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society Series B: Statistical Methodology, 50(2), 297-312.
Expand Down
4 changes: 2 additions & 2 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -74,8 +74,8 @@ To see function description in R environment, run the following lines:

| Dataset call | Description |
| ---------------------- | -------------------------------------------------------------------------|
| `data("NO2_Jan2012")` | January 2012 daily average NO2 exposure data at Houston, Texas |
| `data("health_sim")` | Simulated health data corresponding to ozone exposure data |
| `data("NO2_Jan2012")` | January 2012 daily average NO2 exposure data at Houston, Texas |
| `data("health_sim")` | Simulated health data associated with log(NO2) concentration on Jan 10, 2012 |

## Examples

Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,10 +76,10 @@ To see function description in R environment, run the following lines:

## datasets

| Dataset call | Description |
|-----------------------|----------------------------------------------------------------|
| `data("NO2_Jan2012")` | January 2012 daily average NO2 exposure data at Houston, Texas |
| `data("health_sim")` | Simulated health data corresponding to ozone exposure data |
| Dataset call | Description |
|-----------------------|------------------------------------------------------------------------------|
| `data("NO2_Jan2012")` | January 2012 daily average NO2 exposure data at Houston, Texas |
| `data("health_sim")` | Simulated health data associated with log(NO2) concentration on Jan 10, 2012 |

## Examples

Expand Down
20 changes: 12 additions & 8 deletions man/NO2_Jan2012.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 6043eab

Please sign in to comment.