Code to reproduce figures, simulation results, and real data analysis results from the paper
Lee, C. J., Zito, A., Sang, H., & Dunson, D. B. (2024). Logistic-beta processes for dependent random probabilities with beta marginals. arXiv preprint arXiv:2402.07048. https://arxiv.org/abs/2402.07048
- Figure 1 (density of univariate logistic-beta and Polya):
figure/figure_1.R
- Figure 2 (density of bivariate logistic-beta):
figure/figure_2.R
- Figure 3 (LBP prior example and latent LBP):
figure/figure_3.R
- Figure 4 (correlation lower bounds of bivariate betas and LB-DDP):
figure/figure_4_and_scatter.R
- Figure 5 (preterm birth probability plot for smoking group):
real_analysis/DDEexposure_smoking.R
- Figure B.1 (correlation lower bounds of LB-DDP and bivariate
scatterplots):
figure/figure_4_and_scatter.R
- Figure D.1 (effect of feature map normalization):
figure/figure_basis.R
- Figure D.2 (preterm birth probability plot for nonsmoking group):
real_analysis/DDEexposure_nonsmoking.R
Scenario 1, Scenario 2, Scenario 3 correspond to rho = 0.1, 0.2, 0.4, respectively, for data generation settings with different Matern range parameters.
- See
simul/execute_simul_copula*.R
for running all simulations with data generation based on Gaussian copula. - See
simul/execute_simul_lbp*.R
for running all simulations data generation based on latent LBP. simul/simul_data_copula.R
andsimul/simul_data_lbp.R
are scripts to generate data under Gaussian copula and latent LBP models, respectively.run_copula.R
andrun_lbp.R
are scripts to run a single simulation for Gaussian copula and latent LBP models, respectively.- Stan code and associated prediction code for the Gaussian copula model can be found in the folder
simul/copula_codes.
- Smoking group:
real_analysis/DDEexposure_smoking.R
- Nonsmoking group:
real_analysis/DDEexposure_nonsmoking.R
R/lb.R
: basic functions for logistic-beta
- Random variate generation: univariate
rulb()
, multivariatermlb()
. - Density evaluation: univariate
dulb()
, multivariatedmlb()
(warning: unstable!). - Others:
mlb_cov()
,pulb()
R/polya.R
: basic functions for Polya distribution
- Random variate generation:
rpolya()
(requiresdqrng
package for fast random number generation) - Density evaluation:
dpolya()
(warning: unstable!) - Adaptive Polya proposal:
find_polya_proposal()
to find a’, b’ with moment matching criterion - Others: mean
epolya()
, variancevpolya()
.
- Functions for latent LBP model for binary data
R/LBBer_Matern.R
implements Algorithm 1 (blocked Gibbs sampler) with Matern correlation kernel.- Can handle both 1 and 2-dimensional covariates
- Fixed smoothness parameter. Discrete uniform prior is assumed on the Matern range parameter.
- If
adapt = TRUE
, it utilizes the adaptive Polya proposal (seefind_polya_proposal()
inR/polya.R
) - If
nparticle = 0
, it runs independent M-H for sampling the Polya mixing parameter. Otherwise, it runs particle Gibbs. - For the unblocked version, see
R/LBBer_Matern_unblocked.R
R/LBBer_Matern_predict_LB.R
is a function to get posterior predictive samples of success probabilities.
R/LBDDP_basis.R
andR/LBDDP_postprocess.R
: functions for LB-DDP mixture model
R/LBDDP_basis.R
implements Algorithm 2 (blocked Gibbs sampler) with normalized basis.R/LBDDP_postprocess.R
is a function to get posterior predictive samples of conditional density and cumulative probability.
-
R/LSBP_basis.R
andR/LSBP_postprocess.R
: functions for logit stick-breaking mixture model, implemented in R. -
R/dmvn_lowrankstr.R
: functions for MVN density evaluation with low-rank structure
# reproduce figure 3, latent LBP model with binary data
rm(list = ls())
source("R/polya.R")
source("R/lb.R")
library(ggplot2) # plotting
library(dqrng)
library(fields)
library(mvnfast)
library(spam)
set.seed(1)
x = c(seq(0,0.99,length = 295), seq(1,2,length = 10), seq(2.01,3,length = 295))
n = length(x)
ngrid= 200 # for prediction
xgrid = seq(0.01, 2.99, length = ngrid) # for prediction
p_true = 1/(1+exp(-cos(x*pi)))
z = rbinom(n, 1, prob = p_true) # binary data
pgrid_true = 1/(1+exp(-cos(xgrid*pi)))
# prior draw with Matern kernel
set.seed(1234)
dqrng::dqset.seed(1234) # rpolya uses dqrng package
R22 = fields::Matern(fields::rdist(xgrid), range = 0.3, smoothness = 1.5)
eta = rmlb(8, a= 2, b = 4, R22) # eight LBP samples
dfprior = data.frame(xgrid = xgrid, pi_draws = t(1/(1+exp(-eta)))) # transform to [0,1]
g1 = ggplot(dfprior) +
geom_line(aes(x=xgrid, y = pi_draws.1), col = 1) +
geom_line(aes(x=xgrid, y = pi_draws.2), col = 2) +
geom_line(aes(x=xgrid, y = pi_draws.3), col = 3) +
geom_line(aes(x=xgrid, y = pi_draws.4), col = 4) +
geom_line(aes(x=xgrid, y = pi_draws.5), col = 5) +
geom_line(aes(x=xgrid, y = pi_draws.6), col = 6) +
geom_line(aes(x=xgrid, y = pi_draws.7), col = 7) +
geom_line(aes(x=xgrid, y = pi_draws.8), col = 8) +
ylim(c(0,1)) + ylab("") + xlab("") + theme_bw()
g1
# posterior
# take some time, full-rank Matern correlation kernel
source("R/LBBer_Matern.R")
source("R/LBBer_Matern_predict.R")
set.seed(1)
fit = LBBer_Matern(z, x, a = 2, b = 4,
range_grid = 0.3, # discrete uniform prior on range_grid. Fixed if only one value is provided
smoothness = 1.5,
nburn = 1000, nsave = 1000, nthin = 1)
# prediction of success probabilities
fit_pred = LBBer_Matern_predict(fit, xgrid)
df = data.frame(xgrid = xgrid,
pgrid_true = pgrid_true,
pi_est = apply(fit_pred, 2, mean),
pi_lb = apply(fit_pred, 2, function(x) quantile(x, 0.025)),
pi_ub = apply(fit_pred, 2, function(x) quantile(x, 0.975)),
x = x,
z = z)
g2 = ggplot(df) +
geom_point(aes(x = x, y = z), alpha = 1, shape= 108) +
geom_ribbon(aes(x = xgrid, ymin = pi_lb, ymax = pi_ub), fill = "grey85") +
geom_line(aes(x=xgrid, y = pi_est)) +
geom_line(aes(x=xgrid, y = pgrid_true), color = "blue", linetype = "dashed")+
ylab("") + xlab("") +
theme_bw()
g2