From e1183eda7f4d753967edaa290c9b09d86b0067de Mon Sep 17 00:00:00 2001 From: Anka Date: Mon, 2 Sep 2024 08:00:35 +0000 Subject: [PATCH] Built site for abess@0.4.8: d43ab4e --- 404.html | 18 +- LICENSE-text.html | 20 +- articles/index.html | 18 +- articles/v01-abess-guide.html | 62 +-- articles/v03-classification.html | 136 +++--- articles/v04-PoissonGammaReg.html | 72 ++-- articles/v05-coxreg.html | 80 ++-- articles/v06-MultiTaskLearning.html | 74 ++-- articles/v07-advancedFeatures.html | 91 ++-- articles/v08-sPCA.html | 187 ++++---- articles/v09-fasterSetting.html | 140 +++--- articles/v10-algorithm.html | 405 ++++++++++-------- articles/v11-power-of-abess.html | 68 +-- ...2-Robust-Principal-Component-Analysis.html | 142 +++--- authors.html | 46 +- index.html | 18 +- news/index.html | 18 +- pkgdown.yml | 5 +- reference/_PACKAGE.html | 8 + reference/abess-package.html | 18 +- reference/abess.formula.html | 8 + reference/abess.html | 102 ++--- reference/abesspca.html | 74 ++-- reference/abessrpca.html | 96 ++--- reference/coef.abess.html | 34 +- reference/coef.abesspca.html | 36 +- reference/coef.abessrpca.html | 36 +- reference/deviance.abess.html | 32 +- reference/extract.abess.html | 32 +- reference/generate.data.html | 54 +-- reference/generate.matrix.html | 42 +- reference/generate.spc.matrix.html | 42 +- reference/index.html | 20 +- reference/plot.abess.html | 34 +- reference/plot.abesspca.html | 34 +- reference/plot.abessrpca.html | 40 +- reference/predict.abess.html | 36 +- reference/print.abess.html | 32 +- reference/print.abesspca.html | 32 +- reference/print.abessrpca.html | 32 +- reference/trim32.html | 18 +- sitemap.xml | 156 ++----- 42 files changed, 1382 insertions(+), 1266 deletions(-) create mode 100644 reference/_PACKAGE.html create mode 100644 reference/abess.formula.html diff --git a/404.html b/404.html index be47648f..142803d0 100644 --- a/404.html +++ b/404.html @@ -27,7 +27,7 @@ - +
- +
@@ -128,7 +128,7 @@

License

means any form of the work other than Source Code Form. . 1.7. "Larger Work" -means a work that combines Covered Software with other material, in +means a work that combines Covered Software with other material, in a separate file or files, that is not Covered Software. . 1.8. "License" @@ -481,16 +481,16 @@

License

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -127,16 +127,16 @@

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -128,7 +126,7 @@

Quick start for abess: Linear

Jin Zhu

- Source: ../vignettes/v01-abess-guide.Rmd + Source: vignettes/v01-abess-guide.Rmd
@@ -139,13 +137,20 @@

Jin Zhu

Brief introduction

The R package abess implements a polynomial-time algorithm for solving the best-subset selection problem: \[\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \mathcal{L}_n({\boldsymbol\beta}), \text{ s.t. } \|\boldsymbol\beta\|_0 \leq s,\] where \(\mathcal{L}_{n}(\boldsymbol \beta)=\frac{1}{2n}\|y-X \boldsymbol{\beta}\|_{2}^{2}\), \(\|\boldsymbol{\beta}\|_0=\sum_{i=1}^p I(\boldsymbol{\beta}_i\neq 0)\) is the \(\ell_0\) (pseudo) norm of \(\boldsymbol\beta\), and the sparsity level \(s\) is usually an unknown non-negative integer. Next, we present an example to show how to use the abess package to solve a simple problem.
Quick example

Fixed support size best subset selection

-

We generate a design matrix \(X\) containing 300 observations, and each observation has 1000 predictors. The response variable \(y\) is linearly related to the first, second, and fifth predictors in \(X\): \[y = 3X_1 + 1.5X_2 + 2X_5 + \epsilon,\] where \(\epsilon\) is a standard normal random variable.
 library(abess)
 synthetic_data <- generate.data(n = 300, p = 1000, 
@@ -637,9 +647,7 @@ 

More features - -

+

@@ -652,17 +660,17 @@

More features

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -130,7 +128,7 @@

Jin Zhu, Liyuan

6/12/2021

- Source: ../vignettes/v03-classification.Rmd + Source: vignettes/v03-classification.Rmd
@@ -176,19 +174,33 @@

Titanic Dataset and Classification
The logistic regression function is an \(s\)-shaped curve modeling the posterior probability \(p\) via a linear combination of the features. The curve is defined as \(p = \frac{1}{1+\exp(-\eta)}\), where \(\eta = \boldsymbol\beta_0+x\boldsymbol\beta\), \(x\) are the predictors, and \(\boldsymbol\beta_0, \boldsymbol\beta\) are coefficients to be learned from data. The logistic regression model has this form: \[\log(p/(1-p)) = \boldsymbol\beta_0 + x\boldsymbol\beta.\] The quantity \(\log(p/(1-p))\) is called the logarithm of the odds, also called log-odds or logit. Best subset selection for logistic regression aims to balance model accuracy and model complexity, where the former is achieved by maximizing the log-likelihood function and the latter is characterized by the constraint \(\| \boldsymbol\beta \|_0 \leq s\), where \(s\) can be determined in a data-driven way.

Best Subset Selection for Logistic Regression
Best subset s assigned a probability between 0 and 1, with a sum of one. The extended model is multinomial logistic regression.

To arrive at the multinomial logistic model, one can imagine, for \(K\) possible classes, running \(K-1\) independent logistic regression models, in which one class is chosen as a ``pivot'' and the other \(K-1\) classes are separately regressed against the pivot outcome. This would proceed as follows, if class \(K\) (the last outcome) is chosen as the pivot: \[\ln(\mathbb{P}(y = 1) / \mathbb{P}(y = K)) = x^\top \boldsymbol\beta^{(1)},\] \[\cdots \cdots\] \[\ln(\mathbb{P}(y = K - 1) / \mathbb{P}(y = K)) = x^\top \boldsymbol\beta^{(K - 1)}.\] Then, the probability of choosing the \(j\)-th class can be easily derived to be \[\mathbb{P}(y = j) = \frac{\exp{(x^\top \boldsymbol\beta^{(j)})}}{1 + \sum_{k=1}^{K-1} \exp{(x^\top \boldsymbol\beta^{(k)})}},\] and subsequently, we would predict the \(j^{*}\)-th class if \(j^* = \arg\max_{j} \mathbb{P}(y = j)\). Notice that, for the \(K\)-class case, there are \(p \times (K - 1)\) unknown parameters, \(\boldsymbol\beta^{(1)}, \ldots, \boldsymbol\beta^{(K-1)}\), to be estimated. Because the number of parameters increases with \(K\), it is even more urgent to constrain the model complexity. Best subset selection for multinomial logistic regression aims to maximize the log-likelihood function and control the model complexity by restricting \(B = (\boldsymbol\beta^{(1)}, \ldots, \boldsymbol\beta^{(K-1)})\) with \(\| B \|_{0, 2} \leq s\), where \(\| B \|_{0, 2} = \sum_{i = 1}^{p} I(B_{i\cdot} \neq {\bf 0})\), \(B_{i\cdot}\) is the \(i\)-th row of the coefficient matrix \(B\), and \({\bf 0} \in R^{K - 1}\) is an all-zero vector. In other words, each row of \(B\) would be either all zero or all non-zero.
Multinomial logistic regression with abess Package @@ -486,9 +514,7 @@
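Before turning to the real-data illustration, here is a minimal, hedged sketch of such a fit; the toy three-class data below is our own construction (not the vignette's Titanic data), and we assume abess() accepts a factor response when family = "multinomial":

 library(abess)
 set.seed(1)
 n <- 300; p <- 20
 x <- matrix(rnorm(n * p), n, p)
 eta <- cbind(0, 2 * x[, 1], -2 * x[, 2])    # only the first two predictors matter
 prob <- exp(eta) / rowSums(exp(eta))
 y <- factor(apply(prob, 1, function(pr) sample(c("A", "B", "C"), 1, prob = pr)))
 abess_fit <- abess(x, y, family = "multinomial")
 extract(abess_fit)                          # model at the selected support size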

Multinomial logistic +

@@ -501,17 +527,17 @@

Multinomial logistic

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -130,7 +128,7 @@

Liyuan Hu, Jin

2021/11/3

- Source: ../vignettes/v04-PoissonGammaReg.Rmd + Source: vignettes/v04-PoissonGammaReg.Rmd
@@ -145,18 +143,20 @@

Poisson Regression\[ +

The Poisson regression model is defined as \[\log(E(y_i|x_i)) = x_i^T \beta.\] We generate some artificial data using the following logic. Consider a dataset containing information on complaint calls about 100 companies over a period of 10 years. count gives the number of complaints, and the dataset also has other variables like age, sex, job, education, region, marriage. The generate.data() function allows you to generate simulated data. By specifying support.size = 3, we set only 3 of the above-mentioned variables to have an effect on the expectation of the response count.
 library(abess)
 dat <- generate.data(n = 100, p = 6, support.size = 3,family = "poisson")
@@ -322,15 +322,21 @@ 

Gamma Regression
Gamma regression can be used when you meet a positive continuous response variable, such as payments for insurance claims or the lifetime of a redundant system. It is well known that the density of the Gamma distribution can be represented as a function of a mean parameter (\(\mu\)) and a shape parameter (\(\alpha\)); specifically, \[f(y \mid \mu, \alpha)=\frac{1}{y \Gamma(\alpha)}\left(\frac{\alpha y}{\mu}\right)^{\alpha} e^{-\alpha y / \mu} {I}_{(0, \infty)}(y),\] where \(I(\cdot)\) denotes the indicator function. In the Gamma regression model, the response variable is assumed to follow a Gamma distribution; specifically, \[y_i \sim Gamma(\mu_i, \alpha),\] where \(1/\mu_i = x_i^T\beta\).
We apply the above procedure for gamma regression simply by changing family = "poisson" to family = "gamma". This time we consider the response variables as (continuous) levels of @@ -449,9 +455,7 @@
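A hedged sketch of that switch (we assume generate.data() also supports family = "gamma", mirroring the Poisson call above):

 library(abess)
 dat <- generate.data(n = 100, p = 6, support.size = 3, family = "gamma")
 abess_fit <- abess(dat[["x"]], dat[["y"]], family = "gamma")
 extract(abess_fit)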

Gamma Regression - -

+

@@ -464,17 +468,17 @@

Gamma Regression

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -129,7 +127,7 @@

Liyuan Hu, Jin

2021/5/27

- Source: ../vignettes/v05-coxreg.Rmd + Source: vignettes/v05-coxreg.Rmd
@@ -160,38 +158,44 @@

Cox Proportional Hazards Regression
latter only focuses on modeling the survival according to one factor (a categorical predictor is best), while the former is able to take any covariates into consideration simultaneously, regardless of whether they are quantitative or categorical. The model is as follows: \[h(t) = h_0(t)\exp(\eta),\] where,

  • \(\eta = x\boldsymbol\beta\);
  • \(t\) is the survival time;
  • \(h(t)\) is the hazard function, which evaluates the risk of dying at time \(t\);
  • \(h_0(t)\) is called the baseline hazard. It describes the value of the hazard if all the predictors are zero;
  • \(\boldsymbol\beta\) measures the impact of the covariates.

Consider two cases \(i\) and \(i'\) that have different \(x\) values. Their hazard functions can be simply written as follows: \[h_i(t) = h_0(t)\exp(\eta_i) = h_0(t)\exp(x_i\boldsymbol\beta)\] and \[h_{i'}(t) = h_0(t)\exp(\eta_{i'}) = h_0(t)\exp(x_{i'}\boldsymbol\beta).\] The hazard ratio for these two cases is \[\begin{align*} \frac{h_i(t)}{h_{i'}(t)} & = \frac{h_0(t)\exp(\eta_i)}{h_0(t)\exp(\eta_{i'})} \\ & = \frac{\exp(\eta_i)}{\exp(\eta_{i'})}, \end{align*}\]

which is independent of time.
Make a Prediction - -
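Before turning to prediction, here is a hedged sketch of fitting the model just described; the simulated data is our own, and we assume generate.data() supports family = "cox" (its response is then a two-column matrix of survival time and censoring status):

 library(abess)
 dat <- generate.data(n = 200, p = 50, support.size = 3, family = "cox")
 abess_fit <- abess(dat[["x"]], dat[["y"]], family = "cox")
 extract(abess_fit)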

+ @@ -395,17 +397,17 @@

Make a Prediction

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -129,7 +127,7 @@

Jin Zhu, Liyuan

2021/5/31

- Source: ../vignettes/v06-MultiTaskLearning.Rmd + Source: vignettes/v06-MultiTaskLearning.Rmd
@@ -145,9 +143,24 @@

Brief Introduction
associated with Glioblastoma multiforme cancer. Let \(y\) be the \(m\)-dimensional response variable, \(x\) the \(p\)-dimensional predictors, and \(B \in R^{m \times p}\) the coefficient matrix; the MMLR model for the multivariate response is given by \[y = B x + \varepsilon,\] where \(\varepsilon\) is an \(m\)-dimensional random noise variable with zero mean.
Due to the Occam's razor principle or the high-dimensionality of predictors, it is meaningful to use a small number of predictors to conduct multi-task learning. For example, understanding the relationship @@ -156,14 +169,22 @@
problem under the MMLR model is formulated as \[\frac{1}{2n} \| Y - XB \|_{F}^2, \text{ s.t. } \| B \|_{0, 2} \leq s,\] where \(Y \in R^{n \times m}\) and \(X \in R^{n \times p}\) record the \(n\) observations' responses and predictors, respectively. Here \(\| B \|_{0, 2} = \sum_{i = 1}^{p} I(B_{i\cdot} \neq {\bf 0})\), where \(B_{i\cdot}\) is the \(i\)-th row of the coefficient matrix \(B\) and \({\bf 0} \in R^{m}\) is an all-zero vector.
Quick example
plot() function. The three plots correspond to \(y_1, y_2, y_3\), respectively.
@@ -280,9 +302,7 @@

Quick example - -

+ @@ -295,17 +315,17 @@

Quick example

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -129,7 +127,7 @@

Jin Zhu,

1/21/2022

- Source: ../vignettes/v07-advancedFeatures.Rmd + Source: vignettes/v07-advancedFeatures.Rmd
@@ -217,16 +215,17 @@

Carrying Out the Nuisan

Regularized Adaptive Best Subset Selection

In some cases, especially under a low signal-to-noise ratio (SNR) setting or when predictors are highly correlated, the vanilla type of \(L_0\)-constrained model may not be satisfying, and a more sophisticated trade-off between bias and variance is needed. To address this concern, the abess package provides an option for best subset selection with an \(L_2\) norm regularization, called the regularized best subset selection (BESS). The model has the following form: \[\begin{align} \min_{\boldsymbol\beta} \mathcal{L}_n({\boldsymbol\beta}) + \lambda \|\boldsymbol\beta\|_2^2. \end{align}\]
Carrying out the Regularized BESS

@@ -278,12 +277,17 @@

Carrying out the Regularized BESS

Best group subset selection

-

Group linear model is a linear model which the \(p\) predictors are separated into \(J\) pre-determined non-overlapping -groups,

-

\[ +

Group linear model is a linear model which the +pp +predictors are separated into +JJ +pre-determined non-overlapping groups,

+

y=j=1JXGj𝛃𝐆𝐣+ϵ, y = \sum_{j=1}^J X_{G_j} \boldsymbol{\beta_{G_j}}+\epsilon, -\] where \(\{G_j\}_{j=1}^J\) are -the group indices of the \(p\) + where +{Gj}j=1J\{G_j\}_{j=1}^J +are the group indices of the +pp predictors.

Best group subset selection (BGSS) aims to choose a small part of groups to achieve the best interpretability on the response variable. @@ -294,20 +298,21 @@

The BGSS can be achieved by solving: \[\begin{equation}\label{eq:constraint} \min\limits_{\beta \in \mathbb{R}^p} \mathcal{L}_n({\boldsymbol\beta}),\quad s.t.\ \|\boldsymbol{\beta}\|_{0,2}\leqslant s, \end{equation}\] where the \(\ell_{0,2}\) (pseudo) norm is defined as \(\|{{\boldsymbol\beta}}\|_{0,2} = \sum_{j=1}^J \mathrm{I} (\|\boldsymbol{\beta_{G_j}}\|_2 \neq 0)\) and the model size \(s\) is a positive integer to be determined from the data. Regardless of the NP-hardness of this problem, Zhang et al. developed a certifiably polynomial algorithm that solves it with high probability. This algorithm is integrated into the abess package, and users can handily select the best group subset by assigning a proper value to the group.index argument in the abess() function.
Quick example

@@ -393,9 +398,7 @@

Integrate SIS in abess pack +

@@ -408,17 +411,17 @@

Integrate SIS in abess pack

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -128,7 +126,7 @@

Junhao Huang, Jin Zhu

- Source: ../vignettes/v08-sPCA.Rmd + Source: vignettes/v08-sPCA.Rmd
@@ -143,22 +141,36 @@

Introduction
\[\max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\] where \(\Sigma = X^TX / (n-1)\) and \(X\) is the centered sample matrix. We also denote that \(X\) is an \(n\times p\) matrix, where each row is an observation and each column is a variable.

Then, before further analysis, we can project \(X\) onto \(v\) (thus reducing the dimension) without losing too much information.
However, consider that:

    -
  • The PC is a linear combination of all primary variables (\(Xv\)), but sometimes we may tend to use fewer variables for a clearer interpretation (and less computational complexity);
  • It has been proved that if \(p/n\) does not converge to \(0\), the classical PCA is not consistent, and this can happen in some high-dimensional data analyses.

Introduction\[ - \max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ ||v||_0\leq -s. -\]

-

where \(s\) is a non-negative -integer, which indicates how many primary variables are used in -principal component. With abessPCA, we can search for the best subset of -variables to form principal component and it retains consistency even -under \(p>>n\). And we make two -remarks:

+

maxvvΣv,s.t.vv=1,||v||0s. + \max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ ||v||_0\leq s. +

+

where +ss +is a non-negative integer, which indicates how many primary variables +are used in principal component. With abessPCA, we can search for the +best subset of variables to form principal component and it retains +consistency even under +p>>np>>n. +And we make two remarks:

    -
  • Clearly, if \(s\) is equal or -larger than the number of primary variables, this sparsity limitation is -actually useless, so the problem is equivalent to a classical PCA.
  • +
  • Clearly, if +ss +is equal or larger than the number of primary variables, this sparsity +limitation is actually useless, so the problem is equivalent to a +classical PCA.
  • With less variables, the PC must have lower explained variance. -However, this decrease is slight if we choose a good \(s\) and at this price, we can interpret the -PC much better. It is worthy.
  • +However, this decrease is slight if we choose a good +ss +and at this price, we can interpret the PC much better. It is +worthy.

In the next section, we will show how to form abessPCA in our frame.

@@ -218,7 +235,8 @@

Communities-and-crime datasetAdaptive best subset selection for PCA

Next, we turn to fit abessPCA. For fitting the model, we can give -either predictor matrix \(X\):

+either predictor matrix +XX:

## 
@@ -288,7 +306,9 @@ 

Adaptive best subset selection f

Interpreting result

After fitting abessPCA, we study the percentage of explained variance -as \(s\) increases:

+as +ss +increases:

 plot(best_pca[["support.size"]], best_pca[["pev"]], type = "l")

@@ -416,15 +436,19 @@

Group variable
In some cases, some variables may need to be considered together; that is, they should be "used" or "unused" for the PC at the same time, which we call "group information". The optimization problem becomes:

-

\[ - \max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ -\sum_{g=1}^G I(||v_g||\neq 0)\leq s. -\]

-

where we suppose there are \(G\) groups, the \(g\)-th one corresponds to \(v_g\), and \(v = [v_1^T,v_2^T,\cdots,v_G^T]^T\). Then we are interested in finding \(s\) (or fewer) important groups.

@@ -480,33 +504,41 @@

Multiple principal components
\[\Sigma' = (1-vv^{\top})\Sigma(1-vv^{\top}),\] where \(\Sigma\) is the covariance matrix and \(v\) is its (sparse) loading vector. We map it into \(\Sigma'\), which indicates the orthogonal space of \(v\), and then solve the sparse principal component for \(\Sigma'\) again. By this iterative process, we can acquire multiple principal components, and they are sorted from the largest to the smallest. In our program, there is an additional argument sparse.type to support this feature. By setting sparse.type = "kpc", best subset selection is performed on the first \(K\) principal components, where \(K\) is decided by the argument support.size.
Suppose we are interested in the first two principal components, and the support size is 50 for the first loading vector and 40 for the second loading vector. In other words, we consecutively solve two problems: \[v_1 \leftarrow \arg\max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ ||v||_0\leq 50,\] \[v_2 \leftarrow \arg\max_{v} v^{\top} \Sigma^{\prime} v,\qquad s.t.\quad v^{\top}v=1,\ ||v||_0\leq 40,\] where \(\Sigma^{\prime} = (1-v_1 v_1^\top)\Sigma(1-v_1 v_1^\top)\). The pair \((v_1, v_2)\) forms a sparse loading matrix.
The code for solving the two problems is:
+ @@ -580,17 +611,17 @@

Multiple principal components

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -128,7 +126,7 @@

Jin Zhu

6/13/2021

- Source: ../vignettes/v09-fasterSetting.Rmd + Source: vignettes/v09-fasterSetting.Rmd
@@ -167,27 +165,32 @@

Introduction

Sparse matrix

-

We sometimes meet with problems where the \(N \times p\) input matrix \(X\) is extremely sparse, i.e., many entries -in \(X\) have zero values. A notable -example comes from document classification: aiming to assign classes to -a document, making it easier to manage for publishers and news sites. -The input variables for characterizing documents are generated from a so -called ``bag-of-words’’ model. In this model, each variable is scored -for the presence of each of the words in the entire dictionary under -consideration. Since most words are absent, the input variables for each -document is mostly zero, and so the entire matrix is mostly zero. Such -sparse matrices can be efficiently stored in R with a sparse column -format via the Matrix +

We sometimes meet with problems where the \(N \times p\) input matrix \(X\) is extremely sparse, i.e., many entries in \(X\) have zero values. A notable example comes from document classification: aiming to assign classes to a document makes it easier to manage for publishers and news sites. The input variables for characterizing documents are generated from a so-called ``bag-of-words'' model. In this model, each variable is scored for the presence of each of the words in the entire dictionary under consideration. Since most words are absent, the input variables for each document are mostly zero, and so the entire matrix is mostly zero. Such sparse matrices can be efficiently stored in R with a sparse column format via the Matrix package. And the sparse matrix can be directly used by our abess package to boost the computational efficiency.

The ABESS algorithm is ideally set up to exploit such sparsity. The \(O(N)\) inner-product operations when computing the forward sacrifice can exploit the sparsity by summing over only the non-zero entries. For computing the backward sacrifice, the sparsity also facilitates solving the convex optimization under a given support set. The following example demonstrates the efficiency gain from the sparse matrix. We first generate an input matrix in which 90% of the entries are 0.
## Loading required package: Matrix
@@ -234,8 +237,8 @@

Sparse matrixt2 <- system.time(abess_fit <- abess(x, y)) rbind(t1, t2)[, 1:3]

##    user.self sys.self elapsed
-## t1     0.215    0.001   0.216
-## t2     0.169    0.007   0.176
+## t1 0.140 0.002 0.141 +## t2 0.116 0.002 0.118

From the comparison, we see that the time required by sparse matrix is visibly smaller, and thus, we suggest to assign a sparse matrix to abess when the input matrix have a lot of zero entries.

@@ -244,28 +247,45 @@

Sparse matrixGolden-section searching

The following is a typical ``model size v.s. BGIC’’ plot.

-

The \(x\)-axis is the model size, and the \(y\)-axis is the BGIC value recorded by the group splicing algorithm for the linear model. The entries of the design matrix \(X\) are i.i.d. sampled from \(\mathcal{N}(0, 1)\), and the matrix shape is \(100 \times 200\). The error terms \(\varepsilon\) are i.i.d. \(\mathcal{N}(0, \frac{1}{2})\). Take each two adjacent variables as one group, and set the true coefficients to \(\beta=(1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 0, \ldots, 0)\). The orange vertical dashed line indicates the true group subset size.

From this figure, we see that the BGIC decreases from \(T=1\) to \(T=5\), but it increases as \(T\) becomes larger than \(5\). In other words, the BGIC path of the SGSplicing algorithm is a strictly unimodal function achieving its minimum at the true group subset size \(T = 5\). Motivated by this observation, we suggest recruiting a heuristic search based on the golden-section search technique, an efficient method for finding the extremum of a unimodal function, to determine the support size that minimizes BGIC. Compared with searching the optimal support size one by one from a candidate set with \(O(s_{\max})\) complexity, golden-section search reduces the time complexity to \(O(\ln{(s_{\max})})\), giving a significant computational improvement.
The code below exhibits how to employ the golden search technique with abess package:

@@ -299,8 +319,8 @@ 

Golden-section searchingt2 <- system.time(abess_fit <- abess(y ~ ., data = dat)) rbind(t1, t2)[, 1:3]

##    user.self sys.self elapsed
-## t1     0.011    0.001   0.012
-## t2     0.041    0.001   0.041
+## t1 0.007 0 0.007 +## t2 0.022 0 0.022

Early stop @@ -389,8 +409,8 @@

Early stopt2 <- system.time(abess_fit <- abess(y ~ ., data = dat)) rbind(t1, t2)[, 1:3]

##    user.self sys.self elapsed
-## t1     0.042    0.000   0.043
-## t2     0.040    0.001   0.041
+## t1 0.022 0.001 0.022 +## t2 0.021 0.001 0.022

we can conclude that early-stopping brings fast computation and might maintain statistical guarantee.

@@ -398,9 +418,7 @@

Early stop - - + @@ -413,17 +431,17 @@

Early stop

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -129,7 +127,7 @@

Jin Zhu,

1/20/2022

- Source: ../vignettes/v10-algorithm.Rmd + Source: vignettes/v10-algorithm.Rmd
@@ -153,43 +151,63 @@

Linear model

Sacrifices

-

Consider the \(\ell_{0}\)-constrained minimization problem, \[\min _{\boldsymbol{\beta}} \mathcal{L}_{n}(\boldsymbol{\beta}), \quad \text{ s.t. }\|\boldsymbol{\beta}\|_{0} \leq \mathrm{s},\] where \(\mathcal{L}_{n}(\boldsymbol{\beta})=\frac{1}{2n}\|y-X \boldsymbol{\beta}\|_{2}^{2}\). Without loss of generality, we consider \(\|\boldsymbol{\beta}\|_{0}=\mathrm{s}\). Given any initial set \(\mathcal{A} \subset \mathcal{S}=\{1,2, \ldots, p\}\) with cardinality \(|\mathcal{A}|=s\), denote \(\mathcal{I}=\mathcal{A}^{\mathrm{c}}\) and compute \[\hat{\boldsymbol{\beta}}=\arg \min _{\boldsymbol{\beta}_{\mathcal{I}}=0} \mathcal{L}_{n}(\boldsymbol{\beta}).\] We call \(\mathcal{A}\) and \(\mathcal{I}\) the active set and the inactive set, respectively.

Given the active set \(\mathcal{A}\) and \(\hat{\boldsymbol{\beta}}\), we can define the following two types of sacrifices:
    -
  1. Backward sacrifice: For any \(j \in \mathcal{A}\), the magnitude of discarding variable \(j\) is \[\xi_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A} \backslash\{j\}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\hat{\boldsymbol\beta}_{j}\right)^{2}.\]
  2. Forward sacrifice: For any \(j \in \mathcal{I}\), the magnitude of adding variable \(j\) is \[\zeta_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+\hat{t}^{\{j\}}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{\hat{\boldsymbol d}_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2},\] where \(\hat{t}=\arg \min _{t} \mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+t^{\{j\}}\right)\) and \(\hat{\boldsymbol d}_{j}=X_{j}^{\top}(y-X \hat{\boldsymbol{\beta}}) / n\).

Intuitively, for \(j \in \mathcal{A}\) (or \(j \in \mathcal{I}\)), a large \(\xi_{j}\) (or \(\zeta_{j}\)) implies the \(j\)-th variable is potentially important.

Best-Subset Selection w

Unfortunately, it is noteworthy that these two sacrifices are incomparable because they have different sizes of support set. However, if we exchange some "irrelevant" variables in \(\mathcal{A}\) and some "important" variables in \(\mathcal{I}\), it may result in a higher-quality solution. This intuition motivates our splicing method. Specifically, given any splicing size \(k \leq s\), define \[\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\}\] to represent the \(k\) least relevant variables in \(\mathcal{A}\), and \[\mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}\] to represent the \(k\) most relevant variables in \(\mathcal{I}\). Then, we splice \(\mathcal{A}\) and \(\mathcal{I}\) by exchanging \(\mathcal{A}_{k}\) and \(\mathcal{I}_{k}\) and obtain a new active set \[\tilde{\mathcal{A}}=\left(\mathcal{A} \backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}.\] Let \(\tilde{\mathcal{I}}=\tilde{\mathcal{A}}^{c}\), \(\tilde{\boldsymbol{\beta}}=\arg \min _{\boldsymbol{\beta}_{\tilde{\mathcal{I}}}=0} \mathcal{L}_{n}(\boldsymbol{\beta})\), and let \(\tau_{s}>0\) be a threshold. If \(\tau_{s}< \mathcal{L}_{n}(\hat{\boldsymbol\beta})-\mathcal{L}_{n}(\tilde{\boldsymbol\beta})\), then \(\tilde{\mathcal{A}}\) is preferable to \(\mathcal{A}\). The active set can be updated iteratively until the loss function cannot be improved by splicing. Once the algorithm recovers the true active set, we may splice some irrelevant variables, and then the loss function may decrease slightly. The threshold \(\tau_{s}\) can reduce this unnecessary calculation. Typically, \(\tau_{s}\) is relatively small, e.g. \(\tau_{s}=0.01 s \log (p) \log (\log n) / n\).
Algorithm 1: BESS.Fix(s): Best-Subset Selection with a given support size s.
    -
  1. Input: \(X, y\), positive integers -\(k_{\max}, m_{\max}\), and a threshold -\(\tau_{s}\).
  2. -
  3. Initialize \(\mathcal{A}^{0}=\left\{j: -\sum_{i=1}^{p} \mathrm{I}\left(\left|\frac{X_{j}^{\top} -y}{\sqrt{X_{j}^{\top} X_{j}}}\right| \leq \left| \frac{X_{i}^{\top} -y}{\sqrt{X_{i}^{\top} X_{i}}}\right| \leq \mathrm{s}\right\}, -\mathcal{I}^{0}=\left(\mathcal{A}^{0}\right)^{c}\right.\), and -\(\left(\boldsymbol\beta^{0}, -d^{0}\right):\) +
  4. Input: +X,yX, y, +positive integers +kmax,mmaxk_{\max}, m_{\max}, +and a threshold +τs\tau_{s}.
  5. +
  6. Initialize +𝒜0={j:i=1pI(|XjyXjXj||XiyXiXi|s},0=(𝒜0)c\mathcal{A}^{0}=\left\{j: \sum_{i=1}^{p} \mathrm{I}\left(\left|\frac{X_{j}^{\top} y}{\sqrt{X_{j}^{\top} X_{j}}}\right| \leq \left| \frac{X_{i}^{\top} y}{\sqrt{X_{i}^{\top} X_{i}}}\right| \leq \mathrm{s}\right\}, \mathcal{I}^{0}=\left(\mathcal{A}^{0}\right)^{c}\right., +and +(𝛃0,d0):\left(\boldsymbol\beta^{0}, d^{0}\right):
-

\[\begin{align*} +

𝛃00=0,d𝒜00=0,𝛃𝒜00=(𝐗𝒜0𝐗𝒜0)1𝐗𝒜0𝐲,d00=X0(𝐲𝐗𝛃0).\begin{align*} &\boldsymbol{\beta}_{\mathcal{I}^{0}}^{0}=0,\\ &d_{\mathcal{A}^{0}}^{0}=0,\\ -&\boldsymbol{\beta}_{\mathcal{A}^{0}}^{0}=\left(\boldsymbol{X}_{\mathcal{A}^{0}}^{\top} -\boldsymbol{X}_{\mathcal{A}^{0}}\right)^{-1} -\boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{y},\\ -&d_{\mathcal{I}^{0}}^{0}=X_{\mathcal{I}^{0}}^{\top}\left(\boldsymbol{y}-\boldsymbol{X} -\boldsymbol{\beta}^{0}\right). -\end{align*}\]

+&\boldsymbol{\beta}_{\mathcal{A}^{0}}^{0}=\left(\boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{X}_{\mathcal{A}^{0}}\right)^{-1} \boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{y},\\ +&d_{\mathcal{I}^{0}}^{0}=X_{\mathcal{I}^{0}}^{\top}\left(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta}^{0}\right). +\end{align*}

  1. -

    For \(m=0,1, \ldots, m_{\max}\), +

    For +m=0,1,,mmaxm=0,1, \ldots, m_{\max}, do

    -

    \[\left(\boldsymbol{\beta}^{m+1}, -\boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)= -\text{Splicing} \left(\boldsymbol{\beta}^{m}, \boldsymbol{d}^{m}, -\mathcal{A}^{m}, \mathcal{I}^{m}, k_{\max }, -\tau_{s}\right).\]

    -

    If \(\left(\mathcal{A}^{m+1}, -\mathcal{I}^{m+1}\right)=\left(\mathcal{A}^{m}, -\mathcal{I}^{m}\right)\), then stop

    +

    (𝛃m+1,𝐝m+1,𝒜m+1,m+1)=Splicing(𝛃m,𝐝m,𝒜m,m,kmax,τs).\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)= \text{Splicing} \left(\boldsymbol{\beta}^{m}, \boldsymbol{d}^{m}, \mathcal{A}^{m}, \mathcal{I}^{m}, k_{\max }, \tau_{s}\right).

    +

    If +(𝒜m+1,m+1)=(𝒜m,m)\left(\mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)=\left(\mathcal{A}^{m}, \mathcal{I}^{m}\right), +then stop

    End For

  2. -
  3. Output \((\hat{\boldsymbol{\beta}}, -\hat{\boldsymbol{d}}, \hat{\mathcal{A}}, -\hat{\mathcal{I}})=\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1} -\mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right).\)

  4. +
  5. Output +(𝛃̂,𝐝̂,𝒜̂,̂)=(𝛃m+1,𝐝m+1𝒜m+1,m+1).(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1} \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right).

-
Algorithm 2: Splicing \(\left(\boldsymbol\beta, d, \mathcal{A}, -\mathcal{I}, k_{\max }, \tau_{s}\right)\) +
Algorithm 2: Splicing +(𝛃,d,𝒜,,kmax,τs)\left(\boldsymbol\beta, d, \mathcal{A}, \mathcal{I}, k_{\max }, \tau_{s}\right)
    -
  1. Input: \(\boldsymbol{\beta}, -\boldsymbol{d}, \mathcal{A}, \mathcal{I}, k_{\max }\), and \(\tau_{\mathrm{s}} .\)

  2. -
  3. Initialize \(L_{0}=L=\frac{1}{2 n}\|y-X -\beta\|_{2}^{2}\), and set \(\xi_{j}=\frac{X_{j}^{\top} X_{j}}{2 -n}\left(\beta_{j}\right)^{2}, \zeta_{j}=\frac{X_{j}^{\top} X_{j}}{2 -n}\left(\frac{d_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2}, j=1, \ldots, -p.\)

  4. +
  5. Input: +𝛃,𝐝,𝒜,,kmax\boldsymbol{\beta}, \boldsymbol{d}, \mathcal{A}, \mathcal{I}, k_{\max }, +and +τs.\tau_{\mathrm{s}} .

  6. +
  7. Initialize +L0=L=12nyXβ22L_{0}=L=\frac{1}{2 n}\|y-X \beta\|_{2}^{2}, +and set +ξj=XjXj2n(βj)2,ζj=XjXj2n(djXjXj/n)2,j=1,,p.\xi_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\beta_{j}\right)^{2}, \zeta_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{d_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2}, j=1, \ldots, p.

  8. -

    For \(k=1,2, \ldots, k_{\max -}\), do

    -

    \[\mathcal{A}_{k}=\left\{j \in -\mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq -\xi_{i}\right) \leq k\right\}\]

    -

    \[\mathcal{I}_{k}=\left\{j \in -\mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq -\zeta_{i}\right) \leq k\right\}\]

    -

    Let \(\tilde{\mathcal{A}}_{k}=\left(\mathcal{A} -\backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}, -\tilde{\mathcal{I}}_{k}=\left(\mathcal{I} \backslash -\mathcal{I}_{k}\right) \cup \mathcal{A}_{k}\) and solve

    -

    \[\tilde{\boldsymbol{\beta}}_{{\mathcal{A}}_{k}}=\left(\boldsymbol{X}_{\mathcal{A}_{k}}^{\top} -\boldsymbol{X}_{{\mathcal{A}}_{k}}\right)^{-1} -\boldsymbol{X}_{{\mathcal{A}_{k}}}^{\top} y, \quad -\tilde{\boldsymbol{\beta}}_{{\mathcal{I}}_{k}}=0\]

    -

    \[\tilde{\boldsymbol -d}_{\mathcal{I}^k}=X_{\mathcal{I}^k}^{\top}(y-X \tilde{\beta}) / n,\quad -\tilde{\boldsymbol d}_{\mathcal{A}^k} = 0.\]

    -

    Compute \(\mathcal{L}_{n}(\tilde{\boldsymbol\beta})=\frac{1}{2 -n}\|y-X \tilde{\boldsymbol\beta}\|_{2}^{2}.\)

    -

    If \(L>\mathcal{L}_{n}(\tilde{\boldsymbol\beta})\), +

    For +k=1,2,,kmaxk=1,2, \ldots, k_{\max }, +do

    +

    𝒜k={j𝒜:i𝒜I(ξjξi)k}\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\}

    +

    k={j:iI(ζjζi)k}\mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}

    +

    Let +𝒜̃k=(𝒜𝒜k)k,̃k=(k)𝒜k\tilde{\mathcal{A}}_{k}=\left(\mathcal{A} \backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}, \tilde{\mathcal{I}}_{k}=\left(\mathcal{I} \backslash \mathcal{I}_{k}\right) \cup \mathcal{A}_{k} +and solve

    +

    𝛃̃𝒜k=(𝐗𝒜k𝐗𝒜k)1𝐗𝒜ky,𝛃̃k=0\tilde{\boldsymbol{\beta}}_{{\mathcal{A}}_{k}}=\left(\boldsymbol{X}_{\mathcal{A}_{k}}^{\top} \boldsymbol{X}_{{\mathcal{A}}_{k}}\right)^{-1} \boldsymbol{X}_{{\mathcal{A}_{k}}}^{\top} y, \quad \tilde{\boldsymbol{\beta}}_{{\mathcal{I}}_{k}}=0

    +

    𝐝̃k=Xk(yXβ̃)/n,𝐝̃𝒜k=0.\tilde{\boldsymbol d}_{\mathcal{I}^k}=X_{\mathcal{I}^k}^{\top}(y-X \tilde{\beta}) / n,\quad \tilde{\boldsymbol d}_{\mathcal{A}^k} = 0.

    +

    Compute +n(𝛃̃)=12nyX𝛃̃22.\mathcal{L}_{n}(\tilde{\boldsymbol\beta})=\frac{1}{2 n}\|y-X \tilde{\boldsymbol\beta}\|_{2}^{2}.

    +

    If +L>n(𝛃̃)L>\mathcal{L}_{n}(\tilde{\boldsymbol\beta}), then

    -

    \[(\hat{\boldsymbol{\beta}}, -\hat{\boldsymbol{d}}, \hat{\mathcal{A}}, -\hat{\mathcal{I}})=\left(\tilde{\boldsymbol{\beta}}, -\tilde{\boldsymbol{d}}, \tilde{\mathcal{A}}_{k}, -\tilde{\mathcal{I}}_{k}\right)\]

    -

    \[L=\mathcal{L}_{n}(\tilde{\boldsymbol\beta}).\]

    +

    (𝛃̂,𝐝̂,𝒜̂,̂)=(𝛃̃,𝐝̃,𝒜̃k,̃k)(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\tilde{\boldsymbol{\beta}}, \tilde{\boldsymbol{d}}, \tilde{\mathcal{A}}_{k}, \tilde{\mathcal{I}}_{k}\right)

    +

    L=n(𝛃̃).L=\mathcal{L}_{n}(\tilde{\boldsymbol\beta}).

    End for

  9. -
  10. If \(L_{0}-L<\tau_{s}\), then -\((\hat{\boldsymbol\beta}, \hat{d}, \hat{A}, -\hat{I})=(\boldsymbol\beta, d, \mathcal{A}, -\mathcal{I}).\)

  11. -
  12. Output \((\hat{\boldsymbol{\beta}}, -\hat{\boldsymbol{d}}, \hat{\mathcal{A}}, -\hat{\mathcal{I}})\).

  13. +
  14. If +L0L<τsL_{0}-L<\tau_{s}, +then +(𝛃̂,d̂,Â,Î)=(𝛃,d,𝒜,).(\hat{\boldsymbol\beta}, \hat{d}, \hat{A}, \hat{I})=(\boldsymbol\beta, d, \mathcal{A}, \mathcal{I}).

  15. +
  16. Output +(𝛃̂,𝐝̂,𝒜̂,̂)(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}}).

@@ -336,43 +354,52 @@
Determining the Best Support Size with SIC

In practice, the support size is usually unknown. We use a data-driven procedure to determine \(s\). For any active set \(\mathcal{A}\), define an \(\mathrm{SIC}\) as follows: \[\operatorname{SIC}(\mathcal{A})=n \log \mathcal{L}_{\mathcal{A}}+|\mathcal{A}| \log (p) \log \log n,\] where \(\mathcal{L}_{\mathcal{A}}=\min _{\beta_{\mathcal{I}}=0} \mathcal{L}_{n}(\beta), \mathcal{I}=(\mathcal{A})^{c}\). To identify the true model, the model complexity penalty is \(\log p\), and the slowly diverging rate \(\log \log n\) is set to prevent underfitting. Theorem 4 states that the following ABESS algorithm selects the true support size via SIC.

Let \(s_{\max }\) be the maximum support size. We suggest \(s_{\max }=o\left(\frac{n}{\log p}\right)\) as the maximum possible recovery size. Typically, we set \(s_{\max }=\left[\frac{n}{\log (p) \log \log n}\right]\), where \([x]\) denotes the integer part of \(x\).
Algorithm 3: ABESS.
    -
  1. Input: \(X, y\), and the maximum -support size \(s_{\max } .\)

  2. +
  3. Input: +X,yX, y, +and the maximum support size +smax.s_{\max } .

  4. -

    For \(s=1,2, \ldots, s_{\max -}\), do

    -

    \[\left(\hat{\boldsymbol{\beta}}_{s}, -\hat{\boldsymbol{d}}_{s}, \hat{\mathcal{A}}_{s}, -\hat{\mathcal{I}}_{s}\right)= \text{BESS.Fixed}(s).\]

    +

    For +s=1,2,,smaxs=1,2, \ldots, s_{\max }, +do

    +

    (𝛃̂s,𝐝̂s,𝒜̂s,̂s)=BESS.Fixed(s).\left(\hat{\boldsymbol{\beta}}_{s}, \hat{\boldsymbol{d}}_{s}, \hat{\mathcal{A}}_{s}, \hat{\mathcal{I}}_{s}\right)= \text{BESS.Fixed}(s).

    End for

  5. Compute the minimum of SIC:

    -

    \[s_{\min }=\arg \min _{s} -\operatorname{SIC}\left(\hat{\mathcal{A}}_{s}\right).\]

    +

    smin=argminsSIC(𝒜̂s).s_{\min }=\arg \min _{s} \operatorname{SIC}\left(\hat{\mathcal{A}}_{s}\right).

  6. -
  7. Output \(\left(\hat{\boldsymbol{\beta}}_{s_{\operatorname{min}}}, -\hat{\boldsymbol{d}}_{s_{\min }}, \hat{A}_{s_{\min }}, -\hat{\mathcal{I}}_{s_{\min }}\right) .\)

  8. +
  9. Output +(𝛃̂smin,𝐝̂smin,Âsmin,̂smin).\left(\hat{\boldsymbol{\beta}}_{s_{\operatorname{min}}}, \hat{\boldsymbol{d}}_{s_{\min }}, \hat{A}_{s_{\min }}, \hat{\mathcal{I}}_{s_{\min }}\right) .

@@ -390,41 +417,41 @@

The corresponding notations:
Notations in algorithm | Parameters in abess | Definitions
\(m_{\max}\) | max.splicing.iter | The maximum number of splicing iterations
\(k_{\max}\) | c.max | The maximum splicing size
\(1,\ldots,s_{\max}\) | support.size | A sequence representing the support sizes
\(\operatorname{SIC}\) | tune.type | The type of criterion for choosing the support size
\(\{G_1, \ldots, G_J\}\) | group.index | A vector indicating the group that each variable belongs to
\(\lambda\) | lambda | A value for regularized best subset selection
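As a hedged illustration, the call below maps the table's notations onto abess() arguments; the data set and the specific values are our own choices (group.index would additionally carry the group labels when a grouped model is wanted):

 library(abess)
 dat <- generate.data(n = 200, p = 50, support.size = 3)
 abess_fit <- abess(dat[["x"]], dat[["y"]],
                    max.splicing.iter = 20,     # m_max
                    c.max             = 2,      # k_max
                    support.size      = 0:10,   # candidate support sizes s
                    tune.type         = "gic",  # criterion used to pick s
                    lambda            = 0)      # no L2 regularization here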

The corresponding notatio +

@@ -455,17 +480,17 @@

The corresponding notatio

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -128,7 +126,7 @@

Liyuan Hu

8/2/2021

- Source: ../vignettes/v11-power-of-abess.Rmd + Source: vignettes/v11-power-of-abess.Rmd
@@ -148,8 +146,9 @@

Settings
All these packages are compared in three aspects, including the prediction performance, the variable selection performance, and the computational efficiency. The prediction performance of the linear model is measured by \(\|y - \hat{y}\|_2\) on a test set, and for logistic regression this is measured by the area under the ROC curve. For the variable selection performance, we compute the true positive rate (TPR, which is the proportion of variables in the active set that are correctly identified) and the false positive rate
The design matrix is formed by i.i.d. samples generated from a multivariate normal distribution with mean 0 and covariance matrix \(\Sigma = (\sigma_{ij})\). We consider two settings: low correlation and high correlation of the predictors. For the low correlation scenario, we set \(\sigma_{ij} = 0.1^{|i-j|}\), and for the high correlation one, \(\sigma_{ij} = 0.7\). The number of predictors is 1000. The true coefficient \(\beta^*\) is a vector with 10 nonzero entries uniformly distributed in \([b,B]\). We set \(b=5\sqrt{2\log(p)/n}\), \(B = 100b\) for linear regression; \(b = 10\sqrt{2\log(p)/n}\), \(B = 5b\) for logistic regression; and \(b = -10 \sqrt{2 \log(p) / n}\), \(B=10 \sqrt{2 \log(p) / n}\) for Poisson regression. A random noise generated from a standard Gaussian distribution is added to the linear predictor \(x^\prime\beta\) for linear regression. The size of training data is 500 for linear regression and 1000 for logistic regression.
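The following is a hedged sketch of the low-correlation linear-regression setting described above (it is not the exact benchmark script; the MASS package is assumed for the multivariate normal draw):

library(MASS)
n <- 500; p <- 1000; s <- 10
b <- 5 * sqrt(2 * log(p) / n); B <- 100 * b
Sigma <- 0.1^abs(outer(1:p, 1:p, "-"))             # sigma_ij = 0.1^|i - j|
x <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
beta <- numeric(p)
beta[sample(p, s)] <- runif(s, min = b, max = B)   # 10 nonzero entries uniform in [b, B]
y <- x %*% beta + rnorm(n)                          # standard Gaussian noise added to x'beta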

All experiments are evaluated on an Ubuntu system with Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz 3.31 GHz and under R version 3.6.3.

+

@@ -231,17 +241,17 @@

Results

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - - - +
- +
@@ -118,7 +116,7 @@ - +
@@ -126,7 +124,7 @@

Robust Principal Component Analysis

- Source: ../vignettes/v12-Robust-Principal-Component-Analysis.Rmd + Source: vignettes/v12-Robust-Principal-Component-Analysis.Rmd
@@ -142,12 +140,16 @@

PCA

Principal component analysis (PCA) is an important method in the field of data science, which can reduce the dimension of data and simplify our model. It solves an optimization problem like:
\[ \max_v v^T \Sigma v, \quad s.t.\ v^Tv=1, \]
where \(\Sigma = X^TX/(n-1)\) and \(X\in \mathbb{R}^{n\times p}\) is the centered sample matrix with each row containing one observation of \(p\) variables.
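For concreteness, here is a minimal base-R sketch of this optimization problem (using the USArrests data that also appears in the package examples; this code is not part of the original vignette):

X <- scale(USArrests, center = TRUE, scale = FALSE)   # centered sample matrix
Sigma <- crossprod(X) / (nrow(X) - 1)                  # Sigma = X^T X / (n - 1)
v <- eigen(Sigma)$vectors[, 1]                         # maximizes v' Sigma v subject to v'v = 1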

Robust-PCA (RPCA) @@ -163,59 +165,89 @@

Robust-PCA (RPCA)
In these situations, PCA may spend too much attention on unnecessary variables. That’s why Robust-PCA (RPCA) is presented, which can be used to recover the (low-rank) sample for subsequent processing.

-

In mathematics, RPCA manages to divide the sample matrix \(X\) into two parts:
\[ X=S+L, \]
where \(S\) is the sparse “outlier” matrix and \(L\) is the “information” matrix with a low rank. Generally, we also suppose \(S\) is not low-rank and \(L\) is not sparse, in order to get the unique solution.
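A hedged sketch of producing and decomposing such a matrix with the package interface (the rank and support sizes below are illustrative assumptions, not recommendations):

library(abess)
dat <- generate.matrix(n = 30, p = 30, rank = 2, support.size = 40, seed = 1)   # X = S + L (+ noise)
fit <- abessrpca(dat[["x"]], rank = 2, support.size = 35:45)                    # estimates of the sparse part S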

-

In Lagrange format,
\[ \min _{S, L}\|X-S-L\|_{F} \leq \varepsilon, \quad s.t. \quad \operatorname{rank}(L)=r,\ \|S\|_{0} \leq s, \]
where \(s\) is the sparsity of \(S\). After RPCA, the information matrix \(L\) can be used in further analysis.

Note that it does NOT deal with “noise”, which may stay in \(L\) and need further processing.

Hard Impute

To solve its sub-problem, RPCA with known outlier positions, we follow a process called “Hard Impute”. The main idea is to estimate the outlier values by precise values with KPCA, where \(K=r\).

Here are the steps (an R sketch follows these steps):

  1. Input \(X\), outliers, \(M, \varepsilon\), where outliers record the non-zero positions in \(S\);
  2. Denote \(X_{new} \leftarrow \mathbf{0}\) with the same shape as \(X\);
  3. For \(i=1,2, \ldots, M\):
     • \(X_{old}=\left\{\begin{array}{ll}X_{new}, & \text { for outliers } \\ X, & \text { for others }\end{array}\right.\)
     • Form KPCA on \(X_{old}\) with \(K=r\), and denote \(v\) as the eigenvectors;
     • \(X_{new}=X_{old} \cdot v \cdot v^{T}\);
     • If \(\left\|X_{new}-X_{old}\right\|<\varepsilon\), break; End for;
  4. Return \(X_{new}\) as \(L\);

where \(M\) is the maximum iteration times and \(\varepsilon\) is the convergence coefficient.
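A minimal R sketch of this iteration (an illustration of the steps above, not the package’s internal implementation; the KPCA step with \(K=r\) is carried out here via a truncated SVD):

hard_impute <- function(X, outliers, r, M = 100, eps = 1e-6) {
  # `outliers` is a logical matrix marking the assumed non-zero positions of S
  X_new <- matrix(0, nrow(X), ncol(X))
  for (i in seq_len(M)) {
    X_old <- X
    X_old[outliers] <- X_new[outliers]         # fill outlier cells with the current estimate
    v <- svd(X_old, nu = 0, nv = r)$v          # leading r right singular vectors (K = r)
    X_new <- X_old %*% v %*% t(v)              # project onto the rank-r subspace
    if (norm(X_new - X_old, "F") < eps) break  # converged
  }
  X_new                                        # the estimate of the low-rank part L
}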
-

The final \(X_{\text{new}}\) is supposed to be \(L\) under the given outlier positions.

RPCA Application @@ -308,9 +340,7 @@

More on the result - -

+ @@ -323,17 +353,17 @@

More on the result

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -92,45 +92,45 @@

Authors and Citation

- +
  • Jin Zhu. Author, maintainer.

  • -

    Zezhi Wang. Author. +

    Zezhi Wang. Author.

  • -

    Liyuan Hu. Author. +

    Liyuan Hu. Author.

  • -

    Junhao Huang. Author. +

    Junhao Huang. Author.

  • -

    Kangkang Jiang. Author. +

    Kangkang Jiang. Author.

  • -

    Yanhang Zhang. Author. +

    Yanhang Zhang. Author.

  • -

    Borui Tang. Author. +

    Borui Tang. Author.

  • -

    Shiyun Lin. Author. +

    Shiyun Lin. Author.

  • -

    Junxian Zhu. Author. +

    Junxian Zhu. Author.

  • -

    Canhong Wen. Author. +

    Canhong Wen. Author.

  • @@ -142,7 +142,7 @@

    Authors and Citation

  • -

    spectra contributors. Copyright holder. +

    spectra contributors. Copyright holder.
    Spectra implementation

@@ -157,7 +157,7 @@

Citation

Zhu J, Wen C, Zhu J, Zhang H, Wang X (2020). “A polynomial algorithm for best-subset selection problem.” Proceedings of the National Academy of Sciences, 117(52), 33117–33123. -doi:10.1073/pnas.2014241117. +doi:10.1073/pnas.2014241117.

@Article{,
   title = {A polynomial algorithm for best-subset selection problem},
@@ -172,7 +172,7 @@ 

Citation

Zhu J, Wang X, Hu L, Huang J, Jiang K, Zhang Y, Lin S, Zhu J (2022). “abess: A Fast Best Subset Selection Library in Python and R.” Journal of Machine Learning Research, 23(202), 1–7. -https://www.jmlr.org/papers/v23/21-1060.html. +https://www.jmlr.org/papers/v23/21-1060.html.

@Article{,
   title = {abess: A Fast Best Subset Selection Library in Python and R},
@@ -187,7 +187,7 @@ 

Citation

Zhang Y, Zhu J, Zhu J, Wang X (2023). “A Splicing Approach to Best Subset of Groups Selection.” INFORMS Journal on Computing, 35(1), 104–119. -doi:10.1287/ijoc.2022.1241. +doi:10.1287/ijoc.2022.1241.

@Article{,
   title = {A Splicing Approach to Best Subset of Groups Selection},
@@ -211,16 +211,16 @@ 

Citation

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -163,16 +163,16 @@
-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -141,16 +141,16 @@

Author

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -102,7 +102,7 @@

Adaptive best subset selection (for generalized linear model)

-
# S3 method for default
+    
# Default S3 method
 abess(
   x,
   y,
@@ -140,19 +140,21 @@ 

Adaptive best subset selection (for generalized linear model)

... ) -# S3 method for formula +# S3 method for class 'formula' abess(formula, data, subset, na.action, ...)

Arguments

-
x
+ + +
x

Input matrix, of dimension \(n \times p\); each row is an observation vector and each column is a predictor/feature/variable. Can be in sparse matrix format (inherit from class "dgCMatrix" in package Matrix).

-
y
+
y

The response variable, of n observations. For family = "binomial" should have two levels. For family="poisson", y should be a vector with positive integer. @@ -165,7 +167,7 @@

Arguments

if y is presented as a numerical vector, it will be coerced into a factor.

-
family
+
family

One of the following models: "gaussian" (continuous response), "binomial" (binary response), @@ -178,25 +180,25 @@

Arguments

Depending on the response. Any unambiguous substring can be given.

-
tune.path
+
tune.path

The method to be used to select the optimal support size. For tune.path = "sequence", we solve the best subset selection problem for each size in support.size. For tune.path = "gsection", we solve the best subset selection problem with support size ranged in gs.range, where the specific support size to be considered is determined by golden section.

-
tune.type
+
tune.type

The type of criterion for choosing the support size. Available options are "gic", "ebic", "bic", "aic" and "cv". Default is "gic".

-
weight
+
weight

Observation weights. When weight = NULL, we set weight = 1 for each observation as default.

-
normalize
+
normalize

Options for normalization. normalize = 0 for no normalization. normalize = 1 for subtracting the means of the columns of x and y, and also @@ -208,44 +210,44 @@

Arguments

3 for "cox". Default is normalize = NULL.

-
fit.intercept
+
fit.intercept

A boolean value indicating whether to fit an intercept. We assume the data has been centered if fit.intercept = FALSE. Default: fit.intercept = FALSE.

-
beta.low
+
beta.low

A single value specifying the lower bound of \(\beta\). Default is -.Machine$double.xmax.

-
beta.high
+
beta.high

A single value specifying the upper bound of \(\beta\). Default is .Machine$double.xmax.

-
c.max
+
c.max

an integer splicing size. Default is: c.max = 2.

-
support.size
+
support.size

An integer vector representing the alternative support sizes. Only used for tune.path = "sequence". Default is 0:min(n, round(n/(log(log(n))log(p)))).

-
gs.range
+
gs.range

An integer vector with two elements. The first element is the minimum model size considered by golden-section, the latter one is the maximum one. Default is gs.range = c(1, min(n, round(n/(log(log(n))log(p))))).

-
lambda
+
lambda

A single lambda value for regularized best subset selection. Default is 0.

-
always.include
+
always.include

An integer vector containing the indexes of variables that should always be included in the model.

-
group.index
+
group.index

A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of x and their corresponding index in group.index should be the same. @@ -254,12 +256,12 @@

Arguments

please set group.index = NULL (the default).

-
init.active.set
+
init.active.set

A vector of integers indicating the initial active set. Default: init.active.set = NULL.

-
splicing.type
+
splicing.type

Optional type for splicing. If splicing.type = 1, the number of variables to be spliced is c.max, ..., 1; if splicing.type = 2, @@ -267,45 +269,45 @@

Arguments

(Default: splicing.type = 2.)

-
max.splicing.iter
+
max.splicing.iter

The maximum number of performing splicing algorithm. In most of the case, only a few times of splicing iteration can guarantee the convergence. Default is max.splicing.iter = 20.

-
screening.num
+
screening.num

An integer number. Preserve screening.num number of predictors with the largest marginal maximum likelihood estimator before running algorithm.

-
important.search
+

An integer number indicating the number of important variables to be spliced. When important.search \(\ll\) p variables, it would greatly reduce runtimes. Default: important.search = 128.

-
warm.start
+
warm.start

Whether to use the last solution as a warm start. Default is warm.start = TRUE.

-
nfolds
+
nfolds

The number of folds in cross-validation. Default is nfolds = 5.

-
foldid
+
foldid

an optional integer vector of values between 1, ..., nfolds identifying what fold each observation is in. The default foldid = NULL would generate a random foldid.

-
cov.update
+
cov.update

A logical value only used for family = "gaussian". If cov.update = TRUE, use a covariance-based implementation; otherwise, a naive implementation. The naive method is more computational efficient than covariance-based method when \(p >> n\) and important.search is much large than its default value. Default: cov.update = FALSE.

-
newton
+
newton

A character specify the Newton's method for fitting generalized linear models, it should be either newton = "exact" or newton = "approx". If newton = "exact", then the exact hessian is used, @@ -313,58 +315,58 @@

Arguments

and can be faster (especially when family = "cox").

-
newton.thresh
+
newton.thresh

a numeric value for controlling positive convergence tolerance. The Newton's iterations converge when \(|dev - dev_{old}|/(|dev| + 0.1)<\) newton.thresh.

-
max.newton.iter
+
max.newton.iter

an integer giving the maximal number of Newton iterations. Default is max.newton.iter = 10 if newton = "exact", and max.newton.iter = 60 if newton = "approx".

-
early.stop
+
early.stop

A boolean value deciding whether to stop early. If early.stop = TRUE, the algorithm will stop if the last tuning value is less than the existing one. Default: early.stop = FALSE.

-
ic.scale
+
ic.scale

A non-negative value used for multiplying the penalty term in information criterion. Default: ic.scale = 1.

-
num.threads
+
num.threads

An integer deciding the number of threads to be concurrently used for cross-validation (i.e., tune.type = "cv"). If num.threads = 0, then all available cores will be used. Default: num.threads = 0.

-
seed
+
seed

Seed to be used to divide the sample into cross-validation folds. Default is seed = 1.

-
...
+
...

further arguments to be passed to or from methods.

-
formula
+
formula

an object of class "formula": a symbolic description of the model to be fitted. The details of model specification are given in the "Details" section of "formula".

-
data
+
data

a data frame containing the variables in the formula.

-
subset
+
subset

an optional vector specifying a subset of observations to be used.

-
na.action
+
na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

@@ -372,9 +374,7 @@

Arguments

Value

- - -

A S3 abess class object, which is a list with the following components:

+

A S3 abess class object, which is a list with the following components:

beta

A \(p\)-by-length(support.size) matrix of coefficients for univariate family, stored in column format; while a list of length(support.size) coefficients matrix (with size \(p\)-by-ncol(y)) for multivariate family.

@@ -866,16 +866,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -125,7 +125,9 @@

Adaptive best subset selection for principal component analysis

Arguments

-
x
+ + +
x

A matrix object. It can be either a predictor matrix where each row is an observation and each column is a predictor or a sample covariance/correlation matrix. @@ -133,29 +135,29 @@

Arguments

(inherit from class "dgCMatrix" in package Matrix).

-
type
+
type

If type = "predictor", x is considered as the predictor matrix. If type = "gram", x is considered as a sample covariance or correlation matrix.

-
sparse.type
+
sparse.type

If sparse.type = "fpc", then best subset selection performs on the first principal component; If sparse.type = "kpc", then best subset selection would be sequentially performed on the first kpc.num number of principal components. If kpc.num is supplied, the default is sparse.type = "kpc"; otherwise, is sparse.type = "fpc".

-
cor
+
cor

A logical value. If cor = TRUE, perform PCA on the correlation matrix; otherwise, the covariance matrix. This option is available only if type = "predictor". Default: cor = FALSE.

-
kpc.num
+
kpc.num

An integer deciding the number of principal components to be sequentially considered.

-
support.size
+
support.size

It is a flexible input. If it is an integer vector. It represents the support sizes to be considered for each principal component. If it is a list object containing kpc.num number of integer vectors, @@ -164,49 +166,49 @@

Arguments

The default is support.size = NULL, and some rules in details section are used to specify support.size.

-
gs.range
+
gs.range

An integer vector with two elements. The first element is the minimum model size considered by golden-section, the latter one is the maximum one. Default is gs.range = c(1, min(n, round(n/(log(log(n))log(p))))).

-
tune.path
+
tune.path

The method to be used to select the optimal support size. For tune.path = "sequence", we solve the best subset selection problem for each size in support.size. For tune.path = "gsection", we solve the best subset selection problem with support size ranged in gs.range, where the specific support size to be considered is determined by golden section.

-
tune.type
+
tune.type

The type of criterion for choosing the support size. Available options are "gic", "ebic", "bic", "aic" and "cv". Default is "gic". tune.type = "cv" is available only when type = "predictor".

-
nfolds
+
nfolds

The number of folds in cross-validation. Default is nfolds = 5.

-
foldid
+
foldid

an optional integer vector of values between 1, ..., nfolds identifying what fold each observation is in. The default foldid = NULL would generate a random foldid.

-
ic.scale
+
ic.scale

A non-negative value used for multiplying the penalty term in information criterion. Default: ic.scale = 1.

-
c.max
+
c.max

an integer splicing size. The default of c.max is the maximum of 2 and max(support.size) / 2.

-
always.include
+
always.include

An integer vector containing the indexes of variables that should always be included in the model.

-
group.index
+
group.index

A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of x and their corresponding index in group.index should be the same. @@ -215,12 +217,12 @@

Arguments

please set group.index = NULL (the default).

-
screening.num
+
screening.num

An integer number. Preserve screening.num number of predictors with the largest marginal maximum likelihood estimator before running algorithm.

-
splicing.type
+
splicing.type

Optional type for splicing. If splicing.type = 1, the number of variables to be spliced is c.max, ..., 1; if splicing.type = 2, @@ -228,32 +230,30 @@

Arguments

Default: splicing.type = 1.

-
max.splicing.iter
+
max.splicing.iter

The maximum number of performing splicing algorithm. In most of the case, only a few times of splicing iteration can guarantee the convergence. Default is max.splicing.iter = 20.

-
warm.start
+
warm.start

Whether to use the last solution as a warm start. Default is warm.start = TRUE.

-
num.threads
+
num.threads

An integer deciding the number of threads to be concurrently used for cross-validation (i.e., tune.type = "cv"). If num.threads = 0, then all available cores will be used. Default: num.threads = 0.

-
...
+
...

further arguments to be passed to or from methods.

Value

- - -

A S3 abesspca class object, which is a list with the following components:

+

A S3 abesspca class object, which is a list with the following components:

coef

A \(p\)-by-length(support.size) loading matrix of sparse principal components (PC), where each row is a variable and each column is a support size;

@@ -399,10 +399,10 @@

Examples

#> abesspca(x = rob_cov, type = "gram") #> #> PC support.size ev pev -#> 1 1 1 5389.433 0.9451258 -#> 2 1 2 5467.001 0.9587285 -#> 3 1 3 5525.057 0.9689096 -#> 4 1 4 5535.616 0.9707613 +#> 1 1 1 5322.730 0.9534441 +#> 2 1 2 5375.003 0.9628076 +#> 3 1 3 5406.960 0.9685319 +#> 4 1 4 5419.600 0.9707961 ## K-component principal component analysis pca_fit <- abesspca(USArrests, @@ -455,16 +455,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -126,53 +126,55 @@

Adaptive best subset selection for robust principal component analysis

Arguments

-
x
+ + +
x

A matrix object.

-
rank
+
rank

A positive integer value specify the rank of the low-rank matrix.

-
support.size
-

An integer vector representing the alternative support sizes. -Only used for tune.path = "sequence". +

support.size
+

An integer vector representing the alternative support sizes. +Only used for tune.path = "sequence". Strongly suggest its minimum value larger than min(dim(x)).

-
tune.path
+
tune.path

The method to be used to select the optimal support size. For tune.path = "sequence", we solve the best subset selection problem for each size in support.size. For tune.path = "gsection", we solve the best subset selection problem with support size ranged in gs.range, where the specific support size to be considered is determined by golden section.

-
gs.range
+
gs.range

An integer vector with two elements. The first element is the minimum model size considered by golden-section, the latter one is the maximum one. Default is gs.range = c(1, min(n, round(n/(log(log(n))log(p))))).

-
tune.type
-

The type of criterion for choosing the support size. -Available options are "gic", "ebic", "bic" and "aic". +

tune.type
+

The type of criterion for choosing the support size. +Available options are "gic", "ebic", "bic" and "aic". Default is "gic".

-
ic.scale
+
ic.scale

A non-negative value used for multiplying the penalty term in information criterion. Default: ic.scale = 1.

-
lambda
+
lambda

A single lambda value for regularized best subset selection. Default is 0.

-
always.include
+
always.include

An integer vector containing the indexes of variables that should always be included in the model.

-
group.index
+
group.index

A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of x and their corresponding index in group.index should be the same. @@ -181,11 +183,11 @@

Arguments

please set group.index = NULL (the default).

-
c.max
+
c.max

an integer splicing size. Default is: c.max = 2.

-
splicing.type
+
splicing.type

Optional type for splicing. If splicing.type = 1, the number of variables to be spliced is c.max, ..., 1; if splicing.type = 2, @@ -193,54 +195,52 @@

Arguments

(Default: splicing.type = 2.)

-
max.splicing.iter
+
max.splicing.iter

The maximum number of performing splicing algorithm. In most of the case, only a few times of splicing iteration can guarantee the convergence. Default is max.splicing.iter = 20.

-
warm.start
+
warm.start

Whether to use the last solution as a warm start. Default is warm.start = TRUE.

-
important.search
+

An integer number indicating the number of important variables to be spliced. When important.search \(\ll\) p variables, it would greatly reduce runtimes. Default: important.search = 128.

-
max.newton.iter
+
max.newton.iter

an integer giving the maximal number of Newton iterations. Default is max.newton.iter = 10 if newton = "exact", and max.newton.iter = 60 if newton = "approx".

-
newton.thresh
+
newton.thresh

a numeric value for controlling positive convergence tolerance. The Newton's iterations converge when \(|dev - dev_{old}|/(|dev| + 0.1)<\) newton.thresh.

-
num.threads
+
num.threads

An integer deciding the number of threads to be concurrently used for cross-validation (i.e., tune.type = "cv"). If num.threads = 0, then all available cores will be used. Default: num.threads = 0.

-
seed
+
seed

Seed to be used to divide the sample into cross-validation folds. Default is seed = 1.

-
...
+
...

further arguments to be passed to or from methods.

Value

- - -

A S3 abessrpca class object, which is a list with the following components:

+

A S3 abessrpca class object, which is a list with the following components:

S

A list with length(support.size) elements, each of which is a sparse matrix estimation;

@@ -277,25 +277,25 @@

Value

Details

Adaptive best subset selection for robust principal component analysis aims to find two latent matrices \(L\) and \(S\) such that the original matrix \(X\) can be appropriately approximated: $$x = L + S + N,$$ where \(L\) is a low-rank matrix, \(S\) is a sparse matrix, and \(N\) is a dense noise matrix. The generic splicing technique can be employed to solve this problem by iteratively improving the quality of the estimation of \(S\).

-

For a given support set \(\Omega\), the optimization problem
$$\min_S \| x - L - S\|_F^2 \;\;{\rm s.t.}\;\; S_{ij} = 0 \;{\rm for}\; (i, j) \in \Omega^c$$
is still a non-convex optimization problem. We use the hard-impute algorithm proposed in one of the references to solve this problem. The hard-impute algorithm is an iterative algorithm; users can set max.newton.iter and newton.thresh to control the solution precision of the optimization problem. (Here, the names of the two parameters are somewhat abused so that parameters across functions share a unified name.) According to our experiments, we assign proper default values to these two parameters such that the precision and runtime are well balanced, and we suggest users keep the default values unchanged.

Note

-

Some parameters not described in the Details section are explained in the documentation for abess because the meanings of these parameters are very similar.

-

At present, \(l_2\) regularization and group selection are not supported, and thus setting lambda and group.index has no influence on the output. These features will be coming soon.

@@ -520,16 +520,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,37 +100,37 @@

Extract Model Coefficients from a fitted "abess" object.

-
# S3 method for abess
+    
# S3 method for class 'abess'
 coef(object, support.size = NULL, sparse = TRUE, ...)

Arguments

-
object
+ + +
object

An "abess" project.

-
support.size
+
support.size

An integer vector specifies the coefficient fitted at given support.size. If support.size = NULL, then all coefficients would be returned. Default: support.size = NULL.

-
sparse
+
sparse

A logical value, specifying whether the coefficients should be presented as sparse matrix or not. Default: sparse = TRUE.

-
...
+
...

Other arguments.

Value

- - -

A coefficient matrix when fitting an univariate model including gaussian, binomial, poisson, and cox; +

A coefficient matrix when fitting an univariate model including gaussian, binomial, poisson, and cox; otherwise, a list containing coefficient matrices. For a coefficient matrix, each row is a variable, and each column is a support size.

@@ -156,16 +156,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,17 +100,19 @@

Extract Sparse Loadings from a fitted "abesspca" object.

-
# S3 method for abesspca
+    
# S3 method for class 'abesspca'
 coef(object, support.size = NULL, kpc = NULL, sparse = TRUE, ...)

Arguments

-
object
+ + +
object

An "abesspca" project.

-
support.size
+
support.size

An integer vector specifies the coefficient fitted at given support.size. If support.size = NULL, then all coefficients would be returned. @@ -118,7 +120,7 @@

Arguments

This parameter is omitted if sparse.type = "kpc".

-
kpc
+
kpc

An integer vector specifies the coefficient fitted at given principal component. If kpc = NULL, then all coefficients would be returned. @@ -126,20 +128,18 @@

Arguments

This parameter is omitted if sparse.type = "fpc".

-
sparse
+
sparse

A logical value, specifying whether the coefficients should be presented as sparse matrix or not. Default: sparse = TRUE.

-
...
+
...

Other arguments.

Value

- - -

A matrix with length(support.size) columns. +

A matrix with length(support.size) columns. Each column corresponds to a sparse loading for the first principal component, where the number of non-zeros entries depends on the support.size.

@@ -162,16 +162,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,38 +100,38 @@

Extract sparse component from a fitted "abessrpca" object.

-
# S3 method for abessrpca
+    
# S3 method for class 'abessrpca'
 coef(object, support.size = NULL, sparse = TRUE, ...)

Arguments

-
object
+ + +
object

An "abessrpca" project.

-
support.size
+
support.size

An integer vector specifies the sparse matrix fitted at given support.size to be returned. -If support.size = NULL, then the sparse matrix with +If support.size = NULL, then the sparse matrix with the least tuning value would be returned. Default: support.size = NULL.

-
sparse
+
sparse

A logical value, specifying whether the coefficients should be presented as sparse matrix or not. Default: sparse = TRUE.

-
...
+
...

Other arguments.

Value

- - -

A list with length(support.size) number of dgCMatrix, +

A list with length(support.size) number of dgCMatrix, each of which is the estimation the sparse component.

@@ -147,16 +147,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,32 +100,32 @@

Extract the deviance from a fitted "abess" object.

-
# S3 method for abess
+    
# S3 method for class 'abess'
 deviance(object, type = c("standard", "gic", "ebic", "bic", "aic"), ...)

Arguments

-
object
+ + +
object

A "abess" object.

-
type
+
type

The type of deviance. One of the following: "standard", "gic", "ebic", "bic" and "aic". Default is "standard".

-
...
+
...

additional arguments

Value

- - -

A numeric vector.

+

A numeric vector.

See also

@@ -149,16 +149,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -104,17 +104,19 @@

Extract one model from a fitted "abess" object.

extract(object, support.size = NULL, ...)
 
-# S3 method for abess
+# S3 method for class 'abess'
 extract(object, support.size = NULL, ...)

Arguments

-
object
+ + +
object

An "abess" project.

-
support.size
+
support.size

An integer value specifies the model size fitted at given support.size. If support.size = NULL, then the model with @@ -122,15 +124,13 @@

Arguments

Default: support.size = NULL.

-
...
+
...

Other arguments.

Value

- - -

A list object including the following components:

+

A list object including the following components:

beta

A \(p\)-by-1 matrix of sparse matrix, stored in column format.

@@ -177,16 +177,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -121,39 +121,41 @@

Generate simulated data

Arguments

-
n
+ + +
n

The number of observations.

-
p
+
p

The number of predictors of interest.

-
support.size
+
support.size

The number of nonzero coefficients in the underlying regression model. Can be omitted if beta is supplied.

-
rho
+
rho

A parameter used to characterize the pairwise correlation in predictors. Default is 0.

-
family
+
family

The distribution of the simulated response. "gaussian" for univariate quantitative response, "binomial" for binary classification response, "poisson" for counting response, "cox" for left-censored response, "mgaussian" for multivariate quantitative response, "multinomial" for multi-classification response, "ordinal" for ordinal response.

-
beta
+
beta

The coefficient values in the underlying regression model. If it is supplied, support.size would be omitted.

-
cortype
+
cortype

The correlation structure. cortype = 1 denotes the independence structure, where the covariance matrix has \((i,j)\) entry equals \(I(i \neq j)\). @@ -164,7 +166,7 @@

Arguments

matrix are \(rho\) and diagonal entries are 1.

-
snr
+
snr

A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as as the variance of \(x\beta\) divided by the variance of a gaussian noise: \(\frac{Var(x\beta)}{\sigma^2}\). @@ -173,17 +175,17 @@

Arguments

Note that this arguments's effect is overridden if sigma is supplied with a non-null value.

-
sigma
+
sigma

The variance of the gaussian noise. Default sigma = NULL implies it is determined by snr.

-
weibull.shape
+
weibull.shape

The shape parameter of the Weibull distribution. It works only when family = "cox". Default: weibull.shape = 1.

-
uniform.max
+
uniform.max

A parameter controlling censored rate. A large value implies a small censored rate; otherwise, a large censored rate. @@ -191,23 +193,21 @@

Arguments

Default is uniform.max = 1.

-
y.dim
+
y.dim

Response's Dimension. It works only when family = "mgaussian". Default: y.dim = 3.

-
class.num
+
class.num

The number of class. It works only when family = "multinomial". Default: class.num = 3.

-
seed
+
seed

random seed. Default: seed = 1.

Value

- - -

A list object comprising:

+

A list object comprising:

x

Design matrix of predictors.

@@ -296,16 +296,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -114,48 +114,48 @@

Generate matrix composed of a sparse matrix and low-rank matrix

Arguments

-
n
+ + +
n

The number of observations.

-
p
+
p

The number of predictors of interest.

-
rank
+
rank

The rank of low-rank matrix.

-
support.size
+
support.size

The number of nonzero coefficients in the underlying regression model. Can be omitted if beta is supplied.

-
beta
+
beta

The coefficient values in the underlying regression model. If it is supplied, support.size would be omitted.

-
snr
+
snr

A positive value controlling the signal-to-noise ratio (SNR). A larger SNR implies the identification of sparse matrix is much easier. Default snr = Inf enforces no noise exists.

-
sigma
-

A numerical value supplied the variance of the gaussian noise. +

sigma
+

A numerical value supplied the variance of the gaussian noise. Default sigma = NULL implies it is determined by snr.

-
seed
+
seed

random seed. Default: seed = 1.

Value

- - -

A list object comprising:

+

A list object comprising:

x

An \(n\)-by-\(p\) matrix.

@@ -221,16 +221,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -113,44 +113,44 @@

Generate matrix with sparse principal component

Arguments

-
n
+ + +
n

The number of observations.

-
p
+
p

The number of predictors of interest.

-
support.size
+
support.size

A integer specify the number of non-zero entries in the first column of loading matrix.

-
snr
+
snr

A positive value controlling the signal-to-noise ratio (SNR). A larger SNR implies the identification of sparse matrix is much easier. Default snr = Inf enforces no noise exists.

-
sigma
+
sigma

A numerical vector with length p specify the standard deviation of each columns. -Default sigma = NULL implies it is determined by snr. +Default sigma = NULL implies it is determined by snr. If it is supplied, support.size would be omit.

-
sparse.loading
-

A p-by-p sparse orthogonal matrix. +

sparse.loading
+

A p-by-p sparse orthogonal matrix. If it is supplied, support.size would be omit.

-
seed
+
seed

random seed. Default: seed = 1.

Value

- - -

A list object comprising:

+

A list object comprising:

x

An \(n\)-by-\(p\) matrix.

@@ -183,16 +183,16 @@

References

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + Package index • abess - +
- +
@@ -208,16 +208,16 @@

Real-world dataset
-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,7 +100,7 @@

Create plot from a fitted “abess” object

-
# S3 method for abess
+    
# S3 method for class 'abess'
 plot(
   x,
   type = c("coef", "l2norm", "dev", "dev.ratio", "tune"),
@@ -111,11 +111,13 @@ 

Create plot from a fitted “abess” object

Arguments

-
x
+ + +
x

A "abess" object.

-
type
+
type

The type of terms to be plot in the y-axis. One of the following: "coef" (i.e., coefficients), "l2norm" (i.e., L2-norm of coefficients), @@ -124,21 +126,19 @@

Arguments

Default is "coef".

-
label
+
label

A logical value. If label = TRUE (the default), label the curves with variable sequence numbers.

-
...
+
...

Other graphical parameters to plot

Value

- - -

No return value, called for side effects.

+

No return value, called for side effects.

Note

@@ -182,16 +182,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,17 +100,19 @@

Create plot from a fitted “abess” object

-
# S3 method for abesspca
+    
# S3 method for class 'abesspca'
 plot(x, type = c("pev", "coef", "tune"), label = FALSE, ...)

Arguments

-
x
+ + +
x

A "abess" object.

-
type
+
type

The type of terms to be plot in the y-axis. One of the following: "pev" (i.e., percent of explained variance), @@ -119,21 +121,19 @@

Arguments

Default is "coef".

-
label
+
label

A logical value. If label = TRUE (the default), label the curves with variable sequence numbers.

-
...
+
...

Other graphical parameters to plot

Value

- - -

No return value, called for side effects.

+

No return value, called for side effects.

Note

@@ -173,16 +173,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -100,17 +100,19 @@

Create plot from a fitted “abessrpca” object

-
# S3 method for abessrpca
+    
# S3 method for class 'abessrpca'
 plot(x, type = c("S", "loss", "tune"), support.size = NULL, label = TRUE, ...)

Arguments

-
x
+ + +
x

A "abessrpca" object.

-
type
+
type

The plot type. One of the following: "S" (i.e., a heatmap for the sparse matrix estimation), @@ -119,30 +121,28 @@

Arguments

Default is "coef".

-
support.size
+
support.size

An integer vector specifies the sparse matrix fitted at given support.size to be returned. -If support.size = NULL, then the sparse matrix with +If support.size = NULL, then the sparse matrix with the least tuning value would be returned. Default: support.size = NULL.

-
label
+
label

A logical value. If label = TRUE (the default), label the curves with variable sequence numbers.

-
...
-

Other graphical parameters to plot +

...
+

Other graphical parameters to plot or stats::heatmap function

Value

- - -

No return value, called for side effects.

+

No return value, called for side effects.

@@ -157,16 +157,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -98,21 +98,23 @@

Make predictions from a fitted "abess" object.

-
# S3 method for abess
+    
# S3 method for class 'abess'
 predict(object, newx, type = c("link", "response"), support.size = NULL, ...)

Arguments

-
object
+ + +
object

An "abess" project.

-
newx
+
newx

New data used for prediction. If omitted, the fitted linear predictors are used.

-
type
+
type

type = "link" gives the linear predictors for "binomial", "poisson" or "cox" models; for "gaussian" models it gives the fitted values. type = "response" gives the fitted probabilities for @@ -120,7 +122,7 @@

Arguments

"cox"; for "gaussian", type = "response" is equivalent to type = "link".

-
support.size
+
support.size

An integer value specifies the model size fitted at given support.size. If support.size = NULL, then the model with @@ -128,15 +130,13 @@

Arguments

Default: support.size = NULL.

-
...
+
...

Additional arguments affecting the predictions produced.

Value

- - -

The object returned depends on the types of family.

+

The object returned depends on the types of family.

See also

@@ -160,16 +160,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -98,29 +98,29 @@

Print method for a fitted "abess" object

-
# S3 method for abess
+    
# S3 method for class 'abess'
 print(x, digits = max(5, getOption("digits") - 5), ...)

Arguments

-
x
+ + +
x

A "abess" object.

-
digits
+
digits

Minimum number of significant digits to be used.

-
...
+
...

additional print arguments

Value

- - -

No return value, called for side effects

+

No return value, called for side effects

Details

@@ -151,16 +151,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -98,29 +98,29 @@

Print method for a fitted "abesspca" object

-
# S3 method for abesspca
+    
# S3 method for class 'abesspca'
 print(x, digits = max(5, getOption("digits") - 5), ...)

Arguments

-
x
+ + +
x

A "abesspca" object.

-
digits
+
digits

Minimum number of significant digits to be used.

-
...
+
...

additional print arguments

Value

- - -

No return value, called for side effects

+

No return value, called for side effects

Details

@@ -148,16 +148,16 @@

See also

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -98,29 +98,29 @@

Print method for a fitted "abessrpca" object

-
# S3 method for abessrpca
+    
# S3 method for class 'abessrpca'
 print(x, digits = max(5, getOption("digits") - 5), ...)

Arguments

-
x
+ + +
x

A "abessrpca" object.

-
digits
+
digits

Minimum number of significant digits to be used.

-
...
+
...

additional print arguments

Value

- - -

No return value, called for side effects

+

No return value, called for side effects

Details

@@ -142,16 +142,16 @@

Details

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- + - +
- +
@@ -132,16 +132,16 @@

References

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.0.

- +