diff --git a/404.html b/404.html index be47648f..142803d0 100644 --- a/404.html +++ b/404.html @@ -27,7 +27,7 @@
../vignettes/v01-abess-guide.Rmd
+ Source: vignettes/v01-abess-guide.Rmd
v01-abess-guide.Rmd
The R package abess implements a polynomial algorithm for solving the best-subset selection problem: \[\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \mathcal{L}_n({\boldsymbol\beta}), \text{ s.t. } \|\boldsymbol\beta\|_0 \leq s,\] where \(\mathcal{L}_{n}(\boldsymbol \beta)=\frac{1}{2n}\|y-X \boldsymbol{\beta}\|_{2}^{2}\), \(\|\boldsymbol{\beta}\|_0=\sum_{i=1}^p I(\boldsymbol{\beta}_i\neq 0)\) is the \(\ell_0\)(-pseudo) norm of \(\boldsymbol\beta\), and the sparsity level \(s\) is usually an unknown non-negative integer. Next, we present an example to show how to use the abess package to solve a simple problem.
We generate a design matrix \(X\) containing 300 observations, and each observation has 1000 predictors. The response variable \(y\) is linearly related to the first, second, and fifth predictors in \(X\): \[y = 3X_1 + 1.5X_2 + 2X_5 + \epsilon,\] where \(\epsilon\) is a standard normal random variable.
library(abess)
synthetic_data <- generate.data(n = 300, p = 1000,
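The generate.data() call above is truncated in this diff. A minimal end-to-end sketch of the workflow it illustrates might look like the following; the beta construction and the downstream abess() call are assumptions, not the vignette's exact code:
library(abess)
# coefficients of the true model: only predictors 1, 2 and 5 are active (assumed reconstruction)
true_coef <- rep(0, 1000)
true_coef[c(1, 2, 5)] <- c(3, 1.5, 2)
synthetic_data <- generate.data(n = 300, p = 1000, beta = true_coef)
# fit the whole best-subset path and inspect the selected model
abess_fit <- abess(synthetic_data[["x"]], synthetic_data[["y"]], family = "gaussian")
best_coef <- extract(abess_fit)[["beta"]]   # sparse p-by-1 coefficient matrix of the selected model
which(best_coef != 0)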
@@ -637,9 +647,7 @@ More features
Site built with pkgdown 2.0.9.
+Site built with pkgdown 2.1.0.
../vignettes/v03-classification.Rmd
+ Source: vignettes/v03-classification.Rmd
v03-classification.Rmd
To arrive at the multinomial logistic model, one can imagine, for \(K\) possible classes, running \(K-1\) independent logistic regression models, in which one class is chosen as a “pivot” and then the other \(K-1\) classes are separately regressed against the pivot outcome. This would proceed as follows, if class \(K\) (the last outcome) is chosen as the pivot: \[\ln(\mathbb{P}(y = 1) / \mathbb{P}(y = K)) = x^\top \boldsymbol\beta^{(1)},\] \[\cdots\cdots\] \[\ln(\mathbb{P}(y = K - 1) / \mathbb{P}(y = K)) = x^\top \boldsymbol\beta^{(K - 1)}.\] Then, the probability of choosing the \(j\)-th class can be easily derived to be \[\mathbb{P}(y = j) = \frac{\exp{(x^\top \boldsymbol\beta^{(j)})}}{1 + \sum_{k=1}^{K-1} \exp{(x^\top \boldsymbol\beta^{(k)})}},\] and subsequently, we would predict the \(j^{*}\)-th class if \(j^* = \arg\max_{j} \mathbb{P}(y = j)\). Notice that, for the \(K\)-class case, there are \(p \times (K - 1)\) unknown parameters \(\boldsymbol\beta^{(1)}, \ldots, \boldsymbol\beta^{(K-1)}\) to be estimated. Because the number of parameters increases with \(K\), it is even more urgent to constrain the model complexity. The best subset selection for multinomial logistic regression aims to maximize the log-likelihood function and control the model complexity by restricting \(B = (\boldsymbol\beta^{(1)}, \ldots, \boldsymbol\beta^{(K-1)})\) with \(\| B \|_{0, 2} \leq s\), where \(\| B \|_{0, 2} = \sum_{i = 1}^{p} I(B_{i\cdot} \neq {\bf 0})\), \(B_{i\cdot}\) is the \(i\)-th row of the coefficient matrix \(B\), and \({\bf 0} \in \mathbb{R}^{K - 1}\) is an all-zero vector. In other words, each row of \(B\) would be either all zero or all non-zero.
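A multinomial fit with abess follows the same pattern as the other families. The snippet below is a hedged sketch on simulated three-class data; the data-generating step and the tuning choice are assumptions, not taken from the vignette:
library(abess)
set.seed(1)
n <- 300; p <- 20
x <- matrix(rnorm(n * p), n, p)
# only the first three predictors drive class membership (assumed toy setup)
eta1 <- x[, 1] - x[, 2]
eta2 <- x[, 2] + x[, 3]
prob <- cbind(1, exp(eta1), exp(eta2))
prob <- prob / rowSums(prob)
y <- apply(prob, 1, function(pr) sample(3, 1, prob = pr))
abess_fit <- abess(x, as.factor(y), family = "multinomial", tune.type = "cv")
# coefficients come as one matrix per support size; each row is either all zero or all non-zero
coef(abess_fit, support.size = 3)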
+ Source: vignettes/v04-PoissonGammaReg.Rmd
v04-PoissonGammaReg.Rmd
The Poisson regression is defined as \[\log(\mathbb{E}(y \mid x)) = x^\top \boldsymbol\beta,\] i.e., the response follows a Poisson distribution whose mean depends log-linearly on the predictors.
We generate some artificial data using the following logic. Consider a dataset containing the information of complaint calls about 100 companies over a period of 10 years. count gives the number of complaints, and the dataset also has other variables like age, sex, job, education, region, and marriage. The generate.data() function allows you to generate simulated data. By specifying support.size = 3, here we set only 3 of the above-mentioned variables to have an effect on the expectation of the response count.
library(abess)
dat <- generate.data(n = 100, p = 6, support.size = 3,family = "poisson")
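Fitting the model on the simulated data then uses the usual abess() call; the brief sketch below is illustrative and the tuning choice is an assumption:
abess_fit <- abess(dat[["x"]], dat[["y"]], family = "poisson", tune.type = "cv")
# coefficients of the model selected by cross-validation; 3 entries should be non-zero
extract(abess_fit)[["beta"]]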
Gamma Regression
Gamma regression can be used when you meet a positive continuous response variable, such as payments for insurance claims or the lifetime of a redundant system. It is well known that the density of the Gamma distribution can be represented as a function of a mean parameter (\(\mu\)) and a shape parameter (\(\alpha\)); specifically, \[f(y \mid \mu, \alpha)=\frac{1}{y \Gamma(\alpha)}\left(\frac{\alpha y}{\mu}\right)^{\alpha} e^{-\alpha y / \mu} {I}_{(0, \infty)}(y),\] where \(I(\cdot)\) denotes the indicator function. In the Gamma regression model, the response variable is assumed to follow a Gamma distribution whose mean \(\mu_i\) is linked to the predictors via \(1/\mu_i = x_i^T\beta\).
We apply the above procedure for gamma regression simply by changing
family = "poisson"
to family = "gamma"
. This
time we consider the response variables as (continuous) levels of
@@ -449,9 +455,7 @@
Gamma Regression
../vignettes/v05-coxreg.Rmd
+ Source: vignettes/v05-coxreg.Rmd
v05-coxreg.Rmd
Consider two cases \(i\) and \(i'\) that have different \(x\) values. Their hazard functions can be simply written as follows: \[h_i(t) = h_0(t)\exp(\eta_i)\] and \[h_{i'}(t) = h_0(t)\exp(\eta_{i'}).\] The hazard ratio for these two cases is \[\begin{align*} \frac{h_i(t)}{h_{i'}(t)} & = \frac{h_0(t)\exp(\eta_i)}{h_0(t)\exp(\eta_{i'})} \\ & = \frac{\exp(\eta_i)}{\exp(\eta_{i'})}, \end{align*}\] which is independent of time.
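A best-subset Cox fit with abess uses the same interface. The sketch below is hedged: the generate.data arguments are assumptions used only for illustration.
library(abess)
dat <- generate.data(n = 200, p = 50, support.size = 5, family = "cox")
# for the "cox" family, y is typically a two-column matrix holding time and status
abess_fit <- abess(dat[["x"]], dat[["y"]], family = "cox")
extract(abess_fit)[["beta"]]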
../vignettes/v06-MultiTaskLearning.Rmd
+ Source: vignettes/v06-MultiTaskLearning.Rmd
v06-MultiTaskLearning.Rmd
Due to the Occam’s razor principle or the high-dimensionality of predictors, it is meaningful to use a small number of predictors to conduct multi-task learning. For example, understanding the relationship
plot(abess_fit)
../vignettes/v07-advancedFeatures.Rmd
+ Source: vignettes/v07-advancedFeatures.Rmd
v07-advancedFeatures.Rmd
In some cases, especially under a low signal-to-noise ratio (SNR) setting or when predictors are highly correlated, the vanilla type of \(L_0\)-constrained model may not be satisfying and a more sophisticated trade-off between bias and variance is needed. Under this concern, the abess package provides the option of best subset selection with \(L_2\) norm regularization, called the regularized best subset selection (BESS). The model has the following form: \[\begin{align} \min_{\boldsymbol\beta} \mathcal{L}_n({\boldsymbol\beta}) + \lambda \|\boldsymbol\beta\|_2^2. \end{align}\]
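In the R interface this regularization is switched on through the lambda argument of abess(); the following is a minimal sketch in which the simulated data and the value of lambda are assumptions:
library(abess)
dat <- generate.data(n = 100, p = 200, support.size = 5, rho = 0.6)
# lambda > 0 adds the ridge-type penalty on top of the cardinality constraint
abess_fit <- abess(dat[["x"]], dat[["y"]], family = "gaussian", lambda = 0.1)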
Group linear model is a linear model in which the \(p\) predictors are separated into \(J\) pre-determined, non-overlapping groups: \[y = \sum_{j=1}^{J} X_{G_j} \boldsymbol\beta_{G_j} + \epsilon,\] where \(X_{G_j}\) collects the predictors in the \(j\)-th group and \(\boldsymbol\beta_{G_j}\) is the corresponding coefficient sub-vector.
Best group subset selection (BGSS) aims to choose a small number of groups to achieve the best interpretability on the response variable. The BGSS can be achieved by solving: \[\min_{\boldsymbol\beta} \mathcal{L}_n(\boldsymbol\beta), \text{ s.t. } \|\boldsymbol\beta\|_{0,2} \leq s,\] where the \(\ell_{0,2}\)(-pseudo) norm is defined as \(\|\boldsymbol\beta\|_{0,2} = \sum_{j=1}^{J} I(\|\boldsymbol\beta_{G_j}\|_2 \neq 0)\), and the model size \(s\) is a positive integer to be determined from the data. Despite the NP-hardness of this problem, Zhang et al develop a certifiably polynomial algorithm that solves it with high probability. This algorithm is integrated in the abess package, and users can handily select the best group subset by assigning a proper value to the group.index argument in the abess() function.
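For instance, with groups of two adjacent variables each, the call might look like the following sketch; the data layout and group sizes are assumed for illustration:
library(abess)
dat <- generate.data(n = 100, p = 20, support.size = 4)
group_id <- rep(1:10, each = 2)   # variables 1-2 form group 1, variables 3-4 form group 2, and so on
abess_fit <- abess(dat[["x"]], dat[["y"]], group.index = group_id)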
../vignettes/v08-sPCA.Rmd
+ Source: vignettes/v08-sPCA.Rmd
v08-sPCA.Rmd
where \(\Sigma = X^TX / (n-1)\) and \(X\) is the centered sample matrix. We also denote that \(X\) is an \(n\times p\) matrix, where each row is an observation and each column is a variable. Then, before further analysis, we can project \(X\) to \(v\) (thus achieving dimension reduction), without losing too much information.
However, consider that:
The PC is a linear combination of all primary variables (\(Xv\)), but sometimes we may prefer to use fewer variables for clearer interpretation (and less computational complexity);
It has been proved that if \(p/n\) does not converge to \(0\), the classical PCA is not consistent, but this would happen in some high-dimensional data analyses.
For example, in gene analysis, the dataset may contain plenty of genes (variables) and we would like to find a subset of them, which can
where \(s\) is a non-negative integer, which indicates how many primary variables are used in the principal component. With abessPCA, we can search for the best subset of variables to form the principal component, and it retains consistency even under \(p \gg n\). And we make two remarks:
In the next section, we will show how to form abessPCA in our frame.
@@ -218,7 +235,8 @@Next, we turn to fit abessPCA. For fitting the model, we can give -either predictor matrix \(X\):
+either predictor matrix +:##
@@ -288,7 +306,9 @@ Adaptive best subset selection f
Interpreting result
After fitting abessPCA, we study the percentage of explained variance
-as \(s\) increases:
+as
+
+increases:
plot(best_pca[["support.size"]], best_pca[["pev"]], type = "l")
Group variable
In some cases, some variables may need to be considered together; that is, they should be “used” or “unused” for the PC at the same time, which we call “group information”. The optimization problem becomes: \[\max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ \sum_{g=1}^G I(\|v_g\|\neq 0)\leq s,\] where we suppose there are \(G\) groups, and the \(g\)-th one corresponds to \(v_g\), \(v = [v_1^T,v_2^T,\cdots,v_G^T]^T\). Then we are interested in finding \(s\) (or fewer) important groups.
The group problem is extraordinarily important in real data analysis. Still taking gene analysis as an example, several sites would be related to one pathway, and it is meaningless to consider each of them alone.
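With abesspca(), group information enters through the group.index argument; the minimal sketch below uses the USArrests dataset that appears elsewhere in these examples, and the grouping itself is an assumption for illustration:
library(abess)
# USArrests has 4 variables; suppose the first two form one group and the last two another
best_group_pca <- abesspca(USArrests, group.index = c(1, 1, 2, 2), support.size = 1)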
Multiple principal components
\[\max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ \|v\|_0\leq s,\] where \(\Sigma\) is the covariance matrix and \(v\) is its (sparse) loading vector. We map it into \(\Sigma'\), which indicates the orthogonal space of \(v\), and then solve the sparse principal component for \(\Sigma'\) again. By this iteration process, we can acquire multiple principal components, and they are sorted from the largest to the smallest. In our program, there is an additional argument sparse.type to support this feature. By setting sparse.type = "kpc", best subset selection performs on the first \(K\) principal components, where \(K\) is decided by the argument support.size.
Suppose we are interested in the first two principal components, and the support size is 50 in the first loading vector and 40 in the second loading vector. In other words, we consecutively solve two problems: \[v_1 \leftarrow \arg\max_{v} v^{\top}\Sigma v,\qquad s.t.\quad v^{\top}v=1,\ \|v\|_0\leq 50,\] \[v_2 \leftarrow \arg\max_{v} v^{\top} \Sigma^{\prime} v,\qquad s.t.\quad v^{\top}v=1,\ \|v\|_0\leq 40,\] where \(\Sigma^{\prime} = (1-v_1 v_1^\top)\Sigma(1-v_1 v_1^\top)\). The \((v_1, v_2)\) forms a sparse loading matrix.
The code for solving the two problems is:
best_kpca <- abesspca(X, support.size = c(50, 40), sparse.type = "kpc")
@@ -558,16 +590,15 @@ Multiple principal components## - attr(*, "class")= chr "abesspca"
The result best_kpca[["pev"]] shows that the two principal components raised from the two loading vectors could explain 40% of the variance of all variables (i.e., \(\text{trace}(\Sigma)\)).
../vignettes/v09-fasterSetting.Rmd
+ Source: vignettes/v09-fasterSetting.Rmd
v09-fasterSetting.Rmd
We sometimes meet with problems where the \(N \times p\) input matrix \(X\) is extremely sparse, i.e., many entries in \(X\) have zero values. A notable example comes from document classification: aiming to assign classes to a document, making it easier to manage for publishers and news sites. The input variables for characterizing documents are generated from a so-called “bag-of-words” model. In this model, each variable is scored for the presence of each of the words in the entire dictionary under consideration. Since most words are absent, the input variables for each document are mostly zero, and so the entire matrix is mostly zero. Such sparse matrices can be efficiently stored in R in a sparse column format via the Matrix package. And the sparse matrix can be directly used by our abess package for boosting computational efficiency.
The ABESS algorithm is ideally set up to exploit such sparsity. The \(O(N)\) inner-product operations when computing the forward sacrifice can exploit the sparsity by summing over only the non-zero entries. For computing the backward sacrifice, the sparsity also facilitates solving the convex optimization under a given support set. The following example demonstrates the efficiency gain from the sparse matrix. We first generate an input matrix whose 90% entries are 0.
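The generating code is not shown in this diff; a hedged sketch of what it could look like with the Matrix package is:
library(abess)
library(Matrix)
set.seed(1)
n <- 1000; p <- 500
x_dense <- matrix(rnorm(n * p), n, p)
x_dense[sample(n * p, 0.9 * n * p)] <- 0        # make 90% of the entries zero
x_sparse <- Matrix(x_dense, sparse = TRUE)      # dgCMatrix storage
y <- as.vector(x_dense[, 1:3] %*% c(1, -1, 1) + rnorm(n))
t1 <- system.time(abess(x_dense, y))
t2 <- system.time(abess(x_sparse, y))
rbind(t1, t2)[, 1:3]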
## Loading required package: Matrix
@@ -234,8 +237,8 @@ ## user.self sys.self elapsed
-## t1 0.215 0.001 0.216
-## t2 0.169 0.007 0.176
+## t1 0.140 0.002 0.141
+## t2 0.116 0.002 0.118
From the comparison, we see that the time required by the sparse matrix is visibly smaller, and thus, we suggest assigning a sparse matrix to abess when the input matrix has a lot of zero entries.
The following is a typical “model size v.s. BGIC” plot.
The \(x\)-axis is the model size, and the \(y\)-axis is BGIC’s value recorded in the group splicing algorithm for a linear model. The entries of the design matrix \(X\) are i.i.d. sampled from \(\mathcal{N}(0, 1)\), and the matrix shape is \(100 \times 200\). The error terms \(\varepsilon\) are i.i.d. \(\mathcal{N}(0, \frac{1}{2})\). Take the two adjacent variables as one group, and set the true coefficients \(\beta=(1, 1, 1, 1, -1, 1, -1, -1, -1, -1, 0, \ldots, 0)\). The orange vertical dashed line indicates the true group subset size.
From this figure, we see that the BGIC decreases from \(T=1\) to \(T=5\), but it increases as \(T\) becomes larger than \(5\). In other words, the BGIC path of the SGSplicing algorithm is a strictly unimodal function achieving its minimum at the true group subset size \(T = 5\). Motivated by this observation, we suggest recruiting a heuristic search based on the golden-section search technique, an efficient method for finding the extremum of a unimodal function, to determine the support size that minimizes BGIC. Compared with searching the optimal support size one by one from a candidate set with \(O(s_{\max})\) complexity, golden-section search reduces the time complexity to \(O(\ln{(s_{\max})})\), giving a significant computational improvement.
The code below exhibits how to employ the golden-section search technique with the abess package:
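The first timing, t1, presumably comes from a golden-section call such as the following; this is a reconstruction and the exact arguments are assumptions:
t1 <- system.time(abess_fit <- abess(y ~ ., data = dat, tune.path = "gsection"))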
t2 <- system.time(abess_fit <- abess(y ~ ., data = dat))
rbind(t1, t2)[, 1:3]
## user.self sys.self elapsed
-## t1 0.011 0.001 0.012
-## t2 0.041 0.001 0.041
+## t1 0.007 0 0.007
+## t2 0.022 0 0.022
## user.self sys.self elapsed
-## t1 0.042 0.000 0.043
-## t2 0.040 0.001 0.041
+## t1 0.022 0.001 0.022
+## t2 0.021 0.001 0.022
we can conclude that early stopping brings fast computation and might maintain the statistical guarantee.
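The timings above compare runs with and without early stopping; switching it on is a single argument, as in this sketch with assumed simulated data:
dat <- generate.data(n = 500, p = 2000, support.size = 10)
t1 <- system.time(abess(dat[["x"]], dat[["y"]], early.stop = TRUE))
t2 <- system.time(abess(dat[["x"]], dat[["y"]]))
rbind(t1, t2)[, 1:3]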
../vignettes/v10-algorithm.Rmd
+ Source: vignettes/v10-algorithm.Rmd
v10-algorithm.Rmd
Consider the \(\ell_{0}\) constraint minimization problem, \[\min _{\boldsymbol{\beta}} \mathcal{L}_{n}(\boldsymbol{\beta}), \quad \text { s.t. }\|\boldsymbol{\beta}\|_{0} \leq s,\] where \(\mathcal{L}_{n}(\boldsymbol\beta)=\frac{1}{2n}\|y-X \boldsymbol\beta\|_{2}^{2}\). Without loss of generality, we consider \(\|\boldsymbol{\beta}\|_{0}=s\). Given any initial set \(\mathcal{A} \subset \mathcal{S}=\{1,2, \ldots, p\}\) with cardinality \(|\mathcal{A}|=s\), denote \(\mathcal{I}=\mathcal{A}^{\mathrm{c}}\) and compute \[\hat{\boldsymbol{\beta}}=\arg \min _{\boldsymbol{\beta}_{\mathcal{I}}=0} \mathcal{L}_{n}(\boldsymbol{\beta}).\] We call \(\mathcal{A}\) and \(\mathcal{I}\) the active set and the inactive set, respectively.
Given the active set \(\mathcal{A}\) and \(\hat{\boldsymbol{\beta}}\), we can define the following two types of sacrifices:
Unfortunately, it is noteworthy that these two sacrifices are incomparable because they have different sizes of support set. However, if we exchange some “irrelevant” variables in \(\mathcal{A}\) and some “important” variables in \(\mathcal{I}\), it may result in a higher-quality solution. This intuition motivates our splicing method. Specifically, given any splicing size \(k \leq s\), define \[\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\}\] to represent the \(k\) least relevant variables in \(\mathcal{A}\), and \[\mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}\] to represent the \(k\) most relevant variables in \(\mathcal{I}\). Then, we splice \(\mathcal{A}\) and \(\mathcal{I}\) by exchanging \(\mathcal{A}_{k}\) and \(\mathcal{I}_{k}\) and obtain a new active set \[\tilde{\mathcal{A}}=\left(\mathcal{A} \backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}.\] Let \(\tilde{\mathcal{I}}=\tilde{\mathcal{A}}^{c}\), \(\tilde{\boldsymbol{\beta}}=\arg \min _{\boldsymbol{\beta}_{\tilde{\mathcal{I}}}=0} \mathcal{L}_{n}(\boldsymbol{\beta})\), and let \(\tau_{s}>0\) be a threshold. If \(\tau_{s}< \mathcal{L}_{n}(\hat{\boldsymbol\beta})-\mathcal{L}_{n}(\tilde{\boldsymbol\beta})\), then \(\tilde{\mathcal{A}}\) is preferable to \(\mathcal{A}\). The active set can be updated iteratively until the loss function cannot be improved by splicing. Once the algorithm recovers the true active set, we may splice some irrelevant variables, and then the loss function may decrease slightly. The threshold \(\tau_{s}\) can reduce this unnecessary calculation. Typically, \(\tau_{s}\) is relatively small, e.g. \(\tau_{s}=0.01 s \log (p) \log (\log n) / n.\)
Initialize \(\mathcal{A}^{0}\) and \(\mathcal{I}^{0}=(\mathcal{A}^{0})^{c}\), and compute \[\begin{align*} &\boldsymbol{\beta}_{\mathcal{A}^{0}}^{0}=\left(\boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{X}_{\mathcal{A}^{0}}\right)^{-1} \boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{y},\\ &d_{\mathcal{I}^{0}}^{0}=X_{\mathcal{I}^{0}}^{\top}\left(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta}^{0}\right). \end{align*}\]
For \(m=0,1, \ldots, m_{\max}\), do
\[\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)= \text{Splicing} \left(\boldsymbol{\beta}^{m}, \boldsymbol{d}^{m}, \mathcal{A}^{m}, \mathcal{I}^{m}, k_{\max }, \tau_{s}\right).\]
If \(\left(\mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)=\left(\mathcal{A}^{m}, \mathcal{I}^{m}\right)\), then stop.
End For
Output \((\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right).\)
Input: \(\boldsymbol{\beta}, \boldsymbol{d}, \mathcal{A}, \mathcal{I}, k_{\max }\), and \(\tau_{s}\).
Initialize \(L_{0}=L=\frac{1}{2 n}\|y-X \beta\|_{2}^{2}\), and set \(\xi_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\beta_{j}\right)^{2}, \ \zeta_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{d_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2}, \ j=1, \ldots, p.\)
For \(k=1,2, \ldots, k_{\max }\), do
\[\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\}\]
\[\mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}\]
Let \(\tilde{\mathcal{A}}_{k}=\left(\mathcal{A} \backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}, \ \tilde{\mathcal{I}}_{k}=\left(\mathcal{I} \backslash \mathcal{I}_{k}\right) \cup \mathcal{A}_{k}\) and solve
\[\tilde{\boldsymbol{\beta}}_{{\mathcal{A}}_{k}}=\left(\boldsymbol{X}_{\mathcal{A}_{k}}^{\top} \boldsymbol{X}_{{\mathcal{A}}_{k}}\right)^{-1} \boldsymbol{X}_{{\mathcal{A}_{k}}}^{\top} y, \quad \tilde{\boldsymbol{\beta}}_{{\mathcal{I}}_{k}}=0\]
\[\tilde{\boldsymbol d}_{\mathcal{I}^k}=X_{\mathcal{I}^k}^{\top}(y-X \tilde{\boldsymbol\beta}) / n,\quad \tilde{\boldsymbol d}_{\mathcal{A}^k} = 0.\]
Compute \(\mathcal{L}_{n}(\tilde{\boldsymbol\beta})=\frac{1}{2 n}\|y-X \tilde{\boldsymbol\beta}\|_{2}^{2}.\)
If \(L>\mathcal{L}_{n}(\tilde{\boldsymbol\beta})\), then
\[(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\tilde{\boldsymbol{\beta}}, \tilde{\boldsymbol{d}}, \tilde{\mathcal{A}}_{k}, \tilde{\mathcal{I}}_{k}\right)\]
\[L=\mathcal{L}_{n}(\tilde{\boldsymbol\beta}).\]
End for
If \(L_{0}-L<\tau_{s}\), then \((\hat{\boldsymbol\beta}, \hat{d}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=(\boldsymbol\beta, d, \mathcal{A}, \mathcal{I}).\)
Output \((\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})\).
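To make the splicing step concrete, here is a compact R sketch of one pass of the algorithm for least squares, written directly from the formulas above. It is an illustrative toy (the function name, variable names, and the single-pass structure are my own), not the optimized implementation inside abess.
splice_once <- function(X, y, active, k.max = 2, tau = 0.01) {
  n <- nrow(X); p <- ncol(X)
  inactive <- setdiff(seq_len(p), active)
  fit_loss <- function(A) {
    beta <- rep(0, p)
    beta[A] <- solve(crossprod(X[, A, drop = FALSE]),
                     crossprod(X[, A, drop = FALSE], y))
    list(beta = beta, loss = sum((y - X %*% beta)^2) / (2 * n))
  }
  cur <- fit_loss(active)
  d <- as.vector(crossprod(X, y - X %*% cur$beta)) / n
  xtx <- colSums(X^2)
  # backward sacrifice xi_j (active set) and forward sacrifice zeta_j (inactive set)
  xi   <- xtx[active] / (2 * n) * cur$beta[active]^2
  zeta <- xtx[inactive] / (2 * n) * (d[inactive] / (xtx[inactive] / n))^2
  for (k in seq_len(min(k.max, length(active), length(inactive)))) {
    A_k <- active[order(xi)[seq_len(k)]]                           # k least relevant active variables
    I_k <- inactive[order(zeta, decreasing = TRUE)[seq_len(k)]]    # k most relevant inactive variables
    new_active <- union(setdiff(active, A_k), I_k)
    cand <- fit_loss(new_active)
    if (cur$loss - cand$loss > tau) return(new_active)             # accept the splice if the loss drops enough
  }
  active                                                           # otherwise keep the current active set
}
A full implementation would iterate this splice until the active set stabilizes and embed it in the support-size loop driven by SIC, as described next.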
In practice, the support size is usually unknown. We use a data-driven procedure to determine \(s\). For any active set \(\mathcal{A}\), define an \(\mathrm{SIC}\) as follows: \[\operatorname{SIC}(\mathcal{A})=n \log \mathcal{L}_{\mathcal{A}}+|\mathcal{A}| \log (p) \log \log n,\] where \(\mathcal{L}_{\mathcal{A}}=\min _{\beta_{\mathcal{I}}=0} \mathcal{L}_{n}(\beta), \ \mathcal{I}=(\mathcal{A})^{c}\). To identify the true model, the model complexity penalty is \(\log p\) and the slowly diverging rate \(\log \log n\) is set to prevent underfitting. Theorem 4 states that the following ABESS algorithm selects the true support size via SIC.
Let \(s_{\max }\) be the maximum support size. We suggest \(s_{\max }=o\left(\frac{n}{\log p}\right)\) as the maximum possible recovery size. Typically, we set \(s_{\max }=\left[\frac{n}{\log (p) \log \log n}\right]\), where \([x]\) denotes the integer part of \(x\).
Input: \(X, y\), and the maximum support size \(s_{\max }\).
For \(s=1,2, \ldots, s_{\max }\), do
\[\left(\hat{\boldsymbol{\beta}}_{s}, \hat{\boldsymbol{d}}_{s}, \hat{\mathcal{A}}_{s}, \hat{\mathcal{I}}_{s}\right)= \text{BESS.Fixed}(s).\]
End for
Compute the minimum of SIC:
\[s_{\min }=\arg \min _{s} \operatorname{SIC}\left(\hat{\mathcal{A}}_{s}\right).\]
Output \(\left(\hat{\boldsymbol{\beta}}_{s_{\min}}, \hat{\boldsymbol{d}}_{s_{\min }}, \hat{\mathcal{A}}_{s_{\min }}, \hat{\mathcal{I}}_{s_{\min }}\right).\)
+ Source: vignettes/v11-power-of-abess.Rmd
v11-power-of-abess.Rmd
The design matrix is formed by i.i.d. samples generated from a multivariate normal distribution with mean 0 and covariance matrix \(\Sigma = (\sigma_{ij})\). We consider two settings—low correlation and high correlation of the predictors. For the low correlation scenario, we set \(\sigma_{ij} = 0.1^{|i-j|}\), and for the high correlation, \(\sigma_{ij} = 0.7\). The number of predictors is 1000. The true coefficient \(\beta^*\) is a vector with 10 nonzero entries uniformly distributed in \([b,B]\). We set \(b=5\sqrt{2\log(p)/n}\), \(B = 100b\) for linear regression; \(b = 10\sqrt{2\log(p)/n}\), \(B = 5b\) for logistic regression; and \(b = -10 \sqrt{2 \log(p) / n}\), \(B=10 \sqrt{2 \log(p) / n}\) for Poisson regression. A random noise generated from a standard Gaussian distribution is added to the linear predictor \(x^\top\beta\) for linear regression. The size of the training data is 500 for linear regression and 1000 for logistic regression.
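A rough way to reproduce this kind of design with generate.data() is sketched below; the cortype values and the other arguments are assumptions based on the documented options, not the exact benchmark script:
library(abess)
# low-correlation design: exponential-decay correlation 0.1^{|i-j|} (assumed cortype mapping)
dat_low  <- generate.data(n = 500, p = 1000, support.size = 10, rho = 0.1, cortype = 2)
# high-correlation design: constant correlation 0.7 between predictors (assumed cortype mapping)
dat_high <- generate.data(n = 500, p = 1000, support.size = 10, rho = 0.7, cortype = 3)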
../vignettes/v12-Robust-Principal-Component-Analysis.Rmd
+ Source: vignettes/v12-Robust-Principal-Component-Analysis.Rmd
v12-Robust-Principal-Component-Analysis.Rmd
Principal component analysis (PCA) is an important method in the field of data science, which can reduce the dimension of data and simplify our model. It solves an optimization problem like:
-\[ + where \(\Sigma = X^TX/(n-1)\) -and \(X\in \mathbb{R}^{n\times p}\) is -the centered sample matrix with each row containing one observation of -\(p\) variables.
+ where + +and + +is the centered sample matrix with each row containing one observation +of + +variables.In mathematics, RPCA manages to divide the sample matrix \(X\) into two parts: \[
+ In mathematics, RPCA manages to divide the sample matrix
+
+into two parts:
+
In Lagrange format, \[ -\min _{S, L}\|X-S-L\|_{F} \leq \varepsilon, s . t . \quad -\operatorname{rank}(L)=r,\|S\|_{0} \leq s -\] where \(s\) is the sparsity -of \(S\). After RPCA, the information -matrix \(L\) can be used in further -analysis.
-Note that it does NOT deal with “noise”, which may stay in \(L\) and need further procession.
+In Lagrange format, + where + +is the sparsity of +. +After RPCA, the information matrix + +can be used in further analysis.
+Note that it does NOT deal with “noise”, which may stay in + +and need further procession.
To solve its sub-problem, RPCA with known outlier positions, we follow a process called “Hard Impute”. The main idea is to estimate the -outlier values by precise values with KPCA, where \(K=r\).
+outlier values by precise values with KPCA, where +.Here are the steps:
outliers
,
-\(M, \varepsilon\), where outliers
-record the non-zero positions in \(S\);outliers
,
+,
+where outliers record the non-zero positions in
+;The final \(X_{\text {new }}\) is -supposed to be \(L\) under given -outlier positions.
+The final + +is supposed to be + +under given outlier positions.
Site built with pkgdown 2.0.9.
+Site built with pkgdown 2.1.0.
- + - + - +Zezhi Wang. Author. +
Zezhi Wang. Author.
Liyuan Hu. Author. +
Liyuan Hu. Author.
Junhao Huang. Author. +
Junhao Huang. Author.
Kangkang Jiang. Author. +
Kangkang Jiang. Author.
Yanhang Zhang. Author. +
Yanhang Zhang. Author.
Borui Tang. Author. +
Borui Tang. Author.
Shiyun Lin. Author. +
Shiyun Lin. Author.
Junxian Zhu. Author. +
Junxian Zhu. Author.
Canhong Wen. Author. +
Canhong Wen. Author.
spectra contributors. Copyright holder. +
spectra contributors. Copyright holder.
Spectra implementation
Zhu J, Wen C, Zhu J, Zhang H, Wang X (2020). “A polynomial algorithm for best-subset selection problem.” Proceedings of the National Academy of Sciences, 117(52), 33117–33123. doi:10.1073/pnas.2014241117.
@Article{, title = {A polynomial algorithm for best-subset selection problem},
Zhu J, Wang X, Hu L, Huang J, Jiang K, Zhang Y, Lin S, Zhu J (2022). “abess: A Fast Best Subset Selection Library in Python and R.” Journal of Machine Learning Research, 23(202), 1–7. https://www.jmlr.org/papers/v23/21-1060.html.
@Article{, title = {abess: A Fast Best Subset Selection Library in Python and R},
Zhang Y, Zhu J, Zhu J, Wang X (2023). “A Splicing Approach to Best Subset of Groups Selection.” INFORMS Journal on Computing, 35(1), 104–119. doi:10.1287/ijoc.2022.1241.
@Article{, title = {A Splicing Approach to Best Subset of Groups Selection},
- + - +- + - + - +@@ -163,16 +163,16 @@abess 0.1.02
- + - + - + - + - +@@ -102,7 +102,7 @@Adaptive best subset selection (for generalized linear model)
-- + - +# S3 method for default +
# Default S3 method abess( x, y, @@ -140,19 +140,21 @@
Adaptive best subset selection (for generalized linear model)
... ) -# S3 method for formula +# S3 method for class 'formula' abess(formula, data, subset, na.action, ...)Arguments
-
- x
+ + +
- x
- -
Input matrix, of dimension \(n \times p\); each row is an observation vector and each column is a predictor/feature/variable. Can be in sparse matrix format (inherit from class
"dgCMatrix"
in packageMatrix
).- y
+- y
- -
The response variable, of
n
observations. Forfamily = "binomial"
should have two levels. Forfamily="poisson"
,y
should be a vector with positive integer. @@ -165,7 +167,7 @@Arguments
if y is presented as a numerical vector, it will be coerced into a factor.- family
+- family
- -
One of the following models:
"gaussian"
(continuous response),"binomial"
(binary response), @@ -178,25 +180,25 @@Arguments
Depending on the response. Any unambiguous substring can be given.- tune.path
+- tune.path
- -
The method to be used to select the optimal support size. For
tune.path = "sequence"
, we solve the best subset selection problem for each size insupport.size
. Fortune.path = "gsection"
, we solve the best subset selection problem with support size ranged ings.range
, where the specific support size to be considered is determined by golden section.- tune.type
+- tune.type
- -
The type of criterion for choosing the support size. Available options are
"gic"
,"ebic"
,"bic"
,"aic"
and"cv"
. Default is"gic"
.- weight
+- weight
- -
Observation weights. When
weight = NULL
, we setweight = 1
for each observation as default.- normalize
+- normalize
- -
Options for normalization.
normalize = 0
for no normalization.normalize = 1
for subtracting the means of the columns ofx
andy
, and also @@ -208,44 +210,44 @@Arguments
3
for"cox"
. Default isnormalize = NULL
.- fit.intercept
+- fit.intercept
- -
A boolean value indicating whether to fit an intercept. We assume the data has been centered if
fit.intercept = FALSE
. Default:fit.intercept = FALSE
.- beta.low
+- beta.low
- -
A single value specifying the lower bound of \(\beta\). Default is
-.Machine$double.xmax
.- beta.high
+- beta.high
- -
A single value specifying the upper bound of \(\beta\). Default is
.Machine$double.xmax
.- c.max
+- c.max
- -
an integer splicing size. Default is:
c.max = 2
.- support.size
+- support.size
- -
An integer vector representing the alternative support sizes. Only used for
tune.path = "sequence"
. Default is0:min(n, round(n/(log(log(n))log(p))))
.- gs.range
+- gs.range
- -
A integer vector with two elements. The first element is the minimum model size considered by golden-section, the later one is the maximum one. Default is
gs.range = c(1, min(n, round(n/(log(log(n))log(p)))))
.- lambda
+- lambda
- -
A single lambda value for regularized best subset selection. Default is 0.
- always.include
+- always.include
- -
An integer vector containing the indexes of variables that should always be included in the model.
- group.index
+- group.index
- -
A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of
x
and their corresponding index ingroup.index
should be the same. @@ -254,12 +256,12 @@Arguments
please setgroup.index = NULL
(the default).- init.active.set
+- init.active.set
- -
A vector of integers indicating the initial active set. Default:
init.active.set = NULL
.- splicing.type
+- splicing.type
- -
Optional type for splicing. If
splicing.type = 1
, the number of variables to be spliced isc.max
, ...,1
; ifsplicing.type = 2
, @@ -267,45 +269,45 @@Arguments
(Default:splicing.type = 2
.)- max.splicing.iter
+- max.splicing.iter
- -
The maximum number of performing splicing algorithm. In most of the case, only a few times of splicing iteration can guarantee the convergence. Default is
max.splicing.iter = 20
.- screening.num
+- screening.num
- -
An integer number. Preserve
screening.num
number of predictors with the largest marginal maximum likelihood estimator before running algorithm.- important.search
+- important.search
- -
An integer number indicating the number of important variables to be splicing. When
important.search
\(\ll\)p
variables, it would greatly reduce runtimes. Default:important.search = 128
.- warm.start
+- warm.start
- -
Whether to use the last solution as a warm start. Default is
warm.start = TRUE
.- nfolds
+- nfolds
- -
The number of folds in cross-validation. Default is
nfolds = 5
.- foldid
+- foldid
- -
an optional integer vector of values between 1, ..., nfolds identifying what fold each observation is in. The default
foldid = NULL
would generate a random foldid.- cov.update
+- cov.update
- -
A logical value only used for
family = "gaussian"
. Ifcov.update = TRUE
, use a covariance-based implementation; otherwise, a naive implementation. The naive method is more computational efficient than covariance-based method when \(p >> n\) andimportant.search
is much large than its default value. Default:cov.update = FALSE
.- newton
+- newton
- -
A character specify the Newton's method for fitting generalized linear models, it should be either
newton = "exact"
ornewton = "approx"
. Ifnewton = "exact"
, then the exact hessian is used, @@ -313,58 +315,58 @@Arguments
and can be faster (especially whenfamily = "cox"
).- newton.thresh
+- newton.thresh
- -
a numeric value for controlling positive convergence tolerance. The Newton's iterations converge when \(|dev - dev_{old}|/(|dev| + 0.1)<\)
newton.thresh
.- max.newton.iter
+- max.newton.iter
- -
a integer giving the maximal number of Newton's iteration iterations. Default is
max.newton.iter = 10
ifnewton = "exact"
, andmax.newton.iter = 60
ifnewton = "approx"
.- early.stop
+- early.stop
- -
A boolean value decide whether early stopping. If
early.stop = TRUE
, algorithm will stop if the last tuning value less than the existing one. Default:early.stop = FALSE
.- ic.scale
+- ic.scale
- -
A non-negative value used for multiplying the penalty term in information criterion. Default:
ic.scale = 1
.- num.threads
+- num.threads
- -
An integer decide the number of threads to be concurrently used for cross-validation (i.e.,
tune.type = "cv"
). Ifnum.threads = 0
, then all of available cores will be used. Default:num.threads = 0
.- seed
+- seed
- -
Seed to be used to divide the sample into cross-validation folds. Default is
seed = 1
.- ...
+- ...
- -
further arguments to be passed to or from methods.
- formula
+- formula
- -
an object of class "
formula
": a symbolic description of the model to be fitted. The details of model specification are given in the "Details" section of "formula
".- data
+- data
- -
a data frame containing the variables in the
formula
.- subset
+- subset
- -
an optional vector specifying a subset of observations to be used.
- na.action
+- na.action
- @@ -372,9 +374,7 @@
a function which indicates what should happen when the data contain
NA
s. Defaults togetOption("na.action")
.Arguments
Value
- - -A S3
+abess
class object, which is alist
with the following components:A S3
abess
class object, which is alist
with the following components:
- beta
- @@ -866,16 +866,16 @@
A \(p\)-by-
length(support.size)
matrix of coefficients for univariate family, stored in column format; while a list oflength(support.size)
coefficients matrix (with size \(p\)-by-ncol(y)
) for multivariate family.Examples
- +@@ -125,7 +125,9 @@- + - +Adaptive best subset selection for principal component analysis
Arguments
-
- x
+ + +
- x
- -
A matrix object. It can be either a predictor matrix where each row is an observation and each column is a predictor or a sample covariance/correlation matrix. @@ -133,29 +135,29 @@
Arguments
(inherit from class"dgCMatrix"
in packageMatrix
).- type
+- type
- -
If
type = "predictor"
,x
is considered as the predictor matrix. Iftype = "gram"
,x
is considered as a sample covariance or correlation matrix.- sparse.type
+- sparse.type
- -
If
sparse.type = "fpc"
, then best subset selection performs on the first principal component; Ifsparse.type = "kpc"
, then best subset selection would be sequentially performed on the firstkpc.num
number of principal components. Ifkpc.num
is supplied, the default issparse.type = "kpc"
; otherwise, issparse.type = "fpc"
.- cor
+- cor
- -
A logical value. If
cor = TRUE
, perform PCA on the correlation matrix; otherwise, the covariance matrix. This option is available only iftype = "predictor"
. Default:cor = FALSE
.- kpc.num
+- kpc.num
- -
A integer decide the number of principal components to be sequentially considered.
- support.size
+- support.size
- -
It is a flexible input. If it is an integer vector. It represents the support sizes to be considered for each principal component. If it is a
list
object containingkpc.num
number of integer vectors, @@ -164,49 +166,49 @@Arguments
The default issupport.size = NULL
, and some rules in details section are used to specifysupport.size
.- gs.range
+- gs.range
- -
A integer vector with two elements. The first element is the minimum model size considered by golden-section, the later one is the maximum one. Default is
gs.range = c(1, min(n, round(n/(log(log(n))log(p)))))
.- tune.path
+- tune.path
- -
The method to be used to select the optimal support size. For
tune.path = "sequence"
, we solve the best subset selection problem for each size insupport.size
. Fortune.path = "gsection"
, we solve the best subset selection problem with support size ranged ings.range
, where the specific support size to be considered is determined by golden section.- tune.type
+- tune.type
- -
The type of criterion for choosing the support size. Available options are
"gic"
,"ebic"
,"bic"
,"aic"
and"cv"
. Default is"gic"
.tune.type = "cv"
is available only whentype = "predictor"
.- nfolds
+- nfolds
- -
The number of folds in cross-validation. Default is
nfolds = 5
.- foldid
+- foldid
- -
an optional integer vector of values between 1, ..., nfolds identifying what fold each observation is in. The default
foldid = NULL
would generate a random foldid.- ic.scale
+- ic.scale
- -
A non-negative value used for multiplying the penalty term in information criterion. Default:
ic.scale = 1
.- c.max
+- c.max
- -
an integer splicing size. The default of
c.max
is the maximum of 2 andmax(support.size) / 2
.- always.include
+- always.include
- -
An integer vector containing the indexes of variables that should always be included in the model.
- group.index
+- group.index
- -
A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of
x
and their corresponding index ingroup.index
should be the same. @@ -215,12 +217,12 @@Arguments
please setgroup.index = NULL
(the default).- screening.num
+- screening.num
- -
An integer number. Preserve
screening.num
number of predictors with the largest marginal maximum likelihood estimator before running algorithm.- splicing.type
+- splicing.type
- -
Optional type for splicing. If
splicing.type = 1
, the number of variables to be spliced isc.max
, ...,1
; ifsplicing.type = 2
, @@ -228,32 +230,30 @@Arguments
Default:splicing.type = 1
.- max.splicing.iter
+- max.splicing.iter
- -
The maximum number of performing splicing algorithm. In most of the case, only a few times of splicing iteration can guarantee the convergence. Default is
max.splicing.iter = 20
.- warm.start
+- warm.start
- -
Whether to use the last solution as a warm start. Default is
warm.start = TRUE
.- num.threads
+- num.threads
- -
An integer decide the number of threads to be concurrently used for cross-validation (i.e.,
tune.type = "cv"
). Ifnum.threads = 0
, then all of available cores will be used. Default:num.threads = 0
.- ...
+- ...
further arguments to be passed to or from methods.
Value
- - -A S3
+abesspca
class object, which is alist
with the following components:A S3
abesspca
class object, which is alist
with the following components:
- coef
- @@ -399,10 +399,10 @@
A \(p\)-by-
length(support.size)
loading matrix of sparse principal components (PC), where each row is a variable and each column is a support size;Examples
#> abesspca(x = rob_cov, type = "gram") #> #> PC support.size ev pev -#> 1 1 1 5389.433 0.9451258 -#> 2 1 2 5467.001 0.9587285 -#> 3 1 3 5525.057 0.9689096 -#> 4 1 4 5535.616 0.9707613 +#> 1 1 1 5322.730 0.9534441 +#> 2 1 2 5375.003 0.9628076 +#> 3 1 3 5406.960 0.9685319 +#> 4 1 4 5419.600 0.9707961 ## K-component principal component analysis pca_fit <- abesspca(USArrests, @@ -455,16 +455,16 @@Examples
- +@@ -126,53 +126,55 @@- +Adaptive best subset selection for robust principal component analysis
Arguments
-
- x
+ + +
- x
- -
A matrix object.
- rank
+- rank
- -
A positive integer value specify the rank of the low-rank matrix.
- support.size
-An integer vector representing the alternative support sizes. -Only used for
tune.path = "sequence"
. +- support.size
+- -
An integer vector representing the alternative support sizes. +Only used for
tune.path = "sequence"
. Strongly suggest its minimum value larger thanmin(dim(x))
.- tune.path
+- tune.path
- -
The method to be used to select the optimal support size. For
tune.path = "sequence"
, we solve the best subset selection problem for each size insupport.size
. Fortune.path = "gsection"
, we solve the best subset selection problem with support size ranged ings.range
, where the specific support size to be considered is determined by golden section.- gs.range
+- gs.range
- -
A integer vector with two elements. The first element is the minimum model size considered by golden-section, the later one is the maximum one. Default is
gs.range = c(1, min(n, round(n/(log(log(n))log(p)))))
.- tune.type
-The type of criterion for choosing the support size. -Available options are "gic", "ebic", "bic" and "aic". +
- tune.type
+- -
The type of criterion for choosing the support size. +Available options are "gic", "ebic", "bic" and "aic". Default is "gic".
- ic.scale
+- ic.scale
- -
A non-negative value used for multiplying the penalty term in information criterion. Default:
ic.scale = 1
.- lambda
+- lambda
- -
A single lambda value for regularized best subset selection. Default is 0.
- always.include
+- always.include
- -
An integer vector containing the indexes of variables that should always be included in the model.
- group.index
+- group.index
- -
A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of
x
and their corresponding index ingroup.index
should be the same. @@ -181,11 +183,11 @@Arguments
please setgroup.index = NULL
(the default).- c.max
+- c.max
- -
an integer splicing size. Default is:
c.max = 2
.- splicing.type
+- splicing.type
- -
Optional type for splicing. If
splicing.type = 1
, the number of variables to be spliced isc.max
, ...,1
; ifsplicing.type = 2
, @@ -193,54 +195,52 @@Arguments
(Default:splicing.type = 2
.)- max.splicing.iter
+- max.splicing.iter
- -
The maximum number of performing splicing algorithm. In most of the case, only a few times of splicing iteration can guarantee the convergence. Default is
max.splicing.iter = 20
.- warm.start
+- warm.start
- -
Whether to use the last solution as a warm start. Default is
warm.start = TRUE
.- important.search
+- important.search
- -
An integer number indicating the number of important variables to be splicing. When
important.search
\(\ll\)p
variables, it would greatly reduce runtimes. Default:important.search = 128
.- max.newton.iter
+- max.newton.iter
- -
a integer giving the maximal number of Newton's iteration iterations. Default is
max.newton.iter = 10
ifnewton = "exact"
, andmax.newton.iter = 60
ifnewton = "approx"
.- newton.thresh
+- newton.thresh
- -
a numeric value for controlling positive convergence tolerance. The Newton's iterations converge when \(|dev - dev_{old}|/(|dev| + 0.1)<\)
newton.thresh
.- num.threads
+- num.threads
- -
An integer decide the number of threads to be concurrently used for cross-validation (i.e.,
tune.type = "cv"
). Ifnum.threads = 0
, then all of available cores will be used. Default:num.threads = 0
.- seed
+- seed
- -
Seed to be used to divide the sample into cross-validation folds. Default is
seed = 1
.- ...
+- ...
further arguments to be passed to or from methods.
- + - +Value
- - -A S3
+abessrpca
class object, which is alist
with the following components:A S3
abessrpca
class object, which is alist
with the following components:
- S
- @@ -277,25 +277,25 @@
A list with
length(support.size)
elements, each of which is a sparse matrix estimation;Value
Details
Adaptive best subset selection for robust principal component analysis aim to find two latent matrices \(L\) and \(S\) such that the original matrix \(X\) can be appropriately approximated: -$$x = L + S + N,$$ -where \(L\) is a low-rank matrix, \(S\) is a sparse matrix, \(N\) is a dense noise matrix. +$$x = L + S + N,$$ +where \(L\) is a low-rank matrix, \(S\) is a sparse matrix, \(N\) is a dense noise matrix. Generic splicing technique can be employed to solve this problem by iteratively improve the quality of the estimation of \(S\).
-For a given support set \(\Omega\), the optimization problem: +
For a given support set \(\Omega\), the optimization problem: $$\min_S \| x - L - S\|_F^2 \;\;{\rm s.t.}\;\; S_{ij} = 0 {\rm for } (i, j) \in \Omega^c,$$ -still a non-convex optimization problem. We use the hard-impute algorithm proposed in one of the reference to solve this problem. -The hard-impute algorithm is an iterative algorithm, people can set
max.newton.iter
andnewton.thresh
to -control the solution precision of the optimization problem. -(Here, the name of the two parameters are somehow abused to make the parameters cross functions have an unified name.) -According to our experiments, -we assign properly parameters to the two parameter as the default such that the precision and runtime are well balanced, +still a non-convex optimization problem. We use the hard-impute algorithm proposed in one of the reference to solve this problem. +The hard-impute algorithm is an iterative algorithm, people can setmax.newton.iter
andnewton.thresh
to +control the solution precision of the optimization problem. +(Here, the name of the two parameters are somehow abused to make the parameters cross functions have an unified name.) +According to our experiments, +we assign properly parameters to the two parameter as the default such that the precision and runtime are well balanced, we suggest users keep the default values unchanged.Note
-Some parameters not described in the Details Section is explained in the document for
abess
+Some parameters not described in the Details Section is explained in the document for
-abess
because the meaning of these parameters are very similar.At present, \(l_2\) regularization and group selection are not support, -and thus, set
lambda
andgroup.index
have no influence on the output. +At present, \(l_2\) regularization and group selection are not support, +and thus, set
lambda
andgroup.index
have no influence on the output. This feature will coming soon.@@ -520,16 +520,16 @@Examples
- + - +@@ -100,37 +100,37 @@Extract Model Coefficients from a fitted "
abess
" object.-# S3 method for abess +
# S3 method for class 'abess' coef(object, support.size = NULL, sparse = TRUE, ...)
Arguments
-
- object
+ + +
- object
- -
An "
abess
" project.- support.size
+- support.size
- -
An integer vector specifies the coefficient fitted at given
support.size
. Ifsupport.size = NULL
, then all coefficients would be returned. Default:support.size = NULL
.- sparse
+- sparse
- -
A logical value, specifying whether the coefficients should be presented as sparse matrix or not. Default:
sparse = TRUE
.- ...
+- ...
Other arguments.
@@ -156,16 +156,16 @@Value
- - -A coefficient matrix when fitting an univariate model including gaussian, binomial, poisson, and cox; +
A coefficient matrix when fitting an univariate model including gaussian, binomial, poisson, and cox; otherwise, a list containing coefficient matrices. For a coefficient matrix, each row is a variable, and each column is a support size.
See also
- + - +@@ -100,17 +100,19 @@Extract Sparse Loadings from a fitted "
abesspca
" object.-# S3 method for abesspca +
# S3 method for class 'abesspca' coef(object, support.size = NULL, kpc = NULL, sparse = TRUE, ...)
Arguments
-
- object
+ + +
- object
- -
An "
abesspca
" project.- support.size
+- support.size
- -
An integer vector specifies the coefficient fitted at given
support.size
. Ifsupport.size = NULL
, then all coefficients would be returned. @@ -118,7 +120,7 @@Arguments
This parameter is omitted if sparse.type = "kpc".- kpc
+- kpc
- -
An integer vector specifies the coefficient fitted at given principal component. If
kpc = NULL
, then all coefficients would be returned. @@ -126,20 +128,18 @@Arguments
This parameter is omitted if sparse.type = "fpc".- sparse
+- sparse
- -
A logical value, specifying whether the coefficients should be presented as sparse matrix or not. Default:
sparse = TRUE
.- ...
+- ...
Other arguments.
@@ -162,16 +162,16 @@Value
- - -A matrix with
length(support.size)
columns. +A matrix with
length(support.size)
columns. Each column corresponds to a sparse loading for the first principal component, where the number of non-zeros entries depends on thesupport.size
.See also
- + - + - +@@ -100,38 +100,38 @@Extract sparse component from a fitted "
abessrpca
" object.-# S3 method for abessrpca +
# S3 method for class 'abessrpca' coef(object, support.size = NULL, sparse = TRUE, ...)
Arguments
-
- object
+ + +
- object
- -
An "
abessrpca
" project.- support.size
+- support.size
- -
An integer vector that specifies the sparse matrix fitted at the given
support.size
to be returned. -Ifsupport.size = NULL
, then the sparse matrix with +Ifsupport.size = NULL
, then the sparse matrix with the least tuning value would be returned. Default:support.size = NULL
.- sparse
+- sparse
- -
A logical value, specifying whether the coefficients should be presented as sparse matrix or not. Default:
sparse = TRUE
.- ...
+- ...
Other arguments.
@@ -147,16 +147,16 @@Value
- - -A list with
length(support.size)
number of dgCMatrix, +A list with
length(support.size)
number of dgCMatrix, each of which is the estimation the sparse component.Value
- + - +@@ -100,32 +100,32 @@Extract the deviance from a fitted "
abess
" object.-- + - +# S3 method for abess +
Arguments
-Value
- - -A numeric vector.
+A numeric vector.
See also
@@ -149,16 +149,16 @@See also
- +@@ -104,17 +104,19 @@- + - +Extract one model from a fitted "
abess
" object.extract(object, support.size = NULL, ...) -# S3 method for abess +# S3 method for class 'abess' extract(object, support.size = NULL, ...)
Arguments
-Value
- - -A
+list
object including the following components:A
list
object including the following components:
- beta
- @@ -177,16 +177,16 @@
A \(p\)-by-1 matrix of sparse matrix, stored in column format.
See also
- +@@ -121,39 +121,41 @@- + - +Generate simulated data
Arguments
-
- n
+ + +
- n
- -
The number of observations.
- p
+- p
- -
The number of predictors of interest.
- support.size
+- support.size
- -
The number of nonzero coefficients in the underlying regression model. Can be omitted if
beta
is supplied.- rho
+- rho
- -
A parameter used to characterize the pairwise correlation in predictors. Default is
0
.- family
+- family
- -
The distribution of the simulated response.
"gaussian"
for univariate quantitative response,"binomial"
for binary classification response,"poisson"
for counting response,"cox"
for left-censored response,"mgaussian"
for multivariate quantitative response, -"mgaussian"
for multi-classification response, +"mgaussian"
for multi-classification response,"ordinal"
for ordinal response.- beta
+- beta
- -
The coefficient values in the underlying regression model. If it is supplied,
support.size
would be omitted.- cortype
+- cortype
- -
The correlation structure.
cortype = 1
denotes the independence structure, where the covariance matrix has \((i,j)\) entry equals \(I(i \neq j)\). @@ -164,7 +166,7 @@Arguments
matrix are \(rho\) and diagonal entries are 1.- snr
+- snr
- -
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as the variance of \(x\beta\) divided by the variance of the gaussian noise: \(\frac{Var(x\beta)}{\sigma^2}\).
Arguments
Note that this arguments's effect is overridden ifsigma
is supplied with a non-null value.- sigma
+- sigma
- -
The variance of the gaussian noise. Default
sigma = NULL
implies it is determined bysnr
.- weibull.shape
+- weibull.shape
- -
The shape parameter of the Weibull distribution. It works only when
family = "cox"
. Default:weibull.shape = 1
.- uniform.max
+- uniform.max
- -
A parameter controlling censored rate. A large value implies a small censored rate; otherwise, a large censored rate. @@ -191,23 +193,21 @@
Arguments
Default isuniform.max = 1
.- y.dim
+- y.dim
- -
Response's Dimension. It works only when
family = "mgaussian"
. Default:y.dim = 3
.- class.num
+- class.num
- -
The number of classes. It works only when
family = "multinomial"
. Default:class.num = 3
.- seed
+- seed
random seed. Default:
seed = 1
.Value
- - -A
+list
object comprising:A
list
object comprising:
- x
- @@ -296,16 +296,16 @@
Design matrix of predictors.
Examples
- +@@ -114,48 +114,48 @@- + - +Generate matrix composed of a sparse matrix and low-rank matrix
Arguments
-
- n
+ + +
- n
- -
The number of observations.
- p
+- p
- -
The number of predictors of interest.
- rank
+- rank
- -
The rank of low-rank matrix.
- support.size
+- support.size
- -
The number of nonzero coefficients in the underlying regression model. Can be omitted if
beta
is supplied.- beta
+- beta
- -
The coefficient values in the underlying regression model. If it is supplied,
support.size
would be omitted.- snr
+- snr
- -
A positive value controlling the signal-to-noise ratio (SNR). A larger SNR implies the identification of sparse matrix is much easier. Default
snr = Inf
enforces no noise exists.- sigma
-A numerical value supplied the variance of the gaussian noise. +
- sigma
+- -
A numerical value supplying the variance of the gaussian noise. Default
sigma = NULL
implies it is determined bysnr
.- seed
+- seed
random seed. Default:
seed = 1
.Value
- - -A
+list
object comprising:A
list
object comprising:
- x
- @@ -221,16 +221,16 @@
An \(n\)-by-\(p\) matrix.
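A hedged sketch (argument values are illustrative; the observed matrix is assumed to be returned as component x, as stated above):
library(abess)
# observed matrix = low-rank part + sparse part (+ noise when snr is finite)
sim <- generate.matrix(n = 50, p = 50, rank = 2, support.size = 40, snr = Inf, seed = 3)
dim(sim[["x"]])
fit <- abessrpca(sim[["x"]], rank = 2, support.size = 40)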
Examples
- + - +@@ -113,44 +113,44 @@- +Generate matrix with sparse principal component
Arguments
-
- n
+ + +
- n
- -
The number of observations.
- p
+- p
- -
The number of predictors of interest.
- support.size
+- support.size
- -
An integer specifying the number of non-zero entries in the first column of the loading matrix.
- snr
+- snr
- -
A positive value controlling the signal-to-noise ratio (SNR). A larger SNR implies the identification of sparse matrix is much easier. Default
snr = Inf
enforces no noise exists.- sigma
+- sigma
- -
A numerical vector with length
p
specify the standard deviation of each columns. -Defaultsigma = NULL
implies it is determined bysnr
. +Defaultsigma = NULL
implies it is determined bysnr
. If it is supplied,support.size
would be omit.- sparse.loading
-A
p
-by-p
sparse orthogonal matrix. +- sparse.loading
+- -
A
p
-by-p
sparse orthogonal matrix. If it is supplied,support.size
would be omit.- seed
+- seed
random seed. Default:
seed = 1
.Value
- - -A
+list
object comprising:A
list
object comprising:
- x
- @@ -183,16 +183,16 @@
An \(n\)-by-\(p\) matrix.
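A hedged sketch pairing this generator with abesspca() (the support.size argument of abesspca() and the component name x are assumptions taken from the rest of this manual):
library(abess)
# data whose leading principal component has a sparse loading vector
sim <- generate.spc.matrix(n = 200, p = 20, support.size = 5, snr = 10, seed = 4)
pca_fit <- abesspca(sim[["x"]], support.size = 5)
head(coef(pca_fit))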
References
Package index • abess - + - +@@ -100,7 +100,7 @@Creat plot from a fitted "
abess
" object-- + - +- +# S3 method for abess +
- + - +# S3 method for class 'abess' plot( x, type = c("coef", "l2norm", "dev", "dev.ratio", "tune"), @@ -111,11 +111,13 @@
Creat plot from a fitted "
abess
" objectArguments
-
- x
+ + +
- x
- -
A "
abess
" object.- type
+- type
- -
The type of terms to be plot in the y-axis. One of the following:
"coef"
(i.e., coefficients),"l2norm"
(i.e., L2-norm of coefficients), @@ -124,21 +126,19 @@Arguments
Default is"coef"
.- label
+- label
- -
A logical value. If
label = TRUE
(the default), label the curves with variable sequence numbers.- ...
+- ...
Other graphical parameters to plot
Value
- - -No return value, called for side effects.
+No return value, called for side effects.
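A hedged plotting sketch (simulated data; the type values are taken from the usage above, other argument values are illustrative):
library(abess)
dat <- generate.data(n = 100, p = 50, support.size = 3, seed = 1)
fit <- abess(dat[["x"]], dat[["y"]])
plot(fit, type = "coef")   # coefficient paths versus support size
plot(fit, type = "tune")   # tuning criterion versus support size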
Note
@@ -182,16 +182,16 @@Examples
- +@@ -100,17 +100,19 @@Creat plot from a fitted "
abess
" object-- + - +# S3 method for abesspca +
Arguments
-
- x
+ + +
- x
- -
A "
abess
" object.- type
+- type
- -
The type of terms to be plot in the y-axis. One of the following:
"pev"
(i.e., percent of explained variance), @@ -119,21 +121,19 @@Arguments
Default is"coef"
.- label
+- label
- -
A logical value. If
label = TRUE
(the default), label the curves with variable sequence numbers.- ...
+- ...
Other graphical parameters to plot
Value
- - -No return value, called for side effects.
+No return value, called for side effects.
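A hedged sketch (it is assumed, as in the package vignettes, that abesspca() accepts a numeric data set such as USArrests; type = "pev" is listed above):
library(abess)
pca_fit <- abesspca(USArrests)
plot(pca_fit, type = "pev")   # percent of explained variance along the path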
Note
@@ -173,16 +173,16 @@Examples
@@ -100,17 +100,19 @@Creat plot from a fitted "
abessrpca
" object-@@ -157,16 +157,16 @@# S3 method for abessrpca +
Arguments
-
- x
+ + +
- x
- -
A "
abessrpca
" object.- type
+- type
- -
The plot type. One of the following:
"S"
(i.e., a heatmap for the sparse matrix estimation), @@ -119,30 +121,28 @@Arguments
Default is"coef"
.- support.size
+- support.size
- -
An integer vector that specifies the sparse matrix fitted at the given
support.size
to be returned. -Ifsupport.size = NULL
, then the sparse matrix with +Ifsupport.size = NULL
, then the sparse matrix with the least tuning value would be returned. Default:support.size = NULL
.- label
+- label
- -
A logical value. If
label = TRUE
(the default), label the curves with variable sequence numbers.- ...
-Other graphical parameters to
plot
+- ...
+Other graphical parameters to
plot
orstats::heatmap
functionValue
- - -No return value, called for side effects.
+No return value, called for side effects.
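A hedged sketch (simulated input with illustrative argument values; type = "S" is the heatmap option listed above):
library(abess)
sim <- generate.matrix(n = 30, p = 30, rank = 2, support.size = 20, seed = 5)
fit <- abessrpca(sim[["x"]], rank = 2, support.size = 20)
plot(fit, type = "S")   # heatmap of the estimated sparse component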
Value
- +@@ -98,21 +98,23 @@Make predictions from a fitted "
abess
" object.-- + - +# S3 method for abess +
Arguments
-
- object
+ + +
- object
- -
An "
abess
" project.- newx
+- newx
- -
New data used for prediction. If omitted, the fitted linear predictors are used.
- type
+- type
- -
type = "link"
gives the linear predictors for"binomial"
,"poisson"
or"cox"
models; for"gaussian"
models it gives the fitted values.type = "response"
gives the fitted probabilities for @@ -120,7 +122,7 @@Arguments
"cox"
; for"gaussian"
,type = "response"
is equivalent totype = "link"
.- support.size
+- support.size
- -
An integer value that specifies the model size fitted at the given
support.size
. Ifsupport.size = NULL
, then the model with @@ -128,15 +130,13 @@Arguments
Default:support.size = NULL
.- ...
+- ...
Additional arguments affecting the predictions produced.
Value
- - -The object returned depends on the types of family.
The object returned depends on the model family.
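A hedged prediction sketch (train/test split and argument values are illustrative; the simulated component names x and y are assumed):
library(abess)
dat <- generate.data(n = 120, p = 60, support.size = 3, seed = 1)
fit <- abess(dat[["x"]][1:100, ], dat[["y"]][1:100])
predict(fit, newx = dat[["x"]][101:120, ], support.size = 3, type = "response")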
See also
@@ -160,16 +160,16 @@See also
- +@@ -98,29 +98,29 @@Print method for a fitted "
abess
" object-- + - +# S3 method for abess +
Arguments
-Value
- - -No return value, called for side effects
+No return value, called for side effects
Details
@@ -151,16 +151,16 @@See also
- +@@ -98,29 +98,29 @@Print method for a fitted "
abesspca
" object-- + - +# S3 method for abesspca +
Arguments
-Value
- - -No return value, called for side effects
+No return value, called for side effects
Details
@@ -148,16 +148,16 @@See also
- + - +@@ -98,29 +98,29 @@Print method for a fitted "
abessrpca
" object-- + - +# S3 method for abessrpca +
Arguments
-Value
- - -No return value, called for side effects
+No return value, called for side effects
Details
@@ -142,16 +142,16 @@Details