You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current implementation of sgkit.genee only covers one special case of genee. The method does not perform the regularization of genee. In the current implementation, it is expected that regularization has happened before the method is called. Did I overlook an implementation of the regularization?
To my understanding, regularization in genee does not only serve the purpose of shrinking the effect estimates using penalties but also learning the approximately "true" SNP effects. the regularization proposed in the paper deflates the effect size estimates that arise from LD tagging. I.e. the marginal effects of linear regression performed on SNPs individually only give beta_marginal = LD_matrix * beta_true where SNPs marginally have nonzero effects but only because they tag a true causal SNP. The regularization is a crucial step in improving the mapping (e.g. https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1)
For an invertible matrix, if no regularization penalty is wanted, it would be possible to compute beta_true = (LD_matrix)^-1 * beta_marginal. Without regularization the implementation looks more like SKAT (https://doi.org/10.1016/j.ajhg.2011.05.029) even though it is still something different.
Usability
Cannot handle higher dimensionality of the beta values. It is not possible to directly use the gwas_linear_regression output from sgkit, specifically the trait dimension of the return object variable variant_linreg_beta.
sgkit provides a method for computing LD matrices. The return object of sgkit.ld_matrix is a dataframe that contains two columns holding the index and one holding the value. The required input for the sgkit.genee is a square LD matrix. Would it make sense to be able to use the sgkit LD matrix in this method?
The method sgkit.ld_matrix looses all information about variant_contig. If the windows on different contigs overlap, the matrix index indicators are not unique.
What in the R package is provided as gene list is provided here as windows. I would suggest to show an example in the docs that mentions the role of the windos, see PR.
Other
sgkit.genee does not "handle case where first component composed more than 50% SNPs". in the R package, in this case the variance component of the largest component is used as cutoff. This corresponds to a sparsity of causal SNPs assumption. There is a warning in the docs about this issue.
sgkit.genee does not provide the general option of adding custom weights to SNPs. This seems fine, especially because in the genee paper they argue that this way there is a phenotypic variance explained interpretation for the relevance of genes or gene regions
The text was updated successfully, but these errors were encountered:
In the current implementation, it is expected that regularization has happened before the method is called. Did I overlook an implementation of the regularization?
The current implementation of
sgkit.genee
only covers one special case of genee. The method does not perform the regularization of genee. In the current implementation, it is expected that regularization has happened before the method is called. Did I overlook an implementation of the regularization?To my understanding, regularization in genee does not only serve the purpose of shrinking the effect estimates using penalties but also learning the approximately "true" SNP effects. the regularization proposed in the paper deflates the effect size estimates that arise from LD tagging. I.e. the marginal effects of linear regression performed on SNPs individually only give
beta_marginal = LD_matrix * beta_true
where SNPs marginally have nonzero effects but only because they tag a true causal SNP. The regularization is a crucial step in improving the mapping (e.g. https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1)For an invertible matrix, if no regularization penalty is wanted, it would be possible to compute
beta_true = (LD_matrix)^-1 * beta_marginal
. Without regularization the implementation looks more like SKAT (https://doi.org/10.1016/j.ajhg.2011.05.029) even though it is still something different.Usability
gwas_linear_regression
output fromsgkit
, specifically the trait dimension of the return object variable variant_linreg_beta.sgkit
provides a method for computing LD matrices. The return object ofsgkit.ld_matrix
is a dataframe that contains two columns holding the index and one holding the value. The required input for thesgkit.genee
is a square LD matrix. Would it make sense to be able to use the sgkit LD matrix in this method?sgkit.ld_matrix
looses all information about variant_contig. If the windows on different contigs overlap, the matrix index indicators are not unique.Other
sgkit.genee
does not "handle case where first component composed more than 50% SNPs". in the R package, in this case the variance component of the largest component is used as cutoff. This corresponds to a sparsity of causal SNPs assumption. There is a warning in the docs about this issue.sgkit.genee
does not provide the general option of adding custom weights to SNPs. This seems fine, especially because in the genee paper they argue that this way there is a phenotypic variance explained interpretation for the relevance of genes or gene regionsThe text was updated successfully, but these errors were encountered: