Add module to support clustering analysis #682
sjspielman started this conversation in Propose a new analysis
Proposed analysis
I am proposing an analysis module to support clustering analysis. I expect it will be common for contributors to want to cluster their data and/or assess clustering robustness, so a module along these lines would provide a uniform way to perform this common analysis throughout OpenScPCA.
To be frank, my main motivation for this module is that I am interested in exploring the sc-SHC method. This method has two main functionalities, and for now I am primarily interested in the first here (I added bold for emphasis; copied from their README):
So, to begin, I'd like to build this module around some sc-SHC exploration explicitly, and as we learn more about the method, the module may become a more general clustering module.
Step 1: Is there a there there?
To begin, I envision a benchmark-style analysis to answer, "is there a there there?" for using sc-SHC:
- sc-SHC
- scpca-downstream-analyses (cluster purity, silhouette width, cluster stability)
- sc-SHC clustering values compared to the null distributions (e.g. using medians for each distribution?), but to start off, it's worth doing a simple visual assessment of those distributions before getting into the weeds too much.
- If sc-SHC clusters are indeed "significantly" better than random, we can proceed to compare those clusters to other clustering approaches, e.g. by varying parameters for walktrap or louvain/leiden (again, similar to scpca-downstream-analyses).
- sc-SHC is either comparable to or more robust than other algorithms.
Step 2: Write helper functions for clustering
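As a starting point for such helpers, the Step 1 comparison of a clustering metric against a null distribution could be sketched roughly as below. This is only an illustration in plain Python, not sc-SHC itself and not the scpca-downstream-analyses code: the function names `silhouette_width` and `null_silhouettes` are hypothetical, the toy 2-D points stand in for cells in a reduced-dimension space, and building the null by shuffling cluster labels is an assumption, since the proposal does not specify how the null distributions would be generated.

```python
import math
import random
from statistics import mean, median


def silhouette_width(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the point's own cluster
        a = mean(math.dist(points[i], points[j]) for j in own)
        # b: mean distance to the nearest other cluster
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


def null_silhouettes(points, labels, n_perm=200, seed=0):
    """Silhouette widths after shuffling labels (an assumed null model)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps cluster sizes, breaks the structure
        null.append(silhouette_width(points, shuffled))
    return null


# Toy data: two well-separated blobs standing in for real cells.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
points += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(20)]
labels = [0] * 20 + [1] * 20

observed = silhouette_width(points, labels)
null = null_silhouettes(points, labels)
print(f"observed: {observed:.2f}, null median: {median(null):.2f}")
```

In a real benchmark the labels would come from sc-SHC or a graph-based method, and plotting the `null` values next to `observed` would give the simple visual assessment mentioned in Step 1 before any summary statistic is chosen.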
There are broadly two sets of findings step 1 could yield, and we would proceed accordingly:
- If sc-SHC seems like a useful framework, we could proceed to write a helper script and/or function to support others in using it.
- If sc-SHC does not seem like a useful framework, we might consider adapting, in some way, the scpca-downstream-analyses clustering code for convenience for OpenScPCA contributors. I'm not entirely sure yet how we might provide results: would we pick a "best" clustering result to return, or provide a report for contributors to pick their clustering parameters?
Scientific goals
The primary goals of this analysis are two-fold:
- sc-SHC is a valuable method for performing clustering
There are also some bonus goals!
- If sc-SHC turns out to perform quite well as a clustering method in the first place, we could consider running it across ScPCA data to generate clusters that contributors can use and that would likely be more reliable than the quick graph-based results we have in the SCEs.
- sc-SHC method so we can find out if it's something we want to recommend, e.g. to workshop participants.
Methods or approach
To start, mostly sc-SHC, which runs fine on a laptop.
Existing modules
N/A
Input data
ScPCA data!
Scientific literature
Here is the sc-SHC paper: https://www.nature.com/articles/s41592-023-01933-9
Other details
No response