diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml
index 7f8024c..b1393e9 100644
--- a/.github/workflows/documentation.yml
+++ b/.github/workflows/documentation.yml
@@ -28,6 +28,8 @@ jobs:
           pip install sphinx
           pip install sphinx_book_theme
           pip install myst-parser
+          pip install sphinx-panels
+          pip install sphinx-copybutton
           pip install numpy
           pip install pandas
           pip install torch --index-url https://download.pytorch.org/whl/cpu
diff --git a/README.md b/README.md
index 9a16600..625d5a1 100644
--- a/README.md
+++ b/README.md
@@ -17,5 +17,7 @@ RegDiffusion is on pypi.
 pip install regdiffusion
 ```
 
-Check out the [example notebook](https://github.com/TuftsBCB/RegDiffusion/blob/master/example.ipynb) for a quick tour of how to use RegDiffusion for your research!
+
+
+Check out the [this tutorial](https://tuftsbcb.github.io/RegDiffusion/quick_tour.html) for a quick tour of how to use RegDiffusion for your research!
 
diff --git a/docs/.gitignore b/docs/.gitignore
index 6bf8a98..a018156 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -1,2 +1,3 @@
 _build/
+_autosummary/
 .DS_Store
diff --git a/docs/_templates/autosummary/class.rst b/docs/_templates/autosummary/class.rst
new file mode 100644
index 0000000..afb2924
--- /dev/null
+++ b/docs/_templates/autosummary/class.rst
@@ -0,0 +1,12 @@
+.. role:: hidden
+    :class: hidden-section
+.. currentmodule:: {{ module }}
+
+
+{{ name | underline}}
+
+.. autoclass:: {{ name }}
+   :inherited-members:
+   :members:
+
+.. autogenerated from source/_templates/autosummary/class.rst
\ No newline at end of file
diff --git a/docs/_templates/autosummary/classnoinheritance.rst b/docs/_templates/autosummary/classnoinheritance.rst
new file mode 100644
index 0000000..885034b
--- /dev/null
+++ b/docs/_templates/autosummary/classnoinheritance.rst
@@ -0,0 +1,11 @@
+.. role:: hidden
+    :class: hidden-section
+.. currentmodule:: {{ module }}
+
+
+{{ name | underline}}
+
+.. autoclass:: {{ name }}
+   :members:
+
+.. autogenerated from source/_templates/autosummary/class.rst
\ No newline at end of file
diff --git a/docs/conf.py b/docs/conf.py
index 407df55..6034384 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -20,11 +20,19 @@
 
 extensions = [
     'sphinx.ext.autodoc',
+    'sphinx.ext.autosummary',
     'sphinx.ext.viewcode',
     'sphinx.ext.napoleon',
+    'sphinx_copybutton',
+    "sphinx_panels",
     'myst_parser'
 ]
 
+copybutton_prompt_text = ">>> "
+
+autosummary_generate = True
+numpydoc_show_class_members = False
+
 source_suffix = ['.rst', '.md']
 templates_path = ['_templates']
 exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', '.ipynb_checkpoints', '__pycache__/']
diff --git a/docs/modules.rst b/docs/modules.rst
index cc4a9cd..bccd02e 100644
--- a/docs/modules.rst
+++ b/docs/modules.rst
@@ -1,7 +1,32 @@
-regdiffusion
-============
+API Reference
+=============
+
+Top level API
+-------------
+
+.. autosummary::
+   :toctree: _autosummary
+
+   regdiffusion.RegDiffusionTrainer
+   regdiffusion.GRN
+   regdiffusion.GRNEvaluator
+
+models
+------
+
+.. autosummary::
+   :toctree: _autosummary
+
+   regdiffusion.models.RegDiffusion
+
+data
+----
+
+.. autosummary::
+   :toctree: _autosummary
+
+   regdiffusion.data.load_beeline
+   regdiffusion.data.load_atlas_microglia
+   regdiffusion.data.load_hammond_microglia
 
-.. toctree::
-   :maxdepth: 4
 
-   regdiffusion
diff --git a/docs/quick_tour.md b/docs/quick_tour.md
index dad7e53..b7cb1ba 100644
--- a/docs/quick_tour.md
+++ b/docs/quick_tour.md
@@ -1,4 +1,4 @@
-# Getting Startted with GRN inference using diffusion model
+# Get Started
 
 Diffusion model has been widely used in generative AI, especially in the vision domain. In our paper, we proposed RegDiffusion, a diffusion based model for GRN inference. Compared with previous model, RegDiffusion completes inference within a fraction of time and yield better benchmarking results.
 
@@ -25,8 +25,8 @@ If you want to see the inference on a larger network with 14,000+ genes and 8,00
 
 ```
 >>> bl_dt, bl_gt = rd.data.load_beeline(
-    benchmark_data='mESC', benchmark_setting='1000_STRING'
-    )
+>>>     benchmark_data='mESC', benchmark_setting='1000_STRING'
+>>> )
 ```
 
 Here, `load_beeline` gives you a tuple, where the first element is an anndata of the single cell experession data and the second element is an array of all the ground truth links (based on the STRING network in this case).
@@ -35,9 +35,7 @@ Here, `load_beeline` gives you a tuple, where the first element is an anndata of
 >>> bl_dt
 AnnData object with n_obs × n_vars = 421 × 1620
     obs: 'cell_type', 'cell_type_index'
-```
 
-```python
 >>> bl_gt
 array([['KLF6', 'JUN'],
        ['JUN', 'KLF6'],
@@ -117,7 +115,7 @@ There are many ways to discover target genes to study the local networks. For ex
 
 ### Step 2. Visualize the local network around the selected gene
 
-The `visualize_local_neighborhood` method of an `GRN` object extracts the 2-hop top-k neighborhood around a selected gene and visualize it using `pyvis`/`vis.js`. The default `k` here is 20. However, in cases when the regulatory relationships are strong and bidirectional, `k=20` only gives a very simple network. You may increase the magnitude of `k` to find some meaningful results to you.
+The `visualize_local_neighborhood` method of an `GRN` object extracts the 2-hop top-k neighborhood around a selected gene and visualize it using `pyvis`/`vis.js`. The default `k` here is 20. However, in cases when the regulatory relationships are strong and bidirectional, `k=20` only gives a very simple network. You may increase the magnitude of `k` to find some meaningful results to you. Keep in mind that, if your `k` is too small, you won't be able to see some relatively strong links.
 
 
 ```python
@@ -127,6 +125,52 @@ The `visualize_local_neighborhood` method of an `GRN` object extracts the 2-hop
 
 ![](https://raw.githubusercontent.com/TuftsBCB/RegDiffusion/master/resources/mecs.png)
 
+### (Optional) Step 3. Node clustering
+
+Here we have a fairly obvious bipartisan graph. It also makes sense to use some clustering methods to automatically assign nodes into partitions. You can use any clustering methods that you like (and works). Here is an example of using `node2vec` for this task.
+
+```python
+>>> import networkx as nx
+>>> from sklearn.cluster import KMeans
+>>> from node2vec import Node2Vec
+>>>
+>>> adj_table = grn.extract_node_2hop_neighborhood('HIST1H1D', 40)
+>>> nxg = nx.from_pandas_edgelist(adj_table)
+>>>
+>>> node2vec = Node2Vec(nxg, dimensions=64, walk_length=30, num_walks=200,
+>>>                     workers=4, seed=123)
+>>> model = node2vec.fit(window=10, min_count=1, batch_words=4)
+>>>
+>>> node_embeddings = [model.wv.get_vector(str(node)) for node in nxg.nodes()]
+>>>
+>>> kmeans = KMeans(n_clusters=4, random_state=0).fit(node_embeddings)
+>>> node_labels = kmeans.labels_
+>>>
+>>> print("Clusters:")
+>>> for cluster_id in range(max(node_labels) + 1):
+>>>     cluster_nodes = [g for g, c in zip(
+>>>         nxg.nodes(), node_labels) if c == cluster_id]
+>>>     print(f"Cluster {cluster_id}: {','.join(cluster_nodes)}")
+Clusters:
+Cluster 0: HIST1H1D,HIST1H2BN,HIST1H2BK,HIST1H1B,HIST1H2BL,HIST1H2AK,HIST1H1A,HIST1H2AC,HIST1H2BF,HIST1H4K,HIST1H3H,HIST1H2AF,HIST1H2AI,HIST1H2AG,HIST1H2BB,DNMT1,BRCA1,KNTC1,RAD54B,GM44335,FBXO5,TAF1,ABTB1,DEK,KANK3
+Cluster 1: MCM10,TIMELESS,RAD51,RBBP4,RRM2,MCM6,PCNA,E2F1,UHRF1,MCM4,MCM5,UNG,MCM7,MCM3,ZFP367,EZH2,BARD1
+Cluster 2: TOP2A,MAZ,POLR3B,GM10184,ATF4
+Cluster 3: GM26448,EGR1
+```
+
+You can also apply the clustering information to your visual.
+
+```python
+>>> gene_group_dict = dict()
+>>> gene_group_dict = {g:str(c) for g, c in zip(nxg.nodes(), node_labels)}
+>>> g = grn.visualize_local_neighborhood(
+>>>     'HIST1H1D', k=40, node_group_dict=gene_group_dict
+>>> )
+>>> g.show('view.html')
+```
+
+![](https://raw.githubusercontent.com/TuftsBCB/RegDiffusion/master/resources/mecs_cluster.png)
+
 ### Result Interpretation
 
 In the figure below, we clearly see two clusters. Most of the genes on the right side are obviously histone related since they all start with `HIST`. Genes on the left side are not that obvious. Therefore, we did a GO enrichment analysis on this gene set using [shinyGo 0.80](http://bioinformatics.sdstate.edu/go/) and found that they are closely related to DNA replication and double strand break repair.
diff --git a/regdiffusion/grn.py b/regdiffusion/grn.py
index d9e4ae3..93968da 100644
--- a/regdiffusion/grn.py
+++ b/regdiffusion/grn.py
@@ -13,9 +13,8 @@ class GRN:
     A Object to save and analyze gene regulatory network
 
     A GRN object includes the adjacency matrix between transcriptional factors
-    (|a|) and target genes (|b|). The adjacency matrix is expected to be in the
-    shape of |a| * |b|. In many cases, when TFs are not specified, we have a
-    square-shaped (|b| * |b|) adjacency matrix. We expected the adjacency
+    and target genes. In many cases, when TFs are not specified, we have a
+    square-shaped adjacency matrix. We expected the adjacency
     matrix to hold predicted weights/probabilities for the edges (float).
 
     To create a GRN object, you need at least two things: the adjacency matrix