
Commit

update website
haozhu233 committed Mar 27, 2024
1 parent f1931f9 commit d6f42b2
Showing 9 changed files with 119 additions and 15 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/documentation.yml
@@ -28,6 +28,8 @@ jobs:
pip install sphinx
pip install sphinx_book_theme
pip install myst-parser
pip install sphinx-panels
pip install sphinx-copybutton
pip install numpy
pip install pandas
pip install torch --index-url https://download.pytorch.org/whl/cpu
4 changes: 3 additions & 1 deletion README.md
@@ -17,5 +17,7 @@ RegDiffusion is on pypi.
pip install regdiffusion
```

Check out the [example notebook](https://github.com/TuftsBCB/RegDiffusion/blob/master/example.ipynb) for a quick tour of how to use RegDiffusion for your research!


Check out [this tutorial](https://tuftsbcb.github.io/RegDiffusion/quick_tour.html) for a quick tour of how to use RegDiffusion for your research!

1 change: 1 addition & 0 deletions docs/.gitignore
@@ -1,2 +1,3 @@
_build/
_autosummary/
.DS_Store
12 changes: 12 additions & 0 deletions docs/_templates/autosummary/class.rst
@@ -0,0 +1,12 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :inherited-members:
    :members:

.. autogenerated from source/_templates/autosummary/class.rst
11 changes: 11 additions & 0 deletions docs/_templates/autosummary/classnoinheritance.rst
@@ -0,0 +1,11 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :members:

.. autogenerated from source/_templates/autosummary/class.rst
8 changes: 8 additions & 0 deletions docs/conf.py
@@ -20,11 +20,19 @@

extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.viewcode',
'sphinx.ext.napoleon',
'sphinx_copybutton',
"sphinx_panels",
'myst_parser'
]

copybutton_prompt_text = ">>> "

autosummary_generate = True
numpydoc_show_class_members = False

source_suffix = ['.rst', '.md']
templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', '.ipynb_checkpoints', '__pycache__/']
35 changes: 30 additions & 5 deletions docs/modules.rst
@@ -1,7 +1,32 @@
regdiffusion
============
API Reference
=============

Top level API
-------------

.. autosummary::
    :toctree: _autosummary

    regdiffusion.RegDiffusionTrainer
    regdiffusion.GRN
    regdiffusion.GRNEvaluator

models
------

.. autosummary::
    :toctree: _autosummary

    regdiffusion.models.RegDiffusion

data
----

.. autosummary::
    :toctree: _autosummary

    regdiffusion.data.load_beeline
    regdiffusion.data.load_atlas_microglia
    regdiffusion.data.load_hammond_microglia

.. toctree::
:maxdepth: 4

regdiffusion
56 changes: 50 additions & 6 deletions docs/quick_tour.md
@@ -1,4 +1,4 @@
# Getting Startted with GRN inference using diffusion model
# Get Started

Diffusion models have been widely used in generative AI, especially in the vision domain. In our paper, we proposed RegDiffusion, a diffusion-based model for GRN inference. Compared with previous models, RegDiffusion completes inference in a fraction of the time and yields better benchmarking results.

@@ -25,8 +25,8 @@ If you want to see the inference on a larger network with 14,000+ genes and 8,00

```
>>> bl_dt, bl_gt = rd.data.load_beeline(
benchmark_data='mESC', benchmark_setting='1000_STRING'
)
>>> benchmark_data='mESC', benchmark_setting='1000_STRING'
>>> )
```

Here, `load_beeline` gives you a tuple, where the first element is an AnnData object holding the single-cell expression data and the second element is an array of all the ground truth links (based on the STRING network in this case).
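
As a rough illustration of working with the second element of that tuple, the sketch below summarizes a set of ground-truth links. It assumes each row is a (source gene, target gene) pair, which the tour does not state explicitly. `summarize_links` and the toy rows are hypothetical and are not part of RegDiffusion.

```python
from collections import Counter


def summarize_links(links):
    """Count links, distinct genes, and outgoing links per source gene.

    `links` is any iterable of 2-element rows. Treating the first column
    as the source gene is an assumption made for this sketch.
    """
    pairs = [(str(src), str(dst)) for src, dst in links]
    genes = {gene for pair in pairs for gene in pair}
    out_degree = Counter(src for src, _ in pairs)
    return {"n_links": len(pairs), "n_genes": len(genes), "out_degree": out_degree}


# Toy stand-in for `bl_gt`: the first two rows mirror the `bl_gt` output
# shown in the tour, the third row is made up for illustration.
toy_gt = [["KLF6", "JUN"], ["JUN", "KLF6"], ["JUN", "ATF4"]]
summary = summarize_links(toy_gt)
print(summary["n_links"], summary["n_genes"], dict(summary["out_degree"]))
```

Because the function only iterates over rows, it should also accept a two-column array of gene names directly, although that has not been checked against the real `bl_gt`.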
@@ -35,9 +35,7 @@ Here, `load_beeline` gives you a tuple, where the first element is an anndata of
>>> bl_dt
AnnData object with n_obs × n_vars = 421 × 1620
obs: 'cell_type', 'cell_type_index'
```

```python
>>> bl_gt
array([['KLF6', 'JUN'],
['JUN', 'KLF6'],
@@ -117,7 +115,7 @@ There are many ways to discover target genes to study the local networks. For ex

### Step 2. Visualize the local network around the selected gene

The `visualize_local_neighborhood` method of an `GRN` object extracts the 2-hop top-k neighborhood around a selected gene and visualize it using `pyvis`/`vis.js`. The default `k` here is 20. However, in cases when the regulatory relationships are strong and bidirectional, `k=20` only gives a very simple network. You may increase the magnitude of `k` to find some meaningful results to you.
The `visualize_local_neighborhood` method of a `GRN` object extracts the 2-hop top-k neighborhood around a selected gene and visualizes it using `pyvis`/`vis.js`. The default `k` is 20. However, when the regulatory relationships are strong and bidirectional, `k=20` gives only a very simple network. You may increase `k` to find results that are meaningful to you. Keep in mind that if `k` is too small, you won't be able to see some relatively strong links.
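
To make the idea of a 2-hop top-k neighborhood concrete, here is a minimal sketch on a toy edge list. It is one plausible reading of the description (keep the `k` strongest edges touching the selected gene, then the `k` strongest edges touching each of those neighbors) and is not RegDiffusion's actual implementation. The function names and weights are hypothetical.

```python
def top_k_edges(weights, gene, k):
    """Return the k strongest edges touching `gene` as (source, target, weight)."""
    touching = [(s, t, w) for (s, t), w in weights.items() if gene in (s, t)]
    return sorted(touching, key=lambda edge: abs(edge[2]), reverse=True)[:k]


def two_hop_neighborhood(weights, gene, k):
    """Collect top-k edges around `gene`, then top-k edges around each neighbor."""
    first_hop = top_k_edges(weights, gene, k)
    neighbors = {node for s, t, _ in first_hop for node in (s, t)} - {gene}
    edges = set(first_hop)
    for neighbor in sorted(neighbors):
        edges.update(top_k_edges(weights, neighbor, k))
    return edges


# Hypothetical edge weights, keyed by (source, target).
weights = {
    ("A", "B"): 0.9, ("A", "C"): 0.5, ("A", "D"): 0.1,
    ("B", "E"): 0.8, ("B", "F"): 0.2, ("C", "G"): 0.7,
}

wide = two_hop_neighborhood(weights, "A", k=2)
narrow = two_hop_neighborhood(weights, "A", k=1)
wide_nodes = {node for s, t, _ in wide for node in (s, t)}
print(sorted(wide_nodes), len(narrow))
```

In this toy case `k=1` keeps only the A-B edge, so the fairly strong B-E link (0.8) never appears, while `k=2` brings it in. This is the trade-off to keep in mind when choosing `k`.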


```python
@@ -127,6 +125,52 @@ The `visualize_local_neighborhood` method of an `GRN` object extracts the 2-hop

![](https://raw.githubusercontent.com/TuftsBCB/RegDiffusion/master/resources/mecs.png)

### (Optional) Step 3. Node clustering

Here we have a graph with two fairly obvious partitions. It also makes sense to use a clustering method to assign nodes to partitions automatically. You can use any clustering method you like, as long as it works. Here is an example of using `node2vec` for this task.

```python
>>> import networkx as nx
>>> from sklearn.cluster import KMeans
>>> from node2vec import Node2Vec
>>>
>>> adj_table = grn.extract_node_2hop_neighborhood('HIST1H1D', 40)
>>> nxg = nx.from_pandas_edgelist(adj_table)
>>>
>>> node2vec = Node2Vec(nxg, dimensions=64, walk_length=30, num_walks=200,
>>>                     workers=4, seed=123)
>>> model = node2vec.fit(window=10, min_count=1, batch_words=4)
>>>
>>> node_embeddings = [model.wv.get_vector(str(node)) for node in nxg.nodes()]
>>>
>>> kmeans = KMeans(n_clusters=4, random_state=0).fit(node_embeddings)
>>> node_labels = kmeans.labels_
>>>
>>> print("Clusters:")
>>> for cluster_id in range(max(node_labels) + 1):
>>>     cluster_nodes = [g for g, c in zip(
>>>         nxg.nodes(), node_labels) if c == cluster_id]
>>>     print(f"Cluster {cluster_id}: {','.join(cluster_nodes)}")
Clusters:
Cluster 0: HIST1H1D,HIST1H2BN,HIST1H2BK,HIST1H1B,HIST1H2BL,HIST1H2AK,HIST1H1A,HIST1H2AC,HIST1H2BF,HIST1H4K,HIST1H3H,HIST1H2AF,HIST1H2AI,HIST1H2AG,HIST1H2BB,DNMT1,BRCA1,KNTC1,RAD54B,GM44335,FBXO5,TAF1,ABTB1,DEK,KANK3
Cluster 1: MCM10,TIMELESS,RAD51,RBBP4,RRM2,MCM6,PCNA,E2F1,UHRF1,MCM4,MCM5,UNG,MCM7,MCM3,ZFP367,EZH2,BARD1
Cluster 2: TOP2A,MAZ,POLR3B,GM10184,ATF4
Cluster 3: GM26448,EGR1
```

You can also apply the clustering information to your visualization.

```python
>>> gene_group_dict = {g: str(c) for g, c in zip(nxg.nodes(), node_labels)}
>>> g = grn.visualize_local_neighborhood(
>>>     'HIST1H1D', k=40, node_group_dict=gene_group_dict
>>> )
>>> g.show('view.html')
```

![](https://raw.githubusercontent.com/TuftsBCB/RegDiffusion/master/resources/mecs_cluster.png)

### Result Interpretation

In the figure below, we clearly see two clusters. Most of the genes on the right side are obviously histone-related, since their names all start with `HIST`. The genes on the left side are less obvious. Therefore, we ran a GO enrichment analysis on this gene set using [ShinyGO 0.80](http://bioinformatics.sdstate.edu/go/) and found that they are closely related to DNA replication and double-strand break repair.
5 changes: 2 additions & 3 deletions regdiffusion/grn.py
@@ -13,9 +13,8 @@ class GRN:
An object to save and analyze a gene regulatory network
A GRN object includes the adjacency matrix between transcription factors
(|a|) and target genes (|b|). The adjacency matrix is expected to be in the
shape of |a| * |b|. In many cases, when TFs are not specified, we have a
square-shaped (|b| * |b|) adjacency matrix. We expected the adjacency
and target genes. In many cases, when TFs are not specified, we have a
square-shaped adjacency matrix. We expected the adjacency
matrix to hold predicted weights/probabilities for the edges (float).
To create a GRN object, you need at least two things: the adjacency matrix
