spellcheck functional tutorials
gordonkoehn committed Oct 18, 2023
1 parent 96a5421 commit da114b4
Showing 8 changed files with 48 additions and 49 deletions.
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ Python package for inference and analysis of mutation trees.

PYggdrasil implements the [Single Cell Inference of Tumor Evolution (SCITE)](https://github.com/cbg-ethz/SCITE) algortihm by [Kuipers J et al. (2015)](https://pubmed.ncbi.nlm.nih.gov/29030470/).

It was designed to quantify the MCMC exploration of tumour progression tree spaces, in particular to investigate: Initialisation Strategies, Convergence Diagnostics & Multi-modalities of SCITE.
It was designed to quantify the MCMC exploration of tumour progression tree spaces, in particular to investigate: Initialisation Strategies, Convergence Diagnostics & Multi-modalities.

## Usage

@@ -20,7 +20,7 @@ import pyggdrasil as yg


## Contributing
See [Contributing Guidlines](https://cbg-ethz.github.io/PYggdrasil/contributing/).
See [Contributing Guidelines](https://cbg-ethz.github.io/PYggdrasil/contributing/).
### Setting up the repository

To build package and maintain dependencies we use [Poetry](https://python-poetry.org/).
@@ -73,5 +73,5 @@ We recommend submitting small pull requests and starting with drafts outlining p
* Experimental workflows are in ``workflows/``, with a description of how to set up the environment in ``workflows/README.md``

## Origin & Authorship
This pakage originates from [Gordon J Köhn](https://github.com/gordonkoehn)'s MSc Thesis: _[Quantifying MCMC Exploration of Tumour Progression Tree Spaces](TODO(Gordon):add in link)_ in 2023 at ETH Zürich.
The project was supervised by [Paweł Czyż](https://pawel-czyz.github.io/) and Prof. Dr. Niko Beerenwinkel of the Combutational Biology Group at the [Department of Biosystems Science and Engineering](https://www.bsse.ethz.ch/).
This package originates from [Gordon J Köhn](https://github.com/gordonkoehn)'s MSc Thesis: _[Quantifying MCMC Exploration of Tumour Progression Tree Spaces](TODO(Gordon): add in link)_ in 2023 at ETH Zürich.
[Paweł Czyż](https://pawel-czyz.github.io/) and Prof. Dr Niko Beerenwinkel supervised this project as part of the Computational Biology Group at the [Department of Biosystems Science and Engineering](https://www.bsse.ethz.ch/).
9 changes: 4 additions & 5 deletions docs/tutorial/analyzeMCMC.md
@@ -28,13 +28,12 @@ import matplotlib.pyplot as plt

## Run MCMC

Below we run 4 Markov Chains, for 100 iterations each, with different
Below we run 4 Markov Chains, for 200 iterations each, with different
initial trees.

### Generate a ground-truth mutation history and a noisy single-cell mutation profile

The below cell generates a random tree with 4 mutations, plus root. For
debugging we may use the *print_topo* to plot its topology.
The below cell generates a random tree with 4 mutations, plus root.

<details>
<summary>Code</summary>
@@ -104,9 +103,9 @@ mut_mat = jnp.array(data['noisy_mutation_mat'])

## Run the Markov Monte Carlo Chain

The below cell runs a 4 differnt MCMC chain. We initialize ti with the
The below cell runs a 4 differnt MCMC chain. We initialize it with the
initial tree from before. We configure the move probabilities and error
rates and run the MCMC chain for 100 iterations. The sampels are saved
rates and run the MCMC chain for 200 iterations. The sampels are saved
to disk and loaded back into memory as chains may be very long.

``` python
8 changes: 4 additions & 4 deletions docs/tutorial/analyzeMCMC.qmd
@@ -26,10 +26,10 @@ import matplotlib.pyplot as plt
```

## Run MCMC
Below we run 4 Markov Chains, for 100 iterations each, with different initial trees.
Below we run 4 Markov Chains, for 200 iterations each, with different initial trees.

### Generate a ground-truth mutation history and a noisy single-cell mutation profile
The below cell generates a random tree with 4 mutations, plus root. For debugging we may use the _print_topo_ to plot its topology.
The below cell generates a random tree with 4 mutations, plus root.
```{python}
#| code-fold: true
# make true tree
@@ -83,8 +83,8 @@ mut_mat = jnp.array(data['noisy_mutation_mat'])
```

## Run the Markov Monte Carlo Chain
The below cell runs a 4 differnt MCMC chain. We initialize ti with the initial tree from before.
We configure the move probabilities and error rates and run the MCMC chain for 100 iterations.
The below cell runs a 4 differnt MCMC chain. We initialize it with the initial tree from before.
We configure the move probabilities and error rates and run the MCMC chain for 200 iterations.
The sampels are saved to disk and loaded back into memory as chains may be very long.

```{python}
20 changes: 10 additions & 10 deletions docs/tutorial/index.md
@@ -1,21 +1,21 @@
# Tutorials

We'd love to see you built upon PYggdrasil. Below we provide some tutorials to help you get started.
We'd love to see you built upon PYggdrasil. Below, we provide some tutorials to help you get started.

## Functional Usage
The below tutorials illustrate how to use PYggdrasil functions for specific tasks.
The tutorials below illustrate how to use PYggdrasil functions for specific tasks.

- [Single MCMC Run](singleMCMC.md) demonstrated how to run a single MCMC chain.
- [Tree Similarities and Visualization](similarities.md) demonstrated how to compute distances between two trees and visualize trees.
- [Analyzing MCMC Runs](analyzeMCMC.md) demonstrated how to compute distances between of MCMC chains and diagnose convergence issues.
- [Single MCMC Run](singleMCMC.md) demonstrates how to run a single MCMC chain.
- [Tree Similarities and Visualization](similarities.md) demonstrates how to compute distances between two trees and visualize trees.
- [Analyzing MCMC Runs](analyzeMCMC.md) demonstrates how to compute distances between MCMC chains and diagnose convergence issues.


## Experimental Workflows
For a sustaibalbe and reproducible workflow use _snakemake_ to run PYggdrasil.
We use _snakemake_ to create a sustainable and reproducible workflow for extensive experiments with PYggdrasil.
See [Workflows](../workflows/index.md) for more details.

The below tutorials illustrate _snakemake_ workflows ilustrated that allow to run fully reproducible experiments using PYggdrasil.
From tree and data generation to convergence diagostics a pipline exists.
The tutorials below illustrate _snakemake_ workflows that allow running such fully reproducible experiments using PYggdrasil.
From tree and data generation to convergence diagnostics, a pipeline exists.

- [Basic Workflow](basicWorkflows.md) demonstrated how to run a basic workflow for a particular output file.
- [Advanced Workflow](advanced_workflow.md) demonstrated how to run a more advanced workflow.
- [Basic Workflow](basicWorkflows.md) demonstrates a basic workflow for a particular output file.
- [Advanced Workflow](advanced_workflow.md) demonstrates how to run a more advanced workflow.
2 changes: 1 addition & 1 deletion docs/tutorial/similarities.md
@@ -108,7 +108,7 @@ yg.visualize.plot_tree_no_print(deep_tree, save_name, save_dir)
methods.

1. MCMC tree generation - takes a tree and evolves it by a fixed number
random moves implemnted with SCITE.
of random moves implemnted with SCITE.
2. HUNTRESS inference - takes a cell-mutation profile and infers a tree
with HUNTRESS.

2 changes: 1 addition & 1 deletion docs/tutorial/similarities.qmd
@@ -85,7 +85,7 @@ yg.visualize.plot_tree_no_print(deep_tree, save_name, save_dir)

**Note:** PYggdrasil inplements two more advanced tree generation methods.

1. MCMC tree generation - takes a tree and evolves it by a fixed number random moves implemnted with SCITE.
1. MCMC tree generation - takes a tree and evolves it by a fixed number of random moves implemnted with SCITE.
2. HUNTRESS inference - takes a cell-mutation profile and infers a tree with HUNTRESS.


32 changes: 16 additions & 16 deletions docs/tutorial/singleMCMC.md
@@ -3,7 +3,7 @@
This tutorial shows how to run a single MCMC chain of SCITE using
PYggdrasil.

- We will generate our own ground-truth mutation histroy and generate a
- We will generate our own ground-truth mutation history and generate a
noisy single-cell mutation profile from it.
- We will then run a single MCMC chain to infer the mutation history
from the noisy single-cell mutation profile.
@@ -99,10 +99,10 @@ print(mut_mat)

## 4) Run the Markov Monte Carlo Chain

The below cell runs a single MCMC chain. We initialize ti with the
initial tree from before. We configure the move probabilities and error
rates and run the MCMC chain for 100 iterations. The sampels are saved
to disk and loaded back into memory as chains may be very long.
The below cell runs a single MCMC chain. We initialize it with
the initial tree from before. We configure the move probabilities
and error rates and run the MCMC chain for 100 iterations.
The samples are saved to disk and loaded back into memory as chains may be very long.

``` python
## Run MCMC
@@ -145,10 +145,10 @@ mcmc_data = yg.serialize.read_mcmc_samples(save_dir / f"{save_name}.json")

## 5) Visualize the results

In the following we would like to plot the evolution of the MCMC chain
and the trees that were sampled. First we convert the data from the
serialized format to a *pureMCMCdata* format. This is a simple dataclass
that contains the trees and the log probabilities of the trees.
In the following, we would like to plot the evolution of the MCMC chain
and the trees that were sampled. First, we convert the data from the serialized
format to a pureMCMCdata format. This is a simple data class that
contains the trees and the log probabilities of the trees.

``` python
# unpack the data - reads in the serialized trees to Tree objects
@@ -198,11 +198,11 @@ yg.compare_trees(last_tree, true_tree)

True

Not note that the last tree does not need to be a good tree. SCITE is
just likely to spend more iterations exploring more likely trees. Here
the last tree just turns out to be a tree with the highest
log-probability.
Now note that the last tree does not need to be a good tree.
SCITE is just likely to spend more iterations exploring more
likely trees. Here, the last tree just turns out to be a
tree with the highest log-probability.

To acutally retrive a mutation tree from the posterior one would have to
make a point estimate Maximum A Posteriori (MAP) tree, i.e. sampled the
most times. See SCITE paper for details.
To acutally retrive a mutation tree from the posterior one
would have to make a point estimate Maximum A Posteriori (MAP) tree,
i.e. sampled the most times. See SCITE paper for details.
16 changes: 8 additions & 8 deletions docs/tutorial/singleMCMC.qmd
@@ -6,7 +6,7 @@ jupyter: python3

This tutorial shows how to run a single MCMC chain of SCITE using PYggdrasil.

* We will generate our own ground-truth mutation histroy and generate a noisy single-cell mutation profile from it.
* We will generate our own ground-truth mutation history and generate a noisy single-cell mutation profile from it.
* We will then run a single MCMC chain to infer the mutation history from the noisy single-cell mutation profile.
* Visualize the results. The trees and the evolution of the MCMC.

@@ -77,9 +77,7 @@ print(mut_mat)
```

## 4) Run the Markov Monte Carlo Chain
The below cell runs a single MCMC chain. We initialize ti with the initial tree from before.
We configure the move probabilities and error rates and run the MCMC chain for 100 iterations.
The sampels are saved to disk and loaded back into memory as chains may be very long.
The below cell runs a single MCMC chain. We initialize it with the initial tree from before. We configure the move probabilities and error rates and run the MCMC chain for 100 iterations. The samples are saved to disk and loaded back into memory as chains may be very long.

```{python}
#| warning: false
@@ -122,8 +120,10 @@ mcmc_data = yg.serialize.read_mcmc_samples(save_dir / f"{save_name}.json")
```

## 5) Visualize the results
In the following we would like to plot the evolution of the MCMC chain and the trees that were sampled.
First we convert the data from the serialized format to a _pureMCMCdata_ format. This is a simple dataclass that contains the trees and the log probabilities of the trees.
In the following, we would like to plot the evolution of the MCMC chain
and the trees that were sampled. First, we convert the data from the serialized
format to a _pureMCMCdata_ format. This is a simple data class that
contains the trees and the log probabilities of the trees.

```{python}
#| warning: false
@@ -163,6 +163,6 @@ Is it perhaps the true tree?
yg.compare_trees(last_tree, true_tree)
```

Not note that the last tree does not need to be a good tree. SCITE is just likely to spend more iterations exploring more likely trees. Here the last tree just turns out to be a tree with the highest log-probability.
Now note that the last tree does not need to be a good tree. SCITE is just likely to spend more iterations exploring more likely trees. Here, the last tree just turns out to be a tree with the highest log-probability.

To acutally retrive a mutation tree from the posterior one would have to make a point estimate Maximum A Posteriori (MAP) tree, i.e. sampled the most times. See SCITE paper for details.
To acutally retrive a mutation tree from the posterior one would have to make a point estimate Maximum A Posteriori (MAP) tree, i.e. sampled the most times. See SCITE paper for details.
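
The point estimate described in the closing paragraph can be sketched in a few lines. This is a minimal illustration, not PYggdrasil's API: the helper names `most_sampled_tree` and `highest_scoring_tree` are hypothetical, and each sampled tree is assumed to have been reduced to a hashable key (here a parent vector in which the root points to itself). The first helper follows the tutorial's description (the tree sampled most often); the second shows the common alternative of taking the sample with the highest log-probability.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def most_sampled_tree(tree_keys: Sequence[Hashable]) -> Tuple[Hashable, int]:
    """Return the most frequently sampled tree key and how often it occurred."""
    if not tree_keys:
        raise ValueError("need at least one sample")
    # Counter tallies identical keys; most_common(1) yields [(key, count)].
    return Counter(tree_keys).most_common(1)[0]


def highest_scoring_tree(
    tree_keys: Sequence[Hashable], log_probs: Sequence[float]
) -> Tuple[Hashable, float]:
    """Return the sampled tree key with the highest log-probability."""
    if len(tree_keys) != len(log_probs) or not tree_keys:
        raise ValueError("need equally long, non-empty sequences")
    best = max(range(len(log_probs)), key=lambda i: log_probs[i])
    return tree_keys[best], log_probs[best]


# Toy chain of five samples (hypothetical data): entry i is the parent of
# node i, and node 3 is the root.
samples = [(3, 0, 1, 3), (3, 0, 0, 3), (3, 0, 1, 3), (3, 3, 1, 3), (3, 0, 1, 3)]
log_probs = [-12.5, -14.1, -12.5, -15.0, -12.5]

print(most_sampled_tree(samples))
print(highest_scoring_tree(samples, log_probs))
```

On a short chain the two estimates may disagree, and any burn-in samples would normally be discarded before counting.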
