spell check workflows
gordonkoehn committed Oct 18, 2023
1 parent da114b4 commit d686ad8
Showing 4 changed files with 67 additions and 68 deletions.
2 changes: 1 addition & 1 deletion docs/tutorial/advanced_workflow.md
@@ -28,5 +28,5 @@ The workflows are available in:
- *mark05* : assessing the MCMC tree dispersion per tree-tree metric as
a baseline to understand the tree-tree metric.

The outcome an analyis of the results is available in the
The outcome and analysis of the results are available in the
[thesis](TODO:%20(gordonkoehn)%20add%20thesis%20here.).
2 changes: 1 addition & 1 deletion docs/tutorial/advanced_workflow.qmd
@@ -24,4 +24,4 @@ The workflows are available in:

* _mark05_ : assessing the MCMC tree dispersion per tree-tree metric as a baseline to understand the tree-tree metric.

The outcome an analyis of the results is available in the [thesis](TODO: (gordonkoehn) add thesis here.).
The outcome and analysis of the results are available in the [thesis](TODO: (gordonkoehn) add thesis here.).
79 changes: 39 additions & 40 deletions docs/tutorial/basicWorkflows.md
@@ -1,60 +1,59 @@
# Basic Workflows

*PYggdrasil* implements a number of basic workflows for simulated
mutation profile experiments.
*PYggdrasil* implements several basic workflows for simulated mutation
profile experiments.

We originally used these workflows as part of larger experiments to
evaluate SCITEs performance.
evaluate SCITE’s performance.

Here we show the workflow to run a SCITE mutation profile simulation and
inference experiment. We visualize the evolution of the chains via the
log probability and two similarity measures.
Here, we show the workflow to run a SCITE mutation profile simulation
and inference experiment. We visualize the evolution of the chains via
the log probability and two similarity measures.

In `workflows/` we define a number of
In `workflows/`, we define several
[Snakemake](https://snakemake.readthedocs.io/en/stable/) workflows.
These workflows are defined in a modular way, so that they can be easily
These workflows are defined in a modular way so that they can be easily
combined to create more complex workflows.

- `workflows/tree_inference.smk` is the main workflow, which runs a
mutation profile simulation and inference experiment.
- `workflows/anayze.smk` is a workflow to analyze the results of a
simulation and inference experiment.
- `workflows/visualize.smk` is a workflow to visualize the results of a
- `workflows/tree_inference.smk` implements rules which run a mutation
profile simulation and inference experiment.
- `workflows/anayze.smk` implements rules to analyze the results of a
simulation and inference experiment.
- `workflows/visualize.smk` implements rules to visualize the results of
a simulation and inference experiment.

All the `markXX` rules are used to to define more complex workflows
using these basic functionality. These experiments are defined in
`workflows/markXX.smk` and are part of
[gordonkoehn](https://github.com/gordonkoehn) thesis.
All the `markXX` rules define more complex workflows using these basic
functionalities. These experiments are defined in `workflows/markXX.smk`
and are part of [gordonkoehn](https://github.com/gordonkoehn)’s thesis.

Here we show how the basic workflows can work together to run a single
MCMC chain and visualize the results. All steps of a workflow are
designed to yield intermediate results saved to the disk. Each file is
named in a unique way, so that it can be easily identified and used in
other workflows. A filename implies the complete history of generation!
(This results in long fielnames, but allows us to use pure string
mattching in snakemake – like magic.)
Here, we show how the basic workflows can work together to run a single
MCMC chain and visualize the results. All workflow steps are designed to
yield intermediate results saved to the disk. Each file is named
uniquely to be easily identified and used in other workflows. A filename
implies the complete history of its generation! (This results in long
filenames but allows us to use pure string matching in snakemake – like
magic.)

## Run a single MCMC chain

Here is how you would run the *mark04* workflow.

``` {zsh}
# navigate to the workflows directory
# navigate to the workflow directory
cd workflows
# run the mark00 workflow with 4 cores
# run the mark00 workflow with four cores
snakemake -c 4 mark00
```

Note before you can run it, you need to install *snakemake* at best in a
conda environment. See
Note: before you can run it, you need to install *snakemake* at best in
a conda environment. See
[workflows/README.md](https://github.com/cbg-ethz/PYggdrasil/blob/main/workflows/README.md)
for more details.

*Also, you need to adjust the paths of the *DATADIR* and *REPODIR* in
*workflows/mark00.smk* and *workflows/tree_inference.smk*!*
`workflows/mark00.smk_` and `workflows/tree_inference.smk`!*

Once you get it running: Here is what is happening the below diagram
Once you get it running, Here is what is happening; the diagram below
shows the DAG of the *mark00* workflow.

![mark00 directed acyclic graph of
@@ -70,24 +69,24 @@ The core rules here are

- gen_cell_simulation to generate a simulated mutation profile given a
tree,

- *mcmcm* running the inference and

- *mcmc* running the inference and
- *analyze_metrics* to compute the similarity metrics.

For the rest of the rules see the the individual files: \*
`workflows/tree_inference.smk` \* `workflows/anayze.smk` \*
`workflows/visualize.smk`
For the rest of the rules, see the individual files:

- `workflows/tree_inference.smk`
- `workflows/analyze.smk`
- `workflows/visualize.smk`

The full workflow generated a these three files:
The full workflow generated these three files:

![mark00 log-prob evolution](basicWorkflows_files/log_prob.svg)

![mark00 MP3 evolution](basicWorkflows_files/MP3.svg)

Note the AD is a bad matric to visualize here, as we use a star tree as
Note the AD is a bad metric to visualize here, as we use a star tree as
a ground truth. No matter what the inference does, the AD will always be
0, as no ancestor-descendant relationship is present per definition of a
star tree.
0, as no ancestor-descendant relationship is present per the definition
of a star tree.

![mark00 AD evolution](basicWorkflows_files/AD.svg)
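
To make the star-tree remark above concrete, here is a minimal Python sketch. It assumes the AD measure is built from the set of ancestor-descendant pairs between non-root nodes. The `parent` dictionary encoding and the function name `ancestor_descendant_pairs` are hypothetical and are not taken from PYggdrasil.

```python
from typing import Dict, Set, Tuple


def ancestor_descendant_pairs(parent: Dict[int, int], root: int) -> Set[Tuple[int, int]]:
    """Collect (ancestor, descendant) pairs among non-root nodes.

    `parent` maps every non-root node to its parent (hypothetical encoding).
    """
    pairs = set()
    for node in parent:
        ancestor = parent[node]
        # Walk up towards the root, recording every non-root ancestor.
        while ancestor != root:
            pairs.add((ancestor, node))
            ancestor = parent[ancestor]
    return pairs


# Star tree: nodes 0, 1, 2 all hang directly off the root 3.
star = {0: 3, 1: 3, 2: 3}
# Chain: 3 (root) -> 2 -> 1 -> 0, for comparison.
chain = {0: 1, 1: 2, 2: 3}

star_pairs = ancestor_descendant_pairs(star, root=3)
chain_pairs = ancestor_descendant_pairs(chain, root=3)
print(star_pairs)
print(chain_pairs)
```

Under this assumption the star tree yields an empty pair set, so there is nothing an inferred tree could share with it, whereas the chain yields three pairs.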
52 changes: 26 additions & 26 deletions docs/tutorial/basicWorkflows.qmd
@@ -4,45 +4,45 @@ format: gfm
jupyter: python3
---

_PYggdrasil_ implements a number of basic workflows for simulated mutation profile experiments.
_PYggdrasil_ implements several basic workflows for simulated mutation profile experiments.

We originally used these workflows as part of larger experiments to evaluate SCITEs performance.
We originally used these workflows as part of larger experiments to evaluate SCITE's performance.

Here we show the workflow to run a SCITE mutation profile simulation and inference experiment.
Here, we show the workflow to run a SCITE mutation profile simulation and inference experiment.
We visualize the evolution of the chains via the log probability and two similarity measures.

In `workflows/` we define a number of [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflows.
These workflows are defined in a modular way, so that they can be easily combined to create more complex workflows.
In `workflows/`, we define several [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflows.
These workflows are defined in a modular way so that they can be easily combined to create more complex workflows.

* `workflows/tree_inference.smk` is the main workflow, which runs a mutation profile simulation and inference experiment.
* `workflows/anayze.smk` is a workflow to analyze the results of a simulation and inference experiment.
* `workflows/visualize.smk` is a workflow to visualize the results of a simulation and inference experiment.
* `workflows/tree_inference.smk` implements rules which run a mutation profile simulation and inference experiment.
* `workflows/anayze.smk` implements rules to analyze the results of a simulation and inference experiment.
* `workflows/visualize.smk` implements rules to visualize the results of a simulation and inference experiment.

All the `markXX` rules are used to to define more complex workflows using these basic functionality.
These experiments are defined in `workflows/markXX.smk` and are part of [gordonkoehn](https://github.com/gordonkoehn) thesis.
All the `markXX` rules define more complex workflows using these basic functionalities.
These experiments are defined in `workflows/markXX.smk` and are part of [gordonkoehn](https://github.com/gordonkoehn)'s thesis.

Here we show how the basic workflows can work together to run a single MCMC chain and visualize the results.
All steps of a workflow are designed to yield intermediate results saved to the disk.
Each file is named in a unique way, so that it can be easily identified and used in other workflows.
A filename implies the complete history of generation!
(This results in long fielnames, but allows us to use pure string mattching in snakemake -- like magic.)
Here, we show how the basic workflows can work together to run a single MCMC chain and visualize the results.
All workflow steps are designed to yield intermediate results saved to the disk.
Each file is named uniquely to be easily identified and used in other workflows.
A filename implies the complete history of its generation!
(This results in long filenames but allows us to use pure string matching in snakemake -- like magic.)

## Run a single MCMC chain

Here is how you would run the _mark04_ workflow.
```{zsh}
# navigate to the workflows directory
# navigate to the workflow directory
cd workflows
# run the mark00 workflow with 4 cores
# run the mark00 workflow with four cores
snakemake -c 4 mark00
```

Note before you can run it, you need to install _snakemake_ at best in a conda environment.
Note: before you can run it, you need to install _snakemake_ at best in a conda environment.
See [workflows/README.md](https://github.com/cbg-ethz/PYggdrasil/blob/main/workflows/README.md) for more details.

*Also, you need to adjust the paths of the _DATADIR_ and _REPODIR_ in _workflows/mark00.smk_ and _workflows/tree_inference.smk_!*
*Also, you need to adjust the paths of the _DATADIR_ and _REPODIR_ in `workflows/mark00.smk_` and `workflows/tree_inference.smk`!*

Once you get it running: Here is what is happening the below diagram shows the DAG of the _mark00_ workflow.
Once you get it running, Here is what is happening; the diagram below shows the DAG of the _mark00_ workflow.

![mark00 directed acyclic graph of workflow](basicWorkflows_files/dag_mark00.png)

@@ -55,23 +55,23 @@ This graphic was generated by the following command:
The core rules here are

* gen_cell_simulation to generate a simulated mutation profile given a tree,
* _mcmcm_ running the inference and
* _mcmc_ running the inference and
* _analyze_metrics_ to compute the similarity metrics.

For the rest of the rules see the the individual files:
For the rest of the rules, see the individual files:

* `workflows/tree_inference.smk`
* `workflows/anayze.smk`
* `workflows/analyze.smk`
* `workflows/visualize.smk`

The full workflow generated a these three files:
The full workflow generated these three files:

![mark00 log-prob evolution](basicWorkflows_files/log_prob.svg)

![mark00 MP3 evolution](basicWorkflows_files/MP3.svg)

Note the AD is a bad matric to visualize here, as we use a star tree as a ground truth.
No matter what the inference does, the AD will always be 0, as no ancestor-descendant relationship is present per definition of a star tree.
Note the AD is a bad metric to visualize here, as we use a star tree as a ground truth.
No matter what the inference does, the AD will always be 0, as no ancestor-descendant relationship is present per the definition of a star tree.

![mark00 AD evolution](basicWorkflows_files/AD.svg)
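
The tutorial text in this diff notes that a filename implies the complete history of its generation, which is what allows pure string matching in snakemake. The sketch below illustrates the idea in plain Python. It is a simplified stand-in for Snakemake's wildcard matching, not its actual implementation, and the filename scheme in `PATTERN` is hypothetical rather than PYggdrasil's real naming convention.

```python
import re
from typing import Dict, Optional


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a '{name}'-style pattern into a regex with named groups."""
    # With one capture group, re.split alternates literal text (even
    # indices) and placeholder names (odd indices).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<{part}>.+?)"
    return re.compile(f"^{regex}$")


def parse_filename(pattern: str, filename: str) -> Optional[Dict[str, str]]:
    """Recover the parameters encoded in a filename, or None if no match."""
    match = pattern_to_regex(pattern).match(filename)
    return match.groupdict() if match else None


# Hypothetical naming scheme: every generation parameter is in the name.
PATTERN = "mcmc_seed{seed}_steps{steps}_tree-{tree_id}.json"

wildcards = parse_filename(PATTERN, "mcmc_seed42_steps1000_tree-star_n5.json")
print(wildcards)
```

Because every parameter is recoverable from the name alone, a downstream rule can request a file by name and the workflow engine can work out which upstream steps and settings produce it.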
