diff --git a/docs/tutorial/advanced_workflow.md b/docs/tutorial/advanced_workflow.md
index d50cb7c..62851fd 100644
--- a/docs/tutorial/advanced_workflow.md
+++ b/docs/tutorial/advanced_workflow.md
@@ -28,5 +28,5 @@ The workflows are available in:
 - *mark05* : assessing the MCMC tree dispersion per tree-tree metric as
   a baseline to understand the tree-tree metric.
 
-The outcome an analyis of the results is available in the
+The outcome and analysis of the results are available in the
 [thesis](TODO:%20(gordonkoehn)%20add%20thesis%20here.).
diff --git a/docs/tutorial/advanced_workflow.qmd b/docs/tutorial/advanced_workflow.qmd
index e9ac7ad..894b2ae 100644
--- a/docs/tutorial/advanced_workflow.qmd
+++ b/docs/tutorial/advanced_workflow.qmd
@@ -24,4 +24,4 @@ The workflows are available in:
 * _mark05_ : assessing the MCMC tree dispersion per tree-tree metric as a baseline to understand the tree-tree metric.
 
-The outcome an analyis of the results is available in the [thesis](TODO: (gordonkoehn) add thesis here.).
\ No newline at end of file
+The outcome and analysis of the results are available in the [thesis](TODO: (gordonkoehn) add thesis here.).
\ No newline at end of file
diff --git a/docs/tutorial/basicWorkflows.md b/docs/tutorial/basicWorkflows.md
index ad7245c..fa706f2 100644
--- a/docs/tutorial/basicWorkflows.md
+++ b/docs/tutorial/basicWorkflows.md
@@ -1,60 +1,59 @@
 # Basic Workflows
 
-*PYggdrasil* implements a number of basic workflows for simulated
-mutation profile experiments.
+*PYggdrasil* implements several basic workflows for simulated mutation
+profile experiments.
 
 We originally used these workflows as part of larger experiments to
-evaluate SCITEs performance.
+evaluate SCITE’s performance.
 
-Here we show the workflow to run a SCITE mutation profile simulation and
-inference experiment. We visualize the evolution of the chains via the
-log probability and two similarity measures.
+Here, we show the workflow to run a SCITE mutation profile simulation
+and inference experiment. We visualize the evolution of the chains via
+the log probability and two similarity measures.
 
-In `workflows/` we define a number of
+In `workflows/`, we define several
 [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflows.
-These workflows are defined in a modular way, so that they can be easily
+These workflows are defined in a modular way so that they can be easily
 combined to create more complex workflows.
 
-- `workflows/tree_inference.smk` is the main workflow, which runs a
-  mutation profile simulation and inference experiment.
-- `workflows/anayze.smk` is a workflow to analyze the results of a
-  simulation and inference experiment.
-- `workflows/visualize.smk` is a workflow to visualize the results of a
+- `workflows/tree_inference.smk` implements rules which run a mutation
+  profile simulation and inference experiment.
+- `workflows/analyze.smk` implements rules to analyze the results of a
   simulation and inference experiment.
+- `workflows/visualize.smk` implements rules to visualize the results of
+  a simulation and inference experiment.
 
-All the `markXX` rules are used to to define more complex workflows
-using these basic functionality. These experiments are defined in
-`workflows/markXX.smk` and are part of
-[gordonkoehn](https://github.com/gordonkoehn) thesis.
+All the `markXX` rules define more complex workflows using these basic
+functionalities. These experiments are defined in `workflows/markXX.smk`
+and are part of [gordonkoehn](https://github.com/gordonkoehn)’s thesis.
 
-Here we show how the basic workflows can work together to run a single
-MCMC chain and visualize the results. All steps of a workflow are
-designed to yield intermediate results saved to the disk. Each file is
-named in a unique way, so that it can be easily identified and used in
-other workflows. A filename implies the complete history of generation!
-(This results in long fielnames, but allows us to use pure string
-mattching in snakemake – like magic.)
+Here, we show how the basic workflows can work together to run a single
+MCMC chain and visualize the results. All workflow steps are designed to
+yield intermediate results saved to the disk. Each file is named
+uniquely to be easily identified and used in other workflows. A filename
+implies the complete history of its generation! (This results in long
+filenames but allows us to use pure string matching in snakemake – like
+magic.)
 
 ## Run a single MCMC chain
 
 Here is how you would run the *mark04* workflow.
 
 ``` {zsh}
- # navigate to the workflows directory
+ # navigate to the workflow directory
  cd workflows
- # run the mark00 workflow with 4 cores
+ # run the mark00 workflow with four cores
  snakemake -c 4 mark00
 ```
 
-Note before you can run it, you need to install *snakemake* at best in a
-conda environment. See
+Note: before you can run it, you need to install *snakemake*, ideally in
+a conda environment. See
 [workflows/README.md](https://github.com/cbg-ethz/PYggdrasil/blob/main/workflows/README.md)
 for more details.
 
 *Also, you need to adjust the paths of the *DATADIR* and *REPODIR* in
-*workflows/mark00.smk* and *workflows/tree_inference.smk*!*
+`workflows/mark00.smk` and `workflows/tree_inference.smk`!*
 
-Once you get it running: Here is what is happening the below diagram
+Once you get it running, here is what is happening: the diagram below
 shows the DAG of the *mark00* workflow.
 
 ![mark00 directed acyclic graph of
@@ -70,24 +69,24 @@ The core rules here are
 - gen_cell_simulation to generate a simulated mutation profile given a
   tree,
-
-- *mcmcm* running the inference and
-
+- *mcmc* running the inference and
 - *analyze_metrics* to compute the similarity metrics.
 
-For the rest of the rules see the the individual files: \*
-`workflows/tree_inference.smk` \* `workflows/anayze.smk` \*
-`workflows/visualize.smk`
+For the rest of the rules, see the individual files:
+
+- `workflows/tree_inference.smk`
+- `workflows/analyze.smk`
+- `workflows/visualize.smk`
 
-The full workflow generated a these three files:
+The full workflow generated these three files:
 
 ![mark00 log-prob evolution](basicWorkflows_files/log_prob.svg)
 
 ![mark00 MP3 evolution](basicWorkflows_files/MP3.svg)
 
-Note the AD is a bad matric to visualize here, as we use a star tree as
+Note the AD is a bad metric to visualize here, as we use a star tree as
 a ground truth. No matter what the inference does, the AD will always be
-0, as no ancestor-descendant relationship is present per definition of a
-star tree.
+0, as no ancestor-descendant relationship is present per the definition
+of a star tree.
 
 ![mark00 AD evolution](basicWorkflows_files/AD.svg)
diff --git a/docs/tutorial/basicWorkflows.qmd b/docs/tutorial/basicWorkflows.qmd
index 7e87938..6fa7912 100644
--- a/docs/tutorial/basicWorkflows.qmd
+++ b/docs/tutorial/basicWorkflows.qmd
@@ -4,45 +4,45 @@ format: gfm
 jupyter: python3
 ---
 
-_PYggdrasil_ implements a number of basic workflows for simulated mutation profile experiments.
+_PYggdrasil_ implements several basic workflows for simulated mutation profile experiments.
 
-We originally used these workflows as part of larger experiments to evaluate SCITEs performance.
+We originally used these workflows as part of larger experiments to evaluate SCITE's performance.
 
-Here we show the workflow to run a SCITE mutation profile simulation and inference experiment.
+Here, we show the workflow to run a SCITE mutation profile simulation and inference experiment.
 We visualize the evolution of the chains via the log probability and two similarity measures.
 
-In `workflows/` we define a number of [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflows.
-These workflows are defined in a modular way, so that they can be easily combined to create more complex workflows.
+In `workflows/`, we define several [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflows.
+These workflows are defined in a modular way so that they can be easily combined to create more complex workflows.
 
-* `workflows/tree_inference.smk` is the main workflow, which runs a mutation profile simulation and inference experiment.
-* `workflows/anayze.smk` is a workflow to analyze the results of a simulation and inference experiment.
-* `workflows/visualize.smk` is a workflow to visualize the results of a simulation and inference experiment.
+* `workflows/tree_inference.smk` implements rules which run a mutation profile simulation and inference experiment.
+* `workflows/analyze.smk` implements rules to analyze the results of a simulation and inference experiment.
+* `workflows/visualize.smk` implements rules to visualize the results of a simulation and inference experiment.
 
-All the `markXX` rules are used to to define more complex workflows using these basic functionality.
-These experiments are defined in `workflows/markXX.smk` and are part of [gordonkoehn](https://github.com/gordonkoehn) thesis.
+All the `markXX` rules define more complex workflows using these basic functionalities.
+These experiments are defined in `workflows/markXX.smk` and are part of [gordonkoehn](https://github.com/gordonkoehn)'s thesis.
 
-Here we show how the basic workflows can work together to run a single MCMC chain and visualize the results.
-All steps of a workflow are designed to yield intermediate results saved to the disk.
-Each file is named in a unique way, so that it can be easily identified and used in other workflows.
-A filename implies the complete history of generation!
-(This results in long fielnames, but allows us to use pure string mattching in snakemake -- like magic.)
+Here, we show how the basic workflows can work together to run a single MCMC chain and visualize the results.
+All workflow steps are designed to yield intermediate results saved to the disk.
+Each file is named uniquely to be easily identified and used in other workflows.
+A filename implies the complete history of its generation!
+(This results in long filenames but allows us to use pure string matching in snakemake -- like magic.)
 
 ## Run a single MCMC chain
 
 Here is how you would run the _mark04_ workflow.
 
 ```{zsh}
- # navigate to the workflows directory
+ # navigate to the workflow directory
  cd workflows
- # run the mark00 workflow with 4 cores
+ # run the mark00 workflow with four cores
  snakemake -c 4 mark00
 ```
 
-Note before you can run it, you need to install _snakemake_ at best in a conda environment.
+Note: before you can run it, you need to install _snakemake_, ideally in a conda environment.
 See [workflows/README.md](https://github.com/cbg-ethz/PYggdrasil/blob/main/workflows/README.md) for more details.
 
-*Also, you need to adjust the paths of the _DATADIR_ and _REPODIR_ in _workflows/mark00.smk_ and _workflows/tree_inference.smk_!*
+*Also, you need to adjust the paths of the _DATADIR_ and _REPODIR_ in `workflows/mark00.smk` and `workflows/tree_inference.smk`!*
 
-Once you get it running: Here is what is happening the below diagram shows the DAG of the _mark00_ workflow.
+Once you get it running, here is what is happening: the diagram below shows the DAG of the _mark00_ workflow.
 
 ![mark00 directed acyclic graph of workflow](basicWorkflows_files/dag_mark00.png)
 
@@ -55,23 +55,23 @@ This graphic was generated by the following command:
 The core rules here are
 
 * gen_cell_simulation to generate a simulated mutation profile given a tree,
-* _mcmcm_ running the inference and
+* _mcmc_ running the inference and
 * _analyze_metrics_ to compute the similarity metrics.
 
-For the rest of the rules see the the individual files:
+For the rest of the rules, see the individual files:
 
 * `workflows/tree_inference.smk`
-* `workflows/anayze.smk`
+* `workflows/analyze.smk`
 * `workflows/visualize.smk`
 
-The full workflow generated a these three files:
+The full workflow generated these three files:
 
 ![mark00 log-prob evolution](basicWorkflows_files/log_prob.svg)
 
 ![mark00 MP3 evolution](basicWorkflows_files/MP3.svg)
 
-Note the AD is a bad matric to visualize here, as we use a star tree as a ground truth.
-No matter what the inference does, the AD will always be 0, as no ancestor-descendant relationship is present per definition of a star tree.
+Note the AD is a bad metric to visualize here, as we use a star tree as a ground truth.
+No matter what the inference does, the AD will always be 0, as no ancestor-descendant relationship is present per the definition of a star tree.
 
 ![mark00 AD evolution](basicWorkflows_files/AD.svg)