Commit
NOTE: As of this commit, the build _runs_, but only for the whole genome, not for the E1 gene-specific build. Additionally, many aspects of the build are incorrect and need to be tuned or revised.
Showing 22 changed files with 978 additions and 353 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,36 @@ | ||
# Phylogenetic | ||
# Phylogenetic workflow | ||
|
||
This workflow uses metadata and sequences to produce one or multiple [Nextstrain datasets][] | ||
that can be visualized in Auspice. | ||
|
||
Resulting tree is available here: https://nextstrain.org/groups/neherlab/staging/nipah | ||
|
||
## Background | ||
|
||
See e.g. [Whitmer et al., 2020](https://academic.oup.com/ve/article/7/1/veaa062/5894561) | ||
This workflow uses metadata and sequences to produce one or multiple | ||
[Nextstrain datasets][] that can be visualized in Auspice. | ||
|
||
## Data Requirements | ||
|
||
The core phylogenetic workflow will use metadata values as-is, so please do any | ||
desired data formatting and curations as part of the [ingest](../ingest/) workflow. | ||
The core phylogenetic workflow will use metadata values as-is, so | ||
please do any desired data formatting and curations as part of the | ||
[ingest][] workflow. | ||
|
||
1. The metadata must include an ID column that can be used as an exact match for | ||
the sequence ID present in the FASTA headers. | ||
2. The `date` column in the metadata must be in ISO 8601 date format (i.e. YYYY-MM-DD). | ||
1. The metadata must include an ID column that can be used as an exact | ||
match for the sequence ID present in the FASTA headers. | ||
2. The `date` column in the metadata must be in ISO 8601 date format | ||
(i.e. YYYY-MM-DD). | ||
3. Ambiguous dates should be masked with `XX` (e.g. 2023-01-XX). | ||
|
||
## Config | ||
|
||
The config directory contains all of the default configurations for the phylogenetic workflow. | ||
|
||
[config/defaults.yaml](config/defaults.yaml) contains all of the default configuration parameters | ||
used for the phylogenetic workflow. Use Snakemake's `--configfile`/`--config` | ||
options to override these default values. | ||
[defaults/config.yaml][] contains all of the default configuration | ||
parameters used for the phylogenetic workflow. Use Snakemake's | ||
`--configfile`/`--config` options to override these default values. | ||
|
||
## Snakefile and rules | ||
|
||
The rules directory contains separate Snakefiles (`*.smk`) as modules of the core phylogenetic workflow. | ||
The modules of the workflow are in separate files to keep the main ingest [Snakefile](Snakefile) succinct and organized. | ||
Modules are all [included](https://snakemake.readthedocs.io/en/stable/snakefiles/modularization.html#includes) | ||
in the main Snakefile in the order that they are expected to run. | ||
The rules directory contains separate Snakefiles (`*.smk`) as modules | ||
of the core phylogenetic workflow. The modules of the workflow are in | ||
separate files to keep the main ingest [Snakefile][] succinct and | ||
organized. Modules are all [included][] in the main Snakefile in the | ||
order that they are expected to run. | ||
|
||
[defaults/config.yaml]: ./config/defaults.yaml | ||
[included]: https://snakemake.readthedocs.io/en/stable/snakefiles/modularization.html#includes | ||
[ingest]: ../ingest/ | ||
[Nextstrain datasets]: https://docs.nextstrain.org/en/latest/reference/glossary.html#term-dataset | ||
[Snakefile]: ./Snakefile |
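
The three data requirements listed in the README diff above can be checked before running the workflow. The following is a minimal sketch, not part of this commit: the helper names `is_valid_date`, `fasta_ids`, and `missing_ids` are hypothetical, and it assumes that only the day, or the month and day together, are masked with `XX`, and that the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
import re

# Assumed date shape: YYYY-MM-DD, where the day (or the month and day together)
# may be masked with XX, e.g. 2023-01-XX or 2023-XX-XX.
# This checks the format only; it does not reject impossible dates such as 2023-02-31.
DATE_PATTERN = re.compile(
    r"\d{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01]|XX)|XX-XX)"
)


def is_valid_date(value):
    """Return True if value looks like an ISO 8601 date with optional XX masking."""
    return DATE_PATTERN.fullmatch(value) is not None


def fasta_ids(fasta_text):
    """Extract sequence IDs (first token after '>') from FASTA-formatted text."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def missing_ids(metadata_ids, sequence_ids):
    """Return sequence IDs that have no exact match among the metadata IDs."""
    known = set(metadata_ids)
    return [seq_id for seq_id in sequence_ids if seq_id not in known]


if __name__ == "__main__":
    fasta = ">strain_a some description\nACGT\n>strain_b\nGGCC\n"
    print(missing_ids(["strain_a"], fasta_ids(fasta)))
    print([d for d in ["2023-01-XX", "01/02/2023"] if not is_valid_date(d)])
```

Running a check like this against the output of the ingest workflow would surface mismatched IDs and malformed dates early, which is where the README asks for such curation to happen.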