Merge pull request #120 from sanger-tol/fixes
Fixes comma, image links and explainer
DLBPointon authored Jun 23, 2023
2 parents 73f53de + 262cf2b commit d592835
Showing 4 changed files with 21 additions and 21 deletions.
36 changes: 18 additions & 18 deletions docs/output.md
@@ -41,9 +41,9 @@ This subworkflow reads the input .yaml via the use of the built-in snakeyaml.Yam

This workflow generates a .genome file which describes the base pair length of each scaffold in the reference genome. This is performed by [SAMTOOLS_FAIDX](https://nf-co.re/modules/samtools_faidx) to generate a .fai file. This index file is trimmed using local module [GENERATE_GENOME_FILE](../modules/local/generate_genome_file.nf) to output a .genome file. This file is then recycled into the workflow to be used by a number of other subworkflows.
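As a rough illustration of the trimming step, the sketch below keeps only the first two columns (sequence name and length) of a .fai index, which is the layout a .genome file commonly uses. This is a minimal Python sketch under that assumption, not the code in GENERATE_GENOME_FILE; the function name `fai_to_genome` and the sample scaffold names are hypothetical.

```python
def fai_to_genome(fai_text):
    """Keep the name and length columns of a .fai index (assumed .genome layout)."""
    rows = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        rows.append(f"{fields[0]}\t{fields[1]}")
    return "\n".join(rows) + "\n"


# Hypothetical two-scaffold index, for illustration only
example_fai = "scaffold_1\t1000\t12\t60\t61\nscaffold_2\t250\t1040\t60\t61\n"
genome_text = fai_to_genome(example_fai)
```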

![Generate genome workflow](images/treeval_1_0_generate_genome.jpeg)
![Generate genome workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_generate_genome.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### LONGREAD_COVERAGE

@@ -74,9 +74,9 @@ This workflow generates a .genome file which describes the base pair length of e

The GAP_FINDER subworkflow generates a bed file containing the genomic locations of the gaps in the sequence. This is performed by [SEQTK_CUTN](), which cuts the input genome at sites of N (gaps). [GAP_LENGTH]() then calculates the lengths of the gaps generated in the previous step; this file is injected into the hic_maps at a later stage. SEQTK's output bed file is then BGzipped and indexed by [TABIX_BGZIPTABIX](https://nf-co.re/modules/tabix_bgziptabix).
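The idea of locating gaps and measuring their lengths can be sketched as follows. This is a minimal Python illustration that scans for runs of N with a regular expression; it is not how seqtk or the GAP_LENGTH module is implemented, and the function name `find_gaps` and the toy sequence are hypothetical.

```python
import re


def find_gaps(name, sequence):
    """Return (name, start, end, length) for each run of N in the sequence.

    Coordinates are 0-based and half-open, as in BED.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        start, end = match.span()
        gaps.append((name, start, end, end - start))
    return gaps


# Toy scaffold with two gaps, for illustration only
gaps = find_gaps("scaffold_1", "ACGTNNNNACGTNNAC")
```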

![Gap Finder workflow](images/treeval_1_0_gap_finder.jpeg)
![Gap Finder workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_gap_finder.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### REPEAT_DENSITY

@@ -89,9 +89,9 @@ The GAP_FINDER subworkflow generates a bed file containing the genomic locations
</details>
This uses [WindowMasker](https://github.com/goeckslab/WindowMasker) to mark potential repeats on the genome. The genome is chunked into 10kb bins which move along the entire genome as sliding windows in order to profile the repeat intensity. Bedtools is then used to intersect the bins and WindowMasker fragments. These fragments are then mapped back to the original assembly for visualization purposes.
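The binning and intersection steps can be pictured with the sketch below. It is a simplified Python illustration, assuming consecutive non-overlapping 10 kb bins and non-overlapping repeat intervals; the pipeline itself uses Bedtools, and the window step it applies is not stated here. The function names and sample intervals are hypothetical.

```python
def make_windows(length, size=10_000):
    """Split [0, length) into consecutive bins of `size` bases (last bin may be shorter)."""
    return [(start, min(start + size, length)) for start in range(0, length, size)]


def repeat_bases_per_window(windows, repeats):
    """Count how many bases of each window are covered by repeat intervals.

    Assumes `repeats` are non-overlapping, 0-based half-open (start, end) pairs.
    """
    counts = []
    for w_start, w_end in windows:
        covered = 0
        for r_start, r_end in repeats:
            overlap = min(w_end, r_end) - max(w_start, r_start)
            if overlap > 0:
                covered += overlap
        counts.append(covered)
    return counts


# Toy 25 kb scaffold with two hypothetical repeat fragments
windows = make_windows(25_000)
density = repeat_bases_per_window(windows, [(500, 1_500), (9_000, 12_000)])
```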

![Repeat Density workflow](images/treeval_1_0_repeat_density.jpeg)
![Repeat Density workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_repeat_density.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### HIC_MAPPING

@@ -106,9 +106,9 @@ This uses [WindowMasker](https://github.com/goeckslab/WindowMasker) to mark pote
</details>
The HIC_MAPPING subworkflow takes a set of HiC read files in .cram format as input and derives HiC mapping outputs in .pretext, .hic, and .mcool formats. These outputs are used for visualization on [PretextView](https://github.com/wtsi-hpag/PretextView), [Juicebox](https://github.com/aidenlab/Juicebox), and [Higlass](https://github.com/higlass/higlass) respectively.

![Hic Mapping workflow](images/treeval_1_0_hic_mapping.jpeg)
![Hic Mapping workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_hic_mapping.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### TELO_FINDER

@@ -125,9 +125,9 @@ The HIC_MAPPING subworkflow takes a set of HiC read files in .cram format as inp

The TELO_FINDER subworkflow uses a telomeric sequence supplied in the .yaml to identify putative telomeric regions in the input genome. This is achieved with [FIND_TELOMERE_REGIONS](../modules/local/find_telomere_regions.nf), the output of which is used to generate a telomere.windows file with [FIND_TELOMERE_WINDOWS](../modules/local/find_telomere_windows.nf). Both of these modules utilise VGP derived telomere programs [found here](https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere). Data for each telomeric site is then extracted into bed format with [EXTRACT_TELO](../modules/local/extract_telo.nf) and finally BGZipped and indexed with [TABIX_BGZIPTABIX](https://nf-co.re/modules/tabix_bgziptabix/tabix_bgziptabix).
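To give a feel for what searching a genome with a supplied telomeric sequence involves, the sketch below reports every position where a motif or its reverse complement occurs. It is a minimal Python illustration only, not the VGP telomere programs; the function names, the example motif, and the toy sequence are assumptions made for the example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G and T only."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_motif_sites(sequence, motif):
    """Return sorted 0-based start positions of the motif or its reverse complement."""
    sequence = sequence.upper()
    targets = {motif.upper(), reverse_complement(motif.upper())}
    sites = set()
    for target in targets:
        pos = sequence.find(target)
        while pos != -1:
            sites.add(pos)
            pos = sequence.find(target, pos + 1)  # step by one to allow overlapping hits
    return sorted(sites)


# Toy sequence with two forward copies and one reverse-complement copy
sites = find_motif_sites("TTAGGGTTAGGGACGTCCCTAA", "TTAGGG")
```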

![Telomere Finder workflow](images/treeval_1_0_telo_finder.jpeg)
![Telomere Finder workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_telo_finder.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### BUSCO_ANALYSIS

@@ -142,9 +142,9 @@ The TELO_FINDER subworkflow uses a supplied (by the .yaml) telomeric sequence to

The BUSCO_ANNOTATION subworkflow takes a genome assembly as input and extracts a list of [BUSCO](https://gitlab.com/ezlab/busco) genes from the BUSCO results. Additionally, it provides an overlapping BUSCO gene set based on a list of Lepidoptera ancestral genes (Wright et al., 2023), which has been investigated by Charlotte Wright from Mark Blaxter's lab at the Sanger Institute.

![Busco analysis workflow](images/treeval_1_0_busco_analysis.jpeg)
![Busco analysis workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_busco_analysis.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### GENE_ALIGNMENT

@@ -193,9 +193,9 @@ PUNCHLIST: Punchlists contain information on genes found to be duplicated (fully

The INSILICO_DIGEST workflow is used to visualize the Bionano enzyme cutting sites for a genome FASTA file. It starts by identifying the recognition sequences of the labeling enzyme to create a CMAP file. This CMAP file is then converted into BED and BIGBED formats to provide visualizations of the Bionano enzyme cutting sites. This procedure generates data tracks based on three digestion enzymes: BSPQ1, BSSS1, and DLE1.

![Insilico digest workflow](images/treeval_1_0_insilico_digest.jpeg)
![Insilico digest workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_insilico_digest.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### SELFCOMP

@@ -209,9 +209,9 @@ The INSILICO_DIGEST workflow is used to visualize the Bionano enzyme cutting sit

The SELFCOMP subworkflow is a comparative genomics analysis originally performed by the Ensembl project. It involves comparing the genes and genomic sequences within a single species. The goal of the analysis is mainly to identify haplotypic duplications in a particular genome assembly.

![Selfcomp workflow](images/treeval_1_0_selfcomp.jpeg)
![Selfcomp workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_selfcomp.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### SYNTENY

@@ -225,9 +225,9 @@ The SELFCOMP subworkflow is a comparative genomics analysis originally performed

This workflow searches along a predetermined path for syntenic genome files based on clade and then aligns each to the reference genome with [MINIMAP2_ALIGN](https://nf-co.re/modules/minimap2_align), emitting an aligned .paf file for each.

![Synteny workflow](images/treeval_1_0_synteny.jpeg)
![Synteny workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_synteny.jpeg)

![Workflow Legend](images/treeval_1_0_legend.jpeg)
![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)

### Pipeline information

2 changes: 1 addition & 1 deletion subworkflows/local/hic_mapping.nf
@@ -148,7 +148,7 @@ workflow HIC_MAPPING {
ch_versions = ch_versions.mix ( SAMTOOLS_MARKDUP.out.versions.first() )

//
// MODULE: SAMTOOLS FILTER READS | BAMTOBED | SORT BED FILE
// MODULE: SAMTOOLS FILTER OUT DUPLICATE READS | BAMTOBED | SORT BED FILE
//
BAMTOBED_SORT( SAMTOOLS_MARKDUP.out.bam )
ch_versions = ch_versions.mix(BAMTOBED_SORT.out.versions)
2 changes: 1 addition & 1 deletion subworkflows/local/longread_coverage.nf
@@ -155,7 +155,7 @@ workflow LONGREAD_COVERAGE {
.set { view_input }

//
// MODULE: GET PRIMARY BAM
// MODULE: EXTRACT READS FOR PRIMARY ASSEMBLY
//
SAMTOOLS_VIEW(
view_input.map { [it[0], it[1], it[3]] },
2 changes: 1 addition & 1 deletion workflows/treeval_rapid.nf
@@ -85,7 +85,7 @@ workflow TREEVAL_RAPID {
//
// SUBWORKFLOW: GENERATE TELOMERE WINDOW FILES WITH PACBIO READS AND REFERENCE
//
TELO_FINDER ( GENERATE_GENOME.out.max_scaff_size
TELO_FINDER ( GENERATE_GENOME.out.max_scaff_size,
GENERATE_GENOME.out.reference_tuple,
YAML_INPUT.out.teloseq
)