Releases: CFSAN-Biostatistics/bettercallsal
bettercallsal v1.0.0
bettercallsal is published! Read our article at Frontiers in Microbiology.
This is the major milestone release (v1.0.0) of bettercallsal with the following changes:
- The Illumina short read pipeline (bettercallsal) has been very stable over many hundreds of runs, and we are excited to announce support for Oxford Nanopore long reads, which can be activated with the --pipeline bettercallsal_lr option.
- The long read variant of the pipeline uses filtlong for long read filtering and flye for assembly.
- abricate databases have been updated.
Disclaimer
Please note that bettercallsal_lr is still undergoing development. As more datasets become available for testing, we plan to release new versions of the software with parameter fine-tuning and/or tool changes.
bettercallsal v0.7.0
This is the v0.7.0 release of bettercallsal with the following changes:
- Fixes SSL issue reported in #1.
- bettercallsal can now skip collecting hits (RefSeq or GenBank accessions) belonging to an NCBI BioProject. We have found that some genome assemblies had no corresponding run data submitted to NCBI (SRRs) and/or did not meet internal quality metrics. We plan to incorporate additional genome quality filters in future releases.
  - By default, accessions belonging to BioProjects PRJNA766315,PRJNA675435,PRJNA831577,PRJNA855361 are ignored during the mash screen step.
  - Use the --tuspy_skip command-line option to either turn this off or add more BioProject accessions:
    cpipes --pipeline bettercallsal --tuspy_skip false ...........
    or
    cpipes --pipeline bettercallsal --tuspy_skip 'PRJNA766315,PRJNA675435,PRJNA831577,PRJNA855361,PRJNA855362' ...........
- If the number of input samples is too big, you can now turn off MultiQC report generation with the --multiqc_run false command-line option.
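
The --tuspy_skip value described above is a single comma-separated string, so adding a BioProject means rebuilding that string. The following is a minimal sketch in POSIX shell, not part of bettercallsal: the helper name join_bioprojects and the variable skip_list are hypothetical, and the command is only printed (not run), with the remaining cpipes arguments left for you to append.

```shell
# Hypothetical helper (not part of bettercallsal): join all arguments with commas.
# The function body runs in a subshell so the IFS change does not leak out.
join_bioprojects() (
    IFS=,
    printf '%s\n' "$*"
)

# The four default BioProjects from this release note, plus one extra accession.
skip_list=$(join_bioprojects PRJNA766315 PRJNA675435 PRJNA831577 PRJNA855361 PRJNA855362)

# Print the command instead of running it; append your usual cpipes arguments.
echo "cpipes --pipeline bettercallsal --tuspy_skip '${skip_list}'"
```

Because the option replaces the default list rather than extending it (as the second example above suggests), the sketch repeats the four default accessions before adding the new one.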
All of our internal benchmark studies used the PDG000000002.2727 version of the NCBI Pathogens release for Salmonella enterica.
- You can download the pre-formatted database generated for PDG000000002.2727 with the bettercallsal_db workflow from S3.
bettercallsal v0.6.1
This is the v0.6.1 release of bettercallsal with the following changes:
- NCBI datasets is updated to v15.3.1.
- Improved the stability of the bettercallsal_db workflow.
  - It is strongly recommended that you run the bettercallsal_db workflow in a grid computing or similar cloud computing (Ex: aws batch) setting for the following reasons:
    - The second process of the workflow (FILTER_PDG_METADATA) spawns ~469 jobs to first query NCBI and fetch the accessions for which genome FASTAs are downloadable.
    - The penultimate process of the workflow (SCAFFOLD_GENOMES) also spawns >300 jobs to download the genome FASTAs and scaffold them.
  - Internal tests show that with a stable internet connection, the bettercallsal_db workflow finishes in approximately 1 hour and 30 minutes.
- Fixed a bug wherein the creation of the per_snp_cluster db was not retaining the genomes whose contig / scaffold sizes are identical per the waterfall algorithm.
bettercallsal v0.6.0
This is the v0.6.0 release of bettercallsal with the following changes:
- Now, post kma alignment, reads classified as Salmonella spp. are fed into megahit for metagenome assembly.
- Running the megahit assembler enables the use of mlst for sequence typing and abricate for AMR profiling.
  - The first three caveats for abricate apply.
- MultiQC is updated to v1.14.
- By default, the megahit process is ON. The user can turn off the megahit assembly via the --megahit_run false CLI option.
- bettercallsal v0.6.0 has been successfully stress tested on samples with more than 300 million reads (~323 million reads) and uses no more than 27 GB for any of the processes.
- Squashed some bugs.
bettercallsal v0.5.0
bettercallsal preprint is out at biorxiv.org.
This is the v0.5.0 release of bettercallsal with the following changes:
- Now, fastp is used to perform quality control and adapter trimming. By default, custom adapters are not used but can be turned on with the --fastp_use_custom_adapters true option.
  - The custom adapters distributed with the software will be used automatically with the --fastp_use_custom_adapters true command-line option. Please note that this will slow the workflow down due to the exhaustive search for all possible adapter and primer sequences.
  - To use your own adapter FASTA, supply a valid UNIX path to the --fastp_adapter_fasta command-line option.
    - Ex: --fastp_use_custom_adapters true --fastp_adapter_fasta /path/to/custom/adapters.fasta
- Based on internal research, we found that running the workflow on concatenated R1+R2 reads yields better results if the input library is paired-end. This version of bettercallsal automatically concatenates the R1 and R2 reads if the input data set is paired-end. Use the command-line option --fq_single_end false to trigger this step.
  - Please note that you need to set the --fq_suffix and --fq_suffix2 options correctly. By default, --fq_suffix is set to .fastq.gz. For paired-end data, you may have to set it explicitly, for example --fq_suffix '_R1_001.fastq.gz'.
- Addition of a global Salmonella presence/absence table in the MultiQC report.
- Squashed some bugs.
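
Since paired-end runs depend on --fq_suffix and --fq_suffix2 matching your file names, it can help to check the suffixes against a file name before launching the workflow. The following is a minimal sketch in POSIX shell and is not how bettercallsal itself pairs reads: the helper mate_of is hypothetical, and the R2 suffix '_R2_001.fastq.gz' is an assumption based on common Illumina naming (only the R1 example appears in the notes above).

```shell
# Suffixes you intend to pass to --fq_suffix and --fq_suffix2.
# '_R2_001.fastq.gz' is an assumed value, not taken from the release notes.
fq_suffix='_R1_001.fastq.gz'
fq_suffix2='_R2_001.fastq.gz'

# Hypothetical helper: print the R2 file name expected for a given R1 file,
# or return a non-zero status if the R1 name does not end with $fq_suffix.
mate_of() {
    r1=$1
    case $r1 in
        *"$fq_suffix") printf '%s%s\n' "${r1%"$fq_suffix"}" "$fq_suffix2" ;;
        *) return 1 ;;
    esac
}

# Example: show the mate expected for one R1 file.
mate_of sampleA_R1_001.fastq.gz
```

If mate_of fails for one of your R1 files, or the printed R2 name does not exist on disk, the suffix options probably need adjusting before the run.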
bettercallsal v0.4.0
This is the v0.4.0 release of bettercallsal with the following changes:
- Now, sourmash search is available as an alternative to the default sourmash gather, which is used as an additional step to further narrow down possible serotype hits based on genome fraction. Please refer to the sourmash docs about which one is appropriate for your use case. sourmash search can be activated by turning off sourmash gather via the --sourmashgather_run false CLI option.
- By default, 10 CPU cores are used to run all workflow steps. You can change this behavior and set the maximum number of CPU cores to be used via the --max_cpus CLI option. Ex: --max_cpus 5
- The minimum memory requirements have been successfully re-tested with all workflow steps, and the bettercallsal workflow now requires only 16 GB of memory instead of 64 GB.
- v0.4.0 of bettercallsal has been successfully tested and works in a cloud environment with AWS Batch. You need to set up the proper AWS Batch resources per the Nextflow docs. Another example: Manual AWS Batch Configuration.
- Squashed some bugs.
bettercallsal v0.3.0
This is the v0.3.0 release of bettercallsal with the following changes:
- Now, sourmash is used as an additional step to further narrow down possible serotype hits based on genome fraction.
- An ANI Containment matrix is generated for all Samples vs Genomes in the final MultiQC report.
- The FastQC results in the MultiQC report are moved further down.
- Bug fixes.
bettercallsal v0.2.1
This release is a hotfix for an error related to --bcs_root_dbdir.
bettercallsal v0.2.0
The first release of the Nextflow workflow called bettercallsal, which assigns Salmonella serotypes, mostly in samples that are suspected to be a multi-serovar mixture, based on the NCBI Pathogens project.