Releases: CFSAN-Biostatistics/bettercallsal

bettercallsal v1.0.0

10 Sep 21:53

bettercallsal is published!

Read our article at Frontiers in Microbiology.

This is the major milestone release (v1.0.0) of bettercallsal with the following changes:

  • The Illumina short-read pipeline (bettercallsal) has been very stable over many hundreds of runs, and we are excited to announce support for Oxford Nanopore long reads, which can be activated with the --pipeline bettercallsal_lr option.
  • The long-read variant of the pipeline uses filtlong for long-read filtering and flye for assembly.
  • abricate databases have been updated.
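
As a minimal sketch of switching between the two pipelines, the snippet below only assembles and prints the command line (a dry run) rather than executing it. The variable names are illustrative, and all input, output and database arguments are omitted because the notes do not list them.

```shell
# Dry-run sketch: build the long-read command line and print it instead of running it.
pipeline="bettercallsal_lr"   # "bettercallsal" selects the Illumina short-read pipeline
cmd="cpipes --pipeline ${pipeline}"
# Remaining arguments (inputs, outputs, database paths) are intentionally left out.
echo "$cmd"
```

Replace `echo "$cmd"` with the full invocation once the remaining arguments for your environment are filled in.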


 

Disclaimer


Please note that bettercallsal_lr is still under development. As more datasets become available for testing, we plan to release new versions of the software with parameter fine-tuning and/or tool changes.

bettercallsal v0.7.0

02 Jan 18:30

This is the v0.7.0 release of bettercallsal with the following changes:

  • Fixes the SSL issue reported in #1.
  • bettercallsal can now skip collecting hits (RefSeq or GenBank accessions) that belong to an NCBI BioProject. We found that some genome assemblies had no corresponding run data (SRRs) submitted to NCBI and/or did not meet internal quality metrics. We plan to incorporate additional genome quality filters in future releases.
    • By default, accessions belonging to BioProjects PRJNA766315,PRJNA675435,PRJNA831577,PRJNA855361 are ignored during the mash screen step.

    • Use the --tuspy_skip command-line option to either turn this off or add more BioProject accessions:

      cpipes --pipeline bettercallsal --tuspy_skip false ...........

      or

      cpipes --pipeline bettercallsal --tuspy_skip 'PRJNA766315,PRJNA675435,PRJNA831577,PRJNA855361,PRJNA855362' ...........
  • You can now turn off MultiQC report generation with the --multiqc_run false command-line option if the number of input samples is too large.
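
The sketch below shows one way to extend the skip list without retyping the defaults. It assumes, as the --tuspy_skip example above suggests, that a supplied list replaces the default list rather than adding to it, so the defaults are repeated explicitly. The variable names are illustrative, and the command is printed rather than run because the remaining arguments are elided in these notes.

```shell
# Default BioProjects skipped in v0.7.0 (taken from the release notes).
default_skip="PRJNA766315,PRJNA675435,PRJNA831577,PRJNA855361"
# Extra accession to skip; PRJNA855362 is the one used in the example above.
extra_skip="PRJNA855362"
tuspy_skip="${default_skip},${extra_skip}"
# Dry run: print the command, also disabling the MultiQC report for large sample sets.
cmd="cpipes --pipeline bettercallsal --tuspy_skip '${tuspy_skip}' --multiqc_run false"
echo "$cmd"
```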

All of our internal benchmark studies used the PDG000000002.2727 version of the NCBI Pathogens release for Salmonella enterica.

  • You can download the pre-formatted database generated for PDG000000002.2727 with the bettercallsal_db workflow from S3.

bettercallsal v0.6.1

24 Aug 14:38

This is the v0.6.1 release of bettercallsal with the following changes:

  • NCBI datasets is updated to v15.3.1.
  • Improved the stability of the bettercallsal_db workflow.
    • It is strongly recommended that you run the bettercallsal_db workflow in a grid computing or cloud computing (e.g., AWS Batch) setting for the following reasons:
      • The second process of the workflow (FILTER_PDG_METADATA) spawns ~469 jobs that first query NCBI to fetch accessions for which genome FASTAs are downloadable.
      • The penultimate process of the workflow (SCAFFOLD_GENOMES) also spawns >300 jobs to download the genome FASTAs and scaffold them.
    • Internal tests show that, with a stable internet connection, the bettercallsal_db workflow finishes in approximately 1 hour and 30 minutes.
  • Fixed a bug in which creation of the per_snp_cluster db did not retain genomes whose contig / scaffold sizes are identical per the waterfall algorithm.

bettercallsal v0.6.0

02 Aug 20:00

This is the v0.6.0 release of bettercallsal with the following changes:

  • After kma alignment, reads classified as Salmonella spp. are now fed into megahit for metagenome assembly.
  • Running the megahit assembler enables the use of mlst for sequence typing and abricate for AMR profiling.
  • MultiQC is updated to v1.14.
  • The megahit process is ON by default. You can turn off the megahit assembly with the --megahit_run false CLI option.
  • bettercallsal v0.6.0 has been successfully stress tested on samples with more than 300 million reads (~323 million reads) and uses no more than 27 GB for any of the processes.
  • Squashed some bugs.
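
A small dry-run sketch of the assembly toggle follows. Since mlst and abricate are described above as enabled by the megahit assembly, turning the assembly off presumably means those results are not produced; that is an inference from these notes, not a documented behavior. The variable name is illustrative.

```shell
# "true" is the default per the notes; "false" skips the megahit assembly step.
run_megahit="false"
# Dry run: print the command; remaining arguments are omitted.
cmd="cpipes --pipeline bettercallsal --megahit_run ${run_megahit}"
echo "$cmd"
```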

bettercallsal v0.5.0

02 Jun 15:41

bettercallsal preprint is out at biorxiv.org.

This is the v0.5.0 release of bettercallsal with the following changes:

  • fastp is now used to perform quality control and adapter trimming. By default, custom adapters are not used, but they can be turned on with the --fastp_use_custom_adapters true option.
    • The custom adapters distributed with the software are used automatically with the --fastp_use_custom_adapters true command-line option. Please note that this makes the workflow run more slowly because of the exhaustive search for all possible adapter and primer sequences.
    • To use your own adapter FASTA, supply a valid UNIX path to the --fastp_adapter_fasta command-line option.
      • Ex: --fastp_use_custom_adapters true --fastp_adapter_fasta /path/to/custom/adapters.fasta.
  • Based on internal research, we found that running the workflow on concatenated R1+R2 reads yields better results if the input library is paired-end. This version of bettercallsal automatically concatenates the R1 and R2 reads if the input data set is paired-end. Use the command-line option --fq_single_end false to trigger this step.
    • Please note that you need to set the --fq_suffix and --fq_suffix2 options correctly. By default, --fq_suffix is set to .fastq.gz. For paired-end input, you may have to set it to a value such as --fq_suffix '_R1_001.fastq.gz'.
  • Added a global Salmonella presence/absence table to the MultiQC report.
  • Squashed some bugs.
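
The paired-end options above can be combined as in the following dry-run sketch. The R1 suffix comes from the notes; the R2 suffix is an assumption, derived here by swapping `_R1_` for `_R2_`, and should be checked against your actual file names. The variable names are illustrative, and the command is printed rather than executed.

```shell
# R1 suffix as given in the release notes.
fq_suffix="_R1_001.fastq.gz"
# Assumption: the mate file follows the same naming pattern with _R2_ in place of _R1_.
fq_suffix2=$(printf '%s' "$fq_suffix" | sed 's/_R1_/_R2_/')
# Dry run: paired-end input with the bundled custom adapters (slower, per the notes).
cmd="cpipes --pipeline bettercallsal --fq_single_end false --fq_suffix '${fq_suffix}' --fq_suffix2 '${fq_suffix2}' --fastp_use_custom_adapters true"
echo "$cmd"
```

To use your own adapter file instead, append --fastp_adapter_fasta followed by its path, as shown in the Ex: line above.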

bettercallsal v0.4.0

17 Mar 19:29

This is the v0.4.0 release of bettercallsal with the following changes:

  • sourmash search is now available as an alternative to the default sourmash gather, which is used as an additional step to further narrow down possible serotype hits based on genome fraction. Please refer to the sourmash docs to decide which one is appropriate for your use case. To activate sourmash search, turn off sourmash gather with the --sourmashgather_run false CLI option.
  • By default, 10 CPU cores are used to run all workflow steps. You can change this behavior and set the maximum number of CPU cores with the --max_cpus CLI option. Ex: --max_cpus 5.
  • The minimum memory requirements have been re-tested with all workflow steps, and the bettercallsal workflow now requires only 16 GB of memory instead of 64 GB.
  • v0.4.0 of bettercallsal has been successfully tested in a cloud environment with AWS Batch. You need to set up the proper AWS Batch resources per the Nextflow docs. Another example: Manual AWS Batch Configuration.
  • Squashed some bugs.
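
As a hedged illustration of the two options introduced in this release, the dry-run sketch below switches from sourmash gather to sourmash search and caps the CPU count at the value used in the Ex: line above. The variable name is illustrative, and the remaining arguments are omitted.

```shell
# Cap on CPU cores; the workflow default is 10 per the notes.
max_cpus=5
# Dry run: turning off sourmash gather activates sourmash search instead.
cmd="cpipes --pipeline bettercallsal --sourmashgather_run false --max_cpus ${max_cpus}"
echo "$cmd"
```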

bettercallsal v0.3.0

27 Dec 15:39

This is the v0.3.0 release of bettercallsal with the following changes:

  • Now, sourmash is used as an additional step to further narrow down possible serotype hits based on genome fraction.
  • An ANI Containment matrix is generated for all Samples vs Genomes in the final MultiQC report.
  • The FastQC results in the MultiQC report are moved further down.
  • Bug fixes.

bettercallsal v0.2.1

30 Nov 14:32

This release is a hotfix for an error related to --bcs_root_dbdir.

bettercallsal v0.2.0

29 Nov 16:07

The first release of bettercallsal, a Nextflow workflow that assigns Salmonella serotypes based on the NCBI Pathogens project, mostly in samples suspected to be multi-serovar mixtures.