Releases: soedinglab/plass
Release 5-cf8933
Plass & Penguin Release Notes
First release of Penguin, a metagenomic assembler that assembles DNA/RNA through a novel greedy AA/DNA-hybrid Bayesian overlap extension strategy.
New Features and Enhancements
- First release of Penguin: we now generate two binaries, plass and penguin. Plass assembles protein sequences from DNA, while Penguin assembles DNA contigs. Penguin comes in two variants: penguin guided_nuclassemble, which first assembles using overlaps of six-frame-translated amino-acid sequences and then further assembles the contigs using nucleotide information, and penguin nuclassemble, a pure nucleotide assembler.
- Compatibility and Portability: thanks to SIMDe, Plass and Penguin now run on ARM (including Apple Silicon) and PowerPC.
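The guided variant relies on six-frame translation: each nucleotide sequence is read in three forward frames and in three frames of its reverse complement. The Python sketch below only illustrates that concept under stated assumptions (standard genetic code, codons containing N translated as X, incomplete trailing codons dropped); the function names are hypothetical and this is not Penguin's code.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumption: N is its own complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'.
    A trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq):
    """Return the three forward frames followed by the three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(strand[offset:]) for strand in (seq, rc) for offset in range(3)]


print(six_frame_translate("ATGGCC"))
```

The reverse frames are read from the reverse complement, which is why handling N correctly there matters for the nucleotide assembler.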
Plass Release 4-687d7
Changes since Release 3-764a3:
At a glance: Significant further development of the nucleotide/hybrid assembler. Updated MMseqs2 submodule and adjusted Plass to multiple MMseqs2 changes.
Features
- Plass can extend one contig multiple times within one iteration
- Hybrid assembly is progressing nicely, stay tuned for updates!
- Plass works on many more architectures (e.g. PPC64LE, ARM64 and x64 with SSE2 only)
Plass Release 3-764a3
Changes since Release 2-c7e35:
At a glance: Significant further development of the nucleotide assembler. Reduced hard disk requirements for protein assembler and many bug fixes.
Updated MMseqs2 submodule and adjusted Plass to multiple MMseqs2 changes.
Breaking Changes
- added reverse complement treatment for nucleotide sequences (plass nuclassemble)
- introduced the --kmer-per-seq-scale parameter to avoid missing good hits of long sequences. The number of extracted k-mers can now be scaled by a user-defined factor multiplied by the length of the sequence.
- changed the scoring mode for alignment calculation (--rescore-mode 3)
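As a rough illustration of why --kmer-per-seq-scale helps long sequences, the Python sketch below computes a per-sequence k-mer budget as scale × sequence length, capped by the number of k-mers the sequence actually contains. The helper name, the rounding, and the minimum of one k-mer are assumptions for illustration; whether Plass also adds a fixed base count is not stated in these notes.

```python
def kmers_to_extract(seq_len, k, scale):
    """Hypothetical helper: k-mer budget for one sequence.

    Assumes budget = floor(scale * seq_len), at least 1, and never more
    than the seq_len - k + 1 k-mers the sequence actually has.
    """
    available = max(seq_len - k + 1, 0)
    wanted = max(int(scale * seq_len), 1)
    return min(wanted, available)


# A fixed budget would treat a 100 bp read and a 10 kb contig alike;
# scaling by length gives the long sequence proportionally more k-mers.
print(kmers_to_extract(100, 14, 0.1), kmers_to_extract(10000, 14, 0.1))
```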
Features
- added stdin support:
cat reads.fas | plass assemble stdin asm tmp
- reduced hard disk requirements by roughly a factor of 12 (--delete-tmp-inc)
- added a first raw version of a cycle detector (still experimental) to avoid over-extension in nucleotide assembly
- introduced a new header format, which is now consistent for the protein and nucleotide assemblers:
<uniq ID> len:<len> cycle:<0|1>
The cycle field is optional (used for the nucleotide case).
- introduced new logic to handle sequences with N repeated k-mers: sequences with more than N repeated k-mers are no longer excluded from the assembly process entirely; instead, the repeated k-mers are ignored only in the kmermatcher phase. Replaced the --skip-n-repeat parameter with --ignore-multi-kmer.
- overlaps are still sorted by ScorePerColumn, but the bit score was replaced by the raw score to scale correctly with the overlap length
- introduced the --min-contig-len parameter to set the minimum length of assembled contigs to output (nucleotide assembly)
- added redundancy reduction (nucleotide assembly) by clustering sequences based on a user-defined threshold (--clust-thr, default 0.97)
- Dockerfile now uses Debian slim instead of Alpine
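The header format above is regular enough to parse when post-processing assembler output. The Python sketch below assumes IDs contain no whitespace, the fields are separated by single spaces, and the minimum-length check is inclusive; parse_header and keep_contig are hypothetical helpers, and the filtering Plass itself performs for --min-contig-len happens inside the assembler.

```python
import re

# Layout from the release notes: "<uniq ID> len:<len> cycle:<0|1>",
# where the cycle field is optional.
HEADER_RE = re.compile(r"^(?P<id>\S+) len:(?P<len>\d+)(?: cycle:(?P<cycle>[01]))?$")


def parse_header(header):
    """Parse a FASTA header (with or without the leading '>')."""
    m = HEADER_RE.match(header.lstrip(">").strip())
    if m is None:
        raise ValueError(f"unexpected header: {header!r}")
    cycle = m.group("cycle")
    return {
        "id": m.group("id"),
        "len": int(m.group("len")),
        # None when the optional cycle field is absent.
        "cycle": None if cycle is None else cycle == "1",
    }


def keep_contig(header, min_contig_len):
    """Keep contigs at least min_contig_len long (inclusive, by assumption)."""
    return parse_header(header)["len"] >= min_contig_len


print(parse_header(">42 len:350 cycle:1"))
```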
Bugs
- fixed problems in the first iteration of the protein assembler
- fixed problems with start and stop codons occurring in the transition from protein alignments to nucleotide alignments, and in the alignment offset calculation
- split file existence check in workflows to individual checks to avoid repeated linking problems
- fixed bug in the reverse complement calculation for N's in nucleotide sequences
- fixed several problems with long sequences in the k-mer matching phase
- fixed broken compilation without zlib
Plass Release 2-c7e35
Changes since Release 1-2e0ef:
- overlaps are now sorted by score per column instead of sequence identity
- new flag --protein-filter-threshold to change the neural network threshold for filtering proteins
- improved the neural network by retraining it with cleaner training data
- added support for merging paired-end files with different lengths
- fixed a bug in start codon correction
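Sorting by score per column favors overlaps that score well for every aligned position rather than overlaps that are merely long. The Python sketch below assumes score per column is the raw alignment score (not the bit score, per Release 3) divided by the number of alignment columns; the Overlap record and rank_overlaps are hypothetical names, not Plass internals.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """Hypothetical record for one candidate overlap."""
    target: str
    raw_score: int  # raw alignment score
    aln_len: int    # number of alignment columns

    @property
    def score_per_column(self):
        return self.raw_score / self.aln_len


def rank_overlaps(overlaps):
    """Best candidates first; ties keep their input order (sorted is stable)."""
    return sorted(overlaps, key=lambda o: o.score_per_column, reverse=True)


candidates = [Overlap("a", 100, 50), Overlap("b", 90, 30), Overlap("c", 200, 100)]
print([o.target for o in rank_overlaps(candidates)])
```

Here "b" ranks first despite the lowest raw score, because it scores highest per aligned column.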
Plass Release 1-2e0ef
Plass (Protein-Level ASSembler) is software for assembling short-read sequencing data at the protein level. The main purpose of Plass is the assembly of complex metagenomic datasets.
Features
- support for assembling on multiple compute nodes using MPI
- added the --min-length flag to adjust the codon extraction length
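To show what a minimum codon length does during extraction, the Python sketch below splits one forward reading frame at stop codons and keeps only stretches with at least min_codons codons, including an unterminated stretch at the end of the read. This is a simplified, hypothetical stand-in: it assumes --min-length counts codons and ignores start codons, the reverse strand, and the other details of Plass's actual extraction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq, frame, min_codons):
    """Hypothetical helper: stop-to-stop stretches in one forward frame
    (frame = 0, 1 or 2) that contain at least min_codons codons."""
    seq = seq.upper()
    orfs, start = [], frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                orfs.append(seq[start:i])
            start = i + 3  # next stretch begins after the stop codon
    # Trailing stretch without a stop codon (reads are often fragments).
    end = start + max(len(seq) - start, 0) // 3 * 3
    if (end - start) // 3 >= min_codons:
        orfs.append(seq[start:end])
    return orfs


print(orfs_in_frame("ATGAAATAGCCCGGGTTT", 0, 3))
```

Raising the threshold discards short fragments between stop codons, trading sensitivity for fewer spurious short peptides.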
First release
First Plass release