Releases: suhrig/arriba
Arriba v2.4.0
- new utility script to annotate exon numbers
- compatibility with Illumina's Dragen aligner (see notes in manual about supported aligners)
- retained fraction of protein domain was often overestimated as 100%
- better agreement of transcripts between Arriba's output file and the visualizations produced by draw_fusions.R by making --transcriptSelection=provided the default
- better matching of structural variant breakpoints to fusion breakpoints when parameter -d is used
- VCF files generated by scripts/convert_fusions_to_vcf.sh are now compatible with bcftools
- mildly improved filtering
Arriba v2.3.0
- blacklist PhiX genome, since it is often used as spike-in control
- stricter filtering of read-through fusions
- fix broken compilation due to outdated zlib URL (thanks to @iainrb)
- updated protein domain annotation files (GFF3), now with 7-15% more annotation records
- updated reference files in download_references.sh to match protein domain annotation version
- download_references.sh did not properly harmonize chromosome names between assembly (FastA) and annotation (GTF) when an assembly with chr prefix was used (hg19/38, mm10/39), which had minor implications on alignment and fusion calling
- coverage plots can be scaled separately and/or to a user-defined cutoff (--coverageRange=...)
- scripts are now compatible with macOS (a recent version of bash must be installed, though; the preinstalled version 3.2 is too old)
- minor fixes for reading frame prediction when breakpoint is close to first/last exon
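The chromosome-name mismatch between assembly and annotation mentioned above can be illustrated with a minimal Python sketch. This is not the logic of download_references.sh; the function names, the restriction to primary chromosomes, and the chrM/MT mapping are assumptions made for the example.

```python
import re

# Assumption: only primary chromosomes (1-22 style numbers, X, Y, M/MT) are
# renamed; scaffolds and other contigs are passed through unchanged.
_PRIMARY = re.compile(r"(?:chr)?(\d+|X|Y|M|MT)")


def harmonize_contig(name: str, use_chr_prefix: bool) -> str:
    """Return the contig name in the requested naming convention."""
    match = _PRIMARY.fullmatch(name)
    if match is None:
        return name  # unknown contig: leave untouched
    bare = match.group(1)
    # Assumption: the mitochondrial contig is 'chrM' with prefix, 'MT' without.
    if bare in ("M", "MT"):
        return "chrM" if use_chr_prefix else "MT"
    return "chr" + bare if use_chr_prefix else bare


def harmonize_gtf_line(line: str, use_chr_prefix: bool) -> str:
    """Rewrite the first (contig) column of one GTF line; comments pass through."""
    if line.startswith("#"):
        return line
    contig, sep, rest = line.partition("\t")
    return harmonize_contig(contig, use_chr_prefix) + sep + rest
```

Applying the same function to FastA header names and to the first GTF column keeps both files on one convention, which is the property the fix above restores.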
Arriba v2.2.1
- reverted a change introduced in v2.2.0: download_references.sh now uses the ENSEMBL GRCh38 assembly (FastA) again instead of the ICGC-ARGO assembly, because the latter contains ALT contigs, which is not recommended for alignment using STAR according to the STAR user manual; moreover, due to a scripting error, the GRCh38 assembly generated by download_references.sh contained malformed data at the end of the file, which is now fixed as well
Arriba v2.2.0
- improved detection of internal tandem duplications
- better sensitivity for the detection of viral integration sites
- inclusion of additional ~4500 viruses into screening, including rare strains of cancer-associated viruses (requires rebuild of STAR index)
- viral contigs were renamed to be compliant with the SAM format specification (requires rebuild of STAR index)
- support for mm39/GRCm39
- utility scripts (see also manual):
  - quantify virus expression
  - convert Arriba's custom output format to VCF
  - extract fusion-supporting alignments into separate mini-BAM
- running Arriba on a prealigned BAM file and realigning only the fusion candidate reads saves ~80% of the CPU time compared to a complete realignment (useful when the alignments were generated by an old STAR version or by a different aligner such as HISAT2)
- polishing of fusion visualizations created by draw_fusions.R and new features:
  - all transcripts can be drawn at the same scale if desired (--fixedScale)
  - circos plots have same size across all pages
  - set PDF title and print as header on every page (--sampleName)
  - fine-grained control over region to draw for intergenic breakpoints (--showIntergenicVicinity)
  - choose a different font (--fontFamily)
  - better scaling for coverage track
- more fixes for prediction of reading frame
- better warnings and error messages
- updated STAR to version 2.7.10a, which fixes malformed chimeric alignments for paired-end reads with small insert size
- updated dependencies (HTSlib, libdeflate)
Arriba v2.1.0
- Arriba can now be cited
- arcs in circos plot are colored by type of rearrangement
- internal tandem duplications are flagged with the keyword ITD in Arriba's output file
- more effective filtering of germline polymorphism internal tandem duplications
- draw_fusions.R loads reference files faster
- under some rare conditions, the reading frame was erroneously predicted as out-of-frame
Arriba v2.0.0
- report viral integration sites
- report fusions supported by multi-mapping reads (e.g., CIC-DUX4, NPM1-ALK)
- report internal tandem duplications (e.g., FLT3, BCOR, ERBB2, NOTCH1)
- improved detection of IG/TCR rearrangements
- known fusions file based on the Mitelman database is now part of the download
- more comprehensive annotation (gene IDs, transcript IDs, user-defined tags, retained protein domains)
- support for mouse (mm10)
- (optionally) report the full transcript/peptide sequence (parameter -I) rather than only what can be assembled from the supporting reads
- structural variants can be supplied in VCF format (parameter -d)
- MacOS support
- faster loading of BAM files thanks to HAT-trie map as well as other speed improvements
- draw_fusions.R accepts the format of STAR-Fusion
- ability to make use of external duplicate marking, e.g., for UMIs (parameter -u)
- enhanced blacklist
- simplified code compilation procedure
- support assemblies with up to 65,000 contigs (previously 32,000)
Important compatibility notes when upgrading from version 1.x:
- STAR version >= 2.7.6a is required to make use of multi-mapping chimeric reads
- new columns were added to the output files and some were rearranged
- the parameter -P is obsolete; the parameters -I and -T have been repurposed
- parsing of input TSV files (GTF, known fusions, blacklist, structural variants) is now stricter
- the order of the genes in the known fusions file (parameter -k) is now important
- the reading_frame column may contain the new value stop-codon
- the site1/2 columns may contain new values
- the parameters of the run_arriba.sh script have changed
- the download_references.sh script is now parameterized using environment variables
- the chr prefix is no longer removed from the output files
- the alignment parameters of run_arriba.sh are set to report up to 50 multi-mapping reads
- some filters were removed/renamed, which is relevant if the parameter -f is used
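Because columns were added and rearranged and the reading_frame column gained a new value, downstream scripts written for 1.x are safer when they look up columns by name rather than by position. The following Python sketch tallies reading_frame values that way. It assumes the output is a tab-separated file with a header row containing a reading_frame column; the function name, the other header names, and the sample rows are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable


def count_reading_frames(lines: Iterable[str]) -> Counter:
    """Tally the values of the reading_frame column of a tab-separated table.

    Columns are addressed by header name, so added or reordered columns do
    not break the lookup, and unexpected values are simply counted.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["reading_frame"] for row in reader)


if __name__ == "__main__":
    # Made-up rows for illustration only; real files have many more columns.
    sample = [
        "#gene1\tgene2\treading_frame\n",
        "GENE_A\tGENE_B\tout-of-frame\n",
        "GENE_C\tGENE_D\tstop-codon\n",
        "GENE_E\tGENE_F\tout-of-frame\n",
    ]
    for value, count in sorted(count_reading_frames(sample).items()):
        print(value, count)
```

For a real file, an open file handle can be passed in place of the sample list, since csv.DictReader accepts any iterable of lines.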
Arriba v1.2.0
- better filtering of in vitro-generated artifacts
- known_fusions filter is more sensitive
- update dependencies (HTSlib, compression libs)
- example data
- under some (rare) conditions, reading frame was incorrect
- documentation provides tips on how to interpret fusion predictions
- better error messages for common cases of incorrect usage
Arriba v1.1.0
- speed improvements (BAM file loading, low_entropy filter, homologs filter, GTF parsing)
- prebuilt Docker image available at Docker Hub
- installation via bioconda
- improved confidence scoring
- better detection of intragenic rearrangements
- new blacklist
- fix some non-deterministic behavior
- more reliable auto-detection of strandedness
- protein domains were drawn in incorrect order for genes on the reverse strand
- handle empty input files more reasonably
Arriba v1.0.1
- fix bugs in parsing of command-line arguments
- do not compress intermediate (unsorted) BAM file to avoid performance bottleneck
Arriba v1.0.0
- streamlined workflow (extract_read-through_fusions is obsolete)
- generate publication-quality figures of fusions
- predict peptide sequences
- protein domain track for loading into IGV
- Singularity recipe
- CRAM support
- simplified installation procedure
- fix off-by-one error (=> new blacklists!)
- improved sensitivity/specificity