Krutika Gaonkar edited this page Oct 6, 2020 · 4 revisions

annoFuse for cohort

1) Fusion Standardization fusion_standardization()

Standardize calls from fusion callers to retain information on fused genes, breakpoints, and reading frame, as well as annotation from FusionAnnotator.

Input : Merged fusion calls per caller with additional columns "annots" and "tumor_id" as sample identifier

Output : Standardized fusion calls with following columns

LeftBreakpoint : 5' gene breakpoint

RightBreakpoint : 3' gene breakpoint

FusionName : geneA--geneB

Sample : tumor_id used by user in merged samples set

Caller : e.g. StarFusion, Arriba, etc.

Fusion_Type : reading frame information

JunctionReadCount : junction supporting reads

SpanningFragCount : fragments spanning the fusion

Confidence : confidence reported by the caller, if available (otherwise NA)

annots : Annotation provided by user; recommended FusionAnnotator
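The column list above can be pictured as a fixed record layout. The following is a minimal Python sketch of that idea only; it is not annoFuse's R implementation. The helper name `to_standard_row` is hypothetical, and the input record is assumed to already carry the standardized key names plus `tumor_id` (real caller output uses caller-specific column names). Gene names and breakpoints in any usage are placeholders.

```python
# Standardized column names, taken from the list above.
STANDARD_COLUMNS = [
    "LeftBreakpoint", "RightBreakpoint", "FusionName", "Sample", "Caller",
    "Fusion_Type", "JunctionReadCount", "SpanningFragCount", "Confidence",
    "annots",
]


def to_standard_row(call: dict, caller: str) -> dict:
    """Project one call onto the standardized columns (hypothetical helper).

    Assumes `call` already uses the standardized key names and has a
    `tumor_id` key; missing values become None (the NA equivalent here).
    """
    row = {col: call.get(col) for col in STANDARD_COLUMNS}
    row["Sample"] = call["tumor_id"]  # tumor_id becomes the Sample column
    row["Caller"] = caller
    return row


example = to_standard_row(
    {
        "LeftBreakpoint": "1:1000",      # placeholder coordinates
        "RightBreakpoint": "2:2000",
        "FusionName": "GENEA--GENEB",    # placeholder gene names
        "tumor_id": "S1",
        "Fusion_Type": "in-frame",
        "JunctionReadCount": 12,
        "SpanningFragCount": 4,
        "annots": "",
    },
    caller="Arriba",
)
```

Keeping the column order fixed makes it straightforward to stack calls from several callers into one table.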

2) a. Fusion artifact filtering fusion_filtering_QC()

Filter standardized fusion calls to remove likely false positives: calls with low read support, calls annotated as read-throughs, calls found in normal or gene-homolog databases, and calls in which neither fused gene is expressed above the given threshold.

Input : Standardized fusion calls from step1

Output : Standardized fusion calls after filtering read-throughs, artifacts flagged in the annotation column, and fusions with low read support
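As an illustration of the filtering described above, here is a small Python sketch. It is not the fusion_filtering_QC() implementation: the function name `filter_artifacts`, the `min_junction_reads` default, and the `flagged_terms` values are all assumptions chosen for the example, and only the read-support and annotation checks are shown.

```python
def filter_artifacts(calls, min_junction_reads=1,
                     flagged_terms=("readthrough", "normal")):
    """Drop calls with low read support or a flagged annotation.

    `calls` is a list of dicts using the standardized column names.
    The threshold and the flagged terms are illustrative placeholders.
    """
    kept = []
    for call in calls:
        annots = (call.get("annots") or "").lower()
        if call["JunctionReadCount"] < min_junction_reads:
            continue  # too few junction-supporting reads
        if any(term.lower() in annots for term in flagged_terms):
            continue  # annotated as read-through or seen in normal databases
        kept.append(call)
    return kept


calls = [
    {"FusionName": "GENEA--GENEB", "JunctionReadCount": 10, "annots": ""},
    {"FusionName": "GENEC--GENED", "JunctionReadCount": 0, "annots": ""},
    {"FusionName": "GENEE--GENEF", "JunctionReadCount": 8,
     "annots": "readthrough"},
]
kept = filter_artifacts(calls)
```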

2) b. Expression based filtering expression_filter_fusion()

If both genes of a fusion are deemed not expressed (< 1 FPKM or TPM by default), the fusion call can be removed, because the fusion transcript is most probably not expressed.

Input : Standardized fusion calls from step1

Output : Standardized fusion calls after removing fusions in which both genes are expressed below the threshold (default 1)
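The rule can be sketched in a few lines of Python. This is an assumption-laden illustration, not expression_filter_fusion() itself: the function name `expression_filter` is hypothetical, `expression` is assumed to be a gene-to-value mapping (FPKM or TPM) for a single sample, genes missing from the mapping are treated as not expressed, and intergenic names containing "/" are not handled.

```python
def expression_filter(calls, expression, threshold=1.0):
    """Keep a fusion unless BOTH partner genes fall below `threshold`."""
    kept = []
    for call in calls:
        gene_a, gene_b = call["FusionName"].split("--", 1)
        expr_a = expression.get(gene_a, 0.0)  # assumption: missing gene = 0
        expr_b = expression.get(gene_b, 0.0)
        if expr_a < threshold and expr_b < threshold:
            continue  # neither partner is expressed; drop the call
        kept.append(call)
    return kept


expression = {"GENEA": 5.2, "GENEB": 0.1, "GENEC": 0.3, "GENED": 0.0}
fusions = [{"FusionName": "GENEA--GENEB"}, {"FusionName": "GENEC--GENED"}]
expressed = expression_filter(fusions, expression)
```

A fusion is kept when at least one partner is expressed, so GENEA--GENEB survives even though GENEB is below the threshold.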

3) a. Annotate genes with biological features of interest eg. Kinase, Tumor suppressor etc. annotate_fusion_calls()

Input : Filtered standardized fusion calls from step2

Output : Standardized fusion calls with annotation per gene. Since callers like Arriba also call intergenic fusions, the fused genes are divided as follows: gene1A--gene1B is a genic fusion between gene1A and gene1B; if the fusion has an intergenic 5' breakpoint, the fusion name is gene1A/gene2A--gene1B; and if the 3' breakpoint is intergenic, the fusion name is gene1A--gene1B/gene2B. Additional columns are:

Gene1A_anno : annotation per gene from reference gene list for Gene1A

Gene1B_anno : annotation per gene from reference gene list for Gene1B

Gene2A_anno : annotation per gene from reference gene list for Gene2A

Gene2B_anno : annotation per gene from reference gene list for Gene2B

Fusion_anno : annotation per fusion from reference fusion list

reciprocal_exists : annotation per fusion if a reciprocal exists in the same sample
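The naming convention and the reciprocal check can be illustrated with a short Python sketch. It follows the fusion name patterns described above but is not annotate_fusion_calls(): both helper names are hypothetical, and the reciprocal check is a plain swap of the two sides of the name within one sample.

```python
def split_fusion_name(name):
    """Split a fusion name into Gene1A/Gene2A (5' side) and Gene1B/Gene2B (3' side).

    "A--B"     -> genic fusion
    "A/C--B"   -> intergenic 5' breakpoint (Gene2A is set)
    "A--B/D"   -> intergenic 3' breakpoint (Gene2B is set)
    """
    left, right = name.split("--", 1)
    left_genes = left.split("/")
    right_genes = right.split("/")
    return {
        "Gene1A": left_genes[0],
        "Gene2A": left_genes[1] if len(left_genes) > 1 else None,
        "Gene1B": right_genes[0],
        "Gene2B": right_genes[1] if len(right_genes) > 1 else None,
    }


def reciprocal_exists(name, sample_fusions):
    """True if the swapped fusion (B--A for A--B) is in the same sample."""
    left, right = name.split("--", 1)
    return f"{right}--{left}" in sample_fusions


parts = split_fusion_name("GENEA/GENEC--GENEB")
has_reciprocal = reciprocal_exists(
    "GENEA--GENEB", {"GENEA--GENEB", "GENEB--GENEA"}
)
```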

3) b. Annotation of retained domains fusion_driver()

Input : Filtered standardized fusion calls from step3a

Output : Fusion calls annotated with domain retention status for Gene1A and Gene1B for the given pfamIDs; defaults to kinase domain retention status. Additional columns are:

DomainRetainedGene1A, DomainRetainedGene1B with values No, Partial, or Yes described here
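One way to think about the three values is as an interval comparison between a domain and the part of the gene retained in the fusion. The sketch below is only that mental model in Python, under stated assumptions: coordinates are inclusive and on a single axis, strand handling is ignored, and the function name `domain_retention` is hypothetical. It does not reproduce how fusion_driver() derives these values.

```python
def domain_retention(domain_start, domain_end, retained_start, retained_end):
    """Classify a domain against the retained segment of a gene.

    Yes     : domain lies entirely within the retained segment
    No      : domain does not overlap the retained segment
    Partial : the breakpoint cuts through the domain
    """
    if retained_start <= domain_start and domain_end <= retained_end:
        return "Yes"
    if domain_end < retained_start or domain_start > retained_end:
        return "No"
    return "Partial"


statuses = [
    domain_retention(10, 20, 0, 30),   # fully inside the retained segment
    domain_retention(10, 20, 15, 30),  # breakpoint at 15 splits the domain
    domain_retention(10, 20, 21, 30),  # domain lies before the retained part
]
```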

3) c. (OPTIONAL) Annotation of zscore using a normal expression matrix zscored_annotation()

Input : Filtered standardized fusion calls from step3a

Output : Annotation from a zscore calculation against a normal expression matrix, identifying per gene in the fusion whether expression is differential compared to normal. Additional columns are:

note_expression_[Gene1A|Gene2A|Gene1B|Gene2B] : "differential expressed" if zscore is more than threshold, otherwise NA

zscore_[Gene1A|Gene2A|Gene1B|Gene2B] : zscore calculated using the normal expression matrix
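A per-gene zscore against normal samples can be sketched as follows. This Python example is illustrative and makes several assumptions not stated on this page: the sample standard deviation is used, the comparison is on the absolute zscore (a two-sided reading of "more than threshold"), the threshold of 2 is a placeholder, and no log transformation is applied. The function name `zscore_annotation` is hypothetical; zscored_annotation() may differ on each of these points.

```python
from statistics import mean, stdev


def zscore_annotation(tumor_value, normal_values, threshold=2.0):
    """Return (zscore, note) for one gene in one tumor sample.

    `normal_values` holds the gene's expression across normal samples.
    The note is None (NA) when the zscore does not pass the threshold.
    """
    mu = mean(normal_values)
    sd = stdev(normal_values)  # assumption: sample standard deviation
    if sd == 0:
        return None, None  # zscore undefined without variation in normals
    z = (tumor_value - mu) / sd
    note = "differential expressed" if abs(z) > threshold else None
    return z, note


z_high, note_high = zscore_annotation(5.0, [1.0, 2.0, 3.0])
z_same, note_same = zscore_annotation(2.0, [1.0, 2.0, 3.0])
```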

Interactive Fusion Exploration using shinyFuse()

Output from step 3 for cohort samples, or from annoFuse_single_sample() per sample, can be provided to shinyFuse() or to the standalone app http://shiny.imbei.uni-mainz.de:3838/shinyFuse/ to explore the results and generate minimal domain, breakpoint location plots and cohort recurrence plots.

Project summary visualization plot_summary()

Input : Filtered fusion calls annotated with reference gene and/or normal zscore

Output : Summary plot showing the distribution of intra-chromosomal and inter-chromosomal fusions, the number of in-frame and frameshift calls per algorithm, and the distribution of gene biotypes, kinase groups, and oncogenic annotations