Home
1) Standardize calls from fusion callers to retain information on the fused genes, breakpoints and reading frame, as well as annotation from FusionAnnotator.
Input : Merged fusion calls per caller with additional columns "annots" and "tumor_id" as sample identifier
Output : Standardized fusion calls with following columns
LeftBreakpoint : 5' gene breakpoint
RightBreakpoint : 3' gene breakpoint
FusionName : geneA--geneB
Sample : tumor_id used by user in merged samples set
Caller : eg StarFusion, Arriba etc
Fusion_Type : reading frame information
JunctionReadCount : junction supporting reads
SpanningFragCount : fragments spanning the fusion
Confidence : confidence provided by the caller, if available (otherwise NA)
annots : annotation provided by the user; FusionAnnotator is recommended
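As a rough illustration of what standardization means here, the sketch below maps one caller-specific record onto the columns listed above. It is written in plain Python and is not the package's implementation; the input keys (`left_breakpoint`, `gene5p`, `junction_reads` and so on), the fallback value `"other"` and the gene names are hypothetical placeholders, since each real caller uses its own column names.

```python
# Target column names come from the list above; input keys are hypothetical.
STANDARD_COLUMNS = [
    "LeftBreakpoint", "RightBreakpoint", "FusionName", "Sample", "Caller",
    "Fusion_Type", "JunctionReadCount", "SpanningFragCount", "Confidence",
    "annots",
]


def standardize_call(raw, caller):
    """Map one caller-specific record (a dict) onto the standardized columns."""
    return {
        "LeftBreakpoint": raw["left_breakpoint"],
        "RightBreakpoint": raw["right_breakpoint"],
        "FusionName": f'{raw["gene5p"]}--{raw["gene3p"]}',
        "Sample": raw["tumor_id"],
        "Caller": caller,
        "Fusion_Type": raw.get("reading_frame", "other"),  # assumed fallback
        "JunctionReadCount": int(raw["junction_reads"]),
        "SpanningFragCount": int(raw["spanning_frags"]),
        "Confidence": raw.get("confidence"),  # None stands in for NA
        "annots": raw.get("annots", ""),
    }


example = standardize_call(
    {
        "left_breakpoint": "chr1:1000",
        "right_breakpoint": "chr2:2000",
        "gene5p": "GENEA",
        "gene3p": "GENEB",
        "tumor_id": "sample_01",
        "reading_frame": "in-frame",
        "junction_reads": "12",
        "spanning_frags": "4",
    },
    caller="Arriba",
)
```

Records from different callers can then be concatenated into one table because they share the same keys.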
2) a. Filter standardized fusion calls to remove likely false positives: calls with low read support, calls annotated as read-throughs, calls found in normal and gene homolog databases, and calls where neither fused gene is expressed above the given threshold.
Input : Standardized fusion calls from step1
Output : Standardized fusion calls after filtering out read-throughs, artifacts flagged in the annotation column, and fusions with low read support
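The filtering logic can be sketched as a simple predicate over standardized calls. This is an illustration only, not the package's code: the read-count cutoff and the artifact terms below are hypothetical placeholders, and a real run would use whatever terms the chosen annotation source writes into the "annots" column.

```python
# Hypothetical placeholder terms; real annotation strings depend on the annotator.
ARTIFACT_TERMS = ("readthrough", "normal", "paralog")


def filter_calls(calls, min_junction_reads=1, artifact_terms=ARTIFACT_TERMS):
    """Keep calls with enough junction reads and no artifact term in annots."""
    kept = []
    for call in calls:
        annots = call["annots"].lower()
        low_support = call["JunctionReadCount"] < min_junction_reads
        flagged = any(term in annots for term in artifact_terms)
        if not (low_support or flagged):
            kept.append(call)
    return kept


calls = [
    {"FusionName": "GENEA--GENEB", "JunctionReadCount": 5, "annots": ""},
    {"FusionName": "GENEC--GENED", "JunctionReadCount": 0, "annots": ""},
    {"FusionName": "GENEE--GENEF", "JunctionReadCount": 10, "annots": "Readthrough"},
]
filtered = filter_calls(calls)
```

Here only the first call survives: the second has no junction reads and the third carries an artifact term.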
2) b. If both genes of the fusion are deemed not expressed (< 1 FPKM or TPM by default), the fusion call can be removed, as the fusion transcript most probably isn't expressed.
Input : Standardized fusion calls from step1
Output : Standardized fusion calls after removing fusions in which both genes are expressed below the threshold (default 1)
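A minimal sketch of this expression filter, assuming expression values are available as a gene-to-value mapping for the sample (FPKM or TPM). The function name and the handling of genes missing from the mapping (treated as 0) are assumptions for illustration; fusion names containing "/" for intergenic breakpoints would need extra parsing that is left out here.

```python
def expression_filter(calls, expression, threshold=1.0):
    """Drop a fusion only when both partner genes fall below the threshold."""
    kept = []
    for call in calls:
        gene_a, gene_b = call["FusionName"].split("--")
        expr_a = expression.get(gene_a, 0.0)  # assumption: missing gene -> 0
        expr_b = expression.get(gene_b, 0.0)
        if expr_a >= threshold or expr_b >= threshold:
            kept.append(call)
    return kept


expression = {"GENEA": 12.5, "GENEB": 0.2, "GENEC": 0.1, "GENED": 0.4}
calls = [{"FusionName": "GENEA--GENEB"}, {"FusionName": "GENEC--GENED"}]
expressed = expression_filter(calls, expression)
```

GENEA--GENEB is kept because one partner is expressed; GENEC--GENED is removed because both partners are below 1.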
3) a. Annotate genes with biological features of interest, e.g. kinase, tumor suppressor etc., using annotate_fusion_calls()
Input : Filtered standardized fusion calls from step2
Output : Standardized fusion calls with annotation per gene. Since callers like Arriba also call intergenic fusions, the fused genes are named as follows: gene1A--gene1B is a genic fusion between gene1A and gene1B; if the fusion has an intergenic 5' breakpoint the fusion name is gene1A/gene2A--gene1B, and if the 3' breakpoint is intergenic the fusion name is gene1A--gene1B/gene2B. Additional columns are:
Gene1A_anno : annotation per gene from reference gene list for Gene1A
Gene1B_anno : annotation per gene from reference gene list for Gene1B
Gene2A_anno : annotation per gene from reference gene list for Gene2A
Gene2B_anno : annotation per gene from reference gene list for Gene2B
Fusion_anno : annotation per fusion from reference fusion list
reciprocal_exists : whether a reciprocal fusion exists in the same sample
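The naming scheme and the reciprocal flag can be illustrated with a short sketch. It assumes the "--" and "/" separators described above and treats a reciprocal as the same two sides in swapped order within one sample; the helper names are hypothetical and this is not the package's code.

```python
def split_fusion_name(name):
    """Split gene1A/gene2A--gene1B/gene2B into its four gene slots."""
    left, right = name.split("--")
    gene1a, _, gene2a = left.partition("/")
    gene1b, _, gene2b = right.partition("/")
    return {
        "Gene1A": gene1a,
        "Gene2A": gene2a or None,  # None when the 5' breakpoint is genic
        "Gene1B": gene1b,
        "Gene2B": gene2b or None,  # None when the 3' breakpoint is genic
    }


def add_reciprocal_flag(calls):
    """Flag calls whose swapped fusion name occurs in the same sample."""
    seen = {(c["Sample"], c["FusionName"]) for c in calls}
    for c in calls:
        left, right = c["FusionName"].split("--")
        c["reciprocal_exists"] = (c["Sample"], f"{right}--{left}") in seen
    return calls


genes = split_fusion_name("GENE1/GENE2--GENE3")
flagged = add_reciprocal_flag([
    {"Sample": "S1", "FusionName": "GENEA--GENEB"},
    {"Sample": "S1", "FusionName": "GENEB--GENEA"},
    {"Sample": "S2", "FusionName": "GENEA--GENEB"},
])
```

In this example the two S1 calls are reciprocal to each other, while the S2 call has no reciprocal partner in its own sample.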
Input : Filtered standardized fusion calls from step3a
Output : Domain retention status for Gene1A and Gene1B for the given pfamIDs; defaults to kinase domain retention status. Additional columns are:
DomainRetainedGene1A, DomainRetainedGene1B : domain retention status, with values No, Partial or Yes
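One way to think about the three retention values is as an interval overlap between a domain and the part of the gene kept in the fusion. The sketch below makes that assumption explicit: coordinates are inclusive, the retained region is supplied by the caller of the function, and strand handling is left out. The function name is hypothetical and the package may derive the status differently.

```python
def domain_retained(domain_start, domain_end, kept_start, kept_end):
    """Classify a domain as retained (Yes), partly retained (Partial) or lost (No)."""
    domain_length = domain_end - domain_start + 1
    # Number of domain positions that fall inside the retained region.
    overlap = min(domain_end, kept_end) - max(domain_start, kept_start) + 1
    if overlap <= 0:
        return "No"
    if overlap >= domain_length:
        return "Yes"
    return "Partial"


status_full = domain_retained(100, 200, 1, 250)
status_partial = domain_retained(100, 200, 1, 150)
status_lost = domain_retained(100, 200, 1, 50)
```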
Input : Filtered standardized fusion calls from step3a
Output : Annotation from a zscore calculation against a normal expression matrix, indicating per gene in the fusion whether its expression is differential compared to normal. Additional columns are:
note_expression_[Gene1A|Gene2A|Gene1B|Gene2B] : "differential expressed" if the zscore is more than the threshold, otherwise NA
zscore_[Gene1A|Gene2A|Gene1B|Gene2B] : zscore calculated using the normal expression matrix
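For one gene, the zscore annotation can be sketched as follows, assuming the normal expression matrix supplies a list of values for that gene across normal samples. The threshold of 2 is a hypothetical default, the sample standard deviation is an assumption, and the note follows the literal wording above (zscore more than the threshold); whether strongly negative zscores are also flagged is not stated here, so this sketch does not flag them.

```python
from statistics import mean, stdev


def zscore(tumor_value, normal_values):
    """Z-score of a tumor expression value against normal-sample values."""
    sd = stdev(normal_values)  # sample standard deviation (assumption)
    if sd == 0:
        return None  # undefined when normals show no variation
    return (tumor_value - mean(normal_values)) / sd


def note_expression(z, threshold=2.0):
    """Return the note string, or None (standing in for NA)."""
    if z is not None and z > threshold:
        return "differential expressed"
    return None


z_high = zscore(5.0, [1.0, 2.0, 3.0])
z_mid = zscore(2.5, [1.0, 2.0, 3.0])
```

With these normal values the mean is 2 and the standard deviation is 1, so a tumor value of 5 gets a zscore of 3 and is noted, while 2.5 gets 0.5 and is not.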
Output from step 3 for cohort samples, or from annoFuse_single_sample() per sample, can be provided to shinyFuse() or to the standalone app at http://shiny.imbei.uni-mainz.de:3838/shinyFuse/ to explore the results and generate minimal domain and breakpoint location plots as well as cohort recurrence plots.
Input : Filtered fusion calls annotated with reference gene and/or normal zscore
Output : Provides distribution of intra-chromosomal and inter-chromosomal fusions, number of in-frame and frameshift calls per algorithm, and distribution of gene biotypes, kinase group, and oncogenic annotation summary plot
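The counts behind two of these summaries can be sketched in a few lines. This assumes breakpoints are strings of the form "chr:position" and uses the standardized column names from step 1; the function names are hypothetical, and the plotting itself is not covered.

```python
from collections import Counter


def chromosome(breakpoint):
    """Extract the chromosome from a 'chr:position' breakpoint (assumed format)."""
    return breakpoint.split(":")[0]


def summarize(calls):
    """Count intra/inter-chromosomal calls and reading-frame types per caller."""
    location = Counter()
    frame_by_caller = Counter()
    for c in calls:
        same = chromosome(c["LeftBreakpoint"]) == chromosome(c["RightBreakpoint"])
        location["intra-chromosomal" if same else "inter-chromosomal"] += 1
        frame_by_caller[(c["Caller"], c["Fusion_Type"])] += 1
    return location, frame_by_caller


location, frame_by_caller = summarize([
    {"LeftBreakpoint": "chr1:100", "RightBreakpoint": "chr1:900",
     "Caller": "Arriba", "Fusion_Type": "in-frame"},
    {"LeftBreakpoint": "chr1:100", "RightBreakpoint": "chr7:500",
     "Caller": "Arriba", "Fusion_Type": "frameshift"},
    {"LeftBreakpoint": "chr2:300", "RightBreakpoint": "chr9:700",
     "Caller": "StarFusion", "Fusion_Type": "in-frame"},
])
```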