- Introduction
- Enrichment__computing_functional_annotations_overlaps
- Enrichment__computing_genes_self_overlaps
- Enrichment__computing_peaks_overlaps
- Enrichment__computing_motifs_overlaps
- Enrichment__reformatting_motifs_results
- Enrichment__computing_enrichment_pvalues
Introduction
In this section, the gene lists and genomic regions from the splitting process are overlapped with various databases. The following six standardized columns are created for each database:
- tgt: the target against which the overlap is computed
- tot_tgt: total number of target entries
- tot_da: total number of entries in the DAS (Differential Analysis Subset)
- ov_da: overlap of entries from the DAS with the target
- tot_nda: total number of entries not in the DAS
- ov_nda: overlap of entries not in the DAS with entries in the target
Note: entries not in the DAS are all genes or regions (MACS2 peaks or promoters) detected in the assay that are not present in the DAS.
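For gene lists, the standardized columns reduce to set sizes and set intersections. Below is a minimal Python sketch. It assumes that the DAS, the detected entries (DAS plus non-DAS) and the target are available as plain sets of gene IDs. The helper name `standardized_columns` is hypothetical. Genomic regions need interval overlaps rather than set intersections.

```python
def standardized_columns(tgt, target, das, detected):
    """Build the six standardized columns for one DAS against one target.

    Hypothetical helper: `target`, `das` and `detected` are sets of gene IDs,
    where `detected` holds every gene detected in the assay (DAS + non-DAS).
    """
    nda = detected - das  # detected in the assay but not in the DAS
    return {
        "tgt": tgt,
        "tot_tgt": len(target),
        "tot_da": len(das),
        "ov_da": len(das & target),
        "tot_nda": len(nda),
        "ov_nda": len(nda & target),
    }


row = standardized_columns(
    "example_term",
    target={"geneA", "geneB", "geneX"},
    das={"geneA", "geneC"},
    detected={"geneA", "geneB", "geneC", "geneD"},
)
print(row)
```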
These standardized columns are then used in subsequent processes to compute p-values and to make figures and tables. Columns specific to a particular analysis are described in the corresponding process.
The key of each DAS is then extended with the EC (Enrichment Category) variable, so the key becomes: ${ET}__${PA}__${FC}__${TV}__${COMP}__${EC}.
As defined in the splitting process, the variables are:
- ET: Experiment Type
- PA: DAR Peak Annotation
- FC: Fold Change type
- TV: Threshold Value(s)
- COMP: Comparison
- EC: Enrichment Category
EC can be any of the following:
- func_anno_{BP,MF,CC,KEGG}: Ontology databases GO_BP, GO_MF, GO_CC and KEGG
- CHIP: Transcription factor ChIP-seq profiles
- chrom_states: Chromatin states from the specified chromatin state file
- motifs: Transcription factor motif sequences
- peaks_self: Genomic-region DASs from the current experiment
- genes_self: Gene-list DASs from the current experiment
e.g.: key = ATAC__all__down__1000__hmg4_vs_ctl__func_anno_BP__enrich.
Note: Please see the References section for details on how the external databases were downloaded and preprocessed, as well as details on the labels of the targets used in the figures and tables.
For all genomic-region enrichment analyses, the regions not in the DAS are used as the background for computing the significance of the overlaps. For gene enrichment analyses, an option (params.use_nda_as_bg_for_func_anno) is provided to use either the non-DAS genes or all genes in the database as the background.
Enrichment__computing_functional_annotations_overlaps
Overlap of gene lists with functional annotation databases is performed using clusterProfiler. The following columns are added to the exported table:
- tgt_id: the ID of the ontology term
- genes_id: the list of enriched genes, collapsed with a "/"
- params.do_func_anno_enrichment: enable or disable this process. Default: true.
- params.use_nda_as_bg_for_func_anno: use the non-differentially expressed genes (genes not in the DAS) as the background for the enrichment analysis. If FALSE, all genes in the database are used. Default: 'FALSE'.
- params.func_anno_databases: which database(s) to query for the functional annotation enrichment analysis (KEGG, GO BP, GO CC or GO MF). Options: 'KEGG', 'CC', 'MF', 'BP'. Default: ['BP', 'KEGG'].
- params.simplify_cutoff: Similarity cutoff used to remove redundant GO terms. Default: 0.8.
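params.simplify_cutoff is presumably the cutoff passed to clusterProfiler's simplify(). That function measures GO semantic similarity (through GOSemSim) and, by default, keeps the term with the lowest adjusted p-value among similar terms. The Python sketch below only illustrates the idea of a similarity cutoff. It substitutes the Jaccard index of the terms' gene sets as the similarity measure and uses a greedy pass. It is therefore not clusterProfiler's algorithm and will not reproduce its output. The function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def simplify_terms(terms, cutoff=0.8):
    """Greedy redundancy filter over enriched terms.

    `terms` is a list of (term_id, padj, gene_set) tuples. Terms are visited
    from the lowest padj upwards; a term is kept only if its similarity to
    every already-kept term is at most `cutoff`.
    """
    kept = []
    for term_id, padj, genes in sorted(terms, key=lambda t: t[1]):
        if all(jaccard(genes, kept_genes) <= cutoff for _, _, kept_genes in kept):
            kept.append((term_id, padj, genes))
    return [term_id for term_id, _, _ in kept]


terms = [
    ("GO:A", 0.01, {"g1", "g2", "g3"}),
    ("GO:B", 0.02, {"g1", "g2", "g3", "g4"}),  # Jaccard with GO:A = 0.75
    ("GO:C", 0.03, {"g8", "g9"}),
]
print(simplify_terms(terms, cutoff=0.7))  # ['GO:A', 'GO:C']
print(simplify_terms(terms, cutoff=0.8))  # ['GO:A', 'GO:B', 'GO:C']
```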
Enrichment__computing_genes_self_overlaps
In this process, all gene sets from the DASs of the splitting process are overlapped with each other.
Enrichment__computing_peaks_overlaps
This process takes genomic regions (BED files) from various sources as input and overlaps them with the genomic regions (BED files) of the DASs from the splitting process.
The input genomic regions are:
- CHIP
- Chromatin states (hiHMM or ChromHMM)
- Genomic regions of DASs from the splitting process, used to compute the self-overlap of genomic-region DASs within the experiment.
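Counting how many DAS regions hit a target set is an interval-overlap problem. Below is a minimal pure-Python sketch. It assumes BED-style half-open coordinates and treats a region as overlapping if it shares at least one base with any target region. The pipeline's actual overlap tool and rules are not described here, and the helper names are hypothetical. Running `count_overlapping` on the DAS regions would give ov_da, and running it on the non-DAS regions would give ov_nda.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(regions):
    """Sort and merge (chrom, start, end) regions per chromosome.

    Returns {chrom: (starts, ends)} with disjoint, sorted intervals, so both
    lists are sorted and can be searched with bisect.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start < out[-1][1]:  # overlaps the previous interval: extend it
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def count_overlapping(queries, targets):
    """Number of query regions sharing at least one base with any target region."""
    merged = merge_by_chrom(targets)
    n = 0
    for chrom, start, end in queries:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        i = bisect_left(starts, end) - 1  # last target interval starting before `end`
        if i >= 0 and ends[i] > start:
            n += 1
    return n


targets = [("chr1", 150, 300), ("chr1", 250, 400), ("chr3", 0, 10)]
das_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
print(count_overlapping(das_regions, targets))  # 1: only chr1:100-200 hits a target
```

Merging the targets first means each query needs only one binary search. Because the merged intervals are disjoint, their ends are sorted too.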
- params.chromatin_state_1: Chromatin state to use. Options are listed in the references/${specie}/encode_chromatin_states_metadata.csv file. Mandatory. No default.
- params.chip_ontology: CHIP ontology used to filter the ENCODE ChIP-seq files. Options are listed in the references/${specie}/available_chip_ontology_groups.txt file, and details on the groups can be found in the references/${specie}/encode_chip_metadata.csv file. Default: 'all'.
Enrichment__computing_motifs_overlaps
This process uses HOMER to compute the overlap of the genomic regions of DASs with CIS-BP motifs.
- params.do_motif_enrichment: enable or disable this process. Default: true.
- params.homer__nb_threads: number of threads used by HOMER. Default: 6.
- HOMER output folder: Processed_Data/3_Enrichment_Analysis/motifs__raw/${key}
Enrichment__reformatting_motifs_results
HOMER results tables are formatted in R to add the standardized columns needed to compute p-values.
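HOMER's known-motif results (knownResults.txt) report, for each motif, how many target and background sequences contain it. The totals are embedded in the column headers as "(of N)". If the DAS regions are the HOMER targets and the non-DAS regions are the background, the standardized columns can be recovered as in the Python sketch below. The column names follow HOMER's usual knownResults.txt layout, the pipeline's own R code may differ, and the function name is hypothetical. tot_tgt cannot be derived from this table and is omitted.

```python
import re


def homer_row_to_standardized(header, line):
    """Map one data line of a HOMER knownResults.txt table to standardized columns.

    Assumes HOMER's usual header, e.g. '# of Target Sequences with Motif(of 120)',
    with DAS regions as targets and non-DAS regions as background.
    """
    cols = header.rstrip("\n").split("\t")
    record = dict(zip(cols, line.rstrip("\n").split("\t")))
    tgt_col = next(c for c in cols if c.startswith("# of Target Sequences with Motif"))
    bg_col = next(c for c in cols if c.startswith("# of Background Sequences with Motif"))

    def total(col):
        # The total number of sequences is embedded in the header as "(of N)".
        return int(re.search(r"\(of (\d+)\)", col).group(1))

    return {
        "tgt": record["Motif Name"],
        "tot_da": total(tgt_col),
        "ov_da": round(float(record[tgt_col])),
        "tot_nda": total(bg_col),
        # Background counts can be fractional (HOMER weights background
        # sequences); rounding to an integer here is an assumption.
        "ov_nda": round(float(record[bg_col])),
    }


# Made-up header and line, only to show the parsing.
header = "\t".join([
    "Motif Name", "Consensus", "P-value", "Log P-value", "q-value (Benjamini)",
    "# of Target Sequences with Motif(of 120)", "% of Target Sequences with Motif",
    "# of Background Sequences with Motif(of 4000)", "% of Background Sequences with Motif",
])
line = "\t".join(["example_motif", "ACGTACGT", "1e-5", "-11.5", "0.001",
                  "30.0", "25.00%", "200.4", "5.01%"])
print(homer_row_to_standardized(header, line))
```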
Enrichment__computing_enrichment_pvalues
This process takes the outputs of all overlap processes, estimates significance and formats the tables.
Hypergeometric minimum-likelihood two-sided p-values (pval) are obtained with a two-sided Fisher's Exact Test in R. Two-sided tests are recommended for GO enrichment analysis because, in most cases, both enrichment and depletion can be biologically meaningful (see reference).
The log2 odds ratio (L2OR) is the log2 of the test's odds ratio estimate (in R, the conditional maximum-likelihood estimate).
P-values are then adjusted (padj) using Benjamini and Hochberg's False Discovery Rate.
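For reference, the Benjamini-Hochberg step-up adjustment (what R's p.adjust(method = "BH") computes) fits in a few lines of Python. This is an illustrative re-implementation, not the pipeline's code, and the function name is hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Walks the p-values from largest to smallest, scaling each by n / rank and
    taking a running minimum so adjusted values stay monotone and <= 1.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = n - offset  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03]))  # ≈ [0.03, 0.04, 0.04]
```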
The pt_da and pt_nda columns are added to indicate the percentage of overlap of the target with the Differential Analysis subset (DA) entries (pt_da) or with the non-DA entries (pt_nda).
A gene enrichment type column is added for functional annotation enrichment, to specify the gene database used.
Results are sorted by adjusted p-values (padj, descending order) and by overlap of DA results (ov_da, ascending order).
Finally, each element of the key (ET, PA, FC, TV, COMP) is split into a separate column of the table, and so is the target (tgt).
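Because key fields are joined with a double underscore, splitting a key into table columns is a plain string split. The Python sketch below assumes the six-field key format from the Introduction, without the `__enrich` suffix that appears in the example key and the exported file names. `split_key` is a hypothetical name.

```python
KEY_FIELDS = ("ET", "PA", "FC", "TV", "COMP", "EC")


def split_key(key):
    """Split a DAS key of the form ET__PA__FC__TV__COMP__EC into named fields.

    Assumes no field itself contains a double underscore (single underscores,
    as in 'hmg4_vs_ctl' or 'func_anno_BP', are fine).
    """
    parts = key.split("__")
    if len(parts) != len(KEY_FIELDS):
        raise ValueError(f"expected {len(KEY_FIELDS)} fields, got {len(parts)}: {key!r}")
    return dict(zip(KEY_FIELDS, parts))


print(split_key("ATAC__all__down__1000__hmg4_vs_ctl__func_anno_BP"))
```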
- params.motifs_test_type: the test to use for motif inputs. If 'binomial', a two-sided binomial test is performed instead of the two-sided Fisher's Exact Test. Options: 'binomial' or 'fischer' (or any value other than 'binomial'). Default: 'binomial'.
- Overlap tables:
  - Tables_Individual/3_Enrichment_Analysis/${EC}/${key}__enrich.{csv,xlsx}
  - Tables_Merged/3_Enrichment_Analysis/${EC}.{csv,xlsx}
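When params.motifs_test_type is 'binomial', one natural reading is to test the number of DAS regions containing a motif (ov_da out of tot_da) against the background rate ov_nda / tot_nda. That parameterization is an assumption, and so are the helper names. The two-sided rule below mirrors the minimum-likelihood rule of R's binom.test. Probabilities are computed in log space so that large region counts do not overflow floats.

```python
from math import exp, inf, lgamma, log, log1p


def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed with lgamma."""
    if p == 0.0:
        return 0.0 if k == 0 else -inf
    if p == 1.0:
        return 0.0 if k == n else -inf
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p))


def binom_two_sided(x, n, p):
    """Two-sided binomial p-value: total probability of outcomes no more likely than x."""
    # Same relative tolerance (1 + 1e-7) as R's binom.test, applied in log space.
    threshold = binom_log_pmf(x, n, p) + log1p(1e-7)
    log_probs = (binom_log_pmf(k, n, p) for k in range(n + 1))
    return min(1.0, sum(exp(lp) for lp in log_probs if lp <= threshold))


# Hypothetical counts: ov_da = 30 of tot_da = 120, background rate 200 / 4000.
print(binom_two_sided(30, 120, 200 / 4000))
print(binom_two_sided(3, 4, 0.5))  # ≈ 0.625, as binom.test(3, 4) in R
```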