
Introduction

In this section, the gene lists and genomic regions from the splitting process are overlapped with various databases. Six standardized columns are made for each database:

tgt: the target against which the overlap is computed
tot_tgt: total number of target entries
tot_da: total number of entries in the DAS (Differential Analysis Subset)
ov_da: overlap of entries from the DAS and the target
tot_nda: total number of entries not in the DAS
ov_nda: overlap of entries not in the DAS and the target

Note: Entries not in the DAS refer to all genes or regions (macs2 peaks or promoters) detected in the assay that are not present in the DAS.
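
As a rough illustration of how the six standardized columns relate to each other, the sketch below derives them from plain Python sets. The function name `overlap_columns` and the toy gene identifiers are hypothetical and are not part of the pipeline, which computes these values in its own processes.

```python
def overlap_columns(tgt_name, target, das, detected):
    """Return the six standardized columns for one target (illustrative only).

    target:   set of entries annotated to the target
    das:      set of entries in the DAS
    detected: set of all entries detected in the assay (DAS included)
    """
    nda = detected - das  # entries detected in the assay but not in the DAS
    return {
        "tgt": tgt_name,
        "tot_tgt": len(target),
        "tot_da": len(das),
        "ov_da": len(das & target),
        "tot_nda": len(nda),
        "ov_nda": len(nda & target),
    }


# Toy data, made up for this example
detected = {"g1", "g2", "g3", "g4", "g5", "g6"}
das = {"g1", "g2", "g3"}
target = {"g2", "g3", "g6", "g9"}

row = overlap_columns("toy_term", target, das, detected)
print(row)
```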

These standardized columns are then used in subsequent processes to compute pvalues and to make figures and tables. The columns that are unique to a particular analysis are described in the corresponding process.

The key of each DAS is then augmented by adding the EC (Enrichment Category) variable. Thus the key becomes: ${ET}__${PA}__${FC}__${TV}__${COMP}__${EC}.
With, as defined in the splitting process, the variables:

ET: Experiment Type
PA: DAR Peak Annotation
FC: Fold Change type
TV: Threshold Value(s)
COMP: Comparison
EC: Enrichment Category

And EC can be any of these:

func_anno_{BP,MF,CC,KEGG}: Ontology databases GO_BP, GO_MF, GO_CC and KEGG
CHIP: Transcription factor CHIP-Seq profiles
chrom_states: Chromatin states from the specified chromatin state file
motifs: Transcription factor motif sequences
peaks_self: Genomic regions DASs from the current experiment
genes_self: Gene list DASs from the current experiment.

e.g.: key = ATAC__all__down__1000__hmg4_vs_ctl__func_anno_BP__enrich.
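
The key layout can be sketched in a few lines of Python. This is only an illustration of the double-underscore convention described above; the helper names `build_key` and `split_key` are hypothetical, and the sketch assumes that no individual variable contains a double underscore.

```python
KEY_FIELDS = ["ET", "PA", "FC", "TV", "COMP", "EC"]


def build_key(et, pa, fc, tv, comp, ec):
    """Join the six variables with a double underscore."""
    return "__".join([et, pa, fc, str(tv), comp, ec])


def split_key(key):
    """Split a key back into its named variables.

    maxsplit keeps anything after the fifth separator inside EC.
    """
    parts = key.split("__", len(KEY_FIELDS) - 1)
    if len(parts) != len(KEY_FIELDS):
        raise ValueError(f"expected {len(KEY_FIELDS)} fields in key: {key!r}")
    return dict(zip(KEY_FIELDS, parts))


key = build_key("ATAC", "all", "down", 1000, "hmg4_vs_ctl", "func_anno_BP")
fields = split_key(key)
print(key)
print(fields)
```

Single underscores inside a variable (as in `hmg4_vs_ctl` or `func_anno_BP`) are left untouched because only the double underscore acts as a separator.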

Note: Please see the References section for details on how the external databases were downloaded and preprocessed, as well as details on the labels of the targets used in the figures and tables.

For all genomic region enrichment analyses, the regions not in the DAS are used as the background for computing the significance of the overlaps. For gene enrichment analysis, an option is provided (params.use_nda_as_bg_for_func_anno) to use either the non-DAS genes or all genes in the database as the background.

Enrichment__computing_functional_annotations_overlaps

Description

Overlap of gene lists with functional annotation databases is performed using clusterProfiler. These columns are added to the exported table:

tgt_id: the id of the ontology
genes_id: the list of enriched genes collapsed with a "/".

Parameters

params.do_func_anno_enrichment: enable or disable this process. Default: true.
params.use_nda_as_bg_for_func_anno: use non-differentially expressed genes as the background for the enrichment analysis. If FALSE, all genes in the database are used. Default: 'FALSE'.
params.func_anno_databases: which database(s) to query for functional annotation enrichment analysis (KEGG, GO BP, GO CC or GO MF). Options: 'KEGG', 'CC', 'MF', 'BP'. Default: ['BP', 'KEGG'].
params.simplify_cutoff: Similarity cutoff used to remove redundant GO terms. Default: 0.8.

Enrichment__computing_genes_self_overlaps

Description

In this process, all gene sets from DASs of the splitting process are overlapped with each other.
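
A minimal sketch of such a self overlap is shown below, assuming each DAS is available as a set of gene identifiers indexed by its key. The function name `self_overlaps`, the dictionary layout and the toy keys are hypothetical. Including the overlap of a set with itself is also an assumption of this example.

```python
from itertools import product


def self_overlaps(das_sets):
    """Overlap every DAS gene set with every DAS gene set.

    das_sets: dict mapping a DAS key to a set of gene identifiers.
    Returns one row per ordered pair (DAS, target).
    """
    rows = []
    for (das_key, das), (tgt_key, tgt) in product(das_sets.items(), repeat=2):
        rows.append({
            "das": das_key,
            "tgt": tgt_key,
            "tot_tgt": len(tgt),
            "tot_da": len(das),
            "ov_da": len(das & tgt),
        })
    return rows


# Toy gene sets, made up for this example
das_sets = {
    "mRNA__Null__up__1000__a_vs_ctl": {"g1", "g2", "g3"},
    "mRNA__Null__down__1000__a_vs_ctl": {"g3", "g4"},
}
rows = self_overlaps(das_sets)
for r in rows:
    print(r)
```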

Enrichment__computing_peaks_overlaps

Description

This process takes as input genomic regions (bed files) from various sources and overlaps them with genomic regions (bed files) of DASs from the splitting process.
The input genomic regions are:

CHIP
Chromatin states (hiHMM or ChromHMM)
genomic regions of DASs from the splitting process -> for computing self overlap of genomic regions DASs within the experiment.
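
To make the notion of overlap concrete, here is a small plain-Python sketch that counts how many DAS regions overlap at least one target region. It is not the tool used by the pipeline; the function names and the toy coordinates are hypothetical, and it assumes the usual BED convention of 0-based, half-open intervals and a minimum overlap of one base.

```python
def regions_overlap(a, b):
    """True if two BED-style regions (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, target):
    """Number of query regions overlapping at least one target region.

    Quadratic scan, fine for a toy example but too slow for real bed files.
    """
    return sum(any(regions_overlap(q, t) for t in target) for q in query)


# Toy regions, made up for this example
das_regions = [("chrI", 100, 200), ("chrI", 500, 600), ("chrII", 100, 200)]
chip_peaks = [("chrI", 150, 250), ("chrII", 200, 300)]

ov_da = count_overlapping(das_regions, chip_peaks)
print(ov_da)
```

With half-open coordinates, the regions chrII:100-200 and chrII:200-300 are adjacent but do not overlap, so only the first DAS region is counted.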

Parameters

params.chromatin_state_1: Chromatin state to use. Options are listed in the references/${specie}/encode_chromatin_states_metadata.csv file. Mandatory. No default.
params.chip_ontology: CHIP ontology to use to filter the ENCODE CHIP files. Options are listed in the references/${specie}/available_chip_ontology_groups.txt file, and details on the groups can be found in the references/${specie}/encode_chip_metadata.csv file. Default: 'all'.

Enrichment__computing_motifs_overlaps

Description

This process uses HOMER to compute the overlap of genomic regions of DASs with CIS-BP motifs.

Parameters

params.do_motif_enrichment: enable or disable this process. Default: true.
params.homer__nb_threads: number of threads used by HOMER. Default: 6.

Outputs

Homer output folder: Processed_Data/3_Enrichment_Analysis/motifs__raw/${key}

Enrichment__reformatting_motifs_results

Description

HOMER results tables are formatted in R to add the standardized columns necessary for computing pvalues.

Enrichment__computing_enrichment_pvalues

Description

This process takes the results of all overlap processes, estimates significance and formats tables.

Hypergeometric minimum-likelihood two-sided p-values (pval) are obtained with a two-sided Fisher's Exact Test in R. Two-sided tests are recommended for GO enrichment analysis since in most cases both enrichment and depletion can be biologically meaningful (see reference).
The log2 odds ratio (L2OR) is the log2 of the test's estimate. Pvalues are then adjusted (padj) using Benjamini and Hochberg's False Discovery Rate.
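
The calculation can be sketched in standard-library Python as follows. This is a hedged illustration, not the pipeline's R code: the 2x2 table layout built from the standardized columns is an assumption, and all function names are hypothetical. The sketch also uses the sample odds ratio, whereas R's `fisher.test` reports a conditional maximum likelihood estimate, so L2OR values will generally differ slightly from those in the pipeline's tables.

```python
from math import comb, log2


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    at most as likely as the observed one (minimum-likelihood approach).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def log2_sample_odds_ratio(a, b, c, d):
    """log2 of the sample odds ratio (a*d)/(b*c); infinite or NaN when cells are 0."""
    if a * d == 0 and b * c == 0:
        return float("nan")
    if b * c == 0:
        return float("inf")
    if a * d == 0:
        return float("-inf")
    return log2((a * d) / (b * c))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    padj = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        padj[i] = running_min
    return padj


# Assumed table layout: DAS vs non-DAS entries, in vs out of the target
row = {"tot_da": 4, "ov_da": 3, "tot_nda": 4, "ov_nda": 1}
a, b = row["ov_da"], row["tot_da"] - row["ov_da"]
c, d = row["ov_nda"], row["tot_nda"] - row["ov_nda"]

pval = fisher_two_sided(a, b, c, d)
l2or = log2_sample_odds_ratio(a, b, c, d)
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(pval, l2or, padj)
```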

The pt_da and pt_nda columns are added to indicate the percentage of overlap of the target with the Differential Analysis subset (DA) (pt_da) or non-DA (pt_nda) entries.
A gene enrichment type column is added for functional annotation enrichment, to specify the gene database used.
Results are sorted by adjusted pvalues (padj, descending order) and overlap of DA results (ov_da, ascending order).

Finally, each element of the key (ET, PA, FC, TV, COMP) is split into a separate column in the table, as well as the target (tgt).

Parameters

params.motifs_test_type: The test to use for motif inputs. If 'binomial', a two-sided binomial test is performed instead of the two-sided Fisher's Exact Test. Options: 'binomial' or 'fischer' (any other value). Default: 'binomial'.
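
For readers unfamiliar with the binomial alternative, the sketch below shows one way a two-sided exact binomial test could be computed from the standardized columns. The choice of ov_nda / tot_nda as the null probability is an assumption made for this example and is not confirmed by this document; the function name `binom_two_sided` is hypothetical. For large counts a statistics library would be preferable, since this naive version can overflow.

```python
from math import comb


def binom_two_sided(x, n, p):
    """Two-sided exact binomial test p-value.

    Sums the probabilities of all outcomes that are at most as likely
    as the observed count x under Binomial(n, p).
    """
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    p_obs = pmf(x)
    total = sum(q for q in map(pmf, range(n + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Toy counts, made up for this example
row = {"tot_da": 10, "ov_da": 7, "tot_nda": 200, "ov_nda": 100}

# Assumption: the background overlap rate serves as the null probability
null_p = row["ov_nda"] / row["tot_nda"]
pval = binom_two_sided(row["ov_da"], row["tot_da"], null_p)
print(pval)
```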

Outputs

Overlap tables:
- Tables_Individual/3_Enrichment_Analysis/${EC}/${key}__enrich.{csv,xlsx}
- Tables_Merged/3_Enrichment_Analysis/${EC}.{csv,xlsx}
