# Module 3 Lab: g:Profiler Visualization {#gprofiler_mod3}
**This work is licensed under a [Creative Commons Attribution-ShareAlike 3.0 Unported License](http://creativecommons.org/licenses/by-sa/3.0/deed.en_US). This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.**
By Gary Bader, Ruth Isserlin, Chaitra Sarathy, Veronique Voisin
## Goal of the exercise
**Create an enrichment map and navigate through the network**
During this exercise, you will learn how to create an enrichment map from gene-set enrichment results. The enrichment results chosen for this exercise were generated using [g:Profiler](https://biit.cs.ut.ee/gprofiler/gost), but an enrichment map can be created directly from the output of [GSEA](http://software.broadinstitute.org/gsea/index.jsp),
[g:Profiler](https://biit.cs.ut.ee/gprofiler/gost),
[GREAT](http://great.stanford.edu/public/html/),
[BiNGO](http://apps.cytoscape.org/apps/bingo) or [Enrichr](https://amp.pharm.mssm.edu/Enrichr/), or alternatively from any gene-set tool using the generic enrichment results (GEM) format.
## Data
The data used in this exercise is the list of frequently mutated genes that we used in the [previous exercise](#gprofiler-lab).
Pathway enrichment analysis has been run using g:Profiler and the results have been downloaded in GEM format.
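For orientation, a GEM file is a tab-delimited table with one row per enriched gene-set. The sketch below assumes the column header that g:Profiler typically writes (`GO.ID`, `Description`, `p.Val`, `FDR`, `Phenotype`, `Genes`) and a comma-separated gene list; check both against your own file. The function name `read_gem` and the two sample rows are made up for illustration.

```python
import csv
import io

def read_gem(handle):
    """Read a GEM table into a list of dicts with typed fields."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "id": rec["GO.ID"],
            "description": rec["Description"],
            "p_value": float(rec["p.Val"]),
            "fdr": float(rec["FDR"]),
            # Genes are assumed to be comma-separated; empty entries are dropped.
            "genes": set(filter(None, rec["Genes"].split(","))),
        })
    return rows

# Hypothetical two-row example (not taken from the lab files).
example = (
    "GO.ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "SET:1\tpathway one\t1e-6\t1e-4\t1\tTP53,KRAS,NOTCH1\n"
    "SET:2\tpathway two\t2e-3\t4e-2\t1\tKRAS,BRAF\n"
)
gem_rows = read_gem(io.StringIO(example))
print(len(gem_rows), sorted(gem_rows[0]["genes"]))
```

To read a real file, pass `open(path, newline="")` instead of the in-memory example.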
## EnrichmentMap
* A circle (node) is a gene-set (pathway) enriched in genes that we used as input in g:Profiler (frequently mutated genes).
* An edge (line) represents genes in common between 2 pathways (nodes).
* A cluster of nodes represents overlapping and related pathways and may represent a common biological process.
* Clicking on a node will display the genes included in each pathway.
<img src="./Module3/gprofiler/images/example_cluster.png" />
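To make "genes in common" concrete, the sketch below scores the overlap between two gene-sets with the Jaccard and overlap coefficients, two common ways of measuring shared membership. Which coefficient and threshold EnrichmentMap actually applies depends on its settings and is not specified here; the gene names are only illustrative.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

# Illustrative gene-sets: the second is fully contained in the first.
notch = {"NOTCH1", "NOTCH2", "JAG1", "MAML1"}
notch_sub = {"NOTCH1", "JAG1"}

print(jaccard(notch, notch_sub))              # 2 shared / 4 in union = 0.5
print(overlap_coefficient(notch, notch_sub))  # 2 shared / 2 in smaller set = 1.0
```

The two scores differ most when one gene-set is nested inside a larger one, which is common for hierarchical resources such as GO.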
## Description of this exercise
We will use the g:Profiler results saved in [Module 2 - gprofiler lab](#gprofiler-lab), which were generated using different parameters.
An enrichment map represents the result of an enrichment analysis as a network in which significantly enriched gene-sets that share many genes form identifiable clusters. Visualizing the results as these biological themes makes them easier to interpret.
The goal of this exercise is to learn how to:
1. Upload g:Profiler results into Cytoscape EnrichmentMap to create a map.
1. Upload several g:Profiler results at the same time to create one map and learn how to distinguish and compare the results.
1. Compare, at the enrichment map level, the differences resulting from the use of different g:Profiler parameters.
## Start the exercise
To start the lab practical section, first create a gprofiler_files directory on your computer and download the files below.
```{block, type="rmd-datadownload"}
Right click on each link below and select "Save Link As...".
Place the files in the corresponding module directory of your CBW work directory.
```
Five files are needed for this exercise:
1. Enrichment result 1: [gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt](./Module3/gprofiler/data/gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt)
* In g:Profiler, the parameters that we used to generate this file were:
* GO_BP no electronic annotation,
* Reactome,
* WikiPathways,
* Benjamini-Hochberg FDR 0.05
* The results were filtered using the *Term size* slidebar. Only the enriched gene-sets containing more than 3 and less than or equal to 10000 genes per gene-set were included in the result file.
2. Enrichment result 2: [gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt](./Module3/gprofiler/data/gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt)
* In g:Profiler, the parameters that we used were:
* GO_BP no electronic annotation,
* Reactome,
* WikiPathways,
* Benjamini-Hochberg FDR 0.05.
* The results were filtered using the *Term size* slidebar. Only the enriched gene-sets that contain more than 3 and less than or equal to 250 genes per gene-set were included in the result file.
3. Enrichment result 3: [gProfiler_hsapiens_Baderlab_max250.gem.txt](./Module3/gprofiler/data/gProfiler_hsapiens_Baderlab_max250.gem.txt)
4. Pathway database 1: [gprofiler_full_hsapiens.name.gmt](./Module3/gprofiler/data/gprofiler_full_hsapiens.name.gmt)
* This file can be downloaded directly or can be created by concatenating the hsapiens.GO/BP.name.gmt, hsapiens.WP.name.gmt and hsapiens.REAC.name.gmt files contained in the g:Profiler gprofiler_hsapiens.name folder.
5. Pathway database 2: [Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt](./Module3/gprofiler/data/Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt)
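The concatenation mentioned for pathway database 1 amounts to appending GMT files line by line. In a GMT file each line holds a gene-set name, a description and then the genes, all tab-separated. The sketch below is one way to do this under stated assumptions: `concat_gmt` is a hypothetical helper, the input lines are invented, and dropping repeated gene-set names (keeping the first) is a design choice rather than something the lab prescribes.

```python
import io

def concat_gmt(handles):
    """Merge GMT streams, keeping the first occurrence of each gene-set name."""
    seen = set()
    merged = []
    for handle in handles:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            name = line.split("\t", 1)[0]
            if name not in seen:
                seen.add(name)
                merged.append(line)
    return merged

# Invented stand-ins for the three per-source GMT files.
go_bp = io.StringIO("GO:0000001\tterm a\tTP53\tKRAS\n")
reactome = io.StringIO("REAC:R-HSA-1\tterm b\tNOTCH1\tJAG1\n\n")
wikipathways = io.StringIO("WP:WP1\tterm c\tBRAF\nGO:0000001\tterm a\tTP53\tKRAS\n")

merged = concat_gmt([go_bp, reactome, wikipathways])
print(len(merged))  # 3 unique gene-sets
```

With real files, open each one and write `"\n".join(merged) + "\n"` to the combined GMT.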
## Exercise 1a - compare different gprofiler geneset size results
### Step 1
Launch Cytoscape and open the EnrichmentMap App
1a. Double click on the Cytoscape icon
1b. Open EnrichmentMap App
* In the Cytoscape top menu bar:
* Click on Apps -> EnrichmentMap
<img src="./Module3/gprofiler/images/EM1.png" />
* A 'Create Enrichment Map' window is now opened.
### Step 2
Create an enrichment map from 2 datasets and with a gmt file.
2a. In the '**Create Enrichment Map**' window, drag and drop the 2 enrichment files *gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt* and
*gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt*.
<img src="./Module3/gprofiler/images/gem0.png" alt="workflow" width="100%" />
2b. In the white box, click on "*gProfiler_hsapiens_lab2_results_GEM_termmin3_max250 (Generic/gProfiler)*"
2c. On the right side, go to the **GMT** field, click on the three-dot button (...) and locate the file *gprofiler_full_hsapiens.name.gmt* that you saved on your computer to load it.
<img src="./Module3/gprofiler/images/gem1.png" alt="workflow" width="100%" />
2d. In the white box, click on "*gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000 (Generic/gProfiler)*"
2e. On the right side, go to the **GMT** field, click on the three-dot button (...) and locate the file *gprofiler_full_hsapiens.name.gmt* that you saved on your computer to load it.
2f. Locate the **FDR q-value cutoff** field and set the value to 0.001
2g. Set the **Connectivity** slide bar to **sparse**.
<img src="./Module3/gprofiler/images/gem2.png" alt="workflow" width="100%" />
```{block, type="rmd-tip"}
Instead of specifying the gmt file for each dataset separately, if all the datasets in your analysis use the same gmt file, you can specify a common gmt file to be used by all datasets.
* Click on *Common Files (included in all datasets)*
* On the right side, go to the *GMT file* field, click on the three-dot button (...) and locate the file *gprofiler_full_hsapiens.name.gmt* that you saved on your computer to load it.
<img src="./Module3/gprofiler/images/common_gmt.png" alt="workflow" width="100%" />
This can also be done for a shared expression file.
```
2h. Click on *Build*.
* A status bar should pop up showing progress of the Enrichment map build.
<p align="center">
<img src="./Module3/gprofiler/images/gem3.png" alt="workflow" width="60%"/>
</p>
### Step 3: Explore the results
In the EnrichmentMap control panel located at the left:
* Select the 2 Data Sets (checked by default)
* Set Chart Data to *Color by Data Set*
* Select *Publication Ready* to remove the gene-set labels and get a global view of the map.
```{block, type="rmd-tip"}
Unselect *Publication Ready* when you explore the map in more detail to see the gene-set names.
```
<p align="center">
<img src="./Module3/gprofiler/images/gem3a.png" alt="workflow" width="70%" />
</p>
On the map, a node that is coloured both green and blue is a gene-set that is found in both of the 2 g:Profiler result sets that we have uploaded.
* A node that is blue is a gene-set that is found only in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000*.
* A node that is green is a gene-set that is found only in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max250*.
* A blue edge represents genes that overlap between gene-sets found in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000*.
* A green edge represents genes that overlap between gene-sets found in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max250*.
<img src="./Module3/gprofiler/images/gem6.png" alt="workflow" width="100%" />
We can see clusters of blue nodes. All these nodes represent gene-sets that contain more than 250 genes. Explore the detailed view (see below) to see if these clusters correspond to informative terms.
```{block, type="rmd-question"}
Would you have lost information by filtering gene-sets larger than 250 genes?
```
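The colouring described above corresponds to simple set operations on the gene-sets that pass the FDR cutoff in each result file. The sketch below assumes each result set has already been reduced to a mapping from gene-set ID to FDR; the IDs and values are invented, and treating the 0.001 cutoff from step 2f as inclusive is an assumption.

```python
def significant_ids(fdr_by_id, cutoff=0.001):
    """Return the gene-set IDs whose FDR is at or below the cutoff."""
    return {gs for gs, fdr in fdr_by_id.items() if fdr <= cutoff}

# Invented FDR values; "SET:BIG" stands for a gene-set with more than 250 genes.
max10000 = {"SET:A": 1e-5, "SET:B": 5e-4, "SET:BIG": 1e-6, "SET:C": 0.02}
max250 = {"SET:A": 1e-5, "SET:B": 5e-4, "SET:C": 0.02}

sig_10000 = significant_ids(max10000)
sig_250 = significant_ids(max250)

both = sig_10000 & sig_250        # nodes coloured green and blue
only_10000 = sig_10000 - sig_250  # blue-only nodes
only_250 = sig_250 - sig_10000    # green-only nodes
print(sorted(both), sorted(only_10000), sorted(only_250))
```

Listing the descriptions of the gene-sets in `only_10000` is a quick way to judge whether the large gene-sets carry information that the filtered results miss.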
### Explore Detailed results
* In the Cytoscape menu bar, select 'View' and then 'Show Graphic Details' to display node labels.
```{block, type="rmd-caution"}
Make sure you have unselected "Publication Ready" in the EnrichmentMap control panel.
```
* Zoom in to be able to read the labels and navigate the network using the bird's eye view (blue rectangle).
* Select a node and visualize the *Table Panel*
* Click on a node
* For this example the node *"Signaling by Notch"* has been selected.
```{block, type="rmd-tip"}
You can type the name in the search bar; the quotes are important.
```
<img src="./Module3/gprofiler/images/gem8.png" alt="workflow" width="50%" />
When the node is selected, it is highlighted in <font color="yellow">yellow</font>.
In the *Table Panel*, we can see the genes included in the gene-set.
A green colored box indicates that the gene is in the gene-set (pathway) and in our gene list.
A gray colored box indicates that the gene is in the gene-set but not in our gene list.
<img src="./Module3/gprofiler/images/gem8a.png" alt="workflow" width="100%" />
## Exercise 1b - Is specifying the gmt file important?
Create an enrichment map without a gmt file to compare the results with Exercise 1a.
* Go to Control Panel and select the EnrichmentMap tab.
* Click on the "+" sign to re-open the *Create Enrichment Map* window.
<p align="center">
<img src="./Module3/gprofiler/images/gem7.png" alt="workflow" width="50%" />
</p>
* In the white box, select the "*gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem (Generic/gProfiler)*" file
* Locate the GMT field and delete the file name, leaving it blank.
* In the white box, select the "*gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000 (Generic/gProfiler)*" file
* Locate the GMT field and delete the file name, leaving it blank.
* Use the same parameters as in [exercise 1a](#exercise-1a): FDR q-value cutoff of 0.001 and Connectivity set to sparse.
* Click on *Build*
<img src="./Module3/gprofiler/images/gem5.png" alt="workflow" width="100%" />
Explore the results:
In the EnrichmentMap control panel located at the left:
* Select the 2 Data Sets (selected by default)
* Set Chart Data to *Color by Data Set*
* Select *Publication Ready* to remove the gene-set labels and get a global view of the map.
```{block, type="rmd-tip"}
Uncheck this box when you explore the map in detail to see the gene-set names.
```
<p align="center">
<img src="./Module3/gprofiler/images/gem3a.png" alt="workflow" width="70%" />
</p>
On the map, a node that is coloured both green and blue is a gene-set that is found in both of the 2 g:Profiler result sets that we have uploaded.
* A node that is blue is a gene-set that is found only in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000*.
* A node that is green is a gene-set that is found only in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max250*.
* A blue edge represents genes that overlap between gene-sets found in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000*.
* A green edge represents genes that overlap between gene-sets found in the file *gProfiler_hsapiens_lab2_results_GEM_termmin3_max250*.
<img src="./Module3/gprofiler/images/gem4.png" alt="workflow" width="100%" />
**Conclusion of exercises 1a and 1b:**
Loading a gmt file to create an enrichment map from g:Profiler results is optional. However, there are 2 main benefits to loading a gmt file:
1. The map will be less condensed and easier to read and interpret.
1. Clicking on a node will display all genes in the gene-set and not only genes included in our query list.
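One way to see why the map built without a gmt file looks more condensed: assuming that, without a gmt file, each gene-set is represented only by the query genes listed in the results file, the similarity between two pathways is computed on much smaller sets and tends to come out higher. The sketch below illustrates this with invented gene identifiers and the Jaccard coefficient; it is an illustration of the idea, not a reproduction of EnrichmentMap's internals.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Invented full gene-set definitions, as a gmt file would provide them.
pathway_1 = {"G1", "G2", "G3", "G4", "G5", "G6"}
pathway_2 = {"G1", "G2", "G7", "G8", "G9", "G10"}
# Invented query list (the genes submitted to the enrichment tool).
query = {"G1", "G2", "G3"}

with_gmt = jaccard(pathway_1, pathway_2)
without_gmt = jaccard(pathway_1 & query, pathway_2 & query)
print(with_gmt, round(without_gmt, 2))  # 0.2 0.67
```

Higher similarity scores mean more edges pass the connectivity threshold, which is consistent with the denser map seen in exercise 1b.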
## Exercise 1c - create EM from results using Baderlab genesets
Create an enrichment map from the results of g:Profiler generated using the custom Baderlab gene-set file. <br>
To get a map that is easy to read and that does not display too many gene-sets, one option is to focus the analysis on gene-sets (pathways) that contain 250 genes or fewer. We prefiltered our pathway database prior to uploading it into g:Profiler so that the FDR is calculated only on these gene-sets (as opposed to exercise 1a, where the FDR was calculated on all gene-sets and the gene-sets with more than 250 genes were then excluded from the result file). For this exercise, we will use:
* Filtered gmt file: [Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt](./Module3/gprofiler/data/Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt).
* We uploaded this file as a custom gmt file in g:Profiler and ran the query (in the Module 2 lab).
* To create an enrichment map of these results:
* Go to Control Panel and select the EnrichmentMap tab.
* Click on the "+" sign to re-open the *Create Enrichment Map* window.
<p align="center">
<img src="./Module3/gprofiler/images/gem7.png" alt="workflow" width="50%" />
</p>
* Drag the file that we created in the Module 2 lab, [gProfiler_hsapiens_Baderlab_max250.gem.txt](./Module3/gprofiler/data/gProfiler_hsapiens_Baderlab_max250.gem.txt), and the filtered gmt file ([Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt](./Module3/gprofiler/data/Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt)) into the Datasets box on the Enrichment map panel.
* In the white box, select the "*gProfiler_hsapiens_Baderlab_max250.gem.txt (Generic/gProfiler)*" file
* Locate the GMT field and upload the file "*Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt*".
* Set the **FDR q-value cutoff** to 0.001 and set the **Connectivity** slide bar to second level.
<img src="./Module3/gprofiler/images/gem9.png" alt="workflow" width="100%" />
Explore the results:
<img src="./Module3/gprofiler/images/gem10.png" alt="workflow" width="100%" />
```{block, type="rmd-caution"}
SAVE YOUR CYTOSCAPE SESSION (.cys) FILE !
```
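The pre-filtering described at the start of this exercise (keeping only gene-sets with 250 genes or fewer before running g:Profiler) can be sketched as follows. `filter_gmt_by_size` is a hypothetical helper, not the script used to produce the lab file, and the minimum size of 1 is an assumption; the example uses a small `max_size` so the effect is visible.

```python
def filter_gmt_by_size(lines, max_size=250, min_size=1):
    """Keep GMT lines whose gene count is within [min_size, max_size]."""
    kept = []
    for line in lines:
        line = line.rstrip("\r\n")
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        genes = {g for g in fields[2:] if g}
        if min_size <= len(genes) <= max_size:
            kept.append(line)
    return kept

# Invented GMT lines: name, description, then genes (tab-separated).
gmt_lines = [
    "SMALL\tdesc\tG1\tG2\n",
    "LARGE\tdesc\tG1\tG2\tG3\tG4\n",
    "EMPTY\tdesc\n",
]
kept = filter_gmt_by_size(gmt_lines, max_size=3)
print([line.split("\t")[0] for line in kept])  # ['SMALL']
```

Filtering before the enrichment test changes the set of hypotheses being corrected for, which is why these results differ from filtering the output afterwards as in exercise 1a.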
## Exercise 1d (optional) - investigate individual pathways in GeneMANIA or String
Each node in the enrichment map represents a biological process or pathway, which consists of a collection of genes. Often we want to know how the genes in that group interact. There are many different ways to investigate the underlying interactions for a given group. Some involve searching online databases and others are directly integrated into Cytoscape.
* [GeneMANIA](https://genemania.org/) - an integrative database of gene connections including co-expression, protein interactions, genetic interactions, pathways and more. **Cytoscape App**
* [String](https://string-db.org/) - an integrative database of gene connections including co-expression, protein interactions, genetic interactions, pathways and more. **Cytoscape App**
* [Pathway Commons](https://www.pathwaycommons.org/) - an integrative database of pathways. (There is a beta feature in EM to show your pathway in the painter app, a Pathway Commons web page that overlays your expression data on the given pathway. It is still in beta testing and requires expression data to work correctly, so it won't work for this example.)
### GeneMANIA
* Navigate to the enrichment map that you created using the Baderlab genesets
* Click on Network Tab and navigate to the third network (it should be the third network if you followed the above examples - name: gProfiler_hsapiens_Baderlab_max250_gem)
* or in the Enrichment map panel in the top drop down select the network named gProfiler_hsapiens_Baderlab_max250_gem
* In the Cytoscape search bar enter *"Signaling by Notch"*
```{block, type="rmd-tip"}
If you can't see the selected nodes, click on "Fit Selected" to focus on the selected node.<br>
<img src="./Module3/gprofiler/images/gem11a.png" alt="workflow" width="50%" />
```
* Right click on the node *"Signaling by Notch"* and select *Apps* --> *Enrichment Map - Show in GeneMANIA*
<img src="./Module3/gprofiler/images/gem11.png" alt="workflow" width="75%" />
* A GeneMANIA Query Panel will pop up.
* Select *Select genes with expression* to reduce the query set to just the genes in the given pathway that were in your original dataset (for example, we searched for a set of 127 genes in g:Profiler, but the given pathway has 233 genes associated with it, of which only 10 are found in our original query set).
* Click on *OK*
<img src="./Module3/gprofiler/images/gem12.png" alt="workflow" width="75%" />
* A GeneMANIA network will show up with the connections between the genes found in your query set and the pathway "Signaling by Notch"
<img src="./Module3/gprofiler/images/gem13.png" alt="workflow" width="75%" />
* We will go more in depth into [GeneMANIA in module 5](#genemania_cytoscape)
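The reduction performed by *Select genes with expression* is, in effect, an intersection between the pathway's genes and the original query list. The sketch below reproduces the counts quoted above (127 query genes, 233 pathway genes, 10 shared) with synthetic identifiers; the real gene names are not used.

```python
# Synthetic identifiers: "S*" genes are shared, "Q*" are query-only, "P*" are pathway-only.
shared = {f"S{i}" for i in range(10)}
query_genes = {f"Q{i}" for i in range(117)} | shared
pathway_genes = {f"P{i}" for i in range(223)} | shared

# Genes sent to the network query after the reduction.
reduced = pathway_genes & query_genes
print(len(query_genes), len(pathway_genes), len(reduced))  # 127 233 10
```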
### String
* Navigate to the enrichment map that you created using the Baderlab genesets
* Click on Network Tab and navigate to the third network (it should be the third network if you followed the above examples - name: gProfiler_hsapiens_Baderlab_max250_gem)
* or in the Enrichment map panel in the top drop down select the network named gProfiler_hsapiens_Baderlab_max250_gem
* In the Cytoscape search bar enter *"Signaling by Notch"*
```{block, type="rmd-tip"}
If you can't see the selected nodes, click on "Fit Selected" to focus on the selected node.<br>
<img src="./Module3/gprofiler/images/gem11a.png" alt="workflow" width="50%" />
```
* Right click on the node *"Signaling by Notch"* and select *Apps* --> *Enrichment Map - Show in String*
<img src="./Module3/gprofiler/images/gem14.png" alt="workflow" width="100%" />
* A String Query Panel will pop up.
* Select *Select genes with expression* to reduce the query set to just the genes in the given pathway that were in your original dataset (for example, we searched for a set of 127 genes in g:Profiler, but the given pathway has 233 genes associated with it, of which only 10 are found in our original query set).
* Click on *OK*
<img src="./Module3/gprofiler/images/gem15.png" alt="workflow" width="75%" />
* A String network will show up with the connections between the genes found in your query set and the pathway "Signaling by Notch"
<img src="./Module3/gprofiler/images/gem16.png" alt="workflow" width="75%" />
```{block, type="rmd-question"}
Explore the features and data of each Cytoscape app.<br>What sort of information does each tell you? <br> What is the main difference between the two resulting networks?
```
___
## Bonus - Automation.
Run analysis directly from R for easy integration into existing pipelines.
```{block, type="rmd-bonus"}
Instead of creating an enrichment map manually through the user interface, you can create one directly using the [RCy3 Bioconductor package](https://www.bioconductor.org/packages/release/bioc/html/RCy3.html) or through direct REST calls with [Cytoscape CyREST](https://apps.cytoscape.org/apps/cyrest).
Follow the step by step instructions on how to run from R here - https://risserlin.github.io/CBW_pathways_workshop_R_notebooks/create-enrichment-map-from-r-with-gprofiler-results.html
First, make sure your environment is set up correctly by following these instructions - https://risserlin.github.io/CBW_pathways_workshop_R_notebooks/setup.html
```