Module 3 Lab: g:profiler Visualization
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.
By Gary Bader, Ruth Isserlin, Chaitra Sarathy, Veronique Voisin
Goal of the exercise
Create an enrichment map and navigate through the network
During this exercise, you will learn how to create an enrichment map from gene-set enrichment results. The enrichment results chosen for this exercise were generated using g:Profiler, but an enrichment map can also be created directly from the output of GSEA, g:Profiler, GREAT, BiNGO or Enrichr, or from any gene-set enrichment tool whose results are written in the Generic Enrichment Map (GEM) format.
Data
The data used in this exercise is the list of frequently mutated genes that we used in the previous exercise. Pathway enrichment analysis was run using g:Profiler and the results were downloaded in GEM format.
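For orientation, a GEM file is a tab-separated table with one enriched gene-set per row. The sketch below assumes the column header commonly seen in g:Profiler's GEM export (GO.ID, Description, p.Val, FDR, Phenotype, Genes, with the genes comma-separated); check the header of your own file before relying on it. The function name parse_gem and the example rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class EnrichedSet:
    """One row of a GEM (Generic Enrichment Map) file."""
    term_id: str
    description: str
    p_value: float
    fdr: float
    genes: frozenset


def parse_gem(text: str) -> list[EnrichedSet]:
    """Parse tab-separated GEM text.

    Assumes the header: GO.ID, Description, p.Val, FDR, Phenotype, Genes.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        # The Genes column holds the query genes found in this gene-set.
        genes = frozenset(g.strip() for g in row["Genes"].split(",") if g.strip())
        records.append(EnrichedSet(
            term_id=row["GO.ID"],
            description=row["Description"],
            p_value=float(row["p.Val"]),
            fdr=float(row["FDR"]),
            genes=genes,
        ))
    return records


# Hypothetical two-row GEM file.
example = (
    "GO.ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "TERM:1\tExample pathway A\t1e-6\t1e-4\t1\tTP53,KRAS,PIK3CA\n"
    "TERM:2\tExample pathway B\t2e-3\t0.02\t1\tKRAS,NRAS\n"
)
records = parse_gem(example)
for rec in records:
    print(rec.term_id, rec.fdr, sorted(rec.genes))
```

To read a real file, pass the file contents instead, for example parse_gem(open(path, encoding="utf-8").read()).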
EnrichmentMap
A circle (node) is a gene-set (pathway) enriched in the genes that we used as input to g:Profiler (frequently mutated genes).
Edges (lines) represent genes in common between 2 pathways (nodes).
A cluster of nodes represents overlapping and related pathways and may represent a common biological process.
Clicking on a node will display the genes included in that pathway.
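How strongly two gene-sets are connected can be sketched as a similarity score on their shared genes, with an edge drawn only when the score passes a cutoff. EnrichmentMap offers similarity coefficients such as Jaccard and overlap; this sketch implements both, but the cutoff of 0.25 is a hypothetical value (not the number behind the Connectivity slider) and the gene-sets are toy data.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: shared genes relative to all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|): shared genes relative to the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def build_edges(gene_sets: dict[str, set], cutoff: float, metric=jaccard):
    """Return (set1, set2, score) for every pair whose similarity >= cutoff."""
    edges = []
    for (n1, g1), (n2, g2) in combinations(gene_sets.items(), 2):
        score = metric(g1, g2)
        if score >= cutoff:
            edges.append((n1, n2, round(score, 3)))
    return edges


# Toy gene-sets (hypothetical names and members).
sets = {
    "Pathway A": {"TP53", "KRAS", "PIK3CA", "PTEN"},
    "Pathway B": {"KRAS", "PIK3CA", "PTEN"},
    "Pathway C": {"BRCA1", "BRCA2"},
}
print(build_edges(sets, cutoff=0.25))
```

Pathways A and B share 3 of 4 genes and are joined by an edge; Pathway C shares nothing and stays unconnected.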
Description of this exercise
We will use the saved g:Profiler results (from the Module 2 g:Profiler lab), which were generated with different parameters. An enrichment map represents the result of an enrichment analysis as a network in which significantly enriched gene-sets that share many genes form identifiable clusters. Visualizing the results as these biological themes makes them easier to interpret.
The goal of this exercise is to learn how to:
- Upload g:Profiler results into Cytoscape EnrichmentMap to create a map.
- Upload several g:Profiler results at the same time to create one map and learn how to distinguish and compare the results.
- Compare the differences that result from using different g:Profiler parameters at the enrichment map level.
Start the exercise
To start the lab practical section, first create a gprofiler_files directory on your computer and download the files below.
Right-click on each link below and select “Save Link As…”.
Save the files in the corresponding module directory of your CBW work directory.
Five files are needed for this exercise:
- Enrichment result 1: gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt
- In g:Profiler, the parameters that we used to generate this file were:
- GO_BP no electronic annotation,
- Reactome,
- WikiPathways,
- Benjamini-Hochberg FDR 0.05
- The results were filtered using the Term size slider. Only the enriched gene-sets containing more than 3 and less than or equal to 10000 genes were included in the result file.
- Enrichment result 2: gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt
- In g:Profiler, the parameters that we used were:
- GO_BP no electronic annotation,
- Reactome,
- WikiPathways,
- Benjamini-Hochberg FDR 0.05.
- The results were filtered using the Term size slider. Only the enriched gene-sets containing more than 3 and less than or equal to 250 genes were included in the result file.
- Enrichment result 3: gProfiler_hsapiens_Baderlab_max250.gem.txt
- Pathway database 1: gprofiler_full_hsapiens.name.gmt
- This file can be downloaded directly, or it can be created by concatenating the hsapiens.GO/BP.name.gmt, hsapiens.WP.name.gmt and hsapiens.REAC.name.gmt files contained in the g:Profiler gprofiler_hsapiens.name folder.
- Pathway database 2: Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt
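Concatenating GMT files is a simple line-by-line append, because each line of a GMT file is one self-contained gene-set: the gene-set name, a description, then the member genes, all tab-separated. The sketch below assumes that layout; the helper names and the toy file contents are hypothetical, and in practice the inputs would be the three g:Profiler GMT files named above.

```python
import tempfile
from pathlib import Path


def read_gmt(path: Path) -> dict[str, tuple[str, set[str]]]:
    """Read a GMT file: one gene-set per line, tab-separated as
    <name> <description> <gene1> <gene2> ..."""
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            name, description, *genes = fields
            gene_sets[name] = (description, {g for g in genes if g})
    return gene_sets


def concatenate_gmt(inputs: list[Path], output: Path) -> None:
    """Append the gene-set lines of several GMT files into one file."""
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        # Guarantee one gene-set per line even if a file lacks a final newline.
                        out.write(line if line.endswith("\n") else line + "\n")


# Demo with two tiny hypothetical GMT files.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)
    (tmp / "bp.gmt").write_text("GO:1\tprocess one\tKRAS\tTP53\n", encoding="utf-8")
    (tmp / "reac.gmt").write_text("R-HSA-1\tpathway one\tNOTCH1\tJAG1", encoding="utf-8")
    concatenate_gmt([tmp / "bp.gmt", tmp / "reac.gmt"], tmp / "combined.gmt")
    combined = read_gmt(tmp / "combined.gmt")
    print(sorted(combined))
```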
Exercise 1a - compare different gprofiler geneset size results
Step 1
Launch Cytoscape and open the EnrichmentMap App
1a. Double-click on the Cytoscape icon
1b. Open EnrichmentMap App
In the Cytoscape top menu bar:
Click on Apps -> EnrichmentMap
- A ‘Create Enrichment Map’ window opens.
Step 2
Create an enrichment map from 2 datasets and a gmt file.
2a. In the ‘Create Enrichment Map’ window, drag and drop the 2 enrichment files gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt and gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt.
2b. In the white box, click on “gProfiler_hsapiens_lab2_results_GEM_termmin3_max250 (Generic/gProfiler)”
2c. On the right side, go to the GMT field, click on the button with three dots (…), then locate and select the file gprofiler_full_hsapiens.name.gmt that you saved on your computer.
2d. In the white box, click on “gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000 (Generic/gProfiler)”
2e. On the right side, go to the GMT field, click on the button with three dots (…), then locate and select the file gprofiler_full_hsapiens.name.gmt that you saved on your computer.
2f. Locate the FDR q-value cutoff field and set the value to 0.001
2g. Set the Connectivity slider to sparse.
If all the datasets in your analysis use the same gmt file, you can specify it once as a common gmt file for all datasets instead of specifying it for each dataset separately:
- Click on Common Files (included in all datasets)
- On the right side, go to the GMT file field, click on the button with three dots (…), then locate and select the file gprofiler_full_hsapiens.name.gmt that you saved on your computer.
This can also be done for a shared expression file.
2h. Click on Build.
- A status bar should pop up showing progress of the Enrichment map build.
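The FDR q-value cutoff from step 2f acts as a second, stricter filter on results that g:Profiler already reported at FDR 0.05. A minimal sketch of that filtering with hypothetical terms and q-values; keeping values at or below the cutoff is an assumption about how the boundary is handled.

```python
def filter_by_fdr(records, cutoff=0.001):
    """Keep (term, fdr) pairs whose FDR q-value is at or below the cutoff."""
    return [(term, fdr) for term, fdr in records if fdr <= cutoff]


# Hypothetical results that already passed g:Profiler's BH FDR 0.05 threshold.
gprofiler_results = [
    ("Term A", 0.00001),
    ("Term B", 0.0008),
    ("Term C", 0.004),
    ("Term D", 0.03),
]
kept = filter_by_fdr(gprofiler_results, cutoff=0.001)
print(f"{len(kept)} of {len(gprofiler_results)} gene-sets become nodes: {kept}")
```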
Step 3: Explore the results
In the EnrichmentMap control panel located at the left:
- Select the 2 Data Sets (checked by default)
- Set Chart Data to Color by Data Set
- Select Publication Ready to remove the gene-set labels and get a global view of the map.
Unselect Publication Ready when you explore the map in more detail, to see the gene-set names.
On the map, a node that is coloured both green and blue is a gene-set found in both of the g:Profiler result files that we uploaded.
- A node that is blue is a gene-set found only in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.
- A node that is green is a gene-set found only in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.
- A blue edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.
- A green edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.
We can see clusters of blue nodes. All of these nodes are gene-sets with more than 250 genes, which is why they are absent from the max250 result file. Explore the detailed view (see below) to check whether these clusters correspond to informative terms.
Would you have lost information by filtering gene-sets larger than 250 genes?
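The colouring rule above amounts to set membership across the two result files. A minimal sketch with hypothetical term names; in the real map the membership comes from the gene-set IDs listed in each uploaded GEM file.

```python
def classify_nodes(terms_a: set[str], terms_b: set[str], label_a: str, label_b: str):
    """Map each enriched term to the result file(s) it was found in."""
    membership = {}
    for term in sorted(terms_a | terms_b):
        if term in terms_a and term in terms_b:
            membership[term] = "both"          # drawn green and blue
        elif term in terms_a:
            membership[term] = f"only {label_a}"
        else:
            membership[term] = f"only {label_b}"
    return membership


# Hypothetical term IDs from the two result files.
max10000_terms = {"Term 1", "Term 2", "Term 3 (>250 genes)"}
max250_terms = {"Term 1", "Term 2"}
membership = classify_nodes(max10000_terms, max250_terms, "max10000", "max250")
for term, where in membership.items():
    print(term, "->", where)
```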
Explore Detailed results
- In the Cytoscape menu bar, select ‘View’ and then ‘Show Graphic Details’ to display node labels.
Make sure you have unselected “Publication Ready” in the EnrichmentMap control panel.
Zoom in to read the labels, and navigate the network using the bird's-eye view (blue rectangle).
Select a node and visualize the Table Panel
Click on a node
For this example the node “Signaling by Notch” has been selected.
You can also type “Signaling by Notch” in the search bar; the quotation marks are important.
When the node is selected, it is highlighted in yellow.
In the Table Panel, we can see the genes included in the gene-set.
A green box indicates that the gene is in the gene-set (pathway) and in our gene list.
A gray box indicates that the gene is in the gene-set but not in our gene list.
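The green/gray colouring in the Table Panel corresponds to intersecting the pathway's genes with the query list. The sketch uses a few illustrative gene symbols and a hypothetical query list, not the lab data.

```python
def gene_table(pathway_genes: set[str], query_genes: set[str]) -> list[tuple[str, str]]:
    """Return (gene, colour) rows: green if the pathway gene is in the query list, gray otherwise."""
    return [
        (gene, "green" if gene in query_genes else "gray")
        for gene in sorted(pathway_genes)
    ]


pathway = {"NOTCH1", "JAG1", "DLL1", "HES1"}   # illustrative pathway members
query = {"NOTCH1", "TP53", "KRAS"}             # hypothetical query list
rows = gene_table(pathway, query)
for gene, colour in rows:
    print(gene, colour)
```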
Exercise 1b - Is specifying the gmt file important?
Create an enrichment map without a gmt file to compare the results with Exercise 1a.
- Go to Control Panel and select the EnrichmentMap tab.
- Click on the “+” sign to re-open the Create Enrichment Map window.
- In the white box, select the “gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem (Generic/gProfiler)” file
- Locate the GMT field and delete the file name, leaving it blank.
- In the white box, select the “gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000 (Generic/gProfiler)” file
- Locate the GMT field and delete the file name, leaving it blank.
- Use the same parameters as in exercise 1a: FDR q-value cutoff of 0.001 and Connectivity set to sparse.
- Click on Build
Explore the results:
In the EnrichmentMap control panel located at the left:
- Select the 2 Data Sets (selected by default)
- Set Chart Data to Color by Data Set
- Select Publication Ready to remove the gene-set labels and get a global view of the map.
Uncheck this box when you explore the map in detail, to see the gene-set names.
On the map, a node that is coloured both green and blue is a gene-set found in both of the g:Profiler result files that we uploaded.
- A node that is blue is a gene-set found only in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.
- A node that is green is a gene-set found only in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.
- A blue edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.
- A green edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.
Conclusion of exercises 1a and 1b:
Loading a gmt file to create an enrichment map from g:Profiler result is optional. However, there are 2 main beneficial aspects to uploading a gmt file:
- The map will be less condensed and easier to read and interpret.
- Clicking on a node will display all genes in the gene-set and not only genes included in our query list.
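One way to see why the map built without a gmt file can look more condensed: without the gmt file, EnrichmentMap presumably knows only the query genes listed for each gene-set in the GEM file, so overlaps are computed on those genes alone. The toy sketch below (hypothetical genes, Jaccard similarity chosen for illustration) shows how restricting two gene-sets to the query genes can inflate their similarity, which would let more edges pass a connectivity cutoff.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical full gene-sets, as they would come from a gmt file.
pathway_a = {"G1", "G2", "G3", "G4", "G5", "G6"}
pathway_b = {"G1", "G2", "G7", "G8", "G9", "G10"}
# Hypothetical query list (the genes submitted to g:Profiler).
query = {"G1", "G2", "G3", "G7"}

# With a gmt file: similarity is computed on the complete gene-sets.
with_gmt = jaccard(pathway_a, pathway_b)
# Without a gmt file: only the query genes in each gene-set are known.
without_gmt = jaccard(pathway_a & query, pathway_b & query)
print(f"with gmt: {with_gmt:.2f}, without gmt: {without_gmt:.2f}")
```

Here the same two pathways look 2.5 times more similar when only query genes are counted.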
Exercise 1c - create EM from results using Baderlab genesets
Create an enrichment map from the results of g:Profiler generated using the custom Baderlab gene-set file.
To get a map that is easy to read and does not display too many gene-sets, one option is to focus the analysis on gene-sets (pathways) that contain 250 genes or fewer. We filtered our pathway database before uploading it into g:Profiler, so that the FDR is calculated only on these gene-sets. In exercise 1a, by contrast, the FDR was calculated on all gene-sets and gene-sets with more than 250 genes were excluded from the result file afterwards. For this exercise, we will use:
Filtered gmt file: Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt.
In the Module 2 lab, we uploaded this file as a custom gmt file in g:Profiler and ran the query.
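The difference between correcting before and after size filtering can be sketched with a plain Benjamini-Hochberg adjustment. This is not g:Profiler's implementation: it assumes the raw p-values stay the same whether or not the large gene-sets are present (a simplification), and the gene-set names, sizes and p-values are hypothetical. The size limit of 250 genes mirrors the filter applied to the gmt file.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotonic and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical gene-sets: name -> (number of genes, raw p-value).
gene_sets = {
    "S1": (40, 0.001),
    "S2": (120, 0.004),
    "S3": (900, 0.2),
    "S4": (3000, 0.5),
    "S5": (200, 0.03),
}
MAX_SIZE = 250
names = list(gene_sets)

# Exercise 1a style: correct over every gene-set, then drop sets above MAX_SIZE.
q_all = benjamini_hochberg([gene_sets[n][1] for n in names])
correct_then_filter = {
    n: q for n, q in zip(names, q_all) if gene_sets[n][0] <= MAX_SIZE
}

# Exercise 1c style: drop sets above MAX_SIZE from the gmt first, then correct.
small = [n for n in names if gene_sets[n][0] <= MAX_SIZE]
filter_then_correct = dict(
    zip(small, benjamini_hochberg([gene_sets[n][1] for n in small]))
)

for n in small:
    print(n, round(correct_then_filter[n], 4), round(filter_then_correct[n], 4))
```

With these toy numbers, S1, S2 and S5 move from q = 0.005, 0.01 and 0.05 to 0.003, 0.006 and 0.03, because the two large, non-significant sets no longer count as tests. Removing a set with a very small p-value could push q-values the other way, so the effect depends on which sets are filtered out.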
To create an enrichment map of these results:
Go to Control Panel and select the EnrichmentMap tab.
Click on the “+” sign to re-open the Create Enrichment Map window.
Drag the file that we created in the Module 2 lab, gProfiler_hsapiens_Baderlab_max250.gem.txt, and the filtered gmt file (Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt) into the Datasets box in the Enrichment Map panel.
In the white box, select the “gProfiler_hsapiens_Baderlab_max250.gem.txt (Generic/gProfiler)” file
Locate the GMT field and upload the file “Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt”.
Set the FDR q-value cutoff to 0.001 and set the Connectivity slider to the second level.
Explore the results:
SAVE YOUR CYTOSCAPE SESSION (.cys) FILE !
Exercise 1d (optional) - investigate individual pathways in GeneMANIA or String
Each node in the Enrichment map represents a biological process or pathway and consists of a collection of genes. Often we want to know how the genes in that group interact. There are many ways to investigate the underlying interactions for a given group: some involve searching online databases and others are directly integrated into Cytoscape.
- GeneMANIA - an integrative database of gene connections including co-expression, protein interactions, genetic interactions, pathways and more. Cytoscape App
- String - an integrative database of gene connections including co-expression, protein interactions, genetic interactions, pathways and more. Cytoscape App
- Pathway Commons - an integrative database of pathways. (There is a beta feature in EM to show your pathway in the painter app, a Pathway Commons web page that overlays your expression data on the given pathway. It is still in beta testing and requires expression data to work correctly, so it won’t work for this example.)
GeneMANIA
- Navigate to the enrichment map that you created using the Baderlab genesets
- Click on Network Tab and navigate to the third network (it should be the third network if you followed the above examples - name: gProfiler_hsapiens_Baderlab_max250_gem)
- or in the Enrichment map panel in the top drop down select the network named gProfiler_hsapiens_Baderlab_max250_gem
- In the Cytoscape search bar, enter “Signaling by Notch”.
If you can’t see the selected nodes, click on “Fit Selected” to focus on the selected node.
Right-click on the node “Signaling by Notch” and select Apps -> EnrichmentMap - Show in GeneMANIA.
A GeneMANIA Query Panel will pop up.
Select “Select genes with expression” to reduce the query set to only the genes in the given pathway that were in your original dataset (for example, we searched for a set of 127 genes in g:Profiler, but the given pathway has 233 genes associated with it, of which only 10 are found in our original query set).
Click on OK
A GeneMANIA network will show up with the connections between the genes found in your query set and the pathway “Signaling by Notch”
We will explore GeneMANIA in more depth in Module 5.
String
- Navigate to the enrichment map that you created using the Baderlab genesets
- Click on Network Tab and navigate to the third network (it should be the third network if you followed the above examples - name: gProfiler_hsapiens_Baderlab_max250_gem)
- or in the Enrichment map panel in the top drop down select the network named gProfiler_hsapiens_Baderlab_max250_gem
- In the Cytoscape search bar, enter “Signaling by Notch”.
If you can’t see the selected nodes, click on “Fit Selected” to focus on the selected node.
Right-click on the node “Signaling by Notch” and select Apps -> EnrichmentMap - Show in String.
A String Query Panel will pop up.
Select “Select genes with expression” to reduce the query set to only the genes in the given pathway that were in your original dataset (for example, we searched for a set of 127 genes in g:Profiler, but the given pathway has 233 genes associated with it, of which only 10 are found in our original query set).
Click on OK
A String network will show up with the connections between the genes found in your query set and the pathway “Signaling by Notch”
Explore the features and data of each Cytoscape app.
What sort of information does each tell you?
What is the main difference between the two resulting networks?
Bonus - Automation.
Run analysis directly from R for easy integration into existing pipelines.
Instead of creating an enrichment map manually through the user interface, you can create it directly using the RCy3 Bioconductor package or through direct REST calls to Cytoscape's CyREST API.
Follow the step by step instructions on how to run from R here - https://risserlin.github.io/CBW_pathways_workshop_R_notebooks/create-enrichment-map-from-r-with-gprofiler-results.html
First, make sure your environment is set up correctly by following these instructions - https://risserlin.github.io/CBW_pathways_workshop_R_notebooks/setup.html
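For readers who prefer Python to R, the build step can be sketched as a CyREST command URL. This assumes Cytoscape is running with CyREST on its default port 1234, that CyREST exposes Cytoscape commands under /v1/commands/<namespace>/<command>, and that the EnrichmentMap build command accepts the parameter names used below (analysisType, gmtFile, enrichmentsDataset1, pvalue, qvalue, similaritycutoff, coefficients). Verify them in your installed version, for example with help enrichmentmap build in Cytoscape's Command Line Dialog. The file paths are placeholders, and the similarity cutoff and coefficient values are hypothetical choices, not the lab's settings.

```python
from urllib.parse import urlencode

CYREST_BASE = "http://localhost:1234/v1"  # assumed default CyREST address


def enrichmentmap_build_url(gmt_file, enrichment_file, qvalue=0.001, pvalue=1.0,
                            similarity_cutoff=0.25, coefficients="JACCARD"):
    """Build a CyREST command URL for 'enrichmentmap build' (parameter names assumed)."""
    params = {
        "analysisType": "generic",       # GEM / g:Profiler results
        "gmtFile": gmt_file,
        "enrichmentsDataset1": enrichment_file,
        "pvalue": pvalue,
        "qvalue": qvalue,                # FDR q-value cutoff, as in the lab
        "similaritycutoff": similarity_cutoff,
        "coefficients": coefficients,
    }
    return f"{CYREST_BASE}/commands/enrichmentmap/build?{urlencode(params)}"


url = enrichmentmap_build_url(
    gmt_file="/path/to/Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol_max250.gmt",
    enrichment_file="/path/to/gProfiler_hsapiens_Baderlab_max250.gem.txt",
)
print(url)

# To send it (requires Cytoscape running with EnrichmentMap installed):
# import urllib.request
# with urllib.request.urlopen(url) as response:
#     print(response.read().decode())
```

Building the URL separately from sending it makes it easy to inspect the parameters before Cytoscape runs anything.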