Module 3 Lab: g:Profiler Visualization

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.

By Veronique Voisin, Ruth Isserlin, Gary Bader

Goal of the exercise

Create an enrichment map and navigate through the network

During this exercise, you will learn how to create an enrichment map from gene-set enrichment results. The enrichment results chosen for this exercise were generated using g:Profiler, but an enrichment map can be created directly from the output of GSEA, g:Profiler, GREAT, BiNGO or Enrichr, or alternatively from any gene-set tool using the generic enrichment results format.

Data

The data used in this exercise is the list of frequently mutated genes that we used in the previous exercise. Pathway enrichment analysis has been run using g:Profiler and the results have been downloaded in GEM format.

EnrichmentMap

  • A circle (node) is a gene-set (pathway) enriched in genes that we used as input in g:Profiler (frequently mutated genes).

  • Edges (lines) represent genes in common between 2 pathways (nodes).

  • A cluster of nodes represents overlapping and related pathways and may represent a common biological process.

  • Clicking on a node will display the genes included in that pathway.
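
The edge definition above can be made concrete with a small calculation. The sketch below computes two common gene-set similarity measures, the Jaccard coefficient and the overlap coefficient, for a pair of gene sets. EnrichmentMap offers measures of this kind when deciding whether to draw an edge, but the function names and the gene lists here are made up for illustration and are not taken from the lab files.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical gene sets, for illustration only.
pathway_a = {"NOTCH1", "NOTCH2", "JAG1", "MAML1"}
pathway_b = {"CTNNB1", "APC", "NOTCH1", "JAG1", "AXIN1", "TCF7L2"}

print(jaccard(pathway_a, pathway_b))              # 2 shared / 8 total = 0.25
print(overlap_coefficient(pathway_a, pathway_b))  # 2 shared / 4 in smaller set = 0.5
```

The overlap coefficient is larger than the Jaccard coefficient whenever the two sets differ in size, which is why the choice of measure changes how densely connected a map looks.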

Description of this exercise

We ran g:Profiler with different parameters and saved the results. An enrichment map represents the result of an enrichment analysis as a network in which significantly enriched gene-sets that share many genes form identifiable clusters. Visualizing the results as these biological themes eases their interpretation.

The goal of this exercise is to learn how to:

  1. upload g:Profiler results into Cytoscape EnrichmentMap to create a map.
  2. upload several g:Profiler results at the same time to create one map and learn how to distinguish the results.
  3. compare the differences resulting from the use of different g:Profiler parameters at the enrichment map level.

Start the exercise

To start the lab practical section, first create a gprofiler_files directory on your computer and download the files below.

Right click on each link below and select “Save Link As…”.

Place the files in the corresponding module directory of your CBW work directory.

Five files are needed for this exercise:

  1. Enrichment result 1: gProfiler_hsapiens_lab2_results_GEM_maxterm10000.gem.txt
  • In g:Profiler, the parameters that we used to generate this file were:
    • GO_BP no electronic annotation,
    • Reactome,
    • Benjamini-Hochberg FDR 0.05,
    • No restriction on the number of genes in a gene-set.
  2. Enrichment result 2: gProfiler_hsapiens_lab2_results_GEM_maxterm250.gem.txt
  • In g:Profiler, the parameters that we used were:
    • GO_BP no electronic annotation,
    • Reactome,
    • Benjamini-Hochberg FDR 0.05.
    • The results were filtered using the g:Profiler Term size slider, and only the enriched gene-sets that contain 250 genes or fewer were included in the result file (gProfiler_hsapiens_max250.gem.txt).
  3. Enrichment result 3: gProfiler_hsapiens_Baderlab_max250_gem.txt
  4. Pathway database 1 (.gmt): gprofiler_full_hsapiens.name.gmt
  • This file can be downloaded directly or can be created by concatenating the hsapiens.GO/BP.name.gmt and the hsapiens.REAC.name.gmt files contained in the g:Profiler gprofiler_hsapiens.name folder.
  5. Pathway database 2 (.gmt): Human_GOBP_AllPathways_no_GO_iea_September_01_2020_symbol_max250gssize.gmt
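
The concatenation mentioned for pathway database 1 can be done with any tool that appends text files. As a minimal sketch, assuming each GMT file is plain tab-separated text with one gene-set per line, the function below writes the lines of several GMT files into one output file and skips blank lines. The function name, file names and file contents are placeholders created in a temporary directory, not the real g:Profiler files.

```python
from pathlib import Path
import tempfile


def concatenate_gmt(inputs, output):
    """Write the non-blank lines of each input GMT file to one output file.

    Returns the number of gene-set lines written.
    """
    written = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line.strip():
                        continue  # skip empty lines between or after records
                    out.write(line + "\n")
                    written += 1
    return written


# Tiny made-up files stand in for the GO BP and Reactome GMT files.
with tempfile.TemporaryDirectory() as tmp:
    go_bp = Path(tmp) / "go_bp.gmt"
    reactome = Path(tmp) / "reactome.gmt"
    merged = Path(tmp) / "merged.gmt"
    go_bp.write_text("GO:0000001\tprocess A\tGENE1\tGENE2\n", encoding="utf-8")
    reactome.write_text(
        "REAC:0001\tpathway B\tGENE2\tGENE3\n\nREAC:0002\tpathway C\tGENE4\n",
        encoding="utf-8",
    )
    merged_count = concatenate_gmt([go_bp, reactome], merged)
    merged_lines = merged.read_text(encoding="utf-8").splitlines()

print(merged_count)  # 3
```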

Exercise 1a - compare g:Profiler results for different gene-set sizes

Step 1

Launch Cytoscape and open the EnrichmentMap App

1a. Double click on the Cytoscape icon

1b. Open EnrichmentMap App

  • In the Cytoscape top menu bar:

  • Click on Apps -> EnrichmentMap

  • A ‘Create Enrichment Map’ window is now open.

Step 2

Create an enrichment map from 2 datasets and with a gmt file.

2a. In the ‘Create Enrichment Map’ window, drag and drop the 2 enrichment files gProfiler_hsapiens_max10000.gem.txt and gProfiler_hsapiens_max250.gem.txt.


2b. In the white box, click on “gProfiler_hsapiens_max250.gem (Generic/gProfiler)”.

2c. On the right side, go to the GMT field, click on the three-dot button (…) and locate the file gprofiler_full_hsapiens.name.gmt that you have saved on your computer to load it.

2d. In the white box, click on “gProfiler_hsapiens_max10000 (Generic/gProfiler)”.

2e. On the right side, go to the GMT field, click on the three-dot button (…) and locate the file gprofiler_full_hsapiens.name.gmt that you have saved on your computer to load it.

2f. Locate the FDR q-value cutoff field and set the value to 0.001.

2g. Move the Connectivity slider to sparse.


2h. Click on Build.

  • A status bar should pop up showing the progress of the enrichment map build.
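
To get a feel for what the FDR q-value cutoff in step 2f does, the sketch below keeps only the rows of a GEM-style table whose FDR is at or below a threshold. It assumes a tab-separated file with the header GO.ID, Description, p.Val, FDR, Phenotype, Genes (check the header of your own file). The three rows are invented, and whether EnrichmentMap treats the cutoff as inclusive is an assumption.

```python
import csv
import io


def filter_gem_by_fdr(handle, cutoff):
    """Return the rows of a GEM-style table whose FDR is <= cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["FDR"]) <= cutoff]


# Made-up enrichment results in the assumed GEM layout.
example_gem = io.StringIO(
    "GO.ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "SET:1\tmade-up pathway A\t1e-08\t1e-06\t1\tGENE1,GENE2\n"
    "SET:2\tmade-up pathway B\t0.0004\t0.002\t1\tGENE2,GENE3\n"
    "SET:3\tmade-up pathway C\t0.01\t0.04\t1\tGENE4\n"
)

kept = filter_gem_by_fdr(example_gem, cutoff=0.001)
print([row["GO.ID"] for row in kept])  # ['SET:1']
```

All three rows would pass the 0.05 threshold used in g:Profiler, but only one passes 0.001, which is why a stricter cutoff gives a smaller map.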


Step 3

Explore the results:

In the EnrichmentMap control panel located at the left:

  • select the 2 Data Sets (checked by default)
  • set Chart Data to Color by Data Set
  • select Publication Ready to hide the gene-set labels and get a global view of the map.

Unselect Publication Ready when you explore the map in more detail to see the gene-set names.


On the map, a node that is coloured both green and blue is a gene-set that is found in both of the 2 g:Profiler result sets that we have uploaded.

  • A node that is blue is a gene-set that is found only in the file gProfiler_hsapiens_max10000.
  • A node that is green is a gene-set that is found only in the file gProfiler_hsapiens_max250.
  • A blue edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_max10000.
  • A green edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_max250.gem.
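
The node colouring described above amounts to a comparison of two sets of gene-set identifiers. A minimal sketch, using made-up identifiers rather than the contents of the lab files, and a hypothetical helper name:

```python
def classify_gene_sets(ids_a, ids_b):
    """Split gene-set IDs into those in both result sets and those in only one."""
    a, b = set(ids_a), set(ids_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical IDs: result_a plays the role of the blue data set,
# result_b the role of the green data set.
result_a = {"SET:1", "SET:2", "SET:3"}
result_b = {"SET:2", "SET:3", "SET:4"}

groups = classify_gene_sets(result_a, result_b)
for label in ("both", "only_a", "only_b"):
    print(label, sorted(groups[label]))
# both ['SET:2', 'SET:3']
# only_a ['SET:1']
# only_b ['SET:4']
```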


We can see clusters of blue nodes. All these nodes are gene-sets that contain more than 250 genes. Explore the detailed view (see below) to see whether these clusters correspond to informative terms.

Would you have lost information by filtering out gene-sets larger than 250 genes?

Explore Detailed results

  • In the Cytoscape menu bar, select ‘View’ and ‘Show Graphic Details’ to display node labels.

Make sure you have unselected “Publication Ready” in the EnrichmentMap control panel.

  • Zoom in to be able to read the labels and navigate the network using the bird’s-eye view (blue rectangle).

  • Select a node and visualize the Table Panel

    • Click on a node

    • For this example the node “Signaling by Notch” has been selected.

You can type it in the search bar; the quotes are important.


When the node is selected, it is highlighted in yellow. In the Table Panel, we can see the genes included in the gene-set. A green box indicates that the gene is in the gene-set (pathway) and in our gene list. A gray box indicates that the gene is in the gene-set but not in our gene list.
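
The green and gray boxes correspond to a simple split of the pathway's genes by membership in the query list. A minimal sketch with invented gene names (the helper name is hypothetical and this is not how the Table Panel is implemented):

```python
def split_pathway_genes(pathway_genes, query_genes):
    """Return (genes also in the query list, genes only in the pathway)."""
    pathway, query = set(pathway_genes), set(query_genes)
    return sorted(pathway & query), sorted(pathway - query)


# Hypothetical pathway and query list.
pathway = ["NOTCH1", "NOTCH2", "JAG1", "MAML1", "HES1"]
query = ["NOTCH1", "TP53", "JAG1", "KRAS"]

in_query, pathway_only = split_pathway_genes(pathway, query)
print(in_query)      # ['JAG1', 'NOTCH1']  (shown in green)
print(pathway_only)  # ['HES1', 'MAML1', 'NOTCH2']  (shown in gray)
```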


Exercise 1b - Is specifying the gmt file important?

Create an enrichment map without a gmt file to compare the results with Exercise 1a.

  • Go to Control Panel and select the EnrichmentMap tab.
  • Click on the “+” sign to re-open the Create Enrichment Map window.


  • In the white box, select the “gProfiler_hsapiens_max250.gem (Generic/gProfiler)” file.
  • Locate the GMT field and delete the file name, leaving it blank.
  • In the white box, select the “gProfiler_hsapiens_max10000 (Generic/gProfiler)” file.
  • Locate the GMT field and delete the file name, leaving it blank.
  • Use the same parameters as in exercise 1a: FDR q-value cutoff of 0.001 and Connectivity set to sparse.
  • Click on Build.


Explore the results:

In the EnrichmentMap control panel located at the left:

  • select the 2 Data Sets (selected by default)
  • set Chart Data to Color by Data Set
  • select Publication Ready to hide the gene-set labels and get a global view of the map.

Uncheck this box when you explore the map in detail to see the gene-set names.


On the map, a node that is coloured both green and blue is a gene-set that is found in both of the 2 g:Profiler result sets that we have uploaded.

  • A node that is blue is a gene-set that is found only in the file gProfiler_hsapiens_max10000.
  • A node that is green is a gene-set that is found only in the file gProfiler_hsapiens_max250.
  • A blue edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_max10000.
  • A green edge represents genes that overlap between gene-sets found in the file gProfiler_hsapiens_max250.gem.


Conclusion of exercises 1a and 1b:

Loading a gmt file to create an enrichment map from g:Profiler results is optional. However, there are 2 main benefits to loading a gmt file:

  1. the map will be less condensed and easier to read and interpret.
  2. clicking on a node will display all genes in the gene-set and not only genes included in our query list.
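
One way to see why the map looks different without a gmt file: if the gene-sets are reduced to the genes that also appear in the query list, the similarity between two gene-sets is computed on much smaller sets and can come out far higher. The sketch below illustrates this with the Jaccard coefficient on invented gene-sets. That EnrichmentMap falls back to the query genes listed in the result file when no gmt is given is the assumption behind this example.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical full gene-sets (as a gmt file would define them).
full_a = {"G1", "G2", "G3", "G4", "G5", "G6"}
full_b = {"G1", "G2", "G7", "G8", "G9", "G10"}

# Hypothetical query list; without a gmt, only these genes are known.
query = {"G1", "G2", "G3"}
reduced_a = full_a & query
reduced_b = full_b & query

similarity_full = jaccard(full_a, full_b)           # 2 shared / 10 total
similarity_reduced = jaccard(reduced_a, reduced_b)  # 2 shared / 3 total
print(similarity_full, round(similarity_reduced, 3))  # 0.2 0.667
```

With the same similarity cutoff, the reduced sets would be connected by an edge far more readily than the full sets, which is consistent with the more condensed map seen in exercise 1b.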

Exercise 1c - create EM from results using Baderlab genesets

Create an enrichment map from the results of g:Profiler generated using the custom Baderlab gene-set file.
To get a map that is easy to read and that does not display too many gene-sets, one option is to focus the analysis on gene-sets (pathways) that contain 250 genes or fewer. We prefiltered our pathway database prior to uploading it into g:Profiler so that the FDR is calculated only on these gene-sets (as opposed to exercise 1a, where the FDR was calculated on all gene-sets and then the gene-sets with more than 250 genes were excluded from the result file).
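
The prefiltering step can be sketched as follows, assuming the usual GMT layout of one gene-set per line with tab-separated fields (identifier, description, then the genes). The function name and the example lines are made up, and a small threshold is used so that the effect is visible on toy data.

```python
def filter_gmt_by_size(lines, max_genes=250):
    """Keep GMT lines whose gene-set has at most max_genes distinct genes."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a valid gene-set line (no genes listed)
        genes = {gene for gene in fields[2:] if gene}
        if len(genes) <= max_genes:
            kept.append(line)
    return kept


# Hypothetical GMT content.
gmt_lines = [
    "SET:1\tsmall made-up set\tGENE1\tGENE2",
    "SET:2\tlarge made-up set\tGENE1\tGENE2\tGENE3\tGENE4",
    "SET:3\tsingle-gene set\tGENE5",
]

small_sets = filter_gmt_by_size(gmt_lines, max_genes=2)
print([line.split("\t")[0] for line in small_sets])  # ['SET:1', 'SET:3']
```

Filtering the database before the enrichment analysis reduces the number of gene-sets tested, so the multiple-testing correction is applied to fewer tests than when the filter is applied to the results afterwards.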


Explore the results:


SAVE YOUR CYTOSCAPE SESSION (.cys) FILE !

Exercise 1d (optional) - investigate individual pathways in GeneMANIA or String

Each node in the enrichment map represents a biological process or pathway. It consists of a collection of genes. Often we want to know how the genes in that group interact. There are many different ways to investigate the underlying interactions for a given group. Some involve searching online databases and others are directly integrated into Cytoscape.

  • GeneMANIA - an integrative database of gene connections including co-expression, protein interactions, genetic interactions, pathways and more. Cytoscape App
  • String - an integrative database of gene connections including co-expression, protein interactions, genetic interactions, pathways and more. Cytoscape App
  • Pathway Commons - an integrative database of pathways. (There is a beta feature in EM to show your pathway in the painter app, a Pathway Commons web page that overlays your expression data on the given pathway. It is still in beta testing and requires expression data to work correctly, so it won’t work for this example.)

GeneMANIA

  • Navigate to the enrichment map that you created using the Baderlab genesets
    • Click on Network Tab and navigate to the third network (it should be the third network if you followed the above examples - name: gProfiler_hsapiens_Baderlab_max250_gem)
    • or in the Enrichment map panel in the top drop down select the network named gProfiler_hsapiens_Baderlab_max250_gem
  • In the Cytoscape search bar enter “Signaling by Notch”

If you can’t see the selected nodes, click on “Fit Selected” to focus on the selected node.

  • Right click on the node “Signaling by Notch” and select Apps -> Enrichment Map - Show in GeneMANIA


  • A GeneMANIA Query Panel will pop up.

  • Select “Select genes with expression” to reduce the query set to just the genes in the given pathway that were in your original dataset (for example, we searched for a set of 127 genes in g:Profiler, but the given pathway has 233 genes associated with it, of which only 11 are found in our original query set).

  • Click on OK


  • A GeneMANIA network will show up with the connections between the genes found in your query set and the pathway “Signaling by Notch”


  • We will go into more depth on GeneMANIA in Module 6.

String

  • Navigate to the enrichment map that you created using the Baderlab genesets
    • Click on Network Tab and navigate to the third network (it should be the third network if you followed the above examples - name: gProfiler_hsapiens_Baderlab_max250_gem)
    • or in the Enrichment map panel in the top drop down select the network named gProfiler_hsapiens_Baderlab_max250_gem
  • In the Cytoscape search bar enter “Signaling by Notch”

If you can’t see the selected nodes, click on “Fit Selected” to focus on the selected node.

  • Right click on the node “Signaling by Notch” and select Apps -> Enrichment Map - Show in String


  • A String Query Panel will pop up.

  • Select “Select genes with expression” to reduce the query set to just the genes in the given pathway that were in your original dataset (for example, we searched for a set of 127 genes in g:Profiler, but the given pathway has 233 genes associated with it, of which only 11 are found in our original query set).

  • Click on OK


  • A String network will show up with the connections between the genes found in your query set and the pathway “Signaling by Notch”


Explore the features and data of each Cytoscape app.
What sort of information does each tell you?
What is the main difference between the two resulting networks?