(6A) Gene list using g:Profiler

Specify parameters

  1. Open the g:Profiler website at g:Profiler.
  2. Paste the gene list (Supplementary_Table1_Cancer_drivers.txt) into the Query field in top-left corner of the screen.
  3. Check the box next to Ordered query. This option treats the input as an ordered gene list and prioritizes genes with higher mutation ESs at the beginning of the list.
  4. Click on Advanced Options to set additional parameters.

The gene list can be space-separated or one per line. The organism for the analysis, Homo sapiens, is selected by default. The input list can contain a mix of gene and protein IDs, symbols and accession numbers. Duplicated and unrecognized IDs will be removed automatically, and ambiguous symbols can be refined in an interactive dialogue after submitting the query.

open the file in a simple text editor such as Notepad or Textedit to copy the list of genes.

workflow

  1. Set Significance threshold to "Benjamini-Hochberg FDR".






if g:Profiler does not return any results:

  • increase the User threshold (0.1, then 1) to check that g:Profiler is running successfully but there are simply no significant results for your query.
  • Alternately, click on All results to return all enrichments irrespective of significance.

workflow

  1. Click on the Data sources tab (black rectangle) to expand it.
  2. UnSelect the following gene-set databases by clicking the "clear all" button.
  3. In the Gene Ontology category, check GO Biological Process and No electronic GO annotations.
  4. In the biological pathways category, check Reactome.



No electronic GO annotations option will discard less reliable GO annotations (IEAs) that are not manually reviewed.

if g:Profiler does not return any results uncheck the No electronic GO annotation option to expand annotations used in the test.


Downlaod GMT

workflow

  • Optional but recommended: Download the required GMT pathway database file.

  • Click on the 'combined name.gmt' link located at bottom of the dataset tab. It will download a file named combined name.gmt containing a pathway database GMT file with all the available sources.

  • (Optional) concatenate the GO:BP and Reactome gmt files:

If you want to create a smaller gmt file that doesn't contain all of the gprofiler data sources you can instead download name.gmt.zip that contains each data source as its own GMT file. You will need to concatenate the sources you require into one GMT file to use for later.

Option 1: Merge manually, if you are not familiar with unix commands

  • open a text editor such a Notepad or equivalent
  • open hsapiens.GO:BP.name.gmt using the text editor
  • open gmt hsapiens.REAC.name.gmt using the text editor
  • copy-paste all the rows from REAC file together with all the rows in GO:BP file and save it as hsapiens.pathways.name.gmt .

Option 2: using the cat command if you are familiar with unix commands

  • open your terminal window
  • cd to the unzipped gprofiler_hsapiens.name folder
  • type the following command: cat hsapiens.GO:BP.name.gmt hsapiens.REAC.name.gmt > hsapiens.pathways.name.gmt

you will be using this optional hsapiens.pathways.name.gmt file in Cytoscape EnrichmentMap.


Run Query

  1. Click Run Query to run the analysis. The Run Query button will change to Run Query ... while it is running and Run Query ✓ when the query is complete. The ✓ will only remain there momentarily and then return to Run Query button.

If you miss the button changing from "..." to ✓ it might appear that nothing has happened. Scroll down to see results.

A graphical map of the resulting pathways will be shown, with the -log10(pvalue) along the y axis and the associated pathways of the input list shown along the x axis divided by data source. By default, results with Q values <0.05 are reported but all pathways are shown in the graphical map with a dotted line indicating which pathways are significant and pass correction.

Mousing over any of the dots will display the source identifier, pathway name and associated p-value.

Filter enrichment results

  1. Click Detailed Results tab to see detailed results.

A graphical heat map image will be shown, with detected pathways shown along the y axis (left) and associated genes of the input list shown along the x axis (top). Resulting pathways are listed by p-value. g:Profiler returns only statistically significant pathways with P values adjusted for multiple testing correction (called Q values). By default, results with Q values <0.05 are reported. g:Profiler reports unrecognized and ambiguous gene IDs that can be resolved manually.

For GO pathways and gene sets the boxes under each gene will be filled with different colours. The different colours represent different evidence codes. To see the legend specifying what each colour represents click on legend drop down above heatmap.

  1. Set the Term Size slider to minimum 3 and maximum 350. The analysis will consider only more reliable pathways that have three or more genes and no more than 350.

To see the specific numbers used to calculate the statistics for each gene set click on the ">>" in the header of the heatmap. This will add T (number of genes associated with the term), Q (number of genes in the query) , T intersect Q (the number of genes found in both the term and the query), U (the number of genes in the universe).

If ordered query is selected then the number of query genes associated with each pathway will fluctuate as g:Profiler sequentially increases the number of genes tested and finds the set of genes that generate the most significant result.

Downlaod Enrichment File

  1. Click on the option GEM (Generic Enrichment Map). This file is required for visualizing pathway results with Cytoscape and EnrichmentMap.