
A set of R notebooks that transform expression data into a ranked gene list and run it through a pathway enrichment pipeline. Pathway enrichment analysis helps researchers gain mechanistic insight into the large gene lists that typically result from genome-scale (–omics) experiments. It identifies biological pathways that are enriched in the gene list more than would be expected by chance. We explain pathway enrichment analysis and present a practical step-by-step guide to help interpret gene lists. The protocol is designed for biologists with no prior bioinformatics training and uses freely available software, including g:Profiler, GSEA, Cytoscape and Enrichment Map.
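To make "enriched more than expected by chance" concrete, the following is a minimal Python sketch of a one-sided hypergeometric (over-representation) test, one common way to score a pathway against an unranked gene list. It is an illustration only: the function name and the example counts are hypothetical, the notebooks themselves are written in R, and the tools named above may apply different statistics and multiple-testing corrections (GSEA, for instance, works on a ranked list rather than a fixed gene set).

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_list, n_overlap):
    """Probability of seeing at least `n_overlap` pathway genes in a gene list.

    Assumes the gene list is a random draw of `n_list` genes (without
    replacement) from `n_background` genes, of which `n_pathway` are
    annotated to the pathway. All names here are hypothetical.
    """
    upper = min(n_pathway, n_list)  # the overlap cannot exceed either set size
    total = comb(n_background, n_list)  # number of possible gene lists
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up counts for illustration: 20,000 background genes, a 100-gene
# pathway, a 200-gene list, and 10 genes shared between the two.
p_value = hypergeom_enrichment_p(20000, 100, 200, 10)
print(f"over-representation p-value: {p_value:.3g}")
```

A small p-value means the overlap is larger than random sampling would usually produce. In practice, p-values are computed for many pathways at once and must be corrected for multiple testing before interpretation.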

The main protocol translates a ranked gene list into an Enrichment Map. The figure below outlines the main steps.

[Figure: EM Protocol Flow Chart]

Table of Contents

  1. Download TCGA data - R notebook that shows how to download legacy microarray and RNA-seq ovarian cancer data. The notebook can be modified to download any data from GDC.
  2. Supplemental Protocol 1 - convert raw RNA-seq expression data to a ranked list.
  3. Supplemental Protocol 2 - convert RMA-normalized microarray expression data to a ranked list.
  4. Supplemental Protocol 3 - perform pathway enrichment analysis in R using ROAST and Camera.
  5. Supplemental Protocol 4 - perform phenotype randomizations using edgeR with GSEA.
  6. Supplemental Protocol 5 - create a multi-dataset Enrichment Map (using GSEA results from multiple rank files), followed by theme calculation and summary.
  7. Protocol 2 - run GSEA on a ranked list and automatically create an Enrichment Map from the results.
  8. Protocol 2 add-on - take the resulting Enrichment Map from Protocol 2 and annotate it. Pull all the annotations directly into R for further analysis.
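As a rough illustration of what "converting expression data to a ranked list" can mean, the Python sketch below turns per-gene differential expression results into a signed score and sorts genes by it. The scoring rule used here, sign(logFC) × −log10(p-value), is a common convention and is an assumption on our part; consult the Supplemental Protocol notebooks for the exact calculation they use. The function names and the input values are hypothetical toy examples, not results from the protocol data.

```python
import math


def rank_score(log_fc, p_value, min_p=1e-300):
    """Signed significance score: positive for up-regulated genes,
    negative for down-regulated genes (assumed scoring convention)."""
    # Clamp the p-value so that p == 0 does not produce an infinite score.
    return math.copysign(1.0, log_fc) * -math.log10(max(p_value, min_p))


def build_ranked_list(results):
    """`results` is an iterable of (gene, log_fc, p_value) tuples.
    Returns (gene, score) pairs sorted from most up- to most down-regulated."""
    scored = [(gene, rank_score(log_fc, p)) for gene, log_fc, p in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def to_rnk_lines(ranked):
    """Format the ranked list as tab-separated 'gene<TAB>score' lines."""
    return [f"{gene}\t{score:.4f}" for gene, score in ranked]


# Toy differential expression results (made-up numbers for illustration).
toy_results = [
    ("TP53", -1.8, 0.0001),
    ("BRCA1", 2.3, 0.001),
    ("KRAS", 0.2, 0.5),
]
for line in to_rnk_lines(build_ranked_list(toy_results)):
    print(line)
```

Combining direction and significance in a single score places strongly up-regulated genes at the top of the list and strongly down-regulated genes at the bottom, which is the ordering a ranked-list method such as GSEA expects.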

Data Files used for the Protocol

  1. Supplementary table 1 - Cancer drivers. List of genes used as g:Profiler input.
  2. Supplementary table 2 - GSEA rank file. List of genes and their associated ranks used as input for GSEA.
  3. Supplementary table 3 - GSEA GMT file. Pathway definition file used as GSEA input in the protocol. Updated GMT files can be found on the baderlab download site here
  4. Supplementary table 4 - Example g:Profiler results file.
  5. Supplementary table 5 - Example g:Profiler GMT file. If you use the example g:Profiler results file instead of regenerating the results, make sure to use this GMT file. If you regenerate the g:Profiler results file, make sure to download an updated GMT file that corresponds to the results file, as specified in the protocol.
  6. Supplementary table 6 - Example expression file (corresponding to the example rank file, Supplementary table 2) used when creating an Enrichment Map from GSEA results.
  7. Supplementary table 7 - Example class file (corresponding to the example expression file, Supplementary table 6) used when creating an Enrichment Map from GSEA results.
  8. Supplementary table 8 - Example GSEA results file 1.
  9. Supplementary table 9 - Example GSEA results file 2.
  10. Supplementary table 10 - Example RNA-seq expression input file used in Supplemental Protocol 1. This file is used to generate the rank file (Supplementary table 2) used in the main protocol.
  11. Supplementary table 11 - Example sample definition input file used in Supplemental Protocol 1. This file is used to generate the rank file (Supplementary table 2) used in the main protocol.
  12. Supplementary table 12 - Example microarray expression input file used in Supplemental Protocol 2.
  13. Supplementary table 13 - Example sample definition input file used in Supplemental Protocol 2.
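For readers unfamiliar with the two main GSEA input formats listed above, here is a small Python sketch that reads them and checks how well they agree. It assumes the usual layouts: a GMT file has one gene set per line (name, description, then gene identifiers, all tab-separated), and a rank file has one gene per line followed by a tab and a numeric score. The function names and the in-line example records are hypothetical and are not taken from the supplementary files.

```python
def parse_gmt(lines):
    """Map each gene set name to its set of gene identifiers."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def parse_rnk(lines):
    """Map each gene to its rank score, skipping comments and header rows."""
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            ranks[fields[0]] = float(fields[1])
        except ValueError:
            continue  # non-numeric score, e.g. a column header row
    return ranks


def coverage(gene_sets, ranks):
    """Fraction of each gene set's genes that appear in the rank file."""
    return {
        name: len(genes & ranks.keys()) / len(genes)
        for name, genes in gene_sets.items()
        if genes
    }


# Hypothetical in-memory records standing in for real files.
gmt_lines = ["SET_A\texample set\tTP53\tBRCA1\n", "SET_B\texample set\tKRAS\n"]
rnk_lines = ["GeneName\trank\n", "TP53\t3.2\n", "KRAS\t-1.5\n"]
print(coverage(parse_gmt(gmt_lines), parse_rnk(rnk_lines)))
```

Low coverage for most gene sets usually indicates mismatched gene identifiers or an outdated pathway file, which is one reason the list above stresses pairing each results file with its corresponding GMT file.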

Download all the above files here

See the GitHub repository for the source.