Materials

Equipment

Hardware

  • A personal computer with Internet access and ≥8 GB of RAM. GSEA can run with as little as 1 GB of RAM; however, Cytoscape (required to run the EnrichmentMap software) requires ≥8 GB.

Software

Input data

We provide downloadable example files that are referred to throughout the protocol (Supplementary Tables 1–13). We recommend saving all of these files in a personal project data folder before starting, and creating a separate results folder for the files generated while performing the protocol.

To download a file, right-click its link, select "Save Link As..." and save it in the corresponding working directory.
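The recommended folder setup can also be scripted. The following minimal Python sketch is not part of the original protocol; the helper `make_project_dirs` and the folder names `data` and `results` are hypothetical choices.

```python
from __future__ import annotations

from pathlib import Path


def make_project_dirs(root: str | Path) -> dict[str, Path]:
    """Create hypothetical 'data' and 'results' folders under root.

    'data' holds the downloaded example files; 'results' holds files
    generated while performing the protocol.
    """
    root = Path(root)
    dirs = {"data": root / "data", "results": root / "results"}
    for path in dirs.values():
        # parents=True also creates root if needed; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling `make_project_dirs("pathway_project")` a second time is harmless, because `exist_ok=True` leaves existing folders and their contents untouched.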

A gene list or ranked gene list of interest

  • Example data for Step 6A. g:Profiler requires a list of genes, one per line, in a text file or spreadsheet, ready to copy and paste into a web page. For this example, we use genes with frequent somatic SNVs identified in TCGA exome sequencing data of 3,200 tumors of 12 types (Kandoth et al. 2013). The MuSiC cancer driver mutation detection software was used to find 127 cancer driver genes that displayed higher-than-expected mutation frequencies in cancer samples (Supplementary Table 1, derived from column B of Supplementary Table 4 in Kandoth et al. 2013). Genes are ranked in decreasing order of significance (FDR Q value) and mutation frequency (not shown).

  • Example data for Step 6B. GSEA requires an RNK file with gene scores. An RNK file is a two-column text file with gene IDs in the first column and gene scores in the second column. All (or most) genes in the genome need to have a score, and the gene IDs need to match those used in the GMT file. We provide a ranked list of differentially expressed genes in ovarian cancer from TCGA (Supplementary Table 2). This cohort was previously stratified into four molecular subtypes on the basis of gene expression data, defined as differentiated, immunoreactive, mesenchymal and proliferative (Verhaak et al. 2012; Cancer Genome Atlas Research Network 2011). We compared the immunoreactive and mesenchymal subtypes to demonstrate the protocol. Step 5 of Supplementary Protocol 1 shows how this file was created.
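Both input formats described above are plain text, so they can be prepared or checked with a few lines of code. The minimal Python sketch below is not part of the original protocol, and both helpers are hypothetical: `to_gprofiler_text` turns a ranked gene list into one gene per line, ready to paste into g:Profiler, and `read_rnk` parses a two-column RNK file, rejecting missing columns, non-numeric scores and duplicate gene IDs. It assumes that lines beginning with `#` are comments and that a non-numeric score on the first line marks a header row; check these assumptions against your own file.

```python
import math


def to_gprofiler_text(genes):
    """Return genes one per line (hypothetical helper).

    Blank entries and repeated IDs are dropped; the original ranked order is kept.
    """
    seen = set()
    kept = []
    for gene in genes:
        gene = gene.strip()
        if gene and gene not in seen:
            seen.add(gene)
            kept.append(gene)
    return "\n".join(kept)


def read_rnk(lines):
    """Parse RNK lines ("gene ID<TAB>score") into a {gene_id: score} dict.

    Assumptions: lines starting with '#' are comments, and a non-numeric
    score on the first line marks a header row that is skipped.
    """
    scores = {}
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {n}: expected gene ID and score, got {line!r}")
        gene, raw_score = fields[0].strip(), fields[1].strip()
        try:
            score = float(raw_score)
        except ValueError:
            if n == 1:
                continue  # assumed header row, e.g. "gene<TAB>score"
            raise ValueError(f"line {n}: score {raw_score!r} is not a number") from None
        if math.isnan(score):
            raise ValueError(f"line {n}: score for {gene} is NaN")
        if gene in scores:
            raise ValueError(f"line {n}: duplicate gene ID {gene}")
        scores[gene] = score
    return scores
```

For example, `read_rnk(open("ranks.rnk"))` (a hypothetical file name) returns the scores, and `sorted(scores.items(), key=lambda kv: kv[1], reverse=True)` lists genes from the highest to the lowest score.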

Pathway gene set database

  • For Step 6A, no further input is required from the user: g:Profiler maintains an up-to-date set of pathway gene sets from multiple sources. Step 6B (GSEA), however, requires a database of pathway gene sets. Supplementary Table 3 contains such a database in the standard GMT format, downloaded from http://baderlab.org/GeneSets. This file contains pathways downloaded on 1 July 2017 from eight data sources: GO (Ashburner et al. 2000), Reactome (Fabregat et al. 2016), Panther (Mi, Muruganujan, and Thomas 2012), NetPath (Kandasamy et al. 2010), NCI (Schaefer et al. 2009), MSigDB curated gene sets (C2 collection, excluding Reactome and KEGG) (Liberzon et al. 2011), MSigDB Hallmark (H collection) (Liberzon et al. 2015) and HumanCyc (Caspi et al. 2016). The gene sets available from http://baderlab.org/GeneSets are updated monthly. A GMT file is a tab-separated text file in which each line represents the gene set for a single pathway: a pathway ID, a name and the list of associated genes.
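Because a GMT file is plain tab-separated text, it can be loaded and checked against the gene IDs in an RNK file before running GSEA. The minimal Python sketch below is an illustration only, not part of the original protocol or of GSEA itself; `read_gmt` and `id_overlap` are hypothetical helpers. `read_gmt` follows the layout described above (pathway ID, name, then genes), and `id_overlap` reports the fraction of GMT genes that also appear among the ranked gene IDs. A very low value suggests that the two files use different identifier types (for example, gene symbols in one and numeric IDs in the other).

```python
def read_gmt(lines):
    """Parse GMT lines into {pathway_id: (name, set_of_genes)}.

    Each line: pathway ID <TAB> name <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for n, line in enumerate(lines, start=1):
        fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
        if not fields[0]:
            continue  # skip blank lines (or lines without an ID)
        if len(fields) < 3:
            raise ValueError(f"line {n}: need a pathway ID, a name and at least one gene")
        pathway_id, name = fields[0], fields[1]
        genes = {g for g in fields[2:] if g}  # ignore empty trailing fields
        gene_sets[pathway_id] = (name, genes)
    return gene_sets


def id_overlap(gene_sets, ranked_ids):
    """Fraction of all GMT genes that also occur among the ranked gene IDs."""
    all_genes = set()
    for _name, genes in gene_sets.values():
        all_genes |= genes
    if not all_genes:
        return 0.0
    return len(all_genes & set(ranked_ids)) / len(all_genes)
```

For example, `id_overlap(read_gmt(open("pathways.gmt")), ranked_ids)` (a hypothetical file name) works with any iterable of gene IDs taken from the first column of the RNK file.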

References

Ashburner, Michael, Catherine A Ball, Judith A Blake, David Botstein, Heather Butler, J Michael Cherry, Allan P Davis, et al. 2000. “Gene Ontology: Tool for the Unification of Biology.” Nature Genetics 25 (1). Nature Publishing Group: 25–29.

Caspi, Ron, Richard Billington, Luciana Ferrer, Hartmut Foerster, Carol A Fulcher, Ingrid M Keseler, Anamika Kothari, et al. 2016. “The MetaCyc Database of Metabolic Pathways and Enzymes and the BioCyc Collection of Pathway/Genome Databases.” Nucleic Acids Research 44 (D1). Oxford University Press: D471–D480.

Fabregat, Antonio, Konstantinos Sidiropoulos, Phani Garapati, Marc Gillespie, Kerstin Hausmann, Robin Haw, Bijay Jassal, et al. 2016. “The Reactome Pathway Knowledgebase.” Nucleic Acids Research 44 (D1). Oxford University Press: D481–D487.

Kandasamy, Kumaran, S Sujatha Mohan, Rajesh Raju, Shivakumar Keerthikumar, Ghantasala S Sameer Kumar, Abhilash K Venugopal, Deepthi Telikicherla, et al. 2010. “NetPath: A Public Resource of Curated Signal Transduction Pathways.” Genome Biology 11 (1). BioMed Central: 1–9.

Kandoth, Cyriac, Michael D McLellan, Fabio Vandin, Kai Ye, Beifang Niu, Charles Lu, Mingchao Xie, et al. 2013. “Mutational Landscape and Significance Across 12 Major Cancer Types.” Nature 502 (7471). Nature Publishing Group: 333–39.

Liberzon, Arthur, Chet Birger, Helga Thorvaldsdóttir, Mahmoud Ghandi, Jill P Mesirov, and Pablo Tamayo. 2015. “The Molecular Signatures Database Hallmark Gene Set Collection.” Cell Systems 1 (6). Elsevier: 417–25.

Liberzon, Arthur, Aravind Subramanian, Reid Pinchback, Helga Thorvaldsdóttir, Pablo Tamayo, and Jill P Mesirov. 2011. “Molecular Signatures Database (MSigDB) 3.0.” Bioinformatics 27 (12). Oxford University Press: 1739–40.

Mi, Huaiyu, Anushya Muruganujan, and Paul D Thomas. 2012. “PANTHER in 2013: Modeling the Evolution of Gene Function, and Other Gene Attributes, in the Context of Phylogenetic Trees.” Nucleic Acids Research 41 (D1). Oxford University Press: D377–D386.

Cancer Genome Atlas Research Network. 2011. “Integrated Genomic Analyses of Ovarian Carcinoma.” Nature 474 (7353). Nature Publishing Group: 609.

Raudvere, Uku, Liis Kolberg, Ivan Kuzmin, Tambet Arak, Priit Adler, Hedi Peterson, and Jaak Vilo. 2019. “g:Profiler: A Web Server for Functional Enrichment Analysis and Conversions of Gene Lists (2019 Update).” Nucleic Acids Research 47 (W1). Oxford University Press: W191–W198.

Schaefer, Carl F, Kira Anthony, Shiva Krupa, Jeffrey Buchoff, Matthew Day, Timo Hannay, and Kenneth H Buetow. 2009. “PID: The Pathway Interaction Database.” Nucleic Acids Research 37 (suppl_1). Oxford University Press: D674–D679.

Subramanian, Aravind, Pablo Tamayo, Vamsi K Mootha, Sayan Mukherjee, Benjamin L Ebert, Michael A Gillette, Amanda Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102 (43). National Academy of Sciences: 15545–50.

Verhaak, Roel GW, Pablo Tamayo, Ji-Yeon Yang, Diana Hubbard, Hailei Zhang, Chad J Creighton, Sian Fereday, et al. 2012. “Prognostically Relevant Gene Signatures of High-Grade Serous Ovarian Carcinoma.” The Journal of Clinical Investigation 123 (1). American Society for Clinical Investigation.