Materials for CLASS04
Workshop 4 - GENE SET ENRICHMENT ANALYSIS (GSEA)
GSEA User Guide Link
GSEA Homepage Link
Download the Leukemia example here
- Open a GSEA GUI session by clicking the GSEA icon.
- Click Load data on the Left Panel (Steps in GSEA Analysis).
- Load data using Method 1: Browse for files ...
- Locate Leukemia_collapsed_symbols.gct, Leukemia.cls and c2.v1.symbols.gmt on your computer.
- Click "Choose".
- Click Run GSEA on the Left Panel (Steps in GSEA Analysis).
- Under Required fields: Select the Leukemia_collapsed_symbols.gct in Expression dataset.
- Select Gene sets database.
- In the pop-up window, select c2.v1.symbols.gmt under Gene matrix(local gmx/gmt). Click OK.
- Select 500 in the Number of permutations.
- Select Leukemia.cls in the Phenotype labels.
- Select ALL_versus_AML then click OK.
- Select false in the Collapse dataset to gene symbols. This is because the dataset is already collapsed.
- Select phenotype in the Permutation type.
- Leave the Chip platform(s) blank as the dataset is already collapsed.
- Click to expand Basic fields.
- Type in the Analysis name.
- Leave all other parameters as default.
- Select 500 in the Max size:exclude larger sets.
- Select 10 in the Min size:exclude smaller sets.
- Select the directory to save results in Save results in this folder.
- Click to expand Advanced fields.
- Leave all other parameters as default.
- Select 100 in Plot graphs for the top sets of each phenotype.
- At the bottom, select Normal (cpu usage).
- Click Run.
- Check the status in the bottom-left panel under GSEA reports.
- Click the Status column when GSEA is done.
- Browse the GSEA results as HTML.
- Browse the GSEA results in folder.
- Click Leading edge analysis on the Left Panel (Steps in GSEA analysis).
- Choose the GSEA results to load in the Select a GSEA result from the application cache section.
- Click Load GSEA Results.
- Sort the gene sets by NOM p-value. This can be done by clicking the NOM p-value column header.
- Select gene sets with NOM p-value < 0.05.
- At the bottom right, click Run leading edge analysis.
- Browse the Leading edge analysis results.
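The selection of gene sets with NOM p-value < 0.05 can also be done outside the GUI. The following is a minimal Python sketch, assuming the GSEA report has been saved as a tab-delimited table with columns named NAME and NOM p-val; check the header of your own report file, as these column names are an assumption here. The rows in example_report are made up for illustration and are not real GSEA output.

```python
import csv
import io


def significant_gene_sets(report_text, alpha=0.05, column="NOM p-val"):
    """Return (gene set name, p-value) pairs with p-value < alpha, smallest first."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    hits = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip rows that have no p-value recorded
        p = float(value)
        if p < alpha:
            hits.append((row["NAME"], p))
    return sorted(hits, key=lambda item: item[1])


# Made-up rows for illustration only; column names are assumed.
example_report = (
    "NAME\tSIZE\tNES\tNOM p-val\n"
    "SET_A\t25\t1.90\t0.004\n"
    "SET_B\t40\t1.10\t0.210\n"
    "SET_C\t15\t1.75\t0.031\n"
)

for name, p in significant_gene_sets(example_report):
    print(f"{name}\t{p}")
```

To use it on a real report, read the file contents into a string and pass that string in place of example_report.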
Now, use GSEA to identify differentially expressed gene sets in the KRAS-dependency microarray gene expression data. The gene expression data is available here. This is the same experiment as in Assignments #2 and #3, in which eight colorectal cancer cell lines were profiled: four KRAS-dependent lines (SK-CO-1, SW620, SW1116 and RCM-1) and four KRAS-independent lines (LS-174T, SW837, SW1463 and SW948). Use the C2 gene sets in the analysis.
Download the Assignment #4 Data here.
Use HT-HGU133A.chip to collapse the dataset to gene symbols. Download the chip file here.
Use the C2 gene set file from MSigDB.
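A .gmt gene set file is plain tab-delimited text: each line holds a gene set name, a description, and then the member genes. The sketch below shows one way to read such a file in Python and to apply a size filter like the Min size and Max size fields in the walkthrough above, counting only the genes that are also present in the expression dataset. The function names and the toy input are hypothetical, and this is an illustration rather than GSEA's own implementation.

```python
def parse_gmt(gmt_text):
    """Parse GMT text into {gene set name: set of gene symbols}."""
    gene_sets = {}
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        name = fields[0]
        gene_sets[name] = {gene for gene in fields[2:] if gene}
    return gene_sets


def filter_by_size(gene_sets, dataset_genes, min_size=10, max_size=500):
    """Keep gene sets whose overlap with the dataset is within the size limits."""
    dataset_genes = set(dataset_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & dataset_genes
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# Toy input for illustration only; not taken from MSigDB.
example_gmt = "SET_X\tna\tKRAS\tTP53\tMYC\nSET_Y\tna\tEGFR\n"
example_dataset_genes = ["KRAS", "TP53", "BRAF"]

all_sets = parse_gmt(example_gmt)
kept_sets = filter_by_size(all_sets, example_dataset_genes, min_size=2)
print(sorted(kept_sets))
```

The toy call lowers min_size to 2 only so that the tiny example keeps one set; the defaults mirror the 10 and 500 used in the walkthrough.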
Your tasks are:
- Create the .gct and .cls files for TRAIN.probesets.txt.
[You can create them in Excel] or
[You can use cut and paste on the command line]
- Run GSEA to identify enriched gene sets in KRAS-DEP and KRAS-IND using C2 gene sets.
- List the gene sets in each class with nominal p-value < 0.05.
- Use Leading Edge Analysis, and plot the heatmap of the leading edge genes.
- Find drugs for KRAS-DEP and KRAS-IND using DSigDB D2 gene sets.
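For the first task, the .gct and .cls files can also be generated with a short script instead of Excel. The sketch below is in Python and assumes the probeset table is tab-delimited, with sample names in the first row and a probe ID in the first column of every following row; the real layout of TRAIN.probesets.txt may differ, so adjust read_table accordingly. The probe IDs, the values and the column order in the example are made up, and all function names are hypothetical.

```python
def read_table(text):
    """Split a tab-delimited table into sample names and (probe_id, values) rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    samples = lines[0].split("\t")[1:]  # first cell is the ID column label
    rows = []
    for line in lines[1:]:
        cells = line.split("\t")
        rows.append((cells[0], cells[1:]))
    return samples, rows


def build_gct(samples, rows):
    """GCT: '#1.2', then '<rows>\\t<samples>', then a header line and the data."""
    lines = ["#1.2", f"{len(rows)}\t{len(samples)}"]
    lines.append("\t".join(["NAME", "Description"] + list(samples)))
    for probe_id, values in rows:
        lines.append("\t".join([probe_id, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def build_cls(labels):
    """CLS: '<samples> <classes> 1', then '# <class names>', then one label per sample."""
    classes = []
    for label in labels:
        if label not in classes:
            classes.append(label)  # keep classes in order of first appearance
    return (
        f"{len(labels)} {len(classes)} 1\n"
        f"# {' '.join(classes)}\n"
        f"{' '.join(labels)}\n"
    )


# Made-up table for illustration; the sample order is an assumption.
example_table = (
    "ID\tSK-CO-1\tSW620\tSW1116\tRCM-1\tLS-174T\tSW837\tSW1463\tSW948\n"
    "probe_1\t7.1\t6.9\t7.3\t7.0\t5.2\t5.0\t5.4\t5.1\n"
    "probe_2\t4.0\t4.2\t3.9\t4.1\t6.8\t6.5\t6.9\t6.7\n"
)
labels = ["KRAS-DEP"] * 4 + ["KRAS-IND"] * 4  # must follow the column order

samples, rows = read_table(example_table)
gct_text = build_gct(samples, rows)
cls_text = build_cls(labels)
print(gct_text)
print(cls_text)
```

Write gct_text and cls_text to files ending in .gct and .cls before loading them in GSEA. The labels in the .cls file must be listed in the same order as the sample columns in the .gct file and must not contain spaces.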