Materials for CLASS02


Workshop 2 - CLUSTERING

Golub et al. published one of the first applications of microarrays in cancer research, demonstrating the utility of this high-throughput technology for molecular class discovery and classification in leukemia. We will use the data set published in that paper as the example for today's workshop.

Please download the processed data. [RAW DATA]

Read the PAPER


  • Download CLUSTER 3.0 from (
  • Link to the CLUSTER 3.0 Manual

  • Download Java TreeView from (http://jtreeview

    Practical 1:

    1. Load the data into CLUSTER 3.0.
    2. In the "Adjust Data" tab, select "Center genes" by "Median" and "Normalize genes". [This is similar to the preprocessing step in the paper.]
    3. Click "Apply" to perform the centering and normalization.
    4. In the "Hierarchical" tab, explore each of the clustering options (different similarity metrics and clustering methods) for Genes and Arrays.
    5. Save each combination of similarity metric and clustering method as a separate output by changing the "Job Name".
    6. Invoke Java TreeView.
    7. Load the clustering output (*.cdt) files into Java TreeView.
    8. Change the color/contrast of the heatmaps using the "Pixel Setting".
    9. Change the font sizes of genes and arrays using the "Font Setting".
    10. Flip the array and gene nodes using "Analysis" options.
    11. Export the clustering into an image file (*.png).
    12. Repeat steps 7-11 for different clustering outputs to see the effects of similarity metric and clustering method.
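To make step 2 of Practical 1 concrete, here is a minimal sketch of what median centering followed by normalization does to one gene's row. It assumes that "Normalize genes" scales each row so that the sum of its squared values is 1.0; check the CLUSTER 3.0 manual for the exact definition. The function name and the sample values are made up for illustration.

```python
import math
from statistics import median


def center_and_normalize(row):
    """Median-center one gene's values, then scale to unit sum of squares."""
    m = median(row)
    centered = [x - m for x in row]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0:
        # A flat row has nothing to scale; return the zeros unchanged.
        return centered
    return [x / norm for x in centered]


# Made-up expression values for two genes across four arrays.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 6.0],
    "gene_b": [5.0, 5.0, 5.0, 5.0],
}
adjusted = {gene: center_and_normalize(values) for gene, values in expression.items()}
for gene, values in adjusted.items():
    print(gene, [round(v, 3) for v in values])
```

After this adjustment every non-flat gene has a median of zero and the same overall magnitude, so the similarity metrics compare the shape of the expression profiles rather than their absolute levels.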

    Practical 2:

    1. Extract the 50 genes distinguishing ALL from AML (Fig. 3B of Golub et al. 1999; use the accession number, not the gene name) from the raw data. [Hint: modify the script in Assignment 1 to achieve this task.]
    2. Perform clustering on the 50 genes using CLUSTER 3.0.
    3. Perform the preprocessing step as described in the paper (see Step 2 in Practical 1).
    4. Visualize the clustering in Java TreeView.
    5. Adjust the color and contrast of the heatmap.
    6. Export the clustering results.
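The extraction in step 1 of Practical 2 can be sketched as a row filter on a tab-delimited table. This is not the Assignment 1 script, only an illustration under assumptions: the column name "Gene Accession Number", the accession values, and the function name are hypothetical placeholders, and the real raw data file may be laid out differently.

```python
import csv
import io


def select_rows(table_text, wanted_ids, id_column):
    """Return the header plus the rows whose ID column is in wanted_ids."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    idx = header.index(id_column)
    kept = [row for row in reader if row and row[idx] in wanted_ids]
    return header, kept


# Tiny made-up table; the real file's column names and IDs will differ.
demo = (
    "Gene Description\tGene Accession Number\tS1\tS2\n"
    "geneX\tACC0001\t10\t20\n"
    "geneY\tACC0002\t30\t40\n"
    "geneZ\tACC0003\t50\t60\n"
)
signature = {"ACC0001", "ACC0003"}
header, rows = select_rows(demo, signature, "Gene Accession Number")
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```

To run it on a file, read the file's text first and pass it as table_text; the printed output is itself tab-delimited and can be saved for loading into CLUSTER 3.0.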

    Related Scripts

  • Hint: To make sure that the gene signature file uses Unix line endings, run dos2unix file in your terminal (file = your gene signature file).
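If dos2unix is not installed, the same line-ending conversion can be approximated in a few lines of Python. This sketch only replaces Windows line endings (CRLF) with Unix ones (LF); the function name and sample bytes are made up for illustration.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Replace Windows line endings (CRLF) with Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


# Made-up file contents with Windows line endings.
sample = b"ACC0001\r\nACC0002\r\n"
print(to_unix_newlines(sample))
```

To convert a real file, read it in binary mode ("rb"), pass the bytes through the function, and write the result back in binary mode ("wb") so that Python does not translate the line endings again.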

    Assignment 2


    Singh et al. derived a KRAS-dependency gene signature for colorectal cancer using four KRAS-dependent lines (SK-CO-1, SW620, SW1116 and RCM-1) and four KRAS-independent lines (LS-174T, SW837, SW1463 and SW948) [TRAINING SET]. The KRAS-dependency gene signature contains 687 independent and 832 dependent genes.


    Your collaborator is very excited about this publication and would like to find out the KRAS-dependency of four other colorectal cancer cell lines [TEST SET: cell lines A, B, C, and D] in his lab. Could you help predict the KRAS-dependency of these lines?

    Specifically, please complete the following tasks:

    1. Perform clustering on the training set using the published gene signature.
    2. Export the dendrogram and heatmap of the training set.
    3. Perform clustering on the training and test sets using the published gene signature.
    4. Export the dendrogram and heatmap of the combined training and test sets.
    5. Predict the KRAS-dependency of the test set.
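Before task 3, the training and test sets have to be merged into one table restricted to the signature genes. The sketch below shows one possible way to do that join in memory. The gene IDs, values, and function name are made up, and it assumes that both tables are keyed by the same gene identifier as the signature file.

```python
def combine_for_signature(training, test, signature):
    """Join two {gene: [values]} tables, keeping signature genes found in both."""
    combined = {}
    for gene in sorted(signature):
        if gene in training and gene in test:
            # Training columns first, then test columns, in a fixed order.
            combined[gene] = training[gene] + test[gene]
    return combined


# Made-up gene IDs and values; two training lines and two test lines per gene.
training = {"G1": [0.5, -0.2], "G2": [1.1, 0.9], "G3": [0.0, 0.3]}
test = {"G1": [0.4, -0.1], "G3": [0.2, 0.1], "G4": [2.0, 2.1]}
signature = {"G1", "G2", "G3"}

combined = combine_for_signature(training, test, signature)
for gene, values in combined.items():
    print(gene, values)
```

Genes missing from either table are dropped here rather than filled in, so every row in the combined table has a value for every cell line; whether that is the right choice depends on how many signature genes are actually missing from the test set.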