The differential clustering approach for comparative gene expression analysis: application to the Candida albicans transcription program
Supplementary Material
Supplementary Figures
- Supplementary Figure 1
Illustration of the use of
t-statistics to evaluate the extent of co-expression of genes assigned to a
given functional category. From left to right: (1) Based on prior functional
annotation (as given by the GO or KEGG database) the corresponding subsets of
orthologous genes in S. cerevisiae and C. albicans are selected. (2) Pairwise correlations between these genes are computed in
both organisms using the respective set of expression data. (3) The
distribution of these correlations is compared to the background
distribution corresponding to random subsets of the same size. The
significance of co-expression among the functionally associated genes is
determined using the t-statistic for the two distributions.
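
The three steps in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not specify the exact form of the t-statistic, so Welch's two-sample t is assumed here, and the names `pearson`, `pairwise_correlations`, `welch_t`, `coexpression_t`, the parameter `n_random` and the toy data are all hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(expr, genes):
    """All pairwise correlations among `genes`; `expr` maps gene -> profile."""
    return [pearson(expr[g], expr[h]) for g, h in combinations(genes, 2)]


def welch_t(a, b):
    """Welch's two-sample t-statistic (an assumed choice, see the lead-in)."""
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )


def coexpression_t(expr, category, n_random=200, seed=0):
    """Compare correlations within `category` to random subsets of equal size."""
    rng = random.Random(seed)
    observed = pairwise_correlations(expr, category)
    all_genes = sorted(expr)
    background = []
    for _ in range(n_random):
        subset = rng.sample(all_genes, len(category))
        background.extend(pairwise_correlations(expr, subset))
    return welch_t(observed, background)


# Toy data (hypothetical): 40 noise genes plus 5 genes sharing a common signal.
rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(30)]
expr = {f"bg{i}": [rng.gauss(0, 1) for _ in range(30)] for i in range(40)}
category = [f"co{i}" for i in range(5)]
for gene in category:
    expr[gene] = [s + 0.3 * rng.gauss(0, 1) for s in signal]

t_value = coexpression_t(expr, category)
print(f"t-value for the toy category: {t_value:.1f}")
```

Running the same function once per organism, each with its own expression data, would give the pair of t-values that the caption describes for a functional category.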
- Supplementary Figure 2
Extent of co-expression of genes assigned to KEGG pathways in
the two organisms. Analysis as described for GO terms (cf. Fig. 1a), but
using KEGG pathways instead.
- Supplementary Figure 3
Robustness of analysis with respect to sub-sampling of
conditions: The analysis leading to Fig. 3a (left panel) was repeated using
only a fraction of the expression data (as indicated above each plot). Note
that although the average correlations vary slightly (the error bars denote
the standard deviations resulting from different sub-samples), they give rise
to the same distinct classifications, even when using only 10% of the
available expression data.
- Supplementary Figure 4
Robustness of co-expression between CDC22 and CLB2 with respect to
sub-sampling of conditions: The correlation between the expression of the
genes CD28 and CLB2 in S. cerevisiae (blue)
and their respective orthologues in C. albicans (green) and S. pombe (red) is shown as a function of the sample
fraction used to compute this correlation. 20 random sub-samples were picked
for each different size and the standard deviations of the correlations over
these samples are indicated by the error bars.
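
The sub-sampling procedure described in this caption can be sketched as follows. This is an illustrative sketch only: the function name `subsampled_correlation`, its parameters and the synthetic profiles are assumptions, and only the use of 20 random sub-samples per size is taken from the caption.

```python
import math
import random
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subsampled_correlation(x, y, fraction, n_samples=20, seed=0):
    """Mean and standard deviation of the correlation over random condition subsets."""
    rng = random.Random(seed)
    n_conditions = len(x)
    # Use at least 3 conditions so that the correlation is meaningful.
    size = min(n_conditions, max(3, round(fraction * n_conditions)))
    values = []
    for _ in range(n_samples):
        idx = rng.sample(range(n_conditions), size)
        values.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    return mean(values), stdev(values)


# Synthetic profiles (hypothetical): two correlated genes over 200 conditions.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [a + 0.5 * rng.gauss(0, 1) for a in x]

for fraction in (0.1, 0.25, 0.5, 1.0):
    avg, sd = subsampled_correlation(x, y, fraction)
    print(f"fraction {fraction:.2f}: r = {avg:.2f} +/- {sd:.2f}")
```

The standard deviation returned for each fraction corresponds to the error bars in the figure; it shrinks as the sample fraction approaches the full dataset.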
Co-expression of GO terms (Figure 1)
- List of all GO terms and their t-values
Each GO term used in the
analysis is listed together with the normalized t-values for C. albicans and S. cerevisiae. The t-values are plotted as in Fig. 1b.
The circle indicates the 4-sigma significance cutoff. Each term is linked to a
page summarizing the expression patterns of the genes in the category.
Analysis of gene correlations (Figure 3)
- List of all DCA gene clusters
For each cluster, the corresponding PCM is given together with the pattern
classification values and the genes assigned to each of the binary partitions.
(Use the search function to look up individual genes!)
- Interactive Figure 3
Each cluster shown in Fig. 3c is linked to an annotated
high-resolution image. (Just click!)
Analysis of cell-cycle genes (Figure 4)
- Interactive versions of Fig. 4 for clusters 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 (more details)
Each set of expression profiles
(reduced to cell-cycle specific data in the case of S. cerevisiae and S. pombe) from the reference organism (indicated
on the left) was segregated into 10 primary clusters. The genes of these
clusters were re-ordered according to the target organism (indicated on the
top). Controls are shown on the diagonal, where the target organism coincides
with the reference organism but is represented by a diluted dataset (taking
every fourth profile only). Click on the clusters on the right side of the
figures to obtain details on the genes.
Transcription Modules (Figure 5)
- Interactive module tree
Every module in the tree shown in Fig. 5 is linked to a
page summarizing the properties of the module (genes, conditions, enriched
functions, enriched localization, enriched sequence elements).
- List of transcription modules and their enrichment with conserved and specific genes
For each module, the proportions of orthologous and C. albicans-specific
genes are given together with the p-values of the corresponding enrichment.
A p-value of 0.05 is used as a cut-off to classify modules as core, specific
or balanced mixture.
Each module is linked to a page summarizing the properties of the
module.
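
A classification of this kind can be illustrated with a short sketch. The text does not name the statistical test, so a hypergeometric upper-tail test is assumed here; the function names `hypergeom_upper_tail` and `classify_module` and the genome-wide gene counts are hypothetical. Only the 0.05 cut-off and the three class labels come from the description above.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the class of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def classify_module(n_orth, n_spec, genome_orth, genome_spec, cutoff=0.05):
    """Label a module as core, specific or balanced mixture (assumed test)."""
    n = n_orth + n_spec
    N = genome_orth + genome_spec
    p_core = hypergeom_upper_tail(n_orth, n, genome_orth, N)
    p_specific = hypergeom_upper_tail(n_spec, n, genome_spec, N)
    if p_core < cutoff:
        label = "core"
    elif p_specific < cutoff:
        label = "specific"
    else:
        label = "balanced mixture"
    return label, p_core, p_specific


# Hypothetical genome composition: 4000 orthologous and 2000 specific genes.
for n_orth, n_spec in [(45, 5), (10, 40), (33, 17)]:
    label, p_core, p_specific = classify_module(n_orth, n_spec, 4000, 2000)
    print(f"{n_orth} orthologous / {n_spec} specific: {label}")
```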
- List of representative modules (C. albicans)
Most transcription
modules remain stable over a range of resolutions. To avoid biasing the
statistical analysis in Fig. 5b-d toward the most persistent modules, which
appear at many levels, we compiled a representative set of transcription
modules from all levels. To this end,
similar modules were clustered together and a central representative was
chosen for each cluster. The clusters can be viewed by clicking on
Intersects. The statistics shown in Fig. 5b-d are based on this
representative set.
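
One way to picture the selection of representatives is the sketch below. The text does not specify the similarity measure, the clustering method or how the central module is chosen, so everything here is an assumption made for illustration: Jaccard similarity on gene sets, a greedy single-pass grouping with a hypothetical `threshold`, and the medoid (the member most similar to the rest of its cluster) as representative. The module names and gene labels are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty gene sets."""
    return len(a & b) / len(a | b)


def cluster_modules(modules, threshold=0.5):
    """Greedy grouping: a module joins the first cluster whose seed it resembles."""
    clusters = []
    for name in sorted(modules):
        for cluster in clusters:
            if jaccard(modules[name], modules[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([name])
    return clusters


def representative(cluster, modules):
    """Member with the highest total similarity to its cluster (the medoid)."""
    return max(
        cluster,
        key=lambda m: sum(jaccard(modules[m], modules[o]) for o in cluster),
    )


# Invented example: the same two modules seen at several resolution levels.
modules = {
    "ribo_t1": {"a", "b", "c", "d"},
    "ribo_t2": {"a", "b", "c", "d", "e"},
    "ribo_t3": {"a", "b", "c", "d", "e", "f"},
    "stress_t1": {"x", "y", "z"},
    "stress_t2": {"x", "y"},
}

clusters = cluster_modules(modules)
representatives = [representative(c, modules) for c in clusters]
print(clusters)
print(representatives)
```

Counting each cluster once through its representative keeps a module that persists over many levels from being counted many times in downstream statistics.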
Analysis of GO connectivity (Figure 6)
- Interactive Fig. 6b
Click on the image to see a list of all clusters identified in
the analysis of GO connectivity in C. albicans.
- Interactive Fig. 6d
Every cluster shown in Fig. 6d is linked to an annotated
high-resolution image. (Just click!)
Analysis of sequence connectivity (Table 1)