Abstract: Time-course microarray experiments track gene expression levels across several time points and provide valuable insight into the genome-wide dynamics of gene regulation. This paper focuses on gene clustering. We explore a nonparametric Bayesian method that constructs clusters in functional space from the characteristics of the gene profiles. Specifically, we model each gene profile with a B-spline basis, so that each gene is characterized by the coefficients of its spline fit. We then place a Dirichlet process prior on these basis coefficients to determine the gene clusters. The result is a hierarchical Dirichlet process mixture model that assigns genes to the same cluster when they share the same latent basis coefficients. A simulation study compares the proposed method with K-means clustering, a model-based clustering method (MCLUST), and two-stage versions of these methods, using the adjusted Rand index as the criterion. The proposed method attains the highest adjusted Rand index of all the methods compared. Finally, we apply the method to a real data set with 6 time points to examine how genes with similar profiles cluster together, and we identify the functional annotation of the resulting clusters in Gene Ontology (GO) groups using GOstats.
Keywords: Dirichlet process; time-course microarray; functional data analysis
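To make two ingredients of the abstract concrete, here is a minimal pure-Python sketch: representing a gene profile by its B-spline basis coefficients, and scoring a clustering against known labels with the adjusted Rand index. It is an illustration under stated assumptions, not the paper's implementation. The helper names (`bspline_basis`, `spline_coefficients`, `adjusted_rand_index`) are hypothetical, and the knot choice (a clamped cubic basis with no interior knots on 6 time points) is an assumption. The Dirichlet process mixture step that clusters the coefficient vectors is not shown.

```python
from collections import Counter
from math import comb


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at x (Cox-de Boor recursion)."""
    last = knots[-1]
    # Degree 0: indicator of each knot span; the last non-empty span also includes the right end.
    b = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        inside = lo <= x < hi or (x == last and lo < hi and hi == last)
        b.append(1.0 if inside else 0.0)
    # Raise the degree one step at a time; terms with a zero-width denominator are dropped.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(b) - 1):
            left = right = 0.0
            den = knots[i + d] - knots[i]
            if den > 0:
                left = (x - knots[i]) / den * b[i]
            den = knots[i + d + 1] - knots[i + 1]
            if den > 0:
                right = (knots[i + d + 1] - x) / den * b[i + 1]
            nxt.append(left + right)
        b = nxt
    return b


def _solve(a, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def spline_coefficients(times, values, knots, degree=3):
    """Least-squares B-spline coefficients for one gene profile (normal equations)."""
    X = [bspline_basis(t, knots, degree) for t in times]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, values)) for i in range(p)]
    return _solve(xtx, xty)


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same items."""
    total = comb(len(labels_a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    if total == 0:
        return 1.0  # fewer than two items: treat the partitions as trivially identical
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case (e.g. both partitions are all singletons)
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5]
    knots = [0, 0, 0, 0, 5, 5, 5, 5]  # assumed clamped cubic basis, 4 coefficients per gene
    profile = [0.2, 1.1, 1.8, 1.5, 0.9, 0.4]  # made-up expression values for illustration
    print(spline_coefficients(times, profile, knots))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Solving the normal equations directly is adequate for a handful of basis functions; with more knots, a QR-based least-squares solver such as `numpy.linalg.lstsq` is the more robust choice. The adjusted Rand index is invariant to relabeling, so a clustering that matches the truth up to a permutation of cluster labels scores 1.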