Title: Variable selection for Gaussian mixture model-based clustering to predict the function of orphan genes. Cathy Maugis, Marie-Laure Martin-Magniette, Jean-Philippe Tamby, Jean-Pierre Renou, Alain Lecharny, Sébastien Aubourg, Gilles Celeux.
Abstract: Biologists are interested in predicting the functions of genes in organisms
with sequenced genomes from microarray transcriptome data. The development of
microarray technology makes it possible to study the whole genome under different
experimental conditions. This abundance of information may seem to be an advantage for
gene clustering. However, the structure of interest is often contained in a subset
of the available variables. Currently available variable selection procedures
for model-based clustering assume that the variables irrelevant to the clustering are
either all independent of the relevant clustering variables or all linked with them. A more versatile
variable selection model is proposed, taking into account three possible roles for each
variable: relevant clustering variables, redundant variables, and independent
variables. A model selection criterion and a variable selection algorithm are
derived for this new variable role model. The value of this model for
discovering the function of orphan genes is illustrated on a transcriptome dataset
for the plant Arabidopsis thaliana.