Abstract: A key step in pharmacogenomic studies is the development of accurate prediction models for drug response based on individuals’ genomic information. Recent interest has centered on semiparametric models based on kernel machine regression, which can flexibly model the complex relationships between gene expression and drug response. However, performance suffers if irrelevant covariates are unknowingly included when training the model. We propose a new semiparametric regression procedure, based on a novel penalized garrotized kernel machine (PGKM), which can better adapt to the presence of irrelevant covariates while still allowing for a complex nonlinear model and gene-gene interactions. We study the performance of our approach in simulations and in a pharmacogenomic study of the renal carcinoma drug temsirolimus. Our method predicts plasma concentration of temsirolimus as accurately as standard kernel machine regression when no irrelevant covariates are included in training, and has much higher prediction accuracy when the truly important covariates are not known in advance. Supplemental materials, including the R code used in this manuscript, are available online at http://intlpress.com/site/pub/files/_supp/SII-2018-11-4-s2.zip.
Keywords: kernel machine; semiparametric regression; model selection