Title: Compared application of the new OPLS-DA statistical model versus partial least squares regression to manage large numbers of variables in an injury case-control study
Abstract: The use of modern statistical methods to overcome the known pitfalls of classical regression models when analyzing large numbers of highly correlated variables has increased considerably in recent years. Statisticians in chemometrics and OMICS research have developed a method called orthogonal projections to latent structures (OPLS). Compared with regular partial least squares (PLS) regression, OPLS offers a simpler method, with the additional advantage that the orthogonal variation can be analyzed separately. Use of the OPLS model has spread beyond the fields in which it originated, but it has not yet been applied in epidemiology, which is a broad field of research. In public health and clinical research, there are situations in which large numbers of correlated variables need to be modeled. The authors successfully applied the discriminant analysis form of OPLS (OPLS-DA) to model large numbers of variables in a case-control study and compared it with discriminant analysis based on partial least squares regression (PLS-DA). Before the models were fitted, the dataset was split into two parts: a training set and a prediction set. Models fitted on the training set were then tested for validity on the prediction set. OPLS-DA was compared with PLS-DA in terms of model fit, diagnostics, and interpretability. Both models suited the data, but OPLS-DA was preferable. The authors encourage the use of these methods to increase study power and statistical validity in epidemiology and in similar settings in which large numbers of correlated variables need to be modeled.
Keywords: Partial least squares regression; orthogonal projections to latent structures; logistic regression; multicollinearity; injury epidemiology; burns
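
The workflow summarized in the abstract (split the data into a training set and a prediction set, fit a discriminant model on the former, check it on the latter) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' analysis: the data are simulated, the 70/30 split and the 0.5 classification threshold are arbitrary, the variables are centered but not scaled, and the function names (`simulate`, `fit_pls_da`, `predict_class`) are hypothetical. The sketch covers only a one-component PLS-DA; it does not implement the orthogonal filtering step that distinguishes OPLS-DA.

```python
import random


def simulate(n, n_vars, rng):
    """Simulated case-control data: one latent factor drives all variables,
    so the columns of X are highly correlated (illustrative assumption)."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # 0 = control, 1 = case
        latent = rng.gauss(3.0 * label, 1.0)
        X.append([latent + rng.gauss(0.0, 0.5) for _ in range(n_vars)])
        y.append(label)
    return X, y


def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def fit_pls_da(X, y):
    """One-component PLS regression of a 0/1 class label on X."""
    x_mean = column_means(X)
    y_mean = sum(y) / len(y)
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(len(Xc)))
         for j in range(len(x_mean))]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each centered observation onto w.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    # Regress the centered label on the scores.
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict_class(model, X, threshold=0.5):
    """Predict the label as 1 when the fitted value reaches the threshold."""
    labels = []
    for row in X:
        t = sum((v - m) * wj
                for v, m, wj in zip(row, model["x_mean"], model["w"]))
        fitted = model["y_mean"] + model["q"] * t
        labels.append(1 if fitted >= threshold else 0)
    return labels


rng = random.Random(0)
X, y = simulate(200, 20, rng)

# Random split into a training set (70%) and a prediction set (30%).
order = list(range(len(X)))
rng.shuffle(order)
cut = (7 * len(order)) // 10
train_idx, test_idx = order[:cut], order[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

model = fit_pls_da(X_train, y_train)
predicted = predict_class(model, X_test)
accuracy = sum(p == a for p, a in zip(predicted, y_test)) / len(y_test)
print(f"prediction-set accuracy: {accuracy:.2f}")
```

Because the 20 simulated variables share one latent factor, an ordinary regression on all of them would suffer from multicollinearity, whereas the single PLS component summarizes them in one score per subject. A real analysis would choose the number of components by cross-validation and would typically scale the variables first.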