Journal: International Journal of Electrical and Computer Engineering
Electronic ISSN: 2088-8708
Year: 2022
Volume: 12
Issue: 4
Pages: 4457-4468
DOI: 10.11591/ijece.v12i4.pp4457-4468
Language: English
Publisher: Institute of Advanced Engineering and Science (IAES)
Abstract: This study used a dataset of 1924 observations to evaluate the effect of 435 independent variables on one dependent variable. Big data commonly suffers from irrelevant variables and outliers. The study therefore analysed and compared three machine-learning-based variable selection techniques: random forest (RF), support vector machines (SVM), and boosting. M robust regression was then applied to address the outliers, using the M-bisquare, M-Hampel, and M-Huber estimators. The combination of random forest and M-Hampel gave the best results among the methods compared: mean absolute error (MAE) 175.33995, mean square error (MSE) 31.8608, mean absolute percentage error (MAPE) 9.16091, sum of squared errors (SSE) 89270.45, R-squared 0.829511, and adjusted R-squared 0.82670. This combination also yielded lower values than the other techniques on all eight model selection criteria: Akaike information criterion (AIC) 47.25915, generalized cross validation (GCV) 47.27169, Hannan-Quinn (HQ) 47.60351, RICE 47.2845, SCHWARZ 51.7099, sigma square (SGMASQ) 46.50605, SHIBATA 47.23489, and final prediction error (FPE) 47.25929. The study therefore recommends the random forest and M-Hampel model as the one showing the fewest issues and the most efficient validation for analysing and comparing big data.
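The three M-estimators named in the abstract differ mainly in how they down-weight large residuals. The following is a minimal sketch of that idea, not the study's implementation: it fits a single-predictor line by iteratively reweighted least squares on synthetic data. The tuning constants (1.345 for Huber, 2/4/8 for Hampel, 4.685 for bisquare), the zero-centred MAD scale estimate, the helper names, and the data are all assumptions chosen for illustration; the study itself works with many predictors chosen by variable selection.

```python
import statistics

def huber_weight(u, c=1.345):
    """Huber: full weight up to c, then weight decays as c/|u|."""
    r = abs(u)
    return 1.0 if r <= c else c / r

def hampel_weight(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending weight; zero beyond c."""
    r = abs(u)
    if r <= a:
        return 1.0
    if r <= b:
        return a / r
    if r <= c:
        return a * (c - r) / ((c - b) * r)
    return 0.0

def bisquare_weight(u, c=4.685):
    """Tukey bisquare: smooth decay to zero at |u| = c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0

def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def m_estimate_line(x, y, weight_fn, max_iter=50, tol=1e-10):
    """Iteratively reweighted least squares, starting from the OLS fit."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual (centred at zero) / 0.6745.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        w = [weight_fn(r / scale) for r in resid]
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1

# Synthetic data (assumption): true line y = 2 + 3x, small noise, one outlier.
x = [float(i) for i in range(20)]
noise = [-0.2, 0.0, 0.2, -0.1, 0.1]
y = [2.0 + 3.0 * xi + noise[i % 5] for i, xi in enumerate(x)]
y[10] += 50.0  # a single gross outlier in the response

ols_fit = weighted_line_fit(x, y, [1.0] * len(x))
robust_fits = {
    "huber": m_estimate_line(x, y, huber_weight),
    "hampel": m_estimate_line(x, y, hampel_weight),
    "bisquare": m_estimate_line(x, y, bisquare_weight),
}
```

In this toy setting the ordinary least squares intercept is pulled well away from 2 by the single outlier, while all three robust fits stay close to the true line. Hampel and bisquare assign the outlier zero weight, whereas Huber only shrinks its influence, which is one plausible reason a redescending estimator could be preferred when outliers are gross.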