Journal title: IJAIN (International Journal of Advances in Intelligent Informatics)
Print ISSN: 2442-6571
Online ISSN: 2548-3161
Publication year: 2016
Volume: 2
Issue: 2
Pages: 103-122
DOI: 10.26555/ijain.v2i2.74
Language: English
Publisher: Universitas Ahmad Dahlan
Abstract: The emergence and growth of internet usage have produced an extensive amount of data. These data contain a wealth of undiscovered, valuable information, but incomplete data sets may lead to observation errors. This research explored a data analysis technique that transforms meaningless data into meaningful information. The work focused on Rough Set (RS) to deal with incomplete data and to derive rules. Rules with high and low left-hand-side (LHS) support values generated by RS were used as query statements to form a cluster of data. The model was tested on an AIDS blog data set consisting of 146 bloggers and an E-Learning@UTM (EL) log data set comprising 23105 URLs. 5-fold and 10-fold cross validation were used to split the data. To compare the results, the Naïve algorithm and the Boolean algorithm were employed as discretization techniques, and Johnson’s algorithm (Johnson) and the Genetic algorithm (GA) as reduction techniques. 5-fold cross validation tended to suit the AIDS data well, while 10-fold cross validation was best for the EL data set. Johnson and GA yielded the same number of rules for both data sets. These findings provide evidence of the accuracy achieved using the proposed model.
Keywords: Rough Set; AIDS blog data; E-Learning log data; Rules derivation; Cross validation
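
Note: The abstract mentions 5-fold and 10-fold cross validation but does not say how the folds were formed. The following is a minimal sketch, not the authors' procedure: it assumes contiguous, unshuffled folds of near-equal size and uses the 146-record size of the AIDS blog data set from the abstract. The function names `k_fold_indices` and `train_test_splits` are hypothetical.

```python
def k_fold_indices(n_records, k):
    """Split indices 0..n_records-1 into k contiguous folds of near-equal size.

    Assumption: no shuffling or stratification; the record does not
    say how the original study partitioned its data.
    """
    base, extra = divmod(n_records, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional record each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_records, k):
    """Yield (train, test) index lists, holding out one fold per round."""
    folds = k_fold_indices(n_records, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for k in (5, 10):
        test_sizes = [len(test) for _, test in train_test_splits(146, k)]
        print(f"{k}-fold test sizes: {test_sizes}")
```

With 146 records, each of the 5 folds holds out 29 or 30 records for testing, while each of the 10 folds holds out 14 or 15, so 10-fold trains each model on a larger share of the data.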