Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2018
Volume: 96
Issue: 3
Page: 822
Publisher: Journal of Theoretical and Applied
Abstract: A laboratory safety management system should predict risk situations and monitor safety status. To predict risk situations in the laboratory and inform researchers about them, the system must classify both the location areas where risk factors exist and the status of each researcher according to his or her real-time position. Real-time locations are classified with clustering algorithms such as K-means or density-based spatial clustering of applications with noise (DBSCAN), based on the classification results of location history data from previous risk situations. Because these algorithms are computationally expensive, processing a large number of position records requires a high-end processor. To solve this problem, we use Apache Spark, which has recently become a widely used big data processing framework. Because Apache Spark processes data in memory and is well suited to iterative operations on large-scale data, it can classify large amounts of position data more quickly. In addition, Apache Spark supports an RDD-based matrix storage method for location data, which enables faster location data processing. In this paper, we design and implement a classification algorithm for location data stored in an Apache Spark environment. The algorithm uses the existing K-means algorithm and the DBSCAN algorithm, which is more suitable for position data. Based on the classified result data, we compare and analyze the classification speed for position data.
Keywords: Laboratory Safety; Apache Spark; Big Data; Clustering Algorithm; DBSCAN
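
The density-based clustering of position records named in the abstract can be illustrated with a minimal single-machine sketch. It is written in plain Python and is not the Spark implementation the paper describes. The function names `dbscan` and `region_query`, the sample coordinates, and the `eps` and `min_pts` values are all illustrative assumptions.

```python
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical position records (x, y) in metres: two dense areas and one outlier.
positions = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10),
             (50, 50)]
labels = dbscan(positions, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Each call to `region_query` scans every point, so this sketch costs quadratic time in the number of records. That cost is the kind of computational load the abstract cites as the motivation for moving the computation to a distributed framework.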