Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Publication year: 2022
Volume: 13
Issue: 4
DOI: 10.14569/IJACSA.2022.0130406
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Data analysis is essential to the development of any business today. It helps identify organizational bottlenecks, optimize business processes, and anticipate customers’ demands and behavior, and it provides summarized data that can help reduce costs and increase profits. Having this information when designing new products or services greatly increases their chances of success and thus provides an additional competitive advantage over other businesses. However, a single data analyst with a computer is far from enough in the era of big data. Powerful data analytical software tools exist, but they are either expensive or hard to deploy, and they require multiple high-performance servers to run. Buying expensive hardware and software and hiring highly qualified IT experts is not affordable for all companies, especially smaller ones and start-ups. Therefore, this article proposes an architecture for integrating a company’s heterogeneous data (stored in a database of any type or in the file system) with a remote Hadoop cluster that provides powerful data analytical services on demand. This is an affordable, cost-effective, cloud-based solution suitable for a company of any size. Businesses are not required to buy any hardware or software; instead, they use the data analytical services on demand, paying a small processing fee per request or by subscription.
Keywords: Hadoop integration; data analytical tools; heterogeneous data integration; Hadoop Distributed File System (HDFS); HBase; Hive