Journal name: BVICAM's International Journal of Information Technology
Print ISSN: 0973-5658
Publication year: 2015
Volume: 7
Issue: 2
Language: English
Publisher: Bharati Vidyapeeth's Institute of Computer Applications and Management
Abstract: Data mining uncovers hidden, previously unknown, and potentially useful information in large amounts of data. Compared with traditional statistical and machine learning techniques for data analysis, data mining emphasizes providing a convenient and complete environment for the analysis. It has become a popular technology for analyzing complex data, and clustering is one of its core techniques. In data mining and data clustering, it is highly desirable to apply unsupervised classification to large data sets that combine ordinal, ratio-scaled, binary, and nominal attributes, that is, both numeric and categorical values. However, most existing unsupervised classification (clustering) algorithms are effective for numeric data rather than for mixed data sets. In this paper we present a new clustering technique for such mixed data sets, obtained by modifying the common cost function, which is the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) optimizes the new cost function to obtain the clustering result. Our comparison and analysis indicate that the GA-based clustering algorithm is feasible for high-dimensional data sets with mixed attributes.
Core idea of our paper: We describe a technique for estimating cost function metrics from databases of mixed numeric, categorical, and other attribute types, using an uncertain grade-of-membership clustering model together with the efficiency of a genetic algorithm. The technique can be applied to opportunity analysis for business decision-making. The general approach could be adapted to many other applications in which a decision agent needs to assess the value of items from a set of opportunities with respect to a reference set representing its business.
Numeric attributes are processed directly instead of being generalized. A prototype may be developed for experiments with synthetic and real data sets, and its results compared with those of the traditional approaches. The results confirmed the feasibility of the framework and the superiority of the extended techniques.
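
The abstract does not give the modified cost function or the GA operators, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the numeric part of the cost is the trace of the within-cluster scatter matrix (the sum of squared distances to each cluster mean) and that the categorical part counts mismatches against each cluster's per-attribute mode, weighted by a hypothetical parameter `gamma`. The function names `mixed_cost` and `ga_cluster`, the label-vector encoding, and the tournament selection, uniform crossover, and mutation settings are all illustrative choices.

```python
import numpy as np


def mixed_cost(X_num, X_cat, labels, k, gamma=1.0):
    """Within-cluster dispersion for mixed data (illustrative assumption).

    Numeric part: sum of squared distances to the cluster mean, i.e. the
    trace of the within-cluster scatter matrix.
    Categorical part (assumed): gamma times the number of values that differ
    from the cluster's most frequent value, per attribute.
    """
    cost = 0.0
    for c in range(k):
        members = labels == c
        if not members.any():
            continue  # an empty cluster contributes nothing
        num = X_num[members]
        cost += ((num - num.mean(axis=0)) ** 2).sum()
        cat = X_cat[members]
        for j in range(cat.shape[1]):
            _, counts = np.unique(cat[:, j], return_counts=True)
            cost += gamma * (len(cat) - counts.max())
    return float(cost)


def ga_cluster(X_num, X_cat, k, pop_size=30, generations=60,
               p_mut=0.05, gamma=1.0, seed=0):
    """Minimise mixed_cost with a simple GA over cluster-label vectors."""
    rng = np.random.default_rng(seed)
    n = len(X_num)
    # Each individual assigns one cluster label to every record.
    pop = rng.integers(0, k, size=(pop_size, n))

    def evaluate(population):
        return np.array([mixed_cost(X_num, X_cat, ind, k, gamma)
                         for ind in population])

    def tournament(costs):
        a, b = rng.integers(0, pop_size, size=2)
        return pop[a] if costs[a] < costs[b] else pop[b]

    for _ in range(generations):
        costs = evaluate(pop)
        children = [pop[costs.argmin()].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(costs), tournament(costs)
            mask = rng.random(n) < 0.5            # uniform crossover
            child = np.where(mask, p1, p2)
            mutate = rng.random(n) < p_mut        # random label mutation
            child[mutate] = rng.integers(0, k, size=int(mutate.sum()))
            children.append(child)
        pop = np.array(children)

    costs = evaluate(pop)
    return pop[costs.argmin()], float(costs.min())


if __name__ == "__main__":
    X_num = np.array([[0.0], [2.0], [10.0], [12.0]])
    X_cat = np.array([["a"], ["a"], ["b"], ["b"]])
    labels, cost = ga_cluster(X_num, X_cat, k=2)
    print(labels, cost)
```

Encoding each individual as a full label vector keeps the sketch short, but it scales poorly with the number of records; encoding cluster prototypes instead is a common alternative. The weight `gamma` controls how strongly categorical mismatches count relative to numeric dispersion and would need tuning per data set.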