Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Year: 2015
Volume: 23
Issue: 4
Pages: 192-197
DOI: 10.14445/22312803/IJCTT-V23P139
Publisher: Seventh Sense Research Group
Abstract: Data mining has been a popular research area for more than a decade because of its wide range of applications. Data mining is the field of extracting interesting patterns from large data collections, and such collections need to be mined for the purpose of knowledge discovery. Organizations agree to pool their data for mining because they know the results will be useful to them. However, the popularity and wide availability of data mining tools have also raised concerns about individual privacy, since large data collections contain sensitive information about individuals. Organizations want to apply data mining to their data without leaking any sensitive information about their individuals to other organizations. The aim of privacy-preserving data mining research is therefore to develop data mining techniques that can be applied to such databases and that disclose nothing but the final results to all sites. Privacy-preserving techniques are applied in many areas, such as medicine, bioinformatics, shopping, and credit card analysis, and have proved useful in all of them. They have been proposed for many data models, including classification on centralized data, association rules in distributed environments, and clustering on vertically partitioned data. In this dissertation, we propose methods for privacy preservation in a distributed environment. We construct a privacy-preserving dissimilarity matrix of objects stored at different sites, which can be used for privacy-preserving clustering and other operations. The construction relies on pairwise comparison of private, sensitive data objects that are horizontally distributed across multiple sites. All sites taking part in the mining process are assumed to be semi-honest, that is, honest but curious.
In this dissertation, we handle alphanumeric and categorical attributes as well as numeric ones. The dissimilarity matrix is constructed with the help of a third party, which performs the mining on the overall collected data. We evaluate the communication and computation complexity of our protocol through experiments on synthetically generated and real datasets. Each experiment is also run with a basic protocol that has no privacy protection, so that comparing the two protocols shows the overhead introduced by security and privacy.
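For orientation, the baseline the abstract compares against (a dissimilarity matrix built with no privacy protection) might look roughly like the sketch below. It is not the authors' protocol: the abstract does not state which distance functions are used, so the per-attribute choices here (absolute difference for numeric values, 0/1 mismatch for categorical values, edit distance for alphanumeric strings) and all function names are illustrative assumptions. Each site's horizontal partition is simply pooled in the clear, which is exactly what a privacy-preserving protocol would avoid.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed alphanumeric metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def object_distance(x, y, kinds):
    """Sum of per-attribute distances; the metric per kind is an assumption."""
    total = 0.0
    for xv, yv, kind in zip(x, y, kinds):
        if kind == "numeric":
            total += abs(xv - yv)
        elif kind == "categorical":
            total += 0.0 if xv == yv else 1.0
        elif kind == "alphanumeric":
            total += edit_distance(xv, yv)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
    return total


def dissimilarity_matrix(sites, kinds):
    """Pool horizontally partitioned objects and compare every pair.

    `sites` is a list of partitions, each a list of objects with the same
    attribute schema. No privacy protection is applied here.
    """
    objects = [obj for site in sites for obj in site]
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = object_distance(objects[i], objects[j], kinds)
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical data: two sites holding objects with the same three attributes.
kinds = ("numeric", "categorical", "alphanumeric")
site_a = [(30, "A", "abc")]
site_b = [(35, "B", "abd"), (30, "A", "abc")]
matrix = dissimilarity_matrix([site_a, site_b], kinds)
```

The resulting symmetric matrix is the kind of input a clustering algorithm could consume; the overhead discussed in the abstract is the extra communication and computation needed to obtain the same matrix without any site revealing its raw objects.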