摘要:Distributed medical, financial, or social databases are analyzed daily for the discovery of patterns and useful information. Privacy concerns have emerged as some database segments contain sensitive data. Data mining techniques are used to parse, process, and manage enormous amounts of data while ensuring the preservation of private information. Cryptography, as shown by previous research, is the most accurate approach to acquiring knowledge while maintaining privacy. In this paper, we present an extension of a privacy-preserving data mining algorithm, thoroughly designed and developed for both horizontally and vertically partitioned databases, which contain either nominal or numeric attribute values. The proposed algorithm exploits the multi-candidate election schema to construct a privacy-preserving tree-augmented naive Bayesian classifier, a more robust variation of the classical naive Bayes classifier. The exploitation of the Paillier cryptosystem and the distinctive homomorphic primitive shows in the security analysis that privacy is ensured and the proposed algorithm provides strong defences against common attacks. Experiments deriving the benefits of real world databases demonstrate the preservation of private data while mining processes occur and the efficient handling of both database partition types.
关键词:privacy preserving; data mining; tree augmented naive Bayes; Paillier cryptosystem; homomorphic encryption; distributed databases privacy preserving ; data mining ; tree augmented naive Bayes ; Paillier cryptosystem ; homomorphic encryption ; distributed databases