Abstract: Set-valued data consists of records that are sets of items, such as the goods purchased
by each individual. Methods for publishing set-valued data for wide use while protecting
personal information have been studied extensively in the field of privacy-preserving data publishing.
However, basic models such as k-anonymity and k^m-anonymity cannot cope with attribute
inference by an adversary who has background knowledge of the records. In contrast, the ρ-uncertainty
model prevents attribute inference whose confidence exceeds a given threshold ρ in set-valued data.
Even so, it requires the items to be protected to be designated in common for all individuals.
In this research, we propose a new model that provides privacy protection better suited to each
individual by protecting, for each record, the items designated for that particular record, and we
build a heuristic algorithm that achieves this guarantee using partial suppression. In addition,
because the computational complexity of the algorithm grows combinatorially with data size,
we introduce a probabilistic relaxation of the privacy guarantee. Finally, we present experimental
results evaluating the performance of the algorithms on real-world datasets.
Keywords: privacy-preserving; anonymization; set-valued data; attribute inference; association rule
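The abstract does not spell out the model definitions. The following Python sketch assumes the usual ρ-uncertainty formulation: the confidence of an association rule q → s is sup(q ∪ {s}) / sup(q), and the data is acceptable when no rule whose consequent s is a protected item has confidence above ρ. The per-record `protected` argument is a hypothetical rendering of the proposed model, in which each record designates its own items to protect. The helper names `support` and `violations` are illustrative and are not taken from the paper. The check only detects violations; the partial-suppression algorithm and the probabilistic relaxation are not shown. The nested enumeration of every background-knowledge itemset q also illustrates why an exhaustive check grows combinatorially with record length.

```python
from itertools import combinations


def support(data, itemset):
    """Count the records that contain every item in `itemset`."""
    return sum(1 for t in data if itemset <= t)


def violations(data, protected, rho):
    """Return {(q, s): confidence} for every rule q -> s whose confidence exceeds rho.

    data:      list of frozensets, one per record.
    protected: list of sets; protected[i] holds the items to protect in record i
               (hypothetical per-record model). Passing the same set for every
               record gives plain rho-uncertainty with common sensitive items.
    """
    found = {}
    for t, sens in zip(data, protected):
        for s in sens & t:  # only protected items that actually occur in the record
            rest = sorted(t - {s})
            # Enumerate every background-knowledge itemset q drawn from the record,
            # including the empty set (the prior); this is exponential in |t|.
            for k in range(len(rest) + 1):
                for combo in combinations(rest, k):
                    q = frozenset(combo)
                    # support(data, q) >= 1 because record t itself contains q.
                    conf = support(data, q | {s}) / support(data, q)
                    if conf > rho:
                        found[(q, s)] = conf
    return found


if __name__ == "__main__":
    data = [
        frozenset({"bread", "milk", "beer"}),
        frozenset({"bread", "milk"}),
        frozenset({"bread", "beer"}),
        frozenset({"milk"}),
    ]
    # Common sensitive item for everyone, as in plain rho-uncertainty.
    print(violations(data, [{"beer"}] * len(data), rho=0.6))
    # Hypothetical per-record designation: record 1 protects milk, record 2 beer.
    print(violations(data, [set(), {"milk"}, {"beer"}, set()], rho=0.7))
```

On this toy data, the first call flags the rule bread → beer (confidence 2/3). The second call flags only the prior for milk (3/4 of the records contain it), because beer is now protected only in record 2 and no rule toward beer exceeds 0.7.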