
Article information

  • Title: Utility-driven assessment of anonymized data via clustering
  • Authors: Maria Eugénia Ferrão; Paula Prata; Paulo Fazendeiro
  • Journal: Scientific Data
  • Electronic ISSN: 2052-4463
  • Publication year: 2022
  • Volume: 9
  • Issue: 1
  • Pages: 1-11
  • DOI: 10.1038/s41597-022-01561-6
  • Language: English
  • Publisher: Nature Publishing Group
  • Abstract: In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset concerning an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess anonymized data utility by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved, or not, after data anonymization. The results suggest that for low dimensionality/cardinality datasets the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.