
Article information

  • Title: Redblock: a tool for online deduplication on large datasets
  • Authors: Luan Félix Pimentel; Igor Lemos Vicente; Guilherme Dal Bianco
  • Journal: Revista Brasileira de Computação Aplicada
  • Electronic ISSN: 2176-6649
  • Publication year: 2017
  • Volume: 9
  • Issue: 2
  • Pages: 125-134
  • DOI: 10.5335/rbca.v9i2.7143
  • Language: English
  • Publisher: Universidade de Passo Fundo (UPF)
  • Abstract: Online data deduplication aims to identify records that represent the same purpose on a continuous data flow environment. It must be able to process a range of information with high effectiveness and no delays. The purpose of this paper is to introduce a developed tool entitled Redblock, for real-time data deduplication, using a distributed platform for online processing combined with an Inverted Index. During the experimental evaluation, Redblock managed to provide good preliminary results in terms of efficiency and effectiveness in a database.
  • Keywords: Data Integration; Online Deduplication; Blocking.
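
The abstract describes combining online processing with an inverted index for deduplication. As a rough illustration of that general idea only, and not the authors' Redblock implementation, the sketch below keeps an in-memory inverted index from tokens to record ids and compares each incoming record only against earlier records that share a token with it. The class name `OnlineDeduplicator`, the whitespace tokenizer, the Jaccard similarity measure, and the 0.5 threshold are all assumptions made for this example; the distributed platform mentioned in the abstract is not modeled.

```python
class OnlineDeduplicator:
    """Toy streaming deduplicator backed by an in-memory inverted index.

    Hypothetical sketch: names, tokenizer, and threshold are assumptions,
    not details taken from the Redblock paper.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed Jaccard cut-off for a match
        self.index = {}             # token -> set of record ids containing it
        self.tokens = {}            # record id -> token set of that record

    @staticmethod
    def tokenize(text):
        # Lowercase whitespace tokenization, chosen for simplicity.
        return set(text.lower().split())

    def process(self, record_id, text):
        """Return ids of earlier records judged duplicates, then index the record."""
        toks = self.tokenize(text)

        # Blocking step: only records sharing at least one token are candidates.
        candidates = set()
        for tok in toks:
            candidates |= self.index.get(tok, set())

        # Compare the new record against each candidate with Jaccard similarity.
        matches = []
        for cid in sorted(candidates):
            other = self.tokens[cid]
            jaccard = len(toks & other) / len(toks | other)
            if jaccard >= self.threshold:
                matches.append(cid)

        # Index the new record so later arrivals can be matched against it.
        for tok in toks:
            self.index.setdefault(tok, set()).add(record_id)
        self.tokens[record_id] = toks
        return matches


# Simulated record stream (made-up data for illustration).
dedup = OnlineDeduplicator(threshold=0.5)
stream = [
    (1, "John Smith Porto Alegre"),
    (2, "Maria Souza Passo Fundo"),
    (3, "john smith porto alegre rs"),
]
results = {rid: dedup.process(rid, text) for rid, text in stream}
print(results)
```

In this sketch each record is processed once as it arrives, so the cost of a lookup depends on how many earlier records share tokens with it rather than on the total number of records seen.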