Article Information

  • Title: Record Matching: Improving Performance in Classification
  • Authors: Cyju Elizabeth Varghese; G. Naveen Sundar
  • Journal: International Journal on Computer Science and Engineering
  • Print ISSN: 2229-5631
  • Online ISSN: 0975-3397
  • Year of publication: 2011
  • Volume: 3
  • Issue: 03
  • Pages: 1207-1212
  • Publisher: Engg Journals Publications
  • Abstract: Duplication detection identifies records that represent the same real-world entity, and it is a vital step in data integration. Record matching is the task of finding entries that refer to the same entity in two or more files. Because record matching solves the duplication detection problem, a suitable record matching technique must be identified. Current duplication detection techniques are supervised and require the user to provide training data. They are therefore not applicable to the Web database scenario, where the records to match are query results generated dynamically, on the fly. To address record matching in this scenario, we present Fast Duplication Detection (FDD), which, for a given query, can effectively identify duplicates among the query result records of multiple Web databases. Starting from the non-duplicate set, we use two classifiers, a dynamic classification classifier and an SVM classifier, to iteratively identify duplicates in the query results. Clustering the vectors before passing them to the classifiers should produce better results. Moreover, a nonlinear SVM produces better results on noisy documents, which improves the overall performance of the system. Experimental results show that FDD performs better in the Web database scenario.
  • Keywords: Record Matching; Duplication Detection; SVM; Unsupervised
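
The abstract does not spell out how record pairs are turned into vectors for classification, so the following is only a minimal sketch of the general idea, not the authors' FDD method. It assumes each record is a dictionary of string fields, scores each field pair with a standard-library string similarity, and flags a pair as a duplicate when the mean score meets a threshold. The function names, the sample records, and the 0.85 threshold are all hypothetical; the clustering step and the SVM classifier described in the abstract are not reproduced here.

```python
from difflib import SequenceMatcher
from itertools import product


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_vector(r1: dict, r2: dict, fields: list) -> list:
    """One similarity score per shared field for a pair of records."""
    return [field_similarity(r1[f], r2[f]) for f in fields]


def find_duplicates(source_a, source_b, fields, threshold=0.85):
    """Flag cross-source pairs whose mean field similarity meets the threshold.

    The threshold rule is a stand-in (an assumption) for the classifiers
    mentioned in the abstract.
    """
    duplicates = []
    for (i, r1), (j, r2) in product(enumerate(source_a), enumerate(source_b)):
        vec = similarity_vector(r1, r2, fields)
        if sum(vec) / len(vec) >= threshold:
            duplicates.append((i, j))
    return duplicates


# Hypothetical query results returned by two Web databases.
db_a = [
    {"title": "Data Mining Concepts", "author": "J. Han"},
    {"title": "Pattern Recognition", "author": "C. Bishop"},
]
db_b = [
    {"title": "data mining concepts", "author": "J. Han"},
    {"title": "Operating Systems", "author": "A. Tanenbaum"},
]

pairs = find_duplicates(db_a, db_b, ["title", "author"])
```

In a fuller pipeline, the similarity vectors produced here would be the inputs to the clustering and classification stages, replacing the fixed threshold.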