Article Information

  • Title: Efficient and Scalable Replication Discovery in XML Data
  • Authors: M.Bhargavi; R.Naveen; Madhira.Srinivas
  • Journal: International Journal of Computer Science & Technology
  • Print ISSN: 2229-4333
  • Electronic ISSN: 0976-8491
  • Publication year: 2013
  • Volume: 4
  • Issue: 3
  • Pages: 286-289
  • Language: English
  • Publisher: Ayushmaan Technologies
  • Abstract: Duplicate detection is the problem of detecting different entries in a data source that represent the same real-world entity. While research abounds in the realm of duplicate detection in relational data, there is as yet little work on duplicates in other, more complex data models, such as XML. Our research in XML duplicate detection addresses four major challenges. First, we investigate how object descriptions can be selected automatically, a difficult task in XML, where objects and object descriptions are both represented by XML elements. Second, we define new domain-independent duplicate classifiers that take into account not only the data but also the structural diversity of XML objects. Third, we define comparison strategies that make use of element dependencies to improve efficiency without jeopardizing effectiveness. Finally, we consider scalability by investigating how relational and XML databases can support the duplicate detection process. By considering the problem of XML duplicate detection under the aspects of effectiveness, efficiency and scalability, we believe that our insights and solutions will significantly contribute to solving XML duplicate detection for a wide range of applications.
  • Keywords: XML; Tree Pattern; Duplicate Detection