Journal: International Journal of Web & Semantic Technology
Print ISSN: 0976-2280
Online ISSN: 0975-9026
Publication year: 2015
Volume: 6
Issue: 1
Pages: 01
DOI: 10.5121/ijwest.2015.6101
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: The proliferation of heterogeneous data sources in semantic knowledge bases intensifies the need for an automatic instance matching technique. However, the efficiency of instance matching is often influenced by the weight of a property associated with instances. Automatic weight generation is a non-trivial yet important task in instance matching, and identifying an appropriate metric for generating property weights automatically is therefore a formidable problem. In this paper, we investigate an approach for generating weights automatically based on two hypotheses: (1) the weight of a property is directly proportional to the ratio of the number of its distinct values to the number of instances that contain the property, and (2) the weight is also proportional to the ratio of the number of distinct values of a property to the number of instances in a training dataset. The basic intuition behind our approach is the classical theory of information content, which holds that infrequent words are more informative than frequent ones. Our mathematical model derives a metric for generating property weights automatically, which is applied in an instance matching system to produce reconciled instances efficiently. Our experiments and evaluations show the effectiveness of our proposed metric for automatic weight generation for properties in an instance matching technique.
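
The two hypotheses can be illustrated with a minimal Python sketch. The abstract does not state the final metric, so this is not the paper's formula: combining the two ratios by a simple product is an assumption made here for illustration, and the function name `property_weights` and the toy dataset are hypothetical.

```python
from collections import defaultdict


def property_weights(instances):
    """Illustrative property weights for a list of instances.

    Each instance is a dict mapping a property name to a single value.
    ASSUMPTION: the two ratios from the hypotheses are multiplied
    together; the source abstract does not give the actual metric.
    """
    total = len(instances)        # number of instances in the dataset
    distinct = defaultdict(set)   # property -> set of distinct values
    holders = defaultdict(int)    # property -> instances containing it

    for inst in instances:
        for prop, value in inst.items():
            distinct[prop].add(value)
            holders[prop] += 1

    weights = {}
    for prop, values in distinct.items():
        d = len(values)
        ratio_containing = d / holders[prop]  # hypothesis (1)
        ratio_dataset = d / total             # hypothesis (2)
        weights[prop] = ratio_containing * ratio_dataset
    return weights


# Hypothetical toy dataset: "name" is unique per instance,
# "country" is shared by every instance that has it.
example_instances = [
    {"name": "Alice", "country": "JP"},
    {"name": "Bob", "country": "JP"},
    {"name": "Carol", "country": "JP"},
    {"name": "Dave"},
]
example_weights = property_weights(example_instances)
```

In this toy dataset the `name` property, whose values are all distinct, receives a higher weight than `country`, whose single value is repeated. This matches the stated intuition that infrequent values are more informative than frequent ones.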