Journal: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Year: 2016
Volume: 5
Issue: 2
Pages: 199-207
Publisher: International Journal of Computer and Information Technology
Abstract: In the era of big data, data veracity is one of the most
challenging problems. One important task in big data integration
is to derive the most accurate records from noisy and conflicting
data records collected from multiple sources. However, data
sources may possess a set of properties with inconsistent
reliabilities; e.g., the height and weight of a patient are more likely to
be true than the profession in medical records, and the departure and
landing times of a flight are more likely to be true than the weather in
airline records. In a cloud computing environment, discrepancies
among data describing the same object are more common
because of the increased degree of data replication and the unknown
trustworthiness of the servers storing the data in a cloud. In addition, we
observe that the difficulty of determining the truth varies
considerably across entities. In this paper, we propose an ARTF model to
estimate attribute reliabilities with heterogeneous data types and
update them automatically according to entity hardness. Property-level
trustworthiness describes source reliability more precisely,
which in turn yields better precision in
inferring the truth. We compare the performance of our method
with state-of-the-art truth discovery methods on a real-world
dataset and a synthetic dataset. The experimental
results show that our algorithm resolves source conflicts
much more accurately while converging faster.
Keywords: truth finding; heterogeneous data types; entity
hardness; attribute reliability estimation
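
The abstract's central idea, that a source can be reliable for one attribute and unreliable for another, can be illustrated with a generic iterative weighted-voting sketch. This is not the ARTF model: the record does not give its equations, so the update rule, the function name `discover_truths`, the smoothing constants, and the sample claims below are all assumptions made for illustration. Entity hardness and non-categorical data types are not modeled.

```python
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Generic truth discovery with one weight per (source, attribute).

    claims: list of (source, entity, attribute, value) tuples, where the
    values are categorical and mutually comparable (e.g. all strings).
    Returns (truths, weights).
    """
    # Assumption: every (source, attribute) pair starts equally reliable.
    weights = {(s, a): 0.5 for s, _, a, _ in claims}
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote for each (entity, attribute).
        votes = defaultdict(lambda: defaultdict(float))
        for s, e, a, v in claims:
            votes[(e, a)][v] += weights[(s, a)]
        # sorted() makes tie-breaking deterministic (smallest value wins).
        new_truths = {
            key: max(sorted(cands), key=cands.get)
            for key, cands in votes.items()
        }
        # Step 2: reliability = smoothed share of claims matching the
        # current truths (add-one smoothing, an illustrative choice).
        hits = defaultdict(int)
        total = defaultdict(int)
        for s, e, a, v in claims:
            total[(s, a)] += 1
            hits[(s, a)] += int(new_truths[(e, a)] == v)
        weights = {k: (hits[k] + 1) / (total[k] + 2) for k in total}
        # Stop once the inferred truths no longer change.
        if new_truths == truths:
            break
        truths = new_truths
    return truths, weights


# Hypothetical sample data: s1 agrees with the majority on height but not
# on profession; s3 is the reverse.
claims = [
    ("s1", "p1", "height", "170"),
    ("s2", "p1", "height", "170"),
    ("s3", "p1", "height", "165"),
    ("s1", "p2", "height", "180"),
    ("s2", "p2", "height", "180"),
    ("s3", "p2", "height", "175"),
    ("s1", "p1", "profession", "nurse"),
    ("s2", "p1", "profession", "teacher"),
    ("s3", "p1", "profession", "teacher"),
]
truths, weights = discover_truths(claims)
```

On this sample, the same source ends up with different weights for `height` and `profession`, which is the per-attribute behavior the abstract motivates. A single weight per source would average the two away.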