Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2018
Volume: 41
Issue: 2
Pages: 71-81
Publisher: IEEE Computer Society
Abstract: The product domain contains valuable data for many important applications. Given the large and increasing number of sources that provide data about product specifications, and the velocity and variety with which such data are available, this domain represents a challenging scenario for developing and evaluating big data integration solutions. In this paper, we present the results of our efforts towards big data integration for product specifications. We present a pipeline that decomposes the problem into different tasks, from source and data discovery to extraction, data linkage, schema alignment, and data fusion. Although we present the pipeline as a sequence of tasks, different configurations can be defined depending on the application goals.
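The abstract describes the pipeline only at the level of tasks. As a rough illustration of how such a pipeline might be decomposed into interchangeable stages and run under different configurations, the following minimal Python sketch chains five toy stages over a list of product records. Everything in it is hypothetical and not taken from the paper: the stage functions, the `SYNONYMS` attribute mapping, the model-code normalization used for linkage, the majority-vote rule used for fusion, and the sample records. Each stage is a deliberately simplistic stand-in for what is a hard problem in its own right.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

# A product record maps attribute names to string values (toy representation).
Record = Dict[str, str]
# Every stage takes a list of records and returns a new list of records.
Stage = Callable[[List[Record]], List[Record]]

# Hypothetical synonym table used by the toy schema-alignment stage.
SYNONYMS = {"megapixels": "resolution", "mpix": "resolution", "brand": "manufacturer"}


def source_discovery(records: List[Record]) -> List[Record]:
    """Keep only records that come from an identified source."""
    return [r for r in records if r.get("source")]


def extraction(records: List[Record]) -> List[Record]:
    """Lowercase attribute names and trim whitespace from names and values."""
    return [{k.strip().lower(): v.strip() for k, v in r.items()} for r in records]


def data_linkage(records: List[Record]) -> List[Record]:
    """Tag each record with an entity id built from its normalized model code."""
    return [dict(r, entity=r.get("model", "").upper().replace("-", "")) for r in records]


def schema_alignment(records: List[Record]) -> List[Record]:
    """Rename synonymous attribute names to a single canonical name."""
    return [{SYNONYMS.get(k, k): v for k, v in r.items()} for r in records]


def data_fusion(records: List[Record]) -> List[Record]:
    """Merge records of the same entity, keeping the most frequent value per attribute."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[r["entity"]].append(r)
    fused = []
    for entity, group in groups.items():
        merged: Record = {"entity": entity}
        attrs = {k for r in group for k in r if k not in ("entity", "source")}
        for attr in sorted(attrs):
            votes = Counter(r[attr] for r in group if attr in r)
            merged[attr] = votes.most_common(1)[0][0]  # simple majority vote
        fused.append(merged)
    return fused


# Registry of available stages, addressed by name in a configuration.
STAGES: Dict[str, Stage] = {
    "source_discovery": source_discovery,
    "extraction": extraction,
    "data_linkage": data_linkage,
    "schema_alignment": schema_alignment,
    "data_fusion": data_fusion,
}

# Two example configurations (hypothetical): the full sequence and a linkage-only prefix.
FULL_PIPELINE = ["source_discovery", "extraction", "data_linkage",
                 "schema_alignment", "data_fusion"]
LINKAGE_ONLY = ["source_discovery", "extraction", "data_linkage"]


def run_pipeline(records: List[Record], config: List[str]) -> List[Record]:
    """Apply the configured stages in order."""
    for name in config:
        records = STAGES[name](records)
    return records


# Toy input: three sources describe the same camera; one record has no source.
raw: List[Record] = [
    {"source": "shop-a.example", "Model": "DSC-100", "Megapixels": " 20 "},
    {"source": "shop-b.example", "model": "dsc100", "mpix": "20"},
    {"source": "shop-c.example", "model": "DSC-100", "megapixels": "24"},
    {"model": "DSC-100", "megapixels": "99"},
]

if __name__ == "__main__":
    print(run_pipeline(raw, FULL_PIPELINE))
    # [{'entity': 'DSC100', 'model': 'DSC-100', 'resolution': '20'}]
    print(len(run_pipeline(raw, LINKAGE_ONLY)))  # 3 linked records, not fused
```

Representing a configuration as an ordered list of stage names keeps the stages decoupled: an application that only needs linked records can stop after `data_linkage`, while one that needs a single consolidated specification per product runs the full sequence.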