Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2011
Volume: 34
Issue: 03
Publisher: IEEE Computer Society
Abstract: Data Auditor is a system for analyzing data quality by exploring data semantics. Given a user-supplied constraint, such as a functional dependency or an inclusion dependency, the system computes pattern tableaux, which are concise summaries of subsets of the data that satisfy (or fail) the constraint. The engine of Data Auditor is an efficient algorithm for finding these patterns: it defers expensive computation on patterns until that computation is needed during the search, thereby avoiding wasted effort. We demonstrate the utility of our approach on a variety of datasets, as well as the performance gain from employing this algorithm.
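The pattern-tableau idea can be illustrated with a minimal sketch for a functional dependency. It rests on several assumptions not stated in the abstract. Rows are dicts of string values. A pattern assigns each pattern attribute either a constant or a hypothetical wildcard `_`. A pattern's confidence is the largest fraction of its matching rows that can be kept so the dependency holds exactly. The tableau is built greedily: among patterns with confidence at least `min_conf`, pick the one covering the most not-yet-covered rows, until a `min_support` fraction of the table is covered. Confidence is computed lazily, only when a pattern reaches the top of a priority queue, as a loose analogue of deferring expensive computation during search. The names `fd_confidence`, `lazy_greedy_tableau`, and `example_rows` are hypothetical, the example data is made up, and this is not Data Auditor's actual algorithm.

```python
import heapq
from collections import Counter, defaultdict
from itertools import product

WILDCARD = "_"  # hypothetical marker: the pattern accepts any value here


def matching_rows(rows, pattern_attrs, pattern):
    """Indices of rows that agree with every non-wildcard entry of the pattern."""
    return frozenset(
        i for i, row in enumerate(rows)
        if all(p == WILDCARD or row[a] == p for a, p in zip(pattern_attrs, pattern))
    )


def fd_confidence(rows, idx, lhs, rhs):
    """Largest fraction of rows in idx that can be kept so lhs -> rhs holds:
    in each lhs group, keep only the rows with the most frequent rhs value."""
    if not idx:
        return 0.0
    groups = defaultdict(Counter)
    for i in idx:
        groups[tuple(rows[i][a] for a in lhs)][rows[i][rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(idx)


def lazy_greedy_tableau(rows, lhs, rhs, pattern_attrs, min_conf, min_support):
    """Greedily pick patterns with confidence >= min_conf, each time taking the
    one that covers the most uncovered rows, until min_support of rows is covered.
    Returns a list of (pattern, support, confidence) triples."""
    n = len(rows)
    domains = [sorted({row[a] for row in rows}) + [WILDCARD] for a in pattern_attrs]
    # Match sets are cheap here, so they are computed eagerly for every candidate.
    matches = {p: matching_rows(rows, pattern_attrs, p) for p in product(*domains)}
    # Heap entries: (-upper bound on new coverage, tie-breaker, pattern).
    heap = [(-len(m), k, p) for k, (p, m) in enumerate(matches.items())]
    heapq.heapify(heap)
    conf = {}  # confidence is computed only when a pattern reaches the top
    covered, tableau = set(), []
    while heap and len(covered) < min_support * n:
        _, k, p = heapq.heappop(heap)
        if p not in conf:
            conf[p] = fd_confidence(rows, matches[p], lhs, rhs)
        if conf[p] < min_conf:
            continue  # the constraint fails too often on this subset
        gain = len(matches[p] - covered)
        if gain == 0:
            continue  # gains never grow, so this pattern is useless from now on
        if heap and gain < -heap[0][0]:
            heapq.heappush(heap, (-gain, k, p))  # stale bound: re-queue exact gain
            continue
        covered |= matches[p]
        tableau.append((p, len(matches[p]) / n, conf[p]))
    return tableau


example_rows = [
    {"country": "US", "zip": "07974", "city": "Murray Hill"},
    {"country": "US", "zip": "07974", "city": "Murray Hill"},
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "London"},
]

if __name__ == "__main__":
    print(lazy_greedy_tableau(example_rows, ("zip",), "city", ("country",),
                              min_conf=0.9, min_support=0.5))
```

On these six made-up rows, `zip -> city` holds exactly for the US rows but not for the two conflicting UK rows. With `min_conf=0.9` and `min_support=0.5`, the sketch rejects the all-wildcard pattern (confidence 5/6) and returns only the `("US",)` pattern. Keeping patterns whose confidence falls below a threshold instead would give a rough summary of where the constraint fails.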