Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2018
Volume: 41
Issue: 3
Pages: 43-51
Publisher: IEEE Computer Society
Abstract: We present a robust solution to the following problem: given a table with multiple categorical dimension attributes and one binary outcome attribute, construct a summary that offers an interpretable explanation of the factors affecting the outcome attribute in terms of the dimension attribute value combinations. We refer to such a summary as an explanation table, which is a disjunction of overlapping patterns over the dimension attributes, where each pattern specifies a conjunction of attribute=value conditions. The Flashlight algorithm that we describe is based on sampling and includes optimizations related to computing the information content of a summary from a sample of the data. Using real data sets, we demonstrate the advantages of explanation tables compared to related approaches that can be adapted to solve our problem, and we show significant performance benefits of our approach.