文章基本信息

标题：Sample-Optimal Identity Testing with High Probability
本地全文：下载
作者：Ilias Diakonikolas ; Themis Gouleakis ; John Peebles 等
期刊名称：Electronic Colloquium on Computational Complexity
印刷版ISSN：1433-8092
出版年度：2017
卷号：2017
出版社：Universität Trier, Lehrstuhl für Theoretische Computer-Forschung
摘要：
We study the problem of testing identity against a given distribution (a.k.a. goodness-of-fit) with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q , and parameters 0 1 , we wish to distinguish, {\em with probability at least 1 − }, whether the distributions are identical versus -far in total variation (or statistical) distance. Existing work has focused on the constant confidence regime, i.e., the case that = (1) , for which the sample complexity of identity testing is known to be ( n 2 ) . Typical applications of distribution property testing require small values of the confidence parameter (which correspond to small `` p -values'' in the statistical hypothesis testing terminology). Prior work achieved arbitrarily small values of via black-box amplification, which multiplies the required number of samples by ( log (1 )) . We show that this upper bound is suboptimal for any = o (1) , and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is 1 2 n log (1 ) + log (1 ) for any n , and . For the special case of uniformity testing, where the given distribution is the uniform distribution U n over the domain, our new tester is surprisingly simple: to test whether p = U n versus \dtv ( p U n ) , we simply threshold \dtv ( \wh p U n ) , where \wh p is the empirical probability distribution. We believe that our novel analysis techniques may be useful for other distribution testing problems as well.
关键词：distribution testing ; identity testing