文章基本信息

标题：Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1
本地全文：下载
作者：Ioannis Z. Emiris ; Vasilis Margonis ; Ioannis Psarros 等
期刊名称：LIPIcs : Leibniz International Proceedings in Informatics
电子版ISSN：1868-8969
出版年度：2019
卷号：145
页码：1-13
DOI：10.4230/LIPIcs.APPROX-RANDOM.2019.47
出版社：Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
摘要：Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open were doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the pointset with a carefully chosen covering set, then randomly project the latter. We study two types of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables: certain concentration bounds derived should be of independent interest.
关键词：Approximate nearest neighbor; Manhattan metric; randomized embedding