Abstract: The sparse regression problem, also known as the best subset selection problem, can be cast as follows: given a set S of n points in ℝ^d, a point y ∈ ℝ^d, and an integer 2 ≤ k ≤ d, find an affine combination of at most k points of S that is nearest to y. We describe an O(n^{k-1} log^{d-k+2} n)-time randomized (1+ε)-approximation algorithm for this problem with d and ε constant. This is the first algorithm for this problem running in time o(n^k). Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where k = d = O(1), we provide a simple O_δ(n^{d-1+δ})-time deterministic exact algorithm, for any δ > 0. Finally, we show how to adapt the approximation algorithm to the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.
Keywords: Sparse Linear Regression; Orthogonal Range Searching; Affine Degeneracy Testing; Nearest Neighbors; Hyperplane Arrangements
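
To make the problem statement concrete, the following is a minimal sketch of the naive exact baseline that enumerates every k-subset of S and measures the distance from y to the affine hull of that subset. It is not the algorithm described in the abstract; it only illustrates the kind of exhaustive search that the o(n^k) bound improves on. The function names `dist_to_affine_hull` and `brute_force_sparse_affine_regression`, the tolerance `tol`, and the use of Gram-Schmidt orthogonalization are assumptions made for illustration.

```python
from itertools import combinations
import math


def dist_to_affine_hull(points, y, tol=1e-12):
    """Euclidean distance from y to the affine hull of `points`.

    The hull is points[0] + span(points[i] - points[0]); we orthonormalize
    the directions with Gram-Schmidt (skipping dependent ones) and remove
    their components from y - points[0].
    """
    base = points[0]
    residual = [yi - bi for yi, bi in zip(y, base)]
    basis = []  # orthonormal basis of the direction space
    for p in points[1:]:
        v = [pi - bi for pi, bi in zip(p, base)]
        for q in basis:
            c = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # ignore affinely dependent points
            basis.append([vi / norm for vi in v])
    for q in basis:
        c = sum(ri * qi for ri, qi in zip(residual, q))
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return math.sqrt(sum(ri * ri for ri in residual))


def brute_force_sparse_affine_regression(S, y, k):
    """Return (distance, indices) of the best subset of min(k, len(S)) points.

    Subsets of exactly that size suffice: the affine hull of a smaller
    subset is contained in the hull of any superset.
    """
    k = min(k, len(S))
    best = (math.inf, None)
    for idx in combinations(range(len(S)), k):
        d = dist_to_affine_hull([S[i] for i in idx], y)
        if d < best[0]:
            best = (d, idx)
    return best
```

For example, with S = [(0, 0), (2, 0), (0, 5)], y = (1, 1), and k = 2, the search compares the three lines through pairs of points and selects the one through (2, 0) and (0, 5). The enumeration visits a number of subsets on the order of n^k for constant k, which is the cost the abstract's algorithm reduces by roughly a factor of n.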