Timothy M. Chan's Publications: Approximate nearest neighbors

Faster deterministic and Las Vegas algorithms for offline approximate nearest neighbors in high dimensions

Josh Alman and Ryan Williams)

We present a deterministic, truly subquadratic algorithm for offline (1+epsilon)-approximate nearest or farthest neighbor search (in particular, the closest pair or diameter problem) in Hamming space in any dimension d < n^delta, for a sufficiently small constant delta > 0. The running time of the algorithm is roughly n^{2-epsilon^{1/2+O(delta)}} for nearest neighbors, or n^{2-Omega(sqrt{epsilon}/log(1/epsilon))} for farthest. The algorithm follows from a simple combination of expander walks, Chebyshev polynomials, and rectangular matrix multiplication.

We also show how to eliminate errors in the previous Monte Carlo randomized algorithm of Alman, Chan, and Williams [FOCS'16] for offline approximate nearest or farthest neighbors, and obtain a Las Vegas randomized algorithm with expected running time n^{2-Omega(epsilon^{1/3}/log(1/epsilon))}.

Finally, we note a simplification of Alman, Chan, and Williams' method and obtain a slightly improved Monte Carlo randomized algorithm with running time n^{2-Omega(epsilon^{1/3}/log^{2/3}(1/epsilon))}.

As one application, we obtain improved deterministic and randomized (1+epsilon)-approximation algorithms for MAX-SAT.

On locality-sensitive orderings and their applications

Sariel Har-Peled and Mitchell Jones)

For any constant d and parameter eps > 0, we show the existence of (roughly) 1/eps^d orderings on the unit cube [0,1)^d, such that any two points p,q in [0,1)^d that are close together under the Euclidean metric are "close together" in one of these linear orderings in the following sense: the only points that could lie between p and q in the ordering are points with Euclidean distance at most eps d(p,q) from p or q. These orderings are extensions of the "Z-order", and they can be efficiently computed.

Functionally, the orderings can be thought of as a replacement to quadtrees and related structures (like well-separated pair decompositions). We use such orderings to obtain surprisingly simple algorithms for a number of basic problems in low-dimensional computational geometry, including (i) dynamic approximate bichromatic closest pair, (ii) dynamic spanners, (iii) dynamic approximate minimum spanning trees, (iv) static and dynamic fault-tolerant spanners, and (v) approximate nearest neighbor search.

Applications of Chebyshev polynomials to low-dimensional computational geometry

We apply the polynomial method---specifically, Chebyshev polynomials---to obtain a number of new results on geometric approximation algorithms in low constant dimensions. For example, we give an algorithm for constructing epsilon-kernels (coresets for approximate width and approximate convex hull) in close to optimal time O(n + (1/epsilon)^{(d-1)/2}), up to a small near-(1/epsilon)^{3/2} factor, for any d-dimensional n-point set. We obtain an improved data structure for Euclidean approximate nearest neighbor search with close to O(n log n + (1/epsilon)^{d/4}n) preprocessing time and O((1/epsilon)^{d/4} log n) query time. We obtain improved approximation algorithms for discrete Voronoi diagrams, diameter, and bichromatic closest pair in the L_s-metric for any even integer constant s > 2. The techniques are general and may have further applications.

A clustering-based approach to kinetic closest pair

Zahed Rahmati)

Given a set P of n moving points in fixed dimension d, where the trajectory of each point is a polynomial of degree bounded by some constant, we present a kinetic data structure (KDS) for maintenance of the closest pair on P. Assuming the closest pair distance is between 1 and Delta over time, our KDS uses O(n log Delta) space and processes O(n^2 beta log Delta log n + n^2 beta log Delta loglog Delta) events, each in worst-case time O(log^2 n + log^2 log Delta). Here, beta is an extremely slow-growing function. The locality of the KDS is O(log n + loglog Delta). Our closest pair KDS supports insertions and deletions of points. An insertion or deletion takes worst-case time O(log Delta log^2 n +log Delta log^2log Delta). Also, we use a similar approach to provide a KDS for the all epsilon-nearest neighbors in R^d. The complexities of the previous KDSs, for both closest pair and all epsilon-nearest neighbors, have polylogarithmic factors, where the number of logs depends on dimension d. Assuming Delta is polynomial in n, our KDSs obtain improvements on the previous KDSs. Our solutions are based on a kinetic clustering on P. Though we use ideas from the previous clustering KDS by Hershberger, we simplify and improve his work.

Polynomial representations of threshold functions and algorithmic applications

Josh Alman and Ryan Williams)

We design new polynomials for representing threshold functions in three different regimes: probabilistic polynomials of low degree, which need far less randomness than previous constructions, polynomial threshold functions (PTFs) with "nice" threshold behavior and degree almost as low as the probabilistic polynomials, and a new notion of probabilistic PTFs where we combine the above techniques to achieve even lower degree with similar "nice" threshold behavior. Utilizing these polynomial constructions, we design faster algorithms for a variety of problems:

Time-windowed closest pair

(with Simon Pratt)

Given a set of points in any constant dimension, each of which is associated with a time during which that point is active, we design a data structure with O(n log n) space that can find the closest pair of active points within a query interval of time in O(loglog n) time using a quadtree-based approach in the word-RAM model.

Approximating the minimum closest pair distance and nearest neighbor distances of linearly moving points

Zahed Rahmati)

Given a set of n moving points in R^d, where each point moves along a linear trajectory at arbitrary but constant velocity, we present an O~(n^{5/3})-time algorithm to compute a (1+epsilon)-factor approximation to the minimum closest pair distance over time, for any constant epsilon>0 and any constant dimension d. This addresses an open problem posed by Gupta, Janardan, and Smid (1996).

More generally, we consider a data structure version of the problem: for any linearly moving query point q, we want a (1+epsilon)-factor approximation to the minimum nearest neighbor distance to q over time. We present a data structure that requires O~(n^{5/3}) space and O~(n^{2/3}) query time, O~(n^5) space and polylogarithmic query time, or O~(n) space and O~(n^{4/5}) query time, for any constant epsilon>0 and any constant dimension d.

Dynamic data structures for approximate Hausdorff distance in the word RAM

(with Dimitrios Skrepetos)

We give a fully dynamic data structure for maintaining an approximation of the Hausdorff distance between two point sets in a constant dimension d, a standard problem in computational geometry. Our solution has an approximation factor of 1+epsilon for any constant epsilon>0 and expected update time O(log U/loglog n}). The result of the paper greatly improves over the previous exact method, which required O~(n^{5/6}) time and worked only in a semi-online setting. The model of computation is the word RAM model.

Better epsilon-dependencies for offline approximate nearest neighbor search, Euclidean minimum spanning trees, and epsilon-kernels

Sunil Arya)

Recently, Arya, da Fonseca, and Mount [STOC 2011, SODA 2012] made notable progress in improving the epsilon-dependencies in the space/query-time tradeoffs for (1+epsilon)-factor approximate nearest neighbor search in fixed-dimensional Euclidean spaces. However, epsilon-dependencies in the preprocessing time were not considered, and so their data structures cannot be used to derive faster algorithms for offline proximity problems. Known algorithms for many such problems, including approximate bichromatic closest pair (BCP) and approximate Euclidean minimum spanning trees (EMST), typically have factors near (1/epsilon)^{d/2 +/- O(1)} in the running time when the dimension d is a constant.

We describe a technique that breaks the (1/epsilon)^{d/2} barrier and yields new results for many well-known proximity problems, including:

Using additional bit-packing tricks, we can shave off the log n factor for EMST, and even move most of the epsilon-factors to a sublinear term.

The improvement arises from a new time bound for exact "discrete Voronoi diagrams", which were previously used in the construction of epsilon-kernels (or extent-based coresets), a well-known tool for another class of fundamental problems. This connection leads to more results, including:

Closest pair and the post office problem for stochastic points

Pegah Kamousi and Subhash Suri)

Given a (master) set M of n points in d-dimensional Euclidean space, consider drawing a random subset that includes each point m_i in M with an independent probability p_i. How difficult is it to compute elementary statistics about the closest pair of points in such a subset? For instance, what is the probability that the distance between the closest pair of points in the random subset is no more than l, for a given value l? Or, can we preprocess the master set M such that given a query point q, we can efficiently estimate the expected distance from q to its nearest neighbor in the random subset? These basic computational geometry problems, whose complexity is quite well-understood in the deterministic setting, prove to be surprisingly hard in our stochastic setting. We obtain hardness results and approximation algorithms for stochastic problems of this kind.

Well-separated pair decomposition in linear time?

Given a point set in a fixed dimension, we note that a well-separated pair decomposition can be found in linear time if we assume that the ratio of the farthest pair distance to the closest pair distance is polynomially bounded. Many consequences follow; for example, we can construct spanners or solve the all-nearest-neighbors problem in linear time (under the same assumption), and we compute an approximate Euclidean minimum spanning tree in linear time (without any assumption).

A minimalist's implementation of an approximate nearest neighbor algorithm in fixed dimensions

We consider the standard problem of approximate nearest neighbor search, for a given set of n points with integer coordinates in a constant-dimensional Euclidean space. We describe a simple implementation of a randomized algorithm that guarantees O(log n) expected query time and O(n log n) preprocessing time. The entire C++ code is under 100 lines long and requires no extra space other than the input array. The algorithm can easily be made dynamic as well.

Closest-point problems simplified on the RAM

Basic proximity problems for low-dimensional point sets, such as closest pair and approximate nearest neighbor, have been studied extensively in the computational geometry literature, with well over a hundred papers published. Generally, optimal algorithms designed for worst-case input require hierarchical spatial structures with sophisticated balancing conditions; dynamization of these structures is even more involved.

In this note, we point out that much simpler algorithms with the same performance are possible using standard, though nonalgebraic, RAM operations. This is interesting, considering that nonalgebraic operations have been used before in the literature...

Approximate nearest neighbor queries revisited

This paper proposes new methods to answer approximate nearest neighbor queries on a set of n points in d-dimensional Euclidean space. For any fixed constant d, a data structure with O(eps^{(1-d)/2} n log n) preprocessing time and O(eps^{(1-d)/2} log n) query time achieves approximation factor 1+eps for any given 0 < eps < 1; a variant reduces the eps-dependence by a factor of eps^{-1/2}. For any arbitrary d, a data structure with O(d^2 n log n) preprocessing time and O(d^2 log n) query time achieves approximation factor O(d^{3/2}). Applications to various proximity problems are discussed.

Copyright Notice

The documents contained in this directory are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Timothy Chan (Last updated Aug 2023)