We present a number of new results about range searching for colored (or "categorical") data:
Our data structures are designed using a variety of techniques, including colored variants of randomized incremental construction (which may be of independent interest), colored variants of shallow cuttings, and bit-packing tricks.
Range searching on categorical, or "colored", data has been studied extensively for over two decades. In this paper, we obtain the current best results for perhaps the most basic, and most often studied, version of the geometric problem: colored orthogonal range reporting.
Given n colored points in two-dimensional space [U]^2, we present a data structure with O(n log^{3/4+epsilon} n) space, for an arbitrarily small constant epsilon > 0, so that all k distinct colors in any axis-aligned query rectangle can be reported in (optimal) O(loglog U + k) time; this is the first method to break the O(n log n) space barrier.
In three dimensions, we present a data structure with O(n log^{9/5+epsilon} n) space and O(log n/loglog n + k) time; this improves the previous space bound of O(n log^4 n).
Range closest-pair (RCP) search is a range-search variant of the classical closest-pair problem, which aims to store a given set S of points into some space-efficient data structure such that when a query range Q is specified, the closest pair in S \cap Q can be reported quickly. RCP search has received attention over years, but the primary focus was only on R^2. In this paper, we study RCP search in higher dimensions. We give the first nontrivial RCP data structures for orthogonal, simplex, halfspace, and ball queries in R^d for any constant d. Furthermore, we prove a conditional lower bound for orthogonal RCP search for d >= 3.
In this paper we study two geometric data structure problems in the special case when input objects or queries are fat rectangles. We show that in this case a significant improvement compared to the general case can be achieved. We describe data structures that answer two- and three-dimensional orthogonal range reporting queries in the case when the query range is a fat rectangle. Our two-dimensional data structure uses O(n) words and supports queries in O(loglog U + k) time, where n is the number of points in the data structure, U is the size of the universe and k is the number of points in the query range. Our three-dimensional data structure needs O(n log^eps U) words of space and answers queries in O(loglog U + k) time. We also consider the rectangle stabbing problem on a set of three-dimensional fat rectangles. Our data structure uses O(n) space and answers stabbing queries in O(log U log log U + k) time.
In this work, we present a collection of new results on two fundamental problems in geometric data structures: orthogonal point location and rectangle stabbing.
We study a longstanding problem in computational geometry: 2-d dynamic orthogonal range reporting. We present a new data structure achieving O(log n/loglog n + k) optimal query time and O(log^{2/3+o(1)}n) update time (amortized) in the word RAM model, where n is the number of data points and k is the output size. This is the first improvement in over 10 years of Mortensen's previous result [SIAM J. Comput., 2006], which has O(log^{7/8+epsilon}n) update time for an arbitrarily small constant epsilon.
In the case of 3-sided queries, our update time reduces to O(log^{1/2+epsilon}n), improving Wilkinson's previous bound [ESA 2014] of O(log^{2/3+epsilon}n).
We revisit the orthogonal range searching problem and the exact l_infinity nearest neighbor searching problem for a static set of n points when the dimension d is moderately large. We give the first data structure with near linear space that achieves truly sublinear query time when the dimension is any constant multiple of log n. Specifically, the preprocessing time/space is O(n^{1+delta}) for any constant delta > 0, and the expected query time is n^{1 - 1/O(c log c)}. The data structure is simple and is based on a new "augmented, randomized, lopsided" variant of k-d trees. It matches (in fact, slightly improves) the performance of previous combinatorial algorithms that work only in the case of offline queries [Impagliazzo, Lovett, Paturi, and Schneider (2014) and Chan (SODA'15)]. It leads to faster combinatorial algorithms for all-pairs shortest paths in general weighted graphs and rectangular Boolean matrix multiplication.
In the offline case, we show that the problem can be reduced to the Boolean orthogonal vectors problem and thus admits an n^{2 - 1/O(log c)}-time non-combinatorial algorithm [Abboud, Williams, and Yu (SODA'15)]. This reduction is also simple and is based on range trees.
Finally, we use a similar approach to obtain a small improvement to Indyk's data structure [FOCS'98] for approximate l_infinity nearest neighbor search when d=c log n.
Given a set P of n moving points in fixed dimension d, where the trajectory of each point is a polynomial of degree bounded by some constant, we present a kinetic data structure (KDS) for maintenance of the closest pair on P. Assuming the closest pair distance is between 1 and Delta over time, our KDS uses O(n log Delta) space and processes O(n^2 beta log Delta log n + n^2 beta log Delta loglog Delta) events, each in worst-case time O(log^2 n + log^2 log Delta). Here, beta is an extremely slow-growing function. The locality of the KDS is O(log n + loglog Delta). Our closest pair KDS supports insertions and deletions of points. An insertion or deletion takes worst-case time O(log Delta log^2 n +log Delta log^2log Delta). Also, we use a similar approach to provide a KDS for the all epsilon-nearest neighbors in R^d. The complexities of the previous KDSs, for both closest pair and all epsilon-nearest neighbors, have polylogarithmic factors, where the number of logs depends on dimension d. Assuming Delta is polynomial in n, our KDSs obtain improvements on the previous KDSs. Our solutions are based on a kinetic clustering on P. Though we use ideas from the previous clustering KDS by Hershberger, we simplify and improve his work.
Given a set of geometric objects each associated with a time value, we wish to determine whether a given property is true for a subset of those objects whose time values fall within a query time window. We call such problems time-windowed decision problems, and they have been the subject of much recent attention, for instance studied by Bokal, Cabello, and Eppstein [SoCG 2015]. In this paper, we present new approaches to this class of problems that are conceptually simpler than Bokal et al.'s, and also lead to faster algorithms. For instance, we present algorithms for preprocessing for the time-windowed 2D diameter decision problem in O(n log n) time and the time-windowed 2D convex hull area decision problem in O(n alpha(n) log n) time (where alpha is the inverse Ackermann function), improving Bokal et al.'s O(n log^2 n) and O(n log n loglog n) solutions respectively.
Our first approach is to reduce time-windowed decision problems to a generalized range successor problem, which we solve using a novel way to search range trees. Our other approach is to use dynamic data structures directly, taking advantage of a new observation that the total number of combinatorial changes to a planar convex hull is near linear for any FIFO update sequence, in which deletions occur in the same order as insertions. We also apply these approaches to obtain the first O(n polylog n) algorithms for the time-windowed 3D diameter decision and 2D orthogonal segment intersection detection problems.
We study the problem of supporting (orthogonal) range selection queries over a set of n points in constant-dimensional space. Under the standard word-RAM model with word size w = Omega(lg n), we present data structures that occupy O(n (lg n/lglg n)^{d-1}) words of space and support d-dimensional range selection queries using O((lg n/lglg n)^d) query time. This improves the best known data structure by a factor of lglg n in query time. To develop our data structures, we generalize the "parallel counting" technique of Brodal, Gfeller, Jørgensen, and Sanders (2011) for one-dimensional range selection to higher dimensions.
As a byproduct, we design data structures to support d-dimensional range counting queries within O(n (log_w n)^{d-2}) words of space and O((log_w n)^{d-1}) query time, for any word size w = Omega(lg n). This improves the best known result of JaJa, Mortensen, and Shi (2004) when lg w >> lglg n.
Given a set of points in any constant dimension, each of which is associated with a time during which that point is active, we design a data structure with O(n log n) space that can find the closest pair of active points within a query interval of time in O(loglog n) time using a quadtree-based approach in the word-RAM model.
Given a set of n moving points in R^d, where each point moves along a linear trajectory at arbitrary but constant velocity, we present an O~(n^{5/3})-time algorithm to compute a (1+epsilon)-factor approximation to the minimum closest pair distance over time, for any constant epsilon>0 and any constant dimension d. This addresses an open problem posed by Gupta, Janardan, and Smid (1996).
More generally, we consider a data structure version of the problem: for any linearly moving query point q, we want a (1+epsilon)-factor approximation to the minimum nearest neighbor distance to q over time. We present a data structure that requires O~(n^{5/3}) space and O~(n^{2/3}) query time, O~(n^5) space and polylogarithmic query time, or O~(n) space and O~(n^{4/5}) query time, for any constant epsilon>0 and any constant dimension d.
A set of strings, called a string dictionary, is a basic string data structure. The most primitive query, where one seeks the existence of a pattern in the dictionary, is called a lookup query. Approximate lookup queries, i.e., to lookup the existence of a pattern with a bounded number of errors, is a fundamental string problem. Several data structures have been proposed to do so efficiently. Almost all solutions consider a single error, as will this result. Lately, Belazzougui and Venturini (CPM 2013) raised the question whether one can construct efficient indexes that support lookup queries with one error in optimal query time, that is, O(|p|/w + occ), where p is the query, w the machine word-size, and occ the number of occurrences.
Specifically, for the problem of one mismatch and constant alphabet size, we obtain optimal query time. For a dictionary of d strings our proposed index uses O(w d log^{1+eps}d) additional bit space (beyond the dictionary which can be maintained in compressed form). Our results are parameterized for a space-time tradeoff.
We propose more results for the case of lookup queries with one insertion/deletion on dictionaries over a constant sized alphabet. These results are especially effective for large patterns.
We present optimal deterministic algorithms for constructing shallow cuttings in an arrangement of lines in two dimensions or planes in three dimensions. Our results improve the deterministic polynomial-time algorithm of Matousek (1992) and the optimal but randomized algorithm of Ramos (1999). This leads to efficient derandomization of previous algorithms for numerous well-studied problems in computational geometry, including halfspace range reporting in 2-d and 3-d, k nearest neighbors search in 2-d, (<= k)-levels in 3-d, order-k Voronoi diagrams in 2-d, linear programming with k violations in 2-d, dynamic convex hulls in 3-d, dynamic nearest neighbor search in 2-d, convex layers (onion peeling) in 3-d, epsilon-nets for halfspace ranges in 3-d, and more. As a side product we also describe an optimal deterministic algorithm for constructing standard (non-shallow) cuttings in two dimensions, which is arguably simpler than the known optimal algorithms by Matousek (1991) and Chazelle (1993).
In the path minimum query problem, we preprocess a tree on n weighted nodes, such that given an arbitrary path, we can locate the node with the smallest weight along this path. We design novel succinct indices for this problem; one of our index structures supports queries in O(alpha(m,n)) time, and occupies O(m) bits of space in addition to the space required for the input tree, where m is an integer greater than or equal to n and alpha(m,n) is the inverse-Ackermann function. These indices give us the first succinct data structures for the path minimum problem, and allow us to obtain new data structures for path reporting queries, which report the nodes along a query path whose weights are within a query range. We achieve three different time/space tradeoffs for path reporting by designing (a) an O(n)-word structure with O(lg^eps n + occ lg^eps n) query time, where occ is the number of nodes reported; (b) an O(n lglg n)-word structure with O(lglg n + occ lglg n) query time; and (c) an O(n lg^eps n)- word structure with O(lglg n + occ) query time. These tradeoffs match the state of the art of two-dimensional orthogonal range reporting queries which can be treated as a special case of path reporting queries. When the number of distinct weights is much smaller than n, we further improve both the query time and the space cost of these three results.
We revisit a classical problem in computational geometry that has been studied since the 1980s: in the rectangle enclosure problem we want to report all k enclosing pairs of n input rectangles in 2D. We present the first deterministic algorithm that takes O(n log n + k) worst-case time and O(n) space in the word-RAM model. This improves previous deterministic algorithms with O((n log n + k) loglog n) running time. We achieve the result by derandomizing the algorithm of Chan, Larsen and Patrascu [SoCG'11] that attains the same time complexity but in expectation.
The 2D rectangle enclosure problem is related to the offline dominance range reporting problem in 4D, and our result leads to the currently fastest deterministic algorithm for offline dominance reporting in any constant dimension d >= 4.
A key tool behind Chan et al.'s previous randomized algorithm is shallow cuttings for 3D dominance ranges. Recently, Afshani and Tsakalidis [SODA'14] obtained a deterministic O(n log n)-time algorithm to construct such cuttings. We first present an improved deterministic construction algorithm that runs in O(n loglog n) time in the word-RAM; this result is of independent interest. Many additional ideas are then incorporated, including a linear-time algorithm for merging shallow cuttings and an algorithm for an offline tree point location problem.
We present three new results on one of the most basic problems in geometric data structures, 2-D orthogonal range counting. All the results are in the w-bit word RAM model.
We consider range queries in arrays that search for low-frequency elements: least frequent elements and alpha-minorities. An alpha-minority of a query range has multiplicity no greater than an alpha fraction of the elements in the range. Our data structure for the least frequent element range query problem requires O(n) space, O(n^{3/2}) preprocessing time, and O(sqrt{n}) query time. A reduction from boolean matrix multiplication to this problem shows the hardness of simultaneous improvements in both preprocessing time and query time. Our data structure for the alpha-minority range query problem requires O(n) space and O(1/alpha) query time, and allows alpha to be specified at query time.
A mode of a multiset S is an element a in S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1:n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query consists of an input pair of indices (i, j) for which a mode of A[i:j] must be returned. The best previous data structure with linear space, by Krizanc, Morin, and Smid (ISAAC 2003), requires O(sqrt(n) loglog n) query time. We improve their result and present an O(n)-space data structure that supports range mode queries in O(sqrt(n / log n)) worst-case time. Furthermore, we present strong evidence that a query time significantly below sqrt(n) cannot be achieved by purely combinatorial techniques; we show that boolean matrix multiplication of two sqrt(n) by sqrt(n) matrices reduces to n range mode queries in an array of size O(n). Additionally, we give linear-space data structures for orthogonal range mode in higher dimensions (queries in near O(n^(1-1/2d)) time) and for halfspace range mode in higher dimensions (queries in O(n^(1-1/d^2)) time).
We present a number of new results on one of the most extensively studied topics in computational geometry, orthogonal range searching. All our results are in the standard word RAM model:
The most recent previous development on (a) was reported back in SoCG'95 by Gupta, Janardan, Smid, and Dasgupta, whose main result was an O([n lg n + k] lglg n) algorithm. The best previous result on (b) was an O(n lg n lglg n) algorithm due to Gabow, Bentley, and Tarjan---from STOC'84! As a consequence, we also obtain the current-record time bound for the maxima problem in all constant dimensions above 4.
We revisit one of the most fundamental classes of data structure problems in computational geometry: range searching. Back in SoCG'92, Matousek gave a partition tree method for d-dimensional simplex range searching achieving O(n) space and O(n^{1-1/d}) query time. Although this method is generally believed to be optimal, it is complicated and requires O(n^{1+eps}) preprocessing time for any fixed eps > 0. An earlier method by Matousek (SoCG'91) requires O(n log n) preprocessing time but O(n^{1-1/d} polylog n) query time. We give a new method that achieves simultaneously O(n log n) preprocessing time, O(n) space, and O(n^{1-1/d}) query time with high probability. Our method has several advantages:
We give an O(n sqrt{lg n})-time algorithm for counting the number of inversions in a permutation on n elements. This improves a long-standing previous bound of O(n lg n/lg lg n) that followed from Dietz's data structure [WADS'89], and answers a question of Andersson and Petersson [SODA'95]. As Dietz's result is known to be optimal for the related dynamic rank problem, our result demonstrates a significant improvement in the offline setting. Our new technique is quite simple: we perform a "vertical partitioning" of a trie (akin to van Emde Boas trees), and use ideas from external memory. However, the technique finds numerous applications: for example, we obtain
As a bonus, we also give a simple (1+epsilon)-approximation algorithm for counting inversions that runs in linear time, improving the previous O(n lg lg n) bound by Andersson and Petersson.
We give the first optimal solution to a standard problem in computational geometry: three-dimensional halfspace range reporting. We show that n points in 3-d can be stored in a linear-space data structure so that all k points inside a query halfspace can be reported in O(log n + k) time. The data structure can be built in O(n log n) expected time. The previous methods with optimal query time required superlinear (O(n log log n)) space.
We also mention consequences, for example, to higher dimensions and to external-memory data structures. As an aside, we partially answer another open question concerning the crossing number in Matousek's shallow partition theorem in the 3-d case (a tool used in many known halfspace range reporting methods).
Improving previous methods by Aronov and Har-Peled (SODA'05) and Kaplan and Sharir (SODA'06), we present a randomized data structure of O(n) expected size which can answer 3D approximate halfspace range counting queries in O(log (n/k)) expected time, where k is the actual value of the count. This is the first optimal method for the problem in the standard decision tree model; moreover, unlike previous methods, the new method is Las Vegas instead of Monte Carlo. In addition, we describe new results for several related problems, including approximate Tukey depth queries in 3D, approximate regression depth queries in 2D, and approximate linear programming with violations in low dimensions.
In this paper we give a fully dynamic data structure to maintain the connectivity of the intersection graph of n axis-parallel rectangles. The amortized update time (insertion and deletion of rectangles) is O(n^{10/11} polylog n) and the query time (deciding whether two given rectangles are connected) is O(1). It slightly improves the update time (O(n^{0.94})) of the previous method while drastically reducing the query time (near O(n^{1/3})). Our method does not use fast matrix multiplication results and supports a wider range of queries.
The documents contained in this directory are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.