Approximate Nearest Neighbors
Distance-based Projection onto Euclidean Space (DPES)
Nearest Neighbors in Motion Planning: Research in motion planning has in recent years focused on sampling-based algorithms, such as PRM and RRT, for solving problems involving multiple, highly complex robots. Such algorithms use nearest-neighbors computations to create roadmaps or trees in order to effectively search for a collision-free path to the goal.
Nearest Neighbors in Nonlinear Dimensionality Reduction: Nonlinear dimensionality reduction methods often rely on the nearest-neighbors graph to extract low-dimensional embeddings that reliably capture the underlying structure of high-dimensional data.
Impracticality of Nearest-Neighbors Algorithms for High-Dimensional Data: Although researchers have developed many nearest-neighbors algorithms, such as the kd-tree, R-tree, X-tree, SR-tree, M-tree, VP-tree, GNAT, CoverTree, and iDistance, analysis has shown that computing the nearest neighbors of a point in a high-dimensional data set generally requires time proportional to the size of the data set itself, rendering the computation of the nearest neighbors prohibitively expensive.
Approach: Approximate Nearest Neighbors: This work significantly reduces the major computational bottleneck of many motion-planning and nonlinear dimensionality-reduction methods by efficiently and accurately approximating the nearest neighbors. The approximation projects the data set S onto a low-dimensional Euclidean space according to distances between points in S and a set of pivots P ⊂ S. To maximize the correlation between distances of data points and distances of the corresponding projections, a hill-climbing approach is used to select the pivots. Furthermore, distances from data points to pivots are transformed using landmark multi-dimensional scaling (MDS) in order to minimize the distortion of distances resulting from the projection of high-dimensional data onto a low-dimensional Euclidean space. As the experiments indicate, the proposed approximation significantly reduces computational time while yielding accuracy similar to that of exact nearest-neighbors computation.
Projection of High-Dimensional Data onto Low-Dimensional Euclidean Space
Input:
  S = {s1, ..., sn}, a set of data points
  d : S x S --> R, distance metric
  m, dimension of the Euclidean space
  h, number of pivots
Output: E(S) = {e(s1), ..., e(sn)}, projection of S onto R^m
1: P <-- select h pivot points from S
2: D <-- an n x h matrix
3: for each si in S do
4:   for each pj in P do
5:     D(i, j) = d(si, pj)
6:   end for
7: end for
8: E(S) <-- LandmarkMDS(D)

Computation of Approximate Nearest Neighbors
Input:
  s, query point
  k, number of approximate neighbors
  K, number of neighbors in the Euclidean space (K > k)
Output: k approximate neighbors of the point s from S according to d
1: e(s) <-- projection of s onto R^m
2: A <-- K nearest neighbors of e(s) from E(S) according to the Euclidean distance in R^m
3: B <-- k nearest neighbors of s from A according to the distance metric d
Projection of high-dimensional data onto a low-dimensional Euclidean space: The objective is to find a projection E(S) that can be efficiently computed while preserving the relative distances between points in the data set S, i.e., the projections of points in S that are close according to the distance metric d should be close according to the Euclidean distance in R^{m}. This is achieved by carefully selecting a set of pivots P ⊂ S, computing the distances D(i,j) = d(s_{i}, p_{j}) between each point s_{i} ∈ S and each pivot p_{j} ∈ P, and then using landmark multi-dimensional scaling (MDS) to obtain the projection E(S) while minimizing distortion.
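The landmark-MDS step above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it runs classical MDS on the pivot-to-pivot distances and then triangulates every point from its squared distances to the pivots; function and variable names are our own.

```python
import numpy as np

def landmark_mds_projection(D_pivots, D, m):
    """Project points onto R^m given their distances D (n x h) to h pivots
    and the pivot-to-pivot distances D_pivots (h x h)."""
    h = D_pivots.shape[0]
    # Classical MDS on the pivots: double-center the squared distances.
    J = np.eye(h) - np.ones((h, h)) / h
    B = -0.5 * J @ (D_pivots ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:m]        # keep the m largest eigenvalues
    lam = np.maximum(vals[top], 1e-12)      # guard against numerical noise
    V = vecs[:, top]
    # Triangulate each point from its squared distances to the pivots.
    L_sharp = V / np.sqrt(lam)              # h x m
    delta_mean = (D_pivots ** 2).mean(axis=0)
    return -0.5 * (D ** 2 - delta_mean) @ L_sharp   # n x m projection E(S)
```

For data that is already Euclidean (e.g. points on a line with the absolute-difference metric), this projection reproduces the pairwise distances exactly, up to sign and rotation.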
Selection of pivots: The selection of pivots greatly affects how well E(S) preserves the relative distances of points in S. The selection scheme developed in this work attempts to maximize the correlation between distances according to d for any two points s', s'' ∈ S and distances according to the Euclidean metric in R^{m} for e(s') and e(s''). If such distances correlate well, then the distortion resulting from the projection of S onto E(S) is small.
A strategy that works well in practice is to select the first pivot p_{1} uniformly at random from S and select each other pivot p_{j} as the point in S that maximizes min_{i=1,...,j-1}d(p_{i}, p_{j}).
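The greedy max-min strategy can be sketched as follows; this is an illustrative Python version under the assumption that S is a sequence of points and d a callable metric (names are our own).

```python
import numpy as np

def select_pivots_maxmin(S, d, h, seed=None):
    """Greedy max-min selection: pick the first pivot at random, then
    repeatedly pick the point whose distance to its nearest already-chosen
    pivot is largest."""
    rng = np.random.default_rng(seed)
    pivots = [int(rng.integers(len(S)))]
    # min_dist[i] holds the distance from S[i] to its nearest selected pivot.
    min_dist = np.array([d(s, S[pivots[0]]) for s in S])
    while len(pivots) < h:
        nxt = int(np.argmax(min_dist))      # farthest-from-pivots point
        pivots.append(nxt)
        min_dist = np.minimum(min_dist, [d(s, S[nxt]) for s in S])
    return pivots
```

Maintaining min_dist incrementally makes the selection O(nh) distance evaluations, which tends to place pivots near the extremes of the data set.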
This work also proposes a hill-climbing strategy for improving the initial selection of pivots. Let C denote a set of candidate pivots selected from S, where |C| > h. Initially each pivot in P is selected uniformly at random from C. At each iteration, a pivot in P is replaced by the point in C that maximizes the correlation between the distances d(p_{i}, p_{j}) and ||e(p_{i}) - e(p_{j})||. This process is repeated until no significant improvement in the correlation occurs or a maximum number of iterations is reached.
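A simplified sketch of the hill-climbing loop is given below. To keep it short, the correlation objective is computed on the raw distance-to-pivot coordinates rather than the landmark-MDS projection, and the correlation is measured over a sample of data pairs; both are our simplifications, and all names are illustrative.

```python
import numpy as np

def pivot_score(sample, d, pivots, S):
    """Pearson correlation between true distances and Euclidean distances
    in the distance-to-pivot coordinates (MDS step omitted for brevity)."""
    E = np.array([[d(s, S[p]) for p in pivots] for s in sample])
    true_d, proj_d = [], []
    for i in range(len(sample)):
        for j in range(i + 1, len(sample)):
            true_d.append(d(sample[i], sample[j]))
            proj_d.append(np.linalg.norm(E[i] - E[j]))
    return np.corrcoef(true_d, proj_d)[0, 1]

def hill_climb_pivots(S, d, C, h, max_iters=10, seed=0):
    """Start from h random pivots drawn from the candidate set C (indices
    into S) and greedily swap single pivots while correlation improves."""
    rng = np.random.default_rng(seed)
    P = [int(p) for p in rng.choice(C, size=h, replace=False)]
    sample = [S[int(i)] for i in
              rng.choice(len(S), size=min(20, len(S)), replace=False)]
    best = pivot_score(sample, d, P, S)
    for _ in range(max_iters):
        improved = False
        for k in range(h):
            for c in C:
                if c in P:
                    continue
                trial = P[:k] + [c] + P[k + 1:]
                score = pivot_score(sample, d, trial, S)
                if score > best + 1e-6:    # accept only clear improvements
                    P, best, improved = trial, score, True
        if not improved:
            break
    return P, best
```

Greedy single-pivot swaps can stop at a local optimum, which is why the prose above terminates on "no significant improvement" rather than a global criterion.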
Computation of approximate nearest neighbors: The low-dimensional Euclidean projection E(S) is then used to compute approximate nearest neighbors. For a query point s, its low-dimensional Euclidean projection e(s) is computed first. Then, k approximate nearest neighbors of s from the data set S according to the distance metric d are computed in two steps:
- compute K > k candidate neighbors as the K exact nearest neighbors of e(s) from E(S) according to the Euclidean distance metric in R^{m}; and
- select the k closest points to s according to d from the K candidate neighbors.
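The two-step query can be sketched as follows; `embed` stands in for the projection e(s) of the query, and all names are illustrative rather than the authors' API.

```python
import numpy as np

def approx_knn(query, S, E, embed, d, k, K):
    """Two-step approximate k-NN: filter K candidates by Euclidean distance
    in the projected space E (n x m), then refine with the true metric d."""
    e_q = embed(query)                        # project the query onto R^m
    dists = np.linalg.norm(E - e_q, axis=1)   # Euclidean distances in R^m
    candidates = np.argsort(dists)[:K]        # step 1: K candidate indices
    refined = sorted(candidates, key=lambda i: d(query, S[i]))
    return [int(i) for i in refined[:k]]      # step 2: k closest under d
```

Only the K candidate points are ever evaluated under the (potentially expensive) metric d, which is where the speedup over exact search comes from; K trades accuracy for time.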
- Plaku E and Kavraki LE (2007): "Nonlinear Dimensionality Reduction Using Approximate Nearest Neighbors." SIAM International Conference on Data Mining, pp. 180--191 [publisher] [preprint]
- Plaku E and Kavraki LE (2008): "Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning." Springer Tracts in Advanced Robotics, vol. 47, pp. 3--18 [publisher] [preprint]
- Plaku E, Stamati H, Clementi C, and Kavraki LE (2007): "Fast and Reliable Analysis of Molecular Motion Using Proximity Relations and Dimensionality Reduction." Proteins: Structure, Function, and Bioinformatics, vol. 67(4), pp. 897--907 [publisher] [preprint]