Approximate Nearest Neighbors

Distance-based Projection onto Euclidean Space (DPES)

Nearest Neighbors in Motion Planning: Research in motion planning has in recent years focused on sampling-based algorithms, such as PRM and RRT, for solving problems involving multiple and highly complex robots. Such algorithms use nearest-neighbors computations to create roadmaps or trees that effectively search for a collision-free path to the goal.

Nearest Neighbors in Nonlinear Dimensionality Reduction: Nonlinear dimensionality reduction methods often rely on the nearest-neighbors graph to extract low-dimensional embeddings that reliably capture the underlying structure of high-dimensional data.

Impracticality of Nearest-Neighbors Algorithms for High-Dimensional Data: Although researchers have developed many nearest-neighbors algorithms, such as the kd-tree, R-tree, X-tree, SR-tree, M-tree, VP-tree, GNAT, cover tree, and iDistance, analysis has shown that computing the nearest neighbors of a point in a high-dimensional data set generally requires time proportional to the size of the data set itself, rendering the computation of nearest neighbors prohibitively expensive.

Approach: Approximate Nearest Neighbors: This work alleviates the major computational bottleneck of many motion-planning and nonlinear dimensionality reduction methods by efficiently and accurately approximating the nearest neighbors. The approximation projects the data set S onto a low-dimensional Euclidean space according to the distances between points in S and a set of pivots P ⊂ S. To maximize the correlation between distances of data points and distances of the corresponding projections, a hill-climbing approach is used to select the pivots. Furthermore, distances from data points to pivots are transformed using landmark multi-dimensional scaling (MDS) in order to minimize the distortion introduced by projecting high-dimensional data onto a low-dimensional Euclidean space. As the experiments indicate, the proposed approximation significantly reduces computational time while achieving accuracy similar to that of exact nearest neighbors.

Algorithm Details

Projection of High-Dimensional Data onto Low-Dimensional Euclidean Space

Input: S = {s1,...,sn}, a set of data points
       d : S x S --> R, distance metric
       m, dimension of Euclidean space
       h, number of pivots

Output: E(S) = {e(s1),...,e(sn)}, projection of S onto R^m

1: P <-- select h pivot points from S
2: D <-- an n x h matrix
3: for each si in S do
4:  for each pj in P do
5:   D(i,j) = d(si, pj)
6:  end for
7: end for
8: E(S) <-- LandmarkMDS(D)  
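
The listing above leaves LandmarkMDS abstract. Below is a minimal Python/NumPy sketch of one way to realize it, following the standard landmark MDS construction (classical MDS on the pivot-pivot distances, then distance-based triangulation of the remaining points). The function name landmark_mds, the argument pivot_rows, and the returned Lsharp/delta_mean pair are illustrative, not the original implementation:

import numpy as np

def landmark_mds(D, pivot_rows, m):
    """Sketch of Landmark MDS for the projection step.

    D          -- (n, h) matrix with D[i, j] = d(s_i, p_j)
    pivot_rows -- indices of the rows of D that correspond to the pivots
                  themselves (P is a subset of S, so the pivot-pivot
                  distances are already contained in D)
    m          -- target dimension of the Euclidean space

    Returns the embedding E(S) together with the factor Lsharp and the
    mean squared pivot distances delta_mean, which are needed later to
    project query points the same way.
    """
    Delta = D[pivot_rows] ** 2                    # (h, h) squared pivot distances
    h = len(pivot_rows)
    J = np.eye(h) - np.ones((h, h)) / h           # double-centering matrix
    B = -0.5 * J @ Delta @ J                      # Gram matrix of the pivots
    lam, V = np.linalg.eigh(B)                    # eigenvalues in ascending order
    lam, V = lam[-m:][::-1], V[:, -m:][:, ::-1]   # keep the top m components
    Lsharp = V / np.sqrt(np.maximum(lam, 1e-12))  # guard small/negative eigenvalues
    delta_mean = Delta.mean(axis=0)               # mean squared distance to each pivot
    E = -0.5 * (D ** 2 - delta_mean) @ Lsharp     # triangulate all n points into R^m
    return E, Lsharp, delta_mean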

Computation of Approximate Nearest Neighbors

Input: s, query point
       k, number of approximate neighbors
       K, number of neighbors in Euclidean space (K > k)

Output: k approximate neighbors for the point s from S according to d

1: e(s) <-- projection of s onto R^m
2: A <-- K nearest neighbors from E(S) to e(s) according to Euclidean distance in R^m
3: B <-- k nearest neighbors to s from A according to distance metric d
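
A corresponding Python sketch of this query procedure, continuing the hypothetical names from the projection sketch above (E, Lsharp, delta_mean), might look as follows; the brute-force Euclidean search in step 2 is for clarity and can be replaced by any exact Euclidean nearest-neighbors structure, such as a kd-tree:

def approx_knn(s, S, E, dist, pivots, Lsharp, delta_mean, k, K):
    # Line 1: project the query point with the same pivot triangulation
    # that produced E(S).
    delta = np.array([dist(s, p) for p in pivots]) ** 2
    e_s = -0.5 * (delta - delta_mean) @ Lsharp
    # Line 2: K candidate neighbors of e(s) under the Euclidean distance
    # in R^m (brute force here).
    cand = np.argpartition(np.linalg.norm(E - e_s, axis=1), K)[:K]
    # Line 3: keep the k candidates closest to s under the true metric d.
    true_d = np.array([dist(s, S[i]) for i in cand])
    return cand[np.argsort(true_d)[:k]]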

Projection of high-dimensional data onto a low-dimensional Euclidean space: The objective is to find a projection E(S) that can be efficiently computed while preserving the relative distances between points in the data set S, i.e., the projections of points in S that are close according to the distance metric d should be close according to the Euclidean distance in R^m. This is achieved by carefully selecting a set of pivots P ⊂ S, computing the distances D(i,j) = d(si, pj) between each point si ∈ S and each pivot pj ∈ P, and then using landmark multi-dimensional scaling (MDS) to obtain the projection E(S) while minimizing distortion.

Selection of pivots: The selection of pivots greatly affects how well E(S) preserves the relative distances of points in S. The selection scheme developed in this work attempts to maximize the correlation between distances according to d for any two points s', s'' ∈ S and distances according to the Euclidean metric in R^m for e(s') and e(s''). If such distances correlate well, then the distortion resulting from the projection of S onto E(S) is small.

A strategy that works well in practice is to select the first pivot p1 uniformly at random from S and to select each subsequent pivot pj as the point in S that maximizes min_{i=1,...,j-1} d(pi, pj).
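
As a sketch, this maximin selection can be implemented in a few lines of NumPy (select_pivots is an illustrative name; dist stands for the metric d):

def select_pivots(S, dist, h, rng=np.random.default_rng()):
    # Maximin selection: p1 uniformly at random, then each subsequent
    # pivot is the point of S farthest (under d) from the pivots chosen
    # so far, i.e., the point maximizing min_{i} d(pi, .).
    piv = [int(rng.integers(len(S)))]
    min_d = np.array([dist(S[piv[0]], s) for s in S])
    for _ in range(h - 1):
        piv.append(int(np.argmax(min_d)))
        min_d = np.minimum(min_d, [dist(S[piv[-1]], s) for s in S])
    return piv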

We also propose a hill-climbing strategy for improving the initial selection of pivots. Let C denote a set of candidate pivots selected from S, where |C| > h. Initially, each pivot in P is selected uniformly at random from C. At each iteration, a pivot in P is replaced by the point in C that maximizes the correlation between the distances d(pi, pj) and ||e(pi) - e(pj)||. This process is repeated until no significant improvement in the correlation occurs or a maximum number of iterations is exceeded.
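
A sketch of this hill-climbing refinement, reusing landmark_mds from above, is shown below; the replacement schedule (round-robin over pivot slots) and the stopping tolerance are assumptions made for illustration, as the text does not fix them:

def pivot_correlation(P_idx, S, dist, m):
    # Pearson correlation between d(pi, pj) and ||e(pi) - e(pj)|| over all
    # pivot pairs, for the embedding induced by the pivots P_idx.
    D = np.array([[dist(S[i], S[j]) for j in P_idx] for i in P_idx])
    E, _, _ = landmark_mds(D, np.arange(len(P_idx)), m)
    iu = np.triu_indices(len(P_idx), 1)
    emb = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=2)
    return np.corrcoef(D[iu], emb[iu])[0, 1]

def hill_climb_pivots(S, dist, C_idx, h, m, max_iter=100, tol=1e-3,
                      rng=np.random.default_rng()):
    P = list(rng.choice(C_idx, size=h, replace=False))  # random initial pivots from C
    best = pivot_correlation(P, S, dist, m)
    for it in range(max_iter):
        slot = it % h                                   # pivot to replace this round
        score, c = max((pivot_correlation(P[:slot] + [c] + P[slot+1:], S, dist, m), c)
                       for c in C_idx if c not in P)
        if score <= best + tol:                         # no significant improvement
            break
        P[slot], best = c, score
    return P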

Computation of approximate nearest neighbors: The low-dimensional Euclidean projection E(S) is then used to compute approximate nearest neighbors. For a query point s, its low-dimensional Euclidean projection e(s) is computed first. Then, the k approximate nearest neighbors of s from the data set S according to the distance metric d are computed in two steps:

  • compute K > k candidate neighbors as the K exact nearest neighbors of e(s) from E(S) according to the Euclidean distance in R^m; and
  • select the k closest points to s according to d from the K candidate neighbors.

When the projection E(S) preserves the relative distances of points in S, it also preserves the relative ordering of points in S. Consequently, if a point s' is one of the k nearest neighbors of a query point s, then most likely e(s') is also one of the k nearest neighbors of e(s). The computation of K > k candidate neighbors further improves the accuracy, since the likelihood that e(s') is not one of the K nearest neighbors of e(s) is even smaller.
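
Putting the sketches together on a toy problem (all names here come from the illustrative code above, not from the original implementation):

rng = np.random.default_rng(0)
S = rng.normal(size=(2000, 60))                   # 2000 points in R^60
dist = lambda a, b: float(np.linalg.norm(a - b))  # stand-in for an expensive metric d
piv = select_pivots(S, dist, h=25, rng=rng)
D = np.array([[dist(s, S[j]) for j in piv] for s in S])
E, Lsharp, delta_mean = landmark_mds(D, piv, m=10)
q = rng.normal(size=60)
neighbors = approx_knn(q, S, E, dist, S[piv], Lsharp, delta_mean, k=10, K=40)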

Related Publications

  • Plaku E and Kavraki LE (2007): "Nonlinear Dimensionality Reduction Using Approximate Nearest Neighbors." SIAM International Conference on Data Mining, pp. 180--191.
  • Plaku E and Kavraki LE (2008): "Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning." Springer Tracts in Advanced Robotics, vol. 47, pp. 3--18.
  • Plaku E, Stamati H, Clementi C, and Kavraki LE (2007): "Fast and Reliable Analysis of Molecular Motion Using Proximity Relations and Dimensionality Reduction." Proteins: Structure, Function, and Bioinformatics, vol. 67(4), pp. 897--907.