Distributed k-Nearest-Neighbors Graph (DKNNG)
As research in robot motion planning, biology, data mining, geographic information systems, and other scientific fields progressively addresses problems of unprecedented complexity, the demand for computing nearest-neighbors graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to single machines.
This work address the problem of computing the nearest-neighbors graph utilizing multiple processors communicating via message passing in a cluster system with no-shared memory and where the amount of memory available to each processor is limited.
DKNNG supports computation of very large nearest-neighbors graphs consisting of millions of points with hundreds of dimensions. The efficiency of DKNNG derives in part from a careful prioritization and handling of requests between processors and a systematic exploitation of properties of knn data structures. Experimental results show nearly linear speedup on 140 CPUs and indicate that similar speedup can be obtained on several hundred CPUs.
Input: Si : set of points stored in Pi Output: k nearest neighbors for each point in Si
Computation by processor Pi. All communications are done asynchronously.
1: initialize cache Ci 14: request results from P' 2: construct knn data structure Di 15: if received points then 3: while computing G(S,k) is not over do 16: if cache Ci is full then 4: if cache Ci is not empty then 17: create room in Ci 5: select and remove one point s from Ci 18: add received points to Ci 6: compute query Di(s, k) 19: if received results then 7: send results to owner of s 20: update results 8: if cache Ci is not full then 21: if no pending requests then 9: post request to fill Ci 22: select one point s from Si 10: if received "cache is not full" then 23: compute query Di(s, k) 11: P' <-- processors posting the request 24: update results 12: select points from Si to send to P' 25: end while 13: send the selected points to P'
Model of computation: Processors communicate via message passing and there is no-shared memory available. There are also restrictions on the memory available to each processor P1,...,Pp. The data set S={s1,...,sn} is partitioned into subsets S1,...,Sp, where each Si is localy stored in Pi. This model of computation is particularly well-suited for many scientific applications such as robot motion planning, biological applications, etc., which often generate very large data sets and require the computation of the knn graph for such data sets. Such applications are computationally intensive and the computation of the knn graph G(S, k) constitutes only one stage of the overall computation. Therefore, effective distribution schemes, as the one developed in this work, should use most of the main memory of each processor to store as much of the data set as possible.
Local nearest neighbors: Each processor Pi constructs a knn data structure Di for the computation of the local k-nearest neighbors from the set Si for any query point s ∈ S.
Data dependencies: The computation of the knn query knn(s, k) for a point s ∈ Si requires the combination of results obtained from querying the knn data structures D1,...,Dp associated with the processors P1,...,Pp, respectively, since the data set S is partitioned into S1,...,Sp.
Distribution of computation: The computation of knn queries for points owned by processor Pi requires no communication, while the computation of knn queries for points owned by other processors requires communication. Each processor Pi has a limited cache Ci, thus only a small number of points from other processors can be stored in Ci. In order to accommodate as many points from other processors as possible, it is important that Pi empties its cache Ci quickly. Hence, before computing any knn queries for points it owns, Pi first computes any knn queries pending in the cache Ci. Furthermore, in order to minimize the idle or waiting time of other processors, processor Pi also handles any communication requests made by other processors before computing any knn queries for the points it owns. In this way, Pi gives higher priority to requests from other processors and computes knn queries for points it owns only when there are no pending requests from other processors. The overall effect of this schema is shorter idle and waiting times for each processor which translates into better utilization of resources for useful computations.
- Plaku E and Kavraki LE (2007): "Distributed Computation of the knn Graph for Large High-Dimensional Point Sets." Journal of Parallel and Distributed Computing, vol. 67(3), pp. 346--359 [publisher] [preprint]