文章基本信息

标题：A modified Fastmap algorithm.
作者：Radulescu, Florin ; Boicea, Alexandru
期刊名称：Annals of DAAAM & Proceedings
印刷版ISSN：1726-9679
出版年度：2008
期号：January
语种：English
出版社：DAAAM International Vienna
摘要：Local searches involve finding a resource in a specified geographical area. This kind of searches account already more than 10 percent of all Internet queries and are expected to grow (Helft, 2006).

A modified Fastmap algorithm.

Radulescu, Florin ; Boicea, Alexandru

1. INTRODUCTION

Local searches involve finding a resource in a specified geographical area. This kind of searches account already more than 10 percent of all Internet queries and are expected to grow (Helft, 2006).

These approaches are almost all based on real maps. In (Radulescu, 2007) we described a simple alternative method for local searches. This method uses a graph as a simplified map for a city or town and a simple distance estimation method from a specified point to a resource for ordering the search result from 'near' to 'far' without using real maps and visual interfaces.

A real map is actually like a graph and there are two ways to code a map in a graph:

1) Street segments as graph edges

2) Streets as nodes: every street is a node and every intersection between any two streets is an edge.

The second modeling approach was selected in this paper for the simplified map of a city because it leads to a much simple representation.

In such a graph the distance between any two nodes may be defined as the length of the minimal path between them. For equal-weighted edges (the length of each edge is 1), the meaning is "how many streets away is one street from another".

Storing the graph or all distances between any two streets is either slow or memory consuming. The idea investigated in this paper was to create a Euclidian space knowing only the distances between any two points. This operation is called multidimensional scaling (Torgerson, 1952). In that case every street may be represented by few coordinates and so the distance between any two streets can be easily computed.

There are many algorithms for multidimensional scaling: FastMap (Faloutsos & Lin, 1995; Yang et al., 2006), MetricMap and Landmark MDS (LMDS). These algorithms approximate classical MDS using a subset of the data and fitting the remainder to the solution.

2. FASTMAP

2.1 Classical Fastmap

For our approach FastMap was selected to be used for multidimensional scaling of the simplified map. It is a recursive algorithm and at each step the following operations are done:

1) Two distant points (a, b)--nodes in the graph in our case are selected as an axis.

2) For every point c compute the coordinate x for this axis using the generalized Pythagoras theorem (also known as the law of cosine):

x = ([D.sup.2] (a, c) + [D.sup.2] (a, b) - [D.sup.2] (b, c))/ (2*D (a, b)) (1)

3) Use for further axis not the original distances D between points but the remainder D' after subtracting the distance with respect with the coordinates already computed.

In Fig. 1, (a, b) are the two points selected as axis, (x, y) are the coordinates of c and d for this axis (the first axis point a being the origin) and D and D' are the distances between c and d before and after computing the current axis coordinates:

[D'.sup.2] = [D.sup.2] - [(x - y).sup.2] (2)

This process stops after computing the desired number of coordinates for every point or no more axes can be found.

2.2 Problems with real graphs

For real graphs the problem is that the distance matrix is not a Euclidian one, so when we use (2) for computing D' we obtain for some pairs of nodes a negative value for square D'. In that case the only way to continue is to assume D' is 0.

But this assumption leads to propagated errors in computing real good coordinates for our purpose. The algorithm was run with this assumption on a 2000 nodes matrix and the optimal number of axes was 18 (too much) and also the average differences between real distances and computed distances (computed from the resulting coordinates) was high: 0.78 after 18 steps, as shown in Fig. 2.

2.3 Modified Fastmap

For solving this problem we tried different types of adjustments, modifying the original FastMap algorithm. One of them had very good results.

The modified algorithm is the following. The third step was introduced in order to recalculate the distance matrix and have integer distances between nodes.

Step 1: Search for a pair of points (a, b) with maximum distance between them. These points are selected as an axis.

Step 2: For every point c compute the coordinate x for this axis using the generalized Pythagoras theorem (1).

Step 3: For every pair of points c and d (Fig. 3.) recalculate the distance between c and d using:

[D.sup.2](c, d) = INT ([D.sup.2](c, e) + [D.sup.2](e, d)) (3)

Step 4: Use for further axis use D' as in the original algorithm, using (2), but D' is now computed starting from the modified distance matrix.

D(c, e) can be obtained directly from the coordinates of c and d for the current axis.

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

D(e, d) = |D(d, d') - D(c, c')| and the two distances from the right side can be obtained using (1) for the two square triangles (a, c, c') and (b, d, d') where all two other edges are known.

3. EXPERIMENTAL RESULTS

This proposed algorithm was used on 12 distance matrices containing random generated data:

* 4 matrices for a graph with 1000 nodes and 4 edges for every node. In Table 1 the files are labeled F1-F4.

* 4 matrices for a graph with 2000 nodes and 4 edges for every node. In Table 1 the files are labeled F5-F8.

* 4 matrices for a graph with 2000 nodes and 3 edges for every node in Table 1 the files are labeled F9-F12.

Table 1 contains the results of running the algorithm for all 12 matrices and for only two steps (first two coordinates).

The table columns are the following: File: The file with the distance matrix. AvgErr: The average of Abs(Dreal(x, y) - Dcomputed(x, y)) over all pairs of nodes (x, y). Dcomputed(x, y) is the distance between nodes x and y resulting from their coordinates using the Euclidian distance (1). Diff 0: The number of matrix positions where | Dreal(x, y) - Dcomputed(x, y) | = 0. Diff 1: The number of matrix positions where | Dreal(x, y) - Dcomputed(x, y) | = 1. Diff >1: The number of matrix positions where | Dreal(x, y) - Dcomputed(x, y) | > 1. %0-1: The percentage of 0 and 1 difference values from the total: (Diff0 + Diff1) / (Diff) + Diff1 + Diff>1).

The results are clear: for over 95% of the graph node pairs, if the described algorithm is used for finding the first two coordinates and then this coordinates are used for computing the distance between the two nodes, the distance is the same or almost the same (0 or 1) with the original (real) distance. These first two coordinates can be pre-computed and stored. More than that, for some graphs this percentage is over 98-99%.

[FIGURE 4 OMITTED]

For a comparison between the original and modified algorithm, Fig. 4 presents the graphic for both (update for Fig. 2, for the same dataset).

The modified algorithm stops after only 6 steps but only first two coordinates are enough (and optimal). The original algorithm has a minimum after 18 steps and this is minimum is greater.

4. CONCLUSIONS

The results show that using this modified Fastmap algorithm we can associate with every node in a graph two coordinates that leads to a good approximation of the real distance between them. For many applications (including search engines that order the results form 'near' to 'far') it is then not necessary to store an entire graph distance matrix but only these few coordinates because the time for calculating any distance using Euclidian distance formula is reasonable short.

5. REFERENCES

Helft, M. (2006), The Retooling of a Search Engine, New York Times, December 4th 2006.

Radulescu, F. (2007). "Locality Sensitive Search: A Fast Approach", Proceedings of the CSCS16, Bucharest, May 2007, vol. 2, pp. 192-195.

Torgerson, W.S. (1952). Multidimensional Scaling: Theory and Method, Psychometrika, vol 17, 1952, pp. 401-419.

Christos F. & Lin K.I. (1995). FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets, Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, May, 1995.

Yang, T., Liu, J., McMillan, L., Wang, W. (2006). A Fast Approximation to Multidimensional Scaling, Proceedings of the ECCV Workshop on Computation Intensive Methods for Computer Vision, Graz, Austria, May 2006.

Tab 1. Experimental results

File AvgErr Diff 0 Diff 1

 F1 0.17 419.680 77.024
 F2 0.16 423.536 74.221
 F3 0.17 416.155 82.031
 F4 0.16 423.264 74.831
 F5 0.20 1.604.398 384.910
 F6 0.34 1.431.513 472.224
 F7 0.18 1.645.942 344.927
 F8 0.37 1.387.147 493.416
 F9 0.29 1.440.510 538.753
F10 0.28 1.495.317 457.619
F11 0.28 1.456.455 527.476
F12 0.30 1.419.806 559.414

File Diff > 1 % 0-1

 F1 3.796 99.24
 F2 2.743 99.45
 F3 2.314 99.54
 F4 2.405 99.52
 F5 11.692 99.42
 F6 97.263 95.14
 F7 10.131 99.49
 F8 120.437 93.98
 F9 21.737 98.91
F10 48.064 97.60
F11 17.069 99.15
F12 21.780 98.91