350 rub
Journal Highly available systems №1 for 2015 г.
Article in number:
Parallel implementation for solving optimization problems by means of databases
Authors:
V.I. Munerman - Ph. D. (Eng.), Associate Professor, Smolensk State University. E-mail: vimoon@gmail.com T.A. Samoylova - Ph. D. (Eng.), Associate Professor, Smolensk State University. E-mail: tatsam@hotbox.ru
Abstract:
The article describes a way to solve a wide range of optimization problems, whose common name would be «the shortest path problems». In practice, this class presents various problems. The weights of the edges of the graph can be elements of different Algebraic Systems. The choice of an Algebraic System is defined by the semantics of the domain, to which belongs the problem under consideration. Various algorithms for solving these problems are considered. The algorithm based on the computation of the transitive closure matrix of the weights of graph G, which is calculated as (Z is the null matrix), is selected. This algorithm is based on matrix multiplication and is therefore easily parallelized. But most of the advantages of that algorithm are lost in the case when G is a sparse matrix, in which most of the elements are neutral. The technology of storage and processing of sparse matrices is selected. That technology is based on the fact that in-memory computing system matrix is stored as a set of tuples of the form (i, j, w), where i, j are indexes of the element of the matrix, w is the weight of the edges between vertices i and j (the value of w is different from the additive neutral element of the Algebraic System, which belong to the weights of the edges). If you use such a representation of sparse matrices, they can be stored and processed in relational databases. In this case, the weight matrix of the graph can be given by a table with the scheme G(i, j, w). The query: SELECT Gk.i, G.j, Q(G.w×Gk.w) FROM Gk INNER JOIN G ON Gk.j = G.i GROUP BY G k.i, G.j; consequently performs all powers of table G, resulting in a sequence of tables with the scheme of Gk(i, j, w) (k = 1, ?, K). Q is a function that implements a group operation, such as Sum, Min, Max and the like. The program in the programming language Tranzact-SQL, which allows solving the problem under consideration, is shown. The basic operation used in solving the problem of the search of the shortest paths is JOIN operation. This operation has a large computational complexity. The time of its implementation on Big Data is large. A method is provided which allows speeding up the solution of the problem by a parallel implementing JOIN operation. This possibility is based on the principle of symmetric horizontal distribution of database tables. It is shown that the heuristic algorithm boustrophedon which distributes pair fragments tables Gk and G between calculators constituting firmware computing system can also be implemented by means of databases. The program in the programming language Tranzact-SQL, which implements the algorithm, is shown. The following conclusions are made: when solving optimization problems, which belong to a class of problems of the shortest path, it is possible to use methods based on databases; these methods are applied most effectively by using algorithms in the matrix when the weights matrix is spars; the modern data manipulation language can effectively implement the matrix algorithms and heuristic algorithms such as the boustrophedon algorithm; the approach based on the use of the symmetric horizontal distribution of data principle can effectively solve optimization problems on graphs, by constructing a virtual software-hardware- complexes for parallel implementation of mass data processing, specifically for each task.
Pages: 18-22
References

 

  1. Fletcher P., Hoyle H., Patty C.W. Foundations of Discrete Mathematics. Boston: PWS-KENTPub. Co. 1991. 781 p.
  2. Lidl R., Pilz G. Applied abstract algebra, 2nd edition. Undergraduate Texts in Mathematics. Springer. 1998. 488 p.
  3. Gross J.L., Yellen J. Handbook of Graph Theory, Discrete Mathematics and Its Applications. CRCPress. 2004. 779 p.
  4. Aho A., Ullman J.D., Hopcroft J.E. Data Structures and Algorithms. AddisonWesley. 1983.
  5. Agarwal R.C., Gustavson F.G., Zubair M. A High-Performance Matrix-Multiplication Algorithm on a Distributed-Memory Parallel Computer Using Overlapped Communication // IBM Journal of Research and Development. 1994. № 38(6). P. 673−681.
  6. Gupta H., Sadayappan P. Communication Efficient Matrix-Multiplication on Hypercubes. Technical Report // Stanford InfoLab. Publication Note: Sixth Annual ACM Symposium on Parallel Algorithms and Architectures. 1994. (SPAA 1994). ExtendedversioninParallelComputing. 1996. № 22. P. 75−99.
  7. Pissanetzky Sergio Sparse Matrix Technology. AcademicPress. 1984. 321 p.
  8. Munerman V.I. The Object-Oriented Model of Mass Data Processing // Highly available systems. 2011. V. 7. № 4. P. 72−74.
  9. Munerman V.I. A multidimesional matrix model of mass data processing // Highly available systems. 2012. V. 8. № 1. P. 19−22.
  10. Levin N.A., Munerman V.I. Models of big data processing in massively parallel systems // Highly available systems. 2013. V. 9. № 2. P. 36−39.
  11. Munerman V.I. Implementation of big data processing on symmetric multiprocessing systems // Highly available systems. 2013. V. 9. № 3. P. 35−43.
  12. Munerman V.I. The experience of massive data processing in the cloud using Windows Azure (as an example) // Highly available systems. 2014. V. 10. № 2. P. 3−8.