V.I. Munerman – Ph. D. (Eng.), Associate Professor, Department of Informatics, Smolensk State University
This article considers two methods for increasing the efficiency of data processing when deriving association rules. Unlike most work in this field, which proposes methods that serve the needs of end users engaged in data analysis, it proposes methods aimed at the programmers who develop analytical information systems.
The derivation of association rules consists of two stages:
1. Computing, for each subset of properties, the number of objects that have all the properties of this subset, and only these properties.
2. Deriving association rules from the resulting statistics and the requirements of the user-analyst.
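The first stage can be illustrated with a minimal Python sketch. It is not the implementation from the article: the function names and the toy data are assumptions made for illustration. It computes two statistics, because the stage description can be read either way: `exact_counts` counts objects whose property set is exactly the given subset ("and only these properties"), while `support_counts` counts objects that have at least all properties of the subset, which is the usual support count in association rule mining.

```python
from collections import Counter
from itertools import combinations


def exact_counts(objects):
    """Count objects whose property set equals a given subset exactly."""
    return Counter(frozenset(props) for props in objects.values())


def support_counts(objects, max_size=None):
    """Count, for each subset, the objects that have all of its properties."""
    counts = Counter()
    for props in objects.values():
        items = sorted(props)
        limit = len(items) if max_size is None else min(max_size, len(items))
        # Every non-empty subset of an object's properties is supported by it.
        for k in range(1, limit + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    return counts


# Hypothetical toy data: object id -> set of properties.
objects = {
    1: {"a", "b"},
    2: {"a", "b", "c"},
    3: {"a"},
    4: {"a", "b"},
}

exact = exact_counts(objects)
support = support_counts(objects)

print(exact[frozenset({"a", "b"})])    # 2 (objects 1 and 4)
print(support[frozenset({"a", "b"})])  # 3 (objects 1, 2 and 4)
```

Enumerating every subset grows exponentially with the number of properties per object, which is one way to see why this stage is the expensive one; the optional `max_size` parameter is an assumed convenience for bounding that growth.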
We consider the first stage, which prepares the data and has high computational complexity. It can be accelerated by applying a symmetric horizontal distribution of the source data and a pipelined execution of the chain of JOIN operations. This is possible because the data and operations are represented by means of the file model. We also show that the data can be represented as multidimensional matrices over which a sequence of multiplication operations is performed.
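The two acceleration techniques can be sketched in Python as follows. This is one possible reading and not the file-model implementation from the article: it assumes that "symmetric horizontal distribution" means splitting every table into fragments by the same function of the join key, so that fragments with the same index can be joined independently, and it models the pipeline with generators. All function names, the key name `obj`, and the sample tables are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows into n_parts fragments by the value of the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def hash_join(left, right, key):
    """Stream `left` through an index built on `right`, yielding rows lazily."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    for l in left:
        for r in index.get(l[key], ()):
            yield {**l, **r}


def join_chain(first, others, key):
    """Chain JOINs as nested generators; intermediate results are not stored."""
    stream = iter(first)
    for table in others:
        stream = hash_join(stream, table, key)
    return stream


def distributed_join_chain(tables, key, n_parts):
    """Join fragments with the same index; each index could run on its own worker."""
    fragments = [partition(t, key, n_parts) for t in tables]
    result = []
    for i in range(n_parts):
        first, *others = (frag[i] for frag in fragments)
        result.extend(join_chain(first, others, key))
    return result


# Hypothetical tables: one per property, listing the objects that have it.
p1 = [{"obj": 1, "p1": 1}, {"obj": 2, "p1": 1}, {"obj": 3, "p1": 1}]
p2 = [{"obj": 1, "p2": 1}, {"obj": 2, "p2": 1}]
p3 = [{"obj": 2, "p3": 1}, {"obj": 3, "p3": 1}]

joined = distributed_join_chain([p1, p2, p3], "obj", n_parts=2)
print(joined)  # [{'obj': 2, 'p1': 1, 'p2': 1, 'p3': 1}]
```

Because rows with equal keys always land in fragments with the same index, the union of the per-fragment results equals the result of joining the undivided tables. The loop over fragments is sequential here for clarity; in a parallel setting each iteration would be handed to a separate process or node.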
The results of a computational experiment are presented. They show that the methods proposed in the article make it possible to develop parallel software that significantly accelerates the preparation of data for the derivation of association rules.