Journal Information-measuring and Control Systems, No. 6, 2015
Article in issue:
Canonical algorithm Apriori detail elaboration
Authors:
E.B. Tutov - Lecturer, Chair of Information Systems and Technologies, South-Western State University (Kursk). E-mail: tutov-e-b@yandex.ru
Abstract:
The paper formalizes the task of searching for and generating association rules in set-theoretic terms. It gives the basic definitions needed for a correct general formalization of the association rule search task and presents a general search scheme. The key phase of this scheme is the search for large itemsets: sets for which the fraction of transactions containing them is greater than or equal to the user-defined minimum support. The paper presents the canonical Apriori algorithm and formalizes the Apriori property in set-theoretic terms: any subset of a large itemset is also large. This property is used in candidate itemset generation and distinguishes the Apriori family of algorithms (Apriori, AprioriTid, AprioriHybrid) from earlier algorithms (AIS, SETM, etc.). The paper shows that Apriori is an iterative, incremental algorithm. In the first step, large itemsets containing a single element are found. In each subsequent step, candidate itemsets are generated from the large itemsets found in the previous step and then checked to determine whether they are large. Candidates that are large are added to the result set and serve as seed sets for candidate generation in the next step. The algorithm terminates when no large itemsets are found in the current step. The paper argues that the search for one-element large itemsets requires more detailed study, since these itemsets form the core from which all further itemsets are generated.
Pages: 58-62
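The iterative scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `apriori`, `transactions`, and `min_support` are hypothetical, support is computed by a full scan of the transactions for every candidate, and the join step is a simple pairwise union rather than the lexicographic join of the original Apriori publication. The prune step applies the Apriori property stated in the abstract: a candidate is kept only if all of its (k-1)-element subsets are large.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each large itemset (frozenset) to its support.

    Illustrative sketch only: `min_support` is assumed to be a fraction
    in [0, 1], and support is recomputed by scanning all transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        # Fraction of transactions that contain every element of `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_large(candidates):
        # Verification step: keep only candidates meeting the minimum support.
        checked = {c: support(c) for c in candidates}
        return {c: s for c, s in checked.items() if s >= min_support}

    # Step 1: large itemsets containing a single element.
    items = {item for t in transactions for item in t}
    current = keep_large(frozenset([item]) for item in items)
    large = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join: unions of two large (k-1)-itemsets that have exactly k elements.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_large(candidates)
        large.update(current)
        k += 1  # the loop stops once a step yields no large itemsets

    return large


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy data set the candidate {bread, milk, butter} is discarded in the prune step without counting its support, because its subset {milk, butter} is not large.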
References
  1. Agrawal R., Srikant R. Fast algorithms for mining association rules // Proceedings of the International Conference on Very Large Databases (VLDB). 1994. P. 487-499.
  2. Agrawal R., Imielinski T., Swami A. Mining association rules between sets of items in large databases // Proc. of the ACM SIGMOD Conference on Management of Data. Washington, D.C. May 1993.
  3. Dong G., Pei J. Sequence data mining. Springer Science+Business Media, LLC. 2007. P. 137.
  4. Houtsma M., Swami A. Set-oriented mining of association rules // Research Report RJ 9567. IBM Almaden Research Center. San Jose, California. October 1993.
  5. Mannila H., Toivonen H., Verkamo A.I. Efficient algorithms for discovering association rules // KDD-94: AAAI Workshop on Knowledge Discovery in Databases. July 1994.
  6. Witten I.H., Frank E., Hall M.A. Data mining: practical machine learning tools and techniques. 3rd ed. Morgan Kaufmann Publishers. 2011. P. 629.
  7. Barsegjan A.A., Kuprijanov M.S., Stepanenko V.V., KHolod I.I. Tekhnologii analiza dannykh: Data Mining, Visual Mining, Text Mining, OLAP. Izd-e 2-e, pererab. i dop. SPb.: BKHV-Peterburg. 2007. 384 s.