E.B. Tutov - Lecturer, Chair of Information Systems and Technologies, South-Western State University (Kursk). E-mail: firstname.lastname@example.org
The paper formalizes the task of searching for and generating association rules in set-theoretic terms. It gives the basic definitions needed for a correct general formalization of the association rule search task.
The paper presents a general scheme for association rule search. The key phase of this scheme is the search for large itemsets. A large itemset is an itemset for which the fraction of transactions containing it is greater than or equal to the minimum support value defined by the user.
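The definition of a large itemset can be illustrated with a minimal Python sketch. The function names `support` and `is_large`, the representation of transactions as frozensets, and the sample data are assumptions made for this illustration only; they are not taken from the paper.

```python
from typing import FrozenSet, Sequence


def support(itemset: FrozenSet[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Return the fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    # A transaction contains the itemset when the itemset is a subset of it.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_large(itemset: FrozenSet[str], transactions: Sequence[FrozenSet[str]],
             min_support: float) -> bool:
    """An itemset is large when its support is at least `min_support`."""
    return support(itemset, transactions) >= min_support


# Hypothetical sample data, used only to exercise the definition.
sample_transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support(frozenset({"bread", "milk"}), sample_transactions))
print(is_large(frozenset({"milk", "butter"}), sample_transactions, 0.5))
```

With this sample data, the itemset {bread, milk} occurs in two of the four transactions, so it is large for a minimum support of 0.5, while {milk, butter} occurs in only one and is not.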
The paper presents the canonical Apriori algorithm. It formalizes the Apriori property in set-theoretic terms: any subset of a large itemset is also large. This property is used in candidate itemset generation and distinguishes the Apriori family of algorithms (Apriori, AprioriTid, AprioriHybrid) from earlier algorithms (AIS, SETM, etc.).
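The role of the Apriori property in candidate generation can be sketched as follows. This is a simplified illustration, not the paper's formalization: it joins any two large (k-1)-itemsets whose union has k elements, rather than using a lexicographic prefix join, and the name `generate_candidates` is hypothetical. The pruning step applies the property in its contrapositive form: if any (k-1)-subset of a candidate is not large, the candidate cannot be large.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def generate_candidates(large_prev: Iterable[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Build candidate k-itemsets from the large (k-1)-itemsets."""
    prev = set(large_prev)
    candidates: Set[FrozenSet[str]] = set()
    for a in prev:
        for b in prev:
            union = a | b
            # Join step (simplified): keep unions that have exactly k items.
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of the candidate must be large.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical large 2-itemsets over the items a, b, c, d.
large_2 = [frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")]
print(generate_candidates(large_2, 3))
```

Here only {a, b, c} survives: {a, b, d} is pruned because {a, d} is not large, and {b, c, d} is pruned because {c, d} is not large.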
The paper shows that Apriori is an iterative, incremental algorithm. In the first step, large itemsets containing a single element are found. In each subsequent step, candidate itemsets are generated from the large itemsets found in the previous step, and each candidate is then checked to determine whether it is large. Candidates that are large are added to the result set and serve as seed sets for candidate generation in the next step. The algorithm terminates when no large itemsets are found in the current step.
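The iterative scheme described above might be sketched in Python as shown below. This is an illustrative outline under stated assumptions, not the implementation from the paper: the function name `apriori`, the dictionary result mapping each large itemset to its support, the simplified join, and the sample data are all choices made for this example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def apriori(transactions: Iterable[Iterable[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every large itemset together with its support."""
    db = [frozenset(t) for t in transactions]
    n = len(db)
    if n == 0:
        return {}

    def keep_large(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], float]:
        # Verification step: count each candidate and keep those meeting min_support.
        large = {}
        for c in candidates:
            count = sum(1 for t in db if c <= t)
            if count / n >= min_support:
                large[c] = count / n
        return large

    # First step: large itemsets containing a single element.
    current = keep_large({frozenset([item]) for t in db for item in t})
    result: Dict[FrozenSet[str], float] = {}
    k = 2
    while current:  # stop when a step yields no large itemsets
        result.update(current)
        prev = set(current)
        # Generate candidate k-itemsets from the large (k-1)-itemsets (simplified join).
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have a (k-1)-subset which is not large.
        candidates = {c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = keep_large(candidates)
        k += 1
    return result


# Hypothetical sample data.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), sup)
```

For this data and a minimum support of 0.5, the first step finds three large one-element itemsets, the second finds {bread, milk} and {bread, butter}, and the third step produces no large itemsets, so the loop ends.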
The paper argues that the task of finding one-element large itemsets requires more detailed study, since these itemsets form the basis from which all larger itemsets are subsequently generated.