S. V. Korelov, L. Yu. Rotkov
Presently, everyone who has a personal computer or uses e-mail has faced spam.
We assess whether the method of constructing genetic maps of texts can be used for spam detection.
The method of genetic maps is an algorithm for the probabilistic identification and selection of genes in data sequences. It is based on a sequential piecewise approximation of the original sequence using a minimal number of sequences.
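The idea of piecewise approximation can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here: it assumes that "genes" are known text fragments and that a text is covered left to right by the longest fragment matching at each position. The names `greedy_cover` and `coverage` are hypothetical, and the greedy choice does not guarantee a minimal number of pieces.

```python
def greedy_cover(text, genes):
    """Cover `text` left to right with the longest gene matching at each position.

    Hypothetical illustration only: `genes` is an assumed collection of known
    fragments. Returns (pieces, uncovered), where `pieces` is a list of
    (start_index, gene) and `uncovered` counts characters no gene matched.
    """
    pieces = []
    uncovered = 0
    i = 0
    # Try longer genes first so each step covers as much text as possible.
    ordered = sorted((g for g in genes if g), key=len, reverse=True)
    while i < len(text):
        match = next((g for g in ordered if text.startswith(g, i)), None)
        if match is None:
            uncovered += 1  # no gene starts here; skip one character
            i += 1
        else:
            pieces.append((i, match))
            i += len(match)
    return pieces, uncovered


def coverage(text, genes):
    """Fraction of characters in `text` covered by genes (0.0 for empty text)."""
    if not text:
        return 0.0
    _, uncovered = greedy_cover(text, genes)
    return 1.0 - uncovered / len(text)
```

Under these assumptions, a text that is largely covered by few fragments taken from known spam would be a candidate for the spam class; the threshold for such a decision is not specified here.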
The criterion for assessing the applicability of this method to spam detection is the probability of a missed-spam error, that is, of a spam message passing undetected.
The method of genetic maps has been implemented as a program.
In the experiment, the input texts are spam messages that passed undetected through the SpamAssassin antispam system.
The content-analysis block based on the method of genetic maps detected from 25% to 90% of the spam messages that had passed through SpamAssassin. Combined use of SpamAssassin and the content-analysis block based on genetic maps detected from 85% to 98% of spam messages.
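The relation between the two ranges can be checked with a short calculation. The sketch below assumes a two-stage pipeline in which the second stage sees only the messages the first stage missed. The first-stage detection rate of 0.8 is an assumption, not a figure reported here: it is simply the value for which both ends of the reported ranges agree. The function names are hypothetical.

```python
def combined_detection(first_stage_rate, second_stage_rate):
    """Overall detection rate when stage two inspects only what stage one missed."""
    missed_by_first = 1.0 - first_stage_rate
    return first_stage_rate + missed_by_first * second_stage_rate


def miss_probability(first_stage_rate, second_stage_rate):
    """Probability that a spam message passes both stages undetected."""
    return 1.0 - combined_detection(first_stage_rate, second_stage_rate)


# Assumed first-stage rate (not stated in the text): 0.8
low = combined_detection(0.8, 0.25)   # second stage catches 25% of passed spam
high = combined_detection(0.8, 0.90)  # second stage catches 90% of passed spam
print(f"combined detection: {low:.2f} to {high:.2f}")
```

With that assumed rate, the combined detection spans roughly 0.85 to 0.98, which matches the reported combined range, and the corresponding miss probability falls from about 0.15 to about 0.02.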
The advantages of the method of genetic maps for spam detection are that it places no restrictions on modifying existing objects that correspond to spam or on creating new ones. Nor does it restrict the accumulation, tracking, continual updating, and optimization of genetic maps of spam messages.