Journal Neurocomputers, No. 11, 2010
Article in this issue:
Distributed neural network solution of the classification task based on a selective clustering approach
Authors:
V. V. Ayuyev
Abstract:
The paper describes an original model for solving the on-line classification task on continuous data. The model's operation is based on the creation of a selective subset intended to properly describe all the available data. Analyzing the interrelation between each attribute of the selective subset and the classification attribute yields a set of the most informative attributes, on which all the data can then be clustered. The initial cluster centers are used to select instances for the corresponding neural networks, which are trained independently on non-overlapping data sets. Classification of MAGIC Telescope images was used as the test problem: the open-access experimental database is large and markedly unbalanced with respect to the target classes. A comparative analysis was carried out between the proposed model and a traditional neural network architecture, a naive Bayes network, decision tables, decision trees, linear regression, and a K-nearest-neighbor classifier. The results showed that the proposed model was slightly less accurate than the best known solution (the neural network), which was compensated by a fivefold speed advantage and a smaller amount of memory required for storing training instances. The optimal values of the internal parameters were found by analyzing the influence of exogenous and endogenous factors on the model's quality measures. The batch processing algorithm was also found to play a major role in the model's on-line learning phase: although it proved to be the main cause of the drop in classification accuracy, it also reduced the memory needed for storing training instances by a factor of 2-3. It is expected that replacing the current non-robust clustering method would mitigate the consequences of the batch filtering method; this could happen if the data were better distributed across the clusters and the neural networks were thereby better specialized.
Pages: 45-53
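
For illustration, below is a minimal Python sketch of the pipeline described in the abstract, assuming scikit-learn-style components. The random subset selection, the correlation-based attribute ranking, and all function names and parameter values are placeholders chosen for the sketch, not the authors' actual method.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def train_distributed_model(X, y, n_clusters=5, top_k=6, subset_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: draw a selective subset meant to represent all available data
    # (a plain random sample here; the paper's own selection rule is not given).
    idx = rng.choice(len(X), size=max(2, int(subset_frac * len(X))), replace=False)
    Xs = X[idx]
    ys = np.unique(y[idx], return_inverse=True)[1]  # numeric class codes
    # Step 2: rank attributes by their interrelation with the class attribute
    # (absolute Pearson correlation used as a stand-in interrelation measure).
    scores = np.nan_to_num([abs(np.corrcoef(Xs[:, j], ys)[0, 1])
                            for j in range(X.shape[1])])
    top = np.argsort(scores)[-top_k:]
    # Step 3: cluster the whole data set on the most informative attributes.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X[:, top])
    # Step 4: train one neural network per cluster, so the networks learn
    # independently on non-overlapping instance sets.
    nets = {}
    for c in range(n_clusters):
        m = km.labels_ == c
        if len(np.unique(y[m])) > 1:
            nets[c] = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500,
                                    random_state=seed).fit(X[m][:, top], y[m])
        elif m.any():
            nets[c] = y[m][0]  # degenerate cluster: only one class remains
    return km, nets, top

def predict(km, nets, top, X):
    # Route each instance to the network owning its nearest cluster center.
    labels = km.predict(X[:, top])
    out = np.empty(len(X), dtype=object)
    for c, net in nets.items():
        m = labels == c
        if m.any():
            out[m] = net.predict(X[m][:, top]) if hasattr(net, "predict") else net
    return out

At prediction time each unseen instance is routed to the network whose cluster center is nearest, so every network stays specialized to its own non-overlapping region of the data; the batch processing of the on-line learning phase discussed in the abstract is outside the scope of this sketch.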