Journal Neurocomputers №3, 2011
Article in issue:
Application of an information-theoretic approach to the problem of multilayer perceptron training
Authors:
O. A. Morozov, P. E. Ovchinnikov, Yu. A. Semin, V. R. Fidelman
Abstract:
The paper presents a method for training an artificial neural network (ANN) based on an information criterion: an expression for the entropy of the error. Minimizing the error entropy makes the fullest use of the information contained in the training set, which improves the characteristics of the neural-network classifier: training time is reduced and classification efficiency is increased. The training method consists in optimizing a functional based on the Rényi expression for the entropy of the error. Since entropy does not depend on the mean value of a random variable, the errors will in general remain nonzero after the entropy has been optimized; to drive the errors toward zero (or make them small), a second term is introduced into the training functional: the sum of squared errors. This constraint varies nonlinearly with the errors, so the traditional optimization scheme based on the search for Lagrange multipliers is inapplicable.

The following method of optimizing the functional is therefore proposed. An optimization step consists in applying corrections to the network weights separately for each element of the training set by the error back-propagation method. The entropy is minimized at each step by dynamically changing the frequency with which samples are selected for training, so as to keep the errors approximately equal (this distribution of error probabilities corresponds to the minimum of the entropy). In addition to optimizing the entropy and the mean error simultaneously, the approach avoids the computationally expensive procedure of directly calculating the entropy gradient. The equalization of the errors is achieved by using a probability distribution proportional to the current errors for the random selection of samples from the training set.
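For concreteness, here is a minimal sketch of what such a functional can look like, assuming quadratic Rényi entropy (α = 2) with a Gaussian Parzen estimate of the error density, in the spirit of Erdogmus and Principe (reference 4 below); the weighting coefficient λ is an illustrative assumption, not a quantity taken from the paper:

\[
  H_2(e) \;=\; -\log \int p_e^2(x)\,dx
  \;\approx\; -\log \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N}
  G_{\sigma\sqrt{2}}\!\left(e_i - e_j\right),
\qquad
  J(\mathbf{w}) \;=\; H_2(e) + \lambda \sum_{i=1}^{N} e_i^2
  \;\to\; \min_{\mathbf{w}},
\]

where e_i is the output error on the i-th training sample, G_σ is a Gaussian kernel of width σ, and the squared-error term drives the mean error toward zero, which the entropy term alone cannot do.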
The proposed method was applied to network training in a test problem of phoneme classification. To collect statistics on training speed, the following experiment was repeated many times: the network was trained until the total classification error on the training set fell below a threshold, and the number of iterations was recorded. The initial weights of the network were assigned randomly from a range near zero, and the average recognition rate over a range of initial parameter values was calculated. The standard deviation of the training time was used to estimate the dependence on the choice of initial conditions (the stability of training).

With the proposed training method, the extra time spent on calculating the probability distribution and generating random numbers according to it is compensated by the reduction in the number of training steps; for large neural networks, the additional time for calculating the distribution is negligible compared with the main computations. The saving in training time is observed for any choice of parameters, while the size of the relative gain depends on the initial conditions and the training step. Moreover, the considerable difference in the standard deviation of the number of training steps between the traditional and the proposed methods indicates the relative stability of the proposed method. Overall, the proposed training method reduces training time significantly; the saving depends on the initial conditions and the step size, and the shortest training times of the traditional and the proposed algorithms differ by a factor of 10.

In problems where training time is limited, for example in the synthesis of algorithms with adaptive adjustment of the ANN weights, the proposed method makes it possible to obtain a better classifier under otherwise equal conditions.
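A minimal sketch of the sampling scheme described above, assuming a one-hidden-layer perceptron with sigmoid units; the network size, learning rate, stopping threshold, and the toy XOR data are illustrative assumptions, not the authors' setup. Each training step draws a sample with probability proportional to its current error, so samples with larger errors are trained more often and the errors are gradually equalized:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """Minimal one-hidden-layer perceptron with sigmoid units."""
    def __init__(self, n_in, n_hid, n_out, scale=0.5):
        # Initial weights are assigned randomly from a range near zero,
        # as in the experiments described in the abstract.
        self.W1 = rng.uniform(-scale, scale, (n_hid, n_in))
        self.W2 = rng.uniform(-scale, scale, (n_out, n_hid))

    def forward(self, x):
        h = sigmoid(self.W1 @ x)
        y = sigmoid(self.W2 @ h)
        return h, y

    def backprop_step(self, x, t, lr=1.0):
        # One error back-propagation update on a single training sample.
        h, y = self.forward(x)
        delta_out = (y - t) * y * (1.0 - y)
        delta_hid = (self.W2.T @ delta_out) * h * (1.0 - h)
        self.W2 -= lr * np.outer(delta_out, h)
        self.W1 -= lr * np.outer(delta_hid, x)

def sample_errors(net, X, T):
    # Squared output error of every sample in the training set.
    return np.array([np.sum((net.forward(x)[1] - t) ** 2)
                     for x, t in zip(X, T)])

def train(net, X, T, threshold=0.05, max_steps=200000):
    # Train until the total error on the training set falls below the
    # threshold. Instead of cycling through the set, each step draws a
    # sample with probability proportional to its current error, which
    # keeps the per-sample errors approximately equal (the distribution
    # that corresponds to the entropy minimum described in the paper).
    for step in range(max_steps):
        errs = sample_errors(net, X, T)
        total = errs.sum()
        if total < threshold:
            return step
        p = errs / total                 # error-proportional distribution
        i = rng.choice(len(X), p=p)      # random choice of a sample
        net.backprop_step(X[i], T[i])
    return max_steps

# Toy usage: XOR stands in for the phoneme-classification data.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
net = MLP(2, 4, 1)
print("training steps to threshold:", train(net, X, T))

The only change relative to a traditional stochastic back-propagation loop is the two lines computing p and i; with a uniform p the loop reduces to the conventional scheme, which makes a direct comparison of training speed straightforward.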
Pages: 29-33
References
  1. Kolmogorov A. N. On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition // Dokl. AN SSSR. 1957. V. 114. № 5. P. 953-956.
  2. Ovchinnikov P. E., Semin Yu. A. Influence of the method of sound signal parametrization on the efficiency of phoneme recognition by a perceptron // Izv. Vuzov. Radiofizika. Nizhny Novgorod. 2007. V. L. № 4. P. 350-356.
  3. Rumelhart D. E., Hinton G. E., Williams R. J. Learning Internal Representations by Error Propagation / In D. E. Rumelhart & J. L. McClelland (Eds.). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. V. 1: Foundations. MIT Press. 1986.
  4. Erdogmus D., Principe J. C. Generalized Information Potential Criterion for Adaptive System Training // IEEE Transactions on Neural Networks. 2002. V. 13. № 5. P. 1035-1044.
  5. Freeman J. A., Skapura D. M. Neural Networks: Algorithms, Applications, and Programming Techniques / Addison-Wesley Publishing Company. 1991.
  6. Cybenko G. Approximation by Superpositions of a Sigmoidal Function // Mathematics of Control, Signals, and Systems. 1989. V. 2. P. 303-314.