__Keywords:__ artificial neural networks, algorithmic information theory, minimum description length principle, forecasting, learning, dynamic systems

A.S. Potapov

Artificial neural networks (ANNs) are a widely used tool for forecasting. However, their advantages must be described rigorously before they can be improved further. To this end, the problem of ANN learning is treated as a task of inductive inference within the framework of algorithmic information theory. The efficiency of ANNs as a model representation is determined by how compactly they describe the regularities expected in the data to be extrapolated, because the description length of a model corresponds to the amount of training data needed to reconstruct it.
Forecasting requires the automatic construction of models of dynamic systems, so such models should be representable by ANNs and have low complexity (description length). Because classical ANNs with nonlinear activation functions do not have these properties (the basic elementary functions are not representable with them), the ANN formalism was modified. Dynamic ANNs (recurrent, with continuous time) with a linear activation function were taken as the basis, because combinations of harmonic, polynomial, and exponential functions are representable with them. Although such ANNs can in theory approximate other functions with arbitrary precision, the corresponding extrapolation precision will always be limited in practice. Therefore, the expressive power of the underlying representation must be improved in order to extend the capabilities of ANNs.
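The representability claim for linear dynamic networks can be illustrated with a minimal sketch. It assumes the network state obeys dx/dt = Wx, which is one natural reading of "recurrent with continuous time and linear activation" rather than the formalism given here. The function `simulate`, the RK4 integrator, and the three weight matrices are illustrative choices.

```python
import math

def simulate(W, x0, t_end, steps=2000):
    """Integrate dx/dt = W x from t = 0 to t_end with classical RK4.

    W is the (assumed) weight matrix of a linear continuous-time network,
    x0 the initial neuron states. Returns the state at t_end.
    """
    n = len(x0)
    h = t_end / steps

    def f(x):
        # Linear activation: each neuron's rate is a weighted sum of states.
        return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

    def shifted(x, k, s):
        return [xi + s * ki for xi, ki in zip(x, k)]

    x = list(x0)
    for _ in range(steps):
        k1 = f(x)
        k2 = f(shifted(x, k1, h / 2))
        k3 = f(shifted(x, k2, h / 2))
        k4 = f(shifted(x, k3, h))
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

# Two neurons with antisymmetric weights: x1(t) = cos(2t).
harmonic = simulate([[0.0, -2.0], [2.0, 0.0]], [1.0, 0.0], 3.0)[0]

# One neuron with a self-connection: x(t) = exp(-0.5 t).
exponential = simulate([[-0.5]], [1.0], 3.0)[0]

# A chain of three neurons (nilpotent weights): x1(t) = t^2.
polynomial = simulate([[0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [0.0, 0.0, 0.0]], [0.0, 0.0, 2.0], 3.0)[0]

print(harmonic, math.cos(6.0))
print(exponential, math.exp(-1.5))
print(polynomial, 9.0)
```

Each target function needs only a handful of weights, which is the sense in which such models have a short description under this assumed form.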
To preserve the representability of these elementary functions, linear dynamic ANNs are extended not with nonlinear activation functions but with «connections on connections», which have a nonlinear effect on the signals propagating through ordinary connections. A general algorithm for training such networks and selecting their architecture is proposed on the basis of the minimum description length criterion, which helps to avoid overfitting and to achieve the best forecasting precision reachable within the selected representation.
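How a description length criterion selects model complexity can be sketched on a deliberately simple stand-in. Polynomial regression replaces the network here, and the score is the common two-part approximation L = (n/2) log2(RSS/n) + (k/2) log2 n, not necessarily the criterion used in this work. The names `fit_polynomial` and `description_length` and the synthetic data are assumptions made for illustration.

```python
import math
import random

def fit_polynomial(ts, ys, degree):
    """Least-squares polynomial fit via normal equations and Gaussian elimination."""
    m = degree + 1
    A = [[sum(t ** (i + j) for t in ts) for j in range(m)] for i in range(m)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(m)]
    for col in range(m):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs

def description_length(ts, ys, coeffs):
    """Approximate two-part code length in bits: data given model plus model."""
    n = len(ts)
    rss = sum((y - sum(c * t ** i for i, c in enumerate(coeffs))) ** 2
              for t, y in zip(ts, ys))
    data_bits = 0.5 * n * math.log2(rss / n)          # cost of the residuals
    model_bits = 0.5 * len(coeffs) * math.log2(n)     # cost of the parameters
    return data_bits + model_bits

# Synthetic quadratic series with small noise (illustrative data only).
random.seed(0)
ts = [0.05 * i for i in range(40)]
ys = [1.0 + 2.0 * t - 0.5 * t * t + random.gauss(0.0, 0.05) for t in ts]

lengths = [description_length(ts, ys, fit_polynomial(ts, ys, d)) for d in range(5)]
best_degree = min(range(5), key=lambda d: lengths[d])
print(best_degree, [round(v, 1) for v in lengths])
```

Underfitted models pay heavily in residual bits, while each extra parameter costs (1/2) log2 n bits, so the minimum penalizes complexity that the data do not support.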
Experimental validation on model data showed a low extrapolation error (about 1% on the doubled time interval) for time series built from elementary functions. More complex regularities, in particular chaotic ones, are also representable. However, real time series that are non-stationary and chaotic require further development of both the neural representation of models and the algorithms for their optimization before forecasting efficiency can be improved further.
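The representability of chaotic regularities can be illustrated with a hedged sketch. The exact form of «connections on connections» is not specified here, so the code assumes one plausible reading: the effective weight of the connection from neuron j to neuron i is w_ij + Σ_k v_ijk x_k, that is, other neurons' signals modulate it multiplicatively. Under that assumption the Lorenz system, chosen here purely as a familiar example, needs only two such second-order connections. The names `derivative`, `rk4_step`, `W`, and `V` are hypothetical.

```python
import math

def derivative(x, W, V):
    """dx_i/dt = sum_j (W[i][j] + sum_k V[i][j][k] * x[k]) * x[j]."""
    n = len(x)
    dx = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            # Ordinary weight, modulated by "connections on connections".
            w_eff = W[i][j] + sum(V[i][j][k] * x[k] for k in range(n))
            total += w_eff * x[j]
        dx.append(total)
    return dx

def rk4_step(x, h, W, V):
    """One classical RK4 step of size h."""
    def shifted(k, s):
        return [xi + s * ki for xi, ki in zip(x, k)]
    k1 = derivative(x, W, V)
    k2 = derivative(shifted(k1, h / 2), W, V)
    k3 = derivative(shifted(k2, h / 2), W, V)
    k4 = derivative(shifted(k3, h), W, V)
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

n = 3
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
# Ordinary (linear) connections.
W = [[-sigma, sigma, 0.0],
     [rho, -1.0, 0.0],
     [0.0, 0.0, -beta]]
# Second-order connections: all zero except two.
V = [[[0.0] * n for _ in range(n)] for _ in range(n)]
V[1][0][2] = -1.0   # x3 modulates the connection x1 -> x2 (term -x1*x3)
V[2][0][1] = 1.0    # x2 modulates the connection x1 -> x3 (term +x1*x2)

# Two trajectories whose initial states differ by 1e-8.
a = [1.0, 1.0, 1.0]
b = [1.0 + 1e-8, 1.0, 1.0]
max_separation = 0.0
max_abs_state = 0.0
for _ in range(4000):   # 40 time units with h = 0.01
    a = rk4_step(a, 0.01, W, V)
    b = rk4_step(b, 0.01, W, V)
    sep = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    max_separation = max(max_separation, sep)
    max_abs_state = max(max_abs_state, max(abs(v) for v in a))

print(max_separation, max_abs_state)
```

The trajectories stay bounded while the tiny initial difference grows to the scale of the attractor. This sensitivity is also why long-horizon forecasts of chaotic series remain limited even when the model structure itself is representable.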

References: