Journal Dynamics of Complex Systems - XXI Century. №2, 2020
Article in issue:
Applying Reinforcement Learning in a distributed computational system - the Internet of Things
DOI: 10.18127/j19997493-202002-10
UDC: 004.75
Authors:

O.Y. Eremin – Ph.D. (Eng),

Department «Computer Systems and Networks», Bauman Moscow State Technical University

E-mail: ereminou@bmstu.ru

M.V. Stepanova – Post-graduate Student, 

Department «Computer Systems and Networks», Bauman Moscow State Technical University

E-mail: stepanova@bmstu.ru

Abstract:

 

The Internet of Things (IoT) consists of numerous computational nodes that can be used to construct a distributed computational system. Traditional computational approaches for distributed systems cannot be applied to an IoT infrastructure because of its nature: the number of computational nodes changes continuously, the parameters of the low-speed communication channels change continuously, and the levels of interference in the radio channels are high. An adaptive method is therefore needed that meets the mutability requirements of the IoT and ensures the distribution of computational tasks among its nodes. This paper describes such an adaptive method. The method is based on the Multi-Armed Bandit (MAB) algorithm, which belongs to the family of Reinforcement Learning methods. In this method the environment is represented by the IoT infrastructure, and the main task-distribution node, represented as an agent, interacts with it. Implementing MAB at the core of the developed method makes it possible to handle the continuous changes in the IoT structure, and the algorithm's parameters allow different strategies of agent interaction to be generated while tasks are distributed among the nodes. The agent does not require a comprehensive description of the environment to make a decision; instead, it evaluates previously taken actions and the reactions received from the environment. This approach avoids dealing with the full complexity and heterogeneity of the IoT structure. The implemented approach can be used in the software development process for IoT infrastructures.
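The full text is not reproduced here, but the abstract describes a Multi-Armed Bandit agent choosing IoT nodes from observed rewards (cf. references 10-12, which cover ε-greedy strategies). A minimal Python sketch of such an agent is given below; the class name, the reward model, and all parameters are illustrative assumptions, not the authors' implementation.

import random

class EpsilonGreedyDispatcher:
    # Each "arm" is an IoT node; the reward models how well the node
    # handled the last task (e.g. inverse of its completion time).
    def __init__(self, n_nodes, epsilon=0.1):
        self.epsilon = epsilon          # exploration probability
        self.counts = [0] * n_nodes     # tasks sent to each node so far
        self.values = [0.0] * n_nodes   # running mean reward per node

    def select_node(self):
        # Explore a random node with probability epsilon, otherwise
        # exploit the node with the best mean reward observed so far.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, node, reward):
        # Incremental mean update: no history of the changing IoT
        # environment has to be stored, only one value per node.
        self.counts[node] += 1
        self.values[node] += (reward - self.values[node]) / self.counts[node]

# Hypothetical usage: dispatch 1000 tasks across 5 IoT nodes.
dispatcher = EpsilonGreedyDispatcher(n_nodes=5, epsilon=0.1)
for _ in range(1000):
    node = dispatcher.select_node()
    reward = random.random()  # stand-in for a measured node response
    dispatcher.update(node, reward)

A decaying epsilon, or the UCB1 rule analyzed in reference 12, would balance exploration and exploitation differently; the point made in the abstract is that only the observed rewards, not a full model of the IoT environment, drive the agent's choice.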

 

Pages: 84-167
References
  1. Tanenbaum Je., van Steen M. Raspredelennye sistemy. Principy i paradigmy. SPb: Piter. 2003. 877 s. (In Russian).
  2. Shevchuk E.V., Shevchuk Ju.V. Sovremennye tendencii v oblasti hranenija i obrabotki sensornyh dannyh. Programmnye sistemy: teorija i prilozhenija. 2015. №4(27). S. 157–176 (In Russian).
  3. Mathur G., Desnoyers P., Ganesan D., Shenoy P. Ultra-low power data storage for sensor networks. Proceedings of the Fifth International Conference on Information Processing in Sensor Networks. IPSN’06. 2006. P. 374–381.
  4. Gel'fand I.M., Pjateckij-Shapiro I.I., Cetlin M.L. O nekotoryh klassah igr i igr avtomatov. Doklady AN SSSR. 1963. T. 152. № 4.  S. 845-848 (In Russian).
  5. Satton R.S., Barto Je.G. Obuchenie s podkrepleniem = Reinforcement Learning. M.: DMK press. 2020. 552 s. (In Russian).
  6. Chernen'kij V.M., Semkin P.S. Metod opisanija processov vypolnenija zadanij v mul'tiprogrammnyh i mul'tiprocessornyh sistemah. Vestnik MGTU im. N.Je. Baumana. Ser. Priborostroenie. 2014. №1 (94). S. 121–132 (In Russian).
  7. Yeckle J., Rivera W. Mapping and characterization of applications in heterogeneous distributed systems. Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI2003). 2003. P. 1–6.
  8. Voevodin V.V., Voevodin Vl.V. Parallel'nye vychislenija. SPb: BHV-Peterburg. 2002. 608 s. (In Russian).
  9. Mischel W., Ebbesen E.B., Zeiss A.R. Cognitive and attentional mechanisms in delay of gratification. Journal of personality and social psychology. 1972. V. 21. № 2. P. 204-218. 
  10. Akanmu S., Garg R., Gilal A. Towards an Improved Strategy for Solving Multi-Armed Bandit Problem. International Journal of Innovative Technology and Exploring Engineering (IJITEE). 2019. V. 8. № 12. 
  11. Mignon A., Rocha R. An Adaptive Implementation of ε-Greedy in Reinforcement Learning. Procedia Computer Science. 2017. V. 109C. P. 1146–1151.
  12. Auer P., Cesa-Bianchi N., Fischer P. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning. 2002. V. 47. P. 235–256.
Date of receipt: May 5, 2020.