Journal: Dynamics of Complex Systems - XXI century, No. 3, 2013
Article in issue:
Distributed scheduling for composite applications in hierarchical cloud environments
Authors:
P.A. Smirnov - Postgraduate Student. E-mail: smirnp@gmail.com
A.V. Boukhanovsky - Dr. Sc. (Eng.), Professor, National Research University of Information Technologies, Mechanics and Optics. E-mail: avb@mail.ru
Abstract:
A market-based multi-agent algorithm (MA) for scheduling composite applications in a hierarchical cloud environment is proposed. The algorithm was tested on the resources of the CLAVIRE platform, which is designed to organize computational environments for executing composite applications (CA) according to the SaaS and AaaS approaches. CAs in CLAVIRE are described as workflows in a dedicated domain-oriented language called EasyFlow. Workflow parsing and the assignment of tasks to computational resources are performed automatically during execution. CLAVIRE does not provide its own low-level environment; it acts as a higher layer that integrates existing computational resources and environments: supercomputers, Grids, cloud environments with IaaS and PaaS support, and groups of servers and PCs interconnected by local networks. The hierarchical resource structure relies on the ability to initialize resource agents (hereafter R-agents): independent computers responsible for executing tasks on other computers in their local network. The schedule is generated through auctions in which R-agents submit offers. An offer is an execution plan compiled from task estimates for nodes of the agent's local network only; it may cover the whole CA or parts of it. This approach minimizes data transfer time and, as a result, total execution time. R-agents that can execute the largest number of tasks on the resources of their local network are expected to win the auctions. Every R-agent is responsible for automatically monitoring its local nodes and for preparing offers for the auctions. A comparison experiment shows that the multi-agent algorithm is slower than the centralized one, but a scheduling time of about 3 seconds on 1535 nodes is acceptable (the total overhead in CLAVIRE is about 5-12 seconds, depending on resource types). It is worth noting that increasing the number of resources also improves the speed of MA relative to the centralized algorithm: the speed gap decreases from a factor of 10 to a factor of 3 (with 20 and 60 R-agents, respectively). MA also demonstrates the efficiency of the market policy: it prefers the most "talented" R-agent and avoids unnecessary data transfers over the Internet. Both algorithms show stable performance on more than 1500 nodes, equivalent to 100-140 TFlops of peak performance.
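The auction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the `package` and `speed` fields, the time estimate (work divided by node speed), and the tie-breaking rule are all assumptions made for illustration, and node load is ignored. Each R-agent builds an offer from its local nodes only, and the agent whose offer covers the most tasks wins the round; leftover tasks go to a further auction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    package: str   # software the task needs (hypothetical field)
    work: float    # abstract amount of work (hypothetical unit)


@dataclass(frozen=True)
class Node:
    name: str
    packages: frozenset  # software available on this node
    speed: float         # work units per second (hypothetical)


@dataclass
class RAgent:
    name: str
    nodes: list  # nodes of this agent's local network only

    def make_offer(self, tasks):
        """Return {task_id: (node_name, estimated_seconds)} for the tasks
        that some local node can run; other tasks are left out."""
        offer = {}
        for task in tasks:
            candidates = [n for n in self.nodes if task.package in n.packages]
            if candidates:
                best = min(candidates, key=lambda n: task.work / n.speed)
                offer[task.task_id] = (best.name, task.work / best.speed)
        return offer


def run_auctions(tasks, agents):
    """Repeat auctions until every task is placed or nobody can bid."""
    schedule = {}
    remaining = list(tasks)
    while remaining:
        offers = [(a, a.make_offer(remaining)) for a in agents]
        offers = [(a, o) for a, o in offers if o]
        if not offers:
            break  # no agent can run the remaining tasks
        # Winner: most tasks covered; ties broken by lower total estimate.
        winner, offer = max(
            offers,
            key=lambda ao: (len(ao[1]), -sum(t for _, t in ao[1].values())),
        )
        for task_id, (node, seconds) in offer.items():
            schedule[task_id] = (winner.name, node, seconds)
        remaining = [t for t in remaining if t.task_id not in offer]
    return schedule


tasks = [Task("t1", "solver", 10.0), Task("t2", "solver", 20.0), Task("t3", "viz", 5.0)]
agents = [
    RAgent("A", [Node("a1", frozenset({"solver"}), 2.0),
                 Node("a2", frozenset({"solver"}), 1.0)]),
    RAgent("B", [Node("b1", frozenset({"viz"}), 1.0)]),
]
schedule = run_auctions(tasks, agents)
```

In this toy run agent A wins the first auction because it can place two tasks locally, and agent B wins the second auction for the remaining task. Keeping each offer restricted to one local network is what avoids transfers between networks in this sketch.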
Pages: 78-82