350 rub
Journal Neurocomputers №2 for 2009 г.
Article in number:
ViPNet. Distributed computing in a high-performance cluster
Authors:
A. V. Urivskiy, A. L. Chmora
Abstract:
Clusters are a well-known and popular approach to high performance and high availability computing. Controllable performance and fault-tolerance levels are achieved through distribution of tasks among a number of computing machines. In this paper we mainly discuss the problem of efficient load balancing. First we define the problem of load balancing and discuss characteristics of the cluster and a stream of tasks relevant to that problem. We restrict ourselves to deterministic online balancing algorithms, when balancing is done immediately when the task is arrived, as the most realistic scenario for network applications. Also we assume that the exact characteristics of the task stream are unknown a priori even statistically. In our discussion we follow the setting of uniformly related machines without task reassignments, preemptions or restarts. Then different objective functions (performance criteria) are considered, in particular the makespan (the time when the last task is completed) and the average performance (the number of operations made by the cluster at a specified interval of time). Two algorithms are evaluated - the famous "greedy" algorithm and the so-called "double" algorithm. The former sends the task to the fastest machine among least loaded, while the letter on the contrary sends the task to the slowest machine among those able to complete the task to the specified point of time. The competitive ratios (a measure of how good an algorithm is with respect to the optimal offline algorithm) of both algorithms are discussed. Though they are quite different in worst case scenario, in most cases likely to occur in practice they exhibit similar performance. Despite that we propose a method for online switching between the two algorithms running in parallel for tuning to the changing task stream. We also give due attention to the practical problem of evaluating the load vector of a task. The i-th component of the load vector shows how much time the task take the i-th machine to complete it. The evaluating method is periodical testing of machines in the cluster. We give two testing strategies. The first is when testing is done on a preliminary prepared set of tasks. Another choice is when the load vector is assessed on real tasks handled by the cluster. Techniques for load balancing described in this paper are employed in ViPNet software produces by JSC "InfoTeCS".