Journal Dynamics of Complex Systems - XXI century, №3, 2013
Article in issue:
Domain-specific technology for big data storing and processing within cloud computing platform CLAVIRE
Authors:
S.V. Kovalchuk - Ph.D. (Eng.), Senior Research Scientist, National Research University of Information Technologies, Mechanics and Optics
A.V. Razumovskiy - Ph.D. (Eng.), Associate Professor, National Research University of IT, Mechanics and Optics
A.I. Spivak - Ph.D. (Eng.), Associate Professor, National Research University of IT, Mechanics and Optics
Abstract:
The availability of huge data sets is driving the development of a new scientific paradigm, data-driven research, which is oriented toward acquiring knowledge from these data sets. The development of this paradigm is closely tied to Big Data technology, which focuses on building a toolbox for storing and processing diverse data arrays in distributed environments. Today such a toolbox is usually built on the MapReduce model. One of the key directions in Big Data technology development is integration with software systems for complex systems simulation, where the simulation process itself can be considered a source of large data sets. Therefore, a specific toolbox for integrating computational models, data sources and processing software needs to be developed. The presented work discusses the development of a toolbox for processing large data arrays and simulation results within the second-generation cloud computing environment CLAVIRE. One important feature of the presented solution is the development of interconnection technologies for integrating high-level objects (the data sources available to the user and the ways of accessing them) within the iPSE concept. The following requirements are considered: automatic data scalability support (including load balancing); remote management of the code, which is transmitted to the place where the data are stored so that the data can be processed effectively without network transmission; support of domain-specific languages (DSL) for describing tasks using domain and data semantics; and an integrated toolbox for Big Data analysis, processing and visualization, as well as end-user support within these processes. To check scalability experimentally, test software was developed for hydro-meteorological data processing and statistical analysis.
The measurements show that, within the considered interval, the processing time depends linearly on the data size, indicating stable and scalable processing of large data sets with low overhead. The experiments show speedups of 1.8 and 2.7 times on 2 and 3 nodes, respectively (the parallel efficiency can be estimated as ~90 %). Thus, the developed solution demonstrates the ability to organize distributed analysis of large data arrays using the CLAVIRE platform. The key features of the developed solution are as follows: a) high-level semantic integration of distributed data processing with the existing formal description of composite applications within the cloud computing environment; b) a domain-specific end-user toolbox for describing the tasks executed using distributed data storage; c) automatic interpretation of high-level task descriptions provided by the user into executable form.
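The ~90 % parallel efficiency quoted in the abstract can be checked by dividing each reported speedup by the number of nodes. The short sketch below only reproduces that arithmetic from the two figures given in the abstract; the function name `parallel_efficiency` is an illustrative assumption and is not part of CLAVIRE or the authors' test software.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Return speedup divided by node count (1.0 means ideal linear scaling)."""
    if nodes < 1:
        raise ValueError("nodes must be a positive integer")
    return speedup / nodes


# Speedups reported in the abstract: 1.8x on 2 nodes and 2.7x on 3 nodes.
reported_speedups = {2: 1.8, 3: 2.7}

# Efficiency per node count, computed only from the reported figures.
efficiencies = {
    nodes: parallel_efficiency(speedup, nodes)
    for nodes, speedup in reported_speedups.items()
}

for nodes, efficiency in efficiencies.items():
    print(f"{nodes} nodes: efficiency = {efficiency:.0%}")
```

Both configurations give an efficiency of about 0.9, which matches the ~90 % estimate stated in the abstract.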
Pages: 106-109