Journal: Highly Available Systems, No. 2, 2014
Article in this issue:
Mapping of graph data models into a canonical model for the development of data intensive systems
Authors:
S. A. Stupnikov - Ph.D. (Eng.), Senior Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: ssa@ipi.ac.ru
Abstract:
The development of methods and tools for handling Big Data requires new approaches that can cope with the diversity of data models and their languages, many of which have evolved in an uncoordinated way. Integrating heterogeneous information resources requires unifying their data models by mapping them into a canonical information model, which serves as a common language across the various resource models. Such mappings must preserve the information and semantics of each model's data description language and the semantics of the operations of its data manipulation language. This research addresses the unification of graph models, an important and rapidly developing class of data models. The principal modern graph models are discussed, including their features, application domains, and the specifics of their use for big data manipulation. Based on a survey of existing graph DBMSs, the paper concludes that most existing graph databases are based on property (attributed) graphs, in which attributes are assigned to the vertices and edges of the graph. A generalization of such models is selected as the source data model to be unified. The paper considers how to prove, using the formal specification language AMN, that mapping such graph models into the object-frame canonical model preserves both information and operation semantics.
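To make the notions of a property graph and a semantics-preserving mapping more concrete, here is a minimal sketch in Python. It is not the paper's mapping: the SYNTHESIS object-frame model and the AMN proofs are not reproduced. The `PropertyGraph` class, the dictionary-based "frame" representation, and the functions `graph_to_frames`, `frames_to_graph`, `add_edge_graph` and `add_edge_frames` are all hypothetical illustrations. The sketch assumes that vertex and edge identifiers are disjoint. It maps vertex labels to classes and turns each edge into an object of its own class whose slots reference the source and target vertices. It then checks two informal, testable analogues of the properties the paper establishes formally: the state mapping is invertible (information preservation), and an operation applied before or after the mapping gives the same result (operation preservation).

```python
from dataclasses import dataclass, field
from typing import Any, Dict

Frame = Dict[str, Any]  # hypothetical object/frame: kind, class, slots (+ refs for edges)


@dataclass
class PropertyGraph:
    # vertex id -> (label, attributes)
    vertices: Dict[str, tuple] = field(default_factory=dict)
    # edge id -> (label, source vertex id, target vertex id, attributes)
    edges: Dict[str, tuple] = field(default_factory=dict)


def graph_to_frames(g: PropertyGraph) -> Dict[str, Frame]:
    """Map a property graph state into a set of frames keyed by object id."""
    frames: Dict[str, Frame] = {}
    for vid, (label, attrs) in g.vertices.items():
        frames[vid] = {"kind": "vertex", "class": label, "slots": dict(attrs)}
    for eid, (label, src, dst, attrs) in g.edges.items():
        if eid in frames:
            raise ValueError(f"edge id {eid} collides with a vertex id")
        if src not in g.vertices or dst not in g.vertices:
            raise ValueError(f"edge {eid} references a missing vertex")
        # Edges become first-class objects, so edge attributes are not lost.
        frames[eid] = {"kind": "edge", "class": label,
                       "source": src, "target": dst, "slots": dict(attrs)}
    return frames


def frames_to_graph(frames: Dict[str, Frame]) -> PropertyGraph:
    """Inverse mapping: rebuild the graph state from the frames."""
    g = PropertyGraph()
    for oid, f in frames.items():
        if f["kind"] == "vertex":
            g.vertices[oid] = (f["class"], dict(f["slots"]))
        else:
            g.edges[oid] = (f["class"], f["source"], f["target"], dict(f["slots"]))
    return g


def add_edge_graph(g: PropertyGraph, eid, label, src, dst, attrs) -> PropertyGraph:
    """Operation in the source (graph) model; returns a new state."""
    edges = dict(g.edges)
    edges[eid] = (label, src, dst, dict(attrs))
    return PropertyGraph(vertices=dict(g.vertices), edges=edges)


def add_edge_frames(frames: Dict[str, Frame], eid, label, src, dst, attrs) -> Dict[str, Frame]:
    """Corresponding operation in the target (frame) model."""
    new = dict(frames)
    new[eid] = {"kind": "edge", "class": label,
                "source": src, "target": dst, "slots": dict(attrs)}
    return new


if __name__ == "__main__":
    g = PropertyGraph(
        vertices={"v1": ("Person", {"name": "Alice"}),
                  "v2": ("Person", {"name": "Bob"})},
        edges={},
    )
    # Information preservation: the round trip recovers the original state.
    assert frames_to_graph(graph_to_frames(g)) == g
    # Operation preservation: mapping after the operation equals
    # applying the corresponding operation after the mapping.
    lhs = graph_to_frames(add_edge_graph(g, "e1", "KNOWS", "v1", "v2", {"since": 2010}))
    rhs = add_edge_frames(graph_to_frames(g), "e1", "KNOWS", "v1", "v2", {"since": 2010})
    assert lhs == rhs
    print(lhs["e1"]["source"], lhs["e1"]["target"])  # v1 v2
```

Representing edges as objects in their own right, rather than as plain references between vertex objects, is what lets edge attributes survive the mapping. A checked round trip on sample states is much weaker than a formal proof over all states. The paper's use of AMN targets that stronger guarantee.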
Pages: 13-31